GSoC’20 Phase-1 @ OWASP-IIDS

Ashish Malik
Jun 30, 2020

Hey Everyone,

This is my first time writing a blog post about GSoC. Almost two months have passed since I was selected for GSoC, and I never felt like I was doing any work (although I was, xD). Time flies when you’re having fun.

Since community bonding and Phase 1 of coding are over, I would like to share my progress so far. My project is to create an anomaly-based network intrusion detection system that uses AI techniques for detection.

For the project, I decided to use the KDDCup’99 dataset for training. Since this is a new project, I started coding early, during the community bonding period itself. I began with data preprocessing, where I had to clean and transform the dataset so that it becomes machine readable. This involves replacing missing values with the mean or median of their column and encoding the categorical data, which was done simply with the sklearn.preprocessing package.
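A minimal sketch of this kind of preprocessing with scikit-learn, assuming a pandas DataFrame with a few hypothetical KDDCup’99-style columns (`duration`, `src_bytes`, `protocol_type`, `service`) and a handful of made-up rows. Note that in current scikit-learn releases the imputer is `SimpleImputer` from `sklearn.impute`, while the encoders live in `sklearn.preprocessing`; the median and most-frequent strategies and the one-hot encoding are illustrative choices, not necessarily the exact ones used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows using a few KDDCup'99-style column names;
# the real dataset has many more columns and rows.
df = pd.DataFrame({
    "duration": [0, 2, np.nan, 0],
    "src_bytes": [181.0, 45.0, 239.0, np.nan],
    "protocol_type": ["tcp", "udp", "tcp", "icmp"],
    "service": ["http", "domain_u", np.nan, "ecr_i"],
})

numeric_cols = ["duration", "src_bytes"]
categorical_cols = ["protocol_type", "service"]

preprocess = ColumnTransformer(
    [
        # Numeric columns: replace missing values with the column median.
        ("num", SimpleImputer(strategy="median"), numeric_cols),
        # Categorical columns: fill gaps with the most frequent value,
        # then one-hot encode each category into its own 0/1 column.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 8): 2 numeric + 3 protocol_type + 3 service columns
```

Wrapping the steps in a `ColumnTransformer` means the medians and category lists learned with `fit_transform` on the training data can be reapplied unchanged to new traffic with `transform`.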

The KDDCup’99 dataset has 42 columns (41 features plus the class label), and a number of those features are irrelevant, which makes it harder for a model to train and detect anomalies. So next, I moved on to feature selection. I used RFECV (recursive feature elimination with cross-validation) for this step, and the results looked like this:

As you can see, with a particular set of 25 features our model reaches an accuracy of over 90%. To find out which 25 features those are, I again used the RFECV module, which ranks the features according to their importance.
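The RFECV step can be sketched roughly as follows. This is a minimal example on synthetic data from `make_classification` (standing in for the preprocessed KDDCup’99 matrix), with a `DecisionTreeClassifier` as the base estimator and 5-fold stratified cross-validation; the estimator, the fold count, and the hypothetical feature names `f0` to `f19` are assumptions, not the project’s actual settings. `RFECV` repeatedly drops the least important feature, keeps the feature count with the best cross-validated accuracy, and exposes the result through `n_features_`, `support_` and `ranking_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed dataset: 20 features, only 5 informative.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, n_redundant=2, random_state=0
)
feature_names = np.array([f"f{i}" for i in range(X.shape[1])])  # hypothetical names

selector = RFECV(
    estimator=DecisionTreeClassifier(random_state=0),  # must expose feature_importances_ or coef_
    step=1,                                            # eliminate one feature per iteration
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
selector.fit(X, y)

print("Optimal number of features:", selector.n_features_)
print("Selected:", feature_names[selector.support_].tolist())

# ranking_ is 1 for every selected feature; higher ranks were eliminated earlier.
order = np.argsort(selector.ranking_, kind="stable")
print("Ranked:", list(zip(feature_names[order].tolist(), selector.ranking_[order].tolist())))
```

On the real data, `feature_names` would be the DataFrame’s column names, so `support_` directly yields the names of the selected features.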

By this time, Phase 1 of coding had already started, so I began building the detection model. To choose a model, I wrote a custom script that trains several sklearn models and prints the accuracy of each. For me, it was a very quick way to compare models.
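A script like this could look roughly as follows. It is a hypothetical sketch, not the author’s actual script: the helper name `compare_models`, the list of candidate models, the 80/20 stratified split, and the synthetic data are all assumptions. Each model is trained on the same split and its test accuracy is printed, best first.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, test_size=0.2, random_state=0):
    """Fit several sklearn classifiers on one split and return {name: accuracy}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "DecisionTree": DecisionTreeClassifier(random_state=random_state),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=random_state),
        "KNeighbors": KNeighborsClassifier(),
        "GaussianNB": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = accuracy_score(y_test, model.predict(X_test))
    # Print from best to worst so the strongest candidate is easy to spot.
    for name, acc in sorted(results.items(), key=lambda item: item[1], reverse=True):
        print(f"{name:<20} accuracy = {acc:.4f}")
    return results


# Usage on synthetic data standing in for the 25 selected features.
X, y = make_classification(n_samples=1000, n_features=25, random_state=0)
scores = compare_models(X, y)
```

A single train/test split keeps the comparison fast; `cross_val_score` would give steadier estimates at the cost of more model fits.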

After that, I moved on to building a custom neural network with PyTorch, which is still in progress. In the second phase, I will create and set up the backend architecture for the project. I will share more details once Phase 2 of coding is over, so stay tuned!

Until Next Time…

Ashish Malik

GSoC’21 @ Casbin, GSoC’20 @ OWASP, Backend-Dev, Security