AI_Cybersecurity_IDS_PoC

and

David's Udacity CloudDevOps Nanodegree Capstone Project

Winning contribution of Michael Schwabe and David Lassig to the BWI Data Analytics Hackathon 2020 in the category Cyber Security: a proof of concept for intrusion detection using Zeek with self-made machine learning in a nice web app.

  • Winning solution of the BWI Data Analytics Hackathon 2020
  • CloudDevOps pipeline with Blue-Green Deployment for David's Udacity CloudDevOps Nanodegree Capstone Project

App Screenshots

  • (as the app runs on privately-owned, real Internet-connected infrastructure, IPs are blurred)
  • Screenshots: Monitoring Dashboard, Model Performance, Anomaly Training, Application of Models

Concept

  • unfortunately only available in German :/

Features

  • Live-updating web app with a data pipeline fed from live-running Zeek logs
    • extensive and easily extendable Monitoring Dashboard
  • Application of Neural Net and Random Forest models, trained on labelled data, against live Zeek logs
  • Training of the Anomaly Detection using IsolationForest can be triggered at runtime (see the sketch after this list)
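
A minimal sketch of what this runtime training could look like with scikit-learn's IsolationForest; the feature columns and the train_anomaly_model helper are assumptions for illustration, not the project's actual code:

    # hypothetical sketch: fit an IsolationForest on numeric features
    # extracted from live Zeek conn.log records (column names assumed)
    import pandas as pd
    from sklearn.ensemble import IsolationForest

    def train_anomaly_model(conn_records):
        """Fit an IsolationForest on live-collected Zeek connection features."""
        df = pd.DataFrame(conn_records)
        cols = ["duration", "orig_bytes", "resp_bytes", "orig_pkts", "resp_pkts"]
        features = df[cols].apply(pd.to_numeric, errors="coerce").fillna(0.0)
        model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
        model.fit(features)
        return model

    # scoring new records: model.predict(...) returns -1 (anomaly) or 1 (normal)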

Content

  • analysis contains everything Michael did for

    • exploring the labelled data from the UNSW-NB15 datasets
    • checking the performance of different models (mainly Random Forest and Neural Nets)
    • training and optimizing the best model approaches using keras-tuner (see the keras-tuner sketch after this list)
  • app contains everything David did for

    • creating the live-updating data pipeline from Zeek logs
      • parsing them with a tinkered version of ParseZeekLogs
      • and tailing them with pygtail, so the logs are continuously fed into the pipeline (see the pygtail sketch after this list)
    • creating the web app using plotly and Dash
    • implementing the live-trained Anomaly Detection using Isolation Forest from scikit-learn
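
A minimal sketch of the keras-tuner workflow used in analysis; the network shape, search space, and variable names are assumptions for illustration:

    # hypothetical sketch: tune a small binary classifier (attack vs. normal)
    # on UNSW-NB15 features with keras-tuner (search space is assumed)
    import keras_tuner as kt
    from tensorflow import keras

    def build_model(hp):
        model = keras.Sequential([
            keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
            keras.layers.Dropout(hp.Float("dropout", 0.0, 0.5, step=0.1)),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(
            optimizer=keras.optimizers.Adam(hp.Choice("lr", [1e-2, 1e-3, 1e-4])),
            loss="binary_crossentropy",
            metrics=["accuracy"],
        )
        return model

    tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
    # tuner.search(X_train, y_train, epochs=20, validation_split=0.2)
    # best_model = tuner.get_best_models(num_models=1)[0]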
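
And a minimal sketch of the continuous log feed, assuming pygtail and a standard Zeek conn.log in TSV format; the field list and the follow_conn_log helper are hypothetical (the project itself parses with its tinkered ParseZeekLogs):

    # hypothetical sketch: continuously tail a Zeek conn.log with pygtail;
    # pygtail remembers its file offset, so only unread lines are yielded
    from pygtail import Pygtail

    FIELDS = ["ts", "uid", "id.orig_h", "id.orig_p", "id.resp_h", "id.resp_p",
              "proto", "service", "duration", "orig_bytes", "resp_bytes"]

    def follow_conn_log(path="/opt/zeek/logs/current/conn.log"):
        for line in Pygtail(path):
            if line.startswith("#"):          # skip Zeek's TSV header lines
                continue
            values = line.rstrip("\n").split("\t")
            yield dict(zip(FIELDS, values))   # zip truncates to the assumed fields

    # for record in follow_conn_log():
    #     feed_into_pipeline(record)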

Installation/Deployment (CloudDevOps Nanodegree Part)

  • CircleCI status badges: Branch CI/CD Pipeline, Main CI/CD Pipeline

Local Docker-Compose Deployment

  1. Clone the repository:

    git clone https://github.com/herrfeder/AI_Cybersecurity_IDS_PoC.git
    
  2. Go into the deploy folder and run run_compose.sh to start the file-based or Kafka-based stack:

    deploy/run_compose.sh kafka
    # OR
    deploy/run_compose.sh file
    
  • the first run will take very long because the Docker containers are built locally, and the Zeek compilation and Kafka plugin install take a while
  3. Go to http://127.0.0.1:8050/

Local Kubernetes Deployment

  1. You need to build the previous Compose-based stack at least once and upload the resulting Docker containers using the upload-docker.sh script, or rely on my publicly built containers.
  2. Prepare and start minikube, then run run_kube_local.sh:

    cd deploy
    ./run_kube_local.sh file
    # OR (you can run both as well)
    ./run_kube_local.sh kafka
    
  3. Now add a local ingress rule to reach the broai endpoint:

    kubectl apply -f broai_kubernetes/ingress-local-service.yaml
    # check the resulting ingress service with
    kubectl get svc
    
  4. Now add green.broai and blue.broai with your minikube IP to your /etc/hosts and visit these domains (example below).
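
For example, if minikube ip returns 192.168.49.2 (yours will differ), the /etc/hosts entries would look like this:

    # example /etc/hosts entries for the blue-green endpoints
    192.168.49.2 green.broai
    192.168.49.2 blue.broai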

AWS Kubernetes Deployment

  1. You need to build the previous Compose-based stack at least once and upload the resulting Docker containers using the upload-docker.sh script, or rely on my publicly built containers.
  2. Install the aws-cli and deploy the network and cluster requirements with the provided AWS CloudFormation scripts:

    cd .circleci
    
    scripts/push_cloudformation_stack.sh broainetwork cloudformation/network.yaml <your individual id>
    scripts/push_cloudformation_stack.sh broaicluster cloudformation/cluster.yaml <your individual id>
    
  3. Get an access token to access your AWS EKS cluster with kubectl:

    cd deploy
    
    mkdir .kube
    aws eks --region us-west-2 update-kubeconfig --kubeconfig .kube/config-aws --name AWSK8SCluster
    
  4. Deploy the Kubernetes manifests:

    ./run_kube_aws.sh
    
  5. Wait for the deployment to finish, check the resulting load balancer hostnames with kubectl --kubeconfig .kube/config-aws get svc, and access them. :)

TODO

  • replacing the file-based data pipeline with an Apache Kafka feed (DONE in the scope of David's Udacity CloudDevOps Nanodegree Capstone Project; see the sketch below)
    • faster feeding into the web app
    • more elegant data management
  • also enabling Random Forest and Neural Net training at runtime
  • feeding predicted live data back into the analysis workflow for automatic re-evaluation and re-training
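
A minimal sketch of consuming that Kafka feed with kafka-python; the topic name and broker address are assumptions (the Zeek Kafka plugin publishes log records as JSON):

    # hypothetical sketch: consume Zeek conn.log records from Kafka
    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "zeek-conn",                          # assumed topic name
        bootstrap_servers="localhost:9092",   # assumed broker address
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="latest",           # only read new records
    )

    for message in consumer:
        record = message.value
        # feed the parsed record into the web app's data pipeline
        print(record.get("id.orig_h"), "->", record.get("id.resp_h"))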