End-to-End ETL Pipeline for Network Security using Docker & AWS
Introduction

In this project, I present a modular, reproducible ETL (Extract, Transform, Load) workflow for network security analytics, built on Docker containers, AWS, and automated pipelines. The aim is secure, consistent data handling at every stage, from ingestion to machine learning model deployment.

Objectives

Part 1: Data Ingestion
- Gather raw data from multiple sources (CSV files, APIs, internal databases).
- Automatically store processed data in MongoDB Atlas using a dedicated Docker container.

Part 2: Data Validation & Transformation
- Validate data integrity and schema, and detect data drift.
- Preprocess the data (clean, scale, encode, and split into features and labels), using a separate Docker container for each phase.

Part 3: Model Training, Evaluation & Deployment
- Train ML models on the processed features, with components such as PKCKNN, a robust scaler, and a KNN imputer.
- Evaluate model performance with detailed reports.
- Deploy the model securely as a containerized app on AWS using CI/...
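
As a rough illustration of the Part 2 validation objective, the sketch below checks incoming rows against a schema and flags drift by comparing a reference sample with a new one. Everything in it is an assumption made for illustration: the column names, the function names, and the 0.2 threshold are hypothetical, and the two-sample Kolmogorov-Smirnov statistic is just one common way to measure drift, not necessarily the method this project uses.

```python
from bisect import bisect_right

# Hypothetical schema for illustration: column name -> expected Python type.
EXPECTED_SCHEMA = {"src_port": int, "dst_port": int, "duration": float}


def validate_schema(rows, schema=EXPECTED_SCHEMA):
    """Return a list of readable problems; an empty list means every row passes."""
    problems = []
    for i, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                problems.append(f"row {i}: {col} is not {expected.__name__}")
    return problems


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The largest gap between two step functions occurs at one of the sample points.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def has_drift(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds an illustrative, untuned threshold."""
    return ks_statistic(reference, current) > threshold
```

In a containerized pipeline, a check like this could run as its own stage and stop the run when `validate_schema` returns problems or `has_drift` returns True for a monitored feature. The threshold would need tuning against real traffic data.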