iot_cloud_infrastructure

A Docker Compose-based IoT data pipeline for local development, featuring MQTT, MinIO, Cassandra, FastAPI, and Airflow for easy testing and expansion.

IoT "Cloud" Data Pipeline

This repository helps you understand the basic components needed to build a data pipeline for IoT data and how they work together. Use this setup to test individual components or see how they function as a complete system. You can also expand this setup to create a more complex pipeline and deploy it to cloud platforms like AWS, Azure, or Google Cloud.

I chose Docker Compose for local deployment to focus on understanding the components and their interactions without the complexity of cloud providers. This approach also makes it easy to share the setup and run it on any machine with minimal effort.

The pipeline and infrastructure include:

  • MQTT Broker
  • MQTT Agent/Application
  • Data Lake (MinIO)
  • Database (Cassandra)
  • REST API (FastAPI)
  • Orchestration (Airflow)
  • Transformation (ELT)

The components are connected as follows:

  1. The MQTT Broker is the entry point for the data. It receives data from the IoT devices and publishes it to a topic.
  2. The MQTT Agent subscribes to the topic and writes the incoming messages to the Data Lake (a minimal sketch of such an agent follows this list).
  3. The Data Lake stores raw data and acts as the source for the Transformation component.
  4. The Transformation reads raw data from the Data Lake, processes it, and writes it to the Database. Airflow is used to orchestrate the workflow.
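
For illustration, the agent in step 2 can be a small bridge script. The sketch below is not the repository's implementation; it assumes the paho-mqtt and minio Python packages, and the topic, bucket name, and credentials are placeholders matching the examples later in this README.

import time
from io import BytesIO

import paho.mqtt.client as mqtt
from minio import Minio

# Object store client (credentials match the MinIO example below).
store = Minio("localhost:9000", access_key="minio", secret_key="minio123", secure=False)
if not store.bucket_exists("iot-raw"):
    store.make_bucket("iot-raw")

def on_message(client, userdata, msg):
    # Persist each raw MQTT payload as a timestamped object in the lake.
    data = msg.payload
    store.put_object("iot-raw", f"raw/{int(time.time() * 1000)}.json", BytesIO(data), length=len(data))

client = mqtt.Client()  # paho-mqtt 1.x style constructor
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("test")
client.loop_forever()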

Once you have clean data in the database, you can use it for analytics, machine learning, or other applications.

Table of Contents

  • Prerequisites
  • Component Testing
  • Integration Testing
  • Stopping and Cleaning Up
  • Good to know

Prerequisites

You need Docker, Docker Compose, and Git installed. Then clone the repository:

git clone https://github.com/daleonpz/iot_cloud_test.git
cd iot_cloud_test

Component Testing

MQTT Broker

  1. Build and Run
cd mqtt
docker build -t my-broker .
docker run -d --name my-broker -p 1883:1883 my-broker
  2. Subscribe to Topic
docker exec -it my-broker mosquitto_sub -h localhost -t test
  3. Publish to Topic

In another terminal:

docker exec -it my-broker mosquitto_pub -h localhost -t test -m "hello"
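
The same test can be scripted. A minimal publisher, assuming the paho-mqtt package (the repository ships its own mqtt_publisher_test.py, used later):

import paho.mqtt.publish as publish

# Publish a single message to the broker exposed on localhost:1883.
publish.single("test", payload="hello", hostname="localhost", port=1883)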

Data Lake (MinIO)

  1. Build and Run
cd datalake
docker build -t my-datalake .
docker run -d --name my-datalake -p 9000:9000 -p 9001:9001 -e "MINIO_ACCESS_KEY=minio" -e "MINIO_SECRET_KEY=minio123" my-datalake server /data --console-address ":9001"
  2. Access Data Lake

Open the MinIO console at http://localhost:9001 in your browser (the S3 API listens on port 9000).

  • Access Key: minio
  • Secret Key: minio123

If the console is not reachable via localhost, check the endpoint addresses that MinIO prints at startup:

docker logs my-datalake
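
You can also exercise the object store from code. A short sanity check, assuming the minio Python package and the credentials set above (bucket and object names are illustrative):

from io import BytesIO

from minio import Minio

client = Minio("localhost:9000", access_key="minio", secret_key="minio123", secure=False)

# Create a test bucket, upload a small JSON object, and list it back.
if not client.bucket_exists("test-bucket"):
    client.make_bucket("test-bucket")
payload = b'{"temperature": 25.0}'
client.put_object("test-bucket", "sample.json", BytesIO(payload), length=len(payload))
for obj in client.list_objects("test-bucket"):
    print(obj.object_name)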

Database (Cassandra)

  1. Build and Run
cd database
docker build -t my-db .
docker run -d --name my-db -p 9042:9042 my-db
  2. Test Cassandra with cqlsh (the node may take a minute to accept connections after startup)
docker exec -it my-db cqlsh localhost
  3. Run the following commands in cqlsh:
    CREATE KEYSPACE iot WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
    USE iot;
    CREATE TABLE measurements (id UUID PRIMARY KEY, temperature float, battery_level float);
    INSERT INTO measurements (id, temperature, battery_level) VALUES (uuid(), 25.0, 50.0);
    SELECT * FROM measurements;
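
The same check can be done from Python with the cassandra-driver package (an assumption; this is not part of the repository's test scripts):

from cassandra.cluster import Cluster

# Connect to the node exposed on localhost:9042 and read back the test row.
cluster = Cluster(["127.0.0.1"], port=9042)
session = cluster.connect("iot")
for row in session.execute("SELECT * FROM measurements"):
    print(row.id, row.temperature, row.battery_level)
cluster.shutdown()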

REST API (FastAPI)

  1. Build and Run
cd restapi
docker build -t api .
docker run -d --name api -p 8000:8000 --link my-db:my-db api
  2. Test API

For debugging, you can run the container interactively instead (remove the detached api container first, since the name is reused):

docker run -it --name api -p 8000:8000 --link my-db:my-db api bash
  • Send Data to the Database
curl -X POST "http://localhost:8000/data/{id}" -H "Content-Type: application/json" -d '{"temperature": 25.0, "battery_level": 50.0}'
  • Get Data from the Database
curl -X GET "http://localhost:8000/data/{id}" -H "accept: application/json"
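
The actual routes live in restapi/. As a rough idea of what endpoints of this shape look like in FastAPI (field names are taken from the Cassandra table above; everything else is illustrative, not the repository's code):

from uuid import UUID

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Measurement(BaseModel):
    temperature: float
    battery_level: float

@app.post("/data/{id}")
def create_measurement(id: UUID, m: Measurement):
    # A real implementation would INSERT the row into Cassandra here.
    return {"id": str(id), "temperature": m.temperature, "battery_level": m.battery_level}

@app.get("/data/{id}")
def read_measurement(id: UUID):
    # A real implementation would SELECT the row from Cassandra here.
    return {"id": str(id)}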

Transformation (ELT)

  1. Build and Run
docker-compose -f docker-compose.yml.etl_test up --build
  2. Verify Data
docker exec -it my-db cqlsh localhost
  3. Run the following commands in cqlsh:
USE iot;
SELECT * FROM measurements;
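
Conceptually, the transformation reads raw objects from the Data Lake and writes cleaned rows into Cassandra. A sketch under those assumptions (minio and cassandra-driver packages; bucket and field names are illustrative, not the repository's code):

import json
import uuid

from cassandra.cluster import Cluster
from minio import Minio

store = Minio("localhost:9000", access_key="minio", secret_key="minio123", secure=False)
session = Cluster(["127.0.0.1"], port=9042).connect("iot")
insert = session.prepare("INSERT INTO measurements (id, temperature, battery_level) VALUES (?, ?, ?)")

# Read every raw object, parse it, and load it into the measurements table.
for obj in store.list_objects("iot-raw", recursive=True):
    resp = store.get_object("iot-raw", obj.object_name)
    record = json.loads(resp.read())
    resp.close()
    resp.release_conn()
    session.execute(insert, (uuid.uuid4(), float(record["temperature"]), float(record["battery_level"])))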

Integration Testing

MQTT with Data Lake

  1. Build and Run
docker-compose -f docker-compose.yml.mqtt_app_test up --build
  2. Publish Test Data
cd mqtt/
python mqtt_publisher_test.py

Airflow Workflow

  1. Build and Run
docker-compose -f docker-compose.yml up --build
  2. Publish Test Data
cd mqtt/
python mqtt_publisher_test.py
  3. Access Airflow

Log in to http://localhost:8080 with:

  • Username: airflow
  • Password: airflow

Trigger the DAG (a sketch of the DAG's shape follows at the end of this section):

  • Click "transform_data" on the DAGs page.
  • Click "Trigger DAG" or the "Play" button.
  4. Verify Data
docker exec -it my-db cqlsh localhost

Run the following commands in cqlsh:

USE iot;
SELECT * FROM measurements;
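
For orientation, a DAG with the id "transform_data" has roughly this shape (a sketch, not the repository's DAG file; the callable would run the ELT logic shown earlier):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def transform():
    # Read raw objects from the Data Lake, clean them, write rows to Cassandra.
    ...

with DAG(
    dag_id="transform_data",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,  # triggered manually from the UI
    catchup=False,
) as dag:
    PythonOperator(task_id="transform", python_callable=transform)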

Stopping and Cleaning Up

  1. Remove All Containers
./tools/delete_containers.sh
  2. Delete All Images
./tools/delete_docker_images.sh

Good to know

  • There is a .env file in the root directory that sets environment variables for the services. You can modify this file as needed.
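
For example, the MinIO credentials used throughout this README could be centralized there. Whether the repository's .env uses exactly these variable names is an assumption:

MINIO_ACCESS_KEY=minio
MINIO_SECRET_KEY=minio123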