This project is an automated ETL pipeline (Extract, Transform, Load) designed to extract commodity data (specifically Gold and Silver) from a free public API, transform the data, and load it into a structured format for analysis using AWS services. The pipeline runs on a weekly schedule, pulling fresh data using AWS Lambda functions, and storing both raw and transformed data in Amazon S3. AWS Glue is used to infer the schema, and Amazon Athena is used to query the data.
The ETL pipeline has three main phases: Extract, Transform, and Load.

Extract: The first Lambda function (`commodities_data_extraction`) pulls raw commodity data (Gold and Silver prices) from a free API and stores it in the S3 bucket under the folder `raw_data/to_process`.

Transform and Load: When a new file lands in `raw_data/to_process`, an S3 event trigger fires, invoking the second Lambda function (`commodities_data_transform_and_load`), which transforms the data and stores the result under `transformed_data/` in S3.

S3 folder structure:
- `raw_data/to_process`: New raw data from the API lands here.
- `raw_data/processed`: Once processed, the raw data is moved to this folder.
- `transformed_data/`: Data that has been transformed and is ready for querying.

Lambda functions:
- `commodities_data_extraction`: Extracts Gold and Silver prices from the API and stores the raw data in the `raw_data/to_process` folder in S3.
- `commodities_data_transform_and_load`: Transforms the raw data and stores the result in the `transformed_data` folder in S3. Invoked whenever a new file lands in the `raw_data/to_process` folder (via S3 trigger).

Weekly workflow:
1. A weekly schedule invokes the `commodities_data_extraction` Lambda function.
2. The extracted raw data is stored in the `raw_data/to_process` folder in the S3 bucket.
3. An S3 event trigger detects the new file in the `to_process` folder and invokes the `commodities_data_transform_and_load` Lambda function.
4. The transformed data is stored in the `transformed_data` folder in S3.

Get Your Free API Key: Visit GoldAPI and sign up to obtain your API key.
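For reference, the extraction Lambda in the weekly workflow might look roughly like the sketch below. The bucket name, environment variables, and S3 key naming scheme are assumptions for illustration; the GoldAPI endpoint and `x-access-token` header should be verified against GoldAPI's own documentation.

```python
import os
import urllib.request
from datetime import datetime, timezone

# Assumed configuration via environment variables (not specified by this project).
BUCKET = os.environ.get("BUCKET_NAME", "my-commodities-bucket")

def raw_key(symbol: str, now: datetime) -> str:
    # Key naming under raw_data/to_process/ is an assumption for illustration.
    return f"raw_data/to_process/{symbol}_{now:%Y-%m-%d}.json"

def lambda_handler(event, context):
    import boto3  # imported lazily so the helper above can be used without AWS
    s3 = boto3.client("s3")
    now = datetime.now(timezone.utc)
    for symbol in ("XAU", "XAG"):  # Gold and Silver
        url = f"https://www.goldapi.io/api/{symbol}/USD"
        req = urllib.request.Request(url, headers={"x-access-token": os.environ["API_KEY"]})
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        # Store the raw response for the transform Lambda to pick up.
        s3.put_object(Bucket=BUCKET, Key=raw_key(symbol, now), Body=payload)
```

The S3 event notification on `raw_data/to_process/` would then fire the transform Lambda for each object written here.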
Clone the Repository: Clone this project to your local machine.
Set Up Your Environment: Create a `.env` file inside 'For local usage' and add your API key like this:
API_KEY=your_api_key
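The local scripts can then read this key with python-dotenv, along these lines (a sketch; `API_KEY` is the variable name from the `.env` file above):

```python
import os

# python-dotenv (installed in the next step) loads variables from .env into
# the environment; fall back gracefully if the package or file is missing.
try:
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass

API_KEY = os.getenv("API_KEY", "")
```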
Install Dependencies: You'll need the pandas, requests, and python-dotenv libraries. Install them by running:
pip install pandas requests python-dotenv
Alternatively, feel free to use a virtual environment if you prefer.
Create the following folders inside 'For local usage':
|- raw_data
|--- to_process
|--- processed
|- transformed_data
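From inside 'For local usage', these can be created in one go (POSIX shell; use the equivalent `mkdir` commands on Windows):

```shell
mkdir -p raw_data/to_process raw_data/processed transformed_data
```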
Run the Scripts: Execute the scripts in the 'For local usage' folder to extract and transform the data.
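As a sketch of what the transform step does locally, the snippet below flattens one GoldAPI-style response into a one-row table with pandas. The field names (`metal`, `currency`, `price`, `timestamp`) are assumptions based on GoldAPI's documented payload; adjust them to match what your extraction script actually saves.

```python
import pandas as pd

def transform_record(raw: dict) -> pd.DataFrame:
    """Flatten one raw API response (field names assumed) into a one-row DataFrame."""
    row = {
        "metal": raw.get("metal"),
        "currency": raw.get("currency"),
        "price": raw.get("price"),
        # GoldAPI timestamps are Unix epoch seconds (assumption).
        "timestamp": pd.to_datetime(raw.get("timestamp"), unit="s"),
    }
    return pd.DataFrame([row])

# Example with a made-up payload:
sample = {"metal": "XAU", "currency": "USD", "price": 2350.5, "timestamp": 1_700_000_000}
df = transform_record(sample)
```

Locally, the result would be written under `transformed_data/`, and the raw input file moved from `raw_data/to_process` to `raw_data/processed`, mirroring the S3 layout above.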