This project implements the code examples from the course: `Concurrent and Parallel Programming in Python` by Maximilian Schallwig
MIT License
The project involves building a system that fetches the list of companies from the S&P 500 and retrieves the stock information for each of those companies using Yahoo Finance.
The stock data is then inserted into a PostgreSQL database. The system leverages concurrent and parallel programming in Python to efficiently manage the flow of data between different components: fetching the list of companies, retrieving stock prices, and storing the data in the database.
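The flow described above can be sketched with two queues and two worker threads. This is only a minimal illustration, not the project's actual code: `fetch_tickers` and `fetch_price` are hypothetical stand-ins for the S&P 500 and Yahoo Finance lookups, a plain list stands in for the PostgreSQL table, and the `"DONE"` sentinel value is an assumption.

```python
import queue
import threading

STOP_SIGNAL = "DONE"  # hypothetical sentinel marking the end of the stream


def fetch_tickers():
    """Stand-in for fetching the S&P 500 company list."""
    return ["AAPL", "MSFT", "GOOG"]


def fetch_price(ticker):
    """Stand-in for a Yahoo Finance lookup; returns a fake price."""
    return float(len(ticker))


def price_worker(ticker_queue, price_queue):
    """Read tickers, look up prices, and pass the results downstream."""
    while True:
        ticker = ticker_queue.get()
        if ticker == STOP_SIGNAL:
            price_queue.put(STOP_SIGNAL)  # forward the stop signal
            break
        price_queue.put((ticker, fetch_price(ticker)))


def storage_worker(price_queue, stored_rows):
    """Read price records and 'store' them (a list replaces the database)."""
    while True:
        item = price_queue.get()
        if item == STOP_SIGNAL:
            break
        stored_rows.append(item)


def run_pipeline():
    ticker_queue = queue.Queue()
    price_queue = queue.Queue()
    stored_rows = []
    workers = [
        threading.Thread(target=price_worker, args=(ticker_queue, price_queue)),
        threading.Thread(target=storage_worker, args=(price_queue, stored_rows)),
    ]
    for worker in workers:
        worker.start()
    for ticker in fetch_tickers():
        ticker_queue.put(ticker)
    ticker_queue.put(STOP_SIGNAL)
    for worker in workers:
        worker.join()
    return stored_rows
```

Each stage only knows about its input and output queue, so a stage can be swapped out or run with more workers without touching the others. With several workers on one queue, each worker would need its own stop signal.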
The project also includes a pipeline feature that drives the whole process from a configuration file.
The pipeline executor initializes and manages the queues, workers, and schedulers named in that configuration, so the pipeline can be changed by editing the configuration rather than the code.
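A rough sketch of a configuration-driven executor is shown below. The configuration keys, the `SquareWorker` class, and the `WORKER_REGISTRY` lookup are all assumptions made for illustration; the project's own configuration format and class loading may differ (for example, it may import classes dynamically instead of using a registry).

```python
import json
import queue
import threading

STOP_SIGNAL = "DONE"  # hypothetical sentinel value

# Hypothetical configuration; JSON is used here only to stay within the stdlib.
CONFIG = json.loads("""
{
  "queues": [{"name": "numbers"}, {"name": "squares"}],
  "workers": [
    {"name": "squarer", "class": "SquareWorker",
     "input_queue": "numbers", "output_queue": "squares"}
  ]
}
""")


class SquareWorker(threading.Thread):
    """Toy worker: squares each number from its input queue."""

    def __init__(self, input_queue, output_queue):
        super().__init__()
        self.input_queue = input_queue
        self.output_queue = output_queue

    def run(self):
        while True:
            item = self.input_queue.get()
            if item == STOP_SIGNAL:
                self.output_queue.put(STOP_SIGNAL)
                break
            self.output_queue.put(item * item)


# A registry replaces dynamic imports in this sketch.
WORKER_REGISTRY = {"SquareWorker": SquareWorker}


class PipelineExecutor:
    def __init__(self, config):
        self.config = config
        self.queues = {}
        self.workers = {}

    def setup(self):
        # Create one queue per configured name, then wire workers to them.
        for queue_cfg in self.config["queues"]:
            self.queues[queue_cfg["name"]] = queue.Queue()
        for worker_cfg in self.config["workers"]:
            worker_class = WORKER_REGISTRY[worker_cfg["class"]]
            self.workers[worker_cfg["name"]] = worker_class(
                input_queue=self.queues[worker_cfg["input_queue"]],
                output_queue=self.queues[worker_cfg["output_queue"]],
            )

    def start(self):
        for worker in self.workers.values():
            worker.start()

    def join(self):
        for worker in self.workers.values():
            worker.join()


def run_demo():
    executor = PipelineExecutor(CONFIG)
    executor.setup()
    executor.start()
    for number in (1, 2, 3):
        executor.queues["numbers"].put(number)
    executor.queues["numbers"].put(STOP_SIGNAL)
    executor.join()
    results = []
    while True:
        item = executor.queues["squares"].get()
        if item == STOP_SIGNAL:
            break
        results.append(item)
    return results
```

Calling `run_demo()` builds the queues and the worker purely from `CONFIG`; adding a second stage would mean adding one queue entry and one worker entry, with no change to `PipelineExecutor`.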
To inspect the stored stock data, run: `select * from public.prices;`
The primary goal of this project is to learn and demonstrate the concepts of concurrent and parallel programming in Python.
By building a simple yet practical application, we aim to understand how to manage multiple tasks simultaneously, efficiently handle inter-process communication, and effectively utilize system resources.
The project showcases Python's built-in `queue`, `multiprocessing`, and `logging` modules as the building blocks of a robust and scalable application.
The scope of this project includes:
The project is a learning exercise in the practical application of concurrent and parallel programming. It takes a hands-on approach to building and managing a system that performs multiple tasks simultaneously, and highlights the challenges of that approach along with their solutions. The context is educational: the aim is to strengthen the developer's skills in Python and system design.
graph TD;
A[Wikipedia] -->|Fetches tickers| B[WikiWorker]
B -->|Puts tickers into| C[Tickers Queue]
C -->|Fetches tickers| D[YahooFinancePriceScheduler]
D -->|Puts stock data into| E[Postgres Queue]
E -->|Fetches stock data| F[PostgresScheduler]
D -->|STOP_SIGNAL| C
subgraph System Design
direction TB
B --> C
D --> E
E --> F
end
subgraph Scheduler Instances
direction LR
D[YahooFinancePriceScheduler]
F[PostgresScheduler]
end
subgraph Queues
direction TB
C[Tickers Queue]
E[Postgres Queue]
end
graph TD;
A[PipelineExecutor Initialization] --> B[Initialize Queues]
B --> C[Create Queues from Config]
C --> D[Assign Queue Instances]
A --> E[Initialize Workers]
E --> F[Create Workers from Config]
F --> G[Import Worker Classes]
G --> H[Assign Input/Output Queues]
H --> I[Instantiate Worker Classes]
A --> J[Initialize Schedulers]
J --> K[Create Schedulers from Config]
K --> L[Import Scheduler Classes]
L --> M[Assign Input/Output Queues]
M --> N[Instantiate Scheduler Instances]
A --> O[Join Schedulers]
O --> P[Collect Schedulers]
P --> Q[Join Scheduler Instances]
A --> R[Setup Pipeline]
R --> B
R --> E
R --> J
Date | Learning |
---|---|
06-08-2024 | Psycopg 3 (the `psycopg` package) has been released and offers improvements over psycopg2. |
06-09-2024 | `DB_HOST` can be set to the database service name (`db`) defined in docker-compose.yaml, so that the app can connect to the database by that name. |
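
As a small illustration of the `DB_HOST` note, the sketch below builds a PostgreSQL connection URL from environment variables. Only `DB_HOST` comes from the notes above; `DB_PORT`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`, and the default values are assumptions made for the example.

```python
import os
from urllib.parse import quote


def build_dsn(env=None):
    """Build a PostgreSQL connection URL from environment-style settings."""
    env = os.environ if env is None else env
    host = env.get("DB_HOST", "localhost")  # "db" when running under Compose
    port = env.get("DB_PORT", "5432")       # assumed variable name
    user = quote(env.get("DB_USER", "postgres"), safe="")
    password = quote(env.get("DB_PASSWORD", ""), safe="")
    name = env.get("DB_NAME", "postgres")   # assumed variable name
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

Outside Docker the default `localhost` applies; inside the Compose network, setting `DB_HOST=db` points the same code at the database service.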