CS244b Course Project

Project Description

Distributed futures extend traditional RPCs: instead of the computed value itself, a call returns a reference to a value that may reside on a remote node. We have developed and evaluated a distributed futures system intended for mixed fine-grained and coarse-grained workloads. Its worker-scheduler pairing can optimize across simulated GPU and CPU workers of different software and hardware generations. The system also provides a mechanism for clients to track task completion and retrieve results.

The system's performance is measured on several matrix operations, map-reduce operations, and a RAG-LLM (Retrieval-Augmented Generation Large Language Model) application that leverages shared memory for reuse, drawing inspiration from web application prefetching strategies.
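The future abstraction described above can be illustrated with Python's standard concurrent.futures module. This is only a local sketch: a thread pool stands in for the remote workers, and the multiply function is a hypothetical task, not one of the project's operations. The point is that submitting a task returns a reference immediately, and the value is fetched later.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply(a, b):
    """Hypothetical task; stands in for work a remote worker would execute."""
    return a * b


# A local thread pool plays the role of the remote workers in this sketch.
with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns a Future right away: a reference to a value
    # that may not have been computed yet.
    future = pool.submit(multiply, 6, 7)

    # result() blocks until the value is available (or the timeout expires).
    result = future.result(timeout=5)

print(result)
```

In the project itself the reference would point at a value held by a worker process reached over gRPC rather than at a local thread.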

Steps to run experiments (using CI pipeline)

Run the GitHub workflow defined in .github/workflows/python-package-conda.yml using the following command:

gh workflow run

If gh reports any errors, you can instead run the individual commands listed in the python-package-conda.yml file.

Steps to run experiments (using Driver)

The scheduler and the workers must be started in separate terminals. Make sure to start at least one GPU-enabled worker and one CPU-only worker so that all types of workloads can be run. Once the scheduler and workers are up, run the tests in the tests directory with pytest.

The following steps start the scheduler in PowerOf2 mode and then start 4 workers (2 of them GPU-enabled).

Install Miniconda from https://docs.anaconda.com/free/miniconda/
Create a conda environment for the experiments:

conda create --name cs244b --file requirements.txt python=3.9

Activate the conda environment:

conda activate cs244b

Run the compile_proto script from the repository root folder (CS244B) to generate the necessary gRPC files from the proto definitions. Which script to run depends on your target machine.

On UNIX-based machines, run:

./compile_proto.sh

On Windows machines, run:

compile_proto.cmd

Start the scheduler server from a terminal:

cd Scheduler;
python3 scheduler_server.py --PortNumber 50051 --SchedulerMode PowerOf2

Reference Command: python3 scheduler_server.py --PortNumber <PortNumber> --SchedulerMode <SchedulingMode> --AssingedWorkersPerTask <AssignedWorkersPerTask>

The first parameter is the port number. Type: int.
The second parameter is the scheduling algorithm the scheduler uses to route requests. Type: string. Four modes are supported: Random, RoundRobin, LoadAware and PowerOf2.
The third parameter is optional and sets the number of workers to which a single task is assigned. Type: int. This task replication strategy lets the system retrieve a task's result from another worker when one of the assigned workers is down.
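As background for the PowerOf2 mode, the following is a minimal sketch of the general "power of two choices" load-balancing technique: sample two workers at random and send the request to the less loaded one. It is not taken from the project's scheduler code. The function name, the worker identifiers and the load table are all hypothetical.

```python
import random


def pick_worker_power_of_two(loads, rng=random):
    """Sample two distinct workers at random and return the less loaded one.

    `loads` maps a (hypothetical) worker id to its number of outstanding tasks.
    """
    if not loads:
        raise ValueError("no workers registered")
    if len(loads) == 1:
        return next(iter(loads))
    first, second = rng.sample(list(loads), 2)
    return first if loads[first] <= loads[second] else second


# Hypothetical snapshot of outstanding tasks per worker.
loads = {"worker-50052": 3, "worker-50053": 0, "worker-50054": 5}
chosen = pick_worker_power_of_two(loads, random.Random(0))
print(chosen)
```

Compared with scanning every worker, this inspects only two load values per request, and the most loaded worker in the table can never win a comparison against a distinct, less loaded one.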

Start multiple worker servers in different terminals:
Worker1

cd Worker;
python3 worker_server.py --PortNumber 50052 --MaxThreadCount 2 --HardwareGeneration Gen2

Worker2

cd Worker;
python3 worker_server.py --PortNumber 50053 --MaxThreadCount 2 --HardwareGeneration Gen2

Worker3

cd Worker;
python3 worker_server.py --PortNumber 50054 --MaxThreadCount 2 --HardwareGeneration Gen2 --gpuEnabled

Worker4

cd Worker;
python3 worker_server.py --PortNumber 50055 --MaxThreadCount 2 --HardwareGeneration Gen2 --gpuEnabled

Reference Command: python3 worker_server.py --PortNumber <PortNumber> --MaxThreadCount <MaxThreadCount> --HardwareGeneration <HardwareGeneration> [--gpuEnabled] [--addDelay]

The first parameter is the port number. Type: int.
The second parameter is the number of threads the worker can use to process tasks. Type: int.
The third parameter is the hardware generation. Type: string. Four values are supported: Gen1, Gen2, Gen3 and Gen4. The scheduler uses this information in the LoadAware and PowerOf2 scheduling modes to determine the number of requests a worker can handle.
The fourth parameter is optional. If --gpuEnabled is passed, the worker is treated as GPU-enabled.
The fifth parameter is optional. If --addDelay is passed, random delays are added to the worker's operations.
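To illustrate how a hardware generation could feed into load-aware routing, here is a small sketch that normalizes each worker's active task count by an estimated capacity. The README does not document how the scheduler actually maps generations to capacity, so the multipliers, field names and functions below are assumptions made purely for illustration.

```python
# Hypothetical multipliers: NOT the project's real values, which are not
# documented in this README.
GENERATION_CAPACITY = {"Gen1": 1, "Gen2": 2, "Gen3": 3, "Gen4": 4}


def effective_capacity(max_thread_count, hardware_generation):
    """Estimated number of requests a worker can handle at once (assumed model)."""
    return max_thread_count * GENERATION_CAPACITY[hardware_generation]


def least_loaded(workers):
    """Return the worker with the lowest active-tasks-to-capacity ratio."""
    return min(
        workers,
        key=lambda w: w["active"] / effective_capacity(w["threads"], w["generation"]),
    )


# Hypothetical cluster state: worker B has more active tasks but far more capacity.
workers = [
    {"name": "A", "threads": 2, "generation": "Gen1", "active": 2},  # ratio 2/2 = 1.0
    {"name": "B", "threads": 2, "generation": "Gen4", "active": 4},  # ratio 4/8 = 0.5
]
best = least_loaded(workers)
print(best["name"])
```

Under these assumed multipliers, worker B is selected even though it has more tasks in flight, because its relative utilization is lower.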

Run the driver from another terminal:

cd Driver;
python3 driver.py

The driver code runs all the operations supported by the system with predefined inputs.

The driver can be extended to change the inputs or to run any subset of the supported operations.

As soon as the driver starts, tasks begin executing and their progress is visible in the terminal logs.
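The scheduler and worker start-up commands listed earlier can also be assembled programmatically, which may be convenient when launching many workers. The sketch below only builds the argument lists using the flags documented in this README. The helper names are hypothetical, and nothing is started by the snippet itself.

```python
import sys


def scheduler_command(port=50051, mode="PowerOf2"):
    """Argument list for the scheduler, using the flags documented above."""
    return [sys.executable, "scheduler_server.py",
            "--PortNumber", str(port), "--SchedulerMode", mode]


def worker_command(port, threads=2, generation="Gen2", gpu=False):
    """Argument list for one worker; --gpuEnabled is appended only when requested."""
    cmd = [sys.executable, "worker_server.py",
           "--PortNumber", str(port),
           "--MaxThreadCount", str(threads),
           "--HardwareGeneration", generation]
    if gpu:
        cmd.append("--gpuEnabled")
    return cmd


# Four workers on ports 50052-50055; the last two are GPU-enabled,
# matching the manual steps in this README.
worker_commands = [worker_command(50052 + i, gpu=(i >= 2)) for i in range(4)]

for cmd in [scheduler_command()] + worker_commands:
    print(" ".join(cmd))
```

Each list could then be passed to subprocess.Popen with cwd set to the Scheduler or Worker directory, starting the scheduler before the workers.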
