This repository contains the basic repository structure for machine learning projects based on Azure technologies (Azure ML and Azure DevOps).
The idea of this template is to provide a minimum number of scripts to implement a development environment for training new models using Azure ML SDK v2 with Azure DevOps or GitHub Actions.
The template contains the following folders/files:
The template contains the following documents:
Information about how to set up the repo is in the following document.
You can start training pipelines from your local computer by creating an environment based on the following instructions:
Rename .env.sample to .env. Values that are already set can be left unchanged (for example, BUILD_BUILDID="local"; this value is set dynamically when running in the context of Azure DevOps or GitHub Actions and is used for various naming and tagging purposes). Update the .env file with values from your Azure subscription for the following properties:
SUBSCRIPTION_ID
RESOURCE_GROUP_NAME
WORKSPACE_NAME
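Before starting a pipeline, it can help to confirm that the three properties above are actually filled in. The following is a minimal sketch, not part of the template: the helper names parse_env_text, read_env_file, and missing_keys are hypothetical, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path

# The three properties listed above; everything else in .env is optional here.
REQUIRED_KEYS = ("SUBSCRIPTION_ID", "RESOURCE_GROUP_NAME", "WORKSPACE_NAME")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as BUILD_BUILDID="local".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path=".env"):
    """Read a .env file from disk and parse it (hypothetical helper)."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

Calling missing_keys(read_env_file(".env")) from the repository root returns an empty list when all three properties have values.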
Check all parameters in config.yaml for the model under test. Note: In the sample code provided in this solution, the development team elected to use a single config file, but this is by no means the only way to do this. It's possible to simplify configs by extracting elements common across all models into their own file, and to create model-specific configs in their own files. The MLOPsConfig class accepts config_path in its constructor, which enables a modular design for configuration.
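One way to picture the modular layout described above is a deep merge of a shared config with a model-specific one. This is only an illustrative sketch: merge_configs is a hypothetical helper, the keys in the usage note are made up, and nothing here reflects how MLOPsConfig works internally.

```python
import copy


def merge_configs(common, model_specific):
    """Return a new dict where model-specific values override common ones.

    Nested dicts are merged key by key; any other value type is replaced.
    Neither input is modified.
    """
    merged = copy.deepcopy(common)
    for key, value in model_specific.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, merging {"aml": {"compute": "cpu", "env": "base"}} with {"aml": {"compute": "gpu"}} keeps "env" from the common file and takes "compute" from the model file. The merged result could then be written to its own file and passed through config_path.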
Create an environment on your local machine using one of the following options.
(Option 1). VSCode base dev container
(Option 2). Use the docker_taxi Dockerfile for the dev container
(Option 3). Create a local conda environment
Open the terminal and run the following commands to create a conda environment (we assume that Anaconda has been installed on your local computer):
Sign in with the Azure CLI: run az login -t <your tenant>
Note: Before running the training pipeline locally, the data assets must be registered. If this has not already been done, you can register the data using the following command:
python -m mlops.common.register_data_asset --data_config_path config/data_config.json
Run the training pipeline under test using the module notation, for example: python -m mlops.nyc_taxi.start_local_pipeline --build_environment pr --wait_for_completion True
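The same command can be launched from a Python script, which is convenient when several models need to be run in sequence. The sketch below only assembles the command line shown above; build_pipeline_command and run_pipeline are hypothetical helper names, and the script is assumed to run from the repository root with the environment already activated.

```python
import subprocess
import sys


def build_pipeline_command(model, build_environment, wait_for_completion):
    """Build the argument list for `python -m mlops.<model>.start_local_pipeline`."""
    return [
        sys.executable,
        "-m",
        f"mlops.{model}.start_local_pipeline",
        "--build_environment",
        build_environment,
        "--wait_for_completion",
        # The CLI examples pass the literal strings True or False.
        str(bool(wait_for_completion)),
    ]


def run_pipeline(model, build_environment="pr", wait_for_completion=True):
    """Run the pipeline module in a subprocess and raise if it exits with an error."""
    command = build_pipeline_command(model, build_environment, wait_for_completion)
    return subprocess.run(command, check=True)
```

For example, run_pipeline("nyc_taxi") is equivalent to the command above, and run_pipeline("london_taxi", "dev", False) would start the London Taxi pipeline without waiting.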
Caching is used to store Python dependencies to improve build times by reusing packages between runs. The cache is managed using the Cache@2 task in the pipeline.
An example of how caching is implemented in this repo can be found in build_validation_pipeline.yml.
Cache Key: A unique key based on python_build_validate, the agent OS ($(Agent.OS)), and the build_validation_requirements.txt file.
Example: python_build_validate | "$(Agent.OS)" | .azure-pipelines/requirements/build_validation_requirements.txt
Cache Path: Dependencies are cached at $(PIP_CACHE_DIR), where pip stores package files.
Restore Keys: If an exact cache match isn’t found, the pipeline will attempt to restore based on partial keys: python_build_validate | "$(Agent.OS)".
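The interaction between the cache key and the restore keys can be modeled in a few lines. This is a simplified sketch of the lookup behavior described above, not the Cache@2 implementation: it assumes the requirements file contributes a hash of its contents to the key, that restore keys match by prefix, and that the most recently stored matching entry wins. All function names are hypothetical.

```python
import hashlib


def build_cache_key(agent_os, requirements_content):
    """Combine the fixed segment, the agent OS, and a hash of the requirements file."""
    fingerprint = hashlib.sha256(requirements_content).hexdigest()
    return " | ".join(["python_build_validate", agent_os, fingerprint])


def find_cache_entry(cache, key, restore_keys):
    """Return (matched_key, exact_hit) for a dict that models the cache store.

    An exact key match is preferred. Otherwise each restore key is tried as a
    prefix, and the most recently inserted matching entry is returned.
    """
    if key in cache:
        return key, True
    for prefix in restore_keys:
        candidates = [stored for stored in cache if stored.startswith(prefix)]
        if candidates:
            # Dicts keep insertion order, so the last candidate is the newest.
            return candidates[-1], False
    return None, False
```

In this model, changing a pinned version in the requirements file produces a new key, so the exact lookup misses and the restore key falls back to the older cache for the same OS. pip can then reuse the packages that are unchanged and download only the rest.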
PIP_CACHE_DIR: Directory where pip stores cached package files.
Agent.OS: The operating system of the build agent, used as part of the cache key.
You can use Visual Studio Code to run and debug specific tasks related to the MLOps pipelines. The following configurations are set up in the launch.json file, allowing you to execute various scripts with ease.
Register Data Asset
python -m mlops.common.register_data_asset --data_config_path config/data_config.json
Start NYC Taxi Local Pipeline
python -m mlops.nyc_taxi.start_local_pipeline --build_environment=<environment> --wait_for_completion=<True/False>
Arguments: build_environment and whether the pipeline should wait for completion.
Start London Taxi Local Pipeline
python -m mlops.london_taxi.start_local_pipeline --build_environment=<environment> --wait_for_completion=<True/False>
Arguments: build_environment and whether the pipeline should wait for completion.
To run a task, select one of the following configurations from the dropdown:
Register Data Asset
Start NYC Taxi Local Pipeline
Start London Taxi Local Pipeline
Then click the run button (▶) next to the dropdown to start the task.
build_environment: pr, dev, or any other configured environment.
wait_for_completion: True if you want the pipeline to wait for completion before exiting, or False to allow it to run asynchronously.
build_environment and wait_for_completion are defined in the launch.json file and can be modified to suit your project’s needs.
Check the launch.json file in the .vscode directory to verify the configuration.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.