Manage your resources, such as service and data connections, using YAML and get credential management along the way. It is aimed at data scientists, data engineers, and developers alike.
MIT License
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project provides an abstraction layer between secrets and services by externalizing the configuration using YAML. In contrast to other config libraries, the library returns fully configured and authenticated client SDK objects for services. Secrets can be fetched from a number of sources. The briefcase.yaml is expected to be stored alongside notebooks (e.g. in the root folder of a git repository).
Accessing private blobs is usually performed using SAS tokens or by sharing account keys.
pip install mlbriefcase
Create briefcase.yaml
azure:
  storage:
    blob:
      - name: blob1
        url: https://myblob123.blob.core.windows.net/test/test.csv
# use Azure Storage Account key
export blob1=KwY...8w==
import mlbriefcase
import pandas as pd
# searches for briefcase.yaml in current directory and all parent directories
briefcase = mlbriefcase.Briefcase()
# let's get the resource by name
blob = briefcase['blob1']
# The call below:
# - probes credential providers (e.g. environment variable, dotenv, ...) to find the storage account key
# - creates the Azure Storage SDK object (available through blob.get_client())
# - generates an authenticated URL using a SAS token
url = blob.get_url()
df = pd.read_csv(url, sep='\t')
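The comment above says that briefcase.yaml is searched for in the current directory and all parent directories. The following is only a minimal sketch of how such an upward search could work, not the library's actual implementation; the function name find_upwards is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_upwards(filename: str, start: Optional[Path] = None) -> Optional[Path]:
    """Return the first `filename` found in `start` or any of its parents.

    Hypothetical helper for illustration; mlbriefcase may do this differently.
    """
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then walk towards the filesystem root.
    for directory in (current, *current.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    # Nothing found anywhere on the way up.
    return None
```

Calling find_upwards("briefcase.yaml") from a notebook folder nested inside a git repository would, under these assumptions, return the file stored in the repository root.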
This example demonstrates how to get the Azure Cognitive Vision service client.
Create briefcase.yaml
azure:
  cognitiveservice:
    vision:
      - name: vision1
vision1=<Insert Cog Service Key>
import mlbriefcase
# searches for briefcase.yaml in current directory and all parent directories
briefcase = mlbriefcase.Briefcase()
# Performs
# - probe credential providers (e.g. environment variable, dotenv, ...) to find cognitive service key
# - initialize the Cognitive Service Vision SDK object
vision = briefcase['vision1'].get_client()
vision.detect... # TODO
In the example below, the Cognitive Service Vision token is looked up using VISION_KEY. Because the url is not specified directly but remapped, it is looked up using VISION_URL.
azure:
  cognitiveservice:
    vision:
      - name: vision1
        secret:
          key: VISION_KEY
        url:
          key: VISION_URL
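To make the remapping concrete, here is a rough sketch of how a lookup name could be derived from such a resource entry. It assumes the entry has already been parsed into a plain dict and that environment variables are the only credential source; resolve_setting is a hypothetical helper, and the fallback to the resource name is inferred from the earlier examples, not taken from the library's code.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    resource: Mapping, field: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Resolve `field` (e.g. "secret" or "url") for one resource entry.

    Hypothetical illustration only, not the mlbriefcase implementation.
    """
    spec = resource.get(field)
    if isinstance(spec, Mapping) and "key" in spec:
        # Remapped: look the value up under the configured key name.
        return env.get(spec["key"])
    if isinstance(spec, str):
        # Given literally in the YAML: use it as-is.
        return spec
    # Assumption: without a remapping, the resource name is the lookup key.
    return env.get(resource["name"])


vision1 = {
    "name": "vision1",
    "secret": {"key": "VISION_KEY"},
    "url": {"key": "VISION_URL"},
}
# Reads VISION_KEY and VISION_URL from the environment (None if unset).
secret = resolve_setting(vision1, "secret")
url = resolve_setting(vision1, "url")
```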
The credential provider used for the lookup can be customized per resource using the credentialprovider property.
azure:
  keyvault:
    - name: kv1
      dnsname: https://myvault.vault.azure.net/
  storage:
    account:
      - name: blob1
        accountname: test1
        credentialprovider: kv1
      - name: blob2
        accountname: test2
        credentialprovider: env
python:
  env:
    - name: env
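The effect of credentialprovider can be pictured with a small sketch. It models each provider as a plain function from secret name to value, which is an assumption made for illustration; lookup_secret and the default_order argument are hypothetical and do not reflect the library's internals.

```python
from typing import Callable, Dict, List, Optional

# A provider is modelled here as a function: secret name -> value or None.
Provider = Callable[[str], Optional[str]]


def lookup_secret(
    resource: dict, providers: Dict[str, Provider], default_order: List[str]
) -> Optional[str]:
    """Find the secret for `resource`, honouring a pinned credential provider.

    Hypothetical illustration only, not the mlbriefcase implementation.
    """
    pinned = resource.get("credentialprovider")
    # A pinned provider is the only one consulted; otherwise try all in order.
    order = [pinned] if pinned else default_order
    for provider_name in order:
        value = providers[provider_name](resource["name"])
        if value is not None:
            return value
    return None


# Stand-ins for a key vault and the process environment.
fake_keyvault = {"blob1": "secret-from-kv1"}
fake_env = {"blob2": "secret-from-env"}
providers = {"kv1": fake_keyvault.get, "env": fake_env.get}

blob1 = {"name": "blob1", "accountname": "test1", "credentialprovider": "kv1"}
blob2 = {"name": "blob2", "accountname": "test2", "credentialprovider": "env"}
blob1_secret = lookup_secret(blob1, providers, ["env", "kv1"])
blob2_secret = lookup_secret(blob2, providers, ["env", "kv1"])
```

In this sketch blob1 is resolved only through kv1 and blob2 only through env, mirroring the configuration above.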
To ease authoring, we provide a JSON schema that the VS Code YAML plugin uses to enable IntelliSense.
The default order for credential provider resolution:
For Azure resources the following authentication methods are supported
Add the following cells to your Jupyter notebook (and yes the first cell throws an error, but that seems to be required).
%config Application.log_level='WORKAROUND'
import logging
logging.getLogger('briefcase').setLevel(logging.DEBUG)
Run
pip install -e .[test]
cd tests
pytest -s . -k test_sql_alchemy
Note: most tests depend on secrets, so you won't be able to run them without setting up your own resources.
azure:
  cognitiveservice:
    vision:
      - name: vision1
To get live updates of the JSON schema and validation in VS Code, update the settings to reference the JSON schema directly.
"yaml.schemas": {
    "file:///mnt/c/work/Workspace/mlbriefcase/briefcase-schema.json": ["briefcase.yaml"]
}