Build and manage real-life ML, AI, and data science projects with ease!
APACHE-2.0 License
Published by saikonen 12 months ago
Full Changelog: https://github.com/Netflix/metaflow/compare/2.10.4...2.10.5
Published by saikonen about 1 year ago
With this release, it is possible to gather telemetry data through an OpenTelemetry endpoint.
Specifying an endpoint in one of the environment variables
METAFLOW_OTEL_ENDPOINT
METAFLOW_ZIPKIN_ENDPOINT
will enable the corresponding tracing provider.
Some additional dependencies are required for the tracing functionality in the execution environment. These can be installed in the base Docker image, or supplied through a conda environment. The relevant packages are
opentelemetry-sdk, opentelemetry-api, opentelemetry-instrumentation, opentelemetry-instrumentation-requests
and, depending on your endpoint, either opentelemetry-exporter-otlp
or opentelemetry-exporter-zipkin
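As an illustrative sketch (the collector URL below is a placeholder), the endpoint can be supplied through the process environment before running a flow:

```python
import os

# Placeholder collector URL; point this at your actual OpenTelemetry endpoint.
# Setting METAFLOW_ZIPKIN_ENDPOINT instead would enable the Zipkin provider.
os.environ["METAFLOW_OTEL_ENDPOINT"] = "http://otel-collector.example.com:4317"
```

Exporting the same variable in the shell that launches the flow works equally well.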
pypi decorator
The pypi decorator now supports using a custom index from the user's pip configuration under global.index-url.
This enables using private indices, even ones that require authentication.
For example the following would set up one authenticated and two extra non-authenticated indices for package resolution
pip config set global.index-url "https://user:[email protected]"
pip config set global.extra-index-url "https://extra.example.com https://extra2.example.com"
resources decorator
It is now possible to specify the ephemeral storage size for Kubernetes jobs by using the resources decorator with the disk= attribute.
argo-workflows status command
Adds a command for easily checking the current status of a workflow on Argo Workflows.
python flow.py argo-workflows status [run-id]
There was an issue where relying solely on the Kubernetes apiserver for generating random pod names resulted in significant collisions with a sufficiently large number of executions.
This release adds more randomness to the pod names besides what is generated by Kubernetes.
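The idea behind the fix can be sketched as follows (an illustration, not Metaflow's actual implementation): append a client-generated random suffix so that name uniqueness does not rest solely on the apiserver:

```python
import random
import string

def randomize_pod_name(base_name, suffix_len=5):
    # Kubernetes names must be lowercase alphanumerics, so draw the
    # suffix from that alphabet and append it to the base name.
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(random.choice(alphabet) for _ in range(suffix_len))
    return "%s-%s" % (base_name, suffix)
```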
resources decorator in combination with Step Functions
This release fixes an issue where deploying flows on AWS Step Functions was failing in the following cases:
@resources(shared_memory=) with any value
@resources combined with @batch(use_tmpfs=True)
Full Changelog: https://github.com/Netflix/metaflow/compare/2.10.3...2.10.4
Published by saikonen about 1 year ago
pandas.DataFrame indexes for default card by @amerberg in https://github.com/Netflix/metaflow/pull/1574
ArgoEvent.publish by @savingoyal in https://github.com/Netflix/metaflow/pull/1587
Full Changelog: https://github.com/Netflix/metaflow/compare/2.10.2...2.10.3
Published by oavdeev about 1 year ago
New configuration option to use the same headers as the metadata service for Argo Events webhook calls by @oavdeev in https://github.com/Netflix/metaflow/pull/1560. The default behavior is unchanged.
Metaflow CLI now supports the list-workflow-templates command to list deployed Argo workflows by @saikonen in https://github.com/Netflix/metaflow/pull/1577
Full Changelog: https://github.com/Netflix/metaflow/compare/2.10.0...2.10.2
Published by savingoyal about 1 year ago
Coming soon!
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.15...2.10.0
Published by romain-intel about 1 year ago
We now check for processes in the order in which they complete, not in the order in which they are launched. This also increases the likelihood of failing fast.
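The pattern can be sketched with the standard library (a simplified illustration, not Metaflow's actual code):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def wait_all_fail_fast(tasks):
    # Handle results in completion order rather than launch order:
    # as_completed yields each future as soon as it finishes, so the
    # first failure is raised without waiting on earlier submissions.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(task) for task in tasks]
        for future in as_completed(futures):
            future.result()  # re-raises a task's exception immediately
```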
Deadlocks and errors could occur when using the environment escape mechanism in two cases: (a) GC would occur at an inopportune moment or (b) subprocesses were involved. Both issues were fixed.
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.14...2.9.15
Published by saikonen about 1 year ago
This release fixes an issue with merging broken log lines.
LD_LIBRARY_PATH with Conda environments
In a Conda environment, it is sometimes necessary to set LD_LIBRARY_PATH so that the Conda environment's libraries come first, before anything else. Prior to this release, this caused issues with the escape hatch.
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.13...2.9.14
Published by savingoyal about 1 year ago
The recent annotations feature introduced an issue where the project, flow_name, or user annotations were not being populated for Kubernetes. This release reverts the changes.
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.12...2.9.13
Published by saikonen about 1 year ago
The annotations feature introduced in this release has an issue where the project, flow_name, or user annotations are not populated for Kubernetes. This has been reverted in the next release.
This release enables users to add custom annotations to the Kubernetes resources that flows create. The annotations can be configured in much the same way as custom labels:
export METAFLOW_KUBERNETES_ANNOTATIONS="first=A,second=B"
@kubernetes(annotations={"first": "A", "second": "B"})
executable by @romain-intel in https://github.com/Netflix/metaflow/pull/1454
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.11...2.9.12
Published by savingoyal over 1 year ago
This release reverts a validation fix introduced in 2.9.10, which prevented executions of Metaflow tasks on AWS Batch.
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.10...2.9.11
Published by saikonen over 1 year ago
With this release, Metaflow users can get events on PagerDuty when their workflows succeed or fail on Argo Workflows.
Setting up the notifications is similar to the existing Slack notifications support:
python flow.py argo-workflows create --notify-on-error --notify-on-success --notify-pager-duty-integration-key <pager-duty-integration-key>
METAFLOW_ARGO_WORKFLOWS_CREATE_NOTIFY_PAGER_DUTY_INTEGRATION_KEY=<pager-duty-integration-key>
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.9...2.9.10
Published by saikonen over 1 year ago
@conda with some S3 providers
This release fixes a bug in the @conda bootstrapping process. There was an issue with the ServerSideEncryption support that affected some S3 operations when using S3 providers that do not implement the encryption headers (for example, MinIO).
The affected operations were all those that handle multiple files at once:
get_many / get_all / get_recursive / put_many / info_many
which are used as part of bootstrapping a @conda environment when executing remotely.
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.8...2.9.9
Published by saikonen over 1 year ago
This release fixes an issue with mapping values with spaces from the Argo events payload to flow parameters.
@secrets by @oavdeev in https://github.com/Netflix/metaflow/pull/1474
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.7...2.9.8
Published by saikonen over 1 year ago
This release includes new commands for managing workflows on Argo Workflows.
When needed, commands can be authorized by supplying a production token with --authorize.
argo-workflows delete
A deployed workflow can be deleted through the CLI with
python flow.py argo-workflows delete
argo-workflows terminate
A run can be terminated mid-execution through the CLI with
python flow.py argo-workflows terminate RUN_ID
argo-workflows suspend/unsuspend
A run can be suspended temporarily with
python flow.py argo-workflows suspend RUN_ID
Note that the suspended flow will show up as failed on the Metaflow UI after a period, because suspending also suspends the heartbeat process. Unsuspending resumes the flow, and its status will show as running again. This can be done with
python flow.py argo-workflows unsuspend RUN_ID
Previously the status for tasks running on Kubernetes was determined through the pod status, which can take a while to update after the last container finishes. This release changes the status checks to use container statuses directly instead.
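A simplified sketch of the idea (field names follow the Kubernetes pod status schema; this is not Metaflow's actual code):

```python
def task_finished(pod):
    # Look at container statuses directly instead of the pod-level phase,
    # which can lag for a while after the last container exits.
    # `pod` is a plain dict shaped like the Kubernetes pod status API.
    statuses = pod.get("status", {}).get("containerStatuses", [])
    return bool(statuses) and all(
        "terminated" in cs.get("state", {}) for cs in statuses
    )
```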
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.6...2.9.7
Published by saikonen over 1 year ago
This release introduces the command step-functions delete for deleting state machines through the CLI.
python flow.py step-functions delete
Comment out the @project decorator from the flow file, as we do not allow using --name with projects.
python project_flow.py step-functions --name project_a.user.saikonen.ProjectFlow delete
python project_flow.py --production step-functions delete
# or
python project_flow.py --branch custom step-functions delete
Add --authorize PRODUCTION_TOKEN to the command if you do not have the correct production token locally.
This release fixes an issue with the S3 server-side encryption support, where some S3-compatible providers do not respond with the expected encryption method in the payload. This bug specifically affected regular operation when using MinIO.
--with environment in Airflow
Fixes a bug in the Airflow support for environment variables, where the env values set in the environment decorator could get overwritten.
--with environment in Airflow by @valayDave in https://github.com/Netflix/metaflow/pull/1459
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.5...2.9.6
Published by saikonen over 1 year ago
It is now possible to choose which server-side encryption method to use for S3 uploads by setting the environment variable METAFLOW_S3_SERVER_SIDE_ENCRYPTION to an appropriate value, for example aws:kms or AES256.
This release fixes an issue where using parameters on Argo Workflows caused the values to be unnecessarily quoted.
In case you need any assistance or have feedback for us, ping us at chat.metaflow.org or open a GitHub issue.
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.4...2.9.5
Published by saikonen over 1 year ago
Using an email address as the username when deploying with a @project decorator to Argo Workflows is now possible. This release fixes an issue where some generated resources contained characters that are not permitted in the names of Argo Workflows resources.
secrets decorator now supports assuming roles
This release adds the capability to assume specific roles when accessing secrets with the @secrets decorator. The role for accessing a secret can be defined in the following ways:
By setting the METAFLOW_DEFAULT_SECRET_ROLE environment variable: this role will be assumed when accessing any secret specified in the decorator.
By setting the role attribute on the decorator: the following will assume the role secret-iam-role for accessing all of the secrets in the sources list.
@secrets(
sources=["first-secret-source", "second-secret-source"],
role="secret-iam-role"
)
Assuming a different role based on the secret in question can be done as well:
@secrets(
sources=[
{"type": "aws-secrets-manager", "id": "first-secret-source", "role": "first-secret-role"},
{"type": "aws-secrets-manager", "id": "second-secret-source", "role": "second-secret-role"}
]
)
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.3...2.9.4
Published by romain-intel over 1 year ago
Duplicate Metaflow Extensions packages were not properly ignored in all cases. This release fixes this and will allow the loading of extensions even if they are present in duplicate form in your sys.path.
In some cases, packages from the outside environment (non-Conda) could leak into the Conda environment when using the environment escape functionality. This release addresses this issue and ensures that no spurious packages are imported in the Conda environment.
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.2...2.9.3
Published by savingoyal over 1 year ago
With this release, Metaflow users can specify image pull policy for their workloads through the @kubernetes decorator for Metaflow tasks.
@kubernetes(image='foo:tag', image_pull_policy='Always') # Allowed values are Always, IfNotPresent, Never
@step
def train(self):
...
...
If an image pull policy is not specified, and the tag for the container image is :latest or the tag for the container image is not specified, image pull policy is automatically set to Always.
If an image pull policy is not specified, and the tag for the container image is specified as a value that is not :latest, image pull policy is automatically set to IfNotPresent.
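The defaulting rules above can be expressed as a short sketch (illustrative only; it ignores edge cases such as a registry host with a port in the image reference):

```python
def effective_pull_policy(image, image_pull_policy=None):
    # An explicit policy always wins; otherwise :latest or an untagged
    # image defaults to Always, and any other tag to IfNotPresent.
    if image_pull_policy is not None:
        return image_pull_policy
    tag = image.rsplit(":", 1)[1] if ":" in image else None
    if tag is None or tag == "latest":
        return "Always"
    return "IfNotPresent"
```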
Full Changelog: https://github.com/Netflix/metaflow/compare/2.9.1...2.9.2