metaflow

Build and manage real-life ML, AI, and data science projects with ease!

APACHE-2.0 License

Downloads: 903.6K
Stars: 7.5K
Committers: 79


metaflow - 2.5.1 (Feb 15, 2022)

Published by oavdeev over 2 years ago

Metaflow 2.5.1 Release Notes

The Metaflow 2.5.1 release is a minor release.

New Features

  • Introduce Mamba as a dependency solver for @conda in https://github.com/Netflix/metaflow/pull/918 . Mamba promises faster package dependency resolution times, which should result in an appreciable speedup in flow environment initialization. It is not yet enabled by default; to use it you need to set METAFLOW_CONDA_DEPENDENCY_RESOLVER to mamba in Metaflow config.
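As a rough sketch of opting in for a single launch, the variable can be set in the environment of the process that starts the flow. The snippet below only demonstrates handing the variable to a child process; the command shown in the comment assumes a hypothetical flow file named flow.py that uses @conda.

```python
import os
import subprocess
import sys

# Copy the current environment and opt in to Mamba for this launch only.
env = dict(os.environ, METAFLOW_CONDA_DEPENDENCY_RESOLVER="mamba")

# Stand-in child process that echoes the variable it received.
# For a real run, replace `cmd` with something like:
#   [sys.executable, "flow.py", "--environment=conda", "run"]
# where flow.py is a hypothetical flow file (an assumption of this sketch).
cmd = [
    sys.executable,
    "-c",
    "import os; print(os.environ['METAFLOW_CONDA_DEPENDENCY_RESOLVER'])",
]
result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
resolver_seen_by_child = result.stdout.strip()
print(resolver_seen_by_child)
```

Setting the variable per process keeps the default resolver in place for everything else, which may be convenient while Mamba support is still opt-in.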

Improvements

Full Changelog: https://github.com/Netflix/metaflow/compare/2.5.0...2.5.1

metaflow - 2.5.0 (Jan 25, 2022)

Published by oavdeev over 2 years ago

Metaflow 2.5.0 Release Notes

The Metaflow 2.5.0 release is a minor release.

New Features

✨ Metaflow cards are now publicly available! For details, see a new section in the documentation, Visualizing Results, and a release blog post.

Bug Fixes

Full Changelog: https://github.com/Netflix/metaflow/compare/2.4.9...2.5.0

metaflow - 2.4.9 (Jan 18, 2022)

Published by oavdeev almost 3 years ago

Metaflow 2.4.9 Release Notes

The Metaflow 2.4.9 release is a patch release.

Improvements

Bug Fixes

Full Changelog: https://github.com/Netflix/metaflow/compare/2.4.8...2.4.9

metaflow - 2.4.8 (Jan 10, 2022)

Published by oavdeev almost 3 years ago

Metaflow 2.4.8 Release Notes

The Metaflow 2.4.8 release is a patch release.

Bug fixes

Improvements

Full Changelog: https://github.com/Netflix/metaflow/compare/2.4.7...2.4.8

metaflow - 2.4.7 (Dec 16, 2021)

Published by oavdeev almost 3 years ago

Metaflow 2.4.7 Release Notes

The Metaflow 2.4.7 release is a patch release. We skipped 2.4.6 for technical reasons.

Improvements

  • added plumbing for @card decorator
  • added plumbing to support distributed training on GPUs

Full Changelog: https://github.com/Netflix/metaflow/compare/2.4.5...2.4.7

metaflow - 2.4.6 (Dec 16, 2021)

Published by oavdeev almost 3 years ago

Metaflow 2.4.6 Release Notes

The Metaflow 2.4.6 release is a patch release.

Improvements

  • added plumbing for @card decorator
  • added plumbing to support distributed training on GPUs

Full Changelog: https://github.com/Netflix/metaflow/compare/2.4.5...2.4.6

metaflow - 2.4.5 (Dec 8 2021)

Published by oavdeev almost 3 years ago

Metaflow 2.4.5 Release Notes

The Metaflow 2.4.5 release is a patch release.

Bug fixes

Full Changelog: https://github.com/Netflix/metaflow/compare/2.4.4...2.4.5

metaflow - 2.4.4 (Nov 29 2021)

Published by oavdeev almost 3 years ago

Metaflow 2.4.4 Release Notes

The Metaflow 2.4.4 release is a patch release.

  • Improvements
    • Add default image config option as described in #489 (#813)
    • Read default k8s namespace from config (#823)
  • Bug Fixes
    • Fixed a couple of issues in S3 error handling (#821)
    • Fixed an issue with load_artifacts when several artifacts have the same name (#817)
  • Misc internal improvements
    • Pipe logs to $cwd/.logs instead of /logs for @batch & @kubernetes (#807)
    • mflog changes for supporting AWS Lambda (#801)
    • Add 'last modified' to S3 object (#778)

Improvements

Add default image config option as described in #489 (#813)

We're moving to a more consistent scheme for naming options related to Docker images. You can read the details in #489, but this release introduces new config options DEFAULT_CONTAINER_IMAGE and DEFAULT_CONTAINER_REGISTRY that can be used to specify a Docker image, in addition to plugin-specific options like KUBERNETES_CONTAINER_IMAGE.
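To make the relationship between the general and the plugin-specific options concrete, here is a minimal sketch of one plausible lookup order. Both the precedence (plugin-specific first, then the default) and the METAFLOW_ prefix on the variable names are assumptions of this sketch, as are the example image names; see #489 for the actual scheme.

```python
import os


def resolve_container_image(plugin, environ=None):
    """Return the image for `plugin`, preferring its plugin-specific option.

    Assumed precedence (not confirmed by the release notes):
    METAFLOW_<PLUGIN>_CONTAINER_IMAGE, then METAFLOW_DEFAULT_CONTAINER_IMAGE.
    """
    environ = os.environ if environ is None else environ
    specific = environ.get("METAFLOW_%s_CONTAINER_IMAGE" % plugin.upper())
    if specific:
        return specific
    return environ.get("METAFLOW_DEFAULT_CONTAINER_IMAGE")


# Hypothetical settings used only for illustration.
example_env = {
    "METAFLOW_DEFAULT_CONTAINER_IMAGE": "python:3.9",
    "METAFLOW_KUBERNETES_CONTAINER_IMAGE": "my-team/trainer:latest",
}
kubernetes_image = resolve_container_image("kubernetes", example_env)
batch_image = resolve_container_image("batch", example_env)
print(kubernetes_image)
print(batch_image)
```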

Read default k8s namespace from config (#823)

This adds a new configuration option to set the default namespace for the Kubernetes plugin.

metaflow - 2.4.3 (Nov 3rd 2021)

Published by romain-intel almost 3 years ago

Metaflow 2.4.3 Release Notes

The Metaflow 2.4.3 release is a patch release.

Bug Fixes

Fix a race condition when accessing artifacts of a running task (#789)

When accessing artifacts of a running task using Task(...).artifacts, a race condition existed and the call could return a difficult-to-understand error message. This release fixes the issue: the call now returns either the artifacts that are present or no artifacts at all if none are present yet.

Fix an issue when using a combination of @catch and @retry decorators (#776)

A step as below:

@retry(times=2)
@catch(var='exception')
@step
def my_step(self):
    raise ValueError()

would not retry twice as expected; instead, the exception was caught on the first attempt. This release fixes the issue: the step now executes a total of 3 times, and the exception is caught on the third attempt.
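The intended ordering of the two decorators can be illustrated without Metaflow at all. The following is a plain-Python simulation of "retry first, catch only on the final attempt"; the helper name run_with_retry_and_catch is hypothetical and this is not how the decorators are implemented internally.

```python
def run_with_retry_and_catch(step_fn, times):
    """Run step_fn for up to times + 1 attempts.

    Failures on earlier attempts trigger a retry; only a failure on the
    final attempt is caught and returned, mirroring the fixed behavior.
    """
    attempts = 0
    for attempt in range(times + 1):
        attempts += 1
        try:
            step_fn()
            return attempts, None
        except Exception as exc:
            if attempt == times:
                # Retries are exhausted: hand the exception to the caller.
                return attempts, exc


def my_step():
    raise ValueError()


attempts, caught = run_with_retry_and_catch(my_step, times=2)
print(attempts, type(caught).__name__)
```

With times=2 the step body runs three times in total, and the ValueError is only surfaced after the last attempt.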

Upgrade Pandas in tutorials (#707)

On macOS Big Sur, certain tutorials were broken because they used an older version of Pandas. This release updates the tutorials to use Pandas 1.3.3 to solve this issue.

metaflow - 2.4.2 (Oct 25th 2021)

Published by savingoyal almost 3 years ago

Metaflow 2.4.2 Release Notes

The Metaflow 2.4.2 release is a patch release.

Bug Fixes

Fix a bug with accessing legacy logs through metaflow.client (#779)

Metaflow v2.4.1 introduced a bug (due to a typo) in accessing legacy task logs through metaflow.client:

Task("pathspec/to/task").stdout

This release fixes this issue.

Fix a bug with task datastore access when no task attempt has been recorded (#780)

A subtle bug was introduced in Metaflow 2.4.0 where the task datastore access fails when no task attempt was recorded. This release fixes this issue.

metaflow - 2.4.1 (Oct 18th 2021)

Published by savingoyal about 3 years ago

Metaflow 2.4.1 Release Notes

The Metaflow 2.4.1 release is a patch release.

Bug Fixes

Expose non-pythonic dependencies inside the conda environment on AWS Batch (#735)

Prior to this release, non-pythonic dependencies in a conda environment were not automatically visible to a Metaflow task executing on AWS Batch (see #734), although they were available to tasks executed locally. For example, the following flow

import os
from metaflow import FlowSpec, step, conda, conda_base, batch

class TestFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.use_node)

    @batch
    @conda(libraries={"nodejs": ">=16.0.0"})
    @step
    def use_node(self):
        print(os.system("node --version"))
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    TestFlow()

would print an error. This release fixes the issue with the incorrect PATH configuration.

New Features

Introduce size properties for artifacts and logs in metaflow.client (#752)

This release exposes size properties for artifacts and logs (stderr and stdout) in metaflow.client. These properties are relied upon by the Metaflow UI (open-sourcing soon!).

Expose attempt level task properties (#725)

In addition to the above-mentioned properties, users of Metaflow can now access attempt-specific Task metadata using the client:

Task('42/start/452', attempt=1)

Introduce @kubernetes decorator for launching Metaflow tasks on Kubernetes (#644)

This release marks the alpha launch of the @kubernetes decorator, which allows farming out Metaflow tasks to Kubernetes. The functionality works in exactly the same manner as @batch -

from metaflow import FlowSpec, step, resources

class BigSum(FlowSpec):

    @resources(memory=60000, cpu=1)
    @step
    def start(self):
        import numpy
        import time
        big_matrix = numpy.random.ranf((80000, 80000))
        t = time.time()
        self.sum = numpy.sum(big_matrix)
        self.took = time.time() - t
        self.next(self.end)

    @step
    def end(self):
        print("The sum is %f." % self.sum)
        print("Computing it took %dms." % (self.took * 1000))

if __name__ == '__main__':
    BigSum()

python big_sum.py run --with kubernetes

will run all steps of this workflow on your existing EKS cluster (which can be configured with metaflow configure eks) and provides all the goodness of Metaflow!

To get started follow this guide! We would appreciate your early feedback at http://slack.outerbounds.co.

metaflow - 2.4.0 (Oct 4th 2021)

Published by romain-intel about 3 years ago

Metaflow 2.4.0 Release Notes

The Metaflow 2.4.0 release is a minor release and includes a breaking change.

Breaking Changes

Change return type of created_at/finished_at in the client (#692)

Prior to this release, the return type for created_at and finished_at properties in the Client API was a timestamp string. This release changes this to a datetime object, as the old behavior is considered an unintentional mis-feature (see below for details).

How to retain the old behavior

To keep the old behavior, append an explicit string conversion, .strftime('%Y-%m-%dT%H:%M:%SZ'), to the created_at and finished_at calls, e.g.

run.created_at.strftime('%Y-%m-%dT%H:%M:%SZ')
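The practical difference between the two return types can be seen with plain datetime objects. In this sketch, created_at and finished_at are hand-made stand-ins for the values the Client API would return; the concrete timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

LEGACY_FORMAT = '%Y-%m-%dT%H:%M:%SZ'

# Stand-ins for run.created_at and run.finished_at (illustrative values).
created_at = datetime(2021, 10, 4, 9, 0, 0)
finished_at = datetime(2021, 10, 4, 9, 42, 30)

# With datetime objects, arithmetic between timestamps works directly.
duration = finished_at - created_at

# The explicit conversion reproduces the old string form when needed.
legacy_created_at = created_at.strftime(LEGACY_FORMAT)

print(duration)
print(legacy_created_at)
```

Code that previously parsed the string back into a datetime can simply drop that parsing step.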

Background

The first versions of Metaflow (internal to Netflix) returned a datetime object in all calls dealing with timestamps in the Client API to make it easier to perform operations between timestamps. Unintentionally, the return type was changed to string in the initial open-source release. This release introduces a number of internal changes, removing all remaining discrepancies between the legacy version of Metaflow that was used inside Netflix and the open-source version.

The timestamp change is the only change affecting the user-facing API. While Metaflow continues to make a strong promise of backwards compatibility of user-facing features and APIs, the benefits of one-time unification outweigh the cost of this relatively minor breaking change.

Bug Fixes

Better error messages in case of a Conda issue (#706)

Conda errors printed to stderr were not surfaced to the user; this release addresses this issue.

Fix error message in Metadata service (#690)

The code that prints error messages from the metadata service had a bug: it could fail to print the correct message and instead raise another error that obscured the original one. This release fixes the bug, and errors from the metadata service are now printed properly.

New Features

S3 retry counts are now configurable (#700)

This release allows you to set the number of times S3 accesses are retried (the default is 7). The relevant environment variable is METAFLOW_S3_RETRY_COUNT.
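For intuition, a configurable retry count typically feeds a loop like the one below. This is a generic sketch, not Metaflow's S3 client: the helper names are hypothetical, treating the count as "retries after the first attempt" is an assumption, and a real client would usually also wait between attempts.

```python
import os


def s3_retry_count(environ=None, default=7):
    """Read the retry count from the environment, falling back to the default of 7."""
    environ = os.environ if environ is None else environ
    return int(environ.get("METAFLOW_S3_RETRY_COUNT", default))


def call_with_retries(operation, retries):
    """Call operation once, then retry up to `retries` more times on OSError."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return operation()
        except OSError as exc:
            last_error = exc
    raise last_error


# Hypothetical operation that fails twice before succeeding.
calls = {"n": 0}


def flaky_download():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient failure")
    return b"payload"


retries = s3_retry_count({"METAFLOW_S3_RETRY_COUNT": "4"})
data = call_with_retries(flaky_download, retries)
print(retries, calls["n"], data)
```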

New datastore implementation resulting in improved performance (#580)

The datastore implementation was reworked to make it easier to extend in the future. It also now uploads artifacts in parallel to S3 (as opposed to sequentially) which can lead to better performance. The changes also contribute to a notable improvement in the speed of resume which can now start resuming a flow twice as fast as before. Documentation can be found here.

S3 datatools performance improvements (#697)

The S3 datatools now handle small and large files differently, using the download_file command for larger files and get_object for smaller files to minimize the number of calls made to S3.

metaflow - 2.3.6 (Sep 8th, 2021)

Published by savingoyal about 3 years ago

Metaflow 2.3.6 Release Notes

The Metaflow 2.3.6 release is a patch release.

Bug Fixes

Fix recursion error when METAFLOW_DEFAULT_ENVIRONMENT is set to conda

Prior to this release, setting default execution environment to conda through METAFLOW_DEFAULT_ENVIRONMENT would result in a recursion error.

METAFLOW_DEFAULT_ENVIRONMENT=conda python flow.py run
  File "/Users/savin/Code/metaflow/metaflow/cli.py", line 868, in start
    if e.TYPE == environment][0](ctx.obj.flow)
  File "/Users/savin/Code/metaflow/metaflow/plugins/conda/conda_environment.py", line 27, in __init__
    if e.TYPE == DEFAULT_ENVIRONMENT][0](self.flow)
  File "/Users/savin/Code/metaflow/metaflow/plugins/conda/conda_environment.py", line 27, in __init__
    if e.TYPE == DEFAULT_ENVIRONMENT][0](self.flow)
  File "/Users/savin/Code/metaflow/metaflow/plugins/conda/conda_environment.py", line 27, in __init__
    if e.TYPE == DEFAULT_ENVIRONMENT][0](self.flow)
  [Previous line repeated 488 more times]
  File "/Users/savin/Code/metaflow/metaflow/plugins/conda/conda_environment.py", line 24, in __init__
    from ...plugins import ENVIRONMENTS
RecursionError: maximum recursion depth exceeded

This release fixes this bug.

Allow dots in host_volumes attribute for @batch decorator

Dots in volume names, as in @batch(host_volumes='/path/with/.dot'), weren't being sanitized properly, resulting in errors when a Metaflow task launched on AWS Batch. This release fixes this bug.
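As an illustration of the kind of sanitization involved, the sketch below derives a volume name from a host path by replacing every character outside letters, digits, underscores, and hyphens. The function name and the exact replacement rule are assumptions for this example, not necessarily what Metaflow does.

```python
import re


def volume_name_from_path(host_path):
    """Derive a volume name containing only letters, digits, '_' and '-'."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", host_path)


sanitized = volume_name_from_path("/path/with/.dot")
print(sanitized)
```

Both the slashes and the dot are mapped to underscores, so the resulting name no longer contains characters that a volume name might reject.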

metaflow - 2.3.5 (Aug 23rd, 2021)

Published by savingoyal about 3 years ago

Metaflow 2.3.5 Release Notes

The Metaflow 2.3.5 release is a patch release.

Features

Enable mounting host volumes in AWS Batch

With this release, you can now mount and access instance host volumes within a Metaflow task running on AWS Batch. To access a host volume, add the host_volumes argument to your @batch decorator -

@batch(host_volumes=['/home', '/var/log'])

Bug Fixes

Fix input values for Parameters of type list within a Metaflow Foreach task

The following flow had a bug where the value for self.input was set to None rather than the corresponding list element. This release fixes this issue -

from metaflow import FlowSpec, Parameter, step, JSONType

class ForeachFlow(FlowSpec):
    numbers_param = Parameter(
        "numbers_param",
        type=JSONType,
        default='[1,2,3]'
    )
    
    @step
    def start(self):
        # This works, and passes each number to the run_number step:
        #
        # self.numbers = self.numbers_param
        # self.next(self.run_number, foreach='numbers')
        
        # But this doesn't:
        self.next(self.run_number, foreach='numbers_param')
        
    @step
    def run_number(self):
        print(f"number is {self.input}")
        self.next(self.join)
        
    @step
    def join(self, inputs):
        self.next(self.end)
        
    @step
    def end(self):
        pass
    
if __name__ == '__main__':
    ForeachFlow()

metaflow - 2.3.4 (Aug 11th, 2021)

Published by savingoyal about 3 years ago

Metaflow 2.3.4 Release Notes

The Metaflow 2.3.4 release is a patch release.

Bug Fixes

Fix execution of step-functions create when using an IncludeFile parameter

PR #607 in Metaflow 2.3.3 introduced a bug with step-functions create command for IncludeFile parameters. This release rolls back that PR. A subsequent release will reintroduce a modified version of PR #607.

metaflow - 2.3.3 (Jul 29th, 2021)

Published by savingoyal about 3 years ago

Metaflow 2.3.3 Release Notes

The Metaflow 2.3.3 release is a patch release.

Features

Support resource tags for Metaflow's integration with AWS Batch

Metaflow now supports setting resource tags for AWS Batch jobs and propagating them to the underlying ECS tasks. The following tags are attached to the AWS Batch jobs now -

  • metaflow.flow_name
  • metaflow.run_id
  • metaflow.step_name
  • metaflow.user / metaflow.owner
  • metaflow.version
  • metaflow.production_token

To enable this feature, set the environment variable (or alternatively in the metaflow config) METAFLOW_BATCH_EMIT_TAGS to True. Keep in mind that the IAM role (MetaflowUserRole, StepFunctionsRole) submitting the jobs to AWS Batch will need to have the Batch:TagResource permission.

Bug Fixes

Properly handle None as defaults for parameters for AWS Step Functions execution

Prior to this release, a parameter specification like -

Parameter(name="test_param", type=int, default=None)

would result in the following error, even though a default has been specified:

Flow failed:
    The value of parameter test_param is ambiguous. It does not have a default and it is not required.

This release fixes this behavior by allowing the flow to execute as it would locally.
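One common way to tell "no default was given" apart from "the default is None" is a sentinel object. The sketch below is a generic illustration of that idea with hypothetical names (ParamSpec, check_unambiguous); it is not Metaflow's actual validation code.

```python
_NOT_SET = object()  # sentinel meaning "no default was supplied"


class ParamSpec:
    """Hypothetical, simplified stand-in for a parameter definition."""

    def __init__(self, name, default=_NOT_SET, required=False):
        self.name = name
        self.default = default
        self.required = required

    @property
    def has_default(self):
        # default=None counts as a default; only the sentinel means "missing".
        return self.default is not _NOT_SET


def check_unambiguous(param):
    """Reject parameters that are neither required nor given a default."""
    if not param.required and not param.has_default:
        raise ValueError(
            "The value of parameter %s is ambiguous. "
            "It does not have a default and it is not required." % param.name
        )
    return True


print(check_unambiguous(ParamSpec("test_param", default=None)))
```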

Fix return value of IncludeFile artifacts

The IncludeFile parameter would return JSONified metadata about the file rather than the file contents when accessed through the Metaflow Client. This release fixes that behavior by returning instead the file contents, just like any other Metaflow data artifact.

metaflow - 2.3.2 (Jun 29th 2021)

Published by savingoyal over 3 years ago

Metaflow 2.3.2 Release Notes

The Metaflow 2.3.2 release is a minor release.

  • Features
    • step-functions trigger command now supports --run-id-file option

Features

step-functions trigger command now supports --run-id-file option

Similar to run, you can now pass the --run-id-file option to step-functions trigger. Metaflow will then write the triggered run id to the specified file. This is useful if you have additional scripts that require the run id to examine the run or wait until it finishes.
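A follow-up script might pick up the run id along these lines. The sketch simulates the file instead of triggering a real run; the file name run_id.txt and the run id value are hypothetical.

```python
import tempfile
from pathlib import Path


def read_run_id(run_id_file):
    """Return the run id stored in the file, without surrounding whitespace."""
    return Path(run_id_file).read_text().strip()


# Simulate the file that a command such as
#   python flow.py step-functions trigger --run-id-file run_id.txt
# would leave behind (flow.py and the id below are made up for this sketch).
with tempfile.TemporaryDirectory() as tmp:
    run_id_file = Path(tmp) / "run_id.txt"
    run_id_file.write_text("sfn-1234\n")
    run_id = read_run_id(run_id_file)

print(run_id)
```

The run id can then be combined with the flow name to look up the run through the Client API.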

metaflow - 2.3.1 (Jun 23rd 2021)

Published by savingoyal over 3 years ago

Metaflow 2.3.1 Release Notes

The Metaflow 2.3.1 release is a minor release.

Features

Performance optimizations for merge_artifacts

Prior to this release, FlowSpec.merge_artifacts was loading all of the merged artifacts into memory after doing all of the consistency checks with hashes. This release now avoids the memory and compute costs of decompressing, de-pickling, re-pickling, and recompressing each merged artifact - resulting in improved performance of merge_artifacts.
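The idea of merging on hashes alone can be sketched as follows. This is a simplified illustration with hypothetical helper names, assuming each branch exposes a mapping from artifact name to the hash of its serialized value; it is not the actual merge_artifacts implementation.

```python
import hashlib
import pickle


def stored_hash(value):
    """Hash of the serialized artifact, standing in for a datastore key."""
    return hashlib.sha1(pickle.dumps(value)).hexdigest()


def merge_by_hash(branches):
    """Merge {name: hash} maps from several branches without loading any values.

    Identical hashes merge silently; differing hashes for the same name
    are reported as a conflict.
    """
    merged = {}
    for branch in branches:
        for name, digest in branch.items():
            if merged.setdefault(name, digest) != digest:
                raise ValueError(
                    "Artifact %r has conflicting values across branches" % name
                )
    return merged


branch_a = {"model": stored_hash([1, 2, 3]), "alpha": stored_hash(0.5)}
branch_b = {"model": stored_hash([1, 2, 3]), "beta": stored_hash("x")}
merged = merge_by_hash([branch_a, branch_b])
print(sorted(merged))
```

Because only the hashes are compared and copied, no artifact has to be decompressed or unpickled during the merge.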

metaflow - 2.3.0 (May 27th, 2021)

Published by savingoyal over 3 years ago

Metaflow 2.3.0 Release Notes

The Metaflow 2.3.0 release is a minor release.

Features

Coordinate larger Metaflow projects with @project

It's not uncommon for multiple people to work on the same workflow simultaneously. Metaflow makes it possible by keeping executions isolated through independently stored artifacts and namespaces. However, by default, all AWS Step Functions deployments are bound to the name of the workflow. If multiple people call step-functions create independently, each deployment will overwrite the previous one.
In the early stages of a project, this simple model is convenient, but as the project grows, it is desirable that multiple people can test their own AWS Step Functions deployments without interference. Or, as a single developer, you may want to experiment with multiple independent AWS Step Functions deployments of your workflow.
This release introduces a @project decorator to address this need. The @project decorator is used at the FlowSpec-level to bind a Flow to a specific project. All flows with the same project name belong to the same project.

from metaflow import FlowSpec, step, project, current

@project(name='example_project')
class ProjectFlow(FlowSpec):

    @step
    def start(self):
        print('project name:', current.project_name)
        print('project branch:', current.branch_name)
        print('is this a production run?', current.is_production)
        self.next(self.end)

    @step
    def end(self):
        pass

if __name__ == '__main__':
    ProjectFlow()

python flow.py run

The flow works exactly as before when executed outside AWS Step Functions and introduces project_name, branch_name & is_production in the current object.

On AWS Step Functions, however, step-functions create will create a new workflow example_project.user.username.ProjectFlow (where username is your user name) with a user-specific isolated namespace and a separate production token.

For deploying experimental (test) versions that can run in parallel with production, you can deploy custom branches with --branch

python flow.py --branch foo step-functions create

To deploy a production version, you can deploy with --production flag (or pair it up with --branch if you want to run multiple variants in production)

python project_flow.py --production step-functions create

Note that the isolated namespaces offered by @project work best when your code is designed to respect these boundaries. For instance, when writing results to a table, you can use current.branch_name to choose the table to write to or you can disable writes outside production by checking current.is_production.
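A minimal sketch of both patterns follows. The helper names and the table naming scheme are hypothetical, and the branch names passed in ("test.foo", "prod") are example values standing in for current.branch_name; inside a step you would pass current.branch_name and current.is_production instead.

```python
def target_table(base_name, branch_name):
    """Pick a branch-specific table name from a value like current.branch_name."""
    return "%s_%s" % (base_name, branch_name.replace(".", "_"))


def write_if_production(rows, base_name, branch_name, is_production, write_fn):
    """Write rows only for production deployments; otherwise do nothing."""
    if not is_production:
        return None
    table = target_table(base_name, branch_name)
    write_fn(table, rows)
    return table


# In-memory stand-in for a real table writer.
written = {}


def fake_write(table, rows):
    written.setdefault(table, []).extend(rows)


skipped = write_if_production([1, 2], "results", "test.foo", False, fake_write)
prod_table = write_if_production([1, 2], "results", "prod", True, fake_write)
print(skipped, prod_table, written)
```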

Hyphenated-parameters support in AWS Step Functions

Prior to this release, hyphenated parameters in AWS Step Functions weren't supported through CLI.

from metaflow import FlowSpec, Parameter, step

class ParameterFlow(FlowSpec):
    foo_bar = Parameter('foo-bar',
                      help='Learning rate',
                      default=0.01)

    @step
    def start(self):
        print('foo_bar is %f' % self.foo_bar)
        self.next(self.end)

    @step
    def end(self):
        print('foo_bar is still %f' % self.foo_bar)

if __name__ == '__main__':
    ParameterFlow()

Now, users can create their flows as usual on AWS Step Functions (with step-functions create) and trigger the deployed flows through the CLI with hyphenated parameters -

python flow.py step-functions trigger --foo-bar 42

State Machine execution history logging for AWS Step Functions

Metaflow now logs State Machine execution history in AWS CloudWatch Logs for deployed Metaflow flows. You can enable it by specifying --log-execution-history flag while creating the state machine

python flow.py step-functions create --log-execution-history

Note that you need to set the environment variable METAFLOW_SFN_EXECUTION_LOG_GROUP_ARN (or the equivalent entry in your Metaflow config) to your AWS CloudWatch Logs Log Group ARN to pipe the execution history logs to AWS CloudWatch Logs.

metaflow - 2.2.13 (May 19th, 2021)

Published by savingoyal over 3 years ago

Metaflow 2.2.13 Release Notes

The Metaflow 2.2.13 release is a minor patch release.

Bug Fixes

Handle regression with @batch execution on certain docker images

Certain Docker images override the entrypoint by executing eval on the user-supplied command. The 2.2.10 release, which modified the entrypoint to support datastore-based logging, affected these images. This release fixes that regression.