runhouse

The fastest way to iterate and deploy AI workloads on your own infra. Unobtrusive, debuggable, PyTorch-like APIs.

APACHE-2.0 License

Downloads
35.8K
Stars
707
Committers
16


runhouse - v0.0.25 Latest Release

Published by dongreenberg 6 months ago

Improved parallelism, clearer exceptions, and saving resources within Den orgs

Improvements

  • Improve the thread, reference, and fault tolerance model for EnvServlet ray actors (#735, #733, #736, #734, #737)
  • Catch all non-deserializable exceptions client-side (#730)
  • Support for saving resources on behalf of an org (#676, #732)

Bugfixes

  • Dynamically set API_SERVER_URL (#708)
  • Move OMP_NUM_THREADS setting into servlet to avoid setting it on import (#731)

Full Changelog: https://github.com/run-house/runhouse/compare/v0.0.24...v0.0.25

runhouse - v0.0.24

Published by dongreenberg 6 months ago

Fast-follow bugfixes for CPU parallelism and log streaming

Bug fixes

  • Fix ray persistently setting OMP_NUM_THREADS=1 (#723)
  • Fix method call log streaming by unbuffering stdout/err in call threadpool (#724)

Full Changelog: https://github.com/run-house/runhouse/compare/v0.0.23...v0.0.24

runhouse - v0.0.23

Published by dongreenberg 6 months ago

Richer async support, performance improvements, and bugfixes

Improvements

  • Client-side Async support (#690, #696, #689) - We've improved the way we handle async calls to remote modules. Now, you can properly unblock the event loop and await any remote call by passing run_async as an argument into the method call. If your method is already defined as async, this will be applied automatically without specifying run_async so your code can await the remote method just as it did the original. You can still explicitly set run_async=False in that case to make the local call sync.
  • Improve Mapper ergonomics and docs (#700, #709) - Now you can simply pass a function to the mapper and it will send over the module and create replicas on its own. We'll publish new mapper tutorials shortly.
  • Cache rich signature for Module to improve method call performance (#699)
  • Don't serialize tracebacks in OutputType.EXCEPTION (#721) - Sometimes exceptions can't be deserialized locally because they depend on remote libraries. In those cases, we now still print the traceback for better visibility.
  • Unset OMP_NUM_THREADS when Ray automatically sets it, because it may break user parallelism expectations (#719)
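
The client-side async item above is about awaiting a call that would otherwise block the event loop. The following is a minimal local sketch of that idea using only the standard library. It does not use Runhouse and is not its implementation; slow_remote_call is a hypothetical stand-in for a blocking remote method.

```python
import asyncio
import time


def slow_remote_call(x):
    # Hypothetical stand-in for a blocking remote method call.
    time.sleep(0.1)
    return x * 2


async def call_async(x):
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_remote_call, x)


async def main():
    # Both calls are awaited concurrently instead of one after the other.
    return await asyncio.gather(call_async(1), call_async(2))


results = asyncio.run(main())
print(results)
```

asyncio.gather returns results in the order the awaitables were passed, so the output lines up with the inputs regardless of which call finishes first.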

Bugfixes

  • Fix stdout and logs streaming in various scenarios (#716, #717)
  • Remove unused requests.Session created in HTTPClient (#694)
  • Change Caddy installation to download from Github (#702) (Sorry Caddy!)
  • Inherit Cluster READ access for resources on the cluster (#706)
  • Set the cluster name in the HTTPClient upon rename (#704)
  • Fix some runhouse login bugs (#717)
  • Make errors from Den include status code and be more verbose (#707)
  • Fix SkySSHRunner tunnels and processes to be correctly cleaned up (#718)

Full Changelog: https://github.com/run-house/runhouse/compare/v0.0.22...v0.0.23

runhouse - v0.0.22

Published by carolineechen 7 months ago

Performance improvements + bug fixes

Improvements

  • Add to open_ports when creating new on demand cluster (#651)
  • Updates to Sagemaker Cluster (#654)
  • Change AuthCache logic to request per keypair (#684)

Performance Improvements

  • Cache various module/function computations (#661, #665, #662)
  • Async daemon side components (#656, #664, #673, #674, #670)
  • Use ThreadPoolExecutor to run synchronous function calls on server side (#663)
  • Decrease log wait time (#685)

Bug Fixes

  • Fix bug with json serialization for exceptions (#655)
    • Update returned exceptions to be json serializable.
  • Use shell for running cmd in env servlet (#667)
    • Previously shell commands would not consistently work.
  • Fix cluster autostop (#672, #681, #683)
    • Change to correctly set and update last activity time and do it in a background thread
  • Fix multinode cluster ips (#681)
    • Cluster IPs were previously computed from cached IPs and could incorporate stale ones. Update to use only current IPs.

Examples

  • Add Llama2 on Inferentia with TGI example (#649)
  • Update Inferentia examples to use the DL AMI (#677)

runhouse - v0.0.21

Published by carolineechen 7 months ago

Some performance and feature improvements, bug fixes, and new examples.

Improvements

  • OpenAPI pages for cluster (#579, #586, #587, #589, #590)
  • Properly raise exceptions in Module's load_config when dependency is missing (#595)
  • Kill Ray actors by default during runhouse stop (#596)
  • module.to(rh.here) throws error if local server is not initialized (#597)
  • Send exceptions in data field (#602)
  • Run commands inside env servlet (#603)
  • Return exceptions instead of None in failed mapper replicas (#605)
  • Remove sshtunnel library dependency (#625, #634, #640)
  • Don't save cluster secret during cluster init (#633)
  • Remove creds from cluster's config file (#637)

Performance

  • Use check_server instead of is_up with refresh for ondemand cluster endpoint (#614)
  • Remove register_activity calls within env servlet (#629)

Bug Fixes

  • Install aws dependencies properly for runhouse[aws] (#613)
  • Fix env servlet name in put_resource (#626)
    • Env servlet was using conda env name instead of env resource name.
  • Fix SkySSHRunner local and remote port ordering (#630)

BC-Breaking

  • Remove previously deprecated items (#624)
    • reqs and setup_cmds in rh.function.to() removed. Pass it into the env instead.
    • access_type removed in Resource and share. Use access_level instead.
    • global pinning methods removed. Use rh.here.put/get/delete/keys/clear instead.
  • Deprecate and raise exception for passing system into function/module factories (#625)
    • Passing in system to rh.function/module does not send code to the system and can be misleading. Use .to or get_or_to to sync code to the cluster.

Examples

See rendered examples on https://www.run.house/examples

New Examples

  • Mistral 7B Inference with TGI on AWS EC2 (#585, #604)
  • Mistral 7B Inference on AWS Inferentia (#609)
  • Langchain RAG App on AWS EC2, with Custom Domain (#607, #621)
  • Llama2 on EC2 A10G (#608)
  • Llama2 Inference with TGI on AWS EC2 A10G (#610)

Updates

  • Add READMEs to GitHub (#612, #619)
  • Avoid reinstall for envs and extra imports in examples (#616, #618)

runhouse - v0.0.20

Published by carolineechen 7 months ago

Highlights

Cluster Sharing

We’ve made it easier to share clusters across different environments and with other users. You can now share and load a cluster just as you would any other resource.

import runhouse as rh

my_cluster = rh.cluster("rh-cluster", ips=[...], ...)
# share with other users by email or username (placeholder values)
my_cluster.share(["user@example.com", "username2"])

# load the box with
shared_cluster = rh.cluster("owner_username/rh-cluster")

Shared users will be able to seamlessly run shared apps on that cluster, or SSH directly onto the remote box. To enable this, we persist the SSH credentials for the cluster as a Runhouse Secret object, which can easily be reloaded when another user tries to connect.

Improved rh.Mapper

rh.Mapper was first introduced in runhouse v0.0.15 as an extension of functions/modules to handle mapping, replicating, and load balancing. This release includes further improvements and some bug fixes, plus a BC-breaking argument rename (see section below).

def local_sum(arg1, arg2, arg3):
    return arg1 + arg2 + arg3

remote_fn = rh.function(local_sum).to(my_cluster)
mapper = rh.mapper(remote_fn, replicas=2)
mapper.map([1, 2], [1, 4], [2, 3])
# output: [4, 9]

Improvements

  • Use hashed subtoken for cluster requests (#270)
  • Simplify storage of SSH creds for more reliable cluster access across environments and users (#479)
  • Remove sky storage dependency (#415)
  • Replace subprocess check_call with run (#503)
  • Serialize exceptions properly (#516)
  • Improved Logging
    • Only write out execution logs if stream_logs is set (#490)
    • Propagate logs from pip installs on cluster (#505)
    • Write some logs to sys.out (#519)

Bug Fixes

  • Mapper bug fixes (#539)

Deprecation

  • Renaming config_for_rns property to config function (#553, #554, #555)

BC-Breaking

  • rh.mapper factory function args renaming
    • num_replicas -> replicas
    • replicas -> concurrency

Docs

See updated tutorials on Runhouse docs

  • New quick start guides -- local, cloud, and Den versions
  • Updated API tutorials -- clusters, functions & modules, envs, folders

Examples

See new Runhouse examples on GitHub or webpage

  • Llama2 inference on AWS EC2
  • Stable Diffusion XL 1.0 on AWS EC2
  • Stable Diffusion XL 1.0 on AWS Inferentia

Other

  • Remove paramiko as server connection type

runhouse - v0.0.19

Published by rohinb2 8 months ago

Minor bug fix release

  • Fix import breaking in Python 3.8
  • Fix loading public functions by name

runhouse - v0.0.18

Published by carolineechen 8 months ago

Highlights

Runhouse Local Mode and rh.here

Previously, the Runhouse server was strictly designed to allow you to deploy apps to it remotely with my_module.to(my_cluster). Now, you can start the Runhouse server daemon directly and deploy to it locally, like a traditional web server. Access the local daemon's Cluster object in Python with rh.here. rh.here always refers to the locally running daemon, so you can use it within an existing Runhouse cluster as well.

Start your local Runhouse server:

$ runhouse restart
$ runhouse status

To send a module:

def concat(a, b):
    return a+b

import runhouse as rh
rh.function(concat).to(rh.here)

To try out your service:

curl -X "GET" 'http://localhost:32300/concat/call?a=run&b=house'

>>> {"data":"\"runhouse\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}

This is also particularly useful for debugging. You can ssh onto your cluster, start a Python shell, and run methods like rh.here.call("my_module", "my_method") to test or analyze your deployed module's behavior or contents quickly.
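
In the curl response shown earlier, the return value arrives as a JSON-encoded string inside the JSON envelope, so it has to be decoded twice. The sketch below unpacks that exact response body with the standard library. The interpretation of the error and serialization fields is an assumption based on their names, not on documented behavior.

```python
import json

# Response body copied from the curl example (raw string keeps the backslashes).
raw = r'{"data":"\"runhouse\"","error":null,"traceback":null,"output_type":"result_serialized","serialization":"json"}'

response = json.loads(raw)

# Assumption: a non-null "error" field signals a failed call.
if response["error"] is not None:
    raise RuntimeError(response["error"])

# Assumption: "serialization" names the encoding of the "data" field.
if response["serialization"] == "json":
    result = json.loads(response["data"])
else:
    result = response["data"]

print(result)  # runhouse
```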

Replace nginx with Caddy

Caddy is now used as a reverse proxy for the Runhouse server launched on clusters. It also automatically generates and renews self-signed certificates, making it easy to secure your cluster with HTTPS right out of the box.

Improvements

  • Improved logging to reduce log clutter, and differentiate local and cluster logs (#436, #475)
  • Support packages using setup.cfg (#456)
  • Runhouse status updates (#462, #469)

Build

  • Remove Sky dependency for SSH command runner

Bug Fixes

  • Fix name to properly be updated in cluster when saved (#451, #477)
  • Fix bug in sagemaker cluster factory (#459)
  • Fix Cluster.from_name to properly load existing config in Den (#468)
  • Fix CLI runhouse status for on-demand cluster (#478)

BC-Breaking

  • reqs and setup_cmds removed from function .to (#373)
  • Generator module now returns generator rather than streamed results (#373)

Other

  • Refactor and new methods for obj store (#373)
  • Replace nginx with Caddy (#406)
  • Set unique SSH control path

runhouse -

Published by carolineechen 8 months ago

Patch for v0.0.16

Remove deprecated runhouse.rns.Secrets class, which is no longer being used and was causing an issue in importing runhouse.

runhouse - v0.0.16

Published by carolineechen 8 months ago

This release largely consists of updates to Runhouse infrastructure and build, with the addition of two new runhouse CLI commands (stop and status), basic ASGI support, and some bug fixes.

Note: We’ve removed some dependencies to better support local-only use cases of Runhouse. To use Runhouse with a cluster, please install with pip install "runhouse[sky]", and to use Runhouse data structures like tables, please install with pip install "runhouse[data]".

Improvements

  • Change Module's endpoint from property to function (#367)
  • Change ray to only be initialized in runhouse start or HTTPServer() (#369)
  • Ray start to connect to existing cluster (#405)

New features

  • Introduce Asgi module and support calling route functions directly (#370)
  • Add runhouse stop command/function (#392)
  • Add deleting env servlet functionality (#417)
  • Add runhouse status command (#416)

Build

  • Relax hard dependency on sky (#414)
  • Localize data dependency imports (#418)

Bug Fixes

  • Account for workdir or compute in env factory (#354)
  • rh.here working (#338)
  • Fix to function's .notebook() functionality (#362)
  • Only set rns_address upon save (#434)

BC-Breaking

  • Replace num_entries with limit for resource history API (#399)

runhouse - v0.0.15

Published by carolineechen 9 months ago

Highlights

  • Mapper, a built-in Runhouse module for mapping functions over a list of inputs across compute (#327)
  • Python3.11 support (#279)

rh.Mapper Module

The Mapper expands Runhouse functions and Module methods to handle mapping, replicating, and load balancing. A Mapper object is constructed simply by passing in a function or module and module method, along with the number of replicas to use, and optionally your own user-specified replicas. It takes the function and creates replicas of it and its envs, and round-robin calls the replicas to run function calls in parallel.

def local_sum(arg1, arg2, arg3):
    return arg1 + arg2 + arg3

remote_fn = rh.function(local_sum).to(my_cluster)
mapper = rh.mapper(remote_fn, num_replicas=2)
mapper.map([1, 2], [1, 4], [2, 3])
# output: [4,9]
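
To make the round-robin behavior described above concrete, here is a local sketch that spreads calls across a list of "replicas" and runs them in parallel threads. It uses plain local callables in place of remote replicas and is only an illustration of the scheduling idea, not Runhouse's Mapper implementation; round_robin_map is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def local_sum(arg1, arg2, arg3):
    return arg1 + arg2 + arg3


def round_robin_map(replicas, *arg_lists):
    # Pair each set of positional args with the next replica in rotation.
    assignments = zip(cycle(replicas), zip(*arg_lists))
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica, *args) for replica, args in assignments]
        # Collect in submission order so results line up with the inputs.
        return [future.result() for future in futures]


# Two "replicas" of the same function (plain local callables here).
replicas = [local_sum, local_sum]
print(round_robin_map(replicas, [1, 2], [1, 4], [2, 3]))  # [4, 9]
```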

Improvements

Better multinode cluster support

  • Sync runhouse to all nodes instead of just the head node (#278)
  • Start Ray on both head and worker nodes (#305)
  • Add back support for cluster IPs (#346)
  • Introduce cluster servlet for handling global cluster object store (#308)

Build

  • Python3.11 support (#279)
  • Update AWS dependencies (#290)

Bug Fixes

  • Fix streaming with HTTP/HTTPS/Nginx (#261)

BC-Breaking

  • Replace instance_count with num_instances for cluster class (#269)

Docs

  • Updated quick start and compute tutorials (#310, #347)

runhouse - v0.0.14

Published by carolineechen 10 months ago

Highlights

  • Secrets Revamp (#135)
    • Facilitate saving, sending, and sharing of Secrets by treating Secrets as a Runhouse resource
  • (Alpha) AWS Lambda Functions support (#139, #240, #244)
    • Introduce AWS Lambda support for Runhouse functions

Secrets Revamp

The rh.Secrets class is being deprecated in favor of converting secrets to a Runhouse resource type. As with other resources, the new Secret class supports saving, reloading, and sending secrets across clusters.

There are various builtin secret provider types, for keeping track of compute providers (aws, azure, gcp, ...), API key based providers (openai, anthropic, ...), and SSH key pairs.

# non-provider secret, in-memory
my_secret = rh.secret(name="my_secret", values={"key1": "val1", "key2": "val2"})
my_secret.save()
reloaded_secret = rh.secret("my_secret")

# provider secret, in-memory or loaded from default location
aws_secret = rh.provider_secret("aws")  # loads from ~/.aws/credentials or from env vars
openai_secret = rh.provider_secret("openai", values={"api_key": "my_openai_key"})  # explicitly provided values
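
The comment on the aws provider secret mentions two default sources, a credentials file and environment variables. Below is a rough sketch of what such a lookup could look like with the standard library. The lookup order (environment first), the load_aws_values name, and the returned key names are all assumptions made for illustration; Runhouse's actual behavior may differ.

```python
import configparser
import os
from pathlib import Path


def load_aws_values(credentials_path=None, profile="default", environ=None):
    """Return AWS key values from env vars, else from a credentials file."""
    environ = os.environ if environ is None else environ
    key_id = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        return {"access_key": key_id, "secret_key": secret}

    path = Path(credentials_path or Path.home() / ".aws" / "credentials")
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file simply yields no sections
    if profile not in parser:
        return None
    section = parser[profile]
    return {
        "access_key": section.get("aws_access_key_id"),
        "secret_key": section.get("aws_secret_access_key"),
    }
```

Passing environ explicitly keeps the function easy to exercise without touching the real process environment.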

There are also various APIs for syncing secrets across your clusters and environments:

aws_secret.to(cluster, env)
cluster.sync_secrets(["aws", "gcp"], env)

env = rh.env(secrets=["aws", "openai"])
fn.to(cluster, env)

Please refer to the API tutorial for a more in-depth walkthrough of using Secrets, or the documentation for specific APIs and a full list of builtin providers.

(Alpha) Lambda Functions (AWS serverless)

Runhouse is extending functions to Amazon Web Services (AWS) Lambda Compute. These functions are deployed directly on AWS serverless compute, with Lambda’s infra and servers handled under the hood, making the Lambda onboarding process smoother and removing the need to translate code through Lambda-specific APIs.

Note: Lambda Functions are in Alpha and the APIs are subject to change. A more stable release along with examples will be published soon. In the meantime, you can find documentation here.

New Additions

  • Add visibility to resource config, and enable public resources (#222)
  • API for revoking access to shared secrets (#235)

Bug Fixes

  • Proper tunnel caching (#191, #194): tunnels were not previously being cached correctly, and dead connections not accounted for
  • Sagemaker cluster launch fix (#206): remove runhouse as a dependency from the launch script, as it has not yet been installed on the cluster
  • Fix bug with loading runhouse files/folders through SSH fsspec (#225): custom SSH port was not being set in fsspec filesystem of runhouse files/folders
  • Correctly launch multiple node clusters according to num_instances (#229): previously was not properly launching multiple nodes

Deprecations + BC-Breaking

  • access_type deprecated and renamed to access_level for resource and sharing (#223, #224, #231)
  • rh.Secrets class deprecated in favor of converting Secrets to a resource type (#135). Some old APIs are removed, and others are deprecated. Please refer to docs and tutorial for the new secrets flow.

Other

  • README updates (#187)
  • Various docs updates

runhouse - v0.0.13

Published by DenisYay 11 months ago

Highlights

  • AWS Sagemaker Cluster (#105, #115, #166)
    • facilitates easy access to existing or new AWS SageMaker compute
  • HTTPS support (Alpha) (#114)
    • adds option for starting up the Runhouse API server on the cluster with HTTPS

Sagemaker Cluster

Runhouse is integrating with Amazon Web Services (AWS) SageMaker to allow rapid onboarding onto SageMaker, usually within minutes, and to remove the need to translate code into SageMaker-specific APIs so it can still be used dynamically with other compute infra.

The SageMaker cluster follows the Runhouse cluster definition and usage, but uses Sagemaker compute under the hood.

If you already use SageMaker with your AWS account, you should already be set to use Runhouse SageMaker support. For full SageMaker setup and dependencies, please refer to the docs.

Example 1: Launch a new SageMaker instance and keep it up indefinitely.

# Note: this will use Role ARN associated with the "sagemaker" profile defined in the local AWS config (e.g. `~/.aws/config`).
import runhouse as rh
c = rh.sagemaker_cluster(name='sm-cluster', profile="sagemaker").save()

Example 2: Running a training job with a provided Estimator

import runhouse as rh
from sagemaker.pytorch import PyTorch

c = rh.sagemaker_cluster(name='sagemaker-cluster',
                          estimator=PyTorch(entry_point='train.py',
                                            role='arn:aws:iam::123456789012:role/MySageMakerRole',
                                            source_dir='/Users/myuser/dev/sagemaker',
                                            framework_version='1.8.1',
                                            py_version='py36',
                                            instance_type='ml.p3.2xlarge'),
                          ).save()

Support HTTPS calls to clusters (Alpha)

Adds an option for starting up the Runhouse API server on the cluster with HTTPS, including optionally creating self-signed certs and proxying through Nginx. This makes it incredibly fast and easy to stand up a microservice with standard bearer token authentication (using a Runhouse token), allowing users to share Runhouse resources with collaborators, teams, customers, etc.

Supports several new server connection types, including tls and ssh. For more information on these types, please refer to docs.
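
As a rough illustration of the bearer token authentication mentioned above, the sketch below builds an authenticated HTTPS request with the standard library. The endpoint URL, its path, and the token value are hypothetical placeholders, and nothing is sent over the network.

```python
import urllib.request

# Hypothetical values for illustration only.
ENDPOINT = "https://my-cluster.example.com/my_function/call"
TOKEN = "my-runhouse-token"


def build_request(url, token):
    # Standard bearer authentication: the token travels in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_request(ENDPOINT, TOKEN)
print(request.get_header("Authorization"))

# To send it: urllib.request.urlopen(request). With a self-signed certificate,
# the client also needs an ssl.SSLContext that trusts that certificate.
```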

BC Breaking

  • The default Runhouse HTTP server port is now 32300 (#124)

Other

  • Remove the paramiko dependency for password clusters (#131)
  • Support running shell commands in env (#132)
    Example code:
    rh.env(
        name="my_env",
        reqs=["torch", "diffusers"],
        setup_cmds=["source ~/.bashrc"]
    )
  • Support an optional host parameter for the runhouse start and runhouse restart commands, which now defaults to 0.0.0.0 (#110)
    Example code:
    runhouse restart --host 0.0.0.0

runhouse - v0.0.12

Published by carolineechen about 1 year ago

Highlights

  • In-memory resources, an update to existing remote resource implementations (#78)
    • includes new rh.Module resource, and resulting performance and feature improvements
  • Sagemaker Cluster (Alpha) (#89)
    • facilitates easy access to existing or new SageMaker compute

In-memory Resources

As mentioned in the 0.0.11 Release Notes, we've redesigned how we handle remote resources, resulting in performance and feature improvements, as well as support for a new type of resource. Basic notes can be found below, or a more comprehensive technical overview can be found in our 0.0.12 blog post (coming soon!)

rh.Module Resource

rh.Module represents a class that can be accessed and used remotely, including all its class methods and variables, and with out-of-the-box support for capabilities like streaming logs/results, async, queuing, etc

  • rh.module() factory function for wrapping existing Python classes
  • rh.Module class that can be subclassed to write natively Runhouse-compatible classes

In-Python Object Pinning

Storing large objects, such as models, in Python memory can reduce time spent loading objects from disk or sending them over.

  • more stable object pinning in Python memory
  • intuitive rh.here.get() and rh.here.put() APIs, where rh.here returns the cluster it is called from
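
The benefit of pinning can be seen in a small local sketch: an expensive object is loaded once, kept in process memory, and reused on later calls. The put and get helpers below are hypothetical stand-ins that mirror the names of the rh.here APIs; they are not Runhouse's object store.

```python
import time

_pinned = {}  # process-local store: name -> object


def put(name, obj):
    _pinned[name] = obj


def get(name, default=None):
    return _pinned.get(name, default)


def load_model():
    # Hypothetical expensive load, e.g. reading weights from disk.
    time.sleep(0.05)
    return {"weights": [0.1, 0.2, 0.3]}


def get_model():
    model = get("model")
    if model is None:
        model = load_model()
        put("model", model)  # pin for subsequent calls in this process
    return model


first = get_model()   # pays the load cost
second = get_model()  # served from memory
print(first is second)  # True
```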

Performance Improvements

  • Reduced process overhead and latency, by having each underlying Ray Actor live in its own process rather than launching

Other resulting improvements

  • Streaming support
  • Increased logging support
  • Async support

Sagemaker Cluster (Alpha)

Runhouse is integrating with SageMaker to make the SageMaker onboarding process smoother and to remove the need to translate code through SageMaker-specific estimators or APIs. This will be described in more detail in the 0.0.13 release, or check out the documentation in the meantime.

Build

  • Remove s3fs dependency
  • Upgrade to SkyPilot 0.0.4, to resolve Colab installation issues

BC Breaking

  • .remote() now returns a remote object, rather than a string associated with the object/run. To get the contents of the result, use result.fetch()

runhouse - v0.0.11

Published by carolineechen about 1 year ago

What's New

In-memory Resources (Alpha)

We revamped our underlying code implementation for handling remote code execution, resulting in improvements and added support for:

  • True in-Python pinning
  • Improved performance and decreased process overhead
  • Increased support for streaming and logs
  • Remote classes and class method calls (rh.Module resource)

These new features and updates will be explained in more detail in the following (0.0.12) release

Docs Site

Documentation is now supported and hosted directly on our website, at run.house/docs. Easily access documentation for any of our current and past releases.

Other

  • Environment caching, skip env subpackage installations if existing environment is already detected
  • ssh proxy tunnel support for BYO clusters (#85)
  • troubleshooting and manual setup instructions for commonly encountered issues
  • add runhouse start command

BC-Breaking

  • rename runhouse restart_server command to runhouse restart

runhouse - v0.0.10

Published by carolineechen about 1 year ago

What's New

Support for BYO clusters requiring a password

  • To create a Runhouse cluster instance for a cluster that requires password authentication:
    rh.cluster("cluster-name", host=["hostname or ips"], ssh_creds={'ssh_user': '...', 'password':'*****'})

Funhouse/Tutorials Updates

  • Update funhouse organization structure
  • Deprecate tutorials repo in favor of tutorial walkthroughs in docsite and funhouse for standalone scripts

Sentry integration for Runhouse error reports

runhouse - v0.0.9

Published by carolineechen about 1 year ago

Patch release to upgrade the skypilot version to v0.3.3, which resolves a critical dependency issue with PyYAML following the Cython 3 release. On the Runhouse side, this release fixes a bug in handling git function environment requirements.

runhouse - v0.0.8

Published by dongreenberg over 1 year ago

What's New

Bugfixes

  • FastAPI recently released 0.100.0, which upgrades to Pydantic v2. This introduced breakage in Runhouse and for now we've pinned to FastAPI<=0.99.0.
  • Autostop for OnDemandClusters broke following the release of SkyPilot 0.3.0, as SkyPilot began to use their own Ray server on a separate port. When we started the Runhouse server, we were inadvertently killing the SkyPilot server, which caused the cluster status to show as in the INIT state indefinitely and suspended autostop.
  • The recently introduced Env.working_dir caused the working directory to be synced to the cluster extraneously, which is now fixed.
  • Ray does not work with PyOpenSSL<21.1.0, which was causing pesky breakage in some multiprocessing scenarios. We've pinned pyOpenSSL>=21.1.0.
  • Improve performance by removing several RNS lookups.

runhouse - v0.0.7

Published by carolineechen over 1 year ago

What's New

Dashboard & Login

Env

  • Support passing in env_vars and custom working_dir to Env resource (#75)
  • Better auto torch version handling for requirements.txt files
  • Support "requirements.txt" auto file detection

Docs and Tutorials

  • Updated README, check it out! (#73)
  • New transformers training tutorial
  • New accessibility API tutorial
  • Example code snippets for resource methods (#74)

BC-Breaking

Factory Functions (#67)

  • Remove load parameter, instead will automatically try to load from name argument if that is the only argument provided
  • Default dryrun=False

Deprecations

  • Use ondemand_cluster instead of cluster for On-Demand cloud clusters

runhouse - v0.0.6

Published by carolineechen over 1 year ago

What's New

Replace gRPC server with HTTP

  • gRPC installation is unreliable on Apple silicon. Replace it with HTTP for a more seamless experience, and to allow HTTP calls to Runhouse functions (e.g. from outside Python) (#62)

Torch Version Auto-Detection

  • Support auto-detection and installation of correct torch version based on CUDA version (#41)
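
One way to picture version selection based on CUDA is a mapping from the CUDA version to the matching PyTorch wheel index. The sketch below assumes the CUDA version is already known as a string and follows the download.pytorch.org/whl/cuXYZ naming that PyTorch uses for its wheel indexes. The function names are hypothetical and this is not Runhouse's detection logic.

```python
def torch_index_url(cuda_version):
    """Map a CUDA version string like "11.8" to a PyTorch wheel index URL."""
    if cuda_version is None:
        # No CUDA detected: fall back to CPU-only wheels.
        return "https://download.pytorch.org/whl/cpu"
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def pip_install_command(cuda_version, torch_spec="torch"):
    # Build the install command; running it is left to the caller.
    return f"pip install {torch_spec} --index-url {torch_index_url(cuda_version)}"


print(pip_install_command("11.8"))
```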

Envs and Packages

  • Better handling of local package syncing to remote systems (#43)
  • New Runhouse Env resource (#54)
  • Conda Env support (#57)

Docsite Restructure

  • Add Getting Started and Logging/Debugging sections (#61)
  • Improved tutorials: Add Data+Compute API Tutorials and Render Usage tutorials (#66)

Additional

  • Add --yes/-y option for interactive CLI login (#53)
  • Unit test refactors, with fixtures and pytest mark (#59)
  • Correctly sync local Runhouse version

BC-Breaking

  • Replace reqs and setup_cmds in favor of env (#54)

Package Rankings
Top 6.68% on Proxy.golang.org
Top 11.1% on Pypi.org