genv

GPU environment and cluster management with LLM support

AGPL-3.0 License

Downloads: 789
Stars: 412
Committers: 4

genv - v1.4.2 (Latest Release)

Published by razrotenberg 5 months ago

Fixed

  • genv remote llm supports multiple Linux users

genv - v1.4.1

Published by razrotenberg 6 months ago

Fixed

  • genv llm supports multiple Linux users

genv - v1.4.0

Published by razrotenberg 8 months ago

Added

  • Introduced LLM support in Genv
  • Added command genv llm
  • Added command genv remote llm
  • Added command genv version

Fixed

  • Passing environment variables over SSH commands

genv - v1.2.0

Published by razrotenberg over 1 year ago

Added

  • Ray integration and genv.ray subpackage

genv - v1.1.0

Published by razrotenberg over 1 year ago

Added

  • Added flag --max-devices-for-user to genv enforce and genv remote enforce commands
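
A per-user device cap of this kind can be pictured as a simple counting check. The following is a minimal sketch only, not Genv's implementation: the function name users_over_limit and the (user, device index) input shape are hypothetical, and what the real commands do with a user who exceeds the cap is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def users_over_limit(
    attachments: Iterable[Tuple[str, int]], max_devices: int
) -> Dict[str, List[int]]:
    """Return the users holding more than max_devices distinct devices.

    attachments is a hypothetical list of (username, device index) pairs;
    it is an assumption made for this sketch, not a Genv data structure.
    """
    devices_by_user: Dict[str, Set[int]] = defaultdict(set)
    for user, device_index in attachments:
        # A set keeps each device once even if several environments use it.
        devices_by_user[user].add(device_index)
    return {
        user: sorted(devices)
        for user, devices in devices_by_user.items()
        if len(devices) > max_devices
    }


attachments = [("alice", 0), ("alice", 1), ("bob", 0), ("alice", 2)]
print(users_over_limit(attachments, max_devices=2))
```

With a cap of two devices, only alice is reported, because she holds three distinct devices while bob holds one.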

genv - v1.0.0

Published by razrotenberg over 1 year ago

Changed

  • Renamed Python CLI from genvctl to genv
  • Installation is now 100% Python-based
  • Moved Python source directory to the project root

Removed

  • Removed shell code base libexec/ entirely

genv - v0.12.0

Published by razrotenberg over 1 year ago

Added

  • Added genv.sdk.activate
  • Added genv.sdk.attach
  • Added genv.sdk.attached
  • Added genv.sdk.configure
  • Added genv.sdk.configuration

Changed

  • Merged executables into genvctl

genv - v0.11.0

Published by razrotenberg over 1 year ago

Added

  • Introducing the Genv CLI genvctl
  • Added CLI subcommand genvctl lock for locking over-subscribed devices as access control
  • Supporting over-subscription in container toolkit with genv-docker flag --over-subscribe
  • Added genv.sdk.env SDK package for the active environment
  • Supporting locking multiple devices in genv.core.devices.lock

Changed

  • Renamed genv.sdk.lock_devices to genv.sdk.lock

genv - v0.10.1

Published by razrotenberg over 1 year ago

Fixed

  • Fixed genv lock

genv - v0.10.0

Published by razrotenberg over 1 year ago

Added

  • Using global lock around critical sections instead of locking per module
  • Enriched core entities with more fields
  • Python package does not run executables in subprocesses anymore
  • Supporting cleaning up entities in place with .cleanup()
  • Bug fix in serialization of Report objects

Changed

  • Major restructure in project directory
  • Renamed Snapshot entities
  • Renamed genv.env to genv.sdk

Removed

  • Removed methods from genv.envs and genv.devices

genv - v0.9.0

Published by razrotenberg over 1 year ago

Added

  • Over-subscribe devices with new flag -o --over-subscribe to genv attach
  • New Python SDK at genv.env
  • Control access to over-subscribed devices with genv lock and genv.env.lock_devices()
  • Added fields and actions to genv.devices.Device and genv.devices.Snapshot

Changed

  • Renamed previous command genv-devices query to genv-devices find
  • Using genv.devices.Snapshot instead of plain JSON in genv-devices and devices.json (backwards compatible)

Removed

  • Removed environment variable GENV_ALLOW_DEVICE_OVER_ALLOCATION

genv - v0.8.0

Published by razrotenberg over 1 year ago

Added

  • Introduced Genv container toolkit: genv-docker and the Genv container runtime
  • Flags --count and --index are optional in genv-devices attach; if none passed, genv-devices attach uses the configured device count if set
  • Added genv.devices.attach()
  • Added genv.envs.gpus()
  • Added genv.envs.activate()
  • Added genv.envs.configure()

Changed

  • nvidia-smi shim prints warning message when missing information about processes
  • nvidia-smi shim supports the case when environment variable CUDA_VISIBLE_DEVICES is not set
  • nvidia-smi shim fails if no other nvidia-smi executable found
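
For the CUDA_VISIBLE_DEVICES case above, the distinction that matters is between a variable that is unset and one that is set but empty. The sketch below is illustrative only and is not the shim's code: the helper name visible_devices is hypothetical. It relies on the documented CUDA behaviour that an unset variable places no restriction on visible devices, while an empty value hides all of them.

```python
import os
from typing import List, Mapping, Optional


def visible_devices(environ: Mapping[str, str] = os.environ) -> Optional[List[str]]:
    """Return the listed device identifiers, or None when there is no restriction.

    Hypothetical helper for illustration; not part of Genv.
    """
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        # Unset variable: every device is visible, so nothing to filter by.
        return None
    # Set variable: a comma-separated list; an empty string yields no devices.
    return [item.strip() for item in value.split(",") if item.strip()]


print(visible_devices({}))
print(visible_devices({"CUDA_VISIBLE_DEVICES": "0,2"}))
```

Returning None rather than an empty list keeps "no restriction" distinguishable from "no devices visible".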

genv - v0.7.0

Published by razrotenberg over 1 year ago

Added

  • Added monitoring features with Prometheus and Grafana using genv monitor and genv remote monitor

genv - v0.6.0

Published by razrotenberg over 1 year ago

Added

  • Added enforcement rule env-devices
  • Added enforcement rule env-memory
  • Added command genv remote query
  • Added flag -t --timeout to genv remote to set SSH connection timeout
  • Added flag -e --exit-on-error to genv remote to exit on SSH connection issues
  • Added flag -q --quiet to genv remote to ignore SSH connection issues
  • Added flag --no-prompt to genv remote activate to not change shell prompt
  • Ignoring commented lines in hostfile used in genv remote
  • Added query uid to genv-envs query
  • Set up Google Analytics for documentation site
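
Ignoring commented lines in a hostfile can be sketched as follows. This is a minimal illustration, not Genv's parser: the function name read_hosts is hypothetical, and the sketch assumes that a comment is a line whose first non-blank character is `#` and that each remaining line holds one hostname.

```python
from typing import Iterable, List


def read_hosts(lines: Iterable[str]) -> List[str]:
    """Return hostnames from hostfile lines, skipping blanks and comments."""
    hosts = []
    for line in lines:
        stripped = line.strip()
        # Assumption: a commented line starts with '#'.
        if not stripped or stripped.startswith("#"):
            continue
        hosts.append(stripped)
    return hosts


example = [
    "gpu-server-1",
    "# gpu-server-2 is down for maintenance",
    "",
    "gpu-server-3",
]
print(read_hosts(example))
```

Commenting out a host this way lets it be excluded temporarily without deleting it from the file.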

Changed

  • genv remote does not exit on SSH connection issues by default
  • Refactored entities and snapshots in genv Python package
  • Refactored remote capabilities in genv.remote Python subpackage
  • Combined enforcement rules code under genv.enforce.rules
  • Improved the development setup, in particular the nvidia-smi development shim and the CPU-only setup for remote features

genv - v0.5.0

Published by razrotenberg over 1 year ago

Added

  • Introduced local and remote enforcement features with two enforcement rules: non-environment processes and max devices per user
  • Added genv-usage executable for taking snapshots and executing enforcement reports
  • Supporting querying environment usernames
  • Added flag --quiet to genv-devices detach
  • Created devel directory and nvidia-smi mock shim
  • Added environment variable GENV_TERMINATE_PROCESSES to allow not terminating processes
  • Added environment variable GENV_MOCK_NVIDIA_SMI_PIDS to set process identifiers in the nvidia-smi mock shim

Changed

  • Created Python package genv
  • Major refactor to Python code by adding logic layers (e.g. genv.envs) and entities (e.g. genv.envs.Env)
  • Supporting encoding and decoding entities as JSON
  • Supporting sending standard input to SSH commands
  • Supporting running SSH commands with sudo

genv - v0.4.0

Published by razrotenberg almost 2 years ago

Added

  • Listing active environments on remote hosts with genv remote envs
  • Showing device information on remote hosts with genv remote devices
  • Activating an environment on a remote host with genv remote activate

Changed

  • Formatting Python code with black
  • Linting Python code with flake8

genv - v0.3.0

Published by razrotenberg almost 2 years ago

Added

  • Configuring environment GPU memory capacity
  • GPU memory aware device provisioning
  • Supporting device over-allocation with multiple environments
  • Documentation site
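
Memory-aware provisioning can be pictured as choosing a device whose unreserved memory still fits the request, which is also what lets several environments share one device. The sketch below is an assumption-laden illustration rather than Genv's algorithm: find_device, the dictionary inputs, and the first-fit policy are all hypothetical.

```python
from typing import Dict, List, Optional


def find_device(
    total_mib: Dict[int, int],
    reserved_mib: Dict[int, List[int]],
    requested_mib: int,
) -> Optional[int]:
    """Return the lowest device index with enough unreserved memory, else None.

    total_mib maps device index to total memory; reserved_mib maps device
    index to the memory already reserved by each environment on it.
    Both shapes are assumptions made for this sketch.
    """
    for index in sorted(total_mib):
        used = sum(reserved_mib.get(index, []))
        if total_mib[index] - used >= requested_mib:
            return index
    return None


total = {0: 16000, 1: 16000}
reserved = {0: [12000]}
print(find_device(total, reserved, requested_mib=8000))
print(find_device(total, reserved, requested_mib=4000))
```

The 8000 MiB request skips device 0, which has only 4000 MiB unreserved, while the 4000 MiB request shares device 0 with the environment already on it.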

Changed

  • docker shim injects environment variable GENV_ENVIRONMENT_ID to containers
  • Renamed environment variable GENV_DEVICES to GENV_MOCK_DEVICE_COUNT

Fixed

  • nvidia-smi shim supports environment variables with =
  • docker shim supports the case when argument --gpus is not passed
  • nvidia-smi shim does not pass argument --id in bypass mode
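
Values that contain `=` are a common pitfall when parsing NAME=VALUE assignments. The sketch below shows one way to handle them and is not the shim's actual code: the function name parse_env_assignment is hypothetical. It splits on the first `=` only, so the rest of the string stays intact as the value.

```python
from typing import Tuple


def parse_env_assignment(assignment: str) -> Tuple[str, str]:
    """Split 'NAME=VALUE' on the first '=' so the value may contain '='."""
    # str.partition splits once, at the first occurrence of the separator.
    name, _, value = assignment.partition("=")
    return name, value


print(parse_env_assignment("OPTS=--flag=1"))
print(parse_env_assignment("HOME=/root"))
```

A plain split on every `=` would break the first example into three pieces and lose part of the value.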

genv - v0.2.0

Published by razrotenberg almost 2 years ago

Features

  • nvidia-smi shim shows only information relevant to the environment (i.e. device memory, processes)
  • Added docker shim to expose containers to devices attached to the environment

genv - v0.1.0

Published by razrotenberg about 2 years ago