studio-go-runner

ML/ENN Runner for privately hosted, cloud, and data-center production deployments of StudioML.

OTHER License

studio-go-runner - 0.14.3-main-aaaagseyvek Latest Release

Published by karlmutch about 3 years ago

studio-go-runner - 0.14.3-main-aaaagsdvzdw

Published by karlmutch about 3 years ago

studio-go-runner - 0.14.3-main-aaaagsbfdfr

Published by karlmutch about 3 years ago

studio-go-runner - 0.14.2

Published by karlmutch over 3 years ago

studio-go-runner - 0.14.1

Published by karlmutch over 3 years ago

IMPROVEMENTS:

  • The queue-status tool has been renamed queue-scaler to reflect its extended functionality
  • cosign support for image verification on Docker Hub and AWS ECR

FIXES:

  • Provisioning of hosts with the queue-scaler tool could cause overly powerful machines to be allocated
  • The Docker Hub release images for this version have been signed. Please review the instructions in the README.md section "A note concerning security and privacy".

studio-go-runner - 0.14.0

Published by karlmutch over 3 years ago

IMPROVEMENTS:

  • Upgrades to the AWS cli, and prometheus common libraries
  • Introduce queue-status, a tool for use with Job dispatching deployments using AutoScaling
  • Ubuntu 18.04 migrated to Ubuntu 20.04
  • TensorFlow 1.x support removed, versions now supported are 2.3-2.5
  • Python support bumped to include 3.9, 3.8.10 is the default
  • gRPC and protobuf upgrades
  • Go 1.16.4 support
  • CUDA 11.2 Migration

FIXES:

  • GPU memory usage could incorrectly result in two cards being allocated, one for memory and one for compute

As a reminder, the Go module feature now in use provides module authentication using checksums against a database of modules hosted by Google. Please review the privacy notice for this feature at https://proxy.golang.org/privacy. A vendor directory is provided so that builds can avoid the integrity checking performed through Go module proxies if you wish to run in an air-gapped configuration.

studio-go-runner - 0.13.2

Published by karlmutch over 3 years ago

IMPROVEMENTS:

  • Storage limitations now used when downloading artifacts, based on the requested disk space from the StudioML client
  • Idle time limits added through the new options -limit-idle-duration and -limit-interval, each taking a duration string such as 10m for 10 minutes
  • Jobs completed limit option added, -limit-tasks
  • Document auto scaling, down to 0, in docs/aws_k8s.md, for the EKS use case.
  • Go 1.16.3 support
  • A100 support: non-MIG mode only on AWS; mixed and single MIG modes on on-premises Kubernetes
  • RabbitMQ Rabbit Hole and many other dependency upgrades

FIXES:

  • Security changes made for file escape when unpacking artifact archives
  • When using multiple GPUs, the CUDA_VISIBLE_DEVICES value was being overwritten as new GPU devices were added

KNOWN BUGS:

  • AWS A100 (p4d.24xlarge) mixed and single MIG support is waiting on AWS fixes

studio-go-runner - 0.13.1

Published by karlmutch over 3 years ago

IMPROVEMENTS:

  • Go 1.16 support
  • Docker file for the stack introduced to improve build times
  • AWS MMQ support for RabbitMQ, specific instructions can be found at docs/aws_k8s.md

FIXES:

  • TestTFXCfgGenerator timeout was too short, causing the test to be flaky and time out
  • Prevent releases overwriting older versions
  • Fix CWE-22 code blocks for symbolic links in tarfiles, https://cwe.mitre.org/data/definitions/22.html
  • CVE impacted package upgrades

studio-go-runner - 0.13.0

Published by karlmutch over 3 years ago

IMPROVEMENTS:

  • Code base pkg components used by multiple projects refactored into a new repository, github.com/leaf-ai/go-service

  • Go 1.15.8 support with modules

  • Remove deprecated Google Cloud storage proprietary API and use S3 mode to interact with the Google Cloud Storage offering

  • S3 credentials are now specified per artifact; environment variables are no longer used unless the --allow-env-secrets option is specified

studio-go-runner - 0.12.1

Published by karlmutch almost 4 years ago

IMPROVEMENTS:

  • CUDA 11.0 migration
  • Go 1.15.6 support with modules
  • AWS support stack refresh, with AWS managed RabbitMQ support

studio-go-runner - 0.12.0

Published by karlmutch almost 4 years ago

IMPROVEMENTS:

  • Model Serving Bridge to the TFX Serving platform, with an application note and a complete Kubernetes deployment example

COMPATIBILITY:

  • Downgrade use of S3 ListObjects to V1 to support Google Cloud Storage

studio-go-runner - 0.11.0

Published by karlmutch about 4 years ago

IMPROVEMENTS:

  • Response queue support with encryption for RabbitMQ installations

FIXES:

  • Testing improved for CI
  • Individual developer workstation testing robustness improved
  • Fix CWE-22 alerts
  • Work around issues introduced in CUDA 10.1 images from NVIDIA, https://github.com/NVIDIA/nvidia-docker/issues/1143

studio-go-runner - 0.10.1

Published by karlmutch over 4 years ago

IMPROVEMENTS:

  • CUDA 10.1 support added and CUDA 8.0 support dropped
  • TensorFlow 1.12 and below no longer supported
  • TensorFlow 2.0 to 2.2 now supported, along with PyTorch 1.0.0 and above
  • Migrated from Ubuntu 16.04 to 18.04

studio-go-runner - 0.10.0

Published by karlmutch over 4 years ago

IMPROVEMENTS:

  • PKI message encryption and ed25519 message signing for messaging between Python StudioML clients and the Go runner
  • Docker Desktop support with multiple concurrent experiments on Mac and PC
  • Go 1.14.4 support
  • CUDA 10.1 support for all platforms except Azure
  • Python 2 support retired
  • Extensive functional and speed improvements to the keel-based build
  • Quay.io is now the only official container image registry, so that vulnerability scanning is the default for any runner-related images
  • CUDA 10 support for GPU Docker images
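
To illustrate what ed25519 message signing provides, the sketch below signs a payload and verifies it using Go's standard crypto/ed25519 package. It is a generic example under stated assumptions, not the runner's wire format or key handling: the signedRoundTrip helper and the JSON payload are hypothetical, and a real deployment would load a provisioned key pair rather than generate one per message.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// signedRoundTrip signs sent with a freshly generated key pair and reports
// whether received verifies against that signature. Generating a key per
// call is for demonstration only.
func signedRoundTrip(sent, received []byte) (bool, error) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		return false, err
	}
	sig := ed25519.Sign(priv, sent)
	return ed25519.Verify(pub, received, sig), nil
}

func main() {
	payload := []byte(`{"experiment":"example"}`)

	// An unmodified message verifies.
	ok, err := signedRoundTrip(payload, payload)
	fmt.Println(ok, err) // prints: true <nil>

	// Any change to the message causes verification to fail.
	tampered := []byte(`{"experiment":"changed"}`)
	ok, err = signedRoundTrip(payload, tampered)
	fmt.Println(ok, err) // prints: false <nil>
}
```

Signing gives the receiver a way to detect tampering and confirm the sender; keeping the message contents confidential is the job of the separate encryption step.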

FIXES:

  • Mount specifications for encryption were missing from the examples folder
  • Titan X cards would be skipped for jobs with smaller resource requirements; jobs may now run on cards with more than 3 times the capacity the job requires
  • pyenv installations were failing on blank slate installs used in on-premises environments
  • Management requests to RabbitMQ were leaking small amounts of memory

studio-go-runner - 0.9.25

Published by karlmutch over 4 years ago

  • Queue deletion support for task cancellation
  • Artifact return extended to 5 minutes for early job failures

studio-go-runner - 0.9.24

Published by karlmutch over 4 years ago

studio-go-runner - 0.9.23

Published by karlmutch over 4 years ago

studio-go-runner - 0.9.22

Published by karlmutch over 4 years ago

studio-go-runner - 0.9.21

Published by karlmutch about 5 years ago

studio-go-runner - 0.9.20

Published by karlmutch about 5 years ago
