Distributed ML Training and Fine-Tuning on Kubernetes
APACHE-2.0 License
New features
Bug fixes
Misc
Published by johnugeorge 12 months ago
Breaking Changes
New features
Bug fixes
Misc
Published by johnugeorge about 1 year ago
Breaking Changes
New features
Bug fixes
Misc
Published by johnugeorge over 1 year ago
Note: Since scheduler-plugins has changed its API from sigs.k8s.io to x-k8s.io, future releases of the training operator (v1.7+) will not support scheduler-plugins v0.24.x or lower. Related: #1773
Note: The latest Python SDK (version 1.6) does not support earlier training operator versions. The minimum required training operator version is v1.6.0. Related: #1702
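The two notes above amount to a pair of version constraints. A minimal sketch of how they could be checked before an upgrade is shown here; the function names `parse_version` and `compatibility_problems` are hypothetical helpers, not part of the training operator or its SDK, and pre-release suffixes such as `-rc.0` are simply ignored, which is an assumption of this sketch.

```python
import re


def parse_version(text):
    """Extract (major, minor, patch) from strings like 'v1.6.0' or '1.6.0-rc.0'.

    Any suffix after the numeric part (for example '-rc.0') is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def compatibility_problems(operator, sdk=None, scheduler_plugins=None):
    """Return a list of known conflicts; an empty list means none were found.

    Only the two constraints stated in the release notes are encoded.
    """
    problems = []
    op = parse_version(operator)

    # Note on #1702: Python SDK 1.6 requires training operator v1.6.0 or newer.
    if sdk is not None and parse_version(sdk)[:2] == (1, 6) and op < (1, 6, 0):
        problems.append(
            f"Python SDK {sdk} requires training operator >= v1.6.0, got {operator}"
        )

    # Note on #1773: operator v1.7+ will not support scheduler-plugins v0.24.x or lower.
    if (
        scheduler_plugins is not None
        and op >= (1, 7, 0)
        and parse_version(scheduler_plugins)[:2] <= (0, 24)
    ):
        problems.append(
            f"training operator {operator} does not support "
            f"scheduler-plugins {scheduler_plugins} (v0.24.x or lower)"
        )

    return problems
```

For example, `compatibility_problems("v1.5.0", sdk="1.6.0")` reports one conflict, while the same call with `"v1.6.0"` as the operator version reports none.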
New Features
Bug fixes
Misc
Closed issues:
Published by johnugeorge over 1 year ago
Note: Since scheduler-plugins has changed its API from sigs.k8s.io to x-k8s.io, future releases of the training operator (v1.7+) will not support scheduler-plugins v0.24.x or lower.
Merged pull requests:
Closed issues:
Published by johnugeorge over 1 year ago
v1.6.0-rc.0 release
Published by johnugeorge about 2 years ago
table-logger dependency #1544 (person142)
Published by johnugeorge over 2 years ago
Closed issues:
TableLogger component in the SDK for better support with ipykernel>=6.x #1446
kubectl get jobs output for CRD. #1301
Merged pull requests:
table-logger dependency #1544 (person142)
Published by johnugeorge over 2 years ago
Merged pull requests:
swagger.json file for all frameworks #1437 (alembiewski)
Closed issues:
Published by johnugeorge over 2 years ago
Features and improvements:
Fixed bugs:
Closed issues:
Merged pull requests:
swagger.json file for all frameworks #1437 (alembiewski)
Published by Jeffwan about 3 years ago
Published by Jeffwan about 3 years ago
Closed issues:
Merged pull requests:
Published by johnugeorge about 3 years ago
Closed issues:
Merged pull requests:
Published by Jeffwan about 3 years ago
Published by gaocegege about 3 years ago
Closed issues:
Merged pull requests:
Published by Jeffwan about 3 years ago
v1.3.0 will be the first release to support TensorFlow, PyTorch, MXNet, and XGBoost distributed training jobs.
More background can be found in the design doc, All-in-one Kubeflow Training Operator.
Install Kubeflow training operator by running:
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.3.0-alpha.3"
Requires kubectl >= 1.21.x.
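Because the install command depends on a recent kubectl, it can help to check the client version first. The following is a rough sketch, assuming the JSON printed by `kubectl version --client -o json` contains a `clientVersion.gitVersion` field; the helper names `kubectl_client_version` and `kubectl_is_supported` are hypothetical and exist only for this illustration.

```python
import json
import re

# Minimum client version stated in the release notes (kubectl >= 1.21.x).
MIN_KUBECTL = (1, 21)


def kubectl_client_version(version_json):
    """Parse (major, minor) from the output of `kubectl version --client -o json`."""
    info = json.loads(version_json)
    git_version = info["clientVersion"]["gitVersion"]  # for example "v1.21.3"
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognized kubectl version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def kubectl_is_supported(version_json, minimum=MIN_KUBECTL):
    """Return True when the parsed client version meets the minimum."""
    return kubectl_client_version(version_json) >= minimum
```

The JSON string could be captured with `subprocess.run(["kubectl", "version", "--client", "-o", "json"], capture_output=True, text=True, check=True).stdout` and passed to `kubectl_is_supported` before running the `kubectl apply -k` command.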
Published by Jeffwan about 3 years ago
v1.3.0 will be the first release to support TensorFlow, PyTorch, MXNet, and XGBoost distributed training jobs.
More background can be found in the design doc, All-in-one Kubeflow Training Operator.
Install Kubeflow training operator by running:
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.3.0-alpha.2"
Requires kubectl >= 1.21.x.
Published by Jeffwan about 3 years ago
v1.3.0 will be the first release to support TensorFlow, PyTorch, MXNet, and XGBoost distributed training jobs.
More background can be found in the design doc, All-in-one Kubeflow Training Operator.
Install Kubeflow training operator by running:
kubectl apply -k "github.com/kubeflow/tf-operator.git/manifests/overlays/standalone?ref=v1.3.0-alpha.2"
Requires kubectl >= 1.21.x.
Published by Jeffwan about 3 years ago
tf_operator_jobs_* counters (#1283, @alembiewski)
Published by Jeffwan over 3 years ago
This is a large official release following v0.5.3. Feedback is welcome. Thanks to all contributors.
tf-operator.v1 -version, GitSHA is always 'not provided' (#1046, @asdfsx)
conditions is empty. (#1185, @Corea)