kserve

Standardized Serverless ML Inference Platform on Kubernetes

APACHE-2.0 License


kserve - v0.6.0-rc0

Published by yuzisun over 3 years ago

🌈 What's New?

  • Web app for managing InferenceServices #1328
  • web-app: Add manifests for launching and exposing the app #1505
  • web-app: Implement a GitHub action for building the web app #1504
  • [storage-initializer] add support for aws sts #1451
  • MMS: Add healthcheck endpoint for InferenceService agent #1041
  • MMS: Trained Model Validation Webhook + Memory in trained model immutable #1394
  • MMS: multi-model-serving support for custom container in predictorSpec #1427
  • MMS: Added annotation to use anonymous credentials for s3 #1538
  • MMS: Adds condition for Trained Model to check if isvc predictor supports MMS #1522
  • MMS: Introducing HTTP protocol for MMS downloader
  • Improve PMMLServer predict performance #1405

🐛 What's Fixed?

  • Fix duplicated revision when creating the service initially #1467
  • The ingress virtual service is not reconciled when updating annotations/labels of inference service #1524
  • Model server response status code not propagated when using logger #1530
  • MMS service gets 404 during autoscaling #1338
  • MMS: Added mutex for downloader providers. Fixes #1531
  • MMS: Prevents /mnt/models/ from being converted into a file #1549
  • MMS: Watcher should not be started until models downloaded in MMS #1429
  • Resolve knative service diff to prevent dup revision #1484
  • Storage initializer download tar.gz or zip from uri with query params fails #1462
  • Make v1beta1 custom predictors have configurable protocol #1483
  • Fix logger for error response case #1533
  • [xgboostserver] Convert list input to numpy array before creating DMatrix #1513

What's Changed?

  • support knative 0.19+, defaults to knative-local-gateway #1334

Development experience and docs

  • speed-up alibi-explainer image build #1395
  • Update logger samples for newer eventing versions #1526
  • Update pipelines documentation #1498
  • Add github action for python lint #1485
  • Add Spark model inference example with export pmml file #1434
  • Update kubeflow overlay #1424
  • reorg multi-model serving doc #1412
kserve -

Published by yuzisun over 3 years ago

Features

  • Support credentials for HTTP storage URIs (#1372)
  • Trained Model Validation Webhook + Memory in trained model immutable (#1394)
  • Validate the parent inference service is ready in trained model controller (#1402)
  • Validation for storage URI in Trained Model webhook (#1407)

Bug Fixes

  • Use custom local gateway for isvc external service (#1382)
  • Avoid overwriting arguments specified on container fields (#1400)
  • Bug Fix for CloudEvent data access (#1396)
  • Propagate Inferenceservice annotations to top level virtualservice (#1403)
  • Remove unnecessary "latest" routing tag (#1378)
kserve -

Published by yuzisun over 3 years ago

InferenceService V1Beta1

🚢 KFServing 0.5 promotes the core InferenceService from v1alpha2 to v1beta1!

The minimum required versions are Kubernetes 1.16 and Istio 1.3.1/Knative 0.14.3. Conversion webhook is installed to automatically convert v1alpha2 inference service to v1beta1.

🆕 What's new?

  • You can now specify container fields on the ML Framework spec, such as environment variables and liveness/readiness probes.
  • You can now specify pod template fields on the component spec, such as NodeAffinity.
  • Allow specifying timeouts on component spec
  • Tensorflow Serving gRPC support.
  • Triton Inference server V2 inference REST/gRPC protocol support, see examples
  • TorchServe predict integration, see examples
  • SKLearn/XGBoost V2 inference REST/gRPC protocol support with MLServer, see SKLearn and XGBoost examples
  • PMMLServer support, see examples
  • LightGBM support, see examples
  • Simplified canary rollout, traffic split at knative revisions level instead of services level, see examples
  • Transformer to predictor call is now using AsyncIO by default
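To make the V2 inference REST protocol mentioned above concrete, here is a minimal sketch of building a request body; the model name, tensor name, and data values are hypothetical, but the field layout follows the V2 dataplane protocol (`inputs` with `name`, `shape`, `datatype`, `data`):

```python
import json

# V2 inference protocol request body (hypothetical "sklearn-iris" model).
# Each input tensor carries a name, shape, datatype, and flattened data.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],
            "datatype": "FP32",
            "data": [6.8, 2.8, 4.8, 1.4, 6.0, 3.4, 4.5, 1.6],
        }
    ]
}

# This body would be POSTed to the V2 endpoint, e.g.:
#   POST http://<ingress-host>/v2/models/sklearn-iris/infer
body = json.dumps(payload)
```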

⚠️ What's gone?

  • Default/Canary level is removed, canaryTrafficPercent is moved to the component level
  • rollout_canary and promote_canary API is deprecated on KFServing SDK
  • Parallelism field is renamed to containerConcurrency
  • Custom keyword is removed and container field is changed to be an array

⬆️ What actions are needed to upgrade?

  • Make sure all canary traffic is rolled out before upgrading, as the v1alpha2 canary spec is deprecated; please use the v1beta1 spec for the canary rollout feature.
  • Although KFServing automatically converts the InferenceService to v1beta1, we recommend rewriting all your specs with the v1beta1 API, as we plan to drop support for v1alpha2 in later versions.
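As a rough illustration of the v1beta1 shape (with `canaryTrafficPercent` moved to the component level, as described under "What's gone?"), here is a minimal spec expressed as a Python dict mirroring the YAML manifest; the service name and storageUri are placeholders:

```python
# Minimal v1beta1 InferenceService, expressed as a Python dict mirroring the
# YAML manifest. The storageUri below is a placeholder, not a real bucket.
isvc = {
    "apiVersion": "serving.kubeflow.org/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris"},
    "spec": {
        "predictor": {
            # canaryTrafficPercent now lives at the component level;
            # the old top-level default/canary split is gone.
            "canaryTrafficPercent": 10,
            "sklearn": {"storageUri": "gs://example-bucket/sklearn/iris"},
        }
    },
}
```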

Contribution list

  • Make KFServer HTTP requests asynchronous #983 by @salanki
  • Add support for generic HTTP/HTTPS URI for Storage Initializer #979 by @tduffy000
  • InferenceService v1beta1 API #991 by @yuzisun
  • Validation check for InferenceService Name #1079 by @jazzsir
  • Set KFServing default worker to 1 #1106 by @yuzliu
  • Add support for MLServer in the SKLearn predictor #1155 by @adriangonz
  • Add V2 support to XGBoost predictor #1196 by @adriangonz
  • Support PMML server #1141 by @AnyISalIn
  • Generate SDK for KFServing v1beta1 #1150 by @jinchihe
  • Support Kubernetes 1.18 #1128 by @pugangxa
  • Integrate TorchServe to v1beta1 spec #1161 by @jagadeeshi2i
  • Merge batcher to model agent #1287 by @yuzisun
  • Fix torchserve protocol version and update doc #1271 #1277
  • Support CloudEvent(Avro/Protobuf) for KFServer #1343 by @mtickoobb

Multi Model Serving V1Alpha1

🌈 KFServing 0.5 introduces Multi Model Serving with the V1Alpha1 TrainedModel CR. This is currently experimental and we are looking for your feedback!

Check out the sklearn and triton MMS examples.

  • Multi-Model Puller #989 by @ifilonenko
  • Add multi model configmap #992 by @wengyao04
  • Trained model v1alpha1 api #1009 by @yuzliu
  • TrainedModel controller #1013 by @yuzliu
  • Harden model puller logic and add tests #1055 by @yuzisun
  • Puller streamlining/simplification #1057 by @njhill
  • Integrate MMS inferenceservice controller, configmap controller, model agent #1132 by @yuzliu
  • Add load/unload endpoint for SKLearn/XGBoost KFServer #1082 by @wengyao04
  • Sync from model config on agent startup #1204 by @yuzisun
  • Fix model puller flag for MMS #1281 by @yuzisun
  • TrainedModel status url #1319 by @abchoo
  • Add MMS support for SKLearn/XGBoost MLServer #1290 by @adriangonz
  • Support GCS for model agent #1105 by @mszacillo

Explanation

  • Add support for AIX360 explanations #1094 by @drewbutlerbb4
  • Alibi 0.5.5 #1168 by @cliveseldon
  • Adversarial robustness explainer(ART) #1244 by @drewbutlerbb4
  • PyTorch Captum explain integration, see example

Documentation

  • Docs/custom domain #1036 by @adamkgray
  • Update ingress gateway access instruction #1008 by @yuzisun
  • Document working k8s version #1062 by @riklopfer
  • Add triton torchscript example with prediction v2 protocol #1131 by @yuzisun
  • Add torchserve custom server with pv storage example #1182 by @jagadeeshi2i
  • Add torchserve custom server example #1156 by @jagadeeshi2i
  • Add torchserve custom server bert sample #1185 by @jagadeeshi2i
  • Bump up minimal Kube and Istio requirements #1166 by @animeshsingh
  • V1beta1 canary rollout examples #1267 by @yuzisun
  • Prometheus-based metrics and monitoring docs #1276 by @sriumcp

Developer Experience

  • Migrate controller tests to use BDD testing style #936 by @yuzisun
  • Genericized component logic #1018 by @ellistarn
  • Use github action for kfserving controller tests #1056 by @yuzisun
  • Make standalone installation kustomizable #1103 by @jazzsir
  • Move KFServing CI to AWS #1170 by @yuzisun
  • Upgrade k8s and kn go library versions #1144 by @ryandawsonuk
  • Add e2e test for torchserve #1265 by @jagadeeshi2i
  • Add e2e test for SKLearn/XGBoost MMS #1306 by @abchoo
  • Upgrade k8s client library to 1.19 #1305 by @ivan-valkov
  • Upgrade controller-runtime to 0.7.0 #1341 by @pugangxa
kserve -

Published by yuzisun almost 4 years ago

Final RC release for InferenceService V1Beta1

Merge logger/batcher to model agent

  • Merge batcher to model agent #1287
  • Fix model puller flag for MMS #1281
  • Fix torchserve protocol version and update doc #1271 #1277
  • Add e2e test for torchserve #1265
  • V1beta1 canary rollout examples #1267
  • Prometheus-based metrics and monitoring docs #1276
kserve -

Published by yuzisun almost 4 years ago

InferenceService V1Beta1

🚢 TorchServe Integration!

  • Add TorchServe to v1beta1 spec #1161 by @jagadeeshi2i

📝 Documentation

https://github.com/kubeflow/kfserving/tree/master/docs/samples/v1beta1/torchserve

kserve -

Published by yuzisun almost 4 years ago

InferenceService V1Beta1

🚢 KFServing 0.5 promotes the core InferenceService from v1alpha2 to v1beta1!

The minimum required versions are Kubernetes 1.15 and Istio 1.3.1. Conversion webhook is installed to automatically convert v1alpha2 inference service to v1beta1.

🆕 What's new?

  • You can now specify container fields on the ML Framework spec, such as environment variables and liveness/readiness probes.
  • You can now specify pod template fields on the component spec, such as NodeAffinity.
  • Tensorflow Serving gRPC support.
  • Triton Inference server V2 inference REST/gRPC protocol support
  • SKLearn/XGBoost V2 inference REST/gRPC protocol support with MLServer
  • PMMLServer support
  • Allow specifying timeouts on component spec
  • Simplified canary rollout, traffic split at knative revisions level instead of services level
  • Transformer to predictor call is now made async

What's gone?

  • Default/Canary level is removed, canaryTrafficPercent is moved to the component level
  • Parallelism field is renamed to containerConcurrency

What actions are needed to upgrade?

  • Make sure all canary traffic is rolled out before upgrading, as the v1alpha2 canary spec is deprecated; please use the v1beta1 spec for the canary rollout feature.
  • Although KFServing automatically converts the InferenceService to v1beta1, we recommend rewriting all your specs with the v1beta1 API, as we plan to drop support for v1alpha2 in later versions.

Contribution list

  • Make KFServer HTTP requests asynchronous #983 by @salanki
  • Add support for generic HTTP/HTTPS URI for Storage Initializer #979 by @tduffy000
  • InferenceService v1beta1 API #991 by @yuzisun
  • Validation check for InferenceService Name #1079 by @jazzsir
  • Set KFServing default worker to 1 #1106 by @yuzliu
  • Add support for MLServer in the SKLearn predictor #1155 by @adriangonz
  • Add V2 support to XGBoost predictor #1196 by @adriangonz
  • Support PMML server #1141 by @AnyISalIn
  • Generate SDK for KFServing v1beta1 #1150 by @jinchihe
  • Support Kubernetes 1.18 #1128 by @pugangxa

Multi Model Serving V1Alpha1

🌈 KFServing 0.5 introduces Multi Model Serving with the V1Alpha1 TrainedModel CR. This is currently experimental and we are looking for your feedback!

  • Multi-Model Puller #989 by @ifilonenko
  • Add multi model configmap #992 by @wengyao04
  • Trained model v1alpha1 api #1009 by @yuzliu
  • TrainedModel controller #1013 by @yuzliu
  • Harden model puller logic and add tests #1055 by @yuzisun
  • Puller streamlining/simplification #1057 by @njhill
  • Integrate MMS inferenceservice controller, configmap controller, model agent #1132 by @yuzliu
  • Add load/unload endpoint for SKLearn/XGBoost KFServer #1082 by @wengyao04
  • Sync from model config on agent startup #1204 by @yuzisun

Explanation

  • Add support for AIX360 explanations #1094 by @drewbutlerbb4
  • Alibi 0.5.5 #1168 by @cliveseldon

Documentation

  • Docs/custom domain #1036 by @adamkgray
  • Update ingress gateway access instruction #1008 by @yuzisun
  • Document working k8s version #1062 by @riklopfer
  • Add triton torchscript example with prediction v2 protocol #1131 by @yuzisun
  • Add torchserve custom server with pv storage example #1182 by @jagadeeshi2i
  • Add torchserve custom server example #1156 by @jagadeeshi2i
  • Add torchserve custom server bert sample #1185 by @jagadeeshi2i
  • Bump up minimal Kube and Istio requirements #1166 by @animeshsingh

Developer Experience

  • Migrate controller tests to use BDD testing style #936 by @yuzisun
  • Genericized component logic #1018 by @ellistarn
  • Use github action for kfserving controller tests #1056 by @yuzisun
  • Make standalone installation kustomizable #1103 by @jazzsir
  • Move KFServing CI to AWS #1170 by @yuzisun
  • Upgrade k8s and kn go library versions #1144 by @ryandawsonuk
kserve - KFServing 0.4.1 release

Published by animeshsingh almost 4 years ago

KFServing patch release on top of v0.4 to enable deployment on OpenShift. Fixes include:

  • Fixed issues on openshift (#1122)
      • change to use port 9443
      • add rbac for finalizer
  • Add inferenceservice finalizer rbac rules (#1134)
  • Fixes KFServing SDK 0.4 import error while running the custom built image (#1117)
kserve - KFServing 0.4 release

Published by yuzisun about 4 years ago

Action Required

  • KFServing has added an object selector on the pod mutator webhook configuration, which requires at least Kubernetes 1.15 to take effect.
  • The generated KFServing InferenceService openAPI schema validation now includes markers like x-kubernetes-list-map-keys and x-kubernetes-map-type, which require at least Kubernetes 1.16. If you are on Kubernetes 1.15 or lower, please install KFServing with the --validate=false flag.
  • Tensorrt inference server has been renamed to Triton inference server; if you use a tensorrt predictor in your inference service YAML, please rename it to triton.
  • KFServing has removed the default percentage-based queue proxy resource limit due to #844. Please set queue proxy requests/limits in the knative config-deployment.yaml config map (introduced in Knative 0.16), or add the queue proxy resource limit annotation if you are on a lower version and your cluster has resource quota turned on. We highly recommend upgrading the Linux kernel if you are hitting the same CPU throttling issue.
  • The default S3 credential keys have been renamed from awsAccessKeyID and awsSecretAccessKey to the conventional AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; if you have secrets configured the old way, please update them accordingly.
  • KFServing has stopped maintaining model server image versions in the configmap; you can now set the corresponding model server version on the runtimeVersion field if you need a version different from the default.
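The S3 credential rename above amounts to a simple key mapping; a minimal sketch (the secret values are placeholders, and `migrate_s3_secret` is an illustrative helper, not part of KFServing):

```python
# Map the old KFServing S3 secret keys to the new conventional names.
RENAMES = {
    "awsAccessKeyID": "AWS_ACCESS_KEY_ID",
    "awsSecretAccessKey": "AWS_SECRET_ACCESS_KEY",
}

def migrate_s3_secret(data: dict) -> dict:
    """Return a copy of a secret's data with old keys renamed."""
    return {RENAMES.get(key, key): value for key, value in data.items()}

old = {"awsAccessKeyID": "abc", "awsSecretAccessKey": "xyz"}
new = migrate_s3_secret(old)
# new == {"AWS_ACCESS_KEY_ID": "abc", "AWS_SECRET_ACCESS_KEY": "xyz"}
```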

New features

  • Add batcher module as sidecar #847 @zhangrongguo
  • Add Default LivenessProbe to Tensorflow Predictor #925 @salanki
  • Remove framework image version list from configmap  #917 @yuzisun
  • Record Events when InferenceService goes in and out of readiness state #876 @ifilonenko
  • Triton inference server rename and integrations #747 @deadeyegoodwin
  • Alibi explainer upgrade to 0.4.0 #803 @cliveseldon
  • Make default request logger url more flexible #837 @ryandawsonuk 
  • Allow customized url paths on data plane #907 @iamlovingit
  • Add object selector for KFServing pod mutator webhook configuration #893 @yuzisun
  • Update logger to CloudEvents V1 protocol #886 @cliveseldon
  • Set ContainerConcurrency to Parallelism #806 @salanki

Bug Fixes

  • Disable retries in Istio VirtualService  #807 @salanki 
  • Remove default queue proxy resource limit and Add KFServing benchmarking #894 @yuzisun
  • Enhance SDK watch API to avoid traceback  #889 @jinchihe
  • Update KNative annotation when modifying minReplicas to 0 #963 @salanki
  • Allow configurable region name when creating minio client #823 @harshavardhana
  • Return 503 from healthhandler when model is not ready #818 @kolasanichaitanya
  • Updated S3 credential variable names to commonly used env var names #704 @karlschriek
  • Fix duplicated volume issue when attaching GCS secret #766 @kangwoo

Documentations

  • Add BERT example for triton inference server integration #750 @yuzisun
  • Add KFServing Debugging guide #829 @yuzisun
  • Add new KFServing sample for GCP IAP #853 @owennewo 
  • Add KFServing on Kubeflow with Istio-Dex Example #821 #822 @sachua 
  • Add Outlier Detection and Drift Detection Examples #764 @cliveseldon
  • Update pipeline sample to point to mnist e2e one  #926 @animeshsingh 
  • Add custom gRPC sample  #921 @Iamlovingit
  • Add custom inference example using BentoML #800 @yubozhao
  • Update KFServing roadmap for Q3/Q4 #861 @yuzisun

Developer Experience

  • Migrate KFServing to Go Module  #796 @yuzisun
  • Add tabular explainer e2e test #865 @janeman @yuzisun
  • Add logger and improve batcher e2e tests #938 @yuzisun
kserve - v0.3 "Stability"

Published by ellistarn over 4 years ago

Features

  • Pytorch model server with GPU inference #540
  • Support internal mesh routing to inference service, e.g. routing from a Kafka event source #583
  • Add storage URI for transformer #643
  • Add parallelism field to allow setting autoscaling target concurrency and number of tornado workers #637
  • SKLearn model server to support pickled model #560
  • Add extra information for Logger #699
  • Default min replica to 1 instead of 0 #655
  • Upgrade knative API from v1alpha1 to v1 for KFServing #585
  • Upgrade KFServing Kubernetes dependency 1.15 and knative dependency to 1.11 #630
  • Upgrade openapi-gen #600
  • Expose containerPort to let knative listen on logger port, support logger for custom spec #592
  • Self-signed certs generation script #650

Bug Fixes

  • Fix default queue proxy container resource limit which was too low #608
  • Allow configuring max buffer size for tornado server #665
  • Relax data plane "instances" key validation #705
  • Return application/json in response header #615
  • Fix top level virtual service for HTTPS #726

Developer Experience, Tools & Testing, Examples

  • Enable local development for model servers, explainer and storage initializer #591
  • Add wait inference service SDK api #610
  • Adding custom examples #678 #698
  • Add canary rollout examples #691
  • Add e2e tests for canary rollout #658
kserve - v0.2 Prerelease

Published by ellistarn about 5 years ago