ScaleLLM

A high-performance inference system for large language models, designed for production environments.

APACHE-2.0 License

Downloads
611
Stars
289
Committers
6

Bot releases are visible (Hide)

ScaleLLM - v0.1.3 Latest Release

Published by github-actions[bot] 5 months ago

Major changes

  • Model arg hotfix for llama3
  • Added more help functions

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.2...v0.1.3

ScaleLLM - v0.1.2

Published by github-actions[bot] 5 months ago

Major changes

  • set up github pages for docs https://docs.vectorch.com/
  • set up whl repository to host published whls: https://whl.vectorch.com/
  • support pip install with different versions: for example: pip install scalellm -i https://whl.vectorch.com/cu121/torch2.3/
  • added latency and system metrics
  • added initial monitoring dashboard.
  • bug fix for decoder, rejection sampler, and default value for llama2

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.1...v0.1.2

ScaleLLM - v0.1.1

Published by github-actions[bot] 5 months ago

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.0...v0.1.1

ScaleLLM - v0.1.0

Published by github-actions[bot] 5 months ago

Major changes:

  • Added python wrapper and published scalellm package to PyPI.
  • Supported openai-compatible rest api server. 'python3 -m scalellm.serve.api_server'
  • Install scalellm with pip: 'pip install scalellm'
  • Added examples for offline inference and async stream.

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.9...v0.1.0

ScaleLLM - v0.0.9

Published by guocuimi 6 months ago

Major Changes

  • Enabled speculative decoding and updated README

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.8...v0.0.9

ScaleLLM - v0.0.8

Published by guocuimi 6 months ago

Major changes

  • Added Meta Llama3 and Google Gemma support
  • Added cuda graph support for decoding

What's Changed

New Contributors

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.7...v0.0.8

ScaleLLM - v0.0.7

Published by guocuimi 7 months ago

Major changes

  • Dynamic prefix cache
  • Dynamic split-fuse scheduler
  • Speculative decoding

What's Changed

New Contributors

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.6...v0.0.7

ScaleLLM - v0.0.6

Published by guocuimi 7 months ago

Major changes:

  • Introduced new kernels aimed at enhancing efficiency.
  • Implemented an initial Python wrapper, simplifying integration and extending accessibility.
  • Incorporated new models such as Baichuan2 and ChatGLM.
  • Added support for Jinja chat templates, enhancing customization and user interaction.
  • Added usage statistics into responses, ensuring compatibility with OpenAI APIs.
  • Enabled ccache to accelerate build speed, facilitating quicker development cycles.

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.5...v0.0.6

ScaleLLM - v0.0.5

Published by guocuimi 10 months ago

Major changes

  • Added Qwen, ChatGLM and Phi2 support.
  • Added tiktoken tokenizer support.
  • Enabled more custom kernels for sampling.

What's Changed

New Contributors

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.4...v0.0.5

ScaleLLM - v0.0.4

Published by guocuimi 11 months ago

Major change:

  • Added docker image build for cuda 11.8.
  • Added exception handling logic in http server.

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.3-fix...v0.0.4

ScaleLLM - v0.0.3

Published by guocuimi 11 months ago

  • Added support for Yi Chat Model.
  • Added args overrider support.
  • Replaced libevhtp with boost asio for http server to fix epoll_wait not implemented error on old linux kernels.

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.2...v0.0.3-fix

ScaleLLM - v0.0.2

Published by guocuimi 12 months ago

Major changes

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.1...v0.0.2

ScaleLLM - first release

Published by guocuimi 12 months ago

First official release, check README.md for details.

Package Rankings
Top 36.35% on Pypi.org
Badges
Extracted from project README
Docs PyPI Twitter Discord build License
Related Projects