ScaleLLM

A high-performance inference system for large language models, designed for production environments.

Apache-2.0 License

Downloads: 611
Stars: 289
Committers: 6

ScaleLLM - v0.0.9

Published by guocuimi 6 months ago

Major changes

  • Enabled speculative decoding and updated README

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.8...v0.0.9

ScaleLLM - v0.0.8

Published by guocuimi 6 months ago

Major changes

  • Added Meta Llama3 and Google Gemma support
  • Added CUDA graph support for decoding

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.7...v0.0.8

ScaleLLM - v0.0.7

Published by guocuimi 7 months ago

Major changes

  • Dynamic prefix cache
  • Dynamic split-fuse scheduler
  • Speculative decoding
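
To illustrate the idea behind speculative decoding, here is a minimal toy sketch in Python. It is not ScaleLLM's implementation: the function names (`speculative_decode`, `greedy_decode`, `toy_target`, `toy_draft`) and the toy "models" are hypothetical, and only greedy decoding is assumed. A cheap draft model proposes a few tokens, the target model checks them, and the longest agreeing prefix is kept, so the output matches what the target model alone would produce.

```python
from typing import Callable, List

# Assumption: a "model" is any function mapping a token sequence to the next
# token under greedy decoding. Real systems work on logits and batches.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain autoregressive greedy decoding, used as the reference."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes up to k tokens, target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks each proposed position. A real engine
        #    scores all positions in one forward pass; this loop is for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and stop.
                accepted.append(expected)
                break
        else:
            # Every proposal matched: the target yields one extra token if room remains.
            if len(tokens) + len(accepted) < goal:
                accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens


def toy_target(tokens: List[int]) -> int:
    """Hypothetical 'large' model: a fixed arithmetic rule on the last token."""
    return (tokens[-1] * 3 + 1) % 11


def toy_draft(tokens: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except after multiples of 4."""
    guess = (tokens[-1] * 3 + 1) % 11
    return guess if tokens[-1] % 4 else (guess + 1) % 11


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [2], max_new_tokens=12, k=4))
```

The speedup in a real system comes from step 2: verifying k drafted tokens costs roughly one target-model pass instead of k.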

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.6...v0.0.7

ScaleLLM - v0.0.6

Published by guocuimi 7 months ago

Major changes

  • Introduced new kernels to improve efficiency.
  • Implemented an initial Python wrapper to simplify integration.
  • Added new models, including Baichuan2 and ChatGLM.
  • Added support for Jinja chat templates, allowing custom prompt formats.
  • Added usage statistics to responses for compatibility with the OpenAI API.
  • Enabled ccache to speed up builds.
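
As a rough illustration of the usage statistics mentioned above, the sketch below builds the `usage` object in the shape used by OpenAI API responses (`prompt_tokens`, `completion_tokens`, `total_tokens`). The helper names `build_usage` and `build_completion_response` are hypothetical and not taken from ScaleLLM; the remaining response fields are simplified.

```python
from typing import Any, Dict, List


def build_usage(prompt_token_ids: List[int], completion_token_ids: List[int]) -> Dict[str, int]:
    """Count tokens on both sides of a request, OpenAI-style."""
    prompt_tokens = len(prompt_token_ids)
    completion_tokens = len(completion_token_ids)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


def build_completion_response(text: str, prompt_token_ids: List[int],
                              completion_token_ids: List[int]) -> Dict[str, Any]:
    """Hypothetical, simplified response body carrying the usage block."""
    return {
        "object": "text_completion",
        "choices": [{"index": 0, "text": text, "finish_reason": "stop"}],
        "usage": build_usage(prompt_token_ids, completion_token_ids),
    }


if __name__ == "__main__":
    # Token IDs here are made up for illustration.
    print(build_completion_response("hello", [101, 102, 103], [7, 8]))
```

Clients written against the OpenAI API typically read `response["usage"]` for accounting, which is why the field names matter for compatibility.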

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.5...v0.0.6

ScaleLLM - v0.0.5

Published by guocuimi 10 months ago

Major changes

  • Added Qwen, ChatGLM and Phi2 support.
  • Added tiktoken tokenizer support.
  • Enabled more custom kernels for sampling.

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.4...v0.0.5

ScaleLLM - v0.0.4

Published by guocuimi 11 months ago

Major changes

  • Added Docker image build for CUDA 11.8.
  • Added exception handling logic in the HTTP server.

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.3-fix...v0.0.4

ScaleLLM - v0.0.3

Published by guocuimi 11 months ago

  • Added support for the Yi chat model.
  • Added args overrider support.
  • Replaced libevhtp with Boost.Asio in the HTTP server to fix the "epoll_wait not implemented" error on old Linux kernels.

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.2...v0.0.3-fix

ScaleLLM - v0.0.2

Published by guocuimi 12 months ago

Major changes

What's Changed

Full Changelog: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.1...v0.0.2

ScaleLLM - first release

Published by guocuimi 12 months ago

First official release; see README.md for details.

Package ranking: top 36.35% on PyPI