friendli-client

Friendli: the fastest serving engine for generative AI

Apache-2.0 License · Downloads: 47.4K · Stars: 40 · Committers: 9

friendli-client - Release v1.5.4 🚀

Published by kooyunmo about 2 months ago

  • The text-to-image API has been removed from the CLI.
  • Support cancelling gRPC streams.
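Cancelling a stream mid-flight follows the standard asyncio pattern: run consumption in a task and cancel it, so the stream's cleanup runs in its finally block. A minimal plain-asyncio sketch (the `fake_stream` generator is a hypothetical stand-in for a gRPC streaming response, not the Friendli SDK):

```python
import asyncio

state = {"closed": False, "chunks": []}

async def fake_stream():
    # Hypothetical stand-in for a gRPC streaming response.
    try:
        i = 0
        while True:
            await asyncio.sleep(0.01)
            yield f"chunk-{i}"
            i += 1
    finally:
        # Runs when the consuming task is cancelled; a real client
        # would close the gRPC channel here.
        state["closed"] = True

async def consume():
    async for chunk in fake_stream():
        state["chunks"].append(chunk)

async def main():
    task = asyncio.create_task(consume())
    await asyncio.sleep(0.05)  # let a few chunks arrive
    task.cancel()              # cancel the in-flight stream
    try:
        await task
    except asyncio.CancelledError:
        pass

asyncio.run(main())
```

The key point is that cancellation is delivered at the generator's await point, so the stream gets a chance to release its underlying connection.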
friendli-client - Release v1.5.3 🚀

Published by kooyunmo about 2 months ago

  • Support closing streams explicitly and using them as context managers.
  • Added API end-to-end (E2E) tests.
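The close-and-context-manager behavior can be sketched in plain Python (a hypothetical Stream class for illustration, not the SDK's actual implementation):

```python
class Stream:
    """Minimal closeable stream: iterable, with explicit close() and `with` support."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise StopIteration
        try:
            return next(self._chunks)
        except StopIteration:
            self.close()  # auto-close at the end of the stream
            raise

    def close(self):
        # A real client would release the underlying connection here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()  # guaranteed close when leaving the `with` block


# The context manager guarantees closure even if iteration stops early.
with Stream(["a", "b", "c"]) as s:
    first = next(s)
```

Using the stream as a context manager means the connection is released even when the caller stops iterating early or an exception is raised inside the block.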
friendli-client - Release v1.5.2 🚀

Published by kooyunmo 2 months ago

  • Hotfix: Automatically close streaming response at the end of the stream.
friendli-client - Release v1.5.1 🚀

Published by kooyunmo 2 months ago

API calls to Friendli Dedicated Endpoints are now supported:

from friendli import Friendli

client = Friendli(use_dedicated_endpoint=True)
chat = client.chat.completions.create(
    model="{endpoint_id}",
    messages=[
        {
            "role": "user",
            "content": "Give three tips for staying healthy.",
        }
    ]
)

To send a request to a specific adapter of a Multi-LoRA endpoint, pass "{endpoint_id}:{adapter_route}" as the model argument:

from friendli import Friendli

client = Friendli(use_dedicated_endpoint=True)
chat = client.chat.completions.create(
    model="{endpoint_id}:{adapter_route}",
    messages=[
        {
            "role": "user",
            "content": "Give three tips for staying healthy.",
        }
    ]
)
friendli-client - Release v1.5.0 🚀

Published by kooyunmo 3 months ago

  • Deprecated model conversion and quantization. Please use friendli-model-optimizer instead to quantize your models.
  • Increase default HTTP timeout.
friendli-client - Release v1.4.2 🚀

Published by kooyunmo 3 months ago

  • Tool Calling API: Added a new API for tool calling.
  • Phi3 INT8 Support: Implemented INT8 support for Phi3.
  • Snowflake Arctic FP8 Quantizer: Introduced a new quantizer for Snowflake Arctic FP8.
  • Llama INT8 Quantization: Added INT8 quantization support for Llama and refactored the quantizer to use only safetensors.
friendli-client - Release v1.4.1 🚀

Published by kooyunmo 4 months ago

Patch Version Update

This patch version introduces explicit resource management to prevent unexpected resource leaks.
By default, the library closes the underlying HTTP and gRPC connections when the client is garbage-collected. However, you can now manually close a Friendli or AsyncFriendli client with the .close() method, or use it as a context manager to ensure proper closure when exiting a with block.

Usage examples

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def run():
    async with client:
        stream = await client.completions.create(
            prompt="Explain what gRPC is. Also give me a Python code snippet of gRPC client.",
            stream=True,
            top_k=1,
        )

        async for chunk in stream:
            print(chunk.text, end="", flush=True)

asyncio.run(run())
friendli-client - Release v1.4.0 🚀

Published by kooyunmo 4 months ago

  • gRPC client support for completions API.
friendli-client - Release v1.3.7 🚀

Published by kooyunmo 4 months ago

  • Minor: added a default value for the "index" and "text" fields of completion stream chunks.
friendli-client - Release v1.3.6 🚀

Published by kooyunmo 4 months ago

  • Support Phi3 FP8 conversion.
  • Hotfix for safetensor checkpoint saver.
friendli-client - Release v1.3.5 🚀

Published by kooyunmo 5 months ago

  • Optimized CPU RAM usage during quantization with offloading.
  • Support FP8 conversion for DBRX, Mixtral, and Command R+.
friendli-client - Release v1.3.4 🚀

Published by kooyunmo 7 months ago

  • Hotfix for LoRA checkpoint saving error.
friendli-client - Release v1.3.3 🚀

Published by kooyunmo 7 months ago

New Features

  • FP8 Checkpoint Conversion: We've introduced a new feature for FP8 checkpoint conversion.
  • Sharded Safetensors Checkpoint Saving: Added the ability to save sharded safetensors checkpoints.
  • LoRA Support on Mistral Model: We have added support for LoRA (Low-Rank Adaptation) on the Mistral model.

Bug Fixes

  • BF16 Hotfix: Addressed an urgent issue with bf16 processing.
  • BFloat Safetensors Conversion: Fixed an issue related to bfloat conversion for safetensors.
  • Automatic Token Refresh: Resolved a bug affecting automatic token refresh.
friendli-client - Release v1.3.2 🚀

Published by kooyunmo 7 months ago

  • Add base_model_name_or_path option to friendli model convert-adapter.
  • Remove stale dependencies.
friendli-client - Release v1.3.1 🚀

Published by kooyunmo 7 months ago

  • Update protobuf schema.
  • Fixed sending API requests with content type application/protobuf.
friendli-client - Release v1.3.0 🚀

Published by kooyunmo 7 months ago

  • Resources of Friendli Dedicated Endpoints can now be managed with the CLI and SDK. The available resources are endpoint, model, team, and project.
  • Login with the CLI is now available, including SSO login.
  • Updated Multi-LoRA checkpoint conversion.
friendli-client - Release v1.2.4 🚀

Published by kooyunmo 8 months ago

Patch Version v1.2.4

  • Distribute the Python package with type hints.
friendli-client - Release v1.2.3 🚀

Published by kooyunmo 8 months ago

Patch Version v1.2.3

  • Support pydantic V1 compatibility.
friendli-client - Release v1.2.2 🚀

Published by kooyunmo 8 months ago

Release Patch Version

  • Package dependencies are updated.
friendli-client - Release v1.2.1 🚀

Published by kooyunmo 8 months ago

Release Patch Version v1.2.1

  • Update package dependencies (no more exact version matching).
  • Add Mixtral model type.
  • Add a stop option to the completions and chat completions SDK/CLI.
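A stop sequence ends generation as soon as it appears in the output. The truncation behavior can be sketched in a few lines (illustrative only, not the server's actual implementation):

```python
def apply_stop(text: str, stop: list[str]) -> str:
    """Truncate text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all sequences
    return text[:cut]


# Generation stops before the stop sequence; the sequence itself is not returned.
print(apply_stop("1. Eat well\n2. Sleep", ["\n2."]))  # prints: 1. Eat well
```

In the SDK, the stop parameter would carry such a list of strings on completions and chat completions calls, cutting the response at the first match.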
Package Rankings: Top 38.35% on PyPI.org