kani (カニ) is a highly hackable microframework for chat-based language models with tool use/function calling. (NLP-OSS @ EMNLP 2023)
MIT License
Published by zhudotexe 15 days ago
- `MistralFunctionCallingAdapter`: wrapper engine for Mistral-Large and Mistral-Small function calling models.
- Fixed: `PromptPipeline.explain()` would not explain manual examples.
- Fixed: `PromptPipeline.ensure_bound_function_calls()` would mutate the ID of the underlying messages when an ID translator was passed.

Published by zhudotexe 28 days ago
- `HuggingEngine` now uses chat templates for conversational prompting and tool usage by default, if available. This should make it much easier to get started with a Hugging Face model in kani (see the sketch after this list).
- `OpenAIEngine` (e.g., for using OpenAI-compatible APIs)
- `llama` extra
- `HuggingEngine` will now automatically set `device_map="auto"` if the `accelerate` library is installed.
- Fixed: `PromptPipeline.ensure_bound_function_calls()` could still let unbound function calls through in cases of particularly long prompts with prefixing system prompts.
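A minimal sketch of the chat-template path (the model ID below is only illustrative):

```python
from kani import Kani, chat_in_terminal
from kani.engines.huggingface import HuggingEngine

# With chat template support, most Hugging Face chat models work by model ID alone.
engine = HuggingEngine(model_id="meta-llama/Meta-Llama-3-8B-Instruct")
ai = Kani(engine)
chat_in_terminal(ai)
```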
Published by zhudotexe 4 months ago
- Added `max_function_rounds` to `Kani.full_round`, `Kani.full_round_str`, and `Kani.full_round_stream`: the maximum number of function calling rounds to perform in this round. If this number is reached, the model is allowed to generate a final response without any functions defined. Default: unlimited (continues until the model's response does not contain a function call). A short sketch follows this list.
- Added `__repr__` to engines.
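A minimal sketch of the new parameter, assuming `ai` is an existing Kani instance with one or more `@ai_function` tools:

```python
# Allow at most 2 function calling rounds before the model must give a final answer.
async for msg in ai.full_round("What's the weather in Tokyo?", max_function_rounds=2):
    print(msg.role, msg.text)
```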
Published by zhudotexe 5 months ago
- Added `Kani.add_completion_to_history` (useful for token counting, see #29).
- Updated `PromptPipeline.explain()` when a function-related step is included.
- Added the `id_translator` arg to `PromptPipeline.ensure_bound_function_calls()`.
Published by zhudotexe 5 months ago
Published by zhudotexe 6 months ago
kani now supports streaming to print tokens from the engine as they are received! Streaming is designed to be a drop-in superset of the chat_round
and full_round
methods, allowing you to gradually refactor your code without ever leaving it in a broken state.
To request a stream from the engine, use Kani.chat_round_stream()
or Kani.full_round_stream()
. These methods will return a StreamManager
, which you can use in different ways to consume the stream.
The simplest way to consume the stream is to iterate over it with async for, which will yield a stream of str.
```python
# CHAT ROUND:
stream = ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
async for token in stream:
    print(token, end="")
msg = await stream.message()

# FULL ROUND:
async for stream in ai.full_round_stream("What is the airspeed velocity of an unladen swallow?"):
    async for token in stream:
        print(token, end="")
    msg = await stream.message()
```
After a stream finishes, its contents will be available as a ChatMessage
. You can retrieve the final message or BaseCompletion with:
```python
msg = await stream.message()
completion = await stream.completion()
```
The final ChatMessage may contain non-yielded tokens (e.g. a request for a function call). If the final message or completion is requested before the stream is iterated over, the stream manager will consume the entire stream.
[!TIP]
For compatibility and ease of refactoring, awaiting the stream itself will also return the message, i.e.:

```python
msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
```

(note the `await` that is not present in the above examples). This allows you to refactor your code by changing `chat_round` to `chat_round_stream` without other changes:

```diff
- msg = await ai.chat_round("What is the airspeed velocity of an unladen swallow?")
+ msg = await ai.chat_round_stream("What is the airspeed velocity of an unladen swallow?")
```
Issue: #30
kani now has bundled support for the following new models:
Hosted
Open Source
Although these models have built-in support, kani supports every chat model available on Hugging Face through transformers
or llama.cpp
using the new Prompt Pipelines feature (see below)!
Issue: #34
To use GGUF-quantized versions of models, kani now supports the LlamaCppEngine
, which uses the llama-cpp-python
library to interface with the llama.cpp
library. Any model with a GGUF version is compatible with this engine!
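As a rough sketch of what this looks like in practice (the import path, repository, filename pattern, and constructor parameters below are assumptions; check the engine documentation for the exact API):

```python
from kani import Kani, chat_in_terminal
from kani.engines.llamacpp import LlamaCppEngine  # assumed import path

# Load a GGUF quantization of a chat model via llama-cpp-python (names illustrative).
engine = LlamaCppEngine(repo_id="TheBloke/Llama-2-7B-Chat-GGUF", filename="*.Q4_K_M.gguf")
ai = Kani(engine)
chat_in_terminal(ai)
```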
A prompt pipeline creates a reproducible pipeline for translating a list of ChatMessage
into an engine-specific format using fluent-style chaining.
To build a pipeline, create an instance of PromptPipeline()
and add steps by calling the step methods documented below. Most pipelines will end with a call to one of the terminals, which translates the intermediate form into the desired output format.
Pipelines come with a built-in explain()
method to print a detailed explanation of the pipeline and multiple examples (selected based on the pipeline steps).
Here’s an example using the PromptPipeline to build a LLaMA 2 chat-style prompt:
```python
from kani import PromptPipeline, ChatRole

LLAMA2_PIPELINE = (
    PromptPipeline()
    # System messages should be wrapped with this tag. We'll translate them to USER
    # messages since a system and user message go together in a single [INST] pair.
    .wrap(role=ChatRole.SYSTEM, prefix="<<SYS>>\n", suffix="\n<</SYS>>\n")
    .translate_role(role=ChatRole.SYSTEM, to=ChatRole.USER)
    # If we see two consecutive USER messages, merge them together into one with a
    # newline in between.
    .merge_consecutive(role=ChatRole.USER, sep="\n")
    # Similarly for ASSISTANT, but with a space (kani automatically strips whitespace
    # from the ends of generations).
    .merge_consecutive(role=ChatRole.ASSISTANT, sep=" ")
    # Finally, wrap USER and ASSISTANT messages in the instruction tokens. If our
    # message list ends with an ASSISTANT message, don't add the EOS token
    # (we want the model to continue the generation).
    .conversation_fmt(
        user_prefix="<s>[INST] ",
        user_suffix=" [/INST]",
        assistant_prefix=" ",
        assistant_suffix=" </s>",
        assistant_suffix_if_last="",
    )
)

# We can see what this pipeline does by calling explain()...
LLAMA2_PIPELINE.explain()

# And use it in our engine to build a string prompt for the LLM.
prompt = LLAMA2_PIPELINE(ai.get_prompt())
```
Previously, to use a model with a different prompt format than the ones bundled with the library, one had to create a subclass of the HuggingEngine
to implement the prompting scheme. With the release of Prompt Pipelines, you can now supply a PromptPipeline
in addition to the model ID to use the HuggingEngine
directly!
For example, the LlamaEngine
(huggingface) is now equivalent to the following:
```python
engine = HuggingEngine(
    "meta-llama/Llama-2-7b-chat-hf",
    prompt_pipeline=LLAMA2_PIPELINE,
)
```
The engine will use the passed pipeline to automatically infer a model's token usage, making it easier than ever to implement new models.
Issue: #32
- `OpenAIEngine` now uses the official openai-python package. (#31)
- `aiohttp` is no longer a direct dependency, and the `HTTPClient` has been deprecated. For API-based models, we recommend using the `httpx` library.
- Added new options to the `chat_in_terminal` helper to control maximum width, echo user inputs, show function call arguments and results, and other interactive utilities. (#33)
- `HuggingEngine` can now automatically determine a model's context length.
- kani now warns when an `@ai_function` is missing a docstring. (#37)
- Added the `WrapperEngine` to make writing wrapper extensions easier (see the sketch at the end of these notes).
- kani models (e.g. `ChatMessage`) are no longer immutable. This means that you can edit the chat history directly, and token counting will still work correctly.
- As the `ctransformers` library does not appear to be maintained, we have removed the `CTransformersEngine` and replaced it with the `LlamaCppEngine`.
- All arguments to `chat_in_terminal` (except the first) are now keyword-only.
- All arguments to `HuggingEngine` (except `model_id`, `max_context_size`, and `prompt_pipeline`) are now keyword-only.
- Removed the `kani.engines.openai.models.*` models. (If you aren't sure if you're affected by this, you probably aren't.)

It should be a painless upgrade from kani v0.x to kani v1.0! We tried our best to ensure that we didn't break any existing code. If you encounter any issues, please reach out on our Discord.
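As a sketch of a wrapper extension using the new `WrapperEngine`, assuming it exposes the wrapped engine and delegates `predict()` by default (the exact interface may differ; see the engine docs):

```python
from kani.engines import WrapperEngine  # assumed export location


class ShoutingEngine(WrapperEngine):
    """A toy wrapper that upper-cases whatever the underlying engine generates."""

    async def predict(self, messages, functions=None, **hyperparams):
        completion = await super().predict(messages, functions, **hyperparams)
        # ChatMessages are mutable as of v1.0, so we can edit the content in place.
        if completion.message.text is not None:
            completion.message.content = completion.message.text.upper()
        return completion
```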
Published by zhudotexe 6 months ago
- Added the `WrapperEngine` to make writing wrapper extensions easier.

Published by zhudotexe 6 months ago
Published by zhudotexe 7 months ago
Most likely the last release before v1.0! This update mostly contains improvements to chat_in_terminal
to improve usability in interactive environments like Jupyter Notebook.
All arguments to `chat_in_terminal` except the Kani instance must now be keyword arguments; positional arguments are no longer accepted. For example, `chat_in_terminal(ai, 1, "!stop")` must now be written `chat_in_terminal(ai, rounds=1, stopword="!stop")`.
- Added the ability to pass `None` as the user query in `chat_round` and `full_round`. This will request a new ASSISTANT message without adding a USER message to the chat history (e.g. to continue an unfinished generation).
- Added the following keyword args to `chat_in_terminal` to improve usability in interactive environments like Jupyter Notebook: `echo`, `show_function_args`, and `show_function_returns` (plus an option that sets all three to True). A short sketch of both items follows this list.
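A minimal sketch of both items, assuming `ai` is an existing Kani instance:

```python
from kani import chat_in_terminal

# Request a new ASSISTANT message without adding a USER message to the history
# (e.g. to continue an unfinished generation).
msg = await ai.chat_round(None)

# Interactive helper with the new keyword args enabled.
chat_in_terminal(ai, echo=True, show_function_args=True, show_function_returns=True)
```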
Published by zhudotexe 9 months ago
- `max_context_length` explicitly

Published by zhudotexe 11 months ago
- `always_included_messages` near the maximum context length

Published by zhudotexe 11 months ago
- `AnthropicEngine`
- `ToolCallError` to a more general `PromptError`
- `ToolCallError` yet

Published by zhudotexe 12 months ago
- Fixed: the `content` field might get omitted in certain requests, causing an API error.

Published by zhudotexe 12 months ago
Published by zhudotexe 12 months ago
As of Nov 6, 2023, OpenAI added the ability for a single assistant message to request calling multiple functions in
parallel, and wrapped all function calls in a ToolCall
wrapper. In order to add support for this in kani while
maintaining backwards compatibility with OSS function calling models, a ChatMessage
now actually maintains the
following internal representation:
ChatMessage.function_call
is actually an alias for ChatMessage.tool_calls[0].function
. If there is more
than one tool call in the message, when trying to access this property, kani will raise an exception.
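As a hedged sketch of inspecting this representation (attribute names beyond `function_call`, `tool_calls`, and `.function` are assumptions):

```python
msg = ai.chat_history[-1]  # an ASSISTANT message that requested one or more tool calls
for tc in msg.tool_calls or []:
    print(tc.id, tc.function.name, tc.function.arguments)

# msg.function_call is an alias for msg.tool_calls[0].function and raises if the
# message contains more than one tool call.
```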
To translate kani's FUNCTION message types to OpenAI's TOOL message types, the OpenAIEngine now performs a translation based on binding free tool call IDs to following FUNCTION messages deterministically.
To the kani end user, there should be no change to how functions are defined and called. One breaking change was necessary: `Kani.do_function_call` and `Kani.handle_function_call_exception` now take an additional `tool_call_id` parameter, which may break overriding functions. The documentation has been updated to encourage overriders to handle `*args, **kwargs` to prevent this happening again (see the sketch below).

kani can now handle making multiple function calls in parallel if the model requests it. Rather than returning an ASSISTANT message with a single `function_call`, an engine can now return a list of `tool_calls`. kani will resolve these tool calls in parallel using asyncio, and add their results to the chat history in the order of the list provided.
Returning a single function_call
will continue to work for backwards compatibility.
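A minimal sketch of the recommended override pattern; the logging body is purely illustrative:

```python
from kani import Kani


class LoggingKani(Kani):
    # Accept *args/**kwargs so new parameters (like tool_call_id) don't break the override.
    async def do_function_call(self, call, *args, **kwargs):
        print(f"model requested function: {call.name}")
        return await super().do_function_call(call, *args, **kwargs)
```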
Published by zhudotexe 12 months ago
- `OpenAIChatMessage`s as input rather than `kani.ChatMessage`, in order to better type-validate API requests

Published by zhudotexe 12 months ago
The Message Parts API is intended to provide a foundation for future multimodal LLMs and other engines that require engine-specific input without compromising kani's model-agnostic design. This is accomplished by allowing ChatMessage.content
to be a list of MessagePart
objects, in addition to a string.
This change is fully backwards-compatible and will not affect existing code.
When writing code with compatibility in mind, the ChatMessage
class exposes ChatMessage.text
(always a string or None) and ChatMessage.parts
(always a list of message parts), which we recommend using instead of ChatMessage.content
. These properties are dynamically generated based on the underlying content, and it is safe to mix messages with different content types in a single Kani.
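For example, a minimal sketch of the compatibility-minded accessors (the `ChatMessage.user` constructor is assumed here):

```python
from kani import ChatMessage

msg = ChatMessage.user("Hello, kani!")
print(msg.text)   # always a str (or None), regardless of the underlying content type
print(msg.parts)  # always a list of message parts
```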
Generally, message part classes are defined by an engine, and consumed by the developer. Message parts can be used in any role’s message - for example, you might use a message part in an assistant message to separate out a chain of thought from a user reply, or in a user message to supply an image to a multimodal model.
For more information, see the Message Parts documentation.
Up next: we're adding support for multimodal vision-language models like LLaVA and GPT-Vision through a kani extension!
- `[INST]` wrapper. See the tests for how kani translates consecutive message types into the LLaMA prompt.

Published by zhudotexe about 1 year ago
- `Kani.full_round` now emits every message generated during the round, not just assistant messages. This means you may receive FUNCTION messages, and potentially SYSTEM messages from a function exception handler. `Kani.full_round_str`'s default behaviour is unchanged.
- `Kani.full_round_str` now takes in a `message_formatter` rather than a `function_call_formatter`. By default, it only yields the contents of ASSISTANT messages (see the sketch below).
- `Kani.do_function_call` now returns a `FunctionCallResult` rather than a `bool`. To migrate overrides, instead of calling `Kani.add_to_history` in the override, save the ChatMessage to a variable and return `FunctionCallResult(is_model_turn=<old return value>, message=<message from above>)`.
- `Kani.handle_function_call_exception` now returns an `ExceptionHandleResult` rather than a `bool`. Similarly, instead of calling `Kani.add_to_history` in the override, save the ChatMessage to a variable and return `ExceptionHandleResult(should_retry=<old return value>, message=<message from above>)`.
- Added `kani.utils.message_formatters`.
- Added `kani.ExceptionHandleResult` and `kani.FunctionCallResult`.
- Fixed an issue where `ChatMessage.copy_with` could cause unset values to appear in JSON serializations.
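A minimal sketch of the new `message_formatter` parameter, assuming `ai` is a Kani instance and that returning None from the formatter skips a message:

```python
# A formatter receives each ChatMessage and returns the string to yield
# (returning None is assumed to skip the message).
def formatter(msg):
    return f"[{msg.role.value}] {msg.text}"


async for text in ai.full_round_str("What's 2 + 2?", message_formatter=formatter):
    print(text)
```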
Published by zhudotexe about 1 year ago
- Added a `.copy_with` method to `ChatMessage` and `FunctionCall` to make updating chat history easier.
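For example, a minimal sketch of updating the chat history with a copied message (the `content` keyword is an assumption about what `copy_with` accepts):

```python
# Replace the text of the last message in the history without mutating the original.
last = ai.chat_history[-1]
ai.chat_history[-1] = last.copy_with(content="Hi there!")
```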