Official inference library for Mistral models
APACHE-2.0 License
Read more about Mistral-Nemo here.
Install
pip install "mistral-inference>=1.3.0"
Download
export NEMO_MODEL=$HOME/12B_NEMO_MODEL
wget https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar
mkdir -p $NEMO_MODEL
tar -xf mistral-nemo-instruct-2407.tar -C $NEMO_MODEL
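If you would rather script the download from Python, the following is a minimal sketch that uses only the standard library (`urllib.request`, `tarfile`). The helper name `download_and_extract` and its `download_dir` parameter are hypothetical and not part of mistral-inference. It takes the local tarball name from the URL, so the download and extraction steps always refer to the same file.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_and_extract(url: str, dest_dir: str, download_dir: str = ".") -> Path:
    """Download a model tarball (like `wget`) and unpack it (like `tar -xf ... -C`).

    Hypothetical helper: the local tarball name is derived from the URL,
    so the download and extraction steps always use the same file name.
    """
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)  # equivalent of `mkdir -p`

    tar_name = Path(urlparse(url).path).name  # e.g. "mistral-nemo-instruct-2407.tar"
    tar_path = Path(download_dir).expanduser() / tar_name
    urllib.request.urlretrieve(url, str(tar_path))

    with tarfile.open(tar_path) as archive:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject absolute paths and ".." entries.
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)
    return dest
```

For example, `download_and_extract("https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar", "~/12B_NEMO_MODEL")` should unpack the archive into `~/12B_NEMO_MODEL`, mirroring the shell commands above.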
Chat
mistral-chat $NEMO_MODEL --instruct --max_tokens 1024
or directly in Python:
import os
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_model("mistral-nemo")
model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))
prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris? Make a reasonable guess in US dollars."
completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])
print(result)
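The `max_tokens`, `temperature` and `eos_id` arguments passed to `generate` control the decoding loop. The toy sketch below illustrates what these settings mean, using a hypothetical `next_logits` callback in place of the model. It is not the library's implementation, which runs batched on tensors and whose sampler may differ in its details. In this sketch, temperature divides the scores before the softmax (0 falls back to greedy argmax), generation stops after `max_tokens` new tokens, and it stops early as soon as `eos_id` is sampled.

```python
import math
import random
from typing import Callable, List, Sequence


def sample_next(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Pick the next token id; temperature 0 means greedy argmax."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - top) for x in scaled]  # unnormalised softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def toy_generate(
    prompt: List[int],
    next_logits: Callable[[List[int]], Sequence[float]],
    max_tokens: int,
    temperature: float,
    eos_id: int,
    seed: int = 0,
) -> List[int]:
    """Generate at most max_tokens new ids, stopping early when eos_id is sampled."""
    rng = random.Random(seed)
    context = list(prompt)
    generated: List[int] = []
    for _ in range(max_tokens):
        token = sample_next(next_logits(context), temperature, rng)
        if token == eos_id:
            break  # the end-of-sequence id itself is not returned
        generated.append(token)
        context.append(token)
    return generated
```

Lower temperatures such as the `0.35` used above make the highest-scoring token increasingly likely; `temperature=0.0` makes the output deterministic.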
Function calling:
import os
from mistral_common.protocol.instruct.tool_calls import Function, Tool
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_model("mistral-nemo")
model = Transformer.from_folder(os.environ.get("NEMO_MODEL"))
completion_request = ChatCompletionRequest(
tools=[
Tool(
function=Function(
name="get_current_weather",
description="Get the current weather",
parameters={
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
)
)
],
messages=[
UserMessage(content="What's the weather like today in Paris?"),
],
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])
print(result)
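The script above only prints the decoded output. To act on it, you need to parse the tool call and run a matching local function. The sketch below assumes the decoded text is a JSON list of `{"name": ..., "arguments": ...}` objects, optionally preceded by a `[TOOL_CALLS]` marker; that output format is an assumption, so check it against what your model version actually prints. `parse_tool_calls`, `dispatch` and `get_current_weather` are hypothetical names, and the weather function returns a dummy value.

```python
import json
from typing import Any, Callable, Dict, List

TOOL_CALLS_PREFIX = "[TOOL_CALLS]"  # assumed marker; may be absent after decoding


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Turn decoded model output into a list of {"name", "arguments"} dicts.

    Returns an empty list when the output is not a JSON tool-call list,
    i.e. when the model answered in plain text instead.
    """
    payload = text.strip()
    if payload.startswith(TOOL_CALLS_PREFIX):
        payload = payload[len(TOOL_CALLS_PREFIX):].strip()
    try:
        calls = json.loads(payload)
    except json.JSONDecodeError:
        return []
    if isinstance(calls, dict):
        calls = [calls]
    if not isinstance(calls, list):
        return []
    return [c for c in calls if isinstance(c, dict) and "name" in c]


def dispatch(calls: List[Dict[str, Any]], registry: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Run each parsed call against a registry of local Python functions."""
    results = []
    for call in calls:
        args = call.get("arguments", {})
        if isinstance(args, str):  # tolerate arguments encoded as a JSON string
            args = json.loads(args)
        results.append(registry[call["name"]](**args))
    return results


def get_current_weather(location: str, format: str) -> str:
    """Hypothetical stand-in for a real weather lookup.

    The parameter is named `format` to match the tool schema's property name.
    """
    return f"22 degrees {format} in {location}"  # dummy value
```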
The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruction fine-tuned version of Mistral-Nemo-Base-2407. Trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models of smaller or similar size.
For more details about this model, please refer to our release blog post.
Mistral Nemo is a transformer model, with the following architecture choices:
Benchmark results:
Benchmark | Score |
---|---|
HellaSwag (0-shot) | 83.5% |
Winogrande (0-shot) | 76.8% |
OpenBookQA (0-shot) | 60.6% |
CommonSenseQA (0-shot) | 70.4% |
TruthfulQA (0-shot) | 50.3% |
MMLU (5-shot) | 68.0% |
TriviaQA (5-shot) | 73.8% |
NaturalQuestions (5-shot) | 31.2% |
Multilingual results:
Language | Score |
---|---|
French | 62.3% |
German | 62.7% |
Spanish | 64.6% |
Italian | 61.3% |
Portuguese | 63.3% |
Russian | 59.2% |
Chinese | 59.0% |
Japanese | 59.0% |
Full Changelog: https://github.com/mistralai/mistral-inference/compare/v1.2.0...v1.3.0
Published by patrickvonplaten 3 months ago
pip install "mistral-inference>=1.2.0"
Codestral-Mamba
pip install packaging mamba-ssm causal-conv1d transformers
export MAMBA_CODE=$HOME/7B_MAMBA_CODE
wget https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar
mkdir -p $MAMBA_CODE
tar -xf codestral-mamba-7B-v0.1.tar -C $MAMBA_CODE
mistral-chat $HOME/7B_MAMBA_CODE --instruct --max_tokens 256
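Codestral-Mamba needs the extra packages from its install line. A quick preflight check can report which ones are missing before you try to load the model. This sketch uses `importlib.util.find_spec`, which locates a top-level module without importing it. The mapping from pip names to import names (`mamba-ssm` to `mamba_ssm`, `causal-conv1d` to `causal_conv1d`) is an assumption, and `missing_packages` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, List

# pip name -> import name (assumed mapping for the packages listed above)
MAMBA_DEPS: Dict[str, str] = {
    "packaging": "packaging",
    "mamba-ssm": "mamba_ssm",
    "causal-conv1d": "causal_conv1d",
    "transformers": "transformers",
}


def missing_packages(deps: Dict[str, str]) -> List[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in deps.items()
        if importlib.util.find_spec(module) is None
    ]
```

For example, if `missing_packages(MAMBA_DEPS)` returns a non-empty list, joining it after `pip install ` gives the command needed to complete the setup.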
Mathstral
export MATHSTRAL=$HOME/7B_MATH
wget https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar
mkdir -p $MATHSTRAL
tar -xf mathstral-7B-v0.1.tar -C $MATHSTRAL
mistral-chat $HOME/7B_MATH --instruct --max_tokens 256
Blogs:
Blog Codestral Mamba 7B: https://mistral.ai/news/codestral-mamba/
Blog Mathstral 7B: https://mistral.ai/news/mathstral/
Full Changelog: https://github.com/mistralai/mistral-inference/compare/v1.1.0...v1.2.0
Published by patrickvonplaten 5 months ago
mistral-inference==1.1.0 supports running LoRA models that were trained with mistral-finetune (https://github.com/mistralai/mistral-finetune).
Once you have trained a LoRA for the 7B base model, you can run it with mistral-inference as follows:
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
MODEL_PATH = "path/to/downloaded/7B_base_dir"
tokenizer = MistralTokenizer.from_file(f"{MODEL_PATH}/tokenizer.model.v3") # change to extracted tokenizer file
model = Transformer.from_folder(MODEL_PATH) # change to extracted model dir
model.load_lora("/path/to/run_lora_dir/checkpoints/checkpoint_000300/consolidated/lora.safetensors")
completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
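LoRA keeps the base weights frozen and trains a low-rank update on top of them. In the standard formulation from the LoRA paper, a frozen weight `W` of shape (out, in) becomes `W + (alpha / r) * B @ A`, where `A` has shape (r, in) and `B` has shape (out, r). The toy merge below shows that arithmetic in plain Python. Whether `load_lora` merges the update in exactly this way, and which `alpha` and `r` mistral-finetune uses, are assumptions not confirmed by these notes; `merge_lora` is a hypothetical helper.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, enough for a toy example."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * B @ A.

    w: (out, in) frozen base weight
    a: (rank, in) LoRA down-projection
    b: (out, rank) LoRA up-projection
    """
    scale = alpha / rank
    delta = matmul(b, a)  # (out, in) update of rank at most `rank`
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]
```

Because `B @ A` has rank at most `r`, an adapter only needs to store `r * (in + out)` numbers per adapted matrix instead of `in * out`, which is why LoRA checkpoints such as `lora.safetensors` are small compared with the base model.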
Published by patrickvonplaten 5 months ago
Mistral-inference is the official inference library for all Mistral models: 7B, 8x7B, 8x22B.
Install with:
pip install mistral-inference
Run with:
from mistral_inference.model import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import Function, Tool
tokenizer = MistralTokenizer.from_file("/path/to/tokenizer/file") # change to extracted tokenizer file
model = Transformer.from_folder("./path/to/model/folder") # change to extracted model dir
completion_request = ChatCompletionRequest(
tools=[
Tool(
function=Function(
name="get_current_weather",
description="Get the current weather",
parameters={
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users location.",
},
},
"required": ["location", "format"],
},
)
)
],
messages=[
UserMessage(content="What's the weather like today in Paris?"),
],
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
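The `parameters` dict in the tool definition is a JSON Schema object. Before passing model-generated arguments to a real function, you may want to check them against it. The sketch below handles only the parts of JSON Schema this example uses (`required`, `enum`, and `"type": "string"`); `schema_errors` and `WEATHER_PARAMETERS` are hypothetical names, and a full validator such as the `jsonschema` package is the more complete choice.

```python
from typing import Any, Dict, List

# The weather tool's parameters from the example above (descriptions omitted).
WEATHER_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "format": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location", "format"],
}


def schema_errors(arguments: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Check tool-call arguments against a small subset of JSON Schema.

    Returns a list of human-readable problems; an empty list means the
    arguments passed every check this sketch knows about.
    """
    errors: List[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unexpected argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{name} must be a string")
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors
```

Rejecting a call with a non-empty error list, rather than passing it straight to the function, keeps a malformed generation (for instance an unsupported unit) from reaching real code.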