Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
Published by github-actions[bot] 4 months ago
Ollama 0.2.0 is now available with concurrency support. This unlocks 2 specific features:
Ollama can now serve multiple requests at the same time, using only a little bit of additional memory for each request. This enables use cases such as handling several chat sessions at once or processing different parts of a document simultaneously.
Ollama now supports loading different models at the same time, dramatically improving workflows such as retrieval-augmented generation, where an embedding model and a text completion model can stay loaded together.
Models are automatically loaded and unloaded based on requests and how much GPU memory is available.
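As a quick illustration, two requests to two different models can now run side by side (a sketch assuming both models are already pulled and the server is on its default port):
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}' &
curl http://localhost:11434/api/generate -d '{"model": "gemma:2b", "prompt": "Write a haiku about the sea."}' &
wait
Both responses stream back concurrently, and both models remain loaded.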
To see which models are loaded, run ollama ps:
% ollama ps
NAME               ID            SIZE    PROCESSOR  UNTIL
gemma:2b           030ee63283b5  2.8 GB  100% GPU   4 minutes from now
all-minilm:latest  1b226e2802db  530 MB  100% GPU   4 minutes from now
llama3:latest      365c0bd3c000  6.7 GB  100% GPU   4 minutes from now
For more information on concurrency, see the FAQ.
gemma2:27b: if you previously downloaded Gemma 2, update it by running ollama pull gemma2 or ollama pull gemma2:27b
Fixed issues with Modelfile FROM lines
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.48...v0.2.0
Published by github-actions[bot] 4 months ago
/show info will now show additional model information in ollama run
Fixed issue where ollama show would result in an error on certain vision models
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.47...v0.1.48
Published by github-actions[bot] 4 months ago
Fixed issue with ollama create when importing from Safetensors
A special thank you to the Google Cloud and DeepMind team members for Gemma 2 support.
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.46...v0.1.47
Published by github-actions[bot] 4 months ago
Improved performance of ollama run, especially if running an already-loaded model
Improved performance of /api/show, including for large models
Fixed issue where using the --quantize flag in ollama create would lead to an error
Fixed issue where Modelfile parameters would not be parsed correctly
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.45...v0.1.46
Published by github-actions[bot] 4 months ago
ollama show
ollama show will now show model details such as context length, parameters, embedding size, license and more:
% ollama show llama3
  Model
    arch              llama
    parameters        8.0B
    quantization      Q4_0
    context length    8192
    embedding length  4096

  Parameters
    num_keep  24
    stop      "<|start_header_id|>"
    stop      "<|end_header_id|>"
    stop      "<|eot_id|>"

  License
    META LLAMA 3 COMMUNITY LICENSE AGREEMENT
    Meta Llama 3 Version Release Date: April 18, 2024
ollama show <model> will now show model information such as context window size
Setting seed in the /v1/chat/completions OpenAI compatibility endpoint no longer changes temperature (see the request sketch below)
Fixed issues when running deepseek-v2 and deepseek-coder-v2 models
The api/show endpoint now returns extensive model metadata
Fixes for ollama serve
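A minimal request sketch for the seed behavior above (llama3 and the field values are illustrative):
curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "llama3", "seed": 101, "temperature": 0.7, "messages": [{"role": "user", "content": "Why is the sky blue?"}]}'
With this fix, supplying seed no longer silently resets the temperature you pass.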
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.44...v0.1.45
Published by github-actions[bot] 4 months ago
Fixed issues with ollama create
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.43...v0.1.44
Published by github-actions[bot] 4 months ago
Fixed issue where results from /api/embeddings would not be accurate (see the sketch below)
Fixed File not found errors
Fixed issue where ollama create would result in an error on Windows with certain file formatting
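To sanity-check embedding output after updating, a quick sketch (all-minilm is just an example embedding model):
curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "The sky is blue"}'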
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.42...v0.1.43
Published by github-actions[bot] 4 months ago
Fixed issue where qwen2 would output erroneous text such as GGG on Nvidia and AMD GPUs
ollama pull is now faster if it detects a model is already downloaded
ollama create will now automatically detect prompt templates for popular model architectures such as Llama, Gemma, Phi and more (see the sketch below)
Fixed issue where /api/ps and /api/tags would show invalid timestamps in responses
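With automatic template detection, importing a supported architecture can use a one-line Modelfile (a sketch; the directory name is a placeholder for a local Safetensors download):
FROM ./Meta-Llama-3-8B-Instruct
ollama create my-model -f Modelfile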
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.41...v0.1.42
Published by github-actions[bot] 5 months ago
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.40...v0.1.41
Published by github-actions[bot] 5 months ago
Fixed issues with ollama run
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.39...v0.1.40
Published by github-actions[bot] 5 months ago
It is now possible to import and quantize Llama 3 and its finetunes from Safetensors format to Ollama.
First, clone a Hugging Face repo with a Safetensors model:
git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
cd Meta-Llama-3-8B-Instruct
Next, create a Modelfile:
FROM .
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>"""
PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>
Then, create and quantize a model:
ollama create --quantize q4_0 -f Modelfile my-llama3
Finally, run the model:
ollama run my-llama3
Added OLLAMA_NOHISTORY=1 environment variable that can be set to disable history when using ollama run (see the usage sketch below)
Added experimental OLLAMA_FLASH_ATTENTION=1 flag for ollama serve that improves token generation speed on Apple Silicon Macs and NVIDIA graphics cards
Fixed issues when running ollama create -f Modelfile
ollama create can now create models from I-Quant GGUF files
Fixed EOF errors when resuming downloads via ollama pull
Added Ctrl+W shortcut to ollama run
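For example, to start a session with history disabled (llama3 is just an example model):
OLLAMA_NOHISTORY=1 ollama run llama3
or to serve with flash attention enabled:
OLLAMA_FLASH_ATTENTION=1 ollama serve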
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.38...v0.1.39
Published by github-actions[bot] 5 months ago
ollama ps
A new command is now available: ollama ps. This command displays currently loaded models, their memory footprint, and the processors used (GPU or CPU):
% ollama ps
NAME               ID            SIZE    PROCESSOR        UNTIL
mixtral:latest     7708c059a8bb  28 GB   47%/53% CPU/GPU  Forever
llama3:latest      a6990ed6be41  5.5 GB  100% GPU         4 minutes from now
all-minilm:latest  1b226e2802db  585 MB  100% GPU         4 minutes from now
/clear
To clear the chat history for a session when running ollama run, use /clear:
>>> /clear
Cleared session context
/save will no longer abort the chat session if an incorrect name is provided
The /api/tags API endpoint will now correctly return an empty list [] instead of null if no models are provided
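A quick way to verify this on a fresh install (a sketch assuming the default port):
curl http://localhost:11434/api/tags
which returns {"models": []} when no models are installed.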
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.37...v0.1.38
Published by github-actions[bot] 5 months ago
Fixed issues with ollama list
Fixed issues with ollama create
Fixed finish_reason being "" instead of null in the OpenAI-compatible chat API
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.36...v0.1.37
Published by github-actions[bot] 5 months ago
Fixed exit status 0xc0000005 error with AMD graphics cards on Windows
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.35...v0.1.36
Published by github-actions[bot] 5 months ago
ollama create can now quantize models when importing them using the --quantize or -q flag:
ollama create -f Modelfile --quantize q4_0 mymodel
[!NOTE]
--quantize works when importing float16 or float32 models:
- From a binary GGUF file (e.g. FROM ./model.gguf)
- From a library model (e.g. FROM llama3:8b-instruct-fp16)
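For instance, to quantize a full-precision library model (a sketch; mymodel-q4 is an arbitrary name), write a Modelfile containing only:
FROM llama3:8b-instruct-fp16
then run:
ollama create -f Modelfile --quantize q4_0 mymodel-q4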
Fixed issues with ollama run
Fixed issues with ollama show for vision models
OPTIONS requests to the Ollama API will no longer result in errors
Added a done_reason field in responses describing why generation stopped responding
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.34...v0.1.35
Published by github-actions[bot] 6 months ago
Fixed issues with ollama create
Fixed no slots available error that occurred when cancelling a request and then sending follow-up requests
Fixed issues with the OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS flags
ollama serve will now shut down quickly, even if a model is loading
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.33...v0.1.34
Published by github-actions[bot] 6 months ago
New concurrency features are coming soon to Ollama. They are available as experimental features in this release:
OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously for a single model
OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously
To enable these features, set the environment variables for ollama serve. For more info, see this guide:
OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.32...v0.1.33-rc5
Published by github-actions[bot] 6 months ago
Added support for mixtral:8x22b and command-r-plus
Fixed issue where setting OLLAMA_ORIGINS would cause an error
To install this pre-release version on Linux:
curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.1.32-rc1 sh
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.31...v0.1.32-rc1
Published by github-actions[bot] 7 months ago
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.30...v0.1.31
Published by github-actions[bot] 7 months ago
Fixed issue with ollama run on Windows caused by : in the filename
Important change
For filesystem compatibility, Ollama has changed model data filenames to use - instead of :. This change will be applied automatically. If downgrading to 0.1.29 or lower from 0.1.30 (on Linux or macOS only), run:
find ~/.ollama/models/blobs -type f -exec bash -c 'mv "$0" "${0//-/:}"' {} \;
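This renames each blob back to the pre-0.1.30 scheme by replacing every - in the filename with : (blob digests are hexadecimal, so no other part of the name is affected).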
Full Changelog: https://github.com/ollama/ollama/compare/v0.1.29...v0.1.30