ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

MIT License

Downloads: 219
Stars: 92.5K
Committers: 310

ollama - v0.2.0

Published by github-actions[bot] 4 months ago

Concurrency

Ollama 0.2.0 is now available with concurrency support. This unlocks two major features:

Parallel requests

Ollama can now serve multiple requests at the same time, using only a little bit of additional memory for each request. This enables use cases such as:

  • Handling multiple chat sessions at the same time
  • Hosting a code completion LLM for your internal team
  • Processing different parts of a document simultaneously
  • Running several agents at the same time.
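
With concurrency enabled, requests to the same loaded model can be issued side by side. A minimal sketch using curl against the default local API (llama3 is just an example model, assumed to be pulled already):

curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?"}' &
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Write a haiku about the sea."}' &
wait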

https://github.com/ollama/ollama/assets/251292/9772a5f1-c072-41db-be6c-dd3c621aa2fd

Multiple models

Ollama now supports loading different models at the same time, dramatically improving:

  • Retrieval Augmented Generation (RAG): both the embedding and text completion models can be loaded into memory simultaneously.
  • Agents: multiple agents can now run simultaneously
  • Running large and small models side-by-side

Models are automatically loaded and unloaded based on requests and how much GPU memory is available.
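
As a sketch of the RAG case above, an embedding model and a chat model can stay resident and be queried back to back (model names are examples):

curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Ollama supports concurrency."}'
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Summarize: Ollama supports concurrency."}'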

To see which models are loaded, run ollama ps:

% ollama ps
NAME                    ID              SIZE    PROCESSOR       UNTIL
gemma:2b                030ee63283b5    2.8 GB  100% GPU        4 minutes from now
all-minilm:latest       1b226e2802db    530 MB  100% GPU        4 minutes from now
llama3:latest           365c0bd3c000    6.7 GB  100% GPU        4 minutes from now

For more information on concurrency, see the FAQ.
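
The degree of parallelism and the number of resident models can be tuned with the OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS environment variables, introduced as experimental flags in v0.1.33 (see below). For example:

OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 ollama serve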

New models

  • GLM-4: A strong multilingual general language model with performance competitive with Llama 3.
  • CodeGeeX4: A versatile model for AI software development scenarios, including code completion.
  • Gemma 2: Improved output quality, and base text generation models are now available.

What's Changed

  • Improved Gemma 2
    • Fixed issue where the model would generate invalid tokens after hitting the context window
    • Fixed inference output issues with gemma2:27b
    • Re-downloading the model may be required: ollama pull gemma2 or ollama pull gemma2:27b
  • Ollama will now show a better error if a model architecture isn't supported
  • Improved handling of quotes and spaces in Modelfile FROM lines
  • Ollama will now return an error if the system does not have enough memory to run a model on Linux

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.48...v0.2.0

ollama - v0.1.48

Published by github-actions[bot] 4 months ago

Gemma 2

What's Changed

  • Fixed issue where Gemma 2 would continuously output when reaching context limits
  • Fixed out of memory and core dump errors when running Gemma 2
  • /show info will now show additional model information in ollama run
  • Fixed issue where ollama show would result in an error on certain vision models

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.47...v0.1.48

ollama - v0.1.47

Published by github-actions[bot] 4 months ago

Ollama Gemma 2 illustration

What's Changed

  • Added support for Google Gemma 2 models (9B and 27B)
  • Fixed issues with ollama create when importing from Safetensors

A special thank you to the Google Cloud and DeepMind team members for Gemma 2 support.

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.46...v0.1.47

ollama - v0.1.46

Published by github-actions[bot] 4 months ago

ollama run

What's Changed

  • Increased model loading speed with ollama run, especially if running an already-loaded model
  • Improved performance of /api/show including for large models
  • Fixed issue where the --quantize flag in ollama create would lead to an error
  • Improved model loading times when models would not completely fit in system memory on Linux
  • Fixed issue where certain Modelfile parameters would not be parsed correctly

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.45...v0.1.46

ollama - v0.1.45

Published by github-actions[bot] 4 months ago

New models

  • DeepSeek-Coder-V2: A 16B & 236B open-source Mixture-of-Experts code language model that achieves performance comparable to GPT-4 Turbo in code-specific tasks.

ollama show

ollama show will now show model details such as context length, parameters, embedding size, license and more:

% ollama show llama3
  Model
    arch                llama
    parameters          8.0B
    quantization        Q4_0
    context length      8192
    embedding length    4096

  Parameters
    num_keep    24
    stop        "<|start_header_id|>"
    stop        "<|end_header_id|>"
    stop        "<|eot_id|>"

  License
    META LLAMA 3 COMMUNITY LICENSE AGREEMENT
    Meta Llama 3 Version Release Date: April 18, 2024

What's Changed

  • ollama show <model> will now show model information such as context window size
  • Model loading on Windows with CUDA GPUs is now faster
  • Setting seed in the /v1/chat/completions OpenAI compatibility endpoint no longer changes temperature (see the example after this list)
  • Enhanced GPU discovery and multi-gpu support with concurrency
  • The Linux install script will now skip searching for network devices
  • Introduced a workaround for AMD Vega RX 56 SDMA support on Linux
  • Fixed memory prediction for deepseek-v2 and deepseek-coder-v2 models
  • The /api/show endpoint now returns extensive model metadata
  • GPU configuration variables are now reported in ollama serve
  • Updated Linux ROCm to v6.1.1
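
A minimal sketch of a request through the OpenAI-compatible endpoint where seed is pinned without overriding the requested temperature (the model name is an example):

curl http://localhost:11434/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "llama3",
  "messages": [{"role": "user", "content": "Say hello"}],
  "seed": 101,
  "temperature": 0.7
}'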

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.44...v0.1.45

ollama - v0.1.44

Published by github-actions[bot] 4 months ago

What's Changed

  • Fixed issue where unicode characters such as emojis would not be loaded correctly when running ollama create
  • Fixed certain cases where Nvidia GPUs would not be detected and reported as compute capability 1.0 devices

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.43...v0.1.44

ollama - v0.1.43

Published by github-actions[bot] 4 months ago

Ollama honest work

What's Changed

  • New import.md guide for converting and importing models to Ollama
  • Fixed issue where embedding vectors resulting from /api/embeddings would not be accurate
  • JSON mode responses will no longer include invalid escape characters
  • Removing a model will no longer show incorrect "File not found" errors
  • Fixed issue where running ollama create would result in an error on Windows with certain file formatting

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.42...v0.1.43

ollama - v0.1.42

Published by github-actions[bot] 4 months ago

New models

  • Qwen 2: a new series of large language models from the Alibaba Group

What's Changed

  • Fixed issue where qwen2 would output erroneous text such as GGG on Nvidia and AMD GPUs
  • ollama pull is now faster if it detects a model is already downloaded
  • ollama create will now automatically detect prompt templates for popular model architectures such as Llama, Gemma, Phi, and more (see the sketch after this list)
  • Ollama can now be accessed from local apps built with Electron and Tauri, as well as when developing apps using local HTML files
  • Updated the welcome prompt on Windows to use llama3
  • Fixed issues where /api/ps and /api/tags would show invalid timestamps in responses
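
A sketch of the template auto-detection when importing a model (the GGUF filename and model name are hypothetical):

# No TEMPLATE directive is needed in the Modelfile for recognized architectures
echo 'FROM ./my-llama3-finetune.gguf' > Modelfile
ollama create my-finetune -f Modelfile
ollama run my-finetune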

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.41...v0.1.42

ollama - v0.1.41

Published by github-actions[bot] 5 months ago

What's Changed

  • Fixed an error on Windows 10 and 11 systems with Intel CPUs that have integrated GPUs

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.40...v0.1.41

ollama - v0.1.40

Published by github-actions[bot] 5 months ago

ollama continuing to capture bugs

New models

  • Codestral: Mistral AI’s first-ever code model, designed for code generation tasks.
  • IBM Granite Code: now in 3B and 8B parameter sizes.
  • Deepseek V2: a strong, economical, and efficient Mixture-of-Experts language model

What's Changed

  • Fixed out of memory and incorrect token issues when running Codestral on 16GB Macs
  • Fixed issue where full-width characters (e.g. Japanese, Chinese, Russian) were deleted at the end of the line when using ollama run

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.39...v0.1.40

ollama - v0.1.39

Published by github-actions[bot] 5 months ago

New models

  • Cohere Aya 23: A new state-of-the-art, multilingual LLM covering 23 different languages.
  • Mistral 7B 0.3: A new version of Mistral 7B with initial support for function calling.
  • Phi-3 Medium: a 14B-parameter, lightweight, state-of-the-art open model by Microsoft.
  • Phi-3 Mini 128K and Phi-3 Medium 128K: versions of the Phi-3 models that support a context window size of 128K tokens
  • Granite Code: a family of open foundation models by IBM for code intelligence

Llama 3 import

It is now possible to import and quantize Llama 3 and its finetunes from Safetensors format to Ollama.

First, clone a Hugging Face repo with a Safetensors model:

git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
cd Meta-Llama-3-8B-Instruct

Next, create a Modelfile:

FROM .

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER stop <|start_header_id|>
PARAMETER stop <|end_header_id|>
PARAMETER stop <|eot_id|>

Then, create and quantize a model:

ollama create --quantize q4_0 -f Modelfile my-llama3 
ollama run my-llama3

What's Changed

  • Fixed display issues with wide characters in languages such as Chinese, Korean, Japanese, and Russian
  • Added a new OLLAMA_NOHISTORY=1 environment variable that can be set to disable history when using ollama run
  • New experimental OLLAMA_FLASH_ATTENTION=1 flag for ollama serve that improves token generation speed on Apple Silicon Macs and NVIDIA graphics cards (see the examples after this list)
  • Fixed error that would occur on Windows running ollama create -f Modelfile
  • ollama create can now create models from I-Quant GGUF files
  • Fixed EOF errors when resuming downloads via ollama pull
  • Added a Ctrl+W shortcut to ollama run
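
Both flags are plain environment variables set on the relevant command; a minimal sketch (llama3 is just an example model):

# Disable prompt history for an interactive session
OLLAMA_NOHISTORY=1 ollama run llama3

# Start the server with the experimental flash attention path enabled
OLLAMA_FLASH_ATTENTION=1 ollama serve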

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.38...v0.1.39

ollama - v0.1.38

Published by github-actions[bot] 5 months ago

New Models

  • Falcon 2: A new 11B parameters causal decoder-only model built by TII and trained on over 5T tokens.
  • Yi 1.5: A new high-performing version of Yi, now licensed as Apache 2.0. Available in 6B, 9B and 34B sizes.

What's Changed

ollama ps

A new command is now available: ollama ps. This command displays currently loaded models, their memory footprint, and the processors used (GPU or CPU):

% ollama ps
NAME                 ID              SIZE    PROCESSOR        UNTIL
mixtral:latest       7708c059a8bb    28 GB   47%/53% CPU/GPU  Forever
llama3:latest        a6990ed6be41    5.5 GB  100% GPU         4 minutes from now
all-minilm:latest    1b226e2802db    585 MB  100% GPU         4 minutes from now

/clear

To clear the chat history for a session when running ollama run, use /clear:

>>> /clear
Cleared session context

  • Fixed issue where switching loaded models on Windows would take several seconds
  • Running /save will no longer abort the chat session if an incorrect name is provided
  • The /api/tags API endpoint will now correctly return an empty list [] instead of null if no models are present (a quick check follows below)
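
A quick check of that endpoint on a host with no models installed (a sketch; the response shown is abbreviated):

curl http://localhost:11434/api/tags
# expected: {"models":[]}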

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.37...v0.1.38

ollama - v0.1.37

Published by github-actions[bot] 5 months ago

What's Changed

  • Fixed issue where models with uppercase characters in the name would not show with ollama list
  • Fixed usage string for ollama create
  • Fixed finish_reason being "" instead of null in the OpenAI-compatible chat API

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.36...v0.1.37

ollama - v0.1.36

Published by github-actions[bot] 5 months ago

What's Changed

  • Fixed exit status 0xc0000005 error with AMD graphics cards on Windows
  • Fixed rare out of memory errors when loading a model to run with CPU

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.35...v0.1.36

ollama - v0.1.35

Published by github-actions[bot] 5 months ago

New models

  • Llama 3 ChatQA: A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG).

What's Changed

  • Quantization: ollama create can now quantize models when importing them using the --quantize or -q flag (see the sketch after this note):

ollama create -f Modelfile --quantize q4_0 mymodel

[!NOTE]
--quantize works when importing float16 or float32 models:

  • From binary GGUF files (e.g. FROM ./model.gguf)
  • From a library model (e.g. FROM llama3:8b-instruct-fp16)
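
For example, a float16 library model can be re-quantized through a one-line Modelfile (the model and target names are examples):

echo 'FROM llama3:8b-instruct-fp16' > Modelfile
ollama create -f Modelfile --quantize q4_0 llama3-q4
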
  • Fixed issue where inference subprocesses wouldn't be cleaned up on shutdown.
  • Fixed a series of out-of-memory errors when loading models on multi-GPU systems
  • Ctrl+J characters will now properly add newlines in ollama run
  • Fixed issues when running ollama show for vision models
  • OPTIONS requests to the Ollama API will no longer result in errors
  • Fixed issue where partially downloaded files wouldn't be cleaned up
  • Added a new done_reason field in responses describing why generation stopped
  • Ollama will now more accurately estimate how much memory is available on multi-GPU systems, especially when running different models one after another

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.34...v0.1.35

ollama - v0.1.34

Published by github-actions[bot] 6 months ago

Ollama goes on an adventure to hunt down bugs

New models

  • Llava Llama 3: A new high-performing LLaVA model fine-tuned from Llama 3 Instruct.
  • Llava Phi 3: A new small LLaVA model fine-tuned from Phi 3.
  • StarCoder2 15B Instruct: A new instruct fine-tune of the StarCoder2 model
  • CodeGemma 1.1: A new release of the CodeGemma model.
  • StableLM2 12B: A new 12B version of the StableLM 2 model from Stability AI
  • Moondream 2: runtime parameters have been improved for better responses

What's Changed

  • Fixed issues with LLaVa models where they would respond incorrectly after the first request
  • Fixed out of memory errors when running large models such as Llama 3 70B
  • Fixed various issues with Nvidia GPU discovery on Linux and Windows
  • Fixed a series of Modelfile errors when running ollama create
  • Fixed "no slots available" error that occurred when cancelling a request and then sending follow-up requests
  • Improved AMD GPU detection on Fedora
  • Improved reliability when using the experimental OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS flags
  • ollama serve will now shut down quickly, even if a model is loading

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.33...v0.1.34

ollama - v0.1.33

Published by github-actions[bot] 6 months ago

New models

  • Llama 3: a new model by Meta, and the most capable openly available LLM to date
  • Phi 3 Mini: a new 3.8B-parameter, lightweight, state-of-the-art open model by Microsoft.
  • Dolphin Llama 3: The uncensored Dolphin model, trained by Eric Hartford and based on Llama 3 with a variety of instruction, conversational, and coding skills.
  • Qwen 110B: The first Qwen model over 100B parameters in size with outstanding performance in evaluations

What's Changed

  • Fixed issues where the model would not terminate, causing the API to hang.
  • Fixed a series of out of memory errors on Apple Silicon Macs
  • Fixed out of memory errors when running Mixtral architecture models

Experimental concurrency features

New concurrency features are coming soon to Ollama. They are available now as experimental features:

  • OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously for a single model
  • OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously

To enable these features, set the environment variables when starting ollama serve. For more information, see this guide:

OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=4 ollama serve

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.32...v0.1.33-rc5

ollama - v0.1.32

Published by github-actions[bot] 6 months ago

What's Changed

  • Support for larger models such as mixtral:8x22b and command-r-plus
  • Ollama will now better estimate memory utilization when loading models, leading to fewer out-of-memory errors, as well as better GPU utilization
  • Fixed several issues where Ollama would hang upon encountering an error
  • Fixed issue where using quotes in OLLAMA_ORIGINS would cause an error

To install this pre-release version on Linux:

curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.1.32-rc1 sh

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.31...v0.1.32-rc1

ollama - v0.1.31

Published by github-actions[bot] 7 months ago

New models

  • Qwen 1.5 32B: A new 32B multilingual model competitive with larger models such as Mixtral
  • StarlingLM Beta: A 7B model that ranks above other 7B models on popular benchmarks and includes a permissive Apache 2.0 license.

What's Changed

  • Fixed issue where Ollama would hang when using unicode characters in the prompt such as emojis

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.30...v0.1.31

ollama - v0.1.30

Published by github-actions[bot] 7 months ago

New models

  • Command R: a large language model optimized for conversational interaction and long-context tasks.
  • mxbai-embed-large: A new state-of-the-art large embedding model

What's Changed

  • Fixed various issues with ollama run on Windows
    • History now will work when pressing up and down arrow keys
    • Right and left arrow keys will now move the cursor appropriately
    • Pasting multi-line strings will now work on Windows
  • Fixed issue where mounting or sharing files between Linux and Windows (e.g. via WSL or Docker) would cause errors due to having : in the filename.
  • Improved support for AMD MI300 and MI300X Accelerators
  • Improved cleanup of temporary files resulting in better space utilization

Important change

For filesystem compatibility, Ollama has changed model data filenames to use - instead of :. This change will be applied automatically. If downgrading from 0.1.30 to 0.1.29 or lower (on Linux or macOS only), run:

find ~/.ollama/models/blobs -type f -exec bash -c 'mv "$0" "${0//-/:}"' {} \;

Full Changelog: https://github.com/ollama/ollama/compare/v0.1.29...v0.1.30
