
Develop LangChain using local LLMs with Ollama

MIT License


Ollama LangChain Guide


  • LLM costs getting you down?
  • Want to develop offline?
  • Don't want to share your personal data with LLM providers?
  • Save costs, develop anywhere, and own all your data with Ollama and LangChain!

Before you start

  • This tutorial requires several terminals to be open and running processes at the same time, e.g. to keep the Ollama server running while you run other commands.
  • When you see the 🆕 emoji before a set of terminal commands, open a new terminal process.
  • When you see the ♻️ emoji before a set of terminal commands, you can re-use the same terminal you used last time.


  1. Download and install Ollama and start the server.


curl -fsSL https://ollama.com/install.sh | sh
ollama serve
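
Before moving on, it can help to confirm that the server started above is reachable. The following is a minimal sketch using only the Python standard library. It assumes Ollama is listening on its default address, `http://localhost:11434`; the helper names `build_url` and `server_is_up` are hypothetical and not part of this repository.

```python
import urllib.error
import urllib.request

# Assumption: Ollama is listening on its default address and port.
OLLAMA_BASE_URL = "http://localhost:11434"


def build_url(base_url: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def server_is_up(base_url: str = OLLAMA_BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET on the server root answers with status 200."""
    try:
        with urllib.request.urlopen(build_url(base_url, "/"), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return False
```

Calling `server_is_up()` from a Python shell in a second terminal should return `True` while `ollama serve` is running, and `False` once it has been stopped.
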
  2. Download and install Poetry.

  3. Clone this repository (or your fork of it) and set up the Poetry environment:


git clone https://github.com/Cutwell/ollama-langchain-guide.git
cd ollama-langchain-guide
poetry install


  4. Browse the available Ollama models and select a model.
  • Think about your computer's available RAM and GPU memory when picking the model and quantisation level.
  • We will be using the phi-2 model from Microsoft (Ollama, Hugging Face) as it is both small and fast.
  • Read this summary for advice on prompting the phi-2 model optimally.


ollama pull phi
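
The memory consideration in this step can be roughed out with simple arithmetic: the weights take about (number of parameters × bits per weight ÷ 8) bytes. The sketch below applies that to phi-2, which has about 2.7 billion parameters. The function names and the 20% `overhead` factor (for context cache and runtime buffers) are illustrative assumptions, not measured values.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_memory(n_params: float, bits_per_weight: float,
                   available_gb: float, overhead: float = 1.2) -> bool:
    """Rough check: weights plus an assumed 20% overhead must fit in memory."""
    return estimate_weights_gb(n_params, bits_per_weight) * overhead <= available_gb


PHI2_PARAMS = 2.7e9  # phi-2 has about 2.7 billion parameters

for bits in (16, 8, 4):
    size = estimate_weights_gb(PHI2_PARAMS, bits)
    print(f"{bits}-bit weights: ~{size:.2f} GB")
```

This is only a lower bound on what the model needs; actual usage depends on the runtime and on how much context is in use.
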
  5. Run the model.
  • The model is served by the Ollama server started in step 1, so it can be queried with LangChain, or you can interact with it directly in this terminal.


ollama run phi
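
Querying the model from LangChain might look like the sketch below. It assumes the `Ollama` class from `langchain_community.llms` is available in the installed LangChain version (newer releases provide a similar class in the separate `langchain-ollama` package), and that the server is on its default address. The helpers `make_llm` and `ask` are hypothetical names, not taken from this repository.

```python
def make_llm(model: str = "phi", base_url: str = "http://localhost:11434"):
    """Create a LangChain LLM backed by the local Ollama server.

    Assumption: langchain_community is installed and exposes `Ollama`.
    """
    # Imported lazily so this module still loads without LangChain installed.
    from langchain_community.llms import Ollama

    return Ollama(model=model, base_url=base_url)


def ask(llm, question: str) -> str:
    """Send one question to the model and return the reply without padding."""
    # `llm` can be any object with an `invoke(str) -> str` method.
    return llm.invoke(question).strip()


# Example usage (requires the Ollama server to be running with phi pulled):
# llm = make_llm()
# print(ask(llm, "Why is the sky blue?"))
```

Keeping `ask` independent of the concrete LLM class makes it easy to swap in a stub when the server is not available.
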
  6. Run the pytest tests in /ollama_langchain_guide/tests to check everything is working correctly.


poetry run pytest -rP ollama_langchain_guide/tests
  7. Get started building your own local LLM projects with the example Streamlit app in /ollama_langchain_guide/src.


poetry run streamlit run ollama_langchain_guide/src/app.py --server.port=8080

Pros and Cons of Phi-2

| Pros | Cons |
| --- | --- |
| Natural language, human-like outputs. | Can distract itself: prone to creating logic puzzles based on user queries and then trying to solve them itself. |
| Context window of 2048 tokens, so it can use chat history in answers. | Often ignores established facts in chat history and answers the same question in multiple ways within one conversation. |
| Can output syntactically correct Python code. | Bad at generating code that achieves the desired goal, e.g. it outputs a syntactically correct function to calculate Pi, but the results are garbage. |
| Very fast response time. | |
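
Because the context window is limited to 2048 tokens, long conversations need to be trimmed before they are sent to the model. The sketch below keeps only the most recent messages that fit a token budget. It is an illustration under stated assumptions: token counts are estimated at roughly four characters per token rather than with the model's real tokenizer, and `approx_tokens`, `trim_history` and the `reserve` value are hypothetical.

```python
from __future__ import annotations


def approx_tokens(text: str) -> int:
    """Very rough token estimate: about four characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[str], budget: int = 2048,
                 reserve: int = 256) -> list[str]:
    """Keep the most recent messages whose estimated size fits the budget.

    `reserve` tokens are held back for the model's reply.
    """
    remaining = budget - reserve
    kept: list[str] = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = approx_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping the oldest messages first is the simplest policy; summarising older turns is an alternative when early facts need to survive.
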
