Experimental front-end client library for interacting with llama.cpp
The llama.cpp client is an experimental front-end client library for interacting with llama.cpp, a powerful tool for natural language processing and text generation. This client enables seamless communication with the llama.cpp server, making it easy to integrate and interact with llama.cpp's capabilities.
- Interact with the llama.cpp server using a simple API, CLI, or web UI.
- Use the llama.cpp server for text generation and conversation.

NOTE: All interfaces are currently a WIP (work in progress).
To get started with the llama.cpp client, follow these steps:
Clone the repositories: Use Git to clone both the llama-cpp-client and llama.cpp repositories onto your local machine or server.
git clone https://github.com/teleprint-me/llama-cpp-client
cd llama-cpp-client
Note that git will ignore the llama.cpp repository.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
Build and install llama.cpp: Use the provided instructions to build and install llama.cpp. For example, you can use CMake to build the library with ROCm support. I personally prefer Vulkan when using AMD because Vulkan has better support for a wider range of GPUs than ROCm does.
Build the library with Vulkan support.
make LLAMA_VULKAN=1 # GPU
Build the library with CUDA support.
make LLAMA_CUDA=1 # GPU
Build the library with BLAS support.
make LLAMA_OPENBLAS=1 # CPU
Run the llama.cpp server: Use the provided instructions to run the llama.cpp server with your chosen model and configuration settings.
./llama.cpp/server -m [model path here] --ctx-size [int] --n-gpu-layers [int] --path app
Note that you can extend the front end by running the server binary with --path.
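Once the server is up, it can help to confirm it is reachable before wiring up the client. The snippet below is a minimal sketch using only the Python standard library; it assumes the default host and port used in the examples that follow, and that your llama.cpp build exposes the server's /health endpoint (older builds may not).

import json
import urllib.request

# Query the llama.cpp server's health endpoint (assumes the host/port used below).
with urllib.request.urlopen("http://127.0.0.1:8080/health") as response:
    print(json.load(response))  # e.g. a small JSON status object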
How to use the web user interface
Open your preferred web browser and visit localhost:8080 to access the llama.cpp client's web UI. From here, you can interact with the llama.cpp server for text generation and conversation.
Note: The WebUI is currently a limited prototype for completions.
How to use the command-line interface
python -m llama_cpp_client.client -n llama-3-test --stop "<|eot_id|>"
How to use the application programming interface
import sys

from llama_cpp_client.request import LlamaCppRequest

# Initialize the LlamaCppRequest instance
llama_cpp_request = LlamaCppRequest(base_url="http://127.0.0.1", port="8080")

# Define the prompt for the model
llama_prompt = "Once upon a time"

# Prepare data for the streaming request
llama_data = {"prompt": llama_prompt, "stream": True}

# Request the model's stream generator
llama_generator = llama_cpp_request.stream("/completion", data=llama_data)

# Track completions
completions = []

# Generate the model's response
llama_output = ""
for response in llama_generator:
    if "content" in response:
        token = response["content"]
        llama_output += token
        # Print each token to the user
        print(token, end="")
        sys.stdout.flush()

# Add padding to the model's output
print()

# Append the completion
completions.append({"prompt": llama_prompt, "output": llama_output})
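Because /completion is the llama.cpp server's own endpoint, the same request dictionary can carry the server's sampling options as well. The following is a sketch rather than part of the documented client API surface: it reuses the objects from the example above and assumes the server's n_predict, temperature, and stop fields are available in your llama.cpp build.

# Continue the story by feeding the previous output back in as context.
followup_prompt = llama_prompt + llama_output + " The next morning"
followup_data = {
    "prompt": followup_prompt,
    "stream": True,
    "n_predict": 128,    # cap the number of generated tokens
    "temperature": 0.7,  # sampling temperature
    "stop": ["\n\n"],    # stop generating at a blank line
}

# Stream the follow-up completion just like before.
followup_output = ""
for response in llama_cpp_request.stream("/completion", data=followup_data):
    if "content" in response:
        followup_output += response["content"]

completions.append({"prompt": followup_prompt, "output": followup_output})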
Note that most of the Python API modules for llama_cpp_client can be executed as CLI tools, providing an example, test, and output sample all in one place.
python -m llama_cpp_client.request
The general idea is to keep the implementation as simple as possible for now.
Check out the source code for more examples.
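In that same spirit, the completion history collected above is just a list of dictionaries, so it can be saved for later inspection with nothing but the standard library (the completions.json filename here is only an illustration):

import json

# Write the prompt/output pairs gathered above to disk.
with open("completions.json", "w") as file:
    json.dump(completions, file, indent=2)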
By following these steps, you should be able to get started with the llama.cpp client and begin exploring its capabilities. For more detailed documentation and examples, please refer to the llama-cpp-client documentation.
This project is licensed under the AGPL-3.0 License.
Special thanks to the llama.cpp team for developing an incredible natural language processing tool.