A wrapper class for interacting with a LLaMA model instance loaded locally. Use any GGUF model; you can find models on huggingface.co. This project is based on llama.cpp by ggerganov, used via node-llama-cpp by withcatai.
Creates a new instance of the LlamaWrapper class, assigning it a unique ID and initializing its internal state.
Loads the LLaMA module required for interacting with the model. Throws an error if the module cannot be loaded.
Loads the LLaMA library instance, optionally specifying a GPU device to use. Throws an error if the module isn't initialized.
Loads a specific LLaMA model from the specified modelPath. Throws an error if the LLaMA library isn't loaded.
Initializes a new chat session with the wrapped LLaMA model using the specified systemPrompt. Throws an error if the model isn't initialized.
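A minimal initialization sketch following the order above (module, library, model, session); the model path and system prompt are placeholders:

```ts
import { LlamaWrapper } from 'llama3-wrapper';

const wrapper = new LlamaWrapper();
await wrapper.loadModule();                                // load the underlying node-llama-cpp module
await wrapper.loadLlama();                                 // load the library; optionally pass a Gpu device
await wrapper.loadModel('/path/to/my-model.gguf');         // placeholder GGUF model path
await wrapper.initSession('You are a helpful assistant.'); // placeholder systemPrompt
```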
Returns the current status of the wrapper as an object containing the status and an optional message. The status is one of LlamaStatusType (Uninitialized, Ready, Loading, Generating, or Error).
Returns a boolean indicating whether the wrapper is in the ready state.
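For illustration only, a status check could look like the sketch below; the accessor names getStatus and isReady are assumptions inferred from the descriptions above, not confirmed signatures:

```ts
// Hypothetical accessor names, inferred from the descriptions above.
if (!wrapper.isReady()) {
  const { status, message } = wrapper.getStatus();
  console.warn(`Wrapper not ready (${status})`, message ?? '');
}
```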
Generates an answer to the specified message and calls the optional onToken callback function with each generated chunk. Throws an error if the session isn't initialized.
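A minimal streaming sketch, assuming onToken is the optional second argument of prompt and that each generated chunk is a string:

```ts
const answer = await wrapper.prompt('Explain GGUF in one sentence.', (chunk) => {
  process.stdout.write(chunk); // print each chunk as it is generated
});
console.log('\nFull answer:', answer);
```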
Retrieves the chat history associated with this wrapper's current session. Throws an error if no session is found.
Sets the chat history associated with this wrapper's current session to the specified chatHistoryItem array. Throws an error if no session is found.
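A sketch of saving and restoring a conversation; getHistory is used in the sample at the end of this document, while the setter name setHistory is an assumption based on the description above:

```ts
const saved = await wrapper.getHistory();
// ...later, after initializing a fresh session:
await wrapper.setHistory(saved); // setter name assumed from the description above
```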
Returns the unique ID assigned to this wrapper instance.
Returns an object containing various information about the wrapped LLaMA model, including its ID, model filename, train context size, and more.
Disposes of the current chat session associated with this wrapper. Throws an error if no session is found.
Clears the history of the current chat sequence associated with this wrapper. Throws an error if no session or sequence is found.
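A hedged teardown sketch; disposeSession appears in the usage sample below, while the clearHistory name is assumed from its description:

```ts
await wrapper.clearHistory();   // assumed method name: clears the current sequence's history
await wrapper.disposeSession(); // releases the chat session
```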
The Gpu type represents a GPU device to use.
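As a sketch, a device could be passed when loading the library; the exact accepted values are defined by the wrapper (node-llama-cpp uses strings such as 'metal', 'cuda', or 'vulkan'), so the value below is an assumption:

```ts
await wrapper.loadLlama('cuda'); // device value is an assumption; see the Gpu type
```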
The LlamaStatusType type defines the possible status values for a LlamaWrapper instance.
The ChatHistoryItem type represents a single item in the chat history; it can be a ChatSystemMessage, a ChatUserMessage, or a ChatModelResponse.
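For reference, the sketch below mirrors the history array used in the usage sample at the end of this document:

```ts
import { type ChatHistoryItem } from 'llama3-wrapper';

const history: ChatHistoryItem[] = [
  { type: 'system', text: 'You are concise.' }, // ChatSystemMessage
  { type: 'user', text: 'Hey.' },               // ChatUserMessage
  { type: 'model', response: ['Hello !'] },     // ChatModelResponse
];
```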
By following the steps below, you can build and install the module from source code.
```sh
git clone https://github.com/tib0/llama3-wrapper.git
cd ./llama3-wrapper
pnpm i
pnpm build
pnpm link -g
```

Then link the built package into your target project:

```sh
cd /path/to/target-project
pnpm link -g llama3-wrapper
```
Add your GGUF model path in a .env file at the root of your project:
```sh
LLAMA_MODELS_PATH=/Users/me/example/LLM/Models/my-model-file.gguf
```
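Note that Node.js does not read .env files automatically; one common approach, assuming the dotenv package is installed, is sketched below (Node 20.6+ can instead be started with node --env-file=.env):

```ts
import 'dotenv/config'; // assumes the dotenv package is installed

const modelPath = process.env.LLAMA_MODELS_PATH;
if (!modelPath) throw new Error('LLAMA_MODELS_PATH is not set');
```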
Sample chat-style usage in a terminal:
```ts
import { type ChatHistoryItem, LlamaWrapper } from 'llama3-wrapper';
import readline from 'readline';
import { spawn } from 'node:child_process';

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout,
});

// Example system prompt; the original snippet referenced `promptSystem` without defining it.
const promptSystem = 'You are a helpful assistant.';

const run = async () => {
  console.log(`# START LLAMA CHAT`);
  console.log(`\n`);
  console.log(`# Feeding history traces`);
  const history: ChatHistoryItem[] = [
    { type: 'user', text: 'Hey.' },
    { type: 'model', response: ['Hello !'] },
  ];
  console.log(`# Waiting seat allocation`);
  const modelPath = process.env.LLAMA_MODELS_PATH;
  if (!modelPath) throw new Error('LLAMA_MODELS_PATH is not set');
  const llamaNodeCPP = new LlamaWrapper();
  await llamaNodeCPP.loadModule();
  await llamaNodeCPP.loadLlama();
  await llamaNodeCPP.loadModel(modelPath);
  await llamaNodeCPP.initSession(promptSystem);
  await llamaNodeCPP.setHistory(history); // setter name assumed from the docs above
  console.log(`# Prompt ready`);
  console.log(`# Activated TTS (voice)`);
  console.log(`\n`);
  rl.setPrompt('1 > ');
  rl.prompt();
  let i = 1;
  rl.on('line', async (q) => {
    if (!q || q === '' || q === 'exit' || q === 'quit' || q === 'q') {
      rl.close();
    } else {
      const a = await llamaNodeCPP.prompt(q);
      console.log(`${i} @ ${a}`);
      spawn('say', [a]); // macOS text-to-speech; remove or replace on other platforms
      console.log(`\n`);
      i++;
    }
    rl.setPrompt(`${i} > `);
    rl.prompt();
  }).on('close', async () => {
    console.log(`\n`);
    console.log(`Disposing session...`);
    // Fetch the history before disposing: getHistory requires a live session.
    const a = await llamaNodeCPP.getHistory();
    await llamaNodeCPP.disposeSession();
    console.log(`\n`);
    console.log(`History:`);
    console.log(JSON.stringify(a));
    console.log(`\n`);
    console.log('# END LLAMA CHAT');
    process.exit(0);
  });
};

run();
```