Starter examples for using Next.js and the Vercel AI SDK with llama.cpp and ModelFusion.
This starter example shows how to use Next.js, the Vercel AI SDK, llama.cpp, and ModelFusion to create a ChatGPT-like, AI-powered streaming chatbot.
```sh
git clone https://github.com/lgrammel/modelfusion-llamacpp-nextjs-starter.git
cd modelfusion-llamacpp-nextjs-starter
npm install
npm run dev
```

Then open http://localhost:3000 (the Next.js dev server default) in your browser.
For each example, you also need to download the corresponding GGUF model and start the llama.cpp server with it (a download sketch follows the list). Adjust the model path to wherever you stored the file:

- Llama 2 (route: `app/api/llama/route.ts`):
  `./server -m models/llama-2-7b-chat.Q4_K_M.gguf`
- Mistral (route: `app/api/mistral/route.ts`):
  `./server -m models/mistral-7b-instruct-v0.2.Q4_K_M.gguf`
- Mixtral (route: `app/api/mixtral/route.ts`):
  `./server -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf`
- OpenHermes 2.5 (route: `app/api/openhermes/route.ts`):
  `./server -m models/openhermes-2.5-mistral-7b.Q4_K_M.gguf`
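As a sketch of the download step: assuming you use the Q4_K_M GGUF files that TheBloke publishes on Hugging Face (the repo and file name below are assumptions; any other GGUF source works the same way), fetching the Llama 2 model could look like this:

```sh
# Create the models directory and download the Llama 2 7B Chat GGUF file.
# Repo and file name are assumptions; swap in the quantization you actually want.
mkdir -p models
curl -L -o models/llama-2-7b-chat.Q4_K_M.gguf \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf
```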
The Llama 2 chat route (`app/api/llama/route.ts`) calls the llama.cpp server through ModelFusion and streams the answer back to the client:

```ts
import { ModelFusionTextStream, asChatMessages } from "@modelfusion/vercel-ai";
import { Message, StreamingTextResponse } from "ai";
import { llamacpp, streamText, trimChatPrompt } from "modelfusion";

export const runtime = "edge"; // run this route on the Edge runtime

export async function POST(req: Request) {
  const { messages }: { messages: Message[] } = await req.json();

  const model = llamacpp
    .CompletionTextGenerator({
      promptTemplate: llamacpp.prompt.Llama2, // choose the prompt template that matches the model
      temperature: 0,
      cachePrompt: true, // reuse the llama.cpp prompt cache across requests
      contextWindowSize: 4096, // Llama 2 context window size
      maxGenerationTokens: 512, // room for the answer
    })
    .withChatPrompt();

  // Use ModelFusion to call llama.cpp:
  const textStream = await streamText({
    model,
    // reduce the chat prompt length to fit the context window:
    prompt: await trimChatPrompt({
      model,
      prompt: {
        system:
          "You are an AI chat bot. " +
          "Follow the user's instructions carefully.",

        // map Vercel AI SDK Message to ModelFusion ChatMessage:
        messages: asChatMessages(messages),
      },
    }),
  });

  // Return the result using the Vercel AI SDK:
  return new StreamingTextResponse(
    ModelFusionTextStream(
      textStream,
      // optional callbacks:
      {
        onStart() {
          console.log("onStart");
        },
        onToken(token) {
          console.log("onToken", token);
        },
        onCompletion() {
          console.log("onCompletion");
        },
        onFinal(completion) {
          console.log("onFinal", completion);
        },
      }
    )
  );
}
```
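For reference, a minimal client page that consumes this route could use the Vercel AI SDK's `useChat` hook. This is a sketch rather than the starter's own page code; the `api` path assumes the Llama 2 route above:

```tsx
"use client";

import { useChat } from "ai/react";

export default function Chat() {
  // useChat POSTs { messages } to the given API route and streams the reply
  // into the messages array as tokens arrive.
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/llama",
  });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role === "user" ? "User: " : "AI: "}
          {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Say something..."
        />
      </form>
    </div>
  );
}
```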