Completion After Prompt Probability. Make your LLM make a choice
APACHE-2.0 License
Make your LLM pick from a list of choices. Or compute the probability of a completion given a prompt, which may be useful. Squeeze more out of open-source LLMs.
from llama_cpp import Llama
from cappr.llama_cpp.classify import predict
model = Llama("./TinyLLama-v0.Q8_0.gguf", verbose=False)
prompt = """Gary told Spongebob a story:
There once was a man from Peru; who dreamed he was eating his shoe. He
woke with a fright, in the middle of the night, to find that his dream
had come true.
The moral of the story is to"""
completions = (
    "look at the bright side",
    "use your imagination",
    "eat shoes",
)
pred = predict(prompt, completions, model)
print(pred)
# use your imagination
See this page of the documentation for more info on using GGUF models.
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "Which planet is closer to the Sun: Mercury or Earth?"
completions = ("Mercury", "Earth")
pred = predict(prompt, completions, model_and_tokenizer=(model, tokenizer))
print(pred)
# Mercury
See this page of the documentation for more info on using transformers models.
Many prompts start with the same set of instructions, e.g., a system prompt plus a handful of example input-output pairs. Instead of repeatedly running the model on common instructions, cache them so that future computations are faster.
Here's an example using cappr.huggingface.classify.cache_model.
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import cache_model, predict
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model_and_tokenizer = (model, tokenizer)
# Create data
prompt_prefix = '''Instructions: complete the sequence.
Here are examples:
A, B, C => D
1, 2, 3 => 4
Complete this sequence:'''
prompts = ["X, Y =>", "10, 9, 8 =>"]
completions = ["7", "Z", "Hi"]
# Cache prompt_prefix because it's used for all prompts
cached_model_and_tokenizer = cache_model(
    model_and_tokenizer, prompt_prefix
)
# Compute
preds = predict(
    prompts, completions, cached_model_and_tokenizer
)
print(preds)
# ['Z', '7']
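To see why caching the shared prefix pays off, here is a toy sketch, not CAPPr's implementation. The hypothetical `ToyModel` just counts how many tokens it processes, and its "state" is a tuple of tokens seen so far, an assumption made to keep the example self-contained. Real transformer models get the same effect by reusing the attention key/value states computed for the prefix (exposed as `past_key_values` in Hugging Face transformers). Tokens are whitespace-separated words from a shortened version of the prefix above.

```python
from typing import List, Tuple


class ToyModel:
    """Hypothetical stand-in for an LLM: processing a token costs one unit of work."""

    def __init__(self) -> None:
        self.tokens_processed = 0

    def extend(self, state: Tuple[str, ...], tokens: List[str]) -> Tuple[str, ...]:
        # Process new tokens on top of an existing state (like appending to a KV cache)
        self.tokens_processed += len(tokens)
        return state + tuple(tokens)


def run_uncached(model: ToyModel, prefix: List[str], prompts: List[List[str]]):
    # Re-process the shared prefix for every prompt
    return [model.extend((), prefix + prompt) for prompt in prompts]


def run_cached(model: ToyModel, prefix: List[str], prompts: List[List[str]]):
    # Process the shared prefix once, then extend the cached state per prompt
    prefix_state = model.extend((), prefix)
    return [model.extend(prefix_state, prompt) for prompt in prompts]


prefix = "Instructions : complete the sequence . A , B , C => D".split()
prompts = [p.split() for p in ["X , Y =>", "10 , 9 , 8 =>"]]

uncached_model, cached_model = ToyModel(), ToyModel()
uncached_states = run_uncached(uncached_model, prefix, prompts)
cached_states = run_cached(cached_model, prefix, prompts)

print(uncached_model.tokens_processed, cached_model.tokens_processed)
```

Both runs end in identical states, but the cached run processes the 13 prefix tokens once instead of once per prompt. The savings grow with the number of prompts and the length of the shared prefix.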
Here's an example using cappr.huggingface.classify.log_probs_conditional.
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import log_probs_conditional
# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# Create data
prompts = ["x y", "a b c"]
completions = ["z", "d e"]
# Compute
log_probs_completions = log_probs_conditional(
    prompts, completions, model_and_tokenizer=(model, tokenizer)
)
# Outputs (rounded) next to their symbolic representation
print(log_probs_completions[0])
# [[-4.5], [[log Pr(z | x, y)],
# [-5.6, -3.2]] [log Pr(d | x, y), log Pr(e | x, y, d)]]
print(log_probs_completions[1])
# [[-9.7], [[log Pr(z | a, b, c)],
# [-0.2, -0.03]] [log Pr(d | a, b, c), log Pr(e | a, b, c, d)]]
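The nested structure follows from the chain rule of probability: a multi-token completion's log-probability is the sum of its per-token conditional log-probabilities. Plugging in the rounded outputs for the second prompt (so the result is only approximate):

```latex
\log \Pr(d, e \mid a, b, c) = \log \Pr(d \mid a, b, c) + \log \Pr(e \mid a, b, c, d) \approx -0.2 + (-0.03) = -0.23
\Pr(d, e \mid a, b, c) \approx e^{-0.23} \approx 0.79
```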
Efficiently aggregate these log-probabilities using cappr.utils.classify.agg_log_probs.
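As a rough illustration of what aggregating means here, the sketch below reduces each completion's token log-probabilities to one score, exponentiates it, and normalizes across completions. It is not the implementation of `agg_log_probs`; the `aggregate` helper and the choice between summing and averaging are assumptions for illustration. Summing gives the chain-rule log-probability of the whole completion, while averaging is one common way to avoid penalizing longer completions.

```python
import math
from statistics import fmean
from typing import Callable, List, Sequence


def aggregate(
    log_probs: Sequence[Sequence[Sequence[float]]],
    func: Callable[[Sequence[float]], float] = sum,
) -> List[List[float]]:
    """Hypothetical helper: for each prompt, turn per-token log-probs into
    one normalized probability per completion."""
    all_probs = []
    for prompt_log_probs in log_probs:
        # One unnormalized probability per completion
        scores = [math.exp(func(tokens)) for tokens in prompt_log_probs]
        total = sum(scores)
        all_probs.append([score / total for score in scores])
    return all_probs


# Rounded log-probabilities from the log_probs_conditional example above
log_probs_completions = [
    [[-4.5], [-5.6, -3.2]],   # prompt "x y":   completions "z", "d e"
    [[-9.7], [-0.2, -0.03]],  # prompt "a b c": completions "z", "d e"
]

summed = aggregate(log_probs_completions, func=sum)
averaged = aggregate(log_probs_completions, func=fmean)
print([[round(p, 3) for p in probs] for probs in summed])
print([[round(p, 3) for p in probs] for probs in averaged])
```

With these rounded numbers, summing favors "z" for the first prompt while averaging slightly favors "d e", so the choice of aggregation can change the prediction.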
For a slightly more advanced demo, see ./demos/huggingface/dpo.ipynb.
Step-by-step and chain-of-thought prompts are highly effective ways to get an LLM to "reason" about more complex tasks. But if you need a structured output, a step-by-step completion is unwieldy. Use CAPPr to extract the final answer from these types of completions, given a list of possible answers.
See this idea in action in the documentation.
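One way to set this up, sketched under assumptions: append the model's step-by-step completion to the original question, end with a short answer cue, and let CAPPr score the list of possible answers as completions of that prompt. The `build_answer_prompt` helper and its "Therefore, the answer is" cue are hypothetical; only the resulting `prompt` and `completions` would be passed to a function like `predict`.

```python
from typing import Tuple


def build_answer_prompt(
    question: str, reasoning: str, answer_cue: str = "Therefore, the answer is"
) -> str:
    """Hypothetical helper: turn a chain-of-thought completion into a prompt
    whose natural continuation is the final answer."""
    return f"{question}\n{reasoning.strip()}\n{answer_cue}"


question = "Which planet is closer to the Sun: Mercury or Earth? Let's think step by step."
# Pretend this reasoning was generated by the LLM
reasoning = (
    "Mercury is the first planet from the Sun. "
    "Earth is the third planet from the Sun."
)
completions: Tuple[str, ...] = ("Mercury", "Earth")

prompt = build_answer_prompt(question, reasoning)
print(prompt)
# With a loaded model, one could then call, e.g.:
# predict(prompt, completions, model_and_tokenizer=(model, tokenizer))
```

The answer cue is what makes each candidate read as a naturally flowing continuation, e.g. "... Therefore, the answer is Mercury".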
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba
# Load a model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompts = [
    "Stephen Curry is a",
    "Martina Navratilova was a",
    "Dexter, from the TV Series Dexter's Laboratory, is a",
    "LeBron James is a",
]
# Each of the prompts could be completed with one of these:
class_names = ("basketball player", "tennis player", "scientist")
# Say I expect most of my data to have scientists
prior = (1/6, 1/6, 2/3)
# Run CAPPr
pred_probs = predict_proba(
    prompts=prompts,
    completions=class_names,
    model_and_tokenizer=(model, tokenizer),
    batch_size=2,  # whatever fits on your CPU/GPU
    prior=prior,
)
# pred_probs[i,j] = probability that prompts[i] is classified as class_names[j]
print(pred_probs.round(1))
# [[0.5 0.3 0.2]
# [0.3 0.6 0.2]
# [0.1 0.1 0.8]
# [0.8 0.2 0. ]]
# For each prompt, which completion is most likely?
pred_class_idxs = pred_probs.argmax(axis=-1)
preds = [class_names[pred_class_idx] for pred_class_idx in pred_class_idxs]
print(preds)
# ['basketball player',
# 'tennis player',
# 'scientist',
# 'basketball player']
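The `prior` encodes how common you expect each class to be, regardless of the prompt. A minimal sketch of the standard Bayesian way to combine a prior with per-class likelihoods (multiply, then renormalize) is below. It assumes `prior` behaves roughly like this and is not CAPPr's code; the likelihood values are hypothetical placeholders, not model output.

```python
from typing import List, Sequence


def apply_prior(likelihoods: Sequence[float], prior: Sequence[float]) -> List[float]:
    """Posterior is proportional to likelihood * prior, renormalized to sum to 1."""
    if len(likelihoods) != len(prior):
        raise ValueError("likelihoods and prior must have the same length")
    unnormalized = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


class_names = ("basketball player", "tennis player", "scientist")
prior = (1/6, 1/6, 2/3)

# Hypothetical normalized likelihoods for one prompt (not real model output)
likelihoods = [0.4, 0.4, 0.2]

uniform = apply_prior(likelihoods, (1/3, 1/3, 1/3))
weighted = apply_prior(likelihoods, prior)
print([round(p, 2) for p in uniform])
print([round(p, 2) for p in weighted])
```

With a uniform prior the likelihoods pass through unchanged; the scientist-heavy prior shifts probability toward "scientist" (from 0.2 to 0.5 here).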
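Again, let's predict probabilities, this time for examples that each come with their own completions (and, optionally, their own prior).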
from transformers import AutoModelForCausalLM, AutoTokenizer
from cappr.huggingface.classify import predict_proba_examples
from cappr import Example
# Load a model and its tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Create a sequence of Example objects representing your classification tasks
examples = [
    Example(
        prompt="Jodie Foster played",
        completions=("Clarice Starling", "Trinity in The Matrix"),
    ),
    Example(
        prompt="Batman, from Batman: The Animated Series, was played by",
        completions=("Pete Holmes", "Kevin Conroy", "Spongebob!"),
        prior=(1/3, 2/3, 0),
    ),
]
# Run CAPPr
pred_probs = predict_proba_examples(
    examples, model_and_tokenizer=(model, tokenizer)
)
# pred_probs[i][j] = probability that examples[i].prompt is classified as
# examples[i].completions[j]
print([example_pred_probs.round(2) for example_pred_probs in pred_probs])
# [array([0.7, 0.3]),
# array([0.03, 0.97, 0. ])]
# For each example, which completion is most likely?
pred_class_idxs = [
    example_pred_probs.argmax() for example_pred_probs in pred_probs
]
preds = [
    example.completions[pred_class_idx]
    for example, pred_class_idx in zip(examples, pred_class_idxs)
]
print(preds)
# ['Clarice Starling',
# 'Kevin Conroy']
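Because each `Example` can have a different number of completions, the result is a list of arrays rather than one matrix. Here is a dependency-free sketch of that ragged bookkeeping. `ToyExample` is a hypothetical stand-in for `cappr.Example`, the uniform default prior is an assumption, and the likelihoods are placeholders, not model output. It also shows why a prior of `0` (as given for "Spongebob!") forces that completion's probability to exactly 0.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToyExample:
    """Hypothetical stand-in for cappr.Example."""
    prompt: str
    completions: Tuple[str, ...]
    prior: Optional[Tuple[float, ...]] = None


def posterior(example: ToyExample, likelihoods: Sequence[float]) -> List[float]:
    # Default to a uniform prior over this example's own completions
    n = len(example.completions)
    prior = example.prior if example.prior is not None else (1 / n,) * n
    unnormalized = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


examples = [
    ToyExample("Jodie Foster played", ("Clarice Starling", "Trinity in The Matrix")),
    ToyExample(
        "Batman, from Batman: The Animated Series, was played by",
        ("Pete Holmes", "Kevin Conroy", "Spongebob!"),
        prior=(1/3, 2/3, 0),
    ),
]
# Placeholder likelihoods, one per completion (lengths differ per example)
likelihoods = [[0.6, 0.4], [0.2, 0.5, 0.3]]

pred_probs = [posterior(ex, lik) for ex, lik in zip(examples, likelihoods)]
preds = [ex.completions[argmax(p)] for ex, p in zip(examples, pred_probs)]
print(preds)
```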
See the demos for demonstrations of slightly harder classification tasks.
For CAPPr, GPTQ models are the most computationally performant. These models are compatible with cappr.huggingface.classify. See this page of the documentation for more info on using these models.
See this page of the documentation.
Reduce engineering complexity.
See this page of the documentation for more info.
You input a prompt string, an end_of_prompt string (whitespace or empty), and a set of candidate completion strings such that the string {prompt}{end_of_prompt}{completion} is a naturally flowing thought. CAPPr picks the completion which is most likely to follow prompt by computing the Completion After Prompt Probability, as fleshed out in my question on Cross Validated.
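A toy, dependency-free sketch of that selection rule, under loud assumptions: tokens are whitespace-separated words, and `next_token_prob` looks up a hypothetical bigram table standing in for an LLM. For each candidate, it scores every completion token given `{prompt}{end_of_prompt}` plus the completion tokens before it, averages the token log-probabilities (one possible aggregation; CAPPr's default may differ), and picks the highest-scoring completion.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

# Hypothetical bigram "language model": Pr(next word | previous word)
BIGRAMS: Dict[Tuple[str, str], float] = {
    ("to", "use"): 0.3, ("use", "your"): 0.6, ("your", "imagination"): 0.5,
    ("to", "eat"): 0.1, ("eat", "shoes"): 0.2,
}
UNSEEN_PROB = 1e-3  # probability assigned to any bigram not in the table


def next_token_prob(context: Sequence[str], token: str) -> float:
    # A bigram model only looks at the last token of the context
    return BIGRAMS.get((context[-1], token), UNSEEN_PROB)


def completion_log_probs(prompt: str, end_of_prompt: str, completion: str) -> List[float]:
    """log Pr(token_i | prompt, end_of_prompt, completion tokens before i).

    With whitespace tokenization, end_of_prompt only acts as a separator here.
    """
    context = f"{prompt}{end_of_prompt}".split()
    log_probs = []
    for token in completion.split():
        log_probs.append(math.log(next_token_prob(context, token)))
        context.append(token)  # condition later tokens on earlier ones
    return log_probs


def toy_predict(prompt: str, completions: Sequence[str], end_of_prompt: str = " ") -> str:
    # Average each completion's token log-probs, then take the best completion
    scores = [fmean(completion_log_probs(prompt, end_of_prompt, c)) for c in completions]
    return completions[max(range(len(completions)), key=scores.__getitem__)]


prompt = "The moral of the story is to"
completions = ("use your imagination", "eat shoes")
print(toy_predict(prompt, completions))
```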
See this page of the documentation.
I'm dumping todos here. Feel free to raise issues, of course.