cappr

Completion After Prompt Probability. Make your LLM make a choice

APACHE-2.0 License

Downloads: 992
Stars: 68
Committers: 1


cappr - v0.6.1 - fix openai.token_logprobs

Published by kddubey about 1 year ago

Breaking changes

  • cappr.openai.token_logprobs now prepends a space to each text by default. Set end_of_prompt="" if you don't want that

New features

None

Bug fixes

  • cappr.openai's (still highly experimental) discount feature works for a wider range of completions

cappr - v0.6.0 - HF no-batching module

Published by kddubey about 1 year ago

Breaking changes

None

New features

  • To minimize memory usage, use cappr.huggingface.classify_no_batch. See this section of the docs. I ended up needing this feature to demo Mistral 7B on a T4 GPU

Bug fixes

  • show_progress_bar=False now works, my b

cappr - v0.5.1 - allow installation with no dependencies

Published by kddubey about 1 year ago

Breaking changes

None

New features

Bug fixes

None

cappr - v0.5.0 - support GGUF models using llama-cpp-python

Published by kddubey about 1 year ago

Breaking changes

  • completions is not allowed to be an empty sequence

New features

  • Run GGUF models with the cappr.llama_cpp.classify module. Install using:

    pip install "cappr[llama-cpp]"
    

    See this section of the docs. See this demo for an example.

Bug fixes

None

cappr - v0.4.7 - breaking little things

Published by kddubey about 1 year ago

Breaking changes

  • end_of_prompt is restricted to be a whitespace, " ", or empty string, "". After much thought and experimentation, I realized that anything else is unnecessarily complicated

  • The OpenAI API model gpt-3.5-turbo-instruct has been deprecated b/c their API won’t allow setting echo=True, logprobs=1 starting tomorrow

  • The keyword argument for the (still highly experimental) discount feature, log_marginal_probs_completions, has been renamed to log_marg_probs_completions

New features

  • You can input your OpenAI API key dynamically: api_key=

  • The User Guide is much better

Bug fixes

None

cappr - v0.4.6 - support more types of sequence inputs

Published by kddubey about 1 year ago

Breaking changes

None

New features

  • Input checks on prompts and completions are more accurate. You can now input, e.g., a polars or pandas Series of strings

Bug fixes

None

cappr - v0.4.5 - niceties

Published by kddubey about 1 year ago

Breaking changes

  • There are stronger input checks to avoid silent failures. prompts cannot be empty. completions cannot be empty or a pure string (it has to be a sequence of strings)
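The checks described above can be sketched roughly as follows. This is a minimal illustration, not cappr's actual code: the function name check_inputs and the exact exception types are assumptions made for the example.

```python
def check_inputs(prompts, completions):
    """Fail loudly on inputs that would otherwise produce silent nonsense."""
    if len(prompts) == 0:
        raise ValueError("prompts must not be empty.")
    if isinstance(completions, str):
        # Iterating over "yes" yields "y", "e", "s", which is almost never intended.
        raise TypeError("completions must be a sequence of strings, not a string.")
    if len(completions) == 0:
        raise ValueError("completions must not be empty.")


check_inputs(["Is it sunny?"], ["yes", "no"])  # valid inputs pass without error
```

Rejecting a pure string matters because a string is itself a sequence of characters, so without the check each character would quietly be treated as its own completion.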

New features

  • Pass normalize=False when you want raw, unnormalized probabilities for, e.g., multi-label classification applications
  • You can input a single prompt string or Example object. You no longer have to wrap it in a list and then unwrap it
  • You can disable progress bars using show_progress_bar=False
  • cappr.huggingface type-hints the model as a PreTrainedModelForCausalLM for greater clarity
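To make the normalize=False option concrete, here is a small sketch of the difference between normalized and raw probabilities. The numbers, the threshold, and the helper name normalize are made up for illustration, and thresholding is just one plausible way to use raw probabilities for multi-label classification.

```python
def normalize(probs):
    """Rescale completion probabilities so that they sum to 1."""
    total = sum(probs)
    return [p / total for p in probs]


# Hypothetical raw probabilities of three completions for one prompt.
raw = [0.30, 0.06, 0.24]

# Single-label: the completions compete, so normalize and take the largest.
normalized = normalize(raw)
best = max(range(len(raw)), key=lambda i: normalized[i])

# Multi-label: judge each completion on its own raw probability.
threshold = 0.2  # hypothetical cutoff
labels = [p >= threshold for p in raw]
```

Normalizing forces the completions to share a total of 1, which is the wrong assumption when several labels can apply to the same prompt.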

Bug fixes

  • cappr.huggingface doesn't modify the model or tokenizer anymore, sorry bout that
  • The jagged/inhomogeneous numpy array warning from earlier numpy versions (when using _examples functions) is correctly handled

cappr - v0.4.0 - HF single-token speedup, token_logprobs, discount feature

Published by kddubey about 1 year ago

Breaking changes

None

New features

  • cappr.huggingface is faster when all of the completions are single tokens. Specifically, we just do inference once on the prompts, and don't repeat data unnecessarily
  • cappr.huggingface implements token_logprobs like cappr.openai did
  • cappr.huggingface now supports the (highly experimental) discount feature (mentioned at the bottom of this answer) like cappr.openai did
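The single-token speedup can be pictured with a toy model. This is only a sketch of the idea: ToyModel and single_token_log_probs are hypothetical stand-ins, not part of cappr or Hugging Face, and the probabilities are invented.

```python
import math


class ToyModel:
    """Stand-in for a language model. Counts forward passes."""

    def __init__(self):
        self.num_forward_passes = 0

    def next_token_log_probs(self, prompt):
        # A real model would return a distribution over its whole vocabulary.
        self.num_forward_passes += 1
        return {" yes": math.log(0.6), " no": math.log(0.3), " maybe": math.log(0.1)}


def single_token_log_probs(model, prompts, completions):
    """One forward pass per prompt, then look up each single-token completion."""
    log_probs = []
    for prompt in prompts:
        distribution = model.next_token_log_probs(prompt)
        log_probs.append([distribution[completion] for completion in completions])
    return log_probs


model = ToyModel()
prompts = ["Is it sunny?", "Is it raining?"]
completions = [" yes", " no"]
log_probs = single_token_log_probs(model, prompts, completions)
```

Because every completion is one token, the next-token distribution after the prompt already contains every answer, so the number of forward passes grows with the number of prompts rather than with the number of prompt-completion pairs.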

Bug fixes

None

cappr - v0.3.0 - support Llama and Llama 2

Published by kddubey about 1 year ago

Breaking changes

None

New features

  • cappr.huggingface now supports Llama and Llama 2 (chat, raw, GPTQd)

Bug fixes

None

cappr - v0.2.6 - deprecate model string as input to HF functions

Published by kddubey over 1 year ago

Breaking changes

  • cappr.huggingface functions only allow model_and_tokenizer input, not the string model input.

New features

None

Bug fixes

  • Correct type hint for predict_proba_examples functions to reflect that the 2nd dimension is always an array.

cappr - v0.2.5 - add prior kwarg to HF no-cache functions

Published by kddubey over 1 year ago

Breaking changes

None

New features

None

Bug fixes

  • cappr.huggingface.classify.predict_proba and cappr.huggingface.classify.predict now accept a prior kwarg, as was intended (I just forgot to add it in).
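For intuition about what a prior does, here is a minimal sketch. It assumes the prior is combined with the completion probabilities in the usual Bayesian way (multiply, then renormalize); apply_prior is a hypothetical helper and the numbers are invented.

```python
def apply_prior(likelihoods, prior):
    """Posterior is proportional to likelihood times prior."""
    unnormalized = [lik * p for lik, p in zip(likelihoods, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# The model is indifferent between the two completions...
likelihoods = [0.2, 0.2]
# ...but we believe the first class is three times as common.
prior = [0.75, 0.25]
# With equal likelihoods, the posterior matches the prior (up to floating point).
posterior = apply_prior(likelihoods, prior)
```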

cappr - v0.2.4 - fix token slicing

Published by kddubey over 1 year ago

Breaking changes

None

New features

None

Bug fixes

  • For OpenAI models, the completion token probabilities should actually be sliced based on the tokenization of end_of_prompt + completion, not just completion. Based on a few experiments, this change doesn't impact statistical performance. But it should be fixed ofc.
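The slicing fix can be illustrated with a toy tokenizer. Everything here is an assumption made for the example: toy_tokenize treats each whitespace character as its own token (real BPE tokenizers usually merge a leading space into the following word), and the log-probs are invented.

```python
import re


def toy_tokenize(text):
    """Toy tokenizer: every whitespace character and every word is a token."""
    return re.findall(r"\s|\S+", text)


def slice_completion_log_probs(token_log_probs, end_of_prompt, completion, tokenize):
    """Keep the log-probs of the tokens that make up end_of_prompt + completion."""
    num_tokens = len(tokenize(end_of_prompt + completion))
    return token_log_probs[-num_tokens:]


prompt, end_of_prompt, completion = "Is it sunny?", " ", "yes"
tokens = toy_tokenize(prompt + end_of_prompt + completion)
# Hypothetical log-probs, one per token of the echoed text.
token_log_probs = [-2.0, -0.1, -1.5, -0.1, -3.0, -0.7, -0.2]
sliced = slice_completion_log_probs(
    token_log_probs, end_of_prompt, completion, toy_tokenize
)
```

Counting tokens from the completion alone would keep one log-prob fewer in this toy setup. With a real tokenizer the two counts often coincide, which fits the observation that the fix didn't change statistical performance.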

Breaking changes

None

New features

  • Allow for pre-computed completion log-probs for the experimental discount feature. Use the newly surfaced function, cappr.openai.token_logprobs, to compute them once and re-use them.

Bug fixes

None

cappr - v0.2.2 - highly experimental discount feature

Published by kddubey over 1 year ago

Breaking changes

  • Deprecate cappr.utils.classify.agg_log_probs_from_constant_completions. I doubt anyone was using this. If you were, then use cappr.utils.classify.agg_log_probs from now on (it does the exact same thing).

New features

  • Highly experimental feature which discounts completions by their marginal probability. See my updated answer here. The plan is to evaluate this method more thoroughly and discuss it in the user guide. For now, feel free to mess with it.
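One plausible reading of "discounts completions by their marginal probability" is sketched below. This is an assumption, not a description of cappr's implementation: the function discounted_scores, its discount parameter, and all the numbers are hypothetical.

```python
import math


def discounted_scores(log_probs_given_prompt, log_marginal_probs, discount=1.0):
    """Subtract a multiple of each completion's marginal log-probability."""
    return [
        log_prob - discount * log_marginal
        for log_prob, log_marginal in zip(log_probs_given_prompt, log_marginal_probs)
    ]


# Hypothetical numbers: the first completion is likely everywhere, the second is
# rare in general but relatively likely after this particular prompt.
log_probs_given_prompt = [math.log(0.5), math.log(0.2)]
log_marginal_probs = [math.log(0.5), math.log(0.01)]

no_discount = discounted_scores(log_probs_given_prompt, log_marginal_probs, discount=0.0)
full_discount = discounted_scores(log_probs_given_prompt, log_marginal_probs, discount=1.0)
```

Under this reading, a completion that is probable no matter what the prompt says gets penalized, so the prediction leans toward completions that the prompt specifically makes more likely.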

Bug fixes

  • Fix type hint for tokenizer: AutoTokenizer to PreTrainedTokenizer.

cappr - v0.2.1 - allow prior to be a numpy array

Published by kddubey over 1 year ago

Breaking changes

None

New features

None

Bug fixes

  • Allow prior to be a numpy array

cappr - v0.2.0 - add HF no-cache module

Published by kddubey over 1 year ago

Breaking changes

None

New features

Adds cappr.huggingface.classify_no_cache, which appears to be faster for non-batch processing. This may be a bug tho lol. If it is and I fix it, I'm going to hide this module again, which will be a breaking change.

Here's its documentation.

Bug fixes

None

cappr - v0.1.0 - first release

Published by kddubey over 1 year ago

See the documentation

Installation

If you intend on using OpenAI models, sign up for the OpenAI API here, and then set the environment variable OPENAI_API_KEY. For zero-shot classification, OpenAI models are currently far ahead of others. But using them will cost ya 💰!

Install with pip:

python -m pip install cappr
python -m pip install "cappr[hf]"
python -m pip install "cappr[demos]"

Package Rankings
Top 18.73% on Pypi.org