stealing-part-lm-supplementary


Supplementary code for Stealing Part of a Production Language Model, Carlini et al., 2024

optimize_logit_queries

Implements logprob-free attacks on a known vector of logits (no API calls). Use run_attacks.py to access all implemented attacks, which the paper describes at varying levels of detail. Running run_attacks.py without modification runs several methods on a small random vector of logits. This directory is a useful starting point for further research on logprob-free attacks.
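For a rough illustration of the simplest attack in this family, the sketch below simulates an argmax-only API on a known logit vector and recovers each logit relative to the top token by binary-searching the bias at which that token overtakes it. All names here (`oracle`, `recover_logit_gap`) are illustrative, not this directory's actual interface:

```python
# Minimal sketch of a binary-search logprob-free attack, simulated locally on
# a known logit vector (as this directory does). The "API" reveals only the
# argmax token after a logit bias is applied.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=32)     # the secret logit vector
top = int(np.argmax(logits))     # token the un-biased oracle returns

def oracle(token: int, bias: float) -> bool:
    """Simulated argmax-only API: does `token` win after adding `bias`?"""
    biased = logits.copy()
    biased[token] += bias
    return int(np.argmax(biased)) == token

def recover_logit_gap(token: int, lo=0.0, hi=64.0, iters=40) -> float:
    """Binary-search the smallest bias that makes `token` the argmax.
    That bias equals logits[top] - logits[token]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if oracle(token, mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

recovered = np.array([0.0 if i == top else -recover_logit_gap(i)
                      for i in range(len(logits))])
true_gaps = logits - logits[top]
print("max recovery error:", np.max(np.abs(recovered - true_gaps)))
```

In a real API setting each comparison is one query, which is why optimizing the number of queries per recovered logit, the subject of this directory, matters.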

Other, less directly useful code

query_logprobs_emulator

Emulates the OpenAI API so that any logprob attack, as well as possible mitigation strategies, can be tested against a local model. (It could also be run against the OpenAI API itself before March 3, 2024, but not anymore.) Defaults to the logprob attack that uses top_logprobs - 1 tokens per query. May be useful for research on mitigations and ways to bypass them.
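For intuition, here is a minimal sketch of that default attack, simulated on a known logit vector rather than through the emulator's interface. Each query adds a large bias B to top_logprobs - 1 tokens so they enter the returned top-k next to the un-biased top token, which serves as a shared reference for undoing the softmax normalization. Names and constants are illustrative:

```python
# Sketch of the top_logprobs-based attack: batches of (top_logprobs - 1)
# biased tokens, with the un-biased top token as a shared reference.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=32)
TOP_LOGPROBS, B = 5, 100.0
ref = int(np.argmax(logits))  # un-biased top token, visible in every query

def api_top_logprobs(logit_bias: dict[int, float]) -> dict[int, float]:
    """Emulated API: top-k logprobs after applying logit_bias."""
    biased = logits.copy()
    for tok, b in logit_bias.items():
        biased[tok] += b
    m = biased.max()
    logprobs = biased - (m + np.log(np.exp(biased - m).sum()))  # log-softmax
    topk = np.argsort(logprobs)[-TOP_LOGPROBS:]
    return {int(t): float(logprobs[t]) for t in topk}

recovered = {ref: 0.0}
targets = [t for t in range(len(logits)) if t != ref]
for i in range(0, len(targets), TOP_LOGPROBS - 1):
    batch = targets[i:i + TOP_LOGPROBS - 1]
    out = api_top_logprobs({t: B for t in batch})
    for t in batch:
        # logprob_t = logit_t + B - lse  and  logprob_ref = logit_ref - lse,
        # so subtracting B and the reference cancels the normalization.
        recovered[t] = (out[t] - B) - out[ref]

err = max(abs(recovered[t] - (logits[t] - logits[ref])) for t in recovered)
print("max recovery error:", err)
```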

distribution_logits

A very unpolished script investigating the distribution of logits over the vocabulary for various open-source models. Pythia appears to be an outlier, assigning very low probabilities to the long tail of the vocabulary.
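A minimal sketch of this kind of inspection, assuming the transformers library and a small Pythia checkpoint (model and prompt are illustrative; the repo's script covers more models and positions):

```python
# Dump a model's next-token logit distribution and summarize its tail.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/pythia-70m"  # any small causal LM works here
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

ids = tok("The quick brown fox", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]  # next-token logits over the vocab
probs = torch.softmax(logits, dim=-1)

sorted_probs, _ = probs.sort(descending=True)
tail = sorted_probs[1000:]             # the "long tail" of the vocabulary
print(f"vocab size: {probs.numel()}")
print(f"total mass beyond rank 1000: {tail.sum():.3e}")
print(f"min logprob: {torch.log(probs.min()):.1f}")
```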

openai_api_intricacies

Verifies undocumented properties of logit_bias in the OpenAI API that are necessary for the attack to work as described in the paper.
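As a sketch of one such check, assuming the current openai Python client and an illustrative model and token id, one can compare the returned top logprobs with and without a logit_bias to see whether the bias is applied before the softmax that produces them. This is a hypothetical reconstruction, not the repo's script, and it hits the live API:

```python
# Check whether logit_bias shifts the *returned* logprobs, i.e. whether it is
# applied before the softmax. Requires OPENAI_API_KEY and incurs API costs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL, TOKEN_ID, B = "gpt-3.5-turbo", 1000, 5.0  # illustrative choices

def top_logprobs(bias: dict[int, float]) -> dict[str, float]:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Say anything."}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
        logit_bias={str(k): v for k, v in bias.items()},  # API wants str keys
    )
    entry = resp.choices[0].logprobs.content[0]
    return {t.token: t.logprob for t in entry.top_logprobs}

print("unbiased top-5:", top_logprobs({}))
print(f"top-5 with +{B} bias on token {TOKEN_ID}:",
      top_logprobs({TOKEN_ID: B}))
# If the bias is applied before the softmax, the biased token's logprob rises
# by ~B minus a renormalization shift that lowers every other token equally.
```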

Disclaimer

Note: this repo does not contain any parameters of OpenAI or Google proprietary models, nor any code that can directly extract weights from any API known to the authors.
