The LLM Evaluation Framework
APACHE-2.0 License
Published by penguine-ip 7 months ago
For deepeval's latest release v0.21.15, we release:
- the `-c` flag: https://docs.confident-ai.com/docs/evaluation-introduction#cache
- the `-r` flag: https://docs.confident-ai.com/docs/evaluation-introduction#repeats
Published by penguine-ip 7 months ago
In deepeval v0.20.85:
- `evaluate()` function for more customizability: https://docs.confident-ai.com/docs/evaluation-introduction#evaluating-without-pytest
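As a rough illustration of evaluating without pytest, a call along these lines should work; the metric class, its `threshold` argument, and the `LLMTestCase` fields shown are illustrative and may differ slightly between versions:

```python
# Minimal sketch of evaluating test cases programmatically via evaluate(),
# i.e. without going through `deepeval test run`. The metric shown
# (AnswerRelevancyMetric) and its threshold argument are illustrative.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What is deepeval?",
    actual_output="deepeval is an open-source framework for unit testing LLM outputs.",
)

# Scores the test case on each metric and prints a pass/fail summary.
evaluate([test_case], [AnswerRelevancyMetric(threshold=0.5)])
```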
Published by penguine-ip 8 months ago
In DeepEval's latest release, there is now:
Published by penguine-ip 8 months ago
For the newest release, deepeval is now stable for production use:
Published by penguine-ip 8 months ago
For the latest release, DeepEval:
Published by penguine-ip 9 months ago
- `LLMTestCase` now has `execution_time` and `cost`, useful for those looking to evaluate on these parameters
- `minimum_score` is now `threshold` instead, meaning you can now create custom metrics that either have a "minimum" or "maximum" threshold
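A hedged sketch of how these changes fit together; the custom `CostMetric` below is hypothetical, and the `BaseMetric` import path and the exact `LLMTestCase` keyword arguments are assumptions that may differ by version:

```python
# Hypothetical custom metric with a "maximum" threshold: it passes when the
# test case's cost stays at or below the threshold. The BaseMetric import path
# and the execution_time/cost keyword arguments are assumptions.
from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase

class CostMetric(BaseMetric):
    def __init__(self, threshold: float = 0.01):
        self.threshold = threshold

    def measure(self, test_case: LLMTestCase) -> float:
        self.score = test_case.cost
        self.success = test_case.cost <= self.threshold  # "maximum" threshold
        return self.score

    def is_successful(self) -> bool:
        return self.success

    @property
    def __name__(self):
        return "Cost"

test_case = LLMTestCase(
    input="Summarize the return policy.",
    actual_output="Items can be returned within 30 days.",
    execution_time=1.2,  # seconds taken by your LLM app
    cost=0.004,          # cost of the call, e.g. in USD
)

metric = CostMetric(threshold=0.01)
metric.measure(test_case)
print(metric.is_successful())  # True, since 0.004 <= 0.01
```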
Published by penguine-ip 10 months ago
In this release:
- Removed `transformers`, `sentence_transformers`, and `pandas` to reduce package size
Published by penguine-ip 10 months ago
Lots of new features this release:
- `JudgementalGPT` now allows for different languages - useful for our APAC and European friends
- `RAGAS` metrics now support all OpenAI models - useful for those running into context length issues
- `LLMEvalMetric` now returns a reasoning for its score
- `deepeval test run` now has hooks that are called on test run completion (see the sketch below)
- `evaluate` now displays `retrieval_context` for RAG evaluation
- The `RAGAS` metric now displays a metric breakdown for all its distinct metrics
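For the test-run completion hooks, usage presumably looks something like this; the `on_test_run_end` decorator name follows deepeval's documentation but may differ by version:

```python
# Sketch of a hook that fires when `deepeval test run` completes; the
# on_test_run_end decorator name is an assumption based on the docs.
import deepeval

@deepeval.on_test_run_end
def notify_on_completion():
    # e.g. post to Slack, write a summary file, trigger a CI step, etc.
    print("deepeval test run finished!")
```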
Published by penguine-ip 11 months ago
Automatically integrated with Confident AI for continuous evaluation throughout the lifetime of your LLM (app):
- log evaluation results and analyze metric passes / fails
- compare and pick the optimal hyperparameters (e.g. prompt templates, chunk size, models used, etc.) based on evaluation results
- debug evaluation results via LLM traces
- manage evaluation test cases / datasets in one place
- track events to identify live LLM responses in production
- add production events to existing evaluation datasets to strengthen evals over time
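As a rough sketch of the workflow: after running `deepeval login` with a Confident AI API key, results from a pytest-style test file executed via `deepeval test run` are logged for the analysis above; the metric and test data here are illustrative:

```python
# test_chatbot.py - a minimal pytest-style deepeval test. After `deepeval login`,
# running `deepeval test run test_chatbot.py` sends the results to Confident AI.
# The HallucinationMetric and its threshold are illustrative choices.
from deepeval import assert_test
from deepeval.metrics import HallucinationMetric
from deepeval.test_case import LLMTestCase

def test_no_hallucination():
    test_case = LLMTestCase(
        input="What is the return window?",
        actual_output="You can return items within 30 days.",
        context=["All purchases can be returned within 30 days of delivery."],
    )
    assert_test(test_case, [HallucinationMetric(threshold=0.5)])
```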
Published by penguine-ip 11 months ago
Mid-week bug fixes release with an extra feature:
- `evaluate`, which evaluates a list of test cases (dataset) on metrics you define, all without having to go through the CLI. More info here: https://docs.confident-ai.com/docs/evaluation-datasets#evaluate-your-dataset-without-pytest
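A sketch of doing this with a dataset object; the `EvaluationDataset` class, its import path, and the metric shown follow the current docs and may have differed at this version:

```python
# Sketch of evaluating a whole dataset without the CLI. The EvaluationDataset
# class/import path and the metric are assumptions based on current docs.
from deepeval import evaluate
from deepeval.dataset import EvaluationDataset
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

dataset = EvaluationDataset(
    test_cases=[
        LLMTestCase(input="What is deepeval?", actual_output="An LLM evaluation framework."),
        LLMTestCase(input="Is it open source?", actual_output="Yes, under Apache-2.0."),
    ]
)

# Run every test case in the dataset against the metrics you define.
evaluate(dataset.test_cases, [AnswerRelevancyMetric(threshold=0.5)])
```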
Published by penguine-ip 11 months ago
In this release, deepeval has added support for:
Published by penguine-ip 12 months ago
Published by penguine-ip 12 months ago
Published by penguine-ip about 1 year ago