November 22, 2024 | Launch Week 2 🚀
Prompt Experiments on Datasets with LLM-as-a-Judge Evaluations
Marlies Mayerhofer
Move fast on prompts without breaking things! Run experiments on Datasets and directly compare evaluation results side-by-side. Experimentation speeds up the feedback loop when working on prompts and prevents regressions when making rapid changes.
On Day 5 of Langfuse Launch Week 2, we’re excited to announce Prompt Experiments. They are the final piece of the launch week theme of “closing the development loop” and allow you to:
- Test a prompt version from Langfuse Prompt Management
- on a Dataset of test inputs and expected outputs
- optionally, use LLM-as-a-Judge Evaluators to automatically evaluate the responses based on the expected outputs (released this launch week)
- and finally, compare the results in the new side-by-side experiment comparison view (released this launch week)
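Prompt Experiments run directly in the Langfuse UI, so no code is required. For reference, a roughly equivalent flow with the Langfuse Python SDK (v2 interface) looks like the sketch below; the prompt name, version, dataset name, input key, model, and run name are all placeholder assumptions.

```python
# Minimal sketch of a dataset experiment via the Langfuse Python SDK (v2).
# "prod-prompt", "qa-dataset", the "question" input key, the model, and the
# run name are hypothetical placeholders, not values from the announcement.
from langfuse import Langfuse
from openai import OpenAI

langfuse = Langfuse()  # reads LANGFUSE_* env vars for credentials
openai_client = OpenAI()

# 1. Fetch the prompt version to test from Prompt Management
prompt = langfuse.get_prompt("prod-prompt", version=3)

# 2. Iterate over the Dataset of test inputs and expected outputs
dataset = langfuse.get_dataset("qa-dataset")

for item in dataset.items:
    # Compile the prompt template with this item's input variables
    compiled = prompt.compile(question=item.input["question"])

    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": compiled}],
    )
    output = completion.choices[0].message.content

    # 3. Log the generation and link it to both the prompt version and
    #    the dataset item so the run appears in the comparison view
    generation = langfuse.trace(name="prompt-experiment").generation(
        name="completion",
        input=compiled,
        output=output,
        prompt=prompt,  # ties the generation to the prompt version
    )
    item.link(generation, run_name="prompt-v3-experiment")

langfuse.flush()
```

Linking each generation to its dataset item is what groups the run alongside other runs in the side-by-side comparison view, where configured LLM-as-a-Judge evaluators can score outputs against the item’s expected output.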
Watch the video below for a walkthrough of the new feature:
Check out the docs to learn more.