November 22, 2024 | Launch Week 2 🚀
Prompt Experiments on Datasets with LLM-as-a-Judge Evaluations
Marlies Mayerhofer
Move fast on prompts without breaking things! Run experiments on Datasets and directly compare evaluation results side-by-side. Experimentation speeds up the feedback loop when working on prompts and prevents regressions when making rapid changes.
On Day 5 of Langfuse Launch Week 2, we’re excited to announce Prompt Experiments. They are the final piece of the launch week theme of “closing the development loop” and allow you to:
- Test a prompt version from Langfuse Prompt Management
- on a Dataset of test inputs and expected outputs
- optionally, use LLM-as-a-Judge Evaluators to automatically evaluate the responses based on the expected outputs (released this launch week)
- and finally, compare the results in the new side-by-side experiment comparison view (released this launch week)
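Prompt Experiments run directly in the Langfuse UI, so no code is required. For reference, a roughly equivalent flow with the Langfuse Python SDK (v2 interface) looks like the sketch below; the prompt name, version, dataset name, input key, model, and run name are all placeholder assumptions.

```python
# Minimal sketch of a dataset experiment via the Langfuse Python SDK (v2).
# "prod-prompt", "qa-dataset", the "question" input key, the model, and the
# run name are hypothetical placeholders, not values from the announcement.
from langfuse import Langfuse
from openai import OpenAI

langfuse = Langfuse()  # reads LANGFUSE_* env vars for credentials
openai_client = OpenAI()

# 1. Fetch the prompt version to test from Prompt Management
prompt = langfuse.get_prompt("prod-prompt", version=3)

# 2. Iterate over the Dataset of test inputs and expected outputs
dataset = langfuse.get_dataset("qa-dataset")

for item in dataset.items:
    # Compile the prompt template with this item's input variables
    compiled = prompt.compile(question=item.input["question"])

    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": compiled}],
    )
    output = completion.choices[0].message.content

    # 3. Log the generation and link it to both the prompt version and
    #    the dataset item so the run appears in the comparison view
    generation = langfuse.trace(name="prompt-experiment").generation(
        name="completion",
        input=compiled,
        output=output,
        prompt=prompt,  # ties the generation to the prompt version
    )
    item.link(generation, run_name="prompt-v3-experiment")

langfuse.flush()
```

Linking each generation to its dataset item is what groups the run alongside other runs in the side-by-side comparison view, where configured LLM-as-a-Judge evaluators can score outputs against the item’s expected output.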
Watch the video below for a walkthrough of the new feature:
Check out the docs to learn more.