Langfuse Launch Week #2

A week dedicated to unlocking model capabilities and integrating Langfuse into your development loop

Clemens

We are excited to announce Langfuse Launch Week #2!

Starting on Monday, November 18th, 2024, we’ll unveil a major upgrade to the Langfuse platform every day until Friday. We’re topping it off with a Product Hunt launch on Friday and a Virtual Town Hall on Wednesday.

Star us on GitHub and follow us on Twitter for daily updates.

Want to see what we launched during Launch Week #1 (LW1) back in April to get a taste of what’s coming? Check out the blog post.

Launch Week Focus

Langfuse Launch Week #2 Theme - Supporting the next generation of models and Langfuse in the development loop

This week, we’re focusing on supporting the next generation of models. As AI models evolve, Langfuse is evolving too. We’re integrating Langfuse deeply into your development loop with end-to-end prompt engineering workflow tooling tailored for product development teams. Stay tuned for the exciting updates ahead!

🚀 Day 0: Prompt Management for Vercel AI SDK

import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
import { Langfuse } from "langfuse";
 
const langfuse = new Langfuse();
 
// fetch prompt from Langfuse Prompt Management
const fetchedPrompt = await langfuse.getPrompt("my-prompt");
 
const result = await generateText({
  model: openai("gpt-4o"),
  prompt: fetchedPrompt.prompt, // use prompt
  experimental_telemetry: {
    isEnabled: true, // enable tracing (spans reach Langfuse via your OpenTelemetry exporter setup)
    metadata: {
      langfusePrompt: fetchedPrompt.toJSON(), // link trace to prompt version
    },
  },
});

Langfuse Prompt Management now integrates natively with the Vercel AI SDK: version and release prompts in Langfuse, use them via the Vercel AI SDK, and monitor their metrics in Langfuse.

This helps you answer questions like:

Which prompt version caused a particular bug?
What is the average latency and cost impact of a prompt version?
Which prompt version is the most used?
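Answering these questions comes down to grouping traces by the prompt version they are linked to. Langfuse computes these metrics for you; the sketch below only illustrates the idea in plain TypeScript over an in-memory list. The `LinkedTrace` shape, its field names, and the sample values are made up for illustration and are not Langfuse's data model or API.

```typescript
// Hypothetical shape of a trace linked to a prompt version (not Langfuse's schema).
interface LinkedTrace {
  promptName: string;
  promptVersion: number;
  latencyMs: number;
  costUsd: number;
}

interface VersionStats {
  promptVersion: number;
  count: number;
  avgLatencyMs: number;
  avgCostUsd: number;
}

// Group traces by prompt version and compute usage count and averages.
function statsByVersion(traces: LinkedTrace[]): VersionStats[] {
  const groups = new Map<number, LinkedTrace[]>();
  for (const t of traces) {
    const group = groups.get(t.promptVersion) ?? [];
    group.push(t);
    groups.set(t.promptVersion, group);
  }
  const stats: VersionStats[] = [];
  groups.forEach((group, version) => {
    const n = group.length;
    stats.push({
      promptVersion: version,
      count: n,
      avgLatencyMs: group.reduce((sum, t) => sum + t.latencyMs, 0) / n,
      avgCostUsd: group.reduce((sum, t) => sum + t.costUsd, 0) / n,
    });
  });
  // Most used version first.
  return stats.sort((a, b) => b.count - a.count);
}

// Made-up sample data for illustration.
const traces: LinkedTrace[] = [
  { promptName: "my-prompt", promptVersion: 1, latencyMs: 900, costUsd: 0.002 },
  { promptName: "my-prompt", promptVersion: 2, latencyMs: 1200, costUsd: 0.003 },
  { promptName: "my-prompt", promptVersion: 2, latencyMs: 800, costUsd: 0.001 },
];

const stats = statsByVersion(traces);
const top = stats[0];
console.log(top.promptVersion, top.count, top.avgLatencyMs); // 2 2 1000
```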

See the changelog for more details.

🆚 Day 1: Dataset Experiment Run Comparison View

Langfuse Datasets now features a powerful comparison view for dataset experiment runs. This new interface allows both technical and non-technical users to analyze multiple experiment runs side by side. Compare your application’s performance across experiment runs on the same test dataset, examine metrics like latency and cost, and dive deep into individual dataset items. Use it to accelerate your testing of different prompts, models, or application configurations.
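Conceptually, a side-by-side view lines up the results of several runs by dataset item. A minimal sketch of that alignment follows; the `RunResult` record, the run names, and the sample values are hypothetical and do not reflect the Langfuse SDK or its data model.

```typescript
// Hypothetical result of one dataset item within one experiment run.
interface RunResult {
  itemId: string;
  output: string;
  latencyMs: number;
  costUsd: number;
}

// One comparison row: a dataset item and its result in each run (undefined if absent).
interface ComparisonRow {
  itemId: string;
  byRun: Record<string, RunResult | undefined>;
}

// Align several runs on dataset item id so results can be read side by side.
function compareRuns(runs: Record<string, RunResult[]>): ComparisonRow[] {
  const runNames = Object.keys(runs);
  const itemIds = new Set<string>();
  for (const name of runNames) {
    for (const r of runs[name]) itemIds.add(r.itemId);
  }
  return Array.from(itemIds)
    .sort()
    .map((itemId) => {
      const byRun: Record<string, RunResult | undefined> = {};
      for (const name of runNames) {
        byRun[name] = runs[name].find((r) => r.itemId === itemId);
      }
      return { itemId, byRun };
    });
}

// Made-up sample runs for illustration.
const rows = compareRuns({
  "gpt-4o": [
    { itemId: "item-1", output: "Paris", latencyMs: 700, costUsd: 0.002 },
    { itemId: "item-2", output: "Berlin", latencyMs: 650, costUsd: 0.002 },
  ],
  "gpt-4o-mini": [
    { itemId: "item-1", output: "Paris", latencyMs: 400, costUsd: 0.0003 },
  ],
});

// Print one line per dataset item, one cell per run.
for (const row of rows) {
  const cells = Object.entries(row.byRun).map(
    ([run, r]) => `${run}: ${r ? `${r.output} (${r.latencyMs} ms)` : "missing"}`,
  );
  console.log(`${row.itemId} | ${cells.join(" | ")}`);
}
```

Keying rows by dataset item rather than by run is what makes regressions on individual items easy to spot.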

See the changelog for more details.

⚖️ Day 2: LLM-as-a-judge for Dataset Experiments

Langfuse now brings managed LLM-as-a-judge evaluators to dataset experiments. Assign evaluators to your datasets and they will automatically run on new experiment runs, scoring your outputs based on your evaluation criteria. You can use any LLM that supports tool/function calling (OpenAI, Azure OpenAI, Anthropic, AWS Bedrock) and choose from built-in templates for criteria like hallucination, helpfulness, relevance, toxicity, and more. Unlike previous evaluators that were limited to production runs, these new evaluators can access your dataset’s ground truth (expected_output) for reliable offline evaluation - helping teams catch issues before they reach production.
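Tool/function calling matters here because it lets the judge return a structured score instead of free text. The sketch below shows one possible tool definition in the OpenAI function-calling format, a judge prompt that includes the dataset’s expected output, and validation of the returned arguments. The tool name `submit_score`, the prompt wording, and the 0-to-1 score range are assumptions for illustration, not Langfuse’s built-in templates or internal implementation.

```typescript
// Tool definition in the OpenAI function-calling format; name and fields are hypothetical.
const scoreTool = {
  type: "function",
  function: {
    name: "submit_score",
    description: "Submit an evaluation score with reasoning.",
    parameters: {
      type: "object",
      properties: {
        reasoning: { type: "string", description: "Justification for the score." },
        score: { type: "number", description: "Score between 0 and 1." },
      },
      required: ["reasoning", "score"],
    },
  },
} as const;

// Build a judge prompt that uses the ground truth (expected output) for offline evaluation.
function buildJudgePrompt(input: string, output: string, expectedOutput: string): string {
  return [
    "Evaluate the correctness of the generation against the expected output.",
    `Input: ${input}`,
    `Generation: ${output}`,
    `Expected output: ${expectedOutput}`,
    "Call submit_score with your reasoning and a score from 0 (wrong) to 1 (correct).",
  ].join("\n");
}

interface JudgeResult {
  reasoning: string;
  score: number;
}

// Tool-call arguments arrive as a JSON string; parse and validate them.
function parseJudgeArguments(args: string): JudgeResult {
  const parsed: unknown = JSON.parse(args);
  if (typeof parsed !== "object" || parsed === null) throw new Error("arguments must be an object");
  const { reasoning, score } = parsed as Record<string, unknown>;
  if (typeof reasoning !== "string") throw new Error("reasoning must be a string");
  if (typeof score !== "number" || score < 0 || score > 1) throw new Error("score must be in [0, 1]");
  return { reasoning, score };
}

const judgePrompt = buildJudgePrompt("Capital of France?", "Paris", "Paris");
const result = parseJudgeArguments('{"reasoning": "Matches the expected output.", "score": 1}');
console.log(scoreTool.function.name, result.score); // submit_score 1
```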

See the changelog for more details.

🎨 Day 3: Full multi-modal support, including audio, images, and attachments

Langfuse now supports multi-modal traces, including images, audio files, and attachments, so you can observe and debug multi-modal LLM applications end-to-end. Base64-encoded media is handled automatically by the SDKs and all integrations, with no additional configuration required. You can also upload custom attachments or reference external media; supported formats include images (png, jpg, webp), audio files (mpeg, mp3, wav), and other attachments (pdf, plain text).
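The SDKs handle base64-encoded media for you. Purely to illustrate what such handling involves, here is a minimal sketch that recognizes a base64 data URI, checks its MIME type against the formats listed above, and derives the decoded size from the base64 length. The MIME-type mapping and the function name are assumptions, not the SDKs’ actual implementation.

```typescript
// MIME types assumed to correspond to the formats listed above.
const SUPPORTED_MIME_TYPES = new Set([
  "image/png",
  "image/jpeg",
  "image/webp",
  "audio/mpeg",
  "audio/mp3",
  "audio/wav",
  "application/pdf",
  "text/plain",
]);

interface ParsedMedia {
  mimeType: string;
  byteLength: number;
}

// Parse "data:<mime>;base64,<payload>" and return its MIME type and decoded size,
// or null if the string is not a supported base64 data URI.
function parseBase64DataUri(value: string): ParsedMedia | null {
  const match = /^data:([\w.+-]+\/[\w.+-]+);base64,([A-Za-z0-9+/]+={0,2})$/.exec(value);
  if (!match) return null;
  const [, mimeType, payload] = match;
  if (!SUPPORTED_MIME_TYPES.has(mimeType) || payload.length % 4 !== 0) return null;
  // Every 4 base64 characters encode 3 bytes; each trailing "=" marks one missing byte.
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  return { mimeType, byteLength: (payload.length / 4) * 3 - padding };
}

console.log(parseBase64DataUri("data:text/plain;base64,aGVsbG8=")); // { mimeType: 'text/plain', byteLength: 5 }
console.log(parseBase64DataUri("data:image/gif;base64,R0lGODlh")); // null (gif is not in the list)
```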

See the changelog for more details.

📚 Day 4: All new Datasets and Evaluations documentation

Today we’re highlighting documentation - an often overlooked but critical element of great Developer Experience. Alongside major updates to our Datasets and Evaluations features, we’ve completely rebuilt their documentation to be more thorough and user-friendly than ever before. The new docs better explain how and when to use these features, introduce core data models, and provide end-to-end examples as Jupyter Notebooks. We’ve also revamped the /docs start page to reflect Langfuse’s comprehensive platform scope, and added llms.txt for better LLM tool integration. Documentation is product at Langfuse - we take it seriously and have built many features to help users get the most value from it.

See the changelog for more details. It also includes a summary of all the features we added to the documentation over the last year to make it truly awesome.

🧪 Day 5: Prompt Experiments

Prompt Experiments are the final piece of this launch week’s theme of “closing the development loop.” They allow you to test prompt versions from Langfuse Prompt Management on datasets of test inputs and expected outputs. You can optionally use LLM-as-a-Judge evaluators to automatically evaluate responses based on expected outputs, and compare results in the new side-by-side experiment comparison view. This combination speeds up the feedback loop when working on prompts and prevents regressions when making rapid prompt changes.
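The loop behind a prompt experiment can be sketched in a few lines: fill each prompt version with every dataset item’s input, run the model, score the response against the expected output, and compare versions. In this sketch the model is a toy stub, the `{{variable}}` placeholder syntax and the exact-match evaluator are assumptions, and none of the names are Langfuse APIs.

```typescript
interface DatasetItem {
  input: Record<string, string>;
  expectedOutput: string;
}

// Replace {{name}} placeholders with values from the item's input (hypothetical helper).
function fillTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name: string) => vars[name] ?? "");
}

// Toy stand-in for an LLM call: answers tersely only when the prompt asks for one word.
async function fakeModel(prompt: string): Promise<string> {
  const capitals: Record<string, string> = { France: "Paris", Germany: "Berlin" };
  const country = Object.keys(capitals).find((c) => prompt.includes(c));
  if (!country) return "I don't know.";
  return prompt.includes("one word") ? capitals[country] : `The capital is ${capitals[country]}.`;
}

// Run one prompt version over the dataset and return the mean exact-match score.
async function runExperiment(template: string, dataset: DatasetItem[]): Promise<number> {
  let total = 0;
  for (const item of dataset) {
    const output = await fakeModel(fillTemplate(template, item.input));
    total += output.trim() === item.expectedOutput ? 1 : 0;
  }
  return total / dataset.length;
}

const dataset: DatasetItem[] = [
  { input: { country: "France" }, expectedOutput: "Paris" },
  { input: { country: "Germany" }, expectedOutput: "Berlin" },
];

// Two hypothetical prompt versions to compare.
const versions: Record<string, string> = {
  v1: "Answer in one word: what is the capital of {{country}}?",
  v2: "What is the capital of {{country}}?",
};

async function main(): Promise<Record<string, number>> {
  const scores: Record<string, number> = {};
  for (const [name, template] of Object.entries(versions)) {
    scores[name] = await runExperiment(template, dataset);
  }
  if (scores.v2 < scores.v1) console.log("Regression: v2 scores lower than v1", scores);
  return scores;
}

const experiment = main();
```

Here the wording change in v2 makes responses verbose, so the exact-match score drops from 1 to 0; an LLM-as-a-judge evaluator would be the more forgiving choice for free-form answers.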

See the changelog for more details.

🍒 Extra Goodies

List of additional features that were released this week:

llms.txt: Easily use the Langfuse documentation in Cursor and other LLM-powered editors via the new llms.txt file.
/docs: New documentation start page with a simplified overview of all Langfuse features.
Self-hosted Pro Plan: Get access to additional features without the need for a sales call or enterprise pricing. All core Langfuse features are OSS without limitations, see comparison for more details.
Developer Preview of v3 (self-hosted): v3 is the biggest release in Langfuse history. After running large parts of it on Langfuse Cloud for a while, an initial developer preview for self-hosted users is now available.
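llms.txt follows a community proposal for a plain Markdown file: an H1 title, an optional blockquote summary, and H2 sections listing links. As a rough sketch of how a tool might consume such a file, the function below extracts the links per section. The sample content is made up and is not the actual Langfuse llms.txt, and fetching the real file is omitted to keep the example self-contained.

```typescript
interface LlmsTxtLink {
  section: string;
  title: string;
  url: string;
  notes?: string;
}

// Extract "- [title](url): notes" entries, tracking the current "## Section" heading.
function parseLlmsTxt(text: string): { title: string; links: LlmsTxtLink[] } {
  let title = "";
  let section = "";
  const links: LlmsTxtLink[] = [];
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("# ")) {
      title = line.slice(2).trim();
    } else if (line.startsWith("## ")) {
      section = line.slice(3).trim();
    } else {
      const m = /^-\s*\[([^\]]+)\]\(([^)]+)\)(?::\s*(.*))?$/.exec(line);
      if (m) links.push({ section, title: m[1], url: m[2], notes: m[3] || undefined });
    }
  }
  return { title, links };
}

// Made-up sample in the llms.txt layout.
const sample = [
  "# Example Project",
  "> Short summary of the project.",
  "",
  "## Docs",
  "- [Quickstart](https://example.com/docs/quickstart.md): Getting started",
  "- [Datasets](https://example.com/docs/datasets.md)",
].join("\n");

const parsed = parseLlmsTxt(sample);
console.log(parsed.title, parsed.links.length); // Example Project 2
```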

Don’t Miss Out

GitHub

⭐️ Star us on GitHub & see all of our releases!

Channels

Twitter and LinkedIn will be our main channels for launching a new feature every day.

You can also subscribe to our mailing list for daily updates.

Wednesday: Virtual Town Hall


You’re invited to our virtual town hall where we’ll demo the new features and discuss how they integrate into your development workflow. We will also answer your questions and talk about the future of Langfuse (especially v3).

When: Wednesday, November 20, 2024, at 10 am PT / 7 pm CET
Where: https://lu.ma/c7zsbc3b

Note: The recording of the town hall is now on YouTube: https://www.youtube.com/watch?v=9MzdiL9tUe0

Friday: Product Hunt Launch

We are launching on Product Hunt for the third time on Friday, November 22nd. Stay tuned for the biggest launch of the week and get notified here.

Chat with the community on Discord

Join the community of over 2,000 members on Discord for banter and to talk directly to the team.
