
Multi-Modality and Attachments

Support for media attachments to traces is currently in beta. Please report any issues and add feature requests to this ongoing discussion thread.

Langfuse supports multi-modal traces including text, images, audio, and other attachments.

By default, base64 encoded data URIs are handled automatically by the Langfuse SDKs. They are extracted from the payloads commonly used in multi-modal LLMs, uploaded to Langfuse’s object storage, and linked to the trace.

This also works if you:

  1. Reference media files via external URLs.
  2. Customize the handling of media files in the SDKs via the LangfuseMedia class.
  3. Integrate via the Langfuse API directly.

Learn more about how to get started and how this works under the hood below.

Examples

Example trace with media attachments rendered in the Langfuse UI.

Availability

Langfuse Cloud

Multi-modal attachments on Langfuse Cloud are free while in beta. We will be rolling out a new pricing metric to account for the additional storage and compute costs associated with large multi-modal traces in the coming weeks.

Self-hosting

Multi-modal attachments are available today. You need to configure your own object storage bucket via the Langfuse environment variables (LANGFUSE_S3_MEDIA_UPLOAD_*). See the self-hosting documentation for details on these environment variables. S3-compatible APIs are available from all major cloud providers and can be self-hosted, e.g., via MinIO.
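As an illustration, a minimal environment configuration for an S3-compatible store might look like the following. Values are placeholders; see the self-hosting documentation for the authoritative list of LANGFUSE_S3_MEDIA_UPLOAD_* variables and their semantics.

```shell
# Illustrative object storage config for self-hosted Langfuse media uploads.
# Replace bucket, region, and credentials with your own values.
export LANGFUSE_S3_MEDIA_UPLOAD_BUCKET="langfuse-media"
export LANGFUSE_S3_MEDIA_UPLOAD_REGION="us-east-1"
export LANGFUSE_S3_MEDIA_UPLOAD_ACCESS_KEY_ID="<access-key-id>"
export LANGFUSE_S3_MEDIA_UPLOAD_SECRET_ACCESS_KEY="<secret-access-key>"

# For S3-compatible stores such as MinIO, additionally point at your
# endpoint and enable path-style addressing:
export LANGFUSE_S3_MEDIA_UPLOAD_ENDPOINT="http://localhost:9000"
export LANGFUSE_S3_MEDIA_UPLOAD_FORCE_PATH_STYLE="true"
```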

Supported media formats

Langfuse supports:

  • Images: .png, .jpg, .webp
  • Audio files: .mpeg, .mp3, .wav
  • Other attachments: .pdf, plain text

If you require support for additional file types, please let us know in our GitHub Discussion where we’re actively gathering feedback on multi-modal support.

Get Started

Base64 data URI encoded media

If you use base64 encoded images, audio, or other files in your LLM applications, upgrade to the latest version of the Langfuse SDKs. The Langfuse SDKs automatically detect and handle base64 encoded media by extracting it, uploading it separately as a Langfuse Media file, and including a reference in the trace.

This works with standard Data URI (MDN) formatted media (like those used by OpenAI and other LLMs).
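As a sketch, this is how a base64 data URI typically ends up in an OpenAI-style multi-modal chat payload. The message shape follows OpenAI's chat format; the image bytes here are a stand-in for a real file (in practice you would read them with `open("photo.png", "rb").read()`), and the Langfuse SDKs detect and extract the data URI automatically when the payload is traced.

```python
import base64

# Stand-in for real image content; normally read from a file.
image_bytes = b"\x89PNG\r\n\x1a\n"

# Build a standard data URI: data:<MIME type>;base64,<payload>
b64 = base64.b64encode(image_bytes).decode("utf-8")
data_uri = f"data:image/png;base64,{b64}"

# In an OpenAI-style chat payload the data URI goes into image_url;
# the Langfuse SDKs recognize this pattern and upload the media separately.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": data_uri}},
    ],
}
```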

This notebook includes a couple of examples using the OpenAI SDK and LangChain.

External media (URLs)

Langfuse supports in-line rendering of media files via URLs if they follow common formats. In this case, the media file is not uploaded to Langfuse’s object storage but simply rendered in the UI directly from the source.

Supported formats:

![Alt text](https://example.com/image.jpg)

Custom attachments

If you want to have more control or your media is not base64 encoded, you can upload arbitrary media attachments to Langfuse via the SDKs using the new LangfuseMedia class. Wrap media with LangfuseMedia before including it in trace inputs, outputs, or metadata. See the multi-modal documentation for examples.

from langfuse.decorators import observe, langfuse_context
from langfuse.media import LangfuseMedia
 
with open("static/bitcoin.pdf", "rb") as pdf_file:
    pdf_bytes = pdf_file.read()
 
# Wrap media in LangfuseMedia class
wrapped_obj = LangfuseMedia(
    obj=pdf_bytes, content_bytes=pdf_bytes, content_type="application/pdf"
)
 
# Optionally, access media via wrapped_obj.obj
wrapped_obj.obj
 
@observe()
def main():
    langfuse_context.update_current_trace(
      input=wrapped_obj,
      metadata={
          "context": wrapped_obj
      },
    )
 
    # Limitation: LangfuseMedia objects do not work in decorated function IO;
    # set them via update_current_trace or update_current_observation instead.
    return
 
main()

API

If you use the API directly to log traces to Langfuse, you need to follow these steps:

Upload media to Langfuse

  1. If you use base64 encoded media: you need to extract it from the trace payloads similar to how the Langfuse SDKs do it.
  2. Initialize the upload and get a mediaId and presigned URL: POST /api/public/media.
  3. Upload media file: PUT [presignedURL].

See this end-to-end example (Python) on how to use the API directly to upload media files.
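A minimal sketch of the two API steps in Python, using only the standard library. Field names such as traceId, contentType, contentLength, sha256Hash, uploadUrl, and mediaId are assumptions based on the steps above; consult the Langfuse API reference for the exact request and response schemas.

```python
import base64
import hashlib
import json
from urllib import request


def build_media_init_payload(trace_id: str, content: bytes, content_type: str) -> dict:
    """Build the body for the upload-initialization call.

    Field names are illustrative assumptions; check the Langfuse API
    reference for the authoritative schema.
    """
    sha256 = base64.b64encode(hashlib.sha256(content).digest()).decode("utf-8")
    return {
        "traceId": trace_id,
        "contentType": content_type,
        "contentLength": len(content),
        "sha256Hash": sha256,
        "field": "input",  # where the media is referenced: input/output/metadata
    }


def upload_media(host: str, auth_header: str, trace_id: str,
                 content: bytes, content_type: str) -> str:
    # 1. Initialize the upload to obtain a mediaId and presigned URL.
    body = json.dumps(build_media_init_payload(trace_id, content, content_type)).encode()
    req = request.Request(
        f"{host}/api/public/media", data=body, method="POST",
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        init = json.load(resp)

    # 2. PUT the raw bytes to the presigned URL. If the server deduplicated
    #    the file, no upload URL is returned and this step is skipped.
    if init.get("uploadUrl"):
        put = request.Request(init["uploadUrl"], data=content, method="PUT",
                              headers={"Content-Type": content_type})
        request.urlopen(put).close()
    return init["mediaId"]
```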

Add reference to mediaId in trace/observation

Use the Langfuse Media Token to reference the mediaId in the trace or observation input, output, or metadata.
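As an illustration, such a reference is just a plain string embedded in the trace payload. The token format is documented in the section below; the media ID used here is a placeholder.

```python
def media_reference(mime_type: str, media_id: str, source: str = "bytes") -> str:
    """Build a Langfuse media reference token for a given mediaId."""
    return f"@@@langfuseMedia:type={mime_type}|id={media_id}|source={source}@@@"


# Placeholder mediaId; in practice this comes from the upload-init response.
ref = media_reference("application/pdf", "my-media-id")

# Embed the reference wherever the media belongs: input, output, or metadata.
trace_input = {"context": ref}
```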

How does it work?

When using media files (that are not referenced via external URLs), Langfuse handles them in the following way:

1. Media Upload Process

Detection and Extraction

  • Langfuse supports media files in traces and observations on input, output, and metadata fields
  • SDKs separate media from tracing data client-side for performance optimization
  • Media files are uploaded directly to object storage (AWS S3 or compatible)
  • Original media content is replaced with a reference string

Security and Optimization

  • Uploads use presigned URLs with content validation (content length, content type, content SHA256 hash)
  • Deduplication: if a file with the same content has already been uploaded, the upload is skipped and only the mediaId reference string is written to the trace
  • File uniqueness determined by project, content type, and content SHA256 hash
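The uniqueness rule above can be sketched as a simple key function. This is illustrative only; the actual server-side key derivation is internal to Langfuse.

```python
import hashlib


def media_dedup_key(project_id: str, content_type: str, content: bytes) -> tuple:
    """Uniqueness key as described above: project, content type, content hash."""
    return (project_id, content_type, hashlib.sha256(content).hexdigest())


# Identical bytes with the same content type in the same project produce the
# same key, so the file is uploaded once and later occurrences are replaced
# by the mediaId reference string.
a = media_dedup_key("proj-1", "application/pdf", b"%PDF-1.4 example")
b = media_dedup_key("proj-1", "application/pdf", b"%PDF-1.4 example")
```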

Implementation Details

  • Python SDK: Background thread handling for non-blocking execution
  • JS/TS SDKs: Asynchronous, non-blocking implementation
  • API support for direct uploads (see guide)

2. Media Reference System

The base64 data URIs and the wrapped LangfuseMedia objects in Langfuse traces are replaced by references to the mediaId in the following standardized token format, which helps reconstruct the original payload if needed:

@@@langfuseMedia:type={MIME_TYPE}|id={LANGFUSE_MEDIA_ID}|source={SOURCE_TYPE}@@@
  • MIME_TYPE: MIME type of the media file, e.g., image/jpeg
  • LANGFUSE_MEDIA_ID: ID of the media file in Langfuse’s object storage
  • SOURCE_TYPE: Source type of the media file, can be base64_data_uri, bytes, or file

Based on this token, the Langfuse UI can automatically detect the mediaId and render the media file inline. The LangfuseMedia class provides utility functions to extract the mediaId from the reference string.
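For example, the token can be parsed with a regular expression. This is a standalone sketch; the LangfuseMedia class ships its own utilities for this, and the media ID below is a placeholder.

```python
import re

# Matches the token format:
# @@@langfuseMedia:type={MIME_TYPE}|id={LANGFUSE_MEDIA_ID}|source={SOURCE_TYPE}@@@
MEDIA_TOKEN_RE = re.compile(
    r"@@@langfuseMedia:type=(?P<mime_type>[^|]+)"
    r"\|id=(?P<media_id>[^|]+)"
    r"\|source=(?P<source>[^@]+)@@@"
)


def parse_media_token(token: str) -> dict:
    """Extract MIME type, media ID, and source type from a reference token."""
    match = MEDIA_TOKEN_RE.search(token)
    if not match:
        raise ValueError("not a Langfuse media token")
    return match.groupdict()


# Placeholder media ID for illustration.
parsed = parse_media_token(
    "@@@langfuseMedia:type=image/jpeg|id=cc48838a-3da8-4ca4-a007-2cf8df930e69|source=base64_data_uri@@@"
)
```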
