Multi-Modality and Attachments
Support for media attachments to traces is currently in beta. Please report any issues and add feature requests to this ongoing discussion thread.
Langfuse supports multi-modal traces including text, images, audio, and other attachments.
By default, base64 encoded data URIs are handled automatically by the Langfuse SDKs. They are extracted from the payloads commonly used in multi-modal LLMs, uploaded to Langfuse’s object storage, and linked to the trace.
This also works if you:
- Reference media files via external URLs.
- Customize the handling of media files in the SDKs via the `LangfuseMedia` class.
- Integrate via the Langfuse API directly.
Learn more below about how to get started and how this works under the hood.
Availability
Langfuse Cloud
Multi-modal attachments on Langfuse Cloud are free while in beta. We will be rolling out a new pricing metric to account for the additional storage and compute costs associated with large multi-modal traces in the coming weeks.
Self-hosting
Multi-modal attachments are available today. You need to configure your own object storage bucket via the Langfuse environment variables (`LANGFUSE_S3_MEDIA_UPLOAD_*`). See the self-hosting documentation for details on these environment variables. S3-compatible APIs are supported by all major cloud providers and can be self-hosted via MinIO.
Supported media formats
Langfuse supports:
- Images: .png, .jpg, .webp
- Audio files: .mpeg, .mp3, .wav
- Other attachments: .pdf, plain text
If you require support for additional file types, please let us know in our GitHub Discussion where we’re actively gathering feedback on multi-modal support.
Get Started
Base64 data URI encoded media
If you use base64 encoded images, audio, or other files in your LLM applications, upgrade to the latest version of the Langfuse SDKs. The Langfuse SDKs automatically detect and handle base64 encoded media by extracting it, uploading it separately as a Langfuse Media file, and including a reference in the trace.
This works with standard Data URI (MDN) formatted media (like those used by OpenAI and other LLMs).
The accompanying notebook includes examples using the OpenAI SDK and LangChain.
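For illustration, here is a minimal sketch of this pattern using the Langfuse OpenAI integration (a drop-in wrapper for the OpenAI SDK); the file path, prompt, and model name are placeholder assumptions:

```python
import base64

from langfuse.openai import openai  # Langfuse drop-in wrapper for the OpenAI SDK

# Read a local image and encode it as a base64 data URI (path is a placeholder)
with open("static/example-image.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

response = openai.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
            ],
        }
    ],
)

# In the resulting trace, the SDK uploads the image to object storage and
# replaces the data URI with a Langfuse media reference token.
```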
External media (URLs)
Langfuse supports in-line rendering of media files via URLs if they follow common formats. In this case, the media file is not uploaded to Langfuse’s object storage but simply rendered in the UI directly from the source.
Supported formats:

```md
![Alt text](https://example.com/image.jpg)
```
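As a minimal sketch of how this can be used with the low-level Python SDK (the trace name and URL are placeholders), the markdown string can be logged directly as a trace input:

```python
from langfuse import Langfuse

langfuse = Langfuse()

# The markdown image string renders inline in the Langfuse UI; the file
# itself stays at the external URL and is not uploaded to object storage.
trace = langfuse.trace(
    name="external-image-demo",  # placeholder name
    input="![Product photo](https://example.com/image.jpg)",
)
```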
Custom attachments
If you want more control, or your media is not base64 encoded, you can upload arbitrary media attachments to Langfuse via the SDKs using the new `LangfuseMedia` class. Wrap media with `LangfuseMedia` before including it in trace inputs, outputs, or metadata. See the multi-modal documentation for examples.
```python
from langfuse.decorators import observe, langfuse_context
from langfuse.media import LangfuseMedia

# Read the PDF as raw bytes
with open("static/bitcoin.pdf", "rb") as pdf_file:
    pdf_bytes = pdf_file.read()

# Wrap media in LangfuseMedia class
wrapped_obj = LangfuseMedia(
    obj=pdf_bytes, content_bytes=pdf_bytes, content_type="application/pdf"
)

# Optionally, access media via wrapped_obj.obj
wrapped_obj.obj

@observe()
def main():
    langfuse_context.update_current_trace(
        input=wrapped_obj,
        metadata={
            "context": wrapped_obj
        },
    )

    # Limitation: LangfuseMedia objects do not work in decorated function
    # inputs/outputs; set them via update_current_trace or
    # update_current_observation instead.
    return

main()
```
API
If you use the API directly to log traces to Langfuse, you need to follow these steps:
Upload media to Langfuse

- If you use base64 encoded media: extract it from the trace payloads, similar to how the Langfuse SDKs do it.
- Initialize the upload and get a `mediaId` and `presignedURL`: `POST /api/public/media`.
- Upload the media file: `PUT [presignedURL]`.
See this end-to-end example (Python) on how to use the API directly to upload media files.
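For orientation, here is a minimal sketch of the two-step flow using the `requests` library; the keys, trace ID, and file path are placeholders, and the exact request and response fields should be verified against the API reference:

```python
import base64
import hashlib
import requests

BASE_URL = "https://cloud.langfuse.com"  # or your self-hosted instance
AUTH = ("pk-lf-...", "sk-lf-...")        # public/secret key pair (placeholders)

with open("static/bitcoin.pdf", "rb") as f:
    content = f.read()

# SHA256 hash of the content, base64 encoded; used for validation and dedup
sha256 = base64.b64encode(hashlib.sha256(content).digest()).decode()

# Step 1: initialize the upload to obtain a mediaId and presigned URL
init = requests.post(
    f"{BASE_URL}/api/public/media",
    auth=AUTH,
    json={
        "traceId": "my-trace-id",          # placeholder
        "contentType": "application/pdf",
        "contentLength": len(content),
        "sha256Hash": sha256,
        "field": "input",                  # trace field the media belongs to
    },
).json()

# Step 2: PUT the file to the presigned URL. If this exact file was already
# uploaded (deduplication), no upload URL is returned and this is skipped.
if init.get("uploadUrl"):
    requests.put(
        init["uploadUrl"],
        data=content,
        headers={
            "Content-Type": "application/pdf",
            "x-amz-checksum-sha256": sha256,
        },
    )
```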
Add reference to mediaId in trace/observation

Use the Langfuse Media Token to reference the `mediaId` in the trace or observation `input`, `output`, or `metadata`.
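For example, a chat message in a trace `input` could look like this after the raw base64 data URI has been swapped for a media token (the `mediaId` value is a placeholder):

```python
# Hypothetical trace input referencing an uploaded image by mediaId
trace_input = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "@@@langfuseMedia:type=image/jpeg|id=<mediaId>|source=base64_data_uri@@@"
            },
        },
    ],
}
```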
How does it work?
When using media files (that are not referenced via external URLs), Langfuse handles them in the following way:
1. Media Upload Process
Detection and Extraction
- Langfuse supports media files in traces and observations on `input`, `output`, and `metadata` fields
- SDKs separate media from tracing data client-side for performance optimization
- Media files are uploaded directly to object storage (AWS S3 or compatible)
- Original media content is replaced with a reference string
Security and Optimization
- Uploads use presigned URLs with content validation (content length, content type, content SHA256 hash)
- Deduplication: files are simply replaced by their `mediaId` reference string if already uploaded
- File uniqueness is determined by project, content type, and content SHA256 hash
Implementation Details
- Python SDK: Background thread handling for non-blocking execution
- JS/TS SDKs: Asynchronous, non-blocking implementation
- API support for direct uploads (see guide)
2. Media Reference System
The base64 data URIs and the wrapped `LangfuseMedia` objects in Langfuse traces are replaced by references to the `mediaId` in the following standardized token format, which helps reconstruct the original payload if needed:

```
@@@langfuseMedia:type={MIME_TYPE}|id={LANGFUSE_MEDIA_ID}|source={SOURCE_TYPE}@@@
```

- `MIME_TYPE`: MIME type of the media file, e.g., `image/jpeg`
- `LANGFUSE_MEDIA_ID`: ID of the media file in Langfuse's object storage
- `SOURCE_TYPE`: Source type of the media file; can be `base64_data_uri`, `bytes`, or `file`
Based on this token, the Langfuse UI can automatically detect the `mediaId` and render the media file inline. The `LangfuseMedia` class provides utility functions to extract the `mediaId` from the reference string.
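As a short sketch, assuming the `parse_reference_string` helper on `LangfuseMedia` (check your SDK version for the exact helper name and return shape):

```python
from langfuse.media import LangfuseMedia

token = "@@@langfuseMedia:type=application/pdf|id=some-media-id|source=bytes@@@"

# Parse the reference token back into its components
parsed = LangfuseMedia.parse_reference_string(token)
print(parsed["media_id"], parsed["content_type"], parsed["source"])
```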