Many LLM applications have a chatbot-like interface in which the user and the application engage in a multi-turn conversation. To track these conversations, you can use threads in LangSmith.

Group traces into threads

To associate traces together into a thread, you need to pass in a special metadata key where the value is the unique identifier for that thread. The key name should be one of:
  • session_id
  • thread_id
  • conversation_id
The value can be any string you want, but we recommend using UUID v7 thread IDs. The LangSmith SDK exports a uuid7 helper (Python v0.4.43+, JS v0.3.80+):
  • Python: from langsmith import uuid7
  • JS/TS: import { uuid7 } from 'langsmith'
For instructions, refer to Add metadata and tags to traces.
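As a minimal sketch of the above, the helper below builds the metadata dictionary for a thread. The `thread_metadata` function and `THREAD_KEYS` constant are hypothetical names used for illustration and are not part of the LangSmith SDK. The fallback to `uuid.uuid4` is also an assumption, added only so the snippet still runs where the `uuid7` helper is unavailable.

```python
import uuid

try:
    # Per the text above, exported by the LangSmith Python SDK v0.4.43+.
    from langsmith import uuid7
except ImportError:
    # Assumption for this sketch only: fall back to a random UUID v4 when
    # the helper is unavailable. UUID v4 values are not time-ordered.
    uuid7 = uuid.uuid4

# Metadata keys that group traces into a thread (from the list above).
THREAD_KEYS = ("session_id", "thread_id", "conversation_id")


def thread_metadata(thread_id=None, key="thread_id"):
    """Return a metadata dict that tags a run with a thread identifier.

    Hypothetical helper: generates a new ID when none is supplied.
    """
    if key not in THREAD_KEYS:
        raise ValueError(f"key must be one of {THREAD_KEYS}, got {key!r}")
    if thread_id is None:
        thread_id = uuid7()
    return {key: str(thread_id)}


metadata = thread_metadata()
print(metadata)
```

The resulting dictionary can be passed wherever run metadata is accepted, for example as the `metadata` argument of `@traceable`.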
Important: To ensure filtering and token counting work correctly across your entire thread, you must set the thread metadata (session_id, thread_id, or conversation_id) on all runs, including child runs within a trace.
If child runs don’t have the thread metadata, they won’t be included when:
  • Filtering runs by thread.
  • Calculating token usage for a thread.
  • Aggregating costs across a thread.
When creating child runs (e.g., using @traceable for nested functions or creating child spans), ensure you propagate the thread metadata to all child runs.
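To illustrate why missing metadata matters, here is a toy model of thread-level aggregation. It is not LangSmith's implementation: the run dictionaries, their `total_tokens` field, and the function names are all hypothetical, chosen only to show how a child run without the thread key drops out of a per-thread total.

```python
THREAD_KEYS = ("session_id", "thread_id", "conversation_id")


def belongs_to_thread(run, thread_id):
    """True if any recognized thread key in the run's metadata matches."""
    metadata = run.get("metadata") or {}
    return any(metadata.get(key) == thread_id for key in THREAD_KEYS)


def thread_token_total(runs, thread_id):
    """Sum tokens over the runs tagged with the given thread ID."""
    return sum(
        run.get("total_tokens", 0)
        for run in runs
        if belongs_to_thread(run, thread_id)
    )


# Hypothetical runs from one trace: the last child run lacks thread metadata.
runs = [
    {"name": "Chat Bot", "metadata": {"thread_id": "t-1"}, "total_tokens": 0},
    {"name": "llm_call", "metadata": {"thread_id": "t-1"}, "total_tokens": 120},
    {"name": "summarize", "metadata": {}, "total_tokens": 45},
]

print(thread_token_total(runs, "t-1"))  # prints 120; the 45 tokens are missed
print([r["name"] for r in runs if not belongs_to_thread(r, "t-1")])  # ['summarize']
```

A check like the second `print` can be a quick way to spot runs that still need the thread metadata.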

Example

This example demonstrates how to log and retrieve conversation history using a structured message format to maintain long-running chats. The example sets a THREAD_ID and passes it via metadata to the tracing wrapper, linking every run from that session into the same thread in LangSmith. Conversation history is persisted locally between turns; in production, replace the file-based or in-memory store with a database or cache. The get_chat_history flag controls whether the pipeline continues an existing thread or starts a fresh one:
import os
import json
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

import openai
from langsmith import traceable, Client, uuid7
from langsmith.wrappers import wrap_openai

# Initialize clients
langsmith_client = Client()
client = wrap_openai(openai.Client())

# Configuration
THREAD_ID = str(uuid7())

# Using a local directory to store thread history. For production use, use a persistent storage solution.
THREADS_DIR = os.path.join(os.path.dirname(__file__), "threads")

# Load the saved messages for a thread from its local JSON file (empty list if none exist)
def get_thread_history(thread_id: str) -> list:
    path = os.path.join(THREADS_DIR, f"{thread_id}.json")
    if not os.path.exists(path):
        return []
    with open(path, "r") as f:
        return json.load(f)

def save_thread_history(thread_id: str, messages: list):
    os.makedirs(THREADS_DIR, exist_ok=True)
    with open(os.path.join(THREADS_DIR, f"{thread_id}.json"), "w") as f:
        json.dump(messages, f, indent=2, default=str)


@traceable(name="Chat Bot", metadata={"thread_id": THREAD_ID})
def chat_pipeline(messages: list, get_chat_history: bool = False):
    # Whether to continue an existing thread or start a new one
    if get_chat_history:
        history_messages = get_thread_history(THREAD_ID)
        # Get existing conversation history and append new messages
        all_messages = history_messages + messages
    else:
        all_messages = messages

    # Invoke the model
    chat_completion = client.chat.completions.create(
        model="gpt-5.4-mini", messages=all_messages
    )

    response_message = chat_completion.choices[0].message
    print("Response from model:", response_message)

    full_conversation = all_messages + [{"role": response_message.role, "content": response_message.content}]
    save_thread_history(THREAD_ID, full_conversation)

    return {"messages": full_conversation}


# Format message
messages = [
    {
        "content": "Hi, my name is Sally",
        "role": "user"
    }
]

# Call the chat pipeline
result = chat_pipeline(messages, get_chat_history=False)
The Java and Kotlin examples use a dedicated executor. Shutting down the executor and awaiting termination ensures that background trace submissions complete before the process exits.
Make the following calls to continue the conversation. Passing get_chat_history=True / get_chat_history: true / getChatHistory = true (depending on the language) continues the conversation from where it left off: the LLM receives the entire message history and responds to it, instead of responding only to the latest message:
# Format message
messages = [
    {
        "content": "What is my name",
        "role": "user"
    }
]

# Call the chat pipeline
result = chat_pipeline(messages, get_chat_history=True)
Keep the conversation going. Since past messages are included, the LLM will remember the conversation:
# Continue the conversation.
messages = [
    {
        "content": "What was the first message I sent you?",
        "role": "user"
    }
]

chat_pipeline(messages, get_chat_history=True)
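The example above keeps history in local JSON files and suggests a database for production. As one possible sketch of that swap, the functions below back the same two operations with SQLite from the Python standard library. The `open_store` helper, the `thread_history` table name, and the extra `conn` parameter are assumptions made for this sketch and do not appear in the original example.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite store with one row per thread."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thread_history ("
        "thread_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
    )
    return conn


def get_thread_history(conn, thread_id):
    """Return the saved message list for a thread, or [] if none exists."""
    row = conn.execute(
        "SELECT messages FROM thread_history WHERE thread_id = ?",
        (thread_id,),
    ).fetchone()
    return json.loads(row[0]) if row else []


def save_thread_history(conn, thread_id, messages):
    """Overwrite the saved message list for a thread."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO thread_history (thread_id, messages) "
            "VALUES (?, ?)",
            (thread_id, json.dumps(messages, default=str)),
        )


# Usage: an in-memory database here; pass a file path to persist across runs.
conn = open_store()
save_thread_history(conn, "demo-thread", [{"role": "user", "content": "Hi"}])
print(get_thread_history(conn, "demo-thread"))
```

Storing the whole conversation as one JSON value mirrors the file-based version; a table with one row per message would be a reasonable alternative for long threads.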

View threads

You can view threads in the UI by clicking on the Threads tab in any project details page. The table shows each thread’s first input, last output, start times, turn count, latency (P50/P99), token usage, cost, and feedback score. The right panel displays aggregate stats for the project, including thread and trace counts, total and median token usage, error rate, and P50/P99 latency.
Use Polly in thread views to analyze conversation threads, understand user sentiment, identify pain points, and track whether issues were resolved.
Click into a particular thread to open it. You can view the thread in three different ways:
  • Messages view (Beta): the conversation layer. Scan each turn as a chat-style thread showing user and assistant messages, tool calls, and subagent activity.
  • Turns view: the per-turn summary. View each turn as a card showing its inputs and outputs, with expand/collapse and customizable input/output fields.
  • Details view: the debugging layer. Drill into a specific run to inspect inputs, outputs, metadata, timing, errors, and child runs. The surrounding thread context stays visible so you can see where the run fits in the broader conversation.
Switch between views using the buttons at the top of the page or keyboard shortcuts M (Messages), T (Turns), and D (Details). While the Messages view is in beta, the thread side panel defaults to the Details view. The right panel shows stats for the thread, including turn count, first and last start times, P50/P99 latency, and a cost breakdown by input and output tokens. For a full description of each view, see View traces.

View feedback

Feedback scores are visible in the Feedback column of the threads table on the project’s Threads tab. Within a thread, open the Messages view and click the LLM call link in a turn’s metadata row to go to the Details view for that run, where you can review feedback for the run. You can also see thread-level feedback there.

Save thread-level filter

Thread filters look through all runs and surface a thread if at least one run matches the filter.
On the Threads tab of a project, you can save commonly used filters: set a filter using the Add filter button, then click Save view.
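The "at least one run matches" rule described in this section can be sketched as a toy model. This is not LangSmith's filter engine: the run dictionaries and the `matching_threads` function are hypothetical, and the predicate stands in for whatever filter is set in the UI.

```python
from collections import defaultdict


def matching_threads(runs, predicate):
    """Return the IDs of threads in which at least one run matches."""
    by_thread = defaultdict(list)
    for run in runs:
        by_thread[run["thread_id"]].append(run)
    return [
        thread_id
        for thread_id, thread_runs in by_thread.items()
        if any(predicate(run) for run in thread_runs)
    ]


# Hypothetical runs: only one run in thread t-1 has an error.
runs = [
    {"thread_id": "t-1", "name": "Chat Bot", "error": None},
    {"thread_id": "t-1", "name": "llm_call", "error": "RateLimitError"},
    {"thread_id": "t-2", "name": "Chat Bot", "error": None},
]

print(matching_threads(runs, lambda run: run["error"] is not None))  # ['t-1']
```

In this model, thread t-1 is surfaced even though only one of its runs matches, and t-2 is not surfaced because none of its runs match.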