Messages and chat history
PydanticAI provides access to messages exchanged during an agent run. These messages can be used both to continue a coherent conversation, and to understand how an agent performed.
Accessing Messages from Results
After running an agent, you can access the messages exchanged during that run from the result object.
Both RunResult (returned by Agent.run, Agent.run_sync) and StreamedRunResult (returned by Agent.run_stream) have the following methods:
- all_messages(): returns all messages, including messages from prior runs. There's also a variant that returns JSON bytes, all_messages_json().
- new_messages(): returns only the messages from the current run. There's also a variant that returns JSON bytes, new_messages_json().
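The relationship between the two methods can be pictured with a small library-free sketch. FakeResult below is a hypothetical stand-in, not PydanticAI's RunResult; it assumes the result keeps one list of messages plus the index at which the current run's messages begin.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for a run result; not part of PydanticAI."""

    messages: list[str] = field(default_factory=list)
    new_message_index: int = 0  # assumed: where the current run's messages start

    def all_messages(self) -> list[str]:
        # Everything: messages passed in as history plus those from this run.
        return list(self.messages)

    def new_messages(self) -> list[str]:
        # Only the tail produced by the current run.
        return self.messages[self.new_message_index:]


history = ['user: Tell me a joke.', 'model: (a joke)']
this_run = ['user: Explain?', 'model: (an explanation)']
result = FakeResult(messages=history + this_run, new_message_index=len(history))

print(result.all_messages() == history + this_run)
#> True
print(result.new_messages() == this_run)
#> True
```

Seen this way, new_messages() suits "append this turn to what I already stored", while all_messages() suits "replace what I stored with the full conversation".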
StreamedRunResult and complete messages
On StreamedRunResult, the messages returned from these methods will only include the final result message once the stream has finished, e.g. once you've awaited one of the following coroutines:
- StreamedRunResult.stream()
- StreamedRunResult.stream_text()
- StreamedRunResult.stream_structured()
- StreamedRunResult.get_output()
Note: The final result message will NOT be added to result messages if you use .stream_text(delta=True), since in this case the result content is never built as one string.
Example of accessing methods on a RunResult:
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o', system_prompt='Be a helpful assistant.')

result = agent.run_sync('Tell me a joke.')
print(result.output)
#> Did you hear about the toothpaste scandal? They called it Colgate.

# all messages from the run
print(result.all_messages())
"""
[
    ModelRequest(
        parts=[
            SystemPromptPart(
                content='Be a helpful assistant.',
                timestamp=datetime.datetime(...),
            ),
            UserPromptPart(
                content='Tell me a joke.',
                timestamp=datetime.datetime(...),
            ),
        ]
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='Did you hear about the toothpaste scandal? They called it Colgate.'
            )
        ],
        usage=Usage(requests=1, request_tokens=60, response_tokens=12, total_tokens=72),
        model_name='gpt-4o',
        timestamp=datetime.datetime(...),
    ),
]
"""
Example of accessing methods on a StreamedRunResult:
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o', system_prompt='Be a helpful assistant.')


async def main():
    async with agent.run_stream('Tell me a joke.') as result:
        # incomplete messages before the stream finishes
        print(result.all_messages())
        """
        [
            ModelRequest(
                parts=[
                    SystemPromptPart(
                        content='Be a helpful assistant.',
                        timestamp=datetime.datetime(...),
                    ),
                    UserPromptPart(
                        content='Tell me a joke.',
                        timestamp=datetime.datetime(...),
                    ),
                ]
            )
        ]
        """

        async for text in result.stream_text():
            print(text)
            #> Did you hear
            #> Did you hear about the toothpaste
            #> Did you hear about the toothpaste scandal? They called
            #> Did you hear about the toothpaste scandal? They called it Colgate.

        # complete messages once the stream finishes
        print(result.all_messages())
        """
        [
            ModelRequest(
                parts=[
                    SystemPromptPart(
                        content='Be a helpful assistant.',
                        timestamp=datetime.datetime(...),
                    ),
                    UserPromptPart(
                        content='Tell me a joke.',
                        timestamp=datetime.datetime(...),
                    ),
                ]
            ),
            ModelResponse(
                parts=[
                    TextPart(
                        content='Did you hear about the toothpaste scandal? They called it Colgate.'
                    )
                ],
                usage=Usage(request_tokens=50, response_tokens=12, total_tokens=62),
                model_name='gpt-4o',
                timestamp=datetime.datetime(...),
            ),
        ]
        """
(This example is complete, it can be run "as is"; you'll need to add asyncio.run(main()) to run main)
Using Messages as Input for Further Agent Runs
The primary use of message histories in PydanticAI is to maintain context across multiple agent runs.
To use existing messages in a run, pass them to the message_history parameter of Agent.run, Agent.run_sync or Agent.run_stream.
If message_history is set and not empty, a new system prompt is not generated; we assume the existing message history includes a system prompt.
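The system prompt rule can be sketched without the library. The build_messages function and the (role, content) tuple format below are hypothetical illustrations of the behavior described here, not PydanticAI internals.

```python
from typing import Optional

Message = tuple[str, str]  # (role, content); a simplified stand-in for real messages


def build_messages(
    user_prompt: str,
    system_prompt: str,
    message_history: Optional[list[Message]] = None,
) -> list[Message]:
    """Assemble the messages for a run, following the rule described above."""
    if message_history:
        # Non-empty history: assume it already carries a system prompt.
        messages = list(message_history)
    else:
        # No history (None or empty list): generate the system prompt afresh.
        messages = [('system', system_prompt)]
    messages.append(('user', user_prompt))
    return messages


first = build_messages('Tell me a joke.', 'Be a helpful assistant.')
second = build_messages('Explain?', 'Be a helpful assistant.', message_history=first)

print([role for role, _ in second])
#> ['system', 'user', 'user']
```

One practical consequence: a history that lacks a system prompt (for example, one assembled by hand) will not have one added for it.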
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o', system_prompt='Be a helpful assistant.')

result1 = agent.run_sync('Tell me a joke.')
print(result1.output)
#> Did you hear about the toothpaste scandal? They called it Colgate.

result2 = agent.run_sync('Explain?', message_history=result1.new_messages())
print(result2.output)
#> This is an excellent joke invented by Samuel Colvin, it needs no explanation.

print(result2.all_messages())
"""
[
    ModelRequest(
        parts=[
            SystemPromptPart(
                content='Be a helpful assistant.',
                timestamp=datetime.datetime(...),
            ),
            UserPromptPart(
                content='Tell me a joke.',
                timestamp=datetime.datetime(...),
            ),
        ]
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='Did you hear about the toothpaste scandal? They called it Colgate.'
            )
        ],
        usage=Usage(requests=1, request_tokens=60, response_tokens=12, total_tokens=72),
        model_name='gpt-4o',
        timestamp=datetime.datetime(...),
    ),
    ModelRequest(
        parts=[
            UserPromptPart(
                content='Explain?',
                timestamp=datetime.datetime(...),
            )
        ]
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='This is an excellent joke invented by Samuel Colvin, it needs no explanation.'
            )
        ],
        usage=Usage(requests=1, request_tokens=61, response_tokens=26, total_tokens=87),
        model_name='gpt-4o',
        timestamp=datetime.datetime(...),
    ),
]
"""
Storing and loading messages (to JSON)
While maintaining conversation state in memory is enough for many applications, you may often want to store the message history of an agent run on disk or in a database. This might be for evals, for sharing data between Python and JavaScript/TypeScript, or any number of other use cases.
The intended way to do this is using a TypeAdapter.
We export ModelMessagesTypeAdapter, which can be used for this, or you can create your own.
Here's an example showing how:
from pydantic_core import to_jsonable_python

from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessagesTypeAdapter  # (1)

agent = Agent('openai:gpt-4o', system_prompt='Be a helpful assistant.')

result1 = agent.run_sync('Tell me a joke.')
history_step_1 = result1.all_messages()
as_python_objects = to_jsonable_python(history_step_1)  # (2)
same_history_as_step_1 = ModelMessagesTypeAdapter.validate_python(as_python_objects)

result2 = agent.run_sync(  # (3)
    'Tell me a different joke.', message_history=same_history_as_step_1
)
- (1) Alternatively, you can create a TypeAdapter from scratch:
    from pydantic import TypeAdapter
    from pydantic_ai.messages import ModelMessage
    ModelMessagesTypeAdapter = TypeAdapter(list[ModelMessage])
- (2) Alternatively, you can serialize to/from JSON directly:
    from pydantic_core import to_json
    ...
    as_json_objects = to_json(history_step_1)
    same_history_as_step_1 = ModelMessagesTypeAdapter.validate_json(as_json_objects)
- (3) You can now continue the conversation with history same_history_as_step_1 despite creating a new agent run.
(This example is complete, it can be run "as is")
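For the "on disk" case, the JSON bytes only need to be written to a file and read back. The following is a minimal sketch using only the standard library; save_history, load_history and the file name are hypothetical helpers, and the payload is placeholder bytes rather than a real serialized message. In a real application the bytes would presumably come from all_messages_json() and be handed to ModelMessagesTypeAdapter.validate_json() after loading.

```python
import tempfile
from pathlib import Path


def save_history(path: Path, payload: bytes) -> None:
    """Write serialized history via a temporary file, then rename it into place."""
    tmp = path.with_suffix(path.suffix + '.tmp')
    tmp.write_bytes(payload)
    tmp.replace(path)  # rename over the old file so a reader never sees a partial write


def load_history(path: Path) -> bytes:
    """Return the stored bytes, or an empty JSON list if nothing was saved yet."""
    return path.read_bytes() if path.exists() else b'[]'


with tempfile.TemporaryDirectory() as tmp_dir:
    store = Path(tmp_dir) / 'conversation.json'

    first_load = load_history(store)  # nothing stored yet

    # In a real application: payload = result.all_messages_json()
    payload = b'[{"kind": "placeholder"}]'  # placeholder bytes, not a real message
    save_history(store, payload)

    restored = load_history(store)
    # In a real application: ModelMessagesTypeAdapter.validate_json(restored)

print(first_load)
#> b'[]'
print(restored == payload)
#> True
```

Returning an empty JSON list for a missing file is a design choice of this sketch: it lets the first run of a conversation follow the same load-then-validate path as later runs.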
Other ways of using messages
Since messages are defined by simple dataclasses, you can manually create and manipulate them, e.g. for testing.
The message format is independent of the model used, so you can use messages in different agents, or the same agent with different models.
In the example below, we reuse the messages from the first agent run, which uses the openai:gpt-4o model, in a second agent run using the google-gla:gemini-1.5-pro model.
from pydantic_ai import Agent

agent = Agent('openai:gpt-4o', system_prompt='Be a helpful assistant.')

result1 = agent.run_sync('Tell me a joke.')
print(result1.output)
#> Did you hear about the toothpaste scandal? They called it Colgate.

result2 = agent.run_sync(
    'Explain?',
    model='google-gla:gemini-1.5-pro',
    message_history=result1.new_messages(),
)
print(result2.output)
#> This is an excellent joke invented by Samuel Colvin, it needs no explanation.

print(result2.all_messages())
"""
[
    ModelRequest(
        parts=[
            SystemPromptPart(
                content='Be a helpful assistant.',
                timestamp=datetime.datetime(...),
            ),
            UserPromptPart(
                content='Tell me a joke.',
                timestamp=datetime.datetime(...),
            ),
        ]
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='Did you hear about the toothpaste scandal? They called it Colgate.'
            )
        ],
        usage=Usage(requests=1, request_tokens=60, response_tokens=12, total_tokens=72),
        model_name='gpt-4o',
        timestamp=datetime.datetime(...),
    ),
    ModelRequest(
        parts=[
            UserPromptPart(
                content='Explain?',
                timestamp=datetime.datetime(...),
            )
        ]
    ),
    ModelResponse(
        parts=[
            TextPart(
                content='This is an excellent joke invented by Samuel Colvin, it needs no explanation.'
            )
        ],
        usage=Usage(requests=1, request_tokens=61, response_tokens=26, total_tokens=87),
        model_name='gemini-1.5-pro',
        timestamp=datetime.datetime(...),
    ),
]
"""
Processing Message History
Sometimes you may want to modify the message history before it's sent to the model. This could be for privacy reasons (filtering out sensitive information), to save costs on tokens, to give less context to the LLM, or to apply custom processing logic.
PydanticAI provides a history_processors parameter on Agent that allows you to intercept and modify the message history before each model request.
Usage
history_processors is a list of callables that take a list of ModelMessage and return a modified list of the same type.
Each processor is applied in sequence, and processors can be either synchronous or asynchronous.
from pydantic_ai import Agent
from pydantic_ai.messages import (
    ModelMessage,
    ModelRequest,
    ModelResponse,
    TextPart,
    UserPromptPart,
)


def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Remove all ModelResponse messages, keeping only ModelRequest messages."""
    return [msg for msg in messages if isinstance(msg, ModelRequest)]


# Create agent with history processor
agent = Agent('openai:gpt-4o', history_processors=[filter_responses])

# Example: Create some conversation history
message_history = [
    ModelRequest(parts=[UserPromptPart(content='What is 2+2?')]),
    ModelResponse(parts=[TextPart(content='2+2 equals 4')]),  # This will be filtered out
]

# When you run the agent, the history processor will filter out ModelResponse messages
# result = agent.run_sync('What about 3+3?', message_history=message_history)
Keep Only Recent Messages
You can use a history processor to keep only the most recent messages:
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage


async def keep_recent_messages(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Keep only the last 5 messages to manage token usage."""
    return messages[-5:] if len(messages) > 5 else messages


agent = Agent('openai:gpt-4o', history_processors=[keep_recent_messages])

# Example: Even with a long conversation history, only the last 5 messages are sent to the model
long_conversation_history: list[ModelMessage] = []  # Your long conversation history here
# result = agent.run_sync('What did we discuss?', message_history=long_conversation_history)
RunContext parameter
History processors can optionally accept a RunContext parameter to access additional information about the current run, such as dependencies, model information, and usage statistics:
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage
from pydantic_ai.tools import RunContext


def context_aware_processor(
    ctx: RunContext[None],
    messages: list[ModelMessage],
) -> list[ModelMessage]:
    # Access current usage
    current_tokens = ctx.usage.total_tokens

    # Filter messages based on context
    if current_tokens > 1000:
        return messages[-3:]  # Keep only recent messages when token usage is high
    return messages


agent = Agent('openai:gpt-4o', history_processors=[context_aware_processor])
This allows for more sophisticated message processing based on the current state of the agent run.
Summarize Old Messages
Use an LLM to summarize older messages to preserve context while reducing tokens.
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage

# Use a cheaper model to summarize old messages.
summarize_agent = Agent(
    'openai:gpt-4o-mini',
    instructions="""
Summarize this conversation, omitting small talk and unrelated topics.
Focus on the technical discussion and next steps.
""",
)


async def summarize_old_messages(messages: list[ModelMessage]) -> list[ModelMessage]:
    # Summarize the oldest 10 messages
    if len(messages) > 10:
        oldest_messages = messages[:10]
        summary = await summarize_agent.run(message_history=oldest_messages)
        # Return the last message and the summary
        return summary.new_messages() + messages[-1:]

    return messages


agent = Agent('openai:gpt-4o', history_processors=[summarize_old_messages])
Testing History Processors
You can test what messages are actually sent to the model provider using FunctionModel:
import pytest

from pydantic_ai import Agent
from pydantic_ai.messages import (
    ModelMessage,
    ModelRequest,
    ModelResponse,
    TextPart,
    UserPromptPart,
)
from pydantic_ai.models.function import AgentInfo, FunctionModel


@pytest.fixture
def received_messages() -> list[ModelMessage]:
    return []


@pytest.fixture
def function_model(received_messages: list[ModelMessage]) -> FunctionModel:
    def capture_model_function(messages: list[ModelMessage], info: AgentInfo) -> ModelResponse:
        # Capture the messages that the provider actually receives
        received_messages.clear()
        received_messages.extend(messages)
        return ModelResponse(parts=[TextPart(content='Provider response')])

    return FunctionModel(capture_model_function)


def test_history_processor(function_model: FunctionModel, received_messages: list[ModelMessage]):
    def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
        return [msg for msg in messages if isinstance(msg, ModelRequest)]

    agent = Agent(function_model, history_processors=[filter_responses])

    message_history = [
        ModelRequest(parts=[UserPromptPart(content='Question 1')]),
        ModelResponse(parts=[TextPart(content='Answer 1')]),
    ]

    agent.run_sync('Question 2', message_history=message_history)
    assert received_messages == [
        ModelRequest(parts=[UserPromptPart(content='Question 1')]),
        ModelRequest(parts=[UserPromptPart(content='Question 2')]),
    ]
Multiple Processors
You can also use multiple processors:
from pydantic_ai import Agent
from pydantic_ai.messages import ModelMessage, ModelRequest


def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
    return [msg for msg in messages if isinstance(msg, ModelRequest)]


def summarize_old_messages(messages: list[ModelMessage]) -> list[ModelMessage]:
    return messages[-5:]


agent = Agent('openai:gpt-4o', history_processors=[filter_responses, summarize_old_messages])
In this case, the filter_responses processor will be applied first, and the summarize_old_messages processor will be applied second.
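Because each processor receives the output of the previous one, the order of the list can change the result. The library-free sketch below illustrates this with plain strings standing in for messages; apply_processors, drop_responses and keep_last_two are hypothetical names, and the runner is an assumption about how sequential application of sync and async processors could work, not PydanticAI's actual implementation.

```python
import asyncio
import inspect
from typing import Any, Callable

# A processor takes a list of messages and returns a list, or an awaitable of one.
Processor = Callable[[list[str]], Any]


async def apply_processors(messages: list[str], processors: list[Processor]) -> list[str]:
    """Feed the output of each processor into the next, awaiting async ones."""
    for processor in processors:
        result = processor(messages)
        if inspect.isawaitable(result):
            result = await result
        messages = result
    return messages


def drop_responses(messages: list[str]) -> list[str]:
    # Stand-in for filter_responses: keep only user messages.
    return [m for m in messages if m.startswith('user:')]


async def keep_last_two(messages: list[str]) -> list[str]:
    # Stand-in for a truncating processor, written as async on purpose.
    return messages[-2:]


history = ['user: q1', 'model: a1', 'user: q2', 'model: a2', 'user: q3']

filter_then_trim = asyncio.run(apply_processors(history, [drop_responses, keep_last_two]))
trim_then_filter = asyncio.run(apply_processors(history, [keep_last_two, drop_responses]))

print(filter_then_trim)
#> ['user: q2', 'user: q3']
print(trim_then_filter)
#> ['user: q3']
```

Filtering before truncating keeps two user messages, while truncating first leaves only one, so it is worth deciding deliberately which processor should see the full history.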
Examples
For a more complete example of using messages in conversations, see the chat app example.