OpenAI - AgentVista

The OpenAIAdapter extracts telemetry from OpenAI responses (model name, token counts, and cost in USD) in a form you can pass directly into agentvista.record() or a traced run. The adapter works with both the Chat Completions API (client.chat.completions.create) and the Responses API (client.responses.create), and detects the response shape automatically. Cost is calculated from the model name using AgentVista’s built-in pricing table.

Setup

Install the SDK with the OpenAI extra

pip install "agentvista[openai]"

This installs agentvista along with the openai package.

Initialize AgentVista

Call agentvista.init() once at application startup with your API key.

import agentvista

agentvista.init(api_key="av_xxxxx")

Import the adapter

from agentvista.adapters.openai import OpenAIAdapter

adapter = OpenAIAdapter()

Usage examples

Basic

Use adapter.extract() on any OpenAI response and unpack the result directly into agentvista.record().

from openai import OpenAI
import agentvista
from agentvista.adapters.openai import OpenAIAdapter

agentvista.init(api_key="av_xxxxx")
adapter = OpenAIAdapter()

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Qualify this lead..."}],
)

# Extract model, input_tokens, output_tokens, total_tokens, cost_usd
telemetry = adapter.extract(response)
agentvista.record(agent="lead-qualifier", success=True, **telemetry)

adapter.extract() returns a dict containing whichever of these keys could be read from the response:

Key	Type	Description
`model`	`str`	Model ID returned by the API (e.g. `"gpt-4.1"`)
`input_tokens`	`int`	Prompt tokens (or `input_tokens` from the Responses API)
`output_tokens`	`int`	Completion tokens (or `output_tokens` from the Responses API)
`total_tokens`	`int`	Sum of input and output tokens
`cost_usd`	`float`	Total cost in USD, rounded to 6 decimal places

The adapter automatically detects whether the response uses prompt_tokens / completion_tokens (Chat Completions) or input_tokens / output_tokens (Responses API) and normalizes both into the same output shape.
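The normalization described above can be pictured with a small sketch. This is not the adapter's actual implementation: normalize_usage is a hypothetical helper, and the SimpleNamespace objects only stand in for the usage objects the OpenAI SDK returns.

```python
from types import SimpleNamespace


def normalize_usage(usage):
    """Map either usage shape onto input/output/total token keys (hypothetical helper)."""
    # Chat Completions names the fields prompt_tokens / completion_tokens;
    # the Responses API names them input_tokens / output_tokens.
    if getattr(usage, "prompt_tokens", None) is not None:
        input_tokens = usage.prompt_tokens
        output_tokens = usage.completion_tokens
    else:
        input_tokens = usage.input_tokens
        output_tokens = usage.output_tokens
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


# Stand-ins for the two response shapes.
chat_usage = SimpleNamespace(prompt_tokens=120, completion_tokens=30)
responses_usage = SimpleNamespace(input_tokens=120, output_tokens=30)

# Both shapes normalize to the same dict.
print(normalize_usage(chat_usage) == normalize_usage(responses_usage))
```

Because both shapes collapse to the same keys, code that consumes the telemetry does not need to know which API produced the response.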

With tracing

Wrap your OpenAI call in agentvista.run() to record a full traced run with outcome signals.

from openai import OpenAI
import agentvista
from agentvista.adapters.openai import OpenAIAdapter

agentvista.init(api_key="av_xxxxx")
adapter = OpenAIAdapter()
client = OpenAI()

with agentvista.run("lead-qualifier") as r:
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": "Qualify this lead..."}],
    )

    telemetry = adapter.extract(response)
    agentvista.record(agent="lead-qualifier", success=True, **telemetry)

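    # message.content can be None (for example, on tool-call responses), so fall back to "".
    content = response.choices[0].message.content or ""
    qualified = "yes" in content.lower()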
    r.set_outcome(success=qualified, outcome="qualified" if qualified else "not-qualified")

r.set_outcome(success, outcome) attaches a business-level result to the trace. The context manager automatically records duration and flushes the trace on exit.
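For readers unfamiliar with the pattern, here is a minimal sketch of how a context manager can record duration and flush on exit. It is an illustration only, not AgentVista's implementation: SketchRun, sketch_run, and the flushed list are hypothetical names, and the list stands in for whatever sink the SDK actually sends traces to.

```python
import time
from contextlib import contextmanager


class SketchRun:
    """Hypothetical stand-in for the object yielded by a traced run."""

    def __init__(self, agent):
        self.agent = agent
        self.success = None
        self.outcome = None
        self.duration_s = None

    def set_outcome(self, success, outcome):
        self.success = success
        self.outcome = outcome


flushed = []  # stand-in for the trace sink


@contextmanager
def sketch_run(agent):
    run = SketchRun(agent)
    start = time.perf_counter()
    try:
        yield run
    finally:
        # Runs on normal exit and on exceptions, so the trace is never lost.
        run.duration_s = time.perf_counter() - start
        flushed.append(run)


with sketch_run("lead-qualifier") as r:
    r.set_outcome(success=True, outcome="qualified")
```

The try/finally is the important design choice: the duration is recorded and the run is flushed even if the wrapped call raises.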

Responses API

The adapter works identically with the OpenAI Responses API.

from openai import OpenAI
import agentvista
from agentvista.adapters.openai import OpenAIAdapter

agentvista.init(api_key="av_xxxxx")
adapter = OpenAIAdapter()
client = OpenAI()

response = client.responses.create(
    model="gpt-4.1",
    input="Summarize this support ticket...",
)

telemetry = adapter.extract(response)
agentvista.record(agent="ticket-summarizer", success=True, **telemetry)

The adapter reads input_tokens, output_tokens, and input_tokens_details.cached_tokens from Responses API responses, applying the correct cached token rate when calculating cost.
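As a worked example of the arithmetic, the sketch below prices a response with cached input tokens. It makes several assumptions: the input and output rates are the gpt-4.1 rates from the Supported models table, the cached read rate is an illustrative value that this page does not state, cached tokens are treated as a subset of input_tokens, and cost_with_cache is a hypothetical helper rather than the adapter's own code.

```python
from types import SimpleNamespace

# USD per 1M tokens for gpt-4.1, from the Supported models table.
INPUT_RATE = 2.00
OUTPUT_RATE = 8.00
# Assumption: illustrative cached read rate, not stated in this document.
CACHED_INPUT_RATE = 0.50


def cost_with_cache(usage):
    """Estimate cost in USD, billing cached input tokens at the cached rate."""
    details = getattr(usage, "input_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) or 0
    uncached = usage.input_tokens - cached  # cached tokens are part of input_tokens
    cost = (
        uncached * INPUT_RATE
        + cached * CACHED_INPUT_RATE
        + usage.output_tokens * OUTPUT_RATE
    ) / 1_000_000
    return round(cost, 6)


# Stand-in for a Responses API usage object with 1,000 of 1,200 input tokens cached.
usage = SimpleNamespace(
    input_tokens=1200,
    output_tokens=300,
    input_tokens_details=SimpleNamespace(cached_tokens=1000),
)
print(cost_with_cache(usage))
```

Under these assumed rates the request costs $0.0033, against $0.0048 for the same token counts with no cache hits.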

Supported models

Cost calculation is automatic for the following OpenAI models. If your model is not listed, cost_usd will be absent from the extracted telemetry; all other fields (tokens, model name) are still captured.

Model	Input (USD per 1M tokens)	Output (USD per 1M tokens)
`gpt-5.4`	$2.50	$15.00
`gpt-5.4-mini`	$0.75	$4.50
`gpt-5.4-nano`	$0.20	$1.25
`gpt-5`	$0.625	$5.00
`gpt-4.1`	$2.00	$8.00
`gpt-4.1-mini`	$0.40	$1.60
`gpt-4.1-nano`	$0.10	$0.40
`gpt-4o`	$2.50	$10.00
`gpt-4o-mini`	$0.15	$0.60
`o3`	$2.00	$8.00
`o4-mini`	$1.10	$4.40
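The lookup behavior can be sketched as follows, using two rows from the table above. This assumes a simple dictionary lookup keyed by model name. PRICING and with_cost are hypothetical names, and the real adapter may match model names differently.

```python
# (input rate, output rate) in USD per 1M tokens, copied from the table above.
PRICING = {
    "gpt-4.1": (2.00, 8.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def with_cost(telemetry):
    """Return a copy of telemetry, adding cost_usd only when the model is priced."""
    result = dict(telemetry)
    rates = PRICING.get(result.get("model"))
    if rates is not None:
        input_rate, output_rate = rates
        cost = (
            result["input_tokens"] * input_rate
            + result["output_tokens"] * output_rate
        ) / 1_000_000
        result["cost_usd"] = round(cost, 6)
    return result


priced = with_cost({"model": "gpt-4.1", "input_tokens": 1000, "output_tokens": 500})
unpriced = with_cost({"model": "my-fine-tune", "input_tokens": 1000, "output_tokens": 500})

print("cost_usd" in priced, "cost_usd" in unpriced)
```

Since cost_usd can be absent, downstream code is safer reading it with telemetry.get("cost_usd") than with telemetry["cost_usd"].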

For models that support prompt caching (gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, gpt-5, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o4-mini), the adapter detects cached token counts in the response and applies the cached read rate when calculating cost.
