cooldown_minutes setting (default: 60) controls the minimum time between repeated firings of the same alert. This prevents notification floods when a condition persists.
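The cooldown rule can be illustrated with a minimal sketch. The function name `may_fire` and the idea of tracking a `last_fired` timestamp are assumptions made for illustration; the service's actual implementation is not documented here, and whether an alert may fire again at exactly the cooldown boundary is also assumed.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def may_fire(last_fired: Optional[datetime], now: datetime,
             cooldown_minutes: int = 60) -> bool:
    """Return True if the alert is outside its cooldown window.

    Hypothetical helper: assumes the alert may fire again once at least
    cooldown_minutes have elapsed since the previous firing.
    """
    if last_fired is None:
        return True  # the alert has never fired, so no cooldown applies
    return now - last_fired >= timedelta(minutes=cooldown_minutes)


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(may_fire(None, t0))                        # True
print(may_fire(t0, t0 + timedelta(minutes=30)))  # False
print(may_fire(t0, t0 + timedelta(minutes=60)))  # True
```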
An alert can target either a specific agent (by agent_id) or a named service (by target_service). You must provide at least one. Latency and cost alerts are commonly targeted at a target_service to monitor infrastructure services that are not themselves AI agents.
Alert types
failure_rate — fires when failure rate exceeds a fraction
The failure_rate alert fires when the fraction of failed runs for the target exceeds the threshold. The threshold is a decimal fraction from 0.0 to 1.0. For example, a threshold of 0.20 fires when more than 20% of runs fail.
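The failure_rate comparison can be sketched as follows. The function name and the handling of a target with zero runs are assumptions; the evaluation window the service uses is not stated in this section.

```python
def failure_rate_exceeded(failed_runs: int, total_runs: int,
                          threshold: float) -> bool:
    """Return True when the failed fraction is strictly above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a fraction from 0.0 to 1.0")
    if total_runs == 0:
        return False  # assumption: with no runs there is nothing to evaluate
    return failed_runs / total_runs > threshold


print(failure_rate_exceeded(21, 100, 0.20))  # True: 21% is more than 20%
print(failure_rate_exceeded(20, 100, 0.20))  # False: exactly 20% does not exceed
```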
inactivity — fires when an agent has no runs for N hours
The inactivity alert fires when the target has not received any runs for longer than the threshold number of hours. For example, a threshold of 24.0 fires if no runs are received for 24 hours. Use this alert to catch silent failures where an agent has stopped running entirely.
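A minimal sketch of the inactivity check, assuming the timestamp of the most recent run is available. The function name `is_inactive` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_inactive(last_run_at: datetime, now: datetime,
                threshold_hours: float) -> bool:
    """Return True when the gap since the last run is longer than the threshold."""
    return now - last_run_at > timedelta(hours=threshold_hours)


last_run = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
print(is_inactive(last_run, last_run + timedelta(hours=23), 24.0))  # False
print(is_inactive(last_run, last_run + timedelta(hours=25), 24.0))  # True
```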
cost_threshold — fires when daily cost exceeds a USD amount
The cost_threshold alert fires when the target’s daily cost exceeds the threshold value in USD. For example, a threshold of 50.00 fires when daily spend exceeds $50. Use this alert to catch runaway costs caused by retry loops or unexpectedly high traffic.
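As an illustration, the cost check reduces to summing a day's run costs and comparing the total to the threshold. How the service defines a "day" (calendar day, time zone, or rolling window) is not stated here, so this sketch simply takes the costs for one day as input; the function name is hypothetical.

```python
from decimal import Decimal
from typing import Iterable


def daily_cost_exceeded(run_costs_usd: Iterable[Decimal],
                        threshold_usd: Decimal) -> bool:
    """Return True when the summed cost for one day is strictly above the threshold."""
    # Decimal avoids binary floating-point rounding on currency amounts.
    return sum(run_costs_usd, Decimal("0")) > threshold_usd


costs = [Decimal("20.00"), Decimal("30.00")]
print(daily_cost_exceeded(costs, Decimal("50.00")))                      # False
print(daily_cost_exceeded(costs + [Decimal("0.01")], Decimal("50.00")))  # True
```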
latency_threshold — fires when p95 latency exceeds N milliseconds
The latency_threshold alert fires when the p95 latency for the target exceeds the threshold value in milliseconds. For example, a threshold of 5000 fires when the 95th percentile of run durations exceeds 5,000 ms (5 seconds).
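The sketch below shows one way to compute a p95 and apply the threshold. It uses the nearest-rank percentile definition, which is an assumption: the percentile method the service actually uses is not documented in this section. Function names are hypothetical.

```python
from typing import Sequence


def p95(durations_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence of durations."""
    if not durations_ms:
        raise ValueError("at least one duration is required")
    ordered = sorted(durations_ms)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_exceeded(durations_ms: Sequence[float], threshold_ms: float) -> bool:
    """Return True when p95 latency is strictly above the threshold."""
    return p95(durations_ms) > threshold_ms


# One slow run out of 20 does not move the p95; two slow runs do.
print(latency_exceeded([100] * 19 + [6000], 5000))      # False
print(latency_exceeded([100] * 18 + [6000, 6000], 5000))  # True
```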
composite — custom combination of conditions
The composite alert type lets you combine AI and infrastructure conditions into a single rule. Use this when you only want to be alerted if multiple conditions are true simultaneously, for example when your agent’s failure rate is high AND your database connection pool is saturated.
Composite alerts require a composite_config object. The threshold field is not used for composite alerts.
The operator field controls how conditions combine: "and" fires only when all conditions are met; "or" fires when any condition is met. A composite alert must include at least one "ai" condition and at least one "infra" condition.
Alert channels
| Channel | Behavior |
|---|---|
| email | Sends a notification to your registered email address (default) |
| webhook | Sends an HTTP POST to the webhook_url you configure |
| both | Sends email and webhook simultaneously |
If you choose webhook or both, you must provide a webhook_url.
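The target and channel rules above can be checked on the client before an alert is submitted. The following is a sketch only: `validate_alert` is a hypothetical helper, and the `alert_type` key is an assumed field name, since this section does not show the request schema. The other keys (agent_id, target_service, threshold, channel, webhook_url, composite_config) are taken from the text.

```python
from typing import Any, Dict, List

ALERT_TYPES = {"failure_rate", "inactivity", "cost_threshold",
               "latency_threshold", "composite"}
CHANNELS = {"email", "webhook", "both"}


def validate_alert(alert: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations; an empty list means the alert looks valid."""
    errors: List[str] = []
    alert_type = alert.get("alert_type")  # assumed field name
    if alert_type not in ALERT_TYPES:
        errors.append("unknown alert type")
    if not alert.get("agent_id") and not alert.get("target_service"):
        errors.append("provide agent_id or target_service")
    channel = alert.get("channel", "email")  # email is the documented default
    if channel not in CHANNELS:
        errors.append("unknown channel")
    elif channel in {"webhook", "both"} and not alert.get("webhook_url"):
        errors.append("webhook_url is required for webhook or both")
    if alert_type == "composite":
        if "composite_config" not in alert:
            errors.append("composite alerts require composite_config")
    elif "threshold" not in alert:
        errors.append("threshold is required")
    return errors


print(validate_alert({"alert_type": "latency_threshold",
                      "target_service": "postgres",
                      "threshold": 5000,
                      "channel": "webhook"}))
# ['webhook_url is required for webhook or both']
```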
Creating an alert
Choose the alert type
Select one of: failure_rate, inactivity, cost_threshold, latency_threshold, or composite.
Set the target
Select an agent from the dropdown, or enter a service name in the Target service field for infrastructure alerts.
Set the threshold
Enter the threshold value appropriate for the alert type you selected. For composite alerts, configure the composite_config JSON instead.
Choose a delivery channel
Select email, webhook, or both. If you select webhook or both, paste your webhook URL.
Managing alerts
You can update an alert’s threshold, cooldown_minutes, is_active, channel, or webhook_url at any time using PATCH /api/v1/alerts/{alert_id}. Setting is_active to false pauses the alert without deleting it.
To permanently remove an alert, send DELETE /api/v1/alerts/{alert_id}. Alerts are hard-deleted — this action cannot be undone.
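The two management calls can be sketched with the Python standard library. The paths and HTTP methods come from the text above; the host `api.example.com`, the Bearer authorization header, and the helper names are assumptions for illustration. The sketch only builds the requests and does not send them.

```python
import json
from typing import Any, Dict
from urllib.request import Request

BASE_URL = "https://api.example.com"  # hypothetical host
UPDATABLE_FIELDS = {"threshold", "cooldown_minutes", "is_active",
                    "channel", "webhook_url"}


def build_update_request(alert_id: str, changes: Dict[str, Any],
                         api_key: str) -> Request:
    """Build a PATCH request that updates only the documented fields."""
    unknown = set(changes) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"fields cannot be updated: {sorted(unknown)}")
    return Request(
        f"{BASE_URL}/api/v1/alerts/{alert_id}",
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="PATCH",
    )


def build_delete_request(alert_id: str, api_key: str) -> Request:
    """Build a DELETE request; deletion is permanent per the text above."""
    return Request(
        f"{BASE_URL}/api/v1/alerts/{alert_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="DELETE",
    )


# Pause an alert without deleting it.
pause = build_update_request("alert_123", {"is_active": False}, "test-key")
print(pause.get_method(), pause.full_url)
```

To send a request, pass it to `urllib.request.urlopen`. Pausing with is_active set to false is the reversible option; a delete cannot be undone.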