observra

Production Deployment Guide

How to configure observra for production agent workloads running on cloud infrastructure (Cloud Run, Kubernetes, VMs).


For most production deployments, the OTel backend is the right choice. It pushes telemetry via OTLP HTTP to any compatible observability platform (Dynatrace, Datadog, Grafana, Honeycomb, New Relic, etc.).

import observra

observra.initialize(
    backend="otel_log",
    endpoint="https://your-platform.example.com/api/v2/otlp/v1/logs",
    headers={"Authorization": "Api-Token YOUR_TOKEN"},
    service_name="my-agent-svc",
)

plugin = observra.create_plugin()
# Pass plugin to your framework runner

Why OTel for production:

  • Works from ephemeral containers (no filesystem persistence needed)
  • Vendor-neutral: switch platforms with one config change
  • Follows gen_ai.* OpenTelemetry semantic conventions
  • Auto-indexed attributes for fast querying


Configuration Patterns

1. OTel Logs — Single Observability Platform

Best for: teams with one observability platform (Dynatrace, Datadog, Grafana).

import observra

observra.initialize(
    backend="otel_log",
    endpoint="https://your-instance.live.dynatrace.com/api/v2/otlp/v1/logs",
    headers={"Authorization": "Api-Token dt0c01.XXXX"},
    service_name="my-agent-svc",
)

Using the standard OTel environment variables:

observra's initialize() does not read telemetry settings from the environment, so the backend must be selected in code. The underlying OTLP exporter, however, honors the standard OpenTelemetry environment variables, so you can keep the endpoint and headers out of your source:

export OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=https://your-instance.live.dynatrace.com/api/v2/otlp/v1/logs
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Api-Token dt0c01.XXXX"

import observra

# The OTLP exporter reads endpoint + headers from the OTEL_* env vars above.
# service_name is set in code (defaults to "observra" if omitted).
observra.initialize(backend="otel_log", service_name="my-agent-svc")

2. OTel Spans — Distributed Tracing

Best for: multi-service architectures where you need trace correlation.

import observra

observra.initialize(
    backend="otel",
    endpoint="https://your-instance.live.dynatrace.com/api/v2/otlp/v1/traces",
    headers={"Authorization": "Api-Token dt0c01.XXXX"},
    service_name="my-agent-svc",
)

3. MultiBackend — Observability + Local Debug

Best for: teams that need both real-time observability AND a local audit trail.

import observra
from observra.backends.otel_log import OTelLogBackend
from observra.backends.jsonl import JSONLBackend

otel = OTelLogBackend(
    endpoint="https://your-instance.live.dynatrace.com/api/v2/otlp/v1/logs",
    headers={"Authorization": "Api-Token dt0c01.XXXX"},
    service_name="my-agent-svc",
)
local = JSONLBackend(path="/data/telemetry.jsonl")

observra.initialize(backend="multi", backends=[otel, local])

4. MultiBackend — OTel + Webhook (SIEM forwarding)

Best for: teams that need both observability push AND SIEM ingestion via webhook.

import observra
from observra.backends.otel_log import OTelLogBackend
from observra.backends.webhook import WebhookBackend

otel = OTelLogBackend(
    endpoint="https://your-instance.live.dynatrace.com/api/v2/otlp/v1/logs",
    headers={"Authorization": "Api-Token dt0c01.XXXX"},
    service_name="my-agent-svc",
)
siem = WebhookBackend(url="https://your-siem/api/v1/events")

observra.initialize(backend="multi", backends=[otel, siem])

5. stdout/Cloud Logging (GCP-native path)

Best for: GCP deployments where Cloud Logging is already integrated with your observability platform (e.g., the Dynatrace GCP integration, which ingests Cloud Logging entries automatically).

import observra

observra.initialize(
    backend="jsonl",
    path="/dev/stdout",
)

Note: This writes the native TelemetryEvent schema as JSON lines to stdout. Cloud Run captures stdout as Cloud Logging entries. Your platform's GCP integration must be configured to ingest these logs. The event JSON lands in the content field as a string — your log processing pipeline must parse it.
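The parsing step mentioned in the note can be sketched as follows. This is a minimal illustration, assuming a log record has already been exported as a dict whose content field holds the JSON line; the record shape, the parse_event helper, and the sample event values are hypothetical and not part of observra.

```python
import json
from typing import Any, Optional


def parse_event(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the telemetry event embedded in a log record, or None.

    Assumes the JSON line written to stdout arrives as a string in the
    record's "content" field; anything else is skipped.
    """
    raw = record.get("content")
    if not isinstance(raw, str):
        return None
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None  # ordinary application log line, not telemetry
    # Keep only JSON objects that carry the top-level event_type key.
    if isinstance(event, dict) and "event_type" in event:
        return event
    return None


# Hypothetical record; the event values are placeholders.
record = {"content": '{"event_type": "tool_call", "tool_name": "search"}'}
print(parse_event(record))
```

Filtering on event_type keeps ordinary application log lines, which share the same stdout stream, out of the telemetry pipeline.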


Backend Selection Guide

Deployment                             Recommended Backend          Why
──────────────────────────────────────────────────────────────────────────────────────────────────────────────
Cloud Run / serverless                 otel_log                     No filesystem; direct push
Kubernetes (GKE, EKS)                  otel_log or multi            Push to platform; optional local JSONL on PV
VM / long-running                      multi (otel + jsonl)         Push + local audit trail
GCP-native stack                       otel_log or jsonl to stdout  Leverage Cloud Logging integrations
Multi-consumer (observability + SIEM)  multi (otel + webhook)       Each consumer gets its ideal format
Air-gapped / on-prem                   jsonl                        No external connectivity needed

Schema Differences by Backend

All backends receive the same TelemetryEvent dataclass. The serialized output differs:

Backend          Schema Format              Key Characteristics
──────────────────────────────────────────────────────────────────────────────────────────────────────────────
jsonl            Native flat JSON           Full data{} dict preserved, event_type, model_name, tool_name as top-level keys
webhook          Native JSON POST body      Same schema as JSONL, delivered via HTTP
otel (spans)     OTel semantic conventions  gen_ai.* attributes, flat, strings
otel_log (logs)  OTel semantic conventions  gen_ai.* attributes, flat, strings

OTel backends remap fields to semantic conventions:

Native → OTel
──────────────────────────────────────────────────
event_type          → observra.event_type
model_name          → gen_ai.request.model
tool_name           → gen_ai.tool.name
agent_name          → gen_ai.agent.name
session_id          → observra.session_id
data.input_tokens   → gen_ai.usage.input_tokens (string)
data.output_tokens  → gen_ai.usage.output_tokens (string)
data.cost_usd       → observra.cost_usd (string)
data.error_type     → error.type
framework           → observra.framework

OTel backends are lossy: only explicitly mapped fields from data{} are exported as attributes. The full data dict is not preserved. If you need the complete event payload downstream, use multi with a native backend leg.
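To make the lossy remapping concrete, here is a minimal sketch that applies the mapping table above to a native event dict. It illustrates the documented field mapping only and is not observra's actual implementation; the to_otel_attributes helper and the sample event (including the unmapped prompt_preview key) are hypothetical.

```python
from typing import Any

# Top-level native fields and their OTel attribute names (from the mapping table).
TOP_LEVEL = {
    "event_type": "observra.event_type",
    "model_name": "gen_ai.request.model",
    "tool_name": "gen_ai.tool.name",
    "agent_name": "gen_ai.agent.name",
    "session_id": "observra.session_id",
    "framework": "observra.framework",
}
# The only keys lifted out of data{}; everything else in data{} is dropped.
DATA_FIELDS = {
    "input_tokens": "gen_ai.usage.input_tokens",
    "output_tokens": "gen_ai.usage.output_tokens",
    "cost_usd": "observra.cost_usd",
    "error_type": "error.type",
}


def to_otel_attributes(event: dict[str, Any]) -> dict[str, str]:
    """Flatten a native event into string attributes, dropping unmapped keys."""
    attrs: dict[str, str] = {}
    for key, attr in TOP_LEVEL.items():
        if event.get(key) is not None:
            attrs[attr] = str(event[key])
    data = event.get("data") or {}
    for key, attr in DATA_FIELDS.items():
        if data.get(key) is not None:
            attrs[attr] = str(data[key])
    return attrs


# Hypothetical event: "prompt_preview" has no mapping, so it does not survive.
event = {
    "event_type": "model_call",
    "model_name": "example-model",
    "data": {"input_tokens": 120, "prompt_preview": "hello"},
}
print(to_otel_attributes(event))
```

Running a representative event through a mapping like this is a quick way to see which data{} keys a downstream consumer would lose without a native backend leg.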


Platform-Specific Examples

Dynatrace

# OTel Logs (recommended)
observra.initialize(
    backend="otel_log",
    endpoint="https://{env-id}.live.dynatrace.com/api/v2/otlp/v1/logs",
    headers={"Authorization": "Api-Token dt0c01.XXXXX"},
    service_name="my-agent-svc",
)

# OTel Spans (if you need trace waterfall view)
observra.initialize(
    backend="otel",
    endpoint="https://{env-id}.live.dynatrace.com/api/v2/otlp/v1/traces",
    headers={"Authorization": "Api-Token dt0c01.XXXXX"},
    service_name="my-agent-svc",
)

Datadog

observra.initialize(
    backend="otel_log",
    endpoint="https://http-intake.logs.datadoghq.com/api/v2/otlp/v1/logs",
    headers={"DD-API-KEY": "your-api-key"},
    service_name="my-agent-svc",
)

Grafana Cloud (Loki via OTLP)

observra.initialize(
    backend="otel_log",
    endpoint="https://otlp-gateway-prod-us-east-0.grafana.net/otlp/v1/logs",
    headers={"Authorization": "Basic BASE64_ENCODED_USER:TOKEN"},
    service_name="my-agent-svc",
)
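The BASE64_ENCODED_USER:TOKEN placeholder follows the HTTP Basic scheme, where the value after "Basic " is the base64 encoding of "user:token". A small sketch of building that header, assuming the user is your Grafana Cloud instance ID; the basic_auth_header helper and the sample values are illustrative only.

```python
import base64


def basic_auth_header(user: str, token: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from a user ID and token."""
    credentials = f"{user}:{token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


# Placeholder values; substitute your own instance ID and access token.
headers = basic_auth_header("123456", "example-token")
print(headers)
```

The resulting dict can be passed as the headers argument in the call above.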

Honeycomb

observra.initialize(
    backend="otel",
    endpoint="https://api.honeycomb.io/v1/traces",
    headers={"x-honeycomb-team": "your-api-key"},
    service_name="my-agent-svc",
)

Self-hosted OpenTelemetry Collector

observra.initialize(
    backend="otel_log",
    endpoint="http://otel-collector.monitoring:4318/v1/logs",
    service_name="my-agent-svc",
)

Production Checklist

  • [ ] Use environment variables for secrets — never hardcode API tokens
  • [ ] Set service_name — identifies your agent in multi-service environments
  • [ ] Configure cost thresholds — get alerted on runaway LLM spend
  • [ ] Test with backend="jsonl" first — verify events are captured before adding OTel
  • [ ] Use MultiBackend for dual-consumer — don't force one schema on both observability and SIEM
  • [ ] Handle graceful shutdown — use atexit hooks or framework lifecycle to flush buffered events
  • [ ] Set appropriate batch sizes — larger batches reduce network calls but increase data-loss window on crash
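For the first checklist item, one way to keep tokens out of source is to build the headers from the environment at startup. This is a sketch under assumptions: the OBSERVRA_API_TOKEN variable name and the otlp_headers_from_env helper are hypothetical, and the "Api-Token" scheme matches the Dynatrace examples in this guide, so adjust it for other platforms.

```python
import os


def otlp_headers_from_env(var: str = "OBSERVRA_API_TOKEN") -> dict[str, str]:
    """Build the Authorization header from an environment variable.

    Fails fast with a clear error when the variable is missing, rather
    than starting up and sending unauthenticated requests.
    """
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set; refusing to start without a token")
    return {"Authorization": f"Api-Token {token}"}


# Usage (endpoint and service name as in the examples above):
# observra.initialize(
#     backend="otel_log",
#     endpoint="https://your-platform.example.com/api/v2/otlp/v1/logs",
#     headers=otlp_headers_from_env(),
#     service_name="my-agent-svc",
# )
```

On Cloud Run or Kubernetes the variable would typically be injected from the platform's secret store rather than set in the image.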

Graceful Shutdown

OTel backends use BatchLogRecordProcessor / BatchSpanProcessor, which buffer events before export. The BackgroundWorker flushes automatically via atexit, but containers that receive SIGTERM (Cloud Run, Kubernetes) should shut down explicitly so that buffered events are not lost:

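import signal
import sys

import observra

observra.initialize(backend="otel_log", ...)

# The worker registers its own atexit handler automatically, but Python's
# default SIGTERM action ends the process without running atexit handlers.
# Handle SIGTERM explicitly: flush the worker, then exit normally.
def _handle_sigterm(signum, frame):
    if observra._worker is not None:
        observra._worker.shutdown()
    sys.exit(0)

signal.signal(signal.SIGTERM, _handle_sigterm)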

For frameworks with lifecycle hooks (FastAPI, Flask):

from contextlib import asynccontextmanager

import observra
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    yield
    if observra._worker is not None:
        observra._worker.shutdown()

app = FastAPI(lifespan=lifespan)

Note: observra._worker is an internal attribute and may change between releases; a public observra.shutdown() API is planned but not yet exposed. The worker's atexit handler covers most deployments without explicit shutdown calls.


Security Considerations

  • Redaction is on by default — API keys, tokens, and common secret patterns are scrubbed from event payloads before they reach any backend
  • Add custom redaction patterns for org-specific secrets:

        observra.initialize(
            backend="otel_log",
            custom_patterns=[
                (r"sk-[a-zA-Z0-9]{48}", "OPENAI_KEY"),
                (r"ACME_TOKEN_[A-Z0-9]+", "ACME_TOKEN"),
            ],
        )

  • Encryption at rest. The JSONL backend supports optional AES encryption:

        import os

        observra.initialize(
            backend="jsonl",
            path="telemetry.jsonl",
            encryption_key=os.environ["TELEMETRY_ENCRYPTION_KEY"].encode(),
        )

  • Never log prompts/responses in production unless you have explicit consent and meet your data residency requirements. The library redacts by default, but verify the scrubbed output yourself when compliance is a concern.
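Before relying on custom patterns, it can help to check them against sample strings. The sketch below uses only the standard re module with (regex, label) pairs in the same shape as custom_patterns; the matching_labels helper is hypothetical and says nothing about how observra applies the patterns internally.

```python
import re

# (regex, label) pairs in the same shape as the custom_patterns argument.
PATTERNS = [
    (r"sk-[a-zA-Z0-9]{48}", "OPENAI_KEY"),
    (r"ACME_TOKEN_[A-Z0-9]+", "ACME_TOKEN"),
]


def matching_labels(text: str) -> list[str]:
    """Return the label of every pattern that matches somewhere in the text."""
    return [label for pattern, label in PATTERNS if re.search(pattern, text)]


# Sample strings are made up; use representative values from your own systems.
print(matching_labels("calling API with ACME_TOKEN_AB12CD"))
print(matching_labels("nothing sensitive here"))
```

A pattern that fails to match a known secret format here would also miss it in event payloads, so this is a cheap check to run in a unit test.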

Monitoring the Telemetry Pipeline

Use the self-observability API to monitor the health of the telemetry pipeline itself:

from observra import observability

metrics = observability.get_metrics()
# {
#   "drop_count": 0,           # events lost to backpressure
#   "queue_depth": 3,          # pending events in buffer
#   "write_latency_p99": 0.02, # 99th percentile write latency (seconds)
#   "backend_write_success": 1547,
#   "backend_write_failure": 0,
# }

Expose these via your /health or /metrics endpoint to alert on pipeline issues.
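As a sketch of what such a health check might evaluate, the function below turns a metrics snapshot into a pass/fail result. It takes a plain dict with the keys shown above; the pipeline_health helper and both thresholds are assumptions to tune for your workload. If the counters are cumulative, comparing deltas between snapshots would be more useful than the absolute values used here.

```python
from typing import Any


def pipeline_health(
    metrics: dict[str, Any],
    max_queue_depth: int = 1000,      # assumed threshold
    max_p99_seconds: float = 1.0,     # assumed threshold
) -> tuple[bool, list[str]]:
    """Evaluate a metrics snapshot; return (healthy, list of problems)."""
    problems: list[str] = []
    if metrics.get("drop_count", 0) > 0:
        problems.append("events dropped due to backpressure")
    if metrics.get("backend_write_failure", 0) > 0:
        problems.append("backend write failures")
    if metrics.get("queue_depth", 0) > max_queue_depth:
        problems.append("queue depth above threshold")
    if metrics.get("write_latency_p99", 0.0) > max_p99_seconds:
        problems.append("p99 write latency above threshold")
    return (not problems, problems)


snapshot = {
    "drop_count": 0,
    "queue_depth": 3,
    "write_latency_p99": 0.02,
    "backend_write_success": 1547,
    "backend_write_failure": 0,
}
print(pipeline_health(snapshot))  # (True, [])
```

A /health handler could return a non-200 status when the first element is False and include the problem list in the response body.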
