Changelog

What we shipped to the SDK.

Every release that changes how you log, replay, or store episodes. SDK calls, ingest pipeline, storage, portal - the surfaces a developer touches. Marketing-site changes land on the team blog instead.

  1. SDK 0.1.0a5.post1 - PyPI README + CLI on the tin

    PEP 440 post-release: same wheel as `0.1.0a5`, refreshed PyPI project-description copy. Quickstart now shows portal API key vs `robotrace login`, documents `robotrace whoami`, adds a CLI command table, and clarifies env vs `~/.robotrace/credentials` resolution.

    • Install pin · `pip install robotrace-dev==0.1.0a5.post1` matches this README drop; `==0.1.0a5` is byte-identical code - pick either for CI reproducibility.
  2. SDK 0.1.0a5 - OTel traceparent respects sampling flag

    The `traceparent` string we attach to episodes now preserves the upstream OpenTelemetry `trace_flags` byte instead of overwriting the sampled bit to `01`. That matches W3C Trace Context - downstream systems that ingest the header (curl replay, sidecars, APMs) no longer contradict the customer's sampler.

    • Behavior change · Previously `capture_trace_context()` forced `…-01` even when `SpanContext.trace_flags` indicated unsampled (`…-00`). Portal deep-links still use `trace_id` / `span_id` only; only the propagation header semantics changed. Upgrade from `0.1.0a4` if you propagate `metadata.otel.traceparent` outside RoboTrace.
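
    To make the behavior change concrete, here is a minimal sketch of rendering a W3C `traceparent` string while preserving the flags byte. The helper name `format_traceparent` and the sample IDs are illustrative assumptions, not the SDK's internals.

```python
def format_traceparent(trace_id: int, span_id: int, trace_flags: int) -> str:
    """Render a version-00 W3C traceparent, keeping the caller's flags byte."""
    # Layout: version (2 hex) - trace id (32 hex) - span id (16 hex) - flags (2 hex)
    return f"00-{trace_id:032x}-{span_id:016x}-{trace_flags & 0xFF:02x}"


# Illustrative IDs only.
trace_id = 0x4BF92F3577B34DA6A3CE929D0E0E4736
span_id = 0x00F067AA0BA902B7

sampled = format_traceparent(trace_id, span_id, 0x01)
unsampled = format_traceparent(trace_id, span_id, 0x00)
print(sampled)    # ends in -01
print(unsampled)  # ends in -00 instead of being forced to -01
```

    Downstream consumers that honor the sampled bit would then skip the unsampled trace, matching the upstream sampler's decision.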
  3. SDK 0.1.0a11 - PyPI README + docs pin sweep

    Doc-only release: no code or API changes vs 0.1.0a10. Fixes the PyPI project page still advertising 0.1.0a6 in the README status block and syncs every stale install pin across web docs. SDK ConfigurationError hints now call `_version.install_command()` so future bumps don't drift again. Pin `pip install robotrace-dev==0.1.0a11`.

    • PyPI README · Status line and adapter install pins updated to 0.1.0a11. The distribution-name vs import-name block is unchanged; older pins (0.1.0a10, 0.1.0a9, …) are called out as prior alphas on the same pre-1.0 surface.
    • Web docs · /docs/quickstart, /docs/sdk/{ros2,lerobot,gymnasium,otel,log-episode} install lines bumped to 0.1.0a11. Portal Powered-by chips and SdkInstallCard read live SDK_VERSION from apps/web/lib/sdk/version.ts.
    • SDK error hints · ROS 2, LeRobot, and eval extras ConfigurationError messages now suggest a pin derived from `robotrace.__version__` via `_version.install_command(...)` - no more hardcoded 0.1.0a6 strings in adapter code.
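
    The idea behind deriving install hints from a single version constant can be sketched as below. This is a hypothetical stand-in: the real `_version.install_command(...)` signature is not shown in this changelog, so the `extra` and `version` parameters here are assumptions.

```python
from typing import Optional

__version__ = "0.1.0a11"  # stand-in for robotrace.__version__


def install_command(extra: Optional[str] = None, version: str = __version__) -> str:
    """Build a pinned pip command from one version constant (hypothetical helper)."""
    if extra:
        # Quote the spec so shells such as zsh don't try to glob the brackets.
        return f"pip install 'robotrace-dev[{extra}]=={version}'"
    return f"pip install robotrace-dev=={version}"


print(install_command())
print(install_command("ros2"))
```

    Because every hint is built from the same constant, a version bump changes all error messages at once.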
  4. Portal - Failure intelligence (the Explain pillar)

    Every failed episode now runs through an auto root-cause analyzer at finalize time. Nine heuristic rules walk the metadata, the replay against its baseline, and the verification scenarios this candidate is part of - the result is a ranked, confidence-scored list of what went wrong. Visible as a 'Failure insights' card on every episode detail page (admin + portal). Findings are structured so a future LLM layer can write the narrative on top of the same rows.

    • Auto on every failed finalize · The `/api/ingest/episode/<id>/finalize` route runs the analyzer synchronously when status=failed - heuristics read jsonb metadata + three joined tables (eval_results, verification_results, recent successful runs from the same robot for duration baselining), no NPZ download in V1. Analyzer failure never fails the finalize.
    • Nine rules ship in V1 · explicit_outcome (SDK's EpisodeOutcome marker), failure_reason_in_metadata (Episode.__exit__ tracebacks), adapter_upload_error (distinct from a policy bug - look at R2 first), replay_regression (candidate failed where baseline succeeded), verification_failed (CI gate will block this candidate), battery_low (<15% during the run), gymnasium_truncated (env hit max_episode_steps without termination), duration_anomaly (<50% of median successful duration for the same robot), status_failed_no_reason (catch-all so the analyzer always says something).
    • Confidence-ranked findings · Each finding has a stable machine-readable code, a one-line title, a 1-2 sentence description, structured evidence (key/value pills), and a suggested next step. Sorted highest-confidence-first in the UI. Rule order in the harness is the tie-break inside a confidence tier.
    • Failure insights card · Renders on /admin/episodes/[id] and /portal/episodes/[id] when an analysis row exists. Admin sees a 'Re-run analysis' button (Server Action, requireAdmin); portal users see the same findings but can't re-run yet. Stale analyses (older analyzer_version) show a refresh prompt.
    • Audit + RLS · New `failure_analyses` table (one row per episode, upsert on re-run) with RLS mirroring eval_results - org members see their own client's analyses, admins see all via service role. Every analyzer run writes `failure_analysis.completed` or `failure_analysis.errored` to audit_log. New 'Failure analysis' filter on /admin/audit.
    • Marketing · New `ExplainPillarSection` between Replay and Evals on the landing page - mocks the in-product Failure insights card so visitors see what they get. New FAQ entry. Full reference: /docs/portal/failure-intelligence.
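
    The ranking behavior described in this entry (confidence first, rule order as the tie-break) could look roughly like the sketch below. It implements only two of the nine rules plus the catch-all; the metadata keys (`battery_min_percent`, `duration_s`) and the confidence values are assumptions for illustration, not the analyzer's real inputs.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Finding:
    code: str          # stable machine-readable code
    confidence: float  # used for ranking; values here are made up
    title: str


def battery_low(episode, baseline_durations) -> Optional[Finding]:
    pct = episode.get("battery_min_percent")  # hypothetical metadata key
    if pct is not None and pct < 15:
        return Finding("battery_low", 0.6, f"Battery fell to {pct}% during the run")
    return None


def duration_anomaly(episode, baseline_durations) -> Optional[Finding]:
    if not baseline_durations:
        return None
    # Flag runs shorter than 50% of the median successful duration.
    if episode["duration_s"] < 0.5 * median(baseline_durations):
        return Finding("duration_anomaly", 0.6, "Run ended far sooner than successful runs")
    return None


RULES = [battery_low, duration_anomaly]  # harness order doubles as the tie-break


def analyze(episode, baseline_durations):
    findings = []
    for rule in RULES:
        finding = rule(episode, baseline_durations)
        if finding is not None:
            findings.append(finding)
    if not findings:
        # Catch-all so the analyzer always reports something.
        findings.append(Finding("status_failed_no_reason", 0.1, "Failed with no recorded reason"))
    # sorted() is stable, so equal-confidence findings keep rule order.
    return sorted(findings, key=lambda f: -f.confidence)


ranked = analyze({"battery_min_percent": 9, "duration_s": 12.0}, [40.0, 50.0, 60.0])
print([f.code for f in ranked])
```

    Relying on a stable sort keeps the tie-break implicit: reordering `RULES` is enough to change which finding leads within a confidence tier.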
  5. Portal - R2 byte reaper

    Hard-deleting an episode now removes its R2 objects too. A Postgres trigger captures the storage keys when the row is deleted (works for admin deletes, portal deletes, and clients CASCADE), a durable queue holds them, and a Vercel Cron drains the queue every 5 minutes - retries with exponential backoff, audit-logs every reap.

    • In-DB capture · A BEFORE DELETE trigger on `episodes` writes one row to `storage_reap_queue` per deleted episode, including cascaded deletes from `clients`. Capture is transactional with the row delete, so a failed enqueue rolls the delete back - we never lose a key.
    • Drain · /api/cron/reap (Vercel Cron, every 5 min) claims up to 50 rows with a claim_token + claim_at lock (stuck claims older than 10 min auto-recover), calls R2 DeleteObjects with a prefix sweep to catch orphan chunks, and acks. Bounded batches keep the function within Vercel's serverless timeouts.
    • Retry · Failures back off 1m → 5m → 15m → 1h → 6h → 24h. After 7 attempts the row flips to `failed` and emits `storage.reap_failed` in audit_log for human triage. Successful reaps emit `storage.reaped` with the deleted_count - retention-policy traceable.
    • Admin UI · /admin/reaper (under System in the sidebar) is the queue inspector: stat strip + status filter chips, per-row error messages, Retry button on failed/pending rows, Run now button to drain the queue ad-hoc. Sidebar carries a backlog badge so pending + failed + stuck rows are visible from anywhere in the console.
    • Kill switch · REAPER_ENABLED=false stops the cron from calling DeleteObjects (queue inserts continue regardless). CRON_SECRET guards the cron route end-to-end; missing secret returns 503, wrong secret 401.
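
    The retry schedule above can be sketched as a small state function. This is one reading of the entry: each of the first six failed attempts waits the next interval, and the seventh failure marks the row `failed`. The function name and return shape are illustrative assumptions.

```python
from datetime import timedelta
from typing import Optional, Tuple

# Waits applied after failed attempts 1..6.
BACKOFF = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=15),
    timedelta(hours=1),
    timedelta(hours=6),
    timedelta(hours=24),
]
MAX_ATTEMPTS = 7


def after_failure(attempts_so_far: int) -> Tuple[str, Optional[timedelta]]:
    """Return (status, delay before next try) after the Nth failed attempt."""
    if attempts_so_far < 1:
        raise ValueError("attempts_so_far is 1-based")
    if attempts_so_far >= MAX_ATTEMPTS:
        return "failed", None  # hand off to human triage
    return "pending", BACKOFF[attempts_so_far - 1]


for n in (1, 6, 7):
    print(n, after_failure(n))
```

    Under this reading a row that keeps failing spends a little over 31 hours in the queue before it is flagged for triage.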
  6. SDK 0.1.0a10 - Live ros2.record(topics=…)

    The offline rosbag adapter is no longer the only ROS 2 path. `ros2.record(topics=[...])` subscribes via rclpy during a live run, writes a tempdir bag, then encodes + uploads + finalizes as one episode on close - same artifact contract as `upload_bag`, the only difference is which side wrote the bag. Pin `pip install 'robotrace-dev[ros2]==0.1.0a10'`.

    • ros2.record(topics=[…]) · Context manager + explicit start/stop API. Subscribes via rclpy, writes to a tempdir rosbag2, pipes through the existing encode_bag + upload_bag pipeline on close. Topics validated against the live ROS 2 graph at start() so typos fail loudly instead of producing an empty bag.
    • rclpy stays unpinned · rclpy ships with the ROS 2 distro via apt (`apt install ros-<distro>-rclpy`) - we deliberately don't pull it from PyPI because the wheels there aren't always compatible with the rmw bindings sourced from a workspace. Lazy-import; missing rclpy raises ConfigurationError pointing at the apt command. The offline upload_bag(...) path stays zero-rclpy.
    • Failure handling · Exception inside the `with` block: episode finalizes as `failed` with the traceback in metadata.failure_reason, then re-raises. Empty bags (no messages received) are silently dropped - no upload, no orphaned tempdir.
    • Live-mode metadata · Episodes stamp `metadata.ros2.mode = "live"` + distro + topic list + message_count so the portal can tell live recordings apart from bag-file uploads when triaging.
    • Tests · 7 new SDK tests exercise the bag-writer end-to-end without rclpy (feed CDR-serialized bytes via rosbags.typesys, then run scan_bag + encode_bag on the output). Suite is now 112 passing. Docs at /docs/sdk/ros2 #live-recording-via-rclpy.
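
    The failure-handling contract described in this entry can be illustrated with a plain context manager. The class below is a stand-in with no rclpy, bag writing, or network involved; its name, the `on_message` hook, and the `ready` / `dropped` status strings are assumptions made for the sketch.

```python
import traceback


class RecordingSketch:
    """Stand-in for a live recorder that mimics only the close-time contract."""

    def __init__(self):
        self.messages = []
        self.status = None
        self.metadata = {}

    def __enter__(self):
        return self

    def on_message(self, topic, payload):
        self.messages.append((topic, payload))

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Finalize as failed, keep the traceback, then let the error propagate.
            self.status = "failed"
            self.metadata["failure_reason"] = "".join(
                traceback.format_exception(exc_type, exc, tb)
            )
            return False
        if not self.messages:
            self.status = "dropped"  # empty recording: nothing to upload
            return False
        self.status = "ready"
        self.metadata["message_count"] = len(self.messages)
        return False


rec = RecordingSketch()
try:
    with rec:
        rec.on_message("/joint_states", b"\x00")
        raise RuntimeError("gripper stalled")
except RuntimeError:
    print(rec.status)
```

    Returning `False` from `__exit__` is what makes the exception re-raise after the failure has been recorded. Checking for an exception before checking for an empty recording is a choice made in this sketch.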
  7. SDK 0.1.0a9 - Typed metadata

    The six payloads robotics teams ship in metadata over and over are now first-class classes. Pass `rt.JointState(...)`, `rt.Pose3D(...)`, `rt.EpisodeOutcome(...)` etc. straight into `metadata={...}` and the portal renders them with per-shape widgets instead of stringified JSON. Pin `pip install 'robotrace-dev==0.1.0a9'`.

    • robotrace.types · Six frozen dataclasses with strict validation - JointState (sensor_msgs/JointState mirror with parallel-array length checks), Pose3D (xyz meters + [x, y, z, w] quaternion), Twist (linear m/s + angular rad/s), Imu (accel + gyro + optional orientation), Battery (percent in [0, 100], voltage, current, charging), EpisodeOutcome (success / reward_total / collision_count / time_to_goal_s, mirrors the eval harness rollup).
    • Wire format · Each instance serializes to a JSON object tagged with `"__type": "robotrace.<name>"`. Existing customers passing free-form dicts see zero behaviour change - the typed classes are a pure superset. Nested typed values inside lists or sub-dicts are encoded recursively.
    • Portal renderer · EpisodeMetadataPreview detects the `__type` tag and dispatches to per-shape widgets: joint sparklines (centered-zero bars + value column), pose grids (xyz + quaternion in mono), twist arrows, IMU triplet, battery pills colored by percent, outcome stats with pass/fail tone.
    • Server-side validation · New shared validator lib/metadata/typed-values.ts runs in the create + finalize ingest routes. Known `__type` shapes are strictly validated (length checks, value ranges); unknown `__type` values pass through so a newer SDK shipping new types against an older server doesn't get rejected.
    • Docs · New /docs/sdk/types covers the six classes, the encoding contract, the convention table (meters / radians / quaternion order), and the forward-compat rule.
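
    A minimal sketch of the tagged wire format follows, using a local `Battery` dataclass rather than the SDK's own class. The exact tag spelling (`robotrace.Battery` here) and the `encode` helper are assumptions; only the `__type` key, the field names, and the [0, 100] range check come from the notes above.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass(frozen=True)
class Battery:
    percent: float
    voltage: float
    current: float
    charging: bool

    def __post_init__(self):
        if not 0 <= self.percent <= 100:
            raise ValueError("percent must be in [0, 100]")


def encode(value):
    """Recursively turn typed values into tagged JSON-ready dicts."""
    if is_dataclass(value) and not isinstance(value, type):
        body = {f.name: encode(getattr(value, f.name)) for f in fields(value)}
        # Tag spelling is an assumption for this sketch.
        return {"__type": f"robotrace.{type(value).__name__}", **body}
    if isinstance(value, dict):
        return {key: encode(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [encode(item) for item in value]
    return value  # free-form values pass through untouched


metadata = {
    "operator": "night-shift",
    "power": Battery(percent=82.0, voltage=24.1, current=1.3, charging=False),
    "history": [Battery(percent=90.0, voltage=24.4, current=1.1, charging=False)],
}
encoded = encode(metadata)
print(encoded["power"])
```

    Plain dict entries such as `operator` are emitted unchanged, which is what keeps typed values a superset of free-form metadata.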
  8. SDK 0.1.0a8 - Verification scenarios

    Promote a failed episode and the next candidate has to replay it without re-failing. Adds `robotrace.verify` (Python) and `robotrace verify check` (CLI) as a CI deploy gate, plus a new portal section at /portal/verify and an auto-sync from replay regression finalize. Pin `pip install 'robotrace-dev==0.1.0a8'`.

    • robotrace.verify · Four verbs: promote (turn an episode into a scenario, idempotent), check_gate (read deploy gate state), record_result (upload a per-scenario pass/fail/error), and run_check (run the replays + record results + re-check, one call). Same customer-side runner and weight-stays-local guarantees as robotrace.evals.
    • robotrace verify check · CLI verb for CI: pass --candidate <version> and (when needed) --policy module:fn. It prints a compact pass/fail/pending summary with a portal hyperlink, then exits 0 when every critical scenario passes for that candidate or 1 when one is blocking.
    • Portal · /portal/verify + eval-run card · List every active scenario with mini-stats and the latest result per candidate; detail pages show full history per scenario. Eval-run detail pages now render a Verification scenarios card under the DiffCard so CI engineers can see the gate state without leaving the eval view.
    • Auto-sync on eval finalize · When `robotrace replay run` finalizes, the server mirrors every matching baseline result onto its scenario - no extra CLI call if your nightly sweep already covers the verification set.
    • Docs · New /docs/sdk/verify reference covers promotion, severity tiers, the CI gate, programmatic API, error hierarchy, and the V0/V1 scope cuts.
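
    The deploy-gate decision can be approximated as a pure function over per-scenario results. This sketch assumes any critical scenario that has not passed (fail, error, or still pending) blocks the gate, and it invents a `warning` tier for contrast. Check /docs/sdk/verify for the real severity tiers and pending semantics.

```python
def gate_exit_code(results):
    """results: iterable of (scenario_name, severity, status) tuples.

    Returns (exit_code, blocking_scenarios). Treating every non-"pass"
    status on a critical scenario as blocking is an assumption.
    """
    blocking = [
        name
        for name, severity, status in results
        if severity == "critical" and status != "pass"
    ]
    return (1 if blocking else 0), blocking


results = [
    ("dropped-cup", "critical", "pass"),
    ("door-collision", "critical", "fail"),
    ("slow-dock", "warning", "fail"),  # hypothetical non-critical tier
]
code, blocking = gate_exit_code(results)
print(code, blocking)
```

    In a CI job the returned code would be passed to `sys.exit(...)` so the pipeline stops on a blocking scenario.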
  9. SDK 0.1.0a4 - Replay regression harness

    Re-roll a candidate policy against historical episodes for real. The customer-side runner downloads baseline actions/sensors from R2, replays them through a Python callable on the customer's own hardware, and uploads per-episode diff metrics. The portal renders the same 5-metric DiffCard the marketing site has been promising - now with actual numbers from actual training runs.

    • robotrace.evals · three new verbs · create_run(candidate_policy_version, baseline_episode_ids) opens a campaign and seeds one eval_results row per baseline. run_against(run, policy_callable=...) walks every baseline, fetches its actions.npz + sensors.npz via the signed-GET artifact resolver, runs the customer's policy locally, computes success / reward / collision / time-to-goal / L2 / OOD-share metrics, and posts each result. complete_run(run) triggers the server-side rollup and returns the same summary shape the portal DiffCard renders.
    • robotrace replay run · new CLI verb · Drives the same loop from the command line. Flags: --policy module:fn (importable callable, gunicorn-style), --candidate-version, --baseline-episodes ep1,ep2,… or @file.txt, --baseline-version, --dry-run (skip uploads, useful while iterating on the callable). Prints per-episode progress with clickable portal links and a final summary table. See /docs/sdk/evals for the full reference.
    • Customer-side runner, by design · Per AGENTS.md the policy weights never touch RoboTrace infrastructure - the customer's policy_callable runs on their hardware, the SDK only uploads the per-episode metric blob plus a metadata-only source="replay" episode so the portal can drill from the eval row back to its replay. Reuses the existing artifact resolver route - same RLS guard, no new bytes-egress path.
    • Per-episode safety + the _outcome sentinel · Failures inside policy_callable are caught per-baseline and recorded as status="failed" rows with a truncated traceback - one bad observation can't sink a sweep. Customers who can compute success at the policy layer return a {"_outcome": {success, reward_total, …}} dict in the last action; the runner pulls those values into the candidate columns of the metric blob so the DiffCard shows real movement instead of a delta-of-zero.
    • New portal surface · /portal/evals lists every campaign (success-delta pill, candidate vs baseline policy, status). /portal/evals/[id] shows the rollup DiffCard alongside a per-episode results table that links into each replay episode. Episode detail pages render a 'Part of eval run' pill when metadata.eval_run_id is set so the navigation closes the loop both ways. Three new server routes (POST /api/ingest/eval-run, .../[id]/result, .../[id]/finalize) carry the ingest contract - cross-tenant guards on every baseline_episode_id.
    • Rate-limit ergonomics on 429 · New typed RateLimitError(APIError) with a retry_after int parsed from the Retry-After header - a robot rig that bumps a quota now sees "wait 30s" instead of an opaque APIError. The SDK transparently retries on the safe call sites (start_episode create, signed-PUT uploads, evals.create_run, evals run_against per-result upserts) using Retry-After when present (capped at 30s) and exponential backoff (1, 2, 4s) otherwise. Episode.finalize and evals.complete_run deliberately do NOT auto-retry - the server may have processed the mutation before the 429 was sent back, and re-issuing on future paid tiers could double-bill artifact storage. Catch RateLimitError at the call site, sleep exc.retry_after or 30, retry yourself. See /docs/sdk/errors#ratelimiterror-429.
    • robotrace logout --revoke · Self-revoke from the CLI. Until now logout only removed the local credentials file - the key on the server stayed alive until you opened the portal, which was the wrong default for stolen-laptop and rig-decommission scenarios. Passing --revoke POSTs to a new /api/cli/auth/revoke endpoint authenticated with the stored Bearer key, flips revoked_at on the matching client_api_keys row, then deletes the local file. The route is scoped to "the key you authenticated with" - you can't use one CLI key to revoke a sibling, which keeps the blast radius of a leaked key consistent (attacker holding key A can only kill A, never B / C). Network failure or 5xx still wipes the local file (the point of logout is a local guarantee) but exits non-zero so CI catches it. See /docs/sdk/cli-login#logout-revoke-kill-the-key-server-side-too.
    • What's still V1 · Webhooks (eval_run.completed) - the Team-tier bullet on /pricing. Hosted runner - the eval_runs.runner_kind column is already in place so the V1 schema bump is a default change, not a migration. Cross-run trendlines (v13 vs v14 vs v15). CI-triggered regressions.
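
    The 429 handling described in this entry can be sketched as a small retry wrapper. `RateLimitError` below is a local stand-in for the SDK class, and the attempt count (three retries, then one final try) is an assumption inferred from the 1, 2, 4 s schedule.

```python
import time


class RateLimitError(Exception):
    """Local stand-in carrying the parsed Retry-After seconds, if any."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_retry(fn, sleep=time.sleep, cap=30, backoff=(1, 2, 4)):
    """Retry fn on RateLimitError; only wrap calls that are safe to repeat."""
    for fallback in backoff:
        try:
            return fn()
        except RateLimitError as exc:
            # Prefer the server hint (capped), else the exponential fallback.
            delay = fallback if exc.retry_after is None else min(exc.retry_after, cap)
            sleep(delay)
    return fn()  # final attempt; a RateLimitError here propagates


attempts = {"n": 0}


def flaky_upload():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RateLimitError(retry_after=120)  # server asks for 120 s, capped to 30
    if attempts["n"] == 2:
        raise RateLimitError()  # no header: use the fallback slot
    return "ok"


waits = []
result = call_with_retry(flaky_upload, sleep=waits.append)  # record waits, don't sleep
print(result, waits)
```

    Calls such as finalize, where the server may already have applied the mutation, should stay outside a wrapper like this and be retried deliberately at the call site.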
  10. SDK 0.1.0a7 - Gymnasium adapter

    Gymnasium env rollouts are now first-class. Run env.step() with your policy, pack observations and actions into NPZ, optionally encode render frames to mp4, and upload as one RoboTrace episode. Pin `pip install 'robotrace-dev[gymnasium]==0.1.0a7'`.

    • robotrace.adapters.gymnasium · Three verbs mirror ROS 2 / LeRobot - scan_env (read spaces and render capability), encode_rollout (write artifacts without network), upload_rollout (one-shot rollout + upload + finalize). Default source is sim. Video comes from env.render() only - pass render_mode='rgb_array' and install [video] for mp4.
    • MuJoCo via Gymnasium · No separate MuJoCo adapter in this release. Most MuJoCo teams already use Gymnasium env ids - install gymnasium[mujoco] on your side and call upload_rollout the same way.
    • Docs · New /docs/sdk/gymnasium reference. Integrations section on the marketing site moves Gymnasium from roadmap to shipped.
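
    For readers new to the Gymnasium step contract, the rollout loop that this adapter wraps looks roughly like the sketch below. `CountdownEnv` is a toy class that only mimics Gymnasium's `reset()` / `step()` return shapes, so the example runs without gymnasium installed; it is not the adapter's code.

```python
class CountdownEnv:
    """Toy env following Gymnasium's reset/step return shapes."""

    def __init__(self, max_episode_steps=5):
        self.max_episode_steps = max_episode_steps
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        terminated = False  # this toy task never reaches a terminal state
        truncated = self.t >= self.max_episode_steps  # time limit hit
        return float(self.t), reward, terminated, truncated, {}


def rollout(env, policy):
    """Run one episode and collect what an uploader would need."""
    obs, _info = env.reset()
    observations, actions, reward_total = [obs], [], 0.0
    while True:
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        actions.append(action)
        observations.append(obs)
        reward_total += reward
        if terminated or truncated:
            return {
                "observations": observations,
                "actions": actions,
                "reward_total": reward_total,
                "terminated": terminated,
                "truncated": truncated,
            }


episode = rollout(CountdownEnv(max_episode_steps=3), policy=lambda obs: 1)
print(episode)
```

    Keeping `terminated` and `truncated` separate matters downstream: a rollout that only hit the step limit is a different signal from one that reached a terminal state.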
  11. Episode delete cascade, admin audit log, platform polish

    Fixes eval baseline FK blocking hard-delete, ships an append-only admin audit trail, and closes legal + docs share-surface work. CLI login polish for SDK 0.1.0a6 is in the May 16 entry.

    • Migration 0013 · eval_results baseline CASCADE · Deleting an episode referenced as an eval-run baseline used to fail on FK RESTRICT. The baseline FK is now ON DELETE CASCADE - per-episode eval_results rows drop with the baseline episode; eval run rollups stay until you archive the run. Portal and admin delete flows work again when regression history exists.
    • Admin audit log · /admin/audit is live - append-only trail of sensitive CMS actions (access decisions, client invites, maintenance toggles, API key mint/revoke, episode deletes, role changes). Rows insert via the service role inside Server Actions; admins read through RLS. Apply migration 0014_audit_log.sql and supabase/policies/audit_log.sql if your deployment predates this ship.
    • Accessibility statement · New /accessibility route (LegalShell, sitemap, maintenance gate). Documents UserWay lazy-loaded on marketing only - portal and admin omit overlays. Footer Company column + portal Help link for procurement conversations.
    • Legal contact routing · /terms and /privacy drop placeholder legal-entity lines and add hello@robotrace.dev for general routing alongside the existing legal@ address.
    • Docs share surfaces (P1f partial) · /docs/quickstart ships a dedicated Open Graph card plus aligned openGraph/twitter metadata (Start-here ribbon) so Slack previews read as onboarding, not the generic docs hub. /request-access, /about, and /contact got the same shared-copy metadata pass.
    • SDK version pins · Admin and portal Powered-by chips read live SDK_VERSION from apps/web/lib/sdk/version.ts (0.1.0a6 at this ship).
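
    The effect of the baseline FK change in Migration 0013 can be reproduced in miniature with SQLite. This is an analogy only: production runs on Postgres, and the two-column schema below is a simplified assumption that keeps just the `episodes`, `eval_results`, and `baseline_episode_id` names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE episodes (id TEXT PRIMARY KEY);
    CREATE TABLE eval_results (
        id INTEGER PRIMARY KEY,
        baseline_episode_id TEXT NOT NULL
            REFERENCES episodes(id) ON DELETE CASCADE
    );
    INSERT INTO episodes VALUES ('ep1'), ('ep2');
    INSERT INTO eval_results (baseline_episode_id)
        VALUES ('ep1'), ('ep1'), ('ep2');
    """
)

# With RESTRICT this delete would fail; with CASCADE the dependent rows go too.
conn.execute("DELETE FROM episodes WHERE id = 'ep1'")
remaining = conn.execute("SELECT baseline_episode_id FROM eval_results").fetchall()
print(remaining)
```

    Only the rows that referenced the deleted baseline disappear; results tied to other episodes are untouched.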
  12. SDK 0.1.0a6 - friendlier `robotrace login` terminal output

    CLI sign-in reads warmer and clearer: a welcome line, a reworked verification block, ANSI hints on capable TTYs, and a tighter in-place spinner. Pin `pip install robotrace-dev==0.1.0a6` for this drop.

    • Portal · The `/cli/auth` countdown now hydrates cleanly (timer starts after mount) so approving a device login no longer flashes a React mismatch in devtools.
  13. SDK 0.1.0a3 - LeRobot adapter

    Hugging Face LeRobot datasets are now first-class. One `pip install`, one call, every trajectory in a Hub dataset becomes its own RoboTrace episode with frame-accurate video, sensor / action NPZ files, and reward / outcome rolled into metadata.

    • robotrace.adapters.lerobot · Four verbs mirror the ROS 2 adapter shape - scan_dataset (read meta only, fast Hub probe), encode_episode (write video.mp4 + sensors.npz + actions.npz for one trajectory), upload_episode (one-shot single episode), upload_dataset (bulk walk every trajectory with optional on_progress callback). Each LeRobot trajectory becomes one RoboTrace episode - natural mapping, no policy decisions to make.
    • Lean install, no torch baggage · The [lerobot] extra deliberately does NOT depend on the `lerobot` PyPI package (which would pull torch + torchvision + pyav + several CUDA wheels). We read the v2.1 on-disk format directly with pyarrow + huggingface_hub. ~20 MB install on top of the base SDK - same footprint as [ros2].
    • Auto-classification of LeRobot columns · observation.images.<cam> → video, action[.x] → actions, next.{reward,done,success,*} → episode-level metadata, observation.* + unknown columns → sensors. Internal LeRobot bookkeeping (timestamp, frame_index, etc.) gets filtered. Multi-camera datasets tile horizontally; pass canonical_camera=... to pin one camera and skip the opencv path entirely (single-cam copies the source mp4 byte-for-byte).
    • Episode outcome surfaces in metadata · next.reward gets summed into a single per-trajectory next.reward_sum on the episode's metadata block, alongside next.done / next.success. Training pipelines can read it without unpacking the actions NPZ.
    • v3.0 dataset format · Multi-episode parquet shards (LeRobot v3.0, late 2025) are NOT yet supported - the adapter raises a clear ConfigurationError pointing at the v2.1 revision fallback. Most public lerobot/* Hub datasets are still v2.1 as of this release; v3.0 lands in a follow-up once we see real-user demand.
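
    The column auto-classification rules in this entry reduce to a prefix check, sketched below. The function name and the `filtered` label are illustrative, and the bookkeeping set holds only the two columns named above, although the adapter filters more.

```python
# Only the bookkeeping columns named in the notes above; the real list is longer.
BOOKKEEPING = {"timestamp", "frame_index"}


def classify_column(name: str) -> str:
    """Map a LeRobot column name to the artifact it would feed."""
    if name in BOOKKEEPING:
        return "filtered"
    if name.startswith("observation.images."):
        return "video"  # must be checked before the generic observation.* case
    if name == "action" or name.startswith("action."):
        return "actions"
    if name.startswith("next."):
        return "metadata"
    return "sensors"  # observation.* and anything unrecognized


for column in ("observation.images.wrist", "action", "next.reward",
               "observation.state", "frame_index", "custom.torque"):
    print(column, "->", classify_column(column))
```

    Sending unknown columns to sensors, rather than dropping them, means extra dataset fields still end up in the uploaded NPZ.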
  14. SDK 0.1.0a2, R2 storage, and portal polish

    A long day of shipping. Storage went from local-only to a real Cloudflare R2 bucket behind signed URLs, the SDK earned its first OpenTelemetry release, and the portal closed three of its biggest day-1 friction points.

    • SDK 0.1.0a2 · OpenTelemetry trace correlation · New optional [otel] extra (opentelemetry-api only - no heavy SDK). When the SDK detects an active span, it attaches trace_id / span_id / traceparent to every start_episode call. Server validates the W3C trace-context shape and persists it on the episode. The portal episode page renders a Tracing card with copy buttons and an optional one-click 'Open trace' deep-link via NEXT_PUBLIC_TRACE_URL_TEMPLATE (Datadog, Honeycomb, Grafana Tempo, Jaeger). Zero new kwargs - turn it on by installing the extra.
    • Cloudflare R2 wired end-to-end · Episode bytes (.mp4, .npz, .parquet) now flow from the SDK straight to a real R2 bucket via signed PUT URLs minted by the ingest route. The bucket stays private - the DB stores canonical R2 object keys, and a new /api/episodes/[id]/artifact/[kind] route handler mints fresh 1-hour signed GET URLs on every read, gated by the caller's tenant.
    • Episode delete in portal + admin · Three-dot row menu on the episode list (and a matching admin variant) with Archive / Restore / Delete. Delete uses a type-DELETE confirmation dialog to make accidental loss expensive. Note: bytes in R2 are not yet swept - the row-delete clears the DB record only. A reaper worker is on the roadmap.
    • Demo episode for empty portal · First-time-approved users used to land on 'No episodes yet'. They now land on the same empty state, but the preview row is a real clickable Sample run - clicking opens a canonical, read-only sample episode with a synthetic pick-and-place video. Implemented as a sentinel-UUID short-circuit at three boundaries (list page, detail page, artifact resolver) - no migration, no fake DB row, gated behind DEMO_EPISODE_VIDEO_KEY so unseeded deployments can't ship a broken player.
    • Profile vs. Workspace · Settings used to have one editable name field that - silently - also wrote to the workspace name when the caller was the owner. So 'Acme Robotics' became both your personal name (greeting: 'Good afternoon Acme') and your workspace label. The two are now split into a Profile card (personal display name) and a Workspace card (owner-only rename), with clear copy explaining which is which.
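
    The shape check mentioned in the OpenTelemetry bullet can be approximated with a regular expression. This sketch follows the W3C Trace Context rules for version-00 headers as understood here (lowercase hex, fixed widths, no all-zero IDs); the server's actual validator is not shown in this changelog and may differ.

```python
import re

_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def is_valid_traceparent(header: str) -> bool:
    """Check a traceparent string against the version-00 field layout."""
    match = _TRACEPARENT.fullmatch(header)
    if match is None:
        return False
    version, trace_id, span_id, _flags = match.groups()
    if version == "ff":
        return False  # reserved, never a valid version
    # All-zero trace or span IDs are defined as invalid.
    return trace_id != "0" * 32 and span_id != "0" * 16


good = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(is_valid_traceparent(good))
print(is_valid_traceparent("00-" + "0" * 32 + "-00f067aa0ba902b7-01"))
```

    Rejecting malformed values at ingest keeps deep-link templates from producing broken trace URLs later.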
  15. ROS 2 adapter - rosbag2 in, episode out

    • ROS 2 adapter (rosbag2 → episode) · The packages/sdk-python [ros2] extra is no longer empty. New scan_bag / encode_bag / upload_bag helpers walk a rosbag2 directory, encode camera topics to .mp4 with OpenCV, and hand the result to the same upload pipeline the rest of the SDK uses. ROS 2 Humble and Jazzy are supported, and rclpy is not needed at runtime, so you can read bags without a sourced ROS environment.