iii-pubsub runtime settings live in the configuration worker
The pub/sub worker now registers its config schema under the iii-pubsub configuration entry, seeds it from the config.yaml block on first boot, and hot-applies changes via configuration::set — no restart.

The pub/sub adapter is a full hot-swap tier (unlike iii-state’s restart-tier storage adapter): a runtime edit rebuilds the backend — local (in-process broadcast) or redis (cross-instance delivery) — re-subscribes the live subscriptions onto the new backend before swapping it in so there is no delivery gap, then tears down the previous backend, aborting its per-topic tasks rather than leaking them. A value that fails to build the backend is gated and keeps the previous one running.

The adapter field advertises a concrete per-adapter schema — a discriminated union keyed on name over the built-in local and redis adapters. Each branch is closed: redis carries a typed redis_url, and local (which takes no config) is a closed empty object, so configuration::set rejects an unknown adapter name and a junk config key on either branch, while a schema-driven UI renders per-adapter fields instead of a free-form object. Deserialization stays lenient, so a hand-edited persisted file is still tolerated at boot.

As with the other workers, the config.yaml block is seed-only once a value is persisted; change settings via configuration::set or by editing the persisted file (./data/configuration/iii-pubsub.yaml with the default fs adapter). ${VAR:default} placeholders expand on read.

iii-stream runtime settings live in the configuration worker
The WebSocket stream worker now registers its config schema under the iii-stream configuration entry, seeds it from the config.yaml block on first boot, and hot-applies changes via configuration::set — no restart. The schema carries per-field descriptions, rejects unknown keys at set time, and expands ${VAR:default} placeholders on read. As with the other workers, the config.yaml block is seed-only once a value is persisted — change settings via configuration::set or by editing the persisted file (./data/configuration/iii-stream.yaml with the default fs adapter), and a runtime edit survives engine restarts.

Each field applies on its own tier. auth_function applies to new connections immediately — no rebind. A host/port change rebinds the listener: the new address is bound and the server respawned before the old listener is torn down, a failed bind keeps the previous server, and live connections on the old address are dropped (clients reconnect). An adapter change hot-swaps the pub/sub backend: the new backend is built, swapped in, and its event pump restarted; the swap is gated (a value that fails to build keeps the previous backend), and existing connections stay bound to the previous backend until they close, so prefer a quiet moment to repoint the adapter in a multi-instance deployment.

The adapter field advertises a concrete per-adapter schema — a discriminated union keyed on name over the built-in kv, redis, and bridge adapters, each with its own typed config (kv: store_method / file_path / save_interval_ms / channel_size; redis: redis_url; bridge: bridge_url). configuration::set validates adapter settings against it and rejects an unknown adapter name or stray config key, and a schema-driven UI renders per-adapter fields instead of a free-form object. Deserialization stays lenient, so a hand-edited persisted file is still tolerated at boot.

iii-cron runtime settings live in the configuration worker
The cron worker now registers its config schema under the iii-cron configuration entry, seeds it from the config.yaml block on first boot, and hot-applies changes via configuration::set — no restart. The config.yaml block is seed-only once a value is persisted; change settings via configuration::set or by editing the persisted file (./data/configuration/iii-cron.yaml with the default fs adapter), and a runtime edit survives engine restarts. ${VAR:default} placeholders expand on read, and values are validated against the schema (which carries a per-field description and rejects unknown keys) at set time.

Cron’s only setting is the distributed-lock adapter, and a change is a full hot-swap: the new lock backend is built first (gated — a value that fails to build keeps the previous backend, config, and jobs), the old backend is shut down, then every live cron job is re-registered onto the new one (best-effort per job), and the (config, scheduler) pair is swapped in atomically. The brief swap window is a scheduling gap on this instance — a job whose fire time lands in it is skipped, not double-run — while across a multi-instance fleet the old and new lock backends cannot coordinate mid-migration, so prefer a quiet moment to repoint the adapter.

The adapter field advertises a concrete per-adapter schema — a discriminated union keyed on name over the built-in kv and redis lock backends, each with its own typed config (kv: store_method / file_path / save_interval_ms / lock_ttl_ms / lock_index; redis: redis_url only, since the lock TTL and key prefix are fixed by the adapter). configuration::set validates adapter settings against it and rejects an unknown adapter name or a stray config key (e.g. a redis lock_ttl_ms), and a schema-driven UI renders per-adapter fields instead of a free-form object. Deserialization stays lenient, so a hand-edited persisted file is still tolerated at boot.

iii.worker.yaml is validated before anything is installed
Local worker add (CLI and worker::add trigger) now validates the manifest before any artifact is written. A typed schema is the single source of truth, and it is strict where it matters:
- Unknown keys are rejected — collected and sorted, so a manifest with several typos surfaces them all in one run.
- Deprecated keys warn but still work (runtime.kind / package_manager / entry / language, top-level config / language / entry) — honored for now, scheduled for removal in a future version, with migration hints per key. Silence with III_NO_DEPRECATION_WARN=1.
- Shape errors read in plain English — runtime: node reports “runtime must be a mapping”, not serde’s “expected struct RuntimeSection”. A manifest missing name says so precisely instead of the false “No project manifest detected”.
- The same strict validation re-runs at start, with a 64 KiB size cap that defuses hostile (billion-laughs) or accidental multi-GB manifests before they are slurped into host memory.
worker::add. Fix the typo or remove the key — worker::validate (below) tells you exactly which.

Author → check → add: worker::validate and the manifest contract
Two new ops close the authoring loop for LLMs and automation:
- worker::validate dry-runs a manifest — inline string or host path — and returns { valid, name, errors, unknown_keys, deprecated_keys, warnings } without touching anything. valid: true means worker::add would accept it.
- worker::schema { function_id: "iii.worker.yaml" } serves the manifest’s JSON Schema (closed-world: every field described, unknown keys rejected) plus a ready-to-write hello-world bundle using the correct iii-sdk package — self-tested against our own validator so the example can never drift from the rules.
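An automation caller can gate the add step on the validate response. The following is a minimal sketch, assuming the response arrives as a plain dict with the fields named above. The `summarize_validation` helper is hypothetical, and treating `errors` and `warnings` as lists of strings is an assumption, since the element shape is not specified here.

```python
from __future__ import annotations

from typing import Any


def summarize_validation(resp: dict[str, Any]) -> tuple[bool, list[str]]:
    """Turn a worker::validate-style response into (ok, readable notes).

    Hypothetical helper: the field names come from the release note, the
    element types (plain strings) are an assumption.
    """
    notes: list[str] = []
    # Sort unknown keys so repeated runs produce a stable report.
    for key in sorted(resp.get("unknown_keys", [])):
        notes.append(f"unknown key: {key}")
    for key in resp.get("deprecated_keys", []):
        notes.append(f"deprecated key: {key}")
    notes.extend(f"error: {e}" for e in resp.get("errors", []))
    notes.extend(f"warning: {w}" for w in resp.get("warnings", []))
    # Only proceed to the add step when the validator says the manifest is valid.
    return bool(resp.get("valid", False)), notes
```

A caller would run the add step only when the first element is true, and otherwise feed the notes back to whoever (or whatever) authored the manifest.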
worker::status: one worker, full picture
worker::status { name } reports a single worker’s config entry, sandbox state, process liveness, and recent logs in one call, with actionable hints — a local worker without logs yet reports that it is still provisioning rather than implying failure. Like every worker::* op it carries schemars-generated request/response schemas via engine::functions::info.

Local-path workers: stop re-adding after code edits
Local workers run their project directory live and get a host-side source watcher: editing a source file auto-restarts the worker, and editing a dependency manifest (package.json, Cargo.toml, …) forces a full restart. Re-running worker::add after a code change is pure waste — it re-registers triggers and re-runs install scripts. The worker::add / worker::update descriptions now say this to LLM callers directly: re-add with force: true only when iii.worker.yaml itself changes, and worker::update never touches local-path workers (it re-resolves registry workers pinned in iii.lock).

Also fixed: a forced local re-add returned the raw path (/tmp/hello-world) as the worker name instead of the manifest name (hello-world) — no other worker::* op accepts the path form, so the response was a dead end.

worker add --force reruns install when dependencies change
--force is the documented way to rebuild a local worker after its iii.worker.yaml or lock file changes. It stops the worker and clears artifacts, but a leftover binary or OCI image under ~/.iii could trip an if freed == 0 guard that left the managed directory in place — and with it the .iii-prepared marker and the /var/iii/deps caches. The next boot saw the marker, skipped setup/install, and reused stale dependencies, so a freshly added package surfaced as a ModuleNotFoundError at runtime.

The managed directory is now always wiped on --force (it is a distinct path from the image cache, so there was never a double-count to guard against), and the .iii-prepared marker is removed explicitly as a backstop even when the directory wipe partially fails. A changed lock file reinstalls.

W120 tells you when the lock holder is the daemon itself
When the worker lock is held by the iii-worker-ops daemon — the process serving the worker::* API — the W120 LockBusy error now says so and tells the caller not to kill that pid. From a real harness session: an LLM saw “lock held by pid N”, killed N, and took down the entire worker management API it was using.

worker::logs output is terminal-sanitized
Logs fetched over the bus are stripped of ANSI/OSC escapes and spinner residue — raw escape bytes are token noise for LLM consumers and can rewrite the reader’s terminal. Pass raw: true to opt out.

iii-state runtime settings live in the configuration worker
The state worker now registers its config schema under the iii-state configuration entry, seeds it from the config.yaml block on first boot, and hot-applies changes via configuration::set — no restart. Two new runtime knobs apply live: triggers_enabled globally pauses/resumes state change-trigger fan-out, and max_value_bytes rejects oversized state::set writes with VALUE_TOO_LARGE. save_interval_ms retunes the file-backed kv save cadence by respawning the save loop. The schema carries per-field descriptions and rejects invalid values (e.g. max_value_bytes: 0, save_interval_ms: 10) at set time.

The storage adapter is restart-tier: a change is logged and takes effect at the next engine start, where a boot-read of the persisted iii-state entry drives adapter construction — so a runtime-edited adapter survives restarts. As with the other workers, the config.yaml block is seed-only once a value is persisted; change settings via configuration::set or by editing the persisted file (./data/configuration/iii-state.yaml with the default fs adapter). ${VAR:default} placeholders expand on read.

The adapter field advertises a concrete per-adapter schema — a discriminated union keyed on name over the built-in kv, redis, and bridge adapters, each with its own typed config (kv: store_method / file_path / save_interval_ms; redis: redis_url; bridge: bridge_url). configuration::set validates adapter settings against it and rejects an unknown adapter name or stray config key, and a schema-driven UI renders per-adapter fields instead of a free-form object. Deserialization stays lenient, so a hand-edited persisted file is still tolerated at boot.

OTLP observability exports work with TLS collectors
OTLP trace and metric exporters now honor OpenTelemetry protocol environment variables and automatically enable TLS for HTTPS gRPC collector endpoints. HTTP/protobuf endpoints are normalized to the correct /v1/traces or /v1/metrics signal path even when the configured endpoint already includes another OTLP signal path.

Log export also applies headers from OTEL_EXPORTER_OTLP_LOGS_HEADERS or the global OTEL_EXPORTER_OTLP_HEADERS, including percent-decoded header values, so authenticated collectors receive the configured credentials.

Hardening
- The VM UDP relay now writes a real pseudo-header checksum on response packets (strict guest stacks drop zero-checksum UDP), bounds payloads to the 16-bit IP length limit instead of silently truncating, and no longer injects empty frames into the guest’s rx ring.
- Shell frame readers (proto, client, relay) zero-initialize their buffers instead of using uninitialized memory before the socket read.
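For readers unfamiliar with the checksum the relay now writes, here is a minimal sketch of the standard UDP-over-IPv4 computation (RFC 768). It is an illustration in Python, not the relay's own code: the function names are hypothetical, and the size bound assumes a 20-byte IPv4 header without options.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 payload: bytes) -> int:
    """Checksum over the IPv4 pseudo-header, UDP header, and payload."""
    udp_len = 8 + len(payload)
    # Reject instead of truncating: the IPv4 total-length field is 16 bits
    # (assumes a 20-byte IP header with no options).
    if 20 + udp_len > 0xFFFF:
        raise ValueError("payload does not fit in a single IPv4 datagram")
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, udp_len))  # 17 = UDP protocol number
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)  # checksum zeroed
    checksum = ~ones_complement_sum(pseudo + header + payload) & 0xFFFF
    # A transmitted 0 means "no checksum", so a computed 0 is sent as 0xFFFF.
    return checksum or 0xFFFF
```

A receiver verifies by summing the same bytes with the checksum field filled in; a correct packet folds to 0xFFFF.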
iii-http runtime settings live in the configuration worker
The HTTP worker now registers its config schema under the iii-http configuration entry, seeds it from the config.yaml block on first boot, and hot-applies changes via configuration::set — CORS, timeout, concurrency limit, and global middleware swap without dropping the listener; a host/port change binds the new address before tearing down the old one, and a failed bind keeps the previous server running. ${VAR:default} placeholders expand in string fields. The schema carries per-field descriptions and rejects invalid values (including concurrency_request_limit: 0) at set time.

The config.yaml block is seed-only: once a value is persisted, later edits to that block are ignored — change settings via configuration::set or by editing the persisted file (./data/configuration/iii-http.yaml with the default fs adapter).

Breaking — HTTP error envelope: errors generated by the iii-http server itself (handler invocation failure, middleware failure/timeout, unmet route condition, route-miss 404s, and response-stream build failures) now return {"error": {"code", "message", "error_id"?}} instead of flat strings or plain-text bodies. code is machine-readable (e.g. MIDDLEWARE_TIMEOUT, CONDITION_NOT_MET, INTERNAL_ERROR, NOT_FOUND); error_id correlates 5xx responses with server logs. Bodies returned by your own handlers and middleware pass through unchanged. Clients parsing the old flat string must read error.message instead.

Nothing the engine started survives it — zombie-worker leak fixed
An abnormal engine death (kill -9, OOM, crash, dev hard-restart) used to leave the engine’s entire spawn tree running forever: orphaned worker-manager-daemon / sandbox-daemon processes reconnect-looped indefinitely (one incident accumulated 19), an orphaned sandbox-daemon pinned live multi-GB libkrun VMs, and managed-worker VMs, __watch-source sidecars, and binary workers all survived a killall -9 iii.

Engine death is now detected three ways — a lifeline pipe the kernel closes on any death including SIGKILL, an III_ENGINE_PID handshake that flows down the whole spawn tree, and a hardened reparent fallback — and triggers a cascade that tears down daemons, VMs, watchers, and workers. The worker-ops daemon runs a session reaper that stops every config.yaml worker host-side without needing the engine. Self-exits leave a one-line breadcrumb in ~/.iii/logs/<daemon>.log. Residual: binary workers leak only if the engine and the worker-ops daemon are SIGKILLed in the same instant.

Daemon no longer panics when the engine dies with broken stdio
When the engine died abnormally, its stdout/stderr pipes became broken, and the SDK connection thread’s reconnect logging hit the tracing layer’s eprintln! fallback — which panicked (failed printing to stderr: Broken pipe) and killed the daemon before its engine-gone reaper could run. The tracing writer now swallows write errors (ResilientStdout) so logging can never panic the process; the durable exit-log redirect stays for forensics.

Live trace feed is tree-correct and lower-latency
Spans whose real parent was an engine-internal wrapper rendered as detached “phantom roots” in the live detail view. The detail stream now rebuilds each touched trace through the same prune+collapse pipeline as the REST tree, emitting corrected parent_span_id and the full ancestor chain (upsert-by-span_id, so frames stay self-contained). The Node SDK’s BatchSpanProcessor flush delay drops from OpenTelemetry’s default 5000ms to 100ms (configurable via spansFlushIntervalMs), cutting the dominant source of console lag; console trace polling drops from 3s to 1s.

Go SDK: connection-scoped replies no longer leak across reconnects
The Go SDK drained connection-scoped replies (pong, InvocationResult, TriggerRegistrationResult) from a single shared outbound channel. If a reply was buffered when the socket dropped, the next connection’s writeLoop could send that stale reply on the wrong socket. Replies now route through a per-connection channel created on connect and detached on teardown; a reply enqueued with no live connection is dropped.

Conflicting HTTP routes are rejected instead of crashing the worker
Registering two HTTP routes with identical structure but different path-parameter names — e.g. GET users/:id and GET users/:userId — used to panic axum’s matcher and take down the entire iii-http worker thread. register_router now detects the structural conflict up front and rejects the second registration with a descriptive error.

HTTP trigger unregister is owner-aware
When two workers registered the same method + path — during a rolling deploy or a reconnect — a departing worker’s route cleanup could delete the route the new worker had just taken over, dropping the endpoint to a 404. unregister now checks ownership (trigger_id + worker_id) and skips the removal when the route already belongs to a different owner, so the live worker’s route keeps resolving. Removal by the actual owner is unchanged.

Install script retries transient download failures
install.sh now wraps every GitHub API call and binary download (iii, iii-init, iii-worker) in a retry (--retry 5, --retry-delay 2, --connect-timeout 10). Transient 5xx responses and connection timeouts are retried instead of failing the install on the first hiccup. Only widely-supported curl flags are used, so older curl builds keep working.

worker::* management API is self-describing — and kind: "local" now works over the trigger
Every worker::* op (add/remove/update/start/stop/list/clear/schema) now publishes its request JSON Schema, a description, and default_timeout_ms / idempotent metadata through engine::functions::info and worker::schema, so an LLM or automation caller can discover the full contract without out-of-band docs. Workers can also report a one-line description (Node, Go, Rust, and Python SDKs) that surfaces in engine::workers::list / engine::workers::info.

Breaking — error codes on the wire:
- Malformed worker::* payloads now return W105 (BadRequest) instead of W101 (InvalidSource). The envelope’s details.hint names the worker::schema call that returns the request schema. W101 and W102 are now reserved (documented but never emitted) — consumers matching W101 for malformed payloads should match W105.
- worker::* op failures now surface the W-code as the transport ErrorBody.code (previously the generic "invocation_failed", with the W-code only inside the message envelope). Consumers that matched code == "invocation_failed" to detect worker-op failures should match the W-code instead.
worker::add { kind: "local" } over the trigger: the identical request that previously returned W102 (rejection) now succeeds. The path resolves on the engine/daemon host and the install runs the manifest’s setup/install/start scripts there. Because the engine does not authenticate worker identity, treat a daemon reachable by untrusted workers as a host-level code-execution surface — prefer registry names or OCI references for distributed workers, and lock down the daemon when exposing it.

engine::triggers::info now exposes response_schema
Trigger types can declare the schema a bound handler must return when the trigger fires. engine::triggers::info surfaces it as a new optional response_schema field alongside the existing configuration_schema (how to configure the trigger) and request_schema (what the handler receives) — the full trigger contract is now discoverable from a single call.

The http trigger type is the first to declare a return contract: its response_schema is the response envelope the iii-http worker reads from a handler’s return value — status_code / headers / body, every field optional. Previously, “what should my HTTP handler return” wasn’t discoverable from the trigger itself: you had to inspect an already-bound handler via engine::functions::info, or guess field names (status vs status_code — it’s status_code).

Trigger types that place no constraint on the handler’s return omit the field entirely, so existing consumers of engine::triggers::info are unaffected. In-process (Rust) trigger types can declare their own contract with the new TriggerType::with_call_response_format::<T>() builder.

SDK: inbound unregistertrigger for custom trigger types
When a trigger instance is removed — via trigger.unregister() or because the subscribing worker disconnects — the engine notifies the worker that owns the trigger type so it can run unregisterTrigger and tear down listeners, routes, or subscriptions.

Node, Browser, Python, and Rust SDKs already handled inbound registertrigger; they now handle inbound unregistertrigger the same way. Custom trigger type providers (registerTriggerType) receive the binding id (and can look up stored config from their own registry keyed by that id).

Built-in trigger types (http, cron, state, subscribe, durable:subscriber, stream, and others) are unchanged: the engine calls each in-process worker’s unregister_trigger directly and never sends a WebSocket message to an SDK worker.

What this fixes
- Unregistering a trigger bound to a custom trigger type now invokes the provider’s unregisterTrigger callback instead of leaving stale bindings server-side.
- When the provider worker reconnects, the engine re-sends registertrigger for existing bindings (unchanged); cleanup on consumer disconnect now correctly pairs with unregisterTrigger on the provider.
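On the provider side, pairing the two callbacks usually comes down to a registry keyed by binding id. The sketch below is a standalone illustration of that bookkeeping, not SDK code: `TriggerBindingRegistry` and its method names are hypothetical, and the real callback signatures are not shown in these notes.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class TriggerBindingRegistry:
    """Hypothetical provider-side store: binding id -> (config, teardown)."""

    def __init__(self) -> None:
        self._bindings: dict[str, tuple[dict[str, Any], Callable[[], None]]] = {}

    def register(self, binding_id: str, config: dict[str, Any],
                 teardown: Callable[[], None]) -> None:
        # Registration can be re-sent after a reconnect, so tear down any
        # earlier entry for the same id instead of stacking listeners.
        self.unregister(binding_id)
        self._bindings[binding_id] = (config, teardown)

    def unregister(self, binding_id: str) -> Optional[dict[str, Any]]:
        entry = self._bindings.pop(binding_id, None)
        if entry is None:
            return None  # unknown or already removed: treat as a no-op
        config, teardown = entry
        teardown()  # release the listener, route, or subscription
        return config
```

Making `unregister` idempotent keeps the provider safe if the same removal is delivered more than once.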
SDK surface trimming — deprecated and unused exports removed
Breaking (import-time only). A cleanup pass across all three SDKs removed re-exports and aliases that were back-compat shims, orphaned types, or thin wrappers over upstream crates. None change runtime behavior — each is a mechanical import swap.

Observability re-exports dropped (Node + Python)
The Logger and OTel re-exports that iii-sdk kept for back-compat when the observability surface moved to iii-observability in 0.16.0 are now removed. Import from the observability package directly.

iii package: Logger, init_otel, shutdown_otel, flush_otel, with_span, execute_traced_request, OtelConfig, ReconnectionConfig, BaggageSpanProcessor, current_span_id / current_trace_id, current_span_is_recording, record_span_event, set_current_span_attribute / set_current_span_error, the baggage and traceparent inject/extract helpers, redact / redact_and_truncate / resolve_max_bytes_from_env, DEFAULT_ALLOWLIST, and REDACTED_PLACEHOLDER. All live in iii_observability.

Rust SDK: crate-root re-exports and dead types removed
| Removed from iii_sdk | Replacement |
|---|---|
| Value (re-export of serde_json::Value) | depend on serde_json and use serde_json::Value |
| UpdateBuilder | build a Vec<UpdateOp> with UpdateOp::set / increment / decrement / append / remove / merge |
| FieldPath | UpdateOp path fields now take impl Into<String> — pass String / &str directly |
| MergePath (crate root) | still available at iii_sdk::types::MergePath |
| TriggerTypeInfo | none — it was orphaned and never wired to anything |
Node SDK: TriggerActionType alias removed
The TriggerActionType type alias is gone — use TriggerAction directly. The TriggerAction.Enqueue() / TriggerAction.Void() runtime helpers are unchanged.

Python SDK: IIIForbiddenError / IIITimeoutError removed
Both exception subclasses are deleted. All rejections — including timeouts and RBAC denials — now raise IIIInvocationError; branch on its .code ("FORBIDDEN", "TIMEOUT") instead of catching distinct types.

Channel and stream helpers moved to a helpers submodule
Breaking. createChannel / createStream (and the channel utility types) are no longer instance methods or crate-root exports — they moved to a dedicated helpers submodule across all three SDKs. This keeps the core iii client surface focused on registration and invocation, and groups the channel/stream plumbing in one importable place.

The same move applies to ChannelDirection, ChannelItem, extractChannelRefs / extract_channel_refs, and isChannelRef / is_channel_ref, which were previously top-level exports. ChannelReader, ChannelWriter, and StreamChannelRef stay at the package root.

iii-worker warns when scripts.install is omitted
A worker manifest with no scripts.install now emits a warning at load time instead of silently skipping the install step, so a missing setup phase is visible during local runs and CI rather than surfacing later as a runtime failure.

Observability: getTracer / getMeter / SpanKind dropped from the public Node API
Breaking. @iii-dev/observability no longer exports getTracer, getMeter, or SpanKind from its main entry point. getTracer / getMeter moved to a first-party-only @iii-dev/observability/internal subpath; they were never intended for application code. External consumers should:
- instrument with withSpan / initOtel, and
- import SpanKind from @opentelemetry/api directly.
Stored logs are stripped of ANSI escape codes
Log lines captured by the observability pipeline now have terminal color/formatting escape sequences removed before storage, so persisted logs render as clean text in the dashboard and downstream consumers instead of leaking raw \x1b[...m codes.

Single register_function entry point in the Rust SDK
Breaking. The Rust SDK’s function registration is collapsed into a single entry point that mirrors Node and Python.

RegisterFunction carries the handler plus all optional metadata. There are three constructors — new, new_async, http — and Value is accepted by new / new_async, so no separate untyped constructor is needed. register_function_with, the tuple form, untyped, IntoFunctionRegistration, IntoFunctionHandler, RegisterFunctionOptions, iii_fn, iii_async_fn, IIIFn, and IIIAsyncFn are removed.

Handler error type is fixed to IIIError. IIIError now implements From<String> / From<&str> so existing Result<R, String> handlers can migrate by updating the return type and relying on ?-propagation.

See the migration entry for the full before/after diff, builder methods, and step-by-step migration.

Logger and OpenTelemetry primitives moved to iii-observability
The Logger, OtelConfig, ReconnectionConfig (OTel variant), and the full OTel surface (init_otel / shutdown_otel / flush_otel / with_span / execute_traced_request, baggage and traceparent helpers, current_span_id / current_trace_id, span ops, payload redaction, BaggageSpanProcessor) now ship from a new shared package in every supported language:

| Language | Package | Import |
|---|---|---|
| Node | @iii-dev/observability (npm) | import { Logger, initOtel, withSpan, executeTracedRequest } from '@iii-dev/observability' |
| Python | iii-observability (PyPI) | from iii_observability import Logger, init_otel, with_span, execute_traced_request |
| Rust | iii-observability (crates.io) | use iii_observability::{Logger, init_otel, with_span, execute_traced_request}; |
- flush_otel / flushOtel — force-flushes every provider without tearing OTel down. Use it before short-lived process exits where you still need pending spans, metrics, and logs delivered.
- execute_traced_request / executeTracedRequest — wraps an outgoing HTTP call (httpx in Python, fetch in Node) in an OTel CLIENT span. Injects W3C traceparent, records HTTP semantic-convention attributes, sets ERROR status on >= 400 responses, and records exceptions on network errors.
Migration
Python and Rust continue to re-export the moved symbols from the SDK package for back-compat. Node removes the iii-sdk/telemetry subpath entry point — the named exports from iii-sdk itself stay, so import { Logger } from 'iii-sdk' keeps working. Direct imports from the new packages are preferred.

iii/v* release tag, so versions stay aligned with iii-sdk.

register_service removed from all SDKs
Breaking. register_service / registerService, along with the RegisterServiceInput and RegisterServiceMessage types, are removed from the Node, Browser, Python, and Rust SDKs, and the engine no longer handles the message. Services were an organizational-only grouping that never affected invocation or routing, so there is no replacement — drop all register_service calls.

Unused telemetry accessors removed
Breaking. Alongside the observability move, low-level telemetry accessors that were exported but unused are gone:
- Node (iii-sdk): getTracer, getMeter, SpanStatusCode — import SpanStatusCode from @opentelemetry/api; tracer and meter are internal.
- Python (iii): get_tracer, get_meter, is_initialized are now private (_get_tracer, _get_meter, _is_initialized) — use the opentelemetry API directly.
- Rust (iii_sdk): the get_tracer, get_meter, is_initialized, SpanKind, and SpanStatus re-exports — obtain meters via opentelemetry::global::meter(...) and import SpanKind from opentelemetry::trace.
getMeter / get_meter.

sandbox::run — one call from zero to result
A new meta-function composes sandbox::create + sandbox::fs::write + sandbox::exec + sandbox::stop into a single call. The classic four-step “create → write → exec → stop” dance drops to one. The sandbox is auto-stopped on both success and failure unless you pass keep_sandbox: true.

sandbox::catalog::list
A new function returns the daemon’s image catalog — bundled presets plus operator-registered custom_images entries from iii.config.yaml. Closes the “what images are available on this host?” discovery loop without operator hand-off.

sandbox::exec and sandbox::create accept more input shapes
sandbox::exec.cmd now accepts three shapes:
- cmd + args (classic POSIX)
- argv array
- shell-line cmd (shlex-split when args / argv are empty)
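The three shapes can be reduced to a single argv list. The sketch below shows one way a caller-side helper might do that in Python. The `normalize_command` name is hypothetical, and the precedence when several shapes are supplied at once (argv first, then cmd + args, then the shell line) is an assumption rather than the daemon's documented behavior.

```python
from __future__ import annotations

import shlex
from typing import Optional, Sequence


def normalize_command(cmd: Optional[str] = None,
                      args: Sequence[str] = (),
                      argv: Sequence[str] = ()) -> list[str]:
    """Collapse the three accepted input shapes into one argv list."""
    if argv:
        # Shape 2: an explicit argv array is used as-is.
        return list(argv)
    if cmd is None or not cmd.strip():
        raise ValueError("no command given")
    if args:
        # Shape 1: program name plus separate arguments.
        return [cmd, *args]
    # Shape 3: a shell-style line, split with POSIX quoting rules.
    return shlex.split(cmd)
```

Splitting with `shlex` handles quoting only; it does not expand variables, globs, or pipes the way a real shell would.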
sandbox::exec.env and sandbox::create.env accept either a Vec<"K=V"> list or a { K: V } map. Env-var names are pinned to [A-Za-z_][A-Za-z0-9_]*; digit-leading names and names containing /, -, or = are rejected as S001.

sandbox::fs::read returns inline bodies for small text
Additive: a new optional body field on the sandbox::fs::read response carries the file contents as a UTF-8 string for text files under 1 MiB that decode cleanly. The existing content: StreamChannelRef field is still always populated and still delivers the same bytes, so peers that statically type content as a stream ref keep working unchanged. New callers can short-circuit the channel subscription whenever body is present.

Structured sandbox::* errors with resubmittable fix payloads
Every sandbox::* function now returns a structured envelope on failure:
- docs_url anchors directly at the in-repo S-code subsection. Breaking: the base URL flipped from https://iii.dev/docs/errors/sandbox/Sxxx to https://github.com/iii-hq/iii/blob/main/crates/iii-worker/src/sandbox_daemon/README.md#Sxxx while the canonical iii.dev error pages are still pending. Bookmarks and scrapers built on the old URL need to follow the new anchors.
- fix is a non-null JSON payload the agent can merge into the original request and resubmit verbatim when recovery is unambiguous (parent-missing writes, sandbox::run sub-step failures, etc.).
- fix_note describes how to use the fix or — when fix is null — explains why no auto-recovery exists.
- sandbox::run sub-step failures surface the inner S-code transparently and name the failing step in fix.context, plus fix.sandbox_id when keep_sandbox: true.
- FS error message strings now carry a kind prefix (e.g. "file not found: {path}" instead of bare {path}). The authoritative code / type fields are unchanged; only callers that grep the message text are affected.
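The merge-and-resubmit step for a non-null fix can be sketched in a few lines. This is an illustration under assumptions, not the documented contract: `build_retry_request` is a hypothetical helper, the merge is assumed to be shallow with fix keys overriding the request, and the `parents` field in the usage example is invented for the example.

```python
from __future__ import annotations

from typing import Any, Optional


def build_retry_request(request: dict[str, Any],
                        error: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the request to resubmit, or None when no auto-recovery exists.

    Assumes a shallow merge in which keys from `fix` override the original
    request; the release note does not spell out the merge rule.
    """
    fix = error.get("fix")
    if not isinstance(fix, dict):
        # Null fix: the caller should read fix_note instead of retrying blindly.
        return None
    # Build a new dict so the original request stays untouched.
    return {**request, **fix}
```

For example, a write request `{"sandbox_id": "sb-1", "path": "/work/a/b.txt"}` combined with a hypothetical `fix` of `{"parents": True}` yields the same request with `parents` added, ready to resubmit.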
sandbox::exec default timeout raised to 5 minutes
Breaking. The default timeout_ms for sandbox::exec moves from 30 s to 300 s. Sized for cold npm install / pip install / cargo build. Previously the 30 s default fired as an opaque engine-gate denial before the daemon could return a structured timed_out: true response. Callers that relied on the 30 s fast-fail to bound runaway commands should now set timeout_ms explicitly.

Handler-boundary tracing on every sandbox::* handler
Every sandbox::* handler emits a tracing::info! event on both success and error with a stable field set: function_id, sandbox_id, success, error_code, error_type, retryable, duration_ms. Operators can dashboard sandbox usage without grepping unstructured logs.
Telemetry re-exports removed from public SDK surface
Breaking. Convenience re-exports of OpenTelemetry accessors were dropped from the Rust, Node, Python, and browser SDKs. Underlying behavior is unchanged; only the public surface is smaller. Users who need a tracer or meter directly should depend on the OpenTelemetry library for their language.
Removed symbols by language:

| Symbol | Rust (iii::*) | Node (iii-sdk/telemetry) | Python (iii.telemetry / iii.logger) | Browser |
|---|---|---|---|---|
| get_tracer / getTracer | dropped (still at iii::telemetry::get_tracer) | dropped | renamed _get_tracer | already absent (asserted) |
| get_meter / getMeter | dropped (still at iii::telemetry::get_meter) | dropped | renamed _get_meter | already absent (asserted) |
| is_initialized | dropped (still at iii::telemetry::is_initialized) | n/a | renamed _is_initialized | already absent (asserted) |
| SpanKind | dropped (use opentelemetry::trace::SpanKind) | n/a | n/a | already absent (asserted) |
| SpanStatus / SpanStatusCode | dropped (use opentelemetry::trace::Status) | dropped | n/a | already absent (asserted) |

Migration
- For custom spans, prefer withSpan / with_span / run_in_span. These preserve trace context.
- To obtain a tracer or meter directly, depend on @opentelemetry/api (Node) or the opentelemetry crate / Python package and call its accessors. Rust users can also keep using iii::telemetry::get_tracer / iii::telemetry::get_meter.
iii sandbox subcommand removed
Breaking. The iii sandbox CLI subcommand is gone. Every sandbox operation now goes through iii trigger, either with key=value tokens or with a --json '<obj>' payload (e.g. iii trigger sandbox::exec --json '{"sandbox_id":"…","cmd":"python3","args":["-c","print(2+2)"]}'), which is equivalent to the key=value form.
iii trigger is request/response only, so the streaming flows the old subcommand offered (exec stdout/stderr stream, upload, download) are no longer available from the terminal. Use the SDK from worker code for those: sandbox::exec and sandbox::fs::write / sandbox::fs::read still expose the streaming channel.
iii trigger reshape
Breaking. iii trigger no longer accepts --function-id and --payload. The new form takes the function path as a positional argument and accepts payload fields as key=value tokens, a --json '<obj>' flag, or both.
iii update --list-targets
iii update now exposes a --list-targets flag that prints every target accepted by iii update <target> (e.g. self, console, worker). Passing an unknown target now points users at this flag instead of failing silently. Rollback is not supported; reinstall a prior version manually with curl -fsSL https://iii.dev/install.sh | sh -s -- --version <prior>.
Migrating from Motia
Breaking. The Motia framework is deprecated in favor of using iii-sdk directly. Moving to the SDK unlocks multi-worker orchestration, browser connectivity via iii-browser-sdk with RBAC, and a direct understanding of iii’s three primitives: Workers, Functions, and Triggers. Your existing Motia project becomes one worker in a larger iii deployment instead of a standalone monolith.
Node / TypeScript migration guide → · Python migration guide →
SDK discovery wrappers removed
Breaking. The convenience discovery wrappers were removed from the Node, browser, Rust, and Python SDKs:
- listFunctions / list_functions / list_functions_async
- listWorkers / list_workers / list_workers_async
- listTriggers / list_triggers / list_triggers_async
- listTriggerTypes / list_trigger_types / list_trigger_types_async
- onFunctionsAvailable / on_functions_available
Call trigger() against the built-in engine functions instead, and register engine::functions-available like any other trigger type. This keeps the SDK surfaces aligned with the engine’s “use the primitives directly” design.
Worker RBAC
The iii-worker-manager now supports role-based access control. Configure auth functions that validate WebSocket upgrade requests, attach per-session allow/deny lists for functions, control trigger registration, and auto-prefix function IDs for namespace isolation. An optional middleware function lets you intercept every invocation for audit logging, rate limiting, or payload enrichment.
Read the Worker RBAC guide →
Trigger format, validation, and metadata
Trigger types now accept trigger_request_format and call_request_format fields (JSON Schema) so the engine can validate trigger configs and call payloads at registration time. Triggers also support an arbitrary metadata field for tagging and filtering.
Define request/response formats → · Trigger architecture →
Browser SDK
Your browser is now a first-class iii worker. The new iii-browser-sdk package connects to the engine over a single WebSocket and exposes the same core primitives as the Node SDK: registerFunction, trigger, registerTrigger, and createChannel all work identically. Build real-time dashboards, collaborative apps, and bi-directional frontends without REST endpoints or polling.
Use iii in the browser →
Sandbox and Container Workers
Workers can now run as container workers or sandbox workers. Container workers are OCI images managed through the iii worker CLI: add an image, configure it in config.yaml, and the engine pulls, extracts, and runs it in an isolated sandbox. For local development, iii worker add ./my-project registers a local directory as a first-class managed worker that runs inside a lightweight microVM with auto-detected runtimes, dependency caching, and full lifecycle support (start, stop, list, remove), with no Dockerfiles needed. Requires macOS Apple Silicon or Linux with KVM.
Managing Container Workers → · Developing Sandbox Workers →
iii worker exec
A new iii worker exec <name> -- <cmd> command runs arbitrary commands inside a running worker’s microVM (think docker exec for iii workers). stdin/stdout/stderr flow through, exit codes pass back, and Ctrl-C delivers SIGINT (twice for SIGKILL). TTY mode auto-detects when both stdin and stdout are terminals, so iii worker exec my-worker -- sh in a terminal gives you a real interactive shell with line editing and job control. Pass --timeout 30s to bound runaway commands (exit 124 matches coreutils).
Exec into a running worker →
Reproducible worker installs
Registry-managed workers can now be pinned in iii.lock. iii worker add writes the resolved worker graph when the registry provides one, binary workers can record artifacts for multiple platform targets, iii worker verify checks that config.yaml is represented in the lockfile, and iii worker update [worker] refreshes locked pins intentionally.
Reproduce Worker Installs →
Topic-based fan-out queues
Breaking. The topic-based queue API has been renamed. The trigger type changes from queue to durable:subscriber, and the publish function changes from enqueue to iii::durable::publish.
condition_function_id lets you filter messages server-side before they reach the handler.
Use topic-based queues →
Node SDK: registerFunction signature change
Breaking. The registerFunction API now takes the function ID as a plain string instead of an options object.
Everything is a worker
Breaking. We simplified iii down to three primitives: Workers, Functions, and Triggers. Modules were always workers in disguise: they connect to the engine, register functions, and react to triggers just like SDK workers do. Now the naming reflects that.
- Config YAML: the modules: top-level key is renamed to workers:, and the class: field is renamed to name: with short identifiers.
- Rust API: Module trait → Worker, register_module! → register_worker!, EngineBuilder::add_module() → add_worker().
- Adapter IDs: changed from long Rust-style paths to short names: kv, redis, builtin, rabbitmq, local, bridge.
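The two config key renames above lend themselves to a mechanical rewrite. The Python sketch below is illustrative only: migrate_config is a hypothetical helper, it assumes the parsed config.yaml is a dict whose modules: value is a list of mappings, and it takes the old-path-to-short-name table from the caller because the old paths are not listed here. The path in the usage lines is a made-up placeholder, not a real module path.

```python
from copy import deepcopy


def migrate_config(config: dict, short_names: dict) -> dict:
    """Return a copy of a parsed config using the workers:/name: layout.

    Assumption: entries under the top-level key are a list of mappings.
    `short_names` maps old class paths to new short identifiers; unmapped
    values are carried over unchanged so nothing is silently dropped.
    """
    migrated = deepcopy(config)
    if "modules" in migrated:
        migrated["workers"] = migrated.pop("modules")
    for entry in migrated.get("workers", []):
        if isinstance(entry, dict) and "class" in entry:
            old_path = entry.pop("class")
            entry["name"] = short_names.get(old_path, old_path)
    return migrated


# Hypothetical input; "old::path::ExampleModule" is a placeholder path.
old_config = {"modules": [{"class": "old::path::ExampleModule", "config": {"port": 1}}]}
new_config = migrate_config(old_config, {"old::path::ExampleModule": "example"})
print(new_config)
```

The function works on a deep copy, so the original parsed config stays available for a before/after diff.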