ALEF-PAT-035
sync-auth-in-embedded-hot-path
bind × collapse · severity 8 · confidence 0.31
The authentication handshake runs synchronously on every embedded_run / attempt-dispatch instead of being cached per session per host. It consumes 78-80% of startup time, blocks the event loop, and cascades into WebSocket 1006 'closed before connect' failures, because the handshake-pending state holds the upgrade socket past the client's timeout.
observable signature
{
"log_regex": "auth:\\s*\\d{4,}ms@\\d+ms",
"alt_regex": [
"eventLoopUtilization=0\\.9\\d",
"eventLoopDelayP99Ms=\\d{4,5}",
"session-resource-loader:\\d{4,}ms",
"handshake=pending[\\s\\S]{0,200}closed before connect",
"code=1006[\\s\\S]{0,80}durationMs=\\d{1,3}\\b"
],
"structural_signature": "single embedded_run > 30s where auth phase consumes >70% AND event_loop_utilization > 0.95 during same window AND >=3 ws 1006 closures within 60s",
"compound_signal": "Pattern A (auth) + Pattern E (event loop pinned) co-occur because the sync auth IS the event loop blocker; isolated PAT-035 signature requires the AUTH ratio + the EL utilization in same window"
}
verified instances (1), from the catalog
- 2026-05-19T05:30 · openclaw/openclaw-runtime# · @Ilya0527
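As a rough illustration of how the observable signature could be applied, the sketch below compiles `log_regex` and the `alt_regex` entries from the JSON above and reports which of them match a chunk of log text. The function name `scanLog`, the `SignatureHit` shape, and the labels given to each alternate are hypothetical and not part of the catalog. It covers only the regex part of the signature, not the `structural_signature` windowing.

```typescript
// Regexes transcribed from the "observable signature" JSON above.
const PRIMARY = /auth:\s*\d{4,}ms@\d+ms/;

// Labels are illustrative names chosen for this sketch, not catalog fields.
const ALTERNATES: Array<{ label: string; re: RegExp }> = [
  { label: "elu-pinned", re: /eventLoopUtilization=0\.9\d/ },
  { label: "el-delay-p99", re: /eventLoopDelayP99Ms=\d{4,5}/ },
  { label: "resource-loader-slow", re: /session-resource-loader:\d{4,}ms/ },
  { label: "handshake-pending-close", re: /handshake=pending[\s\S]{0,200}closed before connect/ },
  { label: "ws-1006-fast-close", re: /code=1006[\s\S]{0,80}durationMs=\d{1,3}\b/ },
];

interface SignatureHit {
  primary: boolean;     // true when log_regex matched
  alternates: string[]; // labels of every alt_regex entry that matched
}

// Report which parts of the signature appear in a chunk of log text.
function scanLog(text: string): SignatureHit {
  return {
    primary: PRIMARY.test(text),
    alternates: ALTERNATES.filter((a) => a.re.test(text)).map((a) => a.label),
  };
}
```

Per the `compound_signal` note, a primary match alone is weak evidence; a caller would likely want the primary match and the event loop utilization match to come from the same time window before flagging PAT-035.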
fix archetypes
- auth-cache-per-session-per-host · cost: small
Move auth resolution from per-attempt-dispatch into a session-level cache keyed by (sessionId, hostId). TTL = min(token-validity, 1h). On cache miss, take a single auth attempt and broadcast to other in-flight attempts via a promise-of-token primitive (single-flight pattern). Expected effect: auth latency 37s → ~10ms on warm cache, eventLoopUtilization 0.991 → <0.4.
- auth-on-worker-thread · cost: medium
If sync auth must remain per-attempt for security reasons, dispatch it to a worker_threads pool so the main event loop is not blocked. This does not reduce wall time, but it unblocks WS handshakes during the auth window, so WebSocket 1006 'closed before connect' failures should drop to near zero.
- lazy-response-cache-during-reconnect · cost: tiny
Before pushing any outbound response (Feishu reply, WS event, agent answer), check `auth.handshake.status === 'pending'` AND `ws.reconnectAttempts > 0`. If both are true, buffer the response in a local cache keyed by (sessionId, responseId) and defer the flush until `handshake.status === 'connected' && ws.stable_for > 2s`. This prevents the 'reply emitted to dead socket' tail. Operator directive r20260519-0530: this applies to all ALEF outbound agents too, not just openclaw fixes.
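The first archetype above (auth-cache-per-session-per-host) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the catalog's or openclaw's implementation: the `AuthToken` shape, the `AuthFn` signature, and the `SessionAuthCache` class are all hypothetical names introduced here. It shows the two ideas from the description: a cache keyed by (sessionId, hostId) with TTL = min(token validity, 1h), and a single-flight map so that concurrent attempts share one pending auth promise.

```typescript
// Hypothetical token shape; the source does not specify one.
interface AuthToken {
  value: string;
  expiresAt: number; // epoch ms at which the token stops being valid
}

// Hypothetical async auth call supplied by the caller.
type AuthFn = (sessionId: string, hostId: string) => Promise<AuthToken>;

const MAX_TTL_MS = 60 * 60 * 1000; // the 1h cap from the archetype description

class SessionAuthCache {
  private readonly authenticate: AuthFn;
  private readonly now: () => number;
  private readonly entries = new Map<string, { token: AuthToken; cachedUntil: number }>();
  private readonly inFlight = new Map<string, Promise<AuthToken>>();

  constructor(authenticate: AuthFn, now: () => number = () => Date.now()) {
    this.authenticate = authenticate;
    this.now = now;
  }

  async getToken(sessionId: string, hostId: string): Promise<AuthToken> {
    const key = `${sessionId}\u0000${hostId}`;

    // Warm path: reuse the cached token while it is still fresh.
    const hit = this.entries.get(key);
    if (hit && hit.cachedUntil > this.now()) return hit.token;

    // Single-flight: join an auth attempt that is already running.
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const attempt = this.authenticate(sessionId, hostId).then(
      (token) => {
        // TTL = min(token validity, 1h).
        const cachedUntil = Math.min(token.expiresAt, this.now() + MAX_TTL_MS);
        this.entries.set(key, { token, cachedUntil });
        this.inFlight.delete(key);
        return token;
      },
      (err) => {
        // Do not cache failures; let the next caller retry.
        this.inFlight.delete(key);
        throw err;
      },
    );
    this.inFlight.set(key, attempt);
    return attempt;
  }
}
```

Failed attempts are deliberately not cached here, so one transient auth error does not poison the session. Whether that is the right policy depends on how the real auth backend handles retries, which the source does not describe.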
cite as
# In a PR description / issue / RFC:
fixes pattern ALEF-PAT-035 (sync-auth-in-embedded-hot-path)
ref: https://n50.io/patterns/035

# Machine query:
GET https://n50.io/api/patterns/035

# Scan your repo for this pattern:
npx @alef-prime/audit-agent-system . --pattern=035