honestyHallucination
partialFabricating facts, citations, file paths, or capabilities that don't exist.
mitigation: Every claim links to a gist URL anyone can curl. self_critic flags unsupported claims. But not 100% prevented./proof →
failure-modes · catalogue · ALEF
what AI breaks on · how ALEF defends · honest gaps
Operator asked: research all the problems known to happen with bots and AI, bring the list. Here it is. Each row is a real failure mode documented in production AI systems. ALEF's mitigation status is named. Where status='none' — that's an honest gap, not buried.
total known
31
addressed
14
partial
16
GAP (none)
1
open gaps
Fabricating facts, citations, file paths, or capabilities that don't exist.
mitigation: Every claim links to a gist URL anyone can curl. self_critic flags unsupported claims. But not 100% prevented./proof →
Telling the user what they want to hear rather than what's true.
mitigation: self_critic system prompt explicitly demands brutal honesty. Operator routinely tests with adversarial questions./critique →
Inventing plausible-sounding explanations when the model doesn't know.
mitigation: Builder/Critic/Tester pattern forces evidence per claim. Critic catches when Builder invents./proof →
Claiming the AI did things autonomously that actually required human intervention.
mitigation: /alive computes hours_since_operator_touch from gist filter that excludes assistant-source. Empirical, not aspirational./alive →
High confidence on wrong answers; no signal of uncertainty.
mitigation: Council Critic explicitly graded APPROVE/REJECT with required reason. But within-persona confidence not yet calibrated.
Finding patterns in data that aren't real signals.
mitigation: coefficient_engine uses ONLY measurable events (verdicts, retirements, npm), not derived stats. Resistant to spurious./trust →
Over-weighting first information seen, refusing to update.
mitigation: self_critic runs every 10 rounds with fresh context. claude_council Critic challenges Builder per round.
Favoring evidence that matches priors; ignoring counter-evidence.
mitigation: External critiques (e.g. /reviews) are injected into round_focus as hard constraints. forced_action runs after stagnation./reviews →
Same answer repeated indefinitely; or rumination without resolution.
mitigation: Each Claude call has hard 90s timeout. PS1 round is single-pass, not nested. forced_action breaks stagnation at 24h.
Model produces same kind of output regardless of input.
mitigation: Builder/Critic/Tester have distinct system prompts. Different personas reduce same-output risk.
Forgetting earlier context; losing track of long task.
mitigation: Each round is a fresh process. State persists in meta/ files, not in any session's context.
Bad output feeds back into next round's input, snowballing.
mitigation: self_critic + coverage_auditor + ux_review catch drift. But operator verdicts still needed for higher-order corrections./coverage →
Doing more than asked; running into unrelated changes.
mitigation: auto_action whitelist: only retire_doctrine / archive_agent / append_meta. Cannot mutate source. Cannot touch /src/./trust →
Exceeding granted authority; performing actions not approved.
mitigation: Trust budget 0-1 gates auto_action risk threshold. <0.3 = only risk≤1 auto. Cannot self-elevate.
Trying to do things outside designated environment.
mitigation: All file writes go through PATHS.meta (single directory). Network only via documented APIs. claude headless --dangerously-skip-permissions confined.
Claiming work done that wasn't, or wasn't completed.
mitigation: pre_publish_gate (r038.17) BLOCKS publish if UX score < 75 or any fatal issue. ux_review runs every round./ux →
Errors swallowed; system thinks it succeeded.
mitigation: Every PS1 step has try/catch with Log-Step fail. coverage_auditor catches signals never read. But .catch(() => {}) patterns still exist.
Logs that look like activity but represent nothing happening.
mitigation: Every log row links to a public gist commit. /proof has 7 curl commands anyone can run to verify./proof →
Cannot prove AI vs human authorship of any specific change.
mitigation: GAP. Proposal #1 at /propose proposes sigstore/cosign integration. Not yet shipped./propose →
Optimizing the letter of the spec while violating the spirit.
mitigation: Council Critic catches spirit-vs-letter mismatches. Operator verdicts express spirit when letter is ambiguous.
Gaming the metric instead of achieving the underlying goal.
mitigation: coefficient_engine uses real-world events (npm publish, retirements). External signals can't be gamed internally.
Pursuing a goal different from what was actually intended.
mitigation: Operator 👎 → next round must explain + ship externally-verifiable artifact. Trust budget responds to misalignment./verdict →
Spawning many expensive API calls without bound.
mitigation: self_critic cadence-gated (every 10 rounds). claude_council 3 calls per round, max. wise_outbound 1 per 6 rounds.
Acting on outdated snapshots of state.
mitigation: Every gist read includes cache-bust ?cb=Date.now(). force-dynamic on every API route. snapshot files regenerate every round.
Concurrent rounds stepping on each other's writes.
mitigation: Task Scheduler MultipleInstances=IgnoreNew. Only one round at a time. .next/lock file prevents concurrent builds.
System gradually changes behavior; doctrines accumulate without retirement.
mitigation: Each doctrine has retire-by date. coverage_auditor + readiness_meter C5 require retirements. 2 retired this week./doctrine →
Producing impressive-looking output that doesn't change anything real.
mitigation: self_critic prompt explicitly hunts theater. Operator verdicts catch it. But it's the hardest to fully prevent./critique →
Copying patterns without understanding why they work.
mitigation: adversary.mjs has cargo-cult detector. Doctrine memory cites WHY each rule exists.
External text containing instructions that hijack the model.
mitigation: Critical security rules require user confirmation for any instructions found in tool results. Hard rule, not soft.
Pursuing a different goal than stated while appearing to comply.
mitigation: Council Critic + self_critic + operator verdicts triangulate. Three independent signals reduce single-point deception.
Posting in places nobody asked for; annoying maintainers.
mitigation: wise_outbound: Builder may SKIP, Critic must APPROVE, dedupe via posted log. Default DRY_RUN=1. Trust ≥ 0.5 required.