- Target: github.com/crewAIInc/crewAI
- Commit: 5cdc420
- Scope: 1,198 .py files
- Verified findings: 4 (+3 dropped)
Observation A — Library raises SystemExit on context-window overflow
Token / context exhaustion · utilities/agent_utils.py:738
raise SystemExit(
    "Context length exceeded and user opted not to summarize. "
    "Consider using smaller text or RAG tools from crewai_tools."
)
Why an agent degrades here: SystemExit asks the Python runtime to terminate the interpreter — not just the task. Embed CrewAI in a long-lived host (FastAPI worker, Celery task, daemonised orchestrator) and a single agent that overflows the provider's context window can take the whole process down, including other agents and in-flight requests. The Agent class defaults to the safer summarisation path; anyone driving an executor directly walks onto a process-killing exception inside library code.
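A host process can contain this failure at its own boundary. The sketch below is a minimal illustration, not CrewAI's API: `AgentAborted` and `run_agent_step_guarded` are hypothetical names, and the callable stands in for whatever executor entry point the host drives. It relies only on the fact that `SystemExit` derives from `BaseException`, so an ordinary `except Exception` handler does not catch it and it must be named explicitly.

```python
class AgentAborted(RuntimeError):
    """Hypothetical host-side error: one agent run was aborted by library code."""


def run_agent_step_guarded(step, *args, **kwargs):
    """Call `step` and convert a library-raised SystemExit into a normal exception.

    `step` is a placeholder for the executor call the host makes; it is an
    assumption of this sketch, not a CrewAI function.
    """
    try:
        return step(*args, **kwargs)
    except SystemExit as exc:
        # SystemExit subclasses BaseException, so `except Exception` would miss it.
        raise AgentAborted(f"agent run aborted: {exc}") from exc
```

With a wrapper like this, the failing task surfaces as an `AgentAborted` that the host can log and retry, while other agents and in-flight requests keep running.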
Observation B — max_execution_time cannot actually interrupt a running agent
Unenforceable timeout · agent/core.py:832-865
with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(ctx.run, self._execute_without_timeout, ...)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError as e:
        future.cancel()  # no-op on an already-running future
        raise TimeoutError(...) from e
Why an agent degrades here: Future.cancel() is documented to return False and do nothing once a future is running, and by the time result(timeout=…) raises, the worker has been running for the full window. Worse, the ThreadPoolExecutor is used as a context manager, so __exit__ calls shutdown(wait=True), which blocks until the running future finishes on its own; the TimeoutError does not even reach the caller until then. max_execution_time is a label, not an enforcement: code relying on it to recover from a hung tool will not recover. (The async path uses asyncio.wait_for and is unaffected; only the sync path reached by Crew.kickoff() has this problem.)
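The behaviour can be reproduced with the standard library alone. The sketch below mirrors the shape of the excerpt but is not CrewAI code: `slow_tool` and `run_with_timeout` are hypothetical stand-ins, and a short `time.sleep` plays the role of a hung tool.

```python
import concurrent.futures
import time


def slow_tool(seconds):
    """Stand-in for a tool call that outlives its time budget."""
    time.sleep(seconds)
    return "done"


def run_with_timeout(seconds, timeout):
    """Return (timed_out, cancel_result, elapsed_seconds) for the same pattern."""
    start = time.monotonic()
    timed_out = False
    cancel_result = None
    try:
        with concurrent.futures.ThreadPoolExecutor() as executor:
            future = executor.submit(slow_tool, seconds)
            try:
                future.result(timeout=timeout)
            except concurrent.futures.TimeoutError as e:
                timed_out = True
                # Returns False: a running future cannot be cancelled.
                cancel_result = future.cancel()
                raise TimeoutError("budget exceeded") from e
        # Leaving the `with` block calls shutdown(wait=True), which waits
        # for slow_tool to finish before the exception propagates.
    except TimeoutError:
        pass
    return timed_out, cancel_result, time.monotonic() - start
```

Calling `run_with_timeout(0.5, 0.1)` reports a timeout and a failed cancel, yet the elapsed time tracks the tool's full duration rather than the 0.1-second budget.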
Observation C — Async agent loop blocks the event loop during parallel tool execution
Event-loop starvation · agents/crew_agent_executor.py:1281 → 722-747
# inside async def _ainvoke_loop_native_tools(...)
tool_finish = self._handle_native_tool_calls(answer, available_functions)
# ^ synchronous method — no await, no asyncio.to_thread
# ... which drains a ThreadPoolExecutor with blocking future.result()
Why an agent degrades here: _handle_native_tool_calls is synchronous; its parallel branch drains a ThreadPoolExecutor with blocking future.result() calls. Called from an async loop without await or run_in_executor, it parks the entire event loop while the tools run. Run CrewAI in the same process as an async web framework and every other coroutine (agents, request handlers, heartbeats) is starved until the slowest tool returns. The new experimental.AgentExecutor, which the deprecation notice steers users toward, re-implements the same blocking pattern (experimental/agent_executor.py:1594).
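The starvation, and one common mitigation, can be shown in isolation. The sketch below is an illustration under assumptions rather than a patch for CrewAI: `blocking_tools`, `heartbeat`, and `run` are hypothetical names, and `asyncio.to_thread` (Python 3.9+) is one possible way to move the synchronous call off the event loop.

```python
import asyncio
import time


def blocking_tools():
    """Stand-in for a synchronous method that waits on blocking future.result()."""
    time.sleep(0.3)
    return "tool result"


async def heartbeat(ticks, stop):
    """Record a timestamp every 50 ms for as long as the event loop lets us run."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def run(offload):
    """Run the tools inline (blocking) or in a worker thread; count heartbeats."""
    ticks = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if offload:
        result = await asyncio.to_thread(blocking_tools)  # loop stays responsive
    else:
        result = blocking_tools()  # loop is parked for the whole call
    stop.set()
    await hb
    return result, len(ticks)
```

Comparing `asyncio.run(run(False))` with `asyncio.run(run(True))` shows the difference: the inline call lets the heartbeat tick only before the tools start, while the offloaded call keeps it ticking throughout.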
Observation D — Inter-task context aggregates without bound or summarisation
Unbounded context growth · utilities/formatter.py:13-26 · crew.py:1808
DIVIDERS: Final[str] = "\n\n----------\n\n"


def aggregate_raw_outputs_from_task_outputs(task_outputs):
    return DIVIDERS.join(output.raw for output in task_outputs)
Why an agent degrades here: When Task.context is left at its default sentinel (truthy by default), _get_context falls through to the 'aggregate everything prior' branch — a naive join over the raw text of every previous task output. No token budget, no truncation, no per-source summary. In a long sequential crew the prompt to task N grows with the sum of all upstream output, so late-stage agents pay the cumulative cost of all earlier chatter. The degradation is silent until the prompt collides with the context window — at which point control transfers to Observation A.
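For contrast, a bounded variant of the join might look like the sketch below. It is hypothetical and not part of CrewAI: `aggregate_with_budget` and `max_chars` are invented for illustration, a character count stands in for a real token budget, and keeping the most recent outputs is only one of several reasonable policies.

```python
from typing import Final, Iterable

DIVIDERS: Final[str] = "\n\n----------\n\n"


def aggregate_with_budget(raw_outputs: Iterable[str], max_chars: int) -> str:
    """Join the most recent outputs that fit within `max_chars`.

    Walks the outputs newest-first and stops at the first one that would
    exceed the budget, then restores chronological order for the prompt.
    """
    kept = []
    used = 0
    for raw in reversed(list(raw_outputs)):
        # Every output after the first kept one also costs a divider.
        cost = len(raw) + (len(DIVIDERS) if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(raw)
        used += cost
    return DIVIDERS.join(reversed(kept))
```

With a cap like this, the prompt for task N stays under a fixed size regardless of crew length, at the price of dropping the oldest upstream output; a summarisation step would be the alternative when that early context matters.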
What was hunted and dropped
Three further candidates (a per-chunk streaming exception swallow, a recursive retry path, and a guardrail re-execution loop) were inspected and deliberately excluded. Each turned out to be bounded or defensible in the source, so none held up as a bulletproof finding. An honest four-finding report beats a padded seven-finding one: every observation above survives adversarial review against the source.
This is what an ALEF deep scan looks like.
If your product builds on CrewAI — or any agent-orchestration framework — these same structural patterns likely exist in your integration. ALEF runs this analysis from an isolated zone on a clean copy of your code. No connection to your systems; nothing leaves your control beyond the snapshot you hand us. You receive a report exactly like this one, scoped to your repository.
Read-only static analysis · isolated zone · report-only deliverable