SIR compiles your prompt into a parallel DAG in a single LLM inference — no loops, no wasted tokens, no iterative overhead.
"Check the weather and my calendar. If it rains, grab an umbrella -- otherwise sunglasses. Meanwhile, make lunch."
Eight stages, one or two inference calls.
Semantic search in dags.bin for similar prior executions. Relevant plans are injected as context.
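A minimal sketch of that lookup, assuming each stored plan carries an embedding vector (the record fields and function names here are ours):

import numpy as np

def similar_plans(query_emb, stored_plans, top_k=3):
    # Rank stored plans by cosine similarity to the new task's embedding.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(stored_plans, key=lambda p: cosine(query_emb, p["embedding"]),
                    reverse=True)
    return ranked[:top_k]  # these get injected into the planning prompt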
Compressed tool schemas + memory context sent in one request. The model emits a complete execution graph.
When the task requires understanding or synthesis, tool results are injected back into the same conversation. The LLM produces a final reasoned answer. Skipped for pure data pipelines.
Each step scored 0-10. Plans evolve across runs. Low-scoring steps get deprecated automatically.
Tested across 5 complexity levels (L1: 2 tools through L5: 11 parallel steps) using the same LLM.
ReAct issues one LLM call per step in a loop.
Plan&Execute generates a plan first, then calls the LLM again after each tool execution to summarize.
SIR produces the full DAG in a single call and executes it locally with parallelism.
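Local execution reduces to grouping the DAG into dependency layers and running each layer concurrently; a minimal sketch (the helper is ours, not SIR's API):

from concurrent.futures import ThreadPoolExecutor

def run_dag(steps, call_tool):
    # steps use the compressed fields: "id", "t" (tool), "a" (args), "d" (deps).
    done, results = set(), {}
    while len(done) < len(steps):
        # One layer = every pending step whose dependencies are all satisfied.
        layer = [s for s in steps
                 if s["id"] not in done and all(d in done for d in s.get("d", []))]
        if not layer:
            raise ValueError("cycle in DAG")
        with ThreadPoolExecutor() as pool:
            futs = {s["id"]: pool.submit(call_tool, s["t"], s["a"]) for s in layer}
        for sid, fut in futs.items():
            results[sid] = fut.result()
            done.add(sid)
    return results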
This benchmark isolates tool selection quality.
Both approaches receive the same tools and the same task.
Chain-of-Tools uses a hardcoded pipeline where the LLM is told which tools to chain, often including unnecessary steps.
SIR adaptively selects only the tools needed.
Metrics: tool efficiency, step efficiency, wasted tools/steps, total tokens, and wall time.
Every feature works together in a single inference cycle.
After the LLM emits the DAG, three compiler passes run automatically before any tool executes.
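The shape of such a pipeline, with one illustrative pass; the three actual passes are not named in this text, so treat these as placeholders:

def validate_deps(steps):
    # Example pass: every declared dependency must exist in the plan.
    ids = {s["id"] for s in steps}
    for s in steps:
        for dep in s.get("d", []):
            if dep not in ids:
                raise ValueError(f"step {s['id']} depends on unknown step {dep}")
    return steps

PASSES = [validate_deps]  # stand-in for SIR's three passes

def compile_dag(steps):
    for p in PASSES:
        steps = p(steps)
    return steps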
Every executed DAG is scored and persisted in a binary file (dags.bin) using msgpack with vector embeddings.
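A minimal sketch of such an append-only store, assuming one msgpack record per executed DAG (the actual dags.bin layout is not documented here):

import msgpack

def persist(path, dag, score, embedding):
    record = {
        "dag": dag,                    # the executed plan
        "score": score,                # 0-10 quality score for this run
        "embedding": list(embedding),  # vector used for semantic lookup
    }
    with open(path, "ab") as f:
        f.write(msgpack.packb(record))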
When a task requires understanding or synthesis, SIR activates a reasoning pass after tool execution. Tool results are injected back into the same conversation and the LLM produces a final reasoned answer. The LLM signals this during planning via the nr flag. A lightweight heuristic fallback ensures smaller local models (7B-14B) that may not set the flag still trigger reasoning when the prompt requires it. Pure data pipelines stay at 1 LLM call.
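The fallback could be as simple as a keyword check on the prompt; a sketch under that assumption (the keyword list is illustrative):

REASONING_HINTS = ("summarize", "explain", "compare", "why", "recommend", "analyze")

def needs_reasoning(plan, prompt):
    if plan.get("nr"):  # the model explicitly requested a reasoning pass
        return True
    # Heuristic fallback for small local models that forget to set the flag.
    text = prompt.lower()
    return any(hint in text for hint in REASONING_HINTS)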
While the current layer runs, SIR pre-launches steps from the next layer whose dependencies are already resolved. If the speculation is valid, the result is kept; otherwise it is discarded and the step re-runs. This shaves latency on deep sequential DAGs without sacrificing correctness.
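One way to realize this, with argument resolution abstracted behind a callable; the mechanism below (snapshot, compare, re-run) is our reading of the description, not a confirmed implementation detail:

def speculate_then_settle(step, call_tool, resolve_args, pool):
    # Launch early with the argument values visible right now (the speculation).
    guess = resolve_args(step)
    fut = pool.submit(call_tool, step["t"], guess)

    def settle():
        final = resolve_args(step)          # args once the prior layer truly finished
        if final == guess:
            return fut.result()             # speculation valid: keep the result
        return call_tool(step["t"], final)  # invalid: discard and re-run
    return settle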
Steps can define alternatives -- multiple tool strategies that race in parallel. The winner is selected by strategy:
fastest: first to succeed wins
shortest: shortest output wins
longest: most detailed output wins

A single step can iterate over a dynamic collection in parallel. The foreach field accepts both runtime references ($s1.result) and inline arrays. All iterations run concurrently across available workers.
SIR uses single-character JSON aliases (t, a, d, c, f, fs) in both the prompt and the LLM output. The parser auto-expands them. This saves 30-40 tokens per step -- significant at scale.
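Expansion is a mechanical key rename; a sketch whose long-form names (tool, args, deps, ...) are our guesses from the examples below:

ALIASES = {"t": "tool", "a": "args", "d": "deps", "c": "condition", "f": "foreach"}
# "fs" expands too; its long form isn't spelled out in this text.

def expand(step):
    # Rename single-character keys to long forms; leave unknown keys untouched.
    return {ALIASES.get(k, k): v for k, v in step.items()}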
Steps execute only when runtime conditions are met. Evaluated locally, no extra LLM calls.
{"id":"s3","t":"notify","a":{"msg":"$s2.result"},
"d":["s2"],"c":{"ref":"$s2.result",
"op":"contains","val":"error"}}
Iterate over dynamic or static collections in parallel across all available workers.
{"id":"s2","t":"process","a":{"item":"$item"},
"d":["s1"],"f":"$s1.result"}
Race multiple tool strategies in parallel. Select by fastest, shortest, or longest result.
{"id":"s1","t":"search","a":{"query":"AI news"},
"alternatives":[{"tool":"fetch_details",
"args":{"entity":"AI"}}],"select":"fastest"}
Next-layer steps launch early when dependencies resolve ahead of schedule — no idle wait.
Compressed JSON aliases (t, a, d, c, f, fs) reduce prompt and output token count. Auto-expanded by the parser.
Per-step retry with configurable attempts for unreliable or rate-limited tools.
{"id":"s1","t":"unreliable_api",
"a":{"url":"..."},"r":3}
How much freedom the model has in selecting tools.
LLM picks the minimum tools needed. Default mode — most efficient for general use.
All tools passed must be used. The model decides order and parallelism only.
Tools marked required=True are mandatory, others optional.
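How the three modes might be rendered into the planning prompt; the mode names below are our labels, and only required=True is confirmed by the text:

def tool_instructions(tools, mode="minimal"):
    # tools: [{"name": str, "required": bool}, ...]
    names = [t["name"] for t in tools]
    required = [t["name"] for t in tools if t.get("required")]
    if mode == "all":
        return f"Use every tool in {names}; choose only order and parallelism."
    if mode == "required":
        optional = [n for n in names if n not in required]
        return f"Tools {required} are mandatory; {optional} are optional."
    return f"Use the minimum subset of {names} needed for the task."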
API keys read from environment variables. Pass explicitly if needed.
Install the package, decorate your tools, pass a prompt. SIR handles planning, optimization, and parallel execution automatically.
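A hypothetical quickstart; the package, decorator, and entry-point names below are illustrative, not SIR's confirmed API:

import sir  # hypothetical package name

@sir.tool()
def get_weather(city: str) -> str:
    """Return a short weather report."""
    return "rain, 14C"

@sir.tool()
def get_calendar(day: str) -> str:
    """Return the day's appointments."""
    return "09:00 standup; 12:30 lunch"

# One call: SIR plans the DAG, optimizes it, and runs tools in parallel.
print(sir.run("Check the weather and my calendar, then plan my morning."))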