Single-Input-Reasoning · v0.0.3

One call.
Full action graph.

SIR compiles your prompt into a parallel DAG in a single LLM inference — no loops, no wasted tokens, no iterative overhead.

pip install sir-agent
<1s simple tasks · 100% tool efficiency · 0 wasted steps · 8+ LLM providers
Why it matters

Same task, different approaches

"Check the weather and my calendar. If it rains, grab an umbrella -- otherwise sunglasses. Meanwhile, make lunch."

ReAct · 8 LLM calls
Think: "Check weather first"
call LLM
Act: check_weather()
call LLM
Think: "It rains, get umbrella"
call LLM
Act: grab_umbrella()
call LLM
Think: "Now check calendar"
call LLM
Act: check_calendar()
call LLM
Think: "Now make lunch"
call LLM
Act: make_lunch()
Sequential. One LLM call per step. Cannot parallelize independent tasks.
Plan & Execute · 5 LLM calls
Plan: [weather, calendar, umbrella, lunch]
call LLM
Execute: check_weather()
call LLM to summarize
Execute: check_calendar()
call LLM to summarize
Execute: grab_umbrella()
call LLM to summarize
Execute: make_lunch()
Plans once, but runs everything sequentially. Always grabs umbrella -- no conditional logic.
Chain-of-Tools · 1 LLM call
Chain: weather → umbrella → sunglasses → calendar → lunch
execute sequentially
check_weather()
grab_umbrella()
grab_sunglasses()
check_calendar()
make_lunch()
Runs both umbrella AND sunglasses. No branching. No parallelism. Wasted steps.
SIR · 1 or 1+1 LLM calls
Full DAG in one shot:
check_weather()
check_calendar()
make_lunch()
↓ weather result
if rain → umbrella
else → sunglasses
↓ if reasoning needed
LLM reasons on real data
One call for planning. Parallel execution. Conditional branches. Optional reasoning on real data in the same session.
Internals

From prompt to result

Eight stages, one or two inference calls.

01 · Prompt

"Search AI news, summarize, translate to Italian"
02 · Memory lookup

Semantic search in dags.bin for similar prior executions. Relevant plans are injected as context.

Vector similarity · 0.78 threshold
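
In sketch form (the record layout and how dags.bin is iterated are assumptions):

import numpy as np

def recall_plans(query_vec, records, threshold=0.78):
    """Return stored plans whose embedding clears the similarity bar."""
    q = np.asarray(query_vec, dtype=float)
    hits = []
    for rec in records:  # records: decoded entries from dags.bin
        v = np.asarray(rec["embedding"], dtype=float)
        sim = float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))
        if sim >= threshold:
            hits.append((sim, rec))
    return [rec for _, rec in sorted(hits, key=lambda h: -h[0])]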
03 · Single LLM call

Compressed tool schemas + memory context sent in one request. The model emits a complete execution graph.

{"steps":[ {"id":"s1","t":"search_web","a":{"query":"AI news 2025"}}, {"id":"s2","t":"summarize","a":{"text":"$s1.result"},"d":["s1"]}, {"id":"s3","t":"translate","a":{"text":"$s2.result","lang":"it"},"d":["s2"]} ],"fs":"s3"}
04 · Graph optimization

Dependency inference · Dead-step elimination · Duplicate merge
05 · Parallel execution

s1 · search_web
s2 · summarize
s3 · translate
06 · Reasoning pass (if needed)

When the task requires understanding or synthesis, tool results are injected back into the same conversation. The LLM produces a final reasoned answer. Skipped for pure data pipelines.

1+1 calls only when needed
07 · Score and persist

Each step scored 0-10. Plans evolve across runs. Low-scoring steps get deprecated automatically.

Stored in dags.bin
08 · Result

"Ultime notizie sull'IA: progressi nei modelli di ragionamento..."
Performance

Measurably faster

Measured across 5 complexity levels using the same underlying model.

SIR vs Chain-of-Tools
Tool efficiency: SIR 100% · CoT 71%
Step efficiency: SIR 94% · CoT 64%
Absolute metrics
Wall time: SIR 17s · CoT 28s · 39% time saved
Tokens: SIR 5,693 · CoT 6,769
Wasted steps: SIR 0
Benchmark Overview

Overview: SIR vs ReAct vs Plan&Execute

Tested across 5 complexity levels (L1: 2 tools through L5: 11 parallel steps) using the same LLM.

ReAct issues one LLM call per step in a loop.
Plan&Execute generates a plan first, then calls the LLM again after each tool execution to summarize.
SIR produces the full DAG in a single call and executes it locally with parallelism.

SIR vs Chain-of-Tools

Effectiveness: SIR vs Chain-of-Tools

This benchmark isolates tool selection quality.

Both approaches receive the same tools and the same task.
Chain-of-Tools uses a hardcoded pipeline where the LLM is told which tools to chain, often including unnecessary steps.
SIR adaptively selects only the tools needed.

Metrics: tool efficiency, step efficiency, wasted tools/steps, total tokens, and wall time.

Under the hood

What makes the DAG powerful

Every feature works together in a single inference cycle.

Graph Optimization

After the LLM emits the DAG, three compiler passes run automatically before any tool executes.

Dependency inference
Adds missing dependencies by analyzing $sN references in step arguments.
Dead-step elimination
Removes steps whose output is never referenced by any downstream step.
Duplicate merge
Detects steps calling the same tool with identical args and merges them into one.
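
The dependency-inference pass, for example, can be sketched in a few lines (field names follow the compressed schema shown on this page; the real traversal is an assumption):

import re

REF = re.compile(r"\$(s\d+)")  # matches $s1, $s2, ... references

def infer_deps(steps):
    """Add missing 'd' entries by scanning each step's arguments."""
    ids = {s["id"] for s in steps}
    for step in steps:
        refs = {m for v in step.get("a", {}).values()
                if isinstance(v, str) for m in REF.findall(v)}
        step["d"] = sorted(set(step.get("d", [])) | (refs & ids))
    return steps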

Evolutionary Memory

Every executed DAG is scored and persisted in a binary file (dags.bin) using msgpack with vector embeddings.

Run 1 -- LLM generates plan, executes, scores each step 0-10, stores in memory.
Run 2 -- Semantic search finds the prior plan. LLM sees scores and notes, improves the graph.
Run 3+ -- Steps scoring below threshold get deprecated. LLM replaces them with better alternatives.
Run N -- Converges to the optimal action graph for this task class.
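
In sketch form, the persistence step might look like this (the record fields and the embed() helper are illustrative assumptions, not SIR's actual schema):

import msgpack

def persist_plan(prompt, steps, scores, embed, path="dags.bin"):
    """Append one scored plan to the memory file."""
    record = {
        "prompt": prompt,
        "embedding": embed(prompt),  # vector used for semantic recall
        "steps": steps,              # the executed DAG
        "scores": scores,            # per-step 0-10 scores
    }
    with open(path, "ab") as f:
        f.write(msgpack.packb(record, use_bin_type=True))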

Reasoning Pass

When a task requires understanding or synthesis, SIR activates a reasoning pass after tool execution. Tool results are injected back into the same conversation and the LLM produces a final reasoned answer. The LLM signals this during planning via the nr flag. A lightweight heuristic fallback ensures smaller local models (7B-14B) that may not set the flag still trigger reasoning when the prompt requires it. Pure data pipelines stay at 1 LLM call.
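
In sketch form (the keyword heuristic below is an illustrative assumption; only the nr flag is documented here):

REASONING_HINTS = ("summarize", "explain", "compare", "analyze", "why")

def needs_reasoning(plan: dict, prompt: str) -> bool:
    # Primary signal: the model set the nr flag during planning.
    if plan.get("nr"):
        return True
    # Fallback for smaller models that omit the flag: trigger on
    # prompts that clearly ask for synthesis or understanding.
    return any(hint in prompt.lower() for hint in REASONING_HINTS)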

Speculative Execution

While the current layer runs, SIR pre-launches steps from the next layer whose dependencies are already resolved. If the speculation is valid, the result is kept -- otherwise discarded and re-run. This shaves latency on deep sequential DAGs without sacrificing correctness.
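
A rough sketch of the rule, with validation and re-run details as assumptions:

import asyncio

async def run_with_speculation(layer, next_layer, results, execute):
    """Run the current layer; pre-launch next-layer steps whose
    dependencies are already resolved."""
    current = [asyncio.create_task(execute(s, results)) for s in layer]
    speculated = {
        s["id"]: asyncio.create_task(execute(s, results))
        for s in next_layer
        if all(dep in results for dep in s.get("d", []))
    }
    await asyncio.gather(*current)
    # A speculative result is kept only if its inputs were final;
    # otherwise it is discarded and the step re-runs in its own layer.
    return speculated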

DAG Branching

Steps can define alternatives -- multiple tool strategies that race in parallel. The winner is selected by strategy:

fastest -- first to succeed wins
shortest -- shortest output wins
longest -- most detailed output wins
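
The fastest strategy, for example, reduces to a simple race -- a minimal sketch, with error handling beyond "first success wins" assumed:

import asyncio

async def race_fastest(calls):
    """Run the primary tool and its alternatives; first success wins."""
    tasks = [asyncio.create_task(c()) for c in calls]
    for fut in asyncio.as_completed(tasks):
        try:
            result = await fut
        except Exception:
            continue  # a failed branch cannot win
        for t in tasks:
            t.cancel()  # cancel the losing branches
        return result
    raise RuntimeError("all alternatives failed")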

Fan-out (Map-Reduce)

A single step can iterate over a dynamic collection in parallel. The foreach field accepts both runtime references ($s1.result) and inline arrays. All iterations run concurrently across available workers.
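
A minimal sketch of the expansion (the worker pool type and size are assumptions):

from concurrent.futures import ThreadPoolExecutor

def fan_out(items, run_iteration, max_workers=8):
    """Expand one step into one iteration per item, run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_iteration, items))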

Token Compression

SIR uses single-character JSON aliases (t, a, d, c, f, fs) in both the prompt and the LLM output. The parser auto-expands them. This saves 30-40 tokens per step -- significant at scale.
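
A sketch of the expansion the parser applies -- the full names behind each alias are assumptions inferred from this page:

# Assumed alias map: t=tool, a=args, d=deps, c=condition,
# f=foreach, fs=final step -- inferred from the examples shown here.
ALIASES = {"t": "tool", "a": "args", "d": "deps",
           "c": "condition", "f": "foreach", "fs": "final_step"}

def expand(node):
    """Recursively rewrite single-character keys to full names."""
    if isinstance(node, dict):
        return {ALIASES.get(k, k): expand(v) for k, v in node.items()}
    if isinstance(node, list):
        return [expand(v) for v in node]
    return node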

Capabilities

Advanced graph primitives

Conditional branching

Steps execute only when runtime conditions are met. Evaluated locally, no extra LLM calls.

{"id":"s3","t":"notify","a":{"msg":"$s2.result"},
 "d":["s2"],"c":{"ref":"$s2.result",
 "op":"contains","val":"error"}}

Fan-out (map-reduce)

Iterate over dynamic or static collections in parallel across all available workers.

{"id":"s2","t":"process","a":{"item":"$item"},
 "d":["s1"],"f":"$s1.result"}

DAG branching

Race multiple tool strategies in parallel. Select by fastest, shortest, or longest result.

{"id":"s1","t":"search","a":{"query":"AI news"},
 "alternatives":[{"tool":"fetch_details",
 "args":{"entity":"AI"}}],"select":"fastest"}

Speculative execution

Next-layer steps launch early when dependencies resolve ahead of schedule — no idle wait.

Token compression

Compressed JSON aliases (t, a, d, c, f, fs) reduce prompt and output token count. Auto-expanded by the parser.

Retry policy

Per-step retry with configurable attempts for unreliable or rate-limited tools.

{"id":"s1","t":"unreliable_api",
 "a":{"url":"..."},"r":3}
Tool modes

Control autonomy

How much freedom the model has in selecting tools.

adaptive

LLM picks the minimum tools needed. Default mode — most efficient for general use.

strict

All tools passed must be used. The model decides order and parallelism only.

required

Tools marked required=True are mandatory, others optional.
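
A hypothetical usage sketch: @tool(required=True) follows the description above, while the mode keyword argument is an assumption, not confirmed API:

from sir import SIR, tool

@tool(required=True)  # "required" mode: this tool is mandatory
def write_report(text: str) -> str:
    """Write the weekly report."""
    return f"Report: {text}"

sir = SIR(model="qwen2.5:14b")
# NOTE: the mode parameter name is an assumption.
result = sir.run("Compile the weekly report",
                 tools=[write_report], mode="required")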

Integrations

Any provider

API keys read from environment variables. Pass explicitly if needed.

Ollama

from sir.providers import OllamaProvider
sir = SIR(provider=OllamaProvider(model="qwen2.5:14b"))

OpenAI

from sir.providers import OpenAIProvider
sir = SIR(provider=OpenAIProvider(model="gpt-4o"))

Claude

from sir.providers import ClaudeProvider
sir = SIR(provider=ClaudeProvider(model="claude-sonnet-4-20250514"))

Gemini

from sir.providers import GeminiProvider
sir = SIR(provider=GeminiProvider(model="gemini-2.5-flash"))

AWS Bedrock

from sir.providers import BedrockProvider
sir = SIR(provider=BedrockProvider(
  model="anthropic.claude-sonnet-4-20250514-v1:0"))

OpenRouter

from sir.providers import OpenRouterProvider
sir = SIR(provider=OpenRouterProvider(model="openai/gpt-4o"))

Perplexity

from sir.providers import PerplexityProvider
sir = SIR(provider=PerplexityProvider(model="sonar-pro"))

Mistral

from sir.providers import MistralProvider
sir = SIR(provider=MistralProvider(model="mistral-large-latest"))
Quick start

Up in minutes

Three steps to full graph execution

Install the package, decorate your tools, pass a prompt. SIR handles planning, optimization, and parallel execution automatically.

1
pip install sir-agent
2
Decorate your tools with @tool
3
Call sir.run() with a prompt
example.py
from sir import SIR, tool
import requests

# Decorate tools -- no schema needed
@tool
def search_web(query: str) -> str:
    """Search the web."""
    return requests.get(
        f"https://api.search.com?q={query}"
    ).text

@tool
def summarize(text: str) -> str:
    """Summarize text."""
    return text[:200] + "..."

@tool
def translate(text: str, lang: str) -> str:
    """Translate text to any language."""
    return f"[{lang}] {text}"  # placeholder translation

# One call -- full DAG execution
sir = SIR(model="qwen2.5:14b")
result = sir.run(
    "Search AI news, summarize, translate to Italian",
    tools=[search_web, summarize, translate],
)
print(result.final_result)