Single-Input-Reasoning · v0.0.3

One call.
Full action graph.

SIR compiles your prompt into a parallel DAG in a single LLM inference — no loops, no wasted tokens, no iterative overhead.

pip install sir-agent
<1s simple tasks · 100% tool efficiency · 0 wasted steps · 8+ LLM providers
Why it matters

Same task, different approaches

"Check the weather and my calendar. If it rains, grab an umbrella -- otherwise sunglasses. Meanwhile, make lunch."

ReAct · 8 LLM calls
Think: "Check weather first"
call LLM
Act: check_weather()
call LLM
Think: "It rains, get umbrella"
call LLM
Act: grab_umbrella()
call LLM
Think: "Now check calendar"
call LLM
Act: check_calendar()
call LLM
Think: "Now make lunch"
call LLM
Act: make_lunch()
Sequential. One LLM call per step. Cannot parallelize independent tasks.
Plan & Execute · 5 LLM calls
Plan: [weather, calendar, umbrella, lunch]
call LLM
Execute: check_weather()
call LLM to summarize
Execute: check_calendar()
call LLM to summarize
Execute: grab_umbrella()
call LLM to summarize
Execute: make_lunch()
Plans once, but runs everything sequentially. Always grabs umbrella -- no conditional logic.
Chain-of-Tools · 1 LLM call
Chain: weather → umbrella → sunglasses → calendar → lunch
execute sequentially
check_weather()
grab_umbrella()
grab_sunglasses()
check_calendar()
make_lunch()
Runs both umbrella AND sunglasses. No branching. No parallelism. Wasted steps.
SIR · 1 or 1+1 LLM calls
Full DAG in one shot:
check_weather()
check_calendar()
make_lunch()
↓ weather result
if rain → umbrella
else → sunglasses
↓ if reasoning needed
LLM reasons on real data
One call for planning. Parallel execution. Conditional branches. Optional reasoning on real data in the same session.
Internals

From prompt to result

Eight stages, one or two inference calls.

01 · Prompt

"Search AI news, summarize, translate to Italian"
02 · Memory lookup

Semantic search in dags.bin for similar prior executions. Relevant plans are injected as context.

Vector similarity · 0.78 threshold
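
In sketch form (the record layout and how dags.bin is iterated are assumptions):

import numpy as np

def recall_plans(query_vec, records, threshold=0.78):
    """Return stored plans whose embedding clears the similarity bar."""
    q = np.asarray(query_vec, dtype=float)
    hits = []
    for rec in records:  # records: decoded entries from dags.bin
        v = np.asarray(rec["embedding"], dtype=float)
        sim = float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))
        if sim >= threshold:
            hits.append((sim, rec))
    return [rec for _, rec in sorted(hits, key=lambda h: -h[0])]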
03 · Single LLM call

Compressed tool schemas + memory context sent in one request. The model emits a complete execution graph.

{"steps":[ {"id":"s1","t":"search_web","a":{"query":"AI news 2025"}}, {"id":"s2","t":"summarize","a":{"text":"$s1.result"},"d":["s1"]}, {"id":"s3","t":"translate","a":{"text":"$s2.result","lang":"it"},"d":["s2"]} ],"fs":"s3"}
04 · Graph optimization

Dependency inference · Dead-step elimination · Duplicate merge
05 · Parallel execution

s1 · search_web
s2 · summarize
s3 · translate
06 · Reasoning pass (if needed)

When the task requires understanding or synthesis, tool results are injected back into the same conversation. The LLM produces a final reasoned answer. Skipped for pure data pipelines.

1+1 calls only when needed
07 · Score and persist

Each step scored 0-10. Plans evolve across runs. Low-scoring steps get deprecated automatically.

Stored in dags.bin
08 · Result

"Ultime notizie sull'IA: progressi nei modelli di ragionamento..."
Performance

Measurably faster

Measured across 5 complexity levels using the same underlying model.

SIR vs Chain-of-Tools
Tool efficiency: SIR 100% · CoT 71%
Step efficiency: SIR 94% · CoT 64%
Absolute metrics
Wall time: SIR 17s · CoT 28s · 39% time saved
Tokens: SIR 5,693 · CoT 6,769
Wasted steps: SIR 0
Benchmark Overview

Overview: SIR vs ReAct vs Plan&Execute

Tested across 5 complexity levels (L1: 2 tools through L5: 11 parallel steps) using the same LLM.

ReAct issues one LLM call per step in a loop.
Plan&Execute generates a plan first, then calls the LLM again after each tool execution to summarize.
SIR produces the full DAG in a single call and executes it locally with parallelism.

SIR vs Chain-of-Tools

Effectiveness: SIR vs Chain-of-Tools

This benchmark isolates tool selection quality.

Both approaches receive the same tools and the same task.
Chain-of-Tools uses a hardcoded pipeline where the LLM is told which tools to chain, often including unnecessary steps.
SIR adaptively selects only the tools needed.

Metrics: tool efficiency, step efficiency, wasted tools/steps, total tokens, and wall time.

Under the hood

What makes the DAG powerful

Every feature works together in a single inference cycle.

Graph Optimization

After the LLM emits the DAG, three compiler passes run automatically before any tool executes.

Dependency inference
Adds missing dependencies by analyzing $sN references in step arguments.
Dead-step elimination
Removes steps whose output is never referenced by any downstream step.
Duplicate merge
Detects steps calling the same tool with identical args and merges them into one.
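
The dependency-inference pass, for example, can be sketched in a few lines (field names follow the compressed schema shown on this page; the real traversal is an assumption):

import re

REF = re.compile(r"\$(s\d+)")  # matches $s1, $s2, ... references

def infer_deps(steps):
    """Add missing 'd' entries by scanning each step's arguments."""
    ids = {s["id"] for s in steps}
    for step in steps:
        refs = {m for v in step.get("a", {}).values()
                if isinstance(v, str) for m in REF.findall(v)}
        step["d"] = sorted(set(step.get("d", [])) | (refs & ids))
    return steps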

Evolutionary Memory

Every executed DAG is scored and persisted in a binary file (dags.bin) using msgpack with vector embeddings.

Run 1 -- LLM generates plan, executes, scores each step 0-10, stores in memory.
Run 2 -- Semantic search finds the prior plan. LLM sees scores and notes, improves the graph.
Run 3+ -- Steps scoring below threshold get deprecated. LLM replaces them with better alternatives.
Run N -- Converges to the optimal action graph for this task class.
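
In sketch form, the persistence step might look like this (the record fields and the embed() helper are illustrative assumptions, not SIR's actual schema):

import msgpack

def persist_plan(prompt, steps, scores, embed, path="dags.bin"):
    """Append one scored plan to the memory file."""
    record = {
        "prompt": prompt,
        "embedding": embed(prompt),  # vector used for semantic recall
        "steps": steps,              # the executed DAG
        "scores": scores,            # per-step 0-10 scores
    }
    with open(path, "ab") as f:
        f.write(msgpack.packb(record, use_bin_type=True))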

Reasoning Pass

When a task requires understanding or synthesis, SIR activates a reasoning pass after tool execution. Tool results are injected back into the same conversation and the LLM produces a final reasoned answer. The LLM signals this during planning via the nr flag. A lightweight heuristic fallback ensures smaller local models (7B-14B) that may not set the flag still trigger reasoning when the prompt requires it. Pure data pipelines stay at 1 LLM call.
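
In sketch form (the keyword heuristic below is an illustrative assumption; only the nr flag is documented here):

REASONING_HINTS = ("summarize", "explain", "compare", "analyze", "why")

def needs_reasoning(plan: dict, prompt: str) -> bool:
    # Primary signal: the model set the nr flag during planning.
    if plan.get("nr"):
        return True
    # Fallback for smaller models that omit the flag: trigger on
    # prompts that clearly ask for synthesis or understanding.
    return any(hint in prompt.lower() for hint in REASONING_HINTS)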

Speculative Execution

While the current layer runs, SIR pre-launches steps from the next layer whose dependencies are already resolved. If the speculation is valid, the result is kept -- otherwise discarded and re-run. This shaves latency on deep sequential DAGs without sacrificing correctness.
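
A rough sketch of the rule, with validation and re-run details as assumptions:

import asyncio

async def run_with_speculation(layer, next_layer, results, execute):
    """Run the current layer; pre-launch next-layer steps whose
    dependencies are already resolved."""
    current = [asyncio.create_task(execute(s, results)) for s in layer]
    speculated = {
        s["id"]: asyncio.create_task(execute(s, results))
        for s in next_layer
        if all(dep in results for dep in s.get("d", []))
    }
    await asyncio.gather(*current)
    # A speculative result is kept only if its inputs were final;
    # otherwise it is discarded and the step re-runs in its own layer.
    return speculated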

DAG Branching

Steps can define alternatives -- multiple tool strategies that race in parallel. The winner is selected by strategy:

fastest -- first to succeed wins
shortest -- shortest output wins
longest -- most detailed output wins
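
The fastest strategy, for example, reduces to a simple race -- a minimal sketch, with error handling beyond "first success wins" assumed:

import asyncio

async def race_fastest(calls):
    """Run the primary tool and its alternatives; first success wins."""
    tasks = [asyncio.create_task(c()) for c in calls]
    for fut in asyncio.as_completed(tasks):
        try:
            result = await fut
        except Exception:
            continue  # a failed branch cannot win
        for t in tasks:
            t.cancel()  # cancel the losing branches
        return result
    raise RuntimeError("all alternatives failed")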

Fan-out (Map-Reduce)

A single step can iterate over a dynamic collection in parallel. The foreach field accepts both runtime references ($s1.result) and inline arrays. All iterations run concurrently across available workers.
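
A minimal sketch of the expansion (the worker pool type and size are assumptions):

from concurrent.futures import ThreadPoolExecutor

def fan_out(items, run_iteration, max_workers=8):
    """Expand one step into one iteration per item, run in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_iteration, items))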

Token Compression

SIR uses single-character JSON aliases (t, a, d, c, f, fs) in both the prompt and the LLM output. The parser auto-expands them. This saves 30-40 tokens per step -- significant at scale.
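
A sketch of the expansion the parser applies -- the full names behind each alias are assumptions inferred from this page:

# Assumed alias map: t=tool, a=args, d=deps, c=condition,
# f=foreach, fs=final step -- inferred from the examples shown here.
ALIASES = {"t": "tool", "a": "args", "d": "deps",
           "c": "condition", "f": "foreach", "fs": "final_step"}

def expand(node):
    """Recursively rewrite single-character keys to full names."""
    if isinstance(node, dict):
        return {ALIASES.get(k, k): expand(v) for k, v in node.items()}
    if isinstance(node, list):
        return [expand(v) for v in node]
    return node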

Capabilities

Advanced graph primitives

Conditional branching

Steps execute only when runtime conditions are met. Evaluated locally, no extra LLM calls.

{"id":"s3","t":"notify","a":{"msg":"$s2.result"},
 "d":["s2"],"c":{"ref":"$s2.result",
 "op":"contains","val":"error"}}

Fan-out (map-reduce)

Iterate over dynamic or static collections in parallel across all available workers.

{"id":"s2","t":"process","a":{"item":"$item"},
 "d":["s1"],"f":"$s1.result"}

DAG branching

Race multiple tool strategies in parallel. Select by fastest, shortest, or longest result.

{"id":"s1","t":"search","a":{"query":"AI news"},
 "alternatives":[{"tool":"fetch_details",
 "args":{"entity":"AI"}}],"select":"fastest"}

Speculative execution

Next-layer steps launch early when dependencies resolve ahead of schedule — no idle wait.

Token compression

Compressed JSON aliases (t, a, d, c, f, fs) reduce prompt and output token count. Auto-expanded by the parser.

Retry policy

Per-step retry with configurable attempts for unreliable or rate-limited tools.

{"id":"s1","t":"unreliable_api",
 "a":{"url":"..."},"r":3}
Tool modes

Control autonomy

How much freedom the model has in selecting tools.

adaptive

LLM picks the minimum tools needed. Default mode — most efficient for general use.

strict

All tools passed must be used. The model decides order and parallelism only.

required

Tools marked required=True are mandatory, others optional.
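
A hypothetical usage sketch: @tool(required=True) follows the description above, while the mode keyword argument is an assumption, not confirmed API:

from sir import SIR, tool

@tool(required=True)  # "required" mode: this tool is mandatory
def write_report(text: str) -> str:
    """Write the weekly report."""
    return f"Report: {text}"

sir = SIR(model="qwen2.5:14b")
# NOTE: the mode parameter name is an assumption.
result = sir.run("Compile the weekly report",
                 tools=[write_report], mode="required")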

Integrations

Any provider

API keys read from environment variables. Pass explicitly if needed.

Ollama

from sir.providers import OllamaProvider
sir = SIR(provider=OllamaProvider(model="qwen2.5:14b"))

OpenAI

from sir.providers import OpenAIProvider
sir = SIR(provider=OpenAIProvider(model="gpt-4o"))

Claude

from sir.providers import ClaudeProvider
sir = SIR(provider=ClaudeProvider(model="claude-sonnet-4-20250514"))

Gemini

from sir.providers import GeminiProvider
sir = SIR(provider=GeminiProvider(model="gemini-2.5-flash"))

AWS Bedrock

from sir.providers import BedrockProvider
sir = SIR(provider=BedrockProvider(
  model="anthropic.claude-sonnet-4-20250514-v1:0"))

OpenRouter

from sir.providers import OpenRouterProvider
sir = SIR(provider=OpenRouterProvider(model="openai/gpt-4o"))

Perplexity

from sir.providers import PerplexityProvider
sir = SIR(provider=PerplexityProvider(model="sonar-pro"))

Mistral

from sir.providers import MistralProvider
sir = SIR(provider=MistralProvider(model="mistral-large-latest"))
Quick start

Up in minutes

Three steps to full graph execution

Install the package, decorate your tools, pass a prompt. SIR handles planning, optimization, and parallel execution automatically.

1
pip install sir-agent
2
Decorate your tools with @tool
3
Call sir.run() with a prompt
example.py
from sir import SIR, tool
import requests

# Decorate tools -- no schema needed
@tool
def search_web(query: str) -> str:
    """Search the web."""
    return requests.get(
        f"https://api.search.com?q={query}"
    ).text

@tool
def summarize(text: str) -> str:
    """Summarize text."""
    return text[:200] + "..."

@tool
def translate(text: str, lang: str) -> str:
    """Translate text to any language."""
    return f"[{lang}] {text}"  # placeholder translation

# One call -- full DAG execution
sir = SIR(model="qwen2.5:14b")
result = sir.run(
    "Search AI news, summarize, translate to Italian",
    tools=[search_web, summarize, translate],
)
print(result.final_result)