LLM Layer¶
nah can optionally consult an LLM to resolve ambiguous ask decisions that the deterministic classifier can't handle.
Tool call → nah (deterministic) → LLM (optional) → Claude Code permissions → execute
The deterministic layer always runs first. The LLM only sees leftover ask decisions. If no LLM is configured or available, the decision stays ask and the user is prompted.
Providers¶
nah supports 5 LLM providers. Configure one or more in cascade order -- first success wins.
| Provider | API | Default model | Auth env var |
|---|---|---|---|
ollama |
Chat API (/api/chat) |
qwen3.5:9b |
(none -- local) |
openrouter |
OpenAI-compatible | google/gemini-3.1-flash-lite-preview |
OPENROUTER_API_KEY |
openai |
Responses API (/v1/responses) |
gpt-5.3-codex |
OPENAI_API_KEY |
anthropic |
Messages API (/v1/messages) |
claude-haiku-4-5 |
ANTHROPIC_API_KEY |
cortex |
Snowflake Cortex REST | claude-haiku-4-5 |
SNOWFLAKE_PAT |
All providers use urllib.request (stdlib) -- no external HTTP dependencies.
Configuration¶
# ~/.config/nah/config.yaml
llm:
enabled: true
providers: [ollama, openrouter] # cascade order
ollama:
url: http://localhost:11434/api/chat
model: qwen3.5:9b
timeout: 10
openrouter:
url: https://openrouter.ai/api/v1/chat/completions
key_env: OPENROUTER_API_KEY
model: google/gemini-3.1-flash-lite-preview
timeout: 10
Provider examples¶
llm:
enabled: true
providers: [ollama]
ollama:
url: http://localhost:11434/api/chat
model: qwen3.5:9b
timeout: 10
llm:
enabled: true
providers: [openrouter]
openrouter:
url: https://openrouter.ai/api/v1/chat/completions
key_env: OPENROUTER_API_KEY
model: google/gemini-3.1-flash-lite-preview
llm:
enabled: true
providers: [openai]
openai:
url: https://api.openai.com/v1/responses
key_env: OPENAI_API_KEY
model: gpt-5.3-codex
llm:
enabled: true
providers: [anthropic]
anthropic:
url: https://api.anthropic.com/v1/messages
key_env: ANTHROPIC_API_KEY
model: claude-haiku-4-5
llm:
enabled: true
providers: [cortex]
cortex:
account: myorg-myaccount # or set SNOWFLAKE_ACCOUNT env var
key_env: SNOWFLAKE_PAT
model: claude-haiku-4-5
LLM options¶
eligible¶
Control which ask categories route to the LLM:
llm:
eligible: default # default: unknown, lang_exec, context (excludes composition and sensitive)
eligible: all # route all ask decisions to LLM
eligible: # explicit list
- unknown
- lang_exec
- context
- composition # must be explicitly added
- sensitive # must be explicitly added
The default set routes unknown, lang_exec, and context to the LLM. Categories like composition and sensitive are excluded by default (they involve pipe safety or sensitive paths and should generally prompt the user). Add them explicitly if you want LLM resolution for those too.
max_decision¶
Cap the LLM's escalation power:
llm:
max_decision: ask # default: LLM can allow or ask, never block
max_decision: block # LLM can block (full trust)
When the LLM suggests block but max_decision is ask, the decision is downgraded to ask with the LLM's reasoning preserved in the prompt.
context_chars¶
How much conversation transcript context to include in the LLM prompt:
llm:
context_chars: 12000 # default: 12000 characters of recent transcript
Set to 0 to disable transcript context entirely.
The transcript is read from Claude Code's JSONL conversation file. It includes user/assistant messages and tool use summaries, wrapped with anti-injection framing.
How the cascade works¶
- nah tries each provider in the order listed in
providers: - If a provider returns
alloworblock, that decision is used - If a provider returns
uncertain, the cascade stops (doesn't try the next provider) - If a provider errors (timeout, auth failure), nah tries the next provider
- If all providers fail or return uncertain, the decision stays
ask
Testing¶
nah test "python3 -c 'import os; os.system(\"rm -rf /\")'"
# Shows: LLM eligible: yes/no, LLM decision (if configured)
The nah test command shows LLM eligibility and, if enabled, makes a live LLM call so you can verify the full pipeline.