Optional provider: Override the provider (auto-detected from model by default).
Optional timeout: Request timeout in milliseconds.
Optional retries: Number of retries on failure.
Optional local: Base URL for the local provider. Overrides the LOCAL_BASE_URL environment variable. Required when using provider: 'local' without LOCAL_BASE_URL set.
Optional local: API key for the local provider. Overrides the LOCAL_API_KEY environment variable. Defaults to "local" for servers that don't validate keys.
Optional local: Request timeout in ms for the local provider. Defaults to 60 000 ms. Increase for slow or large local models.
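The precedence rules for the local provider (explicit option first, then environment variable, then default) can be sketched as below. This is only an illustration under stated assumptions: the field names baseUrl, apiKey and timeoutMs and the helper resolveLocalSettings are hypothetical, since the real option names are not shown in full in this reference, and the library's actual resolution code may differ.

```typescript
// Hypothetical option shape; these field names are assumptions for illustration.
interface LocalOverrides {
  baseUrl?: string;   // explicit base URL option
  apiKey?: string;    // explicit API key option
  timeoutMs?: number; // explicit timeout option, in milliseconds
}

interface ResolvedLocalSettings {
  baseUrl: string;
  apiKey: string;
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: applies "option, then environment variable, then default".
function resolveLocalSettings(overrides: LocalOverrides, env: Env): ResolvedLocalSettings {
  // An explicit option overrides LOCAL_BASE_URL; one of the two must be present.
  const baseUrl = overrides.baseUrl ?? env["LOCAL_BASE_URL"];
  if (!baseUrl) {
    throw new Error("Local provider needs a base URL option or LOCAL_BASE_URL");
  }
  return {
    baseUrl,
    // An explicit option overrides LOCAL_API_KEY; "local" is the documented fallback.
    apiKey: overrides.apiKey ?? env["LOCAL_API_KEY"] ?? "local",
    // 60000 ms is the documented default timeout for the local provider.
    timeoutMs: overrides.timeoutMs ?? 60000,
  };
}
```

In a Node.js program the second argument would typically be process.env; it is passed in explicitly here so the sketch stays self-contained.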
Optional temperature: Sampling temperature (0-2). Lower values are more deterministic, higher values are more creative. undefined = provider default.
Semantic presets are also available via the creativity option.
Optional creativity: Semantic creativity level. Alternative to raw temperature values.
If both temperature and creativity are set, temperature takes precedence.
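The precedence between temperature and creativity can be illustrated with a small sketch. The preset names and numeric values below are invented for the example (this reference does not list the real creativity levels), and resolveTemperature is a hypothetical helper, not part of the documented API.

```typescript
// Hypothetical presets: names and values are assumptions, not the library's real mapping.
const CREATIVITY_PRESETS: Record<string, number> = {
  low: 0.2,
  medium: 0.7,
  high: 1.2,
};

// Hypothetical helper: picks the effective sampling temperature.
function resolveTemperature(temperature?: number, creativity?: string): number | undefined {
  // A raw temperature always takes precedence over a creativity preset.
  if (temperature !== undefined) {
    return temperature;
  }
  // Otherwise fall back to the preset, if one was given and is known.
  if (creativity !== undefined) {
    return CREATIVITY_PRESETS[creativity];
  }
  // Neither set: undefined means "use the provider default".
  return undefined;
}
```

Returning undefined rather than a hard-coded number keeps the "provider default" behaviour described above intact.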
Optional max: Maximum tokens to generate. undefined = provider default.
Optional top: Top-p (nucleus) sampling. undefined = provider default.
Optional stop: Stop sequences.
Optional frequency: Frequency penalty (-2 to 2).
Optional presence: Presence penalty (-2 to 2).
Optional reasoning: Unified reasoning level across providers. Maps automatically to provider-specific implementations.
Note: Not all models support reasoning. For unsupported models, this option is ignored.
Optional web: Enable web search (xAI only). Ignored for other providers.
Optional cache: Enable prompt caching (Anthropic only). Marks the system prompt for caching.
Optional thinking: Thinking budget in tokens for local models that support it (e.g. Qwen3.5 via oMLX).
When set, the model produces reasoning/thinking content before the final answer.
Thinking content is streamed separately via reasoningContent and does not mix with the visible response.
Only applies to the local provider. Ignored for cloud providers.
Optional extra: Arbitrary additional options passed to the provider. Use for bleeding-edge features not yet in the typed interface.
Optional on: Callback for each content delta.
Optional on: Callback when streaming completes.
Optional on: Callback on error during streaming.
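How the three streaming callbacks relate to each other can be sketched as follows. Everything here is an assumption made for illustration: the callback names (deltaCallback, completeCallback, errorCallback), their argument types, and the consumeStream helper are hypothetical, because the real callback names and signatures are not shown in full in this reference.

```typescript
// Hypothetical callback shape; names and argument types are assumptions.
interface StreamCallbacks {
  deltaCallback?: (delta: string) => void;       // called for each content delta
  completeCallback?: (fullText: string) => void; // called once when streaming completes
  errorCallback?: (error: Error) => void;        // called if the stream fails
}

// Hypothetical helper: drains a stream of text chunks and dispatches the callbacks.
async function consumeStream(
  chunks: AsyncIterable<string>,
  callbacks: StreamCallbacks
): Promise<void> {
  let fullText = "";
  try {
    for await (const chunk of chunks) {
      fullText += chunk;
      callbacks.deltaCallback?.(chunk);
    }
  } catch (err) {
    // Normalize non-Error throwables so the error callback always receives an Error.
    callbacks.errorCallback?.(err instanceof Error ? err : new Error(String(err)));
    return;
  }
  // Reached only when the stream ended without an error.
  callbacks.completeCallback?.(fullText);
}
```

In this sketch the completion callback and the error callback are mutually exclusive for a given stream; whether the library behaves the same way is not stated in this reference.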
Options for streaming generation.