
The Ollama provider enables running local language models on your machine using Ollama.

Configuration

  • provider (string, required): Set to "ollama"
  • base_url (string, optional): Ollama server URL. Defaults to http://localhost:11434
  • model (string, required): Model name, e.g. llama3.2, mistral, qwen2.5-coder
  • temperature (number, optional): Sampling temperature (0.0-2.0). Defaults to 0.7.

Example Configuration

{
  "provider": "ollama",
  "model": "llama3.2",
  "base_url": "http://localhost:11434",
  "temperature": 0.7
}
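
Since base_url and temperature fall back to their defaults, a minimal configuration only needs the two required fields:
{
  "provider": "ollama",
  "model": "llama3.2"
}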

Installation

  1. Install Ollama:
    # macOS/Linux
    curl -fsSL https://ollama.ai/install.sh | sh
    
    # Or download from https://ollama.ai/download
    
  2. Pull a model:
    ollama pull llama3.2
    
  3. Verify it’s running:
    ollama list
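
    # Or query the HTTP API directly; /api/tags returns the locally installed models
    curl http://localhost:11434/api/tags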
    

Supported Models

  • Llama 3.2: llama3.2, llama3.2:3b, llama3.2:1b
  • Llama 3.1: llama3.1:70b, llama3.1:8b
  • Qwen: qwen2.5-coder, qwen2.5, qwen2.5:32b
  • Mistral: mistral, mistral-nemo, mistral-large
  • Vision: llava, llava:13b, bakllava
  • Code: codellama, deepseek-coder, starcoder2
Browse all models at ollama.ai/library

Capabilities

  Feature            Support
  Streaming          No (planned)
  Function Calling   No (uses structured output instead)
  Vision (images)    Yes (llava, bakllava models)
  System Messages    Yes
  Tool Calls         Partial (quirky format, auto-fixed)

Tool Call Handling

Ollama models sometimes produce tool calls in quirky formats. The provider auto-fixes these:

Pattern 1: Nested Wrapper

{
  "name": "tool_call",
  "arguments": {
    "name": "shell",
    "arguments": {"cmd": "ls"}
  }
}
The provider unwraps this to {"name": "shell", "arguments": {"cmd": "ls"}}.

Pattern 2: Prefixed Names

  • tool.shell → shell
  • tools.file_read → file_read

Pattern 3: Normal

Standard format is passed through unchanged.
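
As a rough illustration of the Pattern 2 fix, a prefix-stripping helper could look like the sketch below. This is a simplified example rather than the actual code from src/providers/ollama.zig, and the function name normalizeToolName is made up:
const std = @import("std");

// Hypothetical sketch: strip a "tool." or "tools." prefix from the
// reported tool name before dispatching (Pattern 2 above).
fn normalizeToolName(name: []const u8) []const u8 {
    const prefixes = [_][]const u8{ "tools.", "tool." };
    for (prefixes) |prefix| {
        if (std.mem.startsWith(u8, name, prefix)) return name[prefix.len..];
    }
    return name;
}

test "prefixed tool names are normalized" {
    try std.testing.expectEqualStrings("shell", normalizeToolName("tool.shell"));
    try std.testing.expectEqualStrings("file_read", normalizeToolName("tools.file_read"));
}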

Vision Support

Vision models (llava, bakllava) support image input:
{
  "provider": "ollama",
  "model": "llava",
  "base_url": "http://localhost:11434"
}
Images are sent as base64-encoded data in the images array:
{
  "model": "llava",
  "messages": [
    {
      "role": "user",
      "content": "What's in this image?",
      "images": ["iVBORw0KGgo..."]
    }
  ]
}
Note: Ollama only supports base64-encoded images, not URLs. HTTP URLs are automatically skipped.
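
For reference, the same request body can be sent to Ollama's /api/chat endpoint directly with curl. The base64 string here is a placeholder, and "stream": false asks Ollama to return a single JSON object instead of a streamed response:
curl http://localhost:11434/api/chat -d '{
  "model": "llava",
  "messages": [
    {
      "role": "user",
      "content": "What is in this image?",
      "images": ["<base64-encoded image data>"]
    }
  ],
  "stream": false
}'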

Remote Ollama Server

To connect to a remote Ollama instance:
{
  "provider": "ollama",
  "model": "llama3.2",
  "base_url": "http://192.168.1.100:11434"
}
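
Ollama binds to 127.0.0.1 by default, so the remote machine must expose the server on a reachable address, for example:
# on the remote machine: listen on all interfaces instead of loopback only
OLLAMA_HOST=0.0.0.0 ollama serve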

Code Example

From src/providers/ollama.zig:
pub const OllamaProvider = struct {
    base_url: []const u8,
    allocator: std.mem.Allocator,

    const DEFAULT_BASE_URL = "http://localhost:11434";

    pub fn init(allocator: std.mem.Allocator, base_url: ?[]const u8) OllamaProvider {
        const url = if (base_url) |u| trimTrailingSlash(u) else DEFAULT_BASE_URL;
        return .{
            .base_url = url,
            .allocator = allocator,
        };
    }

    pub fn chatUrl(self: OllamaProvider, allocator: std.mem.Allocator) ![]const u8 {
        return std.fmt.allocPrint(allocator, "{s}/api/chat", .{self.base_url});
    }
};
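
A hypothetical usage sketch based on the excerpt above (the allocator is assumed to come from the caller):
// Hypothetical usage; not taken from the repository.
const provider = OllamaProvider.init(allocator, "http://192.168.1.100:11434");
const url = try provider.chatUrl(allocator);
defer allocator.free(url);
// url is "http://192.168.1.100:11434/api/chat"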

Thinking-Only Responses

Some models produce “thinking” content without final output:
{
  "message": {
    "role": "assistant",
    "content": "",
    "thinking": "Let me reason about this carefully..."
  }
}
The provider returns a preview:
I was thinking about this: Let me reason about this carefully... but I didn't complete my response. Could you try asking again?

No Authentication

Ollama runs locally and does not require authentication. All requests are sent without credentials.

Performance Tips

  1. GPU Acceleration: Ollama automatically uses GPU if available (NVIDIA, AMD, Apple Metal)
  2. Model Size: Smaller models (3B, 7B) run faster on consumer hardware
  3. Context Window: Reduce num_ctx in Ollama for faster inference (see the Modelfile sketch below)
  4. Quantization: Use quantized model tags (q4_0, q5_K_M) to reduce memory use and speed up inference at a small cost in output quality
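
Tips 3 and 4 are applied on the Ollama side rather than in the provider configuration. As an illustrative sketch, a custom Modelfile can lower the context window; the model name llama3.2-small and the num_ctx value are arbitrary examples:
# Modelfile
FROM llama3.2
PARAMETER num_ctx 2048
Build the variant and reference it in the model field of the configuration:
ollama create llama3.2-small -f Modelfile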