
// retoor <retoor@molodetz.nl>

# R - Autonomous AI Agent CLI

**Version:** 2.0.0  
**Author:** retoor <retoor@molodetz.nl>  
**License:** MIT  
**Language:** C99  
**Build System:** GNU Make

---

# 📖 TABLE OF CONTENTS

1. [User Guide](#user-guide)
   - [What is R?](#what-is-r)
   - [Key Features](#key-features)
   - [Installation](#installation)
   - [Quick Start](#quick-start)
   - [Use Cases](#use-cases)
   - [Interactive Commands](#interactive-commands)
2. [Technical Specification](#technical-specification)
   - [Architecture Overview](#architecture-overview)
   - [Build System](#build-system)
   - [Configuration](#configuration)
   - [Tool System](#tool-system)
   - [Agent System](#agent-system)
   - [DeepSearch Algorithm](#deepsearch-algorithm)
   - [Database Schema](#database-schema)
   - [API Integration](#api-integration)
   - [Security Model](#security-model)
   - [Performance Characteristics](#performance-characteristics)
   - [Development Guide](#development-guide)

---

# USER GUIDE

## What is R?

R is a high-performance, autonomous AI agent command-line interface written in C. It implements the **ReAct (Reasoning + Acting)** paradigm, enabling complex multi-turn task execution through an intelligent loop of:

1. **Reasoning** - Analyzing the current state and planning next steps
2. **Acting** - Executing tools (shell commands, file operations, web searches)
3. **Observing** - Processing tool outputs
4. **Refining** - Adjusting the plan based on observations

Unlike simple chatbots, R can:
- Write and execute code
- Browse the web and research topics
- Manage files and directories
- Spawn sub-agents for parallel task execution
- Monitor and control background processes
- Maintain persistent state across sessions

## Key Features

### 🧠 Autonomous Agent Architecture
- **ReAct Loop:** Up to 300 iterations of reasoning-acting-observing cycles
- **Sub-agent Spawning:** Hierarchical agent delegation with budget tracking
- **Plan-Execute-Verify-Conclude Lifecycle:** Structured task management
- **Context Management:** Automatic context window optimization

### 🔧 25+ Built-in Tools
All tools are OpenAI-compatible with strict JSON schema validation:

| Category | Tools |
|----------|-------|
| **Terminal** | `linux_terminal_execute`, `terminal_interactive` |
| **File System** | `read_file`, `write_file`, `directory_glob`, `mkdir`, `chdir`, `getpwd` |
| **Code** | `python_execute`, `code_grep`, `code_symbol_find` |
| **File Edit** | `file_line_replace`, `file_apply_patch` |
| **HTTP/Web** | `http_fetch`, `web_search`, `web_search_news`, `deepsearch` |
| **Database** | `db_get`, `db_set`, `db_query` |
| **Process** | `process_monitor`, `process_get_status`, `process_terminate` |
| **Network** | `network_check`, `dns_lookup`, `network_port_scan` |
| **Research** | `index_source_directory`, `research_dispatcher`, `fetch_and_scrape` |
| **Automation** | `automation_fuzz`, `automation_exploit_gen` |
| **Data** | `csv_export` |
| **Agent** | `spawn_agent` |
| **Snapshot** | `create_snapshot`, `list_snapshots`, `restore_snapshot` |

### ⚡ Performance
- **Parallel Tool Execution:** pthread-based concurrent tool runs
- **Asynchronous Background Tasks:** Non-blocking long-running operations
- **Real-time Output Streaming:** Live process output with PID tracking
- **Memory Efficient:** Context window management with intelligent shrinking

### 🎨 Terminal Experience
- **Syntax Highlighting:** Python code with ANSI colors and line numbers
- **Markdown Rendering:** Headers, lists, code blocks, tables
- **Process Traceability:** Every output line tagged with `[PID]`
- **Spinner Indicators:** Visual feedback for HTTP operations

## Installation

### Prerequisites

#### Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install -y \
    gcc make \
    libreadline-dev \
    libncurses5-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    libjson-c-dev \
    libsqlite3-dev \
    libtinfo-dev \
    libgnutls28-dev \
    libgmp-dev
```

#### macOS (with Homebrew)
```bash
brew install readline ncurses curl json-c sqlite3 openssl gmp gnutls
```

#### Fedora/RHEL
```bash
sudo dnf install -y \
    gcc make \
    readline-devel \
    ncurses-devel \
    libcurl-devel \
    openssl-devel \
    json-c-devel \
    sqlite-devel \
    gnutls-devel \
    gmp-devel
```

### Build & Install

```bash
# Clone repository
git clone <repository-url>
cd r

# Standard build
make build

# Build with debug symbols
make debug

# Install system-wide
sudo make install

# Verify installation
r --help
```

### Docker Build
```bash
make docker_make
make docker_run
```

### AppImage Build
```bash
make appimage
./r-x86_64.AppImage
```

## Quick Start

### 1. Set up API Key
```bash
export R_KEY="your-api-key-here"
```

Or use fallback environment variables:
- `OPENROUTER_API_KEY`
- `OPENAI_API_KEY`
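The fallback chain amounts to a first-non-empty lookup over the environment. The sketch below assumes that only the three variables listed above are consulted and that an empty value counts as unset. `resolve_api_key` is a hypothetical name, and the built-in last-resort key mentioned in the configuration table is not modelled.

```c
#include <stdlib.h>

/* Hypothetical helper: return the first non-empty API key in the
 * documented order R_KEY -> OPENROUTER_API_KEY -> OPENAI_API_KEY,
 * or NULL if none of them is set. */
const char *resolve_api_key(void) {
    static const char *const vars[] = {
        "R_KEY", "OPENROUTER_API_KEY", "OPENAI_API_KEY"
    };
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++) {
        const char *value = getenv(vars[i]);
        if (value != NULL && value[0] != '\0') {
            return value;  /* highest-priority variable that is set wins */
        }
    }
    return NULL;
}
```

A caller would typically print an error and exit when this returns `NULL`.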

### 2. Interactive Mode (REPL)
```bash
./r
```

REPL Commands:
- `!clear` - Clear session history
- `!dump` - Export message history as JSON
- `!session` - Show current session ID
- `!new` - Start new session
- `!tools` - List available tools
- `!models` - List available models
- `!model <name>` - Switch model
- `!verbose` - Toggle verbose mode
- `!vi` / `!emacs` - Set editing mode
- `exit` - Quit

### 3. Single Command Mode
```bash
# Direct prompt
./r "List all files in current directory"

# Read from stdin
echo "Analyze this code" | ./r --stdin

# Include Python file as context
./r --py script.py "Explain this code"

# Include custom context file
./r --context project.txt "Summarize the architecture"

# Specify session
./r -s mysession "Your prompt"

# API mode (JSON output)
./r --api "Your prompt"

# Disable syntax highlighting
./r --nh "Your prompt"
```

## Use Cases

### Software Development

#### Code Review & Analysis
```bash
# Review a source file
./r --py main.py "Review this code for security issues and best practices"

# Understand complex code
./r "Index the src/ directory and explain the architecture"

# Refactoring assistance
./r --py module.py "Refactor this function to use better error handling"
```

#### Automated Testing
```bash
# Generate test cases
./r "Read src/calculator.c and generate comprehensive unit tests"

# Test execution with monitoring
./r "Run the test suite in background and monitor progress"
```

#### Documentation Generation
```bash
# Generate API documentation
./r "Read all header files in include/ and generate API documentation"

# Create README from code
./r "Analyze the codebase and create a comprehensive README.md"
```

### DevOps & System Administration

#### System Monitoring
```bash
# Check system health
./r "Monitor system processes and report any anomalies"

# Log analysis
./r "Analyze /var/log/syslog for error patterns in the last 24 hours"

# Network diagnostics
./r "Check network connectivity and DNS resolution for google.com"
```

#### Automation Scripts
```bash
# Batch file processing
./r "Find all .log files in /var/log and compress those older than 7 days"

# Configuration management
./r "Update all configuration files to use the new API endpoint"

# Deployment automation
./r "Execute deployment script and verify all services are running"
```

### Research & Data Analysis

#### Web Research
```bash
# Quick search
./r "Search for the latest news on renewable energy"

# Deep research (multi-iteration)
./r "Deep search: What are the current best practices for microservices architecture?"

# Competitive analysis
./r "Research the top 5 competitors in the cloud storage market"
```

#### Data Processing
```bash
# CSV analysis
./r "Read data.csv and calculate summary statistics for each column"

# Python data science
./r "Load the dataset and create visualizations of the trends"

# Database queries
./r "Query the local database for all agents created in the last hour"
```

### Cybersecurity

#### Security Auditing
```bash
# Port scanning
./r "Scan localhost for open ports and identify services"

# DNS reconnaissance
./r "Perform DNS lookup and analyze the records for example.com"

# Fuzzing
./r "Fuzz the API endpoint /api/v1/users with common payloads"
```

#### Vulnerability Research
```bash
# Exploit generation
./r "Generate exploit code for CVE-2024-XXXX"

# Security report analysis
./r "Read security_scan.txt and summarize the critical findings"
```

### Content Creation

#### Writing Assistance
```bash
# Technical writing
./r "Write a technical blog post about async/await in Python"

# Documentation
./r "Create user documentation for the file system tools"

# Code comments
./r "Add comprehensive docstrings to all functions in utils.py"
```

### Education & Learning

#### Tutorial Generation
```bash
# Explain concepts
./r "Explain how recursion works with examples in C"

# Code walkthrough
./r "Walk through the quicksort implementation line by line"

# Quiz generation
./r "Create a quiz with 10 questions about data structures"
```

## Interactive Commands

When in REPL mode, special commands prefixed with `!` are available:

| Command | Description |
|---------|-------------|
| `!clear` | Clear session message history |
| `!dump` | Export conversation as JSON |
| `!session` | Display current session identifier |
| `!new` | Start a fresh session |
| `!tools` | List all registered tools with descriptions |
| `!models` | Query and display available LLM models |
| `!model <name>` | Switch to a different LLM model |
| `!verbose` | Toggle verbose logging mode |
| `!vi` | Set line editor to vi mode |
| `!emacs` | Set line editor to emacs mode |
| `exit` | Quit the application |
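A dispatcher for these commands can be sketched as a prefix check plus a lookup table. This makes no claim about how `main.c` actually parses input. Only the command spellings come from the table above; `repl_command_t` and `parse_repl_command` are hypothetical names.

```c
#include <string.h>

/* Hypothetical command identifiers mirroring the table above. */
typedef enum {
    CMD_NONE,      /* not a command: send the line to the agent */
    CMD_CLEAR, CMD_DUMP, CMD_SESSION, CMD_NEW, CMD_TOOLS,
    CMD_MODELS, CMD_MODEL, CMD_VERBOSE, CMD_VI, CMD_EMACS, CMD_EXIT,
    CMD_UNKNOWN    /* starts with '!' but is not recognised */
} repl_command_t;

/* Classify one input line; for "!model <name>", *arg points at <name>. */
repl_command_t parse_repl_command(const char *line, const char **arg) {
    static const struct { const char *name; repl_command_t cmd; } table[] = {
        {"!clear", CMD_CLEAR}, {"!dump", CMD_DUMP}, {"!session", CMD_SESSION},
        {"!new", CMD_NEW}, {"!tools", CMD_TOOLS}, {"!models", CMD_MODELS},
        {"!verbose", CMD_VERBOSE}, {"!vi", CMD_VI}, {"!emacs", CMD_EMACS},
    };
    *arg = NULL;
    if (strcmp(line, "exit") == 0) return CMD_EXIT;
    if (line[0] != '!') return CMD_NONE;
    if (strncmp(line, "!model ", 7) == 0) {
        *arg = line + 7;  /* the model name follows the space */
        return CMD_MODEL;
    }
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(line, table[i].name) == 0) return table[i].cmd;
    }
    return CMD_UNKNOWN;
}
```

Because `!model ` is matched with its trailing space, `!models` falls through to the table lookup instead of being treated as `!model` with an argument.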

---

# TECHNICAL SPECIFICATION

## Architecture Overview

### System Architecture Diagram

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              USER INTERFACE                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   REPL Mode  │  │ Single Cmd   │  │   API Mode   │  │  Stdin Pipe  │      │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘      │
└─────────┼─────────────────┼─────────────────┼─────────────────┼──────────────┘
          │                 │                 │                 │
          └─────────────────┴─────────┬───────┴─────────────────┘
                                      │
                              ┌───────▼────────┐
                              │  Main Entry    │
                              │   (main.c)     │
                              └───────┬────────┘
                                      │
┌─────────────────────────────────────▼──────────────────────────────────────┐
│                              AGENT CORE                                      │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         ReAct Loop                                  │    │
│  │  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐          │    │
│  │  │ Reason  │───▶│  Plan   │───▶│  Act    │───▶│ Observe │──┐       │    │
│  │  └────▲────┘    └─────────┘    └─────────┘    └─────────┘  │       │    │
│  │       └────────────────────────────────────────────────────┘       │    │
│  │                                              (max 300 iterations)   │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐             │
│  │ Context Manager │  │ Message History │  │ Budget Tracker  │             │
│  │(agent.c:context)│  │  (messages.c)   │  │  (agent.c:tok)  │             │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘             │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
┌─────────────────────────────────────▼──────────────────────────────────────┐
│                            TOOL SYSTEM                                       │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                     Tool Registry (tool_registry.c)                 │    │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐       │    │
│  │  │Terminal │ │  File   │ │  HTTP   │ │ Python  │ │  Agent  │  ...  │    │
│  │  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘       │    │
│  └───────┼───────────┼───────────┼───────────┼───────────┼────────────┘    │
│          │           │           │           │           │                 │
│          ▼           ▼           ▼           ▼           ▼                 │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    Parallel Execution (pthreads)                    │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
                                      │
┌─────────────────────────────────────▼──────────────────────────────────────┐
│                         EXTERNAL INTERFACES                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │  LLM API     │  │  HTTP/Curl   │  │   SQLite     │  │   System     │      │
│  │(http_client) │  │(http_curl.c) │  │  (db_sqlite) │  │   (bash)     │      │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────────────────────┘
```

### Module Hierarchy

```
r/
├── src/
│   ├── main.c              # Application entry point, CLI parsing, REPL
│   ├── agent.c             # Core ReAct loop, agent lifecycle management
│   ├── messages.c          # Message history, context window management
│   ├── context_manager.c   # Context shrinking, token optimization
│   ├── tool_registry.c     # Tool registration, parallel execution
│   ├── bash_executor.c     # Shell command execution with PID tracking
│   ├── http_client.c       # HTTP client wrapper
│   ├── db.c                # SQLite database operations
│   ├── markdown.c          # Markdown to ANSI rendering
│   ├── r_config.c          # Configuration management (singleton)
│   ├── tools/              # Tool implementations
│   │   ├── tools_init.c    # Tool registration coordinator
│   │   ├── tool_terminal.c # Shell execution tools
│   │   ├── tool_python.c   # Python execution tool
│   │   ├── tool_file.c     # File I/O tools
│   │   ├── tool_file_edit.c # File editing tools
│   │   ├── tool_http.c     # HTTP and web search tools
│   │   ├── tool_deepsearch.c # Deep research tool
│   │   ├── tool_agent.c    # Sub-agent spawning
│   │   └── ... (17 more)
│   ├── core/               # Core utilities
│   │   ├── memory.c        # Memory management wrappers
│   │   ├── string.c        # String utilities
│   │   └── buffer.c        # Dynamic buffer
│   ├── util/               # General utilities
│   │   ├── path.c          # Path manipulation
│   │   └── time.c          # Time utilities
│   ├── impl/               # Implementation wrappers
│   │   ├── db_sqlite.c     # SQLite implementation
│   │   └── http_curl.c     # libcurl implementation
│   └── interfaces/         # Interface abstractions
│       ├── config.c
│       ├── logger.c
│       └── http.h
├── include/                # Public header files
│   ├── agent.h
│   ├── tool.h
│   ├── messages.h
│   ├── r_config.h
│   └── ... (10 more)
├── build/                  # Build artifacts (object files)
├── bin/                    # Binary output directory
└── test_results/           # Test output files
```

## Build System

### Makefile Targets

| Target | Description | Dependencies |
|--------|-------------|--------------|
| `build` | Standard optimized build | All object files |
| `debug` | Build with debug symbols (`-g`, `-O0`) | All object files |
| `run` | Build and execute with verbose output | `build` |
| `clean` | Remove all build artifacts | - |
| `install` | Install to `/usr/local/bin` | `build` |
| `uninstall` | Remove from `/usr/local/bin` | - |
| `docker` | Build Docker image | Dockerfile |
| `docker_make` | Build using Docker container | Dockerfile |
| `docker_run` | Run in Docker container | `docker_make` |
| `appimage` | Build AppImage | `appimagetool` |
| `build_deb` | Build Debian package | `dpkg-deb` |

### Compiler Flags

#### Standard Build (`make build`)
```makefile
CFLAGS = -Ofast -Werror -Wall -I./include
LDFLAGS = -lreadline -lncurses -lcurl -ljson-c -lsqlite3 -lm -lpthread -lssl -lcrypto
```

#### Debug Build (`make debug`)
```makefile
CFLAGS = -g -O0 -Werror -Wall -I./include -DDEBUG
```

### Source File Categories

```makefile
# Core modules
SRC_CORE = src/main.c src/agent.c src/messages.c src/context_manager.c \
           src/tool_registry.c src/bash_executor.c src/http_client.c \
           src/db.c src/markdown.c src/r_config.c src/r_diff.c src/r_error.c

# Tool implementations
SRC_TOOLS = src/tools/tools_init.c src/tools/tool_terminal.c \
            src/tools/tool_python.c src/tools/tool_file.c \
            src/tools/tool_file_edit.c src/tools/tool_agent.c \
            src/tools/tool_http.c src/tools/tool_db.c \
            src/tools/tool_indexer.c src/tools/tool_code.c \
            src/tools/tool_system.c src/tools/tool_enterprise.c \
            src/tools/tool_research.c src/tools/tool_network.c \
            src/tools/tool_dns.c src/tools/tool_automation.c \
            src/tools/tool_csv.c src/tools/tool_deepsearch.c \
            src/tools/tool_snapshot.c src/tools/tool_json.c

# Utilities
SRC_UTILS = src/core/memory.c src/core/string.c src/core/buffer.c \
            src/util/path.c src/util/time.c src/impl/db_sqlite.c \
            src/impl/http_curl.c
```

## Configuration

### Environment Variables

| Variable | Description | Default | Priority |
|----------|-------------|---------|----------|
| `R_KEY` | Primary API key for LLM access | - | 1 |
| `OPENROUTER_API_KEY` | Fallback API key | - | 2 |
| `OPENAI_API_KEY` | Secondary fallback | hardcoded | 3 |
| `R_BASE_URL` | API base URL | `https://api.openai.com` | - |
| `R_MODEL` | Model identifier | `gpt-4o-mini` | - |
| `R_MAX_TOKENS` | Maximum tokens per request | `4096` | - |
| `R_MAX_SPAWN_DEPTH` | Max agent spawn depth | `5` | - |
| `R_MAX_TOTAL_SPAWNS` | Max total spawned agents | `20` | - |
| `R_USE_TOOLS` | Enable tool use | `true` | - |
| `R_USE_STRICT` | Use strict JSON schema mode | `true` | - |
| `R_SESSION` | Session identifier | auto-generated | - |
| `R_SYSTEM_MESSAGE` | Custom system message | - | - |
| `R_VERBOSE` | Enable verbose mode | `false` | - |

### Configuration Resolution

```c
// Configuration precedence (highest to lowest):
// 1. Command-line flags (--model, etc.)
// 2. Environment variables (R_MODEL, etc.)
// 3. .rcontext.txt (project-specific)
// 4. ~/.rcontext.txt (user-global)
// 5. Built-in defaults
```
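In code, this precedence comes down to taking the first source that provides a value. The following is a minimal sketch that assumes each source has already been read into a string, or `NULL` when absent. `config_resolve` is a hypothetical helper, not the actual `r_config.c` API.

```c
#include <stddef.h>

/* Hypothetical helper: candidates are ordered from highest to lowest
 * precedence (CLI flag, environment, ./.rcontext.txt, ~/.rcontext.txt).
 * The first non-NULL, non-empty candidate wins; otherwise the built-in
 * default is returned. */
const char *config_resolve(const char *const candidates[], size_t count,
                           const char *builtin_default) {
    for (size_t i = 0; i < count; i++) {
        if (candidates[i] != NULL && candidates[i][0] != '\0') {
            return candidates[i];
        }
    }
    return builtin_default;
}
```

For the model, the array would hold the `--model` value, `getenv("R_MODEL")`, and the `MODEL=` entries from the project and home context files, in that order, with `"gpt-4o-mini"` as the default.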

### Context Files

#### `.rcontext.txt` Format
```
# Project context for R agent
# Lines starting with # are comments

MODEL=gpt-4o
MAX_TOKENS=8192
SYSTEM_MESSAGE=You are working on a C project. Follow Linux kernel coding style.
```

#### Loading Order
1. Current directory: `./.rcontext.txt`
2. Home directory: `~/.rcontext.txt`
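Parsing this format requires little more than splitting each line at its first `=`. The sketch below assumes the `KEY=VALUE` and `#` comment rules shown above and trims surrounding whitespace. `rcontext_parse_line` is a hypothetical name, and R's own loader may handle edge cases differently.

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s) {
    while (isspace((unsigned char)*s)) s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) end--;
    *end = '\0';
    return s;
}

/* Hypothetical parser for one .rcontext.txt line; modifies `line` in place.
 * Returns 1 and sets *key / *value for a KEY=VALUE line, and 0 for blank
 * lines, comments (#...), lines without '=' and lines with an empty key. */
int rcontext_parse_line(char *line, char **key, char **value) {
    char *s = trim(line);
    if (*s == '\0' || *s == '#') return 0;
    char *eq = strchr(s, '=');
    if (eq == NULL) return 0;
    *eq = '\0';            /* split at the first '=' only */
    *key = trim(s);
    *value = trim(eq + 1); /* later '=' characters stay in the value */
    return **key != '\0';
}
```

Splitting at the first `=` keeps values such as a `SYSTEM_MESSAGE` that itself contains `=` intact.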

## Tool System

### Tool Architecture

All tools follow a vtable-based architecture for polymorphism:

```c
#include <json-c/json.h>

// Forward declaration so the vtable can take a tool_t *
typedef struct tool tool_t;

// Tool vtable - function pointers for polymorphic behavior
typedef struct {
    struct json_object *(*get_description)(void);
    char *(*execute)(tool_t *self, struct json_object *args);
    void (*print_action)(const char *name, struct json_object *args);
} tool_vtable_t;

// Tool instance
struct tool {
    const tool_vtable_t *vtable;
    const char *name;
};
```
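Dispatch through the vtable can be shown with a self-contained variant. To compile without json-c, this sketch replaces `struct json_object *` with plain C strings and drops `print_action`. The `echo` tool and the `tool_execute` helper are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

typedef struct tool tool_t;

/* Simplified vtable: JSON arguments replaced by a plain string. */
typedef struct {
    const char *(*get_description)(void);
    char *(*execute)(tool_t *self, const char *args);
} tool_vtable_t;

struct tool {
    const tool_vtable_t *vtable;
    const char *name;
};

/* A hypothetical "echo" tool: returns a heap copy of its arguments. */
static const char *echo_describe(void) { return "Echo the arguments back"; }

static char *echo_execute(tool_t *self, const char *args) {
    (void)self;  /* this tool needs no per-instance state */
    char *out = malloc(strlen(args) + 1);
    if (out != NULL) strcpy(out, args);
    return out;
}

static const tool_vtable_t echo_vtable = { echo_describe, echo_execute };
static tool_t echo_tool = { &echo_vtable, "echo" };

/* Callers only see tool_t; the vtable selects the implementation. */
char *tool_execute(tool_t *tool, const char *args) {
    return tool->vtable->execute(tool, args);
}
```

Because `execute` returns a non-const `char *`, this sketch assumes the caller owns and frees the result.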

### Tool Registration

```c
// Tools are registered in tools_init.c
static tool_registry_t *global_registry = NULL;

tool_registry_t *tools_get_registry(void) {
    // Build the registry once and reuse it on later calls
    if (global_registry != NULL) {
        return global_registry;
    }
    global_registry = tool_registry_create();

    // Register all tools
    tool_registry_register(global_registry, tool_terminal_create());
    tool_registry_register(global_registry, tool_web_search_create());
    tool_registry_register(global_registry, tool_deepsearch_create());
    // ... 22 more tools

    return global_registry;
}
```
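The registry itself can be as simple as a growable array searched by name. This sketch assumes that layout. It reuses the names `tool_registry_create` and `tool_registry_register` from the excerpt above, but their bodies here, the struct fields, and `tool_registry_find` are hypothetical. Tools are reduced to a name so the example stands alone.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *name;  /* stand-in for a full tool_t */
} tool_t;

typedef struct {
    tool_t **tools;
    size_t count;
    size_t capacity;
} tool_registry_t;

tool_registry_t *tool_registry_create(void) {
    return calloc(1, sizeof(tool_registry_t));
}

/* Append a tool, doubling the backing array when full. Returns 0 on success. */
int tool_registry_register(tool_registry_t *reg, tool_t *tool) {
    if (reg->count == reg->capacity) {
        size_t cap = reg->capacity ? reg->capacity * 2 : 8;
        tool_t **grown = realloc(reg->tools, cap * sizeof *grown);
        if (grown == NULL) return -1;
        reg->tools = grown;
        reg->capacity = cap;
    }
    reg->tools[reg->count++] = tool;
    return 0;
}

/* Linear lookup by name, e.g. when the LLM returns a tool call. */
tool_t *tool_registry_find(const tool_registry_t *reg, const char *name) {
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->tools[i]->name, name) == 0) return reg->tools[i];
    }
    return NULL;
}

/* Frees the registry's storage; the tools themselves are not owned. */
void tool_registry_destroy(tool_registry_t *reg) {
    if (reg != NULL) {
        free(reg->tools);
        free(reg);
    }
}
```

With a few dozen tools, a linear scan is cheap compared with an LLM round trip, so a hash table would add little.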

### Parallel Tool Execution

Tool calls returned together in one LLM response (a single batch) are executed in parallel, with one pthread per call:

```c
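// From tool_registry.c (simplified; context setup and copying each
// result into `results` are elided)
void tool_registry_execute_parallel(tool_registry_t *reg,
                                    json_object *tool_calls,
                                    json_object *results) {
    int num_tools = (int)json_object_array_length(tool_calls);
    if (num_tools <= 0) {
        return;
    }

    // One thread and one execution context per tool call
    pthread_t *threads = calloc(num_tools, sizeof(pthread_t));
    tool_exec_context_t *contexts = calloc(num_tools, sizeof(tool_exec_context_t));
    if (threads == NULL || contexts == NULL) {
        free(threads);
        free(contexts);
        return;
    }

    // Launch all tools concurrently
    for (int i = 0; i < num_tools; i++) {
        pthread_create(&threads[i], NULL, tool_execute_thread, &contexts[i]);
    }

    // Wait for completion
    for (int i = 0; i < num_tools; i++) {
        pthread_join(threads[i], NULL);
    }

    free(threads);
    free(contexts);
}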
```
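The fan-out/join pattern can be tried in isolation. The sketch below is a self-contained model, not R's code. `exec_context_t`, `fake_tool` and `run_parallel` are hypothetical. Each thread writes only to its own context slot, so no locking is needed, and because every call has a fixed index, results come back in call order regardless of which thread finishes first.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-call context: input plus a private result slot. */
typedef struct {
    int input;
    char result[32];
} exec_context_t;

/* Stand-in for a tool: each thread writes only to its own context. */
static void *fake_tool(void *arg) {
    exec_context_t *ctx = arg;
    snprintf(ctx->result, sizeof ctx->result, "square=%d", ctx->input * ctx->input);
    return NULL;
}

/* Run n calls concurrently; returns 0 on success, -1 on failure. */
int run_parallel(exec_context_t *contexts, int n) {
    if (n <= 0) return 0;
    pthread_t *threads = calloc((size_t)n, sizeof *threads);
    if (threads == NULL) return -1;

    int started = 0;
    for (; started < n; started++) {
        if (pthread_create(&threads[started], NULL, fake_tool, &contexts[started]) != 0)
            break;  /* stop launching, but still join the threads that started */
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    free(threads);
    return started == n ? 0 : -1;
}
```

Unlike the excerpt above, this version checks the return value of `pthread_create`, so a failed launch never leads to joining an uninitialised thread handle.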

### Tool Categories

#### Terminal Tools (`tool_terminal.c`)
- `linux_terminal_execute`: Execute shell commands with PID tracking
- `terminal_interactive`: Interactive terminal sessions

#### File Tools (`tool_file.c`)
- `read_file`: Read file contents
- `write_file`: Write file contents (atomic)
- `directory_glob`: List files matching pattern
- `mkdir`: Create directories
- `chdir`: Change working directory
- `getpwd`: Get current directory

#### Code Tools (`tool_code.c`)
- `index_source_directory`: Deep codebase indexing
- `code_grep`: Search code patterns
- `code_symbol_find`: Find function/variable definitions

#### File Edit Tools (`tool_file_edit.c`)
- `file_line_replace`: Replace specific lines
- `file_apply_patch`: Apply unified diff patches

#### HTTP Tools (`tool_http.c`)
- `http_fetch`: GET URL contents
- `web_search`: General web search
- `web_search_news`: News-specific search

#### DeepSearch Tool (`tool_deepsearch.c`)
Implements intelligent iterative research:

```c
// Algorithm parameters
#define MAX_QUERIES 8              // Max queries per iteration
#define QUERY_GENERATION_MAX_TOKENS 2048
#define MAX_ITERATIONS 3           // Max research iterations
#define MIN_CONTENT_LENGTH 100     // Minimum valid content length
#define MIN_VALID_RESULTS 5        // Stop threshold
```

**Execution Flow:**
1. Generate 5-8 diverse search queries using the LLM
2. Execute all queries concurrently via pthreads
3. Extract and validate content from results
4. If insufficient results, generate follow-up queries
5. Merge all results into unified JSON response

## Agent System

### Agent State Machine

```c
typedef enum {
    AGENT_STATE_IDLE,           // Agent created, not running
    AGENT_STATE_RUNNING,        // Active ReAct loop
    AGENT_STATE_EXECUTING_TOOLS,// Currently executing tools
    AGENT_STATE_COMPLETED,      // Task completed successfully
    AGENT_STATE_MAX_ITERATIONS, // Stopped: iteration limit
    AGENT_STATE_ERROR           // Stopped: error condition
} agent_state_t;
```

### ReAct Loop Implementation

```c
// Simplified pseudocode of agent_run()
agent->state = AGENT_STATE_RUNNING;
while (agent->iteration_count < max_iterations) {
    // 1. Build LLM request with full context
    request = agent_build_request(agent, messages);

    // 2. Send to LLM API
    response = http_post(api_url, request);
    if (response == NULL) {
        agent->state = AGENT_STATE_ERROR;
        return NULL;
    }

    // 3. Parse response
    parsed = parse_llm_response(response);

    // 4. Check for tool calls
    if (has_tool_calls(parsed)) {
        agent->state = AGENT_STATE_EXECUTING_TOOLS;
        results = execute_tools_parallel(tool_calls);
        add_observations_to_context(results);
        agent->state = AGENT_STATE_RUNNING;
    }

    // 5. Check for completion
    if (is_completion_message(parsed)) {
        agent->state = AGENT_STATE_COMPLETED;
        return final_answer;
    }

    // 6. Check for refusal/incompleteness
    if (is_incomplete(parsed)) {
        auto_continue();
    }

    agent->iteration_count++;
}

// Iteration limit reached without a final answer
agent->state = AGENT_STATE_MAX_ITERATIONS;
return NULL;
```
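The control flow, and in particular how the iteration cap turns into `AGENT_STATE_MAX_ITERATIONS`, can be exercised with a stubbed model. In this sketch the HTTP call, parsing and tools are replaced by a hypothetical `fake_llm_step` that requests tools for a fixed number of turns and then answers. Only the state names come from the enum above.

```c
/* Subset of the agent_state_t values shown above. */
typedef enum {
    AGENT_STATE_RUNNING,
    AGENT_STATE_EXECUTING_TOOLS,
    AGENT_STATE_COMPLETED,
    AGENT_STATE_MAX_ITERATIONS
} agent_state_t;

typedef enum { STEP_TOOL_CALLS, STEP_FINAL_ANSWER } step_kind_t;

/* Hypothetical stand-in for "send request, parse response":
 * asks for tools on the first `tool_turns` iterations, then answers. */
static step_kind_t fake_llm_step(int iteration, int tool_turns) {
    return iteration < tool_turns ? STEP_TOOL_CALLS : STEP_FINAL_ANSWER;
}

/* Run the loop; *iterations receives how many model calls were made. */
agent_state_t run_loop(int max_iterations, int tool_turns, int *iterations) {
    for (int i = 0; i < max_iterations; i++) {
        if (fake_llm_step(i, tool_turns) == STEP_FINAL_ANSWER) {
            *iterations = i + 1;
            return AGENT_STATE_COMPLETED;
        }
        /* STEP_TOOL_CALLS: the agent would be in AGENT_STATE_EXECUTING_TOOLS
         * while tools run, then append observations and loop again. */
    }
    *iterations = max_iterations;
    return AGENT_STATE_MAX_ITERATIONS;
}
```

For example, `run_loop(300, 3, &n)` completes after four model calls (three tool turns and one answer), while `run_loop(2, 3, &n)` stops at the cap.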

### Context Management

The context manager handles token overflow:

```c
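// Context shrinking strategy (simplified)
void context_manager_shrink(messages_handle msgs, int target_tokens) {
    // 1. Calculate current token count
    int current = messages_count_tokens(msgs);

    // 2. While over the limit, shrink starting from the oldest non-system
    //    message: summarize important messages, remove the rest
    while (current > target_tokens) {
        // Assumed to skip messages that were already replaced by a summary
        msg = find_oldest_non_system_message(msgs);
        if (msg == NULL) {
            break;  // only system messages remain; nothing more to shrink
        }
        if (is_important(msg)) {
            summary = summarize_message(msg);
            replace_with_summary(msg, summary);
        } else {
            remove_message(msg);
        }
        current = recalculate_tokens(msgs);
    }
}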
```
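To make the strategy concrete, here is a runnable model over an in-memory array. It assumes each message has a precomputed token count and an importance flag. It approximates summarization by capping an important message at a fixed `SUMMARY_TOKENS` budget. All names are hypothetical, and R's real token counting and summarization are not shown.

```c
#include <stdbool.h>
#include <stddef.h>

#define SUMMARY_TOKENS 50  /* hypothetical size of a summarized message */

typedef struct {
    bool is_system;
    bool important;
    bool summarized;
    bool removed;
    int tokens;
} msg_t;

static int total_tokens(const msg_t *m, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        if (!m[i].removed) sum += m[i].tokens;
    return sum;
}

/* Shrink oldest-first until at or under target; returns the final count. */
int context_shrink(msg_t *m, size_t n, int target_tokens) {
    int current = total_tokens(m, n);
    for (size_t i = 0; i < n && current > target_tokens; i++) {
        if (m[i].removed || m[i].is_system || m[i].summarized) continue;
        if (m[i].important) {
            if (m[i].tokens > SUMMARY_TOKENS) m[i].tokens = SUMMARY_TOKENS;
            m[i].summarized = true;  /* never summarize the same message twice */
        } else {
            m[i].removed = true;
        }
        current = total_tokens(m, n);
    }
    return current;
}
```

Visiting each message at most once guarantees termination. The function can still return a count above the target when only system or already-summarized messages remain.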

### Sub-Agent Spawning

```c
// Spawn specialized sub-agent (simplified)
agent_handle agent_spawn(const char *role, const char *goal) {
    // Check spawn limits; max_spawn_depth and max_total_spawns come from
    // R_MAX_SPAWN_DEPTH (default 5) and R_MAX_TOTAL_SPAWNS (default 20)
    if (current_depth >= max_spawn_depth) return NULL;
    if (total_spawns >= max_total_spawns) return NULL;

    // Create specialized agent
    agent_handle agent = agent_create(goal, NULL);
    if (agent == NULL) return NULL;
    agent_set_role(agent, role);
    agent_set_manager_id(agent, parent_id);
    total_spawns++;  // count this spawn against the global limit

    // Assign specialized tool registry based on role
    if (strcmp(role, "researcher") == 0) {
        agent_set_tool_registry(agent, tool_registry_get_specialized(TOOL_TYPE_RESEARCHER));
    } else if (strcmp(role, "developer") == 0) {
        agent_set_tool_registry(agent, tool_registry_get_specialized(TOOL_TYPE_DEVELOPER));
    }

    return agent;
}
```
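The two spawn limits can be tracked with a small counter structure. This sketch assumes the defaults from the configuration table and treats depth as the parent's level below the root agent (root = 0), matching the `current_depth >= max_spawn_depth` check above. `spawn_budget_t` and its function are hypothetical. A multi-threaded caller would need to guard `total` with a mutex.

```c
#include <stdbool.h>

/* Hypothetical spawn accounting shared by an agent tree. */
typedef struct {
    int max_depth;  /* e.g. R_MAX_SPAWN_DEPTH, default 5 */
    int max_total;  /* e.g. R_MAX_TOTAL_SPAWNS, default 20 */
    int total;      /* sub-agents spawned so far across the whole tree */
} spawn_budget_t;

/* Reserve a slot for a child of an agent at `parent_depth` (root = 0).
 * Returns true and counts the spawn only if both limits allow it. */
bool spawn_budget_try_reserve(spawn_budget_t *b, int parent_depth) {
    if (parent_depth >= b->max_depth) return false;  /* tree would get too deep */
    if (b->total >= b->max_total) return false;      /* global cap reached */
    b->total++;
    return true;
}
```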

## DeepSearch Algorithm

### Overview

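The DeepSearch tool implements an iterative, LLM-driven research algorithm. Instead of running a single search, it has the LLM generate several complementary queries, runs them in parallel, discards results with too little content, and issues follow-up queries until enough valid results are collected or the iteration limit is reached.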

### Algorithm Phases

```
┌─────────────────────────────────────────────────────────────────┐
│                    DEEPSEARCH ALGORITHM                          │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: QUERY GENERATION                                        │
│ - Input: Research subject (user query)                          │
│ - Process: LLM generates 5-8 diverse, specific queries          │
│ - Output: Array of search queries                               │
│                                                                  │
│ Prompt strategy:                                                 │
│ "Generate 5-8 diverse, specific search queries that will        │
│  comprehensively cover the given subject from multiple angles"  │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: PARALLEL SEARCH EXECUTION                               │
│ - Input: Array of queries                                       │
│ - Process: Execute all queries concurrently via pthreads        │
│ - Output: Raw search results (JSON)                             │
│                                                                  │
│ Concurrency model:                                               │
│ for each query:                                                  │
│     pthread_create(search_thread_func, query)                   │
│ for each thread:                                                 │
│     pthread_join(thread)                                        │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: CONTENT EXTRACTION & VALIDATION                         │
│ - Input: Raw JSON results                                       │
│ - Process: Parse and filter results by content length           │
│ - Criteria: MIN_CONTENT_LENGTH >= 100 characters                │
│ - Output: Validated content array                               │
│                                                                  │
│ Validation function:                                             │
│ count_valid_results(json) -> int valid_count                    │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 4: ITERATION DECISION                                      │
│ - Condition: total_valid_results >= MIN_VALID_RESULTS * iter?   │
│ - If YES: Proceed to Phase 5                                    │
│ - If NO and iter < MAX_ITERATIONS:                              │
│     * Generate research summary from findings                   │
│     * Use summary as context for follow-up queries              │
│     * Return to Phase 1                                         │
│                                                                  │
│ Follow-up prompt:                                               │
│ "Based on what has been found so far, generate 4-6 follow-up   │
│  queries to explore gaps and deeper aspects"                    │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 5: RESULT MERGING                                          │
│ - Input: All validated results from all iterations              │
│ - Process: Merge into unified JSON structure                    │
│ - Output: Combined results object                               │
│                                                                  │
│ Merge format:                                                   │
│ {                                                               │
│   "results": [                                                  │
│     { "title": "...", "content": "...", "url": "..." },         │
│     ...                                                         │
│   ]                                                             │
│ }                                                               │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 6: RETURN TO LLM                                           │
│ - Format merged results as JSON string                          │
│ - Return to agent for synthesis and reporting                   │
└─────────────────────────────────────────────────────────────────┘
```

### Configuration Parameters

```c
// DeepSearch tunable parameters
#define MAX_QUERIES 8              // Maximum queries per iteration
#define QUERY_GENERATION_MAX_TOKENS 2048
#define MAX_ITERATIONS 3           // Maximum research iterations
#define MIN_CONTENT_LENGTH 100     // Minimum content to be considered valid
#define MIN_VALID_RESULTS 5        // Valid results needed per iteration
```
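Phase 4's stopping rule can be written as a small predicate over these parameters. This sketch assumes the condition exactly as stated in the diagram (`total_valid_results >= MIN_VALID_RESULTS * iter`, with `iter` counted from 1). `deepsearch_should_continue` is a hypothetical name.

```c
#include <stdbool.h>

#define MAX_ITERATIONS 3
#define MIN_VALID_RESULTS 5

/* After finishing iteration `iter` (1-based), decide whether to run
 * another round of follow-up queries. */
bool deepsearch_should_continue(int iter, int total_valid_results) {
    if (total_valid_results >= MIN_VALID_RESULTS * iter) {
        return false;              /* enough material: merge and return */
    }
    return iter < MAX_ITERATIONS;  /* otherwise retry while budget remains */
}
```

Assuming one search per query, the defaults bound a single DeepSearch call at `MAX_ITERATIONS * MAX_QUERIES` = 3 × 8 = 24 searches.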

## Database Schema

### SQLite Database Location
`~/.r.db` (or path specified by `R_DB_PATH`)

### Schema Definition

```sql
-- Agent tracking table
CREATE TABLE agents (
    agent_id TEXT PRIMARY KEY,
    role TEXT,
    manager_id TEXT,
    department TEXT,
    budget_limit INTEGER DEFAULT 0,
    used_tokens INTEGER DEFAULT 0,
    status TEXT DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_heartbeat TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (manager_id) REFERENCES agents(agent_id)
);

-- Research tasks table (for research_dispatcher tool)
CREATE TABLE research_tasks (
    url_hash TEXT PRIMARY KEY,
    url TEXT NOT NULL,
    status TEXT DEFAULT 'pending', -- pending, processing, completed, failed
    summary TEXT,
    batch_id TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- GTR (Goal-Task-Result) tracking
CREATE TABLE gtr_tasks (
    task_id TEXT PRIMARY KEY,
    parent_task_id TEXT,
    goal TEXT NOT NULL,
    status TEXT DEFAULT 'pending',
    assigned_agent TEXT,
    result TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,
    FOREIGN KEY (parent_task_id) REFERENCES gtr_tasks(task_id),
    FOREIGN KEY (assigned_agent) REFERENCES agents(agent_id)
);

-- Audit logging
CREATE TABLE audit_logs (
    log_id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_id TEXT,
    action TEXT NOT NULL,
    details TEXT,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (agent_id) REFERENCES agents(agent_id)
);

-- Session storage
CREATE TABLE sessions (
    session_id TEXT PRIMARY KEY,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_accessed TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Snapshots for checkpoint/restore
CREATE TABLE snapshots (
    snapshot_id TEXT PRIMARY KEY,
    session_id TEXT,
    description TEXT,
    state_json TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (session_id) REFERENCES sessions(session_id)
);
```
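
The comment on `research_tasks.status` suggests a small lifecycle. The sketch below encodes it as a C enum with a transition check. The enum, the function names, and the rule that `completed` and `failed` are terminal (no retries) are assumptions for illustration, not taken from R's source.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the documented research_tasks.status values. */
typedef enum {
    TASK_PENDING,
    TASK_PROCESSING,
    TASK_COMPLETED,
    TASK_FAILED,
    TASK_UNKNOWN
} task_status_t;

/* Map the TEXT column value to the enum. */
static task_status_t task_status_parse(const char *s) {
    if (s == NULL) return TASK_UNKNOWN;
    if (strcmp(s, "pending") == 0) return TASK_PENDING;
    if (strcmp(s, "processing") == 0) return TASK_PROCESSING;
    if (strcmp(s, "completed") == 0) return TASK_COMPLETED;
    if (strcmp(s, "failed") == 0) return TASK_FAILED;
    return TASK_UNKNOWN;
}

/* Assumed lifecycle: pending -> processing -> completed | failed.
   Terminal states accept no further transitions. */
static bool task_status_can_transition(task_status_t from, task_status_t to) {
    switch (from) {
    case TASK_PENDING:
        return to == TASK_PROCESSING;
    case TASK_PROCESSING:
        return to == TASK_COMPLETED || to == TASK_FAILED;
    default:
        return false;
    }
}
```

Whatever code performs the transition, note that `DEFAULT CURRENT_TIMESTAMP` only applies on `INSERT`, so an `UPDATE` that changes `status` must set `updated_at` itself (for example `updated_at = CURRENT_TIMESTAMP`). Likewise, SQLite only enforces the `FOREIGN KEY` clauses in this schema on connections that have executed `PRAGMA foreign_keys = ON;`.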

## API Integration

### LLM API Protocol

R uses OpenAI-compatible API endpoints:

#### Request Format
```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "...", "tool_calls": [...]},
    {"role": "tool", "tool_call_id": "...", "content": "..."}
  ],
  "tools": [...],
  "tool_choice": "auto",
  "temperature": 0.1,
  "max_tokens": 4096
}
```

#### Response Format
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "gpt-4o-mini",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "...",
      "tool_calls": [{
        "id": "call_...",
        "type": "function",
        "function": {
          "name": "web_search",
          "arguments": "{\"query\": \"...\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
```
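
In a ReAct loop, `finish_reason` decides what happens next: `"tool_calls"` means each requested call must be executed and answered with a `role: "tool"` message carrying the matching `tool_call_id`, while `"stop"` marks a final answer. The sketch below maps the value to a next step. The enum and function are hypothetical; only `"tool_calls"`, `"stop"`, and `"length"` are handled, and any other value (such as `"content_filter"`) falls through to an error.

```c
#include <string.h>

/* Hypothetical next-step decision for one ReAct turn. */
typedef enum {
    STEP_RUN_TOOLS, /* run tool_calls, append "tool" messages, call the LLM again */
    STEP_FINISH,    /* model produced a final answer */
    STEP_TRUNCATED, /* max_tokens reached; output is incomplete */
    STEP_ERROR      /* missing or unhandled finish_reason */
} react_step_t;

/* Map choices[0].finish_reason to the next step. */
static react_step_t react_next_step(const char *finish_reason) {
    if (finish_reason == NULL) return STEP_ERROR;
    if (strcmp(finish_reason, "tool_calls") == 0) return STEP_RUN_TOOLS;
    if (strcmp(finish_reason, "stop") == 0) return STEP_FINISH;
    if (strcmp(finish_reason, "length") == 0) return STEP_TRUNCATED;
    return STEP_ERROR;
}
```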

### HTTP Client Configuration

```c
#include <stdbool.h>  // for bool

// HTTP timeout settings
typedef struct {
    long connection_timeout_ms;     // 30000 (30 seconds)
    long operation_timeout_ms;      // 300000 (5 minutes)
    bool follow_redirects;          // true
    long max_redirects;             // 10
} http_config_t;
```

### Web Search API

Search is performed via `rsearch.app.molodetz.nl`:

```
GET https://rsearch.app.molodetz.nl/search?query=<encoded>&content=true

Response:
{
  "results": [
    {
      "title": "...",
      "url": "...",
      "content": "...",
      "published": "..."
    }
  ]
}
```
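
The `query` value must be percent-encoded before it is placed in the URL. A minimal sketch using the RFC 3986 unreserved character set follows; `url_encode` and `search_url` are hypothetical names, and a libcurl-based client could use `curl_easy_escape()` for the same job.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Percent-encode per RFC 3986: unreserved characters (A-Z a-z 0-9 - . _ ~)
   pass through; every other byte becomes %XX. Caller frees. */
static char *url_encode(const char *in) {
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(in);
    char *out = malloc(len * 3 + 1); /* worst case: every byte expands to 3 */
    if (out == NULL) return NULL;

    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~') {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}

/* Build the search request URL for a raw query. Caller frees. */
static char *search_url(const char *query) {
    static const char base[] = "https://rsearch.app.molodetz.nl/search?query=";
    static const char suffix[] = "&content=true";
    char *enc = url_encode(query);
    if (enc == NULL) return NULL;

    size_t n = strlen(base) + strlen(enc) + strlen(suffix) + 1;
    char *url = malloc(n);
    if (url != NULL) {
        snprintf(url, n, "%s%s%s", base, enc, suffix);
    }
    free(enc);
    return url;
}
```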

## Security Model

### Threat Model

| Threat | Mitigation |
|--------|------------|
| API Key Exposure | Keys only in environment variables, never logged |
| Command Injection | Input validation, parameterized commands |
| Path Traversal | Path canonicalization, working directory checks |
| Resource Exhaustion | Iteration limits, token budgets, timeouts |
| Unrestricted File Access | User confirmation for destructive operations |
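
The iteration-limit and token-budget mitigations can be combined into one check before each LLM call. This sketch is an assumption, not R's code: it charges `usage.total_tokens` from each response against the `budget_limit` and `used_tokens` columns of the `agents` table, treats `budget_limit = 0` as unlimited (inferred only from the column's `DEFAULT 0`), and uses the 300-iteration ReAct cap from the feature list.

```c
#include <stdbool.h>

#define REACT_MAX_ITERATIONS 300 /* ReAct cap from the feature list */

/* Hypothetical in-memory mirror of agents.budget_limit / agents.used_tokens. */
typedef struct {
    long long budget_limit; /* assumed: 0 means "no limit" */
    long long used_tokens;
} agent_budget_t;

/* Charge usage.total_tokens from one API response to the agent. */
static void budget_charge(agent_budget_t *b, long long total_tokens) {
    if (total_tokens > 0) {
        b->used_tokens += total_tokens;
    }
}

/* True while the agent may make another LLM call. */
static bool react_may_continue(const agent_budget_t *b, int iterations_done) {
    if (iterations_done >= REACT_MAX_ITERATIONS) {
        return false;                        /* iteration limit reached */
    }
    if (b->budget_limit == 0) {
        return true;                         /* unlimited budget */
    }
    return b->used_tokens < b->budget_limit; /* budget not yet spent */
}
```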

### Sandboxing

- File operations are restricted to the working directory
- Shell commands are executed in a subprocess with a limited environment
- Python code runs in an isolated interpreter instance
- Network access is limited to the configured HTTP endpoints
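
Path canonicalization plus a working-directory check can be sketched with POSIX `realpath(3)`. The function name `path_is_within` is hypothetical. Because `realpath` requires the path to exist, a tool that creates files would need to canonicalize the parent directory instead, and any check of this kind is still racy if the filesystem changes between check and use.

```c
#define _XOPEN_SOURCE 700 /* expose realpath() under -std=c99 */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: true if `candidate` resolves to `root` or a path beneath it.
   realpath() follows symlinks and removes "." and ".." components. */
static bool path_is_within(const char *root, const char *candidate) {
    char *r = realpath(root, NULL);      /* heap-allocated canonical paths */
    char *c = realpath(candidate, NULL);
    bool ok = false;

    if (r != NULL && c != NULL) {
        size_t rl = strlen(r);
        if (strcmp(r, "/") == 0) {
            ok = true; /* filesystem root contains every existing path */
        } else if (strncmp(c, r, rl) == 0 && (c[rl] == '\0' || c[rl] == '/')) {
            ok = true; /* '/' boundary: "/work" must not match "/workshop" */
        }
    }
    free(r); /* free(NULL) is a no-op */
    free(c);
    return ok;
}
```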

### Audit Logging

All agent actions are logged to the `audit_logs` table:
- Agent ID
- Action type
- Details (sanitized)
- Timestamp
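
This README does not specify how `details` is sanitized. One plausible step, shown here as a hypothetical sketch, is to redact a known secret (such as the API key value) from the text before it is inserted into `audit_logs`. The function name `audit_redact` is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: return a heap copy of `details` with every non-overlapping
   occurrence of `secret` replaced by "[REDACTED]". Caller frees.
   Returns NULL on allocation failure. */
static char *audit_redact(const char *details, const char *secret) {
    static const char mask[] = "[REDACTED]";
    size_t dlen = strlen(details);
    size_t slen = (secret != NULL) ? strlen(secret) : 0;
    size_t mlen = sizeof mask - 1;

    if (slen == 0) { /* nothing to redact: plain copy */
        char *copy = malloc(dlen + 1);
        if (copy != NULL) memcpy(copy, details, dlen + 1);
        return copy;
    }

    /* First pass: count matches so the output can be sized exactly. */
    size_t hits = 0;
    for (const char *p = strstr(details, secret); p != NULL; p = strstr(p + slen, secret)) {
        hits++;
    }

    char *out = malloc(dlen - hits * slen + hits * mlen + 1);
    if (out == NULL) return NULL;

    /* Second pass: copy text between matches, substituting the mask. */
    char *w = out;
    const char *r = details;
    for (const char *p = strstr(r, secret); p != NULL; p = strstr(r, secret)) {
        memcpy(w, r, (size_t)(p - r));
        w += p - r;
        memcpy(w, mask, mlen);
        w += mlen;
        r = p + slen;
    }
    strcpy(w, r); /* tail after the last match, including '\0' */
    return out;
}
```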

## Performance Characteristics

### Memory Usage

| Component | Typical | Maximum |
|-----------|---------|---------|
| Base binary | ~2 MB | ~2 MB |
| Message history | ~1-10 MB | ~50 MB (configurable) |
| Tool execution | ~100 KB per tool | ~10 MB (parallel) |
| Database cache | ~1 MB | ~5 MB |

### Execution Times

| Operation | Typical | Worst Case |
|-----------|---------|------------|
| LLM API call | 1-5s | 60s (timeout) |
| Web search | 500ms | 10s |
| DeepSearch (full) | 10-30s | 120s |
| File I/O | <1ms | 100ms |
| Shell command | Depends on command | 300s (background threshold) |

### Throughput

- Maximum parallel tool executions: 25 (one per tool)
- Maximum concurrent HTTP requests: Limited by curl multi-handle
- Message processing rate: ~1000 tokens/second

## Development Guide

### Adding a New Tool

1. Create `src/tools/tool_mytool.c`:
```c
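// retoor <retoor@molodetz.nl>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "tool.h"

static char *mytool_execute(tool_t *self, struct json_object *args) {
    (void)self;  // unused in this example

    // Parse arguments
    struct json_object *arg_obj;
    if (!json_object_object_get_ex(args, "myarg", &arg_obj)) {
        return strdup("Error: missing 'myarg' argument");
    }

    // Execute tool logic (json_object_get_string returns NULL for a JSON null)
    const char *value = json_object_get_string(arg_obj);
    if (value == NULL) {
        return strdup("Error: 'myarg' must not be null");
    }

    // Return result (caller frees); size the buffer to fit the value
    const char *prefix = "Processed: ";
    size_t len = strlen(prefix) + strlen(value) + 1;
    char *result = malloc(len);
    if (result == NULL) {
        return strdup("Error: out of memory");
    }
    snprintf(result, len, "%s%s", prefix, value);
    return result;
}

static void mytool_print_action(const char *name, struct json_object *args) {
    (void)name;  // unused in this example
    (void)args;
    fprintf(stderr, "  -> MyTool executing\n");
}

static struct json_object *mytool_get_description(void) {
    struct json_object *root = json_object_new_object();
    json_object_object_add(root, "type", json_object_new_string("function"));

    struct json_object *function = json_object_new_object();
    json_object_object_add(function, "name", json_object_new_string("my_tool"));
    json_object_object_add(function, "description",
        json_object_new_string("Description of what my tool does"));

    // Define parameters schema (JSON Schema): one required string "myarg"
    struct json_object *parameters = json_object_new_object();
    json_object_object_add(parameters, "type", json_object_new_string("object"));

    struct json_object *properties = json_object_new_object();
    struct json_object *myarg = json_object_new_object();
    json_object_object_add(myarg, "type", json_object_new_string("string"));
    json_object_object_add(myarg, "description", json_object_new_string("Input to process"));
    json_object_object_add(properties, "myarg", myarg);
    json_object_object_add(parameters, "properties", properties);

    struct json_object *required = json_object_new_array();
    json_object_array_add(required, json_object_new_string("myarg"));
    json_object_object_add(parameters, "required", required);

    json_object_object_add(function, "parameters", parameters);

    json_object_object_add(root, "function", function);
    return root;
}

static const tool_vtable_t mytool_vtable = {
    .get_description = mytool_get_description,
    .execute = mytool_execute,
    .print_action = mytool_print_action
};

static tool_t mytool = { .vtable = &mytool_vtable, .name = "my_tool" };

tool_t *tool_my_tool_create(void) { return &mytool; }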
```

2. Add declaration to `src/tools/tools_init.c`:
```c
extern tool_t *tool_my_tool_create(void);
```

3. Register in `tools_get_registry()`:
```c
tool_registry_register(global_registry, tool_my_tool_create());
```

4. Add to Makefile `SRC_TOOLS`:
```makefile
SRC_TOOLS = ... src/tools/tool_mytool.c
```

5. Rebuild:
```bash
make clean && make build
```

### Coding Standards

#### Naming Conventions
- Types: `r_<module>_handle` for opaque handles
- Functions: `r_<module>_<action>` for public APIs
- Files: `r_<module>.c` for core, `tool_<name>.c` for tools
- Constants: `UPPER_CASE_WITH_UNDERSCORES`

#### Header Style
All C files must start with:
```c
// retoor <retoor@molodetz.nl>
```

#### Error Handling
```c
// Return status codes
r_status_t function_name(...) {
    if (error_condition) {
        r_error_set("Error message");
        return R_ERROR_INVALID;
    }
    return R_SUCCESS;
}
```

#### Memory Management
```c
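// Use wrappers from core/memory.c
void *ptr = r_malloc(size);

// Assuming r_realloc mirrors realloc: on success the old block is released,
// so keep only new_ptr; on failure (NULL) the original ptr is still valid.
void *new_ptr = r_realloc(ptr, new_size);
if (new_ptr != NULL) {
    ptr = new_ptr;
}
r_free(ptr);

// Strings
char *str = r_strdup(source);
r_free(str);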
```

### Testing

```bash
# Run Python test framework
python3 testit.py

# Manual testing
./r "Test prompt here"
```

### Debugging

```bash
# Enable verbose mode
./r --verbose "Your prompt"

# Debug build with symbols
make debug

# Run with gdb
gdb ./r
(gdb) run "test prompt"
```

---

# APPENDIX

## Complete Tool Reference

### Tool: `linux_terminal_execute`
**Description:** Execute a shell command with real-time output and PID tracking.

**Parameters:**
- `command` (string, required): The shell command to execute.
- `description` (string, optional): Description of what the command does.

**Returns:** Command output, exit status, and PID.

**Example:**
```json
{
  "command": "ls -la",
  "description": "List all files in current directory"
}
```

### Tool: `web_search`
**Description:** Searches for information using search engines.

**Parameters:**
- `query` (string, required): The search query.

**Returns:** JSON array of search results with title, URL, content.

### Tool: `deepsearch`
**Description:** Performs intelligent iterative deep research.

**Parameters:**
- `query` (string, required): The research subject.

**Algorithm:**
1. Generate 5-8 diverse queries using LLM
2. Execute queries concurrently
3. Extract and validate content
4. Iterate if insufficient results
5. Merge all results

**Returns:** Merged JSON results from all iterations.

### Tool: `spawn_agent`
**Description:** Spawn a specialized sub-agent for parallel task execution.

**Parameters:**
- `role` (string, required): Agent role (researcher, developer, security).
- `goal` (string, required): Task description for the agent.
- `budget` (integer, optional): Token budget limit.

**Returns:** Agent ID and status.

## Error Codes

```c
typedef enum {
    R_SUCCESS = 0,
    R_ERROR_INVALID = -1,
    R_ERROR_MEMORY = -2,
    R_ERROR_IO = -3,
    R_ERROR_NETWORK = -4,
    R_ERROR_API = -5,
    R_ERROR_TOOL = -6,
    R_ERROR_TIMEOUT = -7,
    R_ERROR_PERMISSION = -8,
    R_ERROR_NOT_FOUND = -9,
    R_ERROR_ALREADY_EXISTS = -10
} r_status_t;
```
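
When reporting these codes to the user or writing them to the audit log, a lookup from code to text is convenient. The helper `r_status_to_string` below is hypothetical (it is not part of the documented API); the enum is repeated so the sketch compiles on its own.

```c
#include <string.h>

typedef enum {
    R_SUCCESS = 0,
    R_ERROR_INVALID = -1,
    R_ERROR_MEMORY = -2,
    R_ERROR_IO = -3,
    R_ERROR_NETWORK = -4,
    R_ERROR_API = -5,
    R_ERROR_TOOL = -6,
    R_ERROR_TIMEOUT = -7,
    R_ERROR_PERMISSION = -8,
    R_ERROR_NOT_FOUND = -9,
    R_ERROR_ALREADY_EXISTS = -10
} r_status_t;

/* Hypothetical: short human-readable text for a status code. */
static const char *r_status_to_string(r_status_t s) {
    switch (s) {
    case R_SUCCESS:              return "success";
    case R_ERROR_INVALID:        return "invalid argument";
    case R_ERROR_MEMORY:         return "out of memory";
    case R_ERROR_IO:             return "I/O error";
    case R_ERROR_NETWORK:        return "network error";
    case R_ERROR_API:            return "API error";
    case R_ERROR_TOOL:           return "tool error";
    case R_ERROR_TIMEOUT:        return "timeout";
    case R_ERROR_PERMISSION:     return "permission denied";
    case R_ERROR_NOT_FOUND:      return "not found";
    case R_ERROR_ALREADY_EXISTS: return "already exists";
    }
    return "unknown status"; /* value outside the enum */
}
```

Using a `switch` without a `default` lets compilers such as GCC and Clang warn (`-Wswitch`) when a new enumerator is added but not handled here.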

## License

MIT License - See source headers for details.

---

*This README was generated for R - Autonomous AI Agent CLI v2.0.0*
*Author: retoor <retoor@molodetz.nl>*