// retoor <retoor@molodetz.nl>
# R - Autonomous AI Agent CLI
**Version:** 2.0.0
**Author:** retoor <retoor@molodetz.nl>
**License:** MIT
**Language:** C99
**Build System:** GNU Make
---
# 📖 TABLE OF CONTENTS
1. [User Guide](#user-guide)
   - [What is R?](#what-is-r)
   - [Key Features](#key-features)
   - [Installation](#installation)
   - [Quick Start](#quick-start)
   - [Use Cases](#use-cases)
   - [Interactive Commands](#interactive-commands)
2. [Technical Specification](#technical-specification)
   - [Architecture Overview](#architecture-overview)
   - [Build System](#build-system)
   - [Configuration](#configuration)
   - [Tool System](#tool-system)
   - [Agent System](#agent-system)
   - [DeepSearch Algorithm](#deepsearch-algorithm)
   - [Database Schema](#database-schema)
   - [API Integration](#api-integration)
   - [Security Model](#security-model)
   - [Performance Characteristics](#performance-characteristics)
   - [Development Guide](#development-guide)
---
# USER GUIDE
## What is R?
R is a high-performance, autonomous AI agent command-line interface written in C. It implements the **ReAct (Reasoning + Acting)** paradigm, enabling complex multi-turn task execution through an intelligent loop of:
1. **Reasoning** - Analyzing the current state and planning next steps
2. **Acting** - Executing tools (shell commands, file operations, web searches)
3. **Observing** - Processing tool outputs
4. **Refining** - Adjusting the plan based on observations
Unlike simple chatbots, R can:
- Write and execute code
- Browse the web and research topics
- Manage files and directories
- Spawn sub-agents for parallel task execution
- Monitor and control background processes
- Maintain persistent state across sessions
## Key Features
### 🧠 Autonomous Agent Architecture
- **ReAct Loop:** Up to 300 iterations of reasoning-acting-observing cycles
- **Sub-agent Spawning:** Hierarchical agent delegation with budget tracking
- **Plan-Execute-Verify-Conclude Lifecycle:** Structured task management
- **Context Management:** Automatic context window optimization
### 🔧 25+ Built-in Tools
All tools are OpenAI-compatible with strict JSON schema validation:
| Category | Tools |
|----------|-------|
| **Terminal** | `linux_terminal_execute`, `terminal_interactive` |
| **File System** | `read_file`, `write_file`, `directory_glob`, `mkdir`, `chdir`, `getpwd` |
| **Code** | `python_execute`, `code_grep`, `code_symbol_find` |
| **File Edit** | `file_line_replace`, `file_apply_patch` |
| **HTTP/Web** | `http_fetch`, `web_search`, `web_search_news`, `deepsearch` |
| **Database** | `db_get`, `db_set`, `db_query` |
| **Process** | `process_monitor`, `process_get_status`, `process_terminate` |
| **Network** | `network_check`, `dns_lookup`, `network_port_scan` |
| **Research** | `index_source_directory`, `research_dispatcher`, `fetch_and_scrape` |
| **Automation** | `automation_fuzz`, `automation_exploit_gen` |
| **Data** | `csv_export` |
| **Agent** | `spawn_agent` |
| **Snapshot** | `create_snapshot`, `list_snapshots`, `restore_snapshot` |
### ⚡ Performance
- **Parallel Tool Execution:** pthread-based concurrent tool runs
- **Asynchronous Background Tasks:** Non-blocking long-running operations
- **Real-time Output Streaming:** Live process output with PID tracking
- **Memory Efficient:** Context window management with intelligent shrinking
### 🎨 Terminal Experience
- **Syntax Highlighting:** Python code with ANSI colors and line numbers
- **Markdown Rendering:** Headers, lists, code blocks, tables
- **Process Traceability:** Every output line tagged with `[PID]`
- **Spinner Indicators:** Visual feedback for HTTP operations
## Installation
### Prerequisites
#### Ubuntu/Debian
```bash
sudo apt-get update
sudo apt-get install -y \
gcc make \
libreadline-dev \
libncurses5-dev \
libcurl4-openssl-dev \
libssl-dev \
libjson-c-dev \
libsqlite3-dev \
libtinfo-dev \
libgnutls28-dev \
libgmp-dev
```
#### macOS (with Homebrew)
```bash
brew install readline ncurses curl json-c sqlite3 openssl gmp gnutls
```
#### Fedora/RHEL
```bash
sudo dnf install -y \
gcc make \
readline-devel \
ncurses-devel \
libcurl-devel \
openssl-devel \
json-c-devel \
sqlite-devel \
gnutls-devel \
gmp-devel
```
### Build & Install
```bash
# Clone repository
git clone <repository-url>
cd r
# Standard build
make build
# Build with debug symbols
make debug
# Install system-wide
sudo make install
# Verify installation
r --help
```
### Docker Build
```bash
make docker_make
make docker_run
```
### AppImage Build
```bash
make appimage
./r-x86_64.AppImage
```
## Quick Start
### 1. Set up API Key
```bash
export R_KEY="your-api-key-here"
```
Or use fallback environment variables:
- `OPENROUTER_API_KEY`
- `OPENAI_API_KEY`
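The fallback order can be pictured as a first-match lookup over the three variables. The following is a minimal sketch, not R's actual code: `resolve_api_key` is a hypothetical helper name, and the sketch returns `NULL` when nothing is set rather than applying any built-in default.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of R): returns the first non-empty API key
 * found in the environment, or NULL when none of the variables is set.
 * The lookup order mirrors the fallback list: R_KEY, then
 * OPENROUTER_API_KEY, then OPENAI_API_KEY. */
static const char *resolve_api_key(void) {
    static const char *const names[] = {
        "R_KEY", "OPENROUTER_API_KEY", "OPENAI_API_KEY"
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        const char *value = getenv(names[i]);
        if (value != NULL && value[0] != '\0') {
            return value;   /* first match wins */
        }
    }
    return NULL;
}
```

Treating an empty string as unset is a design choice of this sketch; it avoids sending a blank key when a variable is exported but left empty.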
### 2. Interactive Mode (REPL)
```bash
./r
```
REPL Commands:
- `!clear` - Clear session history
- `!dump` - Export message history as JSON
- `!session` - Show current session ID
- `!new` - Start new session
- `!tools` - List available tools
- `!models` - List available models
- `!model <name>` - Switch model
- `!verbose` - Toggle verbose mode
- `!vi` / `!emacs` - Set editing mode
- `exit` - Quit
### 3. Single Command Mode
```bash
# Direct prompt
./r "List all files in current directory"
# Read from stdin
echo "Analyze this code" | ./r --stdin
# Include Python file as context
./r --py script.py "Explain this code"
# Include custom context file
./r --context project.txt "Summarize the architecture"
# Specify session
./r -s mysession "Your prompt"
# API mode (JSON output)
./r --api "Your prompt"
# Disable syntax highlighting
./r --nh "Your prompt"
```
## Use Cases
### Software Development
#### Code Review & Analysis
```bash
# Review a codebase
./r --py main.py "Review this code for security issues and best practices"
# Understand complex code
./r "Index the src/ directory and explain the architecture"
# Refactoring assistance
./r --py module.py "Refactor this function to use better error handling"
```
#### Automated Testing
```bash
# Generate test cases
./r "Read src/calculator.c and generate comprehensive unit tests"
# Test execution with monitoring
./r "Run the test suite in background and monitor progress"
```
#### Documentation Generation
```bash
# Generate API documentation
./r "Read all header files in include/ and generate API documentation"
# Create README from code
./r "Analyze the codebase and create a comprehensive README.md"
```
### DevOps & System Administration
#### System Monitoring
```bash
# Check system health
./r "Monitor system processes and report any anomalies"
# Log analysis
./r "Analyze /var/log/syslog for error patterns in the last 24 hours"
# Network diagnostics
./r "Check network connectivity and DNS resolution for google.com"
```
#### Automation Scripts
```bash
# Batch file processing
./r "Find all .log files in /var/log and compress those older than 7 days"
# Configuration management
./r "Update all configuration files to use the new API endpoint"
# Deployment automation
./r "Execute deployment script and verify all services are running"
```
### Research & Data Analysis
#### Web Research
```bash
# Quick search
./r "Search for the latest news on renewable energy"
# Deep research (multi-iteration)
./r "Deep search: What are the current best practices for microservices architecture?"
# Competitive analysis
./r "Research the top 5 competitors in the cloud storage market"
```
#### Data Processing
```bash
# CSV analysis
./r "Read data.csv and calculate summary statistics for each column"
# Python data science
./r "Load the dataset and create visualizations of the trends"
# Database queries
./r "Query the local database for all agents created in the last hour"
```
### Cybersecurity
#### Security Auditing
```bash
# Port scanning
./r "Scan localhost for open ports and identify services"
# DNS reconnaissance
./r "Perform DNS lookup and analyze the records for example.com"
# Fuzzing
./r "Fuzz the API endpoint /api/v1/users with common payloads"
```
#### Vulnerability Research
```bash
# Exploit generation
./r "Generate exploit code for CVE-2024-XXXX"
# Security report analysis
./r "Read security_scan.txt and summarize the critical findings"
```
### Content Creation
#### Writing Assistance
```bash
# Technical writing
./r "Write a technical blog post about async/await in Python"
# Documentation
./r "Create user documentation for the file system tools"
# Code comments
./r "Add comprehensive docstrings to all functions in utils.py"
```
### Education & Learning
#### Tutorial Generation
```bash
# Explain concepts
./r "Explain how recursion works with examples in C"
# Code walkthrough
./r "Walk through the quicksort implementation line by line"
# Quiz generation
./r "Create a quiz with 10 questions about data structures"
```
## Interactive Commands
When in REPL mode, special commands prefixed with `!` are available:
| Command | Description |
|---------|-------------|
| `!clear` | Clear session message history |
| `!dump` | Export conversation as JSON |
| `!session` | Display current session identifier |
| `!new` | Start a fresh session |
| `!tools` | List all registered tools with descriptions |
| `!models` | Query and display available LLM models |
| `!model <name>` | Switch to a different LLM model |
| `!verbose` | Toggle verbose logging mode |
| `!vi` | Set line editor to vi mode |
| `!emacs` | Set line editor to emacs mode |
| `exit` | Quit the application |
---
# TECHNICAL SPECIFICATION
## Architecture Overview
### System Architecture Diagram
```
┌─────────────────────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ REPL Mode │ │ Single Cmd │ │ API Mode │ │ Stdin Pipe │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
└─────────┼─────────────────┼─────────────────┼─────────────────┼──────────────┘
│ │ │ │
└─────────────────┴─────────┬───────┴─────────────────┘
│
┌───────▼────────┐
│ Main Entry │
│ (main.c) │
└───────┬────────┘
│
┌─────────────────────────────────────▼──────────────────────────────────────┐
│ AGENT CORE │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ ReAct Loop │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ Reason │───▶│ Plan │───▶│ Act │───▶│ Observe │──┐ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │ │
│ │ ▲───────────────────────────────┘ │ │
│ │ └──────────────────────────────── │ │
│ │ (max 300 iterations) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Context Manager │ │ Message History │ │ Budget Tracker │ │
│ │(agent.c:context)│ │ (messages.c) │ │ (agent.c:tok) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────▼──────────────────────────────────────┐
│ TOOL SYSTEM │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Tool Registry (tool_registry.c) │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │Terminal │ │ File │ │ HTTP │ │ Python │ │ Agent │ ... │ │
│ │ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │ │
│ └───────┼───────────┼───────────┼───────────┼───────────┼────────────┘ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Parallel Execution (pthreads) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────▼──────────────────────────────────────┐
│ EXTERNAL INTERFACES │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ LLM API │ │ HTTP/Curl │ │ SQLite │ │ System │ │
│ │(http_client) │ │(http_curl.c) │ │ (db_sqlite) │ │ (bash) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
```
### Module Hierarchy
```
r/
├── src/
│ ├── main.c # Application entry point, CLI parsing, REPL
│ ├── agent.c # Core ReAct loop, agent lifecycle management
│ ├── messages.c # Message history, context window management
│ ├── context_manager.c # Context shrinking, token optimization
│ ├── tool_registry.c # Tool registration, parallel execution
│ ├── bash_executor.c # Shell command execution with PID tracking
│ ├── http_client.c # HTTP client wrapper
│ ├── db.c # SQLite database operations
│ ├── markdown.c # Markdown to ANSI rendering
│ ├── r_config.c # Configuration management (singleton)
│ ├── tools/ # Tool implementations
│ │ ├── tools_init.c # Tool registration coordinator
│ │ ├── tool_terminal.c # Shell execution tools
│ │ ├── tool_python.c # Python execution tool
│ │ ├── tool_file.c # File I/O tools
│ │ ├── tool_file_edit.c# File editing tools
│ │ ├── tool_http.c # HTTP and web search tools
│ │ ├── tool_deepsearch.c# Deep research tool
│ │ ├── tool_agent.c # Sub-agent spawning
│ │ └── ... (17 more)
│ ├── core/ # Core utilities
│ │ ├── memory.c # Memory management wrappers
│ │ ├── string.c # String utilities
│ │ └── buffer.c # Dynamic buffer
│ ├── util/ # General utilities
│ │ ├── path.c # Path manipulation
│ │ └── time.c # Time utilities
│ ├── impl/ # Implementation wrappers
│ │ ├── db_sqlite.c # SQLite implementation
│ │ └── http_curl.c # libcurl implementation
│ └── interfaces/ # Interface abstractions
│ ├── config.c
│ ├── logger.c
│ └── http.h
├── include/ # Public header files
│ ├── agent.h
│ ├── tool.h
│ ├── messages.h
│ ├── r_config.h
│ └── ... (10 more)
├── build/ # Build artifacts (object files)
├── bin/ # Binary output directory
└── test_results/ # Test output files
```
## Build System
### Makefile Targets
| Target | Description | Dependencies |
|--------|-------------|--------------|
| `build` | Standard optimized build | All object files |
| `debug` | Build with debug symbols (`-g`, `-O0`) | All object files |
| `run` | Build and execute with verbose output | `build` |
| `clean` | Remove all build artifacts | - |
| `install` | Install to `/usr/local/bin` | `build` |
| `uninstall` | Remove from `/usr/local/bin` | - |
| `docker` | Build Docker image | Dockerfile |
| `docker_make` | Build using Docker container | Dockerfile |
| `docker_run` | Run in Docker container | `docker_make` |
| `appimage` | Build AppImage | `appimagetool` |
| `build_deb` | Build Debian package | `dpkg-deb` |
### Compiler Flags
#### Standard Build (`make build`)
```makefile
CFLAGS = -Ofast -Werror -Wall -I./include
LDFLAGS = -lreadline -lncurses -lcurl -ljson-c -lsqlite3 -lm -lpthread -lssl -lcrypto
```
#### Debug Build (`make debug`)
```makefile
CFLAGS = -g -O0 -Werror -Wall -I./include -DDEBUG
```
### Source File Categories
```makefile
# Core modules
SRC_CORE = src/main.c src/agent.c src/messages.c src/context_manager.c \
src/tool_registry.c src/bash_executor.c src/http_client.c \
src/db.c src/markdown.c src/r_config.c src/r_diff.c src/r_error.c
# Tool implementations
SRC_TOOLS = src/tools/tools_init.c src/tools/tool_terminal.c \
src/tools/tool_python.c src/tools/tool_file.c \
src/tools/tool_file_edit.c src/tools/tool_agent.c \
src/tools/tool_http.c src/tools/tool_db.c \
src/tools/tool_indexer.c src/tools/tool_code.c \
src/tools/tool_system.c src/tools/tool_enterprise.c \
src/tools/tool_research.c src/tools/tool_network.c \
src/tools/tool_dns.c src/tools/tool_automation.c \
src/tools/tool_csv.c src/tools/tool_deepsearch.c \
src/tools/tool_snapshot.c src/tools/tool_json.c
# Utilities
SRC_UTILS = src/core/memory.c src/core/string.c src/core/buffer.c \
src/util/path.c src/util/time.c src/impl/db_sqlite.c \
src/impl/http_curl.c
```
## Configuration
### Environment Variables
| Variable | Description | Default | Priority |
|----------|-------------|---------|----------|
| `R_KEY` | Primary API key for LLM access | - | 1 |
| `OPENROUTER_API_KEY` | Fallback API key | - | 2 |
| `OPENAI_API_KEY` | Secondary fallback | hardcoded | 3 |
| `R_BASE_URL` | API base URL | `https://api.openai.com` | - |
| `R_MODEL` | Model identifier | `gpt-4o-mini` | - |
| `R_MAX_TOKENS` | Maximum tokens per request | `4096` | - |
| `R_MAX_SPAWN_DEPTH` | Max agent spawn depth | `5` | - |
| `R_MAX_TOTAL_SPAWNS` | Max total spawned agents | `20` | - |
| `R_USE_TOOLS` | Enable tool use | `true` | - |
| `R_USE_STRICT` | Use strict JSON schema mode | `true` | - |
| `R_SESSION` | Session identifier | auto-generated | - |
| `R_SYSTEM_MESSAGE` | Custom system message | - | - |
| `R_VERBOSE` | Enable verbose mode | `false` | - |
### Configuration Resolution
```c
// Configuration precedence (highest to lowest):
// 1. Command-line flags (--model, etc.)
// 2. Environment variables (R_MODEL, etc.)
// 3. .rcontext.txt (project-specific)
// 4. ~/.rcontext.txt (user-global)
// 5. Built-in defaults
```
### Context Files
#### `.rcontext.txt` Format
```
# Project context for R agent
# Lines starting with # are comments
MODEL=gpt-4o
MAX_TOKENS=8192
SYSTEM_MESSAGE=You are working on a C project. Follow Linux kernel coding style.
```
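Judging from the example above, each non-comment line is a `KEY=VALUE` pair. A line parser for that shape might look like the sketch below; `rcontext_parse_line` is a hypothetical name, and details such as skipping leading whitespace and splitting at the first `=` are assumptions rather than documented behavior.

```c
#include <string.h>

/* Hypothetical helper (not part of R): splits one "KEY=VALUE" line in place.
 * Returns 1 and sets *key and *value on success. Returns 0 for blank lines,
 * comment lines starting with '#', and lines without a key or '='.
 * Only the first '=' separates key from value, so values may contain '='. */
static int rcontext_parse_line(char *line, char **key, char **value) {
    line[strcspn(line, "\r\n")] = '\0';        /* strip trailing newline */
    while (*line == ' ' || *line == '\t') {    /* skip leading whitespace */
        line++;
    }
    if (*line == '\0' || *line == '#') {
        return 0;
    }
    char *eq = strchr(line, '=');
    if (eq == NULL || eq == line) {
        return 0;
    }
    *eq = '\0';                                /* terminate the key */
    *key = line;
    *value = eq + 1;
    return 1;
}
```

A caller would typically read the file with `fgets` and pass each buffer to this function, ignoring lines for which it returns 0.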
#### Loading Order
1. Current directory: `./.rcontext.txt`
2. Home directory: `~/.rcontext.txt`
## Tool System
### Tool Architecture
All tools follow a vtable-based architecture for polymorphism:
```c
// Forward declaration so the vtable can refer to tool_t
typedef struct tool tool_t;
// Tool vtable - function pointers for polymorphic behavior
typedef struct {
    struct json_object *(*get_description)(void);
    char *(*execute)(tool_t *self, struct json_object *args);
    void (*print_action)(const char *name, struct json_object *args);
} tool_vtable_t;
// Tool instance
struct tool {
    const tool_vtable_t *vtable;
    const char *name;
};
```
```
### Tool Registration
```c
// Tools are registered in tools_init.c
tool_registry_t *tools_get_registry(void) {
global_registry = tool_registry_create();
// Register all tools
tool_registry_register(global_registry, tool_terminal_create());
tool_registry_register(global_registry, tool_web_search_create());
tool_registry_register(global_registry, tool_deepsearch_create());
// ... 22 more tools
return global_registry;
}
```
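To illustrate how registration and vtable dispatch fit together, here is a dependency-free sketch. Every `demo_*` name is hypothetical, and arguments are passed as a plain string instead of a `json_object` purely to keep the example self-contained; the real registry in `tool_registry.c` may be organized differently.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for tool_t and tool_vtable_t (hypothetical names). */
typedef struct demo_tool demo_tool_t;
typedef struct {
    char *(*execute)(demo_tool_t *self, const char *args);
} demo_vtable_t;
struct demo_tool {
    const demo_vtable_t *vtable;
    const char *name;
};

/* A registry is a growable array of tool pointers. */
typedef struct {
    demo_tool_t **tools;
    size_t count;
    size_t capacity;
} demo_registry_t;

/* Appends a tool, doubling capacity when full. Returns 0 on success. */
static int demo_registry_register(demo_registry_t *reg, demo_tool_t *tool) {
    if (reg->count == reg->capacity) {
        size_t new_cap = reg->capacity ? reg->capacity * 2 : 8;
        demo_tool_t **grown = realloc(reg->tools, new_cap * sizeof *grown);
        if (grown == NULL) {
            return -1;
        }
        reg->tools = grown;
        reg->capacity = new_cap;
    }
    reg->tools[reg->count++] = tool;
    return 0;
}

/* Looks a tool up by name and dispatches through its vtable. The result is
 * heap-allocated and owned by the caller; NULL means the name is unknown. */
static char *demo_registry_execute(demo_registry_t *reg, const char *name,
                                   const char *args) {
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->tools[i]->name, name) == 0) {
            return reg->tools[i]->vtable->execute(reg->tools[i], args);
        }
    }
    return NULL;
}

/* Example tool: returns a copy of its arguments. */
static char *echo_execute(demo_tool_t *self, const char *args) {
    (void)self;
    char *copy = malloc(strlen(args) + 1);
    if (copy != NULL) {
        strcpy(copy, args);
    }
    return copy;
}
static const demo_vtable_t echo_vtable = { .execute = echo_execute };
static demo_tool_t echo_tool = { .vtable = &echo_vtable, .name = "echo" };
```

A linear scan is adequate at this scale: with a few dozen tools, a hash table would add complexity without a measurable benefit.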
### Parallel Tool Execution
Tools are executed in parallel using pthreads when part of the same tool call batch:
```c
// Simplified from tool_registry.c
void tool_registry_execute_parallel(tool_registry_t *reg,
                                    json_object *tool_calls,
                                    json_object *results) {
    // One thread and one context per tool call (count derivation assumed)
    size_t num_tools = json_object_array_length(tool_calls);
    pthread_t *threads = calloc(num_tools, sizeof(pthread_t));
    tool_exec_context_t *contexts = calloc(num_tools, sizeof(tool_exec_context_t));
    // ... fill contexts[i] from tool_calls (elided) ...
    // Launch all tools concurrently
    for (size_t i = 0; i < num_tools; i++) {
        pthread_create(&threads[i], NULL, tool_execute_thread, &contexts[i]);
    }
    // Wait for completion
    for (size_t i = 0; i < num_tools; i++) {
        pthread_join(threads[i], NULL);
    }
    // ... collect outputs into results (elided) ...
    free(threads);
    free(contexts);
}
```
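The fan-out/join pattern relies on each thread receiving its own context, which carries both the input and a slot for the output. The sketch below shows that pattern in isolation with error handling added; `demo_job_t`, `demo_worker`, and `demo_run_parallel` are hypothetical names, and squaring an integer stands in for real tool work.

```c
#include <pthread.h>
#include <stdlib.h>

/* Per-thread context: each worker reads its input and writes its own output,
 * so no locking is needed (hypothetical stand-in for tool_exec_context_t). */
typedef struct {
    int input;
    int output;
} demo_job_t;

static void *demo_worker(void *arg) {
    demo_job_t *job = arg;
    job->output = job->input * job->input;   /* placeholder for tool work */
    return NULL;
}

/* Starts one thread per job and waits for all of them.
 * Returns 0 on success, -1 if allocation or thread creation fails.
 * Threads that did start are always joined before returning. */
static int demo_run_parallel(demo_job_t *jobs, size_t count) {
    if (count == 0) {
        return 0;
    }
    pthread_t *threads = calloc(count, sizeof *threads);
    if (threads == NULL) {
        return -1;
    }
    size_t started = 0;
    int rc = 0;
    for (; started < count; started++) {
        if (pthread_create(&threads[started], NULL, demo_worker,
                           &jobs[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    free(threads);
    return rc;
}
```

Reading the outputs only after `pthread_join` is what makes the results safe to access without additional synchronization.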
### Tool Categories
#### Terminal Tools (`tool_terminal.c`)
- `linux_terminal_execute`: Execute shell commands with PID tracking
- `terminal_interactive`: Interactive terminal sessions
#### File Tools (`tool_file.c`)
- `read_file`: Read file contents
- `write_file`: Write file contents (atomic)
- `directory_glob`: List files matching pattern
- `mkdir`: Create directories
- `chdir`: Change working directory
- `getpwd`: Get current directory
#### Code Tools (`tool_code.c`)
- `index_source_directory`: Deep codebase indexing
- `code_grep`: Search code patterns
- `code_symbol_find`: Find function/variable definitions
#### File Edit Tools (`tool_file_edit.c`)
- `file_line_replace`: Replace specific lines
- `file_apply_patch`: Apply unified diff patches
#### HTTP Tools (`tool_http.c`)
- `http_fetch`: GET URL contents
- `web_search`: General web search
- `web_search_news`: News-specific search
#### DeepSearch Tool (`tool_deepsearch.c`)
Implements intelligent iterative research:
```c
// Algorithm parameters
#define MAX_QUERIES 8 // Max queries per iteration
#define QUERY_GENERATION_MAX_TOKENS 2048
#define MAX_ITERATIONS 3 // Max research iterations
#define MIN_CONTENT_LENGTH 100 // Minimum valid content length
#define MIN_VALID_RESULTS 5 // Stop threshold
```
**Execution Flow:**
1. Generate 5-8 diverse search queries using LLM
2. Execute all queries concurrently via pthreads
3. Extract and validate content from results
4. If insufficient results, generate follow-up queries
5. Merge all results into unified JSON response
## Agent System
### Agent State Machine
```c
typedef enum {
AGENT_STATE_IDLE, // Agent created, not running
AGENT_STATE_RUNNING, // Active ReAct loop
AGENT_STATE_EXECUTING_TOOLS,// Currently executing tools
AGENT_STATE_COMPLETED, // Task completed successfully
AGENT_STATE_MAX_ITERATIONS, // Stopped: iteration limit
AGENT_STATE_ERROR // Stopped: error condition
} agent_state_t;
```
### ReAct Loop Implementation
```c
// Simplified pseudocode of agent_run()
while (agent->iteration_count < max_iterations) {
    // 1. Build LLM request with full context
    request = agent_build_request(agent, messages);
    // 2. Send to LLM API
    response = http_post(api_url, request);
    // 3. Parse response
    parsed = parse_llm_response(response);
    // 4. Check for tool calls
    if (has_tool_calls(parsed)) {
        agent->state = AGENT_STATE_EXECUTING_TOOLS;
        results = execute_tools_parallel(tool_calls);
        add_observations_to_context(results);
    }
    // 5. Check for completion
    if (is_completion_message(parsed)) {
        agent->state = AGENT_STATE_COMPLETED;
        return final_answer;
    }
    // 6. Check for refusal/incompleteness
    if (is_incomplete(parsed)) {
        auto_continue();
    }
    agent->iteration_count++;
}
```
### Context Management
The context manager handles token overflow:
```c
// Context shrinking strategy
void context_manager_shrink(messages_handle msgs, int target_tokens) {
    // 1. Calculate current token count
    current = messages_count_tokens(msgs);
    // 2. If over limit, remove oldest non-system messages
    while (current > target_tokens) {
        msg = find_oldest_non_system_message(msgs);
        if (msg == NULL) {
            break; // Only system messages remain; nothing more to drop
        }
        if (is_important(msg)) {
            summary = summarize_message(msg);
            replace_with_summary(msg, summary);
        } else {
            remove_message(msg);
        }
        current = recalculate_tokens(msgs);
    }
}
```
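The drop-oldest part of this strategy can be tried out on a plain array. The sketch below is an illustration under simplifying assumptions: token counts are precomputed, the summarization branch is omitted, and `demo_msg_t` and `demo_shrink` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message record: only the fields the sketch needs. */
typedef struct {
    int is_system;   /* system messages are never dropped */
    int tokens;      /* precomputed token estimate */
} demo_msg_t;

static int demo_total_tokens(const demo_msg_t *msgs, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += msgs[i].tokens;
    }
    return total;
}

/* Removes the oldest non-system messages in place until the total is at or
 * below target_tokens, or until only system messages remain.
 * Returns the new message count. */
static size_t demo_shrink(demo_msg_t *msgs, size_t count, int target_tokens) {
    int total = demo_total_tokens(msgs, count);
    size_t i = 0;
    while (total > target_tokens && i < count) {
        if (msgs[i].is_system) {
            i++;                      /* keep it and look at the next one */
            continue;
        }
        total -= msgs[i].tokens;
        /* Close the gap; the next candidate moves into slot i. */
        memmove(&msgs[i], &msgs[i + 1], (count - i - 1) * sizeof msgs[0]);
        count--;
    }
    return count;
}
```

The `i < count` bound guarantees termination even when the system messages alone exceed the target.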
### Sub-Agent Spawning
```c
// Spawn specialized sub-agent
agent_handle agent_spawn(const char *role, const char *goal) {
// Check depth limits
if (current_depth >= R_MAX_SPAWN_DEPTH) return NULL;
if (total_spawns >= R_MAX_TOTAL_SPAWNS) return NULL;
// Create specialized agent
agent = agent_create(goal, NULL);
agent_set_role(agent, role);
agent_set_manager_id(agent, parent_id);
// Assign specialized tool registry based on role
if (strcmp(role, "researcher") == 0) {
agent_set_tool_registry(agent, tool_registry_get_specialized(TOOL_TYPE_RESEARCHER));
} else if (strcmp(role, "developer") == 0) {
agent_set_tool_registry(agent, tool_registry_get_specialized(TOOL_TYPE_DEVELOPER));
}
return agent;
}
```
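The two limits in the spawn check correspond to the `R_MAX_SPAWN_DEPTH` and `R_MAX_TOTAL_SPAWNS` environment variables, with defaults of 5 and 20. One way to read them safely is sketched below; `demo_env_long` and `demo_can_spawn` are hypothetical helpers, and falling back to the default on malformed input is an assumption of the sketch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: reads a positive integer from the environment and
 * returns `fallback` when the variable is unset, empty, malformed,
 * out of range, or not positive. */
static long demo_env_long(const char *name, long fallback) {
    const char *text = getenv(name);
    if (text == NULL || *text == '\0') {
        return fallback;
    }
    errno = 0;
    char *end = NULL;
    long value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0) {
        return fallback;
    }
    return value;
}

/* Mirrors the two guard conditions of the spawn check: a spawn is allowed
 * only while both the depth and the total count are below their limits. */
static int demo_can_spawn(long current_depth, long total_spawns) {
    long max_depth = demo_env_long("R_MAX_SPAWN_DEPTH", 5);
    long max_total = demo_env_long("R_MAX_TOTAL_SPAWNS", 20);
    return current_depth < max_depth && total_spawns < max_total;
}
```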
## DeepSearch Algorithm
### Overview
The DeepSearch tool implements an iterative, AI-driven research algorithm that goes beyond simple search to provide comprehensive topic coverage.
### Algorithm Phases
```
┌─────────────────────────────────────────────────────────────────┐
│ DEEPSEARCH ALGORITHM │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 1: QUERY GENERATION │
│ - Input: Research subject (user query) │
│ - Process: LLM generates 5-8 diverse, specific queries │
│ - Output: Array of search queries │
│ │
│ Prompt strategy: │
│ "Generate 5-8 diverse, specific search queries that will │
│ comprehensively cover the given subject from multiple angles" │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 2: PARALLEL SEARCH EXECUTION │
│ - Input: Array of queries │
│ - Process: Execute all queries concurrently via pthreads │
│ - Output: Raw search results (JSON) │
│ │
│ Concurrency model: │
│ for each query: │
│ pthread_create(search_thread_func, query) │
│ for each thread: │
│ pthread_join(thread) │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 3: CONTENT EXTRACTION & VALIDATION │
│ - Input: Raw JSON results │
│ - Process: Parse and filter results by content length │
│ - Criteria: content length >= MIN_CONTENT_LENGTH (100 characters) │
│ - Output: Validated content array │
│ │
│ Validation function: │
│ count_valid_results(json) -> int valid_count │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 4: ITERATION DECISION │
│ - Condition: total_valid_results >= MIN_VALID_RESULTS * iter? │
│ - If YES: Proceed to Phase 5 │
│ - If NO and iter < MAX_ITERATIONS: │
│ * Generate research summary from findings │
│ * Use summary as context for follow-up queries │
│ * Return to Phase 1 │
│ │
│ Follow-up prompt: │
│ "Based on what has been found so far, generate 4-6 follow-up │
│ queries to explore gaps and deeper aspects" │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 5: RESULT MERGING │
│ - Input: All validated results from all iterations │
│ - Process: Merge into unified JSON structure │
│ - Output: Combined results object │
│ │
│ Merge format: │
│ { │
│ "results": [ │
│ { "title": "...", "content": "...", "url": "..." }, │
│ ... │
│ ] │
│ } │
└────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PHASE 6: RETURN TO LLM │
│ - Format merged results as JSON string │
│ - Return to agent for synthesis and reporting │
└─────────────────────────────────────────────────────────────────┘
```
### Configuration Parameters
```c
// DeepSearch tunable parameters
#define MAX_QUERIES 8 // Maximum queries per iteration
#define QUERY_GENERATION_MAX_TOKENS 2048
#define MAX_ITERATIONS 3 // Maximum research iterations
#define MIN_CONTENT_LENGTH 100 // Minimum content to be considered valid
#define MIN_VALID_RESULTS 5 // Valid results needed per iteration
```
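Phases 3 and 4 reduce to two small checks: count the results whose content is long enough, then compare the running total against `MIN_VALID_RESULTS * iter`. The sketch below shows both under stated assumptions: the iteration counter is taken to be 1-based, results are plain strings rather than JSON, and `demo_count_valid` and `demo_should_stop` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ITERATIONS 3
#define MIN_CONTENT_LENGTH 100
#define MIN_VALID_RESULTS 5

/* Phase 3: counts entries whose content has at least MIN_CONTENT_LENGTH
 * characters. NULL entries are treated as invalid. */
static int demo_count_valid(const char *const *contents, size_t n) {
    int valid = 0;
    for (size_t i = 0; i < n; i++) {
        if (contents[i] != NULL && strlen(contents[i]) >= MIN_CONTENT_LENGTH) {
            valid++;
        }
    }
    return valid;
}

/* Phase 4: returns 1 when research should stop and merging can begin.
 * `iteration` is assumed to be 1-based. The loop also stops once the
 * iteration budget is used up, regardless of how many results were found. */
static int demo_should_stop(int total_valid, int iteration) {
    if (iteration >= MAX_ITERATIONS) {
        return 1;
    }
    return total_valid >= MIN_VALID_RESULTS * iteration;
}
```

Because the threshold scales with the iteration number, a second round needs 10 valid results in total to stop early, not just 5 new ones.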
## Database Schema
### SQLite Database Location
`~/.r.db` (or path specified by `R_DB_PATH`)
### Schema Definition
```sql
-- Agent tracking table
CREATE TABLE agents (
agent_id TEXT PRIMARY KEY,
role TEXT,
manager_id TEXT,
department TEXT,
budget_limit INTEGER DEFAULT 0,
used_tokens INTEGER DEFAULT 0,
status TEXT DEFAULT 'active',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_heartbeat TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (manager_id) REFERENCES agents(agent_id)
);
-- Research tasks table (for research_dispatcher tool)
CREATE TABLE research_tasks (
url_hash TEXT PRIMARY KEY,
url TEXT NOT NULL,
status TEXT DEFAULT 'pending', -- pending, processing, completed, failed
summary TEXT,
batch_id TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- GTR (Goal-Task-Result) tracking
CREATE TABLE gtr_tasks (
task_id TEXT PRIMARY KEY,
parent_task_id TEXT,
goal TEXT NOT NULL,
status TEXT DEFAULT 'pending',
assigned_agent TEXT,
result TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
completed_at TIMESTAMP,
FOREIGN KEY (parent_task_id) REFERENCES gtr_tasks(task_id),
FOREIGN KEY (assigned_agent) REFERENCES agents(agent_id)
);
-- Audit logging
CREATE TABLE audit_logs (
log_id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_id TEXT,
action TEXT NOT NULL,
details TEXT,
timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (agent_id) REFERENCES agents(agent_id)
);
-- Session storage
CREATE TABLE sessions (
session_id TEXT PRIMARY KEY,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_accessed TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Snapshots for checkpoint/restore
CREATE TABLE snapshots (
snapshot_id TEXT PRIMARY KEY,
session_id TEXT,
description TEXT,
state_json TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (session_id) REFERENCES sessions(session_id)
);
```
## API Integration
### LLM API Protocol
R uses OpenAI-compatible API endpoints:
#### Request Format
```json
{
"model": "gpt-4o-mini",
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."},
{"role": "assistant", "content": "...", "tool_calls": [...]},
{"role": "tool", "tool_call_id": "...", "content": "..."}
],
"tools": [...],
"tool_choice": "auto",
"temperature": 0.1,
"max_tokens": 4096
}
```
#### Response Format
```json
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1234567890,
"model": "gpt-4o-mini",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "...",
"tool_calls": [{
"id": "call_...",
"type": "function",
"function": {
"name": "web_search",
"arguments": "{\"query\": \"...\"}"
}
}]
},
"finish_reason": "tool_calls"
}],
"usage": {
"prompt_tokens": 100,
"completion_tokens": 50,
"total_tokens": 150
}
}
```
### HTTP Client Configuration
```c
// HTTP timeout settings
typedef struct {
long connection_timeout_ms; // 30000 (30 seconds)
long operation_timeout_ms; // 300000 (5 minutes)
bool follow_redirects; // true
long max_redirects; // 10
} http_config_t;
```
### Web Search API
Search is performed via `rsearch.app.molodetz.nl`:
```
GET https://rsearch.app.molodetz.nl/search?query=<encoded>&content=true
Response:
{
"results": [
{
"title": "...",
"url": "...",
"content": "...",
"published": "..."
}
]
}
```
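The `<encoded>` placeholder implies the query is percent-encoded before it is placed in the URL. R presumably delegates this to its HTTP layer; the standalone sketch below only illustrates the encoding step, and `demo_url_encode` and `demo_build_search_url` are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Percent-encodes every byte except the RFC 3986 unreserved set
 * (letters, digits, '-', '_', '.', '~'). Assumes the default "C" locale
 * for isalnum. Returns a heap-allocated string the caller must free. */
static char *demo_url_encode(const char *text) {
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(text);
    char *out = malloc(len * 3 + 1);   /* worst case: every byte becomes %XX */
    if (out == NULL) {
        return NULL;
    }
    char *p = out;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)text[i];
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            *p++ = (char)c;
        } else {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}

/* Builds the search URL shown above. Caller frees the result. */
static char *demo_build_search_url(const char *query) {
    static const char prefix[] = "https://rsearch.app.molodetz.nl/search?query=";
    static const char suffix[] = "&content=true";
    char *encoded = demo_url_encode(query);
    if (encoded == NULL) {
        return NULL;
    }
    size_t size = strlen(prefix) + strlen(encoded) + strlen(suffix) + 1;
    char *url = malloc(size);
    if (url != NULL) {
        snprintf(url, size, "%s%s%s", prefix, encoded, suffix);
    }
    free(encoded);
    return url;
}
```

Encoding `&` matters in particular: left raw, it would end the `query` parameter early and turn the rest of the text into a separate parameter.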
## Security Model
### Threat Model
| Threat | Mitigation |
|--------|------------|
| API Key Exposure | Keys only in environment variables, never logged |
| Command Injection | Input validation, parameterized commands |
| Path Traversal | Path canonicalization, working directory checks |
| Resource Exhaustion | Iteration limits, token budgets, timeouts |
| Unrestricted File Access | User confirmation for destructive operations |
### Sandboxing
- File operations restricted to working directory
- Shell commands executed in subprocess with limited environment
- Python execution runs in isolated interpreter instance
- Network access only via configured HTTP endpoints
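Restricting file operations to the working directory usually comes down to canonicalizing a path and then checking that it still lies under the allowed base. The sketch below shows only the second step and is an illustration, not R's implementation: `demo_path_is_within` is a hypothetical helper, and both arguments are assumed to be absolute and already canonical (for example, as produced by `realpath(3)`).

```c
#include <string.h>

/* Returns 1 when `path` equals `base` or lies beneath it, 0 otherwise.
 * Both arguments must be absolute, canonical paths without ".." or
 * symlink components. A plain prefix test is not enough: "/home/u/proj"
 * is a string prefix of "/home/u/project", so the character following
 * the prefix must be a separator or the end of the string. */
static int demo_path_is_within(const char *base, const char *path) {
    size_t n = strlen(base);
    if (n == 0) {
        return 0;
    }
    if (strncmp(base, path, n) != 0) {
        return 0;
    }
    if (base[n - 1] == '/') {
        return 1;                     /* base ends in '/', e.g. the root */
    }
    return path[n] == '\0' || path[n] == '/';
}
```

Running the check on the canonical form rather than the raw user input is what defeats `../` sequences and symlink tricks.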
### Audit Logging
All agent actions are logged to `audit_logs` table:
- Agent ID
- Action type
- Details (sanitized)
- Timestamp
## Performance Characteristics
### Memory Usage
| Component | Typical | Maximum |
|-----------|---------|---------|
| Base binary | ~2 MB | ~2 MB |
| Message history | ~1-10 MB | ~50 MB (configurable) |
| Tool execution | ~100 KB per tool | ~10 MB (parallel) |
| Database cache | ~1 MB | ~5 MB |
### Execution Times
| Operation | Typical | Worst Case |
|-----------|---------|------------|
| LLM API call | 1-5s | 60s (timeout) |
| Web search | 500ms | 10s |
| DeepSearch (full) | 10-30s | 120s |
| File I/O | < 1ms | 100ms |
| Shell command | Depends on command | 300s (background threshold) |
### Throughput
- Maximum parallel tool executions: 25 (one per tool)
- Maximum concurrent HTTP requests: Limited by curl multi-handle
- Message processing rate: ~1000 tokens/second
## Development Guide
### Adding a New Tool
1. Create `src/tools/tool_mytool.c`:
```c
// retoor <retoor@molodetz.nl>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "tool.h"
static char *mytool_execute(tool_t *self, struct json_object *args) {
    (void)self;
    // Parse arguments
    struct json_object *arg_obj;
    if (!json_object_object_get_ex(args, "myarg", &arg_obj)) {
        return strdup("Error: missing 'myarg' argument");
    }
    // Execute tool logic
    const char *value = json_object_get_string(arg_obj);
    // Return a heap-allocated result (NULL on allocation failure)
    char *result = malloc(256);
    if (result == NULL) {
        return NULL;
    }
    // Output longer than 255 characters is truncated by snprintf
    snprintf(result, 256, "Processed: %s", value ? value : "");
    return result;
}
static void mytool_print_action(const char *name, struct json_object *args) {
    (void)name;
    (void)args;
    fprintf(stderr, " -> MyTool executing\n");
}
static struct json_object *mytool_get_description(void) {
    struct json_object *root = json_object_new_object();
    json_object_object_add(root, "type", json_object_new_string("function"));
    struct json_object *function = json_object_new_object();
    json_object_object_add(function, "name", json_object_new_string("my_tool"));
    json_object_object_add(function, "description",
        json_object_new_string("Description of what my tool does"));
    // Define parameters schema
    struct json_object *parameters = json_object_new_object();
    // ... parameter definitions ...
    json_object_object_add(function, "parameters", parameters);
    json_object_object_add(root, "function", function);
    return root;
}
static const tool_vtable_t mytool_vtable = {
    .get_description = mytool_get_description,
    .execute = mytool_execute,
    .print_action = mytool_print_action
};
static tool_t mytool = { .vtable = &mytool_vtable, .name = "my_tool" };
tool_t *tool_my_tool_create(void) { return &mytool; }
```
2. Add declaration to `src/tools/tools_init.c`:
```c
extern tool_t *tool_my_tool_create(void);
```
3. Register in `tools_get_registry()`:
```c
tool_registry_register(global_registry, tool_my_tool_create());
```
4. Add to Makefile `SRC_TOOLS`:
```makefile
SRC_TOOLS = ... src/tools/tool_mytool.c
```
5. Rebuild:
```bash
make clean && make build
```
### Coding Standards
#### Naming Conventions
- Types: `r_<module>_handle` for opaque handles
- Functions: `r_<module>_<action>` for public APIs
- Files: `r_<module>.c` for core, `tool_<name>.c` for tools
- Constants: `UPPER_CASE_WITH_UNDERSCORES`
#### Header Style
All C files must start with:
```c
// retoor < retoor @ molodetz . nl >
```
#### Error Handling
```c
// Return status codes
r_status_t function_name(...) {
    if (error_condition) {
        r_error_set("Error message");
        return R_ERROR_INVALID;
    }
    return R_SUCCESS;
}
```
#### Memory Management
```c
// Use wrappers from core/memory.c
void *ptr = r_malloc(size);
void *new_ptr = r_realloc(ptr, new_size);
r_free(ptr);
// Strings
char *str = r_strdup(source);
```
### Testing
```bash
# Run Python test framework
python3 testit.py
# Manual testing
./r "Test prompt here"
```
### Debugging
```bash
# Enable verbose mode
./r --verbose "Your prompt"
# Debug build with symbols
make debug
# Run with gdb
gdb ./r
(gdb) run "test prompt"
```
---
# APPENDIX
## Complete Tool Reference
### Tool: `linux_terminal_execute`
**Description:** Execute a shell command with real-time output and PID tracking.
**Parameters:**
- `command` (string, required): The shell command to execute.
- `description` (string, optional): Description of what the command does.
**Returns:** Command output, exit status, and PID.
**Example:**
```json
{
  "command": "ls -la",
  "description": "List all files in current directory"
}
```
### Tool: `web_search`
**Description:** Searches for information using search engines.
**Parameters:**
- `query` (string, required): The search query.
**Returns:** JSON array of search results with title, URL, content.
### Tool: `deepsearch`
**Description:** Performs intelligent iterative deep research.
**Parameters:**
- `query` (string, required): The research subject.
**Algorithm:**
1. Generate 5-8 diverse queries using LLM
2. Execute queries concurrently
3. Extract and validate content
4. Iterate if insufficient results
5. Merge all results
**Returns:** Merged JSON results from all iterations.
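The control flow of the algorithm above can be sketched as a bounded loop. Everything here is illustrative rather than taken from R's source: the callback types, the `min_results` threshold, and the `max_rounds` cap are assumptions, the queries run sequentially instead of concurrently, and the merge step is reduced to counting validated results.

```c
#include <assert.h>
#include <stddef.h>

#define DS_MAX_QUERIES 8 /* upper bound of the "5-8 diverse queries" step */

/* Hypothetical callbacks: `generate` fills `queries` and returns how many it
 * wrote; `search` returns the number of validated results for one query. */
typedef size_t (*ds_generate_fn)(const char *subject, int round,
                                 const char *queries[DS_MAX_QUERIES]);
typedef size_t (*ds_search_fn)(const char *query);

typedef struct {
    size_t total_results; /* stands in for the merged result set */
    int rounds;           /* iterations actually performed */
} ds_summary_t;

ds_summary_t deepsearch_sketch(const char *subject, ds_generate_fn generate,
                               ds_search_fn search, size_t min_results,
                               int max_rounds) {
    assert(subject != NULL && generate != NULL && search != NULL);

    ds_summary_t summary = { 0, 0 };
    /* Step 4: keep iterating while results are insufficient (and budget remains). */
    while (summary.rounds < max_rounds && summary.total_results < min_results) {
        const char *queries[DS_MAX_QUERIES] = { 0 };
        size_t n = generate(subject, summary.rounds, queries); /* step 1 */
        if (n > DS_MAX_QUERIES) {
            n = DS_MAX_QUERIES;
        }
        for (size_t i = 0; i < n; i++) {
            /* Steps 2, 3 and 5: search, validate, and accumulate. */
            summary.total_results += search(queries[i]);
        }
        summary.rounds++;
    }
    return summary;
}

/* Stub callbacks so the sketch can be exercised without an LLM or network. */
size_t ds_stub_generate(const char *subject, int round,
                        const char *queries[DS_MAX_QUERIES]) {
    (void)round;
    for (size_t i = 0; i < 5; i++) {
        queries[i] = subject; /* five identical placeholder queries */
    }
    return 5;
}

size_t ds_stub_search(const char *query) {
    (void)query;
    return 1; /* pretend every query yields one valid result */
}
```

With the stubs, asking for at least 10 results takes two rounds of five queries each, and the `max_rounds` cap stops the loop when the threshold is never reached.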
### Tool: `spawn_agent`
**Description:** Spawn a specialized sub-agent for parallel task execution.
**Parameters:**
- `role` (string, required): Agent role (researcher, developer, security).
- `goal` (string, required): Task description for the agent.
- `budget` (integer, optional): Token budget limit.
**Returns:** Agent ID and status.
## Error Codes
```c
typedef enum {
    R_SUCCESS = 0,
    R_ERROR_INVALID = -1,
    R_ERROR_MEMORY = -2,
    R_ERROR_IO = -3,
    R_ERROR_NETWORK = -4,
    R_ERROR_API = -5,
    R_ERROR_TOOL = -6,
    R_ERROR_TIMEOUT = -7,
    R_ERROR_PERMISSION = -8,
    R_ERROR_NOT_FOUND = -9,
    R_ERROR_ALREADY_EXISTS = -10
} r_status_t;
```
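For logging or user-facing messages, a status code can be mapped to a short description. The helper below is a hypothetical sketch: `r_status_to_string` and its message strings are not part of the documented API, and the enum is repeated only so the example compiles on its own.

```c
/* Repeated from the enum above so this sketch is self-contained. */
typedef enum {
    R_SUCCESS = 0,
    R_ERROR_INVALID = -1,
    R_ERROR_MEMORY = -2,
    R_ERROR_IO = -3,
    R_ERROR_NETWORK = -4,
    R_ERROR_API = -5,
    R_ERROR_TOOL = -6,
    R_ERROR_TIMEOUT = -7,
    R_ERROR_PERMISSION = -8,
    R_ERROR_NOT_FOUND = -9,
    R_ERROR_ALREADY_EXISTS = -10
} r_status_t;

/* Hypothetical helper: returns a static, human-readable description. */
const char *r_status_to_string(r_status_t status) {
    switch (status) {
    case R_SUCCESS:              return "success";
    case R_ERROR_INVALID:        return "invalid argument";
    case R_ERROR_MEMORY:         return "out of memory";
    case R_ERROR_IO:             return "I/O error";
    case R_ERROR_NETWORK:        return "network error";
    case R_ERROR_API:            return "API error";
    case R_ERROR_TOOL:           return "tool error";
    case R_ERROR_TIMEOUT:        return "timeout";
    case R_ERROR_PERMISSION:     return "permission denied";
    case R_ERROR_NOT_FOUND:      return "not found";
    case R_ERROR_ALREADY_EXISTS: return "already exists";
    }
    /* Values outside the enum fall through to a generic message. */
    return "unknown error";
}
```

Omitting a `default` case in the `switch` lets compilers warn when a new enum value is added without a matching description.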
## License
MIT License - See source headers for details.
---
*This README was generated for R - Autonomous AI Agent CLI v2.0.0*
*Author: retoor < retoor @ molodetz . nl >*