# Tikker ML Analytics - Advanced Pattern Detection & Behavioral Analysis

## Overview

The Tikker ML Analytics service provides machine learning-powered insights into keystroke behavior. It detects patterns, identifies anomalies, builds behavioral profiles, and enables user authenticity verification.

**Service Port:** 8003

## Architecture

The ML service operates independently as a microservice while leveraging the SQLite database shared with other services.

```
┌─────────────────────────────────┐
│   ML Analytics Service:8003     │
├─────────────────────────────────┤
│ - Pattern Detection             │
│ - Anomaly Detection             │
│ - Behavioral Profiling          │
│ - User Authenticity Check       │
│ - Temporal Analysis             │
│ - ML Model Training & Inference │
└────────────┬────────────────────┘
             │
             ▼
        ┌─────────────┐
        │ SQLite DB   │
        │ (tikker.db) │
        └─────────────┘
```

## Capabilities

### 1. Pattern Detection

Automatically identifies typing patterns and behavioral characteristics.

**Detected Patterns:**
- **fast_typist** - User types significantly faster than average (>80 WPM)
- **slow_typist** - User types slower than average (<20 WPM)
- **consistent_rhythm** - Very regular keystroke timing (consistency >0.85)
- **inconsistent_rhythm** - Irregular keystroke timing (consistency <0.5)
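
The thresholds above can be read as a simple rule set. The following is a minimal sketch, not the service's actual implementation: the function name `classify_patterns` and the constant names are hypothetical, and only the threshold values come from the list above.

```python
from typing import List

# Threshold values taken from the "Detected Patterns" list; names are hypothetical.
FAST_WPM = 80.0
SLOW_WPM = 20.0
CONSISTENT_RHYTHM = 0.85
INCONSISTENT_RHYTHM = 0.5


def classify_patterns(avg_wpm: float, consistency: float) -> List[str]:
    """Return the names of all patterns whose thresholds are met."""
    patterns: List[str] = []
    # Speed patterns are mutually exclusive.
    if avg_wpm > FAST_WPM:
        patterns.append("fast_typist")
    elif avg_wpm < SLOW_WPM:
        patterns.append("slow_typist")
    # Rhythm patterns are mutually exclusive as well.
    if consistency > CONSISTENT_RHYTHM:
        patterns.append("consistent_rhythm")
    elif consistency < INCONSISTENT_RHYTHM:
        patterns.append("inconsistent_rhythm")
    return patterns
```

Values between the thresholds (for example 50 WPM with a consistency of 0.7) match no pattern, which would yield an empty list in this sketch.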

**Endpoint:**
```
POST /patterns/detect
```

**Request:**
```json
{
  "events": [
    {"timestamp": 0, "key_code": 65, "event_type": "press"},
    {"timestamp": 100, "key_code": 66, "event_type": "press"}
  ],
  "user_id": "user123"
}
```

**Response:**
```json
[
  {
    "name": "fast_typist",
    "confidence": 0.92,
    "frequency": 150,
    "description": "User types significantly faster than average",
    "features": {
      "avg_wpm": 85
    }
  }
]
```

### 2. Anomaly Detection

Compares current behavior against the user's baseline profile to identify deviations.

**Detectable Anomalies:**
- **typing_speed_deviation** - Significant change in typing speed
- **rhythm_deviation** - Unusual change in keystroke rhythm

**Endpoint:**
```
POST /anomalies/detect
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
[
  {
    "timestamp": "2024-01-15T10:30:00",
    "anomaly_type": "typing_speed_deviation",
    "severity": 0.65,
    "reason": "Typing speed deviation of 65% from baseline",
    "expected_value": 50,
    "actual_value": 82.5
  }
]
```

### 3. Behavioral Profile Building

Creates a comprehensive user profile from keystroke data.

**Profile Components:**
- Average typing speed (WPM)
- Peak activity hours
- Most common words
- Consistency score (0.0-1.0)
- Detected patterns

**Endpoint:**
```
POST /profile/build
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
{
  "user_id": "user123",
  "avg_typing_speed": 58.5,
  "peak_hours": [9, 10, 14, 15, 16],
  "common_words": ["the", "and", "test", "python", "data"],
  "consistency_score": 0.78,
  "patterns": ["consistent_rhythm"]
}
```

### 4. User Authenticity Verification

Verifies whether a keystroke pattern matches a known user profile (biometric authentication).

**Verdict Levels:**
- **authentic** - High confidence match (score > 0.8)
- **likely_authentic** - Good confidence match (score > 0.6)
- **uncertain** - Moderate confidence (score > 0.4)
- **suspicious** - Low confidence match (score ≤ 0.4)
- **unknown** - No baseline profile established
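
The verdict levels above map a score to a label. As a minimal sketch (the function name `verdict_for` is hypothetical, and representing a missing baseline as `None` is an assumption), the mapping could look like this:

```python
from typing import Optional


def verdict_for(score: Optional[float]) -> str:
    """Map an authenticity score to a verdict label.

    A score of None stands for "no baseline profile established"
    (an assumption made for this sketch).
    """
    if score is None:
        return "unknown"
    if score > 0.8:
        return "authentic"
    if score > 0.6:
        return "likely_authentic"
    if score > 0.4:
        return "uncertain"
    return "suspicious"
```

Because every boundary is a strict "greater than", a score of exactly 0.8 falls into `likely_authentic` and a score of exactly 0.4 into `suspicious`.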

**Endpoint:**
```
POST /authenticity/check
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
{
  "authenticity_score": 0.87,
  "confidence": 0.85,
  "verdict": "authentic",
  "reason": "Speed match: 92.1%, Consistency match: 82.5%"
}
```

### 5. Temporal Analysis

Analyzes keystroke patterns over time periods.

**Analysis Output:**
- Activity trends (increasing/decreasing)
- Daily breakdown
- Weekly patterns
- Seasonal variations

**Endpoint:**
```
POST /temporal/analyze
```

**Request:**
```json
{
  "date_range_days": 7
}
```

**Response:**
```json
{
  "trend": "increasing",
  "date_range_days": 7,
  "analysis": [
    {"date": "2024-01-08", "total_events": 1250},
    {"date": "2024-01-09", "total_events": 1380},
    {"date": "2024-01-10", "total_events": 1450}
  ]
}
```
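
This document does not state how the `trend` field is derived. One plausible approach, sketched below under that caveat, compares the average daily total of the earlier half of the range with that of the later half. The function name `activity_trend`, the `tolerance` parameter, and the `"stable"` label are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List


def activity_trend(analysis: List[Dict], tolerance: float = 0.05) -> str:
    """Classify the trend of daily totals (hypothetical heuristic)."""
    totals = [day["total_events"] for day in analysis]
    if len(totals) < 2:
        return "stable"
    half = len(totals) // 2
    earlier = mean(totals[:half])   # average of the first half
    later = mean(totals[-half:])    # average of the last half
    if earlier == 0:
        return "increasing" if later > 0 else "stable"
    change = (later - earlier) / earlier
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"
```

Applied to the three days in the response above (1250, 1380, 1450 events), this heuristic compares 1250 with 1450 and reports an increasing trend.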

### 6. ML Model Training

Trains models on historical keystroke data for predictions.

**Endpoint:**
```
POST /model/train
```

**Parameters:**
- `sample_size` (optional, default=100, max=10000): Training samples

**Response:**
```json
{
  "status": "trained",
  "samples": 500,
  "features": ["typing_speed", "consistency", "rhythm_pattern"],
  "accuracy": 0.89
}
```

### 7. Behavior Prediction

Predicts user behavior based on the trained model.

**Predicted Behaviors:**
- **normal** - Expected behavior
- **fast_focused** - Fast, focused typing (>80 WPM)
- **slow_deliberate** - Careful typing (<30 WPM)
- **stressed_or_tired** - Inconsistent rhythm (consistency <0.5)

**Endpoint:**
```
POST /behavior/predict
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
{
  "status": "predicted",
  "behavior_category": "fast_focused",
  "confidence": 0.89,
  "features": {
    "typing_speed": 85,
    "consistency": 0.82
  }
}
```

## Data Flow

### Pattern Detection Flow
```
Keystroke Events → Analyze Typing Metrics → Identify Patterns → Return Results
     ↓
  - Calculate WPM
  - Calculate Consistency
  - Compare to Thresholds
```

### Anomaly Detection Flow
```
Keystroke Events → Build Profile → Compare to Baseline → Detect Deviations → Alert
     ↓
  Store as Baseline (first time)
  Use for Comparison (subsequent)
```
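
The "store first, compare afterwards" behavior in this flow can be sketched with a small in-memory store. The class name `BaselineStore`, its method, and the profile keys are hypothetical; the real service may structure this differently.

```python
from typing import Dict, Optional


class BaselineStore:
    """In-memory per-user baselines (illustrative sketch only)."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Dict[str, float]] = {}

    def compare_or_store(
        self, user_id: str, profile: Dict[str, float]
    ) -> Optional[Dict[str, float]]:
        """Store the first profile seen for a user as the baseline.

        Returns None on the first call for a user, and the per-metric
        difference (current minus baseline) on subsequent calls.
        """
        baseline = self._profiles.get(user_id)
        if baseline is None:
            self._profiles[user_id] = dict(profile)
            return None
        return {
            key: profile[key] - baseline[key]
            for key in baseline
            if key in profile
        }
```

A first call for a user returns `None` because there is nothing to compare against yet, which matches the troubleshooting advice to build a baseline before expecting anomalies.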

### Authenticity Verification Flow
```
Keystroke Events → Extract Features → Compare to Baseline → Calculate Score → Verdict
     ↓
  - Speed match percentage
  - Consistency match percentage
  - Combined score
```

## Metrics

### Typing Speed (WPM)
Calculated as words per minute:
```
WPM = (Total Characters / 5) / (Total Time in Minutes)
```
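
As a worked instance of this formula, the sketch below computes WPM from a list of events. It assumes that timestamps are in milliseconds and that each `press` event counts as one character; neither is stated explicitly in this document, and the function name is hypothetical.

```python
from typing import Dict, List


def words_per_minute(events: List[Dict]) -> float:
    """WPM = (characters / 5) / minutes, counting one character per press."""
    presses = [e for e in events if e["event_type"] == "press"]
    if len(presses) < 2:
        return 0.0  # not enough data to measure elapsed time
    # Assumption: timestamps are milliseconds and events are in time order.
    elapsed_ms = presses[-1]["timestamp"] - presses[0]["timestamp"]
    if elapsed_ms <= 0:
        return 0.0
    minutes = elapsed_ms / 60000.0
    return (len(presses) / 5.0) / minutes
```

For example, 301 presses spaced 200 ms apart span exactly one minute, giving 301 / 5 = 60.2 WPM.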

### Rhythm Consistency (0.0 to 1.0)
Measures regularity of keystroke intervals:
```
Consistency = 1.0 - (Standard Deviation / Mean Interval)
```

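Higher values indicate a more consistent rhythm. Because the standard deviation can exceed the mean interval, the raw result can be negative and must be clamped to stay within the 0.0 to 1.0 range.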
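
A minimal sketch of this metric follows. The use of the population standard deviation, the clamping to the 0.0 to 1.0 range, and the function name are assumptions made for illustration.

```python
import statistics
from typing import List


def rhythm_consistency(timestamps: List[float]) -> float:
    """Consistency = 1 - (std dev / mean) of inter-key intervals, clamped to [0, 1]."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        return 0.0  # too few intervals to measure variation
    mean_interval = statistics.mean(intervals)
    if mean_interval <= 0:
        return 0.0
    # Assumption: population standard deviation; the service may use another estimator.
    variation = statistics.pstdev(intervals) / mean_interval
    return max(0.0, min(1.0, 1.0 - variation))
```

Perfectly even intervals give a score of 1.0, while highly irregular intervals (standard deviation larger than the mean) are clamped to 0.0.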

### Authenticity Score (0.0 to 1.0)
Composite score combining:
- Speed match (50% weight)
- Consistency match (50% weight)
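
The equal weighting can be sketched as follows. How each match percentage is computed is not specified here, so `match_ratio` (one minus the relative deviation from the baseline) is a hypothetical choice; only the 50/50 weighting comes from this document.

```python
def match_ratio(observed: float, baseline: float) -> float:
    """Hypothetical match measure: 1 minus relative deviation, floored at 0."""
    if baseline <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(observed - baseline) / baseline)


def authenticity_score(speed_match: float, consistency_match: float) -> float:
    """Combine the two match ratios with equal (50%) weights."""
    return 0.5 * speed_match + 0.5 * consistency_match
```

With a speed match of 92.1% and a consistency match of 82.5%, the combined score is 0.5 × 0.921 + 0.5 × 0.825 = 0.873, which agrees with the `authenticity_score` of 0.87 in the sample response for `/authenticity/check`.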

### Anomaly Severity (0.0 to 1.0)
Indicates how significant the deviation from the baseline is.
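
The sample anomaly response is consistent with severity being the relative deviation from the baseline, capped at 1.0. The sketch below assumes that interpretation; the function name and the capping are not confirmed by this document.

```python
def anomaly_severity(expected: float, actual: float) -> float:
    """Relative deviation from the expected value, capped at 1.0 (assumed formula)."""
    if expected <= 0:
        return 0.0  # no meaningful baseline to compare against
    return min(1.0, abs(actual - expected) / expected)
```

For the sample response (expected 50, actual 82.5), this gives |82.5 − 50| / 50 = 0.65, matching the reported severity and the "65% from baseline" reason text.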

## Usage Examples

### Example 1: Detect User's Typing Patterns

```bash
curl -X POST http://localhost:8003/patterns/detect \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {"timestamp": 0, "key_code": 65, "event_type": "press"},
      {"timestamp": 95, "key_code": 66, "event_type": "press"},
      {"timestamp": 190, "key_code": 67, "event_type": "press"}
    ],
    "user_id": "alice"
  }'
```

### Example 2: Build User Baseline Profile

```bash
# Replace [...] with 200+ events
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```
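
For local experimentation, a request body with enough events can be generated rather than typed by hand. The helper below is a hypothetical sketch that produces evenly spaced synthetic `press` events in the same shape as the request examples; it is not part of the service.

```python
import json
from typing import Dict, List


def synthetic_events(
    count: int, interval_ms: int = 100, key_code: int = 65
) -> List[Dict]:
    """Generate evenly spaced press events (synthetic test data only)."""
    return [
        {"timestamp": i * interval_ms, "key_code": key_code, "event_type": "press"}
        for i in range(count)
    ]


# JSON body suitable for POSTing to /profile/build.
payload = json.dumps({"events": synthetic_events(200), "user_id": "alice"})
```

Evenly spaced events produce a perfectly regular rhythm, so a profile built from this data is only useful for checking that the endpoint responds, not for evaluating detection quality.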

### Example 3: Check User Authenticity

```bash
# First, build profile
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{"events": [...], "user_id": "alice"}'

# Then check whether new keystroke events match
curl -X POST http://localhost:8003/authenticity/check \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

### Example 4: Predict Behavior

```bash
# Train model
curl -X POST "http://localhost:8003/model/train?sample_size=500"

# Predict behavior
curl -X POST http://localhost:8003/behavior/predict \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

## Integration with Main API

The ML service can be called from the main API. To add ML endpoints to the main API:

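```python
from typing import Any, Dict, List

import httpx
from fastapi import FastAPI  # assumes the main API is a FastAPI app

app = FastAPI()  # in practice, reuse the main API's existing app instance

@app.post("/api/ml/patterns")
async def analyze_patterns_endpoint(user_id: str, events: List[Dict[str, Any]]):
    # Assumption: the caller supplies the events in the request body.
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://ml_service:8003/patterns/detect",
            json={"events": events, "user_id": user_id},
        )
        response.raise_for_status()  # surface ML service errors to the caller
        return response.json()
```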

## Performance Characteristics

Typical latencies on a host with 2 CPUs and 2 GB RAM:
- Pattern detection: 50-100ms
- Anomaly detection: 80-150ms
- Profile building: 150-300ms
- Authenticity check: 100-200ms
- Temporal analysis: 200-500ms (depends on data range)
- Model training: 500-1000ms (depends on sample size)
- Behavior prediction: 50-100ms

## Security Considerations

1. **Input Validation**
   - Events must be valid timestamped data
   - User IDs sanitized

2. **Privacy**
   - Profiles stored only in memory during service lifetime
   - No persistent profile storage in ML service

3. **Access Control**
   - Runs on internal network (port 8003)
   - Not exposed directly to clients
   - Access via main API with authentication
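
The input validation item above could be sketched as follows. The allowed user ID pattern, the accepted event types (only `press` appears in this document; `release` is an assumption), and the function name are all hypothetical.

```python
import re
from typing import Any, Dict, List

# Hypothetical rules; the service's actual validation may differ.
USER_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")
ALLOWED_EVENT_TYPES = {"press", "release"}  # "release" is an assumption


def validate_request(events: List[Dict[str, Any]], user_id: str) -> List[str]:
    """Return a list of validation errors (empty if the request is valid)."""
    errors: List[str] = []
    if not USER_ID_RE.fullmatch(user_id):
        errors.append("invalid user_id")
    previous = None
    for index, event in enumerate(events):
        ts = event.get("timestamp")
        # Reject missing, non-numeric, boolean, or negative timestamps.
        if isinstance(ts, bool) or not isinstance(ts, (int, float)) or ts < 0:
            errors.append(f"event {index}: invalid timestamp")
            continue
        if previous is not None and ts < previous:
            errors.append(f"event {index}: timestamp goes backwards")
        previous = ts
        if event.get("event_type") not in ALLOWED_EVENT_TYPES:
            errors.append(f"event {index}: invalid event_type")
    return errors
```

Returning a list of messages instead of raising on the first problem lets the caller report every issue in a single response.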

## Limitations

1. **Baseline Establishment**
   - Requires at least 100 keystroke events for an accurate profile
   - Anomaly detection needs an established baseline

2. **Model Accuracy**
   - Accuracy depends on training data quality
   - New user profiles need 200+ samples for reliability

3. **Time-Based Features**
   - Temporal analysis requires historical data in database
   - Peak hour detection requires events across different times

## Future Enhancements

1. **Advanced ML Models**
   - Neural network-based behavior classification
   - Seasonal pattern detection
   - Predictive analytics

2. **Continuous Learning**
   - Automatic profile updates
   - Adaptive thresholds
   - User adaptation tracking

3. **Threat Detection**
   - Replay attack detection
   - Impersonation detection
   - Behavioral drift tracking

4. **Integration**
   - Real-time alerts for anomalies
   - Dashboard visualizations
   - Export capabilities

## Troubleshooting

### Service won't start
```bash
docker-compose logs ml_service
```

### Pattern detection returns empty
- Ensure events list is not empty
- Minimum 10 events recommended for pattern detection

### Anomaly detection shows no anomalies
- Build baseline first with `/profile/build`
- Ensure user_id matches between profile and check

### Authenticity score always ~0.5
- Profile not established for user
- Need to call `/profile/build` first

## Testing

Run ML service tests:
```bash
pytest tests/test_ml_service.py -v
```

Run specific test:
```bash
pytest tests/test_ml_service.py::TestPatternDetection::test_detect_fast_typing_pattern -v
```

## References

- Main documentation: [docs/API.md](API.md)
- Performance guide: [docs/PERFORMANCE.md](PERFORMANCE.md)
- Deployment guide: [docs/DEPLOYMENT.md](DEPLOYMENT.md)