Tikker ML Analytics - Advanced Pattern Detection & Behavioral Analysis

Overview

The Tikker ML Analytics service provides machine learning-powered insights into keystroke behavior. It detects patterns, identifies anomalies, builds behavioral profiles, and enables user authenticity verification.

Service Port: 8003

Architecture

The ML service operates independently as a microservice while leveraging the SQLite database shared with other services.

┌─────────────────────────────────┐
│   ML Analytics Service:8003     │
├─────────────────────────────────┤
│ - Pattern Detection             │
│ - Anomaly Detection             │
│ - Behavioral Profiling          │
│ - User Authenticity Check       │
│ - Temporal Analysis             │
│ - ML Model Training & Inference │
└────────────┬────────────────────┘
             │
             ▼
        ┌─────────────┐
        │ SQLite DB   │
        │ (tikker.db) │
        └─────────────┘

Capabilities

1. Pattern Detection

Automatically identifies typing patterns and behavioral characteristics.

Detected Patterns:

  • fast_typist - User types significantly faster than average (>80 WPM)
  • slow_typist - User types slower than average (<20 WPM)
  • consistent_rhythm - Very regular keystroke timing (consistency >0.85)
  • inconsistent_rhythm - Irregular keystroke timing (consistency <0.5)
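
The thresholds above amount to a simple rule-based classifier. A minimal sketch of that logic (the function name and return shape are assumptions, not the service's actual code):

```python
def detect_patterns(avg_wpm, consistency):
    """Rule-based pattern detection using the documented thresholds."""
    patterns = []
    if avg_wpm > 80:
        patterns.append({"name": "fast_typist", "features": {"avg_wpm": avg_wpm}})
    elif avg_wpm < 20:
        patterns.append({"name": "slow_typist", "features": {"avg_wpm": avg_wpm}})
    if consistency > 0.85:
        patterns.append({"name": "consistent_rhythm", "features": {"consistency": consistency}})
    elif consistency < 0.5:
        patterns.append({"name": "inconsistent_rhythm", "features": {"consistency": consistency}})
    return patterns
```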

Endpoint:

POST /patterns/detect

Request:

{
  "events": [
    {"timestamp": 0, "key_code": 65, "event_type": "press"},
    {"timestamp": 100, "key_code": 66, "event_type": "press"}
  ],
  "user_id": "user123"
}

Response:

[
  {
    "name": "fast_typist",
    "confidence": 0.92,
    "frequency": 150,
    "description": "User types significantly faster than average",
    "features": {
      "avg_wpm": 85
    }
  }
]

2. Anomaly Detection

Compares current behavior against user's baseline profile to identify deviations.

Detectable Anomalies:

  • typing_speed_deviation - Significant change in typing speed
  • rhythm_deviation - Unusual change in keystroke rhythm

Endpoint:

POST /anomalies/detect

Request:

{
  "events": [...],
  "user_id": "user123"
}

Response:

[
  {
    "timestamp": "2024-01-15T10:30:00",
    "anomaly_type": "typing_speed_deviation",
    "severity": 0.65,
    "reason": "Typing speed deviation of 65% from baseline",
    "expected_value": 50,
    "actual_value": 82.5
  }
]
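
The severity and reason in the response above are consistent with a relative deviation from the baseline, capped at 1.0 (|82.5 - 50| / 50 = 0.65). A hedged sketch; the service's exact formula may differ:

```python
def anomaly_severity(expected, actual):
    """Relative deviation from the baseline value, capped at 1.0."""
    if expected == 0:
        return 1.0  # assumption: treat a zero/missing baseline as maximal deviation
    return min(abs(actual - expected) / expected, 1.0)
```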

3. Behavioral Profile Building

Creates a comprehensive user profile from keystroke data.

Profile Components:

  • Average typing speed (WPM)
  • Peak activity hours
  • Most common words
  • Consistency score (0.0-1.0)
  • Detected patterns

Endpoint:

POST /profile/build

Request:

{
  "events": [...],
  "user_id": "user123"
}

Response:

{
  "user_id": "user123",
  "avg_typing_speed": 58.5,
  "peak_hours": [9, 10, 14, 15, 16],
  "common_words": ["the", "and", "test", "python", "data"],
  "consistency_score": 0.78,
  "patterns": ["consistent_rhythm"]
}

4. User Authenticity Verification

Verifies whether a keystroke pattern matches a known user profile (behavioral biometric authentication).

Verdict Levels:

  • authentic - High confidence match (score > 0.8)
  • likely_authentic - Good confidence match (score > 0.6)
  • uncertain - Moderate confidence (score > 0.4)
  • suspicious - Low confidence match (score ≤ 0.4)
  • unknown - No baseline profile established
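
The verdict levels map directly onto score thresholds. A sketch of that mapping (the function name is illustrative):

```python
def verdict_for(score):
    """Map an authenticity score (or None, when no baseline exists)
    to the documented verdict levels."""
    if score is None:
        return "unknown"  # no baseline profile established
    if score > 0.8:
        return "authentic"
    if score > 0.6:
        return "likely_authentic"
    if score > 0.4:
        return "uncertain"
    return "suspicious"
```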

Endpoint:

POST /authenticity/check

Request:

{
  "events": [...],
  "user_id": "user123"
}

Response:

{
  "authenticity_score": 0.87,
  "confidence": 0.85,
  "verdict": "authentic",
  "reason": "Speed match: 92.1%, Consistency match: 82.5%"
}

5. Temporal Analysis

Analyzes keystroke patterns over time periods.

Analysis Output:

  • Activity trends (increasing/decreasing)
  • Daily breakdown
  • Weekly patterns
  • Seasonal variations

Endpoint:

POST /temporal/analyze

Request:

{
  "date_range_days": 7
}

Response:

{
  "trend": "increasing",
  "date_range_days": 7,
  "analysis": [
    {"date": "2024-01-08", "total_events": 1250},
    {"date": "2024-01-09", "total_events": 1380},
    {"date": "2024-01-10", "total_events": 1450}
  ]
}
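
One plausible way to derive the trend label from the daily totals is to compare the mean of the later days against the earlier ones; the service's actual method is not specified, so this is only a sketch:

```python
def classify_trend(daily_totals, threshold=0.05):
    """Label activity as increasing/decreasing/stable by comparing
    the mean of the second half of days against the first half."""
    if len(daily_totals) < 2:
        return "stable"
    mid = len(daily_totals) // 2
    first = sum(daily_totals[:mid]) / mid
    second = sum(daily_totals[mid:]) / (len(daily_totals) - mid)
    change = (second - first) / first
    if change > threshold:
        return "increasing"
    if change < -threshold:
        return "decreasing"
    return "stable"
```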

6. ML Model Training

Trains models on historical keystroke data for predictions.

Endpoint:

POST /model/train

Parameters:

  • sample_size (optional, default=100, max=10000): Number of training samples to use

Response:

{
  "status": "trained",
  "samples": 500,
  "features": ["typing_speed", "consistency", "rhythm_pattern"],
  "accuracy": 0.89
}

7. Behavior Prediction

Predicts user behavior using the trained model.

Predicted Behaviors:

  • normal - Expected behavior
  • fast_focused - Fast, focused typing (>80 WPM)
  • slow_deliberate - Careful typing (<30 WPM)
  • stressed_or_tired - Inconsistent rhythm (consistency <0.5)
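
The category thresholds mirror the pattern-detection ones, so a rule-based stand-in illustrates the mapping (precedence between categories is an assumption; the trained model may weight features differently):

```python
def predict_behavior(typing_speed, consistency):
    """Rule-based stand-in for the model's behavior categories."""
    if consistency < 0.5:
        return "stressed_or_tired"  # inconsistent rhythm takes precedence here
    if typing_speed > 80:
        return "fast_focused"
    if typing_speed < 30:
        return "slow_deliberate"
    return "normal"
```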

Endpoint:

POST /behavior/predict

Request:

{
  "events": [...],
  "user_id": "user123"
}

Response:

{
  "status": "predicted",
  "behavior_category": "fast_focused",
  "confidence": 0.89,
  "features": {
    "typing_speed": 85,
    "consistency": 0.82
  }
}

Data Flow

Pattern Detection Flow

Keystroke Events → Analyze Typing Metrics → Identify Patterns → Return Results
     ↓
  - Calculate WPM
  - Calculate Consistency
  - Compare to Thresholds

Anomaly Detection Flow

Keystroke Events → Build Profile → Compare to Baseline → Detect Deviations → Alert
     ↓
  Store as Baseline (first time)
  Use for Comparison (subsequent)

Authenticity Verification Flow

Keystroke Events → Extract Features → Compare to Baseline → Calculate Score → Verdict
     ↓
  - Speed match percentage
  - Consistency match percentage
  - Combined score

Metrics

Typing Speed (WPM)

Calculated as words per minute:

WPM = (Total Characters / 5) / (Total Time in Minutes)

Rhythm Consistency (0.0 to 1.0)

Measures regularity of keystroke intervals:

Consistency = 1.0 - (Standard Deviation / Mean Interval)

Higher values indicate more consistent rhythm.
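
Both metrics can be computed directly from keystroke timings. A minimal sketch (timestamps in milliseconds, as in the API examples; clamping consistency to [0, 1] is an assumption):

```python
import statistics

def wpm(char_count, total_ms):
    """Words per minute, counting one 'word' as 5 characters."""
    minutes = total_ms / 60_000
    return (char_count / 5) / minutes

def rhythm_consistency(intervals_ms):
    """1 - coefficient of variation of inter-keystroke intervals."""
    mean = statistics.mean(intervals_ms)
    stdev = statistics.pstdev(intervals_ms)
    return max(0.0, 1.0 - stdev / mean)
```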

Authenticity Score (0.0 to 1.0)

Composite score combining:

  • Speed match (50% weight)
  • Consistency match (50% weight)
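
With the two match percentages from the authenticity response shown earlier (92.1% and 82.5%), equal weighting reproduces the reported score of ≈0.87:

```python
def authenticity_score(speed_match, consistency_match):
    """Equal-weight composite of the two match fractions (0.0-1.0)."""
    return 0.5 * speed_match + 0.5 * consistency_match
```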

Anomaly Severity (0.0 to 1.0)

Indicates how significant the deviation from the baseline is.

Usage Examples

Example 1: Detect User's Typing Patterns

curl -X POST http://localhost:8003/patterns/detect \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {"timestamp": 0, "key_code": 65, "event_type": "press"},
      {"timestamp": 95, "key_code": 66, "event_type": "press"},
      {"timestamp": 190, "key_code": 67, "event_type": "press"}
    ],
    "user_id": "alice"
  }'

Example 2: Build User Baseline Profile

curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],  # 200+ events
    "user_id": "alice"
  }'

Example 3: Check User Authenticity

# First, build profile
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{"events": [...], "user_id": "alice"}'

# Then check if events match
curl -X POST http://localhost:8003/authenticity/check \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],  # New keystroke events
    "user_id": "alice"
  }'

Example 4: Predict Behavior

# Train model
curl -X POST "http://localhost:8003/model/train?sample_size=500"

# Predict behavior
curl -X POST http://localhost:8003/behavior/predict \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'

Integration with Main API

The ML service can be called from the main API. To add ML endpoints to the main API:

import httpx

@app.post("/api/ml/patterns")
async def analyze_patterns_endpoint(user_id: str):
    # load_events_for_user is a placeholder: fetch the user's keystroke
    # events (e.g. from the shared SQLite database) before forwarding them
    events = load_events_for_user(user_id)
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://ml_service:8003/patterns/detect",
            json={"events": events, "user_id": user_id}
        )
        return response.json()

Performance Characteristics

Typical latencies on a host with 2 CPUs and 2 GB RAM:

  • Pattern detection: 50-100ms
  • Anomaly detection: 80-150ms
  • Profile building: 150-300ms
  • Authenticity check: 100-200ms
  • Temporal analysis: 200-500ms (depends on data range)
  • Model training: 500-1000ms (depends on sample size)
  • Behavior prediction: 50-100ms

Security Considerations

  1. Input Validation

    • Events must be valid, timestamped data
    • User IDs are sanitized
  2. Privacy

    • Profiles stored only in memory during service lifetime
    • No persistent profile storage in ML service
  3. Access Control

    • Runs on internal network (port 8003)
    • Not exposed directly to clients
    • Access via main API with authentication

Limitations

  1. Baseline Establishment

    • Requires a minimum number of keystroke events (100+) for an accurate profile
    • Needs established baseline for anomaly detection
  2. Model Accuracy

    • Accuracy depends on training data quality
    • New user profiles need 200+ samples for reliability
  3. Time-Based Features

    • Temporal analysis requires historical data in database
    • Peak hour detection requires events across different times

Future Enhancements

  1. Advanced ML Models

    • Neural network-based behavior classification
    • Seasonal pattern detection
    • Predictive analytics
  2. Continuous Learning

    • Automatic profile updates
    • Adaptive thresholds
    • User adaptation tracking
  3. Threat Detection

    • Replay attack detection
    • Impersonation detection
    • Behavioral drift tracking
  4. Integration

    • Real-time alerts for anomalies
    • Dashboard visualizations
    • Export capabilities

Troubleshooting

Service won't start

docker-compose logs ml_service

Pattern detection returns an empty list

  • Ensure events list is not empty
  • Minimum 10 events recommended for pattern detection

Anomaly detection shows no anomalies

  • Build baseline first with /profile/build
  • Ensure user_id matches between profile and check

Authenticity score always ~0.5

  • Profile not established for user
  • Need to call /profile/build first

Testing

Run ML service tests:

pytest tests/test_ml_service.py -v

Run specific test:

pytest tests/test_ml_service.py::TestPatternDetection::test_detect_fast_typing_pattern -v

References