 # Tikker ML Analytics - Advanced Pattern Detection & Behavioral Analysis
 ## Overview
 The Tikker ML Analytics service provides machine learning-powered insights into keystroke behavior. It detects patterns, identifies anomalies, builds behavioral profiles, and enables user authenticity verification.
 **Service Port:** 8003
 ## Architecture
 The ML service operates independently as a microservice while leveraging the SQLite database shared with other services.
 ```
 ┌─────────────────────────────────┐
 │   ML Analytics Service:8003     │
 ├─────────────────────────────────┤
 │ - Pattern Detection             │
 │ - Anomaly Detection             │
 │ - Behavioral Profiling          │
 │ - User Authenticity Check       │
 │ - Temporal Analysis             │
 │ - ML Model Training & Inference │
 └────────────┬────────────────────┘
              │
              ▼
         ┌─────────────┐
         │ SQLite DB   │
         │ (tikker.db) │
         └─────────────┘
 ```
 ## Capabilities
 ### 1. Pattern Detection
 Automatically identifies typing patterns and behavioral characteristics.
 **Detected Patterns:**
 - **fast_typist** - User types significantly faster than average (>80 WPM)
 - **slow_typist** - User types slower than average (<20 WPM)
 - **consistent_rhythm** - Very regular keystroke timing (consistency >0.85)
 - **inconsistent_rhythm** - Irregular keystroke timing (consistency <0.5)
 **Endpoint:**
 ```
 POST /patterns/detect
 ```
 **Request:**
 ```json
 {
   "events": [
     {"timestamp": 0, "key_code": 65, "event_type": "press"},
     {"timestamp": 100, "key_code": 66, "event_type": "press"}
   ],
   "user_id": "user123"
 }
 ```
 **Response:**
 ```json
 [
   {
     "name": "fast_typist",
     "confidence": 0.92,
     "frequency": 150,
     "description": "User types significantly faster than average",
     "features": {
       "avg_wpm": 85
     }
   }
 ]
 ```
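The thresholds listed above can be read as a small rule table. The following is a minimal sketch of that reading, not the service's actual implementation; the function name `classify_patterns` and the idea of passing precomputed `wpm` and `consistency` values are assumptions made for illustration.

```python
def classify_patterns(wpm: float, consistency: float) -> list[str]:
    """Map typing metrics to pattern names using the documented thresholds."""
    patterns = []
    # Speed patterns: >80 WPM is fast, <20 WPM is slow.
    if wpm > 80:
        patterns.append("fast_typist")
    elif wpm < 20:
        patterns.append("slow_typist")
    # Rhythm patterns: >0.85 is consistent, <0.5 is inconsistent.
    if consistency > 0.85:
        patterns.append("consistent_rhythm")
    elif consistency < 0.5:
        patterns.append("inconsistent_rhythm")
    return patterns


print(classify_patterns(85, 0.9))  # ['fast_typist', 'consistent_rhythm']
```

Values between the thresholds (for example 50 WPM with a consistency of 0.7) produce an empty list in this sketch.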
 ### 2. Anomaly Detection
 Compares current behavior against the user's baseline profile to identify deviations.
 **Detectable Anomalies:**
 - **typing_speed_deviation** - Significant change in typing speed
 - **rhythm_deviation** - Unusual change in keystroke rhythm
 **Endpoint:**
 ```
 POST /anomalies/detect
 ```
 **Request:**
 ```json
 {
   "events": [...],
   "user_id": "user123"
 }
 ```
 **Response:**
 ```json
 [
   {
     "timestamp": "2024-01-15T10:30:00",
     "anomaly_type": "typing_speed_deviation",
     "severity": 0.65,
     "reason": "Typing speed deviation of 65% from baseline",
     "expected_value": 50,
     "actual_value": 82.5
   }
 ]
 ```
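The numbers in the example response are consistent with a relative deviation: (82.5 - 50) / 50 = 0.65. The sketch below assumes severity is that relative deviation capped at 1.0. The function name `relative_deviation` is hypothetical, and the service may compute severity differently.

```python
def relative_deviation(expected: float, actual: float) -> float:
    """Relative deviation of `actual` from `expected`, capped at 1.0 (assumed cap)."""
    if expected == 0:
        # Assumption: without a usable baseline value, any nonzero value
        # counts as a maximal deviation.
        return 0.0 if actual == 0 else 1.0
    return min(abs(actual - expected) / abs(expected), 1.0)


severity = relative_deviation(expected=50, actual=82.5)
print(f"Typing speed deviation of {severity:.0%} from baseline")
```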
 ### 3. Behavioral Profile Building
 Creates a comprehensive user profile from keystroke data.
 **Profile Components:**
 - Average typing speed (WPM)
 - Peak activity hours
 - Most common words
 - Consistency score (0.0-1.0)
 - Detected patterns
 **Endpoint:**
 ```
 POST /profile/build
 ```
 **Request:**
 ```json
 {
   "events": [...],
   "user_id": "user123"
 }
 ```
 **Response:**
 ```json
 {
   "user_id": "user123",
   "avg_typing_speed": 58.5,
   "peak_hours": [9, 10, 14, 15, 16],
   "common_words": ["the", "and", "test", "python", "data"],
   "consistency_score": 0.78,
   "patterns": ["consistent_rhythm"]
 }
 ```
 ### 4. User Authenticity Verification
 Verifies whether a keystroke pattern matches a known user profile (biometric authentication).
 **Verdict Levels:**
 - **authentic** - High confidence match (score > 0.8)
 - **likely_authentic** - Good confidence match (score > 0.6)
 - **uncertain** - Moderate confidence (score > 0.4)
 - **suspicious** - Low confidence match (score ≤ 0.4)
 - **unknown** - No baseline profile established
 **Endpoint:**
 ```
 POST /authenticity/check
 ```
 **Request:**
 ```json
 {
   "events": [...],
   "user_id": "user123"
 }
 ```
 **Response:**
 ```json
 {
   "authenticity_score": 0.87,
   "confidence": 0.85,
   "verdict": "authentic",
   "reason": "Speed match: 92.1%, Consistency match: 82.5%"
 }
 ```
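The verdict levels map directly onto score ranges. A minimal sketch of that mapping follows; the function name `verdict_for` and the use of `None` to represent a missing baseline profile are assumptions, not part of the service API.

```python
from typing import Optional


def verdict_for(score: Optional[float]) -> str:
    """Translate an authenticity score into one of the documented verdict levels."""
    if score is None:
        # Assumption: None stands for "no baseline profile established".
        return "unknown"
    if score > 0.8:
        return "authentic"
    if score > 0.6:
        return "likely_authentic"
    if score > 0.4:
        return "uncertain"
    return "suspicious"


print(verdict_for(0.87))  # authentic
```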
 ### 5. Temporal Analysis
 Analyzes keystroke patterns over time periods.
 **Analysis Output:**
 - Activity trends (increasing/decreasing)
 - Daily breakdown
 - Weekly patterns
 - Seasonal variations
 **Endpoint:**
 ```
 POST /temporal/analyze
 ```
 **Request:**
 ```json
 {
   "date_range_days": 7
 }
 ```
 **Response:**
 ```json
 {
   "trend": "increasing",
   "date_range_days": 7,
   "analysis": [
     {"date": "2024-01-08", "total_events": 1250},
     {"date": "2024-01-09", "total_events": 1380},
     {"date": "2024-01-10", "total_events": 1450}
   ]
 }
 ```
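This document does not specify how the `trend` value is derived. One plausible approach, shown here purely as an illustration, compares the average of the earlier days with the average of the later days. The function name `activity_trend` and the `"stable"` label are assumptions.

```python
def activity_trend(daily_totals: list[int]) -> str:
    """Classify daily event counts as increasing, decreasing, or stable."""
    if len(daily_totals) < 2:
        return "stable"  # assumed label when there is nothing to compare
    half = len(daily_totals) // 2
    # Compare the first `half` days with the last `half` days;
    # a middle day (odd lengths) is ignored.
    earlier = daily_totals[:half]
    later = daily_totals[-half:]
    earlier_avg = sum(earlier) / len(earlier)
    later_avg = sum(later) / len(later)
    if later_avg > earlier_avg:
        return "increasing"
    if later_avg < earlier_avg:
        return "decreasing"
    return "stable"


print(activity_trend([1250, 1380, 1450]))  # increasing
```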
 ### 6. ML Model Training
 Trains models on historical keystroke data for predictions.
 **Endpoint:**
 ```
 POST /model/train
 ```
 **Parameters:**
 - `sample_size` (query parameter, optional, default=100, max=10000): number of training samples
 **Response:**
 ```json
 {
   "status": "trained",
   "samples": 500,
   "features": ["typing_speed", "consistency", "rhythm_pattern"],
   "accuracy": 0.89
 }
 ```
 ### 7. Behavior Prediction
 Predicts user behavior based on the trained model.
 **Predicted Behaviors:**
 - **normal** - Expected behavior
 - **fast_focused** - Fast, focused typing (>80 WPM)
 - **slow_deliberate** - Careful typing (<30 WPM)
 - **stressed_or_tired** - Inconsistent rhythm (consistency <0.5)
 **Endpoint:**
 ```
 POST /behavior/predict
 ```
 **Request:**
 ```json
 {
   "events": [...],
   "user_id": "user123"
 }
 ```
 **Response:**
 ```json
 {
   "status": "predicted",
   "behavior_category": "fast_focused",
   "confidence": 0.89,
   "features": {
     "typing_speed": 85,
     "consistency": 0.82
   }
 }
 ```
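The behavior categories above come with explicit thresholds, so a threshold-based stand-in can illustrate what a prediction looks like. This is a sketch only and does not represent the trained model; the function name `categorize_behavior` and the order in which the rules are checked are assumptions.

```python
def categorize_behavior(typing_speed: float, consistency: float) -> str:
    """Pick a behavior category from the documented thresholds."""
    # Assumed precedence: rhythm is checked before speed.
    if consistency < 0.5:
        return "stressed_or_tired"
    if typing_speed > 80:
        return "fast_focused"
    if typing_speed < 30:
        return "slow_deliberate"
    return "normal"


print(categorize_behavior(typing_speed=85, consistency=0.82))  # fast_focused
```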
 ## Data Flow
 ### Pattern Detection Flow
 ```
 Keystroke Events → Analyze Typing Metrics → Identify Patterns → Return Results
      ↓
   - Calculate WPM
   - Calculate Consistency
   - Compare to Thresholds
 ```
 ### Anomaly Detection Flow
 ```
 Keystroke Events → Build Profile → Compare to Baseline → Detect Deviations → Alert
      ↓
   Store as Baseline (first time)
   Use for Comparison (subsequent)
 ```
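The store-then-compare step in this flow can be sketched with an in-memory dictionary. The class name `BaselineStore`, its `check` method, and the profile shape are hypothetical and only illustrate the "first time" versus "subsequent" branches.

```python
class BaselineStore:
    """In-memory baseline profiles keyed by user_id (lost when the process exits)."""

    def __init__(self) -> None:
        self._profiles: dict[str, dict] = {}

    def check(self, user_id: str, profile: dict):
        """Store the first profile as baseline; later return (baseline, profile)."""
        baseline = self._profiles.get(user_id)
        if baseline is None:
            # First time: nothing to compare against yet.
            self._profiles[user_id] = profile
            return None
        # Subsequent calls: hand both profiles to the comparison step.
        return baseline, profile


store = BaselineStore()
first = store.check("alice", {"avg_typing_speed": 50.0})
second = store.check("alice", {"avg_typing_speed": 82.5})
print(first)   # None: the first profile became the baseline
print(second)  # the stored baseline paired with the current profile
```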
 ### Authenticity Verification Flow
 ```
 Keystroke Events → Extract Features → Compare to Baseline → Calculate Score → Verdict
      ↓
   - Speed match percentage
   - Consistency match percentage
   - Combined score
 ```
 ## Metrics
 ### Typing Speed (WPM)
 Calculated as words per minute:
 ```
 WPM = (Total Characters / 5) / (Total Time in Minutes)
 ```
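As a worked example of the formula, the sketch below computes WPM from a list of events. It assumes timestamps are in milliseconds and that each `press` event counts as one character; neither assumption is confirmed by this document, and `words_per_minute` is a hypothetical name.

```python
def words_per_minute(events: list[dict]) -> float:
    """WPM = (characters / 5) / minutes, counting each press as one character."""
    presses = [e for e in events if e["event_type"] == "press"]
    if len(presses) < 2:
        return 0.0  # not enough data to measure elapsed time
    # Assumption: timestamps are in milliseconds.
    elapsed_ms = presses[-1]["timestamp"] - presses[0]["timestamp"]
    if elapsed_ms <= 0:
        return 0.0
    minutes = elapsed_ms / 60_000
    return (len(presses) / 5) / minutes


# 61 presses spaced one second apart span exactly one minute.
sample = [{"timestamp": i * 1000, "key_code": 65, "event_type": "press"} for i in range(61)]
wpm = words_per_minute(sample)
```

Here 61 characters over one minute gives 61 / 5 = 12.2 WPM.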
 ### Rhythm Consistency (0.0 to 1.0)
 Measures regularity of keystroke intervals:
 ```
 Consistency = 1.0 - (Standard Deviation / Mean Interval)
 ```
 Higher values indicate more consistent rhythm.
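A short sketch of this metric follows. Note that the raw formula drops below 0.0 whenever the standard deviation exceeds the mean interval, so the sketch clamps the result to the documented range; the clamping, the use of the population standard deviation, and the name `rhythm_consistency` are assumptions.

```python
from statistics import mean, pstdev


def rhythm_consistency(timestamps: list[float]) -> float:
    """Consistency = 1.0 - (std dev of intervals / mean interval), clamped to [0, 1]."""
    intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        return 0.0  # assumption: too few intervals to judge rhythm
    average = mean(intervals)
    if average <= 0:
        return 0.0
    score = 1.0 - pstdev(intervals) / average
    # Assumed clamp so the result stays within the documented 0.0-1.0 range.
    return max(0.0, min(1.0, score))


print(rhythm_consistency([0, 100, 200, 300]))  # 1.0 (perfectly regular)
```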
 ### Authenticity Score (0.0 to 1.0)
 Composite score combining:
 - Speed match (50% weight)
 - Consistency match (50% weight)
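With equal weights, the composite score is the average of the two match values. The sketch below takes both matches as fractions between 0.0 and 1.0; how the service derives each match value is not specified here, and `authenticity_score` is a hypothetical name.

```python
def authenticity_score(speed_match: float, consistency_match: float) -> float:
    """Combine the two match values with the documented 50/50 weighting."""
    return 0.5 * speed_match + 0.5 * consistency_match


# Speed match 92.1% and consistency match 82.5%.
score = authenticity_score(0.921, 0.825)
print(round(score, 3))  # 0.873
```

These inputs correspond to the `reason` string in the authenticity response example, whose reported score is 0.87.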
 ### Anomaly Severity (0.0 to 1.0)
 Indicates how significant the deviation from the baseline is.
 ## Usage Examples
 ### Example 1: Detect User's Typing Patterns
 ```bash
 curl -X POST http://localhost:8003/patterns/detect \
   -H "Content-Type: application/json" \
   -d '{
     "events": [
       {"timestamp": 0, "key_code": 65, "event_type": "press"},
       {"timestamp": 95, "key_code": 66, "event_type": "press"},
       {"timestamp": 190, "key_code": 67, "event_type": "press"}
     ],
     "user_id": "alice"
   }'
 ```
 ### Example 2: Build User Baseline Profile
 ```bash
 # Replace [...] with 200+ keystroke events
 curl -X POST http://localhost:8003/profile/build \
   -H "Content-Type: application/json" \
   -d '{
     "events": [...],
     "user_id": "alice"
   }'
 ```
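For local experiments, a request body with enough events can be generated rather than typed by hand. The helper below is a hypothetical sketch: `make_events`, its parameters, and the evenly spaced synthetic timestamps are illustrative, and only the event fields shown in the request examples are used.

```python
import json


def make_events(count: int, interval_ms: int = 100, key_code: int = 65) -> list[dict]:
    """Build `count` synthetic press events spaced `interval_ms` apart."""
    return [
        {"timestamp": i * interval_ms, "key_code": key_code, "event_type": "press"}
        for i in range(count)
    ]


payload = {"events": make_events(200), "user_id": "alice"}
body = json.dumps(payload)  # JSON text suitable as a request body
```

Saving `body` to a file such as `payload.json` allows it to be sent with `curl -d @payload.json`. Evenly spaced events give a perfectly regular rhythm, so real keystroke captures are still needed for a meaningful profile.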
 ### Example 3: Check User Authenticity
 ```bash
 # First, build profile
 curl -X POST http://localhost:8003/profile/build \
   -H "Content-Type: application/json" \
   -d '{"events": [...], "user_id": "alice"}'
 # Then check whether new events match (replace [...] with the new keystroke events)
 curl -X POST http://localhost:8003/authenticity/check \
   -H "Content-Type: application/json" \
   -d '{
     "events": [...],
     "user_id": "alice"
   }'
 ```
 ### Example 4: Predict Behavior
 ```bash
 # Train model
 curl -X POST "http://localhost:8003/model/train?sample_size=500"
 # Predict behavior
 curl -X POST http://localhost:8003/behavior/predict \
   -H "Content-Type: application/json" \
   -d '{
     "events": [...],
     "user_id": "alice"
   }'
 ```
 ## Integration with Main API
 The ML service can be called from the main API. To add ML endpoints to the main API:
 ```python
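 import httpx

 # Assumes `app` is the main API's existing FastAPI application object.
 @app.post("/api/ml/patterns")
 async def analyze_patterns_endpoint(user_id: str, events: list[dict]):
     # Under FastAPI, `events` is read from the request body and `user_id` from the query string.
     async with httpx.AsyncClient() as client:
         # Forward the keystroke events to the ML service.
         response = await client.post(
             "http://ml_service:8003/patterns/detect",
             json={"events": events, "user_id": user_id},
         )
         response.raise_for_status()  # fail loudly if the ML service returns an error
         return response.json()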
 ```
 ## Performance Characteristics
 Typical latencies on 2 CPUs and 2 GB RAM:
 - Pattern detection: 50-100ms
 - Anomaly detection: 80-150ms
 - Profile building: 150-300ms
 - Authenticity check: 100-200ms
 - Temporal analysis: 200-500ms (depends on data range)
 - Model training: 500-1000ms (depends on sample size)
 - Behavior prediction: 50-100ms
 ## Security Considerations
 1. **Input Validation**
    - Events must be valid timestamped data
    - User IDs sanitized
 2. **Privacy**
    - Profiles stored only in memory during service lifetime
    - No persistent profile storage in ML service
 3. **Access Control**
    - Runs on internal network (port 8003)
    - Not exposed directly to clients
    - Access via main API with authentication
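The input validation point can be illustrated with a small check on the event payload. This is a sketch under assumptions: the function name `validate_events` and the exact rules (non-negative numeric timestamp, integer key code, string event type) are inferred from the request examples rather than taken from the service code.

```python
def validate_events(events: object) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    if not isinstance(events, list) or not events:
        return ["events must be a non-empty list"]
    problems = []
    for index, event in enumerate(events):
        if not isinstance(event, dict):
            problems.append(f"event {index}: not an object")
            continue
        timestamp = event.get("timestamp")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(timestamp, bool) or not isinstance(timestamp, (int, float)) or timestamp < 0:
            problems.append(f"event {index}: timestamp must be a non-negative number")
        if not isinstance(event.get("key_code"), int):
            problems.append(f"event {index}: key_code must be an integer")
        if not isinstance(event.get("event_type"), str):
            problems.append(f"event {index}: event_type must be a string")
    return problems


print(validate_events([{"timestamp": 0, "key_code": 65, "event_type": "press"}]))  # []
```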
 ## Limitations
 1. **Baseline Establishment**
    - Requires minimum keystroke events (100+) for accurate profile
    - Needs established baseline for anomaly detection
 2. **Model Accuracy**
    - Accuracy depends on training data quality
    - New user profiles need 200+ samples for reliability
 3. **Time-Based Features**
    - Temporal analysis requires historical data in database
    - Peak hour detection requires events across different times
 ## Future Enhancements
 1. **Advanced ML Models**
    - Neural network-based behavior classification
    - Seasonal pattern detection
    - Predictive analytics
 2. **Continuous Learning**
    - Automatic profile updates
    - Adaptive thresholds
    - User adaptation tracking
 3. **Threat Detection**
    - Replay attack detection
    - Impersonation detection
    - Behavioral drift tracking
 4. **Integration**
    - Real-time alerts for anomalies
    - Dashboard visualizations
    - Export capabilities
 ## Troubleshooting
 ### Service won't start
 ```bash
 docker-compose logs ml_service
 ```
 ### Pattern detection returns empty
 - Ensure events list is not empty
 - Minimum 10 events recommended for pattern detection
 ### Anomaly detection shows no anomalies
 - Build baseline first with `/profile/build`
 - Ensure user_id matches between profile and check
 ### Authenticity score always ~0.5
 - Profile not established for user
 - Need to call `/profile/build` first
 ## Testing
 Run ML service tests:
 ```bash
 pytest tests/test_ml_service.py -v
 ```
 Run specific test:
 ```bash
 pytest tests/test_ml_service.py::TestPatternDetection::test_detect_fast_typing_pattern -v
 ```
 ## References
 - Main documentation: [docs/API.md](API.md)
 - Performance guide: [docs/PERFORMANCE.md](PERFORMANCE.md)
 - Deployment guide: [docs/DEPLOYMENT.md](DEPLOYMENT.md)