# Tikker ML Analytics - Advanced Pattern Detection & Behavioral Analysis

## Overview

The Tikker ML Analytics service provides machine-learning-powered insights into keystroke behavior. It detects patterns, identifies anomalies, builds behavioral profiles, and supports verification of user authenticity.

**Service Port:** 8003

## Architecture

The ML service runs as an independent microservice and reads from the SQLite database it shares with the other services.

```
┌─────────────────────────────────┐
│    ML Analytics Service:8003    │
├─────────────────────────────────┤
│ - Pattern Detection             │
│ - Anomaly Detection             │
│ - Behavioral Profiling          │
│ - User Authenticity Check       │
│ - Temporal Analysis             │
│ - ML Model Training & Inference │
└────────────┬────────────────────┘
             │
             ▼
      ┌─────────────┐
      │  SQLite DB  │
      │ (tikker.db) │
      └─────────────┘
```

## Capabilities

### 1. Pattern Detection

Automatically identifies typing patterns and behavioral characteristics.

**Detected Patterns:**

- **fast_typist** - User types significantly faster than average (>80 WPM)
- **slow_typist** - User types slower than average (<20 WPM)
- **consistent_rhythm** - Very regular keystroke timing (consistency >0.85)
- **inconsistent_rhythm** - Irregular keystroke timing (consistency <0.5)

**Endpoint:**

```
POST /patterns/detect
```

**Request:**

```json
{
  "events": [
    {"timestamp": 0, "key_code": 65, "event_type": "press"},
    {"timestamp": 100, "key_code": 66, "event_type": "press"}
  ],
  "user_id": "user123"
}
```

**Response:**

```json
[
  {
    "name": "fast_typist",
    "confidence": 0.92,
    "frequency": 150,
    "description": "User types significantly faster than average",
    "features": {
      "avg_wpm": 85
    }
  }
]
```

### 2. Anomaly Detection

Compares current behavior against the user's baseline profile to identify deviations.
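
One way to picture the baseline comparison is the minimal sketch below. It computes the two metrics the service compares, using the WPM and consistency formulas from the Metrics section, and then flags a speed deviation. Several details are assumptions not stated in this document: timestamps are in milliseconds, every `press` event counts as one character, the standard deviation is the population one, severity is the relative deviation capped at 1.0, and a deviation must exceed a hypothetical 30% threshold to be reported. `typing_metrics` and `detect_speed_anomaly` are illustrative names, not the service's actual API.

```python
import statistics
from typing import Optional


def typing_metrics(events: list[dict]) -> dict:
    """Compute WPM and rhythm consistency from keystroke events.

    Assumption: timestamps are milliseconds and each "press" is one character.
    """
    presses = sorted(e["timestamp"] for e in events if e["event_type"] == "press")
    if len(presses) < 2:
        raise ValueError("need at least two press events")
    minutes = (presses[-1] - presses[0]) / 60_000
    # WPM = (total characters / 5) / total time in minutes
    wpm = (len(presses) / 5) / minutes if minutes > 0 else 0.0
    intervals = [b - a for a, b in zip(presses, presses[1:])]
    mean = statistics.mean(intervals)
    # Consistency = 1 - (population std dev / mean interval), clamped to [0, 1]
    consistency = 1.0 - statistics.pstdev(intervals) / mean if mean > 0 else 0.0
    return {"wpm": wpm, "consistency": max(0.0, min(1.0, consistency))}


def detect_speed_anomaly(baseline_wpm: float, current_wpm: float,
                         threshold: float = 0.3) -> Optional[dict]:
    """Report a typing_speed_deviation if the relative change exceeds the
    (hypothetical) threshold; severity is the relative deviation, capped at 1.0."""
    deviation = abs(current_wpm - baseline_wpm) / baseline_wpm
    if deviation <= threshold:
        return None
    return {
        "anomaly_type": "typing_speed_deviation",
        "severity": round(min(1.0, deviation), 2),
        "reason": f"Typing speed deviation of {deviation:.0%} from baseline",
        "expected_value": baseline_wpm,
        "actual_value": current_wpm,
    }


anomaly = detect_speed_anomaly(50, 82.5)
print(anomaly["severity"], anomaly["reason"])
# 0.65 Typing speed deviation of 65% from baseline
```

With the baseline of 50 WPM and the observed 82.5 WPM used in the example response below, the sketch reproduces the documented severity of 0.65.
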

**Detectable Anomalies:**

- **typing_speed_deviation** - Significant change in typing speed
- **rhythm_deviation** - Unusual change in keystroke rhythm

**Endpoint:**

```
POST /anomalies/detect
```

**Request:**

```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**

```json
[
  {
    "timestamp": "2024-01-15T10:30:00",
    "anomaly_type": "typing_speed_deviation",
    "severity": 0.65,
    "reason": "Typing speed deviation of 65% from baseline",
    "expected_value": 50,
    "actual_value": 82.5
  }
]
```

### 3. Behavioral Profile Building

Builds a comprehensive user profile from keystroke data.

**Profile Components:**

- Average typing speed (WPM)
- Peak activity hours
- Most common words
- Consistency score (0.0-1.0)
- Detected patterns

**Endpoint:**

```
POST /profile/build
```

**Request:**

```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**

```json
{
  "user_id": "user123",
  "avg_typing_speed": 58.5,
  "peak_hours": [9, 10, 14, 15, 16],
  "common_words": ["the", "and", "test", "python", "data"],
  "consistency_score": 0.78,
  "patterns": ["consistent_rhythm"]
}
```

### 4. User Authenticity Verification

Verifies whether a keystroke pattern matches a known user profile (biometric authentication).

**Verdict Levels:**

- **authentic** - High-confidence match (score > 0.8)
- **likely_authentic** - Good-confidence match (0.6 < score ≤ 0.8)
- **uncertain** - Moderate confidence (0.4 < score ≤ 0.6)
- **suspicious** - Low-confidence match (score ≤ 0.4)
- **unknown** - No baseline profile has been established

**Endpoint:**

```
POST /authenticity/check
```

**Request:**

```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**

```json
{
  "authenticity_score": 0.87,
  "confidence": 0.85,
  "verdict": "authentic",
  "reason": "Speed match: 92.1%, Consistency match: 82.5%"
}
```

### 5. Temporal Analysis

Analyzes how keystroke activity changes over a period of time.
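
A rough sketch of how the daily breakdown and trend label could be derived is shown below. It assumes the event times have already been loaded from `tikker.db` as `datetime` objects, and it labels the trend by comparing the first and last day's totals; the real service may use a different rule, such as a fitted slope. The function names and the `stable` and `insufficient_data` labels are hypothetical.

```python
from collections import Counter
from datetime import datetime


def daily_breakdown(event_times: list[datetime]) -> list[dict]:
    """Count events per calendar day, oldest day first."""
    counts = Counter(t.date() for t in event_times)
    return [
        {"date": day.isoformat(), "total_events": n}
        for day, n in sorted(counts.items())
    ]


def trend(analysis: list[dict]) -> str:
    """Label the trend by comparing first and last day's totals (assumed rule)."""
    if len(analysis) < 2:
        return "insufficient_data"  # hypothetical label
    first = analysis[0]["total_events"]
    last = analysis[-1]["total_events"]
    if last > first:
        return "increasing"
    if last < first:
        return "decreasing"
    return "stable"  # hypothetical label


# Hypothetical event times spread over two days
times = [
    datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 14, 30),
    datetime(2024, 1, 9, 10, 15), datetime(2024, 1, 9, 11, 0),
    datetime(2024, 1, 9, 16, 45),
]
breakdown = daily_breakdown(times)
print(breakdown)
# [{'date': '2024-01-08', 'total_events': 2}, {'date': '2024-01-09', 'total_events': 3}]
print(trend(breakdown))
# increasing
```

Applied to the three daily totals in the example response below (1250, 1380, 1450), this rule also yields `"increasing"`.
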

**Analysis Output:**

- Activity trends (increasing/decreasing)
- Daily breakdown
- Weekly patterns
- Seasonal variations

**Endpoint:**

```
POST /temporal/analyze
```

**Request:**

```json
{
  "date_range_days": 7
}
```

**Response:**

```json
{
  "trend": "increasing",
  "date_range_days": 7,
  "analysis": [
    {"date": "2024-01-08", "total_events": 1250},
    {"date": "2024-01-09", "total_events": 1380},
    {"date": "2024-01-10", "total_events": 1450}
  ]
}
```

### 6. ML Model Training

Trains models on historical keystroke data for use in predictions.

**Endpoint:**

```
POST /model/train
```

**Query Parameters:**

- `sample_size` (optional, default 100, max 10000): number of training samples

**Response:**

```json
{
  "status": "trained",
  "samples": 500,
  "features": ["typing_speed", "consistency", "rhythm_pattern"],
  "accuracy": 0.89
}
```

### 7. Behavior Prediction

Predicts user behavior using the trained model.

**Predicted Behaviors:**

- **normal** - Expected behavior
- **fast_focused** - Fast, focused typing (>80 WPM)
- **slow_deliberate** - Careful typing (<30 WPM)
- **stressed_or_tired** - Inconsistent rhythm (consistency <0.5)

**Endpoint:**

```
POST /behavior/predict
```

**Request:**

```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**

```json
{
  "status": "predicted",
  "behavior_category": "fast_focused",
  "confidence": 0.89,
  "features": {
    "typing_speed": 85,
    "consistency": 0.82
  }
}
```

## Data Flow

### Pattern Detection Flow

```
Keystroke Events → Analyze Typing Metrics → Identify Patterns → Return Results
                             ↓
                   - Calculate WPM
                   - Calculate Consistency
                   - Compare to Thresholds
```

### Anomaly Detection Flow

```
Keystroke Events → Build Profile → Compare to Baseline → Detect Deviations → Alert
                         ↓
                   Store as Baseline (first time)
                   Use for Comparison (subsequent)
```

### Authenticity Verification Flow

```
Keystroke Events → Extract Features → Compare to Baseline → Calculate Score → Verdict
                                                                   ↓
                                                            - Speed match percentage
                                                            - Consistency match percentage
                                                            - Combined score
```

## Metrics

### Typing Speed (WPM)

Calculated as words per minute, counting every five characters as one word:

```
WPM = (Total Characters / 5) / (Total Time in Minutes)
```

### Rhythm Consistency (0.0 to 1.0)

Measures the regularity of the intervals between keystrokes:

```
Consistency = 1.0 - (Standard Deviation of Intervals / Mean Interval)
```

Higher values indicate a more consistent rhythm.

### Authenticity Score (0.0 to 1.0)

A composite score combining:

- Speed match (50% weight)
- Consistency match (50% weight)

For example, a speed match of 92.1% and a consistency match of 82.5% give 0.5 × 0.921 + 0.5 × 0.825 = 0.873, which rounds to the 0.87 score in the `/authenticity/check` example response.

### Anomaly Severity (0.0 to 1.0)

Indicates how far the current behavior deviates from the baseline. In the `/anomalies/detect` example response, a 65% speed deviation is reported with a severity of 0.65.

## Usage Examples

### Example 1: Detect a User's Typing Patterns

```bash
curl -X POST http://localhost:8003/patterns/detect \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {"timestamp": 0, "key_code": 65, "event_type": "press"},
      {"timestamp": 95, "key_code": 66, "event_type": "press"},
      {"timestamp": 190, "key_code": 67, "event_type": "press"}
    ],
    "user_id": "alice"
  }'
```

### Example 2: Build a User Baseline Profile

```bash
# Replace [...] with 200+ keystroke events
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

### Example 3: Check User Authenticity

```bash
# First, build the baseline profile
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{"events": [...], "user_id": "alice"}'

# Then check whether new keystroke events (replace [...]) match it
curl -X POST http://localhost:8003/authenticity/check \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

### Example 4: Predict Behavior

```bash
# Train the model (quote the URL so the shell does not interpret "?")
curl -X POST "http://localhost:8003/model/train?sample_size=500"

# Predict behavior
curl -X POST http://localhost:8003/behavior/predict \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

## Integration with Main API

The ML service can be called from the main API.

To add ML endpoints to the main API, proxy requests to the ML service. The example assumes the main API is a FastAPI app and that callers send their keystroke events in the request body; `PatternRequest` is an illustrative model name:

```python
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()  # in the main API this app object already exists

class PatternRequest(BaseModel):
    user_id: str
    events: list[dict]  # keystroke events (timestamp, key_code, event_type)

@app.post("/api/ml/patterns")
async def analyze_patterns_endpoint(request: PatternRequest):
    # Forward the events to the internal ML service and relay its JSON response
    async with httpx.AsyncClient(timeout=10.0) as client:
        response = await client.post(
            "http://ml_service:8003/patterns/detect",
            json={"events": request.events, "user_id": request.user_id},
        )
        response.raise_for_status()
    return response.json()
```

## Performance Characteristics

Typical latencies on a host with 2 CPUs and 2 GB RAM:

- Pattern detection: 50-100 ms
- Anomaly detection: 80-150 ms
- Profile building: 150-300 ms
- Authenticity check: 100-200 ms
- Temporal analysis: 200-500 ms (depends on the date range)
- Model training: 500-1000 ms (depends on the sample size)
- Behavior prediction: 50-100 ms

## Security Considerations

1. **Input Validation**
   - Events must be valid, timestamped data
   - User IDs are sanitized
2. **Privacy**
   - Profiles are held only in memory for the lifetime of the service
   - The ML service does not persist profiles
3. **Access Control**
   - Runs on the internal network (port 8003)
   - Not exposed directly to clients
   - Accessed through the main API, which handles authentication

## Limitations

1. **Baseline Establishment**
   - Requires at least 100 keystroke events for an accurate profile
   - Anomaly detection needs an established baseline
2. **Model Accuracy**
   - Accuracy depends on the quality of the training data
   - New user profiles need 200+ samples to be reliable
3. **Time-Based Features**
   - Temporal analysis requires historical data in the database
   - Peak-hour detection requires events recorded at different times of day

## Future Enhancements

1. **Advanced ML Models**
   - Neural-network-based behavior classification
   - Seasonal pattern detection
   - Predictive analytics
2. **Continuous Learning**
   - Automatic profile updates
   - Adaptive thresholds
   - User adaptation tracking
3. **Threat Detection**
   - Replay attack detection
   - Impersonation detection
   - Behavioral drift tracking
4. **Integration**
   - Real-time alerts for anomalies
   - Dashboard visualizations
   - Export capabilities

## Troubleshooting

### Service won't start

Check the service logs:

```bash
docker-compose logs ml_service
```

### Pattern detection returns empty results

- Ensure the events list is not empty
- At least 10 events are recommended for pattern detection

### Anomaly detection shows no anomalies

- Build a baseline first with `/profile/build`
- Ensure the `user_id` matches between the profile and the check

### Authenticity score is always ~0.5

- No profile has been established for the user
- Call `/profile/build` first

## Testing

Run the ML service tests:

```bash
pytest tests/test_ml_service.py -v
```

Run a specific test:

```bash
pytest tests/test_ml_service.py::TestPatternDetection::test_detect_fast_typing_pattern -v
```

## References

- Main documentation: [docs/API.md](API.md)
- Performance guide: [docs/PERFORMANCE.md](PERFORMANCE.md)
- Deployment guide: [docs/DEPLOYMENT.md](DEPLOYMENT.md)