Tikker ML Analytics - Advanced Pattern Detection & Behavioral Analysis
Overview
The Tikker ML Analytics service provides machine learning-powered insights into keystroke behavior. It detects patterns, identifies anomalies, builds behavioral profiles, and enables user authenticity verification.
Service Port: 8003
Architecture
The ML service runs as an independent microservice and uses the SQLite database shared with the other services.
┌─────────────────────────────────┐
│ ML Analytics Service:8003       │
├─────────────────────────────────┤
│ - Pattern Detection             │
│ - Anomaly Detection             │
│ - Behavioral Profiling          │
│ - User Authenticity Check       │
│ - Temporal Analysis             │
│ - ML Model Training & Inference │
└────────────┬────────────────────┘
             │
             ▼
      ┌─────────────┐
      │ SQLite DB   │
      │ (tikker.db) │
      └─────────────┘
Capabilities
1. Pattern Detection
Automatically identifies typing patterns and behavioral characteristics.
Detected Patterns:
- fast_typist - User types significantly faster than average (>80 WPM)
- slow_typist - User types slower than average (<20 WPM)
- consistent_rhythm - Very regular keystroke timing (consistency >0.85)
- inconsistent_rhythm - Irregular keystroke timing (consistency <0.5)
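The thresholds listed above can be read as a small rule set. The following is a minimal sketch, assuming WPM and consistency have already been computed; `detect_patterns` is a hypothetical helper for illustration, not the service's actual implementation.

```python
def detect_patterns(avg_wpm: float, consistency: float) -> list[str]:
    """Return the pattern names whose documented thresholds are met."""
    patterns = []
    # Speed patterns: >80 WPM is fast, <20 WPM is slow
    if avg_wpm > 80:
        patterns.append("fast_typist")
    elif avg_wpm < 20:
        patterns.append("slow_typist")
    # Rhythm patterns: >0.85 is consistent, <0.5 is inconsistent
    if consistency > 0.85:
        patterns.append("consistent_rhythm")
    elif consistency < 0.5:
        patterns.append("inconsistent_rhythm")
    return patterns


print(detect_patterns(avg_wpm=85, consistency=0.9))
# → ['fast_typist', 'consistent_rhythm']
```

A user whose values fall between the thresholds (for example 50 WPM at 0.7 consistency) matches no pattern, which would explain an empty result.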
Endpoint:
POST /patterns/detect
Request:
{
  "events": [
    {"timestamp": 0, "key_code": 65, "event_type": "press"},
    {"timestamp": 100, "key_code": 66, "event_type": "press"}
  ],
  "user_id": "user123"
}
Response:
[
  {
    "name": "fast_typist",
    "confidence": 0.92,
    "frequency": 150,
    "description": "User types significantly faster than average",
    "features": {
      "avg_wpm": 85
    }
  }
]
2. Anomaly Detection
Compares current behavior against the user's baseline profile to identify deviations.
Detectable Anomalies:
- typing_speed_deviation - Significant change in typing speed
- rhythm_deviation - Unusual change in keystroke rhythm
Endpoint:
POST /anomalies/detect
Request:
{
  "events": [...],
  "user_id": "user123"
}
Response:
[
  {
    "timestamp": "2024-01-15T10:30:00",
    "anomaly_type": "typing_speed_deviation",
    "severity": 0.65,
    "reason": "Typing speed deviation of 65% from baseline",
    "expected_value": 50,
    "actual_value": 82.5
  }
]
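The numbers in this response are consistent with a relative deviation: |82.5 - 50| / 50 = 0.65. The following is a minimal sketch of that calculation. The function name, the 0.3 reporting threshold, and the cap at 1.0 are assumptions for illustration; the service's real rules are not documented here.

```python
def speed_anomaly(expected_wpm: float, actual_wpm: float, threshold: float = 0.3):
    """Return an anomaly dict if the relative deviation exceeds threshold, else None.

    The threshold of 0.3 is a hypothetical default, not a documented value.
    """
    if expected_wpm <= 0:
        return None  # no usable baseline to compare against
    deviation = abs(actual_wpm - expected_wpm) / expected_wpm
    if deviation <= threshold:
        return None
    return {
        "anomaly_type": "typing_speed_deviation",
        "severity": round(min(deviation, 1.0), 2),  # keep severity within 0.0-1.0
        "reason": f"Typing speed deviation of {deviation:.0%} from baseline",
        "expected_value": expected_wpm,
        "actual_value": actual_wpm,
    }


anomaly = speed_anomaly(expected_wpm=50, actual_wpm=82.5)
print(anomaly["severity"], anomaly["reason"])
# → 0.65 Typing speed deviation of 65% from baseline
```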
3. Behavioral Profile Building
Creates a comprehensive user profile from keystroke data.
Profile Components:
- Average typing speed (WPM)
- Peak activity hours
- Most common words
- Consistency score (0.0-1.0)
- Detected patterns
Endpoint:
POST /profile/build
Request:
{
  "events": [...],
  "user_id": "user123"
}
Response:
{
  "user_id": "user123",
  "avg_typing_speed": 58.5,
  "peak_hours": [9, 10, 14, 15, 16],
  "common_words": ["the", "and", "test", "python", "data"],
  "consistency_score": 0.78,
  "patterns": ["consistent_rhythm"]
}
4. User Authenticity Verification
Verifies whether a keystroke pattern matches a known user profile (behavioral biometric authentication).
Verdict Levels:
- authentic - High confidence match (score > 0.8)
- likely_authentic - Good confidence match (0.6 < score ≤ 0.8)
- uncertain - Moderate confidence (0.4 < score ≤ 0.6)
- suspicious - Low confidence match (score ≤ 0.4)
- unknown - No baseline profile established
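The verdict levels above amount to checking the score against each threshold from highest to lowest. A minimal sketch, where `verdict_for` is a hypothetical helper and passing `None` stands in for "no baseline profile established":

```python
from typing import Optional


def verdict_for(score: Optional[float]) -> str:
    """Map an authenticity score (0.0-1.0) to a verdict level."""
    if score is None:
        return "unknown"  # assumption: None represents a missing baseline
    if score > 0.8:
        return "authentic"
    if score > 0.6:
        return "likely_authentic"
    if score > 0.4:
        return "uncertain"
    return "suspicious"


print(verdict_for(0.87))
# → authentic
```

Because the checks run in descending order, a score of exactly 0.8 falls into likely_authentic and a score of exactly 0.4 into suspicious.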
Endpoint:
POST /authenticity/check
Request:
{
  "events": [...],
  "user_id": "user123"
}
Response:
{
  "authenticity_score": 0.87,
  "confidence": 0.85,
  "verdict": "authentic",
  "reason": "Speed match: 92.1%, Consistency match: 82.5%"
}
5. Temporal Analysis
Analyzes keystroke patterns over time periods.
Analysis Output:
- Activity trends (increasing/decreasing)
- Daily breakdown
- Weekly patterns
- Seasonal variations
Endpoint:
POST /temporal/analyze
Request:
{
  "date_range_days": 7
}
Response:
{
  "trend": "increasing",
  "date_range_days": 7,
  "analysis": [
    {"date": "2024-01-08", "total_events": 1250},
    {"date": "2024-01-09", "total_events": 1380},
    {"date": "2024-01-10", "total_events": 1450}
  ]
}
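How the service derives the trend value is not documented here. One plausible approach, shown purely as a sketch, is to take the sign of a least-squares slope over the daily totals. The function name `activity_trend` and the extra "stable" label for flat or too-short series are assumptions.

```python
def activity_trend(daily_totals: list[int]) -> str:
    """Classify a series of daily event counts by the sign of its fitted slope."""
    n = len(daily_totals)
    if n < 2:
        return "stable"  # assumption: not enough data to call a trend
    mean_x = (n - 1) / 2
    mean_y = sum(daily_totals) / n
    # Numerator of the least-squares slope; the denominator is always positive,
    # so the sign of this sum is the sign of the slope.
    slope_sign = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_totals))
    if slope_sign > 0:
        return "increasing"
    if slope_sign < 0:
        return "decreasing"
    return "stable"


print(activity_trend([1250, 1380, 1450]))
# → increasing
```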
6. ML Model Training
Trains models on historical keystroke data for predictions.
Endpoint:
POST /model/train
Parameters:
- sample_size (optional, default=100, max=10000): Number of training samples
Response:
{
  "status": "trained",
  "samples": 500,
  "features": ["typing_speed", "consistency", "rhythm_pattern"],
  "accuracy": 0.89
}
7. Behavior Prediction
Predicts user behavior based on the trained model.
Predicted Behaviors:
- normal - Expected behavior
- fast_focused - Fast, focused typing (>80 WPM)
- slow_deliberate - Careful typing (<30 WPM)
- stressed_or_tired - Inconsistent rhythm (consistency <0.5)
Endpoint:
POST /behavior/predict
Request:
{
  "events": [...],
  "user_id": "user123"
}
Response:
{
  "status": "predicted",
  "behavior_category": "fast_focused",
  "confidence": 0.89,
  "features": {
    "typing_speed": 85,
    "consistency": 0.82
  }
}
Data Flow
Pattern Detection Flow
Keystroke Events → Analyze Typing Metrics → Identify Patterns → Return Results
↓
- Calculate WPM
- Calculate Consistency
- Compare to Thresholds
Anomaly Detection Flow
Keystroke Events → Build Profile → Compare to Baseline → Detect Deviations → Alert
↓
Store as Baseline (first time)
Use for Comparison (subsequent)
Authenticity Verification Flow
Keystroke Events → Extract Features → Compare to Baseline → Calculate Score → Verdict
↓
- Speed match percentage
- Consistency match percentage
- Combined score
Metrics
Typing Speed (WPM)
Calculated as words per minute:
WPM = (Total Characters / 5) / (Total Time in Minutes)
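As a worked example of this formula, the sketch below computes WPM from a list of events. It assumes that timestamps are in milliseconds and that each press event counts as one character; neither is stated in this document, and `words_per_minute` is a hypothetical name.

```python
def words_per_minute(events: list[dict]) -> float:
    """Compute WPM as (characters / 5) / minutes over the press events."""
    presses = [e for e in events if e["event_type"] == "press"]
    if len(presses) < 2:
        return 0.0  # a time span needs at least two events
    elapsed_ms = presses[-1]["timestamp"] - presses[0]["timestamp"]
    if elapsed_ms <= 0:
        return 0.0
    minutes = elapsed_ms / 60_000  # assumption: timestamps are milliseconds
    return (len(presses) / 5) / minutes


# 301 presses spaced 200 ms apart span exactly one minute
events = [
    {"timestamp": i * 200, "key_code": 65, "event_type": "press"}
    for i in range(301)
]
print(words_per_minute(events))
# → 60.2
```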
Rhythm Consistency (0.0 to 1.0)
Measures regularity of keystroke intervals:
Consistency = 1.0 - (Standard Deviation / Mean Interval)
Higher values indicate more consistent rhythm.
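A minimal sketch of this metric follows. Two details are assumptions rather than documented behavior: the population standard deviation is used, and the result is clamped to the 0.0 to 1.0 range, since the raw formula goes negative whenever the standard deviation exceeds the mean interval.

```python
from statistics import mean, pstdev


def rhythm_consistency(timestamps: list[float]) -> float:
    """Compute 1.0 - (std dev / mean) over the intervals between keystrokes."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        return 0.0  # assumption: too little data is scored as 0.0
    avg = mean(intervals)
    if avg <= 0:
        return 0.0
    score = 1.0 - pstdev(intervals) / avg
    return max(0.0, min(1.0, score))  # clamp into the documented range


print(rhythm_consistency([0, 100, 200, 300]))  # perfectly regular typing
# → 1.0
```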
Authenticity Score (0.0 to 1.0)
Composite score combining:
- Speed match (50% weight)
- Consistency match (50% weight)
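The equal weighting can be sketched as below. How each match percentage is derived is not documented here, so `match_ratio` is a hypothetical definition (one minus the relative difference from baseline) included only to make the example complete.

```python
def match_ratio(baseline: float, observed: float) -> float:
    """Hypothetical similarity in 0.0-1.0: 1.0 when equal, lower as values diverge."""
    if baseline <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(observed - baseline) / baseline)


def authenticity_score(speed_match: float, consistency_match: float) -> float:
    """Combine the two match values with the documented 50/50 weighting."""
    return 0.5 * speed_match + 0.5 * consistency_match


# Using the match values from the /authenticity/check response shown earlier
score = authenticity_score(speed_match=0.921, consistency_match=0.825)
print(round(score, 3))
# → 0.873
```

A combined score of 0.873 is consistent with the authenticity_score of 0.87 in that response.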
Anomaly Severity (0.0 to 1.0)
Indicates how significant the deviation from baseline is.
Usage Examples
Example 1: Detect User's Typing Patterns
curl -X POST http://localhost:8003/patterns/detect \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {"timestamp": 0, "key_code": 65, "event_type": "press"},
      {"timestamp": 95, "key_code": 66, "event_type": "press"},
      {"timestamp": 190, "key_code": 67, "event_type": "press"}
    ],
    "user_id": "alice"
  }'
Example 2: Build User Baseline Profile
# Replace [...] with 200+ keystroke events
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
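The [...] placeholder in these examples stands for a list of keystroke events. For local experiments, a payload of synthetic events can be generated as in the sketch below. `make_events` is a hypothetical helper, and treating key codes 65 to 90 as the letters A to Z is an assumption.

```python
import json


def make_events(count: int, interval_ms: int = 100, first_key_code: int = 65) -> list[dict]:
    """Build evenly spaced synthetic press events, cycling through 26 key codes."""
    return [
        {
            "timestamp": i * interval_ms,
            "key_code": first_key_code + i % 26,
            "event_type": "press",
        }
        for i in range(count)
    ]


# Serialize a request body with 200 events for user "alice"
payload = json.dumps({"events": make_events(200), "user_id": "alice"})
print(len(json.loads(payload)["events"]))
# → 200
```

Writing the payload to a file (for example payload.json) lets curl send it with -d @payload.json instead of an inline body. Evenly spaced events produce a perfectly regular rhythm, so this data is only suitable for smoke tests.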
Example 3: Check User Authenticity
# First, build the profile
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{"events": [...], "user_id": "alice"}'
# Then check whether new keystroke events match it (replace [...] with the new events)
curl -X POST http://localhost:8003/authenticity/check \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
Example 4: Predict Behavior
# Train the model (the URL is quoted so the shell does not interpret the "?")
curl -X POST "http://localhost:8003/model/train?sample_size=500"
# Predict behavior
curl -X POST http://localhost:8003/behavior/predict \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
Integration with Main API
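The ML service can be called from the main API. To add ML endpoints to the main API:
import httpx

# Assumes app is the main API's existing application object (FastAPI-style).
# The original snippet left the source of events unspecified; taking it from
# the request body here is an assumption.
@app.post("/api/ml/patterns")
async def analyze_patterns_endpoint(user_id: str, events: list[dict]):
    # Forward the events to the ML service's pattern detection endpoint
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://ml_service:8003/patterns/detect",
            json={"events": events, "user_id": user_id},
        )
    response.raise_for_status()  # surface ML service errors to the caller
    return response.json()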
Performance Characteristics
Typical latencies on a machine with 2 CPUs and 2 GB RAM:
- Pattern detection: 50-100ms
- Anomaly detection: 80-150ms
- Profile building: 150-300ms
- Authenticity check: 100-200ms
- Temporal analysis: 200-500ms (depends on data range)
- Model training: 500-1000ms (depends on sample size)
- Behavior prediction: 50-100ms
Security Considerations
- Input Validation
  - Events must be valid timestamped data
  - User IDs sanitized
- Privacy
  - Profiles stored only in memory during service lifetime
  - No persistent profile storage in ML service
- Access Control
  - Runs on internal network (port 8003)
  - Not exposed directly to clients
  - Access via main API with authentication
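The input validation point above can be illustrated with a small check over incoming events. This is a sketch under assumptions, not the service's actual validator: the function name, the specific rules (non-negative and non-decreasing timestamps, integer key codes), and the "release" event type are all hypothetical.

```python
ALLOWED_EVENT_TYPES = {"press", "release"}  # "release" is an assumption


def validate_events(events: list[dict]) -> list[str]:
    """Return human-readable validation errors; an empty list means valid."""
    errors = []
    previous_ts = None
    for index, event in enumerate(events):
        ts = event.get("timestamp")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(ts, bool) or not isinstance(ts, (int, float)) or ts < 0:
            errors.append(f"event {index}: timestamp must be a non-negative number")
        else:
            if previous_ts is not None and ts < previous_ts:
                errors.append(f"event {index}: timestamp is earlier than the previous event")
            previous_ts = ts
        key_code = event.get("key_code")
        if isinstance(key_code, bool) or not isinstance(key_code, int):
            errors.append(f"event {index}: key_code must be an integer")
        if event.get("event_type") not in ALLOWED_EVENT_TYPES:
            errors.append(f"event {index}: unknown event_type")
    return errors


print(validate_events([{"timestamp": 0, "key_code": 65, "event_type": "press"}]))
# → []
```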
Limitations
- Baseline Establishment
  - Requires minimum keystroke events (100+) for accurate profile
  - Needs established baseline for anomaly detection
- Model Accuracy
  - Accuracy depends on training data quality
  - New user profiles need 200+ samples for reliability
- Time-Based Features
  - Temporal analysis requires historical data in database
  - Peak hour detection requires events across different times
Future Enhancements
- Advanced ML Models
  - Neural network-based behavior classification
  - Seasonal pattern detection
  - Predictive analytics
- Continuous Learning
  - Automatic profile updates
  - Adaptive thresholds
  - User adaptation tracking
- Threat Detection
  - Replay attack detection
  - Impersonation detection
  - Behavioral drift tracking
- Integration
  - Real-time alerts for anomalies
  - Dashboard visualizations
  - Export capabilities
Troubleshooting
Service won't start
docker-compose logs ml_service
Pattern detection returns empty
- Ensure events list is not empty
- Minimum 10 events recommended for pattern detection
Anomaly detection shows no anomalies
- Build baseline first with /profile/build
- Ensure user_id matches between profile and check
Authenticity score always ~0.5
- Profile not established for user
- Need to call /profile/build first
Testing
Run ML service tests:
pytest tests/test_ml_service.py -v
Run specific test:
pytest tests/test_ml_service.py::TestPatternDetection::test_detect_fast_typing_pattern -v
References
- Main documentation: docs/API.md
- Performance guide: docs/PERFORMANCE.md
- Deployment guide: docs/DEPLOYMENT.md