# Tikker ML Analytics - Advanced Pattern Detection & Behavioral Analysis

## Overview

The Tikker ML Analytics service provides machine learning-powered insights into keystroke behavior. It detects patterns, identifies anomalies, builds behavioral profiles, and enables user authenticity verification.

**Service Port:** 8003

## Architecture

The ML service runs as an independent microservice and uses the SQLite database that it shares with the other services.

```
┌─────────────────────────────────┐
│    ML Analytics Service:8003    │
├─────────────────────────────────┤
│ - Pattern Detection             │
│ - Anomaly Detection             │
│ - Behavioral Profiling          │
│ - User Authenticity Check       │
│ - Temporal Analysis             │
│ - ML Model Training & Inference │
└────────────┬────────────────────┘
             │
             ▼
      ┌─────────────┐
      │  SQLite DB  │
      │ (tikker.db) │
      └─────────────┘
```

## Capabilities

### 1. Pattern Detection

Automatically identifies typing patterns and behavioral characteristics.

**Detected Patterns:**
- **fast_typist** - User types significantly faster than average (>80 WPM)
- **slow_typist** - User types slower than average (<20 WPM)
- **consistent_rhythm** - Very regular keystroke timing (consistency >0.85)
- **inconsistent_rhythm** - Irregular keystroke timing (consistency <0.5)

**Endpoint:**
```
POST /patterns/detect
```

**Request:**
```json
{
  "events": [
    {"timestamp": 0, "key_code": 65, "event_type": "press"},
    {"timestamp": 100, "key_code": 66, "event_type": "press"}
  ],
  "user_id": "user123"
}
```

**Response:**
```json
[
  {
    "name": "fast_typist",
    "confidence": 0.92,
    "frequency": 150,
    "description": "User types significantly faster than average",
    "features": {
      "avg_wpm": 85
    }
  }
]
```

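The pattern thresholds listed above can be read as a small rule table. The following is a minimal sketch of that idea, not the service's actual implementation: the function name `detect_patterns` and its two-argument signature are assumptions made for illustration, and only the threshold values come from this document.

```python
from typing import List


def detect_patterns(avg_wpm: float, consistency: float) -> List[str]:
    """Map typing metrics to pattern names using the documented thresholds.

    Hypothetical helper: the name and signature are illustrative only.
    """
    patterns = []

    # Speed patterns: >80 WPM is fast, <20 WPM is slow.
    if avg_wpm > 80:
        patterns.append("fast_typist")
    elif avg_wpm < 20:
        patterns.append("slow_typist")

    # Rhythm patterns: >0.85 is consistent, <0.5 is inconsistent.
    if consistency > 0.85:
        patterns.append("consistent_rhythm")
    elif consistency < 0.5:
        patterns.append("inconsistent_rhythm")

    return patterns


print(detect_patterns(85, 0.9))  # ['fast_typist', 'consistent_rhythm']
```

Metrics that fall between the thresholds (for example 50 WPM with a consistency of 0.7) match no pattern in this sketch and yield an empty list.
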
### 2. Anomaly Detection

Compares current behavior against the user's baseline profile to identify deviations.

**Detectable Anomalies:**
- **typing_speed_deviation** - Significant change in typing speed
- **rhythm_deviation** - Unusual change in keystroke rhythm

**Endpoint:**
```
POST /anomalies/detect
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
[
  {
    "timestamp": "2024-01-15T10:30:00",
    "anomaly_type": "typing_speed_deviation",
    "severity": 0.65,
    "reason": "Typing speed deviation of 65% from baseline",
    "expected_value": 50,
    "actual_value": 82.5
  }
]
```

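The sample response above is consistent with severity being the relative deviation from the baseline: (82.5 - 50) / 50 = 0.65. The sketch below reproduces that arithmetic under stated assumptions: the function name `speed_anomaly`, the `threshold` parameter and its default of 0.3, and the cap at 1.0 are all hypothetical and not confirmed by the service.

```python
from typing import Optional


def speed_anomaly(expected_wpm: float, actual_wpm: float,
                  threshold: float = 0.3) -> Optional[dict]:
    """Return an anomaly record if typing speed deviates from the baseline.

    Hypothetical helper: `threshold` (minimum relative deviation that is
    reported) is an assumed parameter, not a documented setting.
    """
    if expected_wpm <= 0:
        return None  # no usable baseline to compare against

    # Relative deviation from the baseline speed.
    deviation = abs(actual_wpm - expected_wpm) / expected_wpm
    if deviation < threshold:
        return None

    return {
        "anomaly_type": "typing_speed_deviation",
        "severity": round(min(deviation, 1.0), 2),  # assumed cap at 1.0
        "reason": f"Typing speed deviation of {deviation:.0%} from baseline",
        "expected_value": expected_wpm,
        "actual_value": actual_wpm,
    }


result = speed_anomaly(50, 82.5)
print(result["severity"])  # 0.65
```
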
### 3. Behavioral Profile Building

Builds a comprehensive user profile from keystroke data.

**Profile Components:**
- Average typing speed (WPM)
- Peak activity hours
- Most common words
- Consistency score (0.0-1.0)
- Detected patterns

**Endpoint:**
```
POST /profile/build
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
{
  "user_id": "user123",
  "avg_typing_speed": 58.5,
  "peak_hours": [9, 10, 14, 15, 16],
  "common_words": ["the", "and", "test", "python", "data"],
  "consistency_score": 0.78,
  "patterns": ["consistent_rhythm"]
}
```

### 4. User Authenticity Verification

Verifies whether a keystroke pattern matches the user's known profile (biometric authentication).

**Verdict Levels:**
- **authentic** - High confidence match (score > 0.8)
- **likely_authentic** - Good confidence match (score > 0.6)
- **uncertain** - Moderate confidence (score > 0.4)
- **suspicious** - Low confidence match (score ≤ 0.4)
- **unknown** - No baseline profile established

**Endpoint:**
```
POST /authenticity/check
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
{
  "authenticity_score": 0.87,
  "confidence": 0.85,
  "verdict": "authentic",
  "reason": "Speed match: 92.1%, Consistency match: 82.5%"
}
```

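The verdict levels above form a simple descending threshold ladder. As an illustration only, the mapping could look like the sketch below; `verdict_for` is a hypothetical name, and passing `None` to represent a missing baseline is an assumption.

```python
from typing import Optional


def verdict_for(score: Optional[float]) -> str:
    """Map an authenticity score to a verdict using the documented cut-offs.

    Hypothetical helper: `None` stands for "no baseline profile established".
    """
    if score is None:
        return "unknown"
    if score > 0.8:
        return "authentic"
    if score > 0.6:
        return "likely_authentic"
    if score > 0.4:
        return "uncertain"
    return "suspicious"  # score <= 0.4


print(verdict_for(0.87))  # authentic
```

Because every comparison is strict, a score of exactly 0.8 falls into `likely_authentic` and a score of exactly 0.4 into `suspicious`, matching the ranges listed above.
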
### 5. Temporal Analysis

Analyzes keystroke activity over a given date range.

**Analysis Output:**
- Activity trends (increasing/decreasing)
- Daily breakdown
- Weekly patterns
- Seasonal variations

**Endpoint:**
```
POST /temporal/analyze
```

**Request:**
```json
{
  "date_range_days": 7
}
```

**Response:**
```json
{
  "trend": "increasing",
  "date_range_days": 7,
  "analysis": [
    {"date": "2024-01-08", "total_events": 1250},
    {"date": "2024-01-09", "total_events": 1380},
    {"date": "2024-01-10", "total_events": 1450}
  ]
}
```

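One plausible way to derive the `trend` field from the daily totals is the sign of a least-squares slope. This document does not say how the service computes it, so the sketch below is an assumption throughout: the function name `activity_trend` is hypothetical, and the `"stable"` label for a flat series is invented for completeness (only `increasing` and `decreasing` are documented).

```python
from typing import Sequence


def activity_trend(daily_totals: Sequence[float]) -> str:
    """Classify a series of daily event counts by the sign of its slope.

    Hypothetical helper; "stable" is an assumed label for a flat series.
    """
    n = len(daily_totals)
    if n < 2:
        return "stable"

    mean_x = (n - 1) / 2
    mean_y = sum(daily_totals) / n

    # Numerator of the least-squares slope; the denominator is always
    # positive, so the numerator alone determines the sign.
    slope_sign = sum((i - mean_x) * (y - mean_y)
                     for i, y in enumerate(daily_totals))

    if slope_sign > 0:
        return "increasing"
    if slope_sign < 0:
        return "decreasing"
    return "stable"


print(activity_trend([1250, 1380, 1450]))  # increasing
```
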
### 6. ML Model Training

Trains models on historical keystroke data for use in behavior prediction.

**Endpoint:**
```
POST /model/train
```

**Parameters:**
- `sample_size` (optional, default=100, max=10000): Number of training samples

**Response:**
```json
{
  "status": "trained",
  "samples": 500,
  "features": ["typing_speed", "consistency", "rhythm_pattern"],
  "accuracy": 0.89
}
```

### 7. Behavior Prediction

Predicts a behavior category for a set of keystroke events using the trained model.

**Predicted Behaviors:**
- **normal** - Expected behavior
- **fast_focused** - Fast, focused typing (>80 WPM)
- **slow_deliberate** - Careful typing (<30 WPM)
- **stressed_or_tired** - Inconsistent rhythm (consistency <0.5)

**Endpoint:**
```
POST /behavior/predict
```

**Request:**
```json
{
  "events": [...],
  "user_id": "user123"
}
```

**Response:**
```json
{
  "status": "predicted",
  "behavior_category": "fast_focused",
  "confidence": 0.89,
  "features": {
    "typing_speed": 85,
    "consistency": 0.82
  }
}
```

## Data Flow

### Pattern Detection Flow
```
Keystroke Events → Analyze Typing Metrics → Identify Patterns → Return Results
                              ↓
                   - Calculate WPM
                   - Calculate Consistency
                   - Compare to Thresholds
```

### Anomaly Detection Flow
```
Keystroke Events → Build Profile → Compare to Baseline → Detect Deviations → Alert
                         ↓
           Store as Baseline (first time)
           Use for Comparison (subsequent)
```

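The store-then-compare step in the flow above can be sketched with an in-memory dictionary keyed by user ID. This is an illustrative outline under assumptions, not the service's code: the class name `BaselineStore`, its `check` method, and the shape of the returned deviations are all hypothetical.

```python
from typing import Dict, Optional


class BaselineStore:
    """Keep one baseline profile per user in memory (hypothetical helper)."""

    def __init__(self) -> None:
        self._profiles: Dict[str, Dict[str, float]] = {}

    def check(self, user_id: str,
              profile: Dict[str, float]) -> Optional[Dict[str, float]]:
        """Store the first profile as baseline; compare later ones to it.

        Returns None on the first call for a user, otherwise the
        per-metric difference between the new profile and the baseline.
        """
        baseline = self._profiles.get(user_id)
        if baseline is None:
            self._profiles[user_id] = dict(profile)  # first time: store
            return None
        # Subsequent calls: compare shared metrics against the baseline.
        return {key: profile[key] - baseline[key]
                for key in baseline if key in profile}


store = BaselineStore()
print(store.check("alice", {"avg_typing_speed": 50.0}))  # None
print(store.check("alice", {"avg_typing_speed": 82.5}))  # {'avg_typing_speed': 32.5}
```
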
### Authenticity Verification Flow
```
Keystroke Events → Extract Features → Compare to Baseline → Calculate Score → Verdict
                                               ↓
                                  - Speed match percentage
                                  - Consistency match percentage
                                  - Combined score
```

## Metrics

### Typing Speed (WPM)
Calculated as words per minute:
```
WPM = (Total Characters / 5) / (Total Time in Minutes)
```

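As a worked instance of the formula above, the following sketch computes WPM from a character count and an elapsed time. It assumes timestamps are in milliseconds, which this document does not state; `words_per_minute` is a hypothetical name.

```python
def words_per_minute(total_chars: int, elapsed_ms: float) -> float:
    """Apply WPM = (characters / 5) / minutes.

    Assumes `elapsed_ms` is in milliseconds (an assumption, not documented).
    """
    if elapsed_ms <= 0:
        return 0.0  # avoid division by zero for empty or single-event input
    minutes = elapsed_ms / 60_000
    return (total_chars / 5) / minutes


print(words_per_minute(300, 60_000))  # 60.0
```

Here 300 characters count as 60 five-character words, typed in exactly one minute.
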
### Rhythm Consistency (0.0 to 1.0)
Measures regularity of keystroke intervals:
```
Consistency = 1.0 - (Standard Deviation / Mean Interval)
```

Higher values indicate more consistent rhythm.

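The raw formula above can fall below zero whenever the standard deviation exceeds the mean interval, so a result confined to the stated 0.0 to 1.0 range presumably involves clamping. The sketch below makes that assumption explicit; the use of the population standard deviation and the name `rhythm_consistency` are also assumptions.

```python
import statistics
from typing import Sequence


def rhythm_consistency(timestamps: Sequence[float]) -> float:
    """Compute 1 - (std dev / mean) over inter-key intervals, clamped to [0, 1].

    Hypothetical helper: clamping and the population standard deviation
    are assumptions, not documented behavior.
    """
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(intervals) < 2:
        return 0.0  # not enough intervals to measure variation

    mean = statistics.mean(intervals)
    if mean <= 0:
        return 0.0

    variation = statistics.pstdev(intervals) / mean
    return max(0.0, min(1.0, 1.0 - variation))


print(rhythm_consistency([0, 100, 200, 300]))  # 1.0
```

Perfectly even intervals give 1.0, while a burst-pause-burst sequence such as `[0, 10, 500, 510]` has a standard deviation larger than its mean interval and is clamped to 0.0.
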
### Authenticity Score (0.0 to 1.0)
Composite score combining:
- Speed match (50% weight)
- Consistency match (50% weight)

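With equal weights, the composite is the average of the two match values. The sample authenticity response earlier in this document agrees with that reading: 0.5 × 0.921 + 0.5 × 0.825 ≈ 0.87. In the sketch below, only the 50/50 weighting comes from this document; `match_ratio` (one minus the relative difference, floored at zero) is an assumed definition of "match", and both function names are hypothetical.

```python
def match_ratio(baseline: float, observed: float) -> float:
    """Similarity of an observed metric to its baseline, in [0, 1].

    Assumed definition: 1 - relative difference, floored at 0.
    """
    if baseline <= 0:
        return 0.0
    return max(0.0, 1.0 - abs(observed - baseline) / baseline)


def authenticity_score(speed_match: float, consistency_match: float) -> float:
    """Combine the two match values with the documented 50/50 weighting."""
    return 0.5 * speed_match + 0.5 * consistency_match


print(round(authenticity_score(0.921, 0.825), 2))  # 0.87
```
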
### Anomaly Severity (0.0 to 1.0)
Indicates how significant the deviation from the baseline is.

## Usage Examples

### Example 1: Detect User's Typing Patterns

```bash
curl -X POST http://localhost:8003/patterns/detect \
  -H "Content-Type: application/json" \
  -d '{
    "events": [
      {"timestamp": 0, "key_code": 65, "event_type": "press"},
      {"timestamp": 95, "key_code": 66, "event_type": "press"},
      {"timestamp": 190, "key_code": 67, "event_type": "press"}
    ],
    "user_id": "alice"
  }'
```

### Example 2: Build User Baseline Profile

```bash
# Replace [...] with 200+ keystroke events
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

### Example 3: Check User Authenticity

```bash
# First, build profile
curl -X POST http://localhost:8003/profile/build \
  -H "Content-Type: application/json" \
  -d '{"events": [...], "user_id": "alice"}'

# Then check if events match (replace [...] with the new keystroke events)
curl -X POST http://localhost:8003/authenticity/check \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

### Example 4: Predict Behavior

```bash
# Train model (URL quoted so the shell does not interpret the "?")
curl -X POST "http://localhost:8003/model/train?sample_size=500"

# Predict behavior
curl -X POST http://localhost:8003/behavior/predict \
  -H "Content-Type: application/json" \
  -d '{
    "events": [...],
    "user_id": "alice"
  }'
```

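## Integration with Main API

The main API can forward requests to the ML service. For example, to expose pattern detection through the main API:

```python
import httpx

# `app` is the main API's existing application object (FastAPI-style routing assumed).
# `events` is accepted in the request body here; load it from wherever the
# main API keeps keystroke data if that fits better.
@app.post("/api/ml/patterns")
async def analyze_patterns_endpoint(user_id: str, events: list[dict]):
    """Forward keystroke events to the ML service and return its response."""
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://ml_service:8003/patterns/detect",
            json={"events": events, "user_id": user_id},
        )
        response.raise_for_status()  # surface ML service errors to the caller
        return response.json()
```
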
## Performance Characteristics

Typical latencies on 2 CPUs and 2 GB RAM:
- Pattern detection: 50-100ms
- Anomaly detection: 80-150ms
- Profile building: 150-300ms
- Authenticity check: 100-200ms
- Temporal analysis: 200-500ms (depends on data range)
- Model training: 500-1000ms (depends on sample size)
- Behavior prediction: 50-100ms

## Security Considerations

1. **Input Validation**
   - Events must be valid timestamped data
   - User IDs are sanitized

2. **Privacy**
   - Profiles are stored only in memory for the lifetime of the service
   - No persistent profile storage in the ML service

3. **Access Control**
   - Runs on the internal network (port 8003)
   - Not exposed directly to clients
   - Access goes through the main API, which handles authentication

## Limitations

1. **Baseline Establishment**
   - Requires a minimum number of keystroke events (100+) for an accurate profile
   - Anomaly detection needs an established baseline

2. **Model Accuracy**
   - Accuracy depends on training data quality
   - New user profiles need 200+ samples for reliability

3. **Time-Based Features**
   - Temporal analysis requires historical data in the database
   - Peak hour detection requires events spread across different times of day

## Future Enhancements

1. **Advanced ML Models**
   - Neural network-based behavior classification
   - Seasonal pattern detection
   - Predictive analytics

2. **Continuous Learning**
   - Automatic profile updates
   - Adaptive thresholds
   - User adaptation tracking

3. **Threat Detection**
   - Replay attack detection
   - Impersonation detection
   - Behavioral drift tracking

4. **Integration**
   - Real-time alerts for anomalies
   - Dashboard visualizations
   - Export capabilities

## Troubleshooting

### Service won't start
```bash
docker-compose logs ml_service
```

### Pattern detection returns empty
- Ensure the events list is not empty
- A minimum of 10 events is recommended for pattern detection

### Anomaly detection shows no anomalies
- Build a baseline first with `/profile/build`
- Ensure `user_id` matches between the profile and the check

### Authenticity score always ~0.5
- No profile has been established for the user
- Call `/profile/build` first

## Testing

Run ML service tests:
```bash
pytest tests/test_ml_service.py -v
```

Run specific test:
```bash
pytest tests/test_ml_service.py::TestPatternDetection::test_detect_fast_typing_pattern -v
```

## References

- Main documentation: [docs/API.md](API.md)
- Performance guide: [docs/PERFORMANCE.md](PERFORMANCE.md)
- Deployment guide: [docs/DEPLOYMENT.md](DEPLOYMENT.md)