Skip to content

Latest commit

 

History

History
613 lines (507 loc) · 24.4 KB

File metadata and controls

613 lines (507 loc) · 24.4 KB

DMARC Report Processor - Complete Feature List

Enterprise Edition v2.0


🎯 Core Features

✅ DMARC Report Ingestion

  • Email Integration: Automatically fetches DMARC aggregate reports from IMAP inbox
  • Bulk Upload: Drag-and-drop file upload (supports 50-200 files, max 50MB each)
  • Format Support: Handles XML, gzip (.gz), and zip (.zip) compressed formats
  • Idempotent Processing: Prevents duplicate reports using SHA256 content hashing
  • Error Handling: Gracefully handles malformed reports and connection issues
  • Processing Status: Tracks pending, processed, and failed reports
  • Automated Scheduling: Celery Beat-scheduled ingestion every 15 minutes

✅ Data Storage & Management

  • PostgreSQL Database: Production-ready relational database with connection pooling
  • Redis Caching: 90%+ cache hit rate with 5-minute TTL and pattern-based invalidation
  • Normalized Schema: Efficient storage with 7 core tables (reports, records, users, alerts, models)
  • Database Migrations: Alembic migrations for version control (7 migrations included)
  • Duplicate Prevention: SHA256 content hashing + unique constraints
  • Performance Indexes: Optimized queries on domain, date, source IP, and user lookups

🚀 API & Integration

RESTful API (100+ Endpoints)

Core Endpoints (/api)

  • GET /api/domains - List all domains with report counts
  • GET /api/reports - List reports with pagination and filtering
  • GET /api/reports/{id} - Get detailed single report
  • GET /api/reports/{id}/records - Get authentication records for report
  • POST /api/upload - Bulk file upload with drag-and-drop support

Analytics & Rollup (/api/rollup)

  • GET /api/rollup/summary - Aggregate statistics (pass/fail rates, message counts)
  • GET /api/rollup/sources - Top source IPs with authentication statistics
  • GET /api/rollup/alignment - DKIM/SPF alignment statistics
  • GET /api/rollup/timeline - Time-series data for trend charts
  • GET /api/rollup/alignment-breakdown - Detailed auth breakdown
  • GET /api/rollup/failure-trend - Daily failure rates with 7-day moving average
  • GET /api/rollup/top-organizations - Top sending organizations by volume

Export Endpoints (/api/export)

  • GET /api/export/reports/csv - Export reports to CSV (rate: 10/min)
  • GET /api/export/records/csv - Export detailed records to CSV (max 10K, rate: 10/min)
  • GET /api/export/sources/csv - Export source IP statistics to CSV (rate: 10/min)
  • GET /api/export/report/pdf - Generate comprehensive PDF summary (rate: 5/min)

Authentication (/auth) NEW ✨

  • POST /auth/login - Login with email/password, returns JWT tokens
  • POST /auth/refresh - Refresh expired access token
  • POST /auth/logout - Logout and invalidate refresh token

User Management (/users) NEW ✨

  • GET /users/me - Get current user profile
  • GET /users - List all users (admin only)
  • POST /users - Create new user (admin only)
  • PATCH /users/{id} - Update user details (admin only)
  • DELETE /users/{id} - Delete user (admin only)
  • POST /users/api-keys - Generate per-user API key
  • GET /users/api-keys - List user's API keys
  • DELETE /users/api-keys/{key_id} - Revoke API key

Alert Management (/alerts) NEW ✨

  • GET /alerts/history - List alert history with filters
  • GET /alerts/{id} - Get alert details
  • PATCH /alerts/{id}/acknowledge - Acknowledge alert with note
  • PATCH /alerts/{id}/resolve - Resolve alert with note
  • POST /alerts/check - Manually trigger alert checks
  • GET /alerts/rules - List alert rules
  • POST /alerts/rules - Create custom alert rule (admin)
  • PATCH /alerts/rules/{id} - Update alert rule (admin)
  • DELETE /alerts/rules/{id} - Delete alert rule (admin)
  • GET /alerts/suppressions - List alert suppressions
  • POST /alerts/suppressions - Create suppression window
  • DELETE /alerts/suppressions/{id} - Delete suppression

ML Analytics (/analytics) NEW ✨

  • GET /analytics/geolocation/map - Country heatmap with IP distribution
  • GET /analytics/geolocation/lookup/{ip} - Single IP geolocation
  • POST /analytics/geolocation/lookup-bulk - Bulk IP lookup (max 1000)
  • GET /analytics/geolocation/cache-stats - Cache statistics
  • GET /analytics/ml/models - List trained ML models
  • GET /analytics/ml/models/{id} - Get model details
  • POST /analytics/ml/train - Train new anomaly model (admin)
  • POST /analytics/ml/deploy - Deploy model for production (admin)
  • GET /analytics/ml/models/{id}/stats - Model performance stats
  • POST /analytics/anomalies/detect - Run anomaly detection
  • GET /analytics/anomalies/recent - Recent anomaly predictions

Task Management (/tasks) NEW ✨

  • POST /tasks/trigger/email-ingestion - Manually trigger email fetch
  • POST /tasks/trigger/process-reports - Process pending reports
  • GET /tasks/status/{task_id} - Get Celery task status

Utilities

  • GET /health - Health check endpoint
  • GET /docs - Interactive API documentation (Swagger UI)
  • GET /redoc - Alternative API documentation (ReDoc)

Advanced Filtering (All Endpoints)

All API endpoints support comprehensive filtering:

  • Domain filtering: ?domain=example.com
  • Date range filtering: ?days=30 or ?start=2024-01-01&end=2024-12-31
  • Source IP filtering: ?source_ip=192.168.1.1 (exact match)
  • CIDR range filtering: ?source_ip_range=192.168.1.0/24
  • Authentication filters: ?dkim_result=pass or ?spf_result=fail
  • Disposition filter: ?disposition=quarantine
  • Organization filter: ?org_name=google.com (case-insensitive)

🔐 Authentication & Authorization NEW ✨

JWT Authentication

  • Token Types:
    • Access Token: 15-minute expiry, for API requests
    • Refresh Token: 7-day expiry, for renewing access tokens
  • Token Storage: Refresh tokens stored in database, revocable
  • Security: HS256 algorithm, 64+ character secret key
  • Headers: Authorization: Bearer <token> or X-API-Key: <key>

Role-Based Access Control (RBAC)

Three role levels with granular permissions:

Role Permissions
Admin All permissions: user management, model training, rule creation, full data access
Analyst Read/write: reports, alerts (acknowledge/resolve), analytics, exports
Viewer Read-only: dashboards, reports, analytics (no modifications)

User Management

  • Admin-Only Creation: No self-registration, admins create all user accounts
  • Password Security: bcrypt hashing with 12 rounds
  • Password Requirements: Minimum 12 characters, complexity enforced
  • API Key Management: Per-user API keys with SHA256 hashing
  • User Lifecycle: Active/inactive status, soft deletion support
  • Audit Trail: Track user actions and API usage

📊 Interactive Dashboard

Summary Statistics (with Tooltips)

  • Total Reports: Hover shows explanation of DMARC aggregate reports
  • Pass Rate: Percentage passing both SPF & DKIM (tooltip explains >95% is healthy)
  • Fail Rate: Percentage failing both (tooltip warns about spoofing/misconfiguration)
  • Total Messages: Total emails analyzed across all reports

8 Comprehensive Charts

  1. DMARC Results Over Time (Line chart) - Timeline of pass/fail trends
  2. Results by Domain (Bar chart) - Click to filter by domain
  3. Top Source IPs (Bar chart) - Most active senders
  4. Disposition Breakdown (Pie chart) - None/Quarantine/Reject distribution
  5. SPF/DKIM Alignment Breakdown (Stacked bar) - Both pass, DKIM only, SPF only, both fail
  6. Policy Compliance (Doughnut chart) - Compliant vs non-compliant
  7. Failure Rate Trend (Line chart) - Daily failure rate with 7-day moving average
  8. Top Sending Organizations (Horizontal bar) - Top 10 by volume

Advanced Filtering UI

  • Basic Filters: Domain, date range (7/30/90 days or custom)
  • Advanced Filters Panel (expandable):
    • Source IP (exact match)
    • IP Range (CIDR notation)
    • DKIM Result (pass/fail)
    • SPF Result (pass/fail)
    • Disposition (none/quarantine/reject)
    • Organization name

Reports Table

  • Recent reports with pagination
  • Click "View" for detailed report modal
  • Shows organization, domain, date range, message counts, pass/fail stats

Report Detail Modal

  • Report Information: Org, domain, date range, report ID, email
  • Policy Information: DMARC policy (p), subdomain policy (sp), percentage, DKIM/SPF alignment
  • Statistics: Total messages, record count, received timestamp
  • Authentication Records Table: Paginated records with source IPs, counts, DKIM/SPF results, disposition

Export Features

  • Export Dropdown: 4 export options (Reports CSV, Records CSV, Sources CSV, PDF)
  • Filtered Exports: All exports respect active dashboard filters
  • CSV Formula Injection Prevention: Secure CSV generation
  • PDF Reports: Professional 2-page summary with charts and tables
  • Timestamped Filenames: Automatic naming with generation date

Educational Features

  • Help Modal (❓ button): Comprehensive guide explaining DMARC, SPF, DKIM
  • Instant Tooltips (hover any badge): Detailed pass/fail explanations
  • Visual Indicators: Color-coded badges (green=good, red=bad)

UX Features

  • Auto-refresh every 30 seconds
  • XSS protection with safe DOM manipulation
  • Responsive design (mobile-friendly)
  • Loading states and error messages
  • Success/error notifications

🔔 Enhanced Alerting System NEW ✨

Alert Lifecycle Management

  • States: Created → Acknowledged → Resolved
  • Acknowledgment: Assign to user, add notes, track timestamp
  • Resolution: Mark resolved with resolution notes, track resolver
  • History: Full audit trail of all state transitions

Alert Deduplication

  • Fingerprinting: SHA256 hash of alert type + domain + metadata
  • Cooldown Periods: Configurable per alert type (1-24 hours)
  • Smart Suppression: Prevent alert spam while ensuring visibility

Alert Types

  • Failure Rate Threshold: Alert when failure rate exceeds X% over Y days
  • Volume Spike: Detect unusual traffic increases (>X% above baseline)
  • Volume Drop: Detect suspicious traffic decreases
  • New Source IPs: Detect new IPs sending for your domain
  • Policy Violation: Monitor DMARC policy effectiveness
  • Anomaly Detection: ML-powered suspicious IP behavior NEW

Notification Channels

  • Microsoft Teams (Priority): Webhook integration with rich cards PRIORITIZED
  • Email: SMTP integration with HTML templates
  • Slack: Webhook integration with rich message formatting
  • Discord: Webhook integration
  • Webhook: Custom webhook support for integrations

Alert Suppressions

  • Time-Based: Suppress alerts during maintenance windows
  • Filter by: Alert type, severity, domain
  • Recurrence: Support for recurring suppression patterns
  • Automatic Expiry: Suppressions end automatically

Alert Rules (Configurable)

  • UI-Based Configuration: Create/edit rules without code changes
  • Custom Thresholds: Per-domain, per-alert-type thresholds
  • Channel Selection: Choose which channels to notify per rule
  • Enable/Disable: Toggle rules without deletion
  • Admin-Only Management: Prevent unauthorized rule changes

🤖 ML Analytics & Geolocation NEW ✨

IP Geolocation

  • MaxMind GeoLite2 Integration: Offline IP-to-location mapping
  • Database: GeoLite2 City database (free, updated monthly)
  • 90-Day Caching: PostgreSQL cache for efficiency
  • Bulk Lookup: Process up to 1000 IPs in single request
  • Data Points: Country, city, lat/lon, timezone, continent, ASN
  • Cache Management: Automatic expiry, purge tasks (weekly)

Country Heatmaps

  • Geographic Visualization: IP distribution by country
  • Customizable Timeframes: Last 7, 30, 90, 365 days
  • Real-Time Generation: On-demand heatmap creation
  • Daily Pre-Caching: Celery task generates heatmaps at 4 AM
  • Statistics: Total IPs, mapped IPs, unmapped IPs, max count per country

ML Anomaly Detection (Isolation Forest)

  • Algorithm: scikit-learn Isolation Forest
  • Features Extracted:
    • Volume: Total email count from IP
    • Failure Rate: DMARC failure percentage
    • Unique Domains: Number of domains targeted
    • Hour of Day: Temporal pattern (0-23)
    • Day of Week: Weekly pattern (0=Monday, 6=Sunday)
  • Contamination: Expected anomaly proportion (default 5%)
  • Anomaly Scoring: Lower score = more anomalous (-1.0 to 0.5 range)
  • Threshold Detection: Configurable threshold (default -0.5)

Model Management

  • Training: Train on last 30-365 days of data (default: 90 days)
  • Versioning: Multiple models stored, version tracking
  • Deployment: Single deployed model for production use
  • Auto-Deployment: First model auto-deploys, or deploy better models
  • Metadata: Training samples, metrics, date ranges, trained by user
  • Performance Tracking: Anomalies detected, prediction accuracy

Automated ML Workflows (Celery Beat)

  • Weekly Training: Sunday 2 AM - Train new model on last 90 days
  • Daily Detection: Daily 3 AM - Run anomaly detection on last 7 days
  • Cache Purge: Monday 1 AM - Clean expired geolocation cache
  • Analytics Generation: Daily 4 AM - Pre-generate country heatmaps

Prediction History

  • Database Storage: All predictions stored for analysis
  • Features Logged: IP address, score, features, timestamp
  • Queryable: Recent predictions endpoint for trends
  • Integration: Link anomalies to alert system (future)

⚡ Performance & Infrastructure

Celery Distributed Task Queue NEW ✨

  • Message Broker: Redis (fast, in-memory)
  • Result Backend: PostgreSQL (persistent, queryable)
  • Worker Pool: Configurable workers with prefetch multiplier
  • Retry Logic: Exponential backoff, max 3 retries
  • Task Timeouts: Soft limits (warning) + hard limits (kill)
  • Task Routing: Dedicated queues for different task types
  • Monitoring: Flower dashboard at :5555

Scheduled Tasks (Celery Beat)

  • Email Ingestion: Every 15 minutes
  • Report Processing: Every 5 minutes
  • Alert Checks: Every hour (on the hour)
  • ML Model Training: Weekly (Sunday 2 AM)
  • Anomaly Detection: Daily (3 AM)
  • Cache Purge: Weekly (Monday 1 AM)
  • Analytics Generation: Daily (4 AM)

Redis Caching

  • Cache Hit Rate: 90%+ after warmup
  • TTL: 5 minutes (configurable via CACHE_DEFAULT_TTL)
  • Pattern-Based Keys: timeline:*, summary:*, sources:*
  • Automatic Invalidation: Cache cleared on new data ingestion
  • Graceful Degradation: System continues if Redis unavailable
  • Memory Management: 256MB limit with LRU eviction

Query Optimization

  • N+1 Query Elimination: Timeline and alerting endpoints optimized
  • Single JOIN Queries: Replaced iterative record loops
  • Performance Improvements:
    • Timeline endpoint: ~800ms → <200ms
    • Alerting checks: ~1200ms → <300ms
    • Dashboard load: <1s with active cache
    • ML training: 1-5 minutes (depending on data volume)
  • Database Indexes: Optimized on domain, date, source_ip, fingerprints, user_id

Connection Pooling

  • SQLAlchemy connection pooling (pool size: 20, max overflow: 10)
  • Redis connection pooling
  • Nginx connection optimization

🧪 Testing & Quality

Test Coverage

  • 75%+ Coverage Enforced: CI/CD fails below threshold
  • Unit Tests: Parser, services, utilities, edge cases
  • Integration Tests: End-to-end workflows, idempotency, auth flows
  • Test Commands:
    pytest -v --cov=app                    # All tests with coverage
    pytest tests/unit/ -v                  # Unit tests only
    pytest tests/integration/ -v           # Integration tests only
    pytest --cov=app --cov-report=html     # HTML coverage report

CI/CD Pipeline (GitHub Actions)

  • Automated Testing: Runs on all pushes and PRs
  • Linting: Code quality checks (flake8, black)
  • Security Scans: Dependency vulnerability scanning
  • Docker Builds: Validates container builds
  • Coverage Reporting: Enforces 75% minimum
  • Type Checking: MyPy static type analysis

🐳 Deployment & Infrastructure

Docker Compose Setup (7 Services)

  • backend - FastAPI application (port 8000)
  • celery-worker - Background task processor
  • celery-beat - Task scheduler
  • flower - Celery monitoring UI (port 5555)
  • db - PostgreSQL 15 (Alpine)
  • redis - Redis 7 (Alpine) - Cache & broker
  • nginx - Web server & reverse proxy (port 80)

Production Deployment Features

  • Automated Database Migrations: Alembic migrations on startup
  • Log Rotation: Structured logging with rotation
  • Health Monitoring: /health endpoint for load balancers
  • Graceful Shutdown: Proper signal handling
  • Resource Limits: Configurable memory/CPU limits
  • SSL/TLS Support: Let's Encrypt integration documented
  • Security Headers: X-Content-Type-Options, HSTS, CSP

Documentation (10+ Guides)

  • README.md: Quick start guide with examples
  • FEATURES.md: This comprehensive feature list
  • DEPLOYMENT.md: Complete production deployment guide
  • TESTING.md: Testing strategy and QA procedures
  • PHASE1_DEPLOYMENT.md: Celery + Redis setup
  • PHASE2_DEPLOYMENT.md: Authentication setup
  • PHASE3_DEPLOYMENT.md: Enhanced alerting setup
  • PHASE4_DEPLOYMENT.md: ML analytics & geolocation setup
  • API Docs: Auto-generated Swagger UI and ReDoc

📈 Technical Stack

Backend Stack

  • Framework: FastAPI (async, type-safe, modern)
  • ORM: SQLAlchemy 2.0+ (mature, flexible)
  • Validation: Pydantic v2 (automatic validation & serialization)
  • Auth: PyJWT + bcrypt (secure token & password handling)
  • Task Queue: Celery + Redis
  • ML/Analytics: scikit-learn 1.4.0, NumPy 1.26.3, pandas 2.1.4
  • Geolocation: geoip2 4.7.0 + MaxMind GeoLite2
  • Testing: pytest + coverage + faker
  • Rate Limiting: SlowAPI (per-user tracking)
  • PDF Generation: ReportLab 4.0+
  • Python: 3.11

Database Stack

  • Engine: PostgreSQL 15 (Alpine)
  • Migration: Alembic (7 migrations included)
  • Connection Pooling: SQLAlchemy built-in
  • Persistent Storage: Docker volumes
  • Indexes: 15+ performance indexes

Frontend Stack

  • No Build Step: Vanilla HTML/CSS/JS (zero dependencies)
  • Charting: Chart.js v4.4.0 (lightweight, responsive)
  • Styling: Modern CSS Grid/Flexbox + custom tooltips
  • Responsive: Mobile-friendly design
  • CDN: Chart.js loaded from jsDelivr

Infrastructure Stack

  • Orchestration: Docker Compose
  • Web Server: Nginx (Alpine)
  • Reverse Proxy: API routing via Nginx
  • Health Checks: Docker health checks configured
  • Log Aggregation: Structured JSON logging ready

🏗️ Database Schema

Core Tables (7 Migrations)

Migration 001-003: Core DMARC

  • ingested_reports - Email tracking with SHA256 hashing
  • dmarc_reports - Aggregate report metadata
  • dmarc_records - Individual authentication records
  • Performance indexes on domain, date, source_ip

Migration 004: Task Tracking

  • celery_task_meta - Celery task result storage
  • Celery-managed schema for distributed tasks

Migration 005: Authentication

  • users - User accounts (email, password, role)
  • user_api_keys - Per-user API keys (SHA256 hashed)
  • refresh_tokens - JWT refresh token storage
  • Indexes on email, api_key_hash, token

Migration 006: Enhanced Alerting

  • alert_history - Persistent alert records with lifecycle
  • alert_rules - Configurable alert rules
  • alert_suppressions - Time-based suppression windows
  • Indexes on fingerprint, status, domain, created_at

Migration 007: ML & Geolocation

  • geolocation_cache - IP geolocation cache (90-day expiry)
  • ml_models - Trained ML models with serialized data
  • ml_predictions - ML prediction results
  • analytics_cache - Cached analytics (heatmaps, etc.)
  • Indexes on ip_address, model_id, cache_key, expires_at

🔒 Security Features

Authentication & Authorization

  • JWT Tokens: HS256, 64+ char secret, 15-min access + 7-day refresh
  • Password Hashing: bcrypt, 12 rounds, min 12 characters
  • API Key Security: SHA256 hashing, per-user tracking, revocable
  • Token Revocation: Refresh tokens stored, can be invalidated
  • Session Management: Stateless JWT + stateful refresh tokens

API Security

  • Input Validation: Pydantic schemas with type safety
  • SQL Injection Protection: SQLAlchemy ORM (never raw SQL)
  • XSS Prevention: Safe DOM methods, no innerHTML with user data
  • CSV Formula Injection Prevention: Prefixes dangerous characters
  • Rate Limiting: Per-user tracking (not just IP)
    • Upload: 20/hour
    • API calls: 100/minute
    • CSV exports: 10/minute
    • PDF exports: 5/minute
  • CORS Configuration: Configurable allowed origins
  • Security Headers: X-Content-Type-Options, Content-Disposition, HSTS

Infrastructure Security

  • Environment Variables: All secrets in .env (never committed)
  • SSL/TLS Support: Let's Encrypt integration documented
  • Optional Basic Auth: Dashboard password protection (Nginx)
  • Log Sanitization: Sensitive data excluded from logs
  • Docker Network Isolation: Internal network for services

ML Model Security

  • Self-Trained Models Only: Never load external/untrusted models
  • Database-Only Storage: Models stored in PostgreSQL, restricted access
  • Admin-Only Training: Model training requires admin role
  • Serialization: Standard scikit-learn practice (secure for self-trained)

📊 System Metrics

Current Performance

  • Dashboard Load Time: <1s (with cache)
  • API Response Times: <200ms (cached), <1s (uncached)
  • Cache Hit Rate: 90%+ after warmup
  • Concurrent Users: Tested with 100+ simultaneous requests
  • Report Processing: ~50ms per report
  • Upload Handling: 50-200 files in single upload
  • ML Training: 1-5 minutes (1K-10K samples)
  • Anomaly Detection: ~1 second per 1000 IPs

Capacity

  • Handles thousands of reports efficiently
  • Processes millions of authentication records
  • Stores years of historical data
  • Scales horizontally with Celery workers
  • Ready for enterprise-scale deployments

🎓 Educational Features

For Non-Technical Users

  • Help Modal: Beginner-friendly DMARC education
  • Tooltips: Instant explanations on hover
  • Visual Indicators: Color-coded badges (green=good, red=bad)
  • Plain Language: Avoids technical jargon where possible

For Security Teams

  • CIDR Filtering: Investigate entire IP ranges
  • Deep-Dive Records: Click any report for full authentication details
  • Export Options: Take data to other tools (CSV/PDF)
  • Alert System: Proactive security monitoring
  • Anomaly Detection: ML-powered threat identification
  • Geolocation Mapping: Visualize attack origins

For Compliance

  • RFC 7489 Compliant: Full DMARC standard support
  • Audit Trail: All ingested reports tracked
  • PDF Reports: Professional summaries for stakeholders
  • Historical Data: Prove compliance over time
  • Alert History: Demonstrate security response

🗺️ Future Enhancements

High Priority

  1. Holt-Winters Exponential Smoothing Forecasting (ML feature)

    • Forecast failure rates 7 days ahead
    • Uses statsmodels Holt-Winters implementation
    • Weekly training on 180 days of data
    • Confidence intervals included
  2. Frontend React Migration

    • Modern React + TypeScript frontend
    • Real-time WebSocket updates
    • Advanced dashboard customization
    • Dark mode toggle
  3. SIEM Integration

    • Splunk/ELK/Datadog connectors
    • Real-time event streaming
    • Custom alert forwarding
    • Prometheus metrics export

Medium Priority

  1. Multi-Tenancy

    • Support multiple organizations
    • Tenant isolation
    • Per-tenant configurations
    • Shared vs dedicated infrastructure
  2. Advanced Visualizations

    • Interactive geographic maps
    • Time-series anomaly charts
    • Real-time threat dashboards
    • Chart export as images
  3. Enhanced ML Features

    • Model A/B testing
    • Champion/challenger deployment
    • Automatic rollback on poor performance
    • Ensemble models

Low Priority

  1. Mobile App

    • Native iOS/Android apps
    • Push notifications for alerts
    • Quick stats view
  2. AI-Powered Features

    • Natural language queries
    • Automated root cause analysis
    • Smart policy recommendations

Version: 2.0.0 (Enterprise Edition) Last Updated: January 2026 Status: ✅ All 4 phases complete - Production ready