🔄 Data Ingestion Architecture
DNS Science employs a distributed daemon architecture where specialized services continuously ingest, process, and enrich domain intelligence data. Each daemon operates independently, automatically recovering from failures and maintaining data freshness.
Core Ingestion Principles
- Continuous Operation: All daemons run 24/7 as systemd services
- Automatic Recovery: Built-in retry logic and exponential backoff
- Database-Backed: All data persisted to PostgreSQL for historical analysis
- Rate-Limited: Respects API quotas and implements intelligent throttling
- Enrichment Pipeline: Data flows through multiple enrichment stages
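The retry behaviour named in the principles above can be illustrated with a minimal sketch. The function names, default delays, and the use of full jitter are assumptions made for illustration, not the platform's actual implementation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 5) -> list[float]:
    """Un-jittered delay before each retry: base, base*factor, ... capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def retry_with_backoff(fetch: Callable[[], T], *, retries: int = 5,
                       base: float = 1.0, cap: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until it succeeds or the retry budget is used up."""
    for delay in backoff_delays(base=base, cap=cap, attempts=retries):
        try:
            return fetch()
        except Exception:
            # Full jitter: sleep a random time up to the current delay so that
            # many daemons hitting the same API do not retry in lockstep.
            sleep(random.uniform(0, delay))
    # Final attempt: any exception now propagates to the caller.
    return fetch()
```

A daemon could wrap each upstream call, for example `retry_with_backoff(lambda: query_feed(domain))`, where `query_feed` is a hypothetical fetch function.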
📡 Active Data Sources (20+ Feeds)
🔒 SSL/TLS Certificate Monitoring
Continuous · Daemon: ssl-scanner.service
Actively scans domains for SSL/TLS certificates, tracks expiration dates, validates certificate chains, and alerts on new entries in certificate transparency logs.
Tables: ssl_certificates, certificate_history
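Expiration tracking can be sketched with the Python standard library alone. This is an illustrative example, not the ssl-scanner daemon's code; the function names are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate presented by host:port."""
    context = ssl.create_default_context()  # also validates the chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a notAfter string such as
    "Jan 31 00:00:00 2030 GMT" (the format getpeercert() returns)."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    return (expires - now).days
```

A negative result means the certificate has already expired. A scanner might call `days_until_expiry(fetch_not_after("example.com"), datetime.now(timezone.utc))` and alert below some threshold of its choosing.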
🌐 RDAP (Registration Data Access Protocol)
Batch · Daemon: rdap.service
Queries RDAP servers for domain registration data including registrar info, nameservers, DNSSEC status, and registration/expiration dates.
Tables: rdap_data
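As a rough illustration of the fields listed above, the sketch below pulls them out of an RDAP domain response. The JSON member names (`events`, `nameservers`, `secureDNS`, `entities`, `vcardArray`) come from the RDAP response format (RFC 9083); the function names and the shape of the returned summary are assumptions, and the real `rdap_data` schema may differ.

```python
from typing import Any, Optional


def _registrar_name(response: dict[str, Any]) -> Optional[str]:
    """Find the 'fn' (formatted name) of the entity with the registrar role."""
    for entity in response.get("entities", []):
        if "registrar" in entity.get("roles", []):
            vcard = entity.get("vcardArray", ["vcard", []])
            for prop in vcard[1]:
                # jCard property layout: [name, parameters, type, value]
                if prop[0] == "fn":
                    return prop[3]
    return None


def summarize_rdap(response: dict[str, Any]) -> dict[str, Any]:
    """Reduce an RDAP domain object to the fields the text mentions."""
    events = {e.get("eventAction"): e.get("eventDate")
              for e in response.get("events", [])}
    return {
        "domain": (response.get("ldhName") or "").lower() or None,
        "registrar": _registrar_name(response),
        "registered": events.get("registration"),
        "expires": events.get("expiration"),
        "nameservers": sorted(ns["ldhName"].lower()
                              for ns in response.get("nameservers", [])
                              if "ldhName" in ns),
        "dnssec_signed": response.get("secureDNS", {}).get("delegationSigned", False),
    }
```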
🛡️ Threat Intelligence (Multi-Source)
Real-time · Daemon: threat-intel.service
Aggregates threat data from AlienVault OTX, Pulsedive, URLhaus, PhishTank, Google Safe Browsing, and other feeds.
Tables: threat_intel, threat_history
🔍 Domain Enrichment Engine
Continuous · Daemon: enrichment.service
Calculates comprehensive security scores based on DNSSEC, SPF, DMARC, SSL status, threat intelligence, and blacklist status.
Tables: enrichment_data
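The scoring formula itself is not documented here. Purely to show how the six inputs above could combine into one number, the sketch below uses a weighted checklist; the weights and check names are invented for illustration and do not reflect the platform's real scoring.

```python
# Hypothetical weights (they sum to 100); the real engine's weights are not published.
WEIGHTS = {
    "dnssec": 15,
    "spf": 15,
    "dmarc": 20,
    "ssl_valid": 25,
    "no_threats": 15,
    "not_blacklisted": 10,
}


def security_score(checks: dict[str, bool]) -> int:
    """Sum the weights of every check that passed; missing checks count as failed."""
    return sum(weight for name, weight in WEIGHTS.items() if checks.get(name, False))
```

For instance, a domain with a valid certificate and a DMARC policy but nothing else would score 45 under these made-up weights.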
🔄 Reverse DNS (ARPA) Scanner
Continuous · Daemon: arpad.service
Performs forward-confirmed reverse DNS (FCrDNS) checks for IPv4 and IPv6 addresses, validating PTR records.
Tables: ptr_records
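An FCrDNS check passes when an IP's PTR name resolves forward to the same IP. The sketch below shows that logic using the system resolver via the standard library; the function names are hypothetical, and the resolvers are injectable so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def system_reverse(ip: str) -> list[str]:
    """PTR lookup through the system resolver; empty list if none exists."""
    try:
        hostname, aliases, _ = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return []
    return [hostname, *aliases]


def system_forward(hostname: str) -> set[str]:
    """A/AAAA lookup through the system resolver; empty set on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    # sockaddr[0] is the address; drop any IPv6 scope suffix such as "%eth0".
    return {info[4][0].split("%")[0] for info in infos}


def is_forward_confirmed(
    ip: str,
    reverse: Callable[[str], Iterable[str]] = system_reverse,
    forward: Callable[[str], Iterable[str]] = system_forward,
) -> bool:
    """True if any PTR name for `ip` resolves back to `ip` (IPv4 or IPv6)."""
    target = ipaddress.ip_address(ip)
    for hostname in reverse(ip):
        for addr in forward(hostname):
            # Compare parsed addresses so IPv6 spelling differences do not matter.
            if ipaddress.ip_address(addr) == target:
                return True
    return False
```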
🕵️ Dark Web Monitoring
Real-time · Daemon: darkweb-monitor.service
Monitors Tor hidden services and dark web marketplaces for domain-related threats, breached credentials, and mentions.
Tables: darkweb_mentions, darkweb_credentials
🌍 Web3 & Blockchain Domains
Continuous · Daemon: web3d.service
Indexes ENS (.eth), Solana Name Service (.sol), Unstoppable Domains, and other blockchain-based naming systems.
Tables: web3_domains
💰 Domain Valuation Engine
Batch · Daemon: domain-valuation.service
Calculates estimated domain values based on length, keywords, TLD, traffic estimates, backlinks, and historical sales data.
Tables: domain_valuations
📧 Email Deliverability Scoring
Continuous · Daemon: email-deliverability.service
Tests SMTP connectivity, validates SPF/DKIM/DMARC records, checks MTA-STS policies, and scores email security posture.
Tables: email_deliverability
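As one small piece of the validation described above, a DMARC TXT record can be split into its tag/value pairs and its policy extracted. This is a simplified sketch with hypothetical function names; it does not perform the DNS lookup or full syntax validation.

```python
from typing import Optional


def parse_dmarc(record: str) -> dict[str, str]:
    """Split a DMARC TXT record into lower-cased tags and their values."""
    tags: dict[str, str] = {}
    for part in record.split(";"):
        if "=" in part:
            # Split on the first "=" only; values such as mailto: URIs may contain more.
            key, _, value = part.partition("=")
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_policy(record: str) -> Optional[str]:
    """Return the requested policy (none, quarantine, reject), or None if
    the record is not a DMARC record or carries no p= tag."""
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1":
        return None
    return tags.get("p", "").lower() or None
```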
📅 Domain Expiry Monitoring
Continuous · Daemon: domain-expiry.service
Tracks expiration dates, sends alerts for domains nearing expiry, and identifies acquisition opportunities.
Tables: expiring_domains
🔎 Domain Discovery Engine
Continuous · Daemon: domain-discovery.service
Discovers new domains through AllZonefiles.io (312M+ domains across 1,570+ TLDs), certificate transparency logs, Tranco/Umbrella rankings, Cloudflare Radar, and WHOIS monitoring. Ingests 200K-500K newly registered domains daily.
Tables: discovered_domains
📂 AllZonefiles.io Zone Files
Real-time · Integration: allzonefiles_api
Direct access to TLD zone files including .com (158M+), .net (13M+), .org (10M+), and 1,500+ other gTLDs and ccTLDs. Daily new domain lists and expired domain tracking.
Tables: domains, domain_discovery_stats
🌐 IP Intelligence & Geolocation
Real-time · Daemon: ip-intel.service
Enriches IP addresses with geolocation, ASN data, hosting provider info, reputation scores, and abuse history.
Tables: ip_intelligence
📊 Shodan Integration
Batch · Integration: shodan_integration.py
Queries Shodan for exposed services, open ports, vulnerabilities, and internet-facing infrastructure associated with domains.
Tables: shodan_results
🔐 SecurityTrails Integration
Batch · Integration: securitytrails_integration.py
Retrieves historical DNS records, subdomains, WHOIS history, and related domains through the SecurityTrails API.
Tables: securitytrails_data
🎣 PhishTank Anti-Phishing
Real-time · Integration: phishtank_integration.py
Checks domains against PhishTank's community-verified phishing database.
Tables: phishing_detections
🦠 URLhaus Malware Tracking
Real-time · Integration: urlhaus_integration.py
Monitors URLhaus for malware distribution URLs and C2 infrastructure associated with domains.
Tables: malware_urls
👁️ AlienVault OTX
Real-time · Integration: alienvault_otx.py
Ingests threat intelligence pulses, IOCs, and community-contributed security intelligence from AlienVault OTX.
Tables: otx_pulses, otx_indicators
🔬 Pulsedive Threat Intel
Real-time · Integration: pulsedive_integration.py
Runs automated threat scanning and risk assessment through the Pulsedive threat intelligence platform.
Tables: pulsedive_threats
📜 Certificate Transparency Logs
Real-time · Daemon: certificate-transparency.service
Monitors CT logs for newly issued certificates, discovers subdomains, and detects suspicious certificate issuance.
Tables: ct_logs
🚨 DNS Blackhole Lists
Continuous · Integration: dns_blackhole.py
Checks domains against major DNS blackhole lists (DNSBL) including Spamhaus, SURBL, and custom threat feeds.
Tables: blacklist_status
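A DNSBL lookup is an ordinary DNS query against a specially constructed name: the reversed IP (or the domain itself) prepended to the list's zone, as described in RFC 5782. The sketch below shows that construction; the function names are hypothetical, and treating `127.255.255.x` answers as errors rather than listings is a convention some lists use that should be verified per list.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the lookup name for an IP-based list, e.g. 2.0.0.127.<zone>."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer is "2.0.0.127.in-addr.arpa" (or nibbles + "ip6.arpa");
    # strip the last two labels to keep only the reversed address.
    reversed_labels = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"


def domain_query_name(domain: str, zone: str) -> str:
    """Build the lookup name for a domain-based list, e.g. example.com.<zone>."""
    return f"{domain.rstrip('.').lower()}.{zone}"


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the list returns a 127.x.x.x answer for the IP."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN: not listed
    # Assumption: 127.255.255.x signals a query error, not a listing.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")
```

Callers supply the zone, for example `zen.spamhaus.org` for IP lookups or `dbl.spamhaus.org` for domain lookups; each list's usage terms and return codes should be checked before querying it at volume.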
🏪 Domain Marketplace Integration
Batch · Daemon: domain-acquisition.service
Monitors aftermarket domain sales, auctions, and drop-catching opportunities across major marketplaces.
Tables: marketplace_domains, auction_watch
📊 Reporting & Analytics Engine
Batch · Daemon: reporting.service
Generates daily, weekly, and monthly analytics reports on domain security posture, threat trends, and recommendations.
Tables: reports, report_schedules
🔍 DNS Sniffer Daemon (Client Network Monitoring)
Real-time · Daemon: dnsscience_snifferd
Deploy on client networks to monitor real-time DNS traffic, detect threats, identify malicious domains, and analyze query patterns. Captures all DNS queries from home, office, or remote locations and reports to your DNS Science dashboard.
Capabilities: Real-time threat detection, blacklisted DNS server alerts, traffic pattern analysis, attack detection, and performance monitoring.
Tables: dns_monitoring_locations, dns_queries, dns_threats_detected
Documentation: Installation Guide
🔄 Data Flow & Processing Pipeline
1. Data Collection Layer
└─> Daemon discovers/receives new data
└─> Validates and normalizes data format
└─> Checks for duplicates and rate limits
2. Enrichment Layer
└─> Cross-references with existing data
└─> Calculates security scores
└─> Identifies correlations and patterns
3. Storage Layer
└─> Persists to PostgreSQL database
└─> Indexes for fast querying
└─> Maintains historical records
4. API Layer
└─> Exposes data via REST API
└─> Provides real-time WebSocket updates
└─> Serves GraphQL queries
5. User Interface Layer
└─> Data Explorer for browsing
└─> Real-time dashboards
└─> Alert notifications
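The first three layers of the flow above can be sketched in miniature as follows. This is a toy in-memory model with hypothetical names: a dictionary stands in for PostgreSQL, and the "enrichment" is deliberately trivial.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pipeline:
    seen: set[str] = field(default_factory=set)          # duplicate filter
    store: dict[str, dict] = field(default_factory=dict)  # stand-in for the database

    def collect(self, raw: str) -> Optional[str]:
        """Collection layer: normalize the name and drop duplicates."""
        domain = raw.strip().rstrip(".").lower()
        if not domain or domain in self.seen:
            return None
        self.seen.add(domain)
        return domain

    def enrich(self, domain: str) -> dict:
        """Enrichment layer: derive extra attributes (placeholder logic)."""
        return {
            "domain": domain,
            "tld": domain.rsplit(".", 1)[-1],
            "labels": domain.count(".") + 1,
        }

    def ingest(self, raw: str) -> bool:
        """Run one record through collection, enrichment, and storage."""
        domain = self.collect(raw)
        if domain is None:
            return False
        self.store[domain] = self.enrich(domain)  # storage layer
        return True
```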
🗄️ Database Schema Overview
| Table Name | Purpose | Update Frequency | Data Source |
|---|---|---|---|
| domains | Master domain registry | Real-time | All sources |
| scans | DNS security scan results | Every 5 minutes | DNS scanners |
| ssl_certificates | SSL/TLS certificate data | Hourly | SSL scanner |
| rdap_data | Registration information | Daily | RDAP daemon |
| threat_intel | Active threats and IOCs | Real-time | Multiple threat feeds |
| enrichment_data | Security scores and analysis | Every 15 minutes | Enrichment daemon |
| ptr_records | Reverse DNS records | Hourly | ARPA daemon |
| web3_domains | Blockchain domain names | Every 30 minutes | Web3 daemon |
| darkweb_mentions | Dark web monitoring hits | Continuous | Dark web daemon |
| domain_valuations | Estimated domain values | Daily | Valuation daemon |
⚡ Real-Time vs. Batch Processing
Real-Time Processing (WebSocket Updates)
- Threat intelligence alerts
- New certificate issuance
- Dark web mentions
- Phishing detections
- Blacklist status changes
Batch Processing (Scheduled Jobs)
- RDAP data updates (daily)
- Domain valuation calculations (daily)
- Historical trend analysis (weekly)
- Report generation (configurable)
- Data archival and cleanup (monthly)
🎯 How to Access Ingested Data
Via REST API
GET /api/domains?enriched=true
GET /api/threats?domain=example.com
GET /api/ssl-certificates?expiring_soon=true
GET /api/rdap?domain=example.com
GET /api/explorer/enrichment
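A small client for the endpoints above might look like the sketch below. The paths and query parameters are taken from the list above and the host from the monitoring example elsewhere on this page; the helper names are hypothetical, and authentication (not described here) is omitted, so real requests may need an API key or token.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE_URL = "https://www.dnsscience.io"


def api_url(path: str, **params: str) -> str:
    """Join the base URL, endpoint path, and URL-encoded query parameters."""
    url = urljoin(BASE_URL, path)
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def fetch_json(path: str, **params: str):
    """GET an endpoint and decode the JSON body (no auth; see note above)."""
    with urlopen(api_url(path, **params), timeout=10) as response:
        return json.load(response)
```

For example, `api_url("/api/threats", domain="example.com")` yields the second request in the list, and `fetch_json("/api/rdap", domain="example.com")` would issue the fourth.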
Via GraphQL
query {
  domain(name: "example.com") {
    enrichment {
      security_score
      threats
      ssl_status
    }
    rdap {
      registrar
      expiration_date
    }
  }
}
Via Data Explorer
Browse all ingested data interactively at /explorer with filtering, sorting, and export capabilities.
📈 Monitoring Data Ingestion
Check the health and status of all daemons:
# Via CLI
curl https://www.dnsscience.io/api/stats/daemons
# Via systemd (on server)
systemctl status enrichment ssl-scanner threat-intel rdap arpad
# Via monitoring dashboard
https://www.dnsscience.io/admin/monitoring
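The systemd check above can also be scripted. The sketch below shells out to `systemctl is-active` once per unit and reports anything that is not active; the unit names come from the command above, while the function names are hypothetical. It must run on the server where the units are installed.

```python
import subprocess
from typing import Callable, Iterable

UNITS = ["enrichment", "ssl-scanner", "threat-intel", "rdap", "arpad"]


def systemctl_state(unit: str) -> str:
    """Return the state printed by `systemctl is-active` (active, inactive, failed, ...)."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True, text=True, check=False,  # non-zero exit just means "not active"
    )
    return result.stdout.strip() or "unknown"


def unhealthy_units(units: Iterable[str],
                    state_of: Callable[[str], str] = systemctl_state) -> dict[str, str]:
    """Map each unit that is not 'active' to its reported state."""
    states = {unit: state_of(unit) for unit in units}
    return {unit: state for unit, state in states.items() if state != "active"}
```

Calling `unhealthy_units(UNITS)` returns an empty dictionary when all five daemons are running.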
🚀 Why This Matters
- ✅ Real-time threat detection (seconds, not hours)
- ✅ Comprehensive historical data for trend analysis
- ✅ Automated enrichment without manual effort
- ✅ 20+ data sources cross-referenced automatically
- ✅ Proactive alerts before issues become critical
🔧 Technical Implementation
All ingestion daemons share the same implementation stack:
- Python 3.9+ services with async/await for concurrent processing
- systemd units for automatic restart and dependency management
- PostgreSQL backend with optimized indexes and partitioning
- Redis caching for frequently accessed data
- Comprehensive logging to syslog and application logs
- Prometheus metrics for monitoring and alerting
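To show what async/await concurrency can look like in a daemon of this kind, the sketch below processes a batch of domains with a bounded number of in-flight tasks. The function names and the concurrency limit are assumptions for illustration, not the daemons' actual code.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def process_all(domains: list[str],
                      worker: Callable[[str], Awaitable[Any]],
                      concurrency: int = 10) -> dict[str, Any]:
    """Run `worker` over every domain, at most `concurrency` at a time.

    Failures are captured per domain instead of aborting the whole batch,
    so the returned value for a domain may be an exception instance.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(domain: str) -> Any:
        async with semaphore:  # wait for a free slot before starting work
            return await worker(domain)

    results = await asyncio.gather(*(guarded(d) for d in domains),
                                   return_exceptions=True)
    return dict(zip(domains, results))
```

A daemon loop could call `await process_all(batch, scan_domain)` for each batch pulled from its queue, where `scan_domain` is a hypothetical coroutine that performs one lookup.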
📚 Further Reading
- API Documentation - Access ingested data programmatically
- CLI Documentation - Command-line tools for data access
- Data Explorer - Interactive browsing of all data sources
- Platform Architecture - System design and infrastructure