DNS Science is built on a distributed, scalable architecture designed for real-time domain intelligence. Our platform processes millions of DNS records daily through a network of specialized daemons and verification systems.
Flask-based web application serving dynamic content with real-time updates. Built with responsive design principles and progressive enhancement.
RESTful API endpoints handling domain lookups, dark web monitoring, RDAP queries, and comprehensive DNS analysis.
22 specialized daemons running continuously to discover, enrich, and monitor domain data from multiple sources.
PostgreSQL database with Redis caching for high-performance queries. Optimized indexes and materialized views for analytics.
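As an illustration of how the Redis layer fronts PostgreSQL, here is a minimal read-through caching sketch. The client setup, key naming, TTL, and column list are assumptions for illustration; only the discovered_domains table and its created_at column come from the monitoring examples later in this document.

# Read-through cache sketch (illustrative, not our exact implementation)
import json
import redis
import psycopg2

cache = redis.Redis(host="localhost", port=6379, db=0)   # assumed connection details
db = psycopg2.connect("dbname=dnsscience")               # assumed DSN
CACHE_TTL = 300  # seconds to keep hot lookups in Redis

def get_domain_record(domain):
    """Serve from Redis if cached, otherwise hit PostgreSQL and populate the cache."""
    key = f"domain:{domain}"
    cached = cache.get(key)
    if cached:
        return json.loads(cached)
    with db.cursor() as cur:
        cur.execute(
            "SELECT name, created_at FROM discovered_domains WHERE name = %s",
            (domain,),
        )
        row = cur.fetchone()
    if row is None:
        return None
    record = {"name": row[0], "created_at": str(row[1])}
    cache.setex(key, CACHE_TTL, json.dumps(record))
    return record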
Our platform runs 22 specialized daemons, each focused on a specific aspect of domain intelligence:
Our advanced DNS monitoring lets us debug clients' internal network DNS problems, track traffic trends, analyze attack data, and more:
Every domain in our database goes through a multi-stage verification process:
Our dark web monitoring system provides passive intelligence on Tor hidden services, I2P networks, and blockchain DNS:
Tracking 1,160 active Tor exit nodes, updated hourly from the Tor Project APIs (a sketch of this refresh job appears below). Database of known .onion addresses mapped to clearnet domains.
Monitoring ENS (Ethereum Name Service), Handshake (.hns), and Namecoin (.bit) domains for alternative DNS registrations.
Analyzing SSL certificates for hidden services, identifying anomalies and self-signed certs indicative of dark web infrastructure.
100% passive monitoring - no active crawling. All data from public sources, CT logs, and community-verified mappings.
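The hourly exit-node refresh is a small recurring job. Here is a rough sketch of what it can look like; the Tor Project's public bulk exit list endpoint is real, but the tor_exit_nodes column layout and upsert shown here are assumptions.

# Tor exit-node refresh sketch (schema and upsert are illustrative)
import requests
import psycopg2

EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"  # public Tor Project list

def refresh_tor_exit_nodes(conn):
    """Fetch the current exit-node IPs and upsert them into tor_exit_nodes."""
    resp = requests.get(EXIT_LIST_URL, timeout=30)
    resp.raise_for_status()
    ips = [line.strip() for line in resp.text.splitlines() if line.strip()]
    with conn.cursor() as cur:
        for ip in ips:
            # Hypothetical columns (ip_address, last_seen); the real table may differ
            cur.execute(
                """INSERT INTO tor_exit_nodes (ip_address, last_seen)
                   VALUES (%s, NOW())
                   ON CONFLICT (ip_address) DO UPDATE SET last_seen = NOW()""",
                (ip,),
            )
    conn.commit()
    return len(ips)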
Dark web monitoring utilizes 10 specialized tables:
onion_addresses - Known .onion hidden services
i2p_addresses - I2P eepsite addresses
tor_exit_nodes - Active Tor exit node database
alternative_dns - Blockchain DNS registrations
darkweb_certificates - SSL cert anomaly tracking
darkweb_lookups - Audit trail of all lookups
onion_clearnet_mappings - Verified .onion ↔ clearnet associations
darkweb_rate_limits - Per-user rate limiting
darkweb_stats - Cached statistics
darkweb_audit_log - Compliance and security logging

Processing millions of domains required a fundamental shift from sequential to parallel processing. Here's how we scaled our infrastructure.
Our original domain valuation daemon processed domains sequentially:
# Original approach - single-threaded daemon loop
import time

while True:
    domains = get_domains_needing_valuation(batch_size=100)
    for domain in domains:
        # 3-4 DB queries per domain
        age_data = fetch_rdap_data(domain)
        ssl_data = fetch_ssl_cert(domain)
        email_data = fetch_email_records(domain)
        # Calculate and save
        valuation = calculate_value(domain, age_data, ssl_data, email_data)
        save_to_database(valuation)
    time.sleep(60)  # Wait before the next batch
Issues encountered at scale:
We migrated to a distributed task queue architecture using Celery and Redis:
Celery Beat schedules batches of 1,000 domains every 2 minutes. Each domain becomes an independent task that can be processed by any available worker.
16 concurrent Celery workers process valuations simultaneously. Each worker handles one domain at a time with automatic retry on failure.
Redis serves as the message broker, queuing tasks and distributing them to workers. Provides persistence and visibility into queue depth.
Failed tasks automatically retry up to 3 times with exponential backoff. No more silently dropped valuations.
# New approach - Celery distributed tasks
from celery import group
from celery_config import app  # shared Celery instance

@app.task(bind=True, max_retries=3)
def value_domain(self, domain_id, domain_name):
    try:
        # Same valuation logic, but runs in parallel
        valuation = calculate_and_save_valuation(domain_id, domain_name)
        return {'domain': domain_name, 'value': valuation}
    except Exception as e:
        # Automatic retry with exponential backoff
        raise self.retry(exc=e, countdown=2 ** self.request.retries)

@app.task
def queue_valuation_batch(batch_size=1000):
    domains = get_domains_needing_valuation(batch_size)
    # Queue all domains as parallel tasks
    tasks = group(value_domain.s(d.id, d.name) for d in domains)
    tasks.apply_async()  # Fan out to all available workers
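The batch producer above is fired on a timer by Celery Beat. A minimal sketch of the corresponding schedule entry, assuming the celery_config module referenced by the systemd unit below, might look like:

# celery_config.py (sketch) - Beat schedule for the valuation pipeline
from celery import Celery

app = Celery("dnsscience", broker="redis://localhost:6379/0")  # assumed broker URL

app.conf.beat_schedule = {
    "queue-valuation-batch": {
        "task": "tasks.queue_valuation_batch",  # hypothetical module path
        "schedule": 120.0,                      # every 2 minutes
        "kwargs": {"batch_size": 1000},
    },
}
app.conf.task_routes = {
    "tasks.value_domain": {"queue": "valuation"},  # matches the worker's -Q valuation
}

Beat runs as its own process (celery -A celery_config beat) alongside the workers.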
# /etc/systemd/system/dnsscience-celery-valuation.service
[Service]
# -Q valuation: consume only the valuation queue; -c 16: 16 concurrent workers
ExecStart=/usr/local/bin/celery -A celery_config worker \
    -Q valuation \
    -c 16 \
    -n valuation@%h \
    --loglevel=INFO
Restart=always
RestartSec=10
Operating 17+ background services requires automated monitoring and recovery. Manual intervention doesn't scale.
During development, we encountered several recurring issues:
Checks if all enabled services are running. Auto-restarts crashed daemons. Logs restart events to system journal.
Monitors table timestamps. If no new valuations in 60 min, restarts valuation daemon. If no new domains in 60 min, restarts discovery.
Tracks records per hour. Alerts and restarts if below thresholds (e.g., <50 domains/hr or <100 valuations/hr); a sketch of this check follows the freshness script below.
Systemd service ensures all daemons start on instance reboot. No manual intervention required after AWS maintenance.
#!/bin/bash
# /usr/local/bin/dnsscience-health-monitor.sh (runs via cron every 15 min)

# Check data freshness - restart the owning service if the table has gone stale
check_data_freshness() {
    local table=$1
    local service=$2
    local max_minutes=$3
    # -tA: tuples only, unaligned, so psql returns a bare integer
    local last_update
    last_update=$(psql -tA -c "SELECT FLOOR(EXTRACT(EPOCH FROM (NOW() - MAX(created_at))) / 60) FROM $table;")
    if [ "$last_update" -gt "$max_minutes" ]; then
        logger "STALE DATA: $table - restarting $service"
        systemctl restart "$service"
    fi
}

# Domain discovery - should have new domains every hour
check_data_freshness "discovered_domains" "domain-discovery.service" 60
# Valuations - should value domains every hour
check_data_freshness "domain_valuations" "dnsscience-domain-valuation.service" 60
Single command deploys all services, daemons, and configuration to production:
./deploy_all_services.sh
# Syncs to S3, deploys to instance, enables services, restarts everything
# No more "forgot to deploy" issues
Our architecture is designed to scale horizontally across multiple dimensions:
Auto Scaling Group with load balancer. Currently running t3.medium instances with capacity to scale to t3.xlarge.
RDS PostgreSQL with read replicas. Multi-AZ deployment for high availability.
Redis ElastiCache (cache.t3.small) with 1.5GB memory for hot data.
Daemons run independently and can be distributed across multiple worker instances.
Our architecture is designed with compliance in mind:
Our API follows REST principles with predictable endpoints:
GET /api/stats/live # Real-time platform statistics
GET /api/darkweb/stats # Dark web monitoring stats
GET /api/darkweb/onion/:domain # Check for .onion alternatives
POST /api/lookup # Domain intelligence lookup
GET /api/rdap/:domain # RDAP registration data
GET /api/whois/:domain # WHOIS information
POST /api/bulk-lookup # Batch domain analysis
POST /api/scan # Domain scan (Simple/Advanced/Expert modes)
GET /api/ip/:ip/scan # IP scan (Simple/Advanced/Expert modes)
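As a usage example, any HTTP client can drive these endpoints. Here is a short Python sketch using requests; the base URL, request body, and response handling are illustrative rather than a documented contract:

# Illustrative API client (hypothetical base URL and request shape)
import requests

BASE_URL = "https://dnsscience.example/api"

# Single-domain intelligence lookup
resp = requests.post(f"{BASE_URL}/lookup", json={"domain": "example.com"}, timeout=30)
resp.raise_for_status()
print(resp.json())

# RDAP registration data for the same domain
rdap = requests.get(f"{BASE_URL}/rdap/example.com", timeout=30)
rdap.raise_for_status()
print(rdap.json())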
Our platform offers three scanning modes with progressively granular control:
Quick scans with automatic checks for DNS, SSL, and basic security indicators. Perfect for rapid assessments.
Comprehensive scans including DNSSEC validation, threat intelligence feeds, and enhanced security checks.
Fully customizable scans with granular control over intelligence sources, DNS resolvers, and data collection methods. Choose exactly which checks to run.
Domain Scans: Customize DNS analysis (records, DNSSEC, propagation), security checks (SSL, certificate transparency), email security (SPF, DKIM, DMARC), and threat intelligence sources.
IP Scans: Configure geolocation providers (IPInfo, MaxMind, BGP, RIPEstat), security sources (AbuseIPDB, RBL, threat feeds), and advanced analysis (Cloudflare detection, reverse DNS, WHOIS lookups).
# Example: Expert Mode Domain Scan
POST /api/scan
{
"domain": "example.com",
"expert": true,
"options": {
"dns": ["records", "dnssec", "propagation"],
"security": ["ssl", "ssl-chain", "cert-transparency"],
"email": ["spf", "dkim", "dmarc", "mx-health"],
"intel": ["whois", "reputation", "threat"]
}
}
# Example: Expert Mode IP Scan
GET /api/ip/8.8.8.8/scan?expert=true&options={"geo":["ipinfo","maxmind"]}
Timeline: 3-4 weeks
Cost: $0-10/month
Flexible querying interface for complex use cases. Query exactly the data you need with a single request. Perfect for advanced integrations and custom dashboards.
Tech: Graphene-Python, Apollo Server, GraphQL subscriptions
Timeline: 2-3 weeks
Cost: $50-150/month
Live domain monitoring feeds with instant notifications. Stream CT log discoveries, SSL certificate changes, and DNS updates in real-time.
Tech: Socket.IO, Redis Pub/Sub, AWS API Gateway WebSocket
Timeline: 6-8 weeks
Cost: $50-200/month
Predictive analytics for domain reputation scoring. Anomaly detection, phishing prediction, and automated threat classification using TensorFlow and scikit-learn.
Tech: TensorFlow, scikit-learn, AWS SageMaker
Timeline: 4-6 weeks
Cost: $200-800/month
Natural language queries, automated report generation, and intelligent domain recommendations. Powered by large language models and vector embeddings.
Tech: OpenAI GPT-4, Claude API, Pinecone vector DB
Note: Detailed implementation plans, technical architecture diagrams, and cost breakdowns are available in our internal documentation. These features are being prioritized based on user feedback and enterprise requirements.
This architecture documentation is continuously updated as we enhance our platform.
Last updated: November 2025