📊 Data Ingestion & Intelligence Sources

Comprehensive documentation of DNS Science's 20+ real-time and batch data feeds and intelligence sources

🚀 Market Differentiator: DNS Science operates 20+ autonomous data ingestion daemons that provide real-time intelligence across 312M+ domains in 1,570+ TLDs. Zone file access through AllZonefiles.io lets us ingest 200K-500K newly registered domains daily. Because ingestion runs continuously rather than on demand, security posture data stays current instead of depending on periodic refreshes.

🔄 Data Ingestion Architecture

DNS Science employs a distributed daemon architecture where specialized services continuously ingest, process, and enrich domain intelligence data. Each daemon operates independently, automatically recovering from failures and maintaining data freshness.

Core Ingestion Principles

📡 Active Data Sources (20+ Feeds)

🔒 SSL/TLS Certificate Monitoring

Continuous

Daemon: ssl-scanner.service

Actively scans domains for SSL/TLS certificates, tracks expiration dates, validates certificate chains, and alerts on new entries in certificate transparency (CT) logs.

Tables: ssl_certificates, certificate_history

🌐 RDAP (Registration Data Access Protocol)

Batch

Daemon: rdap.service

Queries RDAP servers for domain registration data including registrar info, nameservers, DNSSEC status, and registration/expiration dates.

Tables: rdap_data
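RDAP servers return JSON objects defined in RFC 9083. As a minimal sketch of the kind of extraction rdap.service performs (not its actual code), the function below pulls the registrar name, nameservers, DNSSEC flag, and registration/expiration dates out of an already-fetched RDAP domain response. The function names and the sample response, whose values are made up, are illustrative only.

```python
from typing import Optional

def vcard_full_name(entity: dict) -> Optional[str]:
    """Return the 'fn' (formatted name) from an entity's jCard, if present."""
    vcard = entity.get("vcardArray", [])
    properties = vcard[1] if len(vcard) > 1 else []
    for prop in properties:
        # Each jCard property is [name, parameters, value type, value]
        if prop and prop[0] == "fn":
            return prop[3]
    return None

def summarize_rdap(response: dict) -> dict:
    """Extract registrar, nameservers, DNSSEC status and lifecycle dates from an RDAP domain object."""
    registrar = next(
        (vcard_full_name(e) for e in response.get("entities", []) if "registrar" in e.get("roles", [])),
        None,
    )
    # RDAP events are a list of {eventAction, eventDate}; index them by action
    events = {e["eventAction"]: e["eventDate"] for e in response.get("events", [])}
    return {
        "domain": response.get("ldhName", "").lower(),
        "registrar": registrar,
        "nameservers": sorted(ns["ldhName"].lower() for ns in response.get("nameservers", [])),
        "dnssec_signed": response.get("secureDNS", {}).get("delegationSigned", False),
        "registered": events.get("registration"),
        "expires": events.get("expiration"),
    }

# Illustrative response shape (values are made up)
sample = {
    "objectClassName": "domain",
    "ldhName": "EXAMPLE.COM",
    "entities": [{
        "roles": ["registrar"],
        "vcardArray": ["vcard", [["version", {}, "text", "4.0"], ["fn", {}, "text", "Example Registrar, Inc."]]],
    }],
    "nameservers": [{"ldhName": "NS2.EXAMPLE.NET"}, {"ldhName": "NS1.EXAMPLE.NET"}],
    "secureDNS": {"delegationSigned": True},
    "events": [
        {"eventAction": "registration", "eventDate": "2001-01-01T00:00:00Z"},
        {"eventAction": "expiration", "eventDate": "2030-01-01T00:00:00Z"},
    ],
}
print(summarize_rdap(sample))
```

In practice the response would come from the authoritative RDAP server for the TLD, located through the IANA bootstrap registry described in RFC 9224.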

🛡️ Threat Intelligence (Multi-Source)

Real-time

Daemon: threat-intel.service

Aggregates threat data from AlienVault OTX, Pulsedive, URLhaus, PhishTank, Google Safe Browsing, and other feeds.

Tables: threat_intel, threat_history
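Aggregating across feeds largely means collapsing many per-source hits into one record per domain. The sketch below is a hedged illustration, not threat-intel.service's code: the `FeedHit` shape, the category names, and the severity ranking are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set

# Hypothetical severity ranking used to keep the worst category reported for a domain
SEVERITY = {"suspicious": 1, "spam": 2, "phishing": 3, "malware": 4}

@dataclass
class FeedHit:
    source: str     # e.g. "urlhaus", "phishtank", "otx"
    domain: str
    category: str

@dataclass
class AggregatedThreat:
    domain: str
    sources: Set[str] = field(default_factory=set)
    worst_category: Optional[str] = None

def aggregate(hits: Iterable[FeedHit]) -> Dict[str, AggregatedThreat]:
    merged: Dict[str, AggregatedThreat] = {}
    for hit in hits:
        # Normalize so "Bad.example." and "bad.example" merge into one record
        key = hit.domain.lower().rstrip(".")
        record = merged.setdefault(key, AggregatedThreat(domain=key))
        record.sources.add(hit.source)
        rank = SEVERITY.get(hit.category, 0)
        if record.worst_category is None or rank > SEVERITY.get(record.worst_category, 0):
            record.worst_category = hit.category
    return merged
```

Counting distinct sources per domain gives a simple corroboration signal: a domain flagged by several independent feeds is usually a stronger lead than one flagged by a single feed.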

🔍 Domain Enrichment Engine

Continuous

Daemon: enrichment.service

Calculates comprehensive security scores based on DNSSEC, SPF, DMARC, SSL status, threat intelligence, and blacklist status.

Tables: enrichment_data
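The description above lists the inputs to the security score but not how they combine. A common approach is a weighted sum, sketched below under loud assumptions: the weights, the 0-100 scale, the partial DMARC credit, and the `SecuritySignals` fields are all hypothetical, not enrichment.service's actual model.

```python
from dataclasses import dataclass

@dataclass
class SecuritySignals:
    dnssec_signed: bool
    has_spf: bool
    dmarc_policy: str   # "none", "quarantine", "reject", or "" if no record
    valid_ssl: bool
    threat_hits: int    # number of threat-intel feeds flagging the domain
    blacklisted: bool

# Hypothetical weights; they sum to 100 so a domain passing every check scores 100
WEIGHTS = {"dnssec": 15, "spf": 15, "dmarc": 20, "ssl": 20, "threats": 20, "blacklist": 10}

def security_score(s: SecuritySignals) -> int:
    score = 0.0
    score += WEIGHTS["dnssec"] if s.dnssec_signed else 0
    score += WEIGHTS["spf"] if s.has_spf else 0
    # Partial credit: p=none only monitors, quarantine/reject actually enforce
    dmarc_credit = {"reject": 1.0, "quarantine": 0.75, "none": 0.25}.get(s.dmarc_policy, 0.0)
    score += WEIGHTS["dmarc"] * dmarc_credit
    score += WEIGHTS["ssl"] if s.valid_ssl else 0
    # The threat component is lost entirely once any feed flags the domain
    score += WEIGHTS["threats"] if s.threat_hits == 0 else 0
    score += WEIGHTS["blacklist"] if not s.blacklisted else 0
    return round(score)
```

A weighted sum keeps each component's contribution easy to explain in a report; a production model might instead apply hard caps (for example, any blacklist listing capping the score) that a plain sum cannot express.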

🔄 Reverse DNS (ARPA) Scanner

Continuous

Daemon: arpad.service

Performs forward-confirmed reverse DNS (FCrDNS) checks for IPv4 and IPv6 addresses, validating PTR records.

Tables: ptr_records
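Forward-confirmed reverse DNS is a two-step check: resolve the PTR record for an address, then resolve that hostname forward and confirm the original address appears in the results. The sketch below shows the technique with Python's standard `socket` and `ipaddress` modules; it is an illustration, not arpad.service's implementation, and the `ptr`/`forward` parameters are hypothetical hooks that let the logic be tested without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional

def ptr_lookup(ip: str) -> str:
    # gethostbyaddr performs the PTR query through the system resolver
    return socket.gethostbyaddr(ip)[0]

def forward_lookup(hostname: str) -> Iterable[str]:
    # getaddrinfo returns A and AAAA results; sockaddr[0] is the address string
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}

def fcrdns(ip: str, ptr: Callable[[str], str] = ptr_lookup,
           forward: Callable[[str], Iterable[str]] = forward_lookup) -> dict:
    addr = ipaddress.ip_address(ip)  # works for both IPv4 and IPv6
    result = {"ip": str(addr), "arpa": addr.reverse_pointer, "ptr": None, "confirmed": False}
    try:
        hostname: Optional[str] = ptr(str(addr))
        forward_ips = {ipaddress.ip_address(a) for a in forward(hostname)}
    except OSError:  # socket.herror / socket.gaierror: no PTR or no forward record
        return result
    result["ptr"] = hostname
    # FCrDNS passes only if the PTR hostname resolves back to the same address
    result["confirmed"] = addr in forward_ips
    return result
```

The `arpa` field is the reverse-lookup name (for example `1.2.0.192.in-addr.arpa` for 192.0.2.1) that the PTR query is actually sent for.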

🕵️ Dark Web Monitoring

Real-time

Daemon: darkweb-monitor.service

Monitors Tor hidden services and dark web marketplaces for domain-related threats, breached credentials, and mentions.

Tables: darkweb_mentions, darkweb_credentials

🌍 Web3 & Blockchain Domains

Continuous

Daemon: web3d.service

Indexes ENS (.eth), Solana Name Service (.sol), Unstoppable Domains, and other blockchain-based naming systems.

Tables: web3_domains

💰 Domain Valuation Engine

Batch

Daemon: domain-valuation.service

Calculates estimated domain values based on length, keywords, TLD, traffic estimates, backlinks, and historical sales data.

Tables: domain_valuations

📧 Email Deliverability Scoring

Continuous

Daemon: email-deliverability.service

Tests SMTP connectivity, validates SPF/DKIM/DMARC records, checks MTA-STS policies, and scores email security posture.

Tables: email_deliverability
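DMARC policies are published as a TXT record at `_dmarc.<domain>` made of semicolon-separated `tag=value` pairs (RFC 7489). The sketch below parses such a record and reports a few common weaknesses; the specific findings are illustrative assumptions, not email-deliverability.service's scoring rules.

```python
from typing import Dict, List

def parse_dmarc(record: str) -> Dict[str, str]:
    """Split a DMARC TXT record into a {tag: value} dict."""
    tags: Dict[str, str] = {}
    for part in record.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        key, _, value = part.partition("=")
        tags[key.strip().lower()] = value.strip()
    return tags

def dmarc_findings(record: str) -> List[str]:
    tags = parse_dmarc(record)
    if tags.get("v") != "DMARC1":
        return ["missing or invalid v=DMARC1 tag"]
    findings = []
    policy = tags.get("p", "").lower()
    if policy not in ("none", "quarantine", "reject"):
        findings.append("missing or invalid p= policy")
    elif policy == "none":
        findings.append("p=none only monitors; failing mail is not quarantined or rejected")
    if "rua" not in tags:
        findings.append("no rua= address, so aggregate reports are not collected")
    pct = tags.get("pct", "100")
    if pct.isdigit() and int(pct) < 100:
        findings.append(f"policy applies to only {pct}% of failing mail")
    return findings
```

Fetching the TXT record itself needs a DNS library such as dnspython (`dns.resolver.resolve("_dmarc.example.com", "TXT")`), which is outside the standard library.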

📅 Domain Expiry Monitoring

Continuous

Daemon: domain-expiry.service

Tracks expiration dates, sends alerts for domains nearing expiry, and identifies acquisition opportunities.

Tables: expiring_domains
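A minimal sketch of an expiry alert check: given each domain's expiration timestamp (for example, the RDAP `expiration` event), compute the days remaining and bucket domains against alert thresholds. The 30-day and 7-day thresholds and the function names are assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

ALERT_THRESHOLDS = (30, 7)  # hypothetical: warn at 30 days, escalate at 7

def days_until(expires_iso: str, now: datetime) -> int:
    # fromisoformat does not accept a trailing "Z" before Python 3.11, so normalize it
    expires = datetime.fromisoformat(expires_iso.replace("Z", "+00:00"))
    return (expires - now).days

def expiry_alerts(expirations: Dict[str, str], now: Optional[datetime] = None) -> Dict[str, str]:
    now = now or datetime.now(timezone.utc)
    alerts = {}
    for domain, expires_iso in expirations.items():
        days = days_until(expires_iso, now)
        if days < 0:
            alerts[domain] = "expired"
        elif days <= min(ALERT_THRESHOLDS):
            alerts[domain] = "critical"
        elif days <= max(ALERT_THRESHOLDS):
            alerts[domain] = "warning"
    return alerts
```

Domains past expiry but not yet deleted are the ones that typically move through registry grace periods, which is where the acquisition opportunities mentioned above come from.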

🔎 Domain Discovery Engine

Continuous

Daemon: domain-discovery.service

Discovers new domains through AllZonefiles.io (312M+ domains across 1,570+ TLDs), certificate transparency logs, Tranco/Umbrella rankings, Cloudflare Radar, and WHOIS monitoring, ingesting 200K-500K newly registered domains daily.

Tables: discovered_domains

📂 AllZonefiles.io Zone Files

Real-time

Integration: allzonefiles_api

Direct access to TLD zone files including .com (158M+), .net (13M+), .org (10M+), and 1,500+ other gTLDs and ccTLDs. Daily new domain lists and expired domain tracking.

Tables: domains, domain_discovery_stats
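Newly registered and dropped domains can be derived by comparing consecutive daily zone file snapshots: delegated names present today but not yesterday are new, and the reverse are deleted. The sketch below assumes a simplified master-file excerpt with fully qualified owner names on every NS record; real TLD zone files use `$ORIGIN`, relative names, and omitted owner fields, which a full parser (such as dnspython's `dns.zone` module) handles. Function names are hypothetical.

```python
from typing import List, Set, Tuple

def delegated_names(zone_text: str) -> Set[str]:
    """Collect owner names that have NS records in a simplified zone file."""
    names = set()
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("$"):
            continue  # skip blank lines and directives such as $TTL
        fields = line.split()
        # Expected shape: owner TTL class type rdata, e.g. "example.com. 172800 IN NS ns1.example.net."
        if len(fields) >= 5 and fields[2].upper() == "IN" and fields[3].upper() == "NS":
            names.add(fields[0].lower().rstrip("."))
    return names

def zone_diff(yesterday: str, today: str) -> Tuple[List[str], List[str]]:
    old, new = delegated_names(yesterday), delegated_names(today)
    return sorted(new - old), sorted(old - new)  # (added, removed)
```

Set difference keeps the daily comparison linear in zone size, which matters when a single snapshot holds well over a hundred million delegations.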

🌐 IP Intelligence & Geolocation

Real-time

Daemon: ip-intel.service

Enriches IP addresses with geolocation, ASN data, hosting provider info, reputation scores, and abuse history.

Tables: ip_intelligence

📊 Shodan Integration

Batch

Integration: shodan_integration.py

Queries Shodan for exposed services, open ports, vulnerabilities, and internet-facing infrastructure associated with domains.

Tables: shodan_results

🔐 SecurityTrails Integration

Batch

Integration: securitytrails_integration.py

Retrieves historical DNS records, subdomains, and WHOIS history, and discovers related domains, through the SecurityTrails API.

Tables: securitytrails_data

🎣 PhishTank Anti-Phishing

Real-time

Integration: phishtank_integration.py

Checks domains against PhishTank's community-verified phishing database.

Tables: phishing_detections

🦠 URLhaus Malware Tracking

Real-time

Integration: urlhaus_integration.py

Monitors URLhaus for malware distribution URLs and C2 infrastructure associated with domains.

Tables: malware_urls

👁️ AlienVault OTX

Real-time

Integration: alienvault_otx.py

Ingests threat intelligence pulses, IOCs, and other community-contributed security intelligence from AlienVault OTX.

Tables: otx_pulses, otx_indicators

🔬 Pulsedive Threat Intel

Real-time

Integration: pulsedive_integration.py

Performs automated threat intelligence scanning and risk assessment through the Pulsedive platform.

Tables: pulsedive_threats

📜 Certificate Transparency Logs

Real-time

Daemon: certificate-transparency.service

Monitors CT logs for newly issued certificates, discovers subdomains, and detects suspicious certificate issuance.

Tables: ct_logs
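Subdomain discovery from CT logs largely comes down to reading the DNS names in each certificate's Subject Alternative Name (SAN) extension and keeping those under a monitored apex domain. The sketch below assumes the SAN entries have already been decoded from log entries by a CT client; wildcard entries are reduced to their base name. The function name is hypothetical.

```python
from typing import Iterable, Set

def subdomains_from_sans(sans: Iterable[str], apex: str) -> Set[str]:
    apex = apex.lower().rstrip(".")
    found = set()
    for name in sans:
        name = name.lower().rstrip(".")
        if name.startswith("*."):
            name = name[2:]  # "*.dev.example.com" still reveals "dev.example.com"
        # Require a dot boundary so "notexample.com" is not treated as a subdomain
        if name.endswith("." + apex):
            found.add(name)
    return found
```

The dot-boundary check also rejects lookalikes such as `example.com.evil.example`, which are better handled by the suspicious-issuance detection than by subdomain discovery.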

🚨 DNS Blackhole Lists

Continuous

Integration: dns_blackhole.py

Checks domains against major DNS blackhole lists (DNSBL) including Spamhaus, SURBL, and custom threat feeds.

Tables: blacklist_status
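DNSBL lookups are ordinary DNS queries. For IP-based lists the IPv4 octets are reversed and prepended to the list's zone; domain-based lists (such as Spamhaus DBL or SURBL) take the domain name directly. An A-record answer in 127.0.0.0/8 means listed, and NXDOMAIN means not listed. The sketch below builds query names and interprets answers; the zone names are public lists, but their usage policies apply (Spamhaus documents 127.255.255.x return codes for refused queries, for example ones sent through large public resolvers). The `resolve` parameter is a hypothetical hook for offline testing.

```python
import ipaddress
import socket
from typing import Callable, Dict, List

IP_LISTS = ["zen.spamhaus.org"]
DOMAIN_LISTS = ["dbl.spamhaus.org", "multi.surbl.org"]

def ip_query_name(ip: str, zone: str) -> str:
    addr = ipaddress.IPv4Address(ip)  # IPv6 DNSBLs use nibble format, not covered here
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def domain_query_name(domain: str, zone: str) -> str:
    return domain.lower().rstrip(".") + "." + zone

def resolve_a(name: str) -> List[str]:
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:  # NXDOMAIN, but also timeouts: a simplification for this sketch
        return []

def listing_status(answers: List[str]) -> str:
    if not answers:
        return "not listed"
    if all(a.startswith("127.255.255.") for a in answers):
        return "query error"  # list refused or rejected the query
    return "listed"

def check_ip(ip: str, resolve: Callable[[str], List[str]] = resolve_a) -> Dict[str, str]:
    return {zone: listing_status(resolve(ip_query_name(ip, zone))) for zone in IP_LISTS}

def check_domain(domain: str, resolve: Callable[[str], List[str]] = resolve_a) -> Dict[str, str]:
    return {zone: listing_status(resolve(domain_query_name(domain, zone))) for zone in DOMAIN_LISTS}
```

Treating every resolver failure as "not listed" keeps the sketch short; a production checker would distinguish NXDOMAIN from timeouts so an outage is not reported as a clean result.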

🏪 Domain Marketplace Integration

Batch

Daemon: domain-acquisition.service

Monitors aftermarket domain sales, auctions, and drop-catching opportunities across major marketplaces.

Tables: marketplace_domains, auction_watch

📊 Reporting & Analytics Engine

Batch

Daemon: reporting.service

Generates daily, weekly, and monthly analytics reports on domain security posture, threat trends, and recommendations.

Tables: reports, report_schedules

🔍 DNS Sniffer Daemon (Client Network Monitoring)

Real-time

Daemon: dnsscience_snifferd

Deploy on client networks to monitor DNS traffic in real time, detect threats, identify malicious domains, and analyze query patterns. The daemon captures the DNS queries seen on the network where it runs, whether at home, in an office, or at a remote location, and reports them to your DNS Science dashboard.

Capabilities: Real-time threat detection, blacklisted DNS server alerts, traffic pattern analysis, attack detection, and performance monitoring.

Tables: dns_monitoring_locations, dns_queries, dns_threats_detected

Documentation: Installation Guide

🔄 Data Flow & Processing Pipeline

1. Data Collection Layer
   └─> Daemon discovers/receives new data
   └─> Validates and normalizes data format
   └─> Checks for duplicates and rate limits

2. Enrichment Layer
   └─> Cross-references with existing data
   └─> Calculates security scores
   └─> Identifies correlations and patterns

3. Storage Layer
   └─> Persists to PostgreSQL database
   └─> Indexes for fast querying
   └─> Maintains historical records

4. API Layer
   └─> Exposes data via REST API
   └─> Provides real-time WebSocket updates
   └─> Serves GraphQL queries

5. User Interface Layer
   └─> Data Explorer for browsing
   └─> Real-time dashboards
   └─> Alert notifications
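The collection, enrichment, and storage layers can be sketched as a chain of Python generators. Everything here is a simplified assumption for illustration: the record shape, the in-memory `seen` set standing in for a uniqueness check against PostgreSQL, the `known_threats` set standing in for the enrichment cross-reference, and the `store` callback standing in for the storage layer.

```python
from typing import Callable, Dict, Iterable, Iterator, Set

def normalize(raw: Iterable[Dict]) -> Iterator[Dict]:
    """Collection layer: validate and normalize domain names."""
    for record in raw:
        name = str(record.get("domain", "")).strip().lower().rstrip(".")
        if not name or "." not in name:
            continue  # drop records without a usable domain
        try:
            # IDNA-encode internationalized names so duplicates compare equal
            name = name.encode("idna").decode("ascii")
        except UnicodeError:
            continue  # e.g. empty labels such as "a..b"
        yield {**record, "domain": name}

def deduplicate(records: Iterable[Dict]) -> Iterator[Dict]:
    seen = set()  # stand-in for a uniqueness check against the domains table
    for record in records:
        key = (record["domain"], record.get("source"))
        if key not in seen:
            seen.add(key)
            yield record

def enrich(records: Iterable[Dict], known_threats: Set[str]) -> Iterator[Dict]:
    """Enrichment layer: cross-reference against existing threat data."""
    for record in records:
        yield {**record, "flagged": record["domain"] in known_threats}

def run_pipeline(raw: Iterable[Dict], known_threats: Set[str], store: Callable[[Dict], None]) -> None:
    for record in enrich(deduplicate(normalize(raw)), known_threats):
        store(record)  # storage layer, e.g. an INSERT ... ON CONFLICT DO NOTHING
```

Generators keep memory flat for large feeds, since each record flows through all stages before the next one is read.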
            

🗄️ Database Schema Overview

Table Name          Purpose                        Update Frequency   Data Source
domains             Master domain registry         Real-time          All sources
scans               DNS security scan results      Every 5 minutes    DNS scanners
ssl_certificates    SSL/TLS certificate data       Hourly             SSL scanner
rdap_data           Registration information       Daily              RDAP daemon
threat_intel        Active threats and IOCs        Real-time          Multiple threat feeds
enrichment_data     Security scores and analysis   Every 15 minutes   Enrichment daemon
ptr_records         Reverse DNS records            Hourly             ARPA daemon
web3_domains        Blockchain domain names        Every 30 minutes   Web3 daemon
darkweb_mentions    Dark web monitoring hits       Continuous         Dark web daemon
domain_valuations   Estimated domain values        Daily              Valuation daemon

⚡ Real-Time vs. Batch Processing

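Real-Time Processing (WebSocket Updates)

Sources marked Real-time above: Threat Intelligence, Dark Web Monitoring, AllZonefiles.io Zone Files, IP Intelligence & Geolocation, PhishTank, URLhaus, AlienVault OTX, Pulsedive, Certificate Transparency Logs, and the DNS Sniffer Daemon.

Batch Processing (Scheduled Jobs)

Sources marked Batch above: RDAP, Domain Valuation, Shodan, SecurityTrails, Domain Marketplace Integration, and Reporting & Analytics.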

🎯 How to Access Ingested Data

Via REST API

GET /api/domains?enriched=true
GET /api/threats?domain=example.com
GET /api/ssl-certificates?expiring_soon=true
GET /api/rdap?domain=example.com
GET /api/explorer/enrichment
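These endpoints can be called from any HTTP client. The sketch below uses only the Python standard library; the base URL is the one used in the monitoring example on this page, while the bearer-token `Authorization` header and the JSON response body are assumptions, since authentication and response formats are not covered in this section.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://www.dnsscience.io"

def build_url(path: str, **params) -> str:
    # Render booleans as lowercase strings, matching "?enriched=true" in the examples above
    encoded = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    query = urllib.parse.urlencode(encoded)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")

def get_json(path: str, api_key: Optional[str] = None, **params):
    request = urllib.request.Request(build_url(path, **params))
    if api_key:
        # Assumption: bearer-token auth; check the API authentication docs
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)  # assumes a JSON response body
```

For example, `get_json("/api/ssl-certificates", expiring_soon=True)` would request `/api/ssl-certificates?expiring_soon=true`.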
            

Via GraphQL

query {
  domain(name: "example.com") {
    enrichment {
      security_score
      threats
      ssl_status
    }
    rdap {
      registrar
      expiration_date
    }
  }
}
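GraphQL queries are conventionally sent as an HTTP POST whose JSON body carries `query` and optional `variables`. The sketch below parameterizes the query above with a `$name` variable instead of hard-coding the domain; the `/graphql` endpoint path and the `String!` argument type are assumptions, since this section does not state them.

```python
import json
import urllib.request

GRAPHQL_URL = "https://www.dnsscience.io/graphql"  # hypothetical endpoint path

DOMAIN_QUERY = """
query DomainOverview($name: String!) {
  domain(name: $name) {
    enrichment { security_score threats ssl_status }
    rdap { registrar expiration_date }
  }
}
"""

def build_request(domain: str) -> urllib.request.Request:
    body = json.dumps({"query": DOMAIN_QUERY, "variables": {"name": domain}}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

def run_query(domain: str) -> dict:
    with urllib.request.urlopen(build_request(domain), timeout=30) as response:
        payload = json.load(response)
    # GraphQL servers usually report errors in the body, often alongside HTTP 200
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Passing the domain as a variable avoids building query strings by concatenation and lets the server validate the argument type.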
            

Via Data Explorer

Browse all ingested data interactively at /explorer with filtering, sorting, and export capabilities.

📈 Monitoring Data Ingestion

Check the health and status of all daemons:

# Via CLI
curl https://www.dnsscience.io/api/stats/daemons

# Via systemd (on server)
systemctl status enrichment ssl-scanner threat-intel rdap arpad

# Via monitoring dashboard (open in a browser)
# https://www.dnsscience.io/admin/monitoring
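For scripted checks on the server, `systemctl is-active` prints one state per unit (`active`, `inactive`, `failed`, and so on) in the order given. Per the systemctl man page its exit status is 0 if at least one listed unit is active, so the sketch below parses stdout rather than relying on the exit code. The unit list mirrors the `systemctl status` example above; the `runner` parameter is a hypothetical hook so the parsing can be tested without systemd.

```python
import subprocess
from typing import Callable, Dict, List

UNITS = ["enrichment", "ssl-scanner", "threat-intel", "rdap", "arpad"]

def run_systemctl(units: List[str]) -> str:
    # check=False: a non-zero exit is expected when units are down; stdout is parsed either way
    result = subprocess.run(["systemctl", "is-active", *units],
                            capture_output=True, text=True, check=False)
    return result.stdout

def unit_states(units: List[str], runner: Callable[[List[str]], str] = run_systemctl) -> Dict[str, str]:
    states = runner(units).split()
    # Mark a unit "unknown" if systemctl printed fewer lines than expected
    return {u: states[i] if i < len(states) else "unknown" for i, u in enumerate(units)}

def unhealthy(units: List[str] = UNITS, runner: Callable[[List[str]], str] = run_systemctl) -> Dict[str, str]:
    return {u: s for u, s in unit_states(units, runner).items() if s != "active"}
```

An empty dict from `unhealthy()` means every listed daemon reported `active`.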
            

🚀 Why This Matters

Competitive Advantage: Most domain intelligence platforms rely on manual queries or batch updates. DNS Science's continuous ingestion architecture means you get:
  • ✅ Real-time threat detection (seconds, not hours)
  • ✅ Comprehensive historical data for trend analysis
  • ✅ Automated enrichment without manual effort
  • ✅ 20+ data sources cross-referenced automatically
  • ✅ Proactive alerts before issues become critical

🔧 Technical Implementation

All ingestion daemons are implemented as:

📚 Further Reading