Module Reference

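MailAccess ships 28 modules covering 800+ platforms. Modules are auto-discovered from backend/modules/ at startup. Primary modules run concurrently with one another, subject to MAX_CONCURRENT_MODULES and MODULE_TIMEOUT_SECONDS. Post-primary modules (email_discovery, username_pivot, permutation_discovery, phone_intel) consume primary findings, so they run only after the primary phase completes.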

A module whose Requires key row says Yes skips itself with status: skipped when its API key is absent. A missing key does not cause the investigation to fail.
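The scheduling rules above can be sketched with asyncio. This is a minimal illustration under stated assumptions, not MailAccess's actual runner. The `Module` dataclass, the `run_module`/`run_all` helpers, the default values, and the choice to report a timeout as `failed` are all hypothetical. Only the MAX_CONCURRENT_MODULES and MODULE_TIMEOUT_SECONDS settings and the `skipped` status come from this reference.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Module:
    """Hypothetical module descriptor; see CONTRIBUTING.md for the real contract."""
    name: str
    run: Callable[[str], Awaitable[dict]]
    required_key: Optional[str] = None  # env var name, e.g. "HIBP_API_KEY"


async def run_module(mod: Module, email: str, sem: asyncio.Semaphore,
                     timeout: float) -> dict:
    # A missing API key yields "skipped" rather than failing the investigation.
    if mod.required_key and not os.environ.get(mod.required_key):
        return {"module": mod.name, "status": "skipped"}
    async with sem:  # at most max_concurrent modules hold a slot at once
        try:
            # The timeout clock starts only once the module has a slot.
            result = await asyncio.wait_for(mod.run(email), timeout)
        except asyncio.TimeoutError:
            # Assumption: a timed-out module is reported as "failed".
            return {"module": mod.name, "status": "failed"}
    return {"module": mod.name, "status": "success", **result}


async def run_all(mods: list, email: str, max_concurrent: int = 5,
                  timeout: float = 30.0) -> list:
    sem = asyncio.Semaphore(max_concurrent)  # plays the role of MAX_CONCURRENT_MODULES
    # gather() returns results in the same order as `mods`.
    return await asyncio.gather(
        *(run_module(m, email, sem, timeout) for m in mods))
```

Starting the timeout inside the semaphore means a module is not penalized for time spent queued behind other modules.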

`xposedornot`

Query XposedOrNot's public breach corpus for direct email-to-breach associations.


Requires key	No
Default	On
Rate limit	1 request / second
Status	Implemented

This module is free and does not require any API key. It calls both public XposedOrNot endpoints:

GET /v1/check-email/{email} for the direct breach association lookup
GET /v1/breach-analytics?email={email} for per-breach metadata and risk context

It returns one finding per breach with canonical breach name, exposed data classes, and risk indicators. Findings are normalized later with other breach sources, so the same breach reported by XposedOrNot and HIBP becomes a single canonical finding that lists every contributing source.
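Cross-source normalization can be pictured as grouping findings under a canonical breach key and collecting the contributing modules into a `sources` list. The sketch below assumes a canonical key made of lowercase alphanumerics and a merge policy where the first finding wins. Both are illustrative choices, and MailAccess's real normalizer may match names differently. It reads `metadata.breach_name` (xposedornot, leakcheck) or `metadata.name` (hibp), as shown in the schemas in this reference.

```python
import re
from typing import Any


def canonical_key(name: str) -> str:
    # Assumed rule: case- and punctuation-insensitive ("Sweclockers" == "SweClockers").
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_breach_findings(findings: list) -> list:
    merged: dict = {}
    for f in findings:
        meta: dict[str, Any] = f.get("metadata", {})
        # xposedornot/leakcheck use breach_name; hibp uses name.
        name = meta.get("breach_name") or meta.get("name")
        if not name:
            continue  # non-breach findings are ignored in this sketch
        source = f.get("source") or meta.get("source_module") or f.get("platform")
        key = canonical_key(name)
        if key not in merged:
            merged[key] = {**f, "breach_name": name, "sources": []}
        if source not in merged[key]["sources"]:
            merged[key]["sources"].append(source)
    return list(merged.values())
```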

Finding shape:

{
  "platform": "XposedOrNot",
  "source": "xposedornot",
  "confidence": "high",
  "severity": "critical",
  "metadata": {
    "breach_name": "SweClockers",
    "breach_id": "SweClockers",
    "domain": "sweclockers.com",
    "breached_date": "2015-01-01",
    "industry": "Electronics",
    "exposed_records": 254967,
    "data_classes": ["Email addresses", "Usernames", "Passwords"],
    "risk": "critical",
    "risk_indicators": {
      "password_risk": "hardtocrack",
      "searchable": true,
      "verified": true,
      "sensitive": false
    },
    "direct_match": true,
    "source_module": "xposedornot"
  }
}

Module metadata:

{
  "breaches_found": 2,
  "direct_breaches": ["SweClockers", "Tesco"],
  "analytics_breaches": ["SweClockers", "Tesco"],
  "all_data_classes": ["Email addresses", "Passwords", "Usernames"],
  "risk_label": "Low",
  "risk_score": 3,
  "yearwise_details": {
    "y2015": 1,
    "y2020": 0
  },
  "direct_response": {},
  "analytics_response": {}
}

Rate-limit responses return status: partial with a retry hint.

`leakcheck`

Query LeakCheck's public breach corpus for direct email-to-breach associations.


Requires key	No
Default	On
Rate limit	1 request / 2 seconds
Status	Implemented

This module is free and does not require any API key. It calls the public LeakCheck endpoint:

GET https://leakcheck.io/api/public?check={email}

It returns one finding per breach, carrying the breach name. Findings are normalized later with other breach sources, so the same breach reported by LeakCheck, XposedOrNot, and HIBP becomes a single canonical finding that lists every contributing source. Regional breaches that XposedOrNot misses still surface here. Generic source labels are routed to the stealer-signal path instead of being counted as breaches.

Finding shape:

{
  "platform": "000webhost",
  "source": "leakcheck",
  "confidence": "high",
  "severity": "medium",
  "breach_name": "000webhost",
  "metadata": {
    "breach_name": "000webhost",
    "source_module": "leakcheck"
  }
}

Module metadata:

{
  "email": "test@example.com",
  "sources_found": 1,
  "breach_names": ["000webhost"]
}

Rate-limit responses return status: partial with a clear message.

`ransomware_intel`

Check whether the email domain appears in ransomware victim lists.


Requires key	No
Default	On
Scope	Domain-level signal
Status	Implemented

This module is free, requires no API key, and skips free email providers. It correlates the target domain against ransomware victim lists sourced from ransomware.live and ransomlook.io.

Finding shape:

{
  "platform": "RansomwareIntel",
  "source": "ransomware_intel",
  "signal_type": "ransomware_victim_domain",
  "confidence": "medium",
  "severity": "high",
  "metadata": {
    "domain": "example.com",
    "group_name": "Example Gang",
    "attack_date": "2025-01-01",
    "note": "Domain-level victim signal"
  }
}

Module metadata:

{
  "domain_checked": "example.com",
  "victim_found": true,
  "ransomware_group": "Example Gang",
  "attack_date": "2025-01-01",
  "is_free_provider": false
}

`hibp`

Check if the email address appears in known data breaches via the HaveIBeenPwned v3 API.


Requires key	Yes — `HIBP_API_KEY`
Status	Implemented

Findings schema (one per breach):

{
  "platform": "HaveIBeenPwned",
  "url": "https://haveibeenpwned.com/PwnedWebsites#Adobe",
  "metadata": {
    "name": "Adobe",
    "domain": "adobe.com",
    "breach_date": "2013-10-04",
    "description": "...",
    "data_classes": ["Email addresses", "Passwords"],
    "is_sensitive": false,
    "is_verified": true,
    "pwn_count": 152445165,
    "severity": "critical"
  },
  "confidence": "high"
}

Severity is derived from data_classes: critical if passwords or financial data are present, high if phone numbers or physical addresses are present, and medium otherwise.
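The severity rule can be written as a tiered set intersection. The exact class-name sets below are assumptions: HIBP's DataClasses vocabulary is larger, and the module may match more or different names. "Email addresses" appears in nearly every breach, so the high tier keys on physical addresses rather than any "addresses" class.

```python
# Assumed class names; check against HIBP's actual DataClasses values.
CRITICAL_CLASSES = {"passwords", "credit cards", "bank account numbers",
                    "partial credit card data"}
HIGH_CLASSES = {"phone numbers", "physical addresses"}


def hibp_severity(data_classes: list) -> str:
    classes = {c.strip().lower() for c in data_classes}
    if classes & CRITICAL_CLASSES:
        return "critical"
    if classes & HIGH_CLASSES:
        return "high"
    return "medium"
```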

Module metadata:

{
  "total_breaches": 3,
  "breach_dates": "2013-10-04 to 2023-01-01",
  "most_critical_breach": "Adobe",
  "all_data_classes": ["Email addresses", "Passwords"]
}

`emailrep`

Query EmailRep.io for a reputation score, risk flags, and linked profiles.


Requires key	No (a key raises the rate limit — set `EMAILREP_API_KEY` if querying at volume)
Status	Implemented

Findings schema (one finding per investigation):

{
  "platform": "emailrep",
  "confidence": "high",
  "severity": "high",
  "metadata": {
    "reputation": "high",
    "suspicious": false,
    "references": 12,
    "blacklisted": false,
    "malicious_activity": false,
    "credentials_leaked": true,
    "data_breach": true,
    "last_seen": "2024-01-15",
    "spam": false,
    "free_provider": true,
    "disposable": false,
    "profiles": ["twitter", "linkedin"]
  }
}

`gravatar`

Look up Gravatar and Libravatar profiles linked to the email address.


Requires key	No
Status	Implemented

The email is hashed with MD5 (Gravatar standard). If a profile exists, the finding includes the display name, thumbnail URL, and any linked third-party accounts the user has added to their Gravatar profile.
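Gravatar's hashing convention is to trim whitespace and lowercase the address before taking the MD5 hex digest. The sketch below builds the profile and avatar URLs from that hash. Appending `d=404` to the avatar URL is a documented Gravatar option that returns HTTP 404 instead of a default image, which turns the URL into an existence probe. Whether this module probes that way is an assumption, and the helper names are hypothetical.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    # Gravatar convention: trim and lowercase before hashing.
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def gravatar_urls(email: str) -> dict:
    h = gravatar_hash(email)
    return {
        "profile_url": f"https://www.gravatar.com/{h}",
        # d=404 asks Gravatar for a 404 when no avatar exists.
        "thumbnail_url": f"https://www.gravatar.com/avatar/{h}?d=404",
    }
```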

Findings schema:

{
  "platform": "Gravatar",
  "url": "https://www.gravatar.com/abc123",
  "metadata": {
    "display_name": "Jane Doe",
    "thumbnail_url": "https://www.gravatar.com/avatar/abc123",
    "profile_url": "https://www.gravatar.com/abc123",
    "accounts": [...],
    "location": "San Francisco",
    "verified_accounts": [...]
  },
  "confidence": "high"
}

A Libravatar finding (confidence low) is added if an avatar is hosted there.

`google_dork`

Run Google dork queries via SerpAPI to surface public mentions of the email address.


Requires key	Yes — `SERPAPI_KEY`
Status	Implemented

Runs 5 dork templates concurrently:

site:linkedin.com "{email}"
site:github.com "{email}"
"{email}" site:pastebin.com
"{email}" filetype:pdf OR filetype:csv OR filetype:xlsx
intext:"{email}" -site:linkedin.com -site:github.com

Up to 5 results per dork are returned as findings with platform inferred from the URL.
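A sketch of how the five templates expand and how results might become findings. The templates are copied from the list above. Reading each result's `link` follows the shape of SerpAPI `organic_results` items. The "platform = hostname without `www.`" rule and the finding layout are assumptions for illustration.

```python
from urllib.parse import urlparse

DORK_TEMPLATES = [
    'site:linkedin.com "{email}"',
    'site:github.com "{email}"',
    '"{email}" site:pastebin.com',
    '"{email}" filetype:pdf OR filetype:csv OR filetype:xlsx',
    'intext:"{email}" -site:linkedin.com -site:github.com',
]
MAX_RESULTS_PER_DORK = 5


def build_dorks(email: str) -> list:
    return [t.format(email=email) for t in DORK_TEMPLATES]


def infer_platform(url: str) -> str:
    # Assumed rule: the hostname, minus a leading "www.".
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def results_to_findings(results_by_dork: dict) -> list:
    findings = []
    for dork, results in results_by_dork.items():
        for r in results[:MAX_RESULTS_PER_DORK]:  # cap per dork
            findings.append({"platform": infer_platform(r["link"]),
                             "url": r["link"], "metadata": {"dork": dork}})
    return findings
```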

Module metadata:

{
  "total_results_found": 8,
  "dorks_run": 5,
  "dorks_with_hits": 3
}

`email_discovery`

Post-primary module that dorks for other email addresses owned by the same person, using real names recovered by GHunt, Gravatar, WHOIS, breach metadata, social findings, or EmailRep.


Gate	`ENABLE_EMAIL_DISCOVERY=true` (default)
Requires key	Yes - `SERPAPI_KEY`
Runs	Post-primary; needs a name from primary modules
Skips	Automatically if no name was recovered
Status	Implemented

Enabled by default (ENABLE_EMAIL_DISCOVERY=true) and self-gating: it skips when SERPAPI_KEY is missing or no usable real name was recovered. It does not recursively investigate discovered addresses.

For up to 3 recovered names, it runs these SerpAPI dorks concurrently:

"{full_name}" "@gmail.com" OR "@outlook.com" OR "@yahoo.com" OR "@protonmail.com"
"{full_name}" "email" OR "contact" -site:linkedin.com -site:facebook.com
"{full_name}" "@" filetype:pdf OR filetype:csv
site:linkedin.com "{full_name}" "{domain}" (corporate target domains only)

Finding example:

{
  "platform": "email_discovery",
  "profile_url": "https://docs.example.com/team",
  "confidence": "high",
  "metadata": {
    "discovered_email": "jane.doe@example.net",
    "source_name": "Jane Doe",
    "source_url": "https://docs.example.com/team",
    "snippet": "contact Jane at jane.doe@example.net",
    "dork_used": "contact_terms"
  }
}

Finding fields: discovered_email, source_name, source_url, snippet, dork_used.
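The discovered_email field implies that addresses are pulled out of result snippets. A minimal sketch of that step follows. The regular expression is a pragmatic approximation, not an RFC 5322 parser, and the function name is hypothetical. It excludes the investigated address itself, because this module does not re-investigate the target.

```python
import re

# Pragmatic pattern, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_discovered_emails(snippet: str, target_email: str) -> list:
    target = target_email.strip().lower()
    found = []
    for match in EMAIL_RE.findall(snippet):
        email = match.lower()
        # Skip the investigated address and repeats.
        if email != target and email not in found:
            found.append(email)
    return found
```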

Module metadata:

{
  "names_searched": 1,
  "dorks_run": 4,
  "emails_discovered": 1,
  "discovered_emails": ["jane.doe@example.net"]
}

Metadata fields: names_searched, dorks_run, emails_discovered, discovered_emails.

`wayback`

Search the Internet Archive Wayback Machine CDX API for archived pages mentioning the email address, then enrich the top results with archived page title and nearby context.


Gate	None; always runs
Requires key	No
Status	Implemented

This module runs a bounded CDX search and fetches archived page content for only the top 5 pages to avoid hammering Wayback. A 429 from the archive returns status: partial.
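Each CDX capture carries a 14-digit timestamp (YYYYMMDDhhmmss) and the original URL. Those two values are enough to derive the profile_url, archive_date, and original_domain fields shown in the schema below. The sketch computes years_ago as whole years elapsed, which is an assumed rule, and takes `today` as a parameter so the result is reproducible.

```python
from datetime import date, datetime
from typing import Optional
from urllib.parse import urlparse


def snapshot_fields(timestamp: str, original_url: str,
                    today: Optional[date] = None) -> dict:
    captured = datetime.strptime(timestamp[:14], "%Y%m%d%H%M%S").date()
    today = today or date.today()
    # Count a year only once the capture's anniversary has passed.
    years = today.year - captured.year - (
        (today.month, today.day) < (captured.month, captured.day))
    return {
        "profile_url": f"https://web.archive.org/web/{timestamp}/{original_url}",
        "original_url": original_url,
        "archive_date": captured.isoformat(),
        "original_domain": urlparse(original_url).hostname,
        "years_ago": years,
    }
```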

Findings schema (one per archived page):

{
  "platform": "wayback_machine",
  "profile_url": "https://web.archive.org/web/20190101120000/https://example.com/contact",
  "confidence": "high",
  "metadata": {
    "original_url": "https://example.com/contact",
    "archive_date": "2019-01-01",
    "page_title": "Contact",
    "context_snippet": "...contact jane@example.com for...",
    "original_domain": "example.com",
    "years_ago": 7
  }
}

Finding fields: original_url, archive_date, page_title, context_snippet, original_domain, years_ago.

Module metadata:

{
  "pages_found": 4,
  "earliest_mention": "2019-01-01",
  "latest_mention": "2023-06-15",
  "unique_domains": ["example.com", "forum.example.net"],
  "oldest_domain": "example.com"
}

Metadata fields: pages_found, earliest_mention, latest_mention, unique_domains, oldest_domain.

`github_commits`

Search public GitHub commits for the target email as an author, plus a GitHub user search fallback for public profile emails.


Gate	None; always runs
Requires key	No (`GITHUB_TOKEN` enables commit search and raises rate limits; without it only the profile fallback runs)
Status	Implemented

GITHUB_TOKEN is required for commit author-email search. Without it the module returns status: partial and runs only the user profile search fallback. Set it with `mailaccess keys set GITHUB_TOKEN your-token`.

Unauthenticated requests are limited to 10 req/min. With GITHUB_TOKEN, the limit rises to 30 req/min.
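The two searches map onto GitHub's REST search endpoints. `author-email:` is GitHub's commit-search qualifier and `in:email` its user-search qualifier. The helper below only builds URLs and headers, and its name and structure are hypothetical. Making commit search conditional on a token mirrors the behavior described above.

```python
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://api.github.com"


def github_search_plan(email: str, token: Optional[str] = None):
    headers = {"Accept": "application/vnd.github+json"}
    urls = []
    if token:
        headers["Authorization"] = f"Bearer {token}"
        # Commit search by author email; only attempted with a token.
        urls.append(f"{API_ROOT}/search/commits?"
                    + urlencode({"q": f"author-email:{email}"}))
    # Profile fallback: users whose public profile email matches.
    urls.append(f"{API_ROOT}/search/users?" + urlencode({"q": f"{email} in:email"}))
    return urls, headers
```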

Commit finding schema:

{
  "platform": "github_commit",
  "profile_url": "https://github.com/owner/repo/commit/abc1234...",
  "confidence": "high",
  "metadata": {
    "repo": "owner/repo",
    "repo_url": "https://github.com/owner/repo",
    "commit_sha": "abc1234",
    "commit_message": "Fix authentication bug",
    "author_name": "Jane Doe",
    "commit_date": "2022-03-10T12:34:56Z",
    "repo_stars": 142,
    "repo_language": "Python"
  }
}

Finding fields: repo, repo_url, commit_sha, commit_message, author_name, commit_date, repo_stars, repo_language.

GitHub user finding schema:

{
  "platform": "github_user",
  "profile_url": "https://github.com/janedoe",
  "confidence": "high",
  "metadata": {
    "login": "janedoe",
    "name": "Jane Doe",
    "bio": "Security engineer",
    "public_repos": 24,
    "followers": 180,
    "avatar_url": "https://avatars.githubusercontent.com/u/..."
  }
}

Module metadata:

{
  "commits_found": 3,
  "repos_contributed_to": ["owner/repo"],
  "real_name_from_git": "Jane Doe",
  "earliest_commit": "2021-11-01T09:00:00Z",
  "latest_commit": "2023-04-12T18:30:00Z",
  "primary_language": "Python",
  "github_user_found": true
}

Metadata fields: commits_found, repos_contributed_to, real_name_from_git, earliest_commit, latest_commit, primary_language, github_user_found.

`domain_intel`

WHOIS registration data, DNS security signals (SPF / DMARC / MX), website presence, and optionally Shodan host data for the email's domain.


Requires key	No (Shodan lookup is added automatically when `SHODAN_API_KEY` is set)
Status	Implemented

Skips free email providers (Gmail, Outlook, ProtonMail, etc.). Their domains belong to the provider rather than the target, so querying them reveals nothing about domain ownership.

Runs four checks concurrently: WHOIS, DNS, website HTTP fetch, and (if a Shodan key is present) Shodan subdomain and port data.

Findings: One finding per check (whois, dns, website, shodan), each with platform set to the check name.

DNS finding example:

{
  "platform": "dns",
  "confidence": "high",
  "metadata": {
    "mx_records": ["aspmx.l.google.com"],
    "mx_provider": "google",
    "spf_record": "v=spf1 include:_spf.google.com ~all",
    "dmarc_record": "v=DMARC1; p=reject; rua=mailto:dmarc@example.com",
    "has_spf": true,
    "has_dmarc": true,
    "a_records": ["93.184.216.34"],
    "ns_records": ["ns1.example.com"]
  }
}

`dns_lookup`

Real DNS resolution for the email's domain: MX, SPF, DMARC, DKIM, A, and NS records. Always runs — no API key required and no opt-in flag.


Requires key	No
Status	Implemented

Findings schema:

{
  "platform": "dns_lookup",
  "confidence": "high",
  "metadata": {
    "mx_records": ["aspmx.l.google.com"],
    "mx_provider": "google",
    "spf_record": "v=spf1 include:_spf.google.com ~all",
    "dmarc_record": "v=DMARC1; p=reject; rua=mailto:dmarc@example.com",
    "dkim_record": "v=DKIM1; k=rsa; p=...",
    "has_spf": true,
    "has_dmarc": true,
    "has_dkim": true,
    "a_records": ["93.184.216.34"],
    "ns_records": ["ns1.example.com"]
  }
}
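The records above live at fixed DNS names. SPF is a TXT record at the domain itself, and DMARC is a TXT record at `_dmarc.<domain>`. DKIM is published at `<selector>._domainkey.<domain>`, where the selector is chosen by the sender, so a lookup can only probe likely selectors. The selector list below is a hypothetical guess, not the module's actual list. The parsers work on TXT strings, so they need no resolver library.

```python
from typing import Optional, Sequence

# Hypothetical selector guesses; real senders may use any selector name.
COMMON_DKIM_SELECTORS = ("default", "google", "selector1", "selector2")


def dns_query_names(domain: str,
                    selectors: Sequence[str] = COMMON_DKIM_SELECTORS) -> dict:
    return {
        "spf": domain,                # TXT at the domain itself
        "dmarc": f"_dmarc.{domain}",  # TXT at _dmarc.<domain>
        "dkim": [f"{s}._domainkey.{domain}" for s in selectors],
    }


def find_spf(txt_records: Sequence[str]) -> Optional[str]:
    # The SPF policy is the TXT record that starts with "v=spf1".
    return next((r for r in txt_records if r.lower().startswith("v=spf1")), None)


def parse_dmarc(record: str) -> Optional[dict]:
    # DMARC is a list of semicolon-separated tag=value pairs starting with v=DMARC1.
    tags = {}
    for part in record.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags if tags.get("v", "").upper() == "DMARC1" else None
```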

`whois_lookup`

Full WHOIS registration data for the email's domain. Skips free email providers (Gmail, Outlook, ProtonMail, etc.) and detects privacy-shield registrations.


Requires key	No
Status	Implemented

Supports IANA-managed domains via a raw-socket fallback to whois.iana.org. The module returns status: partial if the primary parser fails but the fallback succeeds, and status: failed only on a network error.
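A raw-socket WHOIS exchange follows RFC 3912: open TCP port 43, send the query followed by CRLF, and read until the server closes the connection. whois.iana.org replies with TLD-level data that includes a `refer:` line naming the registry's WHOIS server. The sketch follows that two-step pattern. Whether MailAccess follows the referral this way is an assumption, and the function names are hypothetical.

```python
import socket
from typing import Optional


def whois_query(query: str, server: str = "whois.iana.org",
                timeout: float = 10.0) -> str:
    # RFC 3912: send the query plus CRLF, then read until the server closes.
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_refer(iana_response: str) -> Optional[str]:
    # IANA's TLD records name the registry WHOIS server on refer:/whois: lines.
    for line in iana_response.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("refer", "whois") and value.strip():
            return value.strip()
    return None
```

In use, `parse_refer(whois_query(domain))` gives the server for a second `whois_query(domain, server=...)` call.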

Findings schema:

{
  "platform": "whois_lookup",
  "confidence": "high",
  "metadata": {
    "registrar": "Namecheap, Inc.",
    "registered": "2015-03-12",
    "expires": "2027-03-12",
    "updated": "2024-01-05",
    "name_servers": ["ns1.example.com"],
    "privacy_protected": false,
    "registrant_org": "Acme Corp",
    "registrant_country": "US"
  }
}

When the domain uses a privacy shield, privacy_protected is true and registrant fields are omitted.

`social`

Check account existence across 13 social and productivity platforms.


Requires key	No
Status	Implemented

Platforms checked: GitHub, Duolingo, Spotify, Gravatar (linked accounts), Adobe, Patreon, Snapchat, Skype / Microsoft, Zoom, Dropbox, Apple ID, LinkedIn, Discord.

Detection methods vary by platform — some use public search APIs (GitHub), others infer existence from password-reset or registration flows. Confidence is high for direct API matches and medium / low for inferred results.

Finding example:

{
  "platform": "GitHub",
  "profile_url": "https://github.com/janedoe",
  "metadata": {
    "login": "janedoe",
    "name": "Jane Doe",
    "bio": "...",
    "location": "San Francisco",
    "public_repos": 24,
    "followers": 180
  },
  "confidence": "high"
}

LinkedIn, Snapchat, and several others aggressively block automated requests. These findings carry medium or low confidence and may be absent entirely when the platform changes its behavior.

`social_links`

Derives username variations from the target email (local part, display names from prior findings) and feeds them into username_pivot. Also probes links extracted from social profile bios and cross-references them across modules.


Requires key	No
Status	Implemented

Runs in the primary phase alongside other modules. Findings are username candidates — they are not independent confirmations but signals passed to username_pivot for validation.

Module metadata:

{
  "usernames_derived": ["janedoe", "jane.doe", "jdoe"],
  "source_modules": ["gravatar", "social"],
  "links_extracted": 3
}
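A sketch of how username candidates like those in the metadata above can be derived from an email local part and display names. The specific patterns (joined, dotted, underscored, initial plus surname) and the ASCII accent folding are assumptions for illustration, not the module's confirmed rules.

```python
import re
import unicodedata


def name_tokens(display_name: str) -> list:
    # Fold accents to ASCII ("José" -> "jose") and keep alphanumeric runs.
    folded = (unicodedata.normalize("NFKD", display_name)
              .encode("ascii", "ignore").decode("ascii"))
    return re.findall(r"[a-z0-9]+", folded.lower())


def username_variations(email: str, display_names=()) -> list:
    candidates = [email.split("@", 1)[0].lower()]  # the local part
    for name in display_names:
        tokens = name_tokens(name)
        if len(tokens) >= 2:
            first, last = tokens[0], tokens[-1]
            candidates += [first + last, f"{first}.{last}",
                           f"{first}_{last}", first[0] + last]
    # Deduplicate while keeping first-seen order.
    return list(dict.fromkeys(c for c in candidates if c))
```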

`account_discovery`

Check account existence across 120+ platforms powered by Holehe.


Requires key	No
Status	Implemented

Platform coverage is dynamic — as Holehe adds new platforms upstream, this module picks them up automatically on the next install. See the Holehe repository for the current full platform list.

Enable via ENABLE_ACCOUNT_DISCOVERY=true (opt-in — runs 120+ probes, expect 30–60 s per investigation).

Finding example (account confirmed):

{
  "platform": "twitter",
  "profile_url": "https://twitter.com",
  "metadata": {
    "email_recovery": "j***@gmail.com",
    "high_value": true
  },
  "confidence": "high",
  "source": "account_discovery"
}

Findings with email_recovery or phone_hint in metadata are flagged high_value: true — these reveal partial contact details useful for cross-module correlation.
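The tally and the high_value flag can be sketched over Holehe's per-platform result dicts. The input keys used here (`name`, `domain`, `exists`, `rateLimit`, `emailrecovery`, `phoneNumber`) reflect Holehe's result format as we understand it, so verify them against the installed Holehe version. The function name and output layout are hypothetical.

```python
def summarize_holehe(results: list) -> tuple:
    findings = []
    meta = {"platforms_checked": len(results), "platforms_confirmed": 0,
            "platforms_rate_limited": 0, "platforms_not_found": 0}
    for r in results:
        if r.get("rateLimit"):
            meta["platforms_rate_limited"] += 1
            continue
        if not r.get("exists"):
            meta["platforms_not_found"] += 1
            continue
        meta["platforms_confirmed"] += 1
        details = {}
        if r.get("emailrecovery"):
            details["email_recovery"] = r["emailrecovery"]
        if r.get("phoneNumber"):
            details["phone_hint"] = r["phoneNumber"]
        # Partial recovery contacts are what make a hit high-value.
        details["high_value"] = "email_recovery" in details or "phone_hint" in details
        findings.append({
            "platform": r["name"],
            "profile_url": f"https://{r['domain']}" if r.get("domain") else None,
            "metadata": details, "confidence": "high",
            "source": "account_discovery"})
    return findings, meta
```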

Module metadata:

{
  "platforms_checked": 124,
  "platforms_confirmed": 3,
  "platforms_rate_limited": 2,
  "platforms_not_found": 119,
  "holehe_version": "1.61"
}

`whatsmyname`

Username enumeration across 700+ platforms via the WhatsMyName dataset.


Requires key	No
Status	Implemented

Opt-in (ENABLE_WHATSMYNAME=true) because the sweep fires one HTTP request per platform and takes 60–90 seconds. The dataset is fetched from GitHub on first run and cached locally at data/cache/wmn-data.json for 24 hours.
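Each WhatsMyName site entry pairs a URL template (`uri_check`, with an `{account}` placeholder) with an expected status code and string for a hit (`e_code`, `e_string`) and for a miss (`m_code`, `m_string`), per the dataset schema as we understand it. The sketch shows that classification plus a 24-hour cache check. Judging freshness by the file's modification time is an assumption. The example site entry in the test is illustrative and not copied from the dataset.

```python
import time
from pathlib import Path
from typing import Optional

CACHE_PATH = Path("data/cache/wmn-data.json")
CACHE_TTL_SECONDS = 24 * 60 * 60


def cache_is_fresh(path: Path = CACHE_PATH, now: Optional[float] = None) -> bool:
    # Assumption: freshness is judged by the cache file's modification time.
    now = time.time() if now is None else now
    return path.exists() and now - path.stat().st_mtime < CACHE_TTL_SECONDS


def check_url(site: dict, username: str) -> str:
    return site["uri_check"].replace("{account}", username)


def classify(site: dict, status: int, body: str) -> str:
    # A hit needs both the expected status code and the expected string.
    if status == site["e_code"] and site["e_string"] in body:
        return "found"
    if status == site["m_code"] and site["m_string"] in body:
        return "not_found"
    return "unknown"  # errors, captchas, or changed page layouts
```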

Finding example (account confirmed):

{
  "platform": "HackerNews",
  "profile_url": "https://news.ycombinator.com/user?id=janedoe",
  "metadata": { "category": "tech" },
  "confidence": "high"
}

Module metadata:

{
  "total_platforms_checked": 800,
  "platforms_confirmed": 4,
  "platforms_not_found": 705,
  "platforms_errored": 3,
  "wmn_version": "1.4.0"
}

`user_scanner`

Email registration probes across 205+ platforms via the user-scanner package.


Requires key	No
Status	Implemented

Opt-in (ENABLE_USER_SCANNER=true) because a full sweep can take several minutes. Give it a longer limit with a user_scanner entry in MODULE_TIMEOUT_OVERRIDES (.env.example sets it to 180s).

Finding example (account confirmed):

{
  "platform": "Instagram",
  "profile_url": "https://instagram.com",
  "metadata": {
    "category": "Social",
    "reason": "",
    "source": "user_scanner"
  },
  "confidence": "high"
}

Module metadata:

{
  "platforms_checked": 205,
  "platforms_confirmed": 4,
  "platforms_not_registered": 198,
  "user_scanner_version": "1.3.6"
}

`username_pivot`

Post-primary phase: collects up to five unique usernames from primary findings (email local-part, metadata usernames, slugified display names) and re-runs the WhatsMyName dataset for each. Skips platforms already confirmed by the whatsmyname module.


Requires key	No
Status	Implemented

Opt-in (ENABLE_USERNAME_PIVOT=true). Runs after primary modules complete and before permutation_discovery. Reuses the cached WMN dataset at data/cache/wmn-data.json.
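A sketch of the collection step: the email local part first, then metadata usernames, then display names slugified with `.` and `_`, deduplicated and capped at five. The metadata keys it reads (`username`, `login`, `display_name`, `name`) are assumptions drawn from schemas elsewhere in this reference. `sites_to_probe` shows the skip-already-confirmed rule. All names are hypothetical.

```python
import re

MAX_USERNAMES = 5  # "up to five unique usernames"


def slugify(name: str, sep: str) -> str:
    # "Katriel Moses" -> "katriel.moses" / "katriel_moses"
    return sep.join(re.findall(r"[a-z0-9]+", name.lower()))


def collect_usernames(email: str, findings: list) -> list:
    candidates = [email.split("@", 1)[0].lower()]  # email local part first
    for finding in findings:
        meta = finding.get("metadata", {})
        for key in ("username", "login"):  # assumed metadata keys
            if meta.get(key):
                candidates.append(str(meta[key]).lower())
        for key in ("display_name", "name"):  # assumed metadata keys
            if meta.get(key):
                candidates += [slugify(str(meta[key]), sep) for sep in (".", "_")]
    # Keep first-seen order, drop empties and duplicates, then cap.
    unique = list(dict.fromkeys(c for c in candidates if c))
    return unique[:MAX_USERNAMES]


def sites_to_probe(site_names: list, confirmed_by_whatsmyname: set) -> list:
    # Skip platforms the whatsmyname module already confirmed.
    return [s for s in site_names if s not in confirmed_by_whatsmyname]
```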

Finding example:

{
  "platform": "GitHub",
  "profile_url": "https://github.com/katriel_moses",
  "metadata": {
    "matched_username": "katriel_moses",
    "category": "dev",
    "source": "username_pivot"
  },
  "confidence": "medium"
}

Module metadata:

{
  "usernames_pivoted": ["katriel.moses", "katriel_moses"],
  "platforms_checked": 1600,
  "platforms_confirmed": 2,
  "wmn_version": "1.4.0"
}

`breachdirectory`

Search breach records for the target email via the BreachDirectory RapidAPI.


Requires key	Yes — `BREACHDIRECTORY_API_KEY`
Status	Implemented

One finding per unique breach source. Passwords and hashes are never stored in full — only a two-character hint (e.g. pa***) when a password field is present.

Finding example:

{
  "platform": "Collection1",
  "metadata": {
    "breach_source": "Collection1",
    "has_password_hash": true,
    "password_hint": "pa***"
  },
  "confidence": "high",
  "severity": "critical"
}

Module metadata:

{
  "total_records_found": 3,
  "sources_list": ["Collection1", "LinkedIn"],
  "has_plaintext_hashes": false
}

`hudson_rock`

Check if the email address appears in infostealer credential logs via the Hudson Rock Cavalier API.


Requires key	No
Status	Implemented

Always-on (no opt-in). The API is free and returns a 404 when the email is clean. Rate limits return status: partial.

Returns one summary finding (infection counts, stealer families) and one finding per compromised domain credential.

Finding example (clean): empty findings list, status: success.

Finding example (infected):

{
  "platform": "hudson_rock",
  "metadata": {
    "total_infections": 2,
    "stealer_families": ["RedLine", "Vidar"],
    "first_seen": "2023-04-10",
    "last_seen": "2024-01-22",
    "exposed_corporate_services": 1,
    "exposed_user_services": 4
  },
  "confidence": "high",
  "severity": "critical"
}

Per-domain findings (one per compromised service credential):

{
  "platform": "github.com",
  "url": "https://github.com",
  "metadata": {
    "source": "infostealer_log",
    "stealer_family": "RedLine",
    "date_compromised": "2023-04-10",
    "high_value": true
  },
  "confidence": "high"
}

Module metadata:

{
  "is_infostealer_victim": true,
  "total_infections": 2,
  "total_exposed_services": 5,
  "all_compromised_domains": ["github.com", "..."]
}

`permutation_discovery`

Post-primary-phase orchestrator: if any upstream module recovered a real name (from Gravatar, HIBP breach data, GHunt, etc.), generates up to 60 email permutations and probes each with HIBP and Hudson Rock to find related accounts.


Requires key	No (HIBP key enables breach-check sub-probes)
Status	Implemented

Opt-in (ENABLE_PERMUTATION_DISCOVERY=true) because it adds 30–60 seconds and up to 120 extra API calls. Skips automatically if no name was recovered. The original email address is never re-checked.
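A sketch of permutation generation under the stated limits: at most 60 candidates, and the original address is never included. The local-part patterns and the provider domain list are hypothetical. The real module's patterns and domains are not documented here.

```python
import re
from itertools import product
from typing import Optional

MAX_PERMUTATIONS = 60
# Hypothetical provider list; the real module's domains may differ.
PROVIDER_DOMAINS = ["gmail.com", "outlook.com", "yahoo.com", "protonmail.com", "hotmail.com"]


def local_parts(full_name: str) -> list:
    tokens = re.findall(r"[a-z0-9]+", full_name.lower())
    if len(tokens) < 2:
        return tokens
    f, l = tokens[0], tokens[-1]
    patterns = [f + l, f"{f}.{l}", f"{f}_{l}", f[0] + l, f"{f[0]}.{l}",
                f + l[0], l + f, f"{l}.{f}", f]
    return list(dict.fromkeys(patterns))


def permutations(full_name: str, original_email: str,
                 target_domain: Optional[str] = None) -> list:
    domains = list(PROVIDER_DOMAINS)
    if target_domain and target_domain not in domains:
        domains.append(target_domain)
    original = original_email.strip().lower()
    out = []
    for local, domain in product(local_parts(full_name), domains):
        email = f"{local}@{domain}"
        if email != original and email not in out:  # never re-check the target
            out.append(email)
        if len(out) == MAX_PERMUTATIONS:
            break
    return out
```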

Finding example (match):

{
  "platform": "permutation_match",
  "metadata": {
    "matched_email": "jane.doe@outlook.com",
    "source_module": "hibp",
    "match_type": "breach",
    "breach_count": 2
  },
  "confidence": "medium"
}

Module metadata:

{
  "names_found": ["Jane Doe"],
  "permutations_checked": 60,
  "related_emails_found": true,
  "matched_emails": ["jane.doe@outlook.com"]
}

`ghunt`

Extract deep Google account intelligence via GHunt: GAIA ID, display name, profile photo, YouTube channel, public Drive files, Maps review history, and active Google services.


Requires key	Yes — `GHUNT_CREDS_PATH` (session credentials from `ghunt login`)
Status	Implemented

Opt-in (ENABLE_GHUNT=true). Runs only against @gmail.com, @googlemail.com, and domains whose MX records route through Google (Google Workspace). All other domains are skipped immediately.
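The eligibility gate can be expressed as a domain check plus an MX-host suffix test. Google Workspace MX hosts live under `google.com` (e.g. `aspmx.l.google.com`) or `googlemail.com`. The suffix rule is a reasonable approximation, not necessarily the module's exact test, and the helper name is hypothetical. MX lookup is left to the caller.

```python
GOOGLE_CONSUMER_DOMAINS = {"gmail.com", "googlemail.com"}


def is_google_hosted(email: str, mx_hosts: list) -> bool:
    domain = email.rsplit("@", 1)[-1].strip().lower()
    if domain in GOOGLE_CONSUMER_DOMAINS:
        return True
    for host in mx_hosts:
        h = host.lower().rstrip(".")  # DNS answers may end with a root dot
        # Suffix match with a leading dot, so "evilgoogle.com" does not qualify.
        if h.endswith(".google.com") or h.endswith(".googlemail.com"):
            return True
    return False
```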

Requires the ghunt extra (`pip install "mailaccess[ghunt]"`) and a one-time `ghunt login`. See docs/ghunt-setup.md.

Finding example:

{
  "platform": "google_account",
  "profile_url": "https://plus.google.com/123456789",
  "metadata": {
    "gaia_id": "123456789",
    "display_name": "Jane Doe",
    "account_creation_date": "2011-03-15",
    "profile_photo_url": "https://lh3.googleusercontent.com/...",
    "custom_profile_photo": true,
    "youtube_channel_url": "https://www.youtube.com/channel/...",
    "maps_reviews_count": 12,
    "public_drive_files": 3,
    "google_services_active": ["YouTube", "Maps", "Drive"],
    "possible_location_hint": "London, Shoreditch"
  },
  "confidence": "high"
}

`phone_intel`

Validates phone numbers recovered from primary module findings and probes WhatsApp/Telegram registration hints.


Requires key	No
Status	Implemented

Runs in the post-primary phase when ENABLE_PHONE_INTEL=true (default) and at least one phone number was extracted from prior findings. Skips with status: skipped when no phones are found.

Phone numbers are masked in finding metadata (e.g. +1234***7890) and never stored there in full. The WhatsApp hint's profile_url is a wa.me link, however, so it contains the full number.
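The two masked values in this section (+1234***7890 and +1555***4567) both follow the pattern "first five characters of the E.164 string, then `***`, then the last four digits". The sketch implements that pattern. Whether phone_intel uses exactly this rule is an assumption, and the short-number guard is an added precaution: for numbers of nine characters or fewer, 5 + 4 visible characters would reveal everything.

```python
def mask_phone(number: str) -> str:
    n = number.strip()
    # Showing 5 + 4 characters would expose short numbers entirely.
    if len(n) <= 9:
        return "***"
    return n[:5] + "***" + n[-4:]
```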

Findings schema (validation):

{
  "platform": "phone_validation",
  "metadata": {
    "phone_number": "+1234***7890",
    "valid": true,
    "country": "United States",
    "carrier": "Verizon",
    "line_type": "mobile",
    "platform_hint": "numverify"
  },
  "confidence": "high"
}

Findings schema (WhatsApp / Telegram — experimental):

{
  "platform": "whatsapp",
  "profile_url": "https://wa.me/15551234567",
  "metadata": {
    "phone_number": "+1555***4567",
    "experimental": true,
    "platform_hint": "possible_registration"
  },
  "confidence": "low"
}

Module metadata:

{
  "phones_processed": 2,
  "phones_found": 3
}

`messaging_hints`

Best-effort Telegram username checks during the primary gather phase. Optional WhatsApp hints when phone numbers are available.


Requires key	No
Status	Implemented

Enabled by default (ENABLE_MESSAGING_HINTS=true). Rate-limited to 3 Telegram username checks per investigation. Signal has no public lookup API — noted in module metadata as signal_checkable: false.

Findings schema:

{
  "platform": "telegram",
  "profile_url": "https://t.me/username",
  "metadata": {
    "username": "jane.doe",
    "display_name": "Jane Doe",
    "photo_url": "https://...",
    "check_type": "username",
    "experimental": true
  },
  "confidence": "low"
}

Module metadata:

{
  "telegram_checks": 3,
  "whatsapp_checks": 0,
  "signal_checkable": false
}

`breach_deep`

Probes account existence on the 100 highest-ranked breached sites from the public HIBP breach corpus.


Gate	`ENABLE_BREACH_DEEP=false` (opt-in)
Requires key	None; HIBP corpus is public
Timeout	90s default; set in `MODULE_TIMEOUT_OVERRIDES`
Status	Implemented

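On startup MailAccess fetches https://haveibeenpwned.com/api/v3/breaches and caches the result for 24 hours at data/cache/breach_corpus.json. Breaches are ranked by PwnCount multiplied by a weight derived from the high-impact data classes they expose (passwords, credit cards, financial data, and phone numbers). In the finding example below, Adobe's severity_score of 457335495.0 is exactly 3 × its pwn_count of 152445165. By default the module checks the 100 highest-ranked domains.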
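A ranking sketch over the public corpus. It reads the HIBP breach-model fields `Domain`, `PwnCount`, and `DataClasses`. The weight formula `1 + 2 × (high-impact classes present)` and the exact class names are hypothetical. With these weights, Adobe's record scores 3 × PwnCount, matching the sample, but that agreement alone does not confirm the real formula.

```python
# Assumed class names and weights; the real formula is not documented here.
HIGH_IMPACT = {"Passwords", "Credit cards", "Financial data", "Phone numbers"}


def severity_score(breach: dict) -> float:
    hits = sum(1 for c in breach.get("DataClasses", []) if c in HIGH_IMPACT)
    return float(breach.get("PwnCount", 0) * (1 + 2 * hits))


def top_domains(corpus: list, limit: int = 100) -> list:
    # Breaches without a domain cannot be probed, so they are dropped.
    ranked = sorted((b for b in corpus if b.get("Domain")),
                    key=severity_score, reverse=True)
    return [(b["Domain"], severity_score(b)) for b in ranked[:limit]]
```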

Enable with ENABLE_BREACH_DEEP=true, run it once with `mailaccess investigate user@example.com --modules breach_deep`, or tune:

BREACH_DEEP_LIMIT=100
BREACH_DEEP_FULL=false

Known local YAML probes are reused first (adobe, spotify, dropbox, github, discord, linkedin, zoom, skype, apple, patreon). Other domains use bounded generic password-reset inference against the first three common reset endpoints, with an 8-second per-site timeout and concurrency capped at 30.

Out of scope: credential verification, password hash lookup, dark web queries, and breach-dataset contents. This module only infers account existence on breached domains.

Findings schema:

{
  "platform": "adobe.com",
  "url": "https://adobe.com",
  "confidence": "high",
  "severity": "critical",
  "source": "breach_deep",
  "metadata": {
    "breach_name": "Adobe",
    "breach_date": "2013-10-04",
    "pwn_count": 152445165,
    "data_classes": ["Email addresses", "Password hints", "Passwords", "Usernames"],
    "severity_label": "critical",
    "severity_score": 457335495.0,
    "probe_method": "yaml",
    "implication": "Credentials from this account may be in publicly available breach datasets"
  }
}

Finding fields: breach_name, breach_date, pwn_count, data_classes, severity_label, severity_score, probe_method, implication.

Module metadata:

{
  "sites_checked": 100,
  "sites_confirmed": 8,
  "critical_hits": 2,
  "high_hits": 6,
  "total_records_potentially_exposed": 534000000,
  "top_breach": "LinkedIn"
}

Metadata fields: sites_checked, sites_confirmed, critical_hits, high_hits, total_records_potentially_exposed, top_breach.

Note: uses YAML probes for known platforms and generic reset-flow inference for unknown domains.

`identity_graph` (built-in)

Not a standalone module — the identity graph is built automatically after all primary and post-primary modules complete. It cross-references findings by shared usernames, profile photos, display names, and breach data to produce confidence-scored identity clusters.


Requires key	No
Opt-in	No — always runs
Status	Implemented

The graph is available at:

CLI: displayed automatically (use --show-collisions to expand low-confidence clusters)
Web UI: /investigation/:id/graph
API: GET /api/report/{id}/clusters (clusters) and GET /api/report/{id}/graph (D3 nodes/links)

Cluster schema:

{
  "id": "cluster-1",
  "confidence": "high",
  "score": 0.91,
  "reasoning": "Shared username 'janedoe' across GitHub, HackerNews, and Twitter findings",
  "members": [
    {"module": "social", "platform": "GitHub", "username": "janedoe"},
    {"module": "whatsmyname", "platform": "HackerNews", "username": "janedoe"}
  ]
}
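Clustering by shared attributes is naturally transitive: if A shares a username with B and B shares a display name with C, all three belong together. A union-find pass captures that. The sketch below is an illustrative technique choice, not the identity graph's confirmed algorithm. The `display_name` and `photo_hash` keys are hypothetical field names, and confidence scoring and reasoning strings are omitted.

```python
def cluster_findings(members: list, keys=("username", "display_name", "photo_hash")) -> list:
    parent = list(range(len(members)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (key, normalized value) -> index of first member with it
    for idx, m in enumerate(members):
        for key in keys:
            value = str(m.get(key) or "").strip().lower()
            if not value:
                continue
            token = (key, value)
            if token in first_seen:
                parent[find(idx)] = find(first_seen[token])  # union
            else:
                first_seen[token] = idx

    groups = {}
    for idx in range(len(members)):
        groups.setdefault(find(idx), []).append(members[idx])
    return [g for g in groups.values() if len(g) > 1]  # singletons are not clusters
```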

Adding a Module

See CONTRIBUTING.md for the full interface contract.
