Add relay directory crawler #86

Open
alokdangre wants to merge 2 commits into saiy2k:main from alokdangre:relay-directory-crawler

Conversation

@alokdangre alokdangre commented Jun 15, 2026

Summary

Adds an end-to-end relay directory crawler for X/Twitter -> Nostr mappings. The crawler:

  • queries relays for NIP-39 kind:10011 events and legacy kind:0 identity tags
  • extracts verifiable Twitter identity claims
  • verifies proof tweets
  • optionally checks NIP-57 zappability
  • computes lightweight Web-of-Trust (WoT) and risk signals, used for ranking only
  • writes a structured JSON output for directory review

Trust model

  • NIP-39 proof verification is the only identity verification path
  • claimed-only profile links are never auto-zapped
  • WoT/NIP-85 data ranks and flags records, but never proves X account ownership
  • autoZapAllowed requires verified identity and an LNURL endpoint with allowsNostr
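
The gating rule in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the record shape (`verification.status`, `zapSupport.lnurlp`) and the function name are assumptions; only `autoZapAllowed` and `allowsNostr` appear in the description above.

```javascript
// Hypothetical record shape. Field names other than allowsNostr are assumed
// for illustration and may not match the crawler's real output.
function isAutoZapAllowedSketch(record) {
  // Only a verified proof counts as identity; "claimed" links never qualify.
  const verified = record.verification?.status === 'verified';
  // An LNURL-pay endpoint must exist and advertise NIP-57 support.
  const zappable =
    Boolean(record.zapSupport?.lnurlp) && record.zapSupport?.allowsNostr === true;
  // WoT data is deliberately not consulted: it ranks and flags, it never gates.
  return verified && zappable;
}

const example = {
  verification: { status: 'claimed' },
  zapSupport: { lnurlp: 'https://example.com/.well-known/lnurlp/alice', allowsNostr: true },
  wot: { score: 90 },
};
console.log(isAutoZapAllowedSketch(example)); // false: a high WoT score does not replace proof
```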

Tests

  • npm test -- scripts/relay-directory-crawler.test.mjs

Summary by CodeRabbit

  • New Features

    • Added a directory crawler tool (crawl:directory) for building a verified mapping of X/Twitter handles to Nostr identities with proof verification, Lightning zap support detection, and Web-of-Trust scoring.
  • Tests

    • Added test coverage for the directory crawler functionality.

coderabbitai Bot commented Jun 15, 2026

Walkthrough

A new Node.js script scripts/relay-directory-crawler.mjs is introduced that crawls Nostr relays via WebSocket to build a verified Twitter/X handle → Nostr pubkey directory. It includes CLI argument parsing, multi-strategy tweet proof verification, NIP-57 zap support checking, WoT/risk scoring, and JSON output. A companion Vitest test suite and a crawl:directory npm script are also added.
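
For readers unfamiliar with the handle and tweet-ID utilities mentioned here, the following is a rough sketch of what such normalization could look like. The `...Sketch` functions are hypothetical stand-ins, not the PR's `normalizeTwitterHandle` or `extractTweetId`; the accepted input forms and the 15-character handle limit are assumptions about the intended behavior.

```javascript
// Assumed behavior: accept "@Alice", "alice", or a twitter.com / x.com profile
// URL and return a lowercase handle, or null if it cannot be a valid handle.
function normalizeHandleSketch(input) {
  if (typeof input !== 'string') return null;
  let value = input.trim();
  value = value.replace(/^https?:\/\/(?:www\.|mobile\.)?(?:twitter|x)\.com\//i, '');
  value = value.replace(/^@/, '');
  value = value.split(/[/?#]/)[0]; // drop any trailing path, query, or fragment
  return /^[A-Za-z0-9_]{1,15}$/.test(value) ? value.toLowerCase() : null;
}

// Assumed behavior: accept a bare numeric ID or a ".../status/<id>" URL.
function extractTweetIdSketch(input) {
  if (typeof input !== 'string') return null;
  const value = input.trim();
  if (/^\d+$/.test(value)) return value;
  const match = value.match(/\/status(?:es)?\/(\d+)/);
  return match ? match[1] : null;
}

console.log(normalizeHandleSketch(' @Alice ')); // "alice"
console.log(extractTweetIdSketch('https://x.com/alice/status/1234567890?s=20')); // "1234567890"
```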

Changes

Nostr Relay Directory Crawler

| Layer | File(s) | Summary |
|---|---|---|
| CLI scaffolding and npm script entry | package.json, scripts/relay-directory-crawler.mjs | Registers the crawl:directory npm script and implements CLI argument parsing with validation, default relay/output constants, and a help/usage printer. |
| Relay querying and identity/event utilities | scripts/relay-directory-crawler.mjs, scripts/relay-directory-crawler.test.mjs | WebSocket relay querying with timeout/bounded collection, pool de-duplication, latest-replaceable-event selection, signed event validation, Twitter/X handle normalization, tweet ID extraction, pubkey-to-npub conversion, and extractDirectoryInputs; tests for normalizeTwitterHandle, extractTweetId, and extractDirectoryInputs. |
| Tweet fetching, proof verification, and zap support | scripts/relay-directory-crawler.mjs, scripts/relay-directory-crawler.test.mjs | Multi-strategy fetchTweet (X API → syndication → oEmbed), verifyCandidate (handle + npub checks), checkZapSupport (lud16→LNURLp + NIP-57 evaluation), lightningAddressToLnurlp helper, and its test. |
| WoT/risk scoring | scripts/relay-directory-crawler.mjs, scripts/relay-directory-crawler.test.mjs | Aggregates follower/report/assertion metrics from Nostr kinds 3, 1984, and 30382 into a capped wot score per directory record; includes a scenario test asserting identity fields are preserved. |
| runCrawler end-to-end orchestration and entry point | scripts/relay-directory-crawler.mjs | Orchestrates the full crawl pipeline: relay queries, candidate extraction, optional proof verification, kind:0 backfill, zap support, WoT scoring, auto-zap eligibility, structured JSON output, and the main-script entry point guard with error reporting and helper exports. |
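
The lud16 → LNURLp step and the NIP-57 check named in the zap-support row can be illustrated like this. It is a sketch under stated assumptions: the function names are hypothetical (the PR's helper is `lightningAddressToLnurlp`, whose implementation is not shown here), and the input validation is intentionally simple. The URL layout follows the Lightning Address convention (`/.well-known/lnurlp/<name>`), and the zap check follows NIP-57's `allowsNostr` / `nostrPubkey` fields.

```javascript
// Convert "name@domain" into its LNURL-pay endpoint, or null if malformed.
function lud16ToLnurlpSketch(address) {
  if (typeof address !== 'string') return null;
  const parts = address.trim().toLowerCase().split('@');
  if (parts.length !== 2) return null;
  const [name, domain] = parts;
  if (!/^[a-z0-9._-]+$/.test(name)) return null;
  if (!/^[a-z0-9.-]+(?::\d+)?$/.test(domain)) return null;
  // Onion services are reached over plain http; everything else over https.
  const scheme = domain.endsWith('.onion') ? 'http' : 'https';
  return `${scheme}://${domain}/.well-known/lnurlp/${name}`;
}

// Given the parsed JSON body from that endpoint, decide whether zaps are supported.
function supportsZapsSketch(payResponse) {
  return (
    payResponse?.allowsNostr === true &&
    /^[0-9a-f]{64}$/.test(payResponse?.nostrPubkey ?? '')
  );
}

console.log(lud16ToLnurlpSketch('Alice@Example.com'));
// "https://example.com/.well-known/lnurlp/alice"
```

Keeping the URL construction separate from the network fetch makes the conversion testable without any HTTP mocking.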

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant runCrawler
  participant queryRelayPool
  participant fetchTweet
  participant checkZapSupport
  participant computeWotScores
  participant FileSystem

  CLI->>runCrawler: parsed args
  runCrawler->>queryRelayPool: REQ kinds 10011 + 0
  queryRelayPool-->>runCrawler: de-duplicated events
  runCrawler->>runCrawler: extractDirectoryInputs → candidates, claimed
  loop each candidate
    runCrawler->>fetchTweet: proofTweetId (X API / syndication / oEmbed)
    fetchTweet-->>runCrawler: tweet text + author handle
    runCrawler->>runCrawler: verifyCandidate → verified / rejected
  end
  runCrawler->>queryRelayPool: backfill kind:0 for verified authors
  loop each verified record
    runCrawler->>checkZapSupport: lud16 → LNURLp → allowsNostr
    checkZapSupport-->>runCrawler: zapSupport result
  end
  runCrawler->>computeWotScores: directory records + all events
  computeWotScores-->>runCrawler: records with wot scores
  runCrawler->>FileSystem: write JSON output
  runCrawler-->>CLI: stdout summary
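
The "de-duplicated events" step in the diagram, together with latest-replaceable-event selection, could look roughly like the sketch below. The function names are hypothetical and the PR's implementation may differ; the tie-break on equal timestamps (lowest id wins) follows NIP-01's rule for replaceable events.

```javascript
// The same event often arrives from several relays; keep one copy per event id.
function dedupeByIdSketch(events) {
  const seen = new Map();
  for (const event of events) {
    if (!seen.has(event.id)) seen.set(event.id, event);
  }
  return [...seen.values()];
}

// For replaceable kinds (such as 0 and 10011), keep only the newest event per
// (kind, pubkey). On equal created_at, the lexically lowest id is retained.
function latestReplaceableSketch(events) {
  const latest = new Map();
  for (const event of events) {
    const key = `${event.kind}:${event.pubkey}`;
    const current = latest.get(key);
    const isNewer =
      !current ||
      event.created_at > current.created_at ||
      (event.created_at === current.created_at && event.id < current.id);
    if (isNewer) latest.set(key, event);
  }
  return [...latest.values()];
}
```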

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 Hoppity-hop through relays galore,
Sniffing out npubs and handles in store.
A tweet says "it's me!" — the rabbit checks twice,
With zaps and WoT scores to make things precise.
JSON lands softly, the burrow is complete,
The directory's verified, tidy, and neat! 🌟

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Add relay directory crawler' directly and accurately describes the main change: a new relay directory crawler implementation being added to the codebase. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/relay-directory-crawler.test.mjs (1)

102-117: 💤 Low value

Test event has invalid signature and will be skipped by computeWotScores.

The mock event uses sig: 'invalid-for-test' but computeWotScores calls isValidSignedEvent(event) on line 493 and skips events that fail signature verification. This means the follow event is never counted, so wot.followerGraphMentions will be 0.

The test still passes because it only asserts that identity fields are preserved and wot.note exists, but it doesn't actually test that follower counting works.

Consider one of the following:

  1. Removing the signature validation in computeWotScores (WoT is advisory, not security-critical)
  2. Adding a test with properly signed events (more complex)
  3. Accepting that this test only verifies field preservation, not scoring logic
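
One further possibility, not among the three options above and offered only as a sketch: let the scoring code take the validator as a parameter, so production keeps real signature checks while a test passes a stub. `countFollowersSketch` and its signature are hypothetical; the actual computeWotScores may be structured quite differently.

```javascript
// Count distinct followers (kind:3 contact lists) per target pubkey.
// isValid is injectable; the permissive default exists only to keep this
// sketch self-contained. Production code would pass a real signature check.
function countFollowersSketch(events, targetPubkeys, isValid = () => true) {
  const targets = new Set(targetPubkeys);
  const followers = new Map(); // target pubkey -> Set of follower pubkeys
  for (const event of events) {
    if (event.kind !== 3 || !isValid(event)) continue;
    for (const tag of event.tags ?? []) {
      if (tag[0] === 'p' && targets.has(tag[1])) {
        if (!followers.has(tag[1])) followers.set(tag[1], new Set());
        followers.get(tag[1]).add(event.pubkey);
      }
    }
  }
  return new Map([...followers].map(([pubkey, set]) => [pubkey, set.size]));
}

const followEvent = {
  kind: 3,
  pubkey: 'follower-pubkey',
  tags: [['p', 'target-pubkey']],
  sig: 'invalid-for-test',
};

// A strict validator drops the mock event, which mirrors the situation described above.
const strict = countFollowersSketch([followEvent], ['target-pubkey'], () => false);
console.log(strict.get('target-pubkey') ?? 0); // 0

// A test stub lets the same event exercise the counting logic.
const stubbed = countFollowersSketch([followEvent], ['target-pubkey'], () => true);
console.log(stubbed.get('target-pubkey') ?? 0); // 1
```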
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/relay-directory-crawler.test.mjs` around lines 102 - 117, The mock
follow event in the test uses an invalid signature ('invalid-for-test'), so
computeWotScores skips it when isValidSignedEvent() rejects it on line 493 and
the follow is never counted in the WoT scoring. To fix this, either remove the
signature validation check in computeWotScores, since WoT scoring is advisory
and not security-critical, or provide a properly signed event in the test.
Choose the first option if unsigned or invalid events are acceptable as advisory
input, or the second if signature validation must remain and the test should
exercise the actual scoring logic.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ac3557d2-3269-4cae-9df8-6143f6b05aeb

📥 Commits

Reviewing files that changed from the base of the PR and between dfc22e5 and 17dd08b.

📒 Files selected for processing (3)
  • package.json
  • scripts/relay-directory-crawler.mjs
  • scripts/relay-directory-crawler.test.mjs

Comment on lines +53 to +54
else if (arg === '--kind10011-limit') args.kind10011Limit = Number(next());
else if (arg === '--kind0-limit') args.kind0Limit = Number(next());

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing validation for --kind10011-limit and --kind0-limit.

These numeric arguments are parsed with Number() but not validated like timeoutMs and maxProofs. If invalid values are passed, they become NaN and propagate to the relay query's limit field.

Proposed fix
   if (!args.relays.length) throw new Error('At least one relay is required.');
   if (!Number.isFinite(args.timeoutMs) || args.timeoutMs <= 0) throw new Error('--timeout-ms must be positive.');
   if (!Number.isFinite(args.maxProofs) || args.maxProofs < 0) throw new Error('--max-proofs must be >= 0.');
+  if (!Number.isFinite(args.kind10011Limit) || args.kind10011Limit <= 0) throw new Error('--kind10011-limit must be positive.');
+  if (!Number.isFinite(args.kind0Limit) || args.kind0Limit <= 0) throw new Error('--kind0-limit must be positive.');
   return args;
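
To show why the bare `Number()` call is risky, here is a small standalone sketch. The helper `parsePositiveLimitSketch` is hypothetical and slightly stricter than the proposed fix: it also rejects non-integers, on the assumption that a relay `limit` should be a whole number.

```javascript
// Parse a CLI value that must be a positive integer, failing loudly otherwise.
function parsePositiveLimitSketch(flag, raw) {
  const value = Number(raw);
  // Number('abc') and Number(undefined) are NaN; Number('') is 0.
  // Number.isInteger rejects NaN, Infinity, and fractions in one check.
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${flag} must be a positive integer.`);
  }
  return value;
}

console.log(parsePositiveLimitSketch('--kind0-limit', '500')); // 500

for (const bad of ['abc', '', '1.5', '-3', undefined]) {
  try {
    parsePositiveLimitSketch('--kind0-limit', bad);
  } catch (err) {
    console.log(`${JSON.stringify(bad)} rejected: ${err.message}`);
  }
}
```

Validating at parse time keeps the error next to the flag that caused it, rather than surfacing later as an unexpected relay response.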
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/relay-directory-crawler.mjs` around lines 53 - 54, The argument
parsing for `--kind10011-limit` and `--kind0-limit` converts the input values
using `Number()` but does not validate them, allowing `NaN` values to propagate
into the relay query logic. Add validation checks after parsing these arguments
(similar to how `timeoutMs` and `maxProofs` are validated) to ensure the
converted numbers are valid and non-NaN before assigning them to
`args.kind10011Limit` and `args.kind0Limit`. If validation fails, either log an
error message or set a sensible default value to prevent invalid `limit` values
from being used in relay queries.
