Enhance embed URL handling and validation system #7

akshayutture-augment · 2025-11-14T23:41:41Z

Test 4

Summary by CodeRabbit

New Features
- Embed discussions on external websites with responsive iframe support
- Automatically poll RSS/ATOM feeds and import articles as topics
- Display embedded posts with author information, timestamps, and styled formatting

✏️ Tip: You can customize this high-level summary in your review settings.

Replicated from ai-code-review-evaluation/discourse-coderabbit#4

…f feedzirra

akshayutture-augment · 2025-11-14T23:44:30Z

@coderabbitai review

akshayutture-augment · 2025-11-21T22:56:00Z

@coderabbitai review

coderabbitai · 2025-11-21T22:56:05Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2025-11-21T22:56:20Z

Walkthrough

This pull request introduces a complete topic embedding feature, allowing Discourse topics to be embedded in external websites. It includes server-side retrieval via RSS feed polling, client-side iframe integration, database schema extensions for tracking embedded content, and comprehensive site configuration and localization.

Changes

Cohort / File(s)	Change Summary
Dependencies `Gemfile`	Added two new gems: `ruby-readability` and `simple-rss` with `require: false` for feed parsing and article extraction.
Frontend Assets `app/assets/javascripts/embed.js`, `app/assets/stylesheets/embed.css.scss`	Introduced embed iframe script that loads topics from `discourseUrl + "embed/best"`, listens for resize messages, and dynamically adjusts height. Added comprehensive styling for embedded post layout, headers, footers, and typography.
Embedding Controller `app/controllers/embed_controller.rb`	New controller implementing `best` action that resolves embedded topics by URL; enqueues async retrieval if topic not found; skips XHR checks; validates embeddable host via `ensure_embeddable` filter.
Embed Model `app/models/topic_embed.rb`	New model providing `import` and `import_remote` class methods to create/update embedded topics; includes `absolutize_urls` for converting relative URLs to absolute; tracks content via SHA1 hash to detect updates.
Post Model Extension `app/models/post.rb`	Added `cook_method` enum to support raw HTML rendering; modified `cook` method to bypass markdown processing when `cook_method` is `:raw_html`.
Async Jobs `app/jobs/regular/retrieve_topic.rb`, `app/jobs/scheduled/poll_feed.rb`	`RetrieveTopic` fetches single embedded topics asynchronously; `PollFeed` scheduled job (hourly) periodically fetches and imports RSS feed items with conditional throttling based on staff status.
Retrieval Logic `lib/topic_retriever.rb`	New orchestrator for remote topic retrieval with host validation, Redis-based throttling (60-second window), idempotence checks, and fallback to feed polling.
Supporting Libraries `lib/post_creator.rb`, `lib/post_revisor.rb`, `lib/tasks/disqus.thor`	Extended `PostCreator` to support `cook_method` assignment; added conditional validation skip in `PostRevisor`; migrated Disqus import to use new `TopicEmbed.import_remote` API.
Embed Views `app/views/embed/best.html.erb`, `app/views/embed/loading.html.erb`, `app/views/layouts/embed.html.erb`	Added templates for displaying embedded topics with post iteration, header/footer with site logo, auto-reload on loading view, and parent iframe height communication.
Configuration `config/routes.rb`, `config/site_settings.yml`, `config/locales/client.en.yml`, `config/locales/server.en.yml`	Added `embed/best` route; introduced four site settings (`embeddable_host`, `feed_polling_enabled`, `feed_polling_url`, `embed_by_username`); added translation keys for embedding UI labels and messages.
Database Migrations `db/migrate/20131217174004_create_topic_embeds.rb`, `db/migrate/20131219203905_add_cook_method_to_posts.rb`, `db/migrate/20131223171005_create_top_topics.rb`, `db/migrate/20131210181901_migrate_word_counts.rb`	Created `topic_embeds` table with unique index on `embed_url`; added `cook_method` column to `posts` with default 1; modified `top_topics` creation to use `force: true`; whitespace fix in word counts migration.
Test Coverage `spec/components/topic_retriever_spec.rb`, `spec/controllers/embed_controller_spec.rb`, `spec/jobs/poll_feed_spec.rb`, `spec/models/topic_embed_spec.rb`	Added comprehensive tests for retrieval host validation, throttling, controller access control, job execution guards, and topic embed import/update flows with URL absolutization and cook method validation.

Sequence Diagram(s)

sequenceDiagram
    participant External as External Site
    participant Iframe as Embed iframe
    participant Server as Discourse Server
    participant DB as Database
    participant Redis as Redis Cache

    External->>Iframe: Load embed.js
    Iframe->>Server: GET /embed/best?embed_url=...
    Server->>DB: Check TopicEmbed for URL
    alt Topic Exists
        DB-->>Server: TopicEmbed found
        Server->>DB: Load TopicView (best 5 posts)
        DB-->>Server: Posts data
    else Topic Missing
        DB-->>Server: Not found
        Server->>Redis: Check throttle key
        alt Not Throttled
            Redis-->>Server: OK (or expired)
            Server->>Server: Enqueue RetrieveTopic job
            Server-->>Iframe: Render loading view
            Note over Server: Job runs async
            Server->>Server: TopicRetriever.retrieve
            Server->>Server: Validate embeddable_host
            Server->>External: Fetch via HTTP (ruby-readability)
            External-->>Server: Article HTML
            Server->>DB: TopicEmbed.import_remote (create Post)
        else Throttled
            Redis-->>Server: Wait period active
            Server-->>Iframe: Render loading view
        end
    end
    
    Iframe-->>External: Render posts
    Iframe->>Iframe: Measure body height
    Iframe->>External: postMessage({type: 'discourse-resize', height: X})
    External->>External: Adjust iframe height

sequenceDiagram
    participant Scheduler as Scheduler
    participant Job as PollFeed Job
    participant Settings as Site Settings
    participant Feed as RSS Feed
    participant TopicEmbed as TopicEmbed
    participant DB as Database

    Scheduler->>Job: Execute (hourly)
    Job->>Settings: Check feed_polling_enabled?
    Job->>Settings: Check feed_polling_url
    Job->>Settings: Check embed_by_username
    
    alt Prerequisites Met
        Job->>Feed: Fetch RSS/ATOM feed
        Feed-->>Job: Feed items
        Job->>Job: Parse items (SimpleRSS)
        
        loop For Each Feed Item
            Job->>TopicEmbed: import(user, url, title, content)
            TopicEmbed->>DB: Check if URL exists
            alt New URL
                TopicEmbed->>DB: Create Topic + Post
                TopicEmbed->>DB: Create TopicEmbed record
            else Content Changed
                TopicEmbed->>DB: Revise existing Post
            end
            DB-->>TopicEmbed: Updated post
        end
    else Missing Configuration
        Job->>Job: Skip execution
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:

- TopicEmbed model (app/models/topic_embed.rb): Complex import logic with URL absolutization, content hashing, and transactional post creation/revision flow; verify SHA1-based change detection and edge cases around existing embeds
- TopicRetriever class (lib/topic_retriever.rb): Redis throttling logic with 60-second window and host validation; ensure idempotence and proper error handling for invalid URIs
- PollFeed scheduled job (app/jobs/scheduled/poll_feed.rb): Site setting guards and feed parsing; verify feed item normalization and handling of malformed RSS entries
- EmbedController access control (app/controllers/embed_controller.rb): ensure_embeddable filter with referer host matching; confirm X-Frame-Options header is set correctly and CSRF implications are considered
- Post cook_method integration (app/models/post.rb, lib/post_creator.rb): Verify raw HTML bypass doesn't introduce security regressions; check cook_method assignment in PostCreator initialization
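
As a reference point when verifying the SHA1-based change detection, the idea can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `EmbedRecord` struct, the `content_sha1` field name, and both helper methods are hypothetical.

```ruby
require 'digest/sha1'

# Hypothetical stand-in for a stored embed record; the field name
# content_sha1 is an assumption made for this sketch.
EmbedRecord = Struct.new(:embed_url, :content_sha1)

# Returns true when the incoming content differs from what was last stored.
def content_changed?(record, new_content)
  Digest::SHA1.hexdigest(new_content) != record.content_sha1
end

# Stores the new digest and reports whether a revision was needed.
def sync_content(record, new_content)
  return false unless content_changed?(record, new_content)
  record.content_sha1 = Digest::SHA1.hexdigest(new_content)
  true
end
```

Re-importing identical content should leave the record untouched, which is the idempotence property worth checking in the real model.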

Poem

🐰 A hop through the embedding flow,
Where topics in iframes now grow!
With polls and retrieval so fine,
Raw HTML posts now align,
Our warren of feeds starts to glow! 🌿

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title check	❓ Inconclusive	The title 'Enhance embed URL handling and validation system' is vague and does not clearly summarize the main changes; the PR introduces comprehensive embedding features including controllers, jobs, models, and views, not just URL handling and validation.	Consider a more specific title that captures the primary objective, such as 'Add topic embed feature with feed polling support' or 'Implement embed controller and topic retrieval system'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch embed-url-handling-post

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (8)

db/migrate/20131217174004_create_topic_embeds.rb (1)
4-11: Consider adding indexes on foreign key columns for query performance.

The topic_id and post_id columns are foreign keys that will likely be used in WHERE clauses and JOINs, but currently lack indexes.

Add indexes to improve query performance:
     end
 
     add_index :topic_embeds, :embed_url, unique: true
+    add_index :topic_embeds, :topic_id
+    add_index :topic_embeds, :post_id
   end
 end
Gemfile (1)
209-211: Update version constraints to pin latest stable gem versions.

The recommendation to pin gem versions is valid for security and reproducibility. However, the suggested versions should be updated based on the latest stable releases:
 # required for feed importing and embedding
-gem 'ruby-readability', require: false
-gem 'simple-rss', require: false
+gem 'ruby-readability', '~> 0.7', require: false
+gem 'simple-rss', '~> 1.3', require: false
ruby-readability latest stable is 0.7.2 (released Aug 29, 2024), so ~> 0.7 is recommended instead of ~> 0.6. simple-rss latest stable is 1.3.3 (released Apr 24, 2018), so ~> 1.3 appropriately captures this version.
spec/jobs/poll_feed_spec.rb (1)
1-40: Optional: Clean up extra blank lines for RuboCop compliance.

The test logic is solid and comprehensive. However, RuboCop flags extra blank lines that could be removed for style consistency.

Apply this diff:
 describe Jobs::PollFeed do
-
   let(:poller) { Jobs::PollFeed.new }
   
   context "execute" do
     # ... tests ...
   end
-
 end
Based on static analysis hints
spec/components/topic_retriever_spec.rb (1)
1-46: Optional: Clean up extra blank lines for RuboCop compliance.

The test suite provides excellent coverage of host validation and throttling logic. The extra blank lines flagged by RuboCop are minor style issues that can be cleaned up.

Apply this diff:
 describe TopicRetriever do
-
   let(:embed_url) { "http://eviltrout.com/2013/02/10/why-discourse-uses-emberjs.html" }
   let(:topic_retriever) { TopicRetriever.new(embed_url) }
   
   # ... tests ...
   
   context "with a valid host" do
     # ... tests ...
-
   end
-
 end
Based on static analysis hints
app/assets/javascripts/embed.js (1)
5-23: Harden container lookup and origin check in embed script

Right now comments is assumed to exist and the origin check uses a loose substring match. A couple of small tweaks would make this more robust:

Guard against a missing #discourse-comments container so the script fails gracefully instead of throwing.

Require discourseUrl to start with e.origin to avoid accidental substring matches.

Example:
-  var comments = document.getElementById('discourse-comments'),
-      iframe = document.createElement('iframe');
+  var comments = document.getElementById('discourse-comments');
+  if (!comments) { return; }
+  var iframe = document.createElement('iframe');
@@
-    if (discourseUrl.indexOf(e.origin) === -1) { return; }
+    if (!e.origin || discourseUrl.indexOf(e.origin) !== 0) { return; }
app/controllers/embed_controller.rb (1)
2-4: Update deprecated filter syntax to before_action.

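The skip_before_filter and before_filter methods were deprecated in Rails 5.0 and removed in Rails 5.1; the *_action family has been the preferred form since Rails 4.0. Use skip_before_action and before_action instead. Note that the *_action methods do not exist on Rails 3.2, so this change only applies when the app runs on Rails 4 or later.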

Apply this diff to update the syntax:
-  skip_before_filter :check_xhr
-  skip_before_filter :preload_json
-  before_filter :ensure_embeddable
+  skip_before_action :check_xhr
+  skip_before_action :preload_json
+  before_action :ensure_embeddable
lib/topic_retriever.rb (1)

49-49: Optional: downcase is redundant only if the setting is already stored lowercase.

Because the username_lower column stores lowercased usernames, the lookup value has to be lowercase as well. The downcase call on SiteSetting.embed_by_username is therefore redundant only if the setting is guaranteed to be saved in lowercase; keeping it is a safe defensive choice.
app/models/topic_embed.rb (1)
11-11: Strengthen URL validation.

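The regex /^https?\:\/\// only checks for an http:// or https:// prefix, and because ^ in Ruby matches at the start of any line (not just the start of the string), a multi-line value can pass as long as one of its lines begins with that prefix. It also doesn't validate URL structure or reject malformed URLs. Consider using URI parsing for more robust validation.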
-    return unless url =~ /^https?\:\/\//
+    begin
+      uri = URI.parse(url)
+      return unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
+    rescue URI::InvalidURIError
+      return
+    end
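
To make the anchoring concern concrete, here is a small standalone sketch. The helper name `http_url?` is hypothetical and not part of the PR; it combines a whole-string anchored regex with URI parsing and a host check.

```ruby
require 'uri'

# Returns true only for a single-line http(s) URL with a non-empty host.
# http_url? is a hypothetical helper name used for this sketch.
def http_url?(url)
  return false unless url.is_a?(String)
  # \A and \z anchor the whole string; ^ would only anchor a line.
  return false unless url =~ /\Ahttps?:\/\/\S+\z/
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so one check covers both.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

multi_line = "javascript:alert(1)\nhttp://example.com"
puts(multi_line =~ /^https?:\/\// ? "loose regex accepts it" : "loose regex rejects it")
puts(http_url?(multi_line) ? "strict check accepts it" : "strict check rejects it")
```

The loose pattern accepts the two-line value because its second line starts with http://, while the anchored check rejects it.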

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62db063 and 4f8aed2.

⛔ Files ignored due to path filters (1)

Gemfile_rails4.lock is excluded by !**/*.lock

📒 Files selected for processing (27)

Gemfile (1 hunks)
app/assets/javascripts/embed.js (1 hunks)
app/assets/stylesheets/embed.css.scss (1 hunks)
app/controllers/embed_controller.rb (1 hunks)
app/jobs/regular/retrieve_topic.rb (1 hunks)
app/jobs/scheduled/poll_feed.rb (1 hunks)
app/models/post.rb (2 hunks)
app/models/topic_embed.rb (1 hunks)
app/views/embed/best.html.erb (1 hunks)
app/views/embed/loading.html.erb (1 hunks)
app/views/layouts/embed.html.erb (1 hunks)
config/locales/client.en.yml (1 hunks)
config/locales/server.en.yml (2 hunks)
config/routes.rb (1 hunks)
config/site_settings.yml (1 hunks)
db/migrate/20131210181901_migrate_word_counts.rb (2 hunks)
db/migrate/20131217174004_create_topic_embeds.rb (1 hunks)
db/migrate/20131219203905_add_cook_method_to_posts.rb (1 hunks)
db/migrate/20131223171005_create_top_topics.rb (1 hunks)
lib/post_creator.rb (1 hunks)
lib/post_revisor.rb (1 hunks)
lib/tasks/disqus.thor (1 hunks)
lib/topic_retriever.rb (1 hunks)
spec/components/topic_retriever_spec.rb (1 hunks)
spec/controllers/embed_controller_spec.rb (1 hunks)
spec/jobs/poll_feed_spec.rb (1 hunks)
spec/models/topic_embed_spec.rb (1 hunks)

🧰 Additional context used

🪛 Brakeman (7.1.1)

app/jobs/scheduled/poll_feed.rb

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

config/routes.rb

[medium] Rails 3.2.12 with globbing routes is vulnerable to directory traversal and remote code execution. Patch or upgrade to Rails 3.2.18
Type: Remote Code Execution
Confidence: Medium
More info: http://matasano.com/research/AnatomyOfRailsVuln-CVE-2014-0130.pdf

(Remote Code Execution)

🪛 RuboCop (1.81.7)

spec/controllers/embed_controller_spec.rb

[convention] 4-5: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)

[convention] 30-31: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)

[convention] 54-55: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

[convention] 57-58: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

spec/jobs/poll_feed_spec.rb

[convention] 5-6: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)

[convention] 39-40: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

app/jobs/regular/retrieve_topic.rb

[convention] 5-6: Extra empty line detected at module body beginning.

(Layout/EmptyLinesAroundModuleBody)

[convention] 21-22: Extra empty line detected at module body end.

(Layout/EmptyLinesAroundModuleBody)

spec/components/topic_retriever_spec.rb

[convention] 5-6: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)

[convention] 43-44: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

[convention] 45-46: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

spec/models/topic_embed_spec.rb

[convention] 4-5: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)

[convention] 12-13: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)

[convention] 43-44: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

[convention] 45-46: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

[convention] 47-48: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

app/models/topic_embed.rb

[convention] 10-42: Assignment Branch Condition size for import is too high. [<7, 23, 6> 24.78/23]

(Metrics/AbcSize)

[convention] 56-76: Assignment Branch Condition size for absolutize_urls is too high. [<9, 23, 10> 26.65/23]

(Metrics/AbcSize)

[convention] 56-76: Cyclomatic complexity for absolutize_urls is too high. [9/7]

(Metrics/CyclomaticComplexity)

[convention] 56-76: Perceived complexity for absolutize_urls is too high. [9/8]

(Metrics/PerceivedComplexity)

🔇 Additional comments (19)

config/routes.rb (1)

245-245: LGTM! Route addition is correct.

The new embed route is properly structured and follows existing route patterns. Verification confirms that CVE-2014-0130 does affect the codebase (Rails 3.2.12 requires upgrade to 3.2.18+), but this specific route is safe as it doesn't use the vulnerable globbing pattern with :action. The infrastructure upgrade recommendation is valid but appropriately noted as beyond this PR's scope.

lib/post_creator.rb (1)

216-216: LGTM!

The conditional assignment of cook_method from options follows the established pattern for other attributes in this method and correctly integrates with the new cook_methods functionality introduced in the Post model.

app/views/embed/loading.html.erb (1)

1-12: LGTM!

The loading view is well-structured with a clear purpose. The 30-second auto-reload provides a reasonable interval for topic retrieval to complete, and the header with logo maintains consistent branding during the loading state.

lib/post_revisor.rb (1)

85-85: LGTM!

The conditional validation addition maintains consistency with the PostCreator class (line 232 of lib/post_creator.rb) and provides necessary flexibility for import flows that may need to bypass validations.

app/models/post.rb (2)

63-65: LGTM!

The cook_methods enum follows the established pattern used by types and hidden_reasons, and correctly aligns with the migration default value (1 = :regular).

130-137: LGTM!

The cooking logic correctly short-circuits for raw_html content while maintaining the existing cooking pipeline for regular posts. The comment clearly explains the use case for RSS/embed imports.

db/migrate/20131219203905_add_cook_method_to_posts.rb (1)

1-5: LGTM!

The migration correctly adds the cook_method column with appropriate defaults and constraints. The default value of 1 corresponds to :regular in the Post.cook_methods enum, ensuring existing posts maintain their current cooking behavior.

app/views/layouts/embed.html.erb (1)

6-15: The referer is already validated server-side—the main concern in this review is addressed.

The ensure_embeddable filter in app/controllers/embed_controller.rb (line 26) validates that request.referer matches SiteSetting.embeddable_host before rendering the template. Invalid referers raise an exception (lines 29-30). Since validation occurs before the template renders, the referer passed to postMessage is safe.

The parent check improvement (if (parent !== window)) remains valid defensive coding, though it's a minor enhancement since postMessage to the same window is benign.

Likely an incorrect or invalid review comment.

app/assets/stylesheets/embed.css.scss (1)

4-69: Embed layout styles look consistent and scoped

The CSS cleanly scopes embed layout (post, header, footer, logo, buttons) and matches the markup in the embed views; no functional issues stand out.

lib/tasks/disqus.thor (1)

143-178: Disqus import now correctly routes through TopicEmbed.import_remote

Routing the thread root creation via TopicEmbed.import_remote(user, t[:link], title: t[:title]) keeps Disqus imports consistent with the new embed URL handling/validation logic while still attaching imported replies via PostCreator. The flow looks coherent.

app/jobs/regular/retrieve_topic.rb (1)

9-18: RetrieveTopic job wiring looks correct

The job validates embed_url, optionally looks up a user, and passes a no_throttle flag based on user.try(:staff?) into TopicRetriever. That’s a clean separation of concerns for async retrieval.

spec/controllers/embed_controller_spec.rb (1)

5-52: EmbedController specs cover key embedding flows well

The tests exercise error paths (missing URL/host/referer), success behavior including X-Frame-Options, and both branches of topic_id_for_embed (nil vs present). This gives good coverage of the new embed controller behavior.

spec/models/topic_embed_spec.rb (1)

5-42: TopicEmbed.import specs accurately capture expected behavior

The tests nicely pin down associations, validations, malformed URL handling, raw_html cooking, URL absolutization, and idempotent updates on re-import. That should make future changes to TopicEmbed.import much safer.

app/controllers/embed_controller.rb (1)

24-31: Security: Verify that X-Frame-Options: ALLOWALL meets your requirements.

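Setting X-Frame-Options to ALLOWALL permits this content to be embedded in any iframe. ALLOWALL is not a standardized value for this header; browsers ignore values they do not recognize, so the practical effect is the same as sending no restriction at all. That is necessary for the embed feature but also increases the attack surface for clickjacking. Ensure this is the intended behavior and that the embeddable_host validation provides sufficient protection.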

Note: The Referer header can be omitted or spoofed by clients, so this check provides defense-in-depth but should not be the sole security mechanism.

lib/topic_retriever.rb (2)

14-19: LGTM: Robust host validation.

The host validation correctly handles invalid URIs and compares against the configured embeddable host.

40-42: Verify synchronous feed polling behavior.

The Jobs::PollFeed.new.execute({}) call executes the feed polling job synchronously, which could block the retrieval process if feed polling takes a long time. Verify that this blocking behavior is intentional and acceptable for your use case.

If this causes timeouts or delays, consider:
- Using asynchronous job enqueueing instead
- Adding a timeout mechanism
- Checking if the poll job has run recently before executing
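
The last option could look roughly like the sketch below. It is an in-memory stand-in rather than the Redis-backed approach the PR uses elsewhere; the class name `RecentRunGuard`, the 60-second default, and the injectable clock are all assumptions made for illustration.

```ruby
# In-memory stand-in for a "has this run recently?" check. The class name,
# the 60-second default, and the injectable clock are assumptions.
class RecentRunGuard
  def initialize(window_seconds = 60, clock = -> { Time.now.to_f })
    @window = window_seconds
    @clock = clock
    @last_run = {}
  end

  # Yields and returns true if the key has not run within the window;
  # returns false without yielding otherwise.
  def run_once(key)
    now = @clock.call
    last = @last_run[key]
    return false if last && (now - last) < @window
    @last_run[key] = now
    yield if block_given?
    true
  end
end
```

In a multi-process deployment the timestamp would need to live in shared storage (for example a Redis key with an expiry) for the guard to be effective.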

app/models/topic_embed.rb (3)

22-22: Verify security implications of raw_html cook method with skip_validations.

Using cook_method: Post.cook_methods[:raw_html] with skip_validations: true bypasses Discourse's content sanitization and validation. While necessary for importing external content, ensure that:
- Content from absolutize_urls is properly sanitized
- Only trusted users can trigger imports
- The embeddable_host validation provides sufficient access control

56-76: LGTM: URL absolutization logic is sound.

The absolutize_urls method correctly:
- Constructs the base URL with proper scheme, host, and port handling
- Converts relative anchor href and image src attributes to absolute URLs
- Handles leading slash removal to prevent double slashes

The RuboCop complexity warnings are acceptable given the necessity of this logic.
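
For comparison, resolving a single link with the standard library's `URI.join` is sketched below. This is an alternative technique, not the prefix-building approach the PR implements, and the `absolutize` helper name is hypothetical. Unlike a fixed scheme-and-host prefix, `URI.join` also resolves document-relative paths against the page's directory.

```ruby
require 'uri'

# Resolves a possibly relative link against the page it appeared on.
# Returns the original string when it cannot be parsed.
# absolutize is a hypothetical helper name used for this sketch.
def absolutize(page_url, link)
  URI.join(page_url, link).to_s
rescue URI::InvalidURIError, URI::BadURIError
  link
end

page = "http://eviltrout.com/2013/02/10/why-discourse-uses-emberjs.html"
puts absolutize(page, "/images/wat.png")
puts absolutize(page, "pic.png")
```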

78-80: LGTM: Efficient topic ID lookup.

Using pluck(:topic_id).first is an efficient way to retrieve a single column value.

coderabbitai · 2025-11-21T23:01:47Z

app/controllers/embed_controller.rb

+  layout 'embed'
+
+  def best
+    embed_url = params.require(:embed_url)


🛠️ Refactor suggestion | 🟠 Major

Validate embed_url parameter format.

The embed_url parameter is required but not validated for URL format. This could allow malformed or non-HTTP(S) URLs to be processed, potentially causing issues in downstream logic.

Consider adding URL format validation:

-    embed_url = params.require(:embed_url)
+    embed_url = params.require(:embed_url)
+    unless embed_url =~ /\A#{URI::regexp(['http', 'https'])}\z/
+      raise ActionController::BadRequest, 'Invalid embed_url format'
+    end

🤖 Prompt for AI Agents

In app/controllers/embed_controller.rb around line 9, the required embed_url param is not validated for URL format; update the action to parse and validate embed_url after params.require(:embed_url) by attempting to build a URI (or use URI.parse) and confirm it has an http or https scheme and a non-empty host, rescuing parse errors; if validation fails, return a 400/422 response with a clear error message and do not proceed with downstream processing.

coderabbitai · 2025-11-21T23:01:47Z

app/jobs/scheduled/poll_feed.rb

+    def poll_feed
+      user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
+      return if user.blank?
+
+      require 'simple-rss'
+      rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
+
+      rss.items.each do |i|
+        url = i.link
+        url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+        content = CGI.unescapeHTML(i.content.scrub)
+        TopicEmbed.import(user, url, i.title, content)
+      end


⚠️ Potential issue | 🟠 Major

Guard against missing content in RSS items to avoid job crashes

Many feeds don’t populate a content field on items; in those cases i.content will be nil, and i.content.scrub will raise a NoMethodError, causing the whole PollFeed run to fail.

Consider falling back to description/summary and always calling scrub on a string:

-      rss.items.each do |i|
-        url = i.link
-        url = i.id if url.blank? || url !~ /^https?\:\/\//
-
-        content = CGI.unescapeHTML(i.content.scrub)
-        TopicEmbed.import(user, url, i.title, content)
-      end
+      rss.items.each do |i|
+        url = i.link
+        url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+        raw_content = i.content || i.description || i.summary || ""
+        content = CGI.unescapeHTML(raw_content.to_s.scrub)
+        TopicEmbed.import(user, url, i.title, content)
+      end

(Separately, if you expect untrusted or variable feed_polling_url values, adding basic scheme/host checks and network timeouts around open would further harden this job.)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    def poll_feed
-      user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
-      return if user.blank?
-
-      require 'simple-rss'
-      rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
-
-      rss.items.each do |i|
-        url = i.link
-        url = i.id if url.blank? || url !~ /^https?\:\/\//
-
-        content = CGI.unescapeHTML(i.content.scrub)
-        TopicEmbed.import(user, url, i.title, content)
-      end
+    def poll_feed
+      user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
+      return if user.blank?
+
+      require 'simple-rss'
+      rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
+
+      rss.items.each do |i|
+        url = i.link
+        url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+        raw_content = i.content || i.description || i.summary || ""
+        content = CGI.unescapeHTML(raw_content.to_s.scrub)
+        TopicEmbed.import(user, url, i.title, content)
+      end

🧰 Tools

🪛 Brakeman (7.1.1)

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

🤖 Prompt for AI Agents

In app/jobs/scheduled/poll_feed.rb around lines 24 to 37, the code assumes i.content is present and calls i.content.scrub which will raise NoMethodError when content is nil; update the logic to guard against nil by deriving a string first (e.g., content_str = (i.content || i.description || i.summary).to_s), then call scrub and CGI.unescapeHTML on that string before passing to TopicEmbed.import, so you always call scrub on a String and fall back to description/summary when content is missing.
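
The fallback can be exercised outside Rails with a small sketch. `FeedItem` is a hypothetical stand-in for a parsed feed item and `item_html` is an illustrative helper name. One deliberate difference from the suggestion above: blank strings are skipped as well as nil, since an empty content field is truthy in Ruby and would otherwise win over a populated description.

```ruby
require 'cgi'

# Hypothetical stand-in for a parsed feed item; this Struct exists only
# for the sketch and is not the SimpleRSS item class.
FeedItem = Struct.new(:content, :description, :summary)

# Picks the first non-blank body field and returns unescaped, scrubbed HTML.
def item_html(item)
  raw = [item.content, item.description, item.summary]
        .find { |value| !value.to_s.strip.empty? }
  # to_s turns a nil result into "", so scrub is always called on a String.
  CGI.unescapeHTML(raw.to_s.scrub)
end
```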

coderabbitai · 2025-11-21T23:01:48Z

app/models/topic_embed.rb

+  def self.import(user, url, title, contents)
+    return unless url =~ /^https?\:\/\//
+
+    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"


⚠️ Potential issue | 🔴 Critical

Critical: Potential XSS vulnerability in URL interpolation.

The URL is directly interpolated into HTML without escaping, which could allow XSS attacks if the URL contains malicious characters or JavaScript. Even though the content uses cook_method: :raw_html, user-controlled URLs should be escaped.

Apply proper HTML escaping:

-    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+    require 'cgi'
+    escaped_url = CGI.escapeHTML(url)
+    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"

📝 Committable suggestion

Suggested change

-    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+    require 'cgi'
+    escaped_url = CGI.escapeHTML(url)
+    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"

🤖 Prompt for AI Agents

In app/models/topic_embed.rb around line 13, the URL is interpolated into raw HTML which allows XSS; escape the URL before constructing the anchor tag and ensure both the href and the link text are HTML-escaped (e.g. via ERB::Util.html_escape or Rails’ h helper) and then mark the final string as safe HTML if needed for cook_method: :raw_html. Replace the direct interpolation with code that escapes url for attribute and content, composes the <a> tag from those escaped values, and only then returns the HTML-safe string.
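
A quick way to see what the escaping buys is the standalone sketch below. The helper name `imported_from_link` is hypothetical, and the I18n wrapper and surrounding markup are left out so the snippet runs without Rails.

```ruby
require 'cgi'

# Builds the "imported from" anchor with the URL escaped for both the
# attribute value and the visible link text.
# imported_from_link is a hypothetical helper name used for this sketch.
def imported_from_link(url)
  escaped = CGI.escapeHTML(url)
  "<a href='#{escaped}'>#{escaped}</a>"
end

puts imported_from_link("http://example.com/?a=1&b='><script>alert(1)</script>")
```

Because CGI.escapeHTML also escapes single quotes, a quote in the URL cannot close the single-quoted href attribute.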

coderabbitai · 2025-11-21T23:01:48Z

app/models/topic_embed.rb

+  def self.import_remote(user, url, opts=nil)
+    require 'ruby-readability'
+
+    opts = opts || {}
+    doc = Readability::Document.new(open(url).read,
+                                        tags: %w[div p code pre h1 h2 h3 b em i strong a img],
+                                        attributes: %w[href src])
+
+    TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
+  end


⚠️ Potential issue | 🔴 Critical

Critical: SSRF vulnerability using open(url).read.

Using Ruby's open(url).read to fetch remote content is a severe security vulnerability. Kernel#open accepts more than HTTP URLs (open-uri only adds URL handling on top of it), so with an attacker-influenced argument it can:
- Read local files when given a filesystem path instead of a URL
- Execute commands via pipe syntax (|command)
- Access internal network resources
- Bypass firewalls and access controls

This allows attackers to:
- Read sensitive files from the server filesystem
- Access internal services not exposed to the internet
- Execute arbitrary commands (in some configurations)

Replace with a safe HTTP client and validate the URL scheme:

   def self.import_remote(user, url, opts=nil)
     require 'ruby-readability'
+    require 'open-uri'
+
+    # Validate URL scheme
+    uri = URI.parse(url)
+    unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
+      raise ArgumentError, 'Only HTTP(S) URLs are allowed'
+    end

     opts = opts || {}
-    doc = Readability::Document.new(open(url).read,
+    # Use OpenURI with safe options or better, use Net::HTTP directly
+    content = URI.open(url, redirect: true, read_timeout: 10).read
+    doc = Readability::Document.new(content,
                                         tags: %w[div p code pre h1 h2 h3 b em i strong a img],
                                         attributes: %w[href src])

     TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
   end

Better yet, use a dedicated HTTP client library with proper timeouts and safeguards:

require 'net/http'
uri = URI.parse(url)
raise ArgumentError, 'Only HTTP(S) URLs are allowed' unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
# Add additional checks to prevent access to internal networks
if uri.host.match?(/^(127\.|10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/)
  raise ArgumentError, 'Access to internal networks not allowed'
end
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 5
http.read_timeout = 10
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
content = response.body

🤖 Prompt for AI Agents

In app/models/topic_embed.rb around lines 44 to 53, the use of open(url).read introduces a critical SSRF and command/file access risk; replace it by parsing and validating the URL (allow only http/https), resolve the host and block private/internal IP ranges, use a dedicated HTTP client with connection and read timeouts and redirect limits, perform the GET request and check for a successful 200 response before passing the body to Readability, and handle network exceptions by raising or logging a clear error instead of calling open directly.
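One possible shape for the validation steps listed above is sketched here with the standard library only. The names `BLOCKED_RANGES`, `internal_address?`, `validate_remote_url`, and `fetch_remote_body`, the injectable `resolver:` parameter, and the exact list of blocked ranges are all assumptions made for illustration, not code from the pull request. The sketch checks resolved addresses rather than matching the hostname string, and it does not pin the connection to the validated address, so DNS rebinding would need separate handling.

```ruby
require 'uri'
require 'ipaddr'
require 'resolv'
require 'net/http'

# Assumed block list: loopback, private, and link-local ranges (IPv4 and IPv6).
BLOCKED_RANGES = %w[
  0.0.0.0/8 10.0.0.0/8 127.0.0.0/8 169.254.0.0/16 172.16.0.0/12 192.168.0.0/16
  ::1/128 fc00::/7 fe80::/10
].map { |cidr| IPAddr.new(cidr) }

def internal_address?(ip_string)
  ip = IPAddr.new(ip_string)
  BLOCKED_RANGES.any? { |range| range.include?(ip) }
end

# Returns the parsed URI or raises ArgumentError. The resolver is injectable
# so the check can be exercised without real DNS lookups.
def validate_remote_url(url, resolver: ->(host) { Resolv.getaddresses(host) })
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so one check covers both schemes.
  unless uri.is_a?(URI::HTTP) && uri.hostname && !uri.hostname.empty?
    raise ArgumentError, 'Only HTTP(S) URLs are allowed'
  end

  addresses = resolver.call(uri.hostname)
  raise ArgumentError, 'Host did not resolve' if addresses.empty?
  if addresses.any? { |address| internal_address?(address) }
    raise ArgumentError, 'Access to internal networks not allowed'
  end

  uri
end

# Fetches the body with timeouts; Net::HTTP does not follow redirects here.
def fetch_remote_body(url)
  uri = validate_remote_url(url)
  response = Net::HTTP.start(uri.hostname, uri.port,
                             use_ssl: uri.scheme == 'https',
                             open_timeout: 5, read_timeout: 10) do |http|
    http.get(uri.request_uri)
  end
  raise "Unexpected response #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end
```

Under these assumptions, `import_remote` would pass `fetch_remote_body(url)` to `Readability::Document.new` in place of `open(url).read`.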

coderabbitai · 2025-11-21T23:01:48Z

app/views/embed/best.html.erb

+  <%- if @topic_view.posts.present? %>
+    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- else %>
+    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- end if %>


⚠️ Potential issue | 🔴 Critical

Fix invalid ERB block terminator (<%- end if %>)

<%- end if %> is not valid ERB/Ruby syntax and will cause the template to fail to render. The if should only appear on the opening tag; the closing tag must be just end:

-  <%- if @topic_view.posts.present? %>
-    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
-  <%- else %>
-    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
-  <%- end if %>
+  <%- if @topic_view.posts.present? %>
+    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- else %>
+    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- end %>

(Optionally, you might also add a short alt attribute to the avatar <img> for accessibility, but that’s non-blocking.)

🤖 Prompt for AI Agents

In app/views/embed/best.html.erb around lines 2 to 6 the ERB closing tag is invalid (`<%- end if %>`); replace it with a proper ERB block terminator (`<%- end %>`) so the conditional closes correctly, keep the existing opening `<%- if @topic_view.posts.present? %>` and associated link_to branches unchanged, and optionally add a short alt attribute to any avatar <img> in this template for accessibility.
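To check the corrected conditional outside Rails, the following sketch renders an equivalent template with the standard-library ERB class. The template text, the button labels, and the `render_embed_button` helper are placeholders invented for this example; the real view uses `link_to` and `I18n.t`, which are not available here.

```ruby
require 'erb'

# Stand-in for the view's conditional, closed with a plain `end`.
# Button labels are placeholders, not the actual translations.
EMBED_BUTTON_TEMPLATE = <<~'ERB_SRC'
  <%- if posts_present %>
  <a class="button" href="<%= topic_url %>" target="_blank">Continue Discussion</a>
  <%- else %>
  <a class="button" href="<%= topic_url %>" target="_blank">Start Discussion</a>
  <%- end %>
ERB_SRC

# Hypothetical helper: trim_mode '-' enables the `<%-` tag form used in the view.
def render_embed_button(posts_present:, topic_url:)
  ERB.new(EMBED_BUTTON_TEMPLATE, trim_mode: '-')
     .result_with_hash(posts_present: posts_present, topic_url: topic_url)
end

puts render_embed_button(posts_present: true, topic_url: 'https://forum.example/t/1')
```

Each call renders exactly one of the two links, depending on whether posts are present.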

coderabbitai · 2025-11-21T23:01:48Z

db/migrate/20131217174004_create_topic_embeds.rb

@@ -0,0 +1,13 @@
+class CreateTopicEmbeds < ActiveRecord::Migration
+  def change
+    create_table :topic_embeds, force: true do |t|


⚠️ Potential issue | 🟠 Major

Verify the necessity of force: true to prevent unintended data loss.

Using force: true will drop the existing topic_embeds table before recreating it, which can cause data loss if the table already exists with data.

Consider removing force: true unless you're certain the table should be dropped:

-    create_table :topic_embeds, force: true do |t|
+    create_table :topic_embeds do |t|

If this is an initial migration for a new feature, force: true may be acceptable during development, but should be removed before merging to production.

🤖 Prompt for AI Agents

In db/migrate/20131217174004_create_topic_embeds.rb around line 3, the migration uses create_table :topic_embeds, force: true which will drop and recreate the table and can cause unintended data loss; remove the force: true option (or replace with if_not_exists: true if supported) so the migration only creates the table when absent, and only keep force: true temporarily during local development if you intentionally want destructive behavior.

coderabbitai · 2025-11-21T23:01:48Z

db/migrate/20131223171005_create_top_topics.rb

 class CreateTopTopics < ActiveRecord::Migration
  def change
-    create_table :top_topics do |t|
+    create_table :top_topics, force: true do |t|


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify the necessity of force: true to prevent unintended data loss.

Using force: true will drop the existing top_topics table before recreating it, which can cause data loss if the table already exists with data.

This option should only be used during initial development or when you're certain the table should be dropped. For production migrations, consider removing force: true or using a more explicit approach:

-    create_table :top_topics, force: true do |t|
+    drop_table :top_topics if table_exists?(:top_topics)
+    create_table :top_topics do |t|

Or simply:

-    create_table :top_topics, force: true do |t|
+    create_table :top_topics do |t|

Verify whether this table is expected to exist already and whether data loss is acceptable:

🏁 Script executed:

#!/bin/bash
# Check if there are other references to top_topics table in migrations
rg -n "top_topics" --type=rb -g "db/migrate/*.rb"

Length of output: 26

I'll run a corrected verification script to check for references to the top_topics table in migrations.

🏁 Script executed:

#!/bin/bash
# Search for top_topics references in migration files (without file type spec)
echo "=== Searching for top_topics in db/migrate directory ==="
rg -n "top_topics" db/migrate/
echo -e "\n=== Checking the specific migration file ==="
cat db/migrate/20131223171005_create_top_topics.rb

Length of output: 1120

Remove force: true from the initial table creation migration.

This is the initial top_topics table creation (migration 20131223171005), yet it includes force: true, which will drop any existing table before recreating it. While the table shouldn't exist on first run, this option creates unnecessary risk if migrations are re-run (e.g., during database resets or redeployment), causing data loss. A subsequent migration (20131227164338) adds columns to this table, confirming the expected table persistence.

Simply remove force: true:

-    create_table :top_topics, force: true do |t|
+    create_table :top_topics do |t|

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

create_table :top_topics, force: true do |t|

create_table :top_topics do |t|

🤖 Prompt for AI Agents

In db/migrate/20131223171005_create_top_topics.rb around line 3, remove the force: true option from the create_table call so it reads create_table :top_topics do |t|; this prevents the migration from dropping an existing table, keep the rest of the block unchanged and run the migration suite to verify no schema conflicts.

coderabbitai · 2025-11-21T23:01:48Z

lib/topic_retriever.rb

+      if $redis.setnx(retrieved_key, "1")
+        $redis.expire(retrieved_key, 60)


⚠️ Potential issue | 🔴 Critical

Critical: Non-atomic Redis operations create race condition.

The setnx followed by expire is not atomic. If the process crashes or is interrupted between these two calls, the Redis key will never expire, causing permanent throttling for that embed URL. Additionally, there's a small window where another process could check the key before the TTL is set.

Use the atomic SET command with EX option:

-      if $redis.setnx(retrieved_key, "1")
-        $redis.expire(retrieved_key, 60)
+      if $redis.set(retrieved_key, "1", ex: 60, nx: true)
         return false
       end

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if $redis.setnx(retrieved_key, "1")

$redis.expire(retrieved_key, 60)

if $redis.set(retrieved_key, "1", ex: 60, nx: true)

return false

end

🤖 Prompt for AI Agents

In lib/topic_retriever.rb around lines 27-28, the current two-step use of setnx followed by expire is non-atomic and risks leaving the key without a TTL; replace those two calls with a single atomic Redis SET using NX and EX (i.e., set the key only if not exists and set the TTL in the same command) and branch on the SET return value so you only proceed when the key was actually created; if your Redis client doesn't support high-level options, invoke the equivalent raw SET command with NX and EX to achieve atomicity.
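The atomic claim described above might look like the sketch below. The `retrieved_recently?` signature, the `"retrieved:"` key prefix, and the `FakeRedis` class are assumptions for illustration: the fake is an in-memory stand-in that mimics only the `set(key, value, ex:, nx:)` call shape of the redis-rb client, so the example runs without a Redis server.

```ruby
# In-memory stand-in for a Redis client (illustration only). It mimics
# `set` with the `ex:` (TTL in seconds) and `nx:` (only if absent) options.
class FakeRedis
  def initialize
    @store = {}
  end

  def set(key, value, ex: nil, nx: false)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    entry = @store[key]
    live = entry && (entry[:expires_at].nil? || entry[:expires_at] > now)
    return false if nx && live

    @store[key] = { value: value, expires_at: ex && now + ex }
    nx ? true : 'OK'
  end
end

# Hypothetical throttle check: returns false when this caller claimed the key
# (so retrieval may proceed) and true when a live key already exists.
# The key and its TTL are written in one command, so the key cannot be left
# behind without an expiry.
def retrieved_recently?(redis, embed_url, ttl_seconds: 60)
  retrieved_key = "retrieved:#{embed_url}" # key format is an assumption
  claimed = redis.set(retrieved_key, '1', ex: ttl_seconds, nx: true)
  !claimed
end

redis = FakeRedis.new
puts retrieved_recently?(redis, 'http://example.com/post') # first caller claims the key
puts retrieved_recently?(redis, 'http://example.com/post') # second caller is throttled
```

With a real client, the same `set` call replaces the `setnx` and `expire` pair in a single round trip.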

FEATURE: Embeddable Discourse comments, now with simple-rss instead o…

4f8aed2

…f feedzirra

ai-code-review-evaluations deleted a comment from coderabbitai bot Nov 18, 2025

coderabbitai bot reviewed Nov 21, 2025

