Skip to content

Conversation

@akshayutture-augment
Copy link

@akshayutture-augment akshayutture-augment commented Nov 14, 2025

Test 4

Summary by CodeRabbit

  • New Features
    • Embed discussions on external websites with responsive iframe support
    • Automatically poll RSS/ATOM feeds and import articles as topics
    • Display embedded posts with author information, timestamps, and styled formatting

✏️ Tip: You can customize this high-level summary in your review settings.


Replicated from ai-code-review-evaluation/discourse-coderabbit#4

@akshayutture-augment
Copy link
Author

@coderabbitai review

@akshayutture-augment
Copy link
Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Nov 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link

coderabbitai bot commented Nov 21, 2025

Walkthrough

This pull request introduces a complete topic embedding feature, allowing Discourse topics to be embedded in external websites. It includes server-side retrieval via RSS feed polling, client-side iframe integration, database schema extensions for tracking embedded content, and comprehensive site configuration and localization.

Changes

Cohort / File(s) Change Summary
Dependencies
Gemfile
Added two new gems: ruby-readability and simple-rss with require: false for feed parsing and article extraction.
Frontend Assets
app/assets/javascripts/embed.js, app/assets/stylesheets/embed.css.scss
Introduced embed iframe script that loads topics from discourseUrl + "embed/best", listens for resize messages, and dynamically adjusts height. Added comprehensive styling for embedded post layout, headers, footers, and typography.
Embedding Controller
app/controllers/embed_controller.rb
New controller implementing best action that resolves embedded topics by URL; enqueues async retrieval if topic not found; skips XHR checks; validates embeddable host via ensure_embeddable filter.
Embed Model
app/models/topic_embed.rb
New model providing import and import_remote class methods to create/update embedded topics; includes absolutize_urls for converting relative URLs to absolute; tracks content via SHA1 hash to detect updates.
Post Model Extension
app/models/post.rb
Added cook_method enum to support raw HTML rendering; modified cook method to bypass markdown processing when cook_method is :raw_html.
Async Jobs
app/jobs/regular/retrieve_topic.rb, app/jobs/scheduled/poll_feed.rb
RetrieveTopic fetches single embedded topics asynchronously; PollFeed scheduled job (hourly) periodically fetches and imports RSS feed items with conditional throttling based on staff status.
Retrieval Logic
lib/topic_retriever.rb
New orchestrator for remote topic retrieval with host validation, Redis-based throttling (60-second window), idempotence checks, and fallback to feed polling.
Supporting Libraries
lib/post_creator.rb, lib/post_revisor.rb, lib/tasks/disqus.thor
Extended PostCreator to support cook_method assignment; added conditional validation skip in PostRevisor; migrated Disqus import to use new TopicEmbed.import_remote API.
Embed Views
app/views/embed/best.html.erb, app/views/embed/loading.html.erb, app/views/layouts/embed.html.erb
Added templates for displaying embedded topics with post iteration, header/footer with site logo, auto-reload on loading view, and parent iframe height communication.
Configuration
config/routes.rb, config/site_settings.yml, config/locales/client.en.yml, config/locales/server.en.yml
Added embed/best route; introduced four site settings (embeddable_host, feed_polling_enabled, feed_polling_url, embed_by_username); added translation keys for embedding UI labels and messages.
Database Migrations
db/migrate/20131217174004_create_topic_embeds.rb, db/migrate/20131219203905_add_cook_method_to_posts.rb, db/migrate/20131223171005_create_top_topics.rb, db/migrate/20131210181901_migrate_word_counts.rb
Created topic_embeds table with unique index on embed_url; added cook_method column to posts with default 1; modified top_topics creation to use force: true; whitespace fix in word counts migration.
Test Coverage
spec/components/topic_retriever_spec.rb, spec/controllers/embed_controller_spec.rb, spec/jobs/poll_feed_spec.rb, spec/models/topic_embed_spec.rb
Added comprehensive tests for retrieval host validation, throttling, controller access control, job execution guards, and topic embed import/update flows with URL absolutization and cook method validation.

Sequence Diagram(s)

sequenceDiagram
    participant External as External Site
    participant Iframe as Embed iframe
    participant Server as Discourse Server
    participant DB as Database
    participant Redis as Redis Cache

    External->>Iframe: Load embed.js
    Iframe->>Server: GET /embed/best?embed_url=...
    Server->>DB: Check TopicEmbed for URL
    alt Topic Exists
        DB-->>Server: TopicEmbed found
        Server->>DB: Load TopicView (best 5 posts)
        DB-->>Server: Posts data
    else Topic Missing
        DB-->>Server: Not found
        Server->>Redis: Check throttle key
        alt Not Throttled
            Redis-->>Server: OK (or expired)
            Server->>Server: Enqueue RetrieveTopic job
            Server-->>Iframe: Render loading view
            Note over Server: Job runs async
            Server->>Server: TopicRetriever.retrieve
            Server->>Server: Validate embeddable_host
            Server->>External: Fetch via HTTP (ruby-readability)
            External-->>Server: Article HTML
            Server->>DB: TopicEmbed.import_remote (create Post)
        else Throttled
            Redis-->>Server: Wait period active
            Server-->>Iframe: Render loading view
        end
    end
    
    Iframe-->>External: Render posts
    Iframe->>Iframe: Measure body height
    Iframe->>External: postMessage({type: 'discourse-resize', height: X})
    External->>External: Adjust iframe height

Loading
sequenceDiagram
    participant Scheduler as Scheduler
    participant Job as PollFeed Job
    participant Settings as Site Settings
    participant Feed as RSS Feed
    participant TopicEmbed as TopicEmbed
    participant DB as Database

    Scheduler->>Job: Execute (hourly)
    Job->>Settings: Check feed_polling_enabled?
    Job->>Settings: Check feed_polling_url
    Job->>Settings: Check embed_by_username
    
    alt Prerequisites Met
        Job->>Feed: Fetch RSS/ATOM feed
        Feed-->>Job: Feed items
        Job->>Job: Parse items (SimpleRSS)
        
        loop For Each Feed Item
            Job->>TopicEmbed: import(user, url, title, content)
            TopicEmbed->>DB: Check if URL exists
            alt New URL
                TopicEmbed->>DB: Create Topic + Post
                TopicEmbed->>DB: Create TopicEmbed record
            else Content Changed
                TopicEmbed->>DB: Revise existing Post
            end
            DB-->>TopicEmbed: Updated post
        end
    else Missing Configuration
        Job->>Job: Skip execution
    end

Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:

  • TopicEmbed model (app/models/topic_embed.rb) — Complex import logic with URL absolutization, content hashing, and transactional post creation/revision flow; verify SHA1-based change detection and edge cases around existing embeds
  • TopicRetriever class (lib/topic_retriever.rb) — Redis throttling logic with 60-second window and host validation; ensure idempotence and proper error handling for invalid URIs
  • PollFeed scheduled job (app/jobs/scheduled/poll_feed.rb) — Site setting guards and feed parsing; verify feed item normalization and handling of malformed RSS entries
  • EmbedController access control (app/controllers/embed_controller.rb) — ensure_embeddable filter with referer host matching; confirm X-Frame-Options header is set correctly and CSRF implications are considered
  • Post cook_method integration (app/models/post.rb, lib/post_creator.rb) — Verify raw HTML bypass doesn't introduce security regressions; check cook_method assignment in PostCreator initialization

Poem

🐰 A hop through the embedding flow,
Where topics in iframes now grow!
With polls and retrieval so fine,
Raw HTML posts now align,
Our warren of feeds starts to glow! 🌿

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Enhance embed URL handling and validation system' is vague and does not clearly summarize the main changes; the PR introduces comprehensive embedding features including controllers, jobs, models, and views, not just URL handling and validation. Consider a more specific title that captures the primary objective, such as 'Add topic embed feature with feed polling support' or 'Implement embed controller and topic retrieval system'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch embed-url-handling-post

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (8)
db/migrate/20131217174004_create_topic_embeds.rb (1)

4-11: Consider adding indexes on foreign key columns for query performance.

The topic_id and post_id columns are foreign keys that will likely be used in WHERE clauses and JOINs, but currently lack indexes.

Add indexes to improve query performance:

     end
 
     add_index :topic_embeds, :embed_url, unique: true
+    add_index :topic_embeds, :topic_id
+    add_index :topic_embeds, :post_id
   end
 end
Gemfile (1)

209-211: Update version constraints to pin latest stable gem versions.

The recommendation to pin gem versions is valid for security and reproducibility. However, the suggested versions should be updated based on the latest stable releases:

 # required for feed importing and embedding
-gem 'ruby-readability', require: false
-gem 'simple-rss', require: false
+gem 'ruby-readability', '~> 0.7', require: false
+gem 'simple-rss', '~> 1.3', require: false

ruby-readability latest stable is 0.7.2 (released Aug 29, 2024), so ~> 0.7 is recommended instead of ~> 0.6. simple-rss latest stable is 1.3.3 (released Apr 24, 2018), so ~> 1.3 appropriately captures this version.

spec/jobs/poll_feed_spec.rb (1)

1-40: Optional: Clean up extra blank lines for RuboCop compliance.

The test logic is solid and comprehensive. However, RuboCop flags extra blank lines that could be removed for style consistency.

Apply this diff:

 describe Jobs::PollFeed do
-
   let(:poller) { Jobs::PollFeed.new }
   
   context "execute" do
     # ... tests ...
   end
-
 end

Based on static analysis hints

spec/components/topic_retriever_spec.rb (1)

1-46: Optional: Clean up extra blank lines for RuboCop compliance.

The test suite provides excellent coverage of host validation and throttling logic. The extra blank lines flagged by RuboCop are minor style issues that can be cleaned up.

Apply this diff:

 describe TopicRetriever do
-
   let(:embed_url) { "http://eviltrout.com/2013/02/10/why-discourse-uses-emberjs.html" }
   let(:topic_retriever) { TopicRetriever.new(embed_url) }
   
   # ... tests ...
   
   context "with a valid host" do
     # ... tests ...
-
   end
-
 end

Based on static analysis hints

app/assets/javascripts/embed.js (1)

5-23: Harden container lookup and origin check in embed script

Right now comments is assumed to exist and the origin check uses a loose substring match. A couple of small tweaks would make this more robust:

  • Guard against a missing #discourse-comments container so the script fails gracefully instead of throwing.
  • Require discourseUrl to start with e.origin to avoid accidental substring matches.

Example:

-  var comments = document.getElementById('discourse-comments'),
-      iframe = document.createElement('iframe');
+  var comments = document.getElementById('discourse-comments');
+  if (!comments) { return; }
+  var iframe = document.createElement('iframe');
@@
-    if (discourseUrl.indexOf(e.origin) === -1) { return; }
+    if (!e.origin || discourseUrl.indexOf(e.origin) !== 0) { return; }
app/controllers/embed_controller.rb (1)

2-4: Update deprecated filter syntax to before_action.

The skip_before_filter and before_filter methods are deprecated in Rails 4+ and removed in Rails 5.1+. Use skip_before_action and before_action instead.

Apply this diff to update the syntax:

-  skip_before_filter :check_xhr
-  skip_before_filter :preload_json
-  before_filter :ensure_embeddable
+  skip_before_action :check_xhr
+  skip_before_action :preload_json
+  before_action :ensure_embeddable
lib/topic_retriever.rb (1)

49-49: Optional: Redundant downcase call if column is already lowercase.

If the username_lower column already stores lowercase usernames, the downcase call on SiteSetting.embed_by_username might be redundant. However, this defensive approach ensures correctness if the column naming convention doesn't guarantee lowercase storage.

app/models/topic_embed.rb (1)

11-11: Strengthen URL validation.

The regex /^https?\:\/\// only checks if the URL starts with http:// or https://, but doesn't validate URL structure or prevent malformed URLs. Consider using URI parsing for more robust validation.

-    return unless url =~ /^https?\:\/\//
+    begin
+      uri = URI.parse(url)
+      return unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
+    rescue URI::InvalidURIError
+      return
+    end
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62db063 and 4f8aed2.

⛔ Files ignored due to path filters (1)
  • Gemfile_rails4.lock is excluded by !**/*.lock
📒 Files selected for processing (27)
  • Gemfile (1 hunks)
  • app/assets/javascripts/embed.js (1 hunks)
  • app/assets/stylesheets/embed.css.scss (1 hunks)
  • app/controllers/embed_controller.rb (1 hunks)
  • app/jobs/regular/retrieve_topic.rb (1 hunks)
  • app/jobs/scheduled/poll_feed.rb (1 hunks)
  • app/models/post.rb (2 hunks)
  • app/models/topic_embed.rb (1 hunks)
  • app/views/embed/best.html.erb (1 hunks)
  • app/views/embed/loading.html.erb (1 hunks)
  • app/views/layouts/embed.html.erb (1 hunks)
  • config/locales/client.en.yml (1 hunks)
  • config/locales/server.en.yml (2 hunks)
  • config/routes.rb (1 hunks)
  • config/site_settings.yml (1 hunks)
  • db/migrate/20131210181901_migrate_word_counts.rb (2 hunks)
  • db/migrate/20131217174004_create_topic_embeds.rb (1 hunks)
  • db/migrate/20131219203905_add_cook_method_to_posts.rb (1 hunks)
  • db/migrate/20131223171005_create_top_topics.rb (1 hunks)
  • lib/post_creator.rb (1 hunks)
  • lib/post_revisor.rb (1 hunks)
  • lib/tasks/disqus.thor (1 hunks)
  • lib/topic_retriever.rb (1 hunks)
  • spec/components/topic_retriever_spec.rb (1 hunks)
  • spec/controllers/embed_controller_spec.rb (1 hunks)
  • spec/jobs/poll_feed_spec.rb (1 hunks)
  • spec/models/topic_embed_spec.rb (1 hunks)
🧰 Additional context used
🪛 Brakeman (7.1.1)
app/jobs/scheduled/poll_feed.rb

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

config/routes.rb

[medium] Rails 3.2.12 with globbing routes is vulnerable to directory traversal and remote code execution. Patch or upgrade to Rails 3.2.18
Type: Remote Code Execution
Confidence: Medium
More info: http://matasano.com/research/AnatomyOfRailsVuln-CVE-2014-0130.pdf

(Remote Code Execution)

🪛 RuboCop (1.81.7)
spec/controllers/embed_controller_spec.rb

[convention] 4-5: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 30-31: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 54-55: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 57-58: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

spec/jobs/poll_feed_spec.rb

[convention] 5-6: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 39-40: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

app/jobs/regular/retrieve_topic.rb

[convention] 5-6: Extra empty line detected at module body beginning.

(Layout/EmptyLinesAroundModuleBody)


[convention] 21-22: Extra empty line detected at module body end.

(Layout/EmptyLinesAroundModuleBody)

spec/components/topic_retriever_spec.rb

[convention] 5-6: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 43-44: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 45-46: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

spec/models/topic_embed_spec.rb

[convention] 4-5: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 12-13: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 43-44: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 45-46: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 47-48: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

app/models/topic_embed.rb

[convention] 10-42: Assignment Branch Condition size for import is too high. [<7, 23, 6> 24.78/23]

(Metrics/AbcSize)


[convention] 56-76: Assignment Branch Condition size for absolutize_urls is too high. [<9, 23, 10> 26.65/23]

(Metrics/AbcSize)


[convention] 56-76: Cyclomatic complexity for absolutize_urls is too high. [9/7]

(Metrics/CyclomaticComplexity)


[convention] 56-76: Perceived complexity for absolutize_urls is too high. [9/8]

(Metrics/PerceivedComplexity)

🔇 Additional comments (19)
config/routes.rb (1)

245-245: LGTM! Route addition is correct.

The new embed route is properly structured and follows existing route patterns. Verification confirms that CVE-2014-0130 does affect the codebase (Rails 3.2.12 requires upgrade to 3.2.18+), but this specific route is safe as it doesn't use the vulnerable globbing pattern with :action. The infrastructure upgrade recommendation is valid but appropriately noted as beyond this PR's scope.

lib/post_creator.rb (1)

216-216: LGTM!

The conditional assignment of cook_method from options follows the established pattern for other attributes in this method and correctly integrates with the new cook_methods functionality introduced in the Post model.

app/views/embed/loading.html.erb (1)

1-12: LGTM!

The loading view is well-structured with a clear purpose. The 30-second auto-reload provides a reasonable interval for topic retrieval to complete, and the header with logo maintains consistent branding during the loading state.

lib/post_revisor.rb (1)

85-85: LGTM!

The conditional validation addition maintains consistency with the PostCreator class (line 232 of lib/post_creator.rb) and provides necessary flexibility for import flows that may need to bypass validations.

app/models/post.rb (2)

63-65: LGTM!

The cook_methods enum follows the established pattern used by types and hidden_reasons, and correctly aligns with the migration default value (1 = :regular).


130-137: LGTM!

The cooking logic correctly short-circuits for raw_html content while maintaining the existing cooking pipeline for regular posts. The comment clearly explains the use case for RSS/embed imports.

db/migrate/20131219203905_add_cook_method_to_posts.rb (1)

1-5: LGTM!

The migration correctly adds the cook_method column with appropriate defaults and constraints. The default value of 1 corresponds to :regular in the Post.cook_methods enum, ensuring existing posts maintain their current cooking behavior.

app/views/layouts/embed.html.erb (1)

6-15: The referer is already validated server-side—the main concern in this review is addressed.

The ensure_embeddable filter in app/controllers/embed_controller.rb (line 26) validates that request.referer matches SiteSetting.embeddable_host before rendering the template. Invalid referers raise an exception (lines 29-30). Since validation occurs before the template renders, the referer passed to postMessage is safe.

The parent check improvement (if (parent !== window)) remains valid defensive coding, though it's a minor enhancement since postMessage to the same window is benign.

Likely an incorrect or invalid review comment.

app/assets/stylesheets/embed.css.scss (1)

4-69: Embed layout styles look consistent and scoped

The CSS cleanly scopes embed layout (post, header, footer, logo, buttons) and matches the markup in the embed views; no functional issues stand out.

lib/tasks/disqus.thor (1)

143-178: Disqus import now correctly routes through TopicEmbed.import_remote

Routing the thread root creation via TopicEmbed.import_remote(user, t[:link], title: t[:title]) keeps Disqus imports consistent with the new embed URL handling/validation logic while still attaching imported replies via PostCreator. The flow looks coherent.

app/jobs/regular/retrieve_topic.rb (1)

9-18: RetrieveTopic job wiring looks correct

The job validates embed_url, optionally looks up a user, and passes a no_throttle flag based on user.try(:staff?) into TopicRetriever. That’s a clean separation of concerns for async retrieval.

spec/controllers/embed_controller_spec.rb (1)

5-52: EmbedController specs cover key embedding flows well

The tests exercise error paths (missing URL/host/referer), success behavior including X-Frame-Options, and both branches of topic_id_for_embed (nil vs present). This gives good coverage of the new embed controller behavior.

spec/models/topic_embed_spec.rb (1)

5-42: TopicEmbed.import specs accurately capture expected behavior

The tests nicely pin down associations, validations, malformed URL handling, raw_html cooking, URL absolutization, and idempotent updates on re-import. That should make future changes to TopicEmbed.import much safer.

app/controllers/embed_controller.rb (1)

24-31: Security: Verify that X-Frame-Options: ALLOWALL meets your requirements.

Setting X-Frame-Options to ALLOWALL permits this content to be embedded in any iframe, which is necessary for the embed feature but also increases the attack surface for clickjacking. Ensure this is the intended behavior and that the embeddable_host validation provides sufficient protection.

Note: The Referer header can be omitted or spoofed by clients, so this check provides defense-in-depth but should not be the sole security mechanism.

lib/topic_retriever.rb (2)

14-19: LGTM: Robust host validation.

The host validation correctly handles invalid URIs and compares against the configured embeddable host.


40-42: Verify synchronous feed polling behavior.

The Jobs::PollFeed.new.execute({}) call executes the feed polling job synchronously, which could block the retrieval process if feed polling takes a long time. Verify that this blocking behavior is intentional and acceptable for your use case.

If this causes timeouts or delays, consider:

  • Using asynchronous job enqueueing instead
  • Adding a timeout mechanism
  • Checking if the poll job has run recently before executing
app/models/topic_embed.rb (3)

22-22: Verify security implications of raw_html cook method with skip_validations.

Using cook_method: Post.cook_methods[:raw_html] with skip_validations: true bypasses Discourse's content sanitization and validation. While necessary for importing external content, ensure that:

  1. Content from absolutize_urls is properly sanitized
  2. Only trusted users can trigger imports
  3. The embeddable_host validation provides sufficient access control

56-76: LGTM: URL absolutization logic is sound.

The absolutize_urls method correctly:

  • Constructs the base URL with proper scheme, host, and port handling
  • Converts relative anchor href and image src attributes to absolute URLs
  • Handles leading slash removal to prevent double slashes

The RuboCop complexity warnings are acceptable given the necessity of this logic.


78-80: LGTM: Efficient topic ID lookup.

Using pluck(:topic_id).first is an efficient way to retrieve a single column value.

layout 'embed'

def best
embed_url = params.require(:embed_url)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Validate embed_url parameter format.

The embed_url parameter is required but not validated for URL format. This could allow malformed or non-HTTP(S) URLs to be processed, potentially causing issues in downstream logic.

Consider adding URL format validation:

-    embed_url = params.require(:embed_url)
+    embed_url = params.require(:embed_url)
+    unless embed_url =~ /\A#{URI::regexp(['http', 'https'])}\z/
+      raise ActionController::BadRequest, 'Invalid embed_url format'
+    end
🤖 Prompt for AI Agents
In app/controllers/embed_controller.rb around line 9, the required embed_url
param is not validated for URL format; update the action to parse and validate
embed_url after params.require(:embed_url) by attempting to build a URI (or use
URI.parse) and confirm it has an http or https scheme and a non-empty host,
rescuing parse errors; if validation fails, return a 400/422 response with a
clear error message and do not proceed with downstream processing.

Comment on lines +24 to +37
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?

require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//

content = CGI.unescapeHTML(i.content.scrub)
TopicEmbed.import(user, url, i.title, content)
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard against missing content in RSS items to avoid job crashes

Many feeds don’t populate a content field on items; in those cases i.content will be nil, and i.content.scrub will raise a NoMethodError, causing the whole PollFeed run to fail.

Consider falling back to description/summary and always calling scrub on a string:

-      rss.items.each do |i|
-        url = i.link
-        url = i.id if url.blank? || url !~ /^https?\:\/\//
-
-        content = CGI.unescapeHTML(i.content.scrub)
-        TopicEmbed.import(user, url, i.title, content)
-      end
+      rss.items.each do |i|
+        url = i.link
+        url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+        raw_content = i.content || i.description || i.summary || ""
+        content = CGI.unescapeHTML(raw_content.to_s.scrub)
+        TopicEmbed.import(user, url, i.title, content)
+      end

(Separately, if you expect untrusted or variable feed_polling_url values, adding basic scheme/host checks and network timeouts around open would further harden this job.)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?
require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//
content = CGI.unescapeHTML(i.content.scrub)
TopicEmbed.import(user, url, i.title, content)
end
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?
require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//
raw_content = i.content || i.description || i.summary || ""
content = CGI.unescapeHTML(raw_content.to_s.scrub)
TopicEmbed.import(user, url, i.title, content)
end
🧰 Tools
🪛 Brakeman (7.1.1)

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

🤖 Prompt for AI Agents
In app/jobs/scheduled/poll_feed.rb around lines 24 to 37, the code assumes
i.content is present and calls i.content.scrub which will raise NoMethodError
when content is nil; update the logic to guard against nil by deriving a string
first (e.g., content_str = (i.content || i.description || i.summary).to_s), then
call scrub and CGI.unescapeHTML on that string before passing to
TopicEmbed.import, so you always call scrub on a String and fall back to
description/summary when content is missing.

def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//

contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Potential XSS vulnerability in URL interpolation.

The URL is directly interpolated into HTML without escaping, which could allow XSS attacks if the URL contains malicious characters or JavaScript. Even though the content uses cook_method: :raw_html, user-controlled URLs should be escaped.

Apply proper HTML escaping:

-    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+    require 'cgi'
+    escaped_url = CGI.escapeHTML(url)
+    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
require 'cgi'
escaped_url = CGI.escapeHTML(url)
contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"
🤖 Prompt for AI Agents
In app/models/topic_embed.rb around line 13, the URL is interpolated into raw
HTML which allows XSS; escape the URL before constructing the anchor tag and
ensure both the href and the link text are HTML-escaped (e.g. via
ERB::Util.html_escape or Rails’ h helper) and then mark the final string as safe
HTML if needed for cook_method: :raw_html. Replace the direct interpolation with
code that escapes url for attribute and content, composes the <a> tag from those
escaped values, and only then returns the HTML-safe string.

Comment on lines +44 to +53
def self.import_remote(user, url, opts=nil)
require 'ruby-readability'

opts = opts || {}
doc = Readability::Document.new(open(url).read,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])

TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: SSRF vulnerability using open(url).read.

Using Ruby's open(url).read to fetch remote content is a severe security vulnerability. The open method (via Kernel#open) can:

  1. Access local files using file:// URLs
  2. Execute commands via pipe syntax (|command)
  3. Access internal network resources
  4. Bypass firewalls and access controls

This allows attackers to:

  • Read sensitive files from the server filesystem
  • Access internal services not exposed to the internet
  • Execute arbitrary commands (in some configurations)

Replace with a safe HTTP client and validate the URL scheme:

 def self.import_remote(user, url, opts=nil)
   require 'ruby-readability'
+  require 'open-uri'
+  
+  # Validate URL scheme
+  uri = URI.parse(url)
+  unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
+    raise ArgumentError, 'Only HTTP(S) URLs are allowed'
+  end
 
   opts = opts || {}
-  doc = Readability::Document.new(open(url).read,
+  # Use OpenURI with safe options or better, use Net::HTTP directly
+  content = URI.open(url, redirect: true, read_timeout: 10).read
+  doc = Readability::Document.new(content,
                                       tags: %w[div p code pre h1 h2 h3 b em i strong a img],
                                       attributes: %w[href src])
 
   TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
 end

Better yet, use a dedicated HTTP client library with proper timeouts and safeguards:

require 'net/http'

uri = URI.parse(url)
raise ArgumentError, 'Only HTTP(S) URLs are allowed' unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)

# Add additional checks to prevent access to internal networks
if uri.host.match?(/^(127\.|10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/)
  raise ArgumentError, 'Access to internal networks not allowed'
end

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 5
http.read_timeout = 10
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
content = response.body
🤖 Prompt for AI Agents
In app/models/topic_embed.rb around lines 44 to 53, the use of open(url).read
introduces a critical SSRF and command/file access risk; replace it by parsing
and validating the URL (allow only http/https), resolve the host and block
private/internal IP ranges, use a dedicated HTTP client with connection and read
timeouts and redirect limits, perform the GET request and check for successful
200 responses before passing the body to Readability, and handle/network
exceptions by raising or logging a clear error instead of calling open directly.

Comment on lines +2 to +6
<%- if @topic_view.posts.present? %>
<%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- else %>
<%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- end if %>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix invalid ERB block terminator (<%- end if %>)

<%- end if %> is not valid ERB/Ruby syntax and will cause the template to fail to render. The if should only appear on the opening tag; the closing tag must be just end:

-  <%- if @topic_view.posts.present? %>
-    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
-  <%- else %>
-    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
-  <%- end if %>
+  <%- if @topic_view.posts.present? %>
+    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- else %>
+    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- end %>

(Optionally, you might also add a short alt attribute to the avatar <img> for accessibility, but that’s non-blocking.)

🤖 Prompt for AI Agents
In app/views/embed/best.html.erb around lines 2 to 6 the ERB closing tag is
invalid (`<%- end if %>`); replace it with a proper ERB block terminator (`<%-
end %>`) so the conditional closes correctly, keep the existing opening `<%- if
@topic_view.posts.present? %>` and associated link_to branches unchanged, and
optionally add a short alt attribute to any avatar <img> in this template for
accessibility.

@@ -0,0 +1,13 @@
class CreateTopicEmbeds < ActiveRecord::Migration
def change
create_table :topic_embeds, force: true do |t|
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Verify the necessity of force: true to prevent unintended data loss.

Using force: true will drop the existing topic_embeds table before recreating it, which can cause data loss if the table already exists with data.

Consider removing force: true unless you're certain the table should be dropped:

-    create_table :topic_embeds, force: true do |t|
+    create_table :topic_embeds do |t|

If this is an initial migration for a new feature, force: true may be acceptable during development, but should be removed before merging to production.

🤖 Prompt for AI Agents
In db/migrate/20131217174004_create_topic_embeds.rb around line 3, the migration
uses create_table :topic_embeds, force: true which will drop and recreate the
table and can cause unintended data loss; remove the force: true option (or
replace with if_not_exists: true if supported) so the migration only creates the
table when absent, and only keep force: true temporarily during local
development if you intentionally want destructive behavior.

class CreateTopTopics < ActiveRecord::Migration
def change
create_table :top_topics do |t|
create_table :top_topics, force: true do |t|
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify the necessity of force: true to prevent unintended data loss.

Using force: true will drop the existing top_topics table before recreating it, which can cause data loss if the table already exists with data.

This option should only be used during initial development or when you're certain the table should be dropped. For production migrations, consider removing force: true or using a more explicit approach:

-    create_table :top_topics, force: true do |t|
+    drop_table :top_topics if table_exists?(:top_topics)
+    create_table :top_topics do |t|

Or simply:

-    create_table :top_topics, force: true do |t|
+    create_table :top_topics do |t|

Verify whether this table is expected to exist already and whether data loss is acceptable:


🏁 Script executed:

#!/bin/bash
# Check if there are other references to top_topics table in migrations
rg -n "top_topics" --type=rb -g "db/migrate/*.rb"

Length of output: 26


I'll run a corrected verification script to check for references to the top_topics table in migrations.


---

🏁 Script executed:

```shell
#!/bin/bash
# Search for top_topics references in migration files (without file type spec)
echo "=== Searching for top_topics in db/migrate directory ==="
rg -n "top_topics" db/migrate/

echo -e "\n=== Checking the specific migration file ==="
cat db/migrate/20131223171005_create_top_topics.rb

Length of output: 1120


Remove force: true from the initial table creation migration.

This is the initial top_topics table creation (migration 20131223171005), yet it includes force: true, which will drop any existing table before recreating it. While the table shouldn't exist on first run, this option creates unnecessary risk if migrations are re-run (e.g., during database resets or redeployment), causing data loss. A subsequent migration (20131227164338) adds columns to this table, confirming the expected table persistence.

Simply remove force: true:

-    create_table :top_topics, force: true do |t|
+    create_table :top_topics do |t|
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
create_table :top_topics, force: true do |t|
create_table :top_topics do |t|
🤖 Prompt for AI Agents
In db/migrate/20131223171005_create_top_topics.rb around line 3, remove the
force: true option from the create_table call so it reads create_table
:top_topics do |t|; this prevents the migration from dropping an existing table,
keep the rest of the block unchanged and run the migration suite to verify no
schema conflicts.

Comment on lines +27 to +28
if $redis.setnx(retrieved_key, "1")
$redis.expire(retrieved_key, 60)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Non-atomic Redis operations create race condition.

The setnx followed by expire is not atomic. If the process crashes or is interrupted between these two calls, the Redis key will never expire, causing permanent throttling for that embed URL. Additionally, there's a small window where another process could check the key before the TTL is set.

Use the atomic SET command with EX option:

-      if $redis.setnx(retrieved_key, "1")
-        $redis.expire(retrieved_key, 60)
+      if $redis.set(retrieved_key, "1", ex: 60, nx: true)
         return false
       end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if $redis.setnx(retrieved_key, "1")
$redis.expire(retrieved_key, 60)
if $redis.set(retrieved_key, "1", ex: 60, nx: true)
return false
end
🤖 Prompt for AI Agents
In lib/topic_retriever.rb around lines 27-28, the current two-step use of setnx
followed by expire is non-atomic and risks leaving the key without a TTL;
replace those two calls with a single atomic Redis SET using NX and EX (i.e.,
set the key only if not exists and set the TTL in the same command) and branch
on the SET return value so you only proceed when the key was actually created;
if your Redis client doesn't support high-level options, invoke the equivalent
raw SET command with NX and EX to achieve atomicity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants