-
Notifications
You must be signed in to change notification settings - Fork 0
Enhance embed URL handling and validation system #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: embed-url-handling-pre
Are you sure you want to change the base?
Enhance embed URL handling and validation system #7
Conversation
|
@coderabbitai review |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
WalkthroughThis pull request introduces a complete topic embedding feature, allowing Discourse topics to be embedded in external websites. It includes server-side retrieval via RSS feed polling, client-side iframe integration, database schema extensions for tracking embedded content, and comprehensive site configuration and localization. Changes
Sequence Diagram(s)sequenceDiagram
participant External as External Site
participant Iframe as Embed iframe
participant Server as Discourse Server
participant DB as Database
participant Redis as Redis Cache
External->>Iframe: Load embed.js
Iframe->>Server: GET /embed/best?embed_url=...
Server->>DB: Check TopicEmbed for URL
alt Topic Exists
DB-->>Server: TopicEmbed found
Server->>DB: Load TopicView (best 5 posts)
DB-->>Server: Posts data
else Topic Missing
DB-->>Server: Not found
Server->>Redis: Check throttle key
alt Not Throttled
Redis-->>Server: OK (or expired)
Server->>Server: Enqueue RetrieveTopic job
Server-->>Iframe: Render loading view
Note over Server: Job runs async
Server->>Server: TopicRetriever.retrieve
Server->>Server: Validate embeddable_host
Server->>External: Fetch via HTTP (ruby-readability)
External-->>Server: Article HTML
Server->>DB: TopicEmbed.import_remote (create Post)
else Throttled
Redis-->>Server: Wait period active
Server-->>Iframe: Render loading view
end
end
Iframe-->>External: Render posts
Iframe->>Iframe: Measure body height
Iframe->>External: postMessage({type: 'discourse-resize', height: X})
External->>External: Adjust iframe height
sequenceDiagram
participant Scheduler as Scheduler
participant Job as PollFeed Job
participant Settings as Site Settings
participant Feed as RSS Feed
participant TopicEmbed as TopicEmbed
participant DB as Database
Scheduler->>Job: Execute (hourly)
Job->>Settings: Check feed_polling_enabled?
Job->>Settings: Check feed_polling_url
Job->>Settings: Check embed_by_username
alt Prerequisites Met
Job->>Feed: Fetch RSS/ATOM feed
Feed-->>Job: Feed items
Job->>Job: Parse items (SimpleRSS)
loop For Each Feed Item
Job->>TopicEmbed: import(user, url, title, content)
TopicEmbed->>DB: Check if URL exists
alt New URL
TopicEmbed->>DB: Create Topic + Post
TopicEmbed->>DB: Create TopicEmbed record
else Content Changed
TopicEmbed->>DB: Revise existing Post
end
DB-->>TopicEmbed: Updated post
end
else Missing Configuration
Job->>Job: Skip execution
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Areas requiring extra attention:
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Nitpick comments (8)
db/migrate/20131217174004_create_topic_embeds.rb (1)
4-11: Consider adding indexes on foreign key columns for query performance.The
topic_idandpost_idcolumns are foreign keys that will likely be used in WHERE clauses and JOINs, but currently lack indexes.Add indexes to improve query performance:
end add_index :topic_embeds, :embed_url, unique: true + add_index :topic_embeds, :topic_id + add_index :topic_embeds, :post_id end endGemfile (1)
209-211: Update version constraints to pin latest stable gem versions.The recommendation to pin gem versions is valid for security and reproducibility. However, the suggested versions should be updated based on the latest stable releases:
# required for feed importing and embedding -gem 'ruby-readability', require: false -gem 'simple-rss', require: false +gem 'ruby-readability', '~> 0.7', require: false +gem 'simple-rss', '~> 1.3', require: falseruby-readability latest stable is 0.7.2 (released Aug 29, 2024), so
~> 0.7is recommended instead of~> 0.6. simple-rss latest stable is 1.3.3 (released Apr 24, 2018), so~> 1.3appropriately captures this version.spec/jobs/poll_feed_spec.rb (1)
1-40: Optional: Clean up extra blank lines for RuboCop compliance.The test logic is solid and comprehensive. However, RuboCop flags extra blank lines that could be removed for style consistency.
Apply this diff:
describe Jobs::PollFeed do - let(:poller) { Jobs::PollFeed.new } context "execute" do # ... tests ... end - endBased on static analysis hints
spec/components/topic_retriever_spec.rb (1)
1-46: Optional: Clean up extra blank lines for RuboCop compliance.The test suite provides excellent coverage of host validation and throttling logic. The extra blank lines flagged by RuboCop are minor style issues that can be cleaned up.
Apply this diff:
describe TopicRetriever do - let(:embed_url) { "http://eviltrout.com/2013/02/10/why-discourse-uses-emberjs.html" } let(:topic_retriever) { TopicRetriever.new(embed_url) } # ... tests ... context "with a valid host" do # ... tests ... - end - endBased on static analysis hints
app/assets/javascripts/embed.js (1)
5-23: Harden container lookup and origin check in embed scriptRight now
commentsis assumed to exist and the origin check uses a loose substring match. A couple of small tweaks would make this more robust:
- Guard against a missing
#discourse-commentscontainer so the script fails gracefully instead of throwing.- Require
discourseUrlto start withe.originto avoid accidental substring matches.Example:
- var comments = document.getElementById('discourse-comments'), - iframe = document.createElement('iframe'); + var comments = document.getElementById('discourse-comments'); + if (!comments) { return; } + var iframe = document.createElement('iframe'); @@ - if (discourseUrl.indexOf(e.origin) === -1) { return; } + if (!e.origin || discourseUrl.indexOf(e.origin) !== 0) { return; }app/controllers/embed_controller.rb (1)
2-4: Update deprecated filter syntax tobefore_action.The
skip_before_filterandbefore_filtermethods are deprecated in Rails 4+ and removed in Rails 5.1+. Useskip_before_actionandbefore_actioninstead.Apply this diff to update the syntax:
- skip_before_filter :check_xhr - skip_before_filter :preload_json - before_filter :ensure_embeddable + skip_before_action :check_xhr + skip_before_action :preload_json + before_action :ensure_embeddablelib/topic_retriever.rb (1)
49-49: Optional: Redundantdowncasecall if column is already lowercase.If the
username_lowercolumn already stores lowercase usernames, thedowncasecall onSiteSetting.embed_by_usernamemight be redundant. However, this defensive approach ensures correctness if the column naming convention doesn't guarantee lowercase storage.app/models/topic_embed.rb (1)
11-11: Strengthen URL validation.The regex
/^https?\:\/\//only checks if the URL starts withhttp://orhttps://, but doesn't validate URL structure or prevent malformed URLs. Consider using URI parsing for more robust validation.- return unless url =~ /^https?\:\/\// + begin + uri = URI.parse(url) + return unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS) + rescue URI::InvalidURIError + return + end
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Gemfile_rails4.lockis excluded by!**/*.lock
📒 Files selected for processing (27)
Gemfile(1 hunks)app/assets/javascripts/embed.js(1 hunks)app/assets/stylesheets/embed.css.scss(1 hunks)app/controllers/embed_controller.rb(1 hunks)app/jobs/regular/retrieve_topic.rb(1 hunks)app/jobs/scheduled/poll_feed.rb(1 hunks)app/models/post.rb(2 hunks)app/models/topic_embed.rb(1 hunks)app/views/embed/best.html.erb(1 hunks)app/views/embed/loading.html.erb(1 hunks)app/views/layouts/embed.html.erb(1 hunks)config/locales/client.en.yml(1 hunks)config/locales/server.en.yml(2 hunks)config/routes.rb(1 hunks)config/site_settings.yml(1 hunks)db/migrate/20131210181901_migrate_word_counts.rb(2 hunks)db/migrate/20131217174004_create_topic_embeds.rb(1 hunks)db/migrate/20131219203905_add_cook_method_to_posts.rb(1 hunks)db/migrate/20131223171005_create_top_topics.rb(1 hunks)lib/post_creator.rb(1 hunks)lib/post_revisor.rb(1 hunks)lib/tasks/disqus.thor(1 hunks)lib/topic_retriever.rb(1 hunks)spec/components/topic_retriever_spec.rb(1 hunks)spec/controllers/embed_controller_spec.rb(1 hunks)spec/jobs/poll_feed_spec.rb(1 hunks)spec/models/topic_embed_spec.rb(1 hunks)
🧰 Additional context used
🪛 Brakeman (7.1.1)
app/jobs/scheduled/poll_feed.rb
[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/
(File Access)
config/routes.rb
[medium] Rails 3.2.12 with globbing routes is vulnerable to directory traversal and remote code execution. Patch or upgrade to Rails 3.2.18
Type: Remote Code Execution
Confidence: Medium
More info: http://matasano.com/research/AnatomyOfRailsVuln-CVE-2014-0130.pdf
(Remote Code Execution)
🪛 RuboCop (1.81.7)
spec/controllers/embed_controller_spec.rb
[convention] 4-5: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 30-31: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 54-55: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 57-58: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
spec/jobs/poll_feed_spec.rb
[convention] 5-6: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 39-40: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
app/jobs/regular/retrieve_topic.rb
[convention] 5-6: Extra empty line detected at module body beginning.
(Layout/EmptyLinesAroundModuleBody)
[convention] 21-22: Extra empty line detected at module body end.
(Layout/EmptyLinesAroundModuleBody)
spec/components/topic_retriever_spec.rb
[convention] 5-6: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 43-44: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 45-46: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
spec/models/topic_embed_spec.rb
[convention] 4-5: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 12-13: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 43-44: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 45-46: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 47-48: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
app/models/topic_embed.rb
[convention] 10-42: Assignment Branch Condition size for import is too high. [<7, 23, 6> 24.78/23]
(Metrics/AbcSize)
[convention] 56-76: Assignment Branch Condition size for absolutize_urls is too high. [<9, 23, 10> 26.65/23]
(Metrics/AbcSize)
[convention] 56-76: Cyclomatic complexity for absolutize_urls is too high. [9/7]
(Metrics/CyclomaticComplexity)
[convention] 56-76: Perceived complexity for absolutize_urls is too high. [9/8]
(Metrics/PerceivedComplexity)
🔇 Additional comments (19)
config/routes.rb (1)
245-245: LGTM! Route addition is correct.The new embed route is properly structured and follows existing route patterns. Verification confirms that CVE-2014-0130 does affect the codebase (Rails 3.2.12 requires upgrade to 3.2.18+), but this specific route is safe as it doesn't use the vulnerable globbing pattern with
:action. The infrastructure upgrade recommendation is valid but appropriately noted as beyond this PR's scope.lib/post_creator.rb (1)
216-216: LGTM!The conditional assignment of
cook_methodfrom options follows the established pattern for other attributes in this method and correctly integrates with the newcook_methodsfunctionality introduced in the Post model.app/views/embed/loading.html.erb (1)
1-12: LGTM!The loading view is well-structured with a clear purpose. The 30-second auto-reload provides a reasonable interval for topic retrieval to complete, and the header with logo maintains consistent branding during the loading state.
lib/post_revisor.rb (1)
85-85: LGTM!The conditional validation addition maintains consistency with the
PostCreatorclass (line 232 oflib/post_creator.rb) and provides necessary flexibility for import flows that may need to bypass validations.app/models/post.rb (2)
63-65: LGTM!The
cook_methodsenum follows the established pattern used bytypesandhidden_reasons, and correctly aligns with the migration default value (1 = :regular).
130-137: LGTM!The cooking logic correctly short-circuits for
raw_htmlcontent while maintaining the existing cooking pipeline for regular posts. The comment clearly explains the use case for RSS/embed imports.db/migrate/20131219203905_add_cook_method_to_posts.rb (1)
1-5: LGTM!The migration correctly adds the
cook_methodcolumn with appropriate defaults and constraints. The default value of 1 corresponds to:regularin thePost.cook_methodsenum, ensuring existing posts maintain their current cooking behavior.app/views/layouts/embed.html.erb (1)
6-15: The referer is already validated server-side—the main concern in this review is addressed.The
ensure_embeddablefilter inapp/controllers/embed_controller.rb(line 26) validates thatrequest.referermatchesSiteSetting.embeddable_hostbefore rendering the template. Invalid referers raise an exception (lines 29-30). Since validation occurs before the template renders, the referer passed topostMessageis safe.The parent check improvement (
if (parent !== window)) remains valid defensive coding, though it's a minor enhancement since postMessage to the same window is benign.Likely an incorrect or invalid review comment.
app/assets/stylesheets/embed.css.scss (1)
4-69: Embed layout styles look consistent and scopedThe CSS cleanly scopes embed layout (post, header, footer, logo, buttons) and matches the markup in the embed views; no functional issues stand out.
lib/tasks/disqus.thor (1)
143-178: Disqus import now correctly routes through TopicEmbed.import_remoteRouting the thread root creation via
TopicEmbed.import_remote(user, t[:link], title: t[:title])keeps Disqus imports consistent with the new embed URL handling/validation logic while still attaching imported replies viaPostCreator. The flow looks coherent.app/jobs/regular/retrieve_topic.rb (1)
9-18: RetrieveTopic job wiring looks correctThe job validates
embed_url, optionally looks up a user, and passes ano_throttleflag based onuser.try(:staff?)intoTopicRetriever. That’s a clean separation of concerns for async retrieval.spec/controllers/embed_controller_spec.rb (1)
5-52: EmbedController specs cover key embedding flows wellThe tests exercise error paths (missing URL/host/referer), success behavior including
X-Frame-Options, and both branches oftopic_id_for_embed(nil vs present). This gives good coverage of the new embed controller behavior.spec/models/topic_embed_spec.rb (1)
5-42: TopicEmbed.import specs accurately capture expected behaviorThe tests nicely pin down associations, validations, malformed URL handling, raw_html cooking, URL absolutization, and idempotent updates on re-import. That should make future changes to TopicEmbed.import much safer.
app/controllers/embed_controller.rb (1)
24-31: Security: Verify thatX-Frame-Options: ALLOWALLmeets your requirements.Setting
X-Frame-OptionstoALLOWALLpermits this content to be embedded in any iframe, which is necessary for the embed feature but also increases the attack surface for clickjacking. Ensure this is the intended behavior and that theembeddable_hostvalidation provides sufficient protection.Note: The
Refererheader can be omitted or spoofed by clients, so this check provides defense-in-depth but should not be the sole security mechanism.lib/topic_retriever.rb (2)
14-19: LGTM: Robust host validation.The host validation correctly handles invalid URIs and compares against the configured embeddable host.
40-42: Verify synchronous feed polling behavior.The
Jobs::PollFeed.new.execute({})call executes the feed polling job synchronously, which could block the retrieval process if feed polling takes a long time. Verify that this blocking behavior is intentional and acceptable for your use case.If this causes timeouts or delays, consider:
- Using asynchronous job enqueueing instead
- Adding a timeout mechanism
- Checking if the poll job has run recently before executing
app/models/topic_embed.rb (3)
22-22: Verify security implications ofraw_htmlcook method withskip_validations.Using
cook_method: Post.cook_methods[:raw_html]withskip_validations: truebypasses Discourse's content sanitization and validation. While necessary for importing external content, ensure that:
- Content from
absolutize_urlsis properly sanitized- Only trusted users can trigger imports
- The
embeddable_hostvalidation provides sufficient access control
56-76: LGTM: URL absolutization logic is sound.The
absolutize_urlsmethod correctly:
- Constructs the base URL with proper scheme, host, and port handling
- Converts relative anchor
hrefand imagesrcattributes to absolute URLs- Handles leading slash removal to prevent double slashes
The RuboCop complexity warnings are acceptable given the necessity of this logic.
78-80: LGTM: Efficient topic ID lookup.Using
pluck(:topic_id).firstis an efficient way to retrieve a single column value.
| layout 'embed' | ||
|
|
||
| def best | ||
| embed_url = params.require(:embed_url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Validate embed_url parameter format.
The embed_url parameter is required but not validated for URL format. This could allow malformed or non-HTTP(S) URLs to be processed, potentially causing issues in downstream logic.
Consider adding URL format validation:
- embed_url = params.require(:embed_url)
+ embed_url = params.require(:embed_url)
+ unless embed_url =~ /\A#{URI::regexp(['http', 'https'])}\z/
+ raise ActionController::BadRequest, 'Invalid embed_url format'
+ end🤖 Prompt for AI Agents
In app/controllers/embed_controller.rb around line 9, the required embed_url
param is not validated for URL format; update the action to parse and validate
embed_url after params.require(:embed_url) by attempting to build a URI (or use
URI.parse) and confirm it has an http or https scheme and a non-empty host,
rescuing parse errors; if validation fails, return a 400/422 response with a
clear error message and do not proceed with downstream processing.
| def poll_feed | ||
| user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first | ||
| return if user.blank? | ||
|
|
||
| require 'simple-rss' | ||
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) | ||
|
|
||
| rss.items.each do |i| | ||
| url = i.link | ||
| url = i.id if url.blank? || url !~ /^https?\:\/\// | ||
|
|
||
| content = CGI.unescapeHTML(i.content.scrub) | ||
| TopicEmbed.import(user, url, i.title, content) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard against missing content in RSS items to avoid job crashes
Many feeds don’t populate a content field on items; in those cases i.content will be nil, and i.content.scrub will raise a NoMethodError, causing the whole PollFeed run to fail.
Consider falling back to description/summary and always calling scrub on a string:
- rss.items.each do |i|
- url = i.link
- url = i.id if url.blank? || url !~ /^https?\:\/\//
-
- content = CGI.unescapeHTML(i.content.scrub)
- TopicEmbed.import(user, url, i.title, content)
- end
+ rss.items.each do |i|
+ url = i.link
+ url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+ raw_content = i.content || i.description || i.summary || ""
+ content = CGI.unescapeHTML(raw_content.to_s.scrub)
+ TopicEmbed.import(user, url, i.title, content)
+ end(Separately, if you expect untrusted or variable feed_polling_url values, adding basic scheme/host checks and network timeouts around open would further harden this job.)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def poll_feed | |
| user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first | |
| return if user.blank? | |
| require 'simple-rss' | |
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) | |
| rss.items.each do |i| | |
| url = i.link | |
| url = i.id if url.blank? || url !~ /^https?\:\/\// | |
| content = CGI.unescapeHTML(i.content.scrub) | |
| TopicEmbed.import(user, url, i.title, content) | |
| end | |
| def poll_feed | |
| user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first | |
| return if user.blank? | |
| require 'simple-rss' | |
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) | |
| rss.items.each do |i| | |
| url = i.link | |
| url = i.id if url.blank? || url !~ /^https?\:\/\// | |
| raw_content = i.content || i.description || i.summary || "" | |
| content = CGI.unescapeHTML(raw_content.to_s.scrub) | |
| TopicEmbed.import(user, url, i.title, content) | |
| end |
🧰 Tools
🪛 Brakeman (7.1.1)
[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/
(File Access)
🤖 Prompt for AI Agents
In app/jobs/scheduled/poll_feed.rb around lines 24 to 37, the code assumes
i.content is present and calls i.content.scrub which will raise NoMethodError
when content is nil; update the logic to guard against nil by deriving a string
first (e.g., content_str = (i.content || i.description || i.summary).to_s), then
call scrub and CGI.unescapeHTML on that string before passing to
TopicEmbed.import, so you always call scrub on a String and fall back to
description/summary when content is missing.
| def self.import(user, url, title, contents) | ||
| return unless url =~ /^https?\:\/\// | ||
|
|
||
| contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Potential XSS vulnerability in URL interpolation.
The URL is directly interpolated into HTML without escaping, which could allow XSS attacks if the URL contains malicious characters or JavaScript. Even though the content uses cook_method: :raw_html, user-controlled URLs should be escaped.
Apply proper HTML escaping:
- contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+ require 'cgi'
+ escaped_url = CGI.escapeHTML(url)
+ contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n" | |
| require 'cgi' | |
| escaped_url = CGI.escapeHTML(url) | |
| contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n" |
🤖 Prompt for AI Agents
In app/models/topic_embed.rb around line 13, the URL is interpolated into raw
HTML which allows XSS; escape the URL before constructing the anchor tag and
ensure both the href and the link text are HTML-escaped (e.g. via
ERB::Util.html_escape or Rails’ h helper) and then mark the final string as safe
HTML if needed for cook_method: :raw_html. Replace the direct interpolation with
code that escapes url for attribute and content, composes the <a> tag from those
escaped values, and only then returns the HTML-safe string.
| def self.import_remote(user, url, opts=nil) | ||
| require 'ruby-readability' | ||
|
|
||
| opts = opts || {} | ||
| doc = Readability::Document.new(open(url).read, | ||
| tags: %w[div p code pre h1 h2 h3 b em i strong a img], | ||
| attributes: %w[href src]) | ||
|
|
||
| TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: SSRF vulnerability using open(url).read.
Using Ruby's open(url).read to fetch remote content is a severe security vulnerability. The open method (via Kernel#open) can:
- Access local files using
file://URLs - Execute commands via pipe syntax (
|command) - Access internal network resources
- Bypass firewalls and access controls
This allows attackers to:
- Read sensitive files from the server filesystem
- Access internal services not exposed to the internet
- Execute arbitrary commands (in some configurations)
Replace with a safe HTTP client and validate the URL scheme:
def self.import_remote(user, url, opts=nil)
require 'ruby-readability'
+ require 'open-uri'
+
+ # Validate URL scheme
+ uri = URI.parse(url)
+ unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
+ raise ArgumentError, 'Only HTTP(S) URLs are allowed'
+ end
opts = opts || {}
- doc = Readability::Document.new(open(url).read,
+ # Use OpenURI with safe options or better, use Net::HTTP directly
+ content = URI.open(url, redirect: true, read_timeout: 10).read
+ doc = Readability::Document.new(content,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
endBetter yet, use a dedicated HTTP client library with proper timeouts and safeguards:
require 'net/http'
uri = URI.parse(url)
raise ArgumentError, 'Only HTTP(S) URLs are allowed' unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
# Add additional checks to prevent access to internal networks
if uri.host.match?(/^(127\.|10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/)
raise ArgumentError, 'Access to internal networks not allowed'
end
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 5
http.read_timeout = 10
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
content = response.body🤖 Prompt for AI Agents
In app/models/topic_embed.rb around lines 44 to 53, the use of open(url).read
introduces a critical SSRF and command/file access risk; replace it by parsing
and validating the URL (allow only http/https), resolve the host and block
private/internal IP ranges, use a dedicated HTTP client with connection and read
timeouts and redirect limits, perform the GET request and check for successful
200 responses before passing the body to Readability, and handle/network
exceptions by raising or logging a clear error instead of calling open directly.
| <%- if @topic_view.posts.present? %> | ||
| <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %> | ||
| <%- else %> | ||
| <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %> | ||
| <%- end if %> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix invalid ERB block terminator (<%- end if %>)
<%- end if %> is not valid ERB/Ruby syntax and will cause the template to fail to render. The if should only appear on the opening tag; the closing tag must be just end:
- <%- if @topic_view.posts.present? %>
- <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
- <%- else %>
- <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
- <%- end if %>
+ <%- if @topic_view.posts.present? %>
+ <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+ <%- else %>
+ <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+ <%- end %>(Optionally, you might also add a short alt attribute to the avatar <img> for accessibility, but that’s non-blocking.)
🤖 Prompt for AI Agents
In app/views/embed/best.html.erb around lines 2 to 6 the ERB closing tag is
invalid (`<%- end if %>`); replace it with a proper ERB block terminator (`<%-
end %>`) so the conditional closes correctly, keep the existing opening `<%- if
@topic_view.posts.present? %>` and associated link_to branches unchanged, and
optionally add a short alt attribute to any avatar <img> in this template for
accessibility.
| @@ -0,0 +1,13 @@ | |||
| class CreateTopicEmbeds < ActiveRecord::Migration | |||
| def change | |||
| create_table :topic_embeds, force: true do |t| | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Verify the necessity of force: true to prevent unintended data loss.
Using force: true will drop the existing topic_embeds table before recreating it, which can cause data loss if the table already exists with data.
Consider removing force: true unless you're certain the table should be dropped:
- create_table :topic_embeds, force: true do |t|
+ create_table :topic_embeds do |t|If this is an initial migration for a new feature, force: true may be acceptable during development, but should be removed before merging to production.
🤖 Prompt for AI Agents
In db/migrate/20131217174004_create_topic_embeds.rb around line 3, the migration
uses create_table :topic_embeds, force: true which will drop and recreate the
table and can cause unintended data loss; remove the force: true option (or
replace with if_not_exists: true if supported) so the migration only creates the
table when absent, and only keep force: true temporarily during local
development if you intentionally want destructive behavior.
| class CreateTopTopics < ActiveRecord::Migration | ||
| def change | ||
| create_table :top_topics do |t| | ||
| create_table :top_topics, force: true do |t| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
Verify the necessity of force: true to prevent unintended data loss.
Using force: true will drop the existing top_topics table before recreating it, which can cause data loss if the table already exists with data.
This option should only be used during initial development or when you're certain the table should be dropped. For production migrations, consider removing force: true or using a more explicit approach:
- create_table :top_topics, force: true do |t|
+ drop_table :top_topics if table_exists?(:top_topics)
+ create_table :top_topics do |t|Or simply:
- create_table :top_topics, force: true do |t|
+ create_table :top_topics do |t|Verify whether this table is expected to exist already and whether data loss is acceptable:
🏁 Script executed:
#!/bin/bash
# Check if there are other references to top_topics table in migrations
rg -n "top_topics" --type=rb -g "db/migrate/*.rb"Length of output: 26
I'll run a corrected verification script to check for references to the top_topics table in migrations.
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for top_topics references in migration files (without file type spec)
echo "=== Searching for top_topics in db/migrate directory ==="
rg -n "top_topics" db/migrate/
echo -e "\n=== Checking the specific migration file ==="
cat db/migrate/20131223171005_create_top_topics.rb
Length of output: 1120
Remove force: true from the initial table creation migration.
This is the initial top_topics table creation (migration 20131223171005), yet it includes force: true, which will drop any existing table before recreating it. While the table shouldn't exist on first run, this option creates unnecessary risk if migrations are re-run (e.g., during database resets or redeployment), causing data loss. A subsequent migration (20131227164338) adds columns to this table, confirming the expected table persistence.
Simply remove force: true:
- create_table :top_topics, force: true do |t|
+ create_table :top_topics do |t|📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| create_table :top_topics, force: true do |t| | |
| create_table :top_topics do |t| |
🤖 Prompt for AI Agents
In db/migrate/20131223171005_create_top_topics.rb around line 3, remove the
force: true option from the create_table call so it reads create_table
:top_topics do |t|; this prevents the migration from dropping an existing table,
keep the rest of the block unchanged and run the migration suite to verify no
schema conflicts.
| if $redis.setnx(retrieved_key, "1") | ||
| $redis.expire(retrieved_key, 60) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Non-atomic Redis operations create race condition.
The setnx followed by expire is not atomic. If the process crashes or is interrupted between these two calls, the Redis key will never expire, causing permanent throttling for that embed URL. Additionally, there's a small window where another process could check the key before the TTL is set.
Use the atomic SET command with EX option:
- if $redis.setnx(retrieved_key, "1")
- $redis.expire(retrieved_key, 60)
+ if $redis.set(retrieved_key, "1", ex: 60, nx: true)
return false
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if $redis.setnx(retrieved_key, "1") | |
| $redis.expire(retrieved_key, 60) | |
| if $redis.set(retrieved_key, "1", ex: 60, nx: true) | |
| return false | |
| end |
🤖 Prompt for AI Agents
In lib/topic_retriever.rb around lines 27-28, the current two-step use of setnx
followed by expire is non-atomic and risks leaving the key without a TTL;
replace those two calls with a single atomic Redis SET using NX and EX (i.e.,
set the key only if not exists and set the TTL in the same command) and branch
on the SET return value so you only proceed when the key was actually created;
if your Redis client doesn't support high-level options, invoke the equivalent
raw SET command with NX and EX to achieve atomicity.
Test 4
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.
Replicated from ai-code-review-evaluation/discourse-coderabbit#4