4 changes: 4 additions & 0 deletions Gemfile
@@ -206,6 +206,10 @@ gem 'unicorn', require: false
gem 'puma', require: false
gem 'rbtrace', require: false

# required for feed importing and embedding
gem 'ruby-readability', require: false
gem 'simple-rss', require: false

# perftools only works on 1.9 atm
group :profile do
# travis refuses to install this, instead of fuffing, just avoid it for now
7 changes: 7 additions & 0 deletions Gemfile_rails4.lock
@@ -117,6 +117,7 @@ GEM
fspath (2.0.5)
given_core (3.1.1)
sorcerer (>= 0.3.7)
guess_html_encoding (0.0.9)
handlebars-source (1.1.2)
hashie (2.0.5)
highline (1.6.20)
@@ -309,6 +310,9 @@ GEM
rspec-mocks (~> 2.14.0)
ruby-hmac (0.4.0)
ruby-openid (2.3.0)
ruby-readability (0.5.7)
guess_html_encoding (>= 0.0.4)
nokogiri (>= 1.4.2)
sanitize (2.0.6)
nokogiri (>= 1.4.4)
sass (3.2.12)
@@ -337,6 +341,7 @@ GEM
celluloid (>= 0.14.1)
ice_cube (~> 0.11.0)
sidekiq (~> 2.15.0)
simple-rss (1.3.1)
simplecov (0.7.1)
multi_json (~> 1.0)
simplecov-html (~> 0.7.1)
@@ -466,6 +471,7 @@ DEPENDENCIES
rinku
rspec-given
rspec-rails
ruby-readability
sanitize
sass
sass-rails
@@ -474,6 +480,7 @@ DEPENDENCIES
sidekiq (= 2.15.1)
sidekiq-failures
sidetiq (>= 0.3.6)
simple-rss
simplecov
sinatra
slim
27 changes: 27 additions & 0 deletions app/assets/javascripts/embed.js
@@ -0,0 +1,27 @@
/* global discourseUrl */
/* global discourseEmbedUrl */
(function() {

var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
iframe.id = 'discourse-embed-frame';
iframe.width = "100%";
iframe.frameBorder = "0";
iframe.scrolling = "no";
comments.appendChild(iframe);


function postMessageReceived(e) {
if (!e) { return; }
if (discourseUrl.indexOf(e.origin) === -1) { return; }

if (e.data) {
if (e.data.type === 'discourse-resize' && e.data.height) {
iframe.height = e.data.height + "px";
}
}
}
window.addEventListener('message', postMessageReceived, false);

})();
69 changes: 69 additions & 0 deletions app/assets/stylesheets/embed.css.scss
@@ -0,0 +1,69 @@
//= require ./vendor/normalize
//= require ./common/foundation/base

article.post {
border-bottom: 1px solid #ddd;

.post-date {
float: right;
color: #aaa;
font-size: 12px;
margin: 4px 4px 0 0;
}

.author {
padding: 20px 0;
width: 92px;
float: left;

text-align: center;

h3 {
text-align: center;
color: #4a6b82;
font-size: 13px;
margin: 0;
}
}

.cooked {
padding: 20px 0;
margin-left: 92px;

p {
margin: 0 0 1em 0;
}
}
}

header {
padding: 10px 10px 20px 10px;

font-size: 18px;

border-bottom: 1px solid #ddd;
}

footer {
font-size: 18px;

.logo {
margin-right: 10px;
margin-top: 10px;
}

a[href].button {
margin: 10px 0 0 10px;
}
}

.logo {
float: right;
max-height: 30px;
}

a[href].button {
background-color: #eee;
padding: 5px;
display: inline-block;
}
34 changes: 34 additions & 0 deletions app/controllers/embed_controller.rb
@@ -0,0 +1,34 @@
class EmbedController < ApplicationController
skip_before_filter :check_xhr
skip_before_filter :preload_json
before_filter :ensure_embeddable

layout 'embed'

def best
embed_url = params.require(:embed_url)

🛠️ Refactor suggestion | 🟠 Major

Validate embed_url parameter format.

The embed_url parameter is required but its format is never validated. A malformed or non-HTTP(S) value is passed straight to TopicEmbed.topic_id_for_embed and, when no topic exists yet, to the retrieve_topic job.

Consider adding URL format validation:

     embed_url = params.require(:embed_url)
+    unless embed_url =~ /\A#{URI::regexp(['http', 'https'])}\z/
+      raise ActionController::BadRequest, 'Invalid embed_url format'
+    end
🤖 Prompt for AI Agents
In app/controllers/embed_controller.rb around line 9, the required embed_url
param is not validated for URL format; update the action to parse and validate
embed_url after params.require(:embed_url) by attempting to build a URI (or use
URI.parse) and confirm it has an http or https scheme and a non-empty host,
rescuing parse errors; if validation fails, return a 400/422 response with a
clear error message and do not proceed with downstream processing.
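
The parse-and-check approach described in the prompt above could look roughly like the sketch below. It is only an illustration: valid_embed_url? is a hypothetical helper name that does not appear in the PR, and how the controller responds to a false result (400 vs. 422) is left open.

```ruby
require 'uri'

# Hypothetical helper (not part of the PR): true only for absolute
# http/https URLs that carry a non-empty host.
def valid_embed_url?(value)
  uri = URI.parse(value.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end
```

In the controller, the action would call this right after params.require(:embed_url) and stop early when it returns false.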

topic_id = TopicEmbed.topic_id_for_embed(embed_url)

if topic_id
@topic_view = TopicView.new(topic_id, current_user, {best: 5})
else
Jobs.enqueue(:retrieve_topic, user_id: current_user.try(:id), embed_url: embed_url)
render 'loading'
end

discourse_expires_in 1.minute
end

private

def ensure_embeddable
raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host

response.headers['X-Frame-Options'] = "ALLOWALL"
rescue URI::InvalidURIError
raise Discourse::InvalidAccess.new('invalid referer host')
end


end
24 changes: 24 additions & 0 deletions app/jobs/regular/retrieve_topic.rb
@@ -0,0 +1,24 @@
require_dependency 'email/sender'
require_dependency 'topic_retriever'

module Jobs

# Asynchronously retrieve a topic from an embedded site
class RetrieveTopic < Jobs::Base

def execute(args)
raise Discourse::InvalidParameters.new(:embed_url) unless args[:embed_url].present?

user = nil
if args[:user_id]
user = User.where(id: args[:user_id]).first
end

TopicRetriever.new(args[:embed_url], no_throttle: user.try(:staff?)).retrieve
end

end

end


41 changes: 41 additions & 0 deletions app/jobs/scheduled/poll_feed.rb
@@ -0,0 +1,41 @@
#
# Creates and Updates Topics based on an RSS or ATOM feed.
#
require 'digest/sha1'
require_dependency 'post_creator'
require_dependency 'post_revisor'
require 'open-uri'

module Jobs
class PollFeed < Jobs::Scheduled
recurrence { hourly }
sidekiq_options retry: false

def execute(args)
poll_feed if SiteSetting.feed_polling_enabled? &&
SiteSetting.feed_polling_url.present? &&
SiteSetting.embed_by_username.present?
end

def feed_key
@feed_key ||= "feed-modified:#{Digest::SHA1.hexdigest(SiteSetting.feed_polling_url)}"
end

def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?

require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//

content = CGI.unescapeHTML(i.content.scrub)
TopicEmbed.import(user, url, i.title, content)
end
Comment on lines +24 to +37

⚠️ Potential issue | 🟠 Major

Guard against missing content in RSS items to avoid job crashes

Many feeds don’t populate a content field on items; in those cases i.content will be nil, and i.content.scrub will raise a NoMethodError, causing the whole PollFeed run to fail.

Consider falling back to description/summary and always calling scrub on a string:

-      rss.items.each do |i|
-        url = i.link
-        url = i.id if url.blank? || url !~ /^https?\:\/\//
-
-        content = CGI.unescapeHTML(i.content.scrub)
-        TopicEmbed.import(user, url, i.title, content)
-      end
+      rss.items.each do |i|
+        url = i.link
+        url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+        raw_content = i.content || i.description || i.summary || ""
+        content = CGI.unescapeHTML(raw_content.to_s.scrub)
+        TopicEmbed.import(user, url, i.title, content)
+      end

(Separately, if you expect untrusted or variable feed_polling_url values, adding basic scheme/host checks and network timeouts around open would further harden this job.)

🧰 Tools
🪛 Brakeman (7.1.1)

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

🤖 Prompt for AI Agents
In app/jobs/scheduled/poll_feed.rb around lines 24 to 37, the code assumes
i.content is present and calls i.content.scrub which will raise NoMethodError
when content is nil; update the logic to guard against nil by deriving a string
first (e.g., content_str = (i.content || i.description || i.summary).to_s), then
call scrub and CGI.unescapeHTML on that string before passing to
TopicEmbed.import, so you always call scrub on a String and fall back to
description/summary when content is missing.
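
As a small standalone illustration of the fallback described above, the sketch below uses a Struct in place of the items SimpleRSS returns. FeedItem and item_html are hypothetical names introduced for this example only.

```ruby
require 'cgi'

# Stand-in for a parsed feed item; the PR gets these from SimpleRSS.
FeedItem = Struct.new(:content, :description, :summary)

# Hypothetical helper: pick the first populated body field, coerce it to a
# String, replace invalid byte sequences, then decode HTML entities.
def item_html(item)
  raw = item.content || item.description || item.summary
  CGI.unescapeHTML(raw.to_s.scrub)
end
```

Calling to_s before scrub means an item with no body at all yields an empty string instead of raising NoMethodError.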

end

end
end
9 changes: 9 additions & 0 deletions app/models/post.rb
@@ -60,6 +60,10 @@ def self.types
@types ||= Enum.new(:regular, :moderator_action)
end

def self.cook_methods
@cook_methods ||= Enum.new(:regular, :raw_html)
end

def self.find_by_detail(key, value)
includes(:post_details).where(post_details: { key: key, value: value }).first
end
@@ -124,6 +128,11 @@ def post_analyzer
end

def cook(*args)
# For some posts, for example those imported via RSS, we support raw HTML. In that
# case we can skip the rendering pipeline.
return raw if cook_method == Post.cook_methods[:raw_html]

# Default is to cook posts
Plugin::Filter.apply(:after_post_cook, self, post_analyzer.cook(*args))
end

82 changes: 82 additions & 0 deletions app/models/topic_embed.rb
@@ -0,0 +1,82 @@
require_dependency 'nokogiri'

class TopicEmbed < ActiveRecord::Base
belongs_to :topic
belongs_to :post
validates_presence_of :embed_url
validates_presence_of :content_sha1

# Import an article from a source (RSS/Atom/Other)
def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//

contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"

⚠️ Potential issue | 🔴 Critical

Critical: Potential XSS vulnerability in URL interpolation.

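The URL is interpolated into HTML without escaping, so a URL containing a quote or angle bracket can break out of the href attribute and inject markup. The /^https?:\/\// guard at the top of import does not prevent this, because it only checks how a line of the string starts. Posts created with the raw_html cook method skip the rendering pipeline, so nothing in the code shown here escapes the value later; the URL should be escaped at this point.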

Apply proper HTML escaping:

-    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+    require 'cgi'
+    escaped_url = CGI.escapeHTML(url)
+    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"
🤖 Prompt for AI Agents
In app/models/topic_embed.rb around line 13, the URL is interpolated into raw
HTML which allows XSS; escape the URL before constructing the anchor tag and
ensure both the href and the link text are HTML-escaped (e.g. via
ERB::Util.html_escape or Rails’ h helper) and then mark the final string as safe
HTML if needed for cook_method: :raw_html. Replace the direct interpolation with
code that escapes url for attribute and content, composes the <a> tag from those
escaped values, and only then returns the HTML-safe string.
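
To make the effect of the escaping concrete, here is a minimal sketch of the link construction on its own. imported_from_link is a hypothetical helper name; the PR builds the string inline, and the I18n wrapper is omitted here.

```ruby
require 'cgi'

# Hypothetical helper: build the "imported from" link with the URL escaped
# for both the attribute value and the visible text.
def imported_from_link(url)
  escaped = CGI.escapeHTML(url)
  "<a href='#{escaped}'>#{escaped}</a>"
end
```

CGI.escapeHTML encodes single quotes as well as angle brackets and ampersands, which matters here because the href attribute is single-quoted.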


embed = TopicEmbed.where(embed_url: url).first
content_sha1 = Digest::SHA1.hexdigest(contents)
post = nil

# If there is no embed, create a topic, post and the embed.
if embed.blank?
Topic.transaction do
creator = PostCreator.new(user, title: title, raw: absolutize_urls(url, contents), skip_validations: true, cook_method: Post.cook_methods[:raw_html])
post = creator.create
if post.present?
TopicEmbed.create!(topic_id: post.topic_id,
embed_url: url,
content_sha1: content_sha1,
post_id: post.id)
end
end
else
post = embed.post
# Update the topic if it changed
if content_sha1 != embed.content_sha1
revisor = PostRevisor.new(post)
revisor.revise!(user, absolutize_urls(url, contents), skip_validations: true, bypass_rate_limiter: true)
embed.update_column(:content_sha1, content_sha1)
end
end

post
end

def self.import_remote(user, url, opts=nil)
require 'ruby-readability'

opts = opts || {}
doc = Readability::Document.new(open(url).read,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])

TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
end
Comment on lines +44 to +53

⚠️ Potential issue | 🔴 Critical

Critical: SSRF vulnerability using open(url).read.

Using Ruby's open(url).read to fetch remote content is a severe security vulnerability. The open method (via Kernel#open) can:

  1. Read local files when given a plain filesystem path instead of a URL
  2. Execute commands via pipe syntax (|command)
  3. Access internal network resources
  4. Reach services that sit behind a firewall and trust requests from this server

This allows attackers to:

  • Read sensitive files from the server filesystem
  • Access internal services not exposed to the internet
  • Execute arbitrary commands (in some configurations)

As a first step, validate the URL scheme before fetching and set a read timeout:

 def self.import_remote(user, url, opts=nil)
   require 'ruby-readability'
+  require 'open-uri'
+  
+  # Validate URL scheme
+  uri = URI.parse(url)
+  unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
+    raise ArgumentError, 'Only HTTP(S) URLs are allowed'
+  end
 
   opts = opts || {}
-  doc = Readability::Document.new(open(url).read,
+  # Use OpenURI with safe options or better, use Net::HTTP directly
+  content = URI.open(url, redirect: true, read_timeout: 10).read
+  doc = Readability::Document.new(content,
                                       tags: %w[div p code pre h1 h2 h3 b em i strong a img],
                                       attributes: %w[href src])
 
   TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
 end

Better yet, use a dedicated HTTP client library with proper timeouts and safeguards:

require 'net/http'

uri = URI.parse(url)
raise ArgumentError, 'Only HTTP(S) URLs are allowed' unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)

# Reject hosts written as internal IP literals. This does not catch a hostname
# that resolves to an internal address; the host would need to be resolved first.
if uri.host.match?(/^(127\.|10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/)
  raise ArgumentError, 'Access to internal networks not allowed'
end

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https')
http.open_timeout = 5
http.read_timeout = 10
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
content = response.body
🤖 Prompt for AI Agents
In app/models/topic_embed.rb around lines 44 to 53, the use of open(url).read
introduces a critical SSRF and command/file access risk; replace it by parsing
and validating the URL (allow only http/https), resolve the host and block
private/internal IP ranges, use a dedicated HTTP client with connection and read
timeouts and redirect limits, perform the GET request and check for successful
200 responses before passing the body to Readability, and handle network
exceptions by raising or logging a clear error instead of calling open directly.
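
The "block private/internal IP ranges" step can be sketched with the standard ipaddr library, as below. This is an assumption-laden illustration rather than the PR's code: internal_address? and BLOCKED_RANGES are hypothetical names, the range list is not exhaustive, and the sketch takes an address string so that it runs without network access. A real check would first resolve the URL's host (for example with Resolv.getaddresses) and test every returned address.

```ruby
require 'ipaddr'

# Ranges treated as internal in this sketch: loopback, RFC 1918 private
# space, and link-local, for IPv4 and IPv6. Illustrative, not exhaustive.
BLOCKED_RANGES = %w[
  127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16
  ::1/128 fc00::/7 fe80::/10
].map { |cidr| IPAddr.new(cidr) }.freeze

# Hypothetical helper: true when the address falls inside a blocked range.
def internal_address?(address)
  ip = IPAddr.new(address)
  BLOCKED_RANGES.any? { |range| range.include?(ip) }
rescue IPAddr::Error
  true # treat anything that does not parse as an address as unsafe
end
```

Checking parsed addresses rather than matching the host string with a regular expression avoids both false positives on hostnames that merely start with "10." and false negatives on hostnames that resolve to internal addresses.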


# Convert any relative URLs to absolute. RSS is annoying for this.
def self.absolutize_urls(url, contents)
uri = URI(url)
prefix = "#{uri.scheme}://#{uri.host}"
prefix << ":#{uri.port}" if uri.port != 80 && uri.port != 443

fragment = Nokogiri::HTML.fragment(contents)
fragment.css('a').each do |a|
href = a['href']
if href.present? && href.start_with?('/')
a['href'] = "#{prefix}/#{href.sub(/^\/+/, '')}"
end
end
fragment.css('img').each do |a|
src = a['src']
if src.present? && src.start_with?('/')
a['src'] = "#{prefix}/#{src.sub(/^\/+/, '')}"
end
end

fragment.to_html
end

def self.topic_id_for_embed(embed_url)
TopicEmbed.where(embed_url: embed_url).pluck(:topic_id).first
end

end
30 changes: 30 additions & 0 deletions app/views/embed/best.html.erb
@@ -0,0 +1,30 @@
<header>
<%- if @topic_view.posts.present? %>
<%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- else %>
<%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- end if %>
Comment on lines +2 to +6

⚠️ Potential issue | 🔴 Critical

Fix invalid ERB block terminator (<%- end if %>)

<%- end if %> is not valid ERB/Ruby syntax and will cause the template to fail to render. The if should only appear on the opening tag; the closing tag must be just end:

-  <%- if @topic_view.posts.present? %>
-    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
-  <%- else %>
-    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
-  <%- end if %>
+  <%- if @topic_view.posts.present? %>
+    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- else %>
+    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+  <%- end %>

(Optionally, you might also add a short alt attribute to the avatar <img> for accessibility, but that’s non-blocking.)

🤖 Prompt for AI Agents
In app/views/embed/best.html.erb around lines 2 to 6 the ERB closing tag is
invalid (`<%- end if %>`); replace it with a proper ERB block terminator (`<%-
end %>`) so the conditional closes correctly, keep the existing opening `<%- if
@topic_view.posts.present? %>` and associated link_to branches unchanged, and
optionally add a short alt attribute to any avatar <img> in this template for
accessibility.
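
For reference, the if / else / end shape recommended above can be exercised outside Rails with the standard erb library. The template below is a simplified stand-in for the header block, and render_header and has_posts are hypothetical names used only in this sketch.

```ruby
require 'erb'

# Simplified stand-in for the header conditional in best.html.erb.
TEMPLATE = <<~ERB_SOURCE
  <% if has_posts %>
  Discussion
  <% else %>
  Start discussion
  <% end %>
ERB_SOURCE

# Hypothetical helper: render the template for a given has_posts value.
def render_header(has_posts)
  ERB.new(TEMPLATE).result_with_hash(has_posts: has_posts)
end
```

Each branch renders only its own link text, and the block closes with a plain end tag.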


<%= link_to(image_tag(SiteSetting.logo_url, class: 'logo'), Discourse.base_url) %>
</header>

<%- if @topic_view.posts.present? %>
<%- @topic_view.posts.each do |post| %>
<article class='post'>
<%= link_to post.created_at.strftime("%e %b %Y"), post.url, class: 'post-date', target: "_blank" %>
<div class='author'>
<img src='<%= post.user.small_avatar_url %>'>
<h3><%= post.user.username %></h3>
</div>
<div class='cooked'><%= raw post.cooked %></div>
<div style='clear: both'></div>
</article>
<%- end %>

<footer>
<%= link_to(I18n.t('embed.continue'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%= link_to(image_tag(SiteSetting.logo_url, class: 'logo'), Discourse.base_url) %>
</footer>

<% end %>
