rss plugin: prevent double UTF-8 decoding
author Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Thu, 20 Nov 2008 14:17:27 +0000 (15:17 +0100)
committer Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Thu, 20 Nov 2008 14:27:24 +0000 (15:27 +0100)
The rss parser looks at the encoding specified in the XML file and
converts everything to UTF-8. Since we do the UTF-8 conversion
ourselves, monkey-patch the XML 'encoding' declaration to claim it's
UTF-8 already (as it actually is).
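
A minimal standalone sketch of the declaration rewrite described above,
assuming the feed text has already been converted to UTF-8 by the caller.
The helper name force_utf8_declaration and the sample feed string are
hypothetical and are not part of rbot; the regular expressions mirror the
ones used in the patch.

```ruby
# Hypothetical helper (not part of rbot): rewrite the encoding attribute of
# the XML declaration so that a parser treats the text as UTF-8 already.
# Assumes the caller has converted the document body to UTF-8 beforehand.
def force_utf8_declaration(xml)
  # Only the first <?xml ... ?> declaration is touched.
  xml.sub(/<\?xml (.*?)\?>/) do |decl|
    # Replace the quoted encoding value, whichever quote style was used.
    # A declaration without an encoding attribute is returned unchanged.
    decl.sub(/\bencoding=(['"])(?:.*?)\1/, 'encoding="UTF-8"')
  end
end

sample = %(<?xml version="1.0" encoding='ISO-8859-1'?><rss/>)
puts force_utf8_declaration(sample)
```

Unlike the patch, which edits the string in place with sub!, this sketch
returns a new string, which makes it easier to try in isolation.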

data/rbot/plugins/rss.rb

index 9e85b416bb6bc6db5a50994008653e29a13c5087..45ee4a2300aa866d8d754be630cd489f79e1d5a5 100644 (file)
@@ -1092,6 +1092,13 @@ class RSSFeedsPlugin < Plugin
     # reassign the 0.9 RDFs to 1.0, and hope it goes right.
     xml.gsub!("xmlns=\"http://my.netscape.com/rdf/simple/0.9/\"",
               "xmlns=\"http://purl.org/rss/1.0/\"")
+    # make sure the parser doesn't double-convert in case the feed is not UTF-8
+    xml.sub!(/<\?xml (.*?)\?>/) do |match|
+      if /\bencoding=(['"])(.*?)\1/.match(match)
+        match.sub!(/\bencoding=(['"])(?:.*?)\1/,'encoding="UTF-8"')
+      end
+      match
+    end
     feed.mutex.synchronize do
       feed.xml = xml
     end