author | Giuseppe Bilotta <giuseppe.bilotta@gmail.com> | 2007-02-20 23:02:35 +0000 |
---|---|---|
committer | Giuseppe Bilotta <giuseppe.bilotta@gmail.com> | 2007-02-20 23:02:35 +0000 |
commit | 397b61df257f72a8ce90792985f76497ba735da4 (patch) | |
tree | 7b8321eab08498376d537178ebe7ed57dfc23713 | |
parent | 1572836f8c2888742b4f65da7dc6f66735f94bc1 (diff) |
Use ASCII KCODE to prevent problems like missing characters or matching failures when clients send messages in something other than UTF-8
-rw-r--r-- | ChangeLog | 10 |
-rwxr-xr-x | bin/rbot | 11 |
-rw-r--r-- | lib/rbot/rfc2812.rb | 4 |
3 files changed, 22 insertions, 3 deletions
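The problem named in the commit message can be reproduced on a current Ruby, where `$KCODE` no longer has any effect. The sketch below is illustrative only: `encoding_report` is a hypothetical helper, not part of rbot, and it assumes the bytes quoted in the ChangeLog are ISO-8859-1 (Latin-1), which matches the "vowels with a grave accent" description. Note that `\340` (0xE0) is a UTF-8 lead byte announcing a three-byte sequence, so a UTF-8-aware reader does not see five separate characters here.

```ruby
# Hypothetical helper for illustration; not part of rbot.
# Reports how the same bytes look under different encoding assumptions.
def encoding_report(raw)
  {
    # Tag a copy as UTF-8 and check whether the bytes are well-formed UTF-8.
    valid_utf8: raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?,
    # The byte count never depends on the encoding tag.
    byte_count: raw.bytesize,
    # Assumption: the sender used ISO-8859-1. Transcode the bytes on that basis.
    from_latin1: raw.b.encode("UTF-8", "ISO-8859-1")
  }
end

# The byte sequence quoted in the ChangeLog, written with octal escapes.
report = encoding_report("\340\350\354\362\371")
puts report[:valid_utf8]   # false
puts report[:byte_count]   # 5
puts report[:from_latin1]  # àèìòù
```

Because the bot cannot know which encoding a given client used, treating the message as plain bytes is the only interpretation that never rejects or truncates input.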
diff --git a/ChangeLog b/ChangeLog
--- a/ChangeLog
+++ b/ChangeLog
@@ -6,6 +6,16 @@
 <yaohan.chen@gmail.com>. People take turns to continue a chain of words
 by saying words that begin with the final letter(s) of the previous
 word.
+ * IRC messages are not UTF-8: Most of the string processing across
+ rbot is done against IRC messages, which do not have a well-defined
+ encoding. Although many clients are now using UTF-8, there is no
+ guarantee that an arbitrary string received from IRC will be UTF-8
+ encoded. We have to force ASCII (byte-wise/charset agnostic) matching
+ because otherwise some strings can give problems: in particular, for
+ example, the bytesequence "\340\350\354\362\371" (that is the aeiou
+ vowels, each with a grave accent) will cause the string to be
+ considered up to the "\354" (i with grave accent) only: so either the
+ rest of the message is ignored, or the matching fails.

 2007-02-18 Giuseppe Bilotta <giuseppe.bilotta@gmail.com>

diff --git a/bin/rbot b/bin/rbot
--- a/bin/rbot
+++ b/bin/rbot
@@ -21,7 +21,16 @@
 # IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

-$KCODE = 'u'
+# Most of the string processing across rbot is done against IRC messages, which
+# do not have a well-defined encoding. Although many clients are now using
+# UTF-8, there is no guarantee that an arbitrary string received from IRC will
+# be UTF-8 encoded. We have to force ASCII (byte-wise/charset agnostic)
+# matching because otherwise some strings can give problems: in particular, for
+# example, the bytesequence "\340\350\354\362\371" (that is the aeiou vowels,
+# each with a grave accent) will cause the string to be considered up to the
+# "\354" (i with grave accent) only: so either the rest of the message is
+# ignored, or the matching fails.
+$KCODE = 'a'
 $VERBOSE=true

diff --git a/lib/rbot/rfc2812.rb b/lib/rbot/rfc2812.rb
index 5dec464c..97181b03 100644
--- a/lib/rbot/rfc2812.rb
+++ b/lib/rbot/rfc2812.rb
@@ -888,8 +888,8 @@ module Irc
       data = Hash.new
       data[:serverstring] = serverstring

-      unless serverstring =~ /^(:(\S+)\s)?(\S+)(\s(.*))?/
-        raise "Unparseable Server Message!!!: #{serverstring}"
+      unless serverstring.chomp =~ /^(:(\S+)\s)?(\S+)(\s(.*))?$/
+        raise "Unparseable Server Message!!!: #{serverstring.inspect}"
       end

       prefix, command, params = $2, $3, $5
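For readers on Ruby 1.9 or later, where `$KCODE` is ignored, the same byte-wise parsing can be approximated per string rather than globally. The following is a minimal sketch under stated assumptions, not the rbot implementation: `parse_server_line` and `SERVER_LINE` are hypothetical names, the pattern is copied from the patched line in `lib/rbot/rfc2812.rb`, and `String#b` plus the `/n` regexp flag stand in for `$KCODE = 'a'`.

```ruby
# Hypothetical names for illustration; not part of rbot.
# Same pattern as the patched rfc2812.rb line, with /n so the regexp
# itself is not tied to a multi-byte encoding.
SERVER_LINE = /^(:(\S+)\s)?(\S+)(\s(.*))?$/n

def parse_server_line(serverstring)
  # String#b returns a copy tagged ASCII-8BIT (binary), so every byte is
  # matched on its own and invalid UTF-8 cannot truncate or break the match.
  line = serverstring.b.chomp
  unless line =~ SERVER_LINE
    # inspect escapes control characters and non-ASCII bytes, which keeps
    # the error message readable whatever the client sent.
    raise "Unparseable Server Message!!!: #{serverstring.inspect}"
  end
  { prefix: $2, command: $3, params: $5 }
end

# A message whose trailing text is the Latin-1 byte sequence from the ChangeLog.
raw = ":nick!user@host PRIVMSG #chan :\340\350\354\362\371\r\n"
msg = parse_server_line(raw)
puts msg[:prefix]           # nick!user@host
puts msg[:command]          # PRIVMSG
puts msg[:params].bytesize  # 12
```

The `chomp` before matching mirrors the patch: it strips the trailing line terminator so that the newly added `$` anchor applies to the end of the message itself.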