
My Ruby external-encoding hack

Reading text files from disk is very easy in Ruby:

  content = File.read("filename.txt")

When you try to do something with that content, though, you can get into trouble:

  content.split(",")  # => ArgumentError: invalid byte sequence in UTF-8
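To reproduce the error without a file on disk, here is a small sketch that builds the kind of string a read returns for a Latin-1 file when Ruby's default external encoding is UTF-8: Latin-1 bytes labelled as UTF-8. The sample text is made up for illustration.

```ruby
# Latin-1 bytes ("café,naïve" with é = 0xE9, ï = 0xEF) mislabelled as UTF-8,
# which is what reading a Latin-1 file under a UTF-8 default gives you.
content = "caf\xE9,na\xEFve".b.force_encoding("UTF-8")

puts content.encoding        # prints "UTF-8"
puts content.valid_encoding? # prints "false"

# split refuses to work on a string whose bytes are broken for its encoding.
error = begin
  content.split(",")
  nil
rescue ArgumentError => e
  e
end

puts error.message # prints "invalid byte sequence in UTF-8"
```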

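I've set up my environment so that Ruby treats external files as UTF-8 (Encoding.default_external is UTF-8). The trouble begins when you handle files that are actually encoded in ISO-8859-1 or CP1252 (Windows-1252): Ruby still labels their contents as UTF-8, and their accented characters are usually not valid UTF-8 byte sequences.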
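Which encoding Ruby assumes is controlled by Encoding.default_external. When you already know that a particular file is Latin-1, Ruby can also transcode it while reading, using the "external:internal" form of the encoding option. A minimal sketch, assuming a throwaway sample file created with Tempfile (its name and contents are invented for the example):

```ruby
require "tempfile"

# What Ruby assumes for file contents when no encoding is given.
# The value depends on your locale and settings (e.g. UTF-8).
puts Encoding.default_external

# Write a small Latin-1 sample file: "café" with é stored as byte 0xE9.
file = Tempfile.new("latin1-sample")
file.binmode
file.write("caf\xE9".b)
file.close

# Declare the file's real encoding (ISO-8859-1) and ask Ruby to transcode
# to UTF-8 while reading.
text = File.read(file.path, encoding: "ISO-8859-1:UTF-8")

puts text.encoding   # prints "UTF-8"
puts text == "café"  # prints "true"

file.unlink
```

This only helps when the encoding is known up front; the hack below is for the case where it is not.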

To accept both UTF-8 and ISO-8859-1 (Latin-1) files, I've implemented the following hack:

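  # Returns content as UTF-8. If its bytes are not valid in the string's
  # current encoding (UTF-8 here), they are assumed to be ISO-8859-1.
  def convert_to_utf8(content)
    if content.valid_encoding?
      content
    else
      # encode(dst, src) reads the bytes as ISO-8859-1 and returns a new
      # UTF-8 string, without mutating content the way force_encoding does.
      content.encode("UTF-8", "ISO-8859-1")
    end
  end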

  # reading the content:
  content = convert_to_utf8(File.read("filename.txt"))
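To see both branches of the hack in action, here is a self-contained check that uses in-memory strings instead of a file. The function is repeated so the snippet runs on its own, and the sample words are made up.

```ruby
# The convert_to_utf8 hack, repeated so this snippet runs on its own.
def convert_to_utf8(content)
  if content.valid_encoding?
    content
  else
    content.force_encoding("ISO-8859-1").encode("UTF-8")
  end
end

# Valid UTF-8 input is passed through untouched.
utf8 = convert_to_utf8("na\u00EFve caf\u00E9")

# Latin-1 bytes mislabelled as UTF-8 get reinterpreted and transcoded.
latin1 = convert_to_utf8("na\xEFve caf\xE9".b.force_encoding("UTF-8"))

puts utf8 == latin1   # prints "true"
puts latin1.encoding  # prints "UTF-8"

# split now works instead of raising ArgumentError.
words = latin1.split(" ")
puts words.length     # prints "2"
```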

This hack works for me because the text files I deal with are always in one of those two encodings: anything that is not valid UTF-8 is simply assumed to be ISO-8859-1.
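One caveat for CP1252 files: the hack decodes them as ISO-8859-1, and the two encodings only differ in the byte range 0x80 to 0x9F, where CP1252 has printable characters (the euro sign, curly quotes, dashes) and ISO-8859-1 has C1 control characters. A small sketch of that difference, using the euro sign as the example byte:

```ruby
# Byte 0x80 is the euro sign in CP1252 (Windows-1252), but the C1 control
# character U+0080 in ISO-8859-1.
euro_byte = "\x80".b

# dup so each conversion works on its own copy of the byte.
as_latin1 = euro_byte.dup.force_encoding("ISO-8859-1").encode("UTF-8")
as_cp1252 = euro_byte.dup.force_encoding("Windows-1252").encode("UTF-8")

puts as_latin1 == "\u0080" # prints "true"
puts as_cp1252 == "\u20AC" # prints "true" (the euro sign)
```

If your files are really CP1252, decoding them as "Windows-1252" in the fallback branch may be the better choice; I have not checked how that variant behaves on every byte value.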