Reading text files from disk is very easy in Ruby:
content = IO.read( "filename.txt" )
As soon as you try to do something with that content, though, you can run into trouble:
content.split(",") # raises ArgumentError: invalid byte sequence in UTF-8
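To reproduce the error without a file on disk, here is a minimal sketch. It builds a Latin-1 byte string by hand (the sample text "café,crème" is just an assumption for illustration) and labels it as UTF-8, which is effectively what `IO.read` does when the default external encoding is UTF-8:

```ruby
# Raw ISO-8859-1 bytes for "café,crème": é is 0xE9, è is 0xE8.
latin1_bytes = "caf\xE9,cr\xE8me".b

# Relabel the bytes as UTF-8 without changing them, as IO.read would
# under a UTF-8 default external encoding.
content = latin1_bytes.dup.force_encoding(Encoding::UTF_8)

puts content.encoding        # UTF-8
puts content.valid_encoding? # false: 0xE9 followed by "," is not valid UTF-8

begin
  content.split(",")
rescue ArgumentError => e
  puts "split failed: #{e.message}"
end
```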
I've set up my environment so that Ruby treats external files as UTF-8. The trouble begins when you handle files that are encoded in ISO-8859-1 or CP-1252 (Windows-1252) and Ruby assumes they are UTF-8.
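When you know up front that a particular file is Latin-1, you don't have to guess. `File.read` (and `IO.read`) accepts an `encoding:` option in `"external:internal"` form, which reads the bytes as the external encoding and transcodes them to the internal one. A minimal sketch, using a temporary file with made-up contents:

```ruby
require "tempfile"

raw = text = nil
Tempfile.create("latin1") do |file|
  file.binmode
  file.write("caf\xE9".b) # raw ISO-8859-1 bytes for "café"
  file.flush

  # Tag the bytes as UTF-8 without converting them: they stay invalid.
  raw = File.read(file.path, encoding: "UTF-8")

  # Read as ISO-8859-1 and transcode to UTF-8 on the way in.
  text = File.read(file.path, encoding: "ISO-8859-1:UTF-8")
end

puts raw.valid_encoding? # false
puts text                # café
puts text.encoding       # UTF-8
```

This only helps when the encoding is known per file; for a mix of files you still need some kind of check, which is what the hack below does.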
To accept both UTF-8 and ISO-8859-1 (Latin-1) input, I've implemented the following hack:
def convert_to_utf8(content)
  if content.valid_encoding?
    content
  else
    content.force_encoding("ISO-8859-1").encode("UTF-8")
  end
end

# reading the content:
content = convert_to_utf8 IO.read( "filename.txt" )
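Here is the same idea as a self-contained script that feeds the helper one Latin-1 string and one UTF-8 string. There is one small change from the hack above, flagged in a comment: it calls `dup` before `force_encoding`, because `force_encoding` relabels the receiver in place and would otherwise change the caller's string as well.

```ruby
# Same logic as convert_to_utf8 above, but it works on a copy so the
# caller's string is not relabelled by force_encoding.
def convert_to_utf8(content)
  return content if content.valid_encoding?

  # Assume anything that is not valid UTF-8 is ISO-8859-1.
  content.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# What IO.read returns for a Latin-1 file under a UTF-8 default.
latin1 = "caf\xE9".b.force_encoding(Encoding::UTF_8)
utf8   = "café" # already valid UTF-8

fixed = convert_to_utf8(latin1)
puts fixed                               # café
puts latin1.encoding                     # UTF-8 (caller's string unchanged)
puts convert_to_utf8(utf8).equal?(utf8)  # true: valid input is returned as-is
```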
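This hack works for me because all the text files I deal with are in one of those formats. It is a heuristic, though: any string that is not valid UTF-8 is assumed to be ISO-8859-1, so a file in some other encoding is converted to the wrong characters rather than rejected.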
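One caveat on the CP-1252 side: ISO-8859-1 and CP-1252 disagree on bytes 0x80 to 0x9F. CP-1252 puts printable characters there, such as the euro sign (0x80) and curly double quotes (0x93 and 0x94), while ISO-8859-1 maps those bytes to invisible C1 control characters. Forcing ISO-8859-1 on a CP-1252 file therefore silently mangles them. A small sketch comparing the two with Ruby's built-in `Windows-1252` encoding (the sample bytes are made up for illustration):

```ruby
# 0x93/0x94 are curly double quotes and 0x80 is the euro sign in CP-1252.
bytes = "\x93price: 5\x80\x94".b

as_latin1 = bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
as_cp1252 = bytes.dup.force_encoding("Windows-1252").encode("UTF-8")

puts as_cp1252                # “price: 5€”
puts as_latin1.include?("€")  # false
p as_latin1.codepoints.first  # 147 (U+0093, a C1 control character)
```

If your files are more likely to be CP-1252 than strict Latin-1, swapping `"Windows-1252"` into the hack is a reasonable adjustment, but treat it as an assumption about your data rather than a universal fix.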