Sophie ([personal profile] sophie) wrote in [site community profile] dw_dev 2011-01-25 07:18 am (UTC)

Re: no apologies needed, thank you for pointing that out

You want to use the Encode module, which has been a core part of Perl since 5.8. The easiest way to use it is through an encoding layer on open(), like this:

open(my $fh, "<:encoding(utf-8)", $myfile) or die "Cannot open $myfile: $!";
# read from $fh here
close($fh);


That will read from $myfile as if it's a UTF-8 file. If the input file is not actually UTF-8, Perl will emit warnings along the lines of 'utf8 "\xA1" does not map to Unicode', and your input text will contain a literal, stringified "\xA1" (or whatever the offending byte was) at the point where the invalid byte appeared. Either way, the text that ends up in your scalar is guaranteed to be validly decoded.
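If you would rather detect bad input than have it quietly substituted, you can call Encode's decode() yourself on raw bytes. Here is a minimal sketch; the helper names decode_utf8_strict and decode_utf8_lenient are made up for illustration, and it assumes you already have the file's contents as a byte string:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode raw bytes as UTF-8 and
# return undef if the input is malformed, rather than substituting "\xHH".
sub decode_utf8_strict {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on invalid input; LEAVE_SRC stops
    # decode() from modifying $bytes in place.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text;    # undef when the bytes were not valid UTF-8
}

# Hypothetical helper: same input, but invalid bytes are replaced with a
# literal "\xHH" string, matching the substitution described above.
sub decode_utf8_lenient {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, Encode::FB_PERLQQ | Encode::LEAVE_SRC);
}
```

With these, decode_utf8_strict("caf\xA1") gives you undef so you can reject the file, while decode_utf8_lenient("caf\xA1") gives you the four letters "caf" followed by a literal backslash-x-A-1.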

There are other things you can do with Encode; I've used it a lot. If you need help on how to do anything with it, let me know - character encodings are something of a specialty for me.

[edit: Fixing bad text.]
