Oh. In that case I'm sorry to say that there has been extensive work done in that (UTF-8/Unicode) area in every version after 5.8, and that since it sounds like you're using it in a not-really-intended way, you're quite likely to run into problems.
I can definitely see how doing it the way you describe was a good choice a bunch of years ago, but I don't think it is any more. It also sounds like the sort of thing that's a huge amount of dull and fiddly work to change...
Back then (Perl 5.6 or so?), Unicode support in Perl was kinda flaky, if I recall correctly, so it made sense not to rely on it. (Plus LiveJournal originally didn't even support Unicode; that may have influenced the way things were handled. Do you remember the character set conversion widget people had to use to tell the system what charset their previous journal entries were stored in, once Unicode started being used internally?)
Now, handling Unicode explicitly in Perl would totally be the way to go. But as Calle said, it's a *huge* amount of dull and fiddly (and error-prone!) work to change.
Once it's done, things will be better (no more opportunities to miss a place where you forgot to tell Perl, "No, I want to do this by myself, the hard way")... but until then... yeah.
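For anyone following along, here's a rough sketch of what "doing the Unicode explicitly" usually looks like in Perl. It assumes UTF-8 input and the core Encode module, and the variable names are made up for illustration (this isn't anything from our codebase). The idea is to decode bytes to characters once at the input boundary, work on characters inside, and encode back to bytes once at the output boundary:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw input, as it might arrive from a file, socket, or DB:
# "café" encoded as UTF-8 is 5 bytes (é is 0xC3 0xA9).
my $bytes = "caf\xC3\xA9";

# Byte-oriented handling: Perl counts one "character" per byte.
my $byte_len = length $bytes;

# Explicit handling: decode once at the input boundary...
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;      # counts characters, not bytes
my $upper    = uc $chars;          # Unicode-aware: é becomes É

# ...and encode once at the output boundary.
my $out = encode('UTF-8', $upper);

binmode STDOUT;                    # $out is already bytes, so write it raw
print "bytes: $byte_len, chars: $char_len\n";
print $out, "\n";
```

With the byte-string approach, every spot that cares about characters (length limits, case folding, truncation) has to remember to do its own handling. With the decode/encode-at-the-edges approach, those spots just work, as long as the edges are right.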
Hell, we still have to point some people at the UTF-8 conversion page on LJ, because their old entries were never properly converted and they're trying to import!
But yeah; it's probably the sort of thing we should've done before opening, but it would've delayed us another six months at least, and, meh.
Siiiiiiiiigh. We needed another one of those like we need a hole in the head...