The main concern I would have is changes(*) that would cause a string result to have the dreaded "this is... UNICODE!" flag set. For hysterical raisins, Dreamwidth (like LiveJournal) wants all strings to be Unicode (using the UTF-8 encoding) but with that flag clear, and clears that flag on all strings returned from code known to set it. When some new code returns a string with that flag set, all strings we combine it with (usually, the rest of the page this string goes into) get turned into gibberish, causing users and developers alike to go "double UTF?".
(*) in the interpreter itself, in core modules, or in any other modules Dreamwidth uses that need to be upgraded to run with the newer perl and core modules
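For anyone who hasn't been bitten by this yet, here is a minimal sketch of the failure mode as I understand it. It is not Dreamwidth code: the variable names and sample strings are made up for illustration, and it only assumes the core Encode module and the always-available utf8::is_utf8 function.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" as raw UTF-8 bytes with the flag clear (the convention described above).
my $bytes = "caf\xC3\xA9";

# "été" decoded into characters; this is the kind of string that has the flag set.
my $chars = decode('UTF-8', "\xC3\xA9t\xC3\xA9");

# Concatenating the two upgrades $bytes as if it were Latin-1, so each of
# its UTF-8 bytes becomes a separate character in the result.
my $mixed = $bytes . ' ' . $chars;

# Encoding the result for output encodes those characters a second time.
my $out = encode('UTF-8', $mixed);

# The workaround: turn the flagged string back into UTF-8 bytes (flag clear)
# before combining it with anything else.
my $fixed = $bytes . ' ' . encode('UTF-8', $chars);

printf "bytes flagged: %s\n", utf8::is_utf8($bytes) ? 'yes' : 'no';   # no
printf "chars flagged: %s\n", utf8::is_utf8($chars) ? 'yes' : 'no';   # yes
printf "double-encoded output: %d bytes\n", length $out;              # 13
printf "flag-cleared output:   %d bytes\n", length $fixed;            # 11
```

The two extra bytes in the first output are the "é" of "café" encoded twice (C3 83 C2 A9 instead of C3 A9), which is the gibberish users end up seeing.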
Oh. In that case I'm sorry to say that there has been extensive work done in that (UTF-8/Unicode) area in every version after 5.8, and that since it sounds like you're using it in a not-really-intended way, you're quite likely to run into problems.
I can definitely see how doing it the way you describe was a good choice a bunch of years ago, but I don't think it is any more. It also sounds like the sort of thing that's a huge amount of dull and fiddly work to change...
Back then (Perl 5.6 or so?), Unicode support in Perl was kinda flaky, if I recall correctly, so it made sense not to rely on it. (Plus LiveJournal originally didn't even support Unicode; that may have influenced the way things were handled. Do you remember the character set conversion widget people had to use to tell the system what charset their previous journal entries were stored in, once Unicode started being used internally?)
Now, doing the Unicode explicitly in Perl would totally be the way to go. But as Calle said, a *huge* amount of dull and fiddly (and error-prone!) work to change.
Once it's done, things will be better (no more opportunities to miss a place where you forgot to tell Perl, "No, I want to do this by myself, the hard way")... but until then... yeah.
Hell, we still have to point some people at the UTF-8 conversion page on LJ, because their old entries were never properly converted and they're trying to import!
But yeah; it's probably the sort of thing we should've done before opening, but it would've delayed us another six months at least, and, meh.
Siiiiiiiiigh. We needed another one of those like we need a hole in the head...