pauamma: Cartooney crab wearing hot pink and acid green facemask holding drink with straw (Default)
Res facta quae tamen fingi potuit ([personal profile] pauamma) wrote in [site community profile] dw_dev 2016-03-08 01:46 am

Question thread #40

It's time for another question thread!

The rules:

- You may ask any dev-related question you have in a comment. (It doesn't even need to be about Dreamwidth, although if it involves a language/library/framework/database Dreamwidth doesn't use, you will probably get answers pointing that out and suggesting a better place to ask.)
- You may also answer any question, using the guidelines given in To Answer, Or Not To Answer and in this comment thread.
kaberett: Trans symbol with Swiss Army knife tools at other positions around the central circle. (Default)

[personal profile] kaberett 2016-03-10 11:49 am (UTC)
I am properly baffled by how the various bits of HTML cleaner work. Probably the correct response is for me to have a poke around the actual Perl modules that handle it, but with respect to e.g. https://github.com/dreamwidth/dw-free/issues/1643 and https://github.com/dreamwidth/dw-free/issues/1652 I would definitely appreciate some cheer-leading!
marahmarie: (M In M Forever) (Default)

[personal profile] marahmarie 2016-03-14 03:19 am (UTC)
Seconded.

(Also, thanks for opening a ticket on that. I've been going on about escaped HTML in PMs literally since 2010 or so. I never approached Support but I did write an actual rant to [personal profile] sophie about all the apostrophes and so on, and I think she said somebody was going to file (or had already filed, or was already aware of) something yadda yadda ticket bug whatever. That might be what Denise is referring to on the ticket page but I'd have to search so far back in my PMs to find the conversation Sophie and I had on it that it might not be possible to find anymore.)
Edited (typos) 2016-03-14 03:21 (UTC)

[personal profile] kaberett 2016-03-14 09:19 am (UTC)
Complaining to me that something I'm considering working on (but having difficulty with) isn't fixed yet is the opposite of encouraging. Please don't.

[personal profile] kaberett 2016-03-14 10:25 pm (UTC)
alRIGHT. I have got as far as locating LJ::ehtml, which is a small set of regexes covering &, ", ', <, >.
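
For illustration, here is a minimal sketch of what an escaper covering those five characters could look like. This is not the actual LJ::ehtml source: the name `sketch_ehtml` is hypothetical, and the specific entities chosen (for example `&#39;` for the apostrophe) are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for LJ::ehtml; NOT the real implementation.
sub sketch_ehtml {
    my ($text) = @_;
    return '' unless defined $text;

    # The ampersand has to be handled first; otherwise the entities
    # produced by the later substitutions would be escaped again
    # within this same call.
    $text =~ s/&/&amp;/g;
    $text =~ s/"/&quot;/g;
    $text =~ s/'/&#39;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

print sketch_ehtml(q{Tom & "Jerry's" <b>}), "\n";
# prints: Tom &amp; &quot;Jerry&#39;s&quot; &lt;b&gt;
```

Note that a function like this has no way of telling whether its input was already escaped, so calling it on its own output escapes every `&` a second time.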

LJ::Protocol, beginning line 495, handles sending a message. The subject text is given by $req->{'subject'}, checked by LJ::text_in (in LJ::TextUtil), and passed to LJ::Message. $req is what was submitted via the form?

LJ::text_in returns LJ::is_utf8($text), which returns LJ::is_utf8_wrapper( $text ), which returns Unicode::CheckUTF8::is_utf8( ''. $text ), to make sure that the text is treated as a string. Fine.

LJ::Message:
278 sub subject_raw {
279     my $self = shift;
280     return $self->_row_getter("subject", "msgtext");
281 }
282
283 sub subject {
284     my $self = shift;
285     return LJ::ehtml($self->subject_raw) || "(no subject)";
286 }
287
288 sub body_raw {
289     my $self = shift;
290     return $self->_row_getter("body", "msgtext");
291 }
292
293 sub body {
294     my $self = shift;
295     return LJ::ehtml($self->body_raw);
296 }


Alright, so it's the LJ::ehtml call in row 285 (sub subject). Body text isn't getting similarly overescaped; I think PMs don't display HTML in any case so there's no need for this call to the cleaner, but I don't understand the difference in behaviour between body and subject and I'm a bit nervous about just throwing things out. Which I think means I should cross-check how support requests are handled, because those don't get overescaped but are HTML-stripped.
denise: Image: Me, facing away from camera, on top of the Castel Sant'Angelo in Rome (Default)

[staff profile] denise 2016-03-15 02:22 am (UTC)

The problem is that LJ::ehtml is getting called every time a PM is sent, but if a PM is a reply, it's already been through LJ::ehtml once. So, an & has already been turned into &amp; on the previous pass, and a second pass will see the & in the &amp; and turn it into &amp;amp;. (I think. I am doing this in email and in order to make it render I have to escape it one order higher, heh.)
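
The double-escaping effect described above can be reproduced in a few lines. This is only a sketch: `escape_amp` is a hypothetical helper reduced to the ampersand case, not the real LJ::ehtml, and the sample subject text is made up.

```perl
use strict;
use warnings;

# Hypothetical escaper, reduced to the ampersand case under discussion.
sub escape_amp {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    return $text;
}

my $typed       = 'Q & A';                    # what the sender typed
my $first_pass  = escape_amp($typed);         # escaped once for the original PM
my $second_pass = escape_amp($first_pass);    # escaped again when the reply is sent

print "$first_pass\n";     # prints: Q &amp; A
print "$second_pass\n";    # prints: Q &amp;amp; A
```

A browser renders the first result as "Q & A" but the second as the literal text "Q &amp; A", which matches the stray entities showing up in reply subjects.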

I think we have a "strip HTML" rather than "escape HTML" function somewhere in there; you can probably replace the call to ehtml with a call to that, but somebody better with security/escaping issues than I am can probably say for sure.


[personal profile] kaberett 2016-03-15 09:38 am (UTC)

RIGHT. OF COURSE. Thank you!

... which, I mean, that still leaves me at "I don't adequately understand what's going on here to be confident in any solution", but also means that I think I really can go cross-reference with how the Support Board handles this.


[staff profile] denise 2016-03-15 09:43 am (UTC)

and ha, I am a master of escaping, that did exactly what I wanted it to do.

kaberett: Grinning emoticon. (:D)

[personal profile] kaberett 2016-03-15 10:53 am (UTC)
VICTORY IS YOURS.