Flutterby™! : Charset weirdness


Charset weirdness

2007-05-11 17:40:58.459458+00 by Dan Lyke 7 comments

A couple of you may have noticed UTF-8 versus ISO-8859-1 character set problems since the move to the new server. I just went into the code to see if I could hack around this, and discovered that I should already be handling them.

Anyone know how the submitted form's character set should be sent to CGI.pm?

[ related topics: Web development ]

comments in ascending chronological order:

#Comment Re: made: 2007-05-11 17:41:16.963589+00 by: Dan Lyke

#Comment Re: made: 2007-05-11 17:44:16.832261+00 by: Dan Lyke

Crap, okay, that should have failed.

#Comment Re: made: 2007-05-11 19:22:39.368717+00 by: spc476

I don't recall it ever being sent (at least by Apache). However, you can do:

<FORM METHOD="post" ACTION="blah.cgi" ACCEPT-CHARSET="US-ASCII"> and hope the browser follows the hint. That's about the only way I know to do it.
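To illustrate why the missing charset label matters, here is a minimal Python sketch (not the Perl/CGI.pm code discussed in this thread). It shows the same text producing different bytes on the wire depending on the charset the browser picked, so a server that guesses wrong cannot decode it. The helper names `encode_form` and `decode_form` and the sample field `comment` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields, charset):
    """Percent-encode form fields roughly as a browser would for `charset`."""
    return urlencode(fields, encoding=charset)


def decode_form(body, charset):
    """Decode a urlencoded body, trusting that `charset` is what the browser used.

    errors="strict" makes a wrong guess raise instead of silently
    substituting replacement characters.
    """
    parsed = parse_qs(body, encoding=charset, errors="strict")
    return {name: values[0] for name, values in parsed.items()}


# The same comment text, submitted under two different charsets.
utf8_body = encode_form({"comment": "café"}, "utf-8")
latin1_body = encode_form({"comment": "café"}, "iso-8859-1")

print(utf8_body)
print(latin1_body)

# Decoding works only when the server's assumption matches the browser's choice.
print(decode_form(utf8_body, "utf-8"))
print(decode_form(latin1_body, "iso-8859-1"))
```

Nothing in the request body itself says which of the two encodings was used, which is why the ACCEPT-CHARSET hint (or a consistent page charset) is about the only lever available.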

#Comment Re: Meta Tag made: 2007-05-12 00:39:54.747917+00 by: Roger

You can also force the issue by setting the appropriate META tag in the header:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

#Comment Re: made: 2007-05-12 13:53:29.355655+00 by: Dan Lyke

Roger, is that supposed to force the submit character set? My experience is that it doesn't necessarily.

And there's something weird about how Perl (especially in conjunction with mod_perl) is handling this stuff, at least in the case of my code.

I think I've got it working for comments; I still need to make it work for the front page.

#Comment Re: meta + forms made: 2007-05-17 21:12:22.107524+00 by: Roger

In my experiments, that "meta" tag has worked wonders in accepting text copied from any number of sources (websites in shift-jis, text from Word) and sending form data that is utf-8 encoded to my application.

My app's in Python, which may be a factor. I've noticed that trying to pull UTF-8 out of MySQL with Python and pass it to PHP via XML-RPC is just a nightmare (well, and XML/XSLT via Popoon is in this mix, which likely does not help!)

#Comment Re: made: 2007-05-17 22:01:34.553537+00 by: Dan Lyke

Yeah, my problem is that I'm trying to detect anything that doesn't look like UTF-8 and convert it to standard ampersand escaped entity encoding, and for some reason that isn't happening all the time. Come to think of it, it may be an IE thing... Hmmm... Might have to fire up the Windows box, but I *really* don't want to be mucking with web apps right now.
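The detect-and-convert idea described above could be sketched roughly like this in Python. This is not the site's actual Perl code; the function name `normalize_submission` and the ISO-8859-1 fallback are assumptions for illustration. It tries a strict UTF-8 decode first, and only if that fails treats the bytes as the legacy charset and rewrites every non-ASCII character as a numeric ampersand entity.

```python
def normalize_submission(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Return submitted text, entity-escaping anything that is not valid UTF-8.

    `fallback` is an assumed legacy charset; ISO-8859-1 can decode any byte,
    so the second decode never fails.
    """
    try:
        # Valid UTF-8 passes through untouched.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: reinterpret as the legacy charset, then turn every
        # non-ASCII character into a numeric entity such as &#233;.
        legacy = raw.decode(fallback)
        return legacy.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The same word as a UTF-8 browser and as an ISO-8859-1 browser would send it.
print(normalize_submission("café".encode("utf-8")))
print(normalize_submission("café".encode("iso-8859-1")))
```

One caveat with this kind of detection: some ISO-8859-1 byte sequences happen to also be valid UTF-8, so a strict decode that succeeds is strong evidence but not proof, which may be one reason a check like this appears to miss input some of the time.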