Flutterby™! : That UTF-8 error

That UTF-8 error

2008-06-06 18:56:07.343985+00 by Dan Lyke 16 comments

Okay, if anyone out there knows a little bit about Perl and character sets, I need some help with that UTF-8 character people have been running into... More in the comments.

[ related topics: Flutterby Meta Perl Open Source ]

comments in ascending chronological order:

#Comment Re: made: 2008-06-06 19:05:04.556442+00 by: Dan Lyke

Hang on, testing to make sure it's not something simple and stupid...

#Comment Re: made: 2008-06-06 19:06:56.419288+00 by: Dan Lyke [edit history]

A Euro symbol: €

Adding a `

#Comment Re: made: 2008-06-06 19:07:33.189242+00 by: ziffle

Dan, it's about two US dollars.

#Comment Re: made: 2008-06-06 19:14:36.40769+00 by: ebradway [edit history]

`try this' - hmmm...

#Comment Re: made: 2008-06-06 19:16:00.740395+00 by: ebradway

Yep. It's the grave (below the tilde). Right now it's only when you edit an existing comment - not when you post a new comment.

#Comment Re: made: 2008-06-06 19:18:46.755553+00 by: ebradway [edit history]

`` I think you got it...

#Comment Re: made: 2008-06-06 19:22:29.670531+00 by: Dan Lyke [edit history]

So depending on the client character set, the code looks like:

while ($t =~ /^(.*?)([\x80-\x{ffff}])(.*)$/)
{
    $t = sprintf("%s&#%d;%s",$1,ord($2),$3);
}

or

while ($t =~ /^(.*?)([\x80-\xff])(.*)$/)
{
    $t = sprintf("%s&#%d;%s",$1,ord($2),$3);
}

I've even tossed the second in the first code path, thinking that at worst it'd give me bad characters, but neither seems to be substituting correctly.
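
For comparison, here is a minimal sketch of the same substitution done in a single pass, assuming the incoming text arrives as raw UTF-8 bytes. The helper name encode_entities_numeric is hypothetical and not part of the Flutterby code. Two things may be worth checking against the loops above: without the /s modifier, "." does not match a newline, so the anchored pattern cannot match a high character that sits on a later line of a multi-line comment; and if $t is still a byte string, each byte of a multi-byte character is matched separately.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: convert raw UTF-8 bytes to ASCII text in which
# every non-ASCII character is written as a numeric entity.
sub encode_entities_numeric {
    my ($bytes) = @_;

    # Assumption: the client sent UTF-8. Decode bytes into Perl characters
    # so that ord() sees whole code points rather than individual bytes.
    my $text = decode('UTF-8', $bytes);

    # Replace every character above 0x7f in one global pass. No ^...$
    # anchors are needed, so embedded newlines do not block the match.
    $text =~ s/([^\x00-\x7f])/sprintf("&#%d;", ord($1))/ge;

    return $text;
}

# The Euro sign is the three bytes E2 82 AC in UTF-8.
print encode_entities_numeric("A Euro symbol: \xe2\x82\xac\n");
```

With UTF-8 input, the Euro sign (U+20AC) should come out as &#8364;.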

#Comment Re: made: 2008-06-06 19:48:02.415191+00 by: spl

I'm glad I get paid in € right now! :)

#Comment Re: made: 2008-06-06 19:48:54.109036+00 by: spl

€xcellent!

#Comment Re: made: 2008-06-06 19:49:33.155419+00 by: spl

As you can see by my comment spam, it worked in both Text and HTML modes for me.

#Comment Re: made: 2008-06-06 20:05:10.910763+00 by: ziffle [edit history]

#Comment Re: made: 2008-06-06 20:05:33.508416+00 by: ziffle

Off topic, but giving Burning Man a run for the money, err, Euro:

http://www.the-twelfth.org.uk/images/S1010010.JPG

#Comment Re: made: 2008-06-16 20:31:47.700346+00 by: ebradway [edit history]

Test:

“We don’t want to cast a pall over the blogosphere by being heavy-handed, so we have to figure out a better and more positive way to do this,” Mr. Kennedy said.

Getting: ERROR: invalid byte sequence for encoding "UTF8": 0x93

Single and double quotes in UTF-8 work when converted to &#NNNN; entities.
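
A note on the 0x93 in that error: a lone 0x93 byte is not valid UTF-8 (it can only appear as a continuation byte), but in Windows-1252 it is the left curly double quote. That suggests, though the thread does not confirm it, that the pasted text reached the server as Windows-1252 rather than UTF-8. A minimal sketch of one way to cope, assuming those are the only two encodings in play; the helper name decode_utf8_or_cp1252 is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: try strict UTF-8 first, and fall back to
# Windows-1252 if the bytes are not well-formed UTF-8.
sub decode_utf8_or_cp1252 {
    my ($bytes) = @_;

    # FB_CROAK makes decode() die on malformed input, and it may modify
    # its argument, so work on a copy and trap the failure with eval.
    my $copy = $bytes;
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $text if defined $text;

    # Assumption: anything that is not valid UTF-8 is Windows-1252.
    return decode('cp1252', $bytes);
}

# Bytes 0x93 and 0x94 are the curly double quotes in Windows-1252.
my $text = decode_utf8_or_cp1252("\x93quoted\x94");
printf "first character: U+%04X\n", ord($text);
```

The decoded string can then be entity-encoded or re-encoded as UTF-8 before it goes to the database.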

#Comment Re: made: 2008-06-16 20:43:24.778923+00 by: Dan Lyke

Yeah, the portion of the code that's not working is supposed to do exactly that.

#Comment Re: made: 2008-06-16 23:38:40.49285+00 by: Dan Lyke

Further testing:

“We don’t want to cast a pall over the blogosphere by being heavy-handed, so we have to figure out a better and more positive way to do this,” Mr. Kennedy said.

#Comment Re: made: 2008-06-16 23:43:43.54768+00 by: Dan Lyke

Some random quotes to see if this is licked:

They will “attempt to define clear standards as to how much of its articles and broadcasts bloggers and Web sites can excerpt without infringing on The A.P.’s copyright.”

Calling it “probably the greatest tournament I’ve ever had,” Tiger Woods

“I had to walk to school — not too far, a couple miles.”