Hungarian (UNICODE) storage problem

Tony Wood

Sunday 07 November 2004 3:59:36 am

Hi,

We are using MySQL 4.1.7 configured with UTF-8 for the DB and get the following weird problem with some characters.

When you cut and paste certain characters into eZ publish, they fail. This is the case with the Á in Árucikkenként.

It appears to affect, but is not limited to:
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
U+0150 LATIN CAPITAL LETTER O WITH DOUBLE ACUTE

We tried with and without the OE and it still fails. Cut and paste works with other apps; it appears to fail only in eZ publish.

Has anyone else come across this and got a fix?

tia

tony

Tony Wood : twitter.com/tonywood
Vision with Technology
Experts in eZ Publish consulting & development

Power to the Editor!

Free eZ Training : http://www.VisionWT.com/training
eZ Future Podcast : http://www.VisionWT.com/eZ-Future

Balazs Halasy

Sunday 07 November 2004 2:35:14 pm

Hi,

I have no clue why this happens, but: you could try turning on debug output, SQL debug and redirection debug. Copy & paste the text again and look for the SQL insert query which actually stores the thing in the database - does the query itself look healthy?

When you say "other apps" I reckon you mean other Win32 applications, correct? Well, the reason that would work is that most Windows apps support Unicode (or do some internal mapping) by default. However, what happens if you try using this text in other web-based solutions? Also, what are the exact symptoms? Do the A+acute and the O+double-acute letters simply disappear?

Allman

Tony Wood

Monday 08 November 2004 12:51:29 am

Hi, Thanks for the quick reply.

>>SQL
There are no errors in the SQL, and the text looks good in the storage line. It fails somewhere between being entered on screen and being stored in the DB.
If I paste the value directly into the DB it still fails on display, even though the DB has the correct value.

>>Cut and Paste
This was cut and paste on Mandrake, but I think this was a red herring, as I have found the problem is not here.

>>Symptoms
The character gets converted into what looks like a non-double-byte character.
So Árucikkenként will be converted to �?rucikkenként.
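
One way to get output of this shape is for UTF-8 text to pass through a layer that assumes a single-byte charset in both directions. The Python sketch below is illustrative only and is not eZ publish code; the function name is made up, and cp1252 is an assumed charset that nobody in this thread has confirmed.

```python
# Illustrative sketch, not eZ publish code. The charset "cp1252" is an
# assumption; the thread does not say which charset was actually involved.

def lossy_round_trip(text, wrong_charset="cp1252"):
    """Simulate UTF-8 text passing through a layer that assumes a
    single-byte charset on the way in and converts back on the way out."""
    utf8_bytes = text.encode("utf-8")
    # Inbound: the layer misreads the UTF-8 bytes as single-byte characters.
    # Bytes that have no meaning in that charset become U+FFFD.
    misread = utf8_bytes.decode(wrong_charset, errors="replace")
    # Outbound: the layer converts back to bytes; U+FFFD becomes "?".
    returned = misread.encode(wrong_charset, errors="replace")
    # The page then interprets the returned bytes as UTF-8 again.
    return returned.decode("utf-8", errors="replace")


for word in ("\u00c1rucikkenk\u00e9nt", "\u0150"):
    print(ascii(word), "->", ascii(lossy_round_trip(word)))
```

Under these assumptions a letter such as é survives the round trip, while Á and Ő come back as a replacement character followed by "?", because the second UTF-8 byte of each (0x81 and 0x90) is undefined in cp1252.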

I see that Árucikkenként stores correctly on your site. Is the ez.no site UTF-8 (Unicode)?

thanks

tony

Jan Borsodi

Monday 08 November 2004 10:17:25 pm

It might be that the output of the site is not in UTF-8 but in a standard 8-bit charset. Some browsers will send characters not in that range as HTML entities.

You should check the output of the HTML page and see if it contains a meta tag with:

http-equiv="Content-Type" content="text/html; charset=utf-8"

If charset is not utf-8 then that will explain the problem.

You should also examine the HTTP headers for the page.
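
Both checks can be scripted. The following Python sketch pulls the declared charset out of a Content-Type header value and out of any meta tags in a page. The helper names and the sample markup are made up for illustration, and fetching the page and its headers is left out.

```python
# Sketch only: extract the declared charset from a Content-Type header
# value and from <meta> tags in an HTML document. The sample markup at
# the bottom is made up for illustration.
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class MetaCharsetParser(HTMLParser):
    """Collect the charsets declared by <meta> tags."""

    def __init__(self):
        super().__init__()
        self.charsets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        content = attrs.get("content")
        if (attrs.get("http-equiv") or "").lower() == "content-type" and content:
            # <meta http-equiv="Content-Type" content="...; charset=...">
            charset = charset_from_content_type(content)
            if charset:
                self.charsets.append(charset)
        elif attrs.get("charset"):
            # Short form: <meta charset="...">
            self.charsets.append(attrs["charset"].lower())


def charsets_from_html(html):
    """Return every charset declared by meta tags in the given HTML."""
    parser = MetaCharsetParser()
    parser.feed(html)
    return parser.charsets


sample = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(charsets_from_html(sample))
print(charset_from_content_type("text/html; charset=utf-8"))
```

If the header and the meta tag disagree, or either one names something other than utf-8, that is worth investigating first.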

--
Amos

Documentation: http://ez.no/ez_publish/documentation
FAQ: http://ez.no/ez_publish/documentation/faq

Tony Wood

Tuesday 09 November 2004 5:04:29 am

Hi,

The problem occurs in the admin interface as well as the front end. I have tested the admin interface with and without the OE and it still has the problem.

If you can confirm that you have it working with UTF-8 in your environment, then it must be our setup and I will review it.

Tony

Tony Wood

Tuesday 16 November 2004 2:49:48 am

Hi Jan,

This is fixed by patching mysqldb.php with the charset code from trunk.

See: http://ez.no/community/bug_reports/hungarian_utf_8_character_bug and http://ez.no/community/bug_reports/mysql_connect_mysql_client_has_differnet_charset_as_server_db
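
The patch itself is not quoted in this thread. As general background, MySQL 4.1 and later let a client declare the charset of its connection with a SET NAMES statement, so that client and server agree on how to interpret the bytes. A minimal Python sketch of that idea follows; the helper name, the charset mapping and the cursor object are all hypothetical, and this is not the code from trunk.

```python
# Hypothetical sketch, not the actual mysqldb.php patch. It assumes a
# DB-API style cursor object that has an execute() method.

# MySQL uses its own charset names. This small mapping is an assumption
# that covers only two common charsets.
MYSQL_CHARSET_NAMES = {
    "utf-8": "utf8",
    "iso-8859-1": "latin1",
}


def set_connection_charset(cursor, charset):
    """Tell the server which charset the client sends and expects."""
    mysql_name = MYSQL_CHARSET_NAMES[charset.lower()]
    # The name comes from the fixed mapping above, never from user input.
    statement = "SET NAMES '%s'" % mysql_name
    cursor.execute(statement)
    return statement
```

It would be called once, right after connecting and before any other query, for example `set_connection_charset(cursor, "utf-8")`.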
