Wednesday 03 March 2004 7:31:59 am
PHP itself does not support Unicode internally. You can get some support with the mbstring extension and by overriding the internal text functions, but not all of PHP will support it. We also use the mbstring extension (if available) to perform conversion, and only when it's needed (instead of all the time). However, our i18n system does not yet support text operations such as extracting a portion of a string. This means that the template operators that modify text will not work correctly on multi-byte Unicode characters.
The reason for the cutoff is the UTF-8 encoding of Unicode characters: each character is represented by a variable number of bytes, from 1 up to 6 in the original UTF-8 design (RFC 3629 limits it to 4; 1-3 is the most common). This means that a string of three characters can actually be 4 or more bytes long, and since PHP treats each byte as a character, it will cut off in the wrong place.
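To make the cutoff concrete, here is a small sketch in Python rather than PHP (chosen only for illustration). The helper names byte_cut and char_cut are made up for this example and are not part of eZ publish or PHP; byte_cut imitates what a byte-oriented substring does to UTF-8 data.

```python
def byte_cut(text: str, n: int) -> str:
    """Keep the first n BYTES of the UTF-8 encoding, as a byte-oriented
    substring function would. Hypothetical helper, for illustration only."""
    raw = text.encode("utf-8")
    # An incomplete trailing sequence is shown as U+FFFD instead of raising.
    return raw[:n].decode("utf-8", errors="replace")


def char_cut(text: str, n: int) -> str:
    """Keep the first n CHARACTERS, which is what the user expects."""
    return text[:n]


sample = "åæø"  # three characters, two bytes each in UTF-8

print(len(sample))                  # 3 characters
print(len(sample.encode("utf-8")))  # 6 bytes
print(char_cut(sample, 2))          # cut on a character boundary
print(byte_cut(sample, 3))          # cut in the middle of the second character
```

The byte-wise cut ends halfway through the second character, which is the kind of broken output you get when the text operators run on UTF-8 strings.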
The only way to get support for this is to create all the various text operations that are used in the operators and place them in the i18n library, then change the operators to use that functionality. However, this is not a small task, especially considering problems such as case mapping (lowercase, uppercase etc.).
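As a rough idea of what one such operation could look like, here is a sketch of a substring that counts characters but works on raw UTF-8 bytes. It is written in Python for illustration only and is not the i18n library's code; the names utf8_char_length and utf8_substr are made up. It assumes valid UTF-8 input (1 to 4 bytes per character) and non-negative arguments, and it does not touch case mapping at all.

```python
def utf8_char_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1          # 0xxxxxxx: plain ASCII
    if lead >> 5 == 0b110:
        return 2          # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3          # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4          # 11110xxx
    raise ValueError("not a valid UTF-8 lead byte: 0x%02x" % lead)


def utf8_substr(raw: bytes, start: int, length: int) -> bytes:
    """Return `length` characters starting at character index `start`,
    always cutting on character boundaries. Assumes valid UTF-8."""
    # Record the byte offset at which every character begins.
    offsets = []
    pos = 0
    while pos < len(raw):
        offsets.append(pos)
        pos += utf8_char_length(raw[pos])
    offsets.append(len(raw))  # end marker, one past the last character

    count = len(offsets) - 1  # number of characters in the string
    begin = min(start, count)
    end = min(start + length, count)
    return raw[offsets[begin]:offsets[end]]


data = "a€b".encode("utf-8")  # 1 + 3 + 1 bytes
print(utf8_substr(data, 1, 2).decode("utf-8"))
```

Every operator that modifies text would need an equivalent of this, which is why the job is larger than it first looks.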
--
Amos
Documentation: http://ez.no/ez_publish/documentation
FAQ: http://ez.no/ez_publish/documentation/faq