Codeset conversion: the recommended way

trixie · Post by **trixie** » Mon Nov 28, 2011 7:40 pm

Can we please have an official Hyperion word on how the poor developers should implement codeset conversion in their programs? I mean, in a future-proof way: "the recommended practice", if you like. Is codesets.library the way? Or is there an alternative solution under development that is to become part of OS4? I'm asking because I have a number of projects under development that work with UTF-8 encoded text and therefore require codeset conversion.

ZeroG · Post by **ZeroG** » Tue Nov 29, 2011 5:52 pm

I don't think that there is support for UTF-8 encoding, but you can get a Unicode mapping table using IDiskfont->ObtainCharsetInfo().
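
For anyone wondering what you would then do with such a mapping table, here is a minimal sketch in portable C. It assumes the table is a plain array of 256 Unicode code points, one per byte value. That layout is an assumption made for illustration, and the names utf8_encode() and charset_to_utf8() are hypothetical helpers, not part of diskfont.library. The thread does not show how the table is actually obtained or laid out, so check the autodocs before relying on this.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 if cp is not encodable. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Convert len bytes of 8-bit text to NUL-terminated UTF-8.
   table[] is ASSUMED to hold one Unicode code point per byte value.
   Returns the number of bytes written (excluding the NUL),
   or (size_t)-1 if dst is too small or a table entry is invalid. */
size_t charset_to_utf8(const unsigned char *src, size_t len,
                       const uint32_t table[256],
                       char *dst, size_t dstsize)
{
    size_t used = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char buf[4];
        size_t n = utf8_encode(table[src[i]], buf);

        /* Always keep one byte free for the terminating NUL. */
        if (n == 0 || used + n >= dstsize)
            return (size_t)-1;
        for (size_t k = 0; k < n; k++)
            dst[used++] = (char)buf[k];
    }
    if (used >= dstsize)
        return (size_t)-1;
    dst[used] = '\0';
    return used;
}
```

Going the other way (UTF-8 to an 8-bit charset) needs a reverse lookup plus a policy for characters the target charset cannot represent, which is a good reason to prefer a ready-made converter where one is available.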

chris · Post by **chris** » Tue Nov 29, 2011 11:10 pm

...and once you have that, you can choose from newlib's iconv(), iconv.library/libiconv, codesets.library and/or parserutils.library (and I think there's a utf8.library floating around somewhere too)

I assume the "official" way is to use iconv() from newlib.library. I'm not sure if locale.library uses that too or has built-in functions for converting catalogs into the correct charset for display. Or maybe it doesn't even do that, I'm not quite sure.

Take your pick!

trixie · Post by **trixie** » Wed Jan 25, 2012 5:43 pm

@chris

chris wrote:you can choose from newlib's iconv(), iconv.library/libiconv, codesets.library and/or parserutils.library (and I think there's a utf8.library floating around somewhere too)

All right, iconv() would probably be the best for what I need - newlib.library is part of the AOS kernel now, is that right? So I won't have to rely on the user having a third-party library installed.

I found an example for iconv(), and see that before you use it, you have to

Code:

iconv_t iconv_open(const char *tocode, const char *fromcode);

Where do I get the "tocode" and "fromcode" codeset names? Are they the same as those used by locale.library? Can I do something like this?

Code:

iconv_open("iso-8859-2", "utf-8");

(Sorry, I'd try it out myself but unfortunately my SAM is broken at the moment, taking a holiday in Italy.)

chris · Post by **chris** » Thu Jan 26, 2012 11:16 pm

trixie wrote:All right, iconv() would probably be the best for what I need - newlib.library is part of the AOS kernel now, is that right? So I won't have to rely on the user having a third-party library installed.

Yep, that's the best bet generally if it does what you need.

trixie wrote:I found an example for iconv(), and see that before you use it, you have to

Code:

iconv_t iconv_open(const char *tocode, const char *fromcode);

Where do I get the "tocode" and "fromcode" codeset names? Are they the same as those used by locale.library? Can I do something like this?

Code:

iconv_open("iso-8859-2", "utf-8");

IIRC, yes.
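
For reference, a rough sketch of the whole open/convert/close sequence, assuming a POSIX-style iconv() as declared in <iconv.h>. The wrapper name convert_text() is hypothetical, and the cast on the input pointer assumes the common `char **` prototype; some implementations declare it as `const char **`, so check the newlib headers in your SDK. Which codeset names are accepted depends on the iconv implementation, so treat any particular name as something to test rather than a given.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated string from one codeset to another.
   Returns 0 on success, -1 on failure with errno describing the error
   (e.g. EINVAL: conversion not supported, EILSEQ: invalid input,
   E2BIG: dst too small). dst is NUL-terminated whenever dstsize > 0. */
int convert_text(const char *tocode, const char *fromcode,
                 const char *src, char *dst, size_t dstsize)
{
    iconv_t cd;
    char *in = (char *)src;     /* iconv() wants char **, input is not modified */
    char *out = dst;
    size_t inleft = strlen(src);
    size_t outleft;
    int rc = 0;
    int saved_errno;

    if (dstsize == 0) {
        errno = E2BIG;
        return -1;
    }
    dst[0] = '\0';
    outleft = dstsize - 1;      /* reserve one byte for the NUL */

    cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return -1;              /* errno set by iconv_open() */

    /* Convert the text, then flush any pending shift state. */
    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1)
        rc = -1;
    else if (iconv(cd, NULL, NULL, &out, &outleft) == (size_t)-1)
        rc = -1;

    *out = '\0';

    /* Preserve the errno from iconv() across iconv_close(). */
    saved_errno = errno;
    iconv_close(cd);
    errno = saved_errno;
    return rc;
}
```

A call along the lines of convert_text("iso-8859-2", "utf-8", input, buffer, sizeof buffer) would then match the iconv_open() usage asked about above. Note that a character with no equivalent in the target codeset makes the conversion fail rather than silently substituting something, so the caller has to decide how to handle that.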

Belxjander · Post by **Belxjander** » Fri May 18, 2012 9:25 pm

Glad to have come across this... I'm taking notes, as Perception-IME is also dealing with UTF-8 text strings...

@Trixie, I hope your own sam is recoverable somehow

billt · Post by **billt** » Fri Jun 01, 2012 5:20 pm

And the wxWidgets port will need UTF-8 as well. Seems to be a popular thing these days.

trixie · Post by **trixie** » Sun Jun 03, 2012 9:45 pm

@Belxjander

Belxjander wrote:@Trixie, I hope your own sam is recoverable somehow

My Sam is probably dead as a dodo, but thanks to Steven Solie I got access to an affordable replacement, so I'm now fully set up and developing again!

Belxjander · Post by **Belxjander** » Thu Jun 28, 2012 11:48 am

trixie wrote:@Belxjander

@Trixie, I hope your own sam is recoverable somehow
My Sam is probably dead as a dodo but thanks to Steven Solie I got access to an affordable replacement so I'm now fully setup and developing again!

Excellent news at least...

I'm currently looking at how to plug extra materials into locale.library, and I'll consider remapping from the UTF tables already present to try and get Japanese displaying properly first, then follow up with getting the input handled properly.

Hyperion Entertainment Message Boards
