Help with Catalogs please...

This forum is for discussion of the AmigaOS 4.x localization. This includes translation errors as well as proposals for improved translations, and other topics related to localization.
Belxjander
Posts: 314
Joined: Mon May 14, 2012 10:26 pm
Location: 日本千葉県松戸市 / Matsudo City, Chiba, Japan

Help with Catalogs please...

Post by Belxjander »

I've (not so) recently been trying to write a set of Catalog files for building a database of character information for use on AmigaOS.

So far I have managed to write an ARexx script that generates a set of Catalog sections which currently do not compile (my Sam440 needed to run this overnight...).

Can anyone explain to me where I may be going wrong with the Catalog semantics?

The script is in the Perception-IME repository as Unicode.rexx.

I have also committed a copy of the dataset generated by the script.
[ EDIT [1]: I have subsequently *removed* that temporary dataset, as the script will actively generate a version as required... ]

I would also appreciate hearing how well the script works when called from the project's GNUmakefile, or from an equivalent script that runs the same section-generation listing.

I do apologize if the source seems hard to read, but I have written each variation of the UTF-8 encoding on a single line, based on how many octets are produced for a given U+xxxxxxxx value, with a hard limit in the results (I am currently lucky that none of the sections go above what the script handles so far).
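For reference, the octet-count bucketing described above can be sketched in Python. This is a hedged illustration of the general UTF-8 length rules, not a transcription of the ARexx script itself:

```python
def utf8_octets(codepoint: int) -> int:
    """Number of octets UTF-8 uses to encode a given Unicode code point."""
    if codepoint < 0x80:
        return 1   # ASCII range, single octet
    if codepoint < 0x800:
        return 2   # two-octet sequences
    if codepoint < 0x10000:
        return 3   # three octets; includes the CJK Unified Ideographs block
    return 4       # supplementary planes, up to U+10FFFF

# The first CJK Unified Ideograph, U+4E00, falls in the three-octet bucket:
print(utf8_octets(0x4E00))  # -> 3
```

Cross-checking each bucket against a real encoder (e.g. `len(chr(cp).encode("utf-8"))`) is an easy way to validate the script's per-section limits.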

I am currently trying to put together a dataset of useful information about characters, specifically towards building input support capable of covering the full Unicode range without major changes to the OS (perhaps asking for specific "features" if they become required).

Currently the resulting dataset characters are not locally viewable on an AmigaOS system outside TimberWolf, due to the Cairo and glyph rendering used there.
I am aware of OWB, but I am unsure whether it will properly display the UTF-8 glyphs as well.

CJK-Unified-Ideographs.cd is my current focus. As an example, character U+4E00 looks like an English "-" sign but longer;
it is the character for "one" in both Japanese and Chinese.
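That encoding can be checked directly; Python's standard library confirms that U+4E00 (一) occupies three octets in UTF-8:

```python
ch = "\u4e00"                # 一, CJK Unified Ideograph "one"
encoded = ch.encode("utf-8")
print(encoded.hex(" "))      # e4 b8 80 -- three octets
print(len(encoded))          # 3
```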

If anyone has time to discuss what is actually essential for a catalog description to be catcomp'ed into a valid .catalog result...
I would enjoy the explanation, and any debugging help with what I have scripted so far.
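For comparison, a minimal catalog description that catcomp accepts generally looks like the sketch below. The identifier names and strings here are hypothetical; the `(id/minlen/maxlen)` triple after each identifier (with min/max allowed to be empty) is the part catcomp is strict about, and whether raw UTF-8 strings survive depends on the codeset handling:

```
; Example.cd -- hypothetical catalog description for catcomp
#version 1
#language english
;
MSG_IDEOGRAPH_4E00 (1//)
一
;
MSG_IDEOGRAPH_4E01 (2//)
丁
;
```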

EDIT [1]:
I basically need to confirm whether the script reproduces the same output on non-AmigaOS 4 systems, and what resources it requires...
This is for fine-tuning it so that only the sections essential for a given language are generated and merged together into a properly compiled .catalog result.

EDIT [2]:
"TekMage" of AmigaWorld/#AmigaWorld attempted to run this from the RAM: disk and only saw script errors...
I can confirm reproduction of that error, ***limited*** to running the script on the RAM: disk only.
The error is not reproducible outside the RAM: disk at this time.

EDIT [3]:
Resolved the disk root-folder execution problem, but the script still uses a massive amount of memory on AmigaOS 4 systems.
Otherwise I am on a slow schedule to make 13108 Kanji from the Unicode ranges available on AmigaOS, once I have built Hiragana-based phoneme tables.

For now I am writing it ALL out by hand... and still don't have any single table of the characters + Unicode code points + UTF-8-encoded hex values.

But I can confirm ALL of the ideographs for Japanese are 3 octets each when UTF-8 encoded, and 2 octets as code-point values.
chris
Posts: 562
Joined: Sat Jun 18, 2011 11:05 am

Re: Help with Catalogs please...

Post by chris »

Belxjander wrote: Currently the dataset result characters are not locally viewable on an AmigaOS system outside TimberWolf due to Cairo and Glyph rendering used there.
I am aware of OWB but am unsure as to whether that will properly display the UTF8 glyphs as well.
NetSurf should work too. For input events it converts from local charset to UCS4 (it might go via UTF-8, can't remember off-hand), so you ought to be able to type into input boxes and see the results immediately. All pages are rendered as UTF-8. If it doesn't work let me know and I'll try and figure out why not - but if you are sending UTF-8 and have the character set in your .language file set as UTF-8 then it ought to be fine.
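The conversion path described there (local charset in, UCS-4 internally, UTF-8 out for rendering) can be illustrated with Python's codec machinery. This is only a sketch of the idea, with ISO-8859-1 as an assumed local charset; NetSurf's actual implementation is in C:

```python
# Hypothetical input byte from an ISO-8859-1 local charset: 0xE9 ("é")
local_byte = b"\xe9"

# Local charset -> abstract character (a Python str behaves like UCS-4 here)
ch = local_byte.decode("iso-8859-1")
codepoint = ord(ch)            # 0xE9 as a UCS-4 code point

# UCS-4 -> UTF-8 for page rendering
utf8 = ch.encode("utf-8")
print(hex(codepoint), utf8.hex(" "))  # 0xe9 c3 a9
```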
Belxjander wrote: For now I am writing it ALL out by hand... and still don't have any single table of the characters + Unicode CodePoints + UTF8 Encoded Hex values.
Can you not generate something from the Unicode tables? http://www.unicode.org/Public/UNIDATA/
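The UNIDATA files are plain semicolon-separated text, so a character + code point + UTF-8 hex table can be generated with a short script. A hedged Python sketch, parsing a sample line in the UnicodeData.txt format rather than downloading the file (note that the CJK ideograph blocks appear there as `<..., First>` / `<..., Last>` range pairs, so those ranges would need expanding):

```python
def table_row(unicodedata_line: str) -> tuple[str, str, str]:
    """Turn one UnicodeData.txt line into (character, code point, UTF-8 hex)."""
    fields = unicodedata_line.split(";")
    codepoint = int(fields[0], 16)   # field 0 is the hex code point
    ch = chr(codepoint)
    return ch, f"U+{codepoint:04X}", ch.encode("utf-8").hex(" ").upper()

sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
print(table_row(sample))  # ('A', 'U+0041', '41')
```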