filesysbox ntfs ubs massStorage problem

colinw
AmigaOS Core Developer
Posts: 207
Joined: Mon Aug 15, 2011 9:20 am
Location: Brisbane, QLD. Australia.

Re: filesysbox ntfs ubs massStorage problem

Post by colinw »

salass00 wrote: Personally I would rather just use UTF-8 and get rid of all this character set conversion garbage in filesysbox but it just isn't going to happen.
On the contrary, that's exactly what needs to happen. Anything else is a kludge that will also cripple the filesystem's efficiency,
so you might as well take the first step towards UTF-8 compatibility.

salass00
AmigaOS Core Developer
Posts: 530
Joined: Sat Jun 18, 2011 3:12 pm
Location: Finland

Post by salass00 »

colinw wrote: On the contrary, that's exactly what needs to happen. Anything else is a kludge that will also cripple the filesystem's efficiency,
so you might as well take the first step towards UTF-8 compatibility.
Well, FWIW, I will change filesysbox to use UTF-8 strings exclusively, but I don't expect anyone to be spurred to implement UTF-8 support in the rest of the OS because of this.

Post by colinw »

salass00 wrote: Well, FWIW, I will change filesysbox to use UTF-8 strings exclusively, but I don't expect anyone to be spurred to implement
UTF-8 support in the rest of the OS because of this.
Actually, that's exactly what I expect will spur further development. When the "wobbly characters" from
other OSes' data on USB sticks start to piss off enough people, action will be taken.
You know as well as I do that nothing gets fixed if it's not obviously problematic.

I have been going over ENV-handler, RAM-handler, APPDIR-handler and "other stuff" for UTF-8 compatibility for a while now,
and have fixed anything that could be problematic, even if it is largely untested at this time.

We have been using ISO#### and ASCII encoding for a very long time already. What we need now
is to NOT hardcode a wall in front of UTF-8 compatibility, so we can have a smooth transition.

The writing is on the wall, and it's written in UTF-8 encoding.

Post by salass00 »

To implement case insensitive string comparison and hash functions I need a toupper() function that supports unicode.

AFAICT if I use setlocale(LC_CTYPE, "C-UTF-8") first I should then be able to use towupper() for this purpose, but I guess this doesn't work so well in a shared library where it will be called from many different programs?
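
To make the concern concrete, here is a minimal sketch of what a library would have to do around each towupper() call: save the caller's LC_CTYPE setting, switch locale, convert, and switch back. The wrapper name towupper_in_locale() is hypothetical, and whether a name such as "C-UTF-8" is accepted depends on the C library in use. Since setlocale() changes state for the whole process, this save/restore dance does not protect other code that runs while the locale is switched, which is presumably why it fits a shared library so poorly.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical wrapper (not part of any library): upper-case one wide
 * character under a temporarily selected LC_CTYPE locale, then restore
 * whatever locale the caller had set. Returns wc unchanged on failure. */
wint_t towupper_in_locale(const char *locale_name, wint_t wc)
{
    const char *current;
    char *saved;
    wint_t result = wc;

    assert(locale_name != NULL);

    /* Query the current setting; the returned string may be overwritten
     * by the next setlocale() call, so take a private copy. */
    current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return wc;
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return wc;
    strcpy(saved, current);

    /* Only convert if the requested locale is actually available. */
    if (setlocale(LC_CTYPE, locale_name) != NULL) {
        result = towupper(wc);
        setlocale(LC_CTYPE, saved);   /* put the caller's locale back */
    }

    free(saved);
    return result;
}
```

A call would look like towupper_in_locale("C-UTF-8", wc). If the locale name is not recognised, the character comes back unchanged.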
chris
Posts: 562
Joined: Sat Jun 18, 2011 11:05 am

Post by chris »

salass00 wrote:To implement case insensitive string comparison and hash functions I need a toupper() function that supports unicode.
You could use libunistring until locale.library gets UTF-8 support?

Post by colinw »

salass00 wrote: To implement case insensitive string comparison and hash functions I need a toupper() function that supports unicode.
As I mentioned in a previous post, you must avoid single byte operations on a UTF-8 stream because
one byte != one glyph anymore, at least for byte values >= 0x80. ToUpper() / ToLower() can't work.

For example, take the Angstrom character in UTF-8. Its code point value is U+212B and it is
represented by 3 bytes: 0xE2 0x84 0xAB. Using a function that performs single byte operations
on each of those 3 bytes within the UTF-8 stream is simply not going to work.

To make things even more interesting, the Angstrom can also be "composed" in UTF-8 by using the
capital Latin letter "A" and adding a combining ring above it.
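
The byte values are easy to check with a small encoder. This is only a sketch, and utf8_encode() is a made-up helper name, not an existing OS function. It produces E2 84 AB for U+212B, while the composed form is "A" (0x41) followed by U+030A COMBINING RING ABOVE, which encodes as CC 8A. So the same visible character can arrive as two completely different byte sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                       /* 1 byte: plain ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* UTF-16 surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                              /* beyond the Unicode range */
}
```

Note that every byte of a multi-byte sequence is >= 0x80, so a per-byte ToUpper() that treats those bytes as ISO characters would corrupt the sequence.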

Post by salass00 »

colinw wrote:
salass00 wrote: To implement case insensitive string comparison and hash functions I need a toupper() function that supports unicode.
As I mentioned in a previous post, you must avoid single byte operations on a UTF-8 stream because
one byte != one glyph anymore, at least for byte values >= 0x80. ToUpper() / ToLower() can't work.

For example, take the Angstrom character in UTF-8. Its code point value is U+212B and it is
represented by 3 bytes: 0xE2 0x84 0xAB. Using a function that performs single byte operations
on each of those 3 bytes within the UTF-8 stream is simply not going to work.
I wasn't talking about doing any single byte operations or even using toupper() itself. I already have code for decoding UTF-8 multibyte sequences into 32-bit unicode values. What I need is a toupper()-like function which takes this 32-bit unicode value and converts it into its equivalent upper case unicode value if it has one.

The newlib.library towupper() accepts a wchar_t which is a 32-bit integer which is why I mentioned it.
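
As a rough illustration of that approach (not the actual filesysbox code), the sketch below decodes each string into 32-bit code points, upper-cases each code point, and compares or hashes the results. All function names are hypothetical. The ucs_toupper() here only covers ASCII and Latin-1 to keep the example short. A real one would need a table derived from the Unicode character database, and the decoder does not reject overlong sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from *s and advance *s past it.
 * Returns 0xFFFD (replacement character) on malformed input. */
uint32_t utf8_decode(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int extra;                              /* continuation bytes expected */

    if (p[0] < 0x80)                { cp = p[0];        extra = 0; }
    else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; }
    else { *s = p + 1; return 0xFFFD; }     /* stray continuation byte */

    p++;
    while (extra-- > 0) {
        if ((*p & 0xC0) != 0x80) {          /* truncated sequence; never skips a NUL */
            *s = p;
            return 0xFFFD;
        }
        cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
    }
    *s = p;
    return cp;
}

/* Illustrative upper-casing of a code point: ASCII and Latin-1 only. */
uint32_t ucs_toupper(uint32_t cp)
{
    if (cp >= 0x61 && cp <= 0x7A)               return cp - 0x20; /* a-z */
    if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) return cp - 0x20; /* Latin-1 letters */
    if (cp == 0xFF)                             return 0x178;     /* y with diaeresis */
    return cp;                                  /* no mapping known here */
}

/* Case-insensitive comparison of two NUL-terminated UTF-8 strings. */
int utf8_stricmp(const char *a, const char *b)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    assert(a != NULL && b != NULL);
    for (;;) {
        uint32_t ca = ucs_toupper(utf8_decode(&pa));
        uint32_t cb = ucs_toupper(utf8_decode(&pb));
        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;                       /* both strings ended together */
    }
}

/* Case-insensitive hash: strings equal under utf8_stricmp() hash alike. */
uint32_t utf8_hash_nocase(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    uint32_t h = 5381, cp;

    assert(s != NULL);
    while ((cp = utf8_decode(&p)) != 0)
        h = h * 33 + ucs_toupper(cp);
    return h;
}
```

This does nothing about the precomposed versus composed forms mentioned earlier in the thread. Handling those would need normalisation on top.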

Post by salass00 »

chris wrote:
salass00 wrote:To implement case insensitive string comparison and hash functions I need a toupper() function that supports unicode.
You could use libunistring until locale.library gets UTF-8 support?
I would rather not use GPL code in this case and this does much more than I need it to do, but thanks for the suggestion anyway.

Post by colinw »

salass00 wrote: The newlib.library towupper() accepts a wchar_t which is a 32-bit integer which is why I mentioned it.
Careful, I think the wchar_t is defined as 16-bit in our includes, better check it.

Post by salass00 »

colinw wrote: Careful, I think the wchar_t is defined as 16-bit in our includes, better check it.
I know it's not (it was discussed on the developer mailing list a while back), but it probably doesn't matter because the UTF-8 support in the wide char functions has to be enabled first with setlocale() which means I probably won't be able to use it anyway.

In fact this is its definition from SDK/newlib/include/stddef.h:

typedef int wchar_t;
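
For anyone who wants to verify this against their own compiler and headers rather than read the include file, a small check along these lines should do. The function names are made up for the example. With the typedef above and a 32-bit int, wchar_t_bits() would report 32.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Width of wchar_t in bits for the compiler and headers in use. */
unsigned wchar_t_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}

/* Non-zero if wchar_t can represent every Unicode code point,
 * i.e. values up to U+10FFFF. A 16-bit wchar_t fails this test. */
int wchar_t_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Fail fast (in debug builds) if code that stores decoded code points
 * in wchar_t is compiled against headers with a narrow wchar_t. */
void require_wide_wchar_t(void)
{
    assert(wchar_t_holds_any_code_point());
}
```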