filesysbox ntfs ubs massStorage problem

A forum for general AmigaOS 4.x support questions that are not platform-specific
gazelle
Posts: 102
Joined: Sun Mar 04, 2012 12:49 pm
Location: Frohnleiten, Austria

Re: filesysbox ntfs ubs massStorage problem

Post by gazelle »

salass00 wrote:... Given that I'm only interested in one mapping table (ISO-8859-15) ...
I still think you should use the local charset for filesystems that can handle UTF-8/16/32 or whatever.

from the autodocs of locale.doc:
locale.library/OpenLocale wrote:... When passing a NULL name parameter to this function, you are guaranteed a valid return. ...
What happens when you create a new file with a local name (with code points outside US-ASCII) and your local charset is NOT ISO-8859-15?
joerg
Posts: 371
Joined: Sat Mar 01, 2014 5:42 am

Re: filesysbox ntfs ubs massStorage problem

Post by joerg »

salass00 wrote:IIRC even though in programs the filenames may be displayed as if they were using the locale defined codeset the filesystems still always handle the filenames internally as if they are ISO-8859-1/ISO-8859-15 encoded when doing operations with them like case insensitive string comparison.
No AmigaOS file system uses ISO-8859-15 (š 0xA8 = Š 0xA6, ž 0xB8 = Ž 0xB4, ÿ 0xFF = Ÿ 0xBE, œ 0xBD = Œ 0xBC).
FFS (DOS\2-DOS\7 DOSTypes only) and SFS (if formatted without using the case sensitive option) use ISO-8859-1 for case insensitive compares, it's required for backward compatibility to their AmigaOS 3.x versions (before AmigaOS 4.x everything was ISO-8859-1, there was no charset support in AmigaOS yet).
JXFS uses ASCII ([a-z]=[A-Z]) if formatted without the case sensitive option, therefore regardless of this option UTF-8 file names are always case sensitive on it. IIRC it's the same for FFS with DOSTypes DOS\0 and DOS\1.
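A minimal C sketch of the ASCII-only compare described above, to show why UTF-8 names stay case sensitive under it. The function names are hypothetical and are not taken from JXFS or any AmigaOS API; the only assumption is that [a-z] is folded to [A-Z] and every other byte has to match exactly.

```c
#include <stddef.h>

/* Fold only the 26 ASCII letters. Every other byte, including all
 * UTF-8 lead and continuation bytes (0x80-0xFF), is returned unchanged. */
unsigned char ascii_fold(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/* Hypothetical helper: returns 1 when both names are equal under
 * ASCII-only case folding, 0 otherwise. */
int names_equal_ascii_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (ascii_fold((unsigned char)*a) != ascii_fold((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

With such a compare "ReadMe" and "README" match, while the UTF-8 encodings of ä (0xC3 0xA4) and Ä (0xC3 0x84) do not, because the second byte differs and is never folded.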
Using ISO-8859-15 in an AmigaOS 4.x file system makes no sense at all, it's either ASCII only (identical in all charsets supported by AmigaOS 4.x), ISO-8859-1 for backward compatibility if it's an old AmigaOS-only file system which is available in an AmigaOS 3.x/m68k version as well, or UTF-8 (simply because everything else can't work to get unique names independent of the currently selected charset in locale prefs, even if there are currently a lot of problems with it and probably no single software which can display UTF-8 file names correctly yet).

For your file systems, which have to support Unicode because other systems using them do, you have two options:
- UTF-8
- en-/decoding names with non-ASCII chars using something like Punycode, mime quoted printable, OSTA UDF file name translation (with IsIllegal(x) (x < 32 || x >= 127 || x == ':' || x == '/') ? 1 : 0), etc.
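A rough C sketch of the second option, assuming a simple "=XX" hex escape in the spirit of quoted-printable. This is not the actual Punycode, MIME or OSTA UDF algorithm; escape_name() and the choice of '=' as escape character are made up for illustration. Only the IsIllegal() test comes from the text above.

```c
#include <stddef.h>

/* The predicate quoted above: control bytes, DEL, everything >= 0x80
 * and the two AmigaDOS path separators need escaping. */
#define IS_ILLEGAL(x) ((x) < 32 || (x) >= 127 || (x) == ':' || (x) == '/')

/* Hypothetical helper: writes src (src_len bytes) to dst, replacing each
 * illegal byte with "=XX". '=' itself is escaped too, so the mapping
 * stays reversible. Returns the encoded length, or 0 if dst is too
 * small (an empty name also yields 0). */
size_t escape_name(const unsigned char *src, size_t src_len,
                   char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t out = 0;

    for (size_t i = 0; i < src_len; i++) {
        unsigned char c = src[i];
        if (IS_ILLEGAL(c) || c == '=') {
            if (out + 3 >= dst_size)   /* 3 chars plus trailing NUL */
                return 0;
            dst[out++] = '=';
            dst[out++] = hex[c >> 4];
            dst[out++] = hex[c & 0x0F];
        } else {
            if (out + 1 >= dst_size)   /* 1 char plus trailing NUL */
                return 0;
            dst[out++] = (char)c;
        }
    }
    if (out >= dst_size)
        return 0;
    dst[out] = '\0';
    return out;
}
```

The UTF-8 name "Jörg" (4A C3 B6 72 67) would come out as "J=C3=B6rg", which is pure ASCII and therefore looks the same under every charset selected in locale prefs.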
salass00
AmigaOS Core Developer
Posts: 530
Joined: Sat Jun 18, 2011 3:12 pm
Location: Finland

Re: filesysbox ntfs ubs massStorage problem

Post by salass00 »

@joerg

AFAIK ISO-8859-15 is exactly the same as ISO-8859-1 except for the euro sign which does not exist in ISO-8859-1. CrossDOS supports the euro sign (it has special code for this so it's not just working by mistake) so it must be using ISO-8859-15.
joerg
Posts: 371
Joined: Sat Mar 01, 2014 5:42 am

Re: filesysbox ntfs ubs massStorage problem

Post by joerg »

salass00 wrote:AFAIK ISO-8859-15 is exactly the same as ISO-8859-1 except for the euro sign which does not exist in ISO-8859-1.
Wrong, 8 chars changed: https://en.wikipedia.org/wiki/ISO/IEC_8859-15
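For reference, the eight positions can be written down as a small C table. The struct and function names are only illustrative; the byte values and Unicode code points are those of the two ISO standards.

```c
#include <stddef.h>

/* The eight byte values where ISO-8859-15 differs from ISO-8859-1. */
struct latin9_diff {
    unsigned char  byte;
    unsigned short unicode;
};

static const struct latin9_diff latin9_diffs[] = {
    { 0xA4, 0x20AC },  /* euro sign, was currency sign */
    { 0xA6, 0x0160 },  /* S with caron, was broken bar */
    { 0xA8, 0x0161 },  /* s with caron, was diaeresis */
    { 0xB4, 0x017D },  /* Z with caron, was acute accent */
    { 0xB8, 0x017E },  /* z with caron, was cedilla */
    { 0xBC, 0x0152 },  /* OE ligature, was one quarter */
    { 0xBD, 0x0153 },  /* oe ligature, was one half */
    { 0xBE, 0x0178 },  /* Y with diaeresis, was three quarters */
};

/* ISO-8859-1 maps every byte to the code point of the same value. */
unsigned int latin1_to_unicode(unsigned char b)
{
    return b;
}

/* ISO-8859-15 is identical except for the eight table entries. */
unsigned int latin9_to_unicode(unsigned char b)
{
    for (size_t i = 0; i < sizeof latin9_diffs / sizeof latin9_diffs[0]; i++) {
        if (latin9_diffs[i].byte == b)
            return latin9_diffs[i].unicode;
    }
    return b;
}
```

The same stored byte 0xA4 is therefore a currency sign for an ISO-8859-1 user and a euro sign for an ISO-8859-15 user.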

You have to use UTF-8 or an ASCII en-/decoding of unicode chars, anything else doesn't work.
salass00
AmigaOS Core Developer
Posts: 530
Joined: Sat Jun 18, 2011 3:12 pm
Location: Finland

Re: filesysbox ntfs ubs massStorage problem

Post by salass00 »

joerg wrote:
salass00 wrote:AFAIK ISO-8859-15 is exactly the same as ISO-8859-1 except for the euro sign which does not exist in ISO-8859-1.
Wrong, 8 chars changed: https://en.wikipedia.org/wiki/ISO/IEC_8859-15

You have to use UTF-8 or an ASCII en-/decoding of unicode chars, anything else doesn't work.
7 chars I couldn't care less about TBH, but I guess I can change the encoding to ISO-8859-1 if it makes you happy.
joerg
Posts: 371
Joined: Sat Mar 01, 2014 5:42 am

Re: filesysbox ntfs ubs massStorage problem

Post by joerg »

salass00 wrote:7 chars I couldn't care less about TBH, but I guess I can change the encoding to ISO-8859-1 if it makes you happy.
ISO-8859-1 is only OK for an AmigaOS 3.x version, but not for an AmigaOS 4.x file system.
On AmigaOS 4.x ISO-8859-1 is just as wrong as any other 8 bit charset, it only works for users who currently use the same charset you've hard coded into your file system, and no matter which one you choose most users are using a different charset.
salass00
AmigaOS Core Developer
Posts: 530
Joined: Sat Jun 18, 2011 3:12 pm
Location: Finland

Re: filesysbox ntfs ubs massStorage problem

Post by salass00 »

joerg wrote:
salass00 wrote:7 chars I couldn't care less about TBH, but I guess I can change the encoding to ISO-8859-1 if it makes you happy.
ISO-8859-1 is only OK for an AmigaOS 3.x version, but not for an AmigaOS 4.x file system.
On AmigaOS 4.x ISO-8859-1 is just as wrong as any other 8 bit charset, it only works for users who currently use the same charset you've hard coded into your file system, and no matter which one you choose most users are using a different charset.
Well I never claimed it was a perfect solution but it's either this or nothing currently. Using UTF-8 is not an option because absolutely nothing supports it and changing filesystem charset depending on locale settings is completely retarded and doesn't really solve anything any more than your other solution of using only pure ASCII would.

Personally I would rather just use UTF-8 and get rid of all this character set conversion garbage in filesysbox but it just isn't going to happen.
joerg
Posts: 371
Joined: Sat Mar 01, 2014 5:42 am

Re: filesysbox ntfs ubs massStorage problem

Post by joerg »

salass00 wrote:Using UTF-8 is not an option because absolutely nothing supports it
As long as nobody starts using UTF-8 it will never happen. And what's the problem with the UTF-8 file names in JXFS, except that next to nothing displays them correctly yet?
and changing filesystem charset depending on locale settings is completely retarded
Of course it is, and that's exactly what you are currently doing by converting Unicode chars to random 8 bit chars. Unless a user happens to use the same charset you are using in your file systems, the file names will be displayed for him with chars which have nothing to do with the real chars in the file names. It's currently the same for UTF-8 in most software, but if UTF-8 is used the programs can be updated to support UTF-8, whereas guessing which charset different file systems might use can't work.
nbache
Beta Tester
Posts: 1714
Joined: Mon Dec 20, 2010 7:25 pm
Location: Copenhagen, Denmark

Re: filesysbox ntfs ubs massStorage problem

Post by nbache »

joerg wrote:
salass00 wrote:Using UTF-8 is not an option because absolutely nothing supports it
As long as nobody starts using UTF-8 it will never happen. And what's the problem with the UTF-8 file names in JXFS, except that next to nothing displays them correctly yet?
I have to support Jörg in this.

It would be much better to have the UTF-8 names used even if most software will display them weirdly at the moment.

And at the very least, please reconsider the unfortunate workaround of simply skipping the files with unmatched characters. It is very confusing for a user to get the idea that some files have disappeared, as opposed to just seeing them with partly scrambled names. Even something like
joerg wrote:en-/decoding names with non-ASCII chars using something like Punycode, mime quoted printable, OSTA UDF file name translation (with IsIllegal(x) (x < 32 || x >= 127 || x == ':' || x == '/') ? 1 : 0), etc.
would be much better.

Best regards,

Niels
colinw
AmigaOS Core Developer
Posts: 207
Joined: Mon Aug 15, 2011 9:20 am
Location: Brisbane, QLD. Australia.

Re: filesysbox ntfs ubs massStorage problem

Post by colinw »

I'm also currently doing a preliminary go-over of the dos.library code and making it as encoding-agnostic as possible.
The only two ASCII / UTF-8 bytes being tested for are "/" and ":", plus constant ASCII strings like "NIL:";
the rest is treated as a simple byte stream. Currently there are a couple of functions that do case-insensitive searches
in DOS, e.g. FindDosEntry() and such, so I use utility functions like IUtility->Stricmp() for this, so that when the
time comes, these functions can be UTF8-ified in one hit.
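A simplified C sketch of what treating a path as a byte stream could look like. The two helpers are hypothetical and are not dos.library functions, and special AmigaDOS cases such as a leading "/" for the parent directory are ignored. It relies on one property of UTF-8: every byte of a multi-byte sequence is >= 0x80, so a search for ":" (0x3A) or "/" (0x2F) can never hit the middle of a character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the part after the first ':',
 * or the whole string if there is no device/volume prefix. */
const char *skip_volume(const char *path)
{
    const char *colon = strchr(path, ':');
    return colon ? colon + 1 : path;
}

/* Hypothetical helper: length in bytes of the first path component.
 * All bytes other than '/' are passed over without interpretation. */
size_t first_component_len(const char *path)
{
    return strcspn(path, "/");
}
```

For the UTF-8 path "USB0:Björk/€.txt", skip_volume() returns the part starting at "Björk", and first_component_len() reports 6 bytes for that component (the ö occupies two bytes) without the code knowing anything about the encoding.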

As far as the filesystems are concerned, and me having spent years working around hard-coded filesystem limitations:
no matter how you try to manually interpret the data, it will inevitably be wrong at some point. Neither DOS nor the
filesystems should even need to try to manually interpret the data stored; they should just store and retrieve it.
It's up to the display software to do the decoding.

With the legacy we are presented with, the only viable extensible encoding we can use is UTF-8, and luckily that also includes
the 7 bit ASCII legacy bytes that we have always keyed off. So my opinion is that (besides the couple of ASCII control bytes)
we should simply store the data as presented, as a byte stream, and make no assumptions about what anything >= 0x80
represents, because it is virtually guaranteed to be wrong at some point.

At this time, I'm leaving case sensitivity/insensitivity to the appropriate utility functions and trying not to make any
other assumptions in the code at all, BUT I'm also avoiding byte-wide case changes using ToUpper(), ToLower() etc.,
as these will be problematic later on with UTF-8 streams.
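To illustrate why byte-wide case changes are a problem, here is a small C sketch. latin1_toupper() is a hypothetical stand-in for an ISO-8859-1 based upper-casing routine and is not meant to reproduce what the AmigaOS ToUpper() actually does.

```c
#include <stddef.h>

/* Hypothetical byte-wide upper-casing by ISO-8859-1 rules:
 * a-z and 0xE0-0xFE (except the division sign 0xF7) are lowered by 0x20. */
unsigned char latin1_toupper(unsigned char c)
{
    if ((c >= 'a' && c <= 'z') || (c >= 0xE0 && c <= 0xFE && c != 0xF7))
        return (unsigned char)(c - 0x20);
    return c;
}

/* Applies the byte-wide conversion to a whole buffer in place. */
void upcase_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = latin1_toupper(buf[i]);
}
```

Run over the UTF-8 euro sign (E2 82 AC), this turns the lead byte 0xE2 into 0xC2. The result C2 82 AC is no longer a euro sign and is not even valid UTF-8, since 0xAC is left over as a stray continuation byte.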
Last edited by colinw on Thu Mar 13, 2014 1:53 pm, edited 1 time in total.