filesysbox ntfs ubs massStorage problem

Post by gazelle »

joerg wrote:The software used for displaying or entering the names has to do the conversion between UTF-8 and the local 8-bit charset; in this case it's the Workbench which has to be updated. Doing it in the file system (or dos.library) instead is wrong and can't work: currently only 8-bit charsets are supported by AmigaOS 4.x, but the file systems, especially the ones used for transferring data from/to other OSes, have to support all Unicode chars.
But it's the filesystem which defines the format in which it stores the names. So shouldn't it be responsible to make the conversion? Otherwise each application would need to ask the filesystem first in which format it wants the names and adapt to it accordingly.

@salass00:

Nah, it's just an interesting topic to discuss.

Post by salass00 »

gazelle wrote: But it's the filesystem which defines the format in which it stores the names. So shouldn't it be responsible to make the conversion? Otherwise each application would need to ask the filesystem first in which format it wants the names and adapt to it accordingly.
I don't think that character conversion in and of itself is necessarily a bad thing. For example, the NTFS3G and exFAT filesystems still convert internally between UTF-8 and the 16-bit character encoding that Windows uses for filenames.
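To make that kind of internal conversion concrete, here is a minimal sketch in portable C of turning a UTF-8 name into UTF-16 code units. The function name utf8_to_utf16 and its return convention are assumptions made for illustration only; this is not the code NTFS3G or exFAT actually use.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a NUL-terminated UTF-8 string to UTF-16
 * code units. Returns the number of code units written (not counting the
 * terminating 0), or -1 on malformed input or if dst is too small.
 */
long utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        uint32_t cp, min;
        int extra;

        /* The lead byte tells us how many continuation bytes follow. */
        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0;       }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80;    }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800;   }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return -1;               /* stray continuation or invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) return -1;   /* truncated sequence */
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
            s++;
        }

        /* Reject overlong forms, surrogates and values beyond Unicode. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        if (cp < 0x10000) {
            if (n + 1 >= dst_len) return -1;      /* keep room for the terminator */
            dst[n++] = (uint16_t)cp;
        } else {
            /* Characters above the BMP become a surrogate pair. */
            if (n + 2 >= dst_len) return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (n >= dst_len) return -1;
    dst[n] = 0;
    return (long)n;
}
```

A caller would pass the UTF-8 name received from the OS and a buffer of uint16_t; the reverse direction follows the same pattern with the steps inverted.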

Post by joerg »

gazelle wrote:But it's the filesystem which defines the format in which it stores the names.
For the internal encoding on the HD, if the file system format requires one, yes. For names received from or returned to the OS (dos.library), no: that's UTF-8 now.
I'll fix my file systems (SFS, etc.) as well to convert between the internal encoding (ISO-8859-1 in case of SFS\0 and SFS\2) and UTF-8, and add new, AmigaOS 4.x only DOSTypes (SFS\1 and SFS\3) which allow using all UTF-8 strings.
gazelle wrote:So shouldn't it be responsible to make the conversion?
No, but even if it should, it couldn't. To get the charset you have to open locale.library and use locale = ILocale->OpenLocale(NULL). locale.library is in LIBS:, not a kickstart module, and it depends on several other files on SYS: as well. If SYS: is an SFS partition and SFS tried to open locale.library, which has to be loaded from this same SFS partition, you'd get an endless loop or a deadlock ...
gazelle wrote:Otherwise each application would need to ask the filesystem first in which format it wants the names and adapt to it accordingly.
The names are in UTF-8 on any file system (most still need some bug fixes); they are no longer limited to ASCII (everything >= 160 was just undefined bytes until now).
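The conversion between an ISO-8859-1 on-disk encoding and UTF-8 mentioned above for SFS\0 and SFS\2 needs no locale.library, because every Latin-1 byte maps directly to the Unicode code point of the same value. A minimal sketch in portable C, where the function name latin1_to_utf8 and its return convention are assumptions for illustration and not taken from SFS:

```c
#include <stddef.h>

/*
 * Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to UTF-8.
 * Returns the number of bytes written (not counting the terminating NUL),
 * or -1 if dst is too small.
 */
long latin1_to_utf8(const char *src, char *dst, size_t dst_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    for (; *s; s++) {
        if (*s < 0x80) {
            /* 7-bit ASCII is identical in both encodings. */
            if (n + 1 >= dst_len) return -1;
            dst[n++] = (char)*s;
        } else {
            /* 0x80..0xFF become a two-byte sequence: 110000xx 10xxxxxx. */
            if (n + 2 >= dst_len) return -1;
            dst[n++] = (char)(0xC0 | (*s >> 6));
            dst[n++] = (char)(0x80 | (*s & 0x3F));
        }
    }

    if (n >= dst_len) return -1;
    dst[n] = '\0';
    return (long)n;
}
```

The output can be up to twice as long as the input, so a name that fits the on-disk limit in Latin-1 may need a larger buffer once it is handed to the OS as UTF-8.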

Post by gazelle »

joerg wrote:To get the charset you have to open locale.library and use locale = ILocale->OpenLocale(NULL). locale.library is in LIBS:, not a kickstart module, and it depends on several other files on SYS: as well. If SYS: is a SFS partition and SFS would try to open locale.library which has to be loaded from this SFS partition you'd get an endless loop or a deadlock ...
Ok, that's a pretty good reason. As this discussion started with filesysbox.lib / NTFS I didn't think of the automount filesystems.
joerg wrote:The names are in UTF-8, on any file system (most still need some bug fixes), it's no longer limited to ASCII (everything >= 160 was just undefined bytes until now).
That's new, but I guess it's needed as a first step to support multibyte charsets. Oh, I can hear the outcry of some of the community members: "But, but, but ... that will break the backward ... my super old program won't work anymore ... ", now where is my popcorn ;)

Post by Belxjander »

gazelle wrote:
joerg wrote:To get the charset you have to open locale.library and use locale = ILocale->OpenLocale(NULL). locale.library is in LIBS:, not a kickstart module, and it depends on several other files on SYS: as well. If SYS: is a SFS partition and SFS would try to open locale.library which has to be loaded from this SFS partition you'd get an endless loop or a deadlock ...
Ok, that's a pretty good reason. As this discussion started with filesysbox.lib / NTFS I didn't think of the automount filesystems.
joerg wrote:The names are in UTF-8, on any file system (most still need some bug fixes), it's no longer limited to ASCII (everything >= 160 was just undefined bytes until now).
That's new, but I guess it's needed as a first step to support multibyte charsets. Oh, I can hear the outcry of some of the community members: "But, but, but ... that will break the backward ... my super old program won't work anymore ... ", now where is my popcorn ;)
UTF-8 entry won't be an issue (I am working on an IME that outputs UTF-8 encoded strings into the system's Input Event stream). I already have UTF-8 encoded filenames that are inaccessible until I can actually deal with the UTF-8 encoded names (display and input being separate, afaik).

I've just got an internal encoding issue for compilation of single encodings of characters (I will be taking composition inputs and generating the distinct characters as a "deadkey" processing response).

For my own internal use I am going to use the UTF-8 "codepoint" values pretty much raw for internal buffering (a modified use of a TagItem structure for buffering purposes, as an overloaded Qualifiers(tag) and IE_Code(data) value pair).

I was considering using the 7-bit safe "URLencode" scheme for requester recognition, but given what colinw has said about dos.library being updated for UTF-8 safe string usage, I'll deal with side-by-side language input selections all being pushed to a common core UTF-8 encoding.
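For reference, the 7-bit safe "URLencode" idea amounts to percent-encoding every byte outside a small ASCII set, so a UTF-8 name survives any 8-bit-unaware code path. A minimal sketch in portable C, assuming the RFC 3986 unreserved set; the function name percent_encode is hypothetical and not part of Perception-IME or dos.library:

```c
#include <stddef.h>

/*
 * Hypothetical helper: percent-encode a NUL-terminated byte string.
 * Bytes in the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~) are copied,
 * everything else becomes %XX. Returns the number of bytes written
 * (not counting the terminating NUL), or -1 if dst is too small.
 */
long percent_encode(const char *src, char *dst, size_t dst_len)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    for (; *s; s++) {
        int unreserved =
            (*s >= 'A' && *s <= 'Z') || (*s >= 'a' && *s <= 'z') ||
            (*s >= '0' && *s <= '9') ||
            *s == '-' || *s == '.' || *s == '_' || *s == '~';

        if (unreserved) {
            if (n + 1 >= dst_len) return -1;
            dst[n++] = (char)*s;
        } else {
            if (n + 3 >= dst_len) return -1;
            dst[n++] = '%';
            dst[n++] = hex[*s >> 4];
            dst[n++] = hex[*s & 0x0F];
        }
    }

    if (n >= dst_len) return -1;
    dst[n] = '\0';
    return (long)n;
}
```

The cost is that each non-ASCII byte triples in size, which is one practical argument for passing UTF-8 through unchanged once dos.library handles it safely.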

This should allow English, Russian, Japanese and additional language support without any major weirdnesses and workarounds based on codepages and other things.

One thing I am definitely accepting is that I'll support only two output encodings, "raw original" (ISO Latin-1 encoding only) and "vanilla" (UTF-8 processed), so that there is some measure of backwards compatibility.
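One way a Latin-1-only output path could be derived from UTF-8 data is sketched below in portable C: code points up to U+00FF map to a single byte, and anything else is replaced with '?'. The function name utf8_to_latin1 and the '?' fallback are assumptions for illustration; the post does not say how Perception-IME handles characters that Latin-1 cannot represent.

```c
#include <stddef.h>

/*
 * Hypothetical helper: convert a NUL-terminated UTF-8 string to
 * ISO-8859-1, writing '?' for characters outside Latin-1.
 * Returns the number of bytes written (not counting the terminating NUL),
 * or -1 on malformed input or if dst is too small.
 */
long utf8_to_latin1(const char *src, char *dst, size_t dst_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s) {
        unsigned long cp, min;
        int extra;

        /* Decode one UTF-8 sequence into a code point. */
        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0;       }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80;    }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800;   }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return -1;               /* invalid lead byte */
        s++;

        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80) return -1;   /* truncated sequence */
            cp = (cp << 6) | (unsigned long)(*s & 0x3F);
            s++;
        }
        if (cp < min) return -1;      /* overlong encoding */

        /* One output byte per character, lossy above U+00FF. */
        if (n + 1 >= dst_len) return -1;
        dst[n++] = (cp <= 0xFF) ? (char)cp : '?';
    }

    if (n >= dst_len) return -1;
    dst[n] = '\0';
    return (long)n;
}
```

Because the fallback is lossy, two different UTF-8 names can collapse to the same Latin-1 string, so this only suits display or legacy output, not anything that has to round-trip.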

I'm thinking this will help the OS for more than filesystems (and I am leaving loading to be referential, based on language libraries being loaded from IPrefs).

Post by chris »

Belxjander wrote:UTF-8 Entry won't be an issue (I am working on an IME using UTF-8 encoding string outputs into the systems Input Event stream)
Absolutely. There's no reason why a small commodity couldn't be written to translate local charset input to UTF-8 on the fly. The display would be a bit wonky, and if the application already internally converts to UTF-8 there may be problems there. If the application isn't expecting UTF-8 input it's probably not all that useful though (this is where we need a MapRawKeyUTF8).

Post by Belxjander »

chris wrote:
Belxjander wrote:UTF-8 Entry won't be an issue (I am working on an IME using UTF-8 encoding string outputs into the systems Input Event stream)
Absolutely. There's no reason why a small commodity couldn't be written to translate local charset input to UTF-8 on the fly. The display would be a bit wonky, and if the application already internally converts to UTF-8 there may be problems there. If the application isn't expecting UTF-8 input it's probably not all that useful though (this is where we need a MapRawKeyUTF8).
I have an InputHandler() registered by a "Perception-IME" process launched from within "Perception.Library" (loading is launching here).

My plan is to enable buffering of keyboard input and translating it to an extended UTF-8 character option based on user-selected translation modes.

The "default" mode will be pass-through with the Alternate mode being "Translation" mode based on .language registered options.

This way Japanese for example will be supportable without losing English or Russian or other existing language support options.

The code for all of the above is already public (Perception-IME on http://code.google.com/p/perception-ime/).
For the moment I'll deal with "Input" options more or less exclusively and leave display alone (native UTF-8 support can be built in elsewhere for string safety).

For any filesystems of my own I'll be using FileSysBox.Library; I'm not going to worry about what medium they are on just yet.

EDIT: I'll switch any further discussion of my own projects to separate threads, as they are not directly related to this thread's FileSysBox/NTFS/USB problem discussion as far as I am aware.