ARexx - potential bug or just a badly written script?

Belxjander
Posts: 315
Joined: Mon May 14, 2012 11:26 pm
Location: 日本千葉県松戸市 / Matsudo City, Chiba, Japan

Post by Belxjander »

In trying to build an ARexx script to generate the initial catalog descriptor and catalog translation files I am going to use as a database,
I have found an annoying though hard-to-reproduce problem.

When I run Unicode.rexx, the 1GB of memory on my sam440flex ends up fully utilized (I boot to Workbench with ~200MB in use and can NOT run anything else while this script is running...).

Am I hitting some hidden behaviour or something that is not being triggered?

With ~800MB of free physical memory and no swap partitions at all, I end up with less than 20MB free when the script is run on its own.

Could any of the developers please confirm whether the following command line is able to complete without using an excessive amount of memory for the 2K of ARexx code linked?

Code: Select all

RX Unicode.rexx CJK-Unified-Ideographs 4E00 A000

The source is [[EDIT: fixed a cx instead of c reference for the Output generated]]

Code: Select all

/*
\\	Unicode.rexx
//
\\	Generate a Unicode Catalog Fragment based on a Given CodePoint to start and a range-limit to terminate.
*/
Options Results
Parse Arg CatFname rb rh Vector
Range=X2D(rh)-X2D(rb)
Echo 'Section (From=U'||rb||'/To=U'||rh||'/Range='||Range||') is ['||CatFname||']'||Vector||' '

CDFP=CatFname||'.cd'
If ~Open(CDFH,CDFP,WRITE) Then Do
    Echo 'Failed to Open Catalog Description for Writing'
	Exit
	End
CTFP=CatFname||'.ct'
If ~Open(CTFH,CTFP,WRITE) Then Do
    Echo 'Failed to Open Catalog Translation for Writing'
	Exit
	End

Do c=X2D(rb) while c<X2D(rh)
	Select
		When c<X2D('80') Then cx=D2C(c);
		When c<X2D('800') Then If Length(D2C(c))=1 Then Do
				cx=B2C('110000'||SubStr(C2B(D2C(c)),1,2))||B2C('10'||SubStr(C2B(D2C(c)),3,6));
			End;Else If Length(D2C(c))=2 Then Do
	            cx=B2C('110'||SubStr(C2B(SubStr(D2C(c),1,1)),6,3)||SubStr(C2B(SubStr(D2C(c),2,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),3,6))
			End;
		When c<X2D('10000') Then Do
			cx=B2C('1110'||SubStr(C2B(SubStr(D2C(c),1,1)),1,4))||B2C('10'||SubStr(C2B(SubStr(D2C(c),1,1)),5,4)||SubStr(C2B(SubStr(D2C(c),2,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),3,6))
			End;
		When c<X2D('200000') Then If Length(D2C(c))=4 Then Do
				cx=B2C('11110'||SubStr(C2B(SubStr(D2C(c),1,1)),6,3))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),1,6))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),7,2)||SubStr(C2B(SubStr(D2C(c),3,1)),1,4))||B2C('10'||SubStr(C2B(SubStr(D2C(c),3,1)),5,4)||SubStr(C2B(SubStr(D2C(c),4,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),4,1)),3,6))
			End;Else If Length(D2C(c))=3 Then Do
				cx=B2C('11110'||SubStr(C2B(SubStr(D2C(c),1,1)),4,3))||B2C('10'||SubStr(C2B(SubStr(D2C(c),1,1)),7,2)||SubStr(C2B(SubStr(D2C(c),2,1)),1,4))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),5,4)||SubStr(C2B(SubStr(D2C(c),3,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),3,1)),3,6))
			End;
/*
		When c<X2D('4000000') Then Do
			End;
*/
		Otherwise Exit();
	End;
    WriteLn(CDFH,'MSG_U'||C2X(D2C(c))||' ( '||cx||' / '||Length(cx)||' / 128 )'||'0A'x||'U='||C2X(cx)||'0A'x||';;');
    WriteLn(CTFH,'MSG_U'||C2X(D2C(c))||'0A'x||'U['||cx||']'||'0A'x||';;');
End

Last edited by Belxjander on Sat Sep 06, 2014 5:17 pm, edited 1 time in total.
jaokim
Beta Tester
Posts: 92
Joined: Sat Jun 18, 2011 1:41 am

Re: ARexx - potential bug or just a badly written script?

Post by jaokim »

Does it ever complete?

Looking quickly at the code, it seems to be a loop dependent on the variable "c". Is that ever changed in the loop? I couldn't spot where it's changed.

Given that there isn't a lot of code, it'd be fairly easy to debug it. I'm thinking you have an infinite loop.
xenic
Posts: 1185
Joined: Sun Jun 19, 2011 1:06 am

Re: ARexx - potential bug or just a badly written script?

Post by xenic »

jaokim wrote:
> Does it ever complete?
>
> Looking quickly at the code, it seems to be a loop dependent on the variable "c". Is that ever changed in the loop? I couldn't spot where it's changed.
>
> Given that there isn't a lot of code, it'd be fairly easy to debug it. I'm thinking you have an infinite loop.

I can't find the loop control variable "c" being changed anywhere either. I also see 2 files opened but never closed. I don't see it opening any files for input and wonder how it's producing locale definition and translation files without any input file. Maybe I just don't understand what the script is doing or what its purpose is.
AmigaOne X1000 with 2GB memory - OS4.1 FE

Re: ARexx - potential bug or just a badly written script?

Post by Belxjander »

It is an ARexx "Do While" conditional loop,

c is set by the Do and tested by the While at the start of the loop, and the "End" triggers the increment.

It is pregenerating sections of the .cd and .ct files (opened before the loop and written with WriteLn during it).

If you use the link and change to "GNUmakefile" you will see the calls applied to the script.

Each "label" after the script name is the name of a section in the Unicode (v7) specification.
The first number is the initial code point to convert to UTF-8 for output; the second is the code point immediately after the range.
Some ranges have odd numbers because of missing characters and don't generate a complete section.

I plan to edit the generated catalog sections by passing them through another script and merging in additional data.

The CJK-Unified-Ideographs section runs into a memory limit on my sam440flex (1GB with 200MB used after boot; running the script sees up to ~970MB of memory used).

I'm trying to work out if there is anything special as to why it requires such an enormous amount of memory when running.

Re: ARexx - potential bug or just a badly written script?

Post by xenic »

Belxjander wrote:
> It is an ARexx "Do While" conditional loop,

Duh, my bad. The loop is fine. I looked at your makefile and came up with a single line to test the script directly from a shell: "RX Unicode.rexx CJK-Unified-Ideographs 4E00 A000". I can confirm your problem. Even my X1000 system with 2GB memory froze when too much memory was used up. However, I don't think the problem is your script or ARexx. I commented out the output lines ( WriteLn() ) and the script runs to completion without using up memory. When I substituted output lines using AmigaDOS echo (ADDRESS COMMAND 'Echo ...') commands, the script was much slower but still used up all my memory.

I did some testing using a simple AmigaDOS script that writes the same line to a file as many times as your ARexx script writes to its files. I lowered the loop count to 15,000 so my system wouldn't crash. My AmigaDOS script chews up almost as much memory as your ARexx script. Here is my test script:

Code: Select all

; Test script

Avail

Set total 15000
Set counter 0

LAB repeat

Echo >>file "This is a big fat slow test"

Set counter `EVAL $counter + 1`

If VAL $counter NOT GT $total
	skip repeat back
endif

Avail

Unset total
Unset counter

Here is the result when I ran the script in "ram:" using the timer command to get the elapsed execution time:
  • 3.Ram_Disk:> timer scriptloop
    Installed: 2,147,483,648
    Free: 1,819,308,032
    (Script is executing here)
    Installed: 2,147,483,648
    Free: 744,820,736
    Command duration: 724.3980 sec.

    3.Ram_Disk:> avail
    Installed: 2,147,483,648
    Free: 1,543,286,784
Most of my memory was used. After the script finished, it took several minutes for most of the available memory to return. However, almost 300MB was lost permanently.

Here is the result of running my script in a USB flash drive after a reboot:
  • 3.PUBLIC:> timer scriptloop
    Installed: 2,147,483,648
    Free: 1,827,737,600
    (Script is executing here)
    Installed: 2,147,483,648
    Free: 1,792,966,656
    Command duration: 79.2075 sec.

    3.PUBLIC:> avail
    Installed: 2,147,483,648
    Free: 1,804,247,040
Here is the result of running my script a second time in the same USB flash drive without a reboot:
  • 3.PUBLIC:> timer scriptloop
    Installed: 2,147,483,648
    Free: 1,805,799,424
    (Script is executing here)
    Installed: 2,147,483,648
    Free: 752,455,680
    Command duration: 701.2888 sec.

    3.PUBLIC:> avail
    Installed: 2,147,483,648
    Free: 1,550,913,536
The first time I ran the script in the USB flash drive, it seemed like the written data was buffered because there were only a small number of LED activity indications on the flash drive. However, running the script a second time in the flash drive (after waiting for memory to return from the first trial) didn't produce any LED activity indications and the script took as long to execute as it did in ram: and it also chewed up most of my memory.

It looks to me like DOS is totally screwed up when it comes to rapid sequential disk writing. There are 3 big problems:
1. Too much memory is being used for rapid simple disk writing.
2. A large amount of memory is never returned to the system.
3. The system crashes or freezes when too much memory is used.

Could a beta tester try my script and/or Belxjander's ARexx script, please? This issue needs to be fixed.

Re: ARexx - potential bug or just a badly written script?

Post by xenic »

Belxjander wrote:
> I'm trying to work out if there is anything special as to why it requires such an enormous amount of memory when running

Based on the results in my previous post, I think you're going to need to find a "workaround" to make your script work.

Re: ARexx - potential bug or just a badly written script?

Post by Belxjander »

So I have literally walked into a dos.library/FileSystem level issue with regard to the amount of memory actively used for the output buffering...

Well, I will leave the script in the repository, and I plan on reworking the GNUmakefile as this was a temporary script for partial template generation.

The templates are to be filled in with correct data by hand anyway, once the initial IDs are all present.

I am putting a whole set of one-time-to-generate data strings into Catalogs based on Unicode "CodePoint" values as the StringID to be obtained from the Catalog.

It is somewhat of a surprise as I thought my own scripting was at fault in buffer management or some other reason.

Re: ARexx - potential bug or just a badly written script?

Post by xenic »

Belxjander wrote:
> So I have literally walked into a dos.library/FileSystem level issue with regards the amount of memory actively used for the output buffering...

I did some more testing and it appears that the memory management can't keep up with continuous rapid file writing without buffering (which ram: apparently lacks). I can't explain why USB buffering goes bad after running the script one time.

I got your script working by adding a simple counter and a delay after every 200 writes. Here is your modified script that appears to work in ram: and doesn't eat any memory. It's counterintuitive but the script actually runs faster with added delays. I tested it from a shell with RX Unicode.rexx CJK-Unified-Ideographs 4E00 A000. Here is the modified script:

Code: Select all

/*
\\   Unicode.rexx
//
\\   Generate a Unicode Catalog Fragment based on a Given CodePoint to start and a range-limit to terminate.
*/
Options Results
Parse Arg CatFname rb rh Vector
Range=X2D(rh)-X2D(rb)
Echo 'Section (From=U'||rb||'/To=U'||rh||'/Range='||Range||') is ['||CatFname||']'||Vector||' '

CDFP=CatFname||'.cd'
If ~Open(CDFH,CDFP,WRITE) Then Do
    Echo 'Failed to Open Catalog Description for Writing'
   Exit
   End
CTFP=CatFname||'.ct'
If ~Open(CTFH,CTFP,WRITE) Then Do
    Echo 'Failed to Open Catalog Translation for Writing'
   Exit
   End

count=0
Do c=X2D(rb) while c<X2D(rh)
   count=count+1
   Select
      When c<X2D('80') Then cx=D2C(c);
      When c<X2D('800') Then If Length(D2C(c))=1 Then Do
            cx=B2C('110000'||SubStr(C2B(D2C(c)),1,2))||B2C('10'||SubStr(C2B(D2C(c)),3,6));
         End;Else If Length(D2C(c))=2 Then Do
               cx=B2C('110'||SubStr(C2B(SubStr(D2C(c),1,1)),6,3)||SubStr(C2B(SubStr(D2C(c),2,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),3,6))
         End;
      When c<X2D('10000') Then Do
         cx=B2C('1110'||SubStr(C2B(SubStr(D2C(c),1,1)),1,4))||B2C('10'||SubStr(C2B(SubStr(D2C(c),1,1)),5,4)||SubStr(C2B(SubStr(D2C(c),2,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),3,6))
         End;
      When c<X2D('200000') Then If Length(D2C(c))=4 Then Do
            cx=B2C('11110'||SubStr(C2B(SubStr(D2C(c),1,1)),6,3))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),1,6))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),7,2)||SubStr(C2B(SubStr(D2C(c),3,1)),1,4))||B2C('10'||SubStr(C2B(SubStr(D2C(c),3,1)),5,4)||SubStr(C2B(SubStr(D2C(c),4,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),4,1)),3,6))
         End;Else If Length(D2C(c))=3 Then Do
            cx=B2C('11110'||SubStr(C2B(SubStr(D2C(c),1,1)),4,3))||B2C('10'||SubStr(C2B(SubStr(D2C(c),1,1)),7,2)||SubStr(C2B(SubStr(D2C(c),2,1)),1,4))||B2C('10'||SubStr(C2B(SubStr(D2C(c),2,1)),5,4)||SubStr(C2B(SubStr(D2C(c),3,1)),1,2))||B2C('10'||SubStr(C2B(SubStr(D2C(c),3,1)),3,6))
         End;

      Otherwise Exit();
   End;
    WriteLn(CDFH,'MSG_U'||C2X(D2C(c))||' ( '||cx||' / '||Length(cx)||' / 128 )'||'0A'x||'U='||C2X(cx)||'0A'x||';;');
    WriteLn(CTFH,'MSG_U'||C2X(D2C(c))||'0A'x||'U['||cx||']'||'0A'x||';;');
    If count>200 Then Do
        ADDRESS COMMAND Wait
        count=0
    End;
End

gazelle
Posts: 102
Joined: Sun Mar 04, 2012 12:49 pm
Location: Frohnleiten, Austria

Re: ARexx - potential bug or just a badly written script?

Post by gazelle »

The function WRITELN() returns a value. You should use either "CALL WRITELN(...)" or "var = WRITELN(...)".

Otherwise ARexx will try to execute a command with the name of the returned value.

Re: ARexx - potential bug or just a badly written script?

Post by xenic »

gazelle wrote:
> The function WRITELN() returns a value. You should use either "CALL WRITELN(...)" or "var = WRITELN(...)".
>
> Otherwise ARexx will try to execute a command with the name of the returned value.

Since it returns a number I think it's unlikely that any command will be executed. However, you're right about the return value and the script should probably check the return value and report the error before closing the file and exiting. As I noted earlier, the script doesn't close the output files at all. I just added the counter and delay to prevent having all memory used up. The script still needs a little work.

My takeaway from this topic is that some fixes are needed in OS4. It might be interesting to see if a C program can produce the same excess memory consumption by performing a large number of consecutive data writes to a file in ram: like the ARexx and DOS scripts.