Hard crash corrupting memory chips?

AmigaOne X1000 platform specific issues.
Hypex
Beta Tester
Posts: 442
Joined: Mon Dec 20, 2010 2:23 pm
Location: Vic. Australia.

Post by Hypex »

Hi guys.

I've been debugging a program I'm working on. I changed some things around in an attempt to clean up some operations and found it had the opposite effect: it crashed my program. I eventually worked out what was going on from a cryptic crash log and restored the old code, which had been working perfectly fine, and it was back in working order again.

But since then, I've started getting other crashes I can't figure out. They don't crash in my program but in other programs in the system. Somehow I have a cascading crash effect and I can't track it down. My program hard crashes with a system freeze, so I can only read the result from a serial log. I suspect there is an interrupt crashing, as the tasks affected make no sense. It's so bad that the reaper is crashing in the middle of dumping a log, and task suspending is failing because the task keeps crashing before it can be suspended! This doesn't make sense, as I thought the system disabled interrupts at that point, which should stop most things from crashing.

But the problem is now worse. One night, while I was testing, it crashed again, but this time it somehow seemed to ruin the hardware. After I reset, it kept going into a reboot loop. CFE crashed and reset. Then it loaded amigaboot and crashed. It kept rebooting and crashing. I turned it off for a minute or so. When I turned it back on, the crashing remained. I gave up and just went to bed.

The next night I turned it on and it seemed to boot perfectly fine. Then I tried to boot Linux, which loaded the kernel, but then the screen just went black. I rebooted and tried again with my recovery setup and it booted fine. I loaded up the desktop and used it for an hour with no problems.

I've also done some memory testing and suspect the RAM has gone bad. I already had this problem last year and thought I had solved it, but it may be back again. This comes after I replaced the battery and thought my machine was stable again. Can a hard crash permanently damage memory chips? It seems ridiculous but the timing is impeccable.

This is CFE crashing and rebooting in amigaboot as soon as it loads a config:

Booting configuration AmigaOS_4.1_Final_Edition

** Exception 0x0200: SRR0=000000007FD50A10 SRR1=100000000210B000 [MCheck  ] cpu0
         LR = 000000007FD3DAB0     CTR = 000000007FD4D060
        XER = 0000000000000000   DSISR = 00008000
       HID0 = 8000000000000000    HID1 = 000000005CE993B1
       HID4 = 4400240000080180    HID5 = 0000006600000000
       LPCR = 0000000000000002

        r0  = 00000000FC800000     r1  = 000000007FFFE5FC
        r2  = 000000007FD20838     r3  = 0000000000000000
        r4  = 00000000FC801040     r5  = 000000000000000C
        r6  = 000000007CE81438     r7  = 0000000000000200
        r8  = 0000000000000000     r9  = 0000000000000002
        r10 = 000000007CE81438     r11 = 000000007CE81638
        r12 = 0000000000004000     r13 = 0000000000000000
        r14 = 0000000000000000     r15 = 0000000000000000
        r16 = 000000007F2AC238     r17 = 0000000000000CC3
        r18 = 00000000003BAE62     r19 = 0000000000000CC3
        r20 = 000000000020F10C     r21 = 00000000001A1C00
        r22 = 0000000000000CC3     r23 = 00000000424E4443
        r24 = 000000000020F110     r25 = 000000007FFFF09C
        r26 = 0000000000000000     r27 = 0000000000000009
        r28 = 000000007FE00B40     r29 = 000000000000660C
        r30 = 000000007FDDE9A0     r31 = 0000000000000000
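
To compare dumps like this across reboots, a small parser helps. The sketch below is only a rough idea in Python, not anything CFE provides: `parse_dump`, `ARENA_END` and `TOP_OF_2GB` are names made up for illustration. `ARENA_END` is the arena size that `memorytest` prints further down in this post, and `TOP_OF_2GB` assumes 2 GB is fitted. It lists the registers that point above the tested arena.

```python
import re

# Matches pairs such as "SRR0=000000007FD50A10" or "r1  = 000000007FFFE5FC".
PAIR = re.compile(r"([A-Za-z]+\d*)\s*=\s*([0-9A-Fa-f]{8,16})\b")

def parse_dump(text):
    """Return {register name: integer value} for every NAME = HEX pair."""
    return {name: int(value, 16) for name, value in PAIR.findall(text)}

# A few lines copied from the dump above.
SAMPLE = """\
** Exception 0x0200: SRR0=000000007FD50A10 SRR1=100000000210B000 [MCheck  ] cpu0
         LR = 000000007FD3DAB0     CTR = 000000007FD4D060
        r0  = 00000000FC800000     r1  = 000000007FFFE5FC
        r2  = 000000007FD20838     r3  = 0000000000000000
"""

# End of the arena at phys 0, as printed by memorytest in this post.
ARENA_END = 0x7FD1D000
# Assumption: 2 GB of RAM is fitted, so RAM ends at 0x80000000.
TOP_OF_2GB = 0x80000000

regs = parse_dump(SAMPLE)
# Registers holding values between the end of the arena and the top of RAM.
above = sorted(n for n, v in regs.items() if ARENA_END <= v < TOP_OF_2GB)
for name in above:
    print(f"{name:5s} = {regs[name]:016X}")
```

On these sample lines SRR0, LR, CTR, r1 and r2 all land above the arena end. If that region is where the firmware itself lives (a guess on my part), then `memorytest` never touches the memory the crash happened in.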
Check out these memory tests in CFE. They don't look right to me. But I don't fully understand what they mean either, as the output doesn't explain itself.

CFE> testdram
DRAM test complete!
*** command status = 0
CFE> memorytest
Available memory arenas:
phys = 0000000000000000, virt = 0000000000000000, size = 000000007FD1D000

Testing memory.

Testing: phys = 0000000000001500, virt = 0000000000001500, size = 000000007FD1BB00
Writing: a/5/c/3
Reading: a/5/c/3
Writing: address|5555/inv/aaaa|address
Reading: address|5555/inv/aaaa|address
MC_Status:  MC0: 00000000 [ ] SBE=0 | MC1: 00000000 [ ] SBE=0
*** command status = 0
CFE> randmemtest
Writing (0000000000000000 -> 0000000010000000) scrambler=7
Reading.
mem[000000000FFFFFC0] 8000000000000000 should be 55555AAAAAAAAAAA (D5555AAAAAAAAAAA)
mem[000000000FFFFFC8] 0000000000000000 should be AAAAA55555555555 (AAAAA55555555555)
mem[000000000FFFFFD8] 0000000000000000 should be AAAAAAAAAAAAAAAA (AAAAAAAAAAAAAAAA)
mem[000000000FFFFFD0] 0000000000000000 should be 5555555555555555 (5555555555555555)
mem[000000000FFFFFE0] 0000000000000000 should be 000000000FFFFFC0 (000000000FFFFFC0)
mem[000000000FFFFFE8] 0000000000000000 should be FFFFFFFFF000003F (FFFFFFFFF000003F)
mem[000000000FFFFFF8] 0000000000000000 should be FFFFFFFFFFFFFFFF (FFFFFFFFFFFFFFFF)
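
Going only by the numbers in the log, the value in brackets on each `randmemtest` line appears to be the XOR of what was read and what was expected, so every set bit marks a bit that came back wrong. The Python sketch below checks that reading against a few of the lines above and counts the wrong bits. `parse_failures` is a name invented for illustration, and the line format is simply what this log shows, not a documented CFE format.

```python
import re

# mem[ADDRESS] READ should be EXPECTED (REPORTED)
LINE = re.compile(
    r"mem\[([0-9A-F]{16})\]\s+([0-9A-F]{16})\s+should be\s+"
    r"([0-9A-F]{16})\s+\(([0-9A-F]{16})\)"
)

def parse_failures(text):
    """Yield (address, read, expected, reported) as integers."""
    for m in LINE.finditer(text):
        yield tuple(int(g, 16) for g in m.groups())

# Three lines copied from the first randmemtest run.
SAMPLE = """\
mem[000000000FFFFFC0] 8000000000000000 should be 55555AAAAAAAAAAA (D5555AAAAAAAAAAA)
mem[000000000FFFFFC8] 0000000000000000 should be AAAAA55555555555 (AAAAA55555555555)
mem[000000000FFFFFE0] 0000000000000000 should be 000000000FFFFFC0 (000000000FFFFFC0)
"""

rows = list(parse_failures(SAMPLE))
for addr, read, expected, reported in rows:
    diff = read ^ expected  # set bits are the bits that differ
    print(f"{addr:016X}: xor={diff:016X} "
          f"matches_log={diff == reported} "
          f"wrong_bits={bin(diff).count('1')}")
```

A single bad cell or data line would normally show up as the same one or two bits wrong on every line. Here whole words come back wrong, which looks more like the data was never stored, or was overwritten, than like individual stuck bits.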
Another RAM test after my program crashed.

CFE> memorytest
Available memory arenas:
phys = 0000000000000000, virt = 0000000000000000, size = 000000007FD1D000

Testing memory.

Testing: phys = 0000000000001500, virt = 0000000000001500, size = 000000007FD1BB00
Writing: a/5/c/3
Reading: a/5/c/3
Writing: address|5555/inv/aaaa|address
Reading: address|5555/inv/aaaa|address
MC_Status:  MC0: 00000000 [ ] SBE=0 | MC1: 00000000 [ ] SBE=0
*** command status = 0
CFE> testdram
DRAM test complete!
*** command status = 0
CFE> randmemtest
Writing (0000000000000000 -> 0000000010000000) scrambler=7
Reading.
mem[000000000FFFFFC0] 8000000000000000 should be 55555AAAAAAAAAAA (D5555AAAAAAAAAAA)
mem[000000000FFFFFC8] 7FE000087FE00008 should be AAAAA55555555555 (D54AA55D2AB5555D)
mem[000000000FFFFFD8] 7FE000087FE00008 should be AAAAAAAAAAAAAAAA (D54AAAA2D54AAAA2)
mem[000000000FFFFFD0] 7FE000087FE00008 should be 5555555555555555 (2AB5555D2AB5555D)
mem[000000000FFFFFE0] 7FE000087FE00008 should be 000000000FFFFFC0 (7FE00008701FFFC8)
mem[000000000FFFFFE8] 7FE000087FE00008 should be FFFFFFFFF000003F (801FFFF78FE00037)
mem[000000000FFFFFF0] 7FE000087FE00008 should be 0000000000000000 (7FE000087FE00008)
mem[000000000FFFFFF8] 7FE000087FE00008 should be FFFFFFFFFFFFFFFF (801FFFF7801FFFF7)
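
Both `randmemtest` runs fail at the same handful of addresses, so it is worth checking where they sit in the tested range. The Python sketch below uses the addresses copied from the two logs. `summarize` is an illustrative name, and the 64-byte block size is an assumption (for example, one cache line).

```python
RANGE_END = 0x10000000  # upper bound printed on the "Writing (...)" line
BLOCK = 64              # assumption: 64-byte block, e.g. one cache line

def summarize(addresses, block=BLOCK):
    """Return (lowest, highest, set of block-aligned base addresses)."""
    bases = {a - (a % block) for a in addresses}
    return min(addresses), max(addresses), bases

# Failing addresses copied from the two randmemtest logs.
first_run = [0x0FFFFFC0, 0x0FFFFFC8, 0x0FFFFFD8, 0x0FFFFFD0,
             0x0FFFFFE0, 0x0FFFFFE8, 0x0FFFFFF8]
second_run = first_run + [0x0FFFFFF0]

for label, addrs in (("first", first_run), ("second", second_run)):
    low, high, bases = summarize(addrs)
    print(label, hex(low), hex(high), [hex(b) for b in sorted(bases)],
          "last block of range:", bases == {RANGE_END - BLOCK})
```

In both runs every failure sits in the final 64 bytes below 0x10000000, and nowhere else in the 256 MB that was written. The repeated word in the second run, 7FE00008, is also the PowerPC encoding of the `trap` instruction. So it may be that something wrote to that block after the test pattern went in, or that the test mishandles its last block, rather than the chips dropping bits. That is only a reading of the log, not a diagnosis.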
So, time for some new RAM? I've been running it on one stick for years, which I replaced last year. I'd like to run it on two matching sticks now, if they can still be bought new.