Threaded libcurl crash

Post by chris »

If libcurl is built with the threaded resolver, it causes a crash in bsdsocket.library when multiple DNS lookups are occurring at once.

Here is an example stack trace from OS4.1 update 4 using this threaded version of libcurl: http://homepage.ntlworld.com/cdyoung/tm ... readed.lha (NB: SSL does not work on this build). It can be tested with NetSurf 2.9 by copying the archive version of libcurl.so.7 into NetSurf's directory.

Stack trace:
native kernel module kernel+0x00012450
native kernel module dos.library.kmod+0x0002a4a4
native kernel module dos.library.kmod+0x000221d8
native kernel module dos.library.kmod+0x00019404
native kernel module dos.library.kmod+0x00005ab4
module LIBS:bsdsocket.library at 0x6FB80EF8 (section 5 @ 0x2DED8)
module LIBS:bsdsocket.library at 0x6FB81260 (section 5 @ 0x2E240)
module LIBS:bsdsocket.library at 0x6FB8CA1C (section 5 @ 0x399FC)
module LIBS:bsdsocket.library at 0x6FB8DF44 (section 5 @ 0x3AF24)
module LIBS:bsdsocket.library at 0x6FB806E8 (section 5 @ 0x2D6C8)
libcurl.so.7:Curl_ipv4_resolve_r()+0xA8 (section 9 @ 0x32cf0)
libcurl.so.7:gethostbyname_thread()+0x20 (section 9 @ 0x41370)
libcurl.so.7:curl_thread_create_thunk()+0x4C (section 9 @ 0x3f630)
pthreads.library:run()+0x278 (section 1 @ 0x4134)
pthreads.library:ThreadCode()+0x35C (section 1 @ 0x44dc)
native kernel module dos.library.kmod+0x00022a0c
native kernel module kernel+0x0003af48
native kernel module kernel+0x0003afc8

Another one, this time with OWB:
Stack trace:
module LIBS:bsdsocket.library at 0x6FC02EB8 (section 5 @ 0x4CE98)
module LIBS:bsdsocket.library at 0x6FC051E8 (section 5 @ 0x4F1C8)
module LIBS:bsdsocket.library at 0x6FC055B0 (section 5 @ 0x4F590)
module LIBS:bsdsocket.library at 0x6FBE7FBC (section 5 @ 0x31F9C)
module LIBS:bsdsocket.library at 0x6FBF1A04 (section 5 @ 0x3B9E4)
module LIBS:bsdsocket.library at 0x6FBF0A88 (section 5 @ 0x3AA68)
module LIBS:bsdsocket.library at 0x6FBF0CE4 (section 5 @ 0x3ACC4)
module LIBS:bsdsocket.library at 0x6FBF125C (section 5 @ 0x3B23C)
module LIBS:bsdsocket.library at 0x6FBE36E8 (section 5 @ 0x2D6C8)
native kernel module newlib.library.kmod+0x00039970
OWB:Curl_ipv4_resolve_r()+0x7C (section 1 @ 0xe0f88c)
OWB:gethostbyname_thread()+0x20 (section 1 @ 0xdf1a38)
OWB:curl_thread_create_thunk()+0x38 (section 1 @ 0xe17c4c)
pthreads.library:run()+0x278 (section 1 @ 0x4134)
pthreads.library:ThreadCode()+0x35C (section 1 @ 0x44dc)
native kernel module dos.library.kmod+0x00022a0c
native kernel module kernel+0x0003af48
native kernel module kernel+0x0003afc8

Lots of discussion on this aw.net thread (post #43 onwards): http://amigaworld.net/modules/newbb/vie ... at&order=0

Re: Threaded libcurl crash

Post by kas1e »

To add some detail: Chris here means the MUIOWB port (not the Reaction one), which I tried to build with threaded curl. It crashes in the same way as Chris's NetSurf port does when he builds libcurl with threading enabled. Visibly, the crash happens when the user sends more than a few "bad" DNS requests (i.e. wrong URLs and so on): at first it works and the GUI is not blocked, but after 3-5 bad DNS queries (even identical ones) it crashes.

That means that in two different programs, libcurl built with the threaded resolver crashes in exactly the same functions. So either there is some nasty bug in pthreads.library, or something AmigaOS 4-specific needs to be added to the libcurl code when it is built with the threaded resolver. It could also be a problem with our TCP/IP stack: when we rewrote the threaded parts of libcurl to use semaphores (so no pthreads involved at all), it still crashed in bsdsocket.library in the same functions.

We could of course suspect libcurl itself, but the threaded version of libcurl, i.e. exactly the same code, works fine on all the other OSes (Unix, Mac OS, Windows, etc.).

I also contacted Olaf about this, and he is aware of the problem, but so far he has had no time to look into it. All we need now is a simple network program that uses threaded libcurl and sends, say, 10-20 deliberately bad DNS queries (that is when the crashes in libcurl happen: when we send 3-5 or more bad DNS requests, which should then fail asynchronously). If anyone can make such an example, it would be a very good start and a test case for tracking down the bug (and after that, both NetSurf and MUIOWB would no longer block the GUI at all, and everything would be better).
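
As a rough idea of what such a test case could look like, here is a minimal sketch in C. It is not the libcurl code and has not been tried on AmigaOS 4: it assumes POSIX threads and getaddrinfo() are available, and it calls the resolver directly instead of going through libcurl's Curl_ipv4_resolve_r(). The names run_parallel_lookups, resolve_with_getaddrinfo, resolve_offline and bad_hosts are made up for illustration, and the host names are placeholders under the reserved ".invalid" domain.

```c
#define _POSIX_C_SOURCE 200112L  /* for getaddrinfo() under strict C modes */

#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback type: returns 0 if the name resolved, non-zero if not. */
typedef int (*resolve_fn)(const char *host);

#define MAX_LOOKUPS 32
#define NUM_BAD_HOSTS 10

/* Placeholder host names that can never resolve (".invalid" is reserved). */
const char *bad_hosts[NUM_BAD_HOSTS] = {
    "bad-host-0.invalid", "bad-host-1.invalid", "bad-host-2.invalid",
    "bad-host-3.invalid", "bad-host-4.invalid", "bad-host-5.invalid",
    "bad-host-6.invalid", "bad-host-7.invalid", "bad-host-8.invalid",
    "bad-host-9.invalid"
};

/* Real resolver: an IPv4 lookup through getaddrinfo() (an assumption;
   libcurl's threaded resolver may use a different call on a given platform). */
int resolve_with_getaddrinfo(const char *host)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

/* Offline stand-in for checking the harness itself without a network:
   names ending in ".invalid" fail, everything else "resolves". */
int resolve_offline(const char *host)
{
    static const char suffix[] = ".invalid";
    size_t hl = strlen(host), sl = sizeof suffix - 1;

    return (hl >= sl && strcmp(host + hl - sl, suffix) == 0) ? 1 : 0;
}

/* Per-thread job: host to look up, resolver to call, slot for the result. */
struct lookup_job {
    const char *host;
    resolve_fn  resolve;
    int         result;
};

static void *lookup_thread(void *arg)
{
    struct lookup_job *job = arg;

    job->result = job->resolve(job->host);
    return NULL;
}

/* Starts one thread per host so that all lookups are outstanding at once,
   then waits for all of them. Returns the number of failed lookups, or -1
   if n is too large or a thread could not be created. */
int run_parallel_lookups(const char **hosts, size_t n, resolve_fn resolve)
{
    pthread_t tid[MAX_LOOKUPS];
    struct lookup_job jobs[MAX_LOOKUPS];
    size_t started = 0;
    int failed = 0, error = 0;

    assert(hosts != NULL && resolve != NULL);
    if (n > MAX_LOOKUPS)
        return -1;

    for (size_t i = 0; i < n; i++) {
        jobs[i].host = hosts[i];
        jobs[i].resolve = resolve;
        jobs[i].result = -1;
        if (pthread_create(&tid[i], NULL, lookup_thread, &jobs[i]) != 0) {
            error = 1;
            break;
        }
        started++;
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (jobs[i].result != 0)
            failed++;
    }
    return error ? -1 : failed;
}
```

Calling run_parallel_lookups(bad_hosts, NUM_BAD_HOSTS, resolve_with_getaddrinfo) from main() would put ten bad lookups in flight at once; passing resolve_offline instead only exercises the threading part. Whether this reduced case triggers the same crash in bsdsocket.library is an open question.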

Re: Threaded libcurl crash

Post by ssolie »

kas1e wrote: I also contacted Olaf about this, and he is aware of the problem...
What is the Bugzilla bug number for this issue?
ExecSG Team Lead

Re: Threaded libcurl crash

Post by kas1e »

@Steven
ssolie wrote: What is the Bugzilla bug number for this issue?
It's not in BZ because I do not know where to file the bug (against the TCP/IP stack or against pthreads.library). Also, to reproduce the problem we need a proper, tiny test case, which no one has written yet. At the moment it has stalled at the point where Chris and I agreed by email that we need a test case, but neither of us has done it. But even once the test case is done, I still do not know which component to file the BZ against. It could be pthreads, it could be bsdsocket.library, or it could be a mix of both.

@all

Is anyone interested in helping with the test case / libcurl / DNS programming?

Re: Threaded libcurl crash

Post by Deniil »

So that is why we never got this thing to work in MUI-OWB. Interesting. So the issue seems to happen when there are more than a few (more than one?) outstanding DNS requests. (Bad requests causing longer replies and thus increasing the probability of a crash?)

Since IBrowse seems to handle this fine it can't be bsdsocket.library alone. Or am I wrong? It uses one task for each connection, which presumably opens its own instance of bsdsocket.library and sends exactly one request per task.

SabreMSN sometimes shows a similar behavior when the network goes down. It appears to try to resolve the name for its server(s) repeatedly (without getting any answers) within the same task and that usually leads to a similar hard lockup of the machine after a short while. Could be the same issue?

Re: Threaded libcurl crash

Post by kas1e »

@Deniil
Deniil wrote: So the issue seems to happen when there are more than a few (more than one?) outstanding DNS requests. (Bad requests causing longer replies and thus increasing the probability of a crash?)
Yep, it looks like it.
Deniil wrote: SabreMSN sometimes shows a similar behavior when the network goes down. It appears to try to resolve the name for its server(s) repeatedly (without getting any answers) within the same task and that usually leads to a similar hard lockup of the machine after a short while. Could be the same issue?
For now I am almost sure that it is the same, and that the issue is not pthreads or semaphores but bsdsocket.library itself. Everything revolves around Curl_ipv4_resolve_r(), which makes bsdsocket.library crash under some conditions (as far as I can see from tests, it is indeed when bad requests cause longer replies).


@all

I have just built the very latest version of libcurl without a single change, just with "--enable-threaded-resolver". Then I found a very good test case called multithread.c on the curl website:

here is the original
here is my modified version, which just adds more bad URLs (10 rather than the 4 in the example)
here is an OS4 binary for testing

As you can see, the test case is _very_ small. To reproduce the crash, open, say, 4-5 shell windows, type "thread_test" in each of them, and then run them all quickly one after another (so that bsdsocket is bombarded with these bad, long-running queries from 4-5 different tasks). Or you can do it another way: newcli, run thread_test, newcli again, run one more instance of thread_test, newcli again, and so on. By the 4th, 5th or 6th instance you will get either a lockup or a GR (Grim Reaper). The nature of the problem is the same as what I see with MUIOWB, and I assume the same as what Chris sees with NetSurf: 3-6 tasks issuing bad requests cause a crash in bsdsocket.library.

Sometimes you will just get a lockup. Sometimes that lockup is a hard one (the 3-button reset does not work), sometimes a soft one (you can reboot with the 3-button reset and check with dumpdebugbuffer what happened), and sometimes (about 50% of the time) you get a GR whose stack trace points to bsdsocket.library, with Curl_ipv4_resolve_r() involved every time (the stack traces in the dumpdebugbuffer output are the same too).

Here I have collected a bunch of crash logs and dumpdebugbuffer outputs from 5-6 attempts at running 4-5 instances of the same "thread_test" binary, with the default stack size and with a pretty big stack size (2000000): the problem is the same. If someone can reproduce all of this on other machines, that would be helpful.

It is even possible that the problem is not exactly bsdsocket.library, but that we need to add something AOS4-specific to the test code (maybe some safety checks, or I don't know what). However, as the test case is very small and does not involve much, and as on, say, MorphOS there are no such crashes (with semaphores rather than pthreads, but the OS4 build with semaphores still crashes in the same way), my bet is still bsdsocket.library.
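
One possible form of such a safety check, purely as an assumption and not something from libcurl or from the test case above, would be to let only one lookup at a time enter the resolver. If the crash went away with that in place, it would point at concurrent entry into the resolver; if it stayed, it would point elsewhere. Below is a minimal sketch in C with POSIX threads. The names serialized_resolve, stress_serialized and resolve_offline are hypothetical, and the mutex only covers threads inside one process, so it would say nothing about several thread_test instances running side by side.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical callback type: returns 0 if the name resolved, non-zero if not. */
typedef int (*resolve_fn)(const char *host);

#define MAX_THREADS 32

static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight = 0;      /* lookups currently inside the resolver */
static int max_in_flight = 0;  /* highest value in_flight has reached */

/* Calls resolve(host) while holding resolver_lock, so that at most one
   lookup is outstanding in this process at any time. */
int serialized_resolve(resolve_fn resolve, const char *host)
{
    int rc;

    assert(resolve != NULL && host != NULL);
    pthread_mutex_lock(&resolver_lock);
    in_flight++;
    if (in_flight > max_in_flight)
        max_in_flight = in_flight;
    rc = resolve(host);
    in_flight--;
    pthread_mutex_unlock(&resolver_lock);
    return rc;
}

/* Offline stand-in resolver: names ending in ".invalid" fail,
   everything else "resolves". A real test would call the system resolver. */
int resolve_offline(const char *host)
{
    static const char suffix[] = ".invalid";
    size_t hl = strlen(host), sl = sizeof suffix - 1;

    return (hl >= sl && strcmp(host + hl - sl, suffix) == 0) ? 1 : 0;
}

/* Job shared read-only by all worker threads. */
struct serial_job {
    resolve_fn  resolve;
    const char *host;
};

static void *serial_worker(void *arg)
{
    struct serial_job *job = arg;

    (void)serialized_resolve(job->resolve, job->host);
    return NULL;
}

/* Starts nthreads threads that all look up the same host through
   serialized_resolve(). Returns the peak number of lookups that were
   inside the resolver at once (expected to be 1), or -1 on error. */
int stress_serialized(resolve_fn resolve, const char *host, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct serial_job job = { resolve, host };
    int started = 0, peak;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, serial_worker, &job) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    pthread_mutex_lock(&resolver_lock);
    peak = max_in_flight;
    pthread_mutex_unlock(&resolver_lock);
    return started == nthreads ? peak : -1;
}
```

Of course this gives up the whole point of the threaded resolver as far as parallel lookups go, so it would only be useful as a diagnostic, not as a fix.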

Re: Threaded libcurl crash

Post by SOFISTISOFTWARE »

@Kas1e

I hope Olaf is going to resolve this annoying problem.
Sam 460EX, 2Gb Ram, Radeon R7 250E, OS4.1 FE

Re: Threaded libcurl crash

Post by chris »

@Kas1e

Perhaps somebody with Roadshow 68k can run the same test? And then run it on some other TCP/IP stack to see if it does the same thing? That will prove whether it's Roadshow at fault.

Re: Threaded libcurl crash

Post by ssolie »

SOFISTISOFTWARE wrote: I hope Olaf is going to resolve this annoying problem.
I really wish a bug report was filed...

Nothing has been done and this will likely continue until at least a bug report is filed against bsdsocket.library.
ExecSG Team Lead

Re: Threaded libcurl crash

Post by kas1e »

@Chris
chris wrote: Perhaps somebody with Roadshow 68k can run the same test? And then on some other TCP/IP stack if it does the same thing? That will prove whether it's Roadshow at fault.
As far as I know, pthreads is only available for OS4, so to run the same test case on other OSes/TCP/IP stacks I first need to adapt Fab's semaphore changes (to avoid using pthreads), check whether it crashes in the same way on OS4, and then run the same tests on, say, MorphOS/AROS; and if someone can build an OS3 version of threaded curl with semaphores and no pthreads, then on OS3 too. But you are right, it is the way to go, so we can be sure whether it is an OS4-only problem.

@steven
ssolie wrote: I really wish a bug report was filed...
Nothing has been done and this will likely continue until at least a bug report is filed against bsdsocket.library.
It is just too early for BZ, as we can't be sure it is bsdsocket.library, because in my current case pthreads is involved (though I assume it is not a pthreads problem, as pthreads has to be involved anyway: it starts the tasks). I first need to re-apply Fab's changes to the latest curl (where he replaces pthreads with semaphores), then build MorphOS/AROS versions to check whether it is a Roadshow-only problem.

I.e. not everything is clear yet, and knowing that no one wants to dig into lots of details when fixing something, I hope to file a proper BZ where everything is clear and easy to reproduce and understand.