If libcurl is built with the threaded resolver, it crashes in bsdsocket.library when multiple DNS lookups are occurring at once.
Here is an example stack trace from OS4.1 update 4 using this threaded version of libcurl: http://homepage.ntlworld.com/cdyoung/tm ... readed.lha (NB: SSL does not work on this build). It can be tested with NetSurf 2.9 by copying the archive version of libcurl.so.7 into NetSurf's directory.
Stack trace:
native kernel module kernel+0x00012450
native kernel module dos.library.kmod+0x0002a4a4
native kernel module dos.library.kmod+0x000221d8
native kernel module dos.library.kmod+0x00019404
native kernel module dos.library.kmod+0x00005ab4
module LIBS:bsdsocket.library at 0x6FB80EF8 (section 5 @ 0x2DED8)
module LIBS:bsdsocket.library at 0x6FB81260 (section 5 @ 0x2E240)
module LIBS:bsdsocket.library at 0x6FB8CA1C (section 5 @ 0x399FC)
module LIBS:bsdsocket.library at 0x6FB8DF44 (section 5 @ 0x3AF24)
module LIBS:bsdsocket.library at 0x6FB806E8 (section 5 @ 0x2D6C8)
libcurl.so.7:Curl_ipv4_resolve_r()+0xA8 (section 9 @ 0x32cf0)
libcurl.so.7:gethostbyname_thread()+0x20 (section 9 @ 0x41370)
libcurl.so.7:curl_thread_create_thunk()+0x4C (section 9 @ 0x3f630)
pthreads.library:run()+0x278 (section 1 @ 0x4134)
pthreads.library:ThreadCode()+0x35C (section 1 @ 0x44dc)
native kernel module dos.library.kmod+0x00022a0c
native kernel module kernel+0x0003af48
native kernel module kernel+0x0003afc8
Another one, this time with OWB:
Stack trace:
module LIBS:bsdsocket.library at 0x6FC02EB8 (section 5 @ 0x4CE98)
module LIBS:bsdsocket.library at 0x6FC051E8 (section 5 @ 0x4F1C8)
module LIBS:bsdsocket.library at 0x6FC055B0 (section 5 @ 0x4F590)
module LIBS:bsdsocket.library at 0x6FBE7FBC (section 5 @ 0x31F9C)
module LIBS:bsdsocket.library at 0x6FBF1A04 (section 5 @ 0x3B9E4)
module LIBS:bsdsocket.library at 0x6FBF0A88 (section 5 @ 0x3AA68)
module LIBS:bsdsocket.library at 0x6FBF0CE4 (section 5 @ 0x3ACC4)
module LIBS:bsdsocket.library at 0x6FBF125C (section 5 @ 0x3B23C)
module LIBS:bsdsocket.library at 0x6FBE36E8 (section 5 @ 0x2D6C8)
native kernel module newlib.library.kmod+0x00039970
OWB:Curl_ipv4_resolve_r()+0x7C (section 1 @ 0xe0f88c)
OWB:gethostbyname_thread()+0x20 (section 1 @ 0xdf1a38)
OWB:curl_thread_create_thunk()+0x38 (section 1 @ 0xe17c4c)
pthreads.library:run()+0x278 (section 1 @ 0x4134)
pthreads.library:ThreadCode()+0x35C (section 1 @ 0x44dc)
native kernel module dos.library.kmod+0x00022a0c
native kernel module kernel+0x0003af48
native kernel module kernel+0x0003afc8
Lots of discussion on this aw.net thread (post #43 onwards): http://amigaworld.net/modules/newbb/vie ... at&order=0
Threaded libcurl crash
Re: Threaded libcurl crash
To add some details: by OWB, Chris means the MUI-OWB port (not the Reaction one), which I tried to build with threaded curl. It crashes the same way Chris's NetSurf port does when he builds libcurl with threading enabled. Visibly, the crash happens when the user sends more than a few "bad" DNS requests (i.e. wrong URLs and the like). At first it works and the GUI does not block, but after 3-5 bad DNS queries (even identical ones) it crashes.
That means libcurl built with the threaded resolver crashes in the same functions in two different programs. So either there is some nasty bug in pthreads.library, or something AmigaOS4-specific needs to be added to the libcurl code when it is built with the threaded resolver. It could also be a problem with our TCP/IP stack: when we rewrote the threaded parts of libcurl to use semaphores (so no pthreads involved at all), it still crashed in bsdsocket.library in the same functions.
We could of course suspect libcurl itself, but the threaded version of libcurl works fine on all the other OSes (Unix, Mac OS, Windows, etc.), i.e. with exactly the same code.
I also contacted Olaf about it. He is aware of the problem, but so far he has had no time to look into it. All we need now is a simple network program that uses threaded libcurl and sends, say, 10-20 deliberately bad DNS queries (that is when the crashes in libcurl happen: when we send 3-5 or more bad DNS requests, which should then fail asynchronously). If anyone can make such an example, it would be a very good start and test case for tracking down the bug (and after that, neither NetSurf nor MUI-OWB would block the GUI at all, and everything would be better).
Re: Threaded libcurl crash
kas1e wrote: I also contacted with Olaf about, and he aware about such a problem...
What is the bugzilla bug number for this issue?
ExecSG Team Lead
Re: Threaded libcurl crash
@Steven
Steven wrote: What is the bugzilla bug number for this issue?
It's not in BZ because I do not know where to file the bug (against the TCP/IP stack or against pthreads.library). Also, to reproduce the problem we need a proper, tiny test case, which nobody has written so far. At the moment it has all stalled at the point where Chris and I agree by mail that we need a test case, but neither of us writes one. And even once a test case exists, I still do not know which component to file the BZ against. It could be pthreads, it could be bsdsocket.library, or it could be a mix of both.
@all
Does anyone have an interest in helping with a test case / libcurl / DNS programming?
Re: Threaded libcurl crash
So that is why we never got this thing to work in MUI-OWB. Interesting. So the issue seems to happen when there are more than a few (more than one?) outstanding DNS requests. (Bad requests take longer to answer and thus increase the probability of a crash?)
Since IBrowse seems to handle this fine, it can't be bsdsocket.library alone. Or am I wrong? It uses one task for each connection, which presumably opens its own instance of bsdsocket.library and sends exactly one request per task.
SabreMSN sometimes shows similar behavior when the network goes down. It appears to try to resolve the name of its server(s) repeatedly (without getting any answers) within the same task, and that usually leads to a similar hard lockup of the machine after a short while. Could be the same issue?
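One way to probe the "several outstanding DNS requests" idea without libcurl in the picture would be a lookup-only stress test. The following is a minimal sketch, not code from any of the ports discussed here. It assumes POSIX threads and getaddrinfo() are available, which may not hold for every OS4 SDK setup (the traces above go through libcurl's gethostbyname-based path instead). The names run_bad_lookups, struct lookup and the no-such-host-N.invalid host names are made up for illustration.

```c
/* Sketch: several threads resolve deliberately bad host names at once,
 * with no libcurl involved. All names here are hypothetical. */
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_LOOKUPS 32

struct lookup {
    char host[64]; /* made-up host name under the reserved .invalid TLD */
    int  rc;       /* return code of getaddrinfo() */
    int  done;     /* set to 1 once the thread has finished */
};

static void *lookup_thread(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, as in the traces */
    hints.ai_socktype = SOCK_STREAM;

    l->rc = getaddrinfo(l->host, NULL, &hints, &res);
    if (l->rc == 0 && res != NULL)
        freeaddrinfo(res);
    l->done = 1;
    return NULL;
}

/* Starts up to n concurrent lookups and waits for all of them.
 * Returns the number of threads that ran to completion. */
int run_bad_lookups(int n)
{
    struct lookup jobs[MAX_LOOKUPS];
    pthread_t tid[MAX_LOOKUPS];
    int i, started = 0, completed = 0;

    if (n > MAX_LOOKUPS)
        n = MAX_LOOKUPS;

    for (i = 0; i < n; i++) {
        snprintf(jobs[i].host, sizeof jobs[i].host,
                 "no-such-host-%d.invalid", i);
        jobs[i].rc = 0;
        jobs[i].done = 0;
        if (pthread_create(&tid[i], NULL, lookup_thread, &jobs[i]) != 0)
            break; /* could not start more threads; join what we have */
        started++;
    }

    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        completed += jobs[i].done;
        printf("%s -> %s\n", jobs[i].host,
               jobs[i].rc ? gai_strerror(jobs[i].rc) : "resolved");
    }
    return completed;
}
```

Calling run_bad_lookups(10) from main() and starting several copies of the program at once would roughly mimic the load described in this thread. If a program like this never crashes while the libcurl builds do, that might hint that the resolver calls alone are not enough to trigger the problem, but it would not prove it.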
Re: Threaded libcurl crash
@Deniil
Deniil wrote: So the issue seems to happen when there are more than a few (more than one?) outstanding DNS requests. (Bad requests causing longer replies and thus increases the probability of a crash?)
Yep, it looks like this.
Deniil wrote: SabreMSN sometimes shows a similar behavior when the network goes down. It appears to try to resolve the name for its server(s) repeatedly (without getting any answers) within the same task and that usually leads to a similar hard lockup of the machine after a short while. Could be the same issue?
For now I am almost sure that it is the same, and that the issue is not in pthreads or semaphores but in bsdsocket.library itself. Everything revolves around Curl_ipv4_resolve_r(), which leads to a crash in bsdsocket.library under some conditions (as far as I can see from the tests, it is indeed when bad requests cause longer replies).
@all
I have just built the very latest version of libcurl without a single change, just with "--enable-threaded-resolver". Then I found a very good test case on curl's website, called multithread.c:
here is the original
here is my modified version, which just adds more bad URLs: 10 instead of the 4 in the example
here is an OS4 binary for tests
As you can see, the test case is _very_ small. To reproduce the crash, open, say, 4-5 shell windows, type "thread_test" in each of them, and then start them quickly one after another (so that bsdsocket is bombarded by those bad, long-running queries from 4-5 different tasks). Or you can go another way: newcli, run thread_test, newcli again, run one more instance of thread_test, newcli again, and so on. By the 4th, 5th or 6th instance you will get either a lockup or a GR. The nature of the problem is the same as I have with MUI-OWB and, I assume, the same as Chris has with NetSurf: 3-6 tasks making bad requests cause a crash in bsdsocket.library.
Sometimes you will get just a lockup. Sometimes that lockup is a hard one (the 3-button reset does not work), sometimes a soft one (you can reboot with the 3 buttons and check with dumpdebugbuffer what was going on), and sometimes (50% of the time) it brings up a GR whose stack trace points to bsdsocket.library, with Curl_ipv4_resolve_r() involved every time (the stack traces in the dumpdebugbuffer output are the same too).
There I have collected a bunch of crash logs and dumpdebugbuffer outputs from 5-6 attempts at running 4-5 instances of the same "thread_test" binary, with the default stack size and with a pretty big stack size (2000000); the problem is the same. If someone can reproduce all of this on other machines, that would be helpful.
It is even possible that the problem is not exactly in bsdsocket.library, and that we need to add something AOS4-specific to the test code (maybe some safety checks, or I don't know what). Though, as the test case is very small and does not involve much, and as there are no such crashes on, say, MorphOS (with semaphores, not pthreads; but the OS4 build with semaphores still crashes the same way), my bet is still on bsdsocket.library.
- SOFISTISOFTWARE
Re: Threaded libcurl crash
@Kas1e
I hope Olaf is going to resolve this annoying trouble.
Sam 460EX, 2Gb Ram, Radeon R7 250E, OS4.1 FE
Re: Threaded libcurl crash
@Kas1e
Perhaps somebody with Roadshow 68k can run the same test? And then on some other TCP/IP stack if it does the same thing? That will prove whether it's Roadshow at fault.
Re: Threaded libcurl crash
SOFISTISOFTWARE wrote: i hope Olaf is going to resolve this annoying trouble
I really wish a bug report was filed...
Nothing has been done and this will likely continue until at least a bug report is filed against bsdsocket.library.
ExecSG Team Lead
Re: Threaded libcurl crash
@Chris
Chris wrote: Perhaps somebody with Roadshow 68k can run the same test? And then on some other TCP/IP stack if it does the same thing? That will prove whether it's Roadshow at fault.
As far as I know, pthreads is only available for OS4. So to run the same test case on other OSes / TCP/IP stacks, I first need to adapt Fab's semaphore changes (to avoid using pthreads) and check whether it crashes the same way on OS4, then run the same tests on, say, MorphOS/AROS, and, if someone can build an OS3 version of threaded curl with semaphores and no pthreads, on OS3 as well. But you are right, it is the way to go, so we can be sure it is an OS4-only problem (if it is).
@steven
Steven wrote: I really wish a bug report was filed... Nothing has been done and this will likely continue until at least a bug report is filed against bsdsocket.library.
It is just too early for BZ, as we can't be sure it is bsdsocket.library, because in my current case pthreads is involved (though I assume it is not a pthreads problem; of course pthreads is involved, since it starts the tasks). I first need to adapt Fab's changes again to the latest curl (where he replaced pthreads with semaphores), then build MorphOS/AROS versions to check whether it is a Roadshow-only problem.
I.e. not everything is clear yet, and knowing that no one wants to dig into the details when fixing something, I hope to write a proper BZ report where everything is clear and easy to reproduce and understand.