Warp3D: 2048x2048 texture / W3D_DrawArray problem

A forum for general AmigaOS 4.x support questions that are not platform-specific
Daytona675x
AmigaOS Core Developer
Posts: 34
Joined: Wed Jan 22, 2014 5:18 pm
Location: Cologne, Germany

Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Daytona675x »

A somewhat weird bug I stumbled into:

Max. texture size reported by Warp3D on my Radeon 9250 setup is 2048 x 2048.
And when I create such a texture (no mip-mapping, 2048x2048x32) all is fine at first glance. The texture allocation / preparation reports success everywhere.
But when I actually use that texture with W3D_DrawArray the following happens:

- W3D_DrawArray suddenly becomes very slow for that call.
- that triangle batch is not drawn at all.
- W3D_DrawArray returns -8, which apparently corresponds to W3D_NOGFXMEM.
- according to the docs that isn't even a value W3D_DrawArray is supposed to return under any circumstances.

If I destroy that 2048x2048 texture and create a 1024x1024 variant instead then all works well (I can change it on the fly), so it can at least recover.
Overall I'm only using a fraction of the gfx-card's 128 MB VRAM (certainly not more than maybe 30 MB).

To pin-point it a little further I wrote a simple test-prog that does nothing other than create such a 2048 texture and draw one triangle using it.
It works - sometimes :)
At least as long as you don't create and keep a second texture in parallel (tried with a second texture of size 256x256x32).
Then the abovementioned things happen.
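For reference, the test-prog boils down to roughly this (context / screen setup, vertex arrays and binding details omitted, and the exact tag / constant names are written from memory, so treat it as a sketch rather than the actual source):

    /* sketch only - context creation, locking and vertex setup omitted */
    ULONG error;
    struct TagItem tags[] = {
        { W3D_ATO_IMAGE,  (ULONG)texdata },  /* 2048*2048*4 bytes, R8G8B8A8 */
        { W3D_ATO_FORMAT, W3D_R8G8B8A8   },
        { W3D_ATO_WIDTH,  2048           },
        { W3D_ATO_HEIGHT, 2048           },
        { TAG_DONE,       0              }
    };
    W3D_Texture *big = W3D_AllocTexObj(context, &error, tags);
    /* error == W3D_SUCCESS here, so everything looks fine so far */

    /* creating and keeping a second small texture (256x256x32) in parallel
       is what makes the failure show up */

    ULONG res = W3D_DrawArray(context, W3D_PRIM_TRIANGLES, 0, 3);
    /* res == -8 (W3D_NOGFXMEM), the call is very slow and nothing is drawn */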

Only tested this with 32bit textures, maybe it also happens with 16bit, don't know. Only tested on my 9250 setup.

I know this is probably not the right forum anymore for Warp3D issues.
If a moderator or somebody else in charge can please forward this issue to the people at A-EON?!
Thanks!
Warp3D driver code-basher and bug-smasher - btw.: driver writing is nothing mysterious
Karlos
AmigaOS Core Developer
Posts: 84
Joined: Sun Jun 19, 2011 11:42 am
Location: United Kingdom of England and anybody else that wishes to remain.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Karlos »

Here is what I believe is happening

1) When querying the maximum texture size, the value supported by the given GPU is returned. This does not imply that your particular card is capable of supporting a texture that size. Note that for 2Kx2Kx4 we need 16 MiB, which should not really be a problem for a card with 128 MiB. However, texture allocations are made from within BitMap allocations that have to be appropriately sized.

2) No actual allocation is done until you first lock the hardware and attempt to use that texture. At that point the driver will find the texture is not resident, so VRAM has to be allocated and the data transferred.

3) It is possible that your texture is copied to an internal non-VRAM buffer if the texel format is not directly supported. That's a 16 MiB pixel format conversion, which in turn is quite slow.

4) The driver will ask the W3D_Gfx driver to allocate video memory so that the texture can be used. This in turn uses P96/Graphics to manage allocations. This will probably fail to find a single segment large enough for the containing BitMap allocation. It's also possible that the internal rounding required (textures often require stricter alignment than ordinary bitmaps) pushes the required bitmap allocation above 2048 pixels tall, which then fails anyway because the hardware can't support it.

In short, you are working at the very limits of the driver. Even on R200 you can expect 2048x2048 to be slow even if you can allocate it successfully.
Daytona675x
AmigaOS Core Developer
Posts: 34
Joined: Wed Jan 22, 2014 5:18 pm
Location: Cologne, Germany

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Daytona675x »

@Karlos
This does not imply that your particular card is capable of supporting a texture that size.
That GPU can handle 2048x2048 textures of that format, and so can this concrete card. As said: it works, as long as you don't use more than that one texture.
It is possible that your texture is copied to an internal non-VRAM buffer if the texel format is not directly supported. That's a 16 MiB pixel format conversion, which in turn is quite slow.
The texture format is directly supported, R8G8B8A8. No conversion should be necessary. But I can check other RGBA 32bit formats and see if that makes a difference.
But I guess you can rule that one out because when it works it is fast.
The driver will ask the W3D_Gfx driver to allocate video memory so that the texture can be used. This in turn uses P96/Graphics to manage allocations. This will probably fail to find a single segment large enough for the containing BitMap allocation.
Of course I have no idea what is happening inside, but I don't see a reason why this should happen. There should be plenty of contiguous free RAM available (although I will test this myself and see if I can create some more big normal bitmaps).
It's also possible that the internal rounding required (textures often require stricter alignment than ordinary bitmaps) pushes the required bitmap allocation above 2048 pixels tall, which then fails anyway because the hardware can't support it.
It's all (at least) perfectly long-word aligned, power-of-two and 32bit. Why would the driver make such a nice big texture wider?
And as described, it sometimes works! If what you describe above were the reason, why would the driver sometimes artificially increase the size and sometimes not? Sounds weird.
In short, you are working at the very limits of the driver. Even on R200 you can expect 2048x2048 to be slow even if you can allocate it successfully.
At the very limits? Certainly not the limits of the GPU. And regarding the driver: it's all within sane parameters. In terms of memory / texture / vertex usage I don't even get near what this GPU can normally handle.

Anyway, the most interesting point is that it works as long as I don't create another (small) second texture.
When it works it is by no means slow, that 2048 texture is drawn as fast as a 1024 texture (at least I don't feel a difference, didn't really measure).
The W3D_DrawArray call is only slow if it turns out to fail with the (undocumented) return code -8. What about that one? Doesn't that give a concrete hint about what is failing?
Warp3D driver code-basher and bug-smasher - btw.: driver writing is nothing mysterious
Hans
AmigaOS Core Developer
Posts: 703
Joined: Tue Dec 21, 2010 9:25 pm
Location: New Zealand

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Hans »

Daytona675x wrote: The W3D_DrawArray call is only slow if it turns out to fail with the (undocumented) return code -8. What about that one? Doesn't that give a concrete hint about what is failing?
As Karlos said, the -8 return code corresponds to W3D_NOGFXMEM. See the header file. It might not be specifically mentioned for that function in the autodocs, but the return codes are always the same.

The slow behaviour is indicative that Warp3D is using more VRAM than it can allocate, and so it's paging textures in and out of VRAM. During paging Picasso96 may also have a go at paging out bitmaps and defragging VRAM. Unfortunately, all of this is pretty slow.
Of course I have no idea what is happening inside, but I don't see a reason why this should happen. There should be plenty of contiguous free RAM available (although I will test this myself and see if I can create some more big normal bitmaps).
If only things were so simple. Picasso96 doesn't like to share VRAM with anything else, so Warp3D has to pull a few tricks. Without going into too much detail, Warp3D usually allocates blocks of VRAM at a time, and then stores multiple smaller textures in one block. Due to the way that things are set up, it has to lock these allocations in place, or the texture base pointers will become invalid if Picasso96 chooses to perform paging/defragging (which could happen at any time). Unfortunately, this also means that Picasso96's defragging becomes less effective, because there are large blocks of VRAM locked in place.
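Very roughly, the picture is something like this (just a toy illustration of the scheme, not the actual Warp3D data structures):

    /* toy illustration only - NOT the real Warp3D internals */
    struct VRAMBlock {
        struct BitMap *bm;   /* P96 BitMap whose storage sits in VRAM  */
        APTR           base; /* locked base address of that VRAM area  */
        ULONG          size; /* total bytes available in the block     */
        ULONG          used; /* bytes already handed out to textures   */
    };
    /* several smaller textures live inside one such block; the block stays
       locked as long as any of them does, so P96 cannot move it while
       paging/defragging - which is what makes the defragging less effective */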

Hans
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.
Karlos
AmigaOS Core Developer
Posts: 84
Joined: Sun Jun 19, 2011 11:42 am
Location: United Kingdom of England and anybody else that wishes to remain.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Karlos »

@Daytona

In addition to what Hans states above, allow me to reiterate. The W3D_Gfx driver is allocating VRAM for textures. It does this by allocating BitMaps. However, only the required size in bytes of the texture is passed to the allocator function, not the dimensions or even the format; the allocator does not care about those. It can't anyway, as not every supported texel format is a valid pixel format.

It guarantees only that the allocation is in VRAM and is aligned to hardware requirements. A single BitMap can (and usually does) contain several texture allocations that don't even have to be in the same hardware format. So there is not a 1:1 correspondence between textures and bitmaps - this is an important caveat.

BitMaps are like block allocations for texture data. Calculating the size of a new BitMap to allocate for an arbitrarily large number of bytes is a non-trivial function that starts with the square root of the requested size, then tries to find the nearest hardware-aligned width and adjusts the height to compensate for the difference.

However, when you allocate a large texture, the allocation size passed to the function will not be satisfied by any existing free space in an existing BitMap, and instead a BitMap AT LEAST large enough to hold the new texture will be sought. However, due to the nature of the allocator it is unlikely even in the exact 2048x2048 case that the requested BitMap will be the same physical dimension. It won't be wider than that, but it might end up taller. It is likely this that results in the failure to allocate VRAM because you can't allocate a BitMap bigger than 2048x2048 either.
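To make that concrete, the sizing step is conceptually something like this (an illustration only, assuming a 32-bit friend format and a hypothetical 128-byte row alignment; the real code differs in detail):

    /* toy version of the containing-BitMap sizing, not the actual driver code */
    #include <math.h>

    static void containing_bitmap(unsigned long bytes,
                                  unsigned long *w, unsigned long *h)
    {
        unsigned long pixels = (bytes + 3) / 4;             /* 4 bytes/pixel   */
        unsigned long width  = (unsigned long)sqrt((double)pixels);
        width = ((width * 4 + 127) & ~127UL) / 4;           /* 128-byte rows   */
        *w = width;
        *h = (pixels + width - 1) / width;                  /* round height up */
    }
    /* 16 MiB (2048x2048x32) already comes out at exactly 2048 x 2048, so any
       extra padding rows for texture alignment push it past the 2048 limit. */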
Last edited by Karlos on Wed Apr 01, 2015 12:01 pm, edited 1 time in total.
Karlos
AmigaOS Core Developer
Posts: 84
Joined: Sun Jun 19, 2011 11:42 am
Location: United Kingdom of England and anybody else that wishes to remain.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Karlos »

PS: Try a 2048x2048 16-bit texture in a supported format. That shouldn't lead to a 2048x2048 BitMap allocation if your context is on a 32-bit display, because the BitMap allocation will likely be in the same format as the display (it uses friend allocation in P96 to ensure the BitMap memory is in VRAM and this usually clones the friend format).

In theory your 16 bit texture will need 8MB. This should result in a 32 bit BitMap allocation of something like 1536x1376 (128 byte width aligned). That shouldn't fail, so your 2048x2048 texture will get allocated.

Let me know if that works. If not, the bug may be elsewhere.
Daytona675x
AmigaOS Core Developer
Posts: 34
Joined: Wed Jan 22, 2014 5:18 pm
Location: Cologne, Germany

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Daytona675x »

@Hans
As Karlos said, the -8 return code corresponds to W3D_NOGFXMEM. See the header file.
Yes, yes, that's what I did, it was me, not Karlos, who mentioned this define :-) Anyway, the docs should be extended accordingly.
If only things were so simple. Picasso96 doesn't like to share VRAM with anything else, so Warp3D has to pull a few tricks. Without going into too much detail, Warp3D usually allocates blocks of VRAM at a time, and then stores multiple smaller textures in one block. Due to the way that things are set up, it has to lock these allocations in place, or the texture base pointers will become invalid if Picasso96 chooses to perform paging/defragging (which could happen at any time). Unfortunately, this also means that Picasso96's defragging becomes less effective, because there are large blocks of VRAM locked in place.
This issue can happen at any time. I wouldn't expect 128 MB of VRAM / 2 GB of RAM to be so fragmented, especially right after boot, that this allocation would no longer be possible. As further tests revealed, it happens with 16bit textures too. There are some interesting test results below showing that it sometimes happens even if only about 4 MB of tex-data is used for a texture. That's such a low amount compared to what's available that it simply shouldn't fail.

@Karlos
Calculating the size of the bitmap to allocate is a non trivial function that starts with the square root of the requested size in bytes and then tries to find the nearest hardware aligned width and adjust the height to compensate for the difference.
I don't get why perfectly 32bit-pot texture size calculations shouldn't be trivial. Under certain conditions (unsupported texture format, non-pot texture, conversions) I'd understand that, but not in such a case where the conditions are optimal.
Even if the required alignment were 512 bytes internally, it should be no problem to find such a memory area (at least not until you've thrown tons of textures at the driver and it really does get fragmented) and to get away without any size modifications in that scenario.
However, due to the nature of the allocator it is unlikely even in the exact 2048x2048 case that the requested BitMap will be the same physical dimension. It won't be wider than that, but it might end up taller.
Taller? Weird algorithm. Anyway: apparently the allocator can do it - at least sometimes. It sometimes can create something the GPU likes, which would be 2048x2048, not wider, not taller. "The nature of the allocator": smells like the root of evil, that allocator.
In theory your 16 bit texture will need 8MB. This should result in a 32 bit BitMap allocation of something like 1536x1376 (128 byte width aligned). That shouldn't fail, so your 2048x2048 texture will get allocated.
The data of a 16 or 32 bit 2048x2048 texture is already perfectly 128-byte alignable. It is even 4096-byte alignable. Why would the driver do such funny resizing? It should happily take and swallow it. Anyway, see the tests below: same issue, probably a bit less often.
It is likely this that results in the failure to allocate VRAM because you can't allocate a BitMap bigger than 2048x2048 either.
So on the one hand the allocator sometimes tries to create something taller than 2048, you say? But at the same time it knows that this would be useless anyway, since you can't allocate such a bitmap. And besides that, it makes no sense to resize anything at all in that case in the first place. Interesting strategy.
Luckily it apparently sometimes doesn't do something that weird and simply delivers what the GPU wants, namely a 2048x2048 texture (we know that because it's displayed correctly and, as you said, it cannot handle anything larger - so it has to be 2048x2048). Now if only the driver would always do so.

@thread
I did some more tests:

1. it happens with both 32bit RGBA format W3D_R8G8B8A8 and W3D_A8R8G8B8.

2. it happens with W3D_R8G8B8 too.

3a. it works with the 16bit format W3D_R5G6B5, W3D_A4R4G4B4 and W3D_A1R5G5B5...

3b. ... but not always and not under real-world conditions with some more textures - sometimes it also fails after boot and with the mini test-app. So all in all those are no better than their 32bit brothers: just like those, it sometimes works and sometimes doesn't. Maybe the chances are somewhat higher that it works, but that may be just my imagination.

4. it also happens with 2048x1024 and 1024x2048 at both 32bit and 16bit (so it can fail even with a mere ~4 MB of tex-data!).

5. it works with 2048x512 and 512x2048 at both 16bit and 32bit (so it can also work with about 4 MB of tex-data!).

6. just for completeness: it happens independent of the screen's format / window size.

7. I disabled W3D_AUTOTEXMANAGEMENT. Then we get W3D_NOGFXMEM when calling W3D_UploadTexture.

8. interesting: if I call W3D_DrawArray a second time right after it failed then it tells me success. And actually draws something. But it apparently uses an incomplete texture (or none at all). It also behaves this way if I rebind the texture beforehand.

9. interesting: sometimes (randomly, though quite often in the small test-prog) only the first W3D_DrawArray call fails; the second and all subsequent ones simply work. So if you create that 2048x2048x32 texture in frame X, then W3D_DrawArray fails in frame X but works in X+1, X+2, X+n. If it worked that way during a test session you can create / delete that or other textures without issues; whenever you create a 2048x2048x32 again it continues to behave the same way (btw. tested again: it doesn't even have to be spread over frames - if you simply issue that second call immediately after the failed one, then it works - sometimes, of course ;-) )

Okay, so the bottom line is the following:
Although Warp3D reports that it can handle 2048x2048 textures here, the truth is that it sometimes can and sometimes cannot. Since the cause doesn't seem to be related (only) to the number of textures in use, the texture format or the size (at least not always) - sometimes it works, sometimes it doesn't - whatever information you got (max texture size, texture creation status) is pretty useless; it's a matter of luck.

The tests seem to indicate the following rule of thumb:
it usually fails in real life if either the width or the height of the texture is 2048 and the other edge's size is at least 1024. The number of bytes required to store the texture data doesn't seem to matter that much (because even 16bit 2048x1024 / 1024x2048 fails).

Anyway, the worst part isn't the issue itself - the worst part is that the texture creation and everything else seemingly works just fine (at least that's what Warp3D tells you). So you have no chance to correct the issue until it's too late.
If Warp3D tells me that my texture creation succeeded then that information should be reliable.
If it is a memory issue of whatever kind then the memory handler is obviously seriously broken. The amount of textures in use etc. (low compared to the VRAM size) simply doesn't justify such failures on the test system.

Luckily I found a work-around for that latter problem that works for me:
after I successfully (?) created such a texture I temporarily disable auto-tex-management and call W3D_UploadTexture. If that one tells me W3D_NOGFXMEM I know that this texture, although created successfully, is not worth a dime.
So I could at least "move" that failure-information to where it belongs.
If that's reliable? Don't really know, it seems to work so far...
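In code the work-around looks roughly like this (error handling trimmed, texture creation tags as before; exact call names from memory):

    /* probe the texture right after its "successful" creation */
    W3D_Texture *tex = W3D_AllocTexObj(context, &error, tags);
    if (tex && error == W3D_SUCCESS) {
        W3D_SetState(context, W3D_AUTOTEXMANAGEMENT, W3D_DISABLE);
        ULONG up = W3D_UploadTexture(context, tex);
        W3D_SetState(context, W3D_AUTOTEXMANAGEMENT, W3D_ENABLE);

        if (up == W3D_NOGFXMEM) {
            /* "created successfully" but unusable -> fall back to 1024x1024 */
            W3D_FreeTexObj(context, tex);
            tex = NULL;
        }
    }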

Really a pity, this whole bug: the GPU/card can do it, but the driver is so unreliable that your best bet seems to be to always divide Warp3D's max-texture-size info by 2 to be on the safe side :(
Warp3D driver code-basher and bug-smasher - btw.: driver writing is nothing mysterious
Karlos
AmigaOS Core Developer
Posts: 84
Joined: Sun Jun 19, 2011 11:42 am
Location: United Kingdom of England and anybody else that wishes to remain.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Karlos »

"I don't get it why perfectly 32bit-pot texture size calculations shouldn't be trivial. Under certain conditions (unsupported texture format, non-pot-texture, conversions) I'd understand that, but not in such a case where the conditions are optimal."

Again, it is not the texture allocation that is the problem, it is the BitMap that is allocated to contain it.

You ask the VRAM allocator for a block of 16 MiB and it has to ensure that the 16 MiB allocation is at least 32-byte aligned (I think even 128-byte for texture allocations). This is because texture alignment is often much stricter than the alignment required for BitMaps (even visible ones). When you allocate any power-of-two size BitMap, you might get one aligned to less than the requirements of a texture, but most often you won't. This means you end up allocating a larger BitMap and returning a pointer to a location within the BitMap VRAM area that is aligned to the next N-byte boundary. This is why the BitMap usually has to be larger. However, allocating BitMaps much larger than 2048 will probably fail.
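The pointer part is the usual align-up trick, just applied inside a deliberately over-sized BitMap (toy illustration; the 128-byte figure is an assumption):

    /* toy illustration: hand out an aligned texture pointer from inside a
       BitMap allocation that was made a little bigger than strictly needed */
    #define TEX_ALIGN 128UL

    static unsigned char *carve_aligned(unsigned char *base, unsigned long need,
                                        unsigned long bitmap_bytes)
    {
        unsigned long addr    = (unsigned long)base;
        unsigned long aligned = (addr + TEX_ALIGN - 1) & ~(TEX_ALIGN - 1);

        if ((aligned - addr) + need > bitmap_bytes)
            return 0;  /* BitMap not padded enough for the worst-case skew */
        return (unsigned char *)aligned;
    }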

Speaking of formats, don't use 24-bit RGB/BGR texture formats ever. Even on hardware that can cope with them, they often aren't implemented in the driver model and end up being converted to 32-bit with a solid alpha channel. This increases your memory usage, as Warp3D has to maintain the 32-bit copy as well as the VRAM copy, and your source 24-bit data must also stay locked until you are finished with it.

Packed pixel format. Just say no, kids.
Daytona675x
AmigaOS Core Developer
Posts: 34
Joined: Wed Jan 22, 2014 5:18 pm
Location: Cologne, Germany

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Daytona675x »

@Karlos

Off topic:
Speaking of formats, don't use 24-bit RGB/BGR texture formats ever.
Really, I guess nobody does :-) I tested it for the sake of completeness only.

On topic:
When you allocate a 2048x2048x32-bit BitMap, you might get one aligned to less than the requirements of a texture.
Ah. So that allocator is simply dumb and not suited for the task, right?
So apparently all that speculation about fragmentation or whatever is most likely wrong: it's simply that the allocator doesn't always return 128-byte aligned buffers, and you try to get a better one by allocating more RAM - and somewhere this fails.
This means you end up allocating a larger BitMap and returning a pointer to a location within the BitMap VRAM area that is aligned to next N-byte boundary.
Okay. I still don't get why you would need any "non-trivial" resizing calculation for that task though. After all it's just about getting a RAM buffer large enough and properly aligned, so simply requesting some bytes more should do (= one or a few dummy rows extra for that request, or one or a few dummy columns in case of a 2048-height texture).

Anyway, as said: apparently the root of the problem is that the allocator does not always return 128-byte aligned buffers and that it has a problem when asked for bitmap sizes larger than 2048x2048, right?

Thinking out loud:

But then the question comes up why 2048x1024 or 1024x2048 also fail.
Taking the info from above into account, that should only happen if the allocator already gets into trouble as soon as just one of the two values goes above 2048.
And that could be explained if those internal "size adjustments" tend to choose the worse of the two variants: instead of enlarging the smaller edge, the one that's already 2048 is further enlarged.

But then again, let's not forget that 2048x512 / 512x2048 work. So in that case it either chooses the smaller edge for enlargement, or maybe it enlarges the 2048x1024 to something completely off (and the 2048x512 to 2048x1024 for example, which would still work)? That would probably explain why those work and the others don't: it's simply harder to come up with values that are outside the valid rect boundaries.

But then again: for proper alignment you'd just need a few bytes extra, not KB and certainly not tons of MB. Maybe that's the reason why those 2048 x (< 2048) variants also fail: because that algorithm to increase the "virtual" size for that request is buggy and, instead of just adding one or two rows...

Yes, the idea that this internal resizing is causing the issue would also explain some other test results, especially that the actual amount of RAM requested is apparently not what matters but only the width/height dimensions. Yes, a buggy "resizer" plus an allocator limited to 2048x2048 could explain all of that.

And the fact that it sometimes works could simply be luck: the allocator returned a 128-byte aligned buffer at the first request and no further "resize" stunts were necessary.

However, everything would be fine if that allocator always returned 128-byte aligned pointers. It would simplify things a lot.

Of course all of this is under the assumption that this is the real cause of the issue.
I'm curious what you find out when you hunt down that bug!
Warp3D driver code-basher and bug-smasher - btw.: driver writing is nothing mysterious
Karlos
AmigaOS Core Developer
Posts: 84
Joined: Sun Jun 19, 2011 11:42 am
Location: United Kingdom of England and anybody else that wishes to remain.

Re: Warp3D: 2048x2048 texture / W3D_DrawArray problem

Post by Karlos »

If P96/Graphics AllocBitMap() functions returned VRAM resident allocations that were already texture-friendly aligned for the target GPU, *none* of this complex allocation strategy would be necessary; you'd just allocate BitMaps directly using a pixel depth that matches your required texel depth. Also this assumes that P96 can handle many VRAM resident BitMaps efficiently.

Unfortunately, that's not the system we have. Allocating a BitMap doesn't guarantee that the area you allocate is even in VRAM, let alone aligned appropriately. If you just allocate a BitMap with these functions, you'll get a buffer in system memory that is copied to a VRAM allocation if/when P96 deems it appropriate to do so.

When the W3D code was written, the only way to get around all this was to use a "friend" bitmap and use the Context's display BitMap for that. You then get a VRAM allocation (if it fits) at whatever pixel format the friend bitmap was in - you have no control or choice over the depth.

So, a VRAM allocator that sat "on top" of all this was written and it is generally quite efficient. It allocates as few BitMaps as possible and re-uses memory within them for many textures.

In your case, allocating large (1024x1024 and higher) textures is *always* going to push the allocator into requesting a new BitMap. In that case, it has to go through some hoops to ensure that:

1) The allocated BitMap is in VRAM
2) The allocated BitMap is not excessively rectangular (in order not to upset the graphics subsystem which has width/height restrictions too)
3) The allocated bitmap's total linear size is *at least* as large as the requested allocation plus whatever padding is required for texture alignment requirements which P96/graphics generally know nothing about.

When allocating your 2048x2048x32 bit texture I expect that it should fail almost always, if not actually always, because the padding alone will probably cause BitMap dimensions larger than 2048x2048.

Ideally, there should be some "allocate VRAM aligned to my exact requirements" function exposed by the graphics subsystem that W3D could use, but instead we have BitMaps.
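Something along these lines, purely hypothetical of course - no such call exists today:

    /* purely hypothetical prototypes, just to illustrate what is missing */
    APTR AllocAlignedVRAM(ULONG size, ULONG alignment, ULONG flags);
    void FreeAlignedVRAM(APTR allocation);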