Warp3D: 2048x2048 texture / W3D_DrawArray problem

A forum for general AmigaOS 4.x support questions that are not platform-specific

Post by Daytona675x »

I'm not sure I appreciate your tone.
Forgive me, maybe I'm a tad too harsh indeed. It's just that I'm pretty tired of the fact that almost every time I want to use a W3D or Compositing functionality it turns out that it's broken, at least on some systems. I suppose you know about all the bugs I reported so far (more to come, btw.)? AFAIK not one was solved (I was told that the MiniGL bugs were solved at least). Probably you can imagine that I'm a bit prickly when I have the feeling that no real measures are taken to really solve those issues. And when somebody wants to tell me something about "complex calculations" when it comes to math that's less complicated than what a sixth-grade kid learns in school, I start to feel I'm being taken for a ride, sorry...
We are talking about code that, according to SVN I have not looked at since 2013-06-20
That's probably part of the problem :( Such a crucial part of the system deserves more attention I'd say, don't you think?
If you want more commitment, feel free to hire me. My contracting rates are quite reasonable. I'd even give you a discount. Call it 450 UKP/day ?
No thanks :) I'm just the customer complaining about things not working. Things that should work. I mean, it's not as if I was asking for miracle-stuff here. There's a big bug (and not just this one) inside, one that makes simple texturing a problem and it would be great if at least this one was actually fixed. I'm really tired of adding workarounds for all those driver bugs to my AOS4-specific abstraction layer.
Wrong. Your requested 32-bit BitMap almost always ends up in FAST ram ready to be paged in only when RTG decides it's necessary to do so. You might get lucky sometimes, most of the time you won't. What are you going to do now? Your graphics card will at best render a load of garbage, or it will simply hang your entire system. You think this approach was never attempted before?
Well, since it looks like I'm the first one tapping into this issue... Yes, it may very well be that nobody tried it before. Apparently you are not 100% sure yourself what happens under certain circumstances, so yes, I'd say this is a very valid constructive question indeed.
Why do you think increasing the width past 2048 will help?
Because from the information you delivered so far the height > 4096, in this case 4100, seems to be the problem. At least you answered "yes" when I asked if there was such a limit. So naturally the first guess to work around it would be to increase the width and lower the height instead, so that both values end up below 4096, even if that means using a not-so-nice POT width.
Trying this out is certainly more reasonable than asking for a 2048 x 4100 bitmap of which you apparently knew that it would fail beforehand. Or you could try different variants internally. If variant X fails, then try again with modified values.
That already won't work on most of the supported hardware and probably won't work on R200 either.
Yes, probably. Or probably not. How about trying it out? It would certainly be an improvement if it turns out that it works on R200. If it fails on others, bad luck for those.
If you want to help, why not write a test program yourself to ascertain the maximum hardware...
Not my job. I already helped to a great deal by providing test-programs for LOTS of W3D bugs I found, testing your debug lib and by talking to you here. I guess it's time that you make something out of it.
Sarcasm much? I have probably written more allocators than you would think, using many different strategies.
Unfortunately there was no sarcasm in that sentence. But anyway, if you are an allocator master, then why not put your experience into writing one for this case here? That would be the solution I'd expect from a driver's author.
I didn't write this one, however. I have fixed many bugs and inefficiencies within it, but this one remains
If you have access to the code and dug into it, then I don't get why you cannot fix this issue. With your experience regarding allocators this should be a piece of cake for you!
Feel free to develop it and I'll reimplement the W3D_Picasso96.library to use it.
Give me access to all the sources and info you got and I'll probably give it a shot. I'm not as cheap as you are though ;-)
I can write some special case handling for when we will exceed 2048x2048 on that basis
Why do you need yet another test program for this? If you can solve the problem, then how about doing it? I'll be happy to test a fresh library. I'd say that's way more than you can expect from a usual customer ;-)
After all, there clearly isn't a problem when we are allocating lower sizes
Who knows (remember those 2048 x 1024 / 1024 x 2048 failures I sometimes got?). I only know what you decide to let me know. And I don't even know how valid that info is. So far the only thing I know for sure is that you had a wrong calculation inside your code and that this got fixed, so that 2048x2048 textures work for 32bit contexts now (which is an improvement I appreciate, of course).
Earlier you were berating me (wrongly) for apparently restricting the allocator to only care about the non-Radeon underclass, and suddenly you want to reduce functionality for everybody who isn't affected by your specific use case?
Sorry, you simply didn't get me right. Of course I expect you to deliver useful values depending on the context. Of course I don't want you to return from W3D_Q_MAXTEXWIDTH with 1024 always. You should return 1024 if you can determine that 2048 won't work (and apparently you can do that if we got a R200 and a 16bit context, since you know that your p96Alloc-call will fail - at least that's what we agreed on above). From what you told me so far (and as my tests underline) on R200 such an alloc will always fail. So it would be a smart (and simple) "solution" for now to adjust the max-texture-size info accordingly.
Since the underlying problem is unlikely to be fixed any time soon
Okay. Then we all know where we are.
and the W3D_Q_MAXTEXWIDTH/HEIGHT functions aren't context aware beyond what is potentially possible on 8-bit (deprecated) or RGB render targets
Pardon? The first parameter to W3D_Query is the context (and the NULL pointer variant is long "discouraged"). So it should be no problem for you to return a more correct value than 2048 x 2048 based on the context's depth / GPU type at least. I mean, that information should be possible to obtain, right? At least you definitely know the context's depth at this point - because you use that info during texture allocation. And if somebody calls W3D_Query with a NULL context, fine, then just do as you do now.
1) Cause W3D_AllocTexObject() to fail when attempting to reserve storage for a texture that is likely to exceed the limitations implied by the render target (ie, 2Kx2Kx32-bit texture on a 16-bit render target) in the Radeon drivers.
That's at least a tad better than the current variant of returning "okay" and failing later with the first draw command.
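The early-rejection idea discussed above can be sketched as a simple guard. Everything here (the function name, the exact cut-off values) is hypothetical and merely encodes the failure pattern reported in this thread; a real driver would look these limits up per GPU:

```c
#include <stdint.h>

/* Hypothetical sketch of an early check inside a W3D_AllocTexObject()
 * implementation: refuse a texture whose storage the allocator is known
 * to be unable to deliver for the current render target depth, instead
 * of returning "okay" and failing later at the first draw command.
 * The 2048x2048x32-on-16-bit-target case is the one observed in this
 * thread; nothing here is taken from the actual driver source. */
static int tex_alloc_would_fail(uint32_t tex_w, uint32_t tex_h,
                                uint32_t tex_bpp, uint32_t target_bpp)
{
    /* Observed failure: 2048x2048x32 textures on 16-bit render targets. */
    if (target_bpp == 16 && tex_bpp == 32 && tex_w >= 2048 && tex_h >= 2048)
        return 1;   /* would fail: reject now, at allocation time */
    return 0;       /* no known reason to reject */
}
```

With a check like this, the caller gets a clean allocation failure it can react to (e.g. by downscaling the texture), rather than garbage or a lockup at draw time.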
Warp3D driver code-basher and bug-smasher - btw.: driver writing is nothing mysterious

Post by Karlos »

So, let's see where we are.

Warp3D's allocator isn't actually broken per se, it's the limitation of the RTG system it sits on that causes the problem. You may disagree, but this is indeed the case. If a function existed that allowed you to basically AllocVec() VRAM fixed in place, the current hacky method would not be needed. This is evidenced by the fact that the corresponding RadeonHD drivers have no such limitation. The RadeonHD resource is responsible for all memory allocation, AFAIK, for both 2D and 3D. Hans can probably confirm this for us. AFAIK, the RadeonHD drivers are quite happy up to 16Kx16K both for BitMaps and textures.

Believe me, I spent enough time fixing issues caused by the current allocator on less capable cards, but your specific case is not one I spent any real time analyzing because until you tried it, nothing in the existing corpus of code used textures that size.

The fix that you expect requires changes elsewhere in the system. I'd need to overhaul P96 and all the other drivers, etc., in order to fix this particular issue for good. Who knows how the system would behave the moment you allow client code to arbitrarily allocate and pin VRAM?

The reason I asked you to write a very specific test program, one that simply allocates BitMaps at sizes > 2048 pixels wide and confirms that they are indeed in VRAM, is because I can't. Not for lack of skill or even time. I simply do not have a working OS4 system with R200-class hardware any more. Such a test program would help because at sizes > 2048 we could opt for non-power-of-2 expansions to the size, and maybe we can allocate a BitMap that doesn't exceed the internal limits of the 2D RTG system and gives you the 16MiB you need for your texture allocation. Also, consider that the moment you add MIP mapping to this, you need a single allocation big enough to contain the level 0 and every smaller level as one contiguous block.

If you aren't prepared to help in this fashion, then you are going to have to live with the existing limitations. Adding a check to the driver's implementation of AllocTexObject is a relatively low impact solution.

Finally, regarding the W3D_Query() function: it is, and always has been, generally implemented on the basis of fixed answer responses. Almost all of the drivers have static arrays of constants that are used as lookups for a given property. The fact that they require a W3D_Context parameter is irrelevant. I agree that it's a bit silly, and there are exceptions to this in some of the drivers, but generally that's how they work. Virtually all the library functions expect a context as the first parameter, but they don't always need it. It depends on what that function does and whether or not a given driver's implementation of it needs to care about it.

Post by Hans »

Daytona675x wrote:Pardon? The first parameter to W3D_Query is the context (and the NULL pointer variant is long "discouraged"). So it should be no problem for you to return a more correct value than 2048 x 2048 based on the context's depth / GPU type at least. I mean, that information should be possible to obtain, right? At least you definitely know the context's depth at this point - because you use that info during texture allocation. And if somebody calls W3D_Query with a NULL context, fine, then just do as you do now.
I think what Karlos meant is that W3D_Query() can't tell you what the max texture dimensions are per texture-format. The destfmt parameter is for the render target, and not the texture's format. So, you can't ask "what's the max width for a 16-bit texture?" This makes sense because the GPU has the same maximum texture dimensions regardless of the texture's pixel format; it's not the GPU's fault that the memory allocator gets in the way.

Karlos wrote:... This is evidenced by the fact that the corresponding RadeonHD drivers have no such limitation. The RadeonHD resource is responsible for all memory allocation, AFAIK, for both 2D and 3D. Hans can probably confirm this for us. AFAIK, the RadeonHD drivers are quite happy up to 16Kx16K both for BitMaps and textures.
The max bitmap/texture resolution varies between Radeon HD series, with Southern Islands GPUs supporting up to 16Kx16K.

Picasso96 still handles the low-level buffer allocation itself. The way that its allocator is structured makes it impossible to take over completely without really nasty hacks (as in poking around in P96's internal data structures). The RadeonHD_RM.resource does have lower-level access to the VRAM allocator than W3D_Picasso96.library does, so it can directly allocate blocks of VRAM and manage paging of those itself (e.g., paging buffers out on an LRU basis, which P96 doesn't do). It still takes some wrangling to wrest this much control from Picasso96, but at least that's hidden from 3D drivers.
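The LRU-based paging mentioned here can be illustrated with a minimal sketch. The structures and names below are invented purely for illustration and have nothing to do with the actual RadeonHD_RM.resource internals:

```c
#include <stdint.h>

/* Toy model of LRU buffer paging: each buffer records the "tick" of its
 * most recent use; when VRAM is needed, the least recently used resident
 * buffer is the eviction candidate. P96's allocator, as described above,
 * does not do this kind of bookkeeping. */
#define MAX_BUFS 8

typedef struct {
    int      resident;   /* 1 if currently placed in VRAM */
    uint32_t last_used;  /* tick of the most recent access */
} Buffer;

static Buffer   bufs[MAX_BUFS];
static uint32_t tick;

/* Mark buffer i as just used (and resident). */
static void touch(int i)
{
    bufs[i].resident  = 1;
    bufs[i].last_used = ++tick;
}

/* Pick the resident buffer to page out; -1 if nothing is resident. */
static int lru_victim(void)
{
    int best = -1;
    for (int i = 0; i < MAX_BUFS; i++)
        if (bufs[i].resident &&
            (best < 0 || bufs[i].last_used < bufs[best].last_used))
            best = i;
    return best;
}
```

The point of the design is that eviction decisions are made by the component that actually sees every access, which is exactly the control a 3D driver cannot get while P96 owns the allocator.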

Hans
http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more. Home of the RadeonHD driver for Amiga OS 4.x project.

Post by Karlos »

That's what I meant when I said it wasn't "context" aware. If I had meant W3D_Context, I would have written W3D_Context.

Nevertheless, most W3D_Query() implementations look only at dstfmt (and even then, usually only care whether it's RGB or CLUT) and return a static lookup value for the queried parameter. There's very little intelligence applied. And as you say, without knowing the intended texture format, it couldn't return different values to work around the allocator limit. I did override this for Permedia, allowing the maximum dimension to be set via an env var, but soon discovered that almost no software ever queried it, and even when it did, it wouldn't bother respecting it.
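The static-lookup style described here could be sketched as follows. The property names and table contents are made up for illustration and are not taken from any real driver:

```c
#include <stdint.h>

/* Illustrative model of the fixed-answer W3D_Query() style: a static
 * table per property, indexed only by whether the destination format is
 * CLUT or RGB. No context state is consulted at all, which is why such
 * an implementation cannot adapt its answer to the texture format or
 * the allocator's current limits. */
enum { Q_MAXTEXWIDTH, Q_MAXTEXHEIGHT, Q_COUNT };
enum { FMT_CLUT, FMT_RGB };

static const uint32_t query_table[Q_COUNT][2] = {
    /*                  CLUT   RGB  */
    [Q_MAXTEXWIDTH]  = {  0,  2048 },  /* 8-bit targets: deprecated */
    [Q_MAXTEXHEIGHT] = {  0,  2048 },
};

static uint32_t query(int property, int dstfmt_class)
{
    /* The answer depends only on the coarse CLUT-vs-RGB split. */
    return query_table[property][dstfmt_class];
}
```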

All said, I do agree that the existing query implementation is somewhat lacking. Nevertheless, the real issue here is having to allocate VRAM using friend BitMaps. A better solution is needed.

Post by Karlos »

@Daytona
Forgive me, maybe I'm a tad too harsh indeed. It's just that I'm pretty tired of the fact that almost every time I want to use a W3D or Compositing functionality it turns out that it's broken, at least on some systems. I suppose you know about all the bugs I reported so far (more to come, btw.)? AFAIK not one was solved (I was told that the MiniGL bugs were solved at least). Probably you can imagine that I'm a bit prickly when I have the feeling that no real measures are taken to really solve those issues. And when somebody wants to tell me something about "complex calculations" when it comes to math that's less complicated than what a sixth-grade kid learns in school, I start to feel I'm being taken for a ride, sorry...
That's OK. Honestly, I do understand your frustration. The same frustration, back in the WarpOS/Warp3D days is how I got involved in the first place. Loads of stuff in the Permedia driver just didn't work or wasn't even implemented.

When I say the allocator is "complex", what I mean is that it's not simple, when it should be simple. It should be as simple as "I need N bytes, aligned to the nearest X boundary". I'm not saying the math is particularly complex, but when your routine for allocating a linear (1 dimensional) span of memory requires both square root and base-2 logarithm operations because it's really trying to compute an optimum 2D hardware-aligned BitMap area behind the scenes, you hopefully get my point.

Just so you are fully informed, this allocation behaviour only kicks in when there is no free segment of an existing BitMap that is large enough for your request. Usually, many small textures reside together in a single BitMap. Once you start requesting allocations above a certain size (about 1MiB, but that's off the top of my head, at work now so can't check - might be less than this), fresh BitMaps are allocated using the method described above.

I could replace the whole thing (and have thought about a number of improvements for handling volatile textures etc), but at some point I need to actually allocate VRAM on the card. And that's where I'd become as unstuck as the existing implementation. There's no way to actually do it without allocating BitMaps and again you end up needing a friend BitMap to do that reliably.

Clearly we need an enhancement to RTG because, as you rightfully pointed out, why should it care what we do with our memory? It's simply the fact that it gets worried when it sees an allocation for a visible BitMap that appears so much larger than it expects that causes it to fail.

Post by Daytona675x »

I think what Karlos meant is that W3D_Query() can't tell you what the max texture dimensions are per texture-format. The destfmt parameter is for the render target, and not the texture's format. So, you can't ask "what's the max width for a 16-bit texture?"
Indeed, that's absolutely true. That small detail, which I forgot about, renders my idea of patching W3D_Query nonsense, of course :-)
Then I guess the best workaround for now is to let W3D_AllocTexObject fail.
That's OK. Honestly, I do understand your frustration.
It's more anger plus a spoonful of resignation than frustration ;-)


For completeness: With the information on 32bit contexts vs. 16bit contexts I redid my tests and got the following:

Lib 53.10:

16bit contexts:
2048 x 2048 x 32: fail
2048 x 1024 x 32: fail
1024 x 2048 x 32: fail
2048 x 2048 x 16: okay
2048 x 1024 x 16: okay
1024 x 2048 x 16: okay

32bit contexts:
2048 x 2048 x 32: fail
2048 x 1024 x 32: okay
1024 x 2048 x 32: okay
2048 x 2048 x 16: okay
2048 x 1024 x 16: okay
1024 x 2048 x 16: okay

Lib 53.11 debug:

16bit contexts:
2048 x 2048 x 32: fail
2048 x 1024 x 32: okay
1024 x 2048 x 32: okay
2048 x 2048 x 16: okay
2048 x 1024 x 16: okay
1024 x 2048 x 16: okay

32bit contexts:
2048 x 2048 x 32: okay
2048 x 1024 x 32: okay
1024 x 2048 x 32: okay
2048 x 2048 x 16: okay
2048 x 1024 x 16: okay
1024 x 2048 x 16: okay

So apparently what I first thought was random behaviour with those 1024 x 2048 / 2048 x 1024 texture failures wasn't random at all. I simply had different screen modes active when I did those tests :P
Anyway, those got fixed with your fix inside the debug lib.
So the only case not working remains 2048x2048x32 on 16bit contexts (MIPmapping aside, not tested yet).

Post by Karlos »

My expectation is that 2K x 2K 32-bit textures on 32-bit render targets, while OK now, will probably fail if MIP mapping is turned on. They need about 1.5x the memory again, and that's going to end up as a 2K x 3K request in 53.11.
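For reference, the arithmetic behind that estimate can be checked with a small helper (a generic mip-chain byte sum, not the driver's actual layout code):

```c
#include <stdint.h>

/* Total bytes for a texture plus its full mip chain: each level halves
 * both dimensions (clamped at 1) until the 1x1 level, so the chain adds
 * roughly one third on top of level 0. */
static uint64_t mip_chain_bytes(uint32_t w, uint32_t h, uint32_t bpp_bytes)
{
    uint64_t total = 0;
    for (;;) {
        total += (uint64_t)w * h * bpp_bytes;
        if (w == 1 && h == 1)
            break;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    return total;
}
```

For 2048x2048 at 4 bytes per texel, level 0 alone is 16 MiB and the full chain comes to 22,369,620 bytes (about 21.3 MiB, i.e. roughly 1.33x). Stored at a 2048-pixel 32-bit pitch that is about 2,731 rows, so a single contiguous allocation for the chain does indeed land in the "2K x 3K" ballpark described above.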

Post by Daytona675x »

Fixed this, at least for R200 (cannot test others). Up to 2k x 2k now works with or without mip-mapping, no matter if 16/32 bit context, etc.
Programmers: don't forget to remove your workarounds / artificial limitations to below 2k x 2k when the driver gets updated :-)

Post by samo79 »

@Daytona675x

Awesome, it seems that a lot of bugs are being worked on !
Are you directly involved in Warp3D now ? :)

Post by BSzili »

Daytona675x wrote:Fixed this, at least for R200 (cannot test others). Up to 2k x 2k now works with or without mip-mapping, no matter if 16/32 bit context, etc.
Programmers: don't forget to remove your workarounds / artificial limitations to below 2k x 2k when the driver gets updated :-)
At this rate I'll be forced to update a few of my ports :D Gotta look for those > 1024 hacks.