NVidia Blitting
Chris81
22nd February 2002, 13:54
OK, I don't mean to argue with anyone, but this NVidia hardware blitting problem has really got my goat. As far as I can tell, no other program or plugin has a problem with NVidia hardware blitting. Does anyone else find this odd? Ryan Geiss insists it is a problem with the drivers, but if it is, why has no one else ever reported this problem in any other situation? I'm no expert on graphics card technology, but as far as I can tell there are two odd things going on.
1. The colours get corrupted, presumably due to an incorrect or invalid blitting mode.
2. The image gets shifted and/or stretched off the screen (this only affects me in 32-bit modes).
Although the above two are annoying, the hardware blit does actually work; it's not like the cards are incapable of it. NVidia cards are *very* widely used, so surely it's worth doing some in-depth research to sort it out. Has anyone out there had this or a similar problem in another situation? Does anyone use an NVidia-based card in MilkDrop and not get this problem? If we all get together, maybe we can at least narrow down the problem rather than settle for a cover-all "it's the NVidia drivers" excuse.
Chris
As a start, here are my details:
VIA Chipset KM133A
AMD Athlon 1.1 GHz
Leadtek Geforce2 Titanium 64Mb
Detonator XP 23.11
Win 98SE
DirectX 8.1 Debug
Colour problems with all settings
Stretching only with 32-bit modes
DGhost
24th February 2002, 01:54
As has been stated in other forums, the problem is that NVidia doesn't convert between color formats when blitting. As Geiss himself wrote:
The problem is this: the card (or driver) will let you blit from an RGB to a YUV surface, but it won't do the color space conversion (from YUV to RGB) - it just copies the bits directly. When you look at a YUV pixel and pretend that it's formatted like an RGB pixel, that's when you get the grungy green-and-blue color scheme that nVidia users are seeing.
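To make the quoted explanation concrete, here is a minimal sketch of what "copying the bits directly" does to the colours. It assumes the direction described in the nVidia e-mail later in this thread (RGB565 data landing unconverted in a YUY2 overlay), little-endian byte order, the YUY2 byte layout Y0 U Y1 V, and full-range BT.601 coefficients. The names `yuvToRgb` and `rgb565ShownAsYuy2` are made up for illustration and are not from MilkDrop or any driver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Rgb { int r, g, b; };

// Round and clamp a value to the displayable 0..255 range.
static int clamp255(double v) {
    return static_cast<int>(std::min(255.0, std::max(0.0, v + 0.5)));
}

// Full-range BT.601 YUV -> RGB. This formula is an assumption; real
// overlay hardware may use a limited-range variant.
Rgb yuvToRgb(int y, int u, int v) {
    assert(y >= 0 && y <= 255 && u >= 0 && u <= 255 && v >= 0 && v <= 255);
    const double cb = u - 128.0;
    const double cr = v - 128.0;
    return { clamp255(y + 1.402 * cr),
             clamp255(y - 0.344136 * cb - 0.714136 * cr),
             clamp255(y + 1.772 * cb) };
}

// Two adjacent RGB565 pixels occupy four bytes. If those bytes are copied
// unchanged into a YUY2 surface, the overlay reads them as Y0 U Y1 V.
// Returns the colour the overlay would show for the first pixel.
Rgb rgb565ShownAsYuy2(std::uint16_t p0, std::uint16_t p1) {
    const int y0 = p0 & 0xFF;  // low byte of pixel 0 is read as luma
    const int u  = p0 >> 8;    // high byte of pixel 0 is read as U
    const int v  = p1 >> 8;    // high byte of pixel 1 is read as V
    return yuvToRgb(y0, u, v);
}
```

Under these assumptions, a black RGB565 pair (0x0000, 0x0000) comes out as green (0, 135, 0) and a white pair (0xFFFF, 0xFFFF) comes out as pink (255, 121, 255), which matches the kind of colour scheme people are describing.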
The feature works on both ATI and Matrox drivers so far, and I am not positive about the VIA/S3 chipsets or the Kyros, but I would imagine that it works fine with those as well.
Hardware overlays are indeed working on the NVidia cards; we all know that. If overlays did not work, you would get a nice black background and that's it.
As for other programs: how many other programs render a scene to video memory and then blit it from video memory to the frame buffer? I don't know of any others that do it with a 3D rendering. For non-3D-accelerated programs it's easy - you just ensure that the data is in the proper format before you blit it.
People know what the problem is. But there is no solution to it. It is a broken feature of NVidia's. Last I checked, having a userspace program do color conversions would be the same thing as a software overlay, which obviously is not a sufficient solution.
And as for other similar problems, I have seen similar things with my ATI TV tuner, and it is a driver issue. When using hardware blits, any problems that arise are driver issues. The whole process is being done by the video card.
It is indeed NVidia that is at fault. I would suggest contacting them about it.
Chris81
24th February 2002, 11:20
From what I can work out from Ryan Geiss's post, what we have is this:
Back Buffer RGB
Front Buffer YUV
When he does the flip, the back buffer and front buffer swap. However, the card doesn't do the color space conversion YUV-->RGB correctly, so the data that was in the front buffer and is now in the back buffer becomes corrupted with the lovely green/pink color scheme. So when the next flip is performed, what we see is a YUV surface which has been processed as if it were an RGB surface and then converted to another YUV surface, screwing up the colors. I assume that Geiss has looked into the obvious, i.e. using an RGB front buffer surface. The only other solution I can offer is this.
The three YUV values are linear functions of the three RGB values. So why can't you just do all the processing using YUV, have both the back and front buffers in YUV format, and forget about any color space conversion? Admittedly this would probably involve a lot of work, but I can't see the drivers being fixed anytime soon. Or why not have a triple buffer system like so:
1st Back Buffer RGB
2nd Back Buffer RGB
Front Buffer YUV
Image processing and rendering are done on the 2nd Back Buffer. When we do the blit, we copy the 2nd Back Buffer to both the Front Buffer and the 1st Back Buffer. This way the 1st Back Buffer is just an RGB color space mirror of the YUV Front Buffer. If these ideas are stupid or unusable, I apologise.
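On the point above that the YUV values are linear functions of the RGB values, a minimal sketch of that per-pixel conversion might look like this. The weights are the full-range BT.601 ones, which is an assumption: the thread does not say which variant the overlay hardware expects, and `rgbToYuv` is just an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Yuv { int y, u, v; };

// Round to the nearest integer and clamp to 0..255.
static int toByte(double x) {
    return static_cast<int>(std::min(255L, std::max(0L, std::lround(x))));
}

// Each output channel is a weighted sum of R, G and B plus a constant
// offset, i.e. a linear (affine) function of the input pixel.
Yuv rgbToYuv(int r, int g, int b) {
    assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
    const double y =  0.299    * r + 0.587    * g + 0.114    * b;
    const double u = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0;
    const double v =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0;
    return { toByte(y), toByte(u), toByte(v) };
}
```

This only covers a single pixel in isolation. How the results are packed into an actual YUV surface is a separate question.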
Chris
geiss
26th February 2002, 18:30
Chris,
Unfortunately, you can't just render in YUV, because in the YUV color format, every other pixel has a different format; odd pixels have one conversion formula, and even pixels have a different one. This could hypothetically be solved quite easily with a pixel shader, but GF2 users would still be out of luck, so I dunno... seems like more trouble than it's worth right now. If I ever get the time... =)
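For anyone wondering why even and odd pixels differ, here is a rough sketch of a packed 4:2:2 layout, assuming the common YUY2 byte order Y0 U Y1 V. Each pixel keeps its own luma, but a horizontal pair shares one U and one V, so the 16-bit word for the even pixel carries U and the word for the odd pixel carries V. `packYuy2` is a hypothetical helper, and averaging the chroma is only one possible choice (some converters just take the chroma of the first pixel).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Yuv { std::uint8_t y, u, v; };

// Pack two horizontally adjacent pixels into one 4-byte YUY2 macropixel.
// Byte order: Y0, shared U, Y1, shared V.
std::array<std::uint8_t, 4> packYuy2(const Yuv& even, const Yuv& odd) {
    // Rounded average of two 8-bit chroma samples.
    const auto avg = [](std::uint8_t a, std::uint8_t b) {
        return static_cast<std::uint8_t>((a + b + 1) / 2);
    };
    return std::array<std::uint8_t, 4>{{
        even.y, avg(even.u, odd.u), odd.y, avg(even.v, odd.v)
    }};
}
```

Because the second and fourth bytes mean different things, a renderer that treats every 16-bit pixel the same way cannot write this format directly, which is the difficulty described above.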
(( Also, the reason you never see this pop up in other programs is because MilkDrop is the only program that I know of that tries to blit from RGB to YUV. Drempels does it manually so it always works; see below. Also, BTW, the reason the screen gets chopped in half or doubled when you're in 32-bpp is because all yuv surfaces are 16-bit, so if the hardware blits 32->16 and doesn't do any conversion, you usually get weird results like that; they vary based on whether the driver pays attention to the surface pitch (bytes per line). You might have already known all this, but oh well, maybe someone else reading didn't. ))
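The 32-bpp chopping/doubling can be modelled with a bit of address arithmetic. This is only a guess at the two driver behaviours mentioned above (respecting the surface pitch or ignoring it), not something taken from any driver. `sourcePixelFor` is a made-up name, and the pitch is assumed to be exactly width times bytes per pixel, with no padding.

```cpp
#include <cassert>
#include <cstddef>

struct SrcPos { std::size_t row, col; };

// Which source pixel supplies the bytes that land at destination pixel
// (x, y) when raw bytes are copied with no format conversion.
// respectPitch = true : each destination row starts at the matching
//                       source row
// respectPitch = false: bytes are streamed linearly, ignoring row ends
SrcPos sourcePixelFor(std::size_t x, std::size_t y, std::size_t width,
                      std::size_t srcBytes, std::size_t dstBytes,
                      bool respectPitch) {
    assert(width > 0 && srcBytes > 0 && dstBytes > 0);
    const std::size_t srcPitch = width * srcBytes;
    const std::size_t dstPitch = width * dstBytes;
    const std::size_t offset = respectPitch ? y * srcPitch + x * dstBytes
                                            : y * dstPitch + x * dstBytes;
    return { offset / srcPitch, (offset % srcPitch) / srcBytes };
}
```

With a 640-pixel-wide 32-bit source (4 bytes) and a 16-bit destination (2 bytes), respecting the pitch means the last destination column (639) only reaches source column 319, so just the left half of each row shows up. Ignoring the pitch means destination row 1 starts at source row 0, column 320, so each source row spills across two destination rows.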
Anyway, I have (just last week) followed up on this with nVidia to make sure, once and for all, that it's a driver problem. And, much to my dismay, it's worse than a driver problem: it's a hardware limitation. The GF2 line of cards does not support YUV->RGB color space conversion. So, GF2 owners are stuck with a software blit, where you take the image across the bus to the cpu, convert it there, and send it back. And for some reason, sending it TO the cpu is dirt slow; sending it back to video memory cruises. If anyone has any suggestions (on how to speed up the video->system memory transfer) I'd love to hear them. The full scoop is below.
Here is a transcript of my e-mail ping-pong with the driver guys at nVidia:
I wrote something like:
<FONT COLOR=#008800>
Is there *any* way to get an nVidia graphics adapter to perform a blit from an RGB surface to a YUV surface, and to do the color space conversion?
(Most cards do it, but I can't get it to work on nVidia cards in DX7 - trying every fourcc code, every pixelformat flag, and every kind of blit imaginable.)
</FONT>
They responded:
<FONT COLOR=#0000ff>
YUY2->RGB565 or YUY2->ARGB8888 should work, but we do *not* support the reverse, i.e. any kind of RGB->YUV conversion. GF3 and later HW is capable of doing it, but there is no driver support in the DX blit code. What is the purpose that this will be used for?
</FONT>
I wrote:
<FONT COLOR=#008800>
The context for this function is to be able to render 3D stuff into your Windows desktop (background/wallpaper) at a good frame rate (30+ fps). You first render the scene to a RGB texture, then blit the RGB texture to a YUV overlay (or, if the card supports it, like some ATI cards do, an RGB overlay). (The background must be cleared to the same color as the overlay color key, of course.)
There are two basic ways to generate the graphics: cpu-side and video-card-side.
If you generate the image on the cpu (software rendering), then send it to the video card each frame, it's plenty fast and always works because you can do the rgb-to-yuv conversion quickly in software & use overlay stretching to show it fullscreen. I've already written a program called Drempels that does this:
http://www.geisswerks.com/drempels/
to make trippy animated wallpaper that oozes in your background while you work.
But if you generate the image on the video card, you rely on the video card being able to blit your RGB result onto the YUV display surface, which the GF2 can't do. The alternative is to copy the texture over the bus every frame, convert it to YUV on the cpu, and then send it back across the bus to the video card. The bummer is, the rgb transfer from the video card to the cpu is ultra-slow. (*any suggestions on how to speed up the video->cpu transfer? any driver back doors, AGP stuff, bus mastering... anything? all I do now is lock the surface and do an mmx-accelerated memcpy.)
MilkDrop is a Winamp audio-visualization plug-in that I've written, which falls into the latter category. Unlike Drempels, MilkDrop uses the 3D card (not the cpu) to generate the graphics; so, unfortunately for all GF2 owners, the image has to be copied over the bus, converted to yuv, and copied back, every frame. You end up getting a bad framerate (~15 fps) for even a low-quality image (half the res of your display, stretched). But on cards that do support the conversion in hardware, you can often run with no stretching (beautifully crisp image!) at 30 fps.
To check out MilkDrop:
http://www.nullsoft.com/free/milkdrop/
http://www.winamp.com/
( after installing it, hit ALT+K from winamp to configure it; then set the render mode to "desktop mode". Then run it & check out the slowness of the overworked bus. =) Then configure again & enable the "use hardware blit" option, run it, and check out the blazing speed, but the munged colors (if you're on a GF2). )
The only two remaining solutions for the GF2 that I can see are:
1) speed up the copy from video memory to the cpu somehow (for software blit case), or
2) use an RGB overlay surface (...I haven't been able to do this)
Also, for the GF3, it would be fantastic if the DX drivers could be updated someday to do this blit.
</FONT>
And they wrote:
<FONT COLOR=#0000ff>
Try blitting local to system or AGP memory (AGP transfers will have faster throughput, but be careful about usage of uncachable AGP surfaces). This blit is somewhat handicapped because of DX synchronization issues, nevertheless it should be much faster than reading with the CPU. Let us know if this is still not fast enough. A simple way of speeding this up is to operate on a 1/4 sized render surface and use the overlay to scale upwards.
We don't support RGB overlays either. Perhaps that will change one day.
</FONT>
So I changed some code, and replied:
<FONT COLOR=#008800>
Thanks for the suggestion - I did it, and the transfer from Local to System memory went a little faster (~25%) using BltFast (instead of an mmx-accel'd memcpy). However, even the blit is still excruciatingly slow: the BltFast() call alone takes 240 milliseconds to blit a 1024x1024 surface (so I get < 4 fps!!).
The transfer back (from System to Local memory) is blazing fast, though; whether I call BltFast(), memcpy_mmx(), or just write directly to the surface as I convert to YUV, it cruises. I can disable the slow part (the transfer TO system memory) and use any of these 3 methods to transfer it back, and I get 20-22 fps. This is all w/o stretching.
At a half- or quarter-sized surface, the speed gain of 25% remains using BltFast for local->system. However, the image looks rather shabby when stretched, so I'm really hoping to get the 1:1 blit faster. Any other ideas? What is an example of an uncacheable AGP surface?
(FYI - Details on the blit from local to system memory: )
source surface:
w=1024, h=1024, pitch=2048
16 bpp, RGB, masks: r=0000f800 g=000007e0 b=0000001f
DDSCAPS_3DDEVICE
DDSCAPS_LOCALVIDMEM
DDSCAPS_TEXTURE
DDSCAPS_VIDEOMEMORY
destination (system memory) surface:
w=1024 h=1024 pitch=2048
16 bpp, RGB, masks: r=0000f800 g=000007e0 b=0000001f
DDSCAPS_SYSTEMMEMORY
blit command used:
dest_surface->BltFast(0, 0, source_surface, NULL,
DDBLTFAST_WAIT);
// notes:
// takes 240 ms per call, regardless of if
// called before or after EndScene.
</FONT>
They responded:
<FONT COLOR=#0000ff>
When you normally request an AGP surface, I believe what
you get is an uncached surface, that means all CPU accesses
to it will have a 0% cache hit rate. That means any kind
of operation you do on that AGP surface with the CPU will
be slow (AGP surfaces are really meant for textures which
are set up once and never change). However, I heard there
is a way to get cached AGP surfaces, but I don't know how.
The key to getting this working well is to be able to
pipeline your work and do something else while the blit is
happening. However, a big problem is the fact that DX7 and
above do not synchronize system or AGP blits. To avoid bad
things from happening when a system surface is destroyed
while a blit is in progress, the driver must make all system
blits synchronous, i.e. the blit call does not return until
the blit has completed. We do have a way of doing
asynchronous blits, but I'm not sure that functionality will
be available on future drivers...
It is somewhat surprising to me that you only saw a 25%
speed improvement in the local->system blits. Normally I
hear numbers on the order of 10x (going from 6 Mb/s to 60
Mb/s). Make sure that DXVIEW reports that we have local->
system caps set (earlier drivers don't have it set), if it
doesn't, then you are using the MS MMX code to do the blit.
Our level of support for local->system is simple blits only,
that means no stretching, no rops, no colour keys; any
deviation from this and you are using MS MMX code instead of
our HW.
</FONT>
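As a sanity check on the numbers in this exchange, here is a small sketch of the arithmetic, using only figures quoted above (a 1024x1024 surface at 16 bpp and 240 ms per BltFast call). The helper names are invented for illustration, and "MiB" here means 1024*1024 bytes, which may not be the unit the nVidia engineer had in mind.

```cpp
#include <cassert>
#include <cstddef>

// Bytes moved by one blit of a width x height surface.
constexpr std::size_t surfaceBytes(std::size_t w, std::size_t h,
                                   std::size_t bitsPerPixel) {
    return w * h * bitsPerPixel / 8;
}

// Throughput in MiB/s for a transfer of `bytes` that takes `ms` milliseconds.
double mibPerSecond(std::size_t bytes, double ms) {
    assert(ms > 0.0);
    return (bytes / (1024.0 * 1024.0)) / (ms / 1000.0);
}

// Upper bound on the frame rate if every frame needs one such transfer.
double maxFps(double ms) {
    assert(ms > 0.0);
    return 1000.0 / ms;
}
```

`surfaceBytes(1024, 1024, 16)` is 2 MiB, so 240 ms per call works out to about 8.3 MiB/s and a ceiling of about 4.2 fps before any other per-frame work. That is much closer to the 6 figure quoted above than to the 60 one. A quarter-sized 512x512 surface is 0.5 MiB, so at the same throughput it would take roughly a quarter of the time.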
So, I'll see what I can do to get them to add driver support for the GF3 to support RGB->YUV video-video blits properly, but it might take some assistance... so feel free to send nVidia a little mail requesting this!
cheers,
ryan
Chris81
26th February 2002, 22:23
Ryan,
I now understand the hardware problems with the YUV to RGB conversion, but what about the triple buffering setup using the video memory to keep an RGB mirror of the screen in local video memory so that when doing the iterative processing on the frame, you scrap the copy sent back from the screen buffer and instead work with the RGB mirror which doesn't rely on the use of the non-existent YUV->RGB conversion?
Chris
geiss
27th February 2002, 02:58
Chris,
I don't see how this would help... perhaps I'm not getting it. The graphics are generated on the video card, in local video memory, so if I use triple-buffering (set up manually so that one of those surfaces is in system memory), how is the image going to get to the system memory surface? It has to come across the bus somehow, if it's generated on the video card, so I would think this method would be just as slow.
Normally, Flip() is instantaneous because it's just a pointer swap on the video card, between two similar local-video-memory surfaces. But when one surface is in system memory, it's going to have to send the whole thing across the bus, which is the whole problem here.
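A toy model of that difference, with no DirectDraw involved: `FlipChain` and `presentByCopy` are made-up names, and plain `std::vector` buffers stand in for surfaces. It is only meant to show why one case is constant-time and the other scales with the surface size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Surface = std::vector<std::uint16_t>;  // 16-bpp pixels

// Two surfaces in the same memory pool.
struct FlipChain {
    Surface* front;
    Surface* back;

    // Only the pointers are exchanged, so the cost does not depend on
    // how many pixels the surfaces hold.
    void flip() { std::swap(front, back); }
};

// Presenting into a surface in a different pool: every pixel has to be
// moved. Returns the number of bytes transferred.
std::size_t presentByCopy(const Surface& src, Surface& dst) {
    dst = src;
    return src.size() * sizeof(Surface::value_type);
}
```

For a 1024x1024 16-bpp surface, `presentByCopy` would report 2 MiB moved per frame, while `flip()` moves nothing at all.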
Any more info to clear up the question would be great.
cheers,
geiss
Chris81
27th February 2002, 10:50
Added Later:
Also, I'm getting very confused. Are we at cross-purposes? I got the impression it was the YUV->RGB conversion when shifting front buffer to back buffer that didn't work. But in one of the emails from NVidia they state that it is the RGB->YUV conversion that doesn't work. If they are right, then ignore the message below.
Sorry
Ryan
Maybe I've got the wrong end of the stick, but the system I was proposing would hold all three primary surfaces in local video memory, thereby avoiding the time-consuming transfer across the system bus. The second RGB surface is then kept as a mirror of the front buffer surface, so that when it comes to the iterative processing you scrap the data returned from the front buffer and instead overwrite it with the data from the RGB mirror of the front buffer surface (the second back buffer surface). Do you mean that this isn't possible (i.e. you can't have all three surfaces in video memory and one must reside in system memory)?
Chris
FubbHead
25th March 2002, 20:36
OK. I kind of "dunno know sh*t about sh*t and pull up my pants" :) regarding all this, and I'm a bit confused too, but I feel I must ask, and it might even be a very stupid question, so bear with me :).
Anyway..
It's the conversion, or actually the lack of it, that screws up the colors and proportions, right? But isn't it possible at all to make all the mumbo-jumbo in the right format in the actual 'routines' so to speak? Ehhrm.. Hope you all get what I mean :)..
(What I kind of mean is that if it is possible to have everything in the popular YUV flavor from the very beginning, so no conversion is needed?)
Wish
13th April 2002, 17:07
Has anyone tried the latest official Detonator 28.32 from Nvidia's website? I'm not sure if it would help in this issue, but might be worth a try. :D
coogles
16th April 2003, 06:54
1 year and 3 days later....
Any updates on this subject lately? I have a GeForce4 Ti 4800 and still get the same effects. They didn't do anything about the drivers or cards? That's a little surprising to me.
Rovastar
27th April 2003, 21:17
Ummh, don't know. Ryan has since released a new version of the desktop stuff, like in Monkey and Smoke and his VisSDK. Maybe he has cracked it, dunno. *shrug*
Carnyge
5th June 2003, 19:16
I have a fairly decent system:
AMD 2500+
1GB of PC2700 DDR (Dual Channel)
GeForce4 Ti 4600 (Leadtek Winfast A250 Ultra) 128MB DDR
Asus A7N8X Deluxe
And the non-blitting mode is still crawling along at 3fps in Milkdrop 1.03 on Winamp 2.91 (I think that's still the latest).
How can the drivers still be messed up, and still have a problem with garbled colours? Milkdrop has been my favourite vis for Winamp for a long time, but I can't help but notice that Monkey doesn't have this problem... and I get 30fps with it amped up to max everything (except res). I use the Omega Nvidia Drivers (I'm sure you have heard of him), www.omegacorner.com. The texture quality has an amazing increase with these drivers (they're the original 43.45 drivers but with added tweaks for texture quality, with a minimal FPS loss, 2-3% at MOST). While I like Monkey as a desktop, I would VERY much prefer Milkdrop because... well, I like it better than the hairy caves of Monkey :) Ryan, you have long been a lead vis programmer for Winamp, and MilkDrop is the proof of this (in fact I have preferred Milkdrop since before it was called "Nullsoft Milkdrop"; that was quite some time ago).
So I must ask, is there ANY WAY to incorporate this speed into Milkdrop as a desktop, without the colours being all addled and out of whack?
With all the progress in technology, is there no way to have it nice and fast in Desktop mode? Has any way to use an Nvidia card in HW Blit mode been revealed, or better yet produced?
Or can you trick it into displaying the correct colours? Sorta second-guessing the results, almost offsetting each colour in a way that it can be displayed correctly through blit mode and be all garbled in non-blit mode? Sorta a toggle switch? I dunno, I'm not a programmer. It would be incredibly nice of you to look into this and see if it can be fixed, or if Nvidia cards are forever doomed to spend the rest of their lives in winampine exile. (I've also been a fan of Nvidia since my old GeForce2 Ti {Hercules Prophet II Ti}.) Please look into it.
Your Loyal Fan
-cArnYgE