VQuake
From Vogons Wiki
Revision as of 09:29, 24 February 2013
VQuake is a version of Quake written specifically for the Rendition Verite V1000 accelerator. It works quite differently than GLQuake because the V1000 has unique strengths and weaknesses. It is based upon the Quake software renderer.
Architecture
A Usenet posting by former Rendition programmer Stefan Podell describes in detail how VQuake works and why it is not optimal for the V2x00.
Posted by Stefan Podell on October 30, 1997 at 04:34:56:

Okay, with my reputation tarnished by VQuake and VHexen2 not going any faster on the V2x00, I'm here to explain why this is the case.

First, the VQuake/VHexen2 engine can be roughly broken down into four parts (actually, most games are like this):
* Software setup of geometry commands
* Software creation of texture maps
* Transfer of textures and commands to the renderer
* Rendering

The first two parts are totally CPU dependent, the third is bus dependent, and the fourth, in my case, is Verite dependent.

Now, some history on the design of VQuake. When we first started working with Id on accelerated Quake, Id's design was much like their design for Quake 2: there would be a "driver" part of the code, somewhat separated from the "main" game. (In Quake's case, though, since it is a DOS game, this would be accomplished with different executables rather than the more elegant DLL model employed by Quake 2.) They had two 3D engine paths through the game. One was a traditional triangle/polygon-based engine, which is what they predicted all 3D accelerators would use, and the other was a fairly elaborate "span sorting" scheme, which the software renderer used.

So we set out accelerating the game using the polygon interface. It looked great with filtering, but it wasn't terribly fast. Even after sorting the world polygons and turning off the Z buffer for them, the performance was not great. (You must remember that the V1000 wasn't really designed to do Z-buffering as its primary rendering style.) So Walt Donovan, then with Rendition, and Michael Abrash, then with Id, talked about using the software engine's span interface. For those who are interested, Michael has published a few articles on the guts of the engine. I think Dr. Dobb's Journal had the best, most detailed one.
To the best of my understanding, the engine sorts all world surfaces (floors, walls, ceilings, and sky) on each scanline of the screen and keeps track of the edges of each span. When it goes to draw a particular surface (polygon), it is then guaranteed that every pixel it draws is the only world pixel that will be drawn at those coordinates. (This is hard to explain...) Suffice it to say that when the world surfaces are drawn, exactly the minimum possible number of pixels is drawn (i.e., a depth complexity of 1). Meanwhile, the Z buffer has been filled with the correct Z values, but Z comparison is not necessary, since we know the depth complexity is one. Next, when we start drawing objects (monsters, weapons, etc.), we turn on Z comparison so that the objects are properly hidden by the world.

The reasoning behind this was that the Pentium was pretty slow at drawing pixels but fast at floating-point operations. By doing more of what the Pentium was good at, Quake was able to do less of what the Pentium was bad at. Overall, it was a performance win, with the side benefit of allowing much more interesting scenes. ((((whew))))

Walt and Michael decided that, since the Verite 1000 wasn't terribly good at Z-buffered pixels, letting the Pentium take care of this span sorting would reduce the number of pixels the Verite had to draw. Furthermore, we'd be able to turn off the Z compare function on the Verite. So now, rather than a depth complexity of around 1.5 (about 450,000 pixels at 640x480) drawn at a peak rate of about 10MP/s on the V1000, we had a depth complexity of 1 (about 300,000 pixels) drawn at a peak rate of about 17MP/s. As with the software renderer, we could then turn on the Z compare and draw all the more interesting objects. Yes, these pixels would be as slow as always, but there are generally far fewer of them. So we set out writing new microcode to support Quake's span data format.
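The span-sorting idea can be illustrated with a heavily simplified, single-scanline sketch. It assumes each surface has a constant 1/z across the scanline; the real engine handles 1/z that varies across a surface and maintains its edge lists from one scanline to the next. `Surface` and `sort_spans` are hypothetical names, not identifiers from the Quake or VQuake source. The sweep visits surface edges from left to right and hands each stretch of pixels to the nearest active surface, so every pixel ends up in exactly one span (depth complexity 1).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    sid: int        # hypothetical surface id
    x_start: int    # first pixel covered on this scanline (inclusive)
    x_end: int      # one past the last pixel covered (exclusive)
    inv_z: float    # 1/z, assumed constant on this scanline; larger means nearer


def sort_spans(surfaces, width):
    """Return (sid, x0, x1) spans so that every pixel is owned by at most one surface."""
    starts, ends = {}, {}
    for s in surfaces:
        x0, x1 = max(0, s.x_start), min(width, s.x_end)   # clip to the screen
        if x0 < x1:
            starts.setdefault(x0, []).append(s)
            ends.setdefault(x1, []).append(s)

    edges = sorted(set(starts) | set(ends))   # edge x positions, left to right
    active = set()                            # surfaces crossing the current x
    spans = []
    for i, x in enumerate(edges):
        for s in ends.get(x, []):
            active.discard(s)
        for s in starts.get(x, []):
            active.add(s)
        if not active or i + 1 == len(edges):
            continue
        # The nearest active surface owns every pixel up to the next edge (ties broken by sid).
        front = max(active, key=lambda a: (a.inv_z, a.sid))
        next_x = edges[i + 1]
        if spans and spans[-1][0] == front.sid and spans[-1][2] == x:
            spans[-1] = (front.sid, spans[-1][1], next_x)   # extend the previous span
        else:
            spans.append((front.sid, x, next_x))
    return spans


if __name__ == "__main__":
    scanline = [
        Surface(1, 0, 10, 0.1),   # far wall across the whole line
        Surface(2, 3, 6, 0.5),    # near pillar
        Surface(3, 5, 8, 0.2),    # something between the two
    ]
    spans = sort_spans(scanline, width=10)
    print(spans)  # [(1, 0, 3), (2, 3, 6), (3, 6, 8), (1, 8, 10)]
    print("pixels drawn:", sum(x1 - x0 for _, x0, x1 in spans))
    print("pixels with overdraw:", sum(s.x_end - s.x_start for s in scanline))
```

In the example, drawing every surface in full would touch 16 pixels on a 10-pixel line (a depth complexity of 1.6), while the sorted spans touch each of the 10 pixels once, which is the saving the post describes.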
When we finally got this working, sure enough, the performance was way better than the original polygon-style engine. Going back to my four-part description of Quake's engine, we traded more CPU work in stage one for less Verite work in stage four, which ended up getting us a big win. Note that when you increase the vertical resolution in the game, the engine must sort more scanlines, and if you increase the resolution in either direction, the renderer must draw more pixels. Also note that when Quake was written, P133s were pretty top-notch, and the software frame rates were low enough that the span sorting was adequate to keep up. With the Verite rendering much faster than the Pentium could, suddenly the span sorting was the bottleneck. It wasn't until we got to P200s that the Verite was busy most of the time. So whether you're on a V1000 or a V2x00, the CPU has a lot of work to do.

Okay, that's part one. Part two is texture maps. (This will be much shorter.) The way the software renderer in Quake works is to take a small texture tile (like a couple of bricks) and duplicate it into a larger texture map for the world surface while applying the light map (dynamic or static). It then caches that texture map and draws with it. When it needs to draw that surface again, it checks to see if the lights have changed (like when you fire a weapon). If they have, it must regenerate the texture map and recache it. The VQuake engine does the same thing, with the extra step of having to download the texture map to video memory.

Quake also mipmaps these surfaces. The mipmap level is chosen based on the size of the polygon (in pixels) relative to the size of the texture map. In VQuake, the texture cache is kept in video memory along with the display buffers and Z buffer. The quick equation for how much memory the display and Z buffers take is Width * Height * 6 (3 buffers, each 16 bits deep). The rest is for texture maps (minus about 128K for microcode).
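The surface-caching scheme and the memory equation above can be combined in a small sketch. Assumptions, all hypothetical and not taken from VQuake: texels and light values are plain 0-255 intensities (the real engine works with palettized textures and a coarser light map), each cached texel takes 2 bytes, a board has 4 MB of video memory, and a `light_version` number changes whenever the lights on a surface change. `texture_budget`, `build_surface`, and `SurfaceCache` are illustrative names. The cache is least-recently-used and limited to the bytes left over after the three 16-bit buffers and the microcode.

```python
from collections import OrderedDict

KB = 1024
MB = 1024 * KB
MICROCODE_BYTES = 128 * KB   # "about 128K for microcode", per the post
BYTES_PER_TEXEL = 2          # assumption: 16-bit cached texels


def texture_budget(width, height, board_bytes):
    """Bytes left for textures after three 16-bit buffers (Width * Height * 6) and microcode."""
    return board_bytes - width * height * 6 - MICROCODE_BYTES


def build_surface(tile, width, height, light):
    """Repeat `tile` across a width x height surface, scaling each texel by light[y][x] / 255."""
    th, tw = len(tile), len(tile[0])
    return [[tile[y % th][x % tw] * light[y][x] // 255 for x in range(width)]
            for y in range(height)]


class SurfaceCache:
    """LRU cache of lit surfaces, limited to `capacity` bytes of (simulated) video memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()   # sid -> (light_version, texels, size_bytes)
        self.rebuilds = 0

    def get(self, sid, tile, width, height, light, light_version):
        entry = self.entries.get(sid)
        if entry is not None and entry[0] == light_version:
            self.entries.move_to_end(sid)          # cache hit: mark most recently used
            return entry[1]
        if entry is not None:
            self._drop(sid)                        # lights changed: surface is stale
        texels = build_surface(tile, width, height, light)
        size = width * height * BYTES_PER_TEXEL
        while self.entries and self.used + size > self.capacity:
            self._drop(next(iter(self.entries)))   # evict the least recently used surface
        # In VQuake this is also where the new surface would be downloaded to video memory.
        self.entries[sid] = (light_version, texels, size)
        self.used += size
        self.rebuilds += 1
        return texels

    def _drop(self, sid):
        self.used -= self.entries.pop(sid)[2]


if __name__ == "__main__":
    for w, h in [(320, 200), (512, 384), (640, 480)]:
        print(f"{w}x{h}: {texture_budget(w, h, 4 * MB)} bytes left for textures")
```

On the assumed 4 MB board, roughly 3.5 MB remains for textures at 320x200 but only about 2.1 MB at 640x480, so at higher resolutions the same set of surfaces is evicted and rebuilt more often.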
So when you increase the resolution, two things happen that increase the demands on the CPU for texture map generation. First, you have less texture memory, so textures will fall out of the cache more often, requiring regeneration. Second, higher-resolution mipmaps will be chosen, further straining the texture cache. The assembly code for generating the textures is darn near as good as it can be. I certainly can't think of any instructions to remove, and Michael Abrash, who wrote it, is a genius at this stuff. We considered doing two-pass lighting on the Verite, but after some experiments decided the CPU could do it faster. So again, no matter the Verite chip, the CPU will be very busy. (The texture mapping, by the way, is the primary reason that timedemo works so much better as a real benchmark than timerefresh. In the demo sequences, there's lots of combat going on, which pushes the system much harder.)

Alright. That's part two. Part three is the bus. As you know, the currently available Verites use the PCI bus and are able to use DMA asynchronously, which does not use the CPU. The bus activity will steal some cycles from the CPU, but not an appreciable amount. Simple enough.

Finally, the renderer. The V2x00 chips are *much* faster at drawing than the V1000. The fastest the V1000 could go was 25MP/s. The V2100 goes 40MP/s and the V2200 goes 50MP/s. Adding features (Z, alpha, fog, etc.) would slow the V1000 down a lot, while having minimal impact on the V2x00 chips. So I understand why people were expecting VQuake and VHexen2 to go much faster on the V2x00. But the fact of the matter is that a faster renderer doesn't necessarily buy you much given the architecture of this engine. And because of that, we're working on a V2x00-specific version of the engine, to take advantage of the extra pixel power while lightening the load on the CPU.

I must admit that I was beginning to wonder if my beliefs about the engine's behavior were really true.
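The renderer-side arithmetic in the post can be checked with a small helper. `fill_ms` and `CASES` are hypothetical names; the rates are the peak figures quoted in the post, treated as if they were sustained. Real frames (with setup overhead, Z-buffered objects, and so on) are unlikely to reach them, so the results are best read as lower bounds on drawing time.

```python
def fill_ms(width, height, depth_complexity, mpixels_per_s):
    """Milliseconds to draw width * height * depth_complexity pixels at a peak fill rate."""
    pixels = width * height * depth_complexity
    return pixels / (mpixels_per_s * 1_000_000) * 1000


# (label, depth complexity, peak rate in MP/s); rates are the ones quoted in the post.
CASES = [
    ("V1000, Z-buffered polygons", 1.5, 10),
    ("V1000, sorted spans", 1.0, 17),
    ("V1000, fastest quoted", 1.0, 25),
    ("V2100", 1.0, 40),
    ("V2200", 1.0, 50),
]

if __name__ == "__main__":
    for label, dc, rate in CASES:
        ms = fill_ms(640, 480, dc, rate)
        print(f"{label:28s} {ms:6.2f} ms/frame  (fill-limited ceiling ~{1000 / ms:.0f} fps)")
```

Under these assumptions, a 640x480 frame takes about 46 ms on the V1000's Z-buffered polygon path, about 18 ms on its span path, and about 6 ms at the V2200's quoted peak. Once drawing is that cheap, the CPU-side stages (span sorting and texture generation) account for most of the frame, which is the point the post makes.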
So I just did a weird hack on VHexen2 to test something: at the point where drawing commands and texture maps are sent to the Verite, I put in a check to see if the game was in "timedemo mode". If it was, I just threw away the commands and continued. Then, when timedemo was over, drawing would kick back in and I would see the results. The purpose of this was to simulate an *infinitely fast* renderer and bus (something we'd all like to have :-) (This is also known as a "speed of light" test.)

==============================================

Here are the results of running timedemo demo1 on my current VHexen2 build (beta 3 candidate). I ran all the tests at 320x200, 512x384, and 640x480, with antialiasing set to 0 and to 7 at each resolution. The first three tests are with rendering turned on; in other words, these are the numbers everyone has been responding to so far. The last set of numbers is my "infinitely fast" renderer. (Figures are fps, given as antialias = 0 / antialias = 7.)

The first test was on my P166MMX (64MB RAM) with my V1000 reference board, which runs at the same speed as an Intergraph 3D 100. (These seem slower to me than what I remember, but I can't find where I wrote down my old numbers, so I just re-ran these.)

=====================
320x200 (51.0 / 43.1)
512x384 (26.7 / 24.2)
640x480 (16.9 / 15.5)
=====================

Next, the same PC with a V2200. The first thing you'll notice is that it's a little slower at low resolutions. I think this is because the span microcode has a little extra overhead on the 2200. It also doesn't currently interleave buffers. I'm looking into fixing those things.
=====================
320x200 (48.1 / 41.9)
512x384 (32.9 / 28.5)
640x480 (22.0 / 20.1)
=====================

Next, a V2200 in a P2-300 (this computer has a pretty sucky hard drive in it and 32MB RAM, so there was more swapping going on than on my 166MMX):

=====================
320x200 (71.0 / 63.4)
512x384 (43.3 / 36.1)
640x480 (27.9 / 23.5)
=====================

And finally, my P166MMX with my "infinitely fast" renderer/bus test:

=====================
320x200 (56.0 / 48.2)
512x384 (44.1 / 37.8)
640x480 (32.4 / 29.7)
=====================

So you can see that at low resolutions, a faster CPU gets you much more by way of performance than a faster renderer. And as resolution increases, infinitely fast rendering is a good thing :-)

Again, all this adds up to us working on a V2x00 version of this engine. We'll tell you more as we know more about its progress. This is not to say that there's no room for improvement in my current version of the game. So if I figure out any cool way to make this go faster on a V2x00, I'll definitely put it in. There is one thing Quake 2 does that I want to try in VHexen2. I'll let you know.

Regards,
Stefan (maybe I should use a .plan file) Podell
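The timedemo figures above can be turned into per-frame costs. This sketch assumes the CPU-side work is the same in every P166MMX run, so the difference between a real run's frame time and the "infinitely fast" run's frame time is the time the rendering and bus path adds to each frame. `FPS`, `ms_per_frame`, and `render_overhead_ms` are hypothetical names; the numbers are copied from the post.

```python
RESOLUTIONS = ["320x200", "512x384", "640x480"]

# timedemo demo1 results from the post: (fps at antialias = 0, fps at antialias = 7).
FPS = {
    "P166MMX + V1000": [(51.0, 43.1), (26.7, 24.2), (16.9, 15.5)],
    "P166MMX + V2200": [(48.1, 41.9), (32.9, 28.5), (22.0, 20.1)],
    "P2-300 + V2200": [(71.0, 63.4), (43.3, 36.1), (27.9, 23.5)],
    "P166MMX + infinitely fast": [(56.0, 48.2), (44.1, 37.8), (32.4, 29.7)],
}
BASELINE = "P166MMX + infinitely fast"


def ms_per_frame(fps):
    return 1000.0 / fps


def render_overhead_ms(config, aa=0):
    """Extra ms per frame versus the speed-of-light run (aa: 0 -> antialias 0, 1 -> antialias 7)."""
    if not config.startswith("P166MMX"):
        # The speed-of-light run used the P166MMX, so other CPUs are not comparable.
        raise ValueError("compare P166MMX configurations only")
    return [ms_per_frame(real[aa]) - ms_per_frame(base[aa])
            for real, base in zip(FPS[config], FPS[BASELINE])]


if __name__ == "__main__":
    for config in ("P166MMX + V1000", "P166MMX + V2200"):
        extra = render_overhead_ms(config)
        cells = ", ".join(f"{res}: +{ms:.1f} ms" for res, ms in zip(RESOLUTIONS, extra))
        print(f"{config}: {cells}")
```

Read this way, the V2200 roughly halves the time the rendering path adds at 640x480 (about 28 ms on the V1000 versus about 15 ms), while at 320x200 it adds slightly more than the V1000, in line with Podell's remark about extra span-microcode overhead on the 2200.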