I really thought Nvidia’s DLSS 5 was going to be smarter than this

We’ve been trying to get answers out of Nvidia about DLSS 5 for a couple of days now, and we’re still waiting to hear if we’ll get them. But techtuber Daniel Owen has got some. In a new video, he shares answers from Jacob Freeman, GeForce evangelist, who has provided some rather enlightening detail about what DLSS 5 is actually doing.

And it seems right now to be far less smart than I expected.

Given the demos over at the GPU Technology Conference (GTC) this week were being run on a pair of RTX 5090 graphics cards (one rendering the games normally, the other $4,000 GPU running the DLSS 5 compute path), it seemed like there might be something more to it than the simple AI filter it appeared to be on first flush, when Jen-Hsun dropped it at the start of his GTC keynote.

But no, answers direct from Nvidia itself suggest that all the current early preview iteration of DLSS 5 is using as an input is a static, 2D image. As Freeman says: “DLSS 5 takes a 2D frame plus motion vectors as input.”

So, unless Freeman is grossly oversimplifying things here, it really is essentially taking a screenshot of a game and applying an AI filter to it. Sure, it’s impressive that Nvidia has delivered the compute pathways to allow this to be done at such a lick that it can effectively be used in real-time during a scene, and that it seems able to maintain consistency between those frames, too, but the actual technical elements of the DLSS 5 ‘enhancements’ don’t really sound that in-depth.

The DLSS 5 model is only ever aware of a single 2D image and the motion vectors attached to it (where objects in the scene have come from and where they’re going). It has no understanding, beyond the flat surface of that frame, of the 3D geometry or depth of a scene, or of the specifics of any lighting found outside the image in front of it.
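For a sense of what "a 2D frame plus motion vectors" actually buys you, motion vectors are typically used to reproject the previous frame into the current one, which is how a screen-space model can keep its output stable between frames. Here's a toy NumPy sketch of that reprojection step; the function name, array layout, and single-channel frame are my own illustrative assumptions, not Nvidia's actual pipeline:

```python
import numpy as np

def reproject(prev_frame, motion_vectors):
    """Warp the previous frame along per-pixel motion vectors.

    prev_frame:     (H, W) array of pixel values
    motion_vectors: (H, W, 2) array of (dy, dx) offsets saying where
                    each pixel in the current frame came from
    """
    h, w = prev_frame.shape
    ys, xs = np.indices((h, w))
    # Look backwards along the vector to find each pixel's source,
    # clamping at the screen edge (this is the screen-space limitation:
    # anything that was never on screen simply has no data to fetch).
    src_y = np.clip(ys - motion_vectors[..., 0], 0, h - 1).astype(int)
    src_x = np.clip(xs - motion_vectors[..., 1], 0, w - 1).astype(int)
    return prev_frame[src_y, src_x]
```

The clamping in the sketch is exactly the weakness described above: the model can only ever fetch what the flat frame already contains.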

Freeman notes the DLSS 5 model has been trained like this, and is designed to be able to infer information about “complex scene semantics such as characters, hair, fabric and translucent skin, along with environmental lighting conditions like front-lit, back-lit or overcast—all by analysing a single frame.”

So, it all just comes down to what it can infer from a 2D image; it cannot be given any “ground truth” about what is actually feeding into that scene. It’s apparently completely limited to screen space, and the model has zero awareness of anything that sits outside the single image it’s working on. A best guess is okay in some cases, sure, but we’re talking about taking a probabilistic approach to things like environmental lighting when, if you’re rocking path tracing, you have very definite areas and sources of lighting.

(Image credit: Nvidia)

And definite lighting is an area where developers can have very definite ideas about how they want their game to look in the final reckoning. DLSS 5 isn’t going to help there if it’s just taking a punt at what it thinks the scene should look like.

Owen also asks specifically about concerns that the underlying geometry and textures appear to be materially changed by DLSS 5, as well as about Nvidia’s assertion that the feature can “enhance PBR [physically based rendering] properties on materials (roughness, more realism), with more realistic interaction of light.”

He notes the changed hairline of a model in Starfield and the entirely problematic issue of what will forever be known as ‘yassified Grace’. While Freeman reiterates, as Nvidia has explicitly stated before, that the underlying geometry isn’t changing, that doesn’t mean you’re still going to see it. The DLSS 5 model may simply paint whatever it prefers over the top of the underlying geometry, which makes the point almost moot.

(Image credit: Nvidia / Bethesda)

On the PBR side of things, again, it all feels far simpler than I expected. DLSS 5 apparently doesn’t hook into the game engine at any level, so there is nothing to tell the model what to expect from a surface: what material it is, whether it’s wet, how rough it is, and so on. The only way it can “enhance PBR properties” is by ‘looking’ at them and taking an educated guess at what they are. It doesn’t have any access to what the developers have put into their world, just inference.

“Materials are inferred from the rendered frame,” says Freeman, noting again that there are no other inputs.

The other concerning detail of Nvidia’s responses surrounds just what dials and levers developers have to retain artistic control over a scene. And it seems dials and levers are pretty much all they are. I naïvely assumed, from my experience with gen AI, that there would be some kind of prompt mechanism through which developers could tune the DLSS 5 model: adjust the level of ‘heat’, rein in its wilder creative impulses, or ask for certain things to be added or adjusted in a scene.

But no. It seems you get a slider to choose the intensity of the effect, using alpha blending to weight a scene towards either the original render or the AI output, some colour grading controls, and the ability to mask off objects or parts of a scene to keep them out of DLSS 5’s reach.
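As controls go, that intensity slider and the object masking both boil down to the same simple per-pixel weighting. A toy NumPy sketch of how an alpha blend plus a protective mask could compose the two frames (the function name and signature are illustrative assumptions, not Nvidia's actual API):

```python
import numpy as np

def compose(original, ai_output, alpha, mask=None):
    """Blend the engine's render with the AI-enhanced frame.

    alpha: 0.0 = pure original render, 1.0 = pure AI output
    mask:  optional (H, W) array in [0, 1]; 0 shields a pixel from
           the AI pass entirely (e.g. a character's face)
    """
    weight = np.full(original.shape, float(alpha))
    if mask is not None:
        weight = weight * mask  # masked-off pixels fall back to the original
    return (1.0 - weight) * original + weight * ai_output
```

Notably, nothing in a blend like this can ask the model for a different result; it can only decide how much of the existing one you see.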

If, as happens to Grace, a character is given what looks like a full face of make-up in a scene where that doesn’t really make sense, developers seemingly have only the option to dial the output down until you can’t really see it, or to turn it off entirely. They apparently can’t just ask for another pass without the lip gloss.

Then you have to circle back to the fears folk have expressed about the potential homogeneity arising from a single model deciding what our game characters look like. Sure, you’re not changing the underlying geometry, but if the same DLSS 5 model is painting over your characters, they surely run the risk of starting to look a lot like each other. If there was another sad lady with cheekbones you could cut yourself on, rocking a blonde bob, wouldn’t she look an awful lot like yassified Grace?

The masking is an odd one, however. As Nick on our team points out, to selectively mask objects there must be some understanding of depth for DLSS 5 to consistently avoid yassifying someone or something.

The more we hear about Nvidia’s DLSS 5 feature, the worse it seems to get. Which is honestly counter to what I was expecting. I was hoping we’d have some insight from developers who have used it, to highlight just what controls they have over the model and how they go about retaining the artistic expression that has been at the heart of many people’s consternation about the technology.

But I am not here right now to question the ethics of its implementation—that’s a whole other topic for ire—nor to deny the fact that I do, in some circumstances, think it looks pretty good. I like what I saw of Assassin’s Creed Shadows’ environments, and I’d absolutely play FC 26 with it enabled. I just thought it was doing something a little smarter behind the scenes with all that compute it’s using.

And maybe it still is. Maybe Jacob Freeman isn’t explaining it correctly, or hasn’t the clearance to actually go into detail about what DLSS 5 is technically doing beyond that raw 2D frame/motion vector input.

During the GTC keynote reveal Jen-Hsun said: “We fused controllable 3D graphics, the ground truth of virtual worlds, the structured data of virtual worlds, of generated worlds. We combined 3D graphics with generative AI, probabilistic computing.

“One of them is completely predictive, the other one, probabilistic yet highly realistic. The content is beautiful as well as controllable. This concept of fusing structured information and generative AI will repeat itself in one industry after another. Structured data is the foundation of trustworthy AI.”

But right now it feels very much like there is a huge disconnect between the ground truth of a given game world—Jen-Hsun’s vaunted structured data—and the DLSS 5 frosting being layered on top—that unstructured AI-generated data. The promised fusion feels rather more layered than I expected given the introduction. I thought the two things were being brought together in some holy union in an effort for each to benefit the other, but it’s seeming like far less of a mixing than I’d hoped.

However it shakes out, one thing is clear: the reveal of DLSS 5 has been a true omnishambles of an announcement. From the almost context-free drop at GTC, to yassified Grace becoming the ersatz poster child of the technology, to Jen-Hsun telling everyone they’re just plain wrong, to the huge misstep of actually referring to this as DLSS 5 at all. It’s all been a painful exercise in mismanagement and mismessaging.
