Facebook researchers seem to have figured out a way to use machine learning to essentially give an Oculus Quest 67% more GPU power to work with. However, Facebook informs us this is “purely research”.
The Oculus Quest is a standalone headset, meaning the computing hardware is inside the device itself. Because of the size and power constraints this introduces, as well as the desire to sell the device at a relatively affordable price, Quest uses a smartphone chip significantly less powerful than the hardware in a gaming PC.
“Creating next-gen VR and AR experiences will require finding new, more efficient ways to render high-quality, low-latency graphics.”
Facebook AI Research
The new technique works by rendering at a lower resolution than usual, then upscaling the center of the view with a machine learning “super resolution” algorithm. These algorithms have become popular in the last few years, with some websites even letting users upload any image from their PC or phone to be AI upscaled.
Given enough training data, super resolution algorithms can produce a significantly more detailed output than traditional upscaling. While just a few years ago “Zoom and Enhance” was a meme used to mock those who falsely believed computers could do this, machine learning has made this idea a reality. Of course, the algorithm is technically only “hallucinating” what it expects the missing detail might look like, but in many cases there is no practical difference.
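To make the rendering side of this concrete, here is a minimal sketch, not Facebook’s implementation: render a frame at reduced resolution, upscale the periphery cheaply, and run a small stand-in super-resolution network (an ESPCN-style toy model with made-up layer sizes, crop fraction, and buffer dimensions) only on the central region before pasting it back in:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySuperRes(nn.Module):
    """Toy stand-in for a learned 2x super-resolution network (ESPCN-style)."""
    def __init__(self, scale=2):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 3 * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # rearranges channels into a 2x larger image

    def forward(self, x):
        return self.shuffle(self.conv2(F.relu(self.conv1(x))))

def composite_frame(low_res_frame, model, center_frac=0.5):
    """Cheaply upscale the whole low-res frame, then replace the central
    region with the ML-reconstructed (sharper) version."""
    _, _, h, w = low_res_frame.shape
    # Cheap bilinear upscale for the periphery.
    full = F.interpolate(low_res_frame, scale_factor=2,
                         mode="bilinear", align_corners=False)
    # Crop the center of the low-res frame and reconstruct it with the network.
    ch, cw = int(h * center_frac), int(w * center_frac)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    center_hr = model(low_res_frame[:, :, y0:y0 + ch, x0:x0 + cw])
    # Paste the reconstructed center back into the upscaled frame.
    full[:, :, 2 * y0:2 * (y0 + ch), 2 * x0:2 * (x0 + cw)] = center_hr
    return full

model = TinySuperRes().eval()
frame = torch.rand(1, 3, 720, 800)          # hypothetical half-resolution eye buffer
with torch.no_grad():
    output = composite_frame(frame, model)  # 1440 x 1600 composited frame
```

On a headset the reconstruction would run on a mobile DSP or NPU rather than in PyTorch, but the overall structure, a low-resolution render followed by learned reconstruction of a sub-region and a composite, mirrors what the article describes above.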
One of the paper’s authors is Behnam Bastani, Facebook’s Head of Graphics in the Core AR/VR Technologies department. Between 2013 and 2017, Bastani worked for Google, developing “advanced display systems” and then leading development of Daydream’s rendering pipeline.
It’s interesting to note that the paper is not actually primarily about the super resolution algorithm, or about freeing up GPU resources with it. The researchers’ direct goal was to build a “framework” for running machine learning algorithms in real time within the existing rendering pipeline, with low latency, which they achieved. Super resolution upscaling is essentially just the first example of what this enables.
Because this is the focus of the paper, there isn’t much detail on the exact size of the upscaled region or how perceptible the result is, other than a mention of “temporally coherent and visually pleasing results in VR”.
The researchers claim that when rendering at 70% of the usual resolution in each direction, the technique can save roughly 40% of GPU time, and developers can “use those resources to generate better content”.
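The arithmetic connecting that 40% figure to the “67% more GPU power” framing at the top of this article is simple. A back-of-envelope calculation using the quoted numbers and Quest’s 72Hz display (the frame-budget figure is context we’ve added, not from the paper):

```python
# Back-of-envelope using the figures quoted above.
frame_budget_ms = 1000 / 72   # Quest targets 72Hz, so roughly 13.9 ms per frame
gpu_time_saved = 0.40         # "roughly 40% of GPU time" saved

freed_ms = frame_budget_ms * gpu_time_saved
extra_headroom = 1 / (1 - gpu_time_saved) - 1

print(f"~{freed_ms:.1f} ms of GPU time freed per frame")        # ~5.6 ms
print(f"~{extra_headroom:.0%} more effective GPU budget")       # ~67%
```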
For applications like a media viewer, the saved GPU power could instead be left unused to extend battery life, since on Snapdragon chips (and most others) the DSP used for machine learning tasks like this is significantly more power efficient than the GPU.
A limitation of this technique is that it could add latency, since the upscaling happens after a frame has finished rendering. But mobile GPUs, unlike PC GPUs, render tile by tile, and the NPU tasks can run asynchronously, so the upscaling could be done per tile, adding only a few milliseconds of latency for the final tile.
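A rough sketch of that scheduling idea, with invented tile counts and timings, and Python threads standing in for the GPU and NPU queues (this is purely an illustration of why only the last tile’s reconstruction adds latency, not how the real driver-level integration works):

```python
import time
from concurrent.futures import ThreadPoolExecutor

NUM_TILES = 16
GPU_MS_PER_TILE = 0.5    # invented render cost per tile
NPU_MS_PER_TILE = 0.3    # invented super-resolution cost per tile

def render_tile(tile_id):
    time.sleep(GPU_MS_PER_TILE / 1000)   # stand-in for GPU tile rendering
    return tile_id

def upscale_tile(tile_id):
    time.sleep(NPU_MS_PER_TILE / 1000)   # stand-in for NPU reconstruction
    return tile_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=1) as npu:   # a single NPU queue
    # The GPU renders tiles one after another; each finished tile is handed
    # to the NPU for upscaling while the GPU moves on to the next tile.
    pending = [npu.submit(upscale_tile, render_tile(t)) for t in range(NUM_TILES)]
    for f in pending:
        f.result()                               # frame is done once the last tile is reconstructed
elapsed_ms = (time.perf_counter() - start) * 1000

# Rendering alone takes ~8 ms here; overlapping the upscaling adds only the
# final tile's ~0.3 ms, instead of another ~4.8 ms if run serially afterwards.
print(f"frame ready after ~{elapsed_ms:.1f} ms")
```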
A demo video was produced using Beat Saber, in which the left image “was generated using a fast super-resolution network applied to 2x low resolution content” and the right image is regular full resolution rendering:
The researchers also explained how their findings could be applied to future headsets with eye tracking:
“Another benefit for machine learning based reconstruction is the latency mitigation and late latching in eye-tracked foveated rendering system. With the proposed architecture, the rendering system does not need to know where the eye is looking at and render at a uniform low resolution. After rendering, the fovea region, determined by eye tracking system, gets reconstructed with machine learning models and blend with periphery regions in the compositor. In this way, the eye motion to photon latency can be reduced by roughly one frame duration, minus a minor portion of ML reconstruction time. This latency saving can be crucial to some eye tracking systems avoid latency artifacts with saccade.”
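In plainer terms, the gaze sample can be “late latched”: the frame is rendered at a uniform low resolution without knowing where the user is looking, and only at composition time is the freshest eye-tracking sample used to pick which region gets reconstructed. A hypothetical sketch of that ordering (the fovea size, nearest-neighbour upscaling, and every function name here are illustrative assumptions, not details from the paper):

```python
import numpy as np

SCALE = 2
FOVEA = 128  # side length of the reconstructed foveal square, in low-res pixels

def cheap_upscale(img):
    """Nearest-neighbour upscale standing in for the compositor's periphery path."""
    return img.repeat(SCALE, axis=0).repeat(SCALE, axis=1)

def reconstruct(img):
    """Stand-in for the super-resolution model applied to the foveal crop."""
    return cheap_upscale(img)  # a real system would run the ML network here

def compose_frame(low_res, gaze_xy):
    # 1. The whole view was already rendered at uniform low resolution; no gaze needed.
    # 2. The gaze sample is read only now, just before compositing ("late latch").
    gx, gy = gaze_xy
    y0 = int(np.clip(gy - FOVEA // 2, 0, low_res.shape[0] - FOVEA))
    x0 = int(np.clip(gx - FOVEA // 2, 0, low_res.shape[1] - FOVEA))
    # 3. Reconstruct only the foveal region and paste it over the cheap periphery.
    frame = cheap_upscale(low_res)
    fovea_hr = reconstruct(low_res[y0:y0 + FOVEA, x0:x0 + FOVEA])
    frame[SCALE * y0:SCALE * (y0 + FOVEA), SCALE * x0:SCALE * (x0 + FOVEA)] = fovea_hr
    return frame

low_res = np.zeros((720, 800, 3), dtype=np.uint8)     # hypothetical half-res eye buffer
frame = compose_frame(low_res, gaze_xy=(400, 360))    # gaze in low-res pixel coordinates
```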
Apparently, using super resolution to save GPU power is just one potential application of this rendering pipeline framework:
“Besides super-resolution application, the framework can also be used to perform compression artifact removal for streaming content, frame prediction, feature analysis and feedback for guided foveated rendering. We believe enabling computational methods and machine learning in mobile graphics pipeline will open the door for a lot of opportunities towards the next generation of mobile graphics.”
Facebook AI Research
There is no indication from the paper that this technology is planned for the consumer Oculus Quest, though it doesn’t give any reason why it couldn’t be deployed there either. There could be technical barriers that aren’t stated, or it may simply be considered not worth the complexity until a next-generation headset.
We reached out to Facebook for details, and a representative of the company replied saying they have nothing to share beyond the paper itself, and that “this is purely research that we hope will advance the fields of machine learning, mobile computational graphics, and virtual reality”. Regardless, it seems clear that machine learning can play a role in bringing standalone VR closer to PC VR over the next decade.