The blistering pace of innovation in artificial intelligence for image, voice, robotic and self-driving vehicle applications has been fueled, in large part, by NVIDIA's GPUs.
NVIDIA’s House of AI
Just a year ago, NVIDIA announced its Pascal generation GPUs, which tripled the performance of AI workloads, and the industry's big players in AI adopted them rapidly.
Moving Beyond the GPU for AI
While the NVIDIA GPU has become the standard for training neural networks for machine learning, some have argued that running those trained networks (called inference processing) is best executed on FPGAs or on an ASIC such as the Google TPU, especially when deployed in very large volumes, where their speed and low cost can compensate for the significant development expense required. Fixed-function ASICs are not as flexible as a GPU or an FPGA; an ASIC is designed to do only one thing, but to do it very fast. The GPU's flexibility, however, comes at a cost in die area and power consumption, so in theory an ASIC should outperform a GPU. On the other hand, the argument for the GPU has been that deep learning research is moving so fast that a new ASIC could be obsolete by the time you finish it, many years and millions of dollars later.
Now NVIDIA has announced that the best answer may be a hybrid approach: use a CPU where performance is not critical but the need for programmability is high; use a GPU where you need to run operations in parallel but want to retain some flexibility and programmability; and use an ASIC where the algorithms have become stable and the volumes will be large, as is the case for deep learning inference processing.
NVIDIA’s Deep Learning Accelerator (DLA)
With this context in mind, then, it makes sense for NVIDIA to build a fixed-function accelerator that acts as an efficient inference engine as part of a larger solution. NVIDIA announced that its next-generation Drive PX platform for autonomous vehicles, the Xavier SoC, would consist of ARM CPU cores, Volta GPU cores, and a fixed-function Deep Learning Accelerator (DLA) for inference. This approach, the company says, will deliver higher performance at lower power while maintaining the flexibility for customization that its automotive OEMs demand.
While the initial implementation of the NVIDIA DLA will be in the Xavier SoC for autonomous vehicles, I expect NVIDIA to extend this approach to other platforms, such as the low-cost Jetson platform for vision-guided autonomous robots, drones and the like. But why stop there? After all, in the world envisioned by NVIDIA's CEO, Jen-Hsun Huang, there will be trillions of embedded, connected devices in the Internet of Things that will require the intelligence afforded by AI. This is where NVIDIA's strategy to open source the DLA comes in.
NVIDIA has always focused on solving very hard, computationally complex problems, so it has no interest in designing, as Huang puts it, a deep learning chip for smart lawn mowers, a deep learning chip for refrigerators, or a deep learning chip for streetlamps. (Each might require a different design.) By open sourcing the DLA, NVIDIA enables its rich deep learning ecosystem to extend to low-cost, high-volume, low-power ASICs and SoCs, allowing other companies and researchers to build their own chips using this accelerator. And of course it all runs the same CUDA software used by NVIDIA GPUs. The company, in effect, is saying, "OK, if you want to build a TPU for your little widget, it's probably best to build it on our technology, since nobody knows more about accelerating AI than NVIDIA." Meanwhile, NVIDIA can focus on building the high-margin, high-value platforms needed in the datacenter and at the edge.
Conclusions
Many have wondered for over a year how NVIDIA would respond to the Google TPU, and now we know. Instead of being threatened, NVIDIA has effectively de-positioned the deep learning ASIC (TPU) as a tool it can use where it makes sense, while maintaining the lead role for its GPUs and CUDA software. And by open sourcing the technology, it retains a control point for the IoT adoption of machine learning. The risk with this strategy is that the open source approach may lend support to an idea that could evolve into a threat to NVIDIA's long-term goals for datacenter inference engines. I would argue that could happen anyway, and that NVIDIA can now at least participate in that market, indirectly or even directly should it choose.
--
Disclosure: Moor Insights & Strategy, like all research and analyst firms, provides or has provided research, analysis, advising and/or consulting to many high-tech companies in the industry, including some of those mentioned in this article including Microsoft and NVIDIA. The author does not have any investment positions in the companies named in this article.