From the gaming chips of the 90s to the AI accelerators of today, the world has witnessed a plethora of innovations. If you were a die-hard 90s gaming kid reading this article today, chances are you now develop applications on GPUs or lean on them indirectly for your day-to-day needs. With industrial streaming, complex compute and heavy-duty research workloads now running on GPU servers, the whole planet is aware of their role. In this article we cover the history of GPU computing and its evolution over the past three decades. Hop in to know more!
1. Release of OpenGL Library by Silicon Graphics
In the 80s, engineers had to develop custom graphics software for each piece of hardware, which was tedious and time consuming. This problem was solved largely thanks to Silicon Graphics Inc (SGI, now defunct), a premier company in the consumer and enterprise graphics market in the late 80s and early 90s. Its graphics terminals and software were widely popular, used by federal organizations, scientific communities and even the entertainment industry. It was the first company to come up with a graphics API that could support different types of graphics hardware.
With its proprietary IRIS GL (Integrated Raster Imaging System Graphics Library), SGI maintained a robust API for 2D and 3D graphics at a time when most firms could not. On 30 June 1992, the company released the OpenGL library, an open specification derived from IRIS GL, to open its doors to a bigger market. OpenGL soon became an industry standard thanks to its ease of use and scalability, and it made advanced graphics algorithms accessible even to applications running on low-end or low-power hardware.
Although GPUs did not exist in the early 90s and there was no question of GPU computing yet, it was SGI and the birth of OpenGL that much later paved the way for programmable pixel shading (discussed in section 3). SGI started a revolution in the industry with its 3D graphics workstations and a robust API that became an open standard. OpenGL served as a common language for talking to any graphics hardware, and thus bolstered the growth of 3D application development ranging from CAD to video games.
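To make the "one language for any hardware" point concrete, here is a minimal sketch of classic fixed-function OpenGL in C. It uses GLUT for window setup (an assumption made for brevity; any windowing toolkit would do) and simply clears the screen and draws one shaded triangle. The same calls run unchanged on any vendor's OpenGL driver.

```c
/* Minimal fixed-function OpenGL sketch (legacy immediate mode).
   Assumes a GLUT implementation such as freeglut is installed;
   build with something like: gcc triangle.c -lGL -lglut */
#include <GL/glut.h>

void display(void) {
    glClear(GL_COLOR_BUFFER_BIT);        /* clear the framebuffer       */
    glBegin(GL_TRIANGLES);               /* start a triangle primitive  */
    glColor3f(1.0f, 0.0f, 0.0f); glVertex2f(-0.5f, -0.5f);
    glColor3f(0.0f, 1.0f, 0.0f); glVertex2f( 0.5f, -0.5f);
    glColor3f(0.0f, 0.0f, 1.0f); glVertex2f( 0.0f,  0.5f);
    glEnd();
    glFlush();                           /* push commands to the driver */
}

int main(int argc, char **argv) {
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB);
    glutCreateWindow("Hello, OpenGL");
    glutDisplayFunc(display);
    glutMainLoop();                      /* hand control to the event loop */
    return 0;
}
```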
2. Evolution of 3D Graphics for PC Gaming
With the rise of consumer PCs, driven largely by Microsoft Windows, PC gaming took a major leap in the 90s with immersive FPS games like Doom, Quake and Star Wars: Dark Forces. Immersive 3D environments demanded a good chunk of resources for rendering. Affordable graphics accelerators from companies like Nvidia, 3dfx and ATI led to an exponential rise in consumer graphics. Consumer graphics applications were not limited to gaming either; they spanned image processing, drawing and design, video playback and more.
These graphics applications were primarily developed with OpenGL. In the mid-90s, Microsoft rolled out Direct3D (a component of DirectX) for Windows PCs, which became a major competitor to OpenGL. For a short while, Microsoft and SGI worked on unifying OpenGL and Direct3D under the Fahrenheit project, driven by developer demand for a single, stable graphics API. However, it was abandoned in 1999 due to a lack of industry support and competing business goals. Nevertheless, both the OpenGL and DirectX graphics pipelines continued to evolve and spurred the adoption of advanced rendering features such as transformation and lighting.
In 1999, Nvidia unveiled the GeForce 256, which it marketed as the 'world's first GPU'. The card stood out for its affordability, a significant performance improvement over its RIVA TNT predecessors and an even more consumer-centric positioning. Nvidia defined the GPU as 'a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second'.
The GeForce 256 supported both OpenGL (v1.2.1) and DirectX (v7.0). The card kicked off an industry-wide move toward implementing most of the graphics pipeline on the GPU itself. In addition, DirectX 7.0 for the first time pulled ahead of OpenGL with hardware vertex buffer support and multi-texturing. However, programming against these APIs was becoming so involved that developers wanted more direct exposure to the graphics hardware, preferably through a simpler language (covered in the next section).
3. Programmable Pixel Shading
The introduction of DirectX 8.0 brought programmable pixel and vertex shading into the graphics development arsenal. The GeForce 3 series GPUs (GeForce 3, Ti 200, Ti 500), unveiled by Nvidia in 2001, were the first to support DirectX 8.0. This marked the first real flexibility in GPU parallel computing, as developers finally had some control over the computations being performed on the hardware. Researchers soon managed to trick GPUs into performing general arithmetic by leveraging the pixel shaders.
Scientists realized that the pixel shading concept could be repurposed for GPGPU computing. In practice, a pixel shader takes geometric coordinates (say, in 2D), color intensities and other relevant per-fragment data. Researchers figured out that those inputs could just as well be arbitrary numbers: they encoded their data as textures, had the shader perform the arithmetic, and let the GPU 'render' the result. The output was nominally a color, but its channels held exactly the values they wanted to compute.
However, these approaches could only be driven through a graphics API, so GPU computing remained far from user-friendly. To perform parallel computing on such hardware, one had to be well versed in OpenGL or DirectX, which was a steep ask back then.
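As a rough illustration of the mapping, here is a minimal CPU-side sketch in plain C (no real GPU involved) of what those shader tricks amounted to: inputs packed into "textures", a per-pixel function doing the arithmetic, and the resulting "framebuffer" read back as the answer. Actual implementations needed an OpenGL or Direct3D context, render-to-texture and pixel readback, all omitted here.

```c
/* CPU-side sketch of early shader-based GPGPU (illustrative only).
   Real implementations uploaded the arrays as textures, ran a pixel
   shader over a full-screen quad and read the colors back; the data
   flow below mirrors that idea. */
#include <stdio.h>

#define WIDTH  4
#define HEIGHT 4

/* Stand-in for a pixel shader: one invocation per "pixel". */
static float shade(float texel_a, float texel_b) {
    return texel_a + texel_b;   /* the arithmetic hidden inside "rendering" */
}

int main(void) {
    float tex_a[HEIGHT][WIDTH], tex_b[HEIGHT][WIDTH], framebuffer[HEIGHT][WIDTH];

    /* Pack the input data into "textures". */
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x) {
            tex_a[y][x] = (float)(y * WIDTH + x);
            tex_b[y][x] = 2.0f * (y * WIDTH + x);
        }

    /* "Render": the GPU would run this loop in parallel, one fragment each. */
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            framebuffer[y][x] = shade(tex_a[y][x], tex_b[y][x]);

    /* "Read back the framebuffer": the colors are really the results. */
    printf("result at (1,2): %.1f\n", framebuffer[1][2]);
    return 0;
}
```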
4. Introduction of CUDA with GeForce 8800 GTX
Demand for GPGPU parallel computing in the R&D ecosystem grew rapidly during the mid-2000s. Nvidia saw an opportunity and set out to solve the challenges faced by communities doing parallel computing on its consumer-centric cards, coming up with a far more approachable solution. 2006 was the year programmers could finally use industry-standard languages like C/C++ to talk to GPUs! With the unveiling of the GeForce 8800 GTX in 2006, Nvidia introduced the GPU computing revolution: the Compute Unified Device Architecture (CUDA). Built on the G80 architecture with 128 unified stream processors, the 8800 GTX was the first DirectX 10 GPU and also the first model to support CUDA.
The introduction of CUDA brought a dedicated hardware and software architecture for computing directly on Nvidia GPUs. CUDA C was the first widely adopted GPGPU programming language, with its own compiler (NVCC) that lowers code to the PTX intermediate representation for the GPU.
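To give a flavor of what that looked like, here is a minimal CUDA C sketch of the programming model: a kernel marked __global__ runs across thousands of threads, while the host code manages device memory and launches it. This is a generic vector-add example, not code from any specific early application.

```cuda
// Minimal CUDA C sketch: element-wise vector addition.
// Build with: nvcc vector_add.cu -o vector_add
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host buffers.
    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device buffers and host-to-device copies.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256, blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();

    // Copy the result back and spot-check it.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f (expected 3.0)\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```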
Early applications that harnessed CUDA heavily came from medical imaging and computational science. One good example: researchers at the University of Cambridge used low-cost GPU-based clusters for computational fluid dynamics.
5. Birth of OpenCL
OpenCL (Open Computing Language), the open-standard competitor to CUDA that we know today, was originally an Apple initiative! Macs at the time shipped with both Nvidia and AMD (ATI) GPUs, so the Silicon Valley giant wanted native GPGPU support rather than wrestling with per-vendor compatibility. In mid-2008, before shaping OpenCL for release, Apple brought in engineers from AMD, Intel, Nvidia, IBM and Qualcomm. This collective effort was handed over to the Khronos Group, whose compute working group refined OpenCL for a public release. OpenCL primarily exposed a C-based programming interface.
The OpenCL 1.0 specification was ratified in late 2008, and the first shipping implementation arrived with Mac OS X Snow Leopard in 2009. Subsequently, AMD dropped its Close to Metal framework and adopted OpenCL for its Stream SDK, and Nvidia, IBM and other semiconductor companies followed suit. Unlike CUDA, OpenCL supported both AMD and Nvidia GPUs, and its reach later extended to all sorts of hardware: CPUs, embedded devices, FPGAs and more. The ecosystem it supported was therefore huge.
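For contrast with the CUDA sketch above, below is a minimal OpenCL 1.x sketch of the same vector addition, written as plain C host code. The kernel source is handed to the driver as a string and compiled at run time, which is what lets identical code target AMD or Nvidia GPUs, CPUs and other devices. Error checking is omitted for brevity, and device selection is left to the first available platform.

```c
/* Minimal OpenCL 1.x sketch: vendor-portable vector addition.
   Build with something like: gcc vadd_cl.c -lOpenCL */
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>

static const char *kernel_src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    int i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float h_a[N], h_b[N], h_c[N];
    for (int i = 0; i < N; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    /* Pick the first platform and its default device (GPU or CPU). */
    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    /* Device buffers, with the inputs copied from host memory. */
    cl_mem d_a = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(h_a), h_a, NULL);
    cl_mem d_b = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(h_b), h_b, NULL);
    cl_mem d_c = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof(h_c), NULL, NULL);

    /* Compile the kernel from source at run time. */
    cl_program program = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
    clBuildProgram(program, 1, &device, NULL, NULL, NULL);
    cl_kernel kernel = clCreateKernel(program, "vadd", NULL);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_a);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_b);
    clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_c);

    /* Run one work-item per element, then read the result back. */
    size_t global = N;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, d_c, CL_TRUE, 0, sizeof(h_c), h_c, 0, NULL, NULL);
    printf("c[0] = %f (expected 3.0)\n", h_c[0]);

    /* Release OpenCL objects. */
    clReleaseMemObject(d_a); clReleaseMemObject(d_b); clReleaseMemObject(d_c);
    clReleaseKernel(kernel); clReleaseProgram(program);
    clReleaseCommandQueue(queue); clReleaseContext(ctx);
    return 0;
}
```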
AMD strongly backed OpenCL and made the Khronos Group an important ally in shaping the software stack for its GPUs. In November 2016, AMD unveiled the ROCm software stack for GPU computing, in which OpenCL has remained a critical high-level programming interface. Although Nvidia supported both, CUDA was generally faster on Nvidia hardware thanks to its native integration. On AMD GPUs, in contrast, OpenCL delivered much better performance, often on par with CUDA given a comparable environment.
6. GPU Computing Today
The AI-HPC boom | Nvidia chapter – Computational fluid dynamics, bioinformatics, game physics and many other industrial research areas actively leverage HPC on GPUs. With the CUDA ecosystem becoming more developer-friendly through wrapper modules and easy-to-use SDKs, the number of GPU computing developers has been growing significantly.
With complex ML and deep learning algorithms going open source and libraries becoming more user-friendly, AI R&D has risen exponentially over the past decade. Leading the HPC industry, Nvidia has strongly supported the AI ecosystem with a range of GPU-accelerated SDKs. From training DNN models on Nvidia GPUs to deploying them on embedded hardware (edge AI with the Nvidia Jetson platform), the semiconductor giant has shaped an end-to-end software stack to assist data scientists, ML engineers, research scientists and embedded developers.
Data center GPUs, DPUs and supercomputers/PODs based on the Ampere architecture (e.g. A30, A40, DGX A100) have enabled a giant leap in GPU computing for industrial use cases, and the recently announced Ada Lovelace successor pushes performance even further. For content creation and rendering, Nvidia's ray-tracing RTX GPUs have paved the way for designing realistic ray-traced environments and are fostering the metaverse push with Omniverse. Jensen Huang has long evoked a Ready Player One-style vision and is driving the company toward that next generation.
MIOpen – AMD's GPU-accelerated library of machine learning primitives for HPC, which also supports Nvidia GPUs. HIP and OpenCL are its two major programming frameworks. While Nvidia leads enterprise computing demand, AMD has definitely not taken a back seat: the red giant has been investing heavily in its Instinct MI GPU line-up, along with considerable research on GPU-accelerated AI.
Metal – It came as quite a surprise when Apple unveiled its M1 chip to the world in 2020. The company entered the GPU race in earnest, with the M1 offering its own set of GPU compute cores. To build out its own computing ecosystem, Apple has evolved its proprietary Metal framework for developers; Metal and its Metal Shading Language (MSL) provide friendly APIs for neural network processing, rendering and ray tracing.
Intel oneAPI – Intel oneAPI is a heterogeneous computing platform that spans many classes of chips, with GPU computing as one subset. It eases HPC development with DPC++ and the oneDPL library, which are gaining traction in the community, and it also supports deep learning, ray tracing and video processing. Intel has also been rolling out its Arc (consumer) and Flex (data center) GPU series for content creation, HPC and AI applications.
The industry has been showing strong support for open-source frameworks. While Nvidia leads the race in today's era of computing, AMD, Intel and various semiconductor startups have been trying to gain the upper hand by opening up their platforms. As a consequence, the computing ecosystem has been growing significantly, and not just around GPUs. Governments are investing heavily in domestic fabrication facilities to meet the industry's computing demands. Today, heavy-duty computing workloads are processed by GPUs paired with robust CPUs; in the future, we may well see semiconductor architectures that merge the two into a single chip (the power of two in one!).