Nvidia GPUs: CUDA Cores vs Tensor Cores vs RT Cores

The design of modern graphics processors has become more integrated and complex than ever. A single chip now combines multiple sets of cores and other components. A good case in point is Nvidia's higher-tier GeForce RTX graphics processors. A typical RTX discrete graphics card has three sets of cores: CUDA cores, Tensor cores, and RT cores. Each set has a tailored purpose and specific processing capabilities, and their combined functionality powers the advanced applications of modern Nvidia graphics processors. This article explains the differences between CUDA cores, Tensor cores, and RT cores.

Explaining the Difference Between CUDA Cores, Tensor Cores, and RT Cores of Nvidia Graphics Processors

It is not clear exactly when semiconductor manufacturers began developing graphics processors with a multi-core design. Two of the earliest examples were the Oxygen 402 from 3Dlabs, introduced in 1997, and the Voodoo2 from 3dfx Interactive, released in 1998. These products failed to reach the mainstream market because of compatibility and scalability issues that limited their applications. They were also inefficient, with high power requirements and a substantial need for heat management.

The first multi-core graphics processors to achieve mainstream success came from Nvidia Corporation and Advanced Micro Devices or AMD. These companies developed, popularized, and expanded the practical and specialized applications of the multi-core graphics processor design beginning in the mid-2000s.

Nvidia is notable for introducing CUDA cores in 2006 with the unveiling of its CUDA platform and architecture. This multi-core design equipped its graphics processors with general-purpose and parallel processing capabilities that have since become useful beyond video gaming and graphics rendering. The company further pushed the boundaries of multi-core design when it introduced graphics processing architectures containing multiple sets of cores for parallel processing.

The company specifically launched the Tensor cores in 2017 with the Volta architecture and the RT cores in 2018 with the Turing architecture. GPUs under the GeForce RTX 20 series were the first Nvidia products to feature these two sets of cores. Mid-range and higher-tier Nvidia GPUs are now equipped with CUDA cores, Tensor cores, and RT cores.

1. CUDA Cores

At the heart of a modern Nvidia graphics processor is a set of CUDA cores. These cores are the basic computational units of the GPU, responsible for executing a range of graphics tasks such as shading, texturing, and rasterization, as well as general-purpose computing.

The number of CUDA cores partly determines the graphics and parallel processing capabilities of the graphics processor. More CUDA cores generally mean faster processing of complex workloads. The equivalent of these cores in AMD Radeon graphics processors are the Stream Processors, while the Xe cores are the closest equivalent in Intel Arc graphics processors.

CUDA cores are the most versatile processing units in an Nvidia graphics processor. They efficiently calculate the color of each pixel on the screen to produce shading, apply textures to surfaces, and rasterize three-dimensional geometry into two-dimensional pixels.
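The per-pixel work described above can be sketched conceptually in Python with NumPy. This is not GPU code; the `shade_lambert` function and its simple diffuse-shading math are illustrative stand-ins showing why shading maps so well onto thousands of cores working in parallel:

```python
import numpy as np

# Conceptual sketch: per-pixel shading is the same small calculation
# repeated independently for every pixel, which is why it maps well
# onto thousands of CUDA cores running in parallel.
# (NumPy stands in for the GPU here; the math is illustrative.)

def shade_lambert(normals, light_dir):
    """Compute simple Lambertian (diffuse) shading for every pixel at once.

    normals:   (H, W, 3) array of surface normals, one per pixel
    light_dir: (3,) unit vector pointing toward the light
    """
    # Dot product of each pixel's normal with the light direction,
    # clamped to zero so surfaces facing away from the light stay dark.
    return np.clip(normals @ light_dir, 0.0, 1.0)

# Tiny 2x2 "image" whose surface normals all face the light head-on.
normals = np.zeros((2, 2, 3))
normals[..., 2] = 1.0                  # all normals point along +z
light = np.array([0.0, 0.0, 1.0])      # light also shines along +z

print(shade_lambert(normals, light))   # every pixel fully lit: 1.0
```

On real hardware, each CUDA core would evaluate this calculation for a different pixel at the same time instead of looping over them.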

2. Tensor Cores

Another set of cores in an Nvidia GPU is the Tensor cores. These are co-processing units and an example of an AI accelerator designed to speed up artificial intelligence workloads. They perform matrix multiply-accumulate operations at much higher throughput than CUDA cores, typically using mixed precision. Matrix operations are central to both training machine learning models and running AI algorithms.

The Tensor cores enable several AI features of an Nvidia graphics processor. These include Deep Learning Super Sampling or DLSS, NVIDIA Broadcast, and NVIDIA Studio. AMD has similar processing units called the Matrix Cores. Intel has equivalent cores in its Intel Arc line of graphics processors called the Intel XMX Engines.

Remember that these cores are designed to accelerate AI workloads, representing the application of AI accelerators in graphics processing. Examples of these workloads include image processing tasks such as denoising and upscaling, video encoding and decoding, and natural language processing.
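The fused operation at the heart of a Tensor core can be sketched in Python with NumPy. This is a conceptual model, not hardware code; the `tensor_core_mma` function, its shapes, and its dtypes are illustrative rather than the actual tile sizes the hardware uses:

```python
import numpy as np

# Conceptual sketch of the fused operation a Tensor core performs:
# D = A @ B + C, with A and B stored in low precision (e.g. FP16)
# and the accumulation done in higher precision (FP32).

def tensor_core_mma(a, b, c):
    """Mixed-precision matrix multiply-accumulate, Tensor-core style."""
    a16 = a.astype(np.float16)   # inputs held in half precision
    b16 = b.astype(np.float16)
    # Multiply and accumulate in float32 to limit rounding error.
    return a16.astype(np.float32) @ b16.astype(np.float32) + c

a = np.ones((4, 4))
b = np.ones((4, 4))
c = np.zeros((4, 4), dtype=np.float32)

print(tensor_core_mma(a, b, c))  # each entry is 4.0 (four 1*1 products summed)
```

A Tensor core executes this entire multiply-accumulate on a small matrix tile in a single operation, which is why it is so much faster at this workload than general-purpose cores performing the same math one multiplication at a time.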

3. RT Cores

An Nvidia GeForce RTX GPU differs from a GeForce GTX or GeForce GT GPU through its inclusion of dedicated processing units for hardware-accelerated ray tracing called RT cores. These cores enable a graphics rendering technique that simulates the properties and behavior of light and its interactions with objects in a scene.

RT cores render more lifelike visual effects, such as reflections, refractions, shadows, and global illumination, without stressing the CUDA cores. The AMD equivalent for hardware-accelerated ray tracing is called Ray Accelerators, while Intel calls the same processing units Ray Tracing Units in its Intel Arc line of GPUs.

Take note that real-time ray tracing is a computationally intensive process. The inclusion of processing units dedicated to ray tracing improves the capabilities of a GPU. These units have also enabled developers to create more realistic and immersive games and other applications, such as augmented reality and mixed reality.
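To see why ray tracing is so computationally intensive, consider the basic test at its core: checking whether a single ray hits a single object. The Python sketch below shows a minimal ray-sphere intersection test; a real renderer runs millions of such tests per frame, which is the workload RT cores accelerate in hardware. The function name and scene values are illustrative:

```python
import math

# Minimal sketch of the core calculation in ray tracing: testing
# whether a ray hits a sphere by solving a quadratic equation.

def ray_hits_sphere(origin, direction, center, radius):
    """Return True if the ray origin + t*direction (t >= 0) hits the sphere."""
    # Vector from the sphere center to the ray origin.
    oc = [o - c for o, c in zip(origin, center)]
    # Quadratic coefficients for |origin + t*direction - center|^2 = r^2.
    a = sum(d * d for d in direction)
    b = 2.0 * sum(o * d for o, d in zip(oc, direction))
    c = sum(o * o for o in oc) - radius * radius
    discriminant = b * b - 4.0 * a * c
    if discriminant < 0:
        return False  # the ray's line misses the sphere entirely
    # Nearest intersection must lie in front of the ray origin.
    t = (-b - math.sqrt(discriminant)) / (2.0 * a)
    return t >= 0

# A ray shot down the z-axis toward a sphere centered at z = 5.
print(ray_hits_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # True
print(ray_hits_sphere((0, 0, 0), (0, 1, 0), (0, 0, 5), 1.0))  # False
```

Multiply this single test by every pixel, every bounce, and every object in a scene, and the benefit of dedicated intersection hardware becomes clear.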