Afaik, GPU cores are very stripped-down CPU cores that do basic math, but what sort of math can CPUs do that GPUs can’t?
I’m talking CPU cores vs. Tensor/CUDA cores.
Not so much the GPUs themselves, but GPU shader compilers really struggle on some code with a lot of branching and loops, or code that operates on strings.
- It’s the GPU itself. GPUs work by grouping together multiple threads into a single thread group, which NVIDIA calls a warp and AMD/Intel call a wave. Every thread belonging to a given warp/wave has to take the exact same path through the code, so if you have heavily divergent branches, where some threads in a warp/wave take one path and other threads take another, that can kill performance because the code needs to be executed multiple times to cover all paths (see the sketch after these bullets).
- GPUs typically don’t operate on strings. Strings aren’t even a supported data type in most GPU-oriented languages or frameworks, especially graphics. If you need to operate on strings on a GPU then you typically break the strings up into individual characters and treat each character as a simple integer, which is totally fine so long as you’re able to wrap your head around dealing with strings at such a low level.
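To make both bullets concrete, here’s a toy CUDA sketch (entirely made up for illustration, not from any real codebase): one thread per character, each character treated as a plain integer, and an if/else that diverges whenever a warp holds a mix of lowercase and non-lowercase characters.

```
#include <cstdio>
#include <cuda_runtime.h>

// Uppercase ASCII letters in a buffer, one thread per character.
// There is no "string" type here: each character is just a small integer.
// The if/else below is a divergent branch; within a warp, threads holding
// lowercase letters take one path and the rest take the other, so the
// hardware runs both paths, masking off the inactive threads each time.
__global__ void to_upper(char* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    char c = buf[i];
    if (c >= 'a' && c <= 'z') {
        buf[i] = (char)(c - ('a' - 'A'));  // path A: some threads of the warp
    } else {
        buf[i] = c;                        // path B: the remaining threads
    }
}

int main() {
    const char msg[] = "Hello, warp divergence!";
    const int n = sizeof(msg);
    char* d_buf;
    cudaMalloc(&d_buf, n);
    cudaMemcpy(d_buf, msg, n, cudaMemcpyHostToDevice);

    to_upper<<<1, 32>>>(d_buf, n);   // a single warp covers this toy input

    char out[sizeof(msg)];
    cudaMemcpy(out, d_buf, n, cudaMemcpyDeviceToHost);
    cudaFree(d_buf);
    printf("%s\n", out);             // prints: HELLO, WARP DIVERGENCE!
    return 0;
}
```

With a tiny input like this the divergence cost is invisible, but the same pattern scaled up to millions of elements is exactly where heavily divergent branches start to hurt.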
I imagine it like a construction site for a new house. Do you want 10 master craftsmen who can do everything, or 300 guys off the street who kinda know what they’re doing and are really only there for grunt labor? For the tasks that require lots of manual labor you want the 300 guys, but for the tasks that aren’t easy or aren’t very parallel, you want fewer but more capable cores.
my guess is they just have different instruction sets but both are able to do basic add, subtract and multiply
Think of it this way.
A CPU is a big digger, its bucket can dig a large chunk of earth with one scoop. Heavy lifting.
A GPU is a lot of small shovels, each digging a small amount of earth, but all at the same time.
So a CPU can do a large task at once, a GPU can do a lot of smaller tasks at once.
Which is great: the CPU is doing the heavy lifting, the GPU is doing lots of small calculations.
Both the GPU and the CPU are perfectly suited to their specific purpose.
Security is an interesting reason that most people don’t think about.
When you run a program on your computer, you’re constantly swapping between user and privileged modes. For example, you don’t want a website reading the stuff from your harddrive. Any such attempts must go to the OS which will then say the website doesn’t have permission and refuse.
GPUs don’t have a privileged mode. This isn’t just because it wouldn’t be useful. On the contrary, WebGL and WebGPU have massive security issues because you are executing third-party code using drivers which themselves generally have root-level access. GPUs don’t add such a mode because their hardware takes forever (in CPU terms) to get all the data loaded up and ready to execute. Moving to a different context every few microseconds would result in the GPU spending most of its time just waiting around for stuff to load up.
The solution to this problem would be decreasing latency, but all the stuff that does that for you is the same stuff that makes CPU cores hundreds of times larger than GPU cores. At that point, your GPU would just turn into an inferior CPU and an inferior GPU too.
It depends what type of GPU “core” you are talking about.
What NVIDIA refers to as CUDA/Tensor/RT cores are basically just glorified ALUs with a tiny bit of their own control logic. But for the most part they are just ALUs.
For the most part, CPUs tend to be more complete scalar processors, in that they include the full control datapath as well as multiple Functional Units (FUs), not just a floating-point unit.
The distinctions are moot nowadays though; a modern GPU includes its own dedicated scalar core (usually a tiny embedded ARM core) for doing all the “housekeeping” stuff needed to interface with the outside world. And modern CPUs contain their own data-parallel functional units that can do some of the compute that GPUs can.
In the end, the main difference is the scale/width of data parallelism within a CPU (low) vs a GPU (high).
to echo others, it’s not what, but how.
cpus do execution reordering and speculation to run one thread really fast. gpus have mostly avoided that and execute threads in large groups called “warps” (each thread being analogous to a lane of a SIMD unit).
GPUs may be Turing complete (yes, I just repeated that to annoy the correcting dude 😂). But the majority of normal workloads that do not require an insane amount of relatively simple parallel work would be excruciatingly slow on a GPU. The processing cores are small and slow (but there are thousands of them).
- each core does way fewer computations per clock
- it doesn’t do branching very well, which is a core ability of any CPU (if/then/else in badly predictable ways)
- it has a slower clock speed
- its caching is really bad unless you do simple data-in, compute, data-out processing
Basically anything that is single-threaded and moderately heavy in logic (most OS core functionality and most application code) would be absolutely atrocious on a CUDA core. Also, splitting out all the parts that can be done on the CUDA cores would be a huge amount of work if your code is interspersed with the heavy stuff. That’s why most computation happens on CPUs: they are just much better at pretty much anything without much tuning, unless you have huge number crunching, for example matrix computation. But if you have a couple million data points and you want to do some simple math on them, oooh, the CUDA cores murder any CPU (see the sketch below).
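Just to illustrate that last case, here’s a minimal CUDA sketch (kernel name and sizes are made up) of the kind of dead-simple per-element math where thousands of small cores win: every element is independent, there’s no branching to speak of, and the data flows straight in and out.

```
#include <cuda_runtime.h>

// Apply x = a*x + b in place to every element, one thread per element.
// Each thread does a trivial amount of work, but there are millions of
// elements, so the GPU can keep thousands of cores busy at once.
__global__ void scale_and_offset(float* x, float a, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = a * x[i] + b;
}

int main() {
    const int n = 1 << 22;                    // ~4 million floats
    float* d_x;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMemset(d_x, 0, n * sizeof(float));    // stand-in for real input data

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_and_offset<<<blocks, threads>>>(d_x, 2.0f, 1.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d_x);
    return 0;
}
```

The equivalent CPU loop is a few lines of C too; the difference is purely in how many of those multiply-adds run at the same time.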
In a more practical sense, CPUs are way more efficient than GPUs for lots of tasks. On a GPU, once you move away from things like matrix multiplication, there isn’t much you can do efficiently. A GPU will harvest a field faster, but it won’t efficiently cut down a tree.
GPUs are not good with branching, as in taking different paths based on previous calculations.
CPUs are general-purpose processors that do the general calculations and data handling. If a PC is handling AI algorithms, they are often done on the CPU. CPUs handle the more complex work, like large file handling, and the general interactions in a game. CPUs have fewer but massive, complex processor cores optimized for large, complex logic work.
GPUs are their own specialist computers, optimized for complex graphics and physics vector calculations across hundreds of thousands of really tiny, simple pieces of work. GPUs handle the complex lighting, light rays, fragmentation physics and 3D image rendering. The CUDA and Tensor cores of a GPU are thousands of puny, simple processors optimized for tiny floating-point calculations.
CPUs and GPUs are both Turing complete, so from a computation perspective, they can technically do the same things. However each is optimized for a different set of metrics, so it may be wise to target a particular workload to one over the other.
The CPU is optimized for minimum latency, aka maximum single threaded performance. This is really important for a lot of workloads.
GPUs are optimized for maximum throughput and are willing to suffer higher latency to achieve it. There are many architectural differences as a result of this goal. For example, GPUs don’t have branch prediction, they don’t have out-of-order execution, and they have much smaller caches per computational unit. All of these save a ton of transistors, which can be used to add more cores.
So is there no point in making a Linux or Windows that can run 100% within the GPU, leaving a very minimal CPU to handle data transfer between the drives, keyboard, etc. and the GPU?
I’d like to see someone with a 4090 GPU running Windows and a Motorola 68000 as the CPU to handle drives.
GPUs are not necessarily Turing complete, BTW.
I think it’s more about the efficiency of doing the math than the ability to perform the math.
I mean if you’re creative enough, probably nothing.
This is kinda like asking what can I do on a lathe that I can’t do on a mill. It’s more what’s better suited to be done on one or the other.
CPUs are more generalised; they have a deep and complex instruction set and feature list. GPUs are shallower and far more specialised, but they handle tasks that parallelise more readily… like calculating a metric shitload of triangles.
You can see CPUs used to ‘push pixels’ in older computers since that’s all they had.
Follow up question, what allowed CPUs before the advent of eGPU/dGPU’s to output video, or what prevents them from doing so now?
Frame Buffers have been a thing since the 60s at least ;-)
Basically it is a piece of memory that contains the color information for a set of pixels. The simplest would be a black-and-white frame buffer, where the color of each pixel is defined by it being 1 (black) or 0 (white).
Let’s assume that you want to deal with a monitor that is 1024x1024 pixels in resolution, so you need 1024x1024 bits (~1 Mbit) of information to store the color of every pixel.
So in the simplest case, you had a CPU writing the 1Mbit BW image that it just generated (by whatever means) into the region of memory that the video hardware is aware of. Then the display generator would go ahead and read each of the pixels and generate the color based on the bit information it reads.
Rinse and repeat this process around 30 times per second and you can display video.
If you want to display color, you increase the number of bits per pixel to whatever color depth you want. The process is basically the same, except the display generator is a bit more complex, as it has to generate the proper shade of color by mixing Red/Green/Blue/etc. values.
That is the most basic frame buffer: unaccelerated, meaning that the CPU does most of the work in generating the image data to be displayed.
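If it helps to see it, here’s a minimal sketch of that unaccelerated 1-bit frame buffer in plain C (sizes and names made up; nothing here talks to real display hardware, the array just stands in for the memory the display generator would scan out):

```
#include <stdint.h>
#include <string.h>

// 1 bit per pixel for a 1024x1024 display: 1024*1024 bits = 128 KiB.
// The CPU writes pixels into this memory; the display generator would
// read it back out ~30 times per second to drive the monitor.
#define WIDTH  1024
#define HEIGHT 1024
static uint8_t framebuffer[WIDTH * HEIGHT / 8];

// Set one pixel to black (1) or white (0).
static void set_pixel(int x, int y, int on) {
    int bit = y * WIDTH + x;
    if (on) framebuffer[bit / 8] |=  (uint8_t)(1u << (bit % 8));
    else    framebuffer[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}

int main(void) {
    memset(framebuffer, 0, sizeof framebuffer);   // clear the screen to white
    for (int x = 0; x < WIDTH; x++)               // draw a diagonal black line
        set_pixel(x, x, 1);
    return 0;
}
```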
So assuming you had a CPU that was incredibly fast, you could technically do just about the same that a modern GPU can do. It just would need to be thousands of times faster than the fastest modern CPU to match a modern GPU. ;-)
Hope this makes sense.
You’ve probably seen this in older games already, hardware vs software rendering. Software rendering just asks your CPU to perform the same calculations your GPU would normally take care of. It still works today but games have such astronomically high performance requirements that most don’t even give you the option.
Video cards that didn’t have a GPU, but instead were just a RAMDAC or “scan out” engine.
I don’t think thinking of them as “stripped down CPU cores” is accurate. They are very different beasts these days, with GPUs executing multiple threads in lockstep, not having a stack etc.
As far as “math” goes, GPUs can do pretty much everything in the sense that they are Turing complete, but if you’re asking instruction-wise, GPUs won’t have all the extra super-specific stuff that the x64 instruction set has, for example. You’d need to write more of that yourself.
I’m not sure exactly at what level you’re asking the question.