AMD’s talking more about its upcoming Bulldozer CPU than before. We now know that it implements multithreading not by sharing execution resources between threads, as SMT on the Pentium 4/Core i7 does, but by actually doubling the integer execution resources.

Notice how the FP unit was not doubled, aside from the fact that it is now slightly more capable. Chalk it all up to AMD’s long-term strategy to integrate the CPU with the GPU, Fusion.
As you all know, processor architects haven’t had much success increasing single-thread performance these days. Oh, in the old days, it was easy. Pipelining with the 80486 – execute different parts of different instructions all at once. Superscalar with the Pentium/P5 – just put in two execution units instead of one, which really helped differentiate the Pentium from the 486. Out-of-order – execute instructions that don’t depend on the ones in front of them first. That also gave a major speed boost; just look at the performance gap between the Pentium Pro/Pentium II and the original Pentium.
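To make the superscalar and out-of-order ideas concrete, here is a rough toy model, not a description of any real Pentium. It assumes every instruction takes one cycle, each register is written only once (so only true dependencies exist), and there is no pipelining or register renaming. The names `count_cycles` and `PROGRAM` are made up for this illustration.

```python
def count_cycles(program, width, out_of_order):
    """Count cycles needed to run `program` on a toy one-cycle-latency machine.

    program      -- list of (dest, sources) tuples in program order
    width        -- maximum number of instructions issued per cycle
    out_of_order -- if False, issue stops at the first instruction that is not ready
    """
    written = {dest for dest, _ in program}  # registers produced by the program
    done = set()                             # registers whose results are available
    pending = list(program)
    cycles = 0
    while pending:
        issued = []
        for instr in pending:
            if len(issued) == width:
                break
            _, sources = instr
            # A source never written by the program is an input and is always ready.
            ready = all(s in done or s not in written for s in sources)
            if ready:
                issued.append(instr)
            elif not out_of_order:
                break  # an in-order machine stalls behind the blocked instruction
        if not issued:
            raise ValueError("no instruction can issue; a source is read before it is produced")
        for instr in issued:
            pending.remove(instr)
        # Results only become visible to instructions issued in later cycles.
        done.update(dest for dest, _ in issued)
        cycles += 1
    return cycles


# Two independent two-instruction chains, interleaved badly for an in-order machine.
PROGRAM = [
    ("a", ("x",)),  # a = f(x)
    ("b", ("a",)),  # b = f(a), must wait for a
    ("c", ("y",)),  # c = f(y), independent of a and b
    ("d", ("c",)),  # d = f(c), must wait for c
]

print(count_cycles(PROGRAM, width=1, out_of_order=False))  # 4: one instruction per cycle
print(count_cycles(PROGRAM, width=2, out_of_order=False))  # 3: second unit helps only sometimes
print(count_cycles(PROGRAM, width=2, out_of_order=True))   # 2: c slips ahead of the stalled b
```

Even in this crude model, adding a second execution unit only pays off fully once the machine is allowed to look past a stalled instruction.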
Unfortunately, nothing much has happened since then. Intel bolted a vector FPU (SSE) onto the Pentium II and called it the Pentium III. AMD finally got over the FPU issues it had with the K6 and went on to produce the Athlon, which soundly kicked the Pentium III’s ass, and the Pentium 4’s too, at least until the Pentium 4 got the 533MHz FSB. The Pentium 4 took pipelining to an extreme and introduced SMT to the mainstream, but ultimately wasn’t that fast anyway. And now, we have Core/Core 2, a slightly widened Pentium M, which is really a Pentium III with the Pentium 4’s FSB and SSE2 support. I haven’t studied Core i7 that much, but they say it’s yet another evolution of Core 2 with a much improved memory subsystem (lower-latency caches, integrated memory controller).
The point is, recent CPUs haven’t had any distinguishing architectural differences from one another. They’re all pipelined, superscalar, and out of order processors with some SIMD functionality. And Intel charges extra for SMT.
What does the GPU have that the CPU doesn’t? Why, insane floating point performance thanks to hundreds of simple FPUs, of course. Coincidentally, most floating point workloads that have a huge market are easily parallelized workloads, such as ray tracing, oil exploration, stock trading and anything else that has been advertised as being sped up x times with GPU computing. The major difference between the CPU and the GPU is that the CPU usually has one or two FPUs, with a lot of logic in front of them to make sure that one FPU isn’t waiting for another to finish computing a result that it needs for the next instruction, and a lot of logic behind the FPUs to make sure the instructions come out in the same order as they were supposed to be executed. The GPU, on the other hand, is basically hundreds of simple FPUs doing relatively simple, repetitive tasks in a SIMD fashion, with their workloads scheduled/managed by another entity.
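The difference between an "easily parallelized" workload and one that needs all that dependency-tracking logic can be sketched in a few lines. This is only a toy: plain Python lists stand in for hardware lanes, and the function names (`saxpy_lane`, `simd_saxpy`, `dependent_chain`) are invented for the illustration.

```python
def saxpy_lane(a, x, y):
    """One lane's work: a single multiply-add that needs no other lane's result."""
    return a * x + y


def simd_saxpy(a, xs, ys):
    """Apply the same operation to every element.

    No element depends on another, so the elements could be computed in any
    order, or all at once on hundreds of simple FPUs.
    """
    return [saxpy_lane(a, x, y) for x, y in zip(xs, ys)]


def dependent_chain(a, xs, seed):
    """Each step needs the previous step's result, so the work is inherently serial."""
    acc = seed
    for x in xs:
        acc = a * acc + x
    return acc


print(simd_saxpy(2, [1, 2, 3], [10, 20, 30]))  # [12, 24, 36]
print(dependent_chain(2, [1, 2, 3], 0))        # 11
print(dependent_chain(2, [3, 2, 1], 0))        # 17: order matters, unlike the lanes above
```

The first kind of loop is what a pile of simple FPUs eats for breakfast; the second is where the CPU's scheduling logic earns its keep.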
Does all this sound familiar? It should, because IBM/Sony/Toshiba Cell’s architecture was public knowledge a LONG time ago. Boatload of simple FPUs with fancy name? Check. Scheduler (with bonus SMT)? Check. Low performance in non-parallel workloads? Check. Widespread adoption in targeted market? Check.
But AMD was at pains to stress that Llano is an intermediate step, and is not the end-of-the-road for its Fusion program. The real goal of Fusion is to merge the CPU and GPU entirely, and to bring that combination directly in contact with the OS. (See the red arrow in the slide below.)
According to Jon Stokes’s article, the long-term strategy is to integrate the GPU into the CPU and remove the OpenGL layer between the OS and what was once the GPU. What this means, besides much more heat emanating from one chip instead of two, is that the CPU will now be used for DirectX shaders, texturing, geometry transformation, lighting and perspective transformation, and most importantly, highly parallelizable workloads. If you’ll remember, huge multiprocessor systems were all the rage in supercomputing before GPU computing showed up. Now everybody’s screaming about how much faster GPU computing is than using a truckload of AMD/Intel CPUs to do the same job. Buying ATi not only got AMD a foot in the door with respect to GPU computing; with Fusion, the CPU will also regain more of its former influence on a system’s performance. The CPU is relevant again, whereas before it was in danger of merely being in charge of a collection of GPUs, PPUs, and whatever else some stupid upstart company thinks it can accelerate, so long as it needs a lot of FP performance.
There is absolutely nothing in it for AMD to integrate a graphics core into a CPU. Think about it: AMD already owns the ATi graphics business, i.e. it already has integrated graphics. Integrating graphics functionality on the CPU would 1: make design/fab more difficult, 2: reduce the number of parts to sell, and 3: do nothing for its most lucrative customers, business/supercomputing, who don’t even care about graphics. No, better to bring the GPU’s amazing FP throughput onto the CPU, throw whatever’s left onto the northbridge (it already manages a host of diverse interfaces anyway, why not one more), and profit by developing one product line instead of two.
I don’t see this working out, though. I mean, GPUs already produce a lot of heat even on the smallest process nodes. If AMD doesn’t succeed in solving the heat issue, graphics cards as we know them will be here to stay, which doesn’t seem to be what AMD set out to achieve. One thing’s for sure, though – IBM’s Cell processor has influenced many processor architects.




