Extracting the Most Performance from a Computer

Introductory disclaimer: there are arguably more important aspects to a computer than just its performance. For example, the display (its size, resolution, viewing angles, colour gamut), the keyboard (is it obvious when you’ve pressed a button down?), the mouse (does it move ever so slightly when you click? is it a good shape? Are the buttons easy to reach?) etc. In general the I/O devices attached to a computer make more of a difference to the user experience than the computer itself. But there is always a place for more performance.

Make no mistake, the biggest performance gains will always come from algorithmic optimization of the software. You can overclock, but stability in that case is never a yes-or-no matter. A heavily OCed CPU/GPU may pass 24-48 hours of Prime95/LinX/FurMark yet still only be stable 99.8% of the time, and that little 0.2% of instability may manifest itself as a quiet error that nobody ever detects, until it’s time to chkdsk and you realize something that was not supposed to be written was written to disk. It is not “definitely stable”, nor can you ever rely on it being perfectly stable until you reset it back to stock clocks.

And when overclocking, always go for the maximum clock you can reach without raising the voltage. That means turning off Load Line Calibration (LLC), which is simply a way of minimizing the drop in voltage when the CPU gets busy (you can see this with CPU-Z: load all cores and watch the Core Voltage drop). That drop in voltage is actually normal, and always within design specifications; there’s an Intel document out there for LGA775 CPUs that talks exactly about “Vdroop”, as they call it. Asus motherboards, if you turn LLC on, may overcompensate for this drop and end up pumping more voltage than necessary for stable operation into your CPU, making it run ~15C hotter, shortening its lifespan (my 1055T used to run at 3.7GHz; after almost a year of nonstop Folding@Home at that speed at ~1.525V, it’s now only stable at 3.5GHz), and eating into your power bill. See this page on what exactly affects the lifespan of a CPU. Yes, CPUs have a lifespan. They, like us, will all eventually die.

You may say “but I’ve got a CPU cooler that can deal with it”, but your motherboard’s VRMs will resent you, especially if you’re using a tower cooler that doesn’t blow down onto the motherboard, in which case the VRMs get little to no airflow over their heatsinks. Since dynamic power consumption increases linearly with frequency but roughly with the square of voltage, you reach maximum efficiency when you achieve the maximum stable overclock at stock voltage.
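To see why voltage hurts so much more than frequency, here is a back-of-envelope sketch using the standard dynamic power relation P ∝ C × V² × f. The ratios below are illustrative, not measurements of any particular CPU:

```python
# Dynamic CPU power scales with the square of voltage but only
# linearly with frequency: P ∝ C × V^2 × f.

def relative_power(v_ratio, f_ratio):
    """Dynamic power relative to stock, given voltage and frequency ratios."""
    return (v_ratio ** 2) * f_ratio

# A 10% overclock at stock voltage costs about 10% more power:
print(relative_power(1.00, 1.10))  # → 1.1

# The same 10% overclock with a 10% voltage bump costs ~33% more:
print(relative_power(1.10, 1.10))  # ≈ 1.33
```

That extra ~23% of heat for the same clock speed is what the VRMs (and your power bill) end up absorbing when LLC or a manual voltage bump is in play.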

Proper fan placement always beats more or faster fans. Dust will get into your case anyway, so just use the side intake panel even if you don’t have a dust filter on it. In my experience, a dust filter is more of an extra part of your case that you now have to clean than something that actually reduces dust buildup (OK, it slows it down, but not by much).

I/O Performance
The CPU’s caches transfer data at ~200-300GB/s. DDR3 SDRAM, at best, gives you around 10GB/s of bandwidth; at worst, probably half that. A hard drive that’s seeking, reading many small files (for example, the many DLLs required to load a program, or a fragmented file) will give you ~1MB/s; reading a sequential file (the best case), ~100MB/s. Given that code and data always have to be loaded from something as slow as the hard drive, you can see where the bottleneck is, can’t you? It’s really obvious, right? Yet when I was working in a computer shop in Malaysia, a colleague specced out a computer for a customer with a QX9650, a Radeon HD 4870 X2 and… a Caviar Green 1TB drive. And that was while we had a VelociRaptor right under the glass tabletop. And because he had worked there longer, he had authority and I couldn’t complain.
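To put those figures side by side, here is a rough calculation of how long each tier of the hierarchy takes to move 1GB. The numbers are the ballpark figures quoted above, not benchmarks of any particular machine:

```python
# Rough time to move 1 GB at each tier of the memory/storage
# hierarchy, using the ballpark bandwidth figures from the text.

GB = 1024 ** 3
tiers = {
    "CPU cache (~250 GB/s)":      250 * GB,
    "DDR3 (~10 GB/s)":            10 * GB,
    "HDD sequential (~100 MB/s)": 100 * 1024 ** 2,
    "HDD seeking (~1 MB/s)":      1 * 1024 ** 2,
}
for name, bytes_per_sec in tiers.items():
    print(f"{name:28s} {GB / bytes_per_sec:10.3f} s")
```

The cache finishes in milliseconds, RAM in a tenth of a second, a sequential HDD read in ~10 seconds, and a seek-bound HDD in ~17 minutes. That five-orders-of-magnitude spread is the bottleneck.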

Anyway, there is a plain need for much, much faster storage. Which is why you need a solid state drive, aka SSD, and in the absence of that, lots of RAM. Having both an SSD and lots of RAM would be nice, though, and RAM is dirt cheap these days. Today’s SSDs transfer sequential data at up to ~500MB/s, and deal with random file accesses at 20-50MB/s. But take note: the real advantage of an SSD over a HDD is not the 500MB/s transfer rate! That is only 5x faster than a HDD. You know this already, but Windows and the programs you install on it are not just one big file waiting to be read off the disk like a long todo list. They are many small files, some a few KB, some a few MB, and in this case the SSD is 20-50x faster than a HDD! You can definitely feel the difference.
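A quick sketch of what that means for a program launch. The workload here is hypothetical (500 small files, ~200KB each), and the transfer rates are the figures quoted above:

```python
# Hypothetical program launch: 500 scattered small files of ~200 KB
# each, i.e. ~100 MB of random reads. Rates are the figures from the
# text, not measurements.
total_mb = 500 * 0.2            # ~100 MB of scattered reads

hdd_random_mbps = 1             # seek-bound HDD
ssd_random_mbps = 30            # mid-range of the 20-50 MB/s quoted above

print(f"HDD: {total_mb / hdd_random_mbps:.0f} s")  # → 100 s
print(f"SSD: {total_mb / ssd_random_mbps:.1f} s")  # → 3.3 s
```

A 30x speedup on the workload that actually dominates boot and application load times, versus only 5x on the sequential case the spec sheet advertises.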

The best part is that random read performance hasn’t changed that much from the old days of SSDs (Vertex 1) to the present day (Vertex 4), and even the oldest SSDs can boast a 20x improvement over a HDD (which is really what you’re after). So if you’re planning to install the OS on the SSD, you can save a bit of money by going with an older SSD.

The performance gain from a 100MHz CPU overclock will probably beat anything you get from buying faster RAM, overclocking it, or tweaking the timings. Still, memory speed does affect performance.
DRAM is not like SRAM (used in CPU caches). DRAM cells are arrays of capacitors, and like all capacitors, they lose their charge over time, so they have to be refreshed. Refresh them too often, and you can never get in enough time to read/write them; refresh them too sparingly, and your data may be corrupted. (The CAS/RAS strobes, strictly speaking, are what select columns and rows in those arrays; the timings specify how many clock cycles those operations take.) RAM speed is all about latency: specifically, the lag between the point when you request a piece of data from RAM, and the point when it’s ready for you to read. And then there’s the time it takes to actually read the data into the CPU’s caches. The last part is easy: more MHz, less time taken. The first part is the tricky part. I’m not exactly sure how this works (read Wikipedia for the details), but suffice to say the fastest-clocked DDR3 module is not always the fastest in practice. You need the memory to talk to the CPU faster, but you also need the memory to respond faster in order to see a gain in speed!

This also comes in handy while overclocking. For instance, your RAM may not be able to handle running at a higher clock. So you can try running it at a slower clock but tighter timings, resulting in a net decrease in latency, and thus a net increase in performance. The formula for the actual CAS latency is:
(CAS / frequency (MHz)) × 1000 = latency in ns
where frequency is (transfer rate / 2), otherwise known as the I/O bus clock, due to the DDR signalling scheme.
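Plugging two common DDR3 configurations into that formula shows how the slower-clocked module can come out ahead (the specific modules here are just examples, not recommendations):

```python
# Actual wall-clock CAS latency, per the formula above:
# latency_ns = CAS / (I/O bus clock in MHz) × 1000,
# where the I/O bus clock is half the DDR transfer rate.

def cas_latency_ns(cas, transfer_rate_mt_s):
    io_clock_mhz = transfer_rate_mt_s / 2
    return cas / io_clock_mhz * 1000

# DDR3-1600 CL9 vs DDR3-1333 CL7: the "slower" module wins on latency.
print(cas_latency_ns(9, 1600))  # → 11.25 ns
print(cas_latency_ns(7, 1333))  # ≈ 10.5 ns
```

So DDR3-1333 at CL7 actually responds faster than DDR3-1600 at CL9, even though the latter moves more data per second once the transfer starts.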

Also remember that the L3 cache and memory controller in the Phenom are tied to the CPU/NB clock. Overclock the CPU/NB and the L3 cache gets faster, and so does the memory controller. It makes a difference, as AnandTech pointed out here.