Mac Pro: The perfect workstation

Since 2006, Apple has been doing Intel the favor of building desktops, workstations, and notebooks that make Intel x86 processors look like works of genius. It seems only fair that Intel has returned the gift by custom-engineering an x86 architecture with RISC-like attributes just for Apple's most demanding customers.

Intel completely rearchitected its x86 CPU beyond the core. Most PC users won't notice, but the Nehalem Xeon processor really lets OS X Leopard off its leash. With all 16 logical processors (two CPUs with four cores each and two thread contexts per core) overcommitted with burn-in compute and memory workloads, the "Nehalem" Mac Pro has the headroom to run a full plate of Mac GUI applications with the accustomed responsiveness. The Mac Pro feels like a new machine.

[ For more on Intel's new Xeon CPU, see "Intel's Nehalem simply sizzles" and "Where does Nehalem get its juice?" ]

Frankly, the Nehalem Mac Pro feels like a RISC workstation. The Leopard 10.5.6 OS that ships with the Nehalem Mac Pro is custom-tuned for Nehalem's parallel-friendly redesign and Mac Pro's remarkable power management, so don't let its OS X install disc get mixed in with your others. When Snow Leopard ships, this same machine will be born again with a full 64-bit kernel and new tools, frameworks, and language features that put pervasive parallelism front and center, right where workstation users need it. If you want the full heart-stopping Snow Leopard experience, the Nehalem Mac Pro is where you'll find it.

I already discussed the Mac Pro's extraordinary build quality and design, and I'm going at the nuts and bolts of Nehalem on a parallel track (using Mac Pro, Xserve, and Snow Leopard to do it). That groundwork sets up a very simple review of the Mac Pro itself, for which I had the benefit of testing two units (2.26GHz and 2.93GHz).

More throughput, less filling

The key to meaningful parallelism is throughput. Maximizing throughput is the cornerstone of OS X's architecture, and Intel's Nehalem processor redesign finally puts the hardware on the same page. The 2.93GHz eight-core Nehalem Mac Pro has the highest memory throughput of any two-socket Intel x86 system I've tested, well more than twice that of last year's 3GHz eight-core "Harpertown" Xserve (see review). The 2.26GHz eight-core Mac Pro, which is the entry-level dual socket model at US$3,299, very nearly matches the faster Mac Pro's memory throughput.

These machines establish a new bar in the price/performance/watt trifecta. In its factory configuration with one hard drive and the Nvidia GeForce GT120 GPU, the 2.26GHz eight-core Mac Pro's peak power draw of 240 watts plummets to around 120 watts at idle. I was extremely impressed by OS X's broad and frequent adjustments to power draw during burn-in tests that overcommitted CPU resources. On prior Intel Macs, there was a direct correlation between gross core utilization (as reported by Activity Monitor) and power draw. The Nehalem Mac Pro breaks this relationship.

In the Nehalem Mac Pro, the mix of workload determines the power consumed. Even while all 16 logical cores stayed locked at 100 percent utilization, the Mac Pro would drop power draw by 20 to 40 watts as workload combinations shifted.

STREAM is the industry's accepted standard benchmark for memory throughput, and it also serves well to test the efficiency of the bus between processors in NUMA (non-uniform memory access) architectures like Nehalem. Each Nehalem CPU has an on-board, triple-channel memory controller that drives a dedicated bank of DDR3 memory. If one CPU needs access to memory attached to the other CPU, it has to go across the inter-CPU bus. With a sound NUMA implementation and a NUMA-optimized OS, aggregate memory throughput should increase with the number of CPUs engaged in an operation.

That's the case with the Mac Pro. I configured STREAM to run its tests on a 1.8GB array of memory. On the 2.93GHz Nehalem Mac Pro, hitting that array with one thread yielded a "Triad" throughput score of 8GBps. Running STREAM again with eight threads pulled the second CPU into operation and the Triad score rose to around 20GBps. For contrast, the STREAM Triad scores for the non-NUMA 3GHz eight-core Xserve are 3.4GBps for a single thread and 7.4GBps for eight threads.
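For readers who haven't run STREAM, the sketch below shows roughly what a Triad score measures and how the scaling figures above compare. It is plain Python for illustration only: the real benchmark is compiled code, and timing a Python loop would measure the interpreter rather than the memory bus. The function names are my own, and the throughput numbers are the ones quoted in this review.

```python
# Illustrative sketch of STREAM's Triad kernel and its bandwidth accounting.
# Function names are hypothetical; this is not the STREAM source code.

def triad(b, c, scalar):
    """Triad kernel: a[i] = b[i] + scalar * c[i] for every element."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]

def triad_gbps(n_elements, seconds, bytes_per_element=8):
    """Triad traffic per pass is three arrays' worth: read b, read c, write a."""
    bytes_moved = 3 * bytes_per_element * n_elements
    return bytes_moved / seconds / 1e9

def speedup(after, before):
    """Ratio of two throughput scores."""
    return after / before

# Triad scores quoted in this review, in GBps.
MAC_PRO_1T, MAC_PRO_8T = 8.0, 20.0   # 2.93GHz Nehalem Mac Pro
XSERVE_1T, XSERVE_8T = 3.4, 7.4      # 3GHz eight-core Xserve

print(f"Mac Pro, 1 -> 8 threads: {speedup(MAC_PRO_8T, MAC_PRO_1T):.2f}x")
print(f"Xserve,  1 -> 8 threads: {speedup(XSERVE_8T, XSERVE_1T):.2f}x")
print(f"Mac Pro vs Xserve, 8 threads: {speedup(MAC_PRO_8T, XSERVE_8T):.2f}x")
```

By these figures the Mac Pro gains 2.5x going from one thread to eight, and its eight-thread score is roughly 2.7 times the Xserve's.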

Because so many workstation workloads operate under just this model, with several threads pounding on a shared data set, STREAM is an exceptional predictor of overall workstation performance. The STREAM results were echoed in my 3-D rendering, AVCHD (high-definition H.264 video) stream transcoding, and string search and sort tests. In all of these tests, the 2.93GHz Nehalem Mac Pro outperformed the eight-core Xserve by at least 50 percent. Straight-ahead integer and floating point scores were comparable to those of other Intel Core 2 systems at similar clock speeds, which is to be expected considering the core microarchitecture is largely unchanged. However, the Nehalem Mac Pro's aggressive power management let it handle the same workload with less power.

Let it Snow

The best of the Nehalem Mac Pro is yet to come. Like the iPhone, this is a system that will improve with time at no cost to its owners. Snow Leopard just lights it up in ways that I can't describe (I'm under non-disclosure, but June's not that far away). Even pre-Snow Leopard, turning Intel's version 11 compilers loose on your existing code will produce some surprising results. Mac developers should consider a two-socket Mac Pro a must-purchase, and parallelization of their applications a top priority.

The Mac Pro is available in a variety of configurations. As I see it, the sweet spot is the custom build with two 2.66GHz CPUs, 12GB of RAM, and the hardware RAID controller at $5,699. With Nehalem's bus opened up, disk I/O performance surfaces as a severe bottleneck. A compile of the SPEC CPU2006 benchmark suite (using eight processes) took twice as long on the new Mac Pro without RAID as it did on the old Xserve with RAID.

The strange not-power-of-two memory configuration relates to Nehalem's triple-channel memory controller. Its best performance is derived from attaching three DDR3 DIMMs to each processor. This leaves two DIMM sockets vacant, and there's some controversy over the performance impact of filling them. I created a 14GB configuration by moving two DIMMs from the 2.26GHz test machine to the 2.93GHz unit. I reran the eight-process STREAM tests and got the same results, but your mileage could vary.
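The arithmetic behind that odd total can be sketched in a few lines. The CPU count, channel count, and the two vacant sockets come from the description above; the 2GB DIMM size is my assumption for illustration, chosen because it produces the 12GB configuration mentioned earlier.

```python
# Sketch of the DIMM arithmetic for a two-socket, triple-channel machine.
CPUS = 2
CHANNELS_PER_CPU = 3   # Nehalem's triple-channel memory controller
VACANT_SOCKETS = 2     # sockets left empty in the balanced layout, per the text
DIMM_GB = 2            # ASSUMPTION: illustrative DIMM size, not stated in the review

def balanced_dimms(cpus=CPUS, channels=CHANNELS_PER_CPU):
    """One DIMM per memory channel on every CPU."""
    return cpus * channels

def total_gb(dimms, dimm_gb=DIMM_GB):
    """Total capacity for a given number of equal-size DIMMs."""
    return dimms * dimm_gb

dimms = balanced_dimms()
sockets = dimms + VACANT_SOCKETS
print(f"{dimms} DIMMs in {sockets} sockets = {total_gb(dimms)}GB")
```

Six equal DIMMs can never add up to a power of two, which is why the balanced totals land on figures like 6GB or 12GB.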

Either way, I don't see it as an issue. RAM is upgradable, more easily in the Nehalem Mac Pro than in any other PC, and as higher-density DDR3 DIMMs become available, you can build your own perfect workstation. You can't get a better start toward that end than with the Mac Pro.

This story, "Mac Pro: The perfect workstation," was originally published by InfoWorld.

Copyright © 2009 IDG Communications, Inc.
