I ran the benchmark on my Early 2008 Mac Pro with dual 2.8GHz Xeons and 8GB of FB-DIMM memory, but running Ubuntu 9.10. (8 cores, 8 threads, no Turbo Boost; the Xeons are based on the Core 2 series.)
8 threads
Render Time: 50:45.00u 21.94s 6:26.09r 51:07t
Memory: 76.00 MB of 76.61 MB arena size. VM Size: 567.92 MB
7 threads
Render Time: 50:37.75u 17.20s 7:18.25r 50:55t
Memory: 67.20 MB of 67.96 MB arena size. VM Size: 491.50 MB
6 threads
Render Time: 50:23.38u 20.45s 8:28.46r 50:45t
Memory: 65.17 MB of 66.11 MB arena size. VM Size: 420.94 MB
5 threads
Render Time: 50:15.28u 20.73s 10:08.71r 50:36t
Memory: 60.92 MB of 61.88 MB arena size. VM Size: 349.36 MB
4 threads
Render Time: 50:07.26u 18.94s 12:37.92r 50:26t
Memory: 56.02 MB of 56.99 MB arena size. VM Size: 275.89 MB
3 threads
Render Time: 49:47.93u 16.97s 16:42.40r 50:04t
Memory: 49.98 MB of 50.79 MB arena size. VM Size: 266.77 MB
2 threads
Render Time: 49:32.92u 10.19s 24:52.12r 49:43t
Memory: 47.50 MB of 48.89 MB arena size. VM Size: 196.95 MB
1 thread
Render Time: 48:35.57u 13.49s 48:48.81r 48:48t
Memory: 50.40 MB of 52.38 MB arena size. VM Size: 68.52 MB
The result is near-perfect linear scaling with real cores. I've attached some graphs of this. (The new ##:##t column is the sum of the u and s numbers, i.e. total CPU time.)
- ‘render_time’ is the time in seconds that the render took at each thread count. The lower the better.
- ‘scaling’ is the speedup over a single thread (ideally 8x at 8 threads), with the ideal curve plotted. The closer the actual curve is to the ideal curve, the better.
- ‘contention’ should really be named threading overhead. It's the amount of time wasted on thread setup, cleanup, synchronization, waiting, etc. It is expressed as a percentage of the single-threaded time (i.e., 0-4.5%). The lower the better.
- ‘memory’ is the amount of virtual memory used, as reported by the memory command, expressed as a multiple of the single-threaded usage (so ‘4’ means it took 4x the memory). The lower the better.
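For anyone who wants to redo the numbers behind the graphs, here is a rough Python sketch. The timings are copied from the runs above; the formulas are my assumptions about how such graphs would be derived (speedup = single-thread real time / real time, overhead = extra CPU time relative to the single-thread CPU time), so the results may differ slightly from the attached graphs. The names `RUNS`, `to_seconds` and `summarize` are made up for this example.

```python
# Timings copied from the post: threads -> (user, system, real).
RUNS = {
    8: ("50:45.00", "21.94", "6:26.09"),
    7: ("50:37.75", "17.20", "7:18.25"),
    6: ("50:23.38", "20.45", "8:28.46"),
    5: ("50:15.28", "20.73", "10:08.71"),
    4: ("50:07.26", "18.94", "12:37.92"),
    3: ("49:47.93", "16.97", "16:42.40"),
    2: ("49:32.92", "10.19", "24:52.12"),
    1: ("48:35.57", "13.49", "48:48.81"),
}


def to_seconds(stamp):
    """Convert 'mm:ss.ss' (or plain 'ss.ss') into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def summarize(runs):
    """Return (threads, real_s, speedup, efficiency, overhead_pct) per run.

    Assumed definitions, not necessarily the ones used for the graphs:
      speedup      = single-thread real time / real time
      efficiency   = speedup / threads (1.0 would be ideal scaling)
      overhead_pct = extra CPU time (u + s) relative to the single-thread run
    """
    base_user, base_sys, base_real = (to_seconds(x) for x in runs[1])
    base_cpu = base_user + base_sys
    rows = []
    for threads in sorted(runs):
        user, system, real = (to_seconds(x) for x in runs[threads])
        cpu = user + system
        speedup = base_real / real
        efficiency = speedup / threads
        overhead_pct = 100.0 * (cpu - base_cpu) / base_cpu
        rows.append((threads, real, speedup, efficiency, overhead_pct))
    return rows


for threads, real, speedup, efficiency, overhead_pct in summarize(RUNS):
    print(f"{threads} threads: {real:7.1f}s real, {speedup:4.2f}x speedup, "
          f"{efficiency:5.1%} efficiency, {overhead_pct:4.2f}% overhead")
```

The memory graph could be rebuilt the same way by dividing each VM Size by the single-thread value.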
Because overhead for this case is very low and scaling is near-ideal, SMT (hyperthreaded) cores will improve performance. An SMT core will improve performance as long as the overall thread overhead doesn't exceed the SMT speedup (which is generally 0-40% over the same core running a single thread).
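As a back-of-the-envelope illustration of that rule, here is a tiny Python sketch. It simply compares an assumed SMT speedup against the threading overhead, both as fractions; the 4.5% overhead is the top of the range quoted above, and the SMT gains are sample values from the 0-40% range, not measurements. The function name `smt_helps` is made up for this example.

```python
def smt_helps(smt_gain, thread_overhead):
    """Rule of thumb from the post: SMT pays off only when its speedup
    exceeds the threading overhead. Both arguments are fractions
    (0.20 means 20%)."""
    return smt_gain > thread_overhead


# Worst-case overhead quoted in the post (4.5% of single-threaded time).
OVERHEAD = 0.045

# Sample SMT gains picked from the 0-40% range; illustrative only.
for gain in (0.0, 0.02, 0.20, 0.40):
    verdict = "helps" if smt_helps(gain, OVERHEAD) else "does not help"
    print(f"SMT gain {gain:.0%} vs overhead {OVERHEAD:.1%}: {verdict}")
```

This is deliberately crude: it ignores that the overhead itself may grow when more threads are added.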
The Core i series hyperthreading is actually quite good; it has enough memory bandwidth to feed the CPU, enough cache to keep thrashing down, and enough resources within the cores themselves to keep the CPU busy – unlike the old Pentium hyperthreading which had issues on all those fronts.
Oh, and Peter – if price is an issue, you just got lucky [techreport.com].
