
Benchmarks on 6-core Phenom?



malexander

Nov. 4, 2010 16:10:45

I ran the benchmark on my Early 2008 Mac Pro with dual 2.8 GHz Xeons and 8 GB of FB-DIMM memory, but running Ubuntu 9.10 (8 cores, 8 threads, no Turbo Boost; based on the Core 2 series).
8 threads
Render Time: 50:45.00u 21.94s 6:26.09r 51:07t
Memory: 76.00 MB of 76.61 MB arena size. VM Size: 567.92 MB

7 threads
Render Time: 50:37.75u 17.20s 7:18.25r 50:55t
Memory: 67.20 MB of 67.96 MB arena size. VM Size: 491.50 MB

6 threads
Render Time: 50:23.38u 20.45s 8:28.46r 50:45t
Memory: 65.17 MB of 66.11 MB arena size. VM Size: 420.94 MB

5 threads
Render Time: 50:15.28u 20.73s 10:08.71r 50:36t
Memory: 60.92 MB of 61.88 MB arena size. VM Size: 349.36 MB

4 threads
Render Time: 50:07.26u 18.94s 12:37.92r 50:26t
Memory: 56.02 MB of 56.99 MB arena size. VM Size: 275.89 MB

3 threads
Render Time: 49:47.93u 16.97s 16:42.40r 50:04t
Memory: 49.98 MB of 50.79 MB arena size. VM Size: 266.77 MB

2 threads
Render Time: 49:32.92u 10.19s 24:52.12r 49:43t
Memory: 47.50 MB of 48.89 MB arena size. VM Size: 196.95 MB

1 thread
Render Time: 48:35.57u 13.49s 48:48.81r 48:48t
Memory: 50.40 MB of 52.38 MB arena size. VM Size: 68.52 MB

The result is near-perfect linear scaling with real cores. I've attached some graphs of this. (The new ##:##t column is the sum of the u and s numbers, i.e. the total CPU time.)

- ‘render_time’ is the time in seconds that the various threads took to render. The lower the better.
- ‘scaling’ is the speedup over a single thread (ideally 8x at 8 threads), with the ideal curve plotted. The closer the actual curve is to the ideal curve, the better.
- ‘contention’ should really be named the threading overhead. It's the amount of time wasted on thread setup, cleanup, synchronization, waiting, etc. It is expressed as a percentage of the single-threaded time (i.e., 0-4.5%). The lower the better.
- ‘memory’ is the amount of virtual memory used, as reported by the memory command, expressed as a multiple of the single-threaded usage (so ‘4’ means it took 4x the memory). The lower the better.
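To make those graph definitions concrete, here is a small Python sketch that recomputes them from the table above. It assumes ‘contention’ is the extra total CPU time (u + s) relative to the single-threaded run and that ‘memory’ uses the VM Size column; the exact formulas behind the original graphs weren't posted, so the figures may differ slightly from them.

```python
# Timings copied from the Mac Pro table above:
# threads -> (user time, system time, real time, VM size in MB)
RUNS = {
    1: ("48:35.57u", "13.49s", "48:48.81r", 68.52),
    2: ("49:32.92u", "10.19s", "24:52.12r", 196.95),
    3: ("49:47.93u", "16.97s", "16:42.40r", 266.77),
    4: ("50:07.26u", "18.94s", "12:37.92r", 275.89),
    5: ("50:15.28u", "20.73s", "10:08.71r", 349.36),
    6: ("50:23.38u", "20.45s", "8:28.46r", 420.94),
    7: ("50:37.75u", "17.20s", "7:18.25r", 491.50),
    8: ("50:45.00u", "21.94s", "6:26.09r", 567.92),
}


def to_seconds(stamp: str) -> float:
    """Convert a time such as '6:26.09r' or '21.94s' to seconds."""
    value = 0.0
    # Drop the trailing unit letter, then fold the [h:]m:s fields into seconds.
    for part in stamp.rstrip("ustr").split(":"):
        value = value * 60 + float(part)
    return value


def scaling_table(runs):
    """Return {threads: (speedup, overhead %, memory ratio)} relative to 1 thread."""
    user1, sys1, real1, vm1 = runs[1]
    cpu1 = to_seconds(user1) + to_seconds(sys1)  # the 't' column: u + s
    base_real = to_seconds(real1)
    table = {}
    for n, (user, sys_, real, vm) in sorted(runs.items()):
        cpu = to_seconds(user) + to_seconds(sys_)
        table[n] = (
            base_real / to_seconds(real),  # 'scaling': ideal value is n
            100.0 * (cpu - cpu1) / cpu1,   # 'contention': extra CPU time in %
            vm / vm1,                      # 'memory': multiple of 1-thread VM size
        )
    return table


for n, (speedup, overhead, mem) in scaling_table(RUNS).items():
    print(f"{n} threads: {speedup:4.2f}x (ideal {n}x), "
          f"overhead {overhead:4.2f}%, memory {mem:4.2f}x")
```

Under these assumptions the 8-thread run comes out at roughly a 7.6x speedup with about 4.7% extra CPU time.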

Because the overhead in this case is very low and the scaling near-ideal, SMT (hyperthreaded) cores should improve performance. An SMT core improves performance as long as the overall threading overhead doesn't exceed the SMT speedup (which is generally 0-40% over the same core running a single thread).
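The break-even condition in the paragraph above can be written as a one-line model. This is a simplified sketch, not a measured result: it assumes SMT multiplies a core's throughput by (1 + smt_speedup) while running the extra threads multiplies the total work by (1 + thread_overhead), so the net gain is positive exactly when the speedup exceeds the overhead. Both parameter names are hypothetical.

```python
def smt_net_gain(smt_speedup: float, thread_overhead: float) -> float:
    """Net fractional speedup from enabling SMT under a simple throughput model.

    smt_speedup: extra throughput SMT gives a core (the post quotes 0.0-0.4).
    thread_overhead: extra work caused by running the additional threads,
        e.g. a 'contention' figure, as a fraction rather than a percentage.
    """
    return (1.0 + smt_speedup) / (1.0 + thread_overhead) - 1.0


# Illustrative inputs only: a few SMT gains against an assumed 5% overhead.
for gain in (0.0, 0.1, 0.25, 0.4):
    print(f"SMT +{gain:.0%} with 5% overhead -> net {smt_net_gain(gain, 0.05):+.1%}")
```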

The Core i series Hyper-Threading is actually quite good: it has enough memory bandwidth to feed the CPU, enough cache to keep thrashing down, and enough resources within the cores themselves to keep the CPU busy, unlike the old Pentium hyperthreading, which had issues on all those fronts.

Oh, and Peter: if price is an issue, you just got lucky [techreport.com].

pbowmar

Dec. 13, 2010 02:09:57

So, I bought a 1090T (6-core 3.2 GHz AMD Phenom II), and after a week of “which component is dead?” hell (it was one stick of RAM) I have some interesting results…

Benchmark.ifd from this thread, using Houdini 11.0.581 x86_64 gcc4.4 on SUSE 11.3:

7:44

Identical machine, dual-booted to Windows 7 64-bit, 11.0.581 64-bit build:

9:15

I am quite surprised; I had heard Windows 7 had improved its efficiency, but apparently not…
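Peter's two timings translate into a percentage like this (a quick Python check using only the numbers posted above; `pct_slower` is a hypothetical helper name):

```python
def to_seconds(clock: str) -> float:
    """Convert an 'm:ss' wall-clock time to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


def pct_slower(fast: str, slow: str) -> float:
    """How much longer (in %) the slower run took than the faster one."""
    return 100.0 * (to_seconds(slow) - to_seconds(fast)) / to_seconds(fast)


print(f"1090T, Windows 7 vs SUSE: {pct_slower('7:44', '9:15'):.1f}% slower")  # 19.6% slower
```

The same helper applied to Alanw's E5520 numbers later in the thread (5:01 on Linux, 5:40 on Win7) gives about 13%.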

My first attempt to overclock failed (Mantra crashed), so I will likely have to either a) fiddle or b) give up and live with a stock CPU. I suspect b) is more likely.

Cheers,

Peter B

splitpoint

Dec. 13, 2010 14:28:27

Thanks for the followup Peter. I was thinking of picking up a couple of those 1090t's for rendering. It's nice to know that the performance is pretty much identical to my Opteron machine.

Al

goldleaf

Dec. 17, 2010 03:17:58

I just finished building my own Phenom II X6 system, with a 2.8 GHz 1055T and 8 GB of RAM.

Here are my results:

Render Time: 53:44.60u 32.26s 9:24.50r
Memory: 62.64 MB of 63.21 MB arena size. VM Size: 418.01 MB

For how much I paid, this is amazing.

pbowmar

Dec. 17, 2010 03:48:04

Oh, I did get overclocking working. 15% overclock gives me:

6:44

Woot!

Soothsayer

Dec. 28, 2010 23:28:04

Here's my result for a Xeon W3680 on Windows 7 64-bit.

archie

Dec. 29, 2010 08:57:08

Dual Intel Xeon 5520 2.27 GHz (8 cores + 8 Hyper-Threading threads), 12 GB RAM
Windows 7 64-bit, Houdini 11.0.581

Render Time: 5:38.381u 0.436s 5:40.57r
Memory: 435.94 MB of 441.13 MB arena size. VM Size: 516.81 MB

I think a dual Xeon 56xx will be the best option for rendering. Korhon used dual Intel Xeon X5680 3.33 GHz CPUs and got about 2 min 51 s. My result for dual Intel Xeon 5520 2.27 GHz CPUs is about 5 min 40 s. The 55xx series is half as fast as the 56xx series; that's too big a gap.

Soothsayer

Dec. 29, 2010 19:51:32

Yes, it's interesting, Archie. Any idea why? The only difference between the processors seems to be the double I/O bus speed of the 5680 vs. the 3680. Can that really make it render twice as fast (at nearly double the price)?

Alanw

Dec. 29, 2010 21:50:29


There also seems to be a noticeable difference between Linux and Win7. I also have E5520s with Hyper-Threading on. My Linux score is 5:01 (posted on the previous page), while I get around 5:40 on Win7.

A few weeks ago I replaced my Asus Z8PE-D12 with an EVGA Classified SR-2, so I can actually overclock my Xeons now.

I haven't installed the proper cooling yet, but I'm stable at 2.6 GHz with decent temps. This brought my Gentoo time down to 4:39, and Win7 to 5:06.

I'm going for 3+ GHz this weekend. I doubt I can touch 2:51, but it sounds fun to try!

archie

Dec. 30, 2010 01:49:35

I guess that for rendering (at least for this home test) you need to find a compromise between the number of cores and the core frequency (GHz). You can use a few cores at a high frequency or a lot of cores at a low frequency; either way you will get roughly equal results.
For example (on Linux): JColdrick used an Intel(R) Core(TM) i7 CPU 975 @ 3.33GHz and got about 5:42, while Alanw used dual Intel(R) Xeon(R) CPU E5520 @ 2.27GHz and got about 5:01.
You can also see that if the frequency drops, the number of cores becomes critical: Wolfwood used an Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz and got 8:28.
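archie's rule of thumb can be checked with a little arithmetic. If render throughput depended only on cores × clock, the product cores × GHz × render time would be the same for every machine. The sketch below uses the three Linux results quoted above, with physical core counts (all three Intel systems have Hyper-Threading) and nominal clocks; ignoring Turbo Boost, memory and cache differences is an assumption of this sketch.

```python
# (system, physical cores, nominal GHz, Linux render time) from the results quoted above
SYSTEMS = [
    ("Core i7 975", 4, 3.33, "5:42"),
    ("2x Xeon E5520", 8, 2.27, "5:01"),
    ("Core i7 920", 4, 2.67, "8:28"),
]


def to_seconds(clock: str) -> int:
    """Convert an 'm:ss' time to whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def core_ghz_seconds(cores: int, ghz: float, clock: str) -> float:
    """cores x GHz x time: constant if speed scaled purely with cores x clock."""
    return cores * ghz * to_seconds(clock)


for name, cores, ghz, clock in SYSTEMS:
    print(f"{name:14s} {cores * ghz:6.2f} core-GHz, "
          f"{core_ghz_seconds(cores, ghz, clock):6.0f} core-GHz-seconds")
```

The products come out at roughly 4,560, 5,470 and 5,430, so the rule holds only loosely: in this benchmark the i7 975 got noticeably more work done per core-GHz than the other two machines.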

Why did Soothsayer get such a modest result with the W3680? I think it's because the W3680 has just 6 cores (12 threads with Hyper-Threading).

I have also attached results from ixbt.com; perhaps they will interest you (you need to remove the “.hip” extension, because “.ods” attachments are not allowed).

Which system should be used for rendering images, Windows or Linux? I think Linux.

But everything I wrote above may need revising when rendering image sequences on a render farm. I guess the critical factor will then be the speed of your storage and network.

malexander

Dec. 30, 2010 09:22:35

Soothsayer
Yes, it's interesting, Archie. Any idea why? The only difference between the processors seems to be the double I/O bus speed of the 5680 vs. the 3680. Can that really make it render twice as fast (at nearly double the price)?

The W3680 is a single-socket CPU, whereas the X5680 is dual-socket capable. Since the 2:51 was posted with dual CPUs (2x6 SMT cores, or 24 threads), it's rendering twice as fast as the single W3680 (6 SMT cores, 12 threads). The bus speed has a largely negligible effect.
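The thread-count argument above amounts to simple arithmetic. Assuming both CPUs run at the same clock and the render scales perfectly with hardware threads (close to what the Mac Pro table earlier in the thread shows), a single 6-core W3680 should need about twice Korhon's 2:51. The helper names are hypothetical.

```python
def hardware_threads(sockets: int, cores_per_socket: int, smt_ways: int = 2) -> int:
    """Hardware threads = sockets x cores per socket x SMT ways."""
    return sockets * cores_per_socket * smt_ways


def ideal_time(ref_seconds: float, ref_threads: int, threads: int) -> float:
    """Render time predicted by perfectly linear thread scaling at equal clocks."""
    return ref_seconds * ref_threads / threads


dual_x5680 = hardware_threads(2, 6)    # 24 threads
single_w3680 = hardware_threads(1, 6)  # 12 threads
t = ideal_time(2 * 60 + 51, dual_x5680, single_w3680)
print(f"predicted W3680 time: {int(t // 60)}:{int(t % 60):02d}")  # 5:42
```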

The rendering test in question is a highly parallel RT algorithm, very well suited to thread scaling - almost an ideal case. Sims and micropolygon rendering likely won't show the same sort of ideal scaling; they'd benefit more from clockspeed and turbo boost as the scaling plateaus more at higher thread counts. So your choice of hardware really comes down to the type of operation that's bottlenecking your production.
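One standard way to model the plateau described above is Amdahl's law, which the post doesn't name but which captures the same idea: if a fraction p of the work is parallel, the speedup on n threads is 1 / ((1 - p) + p / n). The sketch below estimates p from the Mac Pro 1- and 8-thread times earlier in the thread and compares it with a hypothetical, less parallel workload (p = 0.9 is an illustrative value, not a measurement of any Houdini sim).

```python
def amdahl_speedup(p: float, threads: int) -> float:
    """Amdahl's law: speedup over one thread when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / threads)


def parallel_fraction(speedup: float, threads: int) -> float:
    """Invert Amdahl's law to estimate p from one measured speedup (threads > 1)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / threads)


measured = (48 * 60 + 48.81) / (6 * 60 + 26.09)  # Mac Pro: 1 thread vs 8 threads, real time
p_render = parallel_fraction(measured, 8)          # close to 0.99
p_other = 0.90                                     # hypothetical less-parallel job

for n in (8, 12, 24, 1000):
    print(f"{n:4d} threads: render {amdahl_speedup(p_render, n):6.1f}x, "
          f"p=0.9 job {amdahl_speedup(p_other, n):4.1f}x")
```

With p = 0.9 the speedup can never exceed 10x however many threads are added, whereas a 20% clock increase (an arbitrary example) would cut the time of either job by the same factor of 1.2, assuming the work is CPU-bound.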

Alanw

Dec. 31, 2010 17:30:47

Got my Xeons to 3.168 GHz. Idle temps @ 40C / load @ 60C.

Dropped my initial time of 5:01 down considerably. (Gentoo Linux 64 / 2.6.36 / Gcc 4.4)

Render Time: 58:43.32u 24.92s 3:51.23r
Memory: 121.59 MB of 122.48 MB arena size. VM Size: 1.13 GB

I think 3.5 GHz would be stable as well, but I don't like the idea of my temps going over 60C for extended periods of time.
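A rough way to judge the overclocking results in this thread is to ask how much of the clock increase showed up as render speedup. The sketch below uses Peter's 15% overclock of his 1090T (7:44 to 6:44) and Alanw's Xeons (2.27 GHz to 3.168 GHz, 5:01 to 3:51.23). The helper name is hypothetical, and the model ignores Turbo Boost, memory speed and the motherboard change Alanw mentions.

```python
def to_seconds(clock: str) -> float:
    """Convert an 'm:ss' or 'm:ss.ss' time to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


def clock_efficiency(clock_ratio: float, base_time: str, oc_time: str) -> float:
    """Fraction of the clock increase that became speedup (1.0 = time fell 1:1 with clock)."""
    speedup = to_seconds(base_time) / to_seconds(oc_time)
    return (speedup - 1.0) / (clock_ratio - 1.0)


print(f"1090T +15%:   {clock_efficiency(1.15, '7:44', '6:44'):.2f}")
print(f"E5520 3.168:  {clock_efficiency(3.168 / 2.27, '5:01', '3:51.23'):.2f}")
```

Peter's result tracks the clock almost perfectly (about 0.99), while Alanw's comes out around 0.76.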

archie

Dec. 31, 2010 18:36:19

twod
Sims and micropolygon rendering likely won't show the same sort of ideal scaling; they'd benefit more from clockspeed and turbo boost as the scaling plateaus more at higher thread counts.

Thank you very much for this important remark. I think it will interest many people.


Alanw, your results give hope for the high-end models of the 55xx series. There's not a big difference between your result and the X5680 result. Good news!

Have a happy New Year!

neonbulbs80

Feb. 2, 2011 15:41:00

Just spotted this thread and wanted to try the benchmark.

Testing out Intel's latest Sandy Bridge processor.

i7 2600K, 4 GB RAM, overclocked to 4.2 GHz

Render Time: 6:11.563u 0.499s 6:18.97r
Memory: 290.63 MB of 295.24 MB arena size. VM Size: 402.86 MB

I'd say it's quite fast for a 4-core processor, and the price is just right for my pocket.

Thanks to Peter for sharing the test file.

Cheers.

Nord3d

Jan. 12, 2012 22:07:31

CPU: i5 2500K @ 4.7 GHz
RAM: 8 GB DDR3-1600 dual-channel
Ubuntu 10.10 x64
Houdini 11.1.147 Linux x86_64 gcc4.4

Render Time: 22:59.70u 8.26s 5:49.56r
Memory: 42.53 MB of 43.21 MB arena size. VM Size: 262.68 MB

pbowmar

Jan. 13, 2012 02:38:18

New laptop for my wife:

i7-2670QM 2.2 GHz
Win7 64
8GB RAM
4 cores + 4 hyperthreads

9:36

Cyba

Feb. 11, 2012 05:47:10

First of all, thanks to pbowmar for the benchmark file; I've been looking for a Mantra benchmark for quite a long time.

Here's the result for my
2500K, 4 cores @ 4.5 GHz
8 GB RAM
Win 7 64

Render Time: 6:48.301u 0.187s 6:48.45r
Memory: 220.94 MB. VM Size: 340.21 MB

Does anyone know why rendering on linux is so much faster compared to win7?

I'm planning a little render farm at my home office. Currently I'm running a bunch of different systems (2x Phenom X6 1090T, 1x Q6600, 1x i3 540) with Win7. I'm looking forward to setting up 2 more render slaves with openSUSE. I hope it won't be impossible without any Linux experience.
Does anyone know about the performance of the 8-core AMD Bulldozer in Mantra?
I'm not quite sure which processor is more cost-effective for a Mantra render node. My favourites are the Phenom X6 1090T, FX-8120 and Intel 2600K. On the other hand, the next Intel generation, Sandy Bridge E and Ivy Bridge, will be launched soon…

Best Regards
Hannes

pbowmar

Feb. 11, 2012 10:51:58

Hi Hannes,

First, you're welcome. As Twod has pointed out, this is a single test case that uses threading effectively so doesn't represent all Houdini or Mantra work!

Second, I do not recommend mixing Win and Linux. It can be done, make no mistake, but it's a huge time waster as you track down why textures load sometimes and not others because of arcane Windows issues. Or arcane Linux issues, depending on your experience.

I recommend dual-booting if you need to keep Windows for apps that aren't available on Linux, so you can fire up an all-Linux farm for the overnight renders but boot to Windows for Photoshop (which is really the main non-Linux software).

Just my opinion! If you're not Linux savvy, you'll also waste a lot of time figuring that out. Maybe sucking up the 10% performance hit is worth it for you to just stay Windows for everything?

Cheers,

Peter B

malexander

Feb. 12, 2012 20:21:19

Does anyone know about the performance of the 8-core AMD Bulldozer in Mantra?

I don't have specific numbers, but Bulldozer has worse single-threaded performance than the previous Phenom II processors, and threaded performance similar to the 2500K. Unfortunately, power consumption is much higher, so it's hard to recommend. Piledriver, the successor to Bulldozer, is supposed to improve performance by 10-15%, but unless AMD couples that with serious power improvements, that'll also be a tough sell.

I'm not quite sure which processor is more cost-effective for a Mantra render node. My favourites are the Phenom X6 1090T, FX-8120 and Intel 2600K. On the other hand, the next Intel generation, Sandy Bridge E and Ivy Bridge, will be launched soon…

Since you're at home, you'll want to keep power consumption to a minimum. The 2500K is pretty good at this, and you can also choose components to minimize power - one large HD instead of several, larger DIMMs (2x8GB) rather than more smaller DIMMs (4x4GB), and no discrete GPU unless you're planning on doing OpenCL sims (the new AMD 7970/7950 cards apparently have excellent idle power at ~3W in sleep, 15W in 2D mode, and much improved compute).

Depending on your timeline, you could wait for an Ivy Bridge CPU to further reduce your power consumption, or look to a SB-E 3930K to reduce the amount of time the CPU is at 100% (which is unfortunately out-of-stock almost everywhere due to a VT-d bug that caused Intel to quietly halt shipments and produce a new version that's supposedly shipping soon).
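The power argument in the last two paragraphs boils down to energy = power × time. The sketch below compares two machines over the same wall-clock window, one rendering longer at lower power and one finishing sooner and then idling. All wattages and times are placeholders for illustration, not measurements of any CPU mentioned in this thread, and `energy_wh` is a hypothetical helper.

```python
def energy_wh(load_watts: float, render_s: float,
              idle_watts: float = 0.0, window_s: float = 0.0) -> float:
    """Energy in watt-hours over max(render_s, window_s): load power, then idle power."""
    idle_s = max(0.0, window_s - render_s)
    return (load_watts * render_s + idle_watts * idle_s) / 3600.0


# Hypothetical numbers only: a slower low-power box vs a faster, hungrier one.
slow_box = energy_wh(load_watts=150, render_s=600, idle_watts=50, window_s=600)
fast_box = energy_wh(load_watts=200, render_s=400, idle_watts=50, window_s=600)
print(f"slow: {slow_box:.1f} Wh, fast: {fast_box:.1f} Wh")
```

With these particular placeholder numbers both machines come out at 25 Wh, so whether the faster CPU saves energy depends on how low its idle power is and how much sooner it finishes.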

Hope that helps.

pbowmar

March 7, 2012 21:10:03

pbowmar
My work benchmarks are (gcc 4.1, CentOS 5.2, Houdini 11.0.538)

Intel(R) Xeon(R) CPU E5440 @ 2.83GHz

8 cores no HT

Render Time: 51:34.91u 17.73s 6:31.98r

Same machine, but Houdini 12.0.558 (from the .hip file, so recompiling the shaders with the llvm compiler) time:

8 cores no HT:

Render Time: 17:13.43u 1.82s 2:17.91r
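Comparing the two runs on Peter's E5440 machine, using only the timings quoted above. The parsing helper is a hypothetical convenience, and reading CPU time divided by real time as "average busy cores" is an interpretation, not something Mantra reports.

```python
def to_seconds(stamp: str) -> float:
    """Convert a time such as '51:34.91u' or '1.82s' to seconds."""
    value = 0.0
    for part in stamp.rstrip("ustr").split(":"):
        value = value * 60 + float(part)
    return value


# (user, system, real) for the same scene on the same 8-core machine
H11 = ("51:34.91u", "17.73s", "6:31.98r")  # Houdini 11.0.538
H12 = ("17:13.43u", "1.82s", "2:17.91r")   # Houdini 12.0.558, LLVM-compiled shaders


def summary(run):
    """Return (total CPU seconds, wall-clock seconds) for one run."""
    user, sys_, real = (to_seconds(s) for s in run)
    return user + sys_, real


cpu11, real11 = summary(H11)
cpu12, real12 = summary(H12)
print(f"wall-clock speedup: {real11 / real12:.2f}x")  # 2.84x
print(f"CPU-time reduction: {cpu11 / cpu12:.2f}x")    # 3.01x
print(f"avg busy cores: {cpu11 / real11:.1f} vs {cpu12 / real12:.1f}")
```

Both runs keep most of the 8 cores busy, so the gain comes mainly from total CPU time dropping about threefold rather than from better thread utilisation.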

Woo!