apoppin
|
#1)
Post subject: Sandra SP4a: Bulldozer/Piledriver fixes and optimisations  Posted: Sat May 12, 2012 10:58 pm |
Joined: Fri Jul 04, 2008 1:26 am Posts: 19664 Location: 404 - Not Found!
|
|
ABT subscribes to SiSoft's Sandra, and it is very good software for testing HW: one of the better and pretty well-thought-out synthetic all-in-one benches. ANYWAY, it has just been updated and I found the email and changelog interesting. The rest of this is a quote (but I don't want to put it in quotation marks). QUOTE>>>>
Change-log:
- Thread Scheduler: You probably know that Sandra has always used its own scheduler. This has allowed Sandra to properly schedule threads before Windows' kernel caught up, e.g. Windows 7/2008 R2 with Hyper-Threading and Windows 8/2012 with Bulldozer.
Sandra works out the system topology itself (Node > Package > Module > Core > Thread) by relying (just like any OS) on the APIC IDs; unfortunately our (albeit limited) testing has found some boards for AMD Bulldozer coming up with "funky" APIC IDs: this causes both Sandra (and Windows) to schedule threads incorrectly.
If you're not bored - check out this example (Gigabyte GA-990FXA-UX):
CPU: Id 0 APIC 0h P 0 C 0 T 0 < module/CU #0
CPU: Id 1 APIC 1h P 0 C 0 T 1
CPU: Id 2 APIC 7h P 0 C 3 T 1 < module #3 here?
CPU: Id 3 APIC 3h P 0 C 1 T 1 < module #1?
CPU: Id 4 APIC 6h P 0 C 3 T 0
CPU: Id 5 APIC 2h P 0 C 1 T 0
CPU: Id 6 APIC 5h P 0 C 2 T 1 < module #2 here?
CPU: Id 7 APIC 4h P 0 C 2 T 0
That's some funky topology there: and it even changes, sometimes, upon reboot:
CPU: Id 0 APIC 0h P 0 C 0 T 0 < module/CU #0
CPU: Id 1 APIC 3h P 0 C 1 T 1 < module #1?
CPU: Id 2 APIC 5h P 0 C 2 T 1 < module #2?
CPU: Id 3 APIC 2h P 0 C 1 T 0
CPU: Id 4 APIC 6h P 0 C 3 T 0 < module #3 here now?
CPU: Id 5 APIC 4h P 0 C 2 T 0
CPU: Id 6 APIC 7h P 0 C 3 T 1
CPU: Id 7 APIC 1h P 0 C 0 T 1
Depending on the OS/kernel (Windows 32-bit and x64 enumerate differently), you expect 0,1,2,3,... or 0,4,6,8,1,... Multi-socket/node systems do have "funky" topology but not single-CPU ones.
What Sandra does now is (also) test the latency between different CPU units (threads) - just like the Multi-Core Efficiency benchmark; then decides upon the topology rather than "trusting the board".
Note: Sandra supports up to 256-threads, across 64-groups. Tested up to 128.
Note 2: As per above, the Multi-Core Efficiency benchmark shows you which CPU units (threads) are "paired" (either HT or Compute Unit) as it goes through all the combinations (see Task Manager during run) before selecting the lowest latency > highest bandwidth pairs.
Note 3: Sandra supports X2APIC ID. Not usually found in single-CPU systems.
- Benchmark: Multi-Media: enabled FMA4 Multi-Media code-path, thus improving Bulldozer performance by over 20% (versus AVX). FMA3 is also supported for Piledriver (and Haswell) which may be a bit faster still.
- Benchmark: Crypto: improved SHA256/SHA1 Bulldozer bandwidth by 50% by rolling back one SNB AVX optimisation. SNB/IVB scores are not significantly affected by this change. Similarly for AVX2, though nothing supports it yet.
- Benchmarks: Memory/Cache Bandwidth, Memory/Cache Latency: enabled large-pages (2MB) on Bulldozer by reading 2MB/TLB correctly. Assuming you granted yourself "lock pages in memory", using 2MB pages improves both performance and reliability by minimising TLB misses when using large memory blocks.
Note: huge pages (1GB) are still not supported by Windows; it is unlikely you could use them except on huge memory systems.
- Benchmarks: Memory Bandwidth, Cache Bandwidth: improved both FMA4 and FMA3 code-paths for better STREAM/Triad performance (versus AVX).
- Benchmark: Memory/Cache Latency: 32-bit/x86: Rolled back assembler > intrinsic change as the x86 compiler decided to generate non-optimal loop for the latency test. x64 compiler worked just fine.
- Fixes: Turbo multiplier detection on Bulldozer where P-States are not as expected (e.g. when overclocked manually or BIOS setting them up incorrectly).
/end quote
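To make the scheduler part of the changelog a bit more concrete, here is a rough sketch of the idea, not SiSoft's actual code. The CPU-to-APIC mapping is copied from the first Gigabyte listing in the quote; the split of an APIC ID into module and thread (`id >> 1`, `id & 1`) is inferred from the C/T columns of that listing. Everything else is an assumption for illustration: the function names are made up, and the latency matrix is hypothetical data standing in for what a real core-to-core latency test would measure.

```python
# Logical CPU id -> APIC id, copied from the first listing quoted above.
APIC_IDS = {0: 0x0, 1: 0x1, 2: 0x7, 3: 0x3, 4: 0x6, 5: 0x2, 6: 0x5, 7: 0x4}


def decode_apic(apic_id):
    """Split an APIC id into (module, thread).

    Assumption: one thread bit, as the C/T columns in the listing suggest.
    """
    return apic_id >> 1, apic_id & 1


def pairs_from_apic(apic_ids):
    """Group logical CPUs into sibling pairs, trusting the board's APIC ids."""
    modules = {}
    for cpu, apic in apic_ids.items():
        module, _thread = decode_apic(apic)
        modules.setdefault(module, []).append(cpu)
    return {frozenset(cpus) for cpus in modules.values()}


def pairs_from_latency(latency):
    """Pair each CPU with its lowest-latency partner; keep only mutual picks."""
    n = len(latency)
    best = {
        a: min((b for b in range(n) if b != a), key=lambda b: latency[a][b])
        for a in range(n)
    }
    return {frozenset((a, b)) for a, b in best.items() if best[b] == a}


def make_latency(sibling_pairs, n, near=20.0, far=100.0):
    """Build a HYPOTHETICAL latency matrix (ns): siblings are 'near',
    every other pair is 'far'. Real numbers would come from a benchmark."""
    matrix = [[far] * n for _ in range(n)]
    for a, b in sibling_pairs:
        matrix[a][b] = matrix[b][a] = near
    for i in range(n):
        matrix[i][i] = 0.0
    return matrix


if __name__ == "__main__":
    board = pairs_from_apic(APIC_IDS)
    # Made-up measurement: pretend neighbouring logical CPUs share a module.
    measured = pairs_from_latency(make_latency([(0, 1), (2, 3), (4, 5), (6, 7)], 8))
    print("board says:  ", sorted(sorted(p) for p in board))
    print("latency says:", sorted(sorted(p) for p in measured))
    print("disagree on: ", sorted(sorted(p) for p in board ^ measured))
```

The point of the second function is that it never looks at the APIC IDs at all, so a board that reports odd IDs (or changes them on reboot) cannot throw it off; the cost is having to run a latency test across every pair of CPU units first.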
Do you think Windows 8 will be able to schedule threads better for Bulldozer?
And do you want me to revisit the FX-8150 when I get back to testing the i7-3770K?
|
|
|
|
 |
|
jaydip
|
#2)
Post subject: Re: Sandra SP4a: Bulldozer/Piledriver fixes and optimisations  Posted: Sun May 13, 2012 12:58 am |
Joined: Mon Mar 26, 2012 11:32 am Posts: 1792 Location: India
|
The new scheduler can get you 5-6% improvement at most, so leave it be.
|
|
|
|
 |
|
BallaTheFeared
|
#3)
Post subject: Re: Sandra SP4a: Bulldozer/Piledriver fixes and optimisations  Posted: Sun May 13, 2012 2:16 am |
Joined: Sun Mar 18, 2012 5:00 pm Posts: 1850 Location: Nv Headquarters
|
|
I'm down for a good laugh.
|
|
|
|
 |
|
apoppin
|
#4)
Post subject: Re: Sandra SP4a: Bulldozer/Piledriver fixes and optimisations  Posted: Sun Jul 22, 2012 6:48 pm |
Joined: Fri Jul 04, 2008 1:26 am Posts: 19664 Location: 404 - Not Found!
|
|
|
|
 |
|
Ocre
|
#5)
Post subject: Re: Sandra SP4a: Bulldozer/Piledriver fixes and optimisations  Posted: Tue Jul 24, 2012 1:29 am |
Joined: Mon Nov 22, 2010 5:22 am Posts: 2113
|
|
|
|
 |
|
grstanford
|
#6)
Post subject: Re: Sandra SP4a: Bulldozer/Piledriver fixes and optimisations  Posted: Tue Jul 24, 2012 1:35 am |
Joined: Sat Apr 24, 2010 11:19 am Posts: 4965
|
|
They wouldn't be so poor if they hadn't squandered 5 billion on ATi, only to have Intel steamroller them in on-CPU graphics anyway.
_________________ This is such total Horse-S**t! "At NVIDIA we know that all shredders are green." --Jensen Huang Adam knew he should have bought a PC, but Eve fell for the marketing hype.
|
|
|
|
 |