DirectCompute Benchmark 0.35
Our very own Pat has released a new version of his DirectCompute Benchmark. This small tool now lets you benchmark both the DirectCompute and OpenCL general-purpose computing APIs by processing tons of FFT-like data and performing some memory transfers. This version adds full OpenCL support, shader profile selection, version reporting, and some basic result validation.
DirectCompute is an application programming interface that takes advantage of the massively parallel processing power of a modern graphics processing unit to accelerate PC application performance in Microsoft Windows Vista or Windows 7. DirectCompute is part of the Microsoft DirectX collection of APIs.
OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. OpenCL includes a language for writing kernels, plus APIs that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism.
Minimum Requirements
• Windows Vista SP2 or Windows 7
• DirectX 11
• Latest DirectX End-User Runtimes
• GPU with DirectX 10 support or above.
Recommended Drivers
• ATI Catalyst 9.12
• ATI Stream SDK 2.0 Beta 4 with OpenCL Drivers
• Nvidia ForceWare 195.81
Download this file in our downloads section.
Last edited by Regeneration; December 18th, 2009 at 02:21 AM..
| | | | 53 Comments | | | Nice, keep the great work up!
EDIT: How did you get those OpenCL scores??? Did not work on my 5870 with cat 9.11?
EDIT2: Oh nevermind, I saw now that you need the beta OpenCL drivers. Will test soon with those drivers, thx anyways..  |
Last edited by HaZe303; November 24th, 2009 at 04:35 AM..
| | | | nice way to compare ati vs nvidia then  | | | | Quote:
Originally Posted by HaZe303 Nice, keep the great work up!
EDIT: How did you get those OpenCL scores??? Did not work on my 5870 with cat 9.11?
EDIT2: Oh nevermind, I saw now that you need the beta OpenCL drivers. Will test soon with those drivers, thx anyways..  | For ATI, you need Catalyst 9.11+ (beta or 'full') AND the ATI Stream v2.0 beta4 SDK installed. ATI Catalyst does not include OpenCL.dll (ForceWare 195+ does), so you need this SDK to get it working. It's not a big deal: the SDK is about 32MB and you can install just the 'dev' part (13MB) without the samples. With the default x86 settings it installs OpenCL.dll in C:\Program Files\ATI Stream\bin\x86 and adds this path to the system PATH variable.
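If the benchmark reports no OpenCL support, one quick thing to check is whether OpenCL.dll is actually reachable through PATH. The snippet below is only a rough sketch of such a check in Python; the function name `find_on_path` is made up for this example, and it assumes the benchmark finds the DLL through the normal PATH lookup, which the thread does not confirm.

```python
import os
from pathlib import Path


def find_on_path(filename, path_value=None):
    """Return every location on the search path where `filename` exists.

    `path_value` defaults to the current PATH environment variable;
    passing it explicitly makes the function easy to try out.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # PATH entries are sometimes quoted
        if not entry:
            continue
        candidate = Path(entry) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Print each copy found together with its size in bytes.
    for hit in find_on_path("OpenCL.dll"):
        print(hit, hit.stat().st_size, "bytes")
```

More than one hit would mean several copies of the DLL are visible, and the first one listed is the one a plain PATH lookup would pick up.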
The screens with HD5xxx scores are not mine  I own only 4890  | | | | i know someone in my neighbour village who sells a 5870 for 360 euro, used 
although you cannot get hands on one here, it seems noone is interested in it too *lol* | | | | Quote:
Originally Posted by lumo i know someone in my neighbour village who sells a 5870 for 360 euro, used 
although you cannot get hands on one here, it seems noone is interested in it too *lol* | I'm a big fan of Vapor-X solutions from Sapphire, my Toxic 4890 is really great and I'm waiting for 5870 Toxic/Atomic edition. They already released Vapor-X version but I'll wait for Toxic  I can find ASUS 5870 (new) for 390 euro on our auction service (allegro.pl) but it's still a reference design. (no OC, hot and noisy) | | | | Can someone plz explain how scores are interpreted?i see Letters and numbers as well in scores,tell me how it works...Thnx in advance  | | | | Quote:
Originally Posted by 3dnab Can someone plz explain how scores are interpreted?i see Letters and numbers as well in scores,tell me how it works...Thnx in advance  | The letters only label the values, so when you post or compare results like "D1234/C567/M89" it is clear that 1234 is the DirectCompute result, 567 is the OpenCL result and 89 is the CPU result. No more confusing M/S subresults  The numbers are (or should be, it's still a 0.xx version) comparable: the code path is the same for CPU/GPU and for DirectCompute/OpenCL.
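For anyone collecting posted results, here is a small Python sketch that splits a result string such as "D1234/C567/M89" into labelled numbers. The `parse_scores` helper and the `LABELS` table are made up for this example, and accepting decimal values is an assumption in case scores are not always whole numbers.

```python
import re

# Letter prefixes as explained above: D = DirectCompute, C = OpenCL, M = CPU.
LABELS = {"D": "DirectCompute", "C": "OpenCL", "M": "CPU"}


def parse_scores(text):
    """Turn a string like 'D1234/C567/M89' into {'DirectCompute': 1234.0, ...}."""
    scores = {}
    for part in text.split("/"):
        match = re.fullmatch(r"\s*([DCM])(\d+(?:\.\d+)?)\s*", part)
        if match is None:
            raise ValueError(f"unrecognized score field: {part!r}")
        letter, value = match.groups()
        scores[LABELS[letter]] = float(value)
    return scores


if __name__ == "__main__":
    print(parse_scores("D1234/C567/M89"))
```

A missing field (for example a result with no OpenCL score) simply produces a dictionary without that key.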
The tool is still under development, so you should not compare results from different benchmark versions. | | | | Hi, I installed the AMD SDK beta4 in Windows7(7100)/4870, CAT911final, but the utility show me "No OpenCL Support". I copied the OpenCL_ATI.DLL in directory OpenCL, too.. but fail yet... what i do now? | | | | Quote:
Originally Posted by Gorgeous Hi, I installed the AMD SDK beta4 in Windows7(7100)/4870, CAT911final, but the utility show me "No OpenCL Support". I copied the OpenCL_ATI.DLL in directory OpenCL, too.. but fail yet... what i do now? | I think only hd5xxx series support OpenCL in drivers at the moment, other series will be supported in the future? Correct me if im wrong guys? | | | | Quote:
Originally Posted by HaZe303 I think only hd5xxx series support OpenCL in drivers at the moment, other series will be supported in the future? Correct me if im wrong guys? | OpenCL should also work for HD4xxx, the list of all supported GPUs is here.
The problem is with DirectCompute feature on HD4xxx which is not supported by the drivers yet. | | | | Hi Pat,
Are the same calculations done on Direct compute and OpenCL in this benchmark? If so, is it safe to reason then that OpenCL is twice as efficient as Direct compute? Also, is the CPU implementation a "best case" cpu implementation with SSE optimizations and say OpenMP (or equivalent), or is it the same OpenCL/Direct Compute code just running on the CPU?
Basically I would just like to know if the scores are fair comparisons. Nice work btw.
Thanks | | | | Quote:
Originally Posted by Gorgeous Hi, I installed the AMD SDK beta4 in Windows7(7100)/4870, CAT911final, but the utility show me "No OpenCL Support". I copied the OpenCL_ATI.DLL in directory OpenCL, too.. but fail yet... what i do now? | It seems it should work for you but I'm not sure about OpenCL support in win7 RC - can someone confirm that it's working on build 7100?
HD4870 supports OpenCL, you installed the latest 9.11 catalyst and ATI Stream v2.0 beta4 so the only thing I wonder is your OS. Is this 32bit RC1? And tell me where OpenCL.dll was installed? In the Program Files or system32 folder? Is this file on your system PATH? (Stream installer adds OpenCL.dll to the path but maybe it failed for you) | | | | Quote:
Originally Posted by TheBob Hi Pat,
Are the same calculations done on Direct compute and OpenCL in this benchmark? If so, is it safe to reason then that OpenCL is twice as efficient as Direct compute? Also, is the CPU implementation a "best case" cpu implementation with SSE optimizations and say OpenMP (or equivalent), or is it the same OpenCL/Direct Compute code just running on the CPU?
Basically I would just like to know if the scores are fair comparisons. Nice work btw.
Thanks | That's my goal for the final 1.0 version, and I think I'm very close. The code is the same for DirectCompute, OpenCL and CPU. The only difference is that DirectCompute (HLSL) uses "inout" parameters (no pointers) while the OpenCL/CPU code uses pointers, but I'm sure it doesn't make a big difference; the DX compiler probably generates more or less the same code as the OpenCL compiler.
The total number of calculations done is not the same (especially for the CPU, that would make you wait hours to complete  ), but the score is calculated using the simple formula total_number_of_calculations/calculations_time, so this should be compensated for. The benchmark runs 50 series of calculations (for any API) and uses some simple statistics formulas (standard deviation) to provide stable results.
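To make the scoring idea concrete, here is a rough Python sketch of a rate-based score with a simple outlier filter. The rate formula is the one given above; how the standard deviation is actually used in the benchmark is not described in this thread, so dropping runs that fall more than one standard deviation from the mean is purely an assumption, and the names `run_score` and `stable_score` are made up for this example.

```python
import statistics


def run_score(calculations, seconds):
    """Score of a single run: calculations done per unit of time."""
    return calculations / seconds


def stable_score(rates, max_sigma=1.0):
    """Average the per-run rates after discarding outliers.

    Assumption: a run counts as an outlier when it lies more than
    `max_sigma` standard deviations away from the mean of all runs.
    """
    mean = statistics.fmean(rates)
    sigma = statistics.pstdev(rates)
    if sigma == 0:
        return mean
    kept = [r for r in rates if abs(r - mean) <= max_sigma * sigma]
    # Fall back to the plain mean if rounding ever leaves nothing to average.
    return statistics.fmean(kept) if kept else mean


if __name__ == "__main__":
    # 50 hypothetical runs: 49 steady ones and one that was disturbed.
    rates = [run_score(1000, 10.0)] * 49 + [run_score(1000, 100.0)]
    print(stable_score(rates))
```

Because each run is turned into a rate first, a device that does fewer calculations in total (such as the CPU) can still be compared with one that does more.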
There are some more technical details that influence benchmark results, like the way you allocate and use memory on the GPU, the number of threads you can dispatch at once (using one API call), and the way you synchronize and read computing results back from GPU memory, but those I can't change. What I can do is use the APIs in the best way, and that's what I'm working on now (especially the DirectCompute part), so in the next releases DirectCompute scores could be higher. | | | | Installed OCL BETA software and when I run the benchmark in OCL mode i get this error  | | | | Quote:
Originally Posted by Gorgeous Hi, I installed the AMD SDK beta4 in Windows7(7100)/4870, CAT911final, but the utility show me "No OpenCL Support". I copied the OpenCL_ATI.DLL in directory OpenCL, too.. but fail yet... what i do now? | May be you can try the beta driver released with SDK beta4. | | | | Quote:
Originally Posted by Pat OpenCL should also work for HD4xxx, the list of all supported GPUs is here.
The problem is with DirectCompute feature on HD4xxx which is not supported by the drivers yet. | Thx pat for clearing that up, although wasnt that what I said??  | | | | Quote:
Originally Posted by HaZe303 Thx pat for clearing that up, although wasnt that what I said??  | Aaaaa, now I see "I think only hd5xxx series support OpenCL in drivers at the moment"
I didn't check the latest driver packages for HD5xxx. I must check if this DLL works with HD4xxx. Maybe it does  | | | | Quote:
Originally Posted by zme-ul Installed OCL BETA software and when I run the benchmark in OCL mode i get this error  | What OS? Did you install Catalyst 9.11 beta or 'full'? | | | | Win7 32bit, Radeon 4670, beta drivers that come with Stream beta
App crashes:
Problem signature:
Problem Event Name: APPCRASH
Application Name: DirectComputeBenchmark.exe
Application Version: 0.3.5.1
Application Timestamp: 4b0b38cb
Fault Module Name: aticaldd.dll
Fault Module Version: 6.14.10.467
Fault Module Timestamp: 4ae5df84
Exception Code: c0000005
Exception Offset: 000484bb
OS Version: 6.1.7600.2.0.0.256.4
Locale ID: 1026
Additional Information 1: 0a9e
Additional Information 2: 0a9e372d3b4ad19135b953a78882e789
Additional Information 3: 0a9e
Additional Information 4: 0a9e372d3b4ad19135b953a78882e789 | | | | Quote:
Originally Posted by Pat What OS? Did you install Catalyst 9.11 beta or 'full'? | 9.11 drivers (full) & W7 retail package fully updated | | | | Quote:
Originally Posted by chavv Win7 32bit, Radeon 4670, beta drivers that come with Stream beta
App crashes:
Problem signature:
Problem Event Name: APPCRASH
Application Name: DirectComputeBenchmark.exe
Application Version: 0.3.5.1
Application Timestamp: 4b0b38cb
Fault Module Name: aticaldd.dll
Fault Module Version: 6.14.10.467
Fault Module Timestamp: 4ae5df84
Exception Code: c0000005
Exception Offset: 000484bb
OS Version: 6.1.7600.2.0.0.256.4
Locale ID: 1026
Additional Information 1: 0a9e
Additional Information 2: 0a9e372d3b4ad19135b953a78882e789
Additional Information 3: 0a9e
Additional Information 4: 0a9e372d3b4ad19135b953a78882e789 | Please try the official 9.11 catalyst drivers (do not uninstall Stream SDK, just update the driver). If it's still crashing (I assume it crashes on load, not during the benchmark?) send me the version and size (in bytes) of your OpenCL.dll
EDIT: Can you run any other Stream/OpenCL apps/samples without problems?? |
Last edited by Pat; November 25th, 2009 at 10:04 PM..
| | | | Quote:
Originally Posted by Unregistered May be you can try the beta driver released with SDK beta4. | Hi!!
I tested with beta driver provided with SDK beta 4, and with cat911 full. My Win7 is 64-bits, 7100-RC. My samples in directory x86_64 of OpenCL , is ready and running with both drivers.... With bench utility, I put OpenCL.dll in the current patch, in the system32, in directory OpenCL... no funcional, yet.... | | | | Quote:
Originally Posted by zme-ul 9.11 drivers (full) & W7 retail package fully updated | Ok, so probably your DirectX is also updated. Do you overclock your card? If not I must investigate it, maybe it's a problem with X2 cards.
Can someone with ATI X2 card confirm (or not) this issue?
EDIT: One more thing - do you have "Visual C++ 2008 Redistributable Package (x86)" installed? It's required too but probably you already got it. |
Last edited by Pat; November 25th, 2009 at 10:00 PM..
| | | | @Pat
yep, I got that too; failed to mention W7 is in x86-64 flavor
and no, the card is not overclocked |
Last edited by zme-ul; November 26th, 2009 at 06:27 PM..
| | | | Quote:
Originally Posted by Pat Please try the official 9.11 catalyst drivers (do not uninstall Stream SDK, just update the driver). If it's still crashing (I assume it crashes on load, not during the benchmark?) send me the version and size (in bytes) of your OpenCL.dll
EDIT: Can you run any other Stream/OpenCL apps/samples without problems?? | yes, I first tried 9.11, but the crash seems same - the app starts, few cmd prompts blink on screen and after 2-3 secs it crashes, ie looks like it starts.
Some examples from SDk do start - almost all CAL examples work, but only few OpenCL do - most opencl fail with error creating surface (?!)
And for fun - I tried running DC0.35 on the on-board NV 8100 - and it worked just fine in direct compute - got D2949 (cpu, which is an old sempron 3000+, was 10-20% utilized by the bench), opencl path started, got cpu loaded at 100%, system barely responsive and after waiting 2 minutes I killed it, perhaps it was running, just overloaded the cpu  | | | | It's working fine for me except for the directcompute benchmark which isn't working but that's normal cause I only have a 4870. I have a nice score i think comparing to some nvidia card like 285.
I have C290914 to the opencl benchmark.
see you guys | | | | Quote:
Originally Posted by Gorgeous Hi!!
I tested with beta driver provided with SDK beta 4, and with cat911 full. My Win7 is 64-bits, 7100-RC. My samples in directory x86_64 of OpenCL , is ready and running with both drivers.... With bench utility, I put OpenCL.dll in the current patch, in the system32, in directory OpenCL... no funcional, yet.... | Can you run the demo in sdk samples?There is no need to move OpenCL.dll to other folders, you should be able to run the bench after installation.May be your OS is the problem. I get C216721 with my HD4850. | | | | Some results:
HD4850: C216721
HD5870: D130560,C525767
8800GT: D15xxx,C49xxx
GTX285: D14102,C69866
It seems that this bench has some problems with NV cards or AMD cards. Scores are too low on NV cards or too high on AMD cards. | | | | OpenCL works only with the OpenCL beta Catalyst drivers for me. What Pat mentioned earlier is wrong: the 9.11 drivers and newer do not work with the OpenCL test. At least not for me. | | | | Quote:
Originally Posted by Pat Please try the official 9.11 catalyst drivers (do not uninstall Stream SDK, just update the driver). If it's still crashing (I assume it crashes on load, not during the benchmark?) send me the version and size (in bytes) of your OpenCL.dll
EDIT: Can you run any other Stream/OpenCL apps/samples without problems?? | I'm using correct OpenCL from ATi SDK , its 6+MB, i simply searched, renamed NV opencl (which is under 100KB), tried running OpenCL apps.
Most opencl samples from SDK work, tho several fail. All CAL examples work. | | | | you think you might do a complete system score benchmark anytime soon. like test a combined cpu and what ever video cards you have in your rig. I would like to see what a combined score I have with my phenom II, hd5850 and gtx260 | | | | Quote:
Originally Posted by Shroomalistic you think you might do a complete system score benchmark anytime soon. like test a combined cpu and what ever video cards you have in your rig. I would like to see what a combined score I have with my phenom II, hd5850 and gtx260 | I've already done Combined OpenCL and am going to implement a Combined DirectCompute benchmark. This will be available in the next beta (0.40b). This version will include:
1) Extended verification (realtime checksum checks, not only the final sum at the end)
2) Combined benchmarks (with workload management) but only for GPUs (it works fine for my 4890+9800 combo)
3) Final scoring system: the number you get is mega kernels per second. Finally the numbers are 100% comparable between different APIs, combined benchmarks, etc.
Expect this beta in a day or two  | | | | here my total score so far, this is my phenom II 920 at stock speeds and I think the video cards are even stock. | | | | Hi!
I finally updated the benchmark! You can get the new 0.40b version here.
This time I need your help to update the benchmark to non-beta status:
1) Please try the combined benchmarks. I could only try it for OpenCL and it works fine but the 'Combined DirectCompute' is completely untested. Combined benchmarks need more than one GPU (one could be ATI and one NVidia - doesn't matter). I'm not sure how it works (if it works) for dual-GPU cards (see next point)
2) I need some testers with dual-GPU cards like HD4870 x2 or GTX295. If you have some time to test, write me a private message 
3) I also need someone with two identical cards (anyone with three or four??  | | | | combined opencl works fine, combine directcompute errors saying cant find shader and then wont let me close it.
im using a gtx260 and a 5850, probly doesnt work because 5850 is 5.0 while the gtx is 4.0
EDIT:
got combined directcompute to work by setting the 5850 to 4.0 manually
for combined directcompute the workload is seperated 65% 5850 and 35% gtx260
for combined opencl its 85% 5850 and 15% gtx 260 |
Last edited by Shroomalistic; December 4th, 2009 at 02:30 AM..
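A split like the 65%/35% one reported above is roughly what you would get if each GPU received work in proportion to its own throughput. How the benchmark's workload management really divides the work is not explained in this thread, so the Python sketch below is only an assumption; the `split_workload` function and the device names and rates in the example are hypothetical.

```python
def split_workload(total_items, throughputs):
    """Divide `total_items` between devices in proportion to their throughput.

    `throughputs` maps a device name to its measured single-device rate.
    Shares are rounded down and any leftover items go to the fastest device.
    """
    total_rate = sum(throughputs.values())
    shares = {
        name: int(total_items * rate / total_rate)
        for name, rate in throughputs.items()
    }
    remainder = total_items - sum(shares.values())
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += remainder
    return shares


if __name__ == "__main__":
    # Hypothetical single-device rates, not measured values.
    print(split_workload(1000, {"gpu_a": 65.0, "gpu_b": 35.0}))
```

With a proportional split both devices should finish at about the same time, which is the point of balancing the load in the first place.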
| | | | Quote:
Originally Posted by Shroomalistic EDIT:
got combined directcompute to work by setting the 5850 to 4.0 manually
for combined directcompute the workload is seperated 65% 5850 and 35% gtx260
for combined opencl its 85% 5850 and 15% gtx 260 | Thanks for testing! You are probably the first to run the combined DirectCompute test
And the issue with profiles will be fixed today. I forgot that the cards could have different profiles and should be set independently for each card
PS. Could you post the combined and single scores? I wonder if combined DirectCompute is faster (it seems a bit faster generally) than combined OpenCL. | | | | Core 2 Duo E8500
Geforce 260 @216sp
no overclocking
DirectCompute CPU load - about 0%
OpenCL CPU load - 50%(not one core 100% and another 0%, but it loads both of them at 50% overall).
BTW please add "Bench 'em all" button or combobox option, so that all results will be available in one click. | | | | | | Program crashes with a 295 and 280 installed on the OpenCL test.
Only the 295 is displayed in the drop down menu, but my 280's info is displayed down below.
As soon as I start the OpenCL benchmark, program aborts. | | | | Quote:
Originally Posted by MASTAN | Thanks for testing!  Wow, you got Cypress at 975MHz  | | | | Version 0.41b is, as usual, here.
Bugfixes for combined DirectCompute test. Still need any dual-GPU tester  | | | | Quote:
Originally Posted by MASTAN BTW please add "Bench 'em all" button or combobox option, so that all results will be available in one click. | Will be in the next release  | | | | Pat I was just wondering in your professional opinion why is there such a huge gap with the scores between the gtx's and radeons in this benchmark? for example someone in an earlier post got 461 with CS 4 on a gtx260 and i get 7182 CS 4 @ stock hd 5850?
my results:
Q9550 @ 3.31, XFX Radeon HD 5850 @ 938/1250, v.41b
DC/CS5: 9303.9
OCL: 4360.3
CPU: 25.2 |
Last edited by thehippo; December 5th, 2009 at 08:10 AM..
| | | | Quote:
Originally Posted by Pat Thanks for testing!  Wow, you got Cypress at 975MHz  | Not me actually, some user from Russian forum.  Just reposted it here.
In 0.41b DirectCompute and OpenCL tests make system almost non-responsive. In 0.40b system remains responsive, just slow(try to move window of any other app).
DC & OpenCL tests take much longer time now.
In CPU test after score appears and Benchmark button is available again, 100% CPU load does not drop for some time(~30 sec). Pressing Benchmark button again during this high CPU load hangs program: it writes "CPU: init...", then "CPU: running 0%"(sometimes it makes to non-zero value), CPU load drops to 0 and no progress after that. Pressing close or Stop button makes program show "please wait..." and it hangs(could not be closed or moved, though edit fields are selectable).
Pressing Benchmark after CPU load drops to 0% makes program run normally. | | | | Quote:
Originally Posted by MASTAN In 0.41b DirectCompute and OpenCL tests make system almost non-responsive. In 0.40b system remains responsive, just slow(try to move window of any other app).
DC & OpenCL tests take much longer time now. | I've changed the way I call the DirectCompute and OpenCL APIs. I used to call the GPU code 128 times; now I call it only 16 times, but each call does 32 times more calculations (like 32 of the old API calls). That makes the tests longer (about 4 times) *BUT* the API overhead no longer affects the final scores so much. Since the score is based on the number of kernels done per second, the scores should be more or less the same as in 0.40b (but less dependent on API overhead)
I'll look closer at this system overload during the tests, but it could be because 100% of the GPU is now doing calculations, leaving no time for updating the desktop elements (the CPU remains almost idle during the test) Quote:
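The arithmetic behind the batching change can be checked with a toy model: 16 calls doing 32 units of work each is 512 units, four times the 128 units done by 128 single-unit calls, while the fixed per-call cost is paid only 16 times instead of 128. The Python sketch below assumes that every API call costs a constant overhead on top of the work itself; the `measured_rate` function and the cost values are hypothetical and chosen only for illustration.

```python
def measured_rate(calls, work_per_call, unit_time, call_overhead):
    """Work units per second when every API call pays a fixed overhead."""
    total_work = calls * work_per_call
    total_time = calls * call_overhead + total_work * unit_time
    return total_work / total_time


# Hypothetical costs in seconds, not measured values.
UNIT_TIME = 1.0
CALL_OVERHEAD = 0.5

old_rate = measured_rate(128, 1, UNIT_TIME, CALL_OVERHEAD)   # 128 small calls
new_rate = measured_rate(16, 32, UNIT_TIME, CALL_OVERHEAD)   # 16 large calls
work_ratio = (16 * 32) / (128 * 1)

print(f"work ratio: {work_ratio:g}x")
print(f"old rate: {old_rate:.3f}, new rate: {new_rate:.3f}, ideal: {1 / UNIT_TIME:.3f}")
```

In this model the rate with fewer, larger calls sits much closer to the overhead-free ideal, which matches the idea of a score that depends less on API overhead.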
Originally Posted by MASTAN In CPU test after score appears and Benchmark button is available again, 100% CPU load does not drop for some time(~30 sec). Pressing Benchmark button again during this high CPU load hangs program: |
Does this issue affect only CPU test or also OpenCL/DC ? | | | | Quote:
Originally Posted by Pat Does this issue affect only CPU test or also OpenCL/DC ? | Graphics tests take so much time in 0.41b(~18 minutes for DirectCompute, about so for OpenCL on GF260) that it's hard to repeat it several times. I tried only once, and there was no hanging. On the other hand CPU test is much faster(~1 min on E8500), so I tested it multiple times, it reproduces every time.
Some additional info. Process Explorer shows that when CPU test starts, 12 additional threads are created that start consuming CPU. But after score is shown they are not closed or stop working. Only about 30 seconds later they begin to disappear until only 1(main) thread remains. If I press Benchmark before they gone then 12 more threads appear(resulting in 25 threads), some time later first 12 additional threads are closed, but new threads don't consume CPU at all(stack shows they stop at GetMessageA system function).
One more thing. Pressing Stop/Benchmark button does not reproduce this bug. Additional threads disappear right after I press Stop. Tried that many times with different progress value. | | | | I reinstall my sdk and sample, now, the OpenCL support is YES!! But, I get this error after click on Benchmark:
Could not build OpenCL programa: Error 0
CL Compilation failed:
The system cannot find the path specified.
m
I have a 4870, Win7 is 64-bits, 7100-RC, Cat911 last.... The sdk sample is fine. | | | | Quote:
Originally Posted by MASTAN Graphics tests take so much time in 0.41b(~18 minutes for DirectCompute, about so for OpenCL on GF260) that it's hard to repeat it several times. I tried only once, and there was no hanging. On the other hand CPU test is much faster(~1 min on E8500), so I tested it multiple times, it reproduces every time.
Some additional info. Process Explorer shows that when CPU test starts, 12 additional threads are created that start consuming CPU. But after score is shown they are not closed or stop working. Only about 30 seconds later they begin to disappear until only 1(main) thread remains. If I press Benchmark before they gone then 12 more threads appear(resulting in 25 threads), some time later first 12 additional threads are closed, but new threads don't consume CPU at all(stack shows they stop at GetMessageA system function).
One more thing. Pressing Stop/Benchmark button does not reproduce this bug. Additional threads disappear right after I press Stop. Tried that many times with different progress value. | Thanks for your time and bug reports!  It seems I really messed up v0.41b. The fixed 0.42b version is, as usual, here. (I really need to start a new thread dedicated to the tool)
I know there are two issues in 0.42b. The first is that the driver version is not always recognized; I have one bug report from a Vista x86 user. The second (from the same user  ) is that the AMD Phenom II X4 940 CPU is recognized as "AMD unknown processor" in the Results window.
The first issue will be fixed; the second probably will not. I'm taking the CPU name from the Device Manager (where it reads "AMD unknown processor" for this user) and I don't want to build a CPU-Z-like engine just to report a proper CPU name  I'll look for some free CPU identification library, but it's not a priority for me right now  | | | | Hi,
I am still unable to launch the Benchmark on My Mobility HD 4650 with 9.11 Modded Driver.
7, 64 Bit. | | | | I've just created a dedicated thread for DirectCompute benchmark
It's here so please post any comments/questions/bugs there. | | | | Just tried this on my x2 5600(Vista64hp sp2) with hd4770 under the latest Cat9.12 (running cs4.1) and got D117717/M601.
Only slightly over-clocked card, but the figures still impressive.
Not too shabby for a 3-year old computer.
JBV^_^ | | | | AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ (2 logical CPUs)
ATI Radeon HD 4800 Series @ 750 MHz (1002 / 9440 / 851174B)
ATI Radeon Kernel Mode Driver
atikmdag 8.01.01.984
Windows 7 x64 Home Premium Edition (build 7600)
Version 0.35
DirectCompute: D137968
OpenCL: N/A
CPU: M656
Version 0.43b
DirectCompute: D2389.9
OpenCL: N/A
CPU: M13.3
Version 0.44b
DirectCompute: D2382.6
OpenCL: N/A
CPU: M16.9 | | |