| | Great tool!! At last the numbers make some sense?
Got now: GPU:M34027 CPU: M3157 (multithreaded)
GPU:S34036 CPU: S535 (singlethreaded)
Finally the GPU shows its true colors against the good ol' CPU.  | |
| | something not working right got M3xxx with GTX 280 (7 Ultimate 64 bits with 195.39)
and M5xxx with my Q6600 | |
| | Quote:
Originally Posted by bejito81 something not working right got M3xxx with GTX 280 (7 Ultimate 64 bits with 195.39) and M5xxx with my Q6600 | With which version? 0.25? | |
| | Using CAT 9.11 beta on my ATI HD4770 still "No DirectCompute Support". | |
| | Quote:
Originally Posted by HaZe303 At last the numbers make some sense?  | Hi! The numbers you get represent relative computing performance. M15k for the GPU is a 'typical' result for a 9800gt (OC/AMP Edition, 700/1800/900 clocks, 112 SPs, 16 ROPs), so if you get M30k it means your card is twice as fast as a 9800gt AMP+. The same goes for the CPU: getting M5k means your CPU reaches 1/3 of the 9800gt's performance doing the same job (same codepath, SSE2-optimized multithreaded code).
"S" for the CPU means singlethreaded code, while "S" for the GPU means passing the D3D11_CREATE_DEVICE_SINGLETHREADED flag to the DirectX 11 device (it could be misunderstood as singlethreaded code execution on the GPU). | |
| | Quote:
Originally Posted by Dyre Straits Using CAT 9.11 beta on my ATI HD4770 still "No DirectCompute Support". | Unfortunately ATI has not yet published DirectCompute-enabled drivers for HD4xxx/DX10.1 GPUs...:/ You need an HD5xxx or any GeForce 8+. | |
| | Quote:
Originally Posted by bejito81 something not working right got M3xxx with GTX 280 (7 Ultimate 64 bits with 195.39)
and M5xxx with my Q6600 | Please try version 0.25. If it's still not working for you, try the latest non-beta ForceWare (191.07 for now). If it still doesn't work, please send me a PM and I will try to help/fix it. | |
| | Any chance you could explain why a GTX 260 is scoring over double that of a 5870? Something looks really wrong with this benchmark, does the code favour Nvidia cards somehow? | |
| | ok it seems it was with 0.15
it seems working fine now (well results are less strange)
i do M13000 with gtx 280 factory settings (which seems very low)
and M1100 with Q6600 @ 3Ghz | |
| | v0.25
9800gtx+ GPU M8750 S8750? Problem?
Q9650 M3383 S282 | |
| | Quote:
Originally Posted by bejito81 ok it seems it was with 0.15
it seems working fine now (well results are less strange) | Please don't compare results from previous versions to 0.25. In future updates I will try to keep the score range the same, but sometimes that's impossible (when a bug in the benchmark produces wrong scores in some cases). Quote:
Originally Posted by bejito81 i do M13000 with gtx 280 factory settings (which seems very low)
and M1100 with Q6600 @ 3Ghz | The scores you get mean your GPU is 12x faster than your CPU doing the same job. An overclocked i7 can reach M3000, but that's still a very low score compared to a GPU.
Which drivers do you use? 191.07 or 195.39 beta? I've seen 195 score worse than 191.07, but as long as it's a beta driver I can't tell whether it's something I can fix in the tool or a driver issue. | |
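To make the arithmetic behind these comparisons explicit, here is a minimal sketch that turns two scores into a speed ratio. The function name and the reference constant are made up for this example and are not part of the benchmark; the M15k reference is the 'typical' 9800gt figure quoted earlier in the thread.

```python
# Hypothetical helper: not part of the benchmark, just arithmetic on its scores.
REFERENCE_SCORE = 15000  # 'typical' 9800gt AMP result quoted earlier in the thread


def relative_speed(score, reference=REFERENCE_SCORE):
    """Return how many times faster (or slower) `score` is than `reference`."""
    if reference <= 0:
        raise ValueError("reference score must be positive")
    return score / reference


if __name__ == "__main__":
    # A card scoring M30k against the 9800gt reference.
    print(relative_speed(30000))
    # GTX 280 (M13000) against the Q6600 (M1100) from the post above.
    print(round(relative_speed(13000, 1100), 1))
```

This only holds for scores taken with the same benchmark version, since the score range changes between releases.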
| | i use 195.39 beta, seeing peeps should do M11000 with old 9800 i was wondering why my gtx280 only does M13000 | |
| | Quote:
Originally Posted by bejito81 i use 195.39 beta, seeing peeps should do M11000 with old 9800 i was wondering why my gtx280 only does M13000 | You should get more than M13000 with your gtx280, but while 195.39 is still in beta I'm not sure where the problem is. You should easily get M20000-M30000 on 191.07 WHQL. We need to wait for the next WHQL release. | |
| | Quote:
Originally Posted by Pat Hi! The numbers you get represent relative computing performance. M15k for the GPU is a 'typical' result for a 9800gt (OC/AMP Edition, 700/1800/900 clocks, 112 SPs, 16 ROPs), so if you get M30k it means your card is twice as fast as a 9800gt AMP+. The same goes for the CPU: getting M5k means your CPU reaches 1/3 of the 9800gt's performance doing the same job (same codepath, SSE2-optimized multithreaded code).
"S" for the CPU means singlethreaded code, while "S" for the GPU means passing the D3D11_CREATE_DEVICE_SINGLETHREADED flag to the DirectX 11 device (it could be misunderstood as singlethreaded code execution on the GPU). | Yeah, I know that. I was comparing the 0.25 results to the 0.15 ones. The 0.15 results did not make any sense, as I got more with my CPU than with my 5870. That's what I meant by it making sense now.  | |
| | I've got M10907 on my GF8600GT. Is this normal? | |
| | thanks for this benchmark - it's definitely needed... but my GPU score seems too high compared to the 5770 (which should be just as fast).
could the NVidia drivers be so much better?
GPU 68449 / CPU 2203
--------------------------
Windows 7 x64 / i5 750 @ 4.0Ghz / GTX 260 OC 216 / driver 191.07 / 4GB DDR3 2000
using DirectCompute Benchmark v0.25 | |
| | Quote:
Originally Posted by Unregistered Any chance you could explain why a GTX 260 is scoring over double that of a 5870? Something looks really wrong with this benchmark, does the code favour Nvidia cards somehow? | Computing performance is one thing and rendering performance is something different. I don't know if the GTX260 is faster or slower than the HD5870 in GPGPU. It depends, of course, on the GPU and drivers. I use the same shader code for both ATI/NV; in fact I don't check the GPU vendor, I just compile the shader code and run it on the GPU, all through the DirectX API, so I don't even 'touch' the driver API or GPU registers directly. DirectX 11 does it for me.
The score can be affected by beta drivers. I really don't favour Nvidia; I personally own an HD4890 (Sapphire Toxic version) as my primary rendering device and a GeForce 9800gt ECO for PhysX acceleration. | |
| | I want to clear up any doubts about my benchmark:
1) I don't favour any vendor; the codepath is one and the same for ATI/NV and any other DirectCompute-enabled device that could be installed in the system. It looks like this:
benchmark(compute shader code) -> DirectX11 API (compiler) -> DirectX11 API (run) -> drivers -> GPU.
I personally own an HD4890 Toxic as my primary rendering device and a GeForce 9800gt as a dedicated PhysX card.
2) Rendering capability is not the same as computing capability. Look at FluidMark scores: you can find GTX260/275/295 scores lower than or comparable to a 9800gt.
3) NVidia enabled the DirectCompute feature in the 190.62 WHQL driver (21.08.2009), so we can assume it's more stable and mature than the ATI implementation (note that NVidia has had CUDA/PhysX for a long time, so it was probably easier for them to enable DirectCompute).
4) The benchmark is still at version 0.xx, which means it CAN have some bugs.
And a last word about my driver expectations: I'm really waiting for DirectCompute-enabled drivers for HD4xxx. I know ATI wants to sell DX11 cards, but it's not fair that you can have DirectCompute on a 9400gt and not on an HD4890... | |
| | Quote:
Originally Posted by Pat I don't favour any vendor, the codepath is one and the same for ATI/NV and any other DirectCompute enabled device that could be installed in the system. | Don't pay attention to the fanboy rash. I find it hilarious how people rush to cry foul if their favorite manufacturer isn't on top. I understand there is some corruption around - but getting so paranoid is simply ridiculous.
For your information, I tried to talk to AMD about this benchmark, but AMD doesn't support consumer projects like this one, or projects like ATI Tray Tools [despite the fact they use it themselves]. It appears they only support multi-billion developers like Adobe. |
Last edited by Regeneration; November 8th, 2009 at 03:47 PM..
| Quote | | |
| | GeForce GTX275 with 195.39 Beta on Vista 64, latest DX updates:
DirectCompute Benchmark 0.25 = M12520 / S12514
Did several runs. Some guys got 13000 with their GTX280, so this looks okay but doesn't compare to other results. I'll check AMD later on... | |
| | Quote:
Originally Posted by Pat ...the codepath is one and the same for ATI/NV and any other DirectCompute enabled device that could be installed in the system. It looks like this:
benchmark(compute shader code) -> DirectX11 API (compiler) -> DirectX11 API (run) -> drivers -> GPU.
... | Hello Pat,
I'm very interested in this topic. When you use DirectCompute, is there any way to take into account that ATI shaders (5D) are different from Nvidia shaders (1D)? If I understand your post above correctly, there isn't. Is this right?
In that case better performance can only come from a better ATI driver that makes better use of the GPU's 5D shaders.
ciao Tom | |
| | Geforce GTX 260 @216sp, driver ver. 191.07 WHQL
Core 2 Duo E8500
Win7 x64
And this one on Atom Z520(netbook Acer 751h):
BTW what is the meaning of D3D11_CREATE_DEVICE_SINGLETHREADED flag? | |
| | Quote:
Originally Posted by doelf GeForce GTX275 with 195.39 Beta on Vista 64, latest DX updates:
DirectCompute Benchmark 0.25 = M12520 / S12514
Did several runs. Some guys got 13000 with their GTX280, so this looks okay but doesn't compare to other results. I'll check AMD later on... | You should get a 2x-3x higher result with the 191.07 WHQL drivers. If the results still differ this much from 191.07 once we finally get a 195 WHQL release, I'll look closer at this issue. | |
| | Hello Pat,
did you read my post from November 10th, 2009, 03:15 PM?
I have a further question: do you use single or double precision for your calculations?
Can you build a switch into your bench tool to choose between SP and DP?
Ciao Tom | |
| | "You should get 2x-3x higher result with 191.07 WHQL drivers. When finally we get 195 WHQL and the results will still differ so much from 191.07, I'll look closer at this issue."
I installed the 191.07 WHQL drivers first, but the benchmark said "DirectCompute support: NO". Same with Catalyst 9.11 beta with and without the Stream SDK 2.0 beta 4: "DirectCompute support: NO". I tried Radeon HD 4870/4890 and 4870 X2 running Vista 64.
Regards,
Michael | |
| | Quote:
Originally Posted by doelf I installed the 191.07 WHQL drivers first, but the benchmark said "DirectCompute support: NO". Same with Catalyst 9.11 beta with and without the Stream SDK 2.0 beta 4: "DirectCompute support: NO". I tried Radeon HD 4870/4890 and 4870 X2 running Vista 64. | That's strange. According to the nVidia changelog (http://www.nvidia.com/object/win7_wi...0.62_whql.html) they introduced DirectCompute in 190.62 for both x86 and x64. I'm using Win7 32-bit, so I didn't try the x64 drivers. You don't have to try Catalyst for HD4xxx; DirectCompute is not supported by those drivers yet. | |
| | Quote:
Originally Posted by Unregistered Hello Pat,
did you read my post from November 10th, 2009, 03:15 PM?
I have a further question: do you use single or double precision for your calculations?
Can you build a switch into your bench tool to choose between SP and DP?
Ciao Tom | Yes, I've read your post. I'm using single precision (float) for calculations. I'll think about the switch for the next release.  If you want to talk/ask about tech details, please send me a private message, but I don't want to optimize the tool for any vendor: the shader code is the same, compiled with the highest available profile (cs_4_0/4_1/5_0) for the GPU.
I've just downloaded the ATI driver with OpenCL support (beta) and will try to add an OpenCL/DirectCompute switch to the tool.  | |
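The "highest available profile" idea can be sketched as a simple lookup. This is only an illustration, not the benchmark's actual selection code (which is not shown in this thread); the mapping from Direct3D feature level to compute shader profile and the function name are assumptions of the example.

```python
# Sketch only: the benchmark's real selection logic is not shown in this thread.
# Assumed mapping from Direct3D feature level (major, minor) to shader profile.
PROFILE_BY_FEATURE_LEVEL = {
    (10, 0): "cs_4_0",
    (10, 1): "cs_4_1",
    (11, 0): "cs_5_0",
}


def highest_profile(feature_level):
    """Pick the highest compute shader profile not above the given feature level."""
    candidates = [lvl for lvl in PROFILE_BY_FEATURE_LEVEL if lvl <= feature_level]
    if not candidates:
        return None  # below Direct3D 10.0 there is no compute shader profile
    return PROFILE_BY_FEATURE_LEVEL[max(candidates)]


if __name__ == "__main__":
    print(highest_profile((11, 0)))
    print(highest_profile((10, 1)))
```

A matching feature level alone is not enough: as the HD4xxx reports in this thread show, the driver also has to expose DirectCompute on that hardware.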
| | Hi,
will there be any support for mobility cards? | |
| | Can anyone please tell me why I still don't have DirectCompute hardware support in this benchmark?
Vista sp2 dx11 installed 32bit
forceware 191.07 WHQL
Gigabyte Geforce 8800GTS 512Mb | |
| | Quote:
Originally Posted by Unregistered Hi,
will there be any support for mobility cards? | I'm sure it should work on any hardware as long as its drivers support DirectCompute. Even on Larrabee in the faraway future. Quote:
Originally Posted by Stojan Can anyone please tell me why I still don't have DirectCompute hardware support in this benchmark?
Vista sp2 dx11 installed 32bit
forceware 191.07 WHQL
Gigabyte Geforce 8800GTS 512Mb | Check DirectCompute support in Lavalys Everest: http://www.lavalys.com/gfx/whatsnew/gpgpu_d3dcs_en.jpg (it is shown as Direct3D in GPGPU section). |
Last edited by MASTAN; November 13th, 2009 at 05:08 PM..
| Quote | | |
| | Quote:
Originally Posted by Unregistered Hi,
will there be any support for mobility cards? | It depends on the drivers. Check the readme/changelog of your drivers to see if it's supported. The latest version released by NVidia for mobile GPUs is 186.81 WHQL, and it supports only CUDA, so there is no DirectCompute/OpenCL in that version. But you can try 195.39 beta: http://www.nvidia.com/object/noteboo...5.39_beta.html
They support:
"GeForce 8M, 9M, 100M, and 200M-series notebook GPUs."
And:
"New in this release:
Adds support for DirectCompute with Windows 7.
Adds support for OpenGL 3.2.
Adds support for OpenCL 1.0 (Open Computing Language).
Adds support for CUDA Toolkit 3.0..."
That could work for mobile GPUs. |
Last edited by Pat; November 13th, 2009 at 06:23 PM..
| Quote | | |
| | I meant the ATI Radeon Mobility cards. I have an HD 4650. I modified the 9.11 beta with DH Mobility Modder and installed it. The benchmark says no support for DC. | |
| | Quote:
Originally Posted by Unregistered I meant the ATI Radeon Mobility cards. I have an HD 4650. I modified the 9.11 beta with DH Mobility Modder and installed it. The benchmark says no support for DC. | ATI has not yet released DirectCompute-enabled drivers for HD4xxx GPUs. Even my 4890 has 'no DirectCompute support'. But we've got OpenCL support in 9.11 beta (ATI STREAM SDK 2.0 beta4), and I'm working on the OpenCL part right now, so it should be possible to run the benchmark on the OpenCL API. The next release should have it finished. For now you can try 0.31 beta to find out if OpenCL support is available for you. | |
| | I'm using the CAT 9.11 beta drivers, downloaded the 0.31 beta and tried to run it in OpenCL. | |
| | Quote:
Originally Posted by Dyre Straits I'm using the CAT 9.11 beta drivers, downloaded the 0.31 beta and tried to run it in OpenCL. | As I wrote, OpenCL support is not finished yet BUT you should see some OpenCL properties like version, name and max computing units. Can you find OpenCL.dll in your system? If so, where is it located? It should be on a system path.
You need OpenCL.dll in the system path to get it working. If you don't have it, install this driver and SDK: http://developer.amd.com/gpu/ATIStre...ault.aspx#five |
Last edited by Pat; November 16th, 2009 at 12:59 PM..
| Quote | | |
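One quick way to answer the "where is OpenCL.dll located?" question is to scan the directories listed in PATH. The sketch below is a hypothetical helper, not something shipped with the benchmark, and it only looks at PATH entries; Windows also searches the application directory and system folders when loading a DLL, so a miss here is not conclusive.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory on the search path that contains `filename`."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Skip empty entries produced by stray separators in PATH.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    locations = find_on_path("OpenCL.dll")
    print(locations if locations else "OpenCL.dll not found on PATH")
```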
| | some progress (HD4890, 9.11 Beta)
When can we expect a release with OpenCL support? I'm curious about the OpenCL/DirectCompute performance differences between the hd48xx and hd58xx series ... | |
| | just tested with gtx280 and 195.55, still very low score (lil more than 13k) | |
| | Quote:
Originally Posted by Peter When can we expect a release with OpenCL support? | Soon.  I finally got OpenCL working on both my cards (4890 and 9800gt), so I'll probably upload an update this weekend.  | |
| | I've just uploaded OpenCL enabled benchmark. Now you can select the GPGPU API. The tool is here.
Please try it, especially OpenCL part. Please also read the short readme if you're not sure about OpenCL support in your system. | |
| | Core 2 Duo E8500
GF 260 @216sp
no overclocking 
Validation:
DirectCompute 191.07 and 195.55:
GPU Checksum: 1370969.153194
CPU Checksum: 1370969.611431
PASSED
OpenCL 195.55:
GPU Checksum: 1370969.267754
CPU Checksum: 1370969.611431
PASSED
What is the meaning of these checksum numbers? Shouldn't they be exactly the same?
P.S. Some more screens of 0.35: http://forums.overclockers.ru/viewto...14884#p6714884 |
Last edited by MASTAN; November 24th, 2009 at 05:31 PM..
| Quote | | |
| | Quote:
Originally Posted by MASTAN Validation:
DirectCompute 191.07 and 195.55:
GPU Checksum: 1370969.153194
CPU Checksum: 1370969.611431
PASSED
OpenCL 195.55:
GPU Checksum: 1370969.267754
CPU Checksum: 1370969.611431
PASSED
What is the meaning of these checksum numbers? Shouldn't they be exactly the same? | Those checksums are calculated from a subset of the result data (FFT outputs) and don't mean anything important; they should just be more or less the same for GPU and CPU.
I use single precision for both GPU and CPU (32-bit float), and the calculations use a lot of sine/cosine functions that will never be 100% accurate. You will always have some rounding error and probably different precision for the sin()/cos() functions on CPU and GPU. That's why it's not exactly the same, but it's still fine (about 10e-7 aggregated error over multiple calculations). |
Last edited by Pat; November 24th, 2009 at 09:11 PM..
| Quote | | |
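The "more or less the same" check can be illustrated with a relative-error comparison on the checksums MASTAN posted. This is a sketch under assumptions: the benchmark's real validation rule and threshold are not stated in the thread, so the 1e-6 tolerance and the function names below are chosen only for the example.

```python
# Checksums reported by MASTAN earlier in the thread.
CPU_CHECKSUM = 1370969.611431
GPU_CHECKSUMS = {
    "DirectCompute": 1370969.153194,
    "OpenCL": 1370969.267754,
}

# Assumed tolerance for this example; the benchmark's real threshold is not stated.
TOLERANCE = 1e-6


def relative_error(value, reference):
    """Relative difference between a checksum and the reference checksum."""
    return abs(value - reference) / abs(reference)


def validate(value, reference, tolerance=TOLERANCE):
    """Accept the checksum if it is within `tolerance` of the reference."""
    return relative_error(value, reference) <= tolerance


if __name__ == "__main__":
    for api, checksum in GPU_CHECKSUMS.items():
        err = relative_error(checksum, CPU_CHECKSUM)
        print(f"{api}: relative error {err:.2e}, passed={validate(checksum, CPU_CHECKSUM)}")
```

Both GPU checksums differ from the CPU one by a few parts in ten million, which is the order of magnitude expected from single-precision rounding.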
| | Quote:
Originally Posted by MASTAN | Thanks! BTW. Greetings from Poland!  | |
| | Thanks!  | |
| | GTX 280 stock
Q6600 @ 3Ghz
Seven 64
driver 195.55
D10199
C59901
M1204 (i just launched a prog during it so can not be accurate)
congratz to nvidia for implementing so well direct compute  , well at least with opencl it rox  | |
| | Results of nVidia's fresh 195.62WHQL drivers are pretty close to 195.55:
D12640/C52321/M652 195.55
D12662/C52336/M655 195.62 WHQL
(v0.35, Win7 x64, GF260, Core 2 Duo E8500)
Also I've measured time that takes every test:
74/179/187 - DirectCompute/OpenCL/CPU respectively, in seconds.
The readme says you can compare D/C/M scores, as they use the same codepath. Then why does OpenCL, which takes more than twice as long as DC, have a much higher score? | |
| | Quote:
Originally Posted by MASTAN Results of nVidia's fresh 195.62WHQL drivers are pretty close to | Thanks for info  Yesterday I installed 195.62 beta... Quote:
Originally Posted by MASTAN Also I've measured time that takes every test:
74/179/187 - DirectCompute/OpenCL/CPU respectively, in seconds.
The readme says you can compare D/C/M scores, as they use the same codepath. Then why does OpenCL, which takes more than twice as long as DC, have a much higher score? | That's because the code for the GPU is dispatched multiple times: it's not one "block" of code but multiple thread groups with multiple threads in each group.
So the codepath (single-thread path) is the same, but it is called D times for DirectCompute, C times for OpenCL and M times for the CPU. The score is then calculated as D/execute_time, C/execute_time and M/execute_time, so it doesn't matter for the final score whether the benchmark runs for 1 second, 1 minute or 1 day.
In the next release (actually I've already got it working) the total number of threads will also be the same for both DirectCompute and OpenCL (the score should not be affected but the runtime will be: you will get more or less the same runtime for DirectCompute and OpenCL).
Now I'm working on:
1) CUDA/STREAM support
2) OpenCL (CPU device) support for both ATI/NV implementations
3) Simple rendering techdemo (Phong-shaded object fully 'software rendered' using GPGPU)
4) Smarter validations (not only the final sum but 'online' on the thread group level)
5) Same 'threads layout' for both OpenCL and DirectCompute (except for CPU OpenCL devices and CPU - we don't want to wait a day for the result  ) | |
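Reading the explanation above as "score = number of kernel runs divided by execution time", a short sketch shows why the wall-clock time alone says nothing about the score. The function name and the numbers are invented for illustration; they are not real benchmark internals or results.

```python
def score(kernel_runs, execute_time_s):
    """Throughput-style score: completed kernel runs per second of execution."""
    if execute_time_s <= 0:
        raise ValueError("execution time must be positive")
    return kernel_runs / execute_time_s


if __name__ == "__main__":
    # Made-up numbers: ten times more work taking ten times as long scores the same.
    print(score(1_000_000, 50.0))
    print(score(10_000_000, 500.0))
```

So a test that runs longer can still score higher, as long as it completed proportionally more work in that time.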
| | Huh got D135804 GPU with 5850? bug or real? | |
| | Thanks for the explanation!
I wonder then, why DirectCompute is 3-4 times slower than OpenCL on both ATI & nVidia. Quote:
Originally Posted by Pat Now I'm working on: | Wow, that is really something! | |
| | Quote:
Originally Posted by MASTAN I wonder then, why DirectCompute is 3-4 times slower than OpenCL on both ATI & nVidia. | It's because of the current thread 'layout', which is different for DirectCompute and OpenCL. For example, if you want to dispatch 10000 threads you can do it in many ways: 10 groups of 1000 threads each (CS5.0), 100 groups of 100 threads each, or even 1000 groups of 10 threads each. Every layout will run 10000 kernels (skipping technical details such as whether you can split your GPU code for every thread layout, synchronize results, etc.), but the total runtime will be different in each case.
When I was working on the DirectCompute part I picked a thread layout and kept it up to the current 0.35 (to keep the scores more or less the same). But in the meantime, while working on the OpenCL part, I 'cleaned' the GPU code, retested different layouts and found a better one (at least for OpenCL).
The next version will have the same layouts for OpenCL and DirectCompute (that's for sure) and probably for CUDA/Stream if I can do it 'on time' (well, I prefer to release in 'small steps' to get more feedback rather than just a final 1.0 in 2 months...). The only problem is that the scores usually change from version to version, but as I always say, it's still v0.xx and the scores should not be compared to any other version of the benchmark.
BTW, I'm going to rename the tool to something like GPGPU benchmark because it's not only a DirectCompute benchmark, and that will be the time when I change the scoring range (again  ) | |
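The layout example above (10000 threads split into groups in different ways) can be enumerated with a few lines. This is a hypothetical helper for illustration only; the 1024 threads-per-group cap used in the demo is my understanding of the cs_5_0 limit and should be treated as an assumption, and the sketch says nothing about which layout is fastest.

```python
def layouts(total_threads, max_threads_per_group):
    """List every (groups, threads_per_group) pair whose product is total_threads."""
    result = []
    for threads_per_group in range(1, max_threads_per_group + 1):
        if total_threads % threads_per_group == 0:
            result.append((total_threads // threads_per_group, threads_per_group))
    return result


if __name__ == "__main__":
    # Assumed cap of 1024 threads per group for the demo.
    for groups, threads in layouts(10000, 1024):
        print(f"{groups} groups x {threads} threads")
```

Every pair covers the same 10000 threads; only measuring each layout on the actual hardware tells which one runs fastest.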
| | Hi!
I finally updated the benchmark! You can get the new 0.40b version here.
This time I need your help to update the benchmark to non-beta status:
1) Please try the combined benchmarks. I could only try it for OpenCL, where it works fine, but the 'Combined DirectCompute' is completely untested. Combined benchmarks need more than one GPU (one could be ATI and one NVidia, it doesn't matter). I'm not sure how it works (if it works) for dual-GPU cards (see next point).
2) I need some testers with dual-GPU cards like the HD4870 x2 or GTX295. If you have some time to test, send me a private message 
3) I also need someone with two identical cards (anyone with three or four??)  | |
| | I've just created a dedicated thread for DirectCompute benchmark
It's here so please post any comments/questions/bugs there. | |
| | yeah, my gpu can finally show what it is capable of :-)
Radeon HD5770 / AMD Athlon X2 7850 Black Edition (2,8 GHz x 2)
Direct Compute Score: D108070
CPU Score: M551 | |