Monte Carlo Simulation with RDRAND (32 bit)

Started by Gunther, October 07, 2013, 09:32:58 PM

Gunther

Attached is the archive MC32.ZIP. It's the 32-bit version of the program from this thread. The assembly language source works with JWasm and probably with MASM (not tested). The C source works with GCC and should work with MS VC (not tested).
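The thread describes the program only at a high level (MC.PDF in the archive has the details), so here is a rough hit-or-miss Monte Carlo sketch in C for orientation. Assumptions, none taken from MC32.ZIP: the integrand y = x^3 on [0, 1] (chosen only because its exact area is 0.25, the value every output converges to), two random numbers per point, and xorshift32 as a stand-in software generator. The generator is passed as a function pointer so that a RDRAND wrapper or another PRNG could be swapped in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator interface: returns the next 32-bit random value. */
typedef uint32_t (*rng_fn)(void *state);

/* xorshift32 (Marsaglia), used only as a stand-in software PRNG;
   it is NOT the generator in MC32.ZIP. The state must be nonzero. */
static uint32_t xorshift32(void *state)
{
    uint32_t *s = (uint32_t *)state;
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *s = x;
    return x;
}

/* Map a 32-bit integer to a double in [0, 1). */
static double to_unit(uint32_t r)
{
    return r * (1.0 / 4294967296.0);
}

/* Hit-or-miss estimate of the area under y = x^3 on [0, 1] (exact: 0.25).
   Each trial draws a point (x, y) in the unit square and counts a hit
   when the point lies below the curve. */
static double mc_area(rng_fn next, void *state, long trials)
{
    long hits = 0;
    assert(trials > 0);
    for (long i = 0; i < trials; i++) {
        double x = to_unit(next(state));
        double y = to_unit(next(state));
        if (y < x * x * x)
            hits++;
    }
    return (double)hits / (double)trials;
}
```

For example, `uint32_t seed = 2463534242u; double area = mc_area(xorshift32, &seed, 100000000L);` would consume 200 million random numbers, and `fabs(area - 0.25)` corresponds to the Absolute Error line.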

Here's a typical program output:

Generating 200 Million random numbers with RDRAND.
That'll take a little while ...

Area           = 0.250004200000000
Absolute Error = 0.000004200000000
Elapsed Time   = 23.10 Seconds

Generating 200 Million random numbers with C.
That'll take a little while ...

Area           = 0.250011705000000
Absolute Error = 0.000011705000000
Elapsed Time   = 4.11 Seconds

It's the same situation: RDRAND tends to be slow. A good software RNG does a better and faster job. Some test results would be welcome.
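In the output, Area is the Monte Carlo estimate and Absolute Error is its distance from the exact value 0.25 (0.250004200 - 0.25 = 0.0000042). Assuming hit-or-miss sampling in which H of N points land inside the region, the estimate is a binomial proportion, and its standard error indicates how large an error chance alone produces. N = 10^8 below assumes two random numbers per point; the thread does not state this.

```latex
\hat{A} = \frac{H}{N}, \qquad
\varepsilon = \left|\hat{A} - A\right|, \qquad A = \tfrac{1}{4}

\sigma_{\hat{A}} = \sqrt{\frac{A(1-A)}{N}}
  = \sqrt{\frac{0.25 \cdot 0.75}{10^{8}}}
  \approx 4.33 \times 10^{-5}
```

Under that assumption both errors above (0.0000042 for RDRAND, 0.0000117 for C) are smaller than one standard error; the same holds for N = 2 * 10^8, where the standard error is about 3.06 * 10^-5.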

Gunther
You have to know the facts before you can distort them.

dedndave

 :t

P4 Prescott w/HTT @ 3.0 GHz
Generating 200 Million random numbers with C.
That'll take a little while ...

Area           = 0.250017755000000
Absolute Error = 0.000017755000000
Elapsed Time   = 19.33 Seconds

Gunther

GoneFishing

#3
...

Gunther

Hi vertograd,

Quote from: vertograd on October 08, 2013, 01:35:57 AM
I simplified your C routine for CPU performance testing purposes:

The simplification is okay. But did you read MC.PDF?

Quote from: vertograd on October 08, 2013, 01:35:57 AM
I'll post the results for GPU-generated random numbers as soon as I figure out how to write CUDA code.

Is CUDA really a good idea? It's inseparably tied to Nvidia hardware; the only alternative is AMD/ATI. In either case, the application would be strongly hardware-dependent.

Gunther

GoneFishing

#5
...

Gunther

Hi vertograd,

Quote from: vertograd on October 08, 2013, 02:49:09 AM
It seems good to me at least: it's interesting, and I want to use it for testing hardware capabilities.

But it has drawbacks, too.

Quote from: vertograd on October 08, 2013, 02:49:09 AM
You wrote about your codec for compressing/decompressing fractal images (for internal use, I suppose). That's a case where using CUDA can drastically increase performance.

CUDA is good for 9- or 12-bit data and some kinds of fixed-point arithmetic, but for the usual widths (32, 64, 128 bit) it's not so suitable. We solved our performance problem by using SIMD instructions.

Gunther

GoneFishing

#7
...

Antariy

Here is the test with a new implementation of the Axrand proc added.
The modified C source was built with MSVC10, linked against MSVCRT.DLL, with maximum optimization for speed. The original, unmodified C source also compiled flawlessly with MSVC10.
The added ASM source axrand_asm.asm contains the new Axrand proc, which seems to be faster than the old one and also gives better PRNG results according to ENT tests. It also contains an empty proc, used in a reference loop to measure how much time the surrounding calculations take.

Typical result on my machine:

Generating 200 Million random numbers with C.
That'll take a little while ...

Area           = 0.250024975000000
Absolute Error = 0.000024975000000
Elapsed Time   = 28.41 Seconds

Generating 200 Million random numbers with ASM Axrand.
That'll take a little while ...

Area           = 0.249965115000000
Absolute Error = 0.000034885000000
Elapsed Time   = 21.76 Seconds

This is empty reference loop to take the calculation code time in account
That'll take a little while ...

Area           = 0.000000000000000
Absolute Error = 0.250000000000000
Elapsed Time   = 8.44 Seconds
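The empty reference loop times everything except the generator, so subtracting its time isolates the generator's own cost: on this machine roughly 21.76 - 8.44 = 13.32 s for Axrand versus 28.41 - 8.44 = 19.97 s for the C generator. Below is a stripped-down C sketch of the same subtraction. The LCG stand-in and all names are hypothetical, and unlike the original (where the empty proc sits in a separate ASM file the C compiler cannot see into) an optimizing compiler may inline or fold the empty C loop, so treat its numbers as rough.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical generator interface: advance *state, return a 32-bit value. */
typedef uint32_t (*gen_fn)(uint32_t *state);

/* Stand-in PRNG (32-bit LCG, Numerical Recipes constants); not Axrand. */
static uint32_t lcg_gen(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Empty generator: same signature, no work. Plays the role of the
   empty proc in axrand_asm.asm. */
static uint32_t empty_gen(uint32_t *state)
{
    return *state;
}

/* Written after each loop so the loop result is not optimized away. */
static volatile uint32_t sink;

/* CPU seconds (clock(), not wall time) spent on n calls to gen. */
static double time_loop(gen_fn gen, uint32_t seed, long n)
{
    uint32_t state = seed;
    uint32_t acc = 0;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        acc ^= gen(&state);
    clock_t stop = clock();
    sink = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Generator cost = full loop time minus empty reference loop time.
   Timer noise can make the result slightly negative for small n. */
static double net_time(gen_fn gen, uint32_t seed, long n)
{
    assert(n > 0);
    double total = time_loop(gen, seed, n);
    double reference = time_loop(empty_gen, seed, n);
    return total - reference;
}
```

A call such as `net_time(lcg_gen, 1u, 200000000L)` mirrors the 200 million iterations used in the thread.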


The ENT results for the Axrand output, just for reference:

#############################################################
Test for #0 byte
Entropy = 7.999284 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 248.07, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.6159 (127.5 = random).
Monte Carlo value for Pi is 3.151154418 (error 0.30 percent).
Serial correlation coefficient is 0.000242 (totally uncorrelated = 0.0).



#############################################################
Test for #1 byte
Entropy = 7.999269 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 254.58, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.2226 (127.5 = random).
Monte Carlo value for Pi is 3.152786445 (error 0.36 percent).
Serial correlation coefficient is 0.001665 (totally uncorrelated = 0.0).



#############################################################
Test for #2 byte
Entropy = 7.999284 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 247.80, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.3210 (127.5 = random).
Monte Carlo value for Pi is 3.140018240 (error 0.05 percent).
Serial correlation coefficient is -0.000587 (totally uncorrelated = 0.0).



#############################################################
Test for #3 byte
Entropy = 7.999284 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 248.13, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.4540 (127.5 = random).
Monte Carlo value for Pi is 3.136178179 (error 0.17 percent).
Serial correlation coefficient is 0.000549 (totally uncorrelated = 0.0).



#############################################################
Test for full DWORD
Entropy = 7.999807 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 267.70, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.4034 (127.5 = random).
Monte Carlo value for Pi is 3.130836523 (error 0.34 percent).
Serial correlation coefficient is 0.019429 (totally uncorrelated = 0.0).

Antariy

Thank you, Gunther, for permission to add the code to your test :biggrin:

dedndave

nice :t

P4 Prescott w/HTT @ 3.0 GHz
Generating 200 Million random numbers with C.
That'll take a little while ...

Area           = 0.250029345000000
Absolute Error = 0.000029345000000
Elapsed Time   = 17.61 Seconds

Generating 200 Million random numbers with ASM Axrand.
That'll take a little while ...

Area           = 0.249965115000000
Absolute Error = 0.000034885000000
Elapsed Time   = 14.23 Seconds

This is empty reference loop to take the calculation code time in account
That'll take a little while ...

Area           = 0.000000000000000
Absolute Error = 0.250000000000000
Elapsed Time   = 5.42 Seconds


TWell

AMD Athlon(tm) II X2 220 Processor 2.80 GHz
Generating 200 Million random numbers with C.
That'll take a little while ...

Area           = 0.250021205000000
Absolute Error = 0.000021205000000
Elapsed Time   = 10.84 Seconds

Generating 200 Million random numbers with ASM Axrand.
That'll take a little while ...

Area           = 0.249965115000000
Absolute Error = 0.000034885000000
Elapsed Time   = 5.98 Seconds

This is empty reference loop to take the calculation code time in account
That'll take a little while ...

Area           = 0.000000000000000
Absolute Error = 0.250000000000000
Elapsed Time   = 2.58 Seconds

sinsi

Alex's latest

Generating 200 Million random numbers with RDRAND.
That'll take a little while ...

Area           = 0.250021975000000
Absolute Error = 0.000021975000000
Elapsed Time   = 16.46 Seconds

Generating 200 Million random numbers with C.
That'll take a little while ...

Area           = 0.249995930000000
Absolute Error = 0.000004070000000
Elapsed Time   = 4.25 Seconds

Generating 200 Million random numbers with ASM Axrand.
That'll take a little while ...

Area           = 0.249965115000000
Absolute Error = 0.000034885000000
Elapsed Time   = 2.97 Seconds

This is empty reference loop to take the calculation code time in account
That'll take a little while ...

Area           = 0.000000000000000
Absolute Error = 0.250000000000000
Elapsed Time   = 1.55 Seconds

What do the area and error numbers mean? E.g., is 25.002 better than 24.999?