News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Faster Prime Numbers

Started by rrr314159, October 22, 2015, 03:26:51 PM

Previous topic - Next topic

rrr314159

This 32-bit Primes Generator, primes.exe, does up to 2 billion (98222287 primes) in about 235 milliseconds. With one thread, it's about twice as fast as my previous version from jj2007's Primes thread.

Chris Wallich's primesieve (C++) is still 1.9x faster. I think / hope the reason is, it's 64-bit. The optimizing compiler utilizes all 16 registers, especially for the critical wheel algorithm. If I get to it I'll convert this to 64, which is worth doing anyway because it would compute primes up to 2^64 (100 billion takes about 10 secs with primesieve).

Although I used no algos from primesieve (just the segmenting idea) it was invaluable for providing test cases and something to shoot for. Thanks Chris! Comparisons:

BUILD SIZE: primesieve is 775K, primes is 7K: 1%! CW's is 64-bit, does more algo's (like wheeling), and also K-tuplets. So I figure roughly, the asm prog would be about 8% the size for equivalent functionality.

SLOC's: primesieve is almost 4000, primes less than 1000. If expanded to the same functionality I think they'd have approximately equal SLOCs (contrary to C-programmer's opinion).

Readability: Chris Wallich's prog is clear as a bell, my asm prog is not. Literally, I can read his C++ easier than my asm, even though I just wrote it!

Test runs:

C:\primes>test

C:\primes>primes
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 235.8
with 4 threads
from 2 up to 2000000001, 98222287 primes

C:\primes>primes 0 0 3
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 311.5
with 3 threads
from 2 up to 2000000001, 98222287 primes

C:\primes>primes 0 0 2
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 436.3
with 2 threads
from 2 up to 2000000001, 98222287 primes

C:\primes>primes 0 0 1
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 863.9
with 1 threads
from 2 up to 2000000001, 98222287 primes

C:\primes>primes 0 1000000000
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 107.5
with 4 threads
from 2 up to 1000000000, 50847534 primes

C:\primes>primes 1000000000 2000000000
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 130.8
with 4 threads
from 1000000000 up to 2000000000, 47374753 primes


C:\primes>primes 777
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 236.2
with 4 threads
from 777 up to 2000000001, 98222150 primes

C:\primes>primes 777777
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 235.3
with 4 threads
from 777777 up to 2000000001, 98159986 primes


C:\primes>primes 1234567 123456789
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 10.42
with 4 threads
from 1234567 up to 123456789, 6931900 primes

C:\primes>primes 15485163 15485163
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 0.1576
with 1 threads
from 15485163 up to 15485163, 0 primes

C:\primes>primes 15485863 15485863
******************************************************************
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads

prime time ms 0.157
with 1 threads
from 15485863 up to 15485863, 1 primes


The zip contains the following:

primes.exe
primes.asm ; main, does up to 60061 then calls threads
threads.asm ; sets up thread data, launches them
Do_Segments.asm ; each thread uses per job of segments
SysInfo.asm ; mod of siekmanski's code, gets # threads
primes.inc ; global equates, etc; printing and timing
makeit.bat ; make file
test.bat ; runs test cases given above
I am NaN ;)

Siekmanski

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz; Windows 8; 12 threads

prime time ms 53.41
with 12 threads
from 2 up to 2000000001, 98222287 primes
Creative coders use backward thinking techniques as a strategy.

jj2007

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz; Windows 7; 4 threads

prime time ms 985.3
with 4 threads
from 2 up to 2000000001, 98222287 primes

hutch--


K:\rrr_primes\primes>primes
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 406.7
with 8 threads
from 2 up to 2000000001, 98222287 primes


K:\rrr_primes\primes>primes 0 0 3
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 388.4
with 3 threads
from 2 up to 2000000001, 98222287 primes


K:\rrr_primes\primes>primes 0 0 2
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 573.8
with 2 threads
from 2 up to 2000000001, 98222287 primes


K:\rrr_primes\primes>primes 0 0 1
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 1141
with 1 threads
from 2 up to 2000000001, 98222287 primes


K:\rrr_primes\primes>primes 0 1000000000
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 174.1
with 8 threads
from 2 up to 1000000000, 50847534 primes


K:\rrr_primes\primes>primes 1000000000 2000000000
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 246.6
with 8 threads
from 1000000000 up to 2000000000, 47374753 primes


K:\rrr_primes\primes>primes 777
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 437.9
with 8 threads
from 777 up to 2000000001, 98222150 primes


K:\rrr_primes\primes>primes 777777
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 440.3
with 8 threads
from 777777 up to 2000000001, 98159986 primes


K:\rrr_primes\primes>primes 1234567 123456789
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 30.83
with 7 threads
from 1234567 up to 123456789, 6931900 primes


K:\rrr_primes\primes>primes 15485163 15485163
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 0.1768
with 1 threads
from 15485163 up to 15485163, 0 primes


K:\rrr_primes\primes>primes 15485863 15485863
******************************************************************
Intel(R) Core(TM) i7 CPU         860  @ 2.80GHz; Windows 7; 8 threads

prime time ms 0.1758
with 1 threads
from 15485863 up to 15485863, 1 primes


nidud

#4
deleted

FORTRANS

A:\>primes
******************************************************************

Intel(R) Pentium(R) M processor 1.70GHz; Windows XP; 1 threads



prime time ms 1278

with 1 threads

from 2 up to 2000000001, 98222287 primes


A:\>primes 0 0 3


prime time ms 1277

with 1 threads

from 2 up to 2000000001, 98222287 primes


A:\>primes 0 0 2


prime time ms 1277

with 1 threads

from 2 up to 2000000001, 98222287 primes


A:\>primes 0 0 1


prime time ms 1466

with 1 threads

from 2 up to 2000000001, 98222287 primes


A:\>primes 0 1000000000 


prime time ms 590

with 1 threads

from 2 up to 1000000000, 50847534 primes


A:\>primes 1000000000 2000000000


prime time ms 687.3

with 1 threads

from 1000000000 up to 2000000000, 47374753 primes


A:\>primes 777


prime time ms 1276

with 1 threads

from 777 up to 2000000001, 98222150 primes


A:\>primes 777777


prime time ms 1280

with 1 threads

from 777777 up to 2000000001, 98159986 primes


A:\>primes 1234567 123456789


prime time ms 59.89

with 1 threads

from 1234567 up to 123456789, 6931900 primes


A:\>primes 15485163 15485163


prime time ms 0.1464

with 1 threads

from 15485163 up to 15485163, 0 primes


A:\>primes 15485863 15485863


prime time ms 0.1461

with 1 threads

from 15485863 up to 15485863, 1 primes

rrr314159

Thanks all,

Siekmanski and FORTRANS numbers make sense. However ...

hutch has 8 threads, half of them logical (hyperthreading). Siekmanski's hyperthreads perform about as well as separate cores - but not hutch's. With 8 he gets 407 ms, with 3, 389 ms; obviously not right. Evidently the older i7 860's hyperthreading is not good for this type of CPU-intensive code. If you restrict it to 4 threads with

primes 0 0 4

it should do better than 300 ms - almost as fast as my 3-years-newer i5 3330.

jj2007's i5-2450M has only 2 threads, I think (according to internet). Don't know why SysInfo says 4. jj, if you try

primes 0 0 2

it will use only 2 threads and probably get about 1/2 second. At the moment it's running 4 threads with only 2 cores (and no hyperthreading) so the threads are just getting in each other's way.

nidud's numbers are off by a factor of 1000. I suppose it's because his QueryPerformanceFrequency function is different. If you multiply the numbers by 1000 (alternatively, consider them as seconds not ms) they're right.

Bottom line, the prog has two problems. 1) nidud's computer's QueryPerformanceFrequency differs by factor of 1000, 2) for some reason jj2007's threads count is wrong. I'll take a look at recent threads concerning more accurate ways to get SysInfo. (BTW I already know the code I'm using doesn't read "Windows 10" correctly). Fortunately both problems have simple work-arounds.

Apart from that all the numbers are fine, seems the primes code works right.

Thanks again!
I am NaN ;)

TWell


dedndave

i was surprised at the performance counter frequencies, as well
however, it still seems like a good high-res time measurement,
as long as you call QueryPerformanceCounterFrequency to make calculations
(look at my thread on GetCpuFrequency to get some readings from different machines)

notice that i underlined "time" - RDTSC is still the best for cycle counting

have you found some method to validate the prime number results ?
i.e., it is fast, but is it right ?

Intel(R) Pentium(R) 4 CPU 3.00GHz; Windows XP; 2 threads

prime time ms 6.94
with 2 threads
from 2 up to 2000000001, 98222287 primes


on my machine, it says 6.94 ms, but it took about 7 seconds to run
does the test program run 1000 times ?

nidud

#9
deleted

jj2007

Quote from: rrr314159 on October 23, 2015, 03:18:48 AMjj2007's i5-2450M has only 2 threads, I think (according to internet). Don't know why SysInfo says 4. jj, if you try

primes 0 0 2

it will use only 2 threads and probably get about 1/2 second. At the moment it's running 4 threads with only 2 cores (and no hyperthreading) so the threads are just getting in each other's way.

primes 0 0 2
******************************************************************
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz; Windows 7; 4 threads

prime time ms 515.4 :t
with 2 threads
from 2 up to 2000000001, 98222287 primes

rrr314159

Quote from: TWellUse RtlGetVersion ?
- no, I know this was discussed at length in GetCpuFrequency Tests, should study it. But more likely I'll just wait for siekmanski to update his SysInfo and steal that ;-)

Quote from: dedndavenotice that i underlined "time" - RDTSC is still the best for cycle counting

- I'm using RDTSC for cycle counting, then converting to time (doesn't work so good on older machines)

Quote from: dedndavehave you found some method to validate the prime number results ? i.e., it is fast, but is it right ?

- primesieve has a nice feature: a validation test. As I mentioned he has a few features I don't. The main way I checked is by comparing primes.exe to primesieve (the console version), download here. Run a range on both progs, for instance 1234567 to 123456789. They give the same count, 6931900 primes. Do that with many ranges, they all match. That convinces me my prog is right. As I mentioned b4, Chris Wallich's primesieve was "invaluable for providing test cases".

Quote from: dedndave
prime time ms 6.94
with 2 threads
from 2 up to 2000000001, 98222287 primes

on my machine, it says 6.94 ms, but it took about 7 seconds to run

- No doubt your (older) machine is like nidud's. The time calculation is off by 1000, so that number is 6.94 seconds (not ms). I'll bet the thread count is wrong also. Probably quite a bit faster with just one:

primes 0 0 1

Quote from: jj2007prime time ms 515.4 :t

- perfect, thanks jj. I should fix the thread info routine but instead maybe I'll remember the last thread count specifed by the user; then you just have to fix it once. The user might want to limit to fewer threads anyway, so primes.exe doesn't hog the CPU.
I am NaN ;)

dedndave

Quote from: rrr314159 on October 23, 2015, 12:28:53 PM
- I'm using RDTSC for cycle counting, then converting to time (doesn't work so good on older machines)

then, you need to know the CPU clock frequency to make that conversion


.... yah, my machine has a single physical core, but hyperthreading makes 2 logical cores

Zen

F:\MASM Forum Demos\Faster Primes, RRR314159\primes>primes
******************************************************************
Intel(R) Pentium(R) Dual  CPU  E2140  @ 1.60GHz; Windows 7; 2 threads
prime time ms 1113
with 2 threads
from 2 up to 2000000001, 98222287 primes

F:\MASM Forum Demos\Faster Primes, RRR314159\primes>primes 0 0 1
******************************************************************
Intel(R) Pentium(R) Dual  CPU  E2140  @ 1.60GHz; Windows 7; 2 threads
prime time ms 2116
with 1 threads
from 2 up to 2000000001, 98222287 primes

F:\MASM Forum Demos\Faster Primes, RRR314159\primes>primes 0 0 2
******************************************************************
Intel(R) Pentium(R) Dual  CPU  E2140  @ 1.60GHz; Windows 7; 2 threads
prime time ms 1116
with 2 threads
from 2 up to 2000000001, 98222287 primes

F:\MASM Forum Demos\Faster Primes, RRR314159\primes>primes 0 0 3
******************************************************************
Intel(R) Pentium(R) Dual  CPU  E2140  @ 1.60GHz; Windows 7; 2 threads
prime time ms 1093
with 2 threads
from 2 up to 2000000001, 98222287 primes

F:\MASM Forum Demos\Faster Primes, RRR314159\primes>primes 0 0 4
******************************************************************
Intel(R) Pentium(R) Dual  CPU  E2140  @ 1.60GHz; Windows 7; 2 threads
prime time ms 1102
with 2 threads
from 2 up to 2000000001, 98222287 primes


RRR314159, Hi,...how did I miss this ???
I read through the source very quickly,...most of it I didn't understand,...so, I will have to study it extensively,...Thanks for the example.  :bgrin:
I have two questions:

  • I assume that the actual values of the primes are stored in your defined segments,...so, could I modify the code to, say, write these to file ???
  • Is the only practical use for prime numbers public-key cryptography algorithms ???

rrr314159

I assume that the actual values of the primes are stored in segments,...so, could I modify the code to, say, write these to file ???

- yes and no ... (of course for all the primes a printout would come to a gig or so). The primes are actually encoded in a byte map. So for instance the second segment starts at 60063, but even numbers not included, so the index is 30031. 60077 is the first prime in the seg, so that means byte #7 is a "1", the previous ones are zero. Etc. I actually have written a routine to print out the primes from a segment; that was for a previous version, though.

So, you could do it yourself, but it's a challenge. Let me know how it goes,

Meanwhile I'm working on an improved 64-bit version which (I hope) will be faster, and go up to a trillion or more. (That would take a few minutes). I intend to have an interface ("API") allowing the user to write his own code for any prime manipulations: not only printing but also computing residues, k-tuplets, ranges etc. My original motivation, in fact, was to check out a "theorem" (more accurately, a guess) of mine concerning residues. So I'd rather make a print function for that version. Of course laziness might intervene
I am NaN ;)