This 64-bit version of the Prime Number Generator (png) is slightly faster than primesieve (the current record holder on the net) for primes below 2 billion, multi-threaded. It sieves up to 2 billion in about 113 milliseconds vs. roughly 117.5 for primesieve. Primesieve (ps) measures with ticks and reports 110 and 125 about equally often, so I estimate 117.5. These are the times on my i5-3330, using Kim Walisch's latest console version (Nov 8, 2015).
The margin is very slim, and ps beats png with one thread; my advantage is in the threading algorithm. More important, he goes up to 2^64 (about 18 billion billion), so png is just a toy by comparison; call it a proof of concept. Since converting to 64 bits I can reach the same 2^64 without too much trouble, but png will be very slow up there until two more critical algorithms are added (for medium and large primes). When all that is done (I doubt I'll ever get to it) the 2-billion timing might suffer... Bottom line, Kim Walisch has nothing to worry about yet!
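The post doesn't describe png's threading algorithm, so the following is only a generic sketch of why a segmented sieve splits well across threads, not png's method. Once the base primes up to sqrt(hi) are known, each chunk of [lo, hi] can be sieved on its own, because every chunk computes its own first multiple of each base prime. The function names and the even chunk split are hypothetical; the chunks run one after another here, but each `count_chunk` call could be handed to a separate thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Plain sieve of Eratosthenes for the base primes <= limit. */
static size_t base_primes(uint64_t limit, uint32_t **out) {
    uint8_t *composite = calloc((size_t)limit + 1, 1);
    uint32_t *primes = malloc(((size_t)limit / 2 + 2) * sizeof *primes);
    assert(composite != NULL && primes != NULL);
    size_t n = 0;
    for (uint64_t i = 2; i <= limit; i++) {
        if (composite[i]) continue;
        primes[n++] = (uint32_t)i;
        for (uint64_t j = i * i; j <= limit; j += i) composite[j] = 1;
    }
    free(composite);
    *out = primes;
    return n;
}

/* Counts primes in [lo, hi]. Needs nothing from other chunks: each base
   prime's first multiple inside the chunk is computed locally. */
static uint64_t count_chunk(uint64_t lo, uint64_t hi,
                            const uint32_t *bp, size_t nbp) {
    if (lo < 2) lo = 2;
    if (lo > hi) return 0;
    size_t len = (size_t)(hi - lo + 1);
    uint8_t *composite = calloc(len, 1);
    assert(composite != NULL);
    for (size_t k = 0; k < nbp; k++) {
        uint64_t p = bp[k];
        uint64_t start = (lo + p - 1) / p * p;  /* first multiple of p >= lo */
        if (start < p * p) start = p * p;        /* p stays prime; smaller multiples have a smaller factor */
        for (uint64_t m = start; m <= hi; m += p) composite[m - lo] = 1;
    }
    uint64_t count = 0;
    for (size_t i = 0; i < len; i++) count += !composite[i];
    free(composite);
    return count;
}

/* Splits [lo, hi] into nchunks contiguous pieces and sums their counts.
   The pieces share only the read-only bp[] array. */
static uint64_t count_range(uint64_t lo, uint64_t hi, unsigned nchunks) {
    assert(nchunks > 0);
    if (lo > hi) return 0;
    uint64_t r = 0;
    while ((r + 1) * (r + 1) <= hi) r++;        /* r = floor(sqrt(hi)) */
    uint32_t *bp;
    size_t nbp = base_primes(r, &bp);
    uint64_t span = hi - lo + 1, total = 0;
    for (unsigned t = 0; t < nchunks; t++) {
        uint64_t a = lo + span * t / nchunks;        /* chunk start */
        uint64_t e = lo + span * (t + 1) / nchunks;  /* one past chunk end */
        if (a < e) total += count_chunk(a, e - 1, bp, nbp);
    }
    free(bp);
    return total;
}
```

For example, `count_range(0, 1000000, 1)` and `count_range(0, 1000000, 4)` both give 78498, the number of primes below one million; the split only changes who does the work, not the result.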
png's build size is about 4% of ps's, and its SLOC count about 30%. Assembler needs fewer lines than C++ here thanks to macros, and of course far fewer bytes.
As usual this is all my own work, although primesieve has been invaluable for inspiration. About 75% of my algorithms are roughly the same (there are only so many ways to skin a cat), but a comparison will clearly show they were developed independently. In fact he has a couple of wrinkles I didn't think of, which I didn't steal; next time I will. OTOH he's also missing a couple of things, particularly for threading.
To run it: "png" alone sieves up to 2 billion by default. "png 12 67" counts the primes from 12 to 67, "png 12 67 -p" also prints them, and "png -t2" uses only two threads. BTW, generating a 1 GB file of roughly 100 million primes with "png -p > primes.txt" takes at least 30 seconds.
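A slow but obviously correct reference is handy for spot-checking small ranges against png. This sketch (hypothetical names, not part of png) uses trial division and treats both bounds as inclusive, as the test output below suggests: "png 15485863" reports on the single number 15485863. It is far too slow for the 2-billion runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Trial division by 2, 3, then by numbers of the form 6k - 1 and 6k + 1. */
static bool is_prime(uint64_t n) {
    if (n < 2) return false;
    if (n < 4) return true;                  /* 2 and 3 */
    if (n % 2 == 0 || n % 3 == 0) return false;
    for (uint64_t d = 5; d * d <= n; d += 6)
        if (n % d == 0 || n % (d + 2) == 0) return false;
    return true;
}

/* Counts (and optionally prints) the primes in the inclusive range [lo, hi],
   like "png lo hi" and "png lo hi -p". The sketch assumes lo <= hi. */
static uint64_t primes_in(uint64_t lo, uint64_t hi, bool print) {
    assert(lo <= hi);
    uint64_t count = 0;
    for (uint64_t n = lo; ; n++) {
        if (is_prime(n)) {
            count++;
            if (print) printf("%llu\n", (unsigned long long)n);
        }
        if (n == hi) break;                  /* also avoids wrap at UINT64_MAX */
    }
    return count;
}
```

For instance, `primes_in(12, 67, true)` prints the 14 primes from 13 through 67.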
Here are the results of some test cases:
C:\png1>test
C:\png1>png
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 112.9
from 1 up to 2000000010, 98222287 primes
C:\png1>png -t1
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 416.3
from 0 up to 2000000010, 98222287 primes
C:\png1>png 0 1000000000
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 49.8
from 0 up to 1000000000, 50847534 primes
C:\png1>png 1000000000 2000000000
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 69.07
from 1000000000 up to 2000000000, 47374753 primes
C:\png1>png 777 2000000000
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 112.5
from 777 up to 2000000000, 98222150 primes
C:\png1>png 777777 2000000000
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 112.5
from 777777 up to 2000000000, 98159986 primes
C:\png1>png 1234567 123456789
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 5.379
from 1234567 up to 123456789, 6931900 primes
C:\png1>png 15485163
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 0.0001095
from 15485163 up to 15485163, 0 primes
C:\png1>png 15485863
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 0.1597
from 15485863 up to 15485863, 1 primes
C:\png1>png 0 960960
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads
prime time ms 0.1498
from 0 up to 960960, 75681 primes
The first zip includes:
png.exe
png.asm ; main program, does first segment
threads.asm ; thread algo
segments.asm ; segments above the first one
wheel.asm ; wheel algorithm (2 * 3 * 5 = 30)
utility.asm ; prime-counting, sys info, command line
png_macros ; png-specific utility macros
png.inc ; png-specific, plus all required 64-bit includes
my_macros.asm ; generic macros: print, timing, etc.
makeit.bat ; make file
test.bat ; generates test results
test_results.txt ; expected test results
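wheel.asm is described only as a 2 * 3 * 5 = 30 wheel, so this is a generic sketch of what such a wheel buys, not png's code. Once multiples of 2, 3 and 5 are removed, only 8 residues mod 30 (1, 7, 11, 13, 17, 19, 23, 29) can still be prime, so one byte with one bit per residue covers 30 integers. The function names and this particular bit layout are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The 8 residues mod 30 that share no factor with 2, 3 or 5. */
static const uint8_t RES[8] = {1, 7, 11, 13, 17, 19, 23, 29};

/* Bit position of residue r (0..29), or -1 if r shares a factor with 30. */
static int bit_index(unsigned r) {
    for (int b = 0; b < 8; b++)
        if (RES[b] == r) return b;
    return -1;
}

/* Marks p*m composite for every wheel number m >= p. Products of two wheel
   numbers are wheel numbers, so every mark lands on an existing bit. */
static void cross_off(uint8_t *composite, uint64_t p, uint64_t limit) {
    for (uint64_t j = p / 30; ; j++)
        for (int c = 0; c < 8; c++) {
            uint64_t m = 30 * j + RES[c];
            if (m < p) continue;
            uint64_t q = p * m;
            if (q > limit) return;               /* q only grows from here */
            composite[q / 30] |= (uint8_t)(1u << bit_index((unsigned)(q % 30)));
        }
}

/* Counts primes <= limit with a mod-30 wheel: byte i covers 30*i .. 30*i+29. */
static uint64_t count_primes_wheel30(uint64_t limit) {
    uint64_t count = (limit >= 2) + (limit >= 3) + (limit >= 5);
    if (limit < 7) return count;
    size_t nbytes = (size_t)(limit / 30) + 1;
    uint8_t *composite = calloc(nbytes, 1);      /* bit set = composite */
    assert(composite != NULL);
    for (uint64_t i = 0; i < nbytes; i++)
        for (int b = 0; b < 8; b++) {
            uint64_t p = 30 * i + RES[b];
            if (p < 7 || p * p > limit) continue; /* skip 1; only p <= sqrt(limit) */
            if (!(composite[i] & (1u << b))) cross_off(composite, p, limit);
        }
    for (uint64_t i = 0; i < nbytes; i++)
        for (int b = 0; b < 8; b++) {
            uint64_t n = 30 * i + RES[b];
            if (n >= 7 && n <= limit && !(composite[i] & (1u << b))) count++;
        }
    free(composite);
    return count;
}
```

Compared with one byte per odd number, this layout needs 8 bits per 30 integers instead of 15 bytes, which is why a wheel helps keep each segment small.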
The second zip contains the two required 64-bit import libraries, kernel32.lib and msvcrt.lib, which I got from WinInc; I include them because incompatible versions are floating around. With these two zips you can build png; all you need is JWasm (version 2.11 or later) or asmc.