Just for fun :biggrin:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
It took 29.16 seconds to find 98222287 primes among 2000000000 numbers
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67
71 73 79 83 89 97 101 103 107 109 113 127 131 137 139 149 151 157 163
167 173 179 181 191 193 197 199 211 223 227 229 233 239 241 251 257 263 269
271 277 281 283 293 307 311 313 317 331 337 347 349 353 359 367 373 379 383
389 397 401 409 419 421 431 433 439 443 449 457 461 463 467 479 487 491 499
...
1999999553 1999999559 1999999571 1999999609 1999999613 1999999621 1999999643 1999999649 1999999657 199999
9747 1999999763 1999999777 1999999811 1999999817 1999999829 1999999853 1999999861 1999999871 1999999873
1999999913 1999999927 1999999943 1999999973
Why is it so slow? 8)
It's waiting for your fast 64-bit version :biggrin:
I have a 32-bit version somewhere which took a few seconds, but was outclassed by others mentioned on the net. Of course multiple threads helps
I have a question about Italy, might as well ask it now. Long ago Italy produced men like Caesar, the Cato's, Pompey, Brutus, Octavius Augustus .. and Horace, Virgil, Tacitus .. Marcus Aurelius ..: a long list of great men. More recently we have Dante Alighieri, Leonardo da Vinci, Enrico Fermi and other great geniuses. But since 1945 or so, nothing! Well there's jj2007, but that's about it. The question is, what the h**l went wrong??
Jochen is from Germany :biggrin:
Fellini, Pasolini, Abbado, Gucci, Armani, Ferrari, Maserati, Lamborghini, ... ;-)
Quote from: jjFellini, Pasolini, Abbado, Gucci, Armani, Ferrari, Maserati, Lamborghini, ... ;-)
- Right: like I said, since 1945 or so, a bunch of nobodies! Imagine telling Caesar, after he's spent the day slaughtering thousands of Germans and half the Senate, and while taking a quick break is deciding the fate of the entire known world: "We still have great men in Italy! For instance here's a guy Armani who
designs clothes. Take off that blood-stained armor and try on this little number - you'll look so cute, darling!" He'd chop your head off and feed it to the dogs in the street.
What went wrong??
If King Richard III shouted "My kingdom for a horse", what do you think Caesar would have offered for a Lamborghini???
Caesar would respect Henry Ford, or other great inventors, but not a mere mechanic. He'd put Lamborghini to work repairing cart axles and cleaning up after the horses. A car would be useless: where would he get gasoline, oil, brake fluid, tires, spare parts? At least, he'd prefer something very rugged and reliable - Lamborghini's break down constantly. Maybe a Lamborghini tractor would do, but no doubt he'd want something Japanese or Korean-made.
Anyway, anybody these days can have a car; me, you, Lamborghini are all way ahead of Caesar in that regard. But I'm talking about the quality of the man, not his toys.
What the h**l went wrong ??
(http://i.imgur.com/aVBZoNC.jpg)
you forgot Monica Bellucci, a national treasure.
(http://www.effinhot.com/wp-content/uploads/2014/12/6af1da79629498336e7bee2b066d77291.jpg)
Quote from: rrr314159 on August 12, 2015, 12:19:45 PM
What the h**l went wrong ??
People got domesticated and complacent. And "leaders" got to rage war without putting their own skin on the line. That's what went wrong.
The same question could be asked for the Greeks. Is like they have chosen the worst traits of Athenians and Spartans instead of the best traits.
@dedndave and hutch,
Enzo Ferrari was somebody, can't deny that, but his glory days of skill and bravery were all before 1945 (the year I specified, as u remember). After that he was a mere businessman, who managed to win a few races, while killing seven of his drivers and numerous spectators. His dictatorial disregard of other's lives would, actually, be the only quality to win Caesar's respect ...
Likewise, I can't deny Monica Bellucci is somebody! Caesar would definitely have respected her and appointed her to a very important position. In fact, the entire army would have respected her - one after the other.
Come to think of it, the women have, indeed, improved since ancient days. There are extant pictures of Caesar's great passion, Cleopatra: if that's the best they had ... well, I'd hate to see the worst. More recently, look at paintings such as Mona Lisa. Ms. Bellucci is a huge improvement. Then there's Sophia Loren, and Gina Lollobridgida, and a million young ones I've never heard of. No doubt, the quality of the women is way up.
But, I'm talking about the quality of the men.
What went wrong ??
xanatose,
you must be a lot of fun at parties! ... I'm just trash-talking, don't really expect an answer. You're right, there are reasons why Italy is so much less than it was; but those same reasons apply to every country. The modern era is real tough on heroes. Italy is a good subject, not because its current crop is worse than anyone else's, but because its surpassing greatness in the past makes such a contrast.
For instance it wouldn't be funny to pick on Poland: the list of notables is a joke in itself, all it needs is the laugh track. Pilsudski? Boleslaw the Brave? Żeromski? Tarski? General Beck, who charged tanks with cavalry? Say "Chopin" - that's it, you're done with great Poles (well, Ulam perhaps also).
Quote from: xanatoseIs like they [Greeks] have chosen the worst traits of Athenians and Spartans instead of the best traits.
- indeed the same thought has occurred to me
Quote from: jj2007Just for fun :biggrin: ... It took 29.16 seconds to find 98222287 primes among 2000000000 numbers
- Actually this is an interesting topic, perhaps the time has come for a 64-bit SIMD multi-threaded prime number generator. Sorry, I accidentally hijacked the thread to complain about Italy. My excuse, I noticed "Just for fun" and, without checking, thought we were in the SoapBox or Colosseum. Like to know how fast the current champion PM generator is, can't seem to find it on net. Undoubtedly beatable, assuming it's not assembler. ... some fast techniques would be very SIMD-izable, for instance "wheeling". And of course cache management is vital
I think I will stick with Monica Bellucci being an Italian national treasure. When I want prime numbers I look them up in a massive table I have.
Quote from: rrr314159 on August 12, 2015, 01:22:58 PMMonica Bellucci is somebody!
Quote from: hutch-- on August 12, 2015, 04:15:55 PMI think I will stick with Monica Bellucci being an Italian national treasure.
Nice to see that two philosophers can agree on some body :icon_mrgreen:
Will look into the SIMD stuff when I find time. Maybe.
Hello jj2007:
C:\masm32\bin\PrimesBitField>PrimesBitField.exe
Intel(R) Core(TM)2 Duo CPU E4600 @ 2.40GHz
It took 69.97 seconds to find 98222287 primes among 2000000000 numbers
2 3 5 7 11 13 17 19 23 29
31 37 41 43 47 53 59 61 67 71
73 79 83 89 97 101 103 107 109 113
127 131 137 139 149 151 157 163 167 173
179 181 191 193 197 199 211 223 227 229
233 239 241 251 257 263 269 271 277 281
283 293 307 311 313 317 331 337 347 349
353 359 367 373 379 383 389 397 401 409
419 421 431 433 439 443 449 457 461 463
467 479 487 491 499
...
1999999553 1999999559 1999999571 1999999609 1999999613
1999999621 1999999643 1999999649 1999999657 1999999747
1999999763 1999999777 1999999811 1999999817 1999999829
1999999853 1999999861 1999999871 1999999873 1999999913
1999999927 1999999943 1999999973
It took almost 70 seconds.
Regards Herge
I LOVE THREADS LIKE THIS !!!
...As long as we're NOT MAKING ANY SENSE AT ALL,...
(https://scontent.cdninstagram.com/hphotos-xaf1/t51.2885-15/s320x320/e15/11357619_1657219911181892_1112746291_n.jpg)
...Oh,...and, HUTCH,...good point (http://masm32.com/board/index.php?topic=4495.msg48074#msg48074),...
Quote from: Zen on August 13, 2015, 05:30:19 AM
I LOVE THREADS LIKE THIS !!!
...As long as we're NOT MAKING ANY SENSE AT ALL,...0
Shovels, dead bodies...? :dazzled:
You are hijacking this thread! Until here, we were arguing about prime alive bodies 8)
(http://www.wallpixo.com/wp-content/uploads/2015/07/monica-bellucci-1-0.jpg)
Quote from: primesieve.orgprimesieve is a free software program and C/C++ library that generates primes using a highly optimized sieve of Eratosthenes implementation. It counts the primes below 10^10 in just 0.57 seconds on an Intel Core i7-4770 CPU (4 x 3.4GHz) from 2013.
- This is about as fast as they get, AFAIK. It's written in C++ and doesn't use SIMD, so must be beatable. compared to jj2007's, it's only about 20 times faster (taking into account multiple threads, etc) - with maybe 100 times as much code.
Quote from: rrr314159 on August 13, 2015, 06:30:45 AMcompared to jj2007's, it's only about 20 times faster
You made my day :bgrin:
Quote from: jj2007 on August 13, 2015, 06:55:54 AM
Quote from: rrr314159 on August 13, 2015, 06:30:45 AMcompared to jj2007's, it's only about 20 times faster
You made my day :bgrin:
Well it might be only factor of 10, hard to say. There are a lot of tricks to speed up priming, such as marking small primes all at once using bit-map of length 210 (leaving out details). If it makes your day better, I found my prime routine and timed it. It's about the same as yours except with threads
Hey, just kidding. I had also noticed primesieve & company, there seems to be a big community dedicated to chasing speed records, so we are just amateurs... if I had more time, I'd try the SIMD road, though.
Quote from: jjwe are just amateurs
- Every pro started as an amateur .. ! If we had to
figure out their tricks a defeatist attitude might be appropriate, but we don't, it's all public knowledge.
- BTW I've returned to my main thing - math programming - for the last couple months. I'm no longer learning assembler as such; it's a tool to do math with. You may have noticed my answers to questions have been vague and often along the lines of "d**ned if I know!". For 6 months I studied every issue that came up - goal was to get up to speed on assembler and I was interested in every detail. Now however I feel I've accomplished the objective (well enough).
- Therefore I may tackle prime numbers. To you it's a side issue (the main issue is assembler); to me, this sort of math is what it's all about.
- BTW I don't at the moment see much use for SIMD. Seems prime num algo does not lend itself to vector programming; unlike FFT's and graphics. Probably that's why primesieve author doesn't use it
Just for fun I made this Eratosthenes prime generator, does 2 billion in 10.5 seconds, on one thread. With 4 threads it gets under four seconds, but not 2.625, since threading is not very efficient. It has a few tricks (out of dozens one could put in):
- map only has odd numbers, similar to sieve of Sundaram
- uses "wheel" to mark 3 and 5
- starts priming at the square, skips even numbered multiples
- couple other even more minor items
But Chris Wallich's is 10 times faster. Apparently the reason is mainly cache management. With regular Eratosthenes (as I'm using) you mark one prime all the way through the map, then go back for the next one, etc. So the entire map must be moved to cache once for each prime (up to the square root); that's a few thousand times. The "Wallich technique" (AFAIK he invented it) takes just one piece of the map at a time, which fits into cache, and marks all the primes. Then it gets the next one: about 30,000 little segments. It's a lot more trouble, but this way the entire map is brought into cache only once. Apparently that speeds it up an order of magnitude! So I'd like to go ahead and try that, if I don't get bored. The other nice thing about his technique, it's more convenient to multi-thread. However, none of these approaches seems to use SIMD effectively (scatter / gather commands would help).
Anyway FWIW here it is. The main routine, primes.asm:
include primes.inc
SIZEOFMAP = 1000000000 ; this is highest index
MAX_NUMBER = SIZEOFMAP*2 + 1 ; highest odd number is 2*index + 1
indextoprime MACRO ; uses eax, converts any odd num not just prime
shl eax, 1
add eax, 1
ENDM
primetoindex MACRO ; reverse computation
sub eax, 1
shr eax, 1
ENDM
.data
primesmap dd 0
tenm3 real8 0.001
timerResults real8 0.0
.code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
primes:
call GetSystemInformation
InitCycles ; start timer
; allocate memory
invoke GlobalAlloc, 0, SIZEOFMAP + 100h ; buffer in case SIZEOFMAP not div by 4
mov edi, eax ; edi => the primes map
; do the wheel "mask"
CALL MasktheMap ; fill in "wheel" of 60 (30 times 2 for DWORDs)
dec edi ; below here edi always adds 1, edi + 0 is never used,
; so just back it up 1 because ecx = 1 is index of 3
; thus edi + 1 is on a dword boundary
mov primesmap, edi
; main work here
call MarkPrimes ; sieve of eratosthenes
call CountThem ; return count in ebx
; get time, print results
movflt timerResults, @ElapsedCycles()
movflt timerResults, @CyclestoTime(timerResults)
fld timerResults
fmul tenm3
fstp timerResults
pp "number of primes %u to %u\n", ebx, MAX_NUMBER
pp "prime time seconds %.4g\n", timerResults
; finally, print out last few primes
mov ecx, SIZEOFMAP
mov edx, ecx
sub edx, 100
xor ebx, ebx
printloop:
cmp BYTE PTR [edi + ecx], 0
je @F
mov eax, ecx
indextoprime
pp "prime %u\n", eax
inc ebx
@@:
dec ecx
cmp ecx, edx
jge printloop
pp "number of primes %i above %i\n", ebx, edx
ret
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
MasktheMap:
mov esi, edi
mov ecx, 0 ; threes/fives mask (nonzero ultimately means it's prime)
@@:
mov DWORD PTR [esi + ecx], 00010000h ; this knocks out (2 and) 3 and 5, 60 bytes at a time
mov DWORD PTR [esi + ecx + 4], 01000101h ; don't forget they're backwards
mov DWORD PTR [esi + ecx + 8], 00010001h
mov DWORD PTR [esi + ecx + 12], 00010100h
mov DWORD PTR [esi + ecx + 16], 01000100h
mov DWORD PTR [esi + ecx + 20], 01010001h
mov DWORD PTR [esi + ecx + 24], 00000100h
mov DWORD PTR [esi + ecx + 28], 00000101h
mov DWORD PTR [esi + ecx + 32], 01010001h
mov DWORD PTR [esi + ecx + 36], 00010100h
mov DWORD PTR [esi + ecx + 40], 01000001h
mov DWORD PTR [esi + ecx + 44], 01000001h
mov DWORD PTR [esi + ecx + 48], 00010100h
mov DWORD PTR [esi + ecx + 52], 01000101h
mov DWORD PTR [esi + ecx + 56], 01010000h
add ecx, 60
cmp ecx, SIZEOFMAP
jle @B
mov DWORD PTR [edi], 00010101h ; the first word set 3, 5, 7 to 1
ret
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
MarkPrimes:
xor esi, esi
mov edi, primesmap
mov ecx, 2 ; below it incs, so it starts with 7
beginnextprimeloop:
inc ecx
cmp BYTE PTR [edi + ecx], 1
jne beginnextprimeloop
; found a prime
mov eax, ecx ; ecx is the prime's index
mul ecx
add eax, ecx ; now eax has the index ^2 + index
shl eax, 1 ; index of the square = (x^2 + x) * 2
xchg eax, ecx ; eax again has prime's index
cmp ecx, SIZEOFMAP ; square has reached top ?
jg donepriming
indextoprime ; double eax, add 1, to check only odd multiples
@@:
mov BYTE PTR [edi + ecx], 0
add ecx, eax
cmp ecx, SIZEOFMAP
jle @B
primetoindex ; eax back to the prime's index (eg, for 11 it's 5)
mov ecx, eax
jmp beginnextprimeloop
donepriming:
ret
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
CountThem:
mov ebx, 1 ; for "2"
mov esi, primesmap
mov ecx, 1 ; ecx = 1 is index of 3
@@:
mov edx, [esi + ecx]
xor eax, eax
add al, dl
add al, dh
bswap edx
add al, dl
add al, dh
add ebx, eax
add ecx, 4
cmp ecx, SIZEOFMAP
jl @B
ret
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end primes
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
The zip includes:
primes.exe
primes.asm
primes.inc ; support routines like timer, and includes etc
makeit.bat ; batch make file
primesrun.txt ; runs for this routine and also jj2007's on my machine
It works!
Put in a very rough implementation of cache segmentation; cut the time for 2 billion from 10.5 to 2 seconds (!) on one thread. This is NOT optimized, lots of obvious improvements (other than threads) can be made (for instance I've got variables in memory which could be in registers) so it can be a lot faster. That's within striking distance of Chris Wallich, maybe half the speed or so. Then there are more difficult improvements one can steal from his program (like better wheeling) - so it seems likely he can be beat - which is expected: assembler vs. C++ = no contest. Whether anyone feels like doing all that work is another matter! But I think I'll be within a factor of 2 without much trouble.
After a while I'll clean it up and post it. Just wanted to share the good news: this caching technique does, in fact, work :eusa_dance:
On an i7 quad, running it with 8 threads should move it along a fair bit faster.
Compliments :t
Have no time right now to participate, but I'm watching with awe ;-)
Here is a pretty simple, pretty fast version of the segmented sieve of Eratosthenes prime number algorithm - only single-thread.
This version does 2 billion in about 1.7 seconds. It's hard to say how it compares to Chris Wallich's "primesieve"; not much slower (ignoring threading). It beats his "Simple (single-core) Segmented sieve of Eratosthenes" by 25%. Of course it's much smaller build, 4K (about 350 lines of asm).
This is very simple 32-bit code, uses SSE, compiles under ml.exe or JWasm.
In primes.asm, you may need to set the SEGMENT_SIZE (on my machine it's 32768) and the MAX_INDEX which is half the top number, in this case, 1 billion. It requires a bit more ram than MAX_INDEX, in this case, a gig.
Here are sample runs:
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads (only 1 used)
prime time seconds 1.715
number of primes 98222287 to 2000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 1999999973
prime 1999999943
prime 1999999927
prime 1999999913
prime 1999999873
prime 1999999871
prime 1999999861
prime 1999999853
prime 1999999829
prime 1999999817
prime 1999999811
number of primes 11 above 1999999801
Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz; Windows 8; 4 threads (only 1 used)
prime time seconds 0.7966
number of primes 50847534 to 1000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 999999937
prime 999999929
prime 999999893
prime 999999883
number of primes 4 above 999999801
The zip includes:
primes.exe
primes.asm ; main routine
primes.inc ; includes and support routines: print, timer, sys-info
makeit.bat ; "make" file
[edit] made another version about 15% faster, updated the .zip file, apologies to those who already downloaded it
Even on my sloooowww laptop, it does a decent job. :t
Genuine Intel(R) CPU T2060 @ 1.60GHz; Windows XP; 2 threads (only 1 used)
prime time seconds 2.685
number of primes 98222287 to 2000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 1999999973
prime 1999999943
prime 1999999927
prime 1999999913
prime 1999999873
prime 1999999871
prime 1999999861
prime 1999999853
prime 1999999829
prime 1999999817
prime 1999999811
number of primes 11 above 999999900
From the updated version:
Genuine Intel(R) CPU T2060 @ 1.60GHz; Windows XP; 2 threads (only 1 used)
prime time seconds 2.442
~~~~
~~~~
This AMD is so slow :(AMD E-450 APU with Radeon(tm) HD Graphics 1.65GHz; Windows 7; 2 threads (only 1 used)
prime time seconds 5.594
number of primes 98222287 to 2000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 1999999973
prime 1999999943
prime 1999999927
prime 1999999913
prime 1999999873
prime 1999999871
prime 1999999861
prime 1999999853
prime 1999999829
prime 1999999817
prime 1999999811
number of primes 11 above 1999999801
Quote from: TWell on September 15, 2015, 05:52:59 PM
This AMD is so slow :(
Are you running AV?
If so, maybe thats a contributing factor?
Else its possible that the code has optimizations that work best on Intel ???
@zedd,
Thanks a lot zedd, sorry for the two versions, BTW you might look at the timer macros in primes.inc, since you've been working on such things in the laboratory
@TWell,
thanks, my AMD A6 is a bit faster than E-450 at 4.2 but still a lot slower than I5
BTW zedd slower AMD's could be partly due to optimizing for Intel, but mainly they're just slower (with some exceptions; for instance fsin and fcos are much faster on AMD's)
Quote from: rrr314159 on September 15, 2015, 05:57:48 PM
...BTW you might look at the timer macros in primes.inc
Great, I'll look into that.
And your'e welcome.
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz; Windows 7; 4 threads (only 1 used)
prime time seconds 2.084
JWasm and ML 8.0+ accept the source, but ML 6.15 doesn't like it, internal error at the mask0 etc declarations.
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz; Windows 8; 8 threads (only 1 used)
prime time seconds 1.218
Windows 10 actually... :biggrin:
@TWell, (and anyone else also, )
I forgot to mention, check the cache size, it should be set to your L1 cache. In the current code it's 32768. Try half that much, or twice, see if it makes a difference
@jj2007,
Thanks, mask0 is an oword in .data section. If anyone cares I can post the (older, slower) masking routine that doesn't use owords
@sinsi,
thanks, your 'puter is the speed king ... wouldn't be surprised if you could increase the cache size also, do even better ... ? My sys-info routine doesn't know about Win 10 yet; I stole it from siekmanski so blame him :biggrin:
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz; Windows 8.1; 12 threads (only 1 used)
prime time seconds 0.3928
number of primes 98222287 to 2000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 1999999973
prime 1999999943
prime 1999999927
prime 1999999913
prime 1999999873
prime 1999999871
prime 1999999861
prime 1999999853
prime 1999999829
prime 1999999817
prime 1999999811
number of primes 11 above 1999999801
Quote from: rrr314159 on September 15, 2015, 06:28:02 PM
..should be set to your L1 cache. In the current code it's 32768. Try half that much, or twice, see if it makes a difference
hmmmmm....
ml 6.14 doesn't like mfence
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997. All rights reserved.
Assembling: primes.asm
primes.asm(48) : error A2008: syntax error : MFENCE
RDTSCFLOAT(3): Macro Called From
InitCycles(17): Macro Called From
primes.asm(48): Main Line Code
primes.asm(54) : error A2008: syntax error : MFENCE
RDTSCFLOAT(3): Macro Called From
@RDTSCFLOAT(2): Macro Called From
ElapsedCycles(3): Macro Called From
@ElapsedCycles(2): Macro Called From
movflt(1): Macro Called From
get_time(1): Macro Called From
primes.asm(54): Main Line Code
lemme grab jwasm...
Just for kicks, I have no idea how big my L1 cache is.
Genuine Intel(R) CPU T2060 @ 1.60GHz; Windows XP; 2 threads (only 1 used)
prime time seconds 4.036 L1 = 65536
prime time seconds 2.46 L1 = 32768 ; seems optimal
prime time seconds 2.972 L1 = 16384
prime time seconds 3.772 L1 = 8192
used jwasm :t
Quote
JWasm v2.11, ~~~
primes.asm: 409 lines, 2 passes, 46 ms, 0 warnings, 0 errors~~
@Siekmanski,
thanks, wow .4 seconds, sinsi eat your heart out! Chris Wallich was using an I7-4770, more comparable to sinsi's I guess ... it seems that this algo is about the same speed as his, if I make a 64-bit version with threads
@zedd151,
apparently your cache is same as mine 32768. This prime number algo is very sensitive to cache size, as u can see
Quote from: rrr314159 on September 15, 2015, 06:28:02 PMThanks, mask0 is an oword in .data section.
Workaround is two qwords, but caution with the order ;-)
Yes jj2007, least significant byte is tricky ;-) ... ML 6.15 doesn't like mfence either ... not sure it's worth it to care about a 6.15 version? Is it the case that some people can't (or, won't) upgrade to 8.0?
Quote from: rrr314159 on September 15, 2015, 07:57:28 PM
Is it the case that some people can't (or, won't) upgrade to 8.0?
For me at least, I only upgrade anything on an 'as-needed' basis.
I never simply upgrade because of a so-called 'improved' version.
(that's why I still run WinXP, don't need (yet) anything else) 8)
Software compatibility? That's why I am learning to write my own :biggrin:
The upgrade is indeed a clumsy procedure. Not everybody is willing to install that behemoth of Visual Crap.
asm:
mask0 qword 0001000101000101h, 0100010000010100h
mask1 qword 0000010000010001h, 0101000100000101h
mask2 qword 0100000100010000h, 0001010001000000h
mask3 qword 0100000001000101h, 0100000100010000h
mask4 qword 0001010000010001h, 0101000100000100h
mask5 qword 0000010100000100h, 0001010001010001h
inc:
db 0Fh, 0AEh, 0F0h ; mfence
db 0fh, 31h ; rdtsc
Thanks jj2007,
next version I'll put these changes in for those 6.15 fans out there. If anyone wants to change it themselves: the place in primes.asm to substitute the mask0, mask1 ... block is obvious. The mfence and rdtsc db statements would replace those instructions in primes.inc
SEGMENT_SIZE = 8192
AMD E-450 APU with Radeon(tm) HD Graphics; Windows 7; 2 threads (only 1 used)
prime time seconds 8.54
SEGMENT_SIZE = 16384
AMD E-450 APU with Radeon(tm) HD Graphics; Windows 7; 2 threads (only 1 used)
prime time seconds 7.385
SEGMENT_SIZE = 32768
AMD E-450 APU with Radeon(tm) HD Graphics; Windows 7; 2 threads (only 1 used)
prime time seconds 6.541