Prime numbers

Started by jj2007, August 12, 2015, 07:19:26 AM


zedd151

Even on my sloooowww laptop, it does a decent job. :t

Genuine Intel(R) CPU           T2060  @ 1.60GHz; Windows XP; 2 threads (only 1 used)
prime time seconds 2.685
number of primes 98222287 to 2000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 1999999973
prime 1999999943
prime 1999999927
prime 1999999913
prime 1999999873
prime 1999999871
prime 1999999861
prime 1999999853
prime 1999999829
prime 1999999817
prime 1999999811
number of primes 11 above 999999900

zedd151

From the updated version:


Genuine Intel(R) CPU           T2060  @ 1.60GHz; Windows XP; 2 threads (only 1 used)
prime time seconds 2.442

~~~~
~~~~

TWell

This AMD is so slow :(

AMD E-450 APU with Radeon(tm) HD Graphics 1.65GHz; Windows 7; 2 threads (only 1 used)
prime time seconds 5.594
number of primes 98222287 to 2000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 1999999973
prime 1999999943
prime 1999999927
prime 1999999913
prime 1999999873
prime 1999999871
prime 1999999861
prime 1999999853
prime 1999999829
prime 1999999817
prime 1999999811
number of primes 11 above 1999999801

zedd151

Quote from: TWell on September 15, 2015, 05:52:59 PM
This AMD is so slow :(

Are you running AV?
If so, maybe that's a contributing factor?
Otherwise, it's possible that the code has optimizations that work best on Intel?

rrr314159

@zedd,

Thanks a lot zedd, and sorry for the two versions. BTW, you might look at the timer macros in primes.inc, since you've been working on such things in the Laboratory.

@TWell,

Thanks, my AMD A6 is a bit faster than the E-450, at 4.2 seconds, but still a lot slower than an i5.

BTW zedd, the slower AMD results could be partly due to optimizing for Intel, but mainly those chips are just slower (with some exceptions; for instance, fsin and fcos are much faster on AMDs).
I am NaN ;)

zedd151

Quote from: rrr314159 on September 15, 2015, 05:57:48 PM
...BTW you might look at the timer macros in primes.inc

Great, I'll look into that.

And you're welcome.

jj2007

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz; Windows 7; 4 threads (only 1 used)
prime time seconds 2.084

JWasm and ML 8.0+ accept the source, but ML 6.15 doesn't like it: it throws an internal error at the mask0 etc. declarations.

sinsi

Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz; Windows 8; 8 threads (only 1 used)
prime time seconds 1.218

Windows 10 actually... :biggrin:

rrr314159

@TWell (and anyone else),

I forgot to mention: check the cache size setting. It should be set to the size of your L1 cache. In the current code it's 32768. Try half that, or twice that, and see if it makes a difference.
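For anyone wondering why that one number matters so much, here is a minimal Python sketch of a segmented sieve. It assumes the cache size setting acts as the segment size, which is the usual design but is not confirmed against primes.asm. The names count_primes and segment_size are made up for illustration. Python won't show the cache effect itself; the sketch only shows the structure that makes the setting matter: each segment is sieved completely before moving to the next, so a segment that fits in L1 stays there while all the base primes are crossed off.

```python
from math import isqrt

def count_primes(limit, segment_size=32768):
    """Count primes <= limit with a segmented sieve of Eratosthenes.

    segment_size is a hypothetical stand-in for the cache size setting.
    """
    if limit < 2:
        return 0
    root = isqrt(limit)
    # Plain sieve for the base primes up to sqrt(limit).
    base = bytearray([1]) * (root + 1)
    base[0:2] = b"\x00\x00"
    for p in range(2, isqrt(root) + 1):
        if base[p]:
            base[p * p :: p] = bytes(len(range(p * p, root + 1, p)))
    base_primes = [p for p in range(2, root + 1) if base[p]]

    count = 0
    for low in range(0, limit + 1, segment_size):
        high = min(low + segment_size - 1, limit)
        seg = bytearray([1]) * (high - low + 1)
        for p in base_primes:
            if p * p > high:
                break
            # First multiple of p inside this segment, never below p*p.
            start = max(p * p, (low + p - 1) // p * p)
            seg[start - low :: p] = bytes(len(range(start, high + 1, p)))
        # 0 and 1 are not prime.
        for n in (0, 1):
            if low <= n <= high:
                seg[n - low] = 0
        count += sum(seg)
    return count
```

The segment size changes only the speed, never the answer: count_primes(39) gives 12 for any segment size, matching the "number of primes 12 to 39" line in the program output.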

@jj2007,

Thanks, mask0 is an oword in the .data section. If anyone cares, I can post the (older, slower) masking routine that doesn't use owords.

@sinsi,

thanks, your 'puter is the speed king ... wouldn't be surprised if you could increase the cache size also, do even better ... ? My sys-info routine doesn't know about Win 10 yet; I stole it from siekmanski so blame him :biggrin:

Siekmanski

Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz; Windows 8.1; 12 threads (only 1 used)
prime time seconds 0.3928
number of primes 98222287 to 2000000001
1st prime 2
2nd prime 3
3rd prime 5
4th prime 7
5th prime 11
6th prime 13
7th prime 17
8th prime 19
9th prime 23
10th prime 29
11th prime 31
12th prime 37
number of primes 12 to 39
prime 1999999973
prime 1999999943
prime 1999999927
prime 1999999913
prime 1999999873
prime 1999999871
prime 1999999861
prime 1999999853
prime 1999999829
prime 1999999817
prime 1999999811
number of primes 11 above 1999999801
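The tail of that output can be cross-checked independently of the assembler program. The following is a minimal Python sketch, not the method primes.asm uses: it sieves one small window just below 2000000001 using base primes up to the square root of the upper bound. The function name primes_in_range is made up for illustration.

```python
from math import isqrt

def primes_in_range(low, high):
    """Return the primes p with low <= p <= high, using a one-window sieve."""
    root = isqrt(high)
    # Base sieve up to sqrt(high).
    base = bytearray([1]) * (root + 1)
    base[0:2] = b"\x00\x00"
    for p in range(2, isqrt(root) + 1):
        if base[p]:
            base[p * p :: p] = bytes(len(range(p * p, root + 1, p)))
    # Cross off multiples of every base prime inside the window.
    window = bytearray([1]) * (high - low + 1)
    for p in range(2, root + 1):
        if base[p]:
            start = max(p * p, (low + p - 1) // p * p)
            window[start - low :: p] = bytes(len(range(start, high + 1, p)))
    return [low + i for i, flag in enumerate(window) if flag and low + i >= 2]

tail = primes_in_range(1999999802, 2000000000)
for p in reversed(tail):
    print("prime", p)
print("number of primes", len(tail), "above 1999999801")
```

If the posted results are right, this should print the same 11 primes in the same descending order.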

Creative coders use backward thinking techniques as a strategy.

zedd151

Quote from: rrr314159 on September 15, 2015, 06:28:02 PM
..should be set to your L1 cache. In the current code it's 32768. Try half that much, or twice, see if it makes a difference

hmmmmm....

ml 6.14 doesn't like mfence


Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997.  All rights reserved.

Assembling: primes.asm
primes.asm(48) : error A2008: syntax error : MFENCE
RDTSCFLOAT(3): Macro Called From
  InitCycles(17): Macro Called From
   primes.asm(48): Main Line Code
primes.asm(54) : error A2008: syntax error : MFENCE
RDTSCFLOAT(3): Macro Called From
  @RDTSCFLOAT(2): Macro Called From
   ElapsedCycles(3): Macro Called From
    @ElapsedCycles(2): Macro Called From
     movflt(1): Macro Called From
      get_time(1): Macro Called From
       primes.asm(54): Main Line Code


lemme grab jwasm...

zedd151

Just for kicks, I have no idea how big my L1 cache is.


Genuine Intel(R) CPU           T2060  @ 1.60GHz; Windows XP; 2 threads (only 1 used)

prime time seconds 4.036  L1 = 65536

prime time seconds 2.46   L1 = 32768 ; seems optimal

prime time seconds 2.972  L1 = 16384

prime time seconds 3.772  L1 = 8192


used jwasm :t

Quote
JWasm v2.11, ~~~
primes.asm: 409 lines, 2 passes, 46 ms, 0 warnings, 0 errors~~

rrr314159

@Siekmanski,

Thanks, wow, 0.4 seconds, sinsi eat your heart out! Chris Wallich was using an I7-4770, more comparable to sinsi's I guess ... it seems this algo would be about the same speed as his if I made a 64-bit version with threads.

@zedd151,

Apparently your cache is the same as mine, 32768. This prime number algo is very sensitive to cache size, as you can see.

jj2007

Quote from: rrr314159 on September 15, 2015, 06:28:02 PM
Thanks, mask0 is an oword in .data section.

Workaround is two qwords, but caution with the order ;-)
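To make the ordering point concrete, here is a small Python sketch of the memory layout. The mask value is made up for illustration (the real mask0 contents aren't shown in this thread). The only fact it relies on is that x86 is little-endian, so the least significant qword has to come first in memory.

```python
import struct

# Hypothetical 128-bit mask value, chosen only so each byte is recognisable.
MASK = 0x0F0E0D0C0B0A09080706050403020100

def oword_as_qwords(value):
    """Split a 128-bit value into (low, high) 64-bit halves."""
    low = value & 0xFFFFFFFFFFFFFFFF
    high = value >> 64
    return low, high

low, high = oword_as_qwords(MASK)

# Memory image of a single oword on little-endian x86.
as_oword = MASK.to_bytes(16, "little")
# Memory image of two consecutive qwords, low half declared first.
as_two_qwords = struct.pack("<QQ", low, high)
# Declaring the high half first gives a different memory image.
swapped = struct.pack("<QQ", high, low)

print(as_oword == as_two_qwords)  # low qword first matches the oword
print(as_oword == swapped)        # high qword first does not
```

So when replacing an oword declaration with two qwords, the low half goes first.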

rrr314159

Yes jj2007, least significant byte is tricky ;-) ... ML 6.15 doesn't like mfence either ... not sure it's worth it to care about a 6.15 version? Is it the case that some people can't (or, won't) upgrade to 8.0?