The MASM Forum

General => The Laboratory => Topic started by: flaschenpost on August 27, 2013, 03:58:06 AM

Title: Prime Sieve again - stumbling through quite a bit of memory
Post by: flaschenpost on August 27, 2013, 03:58:06 AM
Hi,

I have discussed it a bit in the end of another longer thread, but now I tried to give the full sieve.

I am a real, real beginner in asm and probably doing a lot of silly stuff in the attempt of writing out data.

Since I use jwasm with linux, I can't use the jj2007 macros, I'm afraid.

But that is not my main point. My main point is: how could I improve the speed of memory-walkthrough like that on Sieve of Erastothenes.

jj2007 mentioned that shifting is slow, and with a few Tests I could see that it costs some time. So I was lucky to find bts, which can set a bit in memory without the c-stylisch mem[i/64] |= 1UL << i%64.

But still I have to add to my bit-position a number and then look if I got over 64-bit, in order to jump i/64 "lines" forward.

I tried several things, like "decomposing" the stepsize into Lines and Bits, like precalculating the number of steps, thus invoking a division - but that just made things slower.

I do not need that stuff, it is just for fun, so no pressure. Just curious how fast it could get.

SSSE3 I do have, but no further - and AVX does not yet have shift operations, and no "set bit" in the xmm- registers.

So: Have Fun!

Marco
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Gunther on August 27, 2013, 05:19:00 AM
Hi flaschenpost,

didi you check that link? (http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf)

Gunther
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: flaschenpost on August 27, 2013, 08:07:55 AM
Hi Gunther,

Quote from: Gunther on August 27, 2013, 05:19:00 AM

did you check that link? (http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf)


thanks for the nice tip!

I haven't seen that link before, but quite a bit of it fits with other ressources and with my own experiments/thoughts. It is cute how this small problem leeds to similar kind of solutions! ;-)

There also is a faster Sieve of Atkins, so really this is just pure fun: get the fastest instructions to "cross out" every p-th Bit in a Gig of RAM, and do it for a lot of increasing primes p.

I also tried to be more "memory-local" and I got rid of those tables mentioned in page 6 by some calculations, but at least my implementations were slower than the plain straight solution.

The first few primes could fill a small number of 64-bit-values (210 in the 2-3-5-7-Case) and I will definitely add that precalculated Pattern in the end.

I do think that storing the property of being prime or not prime with one bit per number is quite efficient, and that other data structures have much more overhead. Normally crossing out a bit in RAM is seen as very, very fast. Up to 2^29 it takes just 20 seconds in Assembler and 30s in optimized GCC.

The usual sieve challenge is getting out of the RAM-size and calculating the property blockwise.

Here I want to search for the fastest combination of instructions to achieve setting every p-th bit for a big number of increasing values of p.

The kernel loop inner_loop in the following code. r9 holds the current prime number itself, r10 the current byte position, rbx the current bit position, rcx the loop counter. rbx may have a 64-bit-"Overflow", I mean it will get greater than 63. This I have to check and then I need to add the value of 8*(rbx/64) to my Byte-Pointer r10.

The nice thing about x86/64-Bit-Assembler is that everything fits into registers, everything but the write-to-RAM itself.


    ; how far should we go?
    mov rcx, endmem
    sub rcx, r10
    jc inner_break
    jz inner_break

    sub rcx,1
    ; run (or walk) through the Megabits!
    inner_loop:
        ; set bit
        bts [r10], bx

        add rbx, r9
        mov rax, rbx
        shr rax, 3
        and rax, -8
        and rbx,63
        add r10, rax
        sub rcx, rax
        jc inner_break

    jmp inner_loop



Good night  -  Marco.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: KeepingRealBusy on August 29, 2013, 09:13:47 AM
Marco,

There are several things you can do.

1. Use an odd bit sieve -  bits represent only the odd numbers like 1, 3, 5, 7 ...  This gives you twice the numbers for the same memory block.

2. Set all bits, then reset all multiples of the prime numbers. Since 2 is not present, you save the time of resetting half of all of the bits.

3. Scan for a set bit, double and add 1 for the prime number, Square that prime for the first bit to reset, divide by 2 (shift right 1) to get the reset index, then reset the bit and add the prime number for the next bit to reset.

To calculate the last prime number to test for, take the bit number of the first bit outside of the block of bits you are working with, double +1 and take the square root (this only has to be done once - when the size of the bit block is defined).

When the end of the block is detected while resetting, save the bit index and the current block number you are working on in a table indexed by the bit index of the prime number. Get the next prime and repeat. When the last prime has been processed for this block, allocate another block, set all bits, then re-process with the saved table of bit index and block, possibly extending the saved table values.

I did this in 32 bit mode, with about 3 3/4 GB of memory (14 blocks of 256 MB), after 14 blocks were processed, repeated with 13 new blocks (same blocks, re0initialized) 5 times giving 66 blocks of prime bits.  The prime bit scan did not have to go outside of the first block for the entire process.


The Total Count of the Primes in all of the blocks is 16,028,906,205.
The current time is: 11:23:40.00
The current time is:  9:17:42.04
                     -----------
                     02:05:57.96


Dave.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: KeepingRealBusy on August 29, 2013, 09:24:29 AM
Marco,

Let me correct one part of that explanation.

"When the end of the block is detected while resetting, save the bit index and the current block number you are working on in a table indexed by the bit index of the prime number. "

The index for the save table is just an incrementing offset in the table. In 32 bit mode with just 66 blocks to process, I only had to save 1 BYTE and 1 DWORD (the block number, and the bit index), thus 5 BYTES per entry. The next iteration will repeat the same old prime numbers and add some new ones, i.e. 3,5,7,11,13 ...  Overlay the table entries when you get to the end of the block, and keep on trucking through the table, then add new entries at the end of the table, then repeat for the next iteration.

Dave.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Farabi on August 31, 2013, 12:18:10 PM
My, people has switched to 64 bits. I better wait until Im Up with another system. Or stick with my android PCTablets. Still unsure they had a 1 Ghz processors. But on some few tests, yes it did. It was the OOP thingy and bad programming concept that making everything is slow. Or sabotage.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: flaschenpost on September 01, 2013, 01:14:32 AM
Hi Dave,

my Code still uses only odd numbers, it doesn't need to set all bits because "1" means "is composite" and in my code the marking as "is composite" starts with p*p, so if p*p > max I break the outer loop.

My Code goes up to 2^29 in 20 Seconds.

I tried to comment my asm; if something is unclear, please ask.

Quote from: KeepingRealBusy on August 29, 2013, 09:13:47 AM
Marco,

There are several things you can do.

1. Use an odd bit sieve -  bits represent only the odd numbers like 1, 3, 5, 7 ...  This gives you twice the numbers for the same memory block.

2. Set all bits, then reset all multiples of the prime numbers. Since 2 is not present, you save the time of resetting half of all of the bits.

3. Scan for a set bit, double and add 1 for the prime number, Square that prime for the first bit to reset, divide by 2 (shift right 1) to get the reset index, then reset the bit and add the prime number for the next bit to reset.

The Total Count of the Primes in all of the blocks is 16,028,906,205.
The current time is: 11:23:40.00
The current time is:  9:17:42.04
                     -----------
                     02:05:57.96


Dave.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: flaschenpost on September 01, 2013, 01:17:23 AM
Quote from: Farabi on August 31, 2013, 12:18:10 PM
My, people has switched to 64 bits.

Well, the extra registers *are* tempting, aren't they? Still you could try beating a greenhorn like me with even better 32-Bit-Assembler, since I didn't even use a single SSE3 instruction. ;-)

If you want I can try to write the algorithm in a clearer way.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Rockoon on September 29, 2013, 05:30:50 AM
Quote from: flaschenpost on August 27, 2013, 03:58:06 AM
jj2007 mentioned that shifting is slow, and with a few Tests I could see that it costs some time. So I was lucky to find bts, which can set a bit in memory without the c-stylisch mem[i/64] |= 1UL << i%64.

But still I have to add to my bit-position a number and then look if I got over 64-bit, in order to jump i/64 "lines" forward.

BTS [base], index

aka

BTS [reg32], reg32

works on a memory base pointer, with up to 32-bits of displacement index in 32-bit mode (so thats 512BM of memory, might treat the index as signed so care must be taken)

In 64-bit mode it could well be able to access all available memory using a single base, but you might want to double-check that.

Its quite a bit slower in latency than the register equivalent BTS reg32, reg32 according to agner fog, but you arent waiting for the result anyways.  basically, I see no reason to be recalculating index/64 .. the base can remain constant for literally billions of delta.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: K_F on November 02, 2013, 04:44:54 PM
Do a search for 'Prime Number Circle' (Peter Plitcha)
This method seems to be simpler, and maybe faster.
  8)
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Gunther on November 02, 2013, 11:57:33 PM
Hi K_F,

Quote from: K_F on November 02, 2013, 04:44:54 PM
Do a search for 'Prime Number Circle' (Peter Plitcha)
This method seems to be simpler, and maybe faster.
  8)

his name is Peter Plichta (https://de.wikipedia.org/wiki/Peter_Plichta) and it seems to me that he's a weirdo.

Gunther
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 03, 2013, 01:06:43 AM
Quote from: Gunther on November 02, 2013, 11:57:33 PM
his nama is Peter Plichta (https://de.wikipedia.org/wiki/Peter_Plichta) and it seems to me that he's a weirdo.

i'm a little surprised to hear you say that, Gunther   :biggrin:

it sounds to me like a case of people not listening to someone who's found something new
not the first time people laughed at the smart guy

as for the prime number thing - not particularly fond of the prime circle
but it's an interesting pattern
what i get from it - possibly a base-6 counting system to do the sieve   :t
all the primes end in 1 or 5 (base-6)
so, once you get started....
add 2, prime test, add 4, prime test, add 2, prime test, add 4.....
conversely, you have eliminated 2/3 of all the integers
so - each bit in the stored data might represent only the base-6 1's and 5's
then - use ling long kai fang to access the info and do the base conversion
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 03, 2013, 02:47:20 AM
how about.....

base-210, anyone ?   :biggrin:
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Gunther on November 03, 2013, 04:08:42 AM
Dave,

Quote from: dedndave on November 03, 2013, 01:06:43 AM
i'm a little surprised to hear you say that, Gunther   :biggrin:

it sounds to me like a case of people not listening to someone who's found something new
not the first time people laughed at the smart guy

it's not surprising. No doubt the man is a good chemist. But I've read his books (only available in German) and I think that the world around us is not so easy build like Dr. Plichta might think. Furthermore, he's vain and taken from him. I've observed that in a panel discussion with him. So, I think there are good reasons for my point of view.

Gunther
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 03, 2013, 04:14:02 AM
well - i saw a video on youtube
didn't watch the whole thing because i don't speak German
but - he was talking about building a saucer-shaped spaceship - lol
he may have some unique ideas on fuel
so - let's allow him to build it - and see how it goes   :biggrin:
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Gunther on November 03, 2013, 04:24:09 AM
Dave,

Quote from: dedndave on November 03, 2013, 04:14:02 AM
but - he was talking about building a saucer-shaped spaceship - lol
he may have some unique ideas on fuel
so - let's allow him to build it - and see how it goes   :biggrin:

that's another point. His spaceship may fly or not fly. I can't say much about that because my research interests are the Maxwell equations. But what he wrote about prime numbers and God's hidden formula (which he had found) sounds like nonsense to me. That was my point.

Gunther
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: K_F on November 03, 2013, 08:07:29 PM
It looks like he's pizzed off nearly all the scientific community because he's a renegade/rebel in the scientific world, telling them that they're 'barking up the wrong tree'.

In a way I agree with him, as the idea of science research is to simplify the world around us, whereas science seems to have made it more complicated and really messy - IOW something has gone wrong.
He has come up with hypothetical new theories/postulates, without any funding or research grants - and this is the BIG difference between himself and other researchers who are paid to come up with an 'answer'.
They're very scared of him, as if proven correct (which they're probably will not try to do as they're not going to shoot themselves in the foot..)  - there goes Quantum Theory..Relativity .. and a few other heavily funded theories.

There's a quote out there by someone I cannot remember - goes along the lines..
"All major advances in science are made by the rebels/outsiders"

I wouldn't dispel his theories just yet..
;)
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: georg on November 03, 2013, 10:12:50 PM
to find prime numbers, I use this technique, I use a period of 30, where I avoid multiples of 3, I compute directly the first prime numbers less than 31, and I begin my cycle with 31, in a period of 30 you got 8 possible candidates to be primes, this period is divided in three parts a) number whose Lsb are 1, and 7 first decade, b) numbers with 1, 7, 3, 9 Lsb in the second decade, and c) numbers whose Lsb are 3, and 9, in a period of 30, you got 8 candidates to test for primes, I do the trial, testing to the limit of prime numbers less than some given p*p prime, beginning testing with 7, is a pretty optimized algorithm.  :biggrin:
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: georg on November 03, 2013, 10:15:46 PM
I got to work with some code to make it in assembly, I haven't done that yet.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: georg on November 03, 2013, 10:29:23 PM
Quote from: georg on November 03, 2013, 10:12:50 PM
to find prime numbers, I use this technique, I use a period of 30, where I avoid multiples of 3, I compute directly the first prime numbers less than 31, and I begin my cycle with 31, in a period of 30 you got 8 possible candidates to be primes, this period is divided in three parts a) numbers whose Lsb are 1, and 7 first decade, b) numbers with 1, 7, 3, 9 Lsb in the second decade, and c) numbers whose Lsb are 3, and 9, in a period of 30, you got 8 candidates to test for primes, I do the trial, testing to the limit of prime numbers less than some given p*p prime, begining testing with 7, is a pretty optimized algorithm.  :biggrin:

that makes you think of the possibility of a finite quantity of prime numbers, due to the ever increasing possibility of finding non prime numbers as a combination of previous findings, but because Euler we know there are infinite prime numbers.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: KeepingRealBusy on November 04, 2013, 07:51:29 AM
Quote from: K_F on November 02, 2013, 04:44:54 PM
Do a search for 'Prime Number Circle' (Peter Plitcha)
This method seems to be simpler, and maybe faster.
  8)

Here is a link to the CodeProject, they had a long discussion about this back in Nov 2008.

http://www.codeproject.com/Messages/2815908/Wheel-Factorization-to-simplify-a-Sieve-of-Eratost.aspx

Dave.
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Gunther on November 04, 2013, 08:48:16 AM
Quote from: KeepingRealBusy on November 04, 2013, 07:51:29 AM
Here is a link to the CodeProject, they had a long discussion about this back in Nov 2008.

http://www.codeproject.com/Messages/2815908/Wheel-Factorization-to-simplify-a-Sieve-of-Eratost.aspx

Dave.

interesting discussion. My impression is that a modified sieve algorithm will do the job.

Gunther
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 05, 2013, 03:06:59 AM
Sheldon Prime Sieve Method   :biggrin:

base-210 isn't such a bad idea
if you are using each bit as odd number values, you are getting a "bit density" of 16 values per byte

by using base-210, you can get an equivalent bit density of 35 values per byte
48 positions is 6 bytes - you cover 210 values with 6 bytes of data
by following the pattern in the "offsets" list, you signifigantly speed up the sieve   :t
~77% of all values are already eliminated

Note: Primes 2, 3, 5, and 7 are removed.

(numbers in each group are read vertically)

positions:
000000000011111111112222222222333333333344444444
012345678901234567890123456789012345678901234567

offsets:
000000000000000000000011111111111111111111111112
011112233444556677788900001223334455666778899990
113793917137391713939713793171793917379391713799

primes from 000 to 209:
-00000000000000000000011111-1111-11111-111-1111-
-11112233444556677788900001-2333-45566-778-9999-
-13793917137391713939713793-7179-91737-391-1379-

primes from 210 to 419:
2-222222-2-2222222-2-3333--33-3333-33-333-34--44
1-222334-5-5667788-9-0111--33-4455-67-788-90--01
1-379391-1-7391713-3-7137--17-7939-73-939-71--99

primes from 420 to 629:
444-444-4444-4-44-455-55---55-5-5555-5-5566-666-
233-344-5666-7-89-900-22---44-5-6677-8-9900-111-
113-939-7137-9-71-939-13---17-7-3917-7-3917-379-

primes from 630 to 839:
6666-666--666-6-7-7-77-7-77777-77--7-7--88-88888
3444-556--778-9-0-0-12-3-34556-67--8-9--01-22223
1137-391--373-1-1-9-97-3-93171-93--7-7--91-13799

primes from 840 to 1049:
--0000--0000---00-0-000-0-0-000-0-00--1111-11-11
--8888--8888---99-9-999-9-9-999-9-99--0000-00-00
--5556--7888---01-1-234-4-5-677-8-99--0112-33-34
--3793--7137---71-9-971-7-3-717-3-17--9391-13-99

also, base conversion between binary and base-210 is fairly compact
each data byte can represent a base-210 digit

(C)opyright David R. Sheldon, November 4, 2013
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Gunther on November 05, 2013, 03:26:07 AM
Dave,

Quote from: dedndave on November 05, 2013, 03:06:59 AM
Sheldon Prime Sieve Method   :biggrin:

base-210 isn't such a bad idea
if you are using each bit as odd number values, you are getting a "bit density" of 16 values per byte

by using base-210, you can get an equivalent bit density of 70 values per byte
48 positions is 3 bytes - you cover 210 values with 3 bytes of data
by following the pattern in the "offsets" list, you signifigantly speed up the sieve   :t

excellent proposal. That'll do the job.

Gunther
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 05, 2013, 04:18:41 AM
little snafu with my math - lol
48 positions is 6 bytes - doh !
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: K_F on November 05, 2013, 07:28:58 AM
There must be a way of doing this 'logically' (Har Har)..
Test bit-0..
- if == 0 then even >>> Not Prime unless 2
-Then Odd >> now comes the fun bit... just that I haven't thought about it yet!!  :t

There might be a pattern cycling through the bit positions from bit-1 upwards... thinking off my head  :icon_eek:
Possibly a prime pattern within the primes....
I always enjoyed watching those cartoons with my kids .. Prime Evil - I think he was from the Power Rangers series
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 05, 2013, 08:48:30 AM
the steps follow these offsets
001
011
013
017
019
023
029
031
037
041
043
047
053
059
061
067
071
073
079
083
089
097
101
103
107
109
113
121
127
131
137
139
143
149
151
157
163
167
169
173
179
181
187
191
193
197
199
209


the Eratosthenes algorithm runs on multiples, not additive steps
i am trying to let the algo "fall into place" in my head - lol
it will come to me

i think the ling long kai fang base conversion algo will get me there
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 05, 2013, 08:34:32 PM
well - i wanted to start by spliting those offsets into the 6 groups for each byte
a bit of luck - that went surprisingly well   :t
because of the fact that bytes have 8 bits, we can split the low-order base-210 into a base-6 digit and a base-35 digit
that way, we can use the base-6 digit to address 1 of 6 bytes and the base-35 digit to address bits
we can also use the base-6 digit to select the "parse" values for the base-35 values
you can do that by a set of look-up tables or by branching, i guess

nonetheless, when the offsets are split into 6 groups of 8....
------------0
001  01
011  11
013  13
017  17
019  19
023  23
029  29
031  31
------------35
037  02
041  06
043  08
047  12
053  18
059  24
061  26
067  32
------------70
071  01
073  03
079  09
083  13
089  19
097  27
101  31
103  33
------------105
107  02
109  04
113  08
121  16
127  22
131  26
137  32
139  34
------------140
143  03
149  09
151  11
157  17
163  23
167  27
169  29
173  33
------------175
179  04
181  06
187  12
191  16
193  18
197  22
199  24
209  34

that worked out nicely   :biggrin:

so, it would seem that the best approach is to actually mix bases
a simple example...
high order             low order
base4294967296digit  base35digit

in this case, we need to take the base-4294967296 digit and MOD 6 to figure out how to parse the low-order base-35 digit into bits

rather than constantly dividing to get the mod-6 "remainder", we can keep another high-order digit in parallel
high order             low order
base4294967296digit
                     base35digit
          mod6digit

the base-4294967296 digit can be used to address bytes
and, the mod-6 digit can be used to parse the low-order digit

the limits of the above method are
max file: 4gb
   range: 140g = 150,323,855,360


if we wanted a larger range, we could use a real mixture of bases - lol
high order                         low order
base4294967296digit  base6digit  base35digit

max file: 24gb
   range: 840g = 901,943,132,160

which makes it a little tricky to address bytes, but it's do-able
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 07, 2013, 03:55:09 AM
when it came to implementing this in code, i realized the best method is probably:
high order              low order
base4294967296digit  base210digit

max file: 24gb
   range: 840g = 901,943,132,160


then, i use a 210-element look-up table that has byte offsets (0-5) and bit masks
if the mod-210 digit is for a non-prime, the table entry is 0
i could use a word LUT, but i think i'll spend the extra 420 bytes to eliminate the size override operators   :P
i can MUL the high-order digit by 6 and add the offset to obtain the address
this keeps the number of DIV operations to a minimum
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: Gunther on November 07, 2013, 04:47:27 AM
Dave,

Quote from: dedndave on November 07, 2013, 03:55:09 AM
i can MUL the high-order digit by 6 and add the offset to obtain the address
this keeps the number of DIV operations to a minimum

good idea.  :t Go forward.

Gunther
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: dedndave on November 07, 2013, 05:15:28 AM
well - i may not use MUL - i will probably use the extended addressing modes   :P

for now, i am just writing a test to 8,589,934,591 so i can compare it with Eratosthenes odd-bit
with a little luck, i can develop a theorem to eliminate mod-7's
then, i can use an "Atkin-like" method to create a new "Sheldon-Atkin-Bernstein" sieve
Title: Re: Prime Sieve again - stumbling through quite a bit of memory
Post by: herge on November 10, 2013, 10:48:55 AM

Hi Bit Coders:

Some times it's quicker to tick off a byte or word than a bit and
yes you are wasting mwmory like a pig? Some of the bit manipulation
instructions are  quite large.
Bit manipulation is not always faster.