Rotate Bits SSE

Started by guga, April 27, 2020, 06:30:22 AM

Siekmanski

The Shuffle MACRO can be used for all the 64-bit and 128-bit shuffle instructions.
It builds the 8-bit shuffle bitfield (imm8) in the right order: each DWORD/REAL4 gets 2 bits that hold a position (0-3).
Each QWORD/REAL8 must be treated as a pair of DWORD/REAL4 elements.
The 64-bit shuffle works the same way, except that the 2 position bits apply to each WORD.

shuffle_instruction xmm0,xmm1,imm8
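
The macro itself is not shown in this thread, so the following is only a sketch of the imm8 layout described above, written in C. The helper name `shuffle_imm8` and its argument order (lowest destination element first) are assumptions for illustration, not the macro's actual interface.

```c
/* Hypothetical helper: pack four 2-bit source positions (0-3) into an imm8.
   p0 selects the source element for destination element 0 (the lowest one),
   p3 selects the source element for destination element 3 (the highest one). */
static unsigned shuffle_imm8(unsigned p0, unsigned p1, unsigned p2, unsigned p3)
{
    return  (p0 & 3u)
         | ((p1 & 3u) << 2)
         | ((p2 & 3u) << 4)
         | ((p3 & 3u) << 6);
}
```

With this layout, `shuffle_imm8(0, 1, 2, 3)` gives 0xE4 (every element stays in place), `shuffle_imm8(3, 2, 1, 0)` gives 0x1B (element order reversed), and `shuffle_imm8(2, 3, 0, 1)` gives 0x4E (the two QWORD halves swapped, each treated as a DWORD pair).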
Creative coders use backward thinking techniques as a strategy.

guga

Quote from: Siekmanski on April 29, 2020, 08:17:30 AM
:biggrin:
First I was "PSHUFD" with Magnus, and now with Scarmatil.   :thumbsup:

    ;rotate64left 1 bit
    movq    xmm0,qword ptr [Number64]
    movq    xmm1,xmm0
    psllq   xmm0,1 
    psrlq   xmm1,63
    pxor    xmm0,xmm1

    ;rotate64right 1 bit
    movq    xmm0,qword ptr [Number64]
    movq    xmm1,xmm0
    psllq   xmm0,63
    psrlq   xmm1,1
    pxor    xmm0,xmm1


Hi Siekmanski. I tested the routine to rotate the qwords inside an XMM register, but it does not seem to work. In the high qword, bit 127 is not rotated into bit 64, and in the low qword, bit 63 is not rotated into bit 0.

Example:
For rotating left, say xmm0 holds this (from bit 127 down to bit 0):
10101000 10111010 10001001 10011100 00111101 01100100 11011101 10000100 11110111 01110110 01001010 00101111 10100010 11111111 11001111 11110100

So, the HiQword (Bit 127 to 64) is:
10101000 10111010 10001001 10011100 00111101 01100100 11011101 10000100

and the low qword is (Bit 63 to 0):
11110111 01110110 01001010 00101111 10100010 11111111 11001111 11110100


When I use the rotate64left routine, it turns into:

New HiQword (Bit 127 to 64) is:
01010001 01110101 00010011 00111000 01111010 11001001 10111011 00001000

and the new low qword is (Bit 63 to 0):
11101110 11101100 10010100 01011111 01000101 11111111 10011111 11101000

When they should be:

New HiQword (Bit 127 to 64) is:
01010001 01110101 00010011 00111000 01111010 11001001 10111011 00001001

and the new low qword is (Bit 63 to 0):
11101110 11101100 10010100 01011111 01000101 11111111 10011111 11101001
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

guga

I guess I found it. It seems that replacing

movq    xmm1,xmm0

with
movdqu    xmm1,xmm0

should do the trick, right? :)
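
To illustrate the difference between the two copies, here is a minimal C sketch using SSE2 intrinsics. `_mm_move_epi64` corresponds to the register-to-register `movq`, which clears bits 127..64 of the destination; that would account for the missing bit 64 in the high qword reported earlier. The function names `rotl1_epi64` and `lane` and the `full_copy` flag are made up for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate both 64-bit lanes of x left by 1 bit.
   full_copy != 0: the temporary is a full 128-bit copy (movdqa/movdqu xmm1,xmm0).
   full_copy == 0: the temporary is made with movq xmm1,xmm0, so its high lane is zero. */
static __m128i rotl1_epi64(__m128i x, int full_copy)
{
    __m128i t = full_copy ? x : _mm_move_epi64(x);
    x = _mm_slli_epi64(x, 1);    /* psllq xmm0,1  */
    t = _mm_srli_epi64(t, 63);   /* psrlq xmm1,63 */
    return _mm_xor_si128(x, t);  /* pxor  xmm0,xmm1 */
}

/* Hypothetical helper: read lane 0 (low qword) or lane 1 (high qword). */
static uint64_t lane(__m128i v, int i)
{
    uint64_t out[2];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

With the test value from the earlier post (high qword 0xA8BA899C3D64DD84, low qword 0xF7764A2FA2FFCFF4), the full copy yields 0x517513387AC9BB09 in the high lane, while the `movq` copy yields 0x517513387AC9BB08 there. The low lane is 0xEEEC945F45FF9FE9 in both cases.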


And how do I rotate by N bits rather than only 1? Like this?

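Proc rot64left:
    Arguments @Count

    mov eax D@Count
    mov ecx 64              ; lane width in bits
    movd xmm3 ecx           ; xmm3 = 64
    movdqu xmm1 xmm0        ; full 128-bit copy of the input
    movd xmm2 eax           ; xmm2 = Count
    psubq xmm3 xmm2         ; xmm3 = 64 - Count
    psllq xmm0 xmm2         ; shift both qwords left by Count
    psrlq xmm1 xmm3         ; shift both qwords right by 64 - Count
    pxor xmm0 xmm1          ; merge the two halves of the rotation

EndP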
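
For comparison, the same run-time-count idea can be sketched in C with SSE2 intrinsics. This is only an illustration under the assumption that the count is in the range 0..64; the function name `rotl_epi64_var` is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate both 64-bit lanes of x left by count bits, count known only at run time.
   Assumes 0 <= count <= 64. The register-count shifts return zero for counts
   above 63, so count = 0 and count = 64 both return x unchanged. */
static __m128i rotl_epi64_var(__m128i x, int count)
{
    __m128i n     = _mm_cvtsi32_si128(count);                 /* movd  xmm2,eax       */
    __m128i inv   = _mm_sub_epi64(_mm_cvtsi32_si128(64), n);  /* psubq: 64 - count    */
    __m128i left  = _mm_sll_epi64(x, n);                      /* psllq xmm0,xmm2      */
    __m128i right = _mm_srl_epi64(x, inv);                    /* psrlq xmm1,xmm3      */
    return _mm_xor_si128(left, right);                        /* pxor  xmm0,xmm1      */
}
```

Calling `rotl_epi64_var(x, 1)` on the 128-bit test value from the earlier post should give 0x517513387AC9BB09 in the high qword and 0xEEEC945F45FF9FE9 in the low qword.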

Siekmanski

Hi guga,

Here is the logic for an N-bit rotation:

N = 7 
movdqu xmm1,xmm0
psllq  xmm0,N
psrlq  xmm1,64-N
pxor   xmm0,xmm1

The other direction:

N = 7 
movdqu xmm1,xmm0
psllq  xmm0,64-N
psrlq  xmm1,N
pxor   xmm0,xmm1
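
As a scalar reference for what each 64-bit lane undergoes in the two sequences above, here is a small C model. The names `rotl64` and `rotr64` are illustrative; the masking with 63 is an addition that keeps a count of 0 well defined in C, where shifting a 64-bit value by 64 is undefined.

```c
#include <stdint.h>

/* Rotate one 64-bit value left by n bits (n is reduced modulo 64). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63u;
    return (x << n) | (x >> ((64u - n) & 63u));
}

/* Rotate one 64-bit value right by n bits (n is reduced modulo 64). */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63u;
    return (x >> n) | (x << ((64u - n) & 63u));
}
```

With N = 7, `rotl64(0x8000000000000001, 7)` gives 0xC0. Rotating right by 7 is the same as rotating left by 57, which is why the second sequence only swaps the two shift counts.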

guga

Great :)

Thanks a lot, Siekmanski.

I'm trying to optimize that SHA-3 algorithm, which uses those rotate left and right (and also shift) routines in MMX. I'm trying to port them to SSE2.

So far I have managed to partially convert the MMX code to SSE in the Theta routine inside the keccakf function, but there is still a long way to go before I fully understand this algorithm. One good thing is that if I succeed with the port, then maybe (just a huge maybe) there is a way to reverse it.

Some of the routines inside keccakf are basically a copy of the data belonging to the state words. For example, I found that in the "Chi" routine, the first inner "for" loop is simply a binary copy. Therefore, this loop looks unnecessary and could be optimized:

         /* Chi */
         for (j = 0; j < 25; j += 5) {
           for (i = 0; i < 5; i++)        /* simple copy from st (state) */
             bc[i] = st[j + i];           /* into the bc variable        */
           for (i = 0; i < 5; i++)
             st[j + i] ^= (~bc[(i+1)%5]) & bc[(i+2)%5];
         }
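
One way the copy might be reduced is sketched below in C, assuming `st` is the 25-lane state of 64-bit words as in the snippet above. Dropping the copy entirely would change the result, because the updates for i = 3 and i = 4 read lanes 0 and 1 of the row after they have been overwritten. Saving just those two lanes appears to be enough. The function names `chi_with_copy` and `chi_two_saved` are made up for this comparison.

```c
#include <stdint.h>

/* Reference form: copy the whole row to bc, then update st in place. */
static void chi_with_copy(uint64_t st[25])
{
    uint64_t bc[5];
    int i, j;
    for (j = 0; j < 25; j += 5) {
        for (i = 0; i < 5; i++)
            bc[i] = st[j + i];
        for (i = 0; i < 5; i++)
            st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
    }
}

/* Sketch: lanes 2..4 are always read before they are written, so only
   lanes 0 and 1 of each row have to be saved before the in-place update. */
static void chi_two_saved(uint64_t st[25])
{
    int j;
    for (j = 0; j < 25; j += 5) {
        uint64_t a0 = st[j], a1 = st[j + 1];
        st[j]     ^= ~st[j + 1] & st[j + 2];
        st[j + 1] ^= ~st[j + 2] & st[j + 3];
        st[j + 2] ^= ~st[j + 3] & st[j + 4];
        st[j + 3] ^= ~st[j + 4] & a0;
        st[j + 4] ^= ~a0 & a1;
    }
}
```

Running both functions on identical copies of a state should leave the two copies equal, which is an easy check to keep around while porting the loop to SSE2.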


And even this copying of chunks may be removed later. The main problem with this algorithm is that it is hard to follow and understand what it is doing, but if I succeed in optimizing and simplifying it, then maybe it can be reverted (if not totally, at least in part, I hope).

Why revert? Well, because the algorithm seems to act more like an encoder than a hash per se. So, if (and it is a huge, huge, huge if) this can be reversed, then we could, in theory, use it to compress whatever file, text, etc. is needed, forcing the data to be limited to 256 or 512 etc. bytes long. (Kind of impossible, I know, but the behavior of the parts of the routines I'm working with seems to act more like an encoder.)
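
The "encoder-like" impression can be checked on a small scale. The C sketch below applies the Chi formula to a single 5-bit row (one bit taken from each of the five lanes) and counts the distinct outputs over all 32 possible rows. The function names are hypothetical. A count of 32 means this step loses no information for a row. That says nothing about reversing a complete hash, though: a fixed-size digest has fewer possible values than there are longer inputs, so it cannot encode all of them.

```c
#include <stdint.h>

/* Chi applied to one 5-bit row: bit i of x is the bit taken from lane i. */
static unsigned chi_row5(unsigned x)
{
    unsigned y = 0;
    int i;
    for (i = 0; i < 5; i++) {
        unsigned a0 = (x >> i) & 1u;
        unsigned a1 = (x >> ((i + 1) % 5)) & 1u;
        unsigned a2 = (x >> ((i + 2) % 5)) & 1u;
        y |= (a0 ^ ((a1 ^ 1u) & a2)) << i;   /* a0 ^ (~a1 & a2) on single bits */
    }
    return y;
}

/* Count how many distinct outputs the 32 possible rows produce. */
static int chi_row5_distinct_outputs(void)
{
    uint32_t seen = 0;
    unsigned x;
    int count = 0;
    for (x = 0; x < 32; x++) {
        uint32_t bit = (uint32_t)1 << chi_row5(x);
        if (!(seen & bit)) {
            seen |= bit;
            count++;
        }
    }
    return count;
}
```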

daydreamer

So if I want to use SSE shift/rotate in combination with WinAPI calls on a modern Windows version, are there any safe XMM registers, similar to how some general-purpose registers are preserved, or do I need to save/restore them every time in the loop(s)?

My non-asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding