Rotate Bits SSE

Siekmanski · May 04, 2020, 07:24:23 AM

The Shuffle MACRO can be used with all the 64/128-bit shuffle instructions.
It packs the 8-bit shuffle bitfield (imm8) in the right order: each DWORD/REAL4 slot gets a 2-bit field holding a position (0-3).
Each QWORD/REAL8 must be treated as a pair of DWORD/REAL4 elements.
The same goes for the 64-bit shuffle, except that the 2-bit position fields then select WORDs.

shuffle_instruction xmm0,xmm1,imm8
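As a rough illustration of the bitfield described above (this is not Siekmanski's actual macro, whose argument order is not shown in the thread), the imm8 for a four-element shuffle can be composed from four 2-bit source positions. The names `shuffle_imm8` and `shuffle4_model` are hypothetical, and the second function is only a scalar model of how PSHUFD reads the immediate:

```c
#include <stdint.h>

/* Hypothetical helper: pack four 2-bit source positions into imm8.
   d0 selects the source element for destination slot 0 (bits 1:0),
   d3 selects the source element for destination slot 3 (bits 7:6). */
static uint8_t shuffle_imm8(unsigned d3, unsigned d2, unsigned d1, unsigned d0)
{
    return (uint8_t)(((d3 & 3u) << 6) | ((d2 & 3u) << 4) |
                     ((d1 & 3u) << 2) |  (d0 & 3u));
}

/* Scalar model of PSHUFD dst,src,imm8: dst[i] = src[(imm8 >> 2*i) & 3]. */
static void shuffle4_model(uint32_t dst[4], const uint32_t src[4], uint8_t imm8)
{
    uint32_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)   /* copy out last, so dst may alias src */
        dst[i] = tmp[i];
}
```

With this packing, `shuffle_imm8(3,2,1,0)` is 0xE4 (identity) and `shuffle_imm8(0,1,2,3)` is 0x1B (reverses the four DWORDs).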

guga · May 07, 2020, 06:15:34 PM

Quote from: Siekmanski on April 29, 2020, 08:17:30 AM

First I was "PSHUFD" with Magnus, and now with Scarmatil.

;rotate64left 1 bit
movq xmm0,qword ptr [Number64]
movq xmm1,xmm0
psllq xmm0,1
psrlq xmm1,63
pxor xmm0,xmm1

;rotate64right 1 bit
movq xmm0,qword ptr [Number64]
movq xmm1,xmm0
psllq xmm0,63
psrlq xmm1,1
pxor xmm0,xmm1

Hi Siekmanski. I tested the routine to rotate the qwords inside an XMM register, but it does not seem to work. In the high qword, bit 127 is not rotated into bit 64, and in the low qword, bit 63 is not rotated into bit 0.

Ex:
For rotating left, say in xmm0 we have this (From bit 127 to 0):
10101000 10111010 10001001 10011100 00111101 01100100 11011101 10000100 11110111 01110110 01001010 00101111 10100010 11111111 11001111 11110100

So, the HiQword (Bit 127 to 64) is:
10101000 10111010 10001001 10011100 00111101 01100100 11011101 10000100

and the low qword is (Bit 63 to 0):
11110111 01110110 01001010 00101111 10100010 11111111 11001111 11110100

When I use the rotate64left routine, it turns into:

New HiQword (Bit 127 to 64) is:
01010001 01110101 00010011 00111000 01111010 11001001 10111011 00001000

and the new low qword is (Bit 63 to 0):
11101110 11101100 10010100 01011111 01000101 11111111 10011111 11101000

When they should be:

New HiQword (Bit 127 to 64) is:
01010001 01110101 00010011 00111000 01111010 11001001 10111011 00001001

and the new low qword is (Bit 63 to 0):
11101110 11101100 10010100 01011111 01000101 11111111 10011111 11101001
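The bit patterns above can be cross-checked with a small scalar sketch in C. The hex constants used with it are just the binary values from this post converted to hexadecimal, and the function names are made up for illustration:

```c
#include <stdint.h>

/* Plain shift: the top bit falls off and bit 0 becomes 0. */
static uint64_t shl64_1(uint64_t x)
{
    return x << 1;
}

/* Rotate left by 1: the bit shifted out of bit 63 re-enters at bit 0. */
static uint64_t rotl64_1(uint64_t x)
{
    return (x << 1) | (x >> 63);
}
```

For the high qword 0xA8BA899C3D64DD84, `shl64_1` gives 0x517513387AC9BB08 (the observed result) and `rotl64_1` gives 0x517513387AC9BB09 (the expected one). The observed output is therefore what a plain shift produces, which suggests the `psrlq` half contributed nothing in that test.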

guga · May 07, 2020, 06:57:05 PM

I think I found it. It seems that replacing

movq xmm1,xmm0

with
movdqu xmm1,xmm0

Should do the trick, right ? :)
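A hedged sketch of why the copy instruction matters, written with SSE2 intrinsics and assuming an x86 compiler that provides `<emmintrin.h>`. The register form of MOVQ copies only the low qword and zeroes bits 127:64 of the destination, which `_mm_move_epi64` models here; a plain `__m128i` assignment stands in for the full 128-bit copy. The function names are hypothetical, and index 0 of each array is the low qword:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate both qwords left by 1 using a full 128-bit copy. */
static void rotl1_full_copy(const uint64_t in[2], uint64_t out[2])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i y = x;                        /* full copy, like movdqu xmm1,xmm0 */
    x = _mm_slli_epi64(x, 1);             /* psllq xmm0,1  */
    y = _mm_srli_epi64(y, 63);            /* psrlq xmm1,63 */
    _mm_storeu_si128((__m128i *)out, _mm_xor_si128(x, y));
}

/* Same sequence, but the copy keeps only the low qword. */
static void rotl1_movq_copy(const uint64_t in[2], uint64_t out[2])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i y = _mm_move_epi64(x);        /* like movq xmm1,xmm0: high qword = 0 */
    x = _mm_slli_epi64(x, 1);
    y = _mm_srli_epi64(y, 63);
    _mm_storeu_si128((__m128i *)out, _mm_xor_si128(x, y));
}
```

With the MOVQ-style copy the low qword still rotates correctly, but the high qword is only shifted, so this accounts for the missing bit 127 to bit 64 wrap-around. It does not by itself explain a missing bit in the low qword.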

And how do I rotate N bits rather than only 1? Like this?


Proc rot64left:
    Arguments @Count

    mov eax D@Count
    mov ecx 64             ; 64 bits Barrier. 
    movd xmm3 ecx                ; 64 bit Shift-Range.
    movdqu xmm1 xmm0
    movd xmm2 eax
    psubq xmm3 xmm2               ; Calculate Shift-Range.
    psllq xmm0 xmm2
    psrlq xmm1 xmm3
    pxor xmm0 xmm1
  
EndP
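For comparison, the same variable-count idea can be sketched with SSE2 intrinsics in C. This is an illustrative translation, not a tested port of the RosAsm procedure; the function name is hypothetical and the count is assumed to be in the range 0..63. PSLLQ and PSRLQ with a register count produce zero for counts above 63, so a count of 0 (shift right by 64) still returns the input unchanged:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate both qwords of in[] left by `count` bits (assumed 0..63). */
static void rotl64x2_var(const uint64_t in[2], uint64_t out[2], int count)
{
    __m128i x   = _mm_loadu_si128((const __m128i *)in);
    __m128i n   = _mm_cvtsi32_si128(count);     /* movd xmm2,eax   */
    __m128i inv = _mm_cvtsi32_si128(64);        /* movd xmm3,ecx   */
    inv = _mm_sub_epi64(inv, n);                /* psubq xmm3,xmm2 */
    __m128i left  = _mm_sll_epi64(x, n);        /* psllq xmm0,xmm2 */
    __m128i right = _mm_srl_epi64(x, inv);      /* psrlq xmm1,xmm3 */
    _mm_storeu_si128((__m128i *)out, _mm_xor_si128(left, right));
}
```

The count register applies to both qwords at once, since only its low 64 bits are read as the shift amount.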

Siekmanski · May 07, 2020, 10:30:38 PM

Hi guga,

Here is the logic of N bits rotation,

N = 7
movdqu xmm1,xmm0
psllq xmm0,N
psrlq xmm1,64-N
pxor xmm0,xmm1

the other direction

N = 7
movdqu xmm1,xmm0
psllq xmm0,64-N
psrlq xmm1,N
pxor xmm0,xmm1
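The two immediate-count sequences above are inverses of each other, which a short C sketch with SSE2 intrinsics can confirm. The macro and function names are hypothetical, N is assumed to be a compile-time constant in the range 1..63, and an x86 compiler providing `<emmintrin.h>` is assumed:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* N must be a compile-time constant in the range 1..63. */
#define ROTL64X2(v, N) \
    _mm_xor_si128(_mm_slli_epi64((v), (N)), _mm_srli_epi64((v), 64 - (N)))
#define ROTR64X2(v, N) \
    _mm_xor_si128(_mm_slli_epi64((v), 64 - (N)), _mm_srli_epi64((v), (N)))

/* Rotate left by 7 into mid[], then rotate that right by 7 into out[]. */
static void rotl7_then_rotr7(const uint64_t in[2], uint64_t mid[2], uint64_t out[2])
{
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    __m128i l = ROTL64X2(x, 7);
    _mm_storeu_si128((__m128i *)mid, l);
    _mm_storeu_si128((__m128i *)out, ROTR64X2(l, 7));
}
```

Rotating left by N and then right by N should return the original qwords, so `out` ends up equal to `in`.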

guga · May 08, 2020, 05:21:19 AM

Great :)

Tks a lot, Siekmanski

I'm trying to optimize that SHA-3 algorithm, which uses those rotate left and right (and also shift) routines in MMX. I'm trying to port them to SSE2.

So far I have managed to partially convert the MMX code to SSE in the Theta routine inside the keccakf function, but there is still a long way to go before I fully understand this algorithm. One good thing is that if I succeed in porting it, then maybe (just a huge maybe) there is a way to reverse it.

Some of the routines inside keccakf are basically copies of the data belonging to the state words. For example, I found that in the "Chi" routine, the whole second "for" is simply a binary copy. Therefore, this loop seems unnecessary and could be optimized away.


         /* Chi */
         for (j = 0; j < 25; j += 5) {
           for (i = 0; i < 5; i++)      /* <-- this is a simple copy from st (state) to the bc variable */
             bc[i] = st[j + i];
           for (i = 0; i < 5; i++)
             st[j + i] ^= (~bc[(i+1)%5]) & bc[(i+2)%5];
         }

And even this copying of chunks may be removed later. The main problem with this algorithm is that it is hard to follow and understand what it is doing, but if I succeed in optimizing and simplifying it, then maybe it can be reversed (if not totally, at least in part, I hope).
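One caveat worth checking before removing the copy from the quoted Chi loop: the state is updated in place, so by the time i = 3 and i = 4 are processed, lanes 0 and 1 of the row have already been overwritten. The sketch below (function names are hypothetical) compares the quoted loop with a variant that saves only those two lanes and with a variant that drops the copy entirely:

```c
#include <stdint.h>

/* Reference Chi, as quoted: copy the row to bc[] first. */
static void chi_ref(uint64_t st[25])
{
    uint64_t bc[5];
    for (int j = 0; j < 25; j += 5) {
        for (int i = 0; i < 5; i++)
            bc[i] = st[j + i];
        for (int i = 0; i < 5; i++)
            st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
    }
}

/* Variant that saves only lanes 0 and 1 of each row. */
static void chi_two_saved(uint64_t st[25])
{
    for (int j = 0; j < 25; j += 5) {
        uint64_t a0 = st[j], a1 = st[j + 1];
        st[j]     ^= ~st[j + 1] & st[j + 2];
        st[j + 1] ^= ~st[j + 2] & st[j + 3];
        st[j + 2] ^= ~st[j + 3] & st[j + 4];
        st[j + 3] ^= ~st[j + 4] & a0;   /* needs the original lane 0 */
        st[j + 4] ^= ~a0 & a1;          /* needs the original lanes 0 and 1 */
    }
}

/* No copy at all: i = 3 and i = 4 read lanes that were already updated. */
static void chi_no_copy(uint64_t st[25])
{
    for (int j = 0; j < 25; j += 5)
        for (int i = 0; i < 5; i++)
            st[j + i] ^= (~st[j + (i + 1) % 5]) & st[j + (i + 2) % 5];
}
```

For a state with only st[2] = 1, `chi_ref` and `chi_two_saved` both leave st[3] at 0, while `chi_no_copy` sets st[3] to 1. So the full five-lane copy can apparently be shrunk to two saved lanes, but not dropped outright.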

Why reverse it? Well, because the algorithm seems to act more like an encoder than a hash per se. So, if (and it is a huge, huge, huge if) this can be reversed, then we could, in theory, use it to compress whatever file, text, etc. is needed, forcing the data to be limited to 256 or 512 etc. bytes. (Kind of impossible, I know, but the behavior of the parts of the routines I'm working with seems more like that of an encoder.)

daydreamer · June 17, 2020, 04:32:20 AM

So if I want to use SSE shift/rotate in combination with WinAPI calls on a modern Windows version, are there any safe XMM registers, similar to how some GP registers are preserved, or do I need to save/restore them every time in the loop(s)?

The MASM Forum
