Rotate Bits SSE

Started by guga, April 27, 2020, 06:30:22 AM

guga

Hi Guys

I made a couple of routines to rotate bits right and left using MMX instructions, but I'm failing to do the same for SSE2. Can someone help? Also, can these be optimized?

The MMX routines are as follows:

Rotate Left


      rotl64 -  Rotate the bits in the MM0 register left N times.

       Parameters:
                   Count (Input). The number of bit positions to rotate left.

       Return Value:
                    The rotated value is stored in the MM0 register. (See also the remarks below.)

       Remarks:
                  The function uses the MM0 register to perform the rotation, so MM0 must already be loaded on input.

       Example of usage:

      [tttt: Q$ 1] ; A qword variable that holds 1 as its value
      movq MM0 Q$tttt
      call rotl64 1 ; rotate left by 1 bit


Proc rotl64:
    Arguments @Count

    mov eax D@Count
    movq MM2 MM0
    movd MM3 eax
    sub eax 64
    neg eax
    movd MM4 eax
    psllq MM0 MM3
    psrlq MM2 MM4
    por MM0 MM2

EndP




Rotate Right


      rotr64 -  Rotate the bits in the MM0 register right N times.

       Parameters:
                   Count (Input). The number of bit positions to rotate right.

       Return Value:
                    The rotated value is stored in the MM0 register. (See also the remarks below.)

       Remarks:
                  The function uses the MM0 register to perform the rotation, so MM0 must already be loaded on input.

      [tttt: Q$ 1] ; A qword variable that holds 1 as its value
      movq MM0 Q$tttt
      call rotr64 1 ; rotate right by 1 bit

Proc rotr64:
    Arguments @Count

    mov eax D@Count
    movq MM2 MM0
    movd MM3 eax
    sub eax 64
    neg eax
    movd MM4 eax
    psrlq MM0 MM3
    psllq MM2 MM4
    por MM0 MM2

EndP
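
The two routines above follow the same pattern: shift the value one way by N, shift a copy the other way by 64-N, and OR the results. A minimal C sketch of that pattern is shown here, assuming SSE2 intrinsics on the low 64-bit lane of an XMM register instead of the MMX registers used in the post. The names rotl64_sse2 and rotr64_sse2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Rotate a qword left by n bits (0..64): psllq by n, psrlq by 64-n, por. */
static uint64_t rotl64_sse2(uint64_t value, unsigned n)
{
    assert(n <= 64);
    __m128i x     = _mm_loadl_epi64((const __m128i *)&value);
    __m128i left  = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)n));
    __m128i right = _mm_srl_epi64(x, _mm_cvtsi32_si128((int)(64 - n)));
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, _mm_or_si128(left, right));
    return out;
}

/* Rotate a qword right by n bits (0..64): the same steps with the
   two shift directions exchanged. */
static uint64_t rotr64_sse2(uint64_t value, unsigned n)
{
    assert(n <= 64);
    __m128i x     = _mm_loadl_epi64((const __m128i *)&value);
    __m128i right = _mm_srl_epi64(x, _mm_cvtsi32_si128((int)n));
    __m128i left  = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)(64 - n)));
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, _mm_or_si128(left, right));
    return out;
}
```

A count of 0 or 64 needs no special case here, because the packed shifts return zero for a count of 64, so the OR just passes the other half through.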



Now, how can I do a similar thing (left and right) for 128 bits? I tried the routine below, but it is not working :(


Proc rot128:
    Arguments @Count

    mov eax D@Count
    movdqu XMM2 XMM0
    movd XMM3 eax
    sub eax 128
    neg eax
    movd XMM4 eax
    psllq XMM0 XMM3
    psrlq XMM2 XMM4
    por XMM0 XMM2

EndP
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

daydreamer

Quote from: guga on April 27, 2020, 06:30:22 AM

Now, how can I do a similar thing (left and right) for 128 bits? I tried the routine below, but it is not working :(


Proc rot128:
    Arguments @Count

    mov eax D@Count
    movdqu XMM2 XMM0
    movd XMM3 eax
    sub eax 128
    neg eax
    movd XMM4 eax
    psllq XMM0 XMM3
    psrlq XMM2 XMM4
    por XMM0 XMM2

EndP

You should use PSLLDQ (double quadword) instead of PSLLQ (64-bit quadword only).
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

jj2007

Quote from: daydreamer on April 27, 2020, 03:55:20 PM
You should use PSLLDQ (double quadword) instead of PSLLQ (64-bit quadword only).

Marinus, Magnus, PSLLDQ shifts bytes while PSLLQ shifts bits.
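
The difference between the two instructions can be seen with a small C sketch using SSE2 intrinsics, where _mm_slli_epi64 corresponds to PSLLQ and _mm_slli_si128 to PSLLDQ. The function names are hypothetical, and the arrays hold the low qword at index 0 and the high qword at index 1.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* PSLLQ with an immediate of 4: each 64-bit lane moves left by 4 BITS,
   and nothing crosses from the low lane into the high lane. */
static void shift_lanes_4_bits(const uint64_t in[2], uint64_t out[2])
{
    assert(in && out);
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_epi64(x, 4));
}

/* PSLLDQ with an immediate of 1: the whole 128-bit register moves left
   by 1 BYTE (8 bits), carrying the top byte of the low qword upward. */
static void shift_register_1_byte(const uint64_t in[2], uint64_t out[2])
{
    assert(in && out);
    __m128i x = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_si128(x, 1));
}
```

So PSLLQ has bit granularity but stays inside each qword, while PSLLDQ crosses the qword boundary but only in whole bytes and only with an immediate count. Neither one alone gives a 128-bit shift by an arbitrary bit count.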

mineiro

The version below is not SSE2. Sorry, I didn't check for optimization or other instructions.

mov rax,0123456789abcdefh   ;high
mov rdx,0123456789abcdefh   ;low
mov ecx,8                   ;count
shld rax,rdx,cl             ;shift high left, pulling in the top bits of low
shl rdx,cl                  ;shift low left

Shifting to the left N times multiplies the number by 2 each time. Depending on what you need, you may need two variables for the carry, or even an rcl or rol version at each step.
Maybe this can be useful for other attempts.

---edit---
I assembled your source file and now I understand. It's something like:

mov rax,0f123456789abcdefh  ;high
mov rdx,0123456789abcdefh   ;low
mov rbx,0
mov ecx,8                   ;count
shld rbx,rax,cl             ;rbx=high bits of rax
shld rax,rdx,cl             ;shift high left n bits, pulling in the top bits of low
shl rdx,cl                  ;shift low left, inserting zeros at the right side
or rdx,rbx                  ;join (logical OR)
I'd rather be this ambulant metamorphosis than to have that old opinion about everything
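
The shld/shl/or sequence in the edit above can be modelled in plain C as a sketch, assuming the 128-bit value is kept as two qwords like the rax (high) and rdx (low) pair. The type u128 and the function rotl128_scalar are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit value as two qwords, mirroring the rax (high) / rdx (low) pair. */
typedef struct { uint64_t hi, lo; } u128;

/* Rotate left by n bits, n in 1..63, following the shld/shl/or sequence. */
static u128 rotl128_scalar(u128 v, unsigned n)
{
    assert(n >= 1 && n <= 63);                /* a C shift by 64 is undefined */
    uint64_t carry = v.hi >> (64 - n);        /* shld rbx,rax,cl with rbx = 0 */
    u128 r;
    r.hi = (v.hi << n) | (v.lo >> (64 - n));  /* shld rax,rdx,cl */
    r.lo = (v.lo << n) | carry;               /* shl rdx,cl ; or rdx,rbx */
    return r;
}
```

With the values from the post (high 0f123456789abcdefh, low 0123456789abcdefh, count 8), the top byte F1 of the high qword wraps around into the bottom byte of the low qword.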

Siekmanski

Quote from: jj2007 on April 27, 2020, 08:24:14 PM
Quote from: daydreamer on April 27, 2020, 03:55:20 PM
You should use PSLLDQ (double quadword) instead of PSLLQ (64-bit quadword only).

Marinus, PSLLDQ shifts bytes while PSLLQ shifts bits.

Am I PSHUFD with Magnus?  :biggrin:
Creative coders use backward thinking techniques as a strategy.

mineiro

hello sir guga,
This is a quick try;  psllq deals with 2 qwords instead of 1 oword(128)

.data
align 32
number    dq 0123456789abcdefh,0fedcba9876543210h
low_mask dq 0ffffffffffffffffh,0000000000000000h ;reversed because using qword
high_mask  dq 0000000000000000h,0ffffffffffffffffh

.code
mov eax,4 ;count
movdqu xmm0,oword ptr [number] ;xmm0=fedcba98765432100123456789abcdef
movdqu xmm1,xmm0
movdqu xmm2,xmm0
pand xmm1,oword ptr [high_mask] ;xmm1=FEDCBA98765432100000000000000000
pand xmm2,oword ptr [low_mask] ;xmm2=00000000000000000123456789ABCDEF
movd xmm3,eax ;xmm3=counter
psllq xmm0,xmm3 ;xmm0=EDCBA98765432100123456789ABCDEF0

sub eax,64
neg eax
movd xmm4,eax ;xmm4=3c
psrlq xmm1,xmm4 ;xmm1=000000000000000F0000000000000000
psrlq xmm2,xmm4 ;xmm2=00000000000000000000000000000000
pxor xmm3,xmm3 ;zero
movhlps xmm3,xmm1 ;high part of 1 to lower part of 3
movlhps xmm3,xmm2 ;lower part of 2 to higher part of 3
;xmm3=0000000000000000000000000000000F
por xmm0,xmm3 ;concatenate


---edit---
I measured both procedures, and the first one, which doesn't use SSE, performs better on my machine.

jj2007


Siekmanski

No worries, 'The mouth speaks what the heart is full of'  :rofl:

guga

Hi Mineiro, thanks, but it's not working. Try rotating by 64 and 65 bits. The result will be 0.

number    dq 0123456789abcdefh,0fedcba9876543210h
The input is:
0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0111_0110_0101_0100_0011_0010_0001_0000_1000_1001_1010_1011_1100_1101_1110_1111

and the output should be:

Rotate 64 bits
1110_1100_1010_1000_0110_0100_0010_0001_0001_0011_0101_0111_1001_1011_1101_1110_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000

Rotate 65 bits
1101_1001_0101_0000_1100_1000_0100_0010_0010_0110_1010_1111_0011_0111_1011_1100_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0001

mineiro

Is that working for your needs or not, sir guga? I can change the code to fit your needs.
I suppose shl deals with N-1, so the maximum the counter can hold is 63. Well, now that you said that, I need to review the result with the counter being zero, 64 or above.
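
On the count limit: as far as the instruction set reference describes it, scalar shl on a 64-bit operand masks the count to its low 6 bits, while psllq and psrlq return zero for any count above 63. That would explain the zero result for counts of 64 and 65. The C sketch below, assuming SSE2 intrinsics, shows the packed behaviour and one possible way to bring a larger count back into range. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Shift a qword left with PSLLQ (register-count form) and return it. */
static uint64_t psllq_qword(uint64_t value, unsigned count)
{
    __m128i x = _mm_loadl_epi64((const __m128i *)&value);
    __m128i r = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)count));
    uint64_t out;
    _mm_storel_epi64((__m128i *)&out, r);
    return out;
}

/* Hypothetical helper: split a 0..127 rotate count into "swap the two
   qwords first?" plus a 0..63 bit count that PSLLQ/PSRLQ can handle. */
static unsigned split_count(unsigned count, int *swap_halves)
{
    assert(count < 128 && swap_halves);
    *swap_halves = (count >= 64);
    return count & 63;
}
```

Rotating a 128-bit value by 64 is just an exchange of its two qwords, so a count of 65 becomes "swap, then rotate by 1".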

guga

Hi Mineiro

No. It's not working. The goal is to rotate all the bits of a 128-bit value left and right. What the code was doing is shifting the bits (and only within 64 bits), which zeroes the low half of the 128 bits, rather than rotating all bits by XXX times.

mineiro

Maybe this can work, please check. If this works we can deal with rotate right.
.data
align 32
number    dq 3,0
low_mask dq 0ffffffffffffffffh,0000000000000000h ;reversed because using qword
high_mask  dq 0000000000000000h,0ffffffffffffffffh

.code
  mov eax,127 ;count range 0 to 128
  movdqu xmm0,oword ptr [number]
  movdqu xmm1,xmm0
  .if eax >= 64
    movhlps xmm0,xmm1 ;switch high and low
    movlhps xmm0,xmm1
    sub eax,64
  .endif

    movdqu xmm1,xmm0
    movdqu xmm2,xmm0

    pand xmm1,oword ptr [high_mask]
    pand xmm2,oword ptr [low_mask]
    movd xmm3,eax
    psllq xmm0,xmm3

    sub eax,64
    neg eax
    movd xmm4,eax
    psrlq xmm1,xmm4
    psrlq xmm2,xmm4
    pxor xmm3,xmm3
    movhlps xmm3,xmm1
    movlhps xmm3,xmm2
    por xmm0,xmm3

Siekmanski

Cool challenge.

EDIT: Sorry, I misunderstood the question and posted the wrong code.  :rolleyes:
I'll try again and if it works, post some code that does the job.

mineiro

Quote from: guga on April 28, 2020, 06:56:11 AM
by XXX times.
obscene  :rofl:

I figured out that the masks were not necessary, so I cleaned up the code.
  mov eax,127 ;count range 0 to 128
  movdqu xmm0,oword ptr [number]

  .if eax >= 64
    movdqu xmm1,xmm0
    movhlps xmm0,xmm1 ;switch high and low
    movlhps xmm0,xmm1
    sub eax,64
  .endif

    movdqu xmm1,xmm0
    movd xmm3,eax
    psllq xmm0,xmm3

    sub eax,64
    neg eax
    movd xmm4,eax
    psrlq xmm1,xmm4
    movhlps xmm3,xmm1
    movlhps xmm3,xmm1
    por xmm0,xmm3
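
For anyone who wants to check the idea outside the assembler, here is a C sketch of the same steps with SSE2 intrinsics. It is not a translation of the listing line by line: it assumes a single pshufd (_mm_shuffle_epi32) to exchange the qwords in place of the movhlps/movlhps pair, and it adds a rotate right built on top of the rotate left. The function names are hypothetical. Index 0 of each array is the low qword, matching the order in `number dq low,high`.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Rotate a 128-bit value left by n bits (0..127).
   v[0] is the low qword, v[1] is the high qword. */
static void rotl128_sse2(const uint64_t v[2], unsigned n, uint64_t out[2])
{
    assert(n < 128);
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    if (n >= 64) {
        /* a rotate by 64 is an exchange of the two qwords */
        x = _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2));
        n -= 64;
    }
    /* psllq: bits that stay inside their own qword */
    __m128i left  = _mm_sll_epi64(x, _mm_cvtsi32_si128((int)n));
    /* psrlq: bits that fall out of the top of each qword */
    __m128i right = _mm_srl_epi64(x, _mm_cvtsi32_si128((int)(64 - n)));
    /* move each carried group over to the other qword, then join */
    right = _mm_shuffle_epi32(right, _MM_SHUFFLE(1, 0, 3, 2));
    _mm_storeu_si128((__m128i *)out, _mm_or_si128(left, right));
}

/* Rotate right by n bits (0..127), expressed as a rotate left by 128-n. */
static void rotr128_sse2(const uint64_t v[2], unsigned n, uint64_t out[2])
{
    assert(n < 128);
    rotl128_sse2(v, (128 - n) & 127, out);
}
```

With number = 3,0 and a count of 127, as in the listing, bit 0 should end up in bit 127 and bit 1 in bit 0.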
