News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

BITs extraction from a byte

Started by frktons, February 02, 2013, 02:35:06 AM

Previous topic - Next topic

frktons

Hi all.

I need a faster way to do the following:

- extract 2 bits at a time from a byte, from LSB to MSB.
- move its values into 4 dwords.

I've created a sample program to show what I mean. I'm not satisfied
with it, I think it could be done faster and shorter.

Any idea? Thanks.

Frank 


;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
.nolist
.686
.xmm

;==============================================================================
.data

align 4
val1  dd 0
val2  dd 0
val3  dd 0
val4  dd 0


align 4
Ptrval1 dd val1

bit_stream  db 11100100B
 
   
.code

;==============================================================================
Main proc near

    mov al, bit_stream
    mov cl, al
    and al, 00000011B

    mov byte ptr val1, al
   
    mov al, cl
    and al, 00001100B
    shr al, 2
 
    mov byte ptr val2, al
   
    mov al, cl
    and al, 00110000B
    shr al, 4
 
    mov byte ptr val3, al 
     
    mov al, cl
    and al, 11000000B
    rol al, 2
   
    mov byte ptr val4, al     

    print " First value "
    print str$(val1), 13, 10
   
    print " Second value "
    print str$(val2), 13, 10   
   
    print " Third value "
    print str$(val3), 13, 10
   
    print " Fourth value "
    print str$(val4), 13, 10           

    ret

Main endp

;==============================================================================
start:
;==============================================================================
    call Main

    inkey
    exit
;==============================================================================
end start


Output:

Quote
First value 0
Second value 1
Third value 2
Fourth value 3
Press any key to continue ...
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

hiya Frank
you might do something like this

mov edx,offset Val1

mov al,cl
and al,3
shr ecx,2
mov [edx],al

mov al,cl
add edx,4
and al,3
shr ecx,2
mov [edx],al

mov al,cl
add edx,4
and al,3
shr ecx,2
mov [edx],al

add edx,4
and cl,3
mov [edx],cl


another variation
mov edx,offset Val1

mov al,cl
and al,3
shr ecx,2
mov [edx],al

mov al,cl
and al,3
shr ecx,2
mov [edx+4],al

mov al,cl
and al,3
shr ecx,2
mov [edx+8],al

and cl,3
mov [edx+12],cl

dedndave

this may be faster
mov al,cl
and al,3
shr ecx,2
mov byte ptr Val1,al

mov dl,cl
and dl,3
shr ecx,2
mov byte ptr Val2,dl

mov al,cl
and al,3
shr ecx,2
mov byte ptr Val3,al

and cl,3
mov byte ptr Val4,cl

frktons

Quote from: dedndave on February 02, 2013, 03:28:00 AM
this may be faster
mov al,cl
and al,3
shr ecx,2
mov byte ptr Val1,al

mov dl,cl
and dl,3
shr ecx,2
mov byte ptr Val2,dl

mov al,cl
and al,3
shr ecx,2
mov byte ptr Val3,al

and cl,3
mov byte ptr Val4,cl


Hi Dave.

I think you're using the same logic as mine, with some improvements [I hope].

You know mov [edx],al if it works? Shouldn't the two operands be
of the same size? like mov byte ptr [edx],al.

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

EDX is not the operand
the memory location at [EDX] is the operand
so, the assembler knows the size because AL is used - it can't be anything but a byte

frktons

Quote from: dedndave on February 02, 2013, 04:12:29 AM
EDX is not the operand
the memory location at [EDX] is the operand
so, the assembler knows the size because AL is used - it can't be anything but a byte
OK Dave.   :t

Looking for a XMM/SSE2/3 faster version.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

even though direct addressing takes a few more bytes of code, i think it's the fastest
for example...
mov byte ptr Val3,al

yah - SSE is going to be faster, i am sure

frktons

Quote from: dedndave on February 02, 2013, 04:15:47 AM
even though direct addressing takes a few more bytes of code, i think it's the fastest
for example...
mov byte ptr Val3,al
Thanks Dave.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

here is kind of a "Henry Ford" approach
we push and pop ebx, but reduce dependancies
Bits1   PROC

    movzx eax,byte ptr bit_stream
    push  ebx
    mov   ecx,eax
    mov   edx,eax
    mov   ebx,eax

    shr   ecx,2
    shr   edx,4
    shr   ebx,6

    and   al,3
    and   cl,3

    mov byte ptr val1,al
    mov byte ptr val2,cl

    and   dl,3

    mov byte ptr val4,bl
    mov byte ptr val3,dl

    pop   ebx
    ret

Bits1   ENDP

frktons

Quote from: dedndave on February 02, 2013, 04:48:03 AM
here is kind of a "Henry Ford" approach
we push and pop ebx, but reduce dependancies
Bits1   PROC

    movzx eax,byte ptr bit_stream
    push  ebx
    mov   ecx,eax
    mov   edx,eax
    mov   ebx,eax

    shr   ecx,2
    shr   edx,4
    shr   ebx,6

    and   al,3
    and   cl,3

    mov byte ptr val1,al
    mov byte ptr val2,cl

    and   bl,3
    and   dl,3

    mov byte ptr val4,bl
    mov byte ptr val3,dl

    pop   ebx
    ret

Bits1   ENDP


And here a kind of frktons approach:

    xor eax, eax
    xor ebx, ebx
    xor ecx, ecx
    xor edx, edx

    mov al, bit_stream
    ror eax, 8

    shld  edx, eax, 2
    shl   eax, 2
    shld  ecx, eax, 2
    shl   eax, 2
    shld  ebx, eax, 2
    shl   eax, 2
    rol   eax, 2

    mov   val1, eax
    mov   val2, ebx
    mov   val3, ecx
    mov   val4, edx


Which is faster?

Sill looking for SSE3 solution.
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

on the Henry Ford - no need to AND BL,3 - you can remove that line of code

i am working on one that doesn't push/pop ebx
then, i will let you write the test code   :P

dedndave

Bits2   PROC

    movzx eax,byte ptr bit_stream
    mov   ecx,eax
    mov   edx,eax
    shr   ecx,2
    shr   eax,4
    and   dl,3
    mov   ah,al
    and   cl,3
    shr   ah,2
    mov byte ptr val1,dl
    and   al,3
    mov byte ptr val2,cl
    mov byte ptr val4,ah
    mov byte ptr val3,al
    ret

Bits2   ENDP

frktons

I'm half way SSE2 solution. Still something to do though.



.data

align 16
maskxx  dd  000000C0H, 00000030H, 0000000CH, 00000003H

align 16
val1  dd 0
val2  dd 0
val3  dd 0
val4  dd 0


align 4
bit_stream  db 11100100B

.code

.....

    lea   esi, maskxx
    movaps  xmm1, oword ptr [esi]
    lea   edi, val1

    movzx eax, byte ptr bit_stream
    movd  xmm0, eax
    pshufd xmm0, xmm0, 0

    pand  xmm0, xmm1

    ; Here I have to solve the problem to shift 2nd dword in xmm0 2 bits right
    ; 3rd  dword  4 bits right
    ; 4th  dword  6 bits right


    movaps oword ptr [edi], xmm0



Any idea? Maybe PSRLD xmm0, xmm2 ?

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

dedndave

that is the trick, huh - lol

here is a bit-swap routine that runs very fast (reverses bits in a dword)
developed by BitRake and Nexo, updated by Drizz
        mov     eax,dwInputVal
        mov     edx,00001111000011110000111100001111b
        and     edx,eax
        and     eax,11110000111100001111000011110000b
        shl     edx,4
        shr     eax,4
        or      eax,edx
        mov     edx,00110011001100110011001100110011b
        and     edx,eax
        and     eax,11001100110011001100110011001100b
        shr     eax,2
        lea     eax,[edx*4+eax]                         ;shl edx,2 ;or eax,edx
        mov     edx,01010101010101010101010101010101b
        and     edx,eax
        and     eax,10101010101010101010101010101010b
        shr     eax,1
        lea     eax,[edx*2+eax]                         ;shl edx,1 ;or eax,edx
        bswap   eax

now, i know that's not what you want to do
however, it demonstrates a concept that you may find useful
understand how it works - then you may be able to use the same concept in SSE code

frktons

Quote from: dedndave on February 02, 2013, 06:10:29 AM
that is the trick, huh - lol

here is a bit-swap routine that runs very fast (reverses bits in a dword)
developed by BitRake and Nexo, updated by Drizz
        mov     eax,dwInputVal
        mov     edx,00001111000011110000111100001111b
        and     edx,eax
        and     eax,11110000111100001111000011110000b
        shl     edx,4
        shr     eax,4
        or      eax,edx
        mov     edx,00110011001100110011001100110011b
        and     edx,eax
        and     eax,11001100110011001100110011001100b
        shr     eax,2
        lea     eax,[edx*4+eax]                         ;shl edx,2 ;or eax,edx
        mov     edx,01010101010101010101010101010101b
        and     edx,eax
        and     eax,10101010101010101010101010101010b
        shr     eax,1
        lea     eax,[edx*2+eax]                         ;shl edx,1 ;or eax,edx
        bswap   eax

now, i know that's not what you want to do
however, it demonstrates a concept that you may find useful
understand how it works - then you may be able to use the same concept in SSE code

Thanks Dave, I'll try to understand what it does later.

Now to finish the SSE2 task I've to understand how to shift bits
using a different approach, like adding/subtracting a value
to the byte interested.

if I have


bit_mask  db 11100100B

and  I want to shift right 6 bits, I can do:

shr bit_mask, 6

If I want to get the same result using sub/add or something like this
what kind of logic have I to use?

There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama