Topic: Fastest way to move 3 bytes into a dword (Read 15107 times)

frktons

  • Member
  • ***
  • Posts: 491
Fastest way to move 3 bytes into a dword
« on: January 21, 2013, 04:32:29 AM »
Next challenge on the way to enlightenment is:

What is the fastest way to move a 3-byte variable
into a dword variable/register?

And the reverse, of course.

Let's see what shows up this time  :P

I bet Dave will enjoy this one.  :lol:

dedndave

  • Member
  • *****
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: Fastest way to move 3 bytes into a dword
« Reply #1 on: January 21, 2013, 04:40:08 AM »
Code:
mov eax,dword ptr var3bytes   ; 4-byte load: the byte after var3bytes must be readable
and eax,0FFFFFFh              ; clear the top byte, keep the 3 wanted bytes
:P
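For readers following along in C, here is a minimal sketch of what Dave's two instructions do. It assumes a little-endian host such as x86. The function names and the `avail` parameter are hypothetical; they exist only to make the hidden requirement explicit. The 4-byte load touches one byte past the 3-byte variable, so that byte must be readable (for example, it must not lie past the end of a page).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of: mov eax,dword ptr var3bytes / and eax,0FFFFFFh
   Assumes a little-endian host, like x86. `avail` is the number of
   readable bytes at p; the 4-byte load needs at least 4. */
static uint32_t load24_wide(const uint8_t *p, size_t avail)
{
    uint32_t v;
    assert(avail >= 4);           /* one byte past the 3-byte variable is read */
    memcpy(&v, p, sizeof v);      /* 4-byte load, safe for unaligned p */
    return v & 0x00FFFFFFu;       /* clear the top byte */
}

/* Byte-exact alternative: builds the value from p[0..2] only, so it
   never reads past the variable and does not depend on endianness. */
static uint32_t load24_exact(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16);
}
```

The wide version corresponds to the single load plus AND. The exact version shows the cost of never over-reading: three byte loads plus shifts and ORs.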

frktons

Re: Fastest way to move 3 bytes into a dword
« Reply #2 on: January 21, 2013, 05:28:14 AM »
Good idea Dave, I like it.  :t

The reverse operation is still missing. Dave, give us the light.

I'm very curious  to see if more solutions come up & what kind
of bit-imagination assembly programmers have developed  :lol:
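Here is one possible scalar answer to the reverse question. It was not posted in the thread and is only a sketch. The idea is to write the low word and then the third byte, so the byte after the destination is never touched. A plain 4-byte store would overwrite it. In MASM this is roughly `mov word ptr [edi],ax`, `shr eax,16`, `mov byte ptr [edi+2],al`. The C below models that sequence with a hypothetical helper, `store24`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the low 24 bits of v as 3 little-endian bytes.
   Models a word store plus a byte store, so dst[3] is never written. */
static void store24(uint8_t *dst, size_t avail, uint32_t v)
{
    assert(avail >= 3);                  /* only 3 bytes are needed */
    dst[0] = (uint8_t)v;                 /* low word: bytes 0 and 1 */
    dst[1] = (uint8_t)(v >> 8);
    dst[2] = (uint8_t)(v >> 16);         /* shr eax,16 / mov [edi+2],al */
}
```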

Siekmanski

  • Member
  • *****
  • Posts: 2326
Re: Fastest way to move 3 bytes into a dword
« Reply #3 on: January 21, 2013, 07:43:45 AM »

24bit to 32 bit,


* code modified,

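Code:
.data
align 16
Buffer32               db 32*1024 dup (0)                               ; 1024 passes x 32 bytes of dwords
ByteAndMask32          db -1,-1,-1,0,-1,-1,-1,0,-1,-1,-1,0,-1,-1,-1,0   ; keep bytes 0-2, clear byte 3
ByteMask24BitSSE3      db 0,1,2,0,3,4,5,0,6,7,8,0,9,10,11,0             ; pshufb source index per dest byte

Bytes24bit             db 1,2,3,4,5,6,7,8,9,10,11,12 ; etc.

.code

lea eax,ByteMask24BitSSE3
movdqa xmm1,[eax]          ; xmm1 = shuffle mask
lea eax,ByteAndMask32
movdqa xmm2,[eax]          ; xmm2 = AND mask

lea esi,Bytes24bit         ; esi -> packed 3-byte values
lea edi,Buffer32           ; edi -> 16-byte aligned dword buffer

; mov ecx,1024             ; loop version: 24 bytes in, 32 bytes out per pass
align 16
conversion_loop:
movdqu xmm0,[esi]          ; 16-byte load, only the low 12 bytes are used
pshufb xmm0,xmm1           ; SSSE3: 4 x 3 bytes -> 4 dwords (byte 3 gets a copy of byte 0)
pand xmm0,xmm2             ; clear byte 3 of each dword
movdqa [edi],xmm0          ; aligned store of 4 dwords
; movdqu xmm0,[esi+12]
; pshufb xmm0,xmm1
; pand xmm0,xmm2
; movdqa [edi+16],xmm0
; add esi,24
; add edi,32
; dec ecx
; jnz conversion_loop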


1,2,3,4,5,6,7,8,9,10,11,12
results in:
00030201h 00060504h 00090807h 000c0b0ah
« Last Edit: January 21, 2013, 10:05:32 AM by Siekmanski »
Creative coders use backward thinking techniques as a strategy.
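The C sketch below shows what PSHUFB and PAND do to the sample bytes. It models PSHUFB in scalar code: each destination byte takes `src[mask & 0x0F]`, or 0 when bit 7 of the mask byte is set. It then applies the two masks from the post. The names `pshufb_model`, `expand_24_to_32` and `ByteMask24BitZero` are hypothetical. The `use_pand == 0` path is an alternative that does not appear in the post. Putting 80h in every fourth mask byte zeroes those bytes during the shuffle, which would make the separate PAND (and ByteAndMask32) unnecessary.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSHUFB on one 128-bit register: for each byte i,
   bit 7 of mask[i] set -> 0, else src[mask[i] & 0x0F]. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Masks from the post. */
static const uint8_t ByteMask24BitSSE3[16] = {0,1,2,0, 3,4,5,0, 6,7,8,0, 9,10,11,0};
static const uint8_t ByteAndMask32[16] = {0xFF,0xFF,0xFF,0, 0xFF,0xFF,0xFF,0,
                                          0xFF,0xFF,0xFF,0, 0xFF,0xFF,0xFF,0};

/* Hypothetical variant: 80h zeroes the 4th byte of each dword during the shuffle. */
static const uint8_t ByteMask24BitZero[16] = {0,1,2,0x80, 3,4,5,0x80, 6,7,8,0x80, 9,10,11,0x80};

/* Expand 4 packed 3-byte values (the low 12 of 16 loaded bytes) into 4 dwords. */
static void expand_24_to_32(uint32_t out[4], const uint8_t in16[16], int use_pand)
{
    uint8_t t[16];
    if (use_pand) {
        pshufb_model(t, in16, ByteMask24BitSSE3);   /* pshufb xmm0,xmm1 */
        for (int i = 0; i < 16; i++)
            t[i] &= ByteAndMask32[i];               /* pand xmm0,xmm2 */
    } else {
        pshufb_model(t, in16, ByteMask24BitZero);   /* shuffle and zero in one step */
    }
    for (int k = 0; k < 4; k++)                     /* read back as little-endian dwords */
        out[k] = (uint32_t)t[4*k] | ((uint32_t)t[4*k+1] << 8)
               | ((uint32_t)t[4*k+2] << 16) | ((uint32_t)t[4*k+3] << 24);
}
```

Both paths turn the input bytes 1..12 into 00030201h, 00060504h, 00090807h and 000C0B0Ah. Whether dropping the PAND is measurably faster would have to be checked with the timing program.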

frktons

Re: Fastest way to move 3 bytes into a dword
« Reply #4 on: January 21, 2013, 08:47:07 AM »

Very interesting, Siekmanski. I want to test it to compare the performance
of SSE code against traditional 32 bit code.   :t

Siekmanski

Re: Fastest way to move 3 bytes into a dword
« Reply #5 on: January 21, 2013, 09:05:40 AM »
32bit to 24 bit,

 :biggrin:

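Code:

.data
align 16
Bytes32bit             dd 00030201h,00060504h,00090807h,000c0b0ah ; etc.
ByteMask24BitSSE3_2    db 0,1,2,4,5,6,8,9,10,12,13,14,3,3,3,3      ; drop byte 3 of each dword
Buffer24        db 24*1024+4 dup (0)     ; +4: the last 16-byte store spills 4 bytes


.code

lea eax,ByteMask24BitSSE3_2
movdqa xmm1,[eax]          ; xmm1 = shuffle mask

lea esi,Bytes32bit         ; esi -> 16-byte aligned dwords
lea edi,Buffer24           ; edi -> packed 3-byte output

; mov ecx,1024             ; loop version: 32 bytes in, 24 bytes out per pass
align 16
conversion_loop2:
movdqa xmm0,[esi]          ; load 4 dwords
pshufb xmm0,xmm1           ; SSSE3: pack to 12 bytes, the top 4 bytes are filler
movdqu [edi],xmm0          ; 16-byte store, the next store overwrites the 4 filler bytes
; movdqa xmm0,[esi+16]
; pshufb xmm0,xmm1
; movdqu [edi+12],xmm0
; add esi,32
; add edi,24
; dec ecx
; jnz conversion_loop2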


00030201h 00060504h 00090807h 000c0b0ah
results in:
1,2,3,4,5,6,7,8,9,10,11,12
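The reverse shuffle can be modelled the same way. This hypothetical `pack_32_to_24` sketch assumes a little-endian host and uses a scalar `pshufb_model` in place of the instruction. It shows why Buffer24 is declared with 4 extra bytes. The mask's last four entries (3,3,3,3) fill the top of the register with copies of the first dword's top byte, and MOVDQU stores all 16 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of PSHUFB: bit 7 of mask[i] set -> 0, else src[mask[i] & 0x0F]. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 0x0F];
}

/* Mask from the post: bytes 0-2 of each dword, then byte 3 repeated as filler. */
static const uint8_t ByteMask24BitSSE3_2[16] = {0,1,2, 4,5,6, 8,9,10, 12,13,14, 3,3,3,3};

/* Pack 4 dwords into 12 bytes; out16 receives all 16 bytes a MOVDQU would store. */
static void pack_32_to_24(uint8_t out16[16], const uint32_t in[4])
{
    uint8_t src[16];
    for (int k = 0; k < 4; k++) {          /* little-endian image, as movdqa loads it */
        src[4*k]     = (uint8_t)in[k];
        src[4*k + 1] = (uint8_t)(in[k] >> 8);
        src[4*k + 2] = (uint8_t)(in[k] >> 16);
        src[4*k + 3] = (uint8_t)(in[k] >> 24);
    }
    pshufb_model(out16, src, ByteMask24BitSSE3_2);
}
```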

frktons

Re: Fastest way to move 3 bytes into a dword
« Reply #6 on: January 21, 2013, 09:19:17 AM »

Yes, this was the missing part. Very nice. I'll test them ASAP.   :t

Siekmanski

Re: Fastest way to move 3 bytes into a dword
« Reply #7 on: January 21, 2013, 09:31:47 AM »
 :icon_redface: Made a stupid mistake,

dec ecx,1

must be: dec ecx

sources modified....   :biggrin:

Siekmanski

Re: Fastest way to move 3 bytes into a dword
« Reply #8 on: January 21, 2013, 10:02:06 AM »
To get faster results, unroll the conversion loop 3 times so that it fits in 64 bytes of code (one cache line).
This 24 bit to 32 bit routine is now 60 bytes long and fits.
(The 32 bit to 24 bit routine unrolled 3 times is 65 bytes, so it is 1 byte too big to fit.)

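Code:
mov ecx,128                ; 128 passes x 48 bytes of dwords
align 16
conversion_loop2:          ; 32 bit to 24 bit, unrolled 3 times
movdqa xmm0,[esi]
pshufb xmm0,xmm1
movdqu [edi],xmm0          ; each store writes 16 bytes, 12 of them valid
movdqa xmm0,[esi+16]
pshufb xmm0,xmm1
movdqu [edi+12],xmm0       ; overwrites the 4 filler bytes of the previous store
movdqa xmm0,[esi+32]
pshufb xmm0,xmm1
movdqu [edi+24],xmm0
add esi,48                 ; 3 x 16 bytes in
add edi,36                 ; 3 x 12 bytes out
dec ecx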
jnz conversion_loop2
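The unrolled loop relies on overlapping stores. Every MOVDQU writes 16 bytes, but EDI advances by only 12 per copy. Each store therefore overwrites the 4 filler bytes of the one before it, and only the very last store spills past the packed data. Below is a scalar C sketch of that pattern. The function name and parameters are hypothetical, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack n dwords (n a multiple of 12) into 3*n bytes, handling 3 groups of
   4 dwords per pass like the unrolled loop. Each group stores 16 bytes:
   12 packed bytes plus 4 filler bytes (mask entries 3,3,3,3). out must
   hold 3*n + 4 bytes because the last store spills 4 filler bytes. */
static void pack_unrolled3(uint8_t *out, size_t out_size, const uint32_t *in, size_t n)
{
    assert(n % 12 == 0);
    assert(out_size >= 3 * n + 4);
    for (size_t i = 0; i < n; i += 12) {          /* one pass: 48 bytes in, 36 out */
        for (size_t g = 0; g < 3; g++) {          /* the 3 unrolled copies */
            const uint32_t *s = in + i + 4 * g;   /* movdqa xmm0,[esi+16*g] */
            uint8_t t[16];
            for (int k = 0; k < 4; k++) {         /* pshufb: keep bytes 0-2 */
                t[3 * k]     = (uint8_t)s[k];
                t[3 * k + 1] = (uint8_t)(s[k] >> 8);
                t[3 * k + 2] = (uint8_t)(s[k] >> 16);
            }
            memset(t + 12, (uint8_t)(s[0] >> 24), 4);  /* filler = byte 3 of first dword */
            memcpy(out + 3 * i + 12 * g, t, 16);       /* movdqu [edi+12*g],xmm0 */
        }
    }
}
```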


frktons

Re: Fastest way to move 3 bytes into a dword
« Reply #9 on: January 21, 2013, 10:15:27 AM »

I'm preparing the test program, but I'm not sure it'll be ready soon.
It is night and I'm almost asleep.   :dazzled:

frktons

Re: Fastest way to move 3 bytes into a dword
« Reply #10 on: January 21, 2013, 10:32:25 AM »
This is the structure of the test program:
Code:
;==============================================================================
; Test_mov3todw.asm
; ------------------------------------------------------------------------
; Example to test instructions that mov 3 bytes vars into dwords.
; The test reads a 48-byte string 3 bytes at a time into 16 dwords.
; ------------------------------------------------------------------------
; Frktons 20-jan-2013 @Masm32 Forum
;==============================================================================
 include \masm32\include\masm32rt.inc
;==============================================================================
.nolist
.686
.xmm

include \masm32\macros\timers.asm
; get them from the Masm32 Laboratory:
; http://www.masm32.com/board/index.php?topic=770.0

AxCPUid_Print PROTO

LOOP_COUNT EQU 1000

include \masm32\include\MyLib.inc


;==============================================================================
.data

    align 16
    Area  DB "Here it is a string with 48 characters inside me",0
    AreaLn = ($ - Area - 1)
    align Four   
    AreaLen    dd 0
    Counter    dd 0
    PtrSource  dd Area
    PtrDest    dd ArrayDW
 

    align Four
    LineSep     db  72 dup("-"),0,0,0,0

    align Four
    PtrLineSep  dd  LineSep
 

.data?

    align 16
    ArrayDW  dd 16 DUP (?)
    align Four
    CPU_Count DD  ?             ; Number of Cycles elapsed   
   


   
.code
;==============================================================================
align Four
MovProc proc

    mov edx, 1000      ; Number of passes over the 48-byte string

align Four   
TotCycles:

    mov esi, PtrSource
    mov edi, PtrDest
    mov ecx, 16
align Four   
cycle:
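    mov eax,   [esi]        ; 4-byte load, the 4th byte belongs to the next group
    and eax,   00FFFFFFH    ; keep the 3 wanted bytes
    mov [edi], eax
    add esi,   3            ; next 3-byte group
    add edi,   Four         ; next dword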

    dec ecx
    jnz cycle

    dec edx
    jnz TotCycles


    ret

MovProc endp

;==============================================================================
align Four
DisplayArrayDW  proc

    mov  ecx, 0
    mov  edx, PtrDest

Display:

    pushad

    print DWORD PTR edx    ; string at edx: 3 chars, ended by the zeroed top byte

    popad
   
    add   edx, Four
    inc   ecx
    cmp   ecx, 16
    jnz   Display

    ret

DisplayArrayDW  endp
;==============================================================================
align Four
Main proc



    invoke GetLocaleInfo,LOCALE_USER_DEFAULT,LOCALE_STHOUSAND,offset Tsep,Four
    invoke CharToOem,offset Tsep,offset Tsep

    CALL   FillMyArray

    CALL   FillMyArray0

    INVOKE ConsoleSize, 40, 100

    print PtrLineSep, 13, 10
   
    invoke AxCPUid_Print

    print PtrLineSep, 13, 10

    REPEAT Four

;---------------------------------------------------------------------------------

invoke Sleep, 100
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
       
      CALL MovProc

counter_end

      mov  edi, PtrFmtNum16
      lea  esi, InitString
      movdqa  xmm0, [esi]
      movdqa  [edi], xmm0

      INVOKE FormatNumDW, eax, PtrFmtNum16
     
print PtrFmtNum16, 9,  "cycles for Dave - MOV 4 bytes / AND", 13, 10

;---------------------------------------------------------------------------------

      print PtrLineSep, 13, 10
     
       
    ENDM 
   
;    CALL DisplayArrayDW
 

    ret

Main endp


;-------------------------------------------------------------

    include AxCPUid.inc

;-------------------------------------------------------------

;==============================================================================
start:
;==============================================================================

;==============================================================================
    call Main

    inkey
    exit
;==============================================================================
end start

If you are still awake, try to adapt your code for the task.
The files you need are attached.

Frank
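Before timing, it can help to check what MovProc actually produces. Below is a scalar C model of one pass over Area, without the timing repetitions. `mov3todw` is a hypothetical name. It builds each dword from exactly 3 bytes, whereas the asm loads 4 and masks. On the final group that 4th byte is Area's terminating 0, which is why the asm version stays inside the data. Every dword ends up with a zero top byte. That is also what lets DisplayArrayDW print each dword as a 3-character string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same text as Area in the test program (48 characters). */
static const char Area[] = "Here it is a string with 48 characters inside me";

/* One pass of MovProc's inner loop: 16 groups of 3 bytes -> 16 dwords. */
static void mov3todw(uint32_t dst[16], const char *src)
{
    for (int i = 0; i < 16; i++) {
        const unsigned char *p = (const unsigned char *)src + 3 * i;  /* add esi,3 */
        dst[i] = (uint32_t)p[0]                                       /* top byte stays 0 */
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16);
    }
}
```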

Siekmanski

Re: Fastest way to move 3 bytes into a dword
« Reply #11 on: January 21, 2013, 12:56:33 PM »
Inserted my routines.

Code:
------------------------------------------------------------------------
Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz

Instructions: MMX, SSE1, SSE2, SSE3, SSSE3
------------------------------------------------------------------------
       49.992   cycles for Dave - 48 bytes MOV 4 bytes / AND
------------------------------------------------------------------------
       48.133   cycles for Dave - 48 bytes MOV 4 bytes / AND
------------------------------------------------------------------------
       48.143   cycles for Dave - 48 bytes MOV 4 bytes / AND
------------------------------------------------------------------------
       48.144   cycles for Dave - 48 bytes MOV 4 bytes / AND
------------------------------------------------------------------------
       23.103   cycles for Siekmanski - 48 bytes SSSE3_24_32
------------------------------------------------------------------------
       23.152   cycles for Siekmanski - 48 bytes SSSE3_24_32
------------------------------------------------------------------------
       23.137   cycles for Siekmanski - 48 bytes SSSE3_24_32
------------------------------------------------------------------------
       23.136   cycles for Siekmanski - 48 bytes SSSE3_24_32
------------------------------------------------------------------------
       20.138   cycles for Siekmanski - 48 bytes SSSE3_24_32 unroled
------------------------------------------------------------------------
       20.124   cycles for Siekmanski - 48 bytes SSSE3_24_32 unroled
------------------------------------------------------------------------
       20.130   cycles for Siekmanski - 48 bytes SSSE3_24_32 unroled
------------------------------------------------------------------------
       20.137   cycles for Siekmanski - 48 bytes SSSE3_24_32 unroled
------------------------------------------------------------------------
       23.137   cycles for Siekmanski - 48 bytes SSSE3_32_24
------------------------------------------------------------------------
       23.126   cycles for Siekmanski - 48 bytes SSSE3_32_24
------------------------------------------------------------------------
       23.137   cycles for Siekmanski - 48 bytes SSSE3_32_24
------------------------------------------------------------------------
       23.137   cycles for Siekmanski - 48 bytes SSSE3_32_24
------------------------------------------------------------------------
       19.139   cycles for Siekmanski - 48 bytes SSSE3_32_24 unroled
------------------------------------------------------------------------
       19.130   cycles for Siekmanski - 48 bytes SSSE3_32_24 unroled
------------------------------------------------------------------------
       19.138   cycles for Siekmanski - 48 bytes SSSE3_32_24 unroled
------------------------------------------------------------------------
       19.138   cycles for Siekmanski - 48 bytes SSSE3_32_24 unroled
------------------------------------------------------------------------

KeepingRealBusy

  • Member
  • ***
  • Posts: 426
Re: Fastest way to move 3 bytes into a dword
« Reply #12 on: January 21, 2013, 02:10:30 PM »
I looked at some of this code and noticed some instructions I had never seen before. Then it dawned on me: I don't have the manuals for my new quad-core A8-3520M CPU, only the ones for my old dual-core system.

Does anyone have a good link to AMD to get the CORRECT manuals for this CPU?

Dave.


dedndave

Re: Fastest way to move 3 bytes into a dword
« Reply #14 on: January 21, 2013, 02:55:21 PM »
I have a Prescott that supports SSE3, Marinus;
it crashes at PSHUFB XMM0,XMM1   :P

87,665 cycles for the first test, though
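PSHUFB belongs to SSSE3, not SSE3, so a CPU that has SSE3 only (such as Prescott) faults on it as an invalid opcode. A program can check CPUID leaf 1, ECX bit 9 (the SSSE3 flag) before choosing the SSE path. In MASM that is roughly `mov eax,1`, `cpuid`, `bt ecx,9`. The C sketch below does the same check with the `__get_cpuid` helper from GCC/Clang's `<cpuid.h>`. `cpu_has_ssse3` is a hypothetical name, and non-x86 or non-GCC builds simply report false.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

/* True if CPUID leaf 1 reports SSSE3 (ECX bit 9), the extension that
   provides PSHUFB. An SSE3-only CPU such as Prescott returns false. */
static bool cpu_has_ssse3(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                 /* leaf 1 not available */
    return (ecx & (1u << 9)) != 0;    /* SSSE3 feature flag */
#else
    return false;                     /* not an x86 GCC/Clang build: assume no SSSE3 */
#endif
}
```

A caller would run the SSSE3 routines only when this returns true and otherwise fall back to the MOV/AND loop.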