Author Topic: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords  (Read 46582 times)

dedndave

  • Member
  • *****
  • Posts: 8827
  • Still using Abacus 2.0
    • DednDave
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #15 on: November 25, 2012, 12:45:29 PM »
nope - i just write bad code   :lol:

LEA does not affect the flags, so
Code: [Select]
        lea     eax,[eax+1]
adds one to EAX without altering the flags that were set by DEC ECX
the idea was to put something in between the instruction that sets the flags and the one that examines them
but, LEA is not a great performer on older CPUs

still, it shouldn't be that slow - lol
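
To make the flags argument concrete, here is a toy C model (not MASM, and not a real emulator) of the dec ecx / lea / jnz sequence. The Cpu struct, the function names, the start address and the max_iters cap are all invented for the illustration. It only tracks ZF: DEC updates it, LEA leaves it alone, ADD overwrites it.

```c
#include <stdint.h>

/* Toy model of the registers and the one flag involved. */
typedef struct {
    uint32_t eax;  /* destination pointer in the loop */
    uint32_t ecx;  /* loop counter */
    int      zf;   /* zero flag */
} Cpu;

/* DEC ecx: writes the result and updates ZF. */
static void dec_ecx(Cpu *c) { c->ecx -= 1; c->zf = (c->ecx == 0); }

/* LEA eax,[eax+1]: pure address arithmetic, no flags are written. */
static void lea_eax_plus_1(Cpu *c) { c->eax += 1; }

/* ADD eax,1: same value in EAX, but ZF now describes EAX, not ECX. */
static void add_eax_1(Cpu *c) { c->eax += 1; c->zf = (c->eax == 0); }

/* Runs "dec ecx / <increment eax> / jnz @B" and returns the iteration
   count. max_iters is only a safety cap for the model. */
static uint32_t run_loop(uint32_t count, int use_lea, uint32_t max_iters)
{
    Cpu c = { 0x00403000u, count, 0 };  /* made-up start address */
    uint32_t iters = 0;
    do {
        dec_ecx(&c);
        if (use_lea) lea_eax_plus_1(&c); else add_eax_1(&c);
        iters++;
    } while (!c.zf && iters < max_iters);  /* jnz @B */
    return iters;
}
```

With LEA in the middle, run_loop(4096, 1, 100000) stops after exactly 4096 iterations. With ADD in the same slot, JNZ would be testing the flags from the ADD rather than from DEC ECX, so the model only stops at the safety cap.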

dedndave

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #16 on: November 25, 2012, 12:52:01 PM »
i got a slight improvement on my p4 prescott

Code: [Select]
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
    mov eax, offset Dest
    mov ecx, 4096
    mov ebx, offset Source

@@:     mov     edx,[ebx]
        add     ebx,4
        mov     [eax],dl
        dec     ecx
        lea     eax,[eax+1]
        jnz     @B

; @@:
;     mov edx, [ebx]
;     mov byte ptr [eax], dl
;     add eax, 1
;     add ebx, 4
;     dec ecx
;     jnz @B
counter_end

sinsi

  • Guest
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #17 on: November 25, 2012, 01:34:26 PM »
Maybe try
Code: [Select]
  mov dl,[ebx]
  mov dh,[ebx+4]
  mov [eax],dx

hutch--

  • Administrator
  • Member
  • ******
  • Posts: 7545
  • Mnemonic Driven API Grinder
    • The MASM32 SDK
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #18 on: November 25, 2012, 01:44:48 PM »
Dave,

It was only the PIV that was a poor performer with LEA, PIII and earlier and Core2 onwards are fine.
hutch at movsd dot com
http://www.masm32.com    :biggrin:  :skrewy:

dedndave

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #19 on: November 25, 2012, 01:51:21 PM »
i knew it was something like that, Hutch - lol

sinsi has the right idea, i think...
Code: [Select]
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
    mov ebx, offset Source
    mov eax, offset Dest
    mov ecx, 4096/4

@@:     mov     dh,[ebx+12]
        mov     dl,[ebx+8]
        shl     edx,16
        mov     dh,[ebx+4]
        mov     dl,[ebx]
        add     ebx,16
        mov     [eax],edx
        dec     ecx
        lea     eax,[eax+4]
        jnz     @B

counter_end

it would help if the destination array is 4-aligned - maybe the source, too
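
For anyone following along, the same four-at-a-time packing could be sketched in C roughly like this. It is a model of the idea, not a translation of the timing harness; the function name is made up, and it assumes a little-endian host (as on x86) and an element count that is a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the low byte of each of n dwords in src to dst, building four
   result bytes in a register and writing them with one 32-bit store.
   Assumptions: n is a multiple of 4, host is little-endian. */
static void extract_low_bytes_x4(uint8_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        uint32_t packed = (src[i]      & 0xFFu)
                        | ((src[i + 1] & 0xFFu) << 8)
                        | ((src[i + 2] & 0xFFu) << 16)
                        | ((src[i + 3] & 0xFFu) << 24);
        /* memcpy keeps the store legal even if dst is not 4-aligned;
           compilers normally turn it into a single dword move. */
        memcpy(dst + i, &packed, sizeof packed);
    }
}
```

Calling extract_low_bytes_x4(Dest, Source, 4096) would correspond to the 4096/4 loop count above.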

nidud

  • Member
  • *****
  • Posts: 1980
    • https://github.com/nidud/asmc
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #20 on: November 25, 2012, 07:52:55 PM »
Quote
AMD Athlon(tm) II X2 245 Processor (SSE3)
---------------------------------------------------------
4123   cycles for MOV AX
5127   cycles for LEA
5143   cycles for MMX/MOVD DWORD PTR
17419   cycles for STOSB
---------------------------------------------------------
4121   cycles for MOV AX
5127   cycles for LEA
5238   cycles for MMX/MOVD DWORD PTR
17428   cycles for STOSB
---------------------------------------------------------

jj2007

  • Member
  • *****
  • Posts: 10552
  • Assembler is fun ;-)
    • MasmBasic
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #21 on: November 25, 2012, 09:04:52 PM »
So on your puter Sinsi's solution is clearly the fastest. That's what I suspected ;-)

Not on mine, however:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
9968    cycles for MOV AX
9622    cycles for LEA
5181    cycles for MMX/MOVD DWORD PTR

frktons

  • Member
  • ***
  • Posts: 491
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #22 on: November 25, 2012, 10:19:15 PM »
On a newer machine there is no contest:

Code: [Select]
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
---------------------------------------------------------
13874   cycles for MOV AX
13124   cycles for LEA
6193    cycles for MMX/MOVD DWORD PTR
18486   cycles for STOSB
---------------------------------------------------------
13856   cycles for MOV AX
13087   cycles for LEA
4129    cycles for MMX/MOVD DWORD PTR
18516   cycles for STOSB
---------------------------------------------------------

--- ok ---


later I'll post the pshufb solution, which should win the race.


frktons

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #23 on: November 25, 2012, 11:15:09 PM »
Here it is, the first quick shot with PSHUFB:

Code: [Select]
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
---------------------------------------------------------
13866   cycles for MOV AX
13084   cycles for LEA
6205    cycles for MMX/MOVD DWORD PTR
4848    cycles for PSHUFB / I shot
18487   cycles for STOSB
---------------------------------------------------------
13852   cycles for MOV AX
13083   cycles for LEA
6194    cycles for MMX/MOVD DWORD PTR
4730    cycles for PSHUFB / I shot
18518   cycles for STOSB
---------------------------------------------------------

--- ok ---

later I'll try to improve it.
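
Since the source for the test isn't shown in this post, here is only a scalar C model of what PSHUFB itself does, plus one control mask that would gather the four low bytes. pshufb_model and LOW_BYTES_CTRL are hypothetical names, and this is not necessarily how the timed version was written.

```c
#include <stdint.h>

/* Scalar model of PSHUFB on one 16-byte register:
   result byte i is 0 if bit 7 of ctrl[i] is set,
   otherwise it is src[ctrl[i] & 0x0F]. */
static void pshufb_model(uint8_t out[16], const uint8_t src[16],
                         const uint8_t ctrl[16])
{
    uint8_t tmp[16];  /* temporary so out may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        out[i] = tmp[i];
}

/* Gather source bytes 0, 4, 8, 12 (the low bytes of four dwords)
   into the low dword of the result and zero everything else. */
static const uint8_t LOW_BYTES_CTRL[16] = {
    0, 4, 8, 12,
    0x80, 0x80, 0x80, 0x80,
    0x80, 0x80, 0x80, 0x80,
    0x80, 0x80, 0x80, 0x80
};
```

After the shuffle only the low dword of the register holds data, so a real loop would store 4 bytes per 16 bytes read.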

nidud

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #24 on: November 26, 2012, 01:13:49 AM »
big improvement for STOSB
here is one using loop
Quote
---------------------------------------------------------
6227   cycles for MOV AX
6165   cycles for LEA
7188   cycles for MMX/MOVD DWORD PTR
19466   cycles for STOSB
---------------------------------------------------------
6228   cycles for MOV AX
6163   cycles for LEA
7188   cycles for MMX/MOVD DWORD PTR
19475   cycles for STOSB
---------------------------------------------------------

frktons

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #25 on: November 26, 2012, 02:19:29 AM »
Some more tests:

Code: [Select]
Intel(R) Core(TM)2 CPU  6600  @ 2.40GHz (SSSE3)
---------------------------------------------------------
13927   cycles for MOV AX
13097   cycles for LEA
6203    cycles for MMX/PUNPCKLBW
4729    cycles for XMM/PSHUFB - I shot
3518    cycles for XMM/PSHUFB - II shot
15364   cycles for XMM/MASKMOVDQU - I shot
18506   cycles for STOSB
---------------------------------------------------------
13868   cycles for MOV AX
13096   cycles for LEA
6198    cycles for MMX/PUNPCKLBW
4732    cycles for XMM/PSHUFB - I shot
3520    cycles for XMM/PSHUFB - II shot
15360   cycles for XMM/MASKMOVDQU - I shot
18503   cycles for STOSB
---------------------------------------------------------

--- ok ---

jj2007

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #26 on: November 26, 2012, 03:25:54 AM »
Ciao Frank,
Apart from being slow, maskmovdqu does not do what you want:

source      xxxCxxxIxxxAxxxO
wanted      CIAO
effective      C   I   A   O
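
A scalar C sketch of the MASKMOVDQU store rule (the function name is made up) shows why: each selected byte goes to the same offset in the destination, so the four bytes are not packed together.

```c
#include <stdint.h>

/* Scalar model of a MASKMOVDQU store of one 16-byte register:
   byte i of src is written to dst[i] only if bit 7 of mask[i] is set.
   Unselected destination bytes are left as they were; nothing moves
   to a different offset. */
static void maskmovdqu_model(uint8_t dst[16], const uint8_t src[16],
                             const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        if (mask[i] & 0x80)
            dst[i] = src[i];
}
```

With a destination pre-filled with spaces and a mask that selects bytes 3, 7, 11 and 15, the result is the spaced-out pattern in the "effective" row, not "CIAO".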

nidud

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #27 on: November 26, 2012, 04:34:28 AM »
laptop with Intel Inside W95 16MB RAM
Quote
pre-P4
---------------------------------------------------------
17080   cycles for MOV AX
8935   cycles for LEA
11542   cycles for MMX/MOVD DWORD PTR
32254   cycles for STOSB
---------------------------------------------------------
17057   cycles for MOV AX
9082   cycles for LEA
11753   cycles for MMX/MOVD DWORD PTR
32423   cycles for STOSB
---------------------------------------------------------
using loop
Quote
---------------------------------------------------------
16556   cycles for MOV AX
12435   cycles for LEA
15317   cycles for MMX/MOVD DWORD PTR
32145   cycles for STOSB
---------------------------------------------------------
16556   cycles for MOV AX
12365   cycles for LEA
15312   cycles for MMX/MOVD DWORD PTR
32332   cycles for STOSB
---------------------------------------------------------

pshufbtest.exe

Quote
AMD Athlon(tm) II X2 245 Processor (SSE3)
-----------------------------------------------
4121    cycles for MOV AX
5127    cycles for LEA
5140    cycles for MMX/MOVD DWORD PTR
crash..

Quote
Intel(R) Core(TM) i3-2367M CPU @ 1.40GHz (SSE4)
---------------------------------------------------------
6334    cycles for MOV AX
5171    cycles for LEA
3123    cycles for MMX/MOVD DWORD PTR
2189    cycles for PSHUFB / I shot
10503   cycles for STOSB
---------------------------------------------------------
5243    cycles for MOV AX
5488    cycles for LEA
3150    cycles for MMX/MOVD DWORD PTR
2060    cycles for PSHUFB / I shot
9276    cycles for STOSB
---------------------------------------------------------

frktons

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #28 on: November 26, 2012, 06:12:38 AM »
Quote
Ciao Frank,
Apart from being slow, maskmovdqu does not do what you want:

source      xxxCxxxIxxxAxxxO
wanted      CIAO
effective      C   I   A   O


If the mask is correctly set, maskmovdqu should do the job :

F0h = byte to move, 00h = byte not moved, according to the Intel docs:

Code: [Select]
The most significant bit in each byte of the mask operand determines whether the
corresponding byte in the source operand is written to the corresponding byte location
in memory: 0 indicates no write and 1 indicates write.


At least my previous test showed it can do the job, but maybe I didn't try it enough. ::)

Edit: it probably only works well when the bytes to move are consecutive, so, as you said,
it is not the instruction I need, besides being slow.

frktons

Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #29 on: November 26, 2012, 06:17:17 AM »

Quote
Intel(R) Core(TM) i3-2367M CPU @ 1.40GHz (SSE4)
---------------------------------------------------------
6334    cycles for MOV AX
5171    cycles for LEA
3123    cycles for MMX/MOVD DWORD PTR
2189    cycles for PSHUFB / I shot
10503   cycles for STOSB
---------------------------------------------------------
5243    cycles for MOV AX
5488    cycles for LEA
3150    cycles for MMX/MOVD DWORD PTR
2060    cycles for PSHUFB / I shot
9276    cycles for STOSB
---------------------------------------------------------


These SIMD instructions work a lot better on modern hardware.
Try the latest version on the i3 only.