Topic: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords

nidud

  • Member
  • *****
  • Posts: 1980
    • https://github.com/nidud/asmc
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #30 on: November 26, 2012, 06:52:59 AM »
Quote
---------------------------------------------------------
5158    cycles for MOV AX
5197    cycles for LEA
3117    cycles for MMX/PUNPCKLBW
2086    cycles for XMM/PSHUFB - I shot
1076    cycles for XMM/PSHUFB - II shot
10499   cycles for XMM/MASKMOVDQU - I shot
10267   cycles for STOSB
---------------------------------------------------------
5190    cycles for MOV AX
5175    cycles for LEA
3091    cycles for MMX/PUNPCKLBW
2062    cycles for XMM/PSHUFB - I shot
1089    cycles for XMM/PSHUFB - II shot
10645   cycles for XMM/MASKMOVDQU - I shot
9282    cycles for STOSB
---------------------------------------------------------

nidud

« Reply #31 on: November 26, 2012, 07:28:02 AM »
With adjustment for the loop (mov ecx,4096/16):
Quote
---------------------------------------------------------
1125    cycles for XMM/PSHUFB - I shot
1088    cycles for XMM/PSHUFB - II shot
---------------------------------------------------------
1124    cycles for XMM/PSHUFB - I shot
1140    cycles for XMM/PSHUFB - II shot
---------------------------------------------------------

frktons

  • Member
  • ***
  • Posts: 491
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #32 on: November 26, 2012, 08:01:46 AM »
Quote
With adjustment for the loop (mov ecx,4096/16):

In the first shot each iteration processes 4 dwords, so the loop count is 4096/4;
it cannot be 4096/16.
The second one works on 16 dwords per iteration, so its count is 4096/16.

nidud

« Reply #33 on: November 26, 2012, 08:19:47 AM »
Quote
In the first shot each iteration processes 4 dwords, so the loop count is 4096/4;
it cannot be 4096/16.
The second one works on 16 dwords per iteration, so its count is 4096/16.

Yes, so you have to repeat the body four times per iteration to make it even.
Code: [Select]
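mov eax,offset Dest             ; eax -> output bytes
mov ecx,4096/16                 ; 16 dwords are handled per iteration
mov ebx,offset Source           ; ebx -> input dwords (aligned for movdqa)
@@:                             ; xmm2 must already hold the shuffle mask
    movdqa xmm1, [ebx]          ; dwords 0..3
    pshufb xmm1, xmm2
    movd dword ptr [eax], xmm1  ; store 4 result bytes

    movdqa xmm1, [ebx+16]       ; dwords 4..7
    pshufb xmm1, xmm2
    movd dword ptr [eax+4], xmm1

    movdqa xmm1, [ebx+32]       ; dwords 8..11
    pshufb xmm1, xmm2
    movd dword ptr [eax+8], xmm1

    movdqa xmm1, [ebx+48]       ; dwords 12..15
    pshufb xmm1, xmm2
    movd dword ptr [eax+12], xmm1

    add ebx, 64                 ; 64 source bytes consumed
    add eax, 16                 ; 16 result bytes written
    dec ecx
    jnz @b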

frktons

« Reply #34 on: November 26, 2012, 08:57:04 AM »
I'm going to prepare a real test, with some data, to make the masks
a little more accurate. They are untested for the time being and
were used only to get an idea of the performance.

After testing on real data and adjusting the masks accordingly, the
test can be considered valid.

Up to now I've worked on uninitialized data, so there is no way to
know whether the sequence of bits/bytes in the masks is correct.  ::)
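
For reference, a minimal sketch of what a mask for the 4-dword case could look like, assuming the goal is to pack byte 0 of each dword into the low dword of the register. `LowByteMask` and `ExtractLow4` are hypothetical names, not taken from the attached sources. The register usage (ebx = source, eax = destination) follows nidud's loop above, and the sketch assumes `.686`/`.xmm` are enabled and the source is 16-byte aligned.

```asm
.data
align 16
; PSHUFB control bytes: result byte i = source byte number LowByteMask[i];
; a control byte with bit 7 set (80h) writes 0 instead.
LowByteMask db 0, 4, 8, 12          ; low bytes of dwords 0..3
            db 12 dup (80h)         ; zero the remaining 12 result bytes

.code
; In:  ebx -> 16 bytes of source (4 dwords, 16-byte aligned)
;      eax -> 4 bytes of destination
; Clobbers edx, xmm1, xmm2. Requires SSSE3 for PSHUFB.
ExtractLow4 proc
    mov    edx, offset LowByteMask
    movdqa xmm2, [edx]              ; load the mask (hoist this out of a real loop)
    movdqa xmm1, [ebx]              ; 4 source dwords
    pshufb xmm1, xmm2               ; source bytes 0,4,8,12 -> result bytes 0..3
    movd   dword ptr [eax], xmm1    ; store the 4 extracted bytes
    ret
ExtractLow4 endp
```

With a source of 11223344h, 55667788h, 99AABBCCh, 0DDEEFF00h this should store the bytes 44h, 88h, 0CCh, 00h, which is why distinct test data is needed to confirm the byte order of a mask.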

nidud

« Reply #35 on: November 26, 2012, 09:43:40 AM »
I think it does what it is supposed to do:
Code: [Select]
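SetBuffer:
    mov edi,offset Dest     ; pre-fill Dest with 01h before each test
    mov ecx,4096
    mov al,1
    rep stosb
    ret

TestBuffer:
    mov esi,offset Dest
    mov ecx,4096
    .repeat
        lodsb
        .break .if al       ; stop at the first nonzero byte
    .untilcxz
    mov esi,offset cp_Ok    ; esi -> result text for the caller to print
    .if al                  ; a nonzero byte was found
        mov esi,offset cp_Fail
    .endif
    ret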

Quote
---------------------------------------------------------
1473    cycles for XMM/PSHUFB - I shot : Ok..
1024    cycles for XMM/PSHUFB - II shot : Ok..
---------------------------------------------------------
1145    cycles for XMM/PSHUFB - I shot : Ok..
1024    cycles for XMM/PSHUFB - II shot : Ok..
---------------------------------------------------------

frktons

« Reply #36 on: November 26, 2012, 10:09:04 AM »
Quote
I think it does what it is supposed to do

Well, a good test should start with 4096 dwords initialized to 00000001h,
run each routine on them, and check that the Dest buffer ends up
filled with 01h bytes. You could use your routine for that check.
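
A minimal sketch of that setup, assuming `Source` is the 4096-dword input and `Dest` the 4096-byte output used elsewhere in the thread. `InitSource` and `CheckDest` are hypothetical names. Dest would have to be cleared to 0 before each routine runs so that stale 01h bytes cannot pass the check.

```asm
; Assumes Source is 4096 dwords and Dest is 4096 bytes, as in the thread.
; Both procs clobber eax, ecx and edi.

InitSource proc
    cld
    mov edi, offset Source
    mov ecx, 4096
    mov eax, 00000001h
    rep stosd                  ; every dword = 00000001h
    ret
InitSource endp

; Out: eax = 1 if all 4096 bytes of Dest are 01h, else 0
CheckDest proc
    cld
    mov edi, offset Dest
    mov ecx, 4096
    mov al, 01h
    repe scasb                 ; scan while the bytes equal 01h
    setz al                    ; ZF=1 only if the last compare (and so all) matched
    movzx eax, al
    ret
CheckDest endp
```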

jj2007

  • Member
  • *****
  • Posts: 10552
  • Assembler is fun ;-)
    • MasmBasic
Re: MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords
« Reply #37 on: November 26, 2012, 10:29:39 AM »
Hi Frank,

Here is a testfile. The exe shows it, *.asc is the source in RTF/RichMasm format.

Hope it helps,
Jochen

nidud

« Reply #38 on: November 26, 2012, 11:54:27 AM »
Quote
Well, a good test should start with 4096 dwords initialized to 00000001h,
run each routine on them, and check that the Dest buffer ends up
filled with 01h bytes. You could use your routine for that check.

This was implemented using macros for each test. You need to reset the byte buffer (Dest) for each test, using 0 if source is 1, or 1 if source is 0 (as in this case).
Code: [Select]
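test_start macro
    call SetBuffer              ; reset Dest before each test
    invoke Sleep, 100           ; pause briefly before timing
    counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
endm

test_end macro text
    counter_end                 ; cycle count comes back in eax
    print str$(eax), 9, text    ; cycles, tab, test name
    call TestBuffer             ; esi -> cp_Ok or cp_Fail
    print esi
endm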

frktons

« Reply #39 on: November 27, 2012, 05:49:25 AM »
Quote
This was implemented using macros for each test. You need to reset the byte buffer (Dest) for each test, using 0 if source is 1, or 1 if source is 0 (as in this case).

Yes nidud, thanks.

Quote
Hi Frank,

Here is a testfile. The exe shows it, *.asc is the source in RTF/RichMasm format.

Hope it helps,
Jochen

Thanks Jochen, your help is always welcome.

I'll take a look at it as soon as I finish a couple of preliminary
things I'm working on.


frktons

« Reply #40 on: November 27, 2012, 09:15:37 AM »
Is there an opcode that compares two XMM registers to check
whether they have the same content?

Once again, SIMD makes simple operations a bit tricky.


qWord

  • Member
  • *****
  • Posts: 1473
  • The base type of a type is the type itself
    • SmplMath macros
MREAL macros - when you need floating point arithmetic while assembling!

frktons

« Reply #42 on: November 27, 2012, 09:37:57 AM »
Quote
See the CMPxxx, COMxxx and PCMPxxx instructions: AMD64 Architecture Programmer's Manual Volume 4: 128-Bit and 256-Bit Media Instructions

Yes qWord,

Let's assume I use:

Code: [Select]
   PCMPEQD xmm0,xmm1

Since this and the other compare instructions don't affect the flags,
how do I branch after the test?
What tells me whether the registers are equal or not?

PTEST affects the Zero Flag, but that opcode is out of my league (SSE4.1).
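
One common SSE2-only way around this is to follow the packed compare with PMOVMSKB, which copies the top bit of each byte into a general-purpose register where an ordinary CMP can set the flags. The sketch below is a hedged illustration, not code from the attached sources; `XmmEqual` and the `Same` label in the usage comment are hypothetical names.

```asm
; In:  xmm0, xmm1 = values to compare
; Out: ZF = 1 if all 128 bits are identical
; Clobbers xmm0 and eax. SSE2 only.
;
; Usage:   call XmmEqual
;          je   Same
XmmEqual proc
    pcmpeqd  xmm0, xmm1     ; each dword -> FFFFFFFFh if equal, else 0
    pmovmskb eax, xmm0      ; bit i of eax = top bit of byte i (16 bits)
    cmp      eax, 0FFFFh    ; all 16 bits set <=> all four dwords matched
    ret                     ; RET leaves the flags untouched
XmmEqual endp
```

PCMPEQB would work equally well here, since only the all-ones case matters.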

 

frktons

« Reply #43 on: November 27, 2012, 10:17:09 AM »
The first correct test of the SSE routines, with a proc to check the results:

Code: [Select]
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
---------------------------------------------------------
13862   cycles for MOV AX - Test OK
13114   cycles for LEA - Test OK
6195    cycles for MMX/PUNPCKLBW - Test OK
3157    cycles for XMM/PSHUFB - I shot - Test OK
2375    cycles for XMM/PSHUFB - II shot - Test OK
12327   STOSB - Test OK
---------------------------------------------------------
9238    cycles for MOV AX - Test OK
8723    cycles for LEA - Test OK
4130    cycles for MMX/PUNPCKLBW - Test OK
3150    cycles for XMM/PSHUFB - I shot - Test OK
2375    cycles for XMM/PSHUFB - II shot - Test OK
16701   STOSB - Test OK
---------------------------------------------------------

--- ok ---

Attached last version.

Enjoy

nidud

« Reply #44 on: November 27, 2012, 11:12:38 AM »
It seems to be possible to compare the low 8 bytes:
Code: [Select]
COMISD dest,source

The destination operand is an XMM register.
The source can be either an XMM register or a memory location.

The flags are set according to the following rules:
Result        ZF PF CF
Unordered      1  1  1
Greater than   0  0  0
Less than      0  0  1
Equal          1  0  0

Maybe it's possible to shift (or rotate) the regs and then compare the high 8 bytes?
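
A hedged sketch of that idea, using PSRLDQ (a byte-wise right shift of the whole register) to bring the high 8 bytes down for a second COMISD. `CmpHalves` is a hypothetical name. Note that COMISD compares the bytes as doubles, so this is not a bitwise test: +0.0 and -0.0 compare equal although their bit patterns differ, and a NaN compares unordered even against an identical NaN.

```asm
; In:  xmm0, xmm1 = values to compare
; Out: eax = 1 if COMISD reports "equal" for both 8-byte halves, else 0
; Clobbers xmm0, xmm1. SSE2 only.
CmpHalves proc
    xor    eax, eax         ; assume "not equal"
    comisd xmm0, xmm1       ; low 8 bytes, compared as doubles
    jp     done             ; PF=1: unordered (NaN); ZF is also 1 in that case
    jne    done
    psrldq xmm0, 8          ; move the high 8 bytes into the low half
    psrldq xmm1, 8
    comisd xmm0, xmm1       ; former high 8 bytes
    jp     done
    jne    done
    inc    eax              ; both halves compared equal
done:
    ret
CmpHalves endp
```

The JP check has to come before JNE because the unordered result also sets ZF.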