MASM FOR FUN - REBORN - #0 Extract low order bytes from dwords

Started by frktons, November 25, 2012, 02:48:06 AM


nidud

deleted

nidud

deleted

frktons

Quote from: nidud on November 26, 2012, 07:28:02 AM
With adjustment for the loop (mov ecx,4096/16):
---------------------------------------------------------
1125    cycles for XMM/PSHUFB - I shot
1088    cycles for XMM/PSHUFB - II shot
---------------------------------------------------------
1124    cycles for XMM/PSHUFB - I shot
1140    cycles for XMM/PSHUFB - II shot
---------------------------------------------------------


In the first shot the loop count is 4096/4, because each iteration processes
4 dwords, so it cannot be 4096/16.
The second one works on 16 dwords at a time, so its count is 4096/16.
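
For illustration only, here is a minimal sketch of what a loop handling 4 dwords per iteration could look like. It is not the code from the attachment: the names Source, Dest and LowByteMask are assumptions, both buffers are assumed to be 16-byte aligned, and PSHUFB needs SSSE3.

```asm
.data
align 16
; assumed mask: take bytes 0, 4, 8, 12 of the register; 80h zeroes a destination byte
LowByteMask db 0, 4, 8, 12, 12 dup (80h)

.code
    mov     esi, offset Source          ; 4096 dwords (assumed name)
    mov     edi, offset Dest            ; 4096 bytes (assumed name)
    movdqa  xmm7, oword ptr LowByteMask
    mov     ecx, 4096/4                 ; one XMM register holds 4 dwords
@@:
    movdqa  xmm0, [esi]                 ; load 4 dwords
    pshufb  xmm0, xmm7                  ; their low bytes are now packed in bits 0..31
    movd    dword ptr [edi], xmm0       ; store the 4 extracted bytes
    add     esi, 16
    add     edi, 4
    dec     ecx
    jnz     @B
```

A variant that consumes 16 dwords per iteration would use the same idea with four loads per pass, which is why its count drops to 4096/16.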
There are only two days a year when you can't do anything: one is called yesterday, the other is called tomorrow, so today is the right day to love, believe, do and, above all, live.

Dalai Lama

nidud

deleted

frktons

I'm going to prepare a real test, with some data to make the masks
a little bit more accurate. They have not been tested so far, and
were used just to get an idea of their performance.

After testing on real data and adjusting the masks accordingly, the
test could be considered valid.

Up to now I've worked on uninitialized data, so there is no way to
know if the sequence of bits/bytes in the masks is correct.  ::)

nidud

deleted

frktons

Quote from: nidud on November 26, 2012, 09:43:40 AM
I think it does what it is supposed to do

Well, a good test should start with 4096 dwords initialized with 00000001h,
then run each routine on them and check that the Dest buffer contains
all 01h at the end. To verify it you could use your routine.
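
A rough sketch of such a check, under stated assumptions: Source and Dest are the buffer names assumed here, and RoutineUnderTest and TestFailed are hypothetical labels standing in for whichever routine is being verified.

```asm
    cld
    ; fill Source with 4096 dwords of 00000001h
    mov     edi, offset Source
    mov     eax, 00000001h
    mov     ecx, 4096
    rep     stosd

    ; clear Dest so results left by a previous routine cannot pass the test
    mov     edi, offset Dest
    xor     eax, eax
    mov     ecx, 4096
    rep     stosb

    call    RoutineUnderTest            ; hypothetical: writes the low bytes to Dest

    ; verify: every byte of Dest must now be 01h
    mov     edi, offset Dest
    mov     al, 01h
    mov     ecx, 4096
    repe    scasb                       ; stops at the first byte that differs
    jne     TestFailed                  ; ZF clear means a mismatch was found
    ; falling through here means "Test OK"
```

Clearing Dest before every run matters because the routines all write the same expected pattern.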

jj2007

Hi Frank,

Here is a testfile. The exe shows it, *.asc is the source in RTF/RichMasm format.

Hope it helps,
Jochen

nidud

deleted

frktons

Quote from: nidud on November 26, 2012, 11:54:27 AM
This was implemented using macros for each test. You need to reset the byte buffer (Dest) for each test, using 0 if source is 1, or 1 if source is 0 (as in this case).

Yes nidud, thanks.

Quote from: jj2007 on November 26, 2012, 10:29:39 AM
Hi Frank,

Here is a testfile. The exe shows it, *.asc is the source in RTF/RichMasm format.

Hope it helps,
Jochen

Thanks Jochen, your help is always welcome.

I'll give it a look as soon as I finish a couple of preliminary
things I'm working on.


frktons

Is there an opcode to compare two xmm registers to verify
if they have the same content?

Once again, SIMD instructions make simple operations a bit tricky.


qWord

MREAL macros - when you need floating point arithmetic while assembling!

frktons

Quote from: qWord on November 27, 2012, 09:24:01 AM
See the CMPxxx, COMxxx and PCMPxxx instructions: AMD64 Architecture Programmer's Manual Volume 4: 128-bit and 256-bit media instructions

Yes qWord,

Let's assume I use:


   PCMPEQD xmm0,xmm1


Considering that this and the other compare instructions don't affect the flags,
how do I jump somewhere after the test?
What tells me whether they are equal or not?

PTEST affects the Zero Flag, but the opcode is out of my league (SSE4.1).
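
One common SSE2-only way to get a branch out of PCMPEQD, shown as a sketch rather than as the answer given in this thread: move the compare result into a general-purpose register with PMOVMSKB and test it there. RegsEqual is a hypothetical label, and note that xmm0 is overwritten by the compare.

```asm
    pcmpeqd  xmm0, xmm1     ; each equal dword becomes FFFFFFFFh, each different one 0
    pmovmskb eax, xmm0      ; top bit of each of the 16 bytes goes to bits 0..15 of eax
    cmp      eax, 0FFFFh    ; all 16 bits set means all four dwords matched
    je       RegsEqual      ; hypothetical label: registers held the same content
    ; falling through here means at least one dword differed
```

If xmm0 must be preserved, the compare can be done on a copy (movdqa xmm2, xmm0 first).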


frktons

The first correct test for the SSE instructions, with a proc to check the results:


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
---------------------------------------------------------
13862   cycles for MOV AX - Test OK
13114   cycles for LEA - Test OK
6195    cycles for MMX/PUNPCKLBW - Test OK
3157    cycles for XMM/PSHUFB - I shot - Test OK
2375    cycles for XMM/PSHUFB - II shot - Test OK
12327   STOSB - Test OK
---------------------------------------------------------
9238    cycles for MOV AX - Test OK
8723    cycles for LEA - Test OK
4130    cycles for MMX/PUNPCKLBW - Test OK
3150    cycles for XMM/PSHUFB - I shot - Test OK
2375    cycles for XMM/PSHUFB - II shot - Test OK
16701   STOSB - Test OK
---------------------------------------------------------

--- ok ---


Latest version attached.

Enjoy

nidud

deleted