Motion mask

Started by cpu2, March 23, 2014, 10:50:05 AM

cpu2

Hello

I'm implementing the MixColumns function of AES encryption. I used the maskmovq instruction, which costs 32 ops, to do the modular reduction.

%mm0 = 0x8001018080010180
     
maskmovq %mm0, %mm0  ; (%rdi) = 0x8000008080000080


But pmovmskb is only 1 op.

%mm0 = 0x8001018080010180
     
pmovmskb %mm0, %esi  ; %esi = 0x99


Could I build a byte mask from that result instead, and use it to copy the bytes the way maskmovq does?

Thanks and sorry for my English.

Magnum

Welcome to the board.

Your English is excellent.

I wish I could help, but FPU instructions are one of my weak points.

Andy
Take care,
                   Andy

Ubuntu-mate-18.04-desktop-amd64

http://www.goodnewsnetwork.org

TouEnMasm

From amd book of instructions
Quote
The MOVNTQ instruction stores a 64-bit MMX register value into a 64-bit memory location.
The MASKMOVQ instruction stores bytes from the first operand, as selected by the mask value (most-significant bit of each byte)
in the second operand, to a memory location specified in the rDI and DS registers.
The first operand is an MMX register, and the second operand is another MMX register. The size of the store is determined by the effective address size.

The result is in DS:rDI

Quote
• PMOVMSKB—Packed Move Mask Byte
The PMOVMSKB instruction moves the most-significant bit of each byte in an XMM register to the low-order word of a 32-bit or 64-bit general-purpose register, with zero-extension. The instruction is useful for extracting bits from mask patterns, or zero values from quantized data, or sign bits—resulting in a byte that can be used for data-dependent branching.
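
To make the two quoted descriptions concrete, here is a small scalar C model of both instructions for the 64-bit MMX case. It is only a sketch of the byte-level behaviour described above, not how the hardware does it; the function names maskmovq_model and pmovmskb_model are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical model of MASKMOVQ: byte i of src is stored to dst[i] only
   when the most-significant bit of mask byte i is set. Bytes whose mask
   bit is clear leave dst[i] untouched (they are not zeroed). */
void maskmovq_model(uint64_t src, uint64_t mask, uint8_t dst[8])
{
    for (int i = 0; i < 8; i++) {
        uint8_t m = (uint8_t)(mask >> (8 * i));
        if (m & 0x80)
            dst[i] = (uint8_t)(src >> (8 * i));
    }
}

/* Hypothetical model of PMOVMSKB on a 64-bit operand: bit i of the
   result is the most-significant bit of byte i; upper bits are zero. */
uint32_t pmovmskb_model(uint64_t src)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= (uint32_t)((src >> (8 * i + 7)) & 1u) << i;
    return r;
}
```

With the value from the first post, 0x8001018080010180, used as both source and mask, the model writes only bytes 0, 3, 4 and 7. The 0x8000008080000080 shown there is therefore what memory holds only if the destination was zero beforehand.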


Fa is a musical note to play with CL

Gunther

Hi cpu2,

first things first: Welcome to the forum.

Another point. Why do you use the more or less horrible AT&T syntax? That's not really necessary. By the way, your English isn't bad.

Gunther
You have to know the facts before you can distort them.

MichaelW

According to Agner Fog's instruction_tables.pdf (available here), MASKMOVQ is 31-32 ops on the AMD processors and 3-4 ops on the Intel processors. And PMOVMSKB is 1-3 ops on the AMD processors and 1-2 ops on the Intel processors.
Well Microsoft, here's another nice mess you've gotten us into.

cpu2

Yes, I use an AMD K10. I've solved the problem: maskmovq/maskmovdqu costs more than four instructions plus a lookup table.

.section .data

.balign 16    # pand/pcmpeqb with a memory operand need 16-byte alignment
bts0_: .quad 0x8080808080808080,0x8080808080808080

.section .text
.globl _start

_start:

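movdqa %xmm0, %xmm1    # keep a copy of the input

pand bts0_, %xmm0      # isolate the MSB of each byte
pcmpeqb bts0_, %xmm0   # 0xFF where the MSB was set, 0x00 elsewhere
pand %xmm0, %xmm1      # xmm1 = input bytes with MSB set, the rest zeroed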


This is more efficient. Sorry for the AT&T syntax; I assemble on a Unix-like system.
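
For anyone who wants to try the same four-instruction idea from C, here is a rough equivalent using SSE2 intrinsics. It is a sketch under assumptions, not the code from the post: the function name keep_msb_bytes is invented for this example, unaligned loads and stores are used so the caller need not align the buffers, and a plain-C fallback is included for builds without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical helper: copy each byte of in[] whose most-significant bit
   is set into out[], and write zero for every other byte. */
void keep_msb_bytes(const uint8_t in[16], uint8_t out[16])
{
#if defined(__SSE2__)
    const __m128i msb = _mm_set1_epi8((char)0x80);     /* 0x80 in every byte */
    __m128i x = _mm_loadu_si128((const __m128i *)in);  /* copy of the input  */
    __m128i m = _mm_and_si128(x, msb);                 /* pand: isolate MSBs */
    m = _mm_cmpeq_epi8(m, msb);                        /* pcmpeqb: 0xFF/0x00 */
    x = _mm_and_si128(x, m);                           /* pand: apply mask   */
    _mm_storeu_si128((__m128i *)out, x);
#else
    /* Portable fallback with the same byte-level result. */
    for (int i = 0; i < 16; i++)
        out[i] = (in[i] & 0x80) ? in[i] : 0;
#endif
}
```

Unlike maskmovq, this zeroes the unselected bytes instead of leaving the destination untouched, which matches the assembly sequence above. The same 0xFF/0x00 mask can also be produced by a signed compare against zero (pcmpgtb, or _mm_cmpgt_epi8 with a zero register as the first argument), which avoids the memory constant; whether that is faster on a K10 would have to be measured.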

Thanks for the replies and the welcome.

Gunther

Hi cpu2,

Quote from: cpu2 on March 24, 2014, 09:08:54 AM
This is more efficient. Sorry for the AT&T syntax; I assemble on a Unix-like system.

that's not the point. You can use


.intel_syntax noprefix


with gcc (I think since version 2.8 or so).

Gunther
You have to know the facts before you can distort them.