The MASM Forum

General => The Laboratory => Topic started by: cpu2 on March 23, 2014, 10:50:05 AM

Title: Motion mask
Post by: cpu2 on March 23, 2014, 10:50:05 AM
Hello

I'm implementing the MixColumns function of AES encryption. I used the maskmovq instruction (32 ops) to do the modular reduction.

# %mm0 = 0x8001018080010180

maskmovq %mm0, %mm0  # (%rdi) = 0x8000008080000080


But pmovmskb is only 1 op.

# %mm0 = 0x8001018080010180

pmovmskb %mm0, %esi  # %esi = 0x99


Could I instead build a per-byte mask and use it to copy the bytes the way maskmovq does?

Thanks and sorry for my English.
Title: Re: Motion mask
Post by: Magnum on March 23, 2014, 01:10:10 PM
Welcome to the board.

Your English is excellent.

I wish I could help, but FPU instructions are one of my weak points.

Andy
Title: Re: Motion mask
Post by: TouEnMasm on March 23, 2014, 07:43:19 PM
From the AMD instruction manual:
Quote
The MOVNTQ instruction stores a 64-bit MMX register value into a 64-bit memory location.
The MASKMOVQ instruction stores bytes from the first operand, as selected by the mask value (most-significant bit of each byte) in the second operand, to a memory location specified in the rDI and DS registers.
The first operand is an MMX register, and the second operand is another MMX register. The size of the store is determined by the effective address size.

The result is in DS:rDI

Quote
• PMOVMSKB—Packed Move Mask Byte
The PMOVMSKB instruction moves the most-significant bit of each byte in an XMM register to the low-order word of a 32-bit or 64-bit general-purpose register, with zero-extension. The instruction is useful for extracting bits from mask patterns, or zero values from quantized data, or sign bits, resulting in a byte that can be used for data-dependent branching.
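To make the two quoted descriptions concrete, here is a minimal scalar sketch in C that models what each instruction does to a 64-bit MMX value. The function names (model_pmovmskb, model_maskmovq, check_first_post_example) are made up for this illustration and are not part of any library. It only models the data movement, not the non-temporal store hint or the timing.

```c
#include <assert.h>
#include <stdint.h>

/* Model of PMOVMSKB on a 64-bit MMX value: gather the most-significant
   bit of each byte into the low 8 bits of the result (bit i = byte i). */
unsigned model_pmovmskb(uint64_t v) {
    unsigned mask = 0;
    for (int i = 0; i < 8; i++) {
        unsigned msb = (unsigned)(v >> (8 * i + 7)) & 1u;
        mask |= msb << i;
    }
    return mask;
}

/* Model of MASKMOVQ src, sel: byte i of src is written to dst[i] only
   when the most-significant bit of byte i of sel is set. Destination
   bytes that are not selected are left untouched. dst stands in for
   the memory at DS:rDI. */
void model_maskmovq(uint64_t src, uint64_t sel, uint8_t dst[8]) {
    for (int i = 0; i < 8; i++) {
        if ((sel >> (8 * i + 7)) & 1u) {
            dst[i] = (uint8_t)(src >> (8 * i));
        }
    }
}

/* Reproduces the values from the first post, starting from zeroed memory. */
void check_first_post_example(void) {
    const uint64_t v = 0x8001018080010180ULL;
    uint8_t dst[8] = {0};
    uint64_t stored = 0;

    model_maskmovq(v, v, dst);
    for (int i = 7; i >= 0; i--) {
        stored = (stored << 8) | dst[i];   /* reassemble little-endian bytes */
    }
    assert(stored == 0x8000008080000080ULL);
    assert(model_pmovmskb(v) == 0x99u);
}
```

Note that the masked store only matches the value shown in the first post if the destination was zero beforehand, because unselected bytes keep whatever was in memory.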


Title: Re: Motion mask
Post by: Gunther on March 23, 2014, 10:27:45 PM
Hi cpu2,

first things first: Welcome to the forum.

Another point. Why do you use the more or less horrible AT&T syntax? That's not really necessary. By the way, your English isn't bad.

Gunther
Title: Re: Motion mask
Post by: MichaelW on March 23, 2014, 11:48:18 PM
According to Agner Fog's instruction_tables.pdf (available here (http://www.agner.org/optimize/)), MASKMOVQ is 31-32 ops on the AMD processors and 3-4 ops on the Intel processors. And PMOVMSKB is 1-3 ops on the AMD processors and 1-2 ops on the Intel processors.
Title: Re: Motion mask
Post by: cpu2 on March 24, 2014, 09:08:54 AM
Yes, I use an AMD K10. I solved the problem: maskmovq/maskmovdqu costs more than 4 instructions plus a lookup table.

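.section .data

.balign 16                      # SSE memory operands must be 16-byte aligned
bts0_: .quad 0x8080808080808080,0x8080808080808080

.section .text
.globl _start

_start:

movdqa %xmm0, %xmm1             # keep a copy of the input bytes

pand bts0_, %xmm0               # isolate the most-significant bit of each byte
pcmpeqb bts0_, %xmm0            # 0xFF where that bit was set, 0x00 elsewhere
pand %xmm0, %xmm1               # %xmm1 = input bytes with MSB set, others zeroed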


This is more efficient. Sorry for the AT&T syntax; I assemble on a Unix-like system.

Thanks for the replies and the welcome.
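For readers who prefer C, the pand / pcmpeqb / pand sequence in this post can be sketched roughly as follows with SSE2 intrinsics. This is only an illustration under some assumptions: the function names keep_msb_bytes and check_keep_msb_bytes are invented here, unaligned loads and stores are used so the buffers need no special alignment, and the #else branch is a swapped-in scalar equivalent (not the poster's code) for builds where SSE2 is unavailable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Keep every byte of in[] whose most-significant bit is set and zero
   the others, writing the 16 result bytes to out[]. */
void keep_msb_bytes(const uint8_t in[16], uint8_t out[16]) {
#if defined(__SSE2__)
    const __m128i msb = _mm_set1_epi8((char)0x80);          /* same value as bts0_ */
    __m128i v   = _mm_loadu_si128((const __m128i *)in);
    __m128i sel = _mm_and_si128(v, msb);                     /* pand: isolate top bits */
    sel = _mm_cmpeq_epi8(sel, msb);                          /* pcmpeqb: 0xFF where set */
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(v, sel)); /* pand: select bytes */
#else
    /* Scalar fallback: same result, 8 bytes at a time. */
    for (int half = 0; half < 2; half++) {
        uint64_t x;
        memcpy(&x, in + 8 * half, 8);
        uint64_t ones = (x & 0x8080808080808080ULL) >> 7;    /* 0x01 per selected byte */
        x &= ones * 0xFF;                                    /* widen 0x01 to 0xFF, no carries */
        memcpy(out + 8 * half, &x, 8);
    }
#endif
}

/* Checks the byte pattern from the first post plus a few extra bytes. */
void check_keep_msb_bytes(void) {
    const uint8_t in[16]   = {0x80, 0x01, 0x01, 0x80, 0x80, 0x01, 0x01, 0x80,
                              0xFF, 0x7F, 0x00, 0x81, 0x12, 0xC3, 0x40, 0x90};
    const uint8_t want[16] = {0x80, 0x00, 0x00, 0x80, 0x80, 0x00, 0x00, 0x80,
                              0xFF, 0x00, 0x00, 0x81, 0x00, 0xC3, 0x00, 0x90};
    uint8_t out[16];

    keep_msb_bytes(in, out);
    assert(memcmp(out, want, 16) == 0);
}
```

Unlike maskmovq, this produces the selected bytes in a register (zeros elsewhere) rather than doing a partial store, which is what the assembly in this post does as well.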
Title: Re: Motion mask
Post by: Gunther on March 24, 2014, 07:40:12 PM
Hi cpu2,

Quote from: cpu2 on March 24, 2014, 09:08:54 AM
This is more efficient. Sorry for the AT&T syntax; I assemble on a Unix-like system.

that's not the point. You can use


.intel_syntax noprefix


with gcc (I think since version 2.8 or so).

Gunther