Author Topic: Motion mask  (Read 2881 times)

cpu2

  • Regular Member
  • *
  • Posts: 28
Motion mask
« on: March 23, 2014, 10:50:05 AM »
Hello

I'm implementing the MixColumns function of AES encryption. I used the maskmovq instruction (32 ops) to perform the modular reduction.

Code: [Select]
# %mm0 = 0x8001018080010180, the 8 bytes at (%rdi) start out as zero

maskmovq %mm0, %mm0  # (%rdi) = 0x8000008080000080

But pmovmskb is only 1 op.

Code: [Select]
# %mm0 = 0x8001018080010180

pmovmskb %mm0, %esi  # %esi = 0x99

Is there a way to build a byte mask from that result, so I can copy the bytes the way maskmovq does?

Thanks and sorry for my English.

Magnum

  • Member
  • *****
  • Posts: 2258
Re: Motion mask
« Reply #1 on: March 23, 2014, 01:10:10 PM »
Welcome to the board.

Your English is excellent.

I wish I could help, but FPU instructions are one of my weak points.

Andy
Take care,
                   Andy

Ubuntu-mate-16.04-desktop-amd64

http://www.goodnewsnetwork.org

ToutEnMasm

  • Member
  • *****
  • Posts: 1190
    • EditMasm
Re: Motion mask
« Reply #2 on: March 23, 2014, 07:43:19 PM »
From the AMD instruction reference:
Quote
The MOVNTQ instruction stores a 64-bit MMX register value into a 64-bit memory location.
The MASKMOVQ instruction stores bytes from the first operand, as selected by the mask value (most-significant bit of each byte)
in the second operand, to a memory location specified in the rDI and DS registers.
The first operand is an MMX register, and the second operand is another MMX register. The size of the store is determined by the effective address size.

The result is stored at DS:rDI.
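To make the quoted behaviour concrete, here is a minimal sketch in portable C that models MASKMOVQ on a 64-bit value. The name maskmovq_model and the idea of passing the old memory contents as dest are assumptions made for illustration only; this is not how the instruction is invoked. With the value from the first post and a zeroed destination it reproduces 0x8000008080000080.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models MASKMOVQ on a 64-bit value.
   Each byte of src whose mask byte has its most-significant bit set
   overwrites the matching byte of dest; other dest bytes are unchanged. */
static uint64_t maskmovq_model(uint64_t src, uint64_t mask, uint64_t dest)
{
    uint64_t select = 0;
    for (int i = 0; i < 8; i++) {
        if ((mask >> (8 * i + 7)) & 1u)
            select |= (uint64_t)0xFF << (8 * i);
    }
    return (src & select) | (dest & ~select);
}

/* Checks the example from the first post, assuming the memory was zero. */
void check_maskmovq_example(void)
{
    uint64_t mm0 = 0x8001018080010180ULL;
    assert(maskmovq_model(mm0, mm0, 0) == 0x8000008080000080ULL);
}
```

Note that the posted result only holds if the 8 bytes at (%rdi) were zero beforehand, because unselected bytes are not written at all.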

Quote
• PMOVMSKB—Packed Move Mask Byte
The PMOVMSKB instruction moves the most-significant bit of each byte in an XMM register to the low-order word of a 32-bit or 64-bit general-purpose register, with zero-extension. The instruction is useful for extracting bits from mask patterns, or zero values from quantized data, or sign bits—resulting in a byte that can be used for data-dependent branching.
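As a rough illustration of that description, the following C sketch models the 64-bit (MMX register) form used in the first post. The helper name pmovmskb_model is hypothetical. It collects the top bit of each of the 8 bytes into the low 8 bits of the result, which gives 0x99 for the value cpu2 posted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models PMOVMSKB on a 64-bit MMX value.
   Bit i of the result is the most-significant bit of byte i. */
static uint32_t pmovmskb_model(uint64_t v)
{
    uint32_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits |= (uint32_t)((v >> (8 * i + 7)) & 1u) << i;
    return bits;
}

/* Checks the example from the first post. */
void check_pmovmskb_example(void)
{
    assert(pmovmskb_model(0x8001018080010180ULL) == 0x99u);
}
```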


Fa is a musical note to play with CL

Gunther

  • Member
  • *****
  • Posts: 3585
  • Forgive your enemies, but never forget their names
Re: Motion mask
« Reply #3 on: March 23, 2014, 10:27:45 PM »
Hi cpu2,

first things first: Welcome to the forum.

Another point: why do you use the more or less horrible AT&T syntax? That's not really necessary. By the way, your English isn't bad.

Gunther
Get your facts first, and then you can distort them.

MichaelW

  • Global Moderator
  • Member
  • *****
  • Posts: 1209
Re: Motion mask
« Reply #4 on: March 23, 2014, 11:48:18 PM »
According to Agner Fog’s instruction_tables.pdf (available here), MASKMOVQ is 31-32 ops on the AMD processors and 3-4 ops on the Intel processors. And PMOVMSKB is 1-3 ops on the AMD processors and 1-2 ops on the Intel processors.
Well Microsoft, here’s another nice mess you’ve gotten us into.

cpu2

  • Regular Member
  • *
  • Posts: 28
Re: Motion mask
« Reply #5 on: March 24, 2014, 09:08:54 AM »
Yes, I use an AMD K10. I've solved the problem: maskmovq/maskmovdqu costs more than a sequence of 4 instructions plus a lookup table.

Code: [Select]
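.section .data

.balign 16  # pand/pcmpeqb with a memory operand need 16-byte alignment
bts0_: .quad 0x8080808080808080,0x8080808080808080

.section .text
.globl _start

_start:

movdqa %xmm0, %xmm1    # keep a copy of the input

pand bts0_, %xmm0      # isolate the top bit of each byte
pcmpeqb bts0_, %xmm0   # 0xFF where the top bit was set, else 0x00
pand %xmm0, %xmm1      # %xmm1 = input bytes whose top bit is set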

This is more efficient. Sorry for the AT&T syntax; I assemble on a Unix-like system.
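For readers who want to check the logic without assembling anything, here is a small C sketch that models the pand / pcmpeqb / pand sequence byte by byte on a 16-byte array. The function name keep_msb_bytes is made up for this illustration, and the test input simply repeats the 64-bit value from the first post.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the three-instruction sequence on 16 bytes. */
static void keep_msb_bytes(const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        uint8_t top  = in[i] & 0x80;                /* pand bts0_, %xmm0    */
        uint8_t mask = (top == 0x80) ? 0xFF : 0x00; /* pcmpeqb bts0_, %xmm0 */
        out[i] = in[i] & mask;                      /* pand %xmm0, %xmm1    */
    }
}

/* Little-endian byte image of 0x8001018080010180, repeated twice. */
void check_keep_msb_bytes(void)
{
    const uint8_t in[16] = {
        0x80, 0x01, 0x01, 0x80, 0x80, 0x01, 0x01, 0x80,
        0x80, 0x01, 0x01, 0x80, 0x80, 0x01, 0x01, 0x80
    };
    const uint8_t expect[16] = {
        0x80, 0x00, 0x00, 0x80, 0x80, 0x00, 0x00, 0x80,
        0x80, 0x00, 0x00, 0x80, 0x80, 0x00, 0x00, 0x80
    };
    uint8_t out[16];
    keep_msb_bytes(in, out);
    assert(memcmp(out, expect, sizeof out) == 0);
}
```

One difference from maskmovq is worth keeping in mind: this sequence leaves the result in a register with the unselected bytes set to zero, whereas maskmovq writes only the selected bytes to memory and leaves the others untouched.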

Thanks for the replies and the welcome.

Gunther

  • Member
  • *****
  • Posts: 3585
  • Forgive your enemies, but never forget their names
Re: Motion mask
« Reply #6 on: March 24, 2014, 07:40:12 PM »
Hi cpu2,

Quote
Is more efficient, and sorry for the AT&T syntax, I assemble on a Unix-like system.

That's not the point. You can use

Code: [Select]
.intel_syntax noprefix

with gcc (I think since version 2.8 or so).

Gunther
Get your facts first, and then you can distort them.