AES 128 bits Encrypt

peter_asm · May 20, 2014, 12:45:59 PM

Another question. Why are you using MOVNTI?
Explain the purpose of your code.
Provide comments for why you use instructions.

cpu2 · May 20, 2014, 01:24:14 PM

Quote from: peter_asm on May 20, 2014, 12:45:59 PM
Another question. Why are you using MOVNTI?

To minimize cache pollution.

Quote from: peter_asm on May 20, 2014, 12:45:59 PM
Explain the purpose of your code.

This code is a fragment of one of my projects; it was not intended to be called from C/C++ with fastcall.

If that is why you think the code is inefficient and that I'm not serious, that's a shame.

The objective of this code is to test myself and, if possible, to be faster than the code in some other projects, which I think I have achieved. If that is not the case, please let me know.

Quote from: peter_asm on May 20, 2014, 12:45:59 PM
Provide comments for why you use instructions.

Okay. Seeing how strict you are about syntax and presentation, I will add them when InvMixcolumns is ready.

Regards.

peter_asm · May 22, 2014, 09:20:13 AM

I haven't given up on this. I'm genuinely very interested in your approach to encrypting with AES but haven't had time lately to test the code again.
If you would consider using Intel syntax and the stdcall/fastcall convention, in addition to providing an example, I'm sure many more forum members would provide feedback.
Right now it's a pain for me to convert the code into Intel syntax and then assemble 2 files before loading them into a debugger just to monitor the data, because if I run the exe it just crashes.

No offense, I think you're on to something and it's worth exploring, but what I'm having difficulty understanding is why you make it difficult for people to test and use your code.
Is it because you don't want people to steal it?

cpu2 · May 22, 2014, 02:29:42 PM

Okay, I promise to write comments in the code, translate it into Intel syntax, and use the fastcall convention.

So the question about movnti was a genuine question; I thought it meant something like "what are you doing?". Although, because of the syntax, I understand that it is not clear why some instructions are used where a plain mov might simply be better.

And I did not write the code that way to stop people from stealing it; this is just how I program.

Regards.

P.S.: I will publish InvMixcolumns in a few days; I was busy and could not finish it.

Gunther · May 22, 2014, 04:11:43 PM

Hi cpu2,

take care, slow down. Don't rush.

Gunther

peter_asm · May 23, 2014, 07:16:22 AM

Quote from: cpu2 on May 22, 2014, 02:29:42 PM
Okay, I promise to write comments in the code, translate it into Intel syntax, and use the fastcall convention.

So the question about movnti was a genuine question; I thought it meant something like "what are you doing?". Although, because of the syntax, I understand that it is not clear why some instructions are used where a plain mov might simply be better.

And I did not write the code that way to stop people from stealing it; this is just how I program.

Regards.

P.S.: I will publish InvMixcolumns in a few days; I was busy and could not finish it.

No problem. As Gunther said, take your time and I personally look forward to seeing your results.
The code is interesting and I think it is worth developing further but I don't completely understand AES and wouldn't be much help right now. I'd like to help and I'm sure many others on here would too.

I did try to optimize the AES key generation algorithm for encryption by WiteG.
It isn't optimized for speed. I like your idea and hope it can be realized as it could be very useful.

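setkey:
    pushad
    mov   esi, [esp+32+4]  ; input: 16-byte key
    mov   edi, [esp+32+8]  ; output: 176-byte key schedule
    lea   ebx, [sbox]      ; 256-byte S-box table (defined elsewhere), used by xlatb

    push  4
    pop   ecx              ; ecx = 4 dwords to copy
load_key:
    lodsd
    stosd                  ; copy the key; eax ends up holding its last word
    loop  load_key
    push  1
    pop   edx              ; edx = round constant, starts at 1
    mov   cl, 10           ; 10 rounds (ecx is 0 after the loop above)
init_key:
    push  ecx
    mov   cl, 4
swap_bytes:
    ror   eax, 8
    xlatb                  ; SubWord: al = sbox[al], one byte per pass
    loop  swap_bytes
    pop   ecx
    ror   eax, 8           ; RotWord
    xor   eax, edx         ; add the round constant
    shl   dl, 1            ; next round constant: multiply by 2 in GF(2^8)
    jnc   no_carry
    xor   dl, 1Bh          ; reduce modulo the AES polynomial
no_carry:
    push  ecx
    mov   cl, 4
xor_dword:
    xor   eax, dword ptr [edi-16]  ; w[i] = w[i-1] xor w[i-4]
    stosd
    loop  xor_dword
    pop   ecx
    loop  init_key
    popad
    ret 2*4                ; callee removes both arguments (stdcall)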
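
For anyone who wants to check the output of a routine like the one above without a debugger, here is a minimal Python sketch of the AES-128 key expansion. It is only a reference under some assumptions: the names xtime, gf_mul, build_sbox and expand_key are made up for this example, and the S-box is computed from its definition instead of using the sbox table the assembly refers to. The test key is the one from FIPS-197 Appendix A.1.

```python
def xtime(b):
    """Multiply b by 2 in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two field elements with shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox():
    """Derive the AES S-box: multiplicative inverse, then the affine map."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0
        if x:
            inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            # XOR in the inverse rotated left by 1, 2, 3 and 4 bits.
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox


def expand_key(key):
    """Return the 176-byte AES-128 key schedule for a 16-byte key."""
    assert len(key) == 16
    sbox = build_sbox()
    w = bytearray(key)
    rcon = 1
    for _ in range(10):
        last = w[-4:]
        # RotWord followed by SubWord on the last word written.
        t = bytearray(sbox[last[(i + 1) % 4]] for i in range(4))
        t[0] ^= rcon
        rcon = xtime(rcon)
        for _ in range(4):
            # Same as `xor eax, [edi-16]` / `stosd` in the assembly.
            t = bytearray(a ^ b for a, b in zip(t, w[-16:-12]))
            w += t
    return bytes(w)


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    print(expand_key(key)[16:32].hex())  # a0fafe1788542cb123a339392a6c7605
```

Dumping the 176 bytes written by setkey and comparing them with expand_key(key) should show quickly whether the two agree.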

cpu2 · May 23, 2014, 09:38:34 AM

Quote from: Gunther on May 22, 2014, 04:11:43 PM
Hi cpu2,

take care, slow down. Don't rush.

Gunther

Okay. :icon_mrgreen:

Quote from: peter_asm on May 23, 2014, 07:16:22 AM
No problem. As Gunther said, take your time and I personally look forward to seeing your results.
The code is interesting and I think it is worth developing further but I don't completely understand AES and wouldn't be much help right now. I'd like to help and I'm sure many others on here would too.

Thanks, I did not think it would be so interesting to you.

Quote from: peter_asm on May 23, 2014, 07:16:22 AM
I did try to optimize the AES key generation algorithm for encryption by WiteG.
It isn't optimized for speed. I like your idea and hope it can be realized as it could be very useful.


By my count, on a Sandy Bridge this would be about 572 OPS. That assumes the carry is taken every time; if it is not, it would be 500-552 OPS.

Mine was 274 OPS.

But I would not do the modular reduction in the key expansion; you saw what I did in Mixcolumns. I guess that is the purpose of this xor.
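
As a side note on that xor: the `shl dl, 1` / `jnc` / `xor dl, 1Bh` sequence produces the next round constant, and the xor is the reduction modulo the AES polynomial. The small Python sketch below (the function name rcon_steps is invented for this example) emulates the 8-bit register and records in which rounds the carry is set. For the ten rounds of AES-128 it is set only once, when the constant goes from 80h to 1Bh, so a table of ten constants would be one way to avoid the reduction in the key expansion.

```python
def rcon_steps(rounds=10):
    """Emulate `shl dl,1 / jnc / xor dl,1Bh` and note when the carry is set."""
    dl = 1
    steps = []
    for _ in range(rounds):
        used = dl                  # constant XORed into the key word this round
        carry = bool(dl & 0x80)    # carry flag after shl dl, 1
        dl = (dl << 1) & 0xFF
        if carry:
            dl ^= 0x1B             # modular reduction
        steps.append((used, carry))
    return steps


if __name__ == "__main__":
    for used, carry in rcon_steps():
        print(f"{used:02x} carry={carry}")
```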

if you need help, ask.

Regards.

Gunther · May 23, 2014, 07:51:01 PM

Hi cpu2,

Quote from: cpu2 on May 23, 2014, 09:38:34 AM
if you need help, ask.

okay, so be prepared.

Gunther

cpu2 · May 26, 2014, 05:57:43 PM

Quote from: Gunther on May 23, 2014, 07:51:01 PM
Hi cpu2,

Quote from: cpu2 on May 23, 2014, 09:38:34 AM
if you need help, ask.

okay, so be prepared.

Gunther

Okay. :t

-------

It has been days without any comment from me; I just want to tell you that I am already writing InvMixcolumns. I could not do it earlier because of some personal problems.

I found a person who is willing to translate my code to Intel syntax and fastcall; that person is a member of a Spanish-language forum to which I also belong.

I guess it will be ready in a few days, but the truth is that it is a more complicated function.

Regards.

Gunther · May 26, 2014, 07:12:23 PM

Hi cpu2,

Quote from: cpu2 on May 26, 2014, 05:57:43 PM
I found a person who is willing to translate my code to Intel syntax and fastcall; that person is a member of a Spanish-language forum to which I also belong.

translating to Intel syntax isn't hard. Assemble it with gas, run objdump -d -Mintel myfile.o, and you have the Intel syntax.

Gunther

peter_asm · May 27, 2014, 11:19:14 PM

Nice one Gunther

>objdump -d -Mintel --no-show-raw-insn aes.o

aes.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <_aes_crypt>:
   0:   push   r11
   2:   push   r12
   4:   push   r13
   6:   push   r14
   8:   push   r15
   a:   push   r8
   c:   push   r9
   e:   push   rax
   f:   mov    r12,0x170
  16:   prefetch BYTE PTR ds:0x0
  1e:   prefetch BYTE PTR ds:0x100
  26:   mov    r13,0xffffffffffffff60
  2d:   movdqu xmm0,XMMWORD PTR [r11]
  32:   movdqu XMMWORD PTR [rsp+r13*1-0x10],xmm0
  39:   mov    r11d,DWORD PTR [r11+0xc]
  3d:   movnti QWORD PTR [rsp+r13*1],r11d

Any idea how to remove the prefixed addresses?

Gunther · May 27, 2014, 11:26:49 PM

Quote from: peter_asm on May 27, 2014, 11:19:14 PM
Nice one Gunther

but it did work, didn't it? But what the heck:

I think that prefetch instructions are necessary.

Gunther

peter_asm · May 28, 2014, 12:24:07 AM

No, I mean the addresses before each mnemonic, so I could assemble with JWASM, that's all.
It might be possible using the cut command.

Something like: @echo off & for /f "tokens=2 delims=:" %i in (aes.asm) do echo %i >>aes_jwasm.asm
Ah, it's okay, I would just love an easy way to convert AT&T into Intel.
My own way was using the IDA Pro disassembler, which wasn't all that great.
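
One possible way to strip the addresses is a few lines of Python instead of a batch one-liner. This is only a sketch: the function name strip_objdump is made up, and it assumes the listing was produced with --no-show-raw-insn as above, so that each instruction line is just an offset, a colon and the instruction. Splitting at the first colon only keeps operands such as ds:0x0 intact.

```python
import re

# Instruction line: optional spaces, hex offset, colon, then the instruction.
_INSN = re.compile(r"^\s*[0-9a-fA-F]+:\s+(.*\S)\s*$")
# Symbol line such as "0000000000000000 <_aes_crypt>:".
_LABEL = re.compile(r"^[0-9a-fA-F]+ <([^>]+)>:\s*$")


def strip_objdump(text):
    """Keep labels and instructions from an objdump listing, drop the rest."""
    out = []
    for line in text.splitlines():
        m = _LABEL.match(line)
        if m:
            out.append(m.group(1) + ":")
            continue
        m = _INSN.match(line)
        if m:
            out.append("    " + m.group(1))
    return "\n".join(out)


if __name__ == "__main__":
    import sys
    print(strip_objdump(sys.stdin.read()))
```

Saved under a hypothetical name such as strip.py, it could be run as objdump -d -Mintel --no-show-raw-insn aes.o | python strip.py > aes_jwasm.asm. The result will probably still need manual edits before JWASM accepts it, for example the unresolved ds:0x0 operands.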

Gunther · May 28, 2014, 02:00:00 AM

Hi peter_asm,

cut & paste should be the right way, I think.

Quote from: peter_asm on May 28, 2014, 12:24:07 AM
Something like: @echo off & for /f "tokens=2 delims=:" %i in (aes.asm) do echo %i >>aes_jwasm.asm

That's too crazy, but could work. :lol: :lol: :lol:

Gunther

cpu2 · May 31, 2014, 01:33:18 AM

Do not worry about the syntax, as I said earlier.

I had personal problems this week, but I finally wrote something. It is a small step: it classifies the bytes so that the modular reduction can be done later.

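.section .data
.balign 16    # pand/pcmpeqb with a memory operand need 16-byte alignment

bt0_: .quad 0x8080808080808080,0x8080808080808080   # bit 7 of every byte
bt1_: .quad 0x4040404040404040,0x4040404040404040   # bit 6 of every byte
bt2_: .quad 0x2020202020202020,0x2020202020202020   # bit 5 of every byte
bd1_: .quad 0x3f3f3f3f3f3f3f3f,0x3f3f3f3f3f3f3f3f   # low 6 bits of every byte
bd2_: .quad 0x7f7f7f7f7f7f7f7f,0x7f7f7f7f7f7f7f7f   # low 7 bits of every byte

.section .text
.globl _start

_start:

movdqa  %xmm0, %xmm1      # xmm1..xmm4 = working copies of the input in xmm0
movdqa  %xmm1, %xmm2
movdqa  %xmm2, %xmm3
movdqa  %xmm3, %xmm4

pand bt0_, %xmm1          # keep only bit 7 of each byte
pcmpeqb bt0_, %xmm1       # 0xff where bit 7 is set, 0x00 elsewhere

movdqa %xmm1, %xmm8       # save the bit-7 mask
pand %xmm1, %xmm2         # xmm2 = input bytes that have bit 7 set

movdqa %xmm2, %xmm5
pand bd2_, %xmm2
pcmpeqb bd2_, %xmm2       # 0xff where the low 7 bits are all ones
movdqa %xmm2, %xmm6       # save that mask
pand %xmm0, %xmm2         # input bytes selected by the mask
pand bd1_, %xmm5
pcmpeqb bd1_, %xmm5       # 0xff where the low 6 bits are all ones
movdqa %xmm5, %xmm7       # save that mask
pand %xmm0, %xmm5         # input bytes selected by the mask

pandn %xmm0, %xmm1        # xmm1 = input bytes that have bit 7 clear

pand bt1_, %xmm1          # keep only bit 6 of those bytes
pcmpeqb bt1_, %xmm1       # 0xff where bit 7 is clear and bit 6 is set

movdqa %xmm1, %xmm9       # save the bit-6 mask
pand %xmm1, %xmm3         # xmm3 = input bytes selected by the bit-6 mask

movdqa %xmm3, %xmm11
pand bd1_, %xmm3
pcmpeqb bt1_, %xmm3       # note: (x & 0x3f) never equals 0x40, so this is always 0; bd1_ may have been intended
movdqa %xmm3, %xmm6       # note: overwrites the mask saved in xmm6 above
pand %xmm0, %xmm3

pandn %xmm0, %xmm1        # xmm1 = input bytes not selected by the bit-6 mask

pand bt2_, %xmm1          # keep only bit 5 of those bytes
pcmpeqb bt2_, %xmm1       # 0xff where bit 5 is set

movdqa %xmm1, %xmm10      # save the bit-5 mask
pand %xmm1, %xmm4         # xmm4 = input bytes selected by the bit-5 mask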
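
To illustrate why sorting the bytes by their top bit helps with the modular reduction, here is a small Python sketch of the usual mask trick. It is not the code above translated, only an assumption about where such a classification typically leads, and the names msb_mask and xtime_block are invented for the example. A pand with 0x80 followed by pcmpeqb gives 0xff for every byte whose bit 7 is set, and that mask then selects which bytes get 1Bh XORed in after the shift, without any branch.

```python
def msb_mask(block):
    """Per byte: 0xFF if bit 7 is set, else 0x00 (pand 0x80, then pcmpeqb 0x80)."""
    return bytes(0xFF if (b & 0x80) == 0x80 else 0x00 for b in block)


def xtime_block(block):
    """Double every byte in GF(2^8); the mask picks the bytes that need reducing."""
    mask = msb_mask(block)
    return bytes(((b << 1) & 0xFF) ^ (m & 0x1B) for b, m in zip(block, mask))


if __name__ == "__main__":
    print(xtime_block(bytes([0x57, 0x80, 0x01, 0xAE])).hex())  # ae1b0247
```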

I hope to finish soon.

Regards.

The MASM Forum
