AES 128 bits Encrypt

Started by cpu2, May 14, 2014, 09:25:17 PM

peter_asm

Another question. Why are you using MOVNTI?
Explain the purpose of your code.
Provide comments for why you use instructions.

cpu2

Quote from: peter_asm on May 20, 2014, 12:45:59 PM
Another question. Why are you using MOVNTI?

To minimize the cache pollution.

Quote from: peter_asm on May 20, 2014, 12:45:59 PM
Explain the purpose of your code.

This code is a fragment of one of my projects; it is not intended to be called from C/C++ with fastcall.

If that is why you believe the code is ineffective and that I am not serious, that is a shame.

The objective of this code is to test myself and, if possible, to provide code that is faster than that of some other projects, which I think I have achieved. If that is not so, please let me know.

Quote from: peter_asm on May 20, 2014, 12:45:59 PM
Provide comments for why you use instructions.

Okay, seeing how strict you are about syntax and presentation, I will add comments when InvMixColumns is ready.

Regards.

peter_asm

I haven't given up on this. I'm genuinely very interested in your approach to encrypting with AES but haven't had time lately to test the code again.
If you would consider using Intel syntax and the stdcall/fastcall convention, in addition to providing an example, I'm sure many more forum members would provide feedback.
Right now it's a pain for me to convert into Intel syntax, then assemble 2 files before loading into a debugger just to monitor the data, because if I run the exe it just crashes.

No offense, I think you're on to something and it's worth exploring, but why you make it difficult for people to test/use your code is what I'm having difficulty understanding.
Is it because you don't want people to steal it?

cpu2

Okay, I promise to write comments in the code, translate it into Intel syntax, and use the fastcall convention.

So the question about movnti was a genuine question; I thought it meant something like "what are you doing". Although because of the syntax not understand some instructions are not put, because if it is simply better to mov.

And I do not write the code that way so that people cannot steal it; this is simply how I program.

Regards.

P.S.: In a few days I will publish InvMixColumns; I was busy and could not finish it.

Gunther

Hi cpu2,

take care, slow down. Don't rush.

Gunther
You have to know the facts before you can distort them.

peter_asm

Quote from: cpu2 on May 22, 2014, 02:29:42 PM
Okay, I promise to write comments in the code, translate it into Intel syntax, and use the fastcall convention.

So the question about movnti was a genuine question; I thought it meant something like "what are you doing". Although because of the syntax not understand some instructions are not put, because if it is simply better to mov.

And I do not write the code that way so that people cannot steal it; this is simply how I program.

Regards.

P.S.: In a few days I will publish InvMixColumns; I was busy and could not finish it.

No problem. As Gunther said, take your time and I personally look forward to seeing your results.
The code is interesting and I think it is worth developing further but I don't completely understand AES and wouldn't be much help right now. I'd like to help and I'm sure many others on here would too.

I did try to optimize the AES key generation algorithm for encryption by WiteG.
It isn't optimized for speed. I like your idea and hope it can be realized as it could be very useful.


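setkey:
    pushad
    mov   esi, [esp+32+4]  ; input: 16-byte key
    mov   edi, [esp+32+8]  ; output: 176-byte expanded key
    lea   ebx, [sbox]      ; table base for xlatb
   
    push  4
    pop   ecx              ; ecx = 4 dwords
load_key:
    lodsd                  ; copy the key as round key 0
    stosd
    loop  load_key         ; on exit eax = last key word
    push  1
    pop   edx              ; dl = round constant, starts at 01h
    mov   cl, 10           ; 10 rounds (ecx is 0 after the loop)
init_key:
    push  ecx
    mov   cl, 4
swap_bytes:
    ror   eax, 8           ; SubWord: bring each byte into al in turn
    xlatb                  ; al = sbox[al]
    loop  swap_bytes
    pop   ecx
    ror   eax, 8           ; RotWord
    xor   eax, edx         ; add the round constant to the first byte
    shl   dl, 1            ; next constant: multiply by x in GF(2^8)
    jnc   no_carry
    xor   dl, 1Bh          ; reduce modulo x^8+x^4+x^3+x+1
no_carry:   
    push  ecx
    mov   cl, 4
xor_dword:
    xor   eax, dword ptr [edi-16]  ; xor with the word 4 positions back
    stosd
    loop  xor_dword
    pop   ecx
    loop  init_key
    popad
    ret 2*4                ; stdcall: pop both arguments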
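
For anyone who wants to check the output of a routine like the one above, here is a minimal Python sketch of the AES-128 key expansion. It is not WiteG's or peter_asm's code; the names gf_mul, build_sbox and expand_key are made up for this illustration, and the S-box is generated rather than read from a table. The reference key and round keys used for checking are the AES-128 example from FIPS-197, Appendix A.1.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # same idea as shl dl,1 / jnc / xor dl,1Bh
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):   # x^254 is the inverse of x; 0 maps to 0
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def expand_key(key):
    """Return the 176-byte AES-128 key schedule for a 16-byte key."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    sbox = build_sbox()
    words = [key[i:i + 4] for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = words[i - 1]
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]                 # RotWord
            temp = bytes(sbox[b] for b in temp)        # SubWord
            temp = bytes([temp[0] ^ rcon]) + temp[1:]  # add round constant
            rcon = gf_mul(rcon, 2)
        words.append(bytes(a ^ b for a, b in zip(words[i - 4], temp)))
    return b"".join(words)


if __name__ == "__main__":
    key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
    schedule = expand_key(key)
    for r in range(11):
        print(r, schedule[16 * r:16 * r + 16].hex())
```

Comparing the bytes written by setkey with this output, round key by round key, should show quickly whether a change to the assembly broke anything.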


cpu2

Quote from: Gunther on May 22, 2014, 04:11:43 PM
Hi cpu2,

take care, slow down. Don't rush.

Gunther

Okay.  :icon_mrgreen:

Quote from: peter_asm on May 23, 2014, 07:16:22 AM
Quote from: cpu2 on May 22, 2014, 02:29:42 PM
Okay, I promise to write comments in the code, translate it into Intel syntax, and use the fastcall convention.

So the question about movnti was a genuine question; I thought it meant something like "what are you doing". Although because of the syntax not understand some instructions are not put, because if it is simply better to mov.

And I do not write the code that way so that people cannot steal it; this is simply how I program.

Regards.

P.S.: In a few days I will publish InvMixColumns; I was busy and could not finish it.

No problem. As Gunther said, take your time and I personally look forward to seeing your results.
The code is interesting and I think it is worth developing further but I don't completely understand AES and wouldn't be much help right now. I'd like to help and I'm sure many others on here would too.

Thanks, I did not think it would be so interesting to you.



Quote from: peter_asm on May 23, 2014, 07:16:22 AM
I did try to optimize the AES key generation algorithm for encryption by WiteG.
It isn't optimized for speed. I like your idea and hope it can be realized as it could be very useful.

I calculated the OPS: on a Sandy Bridge it would be about 572 OPS. That assumes the carry is taken every time; if not, it would be 500-552 OPS.

Mine was 274 OPS.

But I would not do the modular reduction in the key expansion; you saw how I did it in MixColumns. I guess that is the purpose of this xor.


If you need help, ask.

Regards.

Gunther

Hi cpu2,

Quote from: cpu2 on May 23, 2014, 09:38:34 AM
if you need help, ask.

okay, so be prepared.

Gunther

cpu2

Quote from: Gunther on May 23, 2014, 07:51:01 PM
Hi cpu2,

Quote from: cpu2 on May 23, 2014, 09:38:34 AM
if you need help, ask.

okay, so be prepared.

Gunther

Okay.  :t

-------

I have not commented for days; I just want to tell you that I am already writing InvMixColumns. I could not do it earlier because of some personal problems.

I found a person who is willing to translate my code to Intel syntax and fastcall; he is a member of a Hispanic forum to which I also belong.

I guess it will be ready in a few days, but the truth is that it is a more complicated function.

Regards.

Gunther

Hi  cpu2,

Quote from: cpu2 on May 26, 2014, 05:57:43 PM
I found a person who is willing to translate my code to Intel syntax and fastcall; he is a member of a Hispanic forum to which I also belong.

Translating to Intel syntax isn't hard. Assemble it with gas, then run objdump -d -Mintel myfile.o and you have the Intel syntax.

Gunther

peter_asm

Nice one Gunther


>objdump -d -Mintel --no-show-raw-insn aes.o

aes.o:     file format pe-x86-64


Disassembly of section .text:

0000000000000000 <_aes_crypt>:
   0:   push   r11
   2:   push   r12
   4:   push   r13
   6:   push   r14
   8:   push   r15
   a:   push   r8
   c:   push   r9
   e:   push   rax
   f:   mov    r12,0x170
  16:   prefetch BYTE PTR ds:0x0
  1e:   prefetch BYTE PTR ds:0x100
  26:   mov    r13,0xffffffffffffff60
  2d:   movdqu xmm0,XMMWORD PTR [r11]
  32:   movdqu XMMWORD PTR [rsp+r13*1-0x10],xmm0
  39:   mov    r11d,DWORD PTR [r11+0xc]
  3d:   movnti QWORD PTR [rsp+r13*1],r11d


Any idea how to remove the prefixed addresses?

Gunther

Quote from: peter_asm on May 27, 2014, 11:19:14 PM
Nice one Gunther

but it did work, didn't it? But what the heck:

I think that prefetch instructions are necessary.

Gunther

peter_asm

No, I mean the addresses before each mnemonic, so I could assemble with JWASM, that's all.
It might be possible using the cut command.

something like : @echo off & for /f "tokens=2 delims=:" %i in (aes.asm) do echo %i >>aes_jwasm.asm
Ah, it's okay, I would just love an easy way to convert AT&T into Intel.
My own way was using the IDA Pro disassembler, which wasn't all that great.
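
A small script is another way to drop the address column. The following is only a sketch in Python, assuming the listing was produced with objdump -d -Mintel --no-show-raw-insn as in the post above; the function name strip_addresses and the two patterns are made up for this example. Unlike splitting on the first colon and keeping one token, it keeps operands that contain a colon themselves, such as ds:0x0.

```python
import re

# Instruction line, e.g. "  2d:   movdqu xmm0,XMMWORD PTR [r11]".
INSN = re.compile(r"^\s*[0-9a-fA-F]+:\s+(\S.*)$")
# Symbol line, e.g. "0000000000000000 <_aes_crypt>:".
LABEL = re.compile(r"^[0-9a-fA-F]+ <([^>]+)>:\s*$")


def strip_addresses(listing):
    """Keep labels and instructions from an objdump listing, drop the rest."""
    out = []
    for line in listing.splitlines():
        label = LABEL.match(line)
        if label:
            out.append(label.group(1) + ":")
            continue
        insn = INSN.match(line)
        if insn:
            out.append("    " + insn.group(1).rstrip())
    return "\n".join(out)


if __name__ == "__main__":
    import sys
    # Usage: objdump -d -Mintel --no-show-raw-insn aes.o | python strip.py
    print(strip_addresses(sys.stdin.read()))
```

The result will most likely still need manual cleanup before JWASM accepts it (hex literals written as 0x..., unresolved addresses, missing segment and proc directives), so treat it as a starting point only.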

Gunther

Hi peter_asm,

cut & paste should be the right way, I think.

Quote from: peter_asm on May 28, 2014, 12:24:07 AM
something like : @echo off & for /f "tokens=2 delims=:" %i in (aes.asm) do echo %i >>aes_jwasm.asm

That's too crazy, but could work.  :lol: :lol: :lol:

Gunther

cpu2

Do not worry about the syntax, as I said earlier.

I had personal problems this week, but I finally wrote something. It is a small step, but it classifies the bytes for the modular reduction later.

.section .data

.align 16   # SSE memory operands (pand, pcmpeqb) must be 16-byte aligned
bt0_: .quad 0x8080808080808080,0x8080808080808080
bt1_: .quad 0x4040404040404040,0x4040404040404040
bt2_: .quad 0x2020202020202020,0x2020202020202020
bd1_: .quad 0x3f3f3f3f3f3f3f3f,0x3f3f3f3f3f3f3f3f
bd2_: .quad 0x7f7f7f7f7f7f7f7f,0x7f7f7f7f7f7f7f7f

.section .text
.globl _start

_start:

movdqa  %xmm0, %xmm1
movdqa  %xmm1, %xmm2
movdqa  %xmm2, %xmm3
movdqa  %xmm3, %xmm4

pand bt0_, %xmm1
pcmpeqb bt0_, %xmm1

movdqa %xmm1, %xmm8
pand %xmm1, %xmm2

movdqa %xmm2, %xmm5
pand bd2_, %xmm2
pcmpeqb bd2_, %xmm2
movdqa %xmm2, %xmm6
pand %xmm0, %xmm2
pand bd1_, %xmm5
pcmpeqb bd1_, %xmm5
movdqa %xmm5, %xmm7
pand %xmm0, %xmm5

pandn %xmm0, %xmm1

pand bt1_, %xmm1
pcmpeqb bt1_, %xmm1

movdqa %xmm1, %xmm9
pand %xmm1, %xmm3

movdqa %xmm3, %xmm11
pand bd1_, %xmm3
pcmpeqb bd1_, %xmm3
movdqa %xmm3, %xmm6
pand %xmm0, %xmm3

pandn %xmm0, %xmm1

pand bt2_, %xmm1
pcmpeqb bt2_, %xmm1

movdqa %xmm1, %xmm10
pand %xmm1, %xmm4
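
To make the first step of this fragment easier to follow, here is a rough Python model of it: pand with bt0_ followed by pcmpeqb with bt0_ builds a per-byte mask, and pand/pandn then split the block by that mask. This is a sketch only. The helper names are made up, each 16-byte register is modelled as a bytes object, and it covers just the bit-7 classification, not the rest of the routine. The split is presumably useful because, when a byte is doubled in GF(2^8), only bytes with bit 7 set need the extra xor with 1Bh.

```python
def pand(dst, src):
    """Bytewise AND, like `pand src, dst`."""
    return bytes(d & s for d, s in zip(dst, src))


def pcmpeqb(dst, src):
    """Bytewise compare: 0xFF where the bytes are equal, 0x00 elsewhere."""
    return bytes(0xFF if d == s else 0x00 for d, s in zip(dst, src))


def pandn(dst, src):
    """(NOT dst) AND src, like the AT&T form `pandn src, dst`."""
    return bytes((d ^ 0xFF) & s for d, s in zip(dst, src))


BT0 = bytes([0x80]) * 16   # same constant as bt0_ in the fragment


def split_by_top_bit(block):
    """Return (mask, high, low) for a 16-byte block.

    mask is 0xFF in every byte position where bit 7 of block is set,
    high keeps only those bytes, low keeps only the other bytes.
    """
    mask = pcmpeqb(pand(block, BT0), BT0)
    high = pand(block, mask)
    low = pandn(mask, block)
    return mask, high, low


if __name__ == "__main__":
    block = bytes([0x00, 0x7F, 0x80, 0xFF] * 4)
    mask, high, low = split_by_top_bit(block)
    print(mask.hex())
    print(high.hex())
    print(low.hex())
```

The same three helpers can be chained to model the later bt1_ and bt2_ steps and compare the result with a debugger dump of the xmm registers.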


I hope to finish soon.

Regards.