Faster Memcopy ...

Started by rrr314159, March 03, 2015, 02:40:50 PM

guga

A manual option? :thumbsup: :thumbsup: :thumbsup: Indeed, maybe I'll implement it later, once I finish the fixes. I plan to add a couple of user options as soon as I have some time to work on RosAsm updates again. The current version identifies ANSI and Unicode strings with reasonable accuracy, but it still has some minor problems: in some cases a chunk of data can be either a string or a pointer, and that is not easy to resolve without help from other tools, such as a signature system I started developing years ago but never finished.

Manual options could work as in IdaPro, letting the user choose whether to disassemble C-style strings, DOS-style, Pascal-style, Delphi strings, etc. But for those specific cases (strings tied to certain compilers) it would be better to do it when (or if) I manage to finish the signature technique, which I called DIS (Digital Identification System) many years ago.

The current routine does the job more than 90% of the time for real apps. The disassembler sets flags on forbidden areas of the PE so they are not disassembled, for example sections flagged as import, export, resources, etc. It basically identifies the good code and data sections, and those are the ones that actually get disassembled.

One of the problems is that it is common to find embedded data, structures, strings, etc. inside the code section. Although the disassembler works reasonably well on those sections, some files still produce wrong results, but I'll fix those later, once I fix some minor bugs in RosAsm.

I'm comparing the results of the fixes I'm currently making, and they are more accurate than the ones in IdaPro (not considering the Flirt system used in Ida, of course), but there are still some issues.

Eventually I'll try to create a set of macros or internal routines to allow compiling masm-style syntax, but that will still take more time.
Coding in Assembly requires a mix of:
80% of brain, passion, intuition, creativity
10% of programming skills
10% of alcoholic levels in your blood.

My Code Sites:
http://rosasm.freeforums.org
http://winasm.tripod.com

daydreamer

Quote from: hutch-- on November 16, 2019, 09:14:42 PM
Just write a UNICODE editor, no conversions.
I already worked on a Unicode richedit, but as an asm programmer I cannot resist downsizing tricks, like storing the text data at half the size, then upsizing it by adding the start of the specific language block it belongs to and moving the result to the character buffer.
my none asm creations
https://masm32.com/board/index.php?topic=6937.msg74303#msg74303
I am an Invoker
"An Invoker is a mage who specializes in the manipulation of raw and elemental energies."
Like SIMD coding

daydreamer

Quote from: guga on November 17, 2019, 11:50:24 AM
Manual options could work as in IdaPro, letting the user choose whether to disassemble C-style strings, DOS-style, Pascal-style, Delphi strings, etc. But for those specific cases (strings tied to certain compilers) it would be better to do it when (or if) I manage to finish the signature technique, which I called DIS (Digital Identification System) many years ago.

The current routine does the job more than 90% of the time for real apps. The disassembler sets flags on forbidden areas of the PE so they are not disassembled, for example sections flagged as import, export, resources, etc. It basically identifies the good code and data sections, and those are the ones that actually get disassembled.

90%, great job :thumbsup: Maybe part of the failure rate comes from authors deliberately making their code harder to disassemble?
Why not do it like a smartphone editor: show its best guess of the string format along with a tiny sample, and let the user pick from a list of text formats? I had no idea there were so many different text formats.
I can also imagine that hardcoded mov eax,"abcd" or mov eax,"ab" ;unicode can be very hard to handle in a disassembler.
It would be good to know how to detect that a text file is saved in UTF-16 format.

hutch--

Using rich edit, the shift from ANSI to UNICODE is simple enough to do. If you have to work in both, load the file and if it looks like garbage, switch from one to the other. It just means reloading the file. You can make tangled messes that still can't guarantee being correct, or switch between the two; the latter is much easier.
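For the question of how to tell ANSI from UTF-16 before falling back to "reload and look", here is a minimal sketch in C. It assumes only two things: a byte-order mark is trustworthy when present, and mostly-Latin UTF-16 text has a zero in nearly every second byte. The names `guess_text_encoding` and `text_guess` and the 1/4 threshold are made up for this example; it is a heuristic, not a guarantee. Windows also ships an `IsTextUnicode` API for the same kind of guess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum text_guess {
    GUESS_ANSI = 0,
    GUESS_UTF16LE = 1,
    GUESS_UTF16BE = 2,
    GUESS_UTF8_BOM = 3
};

/* Guess the encoding of a raw file buffer.
   Step 1: trust a byte-order mark if one is present.
   Step 2: otherwise count zero bytes at even and odd offsets. */
enum text_guess guess_text_encoding(const unsigned char *buf, size_t len)
{
    size_t i, zero_even = 0, zero_odd = 0, pairs;

    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return GUESS_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return GUESS_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return GUESS_UTF16BE;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0) {
            if (i & 1)
                zero_odd++;
            else
                zero_even++;
        }
    }

    /* Assumed threshold: zeros in more than a quarter of the 16-bit
       units, and none in the other half of each unit. */
    pairs = len / 2;
    if (pairs > 0 && zero_even == 0 && zero_odd * 4 > pairs)
        return GUESS_UTF16LE;   /* low byte first, high byte zero */
    if (pairs > 0 && zero_odd == 0 && zero_even * 4 > pairs)
        return GUESS_UTF16BE;   /* high byte zero comes first */
    return GUESS_ANSI;
}
```

Text without a BOM and without Latin characters (for example pure CJK in UTF-16) falls through to `GUESS_ANSI` here, which is exactly the case where the manual switch described above is still needed.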

aw27

This is another one, this time for an AVX-512 strlen (based on this url)


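strlenZMM:
push esi
mov eax, 01010101h
vpbroadcastd zmm2, eax ; zmm2 = 01010101h in every dword
xor edx, edx ; len = 0
mov eax, 80808080h
vpbroadcastd zmm3, eax ; zmm3 = 80808080h in every dword
mov esi, [esp+8] ; esi = string pointer (first stack argument)
@@:
vmovdqu32 zmm0, ZMMWORD PTR [esi+edx] ; unaligned load of 64 bytes
vpsubd zmm1, zmm0, zmm2 ; zmm1 = x - 01010101h per dword
vpternlogd zmm1, zmm0, zmm3, 32 ; zmm1 = zmm1 AND NOT zmm0 AND zmm3
vptestmd k1, zmm1, zmm1 ; k1 bit set for each dword that holds a zero byte
kmovw eax, k1
movzx eax, ax
test ax, ax
jnz @F ; some dword contains the terminator
add edx, 64 ; none found, advance to the next 64 bytes
jmp short @B
@@:
bsf eax, eax ; index of the first flagged dword
push 32
pop ecx
cmovne ecx, eax ; ecx = dword index (eax is nonzero here)
lea esi, dword ptr [esi+ecx*4] ; esi+edx now points at that dword
cmp byte ptr [esi+edx], 0 ; is byte 0 the terminator?
lea eax, dword ptr [edx+ecx*4] ; eax = length up to the start of that dword
je short @exit
cmp byte ptr [esi+edx+1], 0 ; byte 1?
jne short @F
inc eax
jmp short @exit
@@:
cmp byte ptr [esi+edx+2], 0 ; byte 2?
jne short @F
add eax, 2
jmp short @exit
@@:
add eax, 3 ; otherwise it is byte 3
@exit:
vzeroupper
pop esi
ret
    end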
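
The per-dword test in the loop above (subtract 01010101h, AND with the inverted input, AND with 80808080h) is the classic "does this word contain a zero byte" trick. As a rough scalar illustration, here is the same idea in C, four bytes at a time instead of sixty-four. The names `has_zero_byte` and `strlen_dword` are made up for this sketch, and like the asm routine it reads past the terminator, so it assumes the buffer stays readable up to the end of the 4-byte group that holds the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero exactly when at least one of the four bytes of x is zero. */
static uint32_t has_zero_byte(uint32_t x)
{
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

/* strlen that scans four bytes per step.
   Assumption: the caller guarantees the buffer is readable up to the
   end of the 4-byte group containing the terminator. */
size_t strlen_dword(const char *s)
{
    size_t len = 0;

    assert(s != NULL);

    for (;;) {
        uint32_t w;
        memcpy(&w, s + len, sizeof w);  /* unaligned-safe 4-byte load */
        if (has_zero_byte(w))
            break;                      /* terminator is in this group */
        len += 4;
    }

    /* Locate the terminator inside the flagged group byte by byte,
       the same job the cmp byte ptr chain does in the asm. */
    while (s[len] != '\0')
        len++;
    return len;
}
```

The flag bits above the first zero byte can be false positives because of the borrow in the subtraction, which is why both this sketch and the asm only trust the test to pick the group and then check the bytes individually.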


I added a 4th test for strings between 40000 and 40900 to see if the AVX-512 version pulls away. Well, not really, SSE Intel Silvermont and SSE Intel Atom are right there as well.  :hmmm:

total [0 .. 40], 8++
   290780 cycles 7.asm: sse2
   355355 cycles 5.asm: PCMPISTRI
   412251 cycles 3.asm: SSE Intel Silvermont
   469664 cycles 8.asm: Agner Fog
   502841 cycles 1.asm: SSE 16
   524321 cycles 2.asm: SSE 32
   597335 cycles 9.asm: ZMM AVX512
   865552 cycles 4.asm: SSE Intel Atom
   908227 cycles 6.asm: scasb
   913651 cycles 0.asm: msvcrt.strlen()
   
   
total [41 .. 80], 7++
   270380 cycles 3.asm: SSE Intel Silvermont
   299431 cycles 5.asm: PCMPISTRI
   306940 cycles 7.asm: sse2
   314735 cycles 1.asm: SSE 16
   364536 cycles 9.asm: ZMM AVX512
   380247 cycles 8.asm: Agner Fog
   405156 cycles 2.asm: SSE 32
   639091 cycles 4.asm: SSE Intel Atom
   758265 cycles 6.asm: scasb
   982403 cycles 0.asm: msvcrt.strlen()

total [600 .. 1000], 100++
   202227 cycles 9.asm: ZMM AVX512
   237534 cycles 3.asm: SSE Intel Silvermont
   292854 cycles 4.asm: SSE Intel Atom
   334146 cycles 2.asm: SSE 32
   338568 cycles 1.asm: SSE 16
   356720 cycles 7.asm: sse2
   436840 cycles 8.asm: Agner Fog
   650222 cycles 5.asm: PCMPISTRI
  1438033 cycles 6.asm: scasb
  1830544 cycles 0.asm: msvcrt.strlen()
 
total [40000 .. 40900], 100++
  2161645 cycles 3.asm: SSE Intel Silvermont
  2224521 cycles 4.asm: SSE Intel Atom
  2342704 cycles 9.asm: ZMM AVX512
  3137064 cycles 1.asm: SSE 16
  3465817 cycles 7.asm: sse2
  3514206 cycles 2.asm: SSE 32
  4113016 cycles 8.asm: Agner Fog
  6173622 cycles 5.asm: PCMPISTRI
13022424 cycles 6.asm: scasb
16670776 cycles 0.asm: msvcrt.strlen() 

nidud

#125
deleted

aw27

@nidud,

It has a bug. I have not figured out the logic yet, but it does not get past this:
00007ff65266174e  test         ecx, ecx 
00007ff652661750  jz         0x7ff652661734 

You can debug with VS 2012 or 2015 and the Intel SDE Debugger. I don't know if it works with the Express or Community editions; I have the Pro for those years.
Later: I think it does, because I read this:
http://masm32.com/board/index.php?topic=6473.msg69456#msg69456

aw27

I understood the logic and corrected it as follows; the string does not need to be aligned.


.code

avx512aligned proc
    sub             rsp,8
    xor             rax,rax
    vpbroadcastq    zmm0,rax
    mov             r8,rcx
    mov             rax,rcx
    and             rax,-64
    and             rcx,64-1
    xor             rdx,rdx
    dec             rdx   
    shl             rdx,cl
    vpcmpgtb        k1,zmm0,[rax]
    kmovq           rcx,k1
    and             rcx,rdx
    jnz             L2

L1:
    add             rax,64
    vpcmpgtb        k1,zmm0,[rax]
    kmovq           rcx,k1
    test            rcx,rcx
    jz              L1

L2:
    bsf             rcx,rcx
    lea             rax,[rax+rcx]
    sub             rax,r8
    dec             rax
    add             rsp,8
    ret
avx512aligned endp

end


What a mess there was with the k2! Simply remove it.
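
The reason the routine works on unaligned input is the prologue: round the pointer down to a 64-byte boundary, compare the whole block, then clear the mask bits that belong to bytes in front of the string. Here is a scalar sketch of that idea in C. It is not a translation of the listing: it assumes the per-byte test flags bytes equal to zero, uses a plain loop in place of the AVX-512 compare and `kmovq`, and the names `zero_mask64` and `strlen_block64` are invented for the example. It also assumes the whole 64-byte block around the start and around the terminator is readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 64u

/* Scalar stand-in for "compare 64 bytes, move the mask to a register":
   bit i of the result is set when block[i] == 0. */
static uint64_t zero_mask64(const unsigned char *block)
{
    uint64_t m = 0;
    unsigned i;
    for (i = 0; i < BLOCK; i++)
        if (block[i] == 0)
            m |= (uint64_t)1 << i;
    return m;
}

/* Assumption: every 64-byte block touched by the string is readable. */
size_t strlen_block64(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    const unsigned char *block;
    unsigned skip, bit = 0;
    uint64_t mask;

    assert(s != NULL);

    block = (const unsigned char *)(addr & ~(uintptr_t)(BLOCK - 1));
    skip = (unsigned)(addr & (BLOCK - 1));      /* bytes before the string */

    /* First block: ignore zero bytes that sit before the start. */
    mask = zero_mask64(block) & (~(uint64_t)0 << skip);

    /* Following blocks are aligned and need no masking. */
    while (mask == 0) {
        block += BLOCK;
        mask = zero_mask64(block);
    }

    while (((mask >> bit) & 1u) == 0)           /* portable bit scan forward */
        bit++;
    return (size_t)(block + bit - (const unsigned char *)s);
}
```

Because every load after the first is aligned, a block never straddles a page boundary, which is the usual argument for why reading a few bytes before and after the string is tolerated in practice.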

guga

Hi, DayDreamer
"90% great job :thumbsup: some of the fail % it maybe is creator of code tried to make it harder to disassemble?"
Thanks :thumbsup: :thumbsup: :)  About the code being harder because of the author's choices: well, not necessarily. The vast majority of the time, the disassembler fails because of characteristics of each compiler or its libraries (when the file is not packed, of course). For example, VisualBasic6 code contains a lot of embedded data inside the code section. Some Delphi or Borland files have that too. Plain C using APIs, on the other hand, is somewhat easier to disassemble because code and data are more distinguishable from each other. Some C++ files are somewhat easier too. What makes the disassembly process hard are badly encoded libraries, especially ones made with C++, or files with trash code inside, I mean functions that are never used. In those situations (functions without any reference), following the data chain is tricky, because some of those chunks can be either code or data.

Sure, no disassembler will ever be 100% accurate, but the higher the accuracy we can get, the better. I'm pretty sure that if I finish the DIS system (similar to Flirt in Ida), I can reach something around 98% accuracy, but I have no plans or time to do that right now. I still have to fix a lot of problems inside RosAsm, and try to make its inner functions less attached to the interface (and eventually rebuild it completely). I'll have to isolate the encoder, the disassembler and the debugger, and create a new resource editor; only then can I try to rewrite the interface or implement a new set of macros (or internal routines) to make it work with something closer to masm syntax.


Hi Steve :)
"Using rich edit, the shift from ANSI to UNICODE is simple enough to do. If you have to work in both, load the file and if it looks like garbage, switch from one to the other. Just means reloading the file again. You can make tangled messes that still can't guarantee being correct or switch between the two, the latter is much easier."
Interesting idea. :) Easier indeed.

jj2007

Quote from: guga on November 18, 2019, 11:45:21 AM
"Using rich edit, the shift from ANSI to UNICODE is simple enough..."

The RichEdit control can use rtf under the hood, and thus has no problems using Unicode. In RichMasm, for example, your source can be Ansi or Unicode or a mix of both, no problem. The more interesting part is what you pass from editor to assembler - and that must be Utf-8, if you want to display non-English text in your executable.

guga

Hi JJ, thanks.

I tested the new update and it is working faster :)

About UTF-8: so I always need to convert to UTF-8, either whenever I load a file or immediately before showing it on screen, and there is no need for a user choice? I'm not sure I understood what you meant by passing from editor to assembler.
Do you open a file (Unicode or ANSI) in RichMasm and export it as UTF-8, or is the UTF-8 conversion done internally, only to display the asm text on screen?

jj2007

This is OT but I'll keep it short:

include \masm32\MasmBasic\MasmBasic.inc         ; download
  Init
  uMsgBox 0, "Добро пожаловать", "歡迎", MB_OK
EndOfCode


- RichMasm exports this source to whatever.asm as Utf-8
- the assembler (ML, UAsm, AsmC) sees an 8-bit text but doesn't care whether that's codepage 1252 or 65001 or whatever
- the executable sees an 8-bit text, and gets told via the u in uMsgBox that a MessageBoxW is required, and that it should kindly translate the Utf-8 to Utf-16 before passing it on to MessageBoxW

That's the whole trick. For the coder, it's convenient because he can write everything directly into the source. And of course, comments can be in any encoding; the RichEdit control uses RTF to store them, and the assemblers don't care.
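
For readers wondering what "translate the Utf-8 to Utf-16 before passing it on to MessageBoxW" amounts to at runtime, here is a hand-rolled sketch in C. It is not MasmBasic's code; on Windows the usual route is `MultiByteToWideChar` with `CP_UTF8`. The function name `utf8_to_utf16` and its error convention are assumptions for this example, and it deliberately skips some validation (overlong forms and encoded surrogates are not rejected).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-8 string to UTF-16 code units.
   Returns the number of units written (terminator excluded), or
   (size_t)-1 on malformed input or when dst is too small. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*p) {
        uint32_t cp;
        unsigned extra;

        /* The lead byte tells how many continuation bytes follow. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;
        p++;

        while (extra--) {
            if ((*p & 0xC0) != 0x80)        /* must look like 10xxxxxx */
                return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 >= dst_cap)
                return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= dst_cap)
                return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
    }

    if (n >= dst_cap)
        return (size_t)-1;
    dst[n] = 0;
    return n;
}
```

The assembler never needs to know any of this: it just copies the 8-bit bytes into the executable, and the conversion happens when the string is about to be shown.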

nidud

#132
deleted

aw27

It works very well.  :thumbsup:


total [0 .. 40], 8++
   307793 cycles 1.asm: PCMPISTRI
   443231 cycles 0.asm: AVX-512 aligned
   521571 cycles 2.asm: ZMM AVX512 (older)

total [41 .. 80], 7++
   257807 cycles 0.asm: AVX-512 aligned
   356038 cycles 2.asm: ZMM AVX512 (older)
   370879 cycles 1.asm: PCMPISTRI
   
total [600 .. 1000], 100++
   113553 cycles 0.asm: AVX-512 aligned
   204811 cycles 2.asm: ZMM AVX512 (older)
   859649 cycles 1.asm: PCMPISTRI

total [40000 .. 40800], 100++
   897536 cycles 0.asm: AVX-512 aligned
  2127546 cycles 2.asm: ZMM AVX512 (older)
  5980835 cycles 1.asm: PCMPISTRI 


If you convert the remaining to 64-bit I will test them all.

nidud

#134
deleted