Faster Memcopy ...

guga · November 17, 2019, 11:50:24 AM

Manual option ?

Indeed. Yeah... maybe I'll implement it later, once I finish the fixes. I plan to add a couple of user options as soon as I have some time to dedicate to RosAsm updates again. The current version can identify with relative accuracy what is an ANSI or a Unicode string, but it still has some minor problems, because in some cases a chunk of data can be either a string or a pointer. That is not so easy to fix without the help of other tools, like a signature system I started developing years ago but never finished.
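
To make the string-or-pointer ambiguity concrete, here is a minimal C sketch. It is not RosAsm's actual routine: the function names, the minimum-length parameter and the image base/size values are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when p holds at least min_len printable ASCII bytes followed by a NUL. */
bool looks_like_ansi(const unsigned char *p, size_t n, size_t min_len) {
    assert(p != NULL);
    size_t i = 0;
    while (i < n && p[i] >= 0x20 && p[i] < 0x7F) i++;
    return i >= min_len && i < n && p[i] == 0;
}

/* Same idea for UTF-16LE text restricted to the ASCII range:
   printable low byte, zero high byte, terminated by 00 00. */
bool looks_like_utf16(const unsigned char *p, size_t n, size_t min_len) {
    assert(p != NULL);
    size_t i = 0;
    while (i + 1 < n && p[i] >= 0x20 && p[i] < 0x7F && p[i + 1] == 0) i += 2;
    return i / 2 >= min_len && i + 1 < n && p[i] == 0 && p[i + 1] == 0;
}

/* True when the first four bytes, read as a little-endian dword,
   fall inside [image_base, image_base + image_size). */
bool looks_like_pointer(const unsigned char *p, size_t n,
                        uint32_t image_base, uint32_t image_size) {
    assert(p != NULL);
    if (n < 4) return false;
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return v >= image_base && v - image_base < image_size;
}
```

For the bytes 30 41 40 00, both looks_like_ansi (the text "0A@") and looks_like_pointer (the address 00404130h with an assumed image at 00400000h) answer yes. That is the kind of chunk that needs extra evidence, such as a signature, before it can be classified.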

Manual options could be good to have, as in IDA Pro, allowing the user to choose whether he wants to disassemble C-style strings, DOS-style, Pascal-style, Delphi strings etc. But for those specific cases (strings that are particular to certain compilers), it would be better to do it when (or if) I succeed in finishing the signature technique (I called it DIS, Digital Identification System, many years ago).

The current routine does the job more than 90% of the time for real apps. The disassembler sets a couple of flags on forbidden areas of the PE to keep them from being disassembled, for example sections flagged as import, export, resources etc. It basically identifies the good code and data sections, and those are the ones that are actually disassembled.
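
The "forbidden areas" idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: the RvaRange type and may_disassemble function are hypothetical names, and the sketch only covers the code-section case, while the two section flag values are the ones defined by the PE/COFF format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Section characteristics bits as defined by the PE/COFF format. */
#define IMAGE_SCN_CNT_CODE    0x00000020u
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u

/* Hypothetical half-open RVA range [start, start + size). */
typedef struct { uint32_t start; uint32_t size; } RvaRange;

static bool in_range(RvaRange r, uint32_t rva) {
    return rva >= r.start && rva - r.start < r.size;
}

/* Returns true when the byte at rva may be handed to the decoder:
   its section must be marked as code or executable, and the rva must lie
   outside every forbidden range (import, export, resource data, ...). */
bool may_disassemble(uint32_t rva, uint32_t section_flags,
                     const RvaRange *forbidden, size_t count) {
    assert(forbidden != NULL || count == 0);
    if (!(section_flags & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE)))
        return false;
    for (size_t i = 0; i < count; i++)
        if (in_range(forbidden[i], rva)) return false;
    return true;
}
```

In a real PE file the forbidden ranges would come from the data directory entries of the optional header; here they are simply passed in by the caller.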

One of the problems is that it is common to find embedded data, structures, strings etc. inside the code section. Although the disassembler works reasonably well on those sections, there are files that produce wrong results... but I'll fix those later, once I fix some minor bugs in RosAsm.

I'm comparing the results of the fixes I'm currently doing, and they are more accurate than the ones from IDA Pro (not considering the FLIRT system used in IDA, of course), but there are still some issues.

Eventually I'll try to create a set of macros or internal routines to allow compiling MASM-style syntax, but this will still take more time.

daydreamer · November 17, 2019, 08:50:52 PM

Quote from: hutch-- on November 16, 2019, 09:14:42 PM
Just write a UNICODE editor, no conversions.

I already worked on a Unicode RichEdit, but as an asm programmer I cannot resist downsizing tricks, like halving the size of the text data and then upsizing it by adding the start of the range for the specific language it belongs to before the mov to the character buffer.

daydreamer · November 17, 2019, 09:12:38 PM

Quote from: guga on November 17, 2019, 11:50:24 AM
Manual options could be good to have, as in IDA Pro, allowing the user to choose whether he wants to disassemble C-style strings, DOS-style, Pascal-style, Delphi strings etc. But for those specific cases (strings that are particular to certain compilers), it would be better to do it when (or if) I succeed in finishing the signature technique (I called it DIS, Digital Identification System, many years ago).

The current routine does the job more than 90% of the time for real apps. The disassembler sets a couple of flags on forbidden areas of the PE to keep them from being disassembled, for example sections flagged as import, export, resources etc. It basically identifies the good code and data sections, and those are the ones that are actually disassembled.

90%, great job.

Maybe some of the failing % is because the creator of the code tried to make it harder to disassemble?
Why not make it like a smartphone editor: its best guess of the string format is shown together with a tiny sample, and the user gets the option to choose from a list of different text formats? I had no idea there were loads of different text formats.
I can also imagine that a hardcoded mov eax,"abcd" or mov eax,"ab" ; unicode can be very hard to handle in a disassembler.
It would be good to know how to detect that text files are saved in UTF-16 format.
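
On the UTF-16 question, one common approach is to look for a byte order mark first and fall back to counting zero bytes. The C sketch below is only that, a sketch: the enum, the function name and the thresholds (at least 4 byte pairs, 75% zero bytes on one side) are arbitrary assumptions, and a file that starts with FF FE could in principle also be UTF-32.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ENC_UNKNOWN_8BIT,   /* ANSI, BOM-less UTF-8 or anything else 8-bit */
    ENC_UTF8_BOM,
    ENC_UTF16_LE,
    ENC_UTF16_BE
} TextEncoding;

TextEncoding guess_encoding(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);

    /* 1. Byte order marks are the reliable signal when present. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) return ENC_UTF16_BE;

    /* 2. No BOM: mostly-Latin UTF-16 text has a zero in every other byte. */
    size_t even_zero = 0, odd_zero = 0, pairs = len / 2;
    for (size_t i = 0; i + 1 < len; i += 2) {
        if (buf[i] == 0) even_zero++;
        if (buf[i + 1] == 0) odd_zero++;
    }
    if (pairs >= 4) {
        if (odd_zero * 4 >= pairs * 3 && even_zero == 0) return ENC_UTF16_LE;
        if (even_zero * 4 >= pairs * 3 && odd_zero == 0) return ENC_UTF16_BE;
    }
    return ENC_UNKNOWN_8BIT;
}
```

The fallback only works for text dominated by characters below U+0100; for other scripts the guess has to stay a guess, which is where a user-selectable list of formats would help.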

hutch-- · November 17, 2019, 10:15:24 PM

Using rich edit, the shift from ANSI to UNICODE is simple enough to do. If you have to work in both, load the file and, if it looks like garbage, switch from one to the other. It just means reloading the file. You can make tangled messes that still can't guarantee being correct, or switch between the two; the latter is much easier.

aw27 · November 17, 2019, 11:03:31 PM

This is another one, this time for an AVX-512 strlen (based on this url)

strlenZMM:
	push esi
	mov eax, 01010101h
	vpbroadcastd zmm2, eax ; broadcast eax to all elements
	xor edx, edx ; len = 0
	mov	eax, 80808080h
	vpbroadcastd zmm3, eax
	mov esi, [esp+8]
@@:	
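	vmovdqu32 zmm0, ZMMWORD PTR [esi+edx]	; unaligned load of 64 bytes (16 dwords)
	vpsubd	zmm1, zmm0, zmm2	; x - 01010101h in each dword
	vpternlogd zmm1, zmm0, zmm3, 32	; imm 20h: zmm1 = zmm1 AND (NOT zmm0) AND zmm3
	vptestmd k1, zmm1, zmm1	; k1 bit set for each dword that contains a zero byte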
	kmovw eax, k1
	movzx	eax, ax
	test	ax, ax
	jnz @F
	add	edx, 64	
	jmp short @B
@@:	
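	bsf	eax, eax	; eax = index of the first flagged dword
	push 32
	pop	ecx
	cmovne	ecx, eax	; eax was nonzero, so ZF = 0 and ecx = dword index
	lea	esi, dword ptr [esi+ecx*4]	; esi -> flagged dword (edx is added below)
	cmp	byte ptr [esi+edx], 0	; is the first byte of that dword the terminator?
	lea	eax, dword ptr [edx+ecx*4]	; eax = length if it is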
	je	short @exit
	cmp	byte ptr [esi+edx+1], 0
	jne	short @F
	inc	eax
	jmp	short @exit
@@:
	cmp	byte ptr [esi+edx+2], 0
	jne	short @F
	add	eax, 2
	jmp	short @exit
@@:	
	add	eax, 3
@exit:
	vzeroupper
	pop esi
	ret
    end

I added a 4th test, for strings between 40000 and 40900 bytes, to see if the AVX-512 version pulls away. Well, not really: SSE Intel Silvermont and SSE Intel Atom are right there as well.

total [0 .. 40], 8++
290780 cycles 7.asm: sse2
355355 cycles 5.asm: PCMPISTRI
412251 cycles 3.asm: SSE Intel Silvermont
469664 cycles 8.asm: Agner Fog
502841 cycles 1.asm: SSE 16
524321 cycles 2.asm: SSE 32
597335 cycles 9.asm: ZMM AVX512
865552 cycles 4.asm: SSE Intel Atom
908227 cycles 6.asm: scasb
913651 cycles 0.asm: msvcrt.strlen()

total [41 .. 80], 7++
270380 cycles 3.asm: SSE Intel Silvermont
299431 cycles 5.asm: PCMPISTRI
306940 cycles 7.asm: sse2
314735 cycles 1.asm: SSE 16
364536 cycles 9.asm: ZMM AVX512
380247 cycles 8.asm: Agner Fog
405156 cycles 2.asm: SSE 32
639091 cycles 4.asm: SSE Intel Atom
758265 cycles 6.asm: scasb
982403 cycles 0.asm: msvcrt.strlen()

total [600 .. 1000], 100++
202227 cycles 9.asm: ZMM AVX512
237534 cycles 3.asm: SSE Intel Silvermont
292854 cycles 4.asm: SSE Intel Atom
334146 cycles 2.asm: SSE 32
338568 cycles 1.asm: SSE 16
356720 cycles 7.asm: sse2
436840 cycles 8.asm: Agner Fog
650222 cycles 5.asm: PCMPISTRI
1438033 cycles 6.asm: scasb
1830544 cycles 0.asm: msvcrt.strlen()

total [40000 .. 40900], 100++
2161645 cycles 3.asm: SSE Intel Silvermont
2224521 cycles 4.asm: SSE Intel Atom
2342704 cycles 9.asm: ZMM AVX512
3137064 cycles 1.asm: SSE 16
3465817 cycles 7.asm: sse2
3514206 cycles 2.asm: SSE 32
4113016 cycles 8.asm: Agner Fog
6173622 cycles 5.asm: PCMPISTRI
13022424 cycles 6.asm: scasb
16670776 cycles 0.asm: msvcrt.strlen()
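
For readers who do not want to decode the vpsubd / vpternlogd pair in strlenZMM above, the same zero-byte test can be written for a single dword in C. This is a scalar sketch of the bit trick only, not a translation of the routine; the function names are made up for the example. Like the assembly version, it reads past the terminator (here by up to 3 bytes), so the buffer must be padded accordingly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero exactly when at least one of the four bytes of x is zero:
   (x - 01010101h) AND (NOT x) AND 80808080h. */
static uint32_t has_zero_byte(uint32_t x) {
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

/* strlen that scans four bytes per step. The caller must guarantee that
   the 3 bytes after the terminator are readable. */
size_t strlen_dword(const char *s) {
    assert(s != NULL);
    size_t len = 0;
    for (;;) {
        uint32_t x;
        memcpy(&x, s + len, sizeof x);   /* unaligned 4-byte load */
        if (has_zero_byte(x)) break;     /* terminator is in this dword */
        len += 4;
    }
    while (s[len] != 0) len++;           /* locate it byte by byte */
    return len;
}
```

The AVX-512 routine applies this test to 16 dwords at once and uses the mask register to find the first flagged dword, which is why its fixed setup cost only pays off on the longer strings in the timings above.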

nidud · November 18, 2019, 12:53:02 AM

deleted

aw27 · November 18, 2019, 05:19:44 AM

@nidud,

It has a bug. I have not figured out the logic yet, but it does not get past this:
00007ff65266174e test ecx, ecx
00007ff652661750 jz 0x7ff652661734

You can debug with VS 2012 or 2015 and the Intel SDE Debugger. I don't know if it works with the Express or Community editions; I have the Pro for those years.
Later: I think it does because I read this:
http://masm32.com/board/index.php?topic=6473.msg69456#msg69456

aw27 · November 18, 2019, 08:07:16 AM

I understood the logic and corrected it as follows; the string does not need to be aligned.

.code

avx512aligned proc
    sub             rsp, 8
    xor             rax,rax
    vpbroadcastq    zmm0,rax
    mov             r8,rcx
    mov             rax,rcx
    and             rax,-64
    and             rcx,64-1
    xor             rdx,rdx
    dec             rdx   
    shl             rdx,cl
    vpcmpgtb        k1,zmm0,[rax]
    kmovq           rcx,k1
    and             rcx,rdx
    jnz             L2
 
L1:
    add             rax,64
    vpcmpgtb        k1,zmm0,[rax]
    kmovq           rcx,k1
    test            rcx,rcx
    jz              L1

L2:
    bsf             rcx,rcx
    lea             rax,[rax+rcx]
    sub             rax,r8
    dec             rax
    add             rsp, 8
    ret
avx512aligned endp

end

What a mess there was with the k2! Simply remove it.
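
The alignment handling in the routine above (round the pointer down to a 64-byte boundary, then mask off the bits that belong to bytes before the start of the string) can be sketched in portable C. Several things here are assumptions of the sketch rather than a description of the posted code: a scalar loop stands in for the 64-byte vector compare, the comparison is a plain equality test against zero, and all function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for a 64-byte compare into a mask register:
   bit i of the result is set when block[i] == 0. */
static uint64_t zero_mask64(const unsigned char *block) {
    uint64_t m = 0;
    for (int i = 0; i < 64; i++)
        if (block[i] == 0) m |= (uint64_t)1 << i;
    return m;
}

/* Index of the lowest set bit; m must be nonzero (the job bsf does). */
static int lowest_set_bit(uint64_t m) {
    int i = 0;
    while (!(m & 1)) { m >>= 1; i++; }
    return i;
}

/* strlen that only ever reads whole, 64-byte-aligned blocks. */
size_t strlen_aligned_blocks(const char *s) {
    assert(s != NULL);
    uintptr_t addr = (uintptr_t)s;
    const unsigned char *block = (const unsigned char *)(addr & ~(uintptr_t)63);
    unsigned misalign = (unsigned)(addr & 63);

    /* First block: ignore the bytes that precede the string. */
    uint64_t mask = zero_mask64(block) & (~(uint64_t)0 << misalign);
    while (mask == 0) {                  /* no terminator yet: next block */
        block += 64;
        mask = zero_mask64(block);
    }
    return (size_t)(block + lowest_set_bit(mask) - (const unsigned char *)s);
}
```

Because every load is aligned, a block never straddles a page boundary, so the routine cannot fault by reading past the end of the string's page. The price is the extra masking of the first block.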

guga · November 18, 2019, 11:45:21 AM

Hi, DayDreamer
"90%, great job.

Maybe some of the failing % is because the creator of the code tried to make it harder to disassemble?"
Thanks

:) About the code being harder to disassemble by the creator's choice: well, not necessarily. The vast majority of the time, the disassembler fails because of characteristics of each compiler or its libraries (when the file was not packed, of course). For example, Visual Basic 6 code contains a lot of embedded data inside the code section. Some Delphi or Borland files have that too. Plain C using APIs, on the other hand, is somewhat easier to disassemble because the code and data are more distinguishable from each other. Some C++ files are somewhat easier too. What makes the disassembly process a bit hard are badly encoded libraries, especially when they are made with C++, or when there is trash code inside, I mean, functions that are never used. In those situations (functions that don't have any reference), following the data chain is tricky, because some of those chunks can be either code or data.
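
Why unreferenced functions are a problem shows up even in a toy model. The C sketch below marks as code everything reachable from an entry point by following fall-through and branch targets. The Item record and function name are hypothetical, and real instruction decoding is left out entirely. Whatever stays unmarked is the part that can be "either code or data".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_TARGET ((size_t)-1)

/* Hypothetical record for one location in a toy image. */
typedef struct {
    size_t branch_target;   /* index reached by a call/jump, or NO_TARGET */
    bool   falls_through;   /* execution continues at index + 1 */
} Item;

/* Sets is_code[i] for every item reachable from entry; all other
   entries are cleared and remain unclassified. */
void mark_reachable_code(const Item *items, size_t count,
                         size_t entry, bool *is_code) {
    assert(items != NULL && is_code != NULL);
    for (size_t i = 0; i < count; i++) is_code[i] = false;
    if (entry >= count) return;
    is_code[entry] = true;

    bool changed = true;
    while (changed) {                    /* iterate until nothing new is found */
        changed = false;
        for (size_t i = 0; i < count; i++) {
            if (!is_code[i]) continue;
            size_t t = items[i].branch_target;
            if (t != NO_TARGET && t < count && !is_code[t]) {
                is_code[t] = true;
                changed = true;
            }
            if (items[i].falls_through && i + 1 < count && !is_code[i + 1]) {
                is_code[i + 1] = true;
                changed = true;
            }
        }
    }
}
```

With five items where item 0 calls item 3 and nothing refers to item 2, items 0, 1, 3 and 4 end up marked while item 2 stays unclassified, even if it happens to hold a valid but never-called function.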

Sure, no disassembler will be 100% accurate, but the higher the accuracy rate we can get, the better. I'm pretty sure that if I finish the DIS system (similar to FLIRT in IDA), I can reach something around 98% accuracy, but... no plans/time to do that right now. I still have to fix a lot of problems inside RosAsm, and try to make its inner functions less attached to the interface (and eventually rebuild it completely). I'll have to isolate the encoder, the disassembler and the debugger, and create a new resource editor; only then can I try to rewrite the interface or implement a new set of macros (or internal routines) to make it work with something closer to MASM syntax too.

Hi Steve :)
"Using rich edit, the shift from ANSI to UNICODE is simple enough to do. If you have to work in both, load the file and, if it looks like garbage, switch from one to the other. It just means reloading the file. You can make tangled messes that still can't guarantee being correct, or switch between the two; the latter is much easier."
Interesting idea. :) Easier indeed.

jj2007 · November 18, 2019, 08:58:06 PM

Quote from: guga on November 18, 2019, 11:45:21 AM
"Using rich edit, the shift from ANSI to UNICODE is simple enough..."

The RichEdit control can use rtf under the hood, and thus has no problems using Unicode. In RichMasm, for example, your source can be Ansi or Unicode or a mix of both, no problem. The more interesting part is what you pass from editor to assembler - and that must be Utf-8, if you want to display non-English text in your executable.

guga · November 18, 2019, 11:34:00 PM

Hi Jj. Tks.

I tested the new update and it is working faster :)

About UTF-8: so I always need to convert to UTF-8 before displaying, either whenever I load a file or immediately before I show it on screen, and there is no need for a user choice? I'm not sure I understood what you meant by passing from the editor to the assembler.
Do you open a file (Unicode or ANSI) in RichMasm and export it as UTF-8, or is the UTF-8 conversion done internally, only to display the asm text on the screen?

jj2007 · November 19, 2019, 12:35:25 AM

This is OT but I'll keep it short:

include \masm32\MasmBasic\MasmBasic.inc ; download
Init
uMsgBox 0, "Добро пожаловать", "歡迎", MB_OK
EndOfCode

- RichMasm exports this source to whatever.asm as Utf-8
- the assembler (ML, UAsm, AsmC) sees an 8-bit text but doesn't care whether that's codepage 1252 or 65001 or whatever
- the executable sees an 8-bit text, and gets told via the u in uMsgBox that a MessageBoxW is required, and that it should kindly translate the Utf-8 to Utf-16 before passing it on to MessageBoxW

That's the whole trick. For the coder, it's convenient because he can write everything directly into the source. And of course, comments can be in any encoding: the RichEdit control uses RTF to store them, and the assemblers don't care.
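
The last step in that list, translating the 8-bit UTF-8 text to UTF-16 at run time, is what the Win32 call MultiByteToWideChar does with code page 65001. To show what the conversion involves without depending on Windows, here is a portable C sketch. It is not MasmBasic's code: the function name is invented, and the decoder is deliberately minimal (it rejects broken continuation bytes but does not check for overlong forms or encoded surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts NUL-terminated UTF-8 to NUL-terminated UTF-16 code units.
   cap is the capacity of dst in code units, terminator included.
   Returns the number of units written (without the terminator),
   or (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap) {
    assert(src != NULL && dst != NULL);
    const unsigned char *p = (const unsigned char *)src;
    size_t n = 0;
    while (*p) {
        uint32_t cp;
        int extra;                        /* continuation bytes expected */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;
        p++;
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p++ & 0x3F);
        }
        if (cp >= 0x10000) {              /* needs a surrogate pair */
            if (cp > 0x10FFFF || n + 2 >= cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
    }
    if (n >= cap) return (size_t)-1;      /* no room for the terminator */
    dst[n] = 0;
    return n;
}
```

Since the assembler only ever sees bytes, the source can stay 8-bit all the way to the executable, and only the wide-character API call needs the converted buffer.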

nidud · November 19, 2019, 03:17:18 AM

deleted

aw27 · November 19, 2019, 05:37:46 AM

It works very well.

total [0 .. 40], 8++
   307793 cycles 1.asm: PCMPISTRI
   443231 cycles 0.asm: AVX-512 aligned
   521571 cycles 2.asm: ZMM AVX512 (older)

total [41 .. 80], 7++
   257807 cycles 0.asm: AVX-512 aligned
   356038 cycles 2.asm: ZMM AVX512 (older)
   370879 cycles 1.asm: PCMPISTRI
   
total [600 .. 1000], 100++
   113553 cycles 0.asm: AVX-512 aligned
   204811 cycles 2.asm: ZMM AVX512 (older)
   859649 cycles 1.asm: PCMPISTRI

total [40000 .. 40800], 100++
   897536 cycles 0.asm: AVX-512 aligned
  2127546 cycles 2.asm: ZMM AVX512 (older)
  5980835 cycles 1.asm: PCMPISTRI

If you convert the remaining ones to 64-bit, I will test them all.

nidud · November 19, 2019, 07:12:37 AM

deleted

The MASM Forum
