Which registers are safe, control questions

Started by Alek, December 03, 2018, 11:53:12 AM

Alek

Hey guys sorry for the late reply (in the process of moving states!), I'll try all these suggestions here soon. Thanks again for all the help - will report back  :biggrin:

Alek

Alright I had some time to go through the examples you guys had for me.

1. The first one which you posted hutch (Reply #12), wouldn't ptxt1 be the same as "offset txt1"? I could be mistaken.

2. Vortex - this is VERY close to what I was looking for. The MS C runtime example however doesn't look like the strings get combined - instead you are just printing them off together. The first example is close though.

-It looks like the stdin call moves whatever is in the input to the buffer
-string2 is then appended to the buffer

Cool, but can we do this WITHOUT making the stdin calls?  For instance:

.data
buffer db BUFFER_SIZE dup(?)

.code
mov offset buffer, offset ["hello world"]


Also do you have the official docs on szCatStr?


3. The third example by hutch was a bit beyond me right now, I think mainly because of the UI stuff (I think it's UI). I'll have to check that out again once I get a bit better.

hutch--

Hi Alek,

    txt1 db "This is a test",0  ; the text
    ptxt1 dq txt1               ; the pointer to it

As long as you put the pointer variable after the data, MASM will load the address of the string data into the QWORD variable. It gives you a tidy single variable that can be used with mnemonics or API/Procedure calls. OFFSET still works in some contexts in 64 bit MASM but it is not the same as in 32 bit. The /LARGEADDRESSAWARE format in Win64 causes problems with OFFSET where LEA or the method above don't have any problems at all.
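For readers coming from C, the data/pointer pair above corresponds roughly to a character array plus a pointer variable initialised with its address. This is only a sketch of the analogy: the C names mirror the MASM ones, and text_length is a hypothetical helper added to show the pointer being passed to a procedure.

```c
#include <stddef.h>
#include <string.h>

/* Rough C analogue of:
     txt1  db "This is a test",0
     ptxt1 dq txt1                                  */
char txt1[] = "This is a test";   /* the text itself */
char *ptxt1 = txt1;               /* a pointer variable holding its address */

/* Hypothetical helper: receives the pointer by value, the way an API
   or procedure call would receive ptxt1. */
size_t text_length(const char *p)
{
    return strlen(p);
}
```

Calling text_length(ptxt1) passes the stored address, which is the same value you would get from txt1 directly.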

You will find "szCatStr proc src:QWORD, lpAdd:QWORD" in the "m64lib" directory and it's useful enough for a single text append, but there is a better, faster version that is macro driven called "mcat", where each append does not need to scan the length. This is straight out of the help file.

  mcat
    Add a number of strings to a pre-allocated buffer.
    This can be either LOCAL or in the .DATA? section and
    must be large enough to hold all of the characters.

    mcat pBuffer,"String 1",ptxt1,str$(number),"More Text"

    Limits are either text line length or 24 arguments, whichever comes first.
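
The idea of appending several strings in one pass, without rescanning the buffer for each one, can be sketched in C. This is not how mcat is implemented; multi_cat is a hypothetical helper, and it assumes the buffer already holds a zero-terminated string (possibly empty) and is large enough for the result.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper, not part of any library: append `count` C strings
   to `buf` and return the final length. The caller must make sure `buf`
   holds a zero-terminated string and has room for everything. */
size_t multi_cat(char *buf, int count, ...)
{
    va_list args;
    char *end = buf;

    while (*end != '\0')          /* find the end of the existing text once */
        ++end;

    va_start(args, count);
    for (int i = 0; i < count; ++i) {
        const char *src = va_arg(args, const char *);
        while (*src != '\0')      /* copy, keeping `end` at the write position */
            *end++ = *src++;
    }
    va_end(args);

    *end = '\0';                  /* terminate the combined string */
    return (size_t)(end - buf);
}
```

Because the write position is carried from one argument to the next, the existing text is scanned only once no matter how many strings are appended.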


The third example is a bit messy but it shows that assembler code can be written in many different ways. Unlike at least some high level languages, there is no "One True Way" but many different ways to do things, and it's part of the flexibility of assembler coding. You obviously come from a C background and that will be useful to you, but be careful not to let it become a straitjacket on the style of assembler you want to write.

hutch--

Here is a test piece for you. I have attached it as a zip file. Two different methods of copying text data from one buffer to another. The first with a stack frame, the second without. Note that there are many ways to do things like this.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .data
      text db "The time has come the walrus said,"
           db " to speak of many things.",13,10,13,10,0     ; split line technique

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    LOCAL buf1[128]:BYTE            ; local 128 byte buffer
    LOCAL ptr1 :QWORD               ; pointer for source text
    LOCAL pout :QWORD               ; pointer for output buffer

    lea rax, text                   ; load buffer address
    mov ptr1, rax                   ; store it in LOCAL pointer

    lea rax, buf1                   ; get the output buffer address
    mov pout, rax                   ; store it in LOCAL pointer

    invoke cpytxt,ptr1,pout         ; call the procedure

    conout pout,lf                  ; display with console output

    waitkey                         ; stop so you can see the results
    .exit                           ; macro to exit process

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

CopyText proc psrc:QWORD,pdst:QWORD

    mov r11, psrc                   ; load source address
    mov r10, pdst                   ; load destination address
    mov r9, -1                      ; set up the loop increment

  lbl:
    add r9, 1                       ; start at 0, increment offset
    movzx rax, BYTE PTR [r11+r9]    ; copy BYTE into a register
    mov BYTE PTR [r10+r9], al       ; copy the byte to output buffer
    test rax, rax                   ; test if RAX is terminator
    jnz lbl                         ; loop back if not

    ret                             ; return to caller

CopyText endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

NOSTACKFRAME

cpytxt proc

    mov r11, -1                     ; set up the loop increment

  lbl:
    add r11, 1                      ; start at 0, increment offset
    movzx rax, BYTE PTR [rcx+r11]   ; copy BYTE into a register
    mov BYTE PTR [rdx+r11], al      ; copy the byte to output buffer
    test rax, rax                   ; test if RAX is terminator
    jnz lbl                         ; loop back if not

    ret                             ; return to caller

cpytxt endp

STACKFRAME

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end
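
Since Alek comes from C, here is a rough C rendering of the copy loop used in both procedures above. It is only a sketch for comparison: copy_text is a hypothetical name, and the index deliberately starts at "minus one" to mirror the mov r9, -1 / add r9, 1 pattern.

```c
#include <stddef.h>

/* Hypothetical C rendering of the loop: copy bytes up to and including
   the zero terminator, indexing both buffers with the same offset.
   `dst` must be large enough to hold the source text. */
void copy_text(const char *src, char *dst)
{
    size_t i = (size_t)-1;        /* mirrors "mov r9, -1" */
    char c;

    do {
        i += 1;                   /* first pass wraps round to offset 0 */
        c = src[i];               /* read a byte from the source */
        dst[i] = c;               /* write it to the destination */
    } while (c != '\0');          /* stop once the terminator is copied */
}
```

As in the assembler version, the terminator is written before the test, so the destination is always zero terminated when the loop exits.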

TimoVJL

What an optimizing C compiler does:

//#include <string.h>
#include <stdio.h>   /* for puts */
char * strcpy(char * dst, const char * src);
#pragma intrinsic(strcpy)
char *s = "test string";
int __cdecl main(void)
{
    char buf[260];
    strcpy(buf, s);
    puts(buf);
    return 0;
}

main:
  [0000000000000000] 4881EC38010000               sub               rsp,138h
  [0000000000000007] 488B0500000000               mov               rax,qword ptr [s]
  [000000000000000E] 488D542420                   lea               rdx,[rsp+20h]
  [0000000000000013] 482BD0                       sub               rdx,rax
  [0000000000000016] 66660F1F840000000000         nop               [rax+rax+0]
  [0000000000000020] 0FB608                       movzx             ecx,byte ptr [rax]
  [0000000000000023] 880C02                       mov               byte ptr [rdx+rax],cl
  [0000000000000026] 488D4001                     lea               rax,[rax+1]
  [000000000000002A] 84C9                         test              cl,cl
  [000000000000002C] 75F2                         jne               0000000000000020
  [000000000000002E] 488D4C2420                   lea               rcx,[rsp+20h]
  [0000000000000033] E800000000                   call              puts
  [0000000000000038] 33C0                         xor               eax,eax
  [000000000000003A] 4881C438010000               add               rsp,138h
  [0000000000000041] C3                           ret

aw27

Strange, I always heard that Microsoft VS could not optimize better than other compilers.  :badgrin:


#define _CRT_SECURE_NO_WARNINGS
#include <string.h>
#include <stdio.h>


char s[] = "test string";

int main()
{
00007FF719CF1000  sub         rsp,138h 
char buf[260];
strcpy(buf, s);
00007FF719CF1007  xor         eax,eax 
char buf[260];
strcpy(buf, s);
00007FF719CF1009  lea         rdx,[s (07FF719CF3038h)] 
00007FF719CF1010  movzx       ecx,byte ptr [rax+rdx] 
00007FF719CF1014  mov         byte ptr buf[rax],cl 
00007FF719CF1018  lea         rax,[rax+1] 
00007FF719CF101C  test        cl,cl 
00007FF719CF101E  jne         main+10h (07FF719CF1010h) 
puts(buf);
00007FF719CF1020  lea         rcx,[buf] 
00007FF719CF1025  call        qword ptr [__imp_puts (07FF719CF2150h)] 
return 0;
00007FF719CF102B  xor         eax,eax 
}
00007FF719CF102D  add         rsp,138h 
00007FF719CF1034  ret


OOPS, sorry it can do a little better:  :t


#define _CRT_SECURE_NO_WARNINGS
#include <string.h>
#include <stdio.h>


char s[] = "test string";

int main()
{
00007FF7A4E51000  sub         rsp,138h 
char buf[260];
strcpy(buf, s);
00007FF7A4E51007  xor         ecx,ecx 
00007FF7A4E51009  lea         rax,[s (07FF7A4E53020h)] 
00007FF7A4E51010  mov         al,byte ptr [rcx+rax] 
00007FF7A4E51013  inc         rcx 
00007FF7A4E51016  mov         byte ptr [rsp+rcx+1Fh],al 
00007FF7A4E5101A  test        al,al 
00007FF7A4E5101C  jne         main+9h (07FF7A4E51009h) 
puts(buf);
00007FF7A4E5101E  lea         rcx,[buf] 
00007FF7A4E51023  call        qword ptr [__imp_puts (07FF7A4E520F0h)] 
return 0;
00007FF7A4E51029  xor         eax,eax 
}
00007FF7A4E5102B  add         rsp,138h 
00007FF7A4E51032  ret 




hutch--

The humour is that any of these algos, when even vaguely optimised, will hit the memory wall.

aw27

Quote from: hutch-- on December 09, 2018, 12:16:47 AM
The humour is that any of these algos, when even vaguely optimised, will hit the memory wall.

We need to keep watching that matter closely, but technology always finds solutions in due time.  :icon_rolleyes:

Vortex

Hi Alek,

Quote from: Alek on December 08, 2018, 11:50:03 AM
2. Vortex - this is VERY close to what I was looking for. The MS C runtime example however doesn't look like the strings get combined - instead you are just printing them off together. The first example is close though.

-It looks like the stdin call moves whatever is in the input to the buffer
-string2 is then appended to the buffer

Cool, but can we do this WITHOUT making the stdin calls?  For instance:

.data
buffer db BUFFER_SIZE dup(?)

.code
mov offset buffer, offset ["hello world"]


Also do you have the official docs on szCatStr?

About the masm32 library functions :

\masm32\help\masmlib.chm -> Zero Terminated String Functions -> szCatStr

Here is the new C run-time example, in case you prefer the traditional strcat function:
include     \masm32\include\masm32rt.inc

BUFFER_SIZE = 128

.data

format      db '%s',0
string1     db 'Please type your name : ',13,10,0
string2     db ', hello world.',0

.data?

buffer      db BUFFER_SIZE dup(?)

.code

start:

    invoke  crt_printf,ADDR string1

    invoke  crt_scanf,ADDR format,ADDR buffer

    invoke  crt_strcat,ADDR buffer,ADDR string2

    invoke  crt_printf,ADDR format,ADDR buffer

    invoke  ExitProcess,0

END start


The CTXT macro probably does what you wish, the definition can be found in the file \masm32\macros\macros.asm
include     \masm32\include\masm32rt.inc

BUFFER_SIZE = 128

.data?

buffer      db BUFFER_SIZE dup(?)

.code

start:

    invoke  StdOut,CTXT("Please type your name",13,10)

    invoke  StdIn,ADDR buffer,BUFFER_SIZE

    invoke  szCatStr,ADDR buffer,CTXT(", hello world.")

    invoke  StdOut,ADDR buffer

    invoke  ExitProcess,0

END start



Alek

Quote from: hutch-- on December 08, 2018, 05:16:48 PM
Here is a test piece for you. I have attached it as a zip file. Two different methods of copying text data from one buffer to another. The first with a stack frame, the second without. Note that there are many ways to do things like this.

<edited to save space>


Insane - incredibly well documented too this is awesome


@Vortex: During this I also realized I could just call C functions, like strcpy. Any differences between this and szCatStr besides performance between the two functions? Is there a performance overhead of calling C functions?

jj2007

Quote from: Alek on December 09, 2018, 09:55:17 AM
Is there a performance overhead of calling C functions?

Not really. If you are worried about performance, check the Faster Memcopy thread.

Quote from: hutch-- on December 08, 2018, 05:16:48 PM
Note that there are many ways to do things like this.

Here is an extremely fast algo:
  mov ecx, (sizeof somestring)/8
  .Repeat
        dec ecx
        fld real8 ptr somestring[8*ecx]
        fstp real8 ptr somedest[8*ecx]
  .Until Zero?

Alek

Quote from: jj2007 on December 09, 2018, 11:34:36 AM
Quote from: Alek on December 09, 2018, 09:55:17 AM
Is there a performance overhead of calling C functions?

Not really. If you are worried about performance, check the Faster Memcopy thread.

I'm not too concerned, I was just curious why someone would want to use szCatStr over strcpy

jj2007

Quote from: Alek on December 09, 2018, 11:37:52 AM
why someone would want to use szCatStr over strcpy

Check the docs, they don't do the same thing. There is also szMultiCat btw - the Zero Terminated String Functions section in MasmLib.chm is a fascinating one 8)
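
In C terms the difference is the one between copying and concatenating: strcpy replaces whatever the buffer held, while an append routine such as strcat (the C counterpart used in Vortex's example) keeps the existing text and writes after it. A small sketch, where build_greeting is a hypothetical helper and the buffer is assumed to be large enough:

```c
#include <string.h>

/* Hypothetical helper: builds "<name>, hello world." in buf.
   buf must be large enough for the result. */
char *build_greeting(char *buf, const char *name)
{
    strcpy(buf, name);               /* copy: overwrites anything already in buf */
    strcat(buf, ", hello world.");   /* concatenate: appends after the existing text */
    return buf;
}
```

This also shows the buffer being filled from a literal without any console input, which was the earlier question in this thread.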

Vortex

Hi Alek,

Quote:
@Vortex: During this I also realized I could just call C functions, like strcpy. Any differences between this and szCatStr besides performance between the two functions? Is there a performance overhead of calling C functions?

I don't think that there would be a dramatic difference in general. The C functions are good for practical programming.

hutch--

Erol is right; you will tend to find timing variations are usually hardware based, as different processors yield different times. The "szCatStr" proc scans the length then writes to the end of the buffer. It is no slouch and it's handy for a few appends, but if you are doing a high count of appends, there are better ways to do it where you don't have to repeatedly scan the string length. The "szappend" proc is a lot more efficient with streamed appends as it uses a current location pointer to update the next write address.
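
The current location technique can be sketched in C as follows. This illustrates the idea only and is not the szappend source or its exact interface: append_at is a hypothetical name, and the buffer is assumed to be large enough for everything written to it.

```c
#include <stddef.h>

/* Hypothetical helper: write `src` into `buf` starting at offset `pos`
   and return the offset of the new terminator, so the next call can
   continue from there without rescanning the buffer. */
size_t append_at(char *buf, const char *src, size_t pos)
{
    while (*src != '\0')
        buf[pos++] = *src++;      /* copy one byte and advance the write offset */
    buf[pos] = '\0';              /* keep the buffer zero terminated */
    return pos;
}
```

A caller starts with pos = 0 and feeds each return value into the next call, for example pos = append_at(buf, "one", pos); followed by pos = append_at(buf, "+two", pos);. Each append then costs only the length of the new text, however long the buffer has already grown.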