issues with dereferencing iteration of array of long elements in a loop

Started by cyrus, January 14, 2024, 12:20:40 PM

sinsi

Quote from: NoCforMe on January 14, 2024, 04:07:32 PM
My recommendation, take it or leave it: Forget 64-bit programming. Completely overkill and a pain in the ass besides. Win32 forever!

It is nice to allocate 8GB to work with an SQL table and have the WHOLE F'N THING in memory  :cool:

NoCforMe

I should have said "forget 64-bit programming except in certain circumstances where you need humongous amounts of memory" ...
Assembly language programming should be fun. That's why I do it.

cyrus

Quote from: sinsi on January 14, 2024, 03:34:58 PM
You need to read up about spill/shadow space and passing parameters for 64-bit.
    sub rsp, 28h+256                    ;reserve stack space for called functions
    lea r15, [rsp+28h]   ; buffer starts above the 28h reserved for calls; delete the later line before the call to GetModuleBaseName
This change seems to *not crash*

You normally allocate 4 qwords for the spill. If a Windows function you call has more than 4 parameters then you would allocate that many. Note that you MUST allocate a minimum of 4.

Once you have set up your stack, don't touch it - no more "sub rsp,20h/add rsp,20h" pairs, the initial adjustment will take care of it.

I have noticed that setting aside stack space the way you stated ('sub rsp, 256') and then using that for my buffer doesn't end up working in some cases, and I'll tell you why. When you reserve stack space that way, it's going to have random data, not null bytes. When you try to use that for a buffer, you never know what you'll get, and often your buffer will contain other data and not work. I do use that style of subtracting stack space when I am going to dedicate the space to a structure that gets populated, like PROCESS_INFORMATION in CreateProcessA or WSADATA, or when I am in a read loop from a network socket. That buffer is going to fill up entirely with the data I am reading in and then gets null-terminated.

However, I believe my weakness with asm in general is the stack. I have one program where I have to make 2 calls to printf with an empty string because it won't work otherwise, and I've written quite a few programs with perfect stack alignment, so I don't know what that issue is.

I believe the sub rsp, 20h is required for every function call, isn't it? I read about this before in 64-bit programming. The add rsp, 20h is only necessary when I am in a loop. If I leave it out, stack overflow.

cyrus

Quote from: NoCforMe on January 14, 2024, 04:40:37 PM
I should have said "forget 64-bit programming except in certain circumstances where you need humongous amounts of memory" ...

I understand 32-bit is more fun to program, but I have to actually program this for current systems, which are 64-bit lol.

sinsi

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases ...

Two reasons to fail: 256 is not enough, or it misaligns the stack.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
When you reserve stack space that way, it's going to have random data, not null bytes.

As for any LOCAL variable, you set it up for the call; if the call returns no error, the buffer has to be correct.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I believe the sub rsp, 20h is required for every function call isn't it? I read about this before in 64-bit programming. The add rsp, 20h is only necessary when I am in a loop. If I leave it out, stack overflow.

A Windows function uses at least 4 spill slots, that's what the "sub rsp,20h" is, assuming the stack is aligned (which it isn't on entry).
You are way off here, study the Win64 ABI.
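
To make the "set up the stack once" advice concrete, here is a minimal ml64 sketch, not taken from anyone's code in this thread. It assumes a console program linked against kernel32.lib with main as the entry point; the label "again" and the choice of Sleep are only for illustration. RSP is adjusted once at the top, and the loop then makes its calls without any further sub/add pairs:

```asm
extrn Sleep:proc
extrn ExitProcess:proc

.code
main proc
    push rbx                ; save a non-volatile register for the loop counter
                            ; (RSP was 16n+8 on entry, now it is 16n)
    sub  rsp, 20h           ; 4 spill slots, reserved once for every call below
    mov  ebx, 3             ; loop counter survives the calls because RBX is non-volatile
again:
    xor  ecx, ecx           ; 1st argument: dwMilliseconds = 0
    call Sleep              ; RSP is the same on every iteration
    dec  ebx
    jnz  again
    xor  ecx, ecx           ; 1st argument: uExitCode = 0
    call ExitProcess        ; does not return, so no epilogue is needed here
main endp
end
```

Because RSP never moves inside the loop, there is nothing to "add back" and nothing that can overflow, however many iterations run.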

NoCforMe

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases and I'll tell you why. When you reserve stack space that way, it's going to have random data, not null bytes. When you try to use that for a buffer, you never know what you'll get [...]

Yes. It's the same with any variables allocated on the stack as LOCALs. The rule is, when using any such stack-allocated space, ASSUME it contains garbage.

You can clear stack space just like any other space by using REP STOSB or in a loop by setting it to the desired value. For instance (32-bit example here):
    PUSH    EDI
    LEA    EDI, <variable you want to clear>
    MOV    ECX, <size of variable in bytes>
    MOV    AL, <value to fill variable with>
    REP    STOSB
    POP    EDI

   --or--

    LEA    EDX, <variable you want to clear>
    MOV    ECX, <size of variable in bytes>
    MOV    AL, <value to fill variable with>
@@:    MOV    [EDX], AL
    INC    EDX
    LOOP    @B

You can clear the space using words, dwords or qwords as well.

Also, if the stack space is going to receive the results of a function call like your EnumProcesses(), it doesn't matter what's in the buffer: the function will just overwrite it, so no need to initialize it.
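
A 64-bit version of the first variant might look like the sketch below. It assumes the Win64 convention, where RDI is non-volatile (so it is saved and restored) and the direction flag is clear on entry; the procedure name and the 256-byte size are made up for illustration.

```asm
.code
ClearLocalBuffer proc
    push rdi                ; RDI must be preserved for the caller in Win64
    sub  rsp, 100h          ; reserve a 256-byte local buffer (contents are garbage)
    mov  rdi, rsp           ; RDI -> start of the buffer
    xor  eax, eax           ; AL = 0, the fill value
    mov  ecx, 100h          ; byte count (writing ECX zero-extends into RCX)
    rep  stosb              ; store AL at [RDI], RCX times
    ; ... use the zeroed buffer at [rsp] here ...
    add  rsp, 100h          ; release the buffer
    pop  rdi
    ret
ClearLocalBuffer endp
end
```

This procedure calls nothing, so it reserves no spill space; a version that calls Windows functions would need an extra 20h below the buffer.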

jj2007

Quote from: cyrus on January 14, 2024, 01:18:33 PM
I have debugged that and it does not fail

So did I, and as Sinsi wrote, it will brutally fail for values over 1023*).
Test it (the code is Masm64 SDK compatible, unlike yours):

include \masm64\include64\masm64rt.inc
.code
entry_point proc
  xor rax, rax
  xor rbx, rbx
  INT 3
  mov ax, 1234h        ; simulated WORD PTR [cbNeeded]
  mov bl, 4h        ; size of long
  div bl        ; before: eax=1234h, ebx=4h; the quotient 48Dh does not fit in AL, so DIV raises a divide error
  conout str$(eax)
  invoke ExitProcess, 0
entry_point endp
end

*) Actually, it is much more complicated, see attachment.
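
One way to avoid the 8-bit overflow, sketched here with a made-up variable and procedure name: with a byte divisor, DIV puts the quotient in AL, so dividing any AX of 1024 or more by 4 faults. Using a 32-bit divisor (or, since 4 is a power of two, a shift) leaves plenty of room:

```asm
.data
cbNeeded dd 1234h           ; hypothetical byte count, standing in for the EnumProcesses result

.code
CountEntries proc
    mov  eax, cbNeeded      ; 32-bit dividend
    xor  edx, edx           ; DIV r/m32 divides EDX:EAX, so the high half must be zero
    mov  ecx, 4             ; size of one DWORD entry
    div  ecx                ; EAX = 48Dh, EDX = 0; the quotient fits easily in 32 bits
    ; for a power-of-two divisor, "shr eax, 2" gives the same quotient
    ret
CountEntries endp
end
```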

TimoVJL

With poasm:
ifdef __UASM__
.x64
.Model flat
endif
ExitProcess PROTO STDCALL :DWORD
.code
_mainCRTStartup proc
  xor rax, rax
  xor rbx, rbx
  INT 3
  mov ax, 1234h        ; simulated WORD PTR [cbNeeded]
  mov bl, 4h        ; size of long
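  div bl        ; before: eax=1234h, ebx=4h; the quotient 48Dh does not fit in AL, so DIV raises a divide error
  ;conout str$(eax)
  ;invoke ExitProcess, 0
  sub rsp, 28h        ; spill space plus 8 bytes to align the stack
  xor ecx, ecx        ; uExitCode = 0 (the first argument goes in RCX)
  call ExitProcess    ; just for ml64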
_mainCRTStartup endp
end
May the source be with you

cyrus

Quote from: sinsi on January 14, 2024, 06:33:13 PM
Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases ...
Two reasons to fail: 256 is not enough, or it misaligns the stack.

That is a good point; I missed that it may break the stack alignment there.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
When you reserve stack space that way, it's going to have random data, not null bytes.
As for any LOCAL variable, you set it up for the call; if the call returns no error, the buffer has to be correct.

I already know that local variables are set up for that call. In this case, I am setting up the buffer for each call. Could I do what NoCforMe mentioned, declare my buf as 256 bytes in the .data section initialized to 0, and then use REP STOSB before each call to clear it out? Yes, but I'm not sure that is more efficient than simply pushing 256 null bytes on the stack. Is it? If so, I may use that for the increase in performance, but I doubt it would matter in that regard. Maybe if it were megabytes.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I believe the sub rsp, 20h is required for every function call isn't it? I read about this before in 64-bit programming. The add rsp, 20h is only necessary when I am in a loop. If I leave it out, stack overflow.
A Windows function uses at least 4 spill slots, that's what the "sub rsp,20h" is, assuming the stack is aligned (which it isn't on entry).
You are way off here, study the Win64 ABI.

What am I way off on exactly? I did mention a windows function uses 32 bytes so why are you telling me that?

cyrus

Quote from: jj2007 on January 14, 2024, 08:07:06 PM
Quote from: cyrus on January 14, 2024, 01:18:33 PM
I have debugged that and it does not fail

So did I, and as Sinsi wrote, it will brutally fail for values over 1023*).
*) Actually, it is much more complicated, see attachment.

Good point. I overlooked anything over 1023, so that makes sense.

cyrus

Quote from: NoCforMe on January 14, 2024, 07:00:54 PM
Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases and I'll tell you why. When you reserve stack space that way, it's going to have random data, not null bytes. When you try to use that for a buffer, you never know what you'll get [...]

Yes. It's the same with any variables allocated on the stack as LOCALs. The rule is, when using any such stack-allocated space, ASSUME it contains garbage.

Also, if the stack space is going to receive the results of a function call like your EnumProcesses(), it doesn't matter what's in the buffer: the function will just overwrite it, so no need to initialize it.

I did mention that when I have a buffer I'm going to fill entirely, the 'sub rsp' method works just fine. The problem is in cases like this one, where the data varies, I don't know how large it may be, and I'm comparing strings. In this particular case I know 'notepad.exe' is only 11 bytes, so if data from other PIDs is read into the 11-byte buffer I don't care, but it may overflow onto something else, and I figure 256 bytes isn't much to push onto the stack.

Thanks for the tip on clearing a buffer. 2 things here.

1. Is that more efficient than declaring my buffer in the .data section, initializing it to 0, then simply doing that for each call when I am in the loop? Or is simply pushing 256 bytes on the stack just as efficient?

2. I managed to "clear" my buffer by doing
   
mov qword ptr [r15], 0           ; clear the buffer, otherwise it will end up in an infinite loop thinking it is always there
This assumes r15 points to the start of the 256 bytes I reserved on the stack. I believe it just puts a null terminator there, so it may not clear the entire buffer, but I believe it is sufficient for strcmp.

sinsi

Quote from: cyrus on January 15, 2024, 06:35:23 AM
What am I way off on exactly? I did mention a windows function uses 32 bytes so why are you telling me that?

It gets tricky when a function has more than 4 parameters: the extra ones get put onto the stack, usually by a series of "mov [rsp+28h],rax" and so on, so it's easy to lose track of where RSP is.
Even if a function has 0 parameters, it still needs those 32 bytes, that's part of the ABI.

cyrus

Quote from: sinsi on January 15, 2024, 09:32:51 AM
Quote from: cyrus on January 15, 2024, 06:35:23 AM
What am I way off on exactly? I did mention a windows function uses 32 bytes so why are you telling me that?
It gets tricky when a function has more than 4 parameters, the extra ones get put onto the stack, usually by a series of "mov [rsp+28h],rax" and so on, so it's easy to lose track of where RSP is.
Even if a function has 0 parameters, it still needs those 32 bytes, that's part of the ABI.

Ok, I totally know that. Here is an example of how I call WSASocketA. In 32-bit, I used push. In 64-bit, I do exactly what is required.

    ; call WSASocketA
    sub rsp, 30h
    xor r9, r9                       ; 4th arg: lpProtocolInfo=NULL (uses itself from above: NULL)
    ;push r9                          ; 6th arg: dwFlags=NULL
    ;push r9                          ; 5th arg: g=NULL
    mov QWORD PTR [rsp + 28h], 00h  ; 6th arg: dwFlags=NULL
    mov QWORD PTR [rsp + 20h], 00h  ; 5th arg: g=NULL
    xor r8, r8
    mov r8b, 6h                    ; 3rd arg: protocol=6
    xor rdx, rdx
    mov dl, 1h                     ; 2nd arg: type=1
    xor rcx, rcx
    mov cl, 2h                     ; 1st arg: af=2
    call WSASocketA                ; call WSASocketA
    mov sockfd, rax                ; save socket descriptor of WSASocketA to sockfd variable

sinsi

callWSASocketA PROC
    ;on entry, the stack is misaligned. We have 6 arguments, so need to add 8 bytes to align it
    sub rsp, 38h ;This would be at the top of this proc so every function call can re-use it
                 ;As a bonus it gives us 8 bytes to use at [RSP+30..37] (this time)
    ;swap some code around to cut down on size
    xor r9d,r9d                     ; 4th arg: lpProtocolInfo=NULL (uses itself from above: NULL)
    mov [rsp+28h],r9                ; 6th arg: dwFlags=NULL
    mov [rsp+20h],r9                ; 5th arg: g=NULL
    ;the next 3 args are of type 'int' which is 32-bit? I'm not a C programmer
    ;The advantage of altering the low 32 bits of a register is that the upper 32 are cleared.
    ;Of course if you forget that it can make your code crash in mysterious ways :)
    mov r8d,6h                      ; 3rd arg: protocol=6
    mov edx,1h                      ; 2nd arg: type=1
    mov ecx,2h                      ; 1st arg: af=2
    call WSASocketA                 ; call WSASocketA
    ;this proc acts like a function, and returns rax
    ;Slightly better than having this code accessing a non-local var
    add rsp,38h
    ret
callWSASocketA ENDP
Another way:
callWSASocketA PROC
    mov  ecx,2
    mov  edx,1
    mov  r8d,6
    xor  r9d,r9d
    push rax     ;aligns the stack
    push 0
    push 0
    sub  rsp,20h
    call WSASocketA
    add rsp,7*8
    ret
callWSASocketA ENDP


NoCforMe

Quote from: cyrus on January 15, 2024, 06:39:53 AM
Thanks for the tip on clearing a buffer. 2 things here.

1. Is that more efficient than declaring my buffer in the .data section, initializing it to 0, then simply doing that for each call when I am in the loop? Or is simply pushing 256 bytes on the stack just as efficient?

2. I managed to "clear" my buffer by doing
   
mov qword ptr [r15], 0          ; clear the buffer, otherwise it will end up in an infinite loop thinking it is always there
This assumes r15 points to the start of the 256 bytes I reserved on the stack. I believe it just puts a null terminator there, so it may not clear the entire buffer, but I believe it is sufficient for strcmp.

Just to clear up a bit of confusion here: I didn't realize that the data going into your buffer was strings. That actually makes things easier.

1. Again, if you're having a function fill a buffer, you don't need to "clear" the buffer, as the function will simply overwrite whatever's in the buffer to start with.

2. Your 2nd bit of code there is correct. Since strings (the kind we deal with here in assembly language 99.99% of the time) are NULL-terminated, all you need to do to "clear" a buffer is to put a single byte of zero into it.

3. If you're doing string comparisons on a buffer that's been filled by a function, again, you don't need to initialize the buffer first, as the string (assuming there's just one) is guaranteed to have a NULL at the end. There are some weird Windows API functions that return multiple strings where each string is terminated by one NULL and the whole shebang is terminated by an extra NULL, but those are special cases. Even there, you're always going to be able to find the end of the strings and the end of the buffer.

About your question about using a static buffer (one declared in your .data section) instead of one allocated on the stack: pretty much 6 of one, half a dozen of the other. Not more or less efficient either way. It's true that you can initialize the static buffer when you declare it. But again, if you're using it multiple times with your Enum function, there's no need to "clear" it each time anyhow. A static buffer will take up space in your program; however, you can minimize the space it occupies in the .exe file by declaring it in your .data? section (uninitialized data), but then you can't initialize it in the declaration; you'll have to use code to initialize it if you need to do that.
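
As a rough illustration of the last two points (the names are invented for the example): a buffer declared in .data? is uninitialized, and "clearing" it as a string takes a single byte store.

```asm
.data?
nameBuf db 256 dup(?)       ; uninitialized, so it need not occupy file space in the .exe

.code
ResetNameBuf proc
    mov  byte ptr [nameBuf], 0  ; one zero byte at offset 0 makes it an empty string
    ret
ResetNameBuf endp
end
```

The remaining 255 bytes are left as they were, which is fine for anything that treats the buffer as a NULL-terminated string.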