issues with dereferencing iteration of array of long elements in a loop

sinsi · January 14, 2024, 04:28:55 PM

Quote from: NoCforMe on January 14, 2024, 04:07:32 PM
My recommendation, take it or leave it: Forget 64-bit programming. Completely overkill and a pain in the ass besides. Win32 forever!

It is nice to allocate 8GB to work with an SQL table and have the WHOLE F'N THING in memory

NoCforMe · January 14, 2024, 04:40:37 PM

I should have said "forget 64-bit programming except in certain circumstances where you need humongous amounts of memory" ...

cyrus · January 14, 2024, 06:15:17 PM

Quote from: sinsi on January 14, 2024, 03:34:58 PM
You need to read up about spill/shadow space and passing parameters for 64-bit.

Code Select

    sub rsp, 28h+256    ;reserve stack space for called functions
    lea r15, [rsp+28]   ; delete the later line before the call to GetModuleBaseName

This change seems to *not crash*

You normally allocate 4 qwords for the spill. If a Windows function you call has more than 4 parameters then you would allocate that many. Note that you MUST allocate a minimum of 4.

Once you have set up your stack, don't touch it - no more "sub rsp,20h/add rsp,20h" pairs, the initial adjustment will take care of it.

I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases and I'll tell you why. When you reserve stack space that way, it's going to have random data, not null bytes. When you try to use that for a buffer, you never know what you'll get and often your buffer will contain other data and not work. I do use that style of subtracting stack space when the space is dedicated to a structure that is going to get populated, like PROCESS_INFORMATION for CreateProcessA or WSADATA, or when I am in a read loop from a network socket. In that case the buffer fills up entirely with the data I am reading in and then gets null-terminated.

However, I believe my weakness with asm in general is the stack space. I have one program where I have to make 2 calls to printf with an empty string because it won't work otherwise, and I've written quite a few programs with perfect stack alignment, so I don't know what that issue is.

I believe the sub rsp, 20h is required for every function call isn't it? I read about this before in 64-bit programming. The add rsp, 20h is only necessary when I am in a loop. If I leave it out, stack overflow.

cyrus · January 14, 2024, 06:17:39 PM

Quote from: NoCforMe on January 14, 2024, 04:40:37 PM
I should have said "forget 64-bit programming except in certain circumstances where you need humongous amounts of memory" ...

I understand 32-bits is more fun to program but I have to actually program this for current systems which are 64-bit lol.

sinsi · January 14, 2024, 06:33:13 PM

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases ...

Two possible reasons for failure: 256 is not enough, or it misaligns the stack.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
When you reserve stack space that way, it's going to have random data, not null bytes.

As with any LOCAL variable, you set it up for the call; if the call returns no error, the buffer has to be correct.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I believe the sub rsp, 20h is required for every function call isn't it? I read about this before in 64-bit programming. The add rsp, 20h is only necessary when I am in a loop. If I leave it out, stack overflow.

A Windows function uses at least 4 spill slots, that's what the "sub rsp,20h" is, assuming the stack is aligned (which it isn't on entry).
You are way off here, study the Win64 ABI.
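
To make the spill-space point concrete, here is a minimal sketch of a Win64 procedure that adjusts RSP once and then leaves it alone. The procedure name `demo` and the choice of the two API calls are only for illustration and are not taken from the code under discussion. The 28h is 20h of spill space plus 8 bytes, which brings RSP back to a 16-byte boundary after the caller's return address was pushed.

```asm
; hypothetical example: demo and the two calls are chosen for illustration only
extern GetCurrentProcessId : proc
extern Sleep : proc

.code
demo PROC
    sub  rsp, 28h            ; 20h spill space + 8 to realign RSP to 16 bytes
    call GetCurrentProcessId ; takes no arguments, but still needs the spill space
    mov  ecx, 10             ; 1st arg: dwMilliseconds
    call Sleep               ; reuses the same spill space, no extra sub/add pair
    add  rsp, 28h            ; undo the single adjustment
    ret
demo ENDP
end
```

Because RSP is adjusted only once, calling functions inside a loop no longer moves the stack pointer on every iteration, which is what leads to the stack overflow described above when the matching "add rsp, 20h" is left out.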

NoCforMe · January 14, 2024, 07:00:54 PM

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases and I'll tell you why. When you reserve stack space that way, it's going to have random data, not null bytes. When you try to use that for a buffer, you never know what you'll get [...]

Yes. It's the same with any variables allocated on the stack as LOCALs. The rule is, when using any such stack-allocated space, ASSUME it contains garbage.

You can clear stack space just like any other space by using REP STOSB or in a loop by setting it to the desired value. For instance (32-bit example here):

Code Select

    PUSH    EDI
    LEA    EDI, <variable you want to clear>
    MOV    ECX, <size of variable in bytes>
    MOV    AL, <value to fill variable with>
    REP    STOSB
    POP    EDI

   --or--

    LEA    EDX, <variable you want to clear>
    MOV    ECX, <size of variable in bytes>
    MOV    AL, <value to fill variable with>
@@:    MOV    [EDX], AL
    INC    EDX
    LOOP    @B

You can clear the space using words, dwords or qwords as well.

Also, if the stack space is going to receive the results of a function call like your EnumProcesses(), it doesn't matter what's in the buffer: the function will just overwrite it, so no need to initialize it.
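
For 64-bit code the same REP STOSB idea might look like the sketch below. It assumes a 256-byte buffer placed just above the 20h spill area; the procedure name `clearbuf` and the layout are illustrative, not taken from the program being discussed.

```asm
; sketch only: clearbuf and the buffer layout are assumptions for illustration
.code
clearbuf PROC
    push rdi                 ; RDI is non-volatile in the Win64 ABI, so save it
    sub  rsp, 20h+256        ; spill space + buffer; RSP is 16-byte aligned here
    lea  rdi, [rsp+20h]      ; RDI -> start of the buffer
    mov  ecx, 256            ; byte count (writing ECX zero-extends into RCX)
    xor  eax, eax            ; AL = fill value 0
    rep  stosb               ; store AL at [RDI], RCX times
    ; ... the buffer at [rsp+20h] now holds 256 zero bytes ...
    add  rsp, 20h+256
    pop  rdi
    ret
clearbuf ENDP
end
```

The push of RDI also takes care of alignment: RSP is off by 8 on entry, the push fixes that, and 20h+256 is a multiple of 16.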

jj2007 · January 14, 2024, 08:07:06 PM

Quote from: cyrus on January 14, 2024, 01:18:33 PM
I have debugged that and it does not fail

So did I, and as Sinsi wrote, it will brutally fail for values over 1023*).
Test it (the code is Masm64 SDK compatible, unlike yours):

Code Select

include \masm64\include64\masm64rt.inc
.code
entry_point proc
  xor rax, rax
  xor rbx, rbx
  INT 3
  mov ax, 1234h        ; simulated WORD PTR [cbNeeded]
  mov bl, 4h        ; size of long
  div bl        ; before: eax=1234h, ebx=4h
  conout str$(eax)
  invoke ExitProcess, 0
entry_point endp
end

*) Actually, it is much more complicated, see attachment.
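
The reason for the fault is that an 8-bit DIV divides AX by the operand and must fit the quotient into AL; 1234h / 4 = 48Dh does not fit, so the CPU raises a divide error. A hedged sketch of two ways around it follows, assuming cbNeeded is a DWORD variable holding a byte count (the procedure names are made up for this example):

```asm
; sketch: cbNeeded is assumed to be a DWORD byte count; names are illustrative
.data
cbNeeded dd 1234h

.code
count_shift PROC
    mov  eax, cbNeeded       ; full 32-bit byte count
    shr  eax, 2              ; divide by 4 (size of a DWORD); a shift cannot fault
    ret                      ; EAX = number of elements
count_shift ENDP

count_div PROC
    mov  eax, cbNeeded
    xor  edx, edx            ; 32-bit DIV divides EDX:EAX, so clear the high half
    mov  ecx, 4
    div  ecx                 ; EAX = quotient, EDX = remainder
    ret
count_div ENDP
end
```

The shift is enough when the divisor is a power of two; the 32-bit DIV is the general form.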

TimoVJL · January 14, 2024, 10:23:58 PM

With poasm:

Code Select

ifdef __UASM__
.x64
.Model flat
endif
ExitProcess PROTO STDCALL :DWORD
.code
_mainCRTStartup proc
  xor rax, rax
  xor rbx, rbx
  INT 3
  mov ax, 1234h        ; simulated WORD PTR [cbNeeded]
  mov bl, 4h        ; size of long
  div bl        ; before: eax=1234h, ebx=4h
  ;conout str$(eax)
  ;invoke ExitProcess, 0
  mov ecx, 0        ; exit code goes in rcx (first argument in the Win64 calling convention)
  call ExitProcess    ; just for ml64
_mainCRTStartup endp
end

cyrus · January 15, 2024, 06:35:23 AM

Quote from: sinsi on January 14, 2024, 06:33:13 PM
Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases ...
Two possible reasons for failure: 256 is not enough, or it misaligns the stack.

That is a good point; I missed that it may break the stack alignment there.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
When you reserve stack space that way, it's going to have random data, not null bytes.
As with any LOCAL variable, you set it up for the call; if the call returns no error, the buffer has to be correct.

I already know that local variables are set up for that call. In this case, I am setting up the buffer for each call. Could I do what NoCforMe mentioned, declare my buffer as 256 bytes in the .data section initialized to 0, and then use REP STOSB before each call to clear it out? Yes, but I'm not sure whether that is more efficient than simply pushing 256 null bytes on the stack. Is it? If so, I may use that for the performance gain, but I doubt it would matter at this size. Maybe if it were megabytes.

Quote from: cyrus on January 14, 2024, 06:15:17 PM
I believe the sub rsp, 20h is required for every function call isn't it? I read about this before in 64-bit programming. The add rsp, 20h is only necessary when I am in a loop. If I leave it out, stack overflow.
A Windows function uses at least 4 spill slots, that's what the "sub rsp,20h" is, assuming the stack is aligned (which it isn't on entry).
You are way off here, study the Win64 ABI.

What am I way off on exactly? I did mention a windows function uses 32 bytes so why are you telling me that?

cyrus · January 15, 2024, 06:36:46 AM

Quote from: jj2007 on January 14, 2024, 08:07:06 PM
Quote from: cyrus on January 14, 2024, 01:18:33 PM
I have debugged that and it does not fail

So did I, and as Sinsi wrote, it will brutally fail for values over 1023*).
Test it (the code is Masm64 SDK compatible, unlike yours):

Code Select

include \masm64\include64\masm64rt.inc
.code
entry_point proc
  xor rax, rax
  xor rbx, rbx
  INT 3
  mov ax, 1234h        ; simulated WORD PTR [cbNeeded]
  mov bl, 4h        ; size of long
  div bl        ; before: eax=1234h, ebx=4h
  conout str$(eax)
  invoke ExitProcess, 0
entry_point endp
end

*) Actually, it is much more complicated, see attachment.

Good point. I overlooked anything over 1023, so that makes sense.

cyrus · January 15, 2024, 06:39:53 AM

Quote from: NoCforMe on January 14, 2024, 07:00:54 PM
Quote from: cyrus on January 14, 2024, 06:15:17 PM
I have noticed that the style of setting aside stack space this way you stated: 'sub rsp, 256' and then using that for my buffer doesn't end up working in some cases and I'll tell you why. When you reserve stack space that way, it's going to have random data, not null bytes. When you try to use that for a buffer, you never know what you'll get [...]

Yes. It's the same with any variables allocated on the stack as LOCALs. The rule is, when using any such stack-allocated space, ASSUME it contains garbage.

You can clear stack space just like any other space by using REP STOSB or in a loop by setting it to the desired value. For instance (32-bit example here):
Code Select

    PUSH    EDI
    LEA    EDI, <variable you want to clear>
    MOV    ECX, <size of variable in bytes>
    MOV    AL, <value to fill variable with>
    REP    STOSB
    POP    EDI

   --or--

    LEA    EDX, <variable you want to clear>
    MOV    ECX, <size of variable in bytes>
    MOV    AL, <value to fill variable with>
@@:    MOV    [EDX], AL
    INC    EDX
    LOOP    @B

You can clear the space using words, dwords or qwords as well.

Also, if the stack space is going to receive the results of a function call like your EnumProcesses(), it doesn't matter what's in the buffer: the function will just overwrite it, so no need to initialize it.

I did mention that when I have a buffer I'm going to fill entirely, the 'sub rsp' method works just fine. The problem is cases like this one, where the data varies, I don't know how large it may be, and I'm comparing strings. In this particular case I know 'notepad.exe' is only 11 bytes, so if data from other PIDs is read into an 11-byte buffer I don't care about the contents, but it may overflow onto something else, and I figure 256 bytes isn't much to push onto the stack.

Thanks for the tip on clearing a buffer. 2 things here.

1. Is that more efficient than declaring my buffer in the .data section, initializing it to 0, then simply doing that for each call when I am in the loop? Or is simply pushing 256 bytes on the stack just as efficient?

2. I managed to "clear" my buffer by doing

Code Select

mov qword ptr [r15], 0           ; clear the buffer, otherwise it will end up in an infinite loop thinking it is always there

Assuming r15 points to the start of the 256 bytes I reserved on the stack. I believe it just adds a null terminator to that, so it may not clear the entire buffer, but I believe it is sufficient for strcmp.

sinsi · January 15, 2024, 09:32:51 AM

Quote from: cyrus on January 15, 2024, 06:35:23 AM
What am I way off on exactly? I did mention a windows function uses 32 bytes so why are you telling me that?

It gets tricky when a function has more than 4 parameters, the extra ones get put onto the stack, usually by a series of "mov [rsp+28h],rax" and so on, so it's easy to lose track of where RSP is.
Even if a function has 0 parameters, it still needs those 32 bytes, that's part of the ABI.

cyrus · January 15, 2024, 11:10:13 AM

Quote from: sinsi on January 15, 2024, 09:32:51 AM
Quote from: cyrus on January 15, 2024, 06:35:23 AM
What am I way off on exactly? I did mention a windows function uses 32 bytes so why are you telling me that?
It gets tricky when a function has more than 4 parameters, the extra ones get put onto the stack, usually by a series of "mov [rsp+28h],rax" and so on, so it's easy to lose track of where RSP is.
Even if a function has 0 parameters, it still needs those 32 bytes, that's part of the ABI.

Ok I totally know that. Here is an example of how I call WSASocketA. In 32-bits, I used push. In 64-bit, I do exactly what is required.

Code Select

    ; call WSASocketA
    sub rsp, 30h
    xor r9, r9                       ; 4th arg: lpProtocolInfo=NULL (uses itself from above: NULL)
    ;push r9                          ; 6th arg: dwFlags=NULL
    ;push r9                          ; 5th arg: g=NULL
    mov QWORD PTR [rsp + 28h], 00h  ; 6th arg: dwFlags=NULL
    mov QWORD PTR [rsp + 20h], 00h  ; 5th arg: g=NULL
    xor r8, r8
    mov r8b, 6h                    ; 3rd arg: protocol=6
    xor rdx, rdx
    mov dl, 1h                     ; 2nd arg: type=1
    xor rcx, rcx
    mov cl, 2h                     ; 1st arg: af=2
    call WSASocketA                ; call WSASocketA
    mov sockfd, rax                ; save socket descriptor of WSASocketA to sockfd variable

sinsi · January 15, 2024, 12:10:47 PM

Code Select

callWSASocketA PROC
    ;on entry, the stack is misaligned. We have 6 arguments, so need to add 8 bytes to align it
    sub rsp, 38h ;This would be at the top of this proc so every function call can re-use it
                 ;As a bonus it gives us 8 bytes to use at [RSP+30..37] (this time)
    ;swap some code around to cut down on size
    xor r9d,r9d                     ; 4th arg: lpProtocolInfo=NULL (uses itself from above: NULL)
    mov [rsp+28h],r9                ; 6th arg: dwFlags=NULL
    mov [rsp+20h],r9                ; 5th arg: g=NULL
    ;the next 3 args are of type 'int' which is 32-bit? I'm not a C programmer
    ;The advantage of altering the low 32 bits of a register is that the upper 32 are cleared.
    ;Of course if you forget that it can make your code crash in mysterious ways :)
    mov r8d,6h                      ; 3rd arg: protocol=6
    mov edx,1h                      ; 2nd arg: type=1
    mov ecx,2h                      ; 1st arg: af=2
    call WSASocketA                 ; call WSASocketA
    ;this proc acts like a function, and returns rax
    ;Slightly better than having this code accessing a non-local var
    add rsp,38h
    ret
callWSASocketA ENDP

Another way

Code Select

callWSASocketA PROC
    mov  ecx,2
    mov  edx,1
    mov  r8d,6
    xor  r9d,r9d
    push rax     ;aligns the stack
    push 0
    push 0
    sub  rsp,20h
    call WSASocketA
    add rsp,7*8
    ret
callWSASocketA ENDP

NoCforMe · January 15, 2024, 01:33:04 PM

Quote from: cyrus on January 15, 2024, 06:39:53 AM
Thanks for the tip on clearing a buffer. 2 things here.

1. Is that more efficient than declaring my buffer in the .data section, initializing it to 0, then simply doing that for each call when I am in the loop? Or is simply pushing 256 bytes on the stack just as efficient?

2. I managed to "clear" my buffer by doing

Code Select

mov qword ptr [r15], 0           ; clear the buffer, otherwise it will end up in an infinite loop thinking it is always there

Assuming r15 points to the start of the 256 bytes I reserved on the stack. I believe it just adds a null terminator to that, so it may not clear the entire buffer, but I believe it is sufficient for strcmp.

Just to clear up a bit of confusion here: I didn't realize that the data going into your buffer was strings. That actually makes things easier.

1. Again, if you're having a function fill a buffer, you don't need to "clear" the buffer, as the function will simply overwrite whatever's in the buffer to start with.

2. Your 2nd bit of code there is correct. Since strings (the kind we deal with here in assembly language 99.99% of the time) are NULL-terminated, all you need to do to "clear" a buffer is to put a single byte of zero into it.

3. If you're doing string comparisons on a buffer that's been filled by a function, again, you don't need to initialize the buffer first, as the string (assuming there's just one) is guaranteed to have a NULL at the end. There are some weird Windows API functions that return multiple strings where each string is terminated by one NULL and the whole shebang is terminated by an extra NULL, but those are special cases. Even there, you're always going to be able to find the end of the strings and the end of the buffer.

About your question about using a static buffer (one declared in your .data section) instead of one allocated on the stack: pretty much 6 of one, half a dozen of the other. Not more or less efficient either way. It's true that you can initialize the static buffer when you declare it. But again, if you're using it multiple times with your Enum function, there's no need to "clear" it each time anyhow. A static buffer will take up space in your program; however, you can minimize the space it occupies in the .exe file by declaring it in your .data? section (uninitialized data), but then you can't initialize it in the declaration; you'll have to use code to initialize it if you need to do that.
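
As a rough sketch of the static-buffer option, assuming a 256-byte buffer that only ever holds one NULL-terminated string (the names `namebuf` and `resetbuf` are invented for this example):

```asm
; sketch: namebuf and resetbuf are illustrative names, not from the thread's code
.data?
namebuf db 256 dup (?)       ; uninitialized data: takes no space in the .exe file

.code
resetbuf PROC
    mov  byte ptr namebuf, 0 ; empty string: the first byte is the terminator
    ret
resetbuf ENDP
end
```

Windows zero-fills uninitialized data when the program is loaded, so a reset like this is only needed once the buffer has been used and has to look empty again.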

The MASM Forum
