Some basic Win32 questions for assembly programmers

Started by NoCforMe, January 25, 2024, 12:42:14 PM

NoCforMe

I've been programming in assembly language for Win32 for quite some time now, and I consider myself fairly knowledgeable about the platform and how to write programs for it. But there are some really basic things about the Win32 ABI that I realize I'm not quite sure of.

1. State of registers on entry to a Win32 program:
Following another thread here where someone said that they thought that RSI was set to a certain value at program start (this of course in a 64-bit program), I wonder if any registers are set to anything on entry to a 32-bit program, like maybe a pointer to the "command tail".

2. Usage of nonvolatile registers:
In all my coding up to now I've always been very careful to save and restore all of the "sacred" registers (EBX, ESI, EDI, EBP) every time I use them. Of course I always, always do this in subroutines, as my own code may be using these registers.

But I've seen plenty of code posted here where the author uses one of these registers but doesn't bother to save and restore it. So my specific question is this: Is it necessary to save and restore these registers in the "main" Windows code, meaning that which is first run when the program runs? Does the loader take care of saving and restoring these registers? Here I'm thinking of the loader as executing a branch of code from the OS, so does it take care of saving/restoring? And is there any difference between GUI and console programs in this regard?

3. State of uninitialized data:
I had thought up to now that data in an uninitialized .data? section (which ends up being BSS storage) was just that, uninitialized and therefore random. But JJ informs us that all this data is actually zeroed out. Is this true?

Is any of this stuff actually documented somewhere? I'm guessing it might be found at Micro$oft Learn, but there's so much stuff there it's hard to find anything.

I just did a fairly extensive online search but couldn't come up with any definitive info on the Win32 ABI (or the X86 ABI). Lots and lots of pages on the X64 ABI, but nothing in 32-bit land. At least not that I've found yet.
Assembly language programming should be fun. That's why I do it.


TimoVJL

Quote: Some of it was due to user mode zeroing pages for security reasons.
May the source be with you

jj2007

Quote from: NoCforMe on January 25, 2024, 12:42:14 PM: 1. State of registers on entry to a Win32 program:

The initial values of registers when a 32-bit program starts are determined by the operating system and the runtime environment. Typically, the entry point for a program is the main function, and the initial values of registers are set by the operating system loader or the runtime startup code.

In the x86 architecture, which includes 32-bit systems, the following registers are commonly used:

eax: Accumulator register
ecx: Counter register
edx: Data register
esi: Source index register
edi: Destination index register
ebp: Base pointer register
esp: Stack pointer register

These registers may be used for various purposes, and their initial values depend on the operating system and the runtime environment. In general, when a program starts, the operating system provides the initial context for the program, including the values of certain registers.

For example, the eax register might contain command-line arguments count (argc), ebx could contain a pointer to the command-line arguments (argv), and ecx might be initialized to zero. The stack pointer esp would typically point to the top of the program's stack.

If you want to know the specific initial register values for a particular program, you may need to refer to the documentation of the compiler, linker, or loader used to build and run the program, as well as any relevant documentation for the operating system.

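P.S.: check for yourself:
include \masm32\include\masm32rt.inc

.data?
_esp dd ?
TheRegs dd 8 dup(?)

.code
start:
  mov _esp, esp                 ; remember the real stack pointer
  mov esp, offset TheRegs+32    ; point esp just past the end of the buffer
  pushad                        ; stores eax, ecx, edx, ebx, esp, ebp, esi, edi downwards into TheRegs
  mov esp, _esp                 ; switch back to the real stack
  ; TheRegs[0] = edi ... TheRegs[7] = eax; the entry value of esp is in _esp
  ; ... check their values ...
  inkey "bye"
  exit
end start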

jj2007

Quote from: NoCforMe on January 25, 2024, 12:42:14 PM: Is it necessary to save and restore these registers in the "main" Windows code, meaning that which is first run when the program runs? Does the loader take care of saving and restoring these registers?

The need to save and restore registers like esi, edi, ebx, ebp, and esp depends on the context and the specific conventions or requirements of the code you are writing. Here are some general guidelines:

a) Always:
ebp (Base Pointer) is often used as a frame pointer and is commonly preserved in functions to maintain a stable reference to the current stack frame. It is often pushed and popped in the function prologue and epilogue.
esp (Stack Pointer) should generally be maintained, especially if you adjust it within your function.

b) In Callback Functions like WndProc:
In Windows callback functions, such as WndProc for handling window messages, it's essential to follow the calling conventions specified by the Windows ABI.
The standard calling convention for Windows callback functions is the __stdcall convention. According to this convention, the callee (WndProc in this case) is responsible for cleaning up the stack, and it may also need to preserve certain registers.
esi, edi, and ebx are considered non-volatile and must be preserved by the callee if modified.
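
A minimal sketch of the callback rule, assuming the standard Masm32 SDK (masm32rt.inc). EnumProc and WinCount are names made up for this illustration. The "uses esi" clause makes the assembler push esi in the prologue and pop it again before ret, so the register is preserved for the caller (here, EnumWindows):

```asm
include \masm32\include\masm32rt.inc

EnumProc proto :DWORD, :DWORD

.data?
WinCount dd ?                   ; hypothetical counter, zero at program start

.code
EnumProc proc uses esi hWnd:DWORD, lParam:DWORD
  mov esi, lParam               ; esi is non-volatile, hence "uses esi"
  inc dword ptr [esi]           ; count one more top-level window
  mov eax, TRUE                 ; non-zero return value continues the enumeration
  ret
EnumProc endp

start:
  invoke EnumWindows, addr EnumProc, addr WinCount
  print str$(WinCount), " top-level windows", 13, 10
  inkey "bye"
  exit
end start
```

eax, ecx and edx need no such treatment: they are volatile, so the callback may trash them freely.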

jj2007

Quote from: NoCforMe on January 25, 2024, 12:42:14 PM: 3. State of uninitialized data:

On Windows, the uninitialised data section (.bss section), which includes variables declared without explicit initialization, is zero-initialised by the loader. This means that the data in the uninitialised section is set to zero (or null) before the program starts executing.

This behavior is documented in the Microsoft PE (Portable Executable) file format specification. The .bss section is part of the PE file structure, and during the loading process, the loader ensures that the memory for the uninitialised data is set to zero.

Specifically, the Windows loader performs the zero-filling of the .bss section as part of the process of loading the executable into memory before the program begins execution.

Note that obviously local variables, residing on the stack, are not initialised to zero. Use a macro like ClearLocals to make sure you are not working with garbage.
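
For anyone who prefers to verify the zero-fill rather than take it on trust, here is a small sketch, again assuming the Masm32 SDK. Buffer and its size are arbitrary choices for the test. It scans a .data? buffer at program start and reports whether any byte is non-zero:

```asm
include \masm32\include\masm32rt.inc

.data?
Buffer db 4096 dup(?)           ; uninitialised data, arbitrary test size

.code
start:
  mov edi, offset Buffer
  mov ecx, sizeof Buffer        ; number of bytes to scan
  xor eax, eax                  ; al = 0, the value every byte should have
  cld                           ; scan upwards
  repe scasb                    ; stops early at the first byte that differs from al
  .if ZERO?
    print "all bytes are zero", 13, 10
  .else
    print "found a non-zero byte", 13, 10
  .endif
  inkey "bye"
  exit
end start
```

A single run on one machine proves nothing about the guarantee, of course; it only shows what the loader did this time.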

sinsi

1. According to Raymond Chen, the loader simply calls the program entry point (RawEntryPoint) with no parameters. Even if there were parameters they would be on the stack (as per the Win32 ABI). The loader might leave values in some registers, although nowadays they are probably all zeroed for security. If you are thinking about the parameters for WinMain, that's constructed by the C runtime (even a pointer to the command line is obtained by a Windows API call).

2. If you have a callback function (like a window procedure, or an EnumX callback) you need to preserve them if you use them. The exception to this is the program entry: it is defined as a callback, but the loader doesn't rely on anything being preserved, because it saves the important stuff in kernel memory.

3. Like jj said. A bit of trivia, if you look at the Win2000 source code that was leaked it shows the BSP booting the system, starting other CPUs then scheduling a high priority thread to zero memory pages.
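
To illustrate point 1: since nothing useful arrives in registers or on the stack, a bare assembly program has to ask for the command line itself. A minimal sketch, assuming the Masm32 SDK, where GetCommandLine maps to the ANSI API GetCommandLineA:

```asm
include \masm32\include\masm32rt.inc

.code
start:
  invoke GetCommandLine         ; returns a pointer to the full command line in eax
  print eax, 13, 10             ; show it, including the program name
  inkey "bye"
  exit
end start
```

The string includes the program name as its first token, so parsing out the arguments is up to the program.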

jj2007

Re local variables:
WndProc proc uses esi edi ebx hWnd, uMsg, wParam:WPARAM, lParam:LPARAM
Local buffer[280000]:BYTE
ClearLocals

Under Windows 7, it was possible to allocate 800,000 bytes for a local buffer. However, it needed probing (e.g. with ClearLocals).
Under Windows 10, the limit seems to be slightly above 280k. No more probing required, but with 300k, you get a stack overflow.

Of course, this isn't documented anywhere :badgrin:

NoCforMe

@sinsi: Good answers. Clear, to the point, satisfies my curiosity. Thanks.

So basically, you don't need to worry about preserving non-volatile registers in the program entry point code, right?

sinsi

Quote from: NoCforMe on January 25, 2024, 08:07:21 PM: @sinsi: Good answers. Clear, to the point, satisfies my curiosity. Thanks.

So basically, you don't need to worry about preserving non-volatile registers in the program entry point code, right?

Yep. Once the loader gets control back, your program and anything to do with it is trashed.

Off topic, but Mr Chen also says that while you can just RET from the entry point (instead of using ExitProcess), the loader will then call ExitThread on your main thread, so if you have any other threads going you turn into a zombie process.
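
A short sketch of the safe way out, assuming the Masm32 SDK. Ending the entry point code with ExitProcess terminates the process together with all of its threads, so nothing is left behind:

```asm
include \masm32\include\masm32rt.inc

.code
start:
  print "leaving via ExitProcess", 13, 10
  invoke ExitProcess, 0         ; ends the process and every thread in it
  ; a bare "ret" here would instead return to whatever called the entry point
end start
```

The exit macro used in the other snippets of this thread does the same job.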

NoCforMe

Quote from: sinsi on January 25, 2024, 08:19:08 PM: Off topic, but Mr Chen also says that while you can just RET from the entry point (instead of using ExitProcess) the loader will then call ExitThread on your main thread, so if you have any threads going you turn into a zombie process
Good to know, even if I'll never try that trick. But hey, that's why god invented Task Manager, yes?

jj2007

Quote from: sinsi on January 25, 2024, 07:29:40 PM: 1. According to Raymond Chen, the loader simply calls the program entry point (RawEntryPoint) with no parameters. Even if there were parameters they would be on the stack (as per the Win32 ABI). The loader might leave values in some registers, although nowadays they are probably all zeroed for security. If you are thinking about the parameters for WinMain, that's constructed by the C runtime (even a pointer to the command line is obtained by a Windows API call).

For a standard Masm32 program, there is nothing constructed by the C runtime, so "RawEntryPoint" = start: (or whatever comes after the end statement). That's why the snippet in reply #4 works.

sinsi

Quote from: jj2007: In general, when a program starts, the operating system provides the initial context for the program, including the values of certain registers.
In the case of Win32, the only registers set by the OS are EIP, ESP and at least DF (and segment registers, which we're not supposed to care about).

Quote from: jj2007: For example, the eax register might contain command-line arguments count (argc), ebx could contain a pointer to the command-line arguments (argv), and ecx might be initialized to zero.
Again, Win32 doesn't do any of this.

Quote from: jj2007: For a standard Masm32 program, there is nothing constructed by the C runtime, so "RawEntryPoint" = start: (or whatever comes after the end statement).
That's what I said.
Quote from: sinsi: the loader simply calls the program entry point (RawEntryPoint) with no parameters.

jj2007

Quote from: sinsi on January 25, 2024, 11:28:19 PM: In the case of Win32, the only registers set by the OS are EIP, ESP and at least DF

The OS does not set registers on purpose, but of course, the loader does set and use registers while initialising the program. This is the state on arriving at start:

CPU - main thread, module NewMasm32

EAX 0019FFCC
ECX 00401000 NewMasm32.<ModuleEntryPoint>
EDX 00401000 NewMasm32.<ModuleEntryPoint>
EBX 002CD000
ESP 0019FF74
EBP 0019FF80
ESI 00401000 NewMasm32.<ModuleEntryPoint>
EDI 00401000 NewMasm32.<ModuleEntryPoint>
EIP 00401001 NewMasm32.00401001

C 0  ES 002B 32bit 0(FFFFFFFF)
P 1  CS 0023 32bit 0(FFFFFFFF)
A 0  SS 002B 32bit 0(FFFFFFFF)
Z 1  DS 002B 32bit 0(FFFFFFFF)
S 0  FS 0053 32bit 2D0000(FFF)
T 0  GS 002B 32bit 0(FFFFFFFF)
D 0
O 0  LastErr 00000000 ERROR_SUCCESS
EFL 00000246 (NO,NB,E,BE,NS,PE,GE,LE)

ST0 empty 0.0
ST1 empty 0.0
ST2 empty 0.0
ST3 empty 0.0
ST4 empty 0.0
ST5 empty 0.0
ST6 empty 0.0
ST7 empty 0.0
               3 2 1 0      E S P U O Z D I
FST 0000  Cond 0 0 0 0  Err 0 0 0 0 0 0 0 0 (GT)
FCW 027F  Prec NEAR,53  Mask    1 1 1 1 1 1
Last cmnd 0000:00000000

XMM0 00000000 00000000 00000000 00000000
XMM1 00000000 00000000 00000000 00000000
XMM2 00000000 00000000 00000000 00000000
XMM3 00000000 00000000 00000000 00000000
XMM4 00000000 00000000 00000000 00000000
XMM5 00000000 00000000 00000000 00000000
XMM6 00000000 00000000 00000000 00000000
XMM7 00000000 00000000 00000000 00000000
                                P U O Z D I
MXCSR 00001F80  FZ 0 DZ 0  Err  0 0 0 0 0 0
                Rnd NEAR   Mask 1 1 1 1 1 1

Several registers, i.e. eip, ecx, edx, esi and edi, are set to the 00401000 entry address (eip+1 because of the int 3).

Note that I don't want to contradict you, Sinsi; you are right in everything. I'm just trying to be as precise as possible, as I suspect this thread will persist and become a reference. Btw what is the DF register?