Help understanding stack frame

Started by jayanthd, February 21, 2013, 03:27:39 AM


dedndave

i am writing a little demo program to help you understand
i am almost done with it   :P

dedndave


jayanthd

Ok. Dave. I will check your code. I have a few more doubts about assembly language. Please clear them.

My first doubt... In the below code
var1  db    10h
var2  db  10h, 20h, 30h, 40h, 50h
msg1 db 'Hello!', '$'


If data starts at DS:0000 then the address of var1 is DS:0000h and its value is 10h. Then the address of var2 will be DS:0001h, as var1 was only one byte. How can var2 have 5 bytes when var2 is only one byte? db defines only one byte. Right?
Similarly, msg1 is defined as a byte, and the string 'Hello!' will take 6 or 7 bytes including the null character. Each character in the string will take a byte. Then how can more than one byte be assigned to one defined byte?

How can the second byte of var2, which is 20h, be addressed? The address of var2 is DS:0001 and the address of 20h is DS:0002?


dedndave

yes - masm allows you to define numerous bytes on one line
this applies to all data sizes (words, dwords, etc)
the second byte of var2 may be addressed as
    mov     al,var2+1

there are other ways to define multiple data values on one line, as well
Buffer  db 1024 dup(?)
assigns 1,024 uninitialized bytes to Buffer
Array   dd 2048 dup(0FFFFFFFFh)
assigns 2,048 dwords to Array, and initializes them to 0FFFFFFFFh

the assembler keeps track of the addresses and number of bytes used
it knows what address to assign to each label
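
A rough model of that bookkeeping, sketched in Python rather than MASM. The `layout` helper and its tuple format are made up for this illustration; they are not how MASM is implemented. Each label gets the current offset, and the offset then advances by the number of bytes defined after db.

```python
# Hypothetical model of how an assembler assigns offsets to data labels.
# Each entry is (label, list_of_byte_values); none of this is MASM syntax.

def layout(definitions, start=0):
    """Return {label: offset} plus the flat byte image of the data."""
    offsets = {}
    image = bytearray()
    for label, values in definitions:
        offsets[label] = start + len(image)   # label = address of its first byte
        image.extend(values)                  # every value after db takes one byte
    return offsets, bytes(image)

data = [
    ("var1", [0x10]),
    ("var2", [0x10, 0x20, 0x30, 0x40, 0x50]),
    ("msg1", list(b"Hello!$")),
]

offsets, image = layout(data)
for label, off in offsets.items():
    print(f"{label} at DS:{off:04X}h")

# var2+1 is just "offset of var2, plus one byte"
second_byte_of_var2 = image[offsets["var2"] + 1]
print(f"byte at var2+1 = {second_byte_of_var2:02X}h")
```

Under these assumptions the label is only a name for the first byte; the bytes that follow are reached by adding to it, which is what var2+1 does.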

what you seem to be looking at is 16-bit code   :P
with 16-bit code, you can only address 65536 unique locations with a word address
the data segment register (DS) must point to a specific segment of memory

32-bit code is quite different
because you can directly address up to 4 gb with a dword address value, all code, stack, and data are in the same "segment"
so - no need to mess with segment registers   :t

EDIT: fixed a typo, per Michael
missed the "dup" operator for 2048 dup(0FFFFFFFFh)

MichaelW

Quote from: dedndave on February 24, 2013, 02:04:49 AM

Array   dd 2048(0FFFFFFFFh)
assigns 2,048 dwords to Array, and initializes them to 0FFFFFFFFh

Using ML 6.15, without the dup this defines only a single DWORD with the value 2047, the same as would:

Array dd 2048(-1)





Well Microsoft, here's another nice mess you've gotten us into.

dedndave


jayanthd

Ok. What about the msg1 variable? It starts at DS:0006h. Does the assembler know that msg1 runs from DS:0006h to DS:000Ch?

If we use mov dx, offset msg1

The starting address of msg1, which is DS:0006h, is put into DX. Does the assembler know that the length of msg1 to be printed is 7 bytes including the null character? While printing msg1, will it print from the first byte of msg1 to the last byte of msg1?

MichaelW

The assembler knows the length of it, but knows nothing about what is to be printed. It's up to the programmer to indicate to the print routine where the string starts and where it ends. Typically, strings are null-terminated and the print routine stops when it reads the null.

;==============================================================================
    include \masm32\include\masm32rt.inc
;==============================================================================
    .data
        v1  db "XXXXXX",0
        v2  dw 1,2,3,4
        v3  dd 1,2,3,4
        v4  dq 1,2,3,4
    .code
;==============================================================================
start:
;==============================================================================
    printf("%s\n\n", OFFSET v1)
    printf("%Xh\t%d\t%d\n",   OFFSET v1, SIZE v1, SIZEOF v1)
    printf("%Xh\t%d\t%d\n",   OFFSET v2, SIZE v2, SIZEOF v2)
    printf("%Xh\t%d\t%d\n",   OFFSET v3, SIZE v3, SIZEOF v3)
    printf("%Xh\t%d\t%d\n\n", OFFSET v4, SIZE v4, SIZEOF v4)
    inkey
    exit
;==============================================================================
END start


XXXXXX

403000h 1       7
403007h 2       8
40300Fh 4       16
40301Fh 8       32


If you are using the DOS WriteString function (Interrupt 21h function 09h), which expects the OFFSET address of the string in DX, you will need to change the null terminator to a "$".
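
The idea that the print routine, not the assembler, decides where the string ends can be sketched as follows. This is Python, and `print_until` is a hypothetical helper, not a DOS or Win32 call; it only models "read bytes until the terminator shows up".

```python
# Sketch of a print routine that stops at a terminator byte.
# print_until is a hypothetical helper, not a DOS or Win32 API.

def print_until(memory, offset, terminator):
    """Return the text from memory[offset] up to (not including) terminator."""
    end = memory.index(terminator, offset)   # raises ValueError if never found
    return memory[offset:end].decode("ascii")

# data from earlier in the thread: var1, var2, then msg1 at offset 6
memory = bytes([0x10, 0x10, 0x20, 0x30, 0x40, 0x50]) + b"Hello!$"

dos_text = print_until(memory, 6, ord("$"))   # "$"-terminated, function 09h style
print(dos_text)

win_memory = b"XXXXXX\x00"
win_text = print_until(win_memory, 0, 0)      # null-terminated style
print(win_text)
```

If the terminator is missing, a real routine would keep reading whatever bytes follow in memory; the sketch raises an error instead.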

dedndave

yes - 16-bit DOS uses a '$' char to terminate strings - win32 uses a null (0)
that's how i knew he was playing with 16-bit code   :P
but, the idea is the same

win32 also uses the length of a string when it displays it in the console
but, Hutch has hidden that from us with the StdOut routine and print macro
when you pass the address of a null-terminated string to StdOut, it calls StrLen to get the length
the win32 API function, WriteFile, requires the length as a parameter

StdOut proc lpszText:DWORD

    LOCAL hOutPut  :DWORD
    LOCAL bWritten :DWORD
    LOCAL sl       :DWORD

    invoke GetStdHandle,STD_OUTPUT_HANDLE
    mov hOutPut, eax

    invoke StrLen,lpszText
    mov sl, eax

    invoke WriteFile,hOutPut,lpszText,sl,ADDR bWritten,NULL

    mov eax, bWritten
    ret

StdOut endp


Michael gave a nice example of the SIZEOF operator
SIZEOF works well for strings that are on one line
sometimes, you want to use multiple lines, so you might do something like this
szString db 'first line',13,10
         db 'second line',13,10
         db 'third line',13,10,0
EndOfString LABEL BYTE

now, to get the number of bytes in the string (less the null terminator)...
        mov     ecx,EndOfString-szString-1
the assembler does the math for you and resolves it to a single constant
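
The same subtraction can be checked by hand or with a few lines of Python. The start offset of 0 is an assumption made only for this sketch; the result does not depend on it.

```python
# Mirrors EndOfString-szString-1 from the post above, using Python bytes.
sz_string = (
    b"first line\r\n"       # 13,10 is CR,LF
    b"second line\r\n"
    b"third line\r\n\x00"   # trailing 0 is the null terminator
)

sz_string_offset = 0                                # assumed start offset
end_of_string = sz_string_offset + len(sz_string)   # address just past the null
length_without_null = end_of_string - sz_string_offset - 1
print(length_without_null)
```

The three lines take 12, 13, and 12 bytes, so the length without the null comes to 37.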

jayanthd

Ok. Thanks Michael and Dave. That cleared my doubts. I bought 10 books on assembly language and am studying them and getting confused. I want to know what regptr16 and memptr16 are. In this case I am referring to 16-bit programming, and of course it applies to regptr32 and memptr32.

All I can understand is that regptr16 is a 16-bit register used as a pointer, and memptr16 is a memory address which contains another memory address and which is used as an operand, like below


mov bx, 0001h     ;0001h is some address
jmp  bx



mov word ptr ds:[0002h], 0500h      ; 0002h and 0500h are memory addresses
jmp word ptr ds:[0002h]             ; near indirect jump through the word at DS:0002h


In the first code execution jumps to instruction at address 0001h. Right?
In the second code execution jumps to instruction at 0500h. Right?

If I am wrong, please make regptr16 and memptr16 clear to me.  :icon_rolleyes:




dedndave

you seem to have that worked out fine

i would suggest that you avoid the 16-bit stuff and work on win 32-bit code, if that is an option

jayanthd

Quote from: dedndave on February 24, 2013, 11:50:19 PM
you seem to have that worked out fine

i would suggest that you avoid the 16-bit stuff and work on win 32-bit code, if that is an option

Ok. I am actually learning x86 16-bit, 32-bit, and 64-bit assembly language programming at the same time, and I am using the MASM611, TASM, NASM, WASM, FASM, emu8086, A86, D86, A386, D386, MASM32, and ml64 tools.

A question about jumps.

There are two kinds of jumps: direct jumps via displacements and indirect jumps via memory or register. Can you explain them clearly?

And in both of the above types there are short jumps, near jumps, and far jumps.

Further, the 3 kinds of jumps are classified in the direct and indirect methods as

forward short jump
backward short jump

forward near jump
backward near jump

forward far jump

Are there also backward far jumps in the direct and indirect methods, i.e., can there be a backward jump to another segment?


CS:IP

1234:4444
1234:4445
       .
       .
       .
1235:5555   mov bx, 4444h
1235:5556   jmp bx
       .
       .
       .
1240:1122
1240:1123

Can there be direct and indirect far jumps in the backward direction? For example, in the above block can the jmp be taken to segment 1234 from segment 1235? I know the code is wrong for far jumps because I have to change both CS and IP for far jumps. Can you show how to change CS and IP before taking a far jump?

dedndave

direct jumps use relative addressing
in other words, the branch is taken as a displacement or offset, relative to the address of the next instruction
this also applies to calls - except there are no short calls
    call    SomeAddress
NextAddress:

to calculate the displacement, you would use SomeAddress-NextAddress
if the displacement is positive, you are branching forward
if the displacement is negative, you are branching backward

these types of jumps can be short or near
short means the displacement distance is small enough to fit into a signed byte (range = -128 to +127)
near means that it doesn't fit into a byte
(word used in 16-bit code, dword used in 32-bit and 64-bit code)
the difference is that a shorter opcode is used
another small difference is that some assemblers (including older versions of masm), are not able to handle forward branches well
they cannot calculate the distance, as they haven't filled in the code yet - lol
in the old days, we used JMP SHORT SomeLabel for forward references to tell the assembler it was short
for backward references, the assembler knows what the distance is - so it uses the right opcode automatically
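
The displacement arithmetic can be sketched in Python. `encode_direct_jmp` is a made-up helper that only covers the plain forms (opcode EBh for short, E9h for near) and ignores prefixes and 64-bit details; it is not how any particular assembler is written. The displacement is always measured from the address of the instruction after the jump.

```python
import struct

def encode_direct_jmp(jmp_address, target, bits=32):
    """Return the bytes of a direct JMP placed at jmp_address (sketch only)."""
    # try the 2-byte short form first: opcode EBh + signed byte
    disp = target - (jmp_address + 2)
    if -128 <= disp <= 127:
        return b"\xEB" + struct.pack("<b", disp)
    # near form: opcode E9h + word (16-bit code) or dword (32-bit code)
    if bits == 16:
        disp = (target - (jmp_address + 3)) & 0xFFFF   # offsets wrap within the segment
        return b"\xE9" + struct.pack("<H", disp)
    disp = target - (jmp_address + 5)
    return b"\xE9" + struct.pack("<i", disp)

# addresses below are arbitrary examples
backward_short = encode_direct_jmp(0x401010, 0x401000)        # negative displacement
forward_near = encode_direct_jmp(0x401000, 0x401200)          # too far for a byte
near16 = encode_direct_jmp(0x0100, 0x0300, bits=16)           # 16-bit near form

print(backward_short.hex(), forward_near.hex(), near16.hex())
```

The forward-reference problem shows up here too: the instruction length has to be known before the displacement can be computed, and the length depends on the displacement.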

far branches are very different, and really only apply to 16-bit code, unless you are writing a driver or something
they are not considered to be forward or backward, as the target is in a different code segment
far branches, therefore, are never relative
there are a couple ways to make a far branch in 16-bit code
first, the assembler will know if the target is in a different code segment
so,
    jmp     FarTarget
will work
however, sometimes, it may be in an external module, so you need to tell it
    jmp far ptr FarTarget
another way to achieve the same thing is to push the segment of the target,
then push the offset of the target, then execute a RETF (far return)
if the address is in memory, you use dword ptr
lpFarTarg  dw offset,segment
    jmp dword ptr lpFarTarg


you won't have to deal with far branches much unless you write device drivers, interrupt handlers, or boot code
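
A small Python sketch of what a 16-bit far pointer looks like in memory and where it lands. The helper names are made up for this example, and the 20-bit wrap assumes real mode as on the original 8086. The target is stored as an absolute offset:segment pair, so nothing is subtracted from the address of the jump.

```python
import struct

def far_pointer_bytes(segment, offset):
    """Bytes of a 16:16 far pointer in memory: offset word first, then segment word."""
    return struct.pack("<HH", offset, segment)

def physical_address(segment, offset):
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits (8086 assumption)."""
    return ((segment << 4) + offset) & 0xFFFFF

# lpFarTarg dw offset,segment with the example values 1234:4444
ptr = far_pointer_bytes(0x1234, 0x4444)
print(ptr.hex())

source = physical_address(0x1235, 0x5555)   # where the jump sits
target = physical_address(0x1234, 0x4444)   # where it goes
print(f"{source:05X}h -> {target:05X}h")
```

In this example the target happens to be at a lower physical address than the jump, but the encoding is the same either way: CS and IP are simply loaded with the stored segment and offset.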

jayanthd

Another question:


General, Base, and Index Registers

Bits   w = 0   w = 1      Bits   Segment Register
--------------------------------------------------
000    AL      AX/EAX     000    ES
001    CL      CX/ECX     001    CS
010    DL      DX/EDX     010    SS
011    BL      BX/EBX     011    DS
100    AH      SP/ESP     100    FS
101    CH      BP/EBP     101    GS
110    DH      SI/ESI
111    BH      DI/EDI

for the instruction mov ah, 00

the machine code is 2 bytes

1011 0 100 00000000
mov  w reg operand

The opcode for mov is 1011, w = 0 as the register is 8-bit, the reg bits are 100 as the register is AH, and the operand is 00000000.

What happens inside the CPU when these 2 bytes are executed? How does the digital circuit inside the processor switch for 1011010000000000, and how does AH finally get the value 0?



A question revisited. (stack frame)


push op1
push op2
push ebp
mov ebp, esp
push esi
push edi

mov eax, [ebp+12] ;get op2
add eax, [ebp+16] ;get op1

pop edi
pop esi
mov esp, ebp
pop ebp


is the process of a function call and return. Why are op1 and op2 pushed first? Why can't the code be like below?


push ebp
mov ebp, esp
push esi
push edi
push op1
push op2

mov eax, [ebp-8]
add eax, [ebp-12]

pop edi
pop esi
mov esp, ebp
pop ebp


In the second way op1 and op2 will be inside the stack frame like local variables, but in the normal method op1 and op2 will be in the stack frame of the calling function (main()). Isn't it better if passed arguments are inside the called function's stack frame?


Regarding far jmp:

Far jumps use registers containing the jump address or memory address as operands. Right?
If the code takes three segments, each of 64K bytes, and the jump instruction is in segment 2 and jumps to segment 1, then it is a backward jump. Then how do you say there is no backward jump among far jumps? Assume the code is in 1234:0000h to 1236:FFFFh and the jump instruction is somewhere at 1235:2222. A jump from CS 1235 to CS 1234 is backward. Isn't it?

Another question:

Does the below code move segment address?


mov ax, @data
mov ds, ax

mov ax, seg ds


Quote
jmp far ptr FarTarget

Is it equal to jmp far byte ptr FarTarget ?

Quote
lpFarTarg  dw offset,segment
    jmp dword ptr lpFarTarg

Offset and segment should be initialized. Right? But how to know the address of segment and offset of the far destination? By subtracting far address from address of next instruction after jump instruction?

dedndave

the first part of that question is, lemme say, beyond the scope of this forum - lol
that particular instruction is fairly simple, as it does not involve the ALU (Arithmetic-Logic Unit)
it is, for the most part, handled by the BIU (Bus-Interface Unit), the instruction pre-fetch queue, and the register files
if you want to know how processors work, there is a lot of reading to do - especially modern ones
but, you can read up on microcode, tri-state busses, latches, and decoders/selectors   :P
i will say that opcodes are generally selected so that the microcode knows rather soon how long the instruction is
for most instructions, the cpu probably knows by the value of the first byte - certainly by the second
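
As a rough illustration of how much the first byte already says, here is a Python sketch that picks apart the mov reg, imm form from the question (opcode bits 1011, then w, then the 3-bit register number). It assumes 16-bit mode and handles only this one opcode family; `decode_mov_reg_imm` is a made-up name, and none of this describes the actual circuitry.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def decode_mov_reg_imm(code):
    """Decode 'mov reg, imm' (opcode 1011 w reg) in 16-bit mode; sketch only."""
    first = code[0]
    if first >> 4 != 0b1011:
        raise ValueError("not a mov reg, imm opcode")
    w = (first >> 3) & 1           # 0 = byte register, 1 = word register
    reg = first & 0b111            # 3-bit register number
    length = 2 if w == 0 else 3    # the first byte alone gives the length
    imm = int.from_bytes(code[1:length], "little")
    name = (REG16 if w else REG8)[reg]
    return name, imm, length

decoded = decode_mov_reg_imm(bytes([0b10110100, 0b00000000]))   # mov ah, 00
print(decoded)
```

For the two bytes in the question this gives AH, an immediate of 0, and a length of 2.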

for the second part, you are missing a very important concept
the op2 and op1 values (and any other parameters) are pushed onto the stack before the call
the other pushes are internal to the routine

before the branch occurs...
    push    op2
    push    op1
    push    RetAddress
    jmp     Target
RetAddress:

the CALL instruction pushes the return address, then branches
the routine will end with a RET instruction
the RET instruction pops the return address off the stack and branches to that address (the RET n form also discards n bytes of parms)

so, the stack will look like this....
op2
op1
RetAddress

;any registers pushed by the routine


then, EBP is set to the current value of ESP, as a stable reference point
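
A toy Python model of that sequence may help to see where everything ends up relative to EBP. The starting ESP, the register contents, the parameter values, and the return address are all made-up numbers; only the order of the pushes follows the description above.

```python
# Toy model of a 32-bit stack: a dict keyed by address, ESP grows downward.
stack = {}
regs = {"esp": 0x0012FFC0, "ebp": 0x0012FFF0, "esi": 111, "edi": 222}

def push(value):
    regs["esp"] -= 4
    stack[regs["esp"]] = value

# caller side
push(20)            # push op2
push(10)            # push op1
push(0x00401050)    # CALL pushes the return address (made-up value)

# routine prologue
push(regs["ebp"])            # push ebp
regs["ebp"] = regs["esp"]    # mov ebp, esp
push(regs["esi"])            # push esi
push(regs["edi"])            # push edi

ebp = regs["ebp"]
ret_address = stack[ebp + 4]
op1 = stack[ebp + 8]     # last parameter pushed sits closest to the return address
op2 = stack[ebp + 12]
print(hex(ret_address), op1, op2)
```

With the return address counted, the parameters sit at [ebp+8] and [ebp+12], and the saved ESI and EDI sit below EBP at [ebp-4] and [ebp-8]. The parameters have to be on the stack before the CALL because the caller is the only one who knows their values.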