The MASM Forum

General => The Campus => Topic started by: jayanthd on February 21, 2013, 03:27:39 AM

Title: Help undertanding stack frame
Post by: jayanthd on February 21, 2013, 03:27:39 AM
I have a C/C++ code like below


//Function prototype
int _sum(int _op1, int _op2);

//Main Function
int main() {

int op1, op2, sum;

op1 = 25;
op2 = 75;

//Calling function
sum = _sum(op1, op2);

return (0);

}

//Function Definition
//Called function
int _sum(int _op1, int _op2) {

int result;

result = _op1 + _op2;
return result;

}


When the calling function is executed, the value 75 is pushed onto the stack first, and then the value 25. Then the return address is pushed onto the stack. The return address will be the address of the next instruction after the call. Right? How is the return address calculated?

Then ebp, esi, and edi are pushed onto the stack and ebp is set to esp. So ebp and esp will both point to the top of the stack, which contains edi.

Then, when the called function is executed, a local variable result is created on the stack and the stack pointer will point to the result variable.

Then the values of _op1 and _op2 on the stack are referenced, and the value for result is computed and stored in the result variable on the stack.

How is the result returned to the calling function?
Is _sum(op1, op2) the calling function or is it main() the calling function?
Is the function definition of _sum() the called function or is it _sum() in the main() the called function?

See the asm code below and complete the process after executing the _sum() function


push 75
push 25
push return address
push ebp
mov ebp, esp
push esi
push edi
push result
mov ax, 25
add ax, 75
mov [result], ax
.
.
.
pop edi
pop esi
mov esp, ebp
pop ebp
ret


Where does the stack frame actually get created? Is it when mov ebp, esp is executed?
old value of esp is stored in ebp and then ebp is used to reference the variables on the stack but ebp never changes but esp changes during stack operation. Finally when returning from the function esp is assigned its old value which is in ebp. Right?

In the asm code, can you show how the value of result is returned to the main function?

Can mov ebp, esp be coded after pushing esi and edi onto the stack?
Title: Re: Help undertanding stack frame
Post by: Greenhorn on February 21, 2013, 04:37:12 AM
Hi jayanthd,

this is the wrong subforum for your question ...  ;)

However, the return value is stored in (r/e)ax.
The "result" variable is not necessary in this case.


Cheers
Greenhorn
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 21, 2013, 05:30:16 AM
Quote from: Greenhorn on February 21, 2013, 04:37:12 AM
Hi jayanthd,

this is the wrong subforum for your question ...  ;)

However, the return value is stored in (r/e)ax.
The "result" variable is not necessary in this case.


Cheers
Greenhorn

Thanks for replying Greenhorn.  :biggrin:

Why can't mov ebp, esp be put after push esi and push edi?
How is the value in the result variable returned to the sum variable in main()?
How is the return address calculated after pushing the function arguments on the stack?
Title: Re: Help undertanding stack frame
Post by: dedndave on February 21, 2013, 05:36:47 AM
Quote from: jayanthd on February 21, 2013, 03:27:39 AM
When the calling function is executed, the value 75 is pushed onto the stack first, and then the value 25. Then the return address is pushed onto the stack. The return address will be the address of the next instruction after the call. Right? How is the return address calculated?
that's a pretty good description
the CALL instruction calculates the return address and pushes it onto the stack before branching to the routine

Quote from: jayanthd on February 21, 2013, 03:27:39 AM
How is the result returned to the calling function?
in most high-level compilers, the result is returned in EAX
in assembly language, we may also use ECX and/or EDX to return values, as they need not be preserved
if more space is required, the address of a structure is generally passed and the routine fills it with values

Quote from: jayanthd on February 21, 2013, 03:27:39 AM
Is _sum(op1, op2) the calling function or is it main() the calling function?
Is the function definition of _sum() the called function or is it _sum() in the main() the called function?
i would say main is the calling function, _sum(op1, op2) is the actual call
the called function is defined here
//Function Definition
//Called function
int _sum(int _op1, int _op2) {

int result;

result = _op1 + _op2;
return result;

}


Quote from: jayanthd on February 21, 2013, 03:27:39 AM
Where does the stack frame actually get created? Is it when mov ebp, esp is executed?
old value of esp is stored in ebp and then ebp is used to reference the variables on the stack but ebp never changes but esp changes during stack operation. Finally when returning from the function esp is assigned its old value which is in ebp. Right?
that's pretty close
Quote
old value of esp is stored in ebp
not exactly worded right
the current value of ESP is copied into EBP
Quote
ebp is used to reference the variables on the stack but ebp never changes but esp changes during stack operation
very good   :t
many beginners have a hard time with that one
Quote
when returning from the function esp is assigned its old value which is in ebp
correct, this is often done with a LEAVE instruction, which is essentially the same as
    mov     esp,ebp
    pop     ebp


Quote from: jayanthd on February 21, 2013, 03:27:39 AM
In the asm code, can you show how the value of result is returned to the main function?
again, return values are passed in EAX

Quote from: jayanthd on February 21, 2013, 03:27:39 AM
Can mov ebp, esp be coded after pushing esi and edi onto the stack?
yes - i sometimes write my own stackframe code so i can do it that way

here is how equivalent code might look in assembler....

function prototype, typically near beginning of source
_sum    PROTO   :DWORD,:DWORD
we might like to type them as INT's, but INT is a reserved word in ASM - an instruction for INTerrupt
so - we just type them as DWORD's - assembly does not use strong typing like C

function definition
_sum    PROC    op1:DWORD,op2:DWORD

    mov     eax,op2
    add     eax,op1
    ret

_sum    ENDP


calling the function
    INVOKE  _sum,op1,op2
in this case, you are calling with constants, so...
    INVOKE  _sum,75,25

the actual code generated by the assembler looks like this
_sum    PROC    op1:DWORD,op2:DWORD

    push    ebp
    mov     ebp,esp
    mov     eax,[ebp+12]
    add     eax,[ebp+8]
    leave
    ret     8

_sum    ENDP


and, for the INVOKE...
    push    25
    push    75
    call    _sum
Title: Re: Help undertanding stack frame
Post by: KeepingRealBusy on February 21, 2013, 05:54:25 AM
Quote from: dedndave on February 21, 2013, 05:36:47 AM
.
.
.
_sum    PROTO   :DWORD,:DWORD
we might like to type them as INT's, but INT is a reserved word in ASM - an instruction for INTerrupt
so - we just type them as DWORD's - assembly does not use strong typing like C

function definition
_sum    PROC    op1:DWORD,op2:DWORD

    mov     eax,op2
    add     eax,op1
    ret

_sum    ENDP


calling the function
    INVOKE  _sum,op1,op2
in this case, you are calling with constants, so...
    INVOKE  _sum,75,25


Actually, the assembler DOES support strong typing, at least for function parameters. if you define


PDWORD      TYPEDEF         PTR DWORD


and


_sum    PROTO   :PDWORD,:PDWORD


then you must call as


    INVOKE  _sum,ADDR op1,ADDR op2


You will get an error message if you skip the ADDR operator as in


    INVOKE  _sum,op1,op2


You do not need to endlessly use DWORDs as the only PROTO definers, whether or not you are passing values or pointers to values. The assembler will be checking on you.

Dave.
Title: Re: Help undertanding stack frame
Post by: dedndave on February 21, 2013, 06:04:39 AM
that may be so for PTR's
but, you can prototype with DWORD's, then use UINT's on the PROC line
if i am not mistaken, the assembler only checks it for size
Title: Re: Help undertanding stack frame
Post by: RuiLoureiro on February 21, 2013, 07:16:10 AM
Dave,
        Try to follow this. What is the answer?

ProcA       proc    x:DWORD,...
            push    ebp
            mov     ebp, esp        ; <-  suppose ESP=EBP = 12345678

            ; ...................................
            ; here we write a lot of correct code
            ;     If we push we pop also
            ; all procs we call exit correctly
            ; ...................................

            ; ....................
            ; Here we want to exit  -> question: what the value in ESP ?
            ; ....................

ProcA       endp
Title: Re: Help undertanding stack frame
Post by: dedndave on February 21, 2013, 07:36:53 AM
hopefully, it will be 12345678   :P

but, what if you want to put a bunch of locals on the stack without keeping track of how big they are ?
then, the MOV ESP,EBP (or LEAVE) balances the stack for you automatically
Title: Re: Help undertanding stack frame
Post by: Gunther on February 21, 2013, 08:14:51 AM
Hi RuiLoureiro,

Quote from: RuiLoureiro on February 21, 2013, 07:16:10 AM
ProcA       proc    x:DWORD,...
            push    ebp
            mov     ebp, esp        ; <-  suppose ESP=EBP = 12345678

            ; ...................................
            ; here we write a lot of correct code
            ;     If we push we pop also
            ; all procs we call exit correctly
            ; ...................................

            ; ....................
            ; Here we want to exit  -> question: what the value in ESP ?
            ; ....................

ProcA       endp

But the current value of ESP isn't interesting, because we're addressing via EBP. ESP changes with every PUSH, POP, function call, etc.

Gunther
Title: Re: Help undertanding stack frame
Post by: MichaelW on February 21, 2013, 08:32:42 AM
Quote from: jayanthd on February 21, 2013, 03:27:39 AM
How is the result returned to the calling function?

See Agner Fog's calling_conventions.pdf available  here (http://www.agner.org/optimize/).
Title: Re: Help undertanding stack frame
Post by: dedndave on February 21, 2013, 08:37:08 AM
Quote from: Gunther on February 21, 2013, 08:14:51 AM
But the current value of ESP isn't interesting, because we're addressing via EBP. ESP changes with every PUSH, POP, function call, etc.

Gunther

it will be interesting for the next instruction where, presumably, they POP EBP and RET   :biggrin:
Title: Re: Help undertanding stack frame
Post by: RuiLoureiro on February 21, 2013, 08:51:21 AM
Quote from: dedndave on February 21, 2013, 07:36:53 AM
hopefully, it will be 12345678   :P

but, what if you want to put a bunch of locals on the stack without keeping track of how big they are ?
then, the MOV ESP,EBP (or LEAVE) balances the stack for you automatically
Thats right. Everything ok Dave !  ;)

Gunther:  Dave gave the answer for me  :t
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 21, 2013, 04:11:28 PM
Thanks everybody. It was helpful.

@Dave

Quote
and, for the INVOKE...
   
    push    25
    push    75
    call    _sum


Why is op1 pushed first and op2 pushed next? I read somewhere that before calling a function, the arguments are pushed in reverse order.

I have another question.

In different addressing modes we use instructions like below to load some value stored at some address into a register.


var1 dd ?

mov eax, [var1]
mov eax, offset var1
mov eax, [aabbccdd]

or

mov ebx, aabbccdd
mov eax, ds:[ebx]                   ;value from some address
mov eax, ds:[ebx + 2]             ;value from some effective address

or

mov si, aabbccdd
mov eax, ds:[si]

etc...




Actually the [address] points to the actual data at that address.

What is the difference between the instructions above and the instructions below for getting data?



mov ax, byte ptr ds:[var1]                  ;here var1 is a byte
mov eax, word ptr ds:[aabb]
mov eax, dword ptr ds:[aabbccdd]
mov eax, dword ptr ds:[ebx]              ;ebx is set to aabbccdd earlier



Where is the above code used?


Title: Re: Help undertanding stack frame
Post by: dedndave on February 21, 2013, 04:44:59 PM
Quote from: jayanthd on February 21, 2013, 04:11:28 PM
and, for the INVOKE...
   
    push    25
    push    75
    call    _sum

Why is op1 pushed first and op2 pushed next? I read somewhere that before calling a function, the arguments are pushed in reverse order.
my mistake - i simply swapped 25 and 75 by accident
the last parameter listed is pushed first

Quote from: jayanthd on February 21, 2013, 04:11:28 PM
In different addressing modes we use instructions like below to load some value stored at some address into a register.

var1 dd ?

mov eax, [var1]
mov eax, offset var1
mov eax, [aabbccdd]

or

mov ebx, aabbccdd
mov eax, ds:[ebx]                   ;value from some address
mov eax, ds:[ebx + 2]             ;value from some effective address

or

mov si, aabbccdd
mov eax, ds:[si]

etc...




Actually the [address] points to the actual data at that address.

What is the difference between the instructions above and the instructions below for getting data?



mov ax, byte ptr ds:[var1]                  ;here var1 is a byte
mov eax, word ptr ds:[aabb]
mov eax, dword ptr ds:[aabbccdd]
mov eax, dword ptr ds:[ebx]              ;ebx is set to aabbccdd earlier



Where is the above code used?
there are a variety of addressing modes that may be used for different purposes
first, let's deal with the address issue....
var1 dd ?

    mov     eax, offset var1

the assembler creates space for the label "var1" at some address
the assembler knows what the address is at assembly-time
the "offset" operator tells the assembler to load the address of var1 into EAX, not the contents at that address
the actual code generated might look something like this
    mov     eax,00401012h   ;the address of var1 is loaded into EAX

if we were to reference a LOCAL variable this way, we would use
    LOCAL   var1    :DWORD

    lea     eax,var1

LEA stands for Load Effective Address
behind the scenes, LOCAL's are addressed by using EBP as a reference
the address isn't known at assembly-time, so the assembler can't use MOV,constant
the actual code might be something like
    lea     eax,[ebp-4]
LEA calculates the address of var1 by subtracting 4 from the address in EBP and placing that value in EAX

now, let's load the contents
i noticed you used [] brackets
with MASM, you don't need to use brackets unless you are using a register
for a variable name...
    mov     eax,var1   ;the contents at the address of var1 are loaded into EAX
for a GLOBAL variable, the actual code generated by the assembler might look something like this
    mov     eax,[00401012h]
for a LOCAL...
    mov     eax,[ebp-4]

when addressing arrays or strings, it is often convenient to access data by using a register to hold the address
    mov     edx,offset var1
    mov     eax,[edx]

this is done so that you may calculate steps in EDX to address the individual elements of an array
on the next pass of a loop, for example, we might...
    add     edx,4      ;adjust the address
    mov     eax,[edx]  ;get the next dword element


there are numerous combinations that are available
    xor     edx,edx        ;zero EDX
    mov     eax,var1[edx]  ;same as MOV EAX,[EDX+var1]


you can also use 2 registers, an index, or even a multiplier of 2, 4, or 8
    mov     eax,MyArray[4*edx+ebx+24]
that's about as complex as they allow   :P
notice that the assembler combines "MyArray" and "+24" to form a single constant
this form might be used to address a 3-dimensional array, where EDX is an X, EBX is a Y, and +24 is a Z
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 23, 2013, 03:04:46 AM
@Dave

Your answers cleared most of my doubts but I didn't get clear picture of the below things.
Quote
there are numerous combinations that are available
    xor     edx,edx        ;zero EDX
    mov     eax,var1[edx]  ;same as MOV EAX,[EDX+var1]


Can you explain how mov eax, var1[edx] works? edx is cleared, So, [edx + var1] = [var1] Right? and var1[edx] = var1[0] = var1. Right?

Quote
you can also use 2 registers, an index, or even a multiplier of 2, 4, or 8
    mov     eax,MyArray[4*edx+ebx+24]
that's about as complex as they allow   :P
notice that the assembler combines "MyArray" and "+24" to form a single constant
this form might be used to address a 3-dimensional array, where EDX is an X, EBX is a Y, and +24 is a Z

By using two registers are you telling it will be based indexed addressing mode, where one base register and one index register is used to get the effective address like
mov eax, [ebx + esi + 12] ?

can you explain more about this code     mov  eax,MyArray[4*edx+ebx+24]

It will be MyArray[some address related to some element of the array]. Right?

And I didn't get this. There are different addressing modes. Ok, but what is the difference between the below two codes?
mov eax, dword ptr ds:[aabbccdd]

mov eax, ds:[aabbccdd]
Title: Re: Help undertanding stack frame
Post by: dedndave on February 23, 2013, 05:42:45 AM
i am writing a little demo program to help you understand
i am almost done with it   :P
Title: Re: Help undertanding stack frame
Post by: dedndave on February 23, 2013, 05:56:28 AM
hopefully, this will help...
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 24, 2013, 01:27:37 AM
Ok, Dave. I will check your code. I have a few more doubts in assembly language. Please clear them.

My first doubt... In the below code
var1  db    10h
var2  db  10h, 20h, 30h, 40h, 50h
msg1 db 'Hello!', '$'


If data starts at DS:0000 then address of var1 is DS:0000h and value is 10h. Then address of var2 will be DS:0001h as var1 was only one byte. How can var2 have 5 bytes when var2 is only one byte? db defines only one byte. Right?
Similarly msg1 is defined as byte and the string 'Hello!' will take 6 or 7 bytes including null character. Each character in the string will take a byte. Then how can more than one byte assigned to one byte defined?

How can the second byte of var2, which is 20h, be addressed? The address of var2 is DS:0001, so is the address of 20h DS:0002?

Title: Re: Help undertanding stack frame
Post by: dedndave on February 24, 2013, 02:04:49 AM
yes - masm allows you to define numerous bytes on one line
this applies to all data sizes (words, dwords, etc)
the second byte of var2 may be addressed as
    mov     al,var2+1

there are other ways to define multiple data values on one line, as well
Buffer  db 1024 dup(?)
assigns 1,024 uninitialized bytes to Buffer
Array   dd 2048 dup(0FFFFFFFFh)
assigns 2,048 dwords to Array, and initializes them to 0FFFFFFFFh

the assembler keeps track of the addresses and number of bytes used
it knows what address to assign to each label

what you seem to be looking at is 16-bit code   :P
with 16-bit code, you can only address 65536 unique locations with a word address
the data segment register (DS) must point to a specific segment of memory

32-bit code is quite different
because you can directly address up to 4 gb with a dword address value, all code, stack, and data are in the same "segment"
so - no need to mess with segment registers   :t

EDIT: fixed a typo, per Michael
missed the "dup" operator for 2048 dup(0FFFFFFFFh)
Title: Re: Help undertanding stack frame
Post by: MichaelW on February 24, 2013, 05:46:36 AM
Quote from: dedndave on February 24, 2013, 02:04:49 AM

Array   dd 2048(0FFFFFFFFh)
assigns 2,048 dwords to Array, and initializes them to 0FFFFFFFFh

Using ML 6.15, without the dup this defines only a single DWORD with the value 2047, the same as would:

Array dd 2048(-1)





Title: Re: Help undertanding stack frame
Post by: dedndave on February 24, 2013, 06:21:15 AM
typo, Michael   :P
fixed it
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 24, 2013, 05:26:17 PM
Ok. What about the msg1 variable? It starts at DS:0006h. Does the assembler know that msg1 is from DS:0006h to DS:000C    ?

If we use mov dx, offset msg1

The starting address of msg1 which is DS:0006h is put to DX. Does the assembler know that the length of msg1 to be printed is 7 bytes including the null character? While printing the msg1 will it print first byte of msg1 to last byte of msg1?
Title: Re: Help undertanding stack frame
Post by: MichaelW on February 24, 2013, 09:15:57 PM
The assembler knows the length of it, but knows nothing about what is to be printed. It's up to the programmer to indicate to the print routine where the string starts and where it ends. Typically, strings are null-terminated and the print routine stops when it reads the null.

;==============================================================================
    include \masm32\include\masm32rt.inc
;==============================================================================
    .data
        v1  db "XXXXXX",0
        v2  dw 1,2,3,4
        v3  dd 1,2,3,4
        v4  dq 1,2,3,4
    .code
;==============================================================================
start:
;==============================================================================
    printf("%s\n\n", OFFSET v1)
    printf("%Xh\t%d\t%d\n",   OFFSET v1, SIZE v1, SIZEOF v1)
    printf("%Xh\t%d\t%d\n",   OFFSET v2, SIZE v2, SIZEOF v2)
    printf("%Xh\t%d\t%d\n",   OFFSET v3, SIZE v3, SIZEOF v3)
    printf("%Xh\t%d\t%d\n\n", OFFSET v4, SIZE v4, SIZEOF v4)
    inkey
    exit
;==============================================================================
END start


XXXXXX

403000h 1       7
403007h 2       8
40300Fh 4       16
40301Fh 8       32


If you are using the DOS WriteString function (Interrupt 21h function 09h), which expects the OFFSET address of the string in DX, you will need to change the null terminator to a "$".
Title: Re: Help undertanding stack frame
Post by: dedndave on February 24, 2013, 09:56:32 PM
yes - 16-bit DOS uses a '$' char to terminate strings - win32 uses a null (0)
that's how i knew he was playing with 16-bit code   :P
but, the idea is the same

win32 also uses the length of a string when it displays it in the console
but, Hutch has hidden that from us with the StdOut routine and print macro
when you pass the address of a null-terminated string to StdOut, it calls StrLen to get the length
the win32 API function, WriteFile, requires the length as a parameter

StdOut proc lpszText:DWORD

    LOCAL hOutPut  :DWORD
    LOCAL bWritten :DWORD
    LOCAL sl       :DWORD

    invoke GetStdHandle,STD_OUTPUT_HANDLE
    mov hOutPut, eax

    invoke StrLen,lpszText
    mov sl, eax

    invoke WriteFile,hOutPut,lpszText,sl,ADDR bWritten,NULL

    mov eax, bWritten
    ret

StdOut endp


Michael gave a nice example of the SIZEOF operator
SIZEOF works well for strings that are on one line
sometimes, you want to use multiple lines, so you might do something like this
szString db 'first line',13,10
         db 'second line',13,10
         db 'third line',13,10,0
EndOfString LABEL BYTE

now, to get the number of bytes in the string (less the null terminator)...
        mov     ecx,EndOfString-szString-1
the assembler does the math for you and resolves it to a single constant
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 24, 2013, 11:41:07 PM
Ok. Thanks Michael and Dave. That cleared my doubts. I bought 10 books on Assembly Language and am studying them, but I am getting confused. I want to know what regptr16 and memptr16 are. Here I am referring to 16-bit programming, and of course the same applies to regptr32 and memptr32.

All I can understand is regptr16 is a 16 bit register used as pointer and memptr16 is a memory address which contains another memory address and which is used as operand like below


mov bx, 0001h     ;0001h is some address
jmp  bx



mov [0002h], 0500h      ; 0002h and 0500h are memory addresses
jmp  [0002h]


In the first code execution jumps to instruction at address 0001h. Right?
In the second code execution jumps to instruction at 0500h. Right?

If I am wrong make me clear about regptr16 and memptr16.  :icon_rolleyes:



Title: Re: Help undertanding stack frame
Post by: dedndave on February 24, 2013, 11:50:19 PM
you seem to have that worked out fine

i would suggest that you avoid the 16-bit stuff and work on win 32-bit code, if that is an option
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 25, 2013, 02:32:19 AM
Quote from: dedndave on February 24, 2013, 11:50:19 PM
you seem to have that worked out fine

i would suggest that you avoid the 16-bit stuff and work on win 32-bit code, if that is an option

Ok. I am actually learning x86 16-bit, 32-bit, and 64-bit assembly language programming at the same time, and I am using the MASM611, TASM, NASM, WASM, FASM, emu8086, A86, D86, A386, D386, MASM32, and ml64 tools.

A question about jumps.

There are two kinds of jumps Direct jumps via displacements and Indirect jumps via memory or register. Can you explain them clearly?

And In both the above types there are short jump, near jump, and far jump.

Further, the 3 kinds of jumps are classified in direct and indirect methods as

forward short jump
backward short jump

forward near jump
backward near jump

forward far jump

Is there also backward far jumps in direct and indirect methods i.e., can there be backward jump to another segment?


CS:IP

1234:4444
1234:4445
       .
       .
       .
1235:5555   mov bx, 4444h
1235:5556   jmp bx
       .
       .
       .
1240:1122
1240:1123

Can there be direct and indirect far jumps in backward direction like, in the above block can the jmp be taken to segment 1234 from segment 1235? I know the code is wrong for far jumps because I have to change both CS and IP for far jumps. Can you show how to change CS and IP before taking a far jump?
Title: Re: Help undertanding stack frame
Post by: dedndave on February 25, 2013, 03:06:44 AM
direct jumps use relative addressing
in other words, the branch is taken as a displacement or offset, relative to the address of the next instruction
this also applies to calls - except there are no short calls
    call    SomeAddress
NextAddress:

to calculate the displacement, you would use SomeAddress-NextAddress
if the displacement is positive, you are branching forward
if the displacement is negative, you are branching backward

these types of jumps can be short or near
short means the displacement distance is small enough to fit into a signed byte (range = -128 to +127)
near means that it doesn't fit into a byte
(word used in 16-bit code, dword used in 32-bit and 64-bit code)
the difference is that a shorter opcode is used
another small difference is that some assemblers (including older versions of masm), are not able to handle forward branches well
they cannot calculate the distance, as they haven't filled in the code yet - lol
in the old days, we used JMP SHORT SomeLabel for forward references to tell the assembler it was short
for backward references, the assembler knows what the distance is - so it uses the right opcode automatically

far branches are very different, and really only apply to 16-bit code, unless you are writing a driver or something
they are not considered to be forward or backward, as the target is in a different code segment
far branches, therefore, are never relative
there are a couple ways to make a far branch in 16-bit code
first, the assembler will know if the target is in a different code segment
so,
    jmp     FarTarget
will work
however, sometimes, it may be in an external module, so you need to tell it
    jmp far ptr FarTarget
another way to achieve the same thing is to push the segment of the target,
then push the offset of the target, then execute a RETF (far return)
if the address is in memory, you use dword ptr
lpFarTarg  dw offset,segment
    jmp dword ptr lpFarTarg


you won't have to deal with far branches much unless you write device drivers, interrupt handlers, or boot code
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 25, 2013, 03:36:50 AM
Another question:


General, Base, and Index Registers

Bits   w = 0   w = 1      Bits   Segment Register
--------------------------------------------------
000    AL      AX/EAX     000    ES
001    CL      CX/ECX     001    CS
010    DL      DX/EDX     010    SS
011    BL      BX/EBX     011    DS
100    AH      SP         100    FS
101    CH      BP         101    GS
110    DH      SI
111    BH      DI

for the instruction mov ah, 00

the machine code is 2 bytes

1011   0   100   00000000
mov    w   reg   operand


op code for mov is 1011, w = 0 as reg is 8 bit, reg bits is 100 as reg is AH, and operand is 00000000.

What happens inside the cpu when these 2 bytes are executed? How does the digital circuit inside the processor switch for 1011010000000000 and how does AH finally get the value 0?



A question revisited. (stack frame)


push op1
push op2
push ebp
mov ebp, esp
push esi
push edi

mov eax, [ebp+12] ;get op2
add eax, [ebp+16] ;get op1

pop edi
pop esi
mov esp, ebp
pop ebp


is the process of a function call and return. Why are op1 and op2 pushed first? Why can't the code be like below?


push ebp
mov ebp, esp
push esi
push edi
push op1
push op2

mov eax, [ebp-8]
add eax, [ebp-12]

pop edi
pop esi
mov esp, ebp
pop ebp


In the second way op1 and op2 will be inside the stack frame like local variables, but in the normal method op1 and op2 will be in the stack frame of the calling function (main()). Isn't it better if the passed arguments are inside the called function's stack frame?


Regarding far jmp:

Far jumps use registers containing the jump address or memory address as operands. Right?
If the code takes three segments, each of 64K bytes, and the jump instruction is in segment 2 and jumps to segment 1, then it is a backward jump. Then how do you say there is no backward jump in far jumps? Assume the code is in 1234:0000h to 1236:FFFFh and the jump instruction is somewhere in 1235:2222. A jump from CS 1235 to CS 1234 is backward. Isn't it?

Another question:

Does the below code move segment address?


mov ax, @data
mov ds, ax

mov ax, seg ds


Quote
jmp far ptr FarTarget

Is it equal to jmp far byte ptr FarTarget ?

Quote
lpFarTarg  dw offset,segment
    jmp dword ptr lpFarTarg

Offset and segment should be initialized. Right? But how to know the address of segment and offset of the far destination? By subtracting far address from address of next instruction after jump instruction?
Title: Re: Help undertanding stack frame
Post by: dedndave on February 25, 2013, 04:08:19 AM
the first part of that question is, lemme say, beyond the scope of this forum - lol
that particular instruction is fairly simple, as it does not involve the ALU (Arithmetic-Logic Unit)
it is, for the most part, handled by the BIU (Bus-Interface Unit), the instruction pre-fetch queue, and the register files
if you want to know how processors work, there is a lot of reading to do - especially modern ones
but, you can read up on microcode, tri-state busses, latches, and decoders/selectors   :P
i will say that opcodes are generally selected so that the microcode knows rather soon how long the instruction is
for most instructions, the cpu probably knows by the value of the first byte - certainly by the second

for the second part, you are missing a very important concept
the op2 and op1 values (and any other parameters) are pushed onto the stack before the call
the other pushes are internal to the routine

before the branch occurs...
    push    op2
    push    op1
    push    RetAddress
    jmp     Target
RetAddress:

the CALL instruction pushes the return address, then branches
the routine will end with a RET instruction
the RET instruction pops the return address off the stack, discards the parms, and branches to that address

so, the stack will look like this....
op2
op1
RetAddress

;any registers pushed by the routine


then, EBP is set to the current value of ESP, as a stable reference point
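
The sequence above can be walked through with a toy model. This is only a sketch, not something an assembler emits: it assumes a 32-bit stack, the usual push ebp / mov ebp, esp prologue, and the C convention where the caller removes the parms after the return. The Stack class, call_sum, and the addresses 1000h, 401020h, and 2000h are made up for illustration.

```python
# Toy model of a 32-bit stack: a dict mapping addresses to dwords.
# ESP starts at an arbitrary (assumed) address and grows downward.

class Stack:
    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4                # the stack grows toward lower addresses
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value


def call_sum(op1, op2, ret_address=0x401020, caller_ebp=0x2000):
    s = Stack()
    s.push(op2)                      # caller: push op2
    s.push(op1)                      # caller: push op1
    s.push(ret_address)              # CALL: push the return address, then branch
    s.push(caller_ebp)               # callee: push ebp
    ebp = s.esp                      # callee: mov ebp, esp
    # [ebp+4] is the return address, [ebp+8] is op1, [ebp+12] is op2
    result = s.mem[ebp + 8] + s.mem[ebp + 12]
    s.esp = ebp                      # callee: mov esp, ebp
    s.pop()                          # callee: pop ebp
    returned_to = s.pop()            # RET: pop the return address and branch to it
    s.esp += 8                       # caller: add esp, 8 (remove the two parms)
    return result, returned_to, s.esp


result, returned_to, final_esp = call_sum(25, 75)
print(result, hex(returned_to), hex(final_esp))
```

In real code the result would come back in EAX rather than as a Python return value, and ESP ends where it started once the parms are removed.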
Title: Re: Help undertanding stack frame
Post by: dedndave on February 25, 2013, 04:32:04 AM
now - back to forward and backward references
it is a matter of context
the direction vector only has meaning when the branch is relative - far branches are never relative

so - i said that far branches have no forward or backward - that isn't strictly correct
what i might have said is - for far branches, forward or backward direction has no significance

also - far branches cannot target addresses that are in a register - at least, not on x86
the address is in the code stream or in a memory location

mov ax, @data
mov ds, ax

mov ax, seg ds

"@data" is an assembler short-hand for "the data segment", usually _DATA is the real name
you cannot load a segment register from an immediate operand, so they put it in a general register, first
that last instruction makes no sense   :P

jmp far byte ptr FarTarget
no - in 16-bit code, far addresses consist of a segment (word) and an offset (word) - so "byte" is not good
intel usually stores the segment at the higher address (little-endian)

lpFarTarg  dw offset,segment
        jmp dword ptr lpFarTarg


Quote
Offset and segment should be initialized. Right? But how to know the address of segment and offset of the far destination? By subtracting far address from address of next instruction after jump instruction?

yes - they must be initialized, either by code, or by defining them as initialized variables
as for the segment, the operating system will adjust the segment as a relocatable when it loads the EXE
the offset can be a label
it might look like this
pFarTarget dw FarLabel,FAR_CODE
where the target is...
FAR_CODE SEGMENT

FarLabel:

FAR_CODE ENDS


pardon my "lp" - i am used to win32 code   :P

subtracting the address of the next instruction only applies to relative (near) branches
so - that part is wrong
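
To make the memory layout of a far pointer concrete, here is a rough sketch of how "dw offset,segment" ends up as four bytes, and how real-mode hardware turns segment:offset into a physical address. The function names are hypothetical, and the 1 MB wrap assumes an 8086-class processor.

```python
import struct


def far_pointer_bytes(segment, offset):
    """Bytes of a 16:16 far pointer as 'dw offset,segment' lays them out."""
    # little-endian: offset word at the lower address, segment word above it
    return struct.pack("<HH", offset, segment)


def linear_address(segment, offset):
    """Real-mode physical address: segment * 16 + offset, wrapping at 1 MB."""
    return ((segment << 4) + offset) & 0xFFFFF


print(far_pointer_bytes(0x1234, 0x5678).hex(" "))
print(hex(linear_address(0x1235, 0x2222)))
print(linear_address(0x1235, 0x0000) - linear_address(0x1234, 0x0000))
```

The last line shows that segment values 1234h and 1235h start only 16 bytes apart, so 1234:0010 and 1235:0000 name the same byte. That is one more reason "forward" and "backward" carry little meaning for far branches.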
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 25, 2013, 07:28:18 AM
Quote
the direction vector only has meaning when the branch is relative
What do you mean by relative? Does it mean the jump label is within the 64K byte offset range of the same segment?

mov ax, seg ds makes sense to me. It loads ax with data segment address (not offset address) in emu8086.
Quote
lpFarTarg  dw offset,segment
        jmp dword ptr lpFarTarg

Where are the values for offset and segment defined? How can variable names be used as the contents of dw?

Should it be

lpFarTarg  dw aabbccddh,ddccbbaah
        jmp dword ptr lpFarTarg



In jmp far ptr FarTarget

is FarTarget a variable name or label? I am asking because FarTarget should contain the CS:IP address to where the code has to jump

Quote
pFarTarget dw FarLabel,FAR_CODE

I can't understand this... FAR_CODE is the label and it has some offset address in the code. But where is segment address? In the FarLabel?

Why is it written like pFarTarget dw FarLabel,FAR_CODE and not like
pFarTarget dw FarLabel:FAR_CODE
Title: Re: Help undertanding stack frame
Post by: MichaelW on February 25, 2013, 09:29:43 AM
In this context "relative" means relative to the current value of the instruction pointer. A relative address is encoded as a "displacement", which is effectively a signed value that is added to the instruction pointer to set it to the destination address. Displacements can be SHORT or NEAR, with a SHORT displacement encoded as a BYTE, and a NEAR displacement encoded as a WORD for 16-bit code or as a DWORD for 32-bit code. MASM by default uses the shortest displacement encoding possible, but as shown below it can be forced to a larger encoding.

And in case this is not clear in the listing, the first byte of the encoded instructions is the opcode, and for the call and jump instructions the instruction operand is the displacement.

;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
.data
.code
;==============================================================================
  L0:
    ret
  L1:
    jmp L2
  start:
    call L0
    jmp L1
  L2:
    jmp L3
  L3:
    jmp NEAR PTR L4
    nop
    nop
  L4:
    inkey
    exit
;==============================================================================
end start

00000000   L0:
00000000     C3             ret
00000001   L1:
00000001     EB 07          jmp L2            ; displacement = +7
00000003   start:
00000003     E8 FFFFFFF8    call L0           ; displacement = -8
00000008     EB F7          jmp L1            ; displacement = -9
0000000A   L2:
0000000A     EB 00          jmp L3            ; displacement = 0
0000000C   L3:
0000000C     E9 00000002    jmp NEAR PTR L4   ; displacement = +2
00000011     90             nop
00000012     90             nop
00000013   L4:
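
The displacements in the listing can be checked by hand: each one is the target address minus the address of the next instruction. A small sketch of that arithmetic follows; the helper names displacement and encode are hypothetical, and the addresses and lengths are copied from the listing.

```python
def displacement(instr_address, instr_length, target_address):
    """Relative branch displacement: target minus address of the NEXT instruction."""
    return target_address - (instr_address + instr_length)


def encode(disp, size):
    """Signed little-endian bytes of a displacement (1 = SHORT, 4 = 32-bit NEAR)."""
    return disp.to_bytes(size, "little", signed=True)


# (address, length in bytes, target address) taken from the listing
listing = {
    "jmp L2":          (0x01, 2, 0x0A),
    "call L0":         (0x03, 5, 0x00),
    "jmp L1":          (0x08, 2, 0x01),
    "jmp L3":          (0x0A, 2, 0x0C),
    "jmp NEAR PTR L4": (0x0C, 5, 0x13),
}

for name, (addr, length, target) in listing.items():
    disp = displacement(addr, length, target)
    # every instruction here is one opcode byte followed by the displacement
    print(f"{name:16} displacement = {disp:+d}  bytes = {encode(disp, length - 1).hex(' ')}")
```

For call L0 the bytes come out as f8 ff ff ff because memory order is little-endian; the listing prints the same dword as FFFFFFF8.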


Title: Re: Help undertanding stack frame
Post by: jayanthd on February 25, 2013, 05:38:54 PM
That was somewhat clear.

Quote
the first byte of the encoded instructions is the opcode

What is an encoded instruction? Are you saying that for a 16 bit (2 byte) instruction, the 1st byte (upper byte or left byte) will be the opcode and the next byte will be the operand? Encoded instruction means machine code. Right?

The below code in emu8086 loads the segment value and offset value of a variable to cx and dx registers.

data segment
var1 dw 2030h, 4050h
ends

stack segment
    dw   128  dup(0)
ends

code segment
start:

    ; add your code here
    mov ax, data
    mov ds, ax
   
    mov bx, ds
    mov cx, seg var1
    mov dx, offset var1
   
mov ax, 4c00h
int 21h 

ends

end start
Title: Re: Help undertanding stack frame
Post by: dedndave on February 25, 2013, 07:22:03 PM
different instructions require different numbers of bytes
16-bit code refers to code that runs on 16-bit processors, in the case of intel, 8086/8088/80186/80188
it does not mean that each instruction is 16 bits

INC AX is a single byte
JMP 8000:0000 is 5 bytes
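
As a rough illustration of those two lengths, here are the bytes spelled out for 16-bit code. The name encode_jmp_far is hypothetical; the encodings used are the one-byte 40h for INC AX and opcode EAh followed by the offset word and the segment word for a direct far jump.

```python
import struct

INC_AX = bytes([0x40])               # INC AX in 16-bit code: a single opcode byte


def encode_jmp_far(segment, offset):
    """JMP ptr16:16 = opcode EAh, then the offset word, then the segment word."""
    return bytes([0xEA]) + struct.pack("<HH", offset, segment)


jmp = encode_jmp_far(0x8000, 0x0000)
print(len(INC_AX), INC_AX.hex(" "))
print(len(jmp), jmp.hex(" "))
```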

when the processor interprets instructions, one of the things it must do is determine the number of bytes

the term "opcode" is thrown around a bit ambiguously
because part of an instruction might be the opcode and part might be an immediate operand
we often refer to the whole thing as an opcode - lol
i can see where that might be a little confusing

as for the stretch of code....
sure - you can load the segment and offset into registers
but, intel processors do not provide instructions that look like this
        jmp     cx:dx
        call    cx:dx


if i wanted to branch to a far address from values in registers...
        push    cx    ;push the segment
        push    dx    ;push the offset
        retf          ;far return


if var1 was a code label, you could just branch to the label
the assembler knows it is in a different segment, and makes it a far branch

however, var1 is not a code label - it is a data label
what you really want is
        jmp dword ptr var1
now, the assembler knows that var1 has 2 words
it knows that it is a far branch
the segment of var1 must be in a segment register
normally, the DS register holds the data segment

in everything i have discussed, i am referring to MASM syntax
we don't use emu8086 much in here
Title: Re: Help undertanding stack frame
Post by: MichaelW on February 25, 2013, 10:29:01 PM
Quote from: jayanthd on February 25, 2013, 05:38:54 PM
Quote
the first byte of the encoded instructions is the opcode
What is an encoded instruction?

An encoded instruction is an instruction in its machine code format. The opcodes that I was referring to in the listing, in this case (but not in the general case) each a single byte, are:
C3
EB
E8
EB
EB
E9
90
90
Sorry for the confusion, I was trying to make it easy for you to identify the point of interest, the encoded displacements in the instruction operands.
Title: Re: Help undertanding stack frame
Post by: jayanthd on February 26, 2013, 04:34:44 AM
@ Michael and Dave

Ok. If I have a 1 byte, 2 byte, and 6 byte machine code like below

F6
D58A
EBC14F8B2C90


Then F6, D5, and EB are the opcodes. Right?

I am using masm611 and emu8086. I will play with it for another 2 weeks and then I will start 32 bit assembly programming using MASM32. So, bear with me.

Can anybody give me simple MASM32 code for adding two numbers? It must have three variables: var1, var2, and sum. The result should be printed in the console window.
Title: Re: Help undertanding stack frame
Post by: dedndave on February 26, 2013, 04:56:50 AM
correct on the opcode

you did say masm32   :P
we assume you have installed the masm32 package

;###############################################################################################

        INCLUDE    \Masm32\Include\Masm32rt.inc

;###############################################################################################

        .DATA

var1    dd 65
var2    dd 75

;***********************************************************************************************

        .DATA?

sum     dd ?

;###############################################################################################

        .CODE

;***********************************************************************************************

_main   PROC

        mov     eax,var1
        add     eax,var2
        mov     sum,eax

        print   str$(eax),13,10
        inkey
        exit

_main   ENDP

;###############################################################################################

        END     _main
Title: Re: Help undertanding stack frame
Post by: dedndave on February 26, 2013, 05:11:04 AM
i guess, if you are using masm v 6.11, you may not have installed the masm32 package
it will be difficult to get started with 32-bit code without installing it

you can, however, assemble 16-bit code with it, provided you have a 16-bit linker
(the 32-bit linker will not link 16-bit modules)
notice that you can use some 32-bit registers in 16-bit code - that is probably a little confusing
the fact is, if you need 32-bit registers, you may as well write 32-bit code   :biggrin:

at any rate, here is a 16-bit equiv of the above program
i have omitted the display part, as you would need to write a routine for that and i didn't want to complicate it
you can watch the results in a debugger

        .MODEL  Small
        .STACK  1024
        OPTION  CaseMap:None

;####################################################################################

        .DATA

var1    dw 65
var2    dw 75

;************************************************************************************

        .DATA?

sum     dw ?

;####################################################################################

        .CODE

;************************************************************************************

_main   PROC    FAR

;----------------------------------

;DS = DGROUP

        mov     ax,@data
        mov     ds,ax

;----------------------------------

        mov     ax,var1
        add     ax,var2
        mov     sum,ax

;----------------------------------

;terminate

        mov     ax,4C00h
        int     21h

_main   ENDP

;####################################################################################

        END     _main

Title: Re: Help undertanding stack frame
Post by: jayanthd on February 26, 2013, 05:26:47 PM
I have installed both MASM 615 and MASM32. I will start using MASM32 in another 2 days.  :P

The things that I didn't understand in the masm32 code and masm611 code are

masm32 code
print   str$(eax),13,10
        inkey


Why str$(eax)? Why not str$(sum)...
What is inkey?

masm611 code
What is OPTION  CaseMap:None

Title: Re: Help undertanding stack frame
Post by: dedndave on February 27, 2013, 12:03:03 AM
well - i could use (sum), or i could use (eax)
it just happens that the value is in a register, at that time
so, it is more efficient to use the register

print, inkey, str$ are all macros provided by Hutch's masm32 package
they just save some typing - and make the code a little easier to read

inkey displays a "press any key" message and waits for a keypress
if you run the program by clicking on it in windows explorer, and you don't have some kind of wait,
the console opens, runs the program, and closes before you get to see the results

you will want to browse the files in the \masm32\help folder
the macros are described in hlhelp.chm
and are defined in \masm32\macros\macros.asm