16-bit MASM blues...

Started by turboscrew, December 13, 2013, 04:37:37 PM


turboscrew

Now that I stumbled into 16-bit MASM-stuff: I remember why I never learned
x86 assembly so well...

The main headaches were:


  • ASSUME

  • SEGMENT - ENDS

  • PROC - ENDP

  • END

My questions are:


  • Is there something - even in principle - that starts something that END ends? If so, then what?

  • SEGMENT: Does that define physical or logical segment? Looks like physical, because it doesn't take any indications of which kind of segment?
    The type (program/data/BSS) seems to be defined only by its use - that is: which segment register is set to point to it?

  • .CODE and .DATA are logical segment start markers? The linker may consolidate them to the same (unseen, internal) segments depending on the type? (The next .data continues where the last .data left off?)

  • What is it that ASSUME DS:@data does (from the assembler's point of view)?

  • What is a procedure block (from the assembler's point of view)? What are they used for?

That stuff is still very vaguely explained in the MASM manuals ("Programmer's Guide" and "Reference").

I guess when the command prompt starts the program, the CS is set to the program's code segment, but what's in the other segment registers?

dedndave

the MASM reference manual assumes that you are familiar with assembly language
the purpose of the manual is to provide syntax reference, not to teach assembly language

QuoteIs there something - even in principle - that starts something that END ends? If so, then what?

yes - the program source text
text that appears after the END directive is ignored by the assembler
the END directive has an optional argument that is required for programs, but not for modules
that argument allows the programmer to specify the program entry point

QuoteSEGMENT: Does that define physical or logical segment? Looks like physical, because it doesn't take any indications of which kind of segment?

i guess i would say that a segment is a logical reference to a physical memory region   :biggrin:
as far as the assembler is concerned, it's logical (assembly-time)
when the program is executed, the operating system maps it to a physical/virtual area of memory (run-time)
16-bit DOS programs ran on systems with 1 MB addressable memory
a segment was actually a physical region in memory
32-bit/64-bit windows programs run on systems that address much more memory
segments are mapped to virtual pages or groups of pages of memory

simply put, the assembler uses segments to handle various memory attributes,
and as a reference for generating numeric addresses

QuoteThe type (program/data/BSS) seems to be defined only by its use - that is: which segment register is set to point to it?

that's a fair observation under 16-bit DOS
although, to assemble executable instructions, it must be a CODE type segment

under 32-bit windows, segments map to sections of memory that have specific security attributes
various sections may allow or dis-allow reads, writes, or code execution
and - different privilege levels may have different access rights
for example, user-mode programs have ring-3 access (refer to the intel manuals)

Quote.CODE and .DATA are logical segment start markers?

.CONST, .DATA, .DATA?, .CODE, and .STACK are aliases or shortcuts, intended to save typing
in "old style" syntax, segments are "opened" with a SEGMENT directive and "closed" with an ENDS directive
the programmer may provide name, type, combine, class, and a couple other attributes to a segment
segments may not be nested
masm provides shortcuts to open a limited set of standardized program segments
when the shortcuts are used, opening one segment automatically closes any previously opened segment
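
A minimal sketch of the "old style" syntax for a 16-bit DOS .EXE, with every segment opened by SEGMENT and closed by ENDS. The segment names and attributes follow what the shortcuts use in the SMALL model; the label names (msg, start) and the 256-byte stack size are assumptions for illustration. Unlike the shortcuts, this sketch does not put the data segment into DGROUP.

```asm
; Full segment definitions: roughly what .STACK / .DATA / .CODE
; open for you under .MODEL SMALL (minus the DGROUP grouping).

STACK   SEGMENT PARA STACK 'STACK'      ; combine type STACK: loader sets SS:SP
        DB      100h DUP (?)            ; 256 bytes of stack (arbitrary size)
STACK   ENDS

_DATA   SEGMENT WORD PUBLIC 'DATA'      ; name, align, combine, class
msg     DB      'Hello, DOS', 13, 10, '$'
_DATA   ENDS

_TEXT   SEGMENT WORD PUBLIC 'CODE'
        ASSUME  cs:_TEXT, ds:_DATA, ss:STACK
start:
        mov     ax, _DATA               ; segment address, fixed up by the loader
        mov     ds, ax                  ; DS is not set up for us under DOS
        mov     dx, OFFSET msg
        mov     ah, 09h                 ; DOS: print '$'-terminated string
        int     21h
        mov     ax, 4C00h               ; DOS: terminate with return code 0
        int     21h
_TEXT   ENDS

        END     start                   ; END names the entry point
```

Two pieces of _DATA ... ENDS with the same name and the PUBLIC combine type are concatenated, which is also why a second .DATA continues where the first one left off.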

QuoteThe linker may consolidate them to the same (unseen, internal) segments depending on the type?

as mentioned, segments have a "combine" attribute that tells the linker how they may be combined
furthermore, segments may be assigned to "groups"
in which case, the programmer may reference a group of segments and the linker will make them contiguous

QuoteThe next .data continues where the last .data left off?

yes

QuoteWhat is it that ASSUME DS:@data does (from the assembler's point of view)?

the ASSUME directive has many uses, actually
but, when used for segment registers, it tells the assembler which program segment is pointed to by a specific register
a programmer may load a value into a segment register by use of a few different instructions
in some cases, it may be loaded from a variable
the assembler cannot know what the value is at assemble-time

the specific directive "ASSUME DS:@data" is used to tell the assembler that the DS register points to DGROUP
DGROUP is a group that includes the constant, initialized data, and uninitialized data segments
at least, that's the behaviour in the most commonly used memory models
for 32-bit code - all segments are in the same 4 GB addressable space (flat model)
but, in 16-bit DOS world, there are numerous models - the most common are TINY and SMALL
you can look at the reference material (or google) for descriptions of MEDIUM, COMPACT, LARGE, HUGE models
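
A small sketch of the point that ASSUME only informs the assembler and never loads a register, assuming MASM 6 or later and the SMALL model. The variable name count and the label start are made up for the example; the loads into DS and ES are ordinary MOV instructions that the program has to execute itself.

```asm
        .MODEL  SMALL
        .STACK  100h
        .DATA
count   DW      0

        .CODE
start:
        mov     ax, @data               ; @data is DGROUP in this model
        mov     ds, ax                  ; run time: DS now really points at DGROUP
        ASSUME  ds:DGROUP               ; assemble time: already the default under .MODEL SMALL
        inc     count                   ; plain DS-relative access, no override prefix

        mov     es, ax                  ; run time: ES points at DGROUP too
        ASSUME  ds:NOTHING, ss:NOTHING, es:DGROUP
        inc     count                   ; only ES is assumed to reach count now,
                                        ; so the assembler emits an ES: override

        mov     ax, 4C00h               ; DOS: terminate
        int     21h
        END     start
```

If the ASSUME lines claimed something the MOV instructions did not actually establish, the program would still assemble and would simply access the wrong memory at run time.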

QuoteWhat is a procedure block (from the assembler's point of view)? What are they used for?

simple answer, PROC = subroutine   :P
it's an individual block of code
not all PROC's are subroutines, because the main program code may be assembled as a PROC
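
In 16-bit code, one thing the PROC block gives the assembler is the distance of the procedure, which decides how a plain RET inside it is encoded. A minimal sketch, assuming the SMALL model; NearSub and FarSub are hypothetical names.

```asm
        .MODEL  SMALL
        .STACK  100h
        .CODE

NearSub PROC    NEAR
        ret                     ; assembled as RETN: pops IP only
NearSub ENDP

FarSub  PROC    FAR
        ret                     ; assembled as RETF: pops IP and CS
FarSub  ENDP

start:
        call    NearSub         ; call pushes IP only
        call    FarSub          ; assembler knows the target is FAR, so CS is pushed as well
        mov     ax, 4C00h       ; DOS: terminate
        int     21h
        END     start
```

With a bare label instead of PROC/ENDP, the programmer would have to write RETN or RETF explicitly and keep it in step with every caller.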

QuoteI guess when the command prompt starts the program, the CS is set to the program's code segment, but what's in the other segment registers?

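.EXE program headers have values that tell the operating system how to set the initial code and stack registers (CS:IP and SS:SP)
under DOS, DS and ES point to the PSP when the program starts, so the program loads DS with its data segment itself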
operation varies between DOS and windows
under win32, you can pretty much ignore segment registers
a "flat" memory model is used, and all segments essentially point to the same 4 GB of memory

16-bit .COM programs have all code, data, and stack within a single 64 KB segment (CS = DS = ES = SS)
although, you are allowed to access memory outside that segment at run-time
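
A minimal sketch of such a .COM program, assuming MASM 6 or later with the TINY model and a linker that can emit .COM files. The names start and msg are illustrative.

```asm
        .MODEL  TINY
        .CODE
        ORG     100h                    ; DOS loads a .COM right after the 256-byte PSP

start:
        mov     dx, OFFSET msg          ; DS already equals CS, nothing to load
        mov     ah, 09h                 ; DOS: print '$'-terminated string
        int     21h
        mov     ax, 4C00h               ; DOS: terminate with return code 0
        int     21h

msg     DB      'Hello from a .COM file', 13, 10, '$'   ; data lives in the same segment

        END     start
```

There is no mov ax, @data / mov ds, ax here because the loader has already set CS = DS = ES = SS to the single segment.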

turboscrew

Well, I'm semi-familiar with x86 assembly and more familiar with PDP-11, 6502, 6800, 6809, 68000, PowerPC, 8085, TMS32C6000-series, and some other processors' assembly.

Quote"Is there something - even in principle - that starts something that END ends? If so, then what?"

yes - the program source text

So from my point of view - no.


QuoteSEGMENT: Does that define physical or logical segment? Looks like physical, because it doesn't take any indications of which kind of segment?

i guess i would say that a segment is a logical reference to a physical memory region   :biggrin:
as far as the assembler is concerned, it's logical (assembly-time)
when the program is executed, the operating system maps it to a physical/virtual area of memory (run-time)
16-bit DOS programs ran on systems with 1 MB addressable memory
a segment was actually a physical region in memory
32-bit/64-bit windows programs run on systems that address much more memory
segments are mapped to virtual pages or groups of pages of memory

simply put, the assembler uses segments to handle various memory attributes,
and as a reference for generating numeric addresses


The type (program/data/BSS) seems to be defined only by its use - that is: which segment register is set to point to it?

that's a fair observation under 16-bit DOS
although, to assemble executable instructions, it must be a CODE type segment

under 32-bit windows, segments map to sections of memory that have specific security attributes
various sections may allow or dis-allow reads, writes, or code execution
and - different privilege levels may have different access rights
for example, user-mode programs have ring-3 access (refer to the intel manuals)

I'd think that a physical segment is something you configure in the MMU segment descriptor. Logical segments are rather a kind of link unit,
like the segments or sections used with old 8-bit processors.

jj2007

Quote from: turboscrew on December 14, 2013, 07:23:34 AM
"Is there something - even in principle - that starts something that END ends? If so, then what?"
Technically speaking, start: starts something that end ends:

include \masm32\include\masm32rt.inc  ; everything you need - read this file!

.code
AppName   db "Masm32:", 0

start:
   MsgBox 0, "Hello World", addr AppName, MB_OK
   exit

end start


This is how you tell the assembler & linker where you want to start your program. Note you can use turbo: .... end turbo if you prefer ;-)

QuoteI'd think that a physical segment is something you configure in the MMU segment descriptor. Logical segments are rather a kind of link unit, like the segments or sections used with old 8-bit processors.
Wiki has it all. Or google tlb virtual memory

turboscrew

The SEGMENT - ENDS pair is still a bit blurred, but maybe I'll have another go at the documentation...

The PROC - ENDP still puzzles me. In most assembly languages a label and RET is enough.
What does MASM use PROC for?

jj2007

Quote from: turboscrew on December 27, 2013, 06:59:30 PM
What does MASM use PROC for?

Go here and search for prologue epilogue macro
Same here - read specifically this thread.
Then come back next year for questions ;)

turboscrew

Does PROC - ENDP generate a (C) function prologue/epilogue for "outside" calls ?
Does it create a new stack frame and save some registers (add stuff to the code)?
Or does it add some info to the procedure name symbol for the linker?

Another question: does the need of "verification" ever go away?

Gunther

Hi turboscrew,

Quote from: turboscrew on December 27, 2013, 09:42:04 PM
Does PROC - ENDP generate a (C) function prologue/epilogue for "outside" calls ?

No, not automatically. You have to do that by hand or with appropriate macros.

Quote from: turboscrew on December 27, 2013, 09:42:04 PM
Does it create a new stack frame and save some registers (add stuff to the code)?

Same answer, see above. By the way, The PROC ENDP is a bit red band. A procedure is a simple label inside the source code. There are other assemblers (NASM, YASM) which don't need PROC ENDP. If you're interested, you should give them a try.

Gunther
You have to know the facts before you can distort them.

jj2007

Quote from: Gunther on December 27, 2013, 10:08:57 PM
The PROC ENDP is a bit red band. A procedure is a simple label inside the source code. There are other assemblers (NASM, YASM) which don't need PROC ENDP.

So how do they decide how to interpret "ret"?

include \masm32\include\masm32rt.inc
.code

MyProc proc uses esi arg1
LOCAL v1, rc:RECT
  print str$(arg1)
  ret      ; = pop esi, leave, retn 4
MyProc endp

SomeLabel:
  MsgBox 0, "Hello World", "Title", MB_OK
  ret      ; =?

start:
   invoke MyProc, 123
   call SomeLabel
   exit
end start

turboscrew

Quote from: Gunther on December 27, 2013, 10:08:57 PM

...procedure is a simple label inside the source code. There are other assemblers (NASM, YASM) which don't need PROC ENDP. If you're interested, you should give them a try.

Gunther

That's why I've been wondering.
Any other assembly that I know (I don't know Intel-assemblers too well) doesn't have such.
Not even GAS for x86 (AT&T syntax).
With MASM-"family" assemblers that seems to be necessary.


Gunther

Jochen,

Quote from: jj2007 on December 27, 2013, 11:32:25 PM
So how do they decide how to interpret "ret"?


What is the result of RET? Should I explain the content of the processor manual to you? You're joking, aren't you?

Here's an example:

SECTION .data

msg     db      "Hello, world!", 0xa    ; string plus newline
len     equ     $ - msg                 ; string length

SECTION .text
global main

main:
        mov     eax, 4          ; write system call
        mov     ebx, 1          ; file (stdout)
        mov     ecx, msg        ; string
        mov     edx, len        ; strlen
        int     0x80            ; call kernel

        mov     eax, 1          ; exit system call
        mov     ebx, 0          ; exit status
        int     0x80            ; call kernel


That will work under Linux and BSD. Main is a simple label, what else?

Gunther

dedndave

if a PROC has parameters, USES, or LOCAL's, Masm will generate an epilogue whenever it sees RET
it must be a "vanilla" RET - not RETN
you can, of course, disable the epilogue with OPTION
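
A sketch of both cases in 32-bit code, assuming MASM 6 or later with the flat model and STDCALL. AddTwo and AddRaw are hypothetical procedures; the generated instructions named in the comments describe the default prologue and epilogue in general terms rather than an exact listing.

```asm
        .386
        .MODEL  FLAT, STDCALL
        .CODE

; Default behaviour: parameters trigger a generated prologue and epilogue.
AddTwo  PROC    a:DWORD, b:DWORD
                                ; generated prologue: push ebp / mov ebp, esp
        mov     eax, a          ; a is [ebp+8]
        add     eax, b          ; b is [ebp+12]
        ret                     ; generated epilogue: restore EBP, then RET 8
AddTwo  ENDP

; Same interface with code generation switched off.
OPTION  PROLOGUE:NONE
OPTION  EPILOGUE:NONE
AddRaw  PROC    a:DWORD, b:DWORD
        mov     eax, [esp+4]    ; no frame was built, so address the arguments via ESP
        add     eax, [esp+8]
        ret     8               ; nothing is generated: clean up the 8 bytes by hand
AddRaw  ENDP
OPTION  PROLOGUE:PROLOGUEDEF    ; restore the defaults for later procedures
OPTION  EPILOGUE:EPILOGUEDEF

        END
```

Both procedures can be called the same way, for example with invoke AddTwo, 1, 2; only the code MASM adds around the body differs.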

jj2007

Quote from: Gunther on December 28, 2013, 01:30:41 AM
You're joking, aren't you?

Gunther,

No, I am not joking. Just try to build my example with an assembler that automatically creates a stack frame and doesn't need endp. Dave already gave you a hint.

Jochen

Gunther

Jochen,

Quote from: jj2007 on December 28, 2013, 04:15:03 AM
No, I am not joking. Just try to build my example with an assembler that automatically creates a stack frame and doesn't need endp. Dave already gave you a hint.

Neither NASM nor YASM automatically creates a stack frame. My point was that a procedure is a label inside the source code, no more, no less. All this other stuff (LOCALs, USES, INVOKE etc.) is very assembler specific (MASM, JWASM) and isn't really necessary. It is comfortable, no doubt, but one can live without it.

The point of this thread was the question about this stuff. PROC/ENDP is necessary for MASM-type assemblers, but not for others. It is not so easy for beginners or coders from another platform to look behind the scenes and separate the important from the unimportant. I hope that's clear enough. No offense intended.

Gunther

jj2007

Quote from: Gunther on December 28, 2013, 05:20:27 AM
Neither NASM nor YASM automatically creates a stack frame. My point was that a procedure is a label inside the source code, no more, no less.

For Masm/JWasm, a procedure is much more than just a label. But of course, real men™ can live without invoke, macros, locals, stack frames and other fashionable gadgets ;-)

Again, no offense intended. My point was that the endp delimits the scope of a procedure, and that ret has a different meaning above and below the endp.
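
Another effect of that scope, sketched under the assumption of MASM 6 or later with its default OPTION SCOPED: code labels defined inside a PROC ... ENDP block are local to it, so the same label name can be reused in different procedures. CountA and CountB are hypothetical names.

```asm
        .MODEL  SMALL
        .STACK  100h
        .CODE

CountA  PROC
        mov     cx, 3
again:  loop    again           ; this "again" is visible only inside CountA
        ret
CountA  ENDP

CountB  PROC
        mov     cx, 5
again:  loop    again           ; a different "again", local to CountB
        ret
CountB  ENDP

start:
        call    CountA
        call    CountB
        mov     ax, 4C00h       ; DOS: terminate
        int     21h
        END     start
```

Without the PROC blocks, the second definition of again would be reported as a redefinition.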