Author Topic: 16-bit MASM blues...  (Read 10297 times)

turboscrew

  • Guest
16-bit MASM blues...
« on: December 13, 2013, 04:37:37 PM »
Now that I stumbled into 16-bit MASM-stuff: I remember why I never learned
x86 assembly so well...

The main headaches were:

  • ASSUME
  • SEGMENT - ENDS
  • PROC - ENDP
  • END
My questions are:

  • Is there something - even in principle - that starts something that END ends? If so, what?
  • SEGMENT: Does that define physical or logical segment? Looks like physical, because it doesn't take any indications of which kind of segment?
    The type (program/data/BSS) seems to be defined only by its use - that is: which segment register is set to point to it?
  • .CODE and .DATA are logical segment start markers? The linker may consolidate them to the same (unseen, internal) segments depending on the type? (The next .data continues where the last .data left off?)
  • What does ASSUME DS:@data do (from the assembler's point of view)?
  • What is a procedure block (from the assembler's point of view)? What are they used for?
That stuff is still very vaguely explained in the MASM manuals ("Programmer's Guide" and "Reference").

I guess when the command prompt starts the program, the CS is set to the program's code segment, but what's in the other segment registers?

dedndave

  • Member
  • *****
  • Posts: 8734
  • Still using Abacus 2.0
    • DednDave
Re: 16-bit MASM blues...
« Reply #1 on: December 13, 2013, 08:03:50 PM »
the MASM reference manual assumes that you are familiar with assembly language
the purpose of the manual is to provide syntax reference, not to teach assembly language

Quote
Is there something - even in principle - that starts something that END ends? If so, what?

yes - the program source text
text that appears after the END directive is ignored by the assembler
the END directive has an optional argument that is required for programs, but not for modules
that argument allows the programmer to specify the program entry point
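
a minimal sketch of how that looks in a 16-bit .EXE source, assuming the SMALL model and the simplified segment directives (the label name "start" and the message text are made up for illustration):

```asm
        .MODEL  small
        .STACK  100h

        .DATA
msg     DB      "Hello from DOS", 13, 10, "$"

        .CODE
start:
        mov     ax, @data       ; DS does not point at our data on entry
        mov     ds, ax
        mov     dx, OFFSET msg
        mov     ah, 09h         ; DOS: print the $-terminated string at DS:DX
        int     21h
        mov     ax, 4C00h       ; DOS: terminate with return code 0
        int     21h

        END     start           ; ends the source text AND names the entry point
```

in a module that is only linked into a program, the last line would be a bare END with no label after it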

Quote
SEGMENT: Does that define physical or logical segment? Looks like physical, because it doesn't take any indications of which kind of segment?

i guess i would say that a segment is a logical reference to a physical memory region   :biggrin:
as far as the assembler is concerned, it's logical (assembly-time)
when the program is executed, the operating system maps it to a physical/virtual area of memory (run-time)
16-bit DOS programs ran on systems with 1 MB addressable memory
a segment was actually a physical region in memory
32-bit/64-bit windows programs run on systems that address much more memory
segments are mapped to virtual pages or groups of pages of memory

simply put, the assembler uses segments to handle various memory attributes,
and as a reference for generating numeric addresses

Quote
The type (program/data/BSS) seems to be defined only by its use - that is: which segment register is set to point to it?

that's a fair observation under 16-bit DOS
although, to assemble executable instructions, it must be a CODE type segment

under 32-bit windows, segments map to sections of memory that have specific security attributes
various sections may allow or dis-allow reads, writes, or code execution
and - different privilege levels may have different access rights
for example, user-mode programs have ring-3 access (refer to the Intel manuals)

Quote
.CODE and .DATA are logical segment start markers?

.CONST, .DATA, .DATA?, .CODE, and .STACK are aliases or shortcuts, intended to save typing
in "old style" syntax, segments are "opened" with a SEGMENT directive and "closed" with an ENDS directive
the programmer may provide name, type, combine, class, and a couple other attributes to a segment
segments may not be nested
masm provides shortcuts to open a limited set of standardized program segments
when the shortcuts are used, opening one segment automatically closes any previously opened segment
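
as a sketch of the "old style" syntax, assuming the SMALL model conventions (where .DATA stands for roughly "_DATA SEGMENT WORD PUBLIC 'DATA'" and .CODE for "_TEXT SEGMENT WORD PUBLIC 'CODE'"); the names counter, total, and bump are made up for illustration:

```asm
_DATA   SEGMENT WORD PUBLIC 'DATA'      ; open the data segment
counter DW      0
_DATA   ENDS                            ; close it explicitly

DGROUP  GROUP   _DATA                   ; put _DATA into the DGROUP group

_TEXT   SEGMENT WORD PUBLIC 'CODE'      ; open the code segment
        ASSUME  cs:_TEXT, ds:DGROUP
bump:
        inc     counter                 ; the caller must have loaded DS with DGROUP
        retn
_TEXT   ENDS

_DATA   SEGMENT WORD PUBLIC 'DATA'      ; reopened: continues right after counter
total   DW      0
_DATA   ENDS

        END                             ; a module, so no entry point here
```

with the shortcuts, the three SEGMENT/ENDS pairs shrink to .DATA, .CODE, .DATA, and the ENDS lines disappear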

Quote
The linker may consolidate them to the same (unseen, internal) segments depending on the type?

as mentioned, segments have a "combine" attribute that tells the linker how they may be combined
furthermore, segments may be assigned to "groups"
in which case, the programmer may reference a group of segments and the linker will make them contiguous

Quote
The next .data continues where the last .data left off?

yes

Quote
What does ASSUME DS:@data do (from the assembler's point of view)?

the ASSUME directive has many uses, actually
but, when used for segment registers, it tells the assembler which program segment is pointed to by a specific register
a programmer may load a value into a segment register by use of a few different instructions
in some cases, it may be loaded from a variable
the assembler cannot know what the value is at assemble-time

the specific directive "ASSUME DS:@data" is used to tell the assembler that the DS register points to DGROUP
DGROUP is a group that includes constant, initialized data, and uninitialized data sections
at least, that's the behaviour in the most commonly used memory models
for 32-bit code - all segments are in the same 4 GB addressable space (flat model)
but, in 16-bit DOS world, there are numerous models - the most common are TINY and SMALL
you can look at the reference material (or google) for descriptions of MEDIUM, COMPACT, LARGE, HUGE models
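
a small sketch of what ASSUME changes for segment registers, assuming full segment directives (the segment name STK and the variable name value are made up); note that ASSUME itself emits no code, it only changes how later lines are assembled:

```asm
_DATA   SEGMENT WORD PUBLIC 'DATA'
value   DW      1234h
_DATA   ENDS

STK     SEGMENT PARA STACK 'STACK'
        DW      64 DUP (?)
STK     ENDS

_TEXT   SEGMENT WORD PUBLIC 'CODE'
        ASSUME  cs:_TEXT, ds:_DATA, es:NOTHING
start:
        mov     ax, _DATA
        mov     ds, ax          ; ASSUME emits no code, so DS is loaded by hand
        mov     ax, value       ; assembled as a plain DS-relative access

        ASSUME  ds:NOTHING, es:_DATA    ; tell the assembler a different story
        mov     ax, _DATA
        mov     es, ax
        mov     bx, value       ; same kind of line, now gets an ES: override prefix

        mov     ax, 4C00h       ; DOS: terminate with return code 0
        int     21h
_TEXT   ENDS
        END     start
```

if no segment register is assumed to reach the segment that holds value, the assembler reports an error instead of guessing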

Quote
What is a procedure block (from the assembler's point of view)? What are they used for?

simple answer, PROC = subroutine   :P
it's an individual block of code
not all PROC's are subroutines, because the main program code may be assembled as a PROC

Quote
I guess when the command prompt starts the program, the CS is set to the program's code segment, but what's in the other segment registers?

the .EXE header holds the initial values for CS:IP and SS:SP, which the loader applies when it starts the program
under DOS, DS and ES point to the PSP (not to the data segment) on entry, so the program has to load DS itself
operation varies between DOS and windows
under win32, you can pretty much ignore segment registers
a "flat" memory model is used, and all segments essentially point to the same 4 GB of memory

16-bit .COM programs have all code, data, and stack within a single 64 KB segment (CS = DS = ES = SS)
although, you are allowed to access memory outside that segment at run-time
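
a minimal .COM sketch, assuming the TINY model (the label and message are made up for illustration); because DS already equals CS on entry, there is no "mov ds, ax" step:

```asm
        .MODEL  tiny
        .CODE
        ORG     100h            ; a .COM image is loaded right after the 256-byte PSP
start:
        mov     dx, OFFSET msg  ; DS = CS already, so the offset can be used as-is
        mov     ah, 09h         ; DOS: print the $-terminated string at DS:DX
        int     21h
        mov     ax, 4C00h       ; DOS: terminate with return code 0
        int     21h

msg     DB      "Hello from a .COM file", 13, 10, "$"   ; data lives in the same segment

        END     start
```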

turboscrew

Re: 16-bit MASM blues...
« Reply #2 on: December 14, 2013, 07:23:34 AM »
Well, I'm semi-familiar with x86 assembly and more familiar with PDP-11, 6502, 6800, 6809, 68000, PowerPC, 8085, TMS32C6000-series, and some other processors' assembly.

Quote
"Is there something - even in principle - that starts something that END ends? If so, what?"

yes - the program source text

So from my point of view - no.


Quote
SEGMENT: Does that define physical or logical segment? Looks like physical, because it doesn't take any indications of which kind of segment?

i guess i would say that a segment is a logical reference to a physical memory region   :biggrin:
as far as the assembler is concerned, it's logical (assembly-time)
when the program is executed, the operating system maps it to a physical/virtual area of memory (run-time)
16-bit DOS programs ran on systems with 1 MB addressable memory
a segment was actually a physical region in memory
32-bit/64-bit windows programs run on systems that address much more memory
segments are mapped to virtual pages or groups of pages of memory

simply put, the assembler uses segments to handle various memory attributes,
and as a reference for generating numeric addresses


Quote
The type (program/data/BSS) seems to be defined only by its use - that is: which segment register is set to point to it?

that's a fair observation under 16-bit DOS
although, to assemble executable instructions, it must be a CODE type segment

under 32-bit windows, segments map to sections of memory that have specific security attributes
various sections may allow or dis-allow reads, writes, or code execution
and - different privilege levels may have different access rights
for example, user-mode programs have ring-3 access (refer to the intel manuals)

I'd think that physical segment is something you configure in the MMU segment descriptor. Logical segments are rather kind of link-units,
like the segments or sections used with old 8-bit processors.

jj2007

  • Member
  • *****
  • Posts: 7559
  • Assembler is fun ;-)
    • MasmBasic
Re: 16-bit MASM blues...
« Reply #3 on: December 14, 2013, 11:18:41 AM »
"Is there something - even in principle - that starts something that END ends? If so, what?"
Technically speaking, start: starts something that end ends:

include \masm32\include\masm32rt.inc  ; everything you need - read this file!

.code
AppName   db "Masm32:", 0

start:
   MsgBox 0, "Hello World", addr AppName, MB_OK
   exit

end start


This is how you tell the assembler & linker where you want to start your program. Note you can use turbo: .... end turbo if you prefer ;-)

Quote
I'd think that physical segment is something you configure in the MMU segment descriptor. Logical segments are rather kind of link-units, like the segments or sections used with old 8-bit processors.
Wiki has it all. Or google tlb virtual memory

turboscrew

Re: 16-bit MASM blues...
« Reply #4 on: December 27, 2013, 06:59:30 PM »
The SEGMENT - ENDS pair is still a bit blurred, but maybe yet another go at the documentation will help...

The PROC - ENDP still puzzles me. In most assembly languages a label and RET is enough.
What does MASM use PROC for?
 

jj2007

Re: 16-bit MASM blues...
« Reply #5 on: December 27, 2013, 07:13:00 PM »
What does MASM use PROC for?

Go here and search for prologue epilogue macro
Same here - read specifically this thread.
Then come back next year for questions ;)

turboscrew

Re: 16-bit MASM blues...
« Reply #6 on: December 27, 2013, 09:42:04 PM »
Does PROC - ENDP generate a (C) function prologue/epilogue for "outside" calls ?
Does it create a new stack frame and save some registers (add stuff to the code)?
Or does it add some info to the procedure name symbol for the linker?

Another question: does the need of "verification" ever go away?

Gunther

  • Member
  • *****
  • Posts: 3515
  • Forgive your enemies, but never forget their names
Re: 16-bit MASM blues...
« Reply #7 on: December 27, 2013, 10:08:57 PM »
Hi turboscrew,

Does PROC - ENDP generate a (C) function prologue/epilogue for "outside" calls ?

No, not automatically. You have to do that by hand or with appropriate macros.

Does it create a new stack frame and save some registers (add stuff to the code)?

Same answer, see above. By the way, the PROC ENDP is a bit of red tape. A procedure is a simple label inside the source code. There are other assemblers (NASM, YASM) which don't need PROC ENDP. If you're interested, you should give them a try.

Gunther
Get your facts first, and then you can distort them.

jj2007

Re: 16-bit MASM blues...
« Reply #8 on: December 27, 2013, 11:32:25 PM »
The PROC ENDP is a bit of red tape. A procedure is a simple label inside the source code. There are other assemblers (NASM, YASM) which don't need PROC ENDP.

So how do they decide how to interpret "ret"?

include \masm32\include\masm32rt.inc
.code

MyProc proc uses esi arg1
LOCAL v1, rc:RECT
  print str$(arg1)
  ret      ; = pop esi, leave, retn 4
MyProc endp

SomeLabel:
  MsgBox 0, "Hello World", "Title", MB_OK
  ret      ; =?

start:
   invoke MyProc, 123
   call SomeLabel
   exit
end start

turboscrew

Re: 16-bit MASM blues...
« Reply #9 on: December 28, 2013, 12:44:33 AM »

...procedure is a simple label inside the source code. There are other assemblers (NASM, YASM) which don't need PROC ENDP. If you're interested, you should give them a try.

Gunther

That's why I've been wondering.
No other assembler that I know (I don't know the Intel assemblers too well) has anything like it.
Not even GAS for x86 (AT&T syntax).
With MASM-"family" assemblers that seems to be necessary.


Gunther

Re: 16-bit MASM blues...
« Reply #10 on: December 28, 2013, 01:30:41 AM »
Jochen,

So how do they decide how to interpret "ret"?


What is the result of RET? Should I explain the contents of the processor manual to you? You're joking, aren't you?

Here's an example:
Code:
SECTION .data

msg     db      "Hello, world!", 0xa    ; message followed by a newline
len     equ     $ - msg                 ; message length in bytes

SECTION .text
global main

main:
        mov     eax, 4          ; write system call
        mov     ebx, 1          ; file descriptor (stdout)
        mov     ecx, msg        ; address of the string
        mov     edx, len        ; string length
        int     0x80            ; call kernel

        mov     eax, 1          ; exit system call
        mov     ebx, 0          ; exit status 0
        int     0x80            ; call kernel

That will work under 32-bit Linux; the BSDs pass system call arguments on the stack by default, so the code would need changes there. Main is a simple label, what else?

Gunther

dedndave

Re: 16-bit MASM blues...
« Reply #11 on: December 28, 2013, 01:48:53 AM »
if a PROC has parameters, USES, or LOCAL's, Masm will generate an epilogue whenever it sees RET
it must be a "vanilla" RET - not RETN
you can, of course, disable the epilogue with OPTION
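
to make that concrete, here is a sketch of what the assembler adds, assuming 32-bit flat model with stdcall; the names AddTwo and Plain are made up, and the expansions in the comments are what typically shows up in a listing or disassembly, not something you type:

```asm
        .386
        .MODEL  flat, stdcall
        .CODE

AddTwo  PROC USES esi, a:DWORD, b:DWORD
        LOCAL   tmp:DWORD
        ; generated prologue: push ebp / mov ebp, esp / add esp, -4 / push esi
        mov     esi, a          ; a is really [ebp+8]
        add     esi, b          ; b is really [ebp+12]
        mov     tmp, esi        ; tmp is really [ebp-4]
        mov     eax, tmp
        ret                     ; expanded to: pop esi / leave / ret 8
AddTwo  ENDP

Plain   PROC
        ; no parameters, USES, or LOCALs: nothing is added here
        mov     eax, 1
        ret                     ; stays a bare near return
Plain   ENDP

        END
```

OPTION PROLOGUE:NONE and OPTION EPILOGUE:NONE switch the generated code off; after that, a PROC with parameters gets no frame, so setting up EBP is your job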

jj2007

Re: 16-bit MASM blues...
« Reply #12 on: December 28, 2013, 04:15:03 AM »
You're joking, aren't you?

Gunther,

No, I am not joking. Just try to build my example with an assembler that automatically creates a stack frame and doesn't need endp. Dave already gave you a hint.

Jochen

Gunther

Re: 16-bit MASM blues...
« Reply #13 on: December 28, 2013, 05:20:27 AM »
Jochen,

No, I am not joking. Just try to build my example with an assembler that automatically creates a stack frame and doesn't need endp. Dave already gave you a hint.

Neither NASM nor YASM automatically creates a stack frame. My point was that a procedure is a label inside the source code, no more, no less. All this other stuff (LOCALs, USES, INVOKE etc.) is very assembler-specific (MASM, JWASM) and isn't really necessary. It is comfortable, no doubt, but one can live without it.
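
As a sketch of what "living without it" looks like, here is a two-argument stdcall routine written in MASM syntax with a plain label and a hand-written stack frame. The names AddTwoByHand and Caller are made up for illustration, and 32-bit flat model is assumed:

```asm
        .386
        .MODEL  flat, stdcall
        .CODE

AddTwoByHand:                   ; a plain label, no PROC/ENDP
        push    ebp             ; prologue written out by hand
        mov     ebp, esp
        push    esi             ; what USES esi would have saved
        mov     esi, [ebp+8]    ; first DWORD argument
        add     esi, [ebp+12]   ; second DWORD argument
        mov     eax, esi        ; result in EAX
        pop     esi             ; epilogue written out by hand
        pop     ebp
        retn    8               ; stdcall: callee removes 8 bytes of arguments

Caller:
        push    2               ; arguments are pushed right to left
        push    40
        call    AddTwoByHand    ; EAX = 42 on return
        retn

        END
```

Nothing is generated behind the scenes here, so every RETN is exactly what was typed.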

The point of this thread was the question about this stuff. The PROC, ENDP is necessary for MASM-type assemblers, but not for others. It is not so easy for beginners or coders from another platform to look behind the scenes and separate the important from the unimportant. I hope that's clear enough. No offense intended.

Gunther

jj2007

Re: 16-bit MASM blues...
« Reply #14 on: December 28, 2013, 06:13:56 AM »
Neither NASM nor YASM automatically creates a stack frame. My point was that a procedure is a label inside the source code, no more, no less.

For Masm/JWasm, a procedure is much more than just a label. But of course, real men(TM) can live without invoke, macros, locals, stack frames and other fashionable gadgets ;-)

Again, no offense intended. My point was that the endp delimits the scope of a procedure, and that ret has a different meaning above and below the endp.
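
A small sketch of the scope part, assuming the default behaviour of MASM 6 and later (OPTION SCOPED) and 32-bit flat model; the procedure names are made up for illustration:

```asm
        .386
        .MODEL  flat, stdcall
        .CODE

First   PROC
        test    eax, eax
        jz      done            ; refers to the done inside First
        inc     eax
done:                           ; visible only between First PROC and First ENDP
        ret
First   ENDP

Second  PROC
        test    eax, eax
        jz      done            ; refers to the done inside Second
        dec     eax
done:                           ; a different label, no redefinition error
        ret
Second  ENDP

        END
```

Written as plain labels outside any PROC, the two done labels would clash and the assembler would complain about a redefined symbol.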