The MASM Forum

General => The Campus => Topic started by: masterori on February 07, 2016, 04:50:39 PM

Title: Need some clarification on basic MASM concepts
Post by: masterori on February 07, 2016, 04:50:39 PM
I'm reviewing for my midterm right now, and I'm a bit confused with some basic concepts, so I was hoping someone can clarify for me?

1. please see code example from book

    .data
    sum  dword  0
    .code
    mov  eax,5
    add  eax,6
    mov  sum,eax

Why did the book use the "dword" directive? The sum total is 11, maybe I haven't clearly understood how size works, but shouldn't "byte" or "word" be sufficient? Since 11 can fit into 8 bits: 00001011

2. Are spaces counted in terms of a space taken in memory? E.g

    text  byte  "Hello there",0

Is the number of bytes 12 or 11?

3. If I want to store the ASCII value of an unsigned binary in the runtime stack, can I just push the binary surrounded by single quotes?
E.g: The value 0b1001011001101011 is 38507, so if I want to push the ASCII of 3,8,5,0,7 onto the runtime stack, can I just divide 1001011001101011 by 10000, 1000, 100, 10, 1 and then push the quotient onto stack? Or will I have to convert that unsigned binary into a decimal value first?
Title: Re: Need some clarification on basic MASM concepts
Post by: jj2007 on February 07, 2016, 07:59:16 PM
Quote from: masterori on February 07, 2016, 04:50:39 PM
1. Why did the book use the "dword" directive? The sum total is 11, maybe I haven't clearly understood how size works, but shouldn't "byte" or "word" be sufficient? Since 11 can fit into 8 bits: 00001011

It can fit into 8 bits, but the "natural" size for x86 code is 32 bits, so by default, one would use dword. Costs a bit more memory, but often it is faster, and there is more than enough memory around ;-)
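
There is also a reason specific to this snippet: mov sum,eax copies a 32-bit register, and MASM wants both operands of a mov to be the same size, so a byte-sized sum would have to be written from an 8-bit register instead. A minimal sketch of that variant, assuming the MASM32 SDK is installed at \masm32 (sum8 is a made-up name, not from the book):

```asm
include \masm32\include\masm32rt.inc

.data
sum8  byte  0                 ; hypothetical 8-bit variant of the book's "sum"

.code
start:
    mov   al, 5
    add   al, 6
    mov   sum8, al            ; sizes match: 8-bit register, 8-bit variable
    ; mov sum8, eax           ; rejected by the assembler: 32-bit register, 8-bit variable
    movzx eax, sum8           ; widen to 32 bits for printing
    print str$(eax), 13, 10   ; prints 11
    exit
end start
```

So byte does work for a value as small as 11, but only if the code uses AL throughout; with EAX the variable has to be a dword.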

Quote
2. Are spaces counted in terms of a space taken in memory?

Yes, the processor couldn't care less if you've stored " ", "X" or a nullbyte.
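
For the example in the question that makes 12 bytes: 11 characters (the space is one of them) plus the terminating zero. One way to let the assembler do the counting, as a small sketch that assumes the MASM32 SDK at \masm32 and uses a made-up variable name:

```asm
include \masm32\include\masm32rt.inc

.data
hello  byte  "Hello there",0       ; same data as in the question, hypothetical name

.code
start:
    mov   eax, SIZEOF hello        ; 11 characters (space included) + 1 zero byte
    print str$(eax), 13, 10        ; prints 12
    exit
end start
```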

Quote
3. If I want to store the ASCII value of an unsigned binary in the runtime stack, can I just push the binary surrounded by single quotes?
E.g: The value 0b1001011001101011 is 38507, so if I want to push the ASCII of 3,8,5,0,7 onto the runtime stack, can I just divide 1001011001101011 by 10000, 1000, 100, 10, 1 and then push the quotient onto stack? Or will I have to convert that unsigned binary into a decimal value first?

That dividing method won't work - no need to do such acrobatics. Instead, you can use
push 1001011001101011b
push 1001011001101011y
push 38507
push 966Bh
push "–k"


It's all the same value, just expressed in a different syntax.

You should test what you are doing using either a debugger or the deb macro. See this thread for details. (http://masm32.com/board/index.php?topic=5101.msg54907#msg54907)

For example, with the deb macro (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1019), you would typically do this:
  push 11111111 ; delimit your stack entries
  push 1001011001101011b
  push 1001011001101011y
  push 30000+8000+500+7
  push 38507
  push 966Bh
  push "–k"
  push 22222222 ; delimit your stack entries
  deb 4, "Values on the stack", stack[0], stack[4], stack[8], stack[12], stack[16], stack[20], stack[24], stack[28]


and see this output:
Values on the stack
stack[0]        22222222
stack[4]        38507
stack[8]        38507
stack[12]       38507
stack[16]       38507
stack[20]       38507
stack[24]       38507
stack[28]       11111111
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 07, 2016, 11:56:01 PM
it is simpler to push the binary value and convert it to ascii decimal when required

however, strings can be pushed onto the stack
pushing an ascii decimal value is problematic, though
consider
0000 0000 0000 0000 0000 0000 0000 0001  "1" - 1 byte string + null terminator
0111 1111 1111 1111 1111 1111 1111 1111  "2147483647" - 10 byte string + null terminator


and, for 32-bit code, the stack should always be 4-aligned
so, regardless of the length of the string, you will probably want to push 3 dwords
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 08, 2016, 12:44:46 PM
But what if I want to put the individual ASCII of the 38507 in the stack? Like so:

stack[0] = ; ascii of 3
stack[4] = ; ascii of 8
...

That's the reason I was asking if it would be okay to divide that unsigned binary by 10000,1000, etc and then somehow get the ascii value of the quotient in eax.

If I just push the entire binary value (0b1001011001101011) onto the stack, then it wouldn't have the ascii of each individual digit, or any ascii char at all right?
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 09, 2016, 12:46:00 AM
division is naturally one of the slower operations to perform
and, while it is possible to perform division on ASCII decimal strings,
it is faster to perform division on binary values, then convert the results to ASCII

this is the general concept for most math with computers
use human native ASCII decimal for input and output,
but perform the intermediate math steps in the CPU native binary format

using the stack for storage is another concept, altogether
best to learn one, then the other
trying to learn both at once is just making it harder on yourself
Title: Re: Need some clarification on basic MASM concepts
Post by: qWord on February 09, 2016, 01:59:35 AM
Quote from: masterori on February 08, 2016, 12:44:46 PM
But what if I want to put the individual ASCII of the 38507 in the stack? Like so:

stack[0] = ; ascii of 3
stack[4] = ; ascii of 8
...

You can get that by repeated division by 10 until the quotient gets zero:
.const
ten DWORD 10
.code

mov eax,12456

xor ecx,ecx
@1: xor edx,edx
div ten
add edx,'0'
push edx
add ecx,1
cmp eax,0
jnz @1
; ECX = count digits
; ASCII-digits in [ESP] ... [ESP+ECX*4-4]

; output example
@2: pop edx
push ecx
fn crt_putchar,edx
pop ecx
sub ecx,1
jnz @2
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 11, 2016, 04:19:24 PM
I guess I'm not being clear on my question. I was wondering, if I push like say '0010' vs 0010, will it put '51' on the stack? If so for which - '0010' or 0010?

Another question as I was going through some book examples, it's regarding the 'imul' instruction

mov al, 48
mov bl, 4
imul bl            ; AX = 00c0h, Overflow = 1 because AH is not a sign extension of AL



mov ax,48
mov bx, 4
imul bx          ; DX:AX = 000000c0h, Overflow = 0 because DX is sign extension of AX


Can someone explain why the first example is not a sign extension? The book says "Because AH is not a sign extension of AL", but I'm not sure why...
Title: Re: Need some clarification on basic MASM concepts
Post by: hutch-- on February 11, 2016, 07:19:16 PM
AL and AH are respectively the LOW byte and the HIGH byte of the AX register, which is the LOW word of the EAX register. The difference between signed and unsigned is not how it's stored in memory, it is how you evaluate it AFTER the value is in a register or memory. JG is a signed instruction, JA is unsigned. After a CMP you use a conditional jump to branch according to its value.
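
Applied to the two imul examples from the book: 48 * 4 = 192 = 0C0h. In the 8-bit case AL = 0C0h has its top bit set, so read as a signed byte it is -64; sign-extending it would give AH = 0FFh, but AH is 00h, so AL alone does not hold the product and OF is set. In the 16-bit case AX = 00C0h is positive and DX = 0000h is exactly its sign extension, so OF is clear. A rough sketch to watch this happen, assuming the MASM32 SDK at \masm32 (the labels and messages are invented for the illustration):

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov   al, 48
    mov   bl, 4
    imul  bl                  ; AX = 00C0h (192)
    jo    lost8               ; OF=1: AH is not the sign extension of AL
    print "8-bit result fits in AL", 13, 10
    jmp   word_case
lost8:
    print "8-bit result needs AH", 13, 10

word_case:
    mov   ax, 48
    mov   bx, 4
    imul  bx                  ; DX:AX = 0000:00C0h
    jo    lost16              ; OF=0 here: DX is the sign extension of AX
    print "16-bit result fits in AX", 13, 10
    jmp   done
lost16:
    print "16-bit result needs DX", 13, 10
done:
    exit
end start
```

This should print "8-bit result needs AH" followed by "16-bit result fits in AX".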
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 11, 2016, 11:58:10 PM
Quote from: masterori on February 11, 2016, 04:19:24 PM
I guess I'm not being clear on my question. I was wondering, if I push like say '0010' vs 0010, will it put '51' on the stack? If so for which - '0010' or 0010?

    push    "0010"     ;string operands are reversed, so 30303130h is pushed (as a string, "0100")
    push    0010       ;the masm default radix is 10 (decimal), so 0000000Ah is pushed
    push    0010h      ;00000010h is pushed


usually, strings need to be null-terminated
if you want to push a literal string onto the stack, you should take that into account

    push    "gfe"      ;the last byte (highest address) pushed will be 0
    push    "dcba"     ;you now have the string "abcdefg",0 on the stack
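
A quick way to see that byte order for yourself, as a sketch that assumes the MASM32 SDK at \masm32 and its print macro:

```asm
include \masm32\include\masm32rt.inc

.code
start:
    push  "gfe"               ; assembled as 00676665h, bytes in memory: 65 66 67 00
    push  "dcba"              ; assembled as 64636261h, bytes in memory: 61 62 63 64
    print esp, 13, 10         ; ESP now points at "abcdefg",0
    add   esp, 8              ; remove the two dwords again
    exit
end start
```

With the byte order described above, the line printed is abcdefg.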
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 12, 2016, 12:17:50 AM
as Hutch mentioned, the difference between signed values and unsigned values is how they are interpreted
i.e., the context is up to the programmer

as an example, let's say that EAX holds the value 00303030h

    mov     eax,303030h

if we want that to represent a null-terminated string, it is "000",0
if we want that to represent a signed integer, it is +3158064 decimal
if we want that to represent an unsigned integer, it is 3158064

unsigned dword integers have the range of 0 to 4294967295 decimal
signed dword integers have the range of -2147483648 to +2147483647

unsigned word integers have the range of 0 to 65535
signed word integers have the range of -32768 to +32767

unsigned byte integers have the range of 0 to 255
signed byte integers have the range of -128 to +127

simply, if the value is signed, and the high bit is 0, it is positive
if the value is signed and the high bit is 1, it is negative, and the order of count is reversed (two's complement)

0FFFFFFFFh may be treated as -1 (signed), or 4294967295 (unsigned)
080000000h may be treated as -2147483648 (signed), or 2147483648 (unsigned)
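
to watch the same bits print two different ways, here is a small sketch
it assumes the MASM32 SDK at \masm32, and uses crt_printf from the C runtime that masm32rt.inc pulls in

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov   ebx, 0FFFFFFFFh     ; EBX survives the calls below
    invoke crt_printf, chr$("signed %d, unsigned %u", 13, 10), ebx, ebx
    mov   ebx, 80000000h
    invoke crt_printf, chr$("signed %d, unsigned %u", 13, 10), ebx, ebx
    exit
end start
```

the register contents are identical for both format fields - only the conversion (%d or %u) decides how the bits are read
the first line should show -1 and 4294967295, the second -2147483648 and 2147483648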
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 12, 2016, 12:24:10 AM
the real trick to signed versus unsigned is how we treat them in our code
basically, we need to use the correct routine to convert integers to decimal strings (and vice versa)
AND, we need to use the right set of conditional branch instructions when dealing with signed/unsigned values

notice that signed and unsigned branches act on different flags in the EFlags register
a SUB instruction will yield the same result whether the values are signed or unsigned
however, a different set of flags might be used to branch
this is the elegance of the two's complement numbering system

------------------------------------------------------------------------
Group     Instruction  Description               Condition       Aliases
------------------------------------------------------------------------
Equality  JZ           Jump if equal             ZF=1            JE
          JNZ          Jump if not equal         ZF=0            JNE

Unsigned  JA           Jump if above             CF=0 and ZF=0   JNBE
          JAE          Jump if above or equal    CF=0            JNC JNB
          JB           Jump if below             CF=1            JC JNAE
          JBE          Jump if below or equal    CF=1 or ZF=1    JNA

Signed    JG           Jump if greater           SF=OF and ZF=0  JNLE
          JGE          Jump if greater or equal  SF=OF           JNL
          JL           Jump if less              SF<>OF          JNGE
          JLE          Jump if less or equal     SF<>OF or ZF=1  JNG
          JO           Jump if overflow          OF=1
          JNO          Jump if no overflow       OF=0
          JS           Jump if sign              SF=1
          JNS          Jump if no sign           SF=0

Parity    JP           Jump if parity            PF=1            JPE
          JNP          Jump if no parity         PF=0            JPO
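
as an illustration of the unsigned and signed groups, here is a sketch that compares the same register contents both ways
it assumes the MASM32 SDK at \masm32 - the labels and messages are made up for the example

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov   eax, 0FFFFFFFFh     ; -1 signed, 4294967295 unsigned
    cmp   eax, 1              ; result 0FFFFFFFEh: CF=0, ZF=0, SF=1, OF=0
    ja    is_above            ; unsigned test: taken when CF=0 and ZF=0
    print "unsigned: not above 1", 13, 10
    jmp   signed_test
is_above:
    print "unsigned: above 1", 13, 10

signed_test:
    mov   eax, 0FFFFFFFFh
    cmp   eax, 1
    jg    is_greater          ; signed test: taken when SF=OF and ZF=0
    print "signed: not greater than 1", 13, 10
    jmp   done
is_greater:
    print "signed: greater than 1", 13, 10
done:
    exit
end start
```

the CMP is identical in both halves - JA is taken (4294967295 is above 1), JG is not (-1 is not greater than 1)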
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 12, 2016, 01:05:42 AM
here, we see two's complement at work
the CPU doesn't care whether the values are signed or unsigned
it performs the same operation, either way

        mov     eax,0FFFFFFFFh   ;signed value = -1, unsigned value = 4294967295
        mov     edx,000000100h   ;signed value = +256, unsigned value = 256
        sub     eax,edx          ;result = 0FFFFFEFFh, signed = -257, unsigned = 4294967039, EFlags = 00000286h

        mov     eax,0FFFFFFFFh   ;signed value = -1, unsigned value = 4294967295
        mov     edx,000000100h   ;signed value = +256, unsigned value = 256
        add     eax,edx          ;result = 0FFh, signed = +255, unsigned = 255 (with carry), EFlags = 00000207h


the EFLags register may be evaluated as follows:
EFlags Register
   Bit      Description
  31-22      unassigned, always 0 (as of Pentium IV)
   21        ID (Identification, may be toggled if CPUID supported)
   20        VIP (Virtual Interrupt Pending)
   19        VIF (Virtual Interrupt Flag)
   18        AC (Alignment Check, may be toggled if 486 or later)
   17        VM (Virtual 8086 Mode)
   16        RF (Resume Flag)
   15        unassigned, always 0 (as of Pentium IV)
   14        NT (Nested Task)
  13,12      IOPL (I/O Privilege Level)
   11        OF (Overflow Flag)
   10        DF (Direction Flag, 0 = up)
    9        IF (Interrupt Flag)
    8        TF (Trap Flag)
    7        SF (Sign Flag)
    6        ZF (Zero Flag)
    5        unassigned, always 0 (as of Pentium IV)
    4        AF (Auxiliary Flag)
    3        unassigned, always 0 (as of Pentium IV)
    2        PF (Parity Flag, 0 = odd parity)
    1        unassigned, always 1 (as of Pentium IV)
    0        CF (Carry Flag)


you are primarily interested in the math flags: OF, SF, ZF, CF
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 13, 2016, 06:23:36 AM
So based on dedndave's answer, it seems I have to convert the binary to ascii first then push? How can I do so such that the binary: 0b1001011001101011 (in dec: 38507) is stored like this in the runtime stack:


stack[0]    55    ; ascii of 7
stack[1]    48    ; ascii of 0
stack[2]    53    ; ascii of 5
stack[3]    56    ; ascii of 8
stack[4]    51    ; ascii of 3
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 13, 2016, 08:55:44 AM
the assembler is perfectly capable of converting ASCII, decimal, hexadecimal, octal, etc to binary values
that means you can type

    mov     eax,"abcd"
    mov     eax,61626364h
    mov     eax,1633837924
    mov     eax,1100001011000100110001101100100b


they are all the same
you use the one that makes your code easy to understand

also, i will say this about PUSH and POP
in 32-bit code, the stack should always be 4-byte aligned
that means we don't generally push or pop bytes or words - always dwords
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 13, 2016, 09:16:54 AM
So they're all stored as ascii in the runtime stack by default? The reason I'm so persistent on knowing how to store the values as ascii on the stack, is because that's one of the review questions. Write code so that once it's done, the ascii values of the binary is in the runtime stack.
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 13, 2016, 10:19:03 AM
there are several ways to do that
keep in mind, the last byte should be a 0 (null terminator)
also - any DWORD ASCII decimal value, signed or unsigned, plus the null terminator, will fit into a 12-byte buffer
so, you might start with

    mov     edi,esp     ;EDI points to the end of buffer
    push    0
    sub     esp,8


or, you could do this, and add code to 0 the last byte later on
    mov     edi,esp     ;EDI points to the end of buffer
    sub     esp,12


notice that ESP remains 4-byte aligned in both cases
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 13, 2016, 10:35:56 AM
Maybe I'm a bit dense, but I don't think you're answering my question...
If I do something like this

    push  0011

Will it push the ASCII character 51 onto the stack?. The ASCII 51 is the representation of 3. If it does not, how do I get it to put 51 instead of 0011, 3 or whatever it puts in the stack?
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 13, 2016, 10:56:16 AM
    push    11

will push a decimal 11, dword size (3 zero bytes)
dword is the default for push in 32-bit code
decimal is the default radix for masm, so the 11 is seen as 11, or 0Bh

if you want to get 51 on the stack
    push    51
again, it will be a dword, and 51 will be seen as decimal (ASCII for "3")
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 13, 2016, 11:27:04 AM
So even if I use     push 0011b it'll still push 11? Is there a way to automatically convert a binary into ascii?

As per my original question, the question wants to push each digit of 38507 onto the stack one by one. I'm given 1001011001101011b to start with. So I thought of using 'div' by 10000, 1000, 100, ... to get each of the digits of 38507 in binary form. Or does that not work - dividing a binary by a decimal?
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 13, 2016, 12:05:16 PM
11b specifies binary - that overrides the default - that is 3 in decimal

ok - i am telling you that pushing individual bytes isn't going to work
you can push 4 bytes at a time

or - you can create stack space by multiples of 4 bytes, then fill those bytes individually
that is why i used sub esp

you did not state that the values had to be pushed
you stated they had to be on the stack when done
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 13, 2016, 12:36:25 PM
I understand everything that you've posted. But I'm still trying to figure out if there's a way to get the ascii representation of each digit because that's what the question is asking for. It shows a stack similar to the one I posted on the previous page.

Quote from: dedndave on February 13, 2016, 12:05:16 PM
you did not state that the values had to be pushed
you stated they had to be on the stack when done
I don't understand you. How can they be on the stack when they weren't pushed on?

Again, 1001011001101011b is 38507 in binary. So again going back to my question on the first page, can I take 1001011001101011b and divide by 10000 (decimal) to get 3 decimal then (somehow) get the ascii representation of it and push it onto the stack?

I want to know if there's some kind of conversion that gets the ascii of a binary or decimal value. The reason is because DIV stores the quotient in AX and the remainder in DX, so I was going to loop it until DX is 0. This would give me 3, 8, 5, 0, 7 respectively. Now all that's left is to convert those to ascii then push (or if pushing automatically converts it over to ascii then that's fine).
Title: Re: Need some clarification on basic MASM concepts
Post by: jj2007 on February 13, 2016, 01:27:35 PM
Quote from: masterori on February 13, 2016, 12:36:25 PM
can I take 1001011001101011b and divide by 10000 (decimal) to get 3 decimal then (somehow) get the ascii representation of it and push it onto the stack?

Yes. Get Olly (http://www.ollydbg.de/version2.html) to understand what this code does:
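include \masm32\include\masm32rt.inc
.code
start:
  mov eax, 123456789 ; or 38507
  mov esi, esp ; we need a "permanent" reg32 like esi here
  mov ecx, 10
  .Repeat
    cdq ; zeroes edx, valid as long as eax is below 80000000h
    div ecx ; eax = quotient, edx = remainder (0..9)
    add edx, "0" ; remainder becomes an ASCII digit
    push edx
  .Until !eax
  .Repeat
    print esp, 32 ; print the digit at [esp], followed by a space
    pop eax
  .Until esp>=esi
  inkey chr$(13, 10, "That was cool, right?")
  exit
end start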
Title: Re: Need some clarification on basic MASM concepts
Post by: masterori on February 13, 2016, 02:05:12 PM
Quote from: jj2007 on February 13, 2016, 01:27:35 PM

add edx, "0"
push edx
 

This does the conversion to ascii? You can add a char with a binary like that?
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 13, 2016, 04:27:06 PM
let's deal with the stack

there are many ways to access data on the stack
generally, the CPU maintains a stack pointer
for 32-bit code, it is named ESP (Extended Stack Pointer)
it is a register that holds a pointer into the stack area (an address)

when you PUSH a dword onto the stack, the value in ESP is decreased by 4, and the data is written to that address
when you POP a dword, the value at [ESP] is popped to whatever destination you specified,
and ESP is increased by 4

now, if we want to "reserve" space on the stack without PUSH, we can manipulate ESP
    sub     esp,12
it's almost as though we had pushed 3 dwords
but - the data on the stack is not altered (i.e., it is garbage that was there before)

later on, we can write data to those 12 bytes (our ASCII string, for example)
and still later, we can access that data directly, or by using POP

the EBP (Extended Base Pointer) register is also used to access data on the stack
ESP changes whenever items are added or removed from the stack
EBP does not change when items are added or removed
instead, it is generally a fixed pointer that is temporary on a per-subroutine basis
i have written many posts about the use of EBP - use the forum search tool to find them
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 13, 2016, 04:41:42 PM
now, conversion from binary to ASCII decimal string is an entirely different subject
we usually write a subroutine to perform this function
we might pass to the subroutine the binary value to convert, and a pointer to (address of) the string buffer
whether that buffer is on the stack or in one of the data sections is inconsequential

a simple method is an adaptation of Horner's Rule
for the decimal value of 38507 decimal...
we repeatedly divide by 10
for each division, the remainder (technically, it's called a modulus) becomes the next digit (last digit first)
it will be from 0 to 9 (binary)
to convert that digit to ASCII, we add 30h (or use the OR instruction)
now, the byte is from 30h to 39h - the ASCII decimal numbers
the quotient is saved for the next pass
the process is repeated until the quotient is 0 (no more digits)

38507 / 10   >> quotient = 3850, modulus = 7 >> 7 + 30h = 37h (the last digit in the string)
 3850 / 10   >> quotient = 385,  modulus = 0 >> 0 + 30h = 30h
  385 / 10   >> quotient = 38,   modulus = 5 >> 5 + 30h = 35h
   38 / 10   >> quotient = 3,    modulus = 8 >> 8 + 30h = 38h
    3 / 10   >> quotient = 0,    modulus = 3 >> 3 + 30h = 33h (the first digit in the string)


for each pass of the loop, a digit is stored, and the pointer is decremented

as you can see digits come out of the loop one byte at a time
because the stack is maintained as dwords, accessing bytes with PUSH is not practical
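
the steps above can be sketched like this, with the 12-byte buffer reserved on the stack by sub esp
it assumes the MASM32 SDK at \masm32 - the label name is made up, and this is only one of several ways to do it

```asm
include \masm32\include\masm32rt.inc

.code
start:
    mov   edi, esp                  ; EDI = one byte past the end of the buffer
    sub   esp, 12                   ; reserve 12 bytes, ESP stays 4-byte aligned
    dec   edi
    mov   byte ptr [edi], 0         ; null terminator goes in the last byte

    mov   eax, 1001011001101011b    ; 38507
    mov   ecx, 10
next_digit:
    xor   edx, edx                  ; DIV divides EDX:EAX, so clear the high half
    div   ecx                       ; EAX = quotient, EDX = remainder (0..9)
    add   dl, 30h                   ; remainder becomes ASCII "0".."9"
    dec   edi                       ; digits come out last first, so fill backwards
    mov   [edi], dl
    test  eax, eax
    jnz   next_digit                ; repeat until the quotient is 0

    print edi, 13, 10               ; EDI points at the first digit
    add   esp, 12                   ; release the buffer
    exit
end start
```

for 38507 the bytes 33h 38h 35h 30h 37h 00h end up next to each other in the reserved stack space, and the line printed is 38507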
Title: Re: Need some clarification on basic MASM concepts
Post by: jj2007 on February 13, 2016, 07:28:12 PM
Quote from: masterori on February 13, 2016, 02:05:12 PM
Quote from: jj2007 on February 13, 2016, 01:27:35 PM

add edx, "0"
push edx

This does the conversion to ascii? You can add a char with a binary like that?

Yes, you can.

If you had followed my advice above to use Olly, you would have already found out what the stack looks like before starting the code:
Address   Hex dump                                         comments
0018FF7C  00 00 00 00|00 00 00 00|00 00 00 00|00 00 00 00|
0018FF8C  8A 33 44 76|00 E0 FD 7E|D4 FF 18 00|82 98 C5 77| stack is 0018FF8C


Now you start the loop, pushing (using my example 123456789)
9+"0" = 00000039h
8+"0" = 00000038h
7+"0" = 00000037h


etc. After 4 times push edx, the same memory area looks like this:
Address   Hex dump                                         comments
0018FF7C  36 00 00 00|37 00 00 00|38 00 00 00|39 00 00 00| stack is 0018FF7C
0018FF8C  8A 33 44 76|00 E0 FD 7E|D4 FF 18 00|82 98 C5 77|


Now, in the second loop, we use print esp, 32 (with the 32 just being a space added to the output)
The print macro expects a pointer to a memory area containing a zero-delimited string. So is esp "a pointer to a memory area containing a zero-delimited string"?
0018FF7C  36 00

With each pop eax, esp advances a dword (4 bytes), so in round 2 print esp uses 37 00
etc etc.

Your turn. I won't help you any more unless you can prove that you opened your executable in Olly (http://www.ollydbg.de/version2.html) and hit the F8 key to see all this happening.
Title: Re: Need some clarification on basic MASM concepts
Post by: dedndave on February 13, 2016, 11:35:24 PM
    add     edx,"0"

is the same as

    add     edx,30h

these instructions would also work

    add     dl,30h

    or      dl,30h

the instructions with EDX are 3 bytes in length (83 C2 30)
the ones with DL are also 3 bytes (80 C2 30 and 80 CA 30) - only AL has 2-byte short forms (04 30 and 0C 30)

you would still PUSH EDX (not PUSH DL)