I'm reviewing for my midterm right now, and I'm a bit confused with some basic concepts, so I was hoping someone can clarify for me?
1. please see code example from book
.data
sum dword 0
.code
mov eax,5
add eax,6
mov sum,eax
Why did the book use the "dword" directive? The sum total is 11, maybe I haven't clearly understood how size works, but shouldn't "byte" or "word" be sufficient? Since 11 can fit into 8 bits: 00001011
2. Are spaces counted in terms of a space taken in memory? E.g
text byte "Hello there",0
Is the number of bytes 12 or 11?
3. If I want to store the ASCII value of an unsigned binary in the runtime stack, can I just push the binary surrounded by single quotes?
E.g: The value 0b1001011001101011 is 38507, so if I want to push the ASCII of 3,8,5,0,7 onto the runtime stack, can I just divide 1001011001101011 by 10000, 1000, 100, 10, 1 and then push the quotient onto stack? Or will I have to convert that unsigned binary into a decimal value first?
Quote from: masterori on February 07, 2016, 04:50:39 PM
1. Why did the book use the "dword" directive? The sum total is 11, maybe I haven't clearly understood how size works, but shouldn't "byte" or "word" be sufficient? Since 11 can fit into 8 bits: 00001011
It can fit into 8 bits, but the "natural" size for 32-bit x86 code is 32 bits, so by default, one would use dword. Costs a bit more memory, but often it is faster, and there is more than enough memory around ;-) Besides, mov sum,eax stores a 32-bit register, so sum has to be 32 bits wide too; with byte or word the assembler would reject the instruction because the operand sizes differ.
Quote
2. Are spaces counted in terms of a space taken in memory?
Yes, the processor couldn't care less if you've stored " ", "X" or a nullbyte. Your string takes 12 bytes: 11 characters (the space included) plus the terminating 0.
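If you want to double-check the count outside the assembler, here is a minimal Python sketch. Python is only used as a calculator here, and the names text_bytes and count_bytes are made up for the illustration; they are not part of the book example.

```python
# Bytes the assembler would emit for:  text byte "Hello there",0
text_bytes = b"Hello there" + b"\x00"

def count_bytes(data):
    """Every character, including the space and the final zero, is one byte."""
    return len(data)

print(count_bytes(text_bytes))   # 12
print(text_bytes[5])             # 32, the ASCII code of the space
```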
Quote
3. If I want to store the ASCII value of an unsigned binary in the runtime stack, can I just push the binary surrounded by single quotes?
E.g: The value 0b1001011001101011 is 38507, so if I want to push the ASCII of 3,8,5,0,7 onto the runtime stack, can I just divide 1001011001101011 by 10000, 1000, 100, 10, 1 and then push the quotient onto stack? Or will I have to convert that unsigned binary into a decimal value first?
That dividing method won't work - no need to do such acrobatics. Instead, you can use
push 1001011001101011b
push 1001011001101011y
push 38507
push 966Bh
push "–k"
It's all the same value, just expressed in a different syntax.
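A quick way to convince yourself that these notations name one and the same number is a small Python check. This is only a sketch: literal_value and as_dword_bytes are hypothetical helper names, and the code assumes the string literal is read in the Windows-1252 code page, where byte 96h is the dash character.

```python
value = 38507

def literal_value(text):
    """Number that a short string literal stands for: first character most significant."""
    return int.from_bytes(text.encode("cp1252"), "big")

def as_dword_bytes(n):
    """The 4 bytes a 32-bit push stores, lowest address first (little-endian)."""
    return n.to_bytes(4, "little")

print(value == 0b1001011001101011)         # True (binary notation)
print(value == 0x966B)                     # True (hexadecimal notation)
print(value == literal_value("\u2013k"))   # True (96h is the dash, 6Bh is "k")
print(as_dword_bytes(value).hex())         # 6b960000
```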
You should test what you are doing using either a debugger or the deb macro. See this thread for details. (http://masm32.com/board/index.php?topic=5101.msg54907#msg54907)
For example, with the deb macro (http://www.webalice.it/jj2006/MasmBasicQuickReference.htm#Mb1019), you would typically do this:
push 11111111 ; delimit your stack entries
push 1001011001101011b
push 1001011001101011y
push 30000+8000+500+7
push 38507
push 966Bh
push "–k"
push 22222222 ; delimit your stack entries
deb 4, "Values on the stack", stack[0], stack[4], stack[8], stack[12], stack[16], stack[20], stack[24], stack[28]
and see this output:
Values on the stack
stack[0] 22222222
stack[4] 38507
stack[8] 38507
stack[12] 38507
stack[16] 38507
stack[20] 38507
stack[24] 38507
stack[28] 11111111
it is simpler to push the binary value and convert it to ascii decimal when required
however, strings can be pushed onto the stack
pushing an ascii decimal value is problematic, though
consider
0000 0000 0000 0000 0000 0000 0000 0001 "1" - 1 byte string + null terminator
0111 1111 1111 1111 1111 1111 1111 1111 "2147483647" - 10 byte string + null terminator
and, for 32-bit code, the stack should always be 4-aligned
so, regardless of the length of the string, you will probably want to push 3 dwords
But what if I want to put the individual ASCII of the 38507 in the stack? Like so:
stack[0] = ; ascii of 3
stack[4] = ; ascii of 8
...
That's the reason I was asking if it would be okay to divide that unsigned binary by 10000,1000, etc and then somehow get the ascii value of the quotient in eax.
If I just push the entire binary value (0b1001011001101011) onto the stack, then it wouldn't have the ascii of each individual digit, or any ascii char at all right?
division is naturally one of the slower operations to perform
and, while it is possible to perform division on ASCII decimal strings,
it is faster to perform division on binary values, then convert the results to ASCII
this is the general concept for most math with computers
use human native ASCII decimal for input and output,
but perform the intermediate math steps in the CPU native binary format
using the stack for storage is another concept, altogether
best to learn one, then the other
trying to learn both at once is just making it harder on yourself
Quote from: masterori on February 08, 2016, 12:44:46 PM
But what if I want to put the individual ASCII of the 38507 in the stack? Like so:
stack[0] = ; ascii of 3
stack[4] = ; ascii of 8
...
You can get that by repeated division by 10 until the quotient gets zero:
.const
ten DWORD 10
.code
mov eax,12456
xor ecx,ecx
@1: xor edx,edx
div ten
add edx,'0'
push edx
add ecx,1
cmp eax,0
jnz @1
; ECX = count digits
; ASCII-digits in [ESP] ... [ESP+ECX*4-4]
; output example
@2: pop edx
push ecx
fn crt_putchar,edx
pop ecx
sub ecx,1
jnz @2
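The loop above can be traced with a short Python sketch that mirrors it step by step. The function name push_ascii_digits is an assumption for this illustration, and the list simply records the dwords in the order they would be pushed.

```python
def push_ascii_digits(value):
    """Return the dwords pushed by the loop, in push order (last digit first)."""
    pushed = []
    while True:
        value, remainder = divmod(value, 10)   # div ten: EAX = quotient, EDX = remainder
        pushed.append(remainder + ord("0"))    # add edx,'0' then push edx
        if value == 0:                         # cmp eax,0 / jnz @1
            break
    return pushed

stack = push_ascii_digits(38507)
print(stack)                                      # [55, 48, 53, 56, 51]
print("".join(chr(d) for d in reversed(stack)))   # 38507 (the order in which pop returns them)
```

Each digit still occupies a whole dword on the stack, with the ASCII code in the low byte and three zero bytes above it.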
I guess I'm not being clear on my question. I was wondering, if I push like say '0010' vs 0010, will it put '51' on the stack? If so for which - '0010' or 0010?
Another question as I was going through some book examples, it's regarding the 'imul' instruction
mov al, 48
mov bl, 4
imul bl ; AX = 00c0h, Overflow = 1 because AH is not a sign extension of AL
mov ax,48
mov bx, 4
imul bx ; DX:AX = 000000c0h, Overflow = 0 because DX is sign extension of AX
Can someone explain why the first example is not a sign extension? The book says "Because AH is not a sign extension of AL", but I'm not sure why...
AL and AH are respectively the LOW byte and the HIGH byte of the AX register, which is the LOW word of the EAX register. The difference between signed and unsigned is not how it's stored in memory, it is how you evaluate it AFTER the value is in a register or memory. JG is a signed instruction, JA is unsigned. After a CMP you use a conditional jump to branch according to its value.
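On the imul question itself: 48 * 4 = 192 = 0C0h, and as a signed byte 0C0h means -64, not +192. A sign extension of AL would therefore put 0FFh into AH, but the true product needs AH = 00h, so the result does not fit into AL alone and the overflow flag is set. With 16-bit operands 00C0h is still positive, DX = 0000h is its sign extension, and the flag stays clear. The Python sketch below just replays that reasoning; to_signed and imul_overflow are names invented for the example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def imul_overflow(a, b, bits):
    """True when the signed product does not fit in the low half (OF = 1)."""
    product = to_signed(a, bits) * to_signed(b, bits)
    low_half = to_signed(product, bits)   # what AL (or AX) alone would mean
    return low_half != product            # upper half is not a sign extension

print(to_signed(0xC0, 8))         # -64
print(imul_overflow(48, 4, 8))    # True
print(imul_overflow(48, 4, 16))   # False
```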
Quote from: masterori on February 11, 2016, 04:19:24 PM
I guess I'm not being clear on my question. I was wondering, if I push like say '0010' vs 0010, will it put '51' on the stack? If so for which - '0010' or 0010?
push "0010" ;string operands are reversed, so 30303130h is pushed (as a string, "0100")
push 0010 ;the masm default radix is 10 (decimal), so 0000000Ah is pushed
push 0010h ;00000010h is pushed
usually, strings need to be null-terminated
if you want to push a literal string onto the stack, you should take that into account
push "gfe" ;the last byte (highest address) pushed will be 0
push "dcba" ;you now have the string "abcdefg",0 on the stack
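To see why the two pushes end up as "abcdefg",0 in memory, here is a rough Python model. It assumes the rule stated above (the first character of the literal becomes the most significant byte) and a little-endian 32-bit push; masm_string_dword and push are hypothetical helpers, not MASM or MASM32 functions.

```python
def masm_string_dword(literal):
    """Value of a 1-4 character string literal: first character most significant."""
    assert 1 <= len(literal) <= 4
    return int.from_bytes(literal.encode("ascii"), "big")

def push(stack, dword):
    """Model a 32-bit push: the new dword lands below everything already pushed."""
    stack[0:0] = dword.to_bytes(4, "little")

stack = bytearray()                      # index 0 plays the role of [ESP]
push(stack, masm_string_dword("gfe"))
push(stack, masm_string_dword("dcba"))
print(bytes(stack))                      # b'abcdefg\x00'
print(hex(masm_string_dword("0010")))    # 0x30303130
```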
as Hutch mentioned, the difference between signed values and unsigned values is how they are interpreted
i.e., the context is up to the programmer
as an example, let's say that EAX holds the value 00303030h
mov eax,303030h
if we want that to represent a null-terminated string, it is "000",0
if we want that to represent a signed integer, it is +3158064 decimal
if we want that to represent an unsigned integer, it is 3158064
unsigned dword integers have the range of 0 to 4294967295 decimal
signed dword integers have the range of -2147483648 to +2147483647
unsigned word integers have the range of 0 to 65535
signed word integers have the range of -32768 to +32767
unsigned byte integers have the range of 0 to 255
signed byte integers have the range of -128 to +127
simply, if the value is signed, and the high bit is 0, it is positive
if the value is signed and the high bit is 1, it is negative, and the count runs the other way: 0FFFFFFFFh is -1, 0FFFFFFFEh is -2, and so on (two's complement)
0FFFFFFFFh may be treated as -1 (signed), or 4294967295 (unsigned)
080000000h may be treated as -2147483648 (signed), or 2147483648 (unsigned)
the real trick to signed versus unsigned is how we treat them in our code
basically, we need to use the correct routine to convert integers to decimal strings (and vice versa)
AND, we need to use the right set of conditional branch instructions when dealing with signed/unsigned values
notice that signed and unsigned branches act on different flags in the EFlags register
a SUB instruction will yield the same result whether the values are signed or unsigned
however, a different set of flags might be used to branch
this is the elegance of the two's complement numbering system
-------------------------------------------------------------------------
Group     Instruction  Description               Condition        Aliases
-------------------------------------------------------------------------
Equality  JZ           Jump if equal             ZF=1             JE
          JNZ          Jump if not equal         ZF=0             JNE
Unsigned  JA           Jump if above             CF=0 and ZF=0    JNBE
          JAE          Jump if above or equal    CF=0             JNC JNB
          JB           Jump if below             CF=1             JC JNAE
          JBE          Jump if below or equal    CF=1 or ZF=1     JNA
Signed    JG           Jump if greater           SF=OF and ZF=0   JNLE
          JGE          Jump if greater or equal  SF=OF            JNL
          JL           Jump if less              SF<>OF           JNGE
          JLE          Jump if less or equal     SF<>OF or ZF=1   JNG
          JO           Jump if overflow          OF=1
          JNO          Jump if no overflow       OF=0
          JS           Jump if sign              SF=1
          JNS          Jump if no sign           SF=0
Parity    JP           Jump if parity            PF=1             JPE
          JNP          Jump if no parity         PF=0             JPO
here, we see two's complement at work
the CPU doesn't care whether the values are signed or unsigned
it performs the same operation, either way
mov eax,0FFFFFFFFh ;signed value = -1, unsigned value = 4294967295
mov edx,000000100h ;signed value = +256, unsigned value = 256
sub eax,edx ;result = 0FFFFFEFFh, signed = -257, unsigned = 4294967039, EFlags = 00000286h
mov eax,0FFFFFFFFh ;signed value = -1, unsigned value = 4294967295
mov edx,000000100h ;signed value = +256, unsigned value = 256
add eax,edx ;result = 0FFh, signed = +255, unsigned = 255 (with carry), EFlags = 00000207h
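the two examples can be replayed in a few lines of Python, if that helps to see which flags each kind of branch looks at. this is only a sketch: it models CF, ZF, SF and OF and nothing else, and sub32, add32, ja and jg are names made up for the illustration.

```python
MASK = 0xFFFFFFFF

def signed32(value):
    """Read a 32-bit pattern as a two's complement number."""
    return value - 0x100000000 if value & 0x80000000 else value

def sub32(a, b):
    """Model SUB on 32-bit patterns; returns (result, flags)."""
    result = (a - b) & MASK
    return result, {
        "CF": a < b,                                          # unsigned borrow
        "ZF": result == 0,
        "SF": bool(result & 0x80000000),
        "OF": signed32(a) - signed32(b) != signed32(result),  # signed overflow
    }

def add32(a, b):
    """Model ADD on 32-bit patterns; returns (result, flags)."""
    result = (a + b) & MASK
    return result, {
        "CF": a + b > MASK,                                   # unsigned carry out
        "ZF": result == 0,
        "SF": bool(result & 0x80000000),
        "OF": signed32(a) + signed32(b) != signed32(result),  # signed overflow
    }

def ja(flags):
    """Unsigned 'above': CF=0 and ZF=0."""
    return not flags["CF"] and not flags["ZF"]

def jg(flags):
    """Signed 'greater': SF=OF and ZF=0."""
    return flags["SF"] == flags["OF"] and not flags["ZF"]

result, flags = sub32(0xFFFFFFFF, 0x100)
print(f"{result:08X}h", signed32(result), result)   # FFFFFEFFh -257 4294967039
print(ja(flags), jg(flags))                         # True False

result, flags = add32(0xFFFFFFFF, 0x100)
print(f"{result:08X}h", flags["CF"])                # 000000FFh True
```

the same subtraction makes JA jump (4294967295 is above 256) and JG fall through (-1 is not greater than +256), which is exactly the signed/unsigned choice the programmer has to make.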
the EFlags register may be evaluated as follows:
EFlags Register
Bit Description
31-22 unassigned, always 0 (as of Pentium IV)
21 ID (Identification, may be toggled if CPUID supported)
20 VIP (Virtual Interrupt Pending)
19 VIF (Virtual Interrupt Flag)
18 AC (Alignment Check, may be toggled if 486 or later)
17 VM (Virtual 8086 Mode)
16 RF (Resume Flag)
15 unassigned, always 0 (as of Pentium IV)
14 NT (Nested Task)
13,12 IOPL (I/O Privilege Level)
11 OF (Overflow Flag)
10 DF (Direction Flag, 0 = up)
9 IF (Interrupt Flag)
8 TF (Trap Flag)
7 SF (Sign Flag)
6 ZF (Zero Flag)
5 unassigned, always 0 (as of Pentium IV)
4 AF (Auxiliary Flag)
3 unassigned, always 0 (as of Pentium IV)
2 PF (Parity Flag, 0 = odd parity)
1 unassigned, always 1 (as of Pentium IV)
0 CF (Carry Flag)
you are primarily interested in the math flags: OF, SF, ZF, CF
So based on dedndave's answer, it seems I have to convert the binary to ascii first then push? How can I do so such that the binary: 0b1001011001101011 (in dec: 38507) is stored like this in the runtime stack:
stack[0] 55 ; ascii of 7
stack[1] 48 ; ascii of 0
stack[2] 53 ; ascii of 5
stack[3] 56 ; ascii of 8
stack[4] 51 ; ascii of 3
the assembler is perfectly capable of converting ASCII, decimal, hexadecimal, octal, etc to binary values
that means you can type
mov eax,"abcd"
mov eax,61626364h
mov eax,1633837924
mov eax,1100001011000100110001101100100b
they are all the same
you use the one that makes your code easy to understand
also, i will say this about PUSH and POP
in 32-bit code, the stack should always be 4-byte aligned
that means we don't generally push or pop bytes or words - always dwords
So they're all stored as ascii in the runtime stack by default? The reason I'm so persistent on knowing how to store the values as ascii on the stack, is because that's one of the review questions. Write code so that once it's done, the ascii values of the binary is in the runtime stack.
there are several ways to do that
keep in mind, the last byte should be a 0 (null terminator)
also - any DWORD ASCII decimal value, signed or unsigned, plus the null terminator, will fit into a 12-byte buffer
so, you might start with
mov edi,esp ;EDI points to the end of buffer
push 0
sub esp,8
or, you could do this, and add code to 0 the last byte later on
mov edi,esp ;EDI points to the end of buffer
sub esp,12
notice that ESP remains 4-byte aligned in both cases
Maybe I'm a bit dense, but I don't think you're answering my question...
If I do something like this
push 0011
Will it push the ASCII character 51 onto the stack? The ASCII 51 is the representation of 3. If it does not, how do I get it to put 51 instead of 0011, 3 or whatever it puts in the stack?
push 11
will push a decimal 11, dword size (3 zero bytes)
dword is the default for push in 32-bit code
decimal is the default radix for masm, so the 11 is seen as 11, or 0Bh
if you want to get 51 on the stack
push 51
again, it will be a dword, and 51 will be seen as decimal (ASCII for "3")
So even if I use push 0011b
it'll still push 11? Is there a way to automatically convert a binary into ascii?
As per my original question, the question wants to push each digit of 38507 onto the stack one by one. I'm given 1001011001101011b to start with. So I thought of using 'div' by 10000, 1000, 100, ... to get each of the digits of 38507 in binary form. Or does that not work - dividing a binary by a decimal?
11b specifies binary - that overrides the default - that is 3 in decimal
ok - i am telling you that pushing individual bytes isn't going to work
you can push 4 bytes at a time
or - you can create stack space by multiples of 4 bytes, then fill those bytes individually
that is why i used sub esp
you did not state that the values had to be pushed
you stated they had to be on the stack when done
I understand everything that you've posted. But I'm still trying to figure out if there's a way to get the ascii representation of each digit because that's what the question is asking for. It shows a stack similar to the one I posted on the previous page.
Quote from: dedndave on February 13, 2016, 12:05:16 PM
you did not state that the values had to be pushed
you stated they had to be on the stack when done
I don't understand you. How can they be on the stack when they weren't pushed on?
Again, 1001011001101011b is 38507 in binary. So again going back to my question on the first page, can I take 1001011001101011b and divide by 10000 (decimal) to get 3 decimal then (somehow) get the ascii representation of it and push it onto the stack?
I want to know if there's some kind of conversion that gets the ascii of a binary or decimal value. The reason is because DIV stores the quotient in AX and the remainder in DX, so I was going to loop it until DX is 0. This would give me 3, 8, 5, 0, 7 respectively. Now all that's left is to convert those to ascii then push (or if pushing automatically converts it over to ascii then that's fine).
Quote from: masterori on February 13, 2016, 12:36:25 PM
can I take 1001011001101011b and divide by 10000 (decimal) to get 3 decimal then (somehow) get the ascii representation of it and push it onto the stack?
Yes. Get Olly (http://www.ollydbg.de/version2.html) to understand what this code does:
include \masm32\include\masm32rt.inc
.code
start:
mov eax, 123456789 ; or 38507
mov esi, esp ; we need a "permanent" reg32 like esi here
mov ecx, 10
.Repeat
xor edx, edx ; clear the high dword before the unsigned div (cdq would sign-extend eax)
div ecx
add edx, "0"
push edx
.Until !eax
.Repeat
print esp, 32
pop eax
.Until esp>=esi
inkey chr$(13, 10, "That was cool, right?")
exit
end start
Quote from: jj2007 on February 13, 2016, 01:27:35 PM
add edx, "0"
push edx
This does the conversion to ascii? You can add a char with a binary like that?
let's deal with the stack
there are many ways to access data on the stack
generally, the CPU maintains a stack pointer
for 32-bit code, it is named ESP (Extended Stack Pointer)
it is a register that holds a pointer into the stack area (an address)
when you PUSH a dword onto the stack, the value in ESP is decreased by 4, and the data is written to that address
when you POP a dword, the value at [ESP] is popped to whatever destination you specified,
and ESP is increased by 4
now, if we want to "reserve" space on the stack without PUSH, we can manipulate ESP
sub esp,12
it's almost as though we had pushed 3 dwords
but - the data on the stack is not altered (i.e., it is garbage that was there before)
later on, we can write data to those 12 bytes (our ASCII string, for example)
and still later, we can access that data directly, or by using POP
the EBP (Extended Base Pointer) register is also used to access data on the stack
ESP changes whenever items are added or removed from the stack
EBP does not change when items are added or removed
instead, it is generally a fixed pointer that is temporary on a per-subroutine basis
i have written many posts about the use of EBP - use the forum search tool to find them
now, conversion from binary to ASCII decimal string is an entirely different subject
we usually write a subroutine to perform this function
we might pass to the subroutine the binary value to convert, and a pointer to (address of) the string buffer
whether that buffer is on the stack or in one of the data sections is inconsequential
a simple method is an adaptation of Horner's Rule
for the decimal value of 38507 decimal...
we repeatedly divide by 10
for each division, the remainder (technically, it's called a modulus) becomes the next digit (last digit first)
it will be from 0 to 9 (binary)
to convert that digit to ASCII, we add 30h (or use the OR instruction)
now, the byte is from 30h to 39h - the ASCII decimal numbers
the quotient is saved for the next pass
the process is repeated until the quotient is 0 (no more digits)
38507 / 10 >> quotient = 3850, modulus = 7 >> 7 + 30h = 37h (the last digit in the string)
3850 / 10 >> quotient = 385, modulus = 0 >> 0 + 30h = 30h
385 / 10 >> quotient = 38, modulus = 5 >> 5 + 30h = 35h
38 / 10 >> quotient = 3, modulus = 8 >> 8 + 30h = 38h
3 / 10 >> quotient = 0, modulus = 3 >> 3 + 30h = 33h (the first digit in the string)
for each pass of the loop, a digit is stored, and the pointer is decremented
as you can see digits come out of the loop one byte at a time
because the stack is maintained as dwords, accessing bytes with PUSH is not practical
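here is the same procedure as a small Python sketch, filling a 12-byte buffer from the end the way the loop description above does. the buffer stands in for the space reserved with sub esp,12; to_decimal_buffer is a made-up name, and unlike real stack space the Python buffer starts out zeroed.

```python
def to_decimal_buffer(value, size=12):
    """Fill a buffer from the end: null terminator last, digits stored before it."""
    buffer = bytearray(size)          # zero-filled, so buffer[size-1] is the terminator
    pos = size - 1
    while True:
        value, digit = divmod(value, 10)   # quotient is kept for the next pass
        pos -= 1                           # the pointer is decremented for each digit
        buffer[pos] = digit | 0x30         # OR with 30h works because digit is 0..9
        if value == 0:                     # stop when the quotient is 0
            break
    return buffer, pos

buffer, start = to_decimal_buffer(38507)
print(start)                   # 6
print(bytes(buffer[start:]))   # b'38507\x00'
```

the returned position plays the role of the pointer to the first digit, which is what a print routine would need.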
Quote from: masterori on February 13, 2016, 02:05:12 PM
Quote from: jj2007 on February 13, 2016, 01:27:35 PM
add edx, "0"
push edx
This does the conversion to ascii? You can add a char with a binary like that?
Yes, you can :t
If you had followed my advice above to use Olly, you would have already found out what the stack looks like before starting the code:
Address Hex dump comments
0018FF7C 00 00 00 00|00 00 00 00|00 00 00 00|00 00 00 00|
0018FF8C 8A 33 44 76|00 E0 FD 7E|D4 FF 18 00|82 98 C5 77| stack is 0018FF8C
Now you start the loop, pushing (using my example 123456789)
9+"0" = 00000039h
8+"0" = 00000038h
7+"0" = 00000037h
etc. After 4 times push edx, the same memory area looks like this:
Address Hex dump comments
0018FF7C 36 00 00 00|37 00 00 00|38 00 00 00|39 00 00 00| stack is 0018FF7C
0018FF8C 8A 33 44 76|00 E0 FD 7E|D4 FF 18 00|82 98 C5 77|
Now, in the second loop, we use print esp, 32 (with the 32 just being a space added to the output).
The print macro expects a pointer to a memory area containing a zero-delimited string. So is esp "a pointer to a memory area containing a zero-delimited string"?
0018FF7C 36 00
With each pop eax, esp advances a dword (4 bytes), so in round 2 print esp uses
37 00
etc etc.
Your turn. I won't help you any more unless you can prove that you opened your executable in Olly (http://www.ollydbg.de/version2.html) and hit the F8 key to see all this happening.
add edx,"0"
is the same as
add edx,30h
these instructions would also work
add dl,30h
or dl,30h
the EDX form and the DL forms are all 3 bytes in length (opcode, ModRM byte, 8-bit immediate)
only the short AL forms, such as add al,30h, are 2 bytes
you would still PUSH EDX (not PUSH DL)