I feel like such an idiot.
I should know this. I'm trying to create an array of structures, and access an element of the array using a pointer.
I've created the array with no problem:
$test STRUCT
s1 DB 20 DUP(?)
s2 DB 10 DUP(?)
s3 DB 4 DUP(?)
$test ENDS
TestStructs $test 4 DUP (<>)
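For reference, here is a minimal C mirror of the layout above. It assumes the struct is renamed `struct test`, because `$` is not a portable C identifier character. The `static_assert` lines check the numbers MASM works with: 34 bytes per element, and `s2` at byte 20 of each element.

```c
#include <assert.h>
#include <stddef.h>

/* C mirror of the MASM $test STRUCT (renamed: '$' is not portable in C).
   All members are byte arrays, so common compilers insert no padding. */
struct test {
    unsigned char s1[20];
    unsigned char s2[10];
    unsigned char s3[4];
};

/* Counterpart of: TestStructs $test 4 DUP (<>) */
static struct test TestStructs[4];

/* Compile-time checks of the offsets MASM uses. */
static_assert(sizeof(struct test) == 34, "20 + 10 + 4 bytes, no padding");
static_assert(offsetof(struct test, s1) == 0, "s1 starts the struct");
static_assert(offsetof(struct test, s2) == 20, "s2 follows the 20 bytes of s1");
static_assert(offsetof(struct test, s3) == 30, "s3 follows the 10 bytes of s2");
```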
Problem is, when I try to access an element of the array, the subscript I use becomes a byte offset within one of the strings, not an offset to the nth element.
In other words, if I do this:
LEA EAX, TestStructs[1].s2
I end up pointing to the 2nd byte within s2 of the first element--not at all what I want.
I thought I knew how to do this. MASM's behavior here seems completely counter-intuitive. If I say TestStructs[1].s2, I'm saying I want the 2nd element (0-based) of the array of structures (what's to the left of the period), and then I want the offset to field s2 within that element. Right?
Obviously, wrong. The following little program shows it clearly:
;============================================
; Array addressing testbed
;============================================
include \masm32\include\masm32rt.inc
;============================================
; Defines, macros, prototypes, etc.
;============================================
$test STRUCT
s1 DB 20 DUP(?)
s2 DB 10 DUP(?)
s3 DB 4 DUP(?)
$test ENDS
;============================================
; HERE BE DATA
;============================================
.data
TestStructs $test 4 DUP (<>)
Addrfmt DB "Address of TestStructs[0].s2: %x", 13, 10
DB "Address of TestStructs[1].s2: %x", 13, 10, 0
buffer DB 200 DUP(?)
;============================================
; CODE LIVES HERE
;============================================
.code
start: INVOKE wsprintf, OFFSET buffer, OFFSET Addrfmt,
OFFSET TestStructs[0].s2, OFFSET TestStructs[1].s2
INVOKE MessageBox, 0, OFFSET buffer, NULL, MB_OK
INVOKE ExitProcess, EAX
END start
So what's the correct syntax for what I'm trying to do? I know this is a piece of cake with structures that don't contain arrays (i.e., strings). It seems that the subscript is being applied to the field rather than the element, since the field is an array of bytes.
(I realize those OFFSETs don't really do anything--I just tried putting them in out of desperation!)
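The two readings of `TestStructs[1].s2` can be put side by side in C. This is a sketch using a C mirror of the struct (renamed `struct test`). `masm_style_offset` adds the bracketed number as raw bytes, which is what the replies below explain MASM does. `c_style_offset` takes the address the C way, where the subscript is scaled by the element size.

```c
#include <assert.h>
#include <stddef.h>

struct test {
    unsigned char s1[20];
    unsigned char s2[10];
    unsigned char s3[4];
};

static struct test TestStructs[4];

/* MASM reading of TestStructs[n].s2: n is a plain BYTE count added to
   the array base and to the field offset. */
static size_t masm_style_offset(size_t n)
{
    return n + offsetof(struct test, s2);
}

/* C reading of &TestStructs[n].s2: n is scaled by sizeof(struct test). */
static size_t c_style_offset(size_t n)
{
    return (size_t)((unsigned char *)&TestStructs[n].s2 -
                    (unsigned char *)TestStructs);
}
```

So `[1]` lands on byte 21 (the second byte of the first element's `s2`), while the intended element starts `s2` at byte 54. Feeding MASM a pre-scaled byte count, `n * sizeof(struct test)`, makes the two agree.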
include \masm32\include\masm32rt.inc
$test STRUCT
s1 DB 20 DUP(?)
s2 DB 10 DUP(?)
s3 DB 4 DUP(?)
$test ENDS
.data?
TestStructs $test 4 DUP (<>)
.code
start: lea edi, TestStructs[3*$test].s2
mov byte ptr [edi], 123
print str$(TestStructs[3*$test].s2), 9, "value", 13, 10
mov eax, edi
sub eax, offset TestStructs
print str$(eax), 9, "offset", 13, 10
exit
end start
HTH, jj
Hi,
LEA EAX, TestStructs[1].s2
This says: load EAX with the following address.
Take the address of TestStructs, add the offset of .s2,
then add 1 ([1]). You probably want to use something
like:
$test STRUCT
s1 DB 20 DUP(?)
s2 DB 10 DUP(?)
s3 DB 4 DUP(?)
$test ENDS
TestStructs $test 4 DUP (<>)
SizeOfTest EQU 34 ; Your struct has 34 bytes (20 + 10 + 4).
LEA EAX, TestStructs[1*SizeOfTest].s2
Oops, jj2007 posted something, better forget mine.
Regards,
Steve N.
Quote from: FORTRANS on June 28, 2012, 07:22:56 AM
Oops, jj2007 posted something, better forget mine.
That doesn't mean your post isn't correct. On the contrary, you added theory to my practical example :biggrin:
By the way, try this:
SizeOfTest EQU 34
lea esi, TestStructs[3*SizeOfTest].s2
lea edi, TestStructs[3*$test].s2
.if esi==edi
shout "the same"
...
ok - my turn - lol
i would not use LEA, in this case
LEA might be needed if one of the registers already contained part of the address
if you use MOV reg,OFFSET ..... the assembler will calculate the required address for you
the assembler knows the size of the structure and the base address of the array
sometimes, it isn't so obvious until you look at the disassembled code whether LEA is required
TestStructs[1].s2
the brackets ([]) and the period have the same effect here: both just add to the address
the assembler will add the 3 elements together:
the base address of TestStructs
1
s2 (the offset of s2 in a $test structure)
Quote from: FORTRANS on June 28, 2012, 07:22:56 AM
Hi,
LEA EAX, TestStructs[1].s2
This says :load EAX with the following address.
Take the address of TestStructs, add the offset of .s2,
then add 1 ([1]).
That's not what I would have ASS-U-med about this at all. (Even though you are correct.)
My immediate reaction is that MASM's behavior in this case is brain-damaged and illogical. On more sober contemplation, it seems that MASM simply lacks true array processing.
Why do I say "brain-damaged and illogical"? Because, well, C handles array references in a way that seems logical: array[n].field says "Take the offset of the nth element of array and add to that the offset of field". Everything to the left of the dot has to do with selecting the array element; everything to the right adds an offset to that selection.
That's the way array references should work. But MASM has it bass-ackwards. (I confirmed it with a little test prog. Doesn't matter if the fields are DDs or whatever.) How did they come up with that behavior?
In other words, what I thought was a subscript is actually just an offset, much like [EBX + VarName + 1]. The really annoying thing is that I haven't even been able to find documentation of this, at least not in the official Micro$oft MASM manual.
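A C analogy for "a subscript that is really just an offset": C scales pointer arithmetic by the element size, but arithmetic on a byte pointer is unscaled, which is how a MASM `[n]` displacement behaves. This sketch uses a hypothetical `DwordArray` purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t DwordArray[4];

/* C scales pointer arithmetic: DwordArray + n moves n ELEMENTS (n * 4 bytes). */
static ptrdiff_t typed_step(size_t n)
{
    return (const unsigned char *)(DwordArray + n) - (const unsigned char *)DwordArray;
}

/* A MASM [n] displacement is unscaled, like adding n to a byte pointer:
   DwordArray[1] in MASM addresses byte 1, not element 1. */
static ptrdiff_t byte_step(size_t n)
{
    return ((const unsigned char *)DwordArray + n) - (const unsigned char *)DwordArray;
}
```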
So is there no good shorthand method of referencing array elements using subscripts?
By the way, rather than using an equate with a hard-coded number (which would be incorrect if the size of the structure changed), I would prefer to do things this way:
TestStructs[SIZEOF $test * 1].s2
Still sucks compared to the way it should work, though ...
:biggrin:
it is perfectly logical - just low-level
that is one of the major differences in programming in ASM vs compiled languages
you have to do a little more work in order to get a lot more control
and - you get to see what goes on inside the processor's "head" :P
Quote from: dedndave on June 28, 2012, 12:53:10 PM
:biggrin:
it is perfectly logical - just low-level
Sorry, no; it's not logical at all. At least not syntactically.
Look: I make an array reference like TextField[2].field1. How in the world can you say that interpreting "[2]" as an offset added to the offset of "field1" makes sense? It doesn't; everything on the left of the period should be evaluated as referencing a particular array element, not an offset from the 0th element. Otherwise, why have arrays at all if you can't properly reference their elements? (Well, we can, but we have to jump through a few hoops in order to do it. And it has nothing whatever to do with "low level" vs. high level.)
it has everything to do with being low level
at any rate....
you sure like to be contrary, don't you - lol
you're lucky we are in the campus
i remind myself that these posts are really for reference for others
I came up with another way to access array elements:
$test STRUCT
s1 DD ?
s2 DD ?
s3 DD ?
$test ENDS
ar TEXTEQU <SIZEOF $test *>
LEA EDX, TestStructs[ar 2].s2
This is more intuitively satisfying to me (i.e., the "subscript" number is what one would expect), and it is still "low level".
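A rough C counterpart of the `ar` text macro, sketched against a C mirror of the DWORD version of `$test` (12 bytes). `AR` and `FIELD_AT` are hypothetical names. The C macro parenthesizes its argument, which a plain text substitution like `ar` does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the DWORD version of $test: three 4-byte fields. */
struct test {
    uint32_t s1;
    uint32_t s2;
    uint32_t s3;
};

/* Analog of `ar TEXTEQU <SIZEOF $test *>`: element number -> byte displacement.
   The parentheses around n keep expressions like AR(1 + 1) correct. */
#define AR(n) (sizeof(struct test) * (n))

/* Byte offset of `field` in element n, like TestStructs[ar n].field */
#define FIELD_AT(n, field) (AR(n) + offsetof(struct test, field))
```

One thing to watch with the MASM version: since TEXTEQU is pure text substitution, `TestStructs[ar 1+1].s2` would expand to `SIZEOF $test * 1+1`, which (multiplication binding tighter than addition) gives 13 rather than 24. Writing `ar (1+1)` should avoid that.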
In most cases, you don't want to use hard-coded subscripts.
At best, you can access bytes, words, dwords, and qwords from arrays using the following syntax forms:
mov al,ByteArray[ecx]
mov ax,WordArray[ecx*2]
mov eax,DwordArray[ecx*4]
mov eax,dword ptr QwordArray[ecx*8] ; lower half
mov edx,dword ptr QwordArray[ecx*8+4] ; upper half
MASM exposes the processor, and the processor knows nothing about arrays.
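The scale factors in those forms (1, 2, 4, 8) are exactly the element sizes, which this C sketch checks with hypothetical arrays. x86 addressing only offers those four scales, which is why an arbitrary-sized element such as a 34-byte struct needs an explicit multiply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t  ByteArray[8];
static uint16_t WordArray[8];
static uint32_t DwordArray[8];
static uint64_t QwordArray[8];

/* Byte distance from the start of an array to element i; for these types it
   equals i times the x86 scale used in ByteArray[ecx], WordArray[ecx*2], ... */
#define BYTE_OFFSET(arr, i) \
    ((size_t)((const unsigned char *)&(arr)[i] - (const unsigned char *)(arr)))
```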
For an arbitrary sized item, you are forced to do the following for a variable index:
mov eax,sizeof $test
imul index_of_array ; compute byte offset
mov edx,TestStructs[eax].s1
mov TestStructs[eax].s2,edx
or
mov ecx,index_of_array
imul eax,ecx,sizeof $test ; compute byte offset
mov edx,TestStructs[eax].s1
mov TestStructs[eax].s2,edx
or
imul eax,index_of_array,sizeof $test ; compute byte offset
mov edx,TestStructs[eax].s1
mov TestStructs[eax].s2,edx
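The same steps in C, as a sketch: compute the byte offset with a multiply (the `imul`), then read `s1` and write `s2` at that offset plus the field offsets. It assumes a C mirror of the DWORD version of `$test` posted earlier in the thread. `element_offset` and `copy_s1_to_s2` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C mirror of the DWORD version of $test (s1, s2, s3 DD ?). */
struct test {
    uint32_t s1, s2, s3;
};

static struct test TestStructs[4];

/* imul eax,index_of_array,sizeof $test  ->  byte offset of the element */
static size_t element_offset(size_t index)
{
    return index * sizeof(struct test);
}

/* mov edx,TestStructs[eax].s1 / mov TestStructs[eax].s2,edx
   written with explicit byte arithmetic to mirror the assembly. */
static void copy_s1_to_s2(size_t index)
{
    unsigned char *elem = (unsigned char *)TestStructs + element_offset(index);
    uint32_t value;
    memcpy(&value, elem + offsetof(struct test, s1), sizeof value);
    memcpy(elem + offsetof(struct test, s2), &value, sizeof value);
}
```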
Constant indexes are easy. The array index does need to be adjusted by the size of the array elements, but if you are looping through the elements the multiply can be replaced with an addition.
;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
$test STRUCT
s0 DWORD 3 DUP(?)
s1 DWORD 3 DUP(?)
s2 DWORD 3 DUP(?)
$test ENDS
;==============================================================================
.data
TestStructs $test 3 DUP (<{0,1,2},{3,4,5},{6,7,8}>)
.code
;==============================================================================
start:
;==============================================================================
I = sizeof $test
printf("%d\t", TestStructs[I*0].s0[0*4])
printf("%d\t", TestStructs[I*0].s0[1*4])
printf("%d\t", TestStructs[I*0].s0[2*4])
printf("%d\t", TestStructs[I*0].s1[0*4])
printf("%d\t", TestStructs[I*0].s1[1*4])
printf("%d\t", TestStructs[I*0].s1[2*4])
printf("%d\t", TestStructs[I*0].s2[0*4])
printf("%d\t", TestStructs[I*0].s2[1*4])
printf("%d\n\n",TestStructs[I*0].s2[2*4])
printf("%d\t", TestStructs[I*1].s0[0*4])
printf("%d\t", TestStructs[I*1].s0[1*4])
printf("%d\t", TestStructs[I*1].s0[2*4])
printf("%d\t", TestStructs[I*1].s1[0*4])
printf("%d\t", TestStructs[I*1].s1[1*4])
printf("%d\t", TestStructs[I*1].s1[2*4])
printf("%d\t", TestStructs[I*1].s2[0*4])
printf("%d\t", TestStructs[I*1].s2[1*4])
printf("%d\n\n",TestStructs[I*1].s2[2*4])
printf("%d\t", TestStructs[I*2].s0[0*4])
printf("%d\t", TestStructs[I*2].s0[1*4])
printf("%d\t", TestStructs[I*2].s0[2*4])
printf("%d\t", TestStructs[I*2].s1[0*4])
printf("%d\t", TestStructs[I*2].s1[1*4])
printf("%d\t", TestStructs[I*2].s1[2*4])
printf("%d\t", TestStructs[I*2].s2[0*4])
printf("%d\t", TestStructs[I*2].s2[1*4])
printf("%d\n\n",TestStructs[I*2].s2[2*4])
xor ebx, ebx
.WHILE ebx < 3 * I
xor esi, esi
.WHILE esi < 3
printf("%d\t", TestStructs[ebx].s0[esi*4])
inc esi
.ENDW
xor esi, esi
.WHILE esi < 3
printf("%d\t", TestStructs[ebx].s1[esi*4])
inc esi
.ENDW
xor esi, esi
.WHILE esi < 3
printf("%d\t", TestStructs[ebx].s2[esi*4])
inc esi
.ENDW
printf("\n\n")
add ebx, I
.ENDW
inkey
exit
;==============================================================================
end start
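A C sketch of how the `.WHILE` loops above walk memory, assuming `DWORD` maps to `int32_t`. The outer byte offset grows by the struct size on each pass (the `add ebx, I`), and the inner index is scaled by 4 (the `[esi*4]`), so no multiply by the struct size is needed. `sum_s1` is a hypothetical helper that totals every `s1[j]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the program's $test: three DWORD[3] fields (36 bytes). */
struct test {
    int32_t s0[3];
    int32_t s1[3];
    int32_t s2[3];
};

/* Same data as: TestStructs $test 3 DUP (<{0,1,2},{3,4,5},{6,7,8}>) */
static struct test TestStructs[3] = {
    {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}},
    {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}},
    {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}},
};

/* Outer offset steps by sizeof(struct test) (like `add ebx, I`);
   field[j] is scaled by 4 by the compiler (like [esi*4]). */
static int32_t sum_s1(void)
{
    const unsigned char *base = (const unsigned char *)TestStructs;
    int32_t total = 0;
    for (size_t outer = 0; outer < 3 * sizeof(struct test); outer += sizeof(struct test)) {
        const int32_t *field = (const int32_t *)(base + outer + offsetof(struct test, s1));
        for (size_t j = 0; j < 3; j++)
            total += field[j];
    }
    return total;
}
```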
As Dave already wrote, Masm is low level, and [n] means "offset n bytes". But you still have elegant options available:
include \masm32\include\masm32rt.inc
$test STRUCT
s1 DB 20 DUP(?)
s2 DB 10 DUP(?)
s3 DB 4 DUP(?)
$test ENDS
.data?
TestStructs $test 4 DUP (<>)
.code
start: lea edi, TestStructs[1*$test].s2 ; indirect, using edi
mov byte ptr [edi], 111 ; needs to inform Masm which size
mov TestStructs[2*$test].s2, 222 ; directly, no size info needed
print str$(TestStructs[1*$test].s2), 9, "value 1.s2", 13, 10
print str$(TestStructs[2*$test].s2), 9, "value 2.s2", 13, 10
mov eax, edi
sub eax, offset TestStructs
print str$(eax), 9, "offset 1:0", 13, 10
exit
end start
Output:
111 value 1.s2
222 value 2.s2
54 offset 1:0
If that is still not highlevelish enough, you are a candidate for MasmBasic :greensml:
include \masm32\MasmBasic\MasmBasic.inc
$test STRUCT
s1 DB 20 DUP(?)
s2 DB 10 DUP(?)
s3 DB 4 DUP(?)
$test ENDS
Init
Dim TestStructs(3) As $test
mov TestStructs(3, s2), 123
Print Str$("Value=\t%i\n", TestStructs(3, s2))
lea ecx, TestStructs(0, s1) ; first item
lea eax, TestStructs(3, s2) ; current item
sub eax, ecx
Inkey Str$("Offset=\t%i", eax)
Exit
end start
Pure MAssembler™ :biggrin:
Quote from: michaelw
Constant indexes are easy. The array index does need to be adjusted by the size of the array elements, but if you are looping through the elements the multiply can be replaced with an addition.
One of the reasons for the continuing existence of the C language is to have access to some of the ASM tricks.
The indexed copy loop
for (j = 0; j < count; j++) dest[j].f1 = src[j].f2;
can be rewritten as
pdest = dest; // array name is ptr constant
psrc = src;
n = count;
while (n--) (pdest++)->f1 = (psrc++)->f2;
The latter replaces the subscript multiplication with address addition. It is roughly equivalent to
lea edi,dest
lea esi,src
mov ecx,count
cmp ecx,0
je endlbl
lbl:
mov eax,[esi].src_struct.f2 ; if f1 and f2 are dwords
mov [edi].dest_struct.f1,eax
add esi,sizeof src_struct
add edi,sizeof dest_struct
dec ecx
jnz lbl
endlbl:
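A runnable version of the two C loops, as a sketch. The element types are hypothetical: only the field names `f1` and `f2` come from the post, and the extra members exist just to make the two strides differ from the field size. Both functions should leave `dest` in the same state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element types; only f1 and f2 come from the post. */
struct dest_struct { int f1; int other; };
struct src_struct  { int pad; int f2; };

/* Indexed form: each dest[j] / src[j] scales j by the element size. */
static void copy_indexed(struct dest_struct *dest, const struct src_struct *src, size_t count)
{
    for (size_t j = 0; j < count; j++)
        dest[j].f1 = src[j].f2;
}

/* Pointer form: the scaling becomes one pointer increment per element,
   like the `add esi,sizeof src_struct` / `add edi,sizeof dest_struct` pair. */
static void copy_pointers(struct dest_struct *dest, const struct src_struct *src, size_t count)
{
    struct dest_struct *pdest = dest;
    const struct src_struct *psrc = src;
    size_t n = count;
    while (n--)
        (pdest++)->f1 = (psrc++)->f2;
}
```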
A technique that does work well and is no big deal to write in assembler is an array of pointers to other array elements. The target array can have elements of uneven size (an array of variable-length strings, for example), but what makes it fast and easy to work with is an array of predictable size (a DWORD array) where each DWORD member is a pointer to one of the uneven-size elements.
High-level languages do this all the time, but it's easy enough to create an array of variable-length elements that is addressed by an array of pointers.
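A minimal C sketch of that idea, assuming a fixed-size pool for the variable-length strings and a fixed-stride index of pointers into it. `pool`, `items`, and `add_item` are hypothetical names. Element *i* is then always `items[i]`, a scaled access with a power-of-two stride, no matter how long each string is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Variable-length elements packed back to back in one buffer... */
static char pool[64];
/* ...and a fixed-stride index: one pointer (4 bytes on Win32) per element. */
static char *items[8];
static size_t item_count;

/* Append a string to the pool and record its address in the pointer array.
   Returns 0 on success, -1 if the pool or the index is full. */
static int add_item(const char *s)
{
    static size_t used;
    size_t len = strlen(s) + 1;          /* include the terminating zero */
    if (item_count == sizeof items / sizeof items[0] || used + len > sizeof pool)
        return -1;
    memcpy(pool + used, s, len);
    items[item_count++] = pool + used;
    used += len;
    return 0;
}
```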
I remember running into this when I was writing some code that accessed the data directories in the PE header. I wondered why in C I could simply use the directory equate as the index, yet in MASM it was failing. I debugged it, saw the memory accesses weren't correct, and realized that I had to multiply the directory index by the size of IMAGE_DATA_DIRECTORY to get the correct values.
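A C sketch of that fix. `DataDirectory` is a local mirror of winnt.h's `IMAGE_DATA_DIRECTORY` (two DWORDs, `VirtualAddress` and `Size`), so it builds without `<windows.h>`. `DIR_IMPORT` stands in for `IMAGE_DIRECTORY_ENTRY_IMPORT` (1), and `dir_entry_offset` is a hypothetical helper doing the multiply that MASM leaves to you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of IMAGE_DATA_DIRECTORY: 8 bytes per entry. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} DataDirectory;

enum { DIR_IMPORT = 1 };   /* value of IMAGE_DIRECTORY_ENTRY_IMPORT */

/* The index must be scaled by the entry size before it becomes a byte offset. */
static size_t dir_entry_offset(size_t index)
{
    return index * sizeof(DataDirectory);
}

/* Read the import directory RVA through an explicit byte offset,
   the way a MASM [displacement] would have to be written. */
static uint32_t import_rva_bytewise(const DataDirectory *dirs)
{
    const unsigned char *p = (const unsigned char *)dirs + dir_entry_offset(DIR_IMPORT);
    return ((const DataDirectory *)p)->VirtualAddress;
}
```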
HR,
Ghandi
Quote from: jj2007 on June 28, 2012, 07:20:35 AM
print str$(TestStructs[3*$test].s2), 9, "value", 13, 10
The SIZEOF operator is assumed when using the name of the struct in a calculation?
test$ STRUCT
member db ?
test$ ENDS
TestStruct test$ <>
yes - test$ is a type definition and the assembler will use the size, as it has no address
TestStruct has an address, so the assembler will use that
Yes, in my code I was able to reduce:
I = sizeof $test
To:
I = $test
And the app produced the same output as it did with the sizeof operator.
Quote from: tenkey on June 28, 2012, 05:03:29 PM
In most cases, you don't want to use hard-coded subscripts.
I agree; usually, one doesn't. However, in this case, hard-coded subscripts are what's needed, which is what prompted my question in the first place.
What I'm doing is placing addresses within one array of structures (pointers to text buffers) into another array. It's much easier to hard-code this at assembly time (since there's a 1:1 correspondence between the elements of the two arrays) than to programmatically loop through the structures at runtime and plug in the addresses.
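For comparison, this is roughly what a build-time pointer table looks like in C. It is a sketch using a C mirror of the original byte-array `$test`. `Targets` is a hypothetical second array whose entries are filled in by the compiler and linker as address constants, so no runtime loop is needed.

```c
#include <assert.h>
#include <stddef.h>

struct test {
    unsigned char s1[20];
    unsigned char s2[10];
    unsigned char s3[4];
};

static struct test TestStructs[4];

/* Each entry is an address constant computed at build time:
   base + n * sizeof(struct test) + offsetof(struct test, s2). */
static unsigned char *const Targets[4] = {
    TestStructs[0].s2,
    TestStructs[1].s2,
    TestStructs[2].s2,
    TestStructs[3].s2,
};
```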
Quote from: hutch-- on June 28, 2012, 09:36:04 PM
A technique that does work well and is no big deal to write in assembler is an array of pointers to other array elements. The target array can be of uneven size, IE: an array of variable length strings for example but what makes it fast and easy to work with is an array of predictable size (DWORD ARRAY) where each DWORD member is a pointer to the uneven size elements.
High level languages do this all the time but its easy enough to create an array of variable length elements that is addressed by an array of pointers.
I edited this as I figured out how to do this with pain and patience.
.NOLIST
include \masm32\include\masm32rt.inc
.LIST
ThreadFunction PROTO :DWORD
ErrorHandler PROTO :LPTSTR
ExitProcess PROTO :DWORD
Main PROTO
MYDATA STRUCT
val1 DWORD ?
val2 DWORD ?
MYDATA ENDS
public start
.CONST
max_threads EQU 10
buff_size EQU 0FFh
.DATA
threadMsg DB "CreateThread ", 0
rtlMsg DB "Parameters = %d, %d", 10, 0
rtlMsg2 DB "%s failed with error %d: %s", 0
errMsg DB "Error", 0
.DATA?
gData MYDATA <?>
.CODE
start:
INVOKE Main
;inkey
push 0
call ExitProcess
ThreadFunction PROC USES ebx lpParam:DWORD ; ebx is callee-saved, so preserve it
LOCAL hStdOut:DWORD
LOCAL msgBuf[buff_size]:BYTE
LOCAL cchStringSize:DWORD
LOCAL dwChars:DWORD
push STD_OUTPUT_HANDLE
call GetStdHandle
mov hStdOut, eax
cmp hStdOut, INVALID_HANDLE_VALUE
je _ERROR
;printf("I am inside\n")
mov ecx, lpParam
mov ebx, DWORD PTR [ecx]
mov edx, DWORD PTR [ecx+4]
INVOKE crt__snprintf, addr msgBuf, buff_size, addr rtlMsg, ebx, edx
mov cchStringSize, eax
INVOKE WriteConsoleA, hStdOut, addr msgBuf, cchStringSize, addr dwChars, 0
ret
_ERROR:
printf("I didn't make it that far inside the thread function\n")
mov eax, 1
ret
ThreadFunction ENDP
ErrorHandler PROC USES ebx lpszFunction:LPTSTR ; ebx is callee-saved, so preserve it
LOCAL lpMsgBuf:LPVOID
LOCAL lpDisplayBuf:LPVOID
LOCAL dwordage:DWORD
mov dwordage, rv(GetLastError)
INVOKE FormatMessage, FORMAT_MESSAGE_ALLOCATE_BUFFER or \
FORMAT_MESSAGE_FROM_SYSTEM or FORMAT_MESSAGE_IGNORE_INSERTS,
0, dwordage, 0, ADDR lpMsgBuf, 0, 0 ; ALLOCATE_BUFFER needs the address of the pointer
mov eax, rv(lstrlen, lpszFunction)
mov ebx, eax
mov eax, rv(lstrlen, lpMsgBuf)
add eax, ebx
add eax, 028h
push eax
push 0
call LocalAlloc
mov lpDisplayBuf, eax
INVOKE wsprintf, lpDisplayBuf, addr rtlMsg2, lpszFunction, dwordage, lpMsgBuf ; wsprintf takes no buffer-size argument
INVOKE MessageBoxA, 0, lpDisplayBuf, addr errMsg, 0
push lpMsgBuf
call LocalFree
push lpDisplayBuf
call LocalFree
ret
ErrorHandler ENDP
Main PROC
LOCAL localStrucArray[max_threads]:DWORD
LOCAL dwThreadIdArray[max_threads]:DWORD
LOCAL hThreadArray[max_threads]:HANDLE
LOCAL i:DWORD
mov i, 0
.WHILE i < max_threads
INVOKE GetProcessHeap
INVOKE HeapAlloc, eax, HEAP_ZERO_MEMORY, SIZEOF(MYDATA)
mov ecx, DWORD PTR i
mov DWORD PTR localStrucArray[ecx*4], eax
cmp DWORD PTR localStrucArray[ecx*4], 0
jne @F ; allocation succeeded; on failure fall through to the error handler
INVOKE ErrorHandler, addr threadMsg
printf("Here @ErrorHandler\n")
push 2
call ExitProcess
@@:
mov ecx, DWORD PTR i
ASSUME eax:PTR MYDATA
mov eax, DWORD PTR localStrucArray[ecx*4]
mov edx, DWORD PTR i
add edx, 10
mov (MYDATA PTR [eax]).val1, edx
add edx, 90
mov (MYDATA PTR [eax]).val2, edx
ASSUME eax:NOTHING
lea edx, dwThreadIdArray[ecx*4] ; CreateThread wants the ADDRESS of the thread-id slot
mov ecx, DWORD PTR localStrucArray[ecx*4]
INVOKE CreateThread, 0, 0, offset ThreadFunction, ecx, 0, edx
mov ecx, DWORD PTR i
mov DWORD PTR hThreadArray[ecx*4], eax
cmp DWORD PTR hThreadArray[ecx*4], 0
jz _EXIT
inc i
.ENDW
INVOKE WaitForMultipleObjects, max_threads, addr hThreadArray, TRUE, -1
mov i, 0
.WHILE i < max_threads
mov ecx, DWORD PTR i
mov ebx, DWORD PTR hThreadArray[ecx*4]
INVOKE CloseHandle, ebx
mov ecx, DWORD PTR i
cmp localStrucArray[ecx*4], 0
je @F
call GetProcessHeap
mov ecx, DWORD PTR i
mov edx, DWORD PTR localStrucArray[ecx*4]
INVOKE HeapFree, eax, 0, edx
mov ecx, DWORD PTR i
mov DWORD PTR localStrucArray[ecx*4], 0
@@:
inc i
.ENDW
xor eax, eax
ret
_EXIT:
printf("Here @exit\n")
push 3
call ExitProcess
Main ENDP
end start
This was my first attempt at tackling this bit of heap allocation.