How to access structure array elements

NoCforMe · June 28, 2012, 07:05:57 AM

I feel like such an idiot.

I should know this. I'm trying to create an array of structures, and access an element of the array using a pointer.

I've created the array with no problem:

$test		STRUCT
  s1		DB 20 DUP(?)
  s2		DB 10 DUP(?)
  s3		DB 4 DUP(?)
$test		ENDS

TestStructs	$test 4 DUP (<>)

Problem is, when I try to access an element of the array, the subscript I use becomes a byte offset within one of the strings, not an offset to the nth element.

In other words, if I do this:

	LEA	EAX, TestStructs[1].s2

I end up pointing to the 2nd byte within s2 of the first element--not at all what I want.

I thought I knew how to do this. MASM's behavior here seems completely counter-intuitive. If I say TestStructs[1].s2, I'm saying I want the 2nd element (0-based) of the array of structures (what's to the left of the period), and then I want the offset to field s2 within that element. Right?

Obviously, wrong. The following little program shows it clearly:

;============================================
; Array addressing testbed
;============================================


	include \masm32\include\masm32rt.inc


;============================================
; Defines, macros, prototypes, etc.
;============================================

$test		STRUCT
  s1		DB 20 DUP(?)
  s2		DB 10 DUP(?)
  s3		DB 4 DUP(?)
$test		ENDS


;============================================
; HERE BE DATA
;============================================
.data

TestStructs	$test 4 DUP (<>)

Addrfmt	DB "Address of TestStructs[0].s2: %x", 13, 10
	DB "Address of TestStructs[1].s2: %x", 13, 10, 0

buffer		DB 200 DUP(?)


;============================================
; CODE LIVES HERE
;============================================
.code


start:	INVOKE	wsprintf, OFFSET buffer, OFFSET Addrfmt,
		OFFSET TestStructs[0].s2, OFFSET TestStructs[1].s2
	INVOKE	MessageBox, 0, OFFSET buffer, NULL, MB_OK

	INVOKE	ExitProcess, EAX

	END	start

So what's the correct syntax for what I'm trying to do? I know this is a piece of cake with structures that don't contain arrays (i.e., strings). It seems that the subscript is being applied to the field rather than the element, since the field is an array of bytes.

(I realize those OFFSETs don't really do anything--I just tried putting them in out of desperation!)

jj2007 · June 28, 2012, 07:20:35 AM

include \masm32\include\masm32rt.inc

$test      STRUCT
s1      DB 20 DUP(?)
s2      DB 10 DUP(?)
s3      DB 4 DUP(?)
$test      ENDS

.data?
TestStructs   $test 4 DUP (<>)

.code
start:   lea   edi, TestStructs[3*$test].s2
   mov byte ptr [edi], 123
   print str$(TestStructs[3*$test].s2), 9, "value", 13, 10
   mov eax, edi
   sub eax, offset TestStructs
   print str$(eax), 9, "offset", 13, 10
   exit
end start

HTH, jj

FORTRANS · June 28, 2012, 07:22:56 AM

Hi,

	LEA	EAX, TestStructs[1].s2

This says: load EAX with the following address.
Take the address of TestStructs, add the offset of .s2,
then add 1 ([1]). You probably want to use something
like:

$test		STRUCT
  s1		DB 20 DUP(?)
  s2		DB 10 DUP(?)
  s3		DB 4 DUP(?)
$test		ENDS

TestStructs	$test 4 DUP (<>)
SizeOfTest      EQU     34      ; Your struc has 34 bytes.

	LEA	EAX, TestStructs[1*SizeOfTest].s2

Oops, jj2007 posted something, better forget mine.

Regards,

Steve N.
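
For readers who think in C, the arithmetic FORTRANS describes can be written out as plain byte offsets. This is only a sketch: the struct mirrors $test from the thread, and the three function names are invented here for illustration, not anything MASM provides.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as $test in the thread: byte arrays only, so no padding. */
struct test {
    unsigned char s1[20];
    unsigned char s2[10];
    unsigned char s3[4];
};

/* What MASM computes for TestStructs[n].s2: n is added as a raw byte count. */
static size_t masm_bracket_offset(size_t n)
{
    return offsetof(struct test, s2) + n;
}

/* What was wanted: n scaled by the element size first,
   as in TestStructs[n*SizeOfTest].s2. */
static size_t scaled_element_offset(size_t n)
{
    return n * sizeof(struct test) + offsetof(struct test, s2);
}

/* What a C subscript does, measured in bytes from the start of the array. */
static size_t c_subscript_offset(size_t n)
{
    static struct test arr[4];
    assert(n < 4);
    return (size_t)((unsigned char *)arr[n].s2 - (unsigned char *)arr);
}
```

With n = 1 the first function gives 21 (the second byte of s2 in element 0), while the other two give 54, which is the start of s2 in element 1.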

jj2007 · June 28, 2012, 07:43:49 AM

Quote from: FORTRANS on June 28, 2012, 07:22:56 AM
Oops, jj2007 posted something, better forget mine.

That doesn't mean your post isn't correct. On the contrary, you added theory to my practical example.

By the way, try this:
SizeOfTest EQU 34

lea esi, TestStructs[3*SizeOfTest].s2
lea edi, TestStructs[3*$test].s2
.if esi==edi
shout "the same"
...

dedndave · June 28, 2012, 09:34:28 AM

ok - my turn - lol
i would not use LEA, in this case
LEA might be needed if one of the registers already contained part of the address

if you use MOV reg,OFFSET ..... the assembler will calculate the required address for you
the assembler knows the size of the structure and the base address of the array

sometimes, it isn't so obvious until you look at the disassembled code whether LEA is required

TestStructs[1].s2
use of the brackets ([]) and the period have the same effect, here
the assembler will add the 3 elements together:

the base address of TestStructs
1
s2 (the offset of s2 in a $test structure)

NoCforMe · June 28, 2012, 12:40:35 PM

Quote from: FORTRANS on June 28, 2012, 07:22:56 AM
Hi,

LEA EAX, TestStructs[1].s2

This says :load EAX with the following address.
Take the address of TestStructs, add the offset of .s2,
then add 1 ([1]).

That's not what I would have ASS-U-med about this at all. (Even though you are correct.)

My immediate reaction is that MASM's behavior in this case is brain-damaged and illogical. On more sober contemplation, it seems that MASM simply lacks true array processing.

Why do I say "brain-damaged and illogical"? Because, well, C handles array references in a way that seems logical: array[n].field says "Take the offset of the nth element of array and add to that the offset of field". Everything to the left of the dot has to do with selecting the array element; everything to the right adds an offset to that selection.

That's the way array references should work. But MASM has it bass-ackwards. (I confirmed it with a little test prog. Doesn't matter if the fields are DDs or whatever.) How did they come up with that behavior?

In other words, what I thought was a subscript is actually just an offset, much like [EBX + VarName + 1]. The really annoying thing is that I haven't even been able to find documentation of this, at least not in the official Micro$oft MASM manual.

So is there no good shorthand method of referencing array elements using subscripts?

By the way, rather than using an equate using a hard-coded number (which would be incorrect if the size of the structure changed), I would prefer to do things this way:

TestStructs[SIZEOF $test * 1].s2

Still sucks compared to the way it should work, though ...

dedndave · June 28, 2012, 12:53:10 PM

it is perfectly logical - just low-level
that is one of the major differences in programming in ASM vs compiled languages
you have to do a little more work in order to get a lot more control
and - you get to see what goes on inside the processor's "head" :P

NoCforMe · June 28, 2012, 01:28:25 PM

Quote from: dedndave on June 28, 2012, 12:53:10 PM

it is perfectly logical - just low-level

Sorry, no; it's not logical at all. At least not syntactically.

Look: I make an array reference like TextField[2].field1. How in the world can you say that interpreting "[2]" as being an offset added to the offset of "field1" makes sense? It doesn't; everything on the left of the period should be evaluated as referencing a particular array element, not an offset from the 0th element. Otherwise, why have arrays at all if you can't properly reference their elements? (Well, we can, but we have to jump through a few hoops in order to do it. And it has nothing whatever to do with "low level" vs. high level.)

dedndave · June 28, 2012, 01:50:15 PM

it has everything to do with being low level

at any rate....
you sure like to be contrary, don't you - lol
you're lucky we are in the campus
i remind myself that these posts are really for reference for others

NoCforMe · June 28, 2012, 04:45:28 PM

I came up with another way to access array elements:

$test		STRUCT
  s1		DD ?
  s2		DD ?
  s3		DD ?
$test		ENDS

ar		TEXTEQU <SIZEOF $test *>

	LEA	EDX, TestStructs[ar 2].s2

Is more intuitively satisfying to me (i.e., the "subscript" number is what one would expect), and is still "low level".

tenkey · June 28, 2012, 05:03:29 PM

In most cases, you don't want to use hard-coded subscripts.

At best, you can access bytes, words, dwords, and qwords from arrays using the following syntax forms:

mov al,ByteArray[ecx]
mov ax,WordArray[ecx*2]
mov eax,DwordArray[ecx*4]
mov eax,dword ptr QwordArray[ecx*8] ; lower half
mov edx,dword ptr QwordArray[ecx*8+4] ; upper half

MASM exposes the processor, and the processor knows nothing about arrays.
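
As a rough C picture of the forms above: the multiplier inside the brackets is just the element size, and the processor can apply it on its own only when that size is 1, 2, 4 or 8. The helper names and the sample array below are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sample data standing in for DwordArray. */
static const uint32_t dword_array[3] = {10, 20, 30};

/* Byte offset of element i when each element is elem_size bytes.
   Only these four sizes can be used as a scale in an x86 address. */
static size_t scaled_offset(size_t i, size_t elem_size)
{
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);
    return i * elem_size;
}

/* Read dword i the way "mov eax,DwordArray[ecx*4]" does: base + i*4 bytes. */
static uint32_t load_dword(const uint32_t *base, size_t i)
{
    const unsigned char *p =
        (const unsigned char *)base + scaled_offset(i, sizeof *base);
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```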

For an arbitrary sized item, you are forced to do the following for a variable index:

mov eax,sizeof $test
imul index_of_array ; compute byte offset
mov edx,TestStructs[eax].s1
mov TestStructs[eax].s2,edx

or

mov ecx,index_of_array
imul eax,ecx,sizeof $test ; compute byte offset
mov edx,TestStructs[eax].s1
mov TestStructs[eax].s2,edx

or

imul eax,index_of_array,sizeof $test ; compute byte offset
mov edx,TestStructs[eax].s1
mov TestStructs[eax].s2,edx
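
In C terms, all three variants compute the same thing: a byte offset equal to index times the structure size, which is then added to the array base before the field offset. A minimal sketch follows, assuming the three-dword version of $test; the array size and the function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The DD version of $test: 12 bytes, so no built-in scale factor fits. */
struct test {
    uint32_t s1;
    uint32_t s2;
    uint32_t s3;
};

#define COUNT 4                          /* assumed array length */
static struct test test_structs[COUNT];

/* The imul step: turn an element index into a byte offset. */
static size_t byte_offset(size_t index)
{
    assert(index < COUNT);
    return index * sizeof(struct test);
}

/* mov edx,TestStructs[eax].s1  /  mov TestStructs[eax].s2,edx */
static void copy_s1_to_s2(size_t index)
{
    unsigned char *elem = (unsigned char *)test_structs + byte_offset(index);
    memcpy(elem + offsetof(struct test, s2),
           elem + offsetof(struct test, s1),
           sizeof(uint32_t));
}
```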

MichaelW · June 28, 2012, 05:32:33 PM

Constant indexes are easy. The array index does need to be adjusted by the size of the array elements, but if you are looping through the elements the multiply can be replaced with an addition.

;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
    $test STRUCT
        s0  DWORD 3 DUP(?)
        s1  DWORD 3 DUP(?)
        s2  DWORD 3 DUP(?)
    $test ENDS
;==============================================================================
.data
    TestStructs $test 3 DUP (<{0,1,2},{3,4,5},{6,7,8}>)
.code
;==============================================================================
start:
;==============================================================================
    I = sizeof $test

    printf("%d\t",  TestStructs[I*0].s0[0*4])
    printf("%d\t",  TestStructs[I*0].s0[1*4])
    printf("%d\t",  TestStructs[I*0].s0[2*4])
    printf("%d\t",  TestStructs[I*0].s1[0*4])
    printf("%d\t",  TestStructs[I*0].s1[1*4])
    printf("%d\t",  TestStructs[I*0].s1[2*4])
    printf("%d\t",  TestStructs[I*0].s2[0*4])
    printf("%d\t",  TestStructs[I*0].s2[1*4])
    printf("%d\n\n",TestStructs[I*0].s2[2*4])
    printf("%d\t",  TestStructs[I*1].s0[0*4])
    printf("%d\t",  TestStructs[I*1].s0[1*4])
    printf("%d\t",  TestStructs[I*1].s0[2*4])
    printf("%d\t",  TestStructs[I*1].s1[0*4])
    printf("%d\t",  TestStructs[I*1].s1[1*4])
    printf("%d\t",  TestStructs[I*1].s1[2*4])
    printf("%d\t",  TestStructs[I*1].s2[0*4])
    printf("%d\t",  TestStructs[I*1].s2[1*4])
    printf("%d\n\n",TestStructs[I*1].s2[2*4])
    printf("%d\t",  TestStructs[I*2].s0[0*4])
    printf("%d\t",  TestStructs[I*2].s0[1*4])
    printf("%d\t",  TestStructs[I*2].s0[2*4])
    printf("%d\t",  TestStructs[I*2].s1[0*4])
    printf("%d\t",  TestStructs[I*2].s1[1*4])
    printf("%d\t",  TestStructs[I*2].s1[2*4])
    printf("%d\t",  TestStructs[I*2].s2[0*4])
    printf("%d\t",  TestStructs[I*2].s2[1*4])
    printf("%d\n\n",TestStructs[I*2].s2[2*4])

    xor ebx, ebx
    .WHILE ebx < 3 * I
        xor esi, esi
        .WHILE esi < 3
            printf("%d\t", TestStructs[ebx].s0[esi*4])
            inc esi
        .ENDW
        xor esi, esi
        .WHILE esi < 3
            printf("%d\t", TestStructs[ebx].s1[esi*4])
            inc esi
        .ENDW
        xor esi, esi
        .WHILE esi < 3
            printf("%d\t", TestStructs[ebx].s2[esi*4])
            inc esi
        .ENDW
        printf("\n\n")
        add ebx, I
    .ENDW

    inkey
    exit
;==============================================================================
end start
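
The looping half of that program (keeping a running byte offset in ebx and adding the structure size each pass) might look like this in C. It is a sketch under the same data layout as the listing above; sum_s1 is a made-up helper that adds up one slot of s1 across all elements instead of printing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as $test in the listing: three arrays of three dwords. */
struct test {
    uint32_t s0[3];
    uint32_t s1[3];
    uint32_t s2[3];
};

#define COUNT 3
static const struct test test_structs[COUNT] = {
    {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}},
    {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}},
    {{0, 1, 2}, {3, 4, 5}, {6, 7, 8}},
};

/* Sum s1[k] over every element, stepping a byte offset by sizeof
   rather than multiplying an index each time (the "add ebx, I" idea). */
static uint32_t sum_s1(size_t k)
{
    assert(k < 3);
    uint32_t total = 0;
    for (size_t off = 0; off < COUNT * sizeof(struct test);
         off += sizeof(struct test)) {
        const struct test *elem =
            (const struct test *)((const unsigned char *)test_structs + off);
        total += elem->s1[k];
    }
    return total;
}
```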

jj2007 · June 28, 2012, 06:17:31 PM

As Dave already wrote, Masm is low level, and [n] means "offset n bytes". But you still have elegant options available:

include \masm32\include\masm32rt.inc

$test		STRUCT
  s1		DB 20 DUP(?)
  s2		DB 10 DUP(?)
  s3		DB 4 DUP(?)
$test		ENDS

.data?
TestStructs	$test 4 DUP (<>)

.code
start:	lea	edi, TestStructs[1*$test].s2	; indirect, using edi
	mov byte ptr [edi], 111	; needs to inform Masm which size
	mov TestStructs[2*$test].s2, 222	; directly, no size info needed
	print str$(TestStructs[1*$test].s2), 9, "value 1.s2", 13, 10
	print str$(TestStructs[2*$test].s2), 9, "value 2.s2", 13, 10
	mov eax, edi
	sub eax, offset TestStructs
	print str$(eax), 9, "offset 1:0", 13, 10
	exit
end start

Output:
111 value 1.s2
222 value 2.s2
54 offset 1:0

If that is still not highlevelish enough, you are a candidate for MasmBasic

include \masm32\MasmBasic\MasmBasic.inc

$test      STRUCT
s1      DB 20 DUP(?)
s2      DB 10 DUP(?)
s3      DB 4 DUP(?)
$test      ENDS

   Init
   Dim TestStructs(3) As $test
   mov TestStructs(3, s2), 123
   Print Str$("Value=\t%i\n", TestStructs(3, s2))
   lea ecx, TestStructs(0, s1)   ; first item
   lea eax, TestStructs(3, s2)   ; current item
   sub eax, ecx
   Inkey Str$("Offset=\t%i", eax)
   Exit
end start

Pure MAssembler^TM

tenkey · June 28, 2012, 07:01:48 PM

Quote from: MichaelW
Constant indexes are easy. The array index does need to be adjusted by the size of the array elements, but if you are looping through the elements the multiply can be replaced with an addition.

One of the reasons for the continuing existence of the C language is to have access to some of the ASM tricks.

The indexed copy loop

for (j = 0; j < count; j++) dest[j].f1 = src[j].f2;

can be rewritten as

pdest = dest; // array name is ptr constant
psrc = src;
n = count;
while (n--) (pdest++)->f1 = (psrc++)->f2;

The latter replaces the subscript multiplication with address addition. It is roughly equivalent to

lea edi,dest
lea esi,src
mov ecx,count
cmp ecx,0
je endlbl
lbl:
mov eax,[esi].src_struct.f2 ; if f1 and f2 are dwords
mov [edi].dest_struct.f1,eax
add esi,sizeof src_struct
add edi,sizeof dest_struct
dec ecx
jnz lbl
endlbl:
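
To try the two C loops side by side, here is a self-contained sketch. The struct layouts are assumptions (the post only names the fields f1 and f2), and both_agree is a helper invented for the check.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layouts: different sizes on purpose, so each pointer
   advances by its own structure size. */
struct src_struct  { int pad; int f2; };
struct dest_struct { int f1; int extra[2]; };

/* Subscript form: conceptually one multiply per access. */
static void copy_indexed(struct dest_struct *dest,
                         const struct src_struct *src, size_t count)
{
    assert(count == 0 || (dest && src));
    for (size_t j = 0; j < count; j++)
        dest[j].f1 = src[j].f2;
}

/* Pointer form: each pointer is bumped by sizeof its own struct. */
static void copy_stepped(struct dest_struct *dest,
                         const struct src_struct *src, size_t count)
{
    assert(count == 0 || (dest && src));
    struct dest_struct *pdest = dest;
    const struct src_struct *psrc = src;
    size_t n = count;
    while (n--)
        (pdest++)->f1 = (psrc++)->f2;
}

/* Returns 1 if both loops copy the same values from the sample data. */
static int both_agree(void)
{
    static const struct src_struct src[3] = {{0, 11}, {0, 22}, {0, 33}};
    struct dest_struct a[3] = {{0}}, b[3] = {{0}};
    copy_indexed(a, src, 3);
    copy_stepped(b, src, 3);
    for (size_t j = 0; j < 3; j++)
        if (a[j].f1 != b[j].f1 || a[j].f1 != src[j].f2)
            return 0;
    return 1;
}
```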

hutch-- · June 28, 2012, 09:36:04 PM

A technique that works well, and is no big deal to write in assembler, is an array of pointers to other array elements. The target array can have elements of uneven size (an array of variable-length strings, for example), but what makes it fast and easy to work with is an array of predictable size (a DWORD array) where each DWORD member is a pointer to one of the uneven-size elements.

High-level languages do this all the time, but it's easy enough to create an array of variable-length elements that is addressed by an array of pointers.
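
A small C sketch of that idea, assuming a pool of zero-terminated strings packed back to back. The pool contents, the table name and the helper functions are all hypothetical. The point is that the pointer table has fixed-size entries, so item i is found with an ordinary scaled index even though the items themselves differ in length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Variable-length items stored back to back (hypothetical sample data). */
static const char pool[] = "mov\0imul\0lea";

#define ITEMS 3
static const char *index_table[ITEMS];   /* fixed-size entries, one per item */

/* Walk the pool once, recording where each item starts. */
static void build_index(void)
{
    const char *p = pool;
    for (size_t i = 0; i < ITEMS; i++) {
        index_table[i] = p;
        p += strlen(p) + 1;               /* skip the item and its terminator */
    }
}

/* Look up item i through the pointer table. */
static const char *item(size_t i)
{
    assert(i < ITEMS);
    return index_table[i];
}
```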

The MASM Forum
