How to access structure array elements

Started by NoCforMe, June 28, 2012, 07:05:57 AM


NoCforMe

I feel like such an idiot.

I should know this. I'm trying to create an array of structures, and access an element of the array using a pointer.

I've created the array with no problem:


$test STRUCT
  s1 DB 20 DUP(?)
  s2 DB 10 DUP(?)
  s3 DB 4 DUP(?)
$test ENDS

TestStructs $test 4 DUP (<>)


Problem is, when I try to access an element of the array, the subscript I use becomes a byte offset within one of the strings, not an offset to the nth element.

In other words, if I do this:


LEA EAX, TestStructs[1].s2


I end up pointing to the 2nd byte within s2 of the first element--not at all what I want.

I thought I knew how to do this. MASM's behavior here seems completely counter-intuitive. If I say TestStructs[1].s2, I'm saying I want the 2nd element (0-based) of the array of structures (what's to the left of the period), and then I want the offset to field s2 within that element. Right?

Obviously, wrong. The following little program shows it clearly:


;============================================
; Array addressing testbed
;============================================


include \masm32\include\masm32rt.inc


;============================================
; Defines, macros, prototypes, etc.
;============================================

$test STRUCT
  s1 DB 20 DUP(?)
  s2 DB 10 DUP(?)
  s3 DB 4 DUP(?)
$test ENDS


;============================================
; HERE BE DATA
;============================================
.data

TestStructs $test 4 DUP (<>)

Addrfmt DB "Address of TestStructs[0].s2: %x", 13, 10
DB "Address of TestStructs[1].s2: %x", 13, 10, 0

buffer DB 200 DUP(?)


;============================================
; CODE LIVES HERE
;============================================
.code


start: INVOKE wsprintf, OFFSET buffer, OFFSET Addrfmt,
OFFSET TestStructs[0].s2, OFFSET TestStructs[1].s2
INVOKE MessageBox, 0, OFFSET buffer, NULL, MB_OK

INVOKE ExitProcess, EAX

END start


So what's the correct syntax for what I'm trying to do? I know this is a piece of cake with structures that don't contain arrays (i.e., strings). It seems that the subscript is being applied to the field rather than the element, since the field is an array of bytes.

(I realize those OFFSETs don't really do anything--I just tried putting them in out of desperation!)



jj2007

include \masm32\include\masm32rt.inc

$test      STRUCT
  s1      DB 20 DUP(?)
  s2      DB 10 DUP(?)
  s3      DB 4 DUP(?)
$test      ENDS

.data?
TestStructs   $test 4 DUP (<>)

.code
start:   lea   edi, TestStructs[3*$test].s2
   mov byte ptr [edi], 123
   print str$(TestStructs[3*$test].s2), 9, "value", 13, 10
   mov eax, edi
   sub eax, offset TestStructs
   print str$(eax), 9, "offset", 13, 10
   exit
end start

HTH, jj

FORTRANS

Hi,

LEA EAX, TestStructs[1].s2


   This says: load EAX with the following address.
Take the address of TestStructs, add the offset of .s2,
then add 1 ([1]).  You probably want to use something
like:

$test STRUCT
  s1 DB 20 DUP(?)
  s2 DB 10 DUP(?)
  s3 DB 4 DUP(?)
$test ENDS

TestStructs $test 4 DUP (<>)
SizeOfTest      EQU     34      ; Your struc has 34 bytes.

LEA EAX, TestStructs[1*SizeOfTest].s2
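
To make the arithmetic above concrete, here is a minimal C sketch of how the bracketed number behaves in MASM: it is a plain byte displacement added to the field offset, not an element index. The struct mirrors $test from the thread; the helper name masm_offset_s2 is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* C mirror of the thread's $test structure (byte fields only, so no padding). */
struct test {
    char s1[20];
    char s2[10];
    char s3[4];
};

/* 20 + 10 + 4 bytes: the same 34 used in the SizeOfTest equate. */
static_assert(sizeof(struct test) == 34, "unexpected padding in struct test");

/* Hypothetical helper: the byte offset from the start of TestStructs that
   MASM produces for TestStructs[k].s2. The bracketed value k is added as a
   raw byte count; nothing scales it by the structure size. */
size_t masm_offset_s2(size_t k)
{
    return k + offsetof(struct test, s2);
}
```

Calling masm_offset_s2(1) gives the "2nd byte of s2 in the first element" result from the original question, while masm_offset_s2(1 * sizeof(struct test)) lands on s2 of the second element.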


   Oops, jj2007 posted something, better forget mine.

Regards,

Steve N.

jj2007

Quote from: FORTRANS on June 28, 2012, 07:22:56 AM
Oops, jj2007 posted something, better forget mine.

That doesn't mean your post isn't correct. On the contrary, you added theory to my practical example :biggrin:

By the way, try this:
SizeOfTest      EQU     34

lea   esi, TestStructs[3*SizeOfTest].s2
lea   edi, TestStructs[3*$test].s2
.if esi==edi
    shout "the same"
...

dedndave

ok - my turn - lol
i would not use LEA, in this case
LEA might be needed if one of the registers already contained part of the address

if you use MOV reg,OFFSET ..... the assembler will calculate the required address for you
the assembler knows the size of the structure and the base address of the array

sometimes, it isn't so obvious until you look at the disassembled code whether LEA is required

TestStructs[1].s2
the brackets ([]) and the period have the same effect here: both just mean "add"
the assembler will add the 3 elements together:
the base address of TestStructs
1
s2 (the offset of s2 in a $test structure)


NoCforMe

Quote from: FORTRANS on June 28, 2012, 07:22:56 AM
Hi,

LEA EAX, TestStructs[1].s2


   This says: load EAX with the following address.
Take the address of TestStructs, add the offset of .s2,
then add 1 ([1]).

That's not what I would have ASS-U-med about this at all. (Even though you are correct.)

My immediate reaction is that MASM's behavior in this case is brain-damaged and illogical. On more sober contemplation, it seems that MASM simply lacks true array processing.

Why do I say "brain-damaged and illogical"? Because, well, C handles array references in a way that seems logical: array[n].field says "Take the offset of the nth element of array and add to that the offset of field". Everything to the left of the dot has to do with selecting the array element; everything to the right adds an offset to that selection.

That's the way array references should work. But MASM has it bass-ackwards. (I confirmed it with a little test prog. Doesn't matter if the fields are DDs or whatever.) How did they come up with that behavior?

In other words, what I thought was a subscript is actually just an offset, much like [EBX + VarName + 1]. The really annoying thing is that I haven't even been able to find documentation of this, at least not in the official Micro$oft MASM manual.

So is there no good shorthand method of referencing array elements using subscripts?

By the way, rather than using an equate using a hard-coded number (which would be incorrect if the size of the structure changed), I would prefer to do things this way:

TestStructs[SIZEOF $test * 1].s2

Still sucks compared to the way it should work, though ...
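
For comparison, a minimal C sketch of the two styles discussed in this post: the C subscript, which the compiler scales by the element size, and the explicit SIZEOF-style scaling that MASM requires. The function names are hypothetical; the struct mirrors the byte-field version of $test.

```c
#include <assert.h>
#include <stddef.h>

struct test {
    char s1[20];
    char s2[10];
    char s3[4];
};

static struct test TestStructs[4];

/* C subscript: the compiler multiplies n by sizeof(struct test) itself.
   Returns the byte offset of TestStructs[n].s2 from the array base. */
size_t c_offset_s2(size_t n)
{
    assert(n < 4);  /* stay inside the 4-element array */
    return (size_t)((char *)&TestStructs[n].s2 - (char *)TestStructs);
}

/* Explicit scaling, analogous to TestStructs[SIZEOF $test * n].s2 in MASM.
   Using sizeof keeps it correct if the structure layout changes. */
size_t explicit_offset_s2(size_t n)
{
    return sizeof(struct test) * n + offsetof(struct test, s2);
}
```

Both functions return the same offset for any valid n, which is the point: the C subscript is shorthand for the multiplication that has to be written out by hand in MASM.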

dedndave

 :biggrin:

it is perfectly logical - just low-level
that is one of the major differences in programming in ASM vs compiled languages
you have to do a little more work in order to get a lot more control
and - you get to see what goes on inside the processor's "head"   :P

NoCforMe

Quote from: dedndave on June 28, 2012, 12:53:10 PM
:biggrin:

it is perfectly logical - just low-level

Sorry, no; it's not logical at all. At least not syntactically.

Look: I make an array reference like TextField[2].field1. How in the world can you say that interpreting "[2]" as being an offset added to the offset of "field1" makes sense? It doesn't; everything on the left of the period should be evaluated as referencing a particular array element, not an offset from the 0th element. Otherwise, why have arrays at all if you can't properly reference their elements? (Well, we can, but we have to jump through a few hoops in order to do it. And it has nothing whatever to do with "low level" vs. high level.)

dedndave

it has everything to do with being low level

at any rate....
you sure like to be contrary, don't you - lol
you're lucky we are in the campus
i remind myself that these posts are really for reference for others

NoCforMe

I came up with another way to access array elements:


$test STRUCT
  s1 DD ?
  s2 DD ?
  s3 DD ?
$test ENDS

ar TEXTEQU <SIZEOF $test *>

LEA EDX, TestStructs[ar 2].s2


Is more intuitively satisfying to me (i.e., the "subscript" number is what one would expect), and is still "low level".

tenkey

In most cases, you don't want to use hard-coded subscripts.

At best, you can access bytes, words, dwords, and qwords from arrays using the following syntax forms:

    mov al,ByteArray[ecx]
    mov ax,WordArray[ecx*2]
    mov eax,DwordArray[ecx*4]
    mov eax,dword ptr QwordArray[ecx*8]  ; lower half
    mov edx,dword ptr QwordArray[ecx*8+4]   ; upper half

MASM exposes the processor, and the processor knows nothing about arrays.

For an arbitrary sized item, you are forced to do the following for a variable index:

    mov     eax,sizeof $test
    imul    index_of_array   ; compute byte offset
    mov     edx,TestStructs[eax].s1
    mov     TestStructs[eax].s2,edx

or

    mov     ecx,index_of_array
    imul    eax,ecx,sizeof $test   ; compute byte offset
    mov     edx,TestStructs[eax].s1
    mov     TestStructs[eax].s2,edx

or

    imul    eax,index_of_array,sizeof $test   ; compute byte offset
    mov     edx,TestStructs[eax].s1
    mov     TestStructs[eax].s2,edx
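
The same variable-index pattern, sketched in C under the assumption that the fields are dwords (as in the DD version of $test earlier in the thread). The multiplication that imul performs is written out explicitly; copy_s1_to_s2 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Dword-field version of $test. */
struct test {
    uint32_t s1;
    uint32_t s2;
    uint32_t s3;
};

static_assert(sizeof(struct test) == 12, "three dwords, no padding expected");

/* Copy s1 into s2 for the element at a run-time index, working only with
   byte offsets the way the assembly does. */
void copy_s1_to_s2(struct test *array, size_t index)
{
    unsigned char *base = (unsigned char *)array;
    size_t byte_off = index * sizeof(struct test);   /* the imul step */
    uint32_t tmp;                                    /* plays the role of edx */

    memcpy(&tmp, base + byte_off + offsetof(struct test, s1), sizeof tmp);
    memcpy(base + byte_off + offsetof(struct test, s2), &tmp, sizeof tmp);
}
```

In ordinary C one would simply write array[index].s2 = array[index].s1; the long form is only there to show what the scaled address computation amounts to.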

MichaelW

Constant indexes are easy. The array index does need to be adjusted by the size of the array elements, but if you are looping through the elements the multiply can be replaced with an addition.

;==============================================================================
include \masm32\include\masm32rt.inc
;==============================================================================
    $test STRUCT
        s0  DWORD 3 DUP(?)
        s1  DWORD 3 DUP(?)
        s2  DWORD 3 DUP(?)
    $test ENDS
;==============================================================================
.data
    TestStructs $test 3 DUP (<{0,1,2},{3,4,5},{6,7,8}>)
.code
;==============================================================================
start:
;==============================================================================
    I = sizeof $test

    printf("%d\t",  TestStructs[I*0].s0[0*4])
    printf("%d\t",  TestStructs[I*0].s0[1*4])
    printf("%d\t",  TestStructs[I*0].s0[2*4])
    printf("%d\t",  TestStructs[I*0].s1[0*4])
    printf("%d\t",  TestStructs[I*0].s1[1*4])
    printf("%d\t",  TestStructs[I*0].s1[2*4])
    printf("%d\t",  TestStructs[I*0].s2[0*4])
    printf("%d\t",  TestStructs[I*0].s2[1*4])
    printf("%d\n\n",TestStructs[I*0].s2[2*4])
    printf("%d\t",  TestStructs[I*1].s0[0*4])
    printf("%d\t",  TestStructs[I*1].s0[1*4])
    printf("%d\t",  TestStructs[I*1].s0[2*4])
    printf("%d\t",  TestStructs[I*1].s1[0*4])
    printf("%d\t",  TestStructs[I*1].s1[1*4])
    printf("%d\t",  TestStructs[I*1].s1[2*4])
    printf("%d\t",  TestStructs[I*1].s2[0*4])
    printf("%d\t",  TestStructs[I*1].s2[1*4])
    printf("%d\n\n",TestStructs[I*1].s2[2*4])
    printf("%d\t",  TestStructs[I*2].s0[0*4])
    printf("%d\t",  TestStructs[I*2].s0[1*4])
    printf("%d\t",  TestStructs[I*2].s0[2*4])
    printf("%d\t",  TestStructs[I*2].s1[0*4])
    printf("%d\t",  TestStructs[I*2].s1[1*4])
    printf("%d\t",  TestStructs[I*2].s1[2*4])
    printf("%d\t",  TestStructs[I*2].s2[0*4])
    printf("%d\t",  TestStructs[I*2].s2[1*4])
    printf("%d\n\n",TestStructs[I*2].s2[2*4])

    xor ebx, ebx
    .WHILE ebx < 3 * I
        xor esi, esi
        .WHILE esi < 3
            printf("%d\t", TestStructs[ebx].s0[esi*4])
            inc esi
        .ENDW
        xor esi, esi
        .WHILE esi < 3
            printf("%d\t", TestStructs[ebx].s1[esi*4])
            inc esi
        .ENDW
        xor esi, esi
        .WHILE esi < 3
            printf("%d\t", TestStructs[ebx].s2[esi*4])
            inc esi
        .ENDW
        printf("\n\n")
        add ebx, I
    .ENDW

    inkey
    exit
;==============================================================================
end start
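
The "replace the multiply with an addition" idea from the loop above can be sketched in C as follows. This is an illustration only, with a hypothetical function name; the struct mirrors the three-dword-array layout used in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the $test layout in this post: three arrays of three dwords. */
struct test {
    int32_t s0[3];
    int32_t s1[3];
    int32_t s2[3];
};

/* Sum s1[k] over every element. The byte pointer p is advanced by
   sizeof(struct test) each pass (like "add ebx, I"), so no per-element
   multiplication is needed. */
int32_t sum_s1(const struct test *array, size_t count, size_t k)
{
    assert(k < 3);  /* s1 has three entries */
    const unsigned char *p = (const unsigned char *)array;
    const unsigned char *end = p + count * sizeof(struct test);
    int32_t total = 0;

    for (; p < end; p += sizeof(struct test)) {
        const struct test *elem = (const struct test *)p;
        total += elem->s1[k];
    }
    return total;
}
```

The single multiplication that remains computes the end address once, outside the loop, which matches the `3 * I` loop bound in the assembly version.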



Well Microsoft, here's another nice mess you've gotten us into.

jj2007

As Dave already wrote, Masm is low level, and [n] means "offset n bytes". But you still have elegant options available:
include \masm32\include\masm32rt.inc

$test STRUCT
  s1 DB 20 DUP(?)
  s2 DB 10 DUP(?)
  s3 DB 4 DUP(?)
$test ENDS

.data?
TestStructs $test 4 DUP (<>)

.code
start:
    lea edi, TestStructs[1*$test].s2    ; indirect, using edi
    mov byte ptr [edi], 111             ; needs to inform Masm which size
    mov TestStructs[2*$test].s2, 222    ; directly, no size info needed
    print str$(TestStructs[1*$test].s2), 9, "value 1.s2", 13, 10
    print str$(TestStructs[2*$test].s2), 9, "value 2.s2", 13, 10
    mov eax, edi
    sub eax, offset TestStructs
    print str$(eax), 9, "offset 1:0", 13, 10
    exit
end start

Output:
111     value 1.s2
222     value 2.s2
54      offset 1:0



If that is still not highlevelish enough, you are a candidate for MasmBasic :greensml:

include \masm32\MasmBasic\MasmBasic.inc

$test      STRUCT
  s1      DB 20 DUP(?)
  s2      DB 10 DUP(?)
  s3      DB 4 DUP(?)
$test      ENDS

   Init
   Dim TestStructs(3) As $test
   mov TestStructs(3, s2), 123
   Print Str$("Value=\t%i\n", TestStructs(3, s2))
   lea ecx, TestStructs(0, s1)   ; first item
   lea eax, TestStructs(3, s2)   ; current item
   sub eax, ecx
   Inkey Str$("Offset=\t%i", eax)
   Exit
end start

Pure MAssembler(TM)  :biggrin:

tenkey

Quote from: MichaelW
Constant indexes are easy. The array index does need to be adjusted by the size of the array elements, but if you are looping through the elements the multiply can be replaced with an addition.

One of the reasons for the continuing existence of the C language is to have access to some of the ASM tricks.

The indexed copy loop

  for (j = 0; j < count; j++) dest[j].f1 = src[j].f2;

can be rewritten as

  pdest = dest;   // array name is ptr constant
  psrc = src;
  n = count;
  while (n--)  (pdest++)->f1 = (psrc++)->f2;

The latter replaces the subscript multiplication with address addition. It is roughly equivalent to

    lea     edi,dest
    lea     esi,src
    mov     ecx,count
    cmp     ecx,0
    je      endlbl
lbl:
    mov     eax,[esi].src_struct.f2   ; if f1 and f2 are dwords
    mov     [edi].dest_struct.f1,eax
    add     esi,sizeof src_struct
    add     edi,sizeof dest_struct
    dec     ecx
    jnz     lbl
endlbl:

hutch--

A technique that does work well and is no big deal to write in assembler is an array of pointers to other array elements. The target array can have elements of uneven size (an array of variable-length strings, for example), but what makes it fast and easy to work with is a second array of predictable element size (a DWORD array) where each DWORD member is a pointer to one of the uneven-size elements.

High level languages do this all the time, but it's easy enough to create an array of variable-length elements that is addressed by an array of pointers.
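
A minimal C sketch of that pointer-table idea, assuming the variable-length elements are NUL-terminated strings packed back to back in one buffer. The function name build_index and the packing scheme are assumptions made for illustration, not something taken from the post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill index[] with a pointer to the start of each NUL-terminated string
   packed into pool. The pointer table has fixed-size entries, so element n
   is reached with an ordinary scaled subscript even though the strings
   themselves have different lengths. Returns the number of entries stored. */
size_t build_index(const char *pool, size_t pool_len,
                   const char *index[], size_t max_entries)
{
    /* The last byte must be a terminator so strlen never runs off the end. */
    assert(pool_len > 0 && pool[pool_len - 1] == '\0');

    const char *p = pool;
    const char *end = pool + pool_len;
    size_t n = 0;

    while (p < end && n < max_entries) {
        index[n++] = p;
        p += strlen(p) + 1;   /* skip the string and its terminator */
    }
    return n;
}
```

Once the table is built, index[n] gives the nth string directly; in assembly terms that is a single dword load from the pointer array at index*4, followed by a dereference.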