A quick question & answer for STRUCT performance access

LordAdef · February 06, 2019, 09:28:51 AM

First STRUCT
    line            dd    10000 dup (?)
    numOfChunks     dd    10000 dup (?)
    x               dd    10000 dup (?)
    w               dd    10000 dup (?)
First ends

Second STRUCT
    line            dd ?
    numOfChunks     dd ?
    x               dd ?
    w               dd ?
Second ends

aFirst  First <>
aSecond Second 10000 dup(<>)

Hello my friends, I am writing this code to parse data into one array.

I could obviously use one of the above methods (the 2 above structs First and Second).

I would retrieve data through a loop where the aFirst.line OR aSecond.line value would iterate.

Since one of you surely already has the answer, I didn't write a test.

My feeling is "aSecond Second 10000 dup(<>)" would be faster since the values (accessed within the loop) are together.
Any thoughts, based on your experience?

Cheers all
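
For readers who want to see the two layouts side by side, here is a minimal sketch in C rather than MASM (so it can be compiled and checked). The type names mirror the two STRUCTs above; the helper function names are assumptions made for illustration. It only reports how far apart the fields sit in memory and measures nothing about speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNT 10000  /* same element count as in the post */

/* "First": one struct holding four parallel arrays. */
typedef struct {
    uint32_t line[COUNT];
    uint32_t numOfChunks[COUNT];
    uint32_t x[COUNT];
    uint32_t w[COUNT];
} First;

/* "Second": one small record, to be instantiated COUNT times. */
typedef struct {
    uint32_t line;
    uint32_t numOfChunks;
    uint32_t x;
    uint32_t w;
} Second;

/* Four dwords with no padding: the record is exactly 16 bytes. */
static_assert(sizeof(Second) == 16, "Second is expected to be 16 bytes");

/* Bytes between the line and w fields of the same logical record. */
size_t first_span_line_to_w(void)  { return offsetof(First, w) - offsetof(First, line); }
size_t second_span_line_to_w(void) { return offsetof(Second, w) - offsetof(Second, line); }

/* Bytes between the x fields of two consecutive records. */
size_t first_stride_x(void)  { return sizeof(uint32_t); }
size_t second_stride_x(void) { return sizeof(Second); }
```

In First, the fields of one record are tens of thousands of bytes apart but consecutive x values are adjacent; in Second it is the other way round.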

jj2007 · February 06, 2019, 01:31:40 PM

Quote from: LordAdef on February 06, 2019, 09:28:51 AM
I would retrieve data through a loop where the aFirst.line OR aSecond.line value would iterate.

Explain "iterate" using some lines of code.

Siekmanski · February 06, 2019, 07:53:15 PM

In the "Second struct" the members are closer together, thus more likely to be in the same cache line, and access will be faster.

daydreamer · February 06, 2019, 09:13:07 PM

I usually add align 16 and use the second struct.
Maybe it should be align 64 if the aim is to fit a cache line.
It's faster, and it has the potential to be used with SSE instructions if the data is floats, or with SSE2 integer instructions if it's integers.
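
A small sketch of the alignment idea, written in C11 instead of MASM so it can be compiled and checked. The 64-byte cache-line size is an assumption (typical, not guaranteed), and the function names are made up for illustration. Because each record is 16 bytes, a 64-byte aligned array also puts every record on a 16-byte boundary, which is what an aligned SSE load of a whole record would need.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define COUNT 10000

typedef struct { uint32_t line, numOfChunks, x, w; } Second;

/* Assumption: 64-byte cache lines. The array starts on a line boundary. */
static alignas(64) Second aSecond[COUNT];

/* Non-zero if the array begins exactly on a 64-byte boundary. */
int starts_on_cache_line(void)
{
    return ((uintptr_t)aSecond % 64) == 0;
}

/* Non-zero if record i begins on a 16-byte boundary. */
int record_is_16_aligned(size_t i)
{
    assert(i < COUNT);
    return ((uintptr_t)&aSecond[i] % 16) == 0;
}
```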

Raistlin · February 06, 2019, 09:28:28 PM

Quote
In the "Second struct" the members are closer together, thus more likely to be in the same cache line, and access will be faster.

Agreed. When targeting structs for data access, you want a high probability that the cache prefetch already contains your data's memory address in the L1 and L2 caches. The second struct has more opportunity for this to be the case. For example, cache-line sizes are typically 64 bytes, the L1 data cache is in the 32 KB range and the L2 cache in the 256 KB per CPU core range. Lastly, to my knowledge, caches on modern platforms (post-2000) are N-way set-associative, which should also favor the second struct option on most occasions.
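
To put numbers on the cache-line argument, here is a rough sketch in C (not MASM, so it can be compiled and checked). It counts how many distinct 64-byte lines are touched when all four fields of one record are read. The 64-byte line size, a line-aligned start address, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define COUNT      10000
#define FIELDS     4
#define FIELD_SIZE 4    /* dd = 4 bytes */
#define LINE_SIZE  64   /* assumed cache-line size */

/* Count distinct cache lines among FIELDS byte offsets. */
static size_t count_distinct_lines(const size_t offsets[FIELDS])
{
    size_t seen[FIELDS];
    size_t distinct = 0;
    for (size_t f = 0; f < FIELDS; f++) {
        size_t line = offsets[f] / LINE_SIZE;
        size_t k = 0;
        while (k < distinct && seen[k] != line) k++;
        if (k == distinct) seen[distinct++] = line;
    }
    return distinct;
}

/* First layout: field f of record i lives in the f-th array of COUNT dwords. */
size_t lines_for_record_first(size_t i)
{
    size_t offsets[FIELDS];
    assert(i < COUNT);
    for (size_t f = 0; f < FIELDS; f++)
        offsets[f] = f * COUNT * FIELD_SIZE + i * FIELD_SIZE;
    return count_distinct_lines(offsets);
}

/* Second layout: the four fields of record i are contiguous. */
size_t lines_for_record_second(size_t i)
{
    size_t offsets[FIELDS];
    assert(i < COUNT);
    for (size_t f = 0; f < FIELDS; f++)
        offsets[f] = i * FIELDS * FIELD_SIZE + f * FIELD_SIZE;
    return count_distinct_lines(offsets);
}
```

Under these assumptions a full record costs four lines in the First layout and one in the Second. Either array is 160,000 bytes in total, which is more than a 32 KB L1 data cache but less than a 256 KB L2.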

LordAdef · February 07, 2019, 08:22:46 AM

Quote from: jj2007 on February 06, 2019, 01:31:40 PM
Quote from: LordAdef on February 06, 2019, 09:28:51 AM
I would retrieve data through a loop where the aFirst.line OR aSecond.line value would iterate.

Explain "iterate" using some lines of code.

Hi Jochen,
It was bad wording on my part. The loop iterates through each member of the struct, as in most cases.
But the colleagues have already confirmed what I was guessing: the second is faster.

LordAdef · February 07, 2019, 08:36:14 AM

Hi Marinus, Raistlin & Daydreamer,
Thanks, I was betting my code on the second version too. It's faster for most cases. But there are cases where you use only one member of the struct and iterate over it across all elements. I guess for those cases the First struct may be the better choice, right?
Let's say you need to get every struct.x.
Quick code, without thinking much:

Code Select

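mov ecx, 10000                    ; element count
getIt:
    mov eax, aFirst.x[ecx*4-4]    ; x holds dwords, so scale the index by 4; ecx counts 10000..1, hence the -4
    ; do something with the x values here
loop getIt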

For these cases, I guess the first struct may be faster.
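
A rough way to check that guess without timing anything: count how many 64-byte cache lines a sweep over every x value touches in each layout. This is a sketch in C (not MASM, so it can be compiled and checked). The line size, a line-aligned start address, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNT      10000
#define FIELD_SIZE 4    /* dd = 4 bytes */
#define LINE_SIZE  64   /* assumed cache-line size */

/* Count cache lines touched when reading `count` dwords that start at
   byte `first_offset` and are `stride` bytes apart (ascending order). */
size_t lines_swept(size_t first_offset, size_t stride, size_t count)
{
    size_t lines = 0;
    size_t last = SIZE_MAX;   /* sentinel: no line seen yet */
    assert(stride > 0);
    for (size_t i = 0; i < count; i++) {
        size_t line = (first_offset + i * stride) / LINE_SIZE;
        if (line != last) {
            lines++;
            last = line;
        }
    }
    return lines;
}

/* First layout: x is the third array, its elements are 4 bytes apart. */
size_t sweep_x_first(void)
{
    return lines_swept(2 * COUNT * FIELD_SIZE, FIELD_SIZE, COUNT);
}

/* Second layout: x is the third field, records are 16 bytes apart. */
size_t sweep_x_second(void)
{
    return lines_swept(2 * FIELD_SIZE, 4 * FIELD_SIZE, COUNT);
}
```

Under these assumptions the sweep touches 625 lines in the First layout and 2500 in the Second, which supports the guess for single-member loops. It says nothing about actual timings, which would still need a test.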

*** Thinking about it, the first case looks so HLL-like that I wondered how structures are implemented in C, under the hood...
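
On the C question: member access on an array of structs comes down to base address + index * record size + member offset, much like the displacement a MASM STRUCT field provides. A minimal sketch, assuming a C type that mirrors the Second struct; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t line, numOfChunks, x, w; } Second;

/* Reads recs[i].x by hand: base + i * sizeof(Second) + offsetof(Second, x). */
uint32_t read_x_by_hand(const Second *recs, size_t i)
{
    const unsigned char *base = (const unsigned char *)recs;
    uint32_t value;

    assert(recs != NULL);
    /* memcpy keeps the byte-level read well defined in C. */
    memcpy(&value, base + i * sizeof(Second) + offsetof(Second, x), sizeof value);
    return value;
}
```

The function returns the same value as the ordinary expression `recs[i].x`.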

HSE · February 07, 2019, 09:45:06 AM

You can use both simultaneously. The penalties are more complex updates and more memory in use.
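
A sketch of what keeping both layouts could look like, in C rather than MASM so it can be compiled and checked. The variable names follow the first post; set_x and memory_in_use are hypothetical helpers. It shows both penalties: every write goes to two places, and the data is stored twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNT 10000

typedef struct {
    uint32_t line[COUNT], numOfChunks[COUNT], x[COUNT], w[COUNT];
} First;

typedef struct { uint32_t line, numOfChunks, x, w; } Second;

static First  aFirst;          /* layout for single-member sweeps */
static Second aSecond[COUNT];  /* layout for whole-record access  */

/* Every update must be applied to both copies to keep them in sync. */
void set_x(size_t i, uint32_t value)
{
    assert(i < COUNT);
    aFirst.x[i]  = value;
    aSecond[i].x = value;
}

/* Total bytes used by the two copies together. */
size_t memory_in_use(void)
{
    return sizeof aFirst + sizeof aSecond;
}
```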

The MASM Forum
