A quick question & answer for STRUCT performance access

Started by LordAdef, February 06, 2019, 09:28:51 AM


LordAdef

First STRUCT
    line            dd    10000 dup (?)
    numOfChunks     dd    10000 dup (?)
    x               dd    10000 dup (?)
    w               dd    10000 dup (?)
First ends

Second STRUCT
    line            dd ?
    numOfChunks     dd ?
    x               dd ?
    w               dd ?
Second ends

aFirst  First <>
aSecond Second 10000 dup(<>)


Hello my friends, I am writing this code to parse data into one array.

I could obviously use either of the two layouts above (the structs First and Second).

I would retrieve data through a loop where aFirst.line OR aSecond.line value would iterate.

Since one of you surely already has the answer, I didn't write a test.

My feeling is "aSecond Second 10000 dup(<>)" would be faster since the values (accessed within the loop) are together.
Any thoughts, based on your experience?

Cheers all

jj2007

Quote from: LordAdef on February 06, 2019, 09:28:51 AM
I would retrieve data through a loop where aFirst.line OR aSecond.line value would iterate.

Explain "iterate" using some lines of code.

Siekmanski

In the "Second struct" the members are closer together, thus more likely to be in the same cache line, and access will be faster.
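To put a number on "closer together", here is a minimal sketch in C (chosen because the thread later asks about C; it is not code from any poster). It counts how many cache lines are touched when all four members of element i are read. The 64-byte line size and a 64-byte-aligned base address are assumptions, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N          10000   /* element count from the thread */
#define LINE_BYTES 64      /* ASSUMED cache-line size */

/* "First" layout: four separate arrays inside one struct. */
typedef struct {
    uint32_t line[N];
    uint32_t numOfChunks[N];
    uint32_t x[N];
    uint32_t w[N];
} First;

/* "Second" layout: one 16-byte record per element. */
typedef struct {
    uint32_t line, numOfChunks, x, w;
} Second;

/* Count the distinct cache lines covering four byte offsets.
   Offsets are relative to a base ASSUMED to be 64-byte aligned. */
static int distinct_lines(const size_t off[4]) {
    int count = 0;
    for (int i = 0; i < 4; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (off[j] / LINE_BYTES == off[i] / LINE_BYTES) seen = 1;
        if (!seen) count++;
    }
    return count;
}

/* Lines touched when reading all four members of element i (First). */
int lines_touched_first(size_t i) {
    assert(i < N);
    size_t off[4] = {
        offsetof(First, line)        + 4 * i,
        offsetof(First, numOfChunks) + 4 * i,
        offsetof(First, x)           + 4 * i,
        offsetof(First, w)           + 4 * i,
    };
    return distinct_lines(off);
}

/* Lines touched when reading all four members of element i (Second). */
int lines_touched_second(size_t i) {
    assert(i < N);
    size_t base = sizeof(Second) * i;
    size_t off[4] = {
        base + offsetof(Second, line),
        base + offsetof(Second, numOfChunks),
        base + offsetof(Second, x),
        base + offsetof(Second, w),
    };
    return distinct_lines(off);
}
```

Under these assumptions every element costs four cache lines with First (the members are 40000 bytes apart) and one with Second.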

daydreamer

I usually add align 16 and use the second struct.
Maybe it should be align 64 if you are thinking about fitting into cache lines.
It's faster, and it has the potential to be used with SSE if the data is floats, or with SSE2 integer instructions if it's integers, if you want to.



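A rough way to see why the alignment suggestion matters, sketched in C under the assumption of 64-byte cache lines: count how many 16-byte records cross a line boundary for a given starting offset. The function name is hypothetical, and whether an unaligned start actually occurs depends on what precedes the data in the section.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES   64   /* ASSUMED cache-line size */
#define RECORD_BYTES 16   /* one "Second" record: four dwords */

/* Count how many of `count` consecutive records cross a cache-line
   boundary when the array starts `start` bytes past a line boundary. */
size_t straddling_records(size_t start, size_t count) {
    assert(start < LINE_BYTES);
    size_t crossing = 0;
    for (size_t i = 0; i < count; i++) {
        size_t first = start + i * RECORD_BYTES;   /* first byte of record i */
        size_t last  = first + RECORD_BYTES - 1;   /* last byte of record i  */
        if (first / LINE_BYTES != last / LINE_BYTES) crossing++;
    }
    return crossing;
}
```

With a start offset that is a multiple of 16 no record straddles a line. With a start that is only dword aligned (offset 4, say) one record in four does, which is 2500 of the 10000 here.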

Raistlin

Quote
In the "Second struct" the members are closer together, thus more likely to be in the same cache line, and access will be faster.

Agreed. When targeting structs for data access, you want a high probability that the cache prefetch has already brought your data's address into the L1 and L2 caches. The second struct has more opportunity for this to be the case. For example, cache-line sizes are typically 64 bytes, the L1 data cache is in the 32 KB range and L2 in the 256 KB per CPU core range. Lastly, to my knowledge, caches on modern platforms (post 2000) are N-way set associative, which should also favor the second struct option on most occasions.

LordAdef

Quote from: jj2007 on February 06, 2019, 01:31:40 PM
Quote from: LordAdef on February 06, 2019, 09:28:51 AM
I would retrieve data through a loop where aFirst.line OR aSecond.line value would iterate.

Explain "iterate" using some lines of code.

Hi Jochen, my wording was bad. The loop iterates through each member of the struct, as in most cases. But the colleagues already confirmed what I was guessing: the second is faster.

LordAdef

Hi Marinus, Raistlin & Daydreamer,
Thanks, I was betting my code on the second version too. It's faster for most cases. But there are cases where you use only one member of the struct and iterate over it across all elements. I guess for those cases the First struct may be the better choice, right?
Let's say you need to get all the struct.x values.
Quick code, without thinking much:
    mov ecx, 10000                  ; number of elements
getIt:
    mov eax, aFirst.x[ecx*4-4]      ; dword index: walks from x[9999] down to x[0]
    ; do something with the x value here (preserve ecx)
    loop getIt

For these cases, I guess the first struct may be faster.
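That guess can be checked with simple arithmetic. The C sketch below counts the cache lines touched when only x is read for all 10000 elements: with First the x values are 4 bytes apart, with Second they are 16 bytes apart. The 64-byte line size, the aligned base address and the function name are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define N          10000  /* element count from the thread */
#define LINE_BYTES 64     /* ASSUMED cache-line size */

/* Distinct cache lines touched when reading one dword per element,
   where consecutive elements are `stride` bytes apart and the first
   dword sits `first_offset` bytes past a 64-byte-aligned base. */
size_t lines_for_field_scan(size_t first_offset, size_t stride) {
    assert(stride > 0);
    size_t lines = 0;
    size_t prev = (size_t)-1;            /* no line visited yet */
    for (size_t i = 0; i < N; i++) {
        size_t line = (first_offset + i * stride) / LINE_BYTES;
        if (line != prev) {              /* offsets only grow, so a change means a new line */
            lines++;
            prev = line;
        }
    }
    return lines;
}
```

Calling lines_for_field_scan(80000, 4) models aFirst.x (the x array starts 80000 bytes into First) and touches 625 lines. lines_for_field_scan(8, 16) models aSecond[i].x and touches 2500 lines, so a scan over x alone moves four times as much data through the cache with Second.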

*** Thinking about it, the first case looks so HLL that I wondered how structures are implemented in C, under the hood...
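On the C question: a C struct is laid out much like a MASM STRUCT, with members placed in declaration order (plus padding where the ABI requires it), and an array of structs is just records placed back to back. A minimal sketch, assuming four 32-bit members so that no padding is expected on mainstream compilers; addr_of_x is a hypothetical helper that spells out the address arithmetic a compiler effectively performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C counterpart of the thread's "Second" STRUCT. */
typedef struct {
    uint32_t line;          /* expected offset 0  */
    uint32_t numOfChunks;   /* expected offset 4  */
    uint32_t x;             /* expected offset 8  */
    uint32_t w;             /* expected offset 12 */
} Second;

/* Address of arr[i].x computed by hand:
   base + i * sizeof(Second) + offsetof(Second, x). */
uint32_t *addr_of_x(Second *arr, size_t i) {
    assert(arr != NULL);
    unsigned char *base = (unsigned char *)arr;
    return (uint32_t *)(base + i * sizeof(Second) + offsetof(Second, x));
}
```

This is the same computation as a MASM access of the form [base + index*16 + 8]; the compiler folds the member offset into the addressing mode.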

HSE

You can use both simultaneously. The penalties are more complex updates (both copies have to be kept in sync) and more memory in use.
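One possible reading of "both simultaneously", sketched in C: keep the record-wise array and add a compact copy of the member that gets scanned on its own. The type Both and the functions set_x and sum_x are hypothetical names, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 10000   /* element count from the thread */

typedef struct { uint32_t line, numOfChunks, x, w; } Second;

typedef struct {
    Second   rec[N];     /* record-wise copy: all members of an element together */
    uint32_t x_only[N];  /* column-wise copy of x for tight scans */
} Both;

/* The update penalty: every write to x has to touch both copies. */
void set_x(Both *b, size_t i, uint32_t value) {
    assert(i < N);
    b->rec[i].x  = value;
    b->x_only[i] = value;
}

/* Scans of x alone go through the compact copy. */
uint64_t sum_x(const Both *b) {
    uint64_t total = 0;
    for (size_t i = 0; i < N; i++) total += b->x_only[i];
    return total;
}
```

The memory penalty is visible in the sizes: 160000 bytes for the records plus 40000 bytes for the extra x array.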