Ring buffer vs HeapAlloc

Started by jj2007, February 25, 2022, 04:02:35 AM


jj2007

The attached program compares five cases. In each one, a random number of bytes (0...1250) is requested:
a) from a ring buffer, not zeroed
b) from a ring buffer, zeroed using two methods (stosb and movups)
c) via HeapAlloc, not zeroed (HEAP_GENERATE_EXCEPTIONS)
d) via HeapAlloc, zeroed (HEAP_ZERO_MEMORY)
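The actual source is only attached, not shown, so the following is just a minimal C sketch of what cases a) and b) might look like. It uses a fixed static pool that is handed out front to back and wraps to the start when a request no longer fits. It also includes a byte-by-byte zeroing loop (the idea behind rep stosb) and a 16-bytes-per-step loop (the idea behind movups). The names ring_alloc, zero_bytes and zero_16 and the RING_SIZE value are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ring-buffer allocator (cases a and b): a fixed static pool
   handed out front to back. When a request no longer fits in the space
   left, allocation wraps to the start, silently reusing old blocks. */
#define RING_SIZE (64u * 1024u)      /* arbitrary pool size for the sketch */

static _Alignas(16) unsigned char ring[RING_SIZE];
static size_t ring_pos = 0;          /* offset of the next free byte */

/* Case a: return n bytes from the ring, not zeroed (NULL if n can never fit). */
static void *ring_alloc(size_t n)
{
    if (n > RING_SIZE)
        return NULL;
    n = (n + 15u) & ~(size_t)15u;    /* round up so every block stays 16-byte aligned */
    if (ring_pos + n > RING_SIZE)
        ring_pos = 0;                /* wrap around to the start of the pool */
    void *p = &ring[ring_pos];
    ring_pos += n;
    return p;
}

/* Case b, method 1: zero one byte at a time (the idea behind rep stosb).
   An optimizing compiler may well turn this loop into a memset call. */
static void zero_bytes(unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
}

/* Case b, method 2: zero 16 bytes per step (the idea behind movups);
   the fixed-size memcpy typically compiles to a single 16-byte store.
   Any tail shorter than 16 bytes is cleared byte by byte. */
static void zero_16(unsigned char *p, size_t n)
{
    static const unsigned char zeros[16] = {0};
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        memcpy(p + i, zeros, 16);
    for (; i < n; i++)
        p[i] = 0;
}
```

Because wrapping silently reuses old blocks, a scheme like this is only safe for short-lived allocations. In exchange, each request costs little more than an add and a compare, with none of the bookkeeping a general-purpose heap has to do.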

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
9899 µs  for circular buffer, not zeroed
88 ms    for circular buffer, zeroed with stosb
72 ms    for circular buffer, zeroed with movups
267 ms   for HeapAlloc, generate_exceptions
453 ms   for HeapAlloc, ZERO_MEMORY

10 ms    for circular buffer, not zeroed
80 ms    for circular buffer, zeroed with stosb
61 ms    for circular buffer, zeroed with movups
260 ms   for HeapAlloc, generate_exceptions
455 ms   for HeapAlloc, ZERO_MEMORY

9927 µs  for circular buffer, not zeroed
81 ms    for circular buffer, zeroed with stosb
62 ms    for circular buffer, zeroed with movups
265 ms   for HeapAlloc, generate_exceptions
521 ms   for HeapAlloc, ZERO_MEMORY

9923 µs  for circular buffer, not zeroed
81 ms    for circular buffer, zeroed with stosb
70 ms    for circular buffer, zeroed with movups
337 ms   for HeapAlloc, generate_exceptions
539 ms   for HeapAlloc, ZERO_MEMORY

10 ms    for circular buffer, not zeroed
80 ms    for circular buffer, zeroed with stosb
60 ms    for circular buffer, zeroed with movups
342 ms   for HeapAlloc, generate_exceptions
541 ms   for HeapAlloc, ZERO_MEMORY


Source attached. On my machine, the ring buffer is more than a factor of 7 faster.
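As a quick check of the factor-of-7 claim, average the five i5-2450M runs above. HeapAlloc with ZERO_MEMORY comes to about 502 ms, versus 65 ms for the ring buffer zeroed with movups, a ratio of about 7.7. For the non-zeroed cases, the figures are about 294 ms versus 9.95 ms, a ratio of roughly 30. The claim therefore appears to refer to the zeroed comparison, although that reading is an assumption. The arithmetic in C:

```c
#include <stddef.h>

/* jj2007's five i5-2450M runs, copied from the results above, in ms.
   The µs figures are converted to ms; "10 ms" entries were already
   rounded by the test program. */
static const double not_zeroed[5]   = {9.899, 10.0, 9.927, 9.923, 10.0};
static const double movups[5]       = {72, 61, 62, 70, 60};
static const double heap_gen_exc[5] = {267, 260, 265, 337, 342};
static const double heap_zero[5]    = {453, 455, 521, 539, 541};

/* Arithmetic mean of n values. */
static double mean(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s / (double)n;
}
```

For example, mean(heap_zero, 5) / mean(movups, 5) evaluates to 501.8 / 65, or about 7.72.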

quarantined

Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
9560 µs  for circular buffer, not zeroed
172 ms   for circular buffer, zeroed with stosb
128 ms   for circular buffer, zeroed with movups
276 ms   for HeapAlloc, generate_exceptions
511 ms   for HeapAlloc, ZERO_MEMORY

9716 µs  for circular buffer, not zeroed
170 ms   for circular buffer, zeroed with stosb
128 ms   for circular buffer, zeroed with movups
276 ms   for HeapAlloc, generate_exceptions
511 ms   for HeapAlloc, ZERO_MEMORY

9691 µs  for circular buffer, not zeroed
170 ms   for circular buffer, zeroed with stosb
128 ms   for circular buffer, zeroed with movups
288 ms   for HeapAlloc, generate_exceptions
739 ms   for HeapAlloc, ZERO_MEMORY

9656 µs  for circular buffer, not zeroed
171 ms   for circular buffer, zeroed with stosb
128 ms   for circular buffer, zeroed with movups
416 ms   for HeapAlloc, generate_exceptions
750 ms   for HeapAlloc, ZERO_MEMORY

9733 µs  for circular buffer, not zeroed
171 ms   for circular buffer, zeroed with stosb
128 ms   for circular buffer, zeroed with movups
415 ms   for HeapAlloc, generate_exceptions
753 ms   for HeapAlloc, ZERO_MEMORY

-- hit any key --

For what it's worth...

jj2007


TimoVJL

AMD Ryzen 5 3400G with Radeon Vega Graphics
5768 µs  for circular buffer, not zeroed
47 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
172 ms   for HeapAlloc, generate_exceptions
270 ms   for HeapAlloc, ZERO_MEMORY

5817 µs  for circular buffer, not zeroed
47 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
377 ms   for HeapAlloc, generate_exceptions
612 ms   for HeapAlloc, ZERO_MEMORY

5783 µs  for circular buffer, not zeroed
47 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
469 ms   for HeapAlloc, generate_exceptions
597 ms   for HeapAlloc, ZERO_MEMORY

5785 µs  for circular buffer, not zeroed
47 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
478 ms   for HeapAlloc, generate_exceptions
598 ms   for HeapAlloc, ZERO_MEMORY

5766 µs  for circular buffer, not zeroed
47 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
465 ms   for HeapAlloc, generate_exceptions
597 ms   for HeapAlloc, ZERO_MEMORY

-- hit any key --

jj2007

Quote from: TimoVJL on February 25, 2022, 09:54:56 AM
AMD Ryzen 5 3400G with Radeon Vega Graphics
270 ms   for HeapAlloc, ZERO_MEMORY
612 ms   for HeapAlloc, ZERO_MEMORY
597 ms   for HeapAlloc, ZERO_MEMORY
598 ms   for HeapAlloc, ZERO_MEMORY
597 ms   for HeapAlloc, ZERO_MEMORY

Interesting how the time doubles after the first run :rolleyes:

LiaoMi

Hi,

11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
4916 µs  for circular buffer, not zeroed
19 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
131 ms   for HeapAlloc, generate_exceptions
163 ms   for HeapAlloc, ZERO_MEMORY

4152 µs  for circular buffer, not zeroed
17 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
327 ms   for HeapAlloc, generate_exceptions
507 ms   for HeapAlloc, ZERO_MEMORY

4360 µs  for circular buffer, not zeroed
18 ms    for circular buffer, zeroed with stosb
31 ms    for circular buffer, zeroed with movups
391 ms   for HeapAlloc, generate_exceptions
481 ms   for HeapAlloc, ZERO_MEMORY

4187 µs  for circular buffer, not zeroed
18 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
388 ms   for HeapAlloc, generate_exceptions
479 ms   for HeapAlloc, ZERO_MEMORY

4171 µs  for circular buffer, not zeroed
18 ms    for circular buffer, zeroed with stosb
30 ms    for circular buffer, zeroed with movups
393 ms   for HeapAlloc, generate_exceptions
494 ms   for HeapAlloc, ZERO_MEMORY

jj2007

Wow, that's a fast machine, LiaoMi :thumbsup:

Interesting that your HeapAlloc times triple after the first run. On my Win7-64, there is only a slight increase of 20% :cool:

daydreamer

With a fast SSD, you could also include a buffer like this
buffer  db 1024 dup ("0")
        db 13,10,0
to compare memory speed against disk speed:
fopen "zeros.txt"
fread
fclose
Or maybe compare the compile time when including a big buffer?
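Here is a rough C sketch of the disk comparison suggested above. It assumes the C runtime rather than MASM macros. The 1024 '0' bytes plus CR LF mirror the `buffer` declaration, and the file name zeros.txt comes from the post. The helper names fill_zeros_buf, write_zeros and read_back are hypothetical. timespec_get is standard C11.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* C counterpart of: buffer db 1024 dup ("0") / db 13,10,0 */
static char zeros_buf[1024 + 2 + 1];

static void fill_zeros_buf(void)
{
    memset(zeros_buf, '0', 1024);
    zeros_buf[1024] = '\r';
    zeros_buf[1025] = '\n';
    zeros_buf[1026] = '\0';
}

/* Write the 1026 data bytes (without the trailing NUL) to path.
   Returns 0 on success, -1 on error. */
static int write_zeros(const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(zeros_buf, 1, 1026, f);
    return (fclose(f) == 0 && n == 1026) ? 0 : -1;
}

/* fopen/fread/fclose the file into dst, storing the wall-clock time of
   the whole sequence in *elapsed_ns. Returns bytes read, or -1 on error. */
static long read_back(const char *path, char *dst, size_t cap, long long *elapsed_ns)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t got = fread(dst, 1, cap, f);
    fclose(f);
    timespec_get(&t1, TIME_UTC);
    *elapsed_ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                + (t1.tv_nsec - t0.tv_nsec);
    return (long)got;
}
```

A file that is read straight after being written will very likely be served from the operating system's file cache. Unless caching is bypassed, this mostly measures the cache rather than the SSD.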

Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
10 ms    for circular buffer, not zeroed
51 ms    for circular buffer, zeroed with stosb
50 ms    for circular buffer, zeroed with movups
225 ms   for HeapAlloc, generate_exceptions
308 ms   for HeapAlloc, ZERO_MEMORY

8564 µs  for circular buffer, not zeroed
33 ms    for circular buffer, zeroed with stosb
41 ms    for circular buffer, zeroed with movups
378 ms   for HeapAlloc, generate_exceptions
432 ms   for HeapAlloc, ZERO_MEMORY

7319 µs  for circular buffer, not zeroed
34 ms    for circular buffer, zeroed with stosb
41 ms    for circular buffer, zeroed with movups
291 ms   for HeapAlloc, generate_exceptions
419 ms   for HeapAlloc, ZERO_MEMORY

8376 µs  for circular buffer, not zeroed
33 ms    for circular buffer, zeroed with stosb
43 ms    for circular buffer, zeroed with movups
293 ms   for HeapAlloc, generate_exceptions
438 ms   for HeapAlloc, ZERO_MEMORY

9726 µs  for circular buffer, not zeroed
33 ms    for circular buffer, zeroed with stosb
41 ms    for circular buffer, zeroed with movups
290 ms   for HeapAlloc, generate_exceptions
447 ms   for HeapAlloc, ZERO_MEMORY

-- hit any key --


FORTRANS

Hi,

   Three systems. When first run on the i3-10110U, the "circular
buffer, not zeroed" test performed similarly to the other systems
for two of those runs. After that, it was roughly 300 times slower.
I usually run your timing programs once at the command line to see
the results, and then pipe the results to a file, which means the
fast "circular buffer, not zeroed" results were not captured.

Intel(R) Pentium(R) M processor 1.70GHz
23 ms for circular buffer, not zeroed
719 ms for circular buffer, zeroed with stosb
213 ms for circular buffer, zeroed with movups
321 ms for HeapAlloc, generate_exceptions
1041 ms for HeapAlloc, ZERO_MEMORY

22 ms for circular buffer, not zeroed
724 ms for circular buffer, zeroed with stosb
211 ms for circular buffer, zeroed with movups
319 ms for HeapAlloc, generate_exceptions
1041 ms for HeapAlloc, ZERO_MEMORY

22 ms for circular buffer, not zeroed
723 ms for circular buffer, zeroed with stosb
215 ms for circular buffer, zeroed with movups
320 ms for HeapAlloc, generate_exceptions
1041 ms for HeapAlloc, ZERO_MEMORY

22 ms for circular buffer, not zeroed
722 ms for circular buffer, zeroed with stosb
210 ms for circular buffer, zeroed with movups
321 ms for HeapAlloc, generate_exceptions
1047 ms for HeapAlloc, ZERO_MEMORY

22 ms for circular buffer, not zeroed
731 ms for circular buffer, zeroed with stosb
215 ms for circular buffer, zeroed with movups
322 ms for HeapAlloc, generate_exceptions
1044 ms for HeapAlloc, ZERO_MEMORY

-- hit any key --

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz
16 ms for circular buffer, not zeroed
77 ms for circular buffer, zeroed with stosb
82 ms for circular buffer, zeroed with movups
381 ms for HeapAlloc, generate_exceptions
612 ms for HeapAlloc, ZERO_MEMORY

14 ms for circular buffer, not zeroed
77 ms for circular buffer, zeroed with stosb
80 ms for circular buffer, zeroed with movups
601 ms for HeapAlloc, generate_exceptions
762 ms for HeapAlloc, ZERO_MEMORY

15 ms for circular buffer, not zeroed
77 ms for circular buffer, zeroed with stosb
77 ms for circular buffer, zeroed with movups
389 ms for HeapAlloc, generate_exceptions
720 ms for HeapAlloc, ZERO_MEMORY

15 ms for circular buffer, not zeroed
78 ms for circular buffer, zeroed with stosb
82 ms for circular buffer, zeroed with movups
375 ms for HeapAlloc, generate_exceptions
711 ms for HeapAlloc, ZERO_MEMORY

15 ms for circular buffer, not zeroed
77 ms for circular buffer, zeroed with stosb
78 ms for circular buffer, zeroed with movups
375 ms for HeapAlloc, generate_exceptions
710 ms for HeapAlloc, ZERO_MEMORY

-- hit any key --

Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
8332 µs for circular buffer, not zeroed
26 ms for circular buffer, zeroed with stosb
35 ms for circular buffer, zeroed with movups
176 ms for HeapAlloc, generate_exceptions
223 ms for HeapAlloc, ZERO_MEMORY

6377 µs for circular buffer, not zeroed
25 ms for circular buffer, zeroed with stosb
32 ms for circular buffer, zeroed with movups
268 ms for HeapAlloc, generate_exceptions
340 ms for HeapAlloc, ZERO_MEMORY

6304 µs for circular buffer, not zeroed
25 ms for circular buffer, zeroed with stosb
31 ms for circular buffer, zeroed with movups
211 ms for HeapAlloc, generate_exceptions
321 ms for HeapAlloc, ZERO_MEMORY

6937 µs for circular buffer, not zeroed
25 ms for circular buffer, zeroed with stosb
34 ms for circular buffer, zeroed with movups
212 ms for HeapAlloc, generate_exceptions
315 ms for HeapAlloc, ZERO_MEMORY

6456 µs for circular buffer, not zeroed
25 ms for circular buffer, zeroed with stosb
35 ms for circular buffer, zeroed with movups
208 ms for HeapAlloc, generate_exceptions
319 ms for HeapAlloc, ZERO_MEMORY

-- hit any key --


Weird,

Steve

hutch--


Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
6777 µs  for circular buffer, not zeroed
37 ms    for circular buffer, zeroed with stosb
36 ms    for circular buffer, zeroed with movups
198 ms   for HeapAlloc, generate_exceptions
285 ms   for HeapAlloc, ZERO_MEMORY

6051 µs  for circular buffer, not zeroed
37 ms    for circular buffer, zeroed with stosb
35 ms    for circular buffer, zeroed with movups
290 ms   for HeapAlloc, generate_exceptions
364 ms   for HeapAlloc, ZERO_MEMORY

6631 µs  for circular buffer, not zeroed
37 ms    for circular buffer, zeroed with stosb
34 ms    for circular buffer, zeroed with movups
192 ms   for HeapAlloc, generate_exceptions
360 ms   for HeapAlloc, ZERO_MEMORY

5904 µs  for circular buffer, not zeroed
37 ms    for circular buffer, zeroed with stosb
36 ms    for circular buffer, zeroed with movups
194 ms   for HeapAlloc, generate_exceptions
359 ms   for HeapAlloc, ZERO_MEMORY

6573 µs  for circular buffer, not zeroed
37 ms    for circular buffer, zeroed with stosb
34 ms    for circular buffer, zeroed with movups
197 ms   for HeapAlloc, generate_exceptions
359 ms   for HeapAlloc, ZERO_MEMORY

-- hit any key --

InfiniteLoop

What is a circular buffer?

12th Gen Intel(R) Core(TM) i7-12700K
2966 µs  for circular buffer, not zeroed
17 ms    for circular buffer, zeroed with stosb
21 ms    for circular buffer, zeroed with movups
118 ms   for HeapAlloc, generate_exceptions
165 ms   for HeapAlloc, ZERO_MEMORY

3863 µs  for circular buffer, not zeroed
15 ms    for circular buffer, zeroed with stosb
21 ms    for circular buffer, zeroed with movups
233 ms   for HeapAlloc, generate_exceptions
324 ms   for HeapAlloc, ZERO_MEMORY

2861 µs  for circular buffer, not zeroed
15 ms    for circular buffer, zeroed with stosb
21 ms    for circular buffer, zeroed with movups
231 ms   for HeapAlloc, generate_exceptions
321 ms   for HeapAlloc, ZERO_MEMORY

2886 µs  for circular buffer, not zeroed
16 ms    for circular buffer, zeroed with stosb
21 ms    for circular buffer, zeroed with movups
232 ms   for HeapAlloc, generate_exceptions
318 ms   for HeapAlloc, ZERO_MEMORY

3967 µs  for circular buffer, not zeroed
16 ms    for circular buffer, zeroed with stosb
20 ms    for circular buffer, zeroed with movups
233 ms   for HeapAlloc, generate_exceptions
319 ms   for HeapAlloc, ZERO_MEMORY

jj2007

Quote from: InfiniteLoop on March 06, 2022, 06:38:11 PM
What is a circular buffer?

https://www.techopedia.com/definition/18301/ring-buffer
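For reference, the textbook form of a ring (circular) buffer is a fixed-size array whose read and write positions wrap around to the start. Below is a minimal C sketch of a FIFO queue of ints. RingBuf, rb_push, rb_pop and the capacity of 4 are illustrative choices, not taken from jj2007's source.

```c
#include <stdbool.h>
#include <stddef.h>

#define RB_CAP 4                     /* example capacity */

/* Fixed-capacity FIFO: head is the next slot to read, count is the number
   of stored items; all indices wrap modulo RB_CAP. */
typedef struct {
    int data[RB_CAP];
    size_t head;
    size_t count;
} RingBuf;

static void rb_init(RingBuf *rb)
{
    rb->head = 0;
    rb->count = 0;
}

/* Append v at the tail; returns false (and stores nothing) when full. */
static bool rb_push(RingBuf *rb, int v)
{
    if (rb->count == RB_CAP)
        return false;
    rb->data[(rb->head + rb->count) % RB_CAP] = v;
    rb->count++;
    return true;
}

/* Remove the oldest item into *out; returns false when empty. */
static bool rb_pop(RingBuf *rb, int *out)
{
    if (rb->count == 0)
        return false;
    *out = rb->data[rb->head];
    rb->head = (rb->head + 1) % RB_CAP;
    rb->count--;
    return true;
}
```

This version rejects a push when the buffer is full. An allocator built on the same idea, as presumably used in the benchmark, instead wraps and reuses the oldest space, which is only safe if those earlier blocks are no longer in use.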

quarantined

Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (Win 10, 64-bit)
9939 µs  for circular buffer, not zeroed
175 ms   for circular buffer, zeroed with stosb
130 ms   for circular buffer, zeroed with movups
333 ms   for HeapAlloc, generate_exceptions
583 ms   for HeapAlloc, ZERO_MEMORY

9763 µs  for circular buffer, not zeroed
176 ms   for circular buffer, zeroed with stosb
131 ms   for circular buffer, zeroed with movups
539 ms   for HeapAlloc, generate_exceptions
899 ms   for HeapAlloc, ZERO_MEMORY

11 ms    for circular buffer, not zeroed
193 ms   for circular buffer, zeroed with stosb
169 ms   for circular buffer, zeroed with movups
366 ms   for HeapAlloc, generate_exceptions
725 ms   for HeapAlloc, ZERO_MEMORY

9601 µs  for circular buffer, not zeroed
173 ms   for circular buffer, zeroed with stosb
129 ms   for circular buffer, zeroed with movups
346 ms   for HeapAlloc, generate_exceptions
749 ms   for HeapAlloc, ZERO_MEMORY

9965 µs  for circular buffer, not zeroed
173 ms   for circular buffer, zeroed with stosb
129 ms   for circular buffer, zeroed with movups
342 ms   for HeapAlloc, generate_exceptions
717 ms   for HeapAlloc, ZERO_MEMORY

-- hit any key --


fearless

AMD Ryzen 9 5950X 16-Core Processor
4382 µs  for circular buffer, not zeroed
29 ms    for circular buffer, zeroed with stosb
23 ms    for circular buffer, zeroed with movups
137 ms   for HeapAlloc, generate_exceptions
199 ms   for HeapAlloc, ZERO_MEMORY

4398 µs  for circular buffer, not zeroed
29 ms    for circular buffer, zeroed with stosb
23 ms    for circular buffer, zeroed with movups
201 ms   for HeapAlloc, generate_exceptions
191 ms   for HeapAlloc, ZERO_MEMORY

4414 µs  for circular buffer, not zeroed
29 ms    for circular buffer, zeroed with stosb
23 ms    for circular buffer, zeroed with movups
105 ms   for HeapAlloc, generate_exceptions
177 ms   for HeapAlloc, ZERO_MEMORY

4576 µs  for circular buffer, not zeroed
29 ms    for circular buffer, zeroed with stosb
23 ms    for circular buffer, zeroed with movups
104 ms   for HeapAlloc, generate_exceptions
174 ms   for HeapAlloc, ZERO_MEMORY

4460 µs  for circular buffer, not zeroed
29 ms    for circular buffer, zeroed with stosb
23 ms    for circular buffer, zeroed with movups
107 ms   for HeapAlloc, generate_exceptions
178 ms   for HeapAlloc, ZERO_MEMORY

-- hit any key --

mikeburr

Interesting... sort of, JJ. But I presume you've hard-coded the buffer into the data area?
If that's the case, then its size is fixed, so if, for instance, you're generating 8 to 10 million records in memory, as I do in some
32-bit programs, it's not going to be an option.
The ring buffer is obviously going to be faster, since with HeapAlloc the OS does all the work.
Incidentally, creating 10 million records (a linked list with index/page indexes, paged at 4096 records per page because list boxes are very limited, 64K)
takes only about 50 seconds, including generating the records with a fast loop/division-type algorithm of about 120 lines.
It also has the advantage that on the next request you can simply destroy the heaps and start again.
Regards, mikeb
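On Windows, the "destroy the heaps and start again" pattern corresponds to HeapCreate / HeapDestroy on a private heap. As a portable illustration of the same idea, here is a hypothetical growable arena in C. Unlike a fixed buffer in the data section, it grows by adding chunks. Individual blocks are never freed, and arena_destroy releases everything at once. Arena, arena_alloc, arena_destroy and CHUNK_SIZE are made-up names for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE (64u * 1024u)           /* arbitrary default chunk size */
#define ARENA_ALIGN _Alignof(max_align_t)  /* keep every block suitably aligned */

/* One malloc'd chunk; chunks form a singly linked list, newest first. */
typedef struct Chunk {
    struct Chunk *next;
    size_t used;
    size_t cap;
    unsigned char *mem;
} Chunk;

typedef struct {
    Chunk *head;
} Arena;

/* Bump-allocate n bytes; starts a new chunk when the current one is too
   small (the unused tail of the old chunk is simply abandoned). */
static void *arena_alloc(Arena *a, size_t n)
{
    if (n > SIZE_MAX - ARENA_ALIGN)
        return NULL;
    n = (n + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    Chunk *c = a->head;
    if (c == NULL || c->cap - c->used < n) {
        size_t cap = n > CHUNK_SIZE ? n : CHUNK_SIZE;
        c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;
        c->mem = malloc(cap);
        if (c->mem == NULL) {
            free(c);
            return NULL;
        }
        c->used = 0;
        c->cap = cap;
        c->next = a->head;
        a->head = c;
    }
    void *p = c->mem + c->used;
    c->used += n;
    return p;
}

/* Release every chunk at once, leaving the arena empty and reusable. */
static void arena_destroy(Arena *a)
{
    Chunk *c = a->head;
    while (c != NULL) {
        Chunk *next = c->next;
        free(c->mem);
        free(c);
        c = next;
    }
    a->head = NULL;
}
```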