The MASM Forum

General => The Laboratory => Topic started by: jj2007 on January 31, 2021, 12:56:21 PM

Title: CreateThread overhead
Post by: jj2007 on January 31, 2021, 12:56:21 PM
How much time does creating a thread cost? Here are some timings, using NanoTimer() (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1171):

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
256 µs  for #1/1
521 µs  for #2/3
709 µs  for #3/4
813 µs  for #4/5
970 µs  for #5/2
1148 µs for #6/9
1266 µs for #7/11
1407 µs for #8/13
1507 µs for #9/17
1711 µs for #10/7
1862 µs for #11/33
1954 µs for #12/12
2044 µs for #13/27
2071 µs for #14/16
2179 µs for #15/19
2299 µs for #16/10
2370 µs for #17/20
2454 µs for #18/14
2626 µs for #19/6
2710 µs for #20/23
2858 µs for #21/18
2942 µs for #22/21
3048 µs for #23/25
3086 µs for #24/26
3193 µs for #25/29
3264 µs for #26/30
3346 µs for #27/8
3498 µs for #28/37
3590 µs for #29/38
3634 µs for #30/15
3732 µs for #31/28
3772 µs for #32/31
3848 µs for #33/32
3950 µs for #34/35
3985 µs for #35/36
4071 µs for #36/39
4105 µs for #37/40
4366 µs for #38/24
4460 µs for #39/22
4663 µs for #40/34


The first number indicates the order of finished threads (1-40), the second number the order in which they were started.
So 2626 µs for #19/6 means that the thread started in loop iteration #6 finished rather late, as #19.

On average, creating and starting a thread costs slightly less than 120 µs on my machine.
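
One way to arrive at that average, assuming it is simply the timestamp of the last thread to finish divided by the number of threads (the post does not spell out the calculation). The Python sketch below parses a few of the lines above; the names `parse` and `average_cost_us` are made up for illustration.

```python
import re

# First lines and the last line of the i5-2450M run above, copied verbatim.
SAMPLE = """\
256 µs  for #1/1
521 µs  for #2/3
709 µs  for #3/4
813 µs  for #4/5
970 µs  for #5/2
4663 µs for #40/34
"""

LINE = re.compile(r"(-?\d+) µs\s+for #(\d+)/(\d+)")

def parse(text):
    """Return (elapsed_us, finish_order, start_order) tuples."""
    return [tuple(int(g) for g in m.groups()) for m in LINE.finditer(text)]

def average_cost_us(rows):
    """Elapsed time of the last finisher divided by its finish rank."""
    elapsed, finished, _ = max(rows, key=lambda r: r[1])
    return elapsed / finished

rows = parse(SAMPLE)
print(f"{average_cost_us(rows):.1f} µs per thread")  # 116.6 µs per thread
```

The second tuple, (521, 2, 3), reads as "second to finish, third to be launched, 521 µs after the timer started".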

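include \masm32\MasmBasic\MasmBasic.inc ; download
.data?
ctThread dd ?           ; number of threads that have reported so far
ctMain dd ?             ; number of threads launched so far
.code
  Init
  Cls
  PrintCpu 0
  xor ecx, ecx
  .Repeat               ; busy loop: 2^31 iterations, until ecx turns negative
    inc ecx
  .Until Sign?
  NanoTimer()           ; start the timer
  push 39               ; loop counter on the stack: 40 iterations
  .Repeat
    push eax            ; reserve a stack slot that receives the thread ID
    inc ctMain
    invoke CreateThread, 0, 0, TheThread, ctMain, 0, esp
    pop ecx             ; discard the thread ID
    dec stack           ; decrement the loop counter
  .Until Sign?
  pop edx               ; remove the loop counter
  Delay 100             ; wait while the threads print their results
  Exit

TheThread proc arg      ; arg = launch order (ctMain at creation time)
  mov ecx, NanoTimer(µs)        ; microseconds since the timer was started
  inc ctThread          ; finishing order (not atomic, so ranks can get mixed up)
  Print Str$("%i µs",ecx), Str$("\tfor #%i", ctThread), Str$("/%i\n", arg)
  ret
TheThread endp

EndOfCode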
Title: Re: CreateThread overhead
Post by: quarantined on January 31, 2021, 06:25:20 PM

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G  1.60 GHz
-2147483648 µs for #2/1
171 µs for #1/2
290 µs for #3/3
351 µs for #4/5
405 µs for #5/4
476 µs for #6/7
546 µs for #7/9
594 µs for #8/6
642 µs for #9/8
678 µs for #10/10
714 µs for #11/11
771 µs for #12/12
844 µs for #13/13
909 µs for #14/15
955 µs for #15/14
963 µs for #16/17
1031 µs for #17/16
1081 µs for #18/18
1126 µs for #19/19
1145 µs for #20/20
1204 µs for #21/21
1270 µs for #22/23
1377 µs for #23/24
1411 µs for #24/31
1454 µs for #25/22
1534 µs for #26/25
1588 µs for #27/27
1672 µs for #28/29
1695 µs for #29/40
1727 µs for #30/33
1775 µs for #31/28
1813 µs for #33/32
1840 µs for #34/34
1872 µs for #35/35
1792 µs for #32/30
1929 µs for #36/37
2011 µs for #37/36
2070 µs for #38/26
2087 µs for #39/39
2156 µs for #40/38


-2147483648 µs   for #2/1 ??
Title: Re: CreateThread overhead
Post by: jj2007 on January 31, 2021, 08:37:46 PM
Quote from: quarantined on January 31, 2021, 06:25:20 PM
-2147483648 µs   for #2/1 ??

Lovely :tongue:

It means that the first call to QueryPerformanceFrequency returned zero. Can you reproduce it?
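
The number itself is a hint: -2147483648 is the bit pattern 0x80000000 read as a signed 32-bit integer. x87 integer stores (FIST/FISTP) write exactly this "integer indefinite" value when a result cannot be represented, for instance after a division by zero. Whether NanoTimer() actually takes that path is an assumption, not something this thread confirms. A quick Python check of the bit pattern:

```python
import struct

REPORTED = -2147483648  # value printed for thread #2/1 in the post above

# Reinterpret the bit pattern 0x80000000 as a signed 32-bit integer.
as_signed = struct.unpack("<i", struct.pack("<I", 0x80000000))[0]

print(as_signed == REPORTED)   # True
print(REPORTED == -(2 ** 31))  # True: the smallest signed 32-bit value
```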
Title: Re: CreateThread overhead
Post by: jj2007 on February 01, 2021, 01:11:39 AM
Here is a variant: CreateThread once, then the thread gets suspended and resumed. Typical timings:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
1851 µs for creating the thread
thread resumed after 280 µs    for #1
thread resumed after 2 µs       for #2
thread resumed after 10 µs      for #3
thread resumed after 4 µs       for #4
thread resumed after 7 µs       for #5
thread resumed after 20 µs      for #6
thread resumed after 2 µs       for #7
thread resumed after 6 µs       for #8
thread resumed after 80 µs      for #9
thread resumed after 14 µs      for #10
thread resumed after 6 µs       for #11
thread resumed after 4 µs       for #12
thread resumed after 8 µs       for #13
thread resumed after 2 µs       for #14
thread resumed after 10 µs      for #15
thread resumed after 6 µs       for #16
thread resumed after 17 µs      for #17
thread resumed after 14 µs      for #18
thread resumed after 4 µs       for #19
thread resumed after 4 µs       for #20
thread resumed after 9 µs       for #21
thread resumed after 11 µs      for #22
thread resumed after 18 µs      for #23
thread resumed after 17 µs      for #24
thread resumed after 46 µs      for #25
thread resumed after 21 µs      for #26
thread resumed after 22 µs      for #27
thread resumed after 44 µs      for #28
thread resumed after 32 µs      for #29
thread resumed after 20 µs      for #30
thread resumed after 56 µs      for #31
Title: Re: CreateThread overhead
Post by: daydreamer on February 01, 2021, 05:44:32 AM
The first exe won't work correctly: it just runs quickly and ends, without any way to see the timings or any text.
Second exe:
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
137 µs for creating the thread
thread resumed after 994 µs     for #1
thread resumed after 26 µs      for #2
thread resumed after 43 µs      for #3
thread resumed after 25 µs      for #4
thread resumed after 21 µs      for #5
thread resumed after 76 µs      for #6
thread resumed after 23 µs      for #7
thread resumed after 22 µs      for #8
thread resumed after 76 µs      for #9
thread resumed after 52 µs      for #10
thread resumed after 52 µs      for #11
thread resumed after 61 µs      for #12
thread resumed after 89 µs      for #13
thread resumed after 53 µs      for #14
thread resumed after 63 µs      for #15
thread resumed after 53 µs      for #16
thread resumed after 56 µs      for #17
thread resumed after 65 µs      for #18
thread resumed after 55 µs      for #19
thread resumed after 29 µs      for #20
thread resumed after 47 µs      for #21
thread resumed after 29 µs      for #22
thread resumed after 31 µs      for #23
thread resumed after 33 µs      for #24
thread resumed after 17 µs      for #25
thread resumed after 72 µs      for #26
thread resumed after 28 µs      for #27
thread resumed after 49 µs      for #28
thread resumed after 88 µs      for #29
thread resumed after 56 µs      for #30
thread resumed after 93 µs      for #31
--- hit any key ---

A new fashion: measuring performance in µs.
So has measuring clock cycles become vintage?  :tongue:
Title: Re: CreateThread overhead
Post by: jj2007 on February 01, 2021, 07:48:48 AM
Quote from: daydreamer on February 01, 2021, 05:44:32 AM
The first exe won't work correctly: it just runs quickly and ends, without any way to see the timings or any text.

You can run it from a DOS prompt.

Quote
So has measuring clock cycles become vintage?  :tongue:

Yessss. Cycles don't mean anything between threads.
Title: Re: CreateThread overhead
Post by: quarantined on February 02, 2021, 12:47:28 PM
Quote from: jj2007 on January 31, 2021, 08:37:46 PM
It means that the first call to QueryPerformanceFrequency returned zero. Can you reproduce it?

AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G 
-2147483648 µs   for #2/1
171 µs   for #1/2
290 µs   for #3/3
351 µs   for #4/5
405 µs   for #5/4
476 µs   for #6/7
546 µs   for #7/9
594 µs   for #8/6
642 µs   for #9/8
678 µs   for #10/10
714 µs   for #11/11
771 µs   for #12/12
844 µs   for #13/13
909 µs   for #14/15
955 µs   for #15/14
963 µs   for #16/17
1031 µs   for #17/16
1081 µs   for #18/18
1126 µs   for #19/19
1145 µs   for #20/20
1204 µs   for #21/21
1270 µs   for #22/23
1377 µs   for #23/24
1411 µs   for #24/31
1454 µs   for #25/22
1534 µs   for #26/25
1588 µs   for #27/27
1672 µs   for #28/29
1695 µs   for #29/40
1727 µs   for #30/33
1775 µs   for #31/28
1813 µs   for #33/32
1840 µs   for #34/34
1872 µs   for #35/35
1792 µs   for #32/30
1929 µs   for #36/37
2011 µs   for #37/36
2070 µs   for #38/26
2087 µs   for #39/39
2156 µs   for #40/38

:cool:
Title: Re: CreateThread overhead
Post by: quarantined on February 02, 2021, 12:51:03 PM
for single thread.....:
AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G 
66 µs for creating the thread
thread resumed after 101 µs for #1
thread resumed after 21 µs for #2
thread resumed after 12 µs for #3
thread resumed after 8 µs for #4
thread resumed after 15 µs for #5
thread resumed after 14 µs for #6
thread resumed after 17 µs for #7
thread resumed after 14 µs for #8
thread resumed after 14 µs for #9
thread resumed after 14 µs for #10
thread resumed after 15 µs for #11
thread resumed after 21 µs for #12
thread resumed after 19 µs for #13
thread resumed after 20 µs for #14
thread resumed after 22 µs for #15
thread resumed after 31 µs for #16
thread resumed after 20 µs for #17
thread resumed after 14 µs for #18
thread resumed after 16 µs for #19
thread resumed after 20 µs for #20
thread resumed after 19 µs for #21
thread resumed after 15 µs for #22
thread resumed after 28 µs for #23
thread resumed after 16 µs for #24
thread resumed after 19 µs for #25
thread resumed after 20 µs for #26
thread resumed after 17 µs for #27
thread resumed after 16 µs for #28
thread resumed after 21 µs for #29
thread resumed after 17 µs for #30
thread resumed after 15 µs for #31


2nd run...


AMD A6-9220e RADEON R4, 5 COMPUTE CORES 2C+3G 
87 µs for creating the thread
thread resumed after 189 µs for #1
thread resumed after 30 µs for #2
thread resumed after 194 µs for #3
thread resumed after 24 µs for #4
thread resumed after 17 µs for #5
thread resumed after 22 µs for #6
thread resumed after 26 µs for #7
thread resumed after 24 µs for #8
thread resumed after 17 µs for #9
thread resumed after 33 µs for #10
thread resumed after 22 µs for #11
thread resumed after 19 µs for #12
thread resumed after 17 µs for #13
thread resumed after 19 µs for #14
thread resumed after 20 µs for #15
thread resumed after 33 µs for #16
thread resumed after 19 µs for #17
thread resumed after 34 µs for #18
thread resumed after 17 µs for #19
thread resumed after 17 µs for #20
thread resumed after 15 µs for #21
thread resumed after 17 µs for #22
thread resumed after 20 µs for #23
thread resumed after 16 µs for #24
thread resumed after 21 µs for #25
thread resumed after 17 µs for #26
thread resumed after 15 µs for #27
thread resumed after 17 µs for #28
thread resumed after 17 µs for #29
thread resumed after 17 µs for #30
thread resumed after 20 µs for #31
--- hit any key ---

Title: Re: CreateThread overhead
Post by: quarantined on February 02, 2021, 12:53:41 PM
Quote from: daydreamer on February 01, 2021, 05:44:32 AM
The first exe won't work correctly: it just runs quickly and ends, without any way to see the timings or any text.


Or use a batch file with >output.txt to capture the output in a file.
:tongue:
Title: Re: CreateThread overhead
Post by: jj2007 on February 02, 2021, 01:56:43 PM
Thanks :thumbsup:
Title: Re: CreateThread overhead
Post by: TimoVJL on February 02, 2021, 06:15:11 PM
Windows 10
AMD Ryzen 5 3400G with Radeon Vega Graphics
70 µs for creating the thread
thread resumed after 188 µs     for #1
thread resumed after 13 µs      for #2
thread resumed after 9 µs       for #3
thread resumed after 25 µs      for #4
thread resumed after 9 µs       for #5
thread resumed after 10 µs      for #6
thread resumed after 15 µs      for #7
thread resumed after 9 µs       for #8
thread resumed after 14 µs      for #9
thread resumed after 10 µs      for #10
thread resumed after 8 µs       for #11
thread resumed after 10 µs      for #12
thread resumed after 11 µs      for #13
thread resumed after 8 µs       for #14
thread resumed after 10 µs      for #15
thread resumed after 8 µs       for #16
thread resumed after 9 µs       for #17
thread resumed after 8 µs       for #18
thread resumed after 13 µs      for #19
thread resumed after 8 µs       for #20
thread resumed after 36 µs      for #21
thread resumed after 9 µs       for #22
thread resumed after 31 µs      for #23
thread resumed after 64 µs      for #24
thread resumed after 9 µs       for #25
thread resumed after 174 µs     for #26
thread resumed after 9 µs       for #27
thread resumed after 12 µs      for #28
thread resumed after 13 µs      for #29
thread resumed after 270 µs     for #30
thread resumed after 9 µs       for #31
--- hit any key ---
Title: Re: CreateThread overhead
Post by: jj2007 on February 02, 2021, 08:17:10 PM
 :thumbsup:

My initial CreateThread is always above 1000 µs, on Win7-64. I wonder if Win10 has become better, or if it's CPU-related :rolleyes:
Title: Re: CreateThread overhead
Post by: TimoVJL on February 02, 2021, 08:34:24 PM
Windows 7 x64
AMD Athlon(tm) II X2 220 Processor
111 µs for creating the thread
thread resumed after 335 µs     for #1
thread resumed after 16 µs      for #2
thread resumed after 92 µs      for #3
thread resumed after 56 µs      for #4
thread resumed after 16 µs      for #5
thread resumed after 25 µs      for #6
thread resumed after 337 µs     for #7
thread resumed after 16 µs      for #8
thread resumed after 20 µs      for #9
thread resumed after 23 µs      for #10
thread resumed after 7 µs       for #11
thread resumed after 17 µs      for #12
thread resumed after 53 µs      for #13
thread resumed after 35 µs      for #14
thread resumed after 61 µs      for #15
thread resumed after 114 µs     for #16
thread resumed after 128 µs     for #17
thread resumed after 13 µs      for #18
thread resumed after 20 µs      for #19
thread resumed after 8 µs       for #20
thread resumed after 153 µs     for #21
thread resumed after 84 µs      for #22
thread resumed after 220 µs     for #23
thread resumed after 15 µs      for #24
thread resumed after 27 µs      for #25
thread resumed after 12 µs      for #26
thread resumed after 227 µs     for #27
thread resumed after 129 µs     for #28
thread resumed after 9 µs       for #29
thread resumed after 10 µs      for #30
thread resumed after 125 µs     for #31
--- hit any key ---
Title: Re: CreateThread overhead
Post by: hutch-- on February 02, 2021, 11:51:17 PM
The old i7 is clocked at 4 GHz.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
152 µs  for #1/1
182 µs  for #2/2
420 µs  for #3/3
444 µs  for #4/4
464 µs  for #5/5
489 µs  for #6/6
512 µs  for #7/7
551 µs  for #8/8
571 µs  for #9/9
591 µs  for #10/10
620 µs  for #11/11
649 µs  for #12/12
666 µs  for #13/13
684 µs  for #14/14
705 µs  for #15/15
736 µs  for #16/28
786 µs  for #17/17
838 µs  for #18/32
868 µs  for #19/19
889 µs  for #20/20
923 µs  for #21/21
951 µs  for #22/22
997 µs  for #23/23
1040 µs for #24/37
1060 µs for #25/25
1092 µs for #26/26
1135 µs for #27/27
1176 µs for #28/16
1214 µs for #29/29
1234 µs for #30/30
1281 µs for #31/18
1320 µs for #32/33
1359 µs for #33/34
1392 µs for #34/35
1431 µs for #35/36
1467 µs for #36/24
1489 µs for #37/38
1528 µs for #38/39
1569 µs for #39/40
1619 µs for #40/31
Title: Re: CreateThread overhead
Post by: hutch-- on February 02, 2021, 11:58:57 PM
12 Core Xeon.

Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
180 µs  for #1/1
214 µs  for #2/2
478 µs  for #3/5
525 µs  for #4/4
557 µs  for #5/3
599 µs  for #6/6
630 µs  for #7/20
666 µs  for #8/8
696 µs  for #9/21
719 µs  for #10/23
749 µs  for #11/11
774 µs  for #12/25
796 µs  for #13/26
827 µs  for #14/14
856 µs  for #15/15
897 µs  for #16/16
913 µs  for #17/17
959 µs  for #18/18
1005 µs for #19/33
1053 µs for #20/7
1079 µs for #21/36
1105 µs for #22/9
1128 µs for #23/38
1187 µs for #24/24
1234 µs for #25/12
1288 µs for #26/13
1342 µs for #27/27
1421 µs for #28/28
1454 µs for #29/29
1523 µs for #30/30
1594 µs for #31/32
1645 µs for #32/31
1671 µs for #33/19
1748 µs for #34/34
1797 µs for #35/37
1842 µs for #36/22
1891 µs for #37/39
1916 µs for #38/10
1989 µs for #39/40
2124 µs for #40/35

Title: Re: CreateThread overhead
Post by: jj2007 on February 03, 2021, 01:59:15 AM
Quote from: hutch-- on February 02, 2021, 11:51:17 PM
The old i7 is clocked at 4 GHz.

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
152 µs  for #1/1
182 µs  for #2/2
420 µs  for #3/3
444 µs  for #4/4
464 µs  for #5/5
489 µs  for #6/6
512 µs  for #7/7
551 µs  for #8/8
571 µs  for #9/9
591 µs  for #10/10
620 µs  for #11/11
649 µs  for #12/12
666 µs  for #13/13
684 µs  for #14/14
705 µs  for #15/15
736 µs  for #16/28  <---- first "outlier"

So far the most "stable" result (2nd for quarantined's AMD), in the sense of launching order == finishing order :cool:
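
"Stable" in this sense can be put into a number by counting how many threads finish in launch order before the first outlier. A small Python sketch over the lines quoted above; the helper name `in_order_prefix` is made up for illustration.

```python
import re

# Lines quoted above from the i7-5820K run.
QUOTED = """\
152 µs  for #1/1
182 µs  for #2/2
420 µs  for #3/3
444 µs  for #4/4
464 µs  for #5/5
489 µs  for #6/6
512 µs  for #7/7
551 µs  for #8/8
571 µs  for #9/9
591 µs  for #10/10
620 µs  for #11/11
649 µs  for #12/12
666 µs  for #13/13
684 µs  for #14/14
705 µs  for #15/15
736 µs  for #16/28
"""

PAIR = re.compile(r"#(\d+)/(\d+)")

def in_order_prefix(text):
    """Count leading lines whose finish rank equals the launch rank."""
    count = 0
    for finished, started in PAIR.findall(text):
        if finished != started:
            break
        count += 1
    return count

print(in_order_prefix(QUOTED))  # 15
```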
Title: Re: CreateThread overhead
Post by: hutch-- on February 03, 2021, 02:31:17 AM
Something you will find is that not all cores in a processor run at the same speed. Both with the old 6 core i7 AND the 12 core Xeon, the core speeds and loadings show some cores faster than others. I used to get the same variation with an earlier 4 core 4770k, so it's probably a variable factor with multi-core processors.
Title: Re: CreateThread overhead
Post by: quarantined on February 04, 2021, 03:14:20 AM
Quote from: jj2007 on February 02, 2021, 08:17:10 PM
:thumbsup:

My initial CreateThread is always above 1000 µs, on Win7-64. I wonder if Win10 has become better, or if it's CPU-related :rolleyes:

For your info, all the tests I have run here are on Windows 7, 32 bit. Hope this info is useful
Title: Re: CreateThread overhead
Post by: jj2007 on February 04, 2021, 07:32:10 AM
Quote from: quarantined on February 04, 2021, 03:14:20 AM
Quote from: jj2007 on February 02, 2021, 08:17:10 PM
:thumbsup:

My initial CreateThread is always above 1000 µs, on Win7-64. I wonder if Win10 has become better, or if it's CPU-related :rolleyes:

For your info, all the tests I have run here are on Windows 7, 32 bit. Hope this info is useful

It is, it is, thanks :thup:

In theory, it could be related to Wow64 (https://en.wikipedia.org/wiki/WoW64#Performance), the thin layer that lets 32-bit executables access the 64-bit kernel stuff. However, some other members have posted very short CreateThread times, so I doubt that is the reason :cool:
Title: Re: CreateThread overhead
Post by: HSE on February 04, 2021, 08:56:59 AM
AMD A6-3500 APU with Radeon(tm) HD Graphics
195 µs  for #1/1
233 µs  for #2/2
516 µs  for #3/3
551 µs  for #4/4
785 µs  for #5/7
869 µs  for #6/6
902 µs  for #7/11
1063 µs for #8/12
1286 µs for #9/8
1395 µs for #10/9
1507 µs for #11/10
1622 µs for #12/5
1682 µs for #13/13
1713 µs for #14/23
1838 µs for #15/15
1925 µs for #16/16
1973 µs for #17/27
2027 µs for #18/28
2090 µs for #19/19
2109 µs for #20/29
2195 µs for #21/21
2247 µs for #22/22
2300 µs for #23/30
2316 µs for #24/32
2400 µs for #25/14
2461 µs for #26/24
2513 µs for #27/25
2581 µs for #28/26
2628 µs for #29/37
2685 µs for #30/17
2777 µs for #31/18
2869 µs for #32/20
2934 µs for #33/31
3063 µs for #34/33
3142 µs for #35/34
3160 µs for #36/35
3278 µs for #37/36
3369 µs for #38/38
3426 µs for #39/39
3633 µs for #40/40

This machine has 3 cores and allows 6 subprocesses.
Title: Re: CreateThread overhead
Post by: TimoVJL on February 04, 2021, 07:05:37 PM
Windows 10 x64
https://www.amd.com/en/products/apu/amd-ryzen-5-3400g
AMD Ryzen 5 3400G with Radeon Vega Graphics   
327 µs for #1/1
386 µs for #2/2
454 µs for #3/3
492 µs for #4/4
591 µs for #5/6
672 µs for #6/7
698 µs for #7/8
743 µs for #8/5
791 µs for #9/9
895 µs for #10/11
995 µs for #11/12
1046 µs for #12/13
1093 µs for #13/10
1121 µs for #14/16
1168 µs for #15/15
1242 µs for #16/14
1295 µs for #17/17
1349 µs for #18/18
1402 µs for #19/19
1505 µs for #20/20
1661 µs for #21/21
1708 µs for #22/22
1757 µs for #23/23
1804 µs for #24/24
1841 µs for #25/25
1912 µs for #26/26
2006 µs for #27/27
2045 µs for #28/28
2105 µs for #29/29
2157 µs for #30/30
2219 µs for #31/31
2254 µs for #32/32
2317 µs for #33/33
2370 µs for #34/34
2467 µs for #35/35
2550 µs for #36/36
2592 µs for #37/37
2637 µs for #38/38
2701 µs for #39/39
2797 µs for #40/40
Title: Re: CreateThread overhead
Post by: mikeburr on February 05, 2021, 08:44:49 AM
Here's mine ... a bit surprised how quick it is
Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz
125 µs   for #1/1
159 µs   for #2/2
193 µs   for #3/3
217 µs   for #4/4
255 µs   for #5/5
431 µs   for #6/6
467 µs   for #7/7
487 µs   for #8/8
504 µs   for #9/11
526 µs   for #10/10
550 µs   for #11/9
577 µs   for #12/12
623 µs   for #13/13
717 µs   for #14/14
785 µs   for #15/15
794 µs   for #16/16
826 µs   for #17/17
847 µs   for #18/18
890 µs   for #19/19
938 µs   for #20/20
974 µs   for #21/21
1008 µs   for #22/22
1034 µs   for #23/23
1078 µs   for #24/24
1115 µs   for #25/25
1155 µs   for #26/26
1214 µs   for #27/27
1259 µs   for #28/28
1298 µs   for #29/29
1340 µs   for #30/30
1360 µs   for #31/31
1388 µs   for #32/32
1438 µs   for #33/33
1468 µs   for #34/34
1506 µs   for #35/35
1552 µs   for #36/36
1589 µs   for #37/37
1608 µs   for #38/38
1654 µs   for #39/39
1676 µs   for #40/40
Title: Re: CreateThread overhead
Post by: LiaoMi on February 06, 2021, 10:58:39 AM
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
47 µs for creating the thread
thread resumed after 385 µs     for #1
thread resumed after 9 µs       for #2
thread resumed after 7 µs       for #3
thread resumed after 7 µs       for #4
thread resumed after 6 µs       for #5
thread resumed after 7 µs       for #6
thread resumed after 6 µs       for #7
thread resumed after 10 µs      for #8
thread resumed after 4 µs       for #9
thread resumed after 3 µs       for #10
thread resumed after 9 µs       for #11
thread resumed after 4 µs       for #12
thread resumed after 6 µs       for #13
thread resumed after 5 µs       for #14
thread resumed after 9 µs       for #15
thread resumed after 8 µs       for #16
thread resumed after 5 µs       for #17
thread resumed after 8 µs       for #18
thread resumed after 8 µs       for #19
thread resumed after 9 µs       for #20
thread resumed after 9 µs       for #21
thread resumed after 11 µs      for #22
thread resumed after 10 µs      for #23
thread resumed after 8 µs       for #24
thread resumed after 10 µs      for #25
thread resumed after 12 µs      for #26
thread resumed after 14 µs      for #27
thread resumed after 12 µs      for #28
thread resumed after 9 µs       for #29
thread resumed after 17 µs      for #30
thread resumed after 9 µs       for #31
--- hit any key ---


Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
163 µs  for #1/1
218 µs  for #2/2
439 µs  for #3/3
506 µs  for #4/5
545 µs  for #5/18
584 µs  for #6/19
632 µs  for #7/20
669 µs  for #8/8
708 µs  for #9/9
766 µs  for #10/10
796 µs  for #11/26
861 µs  for #12/12
888 µs  for #13/13
968 µs  for #14/14
1033 µs for #15/33
1137 µs for #16/4
1169 µs for #17/16
1216 µs for #18/17
1275 µs for #19/6
1378 µs for #20/7
1450 µs for #21/21
1518 µs for #22/22
1590 µs for #23/23
1673 µs for #24/24
1736 µs for #25/25
1791 µs for #26/27
1894 µs for #27/28
1957 µs for #28/29
1988 µs for #29/30
2066 µs for #30/31
2147 µs for #31/15
2190 µs for #32/34
2266 µs for #33/35
2296 µs for #34/36
2324 µs for #35/37
2353 µs for #36/38
2407 µs for #37/39
2450 µs for #38/40
2598 µs for #39/11
2730 µs for #40/32

Title: Re: CreateThread overhead
Post by: jj2007 on February 06, 2021, 12:05:44 PM
Quote from: LiaoMi on February 06, 2021, 10:58:39 AM
Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz
47 µs for creating the thread
thread resumed after 385 µs     for #1
thread resumed after 9 µs       for #2
thread resumed after 7 µs       for #3
thread resumed after 7 µs       for #4
thread resumed after 6 µs       for #5
thread resumed after 7 µs       for #6
thread resumed after 6 µs       for #7

Wow, that's a fast machine :thumbsup:
Title: Re: CreateThread overhead
Post by: Gunther on February 06, 2021, 05:30:37 PM
Jochen,

sorry, I had simply overlooked your request for the test. Here is my result:
Quote
Intel(R) Core(TM) i7-7820X CPU @ 3.60GHz
353 µs   for #1/1
452 µs   for #2/10
554 µs   for #3/3
681 µs   for #4/21
782 µs   for #5/7
871 µs   for #6/27
969 µs   for #7/8
1044 µs   for #8/35
1131 µs   for #9/39
1222 µs   for #10/11
1459 µs   for #11/12
1577 µs   for #12/14
1694 µs   for #13/15
1787 µs   for #14/16
1870 µs   for #15/17
1972 µs   for #16/18
2060 µs   for #17/4
2150 µs   for #18/20
2239 µs   for #19/22
2334 µs   for #20/24
2449 µs   for #21/26
2531 µs   for #22/28
2568 µs   for #23/29
2753 µs   for #24/32
2837 µs   for #25/34
2968 µs   for #26/36
3000 µs   for #27/37
3104 µs   for #28/38
3184 µs   for #29/2
3304 µs   for #30/13
3506 µs   for #31/19
3558 µs   for #32/5
3668 µs   for #33/23
3754 µs   for #34/25
3842 µs   for #35/6
3949 µs   for #36/30
4040 µs   for #37/33
4124 µs   for #38/9
4542 µs   for #40/31
4310 µs   for #39/40

Gunther
Title: Re: CreateThread overhead
Post by: jj2007 on February 06, 2021, 08:00:57 PM
Quote from: Gunther on February 06, 2021, 05:30:37 PM
Jochen,

sorry, I had simply overlooked your request for the test. Here is my result:
...
Gunther

No problem, I already got many more answers than I expected. The purpose was simply to check if splitting a short task into many threads makes any sense... and it depends, obviously :cool:
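
A rough way to see the "it depends" part, as a Python sketch under a deliberately simple assumed model: threads are created one after another at a fixed cost each, the work splits evenly, and core count, scheduling and synchronisation are ignored. The function names and the 117 µs figure (roughly the per-thread cost from the first post) are illustrative only.

```python
def split_time_us(task_us, n_threads, create_us):
    """Estimated wall time when the task is cut into n_threads equal parts.

    Assumed model: threads are created serially (create_us each), and the
    last thread created still has to run its share of the work.
    """
    return n_threads * create_us + task_us / n_threads

def worth_splitting(task_us, n_threads, create_us):
    """True if the estimate beats running the task on a single thread."""
    return split_time_us(task_us, n_threads, create_us) < task_us

CREATE_US = 117  # assumption: roughly the per-thread cost from the first post

for task_us in (500, 5000):
    print(task_us, split_time_us(task_us, 4, CREATE_US),
          worth_splitting(task_us, 4, CREATE_US))
```

With these numbers a 500 µs task is estimated at 593 µs when split four ways, so it loses, while a 5000 µs task drops to an estimated 1718 µs.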
Title: Re: CreateThread overhead
Post by: TimoVJL on February 06, 2021, 08:42:10 PM
I'm waiting to see how Zen 3 / Ryzen 7 performs.
It would be nice to know the Windows version used for each test, too.

One of my test PCs, an AMD Ryzen 5 3400G with Radeon Vega Graphics, normally runs at 2.7 GHz and is capable of 3.7 GHz,
so results vary a lot.
AMD Ryzen 5 3400G with Radeon Vega Graphics   
179 µs for #1/1
221 µs for #2/2
262 µs for #3/5
308 µs for #4/3
342 µs for #5/4
386 µs for #6/6
455 µs for #7/7
505 µs for #8/8
544 µs for #9/11
573 µs for #10/10
622 µs for #11/9
650 µs for #12/15
752 µs for #13/12
781 µs for #14/13
816 µs for #15/14
843 µs for #16/22
906 µs for #17/16
947 µs for #18/17
973 µs for #19/18
1002 µs for #20/19
1025 µs for #21/20
1055 µs for #22/26
1149 µs for #23/23
1202 µs for #24/24
1239 µs for #25/25
1284 µs for #26/28
1347 µs for #27/27
1405 µs for #28/21
1455 µs for #29/29
1492 µs for #30/30
1556 µs for #31/31
1610 µs for #32/32
1670 µs for #33/33
1714 µs for #34/34
1754 µs for #35/35
1786 µs for #36/36
1839 µs for #37/37
1874 µs for #38/38
1959 µs for #39/39
2023 µs for #40/40

Title: Re: CreateThread overhead
Post by: Gunther on February 06, 2021, 09:02:50 PM
Jochen,

Quote from: jj2007 on February 06, 2021, 08:00:57 PM
No problem, I already got many more answers than I expected. The purpose was simply to check if splitting a short task into many threads makes any sense... and it depends, obviously :cool:

Oh yes, it really depends on a lot of factors. And we are talking about tight coupling (multiple threads or processes on the same machine). In a cluster (loose coupling), things look quite different again.

Gunther
Title: Re: CreateThread overhead
Post by: quarantined on April 20, 2021, 05:31:45 AM

Just for kicks...


Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
26 µs for creating the thread
thread resumed after 67 µs      for #1
thread resumed after 6 µs       for #2
thread resumed after 4 µs       for #3
thread resumed after 5 µs       for #4
thread resumed after 4 µs       for #5
thread resumed after 6 µs       for #6
thread resumed after 3 µs       for #7
thread resumed after 3 µs       for #8
thread resumed after 4 µs       for #9
thread resumed after 3 µs       for #10
thread resumed after 5 µs       for #11
thread resumed after 4 µs       for #12
thread resumed after 4 µs       for #13
thread resumed after 3 µs       for #14
thread resumed after 5 µs       for #15
thread resumed after 3 µs       for #16
thread resumed after 4 µs       for #17
thread resumed after 3 µs       for #18
thread resumed after 3 µs       for #19
thread resumed after 4 µs       for #20
thread resumed after 3 µs       for #21
thread resumed after 5 µs       for #22
thread resumed after 3 µs       for #23
thread resumed after 4 µs       for #24
thread resumed after 4 µs       for #25
thread resumed after 4 µs       for #26
thread resumed after 4 µs       for #27
thread resumed after 4 µs       for #28
thread resumed after 4 µs       for #29
thread resumed after 4 µs       for #30
thread resumed after 4 µs       for #31
--- hit any key ---

Windows XP, 32-bit
Title: Re: CreateThread overhead
Post by: jj2007 on April 20, 2021, 07:20:50 AM
Quote from: quarantined on April 20, 2021, 05:31:45 AM
Windows XP, 32-bit

That's pretty fast. I wonder whether the WOW64 overhead plays a role here.
Title: Re: CreateThread overhead
Post by: hutch-- on April 20, 2021, 09:49:22 AM

Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
113 µs  for #1/1
147 µs  for #2/2
324 µs  for #3/3
348 µs  for #4/4
372 µs  for #5/5
410 µs  for #6/6
433 µs  for #7/7
449 µs  for #8/8
473 µs  for #9/9
489 µs  for #10/10
513 µs  for #11/11
530 µs  for #12/12
574 µs  for #13/22
605 µs  for #14/15
654 µs  for #15/16
678 µs  for #16/17
692 µs  for #17/26
715 µs  for #18/18
742 µs  for #19/28
780 µs  for #20/20
823 µs  for #21/21
870 µs  for #22/13
895 µs  for #23/14
923 µs  for #24/35
942 µs  for #25/24
977 µs  for #26/25
997 µs  for #27/38
1046 µs for #28/19
1085 µs for #29/29
1107 µs for #30/40
1145 µs for #31/30
1177 µs for #32/32
1202 µs for #33/33
1227 µs for #34/34
1269 µs for #35/36
1314 µs for #36/37
1352 µs for #37/27
1393 µs for #38/39
1427 µs for #39/31
1470 µs for #40/23
Title: Re: CreateThread overhead
Post by: quarantined on April 22, 2021, 04:48:56 AM
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
80 µs   for #1/1
116 µs   for #2/2
159 µs   for #3/3
245 µs   for #4/5
274 µs   for #5/4
341 µs   for #6/7
355 µs   for #7/6
387 µs   for #8/9
430 µs   for #9/8
480 µs   for #10/11
526 µs   for #11/13
553 µs   for #12/10
576 µs   for #13/12
613 µs   for #14/15
664 µs   for #15/17
714 µs   for #16/19
744 µs   for #17/14
773 µs   for #18/16
789 µs   for #19/18
833 µs   for #20/21
895 µs   for #21/23
937 µs   for #22/25
978 µs   for #23/27
1016 µs   for #24/29
1054 µs   for #25/31
1101 µs   for #26/33
1129 µs   for #27/20
1156 µs   for #28/22
1173 µs   for #29/24
1216 µs   for #30/35
1272 µs   for #31/39
1279 µs   for #32/28
1296 µs   for #33/30
1321 µs   for #34/26
1374 µs   for #35/37
1392 µs   for #37/32
1381 µs   for #36/34
1485 µs   for #40/40
1419 µs   for #38/36
1445 µs   for #39/38
:tongue:

Win 7, 32-bit
Title: Re: CreateThread overhead
Post by: quarantined on April 22, 2021, 04:57:38 AM
Quote from: jj2007 on April 20, 2021, 07:20:50 AM
That's pretty fast. I wonder whether the WOW64 overhead plays a role here.

For comparison, from Win 10, 64-bit...

215 µs   for #1/2
303 µs   for #2/1
516 µs   for #3/4
544 µs   for #4/5
567 µs   for #5/6
484 µs   for #6/3
750 µs   for #7/8
758 µs   for #8/7
901 µs   for #9/10
874 µs   for #10/9
1104 µs   for #11/11
1132 µs   for #12/12
1154 µs   for #13/13
1190 µs   for #14/14
1348 µs   for #15/15
1476 µs   for #16/16
1499 µs   for #17/17
1672 µs   for #18/19
1645 µs   for #19/18
1875 µs   for #20/20
1902 µs   for #21/21
1929 µs   for #22/22
2016 µs   for #23/23
2142 µs   for #24/24
2173 µs   for #25/25
2196 µs   for #26/26
2332 µs   for #27/27
2483 µs   for #28/28
2510 µs   for #29/29
2533 µs   for #30/30
2714 µs   for #31/32
2768 µs   for #32/31
2922 µs   for #33/33
2949 µs   for #34/34
2972 µs   for #35/35
2999 µs   for #36/36
3217 µs   for #37/38
3254 µs   for #38/37
3338 µs   for #39/39
3366 µs   for #40/40
Title: Re: CreateThread overhead
Post by: FORTRANS on April 22, 2021, 09:52:03 PM
Hi Jochen,

   Both programs, three computers.  Two XP {assumed 32-bit},
one 8.1.  Just noticed when posting, one negative timing.  So,
I hope you find this either useful, or amusing.

Intel(R) Pentium(R) M processor 1.70GHz
469 µs for #1/1
529 µs for #2/2
561 µs for #3/3
592 µs for #4/4
622 µs for #5/5
651 µs for #6/6
682 µs for #7/7
712 µs for #8/8
742 µs for #9/9
772 µs for #10/10
803 µs for #11/11
833 µs for #12/12
863 µs for #13/13
893 µs for #14/14
923 µs for #15/15
953 µs for #16/16
983 µs for #17/17
1014 µs for #18/18
1538 µs for #19/19
1571 µs for #20/20
1602 µs for #21/21
1632 µs for #22/22
1772 µs for #23/23
1804 µs for #24/24
1835 µs for #25/25
1865 µs for #26/26
1895 µs for #27/27
2001 µs for #28/28
2032 µs for #29/29
2065 µs for #30/30
2096 µs for #31/31
2126 µs for #32/32
2156 µs for #33/33
2186 µs for #34/34
2216 µs for #35/35
2345 µs for #36/36
2376 µs for #37/37
2405 µs for #38/38
2434 µs for #39/39
2464 µs for #40/40

Intel(R) Pentium(R) M processor 1.70GHz
42 µs for creating the thread
thread resumed after 74 µs for #1
thread resumed after 5 µs for #2
thread resumed after 4 µs for #3
thread resumed after 3 µs for #4
thread resumed after 3 µs for #5
thread resumed after 3 µs for #6
thread resumed after 3 µs for #7
thread resumed after 3 µs for #8
thread resumed after 3 µs for #9
thread resumed after 3 µs for #10
thread resumed after 3 µs for #11
thread resumed after 3 µs for #12
thread resumed after 3 µs for #13
thread resumed after 3 µs for #14
thread resumed after 3 µs for #15
thread resumed after 3 µs for #16
thread resumed after 3 µs for #17
thread resumed after 3 µs for #18
thread resumed after 3 µs for #19
thread resumed after 3 µs for #20
thread resumed after 3 µs for #21
thread resumed after 3 µs for #22
thread resumed after 3 µs for #23
thread resumed after 3 µs for #24
thread resumed after 3 µs for #25
thread resumed after 3 µs for #26
thread resumed after 3 µs for #27
thread resumed after 3 µs for #28
thread resumed after 3 µs for #29
thread resumed after 3 µs for #30
thread resumed after 3 µs for #31
--- hit any key ---

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz
264 µs for #1/1
357 µs for #2/2
383 µs for #3/3
510 µs for #4/5
609 µs for #5/6
672 µs for #6/4
703 µs for #7/8
745 µs for #8/7
767 µs for #9/9
1128 µs for #10/10
1188 µs for #11/11
1241 µs for #12/12
1284 µs for #13/13
1337 µs for #14/14
1397 µs for #15/15
1504 µs for #16/16
1669 µs for #17/17
1769 µs for #18/18
1826 µs for #19/19
1951 µs for #20/20
1988 µs for #21/21
2036 µs for #22/22
2190 µs for #23/23
2230 µs for #24/24
2277 µs for #25/25
2343 µs for #26/26
2384 µs for #27/27
2585 µs for #28/28
2626 µs for #29/29
2673 µs for #30/30
2728 µs for #31/31
2777 µs for #32/32
2960 µs for #33/34
3066 µs for #34/33
3147 µs for #35/35
3185 µs for #36/37
3250 µs for #37/36
3313 µs for #38/38
3386 µs for #39/39
3511 µs for #40/40

Intel(R) Core(TM) i3-4005U CPU @ 1.70GHz
61 µs for creating the thread
thread resumed after 252 µs for #1
thread resumed after 42 µs for #2
thread resumed after 7 µs for #3
thread resumed after 31 µs for #4
thread resumed after 18 µs for #5
thread resumed after 23 µs for #6
thread resumed after 28 µs for #7
thread resumed after 49 µs for #8
thread resumed after 19 µs for #9
thread resumed after 40 µs for #10
thread resumed after 31 µs for #11
thread resumed after 59 µs for #12
thread resumed after 43 µs for #13
thread resumed after 40 µs for #14
thread resumed after 30 µs for #15
thread resumed after 32 µs for #16
thread resumed after 33 µs for #17
thread resumed after 32 µs for #18
thread resumed after 31 µs for #19
thread resumed after 35 µs for #20
thread resumed after 46 µs for #21
thread resumed after 35 µs for #22
thread resumed after 39 µs for #23
thread resumed after 20 µs for #24
thread resumed after 37 µs for #25
thread resumed after 27 µs for #26
thread resumed after 45 µs for #27
thread resumed after 29 µs for #28
thread resumed after 48 µs for #29
thread resumed after 28 µs for #30
thread resumed after 35 µs for #31
--- hit any key ---

Genuine Intel(R) CPU           T2400  @ 1.83GHz
237 µs for #1/1
369 µs for #2/2
466 µs for #3/3
545 µs for #4/4
634 µs for #5/5
758 µs for #6/7
877 µs for #7/6
991 µs for #8/8
1010 µs for #9/9
1119 µs for #10/10
1144 µs for #11/11
1305 µs for #12/12
1381 µs for #13/13
1494 µs for #14/14
1609 µs for #15/15
1703 µs for #16/16
1798 µs for #17/17
1919 µs for #18/18
1944 µs for #19/19
2102 µs for #20/20
2542 µs for #21/21
2658 µs for #22/23
2693 µs for #23/27
2667 µs for #24/25
2856 µs for #25/24
2875 µs for #26/22
2988 µs for #27/29
3105 µs for #28/26
-2147483648 µs for #29/28
3238 µs for #30/31
3321 µs for #31/30
3429 µs for #32/32
3459 µs for #33/33
3609 µs for #35/35
3609 µs for #35/35
3722 µs for #36/36
3874 µs for #37/37
3963 µs for #38/38
4048 µs for #39/39
4137 µs for #40/40

Genuine Intel(R) CPU           T2400  @ 1.83GHz
118 µs for creating the thread
thread resumed after 177 µs for #1
thread resumed after 13 µs for #2
thread resumed after 10 µs for #3
thread resumed after 17 µs for #4
thread resumed after 11 µs for #5
thread resumed after 16 µs for #6
thread resumed after 9 µs for #7
thread resumed after 9 µs for #8
thread resumed after 8 µs for #9
thread resumed after 14 µs for #10
thread resumed after 11 µs for #11
thread resumed after 13 µs for #12
thread resumed after 11 µs for #13
thread resumed after 61 µs for #14
thread resumed after 10 µs for #15
thread resumed after 15 µs for #16
thread resumed after 17 µs for #17
thread resumed after 15 µs for #18
thread resumed after 8 µs for #19
thread resumed after 15 µs for #20
thread resumed after 11 µs for #21
thread resumed after 9 µs for #22
thread resumed after 11 µs for #23
thread resumed after 17 µs for #24
thread resumed after 9 µs for #25
thread resumed after 12 µs for #26
thread resumed after 11 µs for #27
thread resumed after 16 µs for #28
thread resumed after 10 µs for #29
thread resumed after 8 µs for #30
thread resumed after 10 µs for #31
--- hit any key ---
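
To compare the three machines at a glance, one rough approach is to divide the last timestamp of each 40-thread run by 40. This is an assumption about method (it mirrors how the total of the first post's run is read), and it lumps thread creation, scheduling and console printing together. A small Python sketch with the totals copied from the listings above:

```python
# Last timestamp of each 40-thread run above, in microseconds.
# The dictionary keys are just display labels.
last_timestamp_us = {
    "Pentium M 1.70GHz": 2464,
    "i3-4005U 1.70GHz": 3511,
    "T2400 1.83GHz": 4137,
}
THREADS = 40


def average_cost_us(total_us, threads=THREADS):
    """Time until the last thread reported, divided by the thread count."""
    return total_us / threads


for cpu, total in last_timestamp_us.items():
    print(f"{cpu}: {average_cost_us(total):.1f} µs per thread")
```

The second test ("thread resumed after ...") measures something different, namely resuming an already created thread, so its numbers are not comparable with these averages.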


Cheers,

Steve N.
Title: Re: CreateThread overhead
Post by: jj2007 on April 22, 2021, 10:20:04 PM
Thanks, Steve. The first one is definitely fast ("thread resumed after 3 µs"), can you verify if it's a 32-bit OS?
If so, it would give us a hint how much the translation of 32-bit calls to the 64-bit OS costs in practice.

Attached a test with a fast WinAPI call: GetTickCount. Here is the source:
include \masm32\MasmBasic\MasmBasic.inc
  Init
  PrintCpu 0
  Print Str$("This is Windows version %i", MbWinVersion()), Str$(".%i", ecx)
  Print Str$(", build %i, ", MbWinVersion(build))
  .if Win64()
PrintLine "64-bit OS"
  .else
PrintLine "32-bit OS"
  .endif
  xor ecx, ecx
  .Repeat
inc ecx ; warmup
  .Until Sign?
  REPEAT 3
NanoTimer()
xor ebx, ebx
.Repeat
invoke GetTickCount
inc ebx
.Until ebx>99999999
  Print NanoTimer$(), Str$(" for %i Million times GetTickCount\n", ebx/1000000)
  ENDM
  Inkey
EndOfCode


My results:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
This is Windows version 6.1, build 7601, 64-bit OS
584 ms for 100 Million times GetTickCount
533 ms for 100 Million times GetTickCount
539 ms for 100 Million times GetTickCount

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
This is Windows version 5.1, build 2600, 32-bit OS
321 ms for 100 Million times GetTickCount
279 ms for 100 Million times GetTickCount
283 ms for 100 Million times GetTickCount


The second one is a VM :cool:

For comparison, the empty loop:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
This is Windows version 6.1, build 7601, 64-bit OS
35 ms for 100 Million times empty loop
35 ms for 100 Million times empty loop
42 ms for 100 Million times empty loop

Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
This is Windows version 5.1, build 2600, 32-bit OS
34 ms for 100 Million times empty loop
35 ms for 100 Million times empty loop
34 ms for 100 Million times empty loop


So it seems that the translation of a WinAPI call from 32-bit to 64-bit land costs around 2.5 milliseconds per Million calls :cool:
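
The arithmetic behind that figure can be sketched as follows (Python, for illustration only). It takes the best of the three runs on each OS and assumes, as the post does, that the whole difference is due to the 32-bit to 64-bit transition; the empty loop costs about the same on both systems, so it cancels out. Note that the 32-bit XP numbers come from a VM.

```python
CALLS = 100_000_000

# Timings in ms copied from the GetTickCount runs above (same i5-2450M).
wow64_ms = [584, 533, 539]      # 32-bit exe on 64-bit Windows 7
native32_ms = [321, 279, 283]   # 32-bit exe on 32-bit XP (in a VM)


def overhead(slow_ms, fast_ms, calls=CALLS):
    """Return (ms per million calls, ns per call) based on the best runs."""
    diff_ms = min(slow_ms) - min(fast_ms)
    per_million_ms = diff_ms / (calls / 1_000_000)
    per_call_ns = diff_ms * 1_000_000 / calls
    return per_million_ms, per_call_ns


per_million, per_call = overhead(wow64_ms, native32_ms)
print(f"{per_million:.2f} ms per million calls, {per_call:.2f} ns per call")
```

In other words, roughly 2.5 ms per million calls is the same as roughly 2.5 ns per call.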
Title: Re: CreateThread overhead
Post by: FORTRANS on April 23, 2021, 12:25:55 AM
Hi,

Quote from: jj2007 on April 22, 2021, 10:20:04 PM
Thanks, Steve. The first one is definitely fast ("thread resumed after 3 µs"), can you verify if it's a 32-bit OS?

   Well, it's a 32-bit CPU, but anyway, here's your program's
output and that of VER.

This is Windows version 5.1, build 2600, 32-bit OS
603 ms for 100 Million times GetTickCount
538 ms for 100 Million times GetTickCount
601 ms for 100 Million times GetTickCount

Microsoft Windows XP [Version 5.1.2600]


Regards,

Steve
Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 12:53:05 AM
Thanks :biggrin:

Here is the 64-bit version:
include \Masm32\MasmBasic\Res\JBasic.inc ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
ticks QWORD ?
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  xor edi, edi
@OuterLoop:
jinvoke GetTickCount
mov ticks, rax
xor ebx, ebx
@@: jinvoke GetTickCount
inc ebx
cmp ebx, 99999999
jb @B
sub rax, ticks
PrintLine Str$("%i ms for 100 Million x GetTickCount", rax)
  inc edi
  cmp edi, 3
  jb @OuterLoop
  Inkey
EndOfCode

This program was assembled with ml64 in 64-bit format.
343 ms for 100 Million x GetTickCount
328 ms for 100 Million x GetTickCount
327 ms for 100 Million x GetTickCount


A bit slow compared to the 32-bit code in the 32-bit OS, but the 2.5ms/Million overhead is not present :cool:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
This is Windows version 5.1, build 2600, 32-bit OS
321 ms for 100 Million times GetTickCount
279 ms for 100 Million times GetTickCount
283 ms for 100 Million times GetTickCount
Title: Re: CreateThread overhead
Post by: quarantined on April 23, 2021, 02:27:49 AM
---------------  For 'GetTickCount.exe' (32 bit version)

64 bit cpu Windows XP 32 bit
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
This is Windows version 5.1, build 2600, 32-bit OS
300 ms for 100 Million times GetTickCount
267 ms for 100 Million times GetTickCount
267 ms for 100 Million times GetTickCount

64 bit cpu Windows 7 32 bit
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
This is Windows version 6.1, build 7601, 32-bit OS
478 ms for 100 Million times GetTickCount
435 ms for 100 Million times GetTickCount
437 ms for 100 Million times GetTickCount




--------------------------------------------
64 bit cpu Windows 10 64 bit
Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
This is Windows version 10.0, build 19042, 64-bit OS
470 ms for 100 Million times GetTickCount
435 ms for 100 Million times GetTickCount
434 ms for 100 Million times GetTickCount

=============================================
---------------  For 'GetTickCount64.exe' (64 bit version)

64 bit cpu Windows 10 64 bit
This program was assembled with ml64 in 64-bit format.
203 ms for 100 Million x GetTickCount
203 ms for 100 Million x GetTickCount
203 ms for 100 Million x GetTickCount

Sorry I don't have a native 32 bit box to test on.

Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 02:43:01 AM
Thanks :thumbsup:
Title: Re: CreateThread overhead
Post by: LiaoMi on April 23, 2021, 06:12:52 AM
Quote from: jj2007 on April 23, 2021, 12:53:05 AM
Thanks :biggrin:

Here is the 64-bit version:
include \Masm32\MasmBasic\Res\JBasic.inc ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
ticks QWORD ?
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  xor edi, edi
@OuterLoop:
jinvoke GetTickCount
mov ticks, rax
xor ebx, ebx
@@: jinvoke GetTickCount
inc ebx
cmp ebx, 99999999
jb @B
sub rax, ticks
PrintLine Str$("%i ms for 100 Million x GetTickCount", rax)
  inc edi
  cmp edi, 3
  jb @OuterLoop
  Inkey
EndOfCode

This program was assembled with ml64 in 64-bit format.
343 ms for 100 Million x GetTickCount
328 ms for 100 Million x GetTickCount
327 ms for 100 Million x GetTickCount


A bit slow compared to the 32-bit code in the 32-bit OS, but the 2.5ms/Million overhead is not present :cool:
Intel(R) Core(TM) i5-2450M CPU @ 2.50GHz
This is Windows version 5.1, build 2600, 32-bit OS
321 ms for 100 Million times GetTickCount
279 ms for 100 Million times GetTickCount
283 ms for 100 Million times GetTickCount


win10 x64 i7-4810mq

This program was assembled with ml64 in 64-bit format.
203 ms for 100 Million x GetTickCount
188 ms for 100 Million x GetTickCount
203 ms for 100 Million x GetTickCount
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 07:25:20 AM
Haswell E/EP @ 4gig.


This program was assembled with ml64 in 64-bit format.
125 ms for 100 Million x GetTickCount
125 ms for 100 Million x GetTickCount
125 ms for 100 Million x GetTickCount
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 08:15:08 AM
Bothered to have a look, you certainly have some strange code for 64 bit. Looks suspiciously like hybrid STDCALL 32 bit code embedded in 64 bit.

0x140001248:
stosq qword ptr [rdi], rax         ; no REP

0x14000119b:
push rsi
push rdi
push rbx
push rdx
mov rsi, rcx
mov rdi, rdx
xor rbx, rbx

pop rdx
pop rbx
pop rdi
pop rsi
ret

jdebP:
mov qword ptr [0x140002700], rsp
lea rsp, [0x140002790]
push r15
push r14
push r13
push r12
push r11
push r10
push r9
push r8
push rdi
push rsi
push rbp
push rsp
push rbx
push rdx
push rcx
push rax
lea rdx, [0x140002700]
mov rax, qword ptr [rsp-0x10]
mov rdx, qword ptr [rax]
sub rdx, 5
mov qword ptr [rsp-8], rdx
mov qword ptr [rsp+0x20], rax
add qword ptr [rsp+0x20], 8
xchg rax, rsp                      ; antique junk
or eax, 0xffffffff
cdq                                ; more antique junk
lea rcx, [0x140002790]
fxsave ptr [rcx]
ret
Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 09:19:03 AM
Quote from: hutch-- on April 23, 2021, 08:15:08 AM
Bothered to have a look, you certainly have some strange code for 64 bit. Looks suspiciously like hybrid STDCALL 32 bit code embedded in 64 bit.

Any violations of the x64 ABI? Shall we call the code police?
:tongue:
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 10:04:30 AM
> Any violations of the x64 ABI

Stack twiddling again, dodgy unreliable code.
Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 10:06:32 AM
Is that just your opinion, or can you prove it, maybe with a crispy example of crashing code?
Title: Re: CreateThread overhead
Post by: mikeburr on April 23, 2021, 01:18:33 PM
xchg rax, rsp                      ; antique junk
this is going to lock the bus ...
regards mikeb
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 01:51:28 PM
 :biggrin:

> Is that just your opinion, or can you prove it, maybe with a crispy example of crashing code?

No, I will just use yours. Having to match pushes and pops leaves the code open to alignment errors. Try 3 pushes and pops.

It should look like this, not the macros but the underlying mnemonics.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    call tst
    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

PROCALIGN                                                   ; align proc with no stack frame

tst proc

    USING rsi, rdi, r12, r13                                ; list regs to be saved
    LOCAL pbuf  :QWORD                                      ; buffer pointer
    LOCAL buff[128]:BYTE                                    ; buffer

    SaveRegs                                                ; save listed regs

    mov pbuf, ptr$(buff)                                    ; get pointer to buffer

    mov rsi, 1                                              ; write something to 4 regs
    mov rdi, 2
    mov r12, 3
    mov r13, 4

    mcat pbuf, str$(rsi)," ",str$(rdi)," ", \               ; convert and join 4 strings
               str$(r12)," ",str$(r13)

    rcall MessageBox,0,pbuf,"MASM64",MB_ICONINFORMATION     ; call the MessageBox function

    RestoreRegs                                             ; restore listed regs
    ret

tst endp

STACKFRAME                                                  ; restore default stack frame

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end


Disasm.

; IN
mov qword ptr [rbp+0x80], rsi
mov qword ptr [rbp+0x88], rdi
mov qword ptr [rbp+0x90], r12
mov qword ptr [rbp+0x98], r13

; OUT

mov rsi, qword ptr [rbp+0x80]
mov rdi, qword ptr [rbp+0x88]
mov r12, qword ptr [rbp+0x90]
mov r13, qword ptr [rbp+0x98]


Look MUM, no stack twiddling.  :tongue:
Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 05:10:33 PM
Quote from: mikeburr on April 23, 2021, 01:18:33 PM
xchg rax, rsp                      ; antique junk
this is going to lock the bus ...
regards mikeb

Not correct, Mike - study the docs. xchg only implies a bus lock when one of its operands is in memory; with two registers, as here, there is no lock. The instruction is certainly slow. It will delay this proc, which prints debug output to the console, by a few cycles. Right now I am too lazy to calculate whether that's on the order of nano- or picoseconds...

Quote from: hutch-- on April 23, 2021, 01:51:28 PM
:biggrin:

> Is that just your opinion, or can you prove it, maybe with a crispy example of crashing code?

No, I will just use yours. Having to match pushes and pops leaves the code open to alignment errors. Try 3 pushes and pops.

This is a library proc. Contrary to what you seem to believe, I am perfectly able to calculate the number of pushes required to maintain the 16-byte alignment. In this context, and only in this context, pushing the regs is the best way to save them all.

Btw you didn't prove that my code could crash. You didn't because it cannot crash.
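
The counting argument can be sketched like this (Python, illustration only). It assumes the usual Win64 convention: rsp is 16-byte aligned right before a call, so it sits at 8 mod 16 on entry to a proc, and every push moves it by 8. The entry address is hypothetical.

```python
SLOT = 8  # bytes moved by one 64-bit push


def rsp_after(entry_rsp, pushes):
    """Stack pointer after a given number of 8-byte pushes."""
    return entry_rsp - SLOT * pushes


def is_aligned16(rsp):
    """True if rsp is a multiple of 16."""
    return rsp % 16 == 0


# Hypothetical entry value: the caller had rsp 16-byte aligned and the call
# instruction pushed an 8-byte return address, so the value ends in ...8.
ENTRY_RSP = 0x14FF28

for n in range(5):
    rsp = rsp_after(ENTRY_RSP, n)
    print(n, hex(rsp), "aligned" if is_aligned16(rsp) else "misaligned")
```

Each push flips the alignment, so only the parity of the push count matters. Any additional sub rsp, N for locals or shadow space shifts the picture again, which is why the "right" count depends on the whole prologue.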

Quote from: hutch-- on April 23, 2021, 08:15:08 AM
xchg rax, rsp                      ; antique junk
or eax, 0xffffffff
cdq                                ; more antique junk

Simple C examples and their Assembly output from GCC 4.9.0 (https://gist.github.com/lancejpollard/adf75b90137ef29e6f02)
foo(int, int):
  mov eax, edi
  cdq
  idiv  esi
  ret

foo(int, int, int):
  mov eax, edi
  mov ecx, edx
  cdq
  idiv  esi
  cdq
  idiv  ecx
  ret
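
For readers who have not met cdq: it copies the sign bit of eax into every bit of edx, which is what idiv expects in edx:eax. A small Python model (an illustration, not taken from any of the listings) shows this, and also what the pair "or eax, 0xffffffff" plus "cdq" from the jdebP disassembly leaves in both registers:

```python
MASK32 = 0xFFFFFFFF


def cdq(eax):
    """Model of cdq: returns (edx, eax) with edx filled with eax's sign bit."""
    eax &= MASK32
    edx = MASK32 if eax & 0x80000000 else 0
    return edx, eax


def or_eax_minus1_then_cdq(eax):
    """Model of 'or eax, 0xffffffff' followed by 'cdq'."""
    return cdq(eax | MASK32)


print([hex(r) for r in cdq(7)])                          # positive dividend
print([hex(r) for r in cdq(-7)])                         # negative dividend
print([hex(r) for r in or_eax_minus1_then_cdq(0x1234)])  # both registers all ones
```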
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 06:41:07 PM
 :biggrin:

> Btw you didn't prove that my code could crash. You didn't because it cannot crash.

Unless you only use 3 instead of four pushes or pops. Manual stack twiddling is dangerous unreliable code and you should know that by now.

> Simple C examples and their Assembly output from GCC 4.9.0

Now you are trying to make me laugh, taking your instruction reference from a C compiler and GCC at that.
Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 06:54:14 PM
Quote from: hutch-- on April 23, 2021, 06:41:07 PM
:biggrin:

> Btw you didn't prove that my code could crash. You didn't because it cannot crash.

Unless you only use 3 instead of four pushes or pops. Manual stack twiddling is dangerous unreliable code and you should know that by now.

I do know that, and I am able to count to three.

Quote from: hutch-- on April 23, 2021, 08:15:08 AM
cdq                                ; more antique junk

Quote
> Simple C examples and their Assembly output from GCC 4.9.0

Now you are trying to make me laugh, taking your instruction reference from a C compiler and GCC at that.

I am so sorry that I mentioned this crappy Open Sauce compiler - my apologies! Since you are so unhappy that I don't refer to the true and only Microsoft C compiler:

VS2017 compiler emitting 2 division instructions for a division/remainder pair (https://stackoverflow.com/questions/51477667/vs2017-compiler-emitting-2-division-instructions-for-a-division-remainder-pair)
00007FF790061FA0 42 8B 04 1F          mov         eax,dword ptr [rdi+r11] 
00007FF790061FA4 99                   cdq 
00007FF790061FA5 F7 7E 28             idiv        eax,dword ptr [rsi+28h] 
00007FF790061FA8 4C 63 D0             movsxd      r10,eax 
00007FF790061FAB 42 8B 04 1F          mov         eax,dword ptr [rdi+r11] 
00007FF790061FAF 99                   cdq 
00007FF790061FB0 F7 7E 28             idiv        eax,dword ptr [rsi+28h] 
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 09:04:07 PM
 :biggrin:

> I am so sorry that I mentioned this crappy Open Sauce compiler

No, you mentioned "crappy Open Sauce compiler". I referred to a C compiler AND GCC at that.

Since when did assembler programmers use a C compiler as their reference for writing assembler ? You may find the Intel manuals a lot more informative.

You can keep avoiding the obvious that you are trying to use an unreliable technique left over from Win32 but in Win64 you need to leave this old junk behind and write modern x64 code, not clapped out unreliable hybrids left over from Win32.
Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 09:06:41 PM
You called cdq "antique junk", and I demonstrated that both GCC and Microsoft Visual C use what you call "antique junk" :cool:

Quote from: hutch-- on April 23, 2021, 09:04:07 PM
you are trying to use an unreliable technique left over from Win32

There is nothing unreliable about the technique I am using in the jdebP procedure - in the hands of the expert. As stated earlier, this is a library function. Newbies are not allowed to touch it.

Once upon a time, Steve Hutchesson was proud that assembler programmers could use different techniques than the dumb C compilers.
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 09:56:30 PM
 :biggrin:

I still fail to see why you are preaching the virtues of aping C compilers when the code you are defending is ancient junk.

Don't expect that simply because something is in a C compiler output that it's good code. Over time they have produced their fair share of crap code as it gets immortalised in each generation of compiler and rarely ever gets changed.

> Once upon a time, Steve Hutchesson was proud that assembler programmers could use different techniques than the dumb C compilers.

Seems you have not learnt that lesson and want to keep aping the junky end of C compiler output.

There are a couple of things that you need to change, abandon old junk instructions and only use the fast stuff AND stop trying to marry Win32 STDCALL and x64 and only use Win64 FASTCALL where you stop modifying the stack.
Title: Re: CreateThread overhead
Post by: jj2007 on April 23, 2021, 10:06:42 PM
Take it easy, Hutch :biggrin:
Title: Re: CreateThread overhead
Post by: hutch-- on April 23, 2021, 10:48:47 PM
 :skrewy:
Title: Re: CreateThread overhead
Post by: nidud on April 24, 2021, 12:48:42 AM
deleted
Title: Re: CreateThread overhead
Post by: jj2007 on April 24, 2021, 01:18:39 AM
You are kidding, nidud. I suggest you read the manual of the push instruction :cool:
Title: Re: CreateThread overhead
Post by: nidud on April 24, 2021, 01:43:28 AM
deleted
Title: Re: CreateThread overhead
Post by: jj2007 on April 24, 2021, 02:13:46 AM
I stand corrected, Nidud - congrats :thumbsup:

Yes, this stuff is almost 5 years old, and I had forgotten what a daredevil I was in May 2016 :greensml:

So, my advice: don't use the JBasic deb macro in the release version of your programs :cool:

Still, I'd be curious to see what exactly happens when an interrupt takes over, and finds rsp in the .data? section :rolleyes:

regsave db 512+reqb dup(?) ; fxsave: only 416 bytes overwritten by CPU

.CODE
jdebP proc
  mov QWORD ptr regsave, rsp
  lea rsp, regsave+reqb    ; <<<<<<<<<<<<<<< put rsp in the .data? section!
Title: Re: CreateThread overhead
Post by: hutch-- on April 24, 2021, 09:56:37 AM
 :biggrin:

> I'd be curious to see what exactly happens when an interrupt takes over

Aha, yet another unreliable technique in the field of contraception.  :skrewy:

What happened to that rock solid UASMBasic ?  :tongue:
Title: Re: CreateThread overhead
Post by: jj2007 on April 24, 2021, 11:20:38 AM
What happened to your explorative spirit of assembly programming? Nowadays scared of new techniques, and of the code police waiting for you behind the next corner?

Here is what I found on interrupts and the role of the stack; it's very little but it's sufficiently clear that the exotic technique I used in the JBasic debugging proc (jdebP) will not pose any problems. Btw JBasic (http://masm32.com/board/index.php?topic=9266.0) (my dual 64-/32-bit assembly framework for MASM, UAsm and AsmC) has almost nothing to do with MasmBasic (http://masm32.com/board/index.php?topic=94.0) (32-bit only and really rock solid :biggrin:) - they share some syntax but are otherwise independent of each other.

A bit detailed info about interrupts in Windows (https://forums.guru3d.com/threads/a-bit-detailed-info-about-interrupts-in-windows.424677/)
Discussion in 'Operating Systems' started by mbk1969, Jan 4, 2019
Quote
When a hardware exception or interrupt is generated, the processor records enough machine state on the kernel stack of the thread that's interrupted to return to that point in the control flow and continue execution as if nothing had happened. If the thread was executing in user mode, Windows switches to the thread's kernel-mode stack. Windows then creates a trap frame on the kernel stack of the interrupted thread into which it stores the execution state of the thread.

Is it valid to write below ESP? (https://stackoverflow.com/questions/52258402/is-it-valid-to-write-below-esp)
Quote
Hardware interrupts can't use the user stack; that would let user-space crash the kernel with mov esp, 0, or worse take over the kernel by having another thread in the user-space process modify return addresses while an interrupt handler was running. This is why kernels always configure things so interrupt context is pushed onto the kernel stack.
...
So the question becomes: is there anything on Windows that can asynchronously run code using the user-space stack between two arbitrary instructions? (i.e. any equivalent to a Unix signal handler.)

As far as we can tell, SEH (Structured Exception Handling) is the only real obstacle to what you propose for user-space code on current 32 and 64-bit Windows.
Title: Re: CreateThread overhead
Post by: hutch-- on April 24, 2021, 11:41:32 AM
 :biggrin:

> What happened to your explorative spirit of assembly programming? Nowadays scared of new techniques, and the code police behind the next corner?

There are two things here, creative genius at the hands of assembler language programmers and the hardware you are using to run it. To get creative genius off the ground, you have to first make the hardware happy by not feeding it garbage and this is filtered through the Operating System.

Once you get the hardware and the OS happy, THEN and only THEN do you start to develop the "explorative spirit of assembly programming" free of the hardware and OS victimising you for feeding it crap.
Title: Re: CreateThread overhead
Post by: jj2007 on April 24, 2021, 11:45:38 AM
Perfect. Do you have any evidence that the hardware might be unhappy if you use mov esp, 0 in a user mode procedure? We could write a little test, of course, in case somebody is really interested in the subject.

Raymond Chen's Why do we even need to define a red zone? Can't I just use my stack for anything? (https://devblogs.microsoft.com/oldnewthing/20190111-00/?p=100685) is a good read for the scared among us (with some broken links, he's becoming sloppy). I feel tempted to write a demo but have too many other things on my plate ;-)

P.S.: I couldn't resist, here is the proggie (source and 32-/64-bit executables attached):
include \Masm32\MasmBasic\Res\JBasic.inc ; ## console demo, builds in 32- or 64-bit mode with UAsm, ML, AsmC ##
usedeb=1 ; 1=use the deb macro
.code
DefProc SayHi proc argString:SIZE_P, argDword, argDouble:REAL8 ; #1 is a pointer size argument
Local v1, v2:REAL8, rc:RECT
  deb 4, "debug:", v1, rc.right, argDword, argDouble, $argString        ; this calls the dangerous jdebP!!!
  mov rsp, 12345 ; ***** CODE POLICE, where are you??? *****
  ret
SayHi endp
Init ; OPT_64 1 ; put 0 for 32 bit, 1 for 64 bit assembly
  PrintLine Chr$("This program was assembled with ", @AsmUsed$(1), " in ", jbit$, "-bit format.")
  xor ebx, ebx
@@:
jinvoke Sleep, 100
jinvoke SetConsoleCursorPosition, rv(GetStdHandle, STD_OUTPUT_HANDLE), 040000h ; Print At(0, 4)
Print Str$("** ct=%i **\n", rbx)
jinvoke SayHi, Chr$("Argument #1 is a string"), 222222222, FP8(33333.33333)
inc ebx
cmp ebx, 99
jbe @B
  Inkey "The hardware seems happy with mov rsp, 12345"
EndOfCode


Output:
This program was assembled with ml64 in 64-bit format.

** ct=99 **
debug:
v1      0
rc.right        0
argDword        222222222
argDouble       33333.333330000
$argString      Argument #1 is a string

The hardware seems happy with mov rsp, 12345
Title: Re: CreateThread overhead
Post by: hutch-- on April 24, 2021, 04:27:40 PM
There are a couple of code police that just won't listen to you: the CPU will tell you with absolutely no negotiation that you have tried to use an invalid opcode. Then the OS is even nastier: no matter what you want from it, if you stuff something up it will say horrible things about your code and make you look like a jerk.

Now other mere mortals from time to time try to help others out when it comes to problems of that type, and they then become the code police if they had the audacity to try to help others understand that the CPU and OS have boundaries which both will enforce.

When I see you peddling crap coding techniques that have numbers of preconditions attached to them like having to push in pairs to maintain alignment, you are peddling the JJ ABI, not the real one.

For years I had to deal with donkeys going HE HAW HE HAW because they promoted not observing the Win32 Intel ABI because it may have worked on an old Win9x version but when the arse fell out of their code they just disappeared.
Title: Re: CreateThread overhead
Post by: jj2007 on April 24, 2021, 07:18:45 PM
When you have no proof and no arguments, what's your solution? Insults...
Title: Re: CreateThread overhead
Post by: hutch-- on April 24, 2021, 08:14:55 PM
 :biggrin:

Stop trying to pull my leg, while you may float the idea of the JJABI, as long as it contains crap code, it is not worth listening to. I know you can do better as I have seen your JJWASMBASIC which was robust and reliable but this dodgy unreliable 64 bit code is not up to scratch yet.
Title: Re: CreateThread overhead
Post by: jj2007 on April 24, 2021, 09:07:57 PM
Quote from: hutch-- on April 24, 2021, 08:14:55 PM
while you may float the idea of the JJABI, as long as it contains crap code

I don't float the idea of a JJABI. All my 64-bit code is absolutely ABI-compliant, with the exception of the jdebP proc that makes an unorthodox use of rsp. Since it's a leaf proc and obviously in userland, this usage of rsp is not a problem, as demonstrated with the proggie posted above.
Title: Re: CreateThread overhead
Post by: hutch-- on April 24, 2021, 09:25:16 PM
There is a simple solution to this hiatus you have arrived at: rewrite it so it's not dodgy unreliable code. You can defend crap like that as being some form of exception but it needs to be changed, and you can leave 32 bit STDCALL behind forever in 64 bit code.
Title: Re: CreateThread overhead
Post by: hutch-- on April 25, 2021, 12:14:10 PM
JJ,

Here is your Christmas present for the year before last or perhaps the year before that. Each rerun of the pair, SaveEmAll and RestoreEmAll will overwrite the previous result.

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include64\masm64rt.inc

  ; --------------------------------------------------

    AllocRegSpace MACRO

      IsLoaded@@@@ equ (1)

      .data?
      .rax dq ?
      .rbx dq ?
      .rcx dq ?
      .rdx dq ?
      .rsi dq ?
      .rdi dq ?
      .r8  dq ?
      .r9  dq ?
      .r10 dq ?
      .r11 dq ?
      .r12 dq ?
      .r13 dq ?
      .r14 dq ?
      .r15 dq ?
      .rbp dq ?
      .rsp dq ?

      .code

    ENDM

  ; --------------------------------------------------

    SaveEmAll MACRO

      IFNDEF IsLoaded@@@@
        AllocRegSpace
      ENDIF

      mov .rax, rax
      mov .rbx, rbx
      mov .rcx, rcx
      mov .rdx, rdx
      mov .rsi, rsi
      mov .rdi, rdi
      mov .r8,  r8
      mov .r9,  r9
      mov .r10, r10
      mov .r11, r11
      mov .r12, r12
      mov .r13, r13
      mov .r14, r14
      mov .r15, r15
      mov .rbp, rbp
      mov .rsp, rsp
    ENDM

  ; --------------------------------------------------

    RestoreEmAll MACRO

      IFNDEF IsLoaded@@@@
        echo Register Space Not Allocated
        .err                            ; raise the error before leaving the macro
        exitm
      ENDIF

      mov rax, .rax
      mov rbx, .rbx
      mov rcx, .rcx
      mov rdx, .rdx
      mov rsi, .rsi
      mov rdi, .rdi
      mov r8,  .r8
      mov r9,  .r9
      mov r10, .r10
      mov r11, .r11
      mov r12, .r12
      mov r13, .r13
      mov r14, .r14
      mov r15, .r15
      mov rbp, .rbp
      mov rsp, .rsp

    ENDM

  ; --------------------------------------------------

    .code

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

entry_point proc

    SaveEmAll
    RestoreEmAll

    SaveEmAll

    conout str$(rax),lf
    conout str$(rbx),lf
    conout str$(rcx),lf
    conout str$(rdx),lf
    conout str$(rsi),lf
    conout str$(rdi),lf
    conout str$(r8),lf
    conout str$(r9),lf
    conout str$(r10),lf
    conout str$(r11),lf
    conout str$(r12),lf
    conout str$(r13),lf
    conout str$(r14),lf
    conout str$(r15),lf
    conout str$(rbp),lf
    conout str$(rsp),lf,lf,lf

    mov rax, 1234
    mov rbx, 5678
    mov rcx, 9012
    mov rdx, 3456           ; change register content
    mov rsi, 7890
    mov rdi, 1234
    add rsp, rbp
    rol rdx, 4

    RestoreEmAll

    conout str$(rax),lf
    conout str$(rbx),lf
    conout str$(rcx),lf
    conout str$(rdx),lf
    conout str$(rsi),lf
    conout str$(rdi),lf
    conout str$(r8),lf
    conout str$(r9),lf
    conout str$(r10),lf
    conout str$(r11),lf
    conout str$(r12),lf
    conout str$(r13),lf
    conout str$(r14),lf
    conout str$(r15),lf
    conout str$(rbp),lf
    conout str$(rsp),lf,lf,lf

    waitkey

    .exit

entry_point endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    end
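
Since the pair above shares one set of globals, a second SaveEmAll overwrites the first snapshot. Below is a minimal sketch of a variant that takes the save area as an argument, so that two snapshots can coexist. The names RegSnapshot, SaveRegsTo, RestoreRegsFrom, snapA and snapB are made up for illustration and are not part of masm64rt; it assumes the same ml64 + masm64rt.inc setup as the code above and has not been tested here.

```masm
    include \masm32\include64\masm64rt.inc

  ; hypothetical layout, one slot per general purpose register

    RegSnapshot STRUCT
      r_rax dq ?
      r_rbx dq ?
      r_rcx dq ?
      r_rdx dq ?
      r_rsi dq ?
      r_rdi dq ?
      r_r8  dq ?
      r_r9  dq ?
      r_r10 dq ?
      r_r11 dq ?
      r_r12 dq ?
      r_r13 dq ?
      r_r14 dq ?
      r_r15 dq ?
      r_rbp dq ?
      r_rsp dq ?
    RegSnapshot ENDS

    SaveRegsTo MACRO snap           ; snap = name of a RegSnapshot variable
      mov snap.r_rax, rax
      mov snap.r_rbx, rbx
      mov snap.r_rcx, rcx
      mov snap.r_rdx, rdx
      mov snap.r_rsi, rsi
      mov snap.r_rdi, rdi
      mov snap.r_r8,  r8
      mov snap.r_r9,  r9
      mov snap.r_r10, r10
      mov snap.r_r11, r11
      mov snap.r_r12, r12
      mov snap.r_r13, r13
      mov snap.r_r14, r14
      mov snap.r_r15, r15
      mov snap.r_rbp, rbp
      mov snap.r_rsp, rsp
    ENDM

    RestoreRegsFrom MACRO snap      ; reload every register from snap
      mov rax, snap.r_rax
      mov rbx, snap.r_rbx
      mov rcx, snap.r_rcx
      mov rdx, snap.r_rdx
      mov rsi, snap.r_rsi
      mov rdi, snap.r_rdi
      mov r8,  snap.r_r8
      mov r9,  snap.r_r9
      mov r10, snap.r_r10
      mov r11, snap.r_r11
      mov r12, snap.r_r12
      mov r13, snap.r_r13
      mov r14, snap.r_r14
      mov r15, snap.r_r15
      mov rbp, snap.r_rbp
      mov rsp, snap.r_rsp
    ENDM

    .data?
      snapA RegSnapshot <>          ; two independent save areas
      snapB RegSnapshot <>

    .code

entry_point proc

    SaveRegsTo snapA                ; first snapshot
    mov rbx, 5678                   ; change register content
    SaveRegsTo snapB                ; second snapshot, snapA stays untouched
    RestoreRegsFrom snapA           ; rbx is reloaded from the first snapshot

    conout str$(rbx),lf
    waitkey
    .exit

entry_point endp

    end
```

Like the original, the save areas are plain globals, so threads that need this at the same time would each want their own snapshot variable.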
Title: Re: CreateThread overhead
Post by: jj2007 on April 25, 2021, 12:43:40 PM
Wow, a bit bloated but nonetheless impressive! I love global variables :thumbsup:

48:8905 38210000         | mov [14000349C],rax      |
48:891D 39210000         | mov [1400034A4],rbx      |
48:890D 3A210000         | mov [1400034AC],rcx      |
48:8915 3B210000         | mov [1400034B4],rdx      |
48:8935 3C210000         | mov [1400034BC],rsi      |
48:893D 3D210000         | mov [1400034C4],rdi      |
4C:8905 3E210000         | mov [1400034CC],r8       |
4C:890D 3F210000         | mov [1400034D4],r9       |
4C:8915 40210000         | mov [1400034DC],r10      |
4C:891D 41210000         | mov [1400034E4],r11      |
4C:8925 42210000         | mov [1400034EC],r12      |
4C:892D 43210000         | mov [1400034F4],r13      |
4C:8935 44210000         | mov [1400034FC],r14      |
4C:893D 45210000         | mov [140003504],r15      |
48:892D 46210000         | mov [14000350C],rbp      |
48:8925 47210000         | mov [140003514],rsp      |
Title: Re: CreateThread overhead
Post by: nidud on April 25, 2021, 09:01:26 PM
deleted
Title: Re: CreateThread overhead
Post by: jj2007 on April 26, 2021, 02:05:23 AM
That looks nice, nidud :thumbsup:

However, throwing exceptions to save some regs might be a bit of an overkill. For the 64-bit version of the deb macro (http://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1019), it can be done in 72 bytes (48 for the 32-bit version):
jRegSave$ equ <#rax#rcx#rdx#rbx#rsp#rbp#rsi#rdi#r8 #r9 #r10#r11#r12#r13#r14#r15#>
jdebP proc export
  mov DefSize ptr regsave, rsp ; save the current stack pointer
  lea rsp, regsave+reqb ; regsave: db 512+18*8 dup(?)
  fxsave [rsp+reqbX] ; take care of xmmregs
  is=(1+@64)*32 ; 32 for 32-bit, 64 for 64-bit code
  WHILE is gt 1
    jRS3$ SUBSTR jRegSave$, is-2, 3
    push jRS3$ ; push all regs to the .data? section
    is=is-4 ; #r15 -> #r14 ... #rax
  ENDM
  mov rax, [rsp-2*DefSize] ; old rsp
  mov rdx, [rax]
  mov [rsp+4*DefSize], rax ; rsp
  add DefSize ptr [rsp+4*DefSize], DefSize ; correct for ret addr
  xchg rax, rsp
  sub rdx, 5 ; size of a call
  mov [rax-DefSize], rdx ; rip
  ret
jdebP endp


The macro posted by Hutch above needs 67% more space for the same job. Plus, it needs that space for every use of deb, while the unorthodox jdebP solution costs 72 bytes once, plus only 6 bytes for every call.
Title: Re: CreateThread overhead
Post by: daydreamer on April 26, 2021, 08:42:47 PM
I thought it was only me trying to make a 1k demo, but now that I see your code size reduction in 64-bit, I'm curious, Jochen: is a 1k or 4k demo possible in 64-bit? :greenclp:
Title: Re: CreateThread overhead
Post by: jj2007 on April 26, 2021, 08:59:31 PM
I don't practice size reduction, daydreamer, except when I can do it :badgrin: