The MASM Forum

Projects => ObjAsm => Topic started by: Biterider on April 03, 2020, 12:41:17 AM

Title: UTF8 performance
Post by: Biterider on April 03, 2020, 12:41:17 AM
Hi
As you probably know, most communication between mobile devices uses UTF-8 encoded strings. Because Windows uses wide strings (UTF-16) internally, many conversions are needed. The two API routines responsible for them are MultiByteToWideChar and WideCharToMultiByte. They can map characters from various code pages to Unicode code points or translate from one encoding to another. The former is useful if you work with code pages (8-bit characters); if your application uses wide strings, the same routines can also convert between UTF-8 and UTF-16.

Sometimes I get better performance by coding my own routines than by using the Windows APIs, so I tried that here too. The two new routines are called WideToUTF8 and UTF8ToWide. I implemented the naive encoding described at https://de.wikipedia.org/wiki/UTF-8 and https://de.wikipedia.org/wiki/UTF-16.
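
For readers who want to see what such a naive encoding looks like, here is a minimal sketch in C. It is not the WideToUTF8 routine from the attachment (that one is written in assembly); the name wide_to_utf8_naive and the buffer contract are assumptions made for this illustration. It combines well-formed surrogate pairs and otherwise encodes every 16-bit unit as it comes, so a lone surrogate ends up as a 3-byte sequence that is not valid UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert n UTF-16 code units to UTF-8.
   dst must have room for 3 * n bytes (worst case: every unit is a
   BMP character that needs three bytes).
   Returns the number of bytes written. */
size_t wide_to_utf8_naive(const uint16_t *src, size_t n, uint8_t *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = src[i];

        /* High surrogate followed by a low surrogate: combine the pair
           into one code point above FFFFh. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        }
        /* Naive part: a lone surrogate is not checked here and simply
           falls through to the 3-byte case below. */

        if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
            dst[o++] = (uint8_t)cp;
        } else if (cp < 0x800) {              /* 2 bytes: 110xxxxx 10xxxxxx */
            dst[o++] = (uint8_t)(0xC0 | (cp >> 6));
            dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            /* 3 bytes */
            dst[o++] = (uint8_t)(0xE0 | (cp >> 12));
            dst[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {                              /* 4 bytes */
            dst[o++] = (uint8_t)(0xF0 | (cp >> 18));
            dst[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

Called with the five units 0041h, 00E9h, 20ACh, D83Dh, DE00h, the sketch writes ten bytes: one for the ASCII letter, two for U+00E9, three for U+20AC and four for the surrogate pair.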

I also wrote a performance test using only the MASM32 framework and was really surprised at how well the Windows routines performed. I spent some time tweaking my routines on my main machine; in some Unicode regions (outside the BMP) I sometimes got a 30% increase in performance, in others a 10% decrease. When I ran the same code on an older machine, I got a general 30% drop in performance.

My conclusion is that the MS routines are not bad at all.  :icon_idea:

I am posting my code in case someone wants to try it out.

Biterider
Title: Re: UTF8 performance
Post by: LiaoMi on April 03, 2020, 01:47:54 AM
Hi Biterider,

i7-4810mq
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 438 ticks = 7% faster
SUCCEEDED       WideToUTF8: 390 ticks, WideCharToMultiByte: 438 ticks = 10% faster
SUCCEEDED       WideToUTF8: 390 ticks, WideCharToMultiByte: 453 ticks = 13% faster
SUCCEEDED       WideToUTF8: 391 ticks, WideCharToMultiByte: 438 ticks = 10% faster
SUCCEEDED       WideToUTF8: 390 ticks, WideCharToMultiByte: 438 ticks = 10% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 609 ticks, MultiByteToWideChar: 688 ticks = 11% faster
SUCCEEDED       UTF8ToWide: 578 ticks, MultiByteToWideChar: 687 ticks = 15% faster
SUCCEEDED       UTF8ToWide: 578 ticks, MultiByteToWideChar: 672 ticks = 13% faster
SUCCEEDED       UTF8ToWide: 594 ticks, MultiByteToWideChar: 672 ticks = 11% faster
SUCCEEDED       UTF8ToWide: 578 ticks, MultiByteToWideChar: 672 ticks = 13% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1406 ticks, WideCharToMultiByte: 1859 ticks = 24% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1438 ticks, WideCharToMultiByte: 1812 ticks = 20% faster
SUCCEEDED       WideToUTF8: 1360 ticks, WideCharToMultiByte: 1828 ticks = 25% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1859 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 2735 ticks, MultiByteToWideChar: 2797 ticks = 2% faster
SUCCEEDED       UTF8ToWide: 2734 ticks, MultiByteToWideChar: 2703 ticks = -1% faster
SUCCEEDED       UTF8ToWide: 2656 ticks, MultiByteToWideChar: 2735 ticks = 2% faster
SUCCEEDED       UTF8ToWide: 2734 ticks, MultiByteToWideChar: 2734 ticks = 0% faster
SUCCEEDED       UTF8ToWide: 2750 ticks, MultiByteToWideChar: 2766 ticks = 0% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 672 ticks, WideCharToMultiByte: 859 ticks = 21% faster
SUCCEEDED       WideToUTF8: 657 ticks, WideCharToMultiByte: 875 ticks = 24% faster
SUCCEEDED       WideToUTF8: 718 ticks, WideCharToMultiByte: 860 ticks = 16% faster
SUCCEEDED       WideToUTF8: 687 ticks, WideCharToMultiByte: 875 ticks = 21% faster
SUCCEEDED       WideToUTF8: 672 ticks, WideCharToMultiByte: 844 ticks = 20% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1390 ticks, MultiByteToWideChar: 1329 ticks = -4% faster
SUCCEEDED       UTF8ToWide: 1390 ticks, MultiByteToWideChar: 1297 ticks = -7% faster
SUCCEEDED       UTF8ToWide: 1344 ticks, MultiByteToWideChar: 1344 ticks = 0% faster
SUCCEEDED       UTF8ToWide: 1359 ticks, MultiByteToWideChar: 1312 ticks = -3% faster
SUCCEEDED       UTF8ToWide: 1391 ticks, MultiByteToWideChar: 1344 ticks = -3% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1172 ticks, WideCharToMultiByte: 1125 ticks = -4% faster
SUCCEEDED       WideToUTF8: 1172 ticks, WideCharToMultiByte: 1109 ticks = -5% faster
SUCCEEDED       WideToUTF8: 1187 ticks, WideCharToMultiByte: 1094 ticks = -8% faster
SUCCEEDED       WideToUTF8: 1188 ticks, WideCharToMultiByte: 1109 ticks = -7% faster
SUCCEEDED       WideToUTF8: 1156 ticks, WideCharToMultiByte: 1109 ticks = -4% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1125 ticks, MultiByteToWideChar: 1375 ticks = 18% faster
SUCCEEDED       UTF8ToWide: 1141 ticks, MultiByteToWideChar: 1375 ticks = 17% faster
SUCCEEDED       UTF8ToWide: 1187 ticks, MultiByteToWideChar: 1375 ticks = 13% faster
SUCCEEDED       UTF8ToWide: 1156 ticks, MultiByteToWideChar: 1391 ticks = 16% faster
SUCCEEDED       UTF8ToWide: 1203 ticks, MultiByteToWideChar: 1359 ticks = 11% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: HSE on April 03, 2020, 02:03:21 AM
Hi Biterider!

On this very fast turtle, UTF8ToWide is always better  :thumbsup:

Regards. HSE
Title: Re: UTF8 performance
Post by: mineiro on April 03, 2020, 02:15:33 AM
Hello sir Biterider, tested on Linux.
Code: [Select]
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Stepping:              3
CPU MHz:               800.000
BogoMIPS:              6816.05
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

Code: [Select]
$ wine PerformanceTest.exe
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
fixme:winediag:sigxcpu_handler realtime priority was throttled due to program exceeding time limit
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 544 ticks, MultiByteToWideChar: 607 ticks = 10% faster
SUCCEEDED       UTF8ToWide: 546 ticks, MultiByteToWideChar: 606 ticks = 9% faster
SUCCEEDED       UTF8ToWide: 524 ticks, MultiByteToWideChar: 606 ticks = 13% faster
SUCCEEDED       UTF8ToWide: 542 ticks, MultiByteToWideChar: 606 ticks = 10% faster
SUCCEEDED       UTF8ToWide: 531 ticks, MultiByteToWideChar: 603 ticks = 11% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 2062 ticks, MultiByteToWideChar: 2535 ticks = 18% faster
SUCCEEDED       UTF8ToWide: 2076 ticks, MultiByteToWideChar: 2534 ticks = 18% faster
SUCCEEDED       UTF8ToWide: 2114 ticks, MultiByteToWideChar: 2534 ticks = 16% faster
SUCCEEDED       UTF8ToWide: 2163 ticks, MultiByteToWideChar: 2535 ticks = 14% faster
SUCCEEDED       UTF8ToWide: 2064 ticks, MultiByteToWideChar: 2535 ticks = 18% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 622 ticks, WideCharToMultiByte: 882 ticks = 29% faster
SUCCEEDED       WideToUTF8: 621 ticks, WideCharToMultiByte: 963 ticks = 35% faster
SUCCEEDED       WideToUTF8: 698 ticks, WideCharToMultiByte: 881 ticks = 20% faster
SUCCEEDED       WideToUTF8: 622 ticks, WideCharToMultiByte: 881 ticks = 29% faster
SUCCEEDED       WideToUTF8: 625 ticks, WideCharToMultiByte: 882 ticks = 29% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1341 ticks, MultiByteToWideChar: 1258 ticks = -6% faster
SUCCEEDED       UTF8ToWide: 1343 ticks, MultiByteToWideChar: 1258 ticks = -6% faster
SUCCEEDED       UTF8ToWide: 1304 ticks, MultiByteToWideChar: 1258 ticks = -3% faster
SUCCEEDED       UTF8ToWide: 1273 ticks, MultiByteToWideChar: 1258 ticks = -1% faster
SUCCEEDED       UTF8ToWide: 1288 ticks, MultiByteToWideChar: 1258 ticks = -2% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1136 ticks, WideCharToMultiByte: 3114 ticks = 63% faster
SUCCEEDED       WideToUTF8: 1163 ticks, WideCharToMultiByte: 3112 ticks = 62% faster
SUCCEEDED       WideToUTF8: 1146 ticks, WideCharToMultiByte: 3124 ticks = 63% faster
SUCCEEDED       WideToUTF8: 1137 ticks, WideCharToMultiByte: 3099 ticks = 63% faster
SUCCEEDED       WideToUTF8: 1159 ticks, WideCharToMultiByte: 3146 ticks = 63% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1002 ticks, MultiByteToWideChar: 901 ticks = -11% faster
SUCCEEDED       UTF8ToWide: 1003 ticks, MultiByteToWideChar: 899 ticks = -11% faster
SUCCEEDED       UTF8ToWide: 1001 ticks, MultiByteToWideChar: 902 ticks = -10% faster
SUCCEEDED       UTF8ToWide: 1003 ticks, MultiByteToWideChar: 906 ticks = -10% faster
SUCCEEDED       UTF8ToWide: 1006 ticks, MultiByteToWideChar: 900 ticks = -11% faster
Ready
Press any key to continue ...fixme:console:CONSOLE_DefaultHandler Terminating process 8 on event 0
Title: Re: UTF8 performance
Post by: Siekmanski on April 03, 2020, 06:05:27 AM
i7-4930K

Code: [Select]
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 484 ticks = 16% faster
SUCCEEDED       WideToUTF8: 407 ticks, WideCharToMultiByte: 468 ticks = 13% faster
SUCCEEDED       WideToUTF8: 422 ticks, WideCharToMultiByte: 469 ticks = 10% faster
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 469 ticks = 13% faster
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 469 ticks = 13% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 609 ticks, MultiByteToWideChar: 766 ticks = 20% faster
SUCCEEDED       UTF8ToWide: 593 ticks, MultiByteToWideChar: 766 ticks = 22% faster
SUCCEEDED       UTF8ToWide: 609 ticks, MultiByteToWideChar: 766 ticks = 20% faster
SUCCEEDED       UTF8ToWide: 594 ticks, MultiByteToWideChar: 781 ticks = 23% faster
SUCCEEDED       UTF8ToWide: 594 ticks, MultiByteToWideChar: 765 ticks = 22% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 2485 ticks, MultiByteToWideChar: 3312 ticks = 24% faster
SUCCEEDED       UTF8ToWide: 2453 ticks, MultiByteToWideChar: 3297 ticks = 25% faster
SUCCEEDED       UTF8ToWide: 2532 ticks, MultiByteToWideChar: 3296 ticks = 23% faster
SUCCEEDED       UTF8ToWide: 2469 ticks, MultiByteToWideChar: 3297 ticks = 25% faster
SUCCEEDED       UTF8ToWide: 2344 ticks, MultiByteToWideChar: 3250 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 703 ticks, WideCharToMultiByte: 828 ticks = 15% faster
SUCCEEDED       WideToUTF8: 688 ticks, WideCharToMultiByte: 828 ticks = 16% faster
SUCCEEDED       WideToUTF8: 687 ticks, WideCharToMultiByte: 813 ticks = 15% faster
SUCCEEDED       WideToUTF8: 687 ticks, WideCharToMultiByte: 828 ticks = 17% faster
SUCCEEDED       WideToUTF8: 688 ticks, WideCharToMultiByte: 812 ticks = 15% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1234 ticks, MultiByteToWideChar: 1641 ticks = 24% faster
SUCCEEDED       UTF8ToWide: 1219 ticks, MultiByteToWideChar: 1640 ticks = 25% faster
SUCCEEDED       UTF8ToWide: 1360 ticks, MultiByteToWideChar: 1625 ticks = 16% faster
SUCCEEDED       UTF8ToWide: 1250 ticks, MultiByteToWideChar: 1640 ticks = 23% faster
SUCCEEDED       UTF8ToWide: 1235 ticks, MultiByteToWideChar: 1640 ticks = 24% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1235 ticks = -37% faster
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1234 ticks = -38% faster
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1219 ticks = -39% faster
SUCCEEDED       WideToUTF8: 1688 ticks, WideCharToMultiByte: 1234 ticks = -36% faster
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1235 ticks = -37% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1640 ticks, MultiByteToWideChar: 1625 ticks = 0% faster
SUCCEEDED       UTF8ToWide: 1641 ticks, MultiByteToWideChar: 1609 ticks = -1% faster
SUCCEEDED       UTF8ToWide: 1656 ticks, MultiByteToWideChar: 1610 ticks = -2% faster
SUCCEEDED       UTF8ToWide: 1656 ticks, MultiByteToWideChar: 1609 ticks = -2% faster
SUCCEEDED       UTF8ToWide: 1657 ticks, MultiByteToWideChar: 1609 ticks = -2% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: Biterider on April 04, 2020, 10:22:41 PM
Hi
LiaoMi, HSE, mineiro and Siekmanski, thank you all for taking the time to test these routines.  :thumbsup:

With your results I decided to concentrate on the last code point area (00h – 7Fh) and unrolled the matching part of the routine by a factor of 2x, 4x, 8x, 16x, 32x and 128x. The last factor gave me an idea of what is happening in an extreme configuration. When experimenting with these values, I came to the conclusion that 8x is the best factor for my machine. All areas now show positive results between 15% and 25% performance increase.
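
To illustrate the idea of an 8x unrolled fast path for the 00h – 7Fh area, here is a rough sketch in C. It is only an assumption of how such a loop can be structured; the real routine is assembly and may look quite different. The name copy_ascii_run is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy the leading run of ASCII (00h-7Fh) UTF-16
   units from src to dst as single bytes, eight units per iteration.
   Returns the number of units consumed; the caller would continue with
   the general (multi-byte) path from that position. */
size_t copy_ascii_run(const uint16_t *src, size_t n, uint8_t *dst)
{
    size_t i = 0;

    /* Unrolled part: test eight units at once by OR-ing them together.
       Any bit above 7Fh means at least one unit is not ASCII. */
    while (n - i >= 8) {
        unsigned bits = src[i]     | src[i + 1] | src[i + 2] | src[i + 3] |
                        src[i + 4] | src[i + 5] | src[i + 6] | src[i + 7];
        if (bits & 0xFF80u)
            break;
        dst[i]     = (uint8_t)src[i];
        dst[i + 1] = (uint8_t)src[i + 1];
        dst[i + 2] = (uint8_t)src[i + 2];
        dst[i + 3] = (uint8_t)src[i + 3];
        dst[i + 4] = (uint8_t)src[i + 4];
        dst[i + 5] = (uint8_t)src[i + 5];
        dst[i + 6] = (uint8_t)src[i + 6];
        dst[i + 7] = (uint8_t)src[i + 7];
        i += 8;
    }

    /* Tail: fewer than eight units left, or a mixed block. Go one by one
       until the first non-ASCII unit. */
    while (i < n && src[i] < 0x80) {
        dst[i] = (uint8_t)src[i];
        i++;
    }
    return i;
}
```

The unroll factor trades loop overhead against code size and the cost of falling back when a block is mixed, which is probably why the best factor differs from machine to machine.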

I looked at the MS implementation and found that the code spends a lot of time analyzing the input values to decide how to proceed. This means that the overhead becomes more relevant for short strings. Tests with short strings (16 code points) confirm this behavior. In this length range, the new routines outperform the originals by more than 60%.  :cool:

I introduced a change in the performance test. I doubled the runs to 10 and compared the best results (the fastest) of all runs. The performance boost now shows very consistent values.
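
The "fastest of all runs" approach can be sketched as follows. This is an illustration in portable C, not the code of the performance test: clock() stands in for whatever tick source the test really uses, and best_of and count_call are names invented for this example.

```c
#include <time.h>

/* Signature of the routine under test; ctx carries its arguments. */
typedef void (*bench_fn)(void *ctx);

/* Run fn `runs` times and return the smallest elapsed time in clock()
   ticks. Taking the minimum filters out runs that were disturbed by
   the scheduler, interrupts or cold caches. */
clock_t best_of(bench_fn fn, void *ctx, int runs)
{
    clock_t best = 0;
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        fn(ctx);
        clock_t dt = clock() - t0;
        if (r == 0 || dt < best)
            best = dt;
    }
    return best;
}

/* Placeholder workload for trying the harness: it only counts how
   often it was called. A real test would convert a buffer here. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}
```

The minimum is usually more stable than the average because disturbances can only make a run slower, never faster.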

I commented out the CPU scheduling code (SetPriorityClass and SetThreadPriority), and it does not seem to affect the results. With this change, the code should also run in an emulator.

The files in the first post are updated with the described code.

Biterider
Title: Re: UTF8 performance
Post by: Siekmanski on April 04, 2020, 10:36:44 PM
i7-4930K

Code: [Select]
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
Fastest run     WideToUTF8: 406 ticks, WideCharToMultiByte: 453 ticks = 10% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 547 ticks, MultiByteToWideChar: 750 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
Fastest run     WideToUTF8: 1343 ticks, WideCharToMultiByte: 1843 ticks = 27% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 2171 ticks, MultiByteToWideChar: 3250 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 671 ticks, WideCharToMultiByte: 797 ticks = 15% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1078 ticks, MultiByteToWideChar: 1593 ticks = 32% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 1156 ticks, WideCharToMultiByte: 1203 ticks = 3% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1171 ticks, MultiByteToWideChar: 1593 ticks = 26% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: mineiro on April 05, 2020, 01:26:48 AM
Hello sir Biterider;
I got a lot of FAILED results, which is why I am not posting timings. Other programs that I have run under Wine also use SetPriorityClass and SetThreadPriority. Since you posted the source code, I will try my own measurements here and post the results later. Thanks.

Edit: ranges 0 and 1 failed.
Code: [Select]
$ wine PerformanceTest.exe
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 490 ticks, MultiByteToWideChar: 663 ticks = 26% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1991 ticks, MultiByteToWideChar: 3008 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 620 ticks, WideCharToMultiByte: 881 ticks = 29% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 983 ticks, MultiByteToWideChar: 1494 ticks = 34% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 720 ticks, WideCharToMultiByte: 3087 ticks = 76% faster
Ready
Title: Re: UTF8 performance
Post by: HSE on April 05, 2020, 02:07:33 AM
Hi Biterider!

Quote
All areas now show positive results between 15% and 25% performance increase.
:eusa_naughty:  Almost...  :biggrin: (of course this system is like a museum piece)

Regards. HSE
Title: Re: UTF8 performance
Post by: LiaoMi on April 05, 2020, 03:12:58 AM
Code: [Select]
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
Fastest run     WideToUTF8: 391 ticks, WideCharToMultiByte: 438 ticks = 10% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 515 ticks, MultiByteToWideChar: 672 ticks = 23% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
Fastest run     WideToUTF8: 1344 ticks, WideCharToMultiByte: 1875 ticks = 28% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 2140 ticks, MultiByteToWideChar: 2750 ticks = 22% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 672 ticks, WideCharToMultiByte: 859 ticks = 21% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1047 ticks, MultiByteToWideChar: 1344 ticks = 22% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 828 ticks, WideCharToMultiByte: 1125 ticks = 26% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 984 ticks, MultiByteToWideChar: 1391 ticks = 29% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: FORTRANS on April 05, 2020, 05:11:53 AM
Hi,

   Results from three machines.

Cheers,

Steve N.
Title: Re: UTF8 performance
Post by: mineiro on April 05, 2020, 06:05:37 AM
I commented out the process and thread priority function calls, but with no luck; I also tried changing the allocation to GlobalAlloc, without success.

RANGE 0 WideToUTF8
When the source and destination bytes are compared, the comparison stops with ecx = 00039FFF, and the contents at esi and edi are:
esi=
004D3011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 ED|
004D3021  9F BF ED 9F|BE ED 9F BD|ED 9F BC ED|9F BB ED 9F|
edi=
00554011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 EF|
00554021  BF BD EF BF|BD EF BF BD|EF BF BD EF|BF BD EF BF|

RANGE 1 WideToUTF8
ECX=79FFF
ESI=
00493011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 ED|
00493021  9F BF ED 9F|BE ED 9F BD|ED 9F BC ED|9F BB ED 9F|
EDI=
00514011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 EF|
00514021  BF BD EF BF|BD EF BF BD|EF BF BD EF|BF BD EF BF|

Maybe it is a bounds check; maybe the implementation of WideCharToMultiByte and the other routine in Wine is not quite right.
The solution to this problem is to blame the intern, it never fails.
Title: Re: UTF8 performance
Post by: jj2007 on April 05, 2020, 09:36:54 AM
 :thumbsup:

Code: [Select]
Preparing the test...
RANGE = 0
Testing WideToUTF8...
Fastest run     WideToUTF8: 499 ticks, WideCharToMultiByte: 702 ticks = 28% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 655 ticks, MultiByteToWideChar: 1139 ticks = 42% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
Fastest run     WideToUTF8: 1778 ticks, WideCharToMultiByte: 3244 ticks = 45% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 2464 ticks, MultiByteToWideChar: 4555 ticks = 45% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 889 ticks, WideCharToMultiByte: 1482 ticks = 40% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1216 ticks, MultiByteToWideChar: 2231 ticks = 45% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 1435 ticks, WideCharToMultiByte: 1372 ticks = -4% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1373 ticks, MultiByteToWideChar: 2090 ticks = 34% faster
Ready
Title: Re: UTF8 performance
Post by: Biterider on April 05, 2020, 05:07:13 PM
Hi all
@mineiro/FORTRANS: Thank you for debugging the code and providing additional information.  :thumbsup:
I have found the cause of the problem. In range 0, at ecx = 39FFFh, a lone surrogate occurs for the first time.

The MS documentation (https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte) says:
Quote
Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function to produce valid UTF-8 strings will behave the same way as on earlier Windows operating systems.

This is in line with the tests by FORTRANS with different operating systems. Win2000, WinXP and Wine seem to behave as described by MS.
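
One of the buffers in mineiro's dump contains the repeating bytes EF BF BD, which is the UTF-8 encoding of U+FFFD, the replacement character. The following sketch in C shows one way a converter can detect lone surrogates and substitute U+FFFD. It is an illustration only: the name wide_to_utf8_strict is invented, and whether this matches the exact behavior of any particular Windows or Wine version is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical strict converter: well-formed surrogate pairs are
   combined, lone surrogates are replaced by U+FFFD (EF BF BD).
   dst must have room for 3 * n bytes.
   Returns the number of bytes written. */
size_t wide_to_utf8_strict(const uint16_t *src, size_t n, uint8_t *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            /* High surrogate followed by a low surrogate. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            /* Lone or mismatched surrogate half. */
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            dst[o++] = (uint8_t)cp;
        } else if (cp < 0x800) {
            dst[o++] = (uint8_t)(0xC0 | (cp >> 6));
            dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            dst[o++] = (uint8_t)(0xE0 | (cp >> 12));
            dst[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (uint8_t)(0xF0 | (cp >> 18));
            dst[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

With the hypothetical input E000h, DFFFh, DFFEh this produces EE 80 80 EF BF BD EF BF BD, the same byte pattern that appears in one of the dumped buffers.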

I changed the performance test to show the timings even if the result doesn't match. It is interesting to see how big the difference in performance is in these cases. I also reactivated the CPU scheduling code.  :icon_idea:

Code updated in the first post.

Biterider
Title: Re: UTF8 performance
Post by: Siekmanski on April 05, 2020, 06:33:44 PM
Code: [Select]
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 406 ticks, WideCharToMultiByte: 453 ticks = 10% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 547 ticks, MultiByteToWideChar: 750 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 1359 ticks, WideCharToMultiByte: 1843 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 2172 ticks, MultiByteToWideChar: 3234 ticks = 32% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 672 ticks, WideCharToMultiByte: 796 ticks = 15% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1063 ticks, MultiByteToWideChar: 1593 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 1188 ticks, WideCharToMultiByte: 1203 ticks = 1% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1187 ticks, MultiByteToWideChar: 1593 ticks = 25% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: LiaoMi on April 05, 2020, 06:52:49 PM
Code: [Select]
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 375 ticks, WideCharToMultiByte: 421 ticks = 10% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 515 ticks, MultiByteToWideChar: 672 ticks = 23% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 1312 ticks, WideCharToMultiByte: 1813 ticks = 27% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 2093 ticks, MultiByteToWideChar: 2672 ticks = 21% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 656 ticks, WideCharToMultiByte: 828 ticks = 20% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1047 ticks, MultiByteToWideChar: 1297 ticks = 19% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 812 ticks, WideCharToMultiByte: 1094 ticks = 25% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 968 ticks, MultiByteToWideChar: 1343 ticks = 27% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: Biterider on April 05, 2020, 06:57:49 PM
@Siekmanski/LiaoMi: sorry, I uploaded the wrong version of the exe  :sad:
I have uploaded the correct code now. Timings should be correct.

Biterider
Title: Re: UTF8 performance
Post by: mineiro on April 05, 2020, 08:33:29 PM
good job, that explains the reason, thanks.

Code: [Select]
$ wine PerformanceTest.exe
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 358 ticks, WideCharToMultiByte: 539 ticks = 33% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 490 ticks, MultiByteToWideChar: 674 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 1238 ticks, WideCharToMultiByte: 1761 ticks = 29% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1990 ticks, MultiByteToWideChar: 3006 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 621 ticks, WideCharToMultiByte: 882 ticks = 29% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 984 ticks, MultiByteToWideChar: 1497 ticks = 34% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 727 ticks, WideCharToMultiByte: 3094 ticks = 76% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 857 ticks, MultiByteToWideChar: 903 ticks = 5% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: Siekmanski on April 05, 2020, 09:36:30 PM
 :thumbsup:

Code: [Select]
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 421 ticks, WideCharToMultiByte: 453 ticks = 7% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 547 ticks, MultiByteToWideChar: 750 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 1359 ticks, WideCharToMultiByte: 1843 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 2172 ticks, MultiByteToWideChar: 3234 ticks = 32% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 671 ticks, WideCharToMultiByte: 797 ticks = 15% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1062 ticks, MultiByteToWideChar: 1593 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 1203 ticks, WideCharToMultiByte: 1203 ticks = 0% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1187 ticks, MultiByteToWideChar: 1593 ticks = 25% faster
Ready
Press any key to continue ...
Title: Re: UTF8 performance
Post by: Biterider on April 05, 2020, 10:30:07 PM
Hi all
For the sake of completeness, I have limited the performance test to strings 16 code points long, which should be the most common case.

The results on my machine look like this:

Large test:
Code: [Select]
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 343 ticks, WideCharToMultiByte: 390 ticks = 13% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 468 ticks, MultiByteToWideChar: 563 ticks = 20% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 1203 ticks, WideCharToMultiByte: 1656 ticks = 37% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1938 ticks, MultiByteToWideChar: 2297 ticks = 18% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 593 ticks, WideCharToMultiByte: 750 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 953 ticks, MultiByteToWideChar: 1141 ticks = 19% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 719 ticks, WideCharToMultiByte: 1000 ticks = 39% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 859 ticks, MultiByteToWideChar: 1156 ticks = 34% faster
Ready
Press any key to continue ...

Short test:
Code:
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 625 ticks, WideCharToMultiByte: 1062 ticks = 69% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 750 ticks, MultiByteToWideChar: 1343 ticks = 79% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 390 ticks, WideCharToMultiByte: 906 ticks = 132% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 500 ticks, MultiByteToWideChar: 843 ticks = 68% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 391 ticks, WideCharToMultiByte: 907 ticks = 131% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 500 ticks, MultiByteToWideChar: 843 ticks = 68% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 125 ticks, WideCharToMultiByte: 312 ticks = 149% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 172 ticks, MultiByteToWideChar: 281 ticks = 63% faster
Ready
Press any key to continue ...

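As you can see, the performance increase for shorter strings is considerable: in the short test my routines are 63% to 149% faster than the Windows APIs, compared with 13% to 39% in the large test.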
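
For anyone who wants to compare their own numbers: the "% faster" figures in the logs of this post are consistent with taking the tick difference relative to the custom routine's ticks and truncating to a whole percent. This is inferred from the printed numbers only, not from the test source, so treat the Python sketch below (the function name `percent_faster` is hypothetical) as an assumption about how the figure is derived.

```python
def percent_faster(own_ticks, api_ticks):
    """Whole-number percentage by which the API time exceeds the custom routine's time.

    Assumed formula, inferred from the log output: (api - own) * 100 / own,
    with the fraction dropped. Only meaningful here for api_ticks >= own_ticks.
    """
    return (api_ticks - own_ticks) * 100 // own_ticks

# (custom routine ticks, Windows API ticks, percentage printed in the short test log)
short_test = [
    (625, 1062, 69),
    (750, 1343, 79),
    (390, 906, 132),
    (500, 843, 68),
    (391, 907, 131),
    (125, 312, 149),
    (172, 281, 63),
]

for own, api, logged in short_test:
    print(f"own={own} api={api} computed={percent_faster(own, api)}% logged={logged}%")
```

Note that "132% faster" in this sense means the API call took about 2.3 times as long, not that the custom routine needed less than zero time.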

Thank you for supporting this little adventure.  :thumbsup:

Biterider

Title: Re: UTF8 performance
Post by: LiaoMi on April 06, 2020, 02:38:54 AM
Large test:
Code:
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 375 ticks, WideCharToMultiByte: 422 ticks = 12% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 516 ticks, MultiByteToWideChar: 671 ticks = 30% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 1328 ticks, WideCharToMultiByte: 1844 ticks = 38% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 2094 ticks, MultiByteToWideChar: 2734 ticks = 30% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 672 ticks, WideCharToMultiByte: 859 ticks = 27% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1047 ticks, MultiByteToWideChar: 1343 ticks = 28% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 859 ticks, WideCharToMultiByte: 1125 ticks = 30% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1000 ticks, MultiByteToWideChar: 1391 ticks = 39% faster
Ready
Press any key to continue ...

Short test:
Code:
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 703 ticks, WideCharToMultiByte: 1203 ticks = 71% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 843 ticks, MultiByteToWideChar: 1515 ticks = 79% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 437 ticks, WideCharToMultiByte: 1031 ticks = 135% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 562 ticks, MultiByteToWideChar: 1234 ticks = 119% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 453 ticks, WideCharToMultiByte: 1047 ticks = 131% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 547 ticks, MultiByteToWideChar: 1218 ticks = 122% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       Fastest run     WideToUTF8: 140 ticks, WideCharToMultiByte: 375 ticks = 167% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 187 ticks, MultiByteToWideChar: 375 ticks = 100% faster
Ready
Press any key to continue ...