UTF8 performance

Started by Biterider, April 03, 2020, 12:41:17 AM


Biterider

Hi
As you probably know, most communication between mobile devices uses UTF8 encoded strings. Because Windows uses wide strings (also known as UTF16) internally, many conversions are needed. The two API routines in charge of these conversions are MultiByteToWideChar and WideCharToMultiByte. They can perform various tasks, such as mapping characters from different code pages to Unicode code points or translating from one encoding to another. The former is useful if you work with code pages (8-bit characters), but if your application uses wide strings, these routines can also convert between UTF8 and UTF16.

In the past I have sometimes got better performance from my own routines than from the Windows APIs, so I tried it here too. The two new routines are called WideToUTF8 and UTF8ToWide. I implemented the naive encoding described here https://de.wikipedia.org/wiki/UTF-8 and here https://de.wikipedia.org/wiki/UTF-16.
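For readers who want to see what such a naive encoding looks like, here is a minimal C sketch of the bit layouts described on those pages. It is not the code from the attachment (that is written in assembly); the function names encode_utf8 and encode_utf16 are made up for this illustration, and rejecting surrogate values is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written (1..4), or 0 if cp is a
   surrogate value or lies above 10FFFFh. */
size_t encode_utf8(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                          /* 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)         /* surrogates are not code points to encode */
        return 0;
    if (cp < 0x10000) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 on error. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {                       /* BMP: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {                     /* outside the BMP: surrogate pair */
        cp -= 0x10000;
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
    return 0;
}
```

For example, U+20AC gives the three UTF8 bytes E2 82 AC, and U+1F600 gives the surrogate pair D83D DE00 in UTF16.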

I also wrote a performance test using only the MASM32 framework and was really surprised by how well the Windows routines performed. I spent some time tweaking my routines on my main machine; in some Unicode regions (outside the BMP) I sometimes got a 30% increase in performance, and in others a 10% decrease. When I ran the same code on an older machine, I got a general 30% drop in performance.

My conclusion is that the MS routines are not bad at all.  :icon_idea:

I am posting my code in case someone wants to try it out.

Biterider

LiaoMi

Hi Biterider,

i7-4810mq
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 438 ticks = 7% faster
SUCCEEDED       WideToUTF8: 390 ticks, WideCharToMultiByte: 438 ticks = 10% faster
SUCCEEDED       WideToUTF8: 390 ticks, WideCharToMultiByte: 453 ticks = 13% faster
SUCCEEDED       WideToUTF8: 391 ticks, WideCharToMultiByte: 438 ticks = 10% faster
SUCCEEDED       WideToUTF8: 390 ticks, WideCharToMultiByte: 438 ticks = 10% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 609 ticks, MultiByteToWideChar: 688 ticks = 11% faster
SUCCEEDED       UTF8ToWide: 578 ticks, MultiByteToWideChar: 687 ticks = 15% faster
SUCCEEDED       UTF8ToWide: 578 ticks, MultiByteToWideChar: 672 ticks = 13% faster
SUCCEEDED       UTF8ToWide: 594 ticks, MultiByteToWideChar: 672 ticks = 11% faster
SUCCEEDED       UTF8ToWide: 578 ticks, MultiByteToWideChar: 672 ticks = 13% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1406 ticks, WideCharToMultiByte: 1859 ticks = 24% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1438 ticks, WideCharToMultiByte: 1812 ticks = 20% faster
SUCCEEDED       WideToUTF8: 1360 ticks, WideCharToMultiByte: 1828 ticks = 25% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1859 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 2735 ticks, MultiByteToWideChar: 2797 ticks = 2% faster
SUCCEEDED       UTF8ToWide: 2734 ticks, MultiByteToWideChar: 2703 ticks = -1% faster
SUCCEEDED       UTF8ToWide: 2656 ticks, MultiByteToWideChar: 2735 ticks = 2% faster
SUCCEEDED       UTF8ToWide: 2734 ticks, MultiByteToWideChar: 2734 ticks = 0% faster
SUCCEEDED       UTF8ToWide: 2750 ticks, MultiByteToWideChar: 2766 ticks = 0% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 672 ticks, WideCharToMultiByte: 859 ticks = 21% faster
SUCCEEDED       WideToUTF8: 657 ticks, WideCharToMultiByte: 875 ticks = 24% faster
SUCCEEDED       WideToUTF8: 718 ticks, WideCharToMultiByte: 860 ticks = 16% faster
SUCCEEDED       WideToUTF8: 687 ticks, WideCharToMultiByte: 875 ticks = 21% faster
SUCCEEDED       WideToUTF8: 672 ticks, WideCharToMultiByte: 844 ticks = 20% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1390 ticks, MultiByteToWideChar: 1329 ticks = -4% faster
SUCCEEDED       UTF8ToWide: 1390 ticks, MultiByteToWideChar: 1297 ticks = -7% faster
SUCCEEDED       UTF8ToWide: 1344 ticks, MultiByteToWideChar: 1344 ticks = 0% faster
SUCCEEDED       UTF8ToWide: 1359 ticks, MultiByteToWideChar: 1312 ticks = -3% faster
SUCCEEDED       UTF8ToWide: 1391 ticks, MultiByteToWideChar: 1344 ticks = -3% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1172 ticks, WideCharToMultiByte: 1125 ticks = -4% faster
SUCCEEDED       WideToUTF8: 1172 ticks, WideCharToMultiByte: 1109 ticks = -5% faster
SUCCEEDED       WideToUTF8: 1187 ticks, WideCharToMultiByte: 1094 ticks = -8% faster
SUCCEEDED       WideToUTF8: 1188 ticks, WideCharToMultiByte: 1109 ticks = -7% faster
SUCCEEDED       WideToUTF8: 1156 ticks, WideCharToMultiByte: 1109 ticks = -4% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1125 ticks, MultiByteToWideChar: 1375 ticks = 18% faster
SUCCEEDED       UTF8ToWide: 1141 ticks, MultiByteToWideChar: 1375 ticks = 17% faster
SUCCEEDED       UTF8ToWide: 1187 ticks, MultiByteToWideChar: 1375 ticks = 13% faster
SUCCEEDED       UTF8ToWide: 1156 ticks, MultiByteToWideChar: 1391 ticks = 16% faster
SUCCEEDED       UTF8ToWide: 1203 ticks, MultiByteToWideChar: 1359 ticks = 11% faster
Ready
Press any key to continue ...

HSE

Hi Biterider!

On this very fast turtle, UTF8ToWide is always better  :thumbsup:

Regards. HSE
Equations in Assembly: SmplMath

mineiro

Hello sir Biterider, tested on Linux:
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Stepping:              3
CPU MHz:               800.000
BogoMIPS:              6816.05
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7


$ wine PerformanceTest.exe
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
fixme:winediag:sigxcpu_handler realtime priority was throttled due to program exceeding time limit
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 544 ticks, MultiByteToWideChar: 607 ticks = 10% faster
SUCCEEDED       UTF8ToWide: 546 ticks, MultiByteToWideChar: 606 ticks = 9% faster
SUCCEEDED       UTF8ToWide: 524 ticks, MultiByteToWideChar: 606 ticks = 13% faster
SUCCEEDED       UTF8ToWide: 542 ticks, MultiByteToWideChar: 606 ticks = 10% faster
SUCCEEDED       UTF8ToWide: 531 ticks, MultiByteToWideChar: 603 ticks = 11% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 2062 ticks, MultiByteToWideChar: 2535 ticks = 18% faster
SUCCEEDED       UTF8ToWide: 2076 ticks, MultiByteToWideChar: 2534 ticks = 18% faster
SUCCEEDED       UTF8ToWide: 2114 ticks, MultiByteToWideChar: 2534 ticks = 16% faster
SUCCEEDED       UTF8ToWide: 2163 ticks, MultiByteToWideChar: 2535 ticks = 14% faster
SUCCEEDED       UTF8ToWide: 2064 ticks, MultiByteToWideChar: 2535 ticks = 18% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 622 ticks, WideCharToMultiByte: 882 ticks = 29% faster
SUCCEEDED       WideToUTF8: 621 ticks, WideCharToMultiByte: 963 ticks = 35% faster
SUCCEEDED       WideToUTF8: 698 ticks, WideCharToMultiByte: 881 ticks = 20% faster
SUCCEEDED       WideToUTF8: 622 ticks, WideCharToMultiByte: 881 ticks = 29% faster
SUCCEEDED       WideToUTF8: 625 ticks, WideCharToMultiByte: 882 ticks = 29% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1341 ticks, MultiByteToWideChar: 1258 ticks = -6% faster
SUCCEEDED       UTF8ToWide: 1343 ticks, MultiByteToWideChar: 1258 ticks = -6% faster
SUCCEEDED       UTF8ToWide: 1304 ticks, MultiByteToWideChar: 1258 ticks = -3% faster
SUCCEEDED       UTF8ToWide: 1273 ticks, MultiByteToWideChar: 1258 ticks = -1% faster
SUCCEEDED       UTF8ToWide: 1288 ticks, MultiByteToWideChar: 1258 ticks = -2% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1136 ticks, WideCharToMultiByte: 3114 ticks = 63% faster
SUCCEEDED       WideToUTF8: 1163 ticks, WideCharToMultiByte: 3112 ticks = 62% faster
SUCCEEDED       WideToUTF8: 1146 ticks, WideCharToMultiByte: 3124 ticks = 63% faster
SUCCEEDED       WideToUTF8: 1137 ticks, WideCharToMultiByte: 3099 ticks = 63% faster
SUCCEEDED       WideToUTF8: 1159 ticks, WideCharToMultiByte: 3146 ticks = 63% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1002 ticks, MultiByteToWideChar: 901 ticks = -11% faster
SUCCEEDED       UTF8ToWide: 1003 ticks, MultiByteToWideChar: 899 ticks = -11% faster
SUCCEEDED       UTF8ToWide: 1001 ticks, MultiByteToWideChar: 902 ticks = -10% faster
SUCCEEDED       UTF8ToWide: 1003 ticks, MultiByteToWideChar: 906 ticks = -10% faster
SUCCEEDED       UTF8ToWide: 1006 ticks, MultiByteToWideChar: 900 ticks = -11% faster
Ready
Press any key to continue ...
fixme:console:CONSOLE_DefaultHandler Terminating process 8 on event 0
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

Siekmanski

i7-4930K

-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 484 ticks = 16% faster
SUCCEEDED       WideToUTF8: 407 ticks, WideCharToMultiByte: 468 ticks = 13% faster
SUCCEEDED       WideToUTF8: 422 ticks, WideCharToMultiByte: 469 ticks = 10% faster
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 469 ticks = 13% faster
SUCCEEDED       WideToUTF8: 406 ticks, WideCharToMultiByte: 469 ticks = 13% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 609 ticks, MultiByteToWideChar: 766 ticks = 20% faster
SUCCEEDED       UTF8ToWide: 593 ticks, MultiByteToWideChar: 766 ticks = 22% faster
SUCCEEDED       UTF8ToWide: 609 ticks, MultiByteToWideChar: 766 ticks = 20% faster
SUCCEEDED       UTF8ToWide: 594 ticks, MultiByteToWideChar: 781 ticks = 23% faster
SUCCEEDED       UTF8ToWide: 594 ticks, MultiByteToWideChar: 765 ticks = 22% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
SUCCEEDED       WideToUTF8: 1375 ticks, WideCharToMultiByte: 1875 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 2485 ticks, MultiByteToWideChar: 3312 ticks = 24% faster
SUCCEEDED       UTF8ToWide: 2453 ticks, MultiByteToWideChar: 3297 ticks = 25% faster
SUCCEEDED       UTF8ToWide: 2532 ticks, MultiByteToWideChar: 3296 ticks = 23% faster
SUCCEEDED       UTF8ToWide: 2469 ticks, MultiByteToWideChar: 3297 ticks = 25% faster
SUCCEEDED       UTF8ToWide: 2344 ticks, MultiByteToWideChar: 3250 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 703 ticks, WideCharToMultiByte: 828 ticks = 15% faster
SUCCEEDED       WideToUTF8: 688 ticks, WideCharToMultiByte: 828 ticks = 16% faster
SUCCEEDED       WideToUTF8: 687 ticks, WideCharToMultiByte: 813 ticks = 15% faster
SUCCEEDED       WideToUTF8: 687 ticks, WideCharToMultiByte: 828 ticks = 17% faster
SUCCEEDED       WideToUTF8: 688 ticks, WideCharToMultiByte: 812 ticks = 15% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1234 ticks, MultiByteToWideChar: 1641 ticks = 24% faster
SUCCEEDED       UTF8ToWide: 1219 ticks, MultiByteToWideChar: 1640 ticks = 25% faster
SUCCEEDED       UTF8ToWide: 1360 ticks, MultiByteToWideChar: 1625 ticks = 16% faster
SUCCEEDED       UTF8ToWide: 1250 ticks, MultiByteToWideChar: 1640 ticks = 23% faster
SUCCEEDED       UTF8ToWide: 1235 ticks, MultiByteToWideChar: 1640 ticks = 24% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1235 ticks = -37% faster
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1234 ticks = -38% faster
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1219 ticks = -39% faster
SUCCEEDED       WideToUTF8: 1688 ticks, WideCharToMultiByte: 1234 ticks = -36% faster
SUCCEEDED       WideToUTF8: 1703 ticks, WideCharToMultiByte: 1235 ticks = -37% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       UTF8ToWide: 1640 ticks, MultiByteToWideChar: 1625 ticks = 0% faster
SUCCEEDED       UTF8ToWide: 1641 ticks, MultiByteToWideChar: 1609 ticks = -1% faster
SUCCEEDED       UTF8ToWide: 1656 ticks, MultiByteToWideChar: 1610 ticks = -2% faster
SUCCEEDED       UTF8ToWide: 1656 ticks, MultiByteToWideChar: 1609 ticks = -2% faster
SUCCEEDED       UTF8ToWide: 1657 ticks, MultiByteToWideChar: 1609 ticks = -2% faster
Ready
Press any key to continue ...
Creative coders use backward thinking techniques as a strategy.

Biterider

Hi
LiaoMi, HSE, mineiro and Siekmanski, thank you all for taking the time to test these routines.  :thumbsup:

Based on your results, I decided to concentrate on the last code point range (00h – 7Fh) and unrolled the matching part of the routine by factors of 2x, 4x, 8x, 16x, 32x and 128x. The last factor gave me an idea of what happens in an extreme configuration. After experimenting with these values, I came to the conclusion that 8x is the best factor for my machine. All ranges now show a performance increase of between 15% and 25%.
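To illustrate the idea of an unrolled 00h – 7Fh path, here is a rough C sketch for the UTF8ToWide direction. It is not the posted assembly: the function name widen_ascii_run is hypothetical, and checking 8 bytes at once with a 64-bit mask is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen the leading run of bytes in 00h..7Fh from
   src to 16-bit units in dst. Returns the number of bytes consumed and
   stops at the first byte >= 80h, where a general decoder would take over. */
size_t widen_ascii_run(const uint8_t *src, size_t len, uint16_t *dst)
{
    size_t i = 0;
    assert(src != NULL && dst != NULL);

    /* 8x unrolled part: handle 8 bytes per iteration while all of them
       have bit 7 clear. The mask test does not depend on byte order. */
    while (len - i >= 8) {
        uint64_t chunk;
        memcpy(&chunk, src + i, sizeof chunk);
        if (chunk & 0x8080808080808080ULL)
            break;
        dst[i + 0] = src[i + 0];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
        dst[i + 4] = src[i + 4];
        dst[i + 5] = src[i + 5];
        dst[i + 6] = src[i + 6];
        dst[i + 7] = src[i + 7];
        i += 8;
    }

    /* Tail: remaining bytes one at a time. */
    while (i < len && src[i] < 0x80) {
        dst[i] = src[i];
        i++;
    }
    return i;
}
```

The unrolled loop pays one range check per 8 characters instead of one per character, which is where a gain for pure 00h – 7Fh text would come from; the best factor has to be measured per machine, as described above.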

I looked at the MS implementation and found that the code spends a lot of time analyzing the input values to decide how to proceed. This means that the overhead becomes more relevant for short strings. Tests with short strings (16 code points) confirm this behavior. In this length range, the new routines outperform the originals by more than 60%.  :cool:

I introduced a change in the performance test. I doubled the runs to 10 and compared the best results (the fastest) of all runs. The performance boost now shows very consistent values.
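The comparison can be sketched in C as follows. The helper names are hypothetical, and the percentage formula is only inferred from the numbers printed in this thread (for example 406 vs. 438 ticks shown as 7%, and 1703 vs. 1235 ticks shown as -37%), not taken from the test source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the smallest tick count of all runs. */
unsigned long fastest_run(const unsigned long *ticks, size_t runs)
{
    assert(ticks != NULL && runs > 0);
    unsigned long best = ticks[0];
    for (size_t i = 1; i < runs; i++) {
        if (ticks[i] < best)
            best = ticks[i];
    }
    return best;
}

/* Hypothetical helper: percentage by which the new routine beats the API
   routine, relative to the API time. Integer division truncates toward
   zero, which reproduces the values seen in the logs (an assumption). */
long percent_faster(unsigned long new_ticks, unsigned long api_ticks)
{
    assert(api_ticks > 0);
    return ((long)api_ticks - (long)new_ticks) * 100 / (long)api_ticks;
}
```

Taking the minimum rather than the average filters out runs that were disturbed by other activity on the machine, which would explain why the reported values become more consistent.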

I commented out the CPU scheduling code (SetPriorityClass and SetThreadPriority), and it does not seem to affect the results. With this change, the code should also run in an emulator.

The files in the first post are updated with the described code.

Biterider

Siekmanski

i7-4930K

-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
Fastest run     WideToUTF8: 406 ticks, WideCharToMultiByte: 453 ticks = 10% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 547 ticks, MultiByteToWideChar: 750 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
Fastest run     WideToUTF8: 1343 ticks, WideCharToMultiByte: 1843 ticks = 27% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 2171 ticks, MultiByteToWideChar: 3250 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 671 ticks, WideCharToMultiByte: 797 ticks = 15% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1078 ticks, MultiByteToWideChar: 1593 ticks = 32% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 1156 ticks, WideCharToMultiByte: 1203 ticks = 3% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1171 ticks, MultiByteToWideChar: 1593 ticks = 26% faster
Ready
Press any key to continue ...

mineiro

Hello sir Biterider;
I got a lot of FAILED results, which is why I was not posting results. Other code that I have run under Wine also uses SetPriorityClass and SetThreadPriority. Since you posted the source code, I will try my own measurements here and post the results later. Thanks.

Edit: ranges 0 and 1 failed.
$ wine PerformanceTest.exe
-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 490 ticks, MultiByteToWideChar: 663 ticks = 26% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
FAILED
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1991 ticks, MultiByteToWideChar: 3008 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 620 ticks, WideCharToMultiByte: 881 ticks = 29% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 983 ticks, MultiByteToWideChar: 1494 ticks = 34% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 720 ticks, WideCharToMultiByte: 3087 ticks = 76% faster
Ready

HSE

Hi Biterider!

Quote from: Biterider on April 04, 2020, 10:22:41 PM
All areas now show positive results between 15% and 25% performance increase.
:eusa_naughty:  Almost...  :biggrin: (of course, this system is like a museum piece)

Regards. HSE

LiaoMi

-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
Fastest run     WideToUTF8: 391 ticks, WideCharToMultiByte: 438 ticks = 10% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 515 ticks, MultiByteToWideChar: 672 ticks = 23% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
Fastest run     WideToUTF8: 1344 ticks, WideCharToMultiByte: 1875 ticks = 28% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 2140 ticks, MultiByteToWideChar: 2750 ticks = 22% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 672 ticks, WideCharToMultiByte: 859 ticks = 21% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1047 ticks, MultiByteToWideChar: 1344 ticks = 22% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 828 ticks, WideCharToMultiByte: 1125 ticks = 26% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 984 ticks, MultiByteToWideChar: 1391 ticks = 29% faster
Ready
Press any key to continue ...

FORTRANS

Hi,

   Results from three machines.

Cheers,

Steve N.

mineiro

I commented out the process and thread priority function calls, but with no luck. I also tried switching to GlobalAlloc, without success.

RANGE 0 WideToUTF8
When the source and destination bytes are compared, the result is ecx = 00039FFF, and the contents at esi and edi are:
esi=
004D3011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 ED|
004D3021  9F BF ED 9F|BE ED 9F BD|ED 9F BC ED|9F BB ED 9F|
edi=
00554011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 EF|
00554021  BF BD EF BF|BD EF BF BD|EF BF BD EF|BF BD EF BF|

RANGE 1 WideToUTF8
ECX=79FFF
ESI=
00493011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 ED|
00493021  9F BF ED 9F|BE ED 9F BD|ED 9F BC ED|9F BB ED 9F|
EDI=
00514011  EE 80 84 EE|80 83 EE 80|82 EE 80 81|EE 80 80 EF|
00514021  BF BD EF BF|BD EF BF BD|EF BF BD EF|BF BD EF BF|

Maybe it is a bounds check, or maybe the Wine implementation of WideCharToMultiByte and related functions is not quite right.
The solution to this problem is to blame the intern, it never fails.

jj2007

 :thumbsup:

Preparing the test...
RANGE = 0
Testing WideToUTF8...
Fastest run     WideToUTF8: 499 ticks, WideCharToMultiByte: 702 ticks = 28% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 655 ticks, MultiByteToWideChar: 1139 ticks = 42% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
Fastest run     WideToUTF8: 1778 ticks, WideCharToMultiByte: 3244 ticks = 45% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 2464 ticks, MultiByteToWideChar: 4555 ticks = 45% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
Fastest run     WideToUTF8: 889 ticks, WideCharToMultiByte: 1482 ticks = 40% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1216 ticks, MultiByteToWideChar: 2231 ticks = 45% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
Fastest run     WideToUTF8: 1435 ticks, WideCharToMultiByte: 1372 ticks = -4% faster
Ready

Testing UTF8ToWide...
Fastest run     UTF8ToWide: 1373 ticks, MultiByteToWideChar: 2090 ticks = 34% faster
Ready

Biterider

Hi all
@mineiro/FORTRANS: Thank you for debugging the code and providing additional information.  :thumbsup:
I have found the cause of the problem. In range 0, at ecx = 39FFFh, a lone surrogate occurs for the first time.

The MS documentation (https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte) says:
Quote: Starting with Windows Vista, this function fully conforms with the Unicode 4.1 specification for UTF-8 and UTF-16. The function used on earlier operating systems encodes or decodes lone surrogate halves or mismatched surrogate pairs. Code written in earlier versions of Windows that rely on this behavior to encode random non-text binary data might run into problems. However, code that uses this function to produce valid UTF-8 strings will behave the same way as on earlier Windows operating systems.

This is in line with the tests FORTRANS ran on different operating systems. Win2000, WinXP and Wine seem to behave like the earlier implementation described by MS.
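To make the term concrete: a lone surrogate is a 16-bit unit in D800h..DFFFh that is not part of a valid high/low pair. The C sketch below finds the first one in a wide string. The function name find_lone_surrogate is hypothetical and the code is not part of the posted test. The EF BF BD byte sequences in mineiro's dump are the UTF8 encoding of U+FFFD, the Unicode replacement character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: return the index of the first lone surrogate in
   the UTF-16 buffer s, or len if every surrogate is correctly paired. */
size_t find_lone_surrogate(const uint16_t *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (is_high_surrogate(s[i])) {
            if (i + 1 < len && is_low_surrogate(s[i + 1])) {
                i++;            /* valid pair: skip the low half as well */
                continue;
            }
            return i;           /* high half without a following low half */
        }
        if (is_low_surrogate(s[i]))
            return i;           /* low half without a preceding high half */
    }
    return len;
}
```

A test buffer that simply counts through all 16-bit values will contain such unpaired units, so a check like this can be used to keep them out of the comparison or to report them separately.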

I changed the performance test to show the timings even if the result doesn't match. It is interesting to see how big the difference in performance is in these cases. I also reactivated the CPU scheduling code.  :icon_idea:

Code updated in the first post.

Biterider

Siekmanski

-----------------------------------------------------------
Preparing the test...
RANGE = 0
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 406 ticks, WideCharToMultiByte: 453 ticks = 10% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 547 ticks, MultiByteToWideChar: 750 ticks = 27% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 1
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 1359 ticks, WideCharToMultiByte: 1843 ticks = 26% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 2172 ticks, MultiByteToWideChar: 3234 ticks = 32% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 2
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 672 ticks, WideCharToMultiByte: 796 ticks = 15% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1063 ticks, MultiByteToWideChar: 1593 ticks = 33% faster
Ready
-----------------------------------------------------------
Preparing the test...
RANGE = 3
Testing WideToUTF8...
FAILED  Fastest run     WideToUTF8: 1188 ticks, WideCharToMultiByte: 1203 ticks = 1% faster
Ready

Testing UTF8ToWide...
SUCCEEDED       Fastest run     UTF8ToWide: 1187 ticks, MultiByteToWideChar: 1593 ticks = 25% faster
Ready
Press any key to continue ...