The MASM Forum

Microsoft 64 bit MASM => Examples => Topic started by: ttribelli on November 17, 2023, 03:24:42 AM

Title: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on November 17, 2023, 03:24:42 AM
This repository is mainly for readers of Randy Hyde's various Art of Assembly books, but I thought I'd share it here just in case someone else might find it interesting. The code is a follow-up to a book example. Creating a hexadecimal string just begs for SIMD implementations. Here are Intel SSE and ARM NEON implementations, SSE and NEON C intrinsic implementations, x86-64 and AArch64 implementations, and an ordinary C implementation for comparison. Windows, macOS, and Linux.

The repository is a testbed; it's not trying for the fastest code possible. The "audience" is students, so I went for readability at times. Different algorithms may be faster depending on the underlying hardware: table lookup or computed hex digits; copying individual digits to the output buffer, or collecting digits in a register and only copying to the output buffer when the register is full; and assembly vs intrinsics. If a specific compiler does not optimize intrinsics well, then even my not-fully-optimized assembly can beat intrinsics.
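To make the trade-offs above concrete, here is a minimal plain C sketch of two of the variants mentioned: table lookup with one store per digit, and computed digits collected in a 64-bit register with a single store at the end. The function names are hypothetical and this is not the repository's code. The second function assumes a little-endian target (x86-64, typical AArch64), so that the lowest byte of the register lands first in memory.

```c
#include <stdint.h>
#include <string.h>

/* Variant 1: table lookup, one byte stored per digit.
   hex32_lookup is a hypothetical name used only for this sketch. */
void hex32_lookup(uint32_t value, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 0; i < 8; i++) {
        /* Most significant nibble first. */
        out[i] = digits[(value >> (28 - 4 * i)) & 0xFu];
    }
    out[8] = '\0';
}

/* Variant 2: computed digits, collected in a register and stored once.
   Assumes little-endian byte order (labelled assumption). */
void hex32_computed(uint32_t value, char out[9])
{
    uint64_t packed = 0;
    for (int i = 0; i < 8; i++) {
        unsigned nibble = (value >> (28 - 4 * i)) & 0xFu;
        /* 0..9 -> '0'..'9', 10..15 -> 'A'..'F' without a table. */
        unsigned ch = nibble + '0' + (nibble > 9 ? 'A' - '0' - 10 : 0);
        /* The first digit goes into the lowest byte of the register. */
        packed |= (uint64_t)ch << (8 * i);
    }
    memcpy(out, &packed, 8);   /* one 8-byte store instead of eight stores */
    out[8] = '\0';
}
```

Both functions should produce the same string for the same input; which one is faster depends on the hardware and on how the compiler treats the loop, so it has to be measured.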

Forgive me for a repository that has both masm and gas code, but I am not a complete heathen and configured gas for Intel syntax so that Mac programmers may see the light. :-)

https://github.com/atribelli/hexstr
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: jj2007 on November 17, 2023, 06:13:42 AM
Thanks. I see two asm files in hexstr-main.zip, hexstr-sse and hexstr-sse, what's their respective role?
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on November 17, 2023, 06:34:34 AM
There are source and make files for both Windows masm and nmake, and macOS/Linux gas and make. The gas .s files are using Intel syntax mode. Note that the code is slightly different between Windows and macOS/Linux. Different registers are used for passing parameters from C to assembly.

Files

makefile - macOS and Linux based builds.
hexstr.mak - Windows based builds.
main.cpp - Timing code.
hexstr.h - Prototypes for hex string conversion functions.
hexstr.c - C and SSE and NEON intrinsic implementations.
hexstr-x64.s - x86-64 assembly implementation (gas).
hexstr-sse.s - SSE implementation (gas).
hexstr-a64.s - AArch64 assembly implementation.
hexstr-neon.s - NEON implementation.
hexstr-x64.asm - x86-64 assembly implementation (masm).
hexstr-sse.asm - SSE implementation (masm).

Building

make - Creates C and intrinsics based code, hexstr-c hexstr-intrin.
make intel - Creates assembly and SSE code, hexstr-x64 hexstr-sse.
make arm - Creates assembly and NEON code, hexstr-a64 hexstr-neon.
make clean - Removes executable and build files.
nmake /f hexstr.mak - Create all executables for Windows.
nmake /f hexstr.mak clean - Removes executable and build files under Windows.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: Vortex on November 17, 2023, 06:46:57 AM
Hi ttribelli,

Thanks for the code. I could not manage to assemble this module :

D:\FreeBASIC\bin\win64>as.exe --version
GNU assembler (Binutils for MinGW-W64 x86_64, built by Brecht Sanders) 2.34
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-w64-mingw32'.

D:\FreeBASIC\bin\win64>as -o hexstr-x64.o hexstr-x64.s
Conditional Assembly: Lookup digits
Conditional Assembly: Output bytes
hexstr-x64.s: Assembler messages:
hexstr-x64.s:118: Error: bad expression
hexstr-x64.s:118: Error: invalid use of register
hexstr-x64.s:121: Error: bad expression
hexstr-x64.s:121: Error: invalid use of register
hexstr-x64.s:123: Error: bad expression
hexstr-x64.s:123: Error: invalid use of register
hexstr-x64.s:123: Error: bad expression
hexstr-x64.s:123: Error: invalid use of register
hexstr-x64.s:128: Error: bad expression
hexstr-x64.s:128: Error: invalid use of register
hexstr-x64.s:128: Error: bad expression
hexstr-x64.s:128: Error: invalid use of register
hexstr-x64.s:133: Error: bad expression
hexstr-x64.s:133: Error: invalid use of register
hexstr-x64.s:133: Error: bad expression
hexstr-x64.s:133: Error: invalid use of register
hexstr-x64.s:138: Error: bad expression
hexstr-x64.s:138: Error: invalid use of register
hexstr-x64.s:138: Error: bad expression
hexstr-x64.s:138: Error: invalid use of register
hexstr-x64.s:143: Error: bad expression
hexstr-x64.s:143: Error: invalid use of register

Trying to assemble in the Msys2 environment :

# as -o hexstr-x64.o hexstr-x64.s
Conditional Assembly: Lookup digits
Conditional Assembly: Output bytes
hexstr-x64.s: Assembler messages:
hexstr-x64.s:118: Error: bad expression
hexstr-x64.s:118: Error: invalid use of register
hexstr-x64.s:121: Error: bad expression
hexstr-x64.s:121: Error: invalid use of register
hexstr-x64.s:58: Error: bad expression
hexstr-x64.s:91:  Info: macro invoked from here
hexstr-x64.s:123:   Info: macro invoked from here
hexstr-x64.s:58: Error: invalid use of register
hexstr-x64.s:91:  Info: macro invoked from here
hexstr-x64.s:123:   Info: macro invoked from here
hexstr-x64.s:59: Error: bad expression
hexstr-x64.s:91:  Info: macro invoked from here
hexstr-x64.s:123:   Info: macro invoked from here
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on November 17, 2023, 08:22:15 AM
Quote from: Vortex on November 17, 2023, 06:46:57 AM
Thanks for the code. I could not manage to assemble this module :
D:\FreeBASIC\bin\win64>as.exe --version
GNU assembler (Binutils for MinGW-W64 x86_64, built by Brecht Sanders) 2.34
This assembler was configured for a target of `x86_64-w64-mingw32'.

I replicated the problem on Debian. I screwed up and only tested the Intel .s under macOS and not Linux. All my Linux testing was ARM.

While waiting for a repository update, if you have Windows, try nmake and hexstr.mak.

Thanks for your patience.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: jj2007 on November 17, 2023, 11:58:57 AM
Quote from: ttribelli on November 17, 2023, 06:34:34 AM
hexstr-sse.asm - SSE implementation (masm)

AMD Athlon Gold 3150U with Radeon Graphics      (SSE4)

417     cycles for 100 * Hex$ (MasmBasic)
550     cycles for 100 * ToHexStr
5907    cycles for 100 * hex$ (Masm32 SDK)
46498   cycles for 100 * crt Printf()

374     cycles for 100 * Hex$ (MasmBasic)
527     cycles for 100 * ToHexStr
5820    cycles for 100 * hex$ (Masm32 SDK)
46328   cycles for 100 * crt Printf()

361     cycles for 100 * Hex$ (MasmBasic)
524     cycles for 100 * ToHexStr
5934    cycles for 100 * hex$ (Masm32 SDK)
46904   cycles for 100 * crt Printf()

371     cycles for 100 * Hex$ (MasmBasic)
558     cycles for 100 * ToHexStr
5673    cycles for 100 * hex$ (Masm32 SDK)
46998   cycles for 100 * crt Printf()

Hex$ (MasmBasic)                        12345600
ToHexStr                                12345600
hex$ (Masm32 SDK)                       12345600
crt Printf()                            12345600

Congrats, it's really fast :thumbsup:
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on November 17, 2023, 03:34:23 PM
Quote from: jj2007 on November 17, 2023, 11:58:57 AM
Quote from: ttribelli on November 17, 2023, 06:34:34 AM
hexstr-sse.asm - SSE implementation (masm)

Congrats, it's really fast :thumbsup:

Thank you. It could get faster. Removing the macro and inlining the code would provide more interleaving opportunities.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: NoCforMe on November 17, 2023, 07:20:02 PM
Quote from: jj2007 on November 17, 2023, 11:58:57 AM
Congrats, it's really fast :thumbsup:

Wow. (Almost) two orders of magnitude compared to the CRT routine. (Maybe that shouldn't surprise us ...)
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: jj2007 on November 17, 2023, 08:12:02 PM
Quote from: NoCforMe on November 17, 2023, 07:20:02 PM
Wow. (Almost) two orders of magnitude compared to the CRT routine. (Maybe that shouldn't surprise us ...)

Remember that nowadays C/C++ compilers are faster than hand-made Assembly (according to C/C++ programmers) :cool:
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: Vortex on November 18, 2023, 12:58:32 AM
Not easy to compete with the C\C++ compilers especially if the case is large projects.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: jj2007 on November 18, 2023, 01:43:43 AM
I wonder why they are not using their fast compilers for the common good :cool:
361     cycles for 100 * Hex$ (MasmBasic)
46904   cycles for 100 * crt Printf()
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: TimoVJL on November 18, 2023, 02:07:21 AM
printf and sprintf are standard, versatile C runtime functions, not purpose-specific like yours.
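A rough C sketch of that difference, with hypothetical function names: the CRT route has to interpret a format string on every call, while a dedicated routine only does the nibble extraction. The timing helper uses clock() as a coarse stand-in for the cycle counts posted earlier; it is an assumption for illustration, not the benchmark used in this thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef void (*hex32_fn)(uint32_t, char *);

/* General purpose: snprintf parses "%08X" at run time on each call. */
void hex32_crt(uint32_t value, char *out)
{
    snprintf(out, 9, "%08" PRIX32, value);
}

/* Single purpose: eight table lookups, nothing to parse. */
void hex32_dedicated(uint32_t value, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; i--) {
        out[i] = digits[value & 0xFu];   /* least significant nibble last */
        value >>= 4;
    }
    out[8] = '\0';
}

/* Hypothetical helper: CPU seconds spent on `calls` conversions. */
double seconds_for(hex32_fn fn, uint32_t calls)
{
    char buf[9];
    volatile char sink = 0;              /* keeps the loop from being removed */
    clock_t start = clock();
    for (uint32_t i = 0; i < calls; i++) {
        fn(i, buf);
        sink ^= buf[7];
    }
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling seconds_for(hex32_crt, n) and seconds_for(hex32_dedicated, n) with a large n gives a first impression of the gap; the exact ratio will vary with the CRT, compiler and CPU.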
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on November 18, 2023, 04:48:11 AM
Quote from: Vortex on November 18, 2023, 12:58:32 AM
Not easy to compete with the C\C++ compilers especially if the case is large projects.

That is an unknown until you have profiled the code and discovered any hot spots. It's not about size, it's about the percentage of time spent in hot spots.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: Vortex on November 18, 2023, 06:25:09 AM
Hi ttribelli,

This has nothing to do with unknown cases and hot spots. Maybe I should be clearer: imagine that I am tasked to write a very very big application in assembly. Trying to optimize the whole application would be a very tiring task too. On the other side, you could write the same application with an optimizing C\C++ compiler. The optimizing engine does not get tired like me or another human. From a practical point of view, the C\C++ compiler can be the winner.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on November 18, 2023, 06:32:34 AM
Quote from: Vortex on November 18, 2023, 06:25:09 AM
imagine that I am tasked to write a very very big application in assembly.

In that case advise HR that your manager may need medical intervention. :-)
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: Vortex on November 18, 2023, 06:39:30 AM
Hi ttribelli,

Compared to a human, a machine is much more robust and productive in some cases. That was the point of my example.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on November 18, 2023, 06:49:28 AM
Quote from: Vortex on November 18, 2023, 06:39:30 AM
Compared to a human, a machine is much more robust and productive in some cases. That was the point of my example.

I agree, until the profiler tells me about serious hotspots. Modern C/C++ compilers have inline assembly or SIMD intrinsic support for very good reasons.
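For readers who have not used intrinsics, here is a minimal sketch of what a SIMD hex conversion can look like from C, using only SSE2. It is an illustration under stated assumptions, not the code in the repository: hex32_simd is a hypothetical name, the digits are computed rather than looked up, and a scalar fallback is compiled when SSE2 is not available (for example on ARM).

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>               /* SSE2 intrinsics */
#define HEX32_HAVE_SSE2 1
#endif

/* Hypothetical helper: writes 8 uppercase hex digits plus a NUL. */
void hex32_simd(uint32_t value, char out[9])
{
#ifdef HEX32_HAVE_SSE2
    /* Reverse the bytes so the most significant byte sits in lane 0. */
    uint32_t swapped = (value >> 24) | ((value >> 8) & 0x0000FF00u) |
                       ((value << 8) & 0x00FF0000u) | (value << 24);
    __m128i bytes = _mm_cvtsi32_si128((int)swapped);
    __m128i mask  = _mm_set1_epi8(0x0F);

    /* Split every byte into its high and low nibble. */
    __m128i hi = _mm_and_si128(_mm_srli_epi16(bytes, 4), mask);
    __m128i lo = _mm_and_si128(bytes, mask);

    /* Interleave to hi0, lo0, hi1, lo1, ... which is the digit order. */
    __m128i nibbles = _mm_unpacklo_epi8(hi, lo);

    /* Add '0', plus 7 more for nibbles above 9 so 10..15 become 'A'..'F'. */
    __m128i above9 = _mm_cmpgt_epi8(nibbles, _mm_set1_epi8(9));
    __m128i ascii  = _mm_add_epi8(
        _mm_add_epi8(nibbles, _mm_set1_epi8('0')),
        _mm_and_si128(above9, _mm_set1_epi8('A' - '0' - 10)));

    /* Store the low 8 bytes: all eight digits in one write. */
    _mm_storel_epi64((__m128i *)out, ascii);
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 8; i++) {
        unsigned nibble = (value >> (28 - 4 * i)) & 0xFu;
        out[i] = (char)(nibble + (nibble > 9 ? 'A' - 10 : '0'));
    }
#endif
    out[8] = '\0';
}
```

Whether this beats a scalar loop or hand-written assembly depends on the compiler and the CPU, which is the reason to profile first.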
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: jj2007 on November 18, 2023, 10:17:25 AM
Quote from: Vortex on November 18, 2023, 06:25:09 AM
imagine that I am tasked to write a very very big application in assembly. Trying to optimize the whole application would be a very tiring task too. On the other side, you could write the same application with an optimizing C\C++ compiler. The optimizing engine does not get tired like me or another human. From a practical point of view, the C\C++ compiler can be the winner.

The compiler can be the winner, but it depends on a number of factors:
- I am a lousy C programmer; programming a simple string like Let my$="Today is the "+fDate$()+", and it's "+fTime$() would cost me ages in C but a few seconds in Assembly;
- we are Assembly programmers: we know exactly how to design an application for speed;
- we know how to recognise an innermost loop, and how to tickle out of the cpu the minimum cycle count inside that loop;
- we are Windows API experts: our code will not be portable, but we can make the best use of Windows without bloated libraries like Electron or QT.

In short: it depends. Maybe the average C programmer can beat the average Assembly programmer for a medium-sized project when looking at the function Overall performance = f(development time, application runtime, user experience). However, 1. we are not average Assembly programmers here in this forum and 2. our overall performance function will give zero weight to the development time factor simply because we are hobby programmers who enjoy coding: it's so much fun to beat a C function :biggrin:
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: TimoVJL on November 18, 2023, 03:57:04 PM
Quote from: jj2007 on November 18, 2023, 10:17:25 AM
- I am a lousy C programmer; programming a simple string like Let my$="Today is the "+fDate$()+", and it's "+fTime$() would cost me ages in C but a few seconds in Assembly;
I am a bit of a beginner in assembly language, so which assembler do you use with that example? The example just looks like an old BASIC dialect or VBScript.
I am a bit rusty with C, but:
#include <stdio.h>
#include <time.h>

int __cdecl main(void)
{
    time_t tm = time(NULL);
    printf("Today is the %s\n", ctime(&tm));
    return 0;
}
outputs
Today is the Sat Nov 18 06:48:56 2023
and exe size can be 3 072 bytes / 3 584 bytes (x64), if OS msvcrt.dll is used.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: Vortex on November 18, 2023, 08:29:42 PM
Hi Timo,

The code above is MasmBasic :

https://masm32.com/board/index.php?board=57.0

MasmBasic allows string concatenation like the traditional Basic dialects.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: jj2007 on November 18, 2023, 09:46:01 PM
Quote from: TimoVJL on November 18, 2023, 03:57:04 PM
I am a bit beginner in assembly language, so what assembler you use with that example ?

MASM or UAsm, sometimes also AsmC or JWasm.

Btw your example is not equivalent: you are just printing stuff to the console, while I am assigning it to a string for further use.

@Erol: thanks, but Timo is just pulling my leg :biggrin:
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: adeyblue on November 19, 2023, 08:33:01 AM
Quote from: jj2007 on November 18, 2023, 09:46:01 PM
Btw your example is not equivalent: you are just printing stuff to the console, while I am assigning it to a string for further use.
strftime
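#include <stdio.h>
#include <time.h>

int __cdecl main(void)
{
    time_t now = time(0);
    char buff[200];
    /* Format into a buffer for further use instead of printing directly.
       %#x (long date) is a Microsoft CRT extension; %X is the locale's time. */
    strftime(buff, sizeof buff, "Today is %#x and it's %X", localtime(&now));
    puts(buff);
    return 0;
}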
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: daydreamer on November 24, 2023, 05:11:36 PM
Jochen, well, some assembler programmers have spent much time programming SIMD.
About code productivity: with many years of asm programming, we can whip up a program faster than HLL programmers expect, since they think asm development is slow compared to C,
and certainly faster than a newbie asm programmer.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: jj2007 on November 24, 2023, 08:00:12 PM
Yes indeed, Magnus :thumbsup:

It's all about "socialisation". Adeyblue whips up his strftime example in 2 minutes because he is a professional, and as such he has spent most of his career coding in C or C++. Same for Timo.

We are mostly hobby coders here, but we are fluent in Assembly, so we can do the same in our domain. Besides, you and me and many others here have also gone well beyond what C/C++ guys can/will do in terms of SIMD programming.

P.S., I just hacked together a little challenge: How many files in \Masm32\Examples use ComCtr32.inc? (https://masm32.com/board/index.php?topic=11493.0)
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on January 03, 2024, 07:49:01 AM
Quote from: Vortex on November 17, 2023, 06:46:57 AM
Hi ttribelli,

Thanks for the code. I could not manage to assemble this module :

D:\FreeBASIC\bin\win64>as -o hexstr-x64.o hexstr-x64.s

I apologize for the delay but the code is building and running properly under Debian now.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: Vortex on January 04, 2024, 04:46:09 AM
Hi ttribelli,

No worries. Do you have a new version of the source code for members operating on Windows? Thanks.
Title: Re: Intel SSE and ARM NEON hexadecimal string
Post by: ttribelli on January 04, 2024, 05:00:56 AM
Quote from: Vortex on January 04, 2024, 04:46:09 AM
No worries. Do you have a new version of the source code for members operating on Windows? Thanks.

The problem was specific to the Mac/Linux version (.s), and only manifested on Linux. The Windows version (.asm) always worked.

The Mac/Linux version (.s) cannot be compiled on Windows using a GNU toolchain. The C ABI (register usage) is different between Windows and Mac/Linux.

The updated code has been tested on Debian and macOS.
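As a small illustration of the register difference mentioned above, the following C sketch reports where the first four integer or pointer arguments of a C function arrive under the calling convention the file is compiled for. The function name is hypothetical; the register lists are those of the Microsoft x64 convention, the System V AMD64 ABI (Linux, macOS) and AAPCS64.

```c
#include <stdio.h>

/* Hypothetical helper: names the registers that carry the first four
   integer/pointer arguments for the current target's C ABI. */
const char *first_four_int_arg_registers(void)
{
#if defined(_WIN64) && (defined(__x86_64__) || defined(_M_X64))
    return "RCX, RDX, R8, R9";       /* Microsoft x64 (MSVC, MinGW-w64) */
#elif defined(__x86_64__)
    return "RDI, RSI, RDX, RCX";     /* System V AMD64 (Linux, macOS) */
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "X0, X1, X2, X3";         /* AAPCS64 */
#else
    return "unknown";                /* target not covered by this sketch */
#endif
}
```

So for an assumed prototype such as void to_hex(char *dst, unsigned value), an assembly routine written for Linux or macOS expects dst in RDI and value in ESI, while a C caller built for Windows passes them in RCX and EDX. That is why the same routine needs different register usage in the .s and .asm files.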