This repository is mainly for readers of Randy Hyde's various Art of Assembly books, but I thought I'd share it here just in case someone else might find it interesting. The code is a follow-up to a book example. Creating a hexadecimal string just begs for SIMD implementations. Here are Intel SSE and ARM NEON assembly implementations, SSE and NEON C intrinsic implementations, plain x86-64 and AArch64 assembly implementations, and an ordinary C implementation for comparison. It builds on Windows, macOS, and Linux.
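To illustrate why hex string conversion suits SIMD, here is a minimal sketch in C using SSE2 intrinsics. It is not the repository's code: the name `hex32_simd` is hypothetical, the digits are computed rather than looked up, and a scalar fallback is included as an assumption so the sketch still builds on non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

/* Hypothetical helper, not from the repository: writes the 8 uppercase hex
   digits of v plus a terminating NUL into out (9 bytes). */
static void hex32_simd(uint32_t v, char out[9])
{
    assert(out != NULL);
#ifdef HAVE_SSE2
    /* Reverse the byte order so the most significant byte is stored first. */
    uint32_t be = (v >> 24) | ((v >> 8) & 0x0000FF00u) |
                  ((v << 8) & 0x00FF0000u) | (v << 24);
    __m128i x    = _mm_cvtsi32_si128((int)be);
    __m128i mask = _mm_set1_epi8(0x0F);
    /* Split every byte into its high and low nibble. */
    __m128i hi   = _mm_and_si128(_mm_srli_epi16(x, 4), mask);
    __m128i lo   = _mm_and_si128(x, mask);
    /* Interleave: hi0, lo0, hi1, lo1, ... which is the output digit order. */
    __m128i nib  = _mm_unpacklo_epi8(hi, lo);
    /* Computed digits: add '0', then 7 more for nibbles above 9. */
    __m128i gt9  = _mm_cmpgt_epi8(nib, _mm_set1_epi8(9));
    __m128i adj  = _mm_and_si128(gt9, _mm_set1_epi8('A' - '9' - 1));
    __m128i chr  = _mm_add_epi8(_mm_add_epi8(nib, _mm_set1_epi8('0')), adj);
    /* One 8-byte store writes all eight digits. */
    _mm_storel_epi64((__m128i *)out, chr);
#else
    /* Scalar fallback for targets without SSE2. */
    for (int i = 0; i < 8; i++) {
        unsigned n = (v >> (28 - 4 * i)) & 0xFu;
        out[i] = (char)(n < 10 ? '0' + n : 'A' + (n - 10));
    }
#endif
    out[8] = '\0';
}
```

All eight digits are produced by the same handful of instructions, with no per-digit loop or branch, which is the property that makes the problem attractive for SIMD.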
The repository is a testbed; it's not trying for the fastest code possible. The "audience" is students, so I went for readability at times. Different algorithms may be faster depending on the underlying hardware: table lookup versus computed hex digits, or copying individual digits to the output buffer versus collecting digits in a register and copying to the output buffer only when the register is full. There is also assembly versus intrinsics: if a specific compiler does not optimize intrinsics well, then even my not-fully-optimized assembly can beat them.
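The trade-offs listed above can be sketched in plain C. This is only an illustration with hypothetical function names, not the repository's code, and the packed variant assumes a little-endian target (true for x86-64 and for AArch64 as normally configured).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Digit option 1: table lookup (one memory read per digit). */
static char digit_lookup(unsigned n)
{
    static const char table[] = "0123456789ABCDEF";
    return table[n & 0xFu];
}

/* Digit option 2: computed digit (arithmetic only, no table). */
static char digit_computed(unsigned n)
{
    n &= 0xFu;
    return (char)(n + '0' + (n > 9 ? 'A' - '9' - 1 : 0));
}

/* Output option A: store each digit to the buffer as it is produced. */
static void hex32_bytes(uint32_t v, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = digit_lookup(v >> (28 - 4 * i));
    out[8] = '\0';
}

/* Output option B: collect all eight digits in one 64-bit value and copy
   it to the buffer once. Assumes a little-endian target. */
static void hex32_packed(uint32_t v, char out[9])
{
    uint64_t acc = 0;
    assert(out != NULL);
    for (int i = 0; i < 8; i++) {
        /* The least significant nibble goes in first and ends up in the
           top byte, so the most significant digit lands in byte 0. */
        acc = (acc << 8) | (unsigned char)digit_computed(v >> (4 * i));
    }
    memcpy(out, &acc, 8);
    out[8] = '\0';
}
```

Which combination wins depends on the hardware and the compiler, so each one has to be measured rather than assumed.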
Forgive me for a repository that has both masm and gas code, but I am not a complete heathen and configured gas for Intel syntax so that Mac programmers may see the light. :-)
https://github.com/atribelli/hexstr
Thanks. I see two asm files in hexstr-main.zip, hexstr-sse.asm and hexstr-sse.s. What are their respective roles?
There are source and make files for both Windows masm and nmake, and macOS/Linux gas and make. The gas .s files are using Intel syntax mode. Note that the code is slightly different between Windows and macOS/Linux. Different registers are used for passing parameters from C to assembly.
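For reference, the register difference on x86-64 is this: the Microsoft x64 convention passes the first four integer or pointer arguments in RCX, RDX, R8, and R9, while the System V AMD64 convention used by macOS and Linux passes the first six in RDI, RSI, RDX, RCX, R8, and R9. A small C sketch that encodes the mapping follows; the names `INT_ARG_REG_COUNT` and `int_arg_register` are hypothetical and not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* x86-64 only: number of integer/pointer arguments passed in registers
   under the calling convention of the OS this file is compiled for. */
#if defined(_WIN32)
#define INT_ARG_REG_COUNT 4   /* Microsoft x64 convention */
#else
#define INT_ARG_REG_COUNT 6   /* System V AMD64 convention */
#endif

/* Hypothetical helper: name of the register that carries integer
   argument number `index` (0-based) from C into an assembly routine. */
static const char *int_arg_register(size_t index)
{
#if defined(_WIN32)
    static const char *const regs[] = { "rcx", "rdx", "r8", "r9" };
#else
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
#endif
    assert(index < INT_ARG_REG_COUNT);
    return regs[index];
}
```

An assembly routine written for one convention looks for its arguments in the wrong registers under the other, which is why each platform needs its own source file.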
Files
makefile - macOS and Linux based builds.
hexstr.mak - Windows based builds.
main.cpp - Timing code.
hexstr.h - Prototypes for hex string conversion functions.
hexstr.c - C and SSE and NEON intrinsic implementations.
hexstr-x64.s - x86-64 assembly implementation (gas).
hexstr-sse.s - SSE implementation (gas).
hexstr-a64.s - AArch64 assembly implementation.
hexstr-neon.s - NEON implementation.
hexstr-x64.asm - x86-64 assembly implementation (masm).
hexstr-sse.asm - SSE implementation (masm).
Building
make - Creates C and intrinsics based code, hexstr-c hexstr-intrin.
make intel - Creates assembly and SSE code, hexstr-x64 hexstr-sse.
make arm - Creates assembly and NEON code, hexstr-a64 hexstr-neon.
make clean - Removes executable and build files.
nmake /f hexstr.mak - Create all executables for Windows.
nmake /f hexstr.mak clean - Removes executable and build files under Windows.
Hi ttribelli,
Thanks for the code. I could not manage to assemble this module :
D:\FreeBASIC\bin\win64>as.exe --version
GNU assembler (Binutils for MinGW-W64 x86_64, built by Brecht Sanders) 2.34
Copyright (C) 2020 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-w64-mingw32'.
D:\FreeBASIC\bin\win64>as -o hexstr-x64.o hexstr-x64.s
Conditional Assembly: Lookup digits
Conditional Assembly: Output bytes
hexstr-x64.s: Assembler messages:
hexstr-x64.s:118: Error: bad expression
hexstr-x64.s:118: Error: invalid use of register
hexstr-x64.s:121: Error: bad expression
hexstr-x64.s:121: Error: invalid use of register
hexstr-x64.s:123: Error: bad expression
hexstr-x64.s:123: Error: invalid use of register
hexstr-x64.s:123: Error: bad expression
hexstr-x64.s:123: Error: invalid use of register
hexstr-x64.s:128: Error: bad expression
hexstr-x64.s:128: Error: invalid use of register
hexstr-x64.s:128: Error: bad expression
hexstr-x64.s:128: Error: invalid use of register
hexstr-x64.s:133: Error: bad expression
hexstr-x64.s:133: Error: invalid use of register
hexstr-x64.s:133: Error: bad expression
hexstr-x64.s:133: Error: invalid use of register
hexstr-x64.s:138: Error: bad expression
hexstr-x64.s:138: Error: invalid use of register
hexstr-x64.s:138: Error: bad expression
hexstr-x64.s:138: Error: invalid use of register
hexstr-x64.s:143: Error: bad expression
hexstr-x64.s:143: Error: invalid use of register
Trying to assemble in the Msys2 environment :
# as -o hexstr-x64.o hexstr-x64.s
Conditional Assembly: Lookup digits
Conditional Assembly: Output bytes
hexstr-x64.s: Assembler messages:
hexstr-x64.s:118: Error: bad expression
hexstr-x64.s:118: Error: invalid use of register
hexstr-x64.s:121: Error: bad expression
hexstr-x64.s:121: Error: invalid use of register
hexstr-x64.s:58: Error: bad expression
hexstr-x64.s:91: Info: macro invoked from here
hexstr-x64.s:123: Info: macro invoked from here
hexstr-x64.s:58: Error: invalid use of register
hexstr-x64.s:91: Info: macro invoked from here
hexstr-x64.s:123: Info: macro invoked from here
hexstr-x64.s:59: Error: bad expression
hexstr-x64.s:91: Info: macro invoked from here
hexstr-x64.s:123: Info: macro invoked from here
Quote:
Thanks for the code. I could not manage to assemble this module :
D:\FreeBASIC\bin\win64>as.exe --version
GNU assembler (Binutils for MinGW-W64 x86_64, built by Brecht Sanders) 2.34
This assembler was configured for a target of `x86_64-w64-mingw32'.
I replicated the problem on Debian. I screwed up and only tested the Intel .s under macOS and not Linux. All my Linux testing was ARM.
While waiting for a repository update, if you have Windows, try nmake and hexstr.mak.
Thanks for your patience.
Quote from: ttribelli on November 17, 2023, 06:34:34 AM
hexstr-sse.asm - SSE implementation (masm)
AMD Athlon Gold 3150U with Radeon Graphics (SSE4)
417 cycles for 100 * Hex$ (MasmBasic)
550 cycles for 100 * ToHexStr
5907 cycles for 100 * hex$ (Masm32 SDK)
46498 cycles for 100 * crt Printf()
374 cycles for 100 * Hex$ (MasmBasic)
527 cycles for 100 * ToHexStr
5820 cycles for 100 * hex$ (Masm32 SDK)
46328 cycles for 100 * crt Printf()
361 cycles for 100 * Hex$ (MasmBasic)
524 cycles for 100 * ToHexStr
5934 cycles for 100 * hex$ (Masm32 SDK)
46904 cycles for 100 * crt Printf()
371 cycles for 100 * Hex$ (MasmBasic)
558 cycles for 100 * ToHexStr
5673 cycles for 100 * hex$ (Masm32 SDK)
46998 cycles for 100 * crt Printf()
Hex$ (MasmBasic) 12345600
ToHexStr 12345600
hex$ (Masm32 SDK) 12345600
crt Printf() 12345600
Congrats, it's really fast :thumbsup:
Quote from: jj2007 on November 17, 2023, 11:58:57 AM
Quote from: ttribelli on November 17, 2023, 06:34:34 AM
hexstr-sse.asm - SSE implementation (masm)
Congrats, it's really fast :thumbsup:
Thank you. It could get faster. Removing the macro and inlining the code would provide more interleaving opportunities.
Quote from: jj2007 on November 17, 2023, 11:58:57 AM
Congrats, it's really fast :thumbsup:
Wow. (Almost) two orders of magnitude compared to the CRT routine. (Maybe that shouldn't surprise us ...)
Quote from: NoCforMe on November 17, 2023, 07:20:02 PM
Wow. (Almost) two orders of magnitude compared to the CRT routine. (Maybe that shouldn't surprise us ...)
Remember that nowadays C/C++ compilers are faster than hand-made Assembly (according to C/C++ programmers) :cool:
Not easy to compete with the C\C++ compilers especially if the case is large projects.
I wonder why they are not using their fast compilers for the common good :cool:
361 cycles for 100 * Hex$ (MasmBasic)
46904 cycles for 100 * crt Printf()
printf and sprintf are standard, versatile C runtime functions, not purpose-built like yours.
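To make that difference concrete, here is a minimal C sketch with hypothetical function names. The CRT route has to parse a format string at run time, while a purpose-built routine knows the width and base in advance. Both produce the same text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* General-purpose route: snprintf interprets "%08lX" on every call. */
static void hex32_crt(uint32_t v, char out[9])
{
    int n = snprintf(out, 9, "%08lX", (unsigned long)v);
    assert(n == 8);
    (void)n;
}

/* Purpose-built route: fixed width and base, no format parsing. */
static void hex32_direct(uint32_t v, char out[9])
{
    static const char table[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; i--) {
        out[i] = table[v & 0xFu];   /* fill from the last digit backwards */
        v >>= 4;
    }
    out[8] = '\0';
}

/* Returns 1 when both routes produce the same string for v. */
static int same_output(uint32_t v)
{
    char a[9], b[9];
    hex32_crt(v, a);
    hex32_direct(v, b);
    return strcmp(a, b) == 0;
}
```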
Quote from: Vortex on November 18, 2023, 12:58:32 AM
Not easy to compete with the C\C++ compilers especially if the case is large projects.
That is an unknown until you have profiled the code and discovered any hot spots. It's not about size, it's about the percentage of time in hot spots.
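The "percentage of time" point is essentially Amdahl's law: if a hot spot takes a fraction p of the run time and is made s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). A small C sketch, with a hypothetical function name:

```c
#include <assert.h>

/* Amdahl's law.
   p: fraction of total run time spent in the hot spot (0..1)
   s: speedup achieved on the hot spot alone (> 0)
   Returns the resulting whole-program speedup. */
static double overall_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

With illustrative numbers: making a routine 100 times faster gains only about 1.05x overall if it accounts for 5% of the run time, but about 4.8x if it accounts for 80%.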
Hi ttribelli,
This has nothing to do with unknown cases and hot spots. Maybe I should be clearer: imagine that I am tasked to write a very very big application in assembly. Trying to optimize the whole application would be a very tiring task too. On the other side, you could write the same application with an optimizing C\C++ compiler. The optimizing engine does not get tired like me or another human. From a practical point of view, the C\C++ compiler can be the winner.
Quote from: Vortex on November 18, 2023, 06:25:09 AM
imagine that I am tasked to write a very very big application in assembly.
In that case advise HR that your manager may need medical intervention. :-)
Hi ttribelli,
Compared to a human, a machine is much more robust and productive in some cases. That was the point of my example.
Quote from: Vortex on November 18, 2023, 06:39:30 AM
Compared to a human, a machine is much more robust and productive in some cases. That was the point of my example.
I agree, until the profiler tells me about serious hotspots. Modern C/C++ compilers have inline assembly or SIMD intrinsic support for very good reasons.
Quote from: Vortex on November 18, 2023, 06:25:09 AM
imagine that I am tasked to write a very very big application in assembly. Trying to optimize the whole application would be a very tiring task too. On the other side, you could write the same application with an optimizing C\C++ compiler. The optimizing engine does not get tired like me or another human. From a practical point of view, the C\C++ compiler can be the winner.
The compiler can be the winner, but it depends on a number of factors:
- I am a lousy C programmer; programming a simple string like Let my$="Today is the "+fDate$()+", and it's "+fTime$() would cost me ages in C but a few seconds in Assembly;
- we are Assembly programmers: we know exactly how to design an application for speed;
- we know how to recognise an innermost loop, and how to tickle out of the cpu the minimum cycle count inside that loop;
- we are Windows API experts: our code will not be portable, but we can make the best use of Windows without bloated libraries like Electron or QT.
In short: it depends. Maybe the average C programmer can beat the average Assembly programmer for a medium-sized project when looking at the function Overall performance = f(development time, application runtime, user experience). However, 1. we are not average Assembly programmers here in this forum and 2. our overall performance function will give zero weight to the development time factor simply because we are hobby programmers who enjoy coding: it's so much fun to beat a C function :biggrin:
Quote from: jj2007 on November 18, 2023, 10:17:25 AM
- I am a lousy C programmer; programming a simple string like Let my$="Today is the "+fDate$()+", and it's "+fTime$() would cost me ages in C but a few seconds in Assembly;
I am a bit of a beginner in assembly language, so what assembler do you use with that example? The example just looks like an old BASIC dialect or VBScript.
I am a bit rusty with C, but:
#include <stdio.h>
#include <time.h>

int __cdecl main(void)
{
    time_t tm = time(NULL);
    /* ctime() already ends its string with '\n' */
    printf("Today is the %s\n", ctime(&tm));
    return 0;
}
outputs
Today is the Sat Nov 18 06:48:56 2023
and exe size can be 3 072 bytes / 3 584 bytes (x64), if OS msvcrt.dll is used.
Hi Timo,
The code above is MasmBasic :
https://masm32.com/board/index.php?board=57.0
MasmBasic allows string concatenation like the traditional Basic dialects.
Quote from: TimoVJL on November 18, 2023, 03:57:04 PM
I am a bit of a beginner in assembly language, so what assembler do you use with that example?
MASM or UAsm, sometimes also AsmC or JWasm.
Btw your example is not equivalent: you are just printing stuff to the console, while I am assigning it to a string for further use.
@Erol: thanks, but Timo is just pulling my leg :biggrin:
Quote from: jj2007 on November 18, 2023, 09:46:01 PM
Btw your example is not equivalent: you are just printing stuff to the console, while I am assigning it to a string for further use.
strftime
#include <stdio.h>
#include <time.h>

int __cdecl main(void)
{
    time_t now = time(0);
    char buff[200];
    /* %#x is the long date form in the Microsoft CRT; %X is the time */
    strftime(buff, sizeof buff, "Today is %#x and it's %X", localtime(&now));
    puts(buff);
}
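A variant that assigns the text to a string for later use, rather than printing it, could look like the sketch below. The helper name `today_string` is hypothetical, and the fixed `%Y-%m-%d` and `%H:%M:%S` formats are an assumption chosen so the result does not depend on the locale or on a particular C runtime.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: builds "Today is the <date>, and it's <time>" in buf.
   Returns the string length, or 0 if the text did not fit. */
static size_t today_string(char *buf, size_t size, const struct tm *when)
{
    char date[32], clock_time[32];
    int n;

    assert(buf != NULL && when != NULL);
    /* strftime returns 0 when its output buffer is too small. */
    if (strftime(date, sizeof date, "%Y-%m-%d", when) == 0 ||
        strftime(clock_time, sizeof clock_time, "%H:%M:%S", when) == 0)
        return 0;
    /* Concatenate the pieces, as the BASIC-style '+' does. */
    n = snprintf(buf, size, "Today is the %s, and it's %s", date, clock_time);
    return (n > 0 && (size_t)n < size) ? strlen(buf) : 0;
}
```

A caller would pass `localtime(&now)` for the current time and can then keep using `buf` as an ordinary string.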
Jochen, well, some assembler programmers have spent a lot of time programming SIMD.
About code productivity: with many years of asm programming, we can whip up a program faster than an HLL programmer, who thinks asm development is slow compared to C, would expect, and certainly faster than a newbie asm programmer.
Yes indeed, Magnus :thumbsup:
It's all about "socialisation". Adeyblue whips up his strftime example in 2 minutes because he is a professional, and as such he has spent most of his career coding in C or C++. Same for Timo.
We are mostly hobby coders here, but we are fluent in Assembly, so we can do the same in our domain. Besides, you and me and many others here have also gone well beyond what C/C++ guys can/will do in terms of SIMD programming.
P.S., I just hacked together a little challenge: How many files in \Masm32\Examples use ComCtr32.inc? (https://masm32.com/board/index.php?topic=11493.0)
[quote author=Vortex link=msg=125188 date=1700164017]
Hi ttribelli,
Thanks for the code. I could not manage to assemble this module :
D:\FreeBASIC\bin\win64>as -o hexstr-x64.o hexstr-x64.s
[/quote]
I apologize for the delay but the code is building and running properly under Debian now.
Hi ttribelli,
No worries. Do you have a new version of the source code for members operating on Windows? Thanks.
Quote from: Vortex on January 04, 2024, 04:46:09 AM
No worries. Do you have a new version of the source code for members operating on Windows? Thanks.
The problem was specific to the Mac/Linux version (.s), and only manifested on Linux. The Windows version (.asm) always worked.
The Mac/Linux version (.s) cannot be compiled on Windows using a GNU toolchain. The C ABI (register usage) is different between Windows and Mac/Linux.
The updated code has been tested on Debian and macOS.