FastArrayOps - implementations of common operations for linear arrays

LiaoMi · April 03, 2021, 02:20:00 AM

Hi,

FastArrayOps - https://github.com/komrad36/FastArrayOps

Extremely fast (up to 35x faster than compiler or STL, for arbitrarily large or small n) x86 / AVX2 implementations of common array ops for arrays of all basic data types:

Check if array contains element (set membership query)
Find index of element in array (find-element)
Find min element in array (find-min, min-element)
Find max element in array (find-max, max-element)
Find index of min element in array (idx-min, argmin, minidx)
Find index of max element in array (idx-max, argmax, maxidx)
This means the crossover point at which accelerated structures such as hash tables or trees become preferable to linear arrays is much, MUCH larger than usually taught or seen in the wild, especially for small elements. A linear array traversed with this library is faster for find-element, find-min, find-max, decrease-value, increase-value operations than "better" data structures unless you have thousands of elements.

Note that these are the operations provided by a priority queue, and indeed, similarly, an array-backed priority queue implemented with this library is faster than the traditional heap-backed priority queue unless you have many hundreds or thousands of elements.

The following are speedups for large n; proportional speedup is even higher than this for smaller n as the functions incorporate bespoke behavior optimized over all n.

Approx. speedup vs. best compiler output
Data type   Contains/Find   Min/Max Element   Min/Max Index
I8/U8   22x   2.3x   35x
I16/U16   18x   2.1x   19x
I32/U32   12x   1.1x   8.5x
I64/U64   7.0x   1.3x   3.1x
Float   13.0x   28x   8.9x
Double   7.0x   14x   5.0x

Min/max index does not guarantee which index is returned if multiple elements are all the minimum.

In contrast, find-element does guarantee to always return the first (lowest-index) element that matches.

0-element arrays are handled gracefully. Min/max index returns ~0U. Min/max returns the min/max representable value for the datatype.

If find-element does not find the element, it returns ~0ULL.

Gunther · April 03, 2021, 07:16:19 AM

LiaoMi,

interesting code for both worlds (Windows and Linux). Is it your project?

Gunther

LiaoMi · April 03, 2021, 08:00:36 PM

Quote from: Gunther on April 03, 2021, 07:16:19 AM
LiaoMi,

interesting code for both worlds (Windows and Linux). Is it your project?

Gunther

Hi Gunther,

no, the project is not mine, there is an author's contact in the readme. I just posted it here

Gunther · April 04, 2021, 06:23:53 AM

Quote from: LiaoMi on April 03, 2021, 08:00:36 PM
no, the project is not mine, there is an author's contact in the readme. I just posted it here

Thank you for posting. The question is: Can we beat the Intel MKL?

Gunther

LiaoMi · April 04, 2021, 10:56:26 PM

Hi Gunther,

Developer Reference for Intel® oneAPI Math Kernel Library - C
https://software.intel.com/content/dam/develop/external/us/en/documents/onemkl-developerreference-c.pdf

I do not know how to use this library

, because source code, that was compiled using Visual Studio and Intel Parallel Studio, was always 10 to 20 percent faster in Visual Studio without using these libraries. Even with maximum optimization, it was slow. I'm not sure if the MKL-library has exactly these atomic operations.

The MASM Forum

News:

FastArrayOps - implementations of common operations for linear arrays

LiaoMi

Gunther

LiaoMi

Gunther

LiaoMi