Print Page - Hash tables for ultra fast dictionaries

Title: Hash tables for ultra fast dictionaries
Post by: Biterider on January 10, 2022, 01:09:58 AM

Hi
For the past 2 years I have been involved in an industrial-scale Python project.
One of the things I've learned is how to use Python dictionaries really efficiently. The backend is a combination of an index table and a hash table.
While there are many implementations in other languages that use many different hash algorithms and implementations, looking for a hash table implementation in masm did not produce a useful result. :sad:

Now it is time again to get my hands dirty and write some code that is flexible to use.
As a proof of concept, I started using an OOP approach, but it can easily be turned into a procedural approach.
I will be working on it in the next few weeks and I will be happy when someone can join me. :cool:

A good description I used as starting point can be found here https://www.data-structures-in-practice.com/hash-tables/ (https://www.data-structures-in-practice.com/hash-tables/)

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 10, 2022, 09:06:21 PM

Hi Biterider,

I was interested in this topic earlier, I was looking for a method in which I could use - Bidirectional-Associative-memory - with high search speed on large volumes up to 200 gigabytes. For other purposes, you can look at a combination of AVL trees and hashing. In any case, a lot depends on the initial requirements for data structures and speed.

A c++ hash map/table which utilizes simd (specifically Intel x86 SSE/AVX) - https://github.com/NathanSWard/simd_hash_map

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 11, 2022, 06:47:10 AM

Hi，Biterider
interesting...
(2^N-1)&X = Mod(X/2^N)
;-----------------------------
what's your next plan? translate Python's code into Asm

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 11, 2022, 07:04:58 AM

Hi LiaoMi
Of course, collision handling is a big deal when it comes to hash tables.
As you know, one way is to "openly address" or append some sort of binary tree (RBT, AVL etc) linked list etc to store items with the same hash.
Thanks for the link, I'll check it :thumbsup:

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 11, 2022, 07:40:02 AM

Hi six_L
From what I've seen using Python (I'm not a Python fanboy!), it has extremely useful constructs. One of them is the dictionaries discussed, but there are others too.
My intention is to bring the good things I've seen into asm using an asm implementation.

In fact, Python's core is written in C, which in many ways comes close to asm. I'm not talking here about Python's VM, which is a complete different thing!
You can check out the latest implementation details on Github's CPython.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: mineiro on January 11, 2022, 08:28:14 AM

Quote from: six_L on January 11, 2022, 06:47:10 AM
interesting...
(2^N-1)&X = Mod(X/2^N)

I used this approach here, sounds ok without optimized code, an element frequency count of 2 millions random numbers executed in 0.1439990000 sec. To reduce collisions more memory is need.
https://masm32.com/board/index.php?topic=8811.msg96213#msg96213

I think the first question is how data looks like, it's fixed size or dynamic size?

Glib hash to strings use:
"This function implements the widely used "djb" hash apparently posted by Daniel Bernstein to comp.lang.c some time ago. The 32 bit unsigned hash value starts at 5381 and for each byte 'c' in the string, is updated: hash = hash * 33 + c. This function uses the signed value of each byte."

A good reference book about hash is Algorithms by Robert Sedgewick.
You can see a lot of hash code if you search for dictionary approach (lempel ziv) to data compression. LZ,LZW,LZO,LZP, LZARI, ... .

Title: Re: Hash tables for ultra fast dictionaries
Post by: mineiro on January 11, 2022, 08:29:41 AM

Quote from: LiaoMi on January 10, 2022, 09:06:21 PM
I was looking for a method in which I could use - Bidirectional-Associative-memory -

A.I. with Hopfield? To restore damaged images?

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 11, 2022, 05:31:59 PM

Hi mineiro
If you look at the different implementations and strategies you will find that they all look very similar. They differ mainly in the hash algorithm and in the collision handler.
IMHO it is possible to build a common skeleton and change the hash algo or the collision handler according to individual needs.

There are of course details like resizing (e.g. only powers of 2) or deleting (Tombstones, shifting, etc.), but that can certainly be worked out.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: mikeburr on January 12, 2022, 11:34:51 AM

i think you really need to understand your data and hence have a specific application for the use of hash tables .. trees wdnt be almost ubiquitous otherwise ??? .. one thought though is .....
It is often thought that in modular arithmetic [which includes x_bit computers ] that you need to divide to get a modular solution..... but you can multiply by the inverse to get the equivalent .. what is not quite so easy is calculating the inverses which wd seem at first sight to be trivial
regards mike b

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 17, 2022, 05:21:00 AM

Hi
It took some effort but finally I got it working. :biggrin:

The HashTable is defined such that the hashing and collision handler are implemented in a derived object. This has the advantage that different classes can be created using different strategies.
As I mentioned before, I took some inspiration from Python, but at the end, I came with a slightly different approach, which maybe is not better, but is good enough as a starting point.

In this case I created a Dictionary from the HashTable using open addressing. It handles string keys of any length and a payload of any kind. I've chosen a DWORD that I've initialized with an incrementing index for demonstration purposes.
I intentionally left aside all the necessary memory allocations to just focus on inserting, searching and deleting the dictionary entries.
My first attempt was to use the well-known FNV-1a hashing algorithm and a naive deletion strategy using tombstones, which are widely used in the literature. This combination results in 126 collisions for 512 items.

As a performance reference, I used a binary tree in form of a highly optimized object called SortedStrCollection(W).
In theory, a dictionary should perform with O(1) while the SortedStrCollection using the binary search should perform with O(n). The reality comes close to the theory. The reasons are:

Hash collisions degrade the performance of the HashTable.
The overhead of the hash calculation is significant when there are only a few elements in the containers.

Still, looking at the results, searching and deleting outperforms the binary search, even for very small element counts.

A major disadvantage comes into play when a HashTable needs to be resized, which requires all elements to be reinserted by hashing them again. The SortedCollection does not suffer from this effect, since all item pointers remain in place when the internal table is resized.

There are many possible improvements like testing other hashing algorithms, in-place allocation, close addressing, etc. which I'll try over the next few days.
Attached are the sources and a log-log chart comparing performance between the dictionary and the SortedStrCollection.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 17, 2022, 05:43:03 AM

Quote from: mineiro on January 11, 2022, 08:29:41 AM
Quote from: LiaoMi on January 10, 2022, 09:06:21 PM
I was looking for a method in which I could use - Bidirectional-Associative-memory -
A.I. with Hopfield? To restore damaged images?

Hi mineiro,

it was a database search, here the data is located randomly and cannot be sorted by structural conditions. In both directions, the search must have the same speed.

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 18, 2022, 09:08:30 PM

Hi
While searching for hashing algorithms, I discovered that both Uasm and AsmC use the well-known FNV-1 hashing routine for their symbol tables.
A much more complicated hash routine is the MurmurHash3, which should be fine for general hash-based searching.

There are a few other interesting algorithms to test. :cool:

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 19, 2022, 07:21:45 AM

Hi，Biterider
here is a hash which happens Collision hardly.

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 19, 2022, 08:40:47 AM

Hi six_L
Looks promising. I'll check it.

Regards,
Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 20, 2022, 10:02:22 AM

Hi,

https://www.asmcommunity.net/forums/topic/?id=30185

QuoteOk, I've started working on my own version of h2incn - a program that will parse C header files and convert them into Nasm compatible .inc files. After creating the initial program outline I am now ready to begin adding in the preprocessing support.

One of the things that I need is a fast way of performing lookups of defines and typedefs. I've settled on using hash maps and the hash function algorithm FNV-1 which, according to the authors, provides for a nice dispersion across a hash map. Thus I begin by defining a hash map to contain the key/value pairs and develop the hashing function.

I want the hashing function and the hash map to be capable of holding arbitrary types and data sizes. Thus the routines being developed do not depend on simple character strings (although you can indeed supply them to the functions). That way, the routines may be used to hold objects of all types for any purpose ( token processing, game world objects, in-memory databases, caching, etc. ).

The following is my Nasm modified version of the FNV-1 hash algorithm for use in both 32-bit and 64-bit systems. Note that the 32-bit version uses the standard C calling convention while the 64-bit version uses either Windows or Linux fastcall calling conventions. It may be optimized further (an exersize left for the reader) but I'm quite happy with it. I would love to know if others find use for it and what their collision findings are...

nasm - FNV1HASH.ASM

Code Select

;
; FNV1HASH.ASM
;
; Copyright (C)2010 Rob Neff
; Source code is licensed under the new/simplified 2-clause BSD OSI license.
;
; This function implements the FNV-1 hash algorithm.
; This source file is formatted for Nasm compatibility although it
; is small enough to be easily converted into another assembler format.
;
; Example C/C++ call:
;
; #ifdef __cplusplus
; extern "C" {
; #endif
;
; unsigned int FNV1Hash(char *buffer, unsigned int len, unsigned int offset_basis);
;
; #ifdef __cplusplus
; }
; #endif
;
; int hash;
;
; /* obtain 32-bit FNV1 hash */
; hash = FNV1Hash(buffer, len, 2166136261);
;
; /* if desired - convert from a 32-bit to 16-bit hash */
; hash = ((hash >> 16) ^ (hash & 0xFFFF));
;



%ifidni __BITS__,32
;
; 32-bit C calling convention
;

%define buffer
%define len
%define offset_basis

global _FNV1Hash

_FNV1Hash:
  push ebp                    ; set up stack frame
  mov  ebp, esp
  push esi                    ; save registers used
  push edi
  push ebx
  push ecx
  push edx

  mov  esi, buffer            ;esi = ptr to buffer
  mov  ecx, len              ;ecx = length of buffer (counter)
  mov  eax, offset_basis      ;set to 2166136261 for FNV-1
  mov  edi, 1000193h          ;FNV_32_PRIME = 16777619
  xor  ebx, ebx              ;ebx = 0
nextbyte:
  mul  edi                    ;eax = eax * FNV_32_PRIME
  mov  bl,               ;bl = byte from esi
  xor  eax, ebx              ;al = al xor bl
  inc  esi                    ;esi = esi + 1 (buffer pos)
  dec  ecx                    ;ecx = ecx - 1 (counter)
  jnz  nextbyte              ;if ecx != 0, jmp to NextByte

  pop  edx                    ; restore registers
  pop  ecx
  pop  ebx
  pop  edi
  pop  esi
  mov  esp, ebp              ; restore stack frame
  pop  ebp
  ret                        ; eax = fnv1 hash

%elifidni __BITS__,64
;
; 64-bit function
;

%ifidni __OUTPUT_FORMAT__,win64
;
; 64-bit Windows fastcall convention:
;    ints/longs/ptrs: RCX, RDX, R8, R9
;    floats/doubles: XMM0 to XMM3
;
global FNV1Hash

FNV1Hash:
  xchg rcx, rdx              ;rcx = length of buffer
  xchg r8, rdx                ;r8 = ptr to buffer

%elifidni __OUTPUT_FORMAT__,elf64
;
; 64-bit Linux fastcall convention
;    ints/longs/ptrs: RDI, RSI, RDX, RCX, R8, R9
;    floats/doubles: XMM0 to XMM7
;
global _FNV1Hash

_FNV1Hash:
  mov  rcx, rsi
  mov  r8, rdi

%endif

  mov  rax, rdx              ;rax = offset_basis - set to 14695981039346656037 for FNV-1
  mov  r9, 100000001B3h      ;r9 = FNV_64_PRIME = 1099511628211
  mov  r10, rbx              ;r10 = saved copy of rbx
  xor  rbx, rbx              ;rbx = 0
nextbyte:
  mul  r9                    ;rax = rax * FNV_64_PRIME
  mov  bl,               ;bl = byte from r8
  xor  rax, rbx              ;al = al xor bl
  inc  r8                    ;inc buffer pos
  dec  rcx                    ;rcx = rcx - 1 (counter)
  jnz  nextbyte              ;if rcx != 0, jmp to nextbyte
  mov  rbx, r10              ;restore rbx
  ret                        ;rax = fnv1 hash

%endif

Robin Hood hashing
Robin hood hashing is an open addressing based collision resolution algorithm; the collisions are resolved through favouring the displacement of the element that is farthest—or longest probe sequence length (PSL)—from its "home location" i.e. the bucket to which the item was hashed into. Although robin hood hashing does not change the theoretical search cost, it significantly affects the variance of the distribution of the items on the buckets, i.e. dealing with cluster formation in the hash table.

Leveraging SIMD parallelism for accelerating network - https://conferences.sigcomm.org/events/apnet2020/material/apnet20-final20.pdf

xxHash - Extremely fast non-cryptographic hash algorithm
Benchmarks
The reference system uses an Intel i7-9700K cpu, and runs Ubuntu x64 20.04. The open source benchmark program is compiled with clang v10.0 using -O3 flag.

Hash Name   Width   Bandwidth (GB/s)   Small Data Velocity   Quality   Comment
XXH3 (SSE2)   64   31.5 GB/s   133.1   10
XXH128 (SSE2)   128   29.6 GB/s   118.1   10
RAM sequential read   N/A   28.0 GB/s   N/A   N/A   for reference
City64   64   22.0 GB/s   76.6   10
T1ha2   64   22.0 GB/s   99.0   9   Slightly worse collisions
City128   128   21.7 GB/s   57.7   10
XXH64   64   19.4 GB/s   71.0   10
SpookyHash   64   19.3 GB/s   53.2   10
Mum   64   18.0 GB/s   67.0   9   Slightly worse collisions
XXH32   32   9.7 GB/s   71.9   10
City32   32   9.1 GB/s   66.0   10
Murmur3   32   3.9 GB/s   56.1   10
SipHash   64   3.0 GB/s   43.2   10
FNV64   64   1.2 GB/s   62.7   5   Poor avalanche properties
Blake2   256   1.1 GB/s   5.1   10   Cryptographic
SHA1   160   0.8 GB/s   5.6   10   Cryptographic but broken
MD5   128   0.6 GB/s   7.8   10   Cryptographic but broken

* | Hash Name | ISA ext | Width | Large Data Speed | Small Data Velocity |
* | -------------------- | ------- | ----: | ---------------: | ------------------: |
* | XXH3_64bits() | @b AVX2 | 64 | 59.4 GB/s | 133.1 |
* | MeowHash | AES-NI | 128 | 58.2 GB/s | 52.5 |
* | XXH3_128bits() | @b AVX2 | 128 | 57.9 GB/s | 118.1 |
* | CLHash | PCLMUL | 64 | 37.1 GB/s | 58.1 |
* | XXH3_64bits() | @b SSE2 | 64 | 31.5 GB/s | 133.1 |
* | XXH3_128bits() | @b SSE2 | 128 | 29.6 GB/s | 118.1 |
* | RAM sequential read | | N/A | 28.0 GB/s | N/A |
* | ahash | AES-NI | 64 | 22.5 GB/s | 107.2 |
* | City64 | | 64 | 22.0 GB/s | 76.6 |
* | T1ha2 | | 64 | 22.0 GB/s | 99.0 |
* | City128 | | 128 | 21.7 GB/s | 57.7 |
* | FarmHash | AES-NI | 64 | 21.3 GB/s | 71.9 |
* | XXH64() | | 64 | 19.4 GB/s | 71.0 |
* | SpookyHash | | 64 | 19.3 GB/s | 53.2 |
* | Mum | | 64 | 18.0 GB/s | 67.0 |
* | CRC32C | SSE4.2 | 32 | 13.0 GB/s | 57.9 |
* | XXH32() | | 32 | 9.7 GB/s | 71.9 |
* | City32 | | 32 | 9.1 GB/s | 66.0 |
* | Blake3* | @b AVX2 | 256 | 4.4 GB/s | 8.1 |
* | Murmur3 | | 32 | 3.9 GB/s | 56.1 |
* | SipHash* | | 64 | 3.0 GB/s | 43.2 |
* | Blake3* | @b SSE2 | 256 | 2.4 GB/s | 8.1 |
* | HighwayHash | | 64 | 1.4 GB/s | 6.0 |
* | FNV64 | | 64 | 1.2 GB/s | 62.7 |
* | Blake2* | | 256 | 1.1 GB/s | 5.1 |
* | SHA1* | | 160 | 0.8 GB/s | 5.6 |
* | MD5* | | 128 | 0.6 GB/s | 7.8 |

note 1: Small data velocity is a rough evaluation of algorithm's efficiency on small data. For more detailed analysis, please refer to next paragraph.
note 2: some algorithms feature faster than RAM speed. In which case, they can only reach their full speed when input data is already in CPU cache (L3 or better). Otherwise, they max out on RAM speed limit.

Small data
Performance on large data is only one part of the picture. Hashing is also very useful in constructions like hash tables and bloom filters. In these use cases, it's frequent to hash a lot of small data (starting at a few bytes). Algorithm's performance can be very different for such scenarios, since parts of the algorithm, such as initialization or finalization, become fixed cost. The impact of branch mis-prediction also becomes much more present.

XXH3 has been designed for excellent performance on both long and small inputs, which can be observed in the following graph: - https://user-images.githubusercontent.com/750081/61976089-aedeab00-af9f-11e9-9239-e5375d6c080f.png

Vectorizing xxHash for Fun and Profit - https://moinakg.wordpress.com/2013/01/19/vectorizing-xxhash-for-fun-and-profit/

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 20, 2022, 09:48:47 PM

Hi,
i tested the FNV1 hash algorithm. it's faster and happenning collision hardly.

Code Select

DictHashValue_FNV1 proc uses rbx pData:qword,dqSize:qword	;hashValue
	; retValue=rax

_FNV1Hash:
	mov rax,14695981039346656037;rax = offset_basis - set to 14695981039346656037 for FNV-1
	mov rcx, dqSize
	mov r8, pData

	mov r9, 0100000001B3h     ;r9 = FNV_64_PRIME = 1099511628211
	xor rbx, rbx              ;rbx = 0
nextbyte:
	mul r9                    ;rax = rax * FNV_64_PRIME
	mov bl, [r8]		  ;bl = byte from r8
	xor rax, rbx              ;al = al xor bl
	inc r8                    ;inc buffer pos
	dec rcx                   ;rcx = rcx - 1 (counter)
	jnz nextbyte              ;if rcx != 0, jmp to nextbyte
	ret                       ;rax = fnv1 hash

DictHashValue_FNV1 endp

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 20, 2022, 10:04:55 PM

Hi six_L
That is cool :thumbsup:
Since that you have the test setup ready, can you check and compare with the FNV-1a algorithm?
it is only necessary to change the order of operations. This algo is recommended in the literature over the FNV-1 because it is slightly better and requires the same computing time.

https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function (https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function)

Thanks, Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 20, 2022, 11:06:54 PM

Hi,Biterider
i can't access the [https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function (https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function)], could you convert it into PDF and post here?
i'll know what's the hash algorithm which has been recommended by literature.

regard.

Title: Re: Hash tables for ultra fast dictionaries
Post by: jj2007 on January 21, 2022, 01:17:38 AM

For you ;-)

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 21, 2022, 01:37:59 AM

Hi,jj2007
:thumbsup:
Thank you very much.

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 21, 2022, 03:56:22 AM

Hi,Biterider
we'll use some names as key, so FNV-1 and FNV-1a has a same effects on avalanche or random distribution of hash values. about speed, FNV-1 is as fast as FNV-1a on calculating hash values.

Quotecan you check and compare with the FNV-1a algorithm?

the attachmnet is the effects of FNV-1a on random distribution of hash values.

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 21, 2022, 05:53:58 AM

Hi six_L
Thanks for testing :cool:
Here are some interesting comparisons between hash algos. In particular, FNV-1 vs. FNV-1a.
If you look around, CRC32 is another interesting candidate.

https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed (https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed)

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 21, 2022, 07:51:05 AM

Hi,Biterider

QuoteDo collisions actually happen?
Yes. I started writing my test program to see if hash collisions actually happen - and are not just a theoretical construct. They do indeed happen:

FNV-1 collisions
creamwove collides with quists

FNV-1a collisions
costarring collides with liquid
declinate collides with macallums
altarage collides with zinke
altarages collides with zinkes

Murmur2 collisions

i doubt the results of his test. Maybe he tested on a 32bit system.

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 21, 2022, 10:20:58 AM

Hi,

I have compiled a library for this type of hashes - xxHashLib_x64 :rolleyes:

It successfully completes the SMHasher test suite which evaluates collision, dispersion and randomness qualities of hash functions. Code is highly portable, and hashes are identical across all platforms (little / big endian).
Here's results of various hash functions, hashing data of different lengths, with performance in MB/s: https://aras-p.info/img/blog/2016-08/hash-perf.png (https://aras-p.info/blog/2016/08/02/Hash-Functions-all-the-way-down/)

Where can I find xxhash64 and md5 collision probability statistics? - https://stackoverflow.com/questions/45788721/where-can-i-find-xxhash64-and-md5-collision-probability-statistics

If you use xxhash64,
Assuming that xxhash64 produce a 64-bit hash. You will get this graph:
(https://i.stack.imgur.com/bOsiJ.jpg)

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 21, 2022, 07:52:20 PM

Hi :biggrin:,

ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instructions.

"Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors"

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 21, 2022, 10:28:41 PM

Hi LiaoMi
Thanks for all the usefull information. :thumbsup:
I took a look into smhasher at https://github.com/aappleby/smhasher (https://github.com/aappleby/smhasher).
It has a lot of usefull things, including src of the hash algos!

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 21, 2022, 11:51:32 PM

Quote from: Biterider on January 21, 2022, 10:28:41 PM
Hi LiaoMi
Thanks for all the usefull information. :thumbsup:
I took a look into smhasher at https://github.com/aappleby/smhasher (https://github.com/aappleby/smhasher).
It has a lot of usefull things, including src of the hash algos!

Biterider

Hi Biterider,

I would like to know exactly the spread of execution speed, in the material above there are many graphs with different comparisons. In general terms, I derived the following rules, speed depends:

1. on the type of processor
2. on the data itself
3. on the amount of data
3. on key length (length affects collision balance)
4. on hardware acceleration (crc32 and pclmulqdq etc)
5. on parallel execution (Can slow down execution in special cases)
6. on seed (?!)

xxHash - Extremely fast hash algorithm - already contains a collision test - https://github.com/Cyan4973/xxHash/blob/dev/tests/collisions/main.c (https://en.wikipedia.org/wiki/Avalanche_effect)

Judging by the graphs, xxHash has very good performance, it has different sub-versions for other tasks :rolleyes:

Today in the previous post I added the x86 version of xxHash

Title: Re: Hash tables for ultra fast dictionaries
Post by: hutch-- on January 22, 2022, 12:23:45 PM

I have never bothered to port my basic hash table to masm, it uses a basic dynamic string array which is allocate as an empty array of placeholders which you then fill with data based on the the hash calculation. I use a reverse table of prime numbers to get the initial array location and the array length must also be a prime number so that it can never be divided by an number larger than 1.

The advantage of the dynamic array is that you can make massive arrays as each placeholder is only 4 bytes. You pay a penalty loading the array full of placeholders but you only notice it on very large arrays. Once loaded with values, the technique is genuinely fast and it does not slow down until well over 80% of the table length.

You should only really saturate the table up to about 50% but having a lot of head room makes it a lot easier to use.

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 22, 2022, 07:31:45 PM

Hi,LiaoMi
thank you providid a lot of the usefull information and file.

QuotexxHash has very good performance.

i want to test. but I have assembled your xxHashLib_x64，show me "fatal error LNK1104: can't open the file 'LIBCMT.lib'"

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 23, 2022, 12:05:33 AM

Quote from: six_L on January 22, 2022, 07:31:45 PM
Hi,LiaoMi
thank you providid a lot of the usefull information and file.
QuotexxHash has very good performance.
i want to test. but I have assembled your xxHashLib_x64，show me "fatal error LNK1104: can't open the file 'LIBCMT.lib'"

Hi six_L,

the problem is in the functions that are not included in this library :sad:

I added a part, but again something is missing

Code Select

includelib xxHashLib_x86.lib
includelib ucrt.lib
includelib libucrt.lib
includelib vcruntime.lib
includelib vc.lib

Here is the build log:

Code Select

Searching libraries
    Searching xxHashLib_x86.lib:
      Found _XXH32@12
        Referenced in test_bugs.obj
        Loaded xxHashLib_x86.lib(xxhash.obj)
Processed /DEFAULTLIB:LIBCMT
Processed /DEFAULTLIB:OLDNAMES
    Searching ucrt.lib:
      Found _free
        Referenced in xxHashLib_x86.lib(xxhash.obj)
        Loaded ucrt.lib(msvcrt.dll)
      Found _malloc
        Referenced in xxHashLib_x86.lib(xxhash.obj)
        Loaded ucrt.lib(msvcrt.dll)
      Found __IMPORT_DESCRIPTOR_msvcrt
        Referenced in ucrt.lib(msvcrt.dll)
        Referenced in ucrt.lib(msvcrt.dll)
        Loaded ucrt.lib(msvcrt.dll)
      Found __NULL_IMPORT_DESCRIPTOR
        Referenced in ucrt.lib(msvcrt.dll)
        Loaded ucrt.lib(msvcrt.dll)
      Found msvcrt_NULL_THUNK_DATA
        Referenced in ucrt.lib(msvcrt.dll)
        Loaded ucrt.lib(msvcrt.dll)
    Searching libucrt.lib:
    Searching vcruntime.lib:
      Found _memcpy
        Referenced in xxHashLib_x86.lib(xxhash.obj)
        Loaded vcruntime.lib(msvcrt.dll)
      Found _memset
        Referenced in xxHashLib_x86.lib(xxhash.obj)
        Loaded vcruntime.lib(msvcrt.dll)
    Searching vc.lib:
      Found __allmul
        Referenced in xxHashLib_x86.lib(xxhash.obj)
        Loaded vc.lib(llmul.obj)
      Found __aullshr
        Referenced in xxHashLib_x86.lib(xxhash.obj)
        Loaded vc.lib(ullshr.obj)
LINK : fatal error LNK1104: cannot open file 'LIBCMT.lib'

They pull everything else with them, I did not find other dependencies

Code Select

Severity	Code	Description	Project	File	Line	Suppression State
Error	LNK2019	unresolved external symbol _free referenced in function _XXH32_freeState@4	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj	1	
Error	LNK2019	unresolved external symbol _malloc referenced in function _XXH32_createState@0	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj	1	
Error	LNK2019	unresolved external symbol __allmul referenced in function _XXH3_128bits@8	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj	1	
Error	LNK2001	unresolved external symbol __allmul	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxh_x86dispatch.obj	1	
Error	LNK2001	unresolved external symbol __aullshr	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj	1	
Error	LNK2001	unresolved external symbol __aullshr	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxh_x86dispatch.obj	1	
Error	LNK2019	unresolved external symbol _memcpy referenced in function _XXH32_update@12	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj	1	
Error	LNK2001	unresolved external symbol _memcpy	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxh_x86dispatch.obj	1	
Error	LNK2019	unresolved external symbol _memset referenced in function _XXH64_reset@12	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj	1	
Error	LNK2001	unresolved external symbol __DllMainCRTStartup@12	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\LINK	1	
Error	LNK1120	7 unresolved externals	xxHashLib	E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\Release\xxHashLib.dll	1

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 23, 2022, 03:31:25 AM

Hi,LiaoMi
thanks your respond.

i only use the XXH3_64bits PROTO input:QWORD ,length1:QWORD.
now the link wants vc.lib, i can't find where it is.

regard.

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 23, 2022, 07:28:32 AM

Quote from: six_L on January 23, 2022, 03:31:25 AM
Hi,LiaoMi
thanks your respond.

i only use the XXH3_64bits PROTO input:QWORD ,length1:QWORD.
now the link wants vc.lib, i can't find where it is.

regard.

Hi six_L,

these functions _free, _malloc, __allmul, _aullshr, _memcpy, _memset need to be rewritten, otherwise we will have to link the entire library of functions ... some of the routines involve a lot of garbage. I replaced all the obvious functions, but the visual studio itself inserts constructions like memcpy and memset :undecided: ... I should make a separate library with functions :icon_idea:

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 23, 2022, 04:08:24 PM

Hi, LiaoMi
thanks your respond.
i hope you'll make a separate library with the XXH3_64bits PROTO input:QWORD ,length1:QWORD

regard

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 23, 2022, 06:56:55 PM

Hi
I played a bit with the CRC hash algo. I had an implementation in 2 flavors (ANSI and PKZip) that I used for data transfer testing.
I put the code in the dictionary and ran the same tests as before.
have to make one thing clear. When I speak of collisions below, I mean total collisions: this is due to the fact that I am using "open addressing". If you only look at hash collisions, these can be smaller.

Looking at the results, it is clear that FNV-1a is faster and has fewer collisions than the CRC ANSI version. What is interesting is that the algorithm used for PKZip has fewer collisions but has the same execution time as its ANSI brother.

I'm curious to see if the hardware-accelerated CRC32 variant performs better. It requires SEE4.2, which should work on my machine. :rolleyes:

The source is attached.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 24, 2022, 12:15:00 AM

Quote from: six_L on January 23, 2022, 04:08:24 PM
Hi, LiaoMi
thanks your respond.
i hope you'll make a separate library with the XXH3_64bits PROTO input:QWORD ,length1:QWORD

regard

Hi six_L,

try this, it works for me, I updated the libraries, you can find a working example in the archive.

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 24, 2022, 02:11:58 AM

Hi
Looking for a faster way to calculate a CRC hash, I looked into CRC32 instruction and found an official Intel paper that describes various methods on how to do that https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf (https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf)
The way it works is that the CRC for medium-sized memory blocks is calculated in parallel and then combined with one another at the end.
This way of working is suitable for processing large blocks of memory, such as are used for transfers, and not for comparatively short strings. :sad:

Fortunately, there is a lot of interesting assembly code in the article that can be used for other purposes. :thumbsup:

I found the SHA statement extension, but it works in the same way as the CRC32, and isn't efficient to use for this either.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 24, 2022, 02:18:23 AM

:biggrin: in the example above, I forgot to expand printf XXH3_64bits to QWORD

Code Select

    .486
    .model flat,stdcall
    option casemap:none

.nolist
.nocref

XXH32 PROTO  input:PTR ,length1:PTR ,seed:DWORD
XXH3_64bits PROTO  input:PTR ,length1:PTR
printf PROTO C arg1:Ptr Byte, printlist: VARARG
Sleep PROTO :DWORD

includelib xxHashLib_x86.lib
includelib CrtEmu_x32.lib

%noop MACRO
nop
nop
ENDM

SCALE equ 7

.data
msg db "Hello Hello!!! =)",10,0
XXH32msg db "XXH32: %x",10,0
XXH64msg db "XXH64: %x%x",10,0
.data?

.code

mainCRTStartup proc

invoke printf, addr msg

invoke XXH32, addr msg, SIZEOF msg, 797
invoke printf, addr XXH32msg, eax
%noop

invoke XXH3_64bits, addr msg, SIZEOF msg
invoke printf, addr XXH64msg, eax, edx

%noop

mov al, 1
and al,NOT byte PTR (SCALE)
and al,NOT(SCALE)

%noop

; start delay

mov ecx, 100
mov esi, 10
delay2:
dec ecx
nop
jnz delay2
dec esi
cmp esi,0    
jnz delay2
; end delay

 ret
mainCRTStartup endp

end

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 24, 2022, 02:53:09 AM

Quote from: Biterider on January 24, 2022, 02:11:58 AM
Hi
Looking for a faster way to calculate a CRC hash, I looked into CRC32 instruction and found an official Intel paper that describes various methods on how to do that.
The way it works is that the CRC for medium-sized memory blocks is calculated in parallel and then combined with one another at the end.
This way of working is suitable for processing large blocks of memory, such as are used for transfers, and not for comparatively short strings. :sad:

Fortunately, there is a lot of interesting assembly code in the article that can be used for other purposes. :thumbsup:

I found the SHA statement extension, but it works in the same way as the CRC32, and isn't efficient to use for this either.

Biterider

Hi Biterider,

are your graphs based on your library that you recently published? :rolleyes: It seems to me that SHA should be slower than all other algorithms?! :icon_idea:

Fast CRC Computation for iSCSI Polynomial using CRC32 - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

xxHash for small keys: the impressive power of modern compilers - https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html

SHA-3 Hash with Keccak - https://masm32.com/board/index.php?topic=8508.0

Quotehere is a SHA-3 hashing algorithm with or without Keccak encryption. The algorithm is consistent with the SHA-3 standard plus the keccak routine to enforce security.

The algorithm produces 224, 256, 384 and 512 bits of data to be used as a signature of whatever file or text that is needed.

I built one biased to these ones:

http://gauss.ececs.uc.edu/Courses/c6053/lectures/Hashing/sha3
https://github.com/magurosan/sha3-odzhan

The Intel® SHA Extensions are designed to accelerate SHA-1 and SHA-256 processing.
There are seven new SSE-based instructions, four supporting SHA-1 and three for SHA-256:
SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2
SHA256RNDS2, SHA256MSG1, SHA256MSG2

New Instructions Supporting the Secure Hash Algorithm on Intel® Architecture Processors - https://www.intel.com/content/dam/develop/external/us/en/documents/intel-sha-extensions-white-paper-402097.pdf
Fast SHA-256 Implementations on Intel® Architecture - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf Source - https://github.com/Dar13/intel-sha-testbed/archive/refs/heads/master.zip
Fast SHA512 Implementations on Intel® Architecture Processors - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-sha512-implementations-ia-processors-paper.pdf

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 24, 2022, 03:40:17 AM

Hi LiaoMi
The information collected comes from my own code and my test string list. Using a different setup may alter the results. :icon_idea:
The hashing algorithms I've tested so far are FNV-1, FNV-1a, 2 CRC variants, and the proprietary Python implementation.

IMO hardware accelerated CRC32 and SHA makes no sense for regular strings. See my post above.

I'm curious to see how xxHash performs. :cool:

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 24, 2022, 03:51:57 AM

Hi,LiaoMi
thanks you for your patience and the time you've spent.

Quotetry this, it works for me, I updated the libraries, you can find a working example in the archive.

the link needs OLDNAMES.lib.

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 24, 2022, 04:34:35 AM

Quote from: six_L on January 24, 2022, 03:51:57 AM
Hi,LiaoMi
thanks you for your patience and the time you've spent.
Quotetry this, it works for me, I updated the libraries, you can find a working example in the archive.
the link needs OLDNAMES.lib.

Hi six_L,

the attached archive has this file, this library is available in many packages :icon_idea: What bitness do you assemble?

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 24, 2022, 04:58:58 AM

hi,LiaoMi

QuotexxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_free， XXH32_freeState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_malloc， XXH32_createState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: memcpy， XXH32_update
msvcrt.lib : warning LNK4272:
oldnames.lib : warning LNK4272:
hashvrf_5.exe : fatal error LNK1120: 3

QuoteWhat bitness do you assemble?

X64

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 24, 2022, 05:00:43 AM

Hi
It seems that I was wrong in the assumption that CRC32 has to be used in blocks. :sad:
It is perfectly possible to use it on single BYTEs.
I'll write some code to test it.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 24, 2022, 06:03:08 AM

Hi
I hacked something together using CRC32. I don't care about bit reflection at the moment since the goal is to get a hash value.
The results differ from my previous attempts because the polynomial used for the CRC32 is also different.

The results beat the FNV-1a for the first time, which is great.

Code Select


  mov rdx, pKey
  xor eax, eax
  .while TRUE
    mov cx, WORD ptr [rdx]
    test cx, cx
    CRC32 rax, cl
    mov cl, ch
    CRC32 rax, cl
    .break .if ZERO?
    add rdx, 2
  .endw

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 24, 2022, 08:09:39 AM

Quote from: six_L on January 24, 2022, 04:58:58 AM
hi,LiaoMi
QuotexxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_free， XXH32_freeState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_malloc， XXH32_createState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: memcpy， XXH32_update
msvcrt.lib : warning LNK4272:
oldnames.lib : warning LNK4272:
hashvrf_5.exe : fatal error LNK1120: 3
QuoteWhat bitness do you assemble?
X64

Hi six_L,

in the 32 bit example, we need to reverse the order of the two registers (edx, eax) so that both examples show the same data. invoke printf, addr XXH64msg, edx, eax

Code Select

invoke XXH3_64bits, addr msg, SIZEOF msg
invoke printf, addr XXH64msg, edx, eax

Attached is an example for 64 bit :thup:

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 24, 2022, 08:24:02 AM

Quote from: Biterider on January 24, 2022, 06:03:08 AM
Hi
I hacked something together using CRC32. I don't care about bit reflection at the moment since the goal is to get a hash value.
The results differ from my previous attempts because the polynomial used for the CRC32 is also different.

The results beat the FNV-1a for the first time, which is great.

Hi Biterider,

is it a graphical representation of graphs from python? :rolleyes: On what principle does your Hash search work? Looking at the code, a simple search of values?

Code Select

; ??????????????????????????????????????????????????????????????????????????????????????????????????
; Method:     Dictionary.Search
; Purpose:    Search in the Dictionary for a given Key.
; Arguments:  Arg1: -> Key
; Return:     xax -> Item or NULL if not found.

Method Dictionary.Search, uses xbx xdi xsi, pKey:PSTRING
;  DbgText "Dictionary.Search"
  SetObject xsi
  OCall xsi.Hash, pKey
  mov xdi, xax                                            ;Remember the hash
  mov ebx, [xsi].dMask
  and xbx, xax                                            ;ebx is used as index into a pointer array

  ;Search for the correct item where the hash and the Key match
  .while TRUE
    mov xdx, [xsi].pBuckets
    mov xax, POINTER ptr [xdx + sizeof(POINTER)*xbx]
    .if xax == HTL_BUCKET_EMPTY
      DbgText "Not found"
      ExitMethod
    .endif
    .if xax != HTL_BUCKET_TOMBSTONE
      invoke StrComp, pKey, [xax].DIC_ITEM.pKey
      .break .if eax == 0                                 ;Found => exit loop
    .endif
    shr xdi, DIC_PERTURB_SHIFT                            ;edi = Perturbation
    lea ebx, [4*ebx + ebx + 1]
    add ebx, edi
    and ebx, [xsi].dMask
  .endw
  mov xdx, [xsi].pBuckets
  mov xax, POINTER ptr [xdx + sizeof(POINTER)*xbx]
MethodEnd

Optimizing HashMap's Performance - https://www.baeldung.com/java-hashmap-optimize-performance

3. Tree Instead of LinkedList
Starting from Java 8, one optimization is built-in in HashMap: When buckets are getting too large, they're transformed into trees, instead of linked lists. That brings the pessimistic time of O(n) to O(log(n)), which is much better. For that to work, the keys of HashMap need to implement the Comparable interface.

That's a nice and automatic solution, but it's not perfect. O(log(n)) is still worse than desired constant time, and transforming and storing trees takes additional power and memory.

4.3. Custom hashCode
Usually, we want to override the equals method, and then we also need to override hashCode. Sometimes, we can take advantage of the specific identity of the class and easily make a very fast hashCode method.

Let's say our object's identity is purely based on its integer id. Then, we can just use this id as a hash function:

Code Select

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;

    MemberWithId that = (MemberWithId) o;

    return id.equals(that.id);
}

@Override
public int hashCode() {
    return id;
}

High performance hashmap(wen load factor is high), using AVL tree instead linked list to resolve hash collision - https://github.com/LYKZZzz/HashMap
Using red-black tree and hash-map to implement a spelling corrector - https://github.com/nirnicole/Spelling_Corrector-C

OPTIMIZING HASHMAPS EVEN MORE - https://blog.yoshuawuyts.com/optimizing-hashmaps-even-more/
However we've known for a long time about a faster hashmap than Swisstable: the perfect hashtable. The idea is that if you know all your keys ahead of time, you can create a lookup table which never creates collisions, never has to re-index the hashmap, or grow in size2. Though it's not always possible to know all keys ahead of time, so there are limits to when it can be used.

Even so, let's take a look at when it does make sense to use perfect hashtables, and show examples of how to implement them.

Sorting a HashMap - https://www.interviewkickstart.com/learn/sort-hashmap
HashMap is a very popular data structure used in the programming paradigm. Many programming problems in software engineering interviews can be tackled by the simple use of a HashMap.

In this article, we will cover:

The Internal Workings of a HashMap
Problem Statement
Solution #1: Sorting a HashMap Using a LinkedHashMap
Solution #2: Sorting a HashMap Using a TreeMap
Time and Space Complexity of Sorting a HashMap
FAQs on Sorting a HashMap

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 24, 2022, 06:24:30 PM

Hi, LiaoMi
as you said, For short data/strings, simplicity of FNV or djb2 are hard to beat, they are very performant on short data as well.

Code Select

draw_PointThread proc USES rdi rbx rsi

	lea	rsi,org_str
	invoke	GetSystemTimePreciseAsFileTime, addr StartTime
@@:
	
	invoke	RtlZeroMemory,addr NameA,sizeof NameA
	lea	rdi,NameA
	invoke	iRandom_1,3,15
	mov	_rndstrlen,rax	;NameALen

	xor	rbx,rbx
	.repeat
		invoke	iRandom_1,0,org_StrlenMax
		mov	al,[rsi+rax]
		mov	[rdi+rbx],al
		inc	rbx
	.until rbx == _rndstrlen
	
	invoke	lstrlen,addr NameA
	invoke	XXH3_64bits,addr NameA,rax
	xor	rdx,rdx
	mov	rcx,x_max
	div	rcx		;hash % x_max
	mov	hHash_IndexA,edx

	invoke	RtlZeroMemory,addr NameB,sizeof NameB

	lea	rdi,NameB
	invoke	iRandom_1,3,15
	mov	_rndstrlen,rax	;NameBLen

	xor	rbx,rbx
	.repeat
		invoke	iRandom_1,0,org_StrlenMax
		mov	al,[rsi+rax]
		mov	[rdi+rbx],al
		inc	rbx
	.until rbx == _rndstrlen
	
	invoke	lstrlen,addr NameB
	invoke	XXH3_64bits,addr NameB,rax
	xor	rdx,rdx
	mov	rcx,y_max
	div	rcx		;hash % y_max
	mov	hHash_IndexB,edx

	invoke	InvalidateRect,Main_hWnd,0,FALSE
	invoke	UpdateWindow,Main_hWnd

	lea	rdi,TxtColorTable
	mov	rax,dqColorCnt
	mov	rcx,[rdi+rax*4]
	mov	point_cor,ecx
	inc	dqColorCnt
	.if	dqColorCnt == 6
		mov    dqColorCnt,0
	.endif
	
	invoke	Sleep,1
	inc	dqTotalCalcHash
	.if	dqTotalCalcHash >= 10000
		mov	StopThreadFlag,TRUE
	.endif
	cmp StopThreadFlag,TRUE
	jnz @B
	
	invoke	GetSystemTimePreciseAsFileTime, addr EndTime
	
	xor rax,rax
	xor rdx,rdx
	mov eax,StartTime.dwHighDateTime
	shl rax, 32
	mov edx,StartTime.dwLowDateTime
	add rax,rdx
	mov rcx,rax

	xor rax,rax
	xor rdx,rdx
	mov eax,EndTime.dwHighDateTime
	shl rax, 32
	mov edx,EndTime.dwLowDateTime
	add rax,rdx
	
	sub rax,rcx
	invoke	wsprintf,addr szBuffer,CStr("SpendTime: %Id"),rax
	invoke	SetWindowText,Main_hWnd,addr szBuffer
	ret

draw_PointThread endp

1669717931（us） XXH3_64bits
1669303733（us） FNV-1A

FNV-1A is faster 414198（us）than XXH3_64bits on calculating 20000 hash values

Title: Re: Hash tables for ultra fast dictionaries
Post by: jj2007 on January 24, 2022, 08:29:36 PM

On a side not, six_L: I had never seen GetSystemTimePreciseAsFileTime - interesting, thanks. It seems, though, that it is not ideal for benchmarking (https://stackoverflow.com/questions/34557407/queryperformancecounter-or-getsystemtimepreciseasfiletime-when-using-sntp) ("QueryPerformanceCounter will never speed up or slow down") :cool:

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 24, 2022, 08:38:03 PM

Hi LiaoMi

Quote from: LiaoMi on January 24, 2022, 08:24:02 AM
is it a graphical representation of graphs from python? :rolleyes: On what principle does your Hash search work? Looking at the code, a simple search of values?

I coded my own testbed and dumped the results into an Excel spreadsheet to create the charts.
The HashTable and Dictionary are very similar to what Python does. They use "open addressing", ie no trees or linked lists are used in the event of a collision. Instead, the next available bucket is used to store the pointer to the dictionary data. The search pattern to find this bucket comes from Python. It uses a perturbation to take advantage of the unused part of the hash number. The concept is somewhat similar to double hashing, but with much less computational effort.
The strategy is described in the following source
https://github.com/python/cpython/blob/main/Objects/dictobject.c#:~:text=%23define%20PERTURB_SHIFT%205 (https://github.com/python/cpython/blob/main/Objects/dictobject.c#:~:text=%23define%20PERTURB_SHIFT%205)

By the way, if there is no need to delete information from the dictionary, the tombstones can be removed.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 24, 2022, 08:42:13 PM

Quote from: six_L on January 24, 2022, 06:24:30 PM
Hi, LiaoMi
as you said, For short data/strings, simplicity of FNV or djb2 are hard to beat, they are very performant on short data as well.

Hi six_L,

thanks for the stats and tests! In general, this is not surprising if we compare the code and the input data :icon_idea:

Could you please check these functions? - XXH32 and XXH64, XXHL64_default_sse2, XXHL64_default_avx2, XXH3_64bits_dispatch ...

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 25, 2022, 01:16:10 AM

Hi,jj2007
thanks you for the following useful information.

QuoteAs long as your computer's clock is not currently undergoing SystemTimeAdjustment, the elapsed intervals measured by:

GetSytemTimeAsPreciseFileTime
QueryPerformanceCounter
will both:

be in sync
be within a few tenths of a microsecond of each other
have an ultimate resolution of 100ns (0.1 us)
For accurate time measurements these are the two functions you should use. The question, which one do you care for the task:
GetSystemTimeAsPreciseFileTime: high resolution measurement of "now"

good for log files
timestamping of events
synchronized to UTC for cross-machine agreement of "now"
QueryPerformanceCounter: high resolution measurement of "elapsed time"

good for bench-marking
measuring duration of processes

the GetSystemTimePreciseAsFileTime is not ideal for benchmarking.

Hi,LiaoMi

QuoteCould you please check these functions? - XXH32 and XXH64, XXHL64_default_sse2, XXHL64_default_avx2, XXH3_64bits_dispatch ...

i'll test them lately

Title: Re: Hash tables for ultra fast dictionaries
Post by: LiaoMi on January 25, 2022, 05:52:03 AM

XXH3: a new hash speed record holder - https://sudonull.com/post/31491-XXH3-New-Hash-Rate-Record-GlobalSign-Blog

Some algorithms from Google:
Сityhash - https://github.com/google/cityhash
Fast strong hash functions: SipHash/HighwayHash - https://github.com/google/highwayhash

https://en.wikipedia.org/wiki/SipHash
SipHash is an add–rotate–xor (ARX) based family of pseudorandom functions created by Jean-Philippe Aumasson and Daniel J. Bernstein in 2012. SipHash computes a 64-bit message authentication code from a variable-length message and 128-bit secret key. It was designed to be efficient even for short inputs, with performance comparable to non-cryptographic hash functions, such as CityHash.

C library to calculate the FNV Hash function in 32, 64, 128, 256 or 1024 bits. The FNV-1a version is supported - https://github.com/fnvhash/libfnv
(C language FNV hash function library with all supported bit lengths of FNV-1a: 32, 64, 128, 256, 512, and 1024)

t1ha - One of the fastest hash functions - https://github.com/erthink/t1ha
WangYi´s wyhash - The FASTEST QUALITY hash function, random number generators (PRNG) and hash map - https://github.com/wangyi-fudan/wyhash
Git mirror of the Fowler Noll Vo (FNV) hash algorithm original C source - https://github.com/catb0t/fnv-hash or http://www.isthe.com/chongo/tech/comp/fnv/index.html

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 25, 2022, 09:01:16 AM

Hi
Here is a simple description of how xxHash works.
The secret of its speed is the parallel processing of 4 streams. :icon_idea:

https://create.stephan-brumme.com/xxhash/#git1 (https://create.stephan-brumme.com/xxhash/#git1)

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on January 26, 2022, 09:53:57 PM

Hi,LiaoMi

there was the better benchmarking algos in our forum, e.g. get the CPU info, set the priority level of thread. but it's the x86 style. i need time to convert into x64.
sorry for testing more slowly.

;=================================================

QuoteWhat are the longest words in the Oxford English Dictionary?

Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
Hippopotomonstrosesquippedaliophobia (36 letters)
Supercalifragilisticexpialidocious (34 letters)
Pseudopseudohypoparathyroidism (30 letters)
Floccinaucinihilipilification (29 letters)
Antidisestablishmentarianism (28 letters)

we'll use the hash to create and determine a data structures position, not to checksum. so The test datas are limited to less than 64 characters.

regard.

Title: Re: Hash tables for ultra fast dictionaries
Post by: HSE on January 27, 2022, 02:01:42 AM

Hi!

Quote from: LiaoMi on January 24, 2022, 08:24:02 AM
Sorting a HashMap - https://www.interviewkickstart.com/learn/sort-hashmap
HashMap is a very popular data structure used in the programming paradigm. Many programming problems in software engineering interviews can be tackled by the simple use of a HashMap.

Then:

1) for most purposes HashTable and HashMap are exactly same thing?

2) You never sort the HashMap, You create a new ordered HashMap?

3) can be sorted the HashTable under design?

Thanks in advance, HSE.

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 27, 2022, 07:38:05 AM

Hi HSE
The example of the linked article only refers to "open hashing" or "closed addressing", which usually means chaining.
As far as I can tell in this context, the difference between a HashTable and a HashMap is more sematic. The map is a special case of the table where the bucket contains a pointer to the data, while the more general case of the table has the complete data stored in the bucket.

In the case of the article, there are many roads that lead to Rome and it is up to the programmer to choose the best solution. For example, you can use superhero strength as the key value and put the superhero names in a sorted lists attached to the buckets. In this case, the Items are sorted by design.

One thing must be clear, hash tables are not primarily intended for sorting. The goal is to be as close to O(1) as possible.

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: HSE on January 27, 2022, 10:33:45 AM

Hi Biterider!!

A little more clear now. :thumbsup:

In lockdown I begin translation of a very complex program in Java. Mostly everything was easy translated to ObjAsm32, but there was a problem with HashTables and HahsMaps. :biggrin:

Quote from: Biterider on January 27, 2022, 07:38:05 AM
For example, you can use superhero strength as the key value and put the superhero names in a sorted lists attached to the buckets. In this case, the Items are sorted by design.

I'm reading that Java project now, and become clear that is something like that.

Look like using HashTables like tables can add a lot of flexibility at a not so high price, but I skipped all that and maked simple tables :cool:.

Thanks, HSE.

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on January 30, 2022, 06:00:47 PM

Hi
Looking for the hardware implementation of the CRC calculation I found this extremely easy to understand video.
I highly recommend it. :thumbsup:

https://www.youtube.com/watch?v=sNkERQlK8j8 (https://www.youtube.com/watch?v=sNkERQlK8j8)

Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on February 08, 2022, 05:00:21 AM

QuoteHash tables are a pupular way to implement the dictionary data type. Theoretically they are intuitive to understand. However,real-life implementations of hash tables could be complex.

hash tables waste 40% of memory in 1024 max item, when user items reach 1000. maybe the wasted 40% of memory never use in future.

Title: Re: Hash tables for ultra fast dictionaries
Post by: Biterider on February 08, 2022, 05:31:11 AM

Hi six_L
You're absolutely right, but don't forget what the HashTable is for. From my perspective, it is not intended for memory optimization, but for quick data access using a key.
Imagine a compiler where you put the symbols into a hash table. It doesn't matter how much memory you waste because after compilation the application will be closed and all memory will be released to the operating system. What you really want is a fast compilation process!

Regards, Biterider

Title: Re: Hash tables for ultra fast dictionaries
Post by: six_L on February 08, 2022, 10:21:57 PM

Hi,Biterider
i'm comprehended. about the speed and the memory, we'll get both of them is impossible. either loss the speed or loss the memory.
;================================
1, Indices_Array is stored by the DictObjectIndex, Entries_Array is stored by the CollisionIndex, Collision_Array is stored by the CollisionIndex. If the actual address is stored, this memory database backup and recovery cannot work.
2, Why did the hashtable still use comparison instructions? Due to the possibility of collision at any time, the target dictobject item must be confirmed by the szStrCmp.
3, When deleting a DictObject item, keep nextaddr in there. otherwise the openning address chain will break, the DictObject item in this openning address chain will be lost. If the nextdatas move a bit by bit, it will take a lot of time.
4, Replacing index zero with "- 1" is to distinguish between initialization "0" and written "0".

The MASM Forum

Projects => ObjAsm => Topic started by: Biterider on January 10, 2022, 01:09:58 AM