Hi
For the past 2 years I have been involved in an industrial-scale Python project.
One of the things I've learned is how to use Python dictionaries really efficiently. The backend is a combination of an index table and a hash table.
While there are many implementations in other languages, using many different hash algorithms and strategies, searching for a hash table implementation in masm did not produce a useful result. :sad:
Now it is time again to get my hands dirty and write some code that is flexible to use.
As a proof of concept, I started using an OOP approach, but it can easily be turned into a procedural approach.
I will be working on it in the next few weeks and I will be happy when someone can join me. :cool:
A good description that I used as a starting point can be found here: https://www.data-structures-in-practice.com/hash-tables/
Biterider
Hi Biterider,
I was interested in this topic earlier, when I was looking for a method in which I could use bidirectional associative memory with high search speed on large volumes of up to 200 gigabytes. For other purposes, you can look at a combination of AVL trees and hashing. In any case, a lot depends on the initial requirements for the data structures and speed.
A c++ hash map/table which utilizes simd (specifically Intel x86 SSE/AVX) - https://github.com/NathanSWard/simd_hash_map
Hi, Biterider
interesting...
(2^N - 1) & X = X mod 2^N
;-----------------------------
What's your next plan? Translating Python's code into asm?
Hi LiaoMi
Of course, collision handling is a big deal when it comes to hash tables.
As you know, one way is "open addressing"; another is to append some sort of structure, such as a binary tree (RBT, AVL, etc.) or a linked list, to store items with the same hash.
Thanks for the link, I'll check it :thumbsup:
Biterider
Hi six_L
From what I've seen using Python (I'm not a Python fanboy!), it has extremely useful constructs. One of them is the dictionaries discussed, but there are others too.
My intention is to bring the good things I've seen into an asm implementation.
In fact, Python's core is written in C, which in many ways comes close to asm. I'm not talking here about Python's VM, which is a completely different thing!
You can check out the latest implementation details on Github's CPython.
Biterider
Quote from: six_L on January 11, 2022, 06:47:10 AM
interesting...
(2^N-1)&X = Mod(X/2^N)
I used this approach here; it seems OK even without optimized code: an element frequency count over 2 million random numbers executed in 0.1439990000 sec. To reduce collisions, more memory is needed.
https://masm32.com/board/index.php?topic=8811.msg96213#msg96213
I think the first question is what the data looks like: is it fixed size or dynamic size?
GLib's string hash uses:
"This function implements the widely used "djb" hash apparently posted by Daniel Bernstein to comp.lang.c some time ago. The 32 bit unsigned hash value starts at 5381 and for each byte 'c' in the string, is updated: hash = hash * 33 + c. This function uses the signed value of each byte."
A good reference book about hashing is Algorithms by Robert Sedgewick.
You can see a lot of hash code if you search for the dictionary approach (Lempel-Ziv) to data compression: LZ, LZW, LZO, LZP, LZARI, ...
Quote from: LiaoMi on January 10, 2022, 09:06:21 PM
I was looking for a method in which I could use - Bidirectional-Associative-memory -
A.I. with Hopfield? To restore damaged images?
Hi mineiro
If you look at the different implementations and strategies you will find that they all look very similar. They differ mainly in the hash algorithm and in the collision handler.
IMHO it is possible to build a common skeleton and change the hash algo or the collision handler according to individual needs.
There are of course details like resizing (e.g. only powers of 2) or deleting (Tombstones, shifting, etc.), but that can certainly be worked out.
Biterider
I think you really need to understand your data and hence have a specific application for the use of hash tables... trees wouldn't be almost ubiquitous otherwise. One thought though:
It is often thought that in modular arithmetic (which includes x-bit computers) you need to divide to get a modular solution, but you can multiply by the inverse to get the equivalent. What is not quite so easy is calculating the inverses, which would seem at first sight to be trivial.
regards mike b
Hi
It took some effort but finally I got it working. :biggrin:
The HashTable is defined such that the hashing and collision handling are implemented in a derived object. This has the advantage that different classes can be created using different strategies.
As I mentioned before, I took some inspiration from Python, but in the end I came up with a slightly different approach, which may not be better, but is good enough as a starting point.
In this case I created a Dictionary from the HashTable using open addressing. It handles string keys of any length and a payload of any kind. I've chosen a DWORD that I've initialized with an incrementing index for demonstration purposes.
I intentionally left aside all the necessary memory allocations to just focus on inserting, searching and deleting the dictionary entries.
My first attempt was to use the well-known FNV-1a hashing algorithm and a naive deletion strategy using tombstones, which are widely used in the literature. This combination results in 126 collisions for 512 items.
As a performance reference, I used a binary tree in the form of a highly optimized object called SortedStrCollection(W).
In theory, a dictionary should perform in O(1), while the SortedStrCollection using binary search should perform in O(log n). The reality comes close to the theory. The reasons are:
- Hash collisions degrade the performance of the HashTable.
- The overhead of the hash calculation is significant when there are only a few elements in the containers.
Still, looking at the results, searching and deleting outperform the binary search, even for very small element counts.
A major disadvantage comes into play when a HashTable needs to be resized, which requires all elements to be reinserted by hashing them again. The SortedStrCollection does not suffer from this effect, since all item pointers remain in place when the internal table is resized.
There are many possible improvements like testing other hashing algorithms, in-place allocation, close addressing, etc. which I'll try over the next few days.
Attached are the sources and a log-log chart comparing performance between the dictionary and the SortedStrCollection.
Biterider
Quote from: mineiro on January 11, 2022, 08:29:41 AM
Quote from: LiaoMi on January 10, 2022, 09:06:21 PM
I was looking for a method in which I could use - Bidirectional-Associative-memory -
A.I. with Hopfield? To restore damaged images?
Hi mineiro,
It was a database search; the data is located randomly and cannot be sorted by structural conditions. The search must be equally fast in both directions.
Hi
While searching for hashing algorithms, I discovered that both Uasm and AsmC use the well-known FNV-1 hashing routine for their symbol tables.
A much more complicated hash routine is the MurmurHash3, which should be fine for general hash-based searching.
There are a few other interesting algorithms to test. :cool:
Biterider
Hi, Biterider
Here is a hash that hardly ever produces collisions.
Hi six_L
Looks promising. I'll check it.
Regards,
Biterider
Hi,
https://www.asmcommunity.net/forums/topic/?id=30185
Quote
Ok, I've started working on my own version of h2incn - a program that will parse C header files and convert them into Nasm compatible .inc files. After creating the initial program outline I am now ready to begin adding in the preprocessing support.
One of the things that I need is a fast way of performing lookups of defines and typedefs. I've settled on using hash maps and the hash function algorithm FNV-1 which, according to the authors, provides for a nice dispersion across a hash map. Thus I begin by defining a hash map to contain the key/value pairs and develop the hashing function.
I want the hashing function and the hash map to be capable of holding arbitrary types and data sizes. Thus the routines being developed do not depend on simple character strings (although you can indeed supply them to the functions). That way, the routines may be used to hold objects of all types for any purpose ( token processing, game world objects, in-memory databases, caching, etc. ).
The following is my Nasm modified version of the FNV-1 hash algorithm for use in both 32-bit and 64-bit systems. Note that the 32-bit version uses the standard C calling convention while the 64-bit version uses either the Windows or Linux fastcall calling convention. It may be optimized further (an exercise left for the reader) but I'm quite happy with it. I would love to know if others find use for it and what their collision findings are...
nasm - FNV1HASH.ASM
;
; FNV1HASH.ASM
;
; Copyright (C)2010 Rob Neff
; Source code is licensed under the new/simplified 2-clause BSD OSI license.
;
; This function implements the FNV-1 hash algorithm.
; This source file is formatted for Nasm compatibility although it
; is small enough to be easily converted into another assembler format.
;
; Example C/C++ call:
;
; #ifdef __cplusplus
; extern "C" {
; #endif
;
; unsigned int FNV1Hash(char *buffer, unsigned int len, unsigned int offset_basis);
;
; #ifdef __cplusplus
; }
; #endif
;
; int hash;
;
; /* obtain 32-bit FNV1 hash */
; hash = FNV1Hash(buffer, len, 2166136261);
;
; /* if desired - convert from a 32-bit to 16-bit hash */
; hash = ((hash >> 16) ^ (hash & 0xFFFF));
;
%ifidni __BITS__,32
;
; 32-bit C calling convention
;
%define buffer [ebp+8] ; cdecl args relative to ebp
%define len [ebp+12]
%define offset_basis [ebp+16]
global _FNV1Hash
_FNV1Hash:
push ebp ; set up stack frame
mov ebp, esp
push esi ; save registers used
push edi
push ebx
push ecx
push edx
mov esi, buffer ;esi = ptr to buffer
mov ecx, len ;ecx = length of buffer (counter)
mov eax, offset_basis ;set to 2166136261 for FNV-1
mov edi, 1000193h ;FNV_32_PRIME = 16777619
xor ebx, ebx ;ebx = 0
nextbyte:
mul edi ;eax = eax * FNV_32_PRIME
mov bl, [esi] ;bl = byte from esi
xor eax, ebx ;al = al xor bl
inc esi ;esi = esi + 1 (buffer pos)
dec ecx ;ecx = ecx - 1 (counter)
jnz nextbyte ;if ecx != 0, jmp to NextByte
pop edx ; restore registers
pop ecx
pop ebx
pop edi
pop esi
mov esp, ebp ; restore stack frame
pop ebp
ret ; eax = fnv1 hash
%elifidni __BITS__,64
;
; 64-bit function
;
%ifidni __OUTPUT_FORMAT__,win64
;
; 64-bit Windows fastcall convention:
; ints/longs/ptrs: RCX, RDX, R8, R9
; floats/doubles: XMM0 to XMM3
;
global FNV1Hash
FNV1Hash:
xchg rcx, rdx ;rcx = length of buffer
xchg r8, rdx ;r8 = ptr to buffer
%elifidni __OUTPUT_FORMAT__,elf64
;
; 64-bit Linux fastcall convention
; ints/longs/ptrs: RDI, RSI, RDX, RCX, R8, R9
; floats/doubles: XMM0 to XMM7
;
global _FNV1Hash
_FNV1Hash:
mov rcx, rsi
mov r8, rdi
%endif
mov rax, rdx ;rax = offset_basis - set to 14695981039346656037 for FNV-1
mov r9, 100000001B3h ;r9 = FNV_64_PRIME = 1099511628211
mov r10, rbx ;r10 = saved copy of rbx
xor rbx, rbx ;rbx = 0
nextbyte:
mul r9 ;rax = rax * FNV_64_PRIME
mov bl, [r8] ;bl = byte from r8
xor rax, rbx ;al = al xor bl
inc r8 ;inc buffer pos
dec rcx ;rcx = rcx - 1 (counter)
jnz nextbyte ;if rcx != 0, jmp to nextbyte
mov rbx, r10 ;restore rbx
ret ;rax = fnv1 hash
%endif
Robin Hood hashing
Robin Hood hashing is an open-addressing-based collision resolution algorithm; collisions are resolved by favouring the displacement of the element that is farthest (i.e. has the longest probe sequence length, PSL) from its "home location", the bucket to which the item was hashed. Although Robin Hood hashing does not change the theoretical search cost, it significantly reduces the variance of the distribution of items across the buckets, i.e. it deals with cluster formation in the hash table.
Leveraging SIMD parallelism for accelerating network - https://conferences.sigcomm.org/events/apnet2020/material/apnet20-final20.pdf
xxHash - Extremely fast non-cryptographic hash algorithm
Benchmarks
The reference system uses an Intel i7-9700K CPU and runs Ubuntu 20.04 x64. The open-source benchmark program is compiled with clang v10.0 using the -O3 flag.
| Hash Name           | Width | Bandwidth (GB/s) | Small Data Velocity | Quality | Comment                   |
| ------------------- | ----: | ---------------: | ------------------: | ------: | ------------------------- |
| XXH3 (SSE2)         |    64 |        31.5 GB/s |               133.1 |      10 |                           |
| XXH128 (SSE2)       |   128 |        29.6 GB/s |               118.1 |      10 |                           |
| RAM sequential read |   N/A |        28.0 GB/s |                 N/A |     N/A | for reference             |
| City64              |    64 |        22.0 GB/s |                76.6 |      10 |                           |
| T1ha2               |    64 |        22.0 GB/s |                99.0 |       9 | Slightly worse collisions |
| City128             |   128 |        21.7 GB/s |                57.7 |      10 |                           |
| XXH64               |    64 |        19.4 GB/s |                71.0 |      10 |                           |
| SpookyHash          |    64 |        19.3 GB/s |                53.2 |      10 |                           |
| Mum                 |    64 |        18.0 GB/s |                67.0 |       9 | Slightly worse collisions |
| XXH32               |    32 |         9.7 GB/s |                71.9 |      10 |                           |
| City32              |    32 |         9.1 GB/s |                66.0 |      10 |                           |
| Murmur3             |    32 |         3.9 GB/s |                56.1 |      10 |                           |
| SipHash             |    64 |         3.0 GB/s |                43.2 |      10 |                           |
| FNV64               |    64 |         1.2 GB/s |                62.7 |       5 | Poor avalanche properties |
| Blake2              |   256 |         1.1 GB/s |                 5.1 |      10 | Cryptographic             |
| SHA1                |   160 |         0.8 GB/s |                 5.6 |      10 | Cryptographic but broken  |
| MD5                 |   128 |         0.6 GB/s |                 7.8 |      10 | Cryptographic but broken  |
| Hash Name            | ISA ext | Width | Large Data Speed | Small Data Velocity |
| -------------------- | ------- | ----: | ---------------: | ------------------: |
| XXH3_64bits()        | AVX2    |    64 |        59.4 GB/s |               133.1 |
| MeowHash             | AES-NI  |   128 |        58.2 GB/s |                52.5 |
| XXH3_128bits()       | AVX2    |   128 |        57.9 GB/s |               118.1 |
| CLHash               | PCLMUL  |    64 |        37.1 GB/s |                58.1 |
| XXH3_64bits()        | SSE2    |    64 |        31.5 GB/s |               133.1 |
| XXH3_128bits()       | SSE2    |   128 |        29.6 GB/s |               118.1 |
| RAM sequential read  |         |   N/A |        28.0 GB/s |                 N/A |
| ahash                | AES-NI  |    64 |        22.5 GB/s |               107.2 |
| City64               |         |    64 |        22.0 GB/s |                76.6 |
| T1ha2                |         |    64 |        22.0 GB/s |                99.0 |
| City128              |         |   128 |        21.7 GB/s |                57.7 |
| FarmHash             | AES-NI  |    64 |        21.3 GB/s |                71.9 |
| XXH64()              |         |    64 |        19.4 GB/s |                71.0 |
| SpookyHash           |         |    64 |        19.3 GB/s |                53.2 |
| Mum                  |         |    64 |        18.0 GB/s |                67.0 |
| CRC32C               | SSE4.2  |    32 |        13.0 GB/s |                57.9 |
| XXH32()              |         |    32 |         9.7 GB/s |                71.9 |
| City32               |         |    32 |         9.1 GB/s |                66.0 |
| Blake3*              | AVX2    |   256 |         4.4 GB/s |                 8.1 |
| Murmur3              |         |    32 |         3.9 GB/s |                56.1 |
| SipHash*             |         |    64 |         3.0 GB/s |                43.2 |
| Blake3*              | SSE2    |   256 |         2.4 GB/s |                 8.1 |
| HighwayHash          |         |    64 |         1.4 GB/s |                 6.0 |
| FNV64                |         |    64 |         1.2 GB/s |                62.7 |
| Blake2*              |         |   256 |         1.1 GB/s |                 5.1 |
| SHA1*                |         |   160 |         0.8 GB/s |                 5.6 |
| MD5*                 |         |   128 |         0.6 GB/s |                 7.8 |
note 1: Small data velocity is a rough evaluation of the algorithm's efficiency on small data. For a more detailed analysis, please refer to the next paragraph.
note 2: some algorithms are faster than RAM. In that case, they can only reach their full speed when the input data is already in the CPU cache (L3 or better). Otherwise, they max out at the RAM speed limit.
Small data
Performance on large data is only one part of the picture. Hashing is also very useful in constructs like hash tables and Bloom filters. In these use cases, it's common to hash a lot of small data (starting at a few bytes). An algorithm's performance can be very different in such scenarios, since parts of the algorithm, such as initialization or finalization, become a fixed cost. The impact of branch misprediction also becomes much more significant.
XXH3 has been designed for excellent performance on both long and small inputs, which can be observed in the following graph: https://user-images.githubusercontent.com/750081/61976089-aedeab00-af9f-11e9-9239-e5375d6c080f.png
Vectorizing xxHash for Fun and Profit - https://moinakg.wordpress.com/2013/01/19/vectorizing-xxhash-for-fun-and-profit/
Hi,
I tested the FNV1 hash algorithm. It's fast and hardly ever produces collisions.
DictHashValue_FNV1 proc uses rbx pData:qword,dqSize:qword ;hashValue
; retValue=rax
_FNV1Hash:
mov rax, 14695981039346656037 ;rax = offset_basis - set to 14695981039346656037 for FNV-1
mov rcx, dqSize
mov r8, pData
mov r9, 0100000001B3h ;r9 = FNV_64_PRIME = 1099511628211
xor rbx, rbx ;rbx = 0
nextbyte:
mul r9 ;rax = rax * FNV_64_PRIME
mov bl, [r8] ;bl = byte from r8
xor rax, rbx ;al = al xor bl
inc r8 ;inc buffer pos
dec rcx ;rcx = rcx - 1 (counter)
jnz nextbyte ;if rcx != 0, jmp to nextbyte
ret ;rax = fnv1 hash
DictHashValue_FNV1 endp
Hi six_L
That is cool :thumbsup:
Since you have the test setup ready, can you check and compare with the FNV-1a algorithm?
It is only necessary to change the order of operations. This algorithm is recommended in the literature over FNV-1 because its dispersion is slightly better and it requires the same computing time.
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
Thanks, Biterider
Hi, Biterider
I can't access https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function - could you convert it into a PDF and post it here?
I'd like to know which hash algorithm is recommended in the literature.
Regards.
For you ;-)
Hi, jj2007
:thumbsup:
Thank you very much.
Hi, Biterider
We'll use some names as keys, so FNV-1 and FNV-1a have the same effect on the avalanche/random distribution of hash values. As for speed, FNV-1 is as fast as FNV-1a at calculating hash values.
Quote
can you check and compare with the FNV-1a algorithm?
The attachment shows the effect of FNV-1a on the random distribution of hash values.
Hi six_L
Thanks for testing :cool:
Here are some interesting comparisons between hash algos. In particular, FNV-1 vs. FNV-1a.
If you look around, CRC32 is another interesting candidate.
https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
Biterider
Hi, Biterider
Quote
Do collisions actually happen?
Yes. I started writing my test program to see if hash collisions actually happen - and are not just a theoretical construct. They do indeed happen:
FNV-1 collisions
creamwove collides with quists
FNV-1a collisions
costarring collides with liquid
declinate collides with macallums
altarage collides with zinke
altarages collides with zinkes
Murmur2 collisions
I doubt the results of his test. Maybe he tested on a 32-bit system.
Hi,
I have compiled a library for this type of hash - xxHashLib_x64 :rolleyes:
It successfully passes the SMHasher test suite, which evaluates the collision, dispersion and randomness qualities of hash functions. The code is highly portable, and hashes are identical across all platforms (little/big endian).
Here are the results of various hash functions hashing data of different lengths, with performance in MB/s: https://aras-p.info/img/blog/2016-08/hash-perf.png (from https://aras-p.info/blog/2016/08/02/Hash-Functions-all-the-way-down/)
Where can I find xxhash64 and md5 collision probability statistics? - https://stackoverflow.com/questions/45788721/where-can-i-find-xxhash64-and-md5-collision-probability-statistics
If you use xxhash64, assuming it produces a 64-bit hash, you will get this graph:
(https://i.stack.imgur.com/bOsiJ.jpg)
Hi :biggrin:,
ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instructions.
"Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors"
Hi LiaoMi
Thanks for all the useful information. :thumbsup:
I took a look at smhasher at https://github.com/aappleby/smhasher.
It has a lot of useful things, including the sources of the hash algorithms!
Biterider
Quote from: Biterider on January 21, 2022, 10:28:41 PM
Hi LiaoMi
Thanks for all the useful information. :thumbsup:
I took a look at smhasher at https://github.com/aappleby/smhasher.
It has a lot of useful things, including the sources of the hash algorithms!
Biterider
Hi Biterider,
I would like to know the exact spread of execution speeds; in the material above there are many graphs with different comparisons. In general terms, I derived the following rules; speed depends:
1. on the type of processor
2. on the data itself
3. on the amount of data
4. on the key length (length affects collision balance)
5. on hardware acceleration (crc32 and pclmulqdq etc.)
6. on parallel execution (can slow down execution in special cases)
7. on the seed (?!)
xxHash - Extremely fast hash algorithm - already contains a collision test: https://github.com/Cyan4973/xxHash/blob/dev/tests/collisions/main.c (see also https://en.wikipedia.org/wiki/Avalanche_effect)
Judging by the graphs, xxHash has very good performance, it has different sub-versions for other tasks :rolleyes:
Today in the previous post I added the x86 version of xxHash
I have never bothered to port my basic hash table to masm. It uses a basic dynamic string array which is allocated as an empty array of placeholders that you then fill with data based on the hash calculation. I use a reverse table of prime numbers to get the initial array location, and the array length must also be a prime number so that it can never be divided by any number larger than 1.
The advantage of the dynamic array is that you can make massive arrays, as each placeholder is only 4 bytes. You pay a penalty loading the array full of placeholders, but you only notice it on very large arrays. Once loaded with values, the technique is genuinely fast and it does not slow down until well over 80% of the table length.
You should only really saturate the table up to about 50%, but having a lot of headroom makes it a lot easier to use.
Hi, LiaoMi
Thank you for providing so much useful information and files.
Quote
xxHash has very good performance.
I want to test it, but when I assembled your xxHashLib_x64 it showed me "fatal error LNK1104: can't open the file 'LIBCMT.lib'".
Quote from: six_L on January 22, 2022, 07:31:45 PM
Hi, LiaoMi
Thank you for providing so much useful information and files.
Quote
xxHash has very good performance.
I want to test it, but when I assembled your xxHashLib_x64 it showed me "fatal error LNK1104: can't open the file 'LIBCMT.lib'".
Hi six_L,
the problem is the functions that are not included in this library :sad:
I added a part, but again something is missing
includelib xxHashLib_x86.lib
includelib ucrt.lib
includelib libucrt.lib
includelib vcruntime.lib
includelib vc.lib
Here is the build log:
Searching libraries
Searching xxHashLib_x86.lib:
Found _XXH32@12
Referenced in test_bugs.obj
Loaded xxHashLib_x86.lib(xxhash.obj)
Processed /DEFAULTLIB:LIBCMT
Processed /DEFAULTLIB:OLDNAMES
Searching ucrt.lib:
Found _free
Referenced in xxHashLib_x86.lib(xxhash.obj)
Loaded ucrt.lib(msvcrt.dll)
Found _malloc
Referenced in xxHashLib_x86.lib(xxhash.obj)
Loaded ucrt.lib(msvcrt.dll)
Found __IMPORT_DESCRIPTOR_msvcrt
Referenced in ucrt.lib(msvcrt.dll)
Referenced in ucrt.lib(msvcrt.dll)
Loaded ucrt.lib(msvcrt.dll)
Found __NULL_IMPORT_DESCRIPTOR
Referenced in ucrt.lib(msvcrt.dll)
Loaded ucrt.lib(msvcrt.dll)
Found msvcrt_NULL_THUNK_DATA
Referenced in ucrt.lib(msvcrt.dll)
Loaded ucrt.lib(msvcrt.dll)
Searching libucrt.lib:
Searching vcruntime.lib:
Found _memcpy
Referenced in xxHashLib_x86.lib(xxhash.obj)
Loaded vcruntime.lib(msvcrt.dll)
Found _memset
Referenced in xxHashLib_x86.lib(xxhash.obj)
Loaded vcruntime.lib(msvcrt.dll)
Searching vc.lib:
Found __allmul
Referenced in xxHashLib_x86.lib(xxhash.obj)
Loaded vc.lib(llmul.obj)
Found __aullshr
Referenced in xxHashLib_x86.lib(xxhash.obj)
Loaded vc.lib(ullshr.obj)
LINK : fatal error LNK1104: cannot open file 'LIBCMT.lib'
They pull everything else with them, I did not find other dependencies
Severity Code Description Project File Line Suppression State
Error LNK2019 unresolved external symbol _free referenced in function _XXH32_freeState@4 xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj 1
Error LNK2019 unresolved external symbol _malloc referenced in function _XXH32_createState@0 xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj 1
Error LNK2019 unresolved external symbol __allmul referenced in function _XXH3_128bits@8 xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj 1
Error LNK2001 unresolved external symbol __allmul xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxh_x86dispatch.obj 1
Error LNK2001 unresolved external symbol __aullshr xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj 1
Error LNK2001 unresolved external symbol __aullshr xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxh_x86dispatch.obj 1
Error LNK2019 unresolved external symbol _memcpy referenced in function _XXH32_update@12 xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj 1
Error LNK2001 unresolved external symbol _memcpy xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxh_x86dispatch.obj 1
Error LNK2019 unresolved external symbol _memset referenced in function _XXH64_reset@12 xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\xxhash.obj 1
Error LNK2001 unresolved external symbol __DllMainCRTStartup@12 xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\xxHashLib\LINK 1
Error LNK1120 7 unresolved externals xxHashLib E:\DATA\Hashing Table - Dictionary - Hash Maps - High Dispersion\xxHashLib\xxHashLib\Release\xxHashLib.dll 1
Hi, LiaoMi
Thanks for your response.
I only use XXH3_64bits PROTO input:QWORD, length1:QWORD.
Now the linker wants vc.lib, and I can't find where it is.
Regards.
Quote from: six_L on January 23, 2022, 03:31:25 AM
Hi, LiaoMi
Thanks for your response.
I only use XXH3_64bits PROTO input:QWORD, length1:QWORD.
Now the linker wants vc.lib, and I can't find where it is.
Regards.
Hi six_L,
these functions (_free, _malloc, __allmul, __aullshr, _memcpy, _memset) need to be rewritten, otherwise we will have to link the entire runtime library ... some of the routines pull in a lot of garbage. I replaced all the obvious functions, but Visual Studio itself inserts constructs like memcpy and memset :undecided: ... I should make a separate library with these functions :icon_idea:
Hi, LiaoMi
Thanks for your response.
I hope you'll make a separate library with XXH3_64bits PROTO input:QWORD, length1:QWORD.
Regards
Hi
I played a bit with the CRC hash algorithm. I had an implementation in two flavors (ANSI and PKZip) that I used for data transfer testing.
I put the code in the dictionary and ran the same tests as before.
I have to make one thing clear: when I speak of collisions below, I mean total collisions. This is due to the fact that I am using "open addressing". If you only count hash collisions, the numbers can be smaller.
Looking at the results, it is clear that FNV-1a is faster and has fewer collisions than the CRC ANSI version. What is interesting is that the algorithm used for PKZip has fewer collisions but the same execution time as its ANSI brother.
I'm curious to see if the hardware-accelerated CRC32 variant performs better. It requires SSE4.2, which should work on my machine. :rolleyes:
The source is attached.
Biterider
Quote from: six_L on January 23, 2022, 04:08:24 PM
Hi, LiaoMi
Thanks for your response.
I hope you'll make a separate library with XXH3_64bits PROTO input:QWORD, length1:QWORD.
Regards
Hi six_L,
try this; it works for me. I updated the libraries; you can find a working example in the archive.
Hi
Looking for a faster way to calculate a CRC hash, I looked into the CRC32 instruction and found an official Intel paper that describes various methods for doing that: https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
The way it works is that the CRCs of medium-sized memory blocks are calculated in parallel and then combined with one another at the end.
This way of working is suitable for processing large blocks of memory, such as those used for transfers, and not for comparatively short strings. :sad:
Fortunately, there is a lot of interesting assembly code in the article that can be used for other purposes. :thumbsup:
I also found the SHA instruction extension, but it works in the same way as CRC32, and isn't efficient to use for this either.
Biterider
:biggrin: in the example above, I forgot to extend the printf of XXH3_64bits to QWORD
.486
.model flat,stdcall
option casemap:none
.nolist
.nocref
XXH32 PROTO input:PTR ,length1:PTR ,seed:DWORD
XXH3_64bits PROTO input:PTR ,length1:PTR
printf PROTO C arg1:Ptr Byte, printlist: VARARG
Sleep PROTO :DWORD
includelib xxHashLib_x86.lib
includelib CrtEmu_x32.lib
%noop MACRO
nop
nop
ENDM
SCALE equ 7
.data
msg db "Hello Hello!!! =)",10,0
XXH32msg db "XXH32: %x",10,0
XXH64msg db "XXH64: %x%x",10,0
.data?
.code
mainCRTStartup proc
invoke printf, addr msg
invoke XXH32, addr msg, SIZEOF msg, 797
invoke printf, addr XXH32msg, eax
%noop
invoke XXH3_64bits, addr msg, SIZEOF msg
invoke printf, addr XXH64msg, eax, edx
%noop
mov al, 1
and al,NOT byte PTR (SCALE)
and al,NOT(SCALE)
%noop
; start delay
mov ecx, 100
mov esi, 10
delay2:
dec ecx
nop
jnz delay2
dec esi
cmp esi,0
jnz delay2
; end delay
ret
mainCRTStartup endp
end
Quote from: Biterider on January 24, 2022, 02:11:58 AM
Hi
Looking for a faster way to calculate a CRC hash, I looked into the CRC32 instruction and found an official Intel paper that describes various methods for doing that.
The way it works is that the CRCs of medium-sized memory blocks are calculated in parallel and then combined with one another at the end.
This way of working is suitable for processing large blocks of memory, such as those used for transfers, and not for comparatively short strings. :sad:
Fortunately, there is a lot of interesting assembly code in the article that can be used for other purposes. :thumbsup:
I also found the SHA instruction extension, but it works in the same way as CRC32, and isn't efficient to use for this either.
Biterider
Hi Biterider,
are your graphs based on the library that you recently published? :rolleyes: It seems to me that SHA should be slower than all the other algorithms?! :icon_idea:
Fast CRC Computation for iSCSI Polynomial using CRC32 - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf
xxHash for small keys: the impressive power of modern compilers - https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html
SHA-3 Hash with Keccak - https://masm32.com/board/index.php?topic=8508.0
Quote
here is a SHA-3 hashing algorithm with or without Keccak encryption. The algorithm is consistent with the SHA-3 standard plus the keccak routine to enforce security.
The algorithm produces 224, 256, 384 and 512 bits of data to be used as a signature of whatever file or text that is needed.
I built one based on these:
http://gauss.ececs.uc.edu/Courses/c6053/lectures/Hashing/sha3
https://github.com/magurosan/sha3-odzhan
The Intel® SHA Extensions are designed to accelerate SHA-1 and SHA-256 processing.
There are seven new SSE-based instructions, four supporting SHA-1 and three for SHA-256:
SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2
SHA256RNDS2, SHA256MSG1, SHA256MSG2
New Instructions Supporting the Secure Hash Algorithm on Intel® Architecture Processors - https://www.intel.com/content/dam/develop/external/us/en/documents/intel-sha-extensions-white-paper-402097.pdf
Fast SHA-256 Implementations on Intel® Architecture - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf Source - https://github.com/Dar13/intel-sha-testbed/archive/refs/heads/master.zip
Fast SHA512 Implementations on Intel® Architecture Processors - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-sha512-implementations-ia-processors-paper.pdf
Hi LiaoMi
The information collected comes from my own code and my test string list. Using a different setup may alter the results. :icon_idea:
The hashing algorithms I've tested so far are FNV-1, FNV-1a, 2 CRC variants, and the proprietary Python implementation.
IMO hardware-accelerated CRC32 and SHA make no sense for regular strings. See my post above.
I'm curious to see how xxHash performs. :cool:
Biterider
Hi LiaoMi,
thank you for your patience and the time you've spent.
Quotetry this, it works for me, I updated the libraries, you can find a working example in the archive.
the link needs OLDNAMES.lib.
Quote from: six_L on January 24, 2022, 03:51:57 AM
Hi LiaoMi,
thank you for your patience and the time you've spent.
Quotetry this, it works for me, I updated the libraries, you can find a working example in the archive.
the link needs OLDNAMES.lib.
Hi six_L,
the attached archive has this file, this library is available in many packages :icon_idea: What bitness do you assemble?
Hi LiaoMi,
QuotexxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_free, XXH32_freeState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_malloc, XXH32_createState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: memcpy, XXH32_update
msvcrt.lib : warning LNK4272:
oldnames.lib : warning LNK4272:
hashvrf_5.exe : fatal error LNK1120: 3
QuoteWhat bitness do you assemble?
X64
Hi
It seems that I was wrong in the assumption that CRC32 has to be used in blocks. :sad:
It is perfectly possible to use it on single BYTEs.
I'll write some code to test it.
Biterider
Hi
I hacked something together using CRC32. I don't care about bit reflection at the moment since the goal is to get a hash value.
The results differ from my previous attempts because the polynomial used for the CRC32 is also different.
The results beat the FNV-1a for the first time, which is great.
mov rdx, pKey                   ;rdx -> WCHAR key string
xor eax, eax                    ;Clear the CRC accumulator
.while TRUE
    mov cx, WORD ptr [rdx]      ;Load the next WCHAR
    test cx, cx                 ;Set ZF on the terminator; CRC32 leaves the flags untouched
    CRC32 rax, cl               ;Accumulate the low byte
    mov cl, ch
    CRC32 rax, cl               ;Accumulate the high byte
    .break .if ZERO?            ;Exit after hashing the terminating WCHAR
    add rdx, 2
.endw
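For anyone who wants to cross-check values on the Python side, here is a minimal byte-at-a-time model of the same idea, using CRC-32C (the Castagnoli/iSCSI polynomial implemented by the x86 CRC32 instruction). Note that the hardware instruction applies no initial or final XOR, so the raw register value differs from conventional CRC-32C output; `hash_wide_string` is a hypothetical helper mirroring the loop above, including the terminating null WCHAR:

```python
POLY = 0x82F63B78  # CRC-32C (Castagnoli), reflected form; same polynomial as the x86 CRC32 instruction

def crc32c_update(crc: int, data: bytes) -> int:
    # Byte-at-a-time update, modelling what `crc32 eax, cl` does per byte.
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc

def crc32c(data: bytes) -> int:
    # Conventional CRC-32C: initial value and final XOR of 0xFFFFFFFF.
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF

def hash_wide_string(s: str) -> int:
    # Mirror of the assembly loop above: start at 0 (the `xor eax, eax`),
    # no final XOR, and feed the UTF-16LE bytes including the null WCHAR.
    return crc32c_update(0, (s + "\x00").encode("utf-16-le"))
```

The `crc32c` wrapper reproduces the standard check value (CRC-32C of "123456789" is 0xE3069283), which makes it easy to validate against other implementations.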
Biterider
Quote from: six_L on January 24, 2022, 04:58:58 AM
Hi LiaoMi,
QuotexxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_free, XXH32_freeState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: __imp_malloc, XXH32_createState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019: memcpy, XXH32_update
msvcrt.lib : warning LNK4272:
oldnames.lib : warning LNK4272:
hashvrf_5.exe : fatal error LNK1120: 3
QuoteWhat bitness do you assemble?
X64
Hi six_L,
in the 32-bit example, we need to reverse the order of the two registers (edx, eax) so that both examples show the same data:
invoke XXH3_64bits, addr msg, SIZEOF msg
invoke printf, addr XXH64msg, edx, eax
Attached is an example for 64 bit :thup:
Quote from: Biterider on January 24, 2022, 06:03:08 AM
Hi
I hacked something together using CRC32. I don't care about bit reflection at the moment since the goal is to get a hash value.
The results differ from my previous attempts because the polynomial used for the CRC32 is also different.
The results beat the FNV-1a for the first time, which is great.
Hi Biterider,
is it a graphical representation of the charts from Python? :rolleyes: On what principle does your hash search work? Looking at the code, is it a simple search over the values?
; --------------------------------------------------------------------------------------------------
; Method: Dictionary.Search
; Purpose: Search in the Dictionary for a given Key.
; Arguments: Arg1: -> Key
; Return: xax -> Item or NULL if not found.
Method Dictionary.Search, uses xbx xdi xsi, pKey:PSTRING
;   DbgText "Dictionary.Search"
    SetObject xsi
    OCall xsi.Hash, pKey
    mov xdi, xax                        ;Remember the hash
    mov ebx, [xsi].dMask
    and xbx, xax                        ;ebx is used as index into a pointer array
    ;Search for the correct item where the hash and the Key match
    .while TRUE
        mov xdx, [xsi].pBuckets
        mov xax, POINTER ptr [xdx + sizeof(POINTER)*xbx]
        .if xax == HTL_BUCKET_EMPTY
            DbgText "Not found"
            ExitMethod
        .endif
        .if xax != HTL_BUCKET_TOMBSTONE
            invoke StrComp, pKey, [xax].DIC_ITEM.pKey
            .break .if eax == 0         ;Found => exit loop
        .endif
        shr xdi, DIC_PERTURB_SHIFT      ;edi = Perturbation
        lea ebx, [4*ebx + ebx + 1]
        add ebx, edi
        and ebx, [xsi].dMask
    .endw
    mov xdx, [xsi].pBuckets
    mov xax, POINTER ptr [xdx + sizeof(POINTER)*xbx]
MethodEnd
Optimizing HashMap's Performance - https://www.baeldung.com/java-hashmap-optimize-performance
3. Tree Instead of LinkedList
Starting from Java 8, one optimization is built into HashMap: when buckets get too large, they're transformed into trees instead of linked lists. That brings the pessimistic time from O(n) down to O(log(n)), which is much better. For that to work, the keys of the HashMap need to implement the Comparable interface.
That's a nice and automatic solution, but it's not perfect. O(log(n)) is still worse than the desired constant time, and transforming and storing trees takes additional power and memory.
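That bucket-treeification idea can be sketched in a few lines. This is a hypothetical toy, using a sorted, binary-searched list in place of Java's red-black tree; the class name and threshold constant are illustrative only:

```python
import bisect

TREEIFY_THRESHOLD = 8  # illustrative; a large bucket switches representation

class ChainedMap:
    """Chained hash map whose overgrown buckets switch from O(n) list
    scans to O(log n) binary search over a sorted key list."""
    def __init__(self, nbuckets=16):
        # Each bucket: [keys, values, is_sorted]
        self.buckets = [[[], [], False] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        b = self._bucket(key)
        keys, values = b[0], b[1]
        if b[2]:  # "treeified" bucket: binary search
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                values[i] = value
            else:
                keys.insert(i, key)
                values.insert(i, value)
            return
        if key in keys:  # small bucket: linear scan
            values[keys.index(key)] = value
            return
        keys.append(key)
        values.append(value)
        if len(keys) > TREEIFY_THRESHOLD:
            # "Treeify": sort once; later operations binary-search.
            order = sorted(range(len(keys)), key=keys.__getitem__)
            b[0] = [keys[i] for i in order]
            b[1] = [values[i] for i in order]
            b[2] = True

    def get(self, key, default=None):
        keys, values, is_sorted = self._bucket(key)
        if is_sorted:
            i = bisect.bisect_left(keys, key)
            return values[i] if i < len(keys) and keys[i] == key else default
        return values[keys.index(key)] if key in keys else default
```

As in Java, the keys of a treeified bucket must be mutually comparable, which is exactly the Comparable requirement mentioned above.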
4.3. Custom hashCode
Usually, we want to override the equals method, and then we also need to override hashCode. Sometimes, we can take advantage of the specific identity of the class and easily make a very fast hashCode method.
Let's say our object's identity is purely based on its integer id. Then, we can just use this id as a hash function:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    MemberWithId that = (MemberWithId) o;
    return id.equals(that.id);
}

@Override
public int hashCode() {
    return id;
}
A high-performance hashmap (when the load factor is high), using an AVL tree instead of a linked list to resolve hash collisions - https://github.com/LYKZZzz/HashMap
Using a red-black tree and a hash map to implement a spelling corrector - https://github.com/nirnicole/Spelling_Corrector-C
OPTIMIZING HASHMAPS EVEN MORE - https://blog.yoshuawuyts.com/optimizing-hashmaps-even-more/
However we've known for a long time about a faster hashmap than Swisstable: the perfect hashtable. The idea is that if you know all your keys ahead of time, you can create a lookup table which never creates collisions, never has to re-index the hashmap, or grow in size. Though it's not always possible to know all keys ahead of time, so there are limits to when it can be used.
Even so, let's take a look at when it does make sense to use perfect hashtables, and show examples of how to implement them.
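As a sketch of the brute-force approach (hypothetical helper names; real generators like gperf are far more sophisticated): keep trying salts until a salted FNV-1a maps every known key to a distinct slot.

```python
def fnv1a(data: bytes, salt: int = 0) -> int:
    # FNV-1a, 32-bit, with a salt XORed into the offset basis so we can
    # re-try different hash functions until one is collision-free.
    h = (0x811C9DC5 ^ salt) & 0xFFFFFFFF
    for b in data:
        h = ((h ^ b) * 0x01000193) & 0xFFFFFFFF
    return h

def build_perfect_table(keys, size=None):
    """Return (salt, table) such that every key occupies a unique slot."""
    size = size or len(keys)
    for salt in range(1_000_000):
        slots = [fnv1a(k, salt) % size for k in keys]
        if len(set(slots)) == len(keys):  # no collisions: perfect
            table = [None] * size
            for k, s in zip(keys, slots):
                table[s] = k
            return salt, table
    raise ValueError("no perfect salt found; try a larger size")

def lookup(key, salt, table):
    # One hash, one compare: no probing, no chains.
    return table[fnv1a(key, salt) % len(table)] == key
```

This only pays off when the key set is fixed (keyword tables, opcode mnemonics, etc.), which is exactly the limitation the article mentions.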
Sorting a HashMap - https://www.interviewkickstart.com/learn/sort-hashmap
HashMap is a very popular data structure used in the programming paradigm. Many programming problems in software engineering interviews can be tackled by the simple use of a HashMap.
In this article, we will cover:
The Internal Workings of a HashMap
Problem Statement
Solution #1: Sorting a HashMap Using a LinkedHashMap
Solution #2: Sorting a HashMap Using a TreeMap
Time and Space Complexity of Sorting a HashMap
FAQs on Sorting a HashMap
Hi, LiaoMi
as you said, for short data/strings, the simplicity of FNV or djb2 is hard to beat; they are very performant on short data as well.
draw_PointThread proc USES rdi rbx rsi
    lea rsi, org_str
    invoke GetSystemTimePreciseAsFileTime, addr StartTime
@@:
    invoke RtlZeroMemory, addr NameA, sizeof NameA
    lea rdi, NameA
    invoke iRandom_1, 3, 15
    mov _rndstrlen, rax                 ;NameALen
    xor rbx, rbx
    .repeat
        invoke iRandom_1, 0, org_StrlenMax
        mov al, [rsi+rax]
        mov [rdi+rbx], al
        inc rbx
    .until rbx == _rndstrlen
    invoke lstrlen, addr NameA
    invoke XXH3_64bits, addr NameA, rax
    xor rdx, rdx
    mov rcx, x_max
    div rcx                             ;hash % x_max
    mov hHash_IndexA, edx
    invoke RtlZeroMemory, addr NameB, sizeof NameB
    lea rdi, NameB
    invoke iRandom_1, 3, 15
    mov _rndstrlen, rax                 ;NameBLen
    xor rbx, rbx
    .repeat
        invoke iRandom_1, 0, org_StrlenMax
        mov al, [rsi+rax]
        mov [rdi+rbx], al
        inc rbx
    .until rbx == _rndstrlen
    invoke lstrlen, addr NameB
    invoke XXH3_64bits, addr NameB, rax
    xor rdx, rdx
    mov rcx, y_max
    div rcx                             ;hash % y_max
    mov hHash_IndexB, edx
    invoke InvalidateRect, Main_hWnd, 0, FALSE
    invoke UpdateWindow, Main_hWnd
    lea rdi, TxtColorTable
    mov rax, dqColorCnt
    mov rcx, [rdi+rax*4]
    mov point_cor, ecx
    inc dqColorCnt
    .if dqColorCnt == 6
        mov dqColorCnt, 0
    .endif
    invoke Sleep, 1
    inc dqTotalCalcHash
    .if dqTotalCalcHash >= 10000
        mov StopThreadFlag, TRUE
    .endif
    cmp StopThreadFlag, TRUE
    jnz @B
    invoke GetSystemTimePreciseAsFileTime, addr EndTime
    xor rax, rax
    xor rdx, rdx
    mov eax, StartTime.dwHighDateTime
    shl rax, 32
    mov edx, StartTime.dwLowDateTime
    add rax, rdx
    mov rcx, rax
    xor rax, rax
    xor rdx, rdx
    mov eax, EndTime.dwHighDateTime
    shl rax, 32
    mov edx, EndTime.dwLowDateTime
    add rax, rdx
    sub rax, rcx
    invoke wsprintf, addr szBuffer, CStr("SpendTime: %Id"), rax
    invoke SetWindowText, Main_hWnd, addr szBuffer
    ret
draw_PointThread endp
1669717931 (us) XXH3_64bits
1669303733 (us) FNV-1A
FNV-1A is 414198 (us) faster than XXH3_64bits when calculating 20000 hash values.
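For reference, the FNV-1a being benchmarked here boils down to very little code; a 32-bit Python model using the constants from the official FNV page:

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193  # 16777619

def fnv1a_32(data: bytes) -> int:
    # FNV-1a: XOR the byte in first, then multiply by the prime.
    h = FNV_OFFSET_BASIS_32
    for b in data:
        h = ((h ^ b) * FNV_PRIME_32) & 0xFFFFFFFF
    return h
```

An empty input returns the offset basis, and each byte costs one XOR and one multiply, which is why FNV-1a is so hard to beat on keys of a dozen characters.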
On a side note, six_L: I had never seen GetSystemTimePreciseAsFileTime - interesting, thanks. It seems, though, that it is not ideal for benchmarking (https://stackoverflow.com/questions/34557407/queryperformancecounter-or-getsystemtimepreciseasfiletime-when-using-sntp) ("QueryPerformanceCounter will never speed up or slow down") :cool:
Hi LiaoMi
Quote from: LiaoMi on January 24, 2022, 08:24:02 AM
is it a graphical representation of the charts from Python? :rolleyes: On what principle does your hash search work? Looking at the code, is it a simple search over the values?
I coded my own testbed and dumped the results into an Excel spreadsheet to create the charts.
The HashTable and Dictionary are very similar to what Python does. They use "open addressing", i.e. no trees or linked lists are used in the event of a collision. Instead, the next available bucket is used to store the pointer to the dictionary data. The search pattern to find this bucket comes from Python. It uses a perturbation to take advantage of the unused part of the hash number. The concept is somewhat similar to double hashing, but with much less computational effort.
The strategy is described in the following source
https://github.com/python/cpython/blob/main/Objects/dictobject.c#:~:text=%23define%20PERTURB_SHIFT%205 (https://github.com/python/cpython/blob/main/Objects/dictobject.c#:~:text=%23define%20PERTURB_SHIFT%205)
By the way, if there is no need to delete information from the dictionary, the tombstones can be removed.
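The probe sequence from dictobject.c can be modelled in a few lines of Python; it is the same recurrence (5*i + 1 plus the shifted perturb) that appears in the Search method posted earlier:

```python
PERTURB_SHIFT = 5  # same constant as CPython's dictobject.c

def probe_sequence(hash_value: int, mask: int):
    """Yield the bucket indices CPython's dict would probe for this hash."""
    perturb = hash_value
    i = hash_value & mask
    while True:
        yield i
        # Fold unused high hash bits into the next index, then advance with
        # i = 5*i + perturb + 1; once perturb reaches 0 this LCG has full
        # period over a power-of-two table, so every slot is eventually hit.
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask
```

Because the sequence eventually visits every slot, a lookup is guaranteed to terminate at an empty bucket (or a match) as long as the table is never completely full.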
Biterider
Quote from: six_L on January 24, 2022, 06:24:30 PM
Hi, LiaoMi
as you said, for short data/strings, the simplicity of FNV or djb2 is hard to beat; they are very performant on short data as well.
Hi six_L,
thanks for the stats and tests! In general, this is not surprising if we compare the code and the input data :icon_idea:
Could you please check these functions? - XXH32 and XXH64, XXHL64_default_sse2, XXHL64_default_avx2, XXH3_64bits_dispatch ...
Hi jj2007,
thank you for the following useful information.
QuoteAs long as your computer's clock is not currently undergoing SystemTimeAdjustment, the elapsed intervals measured by:
- GetSytemTimeAsPreciseFileTime
- QueryPerformanceCounter
will both:
- be in sync
- be within a few tenths of a microsecond of each other
- have an ultimate resolution of 100ns (0.1 us)
For accurate time measurements these are the two functions you should use. The question, which one do you care for the task:
GetSystemTimeAsPreciseFileTime: high resolution measurement of "now"
- good for log files
- timestamping of events
- synchronized to UTC for cross-machine agreement of "now"
QueryPerformanceCounter: high resolution measurement of "elapsed time"
- good for bench-marking
- measuring duration of processes
So GetSystemTimePreciseAsFileTime is not ideal for benchmarking.
Hi LiaoMi,
QuoteCould you please check these functions? - XXH32 and XXH64, XXHL64_default_sse2, XXHL64_default_avx2, XXH3_64bits_dispatch ...
I'll test them later.
XXH3: a new hash speed record holder - https://sudonull.com/post/31491-XXH3-New-Hash-Rate-Record-GlobalSign-Blog
Some algorithms from Google:
Cityhash - https://github.com/google/cityhash
Fast strong hash functions: SipHash/HighwayHash - https://github.com/google/highwayhash
https://en.wikipedia.org/wiki/SipHash
SipHash is an add–rotate–xor (ARX) based family of pseudorandom functions created by Jean-Philippe Aumasson and Daniel J. Bernstein in 2012. SipHash computes a 64-bit message authentication code from a variable-length message and 128-bit secret key. It was designed to be efficient even for short inputs, with performance comparable to non-cryptographic hash functions, such as CityHash.
C library to calculate the FNV Hash function in 32, 64, 128, 256 or 1024 bits. The FNV-1a version is supported - https://github.com/fnvhash/libfnv
(C language FNV hash function library with all supported bit lengths of FNV-1a: 32, 64, 128, 256, 512, and 1024)
t1ha - One of the fastest hash functions - https://github.com/erthink/t1ha
WangYi´s wyhash - The FASTEST QUALITY hash function, random number generators (PRNG) and hash map - https://github.com/wangyi-fudan/wyhash
Git mirror of the Fowler Noll Vo (FNV) hash algorithm original C source - https://github.com/catb0t/fnv-hash or http://www.isthe.com/chongo/tech/comp/fnv/index.html
Hi
Here is a simple description of how xxHash works.
The secret of its speed is the parallel processing of 4 streams. :icon_idea:
https://create.stephan-brumme.com/xxhash/#git1 (https://create.stephan-brumme.com/xxhash/#git1)
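To illustrate the four-stream idea only: this is not real xxHash. The prime constants and the per-lane round below are taken from XXH32, but the length mixing and avalanche finalization are omitted, so outputs will not match any xxHash variant.

```python
# Real XXH32 prime constants; the overall structure is simplified.
P1, P2, P3 = 2654435761, 2246822519, 3266489917
MASK32 = 0xFFFFFFFF

def rotl(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32

def four_lane_hash(data: bytes, seed: int = 0) -> int:
    # Four independent accumulators let the CPU work on four 4-byte words
    # per 16-byte stripe with no dependency chain between the lanes --
    # that is the source of xxHash's speed on long inputs.
    acc = [(seed + P1 + P2) & MASK32, (seed + P2) & MASK32,
           seed & MASK32, (seed - P1) & MASK32]
    n = len(data) - len(data) % 16
    for off in range(0, n, 16):
        for lane in range(4):
            w = int.from_bytes(data[off + 4*lane: off + 4*lane + 4], "little")
            acc[lane] = (rotl((acc[lane] + w * P2) & MASK32, 13) * P1) & MASK32
    # Merge the four lanes, then mix in the tail bytes one at a time.
    h = (rotl(acc[0], 1) + rotl(acc[1], 7)
         + rotl(acc[2], 12) + rotl(acc[3], 18)) & MASK32
    for b in data[n:]:
        h = (rotl((h + b * P3) & MASK32, 11) * P1) & MASK32
    return h
```

On short strings the stripe loop never runs, which is another way of seeing why the four-lane trick buys nothing for the dictionary keys discussed in this thread.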
Biterider
Hi LiaoMi,
there were better benchmarking approaches in our forum, e.g. getting the CPU info and setting the thread priority level, but they are x86 style. I need time to convert them to x64.
Sorry for the slow testing.
;=================================================
QuoteWhat are the longest words in the Oxford English Dictionary?
Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
Hippopotomonstrosesquippedaliophobia (36 letters)
Supercalifragilisticexpialidocious (34 letters)
Pseudopseudohypoparathyroidism (30 letters)
Floccinaucinihilipilification (29 letters)
Antidisestablishmentarianism (28 letters)
we'll use the hash to determine a data structure's position, not as a checksum, so the test data is limited to less than 64 characters.
Regards.
Hi!
Quote from: LiaoMi on January 24, 2022, 08:24:02 AM
Sorting a HashMap - https://www.interviewkickstart.com/learn/sort-hashmap
HashMap is a very popular data structure used in the programming paradigm. Many programming problems in software engineering interviews can be tackled by the simple use of a HashMap.
Then:
1) For most purposes, are HashTable and HashMap exactly the same thing?
2) You never sort a HashMap; you create a new, ordered HashMap?
3) Can the HashTable be sorted by design?
Thanks in advance, HSE.
Hi HSE
The example of the linked article only refers to "open hashing" or "closed addressing", which usually means chaining.
As far as I can tell in this context, the difference between a HashTable and a HashMap is more semantic. The map is a special case of the table where the bucket contains a pointer to the data, while the more general case of the table has the complete data stored in the bucket.
In the case of the article, there are many roads that lead to Rome, and it is up to the programmer to choose the best solution. For example, you can use superhero strength as the key value and put the superhero names in sorted lists attached to the buckets. In this case, the items are sorted by design.
One thing must be clear: hash tables are not primarily intended for sorting. The goal is to be as close to O(1) as possible.
Biterider
Hi Biterider!!
A little clearer now. :thumbsup:
In lockdown I began translating a very complex Java program. Almost everything was easily translated to ObjAsm32, but there was a problem with HashTables and HashMaps. :biggrin:
Quote from: Biterider on January 27, 2022, 07:38:05 AM
For example, you can use superhero strength as the key value and put the superhero names in a sorted lists attached to the buckets. In this case, the Items are sorted by design.
I'm reading that Java project now, and become clear that is something like that.
Looks like using HashTables as tables can add a lot of flexibility at a not-so-high price, but I skipped all that and made simple tables :cool:.
Thanks, HSE.
Hi
Looking for the hardware implementation of the CRC calculation, I found this extremely easy-to-understand video.
I highly recommend it. :thumbsup:
https://www.youtube.com/watch?v=sNkERQlK8j8 (https://www.youtube.com/watch?v=sNkERQlK8j8)
Biterider
QuoteHash tables are a popular way to implement the dictionary data type. Theoretically they are intuitive to understand. However, real-life implementations of hash tables can be complex.
Hash tables waste 40% of memory with a 1024-item maximum when user items reach 1000; maybe that wasted 40% of memory will never be used in the future.
Hi six_L
You're absolutely right, but don't forget what the HashTable is for. From my perspective, it is not intended for memory optimization, but for quick data access using a key.
Imagine a compiler where you put the symbols into a hash table. It doesn't matter how much memory you waste because after compilation the application will be closed and all memory will be released to the operating system. What you really want is a fast compilation process!
Regards, Biterider
Hi Biterider,
I understand. Regarding speed and memory, we can't have both: we either lose speed or lose memory.
;================================
1. Indices_Array stores the DictObjectIndex, Entries_Array stores the CollisionIndex, and Collision_Array stores the CollisionIndex. If actual addresses were stored, backup and recovery of this memory database could not work.
2. Why does the hashtable still use comparison instructions? Because a collision is possible at any time, the target DictObject item must be confirmed by szStrCmp.
3. When deleting a DictObject item, keep nextaddr in place; otherwise the open-addressing chain will break and the DictObject items on that chain will be lost. Moving the following data along one by one would take a lot of time.
4. Index zero is replaced with "-1" to distinguish the initialization value "0" from a written "0".
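Point 3 is the classic tombstone problem. A minimal open-addressing sketch (hypothetical structure, linear probing instead of the perturbation probe for brevity) shows why the marker must stay in place:

```python
EMPTY, TOMBSTONE = object(), object()

class OpenTable:
    """Open addressing with linear probing; deletion leaves a tombstone
    so that probe chains running through the deleted slot stay intact.
    Assumes the table never becomes completely full."""
    def __init__(self, size=8):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def insert(self, key, value):
        # Simplified: a production version would first scan for an existing
        # key before reusing the first tombstone it meets.
        for i in self._probe(key):
            s = self.slots[i]
            if s is EMPTY or s is TOMBSTONE or s[0] == key:
                self.slots[i] = (key, value)
                return

    def find(self, key):
        for i in self._probe(key):
            s = self.slots[i]
            if s is EMPTY:              # end of chain: not present
                return None
            if s is not TOMBSTONE and s[0] == key:
                return s[1]

    def delete(self, key):
        for i in self._probe(key):
            s = self.slots[i]
            if s is EMPTY:
                return
            if s is not TOMBSTONE and s[0] == key:
                self.slots[i] = TOMBSTONE  # keep the chain walkable
                return
```

If `delete` wrote EMPTY instead of TOMBSTONE, any key that had collided past the deleted slot would become unreachable, which is exactly the broken-chain scenario described above.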