News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests

Main Menu

Hash tables for ultra fast dictionaries

Started by Biterider, January 10, 2022, 01:09:58 AM

Previous topic - Next topic

six_L

Hi,LiaoMi
thanks your respond.

i only use the XXH3_64bits PROTO  input:QWORD ,length1:QWORD.
now the link wants vc.lib, i can't find where it is.


regard.
Say you, Say me, Say the codes together for ever.

LiaoMi

Quote from: six_L on January 23, 2022, 03:31:25 AM
Hi,LiaoMi
thanks your respond.

i only use the XXH3_64bits PROTO  input:QWORD ,length1:QWORD.
now the link wants vc.lib, i can't find where it is.


regard.

Hi six_L,

these functions _free, _malloc, __allmul, _aullshr, _memcpy, _memset need to be rewritten, otherwise we will have to link the entire library of functions ... some of the routines involve a lot of garbage. I replaced all the obvious functions, but the visual studio itself inserts constructions like memcpy and memset  :undecided: ... I should make a separate library with functions  :icon_idea:


six_L

Hi, LiaoMi
thanks your respond.
i hope you'll make a separate library with the XXH3_64bits PROTO  input:QWORD ,length1:QWORD

regard
Say you, Say me, Say the codes together for ever.

Biterider

Hi
I played a bit with the CRC hash algo. I had an implementation in 2 flavors (ANSI and PKZip) that I used for data transfer testing.
I put the code in the dictionary and ran the same tests as before.
have to make one thing clear. When I speak of collisions below, I mean total collisions: this is due to the fact that I am using "open addressing". If you only look at hash collisions, these can be smaller.

Looking at the results, it is clear that FNV-1a is faster and has fewer collisions than the CRC ANSI version. What is interesting is that the algorithm used for PKZip has fewer collisions but has the same execution time as its ANSI brother.

I'm curious to see if the hardware-accelerated CRC32 variant performs better. It requires SEE4.2, which should work on my machine.  :rolleyes:

The source is attached.

Biterider



LiaoMi

Quote from: six_L on January 23, 2022, 04:08:24 PM
Hi, LiaoMi
thanks your respond.
i hope you'll make a separate library with the XXH3_64bits PROTO  input:QWORD ,length1:QWORD

regard

Hi six_L,

try this, it works for me, I updated the libraries, you can find a working example in the archive.

Biterider

Hi
Looking for a faster way to calculate a CRC hash, I looked into CRC32 instruction and found an official Intel paper that describes various methods on how to do that https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
The way it works is that the CRC for medium-sized memory blocks is calculated in parallel and then combined with one another at the end.
This way of working is suitable for processing large blocks of memory, such as are used for transfers, and not for comparatively short strings.  :sad:

Fortunately, there is a lot of interesting assembly code in the article that can be used for other purposes.  :thumbsup:

I found the SHA statement extension, but it works in the same way as the CRC32, and isn't efficient to use for this either.

Biterider

LiaoMi

 :biggrin: in the example above, I forgot to expand printf XXH3_64bits to QWORD

    .486
    .model flat,stdcall
    option casemap:none

.nolist
.nocref

XXH32 PROTO  input:PTR ,length1:PTR ,seed:DWORD
XXH3_64bits PROTO  input:PTR ,length1:PTR
printf PROTO C arg1:Ptr Byte, printlist: VARARG
Sleep PROTO :DWORD

includelib xxHashLib_x86.lib
includelib CrtEmu_x32.lib

%noop MACRO
nop
nop
ENDM

SCALE equ 7

.data
msg db "Hello Hello!!! =)",10,0
XXH32msg db "XXH32: %x",10,0
XXH64msg db "XXH64: %x%x",10,0
.data?

.code

mainCRTStartup proc

invoke printf, addr msg

invoke XXH32, addr msg, SIZEOF msg, 797
invoke printf, addr XXH32msg, eax
%noop

invoke XXH3_64bits, addr msg, SIZEOF msg
invoke printf, addr XXH64msg, eax, edx

%noop

mov al, 1
and al,NOT byte PTR (SCALE)
and al,NOT(SCALE)

%noop

; start delay

mov ecx, 100
mov esi, 10
delay2:
dec ecx
nop
jnz delay2
dec esi
cmp esi,0   
jnz delay2
; end delay

ret
mainCRTStartup endp

end

LiaoMi

Quote from: Biterider on January 24, 2022, 02:11:58 AM
Hi
Looking for a faster way to calculate a CRC hash, I looked into CRC32 instruction and found an official Intel paper that describes various methods on how to do that.
The way it works is that the CRC for medium-sized memory blocks is calculated in parallel and then combined with one another at the end.
This way of working is suitable for processing large blocks of memory, such as are used for transfers, and not for comparatively short strings.  :sad:

Fortunately, there is a lot of interesting assembly code in the article that can be used for other purposes.  :thumbsup:

I found the SHA statement extension, but it works in the same way as the CRC32, and isn't efficient to use for this either.

Biterider

Hi Biterider,

are your graphs based on your library that you recently published?  :rolleyes: It seems to me that SHA should be slower than all other algorithms?!  :icon_idea:

Fast CRC Computation for iSCSI Polynomial using CRC32 - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf
Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf

xxHash for small keys: the impressive power of modern compilers - https://fastcompression.blogspot.com/2018/03/xxhash-for-small-keys-impressive-power.html


SHA-3 Hash with Keccak - https://masm32.com/board/index.php?topic=8508.0
Quotehere is a SHA-3 hashing algorithm with or without Keccak encryption. The algorithm is consistent with the SHA-3 standard plus the keccak routine to enforce security.

The algorithm produces 224, 256, 384 and 512 bits of data to be used as a signature of whatever file or text that is needed.

I built one biased  to these ones:

http://gauss.ececs.uc.edu/Courses/c6053/lectures/Hashing/sha3
https://github.com/magurosan/sha3-odzhan


The Intel® SHA Extensions are designed to accelerate SHA-1 and SHA-256 processing.
There are seven new SSE-based instructions, four supporting SHA-1 and three for SHA-256:
SHA1RNDS4, SHA1NEXTE, SHA1MSG1, SHA1MSG2
SHA256RNDS2, SHA256MSG1, SHA256MSG2

New Instructions Supporting the Secure Hash Algorithm on Intel® Architecture Processors - https://www.intel.com/content/dam/develop/external/us/en/documents/intel-sha-extensions-white-paper-402097.pdf
Fast SHA-256 Implementations on Intel® Architecture - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/sha-256-implementations-paper.pdf Source - https://github.com/Dar13/intel-sha-testbed/archive/refs/heads/master.zip
Fast SHA512 Implementations on Intel® Architecture Processors - https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-sha512-implementations-ia-processors-paper.pdf

Biterider

Hi LiaoMi
The information collected comes from my own code and my test string list. Using a different setup may alter the results.  :icon_idea:
The hashing algorithms I've tested so far are FNV-1, FNV-1a, 2 CRC variants, and the proprietary Python implementation.

IMO hardware accelerated CRC32 and SHA makes no sense for regular strings. See my post above.

I'm curious to see how xxHash performs.  :cool:

Biterider

six_L

Hi,LiaoMi
thanks you for your patience and the time you've spent.
Quotetry this, it works for me, I updated the libraries, you can find a working example in the archive.
the link needs OLDNAMES.lib.
Say you, Say me, Say the codes together for ever.

LiaoMi

Quote from: six_L on January 24, 2022, 03:51:57 AM
Hi,LiaoMi
thanks you for your patience and the time you've spent.
Quotetry this, it works for me, I updated the libraries, you can find a working example in the archive.
the link needs OLDNAMES.lib.

Hi six_L,

the attached archive has this file, this library is available in many packages  :icon_idea: What bitness do you assemble?

six_L

hi,LiaoMi
QuotexxHashLib_x64.lib(xxhash.obj) : error LNK2019:  __imp_free, XXH32_freeState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019:  __imp_malloc, XXH32_createState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019:  memcpy, XXH32_update
msvcrt.lib : warning LNK4272:
oldnames.lib : warning LNK4272:
hashvrf_5.exe : fatal error LNK1120: 3
QuoteWhat bitness do you assemble?
X64
Say you, Say me, Say the codes together for ever.

Biterider

Hi
It seems that I was wrong in the assumption that CRC32 has to be used in blocks.  :sad:
It is perfectly possible to use it on single BYTEs.
I'll write some code to test it.

Biterider

Biterider

Hi
I hacked something together using CRC32. I don't care about bit reflection at the moment since the goal is to get a hash value.
The results differ from my previous attempts because the polynomial used for the CRC32 is also different.

The results beat the FNV-1a for the first time, which is great.


  mov rdx, pKey
  xor eax, eax
  .while TRUE
    mov cx, WORD ptr [rdx]
    test cx, cx
    CRC32 rax, cl
    mov cl, ch
    CRC32 rax, cl
    .break .if ZERO?
    add rdx, 2
  .endw


Biterider

LiaoMi

Quote from: six_L on January 24, 2022, 04:58:58 AM
hi,LiaoMi
QuotexxHashLib_x64.lib(xxhash.obj) : error LNK2019:  __imp_free, XXH32_freeState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019:  __imp_malloc, XXH32_createState
xxHashLib_x64.lib(xxhash.obj) : error LNK2019:  memcpy, XXH32_update
msvcrt.lib : warning LNK4272:
oldnames.lib : warning LNK4272:
hashvrf_5.exe : fatal error LNK1120: 3
QuoteWhat bitness do you assemble?
X64

Hi six_L,

in the 32 bit example, we need to reverse the order of the two registers (edx, eax) so that both examples show the same data. invoke printf, addr XXH64msg, edx, eax

invoke XXH3_64bits, addr msg, SIZEOF msg
invoke printf, addr XXH64msg, edx, eax


Attached is an example for 64 bit  :thup: