I want to work with a complete hash table 21 GB in length, where each key maps to a unique bucket (key = pointer). What is the best and quickest way (MASM 32 or 64) to handle it, knowing that I have just 8 GB of RAM?
I don't think there is a simple solution. I gather what you are after is an index to data in a database which is stored in memory, and to load that index you would need to scan the very large data file to get the offset of each member you are after. Once you have scanned the data file and have a list of offsets, then to get any individual piece of data from the large file you can keep the file open and just change the offset to the database member you want. You could also think of using a memory-mapped file, which may be faster than the normal seek operation.
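A minimal sketch of the mapped-file approach (my own illustration; hFile, hMapping and pView are hypothetical names, and error handling is omitted):

; create a read-only mapping over an already-open file handle hFile
Invoke CreateFileMapping, hFile, NULL, PAGE_READONLY, 0, 0, NULL
mov hMapping, eax
; map a 64 KB window of the file starting at offset 0
Invoke MapViewOfFile, hMapping, FILE_MAP_READ, 0, 0, 65536
mov pView, eax                  ; pView now points at the first 64 KB of the file
; ... read records through pView ...
Invoke UnmapViewOfFile, pView
Invoke CloseHandle, hMapping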
Quote from: hutch-- on September 21, 2016, 11:47:13 PM
I don't think there is a simple solution. I gather what you are after is an index to data in a database which is stored in memory, and to load that index you would need to scan the very large data file to get the offset of each member you are after. Once you have scanned the data file and have a list of offsets, then to get any individual piece of data from the large file you can keep the file open and just change the offset to the database member you want. You could also think of using a memory-mapped file, which may be faster than the normal seek operation.
This means that I would have to create a database engine.
It would be easy if I had 32 GB of memory: just load my data into a buffer, then calculate the pointer as follows:
Pointer = BaseAddr + ( Key * 5 )
The record length is 5 bytes; (2^32 records) * 5 bytes = 21,474,836,480 bytes.
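In 64-bit MASM that address calculation is a single LEA plus an ADD (a sketch; the register assignments are my own assumption):

; Pointer = BaseAddr + ( Key * 5 )
; assumes rcx = Key, rdx = BaseAddr
lea rax, [rcx+rcx*4]        ; rax = Key * 5
add rax, rdx                ; rax = BaseAddr + Key * 5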
So how do I make this work in 8 GB of memory, with the quickest possible access to the data?
I agree, a memory-mapped file would probably be handier, as you can map in a specific slice of the file at a specific offset, regardless of the file's size or the memory size. That way you are only consuming a chunk of memory at a time, then sliding this view to consume another chunk, and so on. It is probably handier for sequential access, works best if you know the optimal size to read at a time, and probably requires some sort of index management.
Anyhow, I use something similar for working with large files in a game engine. Once they approach 1 GB in size it is safer to switch to a large-file-mapping option (instead of reading the entire file) that maps in portions of the file based on the size required and the system's allocation granularity (probably for alignment purposes, from what I recall). Here is some rough code to demonstrate how I calculate the offsets and size required, using a CalcLargeFileView function:
LOCAL LargeMapResourceSize:DWORD
LOCAL LargeMapResourceOffset:DWORD

; round the requested size and offset to the system allocation granularity
Invoke CalcLargeFileView, dwRequiredViewSize, dwRequiredViewOffset, Addr LargeMapResourceSize, Addr LargeMapResourceOffset
; map only the adjusted portion of the file (the offset high DWORD is 0 here, so this covers offsets below 4 GB)
Invoke MapViewOfFileEx, LargeMapHandle, FILE_MAP_ALL_ACCESS, 0, LargeMapResourceOffset, LargeMapResourceSize, NULL
And the CalcLargeFileView routine:
;-----------------------------------------------------------------------------------------
; CalcLargeFileView - calculate large mapping size, view and offset.
; Rounds the requested offset down (and the requested size up) to a multiple of the
; system allocation granularity, as MapViewOfFileEx requires.
; Returns in eax the delta from the mapped view offset to the requested offset.
;-----------------------------------------------------------------------------------------
CalcLargeFileView PROC USES EBX dwRequiredViewSize:DWORD, dwRequiredViewOffset:DWORD, lpdwMappedViewSize:DWORD, lpdwMappedViewOffset:DWORD
    LOCAL sysinfo:SYSTEM_INFO
    LOCAL dwAllocationGranularity:DWORD
    LOCAL dwAdjustedOffset:DWORD

    Invoke GetSystemInfo, Addr sysinfo
    mov eax, sysinfo.dwAllocationGranularity
    mov dwAllocationGranularity, eax

    ;--- calculate the mapped view size ---
    mov eax, dwRequiredViewSize
    .IF eax < dwAllocationGranularity
        ; requested size is below the granularity: map two granules so the
        ; requested range still fits after the offset is rounded down
        mov ebx, lpdwMappedViewSize
        mov eax, dwAllocationGranularity
        add eax, dwAllocationGranularity
        mov [ebx], eax
    .ELSE
        mov eax, dwRequiredViewSize
        xor edx, edx
        mov ebx, dwAllocationGranularity
        div ebx                         ; EAX = quotient, EDX = remainder of dwRequiredViewSize / dwAllocationGranularity
        .IF edx > 0                     ; remainder: round the size down, then add two granules of headroom
            mov eax, dwRequiredViewSize
            sub eax, edx
            add eax, dwAllocationGranularity
            add eax, dwAllocationGranularity
            mov ebx, lpdwMappedViewSize
            mov [ebx], eax
        .ELSE                           ; already a multiple of the granularity: use the requested size as-is
            mov eax, dwRequiredViewSize
            mov ebx, lpdwMappedViewSize
            mov [ebx], eax
        .ENDIF
    .ENDIF

    ;--- calculate the mapped view offset ---
    mov eax, dwRequiredViewOffset
    .IF eax < dwAllocationGranularity
        ; the view must start at file offset 0, so the delta into the view
        ; is the requested offset itself
        mov dwAdjustedOffset, eax
        mov ebx, lpdwMappedViewOffset
        mov eax, 0
        mov [ebx], eax
    .ELSE
        mov eax, dwRequiredViewOffset
        xor edx, edx
        mov ebx, dwAllocationGranularity
        div ebx                         ; EAX = quotient, EDX = remainder of dwRequiredViewOffset / dwAllocationGranularity
        .IF edx > 0                     ; remainder: round the offset down to a granularity multiple; the remainder is the delta into the view
            mov dwAdjustedOffset, edx
            mov eax, dwRequiredViewOffset
            sub eax, edx
            mov ebx, lpdwMappedViewOffset
            mov [ebx], eax
        .ELSE                           ; already a multiple of the granularity: use the requested offset as-is
            mov eax, dwRequiredViewOffset
            mov ebx, lpdwMappedViewOffset
            mov [ebx], eax
            mov dwAdjustedOffset, 0
        .ENDIF
    .ENDIF

    mov eax, dwAdjustedOffset           ; return the delta into the mapped view
    ret
CalcLargeFileView ENDP
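As a worked example of what the routine computes (assuming the usual 64 KB allocation granularity):

; request: dwRequiredViewOffset = 150000, dwRequiredViewSize = 10000
; 150000 = 2 * 65536 + 18928, so the view is mapped at file offset 131072
; 10000 < 65536, so the mapped view size becomes 2 * 65536 = 131072
; the routine returns 18928 in eax: the data starts at (view base + 18928)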
I have to save some of the information for later reuse, so the memory-mapped handle, the pointer and the file handle are saved to a custom structure as my 'handle', so that I can process it later on or use it in other custom functions.
If you need more info, let me know.
Thank you fearless, but that's not what I'm looking for.
Memory-mapped files provide fast and transparent random access to files. Let us know, in the unlikely event that you find a better and faster solution.
Quote from: aw27 on September 23, 2016, 09:51:26 PM
Memory-mapped files provide fast and transparent random access to files. Let us know, in the unlikely event that you find a better and faster solution.
Not yet
I didn't find a programmatic solution, but I changed the math until the key length is 16 bits rather than 32, at the cost of some additional instructions.

Old:
Pointer = BaseAddr + ( Key * 5 )
The record length is 5 bytes (4 bytes + 1); (2^32 records) * 5 bytes = 21,474,836,480 bytes = 21 GB

Now:
Pointer = BaseAddr + ( Key * 3 )
The record length is 3 bytes (2 bytes + 1); (2^16 records) * 3 bytes = 196,608 bytes = 200 KB

Mathematics makes every job easy.
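For reference, the new calculation in 32-bit MASM might look like this (a sketch; the register assignments are hypothetical):

; Pointer = BaseAddr + ( Key * 3 )
; assumes eax = 16-bit key (zero-extended), esi = BaseAddr
lea eax, [eax+eax*2]        ; eax = Key * 3
add eax, esi                ; eax = BaseAddr + Key * 3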
What I don't know is what you are trying to do. Hash tables come in many forms: an algorithm to generate an array index that is as unique as possible within the range of the hash table, a collision-detection mechanism with an alternate strategy if a bucket is already in use, and then write, read, delete and/or modify operations, depending on what you want to do with it. The complication of needing far more memory than you have available can probably only be addressed with a memory-mapped file. The alternative is a very big table where you save the offset from the start of the file for each bucket, then move around the file based on the hash of the data you want to access.
File- and disk-based access will be a lot slower than an in-memory hash table array.
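For the index-generation part, a minimal multiplicative (Fibonacci) hash in 32-bit MASM might look like this; a sketch only, with TABLE_BITS as a hypothetical constant:

TABLE_BITS EQU 16               ; table of 2^16 buckets

; fold a 32-bit key into a TABLE_BITS-bit bucket index
HashKey PROC dwKey:DWORD
    mov eax, dwKey
    mov edx, 2654435761         ; ~ 2^32 / golden ratio (Knuth's constant)
    mul edx                     ; eax = low 32 bits of key * constant
    shr eax, 32 - TABLE_BITS    ; keep the top TABLE_BITS bits as the index
    ret
HashKey ENDP

A collision strategy (linear probing, chaining, etc.) would still be needed on top of this, since different keys can land in the same bucket.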
Quote from: hutch-- on September 28, 2016, 01:31:30 AM
What I don't know is what you are trying to do.
Numerical analysis
Hi, MABDELOUAHAB,
:bgrin: :bgrin: :bgrin:
From your example, why do you even need a hash table? Your last comment about the size would easily fit in most memory situations, and from what I can see you could just use the "key" as an offset value into your array. I'm not entirely sure why you need a hash table, and "Numerical analysis" doesn't offer much... (And people said my NDA'ed Registry stuff was cryptic!)
GuruSR.