News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Compar Strings

Started by mabdelouahab, April 29, 2015, 03:00:23 AM

Previous topic - Next topic

mabdelouahab

Hello Forum members

I have 850 keyword; and I will make method have one parameter is string must compare with all keyword and return True if it is equal one of keyword or false if not ; what is the fastest way to make this work   , (max size 10 char,char= A to Z , and uppercase = lowercase, Param=Unicode string)
I have 10000 string to compare, must I give fast  algorithm

nidud

#1
deleted

jj2007

StringsDiffer is reasonably fast, and with option 1 it's case-insensitive. 10,000 comparisons take about one millisecond.

rrr314159

Hi mabdelouahab,

Do you have SIMD (SSE, AVX) instructions? If so, no doubt it would be fastest to use vector operations. How fast must it be? If 100 nanoseconds is good enough and you can use MasmBasic then go with jj2007's recommendation
I am NaN ;)

habran

I would suggest a hash table 8)
Cod-Father

qWord

#5
Quote from: habran on April 29, 2015, 05:50:00 AM
I would suggest a hush table 8)
Me too.
Because the keywords are constant, one could try to find a collision-free hash table by varying the seed value and/or the hash algorithm (in such case the search is in O(1)).
MREAL macros - when you need floating point arithmetic while assembling!

rrr314159

how to use a "hush table": if you want someone to shut up, hit him over the head with a table

OTOH if you want to compare strings you might use a hash table :biggrin:

But seriously - I think it depends how many matches u expect. Hashing will be 100% correct when it rejects the match, but can give a false match (say it matches when it really doesn't). SO if you expect a match to be rare, it's probably good. But not if you expect many matches; because every one has to be checked for authenticity.

It also matters whether the "keyword" mentioned is static (which it probably is, I guess). If it never changes that's good, but if it changes then two hashes must be computed for every comparison.

Bottom line, you're probably right Habran, hashing would be a good approach

[edit] I posted this b4 seeing qWord's post ... qWord you're probably right the keyword is constant, just wasn't sure of that from mabdelouahab's description. Can you really guarantee no collisions? As I say, as long as few matches are expected occasional collision shouldn't hurt much. But if you can find a collision-free table then doesn't matter how many matches expected. And, I guess you can do that if the keyword is constant - just make sure nothing else hashes to the keyword! Hmm, now that u mention it sounds good, never thought of that b4
I am NaN ;)

habran

Quotehow to use a "hush table": if you want someone to shut up, hit him over the head with a table
:lol::eusa_clap:
Cod-Father

qWord

Quote from: rrr314159 on April 29, 2015, 06:04:12 AMCan you really guarantee no collisions?
no guarantee, but you might be lucky and find such a table.
Some time back I've played with that using MurmurHash: at least for several hundred strings I got results by varying just the seed value.

EDIT: I did wrote something (completely) wrong here before.  :icon_redface:
MREAL macros - when you need floating point arithmetic while assembling!

mabdelouahab

Quote from: habran on April 29, 2015, 05:50:00 AM
I would suggest a hash table 8)
Good idea, I forgot, I've worked on it a lot in school days
Quote from: rrr314159 on April 29, 2015, 06:04:12 AM
how to use a "hush table": if you want someone to shut up, hit him over the head with a table
:lol:
Quote from: rrr314159 on April 29, 2015, 06:04:12 AM

[edit] I posted this b4 seeing qWord's post ... qWord you're probably right the keyword is constant, just wasn't sure of that from mabdelouahab's description. Can you really guarantee no collisions? As I say, as long as few matches are expected occasional collision shouldn't hurt much. But if you can find a collision-free table then doesn't matter how many matches expected. And, I guess you can do that if the keyword is constant - just make sure nothing else hashes to the keyword! Hmm, now that u mention it sounds good, never thought of that b4
yes, the keyword is constant, it is Masm Keyword
I have completed work on the generator codes for com for masm, but I figured the problem of function names, and Parameter Names, Class Names, sometimes identical to one of the words
for example method Invoke of Idispatsh, sometimes Function Named Instr or "Type","Name", ...

rrr314159

mabdelouahab,

seems to me hash table is not necessary and it's a lot more coding than simple comparison.

And, probably 100 nanos / comparison is fast enough for you? A simple comparison routine with SIMD registers could be made even faster, prob 30 nanos or less.

So I go back to my first recommendation:

Quote from: rrr314159Do you have SIMD (SSE, AVX) instructions? If so, no doubt it would be fastest to use vector operations. How fast must it be? If 100 nanoseconds is good enough and you can use MasmBasic then go with jj2007's recommendation

If you need to code the routine just follow nidud's example but do it with SIMD registers. Of course you can figure it out - you're the man who figured out COM!  :t
I am NaN ;)

hutch--

If you want speed in order of performance, a finite state machine (static tree) is the fastest, a well designed hash table performs OK and somewhere down the track normal string compares come a distant last.

What I would recommend is a first character lookup to a label address that splits the word range alphabetically so that if your first character is "g", you only have to scan a list of words starting with "g" rather than the whole list. If done correctly it will hit into about the middle range of performance without having to write complex algorithms.

rrr314159

hutch, you're right, and I finally understand what he's looking for.

Quote from: mabdelouahabI have 850 keyword...

I thought he meant one keyword with 850 characters! Obviously not a word, but some sort of long string. No, he means 850 keywords - with an s on the end - each one probably only 10 or so characters long. When I read your reply it made no sense, went back and read OP. That's the very first time I got tripped up by poor English on this board!

So, forget what I said about SIMD registers, I was talking about a completely different problem
I am NaN ;)

hutch--

RE: Hash tables, you must have collision detection and if a hash collision occurs, you step to the next available slot in the hash table. You trade hashing speed with collision detection speed and if you get it right, you can fill a hash table to about 80% before you get noticable degradation in performance. Best to try for about 50% fill to maximise speed.

jj2007

Any volunteers for a nice testbed in the 10...100 MB range? I have no experience in hush tables. Hush hush 8)

SOF article for bloody beginners like me: Here's an explanation in layman's terms.