News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

C++ std::map case-insensitive 'find'

Started by gfalen, January 20, 2016, 03:09:06 AM

Previous topic - Next topic

gfalen

Greg D Falen

jj2007

Everything can be done in assembler, but can you translate the request using a Masm example?

dedndave

i am guessing a case-insensitive string search   :P

jj2007

Learn Chinese, Dave ;)

Quoteconst_iterator find (const key_type& k) const;

Get iterator to element
Searches the container for an element with a key equivalent to k and returns an iterator to it if found, otherwise it returns an iterator to map::end.
http://www.cplusplus.com/reference/map/map/find/

qWord

Quote from: gfalen on January 20, 2016, 03:09:06 AM
Can it be done ?
Only if the key compare is also case-insensitive. Otherwise you must implement your own find by iterating through the map elements.
MREAL macros - when you need floating point arithmetic while assembling!

gfalen

I know how to do a binary_search on a sorted array (or std::vector) but I
don't know how to iterate over a std::map because it is a tree structure
not contiguous memory!
Greg D Falen

qWord

Quote from: gfalen on January 20, 2016, 07:55:00 AM
don't know how to iterate over a std::map
see http://www.cplusplus.com/reference/map/map/begin/

If it fits your requirements, you could also use your own compare object (third template parameter) and then use the find method (see example here: http://www.cplusplus.com/reference/map/map/map/)
MREAL macros - when you need floating point arithmetic while assembling!

adeyblue

There's also just plain ol' std::find_if which lets you search based on anything stored in the map, not just the key. Here's an example in case your string isn't the key. You don't have to use a functor now lambdas are a thing but I can't remember / hate the syntax


#include <string>
#include <map>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <cstring>

struct Greeting
{
    std::string name;
    std::string greeting;
};

typedef std::map<int, Greeting> GreetMap;

struct FindGreeting
{
private:
    const char* pToFind;
public:
    FindGreeting(const char* pSearchText) : pToFind(pSearchText) {}
    bool operator() (const std::pair<int, Greeting>& i) const
    {
        // can use strcasestr on *nix / StrStrI on Windows to avoid the lowercasing if you don't mind using platform specific stuff
        std::string lowerGreeting;
        std::transform(i.second.greeting.begin(), i.second.greeting.end(), std::back_inserter(lowerGreeting), &tolower);
        return lowerGreeting.find(pToFind) != std::string::npos;
    }
};

int main()
{
    Greeting t = {"Bob", "Hello"};
    Greeting h = {"Gary", "Yo!"};
    Greeting i = {"Steven", "Hi"};
   
    GreetMap greetings;
    greetings.insert(std::make_pair(1, t));
    greetings.insert(std::make_pair(2, h));
    greetings.insert(std::make_pair(3, i));

    std::cout << "Who says Yo!\n";

    GreetMap::iterator yoSpeaker = std::find_if(greetings.begin(), greetings.end(), FindGreeting("yo"));
    if(yoSpeaker != greetings.end())
    {
        std::cout << yoSpeaker->second.name << " says Yo!" << std::endl;
    }
    else std::cout << "Nobody. This is a classy joint." << std::endl;
}

gfalen

Looked promising - but

Complexity:
"Up to linear in the distance between first and last: Calls pred for each element until a match is found."

I have up to 64k words or more to search for syntax highlighting. 'Linear' might be to slow!
I know a binary search is fast enough.
Greg D Falen

hutch--

Greg,

You can reduce a linear search by loading the words into an array and have an array of pointers to them. There is a one load time delay when the app starts up but if you store the first instance of each different character, you can reduce the linear search time by a very large amount. The other option is to use a hash table which is a lot faster than either a binary tree or a tweaked linear search. With syntax hiliting you basically have an either or, yes or no find of the word and this should be simple and fast if done right.

jj2007

Quote from: gfalen on January 20, 2016, 10:40:12 AMI have up to 64k words or more to search for syntax highlighting. 'Linear' might be to slow!

With a pretty simple approach*) (attached), I get around 1 millisecond per word on my Intel i5, assuming a 64k words file has to be searched, and case-insensitive search is wanted.
- read the words into a string array
- sort the array
- create a table with 26 dwords
- fill it with the start line for each letter a-z
- search only the range of lines that start with the first letter of the keyword.

If you extend the table to 26*26, i.e. the first two letters, search times should drop by a factor 26. Fast enough?

*) @Hutch: You were 3 minutes faster :eusa_naughty:

hutch--

There is fast and there is fast. To look up a word list with a YES or NO response, here is one fast way to do it. I have used your 400 odd MB words with a 10000 iteration to get a timing. About 125 ms for 10000 iterations of your complete word list. The technique is called a Finite State Machine.

gfalen

jj2007 - yes this is what MasmEd (and presumably Radasm) do.  I never could quite
get my head around this.

Right now I'm getting all the words to an array of pointers.  Then qsort() + bsearch().
The problem is i have to do a search for every word in current window (to hilite it
properly) and EACH keyword can be case sensitive (C/C++) or not (ASM/Basic).
And this must be done for every EN_CHANGE that occurs.  So i have

typedef struct {
    WORD len;           // word length
    WORD styl;          // word style
    LPSTR ptr;          // ptr to asciiz string
} KEYWORD, *PKEYWORD;

static BYTE iskeyword(LPSTR pstr) {

    KEYWORD* pkw;

    // srchproc / srchproci = case-sensitive / case-insensitive compare
    // .parr = ptr to appropriate array

    // language keywords - according to current file: ASM / Basic / CPP .etc
    if (linfo.nWords) {
        if (pkw = (KEYWORD*) bsearch(pstr, linfo.parr, linfo.nWords, 8, srchproc))
            return STYLESKWD + LOBYTE(pkw->styl);
        if (pkw = (KEYWORD*) bsearch(pstr, linfo.parr, linfo.nWords, 8, srchproci))
            if (!(pkw->styl & 0x100)) return STYLESKWD + LOBYTE(pkw->styl);
    }

    // API keywords - langs[0].parr = WIN32 API words ie functions/structs/types etc
    if (langs[0].nWords) {
        if (pkw = (KEYWORD*) bsearch(pstr, langs[0].parr, langs[0].nWords, 8, srchproc))
            return STYLESAPI + LOBYTE(pkw->styl);
    }

    // special project keywords
    if (prjlinfo.nWords) {
        if (pkw = (KEYWORD*) bsearch(pstr, prjlinfo.parr, prjlinfo.nWords, 8, srchproc))
            return STYLESPRJ + LOBYTE(pkw->styl);
        if (pkw = (KEYWORD*) bsearch(pstr, prjlinfo.parr, prjlinfo.nWords, 8, srchproci))
            if (!(pkw->styl & 0x100)) return STYLESPRJ + LOBYTE(pkw->styl);
    }

    return 0;
}

EDIT: remember first question std::map - I'm using c++
P.S C/CPP about 4-6 months. STL about 1 week!
PPS: it semms to work ok. i was hoping to find a cleaner implementation using
std::map or possibly std:vector.  Especially for their auto alloc / destroy capabilities.
Greg D Falen

jj2007

@Greg: Could you post that 64k words file for some testing?
@Hutch: Cute idea, but we need some timings :P

gfalen

As i said:
the # of words CAN grow to over 64k.  This is the current list (more or less).
These words are loaded into basically three different arrays
1 for API words - global - hilited in all windows (when active)
1 for Language specific words - by file extension
1 for Project specific words - when project changes new list is loaded

My current implementation is plenty fast enough.  As i mentioned earlier i was
hoping to find a way to leverage some of the benefits of STL to improve it.

I have 2 ideas i'm gonna run by you guys but for now i need a rest (my car
accident injuries zap my strength and concentration)

Thanks for the quick and insightfull replies.  I'll get back here later today.
Greg D Falen