I found a real life example of a dictionary here (https://download.freedict.org/dictionaries/eng-deu/1.9-fd1/freedict-eng-deu-1.9-fd1.dictd.tar.xz). Download, open in 7-zip, in several steps until you reach
eng-deu.dictExample:
Quoteparing cutters/ploughs /pˈeəɹɪŋ kˈʌtəz plˈaʊz/
Flachwender <pl>
Synonyms: {skim ploughs}, {turf cutters/ploughs}
see: {skim plough}, {turf cutter/plough}, {paring cutter/plough}
flattening /flˈatənɪŋ/
Flachwerden durch Niederdrücken [textil.]
uckwit /fˈʌkwɪt/
[vulg.] Flachwichser <masc> [vulg.]
Note: Dummkopf
Note: fool
Looks like fun. Who writes the fastest algo to find e.g.
uckw in this 80MB, two Million lines file?
They also have an API to access the dictionaries (https://freedict.org/downloads/#windows).
It probably has a lot to do with how the dictionary is laid out. If the words are alphabetical, you could get to the first "f" offset by having its address in a table, then linear scan the words until you either find it or not. If I had to do a task like that I would set up a table with each alphabetical character as an offset to a list. If the lists are sorted as they should be, you would only need to scan the second character before you started to scan the whole word.
Quote from: hutch-- on February 03, 2023, 01:53:12 PM
It probably has a lot to do with how the dictionary is laid out. If the words are alphabetical, you could get to the first "f" offset by having its address in a table, then linear scan the words until you either find it or not. If I had to do a task like that I would set up a table with each alphabetical character as an offset to a list. If the lists are sorted as they should be, you would only need to scan the second character before you started to scan the whole word.
Yep, exactly. I would go for a two-character table (26*26=676 DWORDs). If I had too much time at hand, I would try it, but I made a quick test with RichMasm, and finding 43 matches for
flibbertigibbet in that 80MB document takes only about 100 milliseconds, so a proper solution is probably not worth the time :tongue: