High speed text parser

Started by hutch--, March 15, 2022, 12:54:52 PM

hutch--

This algo was originally designed for a scripting engine, and its hallmark is that it had to parse each line of script fast enough that the script was not laggy. It is basically designed for parsing single lines in a script, but it is easily fast enough to parse words in even large text files.

It is reasonably open in the characters that can be used, and this is defined by the table in the asm code. It currently splits text on spaces and commas, as set in the table. Any adjustments need to be made in the assembler source, as switching code would have slowed the algo down.
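For anyone who wants to see the table-driven idea outside of assembler, here is a minimal sketch in C. It is not the original algo, only an illustration of splitting on a 256 entry lookup table where 0 marks a delimiter and 1 marks an accepted character. The names `chtbl_c`, `init_table`, `split_words` and the `MAX_WORD` slot size are hypothetical and are not taken from the SLL source.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORD 64   /* hypothetical fixed slot size, not from the original */

/* 256 entry lookup table: 1 = character belongs to a word, 0 = delimiter.
   Built at run time here; the original uses a static table in the asm source. */
static unsigned char chtbl_c[256];

static void init_table(void)
{
    memset(chtbl_c, 0, 32);          /* codes 0..31 (including NUL) are delimiters */
    memset(chtbl_c + 32, 1, 224);    /* codes 32..255 are accepted by default */
    chtbl_c[' '] = 0;                /* space splits words */
    chtbl_c[','] = 0;                /* comma splits words */
}

/* Split src into words, copy each into out[], return the word count. */
static int split_words(const char *src, char out[][MAX_WORD], int max_words)
{
    const unsigned char *p = (const unsigned char *)src;
    int count = 0;

    while (count < max_words) {
        /* skip any run of delimiters, stopping at the terminator */
        while (*p != 0 && chtbl_c[*p] == 0)
            p++;
        if (*p == 0)
            break;

        /* copy accepted characters; chtbl_c[0] is 0 so NUL ends the word */
        size_t len = 0;
        while (chtbl_c[*p] != 0) {
            if (len < MAX_WORD - 1)
                out[count][len++] = (char)*p;
            p++;
        }
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Calling `init_table()` once and then `split_words("mov eax, 1", out, 8)` fills `out` with `mov`, `eax` and `1` and returns 3. The one table lookup per character is what keeps the inner loop free of a chain of comparisons.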

It is set up to use the PowerBASIC STRINGZ data type, but it can accept input data as a direct memory address, a STRPTR() to a basic dynamic string, or a VARPTR() to a STRINGZ address. The output is a STRINGZ array, with the function's return value being the word or phrase count.

The algorithm is usable as is; you don't have to know how to modify it in assembler to use it. Look at the example pl.bas to see how to use it. You can test it by running it by itself or by dropping any of the other text files onto it in Explorer or Winfile.

Modifying the SLL source.

This form of the algo has additional capacity: it not only handles double quoted text but has also had single quotes, square brackets and normal brackets added, and they are all working OK. If you need to turn any of these off, it is done here.

  ; --------------------------
  ; paired character branching
  ; --------------------------
    cmp BYTE PTR [esi], 34                          ; branch to "double quote" handling
    je dquote

    cmp BYTE PTR [esi], 39                          ; branch to 'single quote' handling
    je squote

    cmp BYTE PTR [esi], 91                          ; branch to [square bracket] handling
    je sqBracket

    cmp BYTE PTR [esi], 40                          ; branch to (bracket) handling
    je brackets

Simply comment out the comparison and its JE branch and that capacity is disabled.
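As a rough illustration of what the paired character branching does, here is a sketch in C, not the original code. It uses the same character codes as the comparisons above (34, 39, 91, 40). The post does not say whether the output keeps the delimiters, whether nesting is supported, or exactly what triggers an error, so this sketch assumes the delimiters are kept, nesting is not handled, and a missing closing character is the error. The names `closing_for` and `paired_span` are hypothetical.

```c
#include <stddef.h>

/* Negative codes matching the return values described in the post. */
enum { ERR_DQUOTE = -1, ERR_SQUOTE = -2, ERR_SQBRACKET = -3, ERR_BRACKET = -4 };

/* Return the closing character for an opening paired character and set
   *err to the matching error code, or return 0 if c does not open a pair. */
static int closing_for(unsigned char c, int *err)
{
    switch (c) {
    case 34: *err = ERR_DQUOTE;    return 34;   /* "double quote"    */
    case 39: *err = ERR_SQUOTE;    return 39;   /* 'single quote'    */
    case 91: *err = ERR_SQBRACKET; return 93;   /* [square bracket]  */
    case 40: *err = ERR_BRACKET;   return 41;   /* (bracket)         */
    default: return 0;
    }
}

/* If src starts with an opening paired character, return the length of the
   whole phrase including both delimiters, or a negative error code when the
   closing character is missing. Return 0 if src does not start a pair. */
static int paired_span(const char *src)
{
    int err = 0;
    int close = closing_for((unsigned char)src[0], &err);

    if (close == 0)
        return 0;
    for (size_t i = 1; src[i] != '\0'; i++) {
        if ((unsigned char)src[i] == close)
            return (int)(i + 1);
    }
    return err;   /* ran off the end of the line without a closing character */
}
```

Removing one `case` from the switch in this sketch is the equivalent of commenting out one CMP and JE pair: that opening character then falls through and is treated as ordinary text.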

The return values for any errors are here.
  qterror1:
    mov eax, -1                                     ; return -1 for "double quote" error
    jmp quit

  qterror2:
    mov eax, -2                                     ; return -2 for 'single quote' error
    jmp quit

  sbError:
    mov eax, -3                                     ; return -3 for [square bracket] error
    jmp quit

  brError:
    mov eax, -4                                     ; return -4 for (bracket) error
    jmp quit
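Since the function returns the word count on success and one of these negative values on failure, the caller only has to test the sign. A small sketch in C of how a caller might turn the return value into a message; the function name `describe_result` and the message strings are hypothetical, only the four codes come from the source above.

```c
/* Map the parser's return value to a short description.
   Zero or a positive value is a word or phrase count. */
static const char *describe_result(int rv)
{
    switch (rv) {
    case -1: return "double quote error";
    case -2: return "single quote error";
    case -3: return "square bracket error";
    case -4: return "bracket error";
    default: return rv >= 0 ? "ok" : "unknown error";
    }
}
```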

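The character table is set for the English language. If you know how to modify a 256 item character table, it is done here. Each entry is indexed by character code: 0 marks a delimiter and 1 marks a character that is accepted as part of a word.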

  chtbl:
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
    db 0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1              ; space and comma delimited
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
    db 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
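If editing 256 entries by hand is error prone, the rows can be generated. The following is a sketch in C, not part of the original package, that builds a table with the same layout as the one above (codes 0 to 31 set to 0, everything else set to 1 except the chosen delimiters) and formats any row as a `db` line to paste into the asm source. The names `build_table` and `format_row` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Fill tbl: codes 0..31 and every character in delims get 0, the rest get 1. */
static void build_table(unsigned char tbl[256], const char *delims)
{
    memset(tbl, 0, 32);
    memset(tbl + 32, 1, 224);
    for (const char *d = delims; *d != '\0'; d++)
        tbl[(unsigned char)*d] = 0;
}

/* Format one 16 entry row (row 0..15) as an assembler "db" line into buf. */
static void format_row(const unsigned char tbl[256], int row,
                       char *buf, size_t size)
{
    size_t used = (size_t)snprintf(buf, size, "    db ");

    for (int i = 0; i < 16 && used < size; i++) {
        int n = snprintf(buf + used, size - used, "%s%d",
                         i ? "," : "", tbl[row * 16 + i]);
        if (n < 0)
            break;               /* formatting failed, keep what we have */
        used += (size_t)n;
    }
}
```

With `build_table(tbl, " ,")`, row 2 (character codes 32 to 47) comes out the same as the "space and comma delimited" row in the table above. Passing `" ,;"` instead also clears entry 59, so the semicolon becomes a delimiter in row 3.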