New line tokeniser.

Started by hutch--, October 24, 2014, 03:04:03 PM

TouEnMasm

Using a batch file is too difficult  :lol:
But as I said, they don't need to be tested.
Fa is a musical note to play with CL

nidud

#31
deleted

jj2007

Quote from: nidud on October 26, 2014, 10:51:19 PM
how many lines in the buffer?

An important test indeed.

include \masm32\MasmBasic\MasmBasic.inc      ; download

.data
justOne   db "This is one line",0

  Init
  ; StringToArray uses Recall code but with a buffer instead of a file name
  StringToArray offset justOne, My$()
  Print Str$("%i lines\n", eax)
  Inkey "Line 0: [", My$(0), "]"
  Exit
end start


Output:
1 lines
Line 0: [This is one line]

hutch--

I have this problem with the style of testing being done: it is incomplete, unbuildable and unverifiable. The whole point of a code laboratory is to be able to compare and test different algorithms, and with this style of testing this is not being done. I wonder what happened to posted testable algorithms?

nidud,

The result with a string,


one   db "one line",0


with the get_lcnt algo should be 0 as there are no ascii 13 characters to count in a single line that is not 13,10 delimited.

jj2007

Quote from: hutch-- on November 05, 2014, 01:32:19 PM
incomplete, unbuildable and unverifiable
This is indeed a problem. Snippets without headers etc. are fine for a quick look, but the complete buildable example should then be attached. Most of the time people do post buildable examples, but others post code that requires fumbling with environment variables, which is a bad habit because it may interfere with the setup of one's machine. There is a reason why practically all Masm32 code has include \Masm32\include\somelib.inc on top, and not include somelib.inc or, worse, include C:\Masm32\include\somelib.inc

I must say that in this respect Masm32 is an excellent library: All 300+ examples in \Masm32\examples build without a single error message. By contrast, in my experience 95% of all C/C++ examples on the web throw cryptic error messages when you drop them into the leading free C/C++ software, VC Express, and you spend a lot of time deciphering what this behemoth wants from you, and where to find the missing header files :(

Quote from: hutch-- on November 05, 2014, 01:32:19 PM
The result with a string,


one   db "one line",0


with the get_lcnt algo should be 0 as there are no ascii 13 characters to count in a single line that is not 13,10 delimited.

IMHO, for the purpose of building an array, both

  one   db "one line",0

and

  one   db "one line", 13, 10, 0

should return one line. Other views? Practical examples supporting other views?
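To make the two positions concrete, here is a minimal sketch in C rather than MASM. The names count_cr and count_lines are hypothetical and neither function is the get_lcnt algo or Recall() from this thread. The first counts only ASCII 13 bytes, as described for get_lcnt above; the second assumes the array-building view, where a non-empty tail without a delimiter still counts as a line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts ASCII 13 bytes only,
   so "one line" without a CR yields 0. */
size_t count_cr(const char *text)
{
    size_t n = 0;
    assert(text != NULL);
    for (; *text != '\0'; ++text) {
        if (*text == 13)
            ++n;
    }
    return n;
}

/* Hypothetical helper: counts lines for array building.
   CR LF, a lone CR, or a lone LF each end one line, and a
   non-empty tail with no delimiter counts as one more line. */
size_t count_lines(const char *text)
{
    size_t n = 0;
    int in_line = 0;   /* nonzero while inside an unterminated line */
    assert(text != NULL);
    for (; *text != '\0'; ++text) {
        if (*text == 13) {
            if (text[1] == 10)
                ++text;        /* treat CR LF as a single delimiter */
            ++n;
            in_line = 0;
        } else if (*text == 10) {
            ++n;
            in_line = 0;
        } else {
            in_line = 1;
        }
    }
    return n + (in_line ? 1 : 0);
}
```

With these definitions, "one line" gives 0 from count_cr but 1 from count_lines, and "one line",13,10 gives 1 from both, which is exactly the difference being argued about.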

hutch--

There is a very simple way to solve that problem if you wish to include lines that are not 13,10 delimited or a variation of either. As you almost exclusively know the length of the text, read the last byte before the terminator and if it's not 13 or 10, write one there. You can set a flag for whether there is a trailing 13,10 or not, and if it matters, you can trim the tail end off the string by writing zero to the last line to truncate it back before the added byte.
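A rough sketch of that approach in C, under stated assumptions: the function names are hypothetical, the text length is known, and the buffer has at least one spare byte after the zero terminator so a delimiter can be written in place. It is an illustration of the idea, not code from any posted algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: buf holds len text bytes, a zero terminator, and at
   least one spare byte after it. If the last text byte is neither
   13 nor 10, a 13 is written over the old terminator and a new
   terminator follows it. *added is the flag: 1 if a byte was added.
   Returns the new text length. */
size_t add_trailing_cr(char *buf, size_t len, int *added)
{
    assert(buf != NULL && added != NULL);
    *added = 0;
    if (len > 0 && buf[len - 1] != 13 && buf[len - 1] != 10) {
        buf[len] = 13;          /* overwrite the old terminator */
        buf[len + 1] = '\0';    /* new terminator in the spare byte */
        *added = 1;
        return len + 1;
    }
    return len;
}

/* Undo step: if a byte was added, write zero over it so the
   text is truncated back to its original length. */
size_t remove_added_cr(char *buf, size_t len, int added)
{
    assert(buf != NULL);
    if (added && len > 0) {
        buf[len - 1] = '\0';
        return len - 1;
    }
    return len;
}
```

The flag lets the caller run a delimiter-counting tokeniser unchanged and then restore the buffer afterwards if the original bytes matter.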

jj2007

Yes, this is indeed the strategy of Recall(). And qEditor also recognises both versions, of course - see attachment for a test. Textfiles with no CrLf at the end are relatively rare, but they do occur in real life, and not taking that into account may lead to bugs that are really difficult to chase.

hutch--

What was the purpose of the 2 text files when most know the difference between a last line that is zero terminated and a last line that is 13,10,0 terminated ?

jj2007

Quote from: hutch-- on November 05, 2014, 03:10:32 PM
What was the purpose of the 2 text files when most know the difference between a last line that is zero terminated and a last line that is 13,10,0 terminated ?

Just to encourage some testing. Btw I had swapped the names, attached the right ones, together with a demo:

Testing 0 MB in TestLastLineZero.txt
...
Lines found:
10      Hutch
10      Yves
11      JJ/Recall

Results Recall:
Line 7
Line 8
Line 9
Last line
----------

Results Hutch:
Line 7
Line 8
Line 9
----------


What is the desired result, for all practical purposes? qEditor.exe does show "Last line".

hutch--

QE does not use a tokeniser at all, it simply streams file data into the rich edit control.

nidud

#40
deleted

hutch--

nidud,

> In my view a file begins at offset 0 and ends at EOF, and its size is measured between these two.

I have been looking at files in both text editors and in hex editors for a mighty long time and they either end at the OS stored byte count or with an ascii zero. I have not seen an ascii 26 (eof) terminating any file I have ever viewed.

nidud

#42
deleted

FORTRANS

Quote from: hutch-- on November 11, 2014, 02:39:59 AM
I have been looking at files in both text editors and in hex editors for a mighty long time and they either end at the OS stored byte count or with an ascii zero. I have not seen an ascii 26 (eof) terminating any file I have ever viewed.

Hi,

   FWIW I have seen quite a few.  Some text files from CPM systems.
And some from what appeared to be text editors ported from CPM.
And one strange OS/2 editor.

Quote from: nidud on November 11, 2014, 04:50:50 AM
I don't think the CTRL-Z character is actually used any more.

   If you go to a command line and copy to a file from the CON:
device, a Ctrl-Z is used to terminate input.  Something like the
following copied from a command prompt.

C:\>COPY CON FileName
test text
line 2^Z
        1 file(s) copied.

C:\>TYPE FileName
test text
line 2
C:\>

Cheers,

Steve N.
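For files like the ones mentioned above that do carry an ASCII 26, a reader can treat the first Ctrl-Z as the end of the text. The following is only a sketch in C under that assumption; text_length is a hypothetical name, and whether stopping at the first 26 is right depends on where the file came from.

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z 26   /* ASCII 26, the Ctrl-Z end-of-text marker */

/* Hypothetical helper: returns the number of bytes before the
   first Ctrl-Z in buf, or size if the buffer contains none. */
size_t text_length(const char *buf, size_t size)
{
    size_t i;
    assert(buf != NULL);
    for (i = 0; i < size; ++i) {
        if (buf[i] == CTRL_Z)
            return i;
    }
    return size;
}
```

A tokeniser could call this first and then process only the returned number of bytes, so files with and without the marker produce the same lines.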

nidud

#44
deleted