ReadFile Confusion! Please help!

Started by GuruSR, August 08, 2016, 12:45:25 PM

GuruSR

Well, first off, it's been a few years since I was last here. I'm back again, though, to ask for "old man's assistance" (no, not that kind).

Anyhow, I'm working on a DLL that does 2 things:

Firstly, it calls ReadFile byte by byte to pull in text. Sounds simple, but it actually monitors quotes: it ignores CRLFs inside quotes and also ignores "escaped quotes" inside them. Or at least it *should*, but sadly, what I'm getting back is 1 byte read, and it's 00h. So not helpful.

Secondly, it takes a buffer and an offset into that buffer and returns the following "line" inside that buffer (untested since I haven't gotten the first one to work yet).


; Character constants (assumed values; skip these if your project already defines them).
Quote Equ 22H   ; Double quote
Slash Equ 5CH   ; Backslash
CR Equ 0DH   ; Carriage return
LF Equ 0AH   ; Line feed

.Data?
bQuote Byte ?   ; Whether or not we're in "Quotes".
bLast Byte ?   ; Last character, used to see if we have crlf or \"
Write DWord ?   ; Start of the Buffer, subtracted at the end to get the length.
EndBF DWord ?   ; End of the buffer (Buffer + BufferLen)
Count DWord ?   ; Bytes read by ReadFile: 1 or 0 (0 meaning nothing left).

.Code
; Edi is callee-saved under stdcall, so it goes in the Uses list.
ReadToCRLFQ Proc Uses Ebx Ecx Edx Edi FileID:DWord, BufferLen:DWord, Buffer:DWord
         Mov Edi, DWord Ptr [Buffer]
         Mov Write, Edi
         Mov Eax, Edi
         Add Eax, DWord Ptr [BufferLen]
         Mov EndBF, Eax
         Mov Count, 0H
         Mov bLast, 0H
         Mov bQuote, 0H
         Cld

ReadLoop:   Push 0         ; lpOverlapped (none)
         Lea Eax, Count
         Push Eax      ; Receives the number of bytes read
         Push 1         ; Read one byte
         Push Edi      ; Buffer
         Mov Eax, DWord Ptr [FileID]   ; Supplied hFile.
         Push Eax
         Call ReadFile
         Test Eax, Eax   ; ReadFile returns zero if the call itself failed.
         Jz Split
         Cmp Count, 0
         Je Split      ; Nothing left
         Mov Al, [Edi]   ; Get the character.
         Mov Bl, bLast   ; Check last.
         Cmp Al, Quote   ; if "...
         Jne NoQuote
         Cmp Bl, Slash   ; if \, ignore it, it's "not a quote".
         Je NoQuote
         Mov Cl, bQuote   ; Get bQuote
         Xor Cl, 01h   ; Toggle it on, since we're "in quotes".
         Mov bQuote, Cl   ; Store it.
NoQuote:   Mov Cl, bQuote
         Cmp Cl, 0
         Jne NotLF      ; We're inside a quote.
         Cmp Al, LF      ; Check for Line Feed (CR is in last)
         Jne NotLF
         Cmp Bl, CR      ; If it's a carriage return, we should stop.
         Je Split      ; Done, found the end of line.
NotLF:      Mov bLast, Al   ; Store the last one, so we're keeping track of things.
         Inc Edi         ; Increase.
         Inc Edi         ; Unicode increase.
         Cmp Edi, EndBF
         Jb ReadLoop      ; Unsigned compare (these are addresses); keep going until we run out of buffer space.

Split:      Mov Eax, Edi
         Sub Eax, Write
         Shr Eax, 1H      ; Take the Unicode off: return characters, not bytes.
         Ret                                 ;0Ch
ReadToCRLFQ EndP


Now the code has a "counterpart" that hasn't actually been tested (yet). It should work, though, as it basically reads the text data from a predefined buffer and offset and returns the next line from that chunk. I would rather not do this with the file, since it's collecting data 1 "line" at a time.

Basically the text file could look like this:

"Welcome" = "

Welcome to this nightmare!

"
"Goodbye" = "

You died, have a nice afterlife!

"
"UserName" = "Dead Sucker"


Reading the file in would read the 1st "line" through the end of the first quoted value, the one for "Welcome".  Sadly, all I'm getting is a Count of 1, the byte isn't showing up in the buffer, and the function exits cleanly.  If it crashed, Olly would step in and show me what happened.  I cannot change the format of the text file to suit "my" needs, since the data isn't mine to alter.

The function is part of a library, which is passed "FileID" (hFile handle), "Buffer" (address of the buffer in memory) and "BufferLen" (raw value of the length of "Buffer").  I know the handle for the file is correct, as it's getting an actual value (non-zero).  Buffer should be accurate (I'm not seeing any access violations, which is a good thing; well, not really, since a crash would at least let me debug it).

If I can get this going, I'll share the library source since there are probably people out there that could use the two functions.

GuruSR.
Learned 68k Motorola Asm instruction set in 30 minutes on the way to an Amiga Developer's Forum meeting.
Following week wrote a kernel level memory pool manager in 68k assembler for fun.

jj2007

Without seeing the complete code (from "include" to "end start"), and a text file for testing, it is almost impossible to diagnose your problem. So this is just a guess: "Unicode increase" could be the culprit. In your loop, you read one byte, but you advance two bytes in your buffer. So if you read "a text", buffer will be a,0,32,0,t,0,e,0,x,0,t

Is that intentional?
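
To make the "read one byte, advance two" pattern concrete, here is a minimal sketch, assuming the MASM32 SDK is installed at its default \masm32 path and the input is plain ASCII. It widens each byte into a 16-bit slot explicitly instead of relying on the buffer already being zero-filled. AnsiText and WideText are names made up for the illustration.

```masm
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
AnsiText  db "a text", 0          ; sample input, one byte per character

.data?
WideText  dw 16 dup (?)           ; output, one 16-bit code unit per character

.code
start:
    mov   esi, offset AnsiText
    mov   edi, offset WideText
Widen:
    movzx eax, byte ptr [esi]     ; zero-extend one source byte
    mov   word ptr [edi], ax      ; write both bytes of the destination slot
    inc   esi                     ; source advances one byte
    add   edi, 2                  ; destination advances two bytes
    test  eax, eax
    jnz   Widen                   ; the zero terminator is copied too, then we stop
    invoke ExitProcess, 0
end start
```

Zero-extension is only a valid conversion for ASCII characters; text in other code pages needs a real conversion routine.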

Re Olly: an int 3 before ReadFile can do miracles 8)
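
A sketch of where such a breakpoint could go, assuming the code sits inside a MASM32 project that already includes windows.inc and kernel32.inc. ReadOneByte and BytesRead are hypothetical names, not part of the original library.

```masm
.data?
BytesRead dd ?                    ; hypothetical counterpart of Count

.code
; Reads one byte from hFile into pDest; eax returns ReadFile's result.
ReadOneByte proc hFile:DWORD, pDest:DWORD
    int   3                       ; software breakpoint: the debugger stops here
    invoke ReadFile, hFile, pDest, 1, addr BytesRead, NULL
    ret                           ; eax still holds ReadFile's return value
ReadOneByte endp
```

With the debugger stopped on the int 3, you can step over the call and inspect eax, BytesRead and the byte at pDest. Without a debugger attached, int 3 raises a breakpoint exception that will normally crash the process, so take it out once the problem is found.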

mineiro

This is just supposition, because I can't see the full source code.
If you deal with ASCII/ANSI, you read just 1 byte (1 byte == 1 letter/symbol). If you deal with Unicode, you should read at least 2 bytes (2 bytes == 1 letter/symbol).
So you can adjust this in the ReadFile call, and in other parts of the program too, like " Mov Al, [Edi]   ; Get the character.". Then you don't need to change that 'inc edi, inc edi'.
Another point is that text strings end with a zero (00h), so in Unicode the terminator is a double zero (0000h). When your code finds it (00h or 0000h), it should stop.

I don't get the point of using the "bl" register.
Maybe you're assuming that the text will start with a quote; try removing the quote as the first symbol of the text and test again.

If you have trouble understanding ANSI/Unicode, just copy the text you supplied and paste it into Notepad. When you save, the dialog that asks for the file name has an option to save as Unicode or ANSI. As an example, hit Enter (CRLF) 3 times and save as enterA.txt using ANSI. Then save again under another name, enterU.txt, this time using Unicode. Next, open both files in a hex editor: in the ANSI file you will see just the "crlfcrlfcrlf" sequence, but in the Unicode file you will see some zeros.

Windows DLLs have functions that convert Unicode to ANSI and vice versa.
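
One such function is MultiByteToWideChar in kernel32 (WideCharToMultiByte goes the other way). A minimal sketch of calling it, assuming the MASM32 SDK at its default \masm32 path; AnsiText, WideText and the buffer size of 32 are choices made for the example.

```masm
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
AnsiText  db "Dead Sucker", 0     ; zero-terminated ANSI input

.data?
WideText  dw 32 dup (?)           ; room for 32 UTF-16 code units

.code
start:
    ; A length of -1 tells the API the input is zero-terminated,
    ; so the terminator is converted as well.
    invoke MultiByteToWideChar, CP_ACP, 0, addr AnsiText, -1, addr WideText, 32
    ; eax = number of wide characters written, or 0 on failure
    invoke ExitProcess, 0
end start
```

Unlike simple zero-extension, this respects the active ANSI code page, so characters above 7Fh are mapped correctly.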
I'd rather be this ambulant metamorphosis than to have that old opinion about everything

GuruSR

Quote from: jj2007 on August 08, 2016, 02:01:55 PM
Without seeing the complete code (from "include" to "end start"), and a text file for testing, it is almost impossible to diagnose your problem. So this is just a guess: "Unicode increase" could be the culprit. In your loop, you read one byte, but you advance two bytes in your buffer. So if you read "a text", buffer will be a,0,32,0,t,0,e,0,x,0,t

Is that intentional?

Re Olly: an int 3 before ReadFile can do miracles 8)

I actually managed to get it working with very minor changes. The Int 3 was extremely useful; I just couldn't remember what it was!  The procedure is from a library; the only includes are windows and kernel32, and the only lib is kernel32.  I'll see about posting the source with the includes added, since I'm using Easy Code.

GuruSR.

GuruSR

Quote from: mineiro on August 08, 2016, 11:06:13 PM
This is just supposition, because I can't see the full source code.
If you deal with ASCII/ANSI, you read just 1 byte (1 byte == 1 letter/symbol). If you deal with Unicode, you should read at least 2 bytes (2 bytes == 1 letter/symbol).
So you can adjust this in the ReadFile call, and in other parts of the program too, like " Mov Al, [Edi]   ; Get the character.". Then you don't need to change that 'inc edi, inc edi'.
Another point is that text strings end with a zero (00h), so in Unicode the terminator is a double zero (0000h). When your code finds it (00h or 0000h), it should stop.

Yes, the text file is ASCII, but the buffer is Unicode.   The buffer is full of nulls, so I just check for the return of Count from the ReadFile which denotes EoF.  The data isn't Unicode, but the buffer is.
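
For reference, a hedged sketch of that end-of-file test, assuming a synchronous file handle and a MASM32 project that already includes windows.inc and kernel32.inc. For such handles ReadFile signals end of file by succeeding (nonzero return) with zero bytes read, so both values are worth checking. ReadByteOrEof and BytesRead are names invented for this example.

```masm
.data?
BytesRead dd ?                    ; filled in by ReadFile

.code
; Returns eax = 1 if a byte was stored at pDest,
; eax = 0 at end of file or if the read failed.
ReadByteOrEof proc hFile:DWORD, pDest:DWORD
    invoke ReadFile, hFile, pDest, 1, addr BytesRead, NULL
    test  eax, eax                ; zero: the call itself failed
    jz    NoByte
    cmp   BytesRead, 0            ; success with zero bytes read: end of file
    je    NoByte
    mov   eax, 1
    ret
NoByte:
    xor   eax, eax
    ret
ReadByteOrEof endp
```

A caller that needs to tell a failed read from end of file can call GetLastError after a zero return from ReadFile.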

Quote from: mineiro on August 08, 2016, 11:06:13 PM
I don't get the point of using the "bl" register.
Maybe you're assuming that the text will start with a quote; try removing the quote as the first symbol of the text and test again.

Bl keeps track of the previously read character.  If Al holds a quote, the code checks Bl for a \; if it finds one, the pair is \", an escaped quote, which is how you store quotes inside strings.  It also uses Bl to check for CR when Al holds LF, but it will not test for that within quotes.  The data in the text file cannot be altered.

Quote from: mineiro on August 08, 2016, 11:06:13 PM
If you have trouble understanding ANSI/Unicode, just copy the text you supplied and paste it into Notepad. When you save, the dialog that asks for the file name has an option to save as Unicode or ANSI. As an example, hit Enter (CRLF) 3 times and save as enterA.txt using ANSI. Then save again under another name, enterU.txt, this time using Unicode. Next, open both files in a hex editor: in the ANSI file you will see just the "crlfcrlfcrlf" sequence, but in the Unicode file you will see some zeros.

Windows DLLs have functions that convert Unicode to ANSI and vice versa.

I've not got any problems with ANSI, ASCII or Unicode, been writing code in a variety of languages since the '70s.

GuruSR.

mineiro

Now I understand the bl register usage: you're dealing with 2 symbols on each pass, the previous one and the current one.
I don't have much to contribute; maybe change

         Mov Cl, bQuote   ; Get bQuote
         Xor Cl, 01h   ; Toggle it on, since we're "in quotes".
         Mov bQuote, Cl   ; Store it.
to
         xor bQuote,1

Wow GuruSR, you're an ancestor of programming languages, from the '70s, like a monk, a hermit :t . I was born in '77.