I need a program called "FindLine"

Started by learn64bit, November 29, 2022, 04:41:33 AM


zedd151

Okay, you want the non-empty lines in a.txt copied to c.txt?
And if b.txt contains a unique non-empty line that is not in a.txt, that should also be copied to c.txt? Is this correct?


That would take a bit of work to code, but it can be done.
Does the order of the lines matter, or can they be sorted? Sorting would make it easier to avoid duplicated lines... I have a sorting routine that also removes duplicate entries while sorting. That is why I ask whether the lines can be sorted.
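For comparison, a sort-then-merge approach along the lines zedd151 describes could look like the Python sketch below. It is not his sorting routine. It assumes the lines are already loaded into two lists and that neither list contains duplicates (as learn64bit confirms later); the function name is made up. After sorting, a single pass over both lists finds the common lines, so the cost is dominated by the O(n log n) sort rather than by n × m string comparisons.

```python
def common_lines_sorted(a_lines, b_lines):
    """Return the non-empty lines present in both inputs, via sort + merge.

    Assumes neither input contains duplicate lines.
    """
    a = sorted(line for line in a_lines if line)  # drop empty lines, then sort
    b = sorted(line for line in b_lines if line)
    i = j = 0
    common = []
    # Walk both sorted lists in step; advance whichever side is smaller.
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


if __name__ == "__main__":
    a = ["apple", "", "pear", "fig"]
    b = ["fig", "kiwi", "apple", ""]
    print(common_lines_sorted(a, b))  # ['apple', 'fig']
```

The output comes out in sorted order, which is acceptable here because the line order in c.txt does not matter.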

learn64bit

No duplicated lines in a or b, don't worry about that.
I don't care about the line order in c.

NoCforMe: you want the output file to contain all lines that are common to both input files, excluding blank lines. Is that so?

NoCforMe is right.

zedd151

Okay, now I see. That will take a bit of effort: each line needs to be compared to be sure it is not duplicated. Now we are getting somewhere. I'm sure the coding gurus can come up with something. I will look at this again later.

zedd151

Quote from: learn64bit on December 02, 2022, 02:41:42 AM
no duplicated lines in a or b. don't worry about that.
don't care about the line order in c.

NoCforMe: you want the output file to contain all lines that are common to both input files, excluding blank lines. Is that so?

NoCforMe is right.


Okay, those criteria are now defined by you. What is the maximum expected line size? And the expected maximum size of both a.txt and b.txt?


You wrote:
Quote from: learn64bit on December 02, 2022, 03:37:12 AM
filesize maximum less than 1g
line size maximum less than 4mb
(from the post immediately after this one)

Knowing all of that would be a good starting point for you to devise an algorithm to do what you want it to do. I don't think anyone is going to write the code for you, but we will try to help you.

learn64bit

File size: maximum less than 1 GB
Line size: maximum less than 4 MB

jj2007

Ten minutes of coding: it loads two files into two arrays and writes to disk every non-empty line of the first file that is not found in the second file.

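include \masm32\MasmBasic\MasmBasic.inc
  Init
  SetGlobals outCt, found                   ; counters used below
  Recall "\Masm32\examples\exampl04\jacts\jacts.asm", f1$()         ; file 1, one element per line
  Recall "\Masm32\examples\exampl04\listview\listview.asm", f2$()   ; file 2
  Dim out$()                                ; lines to be written to c.txt
  SetGlobals
  For_ each esi in f1$()                    ; every line of file 1
    .if Len(esi)                            ; skip empty lines
      and found, 0                          ; clear the "found" flag
      For_ ct=0 To f2$(?)-1                 ; compare with every line of file 2
        .if Len(f2$(ct))
          .if !StringsDiffer(esi, f2$(ct))
            inc found                       ; same line exists in file 2
          .endif
        .endif
      Next
      .if !found                            ; not in file 2: keep it
        Let out$(outCt)=esi
        inc outCt
      .endif
    .endif
  Next
  .if outCt                                 ; write only if something was kept
    Store "c.txt", out$()
    Inkey Str$("%i strings written. Have a look (y)?", outCt)
    If_ eax=="y" Then ShEx "c.txt"
  .endif
EndOfCode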

learn64bit

Thanks JJ,
your c.txt is the a1 that I need.

I changed the two input files to
.\a.txt
.\b.txt

I tested it with 200 MB and it looks like it takes a long time... It hasn't finished yet, so I don't know, but 999 MB should take 20 minutes or more.

Input files:
a.txt and b.txt
Output files:
c.txt, a1.txt and b1.txt
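A minimal Python sketch of the three outputs, under assumptions not spelled out in the thread: c.txt holds the non-empty lines found in both files, a1.txt the lines only in a.txt, b1.txt the lines only in b.txt; lines end with CR LF (0Dh 0Ah) as in the hex dumps later in the thread; and both files fit in memory. Building a set from each file turns every "is this line in the other file?" test into a hash lookup instead of a scan. The names `split_lines`, `partition` and `write_outputs` are made up for illustration.

```python
def split_lines(data):
    """Split raw bytes on CR LF and drop empty lines (assumes CR LF endings)."""
    return [line for line in data.split(b"\r\n") if line]


def partition(a_data, b_data):
    """Return (c, a1, b1): lines in both files, only in a, only in b."""
    a_lines = split_lines(a_data)
    b_lines = split_lines(b_data)
    a_set = set(a_lines)
    b_set = set(b_lines)
    c = [line for line in a_lines if line in b_set]
    a1 = [line for line in a_lines if line not in b_set]
    b1 = [line for line in b_lines if line not in a_set]
    return c, a1, b1


def write_outputs(a_path, b_path):
    """Hypothetical driver: read both files in binary mode, write the three results."""
    with open(a_path, "rb") as f:
        a_data = f.read()
    with open(b_path, "rb") as f:
        b_data = f.read()
    c, a1, b1 = partition(a_data, b_data)
    for name, lines in (("c.txt", c), ("a1.txt", a1), ("b1.txt", b1)):
        with open(name, "wb") as f:
            # Re-terminate every line with CR LF.
            f.write(b"".join(line + b"\r\n" for line in lines))
```

It would be called as `write_outputs("a.txt", "b.txt")` from the folder holding the two files. With inputs close to 1 GB, keeping both sets in memory could take several GB of RAM; storing only hashes of the lines is one way to reduce that.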

jj2007

Quote from: learn64bit on December 02, 2022, 06:41:21 AM
I tested it with 200 MB and it looks like it takes a long time...

Which is not surprising: you need n² string comparisons. If that's too slow, hash all strings and do dword comparisons.
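The hashing idea could be sketched in Python like this, assuming CRC-32 (`zlib.crc32`) as the 32-bit hash; any reasonable 32-bit hash would do, and the function names are illustrative only. Each line of b is hashed once into a table keyed by its dword. Each line of a is then hashed and looked up, and the full strings are compared only when the dwords match, because different lines can share a hash. That replaces roughly n × m string comparisons with about n + m hash computations plus a few full compares. Python's dict already hashes internally, so the explicit dword key here only mirrors what an assembly version with a dword table would do.

```python
import zlib


def line_hash(line):
    """32-bit CRC of a line: the 'dword' that gets compared first."""
    return zlib.crc32(line) & 0xFFFFFFFF


def build_index(lines):
    """Map each 32-bit hash to the list of lines that produced it."""
    index = {}
    for line in lines:
        index.setdefault(line_hash(line), []).append(line)
    return index


def contains(index, line):
    """Cheap dword lookup first; full comparison only on a hash match."""
    bucket = index.get(line_hash(line))
    return bucket is not None and line in bucket


def lines_not_in(a_lines, b_lines):
    """Non-empty lines of a that do not occur in b."""
    index = build_index(line for line in b_lines if line)
    return [line for line in a_lines if line and not contains(index, line)]
```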

learn64bit

About JJ's FindLine:
my test files
  a.txt
   00h 0Dh 0Ah FFh
  b.txt
   01h 0Dh 0Ah FFh 0Dh 0Ah
Problem:
  no c.txt file is produced

jj2007

Zip your a.txt and b.txt, and post the archive here.

learn64bit


jj2007

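include \masm32\MasmBasic\MasmBasic.inc
  Init
  Let esi=FileRead$("a.txt")                     ; read a.txt as raw bytes
  PrintLine "A: ", HexDump$(esi, LastFileSize)   ; show every byte of it
  Let esi=FileRead$("b.txt")
  PrintLine "B: ", HexDump$(esi, LastFileSize)
  Recall "a.txt", f1$()                          ; split into lines; eax = line count
  deb 4, "#lines in A", eax
  Recall "b.txt", f2$()
  deb 4, "#lines in B", eax
EndOfCode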


Output (address, bytes):
A: 005EF760  00 0D 0A FF
B: 005F2890  01 0D 0A FF 0D 0A
#lines in A     eax             2
#lines in B     eax             2


Your files don't contain text. They are malformed.
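The same byte sequences, split on 0Dh 0Ah in a short Python sketch (not MasmBasic), also give two "lines" per file, matching the counts above; the names `A_BYTES`, `B_BYTES` and `crlf_lines` are made up. One of a.txt's "lines" is a single 00h byte. A routine that treats strings as zero-terminated would see that line as empty, which may be related to the missing c.txt, but that is only a guess.

```python
A_BYTES = bytes([0x00, 0x0D, 0x0A, 0xFF])
B_BYTES = bytes([0x01, 0x0D, 0x0A, 0xFF, 0x0D, 0x0A])


def crlf_lines(data):
    """Split on CR LF; a trailing CR LF does not start an extra line."""
    parts = data.split(b"\r\n")
    if parts and parts[-1] == b"":
        parts.pop()
    return parts


if __name__ == "__main__":
    for name, data in (("A", A_BYTES), ("B", B_BYTES)):
        lines = crlf_lines(data)
        # Prints: A 2 ['00', 'ff'] and then B 2 ['01', 'ff']
        print(name, len(lines), [line.hex() for line in lines])
```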

learn64bit


zedd151

Quote from: jj2007 on December 02, 2022, 08:34:46 AM
Your files don't contain text. They are malformed.
It would probably help to know what these files are supposed to be used for, and what program (???.exe) works with them.

jj2007

Quote from: jj2007 on December 02, 2022, 12:26:22 AM
What you do need to do:
- prepare two test files A.txt and B.txt, no more than one MB each but no less than some kBytes
- post them here (if you manage to keep the zip below 512MB)

They won't be text files, it seems, so (as Hutch wrote earlier) it will need binary comparisons. The interesting question is why you write about "lines" in the context of binary files. I see the 0Dh 0Ah sequences, but they don't make sense after null bytes.

Quote from: zedd151 on December 02, 2022, 08:39:23 AM
It would probably help to know what these files are supposed to be used for, and what program (???.exe) works with them.

Yes indeed :thumbsup: