Source: a.txt and b.txt
Result: c.txt
Example:
a.txt file bytes:
00h
1-byte file, with no line separator "0Dh 0Ah"
b.txt file bytes:
0Dh 0Ah 0Dh 0Ah 00h 0Dh 0Ah 0Ah
8-byte file
It has 3 empty lines:
first empty line: before a line separator, and not after a line separator
second empty line: before a line separator, and after a line separator
third empty line: not before a line separator, and after a line separator
and two non-empty lines:
first: 00h
second: 0Ah
c.txt file bytes should be:
00h
Why "FindTheCommonLines"? Because we have the line separator "0Dh 0Ah" here.
We don't care about empty lines, so just ignore them.
I wonder if such a program already exists?
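A minimal C sketch of the splitting rule described above, assuming exactly these semantics: lines are separated by the two-byte sequence 0Dh 0Ah, empty lines are ignored, the last line needs no trailing separator, and every other byte (including 00h, FFh, a lone 0Dh or a lone 0Ah) is line content. `Line` and `split_lines` are hypothetical names, not from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* A non-empty line inside a buffer: start offset and length (separator excluded). */
typedef struct { size_t off; size_t len; } Line;

/* Split buf[0..len) at every 0Dh 0Ah pair and skip empty lines.
   Stores at most max_out lines in out, but always returns the total
   number of non-empty lines, so the caller can detect a too-small array. */
size_t split_lines(const unsigned char *buf, size_t len, Line *out, size_t max_out)
{
    size_t n = 0, start = 0, i = 0;
    assert(buf != NULL || len == 0);
    while (i <= len) {
        int at_end = (i == len);
        int at_sep = !at_end && i + 1 < len && buf[i] == 0x0D && buf[i + 1] == 0x0A;
        if (at_end || at_sep) {
            if (i > start) {              /* non-empty line [start, i) */
                if (n < max_out) {
                    out[n].off = start;
                    out[n].len = i - start;
                }
                n++;
            }
            if (at_end)
                break;
            i += 2;                       /* step over 0Dh 0Ah */
            start = i;
        } else {
            i++;
        }
    }
    return n;
}
```

Run on the b.txt bytes from the example, it finds two non-empty lines: 00h at offset 4 and 0Ah at offset 7. a.txt (the single byte 00h) gives one line, so the common line is 00h, as expected for c.txt.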
Quote from: learn64bit on November 29, 2022, 04:41:33 AM
I wonder if such a program already exists?
It looks like the program is very clear in your mind. Just build it :thumbsup:
Please do yourself a favour: write what you need in your native language, then let DeepL (https://www.deepl.com/translator) translate it. Plus, zip a.txt and b.txt and post them here. Maybe somebody will understand then what you want :rolleyes:
Wait, wait: before taking JJ's advice (which is good advice*), let me ask: is what you're really looking for here something that could be called "Find common lines"? Because based on your description (what I can figure out of it, anyhow), it seems you want a count of lines that are identical between a.txt and b.txt, not including blank lines. Is that right?
* I have to second JJ's kind of annoyed suggestion, because it annoys me too: people who post stuff here in English that's impossible to understand. Before you accuse me of being an Anglocentric snob or something, let me point out two things: one, that this happens to be an English-language forum, and two, that if I were to post such incomprehensible gibberish in your native language, you'd be annoyed too.
:biggrin:
If I had a buck$ for every incomprehensible question ever posted, I would own 2 Cadillacs, a house on the Riviera and at least 2 Lear jets.
Perhaps what our friend needs is a little more comprehension and some better explanation of what he is after.
I want c.txt. c.txt is a file.
FindLine is the program's name
FindTheCommonLines is the main function's name
"common" is a CN (Chinese) English word, as in "common ground wire" and "live wire".
Keep the questions coming, I will answer them.
But did I describe accurately what you want the program to do? That wasn't clear in your original post.
As I said, it sounds to me like you want the output file to contain all lines that are common to both input files, excluding blank lines. Is that so?
Yes.
My test a.txt is 242,206 KB and b.txt is 923,293 KB.
(Maybe I will need other options, like checking only the first 40 bytes of each line [sha1sum], or only the first 64 bytes [sha256sum]; ideas are welcome.)
Now the next question is: are you comparing only lines, or is there multi-line text?
a.txt and b.txt contain 00h bytes, 0Dh bytes, FFh bytes, etc. If that is OK, they are text files.
The biggest line is under 4 MB.
An (ASCII) text file normally doesn't have any zero or FF bytes in it. What type of text file are you using here exactly?
The normal end-of-line characters would be either carriage return (0D) or carriage-return/line feed (0D/0A).
A 4 megabyte line? really? That's huge!
Sounds weird to me.
Yes, weird to me too
4 MB is fine; any of the WinAPI functions below will do:
invoke HeapAlloc, eax, 0, 4*1024*1024   ; eax = heap handle (e.g. from GetProcessHeap), 0 = no flags
HeapSize
LocalAlloc
LocalSize
GlobalAlloc
GlobalSize
Yes, we know you can easily allocate a 4MB buffer, no problemo. But a 4MB line???? You mean one line of text? or are you talking about the size of the entire file?
You gotta be a lot more clear about what you're trying to do here.
Yes, the biggest line is almost 4 MB in size. The biggest file is almost 1 GB.
Sounds like records in a database, not any kind of text file. How could you have a 4MB line of text?
Quote from: learn64bit on November 29, 2022, 05:10:05 PM
a.txt and b.txt contain 00h bytes, 0Dh bytes, FFh bytes, etc. If that is OK, they are text files.
The biggest line is under 4 MB.
Then it's not a text file, but a binary. :biggrin:
If the two input files as well as the output file are indeed text files, what program are you using to open them? I am at a loss as to why a text file would contain 0h or 0FFh bytes. What would the usage for the file be? Obviously we don't understand this rather unorthodox use or formatting of text files, if that's what they are. How are they encoded (ASCII, UTF-8, UTF-16, Unicode or ??), and what format (.txt, .rtf, some custom format?)?
And by "FindLine" do you mean finding a line by line number? Or finding a line containing certain known text? Or finding a line that meets some other criteria?
Can you post an example here, that is small enough to attach as a zip file in your post? (Not the 1 GB file that you mentioned earlier :tongue: )
Yes... maybe I should call them byte-corrupted text files.
Now... I want to start building it. (Sorry if I did not answer your questions.)
First I will give "AllocateUserPhysicalPages" a try. Why this one? Because I didn't find any masm32 source code that uses it.
I found an MS C/C++ example that uses this API; of course, our "FindLine" will be masm32 source code.
[No Win11 SDK, no VS2k23; I still use the MS Platform SDK for Win2k3 R2 and MS VS.NET 2k3]
This might be useful
https://learn.microsoft.com/en-us/windows/win32/memory/file-mapping
It sounds like you want to perform binary search of byte patterns where you would need a starting and ending byte. This can be done reasonably easily as long as this is what you are after.
New idea:
input files: a and b
output files: c, a1 and b1
c: contains the lines in both a and b
a1: contains the lines only in a
b1: contains the lines only in b
That means we need 5 memory blocks of 1 GB each.
Oh man... I wish my machine had 64 GB or 1 TB of RAM!
If this idea fails, I will try Timo's idea.
Quote from: learn64bit on December 02, 2022, 12:21:28 AM
nes idea
Is that English?
Seriously, your absolutely incomprehensible, confused posts will not get you the help you want. Ranting about the gigabytes needed is not helpful, either. What you do need to do:
- prepare two test files A.txt and B.txt, no more than one MB each
- post them here (if you manage to keep the zip below 512MB) or elsewhere
- tell us exactly what you need, and what C.txt should look like
I am trying to be helpful :thup:
The written bytes didn't show up in the picture... maybe I just ran an old copy to capture the pic.
Quote from: learn64bit on December 02, 2022, 12:35:00 AM
The written bytes didn't show up in the picture... maybe I just ran an old copy to capture the pic.
Again, an absolutely incomprehensible, confused comment. You don't want help, right?
Okay, I have given this some thought...
learn64bit, do you want to copy text from a.txt and b.txt to file c.txt and remove bytes that might be 0h or 0FFh?
Is that what you were after in your original post (post #1 in this thread)?
Also, do you want the lines ending with only 0D0Ah? And remove them if they contain only 0Ah? Or do you want to convert from 0Ah to 0D0Ah?
Does that sound like what you want to do? If not, then try to explain it so that we can understand what you need, since so far no one can decipher what you want to do.
1 line c example
0Ah
2 lines c example
0Ah 0Dh 0Ah
00h
3 lines c example
0Ah 0Dh 0Ah
00h 0Dh 0Ah
FFh
if last line is
0Ah 0Dh 0Ah
or
FFh 0Dh 0Ah
or
00h 0Dh 0Ah 0Dh 0Ah
It's OK, because I don't care about empty lines at all.
Quote from: learn64bit on December 02, 2022, 01:26:44 AM
1 line c example
0Ah
2 lines c example
0Ah 0Dh 0Ah
00h
3 lines c example
0Ah 0Dh 0Ah
00h 0Dh 0Ah
FFh
I give up, folks. Back to coding :tongue:
It seems you are wanting to copy a.txt to c.txt,
Then append b.txt to the c.txt file created when copying a.txt -> c.txt in the line above? Does that describe what you want?
That would be trivial if that is the case.
BTW: there are no duplicate non-empty lines in a or b; every non-empty line in a is unique within a.
If a is 800 MB and b is 900 MB, then c's maximum size should be 800 MB.
Okay, I think I understand...
Copy a.txt to c.txt
If b.txt contains a line that is not in a.txt, copy that to c.txt?
Is a.txt almost the same as b.txt, except b.txt has an added line? Is that correct?
Okay you want non-empty lines in a.txt copied to c.txt?
Also if b.txt contains a unique non-empty line that is not in a.txt, also copy to c.txt? Is this correct?
That would take a bit of work to code, but should be able to be done.
Does the order of the lines of text matter, or can they be sorted? That would make it easier to avoid duplicated lines... I have a sorting routine that also removes duplicate entries while sorting. That is why I ask whether the lines can be sorted.
No duplicated lines in a or b, don't worry about that.
I don't care about the line order in c.
NoCforMe: you want the output file to contain all lines that are common to both input files, excluding blank lines. Is that so?
NoCforMe is right.
Okay, now I see. That will take a bit of effort. Need to compare each line, to be sure not duplicated. Now we are getting somewhere. I'm sure the coding gurus can come up with something. I will look at this again later.
Quote from: learn64bit on December 02, 2022, 02:41:42 AM
no duplicated lines in a or b. don't worry about that.
don't care about the line order in c.
NoCforMe: you want the output file to contain all lines that are common to both input files, excluding blank lines. Is that so?
NoCforMe is right.
Okay, that criterion is now defined by you. What is the maximum expected line size? And the expected maximum size of both a.txt and b.txt? You wrote:
Quote from: learn64bit on December 02, 2022, 03:37:12 AM
File size maximum: less than 1 GB
Line size maximum: less than 4 MB
(from the post immediately after this one)
Knowing all of that would be a good starting point for you to devise an algorithm to do what you want it to do. I don't think anyone is going to write the code for you, but we will try to help you.
File size maximum: less than 1 GB
Line size maximum: less than 4 MB
Ten minutes of coding: it loads two files into two arrays, and all lines of the first file are written to disk if they are not found in the second file.
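include \masm32\MasmBasic\MasmBasic.inc
  Init
  SetGlobals outCt, found
  Recall "\Masm32\examples\exampl04\jacts\jacts.asm", f1$()       ; load file 1, one line per array element
  Recall "\Masm32\examples\exampl04\listview\listview.asm", f2$()  ; load file 2 the same way
  Dim out$()                                  ; result array
  SetGlobals
  For_ each esi in f1$()                      ; for every line of file 1...
    .if Len(esi)                              ; ...that is not empty
      and found, 0                            ; reset the "found in file 2" flag
      For_ ct=0 To f2$(?)-1                   ; compare with every line of file 2
        .if Len(f2$(ct))                      ; skip empty lines of file 2
          .if !StringsDiffer(esi, f2$(ct))    ; identical line found
            inc found
          .endif
        .endif
      Next
      .if !found                              ; line exists only in file 1: keep it
        Let out$(outCt)=esi
        inc outCt
      .endif
    .endif
  Next
  .if outCt
    Store "c.txt", out$()                     ; write the result lines to c.txt
    Inkey Str$("%i strings written. Have a look (y)?", outCt)
    If_ eax=="y" Then ShEx "c.txt"
  .endif
EndOfCode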
Thanks JJ,
your c.txt is the a1 that I need.
I changed the 2 input files to
.\a.txt
.\b.txt
I tested it with 200 MB; it seems to take a long time... it has not finished yet, so I don't know, but 999 MB should take 20 minutes or more.
Input files: a.txt and b.txt
Output files: c.txt, a1.txt and b1.txt
Quote from: learn64bit on December 02, 2022, 06:41:21 AM
Test it with 200mb, It looks take a long time...
Which is not surprising: you need to compare n² strings. If that's too slow, hash (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1489) all strings and do dword comparisons.
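A sketch of the hashing idea in C (this is not MasmBasic's own hash function): each non-empty line gets a 32-bit FNV-1a hash (one dword), b's lines go into an open-addressing hash table, and each line of a is looked up there, so on average the work grows with n + m instead of n × m. FNV-1a, linear probing and the names `collect` and `compare_files` are my assumptions. The full bytes are still compared when two hashes match, because different lines can share a hash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A non-empty line: offset, length and its 32-bit hash. */
typedef struct { size_t off, len; uint32_t hash; } HLine;

/* 32-bit FNV-1a hash: one dword per line. */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    while (n--) { h ^= *p++; h *= 16777619u; }
    return h;
}

/* Collect the non-empty lines of buf (separator 0Dh 0Ah) with their hashes. */
static size_t collect(const unsigned char *buf, size_t len, HLine *out)
{
    size_t n = 0, start = 0, i = 0;
    assert(buf != NULL || len == 0);
    while (i <= len) {
        int at_end = (i == len);
        if (at_end || (i + 1 < len && buf[i] == 0x0D && buf[i + 1] == 0x0A)) {
            if (i > start) {
                out[n].off = start;
                out[n].len = i - start;
                out[n].hash = fnv1a(buf + start, i - start);
                n++;
            }
            if (at_end) break;
            i += 2;
            start = i;
        } else {
            i++;
        }
    }
    return n;
}

/* Count a's lines that also occur in b (the c.txt lines) and those that
   do not (the a1.txt lines). Returns 0 on success, -1 if out of memory. */
int compare_files(const unsigned char *a, size_t alen,
                  const unsigned char *b, size_t blen,
                  size_t *common, size_t *only_a)
{
    /* len bytes hold at most (len + 2) / 3 non-empty lines */
    HLine *la = malloc(((alen + 2) / 3 + 1) * sizeof *la);
    HLine *lb = malloc(((blen + 2) / 3 + 1) * sizeof *lb);
    size_t *slot = NULL;
    if (!la || !lb) { free(la); free(lb); return -1; }
    size_t na = collect(a, alen, la), nb = collect(b, blen, lb);

    size_t cap = 1;                          /* power of two > 2 * nb */
    while (cap < 2 * nb + 1) cap <<= 1;
    size_t mask = cap - 1;
    slot = calloc(cap, sizeof *slot);        /* 0 = empty, else index + 1 */
    if (!slot) { free(la); free(lb); return -1; }
    for (size_t j = 0; j < nb; j++) {
        size_t k = lb[j].hash & mask;
        while (slot[k]) k = (k + 1) & mask;  /* linear probing */
        slot[k] = j + 1;
    }

    *common = *only_a = 0;
    for (size_t i = 0; i < na; i++) {
        int found = 0;
        for (size_t k = la[i].hash & mask; slot[k]; k = (k + 1) & mask) {
            const HLine *y = &lb[slot[k] - 1];
            /* equal dwords first, full bytes only to rule out collisions */
            if (y->hash == la[i].hash && y->len == la[i].len &&
                memcmp(b + y->off, a + la[i].off, y->len) == 0) {
                found = 1;
                break;
            }
        }
        if (found) (*common)++; else (*only_a)++;
    }
    free(slot);
    free(la);
    free(lb);
    return 0;
}
```

The table size is kept at a power of two so the slot index is a plain AND with `mask`, and at least half the slots stay empty so every probe sequence ends.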
About JJ's FindLine, my test files:
a.txt: 00h 0Dh 0Ah FFh
b.txt: 01h 0Dh 0Ah FFh 0Dh 0Ah
Problem: no c.txt file is produced.
Zip your a.txt and b.txt, and post the archive here.
my test files a and b (v1)
Let esi=FileRead$("a.txt")
PrintLine "A: ", HexDump$(esi, LastFileSize)
Let esi=FileRead$("b.txt")
PrintLine "B: ", HexDump$(esi, LastFileSize)
Recall "a.txt", f1$()
deb 4, "#lines in A", eax
Recall "b.txt", f2$()
deb 4, "#lines in B", eax
Output (address, bytes):
A: 005EF760 00 0D 0A FF
B: 005F2890 01 0D 0A FF 0D 0A
#lines in A eax 2
#lines in B eax 2
Your files don't contain text. They are malformed.
Yes. You are right.
Quote from: jj2007 on December 02, 2022, 08:34:46 AM
Your files don't contain text. They are malformed.
It would probably help to know what these files are supposed to be used for, and what program (???.exe) works with them.
Quote from: jj2007 on December 02, 2022, 12:26:22 AM
What you do need to do:
- prepare two test files A.txt and B.txt, no more than one MB each but no less than some kBytes
- post them here (if you manage to keep the zip below 512MB)
They won't be text files, it seems, so (as Hutch wrote earlier) it will need binary comparisons. The interesting question is why you write about "lines" in the context of binary files. I see the 0Dh 0Ah sequences, but they don't make sense after null bytes.
Quote from: zedd151 on December 02, 2022, 08:39:23 AM
It would probably help to know what these files are supposed to be used for, and what program (???.exe) works with them.
Yes indeed :thumbsup:
Phonetic alphabet font id:
00h: US English
01h: UK English
...
FFh: CN English
And that is supposed to tell us ... what? :rolleyes:
Note that a nullbyte (00h) will always be interpreted as "end of string". If you want to avoid that, and see strings, you should add 41h to these bytes, so 00h->A, 01h->B, 0FFh->@ etc
@jj... seems to be some sort of 'proprietary format'. Only learn64bit knows for certain.
Advanced English program
Tables:
English phonetic alphabet: id <- column
English word: word <- column
....
a.txt and b.txt can be one of the tables.
Quote from: learn64bit on December 02, 2022, 08:49:02 AM
Advanced English program
tables
english phonetic alphabet
id <- column
english word
word - column
....
a.txt and b.txt can be one of the tables
We believe you. Show us a medium sized table, i.e. more than 3 bytes, less than 3 gigabytes. And give us a description of the format, in real English please.
Quote from: jj2007 on December 02, 2022, 08:50:45 AM
We believe you. Show us a medium sized table, i.e. more than 3 bytes, less than 3 gigabytes.
Hopefully a table that will be equal to or less than the zip file size limit after zipping. :tongue: Hate to have to attempt opening that file. Would take eons. :biggrin: (3 GB)
On second thought, a screenshot of it opened in the program where it is used might be helpful. Knowing that program name would certainly help too, learn64bit.
Idea update:
A lady told me I don't need b1. (a is 1 GB, b is 1 GB - 1 byte, c is 1 GB - 1 byte, a1 is 1 byte, b1 is 0 bytes. I can use b and a1 to restore a.)
I think she is right.
So now our FindLine will only use 4 GB of RAM:
a, b, c and a1
The 4 MB line memory will use HeapAlloc or something else, not AllocateUserPhysicalPages (AUPP).
[Thanks for everybody's advice]
Quote from: learn64bit on December 03, 2022, 05:03:10 AM
I can use b and a1 to restore a
Let us know if it works out for you. :thumbsup:
Ideas are still welcome.
I am just starting to build it.
I will go slowly, so I can get more ideas. That means saving more memory and a faster program!
[Thanks, everyone!]
Quote from: learn64bit on December 04, 2022, 07:29:09 AM
Ideas are still welcome.
I am just starting to build it.
I will go slowly, so I can get more ideas. That means saving more memory and a faster program!
[Thanks, everyone!]
:thumbsup:
Next step:
how should I load a and b into their memory?
Quote from: learn64bit on December 04, 2022, 12:43:09 PM
Next step:
how should I load a and b into their memory?
Let esi=FileRead$ (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1075)("a.txt")
Let edi=FileRead$("b.txt")
The Masm32 SDK has a similar macro. Just RTFM.
jj,
thanks for the idea
Quote from: learn64bit on December 04, 2022, 12:43:09 PM
Next step:
how should I load a and b into their memory?
At the bare minimum (no error checking), I would use the APIs:
CreateFileA (ANSI) or CreateFileW (Unicode) ; opens the file
GetFileSize ; gets file size in bytes
GlobalAlloc ; allocate memory buffer
ReadFile ; read file into the memory buffer
CloseHandle ; close file handle
~~~
;; here you would do what you need to with the file (now in memory buffer)
~~~
GlobalFree ; free the memory used
That is how I would do it... :biggrin:
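For readers following along in C, a portable sketch of the same five steps using the C standard library instead of the Win32 calls (my substitution, not zedd151's code): fopen, fseek and ftell stand in for CreateFile and GetFileSize, malloc for GlobalAlloc, fread for ReadFile, fclose for CloseHandle, and free for GlobalFree. `load_file` is a hypothetical name. Opening in binary mode ("rb") matters here, since the files contain 00h, 0Dh and FFh bytes that must arrive unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd buffer, byte for byte.
   Returns NULL on any error; on success *size receives the file size.
   The caller releases the buffer with free() (the GlobalFree step). */
unsigned char *load_file(const char *path, size_t *size)
{
    assert(path != NULL && size != NULL);
    FILE *f = fopen(path, "rb");                 /* binary: no CR/LF translation */
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long n = ftell(f);                           /* file size in bytes */
    if (n < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc(n > 0 ? (size_t)n : 1);
    if (!buf) { fclose(f); return NULL; }
    if (fread(buf, 1, (size_t)n, f) != (size_t)n) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    *size = (size_t)n;
    return buf;
}
```

`ftell` returns a `long`, which is 32-bit on Windows; that still covers the thread's 1 GB limit, but not files of 2 GB or more.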
Quote from: jj2007 on December 04, 2022, 08:50:36 PM
The Masm32 SDK has a similar macro.
xHelp is your friend (http://masm32.com/board/index.php?topic=531.0): mov edi, InputFile(lpfile)
Thanks,
it seems there is nothing better than
invoke ReadFile, handleOfFile, addressOfMemory, 4096, addr BytesHaveRead, 0   ; 4096 bytes = 4 KB per call
Idea update:
now we use ReadFile to load a.txt and b.txt into their memory,
...
mov AddressOfMemoryA,eax
mov FileSizeOfFileA,ebx
;FileSizeOfFileA is minimum 1 byte, maximum 1 giga bytes
...
mov AddressOfMemoryB,eax
mov FileSizeOfFileB,ebx
We use "invoke VirtualAlloc, NULL, 400000h, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE" to create a 4 MB memory block for LineMemory
(I think VirtualAlloc or GlobalAlloc, it's no big difference)
mov addressOfLineMemory,eax
Next step:
how do I copy a non-empty line from AddressOfMemoryA into addressOfLineMemory, then check whether it is also a line of AddressOfMemoryB?
Use either HeapAlloc() or GlobalAlloc() as VirtualAlloc() tends to be for more specialised OS functions.
hutch--,
thanks.
Idea update:
how do I copy a non-empty line from AddressOfMemoryA into addressOfLineMemory, then check whether it is also a line of AddressOfMemoryB?
Somebody suggested using "lodsb" to load, then "cmpsb" to check.
I think it's a good idea to try, since I don't have a better one yet.
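The lodsb/cmpsb suggestion, sketched in C under the thread's line rules (0Dh 0Ah separators, empty lines ignored): walk B line by line and, for each line of the right length, compare byte for byte, with `memcmp` standing in for `repe cmpsb`. `line_in_buffer` is a hypothetical name. Each call scans all of B, so checking every line of A this way is the same n × m amount of work as JJ's MasmBasic loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Is line[0..n) exactly equal to one of the non-empty lines of buf[0..len)?
   Lines in buf are separated by 0Dh 0Ah; 00h and FFh are ordinary bytes. */
int line_in_buffer(const unsigned char *line, size_t n,
                   const unsigned char *buf, size_t len)
{
    size_t start = 0, i = 0;
    assert(n > 0);                        /* empty lines are never searched for */
    while (i <= len) {
        int at_end = (i == len);
        if (at_end || (i + 1 < len && buf[i] == 0x0D && buf[i + 1] == 0x0A)) {
            /* cheap length check first, then the byte compare */
            if (i - start == n && memcmp(buf + start, line, n) == 0)
                return 1;
            if (at_end)
                break;
            i += 2;                       /* step over 0Dh 0Ah */
            start = i;
        } else {
            i++;
        }
    }
    return 0;
}
```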
First thing, do you have the Masm32 SDK installed on your computer? And I mean a proper, full installation without modifications. Masm32 has some nice libraries and macros that would make your described program much easier to write and receive assistance in writing it.
Have you started the assembly source code for your proposed program? If you have started it, post what you have so far.
No, nothing is new, just copy & paste...
I will not post another unfinished, repeated source code.
of course, I will post my working FindLine program
ideas are welcome, this is just beginning
Quote from: learn64bit on December 08, 2022, 05:09:56 PM
no, nothing is new, just copy & paste...
So, no source code? :undecided: I am eager to help you, but I can only help you if I see that you are really trying to write your own assembly program. Without a source code, I do not know that you are even trying.
Do you have the Masm32 SDK installed?
Thanks for everybody's ideas!
Now we can go to the next steps.
Ideas are still welcome until it is finished.
Still no source?
Quote from: learn64bit on December 10, 2022, 03:15:52 PM
...
now we can go next steps
...
Looking at the executable (that you attached) in ollydbg, seems like you are making a simple task a little over-complicated. :thumbsup:
But I wish you the best of luck in your endeavours. No source code, I cannot assist.
OllyDbg's disassembly code is better than my source code?
Yes, it is true.
Every time I post source code, people say "not human readable!"
Quote from: learn64bit on December 10, 2022, 03:57:33 PM
Every time I post source code, people say "not human readable!"
Post the source code that you have, I won't complain about it.
Also, you never answered my question: "Do you have the Masm32 SDK installed on your computer?"
If so, I and other members may be able to help you better. :thumbsup:
Also, look at the examples in the 'examples' folder in Masm32. There you will see some nice examples of masm source code. Notice that most of them are easy to follow, uncluttered and only contain the data, code, includes and equates that are needed to assemble and link the example. (Plus compile .rc file (resources) where applicable and link it) Well some of the commenting is very verbose. That is done purposely to aid those just starting to use Masm32 to explain in detail what the functions do plus other descriptive remarks.
Quote from: zedd151 on December 10, 2022, 04:39:30 PM
Also, you never answered my question.. "Do you have the Masm32 SDK installed on your computer?"
I can answer it for you :biggrin:
Address Hex dump Command Comments
00402550 /$ 6A F6 push -0A ; /StdHandle = STD_INPUT_HANDLE
00402552 |. E8 CB000000 call <jmp.&kernel32.GetStdHandle> ; \KERNEL32.GetStdHandle
00402557 |. 50 push eax ; /hConsole
00402558 |. E8 CB000000 call <jmp.&kernel32.FlushConsoleInputBuf ; \KERNEL32.FlushConsoleInputBuffer
0040255D |> 6A 01 /push 1 ; /Time = 1 ms
0040255F |. E8 CA000000 |call <jmp.&kernel32.Sleep> ; \KERNEL32.Sleep
00402564 |. FF15 6C304000 |call near [<&msvcrt._kbhit>] ; [MSVCRT._kbhit
0040256A |. 85C0 |test eax, eax
0040256C |.^ 74 EF \jz short 0040255D
0040256E |. FF15 68304000 call near [<&msvcrt._getch>] ; [MSVCRT._getch
00402574 \. C3 retn
wait_key proc
invoke FlushConsoleInputBuffer, rv(GetStdHandle,STD_INPUT_HANDLE)
@@:
invoke Sleep, 1
call crt__kbhit
test eax, eax
jz @B
call crt__getch ; recover the character in the keyboard
; buffer and return it in EAX
ret
wait_key endp
Quote from: jj
I can answer that for you
:tongue:
I rarely write console programs. :toothy:
Still awaiting learn64bit's source to give some suggestions on Masm32.lib funcs or macros that might be useful for him to complete his project.
Hope it's fast enough.
1gb a and b will take a while
Idea update:
somebody suggested: there is no need to check the line length if the memory pointer is in the last 4 MB of memoryOfFile.
I think he is right.
That does not make sense, depending on how the data is stored, if its text with a zero terminator, you scan it like normal, if its binary data it would require a length specified when it was created.
0D0Ah + 4 MB (all 00h) + 0D0Ah
Make sure no line is over 4 MB; then in the next steps we don't need to care about the line length any more.
That is the idea.
His idea speeds up the length check procedure
(because I check up to endOfFileMemory).
Idea update:
another lady suggested creating a lineTable (2 columns: lineLength and lineLocation) to speed up the procedure that makes c and a1.
I think it's a good idea.
The next step will be to create these 2 tables.
I think that in order to create a table, we first need to find out how many lines there are in a and b.
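A sketch of that plan in C, under my own assumptions: one row per non-empty line with two dword columns (lineLength, lineLocation), so each row is 8 bytes; pass 1 only counts the lines, pass 2 fills a table of exactly that many rows. `LineRow`, `scan_rows` and `build_line_table` are hypothetical names. A count of 0 also tells the caller there is no non-empty line at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One lineTable row: two dword columns, lineLength and lineLocation. */
typedef struct { uint32_t length; uint32_t location; } LineRow;

/* One pass over buf: counts non-empty lines (0Dh 0Ah separated) and,
   when rows is not NULL, fills in their length and offset. */
static size_t scan_rows(const unsigned char *buf, size_t len, LineRow *rows)
{
    size_t n = 0, start = 0, i = 0;
    while (i <= len) {
        int at_end = (i == len);
        if (at_end || (i + 1 < len && buf[i] == 0x0D && buf[i + 1] == 0x0A)) {
            if (i > start) {
                if (rows) {
                    rows[n].length = (uint32_t)(i - start);
                    rows[n].location = (uint32_t)start;
                }
                n++;
            }
            if (at_end) break;
            i += 2;
            start = i;
        } else {
            i++;
        }
    }
    return n;
}

/* Pass 1 counts the lines, pass 2 fills a table of exactly that size.
   Returns NULL if there is no non-empty line (*count == 0) or malloc fails. */
LineRow *build_line_table(const unsigned char *buf, size_t len, size_t *count)
{
    assert(len <= UINT32_MAX);            /* offsets must fit in a dword */
    *count = scan_rows(buf, len, NULL);
    if (*count == 0)
        return NULL;
    LineRow *rows = malloc(*count * sizeof *rows);
    if (rows)
        scan_rows(buf, len, rows);
    return rows;
}
```

Counting first sizes the table from the real file rather than from a worst-case line count, at the price of reading the buffer twice.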
Idea update:
find the maximum number of lines in a or b.
Maybe we can calculate it without opening a and b...
it will be at most about (1 giga)/3...
maybe the formula to calculate it is complicated.
I should say the number is in the range 1 to x...
Can we calculate x?
We need a formula.
Which number is x?
Just give the number, we don't need to know why it is this number.
[sizeOfTableMemory = 4 bytes * 2 * x]
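One way to get x, as a worked sketch in C (my derivation, not from the thread). A file of N bytes holding k non-empty lines needs at least k content bytes plus 2 separator bytes between neighbouring lines: k + 2(k - 1) <= N, so k <= (N + 2) / 3. If the lines must also be unique, there are only 256^L different lines of length L, so the short lines run out and the bound drops. `max_lines_any` and `max_lines_unique` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on non-empty lines in a file of size bytes (size >= 1). */
uint64_t max_lines_any(uint64_t size)
{
    assert(size >= 1);
    return (size + 2) / 3;                /* k + 2(k - 1) <= size */
}

/* Same bound when every non-empty line is unique.
   A line of length L costs L + 2 bytes (with its separator); the last
   separator is not needed, hence the budget size + 2. At most 256^L
   distinct lines of length L exist, so the cheapest lengths go first.
   This slightly overcounts: it ignores that a line cannot contain 0Dh 0Ah. */
uint64_t max_lines_unique(uint64_t size)
{
    uint64_t budget = size + 2, total = 0, distinct = 1;
    assert(size >= 1);
    for (uint64_t len = 1; budget >= len + 2; len++) {
        distinct = (distinct > UINT64_MAX / 256) ? UINT64_MAX : distinct * 256;
        uint64_t fit = budget / (len + 2);
        uint64_t take = fit < distinct ? fit : distinct;
        total += take;
        budget -= take * (len + 2);
        if (take < distinct)
            break;                        /* budget used up at this length */
    }
    return total;
}
```

For N = 1073741824 (1 GiB) this gives x = 357,913,942 in general and x = 181,775,147 for unique lines; at 8 bytes per row, the unique-line table alone is 1,454,201,176 bytes, well over 256 MB.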
Quote from: learn64bit on December 12, 2022, 12:12:08 PM
find the maximum number of lines in a or b.
A byte scanner that scans for carriage return/linefeed (0D0Ah) could count the lines. In masm32.lib there is a function, "lfcnt", that can handle that job.
zedd151,
we don't care about emptyLines (like 0D0A0D0Ah), but we do care about noneEmptyLines (like 0D0A000D0Ah).
The step after the table is created will be:
make sure every non-empty line in a or b is a unique line.
Quote from: learn64bit on December 12, 2022, 01:31:27 PM
... and we care about noneEmptyLines(like 0D0A000D0Ah)
I think you mean 'non empty' lines not 'none empty' lines. But okay, that function that I mentioned will not work, as your example contains zero's.
I cannot think of any ready made function that will do what you want. Reason being there are many functions that work with ascii text. Your example is NOT ascii text as it contains zero bytes.
A zero byte would be recognized as the end of string, and the text handling function would exit immediately after encountering it. That means you need a custom function to count the number of lines, which should not be very hard to do. But you wouldn't be able to depend on (0h) as end of file, but would need to compare the byte counter to the file size to determine if the end of file has been reached. :biggrin: Good luck with this project. :thumbsup:
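Following that advice, a sketch in C of a custom line counter that stops on the byte count rather than on a 00h byte. `memchr` is used because it searches a fixed number of bytes and treats 00h like any other value; `count_lines` is a hypothetical name. It also reports the longest line, which is one way to do the earlier "make sure it's not over 4 MB" check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the non-empty lines of buf[0..len) (separator 0Dh 0Ah).
   The end is found by comparing the position with len (the file size),
   never by looking for a 00h byte. *longest receives the longest line. */
size_t count_lines(const unsigned char *buf, size_t len, size_t *longest)
{
    size_t n = 0, start = 0, pos = 0;
    assert(longest != NULL);
    *longest = 0;
    if (len == 0)
        return 0;
    for (;;) {
        /* memchr looks at exactly len - pos bytes and does not stop at 00h */
        const unsigned char *cr = memchr(buf + pos, 0x0D, len - pos);
        size_t end;
        if (cr == NULL) {
            end = len;                            /* last line, no separator */
        } else if ((size_t)(cr - buf) + 1 < len && cr[1] == 0x0A) {
            end = (size_t)(cr - buf);             /* 0Dh 0Ah separator */
        } else {
            pos = (size_t)(cr - buf) + 1;         /* lone 0Dh belongs to the line */
            continue;
        }
        if (end > start) {
            n++;
            if (end - start > *longest)
                *longest = end - start;
        }
        if (end == len)
            break;
        start = pos = end + 2;
    }
    return n;
}
```

A caller would reject the file if `longest` comes back as 4 * 1024 * 1024 or more.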
Idea update:
first we need to make sure there are no duplicated lines in a and b,
otherwise one lineTableMemory will be 2.4 GB...
Quote from: zedd151 on December 12, 2022, 01:52:57 PM
Quote from: learn64bit on December 12, 2022, 01:31:27 PM
... and we care about noneEmptyLines(like 0D0A000D0Ah)
I think you mean 'non empty' lines not 'none empty' lines. But okay, that function that I mentioned will not work, as your example contains zero's.
I cannot think of any ready made function that will do what you want. Reason being there are many functions that work with ascii text. Your example is NOT ascii text as it contains zero bytes.
A zero byte would be recognized as the end of string, and the text handling function would exit immediately after encountering it. That means you need a custom function to count the number of lines, which should not be very hard to do. But you wouldn't be able to depend on (0h) as end of file, but would need to compare the byte counter to the file size to determine if the end of file has been reached. :biggrin: Good luck with this project. :thumbsup:
https://www.deepl.com/translator:
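(In Chinese, translated:) I think you mean "non-empty" lines, not "non-empty" lines. But okay, the function I mentioned won't work, because your example contains zeros.
I can't think of any ready-made function that can do what you want. The reason is that there are many functions that handle ASCII text. Your example is not ASCII text, because it contains zero bytes.
A zero byte is recognized as the end of a string, and a text-handling function exits immediately when it encounters one. This means you need a custom function to count the lines, which should not be hard to do. But you cannot rely on (0h) as the end of the file; instead you need to compare a byte counter with the file size to determine whether the end of the file has been reached. Good luck with this project.
Dear learn64bit,
you should use Deepl.com, it is much better than Google Translate. When translating your posts into English, you can: a) make sure we members understand what you mean; b) improve your English. You have a good project, but until now it has been absolutely unclear what you really want to do.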
jj, no hablar chino. this is an English speaking forum. :tongue: Except for hutch, he speaks 'stralian. :toothy:
edited to clarify, just for jj2007 {in response to his PM}:
Yes I know that the link (and translation above) was intended for learn64bit.
My reply was meant as humour, Jochen. :sad:
No need to send PM's. :eusa_naughty: :biggrin:
Now a and b are unique-line files.
What is the maximum number of lines?
I hope tableSpace is not over 256 MB.
daydreamer,
we will check the available memory later.
I think creating the table is still a good idea.
Idea update:
someone says the number is bigger than 170,000,000,
so tableSpace will be over 256 MB.
We cannot afford such a big table.
It's a good idea, but now we must forget it.
Someone pointed out to me: we forgot to make sure there is at least one non-empty line in a and b.
I think he is right.
Hello,
This is a general solution for manipulating text easily.
Perhaps it could help.
https://codes-sources.commentcamarche.net/source/100782-manipuler-du-texte-avec-des-numeros-de-lignes
TouEnMasm,
thanks.
If a check fails, we just exit.
Idea update:
do the non-empty line check before the non-empty line length check.
If a non-empty line exists, continue; otherwise exit.
I hope nothing has been forgotten.
q: Is your next program called "finding in line" or "FindString", "FindHex"?
a: Haha, no! No next program until I have finished FindLine.
Make sure there are no duplicated lines in a.txt.
For now, the line comparison only works for a lineLength of one byte (cmp al,x),
but we can make it work for two (cmp ax,x), three (cmpsb), four (cmp eax,x) and more than four bytes (cmpsb).
ideas still welcome, this is just the beginning.
thanks everyone!
q:where is somebody's idea
a:
xor eax,eax
jmp skipNoneUsedIdeaCode
;none used idea code
mov eax,0
skipNoneUsedIdeaCode:
too slow.. already found a bug...
Quote from: learn64bit on December 29, 2022, 01:08:32 AM
too slow..
Not surprising as you are opening two files @ 1 Gigabyte or more and working on them.
Get the code to work first, then you can work on optimizing it to make it run faster once it works properly. :icon_idea:
Congratulations for your persistence. :thumbsup:
I usually put code on the back burner if I haven't gotten it up and running after a couple of days.
Quote from: learn64bit on December 29, 2022, 01:08:32 AM
already found a bug...
Sorry to hear that. Hope you can find it; you seem to be making some progress with your project.
very slow
still very slow, but we must move on
Quote from: learn64bit on December 30, 2022, 01:34:55 AM
still very slow, but we must move on
:biggrin: We will be awaiting the final version. :thumbsup:
This is a dead project, because I am not interested anymore.
(I hope somebody will finish it someday.)
(It's a .rar file, not a .zip file.)
In order to create the line address and length table, I now need to figure out the maximumLineNumber of b.txt.
(When I have done this, I will upload the source code again.)
Oooops!
(https://i.postimg.cc/xqfQVWVG/1.jpg) (https://postimg.cc/xqfQVWVG)
fixed, another step done
(https://i.postimg.cc/V5ds8MXL/1.jpg) (https://postimg.cc/V5ds8MXL)
now I can create a lineTable for b.txt
(btw: if anybody want the updated source code, just let me know, I will upload it)
(I make too many changes in one day, no way to upload every version of it)
do I need to get the modulo 4096?
Quote from: learn64bit on April 12, 2024, 07:13:15 PM
do I need to get the modulo 4096?
Some modulo operations can be replaced with the faster boolean opcode AND, e.g. AND edx,4095 for modulo 4096 (unsigned values only).
Modulo 2,4,8,16,32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536... can be replaced with AND 1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535...
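A small C check of that rule (the names are hypothetical): for unsigned values and a power-of-two divisor m, x % m equals x & (m - 1), so modulo 4096 becomes an AND with 4095. For negative signed values the two results differ, which is why the sketch uses `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* m is a power of two when exactly one bit is set. */
static int is_power_of_two(uint32_t m)
{
    return m != 0 && (m & (m - 1)) == 0;
}

/* x % m for a power-of-two m, computed like "and edx,4095" does for m = 4096. */
uint32_t mod_pow2(uint32_t x, uint32_t m)
{
    assert(is_power_of_two(m));
    return x & (m - 1);
}
```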
daydreamer, welcome to my post.
Oooops, now creating the table memory failed.
Search for "int 3" to find the problem point.
File a.txt does not exist.
a.txt and b.txt are in the old zip.
My test b.txt is almost 400 MB.
You can make your own a.txt or b.txt test files with findFile, findFileSha128 and findFileSha256.
How do I make my a.txt and b.txt?
C:\Users\ydc-24h2>cd \a
C:\a>vs_Community.exe --layout c:\b
Visual Studio 2022 - 17.9.34728.123 - v17.9.06 - Evergreen
findFileSha256 -> C:\b
Result.txt -> a.txt
17.9.34728.124 -> b.txt
This App's goal is to make 3 files:
c.txt will have all the old files in both .123 and .124.
d.txt will have all the new added files only in .124.
e.txt will have all the deleted files only in .123
There are many zips in this 8-page thread. Give us a URL.
here, #103
a.txt is one byte file
00h
b.txt is one byte file
01h
findFile (with source code; it crashes after making Result.txt on my 12 TB F:\, no fix for this bug yet), findFileSha128 (with source code, no bug found)
and findFileSha256 are in my RadASM for Chinese post.
You have a buffer overflow at " mov [ebx+4],edx ;fake length"
the line just after your inserted 'int 3'
yes, no fix for that yet.
it will come
Just curious: does anyone here know what this person is trying to do with a.txt and b.txt? Scratching my head here ...
to make c.txt, d.txt and e.txt
Okay I located your files A and B in a previous posting, no buffer overflow - but it doesn't seem to do anything. Previously I just used two random files, renamed to a.txt and b.txt.
The a.txt and b.txt you supplied are very short. Do you have a larger example of those?
I have 1gb a.txt and 1gb b.txt for test.
You can make your own
1 byte to 1 GB is the working range.
(Of course, I have c.txt, d.txt and e.txt to check whether the App's outputs are wrong or correct.)
Quote from: learn64bit on April 13, 2024, 07:12:27 AM
I have 1gb a.txt and 1gb b.txt for test.
You can make your own
Well, actually I can't. I have no idea how the files are structured or formatted. Couldn't you trim down those two huge files (just give a small sampling, but greater than only a couple bytes each) and zip them? (.zip file need to be less than 500kb)
test1
a.txt(1byte)
00h
b.txt(1byte)
01h
test2
a.txt(4bytes)
00h 0Dh 0Ah 01h
b.txt(1byte)
00h
....
test9999999
...
red: line (1 byte to 4 MB)
blue: line separator (2 bytes)
you can make it
Quote from: sudoku on April 13, 2024, 07:20:28 AM
Quote from: learn64bit on April 13, 2024, 07:12:27 AM
I have 1gb a.txt and 1gb b.txt for test.
You can make your own
Well, actually I can't. I have no idea how the files are structured or formatted. Couldn't you trim down those two huge files (just give a small sampling, but greater than only a couple bytes each) and zip them? (.zip file need to be less than 500kb)
I fully agree :thumbsup:
test12
d.zip
a.txt and b.txt
e, d and e.zip
c.txt, d.txt and e.txt
I forgot to delete the empty line in a.txt, b.txt and c.txt
Okay....these are actually really text files. That original files a.txt and b.txt had zero bytes in them.
Anyway, the program runs, does nothing noticeable, then exits.
test12 is.
but test1 have 2 files, they are not text file
it's unfinished yet
Quote from: learn64bit on April 13, 2024, 07:46:17 AM
test1 have 2 files, they are not text file
They most certainly are ASCII text files. Only normal alphabetic characters, punctuation and carriage return/linefeed pairs. Unless I am missing something.
I did notice that it did return the number of lines in b.txt in the console before exiting. Not sure what c.txt, d.txt and e.txt are supposed to contain after running the program; it appears the program itself doesn't use them.
Quote from: sudoku on April 13, 2024, 07:54:27 AM
Quote from: learn64bit on April 13, 2024, 07:46:17 AM
test1 have 2 files, they are not text file
They most certainly are ASCII text files. Only normal alphabetic characters, punctuation and carriage return/linefeed pairs. Unless I am missing something.
I did notice that it did return the number of lines in b.txt in the console before exiting. Not sure what c.txt, d.txt and e.txt are supposed to contain after running the program; it appears the program itself doesn't use them.
The app is not finished yet.
This "e, d and e.zip" file has c.txt, d.txt and e.txt.
test1 is in #103.
I'll come and look at this again, at another time. :smiley:
test1, all files: a.txt, b.txt, c.txt, d.txt and e.txt.
Quote from: learn64bit on April 13, 2024, 11:04:00 AM
a.txt, b.txt, c.txt, d.txt and e.txt.
Four one-byte files, one zero-byte file. It seems we have a cultural barrier here...
Quote from: sudoku on April 13, 2024, 07:20:28 AM
Well, actually I can't. I have no idea how the files are structured or formatted. Couldn't you trim down those two huge files (just give a small sampling, but greater than only a couple bytes each)
Quote from: jj2007 on April 13, 2024, 11:13:15 AM
Quote from: learn64bit on April 13, 2024, 11:04:00 AM
a.txt, b.txt, c.txt, d.txt and e.txt.
Four one-byte files, one zero-byte file. It seems we have a cultural barrier here...
Quote from: sudoku on April 13, 2024, 07:20:28 AM
Well, actually I can't. I have no idea how the files are structured or formatted. Couldn't you trim down those two huge files (just give a small sampling, but greater than only a couple bytes each)
test12 is bigger than test1.
test12 is in #122.
(Both have all 5 files for testing.)
The final version of findLine will pass test1 through test9999999...
Fixed the table memory creation: a line table for a b.txt with 2616406 lines.
(https://i.postimg.cc/p9zH6VtR/1.jpg) (https://postimg.cc/p9zH6VtR)
Buffer overflow bug fixed.
(https://i.postimg.cc/SYfs4YNk/1.jpg) (https://postimg.cc/SYfs4YNk)
After playing with this a little more, here are the results from my latest test.
This is the output:
File a.txt exist.
File b.txt exist.
File a.txt is in the range of 1 byte to 1 giga bytes.
File b.txt is in the range of 1 byte to 1 giga bytes.
PhysicalPageSize: 4096 bytes.
None empty line exist in a.txt.
None empty line exist in b.txt.
None empty line is in the range of 1 byte to 4 mega bytes in a.txt.
None empty line is in the range of 1 byte to 4 mega bytes in b.txt.
25 <------------------------------------------------ line count in 'b.txt'...
Now we can do something.
reserved 8192 bytes <---------------------------------------- buffer reserved for future use??
30000 <---------------------------------------- what is this number for??
Press any key to continue...
It shows 25 as the line count of my b.txt.
reserved 8192 bytes (presumably for future use.)
I am not sure what the '30000' is, as it doesn't have any description or comment.
8192d <- buffer size in bytes.
30000h <- buffer base address
Oops! I forgot to turn the "duplicated line check" on.
(It should not bypass dupLineCheck.)
Your b.txt only has 25 lines; you should test 9999999999 lines.
Quote from: learn64bit on April 13, 2024, 10:04:56 PM
8192d <- buffer size in bytes.
30000h <- buffer base address
Ah, okay. That makes sense.
What about
30000 1000
table storage test sccess, now we can do something!
Press any key to exit...
Is that supposed to be 30000h and 1000h?
30000h <- buffer base address
1000h <- current buffer size in bytes
Allocation on demand.
I don't put a limit on the buffer size.
Quote from: learn64bit on April 13, 2024, 10:15:10 PM
allocation on demand
I don't make limit on buffer size
Allocation on demand. Does that mean the program calculates how much is needed? Or does the program need input from the user?
Yes, it depends on how many lines are in b.txt.
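For illustration only, a minimal C sketch of "allocation on demand" for a line table. It assumes a hypothetical LineTable/table_push pair (not the findLine code, which uses its own Windows memory calls): the table starts at one 4096-byte page, like the 1000h buffer above, and doubles whenever b.txt has more lines than fit.

```c
#include <stddef.h>
#include <stdlib.h>

/* One entry per line: offset into the file buffer and length in bytes.
   Hypothetical layout, not the table format used in findLine. */
typedef struct {
    size_t off;
    size_t len;
} LineEntry;

typedef struct {
    LineEntry *items;
    size_t count;   /* entries in use */
    size_t cap;     /* entries allocated */
} LineTable;

/* Append one entry. The first allocation is one 4096-byte page; after
   that the capacity doubles each time the table is full.
   Returns 0 on success, -1 if realloc fails (table left unchanged). */
int table_push(LineTable *t, size_t off, size_t len)
{
    if (t->count == t->cap) {
        size_t new_cap = t->cap ? t->cap * 2 : 4096 / sizeof(LineEntry);
        LineEntry *p = realloc(t->items, new_cap * sizeof(LineEntry));
        if (p == NULL)
            return -1;
        t->items = p;
        t->cap = new_cap;
    }
    t->items[t->count].off = off;
    t->items[t->count].len = len;
    t->count++;
    return 0;
}
```

Doubling keeps the number of reallocations logarithmic in the line count, so there is no need to know the number of lines in advance.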
Quote from: learn64bit on April 13, 2024, 10:24:04 PM
it depends on how many lines in b.txt
Ah, okay.
I noticed that the program is supposed to look for duplicate lines also. Is this correct?
Yes. That function is too slow on my 400 MB b.txt, so I just bypass it while testing new functions.
Quote from: learn64bit on April 13, 2024, 10:29:36 PM
yes, because that function is too slow on my 400mb b.txt, so I just bypass it for test new functions
What exactly is too slow? Comparing the lines of text to another line of text?
Or will the lines contain characters other than ascii text also, as in your very first posts in this thread.
If we can ascertain exactly where you are having trouble with that, we might be able to help.
b.txt:
00h 0Dh 0Ah 00h
Two lines have the same content (00h); that's not allowed for this app. There should be no duplicate lines in a.txt or b.txt.
Quote from: learn64bit on April 13, 2024, 10:39:20 PM
b.txt
00h 0D0Ah 00h
two lines have same 00h(line content), that's not for this App, no dupLine in a.txt and b.txt
Okay, so the finished program needs to handle non-ASCII bytes. That will require custom functions, because in most of the string compare functions I have seen, a 00 byte is considered to be the end of the string.
I will reread everything in this thread to try to define, in a clear manner, what I think you are trying to do. I will come back to this later today, I have some other work to do right now, so please be patient. :smiley:
Is your 400mb file Unicode, UTF8, wide character, or something else? Seems it might be a mix of one of those and ascii text.
Do you have any way to upload that 400 megabyte file online somewhere, so you can post a link to it?
Seeing the actual file that you want to work on would help to determine the structure of what you refer to as 'lines'.
Before I go, can you post the code (or attach it) of the function that you were originally doing the comparisons with? The one that you said was 'too slow'. That might help some...
Oh, it's already in the source code; just comment out the "jmp GotoC".
Quote from: learn64bit on April 13, 2024, 11:03:23 PM
yes, I will, but now I have already added some new code, so it could not be assembled and linked with ml & link. I will upload it later
okay.
Quote from: learn64bit on April 13, 2024, 11:03:23 PM
oh, it's already in source code, just comment out the "jmp GotoC"
lol, you changed your post. I just noticed that...
I'll look at this.
Upon a cursory look at your code, I think that you are making it more complicated than it needs to be.
TokenPrivilege stuff? For what? Pages stuff?
I would open and read both files into their own buffers. Close the handles.
Get line count for each file.
Using the line count for each, I would allocate two buffers that will hold pointers to each line for both files. Process that bit, into an array of pointers.
Using those arrays, I would then call the compare function in a loop.
I notice that all your sub functions are all nested in the main procedure, I would split off the compare function, so as not to have duplicated code.
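The plan above (read the whole file into one buffer, count the lines, allocate one array, fill it with pointers) might look like this in C. This is only a sketch, assuming CR LF acts as a separator between lines, as in learn64bit's files; Line and split_lines are illustrative names, not code from this thread.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const unsigned char *ptr;  /* first byte of the line */
    size_t len;                /* length without the CR LF */
} Line;

/* Split buf[0..size) on CR LF pairs, treating CR LF as a separator
   (a final line without CR LF is still a line). Two passes: count the
   lines, allocate the array once, then fill it.
   Returns the array (caller frees) and stores the count in *out_n,
   or returns NULL if malloc fails. */
Line *split_lines(const unsigned char *buf, size_t size, size_t *out_n)
{
    size_t n = 1;
    for (size_t i = 0; i + 1 < size; i++)
        if (buf[i] == 0x0D && buf[i + 1] == 0x0A) { n++; i++; }

    Line *lines = malloc(n * sizeof *lines);
    if (lines == NULL)
        return NULL;

    size_t k = 0, start = 0;
    for (size_t i = 0; i + 1 < size; i++) {
        if (buf[i] == 0x0D && buf[i + 1] == 0x0A) {
            lines[k].ptr = buf + start;
            lines[k].len = i - start;
            k++;
            i++;               /* skip the 0Ah */
            start = i + 1;     /* next line starts after the pair */
        }
    }
    lines[k].ptr = buf + start;   /* last line (may be empty) */
    lines[k].len = size - start;
    *out_n = n;
    return lines;
}
```

Because each entry carries an explicit length, 00h bytes inside a line need no special handling.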
Now, I have one question. Are you wanting to check if lines from a.txt have duplicate lines in b.txt (or vice versa) or are you only looking for duplicate lines in each file by itself?
For a 400 MB file, it would need a very fast compare function. I would look into MMX for that, rather than cmpsb, cmpsw or cmpsd.
Oh, another question... do the lines have different lengths, or are they all the same length?
Only looking for duplicate lines in each file by itself.
Line length: 1 byte to 4 MB.
File length: 1 byte to 1 GB.
Quote from: learn64bit on April 14, 2024, 12:33:04 AM
only looking for duplicate lines in each file by itself
Okay, that simplifies it somewhat.
I have an idea. Make two files with maybe 10 lines each from your 400mb file, and attach it here. Later today if I have time, I will make a small program that I think should meet your needs.
There are only line-length and file-length limits; no limit on the number of lines.
It must handle 4 MB lines, 1 GB files and any number of lines.
Quote from: learn64bit on April 14, 2024, 12:38:19 AM
only have line length and file length limit, no lines limit
That's fine, I just need a small sample of the file you want to process...
You can try test1 and test12; they have all the test files.
Quote from: learn64bit on April 14, 2024, 12:42:13 AM
you can try test1 and test12, they have all test files.
Okay. I will work on this, but have some other stuff that I have to do first.
I'll post it later today.
If your app passes test1 and test12, I will upload the test2 and test26100 test files.
Ok
Do you want to remove the duplicate line from whichever file it is in, Or do you simply want to know that a duplicate line is there?
Do you want to put the duplicate line in a separate file? If so, do you want to list all the duplicate lines found in the file, in the new file?
If a duplicate line is found, just quit.
If an empty line is found (two 0D0Ah in a row), quit.
Either one means the file is corrupted.
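A small C sketch of the empty-line rule, assuming the definition given at the start of the thread: a CR LF at the very start of the file, a CR LF at the very end, or two CR LF pairs in a row each count as an empty line. has_empty_line is a hypothetical name; treating a zero-byte file as invalid is also an assumption, based on the stated 1-byte minimum.

```c
#include <stddef.h>

/* Returns 1 if buf[0..size) contains an empty line under the rules
   above (CR LF is a separator, never a terminator), 0 otherwise.
   CR LF pairs cannot overlap, so a plain left-to-right scan is enough. */
int has_empty_line(const unsigned char *buf, size_t size)
{
    if (size == 0)
        return 1;                                   /* nothing at all */
    if (size >= 2 && buf[0] == 0x0D && buf[1] == 0x0A)
        return 1;                                   /* empty first line */
    if (size >= 2 && buf[size - 2] == 0x0D && buf[size - 1] == 0x0A)
        return 1;                                   /* empty last line */
    for (size_t i = 0; i + 3 < size; i++)
        if (buf[i] == 0x0D && buf[i + 1] == 0x0A &&
            buf[i + 2] == 0x0D && buf[i + 3] == 0x0A)
            return 1;                               /* CR LF CR LF */
    return 0;
}
```

Under these rules, the 8-byte b.txt from the first post (0Dh 0Ah 0Dh 0Ah 00h 0Dh 0Ah 0Ah) is rejected immediately, because it starts with CR LF.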
Quote from: learn64bit on April 14, 2024, 02:14:42 AM
if found dupLine, just quit.
if found empty line(2 0D0Ah), quit
Okay, I have a couple ideas. It will take a little time though.
Okay I had a little time to start on this:
include \masm32\include\masm32rt.inc
load_file proto :dword ;; memory pointer in eax, size of file in ecx
countlines proto :dword, :dword ;; line count in eax
.data
infile_a db "a.txt", 0
infile_b db "b.txt", 0
mem_a dd 0
mem_b dd 0
size_a dd 0
size_b dd 0
linecount_a dd 0
linecount_b dd 0
.code
countlines proc src:dword, slen:dword
push edx
push ecx
mov ecx, src
xor eax, eax
xor edx, edx
dec edx
dec eax
countline:
inc eax
@@:
inc edx
cmp edx, slen
jz @f
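cmp word ptr [ecx+edx], 0A0Dh ; CR LF pair? (at the last byte this reads 1 byte past the data, which is safe because load_file allocates 4 extra zeroed bytes)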
jz countline
jmp @b
@@:
pop ecx
pop edx
ret
countlines endp
start:
invoke load_file, addr infile_a
cmp eax, -1
jz failed
mov mem_a, eax
mov size_a, ecx
invoke load_file, addr infile_b
cmp eax, -1
jz failed
mov mem_b, eax
mov size_b, ecx
invoke countlines, mem_a, size_a
mov linecount_a, eax
invoke countlines, mem_b, size_b
mov linecount_b, eax
fn MessageBox, 0, str$(linecount_a), "File a.txt", 0
fn MessageBox, 0, str$(linecount_b), "File b.txt", 0
failed:
invoke GlobalFree, mem_a
invoke GlobalFree, mem_b
invoke ExitProcess, 0
load_file proc lpName:dword
local hFile:dword, fl:dword, bRead:dword, hMem$:dword
push esi
push edi
push ebx
push edx
invoke CreateFile, lpName, GENERIC_READ, FILE_SHARE_READ or FILE_SHARE_WRITE, 0, OPEN_EXISTING, 0, 0
cmp eax, -1
jne @F
fn MessageBox, 0, lpName, "couldn't read file!", MB_OK
mov eax, -1
jmp outta
@@:
mov hFile, eax
invoke GetFileSize, hFile, 0
mov fl, eax
add eax, 4 ; 4 spare bytes (zeroed by GPTR) past the end of the file data
invoke GlobalAlloc, GPTR, eax
cmp eax, 0
jnz @f
invoke CloseHandle, hFile ; don't leak the file handle on failure
fn MessageBox, 0, lpName, "couldn't allocate memory!", MB_OK
mov eax, -1
jmp outta
@@:
mov hMem$, eax ; source file memory
invoke ReadFile, hFile, hMem$, fl, addr bRead, 0
invoke CloseHandle, hFile
mov eax, hMem$
mov ecx, fl
outta:
pop edx
pop ebx
pop edi
pop esi
ret
load_file endp
end start
So far it only loads a.txt and b.txt into memory and counts the lines in each file.
It does not depend on a terminating zero for the end of file, but rather the file size.
Two message boxes will show the line count for each file. :smiley:
* Notice how nicely formatted the code is, and how much easier it is to read.
I will work on the line compare code later on...
:thumbsup:
If it works, it can be understood by a human.
I love "not human-readable" code; usually it's very good.
I even accept AI-written code.
Quote from: learn64bit on April 14, 2024, 03:06:28 AM
:thumbsup:
if it works, it can be understand by human.
Non-humans may even be able to understand it. :rofl:
testproject.zip\a.txt and b.txt both have one empty line at the end (the last line) that I forgot to delete.
Quote from: learn64bit on April 14, 2024, 03:16:50 AM
testproject.zip\a.txt and b.txt both have one empty line at the end(the last line), that I forgot to delete
Not really. There is no 0D0A0D0Ah at the end of the file. The single 0D0Ah at the end is the line terminator for the end of the last line. :smiley:
The line counting procedure depends on every line having 0D0Ah at the end, as the line terminator.
I found a bug
(https://i.postimg.cc/Y4QDyLFR/1.jpg) (https://postimg.cc/Y4QDyLFR)
It will fail the emptyLine check.
Quote from: learn64bit on April 14, 2024, 03:23:41 AM
I found a bug
Did you remove the 0D0Ah at the end of that line???? That is the line terminator! :icon_idea:
Quote from: learn64bit on April 14, 2024, 03:24:33 AM
it will failed at emptyLine check
I did not write any code for that yet...
my app works, haha
(https://i.postimg.cc/hJ4QNHzY/1.jpg) (https://postimg.cc/hJ4QNHzY)
Not finished; the final goal is to create c.txt, d.txt and e.txt...
A long way to go.
Quote from: learn64bit on April 14, 2024, 03:31:10 AM
not finished, final goal is to create c.txt, d.txt and e.txt...
a long way to go
My test project also is not yet finished. So be patient, I will try to have it do everything you need. So yes, in its current form, it does not yet detect empty lines.
Quote from: sudoku on April 14, 2024, 03:26:06 AM
Quote from: learn64bit on April 14, 2024, 03:23:41 AM
I found a bug
Did you remove the 0D0Ah at the end of that line???? That is the line terminator! :icon_idea:
Yes, you should remove the 0D0Ah at the end of that line, otherwise the file is corrupted.
Quote from: learn64bit on April 14, 2024, 03:34:25 AM
yes, you should remove the 0D0Ah at the end of that line, otherwise file is corrupted
How do you figure that? That is the line terminator for the last line.
How would you properly count all lines without it?
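The disagreement here is about convention, not about counting. A C sketch (hypothetical function names) of how the same bytes give different line counts: if CR LF terminates every line, the line count equals the number of CR LF pairs; if CR LF only separates lines, a non-empty file has one more line than it has pairs, so a trailing CR LF produces an empty last line.

```c
#include <stddef.h>

/* CR LF as terminator: assumes every line, including the last one,
   ends with CR LF, so each pair closes exactly one line. */
size_t lines_as_terminated(const unsigned char *buf, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < size; i++)
        if (buf[i] == 0x0D && buf[i + 1] == 0x0A)
            n++;
    return n;
}

/* CR LF as separator: assumes a non-empty buffer; N pairs separate
   N + 1 lines, and the last line has no CR LF after it. */
size_t lines_as_separated(const unsigned char *buf, size_t size)
{
    return lines_as_terminated(buf, size) + 1;
}
```

With "a CR LF b CR LF", the terminator reading gives 2 lines, while the separator reading gives 3, the third being empty, which is exactly why the trailing CR LF trips the emptyLine check.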
you can do it
Quote from: learn64bit on April 14, 2024, 03:45:15 AM
you can do it
We will see. I will not post anything further until I think this little project is finished. :smiley:
Unfinished source code. Warning: don't run the exe!
(It will assemble and link with ml & link, but don't run the exe!)
Quote from: sudoku on April 14, 2024, 12:22:15 AM
I would open and read both files into their own buffers. Close the handles.
Get line count for each file.
Using the line count for each, I would allocate two buffers that will hold pointers to each line for both files. Process that bit, into an array of pointers.
Using those arrays, I would then call the compare function in a loop.
I notice that all your sub functions are all nested in the main procedure, I would split off the compare function, so as not to have duplicated code.
Now, I have one question. Are you wanting to check if lines from a.txt have duplicate lines in b.txt (or vice versa) or are you only looking for duplicate lines in each file by itself?
For a 400mb file, it would need a very fast compare function. I would look into mmx for that, rather than cmpsb cmpsd or cmpsw.
You should read each file into an allocated array with one block read, instead of reading each line, for faster speed.
A randomly generated 400 MB file is an alternative to downloading the 400 MB file; I have a very fast SSE2 128-bit random number generator.
Processing two buffers SIMT alternative to process them faster?
I just noticed my emptyLineCheck and dupLineCheck don't work right...
(Maybe there's a bug or I bypassed them... anyway, I will go forward, not go back to fix bugs.)
Y'know, I really have to laugh. I'm surprised that y'all are still engaging with this clown here. Let me ask you a few questions:
- Q: Do any of you understand what any of those files are for?
- A: No.
- Q: Do any of you understand what the format of any of those files is?
- A: No.
- Q: Do any of you understand what the purpose of this program even is?
- A: No.
Now none of this is a reflection on you who have been following this clown here; it's a reflection on them, since they've revealed pretty much nothing that would shed any light on any of these questions, despite repeated requests to do so.
Me, I really don't care: the instrument has yet to be invented that could measure my indifference to what this clown is doing.
I'm just sittin' here eating my popcorn ...
Test file b.txt (1 line, 4 MB).
There are lots of ways to make your own test file.
(If anyone needs the 1 GB b.txt, just let me know.)
I actually downloaded the damn thing.
It's 4 MB of ... ZEROES. (Note to self: 4096 KB ≠ 4,096 bytes.)
Care to show us your advanced programming technique for producing such a file? Gosh, I wonder how someone would do such a thing ...
I tell you, you're all being played! You've been pwned! All your base are belong to us!
More popcorn!
Quote from: NoCforMe on April 14, 2024, 05:09:24 AM
I actually downloaded the damn thing.
Wot?
You?
Quote from: NoCforMe on April 14, 2024, 05:09:24 AM
4 MB? Nope, it's 4 KB. 4,096 bytes of ... ZEROES.
You are partly wrong: the zip archive is 4 kB, but the extracted file does indeed contain 4 MB of zeroes. You should acknowledge, at least, that the compression rate of that archive is remarkably good :thumbsup:
I fully agree re popcorn :cool:
NoCforMe, jj2007:
1. learn64bit has posted his source code.
2. Explained the contents of the file he wants to process
3. Explained what he wants his program to do.
4. Answered each and every question I have asked of him, regarding what he is trying to achieve in his code.
Yes he does seem a bit confused about what constitutes an ascii text file, but I see no reason to not help him. So, what is the issue here?
This is the Orphanage after all, not one of the more mainstream boards.
Oh, kids, no fighting; we have better things to enjoy our time with.
Anyway, it's your time; you can do whatever you want.
Table creation is already finished.
Now I am at comparing the length of a.txt's first line with b.txt's first line, then the bytes.
(https://i.postimg.cc/8JLMLRh1/1.jpg) (https://postimg.cc/8JLMLRh1)
Quote from: learn64bit on April 14, 2024, 04:04:10 AM
unfinished source code, Warning: don't run the exe!
(it will ml&link, but don't run the exe!)
Hey you two, download the right one and you will see code :biggrin:
Quote from: sudoku on April 14, 2024, 06:41:26 AM
NoCforMe, jj2007:
1. learn64bit has posted his source code.
2. Explained the contents of the file he wants to process
3. Explained what he wants his program to do.
4. Answered each and every question I have asked of him, regarding what he is trying to achieve in his code.
OK, if you say so.
I don't see it.
And "text files" that contain zero bytes?
There's a Yiddish word for that:
meshugah.
For what looks like a simple line comparison, that code is like a HLL disassembly. Half of the function calls to Windows I had to look up :biggrin:
No limit: if your code works, it can be understood. Please share your code with us.
(If you use a coding AI, I'm OK with that.)
Quote from: sinsi on April 14, 2024, 02:39:38 PM
Half of the function calls to Windows I had to look up
You mean stuff like AdjustTokenPrivileges, AllocateUserPhysicalPages, LookupPrivilegeValue, MapUserPhysicalPages, SetUnhandledExceptionFilter? I always use them to compare two lines of text, how do you do that?
Quote from: sudoku on April 13, 2024, 10:12:10 PM
Quote from: learn64bit on April 13, 2024, 10:04:56 PM
8192d <- buffer size in bytes.
30000h <- buffer base address
Ah, okay. that makes sense.
What about
30000 1000
table storage test sccess, now we can do something!
Press any key to exit...
Is that supposed to be 30000h and 1000h?
I just uploaded the new source code; you can understand it from the code.
I added text to these numbers so you guys can understand them easily.
(https://i.postimg.cc/qtwktjK3/1.jpg) (https://postimg.cc/qtwktjK3)
What I'm thinking:
If aLine is found in b.txt, write it to c.txt.
If aLine is not found in b.txt, write it to d.txt.
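The c.txt/d.txt split described above can be sketched in C as follows. This is a naive illustration under stated assumptions, not findLine's code: Line and partition_lines are hypothetical names, lines are compared by length first and then with memcmp (so 00h bytes are fine), and the result is a list of a's line indices for each output file.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const unsigned char *ptr;
    size_t len;
} Line;

/* For each line of a: if an identical line exists in b, store its index
   in common[] (the c.txt lines), otherwise in missing[] (the d.txt
   lines). Naive O(na * nb) scan; both output arrays must hold na
   entries. Returns the count in common[]; *n_missing gets the rest. */
size_t partition_lines(const Line *a, size_t na, const Line *b, size_t nb,
                       size_t *common, size_t *missing, size_t *n_missing)
{
    size_t nc = 0, nm = 0;
    for (size_t i = 0; i < na; i++) {
        int found = 0;
        for (size_t j = 0; j < nb && !found; j++)
            found = a[i].len == b[j].len &&
                    memcmp(a[i].ptr, b[j].ptr, a[i].len) == 0;
        if (found)
            common[nc++] = i;
        else
            missing[nm++] = i;
    }
    *n_missing = nm;
    return nc;
}
```

Every line of a lands in exactly one of the two lists, so the two counts always add up to the number of lines in a.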
c compiler's asm code is something
(https://i.postimg.cc/xc1xMpmD/1.jpg) (https://postimg.cc/xc1xMpmD)
Now the question is how to tell whether the two lines are the same or not.
A line is 1 byte to 4 MB, so it must be something special;
otherwise it's going to be as slow as dupliLineCheck.
Quote from: learn64bit on April 14, 2024, 08:01:04 PM
c compiler's asm code is something
Very interesting :thumbsup:
Can you explain in a few words what it means? It's your code, right?
About a, b, c, d and e:
I read a file, so that file is a.txt.
I read another file, so that file is b.txt.
I write a file, so that file is c.txt.
I write another file, so that file is d.txt.
...
(You can choose whatever names you want.)
(https://i.postimg.cc/crFZZjD4/1.jpg) (https://postimg.cc/crFZZjD4)
Quote from: jj2007 on April 14, 2024, 05:12:16 PM
Quote from: sinsi on April 14, 2024, 02:39:38 PM
Half of the function calls to Windows I had to look up
You mean stuff like AdjustTokenPrivileges, AllocateUserPhysicalPages, LookupPrivilegeValue, MapUserPhysicalPages, SetUnhandledExceptionFilter? I always use them to compare two lines of text, how do you do that?
Yeah, odd. I had told him already that he doesn't need all of that. He makes his project overly complicated.
Anyway, I am working on something for him. I just haven't had the time to finish it yet.
Quote from: learn64bit on April 14, 2024, 05:38:26 PM
I just upload the new source code, you can understand it with code
I'll take a look at it...
The one in post #187?
Tip:
You should make uploads of different version have a slightly different name,
For instance "findline_1, findline_2" or findline_A, findline_B, etc
Otherwise the versions could get mixed up, especially if their contents differ. The user downloading it would then have to sort that out, or might ignore the upload simply because its name doesn't make it clear that it is different.
I think #187 did a very good job. I don't want to take its job....
(Anyway, maybe the next upload will be findLine.9.zip.)
Added line data to c.txt.
To add 5 lines, I repeat "open, write data, close" 5 times. Looks like there's no better way.
(https://i.postimg.cc/gwcWxLsp/1.jpg) (https://postimg.cc/gwcWxLsp)
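There is a simpler pattern: open c.txt once, write all the lines, and close it once. A minimal C sketch using stdio, where Line and write_lines are hypothetical names (findLine itself uses the Windows file API) and the output follows the format described in this thread: CR LF between lines, none after the last one.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const unsigned char *ptr;
    size_t len;
} Line;

/* Write n lines to an already-open stream with CR LF between lines and
   no CR LF after the last one. One open and one close for the whole
   file; stdio buffers the many small fwrite calls.
   Returns 0 on success, -1 on a write error. */
int write_lines(FILE *f, const Line *lines, size_t n)
{
    static const unsigned char crlf[2] = {0x0D, 0x0A};
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && fwrite(crlf, 1, 2, f) != 2)
            return -1;
        if (fwrite(lines[i].ptr, 1, lines[i].len, f) != lines[i].len)
            return -1;
    }
    return 0;
}
```

Typical use would be f = fopen("c.txt", "wb"), then write_lines(f, lines, n), then fclose(f). The "wb" mode matters on Windows, where text mode would translate line endings.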
If I have time later today, I'll download the version in #187 and take a look at it. :smiley:
if the content is different from each other.
Yes, I've begun deleting old idea-test code; otherwise the file will become very big.
If better idea-test code works and is faster than the current code, I will delete all the current code.
Ok. It's up to you...
Quote from: learn64bit on April 14, 2024, 10:36:26 PM
now the question is how to make sure the two lines are same or not?
it's 1 byte to 4 mega bytes, so it must be something special.
otherwise it gonna be slow as the dupliLineCheck
Hint:
You need to compare dwords (at least) rather than one byte at a time. You know the length of each line, so it should not be too hard.
Personally I would try two dwords at a time in a custom compare function.
At any rate bytewise compare is the slowest way to compare two lines of text.
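Here is a C sketch of the "two dwords at a time" idea: compare 8 bytes per step, then finish the remaining 0 to 7 bytes one at a time. cmp_qwords is a hypothetical name. memcpy is used for the loads so unaligned line starts are safe, and optimizing compilers typically turn it into a single 64-bit load. The C library's own memcmp is often vectorized already, so it is worth timing both.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compare two equal-length blocks 8 bytes at a time, then byte by byte
   for the tail. Returns 1 if they match, 0 otherwise. A length of 0
   counts as a match. */
int cmp_qwords(const unsigned char *a, const unsigned char *b, size_t len)
{
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t x, y;
        memcpy(&x, a + i, 8);
        memcpy(&y, b + i, 8);
        if (x != y)
            return 0;
    }
    for (; i < len; i++)        /* remaining 0..7 bytes */
        if (a[i] != b[i])
            return 0;
    return 1;
}
```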
Another tip:
Your text file (the multiple-megabyte one) also contains binary zeros and 0FFh bytes. Normal string compare functions may not work for this (they would consider the binary zero to be the end of the string), so a custom function would be needed.
If the 2 lines don't have the same length, they are different.
Quote from: learn64bit on April 15, 2024, 01:28:55 AM
the 2 lines have same length, otherwise they are differrent
Of course. If two lines do not have the same length we skip them, else we perform the compare on them.
I only mentioned length since we need that for the compare function. Since your 'text' contains binary 00, we need the length of the line to know when the line has ended. Else a "normal" string compare will consider the binary 00 as the end of a string.
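In C terms, that rule is a length check followed by a length-bounded byte compare. A minimal sketch under that assumption (lines_equal is a hypothetical name): the explicit length, not a terminating zero, marks where each line ends, so 00h and 0FFh bytes are compared like any other byte, which a zero-terminated compare such as strcmp would not do.

```c
#include <stddef.h>
#include <string.h>

/* Two lines are the same only if their lengths match; only then are
   the bytes compared, with memcmp bounded by the known length.
   Returns 1 if equal, 0 otherwise. */
int lines_equal(const unsigned char *a, size_t alen,
                const unsigned char *b, size_t blen)
{
    if (alen != blen)
        return 0;               /* different lengths: skip the compare */
    return memcmp(a, b, alen) == 0;
}
```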
learn64bit,
You said that the last line cannot contain a carriage return/line feed (0x0D0A), is this correct?
All other lines do contain the carriage return and line feed then?
I am still gathering information for the functions I am preparing. :smiley:
Quote from: learn64bit on April 14, 2024, 10:36:26 PM
now the question is how to make sure the two lines are same or not?
it's 1 byte to 4 mega bytes, so it must be something special.
otherwise it gonna be slow as the dupliLineCheck
I just found a function 'cmpmem' in the masm32 library that should meet your needs (I knew I had seen it somewhere, and I found where). Here is the code... You don't need to copy this code; it's already in masm32.lib and will work without needing to declare a prototype separately. I am showing you this so you know what is in the function.
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 4
cmpmem proc buf1:DWORD,buf2:DWORD,bcnt:DWORD
mov ecx, [esp+4] ; buf1
mov edx, [esp+8] ; buf2
push esi
push edi
xor esi, esi
xor eax, eax
mov edi, [esp+20] ; bcnt
cmp edi, 4
jb under
shr edi, 2 ; div by 4
align 4
@@:
mov eax, [ecx+esi] ; DWORD compare main file
cmp eax, [edx+esi]
jne fail
add esi, 4
sub edi, 1
jnz @B
mov edi, [esp+20] ; bcnt ; calculate any remainder
and edi, 3
jz match ; exit if its zero
under:
movzx eax, BYTE PTR [ecx+esi] ; BYTE compare tail
cmp al, [edx+esi]
jne fail
add esi, 1
sub edi, 1
jnz under
jmp match
fail:
xor eax, eax ; return zero if DIFFERENT
jmp quit
match:
mov eax, 1 ; return NON zero if SAME
quit:
pop edi
pop esi
ret 12
cmpmem endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
example usage:
"invoke cmpmem, addr mem1, addr mem2, sourcelength"
Return value in eax is zero, if they do NOT match. If eax is NOT zero on return, you have a match.
Where mem1 is a pointer to the first 'string' you want to compare, mem2 is a pointer to the second 'string', and sourcelength is of course the source length. This function doesn't care if there are non-ASCII characters.
This works for ASCII strings or binary data. It compares four bytes at a time; when fewer than four bytes remain at the end, it compares the remainder byte by byte. Note that sourcelength must not be zero: with a count of 0 the byte loop's counter wraps around instead of stopping (not an issue here, since every line is at least 1 byte).
...
This should give you some ideas...
It looks like it's 6 hours faster than the stupid "repe cmpsb" on my 400 MB test file.
Quote from: learn64bit on April 15, 2024, 12:58:48 PM
it looks like 6 hours faster than the stupid "repe cmpsb" on my 400mb test file
How long did it take?
Have you tested that the results are what you expected?
In any case, it is a step in the right direction. :smiley:
Here is a modified version of 'cmpmem', called cmpmem2. It should be a bit faster than the original 'cmpmem' from the masm32 library. It compares two dwords at a time.
You need to copy this code, as it is NOT in the masm32 library. And use the prototype also.
removed by author
Intended recipient didn't want it
Testing for buffer overflow or a bad buffer:
(https://i.postimg.cc/svp6LMsg/1.jpg) (https://postimg.cc/svp6LMsg)
(https://i.postimg.cc/nXN1cZHP/1.jpg) (https://postimg.cc/nXN1cZHP)
bypass bufferCheck:
jmp endTestMemory
I think that if cmpMem says the 2 lines match, we should set that line's length in bLineTable to 0, to prevent useless repeated line-length and data compares.
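A C sketch of that idea, under the thread's own rules: lines are never empty, so a length of 0 can safely mean "already matched", and since neither file may contain duplicate lines, a retired b entry could never match again anyway. The change only skips work; it does not change which lines end up in c.txt or d.txt. Line and find_and_retire are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const unsigned char *ptr;
    size_t len;   /* 0 = already matched; real lines are never empty */
} Line;

/* Look for *line in b. On a match, set that b entry's length to 0 so
   later searches reject it after a single length compare.
   Returns the index of the match in b, or nb if there is none.
   Assumes line->len > 0, per the no-empty-lines rule. */
size_t find_and_retire(const Line *line, Line *b, size_t nb)
{
    for (size_t j = 0; j < nb; j++) {
        if (b[j].len == line->len &&
            memcmp(b[j].ptr, line->ptr, line->len) == 0) {
            b[j].len = 0;       /* retire the matched entry */
            return j;
        }
    }
    return nb;
}
```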
Quote from: learn64bit on April 15, 2024, 02:54:21 PM
I think if the 2 line's cmpMem is match, I should modify bLineTable's length to 0 to prevent useless and repeat line length and data compare
Sounds logical. Test it, to see if there is a big difference in the time it takes to compare the files. :thumbsup:
Anything that will help reduce the time it takes will be a great help. You shouldn't need to wait hours, or (days?) for results.
A bug!
It only made d.txt; it didn't make c.txt.
I must have done something wrong.
(zip file list: a.txt, b.txt, findLine .asm .cmd)
fixed
It's not just the speed problem.
It only passed 2 tests, and failed more than 9999 tests.
Make it work first (easy job), then speed (maybe too hard for me).
This one passed the sudoku\testproject.zip\a.txt and b.txt test.
(Of course, the trailing 0D0Ah must be deleted, otherwise they will not pass miscCheck; if you bypass miscCheck, this app will crash.)
Quote from: learn64bit on April 15, 2024, 10:15:08 PM
make it work first (easy job)
Definitely, it has to work before any improvements can be made. :thumbsup:
Of course, you must verify the results to make sure it really works. Not crashing does not automatically mean that the program works; the results need to be verified.
Quote
then speed (mybe too hard for me)
It shouldn't really be too hard. You just need to understand where your code is the slowest. Usually within a loop (or loops), especially when working on large data. :smiley: hint: 'repe cmpsb' is a loop, btw...
Quote from: learn64bit on April 15, 2024, 10:34:12 PM
this one passed sudoku\testproject.zip\a.txt and b.txt test
(of cause, the trailing 0D0Ah must be deleted, otherwise they will not pass miscCheck, if you bypassed miscCheck this app will crash)
The compare code that I showed you was intended for you to use in your compare routine, in place of the 'repe cmpsb' that you are now using.
Today, as yesterday, I have a lot of work to do outside, so I won't be able to look at your latest versions for several hours. When I have enough time, I will dissect your code, then remove the unneeded API calls that you have there, and possibly rewrite other portions of it. (That part might take some time, maybe a couple of days.)
Assembly code does not need to be bloated like C++ code. :wink2:
a has one extra line at the end compared to b.
dupliLineCheck on b took 87 hours.
With dupliLineCheck bypassed, making d (715 KB) took 30 minutes (d should be the same as b; c should have the one extra line).
(My guess is that it will take 50 hours to finish.)
(https://i.postimg.cc/tYC6SfmN/1.jpg) (https://postimg.cc/tYC6SfmN)
Quote from: learn64bit on April 16, 2024, 12:36:04 AM
dupliLineCheck on b took 87 hours.
:dazzled:
How many lines were processed that took THAT long?
Was that using "repe cmpsb"??
I am not young, but I wouldn't want to grow old(er) waiting for that program to finish. :rofl:
Sheesh! almost four days!! Is it really worth it..?...?
What exactly is this program for, for you to waste 3 and a half days of your life, for it to finish?
I most certainly would not wait for any program to finish for more than maybe a few minutes. Maybe up to an hour, if the program is actually doing something very useful and there were no other means or skills to do it better.
AdjustTokenPrivileges
AllocateUserPhysicalPages
FreeUserPhysicalPages
GetCurrentProcess
GetSystemInfo
LookupPrivilegeValueA
MapUserPhysicalPages
OpenProcessToken
SetUnhandledExceptionFilter
VirtualQuery
I am very curious. Do you know what these APIs do? And how do they help your program?
Or did you just copy and paste code from some other example??
I am just trying to understand why you make your code more complicated than it really needs to be, not making fun of you or anything like that. Just trying to understand...
In the disassembly, I notice also some unused gdi32 api calls... in findLineV9??
V9 has source code; maybe I forgot to delete or comment out old idea-test code.
v10 and v11 only have 2 or 3 lines of code added.
If you have other ideas, you can replace any code; it's OK.
v12
I just checked V9: no exe, only source code.
You ran ml & link, then disassembled the exe? haha.
Anyway, you can do whatever you want.
Complicated things are good for the brain.
(My modified RadASM2 source code is far more complicated than this one; without my comments, almost nobody could understand it, haha.)
You haven't answered my questions from post #225
If you can't or won't, I see no reason for me to continue here.
Did it pass test1?
If it passes test1 and test12, I will upload the 1 GB a.txt.
You can quit. You can do whatever you want.
Quote from: learn64bit on April 16, 2024, 09:05:31 AM
did it pass the test1?
pass test1 and test12. I will upload 1gb a.txt.
you can quit. you can do whatever you want
I was just giving you some simple examples so maybe you can get new ideas for your program - since it is quite obvious that you are having some difficulty with it.
I hope that everything works out for you. :thumbsup:
It looks like v12 works.
Quote from: jj2007 on April 14, 2024, 10:49:36 PM
Quote from: learn64bit on April 14, 2024, 08:01:04 PM
C compiler's asm code is something
Very interesting :thumbsup:
Can you explain in a few words what it means? It's your code, right?
Nothing special, it's just simple VC++ code compiled with listing files.
Somebody said to sort the table by length.
Maybe the next version will try that.
Quote from: learn64bit on April 16, 2024, 07:29:03 PM
Nothing special, it's just simple VC++ code compiled with listing files.
So, is that what you've been working on: VC++-generated assembly? :rolleyes:
That would explain the bloat and the unnecessary API calls.
Carry on. Good luck with your project. :smiley:
It seems that you don't really want to learn MASM32 assembly, at least that is what it looks like to me. :sad:
Somebody said about the app: "c+d = a", haha
Quote from: learn64bit on April 16, 2024, 11:37:15 PM
Somebody said "c+d == a", haha
You keep on saying "somebody say...". Are you getting advice from another forum, or from some social media site?
China has 1.4 billion people.
Quote from: sudoku on April 14, 2024, 06:41:26 AM
NoCforMe, jj2007:
1. learn64bit has posted his source code.
2. Explained the contents of the file he wants to process
3. Explained what he wants his program to do.
4. Answered each and every question I have asked of him, regarding what he is trying to achieve in his code.
Yes, he does seem a bit confused about what constitutes an ASCII text file, but I see no reason not to help him.
After working with him for a few days, giving him good, simple code examples, etc., when I finally asked again exactly what he needs this code for, I got no response.
He has not used the code I offered, and keeps telling me 'someone said do this, or someone said do that...'. Well, all I can do is let him listen to 'someone', as it seems clear that he does not really want to 'learn' MASM32 assembly coding. Ironic, given his handle. Maybe if I gave him 64-bit examples???...
I have removed my code and attachments that I prepared specifically for his needs, since he doesn't seem to be interested in it. :sad:
So, I guess both of you were correct. I was just wasting my time in this thread. And I was about to ask that this thread be moved to the Workshop since learn64bit was posting code here. Now I think it should stay here, so not to waste anyone else's time.
(So many people share code, share ideas, share their modified findLine.
Almost 8k people are interested in this simple findLine.
Coding, testing, enjoying.
I only provide the version I am interested in.
In the end, we will have 10k versions of findLine, and my version will not be the best one.)
This one bypasses all miscCheck.
Quote from: learn64bit on April 18, 2024, 06:34:59 AM
Almost 8k people are interested in this simple findLine
You made my day :thumbsup:
Quote from: sudoku on April 18, 2024, 04:26:42 AM
So, I guess both of you were correct.
So you can remove the "s" in your signature ;-)
Yes, people are important.
(Maybe this is why China is getting strong.
No law restricts the government.
No environmental protection.
No worker protection.
Products are so cheap.
Smartphones, solar panels, EVs, Li batteries...
99% dirt cheap, 1% very good.
I hope everyone gets to enjoy cheap smartphones and cheap EVs, not just Samsung and Tesla. We need more.)
Quote from: learn64bit on April 18, 2024, 07:13:16 AM
No environmental protection.
No worker protection.
Paradise :thumbsup:
Yes, 7 months to build the Tesla factory.
My guess: No. 1 speed, even if it were in the EU.
(maybe 6 months in the US)
Quote
(Maybe this is why China is getting strong.
The word "strong" isn't accurate. "Fat" is.
1) Most people live for "money"; very few people focus on science.
2) Each rich man, senior official, or university professor "consumes", on average, more than three girls.
3) People's moral standards are getting lower and lower, even lower than animals'. For example, a mother threw her child out of the window of a high-rise residence.
4) Entertaining ourselves to death is the mainstream social atmosphere.
Yes, it looks like there's no way to stop people from buying cheap products.
findLine
Now it's time to try the FileMapping idea.
(https://i.postimg.cc/fkw6CY3D/1.jpg) (https://postimg.cc/fkw6CY3D)
v248: a.txt, b.txt, findLine.asm and findLine.cmd
You know... having observed the development (if that is even a word I can use here to describe the 'growth') of this unbelievably lengthy thread.....
Is there a chance you can, IN VERY SIMPLE TERMS !!!, describe what this program will be used for, if and when it ever gets finished.... And please don't start telling me about a.txt, b.txt, etc. I just want you to describe its intended application, in 'Real Life' !!
I, like many here, have had great difficulty understanding many of your posts, and I am amazed that, 17 pages in, you still have NOT actually described its purpose, while seemingly wasting the time and effort of the few members here who really did go 'above and beyond' in trying to help you.
I am sure you could condense many of your 'one line' posts into fewer posts with more content.
Quite honestly, I simply DON'T see the point.
Sorry
Another test passed.
(A lot of people have difficulty understanding ml & link asm code or the Windows API, so I added a lot of printed text.)
v250: a.txt, b.txt, findLine.asm and findLine.cmd
I took a look at your program. It does 'something'. Congratulations.
I think that you misunderstand ASCII text.
Hex byte 00h does not equal ASCII zero "0", which is 30h. That is why you will not see it in a text editor like notepad.exe.
Also, the carriage return/line feed pair (CRLF) is the line terminator, meaning the end of a line, not the beginning of one, although another line may follow the CRLF. You call it the start of the line in your code; that is incorrect. A line starts right after a CRLF, except for the first line, which starts at the beginning of the file (if the file begins with a CRLF, that first line is empty).
If there is a hex zero byte in an ASCII string, it is treated as the string terminator, i.e. the end of the string.
You keep calling a.txt and b.txt text files. If either of them contains a hex zero byte 00h, then it is neither ASCII text nor a .txt file.
I did some testing with your program, using real ASCII text for a.txt and b.txt. It seems to work. When I open c.txt and d.txt in Notepad, the results are readable without a hex editor.
I think that you are confusing hex zero (00h) with ASCII "0" (30h) or the other ASCII digits; I think that is where some of your confusion stems from.
30h = ASCII "0"
31h = ASCII "1"
32h = ASCII "2"
and so on...
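To make the 00h versus 30h point concrete, here is a minimal C sketch (C rather than MASM only so it can be compiled and run anywhere; all names are hypothetical). It shows that '0' and '\0' are different bytes, that zero-terminated string routines stop at the first 00h, and a rough heuristic, assumed here for illustration, for spotting bytes that are not plain text.

```c
#include <stddef.h>
#include <string.h>

/* '0' is the printable digit zero (30h in ASCII).
   '\0' is the NUL byte (00h). They are different bytes. */
enum { ASCII_DIGIT_ZERO = '0', NUL_BYTE = '\0' };

/* strlen() counts bytes up to (not including) the first 00h byte.
   A buffer holding "AB", 00h, "CD" therefore looks 2 bytes long to any
   zero-terminated-string routine, even though it holds more data. */
size_t zstring_length(const char *buf)
{
    return strlen(buf);
}

/* Counts bytes that are neither printable ASCII (20h..7Eh) nor CR/LF/TAB.
   A nonzero result suggests binary data rather than plain text. */
size_t count_non_text_bytes(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = buf[i];
        int is_text = (b >= 0x20 && b <= 0x7E) ||
                      b == '\r' || b == '\n' || b == '\t';
        if (!is_text)
            n++;
    }
    return n;
}
```

With the b.txt bytes from the first post (0Dh 0Ah 0Dh 0Ah 00h 0Dh 0Ah 0Ah), count_non_text_bytes reports exactly one non-text byte: the 00h.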
I think that you misunderstand ASCII text.
This app does not process only ASCII text.
"You call it the start of the line in your code"
No, I call it the line separator. You can change 0Dh,0Ah (2 bytes) to 00h,00h,00h (3 bytes) or anything you want.
The start of the line is after CRLF except for the first line.
In this app, a line is just a line: no start marker, no end marker, no terminator.
it is not ASCII text, nor a .txt file.
Right, you can rename .txt to .bin, .dat, .dump, .database, anything you want.
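learn64bit's definition of a line (any bytes between separators, an arbitrary separator, empty lines ignored) can be sketched in C like this. It is an illustration under those stated assumptions, not the code findLine actually uses; for_each_line and line_fn are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Callback invoked once per non-empty line (hypothetical signature). */
typedef void (*line_fn)(const unsigned char *line, size_t len, void *ctx);

/* Splits buf into lines separated by the byte sequence sep (for example
   0Dh,0Ah or 00h,00h,00h). Lines may contain any bytes, including 00h.
   Empty lines are skipped. Returns the number of non-empty lines. */
size_t for_each_line(const unsigned char *buf, size_t len,
                     const unsigned char *sep, size_t seplen,
                     line_fn fn, void *ctx)
{
    size_t count = 0, start = 0, i = 0;
    while (i <= len) {
        int at_sep = seplen > 0 && i + seplen <= len &&
                     memcmp(buf + i, sep, seplen) == 0;
        if (at_sep || i == len) {
            if (i > start) {                 /* skip empty lines */
                if (fn)
                    fn(buf + start, i - start, ctx);
                count++;
            }
            if (i == len)
                break;
            i += seplen;                     /* jump over the separator */
            start = i;
        } else {
            i++;
        }
    }
    return count;
}
```

Applied to the 8-byte b.txt from the first post with the 0Dh,0Ah separator, it finds two non-empty lines, 00h and 0Ah, matching the description there; a.txt (the single byte 00h) yields one line.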
Quote from: learn64bit on April 19, 2024, 02:28:44 PM
I think that you misunderstand ASCII text.
This app does not process only ASCII text.
Then it's not a valid .txt file.
Quote
"You call it the start of the line in your code"
No, I call it the line separator.
I didn't remember what you called it, sorry.
Quote
The start of the line is after CRLF except for the first line.
In this app, a line is just a line: no start marker, no end marker, no terminator.
Okay then, I am the one that does not understand, along with the other members who have posted here.
Now, can you explain exactly what your program is for? The administrator here, stoo23, would like to know. We all would like to know.
I write programs in assembly, not apps. :wink2:
It's for fun!
I wrote apps in any language (if I could understand it).
Quote from: learn64bit on April 19, 2024, 02:43:27 PM
I wrote apps in any language (if I could understand it).
Okay. Can you show us one working program (including source code) that you yourself wrote?
no.
(Customers would not be happy. They already forced me to take a lot of source code down. They warned me many times. haha)
Quote from: learn64bit on April 19, 2024, 03:02:58 PM
Customers would not be happy. They already forced me to take a lot of source code down. They warned me many times.
I am sorry, but I don't believe you. I have seen your code. Not what I would call production level code. I feel sorry for your 'customers'.
I have also seen your hacked and patched programs (like Notepad) that you were trying to pass off as your own. And then there are your attempts at bypassing security/licensing features of Windows 10 and 11....
Quote from: learn64bit on April 07, 2024, 08:07:21 PM
Which info do you want? You can ask about it.
(like the regedit bypass check works, disable update works, and KMS works...
Hmmm naughty, naughty, naughty... :eusa_naughty:
I am not impressed. :undecided:
In China: just make it work. No limits.
Of course, if someday someone makes something the government wants to copyright,
politicians will fight the EU or US over copyright, haha.
But not now. Right now, if you made a 3 nm IC machine, the government wouldn't care how you made it. haha
(Now in China, many people make CPUs and GPUs. They are crap! Many games render wrong. But dirt cheap. haha. BTW: nothing is wrong with my huananzhi x10x99-d16 motherboard. You can buy it.)
(From now on, the Chinese government will not allow EU or US CPUs, GPUs, or smartphones in government offices, so the government is the biggest customer for these crappy, dirt-cheap CPUs, GPUs, smartphones...)
v260: just added some lines of code. It will pass more tests.
Quote from: learn64bit on April 18, 2024, 06:34:59 AM
Almost 8k people are interested in this simple findLine.
Quote from: learn64bit on April 19, 2024, 03:02:58 PM
Customers would not be happy. They already forced me to take a lot of source code down. They warned me many times.
Quote from: learn64bit on April 19, 2024, 03:18:48 PM
Of course, if someday someone makes something the government wants to copyright,
politicians will fight the EU or US over copyright, haha
This project of yours will become a multimillion dollar business :thumbsup:
I hope you will give adequate royalties to sudoku, as he helped you a lot.
No. It's just for fun. It's on a text forum without ads.
(Coding on TikTok or YouTube is not that popular. Only 1% of people are interested in coding.)
(Customer projects have a lot of money, but they always have a lot of restrictions...)
BTW: in China, 8k is not enough. The most popular videos on TikTok or other platforms have 0.1 billion views; those are the videos customers love to put ads on.
People report that this 5 KB app uses a lot of memory.
But no crash.
(https://i.postimg.cc/8fgbdFcK/1.jpg) (https://postimg.cc/8fgbdFcK)
I am testing it now. Right now it's over 500 MB.
It's going to reach 2 GB; I don't know what will happen.
Quote from: learn64bit on April 19, 2024, 11:47:14 PM
People report that this 5 KB app uses a lot of memory.
If you are working with gigabyte-sized files, then yes. But don't you still have 96 GB of memory on your computer? Surely you can afford to work with large (multi-GB) files without fear of the program crashing for lack of memory.
No, not the 1 GB test yet.
Only 3 MB.
(https://i.postimg.cc/ppHJSnCR/1.jpg) (https://postimg.cc/ppHJSnCR)
Looks like it will go over 2 GB.
Over 1 GB, no crash.
Oooops! Crashed.
(https://i.postimg.cc/87MnqSQr/1.jpg) (https://postimg.cc/87MnqSQr)
Tried FileMapping, and it failed.
Quote from: learn64bit on April 20, 2024, 12:13:16 AM
Oooops! Crashed.
Tried FileMapping, and it failed.
What does it say was the reason for the crash, in the crash dump?
My guess: maybe it's just my mistake.
It will be fixed soon.
Hmmmm... okay.
Odd though. I have opened and worked on 2 GB files without issue. Windows 7 here. Could you run it in a debugger to see exactly where it crashed?? (Which line of code)
Was it an access violation or ????
Bug fixed.
My mistake: it was calling CreateFileMapping on b.txt repeatedly.
It should only do that once.
Now memory usage: 12,360 K
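The fix described above (map b.txt once, then reuse the view) looks roughly like this. The sketch uses POSIX open/fstat/mmap so it runs outside Windows; the Win32 counterparts are CreateFile, CreateFileMapping and MapViewOfFile, each called once per file. struct file_view and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* A read-only view of a whole file, created once and reused. */
struct file_view {
    const unsigned char *data;   /* mapped bytes, NULL for an empty file */
    size_t len;                  /* file size in bytes */
};

/* Maps the file once. Returns 0 on success, -1 on failure. */
int view_open(struct file_view *v, const char *path)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    v->data = NULL;
    v->len = 0;
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size > 0) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        v->data = p;
        v->len = (size_t)st.st_size;
    }
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return 0;
}

void view_close(struct file_view *v)
{
    if (v->data)
        munmap((void *)v->data, v->len);
    v->data = NULL;
    v->len = 0;
}

/* Stand-in for "scan b.txt once per line of a.txt": counts byte c. */
size_t view_count_byte(const struct file_view *v, unsigned char c)
{
    size_t n = 0;
    for (size_t i = 0; i < v->len; i++)
        if (v->data[i] == c)
            n++;
    return n;
}
```

The per-line work then only reads v.data; no new mapping is created inside the loop, which is what keeps memory usage flat.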
Quote from: learn64bit on April 20, 2024, 02:11:35 AM
My mistake: it was calling CreateFileMapping on b.txt repeatedly.
It should only do that once.
Excellent. :thumbsup:
When running V269 without the files being present, the program crashes without any error messages displayed.
(https://i.postimg.cc/cvVGcYzD/untitled.png) (https://postimg.cc/cvVGcYzD)
Yes, no miscCheck yet.
(Everybody says miscCheck, especially dupliLineCheck, is toooo slow, so no miscCheck for testing.)
Quote from: learn64bit on April 20, 2024, 04:38:29 AM
Yes, no miscCheck yet.
(Everybody says miscCheck, especially dupliLineCheck, is toooo slow, so no miscCheck for testing.)
Ok. Just letting you know about the crash.
If CreateFile returns -1 in eax, you should display an error message then exit... so the program doesn't crash.
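sudoku's advice above as a portable C sketch: check the result of every open and stop with a message instead of carrying on with an invalid handle. Here fopen returning NULL stands in for CreateFile returning INVALID_HANDLE_VALUE (-1); the helper names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Opens path for reading. On failure, prints why and returns NULL so the
   caller can exit cleanly instead of using an invalid handle. */
FILE *open_input_or_report(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(errno));
    return f;
}

/* Opens both inputs or neither. Returns 0 on success, 1 on failure. */
int open_inputs(const char *a_path, const char *b_path, FILE **a, FILE **b)
{
    *b = NULL;
    *a = open_input_or_report(a_path);
    if (*a == NULL)
        return 1;
    *b = open_input_or_report(b_path);
    if (*b == NULL) {
        fclose(*a);          /* do not leak the first file */
        *a = NULL;
        return 1;
    }
    return 0;
}
```

A caller would typically do `if (open_inputs("a.txt", "b.txt", &a, &b) != 0) return 1;` right at the start, before any mapping or threading work.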
Quote from: sudoku on April 20, 2024, 04:39:23 AM
Quote from: learn64bit on April 20, 2024, 04:38:29 AM
Yes, no miscCheck yet.
(Everybody says miscCheck, especially dupliLineCheck, is toooo slow, so no miscCheck for testing.)
Ok. Just letting you know about the crash.
If CreateFile returns -1 in eax, you should display an error message then exit... so the program doesn't crash.
You are right.
Two 3 MB files took 3 hours...
Maybe it's time to try multithreading.
How many threads should be ok? 4 or 48?
loop 48 times:
load 48 lines from a.txt into 48 buffers.
create a fileMapping on b.txt for every lineBuffer.
start a thread to do the lengthCompare/contentCompare job.
OMG! It will be 48 times faster.
3 hours / 48 = 3.75 mins.
It looks acceptable.
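The 3.75-minute estimate assumes a perfect 48-fold speedup. A common way to temper that is Amdahl's law, where p is the fraction of the work that can run in parallel; the value p = 0.95 below is purely an assumption for illustration, not a measurement of findLine:

```latex
T_{48} = \frac{T_1}{48} = \frac{180\ \text{min}}{48} = 3.75\ \text{min} \quad \text{(ideal)}

S(N) = \frac{1}{(1-p) + p/N}, \qquad
S(48)\big|_{p=0.95} = \frac{1}{0.05 + 0.95/48} \approx 14.3, \qquad
T_{48} \approx \frac{180\ \text{min}}{14.3} \approx 12.6\ \text{min}
```

Thread creation and synchronization overhead, and the fact that 48 hardware threads on two 12-core CPUs are 24 physical cores with two threads each, would likely push the real figure further from the ideal.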
Okay, not 48, only 2.
(so everybody can enjoy the testing)
48 threads? :dazzled: That would be interesting.
I have only worked with two threads in any one program myself, with varying success.
Let us know how it works out for you. :thumbsup:
How many cores in your CPU?
According to this, two threads are available per core... (this is for server CPUs, but should still apply, methinks)
I could be mistaken, but that is how I read it, although I have only used as many threads as I have cores.
When I tried 4 threads on my 2-core CPU, it did not double the efficiency. I hadn't done my research.
Server CPUs may behave differently???
https://community.fs.com/article/what-is-a-server-cpu.html
Skip down to where it says "CPU threads"
Further reading for you...
optimal threads per core (https://stackoverflow.com/questions/1718465/optimal-number-of-threads-per-core)
Seems that you should do some research before trying 48 threads. :thumbsup:
I have two Intel Xeon 2650v4:
12c24t - 2400,2133,1866,1600 DDR4 - 105 W - 2.2 GHz, 2.90 GHz - 14 nm
Quote from: learn64bit on April 21, 2024, 12:41:38 AM
I have two Intel Xeon 2650v4:
12c24t - 2400,2133,1866,1600 DDR4 - 105 W - 2.2 GHz, 2.90 GHz - 14 nm
Two cores will probably limit how many threads you can use.
You can try as many threads as you like, but the results might disappoint you. (Or did you mean you have two different CPUs?) Oh wait, does 12c24t mean 12 cores, 24 threads??? My computer is jealous, if that's the case. Especially if you have two of those in one computer. Congrats. If that is true, then 48 threads are theoretically possible. It may not be very practical from a coding standpoint, though.... cumbersome, unwieldy, etc.
Props to you if you can successfully make that work as you intended. :thumbsup: That would definitely speed up your program.
Not sure it will be 48 times faster, though; you'd have to consider the overhead involved in creating that many threads, etc.
For that to be as efficient for anyone else, they would need at minimum the same number of CPUs, the same number of cores, and the same threads per CPU.
An average user most likely will not have an adequate setup (effectively 24 cores, 48 threads) to see any great speed improvement over the original versions. Even if 48 threads could be created, they would be battling for CPU time on the available cores, reducing efficiency. That is what happened to me when I was experimenting with adding threads to a program I was testing.
If that program is just for you then perfect, all is well since you don't have to worry about it working for anyone else.
v279: .asm and .cmd
(starting to try the multithreading idea)
Quote from: sudoku on April 21, 2024, 12:44:04 AM
Quote from: learn64bit on April 21, 2024, 12:41:38 AM
I have two
intel xeon 2650v4
12c24t - 2400,2133,1866,1600 ddr4 - 105w - 2.2GHz, 2.90GHz - 14nm
Two cores will probably limit how many threads you can use.
You can try as many threads as you like, but the results might disappoint you. ( or did you mean you have two different cpus?)
Oh wait, does 12c24t equals 12 core, 24 threads??? My computer is jealous, if that's the case. Especially if you have two of those in one computer. Congrats. If that is true, then 48 threads are theoretically possible. May not be very practical though from a coding standpoint.... cumbersome, unwieldy, etc.
I have already tried creating 2048 threads for fun. I have a vision of a big open-world RPG with 2048 NPCs, each handled by a separate thread. Most of those 2048 threads are suspended: an NPC's thread is started when the PC comes close enough to interact with it, and stopped when the PC moves out of range.
Quote from: learn64bit on April 21, 2024, 01:49:28 AM
start to try multiThreading idea
include \masm32\include\masm32rt.inc
.data?
@dwThreadID dd ?
threadFinished dd ?
.code
...
invoke CreateThread,NULL,0,offset _Counter,NULL,NULL,addr @dwThreadID
That is one thread. Where are the other 47?
[raises hand] Question, teach:
OK, so you've stored the thread ID, which seems like a good idea.
But how do you exit a specific thread if you have many?
ExitThread() takes only an exit code, not a thread handle, and just exits the currently executing thread. So how do you end, say, thread #692? (See my possible self-answer below, after thinking about this.)
Well, OK, there's TerminateThread(), which takes the handle you get from CreateThread(). Except that Micro$oft has this to say about it:
Quote
TerminateThread is a dangerous function that should only be used in the most extreme cases. You should call TerminateThread only if you know exactly what the target thread is doing, and you control all of the code that the target thread could possibly be running at the time of the termination. For example, TerminateThread can result in the following problems:
- If the target thread owns a critical section, the critical section will not be released.
- If the target thread is allocating memory from the heap, the heap lock will not be released.
- If the target thread is executing certain kernel32 calls when it is terminated, the kernel32 state for the thread's process could be inconsistent.
- If the target thread is manipulating the global state of a shared DLL, the state of the DLL could be destroyed, affecting other users of the DLL.
Maybe you don't need to do anything to end a thread except put a RET at the end of the thread procedure (gotta have that anyhow), or use ExitThread(), which does the same thing. Reach the end of the routine, bam, thread is gone.
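NoCforMe's conclusion (a thread ends when its procedure returns) looks like this in C with POSIX threads, chosen so the sketch is portable. pthread_create and pthread_join play the roles of CreateThread and WaitForSingleObject, and the returned pointer plays the role of the exit code; worker and run_one_worker are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>

/* Thread procedure: when it returns, the thread ends (like RET in a
   Win32 ThreadProc). The return value acts as the thread's exit code. */
static void *worker(void *arg)
{
    intptr_t n = (intptr_t)arg;
    intptr_t sum = 0;
    for (intptr_t i = 1; i <= n; i++)
        sum += i;
    return (void *)sum;          /* the thread ends here */
}

/* Starts one worker, waits for it, returns its result (-1 on error). */
long run_one_worker(long n)
{
    pthread_t t;
    void *result;
    if (pthread_create(&t, NULL, worker, (void *)(intptr_t)n) != 0)
        return -1;
    if (pthread_join(t, &result) != 0)
        return -1;
    return (long)(intptr_t)result;
}
```

No TerminateThread equivalent is needed: the main thread just waits for the procedure to finish on its own.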
To terminate a running thread from your main thread, when you create a new thread you pass it the address of a dword (e.g. flags) to use for communication between them, and set the ACTIVE flag.
You both check flags periodically to see what the status is.
If the main thread wants thread 2 to quit, set the QUIT bit in thread 2's flags.
When thread 2 checks, it can set the IHEARANDOBEY flag and start cleaning up.
When the main thread sees IHEARANDOBEY it waits for the ACTIVE to clear.
Thread 2 clears ACTIVE and returns.
Main thread clears all flags, makes them ready to use for another thread and goes back to its message loop.
There are two things that I need to look up, I am using PeekMessage instead of GetMessage so CPU usage is always a few percent, and I'm not sure about having to lock memory accesses between threads.
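The flag protocol described above can be sketched with C11 atomics and POSIX threads. The flag names follow the post; using atomic operations rather than plain loads and stores is this sketch's assumption for sharing the dword safely between threads, which also speaks to the open question about locking. On Windows the analogous primitives are the Interlocked* functions. worker_ctl and the function names are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Flag bits, named after the protocol described above. */
enum {
    F_ACTIVE       = 0x1,   /* set while the worker is running      */
    F_QUIT         = 0x2,   /* main thread asks the worker to stop  */
    F_IHEARANDOBEY = 0x4    /* worker acknowledges the quit request */
};

struct worker_ctl {
    atomic_uint flags;          /* shared between main and worker */
    unsigned long work_done;    /* written only by the worker */
};

static void *flag_worker(void *arg)
{
    struct worker_ctl *c = arg;
    while (!(atomic_load(&c->flags) & F_QUIT))
        c->work_done++;                          /* one unit of "work" */
    atomic_fetch_or(&c->flags, F_IHEARANDOBEY);  /* acknowledge */
    /* ... clean up here ... */
    atomic_fetch_and(&c->flags, ~(unsigned)F_ACTIVE);
    return NULL;                                 /* thread ends */
}

/* Starts a worker, asks it to quit, waits for the acknowledgement and
   for ACTIVE to clear, then resets the flags. Returns 0 on success. */
int start_and_stop_worker(struct worker_ctl *c)
{
    pthread_t t;
    atomic_init(&c->flags, F_ACTIVE);   /* ACTIVE is set before the thread runs */
    c->work_done = 0;
    if (pthread_create(&t, NULL, flag_worker, c) != 0)
        return -1;
    atomic_fetch_or(&c->flags, F_QUIT);
    while (!(atomic_load(&c->flags) & F_IHEARANDOBEY))
        sched_yield();
    while (atomic_load(&c->flags) & F_ACTIVE)
        sched_yield();
    pthread_join(t, NULL);              /* release the thread's resources */
    atomic_store(&c->flags, 0);         /* ready for another thread */
    return 0;
}
```

If the main thread only needs to know that the worker has finished, joining alone would be enough; the flags earn their keep when the worker must clean up or release a lock before stopping.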
Although now that I think of it, isn't it much more likely that the threads will be the ones to determine when they're finished and simply terminate themselves? Can't really think of a situation where the "supervisor" (i.e., main thread) is going to need to terminate a thread (unless worst case, the thread is somehow out of control or corrupted, in which case it might be difficult or impossible to communicate with it anyhow).
On the other hand, it certainly might want to know the status of a thread, using those flags you described.
There's no real way for a thread to talk to its owner unless (as you know) you have a window procedure and can post a message to it. The thread can't get anything from the owner without some sort of shared memory area. Sometimes a thread will have a lock open or similar, the owner doesn't really want to suspend it unless it can free the lock first, so it needs a notification to prepare for suspension.
I am basing my code on my old protected-mode OS (didn't we all have one?), I'm probably making Windows programmers nervous :biggrin:
Quote from: NoCforMe on April 21, 2024, 07:36:53 AM
But how do you exit a specific thread if you have many?
It does that automagically for you with a "ret" :cool:
Yes, or with ExitThread(). (Same difference, I guess.)
I'm learning.
Ooops, I almost forgot to post my test result.
My v269 test result: no crash.
It works. Hopefully no more bugs.
(https://i.postimg.cc/svvM4r85/1.jpg) (https://postimg.cc/svvM4r85)
Line count check:
15601 <- a.txt
02381 <- c.txt
13220 <- d.txt
a = c + d (2381 + 13220 = 15601).
Seems the test passed.
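The a = c + d check works because every line of a.txt goes to exactly one of c.txt or d.txt. Here is a minimal C sketch of such a partition, assuming lines are stored as (pointer, length) pairs so they may contain 00h. Unlike the per-line compare loop discussed in the thread, it sorts b once (length first, echoing the "sort table by length" suggestion) and uses bsearch, so each lookup costs O(log n) instead of a full scan of b. This is a swapped-in technique, not findLine's method; the names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* A line as a byte range, so it may contain 00h. */
struct line { const unsigned char *p; size_t len; };

/* Orders by length first, then by content. */
static int line_cmp(const void *x, const void *y)
{
    const struct line *a = x, *b = y;
    if (a->len != b->len)
        return a->len < b->len ? -1 : 1;
    return a->len ? memcmp(a->p, b->p, a->len) : 0;
}

/* Sorts b in place, then sends each line of a to c (also in b) or to d
   (not in b). c_out/d_out may be NULL if only counts are needed. Every
   line of a lands on exactly one side, so *n_c + *n_d == n_a. */
void partition_lines(const struct line *a, size_t n_a,
                     struct line *b, size_t n_b,
                     struct line *c_out, size_t *n_c,
                     struct line *d_out, size_t *n_d)
{
    if (n_b)
        qsort(b, n_b, sizeof *b, line_cmp);
    *n_c = *n_d = 0;
    for (size_t i = 0; i < n_a; i++) {
        if (n_b && bsearch(&a[i], b, n_b, sizeof *b, line_cmp)) {
            if (c_out) c_out[*n_c] = a[i];
            (*n_c)++;
        } else {
            if (d_out) d_out[*n_d] = a[i];
            (*n_d)++;
        }
    }
}
```

Because the invariant holds by construction, a mismatch between the three line counts would point to a bug in splitting the input rather than in the comparison.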
Quote from: daydreamer on April 21, 2024, 04:47:28 AM
Quote from: sudoku on April 21, 2024, 12:44:04 AM
Quote from: learn64bit on April 21, 2024, 12:41:38 AM
I have two
intel xeon 2650v4
12c24t - 2400,2133,1866,1600 ddr4 - 105w - 2.2GHz, 2.90GHz - 14nm
Two cores will probably limit how many threads you can use.
You can try as many threads as you like, but the results might disappoint you. ( or did you mean you have two different cpus?)
Oh wait, does 12c24t equals 12 core, 24 threads??? My computer is jealous, if that's the case. Especially if you have two of those in one computer. Congrats. If that is true, then 48 threads are theoretically possible. May not be very practical though from a coding standpoint.... cumbersome, unwieldy, etc.
I have already tried creating 2048 threads for fun. I have a vision of a big open-world RPG with 2048 NPCs, each handled by a separate thread. Most of those 2048 threads are suspended: an NPC's thread is started when the PC comes close enough to interact with it, and stopped when the PC moves out of range.
haha.
202400000 jobs need to be done.
Create 2048 threads.
Assign a job to every thread.
Check whether a thread has finished, then create a new thread and assign it a job.
Repeat until all 202400000 jobs are done.
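Creating a new thread for every job (202,400,000 times) is costly. A common alternative, sketched here in C with POSIX threads and C11 atomics, is a fixed pool: start N threads once and let each one claim the next job index with an atomic increment until none are left. Each job writes only its own result slot, so threads never overwrite each other's output. do_job, run_pool and MAX_THREADS are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Shared state for a fixed-size worker pool. */
struct pool {
    atomic_size_t next_job;     /* next unclaimed job index */
    size_t n_jobs;
    unsigned char *result;      /* result[i] is written only by job i */
};

/* Placeholder job: "is i divisible by 3". */
static unsigned char do_job(size_t i) { return (unsigned char)(i % 3 == 0); }

static void *pool_worker(void *arg)
{
    struct pool *p = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&p->next_job, 1);   /* claim a job */
        if (i >= p->n_jobs)
            break;                                      /* none left: thread ends */
        p->result[i] = do_job(i);
    }
    return NULL;
}

/* Runs n_jobs jobs on n_threads threads. Returns 0 on success. */
int run_pool(size_t n_jobs, int n_threads, unsigned char *result)
{
    pthread_t t[MAX_THREADS];
    struct pool p;
    int started = 0, rc = 0;
    if (n_threads < 1 || n_threads > MAX_THREADS)
        return -1;
    atomic_init(&p.next_job, 0);
    p.n_jobs = n_jobs;
    p.result = result;
    for (; started < n_threads; started++)
        if (pthread_create(&t[started], NULL, pool_worker, &p) != 0) {
            rc = -1;            /* threads already started still finish all jobs */
            break;
        }
    for (int k = 0; k < started; k++)
        pthread_join(t[k], NULL);   /* wait for every worker */
    return rc;
}
```

The thread count becomes a tuning knob (for example the number of hardware threads) that is independent of the number of jobs.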
Maybe you should share your code with us.
Anyway, have fun.
Quote from: learn64bit on April 21, 2024, 07:57:03 PM
create 2048 threads
These are good intentions, but remember you can only create twice as many threads as your CPU (https://www.intel.com/content/www/us/en/products/sku/91767/intel-xeon-processor-e52650-v4-30m-cache-2-20-ghz/specifications.html) has cores:
Essentials
Product Collection: Intel® Xeon® Processor E5 v4 Family
Code Name: Products formerly Broadwell
Vertical Segment: Server
Processor Number: E5-2650V4
Lithography: 14 nm
Recommended Customer Price: $1166.00
CPU Specifications
Total Cores: 12
Total Threads: 24
Max Turbo Frequency: 2.90 GHz
Intel® Turbo Boost Technology 2.0 Frequency: 2.90 GHz
Processor Base Frequency: 2.20 GHz
Cache: 30 MB Intel® Smart Cache
Bus Speed: 9.6 GT/s
# of QPI Links: 2
TDP: 105 W
Supplemental Information
Marketing Status: Discontinued
Launch Date: Q1'16
Servicing Status: End of Servicing Updates
End of Servicing Updates Date: Thursday, June 30, 2022
Embedded Options Available: Yes
Memory Specifications
Max Memory Size (dependent on memory type): 1.5 TB
Memory Types: DDR4 1600/1866/2133/2400
Max # of Memory Channels: 4
Max Memory Bandwidth: 76.8 GB/s
Quote from: jj2007 on April 22, 2024, 12:25:14 AM
These are good intentions, but remember you can only create twice as many threads as your CPU has cores:
:badgrin:
Quote from: CreateThread
The number of threads a process can create is limited by the available virtual memory
Depending on how much stack each thread gets.
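How many stacks fit? Assuming the default 1 MB stack reservation that the Microsoft linker writes into the exe when /STACK is not given (CreateThread uses it when dwStackSize is 0), and a 32-bit process without /LARGEADDRESSAWARE, which gets 2 GB of user address space:

```latex
48 \times 1\,\text{MiB} = 48\,\text{MiB} \ll 2\,\text{GiB},
\qquad
2048 \times 1\,\text{MiB} = 2048\,\text{MiB} = 2\,\text{GiB}
```

So 48 default-sized stacks are unlikely to be a problem on their own, while around 2048 would use up the entire 32-bit user address space before counting code, heap and file views.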
(oh stack, 48 threads, maybe it will crash again...)
How should I name these 47 threads?
Quote from: learn64bit on April 22, 2024, 11:32:02 AM
(oh stack, 48 threads, maybe it will crash again...)
How should I name these 47 threads?
Thread handles:
hThread1 dd ?
hThread2 dd ?
hThread3 dd ?
All the way to
hThread47 dd ?
Thread procedures
Thread1 Proc...etc
Thread2 Proc...etc
Thread3 Proc...
All the way to
Thread47 Proc...
And so on. Perhaps?
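Rather than 47 separately named handles and procedures, one array of handles plus one thread procedure that receives its index as the parameter scales to any count without copy and paste. In MASM that would be something like `hThread dd 47 dup(?)` plus a single ThreadProc reading lpParameter; the sketch below shows the same shape in portable C with POSIX threads (names hypothetical).

```c
#include <pthread.h>

#define N_WORKERS 47

/* Per-thread data: each worker gets its own slot, so no two threads
   write the same memory. */
struct worker_slot {
    int index;          /* 0 .. N_WORKERS-1 */
    long result;        /* written only by this worker */
};

/* One procedure shared by all workers; the argument says which one it is. */
static void *indexed_worker(void *arg)
{
    struct worker_slot *s = arg;
    s->result = (long)s->index * s->index;   /* placeholder work */
    return NULL;
}

/* Starts N_WORKERS threads from one loop, waits for all of them, and
   returns how many were started. */
int start_all(pthread_t handles[N_WORKERS], struct worker_slot slots[N_WORKERS])
{
    int started = 0;
    for (int i = 0; i < N_WORKERS; i++) {
        slots[i].index = i;
        slots[i].result = -1;
        if (pthread_create(&handles[i], NULL, indexed_worker, &slots[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(handles[i], NULL);
    return started;
}
```

Changing N_WORKERS is then the only edit needed to go from 2 threads to 47.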
Okay, 1 to 47.
(https://i.postimg.cc/RqQpkRf0/1.jpg) (https://postimg.cc/RqQpkRf0)
I only see two threads running in your screenshot.
Keep adding to it, up to Thread47...
But you have to make sure that no thread overwrites data written by another thread. Multithreading is very tricky, from what I have read. I have only successfully dealt with 2 threads, though I have experimented with higher thread counts. Mostly failed, though, since I only have two cores here.
v296
(1+2 threads)
If it doesn't crash, I will add more threads.
Ooops, I forgot to remove "Sleep".
v297
(Sleep removed)
My test: no crash. The old version took 3 hours; this version only took 3 minutes.
I don't know what will happen with 1+47.
Quote from: learn64bit on April 23, 2024, 03:28:56 AM
The old version took 3 hours; this version only took 3 minutes.
Great! Sounds like you are making good progress.
Which a.txt and b.txt are you using? The ones I had used, finished in seconds - so I know they are not correct.
Quote from: sudoku on April 23, 2024, 04:49:47 AM
Quote from: learn64bit on April 23, 2024, 03:28:56 AM
The old version took 3 hours; this version only took 3 minutes.
Great! Sounds like you are making good progress.
Which a.txt and b.txt are you using? The ones I had used, finished in seconds - so I know they are not correct.
I'm using the #265 test files. The result is correct.
Maybe you should share your test files.
Quote from: learn64bit on April 23, 2024, 03:17:32 PM
I'm using the #265 test files.
There are no files attached in post #265. ???
Quote from: learn64bit on April 23, 2024, 03:17:32 PM
Maybe you should share your test files.
Mine are just some random junk text files like "jyufukfyfkvgkydhvukygf". About 1024 bytes worth in around 11 or 12 lines. I already discarded those.
Okay, here it is.
The last one.
(You need to delete the trailing .zip.)
(I used findFile3sha256sum to create them.)
(findFile3sha256sum is in the 'RadASM2 for chinese' post.)
Okay, they extracted correctly.
I have one tip that will make it faster.
Instead of printing after each line processed, only print if an error occurs, or when finished.
Printing to the screen that often takes a lot of time.
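A minimal C sketch of sudoku's tip: report progress every N lines (and once at the end) instead of after every line. The interval is a hypothetical parameter; the point is simply that far fewer writes reach the console.

```c
#include <stdio.h>

/* Processes n_items items, printing a progress line only every `every`
   items and after the last one. Pass out = NULL to count without printing.
   Returns how many progress lines were (or would have been) printed. */
size_t process_with_progress(size_t n_items, size_t every, FILE *out)
{
    size_t printed = 0;
    for (size_t i = 1; i <= n_items; i++) {
        /* ... compare line i here ... */
        if (every && (i % every == 0 || i == n_items)) {
            if (out)
                fprintf(out, "processed %zu / %zu\n", i, n_items);
            printed++;
        }
    }
    return printed;
}
```

For the 15601-line a.txt mentioned earlier, an interval of 1000 gives 16 progress lines instead of 15601.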
Okay, a long way to go.
A lot of ideas still to try.
Just under two minutes here, with the last version that you posted, V297
And the latest a.txt and b.txt attached in post #301 and #302
Quote from: sudoku on April 23, 2024, 04:10:14 PM
Just under two minutes here, with the last version that you posted, V297
Ok, maybe you have a better CPU than me.
Thread 2 is fake; this V297 really runs 1 (main thread) + 1 (real worker) + 1 (fake worker).
In your next version, only print to screen when the program is finished processing the files. It will be faster, I guarantee it.
Quote from: learn64bit on April 23, 2024, 03:55:50 PM
I used the findFile3sha256sum to create them
Looking at those files, I had thought that they must be some sort of checksum. :smiley:
Are you using that system of comparing checksums to check for corrupted files?
a1: yes, sha256sum + space + filePath.
a2: no.
They're just test files.
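Each test-file line is "sha256sum + space + filePath". If the two fields are ever needed separately, a line can be split like this. It is a hedged C sketch assuming exactly 64 hex digits followed by one space; split_hash_line is a hypothetical helper, not part of findLine.

```c
#include <ctype.h>
#include <stddef.h>

#define SHA256_HEX_LEN 64

/* Splits "<64 hex digits><space><path>" into its parts without copying.
   Returns 1 on success, 0 if the line does not have that shape. */
int split_hash_line(const char *line, size_t len,
                    const char **hash, const char **path, size_t *path_len)
{
    if (len < SHA256_HEX_LEN + 2)          /* hash + space + at least 1 char */
        return 0;
    for (size_t i = 0; i < SHA256_HEX_LEN; i++)
        if (!isxdigit((unsigned char)line[i]))
            return 0;
    if (line[SHA256_HEX_LEN] != ' ')
        return 0;
    *hash = line;
    *path = line + SHA256_HEX_LEN + 1;
    *path_len = len - SHA256_HEX_LEN - 1;
    return 1;
}
```

Comparing only the hash field would treat identical files stored under different paths as equal, while comparing whole lines requires both hash and path to match; which behavior is wanted depends on the question being asked.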
This app will find out:
which files are only in vs2k22community.
which files are only in vs2k22buildtools.
which files are in both vs2k22community and buildtools.
(vs2k22: v17.9.6 v17.9.34728.123 evergreen)
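These three questions (only in community, only in buildtools, in both) are a classic comparison of two sets. One standard way to answer all three in a single pass, sketched here in C and not taken from findLine, is to sort both lists and walk them together like a merge. It assumes zero-terminated lines (true for the sha256sum text files) and no duplicate lines within one list; the names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Counts of lines only in A, only in B, and in both. */
struct three_way { size_t only_a, only_b, both; };

static int cstr_cmp(const void *x, const void *y)
{
    return strcmp(*(const char *const *)x, *(const char *const *)y);
}

/* Sorts both arrays of lines in place, then walks them in step. */
struct three_way compare_sorted(const char **a, size_t na,
                                const char **b, size_t nb)
{
    struct three_way r = {0, 0, 0};
    size_t i = 0, j = 0;
    if (na) qsort(a, na, sizeof *a, cstr_cmp);
    if (nb) qsort(b, nb, sizeof *b, cstr_cmp);
    while (i < na && j < nb) {
        int c = strcmp(a[i], b[j]);
        if (c < 0)      { r.only_a++; i++; }        /* a[i] missing from B */
        else if (c > 0) { r.only_b++; j++; }        /* b[j] missing from A */
        else            { r.both++;   i++; j++; }   /* in both lists */
    }
    r.only_a += na - i;      /* leftovers exist in one list only */
    r.only_b += nb - j;
    return r;
}
```

After sorting, the walk itself is linear, so even tens of thousands of lines should take well under a second, in contrast to comparing every line of one file against every line of the other.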
I can see clearly what your program is doing. If you ever get it finished and running very fast, it looks like it will be a very useful tool.
So finally we get an idea what's the content of these files :thumbsup:
C:\b\Microsoft.Android.Runtime.34.android-x86.34.0.52,version=34.0.52.0,machinearch=x86\d612a3ff06807afce62b7ede2c147f73-x86.msi
341DB352F7616F3176443FDC24EE3A6DCEFA0BFEE79E52FE14CAFCD4DAC78820 C:\b\Microsoft.Android.Sdk.net7.33.0.95,version=33.0.95.0,machinearch=arm64\60f0e13cf49938e11638e841420111cc-arm64.msi
40B5A4A5961DBA8BE65A0C56C9E8E23380ED9F6596CCE9E1D7F2CA83C8829928 C:\b\Microsoft.Android.Sdk.net7.33.0.95,version=33.0.95.0,machinearch=x64\60f0e13cf49938e11638e841420111cc-x64.msi
80B676E1720FD81283A06B0CDF687C7FF4281631844BE2DEE58C5CF5A4305320 C:\b\Microsoft.Android.Sdk.net7.33.0.95,version=33.0.95.0,machinearch=x86\60f0e13cf49938e11638e841420111cc-x86.msi
18A90B72E72C52FC7FB6EFD6BF07C1CC36B5C1B815B21A983018CFD4CC1BD03E C:\b\Microsoft.Android.Sdk.net8.34.0.52,version=34.0.52.0,machinearch=arm64\5446d0c43626814f53bb98efe989c925-arm64.msi
127D7467F01B92D5803C4710FC46E31F30635F899AD4FCCAF5ABCCB3A454CE94
Looks like in order to add the other 46 threads, we must use macros.
Otherwise there is a lot of copy & paste and replace.
v314
(real 1+2)
Seems to be working ..... but there also seems to be an endless loop, where the processing is already done, but the program keeps "going to check whether all threads quit" etc... (endlessly, see screenshot)
(https://i.postimg.cc/m1BR1bgQ/endless-loop.png) (https://postimg.cc/m1BR1bgQ)
I stopped the program after 7 minutes while it was still stuck in the loop.
There also seems to be a couple of stalls along the way, where the program does not do anything for several seconds.
Quote from: sudoku on April 24, 2024, 01:24:14 AM
Seems to be working ..... but there also seems to be an endless loop, where the processing is already done, but the program keeps "going to check whether all threads quit" etc... (endlessly, see screenshot)
(https://i.postimg.cc/m1BR1bgQ/endless-loop.png) (https://postimg.cc/m1BR1bgQ)
I stopped the program after 7 minutes while it was still stuck in the loop.
There also seems to be a couple of stalls along the way, where the program does not do anything for several seconds.
I need your test files to find the bug.
Quote from: learn64bit on April 24, 2024, 01:35:44 AM
I need your test files to find the bug.
They are your files... (exactly the a.txt and b.txt from the 3-part download) and the latest version of your program. Or do you mean c.txt and d.txt?
The last time I ran the program, it was still stuck in the endless loop after 15 minutes.
Edit to add...about 20 minutes after this was first posted...
Now still running after 30+ minutes.... I think it's time to put it to bed. :tongue:
Same test files? But it didn't happen on my machine.
Quote from: learn64bit on April 24, 2024, 02:30:21 AM
Same test files? But it didn't happen on my machine.
It surely happened here though. Windows 7 if that makes any difference. Over 30 minutes and the program never finished, kept repeating what is shown in the screenshot above.
Ooops!
You are right.
The 5th time I ran this app, I saw the bug.
It must be a random bug.
The results c and d are correct.
Threads 1 and 2 already quit, but checkQuit fails.
I must have made a mistake in the check-thread proc.
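One common cause of a "wait for all threads to quit" loop that never ends is a shared counter or flag updated with ordinary reads and writes from several threads, so that an update gets lost. Whether that is the bug here is not known; the C sketch below (POSIX threads and C11 atomics, hypothetical names) only shows a race-free countdown followed by joining every thread. On Windows, WaitForMultipleObjects with bWaitAll set to TRUE can wait on up to MAXIMUM_WAIT_OBJECTS (64) thread handles directly.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define N_QUIT_WORKERS 3

/* Number of workers still running. Each worker decrements it exactly
   once, atomically, as its last action. A plain "running--" on an
   ordinary int from several threads can lose updates, leaving the count
   stuck above zero and the waiting loop spinning forever. */
static atomic_int running;

static void *countdown_worker(void *arg)
{
    long *slot = arg;                    /* this worker's own output slot */
    long sum = 0;
    for (long i = 0; i < 100000; i++)
        sum += 1;                        /* placeholder work */
    *slot = sum;
    atomic_fetch_sub(&running, 1);       /* last action: "I have quit" */
    return NULL;
}

/* Starts the workers, waits until all have reported quitting, then joins
   them. Returns the number of workers started. */
int run_and_wait(long out[N_QUIT_WORKERS])
{
    pthread_t t[N_QUIT_WORKERS];
    int started = 0;
    atomic_store(&running, N_QUIT_WORKERS);
    for (; started < N_QUIT_WORKERS; started++) {
        out[started] = 0;
        if (pthread_create(&t[started], NULL, countdown_worker, &out[started]) != 0)
            break;
    }
    /* workers that never started will never decrement: account for them */
    atomic_fetch_sub(&running, N_QUIT_WORKERS - started);
    while (atomic_load(&running) > 0)
        sched_yield();
    for (int k = 0; k < started; k++)
        pthread_join(t[k], NULL);
    return started;
}
```

The spin loop is only there to mirror a checkQuit-style design; joining every thread on its own already guarantees they have all finished.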
v321
(try to fix the "quit" bug)
After about 1 minute and 40 seconds, appears to be done processing a.txt and b.txt. But still the endless loop afterwards.
Bug not fixed yet.
v323
(this one I already tested 9 times, no bug)
(running 4 copies of this app, my CPU is at 90% usage)
Quote from: sudoku on April 24, 2024, 03:45:54 AM
After about 1 minute and 40 seconds, appears to be done processing a.txt and b.txt. But still the endless loop afterwards.
Bug not fixed yet.
Try v323.
Oh, maybe threads are really useful in this app.
The next upload will be 1+3 threads.
Ooops, the result is not correct...
More bugs.. haha
1 minute 53 seconds, program exited properly and when it was supposed to.
edit to add:
Quote from: learn64bit on April 24, 2024, 04:26:07 AM
Ooops, the result is not correct...
More bugs.. haha
Uh oh. Too bad, then. Back to the drawing board...
Yes, something is wrong in the code.
The 'quit' bug is not easy to fix.
v330
(tested 8 times, 'quit' bug is fixed, and result is correct)
(I don't know what was wrong; I just deleted the unnecessary '.if' '.elseif'.
anyway, I hope the bug is fixed and the result is correct)
v332
(1+3 threads)
Almost 4 minutes with this version. I had run it eight times, average is just under 4 minutes.
Quote from: sudoku on April 24, 2024, 01:11:06 PM
Almost 4 minutes with this version. I had run it eight times, average is just under 4 minutes.
interesting
Quote from: learn64bit on April 23, 2024, 03:54:54 PM
okay, here it is
Both archives throw errors. These are not valid zip files.
I just checked "findLineV323source.zip", it is ok here.
I don't know what's wrong
Quote from: learn64bit on April 24, 2024, 07:58:21 PM
I just checked "findLineV323source.zip", it is ok here.
Files in reply #301 are definitely not ok.
Quote from: jj2007 on April 24, 2024, 08:01:28 PM
Quote from: learn64bit on April 24, 2024, 07:58:21 PM
I just checked "findLineV323source.zip", it is ok here.
Files in reply #301 are definitely not ok.
sudoku: Okay, they extracted correctly.
I didn't check that yet.
but, I believe sudoku.
sorry, I don't know what's wrong
Same for #265testFiles.z02.zip. Open them in Notepad, there is no PK signature. Garbage.
Quote from: jj2007 on April 24, 2024, 08:06:04 PM
Same for #265testFiles.z02.zip. Open them in Notepad, there is no PK signature. Garbage.
interesting
#332 and other smaller archives work fine (they do have the PK signature).
Quote from: jj2007 on April 24, 2024, 08:01:28 PM
Quote from: learn64bit on April 24, 2024, 07:58:21 PM
I just checked "findLineV323source.zip", it is ok here.
Files in reply #301 are definitely not ok.
It's a 3-part file (.7z multipart archive, I think).
Two files are from #301, the other from #302. Remove the .zip extension from all 3 and keep them together for 7zip.
from post #302:
Quote from: learn64bit on April 23, 2024, 03:55:50 PM
the last one
(you need to delete trailing .zip)
Don't change the extension to anything. ONLY remove the trailing ".zip"
You should end up with 3 files... :rolleyes: :rolleyes:
#265.zip
#265.z01
#265.z02
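On the "no PK signature" point: as far as I understand PKWARE's APPNOTE.TXT, in a split archive only the first segment starts with a signature (the spanning marker `PK\x07\x08`, followed by a normal local file header `PK\x03\x04`), while later segments such as `.z02` just continue the byte stream. A missing `PK` at offset 0 of a later part is therefore expected and not by itself proof of corruption (some tools may also omit the spanning marker). A minimal C sketch of that check, with made-up enum and function names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the first four bytes of a file look like (names are illustrative). */
enum zip_start {
    ZIP_LOCAL_HEADER,   /* 50 4B 03 04: local file header                */
    ZIP_SPLIT_FIRST,    /* 50 4B 07 08: first segment of a split archive */
    ZIP_NO_SIGNATURE    /* anything else, e.g. a continuation segment    */
};

enum zip_start classify_zip_start(const unsigned char *buf, size_t len)
{
    static const unsigned char local_hdr[4] = { 0x50, 0x4B, 0x03, 0x04 };
    static const unsigned char split_sig[4] = { 0x50, 0x4B, 0x07, 0x08 };

    assert(buf != NULL || len == 0);
    if (len >= 4 && memcmp(buf, local_hdr, 4) == 0)
        return ZIP_LOCAL_HEADER;
    if (len >= 4 && memcmp(buf, split_sig, 4) == 0)
        return ZIP_SPLIT_FIRST;
    return ZIP_NO_SIGNATURE;
}
```

Reading the first four bytes of each part (e.g. with `fread`) and passing them in is enough. It only tells you how a part starts, not whether the archive as a whole is intact.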
Quote from: sudoku on April 24, 2024, 01:11:06 PM
Almost 4 minutes with this version. I had run it eight times, average is just under 4 minutes.
02 cores with hyperThreading
1+02 is best
24 cores with hyperThreading
1+46
maybe this is the best for me
Quote from: learn64bit on April 24, 2024, 10:47:00 PM
02 cores with hyperThreading
1+02 is best
So far that one is the fastest, at least on my computer.
So yes, for those that do not have more than a single two-core cpu.
Quote
24 cores with hyperThreading
1+46
maybe this is the best for me
Possibly yes, if done properly. But not everyone will be able to use it (to its full potential) due to the number of cpu cores/threads that they may or may not have. For your own special computer it should work fine.
learn64bit:
How are you creating the file listings a.txt and b.txt: with your own program, or an external third-party program? And how are you adding the checksum data to the listing: your own program or a third-party one?
I am only asking since it might be a faster process if everything could be done in a single program, since you wouldn't have to first make a file listing (for a.txt and b.txt), add the checksum for each file, then running your "findline" program on the file listings. It is quite conceivable to do everything in a single program.
findFile
(https://i.postimg.cc/N2D3P3Xq/1.jpg) (https://postimg.cc/N2D3P3Xq)
Quote from: learn64bit on April 25, 2024, 12:42:17 AM
findFile
Something like This? (https://masm32.com/board/index.php?topic=11863.0)
Edit:
I just noticed you added the screenshot...
Oh, okay. I will look for that. :smiley:
a and b not just text file or ascii file
a.txt can be 4mb 00h
Quote from: learn64bit on April 25, 2024, 12:47:44 AM
a and b not just text file or ascii file
a.txt can be 4mb 00h
Is your 'findfile' program from here (https://masm32.com/board/index.php?msg=115132) generating that file with zeros??? Or is the file with zeros made by some other program?
Quote from: sudoku on April 25, 2024, 12:55:00 AM
Quote from: learn64bit on April 25, 2024, 12:47:44 AM
a and b not just text file or ascii file
a.txt can be 4mb 00h
Is your 'findfile' program from here (https://masm32.com/board/index.php?msg=115132) generating that file with zeros??? Or is the file with zeros made by some other program?
otherApp.
findFile's result.txt is a 65001withoutBom text file
Quote from: learn64bit on April 25, 2024, 12:59:54 AM
otherApp.
findFile's result.txt is a 65001withoutBom text file
Okay. Probably a lot of work to put everything into one program. I was only thinking that it would save time to not have to run several programs to get the desired results.
Quote from: jj2007 on April 24, 2024, 08:20:53 PM
#332 and other smaller archives work fine (they do have the PK signature).
... and the attached .exe throws an error (LODSB with ESI=0). Hmmmmph.
Quote from: NoCforMe on April 25, 2024, 09:51:40 AM
Quote from: jj2007 on April 24, 2024, 08:20:53 PM
#332 and other smaller archives work fine (they do have the PK signature).
... and the attached .exe throws an error (LODSB with ESI=0). Hmmmmph.
maybe your testFiles will not pass the miscCheck
when I finished 1+47, will try other ideas.
Quote from: learn64bit on April 25, 2024, 05:59:39 PM
when I finished 1+47, will try other ideas.
Let us know about your progress... :smiley:
v332 takes about 10 seconds here.
c.txt and d.txt are attached (remove the .zip extension :biggrin: )
Quote from: sinsi on April 26, 2024, 01:55:11 AM
v332 takes about 10 seconds here.
Wow. Fast machine. :thumbsup:
I just reran v332 (with a&b txt files from #301&302) on my machine.... It is still chugging along.... (very slow for me)
a bit later...
... still just under 4 minutes on my computer.
Your mileage may vary, depending on your cpu of course.
13th Gen Intel(R) Core(TM) i9-13900KF 3.00 GHz
Quote from: sinsi on April 26, 2024, 01:55:11 AM
v332 takes about 10 seconds here.
c.txt and d.txt are attached (remove the .zip extension :biggrin: )
win7+winrar
copy /b cd.zip.001 + cd.zip.002 x.zip
use winrar open x.zip
10secs... that's fast
v360
(1+5)
1 minute and 40 seconds on my computer.
The fastest version (from my computers perspective) so far. :thumbsup:
intel has different CPUs for servers and desktops.
there are some memory issues, like which CPUs support fast memory
the AMD Ryzen family has a similar problem.
Latest version
The current time is: 18:05:34.97
The current time is: 18:05:47.84
Previous version
The current time is: 18:04:13.45
The current time is: 18:04:19.22
Quote from: sinsi on April 27, 2024, 06:37:10 PM
Latest version
The current time is: 18:05:34.97
The current time is: 18:05:47.84
Previous version
The current time is: 18:04:13.45
The current time is: 18:04:19.22
Neither one of those looks like "10 seconds". :tongue:
Also hard to decipher those early in the morning, I haven't had my coffee yet.
Looks like the latest version V360 is slower for you, sinsi?
Quote from: sudoku on April 27, 2024, 11:53:57 PM
Neither one of those looks like "10 seconds". :tongue:
Selective quoting? I actually said
Quote from: sinsi on April 26, 2024, 01:55:11 AM
about 10 seconds
Quote from: sudoku on April 27, 2024, 11:53:57 PM
Looks like the latest version V360 is slower for you, sinsi?
Sorry, wrong way around :sad:
Previous version
The current time is: 18:05:34.97
The current time is: 18:05:47.84
around 12 seconds
Latest version
The current time is: 18:04:13.45
The current time is: 18:04:19.22
around 6 seconds
Of course that assumes that the results are valid
Quote from: sinsi on April 28, 2024, 12:24:55 AM
Of course that assumes that the results are valid
Very true, only learn64bit knows for sure. If valid, it is faster..
learn64bit, how fast does V360 run on your computer, for comparison?
Quote from: sinsi on April 28, 2024, 12:24:55 AM
Selective quoting? I actually said
It's extremely difficult to precisely copy and paste on my ancient iPad. :tongue: That's my excuse, and I am sticking to it.
new idea and new app:
delete file and delete folder
input: c.txt and c:\b
c:\b is vs2022 buildtools installation folder
maybe it should be called deleteFile&emptyFolder
how should I do it?
Quote from: learn64bit on April 28, 2024, 08:13:15 AM
new idea and new app:
delete file and delete folder
...
how should I do it?
The easiest way would be to delete it (or them) using windows explorer.
Or even make a batch file for the task. :greensml:
Quote from: sudoku on April 28, 2024, 08:53:03 AM
Quote from: learn64bit on April 28, 2024, 08:13:15 AM
new idea and new app:
delete file and delete folder
...
how should I do it?
The easiest way would be to delete it (or them) using windows explorer.
Or even make a batch file for the task. :greensml:
windows explorer and 2000 files... not possible at all.
batch file can do that? I don't think so.
delete all files listed in c.txt, then delete all empty folders
I was joking, of course.
Does this mean that you are finished with 'findline'?
Quote from: sudoku on April 28, 2024, 09:03:28 AM
I was joking, of course.
Does this mean that you are finished with 'findfile'?
no, not finished yet. but deleteFile&Folder is useful to verify findLine's result
Ah, okay. Do you have any code for the new program started yet? Or just still trying to come up with ideas?
Quote from: sudoku on April 28, 2024, 09:07:24 AM
Ah, okay. Do you have any code for the new program started yet? Or just still trying to come up with ideas?
no idea how to do it
Do some research on API "DeleteFile"...
Later....
I made you a small example using "DeleteFile".
You first need to open the file that you want to delete, using CreateFile to verify that the file exists.
If it exists, close the handle returned by CreateFile.
Then call DeleteFile.
If the file that you want to delete does not exist, display an error message.
This is very basic code.
Use caution using DeleteFile... it can be very dangerous, unless you know exactly what you are doing with it, and especially if you are going to run DeleteFile within untested code.
That being said, use at your own risk. I assume no responsibility for misuse of this code, or accidentally deleted files. They (any accidentally deleted files) cannot be retrieved from "Recycle Bin" after deletion. You have been warned.
include \masm32\include\masm32rt.inc
.data
filename1 db "junktext1.txt", 0
filename2 db "junktext2.txt", 0
.code
start:
invoke CreateFile, addr filename1, 0, 0, 0, OPEN_EXISTING, 0, 0 ;; to verify file exists
cmp eax, -1
jnz @f
fn MessageBox, 0, "file not found", "file 1", 0 ;; display message if it doesn't exist.
jmp nextfile
@@:
invoke CloseHandle, eax ;; close handle if file exists
invoke DeleteFile, addr filename1 ;; delete the file
nextfile:
invoke CreateFile, addr filename2, 0, 0, 0, OPEN_EXISTING, 0, 0 ;; to verify file exists
cmp eax, -1
jnz @f
fn MessageBox, 0, "file not found", "file 2", 0 ;; display message if it doesn't exist.
jmp nomorefile
@@:
invoke CloseHandle, eax ;; close handle if file exists
invoke DeleteFile, addr filename2 ;; delete the file
nomorefile:
invoke ExitProcess, 0
end start
Assumption-that c.txt has a list of filenames, like 'e:\data\filename.ext'
from cmd.exe prompt
for /F "delims=" %i in (c.txt) do del "%i"
from batch file
for /F "delims=" %%i in (c.txt) do del "%%i"
Quote from: learn64bit on April 28, 2024, 08:59:10 AM
batch file can do that? I don't think so.
:biggrin:
Quote from: sinsi on April 28, 2024, 11:08:37 AM
Assumption-that c.txt has a list of filenames, like 'e:\data\filename.ext'
...
from batch file
for /F "delims=" %%i in (c.txt) do del "%%i"
...
:biggrin:
Yeah, I had suggested a batch file too. But he wants something that can run from inside his 'findline' code.
looks like deleting files is easy.
how about deleting empty folders? (maybe finding the empty folders is difficult)
(after deleting the files, we need to delete all empty folders in c:\b)
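One common way to handle "delete all empty folders" is to visit the folders deepest-first, so every child is tried before its parent, and a parent that only became empty because its children were removed gets removed as well. Below is a C sketch of just the ordering step, assuming you already have a list of every folder path under c:\b (for example from a directory walk); the function names are hypothetical. On Windows each path, in this order, would then go to RemoveDirectoryW, which only succeeds on empty directories, so non-empty ones are simply left in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Depth of a Windows-style path = number of backslashes in it. */
static size_t path_depth(const char *p)
{
    size_t n = 0;
    for (; *p; ++p)
        if (*p == '\\')
            ++n;
    return n;
}

/* qsort comparator: deeper paths first, equal depths alphabetically. */
static int deeper_first(const void *a, const void *b)
{
    const char *pa = *(const char *const *)a;
    const char *pb = *(const char *const *)b;
    size_t da = path_depth(pa), db = path_depth(pb);

    if (da != db)
        return da > db ? -1 : 1;
    return strcmp(pa, pb);
}

/* Reorders folders[] so that every child folder comes before its parent. */
void order_for_removal(const char **folders, size_t count)
{
    assert(folders != NULL || count == 0);
    if (count < 2)
        return;
    qsort(folders, count, sizeof *folders, deeper_first);
}
```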
Quote from: sinsi on April 28, 2024, 11:08:37 AM
Assumption-that c.txt has a list of filenames, like 'e:\data\filename.ext'
from cmd.exe prompt
for /F "delims=" %i in (c.txt) do del "%i"
from batch file
for /F "delims=" %%i in (c.txt) do del "%%i"
Quote from: learn64bit on April 28, 2024, 08:59:10 AM
batch file can do that? I don't think so.
:biggrin:
c.txt is 65001withoutBOM, is that ok?
Quote from: sudoku on April 28, 2024, 09:11:02 AM
Do some research on API "DeleteFile"...
Later....
I made you a small example using "DeleteFile".
You first need to open the file that you want to delete, using CreateFile to verify that the file exists.
If it exists, close the handle returned by CreateFile.
Then call DeleteFile.
If the file that you want to delete does not exist, display an error message.
This is very basic code. The zip file contains two test files, to test deletion of those files.
Code here and attachment revised: I forgot to jump (after the MessageBox call) if the file was not found.
great.
maybe we should check if the "DeleteFile" failed.
(example: read-only, system-file, file-permission ...)
if failed, then inform user to deal with that
I think it will be two new apps: deleteFile and deleteEmptyFolder
deleteFile: input c.txt and c:\b
deleteEmptyFolder: input c:\b
Under the hood it's DeleteFileW and RemoveDirectoryW:
include \masm32\MasmBasic\MasmBasic.inc
Init
Kill "somefolder\a.txt" ; delete a file
KillFolder "somefolder" ; folder must be empty
EndOfCode
Quote from: learn64bit on April 28, 2024, 08:56:31 PM
I think it will be two new apps: deleteFile and deleteEmptyFolder
deleteFile: input c.txt and c:\b
deleteEmptyFolder: input c:\b
@learn64bit...
As jj said, DeleteFile can also delete empty folders. * see jj2007's post below this one. It's early here and I haven't had my coffee yet. :tongue:
I had forgotten that you also wanted to remove the folders, so I did not put that in the example.
Quote from: learn64bit on April 28, 2024, 08:37:34 PM
great.
maybe we should check if the "DeleteFile" failed.
(example: read-only, system-file, file-permission ...)
if failed, then inform user to deal with that
Checking whether DeleteFile succeeded or failed is a good idea.
In the example, I just looked to see if they were still there. I am lazy about those details. :joking:
Quote from: sudoku on April 28, 2024, 11:42:32 PM
As jj said, DeleteFile can also delete empty folders.
Quote from: jj2007 on April 28, 2024, 10:29:01 PM
Under the hood it's DeleteFileW and RemoveDirectoryW:
SHFileOperationW is also a good one for deleting files
Quote from: sudoku on April 29, 2024, 12:37:34 AM
Too early in the morning for me to post. :tongue:
OK, Sudoku, I'm challenging you:
You wrote that you understand what this guy's trying to do here. So would you please explain to us what this program is supposed to do and how it works (since he seems either unable or unwilling to do so)? And what the format of all these mysterious files (a, b, c, etc.) is and how that's supposed to work?
Thanks from all the perplexed readers here.
Quote from: NoCforMe on April 29, 2024, 04:56:25 AM
OK, Sudoku, I'm challenging you:
I will refer you to the OP, learn64bit. He can give a more precise answer than I could, it is his project after all. Or you could simply read through the thread.... there are breadcrumbs throughout, if you would care to look for them.
If anything is not clear to you, you can ask learn64bit (as I have), as he is the author of this thread and the project.
There are commented source codes here (for some earlier versions) as well as example .txt files that demonstrate the use of the program. Also several versions of the executable, the more recent ones are the better of them.
So you don't know. Why didn't you just say so?
deleteFileV389
(.zip\.asm&.cmd)
Quote
fileName dw "\\?\D:Win7m7\a.txt", 0
(removed formatting for better display here - was unicode string in the source)
Why the wildcard "\\?" before the drive letter?
comment in the source file:
Quote
;By default, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, prepend "\\?\" to the path.
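For what it's worth, `\\?\` is not a wildcard: per Microsoft's file-naming documentation it is the extended-length path prefix understood by the Unicode ("W") file APIs such as DeleteFileW. It also switches off path normalization, so whatever follows it must be a fully qualified path (drive letter, colon, backslash) using backslashes only. A small C helper sketching that rule; it uses narrow `char` strings for brevity where real Windows code would build a `wchar_t` string, and the function name is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Copies the prefix plus path into out (outsz bytes). Returns 1 on success
   and 0 if path is not drive-absolute or out is too small. A path that
   already starts with the prefix is copied unchanged. */
int add_long_path_prefix(const char *path, char *out, size_t outsz)
{
    static const char prefix[] = "\\\\?\\";   /* the 4 characters \\?\ */
    const size_t plen = sizeof prefix - 1;
    size_t len;

    assert(path != NULL && out != NULL);
    len = strlen(path);
    if (strncmp(path, prefix, plen) == 0) {   /* already prefixed */
        if (len + 1 > outsz)
            return 0;
        memcpy(out, path, len + 1);
        return 1;
    }
    /* No normalization happens after the prefix, so only fully qualified
       drive paths like X:\dir are accepted; X:dir or relative forms fail. */
    if (len < 3 || !isalpha((unsigned char)path[0]) ||
        path[1] != ':' || path[2] != '\\')
        return 0;
    if (plen + len + 1 > outsz)
        return 0;
    memcpy(out, prefix, plen);
    memcpy(out + plen, path, len + 1);        /* includes the terminator */
    return 1;
}
```

UNC paths use a different form, `\\?\UNC\server\share\...`, which this sketch does not handle.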
Okay, but then....
Are you sure you want to delete a.txt?
Now this I do not understand. Are you simply trying to test DeleteFileW?
Quote
RetrouveMessageErreur
French??
Very curious, odd even.
deleteEmptyFolderV391
(.zip\.asm&.cmd)
1. after deleteFile, now we have dupliLine problem again, haha
2. how to delete "sha256sum" + "space" from line. which Editor can do it?
3. how to delete "backslash" + "fileName" from line. which Editor can do it?
If you use memory mapped files, you need to use buffers for removing unused data.
Look at masmbasic, if there are already usable functions for data processing, if you can't make your own.
delete sha256sum + space
(use EditPlusV3.51b1036)
(https://i.postimg.cc/gr1b0562/1.jpg) (https://postimg.cc/gr1b0562)
correct
delete backslash + fileName
(https://i.postimg.cc/0bCn4Kx1/2.jpg) (https://postimg.cc/0bCn4Kx1)
not correct
Quote from: learn64bit on April 29, 2024, 11:00:49 PM
1. after deleteFile, now we have dupliLine problem again, haha
2. how to delete "sha256sum" + "space" from line. which Editor can do it?
3. how to delete "backslash" + "fileName" from line. which Editor can do it?
The sha256sum plus space is always the same number of characters (64 hex digits plus the space), so it should be easy to obtain just the filename with that knowledge. Just copy the filename into an appropriately sized buffer. The carriage return/line feed pair will tell you where the filename ends, except for your last line, which you said cannot contain a CRLF pair. In that case, look for the first zero after the path. If Unicode, look for the first double zero (Hex: 00 00).
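A minimal C sketch of the fixed-offset idea, assuming each line has exactly the shape "<64 hex digits><one space><filename>" and has already been cut at its CRLF (the function name is hypothetical). It takes the line length as a parameter instead of relying on a terminating zero:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define SHA256_HEX_LEN 64   /* 256 bits = 32 bytes = 64 hex digits */

/* line/len: one line with its CRLF already removed, assumed to look like
   "<64 hex digits><space><filename>". On success *name points at the
   filename inside line and its length is returned; -1 means the line
   does not have that shape. */
long filename_after_hash(const char *line, size_t len, const char **name)
{
    size_t i;

    assert(line != NULL && name != NULL);
    if (len <= SHA256_HEX_LEN + 1 || line[SHA256_HEX_LEN] != ' ')
        return -1;                         /* too short, or no space */
    for (i = 0; i < SHA256_HEX_LEN; ++i)
        if (!isxdigit((unsigned char)line[i]))
            return -1;                     /* not a hex checksum */
    *name = line + SHA256_HEX_LEN + 1;     /* skip hash and space */
    return (long)(len - SHA256_HEX_LEN - 1);
}
```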
Why remove anything from their line? You are deleting every file on the list, correct?
Just delete the a/b/c/d or whatever .txt file when you have removed all the files on the list. Or just leave it.
For files that are not removed, make a separate list of those files (the ones not removed for whatever reason). You can also note on that list why each one was not removed: didn't exist, file in use, system file, or whatever.
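A sketch of that "separate list" idea in portable C, using the standard `remove()` and `strerror(errno)` as stand-ins for DeleteFileW and GetLastError; the function name and the tab-separated output format are assumptions. POSIX systems and the Microsoft CRT set `errno` when `remove()` fails, although ISO C itself does not require it. One caveat for UTF-8 (code page 65001) lists: on Windows the narrow `remove()` normally interprets names in the ANSI code page, so a real Windows version would convert each line with MultiByteToWideChar and call DeleteFileW.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Reads one path per line from list_path (CRLF or LF endings; the last line
   may have none), calls remove() on each, and writes "path<TAB>reason" plus
   CRLF to fail_path for every failure. Assumes no line is longer than the
   buffer. Returns the number of failures, or -1 if a file can't be opened. */
int delete_listed_files(const char *list_path, const char *fail_path)
{
    char line[4096];
    int failures = 0;
    FILE *list, *fail;

    assert(list_path != NULL && fail_path != NULL);
    list = fopen(list_path, "rb");
    if (list == NULL)
        return -1;
    fail = fopen(fail_path, "wb");
    if (fail == NULL) {
        fclose(list);
        return -1;
    }
    while (fgets(line, sizeof line, list) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* cut at CR or LF */
        if (line[0] == '\0')
            continue;                          /* skip empty lines */
        errno = 0;
        if (remove(line) != 0) {
            fprintf(fail, "%s\t%s\r\n", line,
                    errno != 0 ? strerror(errno) : "unknown error");
            ++failures;
        }
    }
    fclose(list);
    fclose(fail);
    return failures;
}
```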
Using yet another third party program? Should be easy enough to write your own, I would think. Or just a simple procedure to be used in existing code source file. Just some more tips for you to think about.
delete sha256sum + space is for deleteFile app.
delete backslash + fileName is for deleteEmptyFolder app.
delete dupliLine is for deleteEmptyFolder app
third party app is ok.
delete emptyLine use third party app.
delete trailing 0Dh,0Ah use third party app
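For the deleteEmptyFolder input, "delete backslash + fileName" amounts to cutting each path at its last backslash. A C sketch, assuming NUL-terminated Windows-style paths (the function name is hypothetical). It keeps the backslash of a drive root, so "c:\file.ext" becomes "c:\" rather than the drive-relative "c:":

```c
#include <assert.h>
#include <string.h>

/* Cuts path in place at its last backslash, turning a file path into its
   folder path. Returns 1 if a backslash was found, 0 otherwise. A drive
   root keeps its backslash so the result stays an absolute path. */
int strip_file_name(char *path)
{
    char *last;

    assert(path != NULL);
    last = strrchr(path, '\\');
    if (last == NULL)
        return 0;                  /* no folder part at all */
    if (last == path + 2 && path[1] == ':')
        last[1] = '\0';            /* keep the root, e.g. c: plus backslash */
    else
        *last = '\0';              /* drop backslash and file name */
    return 1;
}
```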
Quote from: learn64bit on April 30, 2024, 12:12:03 AM
delete sha256sum + space is for deleteFile app.
delete backslash + fileName is for deleteEmptyFolder.
delete dupliLine is for deleteEmptyFolder
third party app is ok.
delete emptyLine use third party app.
delete trailing 0Dh,0Ah use third party app
Okay. It seems like you always take the long way to do these text manipulation duties. Even generating the sha256sum can be done in your own code (there exist libraries for that) as well as everything you have mentioned in this thread. Still you are making progress with your overall project (or seem to be) and I applaud your persistence.
now the delete backslash + fileName is correct
(https://i.postimg.cc/vg27ykXr/3.jpg) (https://postimg.cc/vg27ykXr)
now we only need the deleteDupliLine app
To remove duplicate lines from a text file, it is much easier if it is sorted first.
Sorting lines of text should be easy. Hutch has some pretty good sorting algos in the Masm32 SDK. With that sorted list, it is very easy to remove duplicate lines from it. Compare a given line with the next line, if duplicate do not copy it to output buffer else copy it to output buffer. I have a qeditor plugin that works in that fashion, so I know it is easy to code.
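The sort-then-compare-neighbours approach described above looks roughly like this in C, sketched with the standard `qsort` and `strcmp` (the function name is hypothetical). It assumes the lines are NUL-terminated strings without embedded 00h bytes; for raw data you would compare with `memcmp` and explicit lengths instead. The sort is O(n log n); the duplicate pass is a single O(n) walk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_lines(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts the line pointers, then keeps only the first line of every run of
   equal lines. Returns the number of distinct lines left at the front. */
size_t sort_and_dedup(const char **lines, size_t count)
{
    size_t in, out;

    assert(lines != NULL || count == 0);
    if (count < 2)
        return count;
    qsort(lines, count, sizeof *lines, cmp_lines);
    out = 1;                                /* lines[0] is always kept */
    for (in = 1; in < count; ++in)
        if (strcmp(lines[in], lines[out - 1]) != 0)
            lines[out++] = lines[in];       /* differs from last kept line */
    return out;
}
```

If the first 64 characters are a sha256sum that identifies the file, comparing only that prefix (`strncmp(a, b, 64)`) in both the comparator and the pass is a cheaper variant.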
This can all be done in assembly. You haven't been interested in most of the example code I have presented {and removed because of your lack of interest} (with exception to the DeleteFile example) so I refrain from posting the code that I would use here. You definitely do not need a third party to do that for you. It wastes time, and for me it is more satisfying if I can do it myself. That is what coding is all about (for me). No satisfaction if a third party tool does the work for you. Unless of course, it is a truly complex task that needs to be handled. Nothing complex about removing duplicate lines of text.
Just some more 'food for thought' from me.
e.txt and f.txt testFiles
(not everybody has the third-party app, so I uploaded the resulting e.txt and f.txt testFiles)
Quote from: TimoVJL on April 29, 2024, 11:18:54 PM
Look at masmbasic, if there are already usable functions for data processing
There probably are, and they are fast, but after following 27 pages of this funny thread I still don't have the faintest idea what this guy wants or what he is doing :biggrin:
(yes I know it has something to do with finding lines in text files consisting of one Million zero bytes; right?)
Quote from: sudoku on April 30, 2024, 01:15:28 AM
Sorting lines of text should be easy
It is easy, indeed - but not fast. If I understand correctly, he has Millions of lines. Study the O(n) behaviour of sort algos...
@jj2007
Yes, sorting can be slowish. Hutch has functionality in qeditor for both ascending and descending sorts. Both are 'pretty fast', but yes, even they (or qeditor) would choke with millions of lines in the editor.
And the zeros that may be present are indeed troublesome. No normal text or string handling functions will work with those zeros present.
Yes what he wants to do is evolving over time, and getting hard to follow (it was easy in the early stages) - he is now onto deleting scores (hundreds?, thousands?, millions?) of files.
[rant] My solution would be to not put that much junk on my computer in the first place. (Dozens of full versions of Visual Studio, for example - every conceivable version of Windows OS's is another example - he posted about both) pick one that works for you and be done with it, imo. You probably won't even use most of it anyway. [rant off]
now we have dupliLine problem in f.txt, haha
a new app "deleteDupliLine" will come
Quote from: sudoku on April 30, 2024, 01:31:57 AM
Yes what he wants to do is evolving over time, and getting hard to follow
There are wonderful translation sites like Deepl.com (https://www.deepl.com/translator) - I refuse to follow a guy who throws around bits and pieces of only partly useful and/or correct information, spread over 27 pages. Kudos that you are still playing his game :thumbsup:
@jj
Just giving him some tips that might be helpful. btw he understands English well enough... although not as articulate as a native English speaker.
if you know any app that can do the "deleteDupliLine" job, just let me know.
third party app is ok.
if we know it already exist, we don't need to create it again
Quote from: learn64bit on April 30, 2024, 01:49:38 AM
if you know any app can do the "deleteDupliLine" job, just let me know.
third party app is ok.
if we know it already exist, we don't need to create it again
You can try to code it yourself. I gave some tips on how to do that already.
Sort the list first, compare one line with the line that follows it.
If the second line matches the first line, discard it... and so forth until the list has been fully traversed.
Tip: The sort and compare could be made faster, if you only need to compare the sha256sums, rather than comparing the full line (sha256sum+filename)
@Sudoku: Let me remind you that you wrote, somewhere way back there:
Quote from: sudoku on April 23, 2024, 04:37:42 PM
I can see clearly what your program is doing. If you ever get it finished and running very fast, it looks like it will be a very useful tool.
So since you "see clearly" what his code is doing, why don't you explain it to us?
As to your handwaving suggestion that I look through his code myself and try to suss it out. no: I've looked at enough of his code already, and TBH life is too fucking short for that. So please explain what it's all about, since you "see clearly what [his] program is doing", 'k?
Quote from: NoCforMe on April 30, 2024, 05:51:45 AM
So since you "see clearly" what his code is doing...
No! That quote is from days ago, and does not reflect the current state of the project.
This project is off on another tangent.
So ask learn64bit!... and stop pestering me about it. :icon_idea:
Well, regarding your suggestion, you do realize that he (learn64bit) has never once answered any questions asked of him in this thread about his code, right? So that's a non-starter.
Quote from: sudoku on April 30, 2024, 01:31:57 AM
sorting can be slowish
One option is to hash the strings: that would give you a Million DWORDs to compare, much faster than comparing strings.
One nice way to keep the hash and the index together is ArraySort (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1491) with QWORD size: you put the hash into the upper, the index into the lower DWORD; with the QWORD sort the hashes are adjacent, so that you can decide which strings to keep or delete.
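A sketch of that hash-plus-index trick in plain C, using `qsort` on 64-bit keys rather than MasmBasic's ArraySort. The choice of 32-bit FNV-1a as the hash is mine, not from the thread; any decent hash would do. Because different lines can share a hash, equal hashes only mark candidates, and the actual bytes are compared with `memcmp` before two lines are treated as duplicates. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 32-bit FNV-1a over len bytes; embedded 00h bytes are hashed like any other. */
static uint32_t fnv1a32(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;          /* FNV offset basis */
    while (len--) {
        h ^= *p++;
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* keys[i] = (hash << 32) | i. After the sort, equal hashes sit next to each
   other and the low DWORD still says which line each key came from. */
void hash_index_sort(const char *const *lines, const size_t *lens,
                     size_t count, uint64_t *keys)
{
    size_t i;

    assert(count <= UINT32_MAX);       /* index must fit in the low DWORD */
    for (i = 0; i < count; ++i)
        keys[i] = ((uint64_t)fnv1a32((const unsigned char *)lines[i], lens[i]) << 32)
                  | (uint32_t)i;
    if (count > 1)
        qsort(keys, count, sizeof *keys, cmp_u64);
}

/* Nonzero if sorted key k names the same bytes as key k-1. Only the direct
   neighbour is checked: with a hash collision inside a run, a full version
   would compare against every earlier member of that run. */
int same_as_previous(const char *const *lines, const size_t *lens,
                     const uint64_t *keys, size_t k)
{
    size_t a, b;

    if (k == 0 || (keys[k] >> 32) != (keys[k - 1] >> 32))
        return 0;
    a = (size_t)(keys[k - 1] & 0xFFFFFFFFu);
    b = (size_t)(keys[k] & 0xFFFFFFFFu);
    return lens[a] == lens[b] && memcmp(lines[a], lines[b], lens[a]) == 0;
}
```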
Quote from: NoCforMe on April 30, 2024, 06:20:17 AM
he (learn64bit) has never once answered any questions
Correct. He seems absolutely unable to explain what he wants.
@jj
I have heard about hashing, but I have absolutely no clue how/why it works or how to implement it. Not really the place for that discussion, we can save that for another time.
read people's code first, chat is second.
(I promise to read all people's code, at least 5 mins for each one)
Quote from: learn64bit on April 30, 2024, 06:31:10 AM
I promise to read all people's code
Explaining on half a page what your file format is, and what you actually want to do, in plain, clear, understandable English, with complete sentences, would be a much more intelligent option. This is reply #414 - ridiculous.
Quote from: jj2007 on April 30, 2024, 06:23:17 AM
One nice way to keep the hash and the index together is ArraySort (https://www.jj2007.eu/MasmBasicQuickReference.htm#Mb1491) with QWORD size:
Yes, but as usual that's a MasmBasic thing, not a general-purpose function.
for other people's understanding: the line is just a 4mb number, numbers are separated by two bytes 0Dh0Ah. findLine's job is to find the dupliNumbers (numbers that are in both a.txt and b.txt) and the uniqueNumbers (numbers that are only in a.txt or only in b.txt)
I think above is right.
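Taking that description at face value, here is a C sketch of the classification: sort both lists, then walk them together like a merge, so lines found in both files and lines found in only one come out of a single O(n + m) pass after the sorts. It assumes the lines are already split into NUL-terminated strings without embedded 00h bytes, and it pairs duplicates one-to-one (a line that appears twice in a.txt and once in b.txt gives one "both" entry and one "only in a" entry). All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sorts a[] and b[] in place, then walks both like a merge step:
   equal lines go to both[] (one entry per matched pair), the rest to
   only_a[] or only_b[]. both[] and only_a[] need room for na entries,
   only_b[] for nb entries. */
void classify_lines(const char **a, size_t na, const char **b, size_t nb,
                    const char **both, size_t *n_both,
                    const char **only_a, size_t *n_only_a,
                    const char **only_b, size_t *n_only_b)
{
    size_t i = 0, j = 0;

    assert(n_both != NULL && n_only_a != NULL && n_only_b != NULL);
    *n_both = *n_only_a = *n_only_b = 0;
    if (na > 1) qsort(a, na, sizeof *a, cmp_str);
    if (nb > 1) qsort(b, nb, sizeof *b, cmp_str);
    while (i < na && j < nb) {
        int c = strcmp(a[i], b[j]);
        if (c == 0) {                 /* in both files */
            both[(*n_both)++] = a[i];
            ++i;
            ++j;
        } else if (c < 0) {          /* a[i] cannot appear later in b */
            only_a[(*n_only_a)++] = a[i++];
        } else {                     /* b[j] cannot appear later in a */
            only_b[(*n_only_b)++] = b[j++];
        }
    }
    while (i < na) only_a[(*n_only_a)++] = a[i++];   /* leftovers */
    while (j < nb) only_b[(*n_only_b)++] = b[j++];
}
```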
Quote from: NoCforMe on April 30, 2024, 06:57:48 AM
Yes, but as usual that's a MasmBasic thing, not a general-purpose function.
You can use any sort function that handles QWORDs.
Quote from: learn64bit on April 30, 2024, 07:00:48 AM
the line is just a 4mb number, numbers are separated by two bytes
Plain nonsense. A "line" is not a "number". In arrays, numbers are not separated by anything, they are just adjacent.
everybody has their own understanding.
you can have your own
for me it's just 4mb data separated by 0Dh0Ah. that's all.
no line, no number, no ascii, no text, no format, no ... just 4mb data.
what it is? It is of no importance
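If the data really is just "bytes separated by 0Dh 0Ah", the splitter has to work from lengths rather than zero terminators, since a line may consist of nothing but 00h. A C sketch with hypothetical names; it skips empty pieces and treats a lone 0Ah or 0Dh as ordinary data, so on the 8-byte b.txt example from the start of the thread (0Dh 0Ah 0Dh 0Ah 00h 0Dh 0Ah 0Ah) it yields the two pieces 00h and 0Ah:

```c
#include <assert.h>
#include <stddef.h>

/* One piece of the buffer: start pointer plus byte count. The bytes may
   include 00h, so nothing here relies on a terminating zero. */
struct span {
    const unsigned char *ptr;
    size_t len;
};

/* Splits buf[0..len) on the two-byte separator 0Dh 0Ah, skipping empty
   pieces. A lone 0Dh or 0Ah is ordinary data. Stores at most max spans
   in out and returns the total number of pieces found. */
size_t split_crlf(const unsigned char *buf, size_t len,
                  struct span *out, size_t max)
{
    size_t start = 0, i = 0, n = 0;

    assert(buf != NULL || len == 0);
    while (i + 1 < len) {
        if (buf[i] == 0x0D && buf[i + 1] == 0x0A) {
            if (i > start) {                   /* non-empty piece */
                if (n < max) {
                    out[n].ptr = buf + start;
                    out[n].len = i - start;
                }
                ++n;
            }
            i += 2;                            /* skip the separator */
            start = i;
        } else {
            ++i;
        }
    }
    if (len > start) {                         /* last piece, no trailing CRLF */
        if (n < max) {
            out[n].ptr = buf + start;
            out[n].len = len - start;
        }
        ++n;
    }
    return n;
}
```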
looks like no third party app is available for f.txt.
okay I will create the "deleteDupliLine"
I think we should use multiThreading in deleteDupliLine app.
otherwise it will take 87 hours...
87 hours is too long for most people.
maybe 30 mins is ok
what I am going to do:
inputFile
f.txt
outputFiles
g.txt
all dupliLines goes
h.txt
all nonDupliLines goes
(this lets people verify the result easily; then use "copy /b" on g.txt and h.txt to get i.txt for the deleteEmptyFolder app)
deleteDupliLineV422
(.exe only, 1+5 threads)
new problem.
we should deleteParentIfChildNotExist.
hope third party app can do it
(https://i.postimg.cc/xcQT6rvT/1.jpg) (https://postimg.cc/xcQT6rvT)
Quote from: hutch-- on November 29, 2022, 10:04:53 AM
If I had a buck$ for every incomprehensible question ever posted, I would own 2 Cadillacs, a house on the Riviera and at least 2 Lear jets.
Wow, I can't believe this nonsense thread is that old :badgrin:
(no g.txt and h.txt) it's too complicated. maybe we just simply feed f.txt to deleteFolder
This guy (yes, you, "learn64bit") just keeps plowing ahead like some mindless bulldozer scraping over the same ground again and again. "a.txt", "b.txt", "c.txt" ... no doubt to extend to "x.txt", "y.txt" and "z.txt". No explanation, none, of what these files are, what they do, what their purpose or format is. Plus no explanation, none, of what the code does and what its purpose is. And of course well-meaning but gullible members are getting sucked into this gigantic whirlpool as well.
More popcorn!
deleteEmptyFolderV427
(input file is f.txt)
(.exe only)
Quote from: sinsi on April 28, 2024, 12:24:55 AM
Of course that assumes that the results are valid
90% of people think it's valid, because c+d=a.
after I finish the deleteFile app, the result can be verified.
I know I'm late, because I took a different way to verify it.
(what am I thinking? it's for fun! haha)
Since there is no code in your fastest program to check the results for validity, it is normal for people to question the validity of the results.
You said the slowest code took 87 hours :dazzled: to finish. Now on sinsis computer, presumably the same functions (but faster code) took mere seconds.
On my computer a few minutes.
Just because someone 'thinks' it is valid is not enough, results must be verified. Especially since it is much, much faster than the original. (87 hours -> 10 seconds)... sheesh.
This is the short explanation. Would be pages, for the longer version. But my time is limited.
I have truncated the post to the bare essence of what was originally posted here, for brevity.
deleteFilesV430
(inputfile is e.txt)
(.exe only)
yeee hooo! finally I can answer the question.
sinsi's c.txt and d.txt are valid.
the result files have been verified.
now we can try other ideas to speedup the findLine app
q: how you verify the result?
a: verified existing files
(https://i.postimg.cc/K4kvh5kT/0.jpg) (https://postimg.cc/K4kvh5kT)