Get Files and GetFolders

Started by clamicun, August 20, 2017, 08:41:22 AM

clamicun

Good evening jj,
Here is one which doesn't let me sleep.

I have two methods to search for files on a disk/folder.

1. GetMyFiles uses the macro GetFiles
GetFiles "C:\*.xxx"

2. GetMyFolders uses the macro GetFolders
GetFolders "C:\"
Then it uses INVOKE FindFirstFile,"offset Files$(ecx)",offset wfd

GetFiles is faster, but GetFolders has the advantage that you only have to run it once when searching the same disk/folder.
After that you can reuse the saved results, Files$(ecx) and the number of folders, so subsequent searches are much faster.
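To make the two strategies concrete, here is a rough sketch in Python rather than MasmBasic. It is not the code of GetFiles or GetFolders; the function names are made up for illustration, and it only shows the idea of a one-shot search versus a cached folder list that is reused for each new pattern.

```python
import fnmatch
import os

def find_files_direct(root, pattern):
    """Method 1 (hypothetical name): walk the tree and match names on the fly."""
    hits = []
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                hits.append(os.path.join(folder, name))
    return sorted(hits)

def collect_folders(root):
    """Method 2, slow first step: list every folder once and keep the list."""
    return [folder for folder, _subdirs, _names in os.walk(root)]

def find_files_in_folders(folders, pattern):
    """Method 2, fast repeat step: reuse the saved folder list for a new pattern."""
    hits = []
    for folder in folders:
        try:
            entries = list(os.scandir(folder))
        except OSError:
            continue  # folder vanished or is not accessible
        for entry in entries:
            if entry.is_file() and fnmatch.fnmatch(entry.name.lower(), pattern.lower()):
                hits.append(entry.path)
    return sorted(hits)
```

If both methods are correct, `find_files_direct(root, "*.exe")` and `find_files_in_folders(collect_folders(root), "*.exe")` should return the same list, which is exactly the check that fails "sometimes" here.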

My problem:
The results are mostly the same, but not always!
An error is an error and you have to find it, but an error that shows up only sometimes is always bad.

It seems that searching for text-type files (*.txt, *.doc, *.htm, *.pdf and others) gives correct results.
Searching for *.exe, *.dll, *.sys and others can produce different results.

After a couple of days of experimenting, I have to ask you if you could take a look at the two sources.
No hurry.
My machine is Win7 32-bit.

jj2007

Quote from: clamicun on August 20, 2017, 08:41:22 AM
GetFiles is faster, but GetFolders has the advantage that you only have to run it once when searching the same disk/folder.
After that you can reuse the saved results, Files$(ecx) and the number of folders, so subsequent searches are much faster.
...
Searching *.exe *.dll *.sys and others can produce different results.

Hi Clamicun,

2. I made some quick tests, see screenshots below, and the result is an identical count of exe and dll files.
1. The three boxes are in chronological order, and you can see that GetFolders is very slow indeed, but only the first time. Afterwards, everything is in the cache, so GetFiles and GetFolders are about equally fast.

Problems with exe, sys and dll files can have several causes (and they are most likely not a problem of the algo!), but to investigate, one would have to see which files exactly have not been found on your machine. A simple Store "GetFiles.txt", Files$() may help.
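Once both applications have written their results to text files, finding the files that differ is a simple set comparison. A minimal sketch in Python, assuming one full path per line and UTF-8 text; the helper names are hypothetical, and paths are lower-cased because Windows paths are case-insensitive:

```python
def load_names(path):
    """Read one file name per line; ignore blank lines and case differences."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return {line.strip().lower() for line in f if line.strip()}

def compare_results(file_a, file_b):
    """Return (names only in file_a, names only in file_b), each sorted."""
    a, b = load_names(file_a), load_names(file_b)
    return sorted(a - b), sorted(b - a)
```

For example, `compare_results("GetFiles.txt", "GetFolders.txt")` would list the entries each run missed (the second file name is just an assumed counterpart to the Store output).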

Btw the little apps are well done :t

P.S.: I just see that you do save the results. Post the ones that differ...

clamicun

jj,
thanks.
Here is a screenshot.
I'll try to check which *.exe are lacking.

jj2007

Quote from: clamicun on August 21, 2017, 02:26:02 AM
I'll try to check which *.exe are lacking.

That will be difficult because you save in MyFolders.txt the folder names only, in MyFiles.txt the file names...

On my machine, the difference is 8 files among 17,000, 0.047%, see screenshot.
Hidden and read-only files seem not to be affected; the count is equal for both applications.
But one culprit is already identified: Your FindFirstFile will not find files with non-Latin alphabet names. GetFiles() is Unicode aware, filenames are returned in UTF8 format.
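To get a feeling for which names are at risk, one can check whether a name survives a round trip through an ANSI code page. The sketch below is Python, not the Windows API, and it assumes code page 1252 (Western European); the function name is hypothetical. It does not reproduce what FindFirstFileA does internally, it only shows which names cannot be represented without loss.

```python
def survives_ansi(name, codepage="cp1252"):
    """True if the name can be written in the given ANSI code page without loss."""
    try:
        return name.encode(codepage).decode(codepage) == name
    except UnicodeEncodeError:
        # At least one character has no equivalent in this code page.
        return False

def names_at_risk(names, codepage="cp1252"):
    """Filter the names that an ANSI-only search would likely mishandle."""
    return [n for n in names if not survives_ansi(n, codepage)]
```

A name like `setup.exe` or `Müller.exe` passes under cp1252, while a Cyrillic or Chinese name does not.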

However, that doesn't explain why your folders.exe finds more files for my drive C: :(

What comes to my mind is "super hidden" files, or double-counting, e.g. because of links. But really, without identifying which files are not found in either application, there is no chance to find the reason.
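The double-counting idea can be illustrated with a small Python sketch (again not the code of either application, and the function names are invented). If a search follows links, a linked folder is walked a second time and its files are counted twice; resolving every path to its real location removes the duplicates.

```python
import os

def count_files(root, follow_links):
    """Count files under root. With follow_links=True a linked folder is
    walked again, so its files are counted twice. Assumes no link cycles."""
    total = 0
    for _folder, _subdirs, names in os.walk(root, followlinks=follow_links):
        total += len(names)
    return total

def count_unique_files(root):
    """Count each real file once, even if it is reachable through links."""
    seen_dirs, seen_files = set(), set()
    for folder, subdirs, names in os.walk(root, followlinks=True):
        real_folder = os.path.realpath(folder)
        if real_folder in seen_dirs:
            subdirs[:] = []  # already visited via another path: do not descend
            continue
        seen_dirs.add(real_folder)
        for name in names:
            seen_files.add(os.path.realpath(os.path.join(folder, name)))
    return len(seen_files)
```

If `count_files(root, True)` is larger than `count_unique_files(root)`, links are a plausible source of the extra hits.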

clamicun

Quote jj "That will be difficult because you save in MyFolders.txt the folder names only, in MyFiles.txt the file names"

I know. That's why I said "I'll try to find".
But anyway, I'll try and let you know.

Btw, you have patience... 741 seconds is a lot.

jj2007

Yes, my machine is not as virgin as yours :P

Attached a "folders" version that writes the filenames, too. Please run it in parallel with your files.exe and post me the two results here (I can delete them shortly after).

clamicun

Wrote a similar thing.
Here are lots of those files which GetFolders found and GetFiles did not.
Maybe it tells you something.

jj2007

Quote from: clamicun on August 21, 2017, 11:48:16 AM
Wrote a similar thing.
Here are lots of those files which GetFolders found and GetFiles did not.
Maybe it tells you something.

"lots of"? I thought the difference was only 8 files...

About half of these files have an embedded .exe, but it seems that both versions don't find them (which is correct behaviour - we are searching for extensions at the end of the file):
C:\Dokumente und Einstellungen\All Users\Microsoft\Windows\WER\ReportQueue\AppCrash_mbamservice.exe_232465cb478ff41a49642fc256ef2b034d26f94_cab_05f1365c
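The difference between "extension at the end" and "extension somewhere in the name" is easy to show. A minimal Python sketch with hypothetical helper names; the first path is the one quoted above, the second is an invented example:

```python
def has_extension(path, ext):
    """True only if the path ends with the extension (case-insensitive)."""
    return path.lower().endswith(ext.lower())

def contains_extension(path, ext):
    """True if the extension text appears anywhere in the path."""
    return ext.lower() in path.lower()

crash_report = (r"C:\Dokumente und Einstellungen\All Users\Microsoft\Windows\WER"
                r"\ReportQueue\AppCrash_mbamservice.exe_232465cb478ff41a49642fc256ef2b034d26f94_cab_05f1365c")
real_exe = r"C:\Temp\setup.exe"  # hypothetical path, for comparison only
```

Here `has_extension(crash_report, ".exe")` is false while `contains_extension(crash_report, ".exe")` is true, so a search that matches the substring instead of the ending would report such names as extra hits.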

Any specific characteristics of the setup.exe files, such as hidden, read-only, ...?

clamicun

C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Text_Tools\OpenOffice 4.1.2 (de) Installation Files\setup.exe

The folder C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Text_Tools\OpenOffice 4.1.2 (de) Installation Files is readonly.

But speaking of the embedded .exe names, my GetFolders app does find them??
Why do FindFirstFile/FindNextFile find them?

Well, it's OK now. I know that there is no mistake in my search program. (Just for fun I include it. Maybe you have some hints.)

I cleaned up the machine and killed all those strange files. The two apps now show the same results.

jj2007

Quote from: clamicun on August 21, 2017, 08:58:57 PM
I cleaned up the machine and killed all those strange files. The two apps now show the same results.

Glad you solved your issue. I've tried pretty much everything now, hidden files and folders, read-only, not ready for archiving - no differences.

As written above, the only point that makes GetFiles() different from ordinary FindFirstFileA is that GetFiles handles Unicode filenames properly. And that is clearly a feature, not a bug ;)

clamicun

Did you run my CMC_Search.exe?
The only reason I hide the main window is this:
if you touch it during GetFolders and the search that follows, the program might crash.