Adventures in programming: Treeviews, directory scanning, recursion

NoCforMe · April 05, 2024, 10:36:19 AM

(putting this in the Windows API forum since it deals mostly with Win32 API topics)

Note: Long post. If you don't like to read much, you might want to move on to something shorter.

Someone suggested to me an idea for an application that would let them annotate pictures, basically associate a block of text with a picture. That sounded intriguing to me, so I started thinking about the design of such a program. I imagined an Explorer-like interface showing folders and picture files, and it seemed to me that a treeview control might be the way to go.

I'd never used a treeview before; I've seen them, of course, and was familiar with the basic idea. But given the horrors of dealing with certain Windows controls (listview, here's looking at you!), I was kind of hesitant to jump into this new thing to me.

Turns out I shouldn't have been worried: the whole project (so far) has been extremely pleasurable, with success after success, including implementing the treeview. You know how some projects just fight you every step of the way? The documentation is sketchy, missing or downright wrong; things don't work the way they're supposed to; there are lots of hidden "gotchas" to make life interesting for you. None of that here.

In fact, the treeview control worked perfectly the first time I coded it! That hardly ever happens.

I thought a treeview would be appropriate since it follows the structure of a file system, folders and files, and this is in fact the case. (Of course, that's one of the classic uses of a treeview, to show directory structures.) I thought that perhaps this could be done with some other control type, like, say, a listbox with a lot of custom-draw stuff added to it. But the treeview was an absolute pleasure to work with. This is obviously a very well-designed and well implemented piece of work. It's complex: just think about all the code behind all that drawing and placement, and the capability of expanding or collapsing sections. I'd say it's somewhere between a listview and a RichEdit control in terme of complexity.

So: first task was to do a directory scan to feed the treeview. I'd done this previously with my file-finder program, using FindFirstFile() and FindNextFile() to dive into a directory structure. After not too much trouble I came up with a simple and reliable function to do this, called GetAllDirsAndFiles(). Turns out, of course, that recursion is your friend here. In fact, with recursion, it's dead simple to do this.

In pseudocode:

Code Select

Start @ "root" folder:
GetAllDirsAndFiles (root)

while ! NoMoreFiles
(  FindFirstFile()
  if NoMoreFiles
      exit
  filter files (exclude ".", "..", and non-image files)
  add folder or file to file list
  FindNextFile()
)
if AnyFilesFound
(  for each folder found
  GetAllDirsAndFiles (folder found)
)

That's it, minus a couple details. (Complete code and explanation below.)

I use a list of structures to store folders and files:

Code Select

FILEINFO    STRUC
  pathName    DB MAX_PATH DUP(?)
  rootLoc    DD ?
  textPtr    DD ?
  parent    DD ?
  tvHandle    HWND ?
  flag        DD ?
FILEINFO    ENDS

pathName: The full pathname to the folder or file
rootLoc: Offset to the "root" name (name.ext)
textPtr: Pointer to block of text (not yet implemented)
parent: Pointer to parent folder
tvHandle: Handle of this treeview item
flag: Attributes of this entry

Now, that first item causes me some concern about memory requirements, as MAX_PATH is 240 bytes. That's a lot of memory if you're dealing with thousands (or tens of thousands) of files. I thought of alternative schemes, where only the "root" name would be stored in the FILEINFO entry, and you could construct the full pathname by following the parent pointers. But besides being a pain in the ass to code, you'd still probably need MAX_PATH characters to store even just a filename, let alone a full pathname. So at the end of the day, since our computers have so much memory available nowadays, maybe this shouldn't be a source of worry. So I'm OK with this scheme.

OK, now we have an in-memory list of all our files and folders. But how to get this into a treeview? My first thought was to integrate populating the treeview with the directory scan, creating treeview entries from each folder or file found.

Long story short, it turns out to be much easier to break this process into two parts: first scanning the directory structure, then creating the treeview from this list. And it turned out to be so easy!

All that needs to be done is to go through the file list sequentially, creating treeview items from each item. Because of the sequence of scanning, the treeview will be perfect. For each folder found, create a treeview item with children; for each file, create one without children. (Well, almost; read on for details.) Because each file list entry has a parent pointer, this gives us the handle of the parent treeview item we need to create a new treeview item. Perfect!

Well, almost perfect. My first design gave me the right directory structure, except for one thing: instead of all folders being listed first and then files, the two were intermixed in alpha-sort order. Not what I wanted. This was due to the fact that folders and files arrive in alpha-sort order, not folders first, then files.

Hmmm; scratch my head a few minutes. Hey, what if I ... rearrange the file list so that folders are first? This turned out to be trivially easy, and gives the correct order of folders and files.

See the before and after shots of file order below.

Another thing: since I was telling Windows that all folders had children while all files didn't, all folders looked as if they could be expanded, even if they were empty. Hmmm, not sure how to fix that; it seems like a time machine would come in handy here, since how do I know whether there are any children in a folder until I come to them?

This problem was also trivially easy to solve: that's where the flag member of FILEINFO comes in. As I'm processing a folder in GetAllDirsAndFiles(), I have access to both the parent (the folder being processed) and any children (folders and files found in the parent). So what I do is initially create all folders with "no children", but then when I find any items within (folders or files), I change the flag to "has children". And then in the 2nd phase, populating the treeview, each folder item gets created corrrectly.

Here's before and after for that problem.

Here's the code for the folder/file scanner:

Code Select

;============================================
; GetAllDirsAndFiles (fiPtr)
;
; New version (1 pass/folder instead of 2)
;============================================

GetAllDirsAndFiles	PROC fiPtr:DWORD
	LOCAL	findHandle:HANDLE, folderCount:DWORD, fiSavePtr:DWORD, rootLoc:DWORD
	LOCAL	totalCountThisDir:DWORD, fi:FILEINFO, fd:WIN32_FIND_DATA, pathBuffer[256]:BYTE

	CMP	NoMoreObjectsFlag, TRUE
	JE	exit99

; Get location for all appendages to incoming pathname (folder & filenames):
	MOV	EDX, fiPtr
	LEA	EAX, [EDX].FILEINFO.pathName
	CALL	strlen
	INC	EAX			;+1 for "\".
	MOV	rootLoc, EAX

	MOV	folderCount, 0
	MOV	totalCountThisDir, 0
	MOV	EAX, NextFileEntryPtr
	MOV	fiSavePtr, EAX		;Save place where these dirs. will be stored.

	MOV	EDX, fiPtr
	INVOKE	strcpy, ADDR [EDX].FILEINFO.pathName, ADDR pathBuffer
	INVOKE	_strcat, OFFSET WildcardStr, ADDR pathBuffer
	INVOKE	FindFirstFile, ADDR pathBuffer, ADDR fd
	CMP	EAX, ERROR_NO_MORE_FILES
	JE	exit99
	TEST	EAX, EAX
	JZ	exit99
	MOV	findHandle, EAX

; Filter out "." and ".." folders:
filter:	INVOKE	strcmpi, ADDR fd.cFileName, OFFSET OneDotDirStr
	TEST	EAX, EAX
	JNZ	nxtfil
	INVOKE	strcmpi, ADDR fd.cFileName, OFFSET TwoDotDirStr
	TEST	EAX, EAX
	JNZ	nxtfil

; Filter out any bogus files whose name is (or starts with) ':'"
	CMP	fd.cFileName, ':'
	JE	nxtfil

; If it's a file, filter on valid extensions:
	TEST	fd.dwFileAttributes, FILE_ATTRIBUTE_DIRECTORY
	JNZ	@F
	LEA	EAX, fd.cFileName
	CALL	FilterFiles
	TEST	EAX, EAX			;Result of extension match.
	JZ	nxtfil

; Add folder or file to file list:
; Check for too many objects:
@@:	CMP	TotalObjectCount, $maxObjects
	JNE	@F
	INVOKE	MessageBox, NULL, OFFSET TooManyObjectsMsg, NULL, MB_OK
	MOV	NoMoreObjectsFlag, TRUE
	JMP	exit99				;That's it, folks.

@@:	PUSH	EBX
	MOV	EBX, NextFileEntryPtr
	MOV	EAX, rootLoc
	MOV	[EBX].FILEINFO.rootLoc, EAX
	MOV	EDX, fiPtr			;Get ptr. to parent.
	MOV	[EBX].FILEINFO.parent, EDX
	OR	[EDX].FILEINFO.flag, $hasKids	;Indicate that the parent has children.
	INVOKE	strcpy, ADDR [EDX].FILEINFO.pathName, ADDR [EBX].FILEINFO.pathName
	INVOKE	_strcat, OFFSET BackslashStr, ADDR [EBX].FILEINFO.pathName
	INVOKE	_strcat, ADDR fd.cFileName, ADDR [EBX].FILEINFO.pathName

; Folder or file?
	TEST	fd.dwFileAttributes, FILE_ATTRIBUTE_DIRECTORY
	JZ	isfile

; It's a folder:
	INC	folderCount
	MOV	EAX, $folder
	JMP	SHORT @F

; It's a file:
isfile:	INC	FileCount
	XOR	EAX, EAX

@@:	MOV	[EBX].FILEINFO.flag, EAX
	POP	EBX

	INC	TotalObjectCount
	INC	totalCountThisDir

	ADD	NextFileEntryPtr, SIZEOF FILEINFO

; Get next file:
nxtfil:	INVOKE	FindNextFile, findHandle, ADDR fd
	TEST	EAX, EAX
	JZ	dodirs
	CMP	EAX, ERROR_NO_MORE_FILES
	JNE	filter

; Reorder list so that folders are first, then files:
dodirs:	MOV	EAX, totalCountThisDir
	TEST	EAX, EAX
	JZ	@F
	MOV	EDX, fiSavePtr
	CALL	ReorderFilist

@@:	MOV	ECX, folderCount
	ADD	TotalFolderCount, ECX
	JECXZ	exit99

	PUSH	EBX
	MOV	EBX, fiSavePtr			;Pt. to list of new folders & files.

dofldr:	PUSH	ECX

; Now recurse into folders:
foldlp:	TEST	[EBX].FILEINFO.flag, $folder
	JZ	skipfl

	INVOKE	GetAllDirsAndFiles, ADDR [EBX].FILEINFO.pathName
	POP	ECX
	ADD	EBX, SIZEOF FILEINFO
	LOOP	dofldr
	POP	EBX
	JMP	SHORT exit99

skipfl:	ADD	EBX, SIZEOF FILEINFO
	JMP	foldlp

exit99:	RET

GetAllDirsAndFiles	ENDP

There's more to the story. I'm going to stop here and continue with another post for Part 2.

NoCforMe · April 05, 2024, 11:00:03 AM

Some juicy details about using treeview controls:

To create a treeview item:

Start at the "root", of course
For each item, fill out a TVINSERTSTRUCT structure.
This includes a nested TVITEM (or TVITEMEX) structure that contains the details, like the text pointer, whether the item has children, etc.
You need to set the mask member (it's _mask for MASM) to tell which members you're setting in the structure
Use the TVM_INSERTITEM message to create the item.

To clear a treeview control:
Easy-peasy; just use the TVM_DELETEITEM message with a value of TVI_ROOT (or zero).

To expand (or collapse) a treeview at any point in the tree:
Use the TVM_EXPAND message, naturally. (If you don't expand the control it'll come up collapsed when you create it.)

Treeview styles:
You probably want all these styles:

TVS_HASBUTTONS creates "buttons" for nodes (with or without plus sign)
TVS_HASLINES gives you those cool connecting lines between items
TVS_LINESATROOT shows a line at the "root" (top of the tree)

So how do you track treeview items to your data structures, like my folder/file list? Well, like so many Win32 structures or functions, TVITEM(EX) contains a handy-dandy lParam element, which is just a DWORD. I use this to hold a pointer to the corresponding FILEINFO element for each treeview item. Works like a charm.

NoCforMe · April 05, 2024, 11:25:02 AM

Issues:

After coming up with a working demo of at least part of this app (a picture & text associator, remember), several issues have emerged:

Speed: Scanning directory structures takes some time. Sometimes lots of time.
Text storage: Storing all that text to associate with pictures is not going to be trivial. I'm still throwing around ideas for how this will work.
Data storage: By which I mean storing the whole shebang to disk so it can be loaded later. All kinds of problems here!
Stack considerations

I'll cover the speed considerations here.

Interesting detail: my first version of the folder/file scanner (code posted in 1st post above) was more complicated: I did two passes through each folder I was scanning, first for folders, then for files. Not only was this overly complex, but more importantly it was slower than hell! On a path containing a little more than 4,000 files it took about 45 seconds to grind through all that data! But when I simplified it to the one-pass version posted here, the time went down ... to less than 1 second! I must have been tripping up the file-search process something terrible; it may be that it expects a certain sequence of searching (getting all folders and files in a folder in one batch), and gets all screwed up with another sequence.

Even now it can take a long time with a lot of files. If I scan my entire working disk (~28,000 files in ~16,000 folders) it still takes almost 30 seconds.

Which is way too long to leave a user looking at an unresponsive program (which will probably show "Not responding" in its title bar). I know how to fix this: run the directory scan in a separate thread. And give the user some indication of activity, like a progress bar.

Oh well, this probably is never going to be a commercial product anyway ...

I wonder if those newfangled "shell" functions are any faster?

sinsi · April 05, 2024, 11:42:14 AM

pathName should be easy enough to get, assuming the node text is the directory name.

Code Select

GetNodeFullPath PROC USES EBX path:PTR,node:HTREEITEM
    ;path = buffer large enough (MAX_PATH presumably)
    ;node = the node with the file name
    mov     ebx,node
 @@:
    INVOKE  PrependNodeText,ebx,path
    INVOKE  SendMessage,treeview,TVM_GETNEXTITEM,TVGN_PARENT,ebx
    mov     ebx,eax
    test    eax,eax
    jnz     @b
    ret
GetNodeFullPath ENDP

NoCforMe · April 05, 2024, 12:00:28 PM

Yes, that's another way it could be done; really 6 of one/half a dozen of the other, as I have those (partial) pathnames in my file list, and all I have to do is access my parent pointer.

Quote from: sinsi on April 05, 2024, 11:42:14 AMpathName should be easy enough to get, assuming the node text is the directory name.

Actually, it's not: I'm only storing the "root" name (name + .ext), at least in the treeview (for display). But I store the entire pathname for the node in my own file list. And I'd only have to go back one level to a file's parent to construct the entire pathname, not all the way back to the root. (That's what all those MAX_PATH elements are used for.)

Pet peeve: I hate that word "prepend". So programmer-ish. The right word would be prefix.

NoCforMe · April 05, 2024, 12:07:26 PM

Issue: Data Storage

By this I mean storing the whole shebang, file list and associated text, after scanning and user input. Now it's easy enough to come up with a file format for this; it's essentially a database with a tree structure having pointers into a text heap of some kind.

What's a much more serious problem is what you're going to do with that database once you load it. Say the user picks an entry point in the directory structure on their disk, scans it, inputs a bunch of text to go with pictures and saves it. Later they start the program and reload the database. But in the meantime pictures have been moved, deleted and added; how the hell do you synchronize that mess? Ugh.

I suppose my first pass will only work with files (pictures) that are where they were before, and just ignore the others. But even this will mean having to confirm that those files still exist in their original locations.

Design; it's a bitch, but it's absolutely necessary. (Unless you don't mind a program that was obviously designed by a programmer!)

sinsi · April 05, 2024, 12:09:00 PM

QuoteAlthough it sounds correct, prepend is not an English word. It was created to sound like the opposite of "append," which means to add to the end.

Every day we learn a new thing

NoCforMe · April 05, 2024, 12:18:19 PM

Hopefully, yes.

OK, I'm re-attaching the demo executable here. I found the problem, a very stupid error made by one of our low-level workers here (they have been severely reprimanded and hopefully won't make that mistake again!). So give it a try; you can click on anything without it crashing. (I'm pretty sure.)

sinsi · April 05, 2024, 01:28:31 PM

Code Select

(4ad0.4fc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000001 ebx=00000001 ecx=00000001 edx=00000000 esi=004010f0 edi=00550a52
eip=00401183 esp=0019f144 ebp=0019f490 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
PixOrg+0x1183:
00401183 f7821401000001000000 test dword ptr [edx+114h],1 ds:002b:00000114=????????

Click the + to open a node, bang!

NoCforMe · April 05, 2024, 01:33:34 PM

Yikes. Sorry about that.

Looks like I was testing the "folder" bit there with EDX=0. Hmmmm ... back to the lab.

Well, folks, looks like development is at a standstill until we get this bug report resolved here at NoCforMe Laboratories, GmbH.

Found what I think was the problem (I didn't set the pointer to my file list for the root treeview item, and I must have never clicked on that). Updated version attached to reply #7 above.

sinsi · April 05, 2024, 02:13:03 PM

Code Select

(2b24.a20): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for e:\Desktop\PixOrg.exe
eax=00000001 ebx=00000001 ecx=00000001 edx=00000000 esi=004010f0 edi=00240aea
eip=00401183 esp=0019f144 ebp=0019f490 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
PixOrg+0x1183:
00401183 f7821401000001000000 test dword ptr [edx+114h],1 ds:002b:00000114=????????

Not a cut'n'paste

sinsi · April 05, 2024, 02:24:42 PM

Further playing
- if I select a node before trying to open it, all is OK
- I can open a node that isn't selected, subject to the above
- image preview works
- clicking the root node is an instant crash at the same code address, irrespective of the above

Greenhorn · April 05, 2024, 03:22:01 PM

Quote from: NoCforMe on April 05, 2024, 11:25:02 AM
Speed: Scanning directory structures takes some time. Sometimes lots of time.
Text storage: Storing all that text to associate with pictures is not going to be trivial. I'm still throwing around ideas for how this will work.
Data storage: By which I mean storing the whole shebang to disk so it can be loaded later. All kinds of problems here!
Stack considerations

What I would try to do ...

Speed: First scan root folder only (or just the first subnode, too) and create the treeview items
Speed: When opening a node (TVN_ITEMEXPANDING), scan the subfolder and create the treeview child items for that node (or the next level)
Data/Text storage: When creating a treeview item, store a pointer to an info structure (your data for the pic file) for each item in lParam of the TVITEM(EX)

NoCforMe · April 05, 2024, 03:42:02 PM

Ah, incremental loading; interesting. I'll think about that.

That's pretty much what I had in mind for storing the whole mess on disk.

I re-attached the demo (fixed the crash bug) in post #7 above if anyone wants to play with this.

sinsi · April 05, 2024, 04:02:54 PM

OK, it works, but...there's no indication that a node is a folder or a picture file.
I can expand everything, there are 10 pages in the tree, but what are pictures? Data overload.
Use a listview. I found them easier to use than a treeview

Or an imagelist for the nodes.

The MASM Forum

News:

Adventures in programming: Treeviews, directory scanning, recursion

NoCforMe

NoCforMe

NoCforMe

sinsi

NoCforMe

NoCforMe

sinsi

NoCforMe

sinsi

NoCforMe

sinsi

sinsi

Greenhorn

NoCforMe

sinsi