News:

Masm32 SDK description, downloads and other helpful links
Message to All Guests
NB: Posting URL's See here: Posted URL Change

Main Menu

Adventures in programming: Treeviews, directory scanning, recursion

Started by NoCforMe, April 05, 2024, 10:36:19 AM

Previous topic - Next topic

NoCforMe

(putting this in the Windows API forum since it deals mostly with Win32 API topics)

Note: Long post. If you don't like to read much, you might want to move on to something shorter.

Someone suggested to me an idea for an application that would let them annotate pictures, basically associate a block of text with a picture. That sounded intriguing to me, so I started thinking about the design of such a program. I imagined an Explorer-like interface showing folders and picture files, and it seemed to me that a treeview control might be the way to go.

I'd never used a treeview before; I've seen them, of course, and was familiar with the basic idea. But given the horrors of dealing with certain Windows controls (listview, here's looking at you!), I was kind of hesitant to jump into this new thing to me.

Turns out I shouldn't have been worried: the whole project (so far) has been extremely pleasurable, with success after success, including implementing the treeview. You know how some projects just fight you every step of the way? The documentation is sketchy, missing or downright wrong; things don't work the way they're supposed to; there are lots of hidden "gotchas" to make life interesting for you. None of that here.

In fact, the treeview control worked perfectly the first time I coded it! That hardly ever happens.

I thought a treeview would be appropriate since it follows the structure of a file system, folders and files, and this is in fact the case. (Of course, that's one of the classic uses of a treeview, to show directory structures.) I thought that perhaps this could be done with some other control type, like, say, a listbox with a lot of custom-draw stuff added to it. But the treeview was an absolute pleasure to work with. This is obviously a very well-designed and well implemented piece of work. It's complex: just think about all the code behind all that drawing and placement, and the capability of expanding or collapsing sections. I'd say it's somewhere between a listview and a RichEdit control in terme of complexity.

So: first task was to do a directory scan to feed the treeview. I'd done this previously with my file-finder program, using FindFirstFile() and FindNextFile() to dive into a directory structure. After not too much trouble I came up with a simple and reliable function to do this, called GetAllDirsAndFiles(). Turns out, of course, that recursion is your friend here. In fact, with recursion, it's dead simple to do this.

In pseudocode:
Start @ "root" folder:
GetAllDirsAndFiles (root)

while ! NoMoreFiles
(  FindFirstFile()
  if NoMoreFiles
      exit
  filter files (exclude ".", "..", and non-image files)
  add folder or file to file list
  FindNextFile()
)
if AnyFilesFound
(  for each folder found
  GetAllDirsAndFiles (folder found)
)

That's it, minus a couple details. (Complete code and explanation below.)

I use a list of structures to store folders and files:
FILEINFO    STRUC
  pathName    DB MAX_PATH DUP(?)
  rootLoc    DD ?
  textPtr    DD ?
  parent    DD ?
  tvHandle    HWND ?
  flag        DD ?
FILEINFO    ENDS
  • pathName:  The full pathname to the folder or file
  • rootLoc:  Offset to the "root" name (name.ext)
  • textPtr:  Pointer to block of text (not yet implemented)
  • parent:    Pointer to parent folder
  • tvHandle:  Handle of this treeview item
  • flag:      Attributes of this entry

Now, that first item causes me some concern about memory requirements, as MAX_PATH is 240 bytes. That's a lot of memory if you're dealing with thousands (or tens of thousands) of files. I thought of alternative schemes, where only the "root" name would be stored in the FILEINFO entry, and you could construct the full pathname by following the parent pointers. But besides being a pain in the ass to code, you'd still probably need MAX_PATH characters to store even just a filename, let alone a full pathname. So at the end of the day, since our computers have so much memory available nowadays, maybe this shouldn't be a source of worry. So I'm OK with this scheme.

OK, now we have an in-memory list of all our files and folders. But how to get this into a treeview? My first thought was to integrate populating the treeview with the directory scan, creating treeview entries from each folder or file found.

Long story short, it turns out to be much easier to break this process into two parts: first scanning the directory structure, then creating the treeview from this list. And it turned out to be so easy!

All that needs to be done is to go through the file list sequentially, creating treeview items from each item. Because of the sequence of scanning, the treeview will be perfect. For each folder found, create a treeview item with children; for each file, create one without children. (Well, almost; read on for details.) Because each file list entry has a parent pointer, this gives us the handle of the parent treeview item we need to create a new treeview item. Perfect!

Well, almost perfect. My first design gave me the right directory structure, except for one thing: instead of all folders being listed first and then files, the two were intermixed in alpha-sort order. Not what I wanted. This was due to the fact that folders and files arrive in alpha-sort order, not folders first, then files.

Hmmm; scratch my head a few minutes. Hey, what if I ... rearrange the file list so that folders are first? This turned out to be trivially easy, and gives the correct order of folders and files.

See the before and after shots of file order below.





Another thing: since I was telling Windows that all folders had children while all files didn't, all folders looked as if they could be expanded, even if they were empty. Hmmm, not sure how to fix that; it seems like a time machine would come in handy here, since how do I know whether there are any children in a folder until I come to them?

This problem was also trivially easy to solve: that's where the flag member of FILEINFO comes in. As I'm processing a folder in GetAllDirsAndFiles(), I have access to both the parent (the folder being processed) and any children (folders and files found in the parent). So what I do is initially create all folders with "no children", but then when I find any items within (folders or files), I change the flag to "has children". And then in the 2nd phase, populating the treeview, each folder item gets created corrrectly.

Here's before and after for that problem.





Here's the code for the folder/file scanner:
;============================================
; GetAllDirsAndFiles (fiPtr)
;
; New version (1 pass/folder instead of 2)
;============================================

GetAllDirsAndFiles PROC fiPtr:DWORD
LOCAL findHandle:HANDLE, folderCount:DWORD, fiSavePtr:DWORD, rootLoc:DWORD
LOCAL totalCountThisDir:DWORD, fi:FILEINFO, fd:WIN32_FIND_DATA, pathBuffer[256]:BYTE

CMP NoMoreObjectsFlag, TRUE
JE exit99

; Get location for all appendages to incoming pathname (folder & filenames):
MOV EDX, fiPtr
LEA EAX, [EDX].FILEINFO.pathName
CALL strlen
INC EAX ;+1 for "\".
MOV rootLoc, EAX

MOV folderCount, 0
MOV totalCountThisDir, 0
MOV EAX, NextFileEntryPtr
MOV fiSavePtr, EAX ;Save place where these dirs. will be stored.

MOV EDX, fiPtr
INVOKE strcpy, ADDR [EDX].FILEINFO.pathName, ADDR pathBuffer
INVOKE _strcat, OFFSET WildcardStr, ADDR pathBuffer
INVOKE FindFirstFile, ADDR pathBuffer, ADDR fd
CMP EAX, ERROR_NO_MORE_FILES
JE exit99
TEST EAX, EAX
JZ exit99
MOV findHandle, EAX

; Filter out "." and ".." folders:
filter: INVOKE strcmpi, ADDR fd.cFileName, OFFSET OneDotDirStr
TEST EAX, EAX
JNZ nxtfil
INVOKE strcmpi, ADDR fd.cFileName, OFFSET TwoDotDirStr
TEST EAX, EAX
JNZ nxtfil

; Filter out any bogus files whose name is (or starts with) ':'"
CMP fd.cFileName, ':'
JE nxtfil

; If it's a file, filter on valid extensions:
TEST fd.dwFileAttributes, FILE_ATTRIBUTE_DIRECTORY
JNZ @F
LEA EAX, fd.cFileName
CALL FilterFiles
TEST EAX, EAX ;Result of extension match.
JZ nxtfil

; Add folder or file to file list:
; Check for too many objects:
@@: CMP TotalObjectCount, $maxObjects
JNE @F
INVOKE MessageBox, NULL, OFFSET TooManyObjectsMsg, NULL, MB_OK
MOV NoMoreObjectsFlag, TRUE
JMP exit99 ;That's it, folks.

@@: PUSH EBX
MOV EBX, NextFileEntryPtr
MOV EAX, rootLoc
MOV [EBX].FILEINFO.rootLoc, EAX
MOV EDX, fiPtr ;Get ptr. to parent.
MOV [EBX].FILEINFO.parent, EDX
OR [EDX].FILEINFO.flag, $hasKids ;Indicate that the parent has children.
INVOKE strcpy, ADDR [EDX].FILEINFO.pathName, ADDR [EBX].FILEINFO.pathName
INVOKE _strcat, OFFSET BackslashStr, ADDR [EBX].FILEINFO.pathName
INVOKE _strcat, ADDR fd.cFileName, ADDR [EBX].FILEINFO.pathName

; Folder or file?
TEST fd.dwFileAttributes, FILE_ATTRIBUTE_DIRECTORY
JZ isfile

; It's a folder:
INC folderCount
MOV EAX, $folder
JMP SHORT @F

; It's a file:
isfile: INC FileCount
XOR EAX, EAX

@@: MOV [EBX].FILEINFO.flag, EAX
POP EBX

INC TotalObjectCount
INC totalCountThisDir

ADD NextFileEntryPtr, SIZEOF FILEINFO

; Get next file:
nxtfil: INVOKE FindNextFile, findHandle, ADDR fd
TEST EAX, EAX
JZ dodirs
CMP EAX, ERROR_NO_MORE_FILES
JNE filter

; Reorder list so that folders are first, then files:
dodirs: MOV EAX, totalCountThisDir
TEST EAX, EAX
JZ @F
MOV EDX, fiSavePtr
CALL ReorderFilist

@@: MOV ECX, folderCount
ADD TotalFolderCount, ECX
JECXZ exit99

PUSH EBX
MOV EBX, fiSavePtr ;Pt. to list of new folders & files.

dofldr: PUSH ECX

; Now recurse into folders:
foldlp: TEST [EBX].FILEINFO.flag, $folder
JZ skipfl

INVOKE GetAllDirsAndFiles, ADDR [EBX].FILEINFO.pathName
POP ECX
ADD EBX, SIZEOF FILEINFO
LOOP dofldr
POP EBX
JMP SHORT exit99

skipfl: ADD EBX, SIZEOF FILEINFO
JMP foldlp

exit99: RET

GetAllDirsAndFiles ENDP

There's more to the story. I'm going to stop here and continue with another post for Part 2.
Assembly language programming should be fun. That's why I do it.

NoCforMe

Some juicy details about using treeview controls:

To create a treeview item:
  • Start at the "root", of course
  • For each item, fill out a TVINSERTSTRUCT structure.
    This includes a nested TVITEM (or TVITEMEX) structure that contains the details, like the text pointer, whether the item has children, etc.
    You need to set the mask member (it's _mask for MASM) to tell which members you're setting in the structure
  • Use the TVM_INSERTITEM message to create the item.

To clear a treeview control:
Easy-peasy; just use the TVM_DELETEITEM message with a value of TVI_ROOT (or zero).

To expand (or collapse) a treeview at any point in the tree:
Use the TVM_EXPAND message, naturally. (If you don't expand the control it'll come up collapsed when you create it.)

Treeview styles:
You probably want all these styles:
  • TVS_HASBUTTONS creates "buttons" for nodes (with or without plus sign)
  • TVS_HASLINES gives you those cool connecting lines between items
  • TVS_LINESATROOT shows a line at the "root" (top of the tree)

So how do you track treeview items to your data structures, like my folder/file list? Well, like so many Win32 structures or functions, TVITEM(EX) contains a handy-dandy lParam element, which is just a DWORD. I use this to hold a pointer to the corresponding FILEINFO element for each treeview item. Works like a charm.
Assembly language programming should be fun. That's why I do it.

NoCforMe

Issues:

After coming up with a working demo of at least part of this app (a picture & text associator, remember), several issues have emerged:

  • Speed:  Scanning directory structures takes some time. Sometimes lots of time.
  • Text storage:  Storing all that text to associate with pictures is not going to be trivial. I'm still throwing around ideas for how this will work.
  • Data storage:  By which I mean storing the whole shebang to disk so it can be loaded later. All kinds of problems here!
  • Stack considerations

I'll cover the speed considerations here.

Interesting detail: my first version of the folder/file scanner (code posted in 1st post above) was more complicated: I did two passes through each folder I was scanning, first for folders, then for files. Not only was this overly complex, but more importantly it was slower than hell! On a path containing a little more than 4,000 files it took about 45 seconds to grind through all that data! But when I simplified it to the one-pass version posted here, the time went down ... to less than 1 second! I must have been tripping up the file-search process something terrible; it may be that it expects a certain sequence of searching (getting all folders and files in a folder in one batch), and gets all screwed up with another sequence.

Even now it can take a long time with a lot of files. If I scan my entire working disk (~28,000 files in ~16,000 folders) it still takes almost 30 seconds.

Which is way too long to leave a user looking at an unresponsive program (which will probably show "Not responding" in its title bar). I know how to fix this: run the directory scan in a separate thread. And give the user some indication of activity, like a progress bar.

Oh well, this probably is never going to be a commercial product anyway ...

I wonder if those newfangled "shell" functions are any faster?
Assembly language programming should be fun. That's why I do it.

sinsi

pathName should be easy enough to get, assuming the node text is the directory name.

GetNodeFullPath PROC USES EBX path:PTR,node:HTREEITEM
    ;path = buffer large enough (MAX_PATH presumably)
    ;node = the node with the file name
    mov     ebx,node
 @@:
    INVOKE  PrependNodeText,ebx,path
    INVOKE  SendMessage,treeview,TVM_GETNEXTITEM,TVGN_PARENT,ebx
    mov     ebx,eax
    test    eax,eax
    jnz     @b
    ret
GetNodeFullPath ENDP
😁

NoCforMe

Yes, that's another way it could be done; really 6 of one/half a dozen of the other, as I have those (partial) pathnames in my file list, and all I have to do is access my parent pointer.

Quote from: sinsi on April 05, 2024, 11:42:14 AMpathName should be easy enough to get, assuming the node text is the directory name.
Actually, it's not: I'm only storing the "root" name (name + .ext), at least in the treeview (for display). But I store the entire pathname for the node in my own file list. And I'd only have to go back one level to a file's parent to construct the entire pathname, not all the way back to the root. (That's what all those MAX_PATH elements are used for.)

Pet peeve: I hate that word "prepend". So programmer-ish. The right word would be prefix.

Assembly language programming should be fun. That's why I do it.

NoCforMe

Issue: Data Storage

By this I mean storing the whole shebang, file list and associated text, after scanning and user input. Now it's easy enough to come up with a file format for this; it's essentially a database with a tree structure having pointers into a text heap of some kind.

What's a much more serious problem is what you're going to do with that database once you load it. Say the user picks an entry point in the directory structure on their disk, scans it, inputs a bunch of text to go with pictures and saves it. Later they start the program and reload the database. But in the meantime pictures have been moved, deleted and added; how the hell do you synchronize that mess? Ugh.

I suppose my first pass will only work with files (pictures) that are where they were before, and just ignore the others. But even this will mean having to confirm that those files still exist in their original locations.

Design; it's a bitch, but it's absolutely necessary. (Unless you don't mind a program that was obviously designed by a programmer!)
Assembly language programming should be fun. That's why I do it.

sinsi

QuoteAlthough it sounds correct, prepend is not an English word. It was created to sound like the opposite of "append," which means to add to the end.
Every day we learn a new thing  :cool:
😁

NoCforMe

Hopefully, yes.

OK, I'm re-attaching the demo executable here. I found the problem, a very stupid error made by one of our low-level workers here (they have been severely reprimanded and hopefully won't make that mistake again!). So give it a try; you can click on anything without it crashing. (I'm pretty sure.)
Assembly language programming should be fun. That's why I do it.

sinsi

(4ad0.4fc): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000001 ebx=00000001 ecx=00000001 edx=00000000 esi=004010f0 edi=00550a52
eip=00401183 esp=0019f144 ebp=0019f490 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
PixOrg+0x1183:
00401183 f7821401000001000000 test dword ptr [edx+114h],1 ds:002b:00000114=????????
Click the + to open a node, bang!
😁

NoCforMe

Yikes. Sorry about that.

Looks like I was testing the "folder" bit there with EDX=0. Hmmmm ... back to the lab.

Well, folks, looks like development is at a standstill until we get this bug report resolved here at NoCforMe Laboratories, GmbH.

Found what I think was the problem (I didn't set the pointer to my file list for the root treeview item, and I must have never clicked on that). Updated version attached to reply #7 above.
Assembly language programming should be fun. That's why I do it.

sinsi

(2b24.a20): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for e:\Desktop\PixOrg.exe
eax=00000001 ebx=00000001 ecx=00000001 edx=00000000 esi=004010f0 edi=00240aea
eip=00401183 esp=0019f144 ebp=0019f490 iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010202
PixOrg+0x1183:
00401183 f7821401000001000000 test dword ptr [edx+114h],1 ds:002b:00000114=????????
Not a cut'n'paste  :biggrin:
😁

sinsi

Further playing
 - if I select a node before trying to open it, all is OK
 - I can open a node that isn't selected, subject to the above
 - image preview works
 - clicking the root node is an instant crash at the same code address, irrespective of the above
😁

Greenhorn

Quote from: NoCforMe on April 05, 2024, 11:25:02 AM
  • Speed:  Scanning directory structures takes some time. Sometimes lots of time.
  • Text storage:  Storing all that text to associate with pictures is not going to be trivial. I'm still throwing around ideas for how this will work.
  • Data storage:  By which I mean storing the whole shebang to disk so it can be loaded later. All kinds of problems here!
  • Stack considerations

What I would try to do ...

  • Speed: First scan root folder only (or just the first subnode, too) and create the treeview items
  • Speed: When opening a node (TVN_ITEMEXPANDING), scan the subfolder and create the treeview child items for that node (or the next level)
  • Data/Text storage: When creating a treeview item, store a pointer to an info structure (your data for the pic file) for each item in lParam of the TVITEM(EX)
Kole Feut un Nordenwind gift en krusen Büdel un en lütten Pint.

NoCforMe

Ah, incremental loading; interesting. I'll think about that.

That's pretty much what I had in mind for storing the whole mess on disk.

I re-attached the demo (fixed the crash bug) in post #7 above if anyone wants to play with this.
Assembly language programming should be fun. That's why I do it.

sinsi

OK, it works, but...there's no indication that a node is a folder or a picture file.
I can expand everything, there are 10 pages in the tree, but what are pictures? Data overload.
Use a listview. I found them easier to use than a treeview  :biggrin:
Or an imagelist for the nodes.
😁