The MASM Forum

General => The Workshop => Topic started by: NoCforMe on January 10, 2025, 08:33:32 AM

Title: How to synchronize a database?
Post by: NoCforMe on January 10, 2025, 08:33:32 AM
This is kind of a blue-sky project I have in mind. No real plans to actually implement it just yet; something I'm curious about and would be interested in others' takes on.

And no code at this point! What I'm looking for is a discussion of a possible general plan to accomplish what I want to do.

The project, such as it is, would be a simple database with information about files on a computer. Specifically image files (pictures). The idea is to be able to "tag" your collection of pictures with any number of tags (simple text strings), and be able to select pictures from the collection using those tags ("show me all pictures that have Aunt Martha at Theo's birthday party"). Very simple.

The problem of course, if you think about it, is this: let's say you get this thing all set up and running. Great; you can add tags, add pictures and tag them, and find pictures using tag queries.

But then you start moving and deleting pictures on your computer. Now you have a situation where your database is totally out of sync with your file system. And there's really no reasonable way for you to manually correct this (well, you could do it, but it would be a total pain in the ass). How would you deal with that?

Even though I have zero experience with it, I'm guessing that some protocol like ODBC might be what's needed here: some way for the OS (Windows) to signal to my application that a file has been moved, renamed or deleted, so that the database can be updated.

I have no idea how this system would work nor how to implement it. (And ODBC is just a guess; it may well be some other protocol, like maybe DDE.)

I know that there are probably existing applications that already do what I've described here, and more: I'm not interested in them. I want to know how a guy would implement this on his own.

Not urgent, since I have no plans to forge ahead with this anytime soon. Just very curious and interested.
Title: Re: How to synchronize a database?
Post by: sinsi on January 10, 2025, 09:28:17 AM
If you keep them on an NTFS drive you could use alternate data streams.
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 10, 2025, 09:36:23 AM
Interesting.
Not ideal due to lack of file system agnosticism. But interesting. Looking into it.
Title: Re: How to synchronize a database?
Post by: fearless on January 10, 2025, 09:49:11 AM
You could look at a few options to achieve this.

Alternate data streams - use one per file and store the tag info there. But it would increase disk usage per file by 23 bytes plus the length of the tag data (however that is encoded). https://learn.microsoft.com/en-us/windows/win32/fileio/file-streams, https://learn.microsoft.com/en-us/windows/win32/fileio/using-streams, https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-win32_find_stream_data, https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirststreamw

One file per folder, similar to thumbs.db, maybe called metadata.db, that stores the tag info for the files in that folder. Use hashes instead of names to get round the file-renaming issue. Use the API for watching directory changes (https://learn.microsoft.com/en-us/windows/win32/fileio/obtaining-directory-change-notifications) to remove an entry from metadata.db if a file is deleted, or to add a default 'tag' when a new file is added to the directory.

Use IPropertyStore COM to set 'tag' information for a file: https://stackoverflow.com/questions/6080319/where-does-windows-explorer-store-file-meta-data, https://learn.microsoft.com/en-us/windows/win32/stg/ipropertysetstorage-ntfs-file-system-implementation, https://learn.microsoft.com/en-us/windows/win32/api/shobjidl_core/nn-shobjidl_core-ishellitem2

For any local database you could simply use SQLite, or, if the info is very small, just a simple ini format for a 'database' file.
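To make the SQLite and hash ideas a bit more concrete, here is a rough sketch, written in Python only because it is quick to try out. The table layout and the helper names (content_hash, open_db, add_file, find_by_tags) are made up for illustration and are not from any existing program. Each picture is keyed by a hash of its contents, so a moved or renamed copy maps back to the same tags.

```python
import hashlib
import sqlite3

def content_hash(data: bytes) -> str:
    """Identify a picture by its bytes, not by its name or location."""
    return hashlib.sha256(data).hexdigest()

def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two (hypothetical) tables: known files and their tags."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS files (hash TEXT PRIMARY KEY, path TEXT NOT NULL);
        CREATE TABLE IF NOT EXISTS tags  (hash TEXT NOT NULL, tag TEXT NOT NULL,
                                          PRIMARY KEY (hash, tag));
    """)
    return db

def add_file(db, path, data, tags):
    """Record where a picture lives and attach tags to its content hash."""
    h = content_hash(data)
    # Same content seen at a new path simply overwrites the stored path.
    db.execute("INSERT OR REPLACE INTO files(hash, path) VALUES(?, ?)", (h, path))
    db.executemany("INSERT OR IGNORE INTO tags(hash, tag) VALUES(?, ?)",
                   [(h, t) for t in tags])
    db.commit()
    return h

def find_by_tags(db, wanted):
    """Return the paths of all files that carry every tag in `wanted`."""
    wanted = sorted(set(wanted))
    marks = ",".join("?" * len(wanted))
    rows = db.execute(
        "SELECT f.path FROM files f JOIN tags t ON t.hash = f.hash "
        f"WHERE t.tag IN ({marks}) "
        "GROUP BY f.hash HAVING COUNT(DISTINCT t.tag) = ? ORDER BY f.path",
        (*wanted, len(wanted))).fetchall()
    return [r[0] for r in rows]
```

Calling add_file again with the same bytes and a new path is all it takes to "follow" a moved file; the tags stay attached to the hash. The cost is that every file has to be read once to compute the hash.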
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 10, 2025, 12:33:21 PM
Very interesting, and thanks.

One general question: assuming there's some way to capture changes to the filesystem through Windows, the question is how?

Meaning: also assuming that the picture-tagging program is only run intermittently, how would one capture these changes that can occur when it's not running?

Wouldn't you have to have a process that's running all the time in the background to capture these changes? Or is there some way of querying a log that the OS maintains with such changes (files moved/deleted/renamed)?
Title: Re: How to synchronize a database?
Post by: sinsi on January 10, 2025, 12:37:40 PM
JPEG files can have EXIF data embedded, but I'm not sure if other formats do.
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 10, 2025, 01:21:17 PM
I wouldn't want to touch the picture files; they'd be read-only. All info would be stored in the database.
Title: Re: How to synchronize a database?
Post by: sinsi on January 10, 2025, 02:03:53 PM
Are the files going to be restricted to one computer, or do you want a portable solution?
Portable means using some sort of datafile; if you use a database then the target probably needs additional software.
Another restriction: Windows only? Or would you support Linux/Mac?

Quote from: NoCforMe on January 10, 2025, 12:33:21 PM
assuming that the picture-tagging program is only run intermittently, how would one capture these changes that can occur when it's not running?
None that I can think of.

Quote from: NoCforMe on January 10, 2025, 12:33:21 PM
Wouldn't you have to have a process that's running all the time in the background to capture these changes?
Yes, possibly using ReadDirectoryChangesW

Quote from: NoCforMe on January 10, 2025, 12:33:21 PM
Or is there some way of querying a log that the OS maintains with such changes (files moved/deleted/renamed)?
Change journals, but that requires admin privileges.


Title: Re: How to synchronize a database?
Post by: NoCforMe on January 10, 2025, 02:18:58 PM
o Windows only
o Single computer

Sounds like it would require a background process to be run, if the user wanted it to be able to track changes.
Otherwise it could just revert to a "dumb" mode where if it doesn't find a file at the expected place, too bad.
Title: Re: How to synchronize a database?
Post by: sinsi on January 10, 2025, 02:46:03 PM
If you use the async version of ReadDirectoryChangesW then your program waits on an event, so it uses minimal resources. One caveat: it's a Unicode-only function.
Understanding ReadDirectoryChangesW - Part 1 (https://qualapps.blogspot.com/2010/05/understanding-readdirectorychangesw.html)
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 10, 2025, 02:54:22 PM
Hey, thanks a lot. That's a read for another day, but I will read it.
Unicode can easily be worked around; I have my own ASCII <--> Unicode functions at the ready.
Title: Re: How to synchronize a database?
Post by: jj2007 on January 10, 2025, 06:41:41 PM
Quote from: NoCforMe on January 10, 2025, 08:33:32 AM
The idea is to be able to "tag" your collection of pictures with any number of tags (simple text strings), and be able to select pictures from the collection using those tags ("show me all pictures that have Aunt Martha at Theo's birthday party"). Very simple.

I do have a program that does this, a searchable database of images and videos. You can add text of any length to each and every image. Currently, I have about 18,000 images there.

The problem is not the files that get moved. There is a simple solution: don't move them. Why should you? The location of each file is in the database, no need to move them.

The real problem is maintaining the database. You can spend weeks adding text to 18,000 images: "Oh, there is Aunt Mary with her nephew. What do I write?"

Quote from: NoCforMe on January 10, 2025, 02:18:58 PM
it could just revert to a "dumb" mode where if it doesn't find a file at the expected place

If it doesn't find the file, it should automagically look for it in known places, checking name and creation date.
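A rough sketch of that lookup, in Python for brevity. The function name find_moved_file and its parameters are invented for illustration. It uses the last-modified time as a stand-in for the creation date, because creation time is not available the same way on every platform; that substitution is an assumption of this sketch, not part of the suggestion above.

```python
import os

def find_moved_file(name, stamp, roots, tolerance=2.0):
    """Search each folder tree in `roots` for a file called `name` whose
    timestamp is within `tolerance` seconds of `stamp` (the value stored
    in the database when the picture was tagged).
    Returns the new full path, or None if nothing matches."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            if name not in filenames:      # exact, case-sensitive name match
                continue
            candidate = os.path.join(dirpath, name)
            # Assumption: last-modified time stands in for creation date.
            if abs(os.stat(candidate).st_mtime - stamp) <= tolerance:
                return candidate
    return None
```

This only finds files that kept their name. Matching on a content hash instead of the name would be the way to cope with renames, at the cost of reading every candidate file.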
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 10, 2025, 07:06:00 PM
Quote from: jj2007 on January 10, 2025, 06:41:41 PM
Quote from: NoCforMe on January 10, 2025, 08:33:32 AM
The idea is to be able to "tag" your collection of pictures with any number of tags (simple text strings), and be able to select pictures from the collection using those tags ("show me all pictures that have Aunt Martha at Theo's birthday party"). Very simple.

I do have a program that does this, a searchable database of images and videos. You can add text of any length to each and every image. Currently, I have about 18,000 images there.

The problem is not the files that get moved. There is a simple solution: don't move them. Why should you? The location of each file is in the database, no need to move them.

But JJ, the user of this program (who might not be me) could well move files after tagging them. You do see that, right? People move files all the time. Or rename them.

Quote from: NoCforMe on January 10, 2025, 02:18:58 PM
it could just revert to a "dumb" mode where if it doesn't find a file at the expected place

Quote
If it doesn't find the file, it should automagically look for it in known places, checking name and creation date.

Yes, that could work. You could call it search and rescue, I guess.
Title: Re: How to synchronize a database?
Post by: jj2007 on January 10, 2025, 07:22:01 PM
Quote from: NoCforMe on January 10, 2025, 07:06:00 PM
People move files all the time. Or rename them.
If they rename them, search and rescue will not work.

Quote
Understanding ReadDirectoryChangesW - Part 1 (https://qualapps.blogspot.com/2010/05/understanding-readdirectorychangesw.html)
That looks like great fun, sinsi :biggrin:
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 14, 2025, 07:24:55 PM
I put together this little app that monitors any part of a filesystem and reports any changes, using ReadDirectoryChangesW(). Try it out and see how it works. (Source attached.)

It works OK, but with some problems: I initially call ReadDirectoryChangesW() with the bWatchSubtree parameter set to TRUE so that it'll monitor changes in any directories below the selected one. But for some strange reason, if I change this after starting the program (there's a checkbox for this in the dialog), it becomes flaky. If I turn this option off (setting bWatchSubtree to FALSE), it still reports changes in directories below the selected one. (In order for this to work correctly, if it could, you need to stop the monitoring and then re-enable it so the watch routine restarts with the new setting of this parameter.)

Other than that, this seems to work OK. I opted for simplified operation here; I call ReadDirectoryChangesW() in synchronous mode in a continuous loop within a thread, which seems to work well. It shows that this could probably be used as a "silent" process to monitor filesystem changes and send messages to another thread or process to track changes on a drive.

Comments welcome.
Title: Re: How to synchronize a database?
Post by: sinsi on January 14, 2025, 07:57:21 PM
You shouldn't really use TerminateThread, it's usually a last resort.
Why not set RDCenabled to FALSE and let the thread clean up and exit properly?

Quote from: TerminateThread
TerminateThread is a dangerous function that should only be used in the most extreme cases.
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 14, 2025, 08:19:31 PM
Quote from: sinsi on January 14, 2025, 07:57:21 PM
You shouldn't really use TerminateThread, it's usually a last resort.
Why not set RDCenabled to FALSE and let the thread clean up and exit properly?

Quote from: TerminateThread
TerminateThread is a dangerous function that should only be used in the most extreme cases.

Good question.
Here's the thing: in the watch routine (RDCmonitor()), ReadDirectoryChangesW is called in an endless loop (unless the flag RDCenabled is set to FALSE). This means that almost all the time, that routine is waiting for ReadDirectoryChangesW to return (called synchronously, remember), so there's no way to check any flags to exit the routine.

There is this code:
again:  CMP RDCenabled, FALSE
        JE  exit99
which will exit the thread, but ReadDirectoryChangesW has to complete and return before that can happen.

I tried using ExitThread() outside that routine to shut it down, but that doesn't work. So that's why I use TerminateThread() instead; it reliably terminates the thread (hey, that's why it's called that, huh?), in case you want to change the directory being monitored. Otherwise, you can just leave it running in that infinite loop.

Things would work differently, of course, if you used asynch operation instead, but I haven't tried that yet. This seems to work well and it's simpler.
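The difference can be shown without any Win32 at all. The sketch below is only an analogy, in Python: a queue.Queue stands in for the blocking ReadDirectoryChangesW call, a threading.Event stands in for the RDCenabled flag, and the names monitor and run_demo are made up. The point is that a wait with a timeout lets the loop get back to the flag check even when no changes arrive, which a purely blocking call never does.

```python
import queue
import threading

def monitor(changes, stop, seen, poll_seconds=0.1):
    """Worker loop: collect change records, waking up regularly to check `stop`."""
    while not stop.is_set():
        try:
            # Stand-in for the blocking call. Without the timeout the loop
            # would sit here forever and never reach the flag check above.
            record = changes.get(timeout=poll_seconds)
        except queue.Empty:
            continue                       # nothing happened; re-check the flag
        seen.append(record)
        changes.task_done()

def run_demo():
    """Start the worker, feed it one record, then ask it to stop cleanly."""
    changes, stop, seen = queue.Queue(), threading.Event(), []
    worker = threading.Thread(target=monitor, args=(changes, stop, seen))
    worker.start()
    changes.put("renamed: a.jpg -> b.jpg")
    changes.join()                         # wait until the record was handled
    stop.set()                             # the equivalent of RDCenabled = FALSE
    worker.join(timeout=2.0)
    return seen, worker.is_alive()
```

Here the thread leaves its loop by itself within one polling interval, so nothing has to be terminated from the outside.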

So I wonder what's so "dangerous" about TerminateThread()?
Title: Re: How to synchronize a database?
Post by: sinsi on January 14, 2025, 08:48:20 PM
CancelSynchronousIo (https://learn.microsoft.com/en-us/windows/win32/fileio/cancelsynchronousio-func)

Quote
Marks pending synchronous I/O operations that are issued by the specified thread as canceled.
Then just check the return status, I would guess.
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 14, 2025, 08:50:49 PM
I'll have to look into using that.
Title: Re: How to synchronize a database?
Post by: mabdelouahab on January 15, 2025, 05:28:44 AM
FileSystemWatcher (https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=net-9.0)
Title: Re: How to synchronize a database?
Post by: sinsi on January 15, 2025, 06:19:27 AM
Quote from: mabdelouahab on January 15, 2025, 05:28:44 AM
FileSystemWatcher (https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=net-9.0)
VB.net?

*ducks for cover*
Title: Re: How to synchronize a database?
Post by: zedd151 on January 15, 2025, 06:45:21 AM
Quote from: sinsi on January 15, 2025, 06:19:27 AM
Quote from: mabdelouahab on January 15, 2025, 05:28:44 AM
FileSystemWatcher (https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=net-9.0)
VB.net?

*ducks for cover*

Even worse...
Quote from: FileSystemWatcher_link_from_above
...C#

Otherwise known as C Sharp. Might as well be Python.
:badgrin:
Title: Re: How to synchronize a database?
Post by: sinsi on January 15, 2025, 07:10:13 AM
Happy to be the pedant  :biggrin:
Depends on what you viewed last (yes, I just outed myself as a vb.net user :badgrin:)
Title: Re: How to synchronize a database?
Post by: zedd151 on January 15, 2025, 07:40:18 AM
Quote from: sinsi on January 15, 2025, 07:10:13 AM
Depends on what you viewed last
From the example code in the posted link.  :tongue:

(https://i.postimg.cc/m22h7Xz0/untitled.png)


Quote from: sinsi on January 15, 2025, 07:10:13 AM
(yes, I just outed myself as a vb.net user :badgrin:)

Blasphemy!  :biggrin:
Title: Re: How to synchronize a database?
Post by: sinsi on January 15, 2025, 08:03:27 AM
My point was that you can change the .net language of the page to suit what you use.
I last used VB.net, so the website remembers that.
Title: Re: How to synchronize a database?
Post by: zedd151 on January 15, 2025, 08:21:20 AM
Quote from: sinsi on January 15, 2025, 08:03:27 AM
My point was that you can change the .net language...
:rolleyes:
No thanks.
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 15, 2025, 10:24:46 AM
Quote from: mabdelouahab on January 15, 2025, 05:28:44 AM
FileSystemWatcher (https://learn.microsoft.com/en-us/dotnet/api/system.io.filesystemwatcher?view=net-9.0)

I'll bet that uses ReadDirectoryChangesW() under the hood.
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 15, 2025, 12:51:58 PM
Quote from: sinsi on January 14, 2025, 08:48:20 PM
CancelSynchronousIo (https://learn.microsoft.com/en-us/windows/win32/fileio/cancelsynchronousio-func)

Quote
Marks pending synchronous I/O operations that are issued by the specified thread as canceled.
Then just check the return status I would guess.

OK, I give up: why isn't CancelSynchronousIo() in any of the Masm32 include files?

Aaargh; gotta use LoadLibrary() and GetProcAddress().
Title: Re: How to synchronize a database?
Post by: sinsi on January 15, 2025, 01:32:09 PM
Quote from: NoCforMe on January 15, 2025, 12:51:58 PM
gotta use LoadLibrary() and GetProcAddress
It's in the Windows 10 SDK kernel32.lib but not MASM32.
Does that mean that MASM32 hasn't been updated to Vista? :dazzled:
Title: Re: How to synchronize a database?
Post by: zedd151 on January 15, 2025, 01:58:23 PM
Quote from: sinsi on January 15, 2025, 01:32:09 PM
Does that mean that MASM32 hasn't been updated to Vista? :dazzled:
Probably since Windows xp.  :tongue: 
Title: Re: How to synchronize a database?
Post by: TimoVJL on January 15, 2025, 02:08:34 PM
krnl32x.def
LIBRARY kernel32.dll
EXPORTS
 _CancelSynchronousIo@4
polib -def:krnl32x.def -out:krnl32x.lib -machine:x86
Title: Re: How to synchronize a database?
Post by: sinsi on January 15, 2025, 02:22:42 PM
MASM64 doesn't have it either :sad:
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 15, 2025, 03:35:38 PM
I want my money back.
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 15, 2025, 03:37:09 PM
Quote from: TimoVJL on January 15, 2025, 02:08:34 PM
krnl32x.def
LIBRARY kernel32.dll
EXPORTS
 _CancelSynchronousIo@4
polib -def:krnl32x.def -out:krnl32x.lib -machine:x86
Timo, I'm not using that Po stuff; I'm using regular old MASM.
Title: Re: How to synchronize a database?
Post by: TimoVJL on January 15, 2025, 04:03:47 PM
Building an Import Library and Export File (https://learn.microsoft.com/en-us/cpp/build/reference/building-an-import-library-and-export-file?view=msvc-170)
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 15, 2025, 04:16:05 PM
Quote from: TimoVJL on January 15, 2025, 04:03:47 PM
Building an Import Library and Export File (https://learn.microsoft.com/en-us/cpp/build/reference/building-an-import-library-and-export-file?view=msvc-170)

I'm not going to go to the trouble to create a library just for one stupid missing function in MASM include files: I just load it directly from the DLL using LoadLibrary() and GetProcAddress().
Title: Re: How to synchronize a database?
Post by: NoCforMe on January 18, 2025, 07:04:49 AM
OK, here's an update: everything seems to work correctly, except for one thing.

What doesn't work is setting the parameter to ReadDirectoryChangesW() that selects whether the subtree below the selected directory is monitored or not. (The parameter is bWatchSubtree, a BOOL set to TRUE or FALSE.)

To be precise, it does work--the first time you call the function. But afterwards, even though I'm exiting the thread within which this function is called, and then restarting the thread anew with the new setting of this parameter, it stubbornly holds onto the original value.

Maybe someone here can do a little testing of this? It's annoying, and certainly not covered at all in the Micro$oft documentation.
Title: Re: How to synchronize a database?
Post by: sinsi on January 19, 2025, 01:39:03 AM
I had to make one change (added WS_VSCROLL to the editbox).
The subdirectory flag only seems to work as it is initially set at program start; changing it (even after stop) does nothing.
Quote
When you first call ReadDirectoryChangesW, the system allocates a buffer to store change information. This buffer is associated with the directory handle until it is closed and its size does not change during its lifetime.
I suspect it caches that flag along with the handle. Maybe open the dir handle at the start of the thread and close it when the thread exits.

edit:
It works if you open/close within the thread but only if you stop monitoring, toggle the flag then start monitoring.
You could probably get it to work in the check handler by stop/starting the monitor.