How to synchronize a database?

Started by NoCforMe, January 10, 2025, 08:33:32 AM

NoCforMe

This is kind of a blue-sky project I have in mind. No real plans to actually implement it just yet; something I'm curious about and would be interested in others' takes on.

And no code at this point! What I'm looking for is a discussion of a possible general plan to accomplish what I want to do.

The project, such as it is, would be a simple database with information about files on a computer. Specifically image files (pictures). The idea is to be able to "tag" your collection of pictures with any number of tags (simple text strings), and be able to select pictures from the collection using those tags ("show me all pictures that have Aunt Martha at Theo's birthday party"). Very simple.

The problem of course, if you think about it, is this: let's say you get this thing all set up and running. Great; you can add tags, add pictures and tag them, and find pictures using tag queries.

But then you start moving and deleting pictures on your computer. Now you have a situation where your database is totally out of sync with your file system. And there's really no reasonable way for you to manually correct this (well, you could do it, but it would be a total pain in the ass). How would you deal with that?

Even though I have zero experience with it, I'm guessing that some protocol like ODBC might be what's needed here: some way for the OS (Windows) to signal to my application that a file has been moved, renamed or deleted, so that the database can be updated.

I have no idea how this system would work nor how to implement it. (And ODBC is just a guess; it may well be some other protocol, like maybe DDE.)

I know that there are probably existing applications that already do what I've described here, and more: I'm not interested in them. I want to know how a guy would implement this on his own.

Not urgent, since I have no plans to forge ahead with this anytime soon. Just very curious and interested.
Assembly language programming should be fun. That's why I do it.

sinsi

If you keep them on an NTFS drive you could use alternate data streams.

NoCforMe

Interesting.
Not ideal due to lack of file system agnosticism. But interesting. Looking into it.

fearless

You could look at a few options to achieve this.

Alternate data streams: use one per file and store the tag info there. This would increase disk usage per file by 23 bytes plus the length of the tag data (however that is encoded). https://learn.microsoft.com/en-us/windows/win32/fileio/file-streams, https://learn.microsoft.com/en-us/windows/win32/fileio/using-streams, https://learn.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-win32_find_stream_data, https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirststreamw

One file per folder, similar to thumbs.db, maybe called metadata.db, that stores the tag info for the files in that folder. Use hashes instead of names to get round the problem of files being renamed. Use the API for watching directory changes: https://learn.microsoft.com/en-us/windows/win32/fileio/obtaining-directory-change-notifications to remove an entry from metadata.db when a file is deleted, or to add a default 'tag' when a new file is added to the directory.

Use the IPropertyStore COM interface to set 'tag' information for a file: https://stackoverflow.com/questions/6080319/where-does-windows-explorer-store-file-meta-data, https://learn.microsoft.com/en-us/windows/win32/stg/ipropertysetstorage-ntfs-file-system-implementation, https://learn.microsoft.com/en-us/windows/win32/api/shobjidl_core/nn-shobjidl_core-ishellitem2

For any local databases you could simply use SQLite or if the info is very small then just a simple ini format for a 'database' file.
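To make the SQLite suggestion a bit more concrete, here is a rough Python sketch of how the tag database described in the first post might be laid out. Everything in it is an assumption for illustration only: the table names (files, tags, file_tags) and the helper functions open_db, tag_file and find_files are made up, and nobody in the thread has proposed this exact schema. The query returns only the pictures that carry every requested tag.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create a hypothetical three-table layout: files, tags, and a link table."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS files (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE IF NOT EXISTS tags  (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE IF NOT EXISTS file_tags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag_id  INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (file_id, tag_id)
        );
    """)
    return db

def tag_file(db, path, *tag_names):
    """Register the file if needed and attach each tag to it."""
    db.execute("INSERT OR IGNORE INTO files (path) VALUES (?)", (path,))
    file_id = db.execute("SELECT id FROM files WHERE path = ?", (path,)).fetchone()[0]
    for name in tag_names:
        db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))
        tag_id = db.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()[0]
        db.execute("INSERT OR IGNORE INTO file_tags VALUES (?, ?)", (file_id, tag_id))
    db.commit()

def find_files(db, *tag_names):
    """Return the paths that carry ALL of the given tags, sorted by path."""
    if not tag_names:
        return []
    marks = ",".join("?" * len(tag_names))
    rows = db.execute(f"""
        SELECT f.path FROM files f
        JOIN file_tags ft ON ft.file_id = f.id
        JOIN tags t ON t.id = ft.tag_id
        WHERE t.name IN ({marks})
        GROUP BY f.id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY f.path
    """, (*tag_names, len(set(tag_names)))).fetchall()
    return [row[0] for row in rows]
```

With this layout, a file that moves or gets renamed only needs its one row in files updated; the tags stay attached through the link table.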

NoCforMe

Very interesting, and thanks.

One general question: assuming there's some way to capture changes to the filesystem through Windows, the question is how?

Meaning: also assuming that the picture-tagging program is only run intermittently, how would one capture the changes that occur while it's not running?

Wouldn't you have to have a process that's running all the time in the background to capture these changes? Or is there some way of querying a log that the OS maintains of such changes (files moved/deleted/renamed)?

sinsi

JPEG files can have EXIF data embedded, but I'm not sure if other formats do.

NoCforMe

I wouldn't want to touch the picture files; they'd be read-only. All info would be stored in the database.

sinsi

Are the files going to be restricted to one computer, or do you want a portable solution?
Portable means using some sort of datafile; if you use a database then the target probably needs additional software.
Another restriction, Windows only? Or would you support linux/mac?

Quote from: NoCforMe on January 10, 2025, 12:33:21 PM: assuming that the picture-tagging program is only run intermittently, how would one capture these changes that can occur when it's not running?
None that I can think of.

Quote from: NoCforMe on January 10, 2025, 12:33:21 PM: Wouldn't you have to have a process that's running all the time in the background to capture these changes?
Yes, possibly using ReadDirectoryChangesW

Quote from: NoCforMe on January 10, 2025, 12:33:21 PM: Or is there some way of querying a log that the OS maintains with such changes (files moved/deleted/renamed)?
Change journals, but that requires admin privileges.



NoCforMe

o Windows only
o Single computer

Sounds like it would require a background process to be run, if the user wanted it to be able to track changes.
Otherwise it could just revert to a "dumb" mode where if it doesn't find a file at the expected place, too bad.
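A possible middle ground between the background process and the "dumb" mode, sketched here in Python: take a snapshot of the picture folders keyed by content hash (the hashing idea from earlier in the thread), and compare it against a fresh snapshot the next time the program starts. The function names file_hash, snapshot and reconcile are hypothetical, and this is only a sketch under the assumption that pictures are never edited and that no two pictures are byte-for-byte identical (duplicates would collapse into one entry here).

```python
import hashlib
import os

def file_hash(path, chunk_size=1 << 16):
    """SHA-256 of the file contents, read in chunks to keep memory use small."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def snapshot(root):
    """Map content hash -> full path for every file under root."""
    result = {}
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            result[file_hash(path)] = path
    return result

def reconcile(old, new):
    """Compare two snapshots and return (moved, deleted, added).

    moved   : dict of old path -> new path (same content, different location or name)
    deleted : sorted list of paths whose content no longer exists anywhere
    added   : sorted list of paths whose content was not seen before
    """
    moved = {old[h]: new[h] for h in old if h in new and old[h] != new[h]}
    deleted = sorted(old[h] for h in old if h not in new)
    added = sorted(new[h] for h in new if h not in old)
    return moved, deleted, added
```

Because the key is the content rather than the name, this catches renames as well as moves. The cost is that every file has to be read on each scan, which could be slow for a large collection.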

sinsi

If you use the async version of ReadDirectoryChangesW then your program waits on an event, so it uses minimal resources. One caveat: it's a Unicode-only function.
Understanding ReadDirectoryChangesW - Part 1

NoCforMe

Hey, thanks a lot. That's a read for another day, but I will read it.
Unicode can easily be worked around; I have my own ASCII <--> Unicode functions at the ready.

jj2007

Quote from: NoCforMe on January 10, 2025, 08:33:32 AM: The idea is to be able to "tag" your collection of pictures with any number of tags (simple text strings), and be able to select pictures from the collection using those tags ("show me all pictures that have Aunt Martha at Theo's birthday party"). Very simple.

I do have a program that does this, a searchable database of images and videos. You can add text of any length to each and every image. Currently, I have about 18,000 images there.

The problem is not the files that get moved. There is a simple solution: don't move them. Why should you? The location of each file is in the database, no need to move them.

The real problem is maintaining the database. You can spend weeks adding text to 18,000 images: "Oh, there is Aunt Mary with her nephew. What do I write?"

Quote from: NoCforMe on January 10, 2025, 02:18:58 PM: it could just revert to a "dumb" mode where if it doesn't find a file at the expected place

If it doesn't find the file, it should automagically look for it in known places, checking name and creation date.
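The "look for it in known places" idea could be sketched roughly like this in Python. The function name rescue and its parameters are invented for illustration. Note one deliberate substitution: the post says creation date, but this sketch compares the last-modified time instead, because that is what Python exposes consistently across platforms; on Windows, os.path.getctime could be swapped in to check the creation time.

```python
import os

def rescue(name, recorded_mtime, search_roots, tolerance=2.0):
    """Search known folders for a file with the same name and timestamp.

    name           : base name stored in the database (compared case-insensitively)
    recorded_mtime : last-modified time stored when the file was tagged
    search_roots   : list of folders to walk, in priority order
    tolerance      : allowed timestamp difference in seconds
    Returns the first matching full path, or None if nothing matches.
    """
    wanted = name.lower()
    for root in search_roots:
        for folder, _dirs, names in os.walk(root):
            for candidate_name in names:
                if candidate_name.lower() != wanted:
                    continue
                candidate = os.path.join(folder, candidate_name)
                if abs(os.path.getmtime(candidate) - recorded_mtime) <= tolerance:
                    return candidate
    return None
```

The timestamp check is there to avoid grabbing an unrelated file that merely happens to share a common name such as IMG_0001.jpg.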

NoCforMe

Quote from: jj2007 on January 10, 2025, 06:41:41 PM
Quote from: NoCforMe on January 10, 2025, 08:33:32 AM: The idea is to be able to "tag" your collection of pictures with any number of tags (simple text strings), and be able to select pictures from the collection using those tags ("show me all pictures that have Aunt Martha at Theo's birthday party"). Very simple.

I do have a program that does this, a searchable database of images and videos. You can add text of any length to each and every image. Currently, I have about 18,000 images there.

The problem is not the files that get moved. There is a simple solution: don't move them. Why should you? The location of each file is in the database, no need to move them.

But JJ, the user of this program (who might not be me) could well move files after tagging them. You do see that, right? People move files all the time. Or rename them.

Quote from: NoCforMe on January 10, 2025, 02:18:58 PM: it could just revert to a "dumb" mode where if it doesn't find a file at the expected place

Quote: If it doesn't find the file, it should automagically look for it in known places, checking name and creation date.

Yes, that could work. You could call it search and rescue, I guess.

jj2007

Quote from: NoCforMe on January 10, 2025, 07:06:00 PM: People move files all the time. Or rename them.
If they rename them, search and rescue will not work.

Quote: Understanding ReadDirectoryChangesW - Part 1
That looks like great fun, sinsi :biggrin:

NoCforMe

I put together this little app that monitors any part of a filesystem and reports any changes, using ReadDirectoryChangesW(). Try it out and see how it works. (Source attached.)

It works OK, but with some problems: I initially call ReadDirectoryChangesW() with the bWatchSubtree parameter set to TRUE so that it'll monitor changes in any directories below the selected one. But for some strange reason, if I change this after starting the program (there's a checkbox for this in the dialog), it becomes flaky. If I turn this option off (setting bWatchSubtree to FALSE), it still reports changes in directories below the selected one. (In order for this to work correctly, if it could, you need to stop the monitoring and then re-enable it so the watch routine restarts with the new setting of this parameter.)

Other than that, this seems to work OK. I opted for simplified operation here; I call ReadDirectoryChangesW() in synchronous mode in a continuous loop within a thread, which seems to work well. It shows that this could probably be used as a "silent" process to monitor filesystem changes and send messages to another thread or process to track changes on a drive.

Comments welcome.
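As a rough illustration of the "watch thread sends messages to another thread" arrangement, here is a Python sketch of the receiving side. It does not call ReadDirectoryChangesW at all: fake_watcher is a hypothetical stand-in that simply replays a list of (action, path) pairs, and the index dictionary (path -> set of tags) stands in for the database. The action codes are the FILE_ACTION_* values that ReadDirectoryChangesW reports in FILE_NOTIFY_INFORMATION; a rename arrives as an old-name record followed by a new-name record.

```python
import queue
import threading

# Action codes reported in FILE_NOTIFY_INFORMATION by ReadDirectoryChangesW.
FILE_ACTION_ADDED = 1
FILE_ACTION_REMOVED = 2
FILE_ACTION_MODIFIED = 3
FILE_ACTION_RENAMED_OLD_NAME = 4
FILE_ACTION_RENAMED_NEW_NAME = 5

def fake_watcher(notifications, events):
    """Hypothetical stand-in for the watch loop: posts each change, then a stop marker."""
    for item in notifications:
        events.put(item)
    events.put(None)

def apply_events(index, events):
    """Consume (action, path) pairs until None arrives, keeping index in sync."""
    pending_old = None  # old name of a rename whose new name has not arrived yet
    while True:
        item = events.get()
        if item is None:
            break
        action, path = item
        if action == FILE_ACTION_ADDED:
            index.setdefault(path, set())
        elif action == FILE_ACTION_REMOVED:
            index.pop(path, None)
        elif action == FILE_ACTION_RENAMED_OLD_NAME:
            pending_old = path
        elif action == FILE_ACTION_RENAMED_NEW_NAME and pending_old is not None:
            # Carry the tags over from the old name to the new one.
            index[path] = index.pop(pending_old, set())
            pending_old = None
        # FILE_ACTION_MODIFIED is ignored: tags do not depend on file contents.

def run_demo(index, notifications):
    """Run the stand-in watcher on its own thread and apply its messages here."""
    events = queue.Queue()
    watcher = threading.Thread(target=fake_watcher, args=(notifications, events))
    watcher.start()
    apply_events(index, events)
    watcher.join()
    return index
```

In a real program the watcher thread would fill the queue from the buffer returned by ReadDirectoryChangesW instead of from a canned list; the consumer side would not need to change.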