Small files / Large files

Started by KeepingRealBusy, November 30, 2013, 10:29:07 AM


KeepingRealBusy

I ran into some interesting timing when accessing the same amount of data, but split in different ways (256 large files versus 131,072 small files). If anyone is interested, I will explain and supply the timing values I got.

Dave.

dedndave

well - some variations from one machine to another, i would think
different RAM configurations, OS versions, drive sizes, etc

but, it would be interesting to see the results, of course

KeepingRealBusy

Dave,

Here it is:


I started with about 1 TB of data (8 BYTE records), which I reduced to 7/8 TB by
deleting one of the BYTES (7 BYTE records), and I split this data on the highest
8 bits into 256 files.

I then split each of these 256 files into 512 files (a total of 131072 files). I
created intermediate files with 6 BYTE records (3/4 TB) and finalized the
files into 5 BYTE records (5/8 TB) by grouping the data into 4 blocks and placing
a 5 DWORD index at the front which contained the seek offsets of the record
groups. Essentially, I sorted the records on a 19 bit key (the data is binary).
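
In C terms, each of the final small files looks roughly like this (a sketch only, not actual code from the program; the names are made up, and it assumes the 5 DWORDs are byte offsets bounding the 4 record groups):

#include <stdint.h>

#define GROUPS 4

/* One small file: a 5 DWORD index followed by the packed 5 BYTE records.
   group_offset[i]..group_offset[i+1] bounds record group i, so the last
   entry also marks the end of the data.                                  */
typedef struct {
    uint32_t group_offset[GROUPS + 1];   /* the 5 DWORD index             */
    /* uint8_t records[];                   5 BYTE records follow here    */
} SmallFileLayout;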

In the debug process, I had to delete these files and their directories several
times. Deleting the 256 large files took about 30 seconds. Deleting the 131072
small files took over 1 hour! Remember, this is the same amount of data; in fact,
the large files were larger than the small files in total size, yet they were
about 120 times faster to delete.

I started re-thinking whether I should keep the small files or combine them back
into sorted larger files. Only one way to find out: try it. I wrote a procedure
to combine 512 of the small files into a merged copy of the data, with a larger
index in the same format as the smaller files. I then created a randomized
list (only 512 file fragments accessed in total, to keep the execution time down),
laid out so that I accessed 512 different fragments spread out among the
files in the directories. I used the same list for accessing both the small
files (reading a single fragment of the possible 4 fragments) and the large
files (reading multiple fragments). Essentially, I was accessing
data in the same relative positions on the drive surface in both cases.
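
For what it is worth, the randomized list amounted to something like the sketch below (plain C, and the names are invented here, not taken from the actual program): each entry names a top-level split, a sub-file within it, and one of the 4 record groups, and the same shuffled list drives both the small-file run and the large-file run.

#include <stdlib.h>

/* One fragment to read: which of the 256 top-level splits, which of the
   512 small files within it, and which of the 4 record groups.  Only the
   mapping from entry to file name differs between the two runs.          */
typedef struct {
    unsigned split;      /* 0..255 */
    unsigned subsplit;   /* 0..511 */
    unsigned group;      /* 0..3   */
} Fragment;

/* Fisher-Yates shuffle so the 512 chosen fragments land in random order. */
static void ShuffleFragments(Fragment *frag, int count)
{
    for (int i = count - 1; i > 0; --i) {
        int j = rand() % (i + 1);
        Fragment tmp = frag[i];
        frag[i] = frag[j];
        frag[j] = tmp;
    }
}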

For all accesses, I calculated the filename from the random list fragment info,
opened the file, read the index, did a seek to the start of the fragment data,
read the fragment data, closed the file, and then selected another fragment to
read. The file accesses were all set up for unbuffered I/O to avoid a lot of
system copying: reading directly into a sector sized and sector aligned buffer,
from a sector aligned file position, with a sector sized read length. OBTW, the
files were all on a 4 TB external drive on a USB3 port.
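
In rough C terms, the per-fragment read path looked something like the sketch below (an outline of the idea only, not the actual code; ReadFragment is an invented name). With FILE_FLAG_NO_BUFFERING, the buffer address, the file offset, and the read length all have to be multiples of the sector size, and VirtualAlloc conveniently returns memory that satisfies the buffer alignment requirement.

#include <windows.h>

#define SECTOR_SIZE 4096

/* Open the file, seek to a sector-aligned offset, and read a sector-sized
   amount into a sector-aligned buffer, bypassing the system cache.        */
static BOOL ReadFragment(const char *path, LONGLONG offset, DWORD bytes,
                         void *alignedBuf, DWORD *bytesRead)
{
    HANDLE h = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, NULL);
    if (h == INVALID_HANDLE_VALUE) return FALSE;

    LARGE_INTEGER pos;
    pos.QuadPart = offset;                 /* must be a multiple of 4096   */
    BOOL ok = SetFilePointerEx(h, pos, NULL, FILE_BEGIN) &&
              ReadFile(h, alignedBuf, bytes, bytesRead, NULL);

    CloseHandle(h);
    return ok;
}

/* The buffer can come from VirtualAlloc, which returns page-aligned (and
   therefore sector-aligned) memory:
       void *buf = VirtualAlloc(NULL, bufSize,
                                MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); */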

I thought that the extra time to read 4 KB (the drive sector size) just to get
the 20 BYTE index from the small files would match the extra time to read the 8
KB index on the large files. I did not attempt to read only the first half or
the second half of the index on the larger files, depending on which entry was
needed; doing so could have reduced the index reading time. The fragments
I was reading were generally 1,310,000 BYTES (about 1.3 MB each), which should
have swamped any index reading inefficiencies.

I thought that the directory processing (the filesystem meta-data) was
responsible for the slow deletion time, and that it would also affect the file
access time when reading. Oh, was I wrong!

The following is the result:

Timing 131,072 small file accesses.
Accumulated times:

Time:          24.670 Read time.
Time:           8.387 Process time.

Timing 256 large file accesses.
Accumulated times:

Time:          50.851 Read time.
Time:          16.338 Process time.


Dave.

dedndave

when you delete files, the data isn't erased - the entry is just marked

30 seconds * (131072/256) = 4.27 hours
so, deleting the small files was faster than you think   :t

KeepingRealBusy

Dave,

Doesn't the cluster map have to be updated so that new allocations can be made? I would not expect the file space to be erased, but at least the bit map must be updated?

Dave.

hutch--

Dave,

is it an old disk or a very slow one? I know this problem from trying to write high file counts to pen drives, which are as slow as a wet week. I sometimes back up or move partitions around from one machine to another, and you get to see how fast the reads and writes are, apart from the speed of a gigabit network, which flattens out the speed some.

I wonder if in fact it is an old disk that is slow at writes, or at least has a slow seek time.

sinsi

Another slowdown is whether the removable disk is optimised for quick removal or for better performance.
Quick removal doesn't use a cache, so writes are flushed straight away, slowing the process (delete, update MFT, update bitmap) for each file.
With the cache enabled it's more like delete, delete, delete... update.

dedndave

Quote from: KeepingRealBusy on November 30, 2013, 02:20:06 PM
Dave,

Doesn't the cluster map have to be updated so that new allocations can be made? I would not expect the file space to be erased, but at least the bit map must be updated?

Dave.

right - that didn't come to mind
let's say you have the drive formatted with 4 KB cluster size
each 256 byte file uses 1 cluster + "directory entry" = 2 writes
and each 131072 byte file uses 32 clusters + "directory entry" = 33 writes (hopefully not)
one might expect that the 32 cluster markers in the volume index are sometimes in the same cluster   :P
or, at least, fall into 2 or 3 clusters
that would account for the disparity between the previous calculation and the measured times

KeepingRealBusy

Dave,

Your counts are not quite right.

Take 7/8 TB and divide by 256. This gives you the average size of one of the 256 data files. Add 4095 to that and divide by 4096 to get the number of 4 KB clusters in each of the 256 data files.

((((1024*1024*1024*1024)*7)/8)/256) = 3,758,096,384 BYTES  (((3,758,096,384)+4095)/4096)  = 917,504 clusters.

Take 6/8 TB and divide by 256 and divide by 512. This gives you the average size of one of the 131,072 intermediate files. Add 4095 to that and divide by 4096 to get the number of 4 KB clusters in each of the 131,072 intermediate files.

((((1024*1024*1024*1024)*6)/8)/256/512) = 6,291,456 BYTES  (((6,291,456)+4095)/4096)  = 1,536 clusters.

Take 5/8 TB and divide by 256 and divide by 512, and add 20. This gives you the average size of one of the 131,072 index files. Add 4095 to that and divide by 4096 to get the number of 4 KB clusters in each of the 131,072 index files.

(((((1024*1024*1024*1024)*5)/8)/256/512)+20) = 5,242,900 BYTES  (((5,242,900)+4095)/4096)  = 1,281 clusters.

Take 5/8 TB and divide by 256, and add 8192. This gives you the average size of one of the 256 merged files. Add 4095 to that and divide by 4096 to get the number of 4 KB clusters in each of the 256 merged files.

(((((1024*1024*1024*1024)*5)/8)/256)+8192) = 2,684,362,752 BYTES  (((2,684,362,752)+4095)/4096)  = 655,362 clusters.

I think these numbers are correct, correct me if I'm wrong.
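
If anyone wants to check the arithmetic, a few lines of C reproduce the numbers above (assuming 4096 BYTE clusters and rounding each average file size up to whole clusters):

#include <stdio.h>
#include <stdint.h>

#define TB      (1024ULL * 1024 * 1024 * 1024)
#define CLUSTER 4096ULL

static void Show(const char *what, uint64_t bytes)
{
    printf("%-32s %14llu BYTES  %9llu clusters\n", what,
           (unsigned long long)bytes,
           (unsigned long long)((bytes + CLUSTER - 1) / CLUSTER));
}

int main(void)
{
    Show("256 data files (7/8 TB)",       TB * 7 / 8 / 256);
    Show("131,072 intermediate (6/8 TB)", TB * 6 / 8 / 256 / 512);
    Show("131,072 index files (5/8 TB)",  TB * 5 / 8 / 256 / 512 + 20);
    Show("256 merged files (5/8 TB)",     TB * 5 / 8 / 256 + 8192);
    return 0;
}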

At any rate, this is a huge amount of data to move around, and this is only half of the data.

The initial data is generated from two incrementing 20 bit factors: the product is masked to 20 bits and saved along with the two factors and a 4 bit fill, for 64 bits per record (this gives the 1 TB total data size). The file name is the hex conversion of the highest 8 bits (as Lxxxxx.dat), but since those 8 bits of the product are already in the file name, they are simply deleted from the records (giving 7/8 TB), saving 20+20+12+4 bits instead of 20+20+20+4 bits.

The other half of the data uses the same two incrementing 20 bit factors: the product is placed in a 40 bit field and right shifted to keep the high 20 bits, then saved with the two factors and a 4 bit fill for 64 bits (this gives the 1 TB total data size). The file name is the hex conversion of the highest 8 bits (as Hxxxxx.dat), and again those 8 bits are deleted from the records (giving 7/8 TB), saving 20+20+12+4 bits instead of 20+20+20+4 bits.

These data files are then split into the index files, which contain only the two 20 bit factors plus an initial 5 DWORD index giving the split between each of the 4 blocks of data.
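
A rough C sketch of the record generation as described above, for anyone who wants to follow along (the field packing order and the function names here are only illustrative):

#include <stdint.h>

#define MASK20 0xFFFFFu

/* The "L" half keeps the low 20 bits of a*b; the "H" half keeps the high
   20 bits of the 40 bit product.  Each 64 bit record packs the 20 bit
   product field, the two 20 bit factors, and 4 bits of fill.  The top 8
   bits of the product field determine the Lxxxxx.dat / Hxxxxx.dat file
   name and are then dropped from the stored record, leaving 20+20+12+4.  */
static uint64_t MakeLowRecord(uint32_t a, uint32_t b)
{
    uint64_t prod  = (uint64_t)(a & MASK20) * (b & MASK20); /* up to 40 bits */
    uint64_t field = prod & MASK20;                          /* low 20 bits  */
    return (field << 44) | ((uint64_t)(a & MASK20) << 24)
                         | ((uint64_t)(b & MASK20) << 4);    /* 4 bit fill   */
}

static uint64_t MakeHighRecord(uint32_t a, uint32_t b)
{
    uint64_t prod  = (uint64_t)(a & MASK20) * (b & MASK20);
    uint64_t field = (prod >> 20) & MASK20;                  /* high 20 bits */
    return (field << 44) | ((uint64_t)(a & MASK20) << 24)
                         | ((uint64_t)(b & MASK20) << 4);
}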

Dave.



KeepingRealBusy

As an added thought (or question), does anyone have a description of the format of the file system fields when GPT formatting is used instead of MBR formatting on huge disks?

Dave.

dedndave

i didn't correctly examine your file sizes   :P

my point was that the cluster index entries quite often fall into the same index cluster
the writes are cached, so - it doesn't write each cluster entry individually

dedndave

GPT - what OS are you using?

as far as i know, the drives still use NTFS format (perhaps a higher version)
but, GPT allows for larger volumes and partitions

KeepingRealBusy

Dave,

My post above used the term "clusters" incorrectly; that should have been "sectors". I'm not sure whether or not "clusters" are used in NTFS. The drives I am saving these files on are 4 TB external drives, and as you mentioned in prior posts, they are in GPT format and have 4096 BYTE sectors.

Dave.

Vortex

Hi Dave,

You can check the GPT fdisk tools, a set of text-mode partitioning tools for GPT drives:

http://www.rodsbooks.com/gdisk

KeepingRealBusy

Vortex,

Thank you very much for the link.

Dave.