In 1956 (seven years before I was born), IBM produced the first hard disk drive available to the general public. It was huge by today’s standards - about the size of a washing machine. It contained 50 disk platters, each about 2 feet in diameter. It had a capacity of only about 4 MB, which is about the size of a single picture I take with my smartphone camera. Companies that used it had to lease it from IBM for $750 per month, or $9,000 per year. It was probably pretty loud and likely used quite a bit of electricity to run.
Thirty years later, in 1986, while I was still in college, I purchased my first hard drive. In the interim, manufacturers had managed to greatly decrease the size while also more than doubling the capacity. It was about 3x the size of today’s 3.5 inch drives, but it still fit inside my computer box. It cost $800 and held 10 MB of data. It was also very slow by today’s standards, but compared to the floppy drives I had been using, it seemed very fast. File systems had already been invented to manage the data on these drives, even though the drives would only hold a few hundred small files.
Ten years after that, in 1996, I purchased my first 1 GB drive. The capacity had increased 100-fold and the cost had been cut by 75%. For only $200, I had more disk space than I knew what to do with (at the time, at least). The size had shrunk to the modern form factor, and speeds had also increased significantly.
Just eleven years later, in 2007, the first 1 TB drive was released. Capacity had increased another 1000x, and speeds had also improved, although at a much slower pace. Even though average file sizes had increased, disk capacity had increased so much more that these drives could hold millions of files. Some changes to file systems were made, but nearly all file systems are still based on the same architecture designed decades earlier.
Today, drives that hold 22 TB are available, and drive manufacturers have announced drives with a 30 TB capacity which will be available later this year. Speeds have continued to improve, but not nearly at the same rate as capacity. For every doubling of speed, capacity seems to increase 10-fold or more. This means that it takes an ever-increasing amount of time to read or write all the data on one of these drives. Backing up a full drive once took several minutes; it can now take more than a full day. Drives that fail within an array of disks (i.e., a RAID setup) can likewise take days to rebuild.
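To put rough numbers on that, here is a minimal sketch of the arithmetic. The 250 MB/s sustained transfer rate is an assumption chosen only for illustration (it is not a measured figure for any particular drive), and `full_pass_hours` is a made-up helper name.

```python
def full_pass_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read or write every byte once at a sustained rate."""
    capacity_bytes = capacity_tb * 10**12        # decimal terabytes, as drives are marketed
    bytes_per_second = throughput_mb_s * 10**6   # decimal megabytes per second
    return capacity_bytes / bytes_per_second / 3600


# ASSUMPTION: 250 MB/s sustained sequential throughput (illustrative only).
ASSUMED_MB_PER_S = 250

for capacity_tb in (1, 22, 30):
    hours = full_pass_hours(capacity_tb, ASSUMED_MB_PER_S)
    print(f"{capacity_tb:>2} TB at {ASSUMED_MB_PER_S} MB/s: {hours:.1f} hours")
```

At that assumed rate, a single pass over a 30 TB drive works out to roughly 33 hours, and that is with nothing else competing for the disk. Plug in whatever throughput your own drives actually sustain.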
File systems are just too old to effectively manage the number of files that can fit on modern disk drives. If the average file size is 100 KiB, then just one of the 30 TB drives can fit about 300 million files. If you bundle several of them into a RAID setup, then a single system can hold billions of files. While some file systems are capable of storing that many files, things like searches for specific subsets of files are untenable using just file system mechanisms.
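The file counts above are simple division, sketched below. The 8-drive array is a hypothetical example size, and the calculation ignores file system metadata overhead and any capacity lost to RAID redundancy, so treat the results as upper bounds.

```python
KIB = 1024       # bytes in a kibibyte
TB = 10**12      # bytes in a decimal terabyte, as drives are marketed


def files_that_fit(capacity_bytes: int, avg_file_bytes: int) -> int:
    """Upper bound on file count: whole files only, no metadata overhead."""
    return capacity_bytes // avg_file_bytes


per_drive = files_that_fit(30 * TB, 100 * KIB)

# ASSUMPTION: a hypothetical 8-drive array with no capacity lost to parity.
DRIVES_IN_ARRAY = 8
per_array = per_drive * DRIVES_IN_ARRAY

print(f"One 30 TB drive: {per_drive:,} files")
print(f"{DRIVES_IN_ARRAY}-drive array: {per_array:,} files")
```

One drive comes out to a little under 300 million files, and the assumed 8-drive array to a bit over 2.3 billion.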
Various databases can be used to track files and do searches, but those systems are separate from the file system containing the files. If the database falls out of sync with the files, it must be rebuilt in order to be accurate. As with the backups and rebuilds mentioned earlier, that operation can take days when hundreds of millions of files are present.
My Didgets system was designed to effectively manage the number of files that modern-day disk drives can hold. Instead of just building on top of existing file systems, I built it from the ground up with today’s data needs in mind.