This makes me wonder how efficient filesystems are with millions of files in a single directory? Do they create some sort of index? Are there limits to the number of files a directory can hold? Is that what inodes are for? I remember seeing "inode" limits on some VPS I was using a while back.
Indeed, putting too many files in a single directory is inefficient, both for read and write. When writing a file (or changing metadata like permissions), the entire directory inode may have to be rewritten. When searching for a file or opening it, the entire file list for the directory needs to be read (in the worst case where the file is at the end of the list).
When you need to store lots of files on disk, it's a common pattern to spread them out in subdirectories. For instance, instead of `files/2d8af74bcb29ad84`, you would have `files/2d/8a/f74bcb29ad84`.
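The sharding pattern above is easy to sketch. This is a hypothetical helper (the function name and parameters are made up for illustration), assuming the filename is a hex content hash:

```python
import os

def sharded_path(root: str, name: str, levels: int = 2, width: int = 2) -> str:
    """Map 'files' + '2d8af74bcb29ad84' to 'files/2d/8a/f74bcb29ad84'.

    Peels off `levels` prefixes of `width` chars each, so each
    directory holds at most 16**width subdirectories.
    """
    parts = [name[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, name[levels * width:])

print(sharded_path("files", "2d8af74bcb29ad84"))
```

With two levels of two hex characters, each directory tops out at 256 entries, which every filesystem handles comfortably.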
> When writing a file (or changing metadata like permissions), the entire directory inode may have to be rewritten.
That isn't how inode filesystems work -- if you change a file's permissions, it's just an inode update on the file -- not the containing directory.
Even in DOS-type filesystems (FAT/exFAT), it's just a record update in the corresponding dirent for that file.
If you add a new file to a directory, that causes an mtime update on the directory's inode.
The rest is accurate -- many older filesystems have lookup performance that scales poorly with directory size (for DOS filesystems and BSD UFS, you have to do a full directory scan). Also ls defaults to sorting output, which is O(N log N) and can be slow in large directories.
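To illustrate the sorting point (a rough sketch, not a benchmark): an unsorted scan, like `ls -f`, can stream entries one at a time in on-disk order, while sorted output, like plain `ls`, has to read all N entries into memory first:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Create a directory with 1000 files.
    for i in range(1000):
        open(os.path.join(d, f"f{i:04d}"), "w").close()

    # Unsorted, streaming -- comparable to `ls -f`:
    # yields entries in whatever order the filesystem returns them.
    first_unsorted = next(iter(os.scandir(d))).name

    # Sorted -- comparable to plain `ls`: must collect all N names
    # before printing anything, an O(N log N) step on top of the scan.
    names = sorted(e.name for e in os.scandir(d))

print(names[0], names[-1])
```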
Large directory sizes suck on NFS and parallel file systems like Panasas, Lustre, GPFS, and the like. Python and Rust's Cargo also suck on networked file systems and would be greatly improved by pushing things into an SQLite file.
most things suck on networked file systems. I still have trauma from having to use NFS more than a decade ago, on systems that would freeze on boot because the network couldn't be reached or the NFS server was down.
I'd also never put SQLite on NFS. Locking is often broken on NFS, unless you can guarantee a homogeneous environment. I can imagine some Excel guy in marketing launching his SQLite UI on Windows and completely hosing it all.
NFS locking broken... Excel? SQLite UI on Windows? Are you sure you're not confusing SMB with NFS? I've had SMB cache files in ways that muck everything up. Not NFS.
That's sometimes true, but it depends a lot on the filesystem used, and sometimes on the options used when building it. ReiserFS, ext4 and FAT32 won't have the same performance profile.
I think breaking large collections into subdirectories is mainly done to ensure things work even on filesystems that don't deal well with very large directories. It also makes inspecting the files manually a little more convenient: many file explorers (especially the GUI ones) can have trouble with large directories.
What about on the other end? Why not have each hex digit be a directory along a path? Then you have very, very few files per directory at the cost of deeper hierarchy. What's the practical downside?
Directories are just another type of file. So, if you do this, and your filenames are n characters long, you'll end up needing n directory lookups (one per path component) just to find the file you're looking for. Unless the underlying file system does something to make that particular access pattern fast, well... it's going to be stupidly slow after a certain point.
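The tradeoff is easy to see just by counting path components. A hedged sketch (the layout shapes are assumptions for illustration), comparing the two-level sharding scheme from earlier in the thread with a one-hex-digit-per-level layout:

```python
import os

name = "2d8af74bcb29ad84"  # a 16-char hex name

# Shallow sharding: files/2d/8a/f74bcb29ad84 -- a few lookups.
shallow = os.path.join("files", name[:2], name[2:4], name[4:])

# One directory per hex digit: files/2/d/8/a/.../4 -- one lookup
# per character, so the open() cost grows with the name length.
deep = os.path.join("files", *name)

print(len(shallow.split(os.sep)), "vs", len(deep.split(os.sep)))
```

Each component is a separate lookup in the kernel's path resolution (dcache misses mean a disk read per level), so the deep layout trades tiny directories for a much longer resolution chain.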
> Unless the underlying file system does something to make that particular access pattern fast
I don't know exactly how Linux does it. Windows hands off whole paths to the filesystem, so this idea is possible there.
In FreeBSD, there is a generic routine (lookup(9)) that goes component by component, so at each step the filesystem is only asked to resolve a single component to a vnode. I think a clever filesystem implementation (in FreeBSD) could look at the remaining path and kick off asynchronous prefetch... but I am not aware of anything doing this.
There are two modes you can implement for your filesystem: in the so-called 'high level' mode you get the whole path. In the 'low level' mode the filesystem asks you for one piece of the path at a time.
The low level mode seemed faster in my tests, and I think it's also closer to how the Linux kernel works internally?
> Are there limits to the number of files a directory can hold?
Yes, depending on the file system.
For example, ext4 with default settings uses 32-bit hashes. Upon the first collision, you can no longer add more files to the directory (ENOSPC error).
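A back-of-the-envelope check of why a 32-bit hash matters here: by the birthday bound, a collision becomes likely around sqrt(2^32) = 2^16 (~65k) entries, far below "millions of files". A quick sketch using the standard approximation:

```python
import math

def collision_probability(n_files: int, hash_bits: int = 32) -> float:
    """Approximate P(at least one collision) among n_files random hashes.

    Uses the birthday-problem approximation 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    space = 2.0 ** hash_bits
    return 1.0 - math.exp(-n_files * (n_files - 1) / (2.0 * space))

print(f"{collision_probability(65_536):.2f}")     # around 0.39
print(f"{collision_probability(1_000_000):.2f}")  # essentially 1.00
```

So at a million entries a 32-bit hash collision is all but certain; whether that actually stops you from adding files depends on how the filesystem handles collisions.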