This is probably true since NTFS performance just falls through the floor and exhibits other anomalies.
I did a proof-of-concept test to demonstrate a ZIP file with one million (1,000,000) entries. The point was to prove the tools. I happened to use NT (2.0 GHz Xeon, 2 GB RAM) at first and it was incredibly slow, taking over a day to create the files (it was CPU-bound, not disk-bound). However, an NTFS directory did hold a million files.
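For reference, an archive like that can be sketched in a few lines with Python's standard zipfile module; the entry names, payloads, and helper name below are placeholders, not the original test data or tooling.

```python
import zipfile

# Hypothetical sketch: build a ZIP archive containing one million tiny entries.
# Entries are written straight from memory, so no on-disk files are needed.
def build_million_entry_zip(path="million.zip", count=1_000_000):
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(count):
            zf.writestr(f"entry_{i:07d}.txt", f"payload {i}\n")

if __name__ == "__main__":
    build_million_entry_zip()
```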
Another group did some experiments and found that for file reads (no creation and rebalancing) it was faster to manually place the files into a subdirectory structure (e.g. aardvark.zip stored as a/a/aardvark.zip) than to place everything in one directory. That's shocking: it says their B-tree implementation is so bad that it is cheaper for the filesystem to descend into subdirectories than to search within a directory. I think this group was dealing with the 20,000 to 50,000 files case.
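A minimal sketch of that fan-out layout, assuming a split on the first two characters of the filename (the function name, depth, and padding character are illustrative, not taken from their experiment):

```python
from pathlib import Path

# Map a filename to a nested subdirectory path based on its leading characters,
# so "aardvark.zip" lands at <root>/a/a/aardvark.zip.
def shard_path(root: Path, filename: str, depth: int = 2) -> Path:
    stem = filename.lower()
    # Pad with "_" if the name is shorter than the requested depth.
    subdirs = [stem[i] if i < len(stem) else "_" for i in range(depth)]
    return root.joinpath(*subdirs, filename)

# Example: shard_path(Path("/data"), "aardvark.zip") -> /data/a/a/aardvark.zip
```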
The point of the million file case was not that we wanted to do it, but rather to prove that the plan using ZIPs worked with existing cases, foreseeable growth, unreasonable growth, and truly absurd outlier cases (a million files in a ZIP). I had to defend it and wanted to show that the tools and plan would not break even far past the point at which the rest of the process would be expected to fall over.