
correcting size on disk with defrag on RAID5 array?


4 replies to this topic

#1 alpha754293 (Member, 2,015 posts)

Posted 07 February 2013 - 11:03 AM

I'm a little confused about this. I know that the difference between "size" and "size on disk" exists because files are allocated in whole clusters, so a small file takes up an entire cluster even if it doesn't fill it.

And I'm under the impression that defragging might be able to improve what I've termed "cluster utilization" (multiple small files using different parts of the same cluster). For example, say my clusters are 4 kiB and I have a file that's 17 kiB: it will use 5 clusters (20 kiB), leaving 3 kiB wasted. If I have four of these files, would defragging be able to make better use of those last clusters that would normally only be partially filled?
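To put numbers on it, here's a quick sketch of the arithmetic (Python, purely for illustration; the 4 kiB and 17 kiB figures are from the example above, and 64 kiB is my actual cluster size):

    import math

    def slack(file_size, cluster):
        # Files are allocated in whole clusters, so the unused tail of
        # the last cluster is wasted "slack space".
        clusters = math.ceil(file_size / cluster)
        return clusters * cluster - file_size

    print(slack(17 * 1024, 4 * 1024))   # 3072 bytes (3 kiB) wasted
    print(slack(17 * 1024, 64 * 1024))  # 48128 bytes (47 kiB) wasted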

System/OS details:
10× 3 TB drives on an ARC-1230, in RAID5, formatted as NTFS with 64 kiB clusters, running Windows Server 2008 HPC.

I'm also asking because I was checking one of my mounts and it's showing an actual size of about 31.7 GB, but a size on disk closer to 66.1 GB, so I'm wondering whether defragging might fix that. Or would it even matter? When Windows reports disk utilization, is that based on the actual size or on the size on disk?
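Here's roughly how I've been estimating the two numbers (a Python sketch using ctypes; "E:\apps" is a stand-in path, and this ignores NTFS compression, sparse files, and tiny MFT-resident files, so it's only an approximation):

    import ctypes, os

    def cluster_size(root):
        # GetDiskFreeSpaceW reports sectors-per-cluster and bytes-per-sector.
        spc, bps = ctypes.c_ulong(), ctypes.c_ulong()
        free, total = ctypes.c_ulong(), ctypes.c_ulong()
        if not ctypes.windll.kernel32.GetDiskFreeSpaceW(
                root, ctypes.byref(spc), ctypes.byref(bps),
                ctypes.byref(free), ctypes.byref(total)):
            raise ctypes.WinError()
        return spc.value * bps.value

    def tally(top, cluster):
        logical = allocated = 0
        for dirpath, _, files in os.walk(top):
            for name in files:
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue  # skip files we can't stat
                logical += size
                # Every file occupies a whole number of clusters.
                allocated += -(-size // cluster) * cluster
        return logical, allocated

    logical, allocated = tally(r"E:\apps", cluster_size("E:\\"))
    print(logical, allocated)  # roughly "size" vs. "size on disk"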

Any help/advice would be greatly appreciated. Thanks.

Edited by alpha754293, 07 February 2013 - 11:15 AM.

All 'round übergeek.

#2 FastMHz (Member, 403 posts)

Posted 09 February 2013 - 09:13 PM

The problem you're encountering is what's called "slack space". Defragging can help, up to a point, but it won't eliminate it: a cluster is the smallest unit NTFS allocates, and a cluster belongs to exactly one file, so a defragmenter can rearrange clusters but can never pack two files into the same one.

Are you storing many thousands of these tiny files? Are they static and unchanging? You may wish to package them all into a ZIP archive (not necessarily compressed) using 7-Zip; you can then mount the archive with a freebie called Pismo File Mount and access the files without extracting them.

This puts all the tiny files into one huge file, eliminating the "slack space" issue. I've done this with infrequently used tiny files and have gained gigabytes of lost space back.
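If you want to script the packing step, here's a minimal sketch (Python's zipfile in store-only mode; the paths are hypothetical, and 7-Zip's "7z a -tzip -mx=0" does the equivalent from the command line):

    import os, zipfile

    def pack_directory(top, archive_path):
        # ZIP_STORED = no compression: files are simply concatenated with
        # headers, so per-file cluster slack collapses into one big file.
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
            for dirpath, _, files in os.walk(top):
                for name in files:
                    full = os.path.join(dirpath, name)
                    zf.write(full, os.path.relpath(full, top))

    # Write the archive outside the tree being packed.
    pack_directory(r"E:\apps\static", r"E:\static.zip")  # hypothetical paths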

Production: Vishera 8350/32gb RAM/Dual SSD/VelociRaptor/Radeon 7750
Gaming: Phenom II 955/16gb RAM/SSD/VelociRaptor/Radeon 7950
Retro: K6-2 550/256mb RAM/160gb HDD/CompactFlash/3DFX/ATI AIW Pro/SB16/DB50XG
http://www.fastmhz.com

#3 alpha754293 (Member, 2,015 posts)

Posted 10 February 2013 - 06:30 PM

FastMHz, on 09 February 2013 - 09:13 PM, said:

The problem you're encountering is what's called "slack space". Defragging can help, up to a point, but it won't eliminate it: a cluster is the smallest unit NTFS allocates, and a cluster belongs to exactly one file, so a defragmenter can rearrange clusters but can never pack two files into the same one.

Are you storing many thousands of these tiny files? Are they static and unchanging? You may wish to package them all into a ZIP archive (not necessarily compressed) using 7-Zip; you can then mount the archive with a freebie called Pismo File Mount and access the files without extracting them.

This puts all the tiny files into one huge file, eliminating the "slack space" issue. I've done this with infrequently used tiny files and have gained gigabytes of lost space back.


Well... they're actually the applications themselves. I created a directory on the array and mapped it as a network drive so that, instead of installing the apps over and over on the client machines, there's a single instance stored on the network server and shared across multiple clients wherever possible.

So the file-size breakdown ends up being whatever the installers happen to produce.

I don't suppose there's a way for me to format different parts of the drive with different cluster sizes without repartitioning it into multiple partitions and assigning a separate drive letter to each, is there?

Also, should the stripe size for a RAID5 array match the filesystem's cluster size? Or does it not matter?

I also tried copying the folder onto another drive (the OS drive) and then copying it back to see if "re-organizing" the layout might help. It didn't.
All 'round übergeek.

#4 dietrc70 (Member, 106 posts)

Posted 10 February 2013 - 09:08 PM

These are the options I can think of:

1. Shrinking a partition is not very difficult, and you could create a new one with default-size clusters, move the applications over, and link the directories across with NTFS junctions (mklink /J; NTFS doesn't support hard links to directories). Your users would not see any change. The problem with this option is that it would slow the array by forcing it to seek between partitions. I don't really like this option.

2. Reformat the whole array with 8 kiB clusters, which are large enough for a volume this size (NTFS with 4 kiB clusters tops out at 16 TiB) and will eliminate most of your slack space issues. This would probably be the cleanest and best solution (there's a rough estimate of the savings at the end of this post).

3. On a side note, I wonder if you should consider changing your array configuration. Your array is huge, and I'd be concerned about rebuild times and the possibility of a second drive failing during a rebuild on a 10-drive array. RAID 6 or 60 might be worth considering.

I've seen endless arguments about cluster size vs. stripe size. My own impression is that it's usually best to use the manufacturer's recommended settings for your application and then go with the default (minimum) cluster size for your volume size. Perhaps a RAID 5 expert could give more specific recommendations. Since you have so many small files, stripes at the smaller end might be better.
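To give a feel for what option 2 buys you, here's a toy estimate (Python; the file count and file size are made-up placeholders, not measurements from your array):

    def total_slack(sizes, cluster):
        # Total wasted space: each file rounds up to whole clusters.
        return sum(-(-s // cluster) * cluster - s for s in sizes)

    sizes = [17 * 1024] * 100_000                 # pretend: 100k files of 17 kiB
    print(total_slack(sizes, 64 * 1024) / 2**30)  # ~4.5 GiB wasted at 64 kiB
    print(total_slack(sizes, 8 * 1024) / 2**30)   # ~0.7 GiB wasted at 8 kiB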

Edited by dietrc70, 10 February 2013 - 09:25 PM.

#5 alpha754293 (Member, 2,015 posts)

Posted 11 February 2013 - 06:39 AM

dietrc70, on 10 February 2013 - 09:08 PM, said:

These are the options I can think of:

1. Shrinking a partition is not very difficult, and you could create a new one with default-size clusters, move the applications over, and link the directories across with NTFS junctions (mklink /J; NTFS doesn't support hard links to directories). Your users would not see any change. The problem with this option is that it would slow the array by forcing it to seek between partitions. I don't really like this option.

2. Reformat the whole array with 8 kiB clusters, which are large enough for a volume this size (NTFS with 4 kiB clusters tops out at 16 TiB) and will eliminate most of your slack space issues. This would probably be the cleanest and best solution.

3. On a side note, I wonder if you should consider changing your array configuration. Your array is huge, and I'd be concerned about rebuild times and the possibility of a second drive failing during a rebuild on a 10-drive array. RAID 6 or 60 might be worth considering.

I've seen endless arguments about cluster size vs. stripe size. My own impression is that it's usually best to use the manufacturer's recommended settings for your application and then go with the default (minimum) cluster size for your volume size. Perhaps a RAID 5 expert could give more specific recommendations. Since you have so many small files, stripes at the smaller end might be better.


Well, that's just the applications directory (the directory the applications are served out of). In another directory I have a LOT of really big files, and there the size vs. size-on-disk ratio is much better (somewhere around 0.9-0.95). And that's the thing, too: I couldn't predict what the installation files would unpack into, or whether a given application would even run over the network (some do, some won't, and some won't even let me install to a mapped network drive).

(As for backup, I'm starting to think about getting an LTO-3 drive and doing a grandfather-father-son type rotation, but that's another discussion. Originally, the plan was for me to build a second live server running ZFS and run rsync weekly, but I dunno.)

I know that usually the stripe-size vs. cluster-size discussion comes down to performance considerations, whereas in my case I'm not overly concerned about that. (I've got a 12-port SATA 3 Gbps controller, and my network is only 1 Gbps, which means I'm much more likely to saturate the network long before I run out of performance headroom on the array; see the back-of-the-envelope numbers below.) So this ends up being an optimization of a different kind, one that I don't think very many people have done.
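Roughly (Python; the ~100 MB/s per-spindle sequential rate is an assumed ballpark, not a benchmark of these drives):

    link_MBps = 1000 / 8            # 1 Gbps Ethernet is ~125 MB/s peak
    per_drive_MBps = 100            # assumed sequential rate per spindle
    data_drives = 9                 # 10-drive RAID5 stripes data over 9
    array_MBps = per_drive_MBps * data_drives
    print(array_MBps / link_MBps)   # the array can feed the wire ~7x over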
All 'round übergeek.


