wfn

Looking for a full-featured file copying utility


Morning folks,

Haven't posted in a long time, so bear with me. I'm looking for a file copy tool along the lines of xcopy that can copy files in a specific order. I'm trying to maximize performance and reduce fragmentation, since the target drive will be filled nearly to capacity and won't have enough free room for any meaningful defragmentation afterwards. I will be copying several hundred thousand files varying in size between 1MB and 200MB onto a 500GB drive, and I'm thinking the smaller files should go at the beginning of the drive. The reason I'm asking in the first place is that I watched the actual file placement patterns of the built-in Explorer copy on an empty drive using a defrag utility, and I don't like what I'm seeing. Please share your wisdom.


wfn, I don't think that exists, but if you do find it, share it back here.

You could do what you want by scripting it yourself, but it isn't quite trivial. One (slow, CPU-intensive) way would be to build a list of all files, sort it by size, and then feed the filenames to something like Robocopy.
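If you want to try it, something along these lines might serve as a starting point for the list-building part (a rough, untested sketch - C:\source and the temp/output filenames are just placeholders):

@echo off
setlocal enabledelayedexpansion
rem Rough sketch: list every file under C:\source sorted by size, smallest first.
rem Each name is prefixed with its zero-padded byte size so a plain text sort
rem orders the whole tree at once; the prefix is then stripped off again.
del sizes.tmp 2>nul
for /r "C:\source" %%F in (*) do (
    set "pad=000000000000%%~zF"
    >> sizes.tmp echo !pad:~-12! %%F
)
sort sizes.tmp > sorted.tmp
(for /f "tokens=1,*" %%A in (sorted.tmp) do echo %%B) > filelist.txt

filelist.txt then ends up with one full pathname per line, smallest files first.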

However, even if you do that, I'm still not sure it will work 100% as you want. AFAIK NTFS, and most modern filesystems, implement their own logic for file placement, to avoid fragmentation amongst other things. If so, you would need a file copy tool that does 'raw' copying, or a more primitive filesystem on the destination drive.

I take it from the way you ask that simply getting a slightly larger hard drive, and/or not caring about fragmentation, isn't an option?


So your main concern is fragmentation.

You may want to take a look at Microsoft Robocopy XP010 (part of the Windows Server 2003 Resource Kit).

http://go.microsoft.com/fwlink/?LinkId=20249

This new version is amazingly versatile: it supports resume, tree monitoring, bandwidth throttling, filenames up to 64000 characters, etc.

The nice thing in your case is that it can do a huge copy job in two passes. First it can create the tree only: all the directories and files, but with zero size - just placeholders. Then you run the copy job again and copy the full data. That way fragmentation is kept to a minimum.
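For example (just a sketch - C:\source and D:\target are placeholders, and you would add whatever other switches your job needs):

rem Pass 1: lay down the directory tree and zero-length placeholder files only.
robocopy C:\source D:\target /E /CREATE
rem Pass 2: run the same job again to fill the placeholders with the real data.
robocopy C:\source D:\target /E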

I love this tool, so if I can help you with it, just let me know.


Any news? Having played with Robocopy, it's my current favorite of the two. I don't think Beyond Compare was meant as a mass file copy tool, though it does an admirable job of the actual comparing. :)


I didn't like BC because it was taking forever. I'm playing with Robocopy and will finalize my project tonight: I'll do the /CREATE pass to make 0-byte files with a full tree, and then do the actual copying. I'm pondering whether there's a way to feed a list of files to be copied into Robocopy. Then the only thing I'd need is to give it the output of DIR /S/O:S/L/B/A:-D

I don't see how to pipe that DIR output in together with a /CREATE. Command-line RAR takes a text file containing filenames as an argument, but Robocopy doesn't.
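One workaround - only a sketch, and I haven't timed it - is to save that list to a text file and read it back with a FOR loop, calling Robocopy once per file and rebuilding the destination directory as you go (SRC, DST and filelist.txt are placeholders for your own paths and list):

@echo off
setlocal enabledelayedexpansion
set "SRC=C:\source"
set "DST=D:\target"
rem filelist.txt: one full pathname per line, in the order you want them copied.
for /f "usebackq delims=" %%F in ("filelist.txt") do (
    set "dest=%%~dpF"
    set "dest=!dest:%SRC%=%DST%!"
    rem The trailing dot stops the final backslash from escaping the closing quote.
    robocopy "%%~dpF." "!dest!." "%%~nxF" /NJH /NJS /NDL /NFL >nul
)

It will be slow - one Robocopy invocation per file - but it does give you full control over the order.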

Using a couple of runs with the /MAX and /MIN settings, you could copy your files in sorted batches, or 'buckets', based on size. I don't think there will be much of a benefit though, but it boils down to how the filesystem allocates slack space, which I don't know.
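For instance (byte values picked arbitrarily, just to show the idea):

rem smallest files first: up to 10MB, then 10-100MB, then everything larger
robocopy C:\source D:\target /E /MAX:10485760
robocopy C:\source D:\target /E /MIN:10485761 /MAX:104857600
robocopy C:\source D:\target /E /MIN:104857601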

Reefer's suggestion to use Robocopy /CREATE already fixes 99% of what you want to do. Take that, and be happy it's so simple when you know how. :-)

but it boils down to how the filesystem allocates slack space, which I don't know.

NTFS uses sectors to store its data in - a sector being the smallest part of the disk. Ignoring alternate streams, the rule is that once data of one file is entered into a sector, nothing else can be stored in that sector.

Say you use large clusters of 64KB on your whole disk, and you only have files with a size of 32KB. That would mean each 32KB file is stored in a 64KB cluster, which cannot store anything else. This situation gives you wasted space - slack space - of 50%.

Say you use clusters of 4KB on your whole disk, and you only have files with a size of 30KB. That would mean each 30KB file occupies:

7 clusters of 4KB = 28KB (no loss/slack space here)
1 cluster of 4KB, using only 2KB of that sector.

That's 2KB wasted per 32KB, so this situation gives you wasted space - slack space - of 6.25%.

Usually the problem is that there's a mix of small files and big ones. For the small files you may want a small sector size like 512b; that saves you slack space. The large files are then also stored with hardly any slack, but at the expense of speed.

Say you've got all movies of 1GB. (=1000MB*1024b=1.024.000b)

With a cluster size of 512b, that will take 1.024.000/512=2000 reads.

With a cluster size of 4KB, that will take 1.024.000/4096=250 reads.

Slack space :

a movie of 1GB on clusters of 512b will use exactly 2000 clusters and 0 % slack space.

The trick is to find a cluster size that gives you good performance and not too much slack space.
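For what it's worth, you pick the cluster size when you format the volume, e.g. (the drive letter and size are just an example):

rem /A: sets the allocation unit (cluster) size in bytes; /Q is a quick format
format D: /FS:NTFS /A:4096 /Q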

You can check your slack space using Karen's Disk Slack Checker tool : http://www.karenware.com/powertools/ptslack.asp

I have RAID sets of 2TB and I find the default 4KB very nice for a mix of small and large files, but it is definitely a trade-off between speed and slack space.

Please note that NTFS compression is not possible using sectors larger than 4KB.

Hope this helps.



Reefer, your theory is correct, but your maths is wrong.

1GB. (=1000MB*1024b=1.024.000b)

Technically, in the traditional binary sense,

1 GB = 1024 MB = 1,048,576 KB = 1,073,741,824 B

So, 512 byte sectors would mean a 1GB file would use 2,097,152 sectors.

4 KB sectors would mean a 1 GB file would use 262,144 sectors.


You are correct.

Unfortunately, so am I...

This discussion is repeated again and again...

At least I hope my story is enlightening...


Glad to report that Robocopy worked out just fine; I have 0 excess fragments per file according to Diskeeper.

One question I couldn't find an answer to: what is "transfer.log", and why does Robocopy try to put it in the System Volume Information folder on the target drive?


I am glad it worked to your satisfaction!

The only transfer.log I know of is the one MS DPM creates - got that running on the 'source' machine?

BTW, Robocopy has /XF and /XD switches to exclude files and folders. I always specify those to exclude the System Volume Information folder, the Recycler folder and the pagefile. It also supports wildcards, so where applicable I also exclude temp files etc.
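For example (these are just the usual XP-era names - adjust to your own setup):

rem /XD excludes directories, /XF excludes files; both accept wildcards
robocopy C:\source D:\target /E /XD "System Volume Information" RECYCLER /XF pagefile.sys *.tmp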

I recommend xxcopy and unison...

xxcopy (command-line-only version) as a Swiss Army knife style replacement for xcopy and robocopy

http://www.xxcopy.com/index.htm

unison (command-line and basic GUI versions) for bi-directional syncing

http://www.cis.upenn.edu/~bcpierce/unison/

Good tip.

But personally, I do not agree.

I love xxcopy, and I use it often when Robocopy cannot do the job.

The latest version of Robocopy, XP010, has a lot more options than the older ones did. The comparison found on the xxcopy site is clearly a bit outdated: it compares the Robocopy from Microsoft's Windows NT/2000/XP Resource Kits (version 1.96 or earlier) with xxcopy. Robocopy XP010 comes from the 2003 Resource Kit, and the huge number of enhancements made to it is not reflected - and I value those enhancements highly.

I value both tools - IMHO they complement each other...



Don't confuse sectors and clusters - they are not interchangeable.

A sector is the smallest unit on the disk. With conventional hard drives, a sector is 512 bytes and not user-changeable. This will be changing as drive manufacturers move to big-sector drives (www.bigsector.org); floppies and optical media use different sector sizes.

A cluster is the smallest unit addressable by the operating system. The OS bundles a group of sectors together into a cluster, so a 4KB cluster consists of 8 sectors on a typical NTFS hard drive.

Cluster size is what you want to consider in this exercise.
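If you want to see what a given volume actually uses, fsutil will report both numbers (XP and later; C: is just an example):

rem look for the "Bytes Per Sector" and "Bytes Per Cluster" lines in the output
fsutil fsinfo ntfsinfo C: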

Don't confuse sectors and clusters - they are not interchangeable.

*blush* I did.

Believe me, I know the difference. Thanks for the correction.


Oops, so did I! <Feeling sheepish> Good catch, bytre!

Everywhere we said "sectors", mentally replace it with "clusters".

And yes, reefer, I should have acknowledged that you made a good point. It came out a bit abrupt, just correcting you. Decimal GB vs. binary GiB (I hate that term) is a constant cause of confusion. But what caused me to post was the mixture of the two, and the three orders of magnitude error!

Of course, it's been generally agreed that the standard 4KB clusters are fine for general use. If a volume is predominantly storing very large files, it can reduce the processing overhead if you increase the cluster size. But if space is at a premium, it helps to consider that

(slack space) <= (cluster size) * (number of files)
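To put rough numbers on that for this thread (assuming something like 300,000 files, since wfn said "several hundred thousand"): with 4KB clusters the slack is bounded by 4KB * 300,000 = roughly 1.2GB, and on average it will be about half a cluster per file, around 600MB - well under 0.3% of a 500GB drive either way.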


SPOD: no worries, I am glad to be corrected - I consider that a good thing.

But the inconsistencies I made... pfff... this is obviously not my week.

I know I am tired lately, but I hate those mistakes...

