
Hitting 2GB Compressed Zip Limit, Recommendations?


Greetings all,

We have a script running that compresses a few directories on our file server. We grab these zip files (made by info-zip) and burn them to DVD for storage.

Thus far this has been a fast and inexpensive backup method.

We noticed that a few of our zip files were not updating... and realized that they were all close to 2GB.

One of the directories has ~80k files totalling ~6GB, which compresses down to 1.7GB.

The other has ~10k files totalling ~4.9GB, and its archive is hitting the 2GB limit.

So....

Are there any utilities that don't have this limitation and can handle 100k files?

Experiences?

Thanks!

George


Tar with BZip2 or Gzip?

Check your filesystem to make sure you won't run into a FS limit as well.
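A minimal sketch of that suggestion, assuming GNU tar, gzip, and split are available; the function name and the 4400MB chunk size are examples, not a standard:

```shell
#!/bin/sh
# Sketch of the tar+gzip route: neither tool has a 2GB archive limit of its
# own (the filesystem is the real constraint), and split can cut the result
# into DVD-sized pieces.
backup_dir_to_dvd_chunks() {
    src=$1
    tar -czf backup.tar.gz "$src"
    # 4400MB chunks, so each piece fits on a 4.7GB DVD
    split -b 4400m backup.tar.gz backup.tar.gz.part-
}

# Restore later with:
#   cat backup.tar.gz.part-* > backup.tar.gz && tar -xzf backup.tar.gz
```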


(Win)RAR is your friend. You'll never go back.

http://www.rarlabs.com/rar_archiver.htm

"WinRAR supports files and archives up to 8,589 billion gigabytes in size. The number of archived files is, for all practical purposes, unlimited."

Anyway, are you sure that your filesystem can handle files larger than 2GB?

Also, keep in mind that the standard DVD format doesn't support >2GB files either. You'll have to burn them as UDF disks (Universal Disk Format).

Regs,

JD

P.S. For scripting purposes, check out some of the possibilities of the command-line version:

D:\>rar --help

RAR 3.30    Copyright (c) 1993-2004 Eugene Roshal    22 Jan 2004
Shareware version         Type RAR -? for help

Usage:     rar <command> -<switch 1> -<switch N> <archive> <files...>
              <@listfiles...> <path_to_extract\>

<Commands>
 a             Add files to archive
 c             Add archive comment
 cf            Add files comment
 cw            Write archive comment to file
 d             Delete files from archive
 e             Extract files to current directory
 f             Freshen files in archive
 i[par]=<str>  Find string in archives
 k             Lock archive
 l[t,b]        List archive [technical, bare]
 m[f]          Move to archive [files only]
 p             Print file to stdout
 r             Repair archive
 rc            Reconstruct missing volumes
 rn            Rename archived files
 rr[N]         Add data recovery record
 rv[N]         Create recovery volumes
 s[name|-]     Convert archive to or from SFX
 t             Test archive files
 u             Update files in archive
 v[t,b]        Verbosely list archive [technical,bare]
 x             Extract files with full path

<Switches>
 -             Stop switches scanning
 ac            Clear Archive attribute after compression or extraction
 ad            Append archive name to destination path
 ag[format]    Generate archive name using the current date
 ao            Add files with Archive attribute set
 ap<path>      Set path inside archive
 as            Synchronize archive contents
 av            Put authenticity verification (registered versions only)
 av-           Disable authenticity verification check
 c-            Disable comments show
 cfg-          Disable read configuration
 cl            Convert names to lower case
 cu            Convert names to upper case
 df            Delete files after archiving
 dh            Open shared files
 ds            Disable name sort for solid archive
 e<attr>       Set file exclude attributes
 ed            Do not add empty directories
 en            Do not put 'end of archive' block
 ep            Exclude paths from names
 ep1           Exclude base directory from names
 ep2           Expand paths to full
 f             Freshen files
 hp[password]  Encrypt both file data and headers
 idp           Disable percentage display
 ieml[addr]    Send archive by email
 ierr          Send all messages to stderr
 ilog[name]    Log errors to file (registered versions only)
 inul          Disable all messages
 ioff          Turn PC off after completing an operation
 isnd          Enable sound
 k             Lock archive
 kb            Keep broken extracted files
 m<0..5>       Set compression level (0-store...3-default...5-maximal)
 mc<par>       Set advanced compression parameters
 md<size>      Dictionary size in KB (64,128,256,512,1024,2048,4096 or A-G)
 ms[ext;ext]   Specify file types to store
 o+            Overwrite existing files
 o-            Do not overwrite existing files
 os            Save NTFS streams
 ow            Save or restore file owner and group
 p[password]   Set password
 p-            Do not query password
 r             Recurse subdirectories
 r0            Recurse subdirectories for wildcard names only
 rr[N]         Add data recovery record
 rv[N]         Create recovery volumes
 s[<N>,v[-],e] Create solid archive
 s-            Disable solid archiving
 sfx[name]     Create SFX archive
 si[name]      Read data from standard input (stdin)
 t             Test files after archiving
 ta<date>      Process files modified after <date> in YYYYMMDDHHMMSS format
 tb<date>      Process files modified before <date> in YYYYMMDDHHMMSS format
 tk            Keep original archive time
 tl            Set archive time to latest file
 tn<time>      Process files newer than <time>
 to<time>      Process files older than <time>
 ts<m,c,a>[N]  Save or restore file time (modification, creation, access)
 u             Update files
 v             Create volumes with size autodetection or list all volumes
 v<size>[k,b]  Create volumes with size=<size>*1000 [*1024, *1]
 vd            Erase disk contents before creating volume
 ver[n]        File version control
 vn            Use the old style volume naming scheme
 vp            Pause before each volume
 w<path>       Assign work directory
 x<file>       Exclude specified file
 x@            Read file names to exclude from stdin
 x@<list>      Exclude files in specified list file
 y             Assume Yes on all queries
 z<file>       Read archive comment from file

D:\>
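For a scripted nightly run, the switches above could be combined along these lines. This is only a sketch: the archive name, source path, recovery percentage, and volume size are examples, not anyone's actual setup.

```shell
#!/bin/sh
# Sketch built from switches in the help text above:
#   a        add files to archive        -m5  maximal compression
#   -rr3     3% recovery record          -r   recurse subdirectories
#   -v...k   split into DVD-sized volumes (size is in KB units)
#   -ag...   generate the archive name from the current date
RAR_CMD='rar a -m5 -rr3 -r -v4480000k -agYYYYMMDD backup_ D:\share\docs'

# Only run it if rar is actually on the PATH; otherwise just show the command.
if command -v rar >/dev/null 2>&1; then
    $RAR_CMD || echo "adjust the example path before running for real"
else
    echo "$RAR_CMD"
fi
```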


Great info,

The files are made on an NTFS drive, but I wasn't aware of the DVD limit, thanks for the heads-up!

I'm going to test out RAR... thanks for the command line info!

Take Care

-George


I used to have both WinRAR & WinZip on my machines.

I found that I was naturally using WinRAR for everything, even handling ZIP archives.

My machine now only has WinRAR, and I have also used it to create backups of multiple GB. If necessary, there is even an option to split the archive into multiple smaller "parts" (say 700MB each for CD, or 4.7GB for DVD).

One other feature that I would strongly commend to you for backups - use the option to add Recovery Records. This makes the data completely recoverable, even in the event that part of the archive gets corrupted (eg some bad sectors). It adds somewhat to the time to create a large archive, but it gives you a lovely warm feeling of security.

You have to set a percentage for this option - 1% will only increase a 1GB archive by 10MB. I don't know if there is any recommended percentage, but I use 3% on mine.
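The overhead described above is easy to estimate: a recovery record of P percent adds roughly P percent to the archive size. A quick sketch, using the 1.7GB archive from the original post as an example:

```shell
#!/bin/sh
# Recovery-record overhead is roughly (percentage x archive size).
archive_mb=1700   # the ~1.7GB archive from the original post
rr_percent=3      # the 3% setting mentioned above

overhead_mb=$(( archive_mb * rr_percent / 100 ))
echo "${rr_percent}% recovery record on ${archive_mb}MB adds ~${overhead_mb}MB"
# prints: 3% recovery record on 1700MB adds ~51MB
```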

I also use the "profiles" feature, to create an "HD backup" profile. It sets several options, such as compression levels, RR %age, etc.

Great prog.

cheers, Martin


One other feature that I would strongly commend to you for backups - use the option to add Recovery Records. This makes the data completely recoverable, even in the event that part of the archive gets corrupted (eg some bad sectors). It adds somewhat to the time to create a large archive, but it gives you a lovely warm feeling of security.

You have to set a percentage for this option - 1% will only increase a 1GB archive by 10MB. I don't know if there is any recommended percentage, but I use 3% on mine.


Unless Eugene has done a lot more work on his "recovery" records scheme, I wouldn't recommend it. I would recommend QuickPar. PAR3 (when it arrives) will be the best way to go.

I know you say it gives you a "warm feeling", but have you ever tested WinRAR's ability to fix corruption? I wanted to use it for extra file integrity, but quickly found the options for how much parity ("recovery records") you can keep extremely limiting, and I was dumbfounded when I realized I could corrupt TWO BYTES OF DATA and render WinRAR's recovery abilities useless. WTF?

I do admit this was two years ago, when he first added the option for recovery records. Maybe it has been "fixed." (If so, it hasn't been mentioned...)

But, Winrar's biggest issue was that it couldn't detect corruption in its parity records. And if there was corruption in the parity, even the user couldn't add a way to get it to IGNORE the corrupt area.

Simply put: You could corrupt ANY ONE (just one!) byte of your data, but as long as the first recovery block also had a (just one!) corrupt byte, IT RENDERED YOUR DATA COMPLETELY UNREPAIRABLE.

I believe Winrar stores info in 256-byte blocks. So, yes, MBs and GBs of "recovery" blocks for your data, but due to Winrar's methods, if the parity was corrupt in the block it wanted to use, it was never going to recognize that this block (out of the thousands of good ones you may have) might itself be corrupt.

Winrar would just go "hmmm, I used that recovery block, the CRC still didn't match... Hmmm..."

And then go "oh, well... happy fellow it"

Also, back then he had some odd 15-bit limitation on something that caused some other odd recovery record size limitation... So, either way, even if the parity wasn't going to be useless, I wasn't going to be able to create as much parity as I needed.

Hmmm, actually, if you can't create as much of something as you need, isn't that the definition again of useless?

Never mind...

Don't get me wrong: Winrar is a GREAT GREAT product. For compression, it is my fav. Actually, for everything else it's my fav. All of your other points are valid and I agree.

But, unless Winrar's changed in the past two years or so in how it handles its parity records, I just cannot recommend it at all for corruption repair capabilities...

(I will give Eugene proper credit for being 100% honest with me on Winrar's limitations in this regard. Everything I've mentioned, he knew. [Well, he wrote, he should know...] But yes, he was honest. But I made some recommendations, so perhaps he followed them by now... But, last I knew, Winrar had the same limitations, so, if you are serious about your back-ups, and concerned about corruption and data repair, sorry, I cannot recommend Winrar for that...)

And, either way, PAR3 will be much faster in parity creation/repair, so unless Eugene wants to change his parity algorithm to match, I'd still recommend QuickPar...
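QuickPar itself is a Windows GUI, but the same PAR2 format also has command-line implementations. A hypothetical sketch assuming par2cmdline is installed; the filenames and the 5% redundancy figure are examples:

```shell
#!/bin/sh
# Create external PAR2 parity for a finished archive, instead of relying
# on WinRAR's built-in recovery records. -r5 = 5% redundancy.
PAR_CMD='par2 create -r5 backup.zip.par2 backup.zip'

# Only run it if par2 exists and the example archive is actually present;
# otherwise just show the command.
if command -v par2 >/dev/null 2>&1 && [ -f backup.zip ]; then
    $PAR_CMD    # later: "par2 verify" / "par2 repair" on backup.zip.par2
else
    echo "$PAR_CMD"
fi
```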

Actually, Winrar has one more area of issue concerning file integrity, but since an OS's file management could be blamed in a 50/50 split, I won't bother going there... (Although, since Winrar could remove this potential issue, it's still 100% avoidable...) And yes, I am positive this issue still exists with the latest version of Winrar...

(Simply put, Winrar itself can help create corruption, and since it doesn't take the step to assure it hasn't done so, well, what's the point in storing data that doesn't match the source?)

I admit I haven't brought that issue to Eugene's attention...

But, yes, if you NEED to be certain your saved archives are good before you delete the source, DO NOT (I REPEAT: DO NOT) rely on the "verify then delete" option when creating archives and then deleting the source data...

It's not exactly Winrar's fault in this regard, but since Winrar could 100% detect and avoid it, it'd still be nice if Winrar would... (QuickPar does...)

I don't make these comments because I don't like Winrar. I make them because I love the product. But, it could use some simple improvements to make it just that much better.

But, simply put: Winrar is a data archive/compression tool. It is NOT a data integrity tool.

Simply put:  You could corrupt ANY ONE (just one!) byte of your data, but as long as the first recovery block also had a (just one!) corrupt byte, IT RENDERED YOUR DATA COMPLETELY UNREPAIRABLE.

I believe Winrar stores info in 256-byte blocks.  So, yes, MBs and GBs of "recovery" blocks for your data, but due to Winrar's methods, if the parity was corrupt in the block it wanted to use, it was never going to recognize this block (out of thousands of more good ones you may have) may be corrupt.

Winrar would just go "hmmm, I used that recovery block, the CRC still didn't match...  Hmmm..."

And then go "oh, well... happy fellow it"


Damn, I hadn't heard about that before. Thanks for the info.

I've successfully used recovery records in the past however. Somewhere in 1998 I discovered corruption issues after sending large files over my 10Mbit network (coax). Eventually, it appeared that I had (all at once!):

- bad RAM

- an IDE controller which caused corruption after enabling bus mastering

- a network card which somehow corrupted data without it being noticed by the tcp/ip stack :s (swapped it with an identical card and everything was alright then)

I had transferred RAR archives over the network to burn them to a cd-rom. I discovered the corruption after I had already deleted the source files. Luckily, in 50% of the cases "rar r" repaired it.

So now I'm wondering, if the "oh, well... happy fellow it" happens, does WinRAR report that? Or won't you ever know about it?

JD

Are there any utillities that don't have this limitations and could handle 100k files?


I thought of your post yesterday as I backed up over 3GB of .pdf, .doc, and .jpg files with WinZip.

It ended up being about 2.4GB. WinZip 9.0 seems to handle that just fine.

