Gilbo

Reiser4 has been released!


While "a new era in storage is beginning" may be a little melodramatic, the potential this filesystem offers should not be underestimated. The most interesting aspects of the FS, in my opinion, are the plugin system and the fact that it is an atomic FS.

I've been following the development of Reiser4 with a great deal of interest. There was, for a brief period, a terrific hubbub regarding Longhorn's WinFS. I've always thought that Reiser4 was the FS to watch for the future. It has greater potential. And, hell, it's just plain more elegant than slapping another layer between you and your data (like WinFS intends to do).

The developers of Reiser4 are Namesys, and their site is here. For now, kernel patches can be found on the Namesys site at this location. Obviously this is the first release version, and, while it is considered stable, there are myriad caveats which the prudent user should heed. If you install Reiser4 on your company's servers and everything explodes (like one of my hard drives did today :( Don't worry, Reiser4 had nothing to do with it), please don't blame me, or, for that matter, anyone else at all.


I wonder about the possibility of porting this to Windows.

It seems there is a project to do this for Ext2/3 but I could find nothing about any incarnation of ReiserFS.

Of course, I also wonder whether using ReiserFS in WinXP would be a good or bad thing... :)

-JoeTD


Any word on its bad block handling abilities? The previous versions' inability to deal with bad blocks was their most significant downfall.


As for ReiserFS v3, my personal experiences are:

+ saves space on small files

+ best performer at deleting files

- unsuitable for small partitions

- slow on dirs with lots of files

- corrupts easily, and if that happens there is a 75% chance the data is lost

- problems with quota

- performance degrades heavily over time (I guess due to fragmentation)

Maybe my experiences are a year old and obsolete, but last month I tried it on a /tmp partition, it went badly, and I dumped Reiser once again :)

My preferences are:

ext3 for /boot and /tmp

XFS for everything else

The only drawback of XFS I know of is that it is slow at deleting files.


The file system corruption problem with reiser has, AFAIK, been fixed. I had FS corruptions with reiser (twice!) in 2001 when running under early versions of the 2.4 kernel, but haven't had any trouble in the past couple of years.

That said, however, I still use ext3 for most of my systems because the FS tools are more mature (fsck does a better job cleaning up after an unclean shutdown and forensics tools are easily available that allow me to undelete files). Plus, I simply "trust" it more.

I saw the news about Reiser4 earlier today, though. This has been in the works for several years at least, so it's GREAT news that it's finally out. I'm thinking about switching over to it as my main FS once it matures a bit and I don't hear about anyone having problems. It seems to have many more features and higher performance than ext3, which, for me, may be enough to get me to switch back.

> that it is an atomic FS.

Does this apply to all FS operations, including reads and writes?

And does it allow me to replace a file with a new version where any other app sees either the old or the new version, but nothing else?

> While "a new era in storage is beginning" may be a little melodramatic, the potential this filesystem offers should not be underestimated. The most interesting aspects of the FS, in my opinion, are the plugin system and the fact that it is an atomic FS.

Gilbo, would you mind making a short summary of why we should be interested in this FS? If you make it compelling enough I'll actually dig through the references you provided! :)

In particular, I'm curious about whether it addresses the problems WinFS is designed to solve (access to huge amounts of data, mostly) without "slapping another layer between you and your data."

-- Rick


The data corruption problems with ReiserFS have indeed been solved. In fact, IBM boxes that use Linux (including their servers) ship with ReiserFS as the default filesystem. This has been the status quo for some time now, and I am surprised that so many people persist in their belief that ReiserFS is prone to data corruption.

As for Reiser4, once any bugs have been ironed out, it will represent a new tier of reliability for filesystems. The fact that it handles all IO operations atomically means that it is more consistent even than competing journalled FSes. There are situations, albeit ones that require various coincidences, that will result in errors on a journalling FS. A fully atomic FS like Reiser4 can handle such situations without introducing discrepancies into the data.

> Does this apply to all FS operations, including reads and writes?

I was initially confused by this inquiry because I didn't understand the value of conducting reads atomically. The benefit is the same as for writes of course. To answer your question, it does indeed apply to all filesystem calls. Now that I think about it though, I think most FSes could be considered to be atomic as regards reads.

For general knowledge, the term atomic, when applied to filesystems, means that an operation is completed as a single unit. The entire operation is either completed or none of it is. The necessity of this becomes apparent if you consider that filesystems don't simply write data to a disk, but also organize it. A write operation requires various metadata updates as well as the actual data writing. In an atomic FS it is impossible to, for example, perform the metadata update without completely updating the data itself. Journalled filesystems guarantee atomicity of metadata updates, but this is not enough to guarantee the state of the data after a crash. This keeps the FS from being fubared, but does allow data to develop inconsistencies. A crash of a metadata-journalled FS at just the right time can result in, not data corruption in the traditional sense (journalling prevents that), but a disparity in what data should be on the disk. The data is where it is supposed to be, and everything will work, but the data may not be what you think it is or what it should be.
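
As a rough illustration of the all-or-nothing idea at the application level (this is not how Reiser4 works internally), here is a minimal Python sketch of the common write-to-a-temporary-file-then-rename pattern. The helper name `atomic_write` is made up for this example. It relies on `os.replace`, whose rename is atomic on POSIX filesystems, so anyone opening the path sees either the old contents or the new ones.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` (bytes) in one atomic step.

    Hypothetical helper for illustration only: the new contents go to a
    temporary file in the same directory, and a rename swaps it into place.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # the single step that makes it visible
    except BaseException:
        # Nothing was swapped in, so the old file is untouched; clean up.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that this needs free space for a second copy of the file while the write is in progress, which is the same trade-off that comes up for any scheme that keeps the old version until the new one is complete.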

I also alluded to the fact that Reiser4 has the potential to offer many features of WinFS, and to even go beyond the most ambitious claims made regarding that FS. This is due to two qualities that bestow on it tremendous flexibility: its plugin system, and its ability to treat files, not simply as files, but as directories.

It is theoretically possible for Reiser4 to manage the tags on your music collection, for example, and do it with infinite flexibility. One would, potentially, be liberated from messing around with various limited, troublesome tagging hacks (ID3v1, ID3v2, Vorbis tags) that are incompatible with various files and players, not to mention the infamous databases, which lack any persistence at all. The FS could simply handle tags as files (in the directory of the music file), and applications that understand this would be able to modify them effortlessly, no matter whether the compression format used on the audio supports tags or not, and no matter whether you wanted to include multimedia like album covers or lyrics.

In fact, Reiser4 has been designed with this type of potential specifically in mind. Databases exist to manage data because traditional filesystems are unable to perform many important tasks with reasonable performance. Reiser4 is a filesystem that is designed to compete with the performance of databases in the tasks at which they excel (referencing and qualifying data). WinFS implements Microsoft's SQL Server over NTFS. Reiser4 is the database and the filesystem. Whether the developers will truly be able to realize their performance claims is something time will tell. The tools don't exist, presently, to test Reiser4 as anything but a filesystem in the traditional sense, which is only where its performance, and its utility, begins.

Plugins can also be written to do all sorts of interesting things. Not only could tagging be built into the FS, but audio and video compression could be as well. Container formats like Matroska, which is a multimedia container format that is fulfilling a serious need, and shows tremendous promise, could become unnecessary. Matroska is terrific, but, at its heart, it's a hack for managing data that is designed to compensate for the fact that traditional filesystems are unable to fulfill our needs.

The extent to which this potential is realized will depend on the interest of developers in exploiting the technology fully, as opposed to dooming Reiser4 to be Just Another Linux Filesystem. This will, inevitably, depend on the extent to which Reiser4 becomes the standard FS for Linux boxes.


> To answer your question, it does indeed apply to all filesystem calls.

In that case, I'm wondering how it handles this situation:

You have a 2 GB drive with a 2 GB file on it. You then do one 2 GB write on that file.

How can it guarantee atomicity?


I really don't see the value of plug-ins for a file system. Yes, it sounds cool, but plug-in writers typically don't write them correctly. Plug-ins tend to be the cause of a disproportionate percentage of crashes and hangs. Sorry, but I can't tolerate that in my file system.

Why is WinFS outside of the filesystem? Reliability.

Why would I want to treat files as directories? If I want to make compound files, why would I want to tie them to the file system? I have seen many compound file designs, so why would I want to put this in the file system? This appears to be poor design.

IMHO, Reiser has long been overrated. Reiser3 certainly was, so why would version four be different?

> The data corruption problems with ReiserFS have indeed been solved. In fact, IBM boxes that use Linux (including their servers) ship with ReiserFS as the default filesystem. This has been the status quo for some time now, and I am surprised that so many people persist in their belief that ReiserFS is prone to data corruption.

You are apparently new to the high-tech industry. Data corruption issues in a file system are not forgotten quickly. They should not have been there in the first place.

File systems should be very well tested before you give them out to other people to use. ReiserFS has always been pretty loose with this. I trust XFS or EFS far more than Reiser.


No kidding! After being bitten by Reiser3 data corruption on a few occasions, it will be a cold summer day in Houston before I try it again with any data I care about. Even if bugs leading to corruption have been solved, hardware problems can still cause major trouble. AFAIK, there isn't a reasonable tool for Reiser3 data recovery. With ext2/3, there are very capable and thorough tools to aid in data recovery - e2fsck and debugfs. I have never seen reiserfsck accomplish anything with a corrupt filesystem other than delete everything in it!

To be fair, I've had similar problems with JFS under Linux, even though that filesystem is rock solid under AIX. I've yet to experience any problems with ext3 or XFS, but I typically use ext3 because it is a safe choice. I still think that what sets ext3 apart are its data recovery tools. It's not the fanciest or fastest kid on the block, but it works.

Backups are great when you have them, but you can't force people to make backups...

I think Reiser4 is highly interesting on paper. It will be fun to watch from a distance :)


I don't see why or how plug-ins could cause a crash. Providing a plug-in interface implies that a plug-in cannot make whatever mess it wants.

If a plug-in has a bug when writing its data, the data will be incorrect when the user reads it back through the same plug-in, but there will be no crash or FS corruption. It is no different from an application with a bug writing data to disk. The data will just be incorrect, period.

As I understand plug-ins, they are a convenient way to delegate data types, as we have in databases, to external code, allowing users to create and handle their own data types through the classical I/O functions.

Actually, filesystems and databases are different things, with different constraints and aims. However, in recent years we have seen databases managing more and more plain text and BLOBs (Binary Large OBjects). This blurs the border between filesystems and databases, as a filesystem's purpose is precisely to manage text or binary files.

So I don't expect such a filesystem to really compete against real databases, which can offer much more complex features, but it will allow those specific databases made of plain text or BLOBs to be used the same way files are. When programming, accessing a database means dealing with another complex API. This is the problem this kind of FS is addressing, if I understand correctly.

WinFS is not outside the filesystem for reliability reasons, but for flexibility, simply by adding another (relatively) independent layer. That means the file system and the database layer can each evolve in their own way without interfering with each other.

On the other hand, Reiser's plug-in system will allow it to grow only when needed. Neither design is better or worse; they are just two different designs with different purposes: flexibility versus lightness.

The ability to treat files as directories allows you to access data simply by using ls or cd commands.


That IBM has forgotten is good enough for me. I also believe that all things must be weighed on all the currently available evidence. FUD doesn't constitute evidence in my books, and neither do the random, anecdotal, outdated incarnations in which it generally arises.

I understand that losing data is unfortunate, but any data lost to ReiserFS in critical industry applications is the fault of the administrator, not the FS. An admin should know that evaluation of all critical software is the only method of ensuring the proper operation of a business' technical endeavours.

> You have a 2 GB drive with a 2 GB file on it. You then do one 2 GB write on that file.
>
> How can it guarantee atomicity?

I would only assume that, like most filesystems, it imposes a maximum operation size. In NTFS, for example, the limit is 64KB.


Reiser3 is also the default filesystem in Novell's SuSE Linux.

Not using Reiser because it was buggy in early 2.4 kernels is like not using Mozilla because 0.7 Beta was crap, or not using Windows 2000 because Windows ME was about as stable as a Jell-O skyscraper.

It is the primary filesystem on my server across more than 8 partitions on 3 SCSI and 1 IDE disk and is used for storing everything from PHP and C++ source code (for projects we are developing) to movies and music. The only partition that has ever had a problem on the server was EXT3, but it was easily recovered with no data loss of any kind.

While I wouldn't use Reiser4 for at least 6 months (just to be safe), there is no reason to avoid Reiser3 on partitions that would benefit from its strengths (great small-file performance, space efficiency, quick small IOs and file deletions, etc.)

It isn't the perfect filesystem (mediocre large file performance, less tested quota support) but it certainly isn't unreliable. I've had more problems with NTFS by far, and on less used systems with fewer disks, and many (including me) consider NTFS an enterprise-grade filesystem.

> I understand that losing data is unfortunate, but any data lost to ReiserFS in critical industry applications is the fault of the administrator, not the FS. An admin should know that evaluation of all critical software is the only method of ensuring the proper operation of a business' technical endeavours.

I don't disagree with that statement, but mistakes and failures happen. And I get called in to repair mistakes and failures caused by others. It would be just dandy to have useful recovery tools! Hopefully reiserfsck is a lot more useful today than it was 12-18 months ago; I have no idea, and it certainly isn't my mission to spread FUD. I expect most folks are smart enough to take anecdotes for what they are.

Reiser4 is more than an evolution of Reiser3. It is a totally new filesystem. There will be bugs, and if past performance can be used as an indication of the future, probably major ones. In short, I think Reiser4 will be an interesting "toy" for the time being, and if it proves itself over the next few years, it may be very interesting. That's just my opinion, and it's probably not worth anything to anyone but me :)

Clearly, nobody will be using Reiser4 for "critical industry applications" for quite some time.


However, atomicity is not possible without journalisation. It shouldn't be a technical problem to make a big 2 GB write an atomic operation. But I don't know how Reiser will handle it (maybe it will do as Gilbo assumes) because, even without a technical limitation, it is not clear whether it's a good thing or not.

I guess the root user will be able to specify the kind of limits Gilbo was talking about...

> it certainly isn't my mission to spread FUD.

Sorry Bicster, I didn't mean to imply that it was.

On the subject of WinFS' implementation of the database aspects: I believe it is being constructed the way it is purely because that will be easier. I don't think they ever considered writing a new FS from scratch.

> However, atomicity is not possible without journalisation. It shouldn't be a technical problem to make a big 2 GB write an atomic operation. But I don't know how Reiser will handle it (maybe it will do as Gilbo assumes) because, even without a technical limitation, it is not clear whether it's a good thing or not.

Uhh.. trees. Think of the trees. :)

(Curiosity here: didn't someone patent the use of trees for filesystems, in the manner in which ReiserFS is using them? I'm wondering how that all turned out.)

In that case, it would not be possible to perform an atomic write to a 2GB file on a filesystem which itself is only 2GB. In fact, based on how I understand that tree-structured FSes work, it wouldn't even be possible for that situation to happen, as the largest file on a 2GB filesystem would never be allowed to get larger than slightly less than half of the total maximum size (with a small portion taken up by the FS tree head and file metadata). However, for file sizes up to nearly 1GB, an atomic write can still be performed, as it basically writes to an additional 1GB file. When that is successfully written, and the directory tree above it is successfully modified, and so on up through the root node of the FS metadata tree, then the file has been written correctly. Up until that point, when the final root/head of the tree is modified, the existing filesystem state is still intact and maintains integrity, should the write-in-progress fail at any point. Once it has all been updated, the write has succeeded, and the space taken up by the (now irrelevant) prior copy can be freed and made available for future use.
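
The scheme described above (write a new copy, rebuild the path up to the root, then switch the root last) can be sketched in a few lines of Python. This is only a toy in-memory model of that description, with made-up names (`Node`, `CowStore`, `fail_before_commit`); it makes no claim about how ReiserFS or Reiser4 actually lay out or commit their trees.

```python
class Node:
    """A tree node: a directory (children) or a file (data)."""

    def __init__(self, children=None, data=None):
        self.children = dict(children or {})  # shallow copy, old node untouched
        self.data = data


class CowStore:
    """Toy copy-on-write tree: a write becomes visible only at the root swap."""

    def __init__(self):
        self.root = Node()

    def read(self, path):
        node = self.root
        for name in path:
            node = node.children[name]
        return node.data

    def write(self, path, data, fail_before_commit=False):
        # Build new copies of every node on the path; the old tree stays intact.
        new_root = self._copy_path(self.root, list(path), data)
        if fail_before_commit:
            raise IOError("simulated crash before the root was updated")
        self.root = new_root  # the commit: one reference changes

    def _copy_path(self, node, path, data):
        if not path:
            return Node(data=data)
        copy = Node(children=node.children if node else None)
        head, rest = path[0], path[1:]
        child = node.children.get(head) if node else None
        copy.children[head] = self._copy_path(child, rest, data)
        return copy
```

If the simulated crash happens, `read` still returns the previous contents, because nothing reachable from the old root was modified. The old copy only becomes garbage after the swap, which is why the scheme needs room for both versions at once.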

Basically, everything is fully transactional, in theory. This is a revolution in file-system design principles, and it is one principle that I think should be required in all filesystems, because it is the underlying linchpin feature that allows one to build fully transactional systems and applications on top. Without that solid foundation, everything else built on top is sheer folly when it comes to data integrity. (Aka Windows, and most other popular OSes as well.)

As far as ReiserFS4 goes, I'm slightly biased. I actually started designing something very similar to it (long before I had ever heard of it, or WinFS/Longhorn), nearly 8 years ago. I stopped development on it some time ago, for various reasons, but it encompassed many of the same features that ReiserFS4 and WinFS promise, along with a huge interrelated set of others. Basically, it was an enormous project, far too large for me to go it alone, so I slacked off. I'm thrilled to hear that RFS4 has finally "gone final", and that I might have a chance to test it at some point.

Will there be a port for Windows? I know that it is difficult and costly for a hobbyist to write filesystem drivers for Windows, because MS makes one sign an NDA and pay a large sum to get access to the devkit and necessary docs. So I kind of doubt it, but it would be a nice option.

> That IBM has forgotten is good enough for me. I also believe that all things must be weighed on all the currently available evidence. FUD doesn't constitute evidence in my books, and neither do the random, anecdotal, outdated incarnations in which it generally arises.
>
> I understand that losing data is unfortunate, but any data lost to ReiserFS in critical industry applications is the fault of the administrator, not the FS. An admin should know that evaluation of all critical software is the only method of ensuring the proper operation of a business' technical endeavours.

...

If I am not mistaken: IBM uses SuSE -> SuSE comes with ReiserFS by default -> SuSE's kernels are extensively tested before shipping to the masses.

If a kernel is home-compiled, we have no N+K spare non-production servers to test it on. In this scenario mistakes are more common, and not necessarily data loss, but downtime hurts a lot...

And does this sound like FUD or anecdotes?

...

> Maybe my experiences are a year old and obsolete, but last month I tried it on a /tmp partition, it went badly, and I dumped Reiser once again :)

...

Let's say the following is an anecdotal situation:

  • XFS file system on an 8GB Seagate SCSI disk (backed up once a day)
  • one day the backup process failed due to HDD errors
  • restoring from the last backup was not the best possible option, since lots of changes had been made during the day
  • I managed to dump the partition to a file with dd, placing zeros wherever the disk was unreadable (all in all ~6MB of randomly located holes, 16-64KB in size)
  • replaced the disk and restored the partition from the dump file (state at that point: unusable partition)
  • fsck managed to restore it to a fully usable state
  • there were checksum files in all dirs, and validation showed that none of the files was corrupted. Sometimes miracles happen :)
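
For anyone who wants to repeat the last step, here is a minimal Python sketch of validating a directory against a checksum file. The post does not say which checksum tool or file format was used, so everything specific here is an assumption: a manifest named `CHECKSUMS` in each directory, one `<hex digest>  <file name>` pair per line (the layout `sha256sum` prints), and SHA-256 as the hash. The function name `verify_dir` is made up for the example.

```python
import hashlib
import os


def verify_dir(directory, manifest_name="CHECKSUMS"):
    """Return the names of files whose SHA-256 digest does not match the manifest.

    Assumed manifest format: "<hex digest>  <file name>" on each line.
    """
    bad = []
    manifest_path = os.path.join(directory, manifest_name)
    with open(manifest_path, encoding="utf-8") as manifest:
        for line in manifest:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, name = line.split(None, 1)
            name = name.lstrip("*")  # sha256sum marks binary mode with '*'
            digest = hashlib.sha256()
            with open(os.path.join(directory, name), "rb") as f:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
            if digest.hexdigest() != expected.lower():
                bad.append(name)
    return bad
```

An empty list means every file listed in the manifest still hashes to its recorded value; a file that overlapped one of the zero-filled holes would show up in the returned list.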

> Think of the trees

Could you explain (or give a link) how trees can be a replacement for journalisation in atomicity, or how trees are used in Reiser?

You need to keep track of each I/O before you can commit or roll back a full set of them. That is journalisation. In my mind, a tree is just one way journalisation can be implemented, convenient because, as long as nodes have no side effects, a commit or rollback can be done just by committing or rolling back the head of the tree.
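
To show what "keep track of each I/O, then commit or roll back the whole set" means in the simplest possible terms, here is a toy Python sketch. It is an in-memory model with invented names (`JournaledStore`, `blocks`, `journal`), not a description of any real filesystem's journal, and it ignores the hard part, which is making the commit step itself survive a crash.

```python
class JournaledStore:
    """Toy model: writes are recorded in a journal and applied only on commit."""

    def __init__(self):
        self.blocks = {}   # the "disk": block number -> contents
        self.journal = []  # pending (block number, contents) records

    def begin(self):
        self.journal = []

    def write(self, block_no, data):
        # Nothing touches self.blocks yet; the write is only logged.
        self.journal.append((block_no, data))

    def commit(self):
        # Apply every logged write, in order, then clear the journal.
        for block_no, data in self.journal:
            self.blocks[block_no] = data
        self.journal = []

    def rollback(self):
        # Discard the pending set; the "disk" is exactly as it was.
        self.journal = []
```

With this model, a metadata update and the data write it belongs to can be logged together and then either both applied or both dropped.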

However, you may be right in saying this is not journalisation. Sometimes design and implementation are two clearly distinct things, and sometimes the implementation is completely part of the design (in the first view, journalisation would be the design and trees the implementation; in the second view, trees would be both design and implementation, and the concept of journalisation would then become irrelevant).


I am certainly interested in trying out Reiser 4. My main area of interest is performance. If the performance claims of the developer turn out to be true (I would like to see some benchmarks on SR before I truly believe), then Reiser 4 has the potential to be a hot commodity.

From the benchmarks on the Reiser site, it appears that Reiser 4 is often competitive with (and sometimes quicker than) ext2, which has no journaling features. Ext3, JFS, and XFS are often left in the dust. If Reiser 4 can provide high performance and high integrity, it is certainly worth a look. How long will we have to wait before hard drive technology can provide so significant an increase?

To not look at Reiser4 because of problems Reiser 3 had in the past seems silly to me. Anyone ever used NT4 in the early days? When I first started at my current job, we had a fairly new installation of NT4 (no service packs). I learned an awful lot about restoring from tape in those days. Life was hell until SP3 was released, at which point there was sanity. Microsoft is not the only culprit: if you spend enough time working in the IT field, almost all vendors (hardware and software) will be guilty of screw-ups from time to time.

The message is that early adopters should beware. No need to switch the production servers over today, but if you have the luxury of a test environment, why not give it a go? The worst case is that it fails to live up to expectations, but on the flip side, we all may be blown away by how good it is. Only time will tell, but it certainly is worth following.

> Think of the trees
>
> Could you explain (or give a link) how trees can be a replacement for journalisation in atomicity, or how trees are used in Reiser?

The ReiserFS website has a large amount of information on the inner-workings of ReiserFS, though it isn't particularly technical. http://www.namesys.com/


Thanks. After reading, it seems that trees are not a replacement for journalisation regarding atomicity, which is still used in a classical way (they use those two things they call "Fixed Location Journaling" and "Wandering Logs", if i understand correctly).

