Storage Forums: RAID5 fileserver recommendations - Storage Forums


RAID5 fileserver recommendations
Building a file server, and would appreciate some tips

#11 jpiszcz

  • Member
  • Group: Member
  • Posts: 472
  • Joined: 15-January 06

Posted 02 January 2007 - 07:23 PM

Haversian, on Jan 2 2007, 06:55 PM, said:

jboles, on Jan 2 2007, 02:31 PM, said:

Also don't forget the 32-bit LBA (2TiB) limit is not a function of the OS but of the RAID controller. Buying one new, however, it shouldn't be an issue at all.


True.

But the LBA limitation is not the only (nor even the most important) limitation. The filesystem may have limitations too, though anything reasonably modern won't. I'm a fan of XFS (wikipedia), but even ext2 (with 8k blocks) will scale to 32TB filesystems and 2TB files.

The biggest limitation in terms of using not-cutting-edge tech will be the partition table format. The standard partition formats limit you to 2TB as well, so you'll have to use something like GPT (wikipedia) or simply put a filesystem on unpartitioned space, which works just fine.


I second the use of XFS :) -- I use it for a 3.3TB FS.
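
As a quick sanity check on the limits quoted above: the 2TiB figure falls out of the arithmetic for 32-bit addressing. This is only a rough sketch, assuming 512-byte sectors (the sector size is an assumption, nobody in the thread states it); the function name is made up for illustration.

```shell
# Largest size addressable with a given number of address bits and unit size.
# Prints whole TiB. Assumes the shell does 64-bit integer arithmetic.
max_tib() {
    bits=$1        # number of address bits, e.g. 32 for 32-bit LBA
    unit_bytes=$2  # bytes per addressed unit (disk sector or filesystem block)
    echo $(( (1 << bits) * unit_bytes / (1 << 40) ))
}

max_tib 32 512    # 32-bit LBA with 512-byte sectors: prints 2
max_tib 32 8192   # 32-bit block numbers with 8k blocks: prints 32
```

The second call matches the 32TB figure given for ext2 with 8k blocks.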

#12 Roxor McOwnage

  • Member
  • Group: Member
  • Posts: 27
  • Joined: 21-March 03

Posted 02 January 2007 - 10:49 PM

If you've had experience with Sun environments, have you considered Solaris 10 x86 instead?

For serving up bits over the network Debian isn't going to get you anything extra, and they both cost the same. But instead of agonizing over the best bang-per-buck hardware RAID cards for Linux... you may get better data consistency, flexibility, and performance by just buying cheap PCI/PCIe/PCIx cards and feeding them to ZFS:

http://en.wikipedia.org/wiki/Zfs

Yes, Linux supports a wider variety of IDE/SATA cards, but the Sol10 HCL gets longer every day, and there are many forums for Sol10/OpenSolaris/SolarisExpress full of people who can help you make the correct hardware choice.

I use Debian and Gentoo at home myself, but use Solaris at work, and my next fileserver will use ZFS.

Something to think about.

Happy New Year!

#13 Haversian

  • Member
  • Group: Member
  • Posts: 143
  • Joined: 01-January 02

Posted 02 January 2007 - 11:35 PM

Roxor McOwnage, on Jan 2 2007, 09:49 PM, said:

For serving up bits over the network Debian isn't going to get you anything extra, and they both cost the same. But instead of agonizing over the best bang-per-buck hardware RAID cards for Linux... you may get better data consistency, flexibility, and performance by just buying cheap PCI/PCIe/PCIx cards and feeding them to ZFS:


If you do decide to use ZFS, let us all know what hardware it's running on, how fast it is, how easy it is to use, etc. It's a fairly new file system, so there's not much collective wisdom out there about it yet.

#14 poodel

  • Member
  • Group: Member
  • Posts: 29
  • Joined: 31-December 06

Posted 03 January 2007 - 05:11 AM

Roxor McOwnage, on Jan 3 2007, 04:49 AM, said:

If you've had experience with Sun environments, have you considered Solaris 10 x86 instead?

For serving up bits over the network Debian isn't going to get you anything extra, and they both cost the same. But instead of agonizing over the best bang-per-buck hardware RAID cards for Linux... you may get better data consistency, flexibility, and performance by just buying cheap PCI/PCIe/PCIx cards and feeding them to ZFS:

Yes, Linux supports a wider variety of IDE/SATA cards, but the Sol10 HCL gets longer every day, and there are many forums for Sol10/OpenSolaris/SolarisExpress full of people who can help you make the correct hardware choice.


Hm... interesting idea. I'll have to read up on how ZFS performs (and I'd have to keep my Debian box on the side). Thanks.

#15 jpiszcz

  • Member
  • Group: Member
  • Posts: 472
  • Joined: 15-January 06

Posted 03 January 2007 - 06:08 AM

poodel, on Jan 3 2007, 05:11 AM, said:

Roxor McOwnage, on Jan 3 2007, 04:49 AM, said:

If you've had experience with Sun environments, have you considered Solaris 10 x86 instead?

For serving up bits over the network Debian isn't going to get you anything extra, and they both cost the same. But instead of agonizing over the best bang-per-buck hardware RAID cards for Linux... you may get better data consistency, flexibility, and performance by just buying cheap PCI/PCIe/PCIx cards and feeding them to ZFS:

Yes, Linux supports a wider variety of IDE/SATA cards, but the Sol10 HCL gets longer every day, and there are many forums for Sol10/OpenSolaris/SolarisExpress full of people who can help you make the correct hardware choice.


Hm... interesting idea. I'll have to read up on how ZFS performs (and I'd have to keep my Debian box on the side). Thanks.


With all of this talk, I am also building another fileserver. I have not outgrown the current one, but I have become sick of the slow speeds of the PCI bus. ZFS in my opinion is probably one of the best filesystems currently in existence; however, it is new and not proven over time yet. I currently use XFS and I am satisfied with it.

Instead of purchasing a $1500 RAID controller, I am going to use the onboard SATA and multiple PCI-e x1 cards with dual SATA ports. Not sure if I want to use RAID5 or RAID10 yet; however, this will give me the speed the drives can push. Currently, my configuration is as follows:

/dev/md3:
        Version : 00.90.03
  Creation Time : Fri Jul  7 18:52:29 2006
     Raid Level : raid5
     Array Size : 3516378624 (3353.48 GiB 3600.77 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 10
  Total Devices : 10
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Wed Jan  3 05:32:03 2007
          State : active
 Active Devices : 10
Working Devices : 10
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : 6b8f95e6:23e17793:9107a4ba:c2732883
         Events : 0.6224664

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1 *seagate/400
       1      57        1        1      active sync   /dev/hdk1 *seagate/400
       2      34        1        2      active sync   /dev/hdg1 *seagate/400
       3      33        1        3      active sync   /dev/hde1 *seagate/400
       4      56        1        4      active sync   /dev/hdi1 *seagate/400
       5       8       81        5      active sync   /dev/sdf1 *seagate/400
       6       8       97        6      active sync   /dev/sdg1 *seagate/400
       7       8       33        7      active sync   /dev/sdc1 * wd/400
       8       8       49        8      active sync   /dev/sdd1 * wd/400
       9       8       65        9      active sync   /dev/sde1 * seagate/400



As you can see, it's a mishmash of IDE+SATA and WD/Seagate. In the new case I am contemplating whether I should get drives that all share the exact same model number or do something else.
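
The Array Size in the mdadm output above is consistent with RAID5 giving up one device's worth of space to parity. A small sketch of that arithmetic follows; the function name is made up, and the sizes are the KiB values mdadm -D prints.

```shell
# Usable RAID5 capacity in KiB: one device's worth of space goes to parity.
raid5_usable_kib() {
    devices=$1      # "Raid Devices" from mdadm -D
    device_kib=$2   # "Device Size" from mdadm -D, in KiB
    echo $(( (devices - 1) * device_kib ))
}

raid5_usable_kib 10 390708736   # prints 3516378624, the Array Size listed above
```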

So far, though, Linux SW RAID has been nothing but awesome. I started out with 1.8TB and 'grew' the RAID5 from there, and then I used xfs_growfs to grow the filesystem.

Like this (I kept the logs when I did this):

Step #1: Growing the RAID

First, you add a spare to the RAID5 pool.

box:~# df -h | grep /raid5
/dev/md3              746G   80M  746G   1% /raid5
box:~# umount /dev/md3
box:~# mdadm -D /dev/md3
/dev/md3:
        Version : 00.90.03
  Creation Time : Fri Jul  7 15:44:24 2006
     Raid Level : raid5
     Array Size : 781417472 (745.22 GiB 800.17 GB)
    Device Size : 390708736 (372.61 GiB 400.09 GB)
   Raid Devices : 3
  Total Devices : 4
Preferred Minor : 3
    Persistence : Superblock is persistent

    Update Time : Fri Jul  7 18:25:29 2006
          State : clean
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : cf7a7488:64c04921:b8dfe47c:6c785fa1
         Events : 0.26

    Number   Major   Minor   RaidDevice State
       0       3        1        0      active sync   /dev/hda1
       1      33        1        1      active sync   /dev/hde1
       2       8       33        2      active sync   /dev/sdc1

       3      22        1        -      spare   /dev/hdc1
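
The log above shows the spare already attached, but not the command that attached it. Here is a sketch of the two mdadm steps, printed rather than executed so that nothing touches a real array. The device names and the count of 4 come from this log; the function name is made up, and the exact invocation is an assumption worth checking against the mdadm man page for your version.

```shell
# Print (do not run) the commands to add a spare and then reshape onto it.
plan_raid5_grow() {
    array=$1       # md device, e.g. /dev/md3
    new_disk=$2    # partition to add as a spare, e.g. /dev/hdc1
    new_count=$3   # number of raid devices after the grow
    echo "mdadm $array --add $new_disk"
    echo "mdadm $array --grow --raid-disks=$new_count"
}

plan_raid5_grow /dev/md3 /dev/hdc1 4
```

Printing the plan first makes it easy to double-check the device names before anything is run as root.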


Then you "grow" the RAID5.

box:~# mdadm /dev/md3 --grow --raid-disks=4
mdadm: Need to backup 384K of critical section..
mdadm: ... critical section passed.

Then you check the status:

box:~# cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
      136448 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      70268224 blocks [2/2] [UU]

md3 : active raid5 hdc1[3] sdc1[2] hde1[1] hda1[0]
      781417472 blocks super 0.91 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  0.0% (85120/390708736) finish=840.5min speed=7738K/sec

md0 : active raid1 sdb1[1] sda1[0]
      2200768 blocks [2/2] [UU]

Then wait a while; when the reshape is done, you can grow the filesystem.
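
How long "a while" is can be estimated from the reshape line in /proc/mdstat. A rough sketch, assuming the speed stays constant (it usually does not); the function name is made up.

```shell
# Rough minutes remaining for a reshape, from the numbers in /proc/mdstat.
reshape_minutes_left() {
    done_kib=$1    # blocks already reshaped (first number in the parentheses)
    total_kib=$2   # blocks to reshape (second number in the parentheses)
    speed_kib=$3   # current speed in K/sec
    echo $(( (total_kib - done_kib) / speed_kib / 60 ))
}

reshape_minutes_left 85120 390708736 7738   # prints 841
```

That is close to the finish=840.5min the kernel reported, so for this array the reshape takes roughly 14 hours.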


Step #2: Growing the filesystem

Growing the XFS filesystem is a breeze:

# xfs_growfs /raid5

box:~# df -h | egrep '(^Filesystem|/dev/md3)'
Filesystem            Size  Used Avail Use% Mounted on
/dev/md3              2.6T  932G  1.7T  36% /raid5
box:~# xfs_growfs /raid5
meta-data=/dev/md3               isize=256    agcount=38, agsize=18314368 blks
         =                       sectsz=4096  attr=0
data     =                       bsize=4096   blocks=683740288, imaxpct=25
         =                       sunit=128    swidth=768 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=4096  sunit=1 blks
realtime =none                   extsz=3145728 blocks=0, rtextents=0
data blocks changed from 683740288 to 781417472
box:~# df -h | egrep '(^Filesystem|/dev/md3)'
Filesystem            Size  Used Avail Use% Mounted on
/dev/md3              3.0T  932G  2.1T  32% /raid5
box:~#
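
The "data blocks changed" line can be tied back to the df figures by converting blocks to GiB. A small sketch, using the bsize=4096 from the data section of the output above; the function name is made up.

```shell
# Convert an XFS data block count to whole GiB for a given block size.
xfs_blocks_to_gib() {
    blocks=$1   # "blocks=" or "data blocks changed" value from xfs_growfs
    bsize=$2    # "bsize=" from the data section, in bytes
    echo $(( blocks * bsize / (1 << 30) ))
}

xfs_blocks_to_gib 683740288 4096   # before the grow: prints 2608
xfs_blocks_to_gib 781417472 4096   # after the grow: prints 2980
```

2608 GiB is about 2.55 TiB and 2980 GiB is about 2.91 TiB, which lines up with the 2.6T and 3.0T that df -h reports if df rounds up (an assumption about df's rounding).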



PROS:

1) RAID5 (the array keeps running if a single drive dies)
2) Only 5-15% CPU utilization under heavy I/O. Here the dd is doing 40-120MB/s and the RAID5 process is only using 12% of the CPU (old 3.4GHz Pentium 4 Prescott)

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20105 bob       18   0  2008  540  440 D   36  0.1   0:05.68 dd
  381 root      10  -5     0    0    0 S   12  0.0  99:08.57 md3_raid5


3) I can monitor all drives via smartctl (smartmontools). Yes, 3ware allows a pass-through to get to the drives, but many other RAID cards do not. This also means I can monitor temperature very easily.

$ ctemp
/dev/hda: ST3400832A: 35°C
/dev/hde: ST3400832A: 34°C
/dev/hdg: ST3400832A: 34°C
/dev/hdi: ST3400832A: 33°C
/dev/hdk: ST3400633A: 36°C
/dev/sda: WDC WD740GD-00FLC0: 30°C
/dev/sdb: WDC WD740GD-00FLC0: 31°C
/dev/sdc: ST3400633AS: 35°C
/dev/sdd: ST3400620AS: 37°C
/dev/sde: ST3400633AS: 36°C
/dev/sdf: WDC WD4000KD-00NAB0: 33°C
/dev/sdg: WDC WD4000KD-00NAB0: 30°C


4) I get between 100-133MB/s read from the array, which is nice.
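
For item 3, the ctemp script itself is not shown, so here is one way such a helper might pull the temperature out of smartctl output. It is only a sketch: it assumes the drive reports SMART attribute 194 (Temperature_Celsius) and that the raw value sits in the 10th column of the attribute table. The function name and the sample line are made up for illustration.

```shell
# Pull the raw temperature out of `smartctl -A` style output on stdin.
# Assumes attribute 194 (Temperature_Celsius) with the raw value in column 10.
smart_temp() {
    awk '$1 == 194 && $2 == "Temperature_Celsius" { print $10; exit }'
}

# Illustrative line only, not captured from the drives in this post:
printf '%s\n' \
  '194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35' \
  | smart_temp
```

On a real box the input would come from something like `smartctl -A /dev/hda | smart_temp`, run once per drive.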


CONS:

1) PCI bus is limited to 133MB/s.
2) Even though I use SATA drives on the motherboard, I believe they are also on the PCI bus as PCI-express was not out when my motherboard was created.
3) Write speed is 38-40MB/s sustained. Again, I believe this is because of the PCI bus: it has to calculate the parity and then write both the parity and the data.
4) Current case setup is a nightmare, which is why I ordered the Cooler Master Stacker. The entire case had to be modded to put fans where they did not belong, and the cables are everywhere. Part of the problem is that some drives are IDE and some are SATA (IDE cables, even the round ones, take up a lot of room). The Antec TruPower 550W handles the drives with no issues; at bootup it hits 500-520 watts, and after the drives have spun up it uses 220-280 watts.
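
The 133MB/s ceiling in item 1 is the theoretical peak of a conventional PCI bus. A sketch of where the number comes from, assuming a 32-bit bus clocked at roughly 33.33MHz (the width and clock are assumptions; the post only gives the 133MB/s figure). The function name is made up.

```shell
# Theoretical peak of a parallel bus in MB/s (decimal megabytes).
bus_peak_mb_s() {
    width_bits=$1   # bus width in bits
    clock_hz=$2     # bus clock in Hz
    echo $(( width_bits / 8 * clock_hz / 1000000 ))
}

bus_peak_mb_s 32 33333333   # conventional 32-bit/33MHz PCI: prints 133
```

That peak is shared by every card on the bus, so ten drives reading at once cannot each run at full speed.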

Pictures of setup (below):

The two raptors are at the very top, followed by the two WD 400s and below that the rest are Seagate IDE+SATA.

Amazingly, with about 10-12 fans in the box, everything stays very cool.

Front of the case, I disconnected the temperature control in the front because it added an additional 3-6 power cables/fan control cables in the case, and as you can see, I have enough of those!

Posted Image

The side of the case. Yes, it's a mess.

Posted Image

Plan:

Build new machine.
New drives (possibly).
Use Cooler Master Stacker.

Hopefully have a lot less mess!

Justin.

#16 Big Buck Hunter

  • Mod
  • Group: Member
  • Posts: 2,338
  • Joined: 07-April 03

Posted 03 January 2007 - 08:15 AM

my 2 cents.

1: Linux DMraid is great for archival fileservers where performance is not critical.
2: I'd like the original poster to check out 3ware's offerings before deciding on a controller. I swear by them.

Thank you for your time,
Frank Russo

#17 lizardking009

  • Member
  • Group: Member
  • Posts: 88
  • Joined: 11-December 02

Posted 03 January 2007 - 05:33 PM

jpiszcz, excellent HOWTO. Thanks. I didn't know that Linux RAID 5 arrays were growable yet! Is there a specific kernel version that you need to support it? Do you know if RAID 6 arrays are growable?

Thanks!

#18 jpiszcz

  • Member
  • Group: Member
  • Posts: 472
  • Joined: 15-January 06

Posted 03 January 2007 - 06:17 PM

lizardking009, on Jan 3 2007, 05:33 PM, said:

jpiszcz, excellent HOWTO. Thanks. I didn't know that Linux RAID 5 arrays were growable yet! Is there a specific kernel version that you need to support it? Do you know if RAID 6 arrays are growable?

Thanks!


I /think/ 2.6.17 introduced support for growable RAID5 arrays. RAID6 is not growable AFAIK, just RAID5.

#19 Haversian

  • Member
  • Group: Member
  • Posts: 143
  • Joined: 01-January 02

Posted 04 January 2007 - 04:08 AM

jpiszcz, on Jan 3 2007, 05:17 PM, said:

I /think/ 2.6.17 introduced support for growable RAID5 arrays. RAID6 is not growable AFAIK, just RAID5.


2.6.16 cleaned up a bunch of RAID code, but you're correct that 2.6.17 introduced the code and interfaces for growing RAID5 arrays. Later kernels include various comments about fixing bugs in the RAID5 code, both specifically related to growability and not, so I wouldn't recommend using the minimum 2.6.17. 2.6.18 merges the RAID4/5/6 code, though it's not apparent from the changelog whether this means RAID6 arrays are growable or not. RAID5->RAID6 migration is not currently possible but it's a near-term feature and the code is actively moving in that direction. I tried to find in the changelog where someone (Andrew Morton?) commented that enough successful reports of RAID5 growing had come in that he's comfortable that it's pretty stable, but I've so far failed.

2.6.19 includes yet more md fixes; 2.6.19.1 does not, but it's the latest stable kernel, so that's probably your best bet. 2.6.20-rc[1-3] don't include many changes to md code.

#20 Haversian

  • Member
  • Group: Member
  • Posts: 143
  • Joined: 01-January 02

Posted 04 January 2007 - 01:06 PM

Forgot to mention: You'll need a copy of mdadm newer than 2.3.1 to use the RAID5 reshape code, in addition to the newer kernel.
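
The version requirements from the last few posts (kernel 2.6.17 or later, mdadm newer than 2.3.1) can be turned into a quick pre-flight check. A sketch only: both function names are made up, and the comparison looks at the numeric dotted fields and ignores suffixes such as -rc1.

```shell
# Succeeds when dotted version $1 is >= $2 (numeric fields only).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        top = (n > m) ? n : m
        for (i = 1; i <= top; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Hypothetical check against the versions mentioned in this thread.
can_reshape() {
    kernel=$1   # kernel version, e.g. the output of: uname -r
    mdadm=$2    # mdadm version string, e.g. 2.5.6
    # Kernel must be >= 2.6.17 and mdadm strictly newer than 2.3.1.
    version_ge "$kernel" 2.6.17 && ! version_ge 2.3.1 "$mdadm"
}

can_reshape 2.6.19.1 2.5.6 && echo "reshape should be available"
```

Passing this check only means the feature exists; given the md bug fixes mentioned above, a later kernel is still the safer choice.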
