poodel, on Jan 3 2007, 05:11 AM, said:
Roxor McOwnage, on Jan 3 2007, 04:49 AM, said:
If you've had experience with Sun environments, have you considered Solaris 10 x86 instead?
For serving up bits over the network Debian isn't going to get you anything extra, and they both cost the same. But instead of agonizing over the best bang-per-buck hardware RAID cards for Linux... you may get better data consistency, flexibility, and performance by just buying cheap PCI/PCIe/PCIx cards and feeding them to ZFS:
Yes, Linux supports a wider variety of IDE/SATA cards, but the Sol10 HCL gets longer every day, and there are many forums for Sol10/OpenSolaris/SolarisExpress full of people who can help you make the correct hardware choice.
Hm... interesting idea. I'll have to read up on how ZFS performs (and I'd have to keep my Debian box on the side). Thanks.
With all of this talk, I am also building another fileserver, not because I have outgrown the current one but because I have become sick of the slow speeds of the PCI bus. ZFS is, in my opinion, probably one of the best filesystems currently in existence; however, it is new and not yet proven over time. I currently use XFS and I am satisfied with it.
Instead of purchasing a $1500 RAID controller, I am going to use the onboard SATA and multiple PCI-e x1 cards with dual SATA ports. I have not decided between RAID5 and RAID10 yet; either way, this should give me the speed the drives can push. Currently, my configuration is as follows:
/dev/md3:
Version : 00.90.03
Creation Time : Fri Jul 7 18:52:29 2006
Raid Level : raid5
Array Size : 3516378624 (3353.48 GiB 3600.77 GB)
Device Size : 390708736 (372.61 GiB 400.09 GB)
Raid Devices : 10
Total Devices : 10
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Wed Jan 3 05:32:03 2007
State : active
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
UUID : 6b8f95e6:23e17793:9107a4ba:c2732883
Events : 0.6224664
Number Major Minor RaidDevice State
0 3 1 0 active sync /dev/hda1 *seagate/400
1 57 1 1 active sync /dev/hdk1 *seagate/400
2 34 1 2 active sync /dev/hdg1 *seagate/400
3 33 1 3 active sync /dev/hde1 *seagate/400
4 56 1 4 active sync /dev/hdi1 *seagate/400
5 8 81 5 active sync /dev/sdf1 *seagate/400
6 8 97 6 active sync /dev/sdg1 *seagate/400
7 8 33 7 active sync /dev/sdc1 * wd/400
8 8 49 8 active sync /dev/sdd1 * wd/400
9 8 65 9 active sync /dev/sde1 * seagate/400
As you can see, it's a mishmash of IDE and SATA, WD and Seagate. For the new case I am contemplating whether I should get drives that are all the exact same model or do something else.
So far, though, Linux SW RAID has been nothing but awesome. I started out with 1.8TB, 'grew' the RAID5 from there, and then used xfs_growfs to grow the filesystem.
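The Array Size in the mdadm output above can be checked by hand: RAID5 spends one member's worth of space on parity, so the usable size is (members - 1) x device size. Below is a minimal sketch of that arithmetic, assuming mdadm reports sizes in KiB. The function names are made up for illustration, and the numbers are copied from the output above.

```python
# RAID5 keeps one member's worth of parity, so usable space is (n - 1) members.
# Sizes are in KiB, which is the unit assumed for mdadm's "Array Size" and "Device Size".

def raid5_usable_kib(raid_devices: int, device_size_kib: int) -> int:
    """Usable capacity of a RAID5 array in KiB."""
    if raid_devices < 2:
        raise ValueError("RAID5 needs at least two member devices")
    return (raid_devices - 1) * device_size_kib


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


DEVICE_SIZE_KIB = 390708736  # "Device Size" from the mdadm output

ten_drive_array = raid5_usable_kib(10, DEVICE_SIZE_KIB)
print(ten_drive_array)                        # 3516378624, matches "Array Size"
print(round(kib_to_gib(ten_drive_array), 2))  # 3353.48
```

The same formula gives 781417472 KiB for three members, which is the size the array had before it was grown.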
Like this (I kept the logs when I did this):
Step #1: Growing the RAID
First, you add a spare to the RAID5 array (in the output below, /dev/hdc1 is already attached as a spare).
box:~# df -h | grep /raid5
/dev/md3 746G 80M 746G 1% /raid5
box:~# umount /dev/md3
box:~# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
Creation Time : Fri Jul 7 15:44:24 2006
Raid Level : raid5
Array Size : 781417472 (745.22 GiB 800.17 GB)
Device Size : 390708736 (372.61 GiB 400.09 GB)
Raid Devices : 3
Total Devices : 4
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Fri Jul 7 18:25:29 2006
State : clean
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1
Layout : left-symmetric
Chunk Size : 64K
UUID : cf7a7488:64c04921:b8dfe47c:6c785fa1
Events : 0.26
Number Major Minor RaidDevice State
0 3 1 0 active sync /dev/hda1
1 33 1 1 active sync /dev/hde1
2 8 33 2 active sync /dev/sdc1
3 22 1 - spare /dev/hdc1
Then you "grow" the RAID5.
box:~# mdadm /dev/md3 --grow --raid-disks=4
mdadm: Need to backup 384K of critical section..
mdadm: ... critical section passed.
Then you check the status:
box:~# cat /proc/mdstat
Personalities : [raid1] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
136448 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
70268224 blocks [2/2] [UU]
md3 : active raid5 hdc1[3] sdc1[2] hde1[1] hda1[0]
781417472 blocks super 0.91 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
[>....................] reshape = 0.0% (85120/390708736) finish=840.5min speed=7738K/sec
md0 : active raid1 sdb1[1] sda1[0]
2200768 blocks [2/2] [UU]
Then wait for the reshape to finish (about 840 minutes, going by the estimate in /proc/mdstat). When it is done, you can grow the filesystem.
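The finish estimate in /proc/mdstat can be roughly reproduced from the other numbers on the same line: remaining blocks divided by the current speed. This is a small sketch, assuming the progress counters are in 1K blocks and the speed is in K/sec. The regular expression and function name are made up for this example, and the kernel's own estimate is computed a little differently, so the two will not match exactly.

```python
import re

# Matches e.g. "reshape = 0.0% (85120/390708736) finish=840.5min speed=7738K/sec"
RESHAPE_RE = re.compile(
    r"reshape\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\).*?speed=(\d+)K/sec"
)


def reshape_eta_minutes(mdstat_line: str) -> float:
    """Estimate remaining reshape time in minutes from one /proc/mdstat line."""
    match = RESHAPE_RE.search(mdstat_line)
    if match is None:
        raise ValueError("no reshape progress found in line")
    done, total, speed_k_per_sec = (int(g) for g in match.groups())
    return (total - done) / speed_k_per_sec / 60


line = "[>....................] reshape = 0.0% (85120/390708736) finish=840.5min speed=7738K/sec"
print(round(reshape_eta_minutes(line), 1))  # about 841.4, close to the reported 840.5
```

At 7738K/sec a 400GB member takes about 14 hours, so any change in speed during the reshape moves the estimate by a lot.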
Step #2: Growing the filesystem
Growing the XFS filesystem is a breeze (this log is from a later grow, when the array was already at 2.6T):
# xfs_growfs /raid5
box:~# df -h | egrep '(^Filesystem|/dev/md3)'
Filesystem Size Used Avail Use% Mounted on
/dev/md3 2.6T 932G 1.7T 36% /raid5
box:~# xfs_growfs /raid5
meta-data=/dev/md3 isize=256 agcount=38, agsize=18314368 blks
= sectsz=4096 attr=0
data = bsize=4096 blocks=683740288, imaxpct=25
= sunit=128 swidth=768 blks, unwritten=1
naming =version 2 bsize=4096
log =internal bsize=4096 blocks=32768, version=2
= sectsz=4096 sunit=1 blks
realtime =none extsz=3145728 blocks=0, rtextents=0
data blocks changed from 683740288 to 781417472
box:~# df -h | egrep '(^Filesystem|/dev/md3)'
Filesystem Size Used Avail Use% Mounted on
/dev/md3 3.0T 932G 2.1T 32% /raid5
box:~#
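The "data blocks changed" line from xfs_growfs can be turned into sizes with a little arithmetic. This is a quick sketch, assuming the 4096-byte block size shown as bsize in the output. The variable and function names are made up for illustration.

```python
BLOCK_SIZE = 4096            # "bsize=4096" from the xfs_growfs output
DEVICE_SIZE_KIB = 390708736  # "Device Size" of one RAID member, from mdadm

blocks_before = 683740288
blocks_after = 781417472


def blocks_to_tib(blocks: int, block_size: int = BLOCK_SIZE) -> float:
    """Convert a filesystem block count to TiB."""
    return blocks * block_size / 1024**4


print(round(blocks_to_tib(blocks_before), 2))  # 2.55
print(round(blocks_to_tib(blocks_after), 2))   # 2.91

# The filesystem grew by exactly one member's worth of space.
added_kib = (blocks_after - blocks_before) * BLOCK_SIZE // 1024
print(added_kib == DEVICE_SIZE_KIB)            # True
```

The 2.55 and 2.91 TiB figures are consistent with the 2.6T and 3.0T shown by df -h, which appears to round up.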
PROS:
1) RAID5 (the array survives a single drive dying)
2) Only 5-15% CPU utilization under heavy I/O. Here the dd is doing 40-120MB/s and the RAID5 process is only using 12% of the CPU (an old 3.4GHz Pentium 4 Prescott):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20105 bob 18 0 2008 540 440 D 36 0.1 0:05.68 dd
381 root 10 -5 0 0 0 S 12 0.0 99:08.57 md3_raid5
3) I can monitor all drives via smartctl (smartmontools). Yes, 3ware allows a pass-through to get to the drives, but many other RAID cards do not. This also means I can monitor temperature very easily as well:
$ ctemp
/dev/hda: ST3400832A: 35°C
/dev/hde: ST3400832A: 34°C
/dev/hdg: ST3400832A: 34°C
/dev/hdi: ST3400832A: 33°C
/dev/hdk: ST3400633A: 36°C
/dev/sda: WDC WD740GD-00FLC0: 30°C
/dev/sdb: WDC WD740GD-00FLC0: 31°C
/dev/sdc: ST3400633AS: 35°C
/dev/sdd: ST3400620AS: 37°C
/dev/sde: ST3400633AS: 36°C
/dev/sdf: WDC WD4000KD-00NAB0: 33°C
/dev/sdg: WDC WD4000KD-00NAB0: 30°C
4) I get between 100-133MB/s read from the array, which is nice.
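The ctemp script itself is not shown here, so the following is only a guess at how a per-drive temperature readout like the one under point 3 could be built on top of smartctl. It assumes the drive reports a Temperature_Celsius attribute in the output of smartctl -A, with the raw value in the tenth column; some drives use a different attribute name, and smartctl normally needs root. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_temperature(smartctl_output: str) -> Optional[int]:
    """Pull the raw Temperature_Celsius value out of `smartctl -A` output."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            return int(fields[9])
    return None  # attribute not reported by this drive


def read_temperature(device: str) -> Optional[int]:
    """Run smartctl against one device and return its temperature in Celsius."""
    # smartctl uses its exit status as a bitmask, so a non-zero code is not
    # treated as a failure here.
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_temperature(result.stdout)
```

Calling read_temperature("/dev/hda") in a loop over the member devices would give a listing similar in spirit to the one above.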
CONS:
1) PCI bus is limited to 133MB/s.
2) Even though I use SATA drives on the motherboard, I believe they are also on the PCI bus, as PCI Express was not out when my motherboard was made.
3) Write speed is 38-40MB/s sustained. Again, I believe this is because of the PCI bus: it has to calculate and write the parity and then the data.
4) The current case setup is a nightmare, which is why I ordered the Cooler Master Stacker. The entire case had to be modded to put fans where they did not belong, and the cables are everywhere. Part of the problem is that some drives are IDE and some are SATA (IDE cables, even the round ones, take up a lot of room). The Antec TruPower 550W handles the drives with no issues: at bootup it hits 500-520 watts, and after the drives have spun up it uses 220-280 watts.
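The 133MB/s figure in point 1 is the theoretical peak of a standard 32-bit, 33MHz PCI bus, shared by every card on it. The sketch below works that number out and, for comparison, the per-lane rate of a first-generation PCI Express x1 slot. The PCIe generation is an assumption, since the exact cards for the new build have not been chosen; the function names are illustrative.

```python
def parallel_bus_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput of a parallel bus moving one word per clock, in MB/s."""
    return width_bits / 8 * clock_mhz


def pcie_gen1_lane_mb_s() -> float:
    """Peak throughput of one PCIe 1.x lane, per direction, in MB/s."""
    transfers_per_sec = 2.5e9   # 2.5 GT/s line rate
    encoding_efficiency = 8 / 10  # 8b/10b encoding: 8 data bits per 10 line bits
    return transfers_per_sec * encoding_efficiency / 8 / 1e6


pci_peak = parallel_bus_peak_mb_s(32, 33.33)  # shared by all devices on the bus
pcie_x1_peak = pcie_gen1_lane_mb_s()          # dedicated to each x1 card

print(round(pci_peak, 1))      # 133.3
print(round(pcie_x1_peak, 1))  # 250.0
```

The 100-133MB/s reads reported above are already at the PCI ceiling, while each x1 card would get its own 250MB/s link instead of sharing one bus.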
Pictures of setup (below):
The two Raptors are at the very top, followed by the two WD 400s, and below that the rest are Seagate IDE and SATA drives.
Amazingly, with about 10-12 fans in the box, everything stays very cool.
Front of the case, I disconnected the temperature control in the front because it added an additional 3-6 power cables/fan control cables in the case, and as you can see, I have enough of those!
The side of the case. Yes, it's a mess.
Plan:
Build new machine.
New drives (possibly).
Use the Cooler Master Stacker.
Hopefully have a lot less mess!
Justin.