Expanding ZFS zpool RAID
I’m a big fan of ZFS and all the volume-management options it offers. ZFS often makes hard things easy and impossible things possible. In an era of ever-growing data sets, sysadmins are regularly pressed with the need to expand volumes. While this may be easy to accomplish in an enterprise environment with IBM or Hitachi storage solutions, problems arise on mid- and low-end servers. Most often, expanding a volume means an online rsync to a new data pool, then another rsync while the production system is down, and finally putting the new system into production. ZFS makes this process a breeze.
Here is one example where ZFS really shines. Take a look at the following pool:
# zfs list | grep "tank "
tank   1.75T   31.4G   40.0K   /tank
and its geometry:
# zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0

errors: No known data errors
The 1.75 TiB pool is slowly filling up. As you can see, it’s a six-disk pool consisting of two RAID-Z vdevs in a stripe, roughly the equivalent of RAID 50 in traditional RAID nomenclature. That’s a lot of data to rsync over, isn’t it? Well, ZFS offers a neat solution: replace a single disk with a bigger one, let the RAID rebuild, and repeat the procedure six times. Finally, after the last rebuild, we can ‘grow’ the pool to the new size. In this particular case I decided to replace the 500 GB Seagates with 2 TB Western Digital drives.
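The whole procedure can be sketched as a loop. This is a hypothetical sketch only: the device names match this pool, but the resilver polling interval and the grep on `zpool status` output are assumptions you should adapt to your platform.

```shell
# Hypothetical sketch of the replace-all-disks procedure; adapt to your system.
for disk in c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0; do
    # Physically swap the 500 GB drive for a 2 TB one, then tell ZFS:
    zpool replace tank $disk $disk
    # Wait for this resilver to finish before swapping the next disk;
    # replacing a second disk mid-resilver would risk the raidz1 redundancy.
    while zpool status tank | grep -q "resilver in progress"; do
        sleep 300
    done
done
```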
This is how the pool looks after disk c2t5d0 is replaced with a 2 TB drive:
# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: resilvered 449G in 7h9m with 0 errors on Mon Dec 24 20:58:51 2012
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
          raidz1-1  DEGRADED     0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  UNAVAIL      0     0     0  cannot open

errors: No known data errors
Now we need to tell ZFS to rebuild the pool:
# zpool replace tank c2t5d0 c2t5d0
After this command, the rebuild process starts. A few hours later, the state of the system is:
# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jan  1 14:43:22 2013
    91.6M scanned out of 2.63T at 3.52M/s, 217h26m to go
    14.5M resilvered, 0.00% done
config:

        NAME              STATE     READ WRITE CKSUM
        tank              DEGRADED     0     0     0
          raidz1-0        ONLINE       0     0     0
            c2t0d0        ONLINE       0     0     0
            c2t1d0        ONLINE       0     0     0
            c2t2d0        ONLINE       0     0     0
          raidz1-1        DEGRADED     0     0     0
            c2t3d0        ONLINE       0     0     0
            c2t4d0        ONLINE       0     0     0
            replacing-2   DEGRADED     0     0     0
              c2t5d0/old  FAULTED      0     0     0  corrupted data
              c2t5d0      ONLINE       0     0     0  (resilvering)

errors: No known data errors
After the process finishes, the pool will look something like this:
# zpool status tank
  pool: tank
 state: ONLINE
  scan: resilvered 449G in 7h1m with 0 errors on Tue Jan  1 21:44:37 2013
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0

errors: No known data errors
Once all disks are replaced, the only thing needed to grow the pool is to set autoexpand to on. If it was already on, first turn it off and then turn it on again to grow the pool:
# zfs list | grep "tank "
tank   1.75T   31.4G   40.0K   /tank
# zpool set autoexpand=off tank
# zpool set autoexpand=on tank
# zfs list | grep "tank "
tank   1.75T   5.40T   40.0K   /tank
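On ZFS versions that support it, each replaced device can also be expanded explicitly with `zpool online -e`, instead of toggling the autoexpand property. A sketch, reusing this pool's device names:

```shell
# Alternative: expand each replaced device explicitly.
# 'zpool online -e' tells ZFS to use the full capacity of the device.
for disk in c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0; do
    zpool online -e tank $disk
done
```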
And that’s it! We’ve grown a striped 2x RAID-Z configuration from 500 GB drives to 2 TB drives, taking the pool from about 1.78 TB to about 7.15 TB of usable space. Enjoy the wonders of ZFS!
Not really a breeze compared to, say, Linux software RAID, where you can just add new drive(s) to the array, grow the filesystem, and you’re done. ZFS can’t add new drives to an existing pool, which is a pretty serious limitation. This is especially limiting when you already have a pool of large drives (say 4 TB) and need to expand.
While it’s true you can’t expand the number of disks in a zpool vdev, ZFS still has other benefits over other solutions, like send/recv, more usable snapshots/clones, etc.
“ZFS can’t add new drives to an existing pool,”
“While it’s true you can’t expand number of disks in zpool,”
Both of these assertions are incorrect: ZFS can add an unlimited number of devices to a pool, AND the newly added devices will automatically form a vdev, AND this vdev will be striped along with the rest of the vdevs, AND it is possible to mix and match different types of vdevs, AND this functionality has been available in ZFS since its inception, back in 2005.
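The commenter's point can be illustrated with a single command. The device names below are hypothetical, but the syntax is the standard `zpool add`:

```shell
# Hypothetical example: stripe a third 3-disk RAID-Z vdev into the pool.
# The new vdev is automatically striped alongside raidz1-0 and raidz1-1.
zpool add tank raidz1 c2t6d0 c2t7d0 c2t8d0
```

Note this adds a whole new vdev to the pool; it does not add a disk to an existing RAID-Z vdev, which is the limitation the earlier comments were (imprecisely) describing.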
I missed the “zpool vdev” part. So, a manner of speaking.
Say I have 2 mirrored vdevs in a pool. Can I expand 1 vdev with larger disks and get more space in my pool?
No. The reason for this is that a mirror vdev is sized to the smallest disk in the set. When you have, say, a 500 GB drive pair as a mirror and replace one of them with a 2 TB drive, that disk will behave as a 500 GB drive until both disks are 2 TB in size. Then the mirror will grow to the size of the smallest disk in the mirror set. ZFS will not allow storage on a portion of a disk that violates the rules of the set, in this case using a portion of disk A that cannot be mirrored to the other drive because it is smaller.
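Once both halves of the mirror are replaced, though, the vdev can grow, using the same replace-and-expand procedure as in the article. A hedged sketch with hypothetical device names:

```shell
# Hypothetical sketch: grow a 2x 500 GB mirror by replacing both disks with 2 TB ones.
zpool replace tank c3t0d0 c3t0d0   # swap first disk, wait for resilver to finish
zpool replace tank c3t1d0 c3t1d0   # swap second disk, wait for resilver to finish
zpool set autoexpand=on tank       # now the mirror vdev can grow to the new size
```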