This Monday, ZFS on Linux lead developer Brian Behlendorf published the OpenZFS 2.0.0 release to GitHub. Along with quite a lot of new features, the announcement brings an end to the former distinction between “ZFS on Linux” and ZFS elsewhere (for example, on FreeBSD). This move has been a long time coming—the FreeBSD community laid out its side of the roadmap two years ago—but this is the release that makes it official.
The new OpenZFS 2.0.0 release is already available on FreeBSD, where it can be installed from ports (overriding the base system ZFS) on FreeBSD 12 systems and will be the base FreeBSD version in the upcoming FreeBSD 13. On Linux, the situation is a bit more uncertain and depends largely on the Linux distro in play.
Users of Linux distributions that use DKMS-built OpenZFS kernel modules will tend to get the new release rather quickly. Users of the better-supported but slower-moving Ubuntu probably won’t see OpenZFS 2.0.0 until Ubuntu 21.10, nearly a year from now. For Ubuntu users who are willing to live on the edge, the popular but third-party and individually maintained jonathonf PPA might make it available considerably sooner.
OpenZFS 2.0.0 modules can be built from source for Linux kernels from 3.10-5.9—but most users should stick to getting prebuilt modules from distributions or well-established developers. “Far beyond the beaten trail” is not a phrase one should generally apply to the file system that holds one’s precious data!
- Sequential resilver—rebuilding degraded arrays in ZFS has historically been very different from conventional RAID. On nearly empty arrays, the ZFS rebuild—known as “resilvering”—was much faster because ZFS only needs to touch the used portion of the disk rather than cloning each sector across the entire drive. But this process involved an abundance of random I/O—so on more nearly full arrays, conventional RAID’s more pedestrian block-by-block whole-disk rebuild went much faster. With sequential resilvering, ZFS gets the best of both worlds: largely sequential access while still skipping unused portions of the disk(s) involved.
- Persistent L2ARC—one of ZFS’ most compelling features is its advanced read cache, known as the ARC. Systems with very large, very hot working sets can also implement an SSD-based read cache called L2ARC, which populates itself from blocks in the ARC nearing eviction. Historically, one of the biggest issues with L2ARC is that although the underlying SSD is persistent, the L2ARC itself is not—it becomes empty on each reboot (or export and import of the pool). This new feature allows data in the L2ARC to remain available and viable between pool import/export cycles (including system reboots), greatly increasing the potential value of the L2ARC device.
- Zstd compression algorithm—OpenZFS offers transparent inline compression, controllable at per-data-set granularity. Traditionally, the algorithm most commonly used has been lz4, a streaming algorithm offering a relatively poor compression ratio but very light CPU load. OpenZFS 2.0.0 brings support for zstd—an algorithm designed by Yann Collet (the author of lz4) which aims to provide compression similar to gzip, with CPU load similar to lz4.
These graphs are a bit difficult to follow—but essentially, they show zstd achieving its goals. During compression (disk writes), zstd-2 is more efficient than even gzip-9 while maintaining high throughput.
Compared to lz4, zstd-2 achieves 50 percent higher compression in return for a 30 percent throughput penalty. On the decompression (disk read) side, the throughput penalty is slightly higher, at around 36 percent.
Keep in mind, the throughput “penalties” described assume negligible bottlenecking on the storage medium itself. In practice, most CPUs can run rings around most storage media (even relatively slow CPUs and fast SSDs). ZFS users are broadly accustomed to seeing lz4 compression accelerate workloads in the real world, not slow them down!
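To make the bottleneck argument concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a simple model in which the sustained logical throughput is the slower of the compressor and the disk, where the disk only has to move compressed bytes. The lz4 baseline figures and the disk speeds are hypothetical, chosen purely for illustration. The zstd-2 figures are derived from the relative numbers quoted above, read here as a 1.5x compression ratio and 0.7x compressor throughput, which is an interpretation rather than a measurement.

```python
def effective_throughput(cpu_mb_s, ratio, disk_mb_s):
    """Logical MB/s the pipeline can sustain: the slower of compressor and disk.

    The disk moves compressed bytes, so each physical MB carries `ratio`
    logical MB. This is a simplified model, not how ZFS schedules I/O.
    """
    return min(cpu_mb_s, disk_mb_s * ratio)


# Hypothetical lz4 baseline (illustrative values, not benchmarks).
LZ4_CPU_MB_S = 2000.0
LZ4_RATIO = 1.5

# Relative figures from the article, interpreted as multipliers (assumption).
ZSTD_CPU_MB_S = LZ4_CPU_MB_S * 0.70   # 30 percent throughput penalty
ZSTD_RATIO = LZ4_RATIO * 1.50         # 50 percent higher compression


def compare(disk_mb_s):
    """Return (lz4, zstd-2) effective logical throughput for a given disk speed."""
    lz4 = effective_throughput(LZ4_CPU_MB_S, LZ4_RATIO, disk_mb_s)
    zstd = effective_throughput(ZSTD_CPU_MB_S, ZSTD_RATIO, disk_mb_s)
    return lz4, zstd


if __name__ == "__main__":
    # One slow, storage-bound disk and one very fast, CPU-bound disk.
    for disk in (200.0, 3000.0):
        lz4, zstd = compare(disk)
        print(f"disk {disk:6.0f} MB/s: lz4 {lz4:6.0f} MB/s, zstd-2 {zstd:6.0f} MB/s")
```

Under these assumed numbers, the slower disk is the bottleneck and the better compression ratio wins, while on the very fast disk the compressor becomes the limit and lz4 comes out ahead. Where the crossover falls on real hardware depends entirely on the actual CPU, data, and storage involved.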
- Redacted replication—this one’s a bit of a brain-breaker. Let’s say there are portions of your data that you don’t want to back up using ZFS replication. First, you clone the data set. Next, you delete the sensitive data from the clone. Then, you create a bookmark on the parent data set, which marks the blocks which changed from the parent to the clone. Finally, you can send the parent data set to its backup target, including the --redact redaction_bookmark argument—and this replicates only the nonsensitive blocks to the backup target.
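The redaction steps described above can be pictured with a toy model in Python. This is a conceptual sketch only: data sets are represented as plain dictionaries mapping a block number to its contents, and the function names `redaction_bookmark` and `redacted_send` are hypothetical helpers invented for this illustration. It says nothing about how OpenZFS stores bookmarks or builds send streams internally.

```python
def redaction_bookmark(parent, clone):
    """Block numbers in the parent that were deleted or changed in the clone."""
    return {blk for blk, data in parent.items() if clone.get(blk) != data}


def redacted_send(parent, bookmark):
    """Blocks that would be replicated: the parent minus the redacted blocks."""
    return {blk: data for blk, data in parent.items() if blk not in bookmark}


# Toy parent data set: block number -> contents.
parent = {0: "public-a", 1: "secret-key", 2: "public-b", 3: "secret-log"}

# Step 1: clone the data set.
clone = dict(parent)

# Step 2: delete the sensitive data from the clone.
del clone[1]
del clone[3]

# Step 3: record which blocks differ between parent and clone.
bookmark = redaction_bookmark(parent, clone)

# Step 4: send the parent, skipping everything the bookmark marks.
stream = redacted_send(parent, bookmark)

if __name__ == "__main__":
    print("redacted blocks:", sorted(bookmark))
    print("replicated blocks:", stream)
```

The point of the model is that the parent itself is never modified: the sensitive blocks stay on the source, and the bookmark simply tells the send which blocks to leave out.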
Additional improvements and changes
In addition to the major features outlined above, OpenZFS 2.0.0 brings fallocate support; improved and reorganized man pages; higher performance for zfs send and zfs receive; more efficient memory management; and optimized encryption performance. Meanwhile, some infrequently used features—deduplicated send streams, dedupditto blocks, and the zfs_vdev_scheduler module option—have all been deprecated.
For a full list of changes, please see the original release announcement on GitHub at https://github.com/openzfs/zfs/releases/tag/zfs-2.0.0.