Object storage dominates the unstructured data discussion. But many organizations need high-performance NFS, SMB and even Apple's AFP to meet the demands of use cases in markets like M&E, science, HPC and oil and gas. The more traditional enterprise market is also using high-performance NAS for functions like big data analytics and test/dev. Both specific verticals and the enterprise at large are in need of a next-generation Network Attached Storage (NAS).
What’s Needed in NAS?
The introduction of object storage is changing the role of NAS. NAS no longer needs to be everything to everyone. Object storage can easily fulfill the long-term data retention function, but it struggles to deliver the performance that organizations need when analyzing or manipulating unstructured data.
The next-generation NAS needs to be highly scalable, highly reliable and, of course, deliver high performance. The challenge is that in most cases the high scalability delivered by scale-out architectures requires a different form of data protection than traditional RAID, and that form of protection tends to inhibit performance.
Scale-out storage systems are a cluster of servers called nodes. Each node contributes capacity, and that capacity is aggregated into a virtual pool of storage. The problem is that traditional RAID data protection doesn't work well in this design, so most scale-out NAS vendors are moving to some form of erasure coding for data protection. Erasure coding provides protection by segmenting data, often at a file level, and distributing the data and parity segments across different locations in the cluster. Generally, erasure coding works well within the scale-out design. But erasure coding, because of its highly computational nature, creates overhead that impacts performance.
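To make the idea concrete, here is a minimal sketch of the segment-and-parity principle using a single XOR parity chunk. This is a deliberately simplified illustration of erasure coding in general, not RozoFS's actual algorithm: real schemes use multiple parity chunks and far more involved math, which is where the computational overhead comes from.

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int):
    """Split data into k equal-size chunks plus one XOR parity chunk.
    In a cluster, each chunk would land on a different node."""
    size = -(-len(data) // k)  # ceiling division
    chunks = [data[i * size:(i + 1) * size].ljust(size, b"\0")
              for i in range(k)]
    parity = reduce(xor, chunks)
    return chunks, parity

def rebuild(chunks, parity, lost: int) -> bytes:
    """Recover the chunk at index `lost` (e.g. a failed node)
    by XOR-ing the surviving chunks with the parity chunk."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(xor, survivors, parity)
```

Even in this toy version, every rebuild touches every surviving chunk, which hints at why repair traffic and CPU cost grow with cluster activity.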
The computational overhead of erasure coding also creates a performance challenge when there is a failed drive or node. There is simply too much math to apply on the fly. The problem is in a large scale-out storage cluster, a failed state can be almost constant.
The other problem with scale-out architecture is managing metadata. Most scale-out architectures bottleneck as file counts increase, making metadata management more complex. This forces organizations to add nodes not for capacity but for performance. As a result, most scale-out architectures end up being capacity-heavy.
Rozo is a file-system company. Its solution is available as software and can be installed on any type of hardware. The software-only nature of RozoFS means it can be deployed in private, public or hybrid clouds. Deployments are also highly automated.
RozoFS leverages a proprietary form of erasure coding called the "Mojette Transform" that it developed and patented. Mojette is a high-performance erasure coding technique that provides excellent performance for both random and sequential workloads. It delivers the protection benefits of triple replication (3-way mirroring) while consuming only 1.5 times the capacity. The design also provides seamless repair of failed nodes without impacting performance.
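The 3x-versus-1.5x comparison is simple arithmetic worth spelling out. The sketch below uses a generic (k data + m parity) erasure-code layout; the 4+2 figures are an illustrative assumption, not necessarily the parameters the Mojette Transform actually uses.

```python
def overhead(k: int, m: int) -> float:
    """Raw-to-usable capacity ratio for a layout with
    k data chunks and m parity chunks."""
    return (k + m) / k

# 3-way replication: every byte stored three times,
# survives the loss of any 2 copies.
assert overhead(1, 2) == 3.0

# An illustrative 4+2 erasure-code layout also survives any
# 2 lost chunks, at half the raw-capacity footprint.
assert overhead(4, 2) == 1.5
```

So for 100 TB of user data, replication needs 300 TB raw while a 1.5x erasure code needs 150 TB, with equivalent failure tolerance.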
RozoFS uses an asymmetric architecture, which means the metadata is processed by a separate set of servers. If the organization needs to store more files or support more file systems, it simply adds more metadata servers. If the organization needs more capacity or faster access to that capacity, it simply adds more NAS heads. The result should be a relatively well-balanced architecture that can scale very large while remaining very efficient.
For environments that are not taxed in terms of file count, namespace count or capacity, the metadata controller function can run on the NAS heads to lower the cost of the deployment.
In version 3.0 of the file system, Rozo adds support for RDMA over Converged Ethernet (RoCE v2), which will provide reduced latency, increased network bandwidth and lower CPU consumption per node. Rozo also spent time optimizing its SMB protocol support and now supports SMB v3.
One missing feature is snapshots, which serve a variety of purposes but in file systems are most commonly used to recover deleted files. Rozo takes a step toward snapshot capabilities by delaying the point at which a file deletion is fully executed.
Finally, RozoFS introduces a new API that reduces the time to perform metadata searches on large multi-PB file systems from hours to seconds.
The data center still needs scalable, high-performance file systems. Most available NAS and file system technology is based on code that is at least a decade old. Given that unstructured data is easily the biggest challenge facing most data centers, it may be time to look for new ways to deliver high-performance access to this data. RozoFS is worthy of consideration.