Unprecedented data growth, increased demands for performance and new use cases like analytics and IoT are pushing Network Attached Storage (NAS) to the breaking point. Organizations are looking to cloud storage to help them solve the problem and stop the endless upgrade cycle. In response, vendors have flooded the market with potential solutions hoping to get the “cloud integration” checkmark, but not all of those solutions are the same. IT planners need to understand the various Cloud NAS architectures to select the solution that best fits the needs of the organization.
Clarifying the Goals of Cloud NAS
Before examining the differences between the various Cloud NAS architectures, the organization must decide what its goals are for the Cloud NAS initiative. These goals often pull in opposite directions. For example, a typical goal of Cloud NAS is to reduce both the on-premises storage footprint and cost. Another goal, however, may be to maintain or even improve performance, which typically means keeping more data on-premises and increasing costs. It is critical that IT weigh all these goals together and look for a solution that strikes the right balance.
Understanding The Cloud NAS Architectures
On-premises NAS with Cloud Extension
A common approach to achieving some cloud integration is for NAS vendors to extend their solutions to the cloud. NAS with Cloud Extension is a technique commonly used by legacy NAS vendors to get the “cloud checkmark”, and in most cases, cloud support is relatively basic. If the legacy NAS supports the cloud at all, it typically does so by storing a disaster recovery copy of the NAS there, which means it neither reduces the on-premises storage footprint nor reduces costs (it actually increases them). Ironically, the cloud copy does not even provide a complete disaster recovery solution, since the NAS itself can’t instantiate in the cloud. All of that NAS data has to be copied back to a replacement NAS in the event of a disaster.
Cloud-Only NAS
At the other end of the spectrum is Cloud-Only NAS. These solutions create a software-only version of the NAS system and instantiate it in the cloud. While Cloud-Only NAS solutions may work for cloud-hosted applications, they do little for on-premises applications and users. The approach typically applies to two use cases. The first is moving legacy applications to the cloud. Because Cloud-Only NAS allows cloud storage to emulate more traditional protocols like NFS, SMB and occasionally iSCSI, these applications can move without necessarily changing the way they access storage. Instead of being recoded from POSIX to object access, they should be able to run almost unchanged in the cloud.
The problem is that Cloud-Only NAS solutions don’t typically support bi-directional access. Once the application migrates, all access to that application has to be cloud-based. There is no concept of tiering data or caching it locally. Moving an application back on-premises is possible, since both the on-premises and cloud instances use legacy protocols. The move back, however, requires shutting the application down and manually copying its data back into the data center.
The other use case is for organizations with multiple branch offices. A Cloud-Only NAS does allow them to create a centralized repository for all file storage and access. The problem is, again, that all this access is to and from the cloud; there is no on-premises synchronization. The connection has to be very reliable and relatively high performance.
Hybrid Cloud NAS
Hybrid Cloud NAS, until recently, seemed like the best compromise for organizations looking to leverage the cloud as a centralized storage repository for unstructured data. A Hybrid Cloud NAS uses an on-premises caching appliance that stores the most frequently accessed and recently modified data locally. The appliance provides users with direct access to data as if they had a local NAS. When data changes, it is written locally, and then replicated to the cloud as quickly as possible.
Some of these Hybrid Cloud NAS solutions can even run without an internet connection and continue to serve files, assuming the requested files are on the appliance. Once the network connection is back up, data synchronizes between the appliance and the cloud, with IT being alerted to any conflict (the same file changed in two locations at nearly the same time).
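The caching behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not any vendor's implementation: the class names CloudStore and CachingAppliance, the in-memory dictionaries, and the least-recently-used eviction policy are all assumptions made for the example. Conflict detection is left out to keep the sketch short.

```python
from collections import OrderedDict


class CloudStore:
    """Hypothetical stand-in for the cloud repository (in-memory)."""

    def __init__(self):
        self.objects = {}
        self.online = True   # flip to False to simulate a lost connection

    def get(self, path):
        if not self.online:
            raise ConnectionError("cloud unreachable")
        return self.objects[path]

    def put(self, path, data):
        if not self.online:
            raise ConnectionError("cloud unreachable")
        self.objects[path] = data


class CachingAppliance:
    """Hypothetical on-premises cache: LRU for reads, write-back for changes."""

    def __init__(self, cloud, capacity):
        self.cloud = cloud
        self.capacity = capacity
        self.cache = OrderedDict()   # path -> data, least recently used first
        self.dirty = set()           # written locally, not yet replicated

    def read(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)   # mark as recently used
            return self.cache[path]
        data = self.cloud.get(path)        # cache miss: pay the cloud round trip
        self._store(path, data)
        return data

    def write(self, path, data):
        self.dirty.add(path)               # protect from eviction until replicated
        self._store(path, data)            # the write lands locally first
        self.flush()                       # then replicate as soon as possible

    def flush(self):
        for path in sorted(self.dirty):
            try:
                self.cloud.put(path, self.cache[path])
            except ConnectionError:
                return                     # stay dirty; retry when the link returns
            self.dirty.discard(path)

    def _store(self, path, data):
        self.cache[path] = data
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            # Evict the least recently used clean entry; never drop unreplicated data.
            victim = next((p for p in self.cache if p not in self.dirty), None)
            if victim is None:
                break
            del self.cache[victim]
```

In this model a read of a cached file succeeds even while CloudStore.online is False, a write made while offline stays in the dirty set, and calling flush() after the connection returns pushes the pending changes to the cloud.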
While Hybrid Cloud NAS is a step in the right direction, the performance delta between accessing on-premises data and accessing cloud-based data is still significant. The cause is the latency involved in accessing and retrieving files in the cloud. The cloud data center is just too far away.
In most cases, users will notice the difference. For user home directories, the performance gap might be something an organization can tolerate, but unstructured data is far more than just user home directories. It is data used to help an organization create a more personalized web/ordering experience for its customers, gain customer insights, detect credit card fraud, improve products and improve operations.
This data, often called big data, is frequently read and written sequentially, which makes it very cache unfriendly. Not only will almost all accesses to these data sets miss the cache, but the activity also flushes the cache, making it unusable even for cache-friendly data like home directories. The net impact is that all, or at least most, accesses go to the cloud, and because of the performance delta between the two storage areas, the solution has to be limited to just home directories.
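A small simulation can make the cache-flushing effect concrete. The sketch below replays two synthetic access streams through a least-recently-used cache. The LRU policy, the cache size of 100 entries, the 50 home directory files, and the three scan blocks per home access are all invented for illustration and do not describe any particular product.

```python
from collections import OrderedDict


def hit_rate(accesses, capacity, measured):
    """Replay accesses through an LRU cache and return the hit rate
    for the keys in `measured` (for example, home directory files)."""
    cache = OrderedDict()
    hits = total = 0
    for key in accesses:
        counted = key in measured
        if key in cache:
            cache.move_to_end(key)           # refresh recency on a hit
            hits += counted
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used key
        total += counted
    return hits / total


# Hypothetical workload: 50 home directory files, re-read 20 times over.
home = [f"home/{i}" for i in range(50)]
home_only = home * 20

# Same workload, but each home access is followed by three blocks of a
# sequential scan that are read once and never touched again.
mixed = []
scan_block = 0
for key in home_only:
    mixed.append(key)
    for _ in range(3):
        mixed.append(f"scan/{scan_block}")
        scan_block += 1

CACHE_ENTRIES = 100  # assumed cache size, larger than the home working set
print("home only:", hit_rate(home_only, CACHE_ENTRIES, set(home)))
print("with scan:", hit_rate(mixed, CACHE_ENTRIES, set(home)))
```

With these made-up parameters the home directory files fit comfortably in the cache when they are the only workload, so only the first pass misses. Once the scan is interleaved, every home file is pushed out before it is read again, and the home directory hit rate drops to zero even though the cache is twice the size of the home working set.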
The gap between on-premises and cloud storage performance essentially limits Hybrid Cloud NAS to two use cases: home directories and collaboration between multiple branch offices.
Multi-Tier Hybrid Cloud NAS
The last option is Multi-Tier Hybrid Cloud NAS. This architecture is similar to Hybrid Cloud NAS, except it adds a secondary storage tier at a regional cloud provider that is close to the primary data center. The regional cloud provider can store all of the active and near-active primary data set, while the public cloud is available to store a DR copy of data, prior versions of data and copies of data used for various activities like test/dev. The on-premises appliance is still available, but now it only needs to be large enough to cache new or modified data as well as the most actively read data.
The Multi-Tier Hybrid Cloud NAS architecture allows the on-premises cache to be much smaller, and therefore less expensive, than in the Hybrid Cloud NAS solution. It automatically protects data with copies created at the regional cloud storage location and multiple copies made in the cloud. And since the data in the cloud is online and accessible, cloud compute resources can work with it directly, fulfilling such functions as Disaster Recovery as a Service (DRaaS), test-dev, reporting and analytics processing.
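A rough back-of-the-envelope model shows why a nearby regional tier matters. The function below computes an average read latency from the fraction of requests each tier serves. Every number in the example is a placeholder chosen for illustration (cache hit fraction, regional hit fraction, and per-tier latencies in milliseconds); real values depend on the workload, the network and the providers involved. The model also ignores the time spent checking a tier that turns out not to hold the data.

```python
def expected_latency_ms(tiers):
    """Average read latency across storage tiers.

    tiers: list of (name, hit_fraction, latency_ms), ordered nearest first.
    hit_fraction is the share of requests reaching that tier which it can
    serve; the last tier should use 1.0 because it holds everything.
    """
    remaining = 1.0   # share of requests not yet served by a nearer tier
    total = 0.0
    for _name, hit_fraction, latency_ms in tiers:
        served = remaining * hit_fraction
        total += served * latency_ms
        remaining -= served
    return total


# Hypothetical figures, for illustration only.
hybrid = [
    ("on-premises cache", 0.60, 1.0),
    ("public cloud", 1.00, 80.0),
]
multi_tier = [
    ("on-premises cache", 0.60, 1.0),
    ("regional cloud", 0.95, 8.0),
    ("public cloud", 1.00, 80.0),
]

print("hybrid:     %.2f ms" % expected_latency_ms(hybrid))
print("multi-tier: %.2f ms" % expected_latency_ms(multi_tier))
```

Under these assumed inputs, most cache misses are absorbed by the regional tier instead of travelling to the distant public cloud, so the average latency falls sharply even though the on-premises cache is no larger.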
The architecture allows for multiple use cases. Certainly, it can be used to take care of the home directory use case, but thanks to the additional regional tier the architecture is suitable for the full range of unstructured data use cases, including the various big data use cases.
Leveraging the cloud for unstructured data services makes sense, but the resources of the cloud have to be used intelligently. Bolting on the cloud to just “hit the cloud checkbox” may not solve all the challenges associated with unstructured data, and in fact, it may make the situation worse. Organizations need to consider solutions that can reduce the on-premises footprint of unstructured data, but still use that data for the wide variety of use cases that the data center faces today. A multi-tier architecture not only fulfills these requirements, but it also opens up further possibilities beyond just file access.