If Primary Data can do to data storage what VMware did to server computing, there could be a titanic shift in the storage industry in the coming years. Primary Data’s “data virtualization” platform is designed to liberate information from the confines of individual storage silos in much the same way that server virtualization has decoupled application workloads from physical servers.
Veteran IT professionals might assume that Primary Data is yet another player in the storage virtualization space. However, CTO and co-founder David Flynn states there are clear distinctions between its solution and traditional storage virtualization technologies. Traditional, or classic, storage virtualization offerings in effect make multiple islands of SAN, NAS or DAS platforms appear as one large system, giving administrators a single point of management for provisioning storage resources across the enterprise. Flynn claims that Primary Data's offering doesn't focus on virtualizing physical storage capacity so that it can be presented and managed as one large logical pool of disk. Instead, Primary Data's approach is to remove data's dependence on a fixed location, so that data can be moved seamlessly to improve storage I/O performance and placed efficiently on the appropriate storage resource.
Storage I/O Speed Bump
Some storage virtualization solutions provide all the storage services required to manage and protect data, such as snapshots, replication and thin provisioning. Other virtualization technologies provide centralized management and provisioning capabilities while leaving the back-end arrays to perform all the storage services. Each approach has its pros and cons; however, both share a common challenge: they introduce a “speed bump” into the storage networking path. Since all storage I/O has to pass through the virtualization management node(s), an additional layer of I/O latency is introduced.
Anything-to-Anything Data Access
Flynn states that Primary Data’s offering enables application hosts to “talk” directly to storage resources on the back end, regardless of the underlying storage protocol or data type. For example, a client host can access file, block or object storage regardless of its own physical interconnects. This means that a client with an IP connection can access SAN and object storage as easily as it can access NAS capacity. Likewise, a Fibre Channel-connected host can “talk” to a NAS or object storage system. Critically, there is no additional network hop for the client to pass through to gain access to data.
Virtualized Data Tiering
While this protocol-agnostic storage access can greatly simplify and speed up data movement, Primary Data offers another benefit: storage administrators can automate data movement to the appropriate storage tier, regardless of manufacturer, based on pre-defined policies. Primary Data provides application templates that storage managers can apply to their environment so that data can be prioritized and allocated to the right storage resource based on its metadata attributes. For example, hot Oracle database tables can be assigned a policy that places them on vendor A’s flash storage system, while infrequently accessed user files can be designated for vendor B’s lower-cost NAS or object storage.
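To make the idea of metadata-driven placement concrete, here is a minimal sketch of ordered placement rules evaluated against a file's metadata. It is not Primary Data's template format or API: the `FileMeta`, `Policy` and `place` names, the access-frequency thresholds and the tier labels are all hypothetical, chosen only to mirror the Oracle and user-file example above.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical metadata record for a file or object; field names are illustrative.
@dataclass
class FileMeta:
    path: str
    application: str
    accesses_per_day: float

# A policy pairs a predicate over metadata with a target storage tier (hypothetical).
@dataclass
class Policy:
    name: str
    matches: Callable[[FileMeta], bool]
    target_tier: str

def place(meta: FileMeta, policies: List[Policy], default_tier: str) -> str:
    """Return the tier of the first policy whose predicate matches, else the default."""
    for policy in policies:
        if policy.matches(meta):
            return policy.target_tier
    return default_tier

# Assumed thresholds: "hot" means at least 1,000 accesses a day, "cold" under 1 a day.
policies = [
    Policy("hot-oracle",
           lambda m: m.application == "oracle" and m.accesses_per_day >= 1000,
           "vendor_a_flash"),
    Policy("cold-user-files",
           lambda m: m.application == "user_files" and m.accesses_per_day < 1,
           "vendor_b_object"),
]

hot_table = FileMeta("/db/orders.dbf", "oracle", 50_000)
old_doc = FileMeta("/home/jdoe/2012_report.docx", "user_files", 0.01)

print(place(hot_table, policies, default_tier="vendor_b_nas"))  # vendor_a_flash
print(place(old_doc, policies, default_tier="vendor_b_nas"))    # vendor_b_object
```

Because rules are checked in order, the first match wins; anything no rule claims falls through to the default tier.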
Data Movement Paralysis
Flynn claims that 80% of storage IOPS are wastefully going to the primary storage array. If data were more fluid, highly accessed data could be promoted into a flash tier (server- or array-based) to accelerate performance immediately. Meanwhile, 80% of primary storage capacity is holding cold files or objects that are only rarely accessed. The challenge is that administrators are afraid to move this data off, because migrating it is a manual process and so is pulling it back when it is needed. In other words, the data is effectively bound to the storage system on which it resides.
Data Movement Transparency
With Primary Data’s solution, cold data can be moved transparently to low-cost storage in a private or hybrid cloud environment, and it can just as easily be retrieved when needed, because it is abstracted from any specific storage system. This can make storage utilization much more efficient by placing data where it needs to be at the moment it needs to be there. Flynn likened these efficiencies to the way VMware enables better utilization of CPU processing power by allowing more applications to be packed onto the same server resource.
Primary Data’s architecture consists of two components: a data director appliance, which maintains a metadata inventory of all the information across the data center, and a data hypervisor, a lightweight client that resides either on the physical server running a hypervisor or in the virtual machine at the guest OS level. The data hypervisor looks up data objects in the data director; once it resolves where the data resides, it lets the application server speak natively to the storage system, whether that system is block-, file- or object-based.
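The split between a metadata lookup and a direct data read can be sketched in a few lines. This is an illustrative in-memory model, not Primary Data's software: `DataDirector`, `Backend`, `DataHypervisorClient` and `Location` are hypothetical names, and the backends are plain dictionaries standing in for real block, file or object systems. The point it shows is that the director only answers "where is it?", while the bytes come straight from the storage system.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Location:
    system: str    # backend storage system identifier (hypothetical)
    protocol: str  # "block", "file" or "object"
    address: str   # LUN/offset, file path or object key, depending on protocol

class DataDirector:
    """Metadata-only service: maps logical names to physical locations."""
    def __init__(self) -> None:
        self._inventory: Dict[str, Location] = {}
        self.lookups = 0

    def register(self, name: str, loc: Location) -> None:
        self._inventory[name] = loc

    def resolve(self, name: str) -> Location:
        self.lookups += 1
        return self._inventory[name]

class Backend:
    """In-memory stand-in for a storage system the client talks to directly."""
    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.reads = 0

    def read(self, address: str) -> bytes:
        self.reads += 1
        return self.blobs[address]

class DataHypervisorClient:
    """Client-side shim: ask the director where data lives, then read from the backend."""
    def __init__(self, director: DataDirector, backends: Dict[str, Backend]) -> None:
        self.director = director
        self.backends = backends

    def read(self, name: str) -> bytes:
        loc = self.director.resolve(name)         # control path: metadata lookup only
        return self.backends[loc.system].read(loc.address)  # data path: straight to storage

director = DataDirector()
nas = Backend()
nas.blobs["/exports/home/report.txt"] = b"quarterly numbers"
director.register("report.txt", Location("nas01", "file", "/exports/home/report.txt"))

client = DataHypervisorClient(director, {"nas01": nas})
print(client.read("report.txt"))  # b'quarterly numbers'
```

In this model the director never handles file contents, which is the property that avoids the in-band “speed bump” described earlier.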
The concept is to take the metadata management function out of the physical storage system entirely. By maintaining a centralized inventory of file metadata (file names, directories, permissions, access times, etc.) and hosting it as a distinct service outside of the storage system, storage I/O can be greatly accelerated and data can be placed on the appropriate storage resource.
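Once metadata lives outside the storage system, moving data becomes a copy plus a pointer update, which is what makes the transparent movement described earlier possible. The sketch below assumes a simple in-memory model in which each storage system is a dictionary of address to bytes and the metadata inventory maps a logical name to a `(system, address)` pair; the `read` and `migrate` helpers, system names and addresses are hypothetical. It ignores concurrent writes during migration, which a real system would have to handle.

```python
from typing import Dict, Tuple

# Hypothetical storage systems: address -> bytes.
systems: Dict[str, Dict[str, bytes]] = {
    "primary_array": {"vol1/blk/0042": b"old project archive"},
    "cloud_object": {},
}
# Hypothetical metadata inventory: logical name -> (system, address).
inventory: Dict[str, Tuple[str, str]] = {
    "projects/2013/archive.tar": ("primary_array", "vol1/blk/0042"),
}

def read(name: str) -> bytes:
    """Resolve the logical name through the inventory, then read from that system."""
    system, address = inventory[name]
    return systems[system][address]

def migrate(name: str, target_system: str, target_address: str) -> None:
    """Copy data to the target, repoint the metadata record, then free the source."""
    src_system, src_address = inventory[name]
    systems[target_system][target_address] = systems[src_system][src_address]
    inventory[name] = (target_system, target_address)  # the only change clients see
    del systems[src_system][src_address]

before = read("projects/2013/archive.tar")
migrate("projects/2013/archive.tar", "cloud_object", "bucket-cold/archive.tar")
after = read("projects/2013/archive.tar")
print(before == after)                            # True
print(inventory["projects/2013/archive.tar"][0])  # cloud_object
```

Clients keep using the same logical name before and after the move; only the inventory entry changes. Copying before repointing, and deleting the source last, means a reader never resolves to a location that has no data.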
Open systems storage virtualization has been around for close to two decades. While this segment of the industry has had moderate success over the years, it is experiencing something of a revival and renewed credibility with the recent entry of players like EMC and VMware into the software-defined storage market. With virtualized, or “software-defined,” networking also starting to take root in the IT lexicon, it would seem that the next logical progression beyond server virtualization is to abstract the actual data within the enterprise.
Since the value of information infrastructure lies in providing nimble access to the data itself, Primary Data’s focus on metadata management, rather than storage infrastructure management, seems spot on. As always, it’s going to boil down to execution. Vendor incumbency will still be a major hurdle, with industry powerhouses like EMC, IBM and HP still dominating the storage landscape. Unlike those companies, though, Primary Data plays a complementary role: its data abstraction allows data to move seamlessly between different vendors’ storage systems while leaving those systems as separate entities.