Enterprise data centers are looking longingly at web-scale cloud providers. They envy the cloud providers’ ability to scale rapidly to meet performance and capacity demands. They also envy the cost-effectiveness of those designs. In this Storage Q&A, we join Storage Switzerland Founder and Lead Analyst George Crump and Nutanix Solutions Marketing Manager Sachin Chheda. During the Question and Answer portion of a recent webinar, they discuss how more traditional enterprises can take advantage of web-scale architectures.
Watch our on-demand webinar “Web-scale vs. Enterprise IT”
Question: My applications are not web-scale ready. Should I still consider the concept of web-scale for my data center?
Sachin: Absolutely, this is what Nutanix brings, uniquely, to the table. We are able to deliver the benefits of web-scale IT infrastructure to enterprise workloads. We abstract away the storage scalability challenges that web-scale architectures solve and deliver those benefits to each VM. That is why Nutanix is successful in delivering web-scale benefits to workloads such as SQL Server, Exchange, and SharePoint.
Question: Uptime is key for my apps. What are some things I should look for when evaluating a web-scale infrastructure?
Sachin: I think the first thing is resiliency. Resiliency is the concept of keeping your data secure and protected within a system or a cluster. Nutanix does that based on the concept of tunable redundancy. As soon as data comes in, it is replicated to another node, so that a copy of that data exists in two locations within the cluster. This means you could technically lose an entire node on a Nutanix system and still survive, and still keep the applications running. So I think resiliency is the fundamental requirement for every infrastructure deployment out there. Having tunable redundancy, which gives you the ability to add more resiliency by keeping more copies of your data, is definitely highly desirable, especially if tier-one workloads are being moved onto the platform.
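To make the replication idea concrete, here is a minimal Python sketch of tunable redundancy. It is a toy model under stated assumptions, not Nutanix’s actual implementation: the `Cluster` class, its `rf` (replication factor) parameter, and the random placement policy are all hypothetical. It shows two properties described above: each write lands on `rf` distinct nodes, so losing any one node leaves at least one copy, and re-protecting data after a failure is a plain copy from a surviving replica rather than a parity-based RAID rebuild.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of tunable redundancy (hypothetical, for illustration only)."""
    nodes: list[str]
    rf: int = 2  # replication factor: number of copies kept; tunable, e.g. 2 or 3
    placement: dict[str, set[str]] = field(default_factory=dict)

    def write(self, block_id: str) -> set[str]:
        """Store a block on `rf` distinct nodes as soon as it arrives."""
        if self.rf > len(self.nodes):
            raise ValueError("replication factor exceeds node count")
        replicas = set(random.sample(self.nodes, self.rf))
        self.placement[block_id] = replicas
        return replicas

    def fail_node(self, node: str) -> None:
        """Drop a node, then restore every affected block to `rf` copies."""
        self.nodes.remove(node)
        for block_id, replicas in self.placement.items():
            replicas.discard(node)
            if not replicas:
                raise RuntimeError(f"all copies of {block_id} were lost")
            # Re-protection just copies the block from a surviving replica to
            # another healthy node; there is no parity to recompute.
            candidates = [n for n in self.nodes if n not in replicas]
            random.shuffle(candidates)
            while len(replicas) < self.rf and candidates:
                replicas.add(candidates.pop())


if __name__ == "__main__":
    cluster = Cluster(nodes=["node-a", "node-b", "node-c", "node-d"], rf=2)
    for i in range(100):
        cluster.write(f"block-{i}")
    cluster.fail_node("node-b")  # lose an entire node
    # Every block still has two copies, none of them on the failed node.
    print(all(len(r) == 2 and "node-b" not in r for r in cluster.placement.values()))
```

Raising `rf` to 3 is what “tunable” means in this model: with three copies on distinct nodes, any two nodes can fail before a block loses its last copy, at the cost of more raw capacity.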
George: I think another point there is that this isn’t RAID. That is a challenge we see in the environment all the time nowadays. I was just talking to a guy who had high-capacity drives; it took him two weeks to do a RAID rebuild, so he was running in an unprotected state, praying over his array every morning before he walked in, because one more drive failure and he’s toast! So as I understand it, you guys wouldn’t be exposed to that?
Sachin: Right, and that’s where we really differ. This is not a hardware-based solution, and it is not a RAID solution. This is truly a redundancy solution. Because we use a web-scale approach, we keep the data at the forefront and make redundancy a function of the system and the storage itself, rather than something that happens only when data is written to disk. Redundancy is actually elevated in importance.
George: Right, and I would assume that on a node failure, your ability to get back into whatever the protected state was is going to be very quick, because you don’t have to do parity checks, correct?
Sachin: Absolutely. The second key, as I mentioned earlier, is the concept of snapshot and replication at the VM level. I think that’s important for people to realize. A snapshot is a point-in-time copy; it basically creates different restore points, so it guards against things like, “Oh! I screwed up, I deleted this file and I need to recover,” or, “My database got corrupted and I want to go back to the last hour and rebuild it from my log files.”
This snapshot and replication capability is quite complementary to what Microsoft has done with Exchange and SQL Server. Exchange has Database Availability Groups (DAGs), and SQL Server has Always-On, which basically create real-time copies and replicas; we could get more into that in a subsequent webinar. However, what we find is that people want protection even beyond that, and that’s where snapshots, cloning, and replication come into play.
The last key is the concept of easily restoring data, and that’s where VM awareness is very important. Because we work at the VM level, you can take a snapshot, clone it as a VM, and quickly restore the files you need from that clone. That’s a huge win, because you no longer have to work your way down through the LUN, the VMFS, and the actual VMDK to restore something. It’s a lot more elegant.
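The snapshot-and-clone restore flow described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not Nutanix’s implementation: the `VirtualDisk` class and its methods are hypothetical, and a snapshot here is simply a read-only copy of the disk’s block map that shares the underlying immutable data. Restoring means cloning a new disk from the snapshot and copying back only the block you need, instead of rolling back the whole disk.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class VirtualDisk:
    """Toy VM disk mapping block numbers to immutable bytes (hypothetical)."""
    blocks: dict[int, bytes] = field(default_factory=dict)
    snapshots: dict[str, MappingProxyType] = field(default_factory=dict)

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data

    def snapshot(self, name: str) -> None:
        # Point-in-time restore point: copy only the block map (references),
        # not the bytes, and expose it read-only so it cannot be modified.
        self.snapshots[name] = MappingProxyType(dict(self.blocks))

    def clone(self, name: str) -> "VirtualDisk":
        # A clone is a new, writable disk seeded from a snapshot's block map;
        # writes to the clone never touch the snapshot or the original disk.
        return VirtualDisk(blocks=dict(self.snapshots[name]))


if __name__ == "__main__":
    vm = VirtualDisk()
    vm.write(0, b"customer table v1")
    vm.snapshot("hourly")
    vm.write(0, b"corrupted!!")        # the mistake happens after the snapshot
    restored = vm.clone("hourly")      # spin up a copy as of the snapshot
    vm.write(0, restored.blocks[0])    # copy back only what is needed
    print(vm.blocks[0])                # b'customer table v1'
```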
Question: Do you see the need for organizational changes when deploying a web-scale architecture?
George: This is a really good question. For example: I decide to go web-scale; what changes, if any, do I need to make organizationally to make this work better? An example here would be a business-critical SAP deployment with SQL Server.
Sachin: The important thing to know is that we are collapsing a lot of the complexity that exists in an infrastructure stack. I think it’s important for IT organizations to acknowledge this up front and say, “OK, we are going to do away with the complexity that exists in storage networking,” and agree on the goals for that environment. Customers I’ve talked to here at Nutanix found it was always easier to accomplish their goals if they communicated their intent to the organization up front.
Generally, organizations start small and then grow over time. By starting small, they are able to agree on what the possible impacts could be and try things out in a very elegant fashion. A large retail customer went down this path and created what we call an FRU (field-replaceable unit) model for their IT infrastructure. When they first proposed this idea, they wanted to make sure that everyone was bought in and agreed this was the path they wanted to take. As a result, they got agreement that all the IT building blocks, the infrastructure, would be treated as FRUs, which Nutanix makes possible. By having that conversation up front, they were able to solve a lot of their organizational challenges. It also gave the people who were traditionally involved in storage an opportunity to get involved in the process sooner and find a proper role. It turned out that those people were actually the biggest proponents of Nutanix going forward.
Question: I have a lot of remote branch office sites running Microsoft SQL Server. Are you guys applicable in that world?
Sachin: Absolutely, that is the beauty of Nutanix’s architecture. We can use varying amounts of hardware in different configurations of compute, storage, flash, and disk drives. We actually have an NX1000 series, which is our entry-level family. A lot of our customers start with the NX1000 series or the NX3000 series in their remote offices to consolidate both compute and storage onto one platform.
Over time, if their needs grow, they can just add nodes to that same infrastructure. They do not have to buy a very large Vblock up front, for example. What’s also interesting is that Nutanix has a product called Prism Central, which allows IT administrators to manage multiple Nutanix clusters, whether they are located in the main data center or in a remote office site, from one console. So now IT organizations can monitor and manage all these different instances from one pane of glass. That’s huge, because they don’t have to log in to each system separately. It’s all managed from one central location.
Question: How does troubleshooting, say, of database performance issues, change in this environment?
Sachin: Generally, troubleshooting in traditional IT infrastructure is troubleshooting by committee. You have to get a whole team of people involved, and unfortunately there’s a lot of finger-pointing that goes along with that. It’s not uncommon to spend weeks troubleshooting a performance problem, especially if it’s critical and could impact the business.
What we’ve found with Nutanix is that customers are able to do away with a lot of the complexity of troubleshooting, because our Prism management framework offers an elegant way to access information down to the VM level. We understand VM-centric manageability, we understand resource consumption by different VMs, and we also have views into the storage. That means troubleshooting can be done a lot more quickly. We also have APIs that can be used to run reports; people can pull stats and export them to their favorite charting and graphing tools. That really solves a lot of problems: things that used to take weeks now take hours, if not less.