Shutterfly is the market leader in digital, personalized photo products, currently serving millions of customers who benefit from unlimited, secure (and free) storage, plus a 100 percent happiness guarantee. They help customers turn their pictures into lasting keepsakes, such as photo books, personalized cards and stationery, as well as custom home decor products and unique photo gifts. Obviously, these pictures are important to users (for many it’s their only copy), but this data has future value to Shutterfly as well. So protection of that data is ‘mission critical’ for the data storage management team.
Shutterfly’s offer of free, unlimited capacity for users creates an almost unimaginable volume of data that needs to be handled, manipulated and saved, potentially forever. So this team has to design an architecture that will scale to meet the capacity demands of an enormous number of photos, and still remain cost competitive. Usually, the terms “reliable, scalable and cost competitive” aren’t used to describe the same product. This report will discuss how Shutterfly found a technology that enabled them to design a solution that actually met these three requirements.
The ‘Before’ Infrastructure
When a user uploaded a photo it was written to two locations on Shutterfly’s back-end storage infrastructure, both of which needed to be verified as identical. This was part of their customer assurance process. Web-based services like this need to do more than just protect customers’ data; they also need to safeguard the customer experience, which means keeping storage latency to a minimum. Shutterfly takes a ‘no holds barred’ approach to maintaining this experience.
If any back-end component in the infrastructure had a problem with this data verification process, it could mean a delay for the customer. No data was lost, the redundancy process simply hadn’t completed yet, but even that delay could have a negative impact on the customer’s experience.
The 80 Petabyte Challenge
As Shutterfly’s business grew, its storage infrastructure became so large that management began to worry that storage latency could start causing customer-visible delays. The company needed an infrastructure that could scale into the hundreds of petabytes while maintaining reasonable file access times.
The traditional RAID-based storage systems they used were reliable, but the sheer number of drives deployed made the mathematical probability of failure relatively high. As Shutterfly Fellow and Storage Architect Justin Stottlemyer explained the situation: “An 8+2 RAID set may promise you one catastrophic failure every 5000 years, but if you have 5000 of these RAID sets, one’s going to fail every year.”
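The arithmetic behind that quote is worth making explicit: a per-set failure rate that looks negligible in isolation scales linearly with fleet size. A minimal sketch, using only the figures from the quote (not measured Shutterfly data):

```python
# Expected failures across a fleet grow linearly with the number of RAID sets,
# even when each individual set looks extremely safe.

per_set_mttf_years = 5000   # promised mean time to catastrophic failure for one 8+2 RAID set
num_raid_sets = 5000        # size of the deployed fleet (figure from the quote)

# Expected catastrophic failures per year across the whole fleet
expected_failures_per_year = num_raid_sets / per_set_mttf_years

print(expected_failures_per_year)  # 1.0 -- roughly one failed set every year
```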
For a business like Shutterfly’s, losing data isn’t the only concern; storage failures also cost time in data recovery, and the resulting service delays can be just as detrimental to customer retention. So eliminating the potential for long RAID rebuild cycles was a key requirement.
The first thought was to simply make another copy, which would mean storing three copies of their entire data set. Given the scale of Shutterfly’s storage environment, this was a nonstarter. (Note: Shutterfly’s current customer data set is over 80PB, although it was smaller at the time referenced here.) The cost of the raw storage capacity, replication and data handling this architecture would require was prohibitive.
As a cloud-based, consumer service, Shutterfly has to maintain very tight cost parameters. They needed another approach, a storage infrastructure that had better resiliency, better storage efficiency and was more cost effective.
Rather than the brute force of simply adding another copy of the data, Shutterfly took a more intelligent approach. They used Cleversafe’s dsNet system to store the ‘golden copy’ of each customer’s pictures, leveraging the robustness of its “dispersed storage” architecture to eliminate the need for a third (or even a second) full copy.
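The capacity savings are easy to see with rough numbers. The 80PB figure comes from the text; the 10-of-16 dispersal width below is an illustrative assumption, not Shutterfly’s actual configuration:

```python
# Raw-capacity comparison: full replication vs. erasure-coded dispersal.
# The 80 PB data-set size is from the case study; the 10-of-16 geometry
# is an assumed example.

data_pb = 80                       # logical customer data set, in petabytes

triple_copy_raw = data_pb * 3      # three full replicas
k, n = 10, 16                      # any 10 of 16 slices can reconstruct the data
dispersal_raw = data_pb * n / k    # erasure coding expands data by a factor of n/k

print(triple_copy_raw)   # 240 PB of raw capacity for triple replication
print(dispersal_raw)     # 128.0 PB for the dispersed single copy
```

Under these assumptions, a dispersed single copy needs roughly half the raw capacity of triple replication while tolerating the loss of any six slices.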
RAID and replication
Historically, storage environments have used RAID to maintain data integrity within the storage array, and then replicate entire data sets to different physical storage arrays and different locations. This process ends up generating multiple copies of data which consume storage processing resources, network bandwidth and, of course, lots of storage space.
Traditional RAID and replication strategies were not designed for ‘hyperscale’ environments like Shutterfly’s, which have scale-out storage systems that grow into the hundreds of petabytes and beyond. But in addition to being more efficient, solutions like Cleversafe’s dsNet are also more robust than traditional RAID and replication methods.
Using erasure coding and a data dispersion process within an object-based storage architecture is what Cleversafe calls “Information Dispersal”. The system first parses a data set into multiple components or segments. Each segment is then expanded with some additional information, somewhat like a parity calculation, to create a more resilient superset of data. Using a mathematical algorithm, operators can recreate the original data from this superset. But unlike RAID technologies, which can tolerate the loss of only one or two data segments or drives, Information Dispersal can accurately reproduce data after losing many more pieces, giving it far better resiliency.
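The underlying k-of-n idea can be shown with a toy scheme: each byte becomes the constant term of a random polynomial over a small prime field, evaluated at n points, and any k of those evaluations recover the byte by interpolation. This is a sketch of the math only; Cleversafe’s actual erasure coding is a far more sophisticated production implementation:

```python
# Toy k-of-n dispersal: any k of the n slices reconstruct the data,
# so up to n-k slices can be lost without losing a single byte.
import random

P = 257  # small prime > 255, so every byte value fits in the field GF(P)

def disperse(data: bytes, k: int, n: int):
    """Split data into n slices, any k of which suffice to recover it."""
    slices = [[] for _ in range(n)]
    for byte in data:
        # Byte is the constant term of a random degree-(k-1) polynomial.
        coeffs = [byte] + [random.randrange(P) for _ in range(k - 1)]
        for x in range(1, n + 1):
            y = sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
            slices[x - 1].append(y)
    return [(x + 1, s) for x, s in enumerate(slices)]  # (evaluation point, values)

def recover(available, k: int) -> bytes:
    """Lagrange-interpolate each polynomial at 0 from any k slices."""
    pts = available[:k]
    out = []
    for j in range(len(pts[0][1])):
        total = 0
        for x_i, ys in pts:
            num, den = 1, 1
            for x_m, _ in pts:
                if x_m != x_i:
                    num = num * (-x_m) % P
                    den = den * (x_i - x_m) % P
            # pow(den, P-2, P) is the modular inverse of den in GF(P)
            total = (total + ys[j] * num * pow(den, P - 2, P)) % P
        out.append(total)
    return bytes(out)

photo = b"pixels"
slices = disperse(photo, k=3, n=5)
assert recover(random.sample(slices, 3), k=3) == photo  # any 3 of 5 slices work
```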
Information Dispersal, with its multiple tiers of parity, is designed for this kind of environment and can realistically promise much better reliability, with mathematical mean times to data loss measured in millions of years. This moves failure out of the realm of relative certainty and back into that of extremely remote probability.
What this means for Shutterfly
Because Cleversafe’s protected single copy of data provided better reliability than Shutterfly was getting with multiple copies on their original system, the company was able to re-architect its storage infrastructure. Now they can use their original arrays to support the most immediate aspects of Shutterfly’s business model, providing a superior online photo experience. Instead of worrying about how to meet future capacity demands (and how to afford it), they can configure these arrays for performance and leave the capacity expansion issues to Cleversafe.
Conversion to Object Storage
How to implement an architectural change in an environment this large is a primary factor in the decision process. After all, a company like Shutterfly can’t simply shut their existing system down. As it turned out, this conversion step was relatively simple.
Shutterfly’s traditional file systems stored each file’s pathway information as an entry in a relational database, so accessing a file required a single column lookup to find the file metadata. As an object-based storage system, Cleversafe uses object ID numbers (OIDs) in a flat index to access files, instead of a database. By simply adding a second column lookup, from the file metadata to its OID, Shutterfly was able to resolve that file’s storage location.
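The two-step lookup can be sketched in a few lines. The names, paths and OID format below are illustrative placeholders, not Shutterfly’s actual schema, and two dictionaries stand in for the metadata database and the flat object index:

```python
# Two-step resolution: file path -> object ID (the new metadata column),
# then object ID -> physical storage location in the flat object index.

path_to_oid = {
    "/users/42/beach.jpg": "oid-9f3a",                 # hypothetical metadata-DB row
}
oid_to_location = {
    "oid-9f3a": "dsnet://vault-7/slice-set-118",       # hypothetical flat-index entry
}

def locate(path: str) -> str:
    oid = path_to_oid[path]        # first lookup: path -> OID
    return oid_to_location[oid]    # second lookup: OID -> storage location

print(locate("/users/42/beach.jpg"))  # dsnet://vault-7/slice-set-118
```

The extra hop costs one more index lookup per access, which is the metadata trade-off the next paragraph describes.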
While it added another metadata step, this architecture gave them almost unlimited growth potential, a trade-off that was well worth it. According to the Chief Storage Architect, the Cleversafe system has orders of magnitude better scalability than their traditional storage systems; they calculate they can now scale capacity into the exabyte range.
Shutterfly’s IT department has the responsibility to make good on their offer of unlimited photo sharing and storage for an unlimited number of potential users. While their scale is uncommon, their needs are not: handling and storing large data sets safely and efficiently. As is often the case, more traditional environments can learn a lot of best practices from an extreme one.
Cleversafe’s object-based storage system, with its unique information dispersal technology, is providing the scalability this huge internet photo service requires, at a cost that keeps it competitive. The same technology can lower storage costs and improve performance for storage managers in more traditional IT environments as well, and address virtually any scalability requirement they can imagine.