Thin provisioning is a storage allocation process that improves the capacity efficiency of a storage system by enabling it to consume less physical capacity for storing a given amount of data. It is becoming an increasingly popular, almost commonplace, feature available on a wide variety of arrays. The tendency in the fast-paced world of IT is to assume that these systems all implement thin provisioning the same way. In reality, these offerings are not the same, and their differences may significantly affect how storage managers are able to take advantage of the technology.
The Thick Provisioning Problem
In traditional storage systems, physical capacity is allocated in advance and in excess of what is actually needed; in other words, it is ‘thickly’ provisioned. This over-allocation is used to support databases and other applications for which capacity expansion is difficult or impossible. It’s also used to simplify overall storage management, sparing the storage administrator time-consuming and somewhat risky storage expansion tasks down the road.
By reserving enough storage capacity to support a theoretical upper bound, over-allocation can eliminate the disruption that future expansion can cause to databases, software applications or general file serving. It can also reduce the administrative overhead that expanding traditional storage volumes can require. But over-allocation reduces the effective utilization of a storage volume, which drives up costs.
What is Thin Provisioning
Thin provisioning uses a more dynamic process that allocates physical space on disk closer to the amount actually needed and closer to the time it’s needed. This is done by logically reserving capacity that approaches the theoretical upper bound but physically allocating that space only when data is written. The effect is to allocate more storage capacity than is physically available by ‘promising’ storage to applications or operating systems up front, while relying on the user’s ability to physically add that capacity before it’s actually consumed.
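The allocate-on-write idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's implementation: the names ThinPool and ThinVolume are hypothetical, capacity is counted in abstract pages, and real arrays track far more metadata.

```python
class ThinPool:
    """Hypothetical pool of physical pages shared by thin volumes."""

    def __init__(self, physical_pages):
        self.physical_pages = physical_pages  # capacity actually installed
        self.used_pages = 0                   # pages backing written data
        self.promised_pages = 0               # sum of logical volume sizes

    def allocate_page(self):
        # Physical space is consumed only here, at write time.
        if self.used_pages >= self.physical_pages:
            raise RuntimeError("pool exhausted: physical capacity must be added")
        self.used_pages += 1


class ThinVolume:
    """Hypothetical volume whose logical size is a promise, not an allocation."""

    def __init__(self, pool, logical_pages):
        self.pool = pool
        self.logical_pages = logical_pages
        self.page_map = {}                    # logical page index -> data
        pool.promised_pages += logical_pages  # no physical pages consumed yet

    def write(self, page, data):
        if not 0 <= page < self.logical_pages:
            raise IndexError("write beyond the volume's logical size")
        if page not in self.page_map:
            # First write to this page: allocate physical space now.
            self.pool.allocate_page()
        self.page_map[page] = data


pool = ThinPool(physical_pages=10)
vol_a = ThinVolume(pool, logical_pages=8)
vol_b = ThinVolume(pool, logical_pages=8)
vol_a.write(0, b"first block")
print(pool.promised_pages, pool.physical_pages, pool.used_pages)
```

In this sketch the pool has promised 16 pages against 10 physical ones, yet only one page is consumed. The RuntimeError path is the point at which the administrator would have had to add capacity.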
Converting Thick to Thin
Aside from simply creating thinly provisioned volumes, there is a need to convert ‘fat’ volumes into thin ones, which arises when data is migrated from an old storage system to a new one that supports thin provisioning. To address this need, thin provisioning systems were developed with the ability to perform ‘zero detection’, a process that skips the long strings of zeros that storage systems use as placeholders when over-allocating capacity in traditional ‘thick’ provisioning.
Another issue is that thinly provisioned volumes can become ‘fat’ over time as data changes and is overwritten. This happens because operating systems don’t tell the storage system that capacity has become available when a file is deleted. Traditionally, they’ve had no mechanism to release these data blocks to the storage system for reuse.
Thin provisioning evolved again to address this through APIs, developed with the storage volume management vendors, that enable dynamic reclamation of capacity newly freed by file deletions. Now storage vendors are evolving thin provisioning technology once more to add even more efficiency and further reduce the effective cost of storage.
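Zero detection during a thick-to-thin migration can be illustrated with a small Python sketch. The function name, the 4 KiB page size and the in-memory byte string standing in for a thick volume are all assumptions made for illustration; an array would do this in its data path rather than on a buffer.

```python
PAGE_SIZE = 4096  # bytes; an assumed page size for illustration only


def migrate_thick_to_thin(thick_image, page_size=PAGE_SIZE):
    """Copy only pages containing non-zero data.

    Returns a dict mapping page index -> page contents. Pages that are
    entirely zeros are skipped, so they consume no space on the target.
    """
    thin_map = {}
    for offset in range(0, len(thick_image), page_size):
        page = thick_image[offset:offset + page_size]
        if any(page):  # True if at least one byte is non-zero
            thin_map[offset // page_size] = page
    return thin_map


# A four-page 'thick' volume in which only the second page holds data.
thick = bytes(PAGE_SIZE) + b"data".ljust(PAGE_SIZE, b"\x00") + bytes(2 * PAGE_SIZE)
thin = migrate_thick_to_thin(thick)
print(len(thick) // PAGE_SIZE, "pages read,", len(thin), "page(s) written")
```

Reads of a page missing from the map would simply return zeros, which is why skipping the placeholder zeros loses no information.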
The Thin Provisioning Problem
The basic tenet of thin provisioning is allocating physical space only when it’s needed, meaning capacity shouldn’t be allocated in advance and each allocation, when it occurs, should be as small as possible. Ironically, many storage systems that practice thin provisioning don’t follow this at all times. When a volume is created, these systems require that an initial amount of storage capacity be allocated, typically a small percentage of the volume’s total size.
What’s more, this unused capacity is maintained even as data is written to the volume. These systems automatically replenish this ‘headroom’ whenever the volume’s consumption reaches the allocated threshold, essentially ensuring that the wasted percentage of capacity persists. This process can also kick off at inopportune times, such as when an application is running a CPU-intensive set of routines.
Start at Zero
Historically, this has been a relatively small amount of storage space given the total size of the volume. But as more volumes are created over the lifespan of the system, such as boot volumes for an increasing number of VMs and virtualization hosts, this wasted capacity can add up to a significant amount of storage. This automatic allocation also takes some control away from storage administrators, potentially creating a performance issue.
Companies like Dell, with its Compellent storage arrays, don’t commit any physical storage when a volume is created. They generate only a little metadata, making new volumes a ‘non-issue’ from both a storage perspective and a processing perspective. This ‘start at 0’ philosophy, with no upfront storage allocation, follows the consumption-on-use model that true thin provisioning was originally designed around.
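A quick back-of-the-envelope calculation shows how per-volume headroom accumulates. The figures below (500 boot volumes of 40 GB each, 2 percent headroom) are purely hypothetical and are not taken from any product's documentation.

```python
def idle_headroom_gb(volume_sizes_gb, headroom_percent):
    """Total capacity held as idle headroom across all volumes, in GB.

    Assumes each volume keeps headroom_percent of its logical size
    allocated but unused, as in the behavior described above.
    """
    return sum(volume_sizes_gb) * headroom_percent / 100


boot_volumes = [40] * 500  # hypothetical: 500 VM boot volumes of 40 GB each
print(idle_headroom_gb(boot_volumes, headroom_percent=2), "GB sitting idle")
```

With these assumed numbers, 400 GB is allocated without holding any data, even though each individual volume wastes less than 1 GB.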
While dynamic allocation is much more capacity-efficient than the up-front over-allocation of traditional methods, there’s a detail that affects just how efficient that process is. Each thin provisioning technology still must physically allocate space before it’s written; the question is how much space. The smaller that increment of allocation is, the less capacity will sit idle before it’s consumed. This granularity of the data blocks or pages written is an important factor in the overall effectiveness of a thin provisioning process, and one that may get overlooked. A system that allocates in pages of a few MB, instead of blocks that may be hundreds of MB or even GBs, will be more efficient and create ‘thinner’ volumes.
Beyond avoiding up-front over-allocation of storage space for volumes or files to support potential future needs, thin provisioning can also save space by writing data more efficiently to existing ‘thick’ volumes. As well as detecting zeros when converting a thick volume, some thin provisioning systems will detect the zeros that operating systems write to reserve space within files or volumes that already exist. An example is the creation of a new database within an existing volume.
Writing thin means tracking these strings of zeros at the drive level and not actually committing them to disk. These capacity-reserving (and capacity-consuming) writes are acknowledged to the OS, which ‘tells’ the database, in this example, that the space has been reserved with zeros, even though the zeros have not been written.
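The effect of allocation granularity can be made concrete with a short calculation. The sketch below assumes a simple model in which any write touching a page causes the whole page to be allocated; the workload (ten 64 KiB writes scattered 2 GiB apart) and both page sizes are hypothetical.

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def allocated_bytes(write_extents, page_size):
    """Physical bytes allocated for a set of (offset, length) writes.

    Assumes whole pages are allocated on first touch.
    """
    pages = set()
    for offset, length in write_extents:
        if length <= 0:
            continue
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return len(pages) * page_size


# Ten small writes scattered across a large volume.
writes = [(i * 2 * GIB, 64 * KIB) for i in range(10)]
data_written = sum(length for _, length in writes)

for label, page_size in (("2 MiB pages", 2 * MIB), ("1 GiB pages", 1 * GIB)):
    used = allocated_bytes(writes, page_size)
    print(label, "allocate", used // MIB, "MiB for", data_written // KIB, "KiB of data")
```

Under these assumptions the same 640 KiB of data pins 20 MiB with 2 MiB pages but 10 GiB with 1 GiB pages, which is the sense in which finer granularity produces ‘thinner’ volumes.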
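A minimal sketch of such a ‘write thin’ path follows. The class name and the page-sized write interface are assumptions for illustration. The one design choice worth noting is that an all-zero write over existing data drops the mapping, so later reads still return zeros.

```python
class ZeroAwareVolume:
    """Hypothetical volume that acknowledges zero-filled writes without storing them."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.pages = {}  # page index -> stored data (non-zero pages only)

    def write(self, page_index, data):
        """Handle a page-sized write; returns True as the acknowledgement."""
        if len(data) != self.page_size:
            raise ValueError("this sketch only accepts whole-page writes")
        if any(data):
            self.pages[page_index] = bytes(data)
        else:
            # All zeros: acknowledge, but commit nothing. Drop any old
            # mapping so the page reads back as zeros.
            self.pages.pop(page_index, None)
        return True

    def read(self, page_index):
        # Unmapped pages read as zeros, exactly as the host expects.
        return self.pages.get(page_index, bytes(self.page_size))


vol = ZeroAwareVolume()
for i in range(100):  # e.g. a database pre-formatting its files with zeros
    vol.write(i, bytes(4096))
vol.write(3, b"\x01" * 4096)
print(len(vol.pages), "page(s) physically stored after 101 acknowledged writes")
```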
Historically, the “trim” command (defined in the ATA command set rather than SCSI) was the standard mechanism by which storage was ‘deallocated’ after a delete operation, freeing up space to be reused. If the operating system supported this command, everything was fine; if not, a free space recovery agent was needed to recapture storage after data was deleted. Now the “unmap” command is part of the SCSI standard, giving hosts a standard way to tell the storage system which blocks are no longer in use so that it can reclaim capacity from deleted data. So it’s important for storage systems that offer thin provisioning to support the SCSI unmap command.
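The reclamation flow can be modeled in a few lines. This is a conceptual sketch only: the class and its unmap method are hypothetical and do not reproduce the actual SCSI command format, which carries block ranges in a parameter list defined by the SCSI block command standard.

```python
class ReclaimableVolume:
    """Hypothetical thin volume that can release pages the host no longer needs."""

    def __init__(self):
        self.pages = {}  # page index -> data

    def write(self, page_index, data):
        self.pages[page_index] = data

    def unmap(self, first_page, page_count):
        """Release any mapped pages in the range; return how many were reclaimed."""
        reclaimed = 0
        for index in range(first_page, first_page + page_count):
            if self.pages.pop(index, None) is not None:
                reclaimed += 1
        return reclaimed


vol = ReclaimableVolume()
for i in range(10):
    vol.write(i, b"file data")

# The host deletes a file that occupied pages 2 through 5. Without an
# unmap-style notification the array still holds all ten pages.
print("before unmap:", len(vol.pages), "pages allocated")
freed = vol.unmap(first_page=2, page_count=4)
print("after unmap:", len(vol.pages), "pages allocated,", freed, "reclaimed")
```

The key point the model captures is that the array cannot know a file was deleted until it is told which block ranges are free.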
Thin provisioning is an exciting technology that can have an immediate and lasting impact on storage consumption. Originally, it could only be applied to new volumes, but the technology has evolved to encompass traditional ‘thick’ volumes as well.
Current generations of thin provisioning, like those available from Compellent, are taking this technology to the next level, a welcome development for companies trying to maximize their investments in solid state storage while providing consistent storage performance. They’re creating a ‘thinner’ provisioning process by leveraging more granular allocation, applying the fundamentals of thin provisioning from the first byte of data written, and writing thin to traditional volumes so they don’t get any thicker.
Dell Compellent is a client of Storage Switzerland