AIS: You can finally squeeze 15 pounds of data into a 5 pound bag


Try using a sledgehammer to pack 15 pounds of potatoes into a bag with a 5-pound capacity, and what do you end up with? Too much messy, disgusting material crammed into a vessel too small for the job, and a lot of sloppy overspill.

Unless you have the right sledgehammer. What does this have to do with computer storage? Plenty. And it all starts with a new data-reshaping capability of Seagate’s SandForce DuraWrite technology. Keep the numbers in mind: 15 pounds of data in a 5-pound bag. Or three units of data in a space designed for one. It’s key. More on that later.

What does DuraWrite technology do again?
My blog Write Amplification (Part 2) talks about how DuraWrite data reduction technology can make more space for over provisioning. That translates into faster data transfers, longer flash memory endurance and lower power draw. What it has NOT done is increase the space available for data storage.

Until now.

DuraWrite Virtual Capacity is like the Dr. Who TARDIS
Excuse my Dr. Who phone booth reference, but for those who know the TARDIS, it provides a great analogy. For those unfamiliar with the TARDIS, it is a fictional time machine that looks like a British police call box. It is very small by external appearances, but inside it is vast in its carrying capacity, taking occupants on an odyssey through time.

DuraWrite Virtual Capacity (DVC) is a new feature of our SandForce SSD controllers, and it’s a bit like the TARDIS. While there is no time travel involved, it does provide a lot more than can be seen. DVC takes advantage of data entropy (randomness) as data is written to the SSD. Some people like to think of it as data compression. Whatever you call it, the end result is the same: less data is written to the flash than is sent from the host. DuraWrite technology alone uses that savings to increase the over provisioning of an SSD; DVC instead increases the user data storage available in the SSD (not the over provisioning).
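
To make the distinction concrete, here is a rough back-of-the-envelope sketch (illustrative Python, not controller firmware). The 256 GB drive size and the 2:1 reduction ratio are hypothetical numbers chosen only to keep the arithmetic easy to follow:

```python
# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
physical_flash_gb = 256    # raw NAND in the drive
reduction_ratio = 2.0      # assume host data shrinks 2:1 through DuraWrite

# DuraWrite alone: the drive still advertises 256 GB. A full drive's worth of
# host data only occupies 128 GB of NAND, so the other 128 GB behaves as extra
# over provisioning (faster, longer-lived flash, but no new user space).
advertised_gb = physical_flash_gb
flash_consumed_gb = advertised_gb / reduction_ratio
extra_op_gb = physical_flash_gb - flash_consumed_gb
print(f"DuraWrite alone: {extra_op_gb:.0f} GB of extra over provisioning")

# DVC: the same reduction is instead exposed as additional user capacity.
dvc_user_gb = physical_flash_gb * reduction_ratio
print(f"DVC: up to {dvc_user_gb:.0f} GB of user space on {physical_flash_gb} GB of flash")
```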

How much more space can it add?
The efficiency of DVC is inversely related to the entropy of the data. High-entropy data like JPEGs, encrypted files, and other already-compressed files do little to increase capacity. In contrast, files like Microsoft Office documents, Outlook PST files, Oracle databases, and EXE and DLL (operating system) files have much lower entropy and can increase usable storage space on the order of two to three times for the same physical flash memory. Yes, I said two to three times. Better still, that translates into a two to three times reduction in the net cost of the flash storage. Again, no typo: two to three times more affordable. Since most enterprise deployments of flash memory are limited by the cost per GB of the flash, this kind of advance has the potential to further accelerate flash memory deployments in the enterprise.
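
If you want a feel for why entropy matters so much, a quick experiment with a general-purpose compressor makes the point. To be clear, zlib is only a stand-in for illustration here; DuraWrite’s data reduction is proprietary and runs inside the controller, not on the host:

```python
import os
import zlib

def ratio(data: bytes) -> float:
    """Original size divided by stored size: higher means more reclaimable space."""
    return len(data) / len(zlib.compress(data))

# Low-entropy stand-in: repetitive, structured content (think Office files, logs, DLLs).
low_entropy = b"INSERT INTO orders VALUES (42, 'widget', 19.99);\n" * 2000

# High-entropy stand-in: random bytes (think JPEGs or encrypted files).
high_entropy = os.urandom(len(low_entropy))

print(f"Low-entropy data compresses about {ratio(low_entropy):.1f}:1")
print(f"High-entropy data compresses about {ratio(high_entropy):.2f}:1 (essentially no gain)")
```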

Why hasn’t this been done in the past?
Think TARDIS again. Step into the booth, and take a joy ride through space and time. It happens on the fly. Simple, but only in a fictional world. With on-the-fly data reduction and compression, the process is filled with complexities. The biggest problem is that most operating systems don’t understand that the maximum capacity of a primary storage device (hard disk or solid state drive) can increase or decrease over time. However, open source operating systems can address the issue with customized drivers.

The other problem is that any storage device that includes data reduction or compression must use a variable mapping table to track the location of the data on the device once it has been reduced. Hard disk drives (HDDs) do not require any kind of mapping table because the operating system can simply write new data over old data. However, that lack of a mapping table prevents HDDs from supporting an on-the-fly data reduction and compression system.
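
A toy comparison makes the difference clear (illustrative Python, not how any shipping firmware actually stores its tables):

```python
# An HDD needs no table: a logical block address (LBA) maps to a physical
# location by simple arithmetic, and new data is written straight over old data.
def hdd_physical_offset(lba: int, sector_size: int = 512) -> int:
    return lba * sector_size

# A data-reducing device cannot do that. Every logical block shrinks by a
# different amount, so the device must keep a variable mapping table that
# records where each block landed and how many bytes it actually occupies.
mapping_table = {
    # LBA: (flash_offset_in_bytes, stored_length_in_bytes)
    0: (0,    1480),  # a 4 KiB block that reduced well
    1: (1480, 4096),  # an incompressible block stored at full size
    2: (5576,  212),  # a nearly empty block
}
```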

All solid state drives (SSDs) using NAND flash feature a basic mapping table, typically called the flash translation layer (FTL). This mapping table is required because NAND flash memory pages cannot be rewritten directly; they must first be erased in larger blocks. The SSD controller needs to relocate valid data while the old data gets erased, a process called garbage collection that relies on the mapping table. However, a data reduction and compression system requires a mapping table that is variable in size. Most SSDs lack that capability; SSDs using a SandForce controller do not, which makes them perfectly suited to the job.
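
Here is a deliberately simplified sketch of why garbage collection depends on that table. Everything about it is scaled down for readability: real NAND blocks hold hundreds of pages, and a real FTL is far more sophisticated:

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # real NAND blocks hold hundreds of pages; tiny here for clarity

@dataclass
class Block:
    pages: list = field(default_factory=lambda: [None] * PAGES_PER_BLOCK)  # None = free

blocks = [Block(), Block()]
ftl: dict[int, tuple[int, int]] = {}  # logical page number -> (block index, page index)

def garbage_collect(victim: int, spare: int) -> None:
    """Copy still-valid pages out of the victim block, then erase the whole block."""
    dest = 0
    for page_idx, data in enumerate(blocks[victim].pages):
        owners = [lpn for lpn, loc in ftl.items() if loc == (victim, page_idx)]
        if owners:                                   # page is still referenced, so valid
            blocks[spare].pages[dest] = data         # relocate the valid data
            ftl[owners[0]] = (spare, dest)           # and update the mapping table
            dest += 1
    blocks[victim].pages = [None] * PAGES_PER_BLOCK  # NAND erases a block at a time

# Two logical pages live in block 0; an earlier overwrite left page 0 stale.
blocks[0].pages = [b"old A", b"B", b"A v2", None]
ftl[10] = (0, 2)  # logical page 10 -> block 0, page 2 ("A v2")
ftl[11] = (0, 1)  # logical page 11 -> block 0, page 1 ("B")
garbage_collect(victim=0, spare=1)  # "old A" is stale, so it is simply erased
```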

What are the use cases for DVC?
DVC can be used to increase usable data storage space or to provide more cache capacity flexibility, in both cases by two to three times. To create more usable data storage space, the operating system must be given new primary storage device drivers so it understands that the drive’s maximum capacity can fluctuate over time based on how much the data is reduced or compressed.
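
In code terms, the missing piece on the OS side looks something like the following. The class and method names here are hypothetical, purely to illustrate the idea; no shipping driver exposes exactly this interface:

```python
class VirtualCapacityDrive:
    """Hypothetical model of a DVC drive whose usable capacity moves with data entropy."""

    def __init__(self, physical_gb: float):
        self.physical_gb = physical_gb
        self.reduction_ratio = 1.0  # re-estimated by the controller as data is written

    def max_capacity_gb(self) -> float:
        # Today's filesystems assume this value is fixed at format time. A
        # DVC-aware driver would have to re-query it periodically and grow or
        # shrink the filesystem's notion of free space to match.
        return self.physical_gb * self.reduction_ratio
```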

To support greater cache capacity flexibility, a host controller would manage the flash memory directly as a cache. The controller would isolate the flash memory capacity from the host so the operating system does not even see it. The dynamic cache capacity would increase cache performance at a lower price per GB depending upon the entropy of the data. The Seagate Nytro product line and some SandForce Driven program SSDs already support both of these use cases.
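
A rough sketch of the caching idea: the host-side software owns the flash, budgets it in stored (reduced) bytes, and simply fits more entries when the data reduces well. Again, zlib stands in for the controller’s data reduction, and the whole thing is illustrative rather than how the Nytro products actually work:

```python
import zlib

class CompressedFlashCache:
    """Toy host-managed cache whose capacity is a byte budget on stored (reduced) data."""

    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.used = 0
        self.entries: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> bool:
        stored = zlib.compress(value)  # stand-in for controller-side data reduction
        if self.used + len(stored) > self.budget:
            return False               # a real cache would evict something here
        self.entries[key] = stored
        self.used += len(stored)
        return True

    def get(self, key: str) -> bytes | None:
        stored = self.entries.get(key)
        return zlib.decompress(stored) if stored is not None else None
```

Low-entropy entries compress well, so the same flash budget holds more of them; random or encrypted entries get no such benefit.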

When will this appear in my personal computer?
While DVC is already being deployed and evaluated in enterprise datacenters around the world, its use in personal computers will take a bit longer because the operating system first needs new storage device drivers that understand a fluctuating maximum capacity.

When these operating system changes come together, you will not need that sledgehammer to pack more data into your TARDIS (SSD). Now that’s a space odyssey to write home about.

 

4 Comments

  1. Paul M November 21, 2013 at 5:20 pm

    How will this ‘new found capacity’ be advertised?

    Will a unit that previously had “256 GB” now be claimed to have ‘512GB’? Or some other number based on some ‘average compression ratio’ or such?

    IOW, will we now be led into thinking the same drive now has more capacity, and will it thus be advertised and priced as such? When in fact the UNCOMPRESSED capacity is exactly the same?

    This sounds like it could be misused as one heck of a marketing trick. How will we know?

    • Kent Smith November 21, 2013 at 10:18 pm

      Great question. The DVC feature will initially be sold to mega datacenters where they already understand their data entropy so there is no ‘marketed’ capacity.

      Because operating systems don’t yet natively understand a variable max capacity for the primary storage device, the operating system has to be customized. That will be done by these mega datacenters. Therefore, this will not immediately be sold to single-user client systems. Later, once the operating system incorporates these capabilities, you will start seeing this feature in end-user systems. I don’t imagine they will be marketed as anything other than the base capacity with a note of the upside capacity possible or typical.

  2. Paul M November 22, 2013 at 2:51 am

    So in effect, you’re accomplishing a result similar to, say, Lempel-Ziv, on the fly. Recognizing ‘template’ blocks that are repetitive, and mapping pointers to them on the writes. Then on reads you ‘simply’ (ha ha) are pointing to the templates as mapped. So:

    1) On writes, this requires a LOT more intelligence, (in effect on board CPU?) in effect similar to any Zip or RAR type process. Doesn’t this slow the writes down a LOT?

    2) On reads, if the original write was a long stream, thus normally causing a large un-segmented read, and is now broken up, causing many 4K reads etc, isn’t that a lot slower?

    So, what is the trade off between capacity vs speed? Referencing Einstein’s First Theory of Free Lunch (There ain’t no such thing).

    • Kent Smith November 22, 2013 at 7:37 pm

      As you noted, the underlying DuraWrite technology is very advanced. For write operations, all of this work is done in our SandForce flash controller at line speed, meaning it happens as the data passes through from the host to the flash memory. In theory a poor design could easily slow down the process, which is why we made sure that is not a problem for the DuraWrite data reduction technology.

      For read operations, DuraWrite includes proprietary techniques that keep reads from slowing down. The DuraWrite technology used in DVC has been around for many years and has undergone testing and analysis by third-party companies. You can see in all of those tests that we maintain performance at every entropy level.
