The impact of big data on cloud infrastructure procurement

Big data seems nebulous from a distance, if only because its definition can be hard to pin down and its impact is dispersed across the entire organization. That said, big data has real, material implications for cloud infrastructure. Enterprises can no longer afford to stick with legacy solutions that lack the scalability and flexibility to support demanding analytics projects.

Going forward, companies must rethink their approach to enterprise cloud storage, with an eye on achieving sufficient performance and acceptable economics for big data. Name aside, big data is about more than sheer volume – it also pertains to the speed with which IT departments must collect, store and analyze the information that passes through their systems. The right big data strategy pairs an actionable business plan with technology that is cost-effective and versatile.

Scale-up architectures and hybrid clouds: Getting optimal big data performance without breaking the bank
The amount of new digital information generated each year is on a breakneck growth trajectory. A recent EMC report pegged the worldwide figure at 4.4 trillion gigabytes in 2014 and estimated that the total would double every two years for the rest of the decade, exceeding 44 trillion gigabytes by 2020. How are enterprises going to deal with these troves of data?

At first glance, big data would seem to put organizations in the impossible position of upgrading capacity by up to 80 percent each year. Doing so would be costly and only a short-term remedy – the challenge is to build infrastructure that is not just bigger but smarter and more economical. To this end, some companies have been taking a fresh look at scale-up (rather than scale-out) architectures, adding processors, memory and networking to existing boxes in order to handle record data flows.

Focusing on scale-up is one way to potentially reduce costs by forgoing additional hardware procurement. Scale-up designs also work particularly well with increased virtualization, decreasing the probability of errors during server configuration and maintenance. Consolidating hardware is naturally conducive to lowering costs related to energy and floor space as well.
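
To make that tradeoff concrete, below is a minimal back-of-the-envelope sketch comparing the two approaches. Every figure in it (per-terabyte upgrade cost, node price, power costs and the capacity target) is an illustrative assumption rather than a vendor quote.

```python
# Rough cost comparison of scale-up vs. scale-out growth.
# All prices, power figures and the capacity target are illustrative
# assumptions for the sake of the sketch, not vendor quotes.

TB_NEEDED = 200                       # assumed target for added capacity, in TB

# Scale-up: keep the existing chassis and add drives, memory and CPUs to it.
SCALE_UP_COST_PER_TB = 450            # assumed cost of added components, $/TB
SCALE_UP_ANNUAL_POWER = 3_000         # assumed yearly power/cooling for one larger box

# Scale-out: buy additional nodes, each with its own fixed overhead.
NODE_CAPACITY_TB = 40
NODE_COST = 25_000                    # assumed price per additional node
NODE_ANNUAL_POWER = 1_800             # assumed yearly power/cooling per node

def scale_up_cost(tb: int) -> float:
    """Hardware plus one year of power for a single upgraded server."""
    return tb * SCALE_UP_COST_PER_TB + SCALE_UP_ANNUAL_POWER

def scale_out_cost(tb: int) -> float:
    """Hardware plus one year of power for enough whole nodes to cover tb."""
    nodes = -(-tb // NODE_CAPACITY_TB)    # ceiling division
    return nodes * (NODE_COST + NODE_ANNUAL_POWER)

print(f"Scale-up estimate:  ${scale_up_cost(TB_NEEDED):,.0f}")
print(f"Scale-out estimate: ${scale_out_cost(TB_NEEDED):,.0f}")
```

Under these made-up numbers the upgraded box comes out ahead, but the point of the exercise is simply that the comparison is worth running with an organization's own figures.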

In the long run, though, scale-up machines can only do so much before additional infrastructure is required. Organizations often turn to a mix of public cloud computing services and on-premises private clouds, an arrangement commonly referred to as a hybrid cloud. By making their own IT setups more efficient and having the option to tap into a provider's resources as needed, enterprises obtain new levels of scalability befitting their big data projects.
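
As a minimal sketch of that bursting idea, the toy placement rule below keeps workloads on the private cloud until an assumed capacity ceiling is reached and spills the remainder to a public provider. The capacity figure and workload sizes are placeholders, and the logic merely stands in for what a real cloud management tool would decide.

```python
# Toy hybrid-cloud placement rule: fill the private cloud first, then
# burst whatever does not fit out to a public provider.
# The capacity ceiling and workload sizes are made-up placeholders.

PRIVATE_CAPACITY_TB = 120.0   # assumed usable capacity of the on-premises cloud

def place_workloads(workloads_tb: list[float]) -> dict[str, list[float]]:
    """Assign each workload (sized in TB) to the private or public cloud."""
    placement = {"private": [], "public": []}
    used = 0.0
    for size in workloads_tb:
        if used + size <= PRIVATE_CAPACITY_TB:
            placement["private"].append(size)
            used += size
        else:
            placement["public"].append(size)   # burst to the provider
    return placement

print(place_workloads([40, 35, 30, 25, 20]))
# The first three workloads stay on-premises; the last two burst to the public cloud.
```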

What types of storage media are needed during this transition to big data infrastructure? SSDs are the logical first choice, if only because their exceptional speed is a perfect fit for information that must be processed quickly and fed into business initiatives. It almost goes without saying, though, that flash is expensive, at least compared to disk, raising the question of how much of it can be incorporated into scalable cloud storage systems without pushing the organization over budget.

Choosing the right media for big data storage architectures
Buyers do not have to fall into an either/or trap when choosing storage. Older technologies such as tape are still in wide use, and the same holds for HDDs, which despite the rise of flash are likely to stick around for years to come, coexisting with SSDs in the data center. Recent research from DataCore Software found slow uptake of all-flash storage arrays, underscoring the mixed approach that many enterprises are taking to physical media. Almost two-thirds of the 388 IT professionals DataCore surveyed said that less than 10 percent of their storage capacity was dedicated to SSDs.

Rather than view SSDs as one-to-one replacements for HDDs, many organizations are coming to see them as assets that help with particular workflows. Efficiently managing big data means striking a balance between capacity and performance – so why not use HDDs when storage space is the foremost concern, and SSDs when speed is most pertinent?
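
One way to picture that division of labor is a simple placement rule that sends frequently accessed, latency-sensitive data to flash and everything else to disk. The access-rate threshold below is an assumed policy value for illustration, not a recommendation from the article.

```python
# Toy tier-selection rule: frequently accessed, latency-sensitive datasets
# land on SSD; everything else lands on HDD, where capacity is cheap.
# The cutoff is an assumed policy value, not an industry standard.

HOT_ACCESSES_PER_DAY = 1_000   # assumed threshold for "speed matters most"

def choose_tier(accesses_per_day: int) -> str:
    """Return 'ssd' when performance is the priority, 'hdd' when capacity is."""
    return "ssd" if accesses_per_day >= HOT_ACCESSES_PER_DAY else "hdd"

print(choose_tier(25_000))   # heavily queried analytics set -> ssd
print(choose_tier(40))       # rarely touched archive -> hdd
```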

Writing for TechTarget, Infostructure Associates president Wayne Kernochan explained the need for tiered storage, the practice of using a variety of storage drives (both SSDs and HDDs) to achieve optimal performance and cost-effectiveness. In effect, using different media enables companies to get the best out of flash and magnetic storage without risking cost overruns or performance lags.

"Tiered storage mitigates total cost by providing several cost/performance options in the pool, ranging from high-price, high-performance solid-state storage devices, to conventional magnetic disk storage based on Serial-Attached SCSI," wrote Kernochan. "Adding a solid-state tier between main memory and disk helps keep performance high for big data tasks without letting storage costs get unmanageable."

Kernochan recommended roughly a 1 to 9 ratio of flash to disk. Such an arrangement means that costs are 10 percent higher than they would be using pure disk, but performance is 90 percent faster, a tradeoff that is likely acceptable to many enterprises staring down big data initiatives.
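
To see why a small flash tier can deliver an outsized performance gain, the sketch below estimates average access time when most requests are served from SSD. The latency figures and hit rate are assumptions chosen for illustration, not measurements or figures from Kernochan's article.

```python
# Back-of-the-envelope latency for a tiered pool in which a small flash
# tier absorbs most of the I/O. Latencies and the hit rate are assumed
# values for illustration, not benchmark results.

SSD_LATENCY_MS = 0.1    # assumed flash access time
HDD_LATENCY_MS = 8.0    # assumed disk access time
FLASH_HIT_RATE = 0.90   # assumed share of requests served by the flash tier

def average_latency_ms(hit_rate: float) -> float:
    """Weighted average access time across the two tiers."""
    return hit_rate * SSD_LATENCY_MS + (1 - hit_rate) * HDD_LATENCY_MS

disk_only = average_latency_ms(0.0)
tiered = average_latency_ms(FLASH_HIT_RATE)

print(f"Disk only:       {disk_only:.2f} ms per access")
print(f"With flash tier: {tiered:.2f} ms per access")
print(f"Speedup:         {disk_only / tiered:.1f}x")
```

Under these assumptions, a flash tier serving nine out of ten requests cuts average access time by roughly a factor of nine, even though disk still holds most of the capacity.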

Hadoop, OpenStack and the future of big data management
Enterprises have been tweaking their operations for big data at all levels. On top of taking new approaches to hardware procurement, they have also changed course on software, adopting solutions such as Hadoop and OpenStack to ease the transition into more intensive information management.

Hadoop provides a distributed file system and processing framework for running applications that work through tremendous amounts of data across clusters of commodity servers. Despite its utility, Hadoop expertise remains in short supply, meaning that many organizations have turned to IT solutions providers for an easier path to the big data framework.
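
For teams sizing up the framework, the basic workflow of landing data in the Hadoop Distributed File System (HDFS) is straightforward. The snippet below is a minimal sketch that drives Hadoop's standard hdfs dfs shell commands from Python, assuming a running cluster and a configured client; the file name and HDFS path are hypothetical.

```python
# Minimal sketch: stage a local dataset into HDFS with Hadoop's standard
# "hdfs dfs" shell commands, driven from Python. Assumes a running Hadoop
# cluster and a configured client; the file and path are placeholders.
import subprocess

LOCAL_FILE = "sales_2014.csv"        # hypothetical local dataset
HDFS_DIR = "/data/analytics/sales"   # hypothetical HDFS destination

def run(cmd: list[str]) -> None:
    """Run a shell command, echoing it first and raising if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run(["hdfs", "dfs", "-mkdir", "-p", HDFS_DIR])           # create the target directory
run(["hdfs", "dfs", "-put", "-f", LOCAL_FILE, HDFS_DIR]) # copy the file into HDFS
run(["hdfs", "dfs", "-ls", HDFS_DIR])                    # list it to confirm the upload
```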

Soon, Hadoop will be more thoroughly supported in OpenStack, with the Juno release slated to include integration through the Sahara component. Enterprises that consider OpenStack and buy the right mix of drives from reputable storage vendors will be in a good position to adapt to the demands of big data.

"Customers expect big data, it's assumed in today's world when you are working with an enterprise," stated David Steinberg, CEO of Zeta Interactive, according to ZDNet. "Companies on the customer side are not talking to vendors that can't bring them better data and better analytics than what they were doing before."
