Seagate and Tape Ark Will Liberate Massive Cold Data Libraries for New AI and Analytics

Data trapped on tape is data we’re not using

Zettabytes of the world’s data sit trapped on legacy tapes in offsite vault storage. Today, this data may be more useful than ever: with the power of AI and deep data analytics, it can provide new intelligence to solve problems. Older data can also be valuable, or even necessary, for maintaining regulatory compliance. Yet this data is disconnected, hard to access, and deteriorating on aging technology.

Seagate’s Lyve Data Services and Tape Ark have teamed up to deliver a service that will free the data trapped in tape vaults — a simple, streamlined process to migrate high volumes of data from aging, archived tape media directly to public cloud platforms such as AWS, Google and Microsoft Azure, in a secure and methodical way.

The goal: liberate the data from the tape vaults, unlocking the potential to access, restore, mine, analyze, monetize and deploy it in ways never before possible. In the new Data Age, the age of IT 4.0, data is continuously created at endpoints, often processed at the edge, then transmitted to the cloud to be analyzed as part of still-larger sets of relevant data. Migrating and activating all available data sets is crucial for driving digital transformation, and for surviving and thriving as part of the IT 4.0 revolution.


This partnership enables organizations in all industries to regain access to their data, potentially turning a stagnant cost center into a valuable business asset. Once data is liberated and ingested into the public cloud, it becomes available for data analytics, empowering companies to exploit the potential and worth of previously inaccessible data.

“Our vision is to give client data back to the client and make it instantly and economically accessible to them where and when they want it,” says Guy Holmes, Tape Ark founder and chief executive officer. “Ingesting these data sets into the cloud puts them back in control of their data assets and potentially turns an inaccessible, decaying cost center into a valuable revenue stream.”

“With Seagate’s expertise, zettabytes of data can be migrated to cloud platforms. On a global scale, using new edge hardware solutions, we will securely transport the data enabling businesses to recover, access, index, and analyze their tape-stored archived data,” explains Paul Steele, senior director of Seagate’s Lyve Data Services. “By moving this data to the cloud, previously inaccessible data can be put to work using technologies such as AI, industry 4.0 and other analytical tools.”

To solve our most urgent problems, humanity needs all our data at hand

For most of the past five decades, the most cost-effective means of storing cold data was on digital tape cartridges, kept in an offsite vault to ensure their physical safety. Once a backup or an archive was created on tape, it was given to a courier, placed on a truck, driven to offsite storage and placed on a shelf. While this gave clients some protection, the vault was essentially an air-conditioned room with a fire suppression system and swipe card access.

Today, things are different. Data is more present and more important than ever, and there is an almost endless supply of data that is actionable. And the data infrastructure has evolved. Over the last decade, the public cloud became ubiquitous and scalable; it provides built-in data redundancy, and it’s seen significant price decreases (while at the same time offsite tape storage costs have been on the rise). With today’s shift to IT 4.0, we’ve begun moving beyond the cloud, to the edge — tremendous computing capability is no longer limited to large centralized data centers; it can be moved closer to sources of data. The sheer volume of data created at the endpoints shifts the data gravity to the edge of the network and draws the computing power and applications closer to deliver decisions in real time.

That means we need all our data available in real time.

Previously cold NASA data relating to lunar dust, recorded during the Apollo moon landings, has recently been recovered and is now being used to assist in planning future missions to the Moon and Mars.


The legacy data in the world’s tape archives holds the potential to deliver tremendous value and potentially major breakthroughs in research, understanding, processes and practices in a wide variety of sectors — but only if it’s brought in from the cold, and made available to us. Bringing decades of cold data to life will release the potential for life-critical data to solve many of humanity’s problems, from small to large, from healthcare, education and research to energy and mining, agriculture, environmental and space sciences, and media and entertainment.

“There are over one billion stored tapes on the planet, and the value of the data on those tapes is incalculable,” says Ted Oade, Seagate director of product marketing. “Bringing it online allows newly emerging big data analytics and AI-driven technologies to be applied that will let clients extract new value, monetize, and potentially deliver — in some cases — life-saving insights.”

A complex process, made simple

Seagate’s Lyve Data Services powered by Tape Ark ends the need to maintain old-school vaulting storage systems. Our new tape migration services make all the data available online, often at the same cost as legacy tape storage or less. Restores are no longer an issue, as data retention plans can become more proactive and robust. Access to data is available in minutes or hours instead of days or weeks. Analytics and big data tools can be used to interrogate the vast collection of historical data, offering new insights and new business opportunities.

“No more couriers, cables and crusty old equipment causing frustration and delays,” says Holmes. “You expect Netflix-style service at home, so why not expect that for your enterprise backups also?”

Seagate and Tape Ark together offer unprecedented capabilities in volume and scale, enabling clients to free the data on their entire tape archive — from one tape to a million tapes, and in all legacy tape formats — with no up-front cost to migrate the data to the cloud.

Seagate’s Lyve Data Services provides global scale, leading expertise in efficient data management and recovery, and the most advanced edge hardware solutions to ensure secure data transport from the edge to any cloud service. Tape Ark brings more than two decades of specialized tape transcription and data migration, recovery and restoration, and has pioneered a groundbreaking software and tape-to-cloud data transfer interface.

Reliable data migration from tape to cloud is normally a complex process. It requires extensive resources, including legacy tape drives to read the tapes, the software and file systems originally used to write the data, and extensive use of proprietary technologies and a highly methodical process to ensure data integrity, auditability and security. Tape Ark has developed proprietary technology to free data trapped within a maze of data formats, file systems, and legacy tape formats, including on tapes that are decades old and can be quite fragile. Utilizing Tape Ark’s unique scalable and automated tape migration tools and backend infrastructure, legacy tapes can seamlessly be migrated to the cloud en masse and then easily accessed through a client portal.
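
To make the idea of data integrity and auditability concrete, here is a minimal sketch of one common way to check a migration: compare checksums of the files read from tape against checksums of the copies that were migrated. This is an illustration only and does not describe Tape Ark's proprietary process. The function names (sha256_of, build_manifest, audit) and the directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(source: dict[str, str], migrated: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests (hypothetical audit step).

    Reports files that are missing from the migrated copy, files that
    appear only in the migrated copy, and files whose checksums differ.
    """
    shared = source.keys() & migrated.keys()
    return {
        "missing": sorted(source.keys() - migrated.keys()),
        "unexpected": sorted(migrated.keys() - source.keys()),
        "corrupted": sorted(n for n in shared if source[n] != migrated[n]),
    }
```

In this sketch, a migration would be considered clean only when all three lists returned by audit are empty. Keeping the manifests themselves alongside the data gives an auditor a record of what was checked.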

What are the big benefits for customers?

Clients will gain an incalculable advantage by adding years or decades of latent data back into their data sets to be accessible for AI-enabled deep analytics. In addition, converting archived data into always-available data has numerous benefits. Maintaining all levels of data in the cloud provides flexibility, while being bound to legacy tape storage infrastructure may inhibit a company’s ability to modernize systems to be more flexible, efficient and secure. Cloud storage can be significantly less expensive than long-term physical storage of tapes. Data in the cloud is always on, while data on tapes is expensive and difficult to access, and is always at risk of deterioration and permanent loss. Additional costs of tape include maintaining legacy devices and software licenses to provide access when necessary, and expertise in using legacy infrastructure is hard to come by. Finally, migrating data to the cloud makes regulatory compliance and data audits much simpler.

The data benefit

Whether a company is backing up directly to the cloud today or is still using tape, the ongoing storage of a historical tape collection continues to cost time and money. And when a project lead or analyst needs access to this data, they’ll have to wait days for a courier — often to find out the company no longer has the right tape drive or software to read the old tapes. So what’s the solution for all those tapes — and more importantly, all the historical data on them?

Seagate’s Lyve Data Services powered by Tape Ark will rapidly migrate the data from the client’s offsite tapes directly into the cloud, regardless of the media type, data format or the volume of tapes in the library.

Now when a client needs legacy data, they can retrieve it online, quickly and reliably.

The Seagate and Tape Ark teams understand the unique challenges and different regulatory requirements that exist between different industry sectors when it comes to data management and data storage practices. The regulatory landscape is constantly changing and we understand and proactively keep on top of complex requirements for data retention, ensuring that the migration of legacy archive data complies with all legal and regulatory requirements specific to each industry while simultaneously safeguarding the integrity and security of the data. We help to identify any possible compliance gaps and security vulnerabilities and ensure that these are eliminated during the migration process.

How will important sectors benefit from full access to data?

Healthcare and Medical Research

Healthcare and life sciences is a highly data-intensive sector, and its data is among the most sensitive. Research and scientific discovery, clinical trials, hospital records and individual patient data all generate large volumes of data requiring long-term, secure storage. Not too long ago, patient data was maintained in voluminous hard copy files on site, but after a slow evolution most records and data are now stored on tape and disk, both in onsite data centers and offsite storage facilities, often with multiple tape copies in multiple locations. While many organizations are putting new data into the cloud, enormous volumes of aging data still reside on tape, which is becoming increasingly difficult and expensive to maintain and access. Moving historical and legacy patient and scientific research data from tape into the cloud not only makes it more accessible and easier to manage, but also opens the door to big data opportunities and breakthrough healthcare research analytics. In one case, machine learning applied to data from chest radiography studies enabled researchers to accurately predict tuberculosis. Such access can increase the pace of innovation and discovery, potentially bringing treatments, medicines and cures to the market sooner.


Education and Research

Universities hold some of the largest collections of backup tapes in the world. The rules that govern management of student records and research projects are complex and far reaching. Vast sums of money are wasted each year in this sector on offsite storage of data that institutions must retain but cannot access without significant cost and delay.

Media and Entertainment

The media, entertainment, broadcast and publishing sector has seen exponential data growth in the past 20 years. Media assets such as video, animation and effects files, audio, security and news footage require large-volume durable storage that can grow to multi-petabytes on demand. Improved accessibility to that data at production, commercial and consumer levels drives market advantages. Case in point: Major League Baseball (MLB) has deployed the analysis of real-time data using Amazon Web Services (AWS) to power its revolutionary Player Tracking System, which is transforming the sport by revealing new, richly detailed information about the nuances and athleticism of the game. That information is generating new levels of excitement among fans.

The media industry conventionally has maintained many tens of thousands of tapes with historical media content. The tapes are often stored offsite with the media deteriorating, and the hardware needed to read the tapes becoming obsolete. Liberating valuable legacy entertainment files lets clients better track and manage assets, and recut, remix, restore and monetize this content in the modern era of dispersed digital distribution.

AI technology enables clients to create enriched metadata and smart indexing. Performer recognition, sentiment analysis, facial expression analysis, background object recognition and on-screen text analysis can all enhance each asset’s metadata, making content easier to search, find and monetize.

Energy, Environment and Minerals

The mining, energy, and environmental sectors each have individual and complex corporate data retention requirements and regulations to comply with. In addition, the sector generates large volumes of research, meteorological, geological, and production data that is often used more than once, and requires ongoing access for years — even decades. Geophysical and meteorological data recorded up to half a century ago can still be very relevant to understanding resource and climate models today — even more so with the addition of new technology, software and data analytic tools that were not available when this data was first recorded.

The terrain for the scientific work conducted by ICESCAPE scientists is Arctic sea ice and melt ponds in the Chukchi Sea. The mission is dedicated to sampling the physical, chemical and biological characteristics of the ocean and sea ice. ICESCAPE is a multi-year NASA shipborne project. Photo Credit: (NASA/Kathryn Hansen)

Financial Services

Financial services is one of the most heavily regulated industry sectors, and the handling and storage of its data is subject to a plethora of laws and regulations. The management of financial services data may also differ between public, private and not-for-profit organizations, and with the size of the organization and whether the financial data relates to a domestic or a global field. Data covers many sub-sectors including accounting, banking, stock market, superannuation, investment, insurance, credit, tax and other related services. It’s all complex, transactional information, the majority of which is confidential and requires high levels of security, encryption and traceability. It can be subject to periodic auditing by governing bodies, and data retention periods can vary significantly between sub-sectors and data types. For many organizations, the complexity of managing the various data sets creates challenges in deciding what data needs to be retained, what data can be destroyed securely, and what data is needed purely for compliance and auditing matters. With most of the financial services data in question residing on backup tapes in offsite vaults, it is difficult for organizations to manage proactively.


Retail

Data management and storage in the retail industry has undergone many changes in the past quarter century and continues to evolve as market trends become more global and ecommerce dominates. Retail companies need access to both real-time statistics and historical data to understand trends and customer needs, and to manage inventory in breakthrough ways. Walmart, for example, has famously pioneered the use of machine learning, AI, IoT, big data and predictive analytics to understand how product demand will shift with the seasons, the weather and the economy, and so to predict sales efficiently.

Predictive analytics systems need historical data to inform predictions. Data management is transforming from a cost center to become a revenue generator — and all this data must be at hand in the production environment.

Engineering and Construction

The building and construction industry is seeing significant new benefits from data mining and analytics, which help it plan for site selection, design complexities, and material and labor costs earlier in the planning process. The sector also has very long retention periods for data, and large volumes of data to retain. There’s a recurring need to review project data for adherence to code regulations, ongoing maintenance and informing new building projects. Offsite tape storage makes this difficult and costly.

More benefits: lower cost, higher efficiency, reduced carbon footprint

It turns out air-conditioned, secure tape vaults are no longer an essential part of the IT infrastructure. Freeing clients from the traditional, cumbersome and often complex offsite data storage model provides other benefits too.

Migrating data to the cloud means IT and data managers can eliminate the large ongoing costs of offsite storage fees and costly legacy infrastructure maintenance. And by removing needless courier vans from public roads, as well as emissions from air conditioning and gas suppression systems, customers reduce their carbon footprint.

The broad range of benefits includes:

  • A single point of access to all legacy data.
  • Direct online access to massive data collections; deployment of analytics, data mining, AI and other big data tools previously not possible when tapes were in a vault.
  • Fast and efficient access to data within hours of request, instead of days (no more picking data, courier vans or restoring tapes).
  • The ability to address media in the cloud as “media,” and not a restored version of a historical backup.
  • Increased data security and eleven nines (99.999999999%) of durability.
  • Ability to expire media with the click of a button.
  • Creation of backup and data replication to additional geographies in the cloud, for disaster recovery.
  • Massively reduced data center footprint and associated running costs.
  • No more legacy tapes or tape drives to maintain.
  • Eliminates the risk of degradation to tape media.
  • Eliminates expensive tape-to-tape data migration projects.
  • Environmental benefits from reducing vehicles on the road and reduced warehouse air conditioning and electricity.
  • Cost neutral or better; the cost of tape migration to cloud is no more than current physical storage costs, while providing massive upside and value.
  • Tape Ark’s “rapid learning” system simplifies re-cataloguing while tapes are transferred; make decisions about retention, de-duplication, re-cataloguing and retiring unneeded datasets confidently.
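
As a rough illustration of what eleven nines of durability means in practice, the sketch below works through the arithmetic. It assumes the figure is read the way public cloud object storage services commonly quote it: a 99.999999999% chance that any given stored object survives a year, with objects treated independently. The function names are hypothetical, and the model is a simplification rather than a guarantee from any provider.

```python
def expected_annual_losses(object_count: int, nines: int = 11) -> float:
    """Expected number of objects lost per year.

    Assumes each object independently has a 10**-nines chance of
    being lost in a year (a simplified reading of "N nines").
    """
    annual_loss_probability = 10.0 ** -nines
    return object_count * annual_loss_probability


def years_per_expected_loss(object_count: int, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(object_count, nines)
```

Under these assumptions, an archive of 10 million objects would expect to lose about 0.0001 objects per year, or roughly one object every 10,000 years.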

Most importantly, Seagate’s Lyve Data Services powered by Tape Ark delivers the restoration, migration and preservation of a company’s data assets. We have the physical infrastructure, the software, and the processes that will enable any client to access their data in ways that were never possible before.


About the Author:

John Paulsen
John Paulsen is a "Data for Good" advocate, with nearly 20 years in the data storage industry. He's helped launch many industry-firsts including HAMR technology, 10K-rpm and 15K-rpm hard drives, drives designed specifically for video and for gaming, Serial ATA drives, fluid dynamic HDD motors, 60TB SSDs, and MACH.2 multi-actuator technology.