“How valuable is data, and what level of data infrastructure do businesses require in this era of edge computing?”
From consumer wearables to industry hardware to transportation infrastructure, billions of devices are creating more data, more often. These data infuse intelligence into every business, empowering insights and enabling leaders to make decisions that fuel growth and competitive advantage.
We are also witnessing a clear shift in the gravity of data towards the edge — the location outside the traditional centralized data center core, closer to the endpoint device where data is created, such as a factory floor or the base of a cellphone tower. Since edge computing, analysis, and storage take place close to the endpoints, data can make the round trip in fewer milliseconds to enable faster decision-making.
Enterprises must now streamline and secure data across the endpoint, the edge, and the core. The enterprise’s role as a data steward continues to grow in importance, as the increasing flow of data enables better business insights and consumer experiences.
In a recent panel discussion captured below, Seagate’s Jeff Nygaard and BS Teh, together with IDC’s Dave Reinsel, talk about the need for organizations to take a multi-faceted approach to harness the business potential of valuable data, and why they must understand the importance of data storage, seamless data movement and data security at the edge.
Read the full conversation below (edited for clarity), or watch the video above to find out more.
Globally, what challenges and opportunities will companies face when looking to strengthen their edge capabilities in 2019?
Dave Reinsel, Sr. Vice President, IDC:
When you think about the challenges associated with rolling out the edge, the first one is maintaining it. Having a physical presence out there in a remote area where you’ve got a server or something connected to a pipeline, how do you make sure that that is still working? How do you make sure that it’s updated with the right patches, so that it’s doing what it’s supposed to be doing? And that the electricity is on, that it’s cooled and properly maintained. That’s probably one of the biggest challenges.
The next challenge is the new types of data that are going to be available to you at the edge. You’re going to be able to capture data that you’d never captured before. How do you manage that data compared to the data that you used to use? How do you integrate it into your analysis? How do you extract the value from that?
Some of the opportunities? I think you get to control the customer experience. If I can have an effective edge out there, now I get to engage my customer in unique and exciting ways. And if I can do it better than somebody else, then I get to own that customer experience.
I mentioned capturing new sets of data. But there’s a whole new area of what we call OT, operational technology. We’ve been doing IT for years. OT is that connected elevator that’s sending information to you. It’s the connected refrigerator, the connected washing machine in my house, that’s sending information to a manufacturer. It’s a new source of data, it has different access privileges, it has different security protocols. Trying to make sure that you manage that is not just a challenge. It’s a huge opportunity to drive new levels of service and convenience to the users.
BS Teh, SVP of Global Sales and Sales Operations, Seagate:
One other challenge that I’d like to point out is security. I’m sure that many of you have visited data centers. As you actually get into the data center, you may notice the security around the data center is probably even tighter than that of a prison. So imagine now how precious and how important the data is.
Now you’re planning to have data at various edge sites. Are you able to secure the data the same way as you would at the data center itself? Now, obviously the types of data that you store at the edge will be different from the data center and the core itself. But still, it is a very big question and a challenge a lot of enterprises will have to take into consideration. How do you secure the data at rest?
Jeff Nygaard, EVP and Head of Operations, Products, and Technology, Seagate:
As we think about the edge, the edge is a tiering of data. There’s the traditional data center in the core that you’ve talked about. And we’ve talked about tiering within that data center. But as the edge comes online, to BS’s point, you’re going to move those tiers outside of the traditional data center into other locations.
So not only do you have to worry about the security, but you have to have some group within your company that’s thinking about “What data do I want at each point of that access?” There needs to be intelligent thinking about where the data is, and where we’re going to extract value from that data; either at the edge or at the core. There needs to be a group that’s thinking about it and positioning the company architecturally around that.
I’ve been following Seagate since the year 2000 and I’ve been with IDC going on 19 years. And prior to that I worked for a company that supplied a critical component — Hutchinson Technology, now TDK. But my question to you then is:
How have you seen the data center change in the age of edge computing? Where do you think the role for the data center, the core, is going to go now that the edge is starting to proliferate and expand its role and presence?
When you talk about data centers, I look at it from a perspective of private cloud versus public cloud. So I’ll touch on both.
I think, from a public cloud perspective, what has changed right now are three things. One: there’s a proliferation of data centers spread across all the various geographies, due in part to legislation, because certain countries want to keep data in-country. Secondly: it’s due to the scale obviously, as the scale grows. And thirdly: it’s latency as well. So we see that happening in the public cloud space. On top of that, we see public cloud service providers are now moving to offer edge services as well. So it’s not just about public cloud operators offering core data center services — they’re also extending out because they too recognize there is a need for edge infrastructure, and they don’t want to be left behind.
Now, on the private cloud side of things, a lot of enterprises right now are moving into a hybrid IT model. They’re not just saying: “Okay, I want to have my own data center”; it’s also, “how do we bridge the data center to the public cloud as well.” So that becomes really interesting. Surveys have shown that a majority of the companies who are thinking of moving into public cloud will retain a degree of their own data centers, as well.
To build on a few comments you talked about there. First, we talked a lot about latency and the importance of real time data, in terms of how we structure or architect our data. But that’s not the only driver for the edge. And you mentioned a couple other areas we need to stay focused on.
The other drivers to talk about are size of data and cost of data, which tend to correlate with each other. It’s not free to move data through a pipe from endpoint to edge to cloud; it costs money to send data through that pipeline. The idea that you should really only be moving data if you need to move the data — based on how you’ve architected, and how you get value out of the data — is something you should be thinking about.
Then the third element that you captured, BS, was compliance; compliance either due to privacy reasons, or compliance related to localization of data.
There’s a lot of different drivers beyond just this idea of real time data. And all of those factors, depending on your company’s situation, have to be part of that decision-making process.
You brought up some interesting comments — I’ll add to what you’re saying. Data has mass, right? So therefore, it doesn’t move cheaply. That has implications.
Think about managing the data from a connected vehicle. We’ve had these airplanes that have gotten lost. The authorities can’t find them anywhere. And you would think, “Why aren’t they just constantly communicating their location, and all the stats across the connection that they have, whether it’s via satellite or whatever?” Because it is just cost prohibitive. Instead, what do they have? They have a black box, and they collect all the data there. And assuming you can find the black box after an event, you can then recreate what happened. But it’s really hard to do it real time, because there’s no presence of an edge when you’re 35 thousand feet in the air.
Same thing with an autonomous vehicle; it’d be great to have every vehicle connected via cellular, but it’s expensive. So what you’re going to have are these little gateways at a stoplight, for example, so that when your car stops, the sensors will respond — “I can see this car, I’ll start unloading the data really quickly now; okay, the light’s green, I got most of the data” — and now you continue on doing your own thing.
So connectivity is a huge part of the edge — not just making sure it’s cost efficient, but making sure it’s available a hundred percent of the time and present as much as possible, so you can deliver these services.
I’d like your opinion on the roles the data center will have. As I said earlier, you’re going to have the opportunity to capture a lot of data. We use the term “data lakes” a lot — but the last thing you want is your data lake to become a data ocean or data swamp, right? Or data bog, a data marsh, the Dead Sea of data. You don’t want that. So I think a key role of a data center will be making sure that data is managed in a very appropriate way. That will include reducing siloed environments — and that’s going to be difficult to do, because you’re going to be bringing in all these new sources of data. But the last thing you want to do is silo all these data sources, because if you don’t intersect data, you can’t apply analytics.
So how do data centers do that? What kind of infrastructure do they need to bring together to make sure that their data lakes don’t become data swamps? How do they go about liberating data?
This is a really important point. I think the public cloud operators have a bit of an advantage because of the scale they have. How they manage the data they’re holding is a big part of the cost structure as well. A key issue is how they optimize the data that they have; the public clouds are actively working on how to manage and move the data to the various locations in the most efficient way possible.
Now, as far as the enterprise data center is concerned, I think they don’t necessarily have that kind of scale to do their own development. So they’d have to rely on third party services to help them manage. I think Jeff would be able to comment on it because he manages that in-house at Seagate. But it’s more of a question of how you ensure you’re consistently maintaining the data that you have, and moving the data to the various tiers of storage, in the most efficient way possible.
Yeah, we use a term called data governance, in terms of how we try to optimize and re-optimize our data in the system. I want to build on a couple of things you captured, BS. First is, there are tools that allow you to analyze data across databases, or even across data centers. Those tools are out there, we’re using them. It’s an important idea: “Position the data where it needs to be to extract value.” You don’t have to move everything to a centralized database to do the analytics on it.
The second point I want to make, which we’re kind of dancing around a little bit because we’re talking a lot about infrastructure — we haven’t talked about people yet. You have to have the people infrastructure at several levels to be able to extract value from the data at all those levels. Let me highlight a few.
First, you need people that understand how to analyze data and big data sets. It’s no longer a world where there’s data up in a database, I pull the data down to my laptop or my computer, I analyze it. That’s not how things are done today, in a structured way. Data is analyzed at the database, and you need people that can do that. At Seagate we call that a “citizen data scientist.” We need the workforce to be able to understand how to do modern analytics on data.
Second, you need people that are actually building those data pipes at your company, and thinking about the architecture of where the data is located. That would be data scientists and computer scientists.
Last, importantly, there’s got to be some executive buy-in for the idea that data has value. For big hyperscale cloud companies, data is obviously their value. But for enterprises, data, in some cases, may be viewed as support for their real product, rather than necessarily an asset that has value. So there has to be executive buy-in that data is important for how we’re going to operate the company.
Those are great comments. I think about the various applications by industry. One of the things we did with Seagate this year was to look at some key industries and how they’re prepared to handle the data that’s coming their way. We called it the DATCON index, the data readiness condition index.
When you mentioned skill set, it came to my mind that some industries have a pretty good skill set. When you think about manufacturing — they’ve been doing this for years, and they know exactly what to do, especially inside the factory walls. I think an interesting piece of the skill set is, as you mentioned, a “data visionary.”
Somebody has to sit there and figure out: what is it that we want to deliver to our customers? What is it that we want to deliver to our business partners? What information do we need? And where can we get that level of information? So I think about the manufacturing environment — one of the companies we interviewed built engines, and they’re putting sensors on all their engines now. And as we kind of discussed, it’s great to have that information. But what do you do with it, right? What’s so interesting is, inside the manufacturing walls, it’s a controlled environment. I control vibration, I know what the temperature is, I control who uses it… it’s a very, very controlled environment. Once you exit the factory walls, all of a sudden, you increase the randomness of what’s influencing the data that you’re capturing.
So as you capture data, it’s not enough that you get a hold of data. You have to understand whether it’s good data or bad data, right? Garbage in, garbage out. We used to say that back in the ’80s, in the PC days. It is now so much more important, because as you start to enable processes with AI, you’re trying to build artificial intelligence and algorithms to make sure that you place data in the right place at the right time to deliver it to the right people on the right device in the right format. If you don’t get it there just in time or at the right time, then you’re going to ruin your service, you’re going to create a poor customer experience, and then that’s going to spoil the well for everyone else.
So I think about that manufacturer who is now taking sensor data, who can now understand so much more about their product. Even for refrigerators or washing machines: How often do I leave the door open? How often do I do a load of laundry? And what settings do I put it on? The manufacturer is using it for maintenance, sure. But they’re also beginning to understand how their users are actually using the product. Now, they have to figure out how to take that data, infuse it into the manufacturing process, to make a product that’s even longer lasting with a better customer experience — and personalized by the way — and I just think that’s fascinating.
I love your comment about the structure of the data. And you mentioned it briefly but also — the cleaning of the data. Because sensor data is not always clean. And once you go down these paths of automated decision making, you have to have the right cleaning of that data. Or you’re going to create, to your point, garbage in, garbage out. You’re going to make bad decisions.
So you’ve done a lot with manufacturing. I know for us at IDC, healthcare is an exciting industry that we explored, because if you think of the advances you can make with healthcare, it’s kind of like you’re taking IT to the people. For instance: it’s tough sometimes to get your child into the clinic or the hospital. The wait time to get in is terrible. You go there to wait, then you go into a room and wait some more. It’s not a good experience. But now you have telehealth — the ability to use video and audio to present whatever the ailment is of your child to a doctor online, and get a diagnosis and maybe even a prescription without even having to go into the medical center itself. Also AR/VR. We didn’t really talk a lot about AR/VR [in the DATCON report]. But what’s fascinating to me is the use of virtual reality in phobia situations. If you’re afraid of spiders, or if you’re afraid of heights, or you have this fear of people, they use VR in clinics to gently introduce you to a whole new reality to help you overcome the phobia, while they’re watching your body metrics the whole time to monitor progress. I find that fascinating.
Do you have other industries that you’ve worked with, where there’s an exciting application, or where you’ve had to deal with limited skill sets?
Agriculture is another one that’s up and coming as well. But the theme we see across all the different emerging markets is that most of the discussion really starts at the application level. What I mean by that is they start by asking: “What services can we do?” And then they move on to: “What are the inventions and the creations that allow us to perform these particular applications?”
But in our opinion, not a lot of attention is paid to the backend and the infrastructure required. And oftentimes, it’s all about data storage infrastructure and data transfer infrastructure. I think healthcare and autonomous vehicles are two very good examples of that. A lot of focus on the front end, the car itself. What can the car do? But not a lot of discussion about the deeper infrastructure required; the car can’t just operate on its own, it needs an entire infrastructure worldwide in order to operate. Just like, in healthcare, yes, you can have all this great equipment out there to do remote surgery and all that. Yes, the robot can do that. But how do you make sure there’s an infrastructure around it to ensure there’s linkage to make full use of the data? That’s the part I think that is not getting enough focus and attention.
Some data is not that critical, other data is hypercritical
You mentioned the robots and healthcare. One of the measures we track about data in the global datasphere is the criticality of data. What is becoming increasingly important is to understand that some data is critical, while other data is not critical at all. The data associated with the movies that I watch isn’t critical. We all know what a blue screen is, it’s never fun; but it creates an inconvenience, it doesn’t kill us.
But if I have an embedded defibrillator or an embedded insulin pump, and that data is wrong, then a life is at stake. If the data is wrong in a self-driving vehicle, or in an airplane, that is a critical situation. So the criticality, in fact, the hyper-criticality of data is starting to go up.
And that’s really where we see the intersection of the digital world with our physical world. Do we want it? Well, if it brings a level of convenience and a level of efficiency, then, yes, we want it. You mentioned autonomous vehicles. Personal experience, I had a car that had these autonomous driving features in it. But the problem was, it had the proximity cruise control where it always stayed a certain distance behind the vehicle in front. And that’s fine and very convenient. But the problem was, it was too far behind. I couldn’t get close enough. So what happened? This guy over here decides he’s going to pull in and that means my car moves back a little bit. Now another one comes in, and I felt like I was going backwards sometimes.
So why did they design it that way? They have to do it because a life’s at stake. They have to build in enough cushion and enough margin to take care of the randomness that exists. You were mentioning, BS, the infrastructure around autonomous vehicles. If I can create an infrastructure where I can control the variables — for example, instead of my carpool lane, there might be an autonomous vehicle lane where I know that the only thing in there is autonomous vehicles — I can tighten up those distances, I can bring in those parameters. I can control it much more finely and make it a much more pleasing experience, and a much more efficient experience.
That’s something we look forward to. But it takes time to roll out that infrastructure. Sometimes there’s resistance.
Video surveillance is perhaps one of the largest drivers in the growth of the datasphere. And it’s certainly an edge application, right? It’s an endpoint, but it’s an edge application. What kind of growth have you seen with video surveillance across the APJ region?
Let me start with a global perspective and then come down to APJ. Firstly, within the storage industry, this segment is one of the fastest growing segments outside of the cloud and data centers. What is really changing is that it’s moving from being an endpoint application to an edge application. And it’s now also emerging as a core and data center application as well. What we’re also seeing is an emergence of a service model, in addition to just selling the surveillance system boxes out there. What is different, in the video surveillance space, is that the edge infrastructure tends to be owned by the brands. I mean, they will provide solutions from the endpoint to the edge and then also offer cloud services as well. That’s one thing a little bit unusual about this segment; there are a few companies offering these end-to-end solutions.
Specifically for the APJ region, we see that as one of the fastest growing segments as well. By far, the largest market within APJ for video surveillance is India; it is certainly a very fast growing market. The other countries that have a very strong focus in video surveillance would be Taiwan, Korea, and Japan. The rest of the APJ countries are more consumers of the technology, and they’re deploying a lot of the infrastructure.
The designs of the drives they want in the surveillance market are changing as well, because they certainly want to stream in from tens to hundreds of cameras into the same storage unit. So they are migrating towards needs for high bandwidth, and high I/O. They’re also beginning to look at starting to analyze some of that video at the edge as well. So some of the discussion we’ve had about high I/O devices like the MACH.2 dual actuator hard drive design that Seagate has talked about, where they can move more input and output through the unit more rapidly, might be something that plays out in that market as they try to tie in more and more surveillance systems at higher and higher resolution going forward.
That’s an excellent point. Think about client drives having enterprise-class workloads. That’s a challenge that we’re facing. So we’re going to make sure that the client drives that we are selling in this space are built to withstand that.
Question from Simon Sherwood of IT News and CRN:
You mentioned micro services integrated into business workflows and personal streams of life. Could you explain or give us an example of such micro services? And could you explain how that will impact on core and edge infrastructure?
Sure. I think one of the greatest examples of micro services relates to enabling persistent authentication. This relates to trying to make sure each individual person can maintain absolute identity security, so they are able to say: “This is not just who I said I was, but it’s who I am and will continue to say I am.” It’s this security package, the security service, that follows me around — it provides a level of authentication that enables features of convenience. So let’s say I’m walking up to my vehicle, and I just came from a bar, various sensor endpoints recognize that my walking gait isn’t normal, that my blood pressure isn’t normal. Therefore, when I walk up to my car, it doesn’t open or it doesn’t start. Some may see that as an inconvenience, but it also is a convenience for my safety and health, and for the insurance agency that is glad that it’s not allowing me to drive that way. So I think security is one of the greatest opportunities for micro services, as it provides a level of authentication on a personal level. But also with security for business partners, and the way they access data on the go. It also applies with mass transit, in asset management and tracking assets across geographies — again, it’s providing a level of security with that vehicle or with that fleet as it goes from point A to point B.
Question from Anthony Caruana, freelance journalist:
I kind of want to challenge the premise that you’re actually coming from about the importance of the edge. I’m listening to some of these use cases. For example, if I take the one that Dave threw in before about micro services, and you know, a blood pressure monitor talking to a device that stopped him from getting into his car and creating an accident. That’s not an edge case, that’s my smartphone talking to my car. That doesn’t need to go anywhere beyond the device on my wrist to my phone to my car. That doesn’t need to go out to the cloud or anywhere else. So I kind of challenge that premise. And I’m thinking even just about the one that Jeff threw in there about video surveillance, and BS talked about video surveillance. Even in those cases, the need for super low latency probably isn’t there because the vast majority of the analysis that happens on video surveillance is after the fact of a crime. It’s very rare that the good guys are scanning actively for facial recognition, for example, in real time looking for people to do bad things. They tend to use it investigatively afterwards and forensically.
First let’s be careful on how we’re defining the edge. A video surveillance camera is an endpoint; it’s not really the edge. The edge, as defined in this study, is really the branch office, for example, or when we talk about telco, the edge is the cell tower. At that edge, think about servers — enterprise hardened infrastructure that is responsible for data aggregation.
Now, you’re right, you will find endpoint devices increasingly embedding intelligence to take the load off the core. But if I want personalized services everywhere, and not just connected to one or two personal devices — if I want to anticipate your need based on your location, based on your preferences, based on the time of day, based upon your normal activity — that’s information and insight that has to be calculated somewhere in the edge. The other point is that the intelligence levels of the analysis will get pushed further and further, which drives more quantity of data and more immediately useful data, and therefore a greater need to analyze data near its source. Think of video surveillance cameras — they will get to the point where they’ll have a level of analytics so that they know what data to save in the camera, what needs to be analyzed immediately in an edge data center, and what data needs to be sent back to the core for further analysis.
I’ll comment on video surveillance, and your reference that it’s more of, “A crime has been committed, so therefore, you go back, and let’s see what’s happening on that video.” That use case of today is only part of it. In the future it’s also about new use cases that require a lot of analytics, in real time and on the spot, in order to function. Use cases like personalized services, or situational predictions that can prevent gridlock and overloading of public spaces or resources by analyzing many points of information constantly. Take an airport. Millions of travelers are coming in and — in future real-time applications — will need to be routed efficiently as well as kept safe. The data from all the activity within the airport has to be sent to be analyzed in real time and collated between passengers’ needs and airport resources, and go to some sort of a database for these analytics.
Another example is a shopping mall. Imagine going to a shopping mall, and immediately receiving services based on what your shopping habits are like. How do we get to that point? One is facial recognition or other biometrics like an eye scan or some other form of personalized identification. Now immediately the service provider can know who you are and your history. What you’ve done before and what you’ve bought, and therefore, what you’re likely to buy, can be considered in how to provide you the best service. So these are all the analytics that are going to be required, that’s why you have to have the edge infrastructure to support that.
Let me give one more example a colleague told me about recently, about what railways are doing now. Think of a station or a bridge that goes across the tracks — as the train passes through it, it’s capturing video and it’s also capturing data that’s been collected off the sensors on the train. As it’s collecting all that information, it’s able to identify the parts of the train that are starting to wear out. Now if there’s an issue, something that needs to be replaced — maybe it’s a bearing in a wheel or whatever — the system is preordering that part, so when the train stops at the next station, the parts are already there, and technicians have already been notified to come in and swap the part out. That has to have a vibrant edge to be able to accomplish that.
You prefaced your question with a comment around latency. Latency is one driver for the edge. There are also other drivers for the edge, like the size of data, the cost of moving the data. The surveillance example is a great example for why. It’s not just because there’s real time analytics — it’s also a high quantity of high-resolution video data being collected there. It’s expensive to send all that data up to a main core. So they don’t tend to always send all the data up there; they send a fraction or a subset of that data up there.
Another example: in our factories at our company, we have video cameras pointed inside our equipment. Those video cameras are watching the equipment 24/7 as the equipment runs. The systems are doing real time analytics on those video images to detect if something has changed or performed aberrantly on the tool, to react right away.
There are a variety of elements that are driving the need for storage and analytics at the edge.