Recently, the SNIA Compute, Memory, and Storage Initiative (CMSI) hosted a wide-ranging discussion on the “compute everywhere” continuum. The panel featured Chipalo Street from Microsoft, Steve Adams from Intel, and Eli Tiomkin from NGD Systems representing both the start-up environment and the SNIA Computational Storage Special Interest Group. We appreciate the many questions asked during the webcast and are pleased to answer them in this Q&A blog.
Our speakers discussed how, in modern
analytics deployments, latency is the fatal flaw that limits the efficacy of
the overall system. Solutions move at
the speed of decision, and microseconds could mean the difference between
success and failure against competitive offerings. Artificial Intelligence, Machine Learning,
and In-Memory Analytics solutions have significantly reduced latency, but the
sheer volume of data and its potential broad distribution across the globe
prevents a single analytics node from efficiently harvesting and processing
data.
Viewers asked questions on these subjects and
more. Let us know if you have any additional questions by emailing cmsi@snia.org.
And, if you have not had a chance to view the entire webcast, you can
access it in the SNIA
Educational Library.
Q1: The overlay of policy is the key to
enabling roles across distributed nodes that make “compute everywhere” an
effective strategy, correct?
A1: Yes, and there are different kinds of
applications. Examples include content distribution
or automation systems, and all of these can benefit from being able to run
anywhere in the network. This will
require significant advancements in security and trust as well.
Q2: Comment: There are app silos and
dependencies that make it difficult to move away from a centralized IT
design. There’s an aspect of write-once,
run-everywhere that needs to be addressed.
A2: This gets to the often-asked question about the differences between centralized and distributed computing. It really comes down to the ability to run common code anywhere, which enables digital transformation. By driving both centralized and edge products, the concept of compute everywhere can really come to life.
Q3: Comment: There are app silos and app
dependencies, for instance three-tier apps, that make it difficult to move away
from centralized consolidated IT design.
What are the implications of this?
A3: Data silos within a single tenant, and data silos that cross tenants, need to be broken down. The ability to share data in a secure fashion allows a global view that yields better results. Many companies view data like oil; it is where their value lies. There needs to be an ability to grant, and then revoke, access to data. The opportunity for companies is to gain insight from their own data first, and then to share and access other shared data to develop additional insight. We had a lively discussion on how companies could take advantage of this. Emerging technologies that automate the anonymization or de-identification of data should facilitate more sharing.
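As a purely illustrative sketch of that last point, de-identification can be as simple as dropping or hashing direct identifiers before a record leaves its owner; the field names and salting scheme below are hypothetical and are not something discussed in the webcast:

```python
import hashlib

# Hypothetical example: hash direct identifiers before sharing a record.
# Field names and the salt are illustrative only, not from the webcast.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}
SALT = "replace-with-a-per-tenant-secret"

def deidentify(record: dict) -> dict:
    """Return a copy of the record that is safer to share across tenants."""
    shared = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            # Keep a stable pseudonym so records can still be joined later.
            shared[key] = hashlib.sha256((SALT + str(value)).encode()).hexdigest()[:16]
        else:
            shared[key] = value
    return shared

print(deidentify({"name": "Alice", "email": "a@example.com", "purchase": 42.50}))
```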
Q4: Comment: The application may run on the
edge, but the database is on-prem. But
that’s changing, and the ability to run the data analytics anywhere is the
significant change. Compute resources
are available across the spectrum in the network and storage systems. There is still a need for centralized compute
resources, but the decisions will eventually be distributed. This is true not only inside a single
company, but across the corporate boundary.
A4: You have the programming paradigm of write-once, run-everywhere. You can also expose products and data. The concept of data gravity may apply to regulatory considerations as well as to sheer data size.
Q5: There’s the concept of geo-fencing from a storage perspective, but does that also apply from a routing perspective?
A5: There are actually requirements such as
GDPR in Europe that define how certain data can be routed. What’s interesting is that the same kind of
technology that allows network infrastructure to route data can also be used to
help inform how data should flow. This
is not just to avoid obstacles, but also to route data where it will eventually
need to be collected in order to facilitate machine learning and queries
against streaming data, especially where streaming data aggregates.
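As a minimal sketch of how a residency policy could inform routing decisions, consider the toy policy table below; the region names and data classes are assumptions for illustration, not anything prescribed by GDPR or discussed in the webcast:

```python
# Minimal sketch: a residency policy informing where a data object may be routed.
# Region names and the policy table are illustrative assumptions.
RESIDENCY_POLICY = {
    "eu-personal": {"eu-west", "eu-central"},   # e.g. GDPR-scoped data stays in the EU
    "telemetry":   {"eu-west", "us-east", "apac"},
}

def allowed_destinations(data_class: str, candidate_nodes: list[str]) -> list[str]:
    """Filter candidate destinations down to those the policy permits."""
    permitted = RESIDENCY_POLICY.get(data_class, set())
    return [node for node in candidate_nodes if node in permitted]

print(allowed_destinations("eu-personal", ["us-east", "eu-west", "apac"]))  # ['eu-west']
```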
Q6: Eli Tiomkin introduced the concept of
computational storage. The comment was
made that moving compute to the storage node makes it possible to take an analytics model and distribute it across the entire network.
A6: As data volumes become vast, the ability to gain insight without forcing continuous data movement will enable new types of applications and deployments.
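To illustrate why this matters, here is a small sketch contrasting host-side filtering with pushing the filter down to where the data lives; the StorageNode class is a stand-in for the idea, not a real computational storage API:

```python
# Illustrative contrast between host-side filtering and pushing the filter
# down to the storage node. StorageNode is a stand-in, not a real device API.
class StorageNode:
    def __init__(self, records):
        self._records = records

    def read_all(self):                 # traditional model: move every record to the host
        return list(self._records)

    def query(self, predicate):         # computational-storage model: filter in place
        return [r for r in self._records if predicate(r)]

node = StorageNode([{"temp": t} for t in range(1_000_000)])

host_side = [r for r in node.read_all() if r["temp"] > 999_995]   # moves 1M records
pushed_down = node.query(lambda r: r["temp"] > 999_995)           # moves only 4 records
assert host_side == pushed_down
```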
Q7: When do you make the decision to keep
the data on-prem and bring the analytics to the data store rather than take the
data to the service itself? Or what are
the keys to making the decision to keep the data on your premises instead of
moving it to a centralized database?
When would you want to do one vs. the other?
A7: The reason the data should be processed at the edge is that it is easier to compare results to new data as it is aggregated at the source. There are latency implications in moving the data to the cloud to make all the decisions, and processing at the edge also avoids excess data movement. In addition to data gravity considerations, there might be regulatory barriers. Additionally, some of the
decisions that customers are expecting to make might have to scale to a metro
area. An example would be using retail
data to influence digital signage. We
provided several other examples in the discussion.
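As a rough sketch of how those trade-offs might be weighed, the function below folds latency, data size, and residency into a single decision; the thresholds and factors are illustrative assumptions only, not rules from the webcast:

```python
# Rough sketch of the edge-vs-cloud decision. Thresholds and factors are
# illustrative assumptions; real deployments weigh many more variables.
def process_at_edge(latency_budget_ms: float,
                    dataset_size_gb: float,
                    residency_restricted: bool,
                    uplink_gbps: float) -> bool:
    if residency_restricted:
        return True                         # regulation keeps the data on premises
    transfer_time_s = dataset_size_gb * 8 / uplink_gbps
    if transfer_time_s * 1000 > latency_budget_ms:
        return True                         # moving the data would blow the latency budget
    return False                            # otherwise, centralizing may be fine

# e.g. 50 GB of fresh telemetry, a 200 ms decision window, a 10 Gbps uplink
print(process_at_edge(200, 50, False, 10))  # True: it would take ~40 s to move the data
```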
Q8: “Routing” traditionally means that data
needs to be moved from one point to the next as fast as possible. But perhaps intelligent routing can be used
to make more deliberate decisions about when and where to move and secure
data. What are the implications of this?
A8: What it really represents is that data has
different value at different times, and also at different locations. Being able to distribute data is not just an
act of networking, but also an act of balancing the processing required to gain
the most insight. There's a real need for orchestration to be available to all nodes in the deployment so that resources are used to best effect.
Q9: It seems like the simple answer is to
compute at the edge and store in the cloud.
Is this true?
A9: It really depends on what you want to
store and where you need to store it.
You might find your insight immediately, or you might have to store that
data for a while due to audit considerations, or because the sought-after
insight is a trend line from streaming sources. So likely, a cache of data is
needed at the edge. It depends on the
type of application and the importance of the data. When you’re improving your
training models, the complexity of the model will dictate where you can
economically process the data. So the
simple answer might not always apply. An example would be a huge cache of data at the edge with an archive/data lake in the cloud. For instance, consider the customer support arm of a cellular network operator with a dashboard indicating outages, congestion, and trending faults in order to address a customer who is complaining of poor service. The need to quickly determine whether the problem is the phone, a base station, or the network itself drives the need to have compute and storage distributed everywhere. Large cellular networks produce 100+ terabytes of data a day in telemetry, logging, and event data. Both maintaining the dashboard and running the larger analytics tasks for predictive maintenance require a distributed approach.
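A quick back-of-envelope calculation shows why shipping all of that telemetry to one central site is unattractive; it uses only the 100 TB/day figure from the answer plus unit conversion:

```python
# Back-of-envelope: sustained bandwidth needed to move 100 TB of telemetry per day.
terabytes_per_day = 100
bits = terabytes_per_day * 1e12 * 8          # decimal terabytes to bits
seconds_per_day = 24 * 60 * 60
gbps = bits / seconds_per_day / 1e9
print(f"{gbps:.1f} Gbit/s sustained")        # roughly 9.3 Gbit/s, around the clock
```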
Q10: How can you move cloud services like AI/ML to on-prem, when on-prem might have a large database? Many of the applications depend on the
database and it might be difficult to move the application to the edge when the
data is on-prem.
A10: The real question is where you run your
compute. You need a large dataset to
train an AI model, and you’ll need a large processing center to do that. But once you have the model, you can run the
data through the model anywhere, and you might get different insight based on
the timeliness of the decision needed.
That does not necessarily mean you can throw away the data at that point. There's a need to continue to augment the data store and make new decisions based on the new data.
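Here is a minimal sketch of that split using only the Python standard library; the "model" is a deliberately trivial anomaly threshold standing in for whatever is actually trained against the large central dataset:

```python
import json, statistics

# --- Central site: train on the full dataset, export only the small model ---
# The "model" here is deliberately trivial (an anomaly threshold); it stands in
# for whatever AI/ML model is trained against the large on-prem/cloud dataset.
historical_latencies_ms = [12, 15, 11, 14, 13, 90, 12, 16, 13, 14]
model = {
    "mean": statistics.mean(historical_latencies_ms),
    "stdev": statistics.stdev(historical_latencies_ms),
}
exported = json.dumps(model)                 # a few bytes travel, not the dataset

# --- Edge site: load the model and score new data as it arrives locally ---
def is_anomalous(sample_ms: float, m: dict, k: float = 2.0) -> bool:
    return abs(sample_ms - m["mean"]) > k * m["stdev"]

edge_model = json.loads(exported)
print(is_anomalous(95.0, edge_model))        # True: flag it without shipping raw data
```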
Q11: So how would the architecture change
as a result?
A11: Compute everywhere implies that the old client-server model is expanding: compute capability needs to be coordinated across compute/store/move resources in the end device, on-premises infrastructure, local IT, metro or network edge compute resources, zones of compute, and the cloud. Compute everywhere means client to client and server to server, peers of servers and tiers of servers. Cloud gaming is an early example of compute everywhere: gaming PCs and gaming consoles interact in peer-to-peer fashion while simultaneously interacting with edge and cloud gaming servers, each interacting within its own tiers and peers. AI, like gaming, is becoming a distributed function that drives demand for compute everywhere, and just like gaming, some AI functions are best done in or close to the end device, others nearby, and still others further away in highly centralized locations.
Q12: Outside of a business partnership or
relationship, what are other cases where users would generally agree to share
data?
A12: As we’ve seen trends change due to the
current pandemic, there are many cities and municipalities that would like to
keep some of the benefits of reduced travel and traffic. There’s an opportunity to share data on
building automation, traffic control, coordination of office and work
schedules, and many other areas that might benefit from shared data. There are many other examples that might also
apply. Public agencies in some geographies are, or will be, mandated to share the data they collect. We should anticipate that some government statistical data will be available by subscription, just like a news feed.
Q13: Efficient interactions among
datacenters and nodes might be important for the decisions we need to make for
future compute and storage. How could
real-time interactions affect latency?
A13: The ability to move the compute to the data could significantly reduce the latency of decision-making. We should see more real-time and near-real-time decisions being made simultaneously through a network of edge clusters. Distributed problems, like dynamically managing traffic systems across a large metro area, will leverage distributed compute and storage edge clusters to adjust metered on-ramps, stop lights, and traffic signage in near real-time. Imagine what kinds of apps and services will emerge if insights can be shared near-instantaneously between edge compute clusters. Put succinctly, some distributed problems, especially those exposed in streaming data from people and things, will require distributed processing operating in a coordinated way in order to resolve.
Q14: Whose dog barked at the end of the talk?
A14: That would be Jim’s dog valiantly
defending the household from encroaching squirrels.
Q15: Will there be more discussions on this
topic?
A15: Well, if you’d like to hear more, let us
at SNIA know and we’ll find more great discussion topics on compute everywhere.