
Reflections on Computational Storage

SNIAOnStorage

Jan 4, 2021


As a year we will never forget drew to a close, SNIA on Storage sat down (virtually of course!) with Computational Storage Technical Work Group Co-Chairs Jason Molgaard of Arm and Scott Shadley of NGD Systems and Computational Storage Special Interest Group Chair Eli Tiomkin of NGD Systems to take the pulse of 2020 and anticipate 2021 computational storage advances.

SNIA On Storage (SOS): Jason, Scott, and Eli, thanks for taking the time to chat. Where was computational storage 12 months ago and how did it progress in 2020?

Scott Shadley (SS): The industry launched the computational storage effort in late 2018, so 2019 was a year of early education, building understanding of the technology concepts to encourage the “ask” for computational storage. All new technology takes time to develop, so 2020 saw the beginning of implementation and growth, with customer solutions starting to be publicized and multiple vendors beginning to promote their offerings.

Jason Molgaard (JM): I agree. In 2019 the question was, “What is computational storage?” and some believed it might never happen. By early 2020, we saw much more interest in and understanding of what computational storage was and how it could play a role in product development and deployment.

Eli Tiomkin (ET): SNIA established the Computational Storage Special Interest Group in early 2020 as a great way to start to spread the word and make people aware of how compute could meet storage. As the year progressed, more players joined the market with devices that offered viable solutions and SNIA gained more members interested in contributing to the growth of this technology. 

SS:  We really saw the launch of the computational storage market in 2020 with multiple solutions of merit and also third-party industry analysts and experts writing on the technology.  The Computational Storage Technical Work Group, launched in 2019, brought 45+ companies together to begin to craft a standard for computational storage architectures and a programming model. In 2020 that effort branched out to other standards groups like NVM Express to propel standards even further.

JM:  Now, nearing the end of 2020, everyone has some vested interest in computational storage.

SOS:  Who are some of the “everyones” who have a vested interest?

JM: First interest is from the developers, who are looking at “What should I make?” and “How does it work?” They are seeing the interest driven by the knowledge gained by customers and prospects. Users acquire devices and ask, “How will I use this?” and “Where will it give me benefits in my data center?” They are interested in how they can use computational storage implementations in their industry for their own purposes.

SS:  Computational storage at the end of 2020 is no longer simply a concept discussed at the CTO level as a forward-looking implementation but is now getting into the business units and those doing the real work.  That is the key – moving from R&D to the market by way of the business unit.

SOS:  Is this because users are understanding the hows and whys of compute moving closer to storage?

SS:  SNIA has done a huge amount of work this year to make computational storage visible and the connection between compute and storage understandable with outbound publicity around the technology and the weight it carries.  SNIA drove folks to pay attention, and the industry has responded making sure computational storage is on customer roadmaps.

ET:  SNIA’s 2020 activity to make computational storage noticeable has gotten results.  Our 2021 goal in the SIG is to take everything we did in 2020 and multiply it two to three times to draw even more attention to computational storage’s benefits for cloud, edge storage, and networking.  We want to make users always consider computational storage to solve problems and make outcomes more efficient.  We will be increasing the SIG’s identification and education on computational storage real world deployments in 2021 with videos, demonstrations, and developer bootcamps.

SOS: Thinking good things for the future, where do you see computational storage in five years?

SS: I see computational storage where persistent memory is today or even more advanced, with more opportunities and more deployments. By 2025, 10% of all solid state drives could be computational storage based.

JM: I agree with the 10%, and it could even be more, looking at the kinds of industries that will see more widespread adoption. There will be higher adoption in endpoint applications, as it is an easy way to add a lot of compute to existing storage. Data centers will also be clear winners, though some players there may be more reluctant to adopt computational storage.

SS: I see an emerging growth market for data storage at the edge, where the problem is to move data from the edge to some core location: cloud, on premises, etc. The ability to put computational storage at the end point, the edge, gives SNIA the perfect opportunity to engage the industry and educate on where the technology will find its success as compared to the core data center.

ET: I will second that, and as the edge evolves and grows, computational storage will be a natural selection for storage and compute at the edge. I would even say that if the data center hyperscalers were starting today from a technology point of view, we would have seen computational storage deployed in most data center infrastructures. But getting into existing infrastructure and changing how compute and storage work today is difficult, so for now we may be playing within some existing swim lanes. However, as the edge evolves, it will have a natural tendency to go with NVMe SSDs, with computational storage a perfect fit for edge applications.

SOS:  Any further thoughts?

SS: We at SNIA are very bullish on computational storage but have to be cautiously optimistic. We are not saying this has to happen, but rather that we at SNIA, in the Technical Work Group and Special Interest Group, can make it happen by how well we work as an organization in the industry with real customers who will deploy computational storage to drive success in the market. SNIA is well versed in this new architecture and can help others understand that it is not scary but safe. SNIA can provide that support to drive the technology.

SOS:  I have always been impressed by the cross-vendor cooperation and collaboration of SNIA members in putting technology forward and advancing standards and education in the marketplace.

SS: It is a great effort so let’s have some fun and make 2021 the year of computational storage deployments!  If you are interested, come join us!


Video Analytics Questions Answered

Jim Fister

Dec 15, 2020
There is a new wave of cognitive services based on video and image analytics, leveraging the latest in machine learning and deep learning. In a recent SNIA Cloud Storage Technologies Initiative (CSTI) webcast, “How Video Analytics is Changing the Way We Store Video,” we looked at some of the benefits and factors driving this adoption, as well as explored compelling projects and required components for a successful video-based cognitive service. This included some great work being done in the open source community. In the course of the presentation, there were several discussion points and questions that arose. Our SNIA presenters, Glyn Bowden from HPE and Kevin Cone from Intel, provide the answers.

Q. The material mentioned an open source project for video analytics several times. Is that available for everyone to view and contribute?

A. Absolutely. The Open Visual Cloud Project (OVCP) is located on GitHub at https://github.com/OpenVisualCloud. Contributions are welcome, and there are a significant number of contributors already involved in the project. There were several examples of the versatility of OVCP, and it was noted how extensible the project could be with the addition of new tools and models.

Q. Glyn talked about some old video platforms. Did people really capture video like that? Did the dinosaurs roam the tape archives back in those days?

A. Ha! Glyn would like everyone to know that while that was the way video used to be captured, he was not around during the time of the dinosaurs.

Q. Is there a reason to digitize and store old video of such poor quality?

A. Glyn demonstrated how much of this video can still be valuable, but he also discussed how it was difficult to capture and index. Clearly, there are significant storage implications in digitizing too much old video, though cloud storage certainly provides a variety of solutions.

Q. There was a good example of video analytics in smart cities. Is there a role for computational storage in this type of application?

A. Not only is there a role for computational storage, there’s a significant need for smart networking. Kevin and Glyn provided some cases where the network might do local analytics. In fact, there was a recent SNIA webcast, “Compute Everywhere: How Storage and Networking Expand the Compute Continuum,” that discussed some aspects of the edge and cloud interaction.

Q. There was a good discussion on governance of video data. One discussion point was around the use of video in public safety and law enforcement. Would it be the case that smart city video might also be useful as a legal tool, and would have different retention rules as a result? Are there other examples of something like this?

A. There are a variety of rules on archiving and retention of data that may be used in public safety. This is a pretty fluid area. Another example would be videos where children are present, as there are significant privacy issues. The EU leads the legislative efforts in this area, and it has a number of rules and guidelines that are outlined here.

Q. Digital camera pickups have the ability to see beyond the human visual spectrum. Are there uses for video analytics in the IR and UV spaces?

A. Kevin mentioned the use of IR as an indicator for remote temperature monitoring. Glyn said that this might also be an example of legal hazards, where there could be a violation of health protections. So, governance is likely to play a role in this area.

Q. What are some differences between analytics and storage of video at the edge and storage in the cloud or data center?

A. Video storage at the edge is likely a temporary thing. It could be stored there or analyzed there to reduce the latency of decision-making. Or it’s possible that it would be analyzed to determine how much of the video should be permanently archived. Cloud storage is more permanent, and analytics in the cloud is more likely to generate metadata that would be used to make policy at the edge.
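To make that edge-versus-cloud split concrete, here is a rough illustrative sketch in Python. It is not from the webcast; the clip structure, the scoring, and the threshold are invented for the example. It shows the kind of policy an edge node might apply: keep low-interest clips in a short-lived local cache for low-latency decisions, and push only high-interest clips to the permanent cloud archive where they can later feed model retraining.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    score: float  # interest score from a local analytics model, 0.0 - 1.0

def route_clip(clip: Clip, archive_threshold: float = 0.8) -> str:
    """Decide where a clip should live after local analysis.

    Hypothetical policy: clips the edge model finds interesting go to the
    permanent cloud archive; the rest stay in a short-lived edge cache that
    only supports low-latency local decisions and then ages out.
    """
    return "cloud-archive" if clip.score >= archive_threshold else "edge-cache"

if __name__ == "__main__":
    for clip in (Clip("cam01-0001", 0.93), Clip("cam01-0002", 0.12)):
        print(clip.clip_id, "->", route_clip(clip))
```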


Questions on Securing Data in Transit Answered

Alex McDonald

Dec 9, 2020

Data in transit provides a large attack surface for bad actors. Keeping data secure from threats and compromise while it’s being transmitted was the topic at our live SNIA Networking Storage Forum (NSF) webcast, Securing Data in Transit. Our presenters, Claudio DeSanti, Ariel Kit, Cesar Obediente, and Brandon Hoff, did an excellent job explaining how to mitigate risks. We had several questions during the live event. Our panel of speakers has been kind enough to answer them here.

Q. Could we control the most important point – identity? That is, every data transport would need an identity label, so that we can control anomalies and misbehaviors easily.

A. That is the purpose of every authentication protocol: verify the identity of entities participating in the authentication protocol on the basis of some secret values or certificates associated with the involved entity. This is similar to verifying the identity of a person on the basis of an identity document associated with the person.

Q. What is BGP?

A. BGP stands for Border Gateway Protocol. It is a popular routing protocol commonly used across the Internet but also leveraged by many customers in their environments. BGP is used to exchange routing information and next-hop reachability between network devices (routers, switches, firewalls, etc.). In order to establish this communication among the neighbors, BGP creates a TCP session on port 179 to maintain and exchange BGP updates.

Q. What are ‘north-south’ and ‘east-west’ channels?

A. Traditionally, “north-south” is traffic up and down the application or solution “stack,” such as from client to/from server, Internet to/from applications, application to/from database, application to/from storage, etc. East-west is traffic between similar nodes, often peers in a distributed application or distributed storage cluster. For example, east-west could include traffic from client to client, between distributed database server nodes, between clustered storage nodes, between hyperconverged infrastructure nodes, etc.

Q. If I use encryption for data in transit, do I still need a separate encryption solution for data at rest?

A. The encryption of data in transit protects the data as it flows through the network and blocks attack types such as eavesdropping; however, once it arrives at the target the data is decrypted and saved to the storage unencrypted unless data-at-rest encryption is applied. It is highly recommended to use both for best protection; data-at-rest protection protects the data in case the storage target is accessed by an attacker. The SNIA NSF did a deep dive on this topic in a separate webcast, “Storage Networking Security Series: Protecting Data at Rest.”

Q. Will NVMe-oF™ use three different encryption solutions depending upon whether it’s running over Fibre Channel, RDMA, or IP?

A. When referring to data in transit, the encryption type depends on the network type. Hence, for different networks we will use different data-in-motion encryption protocols; nevertheless, they can all be based on Encapsulating Security Protocol (ESP) with the same cipher suites and key exchange methods.

Q. Can NVMe-oF over IP already use Transport Layer Security (TLS) for encryption or is this still a work in progress? Is the NVMe-oF spec aware of TLS?

A. NVMe-oF over TCP already supports TLS 1.2. The NVM Express Technical Proposal TP 8011 is adding support for TLS 1.3.

Q. Are there cases where I would want to use both MACsec and IPsec, or use both IPsec and TLS? Does CloudSec rely on either MACsec or IPsec?

A. Because of the number of cyber-attacks that are currently happening on a daily basis, it is always critical to create a secure environment in order to protect the confidentiality and integrity of the data. MACsec is enabled on a point-to-point Ethernet link, and IPsec could be classified as end-to-end (application-to-application or router-to-router). Essentially you could (and should) leverage both technologies to provide the best encryption possible to the application. These technologies can co-exist with each other without any problem. The same can be said if the application is leveraging TLS. To add an extra layer of security you can implement IPsec, for example a site-to-site IPsec VPN. This is true especially if the communication is leveraging the Internet. CloudSec, on the other hand, doesn’t rely on MACsec, because MACsec is a point-to-point Ethernet link technology, while CloudSec provides the transport and encryption mechanism to support multi-site encrypted communication. This is useful where more than one data center is required to provide an encryption mechanism to protect the confidentiality and integrity of the data. The CloudSec session is a point-to-point encryption over Data Center Interconnect on two or more sites. CloudSec key exchange uses BGP to guarantee the correct information gets delivered to the participating devices.

Q. Does FC-SP-2 require support from both HBAs and switches, or only from the HBAs?

A. For data that moves outside the data center, Fibre Channel Security Protocols (FC-SP-2) for Fibre Channel or IPsec for IP would need to be supported by the switches or routers. No support would be required in the HBA. This is the most common use case for FC-SP-2. Theoretically, if you wanted to support FC-SP-2 inside the secure walls of the data center, you could deploy end-to-end or HBA-to-HBA encryption and you won’t need support in the switches. Unfortunately, this breaks some switch features since information the switch relies on would be hidden. You could also do link encryption from the HBA to the switch, and this would require HBA and switch support. Unfortunately, there are no commercially available HBAs with FC-SP-2 support today, and if they become available, interoperability will need to be proven. This webcast from the Fibre Channel Industry Association (FCIA) goes into more detail on Fibre Channel security.

Q. Does FC-SP-2 key management require a centralized key management server or is that optional?

A. For switch-to-switch encryption, keys can be managed through a centralized server or manually. Other solutions are available and in production today. For HBAs, in most environments there would be thousands of keys to manage, so a centralized key management solution would be required, and FC-SP provides five different options. Today, there are no supported key management solutions for FC-SP-2 from SUSE, RedHat, VMware, Windows, etc., and there are no commercially available HBAs that support FC-SP-2.

This webcast was part of our Storage Networking Security Webcast Series, and all of the webcasts are available on demand. I encourage you to take a look at the other SNIA educational webcasts from this series.
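As a concrete, if simplified, illustration of transport-layer protection for data in transit, the sketch below uses Python's standard ssl module to wrap a TCP connection in TLS 1.3 before any payload is sent. It is not an NVMe-oF or storage-specific implementation; the host name, port, and payload are placeholders, and a real deployment would follow the negotiated cipher suites and key-exchange requirements discussed above.

```python
import socket
import ssl

def send_over_tls(host: str, port: int, payload: bytes) -> bytes:
    """Open a TCP connection, upgrade it to TLS 1.3, send a payload, return the reply."""
    context = ssl.create_default_context()            # verifies the server certificate chain
    context.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older than TLS 1.3

    with socket.create_connection((host, port)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            print("negotiated:", tls_sock.version(), tls_sock.cipher())
            tls_sock.sendall(payload)                 # data is encrypted on the wire from here on
            return tls_sock.recv(4096)

if __name__ == "__main__":
    # Placeholder endpoint; substitute any TLS-enabled service to try this out.
    reply = send_over_tls("example.com", 443,
                          b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
    print(reply[:80])
```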


Compute Everywhere – Your Questions Answered

Jim Fister

Dec 7, 2020

Recently, the SNIA Compute, Memory, and Storage Initiative (CMSI) hosted a wide-ranging discussion on the “compute everywhere” continuum. The panel featured Chipalo Street from Microsoft, Steve Adams from Intel, and Eli Tiomkin from NGD Systems, representing both the start-up environment and the SNIA Computational Storage Special Interest Group. We appreciate the many questions asked during the webcast and are pleased to answer them in this Q&A blog.

Our speakers discussed how, in modern analytics deployments, latency is the fatal flaw that limits the efficacy of the overall system. Solutions move at the speed of decision, and microseconds could mean the difference between success and failure against competitive offerings. Artificial Intelligence, Machine Learning, and In-Memory Analytics solutions have significantly reduced latency, but the sheer volume of data and its potential broad distribution across the globe prevents a single analytics node from efficiently harvesting and processing data. Viewers asked questions on these subjects and more. Let us know if you have any additional questions by emailing cmsi@snia.org. And, if you have not had a chance to view the entire webcast, you can access it in the SNIA Educational Library.

Q1: The overlay of policy is the key to enabling roles across distributed nodes that make “compute everywhere” an effective strategy, correct?

A1: Yes, and there are different kinds of applications. Examples include content distribution or automation systems, and all of these can benefit from being able to run anywhere in the network. This will require significant advancements in security and trust as well.

Q2: Comment: There are app silos and dependencies that make it difficult to move away from a centralized IT design. There’s an aspect of write-once, run-everywhere that needs to be addressed.

A2: This comes to the often-asked question on the differences between centralized and distributed computing. It really comes down to the ability to run common code anywhere, which allows digital transformation. By driving both centralized and edge products, the concept of compute everywhere can really come to life.

Q3: Comment: There are app silos and app dependencies, for instance three-tier apps, that make it difficult to move away from a centralized, consolidated IT design. What are the implications of this?

A3: Data silos within a single tenant, and data silos that cross tenants, need to be broken down. The ability to share data in a secure fashion allows a global look to get results. Many companies view data like oil; it’s their value. There needs to be an ability to grant and then revoke access to data. The opportunity for companies is to get insight from their own data first, but then to share and access other shared data to develop additional insight. We had a lively discussion on how companies could take advantage of this. Emerging technologies to automate the process of anonymizing or de-identifying data should facilitate more sharing of data.

Q4: Comment: The application may run on the edge, but the database is on-prem. But that’s changing, and the ability to run the data analytics anywhere is the significant change. Compute resources are available across the spectrum in the network and storage systems. There is still need for centralized compute resources, but the decisions will eventually be distributed. This is true not only inside a single company, but across the corporate boundary.

A4: You have the programming paradigm to write once, run everywhere. You can also expose products and data. The concept of data gravity might apply to regulatory as well as just size considerations.

Q5: There’s the concept of geo-fencing from a storage perspective, but is that also relevant from a routing perspective?

A5: There are actually requirements such as GDPR in Europe that define how certain data can be routed. What’s interesting is that the same kind of technology that allows network infrastructure to route data can also be used to help inform how data should flow. This is not just to avoid obstacles, but also to route data where it will eventually need to be collected in order to facilitate machine learning and queries against streaming data, especially where streaming data aggregates.

Q6: Eli Tiomkin introduced the concept of computational storage. The comment was made that moving compute to the storage node enables the ability to take an analytics model and distribute it across the entire network.

A6: As data becomes vast, the ability to gain insight without forcing continuous data movement will enable new types of applications and deployments to occur.

Q7: When do you make the decision to keep the data on-prem and bring the analytics to the data store rather than take the data to the service itself? Or what are the keys to making the decision to keep the data on your premises instead of moving it to a centralized database? When would you want to do one vs. the other?

A7: The reason the data should be processed on the edge is because it’s easier to compare the results to new data as it’s aggregated at the source. There are latency implications of moving the data to the cloud to make all the decisions, and processing at the edge also avoids excess data movement. In addition to data gravity considerations there might be regulation barriers. Additionally, some of the decisions that customers are expecting to make might have to scale to a metro area. An example would be using retail data to influence digital signage. We provided several other examples in the discussion.

Q8: “Routing” traditionally means that data needs to be moved from one point to the next as fast as possible. But perhaps intelligent routing can be used to make more deliberate decisions on when and where to move and secure data. What are the implications of this?

A8: What it really represents is that data has different value at different times, and also at different locations. Being able to distribute data is not just an act of networking, but also an act of balancing the processing required to gain the most insight. There’s a real need for orchestration to be available to all nodes in the deployment to best effect.

Q9: It seems like the simple answer is to compute at the edge and store in the cloud. Is this true?

A9: It really depends on what you want to store and where you need to store it. You might find your insight immediately, or you might have to store that data for a while due to audit considerations, or because the sought-after insight is a trend line from streaming sources. So likely, a cache of data is needed at the edge. It depends on the type of application and the importance of the data. When you’re improving your training models, the complexity of the model will dictate where you can economically process the data. So the simple answer might not always apply. An example would be where there is a huge cache of data at the edge but an archive/data lake in the cloud. For instance, consider the customer support arm of a cellular network with a dashboard indicating outages, congestion, and trending faults in order to address a customer who is complaining of poor service. The need to quickly determine whether the problem is their phone, a base station, or the network itself drives the need to have compute and storage distributed everywhere. Large cellular networks produce 100+ terabytes of data a day in telemetry, logging, and event data. Both maintaining the dashboard and the larger analytics tasks for predictive maintenance require a distributed approach.

Q10: How can you move cloud services like AI/ML to on-prem, when on-prem might have a large database? Many of the applications depend on the database, and it might be difficult to move the application to the edge when the data is on-prem.

A10: The real question is where you run your compute. You need a large dataset to train an AI model, and you’ll need a large processing center to do that. But once you have the model, you can run the data through the model anywhere, and you might get different insight based on the timeliness of the decision needed. That might not mean that you can throw away the data at that point. There’s a need to continue to augment the data store and make new decisions based on the new data.

Q11: So how would the architecture change as a result?

A11: Compute everywhere implies that the old client-server model is expanding to suggest that compute capability needs to be coordinated between compute/store/move capabilities in the end device, on-premises infrastructure, local IT, metro or network edge compute resources, zones of compute, and the cloud. Compute everywhere means client to client and server to server, peers of servers and tiers of servers. Cloud gaming is an early example of compute everywhere: gaming PCs and gaming consoles interact in peer-to-peer fashion while simultaneously interacting with edge and cloud gaming servers, each interacting within its tiers and peers. AI is becoming a distributed function like gaming, driving demand for compute everywhere, and just like gaming, some AI functions are best done in or close to the end device, others nearby, and still others further away in highly centralized locations.

Q12: Outside of a business partnership or relationship, what are other cases where users would generally agree to share data?

A12: As we’ve seen trends change due to the current pandemic, there are many cities and municipalities that would like to keep some of the benefits of reduced travel and traffic. There’s an opportunity to share data on building automation, traffic control, coordination of office and work schedules, and many other areas that might benefit from shared data. There are many other examples that might also apply. Public agencies, in some geographies, are or will be mandated to share their collected data. We should anticipate that some government statistical data will be available by subscription, just like a news feed.

Q13: Efficient interactions among data centers and nodes might be important for the decisions we need to make for future compute and storage. How could real-time interactions affect latency?

A13: The ability to move the compute to the data could significantly reduce the latency of decision-making. We should see more real-time and near-real-time decisions being made simultaneously through a network of edge clusters. Distributed problems, like dynamically managing traffic systems across a large metro area, will leverage distributed compute and storage edge clusters to adjust metered on-ramps, stop lights, and traffic signage in near real time. Imagine what kinds of apps and services will emerge if insights can be shared near instantaneously between edge compute clusters. Put succinctly, some distributed problems, especially those exposed in streaming data from people and things, will require distributed processing operating in a coordinated way in order to be resolved.

Q14: Whose dog barked at the end of the talk?

A14: That would be Jim’s dog, valiantly defending the household from encroaching squirrels.

Q15: Will there be more discussions on this topic?

A15: Well, if you’d like to hear more, let us at SNIA know and we’ll find more great discussion topics on compute everywhere.


Implications of Internet of Payments Q&A

Jim Fister

Dec 7, 2020

Internet of Payments (IoP) enables payment processing over many kinds of IoT devices and has also led to the emergence of the micro-transaction. It’s an area of rapid growth. Recently, the SNIA Cloud Storage Technologies Initiative (CSTI) hosted a live webcast, Technology Implications of Internet of Payments. The talk was hosted by me, with Glyn Bowden from HPE and Richard George from HLPS providing expert insight. In the course of the conversation, several comments and questions arose, and they are summarized here. Feel free to view the entire discussion and provide us with feedback on the talk. We are also always interested in topics you’d like to see us cover in the future.

Q. When considering digitization of assets, currency is not locked to a solid standard. That is, digital currencies are not based on specific physical assets, and many new digital currencies are therefore unstable. But the proposition here is that they would be more secure because they’re locked to real physical value. Is that correct?

A. There is significant volatility in digital assets, mostly because of speculation. Being able to, say, fractionalize your home into digital assets stabilizes the specific currency and creates value that is locked to the growing value of the asset itself. The physical asset can be locked to a fiat currency.

Q. Comment: The fact that the currency is digital means that it can be shared on a currency exchange. The example used was that assets in the game Fortnite can be bought and sold on eBay.

A. Yes, exactly, this is a new way to create a wealth of exchanges. The assets themselves are readily exchanged, and this gets us back to a more traditional bartering of desired goods and services that has been extant for centuries.

Q. Is this the real opportunity to move back to a barter and exchange system? You can value your own assets and bargain them for other assets?

A. Absolutely. This is a way for people and organizations to generate value that they see for their assets. There is an opportunity to make liquid approximately $250T of assets.

Q. The reach of these assets is astounding. Will this change global micro-lending based on the real assets that global start-up businesses own or create?

A. Yes. This is a peer-to-peer investment and exchange, and it can be opened up to hundreds of thousands of assets and individuals. These peer-to-peer transactions will impact both individuals and governments, and create significant efficiencies in the trading of value.

Q. How does trust impact currency? In the digital environment, a loss of trust would essentially make the specific currency valueless rapidly.

A. Yes, there’s significant need for technology to establish a common trust model, and for all parties in the transaction to commit to it. Richard and Glyn provided a great example in the presentation, so make sure you watch it to see. Glyn also provided a high-level architecture that could authenticate the transaction.

Q. The credentialing process and the creation of storage repositories is a way to create trust in the currency. The example of a third-world farmer lets the farmer create a true chain of trust that can be used by large global entities to establish value. So, can the final customer be assured that the asset came from not only a trusted source but also from an ethical one?

A. Yes, and this will create significant value for the originating entity as well as others in the chain of value. The technology chain adds transparency to the transaction, which opens it up to public scrutiny.

Q. How does this affect cloud storage vendors that participate as part of the transaction infrastructure?

A. Vendors who process the transactions and manage the currency exchange can gain insight from both the data and the data flow of transactions.

Q. What is HLPS?

A. Health Life Prosperity Shared Ltd. is a financial technology company focused on using digital assets to help people in the UK purchase homes. Richard is an expert in digital assets and payments.


5G Streaming Questions Answered

Michael Hoard

Dec 2, 2020


The broad adoption of 5G, internet of things (IOT) and edge computing are reshaping the nature and role of enterprise and cloud storage. Preparing for this significant disruption is important. It’s a topic the SNIA Cloud Storage Technologies Initiative covered in our recent webcast “Storage Implications at the Velocity of 5G Streaming,” where my colleagues, Steve Adams and Chip Maurer, took a deep dive into the 5G journey, streaming data and real-time edge AI, 5G use cases and much more. If you missed the webcast, it’s available on-demand along with a copy of the webcast slides.

As you might expect, this discussion generated some intriguing questions. As promised during the live presentation, our experts have answered them all here.

Q. What kind of transport do you see that is going to be used for those (5G) use-cases?

A. At a high level, 5G consists of 3 primary slices: enhanced mobile broadband (eMBB), ultra-reliable low latency communications (URLLC) and massive machine type communication (mMTC). Each of these is better suited for different use cases; for example, normal smartphone usage relies on eMBB, factory robotics relies on URLLC, and intelligent device or sensor applications like farming, edge computing and IoT rely on mMTC.

The primary 5G standards-making bodies include:

  • The 3rd Generation Partnership Project (3GPP) – formulates 5G technical specifications which become 5G standards. Release 15 was the first release to define 5G implementations, and Release 16 is currently underway.
  • The Internet Engineering Task Force (IETF) partners with 3GPP on the development of 5G and new uses of the technology. Particularly, IETF develops key specifications for various functions enabling IP protocols to support network virtualization. For example, IETF is pioneering Service Function Chaining (SFC), which will link the virtualized components of the 5G architecture—such as the base station, serving gateway, and packet data gateway—into a single path. This will permit the dynamic creation and linkage of Virtual Network Functions (VNFs).
  • The International Telecommunication Union (ITU), based in Geneva, is the United Nations specialized agency focused on information and communication technologies. ITU World Radio communication conferences revise the international treaty governing the use of the radio-frequency spectrum and the geostationary and non-geostationary satellite orbits.

To learn more, see

Q. What if the data source at the Edge is not close to where the signal is good to connect to cloud? And, I wonder how these algorithm(s) / data streaming solutions should be considered?

A. When we look at a 5G applications like massive Machine Type Communications (mMTC), we expect many kinds of devices will connect only occasionally, e.g. battery-operated sensors attached to farming water sprinklers or water pumps.  Therefore, long distance, low bandwidth, sporadically connected 5G network applications will need to tolerate long stretches of no-contact without losing context or connectivity, as well as adapt to variations in signal strength and signal quality.   

Additionally, 5G supports three broad ranges of wireless frequency spectrum: Low, Mid and High. The lower frequency range provides lower bandwidth for broader or more wide area wireless coverage.  The higher frequency range provides higher bandwidth for limited area or more focused area wireless coverage. To learn more, check out The Wired Guide to 5G.

On the second part of the question regarding algorithms and data streaming solutions, we anticipate streaming IoT data from sporadically connected devices can still be treated as streaming data sources from a data ingestion standpoint. It is likely to consist of broad snapshots (pre-stipulated time windows) with potential intervals of null sets of data when compared with other types of data sources. Streaming data, regardless of the interval of data arrival, has value because of the “last known state” value versus previous interval known states. Calculation of trending data is one of the most common meaningful ways to extract value and make decisions.
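As a rough sketch of the “last known state” idea, the example below (invented for illustration, not taken from the webcast) ingests sporadically timestamped sensor readings, carries the last known value forward across connectivity gaps, and computes a simple trend over a pre-stipulated window.

```python
from bisect import bisect_right

def last_known_state(readings, t):
    """Return the most recent reading at or before time t, or None if no data yet.

    `readings` is a list of (timestamp, value) pairs sorted by timestamp; gaps
    between timestamps represent intervals with no connectivity.
    """
    times = [ts for ts, _ in readings]
    i = bisect_right(times, t)
    return readings[i - 1][1] if i else None

def window_trend(readings, t_start, t_end):
    """Simple trend over a stipulated window: change in last-known value per unit time."""
    start = last_known_state(readings, t_start)
    end = last_known_state(readings, t_end)
    if start is None or end is None or t_end == t_start:
        return None  # null window: the device never reported in this span
    return (end - start) / (t_end - t_start)

if __name__ == "__main__":
    # Sporadically connected soil-moisture sensor: long silent stretches between reports.
    readings = [(0, 31.0), (600, 30.2), (7200, 27.5), (7260, 27.4)]
    print("state at t=3600:", last_known_state(readings, 3600))  # carried forward: 30.2
    print("trend over 0..7260:", window_trend(readings, 0, 7260))
```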

Q. Is there an improvement with the latency in 5G from cloud to data center?

A. By 2023, we should see the introduction of 5G ultra-reliable low latency communication (URLLC) capabilities, which will increase the amount of time-sensitive data ingested into and delivered from wireless access networks. This will increase demand for fronthaul and backhaul bandwidth to move time-sensitive data from remote radio units to baseband stations and aggregation points like metro area central offices.

As an example, to reduce latency, some hyperscalers have multiple connections out to regional co-location sites, central offices and in some cases sites near cell towers. To save on backhaul transport costs and improve 5G latency, some cloud service providers (CSP) are motivated to locate their networks as close to users as possible.

Independent of CSPs, we expect that backhaul bandwidth will increase to support the growth in wireless access bandwidth of 5G over 4G LTE. But it isn’t the only reason backhaul bandwidth is growing. COVID-19 revealed that many cable and fiber access networks were built to support much more download than upload traffic. The explosion in work and study from home, as well as video conferencing has changed the ratio of upload to download. So many wireline operators (which are often also wireless operators) are upgrading their backhaul capacity in anticipation that not everyone will go back to the office any time soon and some may hardly ever return to the office.

Q. Are the 5G speeds ensured from end-to-end (i.e from mobile device to tower and with MSP’s infrastructure)? Understand most of the MSPs have improved the low latency speeds between Device and Tower.

A. We expect specialized services like 5G ultra-reliable low latency communication (URLLC) will help improve low latency and narrow jitter communications. As far as “assured,” this depends on the service provider SLA. More broadly, 5G mobile broadband and massive machine type communications are typically best-effort networks, so generally there is no overall guaranteed or assured latency or jitter profile.

5G supports the largest range of radio frequencies. The high frequency range uses millimeter (mm) wave signals to deliver the theoretical max of 10Gbps, which by default means reduced latency along with higher throughput. For more information on deterministic over-the-air network connections using 5G URLLC and TSN (Time Sensitive Networking), see this ITU presentation, “Integration of 5G and TSN.”

To provide a bit more detail, mobile devices communicate via wireless with Remote Radio Head (RRH) units co-located at the antenna tower site, while baseband unit (BBU) processing is typically hosted in local central offices.  The local connection between RRHs and BBUs is called the fronthaul network (from antennas to central office). Fronthaul networks are usually fiber optic supporting eCPRI7.2 protocol, which provide time sensitive network delivery. Therefore, this portion of the wireless data path is deterministic even if the over-the-air or other backhaul portions of the network are not.

Q. Do we use a lot of matrix calculations in streaming data, and do we have a circuit model for matrix calculations for convenience?

A. We see this applies case-by-case based on the type of data. What we often see is that many edge hardware systems include extensive GPU support to facilitate matrix calculations for real-time analytics.

Q. How do you see the deployment and benefits of Hyperconverged Infrastructure (HCI) on the edge?

A. Great question. The software flexibility of HCI can provide many advantages on the edge over dedicated hardware solutions. Ease of deployment, scalability and service provider support make HCI an attractive option. See this very informative article from TechTarget, “Why hyper-converged edge computing is coming into vogue,” for more details.

Q. Can you comment on edge-AI accelerator usage and future potentials? What are the places these will be used?

A. Edge processing capabilities include many resources to improve AI capabilities. Things like computational storage and increased use of GPUs will only serve to improve analytics performance. Here is a great article on this topic.

Q. How important is high availability (HA) for edge computing?

A. For most enterprises, edge computing reliability is mission critical. Therefore, almost every edge processing solution we have seen includes complete and comprehensive HA capabilities.

Q. How do you see Computational Storage fitting into these Edge use cases?  Any recommendations on initial deployment targets?

A. The definition and maturity of computational storage is rapidly evolving and is targeted to offer huge benefits for management and scale of 5G data usage on distributed edge devices. First and foremost, 5G data can be used to train deep neural networks at higher rates due to parallel operation of “in storage processing.” Petabytes of data may be analyzed in storage devices or within storage enclosures (not moved over the network for analysis). Secondly, computational storage may also accelerate the process of conditioning data or filtering out unwanted data.

Q. Do you think that the QUIC protocol will be a standard for the 5G communication?

A. So far, TCP is still the dominant transport layer protocol within the industry. QUIC was initially proposed by Google and is widely adopted in the Chrome/Android ecosystem. QUIC is getting increased interest and adoption due to its performance benefits and ease of implementation (it can be implemented in user space and does not need OS kernel changes).

For more information, here is an informative SNIA presentation on the QUIC protocol.

Please note this is an active area of innovation. There are other methods, including Apple iOS devices using MPTCP, and for inter/intra data center communications RoCE (RDMA over Converged Ethernet) is also gaining traction, as it allows for direct memory access without consuming host CPU cycles. We expect TCP/QUIC/RDMA will all co-exist, as other new L3/L4 protocols will continue to emerge for next generation workloads. The choice will depend on workloads, service requirements and system availability.

 


Why Cloud Standards Matter

Alex McDonald

Nov 18, 2020

Effective cloud data management and interoperability is critical for organizations looking to gain control and security over their cloud usage in hybrid and multicloud environments. The Cloud Data Management Interface (CDMI™), also known as the ISO/IEC 17826 International Standard, is intended for application developers who are implementing or using cloud storage systems, and who are developing applications to manage and consume cloud storage. It specifies how to access cloud storage namespaces and how to interoperably manage the data stored in these namespaces. Standardizing the metadata that expresses the requirements for the data leads to multiple clouds from different vendors treating your data the same way. First published in 2010, the CDMI standard (ISO/IEC 17826:2016) is now at version 2.0 and will be the topic of our webcast on December 9, 2020, “Cloud Data Management & Interoperability: Why A CDMI Standard Matters,” where our experts, Mark Carlson, Co-chair of the SNIA Technical Council, and Eric Hibbard, SNIA Storage Security Technical Work Group Chair, will provide an overview of the CDMI standard and cover CDMI 2.0:
  • Support for encrypted objects
  • Delegated access control
  • General clarifications
  • Errata contributed by vendors implementing the CDMI standard
This webcast will be live and Mark and Eric will be available to answer your questions on the spot. We hope to see you there. Register today.
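For readers who want a feel for what a metadata-carrying CDMI request looks like in practice, here is a rough sketch using the third-party Python requests library. The endpoint URL, credentials, and version string are placeholders, and the exact headers, content types, and metadata names should be checked against the CDMI 2.0 specification itself; this is only an illustration of the idea that requirements travel with the data as standardized metadata, not a verified client.

```python
import requests

# Placeholder endpoint and credentials; substitute your cloud provider's CDMI URL.
CDMI_URL = "https://cloud.example.com/cdmi/reports/quarterly.txt"

headers = {
    # Header and content-type names follow the CDMI convention; verify the exact
    # version string your provider advertises before relying on it.
    "X-CDMI-Specification-Version": "2.0.0",
    "Content-Type": "application/cdmi-object",
    "Accept": "application/cdmi-object",
}

body = {
    "mimetype": "text/plain",
    "value": "quarterly storage report",
    # Standardized metadata expressing requirements for the data, so that
    # different clouds can treat it the same way.
    "metadata": {
        "cdmi_retention_period": "2021-01-01T00:00:00Z/2026-01-01T00:00:00Z",
        "cdmi_geographic_placement": ["EU"],
    },
}

response = requests.put(CDMI_URL, headers=headers, json=body, auth=("user", "secret"))
print(response.status_code, response.headers.get("Content-Type"))
```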


A New Wave of Video Analytics

Jim Fister

Nov 10, 2020


Adoption of cognitive services based on video and image analytics is on the rise. It’s an intriguing topic that the SNIA Cloud Storage Technologies Initiative will dive into on December 2, 2020 at our live webcast, “How Video Analytics is Changing the Way We Store Video.” In this webcast, we will look at some of the benefits and factors driving this adoption, as well as explore compelling projects and required components for a successful video-based cognitive service. This includes some great work in the open source community to provide methods and frameworks, and some standards that are being worked on to unify the ecosystem and allow interoperability with models and architectures. Finally, we’ll cover the data required to train such models, the data source and how it needs to be treated.

 

As you might guess, there are challenges in how we do all of this. Many video archives are analog and tape-based, which doesn’t stand up well to mass ingestion or the back and forth of training algorithms. How can we start to define new architectures and leverage the right medium to make our archives accessible while still focusing on performance at the point of capture?

Join us for a discussion on:

  • New and interesting use cases driving adoption of video analytics as a cognitive service
  • Work in the open source arena on new frameworks and standards
  • Modernizing archives to enable training and refinement at will
  • Security and governance where personally identifiable information and privacy become a concern
  • Plugging into the rest of the ecosystem to build rich, video centric experiences for operations staff and consumers

Register today and bring your questions for our experts, who will be ready to answer them on the spot. We look forward to seeing you.


NVMe Key-Value Standard Q&A

John Kim

Nov 9, 2020


Last month, Bill Martin, SNIA Technical Council Co-Chair, presented a detailed update on what’s happening in the development and deployment of the NVMe Key-Value standard. Bill explained where Key Value fits within an architecture, why it’s important, and the standards work that is being done between NVM Express and SNIA. The webcast was one of our highest rated. If you missed it, it’s available on-demand along with the webcast slides. Attendees at the live event had many great questions, which Bill Martin has answered here:

Q. Two of the most common KV storage mechanisms in use today are AWS S3 and RocksDB. How does NVMe KV standards align or differ from them? How difficult would it be to map between the APIs and semantics of those other technologies to NVMe KV devices?

A. KV Storage is intended as a storage layer that would support these and other object storage mechanisms. There is a publicly available KVRocks: RocksDB compatible key value store and MyRocks compatible storage engine designed for KV SSDs at GitHub. There is also a Ceph Object storage design available. These are example implementations that can help an implementer get to an efficient use of NVMe KV storage.

Q. At which layer will my app stack need to change to take advantage of KV storage?  Will VMware or Linux or Windows need to change at the driver level?  Or do the apps need to be changed to treat data differently?  If the apps don’t need to change doesn’t this then just take the data layout tables and move them up the stack in to the server?

A. The application stack needs to change at the point where it interfaces to a filesystem, where the interface would change from a filesystem interface to a KV storage interface. In order to take advantage of Key Value storage, the application itself may need to change, depending on what the current application interface is. If the application is talking to a RocksDB or similar interface, then the driver could simply be changed out to allow the app to talk directly to Key Value Storage. In this case, the application does not care about the API or the underlying storage. If the application is currently interfacing to a filesystem, then the application itself would indeed need to change and the KV API provides a standardized interface that multiple vendors can support to provide both the necessary libraries and access to a Key Value storage device. There will need to be changes in the OS to support this in providing a kernel layer driver for the NVMe KV device. If the application is using an existing driver stack that goes through a filesystem and does not change, then you cannot take advantage of KV Storage, but if the application changes or already has an object storage interface then the kernel filesystem and mapping functions can be removed from the data path.

Q. Is there a limit to the length of a key or value in the KV Architecture?

A. There are limits to the key and value sizes in the current NVMe standard. The current implementation limits the key to 16 bytes due to a desire to pass the key within the NVMe command. The other architectural limit on a key is that the length of the key is specified in a field that allows up to 255 bytes for the key length. To utilize this, an alternative mechanism for passing the key to the device is necessary. For the value, the limit on the size is 4 GBytes.
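As a quick illustration of the limits described in that answer (a 16-byte key carried in the command, a key-length field that tops out at 255 bytes, and a 4 GB value), the helper below validates a key/value pair before issuing a store. The limits are taken from the answer above; everything else is an invented sketch, not an actual NVMe KV library call.

```python
MAX_INLINE_KEY_BYTES = 16        # key passed within the NVMe command today
MAX_KEY_FIELD_BYTES = 255        # architectural limit of the key-length field
MAX_VALUE_BYTES = 4 * 1024 ** 3  # the 4 GB value limit quoted above

def check_kv_pair(key: bytes, value: bytes) -> None:
    """Raise ValueError if a key/value pair exceeds the limits described above."""
    if len(key) > MAX_KEY_FIELD_BYTES:
        raise ValueError(f"key is {len(key)} bytes; the key-length field allows "
                         f"at most {MAX_KEY_FIELD_BYTES}")
    if len(key) > MAX_INLINE_KEY_BYTES:
        # Architecturally possible, but would need an alternative mechanism for
        # passing the key to the device rather than carrying it in the command.
        raise ValueError(f"key is {len(key)} bytes; only {MAX_INLINE_KEY_BYTES} "
                         f"bytes fit in the command today")
    if len(value) > MAX_VALUE_BYTES:
        raise ValueError(f"value is {len(value)} bytes; limit is {MAX_VALUE_BYTES}")

if __name__ == "__main__":
    check_kv_pair(b"user:1234", b"\x00" * 4096)  # within limits
    try:
        check_kv_pair(b"x" * 32, b"payload")     # too long to pass inline
    except ValueError as err:
        print("rejected:", err)
```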

Q. Are there any atomicity guarantees (e.g. for overwrites)?

A. The current specification makes it mandatory for atomicity at the KV level. In other words, if a KV Store command overwrites an existing KV pair and there is a power failure, you either get all of the original value or all of the new value.

Q. Is KV storage for a special class of storage called computational storage or can it be used for general purpose storage?

A. This is for any application that benefits from storing objects as opposed to storing blocks. This is unrelated to computational storage but may be of use in computational storage applications. One application that has been considered is a filesystem: rather than using the filesystem to store blocks and map each file handle to a set of blocks that contain the file contents, you would use KV storage where the file handle is the key and the object holds the file contents.

Q. What are the most frequently used devices to use the KV structure?

A. If what is being asked is, what are the devices that provide a KV structure, then the answer is, we expect the most common devices using the KV structure will be KV SSDs.

Q. Does the NVMe KV interface require two accesses in order to get the value (i.e., one access to get the value size in order to allocate the buffer and then a second access to read the value)?

A. If you know the size of the object, or if you can pre-allocate enough space for your maximum size object, then you can do a single access. This is no different than current implementations where you actually have to specify how much data you are retrieving from the storage device by specifying a starting LBA and a length. If you do not know the size of the value and require that in order to retrieve the value, then you would indeed need to submit two commands to the NVMe KV storage device.
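To make the two retrieval patterns concrete, here is a sketch against an imaginary device object. The kv_retrieve and kv_get_length names, signatures, and the FakeKVDevice class are invented for illustration only; a real NVMe KV driver or library will expose a different interface.

```python
MAX_VALUE_BYTES = 1 << 20  # example: the application knows its values never exceed 1 MiB

class FakeKVDevice:
    """Stand-in for a real NVMe KV driver, only here to make the sketch runnable."""
    def __init__(self):
        self._store = {b"greeting": b"hello, key value world"}

    def kv_get_length(self, key: bytes) -> int:
        return len(self._store[key])

    def kv_retrieve(self, key: bytes, buf: bytearray) -> int:
        value = self._store[key]
        buf[: len(value)] = value
        return len(value)

def retrieve_known_size(dev, key: bytes) -> bytes:
    """Single access: pre-allocate for the largest value the application ever stores."""
    buf = bytearray(MAX_VALUE_BYTES)
    nbytes = dev.kv_retrieve(key, buf)   # one command fills the buffer
    return bytes(buf[:nbytes])

def retrieve_unknown_size(dev, key: bytes) -> bytes:
    """Two accesses: first ask for the value length, then read exactly that much."""
    length = dev.kv_get_length(key)      # first command
    buf = bytearray(length)
    dev.kv_retrieve(key, buf)            # second command
    return bytes(buf)

if __name__ == "__main__":
    dev = FakeKVDevice()
    print(retrieve_known_size(dev, b"greeting"))
    print(retrieve_unknown_size(dev, b"greeting"))
```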

Q. Does the device know whether an object was compressed, and if not how can a previously compressed object be stored?

A. The hardware knows if it does compression automatically and therefore whether it should de-compress the object. If the storage device supports compression and the no-compress option, then the device will store metadata with the KV pair indicating if no-compress was specified when storing the file in order to return appropriate data. If the KV storage device does not perform compression, it can simply support storage and retrieval of previously compressed objects. If the KV storage device performs its own compression and is given a previously-compressed object to store and the no-compress option is not requested, the device will recompress the value (which typically won’t result in any space savings) or if the no-compress option is requested the device will store the value without attempting additional compression.

Q. On flash, erased blocks are fixed sizes, so how does Key Value handle defrag after a lot of writes and deletes?

A. This is implementation specific and depends on the size of the values that are stored. This is much more efficient on values that are approximately the size of the device’s erase block size as those values may be stored in an erase block and when deleted the erase block can be erased. For smaller values, an implementation would need to manage garbage collection as values are deleted and when appropriate move values that remain in a mostly empty erase block into a new erase block prior to erasing the erase block. This is no different than current garbage collection. The NVMe KV standard provides a mechanism for the device to report optimal value size to the host in order to better manage this as well.

Q. What about encryption?  Supported now or will there be SED versions of [key value] drives released down the road?

A. There is no reason that a product could not support encryption with the current definition of key value storage. The release of SED (self-encrypting drive) products is vendor specific.

Q. What are considered to be best use cases for this technology? And for those use cases - what's the expected performance improvement vs. current NVMe drives + software?

A. The initial use case is for database applications where the database is already storing key/value pairs. In this use case, experimentation has shown that a 6x performance improvement from RocksDB to a KV SSD implementing KV-Rocks is possible.

Q. Since writes are complete (value must be written altogether), does this mean values are restricted to NVMe's MDTS?

 A. Yes. Values are limited by MDTS (maximum data transfer size). A KV device may set this value to something greater than a block storage device does in order to support larger value sizes.

Q. How do protection schemes work with key-value (erasure coding/RAID/...)?

A. Since key value deals with complete values as opposed to blocks that make up a user data, RAID and erasure coding are usually not applicable to key value systems. The most appropriate data protection scheme for key value storage devices would be a mirrored scheme. If a storage solution performed erasure coding on data first, it could store the resulting EC fragments or symbols on key value SSDs.

Q. So Key Value is not something built on top of block like Object and NFS are?  Object and NFS data are still stored on disks that operate on sectors, so object and NFS are layers on top of block storage?  KV is drastically different, uses different drive firmware and drive layout?  Or do the drives still work the same and KV is another way of storing data on them alongside block, object, NFS?

A. Today, there is only one storage paradigm at the drive level -- block. Object and NFS are mechanisms in the host to map data models onto block storage. Key Value storage is a mechanism for the storage device to map from an address (a key) to a physical location where the value is stored, avoiding a translation in the host from the Key/value pair to a set of block addresses which are then mapped to physical locations where data is then stored. A device may have one namespace that stores blocks and another namespace that stores key value pairs. There is not a difference in the low-level storage mechanism only in the mapping process from address to physical location. Another difference from block storage is that the value stored is not a fixed size.

Q. Could you explain more about how tx/s is increased with KV?

A. The increase in transfers/second occurs for two reasons: one is because the translation layer in the host from key/value to block storage is removed; the second is that the commands over the bus are reduced to a single transfer for the entire key value pair. The latency savings from this second reduction is less significant than the savings from removing translation operations that have to happen in the host.

Keep up-to-date on work SNIA is doing on the Key Value Storage API Specification at the SNIA website.

