
See You – Virtually – at SDC 2021

Marty Foltyn

Sep 23, 2021

SNIA Storage Developer Conference goes virtual September 28-29, 2021, and compute, memory, and storage are important topics. The SNIA Compute, Memory, and Storage Initiative is a sponsor of SDC 2021 – so visit our booth for the latest information and a chance to chat with our experts. With over 120 sessions available to watch live during the event and later on-demand, live Birds of a Feather chats, and a Persistent Memory Bootcamp and Hackathon accessing new systems in the cloud, we want to make sure you don’t miss anything! Register here to see sessions live – or on demand on your schedule.

Agenda highlights include:

LIVE Birds of a Feather Sessions are OPEN to all – SDC registration not required. Here is your chance, via Zoom, to ask your questions of the SNIA experts. Registration links will go live on September 28 and 29 on this page.

Computational Storage Talks

A great video provides an overview of sessions. Watch it here.
  • Computational Storage APIs – how the SNIA Computational Storage TWG is leading the way with new interface definitions for Computational Storage APIs that work across different hardware architectures (see the conceptual sketch after this list).
  • NVMe Computational Storage Update – learn what is happening in NVMe to support computational storage devices, including the high-level architecture being defined in NVMe for Computational Storage. The architecture provides for programs based on standardized eBPF. (Check out our blog on eBPF.)
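As a conceptual illustration of why those API definitions matter, the sketch below contrasts the conventional "move all the data to the host, then filter" pattern with a computational-storage-style pushdown, where the filter runs next to the data and only results cross the bus. This is not the SNIA API or the NVMe command set, and the function names are made up for the example; it only shows the data-movement idea the sessions will explore.

```python
# Conceptual sketch only: this is NOT the SNIA Computational Storage API or the
# NVMe Computational Storage command set; function names are made up for the
# illustration. It only shows the data-movement idea: a filter pushed down to
# the device means far fewer records cross the bus to the host.

RECORDS = [{"id": i, "temp": 20 + (i % 15)} for i in range(100_000)]  # data "on the device"

def host_side_filter(threshold):
    """Conventional model: every record is read across the bus, then filtered on the host."""
    records_moved = len(RECORDS)
    hits = [r for r in RECORDS if r["temp"] > threshold]
    return hits, records_moved

def device_side_filter(threshold):
    """Computational-storage model: the filter program runs next to the data,
    so only matching records are returned to the host."""
    hits = [r for r in RECORDS if r["temp"] > threshold]
    return hits, len(hits)

if __name__ == "__main__":
    _, moved_host = host_side_filter(30)
    _, moved_device = device_side_filter(30)
    print(f"records moved to host: conventional={moved_host}, pushdown={moved_device}")
```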
Persistent Memory Presentations

A great video provides an overview of sessions. Watch it here.


Storage at the Edge Q&A

Alex McDonald

Sep 15, 2021

The ability to run analytics from the data center to the Edge, where the data is generated and lives, creates new use cases for nearly every business. The impact of Edge computing on storage strategy was the topic at our recent SNIA Cloud Storage Technologies Initiative (CSTI) webcast, “Extending Storage to the Edge – How It Should Affect Your Storage Strategy.” If you missed the live event, it’s available on-demand. Our experts, Erin Farr, Senior Technical Staff Member, IBM Storage CTO Innovation Team, and Vincent Hsu, IBM Fellow, VP & CTO for Storage, received several interesting questions during the live event. As promised, here are answers to them all.

Q. What is the core principle of Edge computing technology?

A. Edge computing is an industry trend rather than a standardized architecture, though there are organizations like LF Edge with the objective of establishing an open, interoperable framework. Edge computing is generally about moving workloads closer to where the data is generated and creating new, innovative workloads because of that proximity. Common principles include the ability to manage Edge devices at scale, using open technologies to create portable solutions, and ultimately doing all of this with enterprise levels of security. Reference architectures exist for guidance, though implementations can vary greatly by industry vertical.

Q. We all know connectivity is not guaranteed – how does that affect these different use cases? What are the HA implications?

A. Assuming the requisite retry logic is in place at the various layers (e.g. network, storage, platform, application) as needed, it comes down to how long each of these use cases can tolerate delays until connectivity is restored. The cloud bursting use case would likely be impacted by connectivity delays if the workload burst to the cloud for availability reasons or because it needed time-sensitive additional resources. When bursting for performance, the impact depends on the length of the delay vs. the average time savings gained by bursting. Delays in the federated learning use case might only affect how soon a model gets refreshed with updated data. The query engine use case might avoid being impacted if the data was pre-fetched before the connectivity loss occurred. In all of these cases it is important that the storage fabric resynchronizes the data into a single unified view (when configured to do so).

Q. Heterogeneity of devices is a challenge in Edge computing, right?

A. It is one of the challenges of Edge computing. How the data from Edge devices is stored on an Edge server may also vary depending on how that data gets shared (e.g. MQTT, NFS, REST). Storage software that can virtualize access to data on an Edge server across different file protocols could simplify application complexity and data management.

Q. Can we say Edge computing is the opposite of cloud computing?

A. From our perspective, Edge computing is an extension of hybrid cloud. Edge computing can also be viewed as complementary to cloud computing, since some workloads are more suitable for the cloud and some are more suitable for the Edge.

Q. What assumptions are you making about WAN bandwidth? Even when caching data locally, the transit time for large amounts of data or large amounts of metadata could be prohibitive.

A. Each of these use cases should be assessed under the lens of your industry, business, and data volumes to understand whether any potential latency in any segment of these flows would be acceptable to you. WAN acceleration, which can be used to ensure certain workloads are prioritized for guaranteed qualities of service, could also be explored to improve or ensure transit times. Integration with software-defined networking solutions may also provide mechanisms to mitigate or avoid bandwidth problems.

Q. How about the situation where data resides in an on-premises data center and the machine learning tools are in the cloud? The goal is not to move the data to the cloud (for security reasons), but to build, test, score, and improve the model on-premises and finally implement it.

A. The federated learning use case allows you to keep the data in the on-premises data center while only moving the model updates to the cloud. If you also cannot move model updates, and if the ML tools are containerized and/or the on-premises site can act as a satellite location for your cloud, it may be possible to run the ML tools in your on-premises data center.
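To make the federated learning point concrete, here is a minimal sketch of federated averaging: each site trains locally on data that never leaves it, and only a model-update vector is sent to the aggregator. This is only an illustration of the concept discussed above, not the implementation referenced in the webcast; the model, shapes, weighting scheme, and function names are assumptions for the example.

```python
import numpy as np

# Minimal FedAvg-style sketch: only model updates leave each site;
# the raw Edge/on-premises data never moves. Names and shapes are illustrative.

def local_update(global_weights, local_data, lr=0.1):
    """Hypothetical local training step: returns a weight delta, not the data."""
    X, y = local_data
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)   # gradient of mean squared error
    return -lr * grad                   # the only thing sent upstream

def aggregate(global_weights, updates, sample_counts):
    """Weighted average of per-site updates by number of local samples."""
    total = sum(sample_counts)
    avg = sum(n / total * u for n, u in zip(sample_counts, updates))
    return global_weights + avg

rng = np.random.default_rng(0)
global_w = np.zeros(3)
sites = [(rng.normal(size=(100, 3)), rng.normal(size=100)) for _ in range(4)]

for _ in range(10):                     # a few federated rounds
    updates = [local_update(global_w, d) for d in sites]
    global_w = aggregate(global_w, updates, [len(d[1]) for d in sites])

print("aggregated model weights:", global_w)
```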


Next-generation Interconnects: The Critical Importance of Connectors and Cables

Tim Lustig

Sep 14, 2021


Modern data centers consist of hundreds of subsystems connected with optical transceivers, copper cables, and industry standards-based connectors. As data demands escalate, the throughput of these interconnects must increase rapidly, which shrinks the maximum reach of copper cabling. At the same time, data centers are expanding in size, with nodes stretching further apart, making longer-reach optical technologies much more popular. However, optical interconnect technologies are more costly and complex than copper, and they come with many new buzzwords and technology concepts.

The rate of change from the vast uptick in data demand accelerates new product development at an incredible pace. While much of the enterprise is still on 10/40/100GbE and 128GFC speeds, the optical standards bodies are beginning to deliver 800G, with 1.6Tb transceivers in discussion! The introduction of new technologies creates a paradigm shift that requires changes and adjustments throughout the network.

There’s a lot to keep up with. That’s why on October 19, 2021 the SNIA Network Storage Forum is hosting a live webcast, “Next-generation Interconnects: The Critical Importance of Connectors and Cables.” In this session, our experts will cover the latest data center infrastructure components designed to address expanding requirements for higher bandwidth and lower power, including new terminology and the next-generation copper and optical solutions required to deliver high signal integrity, lower latency, and low insertion loss for maximum efficiency, speed, and density. Register today. We look forward to seeing you on October 19th.


Genomics Compute, Storage & Data Management Q&A

Alex McDonald

Sep 13, 2021

Everyone knows data is growing at exponential rates. In fact, the numbers can be mind-numbing. That’s certainly the case when it comes to genomic data, where 40,000PB of storage each year will be needed by 2025. Understanding, managing and storing this massive amount of data was the topic at our SNIA Cloud Storage Technologies Initiative webcast “Moving Genomics to the Cloud: Compute and Storage Considerations.” If you missed the live presentation, it’s available on-demand along with presentation slides. Our live audience asked many interesting questions during the webcast, but we did not have time to answer them all. As promised, our experts, Michael McManus, Torben Kling Petersen and Christopher Davidson, have answered them all here.

Q. Human genomes differ only by 1% or so, so there’s an immediate 100x improvement in terms of data compression: 2743EB could become 27430PB, that’s 2.743M HDDs of 10TB each. We have ~200 countries for the 7.8B people, and if each country had 10 sequencing centers on average, each center would need a mere 1.4K HDDs. Is there really a big challenge here?

A. The problem is not that simple, unfortunately. The location and size of the genetic differences vary a lot across people. Still, there are compression methods like CRAM and PetaGene that can save a lot of space. Also consider all of the sequencing for rare disease, cancer, single-cell sequencing, etc., plus sequencing for agricultural products.

Q. What’s the best compression ratio for human genome data?

A. CRAM states 30-60% compression and PetaGene cites up to 87% compression, but there are a lot of variables to consider and it depends on the use case (e.g., is this compression for archive or for within-run computing). Lustre can compress data by roughly half (a compression ratio of 2), though this does not usually include compression of metadata. We have tested PetaGene in our lab and achieved a compression ratio of 2 without any impact on the wall clock.

Q. What is the structure of the processed genome file? Is it one large file or multiple small files, and what type of IO workload do they have?

A. The addendum at the end of this presentation covers file formats for genome files, e.g. FASTQ, BAM, VCF, etc.

Q. It’s not just capacity, it’s also about performance. Analysis of genomic data sets is very often hard on large-scale storage systems. Are there prospects for developing methods like in-memory processing, etc., to offload some of the analysis and/or ways to optimize performance of I/O in storage systems for genomic applications?

A. At Intel, we are using HPC systems with an IB or OPA fabric (or RoCE over Ethernet) and Lustre. We are running in a “throughput” mode versus focusing on individual sample processing speed: multiple samples are processed in parallel versus sequentially on a compute node. We use a sizing methodology to rate a specific compute node configuration; for example, our benchmark on our 2nd Gen Scalable processors is 6.4 30x whole genomes per compute node per day. Benchmarks on our 3rd Gen Scalable processors are underway. This sizing methodology allows for the most efficient use of compute resources, which in turn can alleviate storage bottlenecks.

Q. What is the typical access pattern of a 350GB sequence? Is full traversal most common, or are there usually focal points or hot spots?

A. The 350GB is comprised of two possible input file types and two output file types. The input can be either a FASTQ file, which is an uncompressed, raw text file, or a compressed version called a uBAM (u = unaligned). The output file types are a compressed “aligned” version called a BAM file, the output of the alignment process, and a gVCF file, which is the output of the secondary analysis. This 350GB number is highly dependent on data retention policies, compression tools, genome coverage, etc.

Q. What is the size of a sequence and how many sequences are we looking at?

A. If you are asking about an actual sequence of 6 billion DNA bases (3 billion base pairs), then each base is represented by 1 byte, so you have 6GB. However, current “short read” sequencers work on the concept of coverage: you run the sequence multiple times, for example 30 times, which is referred to as “30x”. So 30 times 6GB = 180GB. For my “thought experiment” I considered 7.8B sequences, one for each person on the planet at 30x coverage. That analysis uses the ~350GB number, which includes all the files mentioned above.

Q. Can you please help with the IO pattern question?

A. IO patterns are dependent on the applications used in the pipeline. Applications like GATK BaseRecalibrator and SAMtools have a lot of random IO and can benefit from the use of SSDs. On the flip side, many of the applications are sequential in nature. Another thing to consider is the amount of IO in relation to the overall pipeline, as the existence of random IO does not inherently mean the existence of a bottleneck.

Q. You talked about prefetching the data before compute, which needs a compressed file signature of the actual data and referencing of it. Can you please share some details of what is used now to do this?

A. The current implementation of prefetch via workload manager (WLM) directives is based on metadata queries done using standard SQL on distributed index files in the system. This way, any metadata recorded for a specific file can be used as a search criterion. We’re also working on being able to access and process the index in large concatenated file formats such as NetCDF and others, which will extend the capabilities to find the right data at the right time.

Q. For genomics and this quantum of data, do you see quartz glass as a better replacement for tape?

A. Quartz glass is an interesting concept, but one of many new long-term storage technologies being researched. Back in 2012 when this was originally announced by Hitachi, I thought it would most definitely replace many storage technologies, but it’s gone very quiet over the last 5+ years, so I’m wondering whether this particular technology survived.
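To make the arithmetic in these answers concrete, here is a small worked example that reproduces the rough storage estimates discussed above (1 byte per base, 30x coverage, ~350GB of files per genome, 7.8 billion people). The figures are only the back-of-the-envelope numbers from the Q&A, not a sizing tool, and the compression ratios at the end are the quoted claims applied for illustration.

```python
# Back-of-the-envelope reproduction of the storage arithmetic in the Q&A above.
# Figures (1 byte/base, 30x coverage, ~350 GB of files per genome, 7.8B people)
# come from the answers; everything else is simple unit conversion (decimal units).

BASES_PER_GENOME = 6e9          # ~6 billion bases (3 billion base pairs), 1 byte each
COVERAGE = 30                   # "30x" short-read sequencing
FILES_PER_GENOME_GB = 350       # FASTQ/uBAM + BAM + gVCF, per the answer
PEOPLE = 7.8e9

raw_reads_gb = BASES_PER_GENOME / 1e9 * COVERAGE   # 6 GB * 30 = 180 GB of raw reads
total_gb = PEOPLE * FILES_PER_GENOME_GB
total_eb = total_gb / 1e9                          # GB -> EB

print(f"raw reads per genome at 30x: {raw_reads_gb:.0f} GB")
print(f"world-scale thought experiment: {total_eb:,.0f} EB")

# Effect of the compression ratios quoted in the answers (illustrative only).
for name, ratio in [("Lustre (~2:1)", 2), ("PetaGene (up to ~87% saved)", 1 / (1 - 0.87))]:
    print(f"with {name}: {total_eb / ratio:,.0f} EB")
```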


Demystifying the Fibre Channel SAN Protocol

John Kim

Sep 10, 2021


Ever wonder how Fibre Channel (FC) hosts and targets really communicate? Join the SNIA Networking Storage Forum (NSF) on September 23, 2021 for a live webcast, “How Fibre Channel Hosts and Targets Really Communicate.” This SAN overview will dive into details on how initiators (hosts) and targets (storage arrays) communicate and will address key questions, like:

  • How do FC links activate?
  • Is FC routable?
  • What kind of flow control is present in FC?
  • How do initiators find targets and set up their communication?
  • Finally, how does actual data get transferred between initiators and targets, since that is the ultimate goal?

Each SAN transport has its own way to initialize and transfer data. This is an opportunity to learn how it works in the Fibre Channel world. Storage experts will introduce the concepts and demystify the inner workings of FC SAN. Register today.


Storage for Applications Webcast Series

John Kim

Sep 8, 2021


Everyone enjoys having storage that is fast, reliable, scalable, and affordable. But it turns out different applications have different storage needs in terms of I/O requirements, capacity, data sharing, and security.  Some need local storage, some need a centralized storage array, and others need distributed storage—which itself could be local or networked. One application might excel with block storage while another with file or object storage. For example, an OLTP database might require small amounts of very fast flash storage; a media or streaming application might need vast quantities of inexpensive disk storage with extra security safeguards; while a third application might require a mix of different storage tiers with multiple servers sharing the same data. This SNIA Networking Storage Forum "Storage for Applications" webcast series will cover the storage requirements for specific uses such as artificial intelligence (AI), database, cloud, media & entertainment, automotive, edge, and more. With limited resources, it’s important to understand the storage intent of the applications in order to choose the right storage and storage networking strategy, rather than discovering the hard way that you’ve chosen the wrong solution for your application.

We kick off this series on October 5, 2021 with “Storage for AI Applications.” AI is a technology which itself encompasses a broad range of use cases, largely divided into training and inference. In this webcast, we’ll look at what types of storage are typically needed for different aspects of AI, including different types of access (local vs. networked, block vs. file vs. object) and different performance requirements. And we will discuss how different AI implementations balance the use of on-premises vs. cloud storage. Tune in to this SNIA Networking Storage Forum (NSF) webcast to boost your natural (not artificial) intelligence about application-specific storage. Register today. Our AI experts will be waiting to answer your questions.

