
The Evolution of Congestion Management in Fibre Channel

Erik Smith

Jun 20, 2024

The Fibre Channel (FC) industry introduced Fabric Notifications in 2021 as a key resiliency mechanism for storage networks, combating congestion, link integrity issues, and delivery errors. Since then, numerous manufacturers of FC SAN solutions have implemented Fabric Notifications and enhanced the overall user experience of deploying FC SANs. On August 27, 2024, the SNIA Data, Networking & Storage Forum is hosting a live webinar, “The Evolution of Congestion Management in Fibre Channel,” for a deep dive into Fibre Channel congestion management. We’ve convened a stellar, multi-vendor group of experts with extensive Fibre Channel knowledge and different technology viewpoints to explore the evolution of Fabric Notifications and the solutions available for this exciting new technology. You’ll learn:
  • The state of Fabric Notifications as defined by the Fibre Channel standards.
  • The mechanisms and techniques for implementing Fabric Notifications.
  • The currently available solutions deploying Fabric Notifications.
Register today and bring your questions for our experts. We hope you’ll join us on August 27th.

Three Truths About Hard Drives and SSDs

Jason Feist

May 23, 2024

An examination of the claim that flash will replace hard drives in the data center.

“Hard drives will soon be a thing of the past.” “The data center of the future is all-flash.” Such predictions foretelling hard drives’ demise, perennially uttered by a few vocal proponents of flash-only technology, have not aged well. Without question, flash storage is well suited to applications that require high performance and speed. And flash revenue is growing, as is all-flash-array (AFA) revenue. But not at the expense of hard drives.

We are living in an era where the ubiquity of the cloud and the emergence of AI use cases have driven up the value of massive data sets. Hard drives, which today store by far the majority of the world’s exabytes (EB), are more indispensable to data center operators than ever. Industry analysts expect hard drives to be the primary beneficiary of continued EB growth, especially in enterprise and large cloud data centers, where the vast majority of the world’s data sets reside.

Myth: SSD pricing will soon match the pricing of hard drives.
Truth: SSD and hard drive pricing will not converge at any point in the next decade.

Hard drives hold a firm cost-per-terabyte (TB) advantage over SSDs, which positions them as the unquestionable cornerstone of data center storage infrastructure. Analysis of research by IDC, TRENDFOCUS, and Forward Insights confirms that hard drives will remain the most cost-effective option for most enterprise tasks. The price-per-TB difference between enterprise SSDs and enterprise hard drives is projected to remain at or above a 6-to-1 premium through at least 2027. This differential is particularly evident in the data center, where device acquisition cost is by far the dominant component of total cost of ownership (TCO). Taking all storage system costs into consideration (device acquisition, power, networking, and compute), hard drive-based systems deliver a far superior TCO on a per-TB basis.

Myth: Supply of NAND can ramp to replace all hard drive capacity.
Truth: Entirely replacing hard drives with NAND would require untenable CapEx investments.

The notion that the NAND industry would or could rapidly increase its supply to replace all hard drive capacity isn’t just optimistic; such an attempt would lead to financial ruin. According to the Q4 2023 NAND Market Monitor report from industry analyst Yole Intelligence, the entire NAND industry shipped 3.1 zettabytes (ZB) from 2015 to 2023, while having to invest a staggering $208 billion in CapEx, approximately 47% of its combined revenue. In contrast, the hard drive industry addresses the vast majority (almost 90%) of large-scale data center storage needs in a highly capital-efficient manner.

To help crystallize this, let’s do a thought experiment. There are three hard drive manufacturers in the world; let’s take one of them, for whom we have the numbers, as an example. The chart below compares the byte production efficiency of the NAND and hard drive industries, using this hard drive manufacturer as a proxy. Even with just one manufacturer represented, it is easy to see that the hard drive industry is far more efficient at delivering ZBs to the data center. Could the flash industry fully replace the entire hard drive industry’s capacity output by 2028?
The Yole Intelligence report cited above indicates that from 2025 to 2027 the NAND industry will invest about $73 billion, which is estimated to yield 963EB of output for enterprise SSDs as well as other NAND products for tablets and phones. This translates to an investment of about $76 per TB of flash storage output. Applying that same capital price per bit, it would take a staggering $206 billion in additional investment to support the 2.723ZB of hard drive capacity forecast to ship in 2027. In total, that is nearly $279 billion of investment for a total addressable market of approximately $25 billion: roughly a 10-to-1 loss. This level of investment is unlikely for an industry facing uncertain returns, especially after losing money throughout 2023. (A back-of-the-envelope check of this arithmetic appears at the end of this post.)

Myth: Only AFAs can meet the performance requirements of modern enterprise workloads.
Truth: Enterprise storage architectures usually mix media types to optimize for the cost, capacity, and performance needs of specific workloads.

At issue here is a false dichotomy. All-flash vendors advise enterprises to “simplify” and “future-proof” by going all-in on flash for high performance. Otherwise, they posit, enterprises risk finding themselves unable to keep pace with the performance demands of modern workloads. This zero-sum logic fails because:
  1. Most modern workloads do not require the performance advantage offered by flash.
  2. Enterprises must balance capacity and cost, as well as performance.
  3. The purported simplicity of a single-tier storage architecture is a solution in search of a problem.
Let’s address these one by one.

First, most of the world’s data resides in the cloud and large data centers, where only a small percentage of the workload requires a significant percentage of the performance. This is why, according to IDC,[1] hard drives have accounted for almost 90% of the storage installed base at cloud service providers and hyperscale data centers over the last five years. In some cases, all-flash systems are not even required as part of the highest-performance solutions; there are hybrid storage systems that perform as well as or faster than all-flash.

Second, TCO considerations are key to most data center infrastructure decisions, forcing a balance of cost, capacity, and performance. Optimal TCO is achieved by aligning the most cost-effective media (hard drive, flash, or tape) to the workload requirement. Hard drives and hybrid arrays (built from hard drives and SSDs) are a great fit for most enterprise and cloud storage and application use cases. While flash storage excels in read-intensive scenarios, its endurance diminishes with increased write activity. Manufacturers address this with error correction and overprovisioning (extra, unseen storage used to replace worn cells). However, overprovisioning greatly increases the embedded product cost, and constant power is needed to avoid data loss, posing cost challenges in data centers. Additionally, while technologies like triple-level cell (TLC) and quad-level cell (QLC) allow flash to handle data-heavy workloads like hard drives do, the economic rationale weakens for larger data sets or long-term retention. In these cases, disk drives, with their growing areal density, offer a more cost-effective solution.

Third, the claim that using an AFA is “simpler” than adopting a mix of media types in a tiered architecture is a solution in search of a problem. Many hybrid storage systems employ a well-proven and finely tuned software-defined architecture that seamlessly integrates and harnesses the strengths of diverse media types in single units. In scale-out private or public cloud data center architectures, file systems or software-defined storage manage the data storage workloads across data center locations and regions. AFAs and SSDs are a great fit for high-performance, read-intensive workloads. But it is a mistake to extrapolate from niche use cases or small-scale deployments to the mass market and hyperscale, where AFAs are an unnecessarily expensive way to do what hard drives already deliver at a much lower TCO.

The data bears this out. Analysis of data from IDC and TRENDFOCUS predicts an almost 250% increase in EB outlook for hard drives by 2028, and extrapolating further out in time, that ratio holds well into the next decade. Hard drives, indeed, are here to stay, in synergy with flash storage. And the continued use of hard drives and hybrid array solutions supports, in turn, the continued use of SAS technologies in data center infrastructure.

About the Author
Jason Feist is senior vice president of marketing, products and markets, at Seagate Technology.

[1] IDC, Multi-Client Study, Cloud Infrastructure Index 2023: Compute and Storage Consumption by 100 Service Providers, November 2023
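As a quick sanity check on the capital-cost arithmetic in the Myth 2 discussion above, here is a minimal sketch that recomputes the article’s own figures: the $73 billion of projected NAND CapEx for 2025-2027, the 963EB of projected output, and the 2.723ZB of hard drive capacity forecast for 2027. The decimal unit conversions (1 ZB = 1,000 EB = 1,000,000,000 TB) are an assumption of the sketch; the input numbers are the ones cited in the post.

```python
# Back-of-the-envelope check of the Myth 2 capital-cost figures cited above.
EB_PER_ZB = 1_000          # exabytes per zettabyte (decimal units assumed)
TB_PER_EB = 1_000_000      # terabytes per exabyte

nand_capex_2025_2027 = 73e9   # ~$73B projected NAND industry investment, 2025-2027
nand_output_eb = 963          # ~963 EB of projected NAND output over that period
hdd_2027_zb = 2.723           # 2.723 ZB of HDD capacity forecast to ship in 2027

cost_per_tb = nand_capex_2025_2027 / (nand_output_eb * TB_PER_EB)
print(f"Implied NAND capital cost: ~${cost_per_tb:.0f} per TB")          # ~$76/TB

extra_capex = hdd_2027_zb * EB_PER_ZB * TB_PER_EB * cost_per_tb
print(f"CapEx to replace 2027 HDD exabytes: ~${extra_capex / 1e9:.0f}B")  # ~$206B

total = nand_capex_2025_2027 + extra_capex
print(f"Total investment: ~${total / 1e9:.0f}B vs. ~$25B addressable market")  # ~$279B
```

Running the sketch reproduces the ~$76/TB, ~$206B, and ~$279B figures quoted in the post.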

Erin Farr

May 6, 2024

In a little over a month, more than 1,500 people have viewed the SNIA Cloud Storage Technologies Initiative (CSTI) live webinar, “Ceph: The Linux of Storage Today,” with SNIA experts Vincent Hsu and Tushar Gohad. If you missed it, you can watch it on-demand at the SNIA Educational Library. The live audience was extremely engaged with our presenters, asking several interesting questions. As promised, Vincent and Tushar have answered them here. Given the high level of interest in this topic, the CSTI is planning additional sessions on Ceph. Please follow us @sniacloud_com or at SNIA LinkedIn for dates.

Q: How many snapshots can Ceph support per cluster? Does Ceph provide deduplication? If so, is it across object, file, and block storage?
A: There is no per-cluster limit. In the Ceph filesystem (cephfs) it is possible to create snapshots on a per-path basis, and currently the configurable default limit is 100 snapshots per path. Ceph block storage (rbd) does not impose limits on the number of snapshots; however, when using the native Linux kernel rbd client there is a limit of 510 snapshots per image. There is a Ceph project to support data deduplication, though it is not available yet.

Q: How easy is the installation setup? I heard Ceph is hard to set up.
A: Ceph used to be difficult to install; however, the Ceph deployment process has undergone many changes and improvements, and in recent years the experience has become very streamlined. The cephadm system was created to bootstrap and manage the Ceph cluster, and Ceph can now also be deployed and managed via a dashboard.

Q: Does Ceph provide a good user interface to monitor usage, performance, and other details if it is used as object-as-a-service across multiple tenants?
A: Currently the Ceph dashboard allows monitoring usage and performance at the cluster level and on a per-pool basis. This question falls under consumability. Many people contribute to the community in this area, so you will see more of these management capabilities being added to give a better profile of utilization efficiency, multi-tenancy, and quality of service. The more Ceph becomes the substrate for cloud-native on-premises storage, the more these technologies will show up in the community. The Ceph dashboard has come a long way.

Q: A slide mentioned support for tiered storage. Is tiered meant in the sense of caching (automatically managing performance/locality) or for storing data with explicitly different lifetimes/access patterns?
A: The slide mentioned the future support in Crimson for device tiering. That feature, for example, will allow storing data with different access patterns (and indeed lifetimes) on different devices. Access the full webinar presentation here.

Q: Can you discuss any performance benchmarks or case studies demonstrating the benefits of using Ceph as the underlying storage infrastructure for AI workloads?
A: AI workloads have multiple requirements that Ceph is very well suited for:
  • Performance: Ceph can meet the high performance demands of AI workloads. As an SDS solution, it can be deployed on different hardware to provide the necessary performance characteristics, and it can scale out to provide more parallelism as performance demands increase. A recent post by a Ceph community member showed a Ceph cluster performing at 1 TiB/s.
  • Scale-out: Ceph was built from the ground up as a scale-out solution. As training and inferencing data grows, it is possible to grow the cluster to provide more capacity and more performance. Ceph can scale to thousands of nodes.
  • Durability: Training data sets can become very large, and it is important that the storage system itself takes care of data durability, as transferring the data in and out of the storage system can be prohibitive. Ceph employs techniques such as data replication and erasure coding, as well as automatic healing and data redistribution, to ensure data durability.
  • Reliability: It is important that the storage system operates continuously, even as failures happen during training and inference processing. In a large system with thousands of storage devices, failures are the norm. Ceph was built from the ground up to avoid a single point of failure, and it can continue to operate and automatically recover when failures happen.
  • Object, block, and file support: Different AI applications require different types of storage. Ceph provides object, block, and file access.

Q: Is it possible to geo-replicate a Ceph datastore? Having a few exabytes in a single datacenter seems a bit scary.
A: We know you don’t want all your eggs in one basket. Ceph can perform synchronous or asynchronous replication. Synchronous replication is especially used in a stretch cluster context, where data can be spread across multiple data centers. Since Ceph is strongly consistent, stretch clusters are limited to deployments where the latency between the data centers is relatively low; stretch clusters are in general shorter distance, i.e., not beyond 100-200 km, otherwise the turnaround time would be too long. For longer distances, people typically perform asynchronous replication between different Ceph clusters. Ceph also supports different geo-replication schemes: Ceph Object Storage (RGW) provides the ability to access data in multiple geographical regions and allows data to be synchronized between them, Ceph RBD provides asynchronous mirroring that enables replication of RBD images between different Ceph clusters, and the Ceph filesystem provides similar capabilities, with improvements to this feature under development.

Q: Is there an NVMe-to-HDD capacity percentage that gives the best throughput? For example, for 1PB of HDD, how much NVMe capacity is recommended? Also, can you please include a link to the communities Vincent referenced?
A: Since NVMe provides superior performance to HDD, the more NVMe devices used, the better the expected throughput. However, when factoring in cost and trying to get a better cost/performance ratio, there are a few ways Ceph can be configured to minimize the HDD performance penalties. The Ceph documentation recommends that in a mixed spinning-disk and solid-state setup, the OSD metadata should be put on the solid-state drive, sized at least in the range of 1-4% of the size of the HDD (a short sizing sketch appears after the next question below). Ceph also lets you create different storage pools built from different media types to accommodate different application needs; for example, applications that need higher IO and/or higher data throughput can be set to use a more expensive NVMe-based data pool. There is no hard rule; it depends on factors like what CPU you have. What is seen today is that users tend to implement all-flash NVMe rather than hybrid configurations. They will implement all-flash, even for object and block storage, to get consistent performance. Another scenario is using HDD for high-capacity object storage as a data repository. The community and the Ceph documentation have best practices, known principles, and architecture guidelines for CPU-to-hard-drive or CPU-to-NVMe ratios. The Ceph community is launching a user council to gather best practices from users around two topics: performance and consumability. If you are a user of Ceph, we strongly recommend you join the community and participate in user council discussions: https://ceph.io/en/community/

Q: Hardware RAID controllers made sense on systems with few CPU cores. Can any small RAID controller compete with the massive core densities and large memory banks of modern systems?
A: Ceph provides its own durability, so in most cases there is no need to also use a RAID controller. Ceph can provide durability leveraging data replication and/or erasure coding schemes.
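To make the 1-4% metadata-device guideline above concrete, here is a minimal sizing sketch for a hypothetical hybrid OSD layout (HDD for data, SSD/NVMe for OSD metadata). The 20 TB drive size and 12-drive host below are illustrative assumptions, not recommendations from the webinar; only the 1-4% range comes from the answer above.

```python
# Illustrative sizing for a hybrid Ceph OSD: HDD for data, SSD/NVMe for OSD metadata.
# The 1-4% guideline is from the Ceph documentation referenced in the answer above;
# the drive size and drive count are hypothetical examples.

HDD_CAPACITY_TB = 20      # one data HDD behind one OSD (assumed)
HDDS_PER_HOST = 12        # number of HDD-backed OSDs per host (assumed)

low, high = 0.01, 0.04    # "at least in the range of 1-4% of the size of the HDD"

per_osd_min_gb = HDD_CAPACITY_TB * 1000 * low
per_osd_max_gb = HDD_CAPACITY_TB * 1000 * high
print(f"Metadata device per OSD: {per_osd_min_gb:.0f}-{per_osd_max_gb:.0f} GB")

# If one shared solid-state device holds the metadata for every OSD on the host:
host_min_gb = per_osd_min_gb * HDDS_PER_HOST
host_max_gb = per_osd_max_gb * HDDS_PER_HOST
print(f"Shared SSD/NVMe per {HDDS_PER_HOST}-HDD host: {host_min_gb:.0f}-{host_max_gb:.0f} GB")
```

For a 20 TB HDD this works out to roughly 200-800 GB of solid-state metadata capacity per OSD.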
Q: Is there a Docker version of Ceph? What is the simplest way to use Ceph?
A: A full-fledged Ceph system requires multiple daemons to be managed, so a single container image is not the best fit. Ceph can be deployed on Kubernetes via Rook. There have been various experimental upstream projects that allow running a simplified version of Ceph, but these are not currently supported by the Ceph community.

Q: Does Ceph support Redfish/Swordfish APIs for management? Was SPDK considered for low-level locking?
A: Yes, Ceph supports both Redfish and Swordfish APIs for management. Here are example technical user guide references: https://docs.ceph.com/en/latest/hardware-monitoring/ and https://www.snia.org/sites/default/files/technical_work/Swordfish/Swordfish_v1.0.6_UserGuide.pdf. To answer the second part of your question, SeaStar, which follows similar design principles to SPDK, is used as the asynchronous programming library, given that it is already in C++ and allows us to use a pluggable network and storage stack (a standard kernel/libc-based network stack or DPDK; io_uring or SPDK; etc.). We are in discussion with the SeaStar community to see how SPDK can be natively enabled for storage access.

Q: Are there scale limitations on the number of MONs relative to OSDs? Wouldn’t there be issues with OSDs reporting back to MONs (epochs, maps, etc.) as the number of OSDs grows?
A: The issue of scaling the number of OSDs has been tested and addressed. In 2017 it was reported that CERN successfully tested a Ceph cluster with over 10,000 OSDs. Nowadays, the public Ceph telemetry regularly shows many active clusters in the range of 1,000-4,000 OSDs.

Q: I saw you have support for NVMe/TCP. Are there any plans to add NVMe/FC support?
A: There are no current plans to support NVMe/FC.

Q: What about fault tolerance? If one out of 24 nodes is offline, how likely is data loss? How does the cluster avoid sending requests to down nodes?
A: There are two aspects to this question. Data loss: Ceph has a reputation in the market for its very conservative approach to protecting data; once it approaches critical mass, Ceph will stop writes to the system. Availability: this depends on how you configured it. For example, some users spread six copies of data across three data centers; if you lose a whole site, or multiple drives, the data is still available. It really depends on your protection design. Data can be set to be replicated into different failure domains, in which case it can be guaranteed that, unless there are multiple failures in multiple domains, there is no data loss. The cluster marks and tracks down nodes and makes sure that all requests go to nodes that are available. Ceph replicates the data, and different schemes can be used to provide data durability. It depends on your configuration, but the design principle of Ceph is to make sure you don’t lose data. Say you have 3-way replication: if you start to lose critical mass, Ceph will go into read-only mode and stop write operations to make sure you don’t update the current state until you recover it.

Q: Can you comment on Ceph versus a vector database?
A: Ceph is a unified storage system that can provide file, block, and object access. It does not provide the capabilities that a vector database needs to provide. There are cases where a vector database can use Ceph as its underlying storage system.

Q: Is there any kind of support for parallel I/O in Ceph?
A: Ceph natively performs parallel I/O.
By default, it schedules all operations directly to the OSDs, in parallel. (A short client-side sketch of parallel object I/O appears at the end of this Q&A.)

Q: Can you use Ceph with two AD domains? Say we have a path /FS/share1/. Can you create two SMB shares for this path, one per domain, each with a different set of permissions?
A: Partial AD support has recently been added to upstream Ceph and will be available in future versions. Support for multiple ADs is being developed.

Q: Does Ceph provide shared storage similar to Gluster or something like EFS? Also, does Ceph work best with many small files or large files?
A: Yes, Ceph provides shared file storage like EFS. There is no concrete answer to whether many small files are better than large files; Ceph can handle either. In terms of “what is best,” most file storage today is not optimized for very tiny files. In general, many small files will likely use more metadata storage and gain less from certain prefetching optimizations. Ceph can comfortably handle large files, though this is not a binary answer. Over time, Ceph will continue to improve the granularity of its file support.

Q: What type of storage sits behind the OSD design? A VMware SAN?
A: The OSD device can use any raw block device, e.g., JBOD. Traditionally, every OSD is mapped to one disk. It could be a virtual disk, but it is typically a physical disk; think of a bunch of NVMe disks in a physical server with one OSD handling one disk. But we can have namespaces, for example with ZNS-type drives, that allow us to do physical partitioning based on the type of media and expose the disk as partitions, with one OSD per partition. Ceph provides functionality equivalent to a vSAN; each Ceph OSD manages a physical drive or a subset of a drive.

Q: How can hardware RAID coexist with Ceph?
A: Ceph can use hardware RAID for its underlying storage; however, as Ceph manages its own durability, there is not necessarily additional benefit to adding RAID in most cases. Doing so would duplicate the durability functions at the block level, reducing capacity and impacting performance, and a lower-latency drive could perform better. Most people use 3-way replication or erasure coding. Another consideration is that you can run on any server instead of hard-coding for particular RAID adapters.
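To illustrate the parallel I/O point above, here is a minimal sketch using the librados Python binding (python3-rados), which queues several object writes asynchronously so the client can dispatch them to the responsible OSDs concurrently. The pool name and object names are placeholders, and the sketch assumes a reachable cluster with a valid ceph.conf and client keyring; it is an illustrative example, not code from the webinar.

```python
# Minimal illustration of asynchronous (parallel) object I/O with librados.
# Assumes: python3-rados installed, /etc/ceph/ceph.conf and a client keyring present,
# and an existing pool named "demo-pool" (placeholder).
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("demo-pool")

# Queue several writes without waiting; librados sends each request directly
# to the OSDs responsible for that object, so the writes proceed in parallel.
completions = []
for i in range(8):
    c = ioctx.aio_write_full(f"object-{i}", f"payload {i}".encode())
    completions.append(c)

# Wait for all writes to be acknowledged before shutting down.
for c in completions:
    c.wait_for_complete()

ioctx.close()
cluster.shutdown()
```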

30 Speakers Highlight AI, Memory, Sustainability, and More at the May 21-22 Summit!

SNIA CMS Community

May 1, 2024

SNIA Compute, Memory, and Storage Summit is where solutions, architectures, and community come together. Our 2024 Summit – taking place virtually on May 21-22, 2024 – is the best example to date, featuring a stellar lineup of 30 speakers in sessions on artificial intelligence, the future of memory, sustainability, critical storage security issues, the latest on CXL®, UCIe™, and Ultra Ethernet, and more.

“We’re excited to welcome executives, architects, developers, implementers, and users to our 12th annual Summit,” said David McIntyre, Compute, Memory, and Storage Summit Chair and member of the SNIA Board of Directors. “Our event features technology leaders from companies like Dell, IBM, Intel, Meta, Samsung – and many more – to bring us the latest developments in AI, compute, memory, storage, and security in our free online event. We hope you will attend live to ask questions of our experts as they present, and watch those you miss on-demand.”

Artificial intelligence sessions sponsored by the SNIA Data, Networking & Storage Forum feature J Michel Metz of the Ultra Ethernet Consortium (UEC) on powering AI’s future with the UEC, John Cardente of Dell on storage requirements for AI, Jeff White of Dell on edgenuity, and Garima Desai of Samsung on creating a sustainable semiconductor industry for the AI era. Other AI sessions include Manoj Wadekar of Meta on the evolution of hyperscale data centers from CPU-centric to GPU-accelerated AI, Paul McLeod of Supermicro on storage architecture optimized for AI, and Prasad Venkatachar of Pliops on generative AI data architecture.

Memory sessions begin with Jim Handy and Tom Coughlin on how memories are driving big architectural changes. Ahmed Medhioub of Astera Labs will discuss breaking through the memory wall with CXL, and Sudhir Balasubramanian and Arvind Jagannath of VMware will share their memory vision for real-world applications.

Compute sessions include Andy Walls of IBM on computational storage and real-time ransomware detection, JB Baker of ScaleFlux on computational storage real-world deployments, Dominic Manno of Los Alamos National Labs on streamlining scientific workflows in computational storage, and Bill Martin and Jason Molgaard of the SNIA Computational Storage Technical Work Group on computational storage standards.

CXL and UCIe will be featured with a CXL Consortium panel on increasing AI and HPC application performance with CXL fabrics and a session from Samsung and Broadcom on bringing unique customer value with CXL accelerator-based memory solutions. Richelle Ahlvers and Brian Rea of the UCI Express will discuss enabling an open chiplet system with UCIe.

The Summit will also dive into security with a number of presentations on this important topic. And there is much more, including a memory Birds-of-a-Feather session, a live Memory Workshop and Hackathon featuring CXL exercises, and opportunities to chat with our experts! Check out the agenda and register for free!

Power Efficiency Measurement – Our Experts Make It Clear – Part 4

Measuring power efficiency in datacenter storage is a complex endeavor. A number of factors play a role in assessing individual storage devices or system-level logical storage for power efficiency. Luckily, our SNIA experts make the measuring easier!

In this SNIA Experts on Data blog series, our experts in the SNIA Solid State Storage Technical Work Group and the SNIA Green Storage Initiative explore factors to consider in power efficiency measurement, including the nature of application workloads, IO streams, and access patterns; the choice of storage products (SSDs, HDDs, cloud storage, and more); the impact of hardware and software components (host bus adapters, drivers, OS layers); and access to read and write caches, CPU and GPU usage, and DRAM utilization.

Join us for the final installment of our journey to better power efficiency: Part 4, Impact of Storage Architectures on Power Efficiency Measurement. And if you missed our earlier segments, click on the titles to read them: Part 1: Key Issues in Power Efficiency Measurement, Part 2: Impact of Workloads on Power Efficiency Measurement, and Part 3: Traditional Differences in Power Consumption: Hard Disk Drives vs Solid State Drives. Bookmark this blog series and explore the topic further in the SNIA Green Storage Knowledge Center.

Impact of Storage Architectures on Power Efficiency Measurement

Ultimately, the interplay between hardware and software storage architectures can have a substantial impact on power consumption. Optimizing these architectures based on workload characteristics and performance requirements can lead to better power efficiency and overall system performance. Different hardware and software storage architectures can lead to varying levels of power efficiency. Here is how they impact power consumption. (A small worked example of a performance-per-watt calculation follows the two lists below.)

Hardware Storage Architectures
  1. HDDs v SSDs: Solid State Drives (SSDs) are generally more power-efficient than Hard Disk Drives (HDDs) due to their lack of moving parts and faster access times. SSDs consume less power during both idle and active states.
  2. NVMe® v SATA SSDs: NVMe (Non-Volatile Memory Express) SSDs often have better power efficiency compared to SATA SSDs. NVMe's direct connection to the PCIe bus allows for faster data transfers, reducing the time components need to be active and consuming power. NVMe SSDs are also performance optimized for different power states.
  3. Tiered Storage: Systems that incorporate tiered storage with a combination of SSDs and HDDs optimize power consumption by placing frequently accessed data on SSDs for quicker retrieval and minimizing the power-hungry spinning of HDDs.
  4. RAID Configurations: Redundant Array of Independent Disks (RAID) setups can affect power efficiency. RAID levels like 0 (striping) and 1 (mirroring) may have different power profiles due to how data is distributed and mirrored across drives.
Software Storage Architectures
  1. Compression and Deduplication: Storage systems using compression and deduplication techniques can affect power consumption. Compressing data before storage can reduce the amount of data that needs to be read and written, potentially saving power.
  2. Caching: Caching mechanisms store frequently accessed data in faster storage layers, such as SSDs. This reduces the need to access power-hungry HDDs or higher-latency storage devices, contributing to better power efficiency.
  3. Data Tiering: Similar to caching, data tiering involves moving data between different storage tiers based on access patterns. Hot data (frequently accessed) is placed on more power-efficient storage layers.
  4. Virtualization: Virtualized environments can lead to resource contention and inefficiencies that impact power consumption. Proper resource allocation and management are crucial to optimizing power efficiency.
  5. Load Balancing: In storage clusters, load balancing ensures even distribution of data and workloads. Efficient load balancing prevents overutilization of certain components, helping to distribute power consumption evenly.
  6. Thin Provisioning: Allocating storage on demand rather than pre-allocating it can lead to more efficient use of storage resources, which indirectly improves power efficiency.
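To make the idea of a power efficiency measurement concrete, here is a small illustrative calculation of a performance-per-watt figure for two hypothetical devices. The throughput and power numbers are made-up assumptions, not measurements from this series; the sketch only shows how the ratio is formed once throughput and average power have been measured for a given workload.

```python
# Illustrative performance-per-watt comparison for two hypothetical devices.
# All numbers below are assumptions for demonstration, not measured data.

devices = {
    "hypothetical SSD": {"throughput_mib_s": 3000.0, "avg_power_w": 8.0},
    "hypothetical HDD": {"throughput_mib_s": 250.0,  "avg_power_w": 7.5},
}

for name, d in devices.items():
    # Power efficiency for a throughput workload: useful work per unit of power.
    efficiency = d["throughput_mib_s"] / d["avg_power_w"]   # MiB/s per watt
    print(f"{name}: {efficiency:.1f} MiB/s per watt")

# The same idea applies to IOPS-oriented workloads (IOPS per watt) or to
# capacity-oriented comparisons (TB stored per watt at idle).
```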
