Ceph Storage for AI/ML Q&A


At our SNIA Cloud Storage Technologies webinar, “Ceph Storage in a World of AI/ML Workloads,” our experts, Kyle Bader from IBM and Philip Williams from Canonical, explained how open source solutions like Ceph can provide almost limitless scaling capabilities, both for performance and capacity, for AI/ML workloads. If you missed the presentation, it’s available at the SNIA Educational Library along with the slides.

The live webinar audience was highly engaged with this timely topic, and they asked several questions. Our presenters have generously taken the time to answer them here. 

Q: What does checkpoint mean?

A: Checkpointing is storing the state of the model (weights and optimizer states) to storage so that if there is an issue with the training cluster, training can resume from the last checkpoint instead of starting from scratch.
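As a concrete illustration, here is a minimal sketch of what saving and resuming a checkpoint might look like in a PyTorch training loop. The model, optimizer, step counter, and path below are hypothetical placeholders; real distributed training jobs typically layer framework-specific checkpoint utilities on top of this basic idea.

```python
import torch

CKPT_PATH = "/mnt/ceph/checkpoints/step_001000.pt"  # hypothetical CephFS or RBD-backed mount

def save_checkpoint(model, optimizer, step):
    # Persist model weights and optimizer state so training can resume from here.
    torch.save(
        {"step": step,
         "model": model.state_dict(),
         "optimizer": optimizer.state_dict()},
        CKPT_PATH,
    )

def resume_checkpoint(model, optimizer):
    # Reload the last saved state instead of restarting training from scratch.
    ckpt = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```
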

Q: Is Ceph a containerized solution?

A: One of the ways to deploy Ceph is as a set of containers. These can be hosted directly on conventional Linux hosts and coordinated with cephadm / systemd / podman, or in a Kubernetes cluster using an operator like Rook.

Q: Any advantages or disadvantages of using Hardware RAID to help with data protection and redundancy along with Ceph?

A: The main downside to using RAID under Ceph is the additional overhead. You can adjust the way Ceph does data protection to compensate, but that generally reduces the availability of the pools. That said, you could, in theory, build larger storage systems in terms of total PBs by running an OSD per RAID aggregate instead of per physical disk.
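To make the overhead point concrete, here is a back-of-the-envelope comparison with purely illustrative numbers: layering Ceph replication on top of hardware RAID multiplies the capacity penalty, whereas letting Ceph protect raw disks directly (for example with erasure coding) avoids paying twice.

```python
raw_tb = 1000  # hypothetical raw disk capacity in TB

# Option 1: RAID6 (8+2) under each OSD, then 3x Ceph replication on top.
raid6_efficiency = 8 / 10
usable_raid_plus_rep = raw_tb * raid6_efficiency / 3   # ~267 TB usable

# Option 2: no RAID, Ceph erasure coding 4+2 directly on the raw disks.
ec_efficiency = 4 / 6
usable_ec_only = raw_tb * ec_efficiency                 # ~667 TB usable

print(f"RAID6 + 3x replication: {usable_raid_plus_rep:.0f} TB usable")
print(f"EC 4+2 on raw disks:    {usable_ec_only:.0f} TB usable")
```
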

Q: What are the changes we are seeing to upgrade the storage hardware with AI? Only GPUs or is there other specific hardware to be upgraded?

A: There are no GPU upgrades required for the storage hardware. If you’re running training or inference co-resident on the storage hosts, then you could include GPUs, but for standalone storage serving AI workloads there is no need for GPUs in the storage systems themselves. Off-the-shelf servers configured for high throughput are all that Ceph needs.

Q: Does Ceph provide AI features?   

A: In the context of AI, the biggest thing that is needed, and that Ceph provides, is scalability in multiple dimensions (capacity, bandwidth, IOPS, etc.). Ceph also has some capability to predict device failures using a model.

Q: How do you see a storage admin career in AI Industry and what are the key learnings needed? 

A: Understanding how to use scale-out storage technologies like Ceph; understanding the hardware, the differences between types of SSDs, and networking (basically, feeds-and-speeds type stuff). It's also essential to learn as much as possible about what AI practitioners are doing, so that you can "meet them in their world" and have constructive conversations.

Q: Any efforts to move the processing into the Ceph infrastructure so the data doesn't have to move? 

A: Yes! At a low level, RADOS (Reliable Autonomic Distributed Object Store) has always had object classes that can be executed on objects; they tend to be used to provide the semantics needed for different protocols. So, at its core, Ceph has always been a computational storage technology. More recently, as an example, S3 Select has been added to the object protocol, which allows pushdown of filtering and aggregation. Think pNFS, but for tabular data, with storage-side filtering and aggregation.
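For example, a client can push a filter down to the object store with S3 Select instead of reading the whole object back. The sketch below uses boto3 against a hypothetical RGW endpoint; the endpoint, credentials, bucket, object key, and column name are all placeholder assumptions.

```python
import boto3

# Hypothetical RGW (S3-compatible) endpoint and credentials.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:8080",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Ask the object store to filter and aggregate server-side,
# returning only the result instead of the full CSV object.
resp = s3.select_object_content(
    Bucket="training-data",
    Key="events/2024-05.csv",
    ExpressionType="SQL",
    Expression="SELECT COUNT(*) FROM S3Object s WHERE s.label = 'cat'",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())
```
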

Q: What is the realistic checkpoint frequency?

A: The best thing to do is to checkpoint every round, but that might not be viable depending on the bandwidth of the storage system, the size of the checkpoint, the amount of data parallelization in the training pipeline, and whether or not asynchronous checkpointing is being used. The more frequent, the better. As the GPU cluster gets bigger, the need to checkpoint more frequently goes up, because the job needs to protect against failures in the training environment.
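A rough sizing calculation shows why checkpoint frequency and storage bandwidth are linked. The numbers below are illustrative assumptions, not measurements from the webinar: a large model, a typical bytes-per-parameter estimate for weights plus optimizer state, and an assumed aggregate write bandwidth.

```python
# Illustrative assumptions only.
params = 70e9            # 70B-parameter model
bytes_per_param = 14     # weights + optimizer state; varies by precision and optimizer
storage_gbps = 100       # aggregate write bandwidth available for checkpoints, GB/s

ckpt_size_gb = params * bytes_per_param / 1e9   # ~980 GB per checkpoint
write_seconds = ckpt_size_gb / storage_gbps     # ~10 s if written synchronously

print(f"checkpoint size ~ {ckpt_size_gb:.0f} GB, "
      f"synchronous write ~ {write_seconds:.0f} s")
# If that pause is too long relative to the desired checkpoint interval, the job
# needs either more storage bandwidth or asynchronous checkpointing.
```
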

Q: Why train with Ceph storage instead of direct-attached NVMe storage? That would speed up the training by orders of magnitude.

A: When you’re looking at modest data set sizes and looking for ways to do significant levels of data parallelization, yes, you could copy these data sets onto locally attached NVMe storage. In that case you would get faster results, simply because that’s how the physics works.

However, for larger recommendation systems you may be dealing with much larger training data sets, and you might not be able to fit all of the necessary data onto the local NVMe storage of the system. In this case, there are a number of trade-offs people make that favor the use of external Ceph storage, including the size of your GPU system, the need for more flexibility, and the need for experimentation to test various ways to accomplish data-level parallelism and data pipelining. All of this is intended to maximize the use of the GPUs, and it leads you to readjust how you partition, pre-load, and use data on local NVMe versus external Ceph storage. Flexibility and experimentation are important, and there are always trade-offs.
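As one hedged example of the "external Ceph" side of that trade-off, a training job can stream sharded data from the object interface (RGW) instead of staging everything on local NVMe. The bucket name, key prefix, endpoint, and missing decode step below are hypothetical.

```python
import boto3
from torch.utils.data import DataLoader, IterableDataset

class RGWShardDataset(IterableDataset):
    """Stream data shards from a Ceph RGW (S3-compatible) bucket."""

    def __init__(self, bucket, prefix, endpoint_url):
        self.bucket, self.prefix, self.endpoint_url = bucket, prefix, endpoint_url

    def __iter__(self):
        s3 = boto3.client("s3", endpoint_url=self.endpoint_url)
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=self.bucket, Prefix=self.prefix):
            for obj in page.get("Contents", []):
                # Read each shard object and hand it to the training pipeline;
                # a real loader would deserialize it into tensors here.
                body = s3.get_object(Bucket=self.bucket, Key=obj["Key"])["Body"].read()
                yield body

# Hypothetical endpoint and bucket names.
loader = DataLoader(
    RGWShardDataset("training-data", "shards/", "http://rgw.example.com:8080"),
    batch_size=None,
)
```
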

Q: How can we integrate Ceph to an HCI environment using VMware? 

A: It’s possible to use NVMe-oF for that; Ceph block devices (RBD) can be exposed over NVMe-oF (for example, NVMe/TCP) and consumed by VMware hosts as external block storage.

Q: Is there a QAT analogue (compression) on EPYCs? 

A: Not today; you could use one of the legacy PCIe add-in cards, though.


Erin Farr

May 6, 2024

In a little over a month, more than 1,500 people have viewed the SNIA Cloud Storage Technologies Initiative (CSTI) live webinar, “Ceph: The Linux of Storage Today,” with SNIA experts Vincent Hsu and Tushar Gohad. If you missed it, you can watch it on-demand at the SNIA Educational Library. The live audience was extremely engaged with our presenters, asking several interesting questions. As promised, Vincent and Tushar have answered them here. Given the high level of interest in this topic, the CSTI is planning additional sessions on Ceph. Please follow us @sniacloud_com or at SNIA LinkedIn for dates.

Q: How many snapshots can Ceph support per cluster?

Q: Does Ceph provide deduplication? If so, is it across object, file, and block storage?

A: There is no per-cluster limit. In the Ceph filesystem (CephFS) it is possible to create snapshots on a per-path basis, and currently the configurable default limit is 100 snapshots per path. Ceph block storage (RBD) does not impose limits on the number of snapshots; however, when using the native Linux kernel RBD client there is a limit of 510 snapshots per image. There is a Ceph project to support data deduplication, though it is not available yet.

Q: How easy is the installation setup? I heard Ceph is hard to set up.

A: Ceph used to be difficult to install; however, the Ceph deployment process has undergone many changes and improvements. In recent years the experience has become very streamlined. The cephadm system was created to bootstrap and manage the Ceph cluster, and Ceph can now also be deployed and managed via a dashboard.

Q: Does Ceph provide a good user interface to monitor usage, performance, and other details when it is used as object-as-a-service across multiple tenants?

A: Currently the Ceph dashboard allows monitoring usage and performance at the cluster level and on a per-pool basis. This question falls under consumability. Many people contribute to the community in this area, and you will start seeing more of these management capabilities being added to give a better profile of utilization efficiency, multi-tenancy, and quality of service. The more Ceph becomes the substrate for cloud-native on-premises storage, the more these technologies will show up in the community. The Ceph dashboard has come a long way.

Q: A slide mentioned support for tiered storage. Is tiered meant in the sense of caching (automatically managing performance/locality) or for storing data with explicitly different lifetimes/access patterns?

A: The slide mentioned the future support in Crimson for device tiering. That feature, for example, will allow storing data with different access patterns (and indeed lifetimes) on different devices. Access the full webinar presentation here.

Q: Can you discuss any performance benchmarks or case studies demonstrating the benefits of using Ceph as the underlying storage infrastructure for AI workloads?

A: AI workloads have multiple requirements that Ceph is well suited for:
  • Performance: Ceph can meet the high performance demands of AI workloads. As a software-defined storage (SDS) solution, it can be deployed on different hardware to provide the necessary performance characteristics. It can scale out to provide more parallelism and adapt to increasing performance demands. A recent post by a Ceph community member showed a Ceph cluster performing at 1 TiB/s.
  • Scale-out: Ceph was built from the bottom up as a scale-out solution. As training and inferencing data grows, it is possible to grow the cluster to provide more capacity and more performance. Ceph can scale to thousands of nodes.
  • Durability: Training data sets can become very large, and it is important that the storage system itself takes care of data durability, as transferring the data in and out of the storage system can be prohibitive. Ceph employs techniques such as data replication and erasure coding, as well as automatic healing and data redistribution, to ensure data durability.
  • Reliability: It is important that the storage system operates continuously, even as failures happen during training and inference processing. In a large system with thousands of storage devices, failures are the norm. Ceph was built from the ground up to avoid a single point of failure, and it can continue to operate and automatically recover when failures happen.
  • Object, block, and file support: Different AI applications require different types of storage. Ceph provides object, block, and file access.

Q: Is it possible to geo-replicate a Ceph datastore? Having a few exabytes in a single data center seems a bit scary.

A: We know you don’t want all your eggs in one basket. Ceph can perform synchronous or asynchronous replication. Synchronous replication is especially used in a stretch cluster context, where data can be spread across multiple data centers. Since Ceph is strongly consistent, stretch clusters are limited to deployments where the latency between the data centers is relatively low; in general, stretch clusters cover shorter distances, i.e., not beyond 100-200 km. Otherwise, the turnaround time would be too long. For longer distances, people typically perform asynchronous replication between different Ceph clusters. Ceph also supports different geo-replication schemes: Ceph Object Storage (RGW) provides the ability to access data in multiple geographical regions and allows data to be synchronized between them, Ceph RBD provides asynchronous mirroring that enables replication of RBD images between different Ceph clusters, and the Ceph filesystem provides similar capabilities, with improvements to this feature under development.

Q: Is there an NVMe-to-HDD capacity percentage that gives the best throughput? For example, for 1 PB of HDD, how much NVMe capacity is recommended? Also, can you please include a link to the communities Vincent referenced?

A: Since NVMe provides superior performance to HDD, the more NVMe devices being used, the better the expected throughput. However, when factoring in cost and trying to get a better cost/performance ratio, there are a few ways that Ceph can be configured to minimize the HDD performance penalties. The Ceph documentation recommends that in a mixed spinning and solid-state drive setup, the OSD metadata should be put on the solid-state drive, sized at least in the range of 1-4% of the size of the HDD. Ceph also allows you to create different storage pools built from different media types to accommodate different application needs; for example, applications that need higher IO and/or higher data throughput can be set to use the more expensive NVMe-based data pool. There is no hard rule; it depends on factors like what CPU you have. What we see today is that users tend to implement all-flash NVMe rather than hybrid configurations; they’ll implement all flash, even for object and block storage, to get consistent performance. Another scenario is using HDD for high-capacity object storage as a data repository. The community and the Ceph documentation have best practices, known principles, and architecture guidelines for CPU-to-HDD and CPU-to-NVMe ratios. The Ceph community is launching a user council to gather best practices from users around two topics: performance and consumability. If you are a user of Ceph, we strongly recommend you join the community and participate in user council discussions: https://ceph.io/en/community/

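To put rough numbers on the 1 PB example from the question, here is a back-of-the-envelope calculation that simply applies the 1-4% metadata (DB/WAL) guideline mentioned in the answer above. It is a rule-of-thumb sketch, not a tuning recommendation.

```python
hdd_capacity_tb = 1000  # 1 PB of HDD, as in the question

# Guideline from the answer above: flash for OSD metadata (DB/WAL)
# sized at roughly 1-4% of the HDD capacity it serves.
low_pct, high_pct = 0.01, 0.04
nvme_low_tb = hdd_capacity_tb * low_pct    # 10 TB
nvme_high_tb = hdd_capacity_tb * high_pct  # 40 TB

print(f"Suggested NVMe metadata capacity: {nvme_low_tb:.0f}-{nvme_high_tb:.0f} TB "
      f"for {hdd_capacity_tb:.0f} TB of HDD")
```
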
Q: Hardware RAID controllers made sense on systems with few CPU cores. Can any small RAID controller compete with the massive core densities and large memory banks of modern systems?

A: Ceph provides its own durability, so in most cases there is no need to also use a RAID controller. Ceph can provide durability leveraging data replication and/or erasure coding schemes.

Q: I would like to know if there is a Docker version of Ceph. What is the simplest way to use Ceph?

A: A full-fledged Ceph system requires multiple daemons to be managed, so a single container image is not the best fit. Ceph can be deployed on Kubernetes via Rook. There have been various experimental upstream projects to allow running a simplified version of Ceph, but these are not currently supported by the Ceph community.

Q: Does Ceph support Redfish/Swordfish APIs for management?

Q: Was SPDK considered for low-level locking?

A: Yes, Ceph supports both Redfish and Swordfish APIs for management. Here are example technical user guide references:
https://docs.ceph.com/en/latest/hardware-monitoring/
https://www.snia.org/sites/default/files/technical_work/Swordfish/Swordfish_v1.0.6_UserGuide.pdf
To answer the second part of your question, SeaStar, which follows similar design principles to SPDK, is used as the asynchronous programming library, given that it is already in C++ and allows us to use a pluggable network and storage stack: a standard kernel/libc-based network stack or DPDK, io_uring or SPDK, etc. We are in discussion with the SeaStar community to see how SPDK can be natively enabled for storage access.

Q: Are there scale limitations on the ratio of MONs to OSDs? Wouldn’t there be issues with OSDs reporting back to MONs (epochs, maps, etc.) as the number of OSDs grows?

A: The issue of scaling the number of OSDs has been tested and addressed. In 2017 it was reported that CERN successfully tested a Ceph cluster with over 10,000 OSDs. Nowadays, public Ceph telemetry regularly shows many active clusters in the range of 1,000-4,000 OSDs.

Q: I saw you have support for NVMe/TCP. Are there any plans for adding NVMe/FC support?

A: There are no current plans to support NVMe/FC.

Q: What about fault tolerance? If we have one out of 24 nodes offline, how likely is data loss? How can the cluster avoid requests to down nodes?

A: There are two aspects to this question. Data loss: Ceph has a reputation in the market for its very conservative approach to protecting data; once it approaches critical mass, Ceph will stop writes to the system. Availability: this depends on how you have configured it. For example, some users spread 6 copies of data across 3 data centers; if you lose a whole site, or multiple drives, the data is still available. It really depends on your protection design. Data can be set to be replicated into different failure domains, in which case it can be guaranteed that, unless there are multiple failures in multiple domains, there is no data loss. The cluster marks and tracks down nodes and makes sure that all requests go to nodes that are available. Ceph replicates the data, and different schemes can be used to provide data durability. It depends on your configuration, but the design principle of Ceph is to make sure you don’t lose data. Let’s say you have 3-way replication: if you start to lose critical mass, Ceph will go into read-only mode and stop write operations to make sure you don’t update the current state until you recover it.

Q: Can you comment on Ceph versus a vector database?

A: Ceph is a unified storage system that can provide file, block, and object access. It does not provide the capabilities that a vector database needs to provide, but there are cases where a vector database can use Ceph as its underlying storage system.

Q: Is there any kind of support for parallel I/O in Ceph?

A: Ceph natively performs parallel I/O. By default, it schedules all operations directly to the OSDs and in parallel.

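As a small client-side illustration of that parallelism, the librados Python bindings let an application issue many object writes asynchronously and let Ceph fan them out across OSDs. This is a minimal sketch, assuming a reachable cluster described by the default /etc/ceph/ceph.conf and a hypothetical pool named mypool.

```python
import rados

# Connect using the standard Ceph configuration and keyring.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("mypool")  # hypothetical pool name

try:
    # Issue 16 object writes without waiting on each one; Ceph maps every
    # object to a placement group and primary OSD, so the writes proceed
    # in parallel across the cluster.
    completions = [
        ioctx.aio_write_full(f"obj-{i}", f"payload-{i}".encode())
        for i in range(16)
    ]
    for c in completions:
        c.wait_for_complete()
finally:
    ioctx.close()
    cluster.shutdown()
```
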
Q: Can you use Ceph with two AD domains? Let’s say we have a path /FS/share1/. Can you create two SMB shares for this path, one per domain, each with a different set of permissions?

A: Partial AD support has recently been added to upstream Ceph and will be available in future versions. Support for multiple ADs is being developed.

Q: Does Ceph provide shared storage similar to Gluster or something like EFS? Also, does Ceph work best with many small files or large files?

A: Yes, Ceph provides shared file storage like EFS. There is no concrete answer to whether many small files are better than large files; Ceph can handle either. In terms of “what is best,” most file storage today is not optimized for very tiny files. In general, many small files will use more metadata storage and gain less from certain prefetching optimizations. Ceph can comfortably handle large files, though this is not a binary answer. Over time, Ceph will continue to improve the granularity of its file support.

Q: What type of storage is sitting behind the OSD design? VMware SAN?

A: The OSD can use any raw block device, e.g., a disk in a JBOD. The assumption is that every OSD, traditionally, is mapped to one disk. It could be a virtual disk, but it’s typically a physical disk; think of a bunch of NVMe disks in a physical server, with one OSD handling each disk. We can also use namespaces, for example on ZNS-type drives, to do physical partitioning based on the type of media and expose the disk as partitions, with one OSD per partition. Ceph provides functionality equivalent to a vSAN; each Ceph OSD manages a physical drive or a subset of a drive.

Q: How can hardware RAID coexist with Ceph?

A: Ceph can use hardware RAID for its underlying storage; however, as Ceph manages its own durability, there is not necessarily additional benefit to adding RAID in most cases. Doing so would duplicate the durability functions at the block level, reducing capacity and impacting performance, whereas a lower-latency drive could perform better. Most people use 3-way replication or erasure coding. Another consideration is that you can run on any server instead of hard-coding for particular RAID adapters.


Here’s Why Ceph is the Linux of Storage Today

Erin Farr

Feb 14, 2024

Data is one of the most critical resources of our time. Storage for data has always been a critical architectural element of every data center, requiring careful consideration of storage performance, scalability, reliability, data protection, durability, and resilience. A decade ago, the market was aggressively embracing public cloud storage because of its agility and scalability. In the last few years, people have been rethinking that approach, moving toward on-premises storage with cloud consumption models. The new cloud-native on-premises architecture promises the traditional data center’s security and reliability with cloud agility and scalability. Ceph, an open source project for enterprise unified software-defined storage, represents a compelling solution for this cloud-native on-premises architecture and will be the topic of our next SNIA Cloud Storage Technologies Initiative webinar, “Ceph: The Linux of Storage Today.” This webinar will discuss:
  • How Ceph targets important characteristics of modern software-defined data centers
  • Use cases that illustrate how Ceph has evolved, along with future use cases
  • Quantitative data points that exemplify Ceph’s community success
We will describe how Ceph is gaining industry momentum, satisfying enterprise architectures’ data storage needs and how the technology community is investing to enable the vision of “Ceph, the Linux of Storage Today.” Register today to join us for this timely discussion.

