Oct 5, 2015
We got some great questions during our Webcast on how OpenStack can consume and control file services appropriate for High Performance Computing (HPC) in a cloud and multi-tenanted environment. Here are answers to all of them. If you missed the Webcast, it’s now available on-demand. I encourage you to check it out and please feel free to leave any additional questions at this blog.
Q. Presumably we can use filesystems other than ZFS for the underlying filesystems in Lustre?
A. Yes, there are plenty of other filesystems that can be used besides ZFS. ZFS was given as an example of a modern, scale-up filesystem that has recently been integrated, but essentially you can use most filesystem types, with some having more advantages than others. What you are looking for is a filesystem that addresses the weaknesses of Lustre in terms of self-healing and scale-up. So any filesystem that allows you to easily grow capacity whilst also being capable of protecting itself would be a reasonable choice. Remember, Lustre doesn’t do anything to protect the data itself. It simply places objects in a distributed fashion across the Object Storage Targets.
Q. Are there any other HPC filesystems besides Lustre?
A. Yes there are, and depending on your exact requirements Lustre might not be appropriate. Gluster is an alternative that some have found slightly easier to manage and that provides some additional functionality. IBM has GPFS, which has been implemented as an HPC filesystem, and other vendors have their own scale-out filesystems too. An HPC filesystem is simply a scale-out filesystem capable of very good throughput with low latency. So under that definition a flash array could be considered a high-performance storage platform, or a scale-out NAS appliance with some fast disks. It’s important to understand your workload’s characteristics and demands before making the choice, as each system has pros and cons.
Q. Does “embarrassingly parallel” require bandwidth or latency from the storage system?
A. Depending on the workload characteristics it could require both. Bandwidth is usually the first demand, though, as data is shipped to the nodes for processing. Obviously the lower the latency the faster jobs can start and run, but it’s not critical, as there is only limited communication between nodes, which is what normally drives the low-latency demand.
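To make the term concrete, here is a minimal Python sketch of an embarrassingly parallel job, using hypothetical chunk file names on a shared filesystem: each worker reads and processes its own chunk with no worker-to-worker communication, which is why aggregate read bandwidth from storage tends to matter before inter-node latency does.

```python
# Minimal sketch of an "embarrassingly parallel" job: each worker reads and
# processes its own chunk independently, so storage bandwidth (shipping the
# chunks to the workers) matters long before inter-worker latency does.
from multiprocessing import Pool

def process_chunk(path):
    """Read one chunk from shared storage and return a simple result."""
    with open(path, "rb") as f:
        data = f.read()          # bandwidth-bound: bulk read from the filesystem
    return len(data)             # stand-in for the real per-chunk computation

if __name__ == "__main__":
    # Hypothetical chunk files laid out on a shared/parallel filesystem.
    chunks = [f"/mnt/lustre/dataset/chunk_{i:04d}.bin" for i in range(64)]
    with Pool(processes=8) as pool:
        results = pool.map(process_chunk, chunks)   # no worker-to-worker traffic
    print("processed", len(results), "chunks")
```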
Q. Would you suggest to use Object Storage for NFV, i.e Telco applications?
A. I would for some applications. The problem with NFV is that it actually captures a surprising breadth of applications, some of which have very limited data storage needs. For example, there is little need for storage in a packet-switching environment beyond the OS and binaries needed to stand up the VMs. In this case, object is a very good fit as it can be easily distributed geographically, ensuring the same networking function is delivered in the same manner. Other applications that require access to filtered data (such as billing-based applications or content distribution) would also be good candidates.
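As a flavour of how simple the data path can be for those candidates, the sketch below uses python-swiftclient against an OpenStack Swift object store; the endpoint, credentials, container and object names are all placeholders, and it is only meant to show that put/get semantics are enough for this class of workload.

```python
# Minimal sketch (placeholder endpoint/credentials) of an NFV-style workload
# pushing a VNF image or billing export into OpenStack Swift object storage.
from swiftclient.client import Connection

conn = Connection(
    authurl="https://keystone.example.com/v3",   # hypothetical Keystone endpoint
    user="nfv-tenant-user",
    key="secret",
    auth_version="3",
    os_options={"project_name": "nfv-tenant"},
)

conn.put_container("vnf-images")
with open("router-vnf.qcow2", "rb") as image:    # hypothetical local image file
    conn.put_object("vnf-images", "router-vnf.qcow2", contents=image,
                    content_type="application/octet-stream")

# Any site in a geographically replicated cluster can later fetch the same object.
headers, body = conn.get_object("vnf-images", "router-vnf.qcow2")
print("retrieved", headers.get("content-length"), "bytes")
```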
Q. I missed something in the middle; please clarify, your suggestion is to use ZFS (on Linux) for the local file system on OSTs?
A. Yes, this was one example, and an area where some work has recently been done in the Lustre community. This affords the OSSs the capability of scaling capacity upwards as well as offering the RAID-like protection and self-healing that come with ZFS. Other filesystems can offer those same things, so I am not suggesting it is the only choice.
Q. Why would someone want/need scale-up, when they can scale-out?
A. This can often come down to funding. A lot of HPC environments exist in academic institutions that rely on grant funding and sponsorship to expand their infrastructure. Sometimes it simply isn’t feasible to buy extra servers in order to add capacity, particularly if there is already performance headroom. It might also be the case that rack space, power and cooling are factors, in which case adding drives to cope with bigger workloads might be the only option. You do need to consider whether the additional capacity would also provoke the need for better performance, so we can’t just assume that adding disk is enough, but it’s certainly a good option and a requirement I have seen a number of times.
Sep 24, 2015
We had a great response to last week’s Webcast “Controlling Congestion in New Storage Architectures,” where we introduced CONGA, a new congestion control mechanism that is the result of research at Stanford University. We had many good questions at the live event and have compiled answers to all of them in this blog. If you think of additional questions, please feel free to comment here and we’ll get back to you as soon as possible.
Q. Isn’t the leaf/spine network just a Clos network? Since the network has loops, isn’t there a deadlock hazard if pause frames are sent within the network?
A. Clos/leaf-spine networks are based on routing, which has its own loop prevention (TTL and RPF checks).
Q. Why isn’t the congestion metric subject to the same delays as the rest of the data traffic?
A. It is, but since this is done in the data plane over 40/100G links within a data center fabric, it can be done in near real time and without the delay of sending it to a centralized control plane.
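Conceptually, the leaf switch keeps per-uplink congestion metrics fed back in-band from the fabric and consults them when a new flowlet begins. The toy Python simulation below uses invented numbers and simplified structures (not the actual CONGA tables) purely to illustrate that the choice is a cheap local lookup in the data path rather than a round trip to a controller.

```python
# Toy illustration of congestion-aware flowlet load balancing in the spirit of
# CONGA: the leaf keeps per-uplink congestion metrics (fed back in-band by the
# fabric) and picks the least-congested uplink when a new flowlet starts.
# Numbers and structures here are invented for illustration only.

# Congestion metric per uplink toward a given destination leaf (0.0 = idle).
congestion_to_leaf = {"uplink1": 0.42, "uplink2": 0.10, "uplink3": 0.77}

FLOWLET_GAP_US = 500          # idle gap that starts a new flowlet
flowlet_table = {}            # flow -> (chosen uplink, last packet timestamp)

def pick_uplink(flow_id, now_us):
    """Return the uplink for this packet; re-balance only at flowlet boundaries."""
    entry = flowlet_table.get(flow_id)
    if entry and now_us - entry[1] < FLOWLET_GAP_US:
        uplink = entry[0]                       # same flowlet: keep packet order
    else:
        uplink = min(congestion_to_leaf, key=congestion_to_leaf.get)
    flowlet_table[flow_id] = (uplink, now_us)
    return uplink

print(pick_uplink("flowA", now_us=0))        # uplink2 (least congested)
print(pick_uplink("flowA", now_us=100))      # still uplink2 (same flowlet)
```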
Q. Are packets dropped in certain cases?
A. Yes, there can be certain reasons why a packet might be dropped.
Q. Why is there no TCP reset? Is it because the Ethernet layer does the flowlet retransmission before TCP has to do a resend?
A. There are many reasons for a TCP reset; CONGA does not prevent them, but it can help with how the application responds to a loss. If a flowlet is lost, it is less detrimental to how the application performs, because only what was lost is resent rather than the full TCP connection potentially being reset.
Q. Is CONGA on an RFC standard track?
A. CONGA is based on research done at Stanford. It is not currently an RFC.
The research information can be found here.
Q. How does ECN fit into CONGA?
A. ECN can be used in conjunction with CONGA, as long as the host/networking hardware supports it.
Sep 19, 2015
I’ll be speaking at SNIA’s SDC Pre-Conference this Sunday, Sept 20, about the new Intel-Micron 3D XPoint memory. I was surprised to find that my talk won’t be unique. There are about 15 papers at this conference that will be discussing NVM, or persistent memory.
What’s all this fuss about?
Part of it has to do with the introduction by Micron & Intel of their 3D XPoint (pronounced “Crosspoint”) memory. This new product will bring nonvolatility, or persistence, to main memory, and that’s big!
Intel itself will present a total of seven papers to tell us how it envisions this technology being used in computing applications. Seven other companies, in addition to Objective Analysis (my company), will also discuss this hot new topic.
SNIA is really on top of this new trend. This organization has been developing standards for nonvolatile memory for the past couple of years, and has published an NVM Programming Model to help software developers produce code that will communicate with nonvolatile memory no matter who supplies it. Prior to SNIA’s intervention the market was wildly inconsistent, and all suppliers’ NVDIMMs differed slightly from one another, with no promise that this would become any better once new memory technologies started to make their way onto memory modules.
Now that Intel and Micron will be producing their 3D XPoint memory, and will be supplying it on industry-standard DDR4 DIMMs, it’s good to know that there will be a standard protocol to communicate with it. This will facilitate the development of standard software to harness all that nonvolatile memory has to offer.
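To give a flavour of what that standard software can look like, the sketch below memory-maps a file on a hypothetical DAX-mounted persistent-memory device and updates it with ordinary load/store-style access. The path and size are assumptions, and production code would follow the SNIA NVM Programming Model (for example via a persistent-memory library) to handle cache flushing and failure atomicity properly.

```python
# Minimal sketch of byte-addressable access to persistent memory: map a file
# on a (hypothetical) DAX-mounted pmem filesystem and update it in place.
# Real applications would follow the SNIA NVM Programming Model / a pmem
# library for flushing and failure atomicity; this only shows the access pattern.
import mmap
import os

PMEM_FILE = "/mnt/pmem0/counter.bin"     # assumed DAX mount point
SIZE = 4096

fd = os.open(PMEM_FILE, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, SIZE)

with mmap.mmap(fd, SIZE) as pm:
    value = int.from_bytes(pm[0:8], "little")      # "load" from the mapped region
    pm[0:8] = (value + 1).to_bytes(8, "little")    # "store" back in place
    pm.flush()                                     # msync; real pmem code flushes more precisely
    print("persistent counter is now", value + 1)

os.close(fd)
```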
As for me, I will be sharing information from my company’s new report on the Micron-Intel 3D XPoint memory. This is new, and it’s exciting. Will it succeed? I’ll discuss that with you there.