A while back I wrote an article entitled “NVMe for Absolute Beginners.” It seems to have resonated with a lot of people and it appears there might be a call for doing the same thing for NVMe® over Fabrics (NVMe-oF).
This article is for absolute beginners. If you are a
seasoned (or even moderately-experienced) technical person, this probably won’t
be news to you. However, you are free (and encouraged!) to point people who
need Plain English to get started to this article.
A Quick Refresher
Any time an application on a computer (or server, or even a consumer
device like a phone) needs to talk to a storage device, there are a couple of
things that you need to have. First, you need to have memory (like RAM), you
need to have a CPU, and you also need to have something that can hold onto your
data for the long haul (also called storage).
Another thing you need to have is a way for the CPU to talk
to the memory device (on one hand) and the storage device (on the other). Thing
is, CPUs talk a very specific language, and historically memory could speak
that language, but storage could not.
For many years, things ambled along in this way. The CPU
would talk natively with memory, which made it very fast but also was somewhat
risky because memory was considered volatile. That is, if there was a
power blip (or the power went out completely), any data in memory would be wiped out.
Not fun.
So, you wanted to have your data stored somewhere
permanently, i.e., on a non-volatile medium. For many years, that meant
hard disk drives (HDDs). This was great, and worked well, but didn’t really
work fast.
Solid State Disks, or SSDs, changed all that. SSDs don’t
have moving parts, which ultimately meant that you could get your data to and
from the device faster. Much faster. However, as they got faster, it
became clear that because the CPU didn’t talk to SSDs natively using the same
language – and needed an adapter of some kind – we weren’t getting as fast as
we wanted to be.
Enter Non-Volatile Memory Express (NVMe).
NVMe changed the nature of the game completely. For one, it
removed the need for an adapter by allowing the CPU to talk to the storage
natively. (In technical terms, what it did was allow the CPU to treat storage as
if it were memory, with which it could already speak natively through a
protocol called PCIe.)
The second thing that was pretty cool was that NVMe changed the
nature of the relationship with storage from one that was necessarily 1:1
to one where you could have more than one
relationship between devices.
Very cool.
Since I wrote the “NVMe for Absolute Beginners” article a
few years ago, the technology has taken off like wildfire. In only a few short
years, there have been more NVMe storage drives shipped than the previous go-to
technology (i.e., SATA).
By this point, there are many, many more articles written
about NVMe than there were back then. Now, however, there are a lot of
questions about what happens when you want to go outside of
the range of PCIe.
NVMe® over Fabrics
Thing is, NVMe using PCIe is a technology that is best used inside
a computer or server. PCIe is not generally regarded as a “fabric” technology.
So what is a “fabric” technology, and what makes it
so special?
Like anything else, there are trade-offs when it comes to
technology. The great thing about NVMe using PCIe is that it is wicked fast.
The not-so-great thing about NVMe using PCIe is that it’s contained inside of a
single computer. If you want to go outside of the computer, well, things
get tricky… unless you do something special.
In general terms, a “fabric” is that “something
special.” It’s not as easy as putting a storage device at the end of a wire and
calling it quits. Oh no; there is so much more that needs to be done.
Any time you want to go outside of a computer or server, you
need to be extra careful, because there are a lot more things that can go
wrong. As in, an exponential number of things can go wrong. Not only do
you need to try your best to make sure that things don’t go wrong in the first
place, but you need to put systems in place to handle those problems when they
do.
The good news is that there are a lot of choices when
it comes to solving this problem. Storage networks have long been the
tried and true means by which people have handled storage solutions at scale.
Technologies like Fibre Channel, Ethernet and InfiniBand have been used to
connect servers and storage for years. Each one has its place, and each one has
its fans (and with good reasons).
Because of this, there was no reason for the NVM Express group (the people behind the NVMe protocol) to create their own, new, fabric. Why re-invent the wheel? Instead, it was much better to use the battle-hardened technologies that were already available.
That’s why it’s called NVMe over Fabrics; we are
simply using the NVMe protocol to piggy-back on top of networking technologies
that already exist.
The Magic of Bindings
Imagine you’re rebuilding a Jeep. At a high level, you have
two basic parts to a Jeep’s structure –
you have the chassis, and you have the body. As you can imagine, you
can’t simply place the body on top of the chassis and start driving around. The
body is going to eventually slide right off the chassis. Not exactly safe.
By the same logic, we can’t simply place the NVMe commands
on top of a Fabric and expect, magically, that everything is going to work out
all the time. Just like our Jeep body, there needs to be a strong connection
with what happens underneath.
In NVMe-oF parlance, these are called bindings.
Bindings solve a number of problems. They are the glue that
holds the NVMe communication language to the underlying fabric transport
(whether it is Fibre Channel, InfiniBand, or various forms of Ethernet).
In particular, Bindings:
Define the establishment of a connection between NVMe and the transport/fabric
Restrict capabilities based upon what the transport fabric can (or can’t) do
Identify how NVMe is managed, administratively, using the transport/fabric
Establish requirements of size, authentication, type of information, etc., depending upon specific transport fabric methods
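If it helps to see those four jobs in one place, here is a rough Python sketch. To be clear, this is a toy model and not anything from the NVMe-oF specification: the class names, the opcode strings, the target name, and the size limits are all made up for illustration. The only point is that the same NVMe command goes through a different "glue" layer depending on which fabric sits underneath.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class NvmeCommand:
    """Toy stand-in for an NVMe command; not the real command format."""
    opcode: str
    payload: bytes = b""


class Binding(ABC):
    """Hypothetical glue between NVMe commands and one transport fabric."""

    name = "abstract"
    max_payload = 0                 # made-up size requirement
    allowed_opcodes = frozenset()   # made-up capability restriction

    def __init__(self) -> None:
        self.target = None

    def connect(self, target: str) -> None:
        """Job 1: establish the connection between NVMe and the transport."""
        self.target = target

    def describe(self) -> dict:
        """Job 3: expose the settings an administrator would manage."""
        return {"transport": self.name, "target": self.target,
                "max_payload": self.max_payload}

    def send(self, cmd: NvmeCommand) -> str:
        """Jobs 2 and 4: enforce the transport's limits, then transmit."""
        if self.target is None:
            raise ConnectionError("connect() must be called first")
        if cmd.opcode not in self.allowed_opcodes:
            raise ValueError(f"{self.name} does not allow {cmd.opcode!r}")
        if len(cmd.payload) > self.max_payload:
            raise ValueError(f"payload exceeds {self.max_payload} bytes")
        return self._transmit(cmd)

    @abstractmethod
    def _transmit(self, cmd: NvmeCommand) -> str:
        """Transport-specific delivery, left to each fabric."""


class FibreChannelBinding(Binding):
    name = "fibre-channel"
    max_payload = 2048              # illustrative number only
    allowed_opcodes = frozenset({"read", "write"})

    def _transmit(self, cmd: NvmeCommand) -> str:
        return f"FC frame to {self.target}: {cmd.opcode}"


class EthernetBinding(Binding):
    name = "ethernet"
    max_payload = 1024              # illustrative number only
    allowed_opcodes = frozenset({"read", "write", "flush"})

    def _transmit(self, cmd: NvmeCommand) -> str:
        return f"Ethernet packet to {self.target}: {cmd.opcode}"


def demo() -> list:
    """Send the same NVMe command over two different fabrics."""
    cmd = NvmeCommand("read")
    results = []
    for binding in (FibreChannelBinding(), EthernetBinding()):
        binding.connect("storage-1")
        results.append(binding.send(cmd))
    return results
```

Calling demo() sends one identical command through both toy bindings. The command itself never changes; only the binding underneath it does.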
If you consider that with networking technology we think in
terms of layers, the NVMe over Fabrics bindings sit on top of the transport
fabrics layer, and it is the responsibility of the organizations who represent
those transport fabrics to make sure that there are appropriate connections
into the bindings from the other side.
For instance, the T11 Standards body is responsible for
creating the changes to the Fibre Channel standards so that it can interact
with the bindings appropriately, not just simply sling the NVMe commands from
one side to the other.
You can find out more about how this works in the Fibre Channel example by watching the FCIA BrightTalk Webinar – Introduction to FC-NVMe by yours truly and Craig Carlson (from Cavium, now Marvell).
Types of Fabrics
Now, I’ve given you an example of one type of Fabric that
can be used for NVMe-oF, but Fibre Channel is not the only one. In fact, the
magic of NVMe-oF is that you can choose one from a number of transport types:
At the top of the graphic you can see the host, and at the
bottom you can see the storage. In the middle, you can see all of the different
networking options that could be used.
Now, the interesting thing here is that NVMe-oF is not those
different types of transports. On the contrary, there are different
technological bodies that work on those different transports.
Instead, the magic of NVMe over Fabrics is the part that connects NVMe to those transports at each end, the host and the storage: the bindings.
To Bind or Not To Bind
Now, it’s important to know that just because the NVM
Express group defines the bindings format for NVMe-oF™ (the “™” is intentional,
here), it doesn’t mean that this is the only way to do it. In fact, before
the NVMe over Fabrics standard was ratified, there were quite a few companies
who created their own forms of moving NVMe commands from one device to another.
Let me be absolutely clear here: there is nothing
wrong with this!
Just because someone has a solution that isn’t standardized
does not mean that they are doing something wrong or, worse, doing
something nefarious. All it means is that they have figured out a different way
to handle the means by which they send NVMe commands from one place to another.
However…
It’s valuable to know whether a company is using a standardized
version of NVMe over Fabrics or a proprietary
way of using a fabric to transport NVMe. The reason why
it’s important is that storage is an end-to-end problem that needs solving, and
you need to know how all of the parts fit together, and what
(if any) kind of special attention needs to be paid in order to make everything
work together seamlessly.
For that reason, even though the acronym NVMe-oF looks
funny[1], it is the official acronym for NVMe over Fabrics. There are a
number of other popular acronyms, however, that have been used to represent
networked NVMe:
NVM/f
NVMe/F
NVMf
NVMe-F
NVMe-oE (“over Ethernet”)
And so on…
Most of the time these are innocent and harmless mistakes,
or simply affectations for a particular type of acronym. The problem comes when
a vendor uses a different acronym to make it look like they are using a
standardized version of the bindings when in fact they are not.
Taking advantage of people’s ignorance over the proper
terminology in order to make your product look like it’s something it isn’t is,
well, uncool. You should especially beware if someone uses a
trademark symbol (“™”) with an incorrect acronym.
Bottom Line
NVMe over Fabrics is a way of extending NVMe outside of a
computer/server. It is more than simply slapping the commands onto a network,
and it still helps to know the pros and cons of each transport fabric as it
applies to what you need to do.
Remember, there is no such thing as a panacea for storage.
Storage still has a very, very hard job:
Give me back the
correct bit I asked you to hold on to for me.
Everything that happens inside of NVMe and NVMe-oF is
designed to help make sure that happens.
If you are interested in learning more about NVMe, and NVMe
over Fabrics, may I recommend some additional reading and videos (whichever you
prefer):
[1] The acronym was chosen because it was
supposed to reflect the various forms of NVMe. For instance, the NVMe
Management Interface is known as NVMe-MI, and the group wished for there to be
consistency across all the acronyms.