

Feb 15, 2022
Multiple clouds within an organization have become the norm. This strategy enables organizations to reduce risk and dependence on a single cloud platform. The SNIA Cloud Storage Technologies Initiative (CSTI) discussed this topic at length at our live webcast last month, “Why Use Multiple Clouds?”
We polled our webcast attendees on their use of multiple clouds and here’s what we learned about the cloud platforms that comprise their multi-cloud environments:
Our expert presenters, Mark Carlson and Gregory Touretsky, also discussed the benefits of a storage abstraction layer that insulates the application from the underlying cloud provider’s interfaces, something the SNIA Cloud Data Management Interface (CDMI) 2.0 enables.
Cost is always an issue with cloud. One of our session attendees asked: do you have an example of a cloud vendor who does not toll for egress? There may be a few vendors that don’t charge, but one we know of that is toll free on egress is Seagate’s Lyve Cloud; they only charge for used capacity.
We were also challenged on the economics and increased cost due to the perceived complexity of multi-cloud, specifically security. While it’s true that there’s no standard security model for multi-cloud, there are third-party security solutions that can simplify its management, something we covered in the webinar.
If you missed this webinar, you can access it on-demand and get a copy of the presentation slides in the SNIA Educational Library.
Feb 14, 2022
Over 900 people (and counting) have watched our SNIA Networking Storage Forum (NSF) webcast, “Object Storage: Trends, Use Cases,” where our expert panelists had a lively discussion on object storage characteristics, use cases and performance acceleration. If you have not seen this session yet, we encourage you to check it out on-demand. The conversation included several interesting questions related to object storage. As promised, here are answers to them:
Q: Today object storage allows many new capabilities but also new challenges, such as the need for geographic and local load balancers in a distributed scale out infrastructure that at the same time do not become the bottleneck of the object services at an unsustainable cost. Are there any solutions available today that have these features built in?
A: Some object storage solutions have features such as load balancing and geographic distribution built into the software, though often the storage administrator must manually configure parts of these features at the network and/or server level. Most object storage cloud (StaaS) implementations include a distributed, scale-out infrastructure (including load balancing) in their implementation.
Q: What's the approximate current market share of block vs. file vs. object storage deployed today? Where do you see this going in the next 5 years?
A: You can analyze this based on spending or capacity, and the two give different answers because object storage typically costs less per terabyte than block or file storage. Including all private and public cloud storage worldwide, object storage probably makes up 20-30% of the spending and 40-60% of all storage capacity. If we look only at enterprise (not cloud) storage, then object storage probably constitutes 10-15% of spending and 20-30% of capacity.
Q: There was a comment at the start of the discussion where object storage is less performant than block/file which was clarified as a myth? Can you share some performance numbers for a given size of data?
A: On average, existing object storage is less performant than existing block/file storage because it is usually deployed on top of slower storage media, slower servers, and slower networks. But there is no reason object storage needs to be any slower than block/file storage for throughput and large I/O sizes. If deployed using fast infrastructure, the fastest object storage solutions run just as fast—in throughput terms—as the fastest block or file storage. However, in many cases, object storage may not be appropriate for highly-transactional small I/O workloads, which typically run on top of block or file storage.
Q: Do I need to transform to key value or can I just query S3?
A: You don’t query S3. Retrieving an object via S3 is simply an HTTP GET request, which can be done from a browser. Many types of object storage support the S3 API, either natively or through translation, but there may be some types that require switching your applications to support a different key value storage API.
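As a minimal sketch of the "simply an HTTP GET" point, the Python below builds (but does not send) a GET request for one object using the virtual-hosted-style S3 URL form. The bucket name, key, and region are hypothetical, and the sketch assumes the object is publicly readable; a private object would additionally need a signed request, which is not shown here.

```python
from urllib.parse import quote
from urllib.request import Request


def s3_object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style URL for an object in an S3 bucket."""
    # Percent-encode the key but keep "/" so key prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


def build_get_request(bucket: str, key: str, region: str = "us-east-1") -> Request:
    """Prepare (but do not send) the HTTP GET for a single object."""
    return Request(s3_object_url(bucket, key, region), method="GET")


# Hypothetical bucket and key, for illustration only.
request = build_get_request("example-bucket", "reports/2022 summary.csv")
print(request.get_method(), request.full_url)
```

Passing the prepared request to urllib.request.urlopen would perform the actual download; the same URL pasted into a browser issues the same GET.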
Q: Where does NVMe KV Command Set (in NVMe 2.0) sit in the S3 Amazon stack? How does it change the API structure?
A: The NVMe Key Value Command Set does not sit at the same level as the S3 API. The S3 API sits above protocols like the NVMe KV Command Set. The SNIA Key Value API allows a library to be written to the NVMe KV Command Set specification, which is part of NVMe 2.0. Amazon S3 today supports use of key value pairs but does not currently employ the SNIA Key Value Storage API.
Q: Aren’t analytics on Object Storage slow and difficult? Have there been any changes in this area that make analytics faster?
A: This is one of the myths about object storage that we wanted to debunk in this webcast. Analytics on object storage is only slow if the storage itself is slow. It’s difficult only if the analytics tools cannot query object storage. While it is true that most traditional object storage deployed in the past ran on slower storage media (and connected with slower networks), there are now fast object storage solutions that can perform just as well as block or file storage solutions. In fact, some object storage software/service options include analytics capabilities built into the storage servers, and computational storage can include analytics capabilities within the drives themselves.
Q: For Kubernetes, if the client is the app why is CSI required (COSI)?
A: CSI provides an interface between the containerized app and persistent storage outside of the Kubernetes orchestrator. It allows storage vendors to support containerized applications.
Q: Is the entire KV database from a given S3 bucket being downloaded to the local drive?
A: AWS S3 sync can be used to synchronize an entire bucket to a local directory, but there are multiple ways to move data to and from AWS S3 to your local directories or other instance types.
Q: Given the volume, sensitivity, and the hybrid nature of data generation, location, and access -- does object storage include security/encryption/key management built into the solution deployments?
A: Some object storage products include encryption and key management. Others do encryption while integrating with an external key management solution. At a high level, any object storage solution should include support for encryption and other security features.
Q: Does object storage support compression and dedupe?
A: Most object storage solutions include the ability to support dedupe or single-instance storage (storing only one copy of identical objects if the same object is submitted multiple times). Some object storage solutions include support for compression performed within the storage service, but it’s more common for objects to be compressed by the application or client before being sent to the object storage system.
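To illustrate the single-instance idea described above, here is a toy sketch in Python. It assumes content is identified by its SHA-256 hash; the class and object names are hypothetical, and real products may detect duplicates differently (for example at the block or chunk level).

```python
import hashlib


class SingleInstanceStore:
    """Toy object store that keeps one copy of each distinct payload."""

    def __init__(self):
        self._blobs = {}  # content hash -> payload bytes
        self._index = {}  # object name -> content hash

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the payload only if this content has not been seen before.
        self._blobs.setdefault(digest, data)
        self._index[name] = digest
        return digest

    def get(self, name):
        return self._blobs[self._index[name]]

    def stored_copies(self):
        return len(self._blobs)


store = SingleInstanceStore()
store.put("a/report.pdf", b"quarterly numbers")
store.put("b/report-copy.pdf", b"quarterly numbers")  # identical content
store.put("c/notes.txt", b"meeting notes")
print(store.stored_copies())  # payloads kept for three object names
```

Each object name still resolves to its data, but identical submissions share one stored payload.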
Q: Amazon's S3 in-the-cloud storage means saving in data ingress-egress, but losing on the Amazon CPU to perform the analysis in Amazon's cloud compute platform, doesn't it? Not understanding how data remains "local."
A: If you’re comparing AWS S3 to on-premises local storage, whether it will be less expensive to run analytics using AWS or using your own on-prem servers depends on the scale, maturity, and efficiency of your in-house analytics. Typically, an IT department building a small or new analytics operation will find it less costly to use AWS cloud storage and cloud analytics, while a large IT organization running a scalable, mature and efficient analytics operation would find it can do so at a lower cost than outsourcing to AWS. Whether on-prem or in the cloud, object storage solutions can typically scale out further in capacity, while supporting a customizable level of processing performance based on the user’s requirements.
Q: Cheap and deep describes OpenStack Swift, which claims to be hardware agnostic (deploys on readily available commodity hardware). Then you have to add network bandwidth, CPU, SSD, etc., for what you want to do at speed, which makes it cheaper in the long run to go for a purpose-built array and fabric. Why not stay client-server at the outset, with a fast array, fast processing and fast network?
A: For geographic location, data remains local if you store it in your local data center without replicating it to remote locations. For data analytics purposes, data is “local” if it’s stored in the same data center or on the same network segment as the analytics servers. When the data and the analytics servers are in different data centers and not connected by a high-bandwidth, low-latency network, then analytics performance may suffer. This is true for object or any other type of storage solution. If the data is stored in Amazon servers, there may be less control over where data remains.
Q: Does supporting the NVMe KV Command Set in NVMe SSD/HDDs improve the performance or latency when compared to standard NVM Command Set?
A: Using SSDs/HDDs which support the NVMe KV Command Set structure should improve performance and latency over using the standard NVM Command Set, if storing an object as a key value pair.
Q: Do SSDs need to support both Command sets or just one?
A: An SSD can support just the NVMe KV Command Set, just the NVM Command Set, or both. A namespace on an NVMe SSD is formatted for one or the other. To get the benefits of the NVMe KV Command Set, an SSD only needs to implement that command set.
Q: Are there any latest updates on the KV Command Set ecosystem in Linux?
A: The latest drivers for Linux are available on a public GitHub site at: https://github.com/OpenMPDK/KVSSD
Q: Computational storage with S3 SELECT: Usually, an object storage solution doesn't write objects to a single disk, there is some kind of erasure coding for data protection and probably some file system as an abstraction layer which the disk may not be aware of. Also, the data is usually encrypted. How would S3 SELECT be able to parse the original object data on a single drive?
A: Yes, most object storage solutions use erasure coding or a simple mirroring mechanism to ensure each object is stored in redundant locations, and yes, erasure coding usually splits up each object across multiple drives. A storage-side query such as AWS S3 Select runs on or near the object storage servers and returns a subset of the data to the requestor instead of returning the entire object. In this type of query, the object storage servers can decrypt the data before executing the local query, if the encryption was done on the object server side. (If the encryption was done by the client before the data was sent to the object storage, then the queries would not be able to run at or on the storage servers.) The storage servers would also be able to reassemble an erasure-coded object locally to run the query, or possibly distribute and run the query on the multiple erasure coding destinations for that object.
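The difference between fetching a whole object and running a storage-side query can be sketched with a small simulation. This is not the S3 Select API: the object contents, function names, and the equality filter are all hypothetical, and the sketch assumes the storage server already holds the object decrypted and reassembled.

```python
import csv
import io

# A hypothetical CSV object as it would sit on the storage servers.
OBJECT_BYTES = (
    "sensor,region,reading\n"
    "s1,east,20\n"
    "s2,west,35\n"
    "s3,east,41\n"
    "s4,west,18\n"
).encode()


def get_whole_object():
    """Plain GET: the full object crosses the network to the client."""
    return OBJECT_BYTES


def select_on_storage(column, value):
    """Storage-side query: filter next to the data, return only matches."""
    reader = csv.DictReader(io.StringIO(OBJECT_BYTES.decode()))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)
    return out.getvalue().encode()


full = get_whole_object()
subset = select_on_storage("region", "east")
print(len(full), len(subset))  # bytes sent to the client in each case
```

Only the matching rows leave the storage side, which is why client-side encryption (where the servers cannot read the rows) rules out this kind of query.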
Interested in more information on object storage? Check out the SNIA Educational Library.
Feb 11, 2022
Our new SNIA Compute, Memory, and Storage webcast focuses on a hot topic – storage-based cryptocurrency.
Blockchains, cryptocurrency, and the internet of markets are working to transform finance, wealth, safety, digital security, and trust. Storage-based cryptocurrencies had a breakout year in 2021. Proof of Space and Time is a new blockchain consensus that uses storage capacity to secure the blockchain. Decentralized file storage will enable alternatives to hyperscale data centers for hosting files and objects. Understanding the TCO of a storage system and optimizing the utilization of the storage hardware is critical in scaling these systems.
Join our speakers, Jonmichael Hands of Chia Network and Eli Tiomkin of NGD Systems, for this discussion on how a new approach of auto-plotting SSDs combined with computational storage can lower TCO. Registration is free for this webcast on Tuesday, February 15 at 10:00 am Pacific time. Click on the link to register and see you there! https://www.brighttalk.com/webcast/663/526154
Feb 9, 2022
Ransomware is a malware attack that uses a variety of methods to prevent or limit an organization or individual from accessing their IT systems and data, either by locking the system's screen, or by encrypting files until a ransom is paid, usually in cryptocurrency for reasons of anonymity.
By encrypting these files and demanding a ransom payment for the decryption key, the malware places organizations in a position where paying the ransom is the easiest and most cost-effective way to regain access to their files. It should be noted, however, that paying the ransom does not guarantee that users will get the decryption key required to regain access to the infected system or files.
In some instances, the perpetrators may steal an organization’s information and demand an additional payment in return for not disclosing the information to authorities, competitors or the public, something that would inflict reputational damage on the organization.
Ransomware operators are now so proficient that they use artificial intelligence to analyze the victim’s environment and ensure that recovering files is extremely difficult, if not impossible. Additionally, cybercriminals are offering RaaS (ransomware-as-a-service) to organized crime and government agencies to help them launch an attack while they reap the benefits. That may explain why large organizations, which theoretically have large sums of money to pay ransoms, are currently more likely to be targeted than individuals.
However, the landscape is changing, and ransomware is no longer just about a financial ransom: attacks are now also aimed at public services, utilities and infrastructure, undermining public confidence.
Is Ransomware different from malware?
Ransomware is a cyber-attack whose sole purpose is financial gain. The cybercriminals ensure that a path to a decryption key is available so that they can sell it to victims. In many cases, however, the decryption key helps only partially or not at all, depending on the level of damage incurred by the organization while trying to recover before giving up and agreeing to pay the ransom. Other malware, by contrast, is meant to damage the victim organization: there is no decryption key, or the malware simply encrypts or deletes the victim’s systems beyond recovery, and there is no demand for a ransom.
How does ransomware work?
Ransomware can be unwittingly downloaded by visiting malicious or compromised websites or by downloading from malicious pages or advertisements. It can also be delivered as an attachment or a link in an email which is known as a phishing attack.
Once in the system, ransomware can either lock the computer screen or encrypt predetermined files. The user will see a full-screen image or notification displayed on an infected system's screen, which states the method used to prevent the victim from using their system and will indicate how the user can pay the ransom. Alternatively, the ransomware will prevent access to potentially critical or valuable files like documents and spreadsheets.
Implications for data protection methods and/or disaster recovery
Ransomware attacks can sometimes use what is called a “Trojan”, where there is a time lag between the first system infected and the detonation (activation of malicious code). During that time, the malicious code copies itself to all connected systems to ensure maximum damage to the victim’s environment. Depending on the frequency of replication between the production and disaster recovery environment, the malicious code will use the replication to infect the Disaster Recovery (DR) environment. For example, if the victim’s disaster recovery uses synchronous replication the malicious code will propagate immediately to the DR site, and once the malicious code is activated, both the production and DR environments will be locked.
Moreover, if there is a time lag between infection and activation, the malicious code will likely be included in the backup. Additionally, in most cases, the cybercriminal will study the victim’s environment to understand the backup retention policy and extend the time lag between infection and activation to ensure all backups are infected. Once all backup generations are infected, the cybercriminal will have full control over the victim’s environment.
So rather than acting as a data protection procedure, disaster recovery can help spread the malicious code and any recovery/backup data will be equally affected along with production data.
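The interaction between the infection-to-activation time lag and the backup retention window can be made concrete with a short sketch. The dates, the 30-day daily-backup policy, and the function names are hypothetical, and the model assumes every backup taken on or after the infection date contains the malicious code.

```python
from datetime import date, timedelta


def retained_backups(today, retention_days, interval_days=1):
    """List backup dates still held under a simple rolling retention policy."""
    return [today - timedelta(days=n)
            for n in range(0, retention_days, interval_days)]


def clean_backups(backup_dates, infection_date):
    """Return the backups taken strictly before the infection date."""
    return [d for d in backup_dates if d < infection_date]


# Hypothetical scenario: 30 days of daily backups, detonation happens today.
today = date(2022, 2, 9)
backups = retained_backups(today, retention_days=30)

# Time lag shorter than the retention window: some clean backups remain.
short_lag = clean_backups(backups, today - timedelta(days=10))
# Time lag longer than the retention window: every retained backup is infected.
long_lag = clean_backups(backups, today - timedelta(days=45))

print(len(short_lag), len(long_lag))
```

In this model, once the attacker waits longer than the retention window, no retained backup predates the infection, which is the situation the preceding paragraphs describe.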
How to mitigate and recover from a ransomware attack
Businesses are now beginning to realize that it is no longer a question of if they will be attacked, but when. Given the scope and sophistication of current threats, what can businesses realistically do to prevent such attacks, or recover from them?
Payment of the ransom – As previously stated, depending on the motive for the attack, paying the ransom does not necessarily guarantee that the organization will get the decryption key required to regain access to the infected system or files. However, it is understandable that many organizations are placed in the unenviable position where paying the ransom is the easiest and most cost-effective path. To dissuade businesses from taking this path, the US Treasury has issued guidelines that strongly discourage the payment of ransoms or extortion demands, with possible sanctions for businesses that pay. It instead encourages businesses to adopt the CISA (Cybersecurity and Infrastructure Security Agency) recommendations and to report incidents to CISA and the Federal Bureau of Investigation.
Cyber insurance - Some businesses have taken the approach of accepting that they will be attacked and lose data, and that cyber insurance will cover any loss. There is increasing evidence that insurance companies are unwilling to meet those claims, especially where there is no motivation or strategy for risk management, or at least minimum steps towards prevention of the threat.
Best practices - As an example of how to deal with a ransomware threat, CISA has issued a series of recommendations to protect networks from a ransomware attack:
The next steps are to mitigate the threat through the processes of containment, eradication, and recovery. Containment means isolating the infection so that it cannot spread or cause further damage. Eradication means eliminating and destroying all instances of the malware. Recovery generally means restoring from uncontaminated offline backups to regain integrity and confidence.
The final step is to articulate the lessons learned and feed them back into the incident planning process in a cyclical manner.
Reference Material
Definitions
Malware, short for malicious software, is a blanket term for viruses, worms, Trojans, and other harmful software that attackers use to gain access to sensitive information illegally. Software is identified as malware based on its intended nefarious use (such as identity theft or even total data destruction), rather than a particular technique or technology used to build it.
SNIA Dictionary Definitions
malware
[Computer System] [Data Security]
Malicious software designed specifically to damage or disrupt a system, attacking confidentiality, integrity and/or availability. [ISO/IEC 27033-1]
Examples are a computer virus, computer worm, Trojan horse, spyware, adware, ransomware, or scareware.
ransomware
[Data Security]
A type of malicious software designed to block access to data until funds are paid.
Feb 2, 2022