Register here to join us on April 9, 2024 for what will surely be a fascinating discussion on the impact of AIOps.
Ceph, an Open Source project for enterprise unified software-defined storage, represents a compelling solution for this cloud native on-premises architecture and will be the topic of our next SNIA Cloud Storage Technologies Initiative webinar, “Ceph: The Linux of Storage Today.”
This webinar will discuss:
We will describe how Ceph is gaining industry momentum, satisfying enterprise architectures’ data storage needs and how the technology community is investing to enable the vision of “Ceph, the Linux of Storage Today.”
Register today to join us for this timely discussion.
The audience was highly engaged during the live event and asked several great questions. Here are answers to them all.
Q. Do you see the need for fast object storage for AI kind of workloads?
A. Yes, the demand for fast object storage in AI workloads is growing. Initially, object storage was mainly used for backup or archival purposes. However, its evolution into Data Lakes and the introduction of features like the S3 SELECT API have made it more suitable for data analytics. The launch of Amazon’s S3 Express, a faster yet more expensive tier, is a clear indication of this trend. Other vendors are following suit, suggesting a shift towards object storage as a primary data storage platform for specific workloads.
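To give a feel for what the S3 SELECT API does, the sketch below applies a SELECT-style predicate locally to in-memory records. The field names, sizes, and threshold are made up for the example; the point is that the real API performs this filtering inside the storage service, so only matching rows cross the network.

```python
# Illustrative only: S3 SELECT runs SQL such as
#   SELECT s.name FROM s3object s WHERE s.size > 1048576
# server-side. This local sketch mimics that filtering in memory.

def select_names_larger_than(records, min_size):
    """Return the 'name' field of every record whose 'size' exceeds min_size."""
    return [r["name"] for r in records if r["size"] > min_size]

objects = [
    {"name": "logs/a.json", "size": 2_000_000},
    {"name": "logs/b.json", "size": 500},
]
print(select_names_larger_than(objects, 1_048_576))  # only the large object
```

The economic argument for S3 SELECT is the same as for any pushdown: filtering where the data lives avoids transferring (and paying for) rows the analysis will discard anyway.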
Q. As Object Storage becomes more prevalent in the primary storage space, could you talk about data protection, especially functionalities like synchronous replication and multi-site deployments – or is your view that this is not needed for object storage deployments?
A. Data protection, including functionalities like synchronous replication and multi-site deployments, is essential for object storage, especially as it becomes more prevalent in primary storage. Various object storage implementations address this differently. For instance, Amazon S3 supports asynchronous replication. Azure ZRS (Zone-redundant storage) offers something akin to synchronous replication within a specific geographical area. Many on-premises solutions provide multi-site deployment and replication capabilities. It’s crucial for vendors to offer distinct features and value additions, giving customers a range of choices to best meet their specific requirements. Ultimately, customers must define their data availability and durability needs and select the solution that aligns with their use case.
Q. Regarding polling question #3 during the webinar, why did the question only ask “above 10PB?” We look for multi-PB like 100PB … does this mean object storage is not suitable for multi-PB deployments?
A. Object storage is inherently scalable and can support deployments ranging from petabyte to exabyte scale. However, scalability can vary based on specific implementations. Each object storage solution may have its own limits in terms of capacity. It’s important to review the details of each solution to ensure it meets your specific needs for multi-petabyte scale deployments.
Q. Is Wasabi 100% Compatible with Amazon S3?
A. While we typically avoid discussing specific vendors in a general forum, it’s important to note that most ‘S3-compatible’ object storage implementations have some discrepancies when compared to Amazon S3. These differences can vary in significance. Therefore, we always recommend testing your actual workload against the specific object storage solution to identify any critical issues or incompatibilities.
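One lightweight way to start such testing is to diff the response metadata your application depends on against what the target store actually returns. The helper below is a generic sketch, not a definitive compatibility suite; the header names and values are hypothetical examples.

```python
def header_discrepancies(expected, actual):
    """Compare the response headers an application relies on (expected)
    against those a candidate store returned (actual).
    Returns {header: ("missing", None)} or {header: ("mismatch", actual_value)}."""
    issues = {}
    for name, value in expected.items():
        if name not in actual:
            issues[name] = ("missing", None)
        elif actual[name] != value:
            issues[name] = ("mismatch", actual[name])
    return issues

aws_like = {"ETag": '"abc123"', "x-amz-version-id": "v1"}
candidate = {"ETag": '"abc123"'}  # hypothetical store that omits versioning headers
print(header_discrepancies(aws_like, candidate))
```

A report like this only surfaces differences; whether a missing header is critical depends on whether your workload actually consumes it, which is exactly why testing the real workload matters.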
Q. What are the best ways to see a unified view of different types of storage — including objects, file and blocks? This may be most relevant for enterprise-wide data tracking and multi-cloud deployments.
A. There are various solutions available from different vendors that offer visibility into multiple types of storage, including object, file, and block storage. These solutions are particularly useful for enterprise-wide data management and multi-cloud deployments. However, this topic extends beyond the scope of our current discussion. SNIA might consider addressing this in a separate, dedicated webinar in the future.
Q. Is there any standard object storage implementation against which the S3 compatibility would be defined?
A. Amazon S3 serves as a de facto standard for object storage implementation. Independent software vendors (ISVs) can decide the degree of compatibility they want to achieve with Amazon S3, including which features to implement and to what extent. The objective isn’t necessarily to achieve identical functionality across all implementations, but rather for each ISV to be cognizant of the specific differences and potential incompatibilities in their own solutions. Being aware of these discrepancies is key, even if complete compatibility isn’t achieved.
Q. With the introduction of directory buckets, do you anticipate vendors picking up compatibility there as well or maintaining a strictly flat namespace?
A. That’s an intriguing question. We are putting together an ongoing object storage forum, which will delve into these topics in follow-up calls and serve as a platform for this kind of discussion. We anticipate addressing not only the concept of directory buckets versus a flat namespace, but also exploring other ideas like performance enhancements and alternate transport layers for S3. This forum is intended to be a collaborative space for discussing future directions in object storage. If you’re interested, contact cloudtwg@snia.org.
Q. How would an incompatibility be categorized as something that is important for clients vs. just something that doesn’t meet the AWS spec/behavior?
A. Incompatibilities should be assessed based on the specific needs and priorities of each implementor. While we don’t set universal compatibility goals, it’s up to every implementor to determine how closely they align with S3 or other protocols. They must decide whether to address any discrepancies in behavior or functionality based on their own objectives and their clients’ requirements. Essentially, the significance of an incompatibility is determined by its impact on the implementor’s goals and client needs.
Q. Have customers experienced incompatibilities around different SDKs with regard to HA behaviors? Load balancers vs. round robin DNS vs. other HA techniques on-prem and in the cloud?
A. Yes, customers do encounter incompatibilities related to different SDKs, particularly concerning high availability (HA) behaviors. Object storage encompasses more than just APIs; it also involves implementation choices like load balancing decisions and HA techniques. Discrepancies often arise due to these differences, especially when object storage solutions are deployed within a customer’s data center and need to integrate with the existing networking infrastructure. These incompatibilities can be due to various factors, including whether load balancing is handled through round robin DNS, dedicated load balancers, or other HA techniques, either on-premises or in the cloud.
Q. Any thoughts on keeping pace with AWS as they evolve the S3 API? I’m specifically thinking about the new Directory Bucket type and the associated API changes to support hierarchy.
A. We at the SNIA Cloud Storage Technical Work Group are in dialogue with Amazon and are encouraging their participation in our planned Plugfest at SDC’24. Their involvement would be invaluable in helping us anticipate upcoming changes and understand new developments, such as the Directory Bucket type and its associated API changes. This new variation of S3 from Amazon, which differs from the original implementation, underscores the importance of compatibility testing. While complete compatibility may not always be achievable, it’s crucial for ISVs to be fully aware of how their implementations differ from S3’s evolving standards.
Q. When it comes to object store data protection with backup software, do you see also some incompatibilities with recovered data?
A. When data is backed up to an object storage system, there’s a fundamental expectation that it can be reliably retrieved later. This reliability is a cornerstone of any storage platform. However, issues can arise when data is initially stored in one specific object storage implementation and later transferred to a different one. If this transfer isn’t executed in accordance with the backup software provider’s requirements, it could lead to difficulties in accessing the data in the future. Therefore, careful planning and adherence to recommended practices are crucial during any data migration process to prevent such compatibility issues.
The SNIA Cloud Storage Technical Work Group is actively working on this topic. If you want to get involved, reach out at cloudtwg@snia.org and follow us @SNIAcloud
Q. How are the queues assigned to individual namespaces? How many queues are assigned for a particular namespace, can we customize it and if so, how? What is the difference between normal namespace and SR-IOV enabled namespace? Can you please explain sets domain and endurance group?
A. For NVMe® namespace management, the Cloud Storage TWG recommends the use of the SNIA Swordfish® Specification. The SNIA Cloud Data Management Interface (CDMI) is designed to provide a cloud abstraction that hides the complexities and details of the underlying storage implementation and infrastructure from the cloud user. Hiding storage implementation and infrastructure complexities is a key part of what makes a cloud a cloud, in that:
a) knowledge and specification of the underlying infrastructure should not be required in order to consume storage services (simplification of use),
b) the underlying infrastructure will be constantly and dynamically changing, and these changes should not be visible to the end user (to enable operational flexibility for the cloud provider), and,
c) the desired quality of service, data protection, and other client-required storage services should be indicated by intent, rather than as a side-effect of the underlying configuration (ideally via declarative metadata).
These are also three key principles of CDMI, which guides us to avoid directly exposing or managing the underlying storage infrastructure.
Q. How do the responses behave if the responses are really large – i.e. Can I get just the metadata that might warn me there are 70 billion objects at the top level and I don’t really want to spend the time to get them all before deciding or diving into one of them?
A. When obtaining information about data objects and containers, CDMI allows the request to specify which fields should be returned in the JSON response. This is described in sections 8.4.1 and 9.4.1 of the CDMI specification, and uses standard URI query parameters.
For example, to get the object name, metadata and number of children for a container, the following request URI would be used: GET /cdmi/2.0.0/MyContainer/?objectName&metadata&childrenrange
CDMI also allows range requests for listing a subset of the children of a container. For example, listing the first 200 children of a container would be accomplished by using the following request URI: GET /cdmi/2.0.0/MyContainer/?children=0-199
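The two request patterns above can be composed programmatically. The small helper below builds those URIs; it sketches only the query-parameter syntax shown here, not full CDMI client behavior.

```python
def cdmi_uri(container, fields=None, children=None):
    """Build a CDMI GET request URI for a container.
    fields: list of field names to return (bare query parameters).
    children: (start, end) tuple for a children range request."""
    params = list(fields or [])
    if children is not None:
        start, end = children
        params.append(f"children={start}-{end}")
    query = "&".join(params)
    return f"/cdmi/2.0.0/{container}/" + (f"?{query}" if query else "")

# Reproduces the two request URIs from the text:
print(cdmi_uri("MyContainer", fields=["objectName", "metadata", "childrenrange"]))
print(cdmi_uri("MyContainer", children=(0, 199)))
```

Fetching only names, metadata, and child counts this way is what lets a client inspect a container holding billions of children before deciding whether to enumerate any of them.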
There also is a draft extension to CDMI to support recursive children listing and obtaining information about children in a single request, which can dramatically reduce the number of requests required to enumerate a container when information about each child is required.
Q. Where can I go to get help using CDMI with my application?
A. SNIA provides free access to the CDMI specification on the SNIA website. Extensions to the CDMI standard are also publicly available here. The SNIA Cloud Storage Technical Work Group (TWG) provides implementation assistance and discusses extensions and errata to the standard as part of its weekly work group calls. Interested parties are encouraged to join the TWG.
Q. If I were not using CDMI, what tools or methods would I need to incorporate to do the same kind of operations? What else is out there?
A. The Cloud Storage TWG does not know of any similar standards for namespace management. In order to manage namespaces without using CDMI, one would need to do the following:
a) Define or select an HTTP-based protocol that provides basic request/response semantics and includes authentication. This is provided by all of the cloud providers for their cloud APIs.
b) Define or select a set of APIs for enumerating namespaces, for example, the ListBuckets API in AWS S3, and the Azure Files List Directories and Files API in Microsoft Azure.
c) Define a set of APIs for listing and specifying how namespaces (files, directories, objects and containers) can be exported or imported.
While each of these exists for the major cloud providers, they are unique for each provider and storage type. CDMI provides a common, open and unified way to manage all types of storage namespaces.
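To make that contrast concrete, the sketch below hides per-provider enumeration calls behind one common function, in the spirit of what CDMI standardizes. The provider classes and their return values are stand-ins for illustration, not real SDK calls.

```python
# Each provider exposes namespace enumeration under a different API
# (e.g., ListBuckets in S3, List Directories and Files in Azure Files).
# A unified layer hides those differences behind one call, which is the
# kind of abstraction an open standard like CDMI provides.

class S3LikeProvider:
    def list_buckets(self):
        return ["bucket-a", "bucket-b"]        # stand-in data

class AzureFilesLikeProvider:
    def list_directories_and_files(self, share):
        return ["dir1", "file1.txt"]           # stand-in data

def list_namespaces(provider):
    """Dispatch to whichever enumeration API the provider offers."""
    if hasattr(provider, "list_buckets"):
        return provider.list_buckets()
    if hasattr(provider, "list_directories_and_files"):
        return provider.list_directories_and_files(share="default")
    raise NotImplementedError("no known namespace-enumeration API")

print(list_namespaces(S3LikeProvider()))
print(list_namespaces(AzureFilesLikeProvider()))
```

Without a standard, every application carries dispatch code like this for each provider and storage type it supports; with one, the provider implements the common interface instead.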
Q. How does CDMI help address security in my namespace management?
A. CDMI provides a number of security functions that assist with namespace management:
a) Every object in CDMI, including namespaces, can have an access control list (ACL) that specifies what operations can be performed against that object. This is described in section 17.2 of the CDMI specification. ACLs are based on standard NFSv4 ACLs, and allow metadata modifications (e.g., CDMI exports and CDMI imports) to have separate access control entries (ACEs).
b) CDMI objects can have their access control decisions delegated to a customer-provided system via Delegated Access Control (DAC), which can provide finer-grained access control than ACLs where needed, as needed. This allows policies to take into account the specific import and export requests themselves, and to interface with policy enforcement frameworks such as XACML and open source policy engines such as the Open Policy Agent (OPA).
c) CDMI allows mapping of user credentials to the user principal and group to be performed by external systems, such as Active Directory. This mapping can be on an object-by-object basis, allowing objects managed by different security domains to co-exist within a single unified namespace.
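As a minimal sketch of the ACL mechanism from point a), the evaluator below applies an ordered list of ACEs with first-match-wins, default-deny semantics. This is illustrative pseudologic with made-up principals and operations, not the actual CDMI/NFSv4 ACE encoding from section 17.2.

```python
def is_allowed(acl, principal, operation):
    """Evaluate an ordered list of ACEs; the first matching entry wins.
    Each ACE is (ace_type 'ALLOW'/'DENY', principal, set of operations)."""
    for ace_type, who, ops in acl:
        if who == principal and operation in ops:
            return ace_type == "ALLOW"
    return False  # default deny when no ACE matches

# Hypothetical ACL: admins may modify metadata (e.g., exports/imports),
# while ordinary users may only read -- separate ACEs per operation class.
acl = [
    ("ALLOW", "admin", {"read", "write_metadata"}),
    ("ALLOW", "user",  {"read"}),
]
print(is_allowed(acl, "admin", "write_metadata"))  # True
print(is_allowed(acl, "user", "write_metadata"))   # False
```

Separating metadata operations into their own ACEs is what allows, say, an operator to reconfigure exports on a namespace without also being granted access to the data inside it.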
The SNIA Cloud Storage Technologies Initiative, together with the SNIA Cloud Storage Technical Work Group, is working to address the issues of cloud object storage complexity and interoperability. We’re kicking off 2024 with two exciting initiatives: 1) a webinar on January 9, 2024, and 2) a Plugfest in September of 2024. Here are the details:
Webinar: Navigating the Complexities of Object Storage Compatibility
In this webinar, we’ll highlight real-world incompatibilities found in various object storage implementations. We’ll discuss specific examples of existing discrepancies, such as missing or incorrect response headers, unsupported API calls, and unexpected behavior. We’ll also describe the implications these have on actual client applications.
This analysis is based on years of experience with implementation, deployment, and evaluation of a wide range of object storage systems on the market. Attendees will leave with a deeper understanding of the challenges around compatibility and how to address them in their own applications.
Register here to join us on January 9, 2024.
Plugfest: Cloud Object Storage Plugfest
SNIA is planning an open collaborative Cloud Object Storage Plugfest co-located at SNIA Storage Developer Conference (SDC) scheduled for September 2024 to work on improving cross-implementation compatibility for client and/or server implementations of private and public cloud object storage solutions.
This endeavor is designed to be an independent, vendor-neutral effort with broad industry support, focused on a variety of solutions, including on-premises and in the cloud. This Plugfest aims to reduce compatibility issues, thus improving customer experience and increasing the adoption rate of object storage solutions.
Click here to let us know if you’re interested.
We hope you will consider participating in both of these initiatives!
Our live audience asked several interesting questions. Here are answers from our presenters.
Q. With the rise of large language models (LLMs) what role will edge AI play?
A. LLMs are very good at predicting events based on previous data, often referred to as next-token prediction in LLMs. Many edge use cases are also about predicting the next event, e.g., a machine is going to go down, predicting an outage, a security breach on the network and so on. One of the challenges of applying LLMs to these use cases is converting the data (tokens) into text with the right context.
Q. After you create an AI model how often do you need to update it?
A. That is very dependent on the dataset itself, the use case KPIs, and the techniques used (e.g., network backbone and architecture). Data collection cycles used to be very long in order to capture outliers and rare events. The industry is moving away from this kind of development because of its cost and the time required. Instead, most customers start with a few data points and iterate, updating their models more often. Such a strategy enables a faster return on investment, since you deploy a model as soon as it is good enough. Also, new AI techniques such as unsupervised learning or selective annotation can make some models self-learning, or at least self-adapting.
Q. Deploying AI is costly, what use cases tend to be cost effective?
A. Just like any technology, prices drop as scale increases. We are at an inflection point where we will see more use cases become feasible to develop and deploy. But yes, many use cases might not have an ROI. Typically, we recommend starting with use cases that are business critical or have the potential of improving yield, quality, or both.
Q. Do you have any measurements about energy usage for edge AI? Wondering if there is an ecological argument for edge AI in addition to the others mentioned?
A. This is a very good question and top of mind for many in the industry. There is no data yet to support sustainability claims; however, running AI at the edge can provide more control and further refinement for making tradeoffs in relation to corporate goals, including sustainability. Of course, compute at the edge reduces data transfer and the environmental impact of these functions.
There is good news on addressing this issue, and the SNIA Cloud Storage Technologies Initiative (CSTI) will explain how in our live webinar “Simplified Namespace Management – The Open Standards Way” on October 18, 2023. David Slik, Chair of the SNIA Cloud Storage Technical Work Group, will demonstrate how the SNIA Cloud Data Management Interface (CDMI), an open ISO standard (ISO/IEC 17826:2022) for managing data objects and containers, already includes extensive capabilities for simplifying the management of complex namespaces.
In this webinar, you’ll learn the benefits of simplifying namespace management in an open standards way, including namespace discovery, introspection, exports, imports and more, discussing:
As one of the key architects of CDMI, David will dive into the details, discuss real-world use cases and answer your questions. We hope you’ll join us on October 18th. Register here.
Q. Are businesses using Confidential AI today?
A. Absolutely, we have seen a big increase in adoption of Confidential AI particularly in industries such as Financial Services, Healthcare and Government, where Confidential AI is helping these organizations enhance risk mitigation, including cybercrime prevention, anti-money laundering, fraud prevention and more.
Q: With compute capabilities on the Edge increasing, how do you see Trusted Execution Environments evolving?
A. One of the important things about Confidential Computing is although it’s a discrete privacy enhancing technology, it’s part of the underlying broader, distributed data center compute hardware. However, the Edge is going to be increasingly important as we look ahead to things like 6G communication networks. We see a role for AI at the Edge in terms of things like signal processing and data quality evaluation, particularly in situations where the data is being sourced from different endpoints.
Q: Can you elaborate on attestation within a Trusted Execution Environment (TEE)?
A. One of the critical things about Confidential Computing is the need for an attested Trusted Execution Environment. In order to have that reassurance of confidentiality and the isolation and integrity guarantees that we spoke about during the webinar, attestation is the foundational truth of Confidential Computing and is absolutely necessary. In every secure implementation of confidential AI, attestation provides the assurance that you’re working in that protected memory region, that data and software instructions can be secured in memory, and that the AI workload itself is shielded from the other elements of the computing system. If you’re starting with hardware-based technology, then you have the utmost security, removing the majority of actors outside of the boundary of your trust. However, this also creates a level of isolation that you might not want to use for an application that doesn’t need this high level of security. You must balance utmost security with your application’s appetite for risk.
Q: What is your favorite reference for implementing Confidential Computing that bypasses the OS, BIOS, VMM (Virtual Machine Manager) and uses the root trust certificate?
A. It’s important to know that there are different implementations of Trusted Execution Environments, and they are very relevant to different types of purposes. For example, there are process-based TEEs that enable a very discrete definition of a TEE and provide the ability to write specific code and protect very sensitive information because of the isolation from things like the hypervisor and virtual machine manager. There are also different technologies available now that have a virtualization basis and include a guest operating system within their trusted computing base, but they provide greater flexibility in terms of implementation, so you might want to use that when you have a larger application or a more complex deployment. The Confidential Computing Consortium, which is part of The Linux Foundation, is also a good resource to keep up with Confidential AI guidance.
Q: Can you please give us a picture of the upcoming standards for strengthening security? Do you believe that European Union’s AI Act (EU AI Act) is going in the right direction and that it will have a positive impact on the industry?
A. That’s a good question. The draft EU AI Act was approved in June 2023 by the European Parliament, but the UN Security Council has also put out a call for international regulation in the same way that we have treaties and conventions. We think what we’re going to see is different nation states taking discrete approaches. The UK has taken an open approach to AI regulation in order to stimulate innovation. The EU already has a very prescriptive data protection regulation method, and the EU AI Act takes a similar approach. It’s quite prescriptive and designed to complement data privacy regulations that already exist.
Q. Where do you think some of the biggest data privacy issues are within generative AI?
A. There’s quite a lot of debate already about how these massive generative AI systems have used data scraped from the web, whether things like copyright provisions have been acknowledged, and whether data privacy in imagery from social media has been respected. At an international level, it’s going to be interesting to see whether people can agree on a cohesive framework to regulate AI and to see if different countries can agree. There’s also the issue of the time required to develop legislation being superseded by technological developments. We saw ChatGPT to be very disruptive last year. There are also ethical considerations around this topic which the SNIA CSTI covered in a webinar “The Ethics of Artificial Intelligence.”
Q. Are you optimistic that regulators can come to an agreement on generative AI?
A. In the last four or five years, regulators have become more open to working with financial institutions to better understand the impact of adopting new technologies such as AI and generative AI. This collaboration among regulators with those in the financial sector is creating momentum. Regulators such as the Monetary Authority of Singapore are leading this strategy, actively working with vendors to understand the technology application within financial services and how to guide the rest of the banking industry.
The impact of edge AI is the topic for our next SNIA Cloud Storage Technologies Initiative (CSTI) live webinar, “Why Distributed Edge Data is the Future of AI,” on October 3, 2023. Centralized (or cloud) AI is a single superpowered expert; edge AI is a community of many smart wizards whose cumulative knowledge can outweigh that central superpower. In this webinar, our SNIA experts will discuss:
Register here to join us on October 3rd. Our experts will be ready to answer your questions.
How a data fabric abstraction layer works and the benefits it delivers was the topic of our recent SNIA Cloud Storage Technologies Initiative (CSTI) webinar, “Data Fabric: Connecting the Dots between Structured and Unstructured Data.” If you missed it, you can watch it on-demand and access the presentation slides at the SNIA Educational Library.
We did not have time to answer audience questions at the live session. Here are answers from our expert, Joseph Dain.
Q. What are some of the biggest challenges you have encountered when building this architecture?
A. The scale of unstructured data makes it challenging to build a catalog of this information. With structured data you may have thousands or hundreds of thousands of table assets, but in unstructured data you can have billions of files and objects that need to be tracked at massive scale.
Another challenge is masking unstructured data. With structured data you have a well-defined schema so it is easier to mask specific columns but in unstructured data you don’t have such a schema so you need to be able to understand what term needs to be masked in an unstructured document and you need to know the location of that field without having the luxury of a well-defined schema to guide you.
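As a toy illustration of that problem, the regex-based sketch below masks one PII pattern, a US-style SSN, wherever it appears in free text. Real systems need classifiers far more robust than a single hypothetical pattern, precisely because no schema tells you where sensitive fields live.

```python
import re

# Without a schema, masking must find sensitive values by pattern or
# classification. This sketch handles just one made-up case: a
# US-style SSN (###-##-####) anywhere in an unstructured document.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def mask_ssns(text, mask="***-**-****"):
    """Replace every SSN-shaped substring with a fixed mask."""
    return SSN_PATTERN.sub(mask, text)

doc = "Customer note: SSN 123-45-6789 on file, call back Tuesday."
print(mask_ssns(doc))
```

Contrast this with the structured case described above, where masking is simply dropping or transforming a known column; here every document must be scanned and classified.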
Q. There can be lots of data access requests from many users. How is this handled?
A. The data governance layer has two aspects that are leveraged to address this. The first aspect is data privacy rules which are automatically enforced during data access requests and are typically controlled at a group level. The second aspect is the ability to create custom workflows with personas that enable users to initiate data access requests which are sent to the appropriate approvers.
Q. What are some of the next steps with this architecture?
A. One area of interest is leveraging computational storage to do the classification and profiling of data to identify aspects such as personally identifiable information (PII). In particular, profiling vast amounts of unstructured data for PII is a compute, network, storage, and memory intense operation. By performing this profiling leveraging computational storage close to the data, we gain efficiencies in the rate at which we can process data with less resource consumption.
We continue to offer educational webinars on a wide range of cloud-related topics throughout the year. Please follow us @SNIACloud to make sure you don’t miss any.