The Power of Data Aggregation during a Pandemic

The new coronavirus that has been ravaging countries and sending us all into lockdown is the most closely observed pandemic we’ve ever experienced. Data about the virus itself and, perhaps more pertinently, about the nations it is affecting has been shared by multiple sources. These include academic institutions such as Johns Hopkins University, national governments, and international organisations such as the World Health Organization. The data has been made available in many formats, from programmatically accessible APIs to downloadable comma-delimited files to prepared data visualisations. We’ve never been more informed about the current status of anything.

Data Aggregation

What this newfound wealth of data has also brought to light is the true power of data aggregation. Only a limited number of conclusions can be drawn from the number of active and resolved cases per nation and region. Over time, those counts show us a trend, and they give a very real snapshot of where we stand today. However, if we layer on additional data, such as when actions were taken, we get a much clearer picture of the impact of each strategy over time. With each nation taking a different approach based on its own perceived position, mixed with culture and other socio-economic factors, we end up with a good side-by-side comparison of the strategies and their effectiveness. This is helping organisations and governments make decisions going forward, but data scientists globally are urging caution. In fact, the data we are producing today by processing all of these feeds may turn out to be far more valuable for the next pandemic than it is for this one. It will be the analysis that helps create the “new normal.”
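As a rough illustration of layering an intervention date onto a case series, here is a minimal Python sketch. The function names, the dates, and the case counts are all hypothetical and synthetic, not real pandemic data. The comparison shows correlation only, which is one reason for the caution data scientists are urging.

```python
from datetime import date, timedelta

def mean_daily_growth(series):
    """Average day-over-day growth ratio for a list of cumulative counts."""
    ratios = [b / a for a, b in zip(series, series[1:]) if a > 0]
    return sum(ratios) / len(ratios) if ratios else float("nan")

def growth_around_event(counts_by_day, event_day):
    """Split a {date: cumulative_cases} series at event_day and compare growth."""
    days = sorted(counts_by_day)
    before = [counts_by_day[d] for d in days if d <= event_day]
    after = [counts_by_day[d] for d in days if d >= event_day]
    return mean_daily_growth(before), mean_daily_growth(after)

# Synthetic, made-up numbers purely for illustration (not real case data).
start = date(2020, 3, 1)
counts = [100, 200, 400, 800, 880, 968]
series = {start + timedelta(days=i): c for i, c in enumerate(counts)}
lockdown = start + timedelta(days=3)  # hypothetical intervention date

before, after = growth_around_event(series, lockdown)
print(f"growth before: {before:.2f}x/day, after: {after:.2f}x/day")
```

Running the same comparison across several nations, each with its own intervention dates, gives the kind of side-by-side view described above.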


Exploring the Software Defined Data Center – A SNIA Cloud Webcast

SNIA Cloud is pleased to announce our next live Webcast, “Exploring the Software Defined Data Center.” A Software Defined Data Center (SDDC) is a compute facility in which all elements of the infrastructure – networking, storage, CPU and security – are virtualized and removed from proprietary hardware stacks. Deployment, provisioning and configuration, as well as the operation, monitoring and automation of the entire environment, are abstracted from hardware and implemented in software.

The results of this software-defined approach include maximizing agility and minimizing cost, benefits that appeal to IT organizations of all sizes. In fact, understanding SDDC concepts can help IT professionals in any organization better apply these software defined concepts to storage, networking, compute and other infrastructure decisions.

If you’re interested in Software Defined Data Centers and how such a thing might be implemented – and why this concept is important to IT professionals who aren’t involved with building data centers – then please join us on March 15th as Eric Slack, Sr. Analyst with Evaluator Group, will explain what “software defined” really means and why it’s important to all IT organizations. Eric will be joined by Alex McDonald, Chair for SNIA’s Cloud Storage Initiative who will talk about how these concepts apply to the modern data center.

Register now as we’ll explore:

  • How an SDDC leverages this concept to make the private cloud feasible
  • How we can apply SDDC concepts to an existing data center
  • How to develop your own software defined data center environment

As always, this Webcast will be live. Eric, Alex and I will be on hand to answer your questions. We hope you’ll join us on March 15th.

OpenStack File Services for HPC Q&A

We got some great questions during our Webcast on how OpenStack can consume and control file services appropriate for High Performance Computing (HPC) in a cloud and multi-tenanted environment. Here are answers to all of them. If you missed the Webcast, it’s now available on-demand. I encourage you to check it out and please feel free to leave any additional questions at this blog.

Q. Presumably we can use other than ZFS for the underlying filesystems in Lustre?

A. Yes, there are plenty of filesystems other than ZFS that can be used. ZFS was given as an example of a modern, scale-up filesystem that has recently been integrated, but essentially you can use most filesystem types, with some having more advantages than others. What you are looking for is a filesystem that addresses the weaknesses of Lustre in terms of self-healing and scale-up. So any filesystem that allows you to easily grow capacity whilst also being capable of protecting itself would be a reasonable choice. Remember, Lustre doesn’t do anything to protect the data itself. It simply places objects in a distributed fashion across the Object Storage Targets.
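To make that last point concrete, here is a simplified Python model of round-robin striping across Object Storage Targets. It is a sketch of the general idea only, not Lustre's actual allocation code; the function name `stripe_layout` and the sizes are assumptions chosen for illustration.

```python
def stripe_layout(file_size, stripe_size, ost_ids):
    """Return {ost_id: bytes_stored} for a file striped round-robin over ost_ids."""
    placed = {ost: 0 for ost in ost_ids}
    offset = 0
    chunk = 0
    while offset < file_size:
        # The final chunk may be shorter than a full stripe.
        length = min(stripe_size, file_size - offset)
        placed[ost_ids[chunk % len(ost_ids)]] += length
        offset += length
        chunk += 1
    return placed

MiB = 2**20
# Hypothetical 10 MiB file, 1 MiB stripes, four OSTs.
layout = stripe_layout(file_size=10 * MiB, stripe_size=MiB, ost_ids=[0, 1, 2, 3])
for ost, stored in layout.items():
    print(f"OST {ost}: {stored // MiB} MiB")
```

Every OST holds a distinct part of the file and nothing is stored twice, so in this model losing one target loses part of the file. That is why protection has to come from the underlying filesystem or storage.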

Q. Are there any other HPC filesystems besides Lustre?

A. Yes, there are, and depending on your exact requirements Lustre might not be appropriate. Gluster is an alternative that some have found slightly easier to manage, and it provides some additional functionality. IBM has GPFS, which has been implemented as an HPC filesystem, and other vendors have their own scale-out filesystems too. An HPC filesystem is simply a scale-out filesystem capable of very good throughput with low latency. So under that definition a flash array could be considered a high-performance storage platform, as could a scale-out NAS appliance with some fast disks. It’s important to understand your workload’s characteristics and demands before making the choice, as each system has pros and cons.

Q. Does “embarrassingly parallel” require bandwidth or latency from the storage system?

A. Depending on the workload characteristics, it could require both. Bandwidth is usually the first demand, though, as data is shipped to the nodes for processing. Obviously, the lower the latency, the faster jobs can start and run, but it’s not critical, because these workloads involve little of the communication between nodes that normally drives the demand for low latency.
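A back-of-the-envelope model helps show why bandwidth tends to dominate here. The Python sketch below assumes a deliberately simple cost model (per-request latency plus size divided by bandwidth); the function name and all the numbers are hypothetical.

```python
def transfer_time(size_bytes, bandwidth_bytes_per_s, latency_s, requests=1):
    """Estimated seconds to move size_bytes using `requests` round trips."""
    return requests * latency_s + size_bytes / bandwidth_bytes_per_s

GB = 10**9
dataset = 100 * GB  # hypothetical input shipped to one node

baseline = transfer_time(dataset, 1 * GB, latency_s=0.001)
slower_latency = transfer_time(dataset, 1 * GB, latency_s=0.010)
half_bandwidth = transfer_time(dataset, 0.5 * GB, latency_s=0.001)

print(f"baseline:        {baseline:.3f} s")
print(f"10x the latency: {slower_latency:.3f} s")
print(f"half bandwidth:  {half_bandwidth:.3f} s")
```

In this model, a tenfold increase in latency barely changes the total for one large transfer, while halving the bandwidth roughly doubles it. A workload that issues many small requests (a large `requests` value) would shift that balance towards latency.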

Q. Would you suggest using Object Storage for NFV, i.e., Telco applications?

A. I would for some applications. The problem with NFV is that it actually captures a surprising breadth of applications, some of which have very limited data storage needs. For example, there is little need for storage in a packet switching environment beyond the OS and binaries needed to stand up the VMs. In this case, object storage is a very good fit, as it can easily be distributed geographically, ensuring the same networking function is delivered in the same manner. Other applications that require access to filtered data (for example, billing-based applications or content distribution) would also be good candidates.

Q. I missed something in the middle; please clarify, your suggestion is to use ZFS (on Linux) for the local file system on OSTs?

A. Yes, this was one example, and one where some work has recently been done in the Lustre community. This gives the OSSs the capability of scaling capacity upwards, as well as offering the RAID-like protection and self-healing that come with ZFS. Other filesystems can offer those same things, so I am not suggesting it is the only choice.

Q. Why would someone want/need scale-up, when they can scale-out?

A. This can often come down to funding. A lot of HPC environments exist in academic institutions that rely on grant funding and sponsorship to expand their infrastructure. Sometimes it simply isn’t feasible to buy extra servers in order to add capacity, particularly if there is already performance headroom. It might also be the case that rack space, power and cooling could be factors in which case adding drives to cope with bigger workloads might be the only option. You do need to consider if the additional capacity would also provoke the need for better performance so we can’t just assume that adding disk is enough, but it’s certainly a good option and a requirement I have seen a number of times.

 

OpenStack File Services Options

How can OpenStack consume and control file services appropriate to High Performance Compute (HPC) in a cloud and multi-tenanted environment? Find out on September 22nd when SNIA Cloud hosts a live Webcast and examines two approaches to integration.

One approach is to have OpenStack manage the storage infrastructure services using Cinder, Nova and Neutron to provide HPC Filesystem as a Service.

A second option is to use Manila file services for OpenStack to control the HPC filesystem deployment and manage the exports, etc. This part also looks at the Lustre Manila driver, which is still being created, and its current progress.

I hope you’ll join Alex McDonald and me as we discuss the pros and cons of each approach. Register today and

Upcoming Webcast: Hybrid Clouds Part 2

On June 10, 2015, SNIACloud will be hosting a live Webcast “Hybrid Clouds Part 2: A Case Study on Building a Bridge between Public and Private Clouds.” There are significant differences in how cloud services are delivered to various categories of users. The integration of these services with traditional IT operations will remain an important success factor but also a challenge for IT managers. The key to success is to build a bridge between private and public clouds. I’ll be back to expand upon our earlier SNIA Hybrid Clouds Webcast where we looked at the choices and strategies for picking a cloud provider for public and hybrid solutions. Please join me on June 10th to hear:

  • Best practices to work with multiple public cloud providers
  • The role of SDS in supporting a hybrid data fabric
  • Hybrid cloud decision criteria
  • Key implementation principles
  • Real-world hybrid cloud use case

Please Register now and bring your questions. This will be a live and interactive event. I hope to see you there.

 

 

New Webcast: Hierarchical Erasure Coding: Making Erasure Coding Usable

On May 14th the SNIA-CSI (Cloud Storage Initiative) will be hosting a live Webcast “Hierarchical Erasure Coding: Making erasure coding usable.” This technical talk, presented by Vishnu Vardhan, Sr. Manager, Object Storage, at NetApp and me, will cover two different approaches to erasure coding – a flat erasure code across JBOD, and a hierarchical code with an inner code and an outer code. This Webcast, part of the SNIA-CSI developer’s series, will compare the two approaches on different parameters that impact the IT business and provide guidance on evaluating object storage solutions. You’ll learn:

  • Industry dynamics
  • Erasure coding vs. RAID – Which is better?
  • When is erasure coding a good fit?
  • Hierarchical Erasure Coding: The next generation
  • How hierarchical codes make growth easier
  • Key areas where hierarchical coding is better than flat erasure codes
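One parameter on which a flat code and a hierarchical code can be compared is raw storage overhead. The Python sketch below assumes that each fragment of the outer code is itself protected by the inner code, so the two overheads multiply; that composition rule, the function names, and the (data, parity) values are illustrative assumptions, not figures from the Webcast.

```python
def overhead(data_fragments, parity_fragments):
    """Raw bytes stored per usable byte for a (data + parity) erasure code."""
    return (data_fragments + parity_fragments) / data_fragments

def hierarchical_overhead(outer, inner):
    """Overhead when every outer-code fragment is re-encoded by the inner code.

    `outer` and `inner` are (data_fragments, parity_fragments) tuples.
    """
    return overhead(*outer) * overhead(*inner)

# Hypothetical parameters chosen only to show the arithmetic.
flat = overhead(10, 4)
hier = hierarchical_overhead(outer=(6, 2), inner=(8, 1))

print(f"flat 10+4:               {flat:.2f}x raw capacity")
print(f"hierarchical (6+2)(8+1): {hier:.2f}x raw capacity")
```

Overhead is only one axis. Repair traffic, rebuild time, and how easily the system grows also differ between the two designs, and this sketch does not model them.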

Register now and bring your questions. Vishnu and I will look forward to answering them.

Hybrid Clouds Webcast Preview

On March 18th, SNIA-CSI will be hosting a live Webcast “Hybrid Clouds: Bridging Private and Public Cloud Infrastructures.”

Every IT consumer is using (or is planning to use) cloud in one form or another. Cloud architectures are often designed and implemented without considering where the cloud storage and compute should be located, or the benefits, costs and risks of deciding where the applications will run. Will it be a public cloud? Or a private cloud in the data center or co-location site? Or a hybrid of the two?

This session will be an overview of developing and delivering a cloud architecture, with a focus on getting the overall goals correctly specified and defined, understanding the issues that must be addressed, and then deciding whether the application is suitable for public, private or some hybrid mixture of the two before undertaking implementation. We’ll also focus on one of the most difficult aspects of the solution, the management of data and storage in the cloud, and present a case study of a successful commercial implementation.

Register now for this live event. I hope you’ll join Alex McDonald and me for what we hope will be an informative and interactive event.