Federated Computing for the Masses

Understanding Fluid Flow


In March 2013, a joint team of researchers from the Rutgers Discovery Informatics Institute and the Computational Physics and Mechanics Laboratory at Iowa State University launched a large-scale computational experiment to gather the most comprehensive information to date on how pillars affect flow in microfluidic channels.

The experiment is unique in that it demonstrates that a single user, operating entirely in user space, can federate multiple geographically distributed and heterogeneous HPC resources to obtain a platform with cloud-like capabilities, able to solve large-scale computational engineering problems.

On this web page we provide details of the experiment, show some of the results, and summarize our findings.


Challenge

The ability to control fluid streams at the microscale has significant applications in many domains, including biological processing, guiding chemical reactions, and creating structured materials, to name just a few. Recently, it has been discovered that placing pillars of different dimensions, and at different offsets, allows "sculpting" the fluid flow in microchannels. The design and placement of sequences of pillars offers a phenomenal degree of flexibility in programming the flow. However, to achieve such control it is necessary to understand how the flow is affected by different input parameters.

IMG
Using a parallel, finite-element, MPI-based Navier-Stokes solver, we can simulate flows in a microchannel with an embedded pillar obstacle. For a given combination of microchannel height, pillar location, pillar diameter, and Reynolds number (four variables), the solver captures the qualitative and quantitative characteristics of the flow. However, to reveal how the input parameters interact, and how they impact the flow, we have to construct a phase diagram of possible flow behaviors.
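For illustration, the sketch below shows how a single point in this four-variable space might be described and handed to an MPI solver. It is a hypothetical driver: the solver binary name (navier_stokes_solver), its command-line flags, and the default core count are placeholders, not the actual interface of our code.

    import subprocess
    from dataclasses import dataclass

    @dataclass
    class FlowConfiguration:
        """One point in the four-variable input space."""
        channel_height: float    # microchannel height
        pillar_location: float   # pillar offset within the channel
        pillar_diameter: float   # pillar diameter
        reynolds_number: float   # flow Reynolds number

    def run_simulation(cfg: FlowConfiguration, cores: int = 64) -> int:
        """Launch one MPI solver run for the given configuration.

        The executable and its flags are placeholders; a real driver would
        substitute the actual solver invocation here.
        """
        cmd = [
            "mpirun", "-np", str(cores),
            "./navier_stokes_solver",                      # hypothetical binary
            "--height", str(cfg.channel_height),
            "--pillar-location", str(cfg.pillar_location),
            "--pillar-diameter", str(cfg.pillar_diameter),
            "--reynolds", str(cfg.reynolds_number),
        ]
        return subprocess.call(cmd)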

The problem is challenging for several reasons. The search space consists of tens of thousands of points, and an individual simulation may take hundreds of core-hours, even when executed on a state-of-the-art HPC cluster. The individual simulations, although independent, are highly heterogeneous, and their cost is very difficult to estimate a priori, owing to the varying resolution and mesh density required for different configurations. Finally, because the non-linear solver is iterative, it may fail to converge for some combinations of input parameters, which necessitates fault tolerance.


Team

To solve the problem we formed a multidisciplinary team with joint expertise in high-performance and distributed computing, and in computational physics and mechanics:


Results

The goal of the experiment was to understand how different microchannel parameters affect fluid flow. To achieve this, we interrogated the 4D parameter space formed by the input variables (channel height, pillar location, pillar diameter, and Reynolds number), in which a single point corresponds to a parallel Navier-Stokes simulation with a specific configuration. By discretizing the search space we identified 12,400 simulations that would provide sufficient data to construct phase diagrams. The total cost of these simulations is approximately 1.5 million core-hours if run on the Stampede cluster at TACC, one of the most powerful machines within XSEDE.
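The sketch below illustrates how such a discretized parameter sweep can be enumerated. The grid values are hypothetical toy values; the actual discretization that produced the 12,400 configurations is not reproduced here.

    from itertools import product

    # Hypothetical grids for the four input variables (toy values only).
    channel_heights  = [0.5, 1.0, 1.5, 2.0]
    pillar_locations = [0.0, 0.125, 0.25, 0.375, 0.5]
    pillar_diameters = [0.2, 0.4, 0.6, 0.8]
    reynolds_numbers = [1, 5, 10, 20, 40]

    # Every combination of the four variables is one independent simulation task.
    tasks = [
        {"height": h, "location": x, "diameter": d, "reynolds": re}
        for h, x, d, re in product(channel_heights, pillar_locations,
                                   pillar_diameters, reynolds_numbers)
    ]
    print(f"{len(tasks)} candidate simulations")   # 4 * 5 * 4 * 5 = 400 in this toy grid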

The massive size of this challenge makes it virtually impossible to execute on a single HPC resource (unless a special allocation is provided), because of the associated computational cost and, more importantly, the required throughput. Therefore, we decided to rely on a user-centered computational federation. The idea is to aggregate heterogeneous HPC resources in the spirit of how volunteer computing assembles desktop computers. Specifically, we designed a federation model that:

To achieve these goals we used the CometCloud platform. We combined the MPI-based solver with the CometCloud infrastructure using the master/worker paradigm. In this scenario, the simulation software serves as a computational engine, while CometCloud orchestrates the entire execution. The master component generates tasks, collects results, verifies that all tasks executed properly, and keeps a log of the execution. Each task is described by a simulation configuration (specific values of the input variables) and minimal hardware requirements. All tasks are automatically placed in the CometCloud-managed distributed task space for execution. When a task fails, the master recognizes the error and either resubmits the task directly (in case of a hardware error or a resource leaving the federation), or regenerates it after increasing the minimal hardware requirements and/or modifying the solver parameters (in case of an application error or insufficient resources). A worker's sole responsibility is to execute tasks pulled from the task space. To achieve this, each worker interacts with the respective queuing system and the native MPI library via a set of dedicated drivers implemented as simple shell scripts.
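To make the orchestration pattern concrete, here is a simplified, single-process sketch of the master/worker flow described above. It does not use CometCloud's actual API: the in-memory queue stands in for the distributed task space, and run_task stands in for the shell-script drivers that submit the MPI solver through the local queuing system.

    import queue
    import random
    from dataclasses import dataclass

    @dataclass
    class Task:
        """One simulation: input configuration plus minimal hardware requirements."""
        config: dict            # values of the four input variables
        min_cores: int = 64     # minimal hardware requirement
        attempts: int = 0

    task_space = queue.Queue()  # stand-in for the CometCloud-managed task space
    results = []

    def master_generate(configs):
        """Master: turn every configuration into a task placed in the task space."""
        for cfg in configs:
            task_space.put(Task(config=cfg))

    def master_handle_failure(task, error_kind):
        """Master: resubmit a failed task directly, or regenerate it with
        escalated requirements, mirroring the policy described above."""
        task.attempts += 1
        if task.attempts > 3:
            return                      # give up after a few attempts
        if error_kind == "hardware":
            task_space.put(task)        # hardware error: resubmit as-is
        else:
            task.min_cores *= 2         # application error: raise requirements
            task_space.put(task)

    def run_task(task):
        """Placeholder driver; a real worker would submit the MPI job to the
        local queuing system here and report its outcome."""
        return ("ok", None) if random.random() > 0.05 else ("failed", "application")

    def worker_loop():
        """Worker: pull tasks from the space and execute them until it is empty."""
        while not task_space.empty():
            task = task_space.get()
            status, error_kind = run_task(task)
            if status == "ok":
                results.append(task.config)
            else:
                master_handle_failure(task, error_kind)   # report back to the master

In the real experiment the master and the workers run on different machines and communicate through CometCloud's task space; the loop above only illustrates the division of responsibilities.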

The resulting platform enabled us to execute the experiment in just two weeks. Below are the main highlights of the experiment:

Distribution of HPC resources used in the experiment:

IMG

Summary of the execution:

IMG
Utilization of the different computational resources. Line thickness is proportional to the number of tasks being executed at a given point in time. Gaps correspond to idle time, e.g., due to machine maintenance.

IMG
The total number of running tasks at a given point in time.

IMG
The total number of finished tasks at a given point in time.

IMG
Breakdown of throughput, measured as the number of tasks completed per hour. Different colors represent the component throughput of individual machines.

IMG
Throughput contribution by different institutions.

IMG
Queue waiting time on selected resources.


Resources

Below we provide several useful resources with additional information about the experiment:

If you would like to reference this work, please cite:


Acknowledgment

This work is supported in part by the NSF under grants IIP-0758566, DMS-0835436, CBET-1307743, CBET-1306866, CAREER-1149365, and PHY-0941576. This project used resources provided by: XSEDE, supported by NSF OCI-1053575; FutureGrid, supported in part by NSF OCI-0910812; and the NERSC Center, supported by DOE DE-AC02-05CH11231. We would like to thank the SciCom group at the Universidad de Castilla-la Mancha, Spain (UCLM) for providing access to Hermes, and the Distributed Computing research group at the Institute of High Performance Computing, Singapore (IHPC) for providing access to Libra. We wish to acknowledge CINECA Italy, LRZ Germany, CESGA Spain, and the National Institute for Computational Sciences (NICS) for their willingness to share their computational resources. We would like to thank Dr. Olga Wodo for discussions and help with the development of the simulation software, and Dr. Dino DiCarlo for discussions about the problem definition. We express our gratitude to all administrators of the systems used in this experiment, especially Prentice Bisbal from the Rutgers Discovery Informatics Institute and Koji Tanaka from FutureGrid, for their efforts to minimize downtime of computational resources and for their general support.


Copyright © 2014-2016 UB Scalable Computing Research Group
Copyright © 2013-2014 Rutgers Discovery Informatics Institute