News: A book co-authored by E. Pacitti of LIRMM and INRIA
Workflows may be defined as abstractions used to model the coherent flow of activities in the context of an in silico scientific experiment. They are employed in many domains of science such as bioinformatics, astronomy, and engineering. Such workflows usually comprise a considerable number of activities and activations (i.e., tasks associated with activities) and may take a long time to execute. Because they must continuously store and process data efficiently (which makes them data-intensive workflows), high-performance computing environments combined with parallelization techniques are used to run them. At the beginning of the 2010s, cloud technologies emerged as a promising environment for running scientific workflows. By using clouds, scientists have expanded beyond single parallel computers to hundreds or even thousands of virtual machines.

In this book, we aim to identify and distill the body of work on workflow management in clouds and Data-Intensive Scalable Computing (DISC) environments. We start by presenting the principles of data-intensive scientific workflows, with examples of architectures and real use cases. Next, we present scheduling solutions for executing workflows in single-site and multisite clouds, taking into account data placement and replication, as well as provenance. We then turn to workflow management in DISC environments and present in detail complete solutions that optimize the execution of workflows using frameworks such as Apache Spark and its extensions. This book is intended for researchers, whether or not they work in eScience, and can serve as a textbook for master's and doctoral students.