Big Data Workflows - how provenance can help

Monday 25 March 2013, 2 p.m., Bâtiment la Galéra, Room 127

Big data analyses are critical for decision support in business data processing. These analyses involve the execution of many activities, such as programs that explore data from the web, databases, data warehouses and files; data cleaning procedures; programs that aggregate data; core programs that perform the analyses; and tools to visualize and interpret the results. Each step (activity) of the analysis is performed in isolation from the others, and analysts must manually manage the broader life cycle of the big data analysis. Big data analyses have begun to be represented as pipelines or dataflows. However, current approaches lack features that provide a consistent view of the many different explorations and activities that form part of a broader analysis, such as a computational experiment.

Scientific workflows have long provided such features for scientific experiments, and although they were originally designed for science, they may also be useful for supporting the life cycle of big data analysis. Scientific analyses typically involve experimenting with several steps using different datasets and computer programs. Scientists need to manage the composition, execution and analysis of their experiments carefully, so that the results can be trusted and the experiments reproduced. To help manage experiments, scientific workflow management systems (SWfMS) have been proposed that let scientists design workflows of varying complexity and manage their execution, including high-performance computing (HPC) in cloud environments. Most SWfMS also support provenance data. Provenance records how the results of an experiment were produced, which is essential for making an experiment (or a big data analysis) reproducible and trustworthy. Business process workflows, by contrast, focus on modeling the process rather than on managing big data flows with provenance and HPC.
In this talk, we discuss provenance support throughout the big data analysis workflow as a way to improve the results of big data analysis, especially in the long term.
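The core idea of the abstract, recording how each result was produced as data moves through a chain of activities, can be illustrated with a minimal sketch. This is not the speaker's system or the API of any particular SWfMS: the `ProvenanceLog` class, its `run` and `lineage` methods, and the toy clean/aggregate/analyse activities are all hypothetical. Each activity's input and output are identified by a content hash, so a final result can be traced back through the activities that produced it.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def digest(value: Any) -> str:
    """Short content hash used to identify one version of a dataset."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class ProvenanceLog:
    """Hypothetical provenance store: one record per executed activity."""
    records: List[Dict[str, str]] = field(default_factory=list)

    def run(self, name: str, activity: Callable[[Any], Any], data: Any) -> Any:
        # Execute the activity and record which input produced which output.
        started = datetime.now(timezone.utc).isoformat()
        result = activity(data)
        self.records.append({
            "activity": name,
            "input": digest(data),
            "output": digest(result),
            "started": started,
        })
        return result

    def lineage(self, output_hash: str) -> List[str]:
        # Walk backwards from a result to the activities that derived it.
        chain: List[str] = []
        current = output_hash
        for record in reversed(self.records):
            if record["output"] == current:
                chain.append(record["activity"])
                current = record["input"]
        return list(reversed(chain))


# Toy pipeline: clean -> aggregate -> analyse.
raw = [" 3", "x", "5 ", "", "7"]
log = ProvenanceLog()
cleaned = log.run("clean", lambda rows: [int(r) for r in rows if r.strip().isdigit()], raw)
aggregated = log.run("aggregate", lambda rows: {"count": len(rows), "total": sum(rows)}, cleaned)
result = log.run("analyse", lambda agg: agg["total"] / agg["count"], aggregated)

print(result, log.lineage(digest(result)))
```

Hashing the data rather than relying on variable names means the lineage query still works when a result is examined later, which is the reproducibility property the abstract emphasizes; a real SWfMS would also persist the records and capture parameters and program versions.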

Last updated on 13/03/2013