Invited talk

 

Efficient Big Data Processing in Hadoop MapReduce


Abstract: This tutorial is motivated by the clear need of many organizations, companies, and scientists to process large data volumes efficiently. Examples include web analytics applications (log analysis), scientific applications (physics, astronomy, and biology), and social networks (Facebook, Twitter, LinkedIn). Hadoop MapReduce is a popular data processing engine for big data. In this tutorial, we will familiarize the audience with the main principles and ideas of Hadoop MapReduce and motivate its use for big data processing. We will present several use cases that show how to use Hadoop effectively for scientific data. We will then focus on data management techniques, ranging from job optimization to physical data organization (such as data layouts and indexes), that help boost the performance of Hadoop MapReduce by orders of magnitude.