2015
Efficiently countering the Simple Power Analysis attack in asymmetric cryptography applications: algorithms and implementations
By Robert Jean-Marc (DALI/LIRMM) on 2015-12-03
- With the growth of communications and of the Internet, the exchange of encrypted information has exploded. This evolution was made possible by asymmetric cryptography protocols, which rely on arithmetic operations such as modular exponentiation on large integers or elliptic curve point scalar multiplication. These computations run on a wide variety of platforms, which are targeted by attacks exploiting information collected through a side channel, such as the instantaneous power consumption.
- In this thesis, we improve the performance of operations that resist the Simple Power Analysis attack. For modular exponentiation, we propose using multiple modular multiplications sharing a common operand. For elliptic curve point scalar multiplication, we propose: using combined AB, AC and AB+CD operations within the point operations; parallelizing the Montgomery binary ladder over binary fields; and we implement a parallel version of the right-to-left double-and-add approach.
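- To illustrate the kind of regular algorithm targeted here, below is a minimal Python sketch (not from the thesis) of the Montgomery ladder applied to modular exponentiation: every exponent bit triggers the same multiply/square pattern, which is what defeats Simple Power Analysis. The thesis contributions concern optimized and parallel variants of such ladders and of double-and-add, not this naive version.
```python
def ladder_pow(x, k, n):
    """Montgomery-ladder modular exponentiation: x**k mod n.
    Each exponent bit costs one multiplication and one squaring regardless
    of its value, so a simple power trace does not reveal the bits.
    (Pure Python is of course not constant-time; this only illustrates
    the regular operation pattern.)"""
    r0, r1 = 1, x % n                # invariant: r1 == r0 * x (mod n)
    for bit in bin(k)[2:]:           # scan exponent bits, most significant first
        if bit == '1':
            r0 = (r0 * r1) % n
            r1 = (r1 * r1) % n
        else:
            r1 = (r0 * r1) % n
            r0 = (r0 * r0) % n
    return r0

assert ladder_pow(7, 560, 561) == pow(7, 560, 561)
```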
Simulator of a many-core processor
By Ramoune Djallal (DALI/LIRMM) on 2015-11-19
- The processor will automatically execute, in parallel, a program written in a sequential language (C, for example). Whenever a function call is reached during execution, the function and the code following its return are executed in parallel. The program is divided into portions of instructions, each called a section. The sections are distributed over a set of cores and executed there. A communication mechanism links the sections: a section SI, located on core CI, that needs a resource SRI requests it from the section SJ, located on core CJ, through the network (Network on Chip). The talk presents the current state of development of this processor's simulator.
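- A very small Python sketch of the section/NoC idea described above (the names Noc, SRI, etc. are illustrative, not the simulator's actual API): each core has a mailbox, and a section that is missing a resource obtains it from the core that owns it through the shared network object.
```python
from queue import Queue

class Noc:
    """Toy stand-in for the Network-on-Chip: one mailbox per core."""
    def __init__(self, n_cores):
        self.mailbox = [Queue() for _ in range(n_cores)]

    def send(self, dst_core, item):
        self.mailbox[dst_core].put(item)

    def recv(self, core):
        return self.mailbox[core].get()   # blocks until the resource arrives

noc = Noc(n_cores=2)
# Section SJ on core CJ = 1 owns the resource SRI and publishes it;
# section SI on core CI = 0 waits for it before continuing.
noc.send(dst_core=0, item=("SRI", 42))
name, value = noc.recv(core=0)
print(name, value)                        # SRI 42
```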
Efficiency of Reproducible BLAS
By Chohra Chemseddine (DALI/LIRMM) on 2015-11-12
- Modern high performance computing (HPC) performs a huge amount of floating-point operations on massively multithreaded systems. Those systems interleave operations and rely on dynamic scheduling and non-deterministic reductions, which prevents numerical reproducibility, i.e. getting identical results from multiple runs, even on one given machine: floating-point addition is not associative, so the result depends on the computation order. Numerical reproducibility is of course important for debugging, checking the correctness of programs and validating results. One way to guarantee it is to compute the correctly rounded value of the exact result, i.e. to extend the IEEE-754 rounding properties to larger computing sequences. When such a computation is possible, it is certainly more costly, but is the extra cost unacceptable in practice? We are motivated by round-to-nearest parallel BLAS; such RTN-BLAS can be implemented thanks to recent algorithms that compute correctly rounded sums.
- In this talk, we present and discuss our performance results on different hardware configurations, and we compare the extra cost of our implementation to that of available solutions for reproducibility. We show that round-to-nearest BLAS can be implemented at no significant extra cost.
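- A tiny Python illustration of the underlying problem and of the error-free transformation (Knuth's TwoSum) on which correctly rounded summation algorithms rely; this only sketches the principle, not the RTN-BLAS implementation discussed in the talk.
```python
def two_sum(a, b):
    """Error-free transformation (Knuth's TwoSum): returns (s, e) with
    s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e

x = [1e16, 1.0, 1.0]
print((x[0] + x[1]) + x[2])   # 1e16 : both 1.0s are absorbed
print(x[0] + (x[1] + x[2]))   # 1.0000000000000002e16

# Keeping the rounding errors on the side recovers the correctly
# rounded result for this example, independently of the order.
s, err = 0.0, 0.0
for v in x:
    s, e = two_sum(s, v)
    err += e
print(s + err)                # 1.0000000000000002e16
```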
Recovering numerical reproducibility in hydrodynamic simulations
By Nheili Rafife (DALI/LIRMM) on 2015-11-05
- HPC simulations suffer from failures of numerical reproducibility because of floating-point arithmetic approximations: in practice, different computing distributions of a parallel computation may yield different numerical results. We are interested in a finite element computation of hydrodynamic simulations within the openTelemac software. The main computing step consists of building a linear system and solving it with the conjugate gradient method. We detail why reproducibility fails in this process and which operations have to be modified to recover it, and we present how compensation techniques can be used to provide reproducible numerical simulations.
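- As a flavour of compensation (a sketch only; the actual modifications inside openTelemac's assembly and conjugate gradient steps are more involved), here is a Kahan-style compensated dot product in Python, the kind of reduction whose summation order varies between parallel runs.
```python
def compensated_dot(x, y):
    """Dot product with Kahan compensation of the additions.
    The running term c captures what each addition loses to rounding."""
    s, c = 0.0, 0.0
    for xi, yi in zip(x, y):
        t = xi * yi - c
        new_s = s + t
        c = (new_s - s) - t      # rounding error of this addition
        s = new_s
    return s

x = [1e16] + [1.0] * 8
y = [1.0] * 9
print(sum(xi * yi for xi, yi in zip(x, y)))  # 1e16 : the eight 1.0s are lost
print(compensated_dot(x, y))                 # 1.0000000000000008e16
```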
Simulation of a multicore architecture
By Porada Cathy (DALI, UPVD/LIRMM) on 2015-10-15
- The principle of the many-core parallelizing processor developed by DALI is to cut the execution of a program into very small pieces and to distribute these pieces (sections) over the set of cores.
- The ongoing work aims at describing this processor in VHDL (a hardware description language) in order to validate its architecture. We will present the current progress and outline the work ahead.
Range Reduction Based on Pythagorean Triples for Trigonometric Function Evaluation
By De Lassus Saint-Geniès Hugues (DALI, UPVD/LIRMM) on 2015-10-01
- Software evaluation of elementary functions usually requires three steps: a range reduction, a polynomial evaluation, and a reconstruction step. These evaluation schemes are designed to give the best performance for a given accuracy, which requires a fine control of errors. One of the main issues is to minimize the number of sources of error and/or their influence on the final result. The work presented in this talk addresses this problem as it removes one source of error for the evaluation of trigonometric functions. We propose a method that eliminates rounding errors from tabulated values used in the second range reduction for the sine and cosine evaluation. When targeting correct rounding, we show that such tables are smaller and make the reconstruction step less expensive than existing methods. This approach relies on Pythagorean triples generators. Finally, we show how to generate tables indexed by up to 10 bits in a reasonable time and with little memory consumption.
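- To make the idea concrete, here is a small Python sketch (not the generators of the talk) based on Euclid's formula: the tabulated values a/c and b/c are exact rationals built from an integer triple, so the table itself introduces no rounding error.
```python
import math

def euclid_triple(m, n):
    """Euclid's formula: for integers m > n > 0, (m^2 - n^2, 2mn, m^2 + n^2)
    is a Pythagorean triple (a, b, c), i.e. a*a + b*b == c*c exactly."""
    return m*m - n*n, 2*m*n, m*m + n*n

a, b, c = euclid_triple(4, 1)        # (15, 8, 17)
print(a*a + b*b == c*c)              # True: the identity holds on integers
theta = math.atan2(b, a)             # angle whose cosine and sine are a/c, b/c
print(math.cos(theta), a / c)        # agree up to the rounding of atan2/cos,
print(math.sin(theta), b / c)        # not of the stored table entries
```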
Floating-point cosine for ARM Cortex-A50 processors
By Thévenoux Laurent (Inria/LIP/ENS Lyon) on 2015-09-24
- AArch64 is the newest 64-bit RISC architecture developed by ARM. In this talk we will focus on Cortex-A53, a low-power oriented processor, and show how integer instructions of the AArch64 architecture can be exploited in order to perform a cosine evaluation accurately for IEEE double precision (including subnormals). We present a software implementation for cosine over [-pi/4, pi/4] that has a proven 1-ulp accuracy. On-going work aims to determine how to combine SIMD extensions with this integer-based implementation approach in order to achieve best performances.
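- For illustration only, a plain double-precision Horner evaluation of a truncated Taylor series for cosine on [-pi/4, pi/4] is sketched below; the implementation discussed in the talk instead relies on AArch64 integer instructions and carefully derived coefficients to reach the proven 1-ulp bound, which this naive sketch does not.
```python
import math

def cos_taylor(x):
    """Rough cosine on [-pi/4, pi/4]: Horner scheme on an even polynomial
    in x*x (truncated Taylor series; error around 1e-10, far from 1 ulp)."""
    x2 = x * x
    p = -1.0 / 3628800.0          # -1/10!
    p = p * x2 + 1.0 / 40320.0    #  1/8!
    p = p * x2 - 1.0 / 720.0      # -1/6!
    p = p * x2 + 1.0 / 24.0       #  1/4!
    p = p * x2 - 0.5              # -1/2!
    return p * x2 + 1.0

print(cos_taylor(0.5), math.cos(0.5))
```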
GPU-enhanced power flow analysis
By Marin Manuel (DALI, UPVD/LIRMM) on 2015-09-17
- This talk presents different alternatives to enhance power flow analysis using Graphic Processing Units (GPU). In power system operation, power flow analysis can be a very useful tool provided it is performed accurately and fast (e.g., for reliability management). However, as power systems worldwide become more heterogeneous, the analysis becomes more and more complex. In recent years, GPUs have proven successful in accelerating several computations, although they are still challenged by applications that exhibit an irregular computing pattern such as power flow analysis. Enhancing power flow analysis using GPUs is significant in two senses: first, it allows power flow analysis to remain relevant in a context of rapid changes in power systems; and second, it expands the GPU application field to a novel terrain. This talk discusses two ways of achieving that goal: the first is accelerating the analysis of radial networks on GPU; the second is using GPU fuzzy interval arithmetic to handle uncertainty in power flow analysis. The proposed methods are implemented and tested on recent GPU architectures, and the results are contrasted with existing solutions from the literature of the field.
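- As a flavour of the second direction, a minimal Python sketch of plain interval arithmetic follows (one alpha-cut of a fuzzy number; the directed rounding and the GPU mapping of the talk are omitted, and the names and values are illustrative only).
```python
class Interval:
    """Closed interval [lo, hi]; a fuzzy number can be represented as a
    stack of such intervals, one per alpha-cut.  No outward rounding here."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

load = Interval(0.9, 1.1)        # uncertain load (per-unit, illustrative)
voltage = Interval(0.98, 1.02)   # uncertain voltage magnitude
print(load * voltage)            # the uncertainty propagates through the product
```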
Code generation for mathematical functions
By Kupriianova Olga (LIP6, Paris) on 2015-06-03
- There are already several mathematical libraries (libms): collections of code to evaluate mathematical functions. They differ in the accuracy of their results, their language, their developer teams, etc. What all existing libms have in common is that they do not give the user enough choice: for example, one cannot choose between a fast and an accurate implementation, although speed is crucial for applications on large data sets. We try to rewrite the existing libm in a more flexible way. As there are plenty of function variants, it is impossible to write all of them by hand, so we need a code generator. We pass the implementation parameters, e.g. the accuracy of the results and the implementation domain (as well as the function itself!), to the generator to obtain the corresponding implementation. Our black-box generator produces code for the functions of the standard libms, applies the needed argument reduction and/or splits the domain; in the latter case, there are some attempts to generate the reconstruction code without branching.
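- To give a feel for the parameterized-generation idea (a toy sketch, not the talk's black-box generator), the snippet below takes a function, a domain and a degree, and emits Python source that evaluates a least-squares polynomial approximation with a Horner scheme.
```python
import numpy as np

def generate(f, lo, hi, degree, name):
    """Emit Python source for a polynomial approximation of f on [lo, hi]."""
    xs = np.linspace(lo, hi, 512)
    coeffs = np.polyfit(xs, f(xs), degree)      # highest-degree coefficient first
    body = "0.0"
    for c in coeffs:                            # build a Horner expression
        body = f"({body}) * x + {float(c)!r}"
    return f"def {name}(x):\n    return {body}\n"

src = generate(np.exp, 0.0, 1.0, 6, "my_exp")
print(src)                                      # inspect the generated code
ns = {}
exec(src, ns)                                   # compile the generated function
print(ns["my_exp"](0.5), np.exp(0.5))           # close on the chosen domain
```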