## Specific Countermeasures Against Physical Attacks in FPGAs

Jean-Luc DANGER, Shivam BHASIN, Guillaume DUC, Tarik GRABA, Sylvain GUILLEY, Houssem MAGHREBI, Olivier MEYNARD, Maxime NASSAR, Laurent SAUVAGE, Nidhal SELMANE, Youssef SOUISSI

Institut TELECOM / TELECOM-ParisTech / CNRS - LTCI (UMR 5141)



Thursday, December 7th, 2010, Journée SocSip Securité

Jean-Luc Danger


#### Presentation Outline

1. FPGA specificity and vulnerability
2. Overview of countermeasures in FPGAs
3. Protection by DPL in FPGAs
4. Protection by Masking in FPGAs
5. Conclusions




## FPGA specificity

- Price to pay for reconfigurability:
  - Area: 35× $\Rightarrow$ 18×, power consumption: 14× that of an ASIC (Kuon et al., 2007)
- Many high-gain DFFs
- Many memories:
  - distributed: LUTs
  - embedded
- Many DSPs
- Many long lines and switches: interconnect = 80% of the total area, and its details are unknown to the designer




## Vulnerability against side-channel attacks

Comparison between ASIC and FPGA in terms of power leakage:

#### SecMat v3[ASIC]:

- Shared power supply between all modules

#### SecMat v3[FPGA]:

- SecMat v3[ASIC] VHDL code synthesized in an Altera Stratix EP1S25
- Global power supply
- 10,157 logic elements and 286,720 RAM bits for the whole SoC
- DES alone takes 1,125 logic elements (LUT4)

The power traces acquired from those three circuits are available for download from http://www.dpacontest.org/.




## SecMat v3[ASIC] – covariance with $|LR[0] \oplus LR[1]|$



#### SecMat v3[ASIC]:

- Typical trace: 38 mV
- Typical DPA: 0.6 mV
- $\Rightarrow$  Side-channel leakage: 1.5 %

Covariance result (same scale as the average power trace).




## SecMat v3[FPGA] – covariance with $|LR[0] \oplus LR[1]|$





- Typical trace: 19 mV
- Typical DPA: 0.19 mV
- $\Rightarrow$  Side-channel leakage: 1.0 %



Covariance result plotted against time (same scale as the average power trace).


## Targeted strategies

- Protocol level:
  - Most desirable, since security is provable
- Register-transfer level:
  - Masking, Boolean or algorithmic
  - Encrypted leakage
  - Glitch-full circuits
- Netlist or implementation level:
  - Hiding = DPL (Dual-rail with Precharge Logic)
- Degenerate countermeasures:
  - Noise generator, dummy instructions, varying clock, etc.


#### if $\approx 1$ bit is leaked per 100 encryptions...



FPGA designs can take advantage of reconfigurability to change the implementation regularly.


## Masking

#### Principle

- Every potentially sensitive variable $s$ is split into $n$ shares $\{s_0, s_1, \cdots, s_{n-1}\}$.
- To reconstruct $s$, all the $s_i$ are required.
- Example: $n = 2$, $s \doteq s_0 \oplus s_1$.
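The principle above can be illustrated with a minimal Python sketch of Boolean masking. The function names `mask` and `unmask` and the 8-bit word size are assumptions chosen for illustration; they do not come from the slides, and a hardware implementation would of course never recombine the shares in the clear.

```python
import secrets
from functools import reduce
from operator import xor


def mask(s: int, n: int = 2, bits: int = 8) -> list[int]:
    """Split s into n Boolean shares whose XOR equals s (illustrative helper)."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are drawn uniformly at random.
    shares = [secrets.randbits(bits) for _ in range(n - 1)]
    # The last share is chosen so that the XOR of all shares gives back s.
    shares.append(reduce(xor, shares, s))
    return shares


def unmask(shares: list[int]) -> int:
    """Recombine the shares; every share is needed to recover s."""
    return reduce(xor, shares, 0)


s = 0xA7
s0, s1 = mask(s)          # n = 2: s = s0 XOR s1
assert unmask([s0, s1]) == s
```

With `n = 2`, `s0` is a fresh random mask and `s1 = s ^ s0`, which matches the example in the list above.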

#### Constraints and Drawbacks

- Leakage resistant, since variables are never manipulated in plain form.
- Attractive, but works well only for registers.
- Efforts have been made to protect the combinational logic as well.
- Vulnerable to higher-order attacks.
- Ineffective against fault attacks.




## Encrypted Leakage




## Hiding by using DPL: Dual Rail with Precharge Logic

#### $a \leftrightarrow (a_f, a_t)$ DPL representation:

- *a* is **VALID** if  $a_f \oplus a_t = 1$ . VALID  $\doteq$  {VALID0, VALID1} or VALID  $\doteq$  {(1,0), (0,1)}.
- *a* is **NULL** if  $a_f \oplus a_t = 0$ . NULL  $\doteq$  {NULL0, NULL1} or NULL  $\doteq$  {(0,0), (1,1)}.
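The VALID/NULL encoding can be written down as a small Python sketch. The pair ordering `(a_f, a_t)` follows the notation above; the mapping of VALID0 to `(1, 0)` and VALID1 to `(0, 1)`, as well as the helper names, are assumptions made for illustration.

```python
def encode(a: int) -> tuple[int, int]:
    """Return the VALID pair (a_f, a_t) representing the single-rail bit a."""
    return (1 - a, a)


def is_valid(a_f: int, a_t: int) -> bool:
    """VALID states: the two rails differ."""
    return (a_f ^ a_t) == 1


def is_null(a_f: int, a_t: int) -> bool:
    """NULL states: the two rails are equal, (0, 0) or (1, 1)."""
    return (a_f ^ a_t) == 0


def decode(a_f: int, a_t: int) -> int:
    """Recover the single-rail bit; only defined for VALID states."""
    if not is_valid(a_f, a_t):
        raise ValueError("NULL state carries no data")
    return a_t


assert encode(0) == (1, 0)   # assumed VALID0
assert encode(1) == (0, 1)   # assumed VALID1
```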




## A common DPL: WDDL (Wave Dynamic Differential Logic)



A digital circuit and its WDDL equivalent.

Timing diagram of a WDDL AND gate.

Only positive gates can be used for netlist synthesis.
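As a rough illustration of why positive gates matter, the following Python sketch models a WDDL AND gate in the usual way: the true rail is an AND of the true inputs and the false rail is an OR of the false inputs. The pair ordering `(f, t)` and the helper names are assumptions for this sketch. Because AND and OR are positive (monotone) functions, the all-zero precharge value propagates through the gate unchanged, and inversion is obtained by swapping rails rather than by an inverter.

```python
from itertools import product

PRECHARGE = (0, 0)  # NULL0 on both rails


def rails(bit: int) -> tuple[int, int]:
    """Dual-rail pair (f, t) for a single-rail bit."""
    return (1 - bit, bit)


def wddl_and(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """WDDL AND: false rail = OR of false rails, true rail = AND of true rails."""
    a_f, a_t = a
    b_f, b_t = b
    return (a_f | b_f, a_t & b_t)


def wddl_not(a: tuple[int, int]) -> tuple[int, int]:
    """Inversion is a rail swap, so no negative gate is introduced."""
    a_f, a_t = a
    return (a_t, a_f)


# Evaluation phase: the gate computes AND on valid inputs.
for x, y in product((0, 1), repeat=2):
    assert wddl_and(rails(x), rails(y)) == rails(x & y)

# Precharge phase: the NULL value ripples through positive gates.
assert wddl_and(PRECHARGE, PRECHARGE) == PRECHARGE
```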


## Important constraints in DPL: no glitches, plus:

#### No Early Evaluation



#### No Technological Bias

- OR consumption = AND consumption
- routing T = routing F


## Security constraints 1/2

#### Logic without glitches and early propagation

#### $\Rightarrow$ **Synchronization**

The rules to be "synchronized":

- Rule 1: Evaluation starts after all the input signals are valid.
- Rule 2: Precharge starts:
  - Either after all the inputs have become NULL<sup>1</sup>, in which case the outputs need to be memorized,
  - Or before the first input becomes NULL, which does not require any memorization.

<sup>1</sup>NULL is the value taken during the precharge phase.


## Security constraints 2/2

#### Logic with a minimum of technological bias

- Special care at placement and routing (but FPGA vendors disclose little information)
- Use of the same logic structure for True and False (e.g. MDPL with majority gates)
- Statistical balancing

#### Logic resistant to fault attacks

- Detection capability or
- Resilience


## Cost and Speed constraints

#### Logic with a minimum cost

- Slightly more than ×2
- Use of RAMs and DSPs in FPGAs

#### Fast speed

- Speed divided by 2. Is it possible to do better?


## Case study of BCDL: Balance Cell Differential Logic

The BCDL gate: Synchronization with Global Precharge



- No memorization is needed, as the **global precharge** signal *PRE* is faster than any input.
- $U/\overline{PRE}$  falls to 0  $\Rightarrow$  precharge is forced immediately.
- $U/\overline{PRE}$  rises to  $1 \Rightarrow$  evaluation begins after "unanimity to 1".
- Tables T and F can be fully separated ⇒ huge complexity gain.
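The behaviour described in this list can be approximated by a stateless Python model. This is a simplified sketch under stated assumptions, not the actual BCDL netlist: the names `unanimity` and `bcdl_gate` are hypothetical, pairs are ordered `(a_f, a_t)`, `pre = 1` means precharge, and the real synchronization cell is a hardware element rather than a pure function.

```python
from typing import Callable, Sequence

Pair = tuple[int, int]  # (a_f, a_t)


def unanimity(pre: int, inputs: Sequence[Pair]) -> int:
    """Model of U/PRE-bar: 0 as soon as PRE is active, 1 once all inputs are VALID."""
    if pre:
        return 0
    return int(all(a_f ^ a_t for a_f, a_t in inputs))


def bcdl_gate(f: Callable[..., int], pre: int, inputs: Sequence[Pair]) -> Pair:
    """Evaluate a dual-rail gate for any Boolean function f (not only positive ones)."""
    u = unanimity(pre, inputs)
    if not u:
        return (0, 0)  # precharge forced, or evaluation not yet allowed
    # Table T only sees (u, true rails); table F only sees (u, false rails).
    t_in = [a_t for _, a_t in inputs]
    f_in = [a_f for a_f, _ in inputs]
    y_t = f(*t_in) & 1
    y_f = 1 - (f(*[1 - x for x in f_in]) & 1)
    return (y_f, y_t)


xor2 = lambda a, b: a ^ b  # a non-positive function, for illustration
assert bcdl_gate(xor2, 0, [(0, 1), (1, 0)]) == (0, 1)  # 1 XOR 0 = 1
assert bcdl_gate(xor2, 1, [(0, 1), (1, 0)]) == (0, 0)  # global precharge wins
assert bcdl_gate(xor2, 0, [(0, 0), (1, 0)]) == (0, 0)  # no early evaluation
```

In this model the two output rails are computed from disjoint input rails, which is one way to read the statement that tables T and F can be fully separated.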


#### Example of a 2-input OR gate




#### Robustness against FA

#### In-Built Robustness against Fault Attacks

- Automatically detects symmetric faults, i.e. bit flips in either direction ($1 \rightarrow 0$ or $0 \rightarrow 1$) that turn a VALID state into a NULL state: $\{\text{VALID0}, \text{VALID1}\} \rightarrow \{\text{NULL0}, \text{NULL1}\}$.
- "Error state" is propagated throughout the design  $\Rightarrow$  Fault resilience.

| PRECHARGE | Fault detection               |
|-----------|-------------------------------|
| 1         | state $\neq$ {NULL0, NULL1}   |
| 0         | state $\neq$ {VALID0, VALID1} |
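The detection rule in the table can be expressed as a short Python check. This is only a sketch: the function name is hypothetical, the state is given as a pair `(a_f, a_t)`, and the check is assumed to be sampled once the phase has settled (early in evaluation, a NULL state is still legitimate).

```python
def fault_detected(precharge: int, state: tuple[int, int]) -> bool:
    """Apply the table: NULL expected when PRECHARGE = 1, VALID when PRECHARGE = 0."""
    a_f, a_t = state
    is_null = (a_f ^ a_t) == 0
    if precharge:
        return not is_null   # a VALID state during precharge is a fault
    return is_null           # a NULL state at the end of evaluation is a fault


assert fault_detected(1, (0, 0)) is False   # NULL0 during precharge: fine
assert fault_detected(1, (1, 0)) is True    # VALID during precharge: fault
assert fault_detected(0, (0, 1)) is False   # VALID after evaluation: fine
assert fault_detected(0, (1, 1)) is True    # NULL1 after evaluation: fault
```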




## Fault Detection with DSP blocks

- Based on $A \times B = (-A) \times (-B) \Rightarrow (2A+1) \times (2B+1) = (2\overline{A}+1) \times (2\overline{B}+1)$
- Allows faults to be detected and located during either precharge or evaluation
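The identity can be checked numerically. In two's complement, $\overline{A} = -A - 1$, so $2\overline{A} + 1 = -(2A + 1)$: appending a constant 1 as least significant bit makes the bitwise complement coincide with the exact negation, and the two products are equal. The Python sketch below uses signed integers, where `~a` is exactly `-a - 1`; the function names and the way a fault is flagged are illustrative assumptions, not the authors' DSP mapping.

```python
def dsp_products(a: int, b: int) -> tuple[int, int]:
    """Return the products computed on the true and on the complemented operands."""
    true_prod = (2 * a + 1) * (2 * b + 1)
    false_prod = (2 * ~a + 1) * (2 * ~b + 1)  # ~a == -a - 1 for Python ints
    return true_prod, false_prod


def dsp_fault_detected(true_prod: int, false_prod: int) -> bool:
    """Without faults both multipliers must return the same product."""
    return true_prod != false_prod


# Exhaustive check on signed 4-bit operands.
for a in range(-8, 8):
    for b in range(-8, 8):
        tp, fp = dsp_products(a, b)
        assert tp == fp

# A single bit flip in one product is flagged.
tp, fp = dsp_products(5, -3)
assert dsp_fault_detected(tp ^ 1, fp)
```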




#### Area

#### T and F easy to implement

- Not limited to positive functions
- Separable:
  - 1 additional input ($U/\overline{PRE}$) + duplication (T and F)
  - Area of tables = $2 \cdot 2^{n+1} < 2^{2n}$ if $n > 2$
  - $\Rightarrow$ S-Box area = only 4 times the size of an unprotected one.
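The table-size arithmetic above is easy to tabulate. The sketch below counts table entries per output bit; reading $2^{2n}$ as the size of a single table addressed by all $2n$ rails is an interpretation of the inequality, and the function names are hypothetical.

```python
def unprotected_entries(n: int) -> int:
    """Single-rail table with n inputs."""
    return 2 ** n


def separated_entries(n: int) -> int:
    """Two tables (T and F), each with n rails plus the U/PRE-bar input."""
    return 2 * 2 ** (n + 1)


def joint_entries(n: int) -> int:
    """One table addressed by all 2n rails (assumed comparison point)."""
    return 2 ** (2 * n)


for n in range(2, 9):
    print(n, unprotected_entries(n), separated_entries(n), joint_entries(n))
    # The separated tables always cost 4 times the unprotected table.
    assert separated_entries(n) == 4 * unprotected_entries(n)
```

For `n = 2` both protected options need 16 entries, and the separated tables are strictly smaller from `n = 3` onwards, which matches the condition $n > 2$.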

#### Total Area

 $= \mathsf{DFF}(*4) + [\mathsf{SYNC}(a \text{ few gates}) + \mathsf{T} + \mathsf{F}] * n.$ 

#### Special case: MUX driven by single rail signal

No need for synchronization.


## Speed optimization



#### Faster than other DPLs

- Evaluation time > precharge time $\Rightarrow$ performance improves
- Speed divided by $\sim 1.25$ to $1.75$


## Results in a Stratix FPGA for an AES implementation

#### Complexity and speed

|               | ALM  | Reg  | RAM    | Max. freq. | Max. throughput |
|---------------|------|------|--------|------------|-----------------|
| no protection | 1078 | 256  | 40 Kb  | 71.88 MHz  | 287.52 Mbps     |
| WDDL          | 4885 | 1024 |        | 37.07 MHz  | 74.14 Mbps      |
| BCDL          | 1841 | 1024 | 160 Kb | 50.64 MHz  | 151.92 Mbps     |

#### CPA results

- Attack performed on 150,000 power consumption traces.
- No subkey found for BCDL.


#### MIA results for different SubBytes implementations




## Comparison with other DPLs in FPGAs

- **WDDL**: Propagation of the NULL state with positive functions
- **RCDDL**: WDDL with factored logic, which amplifies the early evaluation
- **MDPL**: T gate = F gate = Majority, random mask to balance the True and False networks
- **STTL**: A third wire is added to synchronize with the last stable signal
- **DRSL**: As MDPL, with a synchronization before evaluation
- **IWDDL**: Isolated WDDL with separated T and F networks by means of superpipelining
- **BCDL**: The logic presented here
- **MBCDL**: BCDL with mask


#### Comparison with other DPLs

| Logic | Compl. | Speed           | Robust. SCA: EE | Robust. SCA: T. B. | Robust. FA: Fault | Robust. FA: Det. | Design Constr.  |
|-------|--------|-----------------|-----------------|--------------------|-------------------|------------------|-----------------|
| WDDL  | *      | < 1/2           |                 |                    | asym              | comb             | Positive gates  |
| MDPL  | *      | < 1/2           |                 | ✓                  | asym              | comb             | MAJ gate + RNG  |
| STTL  | *      | < 1/4           | ✓               |                    | sym               | seq              | 50% more wiring |
| DRSL  | *      | < 1/2           | partly          | ✓                  | sym               | comb             | + RNG           |
| IWDDL |        | $< 1/2 \cdot n$ | ✓               |                    | asym              | comb             | superpipeline   |
| BCDL  | **     | > 1/2           | ✓               |                    | sym               | comb             |                 |
| MBCDL | *      | > 1/2           | ✓               | ✓                  | sym               | comb             | + RNG           |


## ROM Hardware masking



Masked DES implemented with ROMs.

"Zero offset", from Waddle et al. and Peeters et al.

Activity:

$$A = HW[(x \oplus m) \oplus (S(x \oplus k) \oplus m')] + HW[m \oplus m']$$

- The register data Hamming distance is:

$$\Delta(x) = x \oplus S(x \oplus k)$$

- The register mask Hamming distance is:

$$\Delta(m) = m \oplus m'$$

- Then:

$$A = HW[\Delta(x) \oplus \Delta(m)] + HW[\Delta(m)]$$
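To see what this activity model implies, the following Python sketch enumerates every value of $\Delta(m)$ for an $n$-bit register and computes the mean and variance of $A$ for a fixed $\Delta(x)$. It assumes uniformly distributed masks and a noise-free Hamming-distance model; the function names are illustrative.

```python
def hw(v: int) -> int:
    """Hamming weight of a non-negative integer."""
    return bin(v).count("1")


def activity_stats(delta_x: int, n: int = 4) -> tuple[float, float]:
    """Mean and variance of A = HW[dx ^ dm] + HW[dm] over all n-bit dm."""
    acts = [hw(delta_x ^ dm) + hw(dm) for dm in range(1 << n)]
    mean = sum(acts) / len(acts)
    var = sum((a - mean) ** 2 for a in acts) / len(acts)
    return mean, var


for dx in (0b0000, 0b0001, 0b0011, 0b0111, 0b1111):
    mean, var = activity_stats(dx)
    print(f"HW(dx)={hw(dx)}  mean={mean}  variance={var}")
    assert mean == 4.0             # the mean does not depend on dx
    assert var == 4 - hw(dx)       # the variance does
```

Under these assumptions the mean of $A$ is constant, which is why a first-order attack fails, while the variance still depends on $HW(\Delta(x))$, which is what higher-order attacks exploit.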


#### Problem #1: HO-attacks



Power distributions of the five possible values of  $HW(\Delta(x, k))$ .

#### Theoretic MIA attack evaluation

Table: Theoretical conditional entropy of the ROM masked DES.

| Theoretical entropy            | Correct key | Any wrong key |
|--------------------------------|-------------|---------------|
| $H(O \mid HW(\Delta(x,k)))$    | 1.3992 bit  | 2.5442 bit    |


## Problem #2: ROM too complex for FPGAs

- Requires a memory of $2^{2n}$ entries
- Use of external mask recomposition with USM: Universal S-Box Masking



But the combinational logic can still be attacked!


## Solution #1: Squeezed leakage by encoding tables




## Solution #2: Squeezed leakage by encoding tables with USM




#### Implementation results with leakage squeezing

Table 1: Complexity and speed results. "l. s." denotes the "leakage squeezing" countermeasure.

| Implementation              | ALMs | Block memory [bit] | M4Ks | Throughput [Mbit/s] |
|-----------------------------|------|--------------------|------|---------------------|
| Unprotected DES (reference) | 276  | 0                  | 0    | 929.4               |
| DES masked USM              | 447  | 0                  | 0    | 689.1               |
| DES masked ROM              | 366  | 131072             | 32   | 398.4               |
| DES masked ROM with l. s.   | 408  | 131072             | 32   | 320.8               |
| DES masked USM with l. s.   | 488  | 0                  | 0    | 582.8               |




#### MIA results with leakage squeezing




## Squeezed leakage by mask decomposition




## Distributions obtained for different $\Theta$






## MIA results for squeezed leakage by mask decomposition



## Conclusions

- FPGAs need efficient countermeasures to be protected against physical attacks.
- Three levels:
  - Protocol:
    - Reconfiguration can be done in FPGAs
  - RTL: masking by taking advantage of RAMs, but care has to be taken against HO-DPA. Examples:
    - Leakage squeezing
    - Mask decomposition
  - Netlist: by using DPL. Examples:
    - STTL: no EE, needs a 3rd wire, care needed in P/R
    - BCDL: no EE, low complexity, care needed in P/R
    - MBCDL: BCDL + easy P/R

## Thanks for your attention. Any questions?
