2020. 7. 7. 13:04ㆍComputer Architecture/Cache security
author: Mengjia Yan† , Jiho Choi† , Dimitrios Skarlatos, Adam Morrison∗ , Christopher W. Fletcher, and Josep Torrellas University of Illinois at Urbana-Champaign ∗Tel Aviv University {myan8, jchoi42, skarlat2}@illinois.edu, mad@cs.tau.ac.il, {cwfletch, torrella}@illinois.edu †Authors contributed equally to this work.
[1. Introduction]
Spectre and Meltdown monitor the micro-architectural footprint left by speculation, such as the state left by wrong-path speculative loads in the cache.
In Spectre variant 1, mispredictions on direct conditional branches can lead to attacks such as array bounds check bypassing.
--> Solution: There is no current hardware proposal to block it, except for disabling all speculation, which is clearly unacceptable.
In this paper, we propose InvisiSpec, a novel strategy to defend against hardware speculation attacks in multiprocessors, by making speculation invisible in the data cache hierarchy.
Goal: to block micro-architectural covert and side channels through the multiprocessor data cache hierarchy that are due to speculative loads, e.g., channels stemming from cache set occupancy, line replacement information, and cache coherence state.
InvisiSpec's micro-architecture is based on two mechanisms.
1. Unsafe speculative loads read data into a new Speculative Buffer (SB) instead of into the caches, without modifying the cache hierarchy.
Data in the SB do not observe cache coherence transactions.
2. When a speculative load is finally safe, the InvisiSpec hardware makes it visible to the rest of the system by reissuing it to the memory system and loading the data into the caches.
[2. BACKGROUND AND TERMINOLOGY]
Total Store Order (TSO)
TSO is the memory model of the x86 architecture.
TSO forbids all observable load and store reorderings except store-->load reordering, which is when a load bypasses an earlier store to a different address.
Release Consistency (RC)
RC allows any reordering, except across synchronization instructions.
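The two reordering rules above can be sketched as a small predicate. This is only an illustration of the notes, not a formal model: the function name `may_reorder` and the string labels are hypothetical, and the rule that same-address accesses keep program order is a simplifying assumption added here.

```python
def may_reorder(model: str, earlier: str, later: str, same_address: bool = False) -> bool:
    """Can `later` be observed before `earlier` (program order) under `model`?

    `earlier` and `later` are one of "load", "store", or "sync".
    Assumption of this sketch: accesses to the same address are never reordered.
    """
    if "sync" in (earlier, later) or same_address:
        return False
    if model == "TSO":
        # Only store-->load reordering is allowed: a load bypasses an earlier store.
        return earlier == "store" and later == "load"
    if model == "RC":
        # Any reordering is allowed, except across synchronization instructions.
        return True
    raise ValueError(f"unknown memory model: {model}")
```

For example, `may_reorder("TSO", "load", "load")` is `False`, while the same pair under `"RC"` is `True`, which is why far more loads can be treated leniently under RC.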
[4. UNDERSTANDING SPECULATIVE EXECUTION ATTACKS]
Which events create transient instructions?
Exceptions.
The processor squashes the execution when the exception-raising instruction reaches the head of the ROB,
but by this time, dependent transmitting instructions may already have leaked data.
The Meltdown and L1 Terminal Fault (L1TF) attacks exploit virtual memory-related exceptions.
Meltdown reads a kernel address mapped as inaccessible to user space in the page table.
L1TF reads a virtual address whose page table entry (PTE) marks the physical page as not present, but the physical address is still loaded.
Control-flow misprediction.
Spectre attacks exploit control-flow speculation to load from an arbitrary memory address and leak its value.
Variant 1 performs an out-of-bounds array read, exploiting a branch misprediction of the array bounds check.
Other variants direct the control flow to an instruction sequence that leaks arbitrary memory, either through indirect branch misprediction, return address misprediction, or an out-of-bounds array write that redirects the control flow (e.g., by overwriting a return address on the stack).
We define a Futuristic attack as one that can exploit any speculative load.
In the rest of the paper, we present two versions of InvisiSpec: one that defends against Spectre attacks, and one that defends against Futuristic attacks.
[5. InvisiSpec: Thwarting Speculation Attacks]
1) Unsafe Speculative Loads:
Any load that initiates a read before it reaches the head of the ROB is a speculative load.
The (large) subset of speculative loads that can create a security vulnerability due to speculation are called Unsafe Speculative Loads (USLs).
The set of speculative loads that are USLs depends on the attack model.
- Spectre attack model, USL == speculative loads that follow an unresolved control-flow instruction.
- Futuristic attack model, USL == speculative loads that can be squashed by an earlier instruction.
A USL transitions to a safe load as soon as it becomes either (i) non-speculative because it reaches the head of the ROB or (ii) speculative non-squashable by any earlier instruction (or speculative non-squashable for short).
Note that in the table, one of the squashing events is interrupts.
Hence, making a load safe includes delaying interrupts until the load reaches the head of the ROB.
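The USL definitions for the two attack models can be written as a short classification function. This is a minimal sketch; the `LoadStatus` fields and the model names `"spectre"` and `"futuristic"` are hypothetical labels chosen for illustration, not names from the paper.

```python
from dataclasses import dataclass


@dataclass
class LoadStatus:
    at_rob_head: bool                # the load is no longer speculative
    unresolved_older_branch: bool    # an earlier control-flow instruction is unresolved
    squashable_by_older_instr: bool  # some earlier instruction could still squash it


def is_usl(load: LoadStatus, attack_model: str) -> bool:
    """Return True if the load is an Unsafe Speculative Load under the attack model."""
    if load.at_rob_head:
        return False  # non-speculative loads are always safe
    if attack_model == "spectre":
        return load.unresolved_older_branch
    if attack_model == "futuristic":
        return load.squashable_by_older_instr
    raise ValueError(f"unknown attack model: {attack_model}")
```

A load whose earlier branches have all resolved, but which an earlier instruction could still squash (for example through an exception), is safe under the Spectre model and still a USL under the Futuristic model.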
2) Making USLs Invisible
The idea behind InvisiSpec is to make USLs invisible.
This means that a USL cannot modify the cache hierarchy in any way that is visible to other threads, including the coherence states.
Speculative Buffer (SB): A USL loads the data into a special buffer that we call the SB, and not into the local caches.
There is a point in time when the USL can transition to a safe load.
At this point, called the Visibility Point, InvisiSpec takes an action that will make the USL visible, i.e., will make all the side effects of the USL in the memory hierarchy apparent to all other threads.
InvisiSpec makes the USL visible by re-loading the data, this time storing the data in the local caches and changing the cache hierarchy states.
The load may remain speculative at this point.
3) Maintaining Memory Consistency
Window of Suppressed Visibility (for a load): the time period between when the load is issued as a USL and when it makes itself visible.
During this period, since a USL does not change any coherence state, the core may fail to receive invalidations directed to the line loaded by the USL.
~~~
To solve this problem, InvisiSpec may have to perform a validation step when it re-loads the data at the load's visibility point.
It is possible that the line requested by the USL was already in the core's L1 cache when the USL was issued and the line was loaded into the SB.
4) Validation or Exposure of a USL:
The operation of re-loading the data at the visibility point, which makes the USL visible, has two flavors: Validation and Exposure.
The memory consistency model determines which one is needed. Figure 2 shows the timeline of a USL with validation and with exposure.
B. InvisiSpec Operation
A load in InvisiSpec has two steps.
First, when it is issued to memory as a USL, it accesses the cache hierarchy and obtains the current version of the requested cache line.
The line is only stored in the local SB, which is as close to the core as the L1 cache.
USLs do not modify the cache coherence states, cache replacement algorithm states, or any other cache hierarchy state.
No other thread, local or remote, can see any changes.
However, the core uses the data returned by the USL to make progress.
The SB stores lines rather than individual words to exploit spatial locality.
Second, when the USL can be made visible, and always after it has received its requested cache line, the hardware triggers a validation or an exposure transaction.
Such a transaction requests the line again, this time modifying the cache hierarchy, and bringing the line to the local caches.
Validation and exposure transactions operate differently and have different performance implications.
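The two steps of a load can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, not the hardware design: `ToyCore`, its dictionaries, and the `"ok"`/`"squash"` return values are hypothetical, `memory` stands in for the rest of the coherent hierarchy, and each line is a single opaque value.

```python
class ToyCore:
    """Toy model: a USL fills only the SB; visibility later fills the local cache."""

    def __init__(self, memory: dict):
        self.memory = memory  # line address -> data (stand-in for the rest of the hierarchy)
        self.cache = {}       # local cache state, observable by other threads
        self.sb = {}          # Speculative Buffer: load id -> (line address, data)

    def issue_usl(self, load_id: int, line_addr: int):
        """Step 1: obtain the current line without touching any cache state."""
        data = self.memory[line_addr]
        self.sb[load_id] = (line_addr, data)
        return data  # the core uses this value to make progress

    def make_visible(self, load_id: int, validate: bool) -> str:
        """Step 2: request the line again, this time filling the local cache."""
        line_addr, speculative_data = self.sb.pop(load_id)
        fresh = self.memory[line_addr]
        self.cache[line_addr] = fresh
        if validate and fresh != speculative_data:
            return "squash"  # validation failed: the USL consumed stale data
        return "ok"          # an exposure never compares, so it cannot fail
```

In this sketch, a line that changes between `issue_usl` and `make_visible` causes a squash only when `validate=True`; with `validate=False` (exposure) the line is simply brought into the cache.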
We consider two attack models, Spectre and Futuristic, and propose slightly different InvisiSpec designs to defend against each of these attacks.
In our defense against the Spectre attack, a USL reaches its visibility point when all of its prior control-flow instructions resolve.
At that point, the hardware issues a validation or an exposure transaction for the load depending on the memory consistency model and the load's position in the ROB.
If multiple USLs can issue validation or exposure transactions, the transactions have to start in program order, but can otherwise all overlap.
In our defense against Futuristic attacks, a USL reaches its visibility point only when: (i) it is not speculative anymore because it is at the head of the ROB, or (ii) it is still speculative but cannot be squashed anymore.
At that point, the hardware issues a validation or an exposure for the load depending on the memory consistency model and the load's position in the ROB.
Overall, in the Spectre and Futuristic defense designs, pipeline stalls may occur only when a validation transaction holds up the retirement of a load at the head of the ROB and the ROB is full. This is more likely in Futuristic than in Spectre.
The two designs are called InvisiSpec-Spectre (or IS-Spectre) and InvisiSpec-Future (or IS-Future).
Compared to the fence-based designs, InvisiSpec improves performance.
Our concern is that validation transactions for loads at the head of the ROB may stall the pipeline. However, we will show how to maximize the number of exposures (which cause no stall) at the expense of the number of validations.
C. When to Use Validation and Exposure
The memory consistency model determines when to use a validation and when to use an exposure.
Under TSO: a speculative load that reads when there is at least one older load (or fence) in the ROB will be squashed by an invalidation to the line it read. Hence, such a USL is required to use a validation.
Under RC: only speculative loads that read when there is at least one earlier fence in the ROB will be squashed by an invalidation to the line read. Hence, only those will be required to use a validation; the very large majority of loads can use exposures.
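The TSO and RC rules above reduce to a small decision function. This is a hedged sketch: the function name, its boolean parameters, and the `"V"`/`"E"` return labels are illustrative assumptions.

```python
def visibility_action(model: str, older_load_in_rob: bool, older_fence_in_rob: bool) -> str:
    """Return "V" (validation) or "E" (exposure) for a USL at the time it reads."""
    if model == "TSO":
        # Any older load or fence in the ROB makes the USL squashable by an invalidation.
        needs_validation = older_load_in_rob or older_fence_in_rob
    elif model == "RC":
        # Only an older fence matters under RC.
        needs_validation = older_fence_in_rob
    else:
        raise ValueError(f"unknown memory model: {model}")
    return "V" if needs_validation else "E"
```

For a USL with an older load but no older fence in the ROB, this returns `"V"` under TSO and `"E"` under RC, which matches the note that most loads can use exposures under RC.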
1) Transforming a Validation USL into an Exposure USL:
Assume that, under TSO, a USL initiates a read while there are earlier loads in the ROB.
Ordinarily, InvisiSpec would mark the USL as needing a validation.
However, assume that, at the time of the read, all of the loads in the ROB earlier than the USL have already obtained the data they requested; in particular, if they are USLs, the data they requested has already arrived at the SB and been passed to a register.
In this case, the USL is not reordered relative to any of its earlier loads. As a result, TSO would not require squashing the USL on reception of an invalidation to the line it loaded. Therefore, the USL is marked as needing exposure, not validation.
Performed bit: It is set when the data requested by the USL has been received in the SB and passed to the destination register.
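The transformation can be sketched as a check over the Performed bits of the earlier loads. This is an illustration only: the function name is hypothetical, and earlier fences are deliberately not modeled here.

```python
from typing import Iterable


def tso_state_at_read(earlier_loads_performed: Iterable[bool]) -> str:
    """Return "E" if every earlier load in the ROB has its Performed bit set, else "V".

    Assumption of this sketch: there is no earlier fence in the ROB.
    """
    # all() of an empty sequence is True: with no earlier loads, the USL
    # cannot be reordered relative to any of them, so an exposure suffices.
    return "E" if all(earlier_loads_performed) else "V"
```

For instance, `tso_state_at_read([True, True])` gives `"E"`, while a single earlier load that has not yet performed, as in `[True, False]`, keeps the USL at `"V"`.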
2) Early Squashing of USLs Needing Validations:
Assume that a core receives an invalidation for a line in its cache that also happens to be loaded into its SB by a USL marked as needing validation.
Receiving an invalidation indicates that the line has been updated.
Such an update will typically cause the validation of the USL at the point of visibility to fail.
Validation could only succeed if this invalidation was caused by false sharing, or if the net effect of all the updates to the line until the validation turned out to be silent (i.e., they restored the data to its initial value).
Since these conditions are unlikely, InvisiSpec squashes such a USL on reception of the invalidation.
There is a second case of a USL with a high chance of validation failure.
Assume that a later USL needs validation and has data in the SB.
Moreover, there is an earlier USL to the same line (but to different words of the line) that also has its data in the SB and needs validation.
When the earlier USL performs the validation and brings the line to the core, InvisiSpec also compares the line to the later USL's data in the SB.
If the data are different, it shows that the later USL has read stale data.
At that point, InvisiSpec conservatively squashes the later USL.
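The first early-squash case can be sketched as a filter over SB entries. The dictionary layout (`"load"`, `"line"`, `"state"`) and the function name are hypothetical, and the sketch only identifies the triggering USLs; it does not model squashing the instructions that follow them.

```python
def early_squash_candidates(sb_entries: list, invalidated_line: int) -> list:
    """Return ids of USLs to squash when an invalidation for `invalidated_line` arrives.

    Only USLs marked as needing validation ("V") are squashed early; a USL that
    needs exposure ("E") is not required to be squashed by the invalidation.
    """
    return [
        entry["load"]
        for entry in sb_entries
        if entry["line"] == invalidated_line and entry["state"] == "V"
    ]
```

With entries for line `0x40` in states `"V"` and `"E"` plus a `"V"` entry for line `0x80`, an invalidation of `0x40` selects only the first entry.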
D. Overlapping Validations and Exposures
To improve performance, we seek to overlap validation and exposure transactions as much as possible.
[6. Detailed InvisiSpec Design]
1) Speculative Buffer Design:
InvisiSpec places the SB close to the core to keep the access latency low.
Our main goal in designing the SB is to keep its operation simple, rather than minimizing its area.
Hence, we design the SB with as many entries as the Load Queue (LQ), and a one-to-one mapping between the LQ and SB entries.
Given an LQ entry, InvisiSpec can quickly find its corresponding SB entry.
Further, since the LQ can easily find if there are multiple accesses to the same address, InvisiSpec can also identify multiple SB entries for the same line.
Importantly, it is trivial to (i) allocate an SB entry for the next load in program order, (ii) remove the SB entry for a retiring load at the ROB head, and (iii) remove the SB entries for a set of loads being squashed.
These operations need only simple moves of the SB's Head and Tail pointers, which are the LQ's Head and Tail pointers.
An SB entry does not store any address.
It stores the data of a cache line plus an Address Mask that indicates which bytes were read.
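The one-to-one LQ/SB mapping and the shared Head and Tail pointers can be sketched as two parallel arrays. This is a minimal sketch with assumed details: the class name, the `count` field, and the dictionary used for an SB entry are hypothetical, and real hardware would not use Python lists.

```python
class LoadQueueWithSB:
    """LQ and SB as parallel circular arrays sharing Head and Tail pointers."""

    def __init__(self, size: int):
        self.size = size
        self.lq = [None] * size  # LQ entries (here just a load id)
        self.sb = [None] * size  # SB entries: line data + address mask, no address
        self.head = 0            # oldest load
        self.tail = 0            # next free slot
        self.count = 0

    def allocate(self, load_id) -> int:
        """Allocate the LQ entry and its SB entry for the next load in program order."""
        if self.count == self.size:
            raise RuntimeError("load queue is full")
        idx = self.tail
        self.lq[idx] = load_id
        self.sb[idx] = {"data": None, "mask": 0}
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        return idx  # the same index locates both entries

    def retire_head(self):
        """Remove the entries of the retiring load at the ROB head."""
        if self.count == 0:
            raise RuntimeError("load queue is empty")
        load_id = self.lq[self.head]
        self.lq[self.head] = self.sb[self.head] = None
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return load_id

    def squash_youngest(self, n: int) -> None:
        """Remove the entries of the n youngest loads by moving Tail backwards."""
        if n > self.count:
            raise ValueError("cannot squash more loads than are queued")
        for _ in range(n):
            self.tail = (self.tail - 1) % self.size
            self.lq[self.tail] = self.sb[self.tail] = None
            self.count -= 1
```

Because the index returned by `allocate` addresses both arrays, finding the SB entry for a given LQ entry needs no lookup, which is the simplicity the design is aiming for.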
Each LQ entry has some status bits: Valid, Performed, State, and Prefetch.
- Valid bit: records whether the entry is valid.
- Performed bit: indicates whether the data requested by the USL has arrived and is stored in the SB entry.
- State bits: indicate the state of the load.
- Prefetch bit: indicates whether this entry corresponds to a prefetch.
States:
- E: requiring an exposure when it becomes visible
- V: requiring a validation when it becomes visible
- C: exposure or validation has completed
- N: invisible speculation is not necessary for this load
The N state is used when invisible speculation is not needed, and the access should go directly to the cache hierarchy.
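The status bits and the four states listed above can be captured in a small record type. This is an illustrative sketch: the class names, the default values, and the `complete_visibility` method are assumptions made for this example, not part of the paper's design.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    E = "requires an exposure when it becomes visible"
    V = "requires a validation when it becomes visible"
    C = "exposure or validation has completed"
    N = "invisible speculation is not necessary"


@dataclass
class LQEntry:
    valid: bool = False      # the entry is in use
    performed: bool = False  # requested data has arrived in the SB entry
    state: State = State.N   # default chosen arbitrarily for this sketch
    prefetch: bool = False   # the entry corresponds to a prefetch

    def complete_visibility(self) -> None:
        """Move from E or V to C once the exposure or validation has finished."""
        if self.state not in (State.E, State.V):
            raise ValueError("only loads in state E or V can complete visibility")
        self.state = State.C
```

An entry in state N never goes through this transition, since its access went directly to the cache hierarchy.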
2) Operation of the Load Queue and Speculative Buffer:
A load instruction is issued: The HW allocates an LQ entry and an SB entry. The LQ entry's Valid bit is set.
The address of a load is resolved: The load is ready to be sent to the cache hierarchy.
If the load is safe according to the attack model, the State bits in the LQ entry are set to N and the load is issued to the cache hierarchy with a normal coherence transaction. The SB entry will be unused.
Otherwise, the load is a USL, and the State is set to E or V, as dictated by the memory consistency model.
Specifically, the LQ determines whether there is any prior load in the LQ that already requested the same line.