Lecture Note 10
locality
- temporal locality
- spatial locality
128-byte cache line: the chunk of data transferred between the cores' caches -> a cache line is associated with a specific core at a specific time -> when another core needs it, the access authority 'moves' to that core
problem: writing to any location in a cache line requires exclusive access to the whole cache line, e.g. core1 accesses l1 and core2 accesses l2 (different parts of the data, same cache line) -> each access to the l's requires exclusive access to the cache line, so the line keeps moving between the cores
false sharing: the data is not shared in the code, but the underlying hardware is actually sharing it
protogen(isca): https://dl.acm.org/doi/10.1109/ISCA.2018.00030
crossbeam_utils::CachePadded
//-> pads and aligns a value to the size of a cache line, so that values used by different threads do not end up in the same cache line
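A minimal sketch of the idea behind cache-line padding, using only the standard library. `Padded`, `Counters`, and `run` are hypothetical names for illustration, not the crossbeam implementation, and the 128-byte alignment is taken from the cache line size in these notes.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Hypothetical stand-in for a cache-padded wrapper: the alignment forces
// every `Padded<T>` to start on its own 128-byte boundary.
#[repr(align(128))]
struct Padded<T>(T);

// Two counters that would otherwise sit side by side in one cache line.
struct Counters {
    a: Padded<AtomicU64>,
    b: Padded<AtomicU64>,
}

// Each thread updates only its own counter, so no line is falsely shared.
fn run(iters: u64) -> (u64, u64) {
    let c = Counters {
        a: Padded(AtomicU64::new(0)),
        b: Padded(AtomicU64::new(0)),
    };
    thread::scope(|s| {
        s.spawn(|| {
            for _ in 0..iters {
                c.a.0.fetch_add(1, Ordering::Relaxed);
            }
        });
        s.spawn(|| {
            for _ in 0..iters {
                c.b.0.fetch_add(1, Ordering::Relaxed);
            }
        });
    });
    (c.a.0.load(Ordering::Relaxed), c.b.0.load(Ordering::Relaxed))
}

fn main() {
    println!("align = {}", align_of::<Padded<AtomicU64>>());
    println!("size  = {}", size_of::<Counters>());
    println!("{:?}", run(100_000));
}
```

Removing `#[repr(align(128))]` keeps the result the same but lets both counters share a cache line, which is the false-sharing case described above.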
crossbeam_channel (also re-exported as crossbeam::channel)
- sending/receiving values among threads
- message passing model / actor model
- communicate with each other
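A minimal sketch of message passing between threads. It assumes the standard library's `std::sync::mpsc` channel as a stand-in for the crossbeam channel, and `sum_of_squares` is a hypothetical helper name.

```rust
use std::sync::mpsc;
use std::thread;

// Spawns `workers` producer threads; each sends one value to a single consumer.
fn sum_of_squares(workers: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let tx = tx.clone(); // every producer owns its own sender
            thread::spawn(move || {
                tx.send(id * id).expect("receiver is still alive");
            })
        })
        .collect();

    // Drop the original sender so the receiving iterator ends
    // once all producers have finished.
    drop(tx);

    let total: u64 = rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

fn main() {
    println!("{}", sum_of_squares(4));
}
```

The threads share no mutable state; they communicate only through the values sent over the channel.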
rayon::iter::IntoParallelIterator::into_par_iter(self) -> Self::Iter
A parallel iterator is returned; multiple threads process the items of the iterator.
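A rough sketch of what a parallel iterator does under the hood, written with scoped threads from the standard library instead of rayon. `parallel_sum` is a hypothetical helper, and the fixed chunking is an assumption for illustration (rayon balances the work dynamically).

```rust
use std::thread;

// Splits `data` into at most `n_threads` chunks and sums each chunk on its own thread.
fn parallel_sum(data: &[u64], n_threads: usize) -> u64 {
    let n = n_threads.max(1);
    // Ceiling division; at least 1 so `chunks` never receives 0.
    let chunk_len = ((data.len() + n - 1) / n).max(1);

    thread::scope(|s| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{}", parallel_sum(&data, 4));
}
```

With rayon, the same computation would be written roughly as `data.par_iter().sum::<u64>()`.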
Implementations
Spinlock
- AtomicBool value + multiple shared references to the bool value
-> each thread tries to acquire the lock through its reference
- compare and swap(&inner, 0, 1): CAS operation -> compares and updates atomically
- false (0): not acquired by any thread
- true (1): acquired by some thread
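```rust
use core::sync::atomic::{AtomicBool, Ordering};
use crossbeam_utils::Backoff;

// `RawLock` is the lock trait used in these notes; `SpinLock` holds
// `inner: AtomicBool` (false = free, true = held).
impl RawLock for SpinLock {
    type Token = ();

    fn lock(&self) {
        let backoff = Backoff::new();
        // CAS: atomically change `inner` from false to true;
        // retry while another thread holds the lock.
        while self
            .inner
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            backoff.snooze();
        }
    }

    // Safety: call only after `lock()` has returned, i.e. while the caller holds
    // the lock. The CAS ensures nobody else has changed the value in the meantime.
    unsafe fn unlock(&self, _token: ()) {
        self.inner.store(false, Ordering::Release);
    }
}
```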
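A usage sketch for a spinlock of this shape, assuming only the standard library: `std::hint::spin_loop` replaces `Backoff`, and `Shared`, `add_one`, and `count` are hypothetical names. It protects a plain `u64` counter that several threads increment.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

struct SpinLock {
    inner: AtomicBool, // false = free, true = held
}

impl SpinLock {
    fn lock(&self) {
        while self
            .inner
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            hint::spin_loop(); // simple stand-in for a backoff strategy
        }
    }

    // Safety: the caller must currently hold the lock.
    unsafe fn unlock(&self) {
        self.inner.store(false, Ordering::Release);
    }
}

// A counter protected by the spinlock (hypothetical helper type).
struct Shared {
    lock: SpinLock,
    value: UnsafeCell<u64>,
}

// Safety: `value` is only accessed while `lock` is held.
unsafe impl Sync for Shared {}

impl Shared {
    fn add_one(&self) {
        self.lock.lock();
        unsafe {
            *self.value.get() += 1;
            self.lock.unlock();
        }
    }
}

fn count(threads: usize, iters: u64) -> u64 {
    let shared = Shared {
        lock: SpinLock { inner: AtomicBool::new(false) },
        value: UnsafeCell::new(0),
    };
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    shared.add_one();
                }
            });
        }
    });
    shared.value.into_inner()
}

fn main() {
    println!("{}", count(4, 10_000));
}
```

The `Acquire` on a successful CAS pairs with the `Release` store in `unlock`, so each thread sees the increments made by the previous lock holder.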
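> problem of a non-atomic compare and swap: the read and the write are separate steps
old = inner; if old == 0 then inner = 1
old1 = 0; if old1 == 0 then inner = 1; // thread 1
old2 = 0; if old2 == 0 then inner = 1; // thread 2: multiple threads access the data, both read 0, and both believe they acquired the lock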
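A small sketch of why an atomic CAS avoids this race: no matter how many threads attempt `compare_exchange(false, true, ...)` on the same flag, exactly one of them succeeds. `winners` is a hypothetical helper name.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

// Every thread tries once to flip `inner` from false to true;
// returns how many of the attempts succeeded.
fn winners(threads: usize) -> usize {
    let inner = AtomicBool::new(false);
    let wins = AtomicUsize::new(0);

    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                // The comparison and the write happen as one atomic step,
                // so two threads can never both observe `false` and both win.
                if inner
                    .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                    .is_ok()
                {
                    wins.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });

    wins.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", winners(8));
}
```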
> Dekker's algorithm (mutual exclusion using only loads and stores, without CAS): slow!!!