Lecture Note 3
Computer Abstraction and Technology
Defining performance can be tricky!
- Response time: how long it takes to do a task
- Throughput: total work done per unit time
- more emphasized in parallel programming
- replacing the processor with a faster one -> improves both response time and throughput
- adding more processors -> more tasks done in the same time (increasing throughput), but a single task does not finish any faster
Relative performance
Performance = 1 / (Execution time)
If X is $n$ times faster than Y, then Y's execution time is $n$ times longer than X's.
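For example (with made-up numbers), if X runs a program in 10 s and Y takes 15 s:
$$ \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{15}{10} = 1.5 $$
so X is 1.5 times faster than Y.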
Measuring execution time
- Elapsed time (total response time)
- includes CPU time, disk access, network access, OS overhead, idle time, etc.
- measures whole-system performance at once
- CPU time (time spent processing a given job)
- user CPU time + system CPU time
$$ \text{CPU time} = \text{CPU clock cycles} \times \text{clock cycle time} = \frac{\text{CPU clock cycles}}{\text{clock rate}} $$
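A quick sanity check with assumed numbers: a program that needs $10^{10}$ clock cycles on a 4 GHz processor takes
$$ \text{CPU time} = \frac{10^{10}\ \text{cycles}}{4 \times 10^{9}\ \text{cycles/s}} = 2.5\ \text{s} $$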
Improving performance
- reduce the number of clock cycles
- increase the clock rate
- clock rate vs. cycle count: these trade off against each other; increasing the clock rate can require more clock cycles
Instruction count: determined by the program, the ISA, the compiler, etc.
$$ \text{Clock cycles} = \text{Instruction count} \times \text{CPI} $$
$$ \text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} $$
Different instruction classes have different CPIs, so a weighted average CPI is used:
- $(\text{average CPI}) = \displaystyle \sum_{i}{\text{CPI}_{i}} \cdot (\text{relative frequency of instruction class } i)$
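For instance, with a hypothetical instruction mix of 50% class A (CPI 1), 30% class B (CPI 2), and 20% class C (CPI 3):
$$ \text{average CPI} = 1 \cdot 0.5 + 2 \cdot 0.3 + 3 \cdot 0.2 = 1.7 $$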
CPU Clock
digital hardware generating a constant-rate clock
- circuits are synchronized to the clock cycles of the CPU
Performance summary
$$ \text{CPU time} = \frac{\text{Instructions}}{\text{Program}} \cdot \frac{\text{Clock cycles}}{\text{Instruction}} \cdot \frac{\text{Seconds}}{\text{Clock cycle}} $$
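Putting the factors together with assumed values (an instruction count of $10^9$, the average CPI of 1.7 from the example above, and a 4 GHz clock):
$$ \text{CPU time} = \frac{10^9 \times 1.7}{4 \times 10^9\ \text{Hz}} = 0.425\ \text{s} $$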
- Power consumption
$(\text{Power}) = (\text{capacitive load}) \cdot (\text{voltage})^2 \cdot (\text{frequency})$ (worked example after this list)
- power wall! (around 2003)
- we cannot reduce voltage further: reliability problems
- we cannot remove more heat: CPU temperature rises as power consumption increases
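An illustrative scaling example (assumed factors): lowering both voltage and frequency by 15% while keeping the capacitive load fixed gives
$$ \frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C \cdot (0.85\,V)^2 \cdot (0.85\,f)}{C \cdot V^2 \cdot f} = 0.85^3 \approx 0.61 $$
about 61% of the original power.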
Introduction to multicore processors (moving away from uniprocessors)
- multiprocessors: harder to program (require explicit parallel programming)
- cf. instruction-level parallelism (ILP), which is exploited implicitly
Amdahl's Law
$$ T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}} $$
To improve overall performance significantly, the optimization must apply to a large enough portion of the execution time; the unaffected part limits the achievable speedup!
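A worked example with made-up numbers: a program runs in 100 s, of which 80 s can be sped up by a factor of 4:
$$ T_{\text{improved}} = \frac{80}{4} + 20 = 40\ \text{s} $$
an overall speedup of only $100/40 = 2.5\times$, even though the affected part became $4\times$ faster.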