Concurrent Programming
Table of Contents
- 1. Lect 2 (24 August): Basic Mechanics for Processes and Threads
- 2. Lect 3 (29 August): Atomicity
- 3. Lect 4 (31 August): Atomicity Continued
- 4. Lect 4 (31 August): Memory Model
- 5. Lect 5 (7 September): Memory Model continued
- 5.1. TODO Visibility
- 5.2. TODO TSO (Total store order)
- 5.3. Special instructions/ FENCE
- 5.4. Compiler optimization
- 5.5. Vocabulary
- 5.6. A data race free (DRF) program behaves sequentially
- 5.7. What happens if data race
- 5.8. Race v/s data race
- 5.9. atomic provide order and
- 5.10. Order: defined control flow
- 5.11. TODO Ordered by happens-before
- 6. Lect 6: Publishing
- 7. Lect 7: Patterns for concurrent programming
- 8. Lect 8: Patterns for concurrent data structures
- 9. Lect 9: Patterns for thread safe data structures
- 10. Lect 10: Case study: Swing Package
- 11. Lect 11: Patterns for concurrent data structures 4: Deadlocks
- 12. Lect 12: Deadlocks (contd)
- 13. Lect 13: Java synchronized, wait, notify(got late)
- 14. Lect 14: Reader Writer problem (contd)
- 15. Lect 15: Cancellation
- 16. Lect 16: Cancellation (contd)
- 17. Lect 17: Split Binary Semaphore
- 18. Mid Exam 1 Notes:
- 19. Lect 18: Thread Pools
- 20. Lect 19: Executors & Futures
- 21. Lect 20: C++ Move, futures, fork/join
- 22. Lect 21: case study: Concurrency in Android, Lock implementation
- 23. Lect 22: TTAS
- 24. Lect 23: Spin Lock Implementations, hybrid Locks
- 25. Lect 24: Mid 1 Review
- 26. Linearizability and correctness
- 27. Reasoning about Linearizability
- 28. Composing higher constructs using primitive
- 29. Parallel Programming
- 30. OpenMP (cont)
- 31. Research Work: ACES4
- 32. Performance Models
- 33. Go
- 34. Actors
- 35. MPI
- 36. Vector Algorithms
- 37. Final Review
- 38. Deadlocks
- 39. Livelock
- 40. Starvation
- 41. Invariant
1 Lect 2 (24 August): Basic Mechanics for Processes and Threads
- Unit of execution – common word for process or thread
- Threads: lightweight processes
1.1 Process
- has its own heap and stack (address space)
- IPC: requires sys call, hence slow
- created in UNIX using `fork` – child and parent are identical except for `fork`'s return value (sketch below)
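A minimal sketch of that return-value convention (POSIX `fork()` compiled as C++; error handling omitted):

    #include <unistd.h>    // fork, getpid
    #include <sys/wait.h>  // waitpid
    #include <cstdio>

    int main() {
        pid_t pid = fork();            // child gets 0, parent gets the child's pid
        if (pid == 0) {
            std::printf("child: my pid is %d\n", getpid());
        } else {
            std::printf("parent: forked child %d\n", pid);
            waitpid(pid, nullptr, 0);  // wait for the child to finish
        }
        return 0;
    }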
1.2 Thread
- share address space; shared heap
- `pthreads` (POSIX threads): IPC doesn't require a sys call
1.2.1 PThreads
See slides for more Pthreads APIs; some details about a few of them:
- `int pthread_join(pthread_t thread, void **value_ptr);`
  Suspends the calling thread until the target thread terminates. Useful when the main thread waits for spawned threads to finish before continuing. The value returned by the thread is collected in `value_ptr` (minimal example below).
- `void pthread_exit(void *retval);`
  Exits the thread that calls it. If the main thread calls it, the main thread exits but spawned threads continue; useful when the main thread only spawns new threads and does no other work.
- `int pthread_detach(pthread_t thread);`
  Marks the thread identified by `thread` as detached. When a detached thread terminates, its resources are automatically released back to the system without another thread having to join it.
- use `pthread_exit` to exit a thread and return a value instead of directly using `return val`, so that proper destructors are called
- detached vs join: create a detached thread using `pthread_attr_setdetachstate` when we don't want to join it. A detached thread returns all of its resources when it terminates, whereas a joinable thread doesn't release them until it is joined. If a joinable thread is never joined, its resources are not released until the process terminates.
- Todos
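A minimal sketch of the create / exit / join round trip described above (compiled as C++ with -pthread; the worker name is mine):

    #include <pthread.h>
    #include <cstdio>

    // returns its result through pthread_exit instead of `return`
    void* worker(void*) {
        static int result = 42;        // must outlive the thread
        pthread_exit(&result);         // value collected by pthread_join
    }

    int main() {
        pthread_t t;
        pthread_create(&t, nullptr, worker, nullptr);

        void* value_ptr = nullptr;
        pthread_join(t, &value_ptr);   // blocks until `t` terminates
        std::printf("thread returned %d\n", *static_cast<int*>(value_ptr));
        return 0;
    }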
1.2.2 Java: java.lang.Thread
- threads can be set as Daemon
- JVM waits for non-daemon threads before shutting down
2 Lect 3 (29 August): Atomicity
2.1 Immutable: Java BigInteger is immutable
2.2 Stateless method
everything created inside the method is destroyed/cleared on return, so it is thread safe
2.3 AtomicLong: Java provided atomic DS
- gives methods such as increment (`incrementAndGet`)
- C++ added atomic types in C++11
- utilizes hardware instructions to provide atomicity
2.3.1 TODO Memory synchronization
2.4 Lock
- Package `java.util.concurrent.locks`
- `Lock` interface; `ReentrantLock` is the usual implementation
- Exceptions in the critical section: call `unlock` in `finally`

    l.lock();
    try {
        // critical section
    } finally {
        l.unlock();
    }
- Note: we need to hold the lock across checking `lastNumber` and returning `lastFactor`, so that we return what we checked [slide 25]
2.4.1 TODO slide 24, return statement
2.5 Pthread Mutexes
- `pthread_mutex_[lock, trylock, unlock]`
- `trylock` returns success/failure instead of blocking
- reentrant lock (`ReentrantLock` or `PTHREAD_MUTEX_RECURSIVE`): the same thread can lock the same lock again
- Java: reentrant behaviour is the default; C/C++ mutexes are not reentrant by default
- no garbage collector in C: call `pthread_mutex_destroy`
2.6 Java Intrinsic Locks (monitor locks)
- used with the `synchronized` keyword – the compiler gives support for proper use
- `this` is the default object for locking
- the `synchronized` keyword is not inherited in subclasses
- Java: the intrinsic lock can also be on the class object, refer SO (slide 47)
2.6.1 DONE was in earlier version, considered as mistake
- many still use it
- every object has a lock – overhead
- some methods are missing/unavailable with this style of locking: `lockInterruptibly`, `tryLock`, …
- SO says `synchronized` is for easy cases; `Lock`s give more flexibility and responsibility
2.6.2 How monitors work intuition
An atomic integer can be implemented using the `synchronized` keyword:

    class AtomicInt {
        private int value;
        // intrinsic lock on `this`: the read-modify-write below is atomic
        synchronized public int increment() {
            return ++this.value;
        }
    }
Meanwhile, in some other part of the program:

    AtomicInt i = new AtomicInt();
    // t1 calls i.increment()
    // t2 calls i.increment()

Here `i` is shared: t1 and t2 are different threads, but they operate on the same object `i`, and `this` (== `i`) is what the intrinsic lock (monitor lock) in `synchronized public int increment()` locks. Thus atomicity is maintained.
3 Lect 4 (31 August): Atomicity Continued
3.1 template for atomic type
- `++` is overloaded
- implemented by the runtime
- often implemented using a hw instruction
- if not available, it uses a lock
- in C++, `is_lock_free` indicates whether a lock is used (see sketch below)
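A small sketch of the points above, assuming C++11 `std::atomic`:

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<long> counter{0};

    int main() {
        // is_lock_free() tells us whether ++ maps to a hardware atomic
        // instruction or falls back to an internal lock
        std::cout << "lock free? " << counter.is_lock_free() << "\n";

        std::thread t1([] { for (int i = 0; i < 100000; ++i) ++counter; });
        std::thread t2([] { for (int i = 0; i < 100000; ++i) ++counter; });
        t1.join();
        t2.join();
        std::cout << counter << "\n";   // always 200000: ++ is atomic
    }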
3.2 mutex (locks)
3.2.1 lock_guard
- TODO Implements RAII (Resource Acquisition Is Initialization)
- manager object: acquires the resource in the constructor, releases it in the destructor

    {
        manage_obj man;  // constructor acquires the resource (file open, mutex lock)
        // do your stuff...
    }  // man goes out of scope, destructor releases the resource

- mutex RAII – explained in slide "Cached version" (concrete sketch below)
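A minimal concrete version of the manager-object idea, using std::mutex and std::lock_guard (C++11); the names are mine:

    #include <mutex>

    std::mutex m;
    int shared_value = 0;

    void increment() {
        std::lock_guard<std::mutex> guard(m);  // constructor locks m
        ++shared_value;                        // critical section
    }                                          // guard destroyed: m unlocked,
                                               // even if an exception is thrown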
4 Lect 4 (31 August): Memory Model
5 Lect 5 (7 September): Memory Model continued
5.1 TODO Visibility
5.2 TODO TSO (Total store order)
5.3 Special instructions/ FENCE
- memory operations on the two sides of the FENCE are not mixed (reordered across it)
- eg: some architectures call the fence instruction `sync` (ARM's barrier is `dmb`)
- the compiler takes care of emitting it
- add a FENCE only when required
5.4 Compiler optimization
- eg: dead-code elimination can break spin loops that poll a flag
5.4.1 when to optimize?
- the programmer needs a way to communicate with the compiler about what must not be optimized away
5.5 Vocabulary
5.5.1 Conflicting operation:
- different threads
- access same location
- at least one write
5.5.2 Data race:
conflicting operations are not ordered
5.6 A data race free (DRF) program behaves sequentially
5.7 What happens if data race
5.8 Race v/s data race
- some use "race" to mean unwanted behavior caused by scheduling
5.9 atomic provide order and
5.10 Order: defined control flow
5.11 TODO Ordered by happens-before
6 Lect 6: Publishing
6.1 Publishing (problems with Publishing data)
- ensure a happens-before relation between construction and use of the object
- wrapper for implementing multithreading
- do not trust the function
- block pointers
6.1.1 Java double checking locking
- the `instance == null` check can see a partially constructed instance (non-null but not fully built)
- the constructor may not be atomic
- can be fixed by making `instance` `volatile`
- can this be fixed by synchronizing the constructor?
6.2 unsafe publication - escape
6.2.1 Construction
- Java: `this` must not escape during construction
- use a `private` constructor and a wrapper (factory method) to hand out the new object
6.3 Immutability
- `final` is special in Java w.r.t. concurrency
- `call_once` is the corresponding mechanism in C++ (sketch below)
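A hedged C++ sketch of safe one-time publication with `std::once_flag` / `std::call_once`, which avoids hand-rolling the double-checked-locking idiom from the previous section (the `Resource` type and names are mine):

    #include <mutex>

    struct Resource { int value = 7; };

    std::once_flag init_flag;
    Resource* instance = nullptr;

    Resource* get_instance() {
        // call_once guarantees the lambda runs exactly once, and every caller
        // sees a fully constructed object (the initialization happens-before
        // each return from call_once)
        std::call_once(init_flag, [] { instance = new Resource(); });
        return instance;
    }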
7 Lect 7: Patterns for concurrent programming
7.1 Fully synchronized object
- one single lock
- no public fields
- finite
- consistent
- publish object without data race
7.1.1 Not scalable
7.1.2 Easy to implement
- good when concurrency level is low
7.1.3 Publishing
- if the instance is `volatile` in Java, then its construction/publication is safe
- publishing is about constructing
7.1.4 Reusing sequential code
- modify the source code – add a lock
- subclass – add a lock
- adapter, proxy: lock, then call the method
- the iterator is not thread safe
  - the user has to lock manually
  - why `synchronized` on the methods is not enough:

        iter.hasNext();
        nextItem = iter.next();  // two separate calls: not atomic together

  - throws `ConcurrentModificationException`
  - but this is not guaranteed
7.2 Finer grained approach
- granularity may end up too small
- identify stateless parts
- does not rely on mutable objects
- slide 56: why make a copy of `b`?
  - maybe `b` is read more than once in `compute`
8 Lect 8: Patterns for concurrent data structures
8.1 Hand-over-hand locking
- lock only one element (node) of the data structure at a time
- in the example in the slides: to delete a node we need to hold the predecessor's lock as well (sketch below)
- what about inserting a new node? (my question)
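A sketch of hand-over-hand (lock coupling) deletion on a singly linked list, assuming a per-node mutex and a sentinel head node; structure and names are mine, not the slides' code:

    #include <mutex>

    struct Node {
        int key;
        Node* next = nullptr;
        std::mutex lock;
    };

    // Remove the first node with the given key. We always hold the
    // predecessor's lock before taking the current node's lock, so no other
    // thread can slip in between pred and curr.
    bool remove(Node* head, int key) {
        head->lock.lock();
        Node* pred = head;
        Node* curr = pred->next;
        while (curr != nullptr) {
            curr->lock.lock();             // take the next lock before releasing the previous one
            if (curr->key == key) {
                pred->next = curr->next;   // unlink while holding both locks
                curr->lock.unlock();
                pred->lock.unlock();
                delete curr;
                return true;
            }
            pred->lock.unlock();           // hand over: drop the older lock
            pred = curr;
            curr = curr->next;
        }
        pred->lock.unlock();
        return false;
    }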
8.2 Optimistic synchronization
- let things happen, then check if something went wrong
- better if low contention
- Atomic commit
- check if anything changed
- if no, then edit/commit
- don't use optimistic sync if there are irreversible side effects
9 Lect 9: Patterns for thread safe data structures
- slide 49: `Node.next` should be `volatile` – one thread locks it but another traverses without locking; `item` and `key` are `final`
9.1 Copy on write
- relevant when we need to update multiple fields in the same object
10 Lect 10: Case study: Swing Package
10.1 Threads:
- main thread
- event queue for GUI
10.2 event listener
- create button
- create listener
- install listener
10.3 Swing not thread safe
- the event dispatch thread cannot do long computation – the GUI freezes
- `listenerList` is copy-on-write
- what happens if a listener is removed from `listenerList`?
  - events may still be handled even after the listener is removed
10.4 Assignment 1: Thread safe queue
- C++, template – any class can be put in (see the sketch below)
- use exceptions – inherit from std::exception
- `= default`: the compiler will generate the function
- `= delete`: the function cannot be called (copy constructor, `=` operator)
- `lock_guard` auto-releases when it goes out of scope, even if an exception happens
- `shared_ptr` is auto-deleted after all references are gone
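A condensed sketch in the spirit of Assignment 1 (template, mutex + condition variable, locks released by guards); this is my own minimal version, not the assignment's required interface, and the custom exception class is omitted:

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    template <typename T>
    class ThreadSafeQueue {
    public:
        void push(T value) {
            {
                std::lock_guard<std::mutex> guard(m_);   // released at end of scope
                items_.push(std::move(value));
            }
            not_empty_.notify_one();                     // wake one waiting consumer
        }

        T pop() {                                        // blocks while the queue is empty
            std::unique_lock<std::mutex> lock(m_);       // unique_lock: can be released by wait()
            not_empty_.wait(lock, [this] { return !items_.empty(); });
            T value = std::move(items_.front());
            items_.pop();
            return value;
        }

    private:
        std::queue<T> items_;
        std::mutex m_;
        std::condition_variable not_empty_;
    };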
11 Lect 11: Patterns for concurrent data structures 4: Deadlocks
11.1 Design problem: queue
- cannot be solved just by making all methods
synchronized
empty
is shared, two threads in differentsynchronized
methods can access the same variable.
11.2 TODO Automated tests
- how to test data races?
- create threads, add to queue
- make threads to add certain intervals
- time to make thread v/s work done in thread
11.3 Deadlocks
- when a thread holds multiple locks – think about deadlocks
11.3.1 Avoiding deadlocks
- associate each lock with a serial number
- a thread must acquire the lower-numbered lock before the higher-numbered one (see the sketch below)
- possible orderings to acquire locks:
  - ids stored in the object
  - system identity hash code
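A C++ sketch of the same ordering idea, assuming each lockable object carries a comparable id (the `Account` class and ids are mine); C++ also offers `std::lock` / `std::scoped_lock`, which acquire several mutexes without deadlocking:

    #include <mutex>

    struct Account {
        long id;            // total order used for lock acquisition
        double balance;
        std::mutex m;
    };

    // assumes `from` and `to` are distinct objects
    void transfer(Account& from, Account& to, double amount) {
        // always lock the smaller id first, so two concurrent transfers
        // between the same pair of accounts cannot deadlock
        Account& first  = (from.id < to.id) ? from : to;
        Account& second = (from.id < to.id) ? to   : from;

        std::lock_guard<std::mutex> l1(first.m);
        std::lock_guard<std::mutex> l2(second.m);
        from.balance -= amount;
        to.balance   += amount;
    }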
11.3.2 Deadlock suspesion
- design code without deadlocks
12 Lect 12: Deadlocks (contd)
12.1 Let them happen, then deal with it
- hard to detect deadlock accurately in distributed concurrent programs
- if there is suspicion of a deadlock, treat it as a deadlock
- the `synchronized` keyword doesn't allow `try_lock`
- `try_lock` returns false if it could not acquire the lock, otherwise it locks
- try to get the lock; if not acquired, release your own lock and retry after a delay (sketch below):
  - 0 – try again immediately
  - constant delay
  - random delay
  - exponential back-off
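A sketch of the try-then-back-off approach using `std::timed_mutex::try_lock_for` and a randomized delay; the retry policy here is only illustrative:

    #include <chrono>
    #include <mutex>
    #include <random>
    #include <thread>

    std::timed_mutex lock_a, lock_b;

    void take_both() {
        std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<int> jitter(1, 10);
        for (;;) {
            std::unique_lock<std::timed_mutex> la(lock_a);                  // take the first lock
            std::unique_lock<std::timed_mutex> lb(lock_b, std::defer_lock);
            if (lb.try_lock_for(std::chrono::milliseconds(10))) {
                // got both locks: do the work; destructors release the locks
                return;
            }
            la.unlock();                                                    // give up the first lock
            std::this_thread::sleep_for(
                std::chrono::milliseconds(jitter(rng)));                    // random backoff, then retry
        }
    }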
12.2 Thread priority
- 10 priority levels in Java, possibly mapped to a smaller number of OS thread priorities
- thread priority may not be very useful
- give daemon thread low priority
12.3 Conditional Synchronization
12.3.1 Design by Contract
13 Lect 13: Java synchronized, wait, notify(got late)
13.1 wait
- the thread reacquires the lock on `obj` when it returns from the wait state
- check and wait must be in a while loop
13.2 Safety and Liveness
- safety is correctness
- doing nothing can also be correct
- ensure Liveness
- `notify` v/s `notifyAll`
  - `notify` may be more efficient
  - `notifyAll` may be easier to prove correct
13.3 waiting in C++
condition_variable
13.4 Readers-Writers Problem
- concurrent reads allowed
- at any moment either readers are active (nr > 0) or a single writer is active (nw == 1)
14 Lect 14: Reader Writer problem (contd)
- instead of plain mutual exclusion, multiple readers are allowed but only one writer
- invariant: nr >= 0 ^ nw >= 0 ^ (nw == 0 v (nw == 1 ^ nr == 0))
- `Condition` in Java: `await`
- readers–writer lock (sketch below)
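The invariant above is what a readers–writer lock maintains; a C++17 sketch using `std::shared_mutex` (the function names are mine):

    #include <shared_mutex>

    std::shared_mutex rw;
    int data = 0;

    int read_data() {
        std::shared_lock<std::shared_mutex> r(rw);   // many readers at once (nw == 0)
        return data;
    }

    void write_data(int v) {
        std::unique_lock<std::shared_mutex> w(rw);   // exclusive: nw == 1 implies nr == 0
        data = v;
    }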
14.1 Bounded Buffer
invariant: 0 <= i ^ 0 <= j ^ 0 <= i - j <= N (the buffer never holds a negative number of items, nor more than N)
14.2 TODO unique_lock v/s lock_guard
15 Lect 15: Cancellation
15.1 TODO Solve last problem of last slides
15.2 Cancellation: terminate thread externally
15.2.1 Do cleanup
- release locks
- free memory (if there is no garbage collection; otherwise delete references)
- free resources
15.2.2 Unrestricted asynchronous cancellation
- stop a thread – it stops sometime soon
- Java `Thread.stop` is deprecated – it may not clean up
- difficult to stop a program in a safe manner – see slides
15.2.3 Polling for flag
- set flag
- thread polls and stops when see flag set
- after cleaning up after itself
15.2.4 Deferred cancellation
- request cancellation anytime but will occur at specific points
- where the thread checks for it
- predefined cancellation points
- Implementation
  - Java: `java.lang.Thread.interrupt()`
    - throws `InterruptedException` if the thread is blocked, sleeping, or waiting
    - the `run` method is not declared to throw `InterruptedException`
      - catch the exception and re-interrupt the thread: `Thread.currentThread().interrupt()`
      - or wrap it into a `RuntimeException` (`RuntimeException` doesn't need to be declared – unchecked exception)
    - catching the exception clears the interrupt flag, hence the re-interrupt
    - can poll for the interrupt – `Thread.interrupted()`, `isInterrupted()`
- Advantages
  - language provided
  - works even when the thread is blocked
- Interrupts and memory model
  - the call to `interrupt()` happens-before its detection – no data race
- check for interruption before doing something that is difficult to stop
- empirical test
- Problems
  - blocking on a `synchronized` lock is not interruptible
  - blocking on IO is not interrupted
    - sockets have timed blocking waits
16 Lect 16: Cancellation (contd)
16.1 Interrupts and IO
- Sockets have timeout
- poll when timed out
16.2 Resource revocation (Java)
- forcefully close resource
16.2.1 java.nio
- `close()` is provided in `java.nio`
- no interrupt on write
16.3 C++ thread cancellation
- no support
- use Boost
- can be implemented using timeouts, like Java socket timeouts
- Refer: C++ Concurrency in Action: Practical Multithreading
16.4 Timed Waiting
16.4.1 Java
    if (B, T) S   // wait up to timeout T for condition B; if B holds, execute S
16.4.2 C++
wait_for(lock, duration); wait_until(lock, time_point);
- slide 15: what happens after we go into wait until and then done is true
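A sketch of timed waiting in C++ with `condition_variable::wait_for` and a predicate, matching the `if (B, T) S` pattern above (names are mine):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    bool wait_until_done(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        // returns true if `done` became true before the timeout,
        // false if we gave up (balked) after waiting T
        return cv.wait_for(lock, timeout, [] { return done; });
    }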
17 Lect 17: Split Binary Semaphore
- signaling on the semaphores is sequential
17.1 Passing the baton
- only one thread does (slide 59)
18 Mid Exam 1 Notes:
18.1 Java memory model allows assignments to be reordered.
    // thread 0
    int x = ...
    boolean done = true;

    // thread 1
    if (done) {
        // do something with x
    }
Here declare `done` as `volatile`, so the write to `x` cannot be reordered after the write to `done`. This is also the reason for the problem in the double-checked locking idiom: due to assignment re-ordering, the constructed object can appear non-null before construction is finished.
18.2 Deadlocks
If a thread acquires >= 2 locks, think of deadlock. Two ways of dealing with deadlocks:
- Avoidance: order the locks so that 'B' is never locked without first locking 'A'. In Java, we can use `System.identityHashCode` to get a value from a totally ordered set.
- Dealing with deadlock: use `tryLock`, which uses a timeout to tell us locking failed. Then retry, with:
  - a certain timeout
  - exponential back-off
  - random delay
  - 0 delay
18.3 Safe publication:
- mark the variable as `static`
- mark it as `volatile`
- reference it from a concurrent data structure, such as ConcurrentQueue etc.
- keep an `atomic` reference
18.4 wait, notify
`obj.wait()` only releases one lock; if it is nested inside multiple locks, it releases only that one (the outer ones are still held).
18.5 Passing the baton:
- #guards = conditions to be satisfied
- guards are conditions, thus multiple invariant expressions can form a single guard. Ex: in reader/writer, nr == 0 ^ nw == 0 denotes "safe for a writer".
- be careful about the initial state of the semaphores
18.5.1 How multiple readers access is implemented
The SIGNAL is executed after startReading as well as endReading. Thus, after a reading thread awakens, it can signal another reading thread.
- which in turn can cause starvation for the writing thread
- fix: in startRead, check that there is no writer and no writer pending
18.6 Callable v/s Runnable
A Runnable does not return a result and cannot throw a checked exception; a Callable returns a result and may throw checked exceptions.
18.7 ReentrantLock v/s synchronized
- `ReentrantLock` provides timed lock waits, interruptible lock waits, non-block-structured locks, multiple condition variables and lock polling
- `ReentrantLock` can give conditional locking
18.8 getAndSet()
- `getAndSet()` creates a lot more traffic on the message bus than `get()`. Thus it is better to test with `get()` before spinning on `getAndSet()` (test-and-test-and-set; sketch below).
- `getAndSet()` invalidates the cache, thus a substantial performance penalty.
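A C++ sketch of that test-and-test-and-set idea: spin on a plain read and only attempt the expensive atomic exchange (the `getAndSet`) when the lock looks free:

    #include <atomic>

    class TTASLock {
        std::atomic<bool> held{false};
    public:
        void lock() {
            for (;;) {
                while (held.load(std::memory_order_relaxed)) {
                    // local spinning on a cached read: no bus traffic
                }
                // lock looked free: now try the expensive atomic exchange
                if (!held.exchange(true, std::memory_order_acquire)) {
                    return;        // we flipped it from false to true: lock acquired
                }
                // someone beat us to it: go back to read-only spinning
            }
        }
        void unlock() { held.store(false, std::memory_order_release); }
    };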
18.9 lock_guard
v/s unique_guard
lock_guard
automatically releases lock on reaching closing brace. Takes care of exception.unique_guard
: same aslock_guard
, but can be unlocked and locked again.
18.10 Safety and liveness
- safety says nothing bad happens
- liveness says something good will eventually happen
19 Lect 18: Thread Pools
19.1 Overhead of creating threads, and cleaning them
- 2 options:
- Thread Pool
- Futures
19.2 web Server
19.2.1 TODO check socket docs in Java and C++
19.2.2 socket construction
    // version 1
    Socket s = new Socket(IP_add, port);  // blocking!

    // version 2
    Socket s = new Socket();
    s.connect(new InetSocketAddress(ip, port), timeout);  // added `timeout`
19.2.3 Server construction
19.2.4 Producer Consumer Paradigm
- N threads
- one server socket which creates job in bounded buffer
- N consumer/worker threads
- Need to clean resources etc
- may use `java.util.concurrent.ExecutorService`
20 Lect 19: Executors & Futures
20.1 TODO Java Executors Doc
A Java lambda automagically turns into a `Runnable` or a `Callable` depending on whether a value is returned.
20.2 Asynchronous Computations
20.2.1 Future
- why? joining may not be easy; with an executor it is not even possible (you don't own the thread)
- methods:
- check: isDone()
- wait
- get(timeout) : blocking
- cancel()
- the task should be written in a way which respects the interrupt flag
- C++ futures
  - `<future>`
  - `std::future` – unique future
  - `std::shared_future` – less efficient
  - `valid()`
  - how to create:
    - `std::async`
      - takes the same kind of arguments as a thread (lambdas, parameters, etc.)
      - may create a new thread or not!
      - can pass a parameter to say whether to run in another thread: `std::launch::async` or `std::launch::deferred`
    - `packaged_task`
      - wrapper around a stored task and a shared state (sketch below)
20.3 misc
Lambdas, Runnables and Callables are all closures.
21 Lect 20: C++ Move, futures, fork/join
21.1 Lvalues and Rvalues
- Lvalues are memory locations
21.1.1 C++ refs

    int x = 44;
    int& y = x;
    int& z = 44;  // this won't compile: an lvalue reference can't bind to a temporary
21.1.2 C++ rvalue references
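A small sketch of rvalue references and `std::move`; futures are one place this matters, since `std::future` is movable but not copyable:

    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        std::string s = "a fairly long string";
        std::vector<std::string> v;

        v.push_back(s);             // copies: s is an lvalue
        v.push_back(std::move(s));  // moves: std::move casts s to an rvalue reference,
                                    // so the vector steals s's buffer; s is left unspecified

        std::string&& r = std::string("temp");  // an rvalue reference can bind to a temporary
        (void)r;
        return 0;
    }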
21.2 futures
21.3 fork/join tasks
typical use case: divide and conquer
21.3.1 Work steal
- if a thread's queue is empty, it steals work from another thread's queue
21.3.2 Usage
- override `compute()`
22 Lect 21: case study: Concurrency in Android, Lock implementation
22.1 Android app case study
- `LinkedBlockingQueue`: blocks if the queue is empty
- saves the thread somewhere so that it can be interrupted if needed
- since executor class is used, we don't know which thread computes what
- efforts made to make thread cancel at safe point
22.2 Lock Implementation
22.2.1 Spin Lock Implementation
- if a thread wants a lock and can't get it, it either:
  - blocks (suspends)
  - spins (busy waiting)
    - can be better than suspending on a multiprocessor
    - blocking needs a context switch
22.2.2 TODO Doubts
- Why is this not a built-in optimization? Prof: these are directly mapped to hw instructions; we want them to be fast.
- How do you stop a thread without a context switch? Prof: `park` in Java.
- Do spin locks utilize the processor / will they drain the battery? Prof: yes, hence use them only for short amounts of time.
23 Lect 22: TTAS
23.1 Backoff has disadvantage
23.2 waiting queue
23.2.1 advantages:
- fair
23.3 misc
- atomic reference to an array of int v/s array of atomic ints
- `AtomicInteger`s are mutable – safe
- `AtomicIntegerArray` – elements can be updated safely: see Java docs
23.4 CLHLock
- uses virtual linked list
23.5 MCS Queue Lock
- check the `java.util.concurrent.ConcurrentLinkedQueue` source code
23.6 AtomicInt binary-representation lock
- use bits to denote lock
- rest of the bits to represent tail position index
24 Lect 23: Spin Lock Implementations, hybrid Locks
24.1 Timeout
missed the initial part, was fixing a fiber lamp! :D
24.2 Hybrid Locks
24.3 Fat Locks
- stored in hashtables
24.4 Thin Locks
- fast path if no contention
- convert to fat lock if contention
- the object header contains bits which indicate whether there is a lock; if a fat lock is needed, the bits point to the fat lock
25 Lect 24: Mid 1 Review
25.1 TODO Can there be a deadlock without a conflict?
No!
25.2 Conflicts and deadlock
resolve conflict by
- happens-before
- mark as `volatile`
26 Linearizability and correctness
26.1 Linearizability
- what it means for a concurrent data structure to be correct with respect to its serial specification
- applies even when there are no locks
26.1.1 Objects are linearizable
27 Reasoning about Linearizability
27.1 Linearizability Proof
- the linearization order is the order in which the locks are released
27.2 Sequential Consistency
- not composable
- re-ordering can cause cycles
27.3 Wait Free
- Every contending thread is guaranteed to complete its method call within a bounded
number of its own time steps.
- have helper thread to substitute for the thread
- so that even if the thread fails, it can take over
27.4 Lock Free
- Lock-freedom allows individual threads to starve but guarantees system-wide throughput
- Some contending threads are guaranteed to complete their method call within a bounded number of steps
- The difference between wait-free and lock-free: in wait-free, every process is guaranteed to complete its operation in a finite number of its own steps
27.5 Obstruction free
In absence of contention, a thread is guaranteed to complete its method within a bounded number of steps
27.6 DONE Safe
- if a read call that doesn't overlap a write call returns the value written by the most recent write call
- otherwise, if a read call overlaps with a write call then it can return any value within register's allowed range of values
27.7 DONE Regular
- safe
- if a read overlaps the i-th write, the read can return the value of the i-th or the (i-1)-th write
27.8 TODO Linearizable
27.9 Foundation of Concurrency
- SRSW (single-reader, single-writer) registers
28 Composing higher constructs using primitive
28.1 Safe MRSW from SRSW
- get idea, skipped slides
28.2 Mutual exclusion cannot be used to get wait-freedom
- once a thread is in the critical section, it can "go away" (stall or crash), blocking everyone else
- not all wait free are practical
28.2.1 Example, 2 thread FIFO
- spinning on pre conditions
- works because only 1 thread each on enqueue and dequeue
- what happens if multiple dequeuers
- proved by contradiction
28.2.2 Consensus Number
28.2.3 Read modify write
28.3 ABA problem (Problem in CAS):
- we get value of variable
- then check if the value is still the old one; if yes, change it to the new value
- what if someone changed it and then changed it back to the old value – the check still passes (sketch below)
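A sketch of a compare-and-swap retry loop with `std::atomic`; the ABA problem is exactly the case where the comparison succeeds only because the value was changed away and back between our load and our CAS:

    #include <atomic>

    std::atomic<int> counter{0};

    void increment() {
        int old_value = counter.load();
        // CAS loop: retry until no other thread changed counter in between
        while (!counter.compare_exchange_weak(old_value, old_value + 1)) {
            // compare_exchange_weak reloads old_value on failure, so just retry
        }
        // Note: for a plain counter ABA is harmless, but for pointers (e.g. the top
        // of a lock-free stack) "same value again" does not mean "nothing happened".
    }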
28.4 Lock free
- cannot have lock free
29 Parallel Programming
- subset of concurrent programming
- OpenMP: shared-memory parallelism – compiler directives
29.1 OpenMP
- add directives to the program; the compiler takes care of it. If the compiler doesn't understand a directive, it ignores it
- Fork-join Parallelism
- need not create actual thread, might be thread pool
example:
#include "omp.h" ... #pragma omp parallel for for (i = 0; i < 1000; i++) { //some big computation }
- works only (better?) on structured/blocked code
- because there is implicit barrier at end of block
29.1.1 SPMD (Single program, multiple data):
Get thread id then do different work based on id
int ID = omp_get_thread_num();
- barrier at end of openMP block implicitly
29.1.2 Work sharing
- divides work for threads automatically
- in SPMD this was done manually
- args:
  - `schedule(static [,chunk])`: predictable work per iteration
  - `schedule(dynamic [,chunk])`: unpredictable, highly variable work per iteration
  - `schedule(guided [,chunk])`
  - `schedule(runtime)`
- instead of a shared variable like `sum`, have `sum[]` and give each thread its own slot to store into
29.2 OpenMP: variables in block
- most variables are shared
- local variables are not shared
29.2.1 TODO Types
lastprivate
firstprivate
30 OpenMP (cont)
30.1 Reduction
- don't accumulate!
- operator
- associative
- commutative
- form:
reduction(op : list)
- based on the operator, the private copies are initialized:
  - `+` → 0
  - `*` → 1
- slide 60: by applying the reduction clause, we get the final result in `res`
  - each thread does its accumulation into a private copy (see the sketch below)
- without the reduction clause, `res` is shared, so accessing it needs to be synchronized
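A sketch of the reduction clause (OpenMP in C/C++); each thread accumulates into a private copy of `res` initialized to 0 for `+`, and the copies are combined at the implicit barrier:

    #include <omp.h>

    double sum_array(const double* a, int n) {
        double res = 0.0;
        #pragma omp parallel for reduction(+ : res)
        for (int i = 0; i < n; i++) {
            res += a[i];    // each thread updates its own private res
        }
        return res;         // private copies combined into the shared res
    }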
30.2 Critical Section
    #pragma omp critical
    {
        // the critical section
    }
30.3 False Sharing
- performance problem caused by how caches are updated, not a correctness problem
- array: values next to each other; updating one element invalidates the whole cache line, which also contains the other elements
30.4 Atomic
- special case of critical section
30.5 Barriers
- `#pragma omp barrier`: explicit – wait until all threads arrive
- implicit barrier at the end of a block
- a lot of barriers by default; the `nowait` clause removes one
30.6 Ordered:
sequential ordering
- if operation is not associative/commutative
30.7 Flush
- memory synchronization
- creates consistent view of memory
- write everything to memory before flush, read everything after flush from memory
30.8 Environment Variable
- setting options at runtime (after compiling, during execution)
30.9 Problems with openMP
- when to flush
- memory not symmetric, some memory close to some processor
30.10 Tree based reduction
- divide and conquer
- serial to parallel
30.11 Floating point is not associative
- programmer needs to take care of it
31 Research Work: ACES4
- Between the compiler and the runtime system, significant opportunities exist for error checking as well as static and dynamic analyses to improve performance
- asynchronous task delegation
- MADNESS DAG
- any issues or bug fixes needed in code
32 Performance Models
32.1 Metrics
- Elapsed time
- TODO: how good is this, multi processors real time
- Java: `System.nanoTime`
- C++: the `<chrono>` library
- Price/Performance
- Speedup
- Efficiency
- Energy Efficiency
32.2 Models
32.2.1 Simple Analytical model
T(1) = T_{serial} + T_{parallelizable}(1)
32.3 Amdahl's Law
- effect of the serial part on overall performance
- i.e., the serial part limits the overall speedup (see below)
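A worked statement of this limit, in the notation of the simple model above (with f = T_serial / T(1), the serial fraction):

    T(P) = T_serial + T_parallelizable(1) / P
    Speedup(P) = T(1) / T(P) = 1 / (f + (1 - f) / P)  ->  1 / f  as P -> infinity

E.g. if 10% of the work is serial (f = 0.1), the speedup can never exceed 10, no matter how many processors are added.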
32.4 Message Passing
- UE: unit of execution
- Memory Sharing needs locks
- You cannot receive a message before it is sent
32.4.1 CSP (Communicating Sequential Processes)
- CAR Hoare paper 1978
- S0 || S1 … Sn
- x ! S1 – send (output)
- y ? S0 – receive (input); both are blocking
33 Go
33.1 Background
- type inference
33.2 Concurrency
- share memory by communicating
- goroutine:
- lightweight
- channels:
- first class, can be passed around
34 Actors
- works well when regular flow of data
34.1 Actors inside
- local data
- mailbox (message queue)
- Loop - handle messages
34.2 Difference with OOP
- actors do not share memory
- can't send references
- send deep copy
- many frameworks try to optimize this
- send ref if safe
- immutable messages are safe
- enforcing encapsulation may not be guaranteed
- many frameworks try to optimize this
- in OOP, sending a message (calling a method) is synchronous; actor message sends are asynchronous
34.3 MPI
- portable message-passing standard
34.4 TODO SPMD
35 MPI
35.1 MPI_type
- constant
35.2 MPI_Send
- blocks until the buffer has been copied out and is safe to reuse
- `MPI_Recv` is similarly blocking
35.3 SPMD
same code; based on the rank (id), each process decides which part it works on
35.4 Asynchrony
- the nonblocking send and recv (`MPI_Isend` / `MPI_Irecv`) return immediately; we handle completion using `MPI_Wait` (example below)
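A minimal sketch of blocking `MPI_Send` / `MPI_Recv` between rank 0 and rank 1 (MPI C API, compiles as C++; error handling omitted):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   // blocks until the buffer can be reused
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }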
36 Vector Algorithms
36.1 Regular datastructures (homogeneous array)
- instead of having an array of structs, have a struct of arrays
- better performance due to locality (sketch below)
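A sketch of that array-of-structs vs struct-of-arrays point; the SoA layout keeps each field contiguous, which helps cache locality and compiler vectorization (types and names are mine):

    #include <vector>

    // Array of structs: x and y are interleaved in memory
    struct PointAoS { float x, y; };
    std::vector<PointAoS> points_aos;

    // Struct of arrays: all x values are contiguous, all y values are contiguous
    struct PointsSoA {
        std::vector<float> x;
        std::vector<float> y;
    };

    // A loop over soa.x alone touches one dense array and vectorizes easily.
    float sum_x(const PointsSoA& soa) {
        float s = 0.0f;
        for (float v : soa.x) s += v;
        return s;
    }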
36.2 Kernel Parallelism
- Cuda and OpenCl
36.3 Vector parallelism (SIMD)
36.3.1 Challenges:
- change the problem – change the code
- special register - small number of vector registers
- may have to swap to memory
36.3.2 How to use vector units
- Language constructs
- Compiler vectorization
- typically loops
- pass flag to see vector report
- some code may look un-vectorizable to the compiler
- vector instruction sets
- Example: SSE 4.2
36.4 OpenCL
- portable way to use all CPUs, GPUs etc
37 Final Review
37.1 Instructions
- 6 pages notes
- 10am to 12, 14th December
37.2 Preparation
- Homework
- Definitions
- Previous Exams
38 Deadlocks
- When a thread holds multiple locks, think about deadlocks!
- Eg: a `synchronized` method invokes another `synchronized` method associated with a different monitor lock.
38.1 Solutions
- Resource ordering
  - can use `System.identityHashCode`
- Deadlock suspicion
  - lock with timeout
    - not available with the monitor (`synchronized`) – use `ReentrantLock`
39 Livelock
- narrow passage example: a passage wide enough for only one person, two people coming from opposite directions; both keep stepping sideways to let the other pass
- Solution: random backoff
40 Starvation
a thread is perpetually denied a resource it needs
41 Invariant
conditions that must be true at certain points.
41.1 Pre-condition
ways to check
- Optimistic: do everything then check
- Balking
- Guarded Suspension: suspend until the precondition becomes true
- Timed Waiting: wait for certain time then balk
- atomic await
  `<await B {S}>`: wait till `B` becomes `true`, then atomically execute `S`
- Implement
- Test a condition and suspend thread if not true
- release lock before suspending
- when awakened, re-acquire the lock before proceeding
Every `Object` in Java has 2 queues: the wait set and the lock (entry) set.

    // <await B {S}> =>
    synchronized (obj) {
        while (!B) {
            obj.wait();
        }
        S;
    }
- `wait` gives safety
- `notify` gives liveness
- Condition Variables
- Java: `Condition cv = lock.newCondition()` (on a `ReentrantLock` instance)
  - each condition variable has its own wait set
Comparison with the monitor:
  condition variable: await / signal / signalAll – per specific condition
  monitor:            wait  / notify / notifyAll – per monitor lock
- C++: `std::condition_variable c;`
  - `c.wait` takes in the mutex (needed for the release-and-reacquire mechanism)
  - optionally takes in the condition (guard) as a predicate, else we need to manually loop on `!B` (the condition)
- Reader Writer lock: more expensive than mutex
- only worthwhile when writes are rare and the lock is held briefly
- Implement