Concurrent Programming
Table of Contents
- 1. Lect 2 (24 August): Basic Mechanics for Processes and Threads
- 2. Lect 3 (29 August): Atomicity
- 3. Lect 4 (31 August): Atomicity Continued
- 4. Lect 4 (31 August): Memory Model
- 5. Lect 5 (7 September): Memory Model continued
- 5.1. TODO Visibility
- 5.2. TODO TSO (Total store order)
- 5.3. Special instructions/ FENCE
- 5.4. Compiler optimization
- 5.5. Vocabulary
- 5.6. A data race free (DRF) program behaves sequentially
- 5.7. What happens if data race
- 5.8. Race v/s data race
- 5.9. atomic provide order and
- 5.10. Order: defined control flow
- 5.11. TODO Ordered by happens-before
- 6. Lect 6: Publishing
- 7. Lect 7: Patterns for concurrent programming
- 8. Lect 8: Patterns for concurrent data structures
- 9. Lect 9: Patterns for thread safe data structures
- 10. Lect 10: Case study: Swing Package
- 11. Lect 11: Patterns for concurrent data structures 4: Deadlocks
- 12. Lect 12: Deadlocks (contd)
- 13. Lect 13: Java synchronized, wait, notify(got late)
- 14. Lect 14: Reader Writer problem (contd)
- 15. Lect 15: Cancellation
- 16. Lect 16: Cancellation (contd)
- 17. Lect 17: Split Binary Semaphore
- 18. Mid Exam 1 Notes:
- 19. Lect 18: Thread Pools
- 20. Lect 19: Executors & Futures
- 21. Lect 20: C++ Move, futures, fork/join
- 22. Lect 21: case study: Concurrency in Android, Lock implementation
- 23. Lect 22: TTAS
- 24. Lect 23: Spin Lock Implementations, hybrid Locks
- 25. Lect 24: Mid 1 Review
- 26. Linearizability and correctness
- 27. Reasoning about Linearizability
- 28. Composing higher constructs using primitive
- 29. Parallel Programming
- 30. OpenMP (cont)
- 31. Research Work: ACES4
- 32. Performance Models
- 33. Go
- 34. Actors
- 35. MPI
- 36. Vector Algorithms
- 37. Final Review
- 38. Deadlocks
- 39. Livelock
- 40. Starvation
- 41. Invariant
1 Lect 2 (24 August): Basic Mechanics for Processes and Threads
- Unit of execution – common word for process or thread
- Threads: lightweight processes
1.1 Process
- has its own heap and stack (address space)
- IPC: requires sys call, hence slow
- created in UNIX using `fork` – child and parent are identical except for `fork`'s return value (sketch below)
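A minimal sketch of that return-value convention (POSIX `fork()` compiled as C++; error handling omitted):

    #include <unistd.h>    // fork, getpid
    #include <sys/wait.h>  // waitpid
    #include <cstdio>

    int main() {
        pid_t pid = fork();            // child gets 0, parent gets the child's pid
        if (pid == 0) {
            std::printf("child: my pid is %d\n", getpid());
        } else {
            std::printf("parent: forked child %d\n", pid);
            waitpid(pid, nullptr, 0);  // wait for the child to finish
        }
        return 0;
    }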
1.2 Thread
- share address space; shared heap
- `pthreads` (POSIX threads): IPC doesn't require a sys call
1.2.1 PThreads
See slides for more Pthreads APIs; some details about a few of them:
- `int pthread_join(pthread_t thread, void **value_ptr);`
  Suspends the calling thread until the target thread terminates. Useful when the main thread waits for spawned threads to finish before continuing. The value returned by the thread is collected in `value_ptr` (minimal example below).
- `void pthread_exit(void *retval);`
  Exits the thread that calls it. If the main thread calls it, the main thread exits but spawned threads continue; useful when the main thread only spawns new threads and does no other work.
- `int pthread_detach(pthread_t thread);`
  Marks the thread identified by `thread` as detached. When a detached thread terminates, its resources are automatically released back to the system without another thread having to join it.
- use `pthread_exit` to exit a thread and return a value instead of directly using `return val`, so that proper destructors are called
- detached vs join: create a detached thread using `pthread_attr_setdetachstate` when we don't want to join it. A detached thread returns all of its resources when it terminates, whereas a joinable thread doesn't release them until it is joined. If a joinable thread is never joined, its resources are not released until the process terminates.
- Todos
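A minimal sketch of the create / exit / join round trip described above (compiled as C++ with -pthread; the worker name is mine):

    #include <pthread.h>
    #include <cstdio>

    // returns its result through pthread_exit instead of `return`
    void* worker(void*) {
        static int result = 42;        // must outlive the thread
        pthread_exit(&result);         // value collected by pthread_join
    }

    int main() {
        pthread_t t;
        pthread_create(&t, nullptr, worker, nullptr);

        void* value_ptr = nullptr;
        pthread_join(t, &value_ptr);   // blocks until `t` terminates
        std::printf("thread returned %d\n", *static_cast<int*>(value_ptr));
        return 0;
    }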
1.2.2 Java: java.lang.Thread
- threads can be set as Daemon
- JVM waits for non-daemon threads before shutting down
2 Lect 3 (29 August): Atomicity
2.1 Immutable: Java BigInteger is immutable
2.2 Stateless method
everything created inside the method is destroyed/cleared on return, so it is thread safe
2.3 AtomicLong: Java provided atomic DS
- gives methods such as increment (`incrementAndGet`)
- C++ added atomic types in C++11
- utilizes hardware instructions to provide atomicity
2.3.1 TODO Memory synchronization
2.4 Lock
- Package `java.util.concurrent.locks`
- `Lock` interface; `ReentrantLock` is the usual implementation
- Exceptions in the critical section: call `unlock` in `finally`

    l.lock();
    try {
        // critical section
    } finally {
        l.unlock();
    }
- Note: we need to hold the lock across checking `lastNumber` and returning `lastFactor`, so that we return what we checked [slide 25]
2.4.1 TODO slide 24, return statement
2.5 Pthread Mutexes
- `pthread_mutex_[lock, trylock, unlock]`
- `trylock` returns success/failure instead of blocking
- reentrant lock (`ReentrantLock` or `PTHREAD_MUTEX_RECURSIVE`): the same thread can lock the same lock again
- Java: reentrant behaviour is the default; C/C++ mutexes are not reentrant by default
- no garbage collector in C: call `pthread_mutex_destroy`
2.6 Java Intrinsic Locks (monitor locks)
- used with the `synchronized` keyword – the compiler gives support for proper use
- `this` is the default object for locking
- the `synchronized` keyword is not inherited in subclasses
- Java: the intrinsic lock can also be on the class object, refer SO (slide 47)
2.6.1 DONE was in earlier version, considered as mistake
- many still use it
- every object has a lock – overhead
- some methods are missing/unavailable with this style of locking: `lockInterruptibly`, `tryLock`, …
- SO says `synchronized` is for easy cases; `Lock`s give more flexibility and responsibility
2.6.2 How monitors work intuition
An atomic integer can be implemented using the `synchronized` keyword:

    class AtomicInt {
        private int value;
        // intrinsic lock on `this`: the read-modify-write below is atomic
        synchronized public int increment() {
            return ++this.value;
        }
    }
Meanwhile, in some other part of the program:

    AtomicInt i = new AtomicInt();
    // t1 calls i.increment()
    // t2 calls i.increment()

Here `i` is shared: t1 and t2 are different threads, but they operate on the same object `i`, and `this` (== `i`) is what the intrinsic lock (monitor lock) in `synchronized public int increment()` locks. Thus atomicity is maintained.
3 Lect 4 (31 August): Atomicity Continued
3.1 template for atomic type
- `++` is overloaded
- implemented by the runtime
- often implemented using a hw instruction
- if not available, it uses a lock
- in C++, `is_lock_free` indicates whether a lock is used (see sketch below)
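A small sketch of the points above, assuming C++11 `std::atomic`:

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<long> counter{0};

    int main() {
        // is_lock_free() tells us whether ++ maps to a hardware atomic
        // instruction or falls back to an internal lock
        std::cout << "lock free? " << counter.is_lock_free() << "\n";

        std::thread t1([] { for (int i = 0; i < 100000; ++i) ++counter; });
        std::thread t2([] { for (int i = 0; i < 100000; ++i) ++counter; });
        t1.join();
        t2.join();
        std::cout << counter << "\n";   // always 200000: ++ is atomic
    }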
3.2 mutex (locks)
3.2.1 lock_guard
- TODO Implements RAII (Resource Acquisition Is Initialization)
- manager object: acquires the resource in the constructor, releases it in the destructor

    {
        manage_obj man;  // constructor acquires the resource (file open, mutex lock)
        // do your stuff...
    }  // man goes out of scope, destructor releases the resource

- mutex RAII – explained in slide "Cached version" (concrete sketch below)
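A minimal concrete version of the manager-object idea, using std::mutex and std::lock_guard (C++11); the names are mine:

    #include <mutex>

    std::mutex m;
    int shared_value = 0;

    void increment() {
        std::lock_guard<std::mutex> guard(m);  // constructor locks m
        ++shared_value;                        // critical section
    }                                          // guard destroyed: m unlocked,
                                               // even if an exception is thrown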
4 Lect 4 (31 August): Memory Model
5 Lect 5 (7 September): Memory Model continued
5.1 TODO Visibility
5.2 TODO TSO (Total store order)
5.3 Special instructions/ FENCE
- memory operations on the two sides of the FENCE are not mixed (reordered across it)
- eg: some architectures call the fence instruction `sync` (ARM's barrier is `dmb`)
- the compiler takes care of emitting it
- add a FENCE only when required
5.4 Compiler optimization
- eg: dead-code elimination can break spin loops that poll a flag
5.4.1 when to optimize?
- the programmer needs a way to communicate with the compiler about what must not be optimized away
5.5 Vocabulary
5.5.1 Conflicting operation:
- different threads
- access same location
- at least one write
5.5.2 Data race:
conflicting operations are not ordered
5.6 A data race free (DRF) program behaves sequentially
5.7 What happens if data race
5.8 Race v/s data race
- some use "race" to mean unwanted behavior caused by scheduling
5.9 atomic provide order and
5.10 Order: defined control flow
5.11 TODO Ordered by happens-before
6 Lect 6: Publishing
6.1 Publishing (problems with Publishing data)
- ensure a happens-before relation between construction and use of the object
- wrapper for implementing multithreading
- do not trust the function
- block pointers
6.1.1 Java double checking locking
- the `instance == null` check can see a partially constructed instance (non-null but not fully built)
- the constructor may not be atomic
- can be fixed by making `instance` `volatile`
- can this be fixed by synchronizing the constructor?
6.2 unsafe publication - escape
6.2.1 Construction
- Java: `this` must not escape during construction
- use a `private` constructor and a wrapper (factory method) to hand out the new object
6.3 Immutability
- `final` is special in Java w.r.t. concurrency
- `call_once` is the corresponding mechanism in C++ (sketch below)
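A hedged C++ sketch of safe one-time publication with `std::once_flag` / `std::call_once`, which avoids hand-rolling the double-checked-locking idiom from the previous section (the `Resource` type and names are mine):

    #include <mutex>

    struct Resource { int value = 7; };

    std::once_flag init_flag;
    Resource* instance = nullptr;

    Resource* get_instance() {
        // call_once guarantees the lambda runs exactly once, and every caller
        // sees a fully constructed object (the initialization happens-before
        // each return from call_once)
        std::call_once(init_flag, [] { instance = new Resource(); });
        return instance;
    }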
7 Lect 7: Patterns for concurrent programming
7.1 Fully synchronized object
- one single lock
- no public fields
- finite
- consistent
- publish object without data race
7.1.1 Not scalable
7.1.2 Easy to implement
- good when concurrency level is low
7.1.3 Publishing
- if the instance is `volatile` in Java, then its construction/publication is safe
- publishing is about constructing
7.1.4 Reusing sequential code
- modify the source code – add a lock
- subclass – add a lock
- adapter, proxy: lock, then call the method
- the iterator is not thread safe
  - the user has to lock manually
  - why `synchronized` on the methods is not enough:

        iter.hasNext();
        nextItem = iter.next();  // two separate calls: not atomic together

  - throws `ConcurrentModificationException`
  - but this is not guaranteed
7.2 Finer grained approach
- granularity may end up too small
- identify stateless parts
- does not rely on mutable objects
- slide 56: why make a copy of `b`?
  - maybe `b` is read more than once in `compute`
8 Lect 8: Patterns for concurrent data structures
8.1 Hand-over-hand locking
- lock only one element (node) of the data structure at a time
- in the example in the slides: to delete a node we need to hold the predecessor's lock as well (sketch below)
- what about inserting a new node? (my question)
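A sketch of hand-over-hand (lock coupling) deletion on a singly linked list, assuming a per-node mutex and a sentinel head node; structure and names are mine, not the slides' code:

    #include <mutex>

    struct Node {
        int key;
        Node* next = nullptr;
        std::mutex lock;
    };

    // Remove the first node with the given key. We always hold the
    // predecessor's lock before taking the current node's lock, so no other
    // thread can slip in between pred and curr.
    bool remove(Node* head, int key) {
        head->lock.lock();
        Node* pred = head;
        Node* curr = pred->next;
        while (curr != nullptr) {
            curr->lock.lock();             // take the next lock before releasing the previous one
            if (curr->key == key) {
                pred->next = curr->next;   // unlink while holding both locks
                curr->lock.unlock();
                pred->lock.unlock();
                delete curr;
                return true;
            }
            pred->lock.unlock();           // hand over: drop the older lock
            pred = curr;
            curr = curr->next;
        }
        pred->lock.unlock();
        return false;
    }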
8.2 Optimistic synchronization
- let things happen, then check if something went wrong
- better if low contention
- Atomic commit
- check if anything changed
- if no, then edit/commit
- don't use optimistic sync if there are irreversible side effects
9 Lect 9: Patterns for thread safe data structures
- slide 49: `Node.next` should be `volatile` – one thread locks it but another traverses without locking; `item` and `key` are `final`
9.1 Copy on write
- relevant when we need to update multiple fields in the same object
10 Lect 10: Case study: Swing Package
10.1 Threads:
- main thread
- event queue for GUI
10.2 event listener
- create button
- create listener
- install listener
10.3 Swing not thread safe
- the event dispatch thread cannot do long computation – the GUI freezes
- `listenerList` is copy-on-write
- what happens if a listener is removed from `listenerList`?
  - events may still be handled even after the listener is removed
10.4 Assignment 1: Thread safe queue
- C++, template – any class can be put in (see the sketch below)
- use exceptions – inherit from std::exception
- `= default`: the compiler will generate the function
- `= delete`: the function cannot be called (copy constructor, `=` operator)
- `lock_guard` auto-releases when it goes out of scope, even if an exception happens
- `shared_ptr` is auto-deleted after all references are gone
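A condensed sketch in the spirit of Assignment 1 (template, mutex + condition variable, locks released by guards); this is my own minimal version, not the assignment's required interface, and the custom exception class is omitted:

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    template <typename T>
    class ThreadSafeQueue {
    public:
        void push(T value) {
            {
                std::lock_guard<std::mutex> guard(m_);   // released at end of scope
                items_.push(std::move(value));
            }
            not_empty_.notify_one();                     // wake one waiting consumer
        }

        T pop() {                                        // blocks while the queue is empty
            std::unique_lock<std::mutex> lock(m_);       // unique_lock: can be released by wait()
            not_empty_.wait(lock, [this] { return !items_.empty(); });
            T value = std::move(items_.front());
            items_.pop();
            return value;
        }

    private:
        std::queue<T> items_;
        std::mutex m_;
        std::condition_variable not_empty_;
    };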
11 Lect 11: Patterns for concurrent data structures 4: Deadlocks
11.1 Design problem: queue
- cannot be solved just by making all methods
synchronized
empty
is shared, two threads in differentsynchronized
methods can access the same variable.
11.2 TODO Automated tests
- how to test data races?
- create threads, add to queue
- make threads to add certain intervals
- time to make thread v/s work done in thread
11.3 Deadlocks
- when a thread holds multiple locks – think about deadlocks
11.3.1 Avoiding deadlocks
- associate each lock with a serial number
- a thread must acquire the lower-numbered lock before the higher-numbered one (see the sketch below)
- possible orderings to acquire locks:
  - ids stored in the object
  - system identity hash code
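A C++ sketch of the same ordering idea, assuming each lockable object carries a comparable id (the `Account` class and ids are mine); C++ also offers `std::lock` / `std::scoped_lock`, which acquire several mutexes without deadlocking:

    #include <mutex>

    struct Account {
        long id;            // total order used for lock acquisition
        double balance;
        std::mutex m;
    };

    // assumes `from` and `to` are distinct objects
    void transfer(Account& from, Account& to, double amount) {
        // always lock the smaller id first, so two concurrent transfers
        // between the same pair of accounts cannot deadlock
        Account& first  = (from.id < to.id) ? from : to;
        Account& second = (from.id < to.id) ? to   : from;

        std::lock_guard<std::mutex> l1(first.m);
        std::lock_guard<std::mutex> l2(second.m);
        from.balance -= amount;
        to.balance   += amount;
    }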
11.3.2 Deadlock suspesion
- design code without deadlocks
12 Lect 12: Deadlocks (contd)
12.1 Let them happen, then deal with it
- hard to detect deadlock accurately in distributed concurrent programs
- if there is suspicion of a deadlock, treat it as a deadlock
- the `synchronized` keyword doesn't allow `try_lock`
- `try_lock` returns false if it could not acquire the lock, otherwise it locks
- try to get the lock; if not acquired, release your own lock and retry after a delay (sketch below):
  - 0 – try again immediately
  - constant delay
  - random delay
  - exponential back-off
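A sketch of the try-then-back-off approach using `std::timed_mutex::try_lock_for` and a randomized delay; the retry policy here is only illustrative:

    #include <chrono>
    #include <mutex>
    #include <random>
    #include <thread>

    std::timed_mutex lock_a, lock_b;

    void take_both() {
        std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<int> jitter(1, 10);
        for (;;) {
            std::unique_lock<std::timed_mutex> la(lock_a);                  // take the first lock
            std::unique_lock<std::timed_mutex> lb(lock_b, std::defer_lock);
            if (lb.try_lock_for(std::chrono::milliseconds(10))) {
                // got both locks: do the work; destructors release the locks
                return;
            }
            la.unlock();                                                    // give up the first lock
            std::this_thread::sleep_for(
                std::chrono::milliseconds(jitter(rng)));                    // random backoff, then retry
        }
    }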
12.2 Thread priority
- 10 priority levels in Java, possibly mapped to a smaller number of OS thread priorities
- thread priority may not be very useful
- give daemon thread low priority
12.3 Conditional Synchronization
12.3.1 Design by Contract
13 Lect 13: Java synchronized, wait, notify(got late)
13.1 wait
- the thread reacquires the lock on `obj` when it returns from the wait state
- check and wait must be in a while loop
13.2 Safety and Liveness
- safety is correctness
- doing nothing can also be correct
- ensure Liveness
- `notify` v/s `notifyAll`
  - `notify` may be more efficient
  - `notifyAll` may be easier to prove correct
13.3 waiting in C++
condition_variable
13.4 Readers-Writers Problem
- concurrent reads allowed
- at any moment either readers are active (nr > 0) or a single writer is active (nw == 1)
14 Lect 14: Reader Writer problem (contd)
- instead of plain mutual exclusion, multiple readers are allowed but only one writer
- invariant: nr >= 0 ^ nw >= 0 ^ (nw == 0 v (nw == 1 ^ nr == 0))
- `Condition` in Java: `await`
- readers–writer lock (sketch below)
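The invariant above is what a readers–writer lock maintains; a C++17 sketch using `std::shared_mutex` (the function names are mine):

    #include <shared_mutex>

    std::shared_mutex rw;
    int data = 0;

    int read_data() {
        std::shared_lock<std::shared_mutex> r(rw);   // many readers at once (nw == 0)
        return data;
    }

    void write_data(int v) {
        std::unique_lock<std::shared_mutex> w(rw);   // exclusive: nw == 1 implies nr == 0
        data = v;
    }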
14.1 Bounded Buffer
invariant: 0 <= i ^ 0 <= j ^ 0 <= i - j <= N (the buffer never holds a negative number of items, nor more than N)
14.2 TODO unique_lock v/s lock_guard
15 Lect 15: Cancellation
15.1 TODO Solve last problem of last slides
15.2 Cancellation: terminate thread externally
15.2.1 Do cleanup
- release locks
- free memory (if there is no garbage collection; otherwise delete references)
- free resources
15.2.2 Unrestricted asynchronous cancellation
- stop a thread – it stops sometime soon
- Java `Thread.stop` is deprecated – it may not clean up
- difficult to stop a program in a safe manner – see slides
15.2.3 Polling for flag
- set flag
- thread polls and stops when see flag set
- after cleaning up after itself
15.2.4 Deferred cancellation
- request cancellation anytime but will occur at specific points
- where the thread checks for it
- predefined cancellation points
- Implementation
  - Java: `java.lang.Thread.interrupt()`
    - throws `InterruptedException` if the thread is blocked, sleeping, or waiting
    - the `run` method is not declared to throw `InterruptedException`
      - catch the exception and re-interrupt the thread: `Thread.currentThread().interrupt()`
      - or wrap it into a `RuntimeException` (`RuntimeException` doesn't need to be declared – unchecked exception)
    - catching the exception clears the interrupt flag, hence the re-interrupt
    - can poll for the interrupt – `Thread.interrupted()`, `isInterrupted()`
- Advantages
  - language provided
  - works even when the thread is blocked
- Interrupts and memory model
  - the call to `interrupt()` happens-before its detection – no data race
- check for interruption before doing something that is difficult to stop
- empirical test
- Problems
  - blocking on a `synchronized` lock is not interruptible
  - blocking on IO is not interrupted
    - sockets have timed blocking waits
16 Lect 16: Cancellation (contd)
16.1 Interrupts and IO
- Sockets have timeout
- poll when timed out
16.2 Resource revocation (Java)
- forcefully close resource
16.2.1 java.nio
- `close()` is provided in `java.nio`
- no interrupt on write
16.3 C++ thread cancellation
- no support
- use Boost
- can be implemented using timeouts, like Java socket timeouts
- Refer: C++ Concurrency in Action: Practical Multithreading
16.4 Timed Waiting
16.4.1 Java
    if (B, T) S   // wait up to timeout T for condition B; if B holds, execute S
16.4.2 C++
wait_for(lock, duration); wait_until(lock, time_point);
- slide 15: what happens after we go into wait until and then done is true
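A sketch of timed waiting in C++ with `condition_variable::wait_for` and a predicate, matching the `if (B, T) S` pattern above (names are mine):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    bool wait_until_done(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m);
        // returns true if `done` became true before the timeout,
        // false if we gave up (balked) after waiting T
        return cv.wait_for(lock, timeout, [] { return done; });
    }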
17 Lect 17: Split Binary Semaphore
- signaling on the semaphores is sequential
17.1 Passing the baton
- only one thread does (slide 59)
18 Mid Exam 1 Notes:
18.1 Java memory model allows assignments to be reordered.
    // thread 0
    int x = ...
    boolean done = true;

    // thread 1
    if (done) {
        // do something with x
    }
Here declare `done` as `volatile`, so the write to `x` cannot be reordered after the write to `done`. This is also the reason for the problem in the double-checked locking idiom: due to assignment re-ordering, the constructed object can appear non-null before construction is finished.
18.2 Deadlocks
If a thread acquires >= 2 locks, think of deadlock. Two ways of dealing with deadlocks:
- Avoidance: order the locks so that 'B' is never locked without first locking 'A'. In Java, we can use `System.identityHashCode` to get a value from a totally ordered set.
- Dealing with deadlock: use `tryLock`, which uses a timeout to tell us locking failed. Then retry, with:
  - a certain timeout
  - exponential back-off
  - random delay
  - 0 delay
18.3 Safe publication:
- mark the variable as `static`
- mark it as `volatile`
- reference it from a concurrent data structure, such as ConcurrentQueue etc.
- keep an `atomic` reference
18.4 wait, notify
`obj.wait()` only releases one lock; if it is nested inside multiple locks, it releases only that one (the outer ones are still held).
18.5 Passing the baton:
- #guards = conditions to be satisfied
- guards are conditions, thus multiple invariant expressions can form a single guard. Ex: in reader/writer, nr == 0 ^ nw == 0 denotes "safe for a writer".
- be careful about the initial state of the semaphores
18.5.1 How multiple readers access is implemented
The SIGNAL is executed after startReading as well as endReading. Thus, after a reading thread awakens, it can signal another reading thread.
- which in turn can cause starvation for the writing thread
- fix: in startRead, check that there is no writer and no writer pending
18.6 Callable v/s Runnable
A Runnable does not return a result and cannot throw a checked exception; a Callable returns a result and may throw checked exceptions.
18.7 ReentrantLock v/s synchronized
- `ReentrantLock` provides timed lock waits, interruptible lock waits, non-block-structured locks, multiple condition variables and lock polling
- `ReentrantLock` can give conditional locking
18.8 getAndSet()
- `getAndSet()` creates a lot more traffic on the message bus than `get()`. Thus it is better to test with `get()` before spinning on `getAndSet()` (test-and-test-and-set; sketch below).
- `getAndSet()` invalidates the cache, thus a substantial performance penalty.
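A C++ sketch of that test-and-test-and-set idea: spin on a plain read and only attempt the expensive atomic exchange (the `getAndSet`) when the lock looks free:

    #include <atomic>

    class TTASLock {
        std::atomic<bool> held{false};
    public:
        void lock() {
            for (;;) {
                while (held.load(std::memory_order_relaxed)) {
                    // local spinning on a cached read: no bus traffic
                }
                // lock looked free: now try the expensive atomic exchange
                if (!held.exchange(true, std::memory_order_acquire)) {
                    return;        // we flipped it from false to true: lock acquired
                }
                // someone beat us to it: go back to read-only spinning
            }
        }
        void unlock() { held.store(false, std::memory_order_release); }
    };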
18.9 lock_guard
v/s unique_guard
lock_guard
automatically releases lock on reaching closing brace. Takes care of exception.unique_guard
: same aslock_guard
, but can be unlocked and locked again.
18.10 Safety and liveness
- safety says nothing bad happens
- liveness says something good will eventually happen
19 Lect 18: Thread Pools
19.1 Overhead of creating threads, and cleaning them
- 2 options:
- Thread Pool
- Futures
19.2 web Server
19.2.1 TODO check socket docs in Java and C++
19.2.2 socket construction
    // version 1
    Socket s = new Socket(IP_add, port);  // blocking!

    // version 2
    Socket s = new Socket();
    s.connect(new InetSocketAddress(ip, port), timeout);  // added `timeout`
19.2.3 Server construction
19.2.4 Producer Consumer Paradigm
- N threads
- one server socket which creates job in bounded buffer
- N consumer/worker threads
- Need to clean resources etc
- may use `java.util.concurrent.ExecutorService`
20 Lect 19: Executors & Futures
20.1 TODO Java Executors Doc
A Java lambda automagically turns into a `Runnable` or a `Callable` depending on whether a value is returned.
20.2 Asynchronous Computations
20.2.1 Future
- why? joining may not be easy; with an executor it is not even possible (you don't own the thread)
- methods:
- check: isDone()
- wait
- get(timeout) : blocking
- cancel()
- the task should be written in a way which respects the interrupt flag
- C++ futures
  - `<future>`
  - `std::future` – unique future
  - `std::shared_future` – less efficient
  - `valid()`
  - how to create:
    - `std::async`
      - takes the same kind of arguments as a thread (lambdas, parameters, etc.)
      - may create a new thread or not!
      - can pass a parameter to say whether to run in another thread: `std::launch::async` or `std::launch::deferred`
    - `packaged_task`
      - wrapper around a stored task and a shared state (sketch below)
20.3 misc
Lambdas, Runnables and Callables are all closures.
21 Lect 20: C++ Move, futures, fork/join
21.1 Lvalues and Rvalues
- Lvalues are memory locations
21.1.1 C++ refs

    int x = 44;
    int& y = x;
    int& z = 44;  // this won't compile: an lvalue reference can't bind to a temporary
21.1.2 C++ rvalue references
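A small sketch of rvalue references and `std::move`; futures are one place this matters, since `std::future` is movable but not copyable:

    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        std::string s = "a fairly long string";
        std::vector<std::string> v;

        v.push_back(s);             // copies: s is an lvalue
        v.push_back(std::move(s));  // moves: std::move casts s to an rvalue reference,
                                    // so the vector steals s's buffer; s is left unspecified

        std::string&& r = std::string("temp");  // an rvalue reference can bind to a temporary
        (void)r;
        return 0;
    }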
21.2 futures
21.3 fork/join tasks
typical use case: divide and conquer
21.3.1 Work steal
- if a thread's queue is empty, it steals work from another thread's queue
21.3.2 Usage
- override `compute()`
22 Lect 21: case study: Concurrency in Android, Lock implementation
22.1 Android app case study
- `LinkedBlockingQueue`: blocks if the queue is empty
- saves the thread somewhere so that it can be interrupted if needed
- since executor class is used, we don't know which thread computes what
- efforts made to make thread cancel at safe point
22.2 Lock Implementation
22.2.1 Spin Lock Implementation
- if a thread wants a lock and can't get it, it either:
  - blocks (suspends)
  - spins (busy waiting)
    - can be better than suspending on a multiprocessor
    - blocking needs a context switch
22.2.2 TODO Doubts
- Why is this not a built-in optimization? Prof: these are directly mapped to hw instructions; we want them to be fast.
- How do you stop a thread without a context switch? Prof: `park` in Java.
- Do spin locks utilize the processor / will they drain the battery? Prof: yes, hence use them only for short amounts of time.
23 Lect 22: TTAS
23.1 Backoff has disadvantage
23.2 waiting queue
23.2.1 advantages:
- fair
23.3 misc
- atomic reference to an array of int v/s array of atomic ints
- `AtomicInteger`s are mutable – safe
- `AtomicIntegerArray` – elements can be updated safely: see Java docs
23.4 CLHLock
- uses virtual linked list
23.5 MCS Queue Lock
- check the `java.util.concurrent.ConcurrentLinkedQueue` source code
23.6 AtomicInt binary-representation lock
- use bits to denote lock
- rest of the bits to represent tail position index
24 Lect 23: Spin Lock Implementations, hybrid Locks
24.1 Timeout
missed the initial part, was fixing a fiber lamp! :D
24.2 Hybrid Locks
24.3 Fat Locks
- stored in hashtables
24.4 Thin Locks
- fast path if no contention
- convert to fat lock if contention
- the object header contains bits which indicate whether there is a lock; if a fat lock is needed, the bits point to the fat lock
25 Lect 24: Mid 1 Review
25.1 TODO Can there be a deadlock without a conflict?
No!
25.2 Conflicts and deadlock
resolve conflict by
- happens-before
- mark as `volatile`
26 Linearizability and correctness
26.1 Linearizability
- what it means for a concurrent data structure to be correct with respect to its serial specification
- applies even when there are no locks
26.1.1 Objects are linearizable
27 Reasoning about Linearizability
27.1 Linearizability Proof
- the linearization order is the order in which the locks are released
27.2 Sequential Consistency
- not composable
- re-ordering can cause cycles
27.3 Wait Free
- Every contending thread is guaranteed to complete its method call within a bounded
number of its own time steps.
- have helper thread to substitute for the thread
- so that even if the thread fails, it can take over
27.4 Lock Free
- Lock-freedom allows individual threads to starve but guarantees system-wide throughput
- Some contending threads are guaranteed to complete their method call within a bounded number of steps
- The difference between wait-free and lock-free: in wait-free, every process is guaranteed to complete its operation in a finite number of its own steps
27.5 Obstruction free
In absence of contention, a thread is guaranteed to complete its method within a bounded number of steps
27.6 DONE Safe
- if a read call that doesn't overlap a write call returns the value written by the most recent write call
- otherwise, if a read call overlaps with a write call then it can return any value within register's allowed range of values
27.7 DONE Regular
- safe
- if a read overlaps the i-th write, the read can return the value of the i-th or the (i-1)-th write
27.8 TODO Linearizable
27.9 Foundation of Concurrency
- SRSW (single-reader, single-writer) registers
28 Composing higher constructs using primitive
28.1 Safe MRSW from SRSW
- get idea, skipped slides
28.2 Mutual exclusion cannot be used to get wait-freedom
- once a thread is in the critical section, it can "go away" (stall or crash), blocking everyone else
- not all wait free are practical
28.2.1 Example, 2 thread FIFO
- spinning on pre conditions
- works because only 1 thread each on enqueue and dequeue
- what happens if multiple dequeuers
- proved by contradiction
28.2.2 Consensus Number
28.2.3 Read modify write
28.3 ABA problem (Problem in CAS):
- we get value of variable
- then check if the value is still the old one; if yes, change it to the new value
- what if someone changed it and then changed it back to the old value – the check still passes (sketch below)
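A sketch of a compare-and-swap retry loop with `std::atomic`; the ABA problem is exactly the case where the comparison succeeds only because the value was changed away and back between our load and our CAS:

    #include <atomic>

    std::atomic<int> counter{0};

    void increment() {
        int old_value = counter.load();
        // CAS loop: retry until no other thread changed counter in between
        while (!counter.compare_exchange_weak(old_value, old_value + 1)) {
            // compare_exchange_weak reloads old_value on failure, so just retry
        }
        // Note: for a plain counter ABA is harmless, but for pointers (e.g. the top
        // of a lock-free stack) "same value again" does not mean "nothing happened".
    }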
28.4 Lock free
- cannot have lock free
29 Parallel Programming
- subset of concurrent programming
- OpenMP: shared-memory parallelism – compiler directives
29.1 OpenMP
- add directives to the program; the compiler takes care of it. If the compiler doesn't understand a directive, it ignores it
- Fork-join Parallelism
- need not create actual thread, might be thread pool
example:
#include "omp.h" ... #pragma omp parallel for for (i = 0; i < 1000; i++) { //some big computation }
- works only (better?) on structured/blocked code
- because there is implicit barrier at end of block
29.1.1 SPMD (Single program, multiple data):
Get thread id then do different work based on id
int ID = omp_get_thread_num();
- barrier at end of openMP block implicitly
29.1.2 Work sharing
- divides work for threads automatically
- in SPMD this was done manually
- args:
  - `schedule(static [,chunk])`: predictable work per iteration
  - `schedule(dynamic [,chunk])`: unpredictable, highly variable work per iteration
  - `schedule(guided [,chunk])`
  - `schedule(runtime)`
- instead of a shared variable like `sum`, have `sum[]` and give each thread its own slot to store into
29.2 OpenMP: variables in block
- most variables are shared
- local variables are not shared
29.2.1 TODO Types
lastprivate
firstprivate
30 OpenMP (cont)
30.1 Reduction
- don't accumulate!
- operator
- associative
- commutative
- form:
reduction(op : list)
- based on the operator, the private copies are initialized:
  - `+` → 0
  - `*` → 1
- slide 60: by applying the reduction clause, we get the final result in `res`
  - each thread does its accumulation into a private copy (see the sketch below)
- without the reduction clause, `res` is shared, so accessing it needs to be synchronized
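A sketch of the reduction clause (OpenMP in C/C++); each thread accumulates into a private copy of `res` initialized to 0 for `+`, and the copies are combined at the implicit barrier:

    #include <omp.h>

    double sum_array(const double* a, int n) {
        double res = 0.0;
        #pragma omp parallel for reduction(+ : res)
        for (int i = 0; i < n; i++) {
            res += a[i];    // each thread updates its own private res
        }
        return res;         // private copies combined into the shared res
    }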
30.2 Critical Section
    #pragma omp critical
    {
        // the critical section
    }
30.3 False Sharing
- performance problem caused by how caches are updated, not a correctness problem
- array: values next to each other; updating one element invalidates the whole cache line, which also contains the other elements
30.4 Atomic
- special case of critical section
30.5 Barriers
- `#pragma omp barrier`: explicit – wait until all threads arrive
- implicit barrier at the end of a block
- a lot of barriers by default; the `nowait` clause removes one
30.6 Ordered:
sequential ordering
- if operation is not associative/commutative
30.7 Flush
- memory synchronization
- creates consistent view of memory
- write everything to memory before flush, read everything after flush from memory
30.8 Environment Variable
- setting options at runtime (after compiling, during execution)
30.9 Problems with openMP
- when to flush
- memory not symmetric, some memory close to some processor
30.10 Tree based reduction
- divide and conquer
- serial to parallel
30.11 Floating point is not associative
- programmer needs to take care of it
31 Research Work: ACES4
- Between the compiler and the runtime system, significant opportunities exist for error checking as well as static and dynamic analyses to improve performance
- asynchronous task delegation
- MADNESS DAG
- any issues or bug fixes needed in code
32 Performance Models
32.1 Metrics
- Elapsed time
- TODO: how good is this, multi processors real time
- Java: `System.nanoTime`
- C++: the `<chrono>` library
- Price/Performance
- Speedup
- Efficiency
- Energy Efficiency
32.2 Models
32.2.1 Simple Analytical model
T(1) = T_{serial} + T_{parallelizable}(1)
32.3 Amdahl's Law
- effect of the serial part on overall performance
- i.e., the serial part limits the overall speedup (see below)
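A worked statement of this limit, in the notation of the simple model above (with f = T_serial / T(1), the serial fraction):

    T(P) = T_serial + T_parallelizable(1) / P
    Speedup(P) = T(1) / T(P) = 1 / (f + (1 - f) / P)  ->  1 / f  as P -> infinity

E.g. if 10% of the work is serial (f = 0.1), the speedup can never exceed 10, no matter how many processors are added.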
32.4 Message Passing
- UE: unit of execution
- Memory Sharing needs locks
- You cannot receive a message before it is sent
32.4.1 CSP (Communicating Sequential Processes)
- CAR Hoare paper 1978
- S0 || S1 … Sn
- x ! S1 – send (output)
- y ? S0 – receive (input); both are blocking
33 Go
33.1 Background
- type inference
33.2 Concurrency
- share memory by communicating
- goroutine:
- lightweight
- channels:
- first class, can be passed around
34 Actors
- works well when regular flow of data
34.1 Actors inside
- local data
- mailbox (message queue)
- Loop - handle messages
34.2 Difference with OOP
- actors do not share memory
- can't send references
- send deep copy
- many frameworks try to optimize this
- send ref if safe
- immutable messages are safe
- enforcing encapsulation may not be guaranteed
- many frameworks try to optimize this
- in OOP, sending a message (calling a method) is synchronous; actor message sends are asynchronous
34.3 MPI
- portable message-passing standard
34.4 TODO SPMD
35 MPI
35.1 MPI_type
- constant
35.2 MPI_Send
- blocks until the buffer has been copied out and is safe to reuse
- `MPI_Recv` is similarly blocking
35.3 SPMD
same code; based on the rank (id), each process decides which part it works on
35.4 Asynchrony
- the nonblocking send and recv (`MPI_Isend` / `MPI_Irecv`) return immediately; we handle completion using `MPI_Wait` (example below)
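A minimal sketch of blocking `MPI_Send` / `MPI_Recv` between rank 0 and rank 1 (MPI C API, compiles as C++; error handling omitted):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   // blocks until the buffer can be reused
        } else if (rank == 1) {
            int value;
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }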
36 Vector Algorithms
36.1 Regular datastructures (homogeneous array)
- instead of having an array of structs, have a struct of arrays
- better performance due to locality (sketch below)
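A sketch of that array-of-structs vs struct-of-arrays point; the SoA layout keeps each field contiguous, which helps cache locality and compiler vectorization (types and names are mine):

    #include <vector>

    // Array of structs: x and y are interleaved in memory
    struct PointAoS { float x, y; };
    std::vector<PointAoS> points_aos;

    // Struct of arrays: all x values are contiguous, all y values are contiguous
    struct PointsSoA {
        std::vector<float> x;
        std::vector<float> y;
    };

    // A loop over soa.x alone touches one dense array and vectorizes easily.
    float sum_x(const PointsSoA& soa) {
        float s = 0.0f;
        for (float v : soa.x) s += v;
        return s;
    }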
36.2 Kernel Parallelism
- Cuda and OpenCl
36.3 Vector parallelism (SIMD)
36.3.1 Challenges:
- change the problem – change the code
- special register - small number of vector registers
- may have to swap to memory
36.3.2 How to use vector units
- Language constructs
- Compiler vectorization
- typically loops
- pass flag to see vector report
- some code may look un-vectorizable to the compiler
- vector instruction sets
- Example: SSE 4.2
36.4 OpenCL
- portable way to use all CPUs, GPUs etc
37 Final Review
37.1 Instructions
- 6 pages notes
- 10am to 12, 14th December
37.2 Preparation
- Homework
- Definitions
- Previous Exams
38 Deadlocks
- When a thread holds multiple locks, think about deadlocks!
- Eg: a `synchronized` method invokes another `synchronized` method associated with a different monitor lock.
38.1 Solutions
- Resource ordering
  - can use `System.identityHashCode`
- Deadlock suspicion
  - lock with timeout
    - not available with the monitor (`synchronized`) – use `ReentrantLock`
39 Livelock
- narrow passage example: a passage wide enough for only one person, two people coming from opposite directions; both keep stepping sideways to let the other pass
- Solution: random backoff
40 Starvation
a thread is perpetually denied a resource it needs
41 Invariant
conditions that must be true at certain points.
41.1 Pre-condition
ways to check
- Optimistic: do everything then check
- Balking
- Guarded Suspension: suspend until the precondition becomes true
- Timed Waiting: wait for certain time then balk
- atomic await
  `<await B {S}>`: wait till `B` becomes `true`, then atomically execute `S`
- Implement
- Test a condition and suspend thread if not true
- release lock before suspending
- when awakened, re-acquire the lock before proceeding
Every `Object` in Java has 2 queues: the wait set and the lock (entry) set.

    // <await B {S}> =>
    synchronized (obj) {
        while (!B) {
            obj.wait();
        }
        S;
    }
- `wait` gives safety
- `notify` gives liveness
- Condition Variables
- Java: `Condition cv = lock.newCondition()` (on a `ReentrantLock` instance)
  - each condition variable has its own wait set
Comparison with the monitor:
  condition variable: await / signal / signalAll – per specific condition
  monitor:            wait  / notify / notifyAll – per monitor lock
- C++: `std::condition_variable c;`
  - `c.wait` takes in the mutex (needed for the release-and-reacquire mechanism)
  - optionally takes in the condition (guard) as a predicate, else we need to manually loop on `!B` (the condition)
- Reader Writer lock: more expensive than mutex
- only worthwhile when writes are rare and the lock is held briefly
- Implement