Concurrent Programming

1 Lect 2 (24 August): Basic Mechanics for Processes and Threads

  • Unit of execution – common term for a process or thread
  • Threads: lightweight processes

1.1 Process

  • has its own heap and stack (address space)
  • IPC requires a system call, hence slow
  • created in UNIX using fork – the child is identical to the parent except for fork's return value

1.2 Thread

  • shares the address space of its process; the heap is shared
  • pthreads: POSIX threads
  • IPC doesn't require a system call

1.2.1 PThreads

See the slides for more Pthreads APIs; here are details on a few of them:

  • int pthread_join(pthread_t thread, void **value_ptr); suspends execution of the calling thread until the target thread terminates. Useful when the main thread waits for spawned threads to complete their execution before continuing. The value returned by the thread can be collected through value_ptr.
  • void pthread_exit(void *retval); exits the thread which calls it. If the main thread calls pthread_exit, the main thread exits but the spawned threads continue. This is useful when the main thread is used only to spawn new threads and does no other work.
  • int pthread_detach(pthread_t thread); The pthread_detach() function marks the thread identified by thread as detached. When a detached thread terminates, its resources are automatically released back to the system without the need for another thread to join with it.
  • use pthread_exit to exit from a thread and return a value instead of using return directly, so that the proper destructors/cleanup handlers are called.
  1. detached vs join

    Create a detached thread using pthread_attr_setdetachstate when we don't want to join it. A detached thread, after terminating, returns all of its resources, whereas a joinable thread doesn't return its resources until it is joined. If a joinable thread is never joined, its resources are not released until the process terminates.

  2. Todos
    1. DONE return value void*
    2. DONE increment t
    3. DONE Read about pthreads

1.2.2 Java: java.lang.Thread

  • threads can be set as Daemon
    • the JVM waits for non-daemon threads before shutting down (see the sketch below)
  1. DONE Java8 lambda
  2. DONE Java8 Lexical Scope
  3. Scheduling
    • set by JVM
    • priorities are just guidelines
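
A minimal sketch of the daemon-thread behaviour (my own example, not from the slides; the class and variable names are made up):

public class DaemonDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (true) {
                // background housekeeping; the JVM will not wait for this thread
                try { Thread.sleep(500); } catch (InterruptedException e) { return; }
            }
        });
        worker.setDaemon(true);   // must be set before start()
        worker.start();

        Thread.sleep(1000);
        // main (a non-daemon thread) returns here; the JVM then shuts down
        // without waiting for the daemon worker
    }
}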

1.2.3 C++

  1. TODO C++ lambdas
  2. TODO try multi threaded program

2 Lect 3 (29 August): Atomicity

2.1 Immutable: Java BigInteger is immutable

2.2 Stateless method

Everything the method creates is deleted/cleared after it returns, so it is thread safe.

2.3 AtomicLong: Java provided atomic DS

  • provides atomic methods such as incrementAndGet
  • C++ added atomic types in C++11
  • utilizes hardware support to provide atomicity
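
A small sketch of using AtomicLong (my own example; the Counter class is made up):

import java.util.concurrent.atomic.AtomicLong;

public class Counter {
    private final AtomicLong hits = new AtomicLong(0);

    public long recordHit() {
        return hits.incrementAndGet();   // atomic read-modify-write, no lock needed
    }

    public long current() {
        return hits.get();
    }
}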

2.3.1 TODO Memory synchronization

2.4 Lock

  • Package java.util.concurrent.locks – Lock interface
  • ReentrantLock
  • Exceptions in critical section
    • use unlock in finally

      l.lock();
      try {
        // critical section
      } finally {
        l.unlock();
      }
      
  • Note: we need to hold the lock across checking lastNumber and returning lastFactor, so that we return what we checked [slide 25]

2.4.1 TODO slide 24, return statement

2.5 Pthread Mutexes

  • pthread_mutex_[lock, trylock, unlock]
  • trylock does not block; it reports whether the lock was acquired
  • ReentrantLock (Java) or PTHREAD_MUTEX_RECURSIVE (pthreads): the same thread can lock the same lock again
    • Java locks are reentrant by default; in C/C++ the default mutex is not recursive
  • no garbage collector, so call pthread_mutex_destroy explicitly

2.6 Java Intrinsic Locks (monitor locks)

  • used with synchronized keyword
  • compiler gives support for proper use
  • this is the default object used for locking
  • the synchronized keyword is not inherited in subclasses
  • Java: intrinsic lock can also be on class object, refer SO (slide 47)

2.6.1 DONE was in earlier version, considered as mistake

  • many still use it
  • every object has a lock – overhead
  • some operations are unavailable with this style of locking
    • lockInterruptibly, tryLock
  • SO says synchronized is for easy cases, locks give more flexibility and responsibility.

2.6.2 How monitors work – intuition

AtomicInteger can be implemented using the synchronized keyword.

class AtomicInt {
    private int value;

    // the intrinsic (monitor) lock on `this` makes the
    // read-modify-write below atomic
    public synchronized int increment() {
        return ++this.value;
    }
}

Meanwhile, in some other part of the universe: AtomicInt i = new AtomicInt(0); t1 calls i.increment() and t2 calls i.increment(). t1 and t2 are different threads, but i is shared, and `this` (i.e., i) is the object used for the intrinsic (monitor) lock in synchronized public int increment(). Thus atomicity is maintained.

3 Lect 4 (31 August): Atomicity Continued

3.1 template for atomic type

  • ++ is overloaded
  • implemented by the runtime
    • often implemented using hardware instructions
    • if those are not available, a lock is used
    • in C++, is_lock_free() indicates whether a lock is used

3.2 mutex (locks)

3.2.1 lock_guard

  1. TODO Implements RAII (Resource Acquisition Is Initialization)
    • manager object: acquires the resource in its constructor, releases it in its destructor

      {
           manage_obj man; // constructor acquires the resource (file open, mutex lock)
           // do your stuff...
      } // man goes out of scope, the destructor is called, the resource is released
      
    • mutex RAII - explained in slide "Cached version"

4 Lect 4 (31 August): Memory Model

5 Lect 5 (7 September): Memory Model continued

5.1 TODO Visibility

5.2 TODO TSO (Total store order)

5.3 Special instructions/ FENCE

  • memory operations on the two sides of a FENCE are not reordered across it
    • e.g., on some architectures the fence instruction is called sync
  • the compiler takes care of inserting fences
  • add a FENCE only when required

5.4 Compiler optimization

  • e.g., dead code elimination can break spin locks (the loop that re-reads an unmodified flag may be optimized away)

5.4.1 when to optimize?

  • the programmer needs to communicate with the compiler (e.g., via volatile/atomics) about what must not be optimized

5.5 Vocabulary

5.5.1 Conflicting operation:

  • different threads
  • access same location
  • at least one write

5.5.2 Data race:

conflicting operations that are not ordered (by synchronization / happens-before)

5.6 A data race free (DRF) program behaves sequentially consistently

5.7 What happens if data race

5.7.1 Java

  1. No out-of-thin-air values

    this guarantee matters mainly for races that are deliberately allowed for performance

5.8 Race v/s data race

  • some people use "race" to mean any unwanted behavior caused by the scheduler (a race condition), which is not the same as a data race

5.9 Atomics provide ordering and atomicity

5.10 Order: defined by control flow (program order)

5.11 TODO Ordered by happens-before

6 Lect 6: Publishing

6.1 Publishing (problems with Publishing data)

  • ensure a happens-before relation between the construction of an object and its use
  • wrapper for implementing multithreading
  • do not trust the function
  • block pointers

6.1.1 Java double checking locking

  • a partially constructed instance can pass the instance == null check, so another thread may see a non-null but not fully constructed object
    • the constructor may not be atomic
    • can be fixed by making instance volatile (see the sketch below)
    • can this be fixed by synchronizing the constructor?
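
A sketch of the double-checked locking idiom with a volatile field (my own example; the Singleton class is hypothetical):

public class Singleton {
    // volatile creates a happens-before between construction and the read below
    private static volatile Singleton instance;

    public static Singleton getInstance() {
        if (instance == null) {                 // first check, no lock
            synchronized (Singleton.class) {
                if (instance == null) {         // second check, under the lock
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}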

6.1.2 C++ initialization

  1. TODO shared_pointers
  2. std::call_once: thread-safe one-time initialization provided by the library
    • used together with std::once_flag

6.2 unsafe publication - escape

6.2.1 Construction

  • Java: this must not escape during construction
  • use a private constructor and a factory wrapper to obtain new objects

6.3 Immutability

  • final is special in Java with respect to concurrency
  • call_once is the roughly corresponding facility in C++

7 Lect 7: Patterns for concurrent programming

7.1 Fully synchronized object

  • one single lock
  • no public fields
  • finite
  • consistent
  • publish object without data race

7.1.1 Not scalable

7.1.2 Easy to implement

  • good when concurrency level is low

7.1.3 Publishing

  • if instance is volatile in Java, then its construction is published safely
  • publishing is about making a constructed object visible

7.1.4 Reusing sequential code

  • modify source code - add lock
  • subclass - add lock
  • adapter, proxy, lock then call method
  • the iterator is not thread safe
    • the user has to synchronize manually (see the sketch after this list)
    • why isn't synchronizing each method enough?

      iter.hasNext();
      nextItem = iter.next(); // the check-then-act spans two calls, so it is not atomic

    • may throw ConcurrentModificationException
      • detection is best-effort, not guaranteed
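
A sketch of client-side locking while iterating a synchronized collection, as mentioned above (my own example; class and field names are made up):

import java.util.*;

public class IterationDemo {
    private final List<String> items =
            Collections.synchronizedList(new ArrayList<>());

    public void printAll() {
        // each method of the list is synchronized, but iteration spans many
        // calls, so we must hold the list's lock for the whole loop
        synchronized (items) {
            for (Iterator<String> it = items.iterator(); it.hasNext(); ) {
                System.out.println(it.next());
            }
        }
    }
}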

7.2 Finer grained approach

  • maybe too small granularity
  • identify stateless
    • does not rely on mutable state
    • slide 56: why make a copy of b?
      • maybe b is read more than once in compute

8 Lect 8: Patterns for concurrent data structures

8.1 Hand-over-hand locking

  • lock only one element of the data structure
    • in the example in the slides: to delete a node, we need to hold the lock on its predecessor
      • what about inserting a new node? (my question)

8.2 Optimistic synchronization

  • let things happen, then check if something went wrong
  • better if low contention
  • Atomic commit
    • check if anything changed
    • if no, then edit/commit
  • don't use optimistic synchronization if there are irreversible side effects

9 Lect 9: Patterns for thread safe data structures

  • slide 49:
    • Node.next should be volatile: one thread locks and modifies it while another traverses without locking.
    • item and key should be final.

9.1 Copy on write

  • relevant when we need to update multiple fields of the same object consistently (see the sketch below)
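
A minimal copy-on-write sketch for updating multiple related fields together (my own example, assuming reads vastly outnumber writes; all names are made up):

import java.util.concurrent.atomic.AtomicReference;

public class Point {
    // immutable snapshot of the two related fields
    private static final class Coords {
        final int x, y;
        Coords(int x, int y) { this.x = x; this.y = y; }
    }

    private final AtomicReference<Coords> coords =
            new AtomicReference<>(new Coords(0, 0));

    // readers see a consistent (x, y) pair without locking
    public int[] get() {
        Coords c = coords.get();
        return new int[] { c.x, c.y };
    }

    // writers replace the whole snapshot atomically
    public void moveTo(int x, int y) {
        coords.set(new Coords(x, y));
    }
}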

10 Lect 10: Case study: Swing Package

10.1 Threads:

  • main thread
  • event queue for GUI

10.2 event listener

  • create button
  • create listener
  • install listener

10.3 Swing not thread safe

  • the event dispatch thread must not do long computations – the GUI freezes
  • listenerList is copy-on-write
  • what happens if a listener is removed from listenerList?
    • it may still handle events even after removal, since dispatch iterates over the old snapshot

10.4 Assignment 1: Thread safe queue

  • C++, template – any class can be put in
  • use exceptions – inherit from std::exception
  • by default the compiler will generate these functions
  • mark them = delete so they cannot be called
    • copy constructor
    • = operator (copy assignment)
  • lock_guard automatically releases the lock when it goes out of scope
    • even if an exception is thrown
  • shared_ptr: the object is automatically deleted after all references are gone

11 Lect 11: Patterns for concurrent data structures 4: Deadlocks

11.1 Design problem: queue

  • cannot be solved just by making all methods synchronized
    • empty is shared, two threads in different synchronized methods can access the same variable.

11.2 TODO Automated tests

  • how to test for data races?
  • create threads, add to the queue
  • make the threads add at certain intervals
  • compare the time to create a thread vs. the work done in the thread

11.3 Deadlocks

  • when a thread holds multiple locks, think about deadlocks

11.3.1 Avoiding deadlocks

  • associate each lock with a serial number
  • an instance must not be locked before the lower-numbered instance is locked
  • i.e., acquire the lower-numbered lock before the higher-numbered lock
  • orderings to acquire locks (see the sketch below):
    • ids stored in the objects
    • System.identityHashCode
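
A sketch of acquiring two intrinsic locks in System.identityHashCode order, in the JCIP style (my own example; transfer/moveMoney and the tie-breaker lock are made up):

public class AccountTransfer {
    // tie-breaker for the rare case where the identity hash codes collide
    private static final Object TIE_LOCK = new Object();

    public static void transfer(Object from, Object to, Runnable moveMoney) {
        int fromHash = System.identityHashCode(from);
        int toHash = System.identityHashCode(to);

        if (fromHash < toHash) {
            synchronized (from) { synchronized (to) { moveMoney.run(); } }
        } else if (fromHash > toHash) {
            synchronized (to) { synchronized (from) { moveMoney.run(); } }
        } else {
            // hash collision: serialize through a global tie-breaking lock
            synchronized (TIE_LOCK) {
                synchronized (from) { synchronized (to) { moveMoney.run(); } }
            }
        }
    }
}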

11.3.2 Deadlock suspicion

  • design code without deadlocks

12 Lect 12: Deadlocks (contd)

12.1 Let them happen, then deal with it

  • it is hard to detect deadlocks accurately in distributed concurrent programs
  • if there is suspicion of a deadlock, treat it as a deadlock
  • the synchronized keyword doesn't support tryLock
  • tryLock returns false if the lock could not be acquired; otherwise it acquires the lock
  • try to get the other lock; if it is not acquired, release the lock(s) you already hold and retry after (see the sketch below):
    • 0 delay: try again immediately
    • a constant delay
    • a random delay
    • exponential back off
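
A sketch of the tryLock-and-back-off idea from the list above (my own example; the names and timeouts are made up):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockTransfer {
    // acquire two locks without deadlocking: give up and retry on failure
    public static void withBothLocks(ReentrantLock a, ReentrantLock b,
                                     Runnable action) throws InterruptedException {
        while (true) {
            if (a.tryLock(10, TimeUnit.MILLISECONDS)) {
                try {
                    if (b.tryLock(10, TimeUnit.MILLISECONDS)) {
                        try {
                            action.run();
                            return;
                        } finally {
                            b.unlock();
                        }
                    }
                } finally {
                    a.unlock();
                }
            }
            // could not get both locks: back off for a random delay, then retry
            Thread.sleep(ThreadLocalRandom.current().nextInt(1, 20));
        }
    }
}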

12.2 Thread priority

  • 10 priority levels in Java, which may be mapped to a smaller number of OS thread priorities
    • so thread priority may not be very useful
  • give daemon threads low priority

12.3 Conditional Synchronization

12.3.1 Design by Contract

  1. precondition

    responsibility of caller -> must be true before calling method

  2. postcondition

    responsibility of method -> must be true after method is called

  3. Invariant

    always true between method calls

12.3.2 3 main conservative approaches

  1. balking:

    I'm not doing it!

  2. Guarded suspension

    wait till someone completes the precondition

    • use atomic_wait

  3. Timed waiting

    wait for a bounded time for the precondition to become true, then balk (see Lect 16)

13 Lect 13: Java synchronized, wait, notify(got late)

13.1 wait

  • the waiting thread reacquires the lock after coming back from the wait state.
  • check and wait must be in a while loop

13.2 Safety and Liveness

  • safety is correctness
  • doing nothing can also be correct
    • ensure Liveness
  • notify v/s notifyAll
    • notify may be efficient
    • notifyAll may be easier to prove correct

13.3 waiting in C++

  • condition_variable

13.4 Readers-Writers Problem

  • concurrent read allowed

    nr > 0 || nw == 1
    

14 Lect 14: Reader Writer problem (contd)

  • instead of just mutual exclusion, multiple readers allowed, only one writer allowed.
  • invariant

    nr >= 0 ∧ nw >= 0 ∧
    (nw == 0 ∨ (nw == 1 ∧ nr == 0))
    
  • Condition in Java
    • await()
  • readers lock (ReadWriteLock)
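
A sketch of a readers–writers controller that maintains the invariant above using wait/notifyAll (my own example; note that writers may starve, cf. 18.5.1):

public class ReadWriteController {
    private int nr = 0;  // active readers
    private int nw = 0;  // active writers (0 or 1)
    // invariant: nw == 0 || (nw == 1 && nr == 0)

    public synchronized void startRead() throws InterruptedException {
        while (nw > 0) wait();
        nr++;
    }

    public synchronized void endRead() {
        nr--;
        if (nr == 0) notifyAll();
    }

    public synchronized void startWrite() throws InterruptedException {
        while (nr > 0 || nw > 0) wait();
        nw = 1;
    }

    public synchronized void endWrite() {
        nw = 0;
        notifyAll();
    }
}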

14.1 Bounded Buffer

  • invariant

    0 <= i ∧ 0 <= j ∧ 0 <= i - j <= N
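
A sketch of a bounded buffer with guarded suspension (my own example; here the invariant is expressed on the element count):

public class BoundedBuffer<T> {
    private final Object[] buf;
    private int in = 0, out = 0, count = 0;
    // invariant: 0 <= count <= buf.length

    public BoundedBuffer(int capacity) { buf = new Object[capacity]; }

    public synchronized void put(T item) throws InterruptedException {
        while (count == buf.length) wait();   // full: wait for a consumer
        buf[in] = item;
        in = (in + 1) % buf.length;
        count++;
        notifyAll();
    }

    @SuppressWarnings("unchecked")
    public synchronized T take() throws InterruptedException {
        while (count == 0) wait();            // empty: wait for a producer
        T item = (T) buf[out];
        buf[out] = null;
        out = (out + 1) % buf.length;
        count--;
        notifyAll();
        return item;
    }
}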
    

14.2 TODO unique_lock v/s lock_guard

15 Lect 15: Cancellation

15.1 TODO Solve last problem of last slides

15.2 Cancellation: terminate thread externally

15.2.1 Do cleanup

  • release locks
  • free memory (if there is no garbage collector; otherwise drop references)
  • free resources

15.2.2 Unrestricted asynchronous cancellation

  • stop thread - it stops sometime soon
  • Java's Thread.stop is deprecated – it may not clean up
    • it is difficult to stop a program in a safe manner – see slides
  1. TODO why called asynchronous

15.2.3 Polling for flag

  • set flag
  • the thread polls and stops when it sees the flag set
    • after cleaning up after itself
  1. Problems
    • puts the burden on the programmer
    • does not help while the thread is locked or blocked

15.2.4 Deferred cancellation

  • cancellation can be requested at any time, but it takes effect only at specific points
    • where the thread checks for it
    • predefined cancellation points
  1. Implementation
    1. Java
      • java.lang.Thread.interrupt()
      • InterruptedException is thrown
        • if the thread is blocked, sleeping, or waiting
        • the run method is not declared to throw InterruptedException
          • catch the exception and re-interrupt the thread: Thread.currentThread().interrupt()
          • or wrap it into a RuntimeException
            • RuntimeException doesn't need to be declared (unchecked exception)
        • catching the exception clears the interrupt flag
      • can poll for the interrupt – Thread.interrupted(), isInterrupted()
      1. Advantages
        • language provided
        • works even when the thread is blocked
      2. Interrupts and memory model
        • call on interrupt() happens before detection
          • no data race
        • check before something difficult to stop
        • empirical test
      3. Problems
        • blocking on a synchronized lock is not interruptible
        • blocking on IO is not interrupted
          • sockets have timed (timeout-based) blocking
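
A sketch of a task that cooperates with deferred cancellation by polling the interrupt flag and restoring it after a blocking call (my own example; doChunkOfWork is a placeholder):

public class PollingWorker implements Runnable {
    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                doChunkOfWork();          // hypothetical unit of work
                Thread.sleep(100);        // blocking call: throws if interrupted
            }
        } catch (InterruptedException e) {
            // the blocking call cleared the interrupt flag; restore it so
            // callers (e.g. a thread pool) can still observe the cancellation
            Thread.currentThread().interrupt();
        }
        // fall through: clean up and exit
    }

    private void doChunkOfWork() { /* ... */ }
}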

16 Lect 16: Cancellation (contd)

16.1 Interrupts and IO

  • Sockets have timeout
    • poll when timed out

16.2 Resource revocation (Java)

  • forcefully close resource

16.2.1 java.nio

  • close() is provided in java.nio
  • no interrupt on write

16.3 C++ thread cancellation

  • no support
  • use Boost
  • can be implemented with a timeout, like Java sockets
    • Refer: C++ concurrency in action: Practical Multithreading

16.4 Timed Waiting

16.4.1 Java

if (B, T) S
// B: the condition to wait for
// T: wait for at most timeout T
// then execute S
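
A sketch of the timed-waiting pattern in Java, using wait with a deadline (my own example; the names are made up):

public class TimedGuard {
    private boolean ready = false;

    // wait up to timeoutMillis for the condition, then give up (balk)
    public synchronized boolean awaitReady(long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!ready) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;          // timed out: condition still false
            }
            wait(remaining);           // may return early or spuriously
        }
        return true;                   // condition holds, lock is held
    }

    public synchronized void setReady() {
        ready = true;
        notifyAll();
    }
}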

16.4.2 C++

wait_for(lock, duration);
wait_until(lock, time_point);
  • slide 15: what happens after we go into wait until and then done is true

16.5 Semaphores

16.5.1 P and V

nat s; //s >= 0
P(): <await (s > 0) s-->
V(): <s++>
  1. Chunked P/V (acquire or release more than one unit at a time)
    • useful in readers/writers (see the sketch below)
      • if n readers are allowed, a writer requires all n
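
A sketch of the chunked-acquire idea using java.util.concurrent.Semaphore (my own example; MAX_READERS is made up):

import java.util.concurrent.Semaphore;

public class SemaphoreRW {
    private static final int MAX_READERS = 10;
    private final Semaphore permits = new Semaphore(MAX_READERS, true);

    public void read() throws InterruptedException {
        permits.acquire();              // one permit per reader
        try {
            // ... read the shared data ...
        } finally {
            permits.release();
        }
    }

    public void write() throws InterruptedException {
        permits.acquire(MAX_READERS);   // "chunk" acquire: excludes all readers
        try {
            // ... modify the shared data ...
        } finally {
            permits.release(MAX_READERS);
        }
    }
}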

16.5.2 Split binary semaphore

  • producer/consumer – 2 semaphores, empty and full
    • gives a signaling-like effect

17 Lect 17: Split Binary Semaphore

  • signaling on semaphores is sequential

17.1 Passing the baton

  • only one thread does (slide 59)

18 Mid Exam 1 Notes:

18.1 Java memory model allows assignments to be reordered.

// x and done are shared (non-volatile) fields

// thread 0
x = ...;        // write the data
done = true;    // then set the flag

// thread 1
if (done) {
  // do something with x
}

Here, declare done as volatile so that the write to x happens-before the read in thread 1. This is also the source of the problem in the double-checked locking idiom: due to reordering, the constructed object can appear non-null before its construction is finished.

18.2 Deadlocks

If a thread acquires >= 2 locks, think of deadlock. Two ways of dealing with deadlocks:

  1. Avoidance: order the locks so that 'B' is never locked without first locking 'A'. In Java, we can use System.identityHashCode to get values from a totally ordered set (handle the rare tie with an extra tie-breaking lock).
  2. Dealing with deadlock: use tryLock, which uses a timeout to report that locking failed. Then retry with:
    1. certain timeout
    2. exponential back off
    3. random delay
    4. 0 delay

18.3 Safe publication:

  1. initialize the variable in a static initializer
  2. mark it as volatile
  3. store the reference in a concurrent data structure, such as a ConcurrentLinkedQueue
  4. keep it in an atomic reference (AtomicReference).

18.4 wait, notify

  • obj.wait() releases only obj's lock; if the call is nested inside multiple locks, only that one is released (the outer ones remain held).

18.5 Passing the baton:

  • #guards = number of distinct conditions to be satisfied
  • guards are conditions, so multiple invariant expressions can form a single guard. E.g., in readers/writers, nr == 0 ∧ nw == 0 means it is safe for a writer.
  • be careful about the initial state of the semaphores

18.5.1 How multiple readers access is implemented

The SIGNAL is executed after startReading as well as endReading. Thus, after a reading thread awakens, it can signal another reading thread.

  • which in turn can cause starvation of the writing thread
    • fix: in startRead, check that there is no active writer and no pending writer.

18.6 Callable v/s Runnable

A Runnable does not return a result and cannot throw a checked exception; a Callable returns a result and may throw checked exceptions.
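
A small sketch of the difference (my own example):

import java.util.concurrent.*;

public class CallableVsRunnable {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        Runnable task = () -> System.out.println("no result");  // returns nothing
        Callable<Integer> compute = () -> 40 + 2;                // returns a value,
                                                                 // may throw checked exceptions
        pool.execute(task);
        Future<Integer> answer = pool.submit(compute);
        System.out.println(answer.get());   // blocks until the result is ready

        pool.shutdown();
    }
}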

18.7 ReentrantLock v/s synchronized

  • ReentrantLock provides timed lock waits, interruptible lock waits, non-block-structured locks, multiple condition variables and lock polling
  • ReentrantLock can also be acquired conditionally (tryLock).

18.8 getAndSet()

  • getAndSet() creates a lot more traffic on the memory bus than get(), because it invalidates the other processors' cached copies. Thus it is better to test with get() before spinning on getAndSet(); otherwise there is a substantial performance penalty.

18.9 lock_guard v/s unique_lock

  • lock_guard automatically releases the lock on reaching the closing brace; it also takes care of exceptions.
  • unique_lock: same as lock_guard, but can be unlocked and locked again.

18.10 safety and liveness

  • safety says nothing bad happens
  • liveness something good will eventually happen

19 Lect 18: Thread Pools

19.1 Overhead of creating threads, and cleaning them

  • 2 options:
    1. Thread Pool
    2. Futures

19.2 web Server

19.2.1 TODO check socket doc in java and c++

19.2.2 socket construction

// version 1
Socket s = new Socket(IP_add, port);
// the constructor connects immediately: blocking, no timeout!

// version 2
Socket s = new Socket();                                // unconnected socket
s.connect(new InetSocketAddress(ip, port), timeout);    // connect with a timeout
  1. Work around
    • Create another thread for blocked call
    • try to join the thread by giving a timeout
      • after timeout, the SocketOpener thread continues to execute
      • main thread is unblocked by getting InterruptedException
        • may have to clean resources by inner thread

19.2.3 Server construction

19.2.4 Producer Consumer Paradigm

  • N threads
  • one server-socket thread which creates jobs in a bounded buffer
  • N consumer/worker threads
  • need to clean up resources etc.
    • may use java.util.concurrent.ExecutorService (see the sketch below)
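
A sketch of the producer/consumer server using an ExecutorService as the worker pool (my own example; the port number and handler are made up):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledServer {
    public static void main(String[] args) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(8);  // N worker threads
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket client = server.accept();      // producer: accept connections
                pool.execute(() -> handle(client));   // consumers: pool workers
            }
        } finally {
            pool.shutdown();
        }
    }

    // hypothetical request handler
    private static void handle(Socket client) {
        try (Socket c = client) {
            // ... read the request, write the response ...
        } catch (IOException ignored) { }
    }
}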

20 Lect 19: Executors & Futures

20.1 TODO Java Executors Doc

A Java lambda automagically turns into a Runnable or a Callable depending on whether it returns a value.

20.2 Asynchronous Computations

20.2.1 Future

  • why? joining may not be easy; with an executor we don't even hold the thread to join
  • methods:
    • check: isDone()
    • wait
    • get(timeout) : blocking
    • cancel()
      • task should be written in a way which respects interrupt flag.
  • C++ futures
    • <future>
      • std::future – exclusive ownership of the result
      • std::shared_future – can be copied, slightly less efficient
      • valid()
      • how to create
        • std::async
          • takes the same kinds of arguments as a thread (lambdas, parameters, etc.)
          • may or may not create a new thread!
            • can pass a launch policy to force one or the other: std::launch::async or std::launch::deferred
    • std::packaged_task
      • wrapper around a stored task and its shared state
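
A Java-side sketch of submitting a Callable and using the Future's timed get and cancel (my own example; the loop bound is arbitrary):

import java.util.concurrent.*;

public class FutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        Future<Long> result = pool.submit(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                if (Thread.currentThread().isInterrupted()) {
                    throw new InterruptedException();   // respect cancellation
                }
                sum += i;
            }
            return sum;
        });

        try {
            System.out.println(result.get(1, TimeUnit.SECONDS)); // blocking, with timeout
        } catch (TimeoutException e) {
            result.cancel(true);   // true: interrupt the running task
        }
        pool.shutdown();
    }
}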

20.3 misc

Lambdas, Runnables, and Callables all act as closures.

21 Lect 20: C++ Move, futures, fork/join

21.1 Lvalues and Rvalues

  • Lvalues are (named) memory locations

21.1.1 C++ refs

int x = 44;
int& y = x;    // an lvalue reference binds to an lvalue

int& z = 44;   // this won't compile: an lvalue reference can't bind to an rvalue

21.1.2 C++ rvalue references

21.2 futures

21.3 fork/join tasks

typical use case: divide and conquer

21.3.1 Work steal

  • if a thread's queue is empty, it steals work from another thread's queue

21.3.2 using

  • override compute()
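
A sketch of a divide-and-conquer fork/join task that overrides compute() (my own example; THRESHOLD is arbitrary):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// sums an array by splitting it until the pieces are small enough
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {            // small enough: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                            // run the left half asynchronously
        return right.compute() + left.join();   // compute the right half, then join
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        java.util.Arrays.fill(data, 1);
        long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(total);   // 1000000
    }
}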

22 Lect 21: case study: Concurrency in Android, Lock implementation

22.1 Android app case study

  • LinkedBlockingQueue: blocks if queue empty
  • the thread is saved somewhere so that it can be interrupted if needed
    • since the Executor class is used, we don't know which thread computes what
  • efforts are made to cancel the thread only at a safe point

22.2 Lock Implementation

22.2.1 Spin Lock Implementation

  • if a thread wants the lock, it either gets it or:
    • block (suspend)
    • spinning (busy waiting)
      • can be better than suspending on multi processor
        • blocking needs context switch
  1. TODO Classical Algorithms
    • Lamport bakery
    • Peterson
    • Dekker's
  2. Modern approach
    1. TAS (test and set)
      • atomic getAndSet

        // state is an AtomicBoolean, initially false (unlocked)
        public void lock() {
          while (state.getAndSet(true)) { }  // getAndSet is atomic; spin until it returns false
        }
        
      • Not reentrant
    2. TTAS (test and test and set)
      • additional check: test (a plain read) before testAndSet – see the sketch after this list
  3. Multiprocessor Architecture
    1. Cache coherent protocols
      1. Write through
      2. Write back
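
A sketch of the TTAS lock mentioned in item 2 above, in the standard Herlihy–Shavit style (my own rendering, not taken from the slides):

import java.util.concurrent.atomic.AtomicBoolean;

public class TTASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        while (true) {
            // test: spin on the locally cached copy first, generating no bus traffic
            while (state.get()) { }
            // test-and-set: only now attempt the expensive atomic operation
            if (!state.getAndSet(true)) {
                return;
            }
        }
    }

    public void unlock() {
        state.set(false);
    }
}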

22.2.2 TODO Doubts

  • Why isn't this a built-in optimization? Prof: these are directly mapped to hardware instructions; we want them to be fast.
  • How to stop a thread without a context switch? Prof: Java's park (LockSupport.park).
  • Do spin locks utilize the processor / will they drain the battery? Prof: yes, and hence use them only for short amounts of time.

23 Lect 22: TTAS

23.1 Backoff has disadvantage

23.2 waiting queue

23.2.1 advantages:

  • fair

23.2.2 disadvantages:

  • memory allocation
  1. TODO false sharing

23.3 misc

  • an atomic reference to an array of int (the elements themselves are not accessed atomically)
  • an array of AtomicInteger
    • AtomicIntegers are mutable
    • safe
  • AtomicIntegerArray

23.4 CLHLock

  • uses virtual linked list
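
A sketch of the CLH queue lock's virtual linked list (my own rendering of the standard algorithm):

import java.util.concurrent.atomic.AtomicReference;

public class CLHLock {
    private static class QNode { volatile boolean locked; }

    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    public void lock() {
        QNode node = myNode.get();
        node.locked = true;
        QNode pred = tail.getAndSet(node);   // enqueue by swapping the tail
        myPred.set(pred);
        while (pred.locked) { }              // spin only on the predecessor's flag
    }

    public void unlock() {
        QNode node = myNode.get();
        node.locked = false;                 // release: the successor's spin ends
        myNode.set(myPred.get());            // reuse the predecessor's node next time
    }
}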

23.5 MCS Queue Lock

  • check the java.util.concurrent.ConcurrentLinkedQueue source code

23.6 AtomicInt Binary representation lock

  • use a few bits to denote the lock state
  • the rest of the bits represent the tail position index

24 Lect 23: Spin Lock Implementations, hybrid Locks

24.1 Timeout

missed the initial part, was fixing a fiber lamp! :D

24.2 Hybrid Locks

24.3 Fat Locks

  • stored in hashtables

24.4 Thin Locks

  • fast path if no contention
  • convert to fat lock if contention
  • the object header contains bits which indicate whether there is a lock; if a fat lock is needed, the bits point to the fat lock.

25 Lect 24: Mid 1 Review

25.1 TODO Can there be a deadlock without a conflict?

No!

25.2 Conflicts and deadlock

resolve conflict by

  • happens before
  • mark as volatile

26 Linearizability and correctness

26.1 Linearizability

  • what does it mean for a concurrent data structure to be correct with respect to a serial specification
  • especially when no locks are used

26.1.1 Objects are linearizable

27 Reasoning about Linearizability

27.1 Linearizability Proof

  • the linearization order is the order in which the lock is released (for a lock-based implementation)

27.2 Sequential Consistency

  • not composable
    • re-ordering can cause cycles

27.3 Wait Free

wiki link

  • Every contending thread is guaranteed to complete its method call within a bounded number of its own time steps.

  • have helper threads that can substitute for a thread
    • so that even if the thread stalls or fails, another can take over

27.4 Lock Free

  • Lock-freedom allows individual threads to starve but guarantees system-wide throughput
  • Some contending threads are guaranteed to complete their method call within a bounded number of steps
  • The difference between wait free and lock free: in wait-free operations, every process is guaranteed to succeed in a finite number of steps

27.5 Obstruction free

In absence of contention, a thread is guaranteed to complete its method within a bounded number of steps

27.6 DONE Safe

  • a read call that doesn't overlap any write call returns the value written by the most recent write call
  • otherwise, if a read call overlaps a write call, it can return any value within the register's allowed range of values

27.7 DONE Regular

  • safe
  • if a read overlaps the i-th write, the read may return the value written by either the i-th or the (i-1)-th write

27.8 TODO Linearizable

27.9 Foundation of Concurrency

  • SRSW (single-reader, single-writer) registers

28 Composing higher constructs using primitive

28.1 Safe MRSW from SRSW

  • get idea, skipped slides

28.2 Mutual exclusion cannot be used to get wait freedom

  • once a thread is in the critical section, it can stall or die there (and everyone else waits forever)
  • not all wait-free constructions are practical

28.2.1 Example, 2 thread FIFO

  • spinning on preconditions
    • works because there is only one thread each on enqueue and dequeue
  • what happens if there are multiple dequeuers?
  • proved by contradiction
  1. Consensus problem

28.2.2 Consensus Number

28.2.3 Read modify write

28.3 ABA problem (Problem in CAS):

  • we read the value of a variable
  • then check whether the value is still the old one; if yes, change it (compare-and-swap)
  • problem: what if someone changed it and then changed it back to the old value in between (A → B → A)
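
A sketch of side-stepping ABA in Java with AtomicStampedReference, which pairs the value with a version stamp (my own example):

import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        // pairing the reference with a stamp makes A -> B -> A distinguishable
        // from "nothing happened"
        AtomicStampedReference<String> top =
                new AtomicStampedReference<>("A", 0);

        int[] stampHolder = new int[1];
        String value = top.get(stampHolder);
        int stamp = stampHolder[0];

        // ... another thread may do A -> B -> A here, bumping the stamp ...

        boolean ok = top.compareAndSet(value, "C", stamp, stamp + 1);
        System.out.println(ok ? "CAS succeeded" : "ABA detected, CAS failed");
    }
}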

28.4 Lock free

  • cannot have lock free

29 Parallel Programming

  • subset of concurrent programming
  • OpenMP: shared-memory parallelism – compiler directives

29.1 OpenMP

  • add directives to the program and the compiler takes care of them; a compiler that doesn't understand a directive ignores it
  • Fork-join Parallelism
    • need not create actual threads each time; might use a thread pool
  • example:

    #include "omp.h"
    ...
    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
      //some big computation
    }
    
  • works only (better?) on structured blocks of code
    • because there is an implicit barrier at the end of the block

29.1.1 SPMD (Single program, multiple data):

  • Get thread id then do different work based on id

    int ID = omp_get_thread_num();
    
  • barrier at end of openMP block implicitly

29.1.2 Work sharing

  • divides work for threads automatically
    • in SPMD this was done manually
  • args
    • schedule(static [,chunk]): predictable
    • schedule(dynamic [,chunk]): unpredictable, highly variable
    • schedule(guided [,chunk])
    • schedule(runtime)
  • instead of a shared variable like sum, use an array sum[] and give each thread its own slot to store into

29.2 OpenMP: variables in block

  • most variables are shared
  • local variables are not shared

29.2.1 TODO Types

  • lastprivate
  • firstprivate

30 OpenMP (cont)

30.1 Reduction

  • don't accumulate into a shared variable by hand!
  • operator
    • associative
    • commutative
  • form: reduction(op : list)
  • based on the operator, the private copies are initialized
    • + : 0
    • * : 1
  • slide 60: by applying reduction clause, we get final result in res
    • each thread is doing some accumulation
    • without the reduction clause, res is shared, so accessing it would need synchronization

30.2 Critical Section

#pragma omp critical
{
  // the critical section: only one thread executes it at a time
}

30.3 False Sharing

  • unnecessary cache-line invalidations (a performance problem, not a correctness problem)
  • array: values next to each other share a cache line; updating one element invalidates the whole line, which also holds the other elements

30.4 Atomic

  • special case of critical section

30.5 Barriers

  • #pragma omp barrier: explicit
  • wait until all threads arrive
  • implicit barrier at end of block
  • many barriers by default
    • the nowait clause removes the implicit barrier

30.6 Ordered:

sequential ordering

  • if operation is not associative/commutative

30.7 Flush

  • memory synchronization
  • creates consistent view of memory
  • write everything to memory before flush, read everything after flush from memory

30.8 Environment Variable

  • setting options at runtime (after compiling, during execution)

30.9 Problems with openMP

  • when to flush
  • memory is not symmetric (NUMA): some memory is closer to some processors

30.10 Tree based reduction

  • divide and conquer
  • serial to parallel

30.11 Floating point is not associative

  • programmer needs to take care of it

31 Research Work: ACES4

  • Between the compiler and the runtime system, significant opportunities exist for error checking as well as static and dynamic analyses to improve performance
  • asynchronous task delegation
  • MADNESS DAG
  • any issues or bug fixes needed in code

32 Performance Models

32.1 Metrics

  • Elapsed time
    • TODO: how good is this metric with multiple processors / real time?
    • Java: System.nanoTime
    • C++: the <chrono> library
  • Price/Performance
  • Speedup
  • Efficiency
  • Energy Efficiency

32.2 Models

32.2.1 Simple Analytical model

T(1) = T_{serial} + T_{parallelizable}(1)

32.3 Amdahl's Law

  • Effect of serial part to overall performance.
    • or serial part limits the overall performance
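
For reference, a standard statement of the law in terms of the model above (my own addition; s denotes the serial fraction T_{serial} / T(1)):

T(P) = T_{serial} + T_{parallelizable}(1) / P

Speedup(P) = T(1) / T(P) = 1 / (s + (1 - s) / P) <= 1 / s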

32.4 Message Passing

  • UE: unit of execution
  • Memory Sharing needs locks
  • You cannot receive a message before it is sent

32.4.1 CSP (Communicating Sequential Processes)

  • C.A.R. Hoare's paper, 1978
  • S0 || S1 … Sn – parallel composition
  • x ! S1 – send (!)
  • y ? S0 – receive (?); both are blocking

33 Go

33.1 Background

  • type inference

33.2 Concurrency

  • share memory by communicating
  • goroutine:
    • lightweight
  • channels:
    • first class, can be passed around

34 Actors

  • works well when there is a regular flow of data

34.1 Actors inside

  • local data
  • mailbox (message queue)
  • Loop - handle messages

34.2 Difference with OOP

  • actors do not share memory
    • can't send references
    • send deep copy
      • many frameworks try to optimize this
        • send ref if safe
        • immutable messages are safe
        • enforcing encapsulation may not be guaranteed
  • in OOP, sending a message (a method call) is synchronous; actor messages are asynchronous

34.3 MPI

  • portable message-passing standard

34.4 TODO SPMD

35 MPI

35.1 MPI_type

  • constant

35.2 MPI_Send blocks until the buffer is copied into the system

  • MPI_Recv is similar: it blocks until the message has been received

35.3 SPMD

same code; based on the id (rank), decide which process handles what part

35.4 Asynchrony

  • the nonblocking send and recv (MPI_Isend, MPI_Irecv) return immediately; we need to handle completion using MPI_Wait

36 Vector Algorithms

36.1 Regular data structures (homogeneous arrays)

  • instead of having an array of structs, have a struct of arrays
    • better performance due to locality

36.2 Kernel Parallelism

  • CUDA and OpenCL

36.3 Vector parallelism (SIMD)

36.3.1 Challenges:

  • change the problem → change the code
  • special registers – only a small number of vector registers
    • may have to spill to memory

36.3.2 How to use vector units

  • Language constructs
  • Compiler vectorization
    • typically loops
    • pass a compiler flag to get a vectorization report
      • some code may merely look un-vectorizable
  • vector instruction sets
    • Example: SSE 4.2

36.4 OpenCL

  • portable way to use all CPUs, GPUs etc

37 Final Review

37.1 Instructions

  • 6 pages of notes
  • 10am to 12, 14th December

37.2 Preparation

  • Homework
  • Definitions
  • Previous Exams

38 Deadlocks

  • When a thread holds multiple locks, think about deadlocks!
    • E.g., a synchronized method invokes another synchronized method associated with a different monitor lock.

38.1 Solutions

  1. Resource ordering
    • can use System.identityHashCode
  2. Deadlock suspicion
    • lock with timeout
      • not available with monitor locks → use ReentrantLock

39 Livelock

  • narrow passage example: a passage wide enough for only one person, with two people coming from opposite directions; both repeatedly step sideways to let the other pass.

  • Solution: random backoff

40 Starvation

a thread is perpetually denied a resource

41 Invariant

conditions that must be true at certain points.

41.1 Pre-condition

ways to check

  • Optimistic: do everything, then check
  • Balking: if the precondition doesn't hold, give up immediately
    • Guarded Suspension: suspend until the precondition becomes true
    • Timed Waiting: wait for a certain time, then balk
  • atomic await <await B {S}> wait till B becomes true then atomically execute S
    • Implement
      • Test a condition and suspend thread if not true
      • release lock before suspending
      • on awakening, re-acquire the lock before proceeding
    • Every Object in Java has 2 Queues: wait set and lock set

      <await B {S}> => synchronized (obj) {
                           while (!B) { obj.wait(); }
                           S;
                       }
      
      • wait gives safety
      • notify gives liveness
    • Condition Variables (a sketch follows at the end of this list)
      • Java: Condition cv = lock.newCondition() (on a ReentrantLock instance)
        • each condition variable has its own wait set.
        • Comparison with monitor

                    wait    wake one   wake all    tied to
          cv        await   signal     signalAll   a specific condition
          monitor   wait    notify     notifyAll   the monitor lock
      • C++: std::condition_variable c;
        • c.wait takes in the mutex (needed for the release-and-reacquire mechanism)
        • wait optionally takes in a predicate (the guard); otherwise we need to manually loop on !B (the condition)
    • Reader-Writer lock: more expensive than a plain mutex
      • use it only when writes are rare and the lock is held for a short time
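
A sketch of condition variables with ReentrantLock, as referenced in the Condition Variables item above (my own example; a one-slot buffer):

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class OneSlotBuffer {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();   // one wait set per condition
    private final Condition notEmpty = lock.newCondition();
    private Object item = null;

    public void put(Object x) throws InterruptedException {
        lock.lock();
        try {
            while (item != null) notFull.await();   // guarded suspension on "not full"
            item = x;
            notEmpty.signal();                      // wake one taker
        } finally {
            lock.unlock();
        }
    }

    public Object take() throws InterruptedException {
        lock.lock();
        try {
            while (item == null) notEmpty.await();
            Object x = item;
            item = null;
            notFull.signal();
            return x;
        } finally {
            lock.unlock();
        }
    }
}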

Author: Anurag Peshne

Created: 2017-04-05 Wed 22:10

Emacs 25.1.1 (Org mode 9.0.5)
