SWEN-342 Eng of Concurrent & Distributed SW Systems

Bucket Hash Map

This activity will be done individually.

Introduction

A bucket hash map is a hash map implemented via an array of N buckets. Each bucket is associated with those keys whose Java hashCode (modulo N) equals the bucket's index in the array, and each bucket has a list of some sort to hold the key/value pairs for its associated keys. One reason for such a structure is to cut down on the search time for a key - on average, each bucket holds only 1/N of the hash map's entries. In addition, distinct threads accessing different buckets can proceed concurrently. The source code for a simplified hash map using native Java synchronization is at SynchronizedBucketHashMap.java.
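
Since the provided file is not reproduced here, the sketch below only illustrates the general structure; the names (SimpleBucketHashMap, Pair, contents) are placeholders and may differ from those used in SynchronizedBucketHashMap.java:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of a synchronized bucket hash map; illustrative only.
    class SimpleBucketHashMap<K, V> {
        // One Pair per key/value entry stored in a bucket's list.
        private static class Pair<K, V> {
            final K key;
            V value;
            Pair(K key, V value) { this.key = key; this.value = value; }
        }

        // Each bucket holds the entries whose (hashCode modulo N) is its index.
        private static class Bucket<K, V> {
            final List<Pair<K, V>> contents = new ArrayList<>();
        }

        private final int numberOfBuckets;
        private final List<Bucket<K, V>> buckets = new ArrayList<>();

        SimpleBucketHashMap(int numberOfBuckets) {
            this.numberOfBuckets = numberOfBuckets;
            for (int i = 0; i < numberOfBuckets; i++) {
                buckets.add(new Bucket<>());
            }
        }

        // Map a key to its bucket: hashCode modulo N, kept non-negative.
        private int bucketIndex(K key) {
            return (key.hashCode() & 0x7fffffff) % numberOfBuckets;
        }

        // Look up a key while holding only that bucket's monitor lock.
        public V get(K key) {
            Bucket<K, V> bucket = buckets.get(bucketIndex(key));
            synchronized (bucket) {
                for (Pair<K, V> pair : bucket.contents) {
                    if (pair.key.equals(key)) {
                        return pair.value;
                    }
                }
            }
            return null;
        }
    }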

While this version works, it has two significant drawbacks:

  1. Access to each bucket is exclusive, even though buckets are read much more frequently than they are written and read accesses could safely proceed concurrently.
  2. The computation of the hash map's size is inconsistent - it may reflect a size which the map never had. Since the size of each bucket is computed while holding only that bucket's lock, changes to other buckets during the computation can lead to an incorrect total. For example, assume the map has 3 buckets and holds 50 key/value pairs (a sketch of this per-bucket-locked size() appears just after this list):
    1. Bucket 0 is locked, its size added in, and bucket 0's lock is released.
    2. Bucket 1 is locked, its size added in, and bucket 1's lock is released.
    3. Two key/value pairs are removed from bucket 0 by another thread.
    4. Three key/value pairs are added to bucket 2 by another thread.
    5. Bucket 2 is locked, its size added in, and bucket 2's lock is released.
    6. At this point, the reported size accounts for the addition to bucket 2 but not the removal from bucket 0. The reported size of 53 is wrong - the actual size is 51 - and what is more, the map never had 53 elements in it!
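
In terms of the sketch in the introduction, the inconsistent size() computation looks roughly like this (again an illustration, not the provided source):

    // Inconsistent size(): each bucket's lock is held only while that
    // bucket's own size is read, so another thread can change buckets
    // that were already counted (or not yet counted) before the loop ends.
    public int size() {
        int size = 0;
        for (Bucket<K, V> bucket : buckets) {
            synchronized (bucket) {
                size += bucket.contents.size();
            } // this bucket's lock is released here
            // Other threads are free to add or remove entries anywhere now.
        }
        return size;
    }
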
This second problem could be addressed by acquiring locks on all the buckets before computing the size, but the basic Java synchronization mechanisms do not allow this in the general case.

The Problem

Convert SynchronizedBucketHashMap to a ConcurrentBucketHashMap as follows:

  1. Copy over the provided file and change the class name globally.
  2. Add a private ReadWriteLock to the state of the internal Bucket class, and add interface methods to Bucket to lock and unlock the read and the write locks - four methods in all.
  3. Replace the basic Java synchronization with explicit locking, ensuring maximum possible concurrency while maintaining overall state consistency (i.e., safety).
  4. Ensure that size() returns a consistent value by acquiring all of the read locks first and then, as each bucket's size is accumulated, releasing that bucket's lock. (A sketch combining steps 2 through 4 follows this list.)
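
The sketch below combines steps 2 through 4 using the same illustrative names as earlier; it is one possible shape for the solution, not the required one, and the helper names (lockRead, unlockRead, lockWrite, unlockWrite) are assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Illustrative sketch of the converted map; adapt to the provided source.
    class ConcurrentBucketHashMapSketch<K, V> {
        private static class Pair<K, V> {
            final K key;
            V value;
            Pair(K key, V value) { this.key = key; this.value = value; }
        }

        // Step 2: the Bucket owns a ReadWriteLock and exposes four lock methods.
        private static class Bucket<K, V> {
            final List<Pair<K, V>> contents = new ArrayList<>();
            private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

            void lockRead()    { rwLock.readLock().lock(); }
            void unlockRead()  { rwLock.readLock().unlock(); }
            void lockWrite()   { rwLock.writeLock().lock(); }
            void unlockWrite() { rwLock.writeLock().unlock(); }

            int size() { return contents.size(); }
        }

        private final int numberOfBuckets;
        private final List<Bucket<K, V>> buckets = new ArrayList<>();

        ConcurrentBucketHashMapSketch(int numberOfBuckets) {
            this.numberOfBuckets = numberOfBuckets;
            for (int i = 0; i < numberOfBuckets; i++) {
                buckets.add(new Bucket<>());
            }
        }

        private int bucketIndex(K key) {
            return (key.hashCode() & 0x7fffffff) % numberOfBuckets;
        }

        // Step 3: readers take the read lock, so several threads can read the
        // same bucket at once; a put() or remove() would take the write lock.
        public V get(K key) {
            Bucket<K, V> bucket = buckets.get(bucketIndex(key));
            bucket.lockRead();
            try {
                for (Pair<K, V> pair : bucket.contents) {
                    if (pair.key.equals(key)) {
                        return pair.value;
                    }
                }
                return null;
            } finally {
                bucket.unlockRead();
            }
        }

        // Step 4: acquire every bucket's read lock before counting, then
        // release each bucket's lock as its size is added in. Holding all the
        // locks at once freezes one consistent snapshot of the whole map.
        public int size() {
            for (Bucket<K, V> bucket : buckets) {
                bucket.lockRead();
            }
            int size = 0;
            for (Bucket<K, V> bucket : buckets) {
                size += bucket.size();
                bucket.unlockRead();
            }
            return size;
        }
    }

Releasing each bucket's read lock as soon as its size has been added keeps the all-locks-held window as short as possible while still guaranteeing a consistent snapshot.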

You must, of course, test your implementation, both to ensure proper functionality and to show that increased concurrency is possible. For the latter, you are free to instrument your ConcurrentBucketHashMap to (a) force thread switches and insert delays, and (b) print out trace information showing proper concurrent access by multiple client threads. Collect this trace information into a text file and annotate it to show that concurrent access to a given bucket is possible, while read access to a bucket that is being written cannot be produced in any trace.
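
As a starting point only, a driver along the following lines can exercise concurrent reads of a single bucket. The class and thread names are hypothetical, it reuses the sketch class from the previous section, and it assumes the map itself has been instrumented with delays and trace printing; a writer thread calling your put() or remove() would then let the trace also show that no read overlaps a write on the same bucket.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical test driver: with N == 3, the keys 0, 3, 6, and 9 all hash
    // to bucket 0, so these readers contend for the same bucket's read lock.
    // Instrumented trace output from the map should show their accesses
    // overlapping; that is the evidence to collect and annotate in Trace.txt.
    public class TraceDriver {
        public static void main(String[] args) throws InterruptedException {
            ConcurrentBucketHashMapSketch<Integer, String> map =
                    new ConcurrentBucketHashMapSketch<>(3);

            List<Thread> readers = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                final int key = i * 3; // all of these keys land in bucket 0
                readers.add(new Thread(
                        () -> System.out.println(
                                Thread.currentThread().getName()
                                + " read key " + key + " -> " + map.get(key)),
                        "reader-" + i));
            }

            // The map is empty here, so each get() returns null; the point is
            // that the readers hold bucket 0's read lock at the same time.
            for (Thread t : readers) t.start();
            for (Thread t : readers) t.join();
        }
    }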

Deliverables

Submit your solution by the deadline in MyCourses by pushing it to the GitHub repository. At a minimum you must have two files named exactly ConcurrentBucketHashMap.java and Trace.txt. Note that ConcurrentBucketHashMap.java must be free of any instrumentation code.

The activity will be graded based on the course rubric.