4010-441 Principles of Concurrent System Design
Bucket Hash Map
This page is accessible at www.se.rit.edu/~se441/Assignments/BucketHashMap.html.
This activity will be done in pairs within your Project 1 group.
See ToDo file for comments on this assignment from 2122.
Introduction
A bucket hash map is a hash map implemented via an array of N buckets. Each bucket is associated with those keys whose Java hashCode (modulo N)
is the bucket's index in the array, and each bucket has a list of some
sort to hold the key/value pairs for its associated keys. One
reason for such a structure is to cut down on the search time for a key
- on average, each bucket holds only 1/N of the hash map entries. In
addition, distinct threads can access different buckets can proceed
concurrently. The source code for a simplfied hash map using native
Java synchronization is at SynchronizedBucketHashMap.java.
While this version works, it has two significant drawbacks:
- Access
to each bucket is exclusive, even though buckets are read much more
frequently than they are written, and read accesses can proceed
concurrently.
- The computation of the hash map's size is
inconsistent - it may reflect a size which the map never had. For
instance, since the size of each bucket is computed while only holding
that bucket's lock, changes to other buckets can lead to an incorrect
size. For example, assume there are 3 buckets in the map and the bucket
has 50 key/value pairs in it:
- Bucket 0 is locked, its size added in, and bucket 0's lock is released.
- Bucket 1 is locked, its size added in, and bucket 1's lock is released.
- Two key/value pairs are removed from bucket 0 by another thread.
- Three key/value pairs are added to bucket 2 by another thread.
- Bucket 2 is locked, its size added in, and bucket 2's lock is released.
- At
this point, the reported size only accounts for the addition to bucket
2, not the removal in bucket 0. The reported size of 53 is wrong - the
size if 51 - and what is more, the map never had 53 elements in it!
- The
problem could be addressed by acquiring locks on all the buckets before
computing the size, but the basic java mechanisms do not allow this in
the general case.
The Problem
Your team is to convert SynchronizedBucketHashMap to a ConcurrentBucketHashMap as follows:
- Copy over the provided file and change the class name globally.
- Add a private ReadWriteLock to the state of the internal Bucket class, and add interface methods to Bucket to lock and unlock the read and the write locks - four methods in all.
- Replace
the basic Java synchronization with explicit locking, ensuring that
maximum possible concurrency while maintaining overall state
consistency (i.e., safety).
- Ensure that size()
returns a consistent value by acquiring all read locks first, then, as
each bucket's size is accumulated, releasing that bucket's lock.
You
must, of course, test your implementation both to ensure proper
functionality and to show that increased concurrency is possible. For
this latter issue, you are free to instrument your ConcurrentBucketHashMap
to (a) force thread switches and insert delays, and (b) to print out
trace information showing proper concurrent access by multiple client
threads. Collect this trace information into a text file and annotate
it to show that concurrent access to a given bucket is possible, while
read access to a bucket being written is impossible (actually, that it cannot be produced in a trace).
Deliverables
Submit a exactly two files named exactly ConcurrentBucketHashMap.java and Trace.txt to the Bucket Hashmap Activity dropbox by the due date. Note that ConcurrentBucketHashMap.java must be free of any instrumentation code.
Of the 10 points available for this activity,
- 2 will be granted on an all-or-nothing basis for a correct submission in accordance with the
preceding specification,
- 3 will be for functionality (including maximizing concurrency),
- 2 will be for the analysis shown in the trace file.
- 2 will be for the quality of:
- the implementation (simplicity, elegance, no repetition, etc.),
- the solution (naming, spacing, indentation, etc.), and
- 1 will be based on whether or not the internal documentation is of acceptable quality.
In all cases, points assigned represent your instructor's judgement as
a professional engineer, and are not subject to negotiation.