How does the choice of multiplier modulus and hashmapsize


You'll be implementing a hash map which you'll use to create a password management system. In Section A, you'll implement and compare different implentations of hash maps and in Section B, you'll implement a simple password manager.

Please read each section carefully and submit your code to PASTA and report to eLearning as described:

• use the same class and method names as those in the code skeletons provided
• your report should not be longer than 2 A4 pages including any tables, graphs, and references - if you write more than this, only the first two pages will be marked
• structure your report in sections with names matching the section names used in this handout
• you may write each section as a series of dot points, or in full sentences

Data sets

In this assignment, you'll be testing your implementations on two data sets. Please download the data sets from the following locations. These data sets are for testing only and we take no responsibility for their content.

For Section A:

US census data set

https://www.census.gov/genealogy/www/data/1990surnames/dist.male.first

• 1219 most common male first names from the 1990 US census
• presented in tab separated columns:
1. name
2. frequency of name in population (%)
3. cumulative frequency of name in population (%)
4. frequency rank

For Section B:

Password data set
https://github.com/discourse/discourse/blob/master/lib/common_passwords/10k-common-passwords.txt
• collection of 10000 commonly used passwords
• presented as one password per line

OR

Alternative Password data set
• collection of 10000 common English words
• presented as one word per line
• available on eLearning.

A Hash Map Implementations
In this section, you will implement and compare three hash maps. Each differs in the method used to resolve collisions. The three methods we will investigate are linear probing, double hashing, and chaining. For all implementations, you may assume that you do not have to perform any resizing.

You will remember from lectures that a hash map is a data structure which stores key-value pairs by using a hash function to map the key of each to an index in a table. In particular, you will use HashMapNode to store your key-value pairs and will define a hash function to tell you where in the map to store the node.

For your linear probing and double hashing hash map implementations, nodes should be implemented with class name HashMapNode and this should have the following structure.

public class HashMapNode {

// construction
public HashMapNode(Object key, Object value)

// get methods
public Object getKey() public Object getValue()
// set method
public void setValue(Object newValue)
}

For your chaining hash map implementation, nodes should be implemented with class name ChainingHashMapNode
and this should additionally store a pointer to the next node in its chain.

public class ChainingHashMapNode {

// construction
public ChainingHashMapNode(Object key, Object value)

// get methods
public Object getKey() public Object getValue()
public ChainingHashMapNode getNext()

// set methods
public void setValue(Object newValue)

public void setNext(ChainingHashMapNode next)
}

Method 1: Linear probing

Code
Your first task is to implement a class called HashMap, which stores HashMapNode entries and uses linear probing to resolve collisions. Since we may want to re-use these core classes, you will need to ensure that they are type-safe, by using the Java Generics framework1, as outlined in the instructions below.

The hash function you need to implement is:

hash(key) = abs(multiplier • hashCode(key)) mod modulus

where abs is absolute value and hashCode is Java's .hashCode() method (see Week 7 tutorial).

Hint: to use the output of hash as your index into the table, you will need to perform mod hashMapSize.

public class HashMap {

// construction
public HashMap(int multiplier, int modulus)
// construct a HashMap with 4000 places and given hash paarameters

public HashMap(int hashMapSize, int multiplier, int modulus)
// construct a HashMap with given capacity and given hash parameters

// hashing
public int hash(K key)

// size
public int size()
// return the number of nodes currently stored in the map public boolean isEmpty()
// interface methods public List keys()

public V put(K key, V value) public V get(K key)
public V remove(K key)
}

Instructions for construction:

• your node store should be initialised an empty array, typed to have elements which are HashMapNode

• its capacity will either be the default value of 4000, or given as hashMapSize to the constructor

• multiplier and modulus are your hashing parameters; store these in instance variables so you can access them when the hash method is called

Instructions for the interface methods:
• keys
• return an array list of the keys of the nodes currently stored in the map
• put
- insert a node into the map with its key and value set to the objects given
- if an element already exists with the given key, return the original value and update to the value given
- return null if the key was not already stored
- collisions should be handled using the linear probing method
• get
- return the value of the node whose key matches the given argument
- return null if the key is not stored in the map
• remove
- remove the value of the node whose key matches the given argument and return it
- place a defunct marker in the map in its place
- return null if the key is not stored in the map

Impact of hash function on linear probing

In this section, you will explore how your choice of hash function impacts on the performance of your hash map. In particular, you will investigate how often collisions occur and how far your linear probe needs to scan as you change the values of multiplier, modulus, hashMapSize. To do this, add three additional methods to your HashMap which output statistics about the collisions encountered by any call to the put method, and one which resets them.

Code

public class HashMap {

public int putCollisions() public int totalCollisions() public int maxCollisions() public void resetStatistics()
}

• putCollisions
- return the number of calls to the put method which encountered a collision, resulting in the entry not being place in the index indicated by hash, but instead one found by moving the linear probe
- if multiple collisions occur on a single put call, this number should only go up once, not once per subsequent collision
- return 0 if no collisions have occured
• totalCollisions
- return the (accrued) number of probing steps needed in put method calls to add entries to the hash map; i.e. increment on each probing step needed within any call to put
- return 0 if no collisions have occured
• maxCollisions
- return the maximum number of probing steps that is ever needed to add a new entry to the table on any call to put
- return 0 if no collisions have occured
• resetStatistics
- reset all three statistics to be zero

Report

Store the census data in your hash map, using the parameters presented below, and report on the collision statistics observed. In particular, instantiate hash maps using the given parameters which have String keys and Double values; these will map each name in column 1 of the dist.male.first to its frequency in column
2. To help you, we have provided a code snippet for reading the US census data file in Appendix A.

In your PDF report, fill in the following table and answer the following questions. If you run further experiments with different settings, please include them in your report.

1. How does the choice of multiplier, modulus, and hashMapSize affect the number of collisions seen and the number of hops the probe needs to make?
2. Can you explain why these trends are observed?
3. Why are these factors important for implementations which will be used by real world applications?
4. What improvements would you suggest to improve the choice of hash function your map relies on?


Method 2: Double Hashing
In this section you will explore performance of using double hashing as an alternative method of handling collisions. You will remember that, in double-hashing, instead of the probe being moved along the store in hops of 1 place, it is instead moved along in hops of size determined by a secondary hash function. Updates to the index are given by the formula i + j • d(k) where i is the ouput of your primary hash function, d(k) is your secondary hash function applied to the key k, and j increments from 0 until an available cell is found to store the new entry.

Code
Create a new class called DoubleHashMap which re-uses the methods you implemented for HashMap but instead handles collisions using double hashing. You will need to edit your constructor methods to allow users to specify a secondaryModulus and add one further public method, secondaryHash, which implements the secondary hash function:


secondaryModulus - (abs(hashCode(key)) mod secondaryModulus)

public class DoubleHashMap {

// updated construction
public DoubleHashMap(int multiplier, int modulus, int secondaryModulus)
// construct a DoubleHashMap with 4000 places and given hash parameters

public DoubleHashMap(int hashMapSize, int multiplier, int modulus, int secondaryModulus)
// construct a HashMap with given capacity and given hash parameters

....

public int secondaryHash(K key)

}

Report

Report on the effect of using double hashing compared to linear probing in storing the US census data by filling in the following table and answering the following questions. If you run further experiments with different settings, please include them in your report.

1. What does the case of secondaryModulus = 1 correspond to?

2. How does the use of double hashing affect the number of collisions seen and the distance the probe needs to be displaced compared to linear probing?

3. Can you explain why these trends are observed?

4. What do your findings mean for how hash maps should be implemented for real world applications?

Method 3: Chaining

In this section you will explore the performance of using chaining instead of open addressing in your hash map. In particular, where cells of your hash map have been elements in an array in which store up to one object of type HashMapNode, they will now hold one element on type ChainingHashMapNode.

Code

Create two new classes, called ChainingHashMapNode and ChainingHashMap, which borrow your core im- plementation from HashMapNode and HashMap with the following changes
• ChainingHashMapNode has a pointer to another ChainingHashMapNode, and so, acts like a node in a singly linked list
• in place of the collision statistics in HashMap (i.e. putCollisions, totalCollisions, maxCollisions, and resetStatistics), implement the method getFullestBuckets which returns an int[] of size two, with the first element being the number of nodes stored in the fullest hash map cell and the second element being the number of hash map cells containing this number of nodes
For instance, getFullestBuckets should return {3, 1} given the hash map overleaf.

public class ChainingHashMap {

// construction
public ChainingHashMap(int multiplier, int modulus)
// construct a HashMap with 4000 places and given hash parameters

public ChainingHashMap(int hashMapSize, int multiplier, int modulus)
// construct a HashMap with given capacity and given hash parameters

// hashing
public int hash(K key)

// size
public int size()
// return the number of nodes currently stored in the map public boolean isEmpty()
// interface
public int[] getFullestBuckets() public List keys()
public V put(K key, V value) public V get(K key)
public V remove(K key)
}

Report

Report on the effect of using a hash map which resolves collisions by chaining instead of using open addressing to store the US census data. In particular, fill in the following table and answer the following questions. In your answer to the questions, you should consider all three methods for resolving collisions linear probing, double hashing, and chaining.

1. How do your quantitative results here relate to those you obtained in your open addressing experiments on linear probing and double hashing?

2. What advantages and disadvantages do you see in the three methods for real world applications?

B Password Manager

You will now implement SimplePasswordManager, which stores and manages users' passwords. For security, we do not want to store the actual password string used by each user but, rather, the hash of the password string: in the case that our data store is compromised, we don't want nefarious individuals to have free access to user passwords. The result is that, instead of checking at authentication that the password entered is the same as the password stored, we will instead need check that the hash of the password entered is the same as the (hashed) password representation stored.

Report

Secure hash functions for password storage

Do some research to answer the following questions:
1. What properties does a good hash function for storing password data have?
2. Why are each of these properties important for security?
3. Select a hash function designed for storing passwords (e.g. djb2, sdbm, lose lose) and:
• give the algorithm used to calculate the hash in psuedocode
• describe what the algorithm does in you own words

Code

Implement a class called SimplePasswordManager which uses a hash map to store user passwords and has a series of methods to manage their use. You may use any one of the hash map implementations from Section A. Your hash map will need to have String keys which are usernames and Long values which reflect the hashed value of that user's password. Since we used a type-safe implementation of HashMap, the key-value types need to be specified to instantiate a HashMap to store users. If you forget how to do this, revise the work on type-safe data structures we have covered in tutorials (especially Week 7).

public class SimplePasswordManager {

// construction
public SimplePasswordManager()
// construct a SimplePasswordManager with 4000 places and default hash parameters multiplier = 1 and modulus = 4271

public SimplePasswordManager(int size, int multiplier, int modulus)
// construct a SimplePasswordManager with size number of places

// hashing
public Long hash(String password)

// interface methods
public List listUsers()
// return an array of the usernames of the users currently stored public boolean authenticate(String username, String password)
public void addNewUser(String username, String password) public void deleteUser(String username, String password)
public void resetPassword(String username, String oldPassword, String newPassword)

}

Instructions for hashing:
• implement the hash function you found in your research, that is:
- you may not import a library which implements hash functions - the implementation must be your own
- the hash function on SimplePasswordManager will be different to the hash function you imple- mented on your hash maps - the hash function on your hash maps were used to index the entries in the map while this hash function on SimplePasswordManager will be used to securely store passwords
- no test on PASTA will assume a particular hash function, only that hash takes a single String argument and returns a Long

Instruction for the interface methods:
• authenticate
- check that the hash of the input password is the same as the stored password for the given user
- if no user exists in the store with the given username, print the message "No such user exists." to the console, and return false
• addNewUser
- add a new user to your HashMap store with the given username and password
- if a user with the given username is already stored, print the message "User already exists." to the console
• deleteUser
- delete the given user from the store of users after authenticating the given password
- if no user exists in the store with the given username, print the message "No such user exists." to the console
- if the given password fails authentication, print the message "Failed to authenticate user." to the console
• resetPassword
- update the stored password for the given user to the (hash of) the newPassword after authenti- cating the oldPassword
- if no user exists in the store with the given username, print the message "No such user exists." to the console
- if the oldPassword fails authentication, print the message "Failed to authenticate user." to the console

Report on how well the hash function you chose uniquely identifies passwords. In particular, report how many of the 10000 common passwords in the Password data set have the same value when hashed by your hash function. You may find the code snippet in Appendix B helpful for reading the password file.

Discuss your findings with refererence to the following questions:

1. What are the practical implications of having multiple passwords having the same hash representation?

2. What improvements could you make to the hash function you chose to improve security?

Request for Solution File

Ask an Expert for Answer!!
Data Structure & Algorithms: How does the choice of multiplier modulus and hashmapsize
Reference No:- TGS01210823

Expected delivery within 24 Hours