Programming Assignment:

Problem 1: In this assignment you will implement a program which prints out all
anagrams of a specified string. Two strings are anagrams of one
another if by rearranging letters of one of the strings you can obtain
another. For example, the string "toxic" is an anagram of "ioxct". For
the purpose of this assignment, we are only interested in those
anagrams of a given string which appear in the dictionary. The
dictionary you should use is the file words.txt.
Algorithm and Implementation:
Since we will be
performing multiple anagram queries, our first step is to load all of
the (25,000) words in the dictionary into an appropriate data
structure. A primary requirement is that one must be able to
efficiently search this data structure to look for anagrams of a given
word. A clever trick that can be used is to first
sort the letters of every word we insert into our data structure to
produce a key for each word. For example, the key for the string
"toxic" is "ciotx", similarly the key for both "star" and "rats" is
"arst". (Expert Scrabble players will recognize this procedure as
computing the alphagram of a set of letters). We will then use a hash
table to store pairs of strings, where the pair consists of the
original word and its key.

When performing insertions, we will compute the hash of the key of the
word to compute the correct bucket. This approach guarantees that all
words which are anagrams of one another are stored in the same bucket
of the hash table.  Similarly, when we are searching for anagrams, we
will first compute the key of the word we are searching for, then hash
the key, then search that bucket for anagram matches. You should feel
free to use the methods described in the text book and in class for
appropriate hash functions for hashing strings (but please cite any
source which you use).

Write and submit two files:

1. hashtable.c -- basic
functionality (insert, find) of a chained hash table.
2.  anagram.c --
a main program which loads words from the dictionary into the
hashtable and then answers anagram queries.

Note: You may fix a size for your hash table -- for efficient searching,
Recommended size at least 10,000 buckets. (For debugging your code,work with a
much smaller practice dictionary, perhaps 10 words, and a much smaller hash
table, perhaps 8 buckets).

You may disregard any words in the dictionary which contain any characters
which are not lowercase alphabetic characters (uppercase,
punctuation, numbers).

Your program should read anagram queries from a file which has been
redirected on standard input. Each query will be on its own line and
will simply consist of a string. The output file should echo the
string, print the number of matching anagrams and list the anagrams.

Problem 2:
The goal of this assignment is to empirically verify the performance 
of a hash table with
two different probing strategies (a) linear probing
and (b) quadratic probing. In both the cases you have to report the average number
of probes. The latter is defined as the total number of times a table
location is calculated, i.e., including the one calculated by the hash
function.

After being sure that your average number of probes is being
calculated correctly, empirically verify the performance vs. load
factor lambda. 
Notes:
1. Use words from words.txt. 

2. For this assignment, we are
interested only in the expected number
of probes required to do a single unsuccessful search or an insertion for a
given load factor lambda.

3. To get the expected number of probes, you have to measure the
expected number of probes that it takes to insert a single item into a
hash table that is already “loaded” to a load factor lambda.
Naturally, you should be picking words at random to do test
insertions, so you should expect some variability in the number of
probes it takes to insert each one. Therefore to get meaningful
answers, you have to insert several and average the number of probes.
A minor problem is that each insertion you do will change the load
factor. However, with a large table this effect should be small enough
to be ignored. That is, if you have 25,005 words and create table of
size 50,000 you will have a load factor of 0.5 after inserting 25,000
of them. Inserting 5 or 10 more will change the load factor very
little.

4. Run your program for lambda = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65,
0.70, 0.75, 0.8, 0.85 and 0.9. Plot the result with a spread sheet, or
carefully on graph paper.

Problem 3:

The birthday paradox is the simple fact that the probability that at least two people in a room of randomly chosen people will share a common birthday is much greater than most people expect.

Design and implement a C program that demonstrates this with a simulation. Your program should test values of n = 5, 10, 15, ..., 50 randomly chosen people. For each value of n, run 100 experiments in which n people are chosen at random and determine for what proportion of the 100 trials at least two of the n people have the same birthday.

Your program should print the results in a simple table, such as:

  n     %
----  -----
  5   0.013
 10   0.121
 15   0.347


Problem 4 (HASH): Suppose we want to find the first occurrence of a string p1p2. . . pk in a long input string a_1a_2 . . . a_n. We can solve this problem by hashing the pattern string, obtaining a hash value h_p, and comparing this value with the hash value formed from a_1a_2 . . . a_k,a_2a_3 . . . a_k+1,a_3a_4 .. . a_k+2, and so on until a_n-k+1a_n-k+2 . . . a_n. If we have a match of hash values, we compare the strings character by character to verify the match. We return the position (in a) if the strings actually do match, and we continue in the unlikely event that the match is false. Write a program to implement this algorithm.