Projects in Data Structures and Algorithms
COMPSCI 165
Sorts
Shell Sort
- A generalized version of insertion sort
- Takes a parameter g, where g is the gap between compared elements
- For example: if g = 5, then elements five positions apart are sorted relative to one another (see the sketch after this list)
- Insertion sort is the case where g = 1
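- A minimal Python sketch of the gapped pass described above (the function names and example gap sequence are illustrative, not from the notes); a full Shell sort repeats the pass over a shrinking gap sequence that ends with g = 1:
def gap_insertion_sort(arr, g):
    # Insertion sort restricted to elements that are g positions apart
    for i in range(g, len(arr)):
        key = arr[i]
        j = i
        while j >= g and arr[j - g] > key:
            arr[j] = arr[j - g]
            j -= g
        arr[j] = key

def shell_sort(arr, gaps=(5, 3, 1)):
    # The final pass with g = 1 is a plain insertion sort on a nearly sorted array
    for g in gaps:
        gap_insertion_sort(arr, g)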
Hybrid Sort
- Same as merge sort, but once the size of the split arrays becomes small enough, run insertion sort
- Motivation: insertion sort works best on small lists, and the merge paradigm pays off less when there are few elements (see the sketch below)
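- A minimal sketch of the idea (the cutoff of 16 and the function names are illustrative choices, not from the notes):
def insertion_sort_range(arr, lo, hi):
    # Sort arr[lo:hi] in place with insertion sort
    for i in range(lo + 1, hi):
        key = arr[i]
        j = i
        while j > lo and arr[j - 1] > key:
            arr[j] = arr[j - 1]
            j -= 1
        arr[j] = key

def hybrid_sort(arr, lo=0, hi=None, cutoff=16):
    if hi is None:
        hi = len(arr)
    if hi - lo <= cutoff:
        insertion_sort_range(arr, lo, hi)  # small subarray: fall back to insertion sort
        return
    mid = (lo + hi) // 2
    hybrid_sort(arr, lo, mid, cutoff)
    hybrid_sort(arr, mid, hi, cutoff)
    # Standard merge of the two sorted halves back into arr[lo:hi]
    merged, i, j = [], lo, mid
    while i < mid and j < hi:
        if arr[i] <= arr[j]:
            merged.append(arr[i]); i += 1
        else:
            merged.append(arr[j]); j += 1
    merged.extend(arr[i:mid])
    merged.extend(arr[j:hi])
    arr[lo:hi] = merged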
Randomness
- Donald Knuth’s definition of randomness: Each element is chosen randomly and independently from a distribution
- Various sources of randomness
- Radioactive decay
- Radio noise
- Hardware noise
- If even one of these sources is truly random, XOR'ing it with other, independent (possibly non-random) sources still produces a truly random outcome
Skew Correction
- Method of creating a random number
- Choose two bits, discard if they are identical, use the first if they are not
- Analogous to flipping a coin repeatedly
- Computationally intensive
- Keep generating random numbers until they fall into your desired range
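- A minimal sketch of the bit-pairing rule above; biased_bit() stands in for some biased physical source and is simulated here with Python's random module:
import random

def biased_bit(p=0.7):
    # Stand-in for a biased source: returns 1 with probability p
    return 1 if random.random() < p else 0

def unbiased_bit():
    # Draw bits in pairs; discard equal pairs, otherwise keep the first bit
    while True:
        a, b = biased_bit(), biased_bit()
        if a != b:
            return a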
Pseudorandom Number Generators
- Mimics the properties of randomness without the computational load
- Important property: the output should not be determinable from previous outputs
- Known as being cryptographically strong
- Hash functions such as SHA-1 and MD5 were designed to be cryptographically strong, though both are now considered broken
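- A toy sketch of building a generator from a hash function by hashing a secret seed together with a counter (SHA-256 is used here as the example hash; this is only an illustration, not a vetted construction to use in practice):
import hashlib

def hash_prng(seed: bytes, n_blocks: int):
    # Output block i is SHA-256(seed || i); yields 32-byte blocks
    for i in range(n_blocks):
        yield hashlib.sha256(seed + i.to_bytes(8, "big")).digest()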
Linear Congruent Generator (LCG)
- Parameters
- x_0: starting term (the seed)
- P1: coefficient
- P2: intercept
- N: modulo term
- Equation: x_{n+1} = (P1 * x_n + P2) mod N
- Flawed: the sequence eventually repeats (the period is at most N), and future outputs are easy to predict from past ones, so an LCG is not cryptographically strong
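- A minimal sketch of the recurrence using the parameter names above:
def lcg(x0, p1, p2, n_mod):
    # Yields x_1, x_2, ... where x_{k+1} = (P1 * x_k + P2) mod N
    x = x0
    while True:
        x = (p1 * x + p2) % n_mod
        yield x
- For example, with x_0 = 1, P1 = 5, P2 = 3, N = 16 the sequence begins 8, 11, 10, 5, 12, ... and must eventually repeat because there are only N possible values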
Generating Random Sequences
Radix Sort
- Assign each element in the list a large random number as a key (see the sketch below)
- Sorting the elements by these keys (e.g., with radix sort) produces a random permutation
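- A minimal sketch of the idea, using Python's built-in sort in place of an explicit radix sort:
import random

def random_permutation_by_sorting(items):
    # Pair each element with a large random key, sort by key, then drop the keys
    keyed = [(random.getrandbits(64), x) for x in items]
    keyed.sort(key=lambda pair: pair[0])
    return [x for _, x in keyed]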
Fisher Yates
- O(n) algorithm to randomly permute an array
import random

def fisher_yates(arr):
    for k in range(len(arr) - 1, 0, -1):  # k = n - 1 down to 1
        j = random.randrange(k + 1)       # uniform random index in [0, k]
        arr[k], arr[j] = arr[j], arr[k]   # swap arr[k] and arr[j]
Bin Packing
- Problem: pack items of weights 0 <= w_i <= 1 into bins with a capacity of 1 and find the minimum number of bins needed
- NP-hard problem
- Different from knapsack: the goal is to minimize the number of bins rather than to maximize value
- Metrics exist to determine how good a solution can be
- Approximation Ratio (AR): Approximated Solution / Optimal Solution
- Waste: Number of Bins Used - Sum of Item Weights
Next-Fit (NF)
- Check whether the item fits in the currently open bin
- If it does, place it there; if not, close the current bin and open a new one (see the sketch after this list)
- AR will be no worse than 2
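- A minimal sketch of Next-Fit, returning the bins as lists of item weights:
def next_fit(weights, capacity=1.0):
    bins, remaining = [], 0.0
    for w in weights:
        if not bins or w > remaining:  # doesn't fit in the open bin: open a new one
            bins.append([])
            remaining = capacity
        bins[-1].append(w)
        remaining -= w
    return bins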
First-Fit (FF)
- Scan bins in order and place the new item in the first (leftmost) bin that can hold it
- Create a new bin if it doesn’t fit anywhere
- Can use a balanced binary tree to store bins to get an O(n log n) run time for the algorithm
- Key the tree by bin index, and augment each node with the maximum remaining capacity in its subtree so the search knows whether to go left or right
- Uses no more than the ceiling of 1.7m bins, where m is the optimal number of bins (AR roughly 1.7)
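- A minimal sketch of First-Fit using a plain linear scan (worst-case O(n^2); the augmented tree described above is what brings the total down to O(n log n)):
def first_fit(weights, capacity=1.0):
    bins = []  # bins[i] = remaining capacity of bin i
    for w in weights:
        for i, remaining in enumerate(bins):
            if w <= remaining:  # leftmost bin that can hold the item
                bins[i] -= w
                break
        else:
            bins.append(capacity - w)  # nothing fits: open a new bin
    return len(bins)  # number of bins used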
Best-Fit (BF)
- Same as FF, except instead of the leftmost bin, choose the bin with the least remaining capacity that still fits the item
- Keep bins ordered by remaining capacity and binary search for the tightest fit; O(n log n) overall
- After adding an item to a bin, you can delete the node and re-insert it with its new capacity
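- A minimal sketch of the Best-Fit rule using a sorted Python list with bisect (list insertion and removal are linear, so this shows the selection rule rather than the O(n log n) bound, which needs a balanced tree):
import bisect

def best_fit(weights, capacity=1.0):
    remaining = []  # sorted remaining capacities, one entry per open bin
    for w in weights:
        i = bisect.bisect_left(remaining, w)  # tightest bin that still fits w
        if i == len(remaining):
            new_remaining = capacity - w  # nothing fits: open a new bin
        else:
            new_remaining = remaining.pop(i) - w  # delete the chosen bin's node
        bisect.insort(remaining, new_remaining)  # re-insert it with its new capacity
    return len(remaining)  # number of bins used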
FFD and BFD
- Sort the items from largest to smallest first, then run FF or BF, respectively
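- A minimal sketch, reusing the first_fit and best_fit sketches above:
def first_fit_decreasing(weights, capacity=1.0):
    # Sort items from largest to smallest, then run First-Fit
    return first_fit(sorted(weights, reverse=True), capacity)

def best_fit_decreasing(weights, capacity=1.0):
    # Sort items from largest to smallest, then run Best-Fit
    return best_fit(sorted(weights, reverse=True), capacity)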
Algorithms and Architectures
- Describing an algorithm precisely requires a computational model as a foundation
Turing Machine
- Mathematical model of computation
- Any function computable in polynomial time (in a reasonable model of computation) has a polynomial-time algorithm on a Turing machine
- Any computational model or system that can perform any computation a Turing machine can is known as Turing-complete
- The ENIAC (the first general-purpose electronic digital computer) was Turing-complete
Von Neumann Architecture
- A model consisting of a control unit, an arithmetic/logic unit (ALU), and a memory unit that stores both data and the instructions to be executed
- Built upon by the Random Access Machine (RAM)
- High-level programming languages were built for the RAM and Von Neumann machines
- These languages consisted of arithmetic and logical operators, conditionals, loops, subroutines, and memory access
- Word RAM Model: Computational model in which a RAM can do arithmetic and bitwise operations on a word of w bits in constant time
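- For example, a test like the one below takes a constant number of word operations in this model (Python sketch; Python integers are arbitrary precision, so it only illustrates the operations the model counts):
def is_power_of_two(x: int) -> bool:
    # One comparison, one subtraction, one AND, one comparison: O(1) word operations
    return x != 0 and (x & (x - 1)) == 0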
Memory Hierarchy
- Defined by a trade-off of size vs. speed
- The closer to the core/CPU, the faster lookup is, but the less you can store
External Memory Model
- Identifies the frontier between the two highest layers in the memory hierarchy
- B = block size
- M = “internal” memory size
- N = “external” input size
- Can use a B-tree to maintain a map in external memory
- An adapted version of an (a,b)-tree
Parallel Random Access Machine (PRAM)
- Performs operations synchronously over p processors
- Exclusive Read (ER): p processors can simultaneously read the content of p distinct memory locations
- Concurrent Read (CR): p processors can simultaneously read the content of p' memory locations where p' < p (i.e., some locations are read by more than one processor)
- Exclusive Write (EW): p processors can simultaneously write the content of p distinct memory locations
- Concurrent Write (CW): p processors can simultaneously write the content of p' memory locations where p' < p (i.e., some locations are written by more than one processor)