Sorts

Shell Sort

  • A generalized version of insertion sort
  • Takes in parameter of g, where g is the gap between swaps
    • For example: if g = 5, then every fifth element should be sorted with each other
    • Insertion sort is the case where g = 1

Hybrid Sort

  • Same as merge sort, but once the size of the split arrays becomes small enough, run insertion sort
  • Motivation: insertion sort works best with small lists, and the merge paradigm doesn’t work as well when there aren’t many elements

Randomness

  • Donald Knuth’s definition of randomness: Each element is chosen randomly and independently from a distribution
  • Various sources of randomness
    • Radioactive decay
    • Radio noise
    • Hardware noise
    • If any of these sources are truly random, then it can be XOR’d with non-random sources to create a truly random outcome

Skew Correction

  • Method of creating a random number
  • Choose two bits, discard if they are identical, use the first if they are not
    • Analogous to flipping a coin repeatedly
    • Computationally intensive
  • Keep generating random numbers until they fall into your desired range

Pseudorandom Number Generators

  • Mimics the properties of randomness without the computational load
  • Important property: the output should not be determinable from previous outputs
    • Known as being cryptographically strong
    • Various hashing algorithms are known to be cryptographically strong, such as SHA-1 and MD5

Linear Congruent Generator (LCG)

  • Parameters
    • x0: start term
    • P1: coefficient
    • P2: intercept
    • N: modulo term
  • Equation: xn+1 = P1xn + P2 (mod N)
  • Flawed because the sequence generated from an LCG will eventually repeat

Generating Random Sequences

Radix Sort

  • Assign elements in a list a random, large number
  • Sort these elements according to their assignments; creates a random permutation

Fisher Yates

  • O(n) algorithm to randomly permute an array
for k = n - 1 down to 1:
    j = random(k+1)
    swap arr[k] and arr[j]

Bin Packing

  • Problem: pack items of weights 0 <= wi <= 1 into bins with a capacity of 1 and find the minimum number of bins needed
  • NP-hard problem
  • Different than knapsack: trying to find smallest number of bins as opposed to most value
  • Metrics exist to determine how good a solution can be
    • Approximation Ratio (AR): Approximated Solution / Optimal Solution
    • Waste: Waste = Number of Bins - Sum of Item Weights

Next-Fit (NF)

  • Check to see if item fits in the current bin
    • If it does, place it there; if not, move to the next bin
  • AR will be no worse than 2

First-Fit (FF)

  • Scan bins in order and place the new item in the first (leftmost) bin that can hold it
    • Create a new bin if it doesn’t fit anywhere
  • Can use a balanced binary tree to store bins to get an O(n log n) run time for the algorithm
    • Use bin-index as the primary key, and store the maximum capacity available to determine if you can go left/right
  • AR is no worse than the ceiling of 1.7m

Best-Fit (BF)

  • Same as FF, but leftmost restriction doesn’t apply
  • Binary search on bins for best-match capacity left; O(n log n)
  • After adding an item to a bin, you can delete the node and re-insert it with its new capacity

FFD and BFD

  • Sort the items from largest to smallest first, then run FF or BF, respectively

Algorithms and Architectures

  • Algorithm descriptions require a computational model to start as a foundation

Turing Machine

  • Mathmatical model
  • Any polynomial-time function has a polynomial-time algorithm on a Turing machine
  • Any computational model or system than can complete any operation a Turing machine can is known as Turing-complete
    • The ENIAC (first digital computer) was Turing-complete

Von Neumann Architecture

  • New model that consisted of a control unit, arithmetic/logic unit (ALU), and memory unit to store instructions and execute them
  • Built upon by the Random Access Machine (RAM)
  • Programming languages were built for the RAM and Von Neumann machines
    • Based on high-level programming languages; consisted of artihmetic + logical operators, conditionals, loops, subroutines, memory access
  • Word RAM Model: Computational model in which a RAM can do arithmetic and bitwise operations on a word of w bits in constant time

Memory Hierarchy

  • Defined by a trade-off of size vs. speed
  • The closer to the core/CPU, the faster lookup is, but the less you can store

External Memory Model

  • Identifies he frontier between the two highest layers in the memory hierarchy
    • B = block size
    • M = “internal” memory size
    • N = “external” input size
  • Can use a B-tree to maintain a map in external memory
    • Adapted version of a (a,b) tree

Parallel Random Access Machine (PRAM)

  • Performs operations synchronously over p processors
    • Exclusive Read (ER): p processors can simultaneously read the content of p distinct memory locations
    • Concurrent Read (CR): p processors can simultaneously read the content of p’ memory locations where p’ < p
    • Exclusive Write (EW): p processors can simultaneously write the content of p distinct memory locations
    • Concurrent Write (CW): p processors can simultaneously write the content of p’ memory locations where p’ < p