State of the QuickUnion

In my last look at the UnionFind algorithm I made a quick-find implementation — finding if two elements were connected was fast but connecting two elements was slow. The union operation is a good candidate for optimization.

Quick Union

One way to optimize the union operation is to reduce the number of array accesses required.

In QuickFind, the sets are arranged in a flat array of elements which all share the same value: they are all wearing the same t-shirt. When we connect elements from different sets, we must make sure that every element in the first set changes its t-shirt to match the elements in the other set. This is a time-consuming process, and uses a lot of t-shirts.

What if only one element had to change its t-shirt? If the set had a single element in charge – a root – then joining elements from different sets would require identifying the t-shirt of the one in charge, and then changing to that element’s t-shirt.

Using an array instead of t-shirts to keep track of the values, we can walk through what this implementation would look like:

Index 1 2 3 4 5 6 7 8 9 0
Value 1 2 3 4 5 6 7 8 9 0 | Each element begins as the root of its own set
union(4,9) 1 2 3 9 5 6 7 8 9 0 | root(9) = 9, root(4) = 4, change array[4] = 9
union(2,4) 1 9 3 9 5 6 7 8 9 0 | root(4) = 9, root(2) = 2, change array[2] = 9
union(4,7) 1 9 3 9 5 6 7 8 7 0 | root(7) = 7, root(4) = 9, change array[9] = 7
find(2,7) 1 9 3 9 5 6 7 8 7 0 | root(7) = 7, root(2) = 7. The two are connected

QuickUnion

Again following the basic API:

public class UF
---------------
public UF(int N)
public int find(int p, int q)
public void union(int p, int q)

I adapted the QuickUnion in ruby:

class QuickUnion
  def initialize(size)
    @find_array = Array.new(size){ |index| index }
  end

  # check if p and q have the same root
  def find(p, q)
    verify(p, q)
    root(p) == root(q)
  end

  # set id of p's root to the id of q's root
  def union(p, q)
    verify(p, q)
    @find_array[root(p)] = root(q)
  end

  private
    def root(index)
      # get root until root(index) == index
      if @find_array[index] == index
        return index
      else
        root(@find_array[index])
      end
    end

    def verify(p, q)
      raise NotAnElement if @find_array[p].nil?
      raise NotAnElement if @find_array[q].nil?
    end
end

class NotAnElement < Exception
end

This implementation is a QuickUnion because in an array of 1,000,000 entries, connecting element p to element q is easy: you only have to find the root of p, and then change q to match (one root() and one array access). Finding if p and q are connected takes a little bit longer, because you have to perform two root operations: one for p and one for q, but it is still faster than accessing every element in the array.

Benchmark

$ ruby lib/benchmark.rb
=> Array Size: 100
       user     system      total        real
   0.000000   0.000000   0.000000 (  0.001865)
   0.000000   0.000000   0.000000 (  0.000377)
$ ruby lib/benchmark.rb
=> Array Size: 1000
       user     system      total        real
   0.120000   0.010000   0.130000 (  0.124929)
   0.010000   0.000000   0.010000 (  0.008825)
$ ruby lib/benchmark.rb
=> Array Size: 10000
       user     system      total        real
  10.120000   0.030000  10.150000 ( 10.146124)
   0.520000   0.000000   0.520000 (  0.520377)

Comparing QuickFind (top line) with QuickUnion(bottom line), we can actually measure how well this small optimization of the union operation has increased performance with larger datasets.

Awesome.