State of the UnionFind

I was scraping the rust off my data-structures synapses last week, and found that there was still some functionality underneath. UnionFind is one of those sneaky algorithms that crops up in non-linear arrangements of data, such as computer or social networks. Specifically, it keeps track of sets of connected elements. The API consists of a union(p, q), which connects the two elements p and q, and a find(p, q), which determines if p is in the same set as q.

I learned about this algorithm through Coursera, and their class Algorithms Part 1 with Kevin Wayne and Robert Sedgewick. The focus of this course was not only on implementation of algorithms, but also optimization.

Union Find

A simple model for connectivity is 10 objects numbered 1 through 0:

0 1 2 3 4 5 6 7 8 9

Making random connections will produce disjointed sets of objects:

0 1 { 2 3 9 } { 5 6 } 7 { 4 8 }

A find(p, q) will determine if p and q are in the same set:

  • find(0, 8) => false
  • find(2, 9) => true

A union(p, q) will put p and q in the same set:

  • union(1, 6) => 0 { 2 3 9 } { 1 5 6 } 7 { 4 8 }
  • union(2, 4) => 0 1 { 5 6 } 7 { 2 3 9 4 8 }

Using an array to keep track of the values, we can walk through what this implementation would look like:

Index 1 2 3 4 5 6 7 8 9 0
Value 1 2 3 4 5 6 7 8 9 0
union(4,9) 1 2 3 9 5 6 7 8 9 0 | Changes all values that match array[4] to match array[9]
union(2,4) 1 9 3 9 5 6 7 8 9 0 | Changes all values that match array[2] to match array[4]
union(4,7) 1 7 3 7 5 6 7 8 7 0 | Changes all values that match array[4] to match array[7]

Using the java api provided by Kevin Wayne and Robert Sedgewick:

public class UF
----------------
public UF(int N)
public int find(int p, int q)
public void union(int p, int q)

I adapted the basic union find in ruby:

class QuickFind

  attr_accessor :find_array

  def initialize(size)
    @find_array = Array.new(size){ |index| index }
  end

  def find(p, q)
    location(p) == location(q)
  end

  def union(p, q)
    @find_array = create_union(p, q)
  end

  private

    def location(index)
      raise NotAnElement if @find_array[index].nil?
      return @find_array[index]
    end

    def create_union(p, q)
      target = location(p)
      goal = location(q)
      temp_array = @find_array.map do |element|
        element == target ? goal : element
      end
    end
end

class NotAnElement < Exception
end

Sets are arranged as a flat array of elements which all share the same value: they are all wearing the same t-shirt. When we connect elements from different sets, we must make sure that every element in the first set changes its t-shirt to match the elements in the other set. This is a time-consuming process, and uses a lot of t-shirts.

This implementation is called QuickFind because in an array of 1,000,000 entries, finding if two elements are connected is easy (compare the t-shirt at index p with the t-shirt at index q), but connecting them requires inspection of every element in the array in order (check everyone’s t-shirts) to make sure all the previous connections remain in place.

Benchmark

To compare efficiency of this and future UnionFind implementations, I built a basic benchmark that performs a set number of union and find operations and measures the time to completion:

require 'benchmark'
@size = 10000

def testUnit(object)
  @size.times do
    object.union(rand(@size), rand(@size))
    object.find(rand(@size), rand(@size))
  end
end

Benchmark.bm do |x|
  x.report { testUnit(QuickFind.new(@size)) }
end

Running the benchmark on QuickFind:

$ ruby lib/benchmark.rb
=> Array size = 100
       user     system      total        real
   0.000000   0.000000   0.000000 (  0.002745)
$ ruby lib/benchmark.rb
=> Array size = 1000
       user     system      total        real
   0.140000   0.000000   0.140000 (  0.137957)
$ ruby lib/benchmark.rb
=> Array size = 10000
       user     system      total        real
  13.790000   0.020000  13.810000 ( 13.866528)

You can see from these measurements that increasing the size of the array by a factor of 10 each time comes with a logarithmic increase in processing time. There is definitely room for optimization, so stay tuned for more refinements. In the meantime,

Fun Rubyisms

Array initialization is a snap in ruby! Compare the initialization line in the QuickFind algorithm above with the equivalent java implementation:

@find_array = Array.new(size){ |index| index }

versus

...
int[] arry = new int[size]
for (int i = 0; i < size; i++) {
    id[i] = i;
}

Both result in an array of the specified size, but ruby is slightly less sprawling.

Also, check out lines 27 – 29

temp_array = @find_array.map do |element|
  element == target ? goal : element
end

Using map() eliminates the need for a separate array declaration. It is the equivalent of:

temp_array = []
@find_array.each do |element|
  temp_array << element == target ? goal : element
end
return temp_array

and is certainly more readable than the equivalent in java:

int pid = id[p];
for (int i = 0; i < id.length; i++)
    if (id[i] == pid) id[i] = id[q];

Awesome