I was scraping the rust off my data-structures synapses last week, and found that there was still some functionality underneath. UnionFind is one of those sneaky algorithms that crops up in non-linear arrangements of data, such as computer or social networks. Specifically, it keeps track of sets of connected elements. The API consists of a union(p, q), which connects the two elements p and q, and a find(p, q), which determines if p is in the same set as q.
I learned about this algorithm through Coursera, and their class Algorithms Part 1 with Kevin Wayne and Robert Sedgewick. The focus of this course was not only on implementation of algorithms, but also optimization.
Union Find
A simple model for connectivity is 10 objects numbered 1 through 0:
0 1 2 3 4 5 6 7 8 9
Making random connections will produce disjointed sets of objects:
0 1 { 2 3 9 } { 5 6 } 7 { 4 8 }
A find(p, q) will determine if p and q are in the same set:
- find(0, 8) => false
- find(2, 9) => true
A union(p, q) will put p and q in the same set:
- union(1, 6) => 0 { 2 3 9 } { 1 5 6 } 7 { 4 8 }
- union(2, 4) => 0 1 { 5 6 } 7 { 2 3 9 4 8 }
Using an array to keep track of the values, we can walk through what this implementation would look like:
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | |
| union(4,9) | 1 | 2 | 3 | 9 | 5 | 6 | 7 | 8 | 9 | 0 | | Changes all values that match array[4] to match array[9] |
| union(2,4) | 1 | 9 | 3 | 9 | 5 | 6 | 7 | 8 | 9 | 0 | | Changes all values that match array[2] to match array[4] |
| union(4,7) | 1 | 7 | 3 | 7 | 5 | 6 | 7 | 8 | 7 | 0 | | Changes all values that match array[4] to match array[7] |
Using the java api provided by Kevin Wayne and Robert Sedgewick:
public class UF
----------------
public UF(int N)
public int find(int p, int q)
public void union(int p, int q)I adapted the basic union find in ruby:
class QuickFind
attr_accessor :find_array
def initialize(size)
@find_array = Array.new(size){ |index| index }
end
def find(p, q)
location(p) == location(q)
end
def union(p, q)
@find_array = create_union(p, q)
end
private
def location(index)
raise NotAnElement if @find_array[index].nil?
return @find_array[index]
end
def create_union(p, q)
target = location(p)
goal = location(q)
temp_array = @find_array.map do |element|
element == target ? goal : element
end
end
end
class NotAnElement < Exception
endSets are arranged as a flat array of elements which all share the same value: they are all wearing the same t-shirt. When we connect elements from different sets, we must make sure that every element in the first set changes its t-shirt to match the elements in the other set. This is a time-consuming process, and uses a lot of t-shirts.
This implementation is called QuickFind because in an array of 1,000,000 entries, finding if two elements are connected is easy (compare the t-shirt at index p with the t-shirt at index q), but connecting them requires inspection of every element in the array in order (check everyone’s t-shirts) to make sure all the previous connections remain in place.
Benchmark
To compare efficiency of this and future UnionFind implementations, I built a basic benchmark that performs a set number of union and find operations and measures the time to completion:
require 'benchmark'
@size = 10000
def testUnit(object)
@size.times do
object.union(rand(@size), rand(@size))
object.find(rand(@size), rand(@size))
end
end
Benchmark.bm do |x|
x.report { testUnit(QuickFind.new(@size)) }
endRunning the benchmark on QuickFind:
$ ruby lib/benchmark.rb
=> Array size = 100
user system total real
0.000000 0.000000 0.000000 ( 0.002745)
$ ruby lib/benchmark.rb
=> Array size = 1000
user system total real
0.140000 0.000000 0.140000 ( 0.137957)
$ ruby lib/benchmark.rb
=> Array size = 10000
user system total real
13.790000 0.020000 13.810000 ( 13.866528)You can see from these measurements that increasing the size of the array by a factor of 10 each time comes with a logarithmic increase in processing time. There is definitely room for optimization, so stay tuned for more refinements. In the meantime,
Fun Rubyisms
Array initialization is a snap in ruby! Compare the initialization line in the QuickFind algorithm above with the equivalent java implementation:
@find_array = Array.new(size){ |index| index }versus
...
int[] arry = new int[size]
for (int i = 0; i < size; i++) {
id[i] = i;
}Both result in an array of the specified size, but ruby is slightly less sprawling.
Also, check out lines 27 - 29
temp_array = @find_array.map do |element|
element == target ? goal : element
endUsing map() eliminates the need for a separate array declaration. It is the equivalent of:
temp_array = []
@find_array.each do |element|
temp_array << element == target ? goal : element
end
return temp_arrayand is certainly more readable than the equivalent in java:
int pid = id[p];
for (int i = 0; i < id.length; i++)
if (id[i] == pid) id[i] = id[q];Awesome