In my last look at the UnionFind algorithm I made a quick-find implementation — finding if two elements were connected was fast but connecting two elements was slow. The union operation is a good candidate for optimization.
Quick Union
One way to optimize the union operation is to reduce the number of array accesses required.
In QuickFind, the sets are arranged in a flat array of elements which all share the same value: they are all wearing the same t-shirt. When we connect elements from different sets, we must make sure that every element in the first set changes its t-shirt to match the elements in the other set. This is a time-consuming process, and uses a lot of t-shirts.
What if only one element had to change its t-shirt? If the set had a single element in charge - a root - then joining elements from different sets would require identifying the t-shirt of the one in charge, and then changing to that element’s t-shirt.
Using an array instead of t-shirts to keep track of the values, we can walk through what this implementation would look like:
| Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | | Each element begins as the root of its own set |
| union(4,9) | 1 | 2 | 3 | 9 | 5 | 6 | 7 | 8 | 9 | 0 | | root(9) = 9, root(4) = 4, change array[4] = 9 |
| union(2,4) | 1 | 9 | 3 | 9 | 5 | 6 | 7 | 8 | 9 | 0 | | root(4) = 9, root(2) = 2, change array[2] = 9 |
| union(4,7) | 1 | 9 | 3 | 9 | 5 | 6 | 7 | 8 | 7 | 0 | | root(7) = 7, root(4) = 9, change array[9] = 7 |
| find(2,7) | 1 | 9 | 3 | 9 | 5 | 6 | 7 | 8 | 7 | 0 | | root(7) = 7, root(2) = 7. The two are connected |
QuickUnion
Again following the basic API:
public class UF
---------------
public UF(int N)
public int find(int p, int q)
public void union(int p, int q)I adapted the QuickUnion in ruby:
class QuickUnion
def initialize(size)
@find_array = Array.new(size){ |index| index }
end
# check if p and q have the same root
def find(p, q)
verify(p, q)
root(p) == root(q)
end
# set id of p's root to the id of q's root
def union(p, q)
verify(p, q)
@find_array[root(p)] = root(q)
end
private
def root(index)
# get root until root(index) == index
if @find_array[index] == index
return index
else
root(@find_array[index])
end
end
def verify(p, q)
raise NotAnElement if @find_array[p].nil?
raise NotAnElement if @find_array[q].nil?
end
end
class NotAnElement < Exception
endThis implementation is a QuickUnion because in an array of 1,000,000 entries, connecting element p to element q is easy: you only have to find the root of p, and then change q to match (one root() and one array access). Finding if p and q are connected takes a little bit longer, because you have to perform two root operations: one for p and one for q, but it is still faster than accessing every element in the array.
Benchmark
$ ruby lib/benchmark.rb
=> Array Size: 100
user system total real
0.000000 0.000000 0.000000 ( 0.001865)
0.000000 0.000000 0.000000 ( 0.000377)
$ ruby lib/benchmark.rb
=> Array Size: 1000
user system total real
0.120000 0.010000 0.130000 ( 0.124929)
0.010000 0.000000 0.010000 ( 0.008825)
$ ruby lib/benchmark.rb
=> Array Size: 10000
user system total real
10.120000 0.030000 10.150000 ( 10.146124)
0.520000 0.000000 0.520000 ( 0.520377)Comparing QuickFind (top line) with QuickUnion(bottom line), we can actually measure how well this small optimization of the union operation has increased performance with larger datasets.
Awesome.