Wednesday, June 12, 2013

What is the fastest way to iterate over a Dictionary

hello and welcome to my first post, i hope many more will come after...

As a c# developer i use dictionary very often, it is one of the most popular generics that .Net has to offer.
The fact that dictionary has insertion, deletion and look-up in O(n) (on average) makes it suitable for almost every scenario and preferable collection by many programmers.

Most of the use i had with dictionary is to insert and delete items from a dictionary which is quite straight forward, but from time to time i needed to iterate over all the items i have in a dictionary and find something without knowing his key. To figure out what is the fastest approach for this i ran few tests that I'll show here and try to explain the results.

I used the dictionary to hold 100,000+ vertices, and i iterate over them very often (something sevral times per second) and that is why it was important for me to iterate over the as fast as possible.

I have a simple struct called cell
 
private struct Cell
{
    private int mX, mY;
    public Cell(int x, int y)
    {
        mX = x;
        mY = y;
    }
    public override string ToString()
    {
        return "(" + mX.ToString() + "," + mY.ToString() + ")";
    }
}

And i construct my dictionary as following
 
Dictionary < cell,string > myDictionary = new Dictionary < cell,string >();
for (int i = 0; i < 1000000; i++)
{
    Cell c = new Cell(i,i);
    myDictionary.Add(c, c.ToString());
}

First of all I will use the obvious approach to iterate over a dictionary which is "for each statement", while in each iteration i'm getting a reference to current value.

I'm measuring the time it takes to iterate all over the dictionary using Stopwatch, which is high performance measurement class in .Net.

Simple foreach statement

double microsecPerTick = (1000d * 1000) / Stopwatch.Frequency;
Stopwatch totalTime = Stopwatch.StartNew();

string tempVar;
foreach (KeyValuePair < Cell,string > item in myDictionary)
{
    tempVar = item.Value;
}

totalTime.Stop();
Convert.ToString(totalTime.ElapsedTicks * microsecPerTick);

It ran about 42,000 micro sec, which is 42 mili sec or 0.042 sec which is pretty fast.
But we need to notice one problem on every iteration we construct a KeyValuePair which slows us down.

This is how the Dictionary Enumerator.Current works (ForEach uses it to iterate over the dictionary) :
 
object IEnumerator.Current
{
    get
    {
        if ((this.index == 0) || (this.index == (this.dictionary.count + 1)))
        {
            ThrowHelper.ThrowInvalidOperationException
                      (ExceptionResource.InvalidOperation_EnumOpCantHappen);
        }
        if (this.getEnumeratorRetType == 1)
        {
            return new DictionaryEntry(this.current.Key, this.current.Value);
        }
        return new KeyValuePair < Tkey,Tvalue >(this.current.Key, this.current.Value);
    }
}

As you can see, the enumerator constructs a new KeyValuePair (which we don't need) which is a waste of time, we get additional performance penalty because my key is a struct (which is value type) and the struct is being copied.

There is a faster way to iterate over the values of the dictionary, and it's pretty simple actually, instead of iterating on the <key,value> pairs, we can iterate only on the values by simply using the myDictionary.Values

string tempVar;
Dictionary<Cell, string>.ValueCollection values = myDictionary.Values;
foreach(string currentValue in values)
{
    tempVar = currentValue;
}

Because this code simply return a reference to each value, it runs 2 times faster, about 20,000 micro seconds.