Saturday, February 13, 2010

Best way to create an Array from a group of items

In my earlier blog, we found that arrays were twice as fast as IEnumerables. So how do you get data into an array with good performance?

 

This note looks at how to get a stream of objects into an array with the best performance. Using Object Browser in Visual Studio quickly finds all of the built in ToArray() methods. The other alternative is to create a fix size array and expand it as needed (we will use 1000 items as the default and expand in 1000 items chunks) – this was the worst performer.

 

The numbers for x86 compiled code is shown below:

Row Labels Min of Msec
ArrayResize<> 198.24
HashSet<> 69.34
IEnumerable 27.34
List<> 14.65
NoOp 0.00
Queue<> 17.58
Stack<> 21.48

 

And doing it for explicit x64 compiled code found:

Row Labels Min of Msec
ArrayResize<> 312.50
HashSet<> 71.29
IEnumerable 38.09
List<> 22.46
NoOp 0.00
Queue<> 25.39
Stack<> 26.37

 

We again see the same pattern as before, explicit x64 code performs worst than explicit x86 code. We also see that the best method is to use a List<>

The code for anyone wishing to try alternative patterns:

 

XDocument doc = XDocument.Load(xmlFile.FullName);
IEnumerable<XElement> enumerable = doc.Descendants();
XElement[] asList = enumerable.ToArray();
XElement[] retList;
string name = string.Empty;
for (int repeat = 0; repeat < 30; repeat++)
    for (int pattern = 0; pattern < 7; pattern++)
    {
        int cnt = 0;
        DateTime start = DateTime.Now;
        switch (pattern)
        {
            case 1:
                name = "List<>";
                List<XElement> list = new List<XElement>();
                Array.ForEach(asList, item =>
                {
                    list.Add(item);
                });
                retList = list.ToArray();
                break;
            case 2:
                name = "Queue<>";
                Queue<XElement> queue = new Queue<XElement>();
                Array.ForEach(asList, item =>
                {
                    queue.Enqueue(item);
                });
                retList = queue.ToArray();
                break;
            case 3:
                name = "Stack<>";
                Stack<XElement> stack = new Stack<XElement>();
                Array.ForEach(asList, item =>
                {
                    stack.Push(item);
                });
                retList = stack.ToArray();
                break;
            case 4:
                name = "HashSet<>";
                HashSet<XElement> hashset = new HashSet<XElement>();
                Array.ForEach(asList, item =>
                {
                    hashset.Add(item);
                });
                retList = hashset.ToArray();
                break;

            case 5:
                name = "IEnumerable";
                retList = enumerable.ToArray();
                break;
            case 6:
                name = "ArrayResize<>";
                var array= new XElement[1000];
                int i = 0;
                Array.ForEach(asList, item =>
                {                                    
                    array[i]=item;
                    i++;
                    if (i >= array.Length)
                        Array.Resize(ref array, array.Length + 1000);                                    
                });
                Array.Resize(ref array, i-1);
                break;
            default:
                name = "NoOp";
                break;
        }
        DateTime end = DateTime.Now;
        TimeSpan elapse = end - start;
        File.AppendAllText(@"C:\data.txt", String.Format("{0}\t{1}\r\n", name, elapse.TotalMilliseconds));
    }
}

2 comments:

  1. Visual Studio static code analysis constantly nags to change List<> to Collect<>.

    I added this code:
    case 7:
    name = "Collection<>";
    Collection collection= new Collection();
    Array.ForEach(asList, item =>
    {
    collection.Add(item);
    });
    retList = collection.ToArray();
    break;
    and re-executed.

    Best List<> time was 25 msec. Best Collection<> time was 36 msec. I also noticed that the times were often randomly high with the average times being:
    List<>: 48 msec
    Collection<>: 88 msec
    IEnumerable : 75 msec

    ReplyDelete