Arrays, Linked Lists and Performance

Patient: Doctor doctor, it hurts when I do this.
Doctor: Well, don’t do that.

Your reminder that building arrays with reduce, while fun, is accidentally quadratic.

I was surprised at how surprising some found this. Quite a few people suggested the reduce version could be changed to not do the array copy (I don’t think it can). Some suggested maybe + could be optimized so it doesn’t perform a copy (I don’t think that it could easily, as we’ll see shortly).¹

In other feedback, a few commented on the previous post about linked lists. Why implement an outdated data structure? What’s the point when we have arrays?

So, you know how sometimes I mention this isn’t a blog about Mac and iOS programming? It’s not a blog about Mac and iOS programming! Don’t put a enum-based linked list into your app just because I happen to find it interesting. I’ll probably find your ensuing performance problems interesting too. You won’t. That said, I think the linked list example is very interesting, and worth implementing and playing with, and might help shed some light on the Array reduce performance. And it might even be useful in real code in certain (infrequent) circumstances.²

So, to recap – sometimes, you’ll see reduce used to build an array (or dictionary or set), for example, in this implementation of map:

extension SequenceType {
   func mapUsingReduce<T>(transform: Generator.Element->T) -> [T] {
        return reduce([]) { $0 + [transform($1)] }
    }
}

as opposed to creating a mutable array then adding to it from a for loop:

extension SequenceType {
   func mapUsingFor<T>(transform: Generator.Element->T) -> [T] {
        var result: [T] = []
        for x in self { result.append(transform(x)) }
        return result
    }
}

The difference being, + creates a copy of the accumulating array every time. And copying the array takes linear time, inside a loop over the full array, so the overall time taken increases quadratically with the length of the array being mapped:

Of course, people aren’t normally going around re-implementing map though: you more often see this technique with, say, filtering duplicates or building dictionaries of word frequencies. But the problem remains the same.

Why is this relevant to a list? Well, because you could implement a version of map using reduce on the list code from last time, like so:

extension SequenceType {
    func mapToList<T>(transform: Generator.Element->T) -> List<T> {
        return reduce(List()) { $0.cons(transform($1)) }.reverse()
    }
}

The performance results you get are so perfectly half the array performance (because of the reverse step) that your teacher may accuse you of faking the results instead of doing the experiment:

This works because the list is persistent – it shares nodes between previous lists and newly consed lists, forever. So no copying needed. But this comes at the cost of only being able to grow from the head (hence the need for a reverse), and the list has to be fully immutable, so you have to make a copy to modify it even when it’s uniquely referenced. This is unlike Array, which can detect unique use of its buffer and just change it in-place, no copying required. Lists have other costs as well – to sum a list of numbers takes twice as long as to sum an array of numbers, as the indirection needed to traverse the list takes time.

So is the full copy on + with arrays fixable? To think about that, let’s first look at how a copy-on-write array might work. Mike Ash already has a great blog post on implementing a copy-on-write Array, so let’s do something a little different, which is to use the ManagedBuffer class from the standard library to build it.

ManagedBuffer

ManagedBuffer is a class you can inherit from, which simplifies the process of allocating/deallocating and managing storage on the heap. It is generic, and has two separate placeholders, Value and Element. Element is the type of the block of storage of n elements, allocated dynamically on creation. Value is the type of an extra single variable on the side for storing other information – for example, to implement an array, you would need to store the element count, as the elements need to be destroyed before the memory is deallocated. Access to the elements is via withUnsafeMutablePointerToElements, whereas the value can be accessed either through a similar unsafe method, or directly via a .value property.

Here’s a very simple self-destroying ArrayBuffer:

private class MyArrayBuffer<Element>: ManagedBuffer<Int,Element> {
    deinit {
        self.withUnsafeMutablePointerToElements { elems->Void in
            elems.destroy(self.value)
        }
    }
}

So, MyArrayBuffer is still generic on what elements it stores, but it fixes the Value of ManagedBuffer to just be an Int, which will store the number of elements in the buffer (bear in mind, we will allocate more storage than we have elements in the array, to avoid constantly reallocating).

When the buffer deinitializes, MyArrayBuffer.deinit will be called prior to ManagedBuffer.deinit, which deallocates the memory. This gives MyArrayBuffer a chance to destroy all its objects. Destroying is necessary if Element is something more than just a passive struct – for example, if the array contained other copy-on-write types, destroying them will trigger them freeing their memory if necessary.

Now, we can create an array type of a struct, with a private buffer as its storage:

public struct MyArray<Element> {
    private var _buf: MyArrayBuffer<Element>

    public init() {
        _buf = MyArrayBuffer<Element>.create(8) { _ in 0 } as! MyArrayBuffer<Element>
    }
}

We don’t use MyArrayBuffer’s init directly – instead we use the class method from ManagedBuffer. Because this method returns the superclass, we force-downcast it to the right type.

Then, we turn MyArray into a collection type:

extension MyArray: CollectionType {
    public var startIndex: Int { return 0 }
    public var endIndex: Int { return _buf.value }

    public subscript(idx: Int) -> Element {
        guard idx < self.endIndex else { fatalError("Array index out of range") }
        return _buf.withUnsafeMutablePointerToElements { $0[idx] }
    }
}

Next, we need two fairly similar methods on the buffer, one to clone the storage and one to resize the storage. Cloning will be used when shared storage is detected, resizing when non-shared storage needs to get bigger:

extension MyArrayBuffer {
    func clone() -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems->MyArrayBuffer<Element> in
            return MyArrayBuffer<Element>.create(self.allocatedElementCount) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems->Void in
                    newElems.initializeFrom(oldElems, count: self.value)
                }
                return self.value
            } as! MyArrayBuffer<Element>
        }
    }

    func resize(newSize: Int) -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems->MyArrayBuffer<Element> in
            let elementCount = self.value
            return MyArrayBuffer<Element>.create(newSize) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems->Void in
                    newElems.moveInitializeFrom(oldElems, count: elementCount)
                }
                self.value = 0
                return elementCount
            } as! MyArrayBuffer<Element>
        }
    }
}

Creating and populating the buffers in one shot is a little finicky – first we need to get the unsafe pointer to the existing elements, then call create, which takes a closure that receives the partially-created object (i.e. allocated but not initialized memory), which we then need to call newBuf.withUnsafeMutablePointerToElements on to copy the memory from the old buffer to the new.

The main difference between the two is clone doesn’t change the elements in the old buffer, just loads new copies into a new buffer. resize moves the elements from the old to the new storage (via UnsafeMutablePointer’s moveInitializeFrom method), then updates the old buffer to tell it that it no longer has any elements to manage – otherwise, it would try to destroy them during its deinit.

Finally, we give MyArray an append and extend method:

extension MyArray {
    public mutating func append(x: Element) {
        if !isUniquelyReferencedNonObjC(&_buf) {
            _buf = _buf.clone()
        }

        if _buf.allocatedElementCount == count {
            _buf = _buf.resize(count*2)
        }

        _buf.withUnsafeMutablePointers { (val, elems)->Void in
            (elems + val.memory++).initialize(x)
        }
    }

    public mutating func extend<S: SequenceType where S.Generator.Element == Element>(seq: S) {
        for x in seq { self.append(x) }
    }
}

This is just sample code. In practice, you would break out the uniqueness and resizing code, so you could re-use it in subscript set or other mutating methods, but I’ve crammed it all in the append method to keep it brief. Also you’d want to reserve enough space for the extend up-front if possible, and avoid double-copying the buffer when it’s both shared and too small. But none of these things have a major impact on the bigger picture for our purposes.

OK now for the operators. First, +=, which being an assignment operator takes an inout left-hand side and just extends it with the right-hand side:

func +=<Element, S: SequenceType where S.Generator.Element == Element>
  (inout lhs: MyArray<Element>, rhs: S) {
    lhs.extend(rhs)
}

And finally, +. We can implement this in terms of +=. The operator takes two immutable arrays, and adds them together to produce a new array. It does this by relying on the copy-on-write behaviour to create a mutable copy of the left-hand side, then extend it with the right-hand side:

func +<Element, S: SequenceType where S.Generator.Element == Element>
  (lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    var result = lhs
    result += rhs
    return result
}

In fact you could shorten this further by using the var modifier in front of the lhs argument:

func +<Element, S: SequenceType where S.Generator.Element == Element>
  (var lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    lhs += rhs
    return lhs
}

I mention this second version because some suggested a better reduce solution might involve using var on the accumulating argument. But this would be similar to what is happening here with lhs: all var does is declare your passed-by-value variable to be mutable. It is still a copy – it is not the original variable somehow passed through by reference.

Can + be optimized?

We now have a fully working toy implementation of a copy-on-write array you can append to, and which has a + operator. Which means we can rewrite our reduce version of map with it:

func mapUsingMyReduce<T>(transform: Generator.Element->T) -> MyArray<T> {
    return reduce([]) { $0 + [transform($1)] }
}

func mapUsingMyFor<T>(transform: Generator.Element->T) -> MyArray<T> {
    var result = MyArray<T>()
    for x in self { result.append(transform(x)) }
    return result
}

and if you chart the performance, you’ll see both exhibiting the similar behaviour as with array.

So, given we now have an implementation we have complete control over, can we change + so it doesn’t make a copy? I don’t think so.

In a simpler case, could we change this:

var a = MyArray<Int>()
a.extend(0..<3)
let b = a + [6,7,8]

so that it didn’t make a copy? It seems pretty obvious we can’t. b has to be a new copy of the array, in order to not affect a. Even if we don’t make any further changes to a after the creation of b, there’s no way the implementation of + could know this. Maybe the compiler could know this, and optimize accordingly, but the + func can’t.

Checking for unique references wouldn’t help here. a is still in existence, so the lhs variable will not be the only owner of the buffer.

reduce is no different. Here’s a possible implementation:

extension SequenceType {
    func myReduce<T>(initial: T, combine: (T,Generator.Element)->T) -> T {
        var result = initial
        for x in self {
            result = combine(result,x)
        }
        return result
    }
}

Assuming combine here is { $0 + [transform($1)] }, you can see that + similarly has no knowledge of the fact that we’re actually going to assign the outcome directly to the result variable. We know, on inspecting the code, that it’ll just be fine to add the right-hand side onto the left-hand side, if that were even possible (in theory it is, since even though the array is passed immutably by value, the buffer is a class and so could be mutated since it has reference semantics). But + can’t know that from where it sits. It definitely knows it’s copy of the left-hand side isn’t the only owner of the buffer. There is another owner too: reduce holds a copy of result – and is about to throw it away and replace it with a new result, but that’s coming after + has run.

One ray of hope is if arrays were also their own slices (which they aren’t – there is ArraySlice instead, which has extra overhead to track the start and end slice into the parent array). If they were, then perhaps they could be modified to allow one, but only one, array to have an append happen to it which the others could ignore. But this would probably add overhead to arrays in general, and the whole point of arrays is to be fast – you don’t want to slow them down just to cater to this use case.

Perhaps there is a very clever way of figuring all this out, with or without the compiler’s help. But such gymnastics don’t seem like a good idea even then. The semantics are that + creates a new array. Wanting it to secretly modify an existing one under very specific circumstances doesn’t seem like the right solution – mutating the array is. If you prefer, you can wrap that var up in a nice little generic method and then pretend it’s not there. But it’ll make your code faster.

Others suggested you shouldn’t care about this sort of thing until a profiler tells you that you need to (I think you definitely should care while you write your code – saying “I’ll only care when the profiler tells me there’s a problem” feels a bit like “I’ll only write correct code when the unit tests tell me it isn’t correct”). ↩
I also think the addition of features like enum, as well as the flexibility of choosing between object or function solution, and the “safe until you ask not to be” would make Swift a really great CS teaching language. Perhaps a future edition of this book could be in Swift. ↩

5 thoughts on “Arrays, Linked Lists and Performance”

Michael Tsai - Blog - Swift Array Performance says:

August 3, 2015 at 1:50 pm

[…] (2015-08-03): Airspeed Velocity […]

Yet Another Root of All Evil | Big O Note-Taking says:

August 9, 2015 at 7:45 pm

[…] Aww! I had thought the recursive lists were really cool! Apparently not. Apparently they’re cripplingly slow. (Airspeed Velocity also did a great post on this) […]

Generic Collections, SubSequences and Overloading | Airspeed Velocity says:

January 3, 2016 at 8:18 pm

[…] they involve incrementing the index one by one. Now it could be worse – at least it’s not accidentally quadratic, but other examples could easily be (imagine implementing a sort that assumed constant-time advance […]

Swift化零为整：Reduce 详解 - IT大道 says:

March 11, 2016 at 7:20 am

[…] Reddit 网站的搜索结果指出，从 reduce 的语义上来说，传入闭包的参数（如果可变的话，即 mutated），会对底层序列的每个元素都产生一份 copy 。在我们的案例中，这意味着 accumulator 参数 ac 将为 0…100000 范围内的每个元素都执行一次复制操作。有关对此更好、更详细的解释请看这篇 Airspeedvelocity 博客文章。 […]

Collection Data Structures in Swift says:

May 5, 2016 at 1:01 pm

[…] AirspeedVelocity, a blog looking into some of the internals of Swift, has written a blog post titled “Arrays, Linked Lists, and Performance”. It closely follows implementing a stable merge sort. […]

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

Airspeed Velocity

African or European Swift?

Arrays, Linked Lists and Performance

ManagedBuffer

Can + be optimized?

5 thoughts on “Arrays, Linked Lists and Performance”

Leave a comment Cancel reply

ManagedBuffer

Can + be optimized?

Share this:

5 thoughts on “Arrays, Linked Lists and Performance”

Leave a comment Cancel reply