Joel Spolsky wrote a good article back in 2001 about knowing when the function you’re calling has linear complexity,1 and not accidentally putting a loop in your loop,2 getting quadratic complexity without realizing it. He was talking about C string algorithms like strlen
and strcat
, but it applies just as much to several Swift standard library functions.
When a function only has versions that take a sequence, like the seductively-useful contains
, equal
, min
– and maxElement
, it’s linear.3 Some depend on their input: countElements
can be done in constant time if you give it a collection with a random-access index, but linear if not. Same with distance
and advance
index operations, which is why extending String
to support random-access indexing using them is probably a bad idea. Be careful, not everything that could be optimized may have been. For example, ClosedInterval.contains
is presumably O(1) but there doesn’t appear to be an optimized overload for the non-member contains
, which only has versions that take a sequence.
The Swift team have helpfully put the complexity of certain functions in the documentation. Array’s reserveCapacity
, insert
, and replaceRange
are O(N).
And so is… removeAtIndex
. In a previous article, I mentioned the following code to remove all occurrences of a given value from an array has an efficiency problem:
func remove
<C: RangeReplaceableCollectionType,
E: Equatable
where C.Generator.Element == E>
(inout collection: C, value: E) {
var idx = collection.startIndex
while idx != collection.endIndex {
if collection[idx] == value {
collection.removeAtIndex(idx)
}
else {
++idx
}
}
}
The documentation for removeAtIndex
says: 4
Remove and return the element at the given index. Worst case complexity:
O(N). Requires: index
< count
That is, the worst-case time it takes to remove an element increases in linear proportion to the length of the collection. That's not surprising – if you remove an element of an array, you have to shuffle each element after it down one. The larger the collection the more things to shuffle. Maybe your element is near the end, maybe your collection is a linked list that can remove elements in O(1), maybe it’s magic and has all sorts of cool optimizations – hence O(N) is the worst case. But still, it’s probably not best to call it within another loop.
Here, it shouldn't be necessary. We're already iterating over the collection, so ought to be able to combine the deletion and shuffling down together as one operation. Here's a new version of remove
that does this, modelled on the C++ STL equivalent:
(incidentally, it also removes the need for questionable assumptions about how indexes behave when you remove elements, the subject of the previous article)
func remove
<C: protocol<RangeReplaceableCollectionType,
MutableCollectionType>,
E: Equatable
where C.Generator.Element == E>
(inout col: C, value: E) {
// find the first entry to remove
if var advance = find(col, value) {
// advance points to next element to test,
// rear points to where to copy it to
// if it's a keeper
var rear = advance++
while advance != col.endIndex {
if col[advance] != value {
col[rear] = col[advance]
++rear
}
++advance
}
col.removeRange(rear..<col.endIndex)
}
}
This version breaks the removal into multiple steps. First, it uses find
to locate the first entry to remove (and if it doesn’t find one, does nothing more). Next, one by one it moves the subsequent elements down on top of that entry. When it encounters more entries to remove, it skips over them (i.e. it increments advance
, the index of entries to examine and maybe copy, but not rear
, the index of where to copy to, and they aren’t copied).
Finally, when all this is done, the collection should have all the non-removed entries at the front, and some meaningless garbage at the end. This end section is the length of the number of removed entries (though it doesn‘t contain the removed entries – entries were copied, not swapped). This is trimmed off by a call to removeRange
, which is also O(N). But that’s fine – two O(N) algorithms in series is still O(N).
By the way, in this last step it differs from the C++ STL version which, in a quality bit of user-unfriendliness, requires the caller to do the final remove step, instead leaving the collection with the garbage still at the end. There‘s good reasons for this (because iterators), but it’s a nasty gotcha for newbies. I’d chalk this one up as a win for Swift’s approach to generic collection algorithms. 5
Note that for this to work, the collection also needs to support the MutableCollectionType
protocol, because that’s where the assignable version of subscript
lives for some reason.6 In fact, that’s all MutableCollectionType
adds. By the way, this whole optimization is based on the assumption that the assignment version of subscript
runs in constant time. It should do, right? Copying a value into a position in an array shouldn’t affect the rest of the array, so it shouldn’t matter how long it is.
A quick test with a reasonably large array shows that this new version of remove
does indeed run quicker (and more consistently) than our first version. Yay.
Then you try and use it on a string, and your celebration is short lived. String
doesn’t implement MutableCollectionType
. Huh, what’s that about?
What this doesn‘t mean is that String
is immutable. It’s totally mutable. Mutating is what RangeReplaceableCollectionType
is all about. This is a chance to make an important if maybe obvious point: just because your function takes an object via a protocol that doesn’t allow mutation doesn‘t mean that object is immutable. It just means you can‘t mutate it in your function.
My guess for why strings don‘t support MutableCollectionType
? Because that assumption above, about subscript assignment being O(1), doesn’t hold. Remember, individual elements of Swift strings are of variable length. What if you replaced a longer character with a shorter one? We’d be back to square one, having to shovel all the subsequent characters down to fill in the gap. Worse, what if you replaced a shorter entry with a longer one? The string would get bigger, maybe even need relocating to a newly allocated chunk of memory.
This would be another example of signalling more from protocols than just what functions are supported. They can tell you about fundamental properties of the object. Perhaps MutableCollectionType
is like MutableInConstantTimeCollectionType
. Course, on the other hand, I could be reading waaay too much into String
not implementing it. It could just be an oversight – only time (or one of the Swift devs) will tell.
String
does support replaceRange
as an alternative to subscript assign. But its complexity is, you guessed it, O(N). And after crowbarring it into the second algorithm above, tests suggest it’s no faster than the removeAtIndex
version.7 So if you really have a burning desire to remove some characters from a huge string, maybe find another way. Perhaps you can trade some space for that time.