Accidentally putting a loop in your loop

Joel Spolsky wrote a good article back in 2001 about knowing when the function you’re calling has linear complexity,¹ and not accidentally putting a loop in your loop,² getting quadratic complexity without realizing it. He was talking about C string algorithms like strlen and strcat, but it applies just as much to several Swift standard library functions.

When a function only has versions that take a sequence, like the seductively useful contains, equal, minElement and maxElement, it’s linear.³ Some depend on their input: countElements can be done in constant time if you give it a collection with a random-access index, but is linear if not. The same goes for the distance and advance index operations, which is why extending String to support random-access indexing using them is probably a bad idea. Be careful: not everything that could be optimized has been. For example, ClosedInterval.contains is presumably O(1), but there doesn’t appear to be an optimized overload for the non-member contains, which only has versions that take a sequence.
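To make the trap concrete, here is a minimal sketch. It is written in current Swift syntax, which is an assumption on my part (the code in this article targets Swift 1.x), and the names hasDuplicatesQuadratic and hasDuplicatesLinear are hypothetical. The first version calls the linear contains inside a loop, so it is quadratic in the worst case; the second trades some memory for a Set, whose lookups take constant time on average.

```swift
// Quadratic in the worst case: `contains` walks `seen`,
// and it is called once per element of `xs`.
func hasDuplicatesQuadratic<T: Equatable>(_ xs: [T]) -> Bool {
    var seen: [T] = []
    for x in xs {
        if seen.contains(x) { return true }   // the hidden inner loop
        seen.append(x)
    }
    return false
}

// Linear on average: `Set.insert` reports whether the element
// was newly inserted, so no separate search is needed.
func hasDuplicatesLinear<T: Hashable>(_ xs: [T]) -> Bool {
    var seen = Set<T>()
    for x in xs {
        if !seen.insert(x).inserted { return true }
    }
    return false
}
```

Both give the same answers; only the cost differs, and the difference only shows up once the input gets large.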

The Swift team have helpfully put the complexity of certain functions in the documentation. Array’s reserveCapacity, insert, and replaceRange are O(N).

And so is… removeAtIndex. In a previous article, I mentioned that the following code, which removes all occurrences of a given value from an array, has an efficiency problem:

func remove
    <C: RangeReplaceableCollectionType,
     E: Equatable
     where C.Generator.Element == E>
    (inout collection: C, value: E) {
        var idx = collection.startIndex
        while idx != collection.endIndex {
            if collection[idx] == value {
                collection.removeAtIndex(idx)
            }
            else {
                ++idx
            }
        }
}

The documentation for removeAtIndex says: ⁴

Remove and return the element at the given index. Worst case complexity:
O(N). Requires: index < count

That is, the worst-case time it takes to remove an element increases in linear proportion to the length of the collection. That's not surprising – if you remove an element of an array, you have to shuffle each element after it down one. The larger the collection the more things to shuffle. Maybe your element is near the end, maybe your collection is a linked list that can remove elements in O(1), maybe it’s magic and has all sorts of cool optimizations – hence O(N) is the worst case. But still, it’s probably not best to call it within another loop.

Here, it shouldn't be necessary. We're already iterating over the collection, so we ought to be able to combine the deletion and the shuffling down into one operation. (Incidentally, this also removes the need for questionable assumptions about how indexes behave when you remove elements, the subject of the previous article.)

Here's a new version of remove that does this, modelled on the C++ STL equivalent:

func remove
    <C: protocol<RangeReplaceableCollectionType,
                 MutableCollectionType>,
     E: Equatable
     where C.Generator.Element == E>
    (inout col: C, value: E) {
        // find the first entry to remove
        if var advance = find(col, value) {
            // advance points to next element to test,
            // rear points to where to copy it to
            // if it's a keeper
            var rear = advance++
            while advance != col.endIndex {
                if col[advance] != value {
                    col[rear] = col[advance]
                    ++rear
                }
                ++advance
            }
            col.removeRange(rear..<col.endIndex)
        }
}

This version breaks the removal into multiple steps. First, it uses find to locate the first entry to remove (and if it doesn’t find one, does nothing more). Next, one by one it moves the subsequent elements down on top of that entry. When it encounters more entries to remove, it skips over them (i.e. it increments advance, the index of entries to examine and maybe copy, but not rear, the index of where to copy to, and they aren’t copied).

Finally, when all this is done, the collection should have all the non-removed entries at the front, and some meaningless garbage at the end. This end section is as long as the number of removed entries (though it doesn’t contain the removed entries – entries were copied, not swapped). It is trimmed off by a call to removeRange, which is also O(N). But that’s fine – two O(N) algorithms in series is still O(N).
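For readers on a newer toolchain, the same two-index algorithm might look like the sketch below. This is my translation into current Swift, not code from the standard library: the protocol names RangeReplaceableCollection and MutableCollection and the methods firstIndex(of:), formIndex(after:) and removeSubrange are the modern counterparts of the Swift 1.x names used above, and removeInPlace is a hypothetical name.

```swift
func removeInPlace<C>(_ value: C.Element, from col: inout C)
    where C: RangeReplaceableCollection & MutableCollection,
          C.Element: Equatable
{
    // Find the first entry to remove; if there is none, do nothing.
    guard var advance = col.firstIndex(of: value) else { return }

    // `rear` is where the next keeper gets copied to,
    // `advance` is the next element to examine.
    var rear = advance
    col.formIndex(after: &advance)

    while advance != col.endIndex {
        if col[advance] != value {
            col[rear] = col[advance]      // copy the keeper down
            col.formIndex(after: &rear)
        }
        col.formIndex(after: &advance)
    }

    // Trim the leftover tail in a single O(N) call.
    col.removeSubrange(rear..<col.endIndex)
}

var numbers = [1, 2, 3, 2, 4, 2]
removeInPlace(2, from: &numbers)   // numbers is now [1, 3, 4]
```

Each element is examined once and copied at most once, so the loop is linear provided subscript assignment is constant time.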

By the way, in this last step it differs from the C++ STL version which, in a quality bit of user-unfriendliness, leaves the collection with the garbage still at the end and requires the caller to do the final remove step. There are good reasons for this (because iterators), but it’s a nasty gotcha for newbies. I’d chalk this one up as a win for Swift’s approach to generic collection algorithms. ⁵

Note that for this to work, the collection also needs to support the MutableCollectionType protocol, because that’s where the assignable version of subscript lives for some reason.⁶ In fact, that’s all MutableCollectionType adds. By the way, this whole optimization is based on the assumption that the assignment version of subscript runs in constant time. It should do, right? Copying a value into a position in an array shouldn’t affect the rest of the array, so it shouldn’t matter how long it is.

A quick test with a reasonably large array shows that this new version of remove does indeed run quicker (and more consistently) than our first version. Yay.

Then you try and use it on a string, and your celebration is short lived. String doesn’t implement MutableCollectionType. Huh, what’s that about?

What this doesn’t mean is that String is immutable. It’s totally mutable. Mutating is what RangeReplaceableCollectionType is all about. This is a chance to make an important if maybe obvious point: just because your function takes an object via a protocol that doesn’t allow mutation doesn’t mean that object is immutable. It just means you can’t mutate it in your function.
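A small sketch of the distinction, again in current Swift (an assumption; there the protocols are called RangeReplaceableCollection and MutableCollection, and the method is replaceSubrange). The variable name greeting is just for illustration.

```swift
var greeting = "hello"

// String is range-replaceable, so swapping out a range of
// characters for new ones is allowed.
let first = greeting.startIndex
greeting.replaceSubrange(first..<greeting.index(after: first), with: "j")

// String is not a MutableCollection, so it has no subscript setter:
// greeting[greeting.startIndex] = "j"   // does not compile

print(greeting)   // prints "jello"
```

The string was mutated in place; what is missing is only the one specific operation that MutableCollection promises.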

My guess for why strings don’t support MutableCollectionType? Because that assumption above, about subscript assignment being O(1), doesn’t hold. Remember, individual elements of Swift strings are of variable length. What if you replaced a longer character with a shorter one? We’d be back to square one, having to shovel all the subsequent characters down to fill in the gap. Worse, what if you replaced a shorter entry with a longer one? The string would get bigger, maybe even need relocating to a newly allocated chunk of memory.
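The variable-length point is easy to check. The sketch below uses current Swift and counts UTF-8 code units as a stand-in for "size"; how String actually stores its contents is an implementation detail, so treat the byte counts as an illustration rather than a statement about the in-memory layout.

```swift
let plain = "a"                    // U+0061
let accented = "\u{E9}"            // é, U+00E9
let flag = "\u{1F1EC}\u{1F1E7}"    // 🇬🇧, two regional-indicator scalars

// Each string holds exactly one Character...
print(plain.count, accented.count, flag.count)
// prints "1 1 1"

// ...but each needs a different number of UTF-8 code units.
print(plain.utf8.count, accented.utf8.count, flag.utf8.count)
// prints "1 2 8"
```

Overwriting the flag with a plain "a" would leave a seven-unit hole to close up, which is exactly the shuffling that subscript assignment is supposed to avoid.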

This would be another example of signalling more from protocols than just what functions are supported. They can tell you about fundamental properties of the object. Perhaps MutableCollectionType is like MutableInConstantTimeCollectionType. Course, on the other hand, I could be reading waaay too much into String not implementing it. It could just be an oversight – only time (or one of the Swift devs) will tell.

String does support replaceRange as an alternative to subscript assign. But its complexity is, you guessed it, O(N). And after crowbarring it into the second algorithm above, tests suggest it’s no faster than the removeAtIndex version.⁷ So if you really have a burning desire to remove some characters from a huge string, maybe find another way. Perhaps you can trade some space for that time.
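One way to make that trade, sketched in current Swift (an assumption; the function name removing is hypothetical): build a fresh collection containing only the keepers, then use it in place of the original. It needs room for a second copy, but it relies only on append, so it does not matter whether subscript assignment is cheap.

```swift
func removing<C: RangeReplaceableCollection>(_ value: C.Element, from col: C) -> C
    where C.Element: Equatable
{
    var result = C()
    // One pass over the input; append is amortized constant time
    // for String and Array.
    for element in col where element != value {
        result.append(element)
    }
    return result
}

let source = "tattle tale"
let stripped = removing("t", from: source)
print(stripped)   // prints "ale ale"
```

Current versions of the standard library also provide removeAll(where:) on range-replaceable collections, which is worth reaching for before writing your own.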

¹ That Wikipedia entry on complexity, like so many Wikipedia entries on mathematical topics, is pretty beginner-unfriendly. If anyone has a good beginner’s guide link I could replace it with, let me know. An (admittedly cursory) Google search doesn’t turn much up.
² I’d have called this post “Yo, dawg” but then I’ve already done that once.
³ Unless it does something like return the first element in the sequence, obvs. Hmmm, maybe I should be less paranoid about people finding errors in my posts.
⁴ Actually it doesn’t, not on the protocol version anyway. The stand-alone function version (that takes that same protocol) does though, so I’m taking the liberty of giving that instead.
⁵ It’s not all wins, mind. The STL’s iterator-based functions are much easier to use with subranges of collections. Don’t get me started on slices…
⁶ Wait for it…
⁷ Though this is possibly due to my lack of deftness with a crowbar.

7 thoughts on “Accidentally putting a loop in your loop”

Wes Campaigne says:

September 14, 2014 at 10:34 pm

Ideally, we’d have an ExtensibleCollectionType-specific version of `filter` that returns a collection of the same type, instead of an Array. That said, if we don’t mind a temp array, we can still do that way:

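func remove
    <C: ExtensibleCollectionType, E: Equatable
     where C.Generator.Element == E>
    (inout col: C, value: E) {
        // build a new collection from the elements to keep
        var newcol = C()
        newcol.extend(filter(col) { $0 != value })
        // then replace the original wholesale
        col = newcol
}

var s = "tattle tale"

remove(&s, "t") // s == "ale ale"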

It’s O(n), regardless of C’s behaviour, but requires two passes. (It’s maybe also worth a reminder here that you can assign entirely new values to `inout` parameters; you’re not limited to mutating the existing one.)

- airspeedvelocity says:
  
  September 16, 2014 at 4:55 pm
  
  Totally agree it’d be good if filter had an ExtensibleCollectionType version, as I talked about here: https://airspeedvelocity.net/2014/08/22/filter-string-and-extensiblecollectiontype/
  
Maarten says:

September 15, 2014 at 7:16 am

It’s a good article, and I don’t see anything wrong with it as such.
However, my first thought if I want to remove any particular element from a sequence would be to reach for the filter function.
Generally that is O(n) in performance.
Only if it is really important to do the work in-place would I start to look at the techniques described in this article.

- airspeedvelocity says:
  
  September 16, 2014 at 4:56 pm
  
  Yup totally agree if you can afford the space you can do it in O(N) (hence the last line of my article 🙂)
  
  Though as Wes points out (and I mention in https://airspeedvelocity.net/2014/08/22/filter-string-and-extensiblecollectiontype/) it’d be nice to have a version of filter that returned the same type as passed in.
  
Michael M. Mayer says:

September 16, 2014 at 4:44 pm

I am really enjoying all your articles. Was there an incomplete thought expressed here: “(because iterators)”?
I did a google search of the term “Algorithmic Complexity” and found a number of good articles from which to choose.
Please keep writing. It is appreciated.

- airspeedvelocity says:
  
  September 16, 2014 at 5:18 pm
  
  Hi Michael – thanks for the feedback, glad to hear it.
  
  The “because iterators” comment was my attempt at Internet humor, as in: http://blog.dictionary.com/because-x/
  
  If you’re interested in a fuller explanation of why STL remove doesn’t actually remove anything, see the top answer on this stack overflow post: http://stackoverflow.com/questions/6456870/stl-remove-doesnt-work-as-expected
  
Which func? | Airspeed Velocity says:

September 20, 2014 at 6:20 pm

[…] Ever since Swift 1.0 beta 5, Range has supposed to be only for representing collection index ranges. If you’re not operating on indices, ClosedInterval is probably what you want. It has methods like contains that determine in constant time if an interval contains a value. Range can only use the non-member contains algorithm, which will turn your range into a sequence and iterate over it – don’t accidentally put a loop in your loop! […]

Airspeed Velocity

African or European Swift?
