edit: subsequent to this article being written, the Swift standard library has been updated, and documentation-comments above the relevant methods of RangeReplaceableCollectionType
now explicitly state: “Invalidates all indices with respect to self
.” Bear this in mind as you read on:
What does it mean to implement a protocol? Possibly more than supporting methods with specific names.
Obviously the methods ought to do what their names imply – isEmpty
shouldn’t fire the torpedoes. But as well as basic functionality, there are also guarantees about things like the method’s complexity, and possibly wider implications for how the class itself behaves.
More importantly, protocols might not guarantee a behaviour you think they do and are relying on. Using protocols with generics can sometimes give you the illusion of more guarantees than you actually have.
Suppose you want to write a function remove
that removes entries from a collection in-place – that is, similar to the sort
function, it takes a collection as an inout
parameter, and removes from it all the entries that match a given value. 1
Unlike sort
, which just requires MutableCollection
, remove
would need to use the new-in-beta6 RangeReplaceableCollectionType
, which includes a removeAtIndex
method. Armed with this, you might write the following: 2
func remove <C: RangeReplaceableCollectionType, E: Equatable where C.Generator.Element == E> (inout collection: C, value: E) { var idx = collection.startIndex while idx != collection.endIndex { if collection[idx] == value { collection.removeAtIndex(idx) } else { ++idx } } }
Embedded in this code are a lot of assumptions about how collections and their indices behave when you mutate them, and some of these might not be valid, depending on the collection type. It works with String
, the only explicitly range-replaceable collection in the standard library currently, as well as the secretly range-replaceable Array
, ContiguousArray
and Slice
. But you could easily implement a new type of collection that was range-replaceable for which the above code would explode in flames.
The biggest assumption is that removing an element from a collection does not completely invalidate an index. That is, after you call collection.removeAtIndex(idx)
, idx
remains a legitimate index into the collection rather than just becoming junk.
Next, there’s the assumption that when you remove a entry at an index, that index will now point to the next element in the collection. That’s why, after removing the entry, the code above just goes straight back around the loop without incrementing idx
. You could put it another way – when you remove an element, the next element “moves down” to the position indexed by the removed element.
Finally, there’s the assumption that if the element that idx
points to is the last element, then what idx
will point to after you remove it will be endIndex
. Or, to put it another way, as you remove the last element, endIndex
“moves down” to equal the index of the last element.
By the way, this last assumption is why the code uses while idx != collection.endIndex
rather than the more-elegant for idx in indices(collection)
. indices
returns a Range
object between the collection’s start and end, but it would be created before we start looping. Because endIndex
is a moving target as we remove some entries from the collection, it won’t work for our purposes. A cleverer version of indices
that returned something more dynamic might help, but that could have different undesirable side-effects.
Are all these assumptions legit? Well you can see they obviously are for the Array
types, because these just use an integer for their index, with endIndex
equal to count
. When elements are removed, the rest of the array shuffles down. Even if that resulted in a full reallocation of the array’s memory, the assumptions would still hold because all the index does is represent a relative offset from the start of the array, not point to a specific location.
Strings are trickier, because their index type is opaque. Chances are it’s still an offset from the start of the string, but because Swift characters aren’t of uniform width, 3 that offset doesn’t necessarily increment by a constant amount each time. Still, if this is the implementation, the above assumptions would hold, and experimentation suggests they do.
What kind of collection might not adhere to these assumptions? Well, a very simple doubly-linked list implemention might not. 4 If the index for a linked list were a pointer to each node, then removing that node could leave the index pointing at a removed entry. You couldn’t just loop around without first pointing to the next node in the list:
func remove <C: RangeReplaceableCollectionType, E: Equatable where C.Generator.Element == E> (inout collection: C, element: E) { var idx = collection.startIndex while idx != collection.endIndex { if collection[idx] == element { // first grab the index of the next element let next = idx.successor() // then remove this one collection.removeAtIndex(idx) // and repoint idx = next } else { ++idx } } }
But then this algorithm would no longer work correctly with arrays and strings! 5
So what’s the solution – should RangeReplaceableCollectionType
mandate the kind of index validity behaviour our remove
algorithm relies on? Or are the assumptions invalid and we need a better algorithm? (of which more in a later article) The Swift standard library is still evolving rapidly with each beta so it’s possibly a little early to tell. For now, be careful about the assumptions you make – just because all the current implementations of a particular protocol work a certain way doesn’t mean other implementations will.
-
As opposed to a version that returned a copy, which would be called
removed
. I thought I didn’t like this convention at first, but I’m warming to it. ↩ - This code has an efficiency deficiency, which we’ll talk about in a later article. ↩
- For an in-depth explanation of Swift strings, see Ole Begemann’s article. ↩
-
Singly-linked lists couldn’t implement
removeAtIndex
easily, and would probably have some kind ofremoveAfterIndex
operation instead. ↩ -
The C++ STL resolves this by having
erase
return an iterator for the entry just after the erased elements – along with a fairly draconian assertion that any removal invalidates all other iterators (not just ones at and beyond the removed value). But Swift’sremoveAtIndex
currently returns the removed value, rather than a new index. ↩
[…] Airspeed Velocity: […]
[…] so is… removeAtIndex. In a previous article, I mentioned the following code to remove all occurrences of a given value from an array has an […]