Protocols and Assumptions

edit: subsequent to this article being written, the Swift standard library has been updated, and documentation-comments above the relevant methods of RangeReplaceableCollectionType now explicitly state: “Invalidates all indices with respect to self.” Bear this in mind as you read on:

What does it mean to implement a protocol? Possibly more than supporting methods with specific names.

Obviously the methods ought to do what their names imply – isEmpty shouldn’t fire the torpedoes. But as well as basic functionality, there are also guarantees about things like the method’s complexity, and possibly wider implications for how the class itself behaves.

More importantly, protocols might not guarantee a behaviour you think they do and are relying on. Using protocols with generics can sometimes give you the illusion of more guarantees than you actually have.

Suppose you want to write a function remove that removes entries from a collection in-place – that is, similar to the sort function, it takes a collection as an inout parameter, and removes from it all the entries that match a given value. ¹

Unlike sort, which just requires MutableCollection, remove would need to use the new-in-beta6 RangeReplaceableCollectionType, which includes a removeAtIndex method. Armed with this, you might write the following: ²

func remove
    <C: RangeReplaceableCollectionType,
     E: Equatable
     where C.Generator.Element == E>
    (inout collection: C, value: E) {
        var idx = collection.startIndex
        while idx != collection.endIndex {
            if collection[idx] == value {
                collection.removeAtIndex(idx)
            }
            else {
                ++idx
            }
        }
}

Embedded in this code are a lot of assumptions about how collections and their indices behave when you mutate them, and some of these might not be valid, depending on the collection type. It works with String, the only explicitly range-replaceable collection in the standard library currently, as well as the secretly range-replaceable Array, ContiguousArray and Slice. But you could easily implement a new type of collection that was range-replaceable for which the above code would explode in flames.

The biggest assumption is that removing an element from a collection does not completely invalidate an index. That is, after you call collection.removeAtIndex(idx), idx remains a legitimate index into the collection rather than just becoming junk.

Next, there’s the assumption that when you remove a entry at an index, that index will now point to the next element in the collection. That’s why, after removing the entry, the code above just goes straight back around the loop without incrementing idx. You could put it another way – when you remove an element, the next element “moves down” to the position indexed by the removed element.

Finally, there’s the assumption that if the element that idx points to is the last element, then what idx will point to after you remove it will be endIndex. Or, to put it another way, as you remove the last element, endIndex “moves down” to equal the index of the last element.

By the way, this last assumption is why the code uses while idx != collection.endIndex rather than the more-elegant for idx in indices(collection). indices returns a Range object between the collection’s start and end, but it would be created before we start looping. Because endIndex is a moving target as we remove some entries from the collection, it won’t work for our purposes. A cleverer version of indices that returned something more dynamic might help, but that could have different undesirable side-effects.

Are all these assumptions legit? Well you can see they obviously are for the Array types, because these just use an integer for their index, with endIndex equal to count. When elements are removed, the rest of the array shuffles down. Even if that resulted in a full reallocation of the array’s memory, the assumptions would still hold because all the index does is represent a relative offset from the start of the array, not point to a specific location.

Strings are trickier, because their index type is opaque. Chances are it’s still an offset from the start of the string, but because Swift characters aren’t of uniform width, ³ that offset doesn’t necessarily increment by a constant amount each time. Still, if this is the implementation, the above assumptions would hold, and experimentation suggests they do.

What kind of collection might not adhere to these assumptions? Well, a very simple doubly-linked list implemention might not. ⁴ If the index for a linked list were a pointer to each node, then removing that node could leave the index pointing at a removed entry. You couldn’t just loop around without first pointing to the next node in the list:

func remove
    <C: RangeReplaceableCollectionType,
     E: Equatable
     where C.Generator.Element == E>
    (inout collection: C, element: E) {
        var idx = collection.startIndex
        while idx != collection.endIndex {
            if collection[idx] == element {
              // first grab the index of the next element
              let next = idx.successor()
              // then remove this one
              collection.removeAtIndex(idx)
              // and repoint
              idx = next
            }
            else {
                ++idx
            }
        }
}

But then this algorithm would no longer work correctly with arrays and strings! ⁵

So what’s the solution – should RangeReplaceableCollectionType mandate the kind of index validity behaviour our remove algorithm relies on? Or are the assumptions invalid and we need a better algorithm? (of which more in a later article) The Swift standard library is still evolving rapidly with each beta so it’s possibly a little early to tell. For now, be careful about the assumptions you make – just because all the current implementations of a particular protocol work a certain way doesn’t mean other implementations will.

As opposed to a version that returned a copy, which would be called removed. I thought I didn’t like this convention at first, but I’m warming to it. ↩
This code has an efficiency deficiency, which we’ll talk about in a later article. ↩
For an in-depth explanation of Swift strings, see Ole Begemann’s article. ↩
Singly-linked lists couldn’t implement removeAtIndex easily, and would probably have some kind of removeAfterIndex operation instead. ↩
The C++ STL resolves this by having erase return an iterator for the entry just after the erased elements – along with a fairly draconian assertion that any removal invalidates all other iterators (not just ones at and beyond the removed value). But Swift’s removeAtIndex currently returns the removed value, rather than a new index. ↩

2 thoughts on “Protocols and Assumptions”

Michael Tsai - Blog - Protocols and Assumptions says:

September 2, 2014 at 8:54 pm

[…] Airspeed Velocity: […]

Accidentally putting a loop in your loop | Airspeed Velocity says:

September 14, 2014 at 7:42 pm

[…] so is… removeAtIndex. In a previous article, I mentioned the following code to remove all occurrences of a given value from an array has an […]

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

Airspeed Velocity

African or European Swift?

Protocols and Assumptions

2 thoughts on “Protocols and Assumptions”

Leave a comment Cancel reply

Share this:

2 thoughts on “Protocols and Assumptions”

Leave a comment Cancel reply