Collection Indices, Slices, and Generics

I missed posting about the last couple of releases of Swift before it was open sourced, but now that I have some time to write a couple of posts over the holidays, here’s a one about this sentence in the release notes for 2.0 beta 5:

For consistency and better composition of generic code, ArraySlice indices are no longer always zero-based but map directly onto the indices of the collection they are slicing and maintain that mapping even after mutations.

Why is this interesting enough to write a whole post over? It has to do with this bit of code that you see written from time to time:

// iterate over the indices in a collection:
for idx in 0..<someCollection.count {
  // use someCollection[idx] in some way
  // that can't be done via for x in someCollection
}

What’s wrong with that? Well, if someCollection is an Array, nothing. Hmm, maybe readability. All collections have an indices property that does the same thing. If you think

for idx in 0..<someCollection.count

is just as readable as

for idx in someCollection.indices

then I put it to you that all these C derivatives have addled your brain. If you remain unconvinced, the rest of this article is probably going to be a bit like this cartoon.

(if you find yourself wanting all but the last index, or all but the first, remember Range is a collection, so you can write things like someCollection.indices.dropLast())

Anyway, readability aside, the problems come when you try and make your loop generic. For example, take everyone’s favorite way to start an algorithms text book, a binary search. Here’s a version for arrays:

extension Array
  // this seems to be the indentation style
  // found in the open-sourced the Swift code...
  where  
  Generator.Element: Comparable {
    /// Returns an index of an element in `self` which 
    /// is equal to `element`, or `nil` if such value
    /// is not found.
    ///
    /// - Requires: elements of `self` are ordered by <
    ///
    /// - Complexity: O(log(`self.count`)).
    public func binarySearch(
      element: Generator.Element
    ) -> Int? {
        var left = 0
        var right = count - 1

        while left <= right {
            let mid = (left + right) / 2
            let candidate = self[mid]

            if candidate < element {
                left = mid + 1
            }
            else if element < candidate {
                right = mid - 1
            }
            else {
                return mid
            }
        }
        // Not found
        return nil
    }
}

(you'd probably want to implement a version that takes an isOrderedBefore predicate, then implement this one in terms of that one, but let's ignore that for now)

Binary searches are notoriously hard to get right, and this one contains at least one bug – that I've left in deliberately to talk about later – as well as the ones I added accidentally that you can let me know about.

A first trainer-wheels attempt to make this generic might be to switch the extension to be of CollectionType, and constrain the index to be of type Int:

extension CollectionType 
  where 
  Index == Int, 
  Generator.Element: Comparable {
    // same as before
}

This seems like it's working at first. You can now use it on ContiguousArray, UnsafeBufferPointer and, up until it changed, ArraySlice. It won't work with random-access-indexed collections that don't use integer indices, but there aren't many of those around. But it makes a big assumption that the collection is zero based. As I mentioned in a previous changes post when collections all became sliceable, this became a very shakey assumption, and now it doesn't even apply to array slices. It's also not something the type system will help you find – you'd discover it at runtime when your out-of-bounds error traps.

You could fix it while keeping the constraint to integer indices, but at this point you may as well rip the band-aid off and just make it fully generic. This also helps you spot all the places where you need to fix your zero-based assumption. It starts easy enough…

extension CollectionType 
  where
  Index: RandomAccessIndexType, 
  Generator.Element: Comparable {
    public func binarySearch(
      element: Generator.Element
    ) -> Index? {
        // instead of 0 for the start:
        var left: Index = startIndex
        // and instead of count - 1 for the end:
        var right: Index = endIndex.predecessor()

but then we hit trouble:

// error: binary operator '+' cannot be
// applied to two 'Self.Index' operands
let mid = (left + right) / 2

You can't just add two indices together. Think about non-contiguous storage like a node-based container like a linked list or a tree. You can't just add together two node pointers. Or for a random-access example, what about ReverseRandomAccessCollection, which is a wrapper around a random-access collection, and has an opaque index type. How could you add two of those together?

Really there are two concepts here – indices, and distances between indices. You can always count, as an integer, the number of times you need to call successor() on an index to get to a later index. This is the distance between them. You can do mathematical operations on distances – you can go twice as far from an index, or advance as far from one index as another is from the start. All indices have this ability, but bi-directional indices can have negative distances (the number of times you'd need to call predecessor()), and random-access indices can move by a distance in constant time.

When we had an integer index on a zero-based collection, a handy trick we were relying on is that any index is also the distance of that index from the start. So when we wrote let mid = (left + right) / 2, really this was saying that the mid-point was the average of the distance from the start of the two indices. But now we want our algorithm to work not just on zero-based integer indices but kind of random-access index (random-access because if you can't move around in constant time, you may as well do a linear search).

You could rewrite that generically as:

let mid = startIndex.advancedBy(
           (  startIndex.distanceTo(left) 
            + startIndex.distanceTo(right)
          ) / 2)

but this would be a bit silly. Instead, you can just take the distance from the left to the right, halve that, and add it to the start. We can add this ability to Range, to give us a mid-point property (and which also fixes a bug, if you followed the earlier link):

extension Range {
    var mid: Element {
        return startIndex.advancedBy(
          startIndex.distanceTo(endIndex) / 2
        )
    }
}

which we can use in a fully generic binary search:

extension CollectionType 
  where 
  Index: RandomAccessIndexType {
    public func binarySearch(
      element: Generator.Element, 
      isOrderedBefore: (Generator.Element,Generator.Element)->Bool
    ) -> Index? {
        var searchRegion = self.indices

        while !searchRegion.isEmpty {
            let mid = searchRegion.mid
            let candidate = self[mid]

            if isOrderedBefore(candidate,element) {
                searchRegion.startIndex = mid + 1
            }
            else if isOrderedBefore(element,candidate) {
                searchRegion.endIndex = mid
            }
            else {
                return mid
            }
        }
        return nil
    }
}

extension CollectionType 
where Index: RandomAccessIndexType, 
      Generator.Element: Comparable 
{
    public func binarySearch(
      element: Generator.Element
    ) -> Index? {
        return binarySearch(element, isOrderedBefore: <)
    }
}

Note, we can still write searchRegion.startIndex = mid + 1 because RandomAccessIndexType conforms to Strideable, which has this definition of +

public func +<T : Strideable>(lhs: T, rhs: T.Stride) -> T

that allows you to add a distance (which is an indices' stride) to an index. There's a matching definition with the lhs and rhs flipped so you can write it the other way around.

This version makes a couple of other changes. It combines the left and right indices into a Range, searchRegion. I think this helps readability a bit – !searchRegion.isEmpty is a bit nicer than left<=right – and also allows us to use our mid extension. But more importantly, this rewrite ditches the other trick we were relying on when the index was an integer – you can go one past the end or ahead of the start without any consequences. When your index is an integer, who cares if you decrement it from 0 to -1. But some indices don't take kindly to going past their limit (try calling &quot;hello&quot;.endIndex.successor()). So to make the algorithm properly generic, it needed a rewrite to avoid this being a possibility.

Talking of readability, 2.1b5 added some additional slicing methods, so instead of:

a[a.startIndex...idx]
a[a.startIndex..<idx]
a[idx..<a.endIndex]

you can now write:

a.prefixThrough(idx)
a.prefixUpTo(idx)
a.suffixFrom(idx)

It's kinda fun, though probably a bit cute to be used in production code, to write Python-style one-sided subscript operations for these as well:

postfix operator ..< { }
prefix operator ..< { }
prefix operator ... { }

struct RangeStart<I: ForwardIndexType> { let start: I }
struct RangeToEnd<I: ForwardIndexType> { let end: I }
struct RangeThroughEnd<I: ForwardIndexType> { let end: I }

postfix func ..< <I: ForwardIndexType>(lhs: I) -> RangeStart<I>
{ return RangeStart(start: lhs) }

prefix func ..< <I: ForwardIndexType>(rhs: I) -> RangeToEnd<I>
{ return RangeToEnd(end: rhs) }

prefix func ... <I: ForwardIndexType>(rhs: I) -> RangeThroughEnd<I>
{ return RangeThroughEnd(end: rhs) }

extension CollectionType {
    subscript(r: RangeStart<Self.Index>) -> SubSequence {
        return self.suffixFrom(r.start)
    }
    subscript(r: RangeToEnd<Self.Index>) -> SubSequence {
        return self.prefixUpTo(r.end)
    }
    subscript(r: RangeThroughEnd<Self.Index>) -> SubSequence {
        return self.prefixThrough(r.end)
    }
}

which then means you can write:

let a = [1,2,3,4]
a[1..<] // [2,3,4]
a[..<2] // [1,2]
a[...2] // [1,2,3]

You could even define pattern-matching versions, for use in a switch statement:

infix operator ~= {
    associativity none
    precedence 130
}

func ~=<I: ForwardIndexType where I : Comparable>(pattern: RangeStart<I>, value: I) -> Bool {
    return value > pattern.start
}

func ~=<I: ForwardIndexType where I : Comparable>(pattern: RangeToEnd<I>, value: I) -> Bool {
    return value < pattern.end
}

func ~=<I: ForwardIndexType where I : Comparable>(pattern: RangeThroughEnd<I>, value: I) -> Bool {
    return value <= pattern.end
}

so you can write:

for i in [9, 10, 11] {
    switch i {
    case ..<10: print("below 10")
    case ...10: print("10 or below")
    case 2..<: print("2 or more")
    default: fatalError()
    }
}

If you're not familiar with ~=, check out Ole's series of posts about pattern matching, it's pretty powerful.

2 thoughts on “Collection Indices, Slices, and Generics

Leave a reply to Generic Collections, SubSequences and Overloading | Airspeed Velocity Cancel reply