Collection Indices, Slices, and Generics

I missed posting about the last couple of releases of Swift before it was open sourced, but now that I have some time to write a couple of posts over the holidays, here’s one about this sentence in the release notes for 2.0 beta 5:

For consistency and better composition of generic code, ArraySlice indices are no longer always zero-based but map directly onto the indices of the collection they are slicing and maintain that mapping even after mutations.

Why is this interesting enough to write a whole post over? It has to do with this bit of code that you see written from time to time:

// iterate over the indices in a collection:
for idx in 0..<someCollection.count {
  // use someCollection[idx] in some way
  // that can't be done via for x in someCollection
}

What’s wrong with that? Well, if someCollection is an Array, nothing – except maybe readability. All collections have an indices property that does the same thing. If you think

for idx in 0..<someCollection.count

is just as readable as

for idx in someCollection.indices

then I put it to you that all these C derivatives have addled your brain. If you remain unconvinced, the rest of this article is probably going to be a bit like this cartoon.

(if you find yourself wanting all but the last index, or all but the first, remember Range is a collection, so you can write things like someCollection.indices.dropLast())

Anyway, readability aside, the problems come when you try and make your loop generic. For example, take everyone’s favorite way to start an algorithms textbook: a binary search. Here’s a version for arrays:

extension Array
  // this seems to be the indentation style
  // found in the open-sourced Swift code...
  where  
  Generator.Element: Comparable {
    /// Returns an index of an element in `self` which 
    /// is equal to `element`, or `nil` if such value
    /// is not found.
    ///
    /// - Requires: elements of `self` are ordered by <
    ///
    /// - Complexity: O(log(`self.count`)).
    public func binarySearch(
      element: Generator.Element
    ) -> Int? {
        var left = 0
        var right = count - 1

        while left <= right {
            let mid = (left + right) / 2
            let candidate = self[mid]

            if candidate < element {
                left = mid + 1
            }
            else if element < candidate {
                right = mid - 1
            }
            else {
                return mid
            }
        }
        // Not found
        return nil
    }
}

(you'd probably want to implement a version that takes an isOrderedBefore predicate, then implement this one in terms of that one, but let's ignore that for now)

Binary searches are notoriously hard to get right, and this one contains at least one bug – that I've left in deliberately to talk about later – as well as the ones I added accidentally that you can let me know about.

A first trainer-wheels attempt to make this generic might be to switch the extension to be of CollectionType, and constrain the index to be of type Int:

extension CollectionType 
  where 
  Index == Int, 
  Generator.Element: Comparable {
    // same as before
}

This seems like it's working at first. You can now use it on ContiguousArray, UnsafeBufferPointer and, up until it changed, ArraySlice. It won't work with random-access-indexed collections that don't use integer indices, but there aren't many of those around. But it makes a big assumption: that the collection is zero-based. As I mentioned in a previous changes post when collections all became sliceable, this became a very shaky assumption, and now it doesn't even apply to array slices. It's also not something the type system will help you find – you'd discover it at runtime when your out-of-bounds error traps.
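To see the assumption break concretely, here's what slice indices look like under the new behaviour (a sketch in Swift 2.0 syntax):

```swift
let a = [10, 20, 30, 40, 50]
let slice = a[2..<5]   // ArraySlice containing 30, 40, 50

slice.startIndex       // 2 – not 0
slice.endIndex         // 5
slice[2]               // 30 – subscripting uses the parent array's indices
// slice[0]            // traps: 0 is out of bounds for this slice

// so the zero-based binarySearch above would compute
// mid = (0 + slice.count - 1) / 2 = 1 and trap on self[1]
```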

You could fix it while keeping the constraint to integer indices, but at this point you may as well rip the band-aid off and just make it fully generic. This also helps you spot all the places where you need to fix your zero-based assumption. It starts easy enough…

extension CollectionType 
  where
  Index: RandomAccessIndexType, 
  Generator.Element: Comparable {
    public func binarySearch(
      element: Generator.Element
    ) -> Index? {
        // instead of 0 for the start:
        var left: Index = startIndex
        // and instead of count - 1 for the end:
        var right: Index = endIndex.predecessor()

but then we hit trouble:

// error: binary operator '+' cannot be
// applied to two 'Self.Index' operands
let mid = (left + right) / 2

You can't just add two indices together. Think about non-contiguous storage, such as a node-based container – a linked list or a tree. You can't add together two node pointers. Or, for a random-access example, what about ReverseRandomAccessCollection, which is a wrapper around a random-access collection and has an opaque index type? How could you add two of those together?

Really there are two concepts here – indices, and distances between indices. You can always count, as an integer, the number of times you need to call successor() on an index to get to a later index. This is the distance between them. You can do mathematical operations on distances – you can go twice as far from an index, or advance as far from one index as another is from the start. All indices have this ability, but bi-directional indices can have negative distances (the number of times you'd need to call predecessor()), and random-access indices can move by a distance in constant time.
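To make the two concepts concrete, here's an index and a distance side by side on a string's characters (Swift 2.0 syntax – String.CharacterView has bidirectional, not random-access, indices, so distanceTo here is a linear count):

```swift
let chars = "hello".characters

// an index is an opaque position...
let i = chars.startIndex.successor().successor()

// ...a distance is always an integer you can do arithmetic on:
let d = chars.startIndex.distanceTo(i)      // 2
let j = chars.startIndex.advancedBy(d * 2)  // move twice as far
chars[j]                                    // "o"
// let k = i + i  // error: you can't add two indices together
```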

When we had an integer index on a zero-based collection, a handy trick we were relying on is that any index is also the distance of that index from the start. So when we wrote let mid = (left + right) / 2, really this was saying that the mid-point was the average of the two indices’ distances from the start. But now we want our algorithm to work not just on zero-based integer indices but on any kind of random-access index (random-access because, if you can’t move around in constant time, you may as well do a linear search).

You could rewrite that generically as:

let mid = startIndex.advancedBy(
           (  startIndex.distanceTo(left) 
            + startIndex.distanceTo(right)
          ) / 2)

but this would be a bit silly. Instead, you can just take the distance from the left to the right, halve that, and add it to the start. We can add this ability to Range, to give us a mid-point property (and which also fixes a bug, if you followed the earlier link):

extension Range {
    var mid: Element {
        return startIndex.advancedBy(
          startIndex.distanceTo(endIndex) / 2
        )
    }
}

which we can use in a fully generic binary search:

extension CollectionType 
  where 
  Index: RandomAccessIndexType {
    public func binarySearch(
      element: Generator.Element, 
      isOrderedBefore: (Generator.Element,Generator.Element)->Bool
    ) -> Index? {
        var searchRegion = self.indices

        while !searchRegion.isEmpty {
            let mid = searchRegion.mid
            let candidate = self[mid]

            if isOrderedBefore(candidate,element) {
                searchRegion.startIndex = mid + 1
            }
            else if isOrderedBefore(element,candidate) {
                searchRegion.endIndex = mid
            }
            else {
                return mid
            }
        }
        return nil
    }
}

extension CollectionType 
where Index: RandomAccessIndexType, 
      Generator.Element: Comparable 
{
    public func binarySearch(
      element: Generator.Element
    ) -> Index? {
        return binarySearch(element, isOrderedBefore: <)
    }
}

Note, we can still write searchRegion.startIndex = mid + 1 because RandomAccessIndexType conforms to Strideable, which has this definition of +

public func +<T : Strideable>(lhs: T, rhs: T.Stride) -> T

that allows you to add a distance (an index’s Stride) to an index. There’s a matching definition with the lhs and rhs flipped so you can write it the other way around.

This version makes a couple of other changes. It combines the left and right indices into a Range, searchRegion. I think this helps readability a bit – !searchRegion.isEmpty is a bit nicer than left<=right – and also allows us to use our mid extension. But more importantly, this rewrite ditches the other trick we were relying on when the index was an integer – you can go one past the end or ahead of the start without any consequences. When your index is an integer, who cares if you decrement it from 0 to -1. But some indices don't take kindly to going past their limit (try calling "hello".endIndex.successor()). So to make the algorithm properly generic, it needed a rewrite to avoid this being a possibility.

Talking of readability, 2.1b5 added some additional slicing methods, so instead of:

a[a.startIndex...idx]
a[a.startIndex..<idx]
a[idx..<a.endIndex]

you can now write:

a.prefixThrough(idx)
a.prefixUpTo(idx)
a.suffixFrom(idx)

It's kinda fun, though probably a bit cute to be used in production code, to write Python-style one-sided subscript operations for these as well:

postfix operator ..< { }
prefix operator ..< { }
prefix operator ... { }

struct RangeStart<I: ForwardIndexType> { let start: I }
struct RangeToEnd<I: ForwardIndexType> { let end: I }
struct RangeThroughEnd<I: ForwardIndexType> { let end: I }

postfix func ..< <I: ForwardIndexType>(lhs: I) -> RangeStart<I>
{ return RangeStart(start: lhs) }

prefix func ..< <I: ForwardIndexType>(rhs: I) -> RangeToEnd<I>
{ return RangeToEnd(end: rhs) }

prefix func ... <I: ForwardIndexType>(rhs: I) -> RangeThroughEnd<I>
{ return RangeThroughEnd(end: rhs) }

extension CollectionType {
    subscript(r: RangeStart<Self.Index>) -> SubSequence {
        return self.suffixFrom(r.start)
    }
    subscript(r: RangeToEnd<Self.Index>) -> SubSequence {
        return self.prefixUpTo(r.end)
    }
    subscript(r: RangeThroughEnd<Self.Index>) -> SubSequence {
        return self.prefixThrough(r.end)
    }
}

which then means you can write:

let a = [1,2,3,4]
a[1..<] // [2,3,4]
a[..<2] // [1,2]
a[...2] // [1,2,3]

You could even define pattern-matching versions, for use in a switch statement:

infix operator ~= {
    associativity none
    precedence 130
}

func ~=<I: ForwardIndexType where I : Comparable>(pattern: RangeStart<I>, value: I) -> Bool {
    return value >= pattern.start
}

func ~=<I: ForwardIndexType where I : Comparable>(pattern: RangeToEnd<I>, value: I) -> Bool {
    return value < pattern.end
}

func ~=<I: ForwardIndexType where I : Comparable>(pattern: RangeThroughEnd<I>, value: I) -> Bool {
    return value <= pattern.end
}

so you can write:

for i in [9, 10, 11] {
    switch i {
    case ..<10: print("below 10")
    case ...10: print("10 or below")
    case 2..<: print("2 or more")
    default: fatalError()
    }
}

If you're not familiar with ~=, check out Ole's series of posts about pattern matching, it's pretty powerful.

Arrays, Linked Lists and Performance

Patient: Doctor doctor, it hurts when I do this.
Doctor: Well, don’t do that.

On twitter, I wrote:

Your reminder that building arrays with reduce, while fun, is accidentally quadratic.

I was surprised at how surprising some found this. Quite a few people suggested the reduce version could be changed to not do the array copy (I don’t think it can). Some suggested maybe + could be optimized so it doesn’t perform a copy (I don’t think that it could easily, as we’ll see shortly).1

In other feedback, a few commented on the previous post about linked lists. Why implement an outdated data structure? What’s the point when we have arrays?

So, you know how sometimes I mention this isn’t a blog about Mac and iOS programming? It’s not a blog about Mac and iOS programming! Don’t put an enum-based linked list into your app just because I happen to find it interesting. I’ll probably find your ensuing performance problems interesting too. You won’t. That said, I think the linked list example is very interesting, and worth implementing and playing with, and might help shed some light on the Array reduce performance. And it might even be useful in real code in certain (infrequent) circumstances.2

So, to recap – sometimes, you’ll see reduce used to build an array (or dictionary or set), for example, in this implementation of map:

extension SequenceType {
   func mapUsingReduce<T>(transform: Generator.Element->T) -> [T] {
        return reduce([]) { $0 + [transform($1)] }
    }
}

as opposed to creating a mutable array then adding to it from a for loop:

extension SequenceType {
   func mapUsingFor<T>(transform: Generator.Element->T) -> [T] {
        var result: [T] = []
        for x in self { result.append(transform(x)) }
        return result
    }
}

The difference being, + creates a copy of the accumulating array every time. And copying the array takes linear time, inside a loop over the full array, so the overall time taken increases quadratically with the length of the array being mapped:

[Chart: time to run mapUsingReduce grows quadratically with array length, while mapUsingFor stays linear]

Of course, people aren’t normally going around re-implementing map though: you more often see this technique with, say, filtering duplicates or building dictionaries of word frequencies. But the problem remains the same.
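For example, a word-frequency count has the same shape – these method names are my own invention, but the trap in the reduce version is the same one as above, since the accumulated dictionary is copied on every step:

```swift
extension SequenceType where Generator.Element == String {
    // quadratic: each step copies the accumulated dictionary
    func frequenciesUsingReduce() -> [String:Int] {
        return reduce([:]) { acc, word in
            var freq = acc
            freq[word] = (freq[word] ?? 0) + 1
            return freq
        }
    }

    // linear: one dictionary, mutated in place
    func frequenciesUsingFor() -> [String:Int] {
        var freq: [String:Int] = [:]
        for word in self { freq[word] = (freq[word] ?? 0) + 1 }
        return freq
    }
}
```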

Why is this relevant to a list? Well, because you could implement a version of map using reduce on the list code from last time, like so:

extension SequenceType {
    func mapToList<T>(transform: Generator.Element->T) -> List<T> {
        return reduce(List()) { $0.cons(transform($1)) }.reverse()
    }
}

The performance results you get are so perfectly half the array performance (because of the reverse step) that your teacher may accuse you of faking the results instead of doing the experiment:

[Chart: mapToList timings, almost exactly half those of the array reduce version]

This works because the list is persistent – it shares nodes between previous lists and newly consed lists, forever. So no copying needed. But this comes at the cost of only being able to grow from the head (hence the need for a reverse), and the list has to be fully immutable, so you have to make a copy to modify it even when it’s uniquely referenced. This is unlike Array, which can detect unique use of its buffer and just change it in-place, no copying required. Lists have other costs as well – to sum a list of numbers takes twice as long as to sum an array of numbers, as the indirection needed to traverse the list takes time.
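For reference, here's a minimal sketch of the kind of enum-based list the last post describes (the real List differs in detail – this is just enough to show the node sharing that makes cons O(1)):

```swift
enum List<Element> {
    case End
    indirect case Node(Element, next: List<Element>)

    // O(1): the new node points at the existing list,
    // sharing all its nodes rather than copying them
    func cons(x: Element) -> List<Element> {
        return .Node(x, next: self)
    }

    // O(n): re-cons every element onto a fresh list
    func reverse() -> List<Element> {
        var reversed = List<Element>.End
        var remaining = self
        while true {
            switch remaining {
            case .End: return reversed
            case let .Node(x, next):
                reversed = reversed.cons(x)
                remaining = next
            }
        }
    }
}
```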

So is the full copy on + with arrays fixable? To think about that, let’s first look at how a copy-on-write array might work. Mike Ash already has a great blog post on implementing a copy-on-write Array, so let’s do something a little different, which is to use the ManagedBuffer class from the standard library to build it.

ManagedBuffer

ManagedBuffer is a class you can inherit from, which simplifies the process of allocating/deallocating and managing storage on the heap. It is generic, and has two separate placeholders, Value and Element. Element is the type of the block of storage of n elements, allocated dynamically on creation. Value is the type of an extra single variable on the side for storing other information – for example, to implement an array, you would need to store the element count, as the elements need to be destroyed before the memory is deallocated. Access to the elements is via withUnsafeMutablePointerToElements, whereas the value can be accessed either through a similar unsafe method, or directly via a .value property.

Here’s a very simple self-destroying ArrayBuffer:

private class MyArrayBuffer<Element>: ManagedBuffer<Int,Element> {
    deinit {
        self.withUnsafeMutablePointerToElements { elems->Void in
            elems.destroy(self.value)
        }
    }
}

So, MyArrayBuffer is still generic on what elements it stores, but it fixes the Value of ManagedBuffer to just be an Int, which will store the number of elements in the buffer (bear in mind, we will allocate more storage than we have elements in the array, to avoid constantly reallocating).

When the buffer deinitializes, MyArrayBuffer.deinit will be called prior to ManagedBuffer.deinit, which deallocates the memory. This gives MyArrayBuffer a chance to destroy all its objects. Destroying is necessary if Element is something more than just a passive struct – for example, if the array contained other copy-on-write types, destroying them will trigger them freeing their memory if necessary.

Now, we can create an array type of a struct, with a private buffer as its storage:

public struct MyArray<Element> {
    private var _buf: MyArrayBuffer<Element>

    public init() {
        _buf = MyArrayBuffer<Element>.create(8) { _ in 0 } as! MyArrayBuffer<Element>
    }
}

We don’t use MyArrayBuffer’s init directly – instead we use the class method from ManagedBuffer. Because this method returns the superclass, we force-downcast it to the right type.

Then, we turn MyArray into a collection type:

extension MyArray: CollectionType {
    public var startIndex: Int { return 0 }
    public var endIndex: Int { return _buf.value }

    public subscript(idx: Int) -> Element {
        guard idx < self.endIndex else { fatalError("Array index out of range") }
        return _buf.withUnsafeMutablePointerToElements { $0[idx] }
    }
}

Next, we need two fairly similar methods on the buffer, one to clone the storage and one to resize the storage. Cloning will be used when shared storage is detected, resizing when non-shared storage needs to get bigger:

extension MyArrayBuffer {
    func clone() -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems->MyArrayBuffer<Element> in
            return MyArrayBuffer<Element>.create(self.allocatedElementCount) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems->Void in
                    newElems.initializeFrom(oldElems, count: self.value)
                }
                return self.value
            } as! MyArrayBuffer<Element>
        }
    }

    func resize(newSize: Int) -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems->MyArrayBuffer<Element> in
            let elementCount = self.value
            return MyArrayBuffer<Element>.create(newSize) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems->Void in
                    newElems.moveInitializeFrom(oldElems, count: elementCount)
                }
                self.value = 0
                return elementCount
            } as! MyArrayBuffer<Element>
        }
    }
}

Creating and populating the buffers in one shot is a little finicky – first we need to get the unsafe pointer to the existing elements, then call create, which takes a closure that receives the partially-created object (i.e. allocated but not initialized memory), which we then need to call newBuf.withUnsafeMutablePointerToElements on to copy the memory from the old buffer to the new.

The main difference between the two is clone doesn’t change the elements in the old buffer, just loads new copies into a new buffer. resize moves the elements from the old to the new storage (via UnsafeMutablePointer’s moveInitializeFrom method), then updates the old buffer to tell it that it no longer has any elements to manage – otherwise, it would try to destroy them during its deinit.

Finally, we give MyArray an append and extend method:

extension MyArray {
    public mutating func append(x: Element) {
        if !isUniquelyReferencedNonObjC(&_buf) {
            _buf = _buf.clone()
        }

        if _buf.allocatedElementCount == count {
            _buf = _buf.resize(count*2)
        }

        _buf.withUnsafeMutablePointers { (val, elems)->Void in
            (elems + val.memory++).initialize(x)
        }
    }

    public mutating func extend<S: SequenceType where S.Generator.Element == Element>(seq: S) {
        for x in seq { self.append(x) }
    }
}

This is just sample code. In practice, you would break out the uniqueness and resizing code, so you could re-use it in subscript set or other mutating methods, but I’ve crammed it all in the append method to keep it brief. Also you’d want to reserve enough space for the extend up-front if possible, and avoid double-copying the buffer when it’s both shared and too small. But none of these things have a major impact on the bigger picture for our purposes.

OK now for the operators. First, +=, which being an assignment operator takes an inout left-hand side and just extends it with the right-hand side:

func +=<Element, S: SequenceType where S.Generator.Element == Element>
  (inout lhs: MyArray<Element>, rhs: S) {
    lhs.extend(rhs)
}

And finally, +. We can implement this in terms of +=. The operator takes two immutable arrays, and adds them together to produce a new array. It does this by relying on the copy-on-write behaviour to create a mutable copy of the left-hand side, then extend it with the right-hand side:

func +<Element, S: SequenceType where S.Generator.Element == Element>
  (lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    var result = lhs
    result += rhs
    return result
}

In fact you could shorten this further by using the var modifier in front of the lhs argument:

func +<Element, S: SequenceType where S.Generator.Element == Element>
  (var lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
    lhs += rhs
    return lhs
}

I mention this second version because some suggested a better reduce solution might involve using var on the accumulating argument. But this would be similar to what is happening here with lhs: all var does is declare your passed-by-value variable to be mutable. It is still a copy – it is not the original variable somehow passed through by reference.

Can + be optimized?

We now have a fully working toy implementation of a copy-on-write array you can append to, and which has a + operator. Which means we can rewrite our reduce version of map with it:

extension SequenceType {
    func mapUsingMyReduce<T>(transform: Generator.Element->T) -> MyArray<T> {
        return reduce(MyArray()) { $0 + [transform($1)] }
    }

    func mapUsingMyFor<T>(transform: Generator.Element->T) -> MyArray<T> {
        var result = MyArray<T>()
        for x in self { result.append(transform(x)) }
        return result
    }
}

and if you chart the performance, you’ll see both exhibiting similar behaviour to the Array versions.

So, given we now have an implementation we have complete control over, can we change + so it doesn’t make a copy? I don’t think so.

In a simpler case, could we change this:

var a = MyArray<Int>()
a.extend(0..<3)
let b = a + [6,7,8]

so that it didn’t make a copy? It seems pretty obvious we can’t. b has to be a new copy of the array, in order to not affect a. Even if we don’t make any further changes to a after the creation of b, there’s no way the implementation of + could know this. Maybe the compiler could know this, and optimize accordingly, but the + func can’t.

Checking for unique references wouldn’t help here. a is still in existence, so the lhs variable will not be the only owner of the buffer.

reduce is no different. Here’s a possible implementation:

extension SequenceType {
    func myReduce<T>(initial: T, combine: (T,Generator.Element)->T) -> T {
        var result = initial
        for x in self {
            result = combine(result,x)
        }
        return result
    }
}

Assuming combine here is { $0 + [transform($1)] }, you can see that + similarly has no knowledge of the fact that we’re actually going to assign the outcome directly to the result variable. We know, on inspecting the code, that it’ll just be fine to add the right-hand side onto the left-hand side, if that were even possible (in theory it is: even though the array is passed immutably by value, the buffer is a class with reference semantics, so it could be mutated). But + can’t know that from where it sits. It definitely knows its copy of the left-hand side isn’t the only owner of the buffer. There is another owner too: reduce holds a copy of result – and is about to throw it away and replace it with a new result, but that’s coming after + has run.

One ray of hope is if arrays were also their own slices (which they aren’t – there is ArraySlice instead, which has extra overhead to track the start and end slice into the parent array). If they were, then perhaps they could be modified to allow one, but only one, array to have an append happen to it which the others could ignore. But this would probably add overhead to arrays in general, and the whole point of arrays is to be fast – you don’t want to slow them down just to cater to this use case.

Perhaps there is a very clever way of figuring all this out, with or without the compiler’s help. But such gymnastics don’t seem like a good idea even then. The semantics are that + creates a new array. Wanting it to secretly modify an existing one under very specific circumstances doesn’t seem like the right solution – mutating the array is. If you prefer, you can wrap that var up in a nice little generic method and then pretend it’s not there. But it’ll make your code faster.
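For example, something like this (a sketch – the name is my own invention, not a standard-library method):

```swift
extension SequenceType {
    // the var and the loop are hidden in here, but it's still
    // the O(n) mutating version doing the work
    func arrayByAccumulating<T>(transform: Generator.Element -> T) -> [T] {
        var result: [T] = []
        result.reserveCapacity(underestimateCount())
        for x in self { result.append(transform(x)) }
        return result
    }
}
```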


  1. Others suggested you shouldn’t care about this sort of thing until a profiler tells you that you need to (I think you definitely should care while you write your code – saying “I’ll only care when the profiler tells me there’s a problem” feels a bit like “I’ll only write correct code when the unit tests tell me it isn’t correct”). 
  2. I also think the addition of features like enum, as well as the flexibility of choosing between object or function solution, and the “safe until you ask not to be” would make Swift a really great CS teaching language. Perhaps a future edition of this book could be in Swift. 

A Swift Spelling Corrector

One of my favourite blog posts is Peter Norvig’s How to Write a Spelling Corrector, in which he writes 21 lines of python that sucks in a large corpus of text, and then uses some probability theory to propose spelling corrections. Go read it now, it’s cool.

Anyway, that code has been translated into multiple different programming languages, but not Swift, so let’s give that a go. The resulting code is maybe not as compact as the python equivalent, but to my eyes that’s not such a bad thing, as the python is a bit write-only for my liking.1

List Comprehensions and Swift

The heart of the code is this function, which generates a set of possible alternatives for a word:

def edits1(word):
   splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
   deletes    = [a + b[1:] for a, b in splits if b]
   transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
   replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
   inserts    = [a + c + b     for a, b in splits for c in alphabet]
   return set(deletes + transposes + replaces + inserts)

This makes extensive use of python’s list comprehensions, which Swift lacks, so we’ll have to make do with map etc.

First, splits. The code takes a string, and generates an array of all the ways it can be split into two. So hello becomes (, hello), (h, ello), (he, llo), (hel, lo), (hell, o).

One way to do this in Swift would be to use map over the indices of the word, slicing it in two at each index, like so:

let splits = indices(word).map {
    (word[word.startIndex..<$0], word[$0..<word.endIndex])
}

This works – but those word.startIndex and word.endIndex are a bit cluttering. The python version uses a handy one-sided slicing operation that looks a bit neater. But we can easily extend String to have the same functionality. We just need to create a pre- and postfix version of the range operator, and then have that return a type that indicates a one-sided range:

// even though infix ..< already exists, we need to declare it
// two more times for the prefix and postfix form
postfix operator ..< { }
prefix operator ..< { }

// then, declare a couple of simple custom types that indicate one-sided ranges:
struct RangeStart<I: ForwardIndexType> { let start: I }
struct RangeEnd<I: ForwardIndexType> { let end: I }

// and define ..< to return them
postfix func ..< <I: ForwardIndexType>(lhs: I) -> RangeStart<I> 
{ return RangeStart(start: lhs) }

prefix func ..< <I: ForwardIndexType>(rhs: I) -> RangeEnd<I> 
{ return RangeEnd(end: rhs) }

// finally, extend String to have a slicing subscript for these types:
extension String {
    subscript(r: RangeStart<String.Index>) -> String { 
        return self[r.start..<self.endIndex]
    }
    subscript(r: RangeEnd<String.Index>) -> String { 
        return self[self.startIndex..<r.end] 
    }
}

This is a lot of effort to go to for just this one case, but probably useful to stash in a library somewhere for reuse. Armed with this, we can simplify the splits function:

// hello becomes (,hello), (h,ello), (he,llo) etc.
let splits = indices(word).map {
    (word[..<$0], word[$0..<])
}

OK now for each of the transformations. Deleting a single character is easy, just drop the first character from the right-hand of each split and join them back up:

// (h, ello) becomes hllo
let deletes = splits.map { $0.0 + dropFirst($0.1) }

Next, transposition. We want to generate every string in which two adjacent characters in the original string are swapped. This can be done by taking the first two characters of the right side of each split, swapping them, and joining it all back together. The tricky part is that not all right-hand sides will have two characters. The python code uses integer indexing and the fact that you can slice beyond the index range of a python string and just get back an empty string. Both of these are a non-starter in Swift, so the code is going to need to be a little more verbose:

// (h, ello) becomes hlelo
let transposes: [String] = map(splits) { left, right in
    if let fst = first(right) {
        let drop1 = dropFirst(right)
        if let snd = first(drop1) {
            let drop2 = dropFirst(drop1)
            return "\(left)\(snd)\(fst)\(drop2)"
        }
    }
    return ""
}.filter { !$0.isEmpty }

Unlike with delete, there’s not much to be done to neaten this up. There are helper functions that might make it a bit better (like dropTwo or firstTwo) but it’ll still look a bit messy. If anyone has some good suggestions, tweet me.

Replacement requires a map within a map. We want to replace every character with another character from the alphabet. With each split, this can be done by dropping the first character of the right-hand side, and inserting each letter of the alphabet. If we do this with map, we’ll get an array of possible strings, but then if we map that over each of the splits, we’ll have a nested array. So instead, we can use flatMap, as introduced in Swift 1.2, which flattens out the arrays returned from each inner map into a single array of values:

let alphabet = "abcdefghijklmnopqrstuvwxyz"

let replaces = splits.flatMap { left, right in
    // (he, llo) becomes healo, heblo, heclo etc
    map(alphabet) { "\(left)\($0)\(dropFirst(right))" }
}

Finally, insertion is just the same as replace, only without dropping the first character of the right-hand side:

let inserts = splits.flatMap { left, right in
    // (he, llo) becomes heallo, hebllo, hecllo etc
    map(alphabet) { "\(left)\($0)\(right)" }
}

Reading the corpus

Next, we need to be able to load in a large text file, break it up into individual words, and then count the frequency of each word. This will be used to pick the most likely word in the case of misspellings. There are many good ways to do this, especially given useful Foundation classes like NSRegularExpression and NSCharacterSet. But since this is a Swift blog, let’s try for something almost entirely in Swift, just using the Foundation extension to String to read in the file:

struct SpellChecker {
    var knownWords: [String:Int] = [:]

    mutating func train(word: String) {
        knownWords[word] = knownWords[word]?.successor() ?? 1
    }    

    init?(contentsOfFile file: String) {
        if let text = String(contentsOfFile: file, encoding: NSUTF8StringEncoding, error: nil)?.lowercaseString {
            let words = split(text.unicodeScalars) { !("a"..."z").contains($0) }.map { String($0) }
            for word in words { self.train(word) }
        }
        else {
            return nil
        }
    }    
}

This code is rather shamefully English-language-centric. It may also cope partially with other Latin-based languages, but more through luck than anything else (for example, ("a"..."z").contains("é") is true, but ("a"..."z").contains("ß") is false).

Why split the unicodeScalars and then restringify them, rather than split the string itself directly? Because splitting Swift strings is currently painfully slow. Swift makes heroic efforts to be correct regarding Unicode, such as presenting grapheme clusters as single characters. But this comes at a price, and since we’ve already acknowledged that we are really living in ASCII-world, it’s probably not a price worth paying.
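For comparison, Norvig’s original Python does this step with a regular expression and a counting dictionary — a sketch, standing in for the Swift `train` above:

```python
import re
from collections import Counter

def words(text):
    # lowercase and break on anything outside a-z,
    # the same rule as the Swift split above
    return re.findall("[a-z]+", text.lower())

def train(text):
    # a Counter is the Python analogue of the knownWords [String:Int]
    return Counter(words(text))

model = train("The quick brown fox. The QUICK fix.")
# model["the"] == 2, model["quick"] == 2, model["fox"] == 1
```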

Correcting spelling

Finally, we’re ready to write the spelling corrector. We need a function that takes a sequence of possible words, and filters out those that aren’t known words:

extension SpellChecker {
    func known<S: SequenceType where S.Generator.Element == String>(words: S) -> Set<String>? {
        let s = Set(filter(words) { self.knownWords.indexForKey($0) != nil })
        return s.isEmpty ? nil : s
    }
}

and a function that takes a word, and generates a list of known words from edits of edits (i.e. a distance of 2 from the original word):

extension SpellChecker {
    func knownEdits2(word: String) -> Set<String>? {
        var known_edits: Set<String> = []
        for edit in edits(word) {
            if let k = known(edits(edit)) {
                known_edits.unionInPlace(k)
            }
        }
        return known_edits.isEmpty ? nil : known_edits
    }
}

Why do these functions return an optional Set when it’s empty? Well, the goal is to go first for the word itself if it’s known, then the most likely word with a distance of 1 (i.e. edits(word)), then, if there’s no such word, the most likely word with a distance of 2 (i.e. knownEdits2(word)).

In the python version, this is done with or, which chooses the first non-empty set (i.e. [1,2] or [3,4] returns [1,2], but [] or [3,4] returns [3,4]). Swift doesn’t have this feature – but it’s very similar to how the nil-coalescing operator works. So if known and knownEdits2 return optionals, you can write the correcting function like this:
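Python’s behaviour here is easy to demonstrate directly:

```python
# `or` evaluates to its first non-empty (truthy) operand, which is what
# lets Norvig chain candidate sets and take the first non-empty one
print([1, 2] or [3, 4])  # [1, 2]
print([] or [3, 4])      # [3, 4]
```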

extension SpellChecker {
    func correct(word: String) -> String {
        let candidates = known([word]) ?? known(edits(word)) ?? knownEdits2(word)

        return reduce(candidates ?? [], word) {
            (knownWords[$0] ?? 1) < (knownWords[$1] ?? 1) ? $1 : $0
        }
    }
}

This is possibly abuse of the ?? operator, and perhaps we’d be better off overloading || to have the same behaviour as python’s or for collections. Still, it works.

The reduce is just searching for the value in candidates with the maximum frequency in the knownWords model. In the python version, the dictionary has a default value of 1 – that is, if a word isn’t known, it’s assumed to have a frequency of 1. This accounts for rare words that are correct, but not in the corpus. Swift doesn’t have dictionary defaulting, so we can use ?? to replace the not-found nil value with 1 instead.
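The equivalent of that reduce in Python is a max with a defaulting dictionary lookup — a sketch, where `counts` stands in for the knownWords model:

```python
counts = {"spelling": 5, "spelt": 2}
candidates = {"spelling", "spellin"}

# dict.get's second argument plays the role of Swift's ?? 1:
# a word missing from the model gets an assumed frequency of 1
best = max(candidates, key=lambda w: counts.get(w, 1))
assert best == "spelling"
```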

And that’s pretty much it. There are lots of speed improvements that could be made by sacrificing the brevity of the code, and the original blog article has lots of ideas for enhancing this to make it better functionally. Here’s the full code, which you should be able to compile as a single file with swiftc to build a stand-alone executable. Let me know if you spot any bugs, I’m sure there are a couple!

/// Translation of [Peter Norvig's spell checker](http://norvig.com/spell-correct.html) into Swift.
/// Sample input corpus [here](http://norvig.com/big.txt)

import Foundation   // purely for IO, most things done with Swift.String

/// Given a word, produce a set of possible alternatives with
/// letters transposed, deleted, replaced or rogue characters inserted
func edits(word: String) -> Set<String> {
    if word.isEmpty { return [] }

    let splits = indices(word).map {
        (word[word.startIndex..<$0], word[$0..<word.endIndex])
    }

    let deletes = splits.map { $0.0 + dropFirst($0.1) }

    let transposes: [String] = map(splits) { left, right in
        if let fst = first(right) {
            let drop1 = dropFirst(right)
            if let snd = first(drop1) {
                let drop2 = dropFirst(drop1)
                return "\(left)\(snd)\(fst)\(drop2)"
            }
        }
        return ""
    }.filter { !$0.isEmpty }

    let alphabet = "abcdefghijklmnopqrstuvwxyz"

    let replaces = splits.flatMap { left, right in
        map(alphabet) { "\(left)\($0)\(dropFirst(right))" }
    }

    let inserts = splits.flatMap { left, right in
        map(alphabet) { "\(left)\($0)\(right)" }
    }

    return Set(deletes + transposes + replaces + inserts)
}


struct SpellChecker {

    var knownWords: [String:Int] = [:]

    mutating func train(word: String) {
        knownWords[word] = knownWords[word]?.successor() ?? 1
    }    

    init?(contentsOfFile file: String) {
        if let text = String(contentsOfFile: file, encoding: NSUTF8StringEncoding, error: nil)?.lowercaseString {
            let words = split(text.unicodeScalars) { !("a"..."z").contains($0) }.map { String($0) }
            for word in words { self.train(word) }
        }
        else {
            return nil
        }
    }

    func knownEdits2(word: String) -> Set<String>? {
        var known_edits: Set<String> = []
        for edit in edits(word) {
            if let k = known(edits(edit)) {
                known_edits.unionInPlace(k)
            }
        }
        return known_edits.isEmpty ? nil : known_edits
    }

    func known<S: SequenceType where S.Generator.Element == String>(words: S) -> Set<String>? {
        let s = Set(filter(words) { self.knownWords.indexForKey($0) != nil })
        return s.isEmpty ? nil : s
    }

    func correct(word: String) -> String {
        let candidates = known([word]) ?? known(edits(word)) ?? knownEdits2(word)

        return reduce(candidates ?? [], word) {
            (knownWords[$0] ?? 1) < (knownWords[$1] ?? 1) ? $1 : $0
        }
    }
}


// main()

if Process.argc == 2, let checker = SpellChecker(contentsOfFile: Process.arguments[1]) {

    println("Type word to check and hit enter")
    while let word = NSString(data: NSFileHandle.fileHandleWithStandardInput().availableData,
                              encoding: NSUTF8StringEncoding) as? String
        where last(word) == "\n"
    {
        let word = dropLast(word)
        let checked = checker.correct(word)

        if word == checked {
            println("\(word) unchanged")
        }
        else {
            println("\(word) -> \(checked)")
        }
    }

}
else {
    println("Usage: \(Process.arguments[0]) <corpus_filename>")
}

  1. It’s a continuum of course. Some would consider the code in this post unreadably terse. Personally I don’t think making code longer always makes it more clear. 

Changes to the Swift standard library in 1.2 betas 2 and 3

Swift v1.2 beta 2 and 3 bring quite a few new enhancements, many of which build on what was introduced in beta 1.

The if let syntax continues to evolve – as of beta 2, you can check for a boolean condition and, if it’s true, then proceed to unwrap values. Nate Cook has a good example of this in his NSScanner extension:

// NSScanner.scanCharactersFromSet returns the scanned string
// by assigning it to a pointer you pass in:
var value: NSString? = ""

// The method returns true only if a string was successfully
// scanned, so check that first:
if scanner.scanCharactersFromSet(set, intoString: &value),
   // then if true, unwrap the value and convert to a String:
   let str = value as? String 
{
    // now use the valid scanned String as str
}

Flat Map

Continuing the theme of language features that help with handling optionals: beta 3 introduces flatMap. This is a map operation, followed by a flattening operation. The release notes have a good example of its use when you “want to chain optionals with functions”, so let’s drill into that a little.

The existing map allows you to apply a function to the value inside an optional, if that optional is non-nil. For example, suppose you have an optional integer i and you want to double it. You could write i.map { $0 * 2 }. If i has a value, you get back an optional of that value doubled. On the other hand, if i is nil, no doubling takes place.
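In Python terms, with None standing in for nil, the map operation looks like this (only a rough analogy, since Python can’t nest its “optionals”; `opt_map` is a hypothetical helper, not a library function):

```python
def opt_map(x, f):
    # apply f only when a value is present, like Optional.map
    return None if x is None else f(x)

assert opt_map(3, lambda i: i * 2) == 6          # value present: doubled
assert opt_map(None, lambda i: i * 2) is None    # nil in, nil out
```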

Now, suppose instead of doubling an integer, you wanted to perform a mapping on an optional that could itself return an optional. The example in the release notes is find, which searches an array for a value and returns the index if it’s found – or nil if not found:

// an array of arrays
let arr = [[1,2,3],[4,5,6]]
// .first returns an optional of the first element of the array
// (optional because the array could be empty, in which case it's nil)
let fst = arr.first  // fst is now [Int]?, an optional array of ints
// now, if we want to find the index of the value 2, we could use map and find
let idx = fst.map { find($0, 2) }

The trouble with this is, idx will be an Int??, because fst was already optional, and then we mapped it with find, that also returns an optional – so now it’s an optional wrapped in an optional. What we want is to “flatten” this nested optional – so instead of map we use flatMap:

// flatMap means idx will be an Int?, not an Int??
let idx = fst.flatMap { find($0, 2) }

Here’s another use of flatMap: since we have an optional array, you could use it to get the first element of the first element:

// getting the first element of an optional array
let innerFst = fst.flatMap(first)

Now, if you’ve been programming Swift for a while, you might say hang on, since arrays have a first method, I could use optional chaining:

// using optional chaining to get the first element of an optional array
let innerFst = fst?.first

And at this point, hopefully it’s clear that flatMap and optional chaining do very similar things. Optional chaining is essentially a compact version of flatMap that only works on methods. When you want the same behaviour as optional chaining, but with a function you pass the optional value into (like find) rather than a method you call on that value, you need flatMap.

That’s not all flatMap does – it’s also a method on arrays (think of a mapping function that turns elements into arrays of elements), and there’s a generic free function version that works on sequences and collections. I won’t go into those here – I’m sure there will be a flood of Swift blogs coming up that will cover it in more detail. Just grab one as it drifts by, read, and experiment.1
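The array version is easy to sketch in Python, where flatMap amounts to a map followed by flattening one level (`flat_map` is a hypothetical helper for illustration):

```python
def flat_map(f, xs):
    # map f over xs, then flatten one level of nesting
    return [y for x in xs for y in f(x)]

assert flat_map(lambda x: [x, -x], [1, 2]) == [1, -1, 2, -2]
# compare plain map, which leaves the nesting in place:
assert list(map(lambda x: [x, -x], [1, 2])) == [[1, -1], [2, -2]]
```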

Grab Bag

There are a few other changes to the standard library across the two releases:

  • The precedence of the ?? operator has been bumped from 110 to 131. This means it now has higher precedence than equality (130). So i == j ?? 0 now replaces a nil value for j with 0 prior to equating it to i. Note, it still doesn’t trump arithmetic operators, so unbracketed monstrosities like i + j ?? 0 won’t compile.
  • Slice has been renamed ArraySlice, making its purpose clearer (strings, for example, are their own slice type). Other than the name, nothing else has changed (except the additional flatMap).
  • &/ and &% are gone, as mentioned in the release notes.
  • Following its introduction in beta 1, various function arguments have been marked @noescape. Examples include various withUnsafeXXXPointer functions, and various map, reduce and filter functions. Of course, lazy’s map is not marked like this because it does capture self.
  • Similar to UnsafePointer, AutoreleasingUnsafeMutablePointer and CFunctionPointer have lost their static null instance.
  • But they have acquired conformance to CVarArgType, as does UnsafePointer
  • Double and Float, on the other hand, now conform to _CVarArgPassedAsDouble, a specialization of CVarArg. This is to enable x86_64-specific handling, apparently.
  • Character is now DebugPrintable
  • If you want a “how to speak Swift” tip, in various documenting comments, things are now called “values” rather than “objects”.
  • String.Index, String.UTF16Index and String.UTF8Index now have methods for converting between each other.
  • String.UnicodeScalarView now conforms to ExtensibleCollectionType and RangeReplaceableCollectionType, and its index is now bidirectional and comparable, giving it a lot of the same capabilities of the String itself.
  • UInt8 has a constructor from an ASCII UnicodeScalar (with a warning that the scalar better actually be ASCII!)
  • In a boon to reckless optimizers everywhere, unsafeUnwrap will now force-unwrap your optional without safety-checking non-nilness first.

Trailing Closures and Default Arguments

A subtle change in the language means that the split function has had its arguments reordered.

Previously, its second parameter was the isSeparator function, followed by a couple of defaulted arguments (maxSplit and allowEmptySlices). But as of Swift 1.2, when you supply a function via the “trailing closure syntax”, that trailing closure is always the last parameter. So if you tried to call the old split without supplying those defaulted arguments, you’d get an error, because it isn’t the last parameter, it’s the second. The new split makes the isSeparator function the last parameter.

While we’re talking about default arguments, did you know that you can supply them in any order?

func f(a: String = "A", b: String = "B") {
    println("\(a) \(b)")
}

f(b: "bee")             // prints A bee
f(b: "Bee!", a: "Eh?")  // prints Eh? Bee!
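Python keyword arguments behave the same way, for comparison:

```python
def f(a="A", b="B"):
    return f"{a} {b}"

assert f(b="bee") == "A bee"               # the default fills in a
assert f(b="Bee!", a="Eh?") == "Eh? Bee!"  # any order, matched by name
```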

Signed Mismatches

OK this last one’s a bit language-lawyerly so if that sort of stuff bores you, you’ve been warned!

You might be surprised to find that, up until Swift 1.2, the following code would compile:

let x: UInt = 5
let y: Int = 6
let z = x + y
// z will be a UInt of value 11

Given that Swift is usually so strict about not mixing types, how come this is allowed? It comes down to the following definition for + that appeared way back in Swift 1.0 beta 4:

// if a type is strideable, you can add its stride type to it
func +<T: _Strideable>(lhs: T, rhs: T.Stride) -> T

// and strides can be signed
protocol _Strideable {
    /// A type that can represent the distance between two values of `Self`
    typealias Stride: SignedNumberType
}

And since UInt is strideable, and defines its distance (and thus its stride) as a signed integer, that means you can add signed integers to unsigned integers.2

Swift 1.2 fixes this state of affairs by introducing two bits of code. First, it adds a typealias to the _UnsignedIntegerType protocol:

protocol _UnsignedIntegerType : _IntegerType {
    typealias _DisallowMixedSignArithmetic : SignedIntegerType = Int
}

Next, it adds a new overload for the + operator:

func +<T : _UnsignedIntegerType>(lhs: T, rhs: T._DisallowMixedSignArithmetic) -> T

The upshot of this is if you try and add an Int to a UInt, there are now two matching overloads for the + operator: the old one, and the new one, and Swift will complain that they’re ambiguous. Hopefully the fact that the compiler error mentions mixed-sign arithmetic will be enough of a hint to you that this is the problem. A bit clunky, but it does the job.


  1. At some point, someone will try to explain to you that flatMap is actually something called monadic bind. Don’t panic when they do – the most important thing is to understand what it does. This more abstract concept will come easily once you have a good grasp of optional and array versions of flatMap under your belt. When you’re ready, I’d recommend reading Rob Napier’s article and this series by Alexandros Salazar. 
  2. Though not the other way around i.e. y + x wouldn’t compile – which is another problem, since people expect integer addition to be symmetric. 

Sorting Nibbles in Swift

A couple of months back, John Regehr posted a challenge on his blog – write the fastest possible sorting algorithm for all the nibbles in a 64-bit word.

The problem is to sort the 4-bit pieces of a 64-bit word with (unsigned) smaller values towards the small end of the word. The nibble sort of 0xbadbeef is 0xfeedbba000000000. The function you implement will perform this sorting operation on a buffer of 1024 64-bit integers:

void nibble_sort(unsigned long *buf);

The challenge was in C, but it made me wonder about solving the same problem in Swift. Here’s a straight port of the supplied reference version in Swift:

func read_nibble(w: UInt64, i: UInt64) -> UInt64 {
    let res = w >> (i * 4)
    return res & 0xf
}

func write_nibble(inout w: UInt64, i: UInt64, v: UInt64) {
    var mask: UInt64 = 0xf
    mask <<= (i * 4);
    w &= ~mask
    var prom = v;
    prom <<= (i * 4)
    w |= prom
}

func nibble_sort_word_ref(var arg: UInt64) -> UInt64 {
    for var i: UInt64 = 0; i < 16; ++i {
        var min = i;
        for var j = i+1; j < 16; ++j {
            if read_nibble(arg, j) < read_nibble(arg, min) {
                min = j
            }
        }
        if (min != i) {
            var tmp = read_nibble(arg, i)
            write_nibble(&arg, i, read_nibble(arg, min))
            write_nibble(&arg, min, tmp)
        }
    }
    return arg;
}

func nibble_sort_ref(buf: UnsafeMutablePointer<UInt64>) {
    for var i=0; i<1024; i++ {
        buf[i] = nibble_sort_word_ref(buf[i])
    }
}

The first bit of good news is that on my machine, when compiled with -Ounchecked, this version performs at pretty much the same speed as the original C version. Sticking with -O instead gives up about 20% in performance, which is not too bad for the sake of safety.

So, this version has two functions to read and write the nibble at a given index, and a function to sort it using them. Get/set at a given index? That sounds a lot like a collection type. What if we were to wrap the word in a collection, and use subscript instead of the read/write functions? Something like this:

struct NibbleCollection: MutableCollectionType {
    var val: UInt64
    
    var startIndex: UInt64 { return 0 }
    var endIndex: UInt64 { return 16 }
    
    subscript(idx: UInt64) -> UInt64 {
        get {
            return (val >> (idx*4)) & 0xf
        }
        set(n) {
            let mask = 0xf << (idx * 4)
            val &= ~mask
            val |= n << (idx * 4)
        }
    }
    
    typealias Generator = IndexingGenerator<NibbleCollection>
    func generate() -> Generator { return Generator(self) }
}

Once you have this, you can then use the standard library sorting function on it to implement the challenge:

func nibble_sort_word_col(arg: UInt64) -> UInt64 {
    var col = NibbleCollection(val: arg)
    sort(&col)
    return col.val
}

“Does all this Collection overhead mean it won’t be as fast as the raw function implementation?” you might worry. But there really is no overhead – since NibbleCollection is a struct, not a class, it really isn’t anything more than the raw underlying UInt64 and some functions. All of the scaffolding sloughs away at compile time, leaving you with pretty much identical code to the function-based version.

With one big difference: Swift.sort is way better than the reference sort implementation above.1 So this version outperforms it, both in Swift and in C. With -O, the collection version is faster than the Swift reference, and the same speed as the C version. With -Ounchecked it outperforms the C version by about 25%.

Which tells you two things you should bear in mind when writing Swift:

  1. Structs implementing protocols, and generic functions, are a good way of getting abstraction without giving up performance.
  2. Always check if the code you want exists as a library function. Using it won’t just save time and make your code clearer, it could be better optimized than the version you might write.

Of course, the competitive entries in C were orders of magnitude faster than the reference version, using bit-twiddling techniques like bucket sorts and lookup tables that take advantage of the problem being constrained to values less than 16 held in a single word. But since Swift also supports most of the operations they use, many of them can be ported too. A quick and dirty translation I did of a couple of the simpler ones suggests that, like the reference version, Swift can match or almost match them in performance too.
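The wrap-it-in-a-collection trick translates to any language with a collection protocol. Here’s the same idea sketched in Python using MutableSequence, verified against the example value from the challenge (an illustration only, making no performance claims):

```python
from collections.abc import MutableSequence

class NibbleCollection(MutableSequence):
    """A mutable view of the 16 nibbles of a 64-bit word."""
    def __init__(self, val):
        self.val = val
    def __len__(self):
        return 16
    def __getitem__(self, idx):
        if not 0 <= idx < 16:
            raise IndexError(idx)   # lets generic iteration terminate
        return (self.val >> (idx * 4)) & 0xF
    def __setitem__(self, idx, n):
        mask = 0xF << (idx * 4)
        self.val = (self.val & ~mask) | (n << (idx * 4))
    def __delitem__(self, idx):
        raise TypeError("fixed-size collection")
    def insert(self, idx, value):
        raise TypeError("fixed-size collection")

# generic library code (sorted) now works on the wrapped word:
col = NibbleCollection(0xbadbeef)
for i, n in enumerate(sorted(col)):
    col[i] = n
assert col.val == 0xfeedbba000000000
```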


  1. Naming the two sort algorithms in question is an exercise for the reader. They might not be the one you’d guess first. 

Changes to the Swift Standard Library in 1.2 beta 1

Swift v1.2 is out, and, well, it’s pretty fab actually.

The most exciting part for me is the beefed-up if let syntax. Not only does it allow you to unwrap multiple optionals at once, you can define a later let in terms of an earlier one. For instance, here are 3 functions, each one of which takes a non-optional and returns an optional, chained together:

if let url = NSURL(string: urlString),
   let data = NSData(contentsOfURL: url),
   let image = UIImage(data: data) {
      // display image
}

The key here is that if any of the preceding lets fail, the whole thing fails and the later functions are not called. If you’re already familiar with optional chaining with ?. this should come naturally to you. For more on this, see NSHipster’s article.

The new ability to defer initialization of let constants makes much of my “avoiding var” post redundant, as you can now do this: 1

let direction: Direction    
switch (char) {
    case "a": direction = .Left
    case "s": direction = .Right
    default:  direction = .Forward
}

It’d still be nice to have a form of switch that evaluates to a value – but with the new let, that’s really just a desire for some syntactic sugar (imagine using it inside a short closure expression, without needing any return).

Speaking of switch, the new if, with the ability to have a where clause on the bound variables, means if may now be suitable for several places where you were previously using switch. But there’s still a place for switch – especially when using it for full coverage of each case of an enum. And I feel a bit like pattern matching via ~= is one of switch’s powerful overlooked features that I keep meaning to write a post about. This release adds a ~= operator for ranges, not just intervals.

Library Additions

The big new addition to the standard library is of course the Set collection type. The focus on mathematical set operations is very nice, and also that those operations take any SequenceType, not just another Set:

let alphabet = Set("abcdefghijklmnopqrstuvwxyz")
let consonants = alphabet.subtract("aeiou")
// don't forget, it's "jumps" not "jumped"
let foxy = "the quick brown fox jumped over the lazy dog"
alphabet.isSubsetOf(foxy)  // false

Note that the naming conventions emphasize the pure versions of these operations, which create new sets, with each operation having a mutating equivalent with an InPlace suffix:

// create a new set
var newalpha = consonants.union("aeiou")
// modify an existing set
newalpha.subtractInPlace(consonants)

Note also for those fearful of the evil of custom operators, there’s no - for set subtraction etc. The one operator defined for sets is ==. Unlike with Array, sets are also Equatable (since they mandate their contents are Equatable, via Hashable).
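For comparison, Python’s built-in set draws the same distinction between pure operations (which make a new set) and in-place ones:

```python
alphabet = set("abcdefghijklmnopqrstuvwxyz")
consonants = alphabet - set("aeiou")          # pure: creates a new set
foxy = "the quick brown fox jumped over the lazy dog"
assert not alphabet <= set(foxy)              # "jumped" has no s

newalpha = consonants | set("aeiou")          # pure union
newalpha -= consonants                        # in-place subtraction
assert newalpha == set("aeiou")
```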

But that’s not all. Here’s a rundown of some of the other library changes:

  • There’s no longer a separate countElements call for counting the size of collections vs count for ranges – it’s now just count for both.
  • Dictionary’s index, which used to be bi-directional, is now forward-only, as are the lazy collections you get back from keys and values.
  • The C_ARGC and C_ARGV variables are gone, as is the _Process protocol. In their place there’s now a Process enum with 3 static methods: arguments, which gives you a [String] of the arguments, and argc and argv, which give you the raw C-string versions.
  • Protocols that used to declare class funcs now declare static funcs since the language now requires that.
  • Array, ClosedInterval and Slice now conform to a new protocol, _DestructorSafeContainer. This protocol indicates “a container is destructor safe if whether it may store to memory on destruction only depends on its type parameters.” i.e. an array of Int is known never to allocate memory when it’s destroyed since Int doesn’t, whereas a more complex type (with a deinit, or aggregating a type with one) may not be.
  • They also now conform to a new __ArrayType protocol. Two underscores presumably means you definitely definitely shouldn’t try and conform to this directly yourself.
  • The builtin-converting Array.convertFromHeapArray function is gone.
  • The pointer types (AutoreleasingUnsafeMutablePointer, UnsafePointer and UnsafeMutablePointer) now conform to a new _PointerType protocol.
  • The CVaListPointer (the equivalent of C’s va_list) now has an init from an unsafe pointer.
  • EmptyGenerator now has a public initializer, so you can use it directly.
  • The various unicode types now have links in their docs to the relevant parts of unicode.org’s glossary
  • HeapBuffer and HeapBufferStorage are gone, replaced by the more comprehensive and documented ManagedBuffer and ManagedBufferPointer, of which more later.
  • ObjectIdentifier is now Comparable.
  • The enigmatic OnHeap struct is gone.
  • All the views (UnicodeScalarView, UTF16View and UTF8View) inside String are now Printable and DebugPrintable
  • String.lowercaseString and uppercaseString are back! I missed you guys.
  • The various convertFrom static functions in StringInterpolationConvertible are now inits.
  • UnsafePointer and UnsafeMutablePointer no longer have a static null() instance, presumably you should just use the nil literal.
  • There’s a new zip function, which returns a Zip2 struct – consistent with the lazy function that returns LazyThings, and possibly a gateway to ZipMoreThan2 structs in the future…

The Printable and DebugPrintable protocol docs now point out: if you want a string representation of something, use the toString function – in case you were trying to get fancy with if let x as? Printable { use(x.description) }. Which the language will now let you do, since as? and is now work with Swift protocols, which is cool. But still don’t do that – especially considering you may be sorry when you find out String doesn’t conform to Printable.

The Character type is now a struct, not an enum, and no longer exposes a “small representation” and “large representation”. This appears to go along with a performance-improving refactoring that has made a big difference – for example, a simple bit of code that sucks in /usr/share/dict/words into a Swift String and counts the word lengths now runs in a quarter of the time it used to.

Bovine Husbandry

Suppose you wanted to write your very own collection, similar to Array, but managing all the memory yourself. Up until Swift 1.2, this was tricky to do.

The memory allocation part isn’t too hard – UnsafeMutablePointer makes it pretty straightforward (and typesafe) to allocate memory, create and destroy objects in that memory, move those objects to fresh memory when you need to expand the storage etc.

But what about de-allocating the memory? Structs have no deinit, and that unsafely-pointed memory ain’t going to deallocate itself. But you still want your collection to be a struct, just like Array is, so instead, you implement a class to hold the buffer, and make that a member. That class can implement deinit to destroy its contained objects and deallocate the buffer when the struct is destroyed.

But you’re still not done. At this point, you’re in a similar position Swift arrays were in way back in the first ever Swift beta. They were structs, but they didn’t have full value semantics. If someone copies your collection, they’ll get a bitwise copy of the struct, but all this will do is copy the reference to the buffer. The two structs will both point to the same buffer, and updating the values in one would update both. Behaviour when the buffer is expanded would depend on your implementation (if the buffer expanded itself, they might both see the change, if the struct replaced its own buffer, they might not).

How to handle this? One way would be to be able to detect when a struct is copied, and make a copy of its buffer. But just as structs don’t have deinit, they also don’t have copy constructors – this isn’t C++!

So plan B – let the bitwise copy happen, but when the collection is mutated (say, via subscript set), detect whether the buffer is referenced more than once, and at that point, if it is, make a copy of the buffer before changing it. This solves the problem, and also comes with a big side-benefit – your collection is now copy-on-write (COW). Arrays in Swift are also COW, and this means that if you make a copy of your array, but then don’t make any changes, no actual copying of the buffer takes place – the buffer is only copied when it has to be, because you are mutating the values of some shared storage.

Since classes in Swift are reference-counted, it ought to be possible to detect if your buffer class is being referenced by more than one struct. But there wasn’t an easy Swift-native way to do this, until 1.2, which introduces isUniquelyReferenced and isUniquelyReferencedNonObjC function calls:

extension MyFancyCollection: MutableCollectionType {
  // snip...

  subscript(idx: Index) -> T {
    get {
      return _buffer[idx]
    }
    set(newValue) {
      if isUniquelyReferencedNonObjC(&_buffer) {
          // all fine, this buffer is solely owned
          // by this struct:
          _buffer[idx] = newValue
      }
      else {
          // uh-oh, someone else is referencing this buffer
          _buffer = _buffer.cloneAndModify(idx, newValue)
      }
    }
  }
}

Great, COW collections all round. But before you run off and start implementing your own buffer class for use with your collection, you probably want to check out the ManagedBuffer class implementation, and a ManagedBufferPointer struct that can be used to manage it. These provide the ability to create and resize a buffer, check its size, check if it’s uniquely referenced, access the underlying memory via the usual withUnsafeMutablePointer pattern etc. The commenting docs are pretty thorough so check them out.
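To see the whole copy-on-write dance end to end, here’s a toy Python version; CPython exposes reference counts via sys.getrefcount, which can play the role of isUniquelyReferenced (a CPython-specific illustration only, not how you’d write production Python):

```python
import sys

class COWArray:
    def __init__(self, items):
        self._buf = list(items)

    def copy(self):
        other = COWArray.__new__(COWArray)
        other._buf = self._buf          # share the buffer, no copying yet
        return other

    def __getitem__(self, idx):
        return self._buf[idx]

    def __setitem__(self, idx, value):
        # getrefcount sees our attribute reference plus its own argument,
        # so a uniquely-owned buffer reports 2; more means it's shared
        if sys.getrefcount(self._buf) > 2:
            self._buf = list(self._buf)  # copy before the first write
        self._buf[idx] = value

a = COWArray([1, 2, 3])
b = a.copy()             # cheap: both structs point at one buffer
b[0] = 99                # mutation detects sharing and copies first
assert a[0] == 1 and b[0] == 99
```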


  1. Just in case you want to try out this new feature in a playground – this technique does not work with globals. See this thread on the dev forums. 

Running Totals and Force Unwraps

Recently, while implementing my game of immutable snake, I needed to write a function that produced an array of running totals.1 Since creating an array with reduce is all the rage these days, my first take was like this:

let start = 0
let array = [1,2,3]
let runningTotal = reduce(array, [start]) { $0 + [$0.last! + $1] }
// produces [0,1,3,6]

As you can see, that code snippet contains a force unwrap. It’s perfectly safe – you can see by inspection there’s no possible way that the array could ever be empty, so .last will never be nil. Well, except for that time in the future where I decide I don’t want the initial value at the start of the array after all, and so change reduce’s initial value to [], and kerplowie.

Obviously that’d be easy to catch in a test, and you’d probably spot the ! when making the change. But maybe there are a few lines between the change and the unwrap. Maybe it’s dependent on a variable rather than a literal. In Swift, a ! should be like a traffic Stop sign. Stop for a minute and look both ways very carefully or you might get run over by a truck. Do you really need it? Is it definitely safe, no shadow of a doubt?

The best outcome from this pause for thought is another, equally valid, equally readable, way to solve the problem that removes the need for the !.

For example, take the following snippet from the Swift section of John Siracusa’s Yosemite review:

let ages = [
    "Tim":  53,  "Angela": 54,  "Craig":   44,
    "Jony": 47,  "Chris":  37,  "Michael": 34,
]
let people = sorted(ages.keys, <).filter { ages[$0]! < 50 }
println(people)

There’s another way to write this that obviates the need for the ! completely:2

// dictionary's implementation of SequenceType (which is what filter takes)
// results in iterating over all the (key,value) pairs
let people = filter(ages) { $0.1 < 50 }.map { $0.0 }.sorted(<)

// or, if you find $0.1 ugly/unreadable:
let people = filter(ages) { (k,v) in v < 50 }.map { (k,v) in k }.sorted(<)

Hopefully just as concise, but no need for the force unwrap.

I couldn’t spot a similar rewrite for my runningTotal calculation, so I took the lazy option and threw it out as a question on Twitter.3

First up, suggestions of using ?? or if let instead of !:

let runningTotal = reduce(array, [start]) { $0 + [($0.last ?? 0) + $1] }

let runningTotal = reduce(array, [start]) { 
    if let val = $0.last {
        return $0 + [val + $1]
    }
    else {
        // what to put here?
    }
}

These seem safer – but I feel like that safety can be an illusion. You know that the nil can never, ever, happen in the code as it stands. So what’s the point? Well, to kick in if you ever make a change by mistake that would otherwise have force-unwrapped a nil. But if that ever happened, you would have a bug, and sweeping it under the carpet and silently continuing could be just as bad. So perhaps what should go in that else clause is an assert, just like you’d get with the force-unwrap. In which case you’re not much better off.
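So the honest version of the if let rewrite ends up looking something like the sketch below, which is really just the force unwrap with a better error message:

```swift
// Equivalent to the force unwrap, but with an explicit trap site
// and a message explaining the invariant being relied on:
let runningTotal: [Int] = reduce(array, [start]) { total, x in
    if let last = total.last {
        return total + [last + x]
    }
    // an empty accumulator means a logic error upstream, so trap loudly
    fatalError("runningTotal accumulator can never be empty")
}
```

At least the fatalError message documents why the author believed the optional could never be nil, which the bare `!` does not.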

How about eliminating the call to last, by passing the last value along as you go?

let runningTotal = reduce(array, (start,[start])) { ($0.0 + $1, $0.1 + [$0.0 + $1]) }.1

This version has no force-unwrap, but it’s pretty nasty to read as a one-liner. And the addition is being done twice so that could cause a performance problem for something more complex than an Int.
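The double addition is easy enough to factor out by naming the intermediate value, at some further cost to one-liner-ness:

```swift
// Same tuple-threading idea, but each new total is computed once:
let runningTotal = reduce(array, (start, [start])) { (acc, x) -> (Int, [Int]) in
    let next = acc.0 + x
    return (next, acc.1 + [next])
}.1
```

Still not pretty, but at least the combine function now does only one addition per element.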

Speaking of performance, how about a version that uses map instead of reduce?

var runningValue = start
let runningTotal = [start] + map(array) {
    runningValue += $0
    return runningValue
}

“Ick, var”, you say. But you might change your mind if you ran a benchmark. This version screams ahead of the reduce version in terms of performance for arrays larger than a handful of elements. This is not surprising – Swift arrays aren’t lists. Concatenating two arrays is, in the absence of some kind of fancy optimization, O(n), and you’re doing that n times. So the innocent-looking reduce(array,[],+) version of flatmap is quadratic, and maybe best left as an educational example rather than something you actually do.

Anyway, if you must use a mutable variable, you can make yourself feel better by putting it in a generic function you know is safe and can use again and again. A function to produce a list of running totals is popular enough to find a place in plenty of standard libraries.

In Haskell, it’s called scanl. The Typelift Basis library has a very attractive recursive definition (which uses some other functions defined in Typelift, but you can probably get the idea):

public func scanl<B, A>(f : B -> A -> B) -> B -> [A] -> [B] {
    return { q in { ls in
        switch destruct(ls) {
            case .Empty:
                return q <| []
            case .Destructure(let x, let xs):
                return q <| scanl(f)(f(q)(x))(xs)
        }
    } }
}

As pretty as this is, relying on this kind of recursion in Swift as of now is probably also going to lead to performance problems. So here’s a more imperative implementation. In keeping with similar naming for reduce, I’ll call it accumulate, which is the name it has in Python:

// A version that returns any type that conforms to ExtensibleCollectionType
func accumulate
  <S: SequenceType, C: ExtensibleCollectionType>
  (source: S, var initial: C.Generator.Element, combine: (C.Generator.Element, S.Generator.Element) -> C.Generator.Element)
  -> C {
    var result = C()
    result.append(initial)
    for x in source {
        initial = combine(initial, x)
        result.append(initial)
    }
    return result
}

// And a default version that returns an Array, as with map
func accumulate<S: SequenceType, U>
  (source: S, initial: U, combine: (U, S.Generator.Element)->U)
  -> [U] {
    return accumulate(source, initial, combine)
}
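With that in place, the original calculation becomes a one-liner with neither a `!` nor a visible var:

```swift
// Produces the same running totals as the reduce version above:
let runningTotal = accumulate(array, start, +)
// [0, 1, 3, 6]
```

The mutation still happens, but it is contained inside a small generic function that can be tested once and trusted thereafter.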

So there you go. I’ve started putting the reusable functions found on this blog up in a library on GitHub, SwooshKit, that you might want to take a look at.


  1. The actual code was to compute the coordinates of the snake’s body from the head through the tail. 
  2. This also has the benefit of forcing you to put the sort at the end, where it’s sorting a filtered list which ought to be quicker. Unfortunately it introduces a different performance problem, can you spot it? 
  3. Thanks to @nnnnnnnn, @jl_hfl, @CodaFi_, @rob_rix, @inthehands and @maxbucknell for all their helpful replies.