A Swift Spelling Corrector

Posted on May 2, 2015May 2, 2015 by airspeedvelocity

One of my favourite blog posts is Peter Norvig’s How to Write a Spelling Corrector, in which he writes 21 lines of python that sucks in a large corpus of text, and then uses some probability theory to propose spelling corrections. Go read it now, it’s cool.

Anyway, that code has been translated into multiple different programming languages, but not Swift, so let’s give that a go. The resulting code is maybe not as compact as the python equivalent, but to my eyes that’s not such a bad thing, as the python is a bit write-only for my liking.¹

List Comprehensions and Swift

The heart of the code is this function, which generates a set of possible alternatives for a word:

def edits1(word):
   splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
   deletes    = [a + b[1:] for a, b in splits if b]
   transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
   replaces   = [a + c + b[1:] for a, b in splits for c in alphabet if b]
   inserts    = [a + c + b     for a, b in splits for c in alphabet]
   return set(deletes + transposes + replaces + inserts)

This makes extensive use of python’s list comprehensions, which Swift lacks, so we’ll have to make do with map etc.

First, splits. The code takes a string, and generates an array of all the ways it can be split into two. So hello becomes (, hello), (h, ello), (he, llo), (hel, lo), (hell, o).

One way to do this in Swift would be to use map over the indices of the word, slicing it in two at each index, like so:

let splits = indices(word).map {
    (word[word.startIndex..<$0], word[$0..<word.endIndex])
}

This works – but those word.startIndex and word.endIndex are a bit cluttering. The python version uses a handy one-sided slicing operation that looks a bit neater. But we can easily extend String to have the same functionality. We just need to create a pre- and postfix version of the range operator, and then have that return a type that indicates a one-sided range:

// even though infix ..< already exists, we need to declare it
// two more times for the prefix and postfix form
postfix operator ..< { }
prefix operator ..< { }

// then, declare a couple of simple custom types that indicate one-sided ranges:
struct RangeStart<I: ForwardIndexType> { let start: I }
struct RangeEnd<I: ForwardIndexType> { let end: I }

// and define ..< to return them
postfix func ..<<I: ForwardIndexType>(lhs: I) -> RangeStart<I> 
{ return RangeStart(start: lhs) }

prefix func ..<<I: ForwardIndexType>(rhs: I) -> RangeEnd<I> 
{ return RangeEnd(end: rhs) }

// finally, extend String to have a slicing subscript for these types:
extension String {
    subscript(r: RangeStart<String.Index>) -> String { 
        return self[r.start..<self.endIndex]
    }
    subscript(r: RangeEnd<String.Index>) -> String { 
        return self[self.startIndex..<r.end] 
    }
}

This is a lot of effort to go to for just this one case, but probably useful to stash in a library somewhere for reuse. Armed with this, we can simplify the splits function:

// hello becomes (,hello), (h,ello), (he,llo) etc.
let splits = indices(word).map {
    (word[..<$0], word[$0..<])
}

OK now for each of the transformations. Deleting a single character is easy, just drop the first character from the right-hand of each split and join them back up:

// (h, ello) becomes hllo
let deletes = splits.map { $0.0 + dropFirst($0.1) }

Next, transposition. We want to generate every string in which two adjacent characters in the original string are swapped. This can be done by taking the first two characters of the right side of each split, swapping them, and joining it all back together. The tricky part is that not all right-hand sides will have two characters. The python code uses integer indexing and the fact that you can slice beyond the index range of a python string and just get back an empty string. Both of these are a non-starter in Swift, so the code is going to need to be a little more verbose:

// (h, ello) becomes hlelo
let transposes: [String] = map(splits) { left, right in
    if let fst = first(right) {
        let drop1 = dropFirst(right)
        if let snd = first(drop1) {
            let drop2 = dropFirst(drop1)
            return "\(left)\(snd)\(fst)\(drop2)"
        }
    }
    return ""
}.filter { !$0.isEmpty }

Unlike with delete, there’s not much to be done to neaten this up. There are helper functions that might make it a bit better (like dropTwo or firstTwo) but it’ll still look a bit messy. If anyone has some good suggestions, tweet me

Replacement requires a map within a map. We want to replace every character with another character from the alphabet. With each split, this can be done by dropping the first character of the right-hand side, and inserting each letter of the alphabet. If we do this with map, we’ll get an array of possible strings, but then if we map that over each of the splits, we’ll have a nested array. So instead, we can use flatMap, as introduced in Swift 1.2, which flattens out the arrays returned from each inner map into a single array of values:

let alphabet = "abcdefghijklmnopqrstuvwxyz"

let replaces = splits.flatMap { left, right in
    // (he, llo) becomes healo, heblo, heclo etc
    map(alphabet) { "\(left)\($0)\(dropFirst(right))" }
}

Finally, insertion is just the same as replace, only without dropping the first character of the right-hand side:

let inserts = splits.flatMap { left, right in
    // (he, llo) becomes heallo, hebllo, hecllo etc
    map(alphabet) { "\(left)\($0)\(right)" }
}

Reading the corpus

Next, we need to be able to load in a large text file, break it up into individual words, and then count the frequence of each word. This will be used to pick the most likely word in the case of misspellings. There are many good ways to do this, especially given useful Foundation classes like NSRegularExpression and NSCharacterSet. But since this is a Swift blog, let’s try for something almost entirely in Swift, just using the Foundation extension to String to read in the file:

struct SpellChecker {
    var knownWords: [String:Int] = [:]

    mutating func train(word: String) {
        knownWords[word] = knownWords[word]?.successor() ?? 1
    }    

    init?(contentsOfFile file: String) {
        if let text = String(contentsOfFile: file, encoding: NSUTF8StringEncoding, error: nil)?.lowercaseString {
            let words = split(text.unicodeScalars) { !("a"..."z").contains($0) }.map { String($0) }
            for word in words { self.train(word) }
        }
        else {
            return nil
        }
    }    
}

This code is rather shamefully English-language-centric. It may also cope partially with other Latin-based languages, but more through luck than anything else (for example, ("a"..."z").contains("é") is true, but ("a"..."z").contains("ß") is false).

Why split the unicodeScalars and then restringify them, rather than split the string itself directyly? Because splitting Swift strings is currently painfully slow. Swift makes heroic efforts to be correct regarding Unicode, such as presenting grapheme clusters as single characters. But this comes at a price, and since we’ve already acknowledged that we are really living in ASCII-world, it’s probably not a price worth paying.

Correcting spelling

Finally, we’re ready to write the spelling corrector. We need a function that takes a sequence of possible words, and filters out those that aren’t known words:

extension SpellChecker {
    func known<S: SequenceType where S.Generator.Element == String>(words: S) -> Set<String>? {
        let s = Set(filter(words) { self.knownWords.indexForKey($0) != nil })
        return s.isEmpty ? nil : s
    }
}

and a function that takes a word, and generates a list of known words from edits of edits (i.e. a distance of 2 from the original word):

extension SpellChecker {
    func knownEdits2(word: String) -> Set<String>? {
        var known_edits: Set<String> = []
        for edit in edits(word) {
            if let k = known(edits(edit)) {
                known_edits.unionInPlace(k)
            }
        }
        return known_edits.isEmpty ? nil : known_edits
    }
}

Why do these functions return an optional Set when it’s empty? Well, the goal is to go first for the word itself if it’s known, then the most likely word with a distance of 1 (i.e. edit(word)), then, if there’s no such word, the most likely word with a distance of 2 (i.e. knownEdits2(word)).

In the python version, this is done with or, which chooses the first non-empty set (i.e. [1,2] or [3,4] returns [1,2], but [] or [3,4] returns [3,4]). Swift doesn’t have this feature – but it’s very similar to how the nil-coalescing operator works. So if known and knownEdits2 return optionals, you can write the correcting function like this:

extension SpellChecker {
    func correct(word: String) -> String {
        let candidates = known([word]) ?? known(edits(word)) ?? knownEdits2(word)

        return reduce(candidates ?? [], word) {
            (knownWords[$0] ?? 1) < (knownWords[$1] ?? 1) ? $1 : $0
        }
    }
}

This is possibly abuse of the ?? operator, and perhaps we’d be better off overloading || to have the same behaviour as python’s or for collections. Still, it works.

The reduce is just searching for the value in candidates with the maximum frequency in the knownWords model. In the python version, the dictionary has a default value of 1 – that is, if a word isn’t known, it’s assumed to have a frequency of 1. This accounts for rare words that are correct, but not in the corpus. Swift doesn’t have dictionary defaulting, so we can use ?? to replace the not-found nil value with 1 instead.

And that’s pretty much it. There are lots of speed improvements that could be made by sacrificing the brevity of the code, and the original blog article has lots of ideas for enhancing this to make it better functionally. Here’s the full code, which you should be able to compile as a single file with swiftc to build a stand-alone executable. Let me know if you spot any bugs, I’m sure there are a couple!

/// Translation of [Peter Norvig's spell checker](http://norvig.com/spell-correct.html) into Swift.
/// Sample input corpus [here](http://norvig.com/big.txt)

import Foundation   // purely for IO, most things done with Swift.String

/// Given a word, produce a set of possible alternatives with
/// letters transposed, deleted, replaced or rogue characters inserted
func edits(word: String) -> Set<String> {
    if word.isEmpty { return [] }

    let splits = indices(word).map {
        (word[word.startIndex..<$0], word[$0..<word.endIndex])
    }

    let deletes = splits.map { $0.0 + dropFirst($0.1) }

    let transposes: [String] = map(splits) { left, right in
        if let fst = first(right) {
            let drop1 = dropFirst(right)
            if let snd = first(drop1) {
                let drop2 = dropFirst(drop1)
                return "\(left)\(snd)\(fst)\(drop2)"
            }
        }
        return ""
    }.filter { !$0.isEmpty }

    let alphabet = "abcdefghijklmnopqrstuvwxyz"

    let replaces = splits.flatMap { left, right in
        map(alphabet) { "\(left)\($0)\(dropFirst(right))" }
    }

    let inserts = splits.flatMap { left, right in
        map(alphabet) { "\(left)\($0)\(right)" }
    }

    return Set(deletes + transposes + replaces + inserts)
}


struct SpellChecker {

    var knownWords: [String:Int] = [:]

    mutating func train(word: String) {
        knownWords[word] = knownWords[word]?.successor() ?? 1
    }    

    init?(contentsOfFile file: String) {
        if let text = String(contentsOfFile: file, encoding: NSUTF8StringEncoding, error: nil)?.lowercaseString {
            let words = split(text.unicodeScalars) { !("a"..."z").contains($0) }.map { String($0) }
            for word in words { self.train(word) }
        }
        else {
            return nil
        }
    }

    func knownEdits2(word: String) -> Set<String>? {
        var known_edits: Set<String> = []
        for edit in edits(word) {
            if let k = known(edits(edit)) {
                known_edits.unionInPlace(k)
            }
        }
        return known_edits.isEmpty ? nil : known_edits
    }

    func known<S: SequenceType where S.Generator.Element == String>(words: S) -> Set<String>? {
        let s = Set(filter(words) { self.knownWords.indexForKey($0) != nil })
        return s.isEmpty ? nil : s
    }

    func correct(word: String) -> String {
        let candidates = known([word]) ?? known(edits(word)) ?? knownEdits2(word)

        return reduce(candidates ?? [], word) {
            (knownWords[$0] ?? 1) < (knownWords[$1] ?? 1) ? $1 : $0
        }
    }
}


// main()

if Process.argc == 2, let checker = SpellChecker(contentsOfFile: Process.arguments[1]) {

    println("Type word to check and hit enter")
    while let word = NSString(data: NSFileHandle.fileHandleWithStandardInput().availableData,
                              encoding: NSUTF8StringEncoding) as? String
        where last(word) == "\n"
    {
        let word = dropLast(word)
        let checked = checker.correct(word)

        if word == checked {
            println("\(word) unchanged")
        }
        else {
            println("\(word) -> \(checked)")
        }
    }

}
else {
    println("Usage: (Process.arguments[0]) <corpus_filename>")
}

It’s a continuum of course. Some would consider the code in this post unreadably terse. Personally I don’t think making code longer always makes it more clear. ↩

Protocols and Generics

Posted on March 26, 2015March 28, 2015 by airspeedvelocity

In a talk last week at Swift Summit, I spoke a bit about Swift and performance, and about how structs and generic functions in Swift allow you to write high-level code without paying a significant performance penalty. This built on the example I posted a couple of weeks back about sorting nibbles in Swift.

Towards the end, I gave a couple of lines of code that seemingly behave the same, but are actually subtly different:

// given a protocol with no Self or associated type requirements, 
// such as Printable (one of the few like this in the standard library)

// how is this...
func f(x: Printable) { }

// different from this...
func g<T: Printable>(x: T) { }

This article is going to look at the differences between the two. Warning – some of the behaviour below is undocumented, and subject to change. All the code has been tested on Swift 1.2 beta 4, but may behave differently in future versions. Also, all size numbers assume a 64-bit platform.

Functional Differences

So what is the difference between the two? In the first case, the function takes an argument with the type of the protocol Printable, and can then access the argument value via any of the methods that protocol provides.

In the second case, g takes an argument of any type T that conforms to Printable. This means T is the type of whatever was passed in as the argument. If you pass in an Int, then T is an Int. If you pass in an array of integers, then T is an [Int].

For most purposes, this distinction doesn’t matter when implementing the body of the function. Even though T might be an Int or an [Int], you can’t use any properties not guaranteed by Printable because the function needs to be valid for any kind of type that conforms to Printable. So you can’t call x.successor() or x.count. Only x.description.

The most important functional difference between the two forms comes when the function returns a value. Suppose you wanted to implement your own version of the standard library’s abs function. It could look something like this:

func myAbs<T: SignedNumberType>(x: T) -> T {
    if x < 0 {
       return -x
    }
    else {
        return x
    }
}

// myAbs works for any kind of signed number 
// (e.g. Int, Int64, Float, Double etc)
let i: Int8 = -4
let j = myAbs(i)
// j will be of type Int8, with value 4

This function relies on 3 things provided by SignedNumberType: a negation operator, a less-than operator (via SignedNumberType conforming to Comparable), and the ability to create a zero of the same type for comparison (via IntegerLiteralConvertible). It compares to zero and then returns the original value or its negation. Importantly, it returns the same type that was passed in. If you pass in an Int8, you get back an Int8. If you passed in a Double, you get back a Double. If this function were implemented using protocols,¹ you’d get back the type of the protocol. Not only would that be inconvenient for type inference purposes, you’d also probably need to cast the value back to the type you wanted using as! in order to make use of it.

Poking at Protocols

So aside from that, how else do the two functions differ? Well, there are still some ways you can tell the difference between the protocol and the T placeholder. For example, you can look at the size of the value:

func takesProtocol(x: Printable) {
    // this will print "40" every time:
    println(sizeofValue(x))
}

func takesPlaceholder<T: Printable>(x: T) {
    // this will print whatever the size of
    // the argument type is (for example, Int64
    // is 8, Int8 is 1, class references are 8)
    println(sizeofValue(x))
}

takesProtocol(1 as Int16)      // prints 40
takesPlaceholder(1 as Int16)   // prints 2

class MyClass: Printable {
    var description: String { return "MyClass" }
}

takesProtocol(MyClass())     // prints 40
takesPlaceholder(MyClass())  // prints 8

So it looks like Printable is some kind of fixed-sized box that holds any kind of value that is printable. This kind of boxing is quite a common feature in other languages like Java and C#. But here even references to classes are put inside this 40-byte box, which might surprise you if you‘re used to thinking of protocols as like references to pure virtual base classes. This Swift box is geared up to hold both value and reference types.

edit: as @norio_nomura points out, class-only protocols are smaller, at 16 bytes, as they never need to hold a larger-sized payload.

So what’s inside those 40 bytes? Mike Ash had a great series of posts in his Friday Q&A column about poking around inside Swift’s memory layout. Here we’re just going to do a much shorter bit of code to look into contents of the protocol:

// function to dump out the contents of a protocol variable
func dumpPrintable(var p: Printable) {
    // you could also do this with unsafeBitCast
    withUnsafePointer(&p) { ptr -> Void in
        let intPtr = UnsafePointer<Int>(ptr)
        for i in stride(from: 0, to: (sizeof(Printable)/8), by: 1) {
            println("\(i):\t 0x\(String(intPtr[i], radix: 16))")
        }
    }
}

let i = Int(0xb1ab1ab1a)
dumpPrintable(i)

// prints out:
0:       0xb1ab1ab1a
1:       0x7fff5ad10f48
2:       0x2000000000
3:       0x10507bfa8
4:       0x105074780

With a single 8-byte integer, the protocol appears to pack the value into the protocol value itself. The next two words look like uninitialized memory (their contents vary on each run), and then the last two are pointers to some metadata about the type that is actually being held. (Mike’s article goes into more detail about this metadata).

If you create a custom struct of size 16 or 24, this is also held within the first 3 words of the protocol. But if you go above this, it switches over to holding a pointer to the referenced value:

struct FourInts: Printable {
    var a = 0xaaaa
    var b = 0xbbbb
    var c = 0xcccc
    var d = 0xdddd
    var description: String { return toString(a,b,c,d) }
}

dumpPrintable(FourInts(), FourInts.self)

// prints out:
0:       0x7f8b5840fb90  // this is a pointer to a FourInts value
1:       0xaaaa          // uninitialized garbage (?)
2:       0xbbbb          // ditto
3:       0x10dde52a8     // metadata
4:       0x10dde51b8     // metadata

How to know that first part is a pointer to a FourInts type? Well, you can dereference it and see! We need to amend our dumping function slightly to tell it what the real type the protocol is referencing is:

func dumpPrintable<T>(var p: Printable, type: T.Type) {
    withUnsafePointer(&p) { ptr -> Void in
        let intPtr = UnsafePointer<Int>(ptr)
        for i in stride(from: 0, to: (sizeof(Printable)/8), by: 1) {
            println("\(i):\t 0x\(String(intPtr[i], radix: 16))")
        }
        // if the pointed-to value is to big to fit:
        if sizeof(T) > 24 {
            // we have an integer, and we want to turn it into a pointer,
            // so we use the bitPattern: constructor of UnsafePointer<T>
            let valPtr = UnsafePointer<T>(bitPattern: Int(intPtr.memory))
            // and now we can look at the value at that location in memory
            println("value at pointer: \(valPtr.memory)")
        }
    }
}

dumpPrintable(FourInts(),FourInts.self)

// prints out:
0:       0x7f8b5840fb90
1:       0x7fff909c5395
2:       0xaaaa
3:       0x10dde52a8
4:       0x10dde51b8
value at pointer: (43690, 48059, 52428, 56797)

Bingo! Those are the values of the 4 integers.

One final point before we move on. Just because you switch to referring to a value type using a protocol, this does not turn it into a reference type:

protocol Incrementable {
    var x: Int { get }
    mutating func inc()
}

struct S: Incrementable {
    var x = 0
    mutating func inc() {
        ++x
    }
}

var p1: Incrementable = S()
var p2 = p1
p1.inc()
p1.x  // now 1
p2.x  // still 0

Performance Implications

OK, so protocols used like this seem to add some level of indirection. Does that cost us anything compared to using the generic placeholder approach? To test this out, we can construct some trivial structs that perform a basic operation, and then attempt to run that operation multiple times via both a protocol and a generic placeholder.

First here’s the protocol:

protocol NumberGeneratorType {
    mutating func generateNumber() -> Int
}

We’re pretty restricted in what can be done without resorting to associated types, so all it does is spit out a number. Here are 3 implementions that do different things, along with two harnesses that iterate multiple times, totalling the returned numbers:

struct RandomGenerator: NumberGeneratorType {
    func generateNumber() -> Int {
        return Int(arc4random_uniform(10))
    }
}

struct IncrementingGenerator: NumberGeneratorType {
    var n: Int
    init(start: Int) { n = start }
    mutating func generateNumber() -> Int {
        n += 1
        return n
    }
}

struct ConstantGenerator: NumberGeneratorType {
    let n: Int
    init(constant: Int) { n = constant }
    func generateNumber() -> Int {
        return n
    }
}


func generateUsingProtocol(var g: NumberGeneratorType, count: Int) -> Int {
    return reduce(stride(from: 0, to: count, by: 1), 0) { total, _ in
        total &+ g.generateNumber()
    }
}

func generateUsingGeneric<T: NumberGeneratorType>(var g: T, count: Int) -> Int {
    return reduce(stride(from: 0, to: count, by: 1), 0) { total, _ in
        total &+ g.generateNumber()
    }
}

Running the above, compiled with -O, with a count of 10k iterations, gives the following timings:

(the full code with the timing can be found here)

Generic rand          261829µs
Protocol rand         286625µs
Generic incr            5481µs
Protocol incr          45094µs
Generic const              0µs
Protocol const         43666µs

So what does this tell us? Calling arc4random is quite expensive, so the marginal difference made by the protocol is negligible but noticeable. But in the case of the incrementing generator, it’s a lot more in proportion to the actual operation being performed.

And in the case of the constant generator, the compiler has managed to optimize away all the code of the generic version and turn it into a single multiply operation (number of iterations times the constant)! So it takes a constant tiny amount of time. But the protocol indirection acted as a barrier to this same optimization.

In fact if you recompile with -Ounchecked, the same happens with the incrementing generator too – it’s only the check for overflow on the increment that was preventing it. The protocol versions remain the same.

For the most part, much of this can probably be put in the “premature optimization” camp. The big wins are really more about the expressiveness generics give you, as seen in the previous article about sorting, and avoiding things like type erasure shown in the earlier section. But if you’re in the habit of writing generic functions, the performance gains are nice too. And when writing re-useable library functions like sort or reduce that are going to be called a lot and perform tight loops, they’re critical.

Of course, all this is subject to change – given that the protocols are not being used to get any kind of dynamic behaviour, perhaps the compiler could have optimized them too. But it doesn’t appear to at the moment.

Dynamic Dispatch

Speaking of dynamic behaviour – that might be one occasion when you might prefer to use protocols rather than generics.

Consider the following code:

func f(p: Printable) {
    println(p)
}

func g<T: Printable>(t: T) {
    println(t)
}

let a = [1,2,3]
let i = 1

// this will work fine: either can be converted
// to Printable and passed in, then the appropriate
// version of `description` is called at runtime
f(arc4random()%2 == 0 ? a : i)

// this will _not_ compile.  T needs to be fixed as
// either an array of ints, or an int at compile time, 
// it can't be both
g(arc4random()%2 == 0 ? a : i)

A contrived example obviously, but it shows how even structs in Swift can get dynamic behaviour at runtime via protocols, that you might find a use-case for.

There’s lots more to explore with generics. The mental model of “Swift just writes a version of your function for each different type” isn’t quite how it works in practice. And associated types and constraints between multiple placeholders allow you to create kinds of generic algorithms that would be very difficult to achieve with protocols. But this article is already pretty long so I’ll leave those for another day.

In fact you couldn’t implement this function using protocols rather than generics, because SignedIntegerType uses Self and (via IntegerLiteralConvertible) associated types. A topic for a different post. ↩

Changes to the Swift standard library in 1.2 betas 2 and 3

Posted on March 13, 2015March 26, 2015 by airspeedvelocity

Swift v1.2 beta 2 and 3 bring quite a few new enhancements, many of which build on what was introduced in beta 1.

The if let syntax continues to evolve – as of beta 2, you can check for a boolean condition and, if it’s true, then proceed to unwrap values. Nate Cook has a good example of this in his NSScanner extension:

// NSScanner.scanCharactersFromSet returns the scanned string
// by assigning it to a pointer you pass in:
var value: NSString? = ""

// The method returns true only if a string was successfully
// scanned, so check that first:
if scanner.scanCharactersFromSet(set, intoString: &value),
   // then if true, unwrap the value and convert to a String:
   let str = value as? String 
{
    // now use the valid scanned String as str
}

Flat Map

Continuing the theme of language features that help with handling optionals: beta 3 introduces flatMap. This is a map operation, followed by a flattening operation. The release notes have a good example of its use when you “want to chain optionals with functions”, so let’s drill into that a little.

The existing map allows you to apply a function to the value inside an optional, if that optional is non-nil. For example, suppose you have an optional integer i and you want to double it. You could write i.map { $0 * 2 }. If i has a value, you get back an optional of that value doubled. On the other hand, if i is nil, no doubling takes place.

Now, suppose instead of doubling an integer, you wanted to perform a mapping on an optional that could itself return an optional. The example in the release notes is find, which searches an array for a value and returns the index if it’s found – or nil if not found:

// an array of arrays
let arr = [[1,2,3],[4,5,6]]
// .first returns an optional of the first element of the array
// (optional because the array could be empty, in which case it's nil)
let fst = arr.first  // fst is now [Int]?, an optional array of ints
// now, if we want to find the index of the value 2, we could use map and find
let idx = fst.map { find($0, 2) }

The trouble with this is, idx will be an Int??, because fst was already optional, and then we mapped it with find, that also returns an optional – so now it’s an optional wrapped in an optional. What we want is to “flatten” this nested optional – so instead of map we use flatMap:

// flatMap means idx will be an Int?, not an Int??
let idx = fst.flatMap { find($0, 2) }

Here’s another use of flatMap: since we have an optional array, you could use it to get the first element of the first element:

// getting the first element of an optional array
let innerFst = fst.flatMap(first)

Now, if you’ve been programming Swift for a while, you might say hang on, since arrays have a first method, I could use optional chaining:

// using optional chaining to get the first element of an optional array
let innerFst = fst?.first

And at this point, hopefully it’s clear that flatMap and optional chaining do very similar things. Optional chaining essentially is a compact version of flatMap that only works on methods. When you want the same functionality as optional chaining, but instead with a function you need to pass the optional value into (like with find) instead of calling a method on that value, you need flatMap.

That’s not all flatMap does – it’s also a method on arrays (think of a mapping function that turns elements into arrays of elements), and there’s a generic free function version that works on sequences and collections. I won’t go into those here – I’m sure there will be a flood of Swift blogs coming up that will cover it in more detail. Just grab one as it drifts by, read, and experiment.¹

Grab Bag

There are a few other changes to the standard library across the two releases:

The precendence of the ?? has been bumped from 110 to 131. This means it now has higher precedence than equality (130). So i == j ?? 0 now replaces a nil value for j with 0 prior to equating it to i. Note, it still doesn’t trump arithmetic operators, so unbracketed monstrocities like i + j ?? 0 won’t compile.
Slice has been renamed ArraySlice, making its purpose clearer (strings, for example, are their own slice type). Other than the name, nothing else has changed (except the additional flatMap).
&/ and &% are gone, as mentioned in the release notes.
Following its introduction in beta 1, various function arguments have been marked @noescape. Examples include various withUnsafeXXXPointer functions, and various map, reduce and filter functions. Of course, lazy‘s map is not marked like this because it does capture self.
Similar to UnsafePointer, AutoreleasingUnsafeMutablePointer and CFunctionPointer have lost their static null instance.
But they have acquired conformance to CVarArgType, as does UnsafePointer
Double and Float, on the other hand, are now conforming to _CVarArgPassedAsDouble, a specialization of CVarArg. This is to enable x86_64-specific handling apparently.
Character is now DebugPrintable
If you want a “how to speak Swift” tip, in various documenting comments, things are now called “values” rather than “objects”.
String.Index, String.UTF16Index and String.UTF8Index now have methods for converting between each other.
String.UnicodeScalarView now conforms to ExtensibleCollectionType and RangeReplaceableCollectionType, and its index is now bidirectional and comparable, giving it a lot of the same capabilities of the String itself.
UInt8 has a constructor from an ASCII UnicodeScalar (with a warning that the scalar better actually be ASCII!)
In a boon to reckless optimizers everywhere, unsafeUnwrap will now force-unwrap your optional without safety-checking non-nilness first.

Trailing Closures and Default Arguments

A subtle change in the language means that the split function has had its arguments reordered.

Previously, its second parameter was the isSeparator function, followed by a couple of defaulted arguments (maxSplit and allowEmptySlices). But as of Swift 1.2, when you supply a function via the “trailing closure syntax”, that trailing closure is always the last parameter. So if you tried to call the old split without supplying those defaulted arguments, you’d get an error, because it isn’t the last paramter, it’s the second. The new split makes the isSeparator function the last parameter.

While we’re talking about default arguments, did you know you that you can supply them in any order?

func f(a: String = "A", b: String = "B") {
    println("(a) (b)")
}

f(b: "bee")             // prints A bee
f(b: "Bee!", a: "Eh?")  // prints Eh? Bee!

Signed Mismatches

OK this last one’s a bit language-lawyerly so if that sort of stuff bores you, you’ve been warned!

You might be surprised to find that, up until Swift 1.2, the following code would compile:

let x: UInt = 5
let y: Int = 6
let z = x + y
// z will be a UInt of value 11

Given that Swift is usually so strict about not mixing types, how come this is allowed? It comes down to the following definition for + that appeared way back in Swift 1.0 beta 4:

// if a type is strideable, you can add its stride type to it
func +(lhs: T, rhs: T.Stride) -> T

// and strides can be signed
protocol _Strideable {
    /// A type that can represent the distance between two values of `Self`
    typealias Stride: SignedNumberType
}

And since UInt is strideable, and defines its distance (and thus its stride) as a signed integer, that means you can add signed integers to unsigned integers.²

Swift 1.2 fixes this state of affairs by introducing two bits of code. First, it adds a typealias to the _UnsignedIntegerType protocol:

protocol _UnsignedIntegerType : _IntegerType {
    typealias _DisallowMixedSignArithmetic : SignedIntegerType = Int
}

Next, it adds a new overload for the + operator:

func +<T : _UnsignedIntegerType>(lhs: T, rhs: T._DisallowMixedSignArithmetic) -> T

The upshot of this is if you try and add an Int to a UInt, there are now two matching overloads for the + operator: the old one, and the new one, and Swift will complain that they’re ambiguous. Hopefully the fact that the compiler error mentions mixed-sign arithmetic will be enough of a hint you that this is the problem. A bit clunky, but it does the job.

At some point, someone will try to explain to you that flatMap is actually something called monadic bind. Don’t panic when they do – the most important thing is to understand what it does. This more abstract concept will come easily once you have a good grasp of optional and array versions of flatMap under your belt. When you’re ready, I’d recommend reading Rob Napier’s article and this series by Alexandros Salazar. ↩
Though not the other way around i.e. y + x wouldn’t compile – which is another problem, since people expect integer addition to be symmetric. ↩

Sorting Nibbles in Swift

Posted on March 8, 2015March 26, 2015 by airspeedvelocity

A couple of months back, John Regehr posted a challenge on his blog – write the fastest possible sorting algorithm for all the nibbles in a 64-bit word.

The problem is to sort the 4-bit pieces of a 64-bit word with (unsigned) smaller values towards the small end of the word. The nibble sort of 0xbadbeef is 0xfeedbba000000000. The function you implement will perform this sorting operation on a buffer of 1024 64-bit integers:

void nibble_sort(unsigned long *buf);

The challenge was in C, but it made me wonder about solving the same problem in Swift. Here’s a straight port of the supplied reference version in Swift:

func read_nibble(w: UInt64, i: UInt64) -> UInt64 {
    let res = w >> (i * 4)
    return res & 0xf
}

func write_nibble(inout w: UInt64, i: UInt64, v: UInt64) {
    var mask: UInt64 = 0xf
    mask <<= (i * 4);
    w &= ~mask
    var prom = v;
    prom <<= (i * 4)
    w |= prom
}

func nibble_sort_word_ref(var arg: UInt64) -> UInt64 {
    for var i: UInt64 = 0; i < 16; ++i {
        var min = i;
        for var j = i+1; j < 16; ++j {
            if read_nibble(arg, j) < read_nibble(arg, min) {
                min = j
            }
        }
        if (min != i) {
            var tmp = read_nibble(arg, i)
            write_nibble(&arg, i, read_nibble(arg, min))
            write_nibble(&arg, min, tmp)
        }
    }
    return arg;
}

func nibble_sort_ref(buf: UnsafeMutablePointer<UInt64>) {
    for var i=0; i<1024; i++ {
        buf[i] = nibble_sort_word_ref(buf[i])
    }
}

The first bit of good news is that on my machine, when compiled with -Ounchecked, this version performs with pretty much the same speed as the original C version. Sticking with -O instead gives up about ~20% performance, which is not too bad for the sake of safety.

So, this version has two functions to read and write the nibble at a given index, and a function to sort it using them. Get/set at a given index? That sounds a lot like a collection type. What if we were to wrap the word in a collection, and use subscript instead of the read/write functions? Something like this:

struct NibbleCollection: MutableCollectionType {
    var val: UInt64
    
    var startIndex: UInt64 { return 0 }
    var endIndex: UInt64 { return 16 }
    
    subscript(idx: UInt64) -> UInt64 {
        get {
            return (val >> (idx*4)) & 0xf
        }
        set(n) {
            let mask = 0xf << (idx * 4)
            val &= ~mask
            val |= n << (idx * 4)
        }
    }
    
    typealias Generator = IndexingGenerator<NibbleCollection>
    func generate() -> Generator { return Generator(self) }
}

Once you have this, you can then use the standard library sorting function on it to implement the challenge:

func nibble_sort_word_col(arg: UInt64) -> UInt64 {
    var col = NibbleCollection(val: arg)
    sort(&col)
    return col.val
}

“Does all this Collection overhead mean it won’t be as fast as the raw function implementation?” you might worry. But there really is no overhead – since NibbleCollection is a struct, not a class, it really isn’t anything more than the raw underlying UInt64 and some functions. All of the scaffolding sloughs away at compile time, leaving you with pretty much identical code to the function-based version.

With one big difference: Swift.sort is way better than the reference sort implementation above.¹ So this version outperforms it, both in Swift and in C. With -O, the collection version is faster than the Swift reference, and the same speed as the C version. With -Ounchecked it outperforms the C version by about 25%.

Which tells you two things you should bear in mind when writing Swift:

Structs implementing protocols, and generic functions, are a good way of getting abstraction without giving up performance.
Always check if the code you want exists as a library function. Using it won‘t just save time and make your code clearer, it could be better optimized than the version you might write.

Of course, the competitive entries in C were orders of magnitute faster than the reference version, using bit-twiddling techniques like bucket sorts and lookup tables that take advantage of the problem being constrained to values less than 16 held in a single word. But since Swift also supports most of the operations they use, many of them can be ported too. A quick and dirty translations I did of a couple of the simpler ones suggests that, like the reference version, Swift can match or almost match them in performance too.

Naming the two sort algorithms in question is an exercise for the reader. They might not be the one you’d guess first. ↩

Changes to the Swift Standard Library in 1.2 beta 1

Posted on February 11, 2015March 26, 2015 by airspeedvelocity

Swift v1.2 is out, and, well, it’s pretty fab actually.

The most exciting part for me is the beefed-up if let syntax. Not only does it allow you to unwrap multiple optionals at once, you can define a later let in terms of an earlier one. For instance, here are 3 functions, each one of which takes a non-optional and returns an optional, chained together:

if let url = NSURL(string: urlString),
   let data = NSData(contentsOfURL: url),
   let image = UIImage(data: data) {
      // display image
}

The key here is that if at any of the preceding lets fail, the whole thing fails and the later functions are not called. If you’re already familiar with optional chaining with ?. this should come naturally to you. For more on this, see NSHipster’s article.

The new ability to defer initialization of let constants makes much of of my “avoiding var” post redundant, as you can now do this: ¹

let direction: Direction    
switch (char) {
    case "a": direction = .Left
    case "s": direction = .Right
    default:  direction = .Forward
}

It’d still be nice to have a form of switch that evaluates to a value – but with the new let, that’s really just a desire for some syntactic sugar (imagine using it inside a short closure expression, without needing any return).

Speaking of switch, the new if, with the ability to have a where clause on the bound variables, means if may now be suitable for several places where you were previously using switch. But there’s still a place for switch – especially when using it for full coverage of each case of an enum. And I feel a bit like pattern matching via ~= is one of switch’s powerful overlooked features that I keep meaning to write a post about. This release adds a ~= operator for ranges, not just intervals.

Library Additions

The big new addition to the standard library is of course the Set collection type. The focus on mathematical set operations is very nice, and also that those operations take any SequenceType, not just another Set:

let alphabet = Set("abcdefghijklmnopqrstuvwxyz")
let consonants = alphabet.subtract("aeiou")
// don't forget, it's "jumps" not "jumped"
let foxy = "the quick brown fox jumped over the lazy dog"
alphabet.isSubsetOf(foxy)  // false

Note that the naming conventions emphasize the pure versions of these operations, which create new sets, with each operation having a mutating equivalent with an InPlace suffix:

// create a new set
var newalpha = consonants.union("aeiou")
// modify an existing set
newalpha.subtractInPlace(consonants)

Note also for those fearful of the evil of custom operators, there’s no - for set subtraction etc. The one operator defined for sets is ==. Unlike with Array, sets are also Equatable (since they mandate their contents are Equatable, via Hashable).

But that’s not all. Here’s a rundown of some of the other library changes:

There‘s no longer a separate countElements call for counting the size of collections vs count for ranges – it’s now just count for both.
Dictionary‘s index, which used to be bi-directional, is now forward-only, as are the lazy collections you get back from keys and values.
The C_ARGC and C_ARGV variables are gone, as is the _Process protocol. In their place there’s now a Process enum with 3 static methods: arguments, which gives you a [String] of the arguments, and argc and argv with give you the raw C-string versions.
Protocols that used to declare class funcs now declare static funcs since the language now requires that.
Array, ClosedInterval and Slice now conform to a new protocol, _DestructorSafeContainer. This protocol indicates “a container is destructor safe if whether it may store to memory on destruction only depends on its type parameters.” i.e. an array of Int is known never to allocate memory when its destroyed since Int doesn’t, whereas a more complex type (with a deinit, or aggregating a type with one) may not be.
They also now conform to a new __ArrayType protocol. Two underscores presumably means you definitely definitely shouldn’t try and conform to this directly yourself.
The builtin-converting Array.convertFromHeapArray function is gone.
The pointer types (AutoreleasingUnsafeMutablePointer, UnsafePointer and UnsafeMutablePointer now conform to a new _PointerType protocol.
The CVaListPointer (the equivalent of C’s va_list) now has an init from an unsafe pointer.
EmptyGenerator now has a public initializer, so you can use it directly.
The various unicode types now have links in their docs to the relevant parts of unicode.org’s glossary
HeapBuffer and HeapBufferStorage are gone, replaced by the more comprehensive and documented ManagedBuffer and ManagedBufferPointer, of which more later.
ObjectIdentifier is now Comparable.
The enigmatic OnHeap struct is gone.
All the views (UnicodeScalarView, UTF8View and UTF8View) inside String are now Printable and DebugPrintable
String.lowercaseString and uppercaseString are back! I missed you guys.
The various convertFrom static functions in StringInterpolationConvertible are now inits.
UnsafePointer and UnsafeMutablePointer no longer have a static null() instance, presumably you should just use the nil literal.
There’s a new zip function, which returns a Zip2 struct – consistent with the lazy function that returns LazyThings, and possibly a gateway to ZipMoreThan2 structs in the future…

The Printable and DebugPrintable protocol docs now point out: if you want a string representation of something, use the toString protocol – in case you were trying to get fancy with if let x as? Printable { use(x.description) }. Which the language will now let you do, since as? and is now work with Swift protocols, which is cool. But still don’t do that – especially considering you may be sorry when you find out String doesn’t conform to Printable.

The Character type is now a struct, not an enum, and no longer exposes a “small representation” and “large representation”. This appears to go along with a performance-improving refactoring that has made a big difference – for example, a simple bit of code that sucks in /usr/share/dict/words into a Swift String and counts the word lengths now runs in a quarter of the time it used to.

Bovine Husbandry

Suppose you wanted to write your very own collection, similar to Array, but managing all the memory yourself. Up until Swift 1.2, this was tricky to do.

The memory allocation part isn’t too hard – UnsafeMutablePointer makes it pretty straightforward (and typesafe) to allocate memory, create and destroy objects in that memory, move those objects to fresh memory when you need to expand the storage etc.

But what about de-allocating the memory? Structs have no deinit, and that unsafely-pointed memory ain’t going to deallocate itself. But you still want your collection to be a struct, just like Array is, so instead, you implement a class to hold the buffer, and make that a member. That class can implement deinit to destroy its contained objects and deallocate the buffer when the struct is destroyed.

But you’re still not done. At this point, you’re in a similar position Swift arrays were in way back in the first ever Swift beta. They were structs, but they didn’t have full value semantics. If someone copies your collection, they’ll get a bitwise copy of the struct, but all this will do is copy the reference to the buffer. The two structs will both point to the same buffer, and updating the values in one would update both. Behaviour when the buffer is expanded would depend on your implementation (if the buffer expanded itself, they might both see the change, if the struct replaced its own buffer, they might not).

How to handle this? One way would be to be able to detect when a struct is copied, and make a copy of its buffer. But just as structs don’t have deinit, they also don’t have copy constructors – this isn’t C++!

So plan B – let the bitwise copy happen, but when the collection is mutated (say, via subscript set), detect whether the buffer is referenced more than once, and at that point, if it is, make a copy of the buffer before changing it. This solves the problem, and also comes with a big side-benefit – your collection is now copy-on-write (COW). Arrays in Swift are also COW, and this means that if you make a copy of your array, but then don’t make any changes, no actual copying of the buffer takes place – the buffer is only copied when it has to be, because you are mutating the values of some shared storage.

Since classes in Swift are reference-counted, it ought to be possible to detect if your buffer class is being referenced by more than one struct. But there wasn’t an easy Swift-native way to do this, until 1.2, which introduces isUniquelyReferenced and isUniquelyReferencedNonObjC function calls:

extension MyFancyCollection: MutableCollectionType {
  // snip...

  subscript(idx: Index) -> T {
    get {
      return _buffer[idx]
    }
    set(newValue) {
      if isUniquelyReferencedNonObjC(&_buffer) {
          // all fine, this buffer is solely owned
          // by this struct:
          _buffer[idx] = newValue
      }
      else {
          // uh-oh, someone else is referencing this buffer
          _buffer = _buffer.cloneAndModify(idx, newValue)
      }
   }
}

Great, COW collections all round. But before you run off and start implementing your own buffer class for use with your collection, you probably want to check out the ManagedBuffer class implementation, and a ManagedBufferPointer struct that can be used to manage it. These provide the ability to create and resize a buffer, check its size, check if its uniquely referenced, access the underlying memory via the usual withUnsafeMutablePointer pattern etc. The commenting docs are pretty thorough so check them out.

Just in case you want to try out this new feature in a playground – this technique does not work with globals. See this thread on the dev forums. ↩

More fun with implicitly wrapped non-optionals

Posted on January 2, 2015March 23, 2015 by airspeedvelocity

As we’ve seen previously, Swift will happily auto-wrap your non-optional value inside an optional temporarily if it helps make the types involved compatible. We’ve seen the occasional instance where this can be surprising. But for the most part, this implicit wrapping of non-optionals is your friend.¹ Here’s a few more examples.

Equating Optionals

The Swift standard library defines an == operator for optionals:

func ==<T : Equatable>(lhs: T?, rhs: T?) -> Bool

So long as your optional wraps a type that’s Equatable, that means you can compare two optionals of the same underlying type:

let i: Int? = 1
let j: Int? = 1
let k: Int? = 2
let m: Int? = nil
let n: Int? = nil

i == j  // true,  1 == 1
i == k  // false, 1 != 2
i == m  // false, non-nil != nil
m == n  // true,  nil == nil

let a: [Int]? = [1,2,3]
let b: [Int]? = [1,2,3]

// error, [Int] is not equatable
// (it does have an == operator, but that's not the same thing)
a == b

let f: Double? = 1.0

// ignore the weird compiler error,
// the problem is Int and Double aren't
// the same type...
i == f

Well, this is useful, but even more useful is this one definition of == allows you to compare optionals against non-optionals too. Suppose you want to know if an optional string contains a specific value. You don’t care if it’s nil or not – just whether it contains that value.

You can just pass your non-optional into that same == that takes optionals:

let x = "Elsa"

let s1: String? = "Elsa"
let s2: String? = "Anna"
let s3: String? = nil

s1 == x  // true
s2 == x  // false
s3 == x  // false

This works because Swift is implicitly wrapping x in an optional each time. If Swift didn’t do this, and you implemented an optional-to-non-optional operator yourself, you’d have to define == an extra two times, to make sure it was symmetric (which is a requirement whenever you implement ==):

// definitions if there weren't implicit optionals
func ==<T:Equatable>(lhs:T, rhs:T?)->Bool { return rhs != nil && lhs == rhs! }
func ==<T:Equatable>(lhs:T?, rhs:T)->Bool { return lhs != nil && lhs! == rhs }

Beware though, optional == isn’t all sunshine and unicorn giggles. Imagine the following code:

let a: Any = "Hello"
let b: Any = "Goodbye"

// false, of course
(a as? String) == (b as? String)

// oops, this is true...
(a as? Int) == (b as? Int)

Depending on how you think, this is either perfectly reasonable or immensley confusing, but either way you should bear it in mind. And maybe it’s another reason to go on your long list “Why not to use Any and as unless you really really have to.”

Comparing Optionals

Unsurprisingly, there’s a similar definition of < that will compare two optionals so long as what they wrap is Comparable:

func <<T : _Comparable>(lhs: T?, rhs: T?) -> Bool

If lhs and rhs are both some value, it compares them. Two nils are considered equal just like before. But a nil is always considered less than any non-nil value, (Comparable must always enforce a strict total order so they have to go somewhere well-defined in the ordering).

So if you sort a set of optionals, the nil values all end up at one end:

let a: [Int?] = [1, 3, nil, 2]
sorted(a, <)
// results in [nil, 1, 2, 3]

This is fairly intuitive, but it can lead to other subtle bugs. Suppose you want to filter a list like so:²

let ages = [
    "Tim":  53,  "Angela": 54,  "Craig":   44,
    "Jony": 47,  "Chris":  37,  "Michael": 34,
]

let people = ["Tim", "Johny", "Chris", "Michael"]

// don't trust anyone over 40
let trust = people.filter { ages[$0] < 40 }

// trust will contain ["Johny", "Chris", "Michael"]

The string "Johny" was not in the ages dictionary so you get a nil – but the way < works, nil is less than 40, so included in the filtered array.

Optional Subscripts

While we're talking about dictionaries, did you know that Dictionary.subscript(key) doesn't just return an optional, it also takes an optional?

var d = ["1":1]

let four = "4"
let nonsense = "nonsense"

// this will add ["4":4] to d
// remember toInt returns an optional
// (the string might not have been a number)
d[four] = four.toInt()

// nothing will be added, because
// the rhs is nil
d[nonsense] = nonsense.toInt()

// this will remove the "1" entry from d
d["1"] = nil

That's quite useful – but also, there wasn't much choice in it working like this.³ Dictionary.subscript(key) { set } has to take an optional, because when you define a setter, you must define a getter. And Dictionary's getter has to be optional (in case the key isn't present). So the setter's input value has to be.

This means every time you assign a value to a dictionary using a key subscript, your non-optional value is getting auto-wrapped in an optional. If this didn’t happen automatically, it would make assigning values to your dictionary a real pain, as you’d have to type d[key] = .Some(value) every time.

To see this in practice, here's a very bare-bones (very inefficient) implementation of a dictionary. Take a look at the subscript to see how the optional is handled:

public struct RubbishDictionary<Key: Equatable, Value> {

    public typealias Element = (Key, Value)

    // just store the key/value pairs in an unsorted
    // array (see, told you it was rubbish)
    private typealias Storage = [Element]
    private var _store: Storage = []

    // there's a reason why this is a private method,
    // which I'll get to later...
    private func _indexForKey(key: Key) -> Storage.Index? {
        // new year's resolution: see if I can avoid
        // calling array[i] completely in 2015
        for (idx, (k, _)) in Zip2(indices(_store),_store) {
            if key == k { return idx }
        }
        return nil
    }

    // Subscript for key-based lookup/storage
    public subscript(key: Key) -> Value? {
        // "get" is pretty simple, either key can be
        // found, or not...
        get {
            if let idx = _indexForKey(key) {
                // implicit optional wrap
                return _store[idx].1
            }
            else {
                return nil
            }
        }
        // remember, newValue is a Value?
        set(newValue) {
            switch (_indexForKey(key), newValue) {

            // existing index, so replace value
            case let (.Some(idx), .Some(value)):
                _store[idx].1 = value

            // existing index, nil means remove value
            case let (.Some(idx), .None):
                _store.removeAtIndex(idx)

            // new index, add non-nil value
            case let (.None, .Some(value)):
                _store.append((key, value))

            // new index, nil value is no-op
            case (.None,.None):
                break
            }
        }
    }
}

Finally, just a reminder that an optional containing an optional containing nil is not the same as nil:

var d: [Int:Int?] = [1:1, 2:nil]
// d now contains two entries, 1:1 and 2:nil
d[2] = nil  // removes nil
d[3] = .Some(nil)  // adds 3:nil

Full imlementation of `RubbishDictionary`

So, in case you've made it all the way down here, and you're interested, I thought I'd end with a fuller but minimal-as-possible implementation of RubbishDictionary. It has almost-feature-parity with Dictionary (it's just missing getMirror and init(minimumCapatity)). It ought to do everything Dictionary can, just slower.

If reading raw code's not your thing, skip it. But it shows how to simply implement several of the standard library's protocols such as CollectionType and DictionaryLiteralConvertible, as well as a few other features like lazily-evaluated key and value collections. I like how Swift extensions allow you to build up the type in logical groups (e.g. each protocol implementation one-by-one). Since this is a post about implicit optional wrapping, I've flagged where it’s also helping keep the code consise.

I have a nagging feeling there are some neater ways to handle some of the optionals – if you spot one, let me know. Oh and bugs. I'm sure there are a couple of them.

// building on the initial definition given above...

// An index type for RubbishDictionary.  In theory, you could perhaps just
// expose an alias of the storage type’s index, however, if the Key type were
// an Int, this would mean subscript(key) and subscript(index) would be
// ambiguous. So instead it's simple wrapper struct that does little more than
// pass-through to the real index.
public struct RubbishDictionaryIndex<Key: Equatable, Value>: BidirectionalIndexType {
    private let _idx: RubbishDictionary<Key,Value>.Storage.Index

    public typealias Index = RubbishDictionaryIndex<Key,Value>

    public func predecessor() -> Index { return Index(_idx: _idx.predecessor()) }
    public func successor() -> Index { return Index(_idx: _idx.successor()) }
}

// Required to make the index conform to Equatable
// (indirectly required by BidirectionalIndexType)
public func ==<Key: Equatable, Value>
  (lhs: RubbishDictionaryIndex<Key,Value>, rhs: RubbishDictionaryIndex<Key,Value>)
   -> Bool { 
    return lhs._idx == rhs._idx 
}

// Conformance to CollectionType
extension RubbishDictionary: CollectionType {
    public typealias Index = RubbishDictionaryIndex<Key,Value>

    // Add a subscript that takes an Index.  This one
    // does not need to be nil, since an indexed entry
    // _must_ exist.  Read-only.
    public subscript(idx: Index) -> Element { return _store[idx._idx] }

    public var startIndex: Index { return Index(_idx: _store.startIndex) }
    public var endIndex: Index { return Index(_idx: _store.endIndex) }

    public typealias Generator = IndexingGenerator<RubbishDictionary>
    public func generate() -> Generator { return Generator(self) }
}

// Conformance to Printable and DebugPrintable
extension RubbishDictionary: Printable, DebugPrintable {
    public var description: String {
        let elements = map(self) { "($0.0):($0.1)" }
        return "[" + ",".join(elements) + "]"
    }

    public var debugDescription: String {
        let elements = map(self) { "(toDebugString($0.0)):(toDebugString($0.1))" }
        return "[" + ",".join(elements) + "]"
    }
}

// Conformance to DictionaryLiteralConvertible
extension RubbishDictionary: DictionaryLiteralConvertible {
    public init(dictionaryLiteral elements: (Key, Value)...) {
        for (k,v) in elements {
            self[k] = v
        }
    }
}

// Remaining features supported by Dictionary
extension RubbishDictionary {

    public var count: Int { return _store.count }
    public var isEmpty: Bool { return _store.isEmpty }

    // now for the public version of indexForKey
    public func indexForKey(key: Key) -> Index? {
        // this maps non-nil results from _indexForKey to
        // the wrapper type, or just returns nil if 
        // _indexForKey did...
        return _indexForKey(key).map { Index(_idx: $0) }
    }

    // a lazily-evaluated collection of keys in the dictionary
    public var keys: LazyBidirectionalCollection<MapCollectionView<RubbishDictionary<Key,Value>,Key>> {
        return lazy(self).map { (k,v) in k }
    }

    // and the same for the values
    public var values: LazyBidirectionalCollection<MapCollectionView<RubbishDictionary<Key,Value>,Value>> {
        return lazy(self).map { (k,v) in v }
    }

    // unlike its subscript equivalent, updateValue:forKey does not take an optional value...
    public mutating func updateValue(value: Value, forKey key: Key) -> Value? {
        if let idx = _indexForKey(key) {
            let old = _store[idx].1
            _store[idx].1 = value
            // implicit optional wrap
            return old
        }
        else {
            _store.append((key,value))
            return nil
        }
    }

    // returns the value removed, if there was one
    public mutating func removeValueForKey(key: Key) -> Value? {
        if let idx = _indexForKey(key) {
            // implicit optional wrap
            return _store.removeAtIndex(idx).1
        }
        else {
            return nil
        }
    }

    public mutating func removeAtIndex(index: Index) {
        _store.removeAtIndex(index._idx)
    }

    public mutating func removeAll(keepCapacity: Bool = false) { 
        _store.removeAll(keepCapacity: keepCapacity)
    }

}

Unlike implicitly-wrapped optionals, which are more like the kind of friend that gets you into fights in bars. ↩
I'm getting a lot of mileage out of this example. ↩
I guess it could have asserted on a nil value. ↩

Avoid using var with this one weird trick

Posted on December 21, 2014February 11, 2015 by airspeedvelocity

update: as of Swift 1.2, there is another way to do this, as you can now defer assignment to let.

Ever find yourself having to use var when a let would be better, because you’re having to choose between multiple options?

var direction: Direction
switch (char) {
    case "a": direction = .Left
    case "s": direction = .Right
    default:  direction = .Forward
}

var cry: String
if direction == .Forward {
    cry = "To glory!"
}
else {
    cry = "Tum-tee-tum"
}

Often you’ll see people declaring those vars as optionals set to nil, because they think you have to give it a value. This isn’t actually necessary – so long as the compiler can tell it will always be assigned to before it is used, as it is here, it’ll let you defer setting the value. But you still can’t use let instead of var.

Most people are aware of the ternary operator, that means you can replace the if version with this:

let cry = direction == .Forward ? "To glory!" : "Tum-tee-tum"

But this doesn’t scale at all well with multiple choices:

// hours of code formatting enjoyment available making this look sensible
let direction: Direction =
    char == "a" ? .Left
  : char == "s" ? .Right
  :               .Forward

But don’t despair, there’s a better way! Just wrap your switch in a closure expression:

let direction: Direction = { 
// you could put "_ -> Direction in" here instead of
// typing the let, if you prefer

    switch (char) {
    case "a": return .Left
    case "s": return .Right
    default:  return  .Forward
    }

}() // call the expression immediately

There, both sanity and let preserved.

Of course, it’d be nice if there were a version of switch (and maybe even if) that was an expression, so you didn’t need the closure. Especially if it meant Swift could fully infer the type for you (with the closure trick, you have to give the type).

So that’s on my letter to Father Christmas. What would be really nice would be something like this:

let direction = case(char) {
    when "a" { .Left }
    when "s" { .Right }
    otherwise { .Forward }
}

Calling the outer keyword case (and the default otherwise) is lifted from LISP. This would help differentiate it from the switch statement format, to avoid confusion and backwards-compatibility troubles. Ditching the : and replacing it with braces seems more consistent with the rest of Swift syntax rather than sticking with a hangover from C.¹ The blocks against the when clauses could follow the same pattern as closures, i.e. automatically return if they’re single expressions.² The compiler would need to detect when the clauses returned incompatible types and throw an error (the ternary operator behaves similarly).

Anyway, there’s a glass of sherry waiting for Santa if he visits.³ In the meantime, give the closure trick a go.

This idea courtesy of @OldManKris ↩
Maybe they should even be closures (and you could pass function names instead), not sure about this one. ↩
I think the Swift dev team have posted on the forums acknowledging switches-as-expressions would be useful and that they’ll probably get to it some time. ↩

zipWith, pipe forward, and treating functions like objects

Posted on December 5, 2014December 5, 2014 by airspeedvelocity

In the previous article, about implementing the Luhn algorithm using function composition in Swift, I had a footnote to the part about using mapEveryNth to double every other digit:

An alternative could be a version of map that cycled over a sequence of different functions. You could then give it an array of two functions – one that did nothing, and another that did the conversion.

This post is about doing just that. I’ll define some more generic helper functions, and then use them to build a variant of our checksum function.

First up: cycling through a sequence multiple times. Suppose you have a sequence, and you want to repeat that sequence over and over. For example, passing in 1...3 would result in 1,2,3,1,2,3,1,...etc.

You could do this by returning a SequenceOf object that takes a sequence, and serves up values from its generator, but intercepts when that generator returns nil and instead just resets it to a fresh one:

func cycle
  <S: SequenceType>(source: S) 
  -> SequenceOf<S.Generator.Element> {
    return SequenceOf { _->GeneratorOf<S.Generator.Element> in
        var g = source.generate()
        return GeneratorOf {
            if let x = g.next() {
                return x
            }
            else {
                g = source.generate()
                if let y = g.next() {
                    return y
                }
                else {
                    // maybe assert here, if you want that behaviour
                    // when passing an empty sequence into cycle
                    return nil
                }
            }
        }
    }
}

// seq is a sequence that repeats 1,2,3,1,2,3,1,2...
let seq = cycle(1...3)

This code does something that might be considered iffy – it re-uses a sequence for multiple passes. Per the Swift docs, sequences are not guaranteed to be multi-pass. However, most sequences you encounter from day-to-day are fine with multiple passes. Collection sequences are, for example. But some non-collection sequences like StrideThrough ought to be too. My (possibly misguided) take on this: it seems OK to take multiple passes over a sequence, so long as you make it clear to the caller that this will happen. The caller needs to know if their sequence allows this, and if they pass a sequence that doesn’t support it, the results would be undefined.¹

OK, next, suppose you have two sequences. Maybe they’re both finite, maybe one is finite and one infinite (such as a sequence created by cycle above). You want to combine the elements at the same point in each one together, using a supplied function that takes two arguments. We’ll call that zipWith, and it’s very simple:

func zipWith
  <S1: SequenceType, S2: SequenceType, T>
  (s1: S1, s2: S2, combine: (S1.Generator.Element,S2.Generator.Element) -> T) 
  -> [T] {
    return map(Zip2(s1,s2),combine)
}

// returns an array of 1+1,2+2,3+3,4+4,5+5,
// so [2,4,6,8,10]
zipWith(1...5, 1...5, +)

How does this work? Zip2 is a SequenceType that takes two sequences, and serves up elements from the same position in each as a pair.² For example, if you passed it [1,2,3] and ["a","b","c"], Zip2 would be a sequence of (1,"a"), (2,"b"), (3,"c"). If one sequence is longer than the other, Zip2 stops as soon as it’s used up the shortest sequence.³

Then, that zipped sequence of 2-tuples is passed into map, along with the combiner function. Map does it’s thing, calling a transformation function on each element to produce a new combined value. Remember, Swift functions that take n arguments can instead take one n-tuple. That is, if you have a function f(a,b), and a pair p = (x,y), you can call f as f(p). So if Zip2 produces p = (1,1), then +(p) returns 2.

Finally, map spits out the array of combined pairs, and that’s returned.

So, we have a function that cycles over a given sequence forever, and a function that takes two sequences and combines their elements using a given function. How does this help double every other number in a sequence?

Well, suppose you had a sequence of two functions: one that did nothing, just returned its input unchanged, and one that doubled its input. If you cycled that sequence, you’d have an infinite sequence of functions that alternated between doing nothing and doubling.

How could zipWith be used to combine that cycling pair of functions, with a sequence of numbers, such that every other number was doubled? What would the combiner function need to be? It would have to take a value (the number), and a function (do-nothing or double), and apply the function to the number.

We’ve already written a function that does that, last time: pipe forward.

infix operator |> {
    associativity left
}

public func |> <T,U>(t: T, f: T->U) -> U {
    return f(t)
}

So, if we passed in pipe forward as the function to zipWith, it’d do exactly what we want:⁴

// a function that does nothing to its input
func id<T>(t: T) -> T {
    return t
}

// a function that doubles its input
func double<I: IntegerType>(i: I) -> I {
    return i * 2
}

// an array of those two functions, for Ints:
let funcs: [Int->Int] = [id, double]

// double every other number from one to ten
// returns [1,4,3,8,5,12,7,16,9,20]
zipWith(1...10, cycle(funcs), |> )

Finally, let’s implement the new checksum function. We’re going to re-use the •, mapSome, toInt, sum and isMultipleOf functions from the previous article, just redefining the digit-doubling part:

let doubleThenCombine = { i in
    i < 5
        ? i * 2
        : i * 2 - 9
}

let idOrDouble: [Int->Int] = [
    id,
    doubleThenCombine,
]

let doubleEveryOther: [Int]->[Int] = { zipWith($0, cycle(idOrDouble), |> ) }

let extractDigits: String->[Int] = { mapSome($0, toInt) }

let checksum = isMultipleOf(10) • sum • doubleEveryOther • reverse • extractDigits

let ccnum = "4012 8888 8888 1881"
println( checksum(ccnum) ? "👍" : "👎" )

This shows another benefit of the function composition approach. We completely changed the way the doubleEveryOther function was implemented, but because its interfaces in and out remained the same, everything else could stay untouched. Compare this with the iterative version, where looping and doubling and summing were all tangled up together.

So that’s nice, but you might ask if this version is any better than the version from the last post that used mapEveryNth. And it probably isn’t.

But what it does do is show the power of treating functions like any other object, storing them in a collection and iterating over them.

Suppose the Luhn algorithm actually required you to double the even-positioned numbers, but halve the odd ones. Using the mapIfIndex function, you’d need to take two passes (or write another function that took two transformations). With the zipWith technique, all you’d have to do is replace id with a function to halve its input.

You could take this even further. In this example, the array was built literally. But imagine a scenario where you needed more dynamic behaviour – where the array of functions you passed into zipWith were built at runtime, perhaps chosen via user interaction.

“Big whoop,” you say. “This is just like an array of delegates. I’ve been doing this for years.” And that’s totally fair. But functions can be both lighter-weight and more flexible. No need to create an interface, define methods, create an object that implements the interface, worry about which methods need real implementations and which just need stubbing out. The callee defines the function type, the caller writes a closure expression, maybe captures a few variables, and we’re done. For more on this, see Peter Norvig’s slides on how language features can make design patterns invisible or simpler.⁵

As ever, you can find all the general-purpose functions defined in this post in the SwooshKit framework.

The alternative to this (which the Swift docs actually advocate) is to only write cycle against collections, not sequences. But this seems like a shame, as it’d rule out using it on sequences that don’t have collection equivalents, like strides, but that ought to be multi-pass capable. Perhaps an option is to create a new protocol, MultipassSequenceType: SequenceType { } and to have sequences that can be used this way extend that? ↩
Zip2 is a bit of an anomoly in the Swift standard library. Whereas in most places, the pattern is to have a function that returns an object (for example, the enumerate function returns an instance of EnumerateSequence), Zip2 skips that stage and is just a struct itself, that you initialize with two sequences. Its usage looks almost exactly the same, but that’s why the Z is capitalized (because the convention is to capitalize the names of structs). ↩
If you’re interested in a version of zip that runs to the end of the longer sequence, I used one in this gist. ↩
If you’re of a more Haskelly temperament, you might feel this is the wrong way around (functions should come before values because f(x). In which case you’d want pipe backwards (<|) and flip the first two arguments. ↩
OK, so he was talking about dynamic languages, not just functional ones, but it’s still worth reading. ↩

A straw man argument for trying more functional-style programming in Swift

Posted on December 3, 2014December 26, 2014 by airspeedvelocity

A few weeks back, @SwiftLDN ran a hands-on, and the challenge was a calculation of the Luhn checksum for credit card numbers in Swift. Implementing this algorithm features as an early problem to solve in some functional programming tutorials, and can serve as a nice example for contrasting different programming styles.

(If you’re already familiar with more functional-style Swift, you’ll already know about many of the techniques in this article, but I’m also going to use it as a jumping off point to talk more about overloading, and also performance.)

Here’s wikipedia’s description of the algorithm:

From the rightmost digit, which is the check digit, moving left, double the value of every second digit; if the product of this doubling operation is greater than 9 (e.g., 8 × 2 = 16), then sum the digits of the products (e.g., 16: 1 + 6 = 7, 18: 1 + 8 = 9).
Take the sum of all the digits.
If the total modulo 10 is equal to 0 (if the total ends in zero) then the number is valid according to the Luhn formula; else it is not valid.

Let’s start by writing an imperative version, and then getting all judgmental about it:

func checksum(ccnum: String) -> Bool {
    var sum = 0
    var idx = 0
    for char in reverse(ccnum) {
        if let digit = String(char).toInt() {
            if (++idx)%2 == 0 {
                sum += digit < 5
                    ? digit * 2
                    : digit * 2 - 9
            }
            else {
                sum += digit
            }
        }
    }

    return sum % 10 == 0
}

let ccnum = "4012 8888 8888 1881"
println( checksum(ccnum) ? "👍" : "👎" )

This bit of code is perfectly adequate. It already uses some of Swift’s features to make it simpler and safer, such as for ... in, an if let binding to check if the character was a digit, and reverse to reverse the string.

But it’s not totally obvious at first glance that it’s correct. The mutable var sum means you have to run through the whole function in your head to understand what it does, figuring out how sum gets changed over time. Debugging it would probably involve stepping through line by line.

“But this is Swift!”, you say. “We should be avoiding var, and using map and reduce and stuff.” So, maybe something like this:

func checksum(ccnum: String) -> Bool {

    let digits = map(ccnum) {
        String($0).toInt()
    }.filter {
       $0 != nil
    }.map {
       $0!
    }

    let checksumDigits = map(enumerate(reverse(digits))) {
        (idx: Int, digit: Int) -> Int in
        if idx%2 == 0 {
            return digit
        }
        else {
            return digit < 5
                ? digit * 2
                : digit * 2 - 9
        }
    }

    return checksumDigits.reduce(0,+) % 10 == 0
}

let ccnum = "4012 8888 8888 1881"
println( checksum(ccnum) ? "👍" : "👎" )

Well, that’s is a bit of a dog’s breakfast. I’d say it’s worse than the version with the for loop, in terms of readability. But we can fix that.

Let’s start with the first part – the conversion of the string to an array of digits. It contains a force unwrap, often a sign there’s a better way. Needing to map and filter simultaneously is a pretty common problem. Common enough that Haskell has a function that combines the two, called mapMaybe (Maybe in Haskell is like Swift’s Optional type). Like map, it takes a function that transforms the elements, but that function returns an optional, and if it returns nil for an element, that element is not included in the result.

Here’s an equivalent function in Swift, which I’ll call mapSome: ¹

func mapSome
  <S: SequenceType, D: ExtensibleCollectionType>
  (source: S, transform: (S.Generator.Element)->D.Generator.Element?)
  -> D {
      var result = D()
      for x in source {
          if let y = transform(x) {
              result.append(y)
          }
      }
      return result
}

let toInt = { (c: Character)->Int? in String(c).toInt() }

let ccnum = "4012 8888 8888 1881"
let digits: [Int] = mapSome(ccnum, toInt)
// digits = [4,0,1,2,8,8,8,8,8,8,8,8,1,8,8,1]

Note, when declaring digits, you have to give the type of [Int], because mapSome is written to work on any kind of container that supports the ExtensibleCollectionType – but you have to specify which type of extensible container you want to use (in this case, an array).²

Next, the part that applies a transformation (doubling, and combining double digits) to every other digit. One way to solve this is with another variant of map that only maps every nth entry.³

In fact you could generalize that even further, and write a map that took a function that determined if a given index should be transformed:

func mapIfIndex
  <S: SequenceType, C: ExtensibleCollectionType 
   where S.Generator.Element == C.Generator.Element>
  (source: S, transform: S.Generator.Element -> S.Generator.Element, ifIndex: Int -> Bool)
  -> C {
    var result = C()
    for (index,value) in enumerate(source) {
        if ifIndex(index) {
            result.append(transform(value))
        }
        else {
            result.append(value)
        }
    }
    return result
}

With this, we can write a version of map that transforms every nth entry:

func mapEveryNth
  <S: SequenceType, C: ExtensibleCollectionType 
   where S.Generator.Element == C.Generator.Element>
  (source: S, n: Int, transform: S.Generator.Element -> C.Generator.Element) 
  -> C {

    // enumerate starts from zero, so for this to work with the nth element,
    // and not the 0th, n+1th etc, we need to add 1 to the ifIndex check: 
    let isNth = { ($0 + 1) % n == 0 }

    return mapIfIndex(source, transform, isNth)
}

let double = { $0*2 }
let doubleEveryOther: [Int]->[Int] = { mapEveryNth($0, 2, double) }

let doubled = doubleEveryOther([1,2,3,4,5])
// doubled = [1,4,3,8,5]

Notice how, in the snippet above, instead of applying the map directly to some values, doubleEveryOther is defined as a function that takes an array of Ints, and returns a new array with every other one doubled. This is then used on an array of Ints. Bear this in mind for later.

Finally, we need to sum the result, and then check it’s a multiple of 10. Functions to do that are easy enough:

func sum
  <S: SequenceType 
   where S.Generator.Element: IntegerType>
  (nums: S) -> S.Generator.Element {
    return reduce(nums, 0) { $0.0 + $0.1 }
}

let summed = sum(1...10) // 55

func isMultipleOf<T: IntegerType>(of: T) -> T -> Bool {
    return { $0 % of == 0 }
}

let isMultipleOfTen = isMultipleOf(10)

isMultipleOfTen(5)  // false
isMultipleOfTen(20) // true

Note this time, isMultipleOf is a function that returns another function – isMultipleOf(10) returns a function that tests whether a number is a multiple of 10. isMultipleOf(5) would return a function that tested if a number were a multiple of 5.

OK so now we have a function that can filter out digits, one that transforms every other value, one that sums integers, and one that checks if the result is a multiple of 10. All we need to do is string them all together.

The simple way would be to store each intermediate result in a variable:

func checksum(ccnum: String) -> Bool {

    let digits: [Int] = mapSome(ccnum, toInt)

    let reversed = reverse(digits)

    let transformed: [Int] = mapEveryNth(reversed, 2) { i in
        i < 5
          ? i * 2
          : i * 2 - 9
    }

    let summed = sum(transformed)

    return isMultipleOf(10)(summed)
}

That’s starting to look good. But those variable declarations are kinda cluttering up the place.

You could ditch the intermediate variables and write the whole thing like this:

func checksum(ccnum: String) -> Bool {

    let doubleAndCombine = { i in
            i < 5
              ? i * 2
              : i * 2 - 9
    }

    return isMultipleOf10(sum(mapEveryNth(reverse(mapSome(ccnum, toInt) as [Int]), 2, doubleAndCombine) as [Int]))
}

Obviously that’s ridiculous – it’s not just impossible to read, it’s impossible to write. It took me a good few minutes plugging away, counting all the parenthesis, like an animal.⁴

Luckily there’s a handy operator we can define that makes chaining function calls together a lot easier: pipe forward.

It looks like this:

infix operator |> {
    associativity left
}

public func |><T,U>(t: T, f: T->U) -> U {
    return f(t)
}

Read that function definition as “take any value of type T as an argument on the left, and any function that takes type T and returns another value of type U on the right, apply the function to the value and return the result”.

This allows us to pipe the result of one function into the input of another , like so:

// returns 30 (sum of 1 to 5, doubled)
1...5 |> sum |> double

Using this, we can rewrite checksum again:

func checksum(ccnum: String) -> Bool {

    let doubleAndCombine = { i in
        i < 5
            ? i * 2
            : i * 2 - 9
    }

    return ccnum
       |> { mapSome($0, toInt) as [Int] }
       |> reverse
       |> { mapEveryNth($0, 2, doubleAndCombine) as [Int] }
       |> sum
       |> isMultipleOf(10)

}

Here, for the functions that take more than one parameter, we have to wrap them in a closure expression to turn them into a temporary function that takes one parameter, the one coming through the pipeline. To read more about pipe forward, try Kris Johnson’s post about it.

One last take. Instead of using the pipe forward, we could use function composition. We can define an operator that takes two functions and returns a new function that is the result of applying first one, then the other. That is, if f and g are functions that take one argument, we could compose them as a new function h that was equivalent to g(f(x)).

infix operator • {
    associativity left
}

public func • <T, U, V> (g: U -> V, f: T -> U) -> T -> V {
    return { x in g(f(x)) }
}

// define a new function that takes a range of Ints, sums
// them, then doubles the result
let sumThenDouble: Range -> Int = double • sum

sumThenDouble(1...5) // returns 30

Unless you’re too busy being incandescent with rage about someone defining an operator using •,⁵ you’ll notice that, unlike |>, composed functions read from right to left, to match the order if you wrote the application out in full i.e. g(f(x)). There’s nothing magic about this – it’s just how the • function was defined above, with g as the left-hand argument and f as the right-hand one.

You could also define a pipe backwards operator, <|, to go in a similar direction. The Functional Swift book defines >>> as a composition operator that goes left to right. Pick whichever you prefer – left-to-right might feel more natural because it’s similar to object method calls.

Composition can be very useful in building up new more complex functions out of smaller ones. Recall earlier when we were implementing mapEveryNth, we had to write a small function:

    // add 1 to the index to check if
    // this entry is the nth
    isNth = { ($0 + 1) % n }

Assuming we had a function that incremented a number, we could have used composition to build that function:

func successor<I: _Incrementable>(i: I) -> I {
    return i.successor()
}

let isNth = isMultipleOf(n) • successor

And so, for one final go at writing checksum, assuming all the helper functions above are readily available:

let extractDigits: String->[Int] = { mapSome($0, toInt) }

let doubleAndCombine = { i in
    i < 5
        ? i*2
        : i*2-9
}

let doubleEveryOther: [Int]->[Int] = { mapEveryNth($0, 2, doubleAndCombine) }

let checksum = isMultipleOf(10) • sum • doubleEveryOther • reverse • extractDigits

let ccnum = "4012 8888 8888 1881"
println( checksum(ccnum) ? "👍" : "👎" )

The actual declaration is just a let. The checksum function is composed purely from other functions, some generic, some specific.⁶

Is this better then our original versions? I think so.⁷ The definition of checksum looks, to me, to be very readable. Reading from right to left, you extract the digits from the input, reverse them, double every other digit, sum them, and check if that’s a multiple of 10. The individual custom functions can be tested separately to check they work correctly – this approach is especially nice when combined with Swift playgrounds.

You could argue this is all cheating. You could take the same “breaking the problem down” steps with the imperative algorithm I started this blog with. And that’s true – but then I did title this post “a straw man”. And ultimately, you’d really still be circling the functional solution, getting closer to it. That’s what’s nice about Swift’s hybrid approach – it allows you to shift from one style to the other according to your preference.

It also assumes you have a library of re-useable functions just lying around that do stuff like this. But a lot of these kind of libraries already exist. They start with all the functions available in the Swift standard library (about which you can read more on this blog), but you might want to check out Swiftz, LlamaKit, or some of Rob Rix’s micro framworks.

Or you could start your own personal library of reuseable functions as you go along. Mine, with all the functions from this blog, is called SwooshKit and you can find it here.

I realize mapOptional would be the equivalent name to mapMaybe, but to me mapSome sounds better and seems more descriptive of what it actually does. ↩
The standard library goes the other route – the free map function only returns arrays. There is a trick you can do that allows you to default to arrays if not specified, but still be generic – but I’ll leave that for another time. ↩
An alternative could be a version of map that cycled over a sequence of different functions. You could then give it an array of two functions – one that did nothing, and another that did the conversion. ↩
Or a LISP programmer. ↩
I really like it. The main argument against it is that, while it’s easy to type option-8 (because it looks like a *?) on my keyboard, that’s not the case for everyone e.g. Chinese or Arabic keyboards. I’d be interested in feedback from people affected by this (and not so interested in feedback from people who are ASCII-only fans even on English keyboards 😛 ). ↩
You could even take it further, and compose a function that did the thumbs up/down conversion (using a further functional equivalent of the ternary operator – the VB programmer in me would call that iif), and then compose that with the println function. But let’s not get silly. ↩
Except in one, possibly crucial, respect – it’s slower than the imperative version. I’ll cover why in a later post. ↩

Running Totals and Force Unwraps

Posted on November 27, 2014November 27, 2014 by airspeedvelocity

Recently, while implementing my game of immutable snake, I needed to write a function that produced an array of running totals.¹ Since creating an array with reduce is all the rage these days, my first take was like this:

let start = 0
let array = [1,2,3]
let runningTotal = reduce(array, [start]) { $0 + [$0.last! + $1] }
// produces [0,1,3,6]

As you can see, that code snippet contains a force unwrap. It’s perfectly safe – you can see by inspection there’s no possible way that the array could ever be empty, so .last will never be nil. Well, except for that time in the future where I decide I don’t want the initial value at the start of the array after all, and so change reduce’s initial value to [], and kerplowie.

Obviously that’d be easy to catch in a test, and you’d probably spot the ! when making the change. But maybe there are a few lines between the change and the unwrap. Maybe it’s dependent on a variable rather than a literal. In Swift, a ! should be like a traffic Stop sign. Stop for a minute and look both ways very carefully or you might get run over by a truck. Do you really need it? Is it definitely safe, no shadow of a doubt?

The best outcome from this pause for thought is another, equally valid, equally readable, way to solve the problem that removes the need for the !.

For example, take the following snippet from the Swift section of John Siracusa’s Yosemite review:

let ages = [
    "Tim":  53,  "Angela": 54,  "Craig":   44,
    "Jony": 47,  "Chris":  37,  "Michael": 34,
]
let people = sorted(ages.keys, <).filter { ages[$0]! < 50 }
println(people)

There’s another way to write this that obviates the need for the ! completely:²

// dictionary's implementation of SequenceType (which is what filter takes)
// results in iterating over all the (key,value) pairs
let people = filter(ages) { $0.1 < 50 }.map { $0.0 }.sorted(<)

// or, if you find $0.1 ugly/unreadable:
let people = filter(ages) { (k,v) in v < 50 }.map { (k,v) in k }.sorted(<)

Hopefully just as concise, but no need for the force unwrap.

I couldn’t spot a similar rewrite for my runningTotal calculation, so I took the lazy option and threw it out as a question on twitter.³

First up, suggestions of using ?? or if let instead of !:

let runningTotal = reduce(array, [start]) { $0 + [($0.last ?? 0) + $1] }

let runningTotal = reduce(array, [start]) { 
    if let val = $0.last {
        $0 + [$0.last! ?? 0 + $1]
    }
    else {
        // what to put here?
    }
}

These seem safer – but I feel like that safety can be an illusion. You know that thenil can never, ever, happen in the code as it stands. So what’s the point? Well, to kick in if you ever make a change by mistake that means you do force-unwrap a nil. But if that ever happened, you would have a bug, and sweeping it under the carpet and silently continuing could be just as bad. So perhaps what should go in that else clause is an assert, just like you’d get with the force-unwrap. In which case you’re not much better off.

How about eliminating the call to last, by passing the last value along as you go?

let runningTotal = reduce(a, (start,[start])) { ($0.0 + $1, $0.1 + [$0.0 + $1]) }.1

This version has no force-unwrap, but it’s pretty nasty to read as a one-liner. And the addition is being done twice so that could cause a performance problem for something more complex than an Int.

Speaking of performance, how about a version that uses map instead of reduce?

var runningValue = start
let runningTotal = [start] + map(a) {
    runningValue += $0
    return runningValue
}

“Ick, var”, you say. But you might change your mind if you ran a benchmark. This version screams ahead of the reduce version in terms of performance for arrays larger than a handful of elements. This is not surprising – Swift arrays aren’t lists. Concatenating two arrays is, in the absence of some kind of fancy optimization, O(n), and you’re doing that n times. So the innocent-looking reduce(array,[],+) version of flatmap is quadratic, so maybe best left as an educational example rather than something you actually do.

Anyway, if you must use a mutable variable, you can make yourself feel better by putting it in a generic function you know is safe and can use again and again. A function to produce a list of running totals is popular enough to find a place in plenty of standard libraries.

In Haskell, it’s called scanl. The Typelift Basis library has a very attractive recursive definition (which uses some other functions defined in Typelift, but you can probably get the idea):

public func scanl<B, A>(f : B -> A -> B) -> B -> [A] -> [B] {
    return { q in { ls in
        switch destruct(ls) {
            case .Empty:
                return q <| []
            case .Destructure(let x, let xs):
                return q <| scanl(f)(f(q)(x))(xs)
        }
    } }
}

As pretty as this is, relying on this kind of recursion in Swift as of now is probably also going to lead to performance problems. So here’s a more imperative implementation. In keeping with similar naming for reduce, I’ll call it accumulate, which is the name it has in Python:

// A version that returns any type that adheres so ExtensibleCollectionType
func accumulate
  <S: SequenceType, C: ExtensibleCollectionType>
  (source: S, var initial: C.Generator.Element, combine: (C.Generator.Element, S.Generator.Element) -> C.Generator.Element)
  -> C {
    var result = C()
    result.append(initial)
    for x in source {
        initial = combine(initial, x)
        result.append(initial)
    }
    return result
}

// And a default version that returns an Array, as with map
func accumulate<S: SequenceType, U>
  (source: S, initial: U, combine: (U, S.Generator.Element)->U)
  -> [U] {
    return accumulate(source, initial, combine)
}

So there you go. I’ve started putting the re-useable functions found on this blog up in a library on github, SwooshKit, that you might want to take a look at.

The actual code was to compute the coordinates of the snake’s body from the head through the tail. ↩
This also has the benefit of forcing you to put the sort at the end, where it’s sorting a filtered list which ought to be quicker. Unfortunately it introduces a different performance problem, can you spot it? ↩
Thanks to @nnnnnnnn, @jl_hfl, @CodaFi_, @rob_rix, @inthehands and @maxbucknell for all their helpful replies. ↩

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

Airspeed Velocity

African or European Swift?

A Swift Spelling Corrector

Protocols and Generics

Functional Differences

Poking at Protocols

Performance Implications

Dynamic Dispatch

Changes to the Swift standard library in 1.2 betas 2 and 3

Sorting Nibbles in Swift

Changes to the Swift Standard Library in 1.2 beta 1

More fun with implicitly wrapped non-optionals

Equating Optionals

Comparing Optionals

Optional Subscripts

Full imlementation of `RubbishDictionary`

Avoid using var with this one weird trick

zipWith, pipe forward, and treating functions like objects

A straw man argument for trying more functional-style programming in Swift

Running Totals and Force Unwraps

Functional Differences

Poking at Protocols

Performance Implications

Dynamic Dispatch

Equating Optionals

Comparing Optionals

Optional Subscripts

Full imlementation of RubbishDictionary

Full imlementation of `RubbishDictionary`