My Talk at Swift Summit

Earlier in the year, I spoke at Swift Summit, and the video has just been posted.

I also added some annotations to the slides with more info, credits and further links, which you can find here on speaker deck.

Swift Summit was a really fun conference to attend with many great speakers, and assuming it happens again next year I’d heartily recommend it.

Protocols and Generics

In a talk last week at Swift Summit, I spoke a bit about Swift and performance, and about how structs and generic functions in Swift allow you to write high-level code without paying a significant performance penalty. This built on the example I posted a couple of weeks back about sorting nibbles in Swift.

Towards the end, I gave a couple of lines of code that seemingly behave the same, but are actually subtly different:

// given a protocol with no Self or associated type requirements, 
// such as Printable (one of the few like this in the standard library)

// how is this...
func f(x: Printable) { }

// different from this...
func g<T: Printable>(x: T) { }

This article is going to look at the differences between the two. Warning – some of the behaviour below is undocumented, and subject to change. All the code has been tested on Swift 1.2 beta 4, but may behave differently in future versions. Also, all size numbers assume a 64-bit platform.

Functional Differences

So what is the difference between the two? In the first case, the function takes an argument with the type of the protocol Printable, and can then access the argument value via any of the methods that protocol provides.

In the second case, g takes an argument of any type T that conforms to Printable. This means T is the type of whatever was passed in as the argument. If you pass in an Int, then T is an Int. If you pass in an array of integers, then T is an [Int].

For most purposes, this distinction doesn’t matter when implementing the body of the function. Even though T might be an Int or an [Int], you can’t use any methods or properties not guaranteed by Printable, because the function needs to be valid for any type that conforms to Printable. So you can’t call x.successor() or x.count. Only x.description.
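As a quick illustration (a sketch – the commented-out lines wouldn’t compile):

func g<T: Printable>(x: T) {
    // fine – description is guaranteed by Printable:
    println(x.description)

    // these would be errors – nothing says T has them:
    // x.successor()
    // x.count
}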

The most important functional difference between the two forms comes when the function returns a value. Suppose you wanted to implement your own version of the standard library’s abs function. It could look something like this:

func myAbs<T: SignedNumberType>(x: T) -> T {
    if x < 0 {
        return -x
    }
    else {
        return x
    }
}

// myAbs works for any kind of signed number 
// (e.g. Int, Int64, Float, Double etc)
let i: Int8 = -4
let j = myAbs(i)
// j will be of type Int8, with value 4

This function relies on 3 things provided by SignedNumberType: a negation operator, a less-than operator (via SignedNumberType conforming to Comparable), and the ability to create a zero of the same type for comparison (via IntegerLiteralConvertible). It compares to zero and then returns the original value or its negation. Importantly, it returns the same type that was passed in. If you pass in an Int8, you get back an Int8. If you pass in a Double, you get back a Double. If this function were implemented using protocols,1 you’d get back the type of the protocol. Not only would that be inconvenient for type inference purposes, you’d also probably need to cast the value back to the type you wanted using as! in order to make use of it.
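SignedNumberType can’t actually be used as a plain protocol type here (see the footnote at the end of this post), so here’s a sketch of the same issue using Printable and a couple of made-up functions:

// generic version: the return type is the argument’s type
func genericEcho<T: Printable>(x: T) -> T { return x }

// protocol version: all you get back is a Printable
func protocolEcho(x: Printable) -> Printable { return x }

let a = genericEcho(42)        // a is an Int, usable as one
let b = protocolEcho(42)       // b is just a Printable
// to treat the result as an Int again, you need a cast:
let c = (protocolEcho(42) as! Int) + 1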

Poking at Protocols

So aside from that, how else do the two functions differ? Well, there are still some ways you can tell the difference between the protocol and the T placeholder. For example, you can look at the size of the value:

func takesProtocol(x: Printable) {
    // this will print "40" every time:
    println(sizeofValue(x))
}

func takesPlaceholder<T: Printable>(x: T) {
    // this will print whatever the size of
    // the argument type is (for example, Int64
    // is 8, Int8 is 1, class references are 8)
    println(sizeofValue(x))
}

takesProtocol(1 as Int16)      // prints 40
takesPlaceholder(1 as Int16)   // prints 2

class MyClass: Printable {
    var description: String { return "MyClass" }
}

takesProtocol(MyClass())     // prints 40
takesPlaceholder(MyClass())  // prints 8

So it looks like Printable is some kind of fixed-size box that holds any kind of value that is printable. This kind of boxing is quite a common feature in other languages like Java and C#. But here even references to classes are put inside this 40-byte box, which might surprise you if you’re used to thinking of protocols as being like references to pure virtual base classes. This Swift box is geared up to hold both value and reference types.

edit: as @norio_nomura points out, class-only protocols are smaller, at 16 bytes, as they never need to hold a larger-sized payload.

So what’s inside those 40 bytes? Mike Ash had a great series of posts in his Friday Q&A column about poking around inside Swift’s memory layout. Here we’re just going to use a much shorter bit of code to look into the contents of the protocol:

// function to dump out the contents of a protocol variable
func dumpPrintable(var p: Printable) {
    // you could also do this with unsafeBitCast
    withUnsafePointer(&p) { ptr -> Void in
        let intPtr = UnsafePointer<Int>(ptr)
        for i in stride(from: 0, to: (sizeof(Printable)/8), by: 1) {
            println("\(i):\t 0x\(String(intPtr[i], radix: 16))")
        }
    }
}

let i = Int(0xb1ab1ab1a)
dumpPrintable(i)

// prints out:
0:       0xb1ab1ab1a
1:       0x7fff5ad10f48
2:       0x2000000000
3:       0x10507bfa8
4:       0x105074780

With a single 8-byte integer, the protocol appears to pack the value into the protocol value itself. The next two words look like uninitialized memory (their contents vary on each run), and then the last two are pointers to some metadata about the type that is actually being held. (Mike’s article goes into more detail about this metadata).

If you create a custom struct of size 16 or 24, this is also held within the first 3 words of the protocol. But if you go above this, it switches over to holding a pointer to the referenced value:

struct FourInts: Printable {
    var a = 0xaaaa
    var b = 0xbbbb
    var c = 0xcccc
    var d = 0xdddd
    var description: String { return toString(a,b,c,d) }
}

dumpPrintable(FourInts())

// prints out:
0:       0x7f8b5840fb90  // this is a pointer to a FourInts value
1:       0xaaaa          // uninitialized garbage (?)
2:       0xbbbb          // ditto
3:       0x10dde52a8     // metadata
4:       0x10dde51b8     // metadata

How do we know that first word is a pointer to a FourInts value? Well, you can dereference it and see! We need to amend our dumping function slightly, to tell it the real type the protocol is referencing:

func dumpPrintable<T>(var p: Printable, type: T.Type) {
    withUnsafePointer(&p) { ptr -> Void in
        let intPtr = UnsafePointer<Int>(ptr)
        for i in stride(from: 0, to: (sizeof(Printable)/8), by: 1) {
            println("\(i):\t 0x\(String(intPtr[i], radix: 16))")
        }
        // if the pointed-to value is too big to fit inline:
        if sizeof(T) > 24 {
            // we have an integer, and we want to turn it into a pointer,
            // so we use the bitPattern: constructor of UnsafePointer<T>
            let valPtr = UnsafePointer<T>(bitPattern: Int(intPtr.memory))
            // and now we can look at the value at that location in memory
            println("value at pointer: \(valPtr.memory)")
        }
    }
}

dumpPrintable(FourInts(), FourInts.self)

// prints out:
0:       0x7f8b5840fb90
1:       0x7fff909c5395
2:       0xaaaa
3:       0x10dde52a8
4:       0x10dde51b8
value at pointer: (43690, 48059, 52428, 56797)

Bingo! Those are the values of the 4 integers.
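For contrast, a struct that fits within 24 bytes gets stored directly in the box. Here’s a quick sketch (ThreeInts is made up for illustration, and the exact metadata values will vary):

struct ThreeInts: Printable {
    var a = 0x1111
    var b = 0x2222
    var c = 0x3333
    var description: String { return toString(a,b,c) }
}

dumpPrintable(ThreeInts())

// prints the three values 0x1111, 0x2222, 0x3333 in words 0–2,
// held inline with no pointer, followed by the two metadata words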

One final point before we move on. Just because you switch to referring to a value type using a protocol, this does not turn it into a reference type:

protocol Incrementable {
    var x: Int { get }
    mutating func inc()
}

struct S: Incrementable {
    var x = 0
    mutating func inc() {
        ++x
    }
}

var p1: Incrementable = S()
var p2 = p1
p1.inc()
p1.x  // now 1
p2.x  // still 0

Performance Implications

OK, so protocols used like this seem to add some level of indirection. Does that cost us anything compared to using the generic placeholder approach? To test this out, we can construct some trivial structs that perform a basic operation, and then attempt to run that operation multiple times via both a protocol and a generic placeholder.

First here’s the protocol:

protocol NumberGeneratorType {
    mutating func generateNumber() -> Int
}

We’re pretty restricted in what can be done without resorting to associated types, so all it does is spit out a number. Here are 3 implementations that do different things, along with two harnesses that iterate multiple times, totalling the returned numbers:

struct RandomGenerator: NumberGeneratorType {
    func generateNumber() -> Int {
        return Int(arc4random_uniform(10))
    }
}

struct IncrementingGenerator: NumberGeneratorType {
    var n: Int
    init(start: Int) { n = start }
    mutating func generateNumber() -> Int {
        n += 1
        return n
    }
}

struct ConstantGenerator: NumberGeneratorType {
    let n: Int
    init(constant: Int) { n = constant }
    func generateNumber() -> Int {
        return n
    }
}


func generateUsingProtocol(var g: NumberGeneratorType, count: Int) -> Int {
    return reduce(stride(from: 0, to: count, by: 1), 0) { total, _ in
        total &+ g.generateNumber()
    }
}

func generateUsingGeneric<T: NumberGeneratorType>(var g: T, count: Int) -> Int {
    return reduce(stride(from: 0, to: count, by: 1), 0) { total, _ in
        total &+ g.generateNumber()
    }
}

Running the above, compiled with -O, with a count of 10k iterations, gives the following timings:

(the full code with the timing can be found here)

Generic rand          261829µs
Protocol rand         286625µs
Generic incr            5481µs
Protocol incr          45094µs
Generic const              0µs
Protocol const         43666µs

So what does this tell us? Calling arc4random is quite expensive, so the extra cost of going through the protocol is small in proportion, though still noticeable. But in the case of the incrementing generator, the overhead is much larger relative to the actual operation being performed.

And in the case of the constant generator, the compiler has managed to optimize away all the code of the generic version and turn it into a single multiply operation (number of iterations times the constant)! So it takes a constant tiny amount of time. But the protocol indirection acted as a barrier to this same optimization.

In fact if you recompile with -Ounchecked, the same happens with the incrementing generator too – it’s only the check for overflow on the increment that was preventing it. The protocol versions remain the same.

For the most part, much of this can probably be put in the “premature optimization” camp. The big wins are really more about the expressiveness generics give you, as seen in the previous article about sorting, and avoiding things like the type erasure shown in the earlier section. But if you’re in the habit of writing generic functions, the performance gains are nice too. And when writing reusable library functions like sort or reduce that are going to be called a lot and perform tight loops, they’re critical.

Of course, all this is subject to change – given that the protocols are not being used to get any kind of dynamic behaviour, perhaps the compiler could have optimized them too. But it doesn’t appear to at the moment.

Dynamic Dispatch

Speaking of dynamic behaviour – that is one occasion when you might prefer to use protocols rather than generics.

Consider the following code:

func f(p: Printable) {
    println(p)
}

func g<T: Printable>(t: T) {
    println(t)
}

let a = [1,2,3]
let i = 1

// this will work fine: either can be converted
// to Printable and passed in, then the appropriate
// version of `description` is called at runtime
f(arc4random()%2 == 0 ? a : i)

// this will _not_ compile.  T needs to be fixed as
// either an array of ints, or an int at compile time, 
// it can't be both
g(arc4random()%2 == 0 ? a : i)

A contrived example, obviously, but it shows how even structs in Swift can get dynamic behaviour at runtime via protocols – something you may well find a use case for.

There’s lots more to explore with generics. The mental model of “Swift just writes a version of your function for each different type” isn’t quite how it works in practice. And associated types and constraints between multiple placeholders allow you to create kinds of generic algorithms that would be very difficult to achieve with protocols. But this article is already pretty long so I’ll leave those for another day.


  1. In fact you couldn’t implement this function using protocols rather than generics, because SignedNumberType uses Self and (via IntegerLiteralConvertible) associated types. A topic for a different post. 

Sorting Nibbles in Swift

A couple of months back, John Regehr posted a challenge on his blog – write the fastest possible sorting algorithm for all the nibbles in a 64-bit word.

The problem is to sort the 4-bit pieces of a 64-bit word with (unsigned) smaller values towards the small end of the word. The nibble sort of 0xbadbeef is 0xfeedbba000000000. The function you implement will perform this sorting operation on a buffer of 1024 64-bit integers:

void nibble_sort(unsigned long *buf);

The challenge was in C, but it made me wonder about solving the same problem in Swift. Here’s a straight port of the supplied reference version in Swift:

func read_nibble(w: UInt64, i: UInt64) -> UInt64 {
    let res = w >> (i * 4)
    return res & 0xf
}

func write_nibble(inout w: UInt64, i: UInt64, v: UInt64) {
    var mask: UInt64 = 0xf
    mask <<= (i * 4);
    w &= ~mask
    var prom = v;
    prom <<= (i * 4)
    w |= prom
}

func nibble_sort_word_ref(var arg: UInt64) -> UInt64 {
    for var i: UInt64 = 0; i < 16; ++i {
        var min = i;
        for var j = i+1; j < 16; ++j {
            if read_nibble(arg, j) < read_nibble(arg, min) {
                min = j
            }
        }
        if (min != i) {
            var tmp = read_nibble(arg, i)
            write_nibble(&arg, i, read_nibble(arg, min))
            write_nibble(&arg, min, tmp)
        }
    }
    return arg;
}

func nibble_sort_ref(buf: UnsafeMutablePointer<UInt64>) {
    for var i=0; i<1024; i++ {
        buf[i] = nibble_sort_word_ref(buf[i])
    }
}

The first bit of good news is that on my machine, when compiled with -Ounchecked, this version performs with pretty much the same speed as the original C version. Sticking with -O instead gives up about 20% performance, which is not too bad for the sake of safety.

So, this version has two functions to read and write the nibble at a given index, and a function to sort it using them. Get/set at a given index? That sounds a lot like a collection type. What if we were to wrap the word in a collection, and use subscript instead of the read/write functions? Something like this:

struct NibbleCollection: MutableCollectionType {
    var val: UInt64
    
    var startIndex: UInt64 { return 0 }
    var endIndex: UInt64 { return 16 }
    
    subscript(idx: UInt64) -> UInt64 {
        get {
            return (val >> (idx*4)) & 0xf
        }
        set(n) {
            let mask = 0xf << (idx * 4)
            val &= ~mask
            val |= n << (idx * 4)
        }
    }
    
    typealias Generator = IndexingGenerator<NibbleCollection>
    func generate() -> Generator { return Generator(self) }
}

Once you have this, you can then use the standard library sorting function on it to implement the challenge:

func nibble_sort_word_col(arg: UInt64) -> UInt64 {
    var col = NibbleCollection(val: arg)
    sort(&col)
    return col.val
}

“Does all this Collection overhead mean it won’t be as fast as the raw function implementation?” you might worry. But there really is no overhead – since NibbleCollection is a struct, not a class, it really isn’t anything more than the raw underlying UInt64 and some functions. All of the scaffolding sloughs away at compile time, leaving you with pretty much identical code to the function-based version.

With one big difference: Swift.sort is way better than the reference sort implementation above.1 So this version outperforms it, both in Swift and in C. With -O, the collection version is faster than the Swift reference, and the same speed as the C version. With -Ounchecked it outperforms the C version by about 25%.

Which tells you two things you should bear in mind when writing Swift:

  1. Structs implementing protocols, and generic functions, are a good way of getting abstraction without giving up performance.
  2. Always check if the code you want exists as a library function. Using it won’t just save time and make your code clearer – it could be better optimized than the version you might write.

Of course, the competitive entries in C were orders of magnitude faster than the reference version, using bit-twiddling techniques like bucket sorts and lookup tables that take advantage of the problem being constrained to values less than 16 held in a single word. But since Swift also supports most of the operations they use, many of them can be ported too. Some quick and dirty translations I did of a couple of the simpler ones suggest that, like the reference version, Swift can match or almost match them in performance too.


  1. Naming the two sort algorithms in question is an exercise for the reader. They might not be the one you’d guess first. 

Which function does Swift call? Part 1: Return Values

ClosedInterval is shy. You have to coax it out from behind its friend, Range.

// r will be of type Range<Int>
let r = 1...5

// if you want a ClosedInterval<Int>, you
// have to be explicit:
let integer_interval: ClosedInterval = 1...5

// if you use doubles though, you don’t need
// to declare the type explicitly.
// floatingpoint_interval is a ClosedInterval<Double>
let floatingpoint_interval = 1.0...5.0

Ever since Swift 1.0 beta 5, Range has been intended only for representing collection index ranges. If you’re not operating on indices, ClosedInterval is probably what you want. It has methods like contains, which determines in constant time whether an interval contains a value. Range can only use the non-member contains algorithm, which will turn your range into a sequence and iterate over it – don’t accidentally put a loop in your loop!
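To see the difference (a sketch):

let interval: ClosedInterval = 1...1_000_000
// member contains – constant time:
interval.contains(999_999)

let range = 1...1_000_000          // a Range<Int>
// non-member contains – walks the sequence:
contains(range, 999_999)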

But Range is not going quietly. If you use the ... operator with an integer, it elbows ClosedInterval out of the way and dashes onto the stage. Why? Because integers are forward indexes (as used by Array), and the Swift ... operator has 3 declarations:

func ...<T : Comparable>(start: T, end: T) -> ClosedInterval<T>

func ...<Pos : ForwardIndexType>(minimum: Pos, maximum: Pos) -> Range<Pos>

func ...<Pos : ForwardIndexType where Pos : Comparable>(start: Pos, end: Pos) -> Range<Pos>

When you write let r = 1...5, Swift calls the last one of these, and you get back a Range. To understand why, we need to run through the different ways Swift decides which overloaded function to call. There are a lot of them.

Let’s start with:

Differing Return Values

In Swift, you can’t declare the exact same function twice:

func f() {  }

// error: Invalid redeclaration of f()
func f() {  }

What you can do in Swift, unlike in some other languages, is define two versions of a function that differ only by their return type:

struct A { }
struct B { }

func f() -> A { return A() }

// this second f is fine, even though it only differs
// by what it returns:
func f() -> B { return B() } 

// note, typealiases are just aliases not different types,
// so you can't do this:
typealias L = A
// error: Invalid redeclaration of 'f()'
func f() -> L { return A() }

When calling these only-differing-by-return-value functions, you need to give Swift enough information at the call site for it to unambiguously determine which one to call:

// error: Ambiguous use of 'f'
let x = f()

// instead you must specify which return value you want:
let x: A = f()    // calls the A-returning version
                  // x is of type A

// or if you prefer this syntax:
let y = f() as B  // calls the B-returning version
                  // y is of type B

// or, maybe you're passing it in as the argument to a function:
func takesA(a: A) { }

// the A-returning version of f will be called
takesA(f())

Finally, if you want to declare a reference to a function f, you need to specify the full type of the function. Note, once you have assigned it, the reference is to a specific function. It doesn’t need further disambiguation:

// g will be a reference to the A-returning version
let g: ()->A = f

// h will be a reference to the B-returning version
let h = f as ()->B

// calling g doesn't need further info, it references 
// a specific f, and z will be of type A:
let z = g()

OK, so ... is a function that differs by return type – it returns either a ClosedInterval or a Range. By explicitly specifying the result needs to be a ClosedInterval, we can force Swift to call the function that returns one.

But if we don’t specify which ... explicitly, we don’t get an error about ambiguity as seen above. Instead, Swift picks the Range version by default. How come?

Because the different versions of ... also differ by the arguments they can take. In the next article, we’ll take a look at how that works.

Protocols and Assumptions

edit: subsequent to this article being written, the Swift standard library has been updated, and documentation-comments above the relevant methods of RangeReplaceableCollectionType now explicitly state: “Invalidates all indices with respect to self.” Bear this in mind as you read on:

What does it mean to implement a protocol? Possibly more than supporting methods with specific names.

Obviously the methods ought to do what their names imply – isEmpty shouldn’t fire the torpedoes. But as well as basic functionality, there are also guarantees about things like the method’s complexity, and possibly wider implications for how the class itself behaves.

More importantly, protocols might not guarantee a behaviour you think they do and are relying on. Using protocols with generics can sometimes give you the illusion of more guarantees than you actually have.

Suppose you want to write a function remove that removes entries from a collection in-place – that is, similar to the sort function, it takes a collection as an inout parameter, and removes from it all the entries that match a given value. 1

Unlike sort, which just requires MutableCollection, remove would need to use the new-in-beta6 RangeReplaceableCollectionType, which includes a removeAtIndex method. Armed with this, you might write the following: 2

func remove
    <C: RangeReplaceableCollectionType,
     E: Equatable
     where C.Generator.Element == E>
    (inout collection: C, value: E) {
        var idx = collection.startIndex
        while idx != collection.endIndex {
            if collection[idx] == value {
                collection.removeAtIndex(idx)
            }
            else {
                ++idx
            }
        }
}

Embedded in this code are a lot of assumptions about how collections and their indices behave when you mutate them, and some of these might not be valid, depending on the collection type. It works with String, the only explicitly range-replaceable collection in the standard library currently, as well as the secretly range-replaceable Array, ContiguousArray and Slice. But you could easily implement a new type of collection that was range-replaceable for which the above code would explode in flames.

The biggest assumption is that removing an element from a collection does not completely invalidate an index. That is, after you call collection.removeAtIndex(idx), idx remains a legitimate index into the collection rather than just becoming junk.

Next, there’s the assumption that when you remove an entry at an index, that index will now point to the next element in the collection. That’s why, after removing the entry, the code above just goes straight back around the loop without incrementing idx. You could put it another way – when you remove an element, the next element “moves down” to the position indexed by the removed element.

Finally, there’s the assumption that if the element that idx points to is the last element, then what idx will point to after you remove it will be endIndex. Or, to put it another way, as you remove the last element, endIndex “moves down” to equal the index of the last element.

By the way, this last assumption is why the code uses while idx != collection.endIndex rather than the more-elegant for idx in indices(collection). indices returns a Range object between the collection’s start and end, but it would be created before we start looping. Because endIndex is a moving target as we remove some entries from the collection, it won’t work for our purposes. A cleverer version of indices that returned something more dynamic might help, but that could have different undesirable side-effects.

Are all these assumptions legit? Well you can see they obviously are for the Array types, because these just use an integer for their index, with endIndex equal to count. When elements are removed, the rest of the array shuffles down. Even if that resulted in a full reallocation of the array’s memory, the assumptions would still hold because all the index does is represent a relative offset from the start of the array, not point to a specific location.
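A quick check with an array bears this out (a sketch):

var a = [1, 2, 3]
var idx = a.startIndex       // 0
idx = idx.successor()        // 1, currently indexing the 2
a.removeAtIndex(idx)         // a is now [1, 3]
a[idx]                       // 3 – the next element has moved down
a.removeAtIndex(idx)         // a is now [1]
idx == a.endIndex            // true – endIndex has moved down too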

Strings are trickier, because their index type is opaque. Chances are it’s still an offset from the start of the string, but because Swift characters aren’t of uniform width, 3 that offset doesn’t necessarily increment by a constant amount each time. Still, if this is the implementation, the above assumptions would hold, and experimentation suggests they do.

What kind of collection might not adhere to these assumptions? Well, a very simple doubly-linked list implementation might not. 4 If the index for a linked list were a pointer to each node, then removing that node could leave the index pointing at a removed entry. You couldn’t just loop around without first pointing to the next node in the list:

func remove
    <C: RangeReplaceableCollectionType,
     E: Equatable
     where C.Generator.Element == E>
    (inout collection: C, element: E) {
        var idx = collection.startIndex
        while idx != collection.endIndex {
            if collection[idx] == element {
              // first grab the index of the next element
              let next = idx.successor()
              // then remove this one
              collection.removeAtIndex(idx)
              // and repoint
              idx = next
            }
            else {
                ++idx
            }
        }
}

But then this algorithm would no longer work correctly with arrays and strings! 5
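To see it go wrong (a sketch, assuming this second version replaces the earlier one):

var nums = [1, 2, 2, 3]
remove(&nums, 2)
// nums is now [1, 2, 3]: after the first 2 was removed, idx was
// advanced past the element that had moved down into its place,
// so the second 2 was never examined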

So what’s the solution – should RangeReplaceableCollectionType mandate the kind of index validity behaviour our remove algorithm relies on? Or are the assumptions invalid and we need a better algorithm? (of which more in a later article) The Swift standard library is still evolving rapidly with each beta so it’s possibly a little early to tell. For now, be careful about the assumptions you make – just because all the current implementations of a particular protocol work a certain way doesn’t mean other implementations will.


  1. As opposed to a version that returned a copy, which would be called removed. I thought I didn’t like this convention at first, but I’m warming to it. 
  2. This code has an efficiency deficiency, which we’ll talk about in a later article. 
  3. For an in-depth explanation of Swift strings, see Ole Begemann’s article
  4. Singly-linked lists couldn’t implement removeAtIndex easily, and would probably have some kind of removeAfterIndex operation instead. 
  5. The C++ STL resolves this by having erase return an iterator for the entry just after the erased elements – along with a fairly draconian assertion that any removal invalidates all other iterators (not just ones at and beyond the removed value). But Swift’s removeAtIndex currently returns the removed value, rather than a new index. 

filter, String and ExtensibleCollectionType

String was extended in beta 6 to implement RangeReplaceableCollectionType. This means that, via inheritance, it also implements ExtensibleCollectionType.1

ExtensibleCollectionType is interesting, because it requires the collection to support an empty initializer. This means that, without having to resort to shenanigans, you can write a generic function that takes an ExtensibleCollectionType and returns a new one.

Since they were changed to return eagerly-evaluated results, the non-member filter and map have returned arrays, no matter what. This is a bit frustrating when working with some non-array types, such as String:2

let vowels = "eaoiu"
let isConsonant = { !contains(vowels, $0) }
let s = "hello, i must be going"
// filtered will be an array
let filtered = filter(s, isConsonant)
// and then we have to turn it back into a string
let only_consonants = String(seq: filtered)
// only_consonants is "hll,  mst b gng"

It would be nice to have a version of filter that took a String and returned a String instead of an Array. 3 Even better, it would be nice to have a single generic version that worked on both arrays and strings.

Here’s one:

func my_filter
  <C: ExtensibleCollectionType>
  (source: C, includeElement: (C.Generator.Element)->Bool)
  -> C {
    // use the `init()` from `ExtensibleCollectionType`
    var result = C()
    for element in source {
        if(includeElement(element)) {
            // append is also part of `ExtensibleCollectionType`
            result.append(element)
        }
    }
    return result
}

// my_filter returns a String when passed one:
let only_consonants = my_filter(s, isConsonant)

Since this is possible, should Swift’s filter and map be changed to be like this? Maybe, but I can think of a couple of reasons why not.

First, it’d be a bit inconsistent and possibly surprising. Not all collections are extensible collections. Dictionary isn’t. Range and StrideTo even less so – they’re like “virtual” collections that don’t really have individual elements at all. So there’d still need to be versions that took these collections and returned an array. So when calling filter, you’d need to know whether your collection was extensible to know whether you were going to get back the same collection type or an array.

There’s precedent for this kind of thing. lazy gives you back different types depending on what you pass in. But lazy is very explicit. map and filter would be a bit more subtle, and bear in mind subtle maybe-unexpected behaviour was probably the reason lazy evaluation was moved into the lazy family in the first place.

Second, maybe you do want an array back. This can be catered for – declare a second version of my_filter like so: 4

func my_filter
    <C1: ExtensibleCollectionType, C2: ExtensibleCollectionType
    where C1.Generator.Element == C2.Generator.Element>
    (source: C1, includeElement: (C1.Generator.Element)->Bool)
    -> C2 {
        var result = C2()
        for element in source {
            if(includeElement(element)) {
                result.append(element)
            }
        }
        return result
}

// same-type version will be used by default
let consonant_string = my_filter(s, isConsonant)
// but if you declare the result as a specific type, the
// second version will be used:
let consonant_array = my_filter(s, isConsonant) as Array

Third, there’s the big gotcha that means this wouldn’t be a good idea, but that I haven’t thought of. If you have, leave a comment or tweet me.


  1. or rather, _ExtensibleCollectionType, which contains the goods, and that ExtensibleCollectionType just inherits without any additions. I’m not sure why it’s done this way, though I’m guessing it’s not for no reason. 
  2. This is of course a horrible piece of code, ignoring upper-case characters, not to mention accented characters, but let’s keep the examples simple. 
  3. Using lazy(s).filter is probably more efficient, since it won’t require the construction of temporary Array. But the issue of it being a two-step process remains. 
  4. Having written the second version, you should probably implement the first one in terms of the second to avoid code duplication. 

Writing Algorithms on Collections in Swift

edit: this post has been updated to Swift as of Xcode 6.1 (Swift 1.1).

Suppose you want a new algorithm that checks if every member of an array fulfills a given criteria.

Your first thought might be to add it as an extension of Array:1

extension Array {
    func all_of<B: BooleanType>
        (predicate: (Element) -> B) -> Bool {
            for e in self {
                if !predicate(e) {
                    return false
                }
            }
            return true
    }
}

This function takes a predicate (a function that takes an element and returns true or false), and applies it to every member of the array, only returning true if it was true for every member.

Note we specify a template argument, B: BooleanType, that allows the predicate to return not just a Bool but any type that can behave like a Bool (such as ObjCBool)2.

We can test it works easily:

let isEven = { ($0 % 2) == 0 }
let mixed = Array(1...4)     // [1, 2, 3, 4]
let doubled = map(1...4) { $0 * 2 }
let evens = Array(doubled)   // [2, 4, 6, 8]

mixed.all_of(isEven)  // false
evens.all_of(isEven)  // true

Incidentally, the above shows a nice trick that’s useful for testing collection algorithms, which is that arrays can be initialized with sequences, such as ranges. So if you quickly want an array of increasing numbers, you can write Array(begin...end). If you want something more complicated, you can pass a range to map or filter to alter the range first.

This implementation is nice because it’s now built into Array, but there are a few downsides.

Firstly, what if you wanted a slightly different version all_of_equal, that just took a value and confirmed if every entry was equal to that value? You’ll need to test for equality, so Element will need to be Equatable. You might think you’d specify a function that goes like this:

extension Array {
    func all_of_equal<Element: Equatable>
    (value: Element) -> Bool {
        return self.all_of { $0 == value }
    }
}

Nope, that won’t work. The problem is that you can’t further constrain Element within Array to be Equatable. What if you created an array of something that didn’t conform to Equatable? Would this no longer be allowed now you’ve added this method? Or should this method not be callable? Neither of these are palatable so the language just won’t let you do it.

Instead, you need to define a non-member function that takes an Array and an Element. Then you can constrain that element as much as you like:

func all_of<E, B: BooleanType>
    (a: Array<E>, predicate: (E) -> B) -> Bool {
        for e in a {
            if !predicate(e) {
                return false
            }
        }
        return true
}

func all_of_equal<E: Equatable>
    (a: Array<E>, rhs: E) -> Bool {
        return all_of(a) { $0 == rhs }
}

In fact, because of Swift’s function overloading, you can even give it the same name:

func all_of<E: Equatable>
    (a: Array<E>, rhs: E) -> Bool {
        return all_of(a) { $0 == rhs }
}

// calls the version that takes an equatable value
all_of([1,1,1], 1)

// calls the version that takes a closure
all_of([1,1,1]) { $0%2 == 1 }

The second benefit of defining this as a non-member function is that you can generalize it to apply not just to arrays, but to any kind of collection.3 To do this, you need to extend the generic parameter clause in a couple of ways:

func all_of<E, C: CollectionType, L: BooleanType
    where E == C.Generator.Element>
    (c: C, predicate: (E) -> L) -> Bool {
        for e in c {
            if(!predicate(e)) {
                return false
            }
        }
        return true
}

First, we swap out Array for a type of collection, by specifying a generic placeholder C that must be a collection. Then, we further constrain the collection’s elements to be of type E, which is the type of input our predicate will take.

Strictly speaking, we don’t need to include the placeholder E. You could replace it in the predicate definition with C.Generator.Element. But you might find it’s cleaner to create a placeholder for it, especially if you apply further constraints to E.
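For reference, here’s what that looks like without the extra placeholder (a sketch):

func all_of<C: CollectionType, B: BooleanType>
    (c: C, predicate: (C.Generator.Element) -> B) -> Bool {
        for e in c {
            if !predicate(e) {
                return false
            }
        }
        return true
}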

Now, we can use this function on not just arrays, but any type of collection, such as ranges directly:

let isEven = { ($0 % 2) == 0 }
all_of(1...4, isEven)  // returns false

or other types of collection:

let doubled = map(1...4) { $0 * 2 }
all_of(doubled, isEven)  // returns true

There’s one final improvement that could be made. Collections also implement the SequenceType protocol, which gives us enough to iterate over the elements, checking the predicate. So we could rewrite all_of even more generally:

func all_of<E, S: SequenceType, B: BooleanType
where E == S.Generator.Element>
(sequence: S, predicate: (E) -> B) -> Bool {
    for e in sequence {
        if(!predicate(e)) {
            return false
        }
    }
    return true
}

Because sequences only need to support iteration, not indexing, we don’t need to mandate a forward index to make them work with the for loop.

There are some times you still want to use collections rather than sequences. If we were writing a different algorithm, say find_first_of, that we wanted to return an index into the collection of the first occurrence, we’d want to stick with an algorithm that works on collections. Also, collections imply deterministic results over multiple iterations, and consistent size, which some algorithms might rely on. But not in the case of all_of, so SequenceType is a better choice.


  1. The most important question facing the Swift community today: how do you split long function footprints over multiple lines? 
  2. In fact, the Swift standard library docs explicitly state that Bool and ObjCBool should really only be the two types that ever implement BooleanType, so maybe you could say screw it and just make the predicate return a Bool. That’s what the predicates for std lib functions like filter do. But then, that wouldn’t make for quite so educational a blog post. 
  3. You might be thinking, I know, I’ll extend CollectionType! But CollectionType is a protocol, and you can’t extend protocols with implementations. What you want is something like traits or mixins. Swift doesn’t have those (yet, hopefully). 

Implementing Ruby’s ||= operator in Swift using @autoclosure

edit: this has been updated for Swift as of Xcode 6.1 (Swift 1.1)

A common idiom in Ruby is to assign a value to a variable only if that variable isn’t already set, using the ||= operator:

s ||= "default value"

Swift doesn’t provide such an operator out of the box, but using some of the features of the language, it’s possible to implement one.

Simple first implementation

First let’s try implementing it just for string types. We need to define the operator itself first:

infix operator ||= {
    associativity right
    precedence 90
    assignment
}

And then implement it with a function:

func ||=(inout lhs: String?, rhs: String) {
    if(lhs == nil) {
        lhs = rhs
    }
}

Note the use of the assignment attribute on the operator definition, and the inout parameter on the left-hand side, which is the variable being assigned to.

This works. If you try the following

var s: String?
s ||= "first assignment"
s ||= "second assignment" 

the second assignment will have no effect on s – it will keep the value “first assignment”.

Using generics to apply to any type

What about implementing this with generics, so it will work not just for String but for any type? That’s pretty simple:

func ||=<T>(inout lhs: T?, rhs: T) {
    if(lhs == nil) {
        lhs = rhs
    }
}

Now you can use our ||= on any optional type – String?, Int?, or an optional of any user-defined type.

Using autoclosure to avoid unnecessary evaluation

Our ||= still doesn’t quite match the Ruby version. In Ruby, if the value on the left-hand side is already set, the expression on the right-hand side is never evaluated. This matters if, for example, the right-hand side is a call to a function with side-effects, or an expensive computation.

But by default in Swift, any expression passed as an argument is fully evaluated first. To replicate the Ruby behaviour, we have to use a Swift attribute called @autoclosure. It’s used like this:

func ||=<T>(inout lhs: T?, rhs: @autoclosure () -> T) {
    if(lhs == nil) {
        lhs = rhs()
    }
}

What autoclosure does is wrap the argument supplied in a closure, for later execution. Then, if it isn’t needed, it is never executed. If it is needed (in this case, because lhs is nil), the closure can be called (note the new type for rhs, and the parentheses after rhs, indicating it is now calling a function).

To check this works, try the following, which should only print “I’ve been run” once to the console:

func printlnWhenRun() -> Int {
    println("I've been run")
    return 0
}
var i: Int?
i ||= printlnWhenRun()
i ||= printlnWhenRun()

Implementing for Boolean values

This works great for optional types, but what about the more conventional behaviour, a compound logical-OR-and-assign operator for boolean values?

Despite it being listed in the Expressions precedence section of the language reference, this operator doesn’t appear to be implemented natively in Swift.

edit: it’s now been removed from the precedence section. You can still declare it as below though.

No problem though, we can just implement it ourselves. Here it is:

func ||=<T: BooleanType>(inout lhs: T, rhs: @autoclosure () -> T) {
    if(!lhs) {
        lhs = rhs()
    }
}

Note the BooleanType constraint on the type parameter. This is necessary because we need to guarantee that whatever type is used can be passed into the if statement. This wasn’t necessary with the other version, because there the optional qualifier supplied that ability.1

But doesn’t this clash with the other definition? Nope. A combination of the type constraint on the logical version, and the presence of the optional parameter on the optional-assignment version, means the possible uses of these two functions are entirely disjoint, and the Swift compiler will pick the right one for you. Even a Bool? type will go to the optional-assignment version.2
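A quick check that the right overloads get picked (a sketch):

var flag = false
flag ||= true              // logical version: flag is now true

var maybe: Int? = nil
maybe ||= 1                // optional version: maybe is now 1

var maybeBool: Bool? = nil
maybeBool ||= true         // Bool? still uses the optional version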

Using lazy properties instead of ||=

Having done all of this, you could argue ||= isn’t all that useful in Swift because of a different feature, lazy properties. These are properties of a class or struct that are only initialized the first time they are used – in much the same way as Ruby devs use ||=. Below is some code showing this in action:

class MyClass {
    lazy var complexThing = CreateComplexThing()
}
var c = MyClass()
// ... maybe some time later
var thing = c.complexThing  // CreateComplexThing() runs now

Obviously this only applies when your property is part of a class, not when you’re declaring a variable in a function, so maybe there’s still a place for ||=. But if not, at least we learned something implementing it.


  1. And there’s no need to constrain to objects with the ability to assign to each other – they all have that ability. There’s no “Assignable” protocol. Be nice if there were – that would imply you could overload “=”. But this ain’t C++. 
  2. It is possible to write two generic functions that overlap in their inputs, in which case you will get a “use of unresolved identifier” error from the compiler when you call the function (note: not when you define the overlapping function – a break from other generics features, where you get a compiler error at the time of definition rather than use). 

An Accumulator in Swift, Part 2 – Using Generics

In Part 1 of this post, we wrote an implementation of Paul Graham’s accumulator problem in Swift. Here’s one of the versions:

func foo(var n: Int) -> (Int) -> Int {
  return { n += $0; return n }
}

This looks pretty similar to the Ruby version he gives:

def foo (n)
  lambda {|i| n += i }
end

Except there’s a problem. On his page of guidelines for submitting an example in a new language1 he points out it needs to:

[work] for any numeric type – i.e. can take both ints and floats and returns functions that can take both ints and floats. (It is not enough simply to convert all input to floats. An accumulator that has only seen integers must return integers.)2

In Ruby, this isn’t an issue because of the duck typing. But in Swift, if you try to pass a Float to foo, you’ll get an error. We’re going to have to use another feature to fix this – generics.3

First let’s try implementing foo using generics, so it can take an Int or a Float. We do this by defining a type parameter T in angle brackets after the function name, and then using that instead of the explicit Ints:

func foo<T>(var n: T) -> (T) -> T {
  return { n += $0; return n }
}

Oops, more compiler errors. “Could not find an overload for ‘+=’ that accepts the supplied arguments” again.

This is because not all types have a += operator, so if you tried to use it with one of those, it wouldn’t work. Unlike in C++ templates, where you wouldn’t get a compiler error4 unless you tried to use foo with a type that was missing a +=, Swift requires you to restrict your types up-front.

The way you ensure a type will support the necessary functions is by using type constraints. These are protocols you attach to your generic type parameter. For example, if you wanted to write a generic less_than function, you could use the Comparable protocol:

func less_than<T: Comparable>(lhs: T, rhs: T) -> Bool {
    return lhs < rhs
}

Only types that implement the Comparable protocol can be used with the less_than function, and because they implement this protocol they're guaranteed to have a < operator.

The Swift standard library only defines 3 protocols so far – Equatable, Comparable and Printable.5 They don't cover what we need, so we'll define our own. Let's call it, uhmm, Addable.

protocol Addable {
    @assignment func += (inout left: Self, right: Self)
}

Then, we need to declare that Int and Float support Addable by extending them with the new protocol:

extension Int: Addable { }
extension Float: Addable { }

Finally, let's try our generic function with the new constraint:

func foo<T: Addable>(var n: T) -> (T) -> T {
  return { n += $0; return n }
}

Now, if you pass in a Float to foo, it works.6 In fact, if you tag any type that implements += as Addable, it will work for that type too. Try it with String.
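For example (a sketch – the String extension is ours, not something the standard library provides):

// Float already conforms via the extension above:
let accF = foo(Float(1.5))
accF(2.5)              // 4.0

// and String works too, once it's tagged as Addable:
extension String: Addable { }
let accS = foo("hello")
accS(", world")        // "hello, world"
accS("!")              // "hello, world!"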

Anyway, with this requirement covered, hopefully we can add Swift to the pantheon of powerful languages. Maybe one day they'll take new submissions to the page!


  1. These pages tell a story. First he was all hey guys, let’s get all the examples, this is cool! Then he’s like, no you numbskulls, read the problem, that’s not what it says! Eventually he gives up and tells people to stop emailing him. 
  2. Strictly speaking what we’ll implement still doesn’t cover this requirement, because the quote and the example following it imply it needs to be able to switch dynamically between integers and floating point halfway through. Read the next part for a solution to this. 
  3. I love generics. But then the first language they taught me in college was Ada, and after that I spent years in the C++ mines, so it’s not surprising. 
  4. The compiler error will be about 10 lines long and will fill you with hopeless despair. Concepts, something similar to Swift’s protocol constraints, didn’t make it into C++11. 
  5. They don't define Assignable. I guess that's considered so fundamental you can take it as a given. Presumably they're going to add a lot more standard protocols to the library over time, so keep an eye on that page. There are several more implemented but not yet documented – to see them, type import Swift into playground, and command-click it. 
  6. I was pretty amazed when it did the first time I tried this.