A persistent tree using indirect enums in Swift

UPDATE: now conforms to Swift 2.1b2

Suppose you want an ordered set. Set is great and all, but it’s unordered. You could always use a Set anyway, and sort the contents when needed. Or you could use an array, keep it sorted, and always insert new elements in the right place. Or, you could use a tree.

One way to build a tree in Swift would be with a class that refers to itself. Something like this for a binary search tree:

class Tree<Element: Comparable> {
    let value: Element
    // entries < value go on the left
    let left: Tree<Element>?
    // entries > value go on the right
    let right: Tree<Element>?
}

For the tree’s logic, Element would only have to be Comparable, not Hashable, which is another potential benefit over Set.

The left and right subtrees are optional, with nil at the leaves when there’s no lesser/greater value than the current node. But there’s a problem with this approach – how do you represent an empty tree? value is not optional. Sure you could make it optional, and have an empty tree be represented by a nil value. But that doesn’t feel healthy.

Really, you want an enum with an associated type. Something like this:

enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}

So, a tree is either empty, or it has a left and right subtree and a value. The subtrees no longer need to be optional, because if they’re empty, they can just hold the .Empty value.

No more boxes

Except, this won’t work. Swift enums are value types. So they can’t contain themselves like this. You need some kind of indirection – a reference type, such as a class.

Up until 2.0b4, you would have to use a Box trick – inserting a class in between each enum:

final class Box<Element> {
    let unbox: Element
    init(_ x: Element) { self.unbox = x }
}

enum Tree <Element: Comparable> {
    case Empty
    case Node(Box<Tree<Element>>,Element,Box<Tree<Element>>)
}

Well that’s pretty gross. And we haven’t even got to the actual implementation logic yet. Trust me, Box is gonna stomp all over that code leaving its muddy footprints everywhere.

But good news! The latest release of Swift introduces the indirect keyword for enums. This is similar to a transparent version of the box above, putting a reference where the boxed value would go. Xcode will even suggest putting in the keyword for you.

So now we can write this:

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}

But really, we want a balanced tree. Simple binary search trees work well with random data, but if you insert already-ordered data into them, they degenerate into a linked list (because all the insertions happen down one side of the tree).

So let’s go with a red-black tree – a kind of binary search tree that colours its nodes red or black, and then uses some invariants to guarantee it remains balanced on insertion. If you follow that link you’ll see quite a lot of ugly code for maintaining those invariants – but don’t worry, we’re going to write something that hopefully looks a lot simpler.

So here’s the tree, now with added colour, plus a couple of initializers to make things easier:

enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Color,Tree<Element>,Element,Tree<Element>)

    init() { self = .Empty }

    init(_ x: Element, color: Color = .B,
        left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
    {
        self = .Node(color, left, x, right)
    }
}

Some might not like that I called my Color cases R and B instead of Red and Black. It’s a personal preference, but I like to keep them shorter because a) they’re arbitrary – they are only called that because those were the two colours on the laser printer they had at Xerox PARC, and b) it keeps the pattern matching shorter and (therefore, IMO) clearer. Judge for yourself when we get to that part.

By the way, this implementation is a Swift translation of an ML version from Chris Okasaki’s Purely Functional Data Structures which is definitely worth reading if you like this kind of stuff. What’s cool is that with the new 2.0 features of Swift, the Swift version is getting pretty close to the ML one in terms of expressivity/simplicity.

Checking if the tree contains an element

So, let’s start with contains. This needs to recurse down the tree, checking each node to see if it’s empty, or if not, whether its value is less than, greater than or equal to the value we’re looking for.

Ideally I would write this using a guard statement like so:

extension Tree {
    func contains(x: Element) -> Bool {
        guard case let .Node(_,left,y,right) = self
          else { return false }

        if x < y { return left.contains(x) }
        if y < x { return right.contains(x) }
        return true
    }
}

I like how guard helps here. It lets us unpack the important stuff used by the main body of the function (is it a node? If so, grab the left, right and value elements – discarding color, since it’s not important). But it also lets us put the failure case – if you hit an empty node, you know the tree does not contain the element – up front. Then, head left if the element we’re looking for is less than this one, or right if it’s greater.

We can rely on the fact that Comparable elements are required to be in a strict total order. This means if x is neither less than nor greater than a value, it must be equal to that value. So we’ve found the element and contains is true.
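
For example, here’s a quick sketch using the initializers from above to build a tiny tree by hand and check it:

let t = Tree(2, left: Tree(1), right: Tree(3))

t.contains(2)   // true
t.contains(3)   // true
t.contains(4)   // false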

Inserting an element

OK, insert is a little trickier. We’re going to define insert as a method that returns a new tree – one with the new value inserted. This is a little different to Set.insert, which is an in-place insert.

In doing this, we’re going to create a “persistent” data structure – one that does not change the old structure when it updates it. This might sound like multiple insertions in a row are going to perform many unnecessary copies of the whole structure, but it isn’t necessarily that bad. Because nodes of the tree are read-only, two trees can share nodes. Only the nodes that are actually changed as a result of the insert need to be copied.
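
Here’s a rough sketch of what that sharing buys you in practice, using the insert method we’re about to define:

let t1 = Tree(5).insert(3).insert(8)
let t2 = t1.insert(4)

// t1 is untouched – it still contains just 3, 5 and 8.
// t2 contains 3, 4, 5 and 8, and shares the nodes of t1
// that the insertion didn't need to copy.
t1.contains(4)  // false
t2.contains(4)  // true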

First, the insert method. It doesn’t really do much except enforce one of the invariants required to keep the tree balanced: the root node must always be black. It leaves most of the work to a second function, ins:

private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
    guard case let .Node(c, l, y, r) = into
        else { return Tree(x, color: .R) }

    if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
    if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
    return into
}


extension Tree {
    func insert(x: Element) -> Tree {
        guard case let .Node(_,l,y,r) = ins(self, x)
            else { fatalError("ins should never return an empty tree") }

        return .Node(.B,l,y,r)
    }
}

So, if we’ve hit an empty node, the value belongs here, so return a new node containing it. New nodes are always red.

Otherwise, insert the value into the left or right subtree. And if the element is equal, stop and don’t insert anything, since we’re implementing set behaviour here.

But there’s also the call to balance. This function is going to ensure that, as we come back up the tree having inserted our new element, the tree is properly balanced, by looking at whether any invariants have been broken and fixing them.

This is normally where the ugliness of implementing a red-black tree resides. But we’re going to use Swift’s pattern matching to try and keep it simple. The balancing logic can be summarized as: “on the way back up from an insertion, if there’s a black node, with a red child and a red grandchild, this needs fixing”.

A red child and grandchild of a black node can happen in one of four different configurations. Here’s a visualization of one of those possibilities (excuse the ASCII art):

        z: black
       /        \
      x: red     d
     / \
    a   y: red
       / \
      b   c

needs rearranging into:

         y: red
       /        \
      x: black   z: black
     / \        / \
    a   b      c   d

Once you understand this visualization, you can encode it as a Swift pattern-matching case that extracts all the relevant variables from the possible imbalanced structure, then returns them reconfigured as the second, balanced, version.

The above re-balancing operation would end up encoded in Swift like this:

case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
    return .Node(.R, .Node(.B, a, x, b), y, .Node(.B, c, z, d))

This is why I prefer .B and .R for the colour case names – single-letter names look clearer to me at this stage, which is when they matter. There’s no getting around it: the transformation code is going to be hard to read no matter what you do, so the best you can hope for is easy translation. The longer words wouldn’t help with the key problem, which is encoding the diagram above into a pattern-matching statement.

To write the full balancing function, you need to visualize all 4 imbalanced configurations, then encode them into 4 patterns. If a node doesn’t match any, it’s good and you can return it as-is:

private func balance<T>(tree: Tree<T>) -> Tree<T> {
    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }
}

Note, this pattern match reaches 3 levels deep into the enum. This is also where getting rid of Box was critical. It seriously gets in the way – here is how just one of the four cases would look before indirect:

if case let .Node(.B,l,z,d) = tree, case let .Node(.R,ll,y,c) = l.unbox, case let .Node(.R,a,x,b) = ll.unbox {
    return .Node(.R, Box(.Node(.B,a,x,b)),y,Box(.Node(.B,c,z,d)))
}

Note, the return statement is the same in all 4 rebalance patterns. It’s only the pattern match that changes, based on which of the 4 imbalance possibilities is being checked.

It might be nice to be able to group these 4 possible matches together like so:

    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d),
             .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d),
             .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)),
             .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }

but Swift won’t allow multiple patterns in a case when the pattern declares variables, so we have to stick with the multiple returns for now.

Traversing in-order

So, now we have a nice balanced tree. But to get the benefit of an ordered set, we want to iterate over it in order. This sounds like a job for SequenceType. That way, you could use for...in over the sorted elements.

In-order traversal of a tree is usually done recursively, but that’s a little tricky to do when you want to give anyGenerator a closure expression that spits out the next element. So instead, we can use a stack to do it iteratively:

extension Tree: SequenceType {
    func generate() -> AnyGenerator<Element> {
        var stack: [Tree] = []
        var current: Tree = self

        return anyGenerator { () -> Element? in
            while true {
                // if there's a left-hand node, head down it
                if case let .Node(_,l,_,_) = current {
                    stack.append(current)
                    current = l
                }
                // if there isn’t, head back up, going right as
                // soon as you can:
                else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
                    current = r
                    return x
                }
                else {
                    // otherwise, we’re done
                    return nil
                }
            }
        }
    }
}
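
As a quick sanity check (a sketch), you can build a small tree out of order and confirm that iteration visits the elements in sorted order:

let t = Tree(2).insert(3).insert(1)

for x in t {
    print(x)        // prints 1, 2, 3 – sorted, regardless of insertion order
}

Array(t)            // [1, 2, 3] – Array.init accepts any SequenceType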

Initializing from another sequence

One last extension. When you have a container, it’s nice to be able to initialize it from another sequence, as well as from a literal. These are both pretty easy:

extension Tree: ArrayLiteralConvertible {
    init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
        self = source.reduce(Tree()) { $0.insert($1) }
    }

    init(arrayLiteral elements: Element...) {
        self = Tree(elements)
    }
}

Given these, you can now create new sets using literal syntax:

let alphabet = Tree("the quick brown fox jumps over the lazy dog".characters)
let primes: Tree = [2, 3, 5, 7, 11, 13, 17, 19]

That’s it for now. Several features of Swift 2.0 make this code a lot more pleasant to read and write. There are various further enhancements you could make. The tree maybe ought to be wrapped in an outer struct to avoid exposing the enum implementation. The next logical extension would be conforming to CollectionType.

And there are also performance improvements you could make (the functional data structures book suggests a few as exercises). Honestly, the times when this data structure will do better than an unordered set or array + sort are probably going to be pretty rare. But it’s nice to have the option now.

Below is the final code, along with some tests, or you can find the full code for the tree in a gist, here.

If you found this interesting (and if you made it all the way down here, hopefully that’s true!), it’ll be part of Chris Eidhof and my book, which you can get a preview of here.

enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Color,Tree<Element>,Element,Tree<Element>)
    
    init() { self = .Empty }
    
    init(_ x: Element, color: Color = .B,
        left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
    {
        self = .Node(color, left, x, right)
    }
}


extension Tree {
    func contains(x: Element) -> Bool {
        guard case let .Node(_,left,y,right) = self
            else { return false }
        
        if x < y { return left.contains(x) }
        if y < x { return right.contains(x) }
        return true
    }
}

private func balance<T>(tree: Tree<T>) -> Tree<T> {
    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }
}


private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
    guard case let .Node(c, l, y, r) = into
        else { return Tree(x, color: .R) }

    if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
    if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
    return into
}


extension Tree {
    func insert(x: Element) -> Tree {
        guard case let .Node(_,l,y,r) = ins(self, x)
            else { fatalError("ins should never return an empty tree") }

        return .Node(.B,l,y,r)
    }
}



extension Tree: SequenceType {
    func generate() -> AnyGenerator<Element> {
        var stack: [Tree] = []
        var current: Tree = self

        return anyGenerator { () -> Element? in
            while true {
                // if there's a left-hand node, head down it
                if case let .Node(_,l,_,_) = current {
                    stack.append(current)
                    current = l
                }
                // if there isn’t, head back up, going right as
                // soon as you can:
                else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
                    current = r
                    return x
                }
                else {
                    // otherwise, we’re done
                    return nil
                }
            }
        }
    }
}

extension Tree: ArrayLiteralConvertible {
    init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
        self = source.reduce(Tree()) { $0.insert($1) }
    }
    
    init(arrayLiteral elements: Element...) {
        self = Tree(elements)
    }
}

import Darwin

extension Array {
    func shuffle() -> [Element] {
        var list = self
        // guard against an empty array, where count - 1 would form an invalid range
        guard list.count > 1 else { return list }
        for i in 0..<(list.count - 1) {
            let j = Int(arc4random_uniform(UInt32(list.count - i))) + i
            guard i != j else { continue }
            swap(&list[i], &list[j])
        }
        return list
    }
}


let engines = [
    "Daisy", "Salty", "Harold", "Cranky",
    "Thomas", "Henry", "James", "Toby",
    "Belle", "Diesel", "Stepney", "Gordon",
    "Captain", "Percy", "Arry", "Bert",
    "Spencer",
]

// test inserting the engines in various different permutations
for permutation in [engines, engines.sort(), engines.sort(>), engines.shuffle(), engines.shuffle()] {
    let t = Tree(permutation)
    assert(!t.contains("Fred"))
    assert(t.elementsEqual(t.insert("Thomas")))
    assert(!engines.contains { !t.contains($0) })
    assert(t.elementsEqual(engines.sort()))
    print(t.joinWithSeparator(","))
}

More fun with implicitly wrapped non-optionals

As we’ve seen previously, Swift will happily auto-wrap your non-optional value inside an optional temporarily if it helps make the types involved compatible. We’ve seen the occasional instance where this can be surprising. But for the most part, this implicit wrapping of non-optionals is your friend.1 Here are a few more examples.

Equating Optionals

The Swift standard library defines an == operator for optionals:

func ==<T : Equatable>(lhs: T?, rhs: T?) -> Bool

So long as your optional wraps a type that’s Equatable, that means you can compare two optionals of the same underlying type:

let i: Int? = 1
let j: Int? = 1
let k: Int? = 2
let m: Int? = nil
let n: Int? = nil

i == j  // true,  1 == 1
i == k  // false, 1 != 2
i == m  // false, non-nil != nil
m == n  // true,  nil == nil

let a: [Int]? = [1,2,3]
let b: [Int]? = [1,2,3]

// error, [Int] is not equatable
// (it does have an == operator, but that's not the same thing)
a == b

let f: Double? = 1.0

// ignore the weird compiler error,
// the problem is Int and Double aren't
// the same type...
i == f

Well, this is useful, but even more useful is that this one definition of == allows you to compare optionals against non-optionals too. Suppose you want to know if an optional string contains a specific value. You don’t care if it’s nil or not – just whether it contains that value.

You can just pass your non-optional into that same == that takes optionals:

let x = "Elsa"

let s1: String? = "Elsa"
let s2: String? = "Anna"
let s3: String? = nil

s1 == x  // true
s2 == x  // false
s3 == x  // false

This works because Swift is implicitly wrapping x in an optional each time. If Swift didn’t do this, and you implemented an optional-to-non-optional operator yourself, you’d have to define == an extra two times, to make sure it was symmetric (which is a requirement whenever you implement ==):

// definitions if there weren't implicit optionals
func ==<T:Equatable>(lhs:T, rhs:T?)->Bool { return rhs != nil && lhs == rhs! }
func ==<T:Equatable>(lhs:T?, rhs:T)->Bool { return lhs != nil && lhs! == rhs }

Beware though, optional == isn’t all sunshine and unicorn giggles. Imagine the following code:

let a: Any = "Hello"
let b: Any = "Goodbye"

// false, of course
(a as? String) == (b as? String)

// oops, this is true...
(a as? Int) == (b as? Int)

Depending on how you think, this is either perfectly reasonable or immensely confusing, but either way you should bear it in mind. And maybe it’s another reason to add to your long list of “Why not to use Any and as unless you really, really have to.”
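
If you do find yourself comparing the results of as? casts like this, one defensive option (just a sketch with a hypothetical helper, not the only approach) is to treat a failed cast as “no match”:

// a failed cast never counts as a match
func equalAsInt(lhs: Any, rhs: Any) -> Bool {
    if let x = lhs as? Int {
        if let y = rhs as? Int {
            return x == y
        }
    }
    return false
}

equalAsInt(a, b)  // false – neither value is actually an Int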

Comparing Optionals

Unsurprisingly, there’s a similar definition of < that will compare two optionals so long as what they wrap is Comparable:

func <<T : _Comparable>(lhs: T?, rhs: T?) -> Bool

If lhs and rhs both have some value, it compares them. Two nils are considered equal just like before. But a nil is always considered less than any non-nil value (Comparable must always enforce a strict total order, so nil has to go somewhere well-defined in the ordering).

So if you sort a set of optionals, the nil values all end up at one end:

let a: [Int?] = [1, 3, nil, 2]
sorted(a, <)
// results in [nil, 1, 2, 3]

This is fairly intuitive, but it can lead to other subtle bugs. Suppose you want to filter a list like so:2

let ages = [
    "Tim":  53,  "Angela": 54,  "Craig":   44,
    "Jony": 47,  "Chris":  37,  "Michael": 34,
]

let people = ["Tim", "Johny", "Chris", "Michael"]

// don't trust anyone over 40
let trust = people.filter { ages[$0] < 40 }

// trust will contain ["Johny", "Chris", "Michael"]

The string "Johny" was not in the ages dictionary, so you get back nil – but the way < works, nil is less than 40, so "Johny" is included in the filtered array.
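
If what you actually meant was “under 40, and actually present in the dictionary”, one fix (a sketch) is to unwrap before comparing:

// only trust people whose age we actually know to be under 40
let knownTrust = people.filter { name in
    if let age = ages[name] {
        return age < 40
    }
    return false
}
// knownTrust is ["Chris", "Michael"] – "Johny" is no longer included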

Optional Subscripts

While we're talking about dictionaries, did you know that Dictionary.subscript(key) doesn't just return an optional, it also takes an optional?

var d = ["1":1]

let four = "4"
let nonsense = "nonsense"

// this will add ["4":4] to d
// remember toInt returns an optional
// (the string might not have been a number)
d[four] = four.toInt()

// nothing will be added, because
// the rhs is nil
d[nonsense] = nonsense.toInt()

// this will remove the "1" entry from d
d["1"] = nil

That's quite useful – but also, there wasn't much choice in it working like this.3 The setter of Dictionary.subscript(key) has to take an optional, because when you define a setter, you must also define a getter. And Dictionary's getter has to return an optional (in case the key isn't present). So the setter's input value has to be optional too.

This means every time you assign a value to a dictionary using a key subscript, your non-optional value is getting auto-wrapped in an optional. If this didn’t happen automatically, it would make assigning values to your dictionary a real pain, as you’d have to type d[key] = .Some(value) every time.
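
In other words, the two assignments below do exactly the same thing – the second just spells out the wrapping the compiler performs for you in the first:

var scores = ["Anna": 1]

scores["Elsa"] = 2          // what you write
scores["Elsa"] = .Some(2)   // what the setter actually receives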

To see this in practice, here's a very bare-bones (very inefficient) implementation of a dictionary. Take a look at the subscript to see how the optional is handled:

public struct RubbishDictionary<Key: Equatable, Value> {

    public typealias Element = (Key, Value)

    // just store the key/value pairs in an unsorted
    // array (see, told you it was rubbish)
    private typealias Storage = [Element]
    private var _store: Storage = []

    // there's a reason why this is a private method,
    // which I'll get to later...
    private func _indexForKey(key: Key) -> Storage.Index? {
        // new year's resolution: see if I can avoid
        // calling array[i] completely in 2015
        for (idx, (k, _)) in Zip2(indices(_store),_store) {
            if key == k { return idx }
        }
        return nil
    }

    // Subscript for key-based lookup/storage
    public subscript(key: Key) -> Value? {
        // "get" is pretty simple, either key can be
        // found, or not...
        get {
            if let idx = _indexForKey(key) {
                // implicit optional wrap
                return _store[idx].1
            }
            else {
                return nil
            }
        }
        // remember, newValue is a Value?
        set(newValue) {
            switch (_indexForKey(key), newValue) {

            // existing index, so replace value
            case let (.Some(idx), .Some(value)):
                _store[idx].1 = value

            // existing index, nil means remove value
            case let (.Some(idx), .None):
                _store.removeAtIndex(idx)

            // new index, add non-nil value
            case let (.None, .Some(value)):
                _store.append((key, value))

            // new index, nil value is no-op
            case (.None,.None):
                break
            }
        }
    }
}

Finally, just a reminder that an optional containing an optional containing nil is not the same as nil:

var d: [Int:Int?] = [1:1, 2:nil]
// d now contains two entries, 1:1 and 2:nil
d[2] = nil  // removes nil
d[3] = .Some(nil)  // adds 3:nil

Full implementation of RubbishDictionary

So, in case you've made it all the way down here, and you're interested, I thought I'd end with a fuller but minimal-as-possible implementation of RubbishDictionary. It has almost-feature-parity with Dictionary (it's just missing getMirror and init(minimumCapacity)). It ought to do everything Dictionary can, just slower.

If reading raw code's not your thing, skip it. But it shows how to simply implement several of the standard library's protocols such as CollectionType and DictionaryLiteralConvertible, as well as a few other features like lazily-evaluated key and value collections. I like how Swift extensions allow you to build up the type in logical groups (e.g. each protocol implementation one-by-one). Since this is a post about implicit optional wrapping, I've flagged where it’s also helping keep the code concise.

I have a nagging feeling there are some neater ways to handle some of the optionals – if you spot one, let me know. Oh and bugs. I'm sure there are a couple of them.

// building on the initial definition given above...

// An index type for RubbishDictionary.  In theory, you could perhaps just
// expose an alias of the storage type’s index, however, if the Key type were
// an Int, this would mean subscript(key) and subscript(index) would be
// ambiguous. So instead it's simple wrapper struct that does little more than
// pass-through to the real index.
public struct RubbishDictionaryIndex<Key: Equatable, Value>: BidirectionalIndexType {
    private let _idx: RubbishDictionary<Key,Value>.Storage.Index

    public typealias Index = RubbishDictionaryIndex<Key,Value>

    public func predecessor() -> Index { return Index(_idx: _idx.predecessor()) }
    public func successor() -> Index { return Index(_idx: _idx.successor()) }
}

// Required to make the index conform to Equatable
// (indirectly required by BidirectionalIndexType)
public func ==<Key: Equatable, Value>
  (lhs: RubbishDictionaryIndex<Key,Value>, rhs: RubbishDictionaryIndex<Key,Value>)
   -> Bool { 
    return lhs._idx == rhs._idx 
}

// Conformance to CollectionType
extension RubbishDictionary: CollectionType {
    public typealias Index = RubbishDictionaryIndex<Key,Value>

    // Add a subscript that takes an Index.  This one
    // does not need to return an optional, since an indexed
    // entry _must_ exist.  Read-only.
    public subscript(idx: Index) -> Element { return _store[idx._idx] }

    public var startIndex: Index { return Index(_idx: _store.startIndex) }
    public var endIndex: Index { return Index(_idx: _store.endIndex) }

    public typealias Generator = IndexingGenerator<RubbishDictionary>
    public func generate() -> Generator { return Generator(self) }
}

// Conformance to Printable and DebugPrintable
extension RubbishDictionary: Printable, DebugPrintable {
    public var description: String {
        let elements = map(self) { "\($0.0):\($0.1)" }
        return "[" + ",".join(elements) + "]"
    }

    public var debugDescription: String {
        let elements = map(self) { "\(toDebugString($0.0)):\(toDebugString($0.1))" }
        return "[" + ",".join(elements) + "]"
    }
}

// Conformance to DictionaryLiteralConvertible
extension RubbishDictionary: DictionaryLiteralConvertible {
    public init(dictionaryLiteral elements: (Key, Value)...) {
        for (k,v) in elements {
            self[k] = v
        }
    }
}

// Remaining features supported by Dictionary
extension RubbishDictionary {

    public var count: Int { return _store.count }
    public var isEmpty: Bool { return _store.isEmpty }

    // now for the public version of indexForKey
    public func indexForKey(key: Key) -> Index? {
        // this maps non-nil results from _indexForKey to
        // the wrapper type, or just returns nil if 
        // _indexForKey did...
        return _indexForKey(key).map { Index(_idx: $0) }
    }

    // a lazily-evaluated collection of keys in the dictionary
    public var keys: LazyBidirectionalCollection<MapCollectionView<RubbishDictionary<Key,Value>,Key>> {
        return lazy(self).map { (k,v) in k }
    }

    // and the same for the values
    public var values: LazyBidirectionalCollection<MapCollectionView<RubbishDictionary<Key,Value>,Value>> {
        return lazy(self).map { (k,v) in v }
    }

    // unlike its subscript equivalent, updateValue:forKey does not take an optional value...
    public mutating func updateValue(value: Value, forKey key: Key) -> Value? {
        if let idx = _indexForKey(key) {
            let old = _store[idx].1
            _store[idx].1 = value
            // implicit optional wrap
            return old
        }
        else {
            _store.append((key,value))
            return nil
        }
    }

    // returns the value removed, if there was one
    public mutating func removeValueForKey(key: Key) -> Value? {
        if let idx = _indexForKey(key) {
            // implicit optional wrap
            return _store.removeAtIndex(idx).1
        }
        else {
            return nil
        }
    }

    public mutating func removeAtIndex(index: Index) {
        _store.removeAtIndex(index._idx)
    }

    public mutating func removeAll(keepCapacity: Bool = false) { 
        _store.removeAll(keepCapacity: keepCapacity)
    }

}

  1. Unlike implicitly-wrapped optionals, which are more like the kind of friend that gets you into fights in bars. 
  2. I'm getting a lot of mileage out of this example
  3. I guess it could have asserted on a nil value. 

Randomly Lazy

In the previous article on extending lazy, we extended the lazy views with first_n, which returned a sequence of the first n elements of a sequence.

But this isn’t great if what you passed in was a random-access indexed collection. Where possible, it’s nice to retain random access on the output of your lazy function. The lazy map does this, for example. If you pass an array into lazy then call map, the collection you get back is a LazyRandomAccessView, and you can index randomly into it just like you could into an array.

To do this for first_n, we need a struct that implements Collection, takes a Collection and a count n in its initializer, and layers a view over the top of that collection that only exposes the first n elements.

Or, in the more general case, a view that takes a collection and a sub-range within that collection. Here it is:

struct SubrangeCollectionView<Base: Collection>: Collection {
    private let _base: Base
    private let _range: Range<Base.IndexType>
    init(_ base: Base, subrange: Range<Base.IndexType>) {
        _base = base
        _range = subrange
    }

    var startIndex: Base.IndexType {
      return _range.startIndex
    }
    var endIndex: Base.IndexType {
      return _range.endIndex
    }

    subscript(idx: Base.IndexType)
        -> Base.GeneratorType.Element {
        return _base[idx]
    }

    typealias GeneratorType
        = IndexingGenerator<SubrangeCollectionView>
    func generate() -> GeneratorType {
        return IndexingGenerator(self)
    }
}

This sub-range collection view is so useful that I was sure it was somewhere in the Swift standard library. That’s partly why I wrote this article on collection and sequence helpers – as a side-effect of going line by line looking for it. I’m still expecting someone to reply to this article saying “no you fool, you just use X”. If you are that person, call me a fool here.

The Sliceable protocol seems designed to provide this function. It requires a collection to typealias a SliceType and implement a subscript(Range)->SliceType method. But you need the collection to implement Sliceable, which seems overly restrictive, whereas the view above works for any collection.

SubrangeCollectionView enables you to pass in a sub-range of any collection to any algorithm that takes a collection or sequence. For example:

let r = 1...10
let halfway = r.startIndex.advancedBy(5)
let top_half = SubrangeCollectionView(r,
                subrange: halfway..<r.endIndex)
reduce(top_half,0,+)  // returns 6+7+8+9+10

A nice feature is that if you pass a subrange collection into an algorithm that returns an index, such as find, the index returned can also be used on the base collection. So in the example above, if you ran if let idx = find(top_half,8), you could then use idx on not just top_half[idx], but also r[idx].
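
Here’s that last point as code (a quick sketch):

if let idx = find(top_half, 8) {
    top_half[idx]   // 8
    r[idx]          // also 8 – the same index works on the base collection
}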

Operating on subranges is a capability that C++ developers, used to the STL, will be missing from Swift collection algorithms. In the STL, operations on containers are performed not by passing the container itself to the algorithm, but instead by passing two iterators on that container defining a range.

For example, here is the definition of the find algorithm in the C++ STL compared to the Swift version:

// C++ std::find
template <class InputIterator, class T>
  InputIterator find
    (InputIterator first, InputIterator last, const T& val);

// Swift.find
func find
  <C: Collection where C.GeneratorType.Element: Equatable>
    (domain: C, value: C.GeneratorType.Element) -> C.IndexType?

While Swift.find takes a collection as its first argument, std::find takes a first and last iterator.

STL container iterators are similar to Swift collection indices, in that they can be forward, bidirectional or random, and are used to move up and down the collection. Like Swift indices, the end iterator points to one past the last element – the equivalent of Swift.find returning nil for not found would be std::find returning end.

But unlike Swift indices, STL iterators model C pointer-like behaviour, so they are dereferenceable. To get the value the iterator points to, you don’t need access to the underlying container via something like Swift’s subscript. To get the value at first, you call *first. This is why std::find doesn’t need to take a container in its input. It just increments first until either it equals last, or *first equals val.

There are lots of reasons why the STL models iterators like this: because it’s like pointers into C arrays; because it dodges memory management issues (which won’t apply to Swift) etc. But also because, with this model, every algorithm will work just as well on a sub-range of a container as on the whole container. You don’t have to pass the starting or ending iterator for the container into find, you can pass in any two arbitrary points.1

Swift indices, on the other hand, are pretty dumb beasts. They don’t have to know about what value they point to or what collection they index. Heck, the index for Array is just an Int.2

For all the advantages Swift’s approach of just passing in collections as arguments has (it looks a lot cleaner, doesn’t confuse beginners),3 it lacks this ability to apply algorithms to subranges. SubrangeCollectionView gives you that ability back.

Anyway, back to the original problem, which was to enhance LazyRandomAccessView with a version of first_n that returns another LazyRandomAccessView. With the help of SubrangeCollectionView, here it is:

extension LazyRandomAccessCollection {
  func first_n
    (n: LazyRandomAccessCollection.IndexType.DistanceType)
    -> LazyRandomAccessCollection<SubrangeCollectionView<LazyRandomAccessCollection>> {
      let start = self.startIndex
      let end = min(self.endIndex, start.advancedBy(n))
      let range = start..<end
      let perm = SubrangeCollectionView(self, subrange: range)
      return lazy(perm)
    }
}

let a = [1, 2, 3, 4, 5, 6, 7]
let first3 = lazy(a).first_n(3)
first3[1]  // returns 2

One final interesting question is whether to do the same for forward and bidirectional lazy views as well. The problem here is computing the endIndex for the view. Unlike with a random-access index, you can’t just advance them by n in constant time. It takes O(n) because the index has to be walked up one by one. For this reason, I’d stick with returning sequences for these.


  1. Of course, you can also pass a later iterator to first than to last, at which point, kaboom! A risk which passing in the container itself eliminates. 
  2. Of course, you could write an über-index type for your collection, that did understand about what it pointed at. Maybe you could even overload the * operator for it to dereference itself. 
  3. And let's face it, you operate on the whole collection 99.9% of the time. 

Jumping to conclusions

In an article yesterday, I mentioned that I wasn’t having much luck extending Dictionary to give it an initializer that took a sequence of tuples, because my attempt wasn’t compiling, even though I was sure that it should:

extension Dictionary {
    init<S: Sequence
      where S.GeneratorType.Element == Element>
      (_ seq: S) {
        self.init()
        for (k,v) in seq {
          self[k] = v
        }
     }
}

It looked like the problem was with destructuring a typealiased tuple type into its constituents. Like a good little beta tester, I settled in this morning with a fresh pot of coffee to file a bug report. My plan was to reproduce the problem without the Dictionary part, to try and clarify exactly where the issue lay. I tried writing a couple of classes that used typealiased tuples, one taking two generic placeholders and typealiasing the pair of them as a tuple, and then writing a function on one that took the other.

After about 20 minutes plugging away at this, I was still unable to reproduce it independently, and was getting quite frustrated. Then I realized: hang on, what made me so sure this was a problem with tuples?1 I pushed aside my failed attempt and just tried compiling this, and got the exact same error:

extension Array {
    func someFunc<S: Sequence
      where S.GeneratorType.Element == Element>
      (_ seq: S) {
        for e in seq {

        }
     }
}

Well, shit. That’ll teach me to overthink a problem. I scolded myself for not applying Occam’s razor, and for assuming the cause was more complicated than it was.2 Once I stopped obsessing over the tuple part, the next step in fixing my code became obvious. for...in is just shorthand for creating a generator and then looping while next() returns non-nil elements. So I tried this:

extension Array {
    func someFunc<S: Sequence
      where S.GeneratorType.Element == Element>
      (seq: S) {
        var gen = seq.generate()
        let elem = gen.next()
    }
}

Bingo! The call to gen.next() reports Cannot convert the expression's type 'Self.Element?' to type 'Self.Element?'. Somehow type inference is borked. But if I qualify the declaration of elem as elem: Element?, it works fine.

So now I know how to work around it, I can finally write my sequence initializer for dictionary.

But first: @rbrockerhoff replied yesterday with a gist that was very close to the Sequence initializer (it took MapCollectionView as an argument rather than the sequence directly, which somehow avoids the type inference issue in a similar way to how my version that took an Array did).

As part of this, he did something very smart, which is to separate out inserting the sequence into the dictionary into another method, and then calling it from init instead of having the loop in init itself. This is a great example of solving something generically first, and then specializing your generic solution to the immediate problem. So here’s the full solution:

extension Dictionary {
    init<S: Sequence
      where S.GeneratorType.Element == Element>
      (_ seq: S) {
        self.init()
        self.merge(seq)
    }

    mutating func merge<S: Sequence
      where S.GeneratorType.Element == Element>
      (seq: S) {
        var gen = seq.generate()
        while let (k: KeyType, v: ValueType) = gen.next() {
            self[k] = v
        }
    }
}

let pairs = enumerate(["zero","one", "two"])
let d: [Int:String] = Dictionary(pairs)
d[1] // returns {Some "one"}... yes!

Just one wrinkle – you still have to give the dictionary variable a type; it won’t infer it. I’m not sure if this is a bug in my code, a Swift bug, or a feature not a bug. If you know, let me know.


  1. I was possibly nudged towards the tuple assumption because of another bug in playgrounds with mapping arrays of tuples. 
  2. I did the same thing yesterday when I convinced myself the only way to write my own Dictionary.init() was to create another dictionary and assign it to self, before a helpful comment from @NachoSoto suggested I just try self.init(). Duh. 

Swift’s Lazy Collections and Sequences

Warning: this article is out of date as of Swift 1.0 beta 4. Read this updated version instead.

The Swift non-member function map acts much like the Array.map member function, except you can pass in any kind of sequence or collection (click here for an article on how to write an algorithm like this).

But if you look at what map returns, it isn’t an array of results. It’s an object (a MapCollectionView or a MapSequenceView).

That’s because map evaluates its results lazily. When you call map, no mapping takes place. Instead, the input and mapping function are stored for later, and values are only mapped as and when they are needed.

To see this happening, run the following code:

let r = 1...3
let mapped = map(r) {
    (i: Int) -> Int in
    println("mapping \(i)")
    return i*2
}

println("crickets...")

for i in mapped {
    println("mapped \(i)")
}

println("tumbleweed...")

println("index 2 = \(mapped[2])")

// prints out
crickets...
mapping 1
mapped 2
mapping 2
mapped 4
mapping 3
mapped 6
tumbleweed...
mapping 2
index 2 = 4

The returned object from map mimics the properties from the object passed in. If you pass in a sequence, then you get back a sequence. If the collection you passed in only supported a bidirectional index (such as a Dictionary or a String, which can’t be indexed randomly), then you won’t be able to subscript arbitrarily into the mapped collection, you’ll have to walk through it.

map is not the only lazy function. filter and reverse also return these “view” classes. Combining these different lazily evaluated classes means you can build up quite complicated expressions without sacrificing efficiency.

Want to search backwards through an array or string? find(reverse(myarray), value) won't actually reverse the array to search through it; it just iterates over the ReverseView class, which serves up the array in reverse order.

Want the first 3 odd numbers that are the minimum of the two arrays at each position? first_n(filter(map(Zip2(a1, a2), min)) { $0 % 2 == 1 }, 3) won't do any more zipping or minning or modding than necessary.123

There are also several lazily generated sequences you can use:

enumerate numbers each item in a collection – it returns an EnumerateGenerator that serves up those numbered elements on demand. This is useful in a for (count, val) in enumerate(seq) loop, where you want to know both the value and how many values have come so far.

PermutationGenerator is initialized with a collection and a collection of indices into that collection, and serves up each element at each index in that order (thus allowing you to lazily reorder any collection – for example, PermutationGenerator(elements: a, indices: reverse(a.startIndex..<a.endIndex)) would be the reverse of a).4

GeneratorOf just takes a closure and runs it each time to generate the next value. So var i = 0;var naturals = GeneratorOf({ ++i }) sets naturals to a sequence that counts up from 1 to infinity (or rather, to overflow).

These infinite virtual sequences can be pretty useful. For example, to do the same thing as enumerate, you could write Zip2(naturals, somecollection). Or you could pass your infinite sequence into map or filter to generate further infinite sequences.

(at the opposite end, if you have a single value but what a function needs is a Sequence, you can use GeneratorOfOne(value) to create a quick sequence that will serve up just your one value)

It's not all rainbows and unicorn giggles though. Imagine you did the following:

let r = 1...10_000
let mapped = map(r) { megaExpensiveOp($0) }

let a1 = Array(mapped)
let a2 = Array(mapped)

Here the lazy evaluation will work against you. megaExpensiveOp will be run not ten but twenty thousand times.
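
If you are going to consume the results more than once, the fix is to force the evaluation a single time up front and share the result (a sketch, using Array's sequence-taking initializer to do the forcing):

// force evaluation once, then reuse the concrete array
let evaluated = Array(mapped)   // megaExpensiveOp runs 10,000 times, here and only here
let b1 = evaluated
let b2 = evaluated              // no further calls to megaExpensiveOp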

“Shouldn't map cache the data?” you ask. Well that leads to the next complication. Take this code:

var i = 0
let mapped = map(1...5) {
    i += 1
    return $0 + i
}

Every time you iterate over mapped now, you'll get different results.5

This behaviour might be put to very good use (say you wanted to perturb a data set with small random values). But if you weren't expecting it, that could be one nasty bug to track down.

So it's good to know, when you're using these functions, what they are actually doing under the hood. Just keep these caveats in mind and an exciting life of lazy evaluation awaits you in the off-world colonies.


  1. OK I cheated, first_n isn't a Swift standard library function. But it should be! 
  2. This code would look a lot nicer if Swift came with a pipe operator 
  3. Yeah, this is a pretty rubbish example, I should have put the time in to think of a better real-life use case. 
  4. Obviously, reverse(a) would work better, but a more useful example is tough to fit on one line… 
  5. If you're unclear on how come, try reading this article

Writing Algorithms on Collections in Swift

edit: this post has been updated to Swift as of Xcode 6.1 (Swift 1.1).

Suppose you want a new algorithm that checks if every member of an array fulfills a given criteria.

Your first thought might be to add it as an extension of Array:1

extension Array {
    func all_of<B: BooleanType>
        (predicate: (Element) -> B) -> Bool {
            for e in self {
                if !predicate(e) {
                    return false
                }
            }
            return true
    }
}

This function takes a predicate (a function that takes an element and returns true or false), and applies it to every member of the array, only returning true if it was true for every member.

Note we specify a generic placeholder, B: BooleanType, that allows the predicate to return not just a Bool but any type that can behave like a Bool (such as ObjCBool).2

We can test it works easily:

let isEven = { ($0 % 2) == 0 }
let mixed = Array(1...4)     // [1, 2, 3, 4]
let doubled = map(1...4) { $0 * 2 }
let evens = Array(doubled)   // [2, 4, 6, 8]

mixed.all_of(isEven)  // false
evens.all_of(isEven)  // true

Incidentally, the above shows a nice trick that’s useful for testing collection algorithms, which is that arrays can be initialized with sequences, such as ranges. So if you quickly want an array of increasing numbers, you can write Array(begin...end). If you want something more complicated, you can pass a range to map or filter to alter the range first.

This implementation is nice because it’s now built into Array, but there are a few downsides.

Firstly, what if you wanted a slightly different version all_of_equal, that just took a value and confirmed if every entry was equal to that value? You’ll need to test for equality, so Element will need to be Equatable. You might think you’d specify a function that goes like this:

extension Array {
    func all_of_equal<Element: Equatable>
    (value: Element) -> Bool {
        return self.all_of { $0 == value }
    }
}

Nope, that won’t work. The problem is that you can’t further constrain Element within Array to be Equatable. What if you created an array of something that didn’t conform to Equatable? Would this no longer be allowed now you’ve added this method? Or should this method not be callable? Neither of these is palatable, so the language just won’t let you do it.

Instead, you need to define a non-member function that takes an Array and an Element. Then you can constrain that element as much as you like:

func all_of<E, B: BooleanType>
    (a: Array<E>, predicate: (E) -> B) -> Bool {
        for e in a {
            if !predicate(e) {
                return false
            }
        }
        return true
}

func all_of_equal<E: Equatable>
    (a: Array<E>, rhs: E) -> Bool {
        return all_of(a) { $0 == rhs }
}

In fact, because of Swift’s operator overloading, you can even give it the same name:

func all_of<E: Equatable>
    (a: Array<E>, rhs: E) -> Bool {
        return all_of(a) { $0 == rhs }
}

// calls the version that takes an equatable value
all_of([1,1,1], 1)

// calls the version that takes a closure
all_of([1,1,1]) { $0%2 == 1 }

The second benefit of defining this as a non-member function is that you can generalize it to apply not just to arrays, but to any kind of collection.3 To do this, you need to extend the generic parameter clause in a couple of ways:

func all_of<E, C: CollectionType, L: BooleanType
    where E == C.Generator.Element>
    (c: C, predicate: (E) -> L) -> Bool {
        for e in c {
            if(!predicate(e)) {
                return false
            }
        }
        return true
}

First, we swap out Array for a type of Collection, by specifying a generic placeholder C that must be a collection. Then, we further constrain the collection to contain types E, which are the type of input our predicate will take.

Strictly speaking, we don’t need to include the placeholder E. You could replace it in the predicate definition with C.Generator.Element. But you might find it’s cleaner to create a placeholder for it, especially if you apply further constraints to E.
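
For comparison, here's a sketch of the same function with the E placeholder dropped – it behaves identically, the element type is just spelled inline:

func all_of<C: CollectionType, B: BooleanType>
    (c: C, predicate: (C.Generator.Element) -> B) -> Bool {
        for e in c {
            if !predicate(e) {
                return false
            }
        }
        return true
}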

Now, we can use this function on not just arrays, but any type of collection, such as ranges directly:

let isEven = { ($0 % 2) == 0 }
all_of(1...4, isEven)  // returns false

or other types of collection:

let doubled = map(1...4) { $0 * 2 }
all_of(doubled, isEven)  // returns true

There’s one final improvement that could be made. Collections implement the SequenceType protocol, which already gives us enough to iterate over the elements, checking the predicate. So we could rewrite all_of even more generally:

func all_of<E, S: SequenceType, B: BooleanType
where E == S.Generator.Element>
(sequence: S, predicate: (E) -> B) -> Bool {
    for e in sequence {
        if(!predicate(e)) {
            return false
        }
    }
    return true
}

Because sequences only promise iteration, not indexing, we don’t need to mandate a ForwardIndex to make them work with the for loop.

There are some times you still want to use collections rather than sequences. If we were writing a different algorithm, say find_first_of, that we wanted to return an index into the collection of the first occurrence, we’d want to stick with an algorithm that works on collections. Also, collections imply deterministic results over multiple iterations, and consistent size, which some algorithms might rely on. But not in the case of all_of, so SequenceType is a better choice.
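
To make the index-returning case concrete, here's a sketch of what a predicate-taking find_first_of might look like – it has to work on a collection, because the index it hands back only makes sense for one:

// a sketch of find_first_of: it needs a collection, because it returns an index into it
func find_first_of<C: CollectionType, B: BooleanType>
    (c: C, predicate: (C.Generator.Element) -> B) -> C.Index? {
        var idx = c.startIndex
        while idx != c.endIndex {
            if predicate(c[idx]) {
                return idx
            }
            idx = idx.successor()
        }
        return nil
}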


  1. The most important question facing the Swift community today: how do you split long function footprints over multiple lines? 
  2. In fact, the Swift standard library docs explicitly state that Bool and ObjCBool should really only be the two types that ever implement BooleanType, so maybe you could say screw it and just make the predicate return a Bool. That’s what the predicates for std lib functions like filter do. But then, that wouldn’t make for quite so educational a blog post. 
  3. You might be thinking, I know, I’ll extend CollectionType! But CollectionType is a protocol, and you can’t extend protocols with implementations. What you want is something like traits or mixins. Swift doesn’t have those (yet, hopefully).