Linked lists, enums, value types and identity

Posted on July 26, 2015October 3, 2015 by airspeedvelocity

UPDATE: now conforms to Swift 2.1b2

Following on from the previous post, here’s some more on building collections from enums.

For our book on Swift, we wrote a simple linked list example to use in a section on conforming to CollectionType. Prior to indirect, it was written using a box, and I wanted to update it. But I hit a snag, of which more later. First, a bit about lists and value types.

Here is a super-simple linked list:

enum List<Element> {
    case End
    indirect case Node(Element, next: List<Element>)
}

You prepend another element to the list by creating a new node, with the next value that of the current list. To make that a little easier, we can create a method for it:

extension List {
    func cons(x: Element) -> List {
        return .Node(x, next: self)
    }
}

// a 3-element list: 3, 2, 1
let l = List<Int>.End.cons(1).cons(2).cons(3)

Just like the tree from last time, this list has an interesting property – it is “persistent“. The nodes are immutable – you cannot change them once created. Consing another element onto the list doesn’t copy the list, just gives you a new node that links on to the front of the existing list (again, the first couple of chapters of the Purely Functional Data Structures book is a really good primer for this stuff).

This means two lists can share a tail. To illustrate this, for the first time on this blog (duh duh duhhn…), a diagram:

The immutability of the list is key here. If you could change the list (say, remove the last entry, or update the element held in a node), then this sharing would be a problem – a might change the list, and the change would affect b.

Values and Variables

This list is a stack, with consing as a push, and unwrapping the next element a pop. Suppose we added two methods to simplify these operations:

extension List {
    mutating func push(x: Element) {
        self = self.cons(x)
    }

    mutating func pop() -> Element? {
        switch self {
        case .End: return nil
        case let .Node(x, next: xs):
            self = xs
            return x
        }
    }
}

“What?! Mutating methods? I thought you said the list had to be immutable.”

Yes, I did. And it still is:

var stack = List<Int>.End.cons(1).cons(2).cons(3)
var a = stack
var b = stack

a.pop() // 3
a.pop() // 2
a.pop() // 1

stack.pop() // 3
stack.push(4)

b.pop() // 3
b.pop() // 2
b.pop() // 1

stack.pop() // 4
stack.pop() // 2
stack.pop() // 1

What this shows us is the difference between values and variables. The nodes of the list are values. They cannot change. A node of 3 and a reference to the next node cannot become some other value. It will be that value forever, just like the number 3 cannot change. It just is. Just because these values in question are structures with pointers to each other doesn’t make them less value-y.

A variable a, on the other hand, can change the value it holds. It can be set hold a value of an indirect reference to any of the nodes, or to the value End. But changing a doesn’t change these nodes. Just which node a refers to.

This is what these mutating methods on structs do – they take an implicit inout argument of self, and they can change the value self holds.¹ This doesn’t change the list, just which part of the list the variable currently represents.

In this sense, the variables are iterators into the list:

You can of course declare your variables with let instead of var, in which case the variables will be constant i.e. you can’t change the value they hold once set. But let is about the variables, not the values. Values are constant by definition.

Now, this is all just a logical model of how things work. In reality, the nodes are actually places in memory that point to each other. And they take up space, which we want back if it’s no longer needed. So Swift uses ARC to manage this, and frees them up:

Conforming to SequenceType

OK, so if List variables act like iterators, we should be able to use them to iterate, right? For example, we can conform List to SequenceType:

extension List: SequenceType {
    func generate() -> AnyGenerator<Element> {
        // declare a variable to capture that
        //  tracks progression through the list:
        var current = self
        return anyGenerator {
            // next() will pop, returning nil
            // when the list is empty:
            current.pop()
        }
    }
}

And, to make things easier, to ArrayLiteralConvertible:

extension List: ArrayLiteralConvertible {
    init(arrayLiteral elements: Element...) {
        self = elements.reverse().reduce(.End) {
            $0.cons($1)
        }
    }
}

So now you can use lists with for...in:

let l: List = ["1","2","3"]
for x in l { print(x) }

It also means, through the power of protocol extensions, we can use List with dozens of standard library functions:

l.joinWithSeparator(",")        // "1,2,3"
l.contains("2")                 // true
l.flatMap { Int($0) }           // [1,2,3]
l.elementsEqual(["1","2","3"])  // true

Boxes and Identity

But not so many as if we conformed to CollectionType. CollectionType brings us properties like .first, which would be nice to use to peek at the first element without popping it.

More importantly, conforming to CollectionType gives a guarantee that multiple passes over the sequence are OK. As the docs for SequenceType say:

SequenceType makes no requirement on conforming types regarding whether they will be destructively “consumed” by iteration. To ensure non-destructive iteration, constrain your sequence to CollectionType.

Our list definitely fits this description – you can iterate it as much as you like. So we’d like it to be a collection.

Since variables holding nodes can be treated as iterators, it seems reasonable to use them them as a ForwardIndexType.

But first, lets separate out the nodes from the list and index types. It is tempting to just conform the enum to CollectionType and ForwardIndexType directly, but this will lead to problems. For example, those two protocols need very different implementations of ==. The index needs to know if two indices from the same list are at the same position. It should not need the elements to conform to Equatable. The collection, on the other hand, should be able to compare two different lists to see if they hold the same elements. It will need the elements to conform to Equatable.

So lets rename the enum to ListNode, making it a private implementation detail, and add public List and ListIndex types that use it:

private enum ListNode<Element> {
    case End
    indirect case Node(Element, next: ListNode<Element>)

    func cons(x: Element) -> ListNode<Element> {
        return .Node(x, next: self)
    }
}

public struct ListIndex<Element> {
    private let node: ListNode<Element>
}

public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index

    // since this is a constant, no need to store it in a variable,
    // so use a computed property instead to save space
    public var endIndex: Index { return Index(node: .End) }

    public init() {
        startIndex = Index(node: .End)
    }
}


extension List {
    public mutating func push(x: Element) {
        startIndex = Index(node: startIndex.node.cons(x))
    }

    public mutating func pop() -> Element? {
        guard case let .Node(x, next: xs) = startIndex.node
            else { return nil }

        startIndex = Index(node: xs)
        return x
    }
}

extension List: ArrayLiteralConvertible {
    public init(arrayLiteral elements: Element...) {
        self = List()
        for x in elements.reverse() { push(x) }
    }
}

OK so now to conform List to CollectionType. The first step is to conform ListIndex to ForwardIndexType. Simple enough, successor() just needs to return the next node:

extension ListIndex: ForwardIndexType {
    public func successor() -> ListIndex<Element> {
        guard case let .Node(_, next: next) = node
            else { fatalError("cannot increment endIndex") }

        return ListIndex(node: next)
    }
}

But this is only half the story. ForwardIndexType conforms to Equatable, which requires you to implement ==. And this is where we hit the buffers… how can we do that?

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node,rhs.node) {
    case (.End,.End): return true
    case (.Node,.End),(.End,.Node): return false
    case (.Node,.Node): // what to put here???
    }
}

The problem being, indirect enum cases do not have identity. Once upon a time, when this list code was implemented with box, the boxes had identity, which you could use to solve this problem:

// Box, being a class, will have a === operator
final class Box<Element> {
    let unbox: Element
    init(_ x: Element) { unbox = x }
}

enum ListNode<Element> {
    case End
    // a node will be held inside a box, so have identity
    case Node(Box<(Element, next: ListNode<Element>)>)

    func cons(x: Element) -> ListNode<Element> {
        return .Node(Box(x,next: self))
    }
}

Now, no-one is going to mourn the loss of Box, since it got all up in your pattern matching and generally dirtied up the joint. But it did mean you could write == like this:

func ==<T>(lhs: List<T>, rhs: List<T>) -> Bool {
    switch (lhs, rhs) {
    case (.End,.End):               return true
    case (.End,.Node),(.Node,.End): return false
    case let (.Node(l),.Node(r)):   return l === r
    }
}

This all fitted perfectly well with the notion of List being a value type. Box is not a value type – being a class, it is a reference type. But it is immutable – it only has one property, declared with let. So it has value semantics.

This is a subtle but important distinction. Value types are copied on assignment, while reference types are shared (or rather, the thing they refer to is shared). But value types can contain reference types as properties. When value types containing reference types are copied, it is a shallow copy. The two value types will share a reference. If the reference type in question has methods that can mutate its state, the containing type needs to account for this possibility. This can get pretty hairy.

In the case of Box, this is fine though, because it is immutable. The sharing doesn’t matter, because once created, a box can never be changed. Yes, what is boxed could be a reference type you could then mutate, but since what we are boxing is our enum, and that is a value type, it’s fine.

(the list could contain mutable reference types. But we can ignore this – in the same way an array containing reference types doesn’t meddle with the array’s value semantics.)

So, with our nodes being references, we essentially got a free globally-unique identifier for each one, which we could then use to implement node equality. But now that we have switched away from box, we can’t use === any more. So what can we do?

Well, we could rely on the fact that, really, indirect is kind of the same as the box approach, just with the compiler writing the boxing/unboxing in for you and hiding the details. So we could do something very unwise, and just look at the bits:

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node,rhs.node) {
    case (.End,.End): return true
    case (.Node,.End),(.End,.Node): return false
    case (.Node,.Node):
        // engage ಠ_ಠ now...
        return unsafeBitCast(lhs, UInt.self) == unsafeBitCast(rhs, UInt.self)
    }
}

This is evil, but it appears to work. When a ListNode.Node is copied, with indirect this just makes a copy of the indirect reference. Then, if the variable is updated, it changes to be a new reference and is no longer equal.

But it is almost certainly Not A Good Idea. It’s relying on an implementation detail of indirect that, if it ever changed or if the compiler did some optimization under the hood that did something different, would lead to undefined behaviour. There could well be other ways in which my behaviour assumptions are incorrect and this breaks, though I haven’t found any.

Next hacky solution: only make the ends of the index equal. Even if two nodes might be the same, return false regardless:

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node,rhs.node) {
    case (.End,.End): return true
    default:          return false
    }
}

Again, this mostly works, but we are still breaking the rules – in this case, the rules of Equatable.

Equatable requires that == be an equvalence relation. This means, amongst other things, that it must be reflexive – a thing must equal itself. And our indices don’t (unless they’re the end):

l.startIndex == l.startIndex  // false... not OK!

Mysterious undiagnosable bugs can ensue from this. There are some places in the standard library where == being an equivalence relation is relied upon. For example, Array takes a shortcut of first checking if two arrays share the same buffer. If they do, they have to be equal, right? But implement an Equatable type whose == isn’t reflexive, and you get some very strange behaviour:

struct NeverEqual: Equatable { }
func == (lhs: NeverEqual, rhs: NeverEqual) -> Bool {
    return false
}

var a = [NeverEqual()]
let b = a
a == b       // true
a[0] = b[0]
a == b       // false

…so we should probably rule this out.

Conforming to CollectionType

Finally, a third option: tag our list indices with a number, and use that to compare them. Whenever you change an index, increment/decrement the tag.

edit: I originally put this tag on the nodes, but Thorsten in the comments made a great point that it is more space-efficient to have it on the indices.

This solution is a bit more intrusive. We’ll have to reconfigure ListIndex to acommodate the tag:

public struct ListIndex<Element> {
    private let node: ListNode<Element>
    private let tag: Int
}

…and any places where generate new indices:

public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index
    public var endIndex: Index { return Index(node: .End, tag: 0) }

    public init() {
        startIndex = Index(node: .End, tag: 0)
    }
}

extension ListIndex: ForwardIndexType {
    public func successor() -> ListIndex<Element> {
        guard case let .Node(_, next: next) = node
            else { fatalError("cannot increment endIndex") }

        return ListIndex(node: next, tag: tag.predecessor())
    }
}

extension List {
    public mutating func push(x: Element) {
        startIndex = Index(node: startIndex.node.cons(x),
                           tag:  startIndex.tag.successor())
    }

    public mutating func pop() -> Element? {
        guard case let .Node(x, next: _) = startIndex.node
        else { return nil }

        ++startIndex
        return x
    }
}

Now, index equality can now be performed just by comparing the tag:

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    return lhs.tag == rhs.tag
}

Finally, we can conform List to CollectionType:

extension List: CollectionType {
    public subscript(idx: Index) -> Element {
        guard case let .Node(x, _) = idx.node
            else { fatalError("Subscript out of range") }

        return x
    }
}

and it gainst the collection-only features:

l.first  // .Some("1")
if let twoIdx = l.indexOf("2") {
    l[twoIdx]  // "2"
}

There’s no longer any need to implement generate for the SequenceType conformance. Collections automatically gain a default implementation using IndexingGenerator.

As an added benefit of the tagging solution, since the tag is the count of nodes prepended to .End, we can implement a constant-time count, even though the lists are forward-index only which would normally mean linear-time counting:

extension List {
    public var count: Int {
        return startIndex.tag
    }
}

It is true that two ListIndex types from two totally different lists with different values can now be equated and be equal to each other. But then, this is true of array’s indices too (cos, you know, they’re just integers).

We can also implement == for List the way users expect == to behave for value-semantic collections, by comparing the elements:

func == <T: Equatable>(lhs: List<T>, rhs: List<T>) -> Bool {
    return lhs.elementsEqual(rhs)
}

Efficient Slicing

Collections also get a default implementation of the slicing operation, [Range], which is how dropFirst works:

// equivalent of dropFirst(l)
let firstDropped = l[l.startIndex.successor()..<l.endIndex]

// firstDropped will be a Slice<List<String>>

But this default implementation returns a wrapper over the original collection, plus an index sub-range, so is bigger than it needs to be in List‘s case:

// Size of a list is size of two nodes, the start and end:
sizeofValue(l)             // returns 16

// Size of a list slice is size of a list, plus size of a subrange
// (a range between two indices, which in lists's case are also list nodes)
sizeofValue(dropFirst(l))  // returns 48

Remember, since endIndex was always the same value, it takes no space inside List, so the sizeofValue(l) is just the size of List’s sole member variable, startIndex.

But if we switch to hold an explicit endIndex property, lists could return subsequences of themselves by holding a different start and end:

public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index
    public var endIndex: Index

    public init() {
        startIndex = Index(node: .End, tag: 0)
        endIndex = startIndex
    }

    // and add the slicing initializer, which takes a range:
    private init (subRange: Range<Index>) {
        startIndex = subRange.startIndex
        endIndex = subRange.endIndex
    }
    public subscript (subRange: Range<Index>) -> List<Element> {
        return List(subRange: subRange)
    }
}

// now lists take 32 bytes (a startIndex and an endIndex):
sizeofValue(l)             // returns 32

// but list slices are also lists, so only 32 bytes
sizeofValue(dropFirst(l))  // returns 32

This leads to another benefit of this list implementation. With many sliceable containers, including Swift’s arrays and strings, a slice shares the storage buffer of the original collection. This has an unpleasant side-effect: slices can keep the original collection’s buffer alive in its entirety, even if the original collection falls out of scope. If you suck a 1gb file into an array or string, then slice off a tiny part, the whole 1gb buffer will stay in memory until both the collection and the slice are destroyed.

With List, it isn’t quite as bad. As we’ve seen, the nodes are refcounted. Which means when the slices are the the only remaining copy, any elements dropped from the front will be reclaimed as soon as no-one is referencing them.

Not the back though – since the slice’s endIndex still has a reference to what comes after. You could optimize this even further by having slices only hold the tag for their end index if you wanted.

I hate writing conclusions

So, losing Box was mostly great, but it also lost us one benefit, of a free identity for our nodes. This identity is still secretly there, and I think it works similarly, but we can’t use it. I have a feeling there’s a case to be made for enums to be able to use it internally, without necessarily breaking with the idea that enums are value types. But there are probably hidden gotchas to this that I haven’t thought of. Tags serve as a reasonable subsitute (at the expense of an extra few bytes), but this approach wouldn’t easily adapt to more complex structures like trees, so conforming last week’s code to a collection type would probably prove tricky (though, a persistent stack like this list may help with creating the index type). If anyone wants to take a stab, let me know what you come up with!

inout arguments in Swift are interesting too. They aren’t pass-by-reference, but rather pass-by-value-and-copy-back. Which makes several things about Swift work better than, say, C#. There’s more info in the book. ↩

Changes to the Swift standard library in 2.0 betas 2..<5

Posted on July 23, 2015July 24, 2015 by airspeedvelocity

So, I was waiting for enough library changes across a few betas to justify posting something, and then all of a sudden lots of stuff changed! So let’s start with:

Flat Map

Missed from my 2.0b1 changes post (shame on me) was a third kind of flatMap.

1.2 brought us flatMap for collections – given an array, and a function that transforms elements into arrays, it produces a new array with every element transformed, and then those arrays “flattened” into a single array. For example, suppose you had a function, extractLinks, that took a filename and returned an array of links found in that file:

func extractLinks(markdownFile: String) -> [NSURL]

If you had an array of filename strings (say from a directory listing), and you wanted an array of the links in all the files, you could use this function with flatMap to produce a single array (unlike map, which would generate an array of arrays):

let nestedArrays: [NSURL] = markdownFiles.flatMap(extractLinks)

There was also a flatMap for optionals. Given an optional, it takes a function that takes the possible value inside the optional, and applies a function that also returns an optional, and “flattens” the result of mapping an optional to another optional. For example, Array.first returns the first element of a collection, or nil if the array is empty and has no such element. Int.init(String) takes a string, and if it is a representation of an integer, returns that integer, or nil if it isn’t. To take the first element of an array, and turn it into an optional integer, you could use flatMap (unlike map, which will return an optional of an optional of an integer):

let a = ["1","2","3"]
let i: Int? = a.first.flatMap { Int($0) }

So what’s the new third kind of flatMap? It’s a mixture of the two – for when you have an array, and want to transform it with a function that returns an optionals, discarding any transforms that return nil. This is logical if you think of optionals as like collections with either zero or one element – flatMap would discard the empty results, and flatten the single-element collections into an array of elements.

For example, if you have an array of strings, and you want to transform it into integers, discarding any non-numeric strings:

let strings = ["1","2","elephant","3"]
let ints: [Int] = strings.flatMap { Int($0) }
// ints = [1,2,3]

If you’ve been following this blog for a while, you’ll recognize this is a function we’ve defined before as mapSome.

By the way, all these examples have been pulled from the book on Swift that Chris Eidhof and I have been writing. The chapter on optionals has just been added to the preview.

First-class Initializers

In beta 2 came the ability to use initializers, partially-applied methods, and enum cases, as first-class functions. This gives you the ability to pass them directly into higher-order functions without needing a closure expression:

// equivalent to [1,2,3].map { Optional.Some($0) }
let optionals = [1,2,3].map(Optional.Some)

// equivalent to [1,2,3].map { String($0) }
let strings = [1,2,3].map(String.init)

// equivalent to [1,2,3].map { 1.advancedBy($0) }
let plusones = [1,2,3].map(1.advancedBy)

Given this, you might wonder why I wrote strings.flatMap { Int($0) } in the earlier section, instead of strings.flatMap(Int.init). But this new ability is constrained by the fact that overload resolution works slightly differently for function invocation versus function assignment. If you try the latter version, you will get an error because there is no method for Int.init that takes just a single string as an argument. When you call Int("1"), you are really calling Int("1", radix: 10), with the compiler filling out the default argument for you. But it won’t do that when passing Int.init in to map.

String.init on integers works though, I guess because the overloading is less ambiguous? On the other hand, arrayOfCustomStructs.map(String.init) won’t compile, probably because there is ambiguity between String.init(T) and String.init(reflecting:T) (the debug version). So for now, you’ll have to stick with writing array.map { String($0) }, like an animal.

One other thing to be careful of: this new terse syntax doesn’t work for when you want to apply a method to the self argument. The following will compile, but won’t do what might be intended – reverse each of the arrays:

let reversedArrays = [[1,2,3],[4,5,6]].map(Array.reverse)

Instead, this will produce an array of functions – because Array.reverse returns a function that returns a function where the method will be called on the argument pased.

So instead, you would have to write this:

let reversedArrays = [[1,2,3],[4,5,6]].map { Array.reverse($0)() }
// which is just a long winded way of writing this…
let reversedArrays = [[1,2,3],[4,5,6]].map { $0.reverse() }
// but bear with me...

You could always write a version of map that takes curried functions and does the above for you:

extension CollectionType {
    func map<T>(@noescape transform: (Self.Generator.Element) -> () -> T) -> [T] {
        return self.map { transform($0)() }
    }
}

after defining which, .map(Array.reverse) will do what you probably want.¹

This can also be nice to do for other functions. For example, if you defined a similar version of sort:

extension CollectionType {
    public func sort(isOrderedBefore: Self.Generator.Element -> Self.Generator.Element -> Bool) -> [Self.Generator.Element] {
        return self.sort { isOrderedBefore($0)($1) }
    }
}

then, after also overloading caseInsensitiveCompare to have a version that returns a boolean in Swift’s isOrderedBefore style, you could do the same with methods that compare:

import Foundation

extension String {
    func caseInsensitiveCompare(other: String) -> Bool {
        return self.caseInsensitiveCompare(other) == .OrderedAscending
    }
}

["Hello","hallo","Hullo","Hallo",].sort(String.caseInsensitiveCompare)
// returns ["hallo", "Hallo", "Hello", "Hullo"]

RangeReplaceableType: all your ExtensibleCollectionTypes are belong to us

ExtensibleCollectionType is gone. It used to provide append and extend, which are now amongst the features provided by RangeReplaceableCollectionType.

RangeReplaceableCollectionType is a great example of the power of protocol extensions. You implement one uber-flexible method, replaceRange, which takes a range to replace and a collection to replace with, and from that comes a whole bunch of derived methods for free:

append and extend: replace endIndex..<endIndex (i.e. nothing, at the end) with the
new element/elements.
removeAtIndex and removeRange: replace i...i or subRange with an empty collection.
splice and insertAtIndex: replace atIndex..<atIndex (i.e. don't replace any elements but insert at that point) with a new element/elements.
removeAll: replace startIndex..<endIndex with an empty collection.

The two remaining functions from ExtensibleCollectionType – an empty initializer, and reserveCapacity, have also moved into RangeReplaceableCollectionType.

As always, if a specific collection type can use knowledge about its implementation to perform these functions more efficiently, it can provide custom versions that will take priority over the default protocol extention ones.

You get to be Sliceable. And you get to be Sliceable. Everybody gets to be Sliceable!

Things being Sliceable (i.e. having a subscript that takes a range, and returns a new collection of just that range) was incredibly useful. But it was also limiting, because you needed to rely on things implementing Sliceable.

But you could make anything sliceable if you needed it to be – by writing a wrapper view that took a subrange and implemented CollectionType to present only that subrange:

// note, won't compile in beta 4 because Sliceable is gone!
public struct SubSliceView<Base: CollectionType>: Sliceable {
    let collection: Base
    let bounds: Range<Base.Index>

    public var startIndex: Base.Index { return bounds.startIndex }
    public var endIndex: Base.Index { return bounds.endIndex }

    public subscript(idx: Base.Index) -> C.Generator.Element
    { return collection[idx] }

    typealias SubSlice = SubSliceView<Base>
    public subscript(bounds: Range<Base.Index>) -> SubSliceView<Base> {
        return SubSliceView(collection: collection, bounds: bounds)
    }
}

// AnyRandomAccessCollection is an example of a type that wasn't sliceable,
// but you could make it one by doing this:
extension AnyRandomAccessCollection: Sliceable {
    public subscript(bounds: Range<Index>) -> SubSliceView<AnyRandomAccessCollection> {
        return SubSliceView(collection: self, bounds: bounds)
    }
}

let a = [1,2,3]
let r = AnyRandomAccessCollection(a)
// dropFirst relies on things being sliceable:
print(dropFirst(dropFirst(r)).first)

As of beta 4, the new Slice type does pretty much this exact thing. Including the way the slice's start index is not zero based – if you create a slice starting at index 2, the startIndex of that slice will be 2. Another reason to avoid using indices directly in favour of things like for...in or higher-order functions like map. Index-based operations are so much easier to screw up, and your assumptions (e.g. every integer-indexed collection starts at zero) can easily be invalid.

The Sliceable protocol has been subsumed into CollectionType, and the default implementation for the ranged subscript is to return a Slice view onto the collection. So now every collection supports a slicing subscript.

CollectionType in turn now conforms to Indexable, which is the new home for startIndex, endIndex, and index-based subscript. Note, Indexable does not to conform to SequenceType – these two things are now brought together by CollectionType.

There is still a good reason for collections to implement their own custom slicing code – they can often do it more efficiently.

For example, UnsafeBufferPointer didn't used to be sliceable, but now is, with the default implementation. If you slice it, you'll get back a Slice:

[1,2,3].withUnsafeBufferPointer { buf -> Void in
    let slice = buf[2..<4]
    sizeofValue(slice)  // returns 32 bytes
}

sizeof returns 32, because the type has to contain both the base buffer (which is 16 bytes – the base pointer address and the length), plus the start and end of the subrange (another 16 bytes).²

But alternatively, UnsafeBufferPointer could use itself as its subslice, like so:

extension UnsafeBufferPointer {
    subscript(subRange: Range<Int>) -> UnsafeBufferPointer {
        return UnsafeBufferPointer(start: self.baseAddress + subRange.startIndex,
                                   count: subRange.count)
    }
}

If you do this, you’ll see that UnsafeBufferPointer‘s slice is now another UnsafeBufferPointer, and the memory requirement drops in half:

[1,2,3].withUnsafeBufferPointer { buf -> Void in
    let slice = buf[2..<4]
    sizeofValue(slice)  // returns 16 bytes
}

This approach – making Self your subslice – is taken by String slices, but not by Array, which has a subslice type of ArraySlice.

This brings up another gotcha you might hit when writing generic methods on collections using slices – slices aren’t necessarily of the same type as the thing you are slicing. They are sometimes, but not always.

Even more strangely, the elements contained in the subslice aren’t guaranteed to be the same as the elements of the original collection! Ideally they would be, but this constraint can’t be expressed in Swift right now.

For example, suppose you wanted to overload reduce with a version that didn’t need an initial element – instead it used the first element of the collection, and returned an optional in case the collection was empty. You might try and implement it like this:

extension CollectionType { 
    /// Return the result of repeatedly calling `combine` with an
    /// accumulated value initialized to the first element, and each 
    /// subsequent element of `self`, in turn, i.e. return
    /// `combine(combine(...combine(combine(self[0], self[1]),
    /// self[2]),...self[count-2]), self[count-1])`.
    /// Return `nil` in case of an empty collection.
    func reduce(@noescape combine: (Generator.Element, Generator.Element) -> Generator.Element) -> Generator.Element? {
        return first.map {
            dropFirst(self).reduce($0, combine: combine)
        }
    }
}

This will not compile (and beta 4 will lead you up the garden path with a spurious error for why). The problem being, the elements of the collection returned by dropFirst aren’t guaranteed to be the same as those of the collection. You would probably be right to be annoyed if they weren’t – but the type system doesn’t guarantee it.

The fix would be to constrain the extension to require it:

extension CollectionType where Generator.Element == SubSequence.Generator.Element {
    func reduce(@noescape combine: (Generator.Element, Generator.Element) -> Generator.Element) -> Generator.Element? {
        return first.map {
            dropFirst(self).reduce($0, combine: combine)
        }
    }
}

// so now you can do this:
[1,2,3,4].reduce(+)  // returns 10

Grab Bag

There are a few other changes to the standard library:

Reflectable became _Reflectable and MirrorType became _MirrorType
QuickLookObject has been renamed PlaygroundQuickLookObject
_MirrorType and PlaygroundQuickLookObject acquired documenting comments.
_MirrorDisposition is now gone from the visible library, though _MirrorType still refers to it for now.
And the reflect function is gone: Xcode tells you to use Mirror.init(reflecting:) instead.
RawOptionSetType was removed
The gradual disappearing of the _ protocols continues. _UnsignedIntegerType is now gone (its methods – toUIntMax and init(_: UIntMax) moved into UnsignedIntegerType)
Views onto other things (such as FilterCollection, a lazily filtered viewonto a collection) have dropped their View suffix.
Free functions continue to disappear into protocol extensions.
As do unnecessarily non-default methods, such as generate for collections that now use the default indexing generator implementation.
LazyRandomAccessCollection.reverse now returns a random-access reversed collection, not a bi-directional one.
The Zip2 type has been renamed Zip2Sequence – but you already switched to creating it using the zip function, right? Though perhaps the protocol extension wolves are circling its campfire.
SinkType and SinkOf are gone. SinkType was used in two places – UTF encoding methods, where you now pass a closure, and UnsafeMutablePointer, where you could now write (ptr++).memory = x instead, I guess.

Beta 3 didn’t introduce many functional changes, but did do a lot of renaming, changing the placeholder names for generic types to something more descriptive than T, U etc:

Sequence and collection-like things now use Element
Pointer-like things now use Memory
Intervals use Bound
Views onto other types use Base for the thing they are viewing
Unmanaged things are Instances.
Optionals still wrap a T though.

We’re not quite out of gotcha territory yet though. After defining that map, try and guess what ["foo","bar","baz"].map(String.hasPrefix("b")) will do… ↩
On 64-bit platforms, anyway. Cut all these numbers in half for 32-bit. ↩

A persistent tree using indirect enums in Swift

Posted on July 22, 2015October 4, 2015 by airspeedvelocity

UPDATE: now conforms to Swift 2.1b2

Suppose you want an ordered set. Set is great and all, but it’s unordered. You could always use a Set anyway, and sort the contents when needed. Or you could use an array, keep it sorted, and always insert new elements in the right place. Or, you could use a tree.

One way to build a tree in Swift would be with a class that refers to itself. Something like this for a binary search tree:

class Tree<Element: Comparable> {
    let value: Element
    // entries < value go on the left
    let left: Tree<Element>?
    // entries > value go on the right
    let right: Tree<Element>?
}

For the tree’s logic, Element would only have to be Comparable, not Hashable, which is another potential benefit over Set.

The left and right subtrees are optional, with nil at the leaves when there’s no lesser/greater value than the current node. But there’s a problem with this approach – how do you represent an empty tree? value is not optional. Sure you could make it optional, and have an empty tree be represented by a nil value. But that doesn’t feel healthy.

Really, you want an enum with an associated type. Something like this:

enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}

So, a tree is either empty, or it has a left and right subtree and a value. The subtrees no longer need to be optional, because if they’re empty, they can just hold the .Empty value.

No more boxes

Except, this won’t work. Swift enums are value types. So they can’t contain themselves like this. You need some kind of indirection – a reference type, such as a class.

Up until 2.0b4, you would have to use a Box trick – inserting a class in between each enum:

final class Box<Element> {
    let unbox: Element
    init(_ x: Element) { self.unbox = x }
}

enum Tree <Element: Comparable> {
    case Empty
    case Node(Box<Tree<Element>>,Element,Box<Tree<Element>>)
}

Well that’s pretty gross. And we haven’t even got to the actual implementation logic yet. Trust me, Box is gonna stomp all over that code leaving its muddy footprints everywhere.

But good news! The latest release of Swift introduces the indirect keyword for enums. This is similar to a transparent version of the box above, putting a reference where the boxed value would go. Xcode will even suggest putting in the keyword for you.

So now we can write this:

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}

But really, we want a balanced tree. Simple binary search trees work well with random data, but if you insert already-ordered data into them, they degenerate into a linked list (because all the insertions happen down one side of the tree).

So lets go with a red-black tree – a kind of binary search tree that colours its nodes red or black, and then uses some invariants to guarantee it remains balanced on insertion. If you follow that link you’ll see quite a lot of ugly code for maintaining those invariants – but don’t worry, we’re going to write something that hopefully looks a lot simpler.

So here’s the tree, now with added colour, plus a couple of initializers to make things easier:

enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Color,Tree<Element>,Element,Tree<Element>)

    init() { self = .Empty }

    init(_ x: Element, color: Color = .B,
        left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
    {
        self = .Node(color, left, x, right)
    }
}

Some might not like that I called my Color cases R and B instead of Red and Black. It’s a personal preference, but I like to keep them shorter because a) they’re arbitrary – they are only called that because those were the two colours on the laser printer they had at Xerox PARC, and b) it keeps the pattern matching shorter and (therefore, IMO) clearer. Judge for yourself when we get to that part.

By the way, this implementation is a Swift translation of an ML version from Chris Okasaki’s Purely Functional Data Structures which is definitely worth reading if you like this kind of stuff. What’s cool is that with the new 2.0 features of Swift, the Swift version is getting pretty close to the ML one in terms of expressivity/simplicity.

Checking if the tree contains an element

So, let’s start with contains. This needs to recur down the tree, checking each value to see if it’s empty, or if not, whether its value is less than, greater than or equal to the value we’re looking for.

Ideally I would write this using a guard statement like so:

extension Tree {
    func contains(x: Element) -> Bool {
        guard case let .Node(_,left,y,right) = self
          else { return false }

        if x < y { return left.contains(x) }
        if y < x { return right.contains(x) }
        return true
    }
}

I like how guard helps here. It lets us unpack the important stuff used by the main body of the function (is it a node? If so, grab the left, right and value elements – discarding color, since its not important). But it also lets us put the failure case – if you hit an empty node, you know the tree does not contain the element – up front. Then, head left if the element we’re looking for is less than this one, or right if it’s greater.

We can rely on the fact that Comparable elements are required to be in a strict total order. This means if x is neither less than or greater than a value, it must be equal to that value. So we’ve found the element and contains is true.

Inserting an element

OK, insert is a little trickier. We’re going to define insert as a method that returns a new tree – one with the new value inserted. This is a little different to Set.insert, which is an in-place insert.

In doing this, we’re going to create a “persistent” data structure – one that does not change the old structure when it updates it. This might sound like multiple insertions in a row are going to perform many unnecessary copies of the whole structure, but it isn’t necessarily that bad. Because nodes of the tree are read-only, two trees can share nodes. Only the nodes that are actually changed as a result of the insert need to be copied.

First, the insert method. It doesn’t really do much except maintain one of the invariants required to maintain balance: the root node must always be black. It leaves most of the work to a second function, ins:

private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
    guard case let .Node(c, l, y, r) = into
        else { return Tree(x, color: .R) }

    if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
    if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
    return into
}


extension Tree {
    func insert(x: Element) -> Tree {
        guard case let .Node(_,l,y,r) = ins(self, x)
            else { fatalError("ins should never return an empty tree") }

        return .Node(.B,l,y,r)
    }
}

So, if we’ve hit an empty node, append the value here as a new node. New nodes are always red.

Otherwise, insert the value into the left or right subtree. And if the element is equal, stop and don’t insert anything, since we’re implementing set behaviour here.

But there’s also the call to balance. This function is going to ensure that, as we come back up the tree having inserted our new element, the tree is properly balanced, by looking at whether any invariants have been broken and fixing them.

This is normally where the ugliness of implementing a red-black tree resides. But we’re going to use Swift’s pattern matching to try and keep it simple. The balancing logic can be summarized as: “on the way back up from an insertion, if there’s a black node, with a red child and a red grandchild, this needs fixing”.

A red child and grandchild of a black node can happen in one of four different configurations. Here’s a visualization of one of those possibilities (excuse the ASCII art):

        z: black
       /        \
      x: red     d
     / \
    a   y: red
       / \
      b   c

needs rearranging into:

         y: red
       /        \
      x: black   z: black
     / \        / \
    a   b      c   d

Once you understand this visualization, you can then encode it into a Swift pattern matching if statement, that extracts all the relevant variables from the possible imbalanced structure, then returns them reconfigured as the second, balanced, version.

The above re-balancing operation would end up encoded in Swift like this:

case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
    return .Node(.R, .Node(.B, a, x, b), y, .Node(.B, c, z, d))

This is why I prefer .B and .R for the colour case names – single-letter names to me look clearer in this stage, which is when they matter. There’s no getting around it, the transformation code is going to be hard to read no matter what you do, so the best you can hope for is easy translation. The longer words wouldn’t help with the key problem, which is encoding the diagram above into a pattern-matching statement.

To write the full balancing function, you need to visualize all 4 imbalanced configurations, then encode them into 4 patterns. If a node doesn’t match any, it’s good and you can return it as-is:

private func balance<T>(tree: Tree<T>) -> Tree<T> {
    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }
}

Note, this pattern match recurses 3 levels deep into the enum. This is also where getting rid of Box was critical. It seriously gets in the way – here is how it would look before indirect:

if case let .Node(.B,l,z,d) = tree, case let .Node(.R,ll,y,c) = l.unbox, case let .Node(.R,a,x,b) = ll.unbox {
    return .Node(.R, Box(.Node(.B,a,x,b)),y,Box(.Node(.B,c,z,d)))
}

Note, the return statement is the same in all 4 rebalance patterns. It’s only the pattern match that changes, based on which of the 4 imbalance possibilities is being checked.

It might be nice to be able to group these 4 possible matches together like so:

    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d),
             .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d),
             .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)),
             .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }

but Swift won’t allow multiple patterns in a case when the pattern declares variables, so we have to stick with the multiple returns for now.

Traversing in-order

So, now we have a nice balanced tree. But to get the benefit of an ordered set, we want to iterate over it in order. This sounds like a job for SequenceType. That way, you could use for...in over the sorted elements.

In-order traversal of a tree is usually done recursively, but that’s a little tricky to do when you want to give anyGenerator a closure expression that spits out the next element. So instead, we can use a stack to do it iteratively:

extension Tree: SequenceType {
    func generate() -> AnyGenerator<Element> {
        var stack: [Tree] = []
        var current: Tree = self

        return anyGenerator { _ -> Element? in
            while true {
                // if there's a left-hand node, head down it
                if case let .Node(_,l,_,_) = current {
                    stack.append(current)
                    current = l
                }
                // if there isn’t, head back up, going right as
                // soon as you can:
                else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
                    current = r
                    return x
                }
                else {
                // otherwise, we’re done
                return nil
                }
            }
        }
    }
}

Initializing from another sequence

One last extension. When you have a container, it’s nice to be able to initialize it from another sequence, as well as from a literal. These are both pretty easy:

extension Tree: ArrayLiteralConvertible {
    init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
        self = source.reduce(Tree()) { $0.insert($1) }
    }

    init(arrayLiteral elements: Element...) {
        self = Tree(elements)
    }
}

Given these, you can now create new sets using literal syntax:

let alphabet = Tree("the quick brown fox jumps over the lazy dog".characters)
let primes: Tree = [2, 3, 5, 7, 11, 13, 17, 19]

That’s it for now. Several features of Swift 2.0 make this code a lot more pleasant to read and write. There are various further enhancements you could make. The tree maybe ought to be wrapped in an outer struct to avoid exposing the enum implementation. The next logical extension would be conforming to CollectionType.

And there are also performance improvements you could make (the functional data structures book suggests a few as exercises). Honestly, the times when this data structure will do better than an unordered set or array + sort are probably going to be pretty rare. But it’s nice to have the option now.

Below is the final code, along with some tests, or you can find the full code for the tree in a gist, here.

If you found this interesting (and if you made it all the way down here, hopefully that’s true!), it’ll be part of Chris Eidhof and my book, which you can get a preview of here.

enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Color,Tree<Element>,Element,Tree<Element>)
    
    init() { self = .Empty }
    
    init(_ x: Element, color: Color = .B,
        left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
    {
        self = .Node(color, left, x, right)
    }
}


extension Tree {
    func contains(x: Element) -> Bool {
        guard case let .Node(_,left,y,right) = self
            else { return false }
        
        if x < y { return left.contains(x) }
        if y < x { return right.contains(x) }
        return true
    }
}

private func balance<T>(tree: Tree<T>) -> Tree<T> {
    switch tree {
    case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
        return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
    default:
        return tree
    }
}


private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
    guard case let .Node(c, l, y, r) = into
        else { return Tree(x, color: .R) }

    if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
    if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
    return into
}


extension Tree {
    func insert(x: Element) -> Tree {
        guard case let .Node(_,l,y,r) = ins(self, x)
            else { fatalError("ins should never return an empty tree") }

        return .Node(.B,l,y,r)
    }
}



extension Tree: SequenceType {
    func generate() -> AnyGenerator<Element> {
        var stack: [Tree] = []
        var current: Tree = self

        return anyGenerator { _ -> Element? in
            while true {
                // if there's a left-hand node, head down it
                if case let .Node(_,l,_,_) = current {
                    stack.append(current)
                    current = l
                }
                // if there isn’t, head back up, going right as
                // soon as you can:
                else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
                    current = r
                    return x
                }
                else {
                // otherwise, we’re done
                return nil
                }
            }
        }
    }
}

extension Tree: ArrayLiteralConvertible {
    init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
        self = source.reduce(Tree()) { $0.insert($1) }
    }
    
    init(arrayLiteral elements: Element...) {
        self = Tree(elements)
    }
}

import Darwin

extension Array {
    func shuffle() -> [Element] {
        var list = self
        for i in 0..<(list.count - 1) {
            let j = Int(arc4random_uniform(UInt32(list.count - i))) + i
            guard i != j else { continue }
            swap(&list[i], &list[j])
        }
        return list
    }
}


let engines = [
    "Daisy", "Salty", "Harold", "Cranky",
    "Thomas", "Henry", "James", "Toby",
    "Belle", "Diesel", "Stepney", "Gordon",
    "Captain", "Percy", "Arry", "Bert",
    "Spencer",
]

// test various inserting engines in various different permutations
for permutation in [engines, engines.sort(), engines.sort(>),engines.shuffle(),engines.shuffle()] {
    let t = Tree(permutation)
    assert(!t.contains("Fred"))
    assert(t.elementsEqual(t.insert("Thomas")))
    assert(!engines.contains { !t.contains($0) })
    assert(t.elementsEqual(engines.sort()))
    print(t.joinWithSeparator(","))
}

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

Airspeed Velocity

African or European Swift?

Month: July 2015