Linked lists, enums, value types and identity

UPDATE: now conforms to Swift 2.1b2

Following on from the previous post, here’s some more on building collections from enums.

For our book on Swift, we wrote a simple linked list example to use in a section on conforming to CollectionType. Prior to indirect, it was written using a box, and I wanted to update it. But I hit a snag, of which more later. First, a bit about lists and value types.

Here is a super-simple linked list:

enum List<Element> {
    case End
    indirect case Node(Element, next: List<Element>)
}

You prepend another element to the list by creating a new node, with the next value that of the current list. To make that a little easier, we can create a method for it:

extension List {
    func cons(x: Element) -> List {
        return .Node(x, next: self)
    }
}

// a 3-element list: 3, 2, 1
let l = List<Int>.End.cons(1).cons(2).cons(3)

Just like the tree from last time, this list has an interesting property – it is “persistent“. The nodes are immutable – you cannot change them once created. Consing another element onto the list doesn’t copy the list, just gives you a new node that links on to the front of the existing list (again, the first couple of chapters of the Purely Functional Data Structures book is a really good primer for this stuff).

This means two lists can share a tail. To illustrate this, for the first time on this blog (duh duh duhhn…), a diagram:

Sharing

The immutability of the list is key here. If you could change the list (say, remove the last entry, or update the element held in a node), then this sharing would be a problem – a might change the list, and the change would affect b.

Values and Variables

This list is a stack, with consing as a push, and unwrapping the next element a pop. Suppose we added two methods to simplify these operations:

extension List {
    mutating func push(x: Element) {
        self = self.cons(x)
    }

    mutating func pop() -> Element? {
        switch self {
        case .End: return nil
        case let .Node(x, next: xs):
            self = xs
            return x
        }
    }
}

“What?! Mutating methods? I thought you said the list had to be immutable.”

Yes, I did. And it still is:

var stack = List<Int>.End.cons(1).cons(2).cons(3)
var a = stack
var b = stack

a.pop() // 3
a.pop() // 2
a.pop() // 1

stack.pop() // 3
stack.push(4)

b.pop() // 3
b.pop() // 2
b.pop() // 1

stack.pop() // 4
stack.pop() // 2
stack.pop() // 1

What this shows us is the difference between values and variables. The nodes of the list are values. They cannot change. A node of 3 and a reference to the next node cannot become some other value. It will be that value forever, just like the number 3 cannot change. It just is. Just because these values in question are structures with pointers to each other doesn’t make them less value-y.

A variable a, on the other hand, can change the value it holds. It can be set hold a value of an indirect reference to any of the nodes, or to the value End. But changing a doesn’t change these nodes. Just which node a refers to.

This is what these mutating methods on structs do – they take an implicit inout argument of self, and they can change the value self holds.1 This doesn’t change the list, just which part of the list the variable currently represents.

In this sense, the variables are iterators into the list:

Iterating

You can of course declare your variables with let instead of var, in which case the variables will be constant i.e. you can’t change the value they hold once set. But let is about the variables, not the values. Values are constant by definition.

Now, this is all just a logical model of how things work. In reality, the nodes are actually places in memory that point to each other. And they take up space, which we want back if it’s no longer needed. So Swift uses ARC to manage this, and frees them up:

Freeing

Conforming to SequenceType

OK, so if List variables act like iterators, we should be able to use them to iterate, right? For example, we can conform List to SequenceType:

extension List: SequenceType {
    func generate() -> AnyGenerator<Element> {
        // declare a variable to capture that
        //  tracks progression through the list:
        var current = self
        return anyGenerator {
            // next() will pop, returning nil
            // when the list is empty:
            current.pop()
        }
    }
}

And, to make things easier, to ArrayLiteralConvertible:

extension List: ArrayLiteralConvertible {
    init(arrayLiteral elements: Element...) {
        self = elements.reverse().reduce(.End) {
            $0.cons($1)
        }
    }
}

So now you can use lists with for...in:

let l: List = ["1","2","3"]
for x in l { print(x) }

It also means, through the power of protocol extensions, we can use List with dozens of standard library functions:

l.joinWithSeparator(",")        // "1,2,3"
l.contains("2")                 // true
l.flatMap { Int($0) }           // [1,2,3]
l.elementsEqual(["1","2","3"])  // true

Boxes and Identity

But not so many as if we conformed to CollectionType. CollectionType brings us properties like .first, which would be nice to use to peek at the first element without popping it.

More importantly, conforming to CollectionType gives a guarantee that multiple passes over the sequence are OK. As the docs for SequenceType say:

SequenceType makes no requirement on conforming types regarding whether they will be destructively “consumed” by iteration. To ensure non-destructive iteration, constrain your sequence to CollectionType.

Our list definitely fits this description – you can iterate it as much as you like. So we’d like it to be a collection.

Since variables holding nodes can be treated as iterators, it seems reasonable to use them them as a ForwardIndexType.

But first, lets separate out the nodes from the list and index types. It is tempting to just conform the enum to CollectionType and ForwardIndexType directly, but this will lead to problems. For example, those two protocols need very different implementations of ==. The index needs to know if two indices from the same list are at the same position. It should not need the elements to conform to Equatable. The collection, on the other hand, should be able to compare two different lists to see if they hold the same elements. It will need the elements to conform to Equatable.

So lets rename the enum to ListNode, making it a private implementation detail, and add public List and ListIndex types that use it:

private enum ListNode<Element> {
    case End
    indirect case Node(Element, next: ListNode<Element>)

    func cons(x: Element) -> ListNode<Element> {
        return .Node(x, next: self)
    }
}

public struct ListIndex<Element> {
    private let node: ListNode<Element>
}

public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index

    // since this is a constant, no need to store it in a variable,
    // so use a computed property instead to save space
    public var endIndex: Index { return Index(node: .End) }

    public init() {
        startIndex = Index(node: .End)
    }
}


extension List {
    public mutating func push(x: Element) {
        startIndex = Index(node: startIndex.node.cons(x))
    }

    public mutating func pop() -> Element? {
        guard case let .Node(x, next: xs) = startIndex.node
            else { return nil }

        startIndex = Index(node: xs)
        return x
    }
}

extension List: ArrayLiteralConvertible {
    public init(arrayLiteral elements: Element...) {
        self = List()
        for x in elements.reverse() { push(x) }
    }
}

OK so now to conform List to CollectionType. The first step is to conform ListIndex to ForwardIndexType. Simple enough, successor() just needs to return the next node:

extension ListIndex: ForwardIndexType {
    public func successor() -> ListIndex<Element> {
        guard case let .Node(_, next: next) = node
            else { fatalError("cannot increment endIndex") }

        return ListIndex(node: next)
    }
}

But this is only half the story. ForwardIndexType conforms to Equatable, which requires you to implement ==. And this is where we hit the buffers… how can we do that?

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node,rhs.node) {
    case (.End,.End): return true
    case (.Node,.End),(.End,.Node): return false
    case (.Node,.Node): // what to put here???
    }
}

The problem being, indirect enum cases do not have identity. Once upon a time, when this list code was implemented with box, the boxes had identity, which you could use to solve this problem:

// Box, being a class, will have a === operator
final class Box<Element> {
    let unbox: Element
    init(_ x: Element) { unbox = x }
}

enum ListNode<Element> {
    case End
    // a node will be held inside a box, so have identity
    case Node(Box<(Element, next: ListNode<Element>)>)

    func cons(x: Element) -> ListNode<Element> {
        return .Node(Box(x,next: self))
    }
}

Now, no-one is going to mourn the loss of Box, since it got all up in your pattern matching and generally dirtied up the joint. But it did mean you could write == like this:

func ==<T>(lhs: List<T>, rhs: List<T>) -> Bool {
    switch (lhs, rhs) {
    case (.End,.End):               return true
    case (.End,.Node),(.Node,.End): return false
    case let (.Node(l),.Node(r)):   return l === r
    }
}

This all fitted perfectly well with the notion of List being a value type. Box is not a value type – being a class, it is a reference type. But it is immutable – it only has one property, declared with let. So it has value semantics.

This is a subtle but important distinction. Value types are copied on assignment, while reference types are shared (or rather, the thing they refer to is shared). But value types can contain reference types as properties. When value types containing reference types are copied, it is a shallow copy. The two value types will share a reference. If the reference type in question has methods that can mutate its state, the containing type needs to account for this possibility. This can get pretty hairy.

In the case of Box, this is fine though, because it is immutable. The sharing doesn’t matter, because once created, a box can never be changed. Yes, what is boxed could be a reference type you could then mutate, but since what we are boxing is our enum, and that is a value type, it’s fine.

(the list could contain mutable reference types. But we can ignore this – in the same way an array containing reference types doesn’t meddle with the array’s value semantics.)

So, with our nodes being references, we essentially got a free globally-unique identifier for each one, which we could then use to implement node equality. But now that we have switched away from box, we can’t use === any more. So what can we do?

Well, we could rely on the fact that, really, indirect is kind of the same as the box approach, just with the compiler writing the boxing/unboxing in for you and hiding the details. So we could do something very unwise, and just look at the bits:

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node,rhs.node) {
    case (.End,.End): return true
    case (.Node,.End),(.End,.Node): return false
    case (.Node,.Node):
        // engage ಠ_ಠ now...
        return unsafeBitCast(lhs, UInt.self) == unsafeBitCast(rhs, UInt.self)
    }
}

This is evil, but it appears to work. When a ListNode.Node is copied, with indirect this just makes a copy of the indirect reference. Then, if the variable is updated, it changes to be a new reference and is no longer equal.

But it is almost certainly Not A Good Idea. It’s relying on an implementation detail of indirect that, if it ever changed or if the compiler did some optimization under the hood that did something different, would lead to undefined behaviour. There could well be other ways in which my behaviour assumptions are incorrect and this breaks, though I haven’t found any.

Next hacky solution: only make the ends of the index equal. Even if two nodes might be the same, return false regardless:

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node,rhs.node) {
    case (.End,.End): return true
    default:          return false
    }
}

Again, this mostly works, but we are still breaking the rules – in this case, the rules of Equatable.

Equatable requires that == be an equvalence relation. This means, amongst other things, that it must be reflexive – a thing must equal itself. And our indices don’t (unless they’re the end):

l.startIndex == l.startIndex  // false... not OK!

Mysterious undiagnosable bugs can ensue from this. There are some places in the standard library where == being an equivalence relation is relied upon. For example, Array takes a shortcut of first checking if two arrays share the same buffer. If they do, they have to be equal, right? But implement an Equatable type whose == isn’t reflexive, and you get some very strange behaviour:

struct NeverEqual: Equatable { }
func == (lhs: NeverEqual, rhs: NeverEqual) -> Bool {
    return false
}

var a = [NeverEqual()]
let b = a
a == b       // true
a[0] = b[0]
a == b       // false

…so we should probably rule this out.

Conforming to CollectionType

Finally, a third option: tag our list indices with a number, and use that to compare them. Whenever you change an index, increment/decrement the tag.

edit: I originally put this tag on the nodes, but Thorsten in the comments made a great point that it is more space-efficient to have it on the indices.

This solution is a bit more intrusive. We’ll have to reconfigure ListIndex to acommodate the tag:

public struct ListIndex<Element> {
    private let node: ListNode<Element>
    private let tag: Int
}

…and any places where generate new indices:

public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index
    public var endIndex: Index { return Index(node: .End, tag: 0) }

    public init() {
        startIndex = Index(node: .End, tag: 0)
    }
}

extension ListIndex: ForwardIndexType {
    public func successor() -> ListIndex<Element> {
        guard case let .Node(_, next: next) = node
            else { fatalError("cannot increment endIndex") }

        return ListIndex(node: next, tag: tag.predecessor())
    }
}

extension List {
    public mutating func push(x: Element) {
        startIndex = Index(node: startIndex.node.cons(x),
                           tag:  startIndex.tag.successor())
    }

    public mutating func pop() -> Element? {
        guard case let .Node(x, next: _) = startIndex.node
        else { return nil }

        ++startIndex
        return x
    }
}

Now, index equality can now be performed just by comparing the tag:

public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    return lhs.tag == rhs.tag
}

Finally, we can conform List to CollectionType:

extension List: CollectionType {
    public subscript(idx: Index) -> Element {
        guard case let .Node(x, _) = idx.node
            else { fatalError("Subscript out of range") }

        return x
    }
}

and it gainst the collection-only features:

l.first  // .Some("1")
if let twoIdx = l.indexOf("2") {
    l[twoIdx]  // "2"
}

There’s no longer any need to implement generate for the SequenceType conformance. Collections automatically gain a default implementation using IndexingGenerator.

As an added benefit of the tagging solution, since the tag is the count of nodes prepended to .End, we can implement a constant-time count, even though the lists are forward-index only which would normally mean linear-time counting:

extension List {
    public var count: Int {
        return startIndex.tag
    }
}

It is true that two ListIndex types from two totally different lists with different values can now be equated and be equal to each other. But then, this is true of array’s indices too (cos, you know, they’re just integers).

We can also implement == for List the way users expect == to behave for value-semantic collections, by comparing the elements:

func == <T: Equatable>(lhs: List<T>, rhs: List<T>) -> Bool {
    return lhs.elementsEqual(rhs)
}

Efficient Slicing

Collections also get a default implementation of the slicing operation, [Range], which is how dropFirst works:

// equivalent of dropFirst(l)
let firstDropped = l[l.startIndex.successor()..<l.endIndex]

// firstDropped will be a Slice<List<String>>

But this default implementation returns a wrapper over the original collection, plus an index sub-range, so is bigger than it needs to be in List‘s case:

// Size of a list is size of two nodes, the start and end:
sizeofValue(l)             // returns 16

// Size of a list slice is size of a list, plus size of a subrange
// (a range between two indices, which in lists's case are also list nodes)
sizeofValue(dropFirst(l))  // returns 48

Remember, since endIndex was always the same value, it takes no space inside List, so the sizeofValue(l) is just the size of List’s sole member variable, startIndex.

But if we switch to hold an explicit endIndex property, lists could return subsequences of themselves by holding a different start and end:

public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index
    public var endIndex: Index

    public init() {
        startIndex = Index(node: .End, tag: 0)
        endIndex = startIndex
    }

    // and add the slicing initializer, which takes a range:
    private init (subRange: Range<Index>) {
        startIndex = subRange.startIndex
        endIndex = subRange.endIndex
    }
    public subscript (subRange: Range<Index>) -> List<Element> {
        return List(subRange: subRange)
    }
}

// now lists take 32 bytes (a startIndex and an endIndex):
sizeofValue(l)             // returns 32

// but list slices are also lists, so only 32 bytes
sizeofValue(dropFirst(l))  // returns 32

This leads to another benefit of this list implementation. With many sliceable containers, including Swift’s arrays and strings, a slice shares the storage buffer of the original collection. This has an unpleasant side-effect: slices can keep the original collection’s buffer alive in its entirety, even if the original collection falls out of scope. If you suck a 1gb file into an array or string, then slice off a tiny part, the whole 1gb buffer will stay in memory until both the collection and the slice are destroyed.

With List, it isn’t quite as bad. As we’ve seen, the nodes are refcounted. Which means when the slices are the the only remaining copy, any elements dropped from the front will be reclaimed as soon as no-one is referencing them.

Not the back though – since the slice’s endIndex still has a reference to what comes after. You could optimize this even further by having slices only hold the tag for their end index if you wanted.

I hate writing conclusions

So, losing Box was mostly great, but it also lost us one benefit, of a free identity for our nodes. This identity is still secretly there, and I think it works similarly, but we can’t use it. I have a feeling there’s a case to be made for enums to be able to use it internally, without necessarily breaking with the idea that enums are value types. But there are probably hidden gotchas to this that I haven’t thought of. Tags serve as a reasonable subsitute (at the expense of an extra few bytes), but this approach wouldn’t easily adapt to more complex structures like trees, so conforming last week’s code to a collection type would probably prove tricky (though, a persistent stack like this list may help with creating the index type). If anyone wants to take a stab, let me know what you come up with!


  1. inout arguments in Swift are interesting too. They aren’t pass-by-reference, but rather pass-by-value-and-copy-back. Which makes several things about Swift work better than, say, C#. There’s more info in the book

5 thoughts on “Linked lists, enums, value types and identity

  1. Great post as always!
    Just one comment: Instead of having to tag each node you could also put the tag into the ListIndex (incrementing it in the successor() method). Then the tag would only take up space when indices are needed. With this approach the tag would count upwards from the beginning of the list and would be equivalent to the element index.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s