Writing A Generic Stable Sort

One more post on writing generic algorithms. Let’s take a non-trivial non-generic algorithm and write a generic version. We’ll port a Java implementation of merge sort, and also look at some other interesting things we might do to some Java code when converting it to Swift.

Why bother with a merge sort? Well, the standard library comes with a very efficient in-place sort, based on the introsort, but it’s not a stable sort – that is, two otherwise equal entries aren’t guaranteed to stay in the same order, something that is often expected in things like grids that sort when you click a column header.

To start off, here is a Java version of a bottom-up merge sort, taken from Sedgewick, in all its Java-y C-for-loop-y zero-based-integer-index-y glory. Check out the vertigo-inducing braceless `for` loops!

```java
public class Merge {
    private static Comparable[] aux;

    private static void merge(Comparable[] a, int lo, int mid, int hi) {
        int i = lo, j = mid+1;

        for (int k = lo; k <= hi; k++)
            aux[k] = a[k];

        for (int k = lo; k <= hi; k++)
            if      (i > mid)
                a[k] = aux[j++];
            else if (j > hi)
                a[k] = aux[i++];
            else if (aux[j].compareTo(aux[i]) < 0)
                a[k] = aux[j++];
            else
                a[k] = aux[i++];
    }

    public static void sort(Comparable[] a) {
        int N = a.length;
        aux = new Comparable[N];
        for (int sz = 1; sz < N; sz = sz+sz)
            for (int lo = 0; lo < N-sz; lo += sz+sz)
                merge(a, lo, lo+sz-1, Math.min(lo+sz+sz-1, N-1));
    }
}
```

This is generic on the array’s contents, but not on the container (not that you couldn’t write a container-generic version in Java too, but this isn’t a Java blog, I get quite enough Java in my day job thanks).

A good first step to turn this into a generic Swift algorithm would be just to port it to `Array` first. But even this presents some interesting problems – it’s not just a straight port, despite Java and Swift’s common ancestry. So let’s port it to Swift and make it more Swift-like at the same time, while keeping the algorithm broadly similar (speaking of which, yes, I know you can do it recursively but we’ll stick with the iterative version).

A first step would be ditching the very Java-esque practice of creating a class just to wrap some static methods. This passes for normal in the kingdom of nouns, but in Swift this would be more suited to an extension, with `self` taking the place of the `a` parameter:

```swift
extension Array {
    // really this ought to be marked rethrows but
    // let’s ignore that to keep things simple...
    public mutating func stableSortInPlace(
        isOrderedBefore: (Element,Element)->Bool
    ) {
        let N = self.count
        // etc.
    }
}
```

Why not write a version for `Array where Element: Comparable`? Well, `Comparable` conforms to `Equatable`, which has the following stern warning in its comment:

> Equality implies substitutability. When x == y, x and y are interchangeable in any code that only depends on their values.

When you need a stable sort, the whole point is usually that you’re only comparing part of the values – you sort by last name, while wanting rows with the same last name to stay in the same order. Obviously that means the two values you’re comparing are not substitutable. So it probably only makes sense to implement this sort with a comparator function. Never, ever conform to `Comparable` by comparing only part of the visible properties of your type.
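To see that partial-key scenario concretely, here it is in Java terms (Java’s `List.sort` is documented to be stable; the class and method names here are made up for the example):

```java
import java.util.*;

public class StableByKey {
    // sort rows by last name only; return the row ids in final order
    public static String sortedIds(List<String[]> rows) {
        // List.sort is guaranteed stable, so rows with
        // equal last names keep their original relative order
        rows.sort(Comparator.comparing((String[] r) -> r[0]));
        StringBuilder ids = new StringBuilder();
        for (String[] r : rows) ids.append(r[1]);
        return ids.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>(Arrays.asList(
            new String[]{"Smith", "1"}, new String[]{"Jones", "2"},
            new String[]{"Smith", "3"}, new String[]{"Jones", "4"}));
        System.out.println(sortedIds(rows)); // prints 2413
    }
}
```

Two `Smith` rows that differ elsewhere are clearly not substitutable, yet the sort compares them as equal – which is exactly why this belongs in a comparator, not in a `Comparable` conformance.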

The first problem we face is that temporary storage array `aux`. Well, two problems really. The Java version does something that would be pretty monstrous in anything other than a textbook example – it uses a static variable. This would be fine for single-threaded code but if ever you used this same function simultaneously from two threads (even on two different arrays) then kerplowie. Brings back fond memories of debugging production problems from someone calling `strtok` from multiple threads. Sorry, did I say fond? I meant traumatic.

Anyway, even if you wanted to do it with a static property, you couldn’t, because you can’t add static variables to generic types in Swift. (if you wonder why – think about what would happen if there were a static property on `Array` – should there be just one static variable for both `[Int]` and `[String]`, or two different ones?)

So we need another way. We could just declare a new array inside `merge` every time, but that would allocate fresh array storage on every call, which would be horribly inefficient.

Alternatively, you could create the array once in the `sort` function, just like in the Java version, but then pass it as an argument into the `merge` function. This would ugly up the code a bit. But more importantly, it wouldn’t work with Swift arrays, because they are value types. If you created an array in `sort` then passed it into `merge`, it could be copied(-on-write), not re-used, which would be even worse than declaring a new one.

(If you are thinking `inout` or `var` is the solution, you’re also out of luck there. Despite what that call-site ampersand might imply, `inout` in Swift is not pass-by-reference, but rather pass-by-value-then-copy-back. `var` is similarly no help – it’s just shorthand for declaring a second mutable variable and assigning it the value of the parameter. Plus `var` is going away soon, probably partly because it caused this kind of confusion. UPDATE: per @jckarter, the `inout` situation may not be so clear-cut and ought to work in this case)

You could use reference-type storage, like `NSArray`, instead. Or you could allocate some memory yourself using `UnsafeMutablePointer`. Or use a `Box` class to wrap an array in a reference type. But here’s a perhaps neater alternative – declare `merge` as an inner function, and make it a closure that captures a local array variable:

```swift
extension Array {
    public mutating func stableSortInPlace(
        isOrderedBefore: (Element,Element)->Bool
    ) {
        let N = self.count
        // declare a local aux variable
        var aux: [Element] = []
        // make sure it has enough space
        aux.reserveCapacity(self.count)

        // local merge function that can operate on (capture) self
        func merge(lo: Int, _ mid: Int, _ hi: Int) {
            // merge captures the aux variable _by ref_
            // we can now use aux to merge self
        }

        for etc. {
            merge(etc.)
        }
    }
}
```

When a function closes over a variable, it captures it by reference. This means that `merge` and the outer sorting function will share the same instance, exactly what we want. Unlike the static approach, there’s no risk that a second call will use the same storage – `aux` is still local to the call to sort, and two different calls will use two different local storage arrays, and so will their inner variables. The one downside of this is we wouldn’t be able to expose our `merge` function publicly, even though it could be a useful algorithm in its own right.

By the way, even though we declared `merge` with `func`, it is still a closure – don’t conflate the term closure (functions plus captured environment), with closure expressions (the Swift term for the shorthand syntax for writing anonymous functions). Local functions declared with `func` are just as closurey as closure expressions.

OK that solves that problem, but there’s a second issue. In the Java version, the array is declared to be a given fixed size:

```java
aux = new Comparable[N];
```

and then the `merge` function just plunges in and uses bits of it:

```java
for (int k = lo; k <= hi; k++)
    aux[k] = a[k];
```

Here’s another place where both Java and Swift start from their common ancestor, but then Java zigs while Swift zags. In C, you declare a variable, and then you can immediately read from it without giving it a value. Everyone agrees that this is all very horrible and monstrous, but to not have that you have to choose a path – give everything a default value, like in Java, or ban reading until you’ve written, like in Swift.

There are maybe performance reasons why Swift’s choice is better. But really, it was forced by other factors. In Java, everything is either a primitive type, or a reference. With such a limited set of types it’s reasonable to give everything a default value – either zero, empty, or null. But Swift isn’t so simple. It doesn’t even really have “primitive” types – even `Int` is a struct. And while it has references, they aren’t always nullable. So everything having a default isn’t really an option.

So it follows that in Swift you can’t have an `Array` where you just declare it to be a given size and then plunge in, getting and setting elements at arbitrary indices. I mean, you could write an array type that worked like that, but then you’d have to track internally whether the value for a given index had been set or not. So instead, to make an array have `n` elements, you have to put `n` elements in. You can do that easily with `Array(count: n, repeatedValue: ???)`, but for our purposes, what would you put as the value? `Element` is completely generic – you don’t know anything about it, so there’s no reasonable default.
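Java’s zig, by contrast, is easy to demonstrate – a freshly allocated array is immediately readable at every index, because every slot already holds a default value (a quick sketch, with hypothetical names):

```java
public class Defaults {
    // a new Java array is immediately readable everywhere:
    // reference elements default to null, primitives to zero
    public static boolean allNull(Object[] a) {
        for (Object x : a) if (x != null) return false;
        return true;
    }

    public static void main(String[] args) {
        Comparable[] aux = new Comparable[5]; // no values supplied...
        System.out.println(allNull(aux));     // ...yet every slot reads as null
        int[] nums = new int[5];
        System.out.println(nums[3]);          // primitives default to 0
    }
}
```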

One solution is to use an array of `Optional`. That way, you can set them all to `nil`. This is probably one of those rare legit uses of an implicitly-unwrapped optional – where you can prove that, at the time you use it, the optional will never be `nil`, so there’s no harm unwrapping it implicitly. In fact, you’d like it to assert if you ever unwrapped a `nil`, as this would mean there was something horribly wrong with your logic.

```swift
extension Array {
    public mutating func stableSortInPlace(
        isOrderedBefore: (Element,Element)->Bool
    ) {
        let N = self.count
        var aux = Array<Element!>(count: N, repeatedValue: nil)

        func merge(lo: Int, _ mid: Int, _ hi: Int) {
            var i = lo, j = mid+1

            for var k = lo; k <= hi; k++ {
                aux[k] = self[k]
            }

            for var k = lo; k <= hi; k++ {
                if      i > mid { self[k] = aux[j++] }
                else if j > hi  { self[k] = aux[i++] }
                else if isOrderedBefore(aux[j],aux[i])
                                { self[k] = aux[j++] }
                else            { self[k] = aux[i++] }
            }
        }

        for var sz = 1; sz < N; sz = sz+sz {
            for var lo = 0; lo < N-sz; lo += sz+sz {
                merge(lo, lo+sz-1, min(lo+sz+sz-1, N-1))
            }
        }
    }
}
```

This now works, and is pretty much like the Java version. But implicitly-unwrapped optionals are a bit unpleasant, and anyone unhappy about the merge sort’s linear space requirements might blanch at the extra bits they take up. And the rest isn’t very Swift-y. Those `for` loops and `++` are not long for this world, and it’ll be a bit painful to translate this into a fully generic function. So, some changes:

- Flip it around and have the merge logic copy into the `aux` array, then copy back into `self`, using `append`, removing the need to size it up at the beginning.
- At the same time, switch the `for` for a `while`, and ditch the `++`.
- Switch from closed ranges (up-to-and-including `hi`) to half-open ranges (up-to-but-not-including `hi`), since this is more the Swift convention (and will be easier in the next part).
- Use `appendContentsOf` and `replaceRange` instead of hand-rolled loop-and-assign.

which gives us this merge function:

```swift
// ditch the optionals
var aux: [Element] = []
// make sure aux has enough memory to store a full copy
// (note, does _not_ actually increase aux.count)
aux.reserveCapacity(N)

func merge(lo: Int, _ mid: Int, _ hi: Int) {
    // wipe aux, but keep the memory buffer
    // at full size each time around
    aux.removeAll(keepCapacity: true)

    var i = lo, j = mid
    while i < mid && j < hi {
        // append the lesser of either element
        if isOrderedBefore(self[j],self[i]) {
            aux.append(self[j])
            j += 1
        }
        else {
            aux.append(self[i])
            i += 1
        }
    }
    // then append the stragglers
    // (one of the next 2 lines will be a no-op)
    aux.appendContentsOf(self[i..<mid])
    aux.appendContentsOf(self[j..<hi])
    // then copy the sorted temp storage back into self
    self.replaceRange(lo..<hi, with: aux)
}
```

Finally, we need to do the same for the sorting `for` loops:

```swift
for var sz = 1; sz < N; sz = sz+sz {
    for var lo = 0; lo < N-sz; lo += sz+sz {
        merge(lo, lo+sz-1, min(lo+sz+sz-1, N-1))
    }
}
```

That outer `for` is kinda sneaky, since it’s doubling `sz` each time around the loop, so that needs a `while` rather than a `stride`. The inner one is a lot simpler though, it’s just striding over the array by a constant `sz*2` each time. Moving to half-open rather than closed ranges makes this a whole lot easier, since it gets rid of all the annoying `-1`s littered about:

```swift
var sz = 1
while sz < N {
    for lo in 0.stride(to: N-sz, by: sz*2) {
        merge(lo, lo+sz, min(lo+sz*2, N))
    }
    sz *= 2
}
```
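If you want to convince yourself the half-open rewrite visits exactly the same subranges as Sedgewick’s closed-range loops, here’s a throwaway Java sketch (hypothetical names) that generates both sets of merge triples – converting closed `(lo, mid, hi)` to half-open `(lo, mid+1, hi+1)` – and compares them:

```java
import java.util.*;

public class MergeRanges {
    // triples from the original closed-range loops,
    // converted to half-open form: mid lo+sz-1 becomes lo+sz, hi gets +1
    public static List<int[]> closedAsHalfOpen(int n) {
        List<int[]> out = new ArrayList<>();
        for (int sz = 1; sz < n; sz = sz + sz)
            for (int lo = 0; lo < n - sz; lo += sz + sz)
                out.add(new int[]{lo, lo + sz, Math.min(lo + sz + sz - 1, n - 1) + 1});
        return out;
    }

    // triples from the rewritten while/stride loop
    public static List<int[]> halfOpen(int n) {
        List<int[]> out = new ArrayList<>();
        int sz = 1;
        while (sz < n) {
            for (int lo = 0; lo < n - sz; lo += sz * 2)
                out.add(new int[]{lo, lo + sz, Math.min(lo + sz * 2, n)});
            sz *= 2;
        }
        return out;
    }

    public static boolean same(int n) {
        List<int[]> a = closedAsHalfOpen(n), b = halfOpen(n);
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++)
            if (!Arrays.equals(a.get(i), b.get(i))) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(same(10)); // both loops produce identical triples
    }
}
```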

This leaves us with a final version, Swiftified if not yet generic. It’s a little more verbose than the original Java, but mainly because of the loss of the compact `for` and `++` notation, and hopefully a bit more readable for it:

```swift
extension Array {
    public mutating func stableSortInPlace(
        isOrderedBefore: (Element,Element)->Bool
    ) {
        let N = self.count

        var aux: [Element] = []
        aux.reserveCapacity(N)

        func merge(lo: Int, _ mid: Int, _ hi: Int) {

            aux.removeAll(keepCapacity: true)

            var i = lo, j = mid
            while i < mid && j < hi {
                if isOrderedBefore(self[j],self[i]) {
                    aux.append(self[j])
                    j += 1
                }
                else {
                    aux.append(self[i])
                    i += 1
                }
            }
            aux.appendContentsOf(self[i..<mid])
            aux.appendContentsOf(self[j..<hi])
            self.replaceRange(lo..<hi, with: aux)
        }

        var sz = 1
        while sz < N {
            for lo in 0.stride(to: N-sz, by: sz*2) {
                merge(lo, lo+sz, min(lo+sz*2, N))
            }
            sz *= 2
        }
    }
}
```
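As a sanity check on the restructured algorithm itself – copy into `aux`, half-open ranges, strictly-less comparison for stability – here’s the same logic back-ported to plain Java (a sketch, with hypothetical names):

```java
import java.util.*;

public class BottomUpMerge {
    // stable bottom-up merge sort using half-open ranges and an aux
    // buffer that's filled by append rather than pre-sized
    public static <T> void stableSort(T[] a, Comparator<? super T> lt) {
        int n = a.length;
        List<T> aux = new ArrayList<>(n);
        int sz = 1;
        while (sz < n) {
            for (int lo = 0; lo < n - sz; lo += sz * 2) {
                int mid = lo + sz, hi = Math.min(lo + sz * 2, n);
                aux.clear(); // like removeAll(keepCapacity: true)
                int i = lo, j = mid;
                while (i < mid && j < hi) {
                    // strictly-less test keeps the sort stable:
                    // on a tie, the element from the left run goes first
                    if (lt.compare(a[j], a[i]) < 0) aux.add(a[j++]);
                    else                            aux.add(a[i++]);
                }
                while (i < mid) aux.add(a[i++]); // stragglers
                while (j < hi)  aux.add(a[j++]);
                for (int k = lo; k < hi; k++) a[k] = aux.get(k - lo);
            }
            sz *= 2;
        }
    }

    public static void main(String[] args) {
        Integer[] a = {5, 3, 4, 1, 2};
        stableSort(a, Comparator.naturalOrder());
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5]
    }
}
```

Note how flipping the comparison (taking `a[i]` first on ties) is exactly what preserves stability – the same property the Swift version relies on.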

Making it work on any random-access collection

OK, now to make it generic. A lot of that rework from before will make this easier.

First, we make this an extension of a protocol instead of `Array`. We actually used `self.replaceRange`, so let’s make it an extension of `RangeReplaceableCollectionType` (maybe a bit of an overconstraint – you could do without it and make do with just `MutableCollectionType`, but let’s ignore that). Plus the `where` clauses. We know the index needs to be random-access. We also rely on slicing, so per the previous article on this topic, we need to require that `SubSequence` contains the same elements as the sequence:

```swift
extension RangeReplaceableCollectionType
  where
  Index: RandomAccessIndexType,
  SubSequence.Generator.Element == Generator.Element {
```

Doing this in Xcode will now show you all the places where you are relying on assumptions that no longer hold now you aren’t just writing against `Array` (though, more spring up as you fix these…)

Easy ones first. `Generator.Element` instead of just `Element`:

```swift
public mutating func stableSortInPlace(
    isOrderedBefore: (Generator.Element,Generator.Element)->Bool
) {
    let N = self.count

    var aux: [Generator.Element] = []
```

Next, `aux.reserveCapacity`. We want to reserve space for `N` elements, but `N` (which came from `self.count`) is a `Self.Distance`. Remember, distances are always integers, but they might not be `Int`. We have to use `numericCast` to fix this:

```swift
aux.reserveCapacity(numericCast(N))
```

I admit, I haven’t quite got my head around whether this is an OK thing to do or not. For almost every reasonable circumstance, this will be a no-op, since `Distance` will be an `Int`. If for some reason you were using a class that used a smaller `Distance`, it will upconvert it. There’s a slim chance that `Distance` could be bigger than an `Int` – say if you’re on 32-bit platform but using a collection that used an explicit 64-bit index? Or maybe some giant virtual collection that used a `BigInt`? But for now I shall ignore these possibilities with reckless abandon.

So long as we have the `where` constraints above, the `merge` function is easy. You just make `lo`, `mid`, and `hi` an `Index` instead of an `Int`:

```swift
func merge(lo: Index, _ mid: Index, _ hi: Index) {
    // the rest is the same
}
```

OK now for the interesting bit, the sorting loops:

```swift
var sz = 1
while sz < N {
    for lo in 0.stride(to: N-sz, by: sz*2) {
        merge(lo, lo+sz, min(lo+sz*2, N))
    }
    sz *= 2
}
```

The trick is, knowing which ones are indices, and which ones are distances (by which indices will be advanced).

`sz` is a distance. It’s the length by which each pass strides through the collection performing the merge, and which doubles each time around (giving the algorithm its linearithmic performance characteristic). To make that explicit, we need to annotate it with a type (otherwise it’s going to default to being an `Int`):

```swift
var sz: Index.Distance = 1
```

We already established `N` is the size of the collection and is also a distance, so this is fine (you can compare two distances, and you can double distances):

```swift
while sz < N {
    // inner for stride
    sz *= 2
}
```

Finally, the inner stride:

```swift
for lo in 0.stride(to: N-sz, by: sz*2) {
    merge(lo, lo+sz, min(lo+sz*2, N))
}
```

`lo` is an `Index` – that’s what we pass in to `merge` as `lo`. The calculated `mid` and `hi` are indices too. When you see `0` as an index, that really means `startIndex` so we can substitute that.

The only other tricky part is `N`. With zero-based integer indices, `N` could mean two things: the length of the collection, or the end of the collection (i.e. `startIndex.advancedBy(N)`). Here, in this inner loop, it means the latter. So `N-sz` means “back from the end by `sz`“.

Similarly `min(lo+sz*2, N)` means “add `sz*2` to `lo`, or `endIndex` if that goes past the end”. So you could write `min(lo+sz*2, endIndex)`. But there’s already a function for “go this far, but not past this point”: `advancedBy` is overloaded with a version that takes a `limit:` argument. So instead, you can write this as `lo.advancedBy(sz*2, limit: endIndex)`. This is arguably more correct as well, since it could be an error to add a distance to an index that takes it past the end of the collection (like with `String` indices).

So we rewrite the inner loop as:

```swift
for lo in startIndex.stride(to: endIndex-sz, by: sz*2) {
    merge(lo, lo+sz, lo.advancedBy(sz*2, limit: endIndex))
}
```

Again, if you’re doing this in Xcode, the compiler will help you along (for example, it will complain if you’re trying to take the minimum of an index and a distance, since you can’t do that, which is a hint you need an index instead of a distance).

One final gotcha. After you make all these changes and you’re sure everything is correct, the `stride` still won’t compile. We need one more constraint on the protocol: `Index.Distance == Index.Stride`. Yes, this should always be the case (that’s the whole point of making random-access indices strideable – so you can stride by `Distance`), but you still need to make it explicit.

After all this we finally have a fully generic random-access in-place merge sort:

```swift
extension RangeReplaceableCollectionType
  where
  Index: RandomAccessIndexType,
  SubSequence.Generator.Element == Generator.Element,
  Index.Distance == Index.Stride {
    public mutating func stableSortInPlace(
        isOrderedBefore: (Generator.Element,Generator.Element)->Bool
    ) {
        let N = self.count

        var aux: [Generator.Element] = []
        aux.reserveCapacity(numericCast(N))

        func merge(lo: Index, _ mid: Index, _ hi: Index) {

            aux.removeAll(keepCapacity: true)

            var i = lo, j = mid
            while i < mid && j < hi {
                if isOrderedBefore(self[j],self[i]) {
                    aux.append(self[j])
                    j += 1
                }
                else {
                    aux.append(self[i])
                    i += 1
                }
            }
            aux.appendContentsOf(self[i..<mid])
            aux.appendContentsOf(self[j..<hi])
            self.replaceRange(lo..<hi, with: aux)
        }

        var sz: Index.Distance = 1
        while sz < N {
            for lo in startIndex.stride(to: endIndex-sz, by: sz*2) {
                merge(lo, lo+sz, lo.advancedBy(sz*2, limit: endIndex))
            }
            sz *= 2
        }
    }
}
```

I’d suggest you try it out on a mutable random-access collection that doesn’t have integer indices. Except, uh, there aren’t any. There are random-access collections without integer indices – reverse collections for example. Or (with a bit of extension-bullying) `String.UTF16View`. But unfortunately these aren’t mutable! So perhaps I’ve just totally wasted your time. But it was fun, right?

OK so there’s also a much more important benefit to this – it no longer relies on the index starting from zero. Yes, we could have done this for `CollectionType where Index: IntegerType`, but that would probably have been harder, since the compiler would not have helped as much when we made invalid assumptions. The `0` instead of `startIndex` might be easy to catch, but assumptions about when `N` means `endIndex` versus when `N` means `self.count` are much trickier.

Now that we’ve made this change, we can apply the algorithm in-place to subranges, such as:

```swift
var a = Array(1...5)
a[1..<4].stableSortInPlace(>)
// a = [1,4,3,2,5]
```

And finally, just to check it really is still a stable sort after all this futzing around (that property is easy to break if you make innocent-seeming changes to the order of your `i` and `j` comparisons – something I learned the hard way, forgot, then remembered the hard way during the course of writing this…)

```swift
var pairs = [
    (2,1), (2,2),
    (3,1), (3,2),
    (1,1), (1,2),
    (1,3), (2,3), (3,3)
]

pairs.stableSortInPlace { $0.0 < $1.0 }
print(pairs.map { $0.1 })
// if sort is stable, ought to print
// [1, 2, 3, 1, 2, 3, 1, 2, 3]
```

So, you’ve figured out the whole indices versus distances situation, and now you’re feeling pretty happy and you start trying out some more generic collection algorithms. And before long you smack your head against one of the more confusing things about generic collections: a sequence’s `SubSequence` doesn’t necessarily contain the same element type as that sequence. This usually manifests itself in one of those compiler errors that has you yelling “what are you talking about you infernal machine” at the compiler before eventually realizing what you’ve done wrong and sheepishly fixing it.

For example, suppose you want to find the location of the first occurrence of a given pattern in a collection. Like `indexOf`, but for another sequence rather than a single element. The simple brute-force approach would be to iterate over every subsequence, trying to match it. Something like this – except it won’t compile:

```swift
extension CollectionType
  where
  Generator.Element: Equatable {

    func search<
      Other: SequenceType
      where
      Other.Generator.Element == Generator.Element
    >(pattern: Other) -> Index? {

        for idx in self.indices {
            // error: cannot convert value of type 'Other'
            // to expected argument type...
            if suffixFrom(idx).startsWith(pattern) {
                return idx
            }
        }
        return nil
    }
}
```

Until you know why, this can be very confusing. You’ve constrained the other sequence to have the same elements as the collection, so why can’t you use `startsWith` on it?

Except you’re not calling `startsWith` on `Self`. You’re calling it on `Self.SubSequence`, because that’s what `suffixFrom` returns – it’s a slicing operation. And `SubSequence.Generator.Element` isn’t guaranteed to be the same as `Self.Generator.Element`.

Here’s the relevant declaration:

```swift
protocol CollectionType {
    typealias SubSequence : Indexable, SequenceType = Slice<Self>
}
```

There’s nothing about this definition that forces the elements of `SubSequence` to be anything specific. Really, this needs a `where` clause on the typealias – something that Swift doesn’t currently support:1

```swift
protocol CollectionType {
    typealias SubSequence : Indexable, SequenceType
        // not (yet?) valid Swift
        where SubSequence.Generator.Element == Generator.Element
}
```

So, in the meantime, the way around this is to specify any requirements like this as part of the declaration. A fix that might seem to work, but is over-constraining, would be to require that subsequences be the same type:

```swift
extension CollectionType
  where
  Generator.Element: Equatable,
  SubSequence == Self {
    // etc
}
```

This would compile, and it would work fine with types whose slices are the same type as themselves, like strings:

```swift
// a slice of a String.CharacterView is
// a String.CharacterView, so this is fine
"hello".characters.search("ll".characters)
```

but not for arrays, which have an `ArraySlice` type, or for arbitrary collections, which get the default `Slice` implementation.

```swift
// "type of expression is ambiguous without more context", apparently
[1,2,3,4].search(2...4)
// "ambiguous reference to member '..<'" uhhh what?
(0..<5).search(2...4)
```

Instead, you can constrain the elements to be the same while allowing the slice type itself to differ:

```swift
extension CollectionType
  where
  Generator.Element: Equatable,
  // guarantee subsequences contain the same element
  SubSequence.Generator.Element == Generator.Element {

    func search<
      Other: SequenceType
      where
      Other.Generator.Element == Generator.Element
    >(pattern: Other) -> Index? {

        for idx in self.indices {
            // and now this line will happily compile...
            if suffixFrom(idx).startsWith(pattern) {
                return idx
            }
        }
        return nil
    }
}

[1,2,3,4].search(2...4) // returns 1
(0..<5).search([3,4])   // returns 3
```
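The brute-force search itself has nothing Swift-specific about it. Here’s the same idea in Java (hypothetical names), where the `SubSequence` wrinkle doesn’t arise because a sublist’s element type is trivially the list’s element type – and where the bounds check is free, since Java lists have O(1) `size()`:

```java
import java.util.*;

public class Search {
    // index of the first occurrence of pattern in hay, or -1;
    // brute force: try to match the pattern at every index
    public static <T> int indexOfPattern(List<T> hay, List<T> pattern) {
        outer:
        for (int i = 0; i + pattern.size() <= hay.size(); i++) {
            for (int j = 0; j < pattern.size(); j++)
                if (!Objects.equals(hay.get(i + j), pattern.get(j)))
                    continue outer;
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // the same two examples as the Swift version above
        System.out.println(indexOfPattern(
            Arrays.asList(1,2,3,4), Arrays.asList(2,3,4))); // 1
        System.out.println(indexOfPattern(
            Arrays.asList(0,1,2,3,4), Arrays.asList(3,4))); // 3
    }
}
```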

Optimizing for Random Access

It’s often the case that you can optimize a collection algorithm if you know it has random access. For example, given the above algorithm, you can tweak it to avoid searching for a pattern that is longer than what remains of the collection, so long as both the collection and the pattern are random access and can therefore calculate their lengths in constant time. OK, maybe it’s not Boyer-Moore, but it’s better than nothing, and the code to do this is pretty simple – in fact, it’s more constraints than code at this point:

```swift
extension CollectionType
  where
  Generator.Element: Equatable,
  Index: RandomAccessIndexType,
  SubSequence.Generator.Element == Generator.Element {

    func search<
      Other: CollectionType
      where
      Other.Index: RandomAccessIndexType,
      Other.Index.Distance == Index.Distance,
      Other.Generator.Element == Generator.Element
    >(pat: Other) -> Index? {

        // if pattern is longer, this cannot match, exit early
        guard !isEmpty && pat.count <= count else { return nil }

        // otherwise, search from the start up to the
        // end less the space for the pattern
        for i in startIndex...endIndex.advancedBy(-pat.count) {
            // check if a slice from this point
            // starts with the pattern
            if self.suffixFrom(i).startsWith(pat) {
                // if it does, we've found it
                return i
            }
        }
        return nil
    }
}
```

(one part of the where clause, `Other.Index.Distance == Index.Distance`, is maybe overconstraining, but the alternative would be fiddling around with casts – non-`Int` distances aren’t exactly common, even `Bit`’s distance is an `Int`).

We talked about overload priority a long time ago, and it still pertains even with protocol extensions: the most specific function wins. Here, if your two collections are random access, this version will be called, because it is more tightly constrained than the first version. So:

```swift
// calls the version that works for any index
"hello".characters.search("ll".characters)
// calls the version optimized for random-access
[1,2,3,4].search(2...4) // returns 1
```

Now, there is a potentially nasty bug you could stumble into here, and the compiler won’t save you from it. Try removing the constraints requiring the indices be random access on the second version. It still compiles! Except, now this version will be called for strings as well. But in the case of non-random-access indices, it’s not an optimized version – it’s potentially slower. The reason being, `pat.count` and `endIndex.advancedBy` aren’t constant time; they’re linear, because they involve incrementing the index one by one. It could be worse – at least it’s not accidentally quadratic – but other examples easily could be (imagine implementing a sort that assumed constant-time advance but didn’t get it), so it’s something to watch out for.

Protocols, Extensions and Dynamic Dispatch

Once upon a time, it was harder to make this mistake. In earlier versions of the standard library there were two ways to advance an index: a free `advance` function, which was `O(1)` for random-access indices and `O(n)` otherwise; and an `advancedBy` method on `RandomAccessIndexType` only.

As part of the great extensionizing, the free `advance` function went away, and `advancedBy` moved into `ForwardIndexType`. So now you can call it on any index type, but you’ll get different performance characteristics depending on the type you call it on.

If you look at the declaration of `ForwardIndexType.advancedBy`, you’ll see that it’s declared twice. Once inside the protocol:

```swift
public protocol ForwardIndexType : _Incrementable {
    public func advancedBy(n: Self.Distance) -> Self
}
```

and then again just below, as an extension:

```swift
extension ForwardIndexType {
    public func advancedBy(n: Self.Distance) -> Self
}
```

This seems a little weird. Why declare it twice? Why not just declare it once as an extension? It has to do with static versus dynamic dispatch, and which actual implementation is called.

To show this, let’s create our own toy index, initially without random access:

```swift
struct MyIndex {
    let i: Int
}

extension MyIndex: BidirectionalIndexType {
    func successor() -> MyIndex {
        print("successing!")
        return MyIndex(i: i+1)
    }
    func predecessor() -> MyIndex {
        print("predecessing!")
        return MyIndex(i: i-1)
    }
}

// indices need to be Equatable
func == (lhs: MyIndex, rhs: MyIndex) -> Bool {
    return lhs.i == rhs.i
}
```

I’ve put prints in there so you can tell when the functions are called. Now, this index type will have the default `advancedBy` implementation, which just calls `successor()` multiple times. And because of the `print`, you can see it happening:

```swift
func castHolyHandGrenadeOfAntioch<
  I: ForwardIndexType
>(idx: I) {
    idx.advancedBy(3)
}

let idx = MyIndex(i: 1)
castHolyHandGrenadeOfAntioch(idx)
// prints “successing!” 3 times
```

OK, now, we conform the type to be random access by giving it advance and distance functions:

```swift
extension MyIndex: RandomAccessIndexType {
    func advancedBy(n: Int) -> MyIndex {
        return MyIndex(i: i+n)
    }
    func distanceTo(end: MyIndex) -> Int {
        return end.i - i
    }
}
```
```

and now, when we call the `castHolyHandGrenadeOfAntioch`, we see nothing printed. Instead, even though the function only required a forward index, it uses the better `advancedBy` implementation of the random-access index.

But… this only works because `advancedBy` had been declared in the original `ForwardIndexType` protocol declaration. If, on the other hand, we added another method as an extension to different index types – say, a `retreatedBy` function – this would not happen.

First, let’s do that for `BidirectionalIndexType`:

```swift
extension BidirectionalIndexType {
    // just an example, you don't have to do this!
    // since BidirectionalIndexType also supports
    // advancedBy with negative distances
    func retreatedBy(n: Distance) -> Self {
        var i = n
        var result = self
        // since Distance isn’t strideable, this
        // is a rare case where you might mourn the
        // loss of C-style for and the ++ operator…
        while i > 0 {
            result = result.predecessor()
            i -= 1
        }
        return result
    }
}
```
```

and then, an optimized version for `RandomAccessIndexType`:

```swift
extension RandomAccessIndexType {
    func retreatedBy(n: Distance) -> Self {
        return advancedBy(-n)
    }
}
```
```

But now, if we repeat our earlier experiment, calling this function instead, we get a different result:

```swift
// only requires bidirectional access
func runAway<I: BidirectionalIndexType>(idx: I) {
    idx.retreatedBy(10)
}

// and even though MyIndex is random access
let braveSirRobin = MyIndex(i: 10)

// the bidirectional version is called...
runAway(braveSirRobin)
// prints “predecessing!” 10 times
```

So we can see the difference between default implementations that are also declared as part of the protocol, versus ones just created as extensions. With the former, generic functions get to take advantage of specialized implementations; with the latter, only the implementation that matches the generic constraint will be called. This is why you often see functions declared twice in the standard library. For example, on `SequenceType`, `underestimateCount` is declared like this, so that if you pass a collection into a function that takes a sequence, it can check its length if it’s a collection, without risking consuming the sequence.
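An analogous split exists in Java, which may make it more familiar: calls to overridable instance methods dispatch on the runtime type (like a requirement declared in the protocol), while overload resolution between static methods happens at compile time against the static type (like extension-only methods). A toy sketch (all names hypothetical):

```java
public class Dispatch {
    public interface Forward {
        // declared in the interface: dispatches on the runtime type,
        // like a protocol requirement with a default implementation
        default String advanced() { return "slow generic advance"; }
    }
    public interface RandomAccess extends Forward {
        @Override
        default String advanced() { return "fast random-access advance"; }
    }
    public static class MyIndex implements RandomAccess {}

    // "extension-only" analogue: overloaded statics picked at compile time
    public static String retreated(Forward f)      { return "slow generic retreat"; }
    public static String retreated(RandomAccess r) { return "fast random-access retreat"; }

    // generic functions constrained only to Forward
    public static <F extends Forward> String useAdvance(F idx) { return idx.advanced(); }
    public static <F extends Forward> String useRetreat(F idx) { return retreated(idx); }

    public static void main(String[] args) {
        MyIndex idx = new MyIndex();
        // fast: dynamic dispatch sees the RandomAccess override
        System.out.println(useAdvance(idx));
        // slow: the overload is chosen from the static bound, Forward
        System.out.println(useRetreat(idx));
    }
}
```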

1. That’s assuming that there isn’t a legitimate reason to vary elements in subsequences which I don’t… think… there is?

Collection Indices, Slices, and Generics

I missed posting about the last couple of releases of Swift before it was open-sourced, but now that I have some time to write a couple of posts over the holidays, here’s one about this sentence in the release notes for 2.0 beta 5:

For consistency and better composition of generic code, `ArraySlice` indices are no longer always zero-based but map directly onto the indices of the collection they are slicing and maintain that mapping even after mutations.

Why is this interesting enough to write a whole post over? It has to do with this bit of code that you see written from time to time:

```// iterate over the indices in a collection:
for idx in 0..<someCollection.count {
    // use someCollection[idx] in some way
    // that can't be done via for x in someCollection
}
```

What’s wrong with that? Well, if `someCollection` is an `Array`, nothing. Hmm, maybe readability. All collections have an `indices` property that does the same thing, so instead of writing:

```for idx in 0..<someCollection.count
```

you can write:

```for idx in someCollection.indices
```

(if you find yourself wanting all but the last index, or all but the first, remember `Range` is a collection, so you can write things like `someCollection.indices.dropLast()`)

Anyway, readability aside, the problems come when you try and make your loop generic. For example, take everyone’s favorite way to start an algorithms text book, a binary search. Here’s a version for arrays:

```extension Array
    // this seems to be the indentation style
    // found in the open-sourced Swift code...
    where
    Generator.Element: Comparable {
    /// Returns an index of an element in `self` which
    /// is equal to `element`, or `nil` if no such value
    /// is found.
    ///
    /// - Requires: elements of `self` are ordered by <
    ///
    /// - Complexity: O(log(`self.count`)).
    public func binarySearch(
        element: Generator.Element
    ) -> Int? {
        var left = 0
        var right = count - 1

        while left <= right {
            let mid = (left + right) / 2
            let candidate = self[mid]

            if candidate < element {
                left = mid + 1
            }
            else if element < candidate {
                right = mid - 1
            }
            else {
                return mid
            }
        }
        return nil
    }
}
```

(you'd probably want to implement a version that takes an `isOrderedBefore` predicate, then implement this one in terms of that one, but let's ignore that for now)

Binary searches are notoriously hard to get right, and this one contains at least one bug – that I've left in deliberately to talk about later – as well as the ones I added accidentally that you can let me know about.

A first trainer-wheels attempt to make this generic might be to switch the extension to be of `CollectionType`, and constrain the index to be of type `Int`:

```extension CollectionType
    where
    Index == Int,
    Generator.Element: Comparable {
    // same as before
}
```

This seems like it's working at first. You can now use it on `ContiguousArray`, `UnsafeBufferPointer` and, up until it changed, `ArraySlice`. It won't work with random-access-indexed collections that don't use integer indices, but there aren't many of those around. But it makes a big assumption: that the collection is zero-based. As I mentioned in a previous changes post when collections all became sliceable, this became a very shaky assumption, and now it doesn't even apply to array slices. It's also not something the type system will help you find – you'd discover it at runtime when your out-of-bounds error traps.

You could fix it while keeping the constraint to integer indices, but at this point you may as well rip the band-aid off and just make it fully generic. This also helps you spot all the places where you need to fix your zero-based assumption. It starts easy enough…

```extension CollectionType
    where
    Index: RandomAccessIndexType,
    Generator.Element: Comparable {
    public func binarySearch(
        element: Generator.Element
    ) -> Index? {
        // instead of 0 for the start:
        var left: Index = startIndex
        // and instead of count - 1 for the end:
        var right: Index = endIndex.predecessor()
```

but then we hit trouble:

```// error: binary operator '+' cannot be
// applied to two 'Self.Index' operands
let mid = (left + right) / 2
```

You can't just add two indices together. Think about non-contiguous storage, such as a node-based container – a linked list or a tree. You can't just add together two node pointers. Or, for a random-access example, what about `ReverseRandomAccessCollection`, which is a wrapper around a random-access collection and has an opaque index type? How could you add two of those together?
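You can see this with `reverse()` in Swift 2, which returns exactly such a wrapper:

```let r = [10, 20, 30, 40].reverse() // ReverseRandomAccessCollection
let i = r.startIndex               // an opaque reverse index
// let sum = i + i                 // error: no '+' for two indices
let j = i.advancedBy(2)            // but moving by a distance is fine
// r[j] is 20 – the third element of the reversed view
```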

Really there are two concepts here – indices, and distances between indices. You can always count, as an integer, the number of times you need to call `successor()` on an index to get to a later index. This is the distance between them. You can do mathematical operations on distances – you can go twice as far from an index, or advance as far from one index as another is from the start. All indices have this ability, but bi-directional indices can have negative distances (the number of times you'd need to call `predecessor()`), and random-access indices can move by a distance in constant time.
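For example, using a string's character view, whose index is bidirectional but not an integer:

```let chars = "hello".characters
let start = chars.startIndex
// the distance is the number of successor() calls
// needed to get from start to endIndex:
let n = start.distanceTo(chars.endIndex)  // 5
// you can do math on distances, then turn one
// back into an index by advancing:
let middle = start.advancedBy(n / 2)
// chars[middle] is "l"
```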

When we had an integer index on a zero-based collection, a handy trick we were relying on is that any index is also the distance of that index from the start. So when we wrote `let mid = (left + right) / 2`, really this was saying that the mid-point was the average of the two indices' distances from the start. But now we want our algorithm to work not just on zero-based integer indices but on any kind of random-access index (random-access because if you can't move around in constant time, you may as well do a linear search).

You could rewrite that generically as:

```let mid = startIndex.advancedBy(
(  startIndex.distanceTo(left)
+ startIndex.distanceTo(right)
) / 2)
```

but this would be a bit silly. Instead, you can just take the distance from the left to the right, halve that, and add it to the start. We can add this ability to `Range`, to give us a mid-point property (and which also fixes a bug, if you followed the earlier link):

```extension Range {
    var mid: Element {
        return startIndex.advancedBy(
            startIndex.distanceTo(endIndex) / 2
        )
    }
}
```

which we can use in a fully generic binary search:

```extension CollectionType
    where
    Index: RandomAccessIndexType {
    public func binarySearch(
        element: Generator.Element,
        isOrderedBefore: (Generator.Element, Generator.Element) -> Bool
    ) -> Index? {
        var searchRegion = self.indices

        while !searchRegion.isEmpty {
            let mid = searchRegion.mid
            let candidate = self[mid]

            if isOrderedBefore(candidate, element) {
                searchRegion.startIndex = mid + 1
            }
            else if isOrderedBefore(element, candidate) {
                searchRegion.endIndex = mid
            }
            else {
                return mid
            }
        }
        return nil
    }
}

extension CollectionType
    where Index: RandomAccessIndexType,
    Generator.Element: Comparable
{
    public func binarySearch(
        element: Generator.Element
    ) -> Index? {
        return binarySearch(element, isOrderedBefore: <)
    }
}
```

Note, we can still write `searchRegion.startIndex = mid + 1` because `RandomAccessIndexType` conforms to `Strideable`, which has this definition of `+`

```public func +<T : Strideable>(lhs: T, rhs: T.Stride) -> T
```

that allows you to add a distance (which is an index's `Stride`) to an index. There's a matching definition with the lhs and rhs flipped, so you can write it the other way around.

This version makes a couple of other changes. It combines the left and right indices into a `Range`, `searchRegion`. I think this helps readability a bit – `!searchRegion.isEmpty` is a bit nicer than `left <= right` – and also allows us to use our `mid` extension. But more importantly, this rewrite ditches the other trick we were relying on when the index was an integer: that you can go one past the end or ahead of the start without any consequences. When your index is an integer, who cares if you decrement it from `0` to `-1`? But some indices don't take kindly to going past their limit (try calling `"hello".endIndex.successor()`). So to make the algorithm properly generic, it needed a rewrite to avoid this being a possibility.

Relatedly, instead of writing:

```a[a.startIndex...idx]
a[a.startIndex..<idx]
a[idx..<a.endIndex]
```

you can now write:

```a.prefixThrough(idx)
a.prefixUpTo(idx)
a.suffixFrom(idx)
```

It's kinda fun, though probably too cute for production code, to write Python-style one-sided subscript operations for these as well:

```postfix operator ..< { }
prefix operator ..< { }
prefix operator ... { }

struct RangeStart<I: ForwardIndexType> { let start: I }
struct RangeToEnd<I: ForwardIndexType> { let end: I }
struct RangeThroughEnd<I: ForwardIndexType> { let end: I }

postfix func ..< <I: ForwardIndexType>(lhs: I) -> RangeStart<I>
{ return RangeStart(start: lhs) }

prefix func ..< <I: ForwardIndexType>(rhs: I) -> RangeToEnd<I>
{ return RangeToEnd(end: rhs) }

prefix func ... <I: ForwardIndexType>(rhs: I) -> RangeThroughEnd<I>
{ return RangeThroughEnd(end: rhs) }

extension CollectionType {
    subscript(r: RangeStart<Self.Index>) -> SubSequence {
        return self.suffixFrom(r.start)
    }
    subscript(r: RangeToEnd<Self.Index>) -> SubSequence {
        return self.prefixUpTo(r.end)
    }
    subscript(r: RangeThroughEnd<Self.Index>) -> SubSequence {
        return self.prefixThrough(r.end)
    }
}
```

which then means you can write:

```let a = [1,2,3,4]
a[1..<] // [2,3,4]
a[..<2] // [1,2]
a[...2] // [1,2,3]
```

You could even define pattern-matching versions, for use in a `switch` statement:

```// the `~=` operator is already declared by the standard
// library (it's what `switch` uses for pattern matching),
// so we only need to supply overloads for our range types:

func ~=<I: ForwardIndexType where I: Comparable>(pattern: RangeStart<I>, value: I) -> Bool {
    return value >= pattern.start
}

func ~=<I: ForwardIndexType where I: Comparable>(pattern: RangeToEnd<I>, value: I) -> Bool {
    return value < pattern.end
}

func ~=<I: ForwardIndexType where I: Comparable>(pattern: RangeThroughEnd<I>, value: I) -> Bool {
    return value <= pattern.end
}
```

so you can write:

```for i in [9, 10, 11] {
switch i {
case ..<10: print("below 10")
case ...10: print("10 or below")
case 2..<: print("2 or more")
default: fatalError()
}
}
```

If you're not familiar with `~=`, check out Ole's series of posts about pattern matching, it's pretty powerful.

Patient: Doctor doctor, it hurts when I do this.
Doctor: Well, don’t do that.

Your reminder that building arrays with reduce, while fun, is accidentally quadratic.

I was surprised at how surprising some found this. Quite a few people suggested the `reduce` version could be changed to not do the array copy (I don’t think it can). Some suggested maybe `+` could be optimized so it doesn’t perform a copy (I don’t think that it could easily, as we’ll see shortly).1

In other feedback, a few commented on the previous post about linked lists. Why implement an outdated data structure? What’s the point when we have arrays?

So, you know how sometimes I mention this isn’t a blog about Mac and iOS programming? It’s not a blog about Mac and iOS programming! Don’t put an enum-based linked list into your app just because I happen to find it interesting. I’ll probably find your ensuing performance problems interesting too. You won’t. That said, I think the linked list example is very interesting, and worth implementing and playing with, and might help shed some light on the Array `reduce` performance. And it might even be useful in real code in certain (infrequent) circumstances.2

So, to recap – sometimes, you’ll see `reduce` used to build an array (or dictionary or set), for example, in this implementation of `map`:

```extension SequenceType {
    func mapUsingReduce<T>(transform: Generator.Element -> T) -> [T] {
        return reduce([]) { $0 + [transform($1)] }
    }
}
```

as opposed to creating a mutable array then adding to it from a `for` loop:

```extension SequenceType {
    func mapUsingFor<T>(transform: Generator.Element -> T) -> [T] {
        var result: [T] = []
        for x in self { result.append(transform(x)) }
        return result
    }
}
```

The difference being, `+` creates a copy of the accumulating array every time. And copying the array takes linear time, inside a loop over the full array, so the overall time taken increases quadratically with the length of the array being mapped:

Of course, people aren’t normally going around re-implementing `map` though: you more often see this technique with, say, filtering duplicates or building dictionaries of word frequencies. But the problem remains the same.
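For instance, a word-frequency count written both ways – a sketch, with the quadratic copy hiding in the accumulator:

```let words = ["a", "b", "a", "c", "b", "a"]

// quadratic-flavored: the accumulated dictionary is
// copied on every iteration in order to mutate it
let slow = words.reduce([String: Int]()) { acc, w -> [String: Int] in
    var copy = acc
    copy[w] = (copy[w] ?? 0) + 1
    return copy
}

// linear: one dictionary, mutated in place
var fast: [String: Int] = [:]
for w in words { fast[w] = (fast[w] ?? 0) + 1 }

// both produce ["a": 3, "b": 2, "c": 1]
```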

Why is this relevant to a list? Well, because you could implement a version of `map` using `reduce` on the list code from last time, like so:

```extension SequenceType {
    func mapToList<T>(transform: Generator.Element -> T) -> List<T> {
        return reduce(List()) { $0.cons(transform($1)) }.reverse()
    }
}
```

The performance results you get are so perfectly half the array performance (because of the `reverse` step) that your teacher may accuse you of faking the results instead of doing the experiment:

This works because the list is persistent – it shares nodes between previous lists and newly consed lists, forever. So no copying needed. But this comes at the cost of only being able to grow from the head (hence the need for a `reverse`), and the list has to be fully immutable, so you have to make a copy to modify it even when it’s uniquely referenced. This is unlike `Array`, which can detect unique use of its buffer and just change it in-place, no copying required. Lists have other costs as well – to sum a list of numbers takes twice as long as to sum an array of numbers, as the indirection needed to traverse the list takes time.

So is the full copy on `+` with arrays fixable? To think about that, let’s first look at how a copy-on-write array might work. Mike Ash already has a great blog post on implementing a copy-on-write Array, so let’s do something a little different, which is to use the `ManagedBuffer` class from the standard library to build it.

ManagedBuffer

`ManagedBuffer` is a class you can inherit from, which simplifies the process of allocating/deallocating and managing storage on the heap. It is generic, and has two separate placeholders, `Value` and `Element`. `Element` is the type of the block of storage of n elements, allocated dynamically on creation. `Value` is the type of an extra single variable on the side for storing other information – for example, to implement an array, you would need to store the element count, as the elements need to be destroyed before the memory is deallocated. Access to the elements is via `withUnsafeMutablePointerToElements`, whereas the value can be accessed either through a similar unsafe method, or directly via a `.value` property.

Here’s a very simple self-destroying `ArrayBuffer`:

```private class MyArrayBuffer<Element>: ManagedBuffer<Int,Element> {
    deinit {
        self.withUnsafeMutablePointerToElements { elems -> Void in
            elems.destroy(self.value)
        }
    }
}
```

So, `MyArrayBuffer` is still generic on what elements it stores, but it fixes the `Value` of `ManagedBuffer` to just be an `Int`, which will store the number of elements in the buffer (bear in mind, we will allocate more storage than we have elements in the array, to avoid constantly reallocating).

When the buffer deinitializes, `MyArrayBuffer.deinit` will be called prior to `ManagedBuffer.deinit`, which deallocates the memory. This gives `MyArrayBuffer` a chance to destroy all its objects. Destroying is necessary if `Element` is something more than just a passive struct – for example, if the array contained other copy-on-write types, destroying them will trigger them freeing their memory if necessary.

Now, we can create an array type of a struct, with a private buffer as its storage:

```public struct MyArray<Element> {
    private var _buf: MyArrayBuffer<Element>

    public init() {
        _buf = MyArrayBuffer<Element>.create(8) { _ in 0 } as! MyArrayBuffer<Element>
    }
}
```

We don’t use `MyArrayBuffer`’s `init` directly – instead we use the class method from `ManagedBuffer`. Because this method returns the superclass, we force-downcast it to the right type.

Then, we turn `MyArray` into a collection type:

```extension MyArray: CollectionType {
    public var startIndex: Int { return 0 }
    public var endIndex: Int { return _buf.value }

    public subscript(idx: Int) -> Element {
        guard idx < self.endIndex else { fatalError("Array index out of range") }
        return _buf.withUnsafeMutablePointerToElements { $0[idx] }
    }
}
```

Next, we need two fairly similar methods on the buffer, one to clone the storage and one to resize the storage. Cloning will be used when shared storage is detected, resizing when non-shared storage needs to get bigger:

```extension MyArrayBuffer {
    func clone() -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems -> MyArrayBuffer<Element> in
            return MyArrayBuffer<Element>.create(self.allocatedElementCount) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems -> Void in
                    newElems.initializeFrom(oldElems, count: self.value)
                }
                return self.value
            } as! MyArrayBuffer<Element>
        }
    }

    func resize(newSize: Int) -> MyArrayBuffer<Element> {
        return self.withUnsafeMutablePointerToElements { oldElems -> MyArrayBuffer<Element> in
            let elementCount = self.value
            return MyArrayBuffer<Element>.create(newSize) { newBuf in
                newBuf.withUnsafeMutablePointerToElements { newElems -> Void in
                    newElems.moveInitializeFrom(oldElems, count: elementCount)
                }
                self.value = 0
                return elementCount
            } as! MyArrayBuffer<Element>
        }
    }
}
```

Creating and populating the buffers in one shot is a little finicky – first we need to get the unsafe pointer to the existing elements, then call `create`, which takes a closure that receives the partially-created object (i.e. allocated but not initialized memory), which we then need to call `newBuf.withUnsafeMutablePointerToElements` on to copy the memory from the old buffer to the new.

The main difference between the two is `clone` doesn’t change the elements in the old buffer, just loads new copies into a new buffer. `resize` moves the elements from the old to the new storage (via `UnsafeMutablePointer`’s `moveInitializeFrom` method), then updates the old buffer to tell it that it no longer has any elements to manage – otherwise, it would try to destroy them during its `deinit`.

Finally, we give `MyArray` an `append` and `extend` method:

```extension MyArray {
    public mutating func append(x: Element) {
        if !isUniquelyReferencedNonObjC(&_buf) {
            _buf = _buf.clone()
        }

        if _buf.allocatedElementCount == count {
            _buf = _buf.resize(count * 2)
        }

        _buf.withUnsafeMutablePointers { (val, elems) -> Void in
            (elems + val.memory++).initialize(x)
        }
    }

    public mutating func extend<S: SequenceType where S.Generator.Element == Element>(seq: S) {
        for x in seq { self.append(x) }
    }
}
```

This is just sample code. In practice, you would break out the uniqueness and resizing code, so you could re-use it in subscript set or other mutating methods, but I’ve crammed it all in the `append` method to keep it brief. Also you’d want to reserve enough space for the `extend` up-front if possible, and avoid double-copying the buffer when it’s both shared and too small. But none of these things have a major impact on the bigger picture for our purposes.
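For instance, the uniqueness and capacity checks might be pulled out into a single helper like this (a sketch – the name and shape are hypothetical, and it still double-copies a buffer that is both shared and too small, as noted above):

```extension MyArray {
    // ensure the buffer is unshared and has room for at
    // least `minimum` elements before any mutation
    private mutating func ensureUniqueBufferWithCapacity(minimum: Int) {
        if !isUniquelyReferencedNonObjC(&_buf) {
            _buf = _buf.clone()
        }
        if _buf.allocatedElementCount < minimum {
            _buf = _buf.resize(max(minimum, _buf.allocatedElementCount * 2))
        }
    }
}
```

`append`, a subscript setter, and any other mutating method could then all start with a call to this helper.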

OK now for the operators. First, `+=`, which being an assignment operator takes an `inout` left-hand side and just extends it with the right-hand side:

```func +=<Element, S: SequenceType where S.Generator.Element == Element>
(inout lhs: MyArray<Element>, rhs: S) {
lhs.extend(rhs)
}
```

And finally, `+`. We can implement this in terms of `+=`. The operator takes two immutable arrays, and adds them together to produce a new array. It does this by relying on the copy-on-write behaviour to create a mutable copy of the left-hand side, then extend it with the right-hand side:

```func +<Element, S: SequenceType where S.Generator.Element == Element>
(lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
var result = lhs
result += rhs
return result
}
```

In fact you could shorten this further by using the `var` modifier in front of the `lhs` argument:

```func +<Element, S: SequenceType where S.Generator.Element == Element>
(var lhs: MyArray<Element>, rhs: S) -> MyArray<Element> {
lhs += rhs
return lhs
}
```

I mention this second version because some suggested a better `reduce` solution might involve using `var` on the accumulating argument. But this would be similar to what is happening here with `lhs`: all `var` does is declare your passed-by-value variable to be mutable. It is still a copy – it is not the original variable somehow passed through by reference.

Can + be optimized?

We now have a fully working toy implementation of a copy-on-write array you can append to, and which has a `+` operator. Which means we can rewrite our `reduce` version of `map` with it:

```func mapUsingMyReduce<T>(transform: Generator.Element -> T) -> MyArray<T> {
    return reduce([]) { $0 + [transform($1)] }
}

func mapUsingMyFor<T>(transform: Generator.Element -> T) -> MyArray<T> {
    var result = MyArray<T>()
    for x in self { result.append(transform(x)) }
    return result
}
```

and if you chart the performance, you’ll see both exhibit similar behaviour to the `Array` versions.

So, given we now have an implementation we have complete control over, can we change `+` so it doesn’t make a copy? I don’t think so.

In a simpler case, could we change this:

```var a = MyArray<Int>()
a.extend(0..<3)
let b = a + [6,7,8]
```

so that it didn’t make a copy? It seems pretty obvious we can’t. `b` has to be a new copy of the array, in order to not affect `a`. Even if we don’t make any further changes to `a` after the creation of `b`, there’s no way the implementation of `+` could know this. Maybe the compiler could know this, and optimize accordingly, but the `+` func can’t.

Checking for unique references wouldn’t help here. `a` is still in existence, so the `lhs` variable will not be the only owner of the buffer.

`reduce` is no different. Here’s a possible implementation:

```extension SequenceType {
    func myReduce<T>(initial: T, combine: (T, Generator.Element) -> T) -> T {
        var result = initial
        for x in self {
            result = combine(result, x)
        }
        return result
    }
}
```

Assuming `combine` here is `{ $0 + [transform($1)] }`, you can see that `+` similarly has no knowledge of the fact that we’re actually going to assign the outcome directly to the `result` variable. We know, on inspecting the code, that it would be fine to add the right-hand side onto the left-hand side in place, if that were even possible (in theory it is, since even though the array is passed immutably by value, the buffer is a class and so could be mutated, as it has reference semantics). But `+` can’t know that from where it sits. It definitely knows its copy of the left-hand side isn’t the only owner of the buffer. There is another owner too: `reduce` holds a copy of `result` – and is about to throw it away and replace it with a new result, but only after `+` has run.

One ray of hope is if arrays were also their own slices (which they aren’t – there is `ArraySlice` instead, which has extra overhead to track the start and end slice into the parent array). If they were, then perhaps they could be modified to allow one, but only one, array to have an append happen to it which the others could ignore. But this would probably add overhead to arrays in general, and the whole point of arrays is to be fast – you don’t want to slow them down just to cater to this use case.

Perhaps there is a very clever way of figuring all this out, with or without the compiler’s help. But such gymnastics don’t seem like a good idea even then. The semantics are that `+` creates a new array. Wanting it to secretly modify an existing one under very specific circumstances doesn’t seem like the right solution – mutating the array is. If you prefer, you can wrap that `var` up in a nice little generic method and then pretend it’s not there. But it’ll make your code faster.

1. Others suggested you shouldn’t care about this sort of thing until a profiler tells you that you need to (I think you definitely should care while you write your code – saying “I’ll only care when the profiler tells me there’s a problem” feels a bit like “I’ll only write correct code when the unit tests tell me it isn’t correct”).
2. I also think the addition of features like `enum`, as well as the flexibility of choosing between object or function solution, and the “safe until you ask not to be” would make Swift a really great CS teaching language. Perhaps a future edition of this book could be in Swift.

Linked lists, enums, value types and identity

UPDATE: now conforms to Swift 2.1b2

Following on from the previous post, here’s some more on building collections from enums.

For our book on Swift, we wrote a simple linked list example to use in a section on conforming to `CollectionType`. Prior to `indirect`, it was written using a box, and I wanted to update it. But I hit a snag, of which more later. First, a bit about lists and value types.

Here is a super-simple linked list:

```enum List<Element> {
    case End
    indirect case Node(Element, next: List<Element>)
}
```

You prepend another element to the list by creating a new node, with the next value that of the current list. To make that a little easier, we can create a method for it:

```extension List {
    func cons(x: Element) -> List {
        return .Node(x, next: self)
    }
}

// a 3-element list: 3, 2, 1
let l = List<Int>.End.cons(1).cons(2).cons(3)
```

Just like the tree from last time, this list has an interesting property – it is “persistent”. The nodes are immutable – you cannot change them once created. Consing another element onto the list doesn’t copy the list, just gives you a new node that links on to the front of the existing list (again, the first couple of chapters of the Purely Functional Data Structures book are a really good primer for this stuff).

This means two lists can share a tail. To illustrate this, for the first time on this blog (duh duh duhhn…), a diagram:

The immutability of the list is key here. If you could change the list (say, remove the last entry, or update the element held in a node), then this sharing would be a problem – `a` might change the list, and the change would affect `b`.
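In code, the sharing looks like this:

```let tail = List<Int>.End.cons(1).cons(2) // 2 → 1
let a = tail.cons(3)                     // 3 → 2 → 1
let b = tail.cons(4)                     // 4 → 2 → 1
// a and b each prepended a single new node;
// the nodes holding 2 and 1 are shared, not copied
```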

Values and Variables

This list is a stack, with consing as a push, and unwrapping the next element a pop. Suppose we added two methods to simplify these operations:

```extension List {
    mutating func push(x: Element) {
        self = self.cons(x)
    }

    mutating func pop() -> Element? {
        switch self {
        case .End: return nil
        case let .Node(x, next: xs):
            self = xs
            return x
        }
    }
}
```

“What?! Mutating methods? I thought you said the list had to be immutable.”

Yes, I did. And it still is:

```var stack = List<Int>.End.cons(1).cons(2).cons(3)
var a = stack
var b = stack

a.pop() // 3
a.pop() // 2
a.pop() // 1

stack.pop() // 3
stack.push(4)

b.pop() // 3
b.pop() // 2
b.pop() // 1

stack.pop() // 4
stack.pop() // 2
stack.pop() // 1
```

What this shows us is the difference between values and variables. The nodes of the list are values. They cannot change. A node of `3` and a reference to the next node cannot become some other value. It will be that value forever, just like the number `3` cannot change. It just is. Just because these values in question are structures with pointers to each other doesn’t make them less value-y.

A variable `a`, on the other hand, can change the value it holds. It can be set to hold a value of an indirect reference to any of the nodes, or to the value `End`. But changing `a` doesn’t change these nodes – just which node `a` refers to.

This is what these mutating methods on structs do – they take an implicit `inout` argument of `self`, and they can change the value self holds.1 This doesn’t change the list, just which part of the list the variable currently represents.
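A mutating method is roughly sugar for a function that takes its `self` as an `inout` argument – `push` could equally be written as a free function:

```// equivalent to the mutating push method: the inout
// parameter rebinds the caller's variable to a new
// value; the existing nodes are untouched
func push<Element>(inout list: List<Element>, _ x: Element) {
    list = list.cons(x)
}
```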

In this sense, the variables are iterators into the list:

You can of course declare your variables with `let` instead of `var`, in which case the variables will be constant i.e. you can’t change the value they hold once set. But `let` is about the variables, not the values. Values are constant by definition.

Now, this is all just a logical model of how things work. In reality, the nodes are actually places in memory that point to each other. And they take up space, which we want back if it’s no longer needed. So Swift uses ARC to manage this, and frees them up:

Conforming to SequenceType

OK, so if `List` variables act like iterators, we should be able to use them to iterate, right? For example, we can conform `List` to `SequenceType`:

```extension List: SequenceType {
    func generate() -> AnyGenerator<Element> {
        // declare a variable for the closure to capture,
        // which tracks progression through the list:
        var current = self
        return anyGenerator {
            // next() will pop, returning nil
            // when the list is empty:
            current.pop()
        }
    }
}
```

And, to make things easier, to `ArrayLiteralConvertible`:

```extension List: ArrayLiteralConvertible {
    init(arrayLiteral elements: Element...) {
        self = elements.reverse().reduce(.End) {
            $0.cons($1)
        }
    }
}
```

So now you can use lists with `for...in`:

```let l: List = ["1","2","3"]
for x in l { print(x) }
```

It also means, through the power of protocol extensions, we can use `List` with dozens of standard library functions:

```l.joinWithSeparator(",")        // "1,2,3"
l.contains("2")                 // true
l.flatMap { Int($0) }           // [1,2,3]
l.elementsEqual(["1","2","3"])  // true
```

Boxes and Identity

But not so many as if we conformed to `CollectionType`. `CollectionType` brings us properties like `.first`, which would be nice to use to peek at the first element without popping it.

More importantly, conforming to `CollectionType` gives a guarantee that multiple passes over the sequence are OK. As the docs for `SequenceType` say:

SequenceType makes no requirement on conforming types regarding whether they will be destructively “consumed” by iteration. To ensure non-destructive iteration, constrain your sequence to CollectionType.

Our list definitely fits this description – you can iterate it as much as you like. So we’d like it to be a collection.

Since variables holding nodes can be treated as iterators, it seems reasonable to use them as a `ForwardIndexType`.

But first, let’s separate out the nodes from the list and index types. It is tempting to just conform the enum to `CollectionType` and `ForwardIndexType` directly, but this will lead to problems. For example, those two protocols need very different implementations of `==`. The index needs to know whether two indices from the same list are at the same position; it should not need the elements to conform to `Equatable`. The collection, on the other hand, should be able to compare two different lists to see if they hold the same elements, so it will need the elements to conform to `Equatable`.

So let’s rename the enum to `ListNode`, making it a private implementation detail, and add public `List` and `ListIndex` types that use it:

```private enum ListNode<Element> {
    case End
    indirect case Node(Element, next: ListNode<Element>)

    func cons(x: Element) -> ListNode<Element> {
        return .Node(x, next: self)
    }
}

public struct ListIndex<Element> {
    private let node: ListNode<Element>
}

public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index

    // since this is a constant, no need to store it in a variable,
    // so use a computed property instead to save space
    public var endIndex: Index { return Index(node: .End) }

    public init() {
        startIndex = Index(node: .End)
    }
}

extension List {
    public mutating func push(x: Element) {
        startIndex = Index(node: startIndex.node.cons(x))
    }

    public mutating func pop() -> Element? {
        guard case let .Node(x, next: xs) = startIndex.node
            else { return nil }

        startIndex = Index(node: xs)
        return x
    }
}

extension List: ArrayLiteralConvertible {
    public init(arrayLiteral elements: Element...) {
        self = List()
        for x in elements.reverse() { push(x) }
    }
}
```

OK so now to conform `List` to `CollectionType`. The first step is to conform `ListIndex` to `ForwardIndexType`. Simple enough, `successor()` just needs to return the next node:

```extension ListIndex: ForwardIndexType {
    public func successor() -> ListIndex<Element> {
        guard case let .Node(_, next: next) = node
            else { fatalError("cannot increment endIndex") }

        return ListIndex(node: next)
    }
}
```

But this is only half the story. `ForwardIndexType` conforms to `Equatable`, which requires you to implement `==`. And this is where we hit the buffers… how can we do that?

```public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node, rhs.node) {
    case (.End, .End): return true
    case (.Node, .End), (.End, .Node): return false
    case (.Node, .Node): // what to put here???
    }
}
```

The problem is that indirect enum cases do not have identity. Once upon a time, when this list code was implemented with `Box`, the boxes had identity, which you could use to solve this problem:

```// Box, being a class, will have a === operator
final class Box<Element> {
    let unbox: Element
    init(_ x: Element) { unbox = x }
}

enum ListNode<Element> {
    case End
    // a node will be held inside a box, so have identity
    case Node(Box<(Element, next: ListNode<Element>)>)

    func cons(x: Element) -> ListNode<Element> {
        return .Node(Box(x, next: self))
    }
}
```

Now, no-one is going to mourn the loss of `Box`, since it got all up in your pattern matching and generally dirtied up the joint. But it did mean you could write `==` like this:

```func == <T>(lhs: List<T>, rhs: List<T>) -> Bool {
    switch (lhs, rhs) {
    case (.End, .End):                 return true
    case (.End, .Node), (.Node, .End): return false
    case let (.Node(l), .Node(r)):     return l === r
    }
}
```

This all fitted perfectly well with the notion of `List` being a value type. `Box` is not a value type – being a class, it is a reference type. But it is immutable – it only has one property, declared with `let`. So it has value semantics.

This is a subtle but important distinction. Value types are copied on assignment, while reference types are shared (or rather, the thing they refer to is shared). But value types can contain reference types as properties. When value types containing reference types are copied, it is a shallow copy. The two value types will share a reference. If the reference type in question has methods that can mutate its state, the containing type needs to account for this possibility. This can get pretty hairy.

In the case of `Box`, this is fine though, because it is immutable. The sharing doesn’t matter, because once created, a box can never be changed. Yes, what is boxed could be a reference type you could then mutate, but since what we are boxing is our enum, and that is a value type, it’s fine.

(The list could contain mutable reference types. But we can ignore this – in the same way an array containing reference types doesn't meddle with the array's value semantics.)
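To make the distinction concrete, here's a little sketch (the `Counter` and `Wrapper` names are invented for illustration) of how a struct containing a *mutable* class shares state across copies – exactly the hairiness that `Box`'s immutability avoids:

```swift
// A mutable reference type
final class Counter { var count = 0 }

// A value type with a reference-type property
struct Wrapper { let counter = Counter() }

let w1 = Wrapper()
let w2 = w1             // shallow copy: both wrappers share one Counter
w1.counter.count += 1   // mutate via one copy...
w2.counter.count        // ...and the other sees it: 1
```

An immutable class like `Box` has no state that can change after initialization, so this sharing is unobservable.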

So, with our nodes being references, we essentially got a free globally-unique identifier for each one, which we could then use to implement node equality. But now that we have switched away from `Box`, we can't use `===` any more. So what can we do?

Well, we could rely on the fact that, really, `indirect` is kind of the same as the `Box` approach, just with the compiler writing the boxing/unboxing for you and hiding the details. So we could do something very unwise, and just look at the bits:

```public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node, rhs.node) {
    case (.End, .End): return true
    case (.Node, .End), (.End, .Node): return false
    case (.Node, .Node):
        // engage ಠ_ಠ now...
        return unsafeBitCast(lhs, UInt.self) == unsafeBitCast(rhs, UInt.self)
    }
}
```

This is evil, but it appears to work. When a `ListNode.Node` is copied, with `indirect` this just makes a copy of the indirect reference. Then, if the variable is updated, it changes to be a new reference and is no longer equal.

But it is almost certainly Not A Good Idea. It’s relying on an implementation detail of indirect that, if it ever changed or if the compiler did some optimization under the hood that did something different, would lead to undefined behaviour. There could well be other ways in which my behaviour assumptions are incorrect and this breaks, though I haven’t found any.

Next hacky solution: only make the ends of the index equal. Even if two nodes might be the same, return `false` regardless:

```public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    switch (lhs.node, rhs.node) {
    case (.End, .End): return true
    default:           return false
    }
}
```

Again, this mostly works, but we are still breaking the rules – in this case, the rules of `Equatable`.

`Equatable` requires that `==` be an equivalence relation. This means, amongst other things, that it must be reflexive – a thing must equal itself. And our indices don't (unless they're the end):

```l.startIndex == l.startIndex  // false... not OK!
```

Mysterious undiagnosable bugs can ensue from this. There are some places in the standard library where `==` being an equivalence relation is relied upon. For example, `Array` takes a shortcut of first checking if two arrays share the same buffer. If they do, they have to be equal, right? But implement an `Equatable` type whose `==` isn’t reflexive, and you get some very strange behaviour:

```struct NeverEqual: Equatable { }
func == (lhs: NeverEqual, rhs: NeverEqual) -> Bool {
    return false
}

var a = [NeverEqual()]
let b = a
a == b       // true
a[0] = b[0]
a == b       // false
```

…so we should probably rule this out.

Conforming to CollectionType

Finally, a third option: tag our list indices with a number, and use that to compare them. Whenever you change an index, increment/decrement the tag.

edit: I originally put this tag on the nodes, but Thorsten in the comments made a great point that it is more space-efficient to have it on the indices.

This solution is a bit more intrusive. We'll have to reconfigure `ListIndex` to accommodate the tag:

```public struct ListIndex<Element> {
    private let node: ListNode<Element>
    private let tag: Int
}
```

…and any places where we generate new indices:

```public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index
    public var endIndex: Index { return Index(node: .End, tag: 0) }

    public init() {
        startIndex = Index(node: .End, tag: 0)
    }
}

extension ListIndex: ForwardIndexType {
    public func successor() -> ListIndex<Element> {
        guard case let .Node(_, next: next) = node
            else { fatalError("cannot increment endIndex") }

        return ListIndex(node: next, tag: tag.predecessor())
    }
}

extension List {
    public mutating func push(x: Element) {
        startIndex = Index(node: startIndex.node.cons(x),
                           tag:  startIndex.tag.successor())
    }

    public mutating func pop() -> Element? {
        guard case let .Node(x, next: _) = startIndex.node
            else { return nil }

        ++startIndex
        return x
    }
}
```

Index equality can now be performed just by comparing the `tag`:

```public func == <T>(lhs: ListIndex<T>, rhs: ListIndex<T>) -> Bool {
    return lhs.tag == rhs.tag
}
```

Finally, we can conform `List` to `CollectionType`:

```extension List: CollectionType {
    public subscript(idx: Index) -> Element {
        guard case let .Node(x, _) = idx.node
            else { fatalError("Subscript out of range") }

        return x
    }
}
```

and it gains the collection-only features:

```l.first  // .Some("1")
if let twoIdx = l.indexOf("2") {
    l[twoIdx]  // "2"
}
```

There’s no longer any need to implement `generate` for the `SequenceType` conformance. Collections automatically gain a default implementation using `IndexingGenerator`.
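For example (this sketch relies on the `List` we've just defined), iteration and the `SequenceType` extension methods now just work:

```swift
let l: List = ["1","2","3"]

// for...in uses the default IndexingGenerator:
for x in l { print(x) }

// as do the SequenceType extension methods:
l.contains("2")   // true
Array(l)          // ["1", "2", "3"]
```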

As an added benefit of the tagging solution, since the tag is the count of nodes prepended to `.End`, we can implement a constant-time `count`, even though the lists are forward-index only which would normally mean linear-time counting:

```extension List {
    public var count: Int {
        return startIndex.tag
    }
}
```

It is true that two `ListIndex` types from two totally different lists with different values can now be equated and be equal to each other. But then, this is true of array’s indices too (cos, you know, they’re just integers).

We can also implement `==` for `List` the way users expect `==` to behave for value-semantic collections, by comparing the elements:

```func == <T: Equatable>(lhs: List<T>, rhs: List<T>) -> Bool {
    return lhs.elementsEqual(rhs)
}
```

Efficient Slicing

Collections also get a default implementation of the slicing operation, `[Range]`, which is how `dropFirst` works:

```// equivalent of dropFirst(l)
let firstDropped = l[l.startIndex.successor()..<l.endIndex]

// firstDropped will be a Slice<List<String>>
```

But this default implementation returns a wrapper over the original collection, plus an index sub-range, so is bigger than it needs to be in `List`‘s case:

```// Size of a list is the size of its single stored index –
// a node reference plus a tag:
sizeofValue(l)             // returns 16

// Size of a list slice is size of a list, plus size of a subrange
// (a range between two indices, which in List's case wrap list nodes)
sizeofValue(dropFirst(l))  // returns 48
```

Remember, since `endIndex` was always the same value, it takes no space inside `List`, so the `sizeofValue(l)` is just the size of `List`’s sole member variable, `startIndex`.

But if we switch to holding an explicit `endIndex` property, lists could return subsequences of themselves by holding a different start and end:

```public struct List<Element> {
    public typealias Index = ListIndex<Element>
    public var startIndex: Index
    public var endIndex: Index

    public init() {
        startIndex = Index(node: .End, tag: 0)
        endIndex = startIndex
    }

    // and add the slicing initializer, which takes a range:
    private init(subRange: Range<Index>) {
        startIndex = subRange.startIndex
        endIndex = subRange.endIndex
    }

    public subscript(subRange: Range<Index>) -> List<Element> {
        return List(subRange: subRange)
    }
}

// now lists take 32 bytes (a startIndex and an endIndex):
sizeofValue(l)             // returns 32

// but list slices are also lists, so only 32 bytes
sizeofValue(dropFirst(l))  // returns 32
```

This leads to another benefit of this list implementation. With many sliceable containers, including Swift’s arrays and strings, a slice shares the storage buffer of the original collection. This has an unpleasant side-effect: slices can keep the original collection’s buffer alive in its entirety, even if the original collection falls out of scope. If you suck a 1gb file into an array or string, then slice off a tiny part, the whole 1gb buffer will stay in memory until both the collection and the slice are destroyed.
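If you need to keep a small slice around long-term without pinning the original storage, the standard workaround is to copy the slice into a fresh collection:

```swift
// suppose this came from a huge file:
let huge = Array(1...1_000_000)

// this slice shares huge's buffer, keeping all of it alive:
let slice = huge[0..<10]

// copying the slice into a new array allocates fresh, small storage,
// so the big buffer can be freed once huge and slice go out of scope:
let copy = Array(huge[0..<10])
```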

With `List`, it isn’t quite as bad. As we’ve seen, the nodes are refcounted. Which means when the slice is the only remaining copy, any elements dropped from the front will be reclaimed as soon as no-one is referencing them.

Not the back though – since the slice’s `endIndex` still has a reference to what comes after. You could optimize this even further by having slices only hold the tag for their end index if you wanted.

I hate writing conclusions

So, losing `Box` was mostly great, but it also lost us one benefit: a free identity for our nodes. This identity is still secretly there, and I think it works similarly, but we can’t use it. I have a feeling there’s a case to be made for enums to be able to use it internally, without necessarily breaking with the idea that enums are value types. But there are probably hidden gotchas to this that I haven’t thought of. Tags serve as a reasonable substitute (at the expense of an extra few bytes), but this approach wouldn’t easily adapt to more complex structures like trees, so conforming last week’s code to a collection type would probably prove tricky (though, a persistent stack like this list may help with creating the index type). If anyone wants to take a stab, let me know what you come up with!

1. `inout` arguments in Swift are interesting too. They aren’t pass-by-reference, but rather pass-by-value-and-copy-back. Which makes several things about Swift work better than, say, C#. There’s more info in the book.

Changes to the Swift standard library in 2.0 betas 2..<5

So, I was waiting for enough library changes across a few betas to justify posting something, and then all of a sudden lots of stuff changed! So let’s start with:

Flat Map

Missed from my 2.0b1 changes post (shame on me) was a third kind of `flatMap`.

1.2 brought us `flatMap` for collections – given an array, and a function that transforms elements into arrays, it produces a new array with every element transformed, and then those arrays “flattened” into a single array. For example, suppose you had a function, `extractLinks`, that took a filename and returned an array of links found in that file:

```func extractLinks(markdownFile: String) -> [NSURL]
```

If you had an array of filename strings (say from a directory listing), and you wanted an array of the links in all the files, you could use this function with `flatMap` to produce a single array (unlike `map`, which would generate an array of arrays):

```let links: [NSURL] = markdownFiles.flatMap(extractLinks)
```

There was also a `flatMap` for optionals. Given an optional, it takes a function that takes the possible value inside the optional, and applies a function that also returns an optional, and “flattens” the result of mapping an optional to another optional. For example, `Array.first` returns the first element of a collection, or `nil` if the array is empty and has no such element. `Int.init(String)` takes a string, and if it is a representation of an integer, returns that integer, or `nil` if it isn’t. To take the first element of an array, and turn it into an optional integer, you could use `flatMap` (unlike `map`, which will return an optional of an optional of an integer):

```let a = ["1","2","3"]
let i: Int? = a.first.flatMap { Int(\$0) }
```

So what’s the new third kind of `flatMap`? It’s a mixture of the two – for when you have an array, and want to transform it with a function that returns an optional, discarding any transforms that return `nil`. This is logical if you think of optionals as like collections with either zero or one element – `flatMap` would discard the empty results, and flatten the single-element collections into an array of elements.

For example, if you have an array of strings, and you want to transform it into integers, discarding any non-numeric strings:

```let strings = ["1","2","elephant","3"]
let ints: [Int] = strings.flatMap { Int(\$0) }
// ints = [1,2,3]
```

If you’ve been following this blog for a while, you’ll recognize this is a function we’ve defined before as `mapSome`.

By the way, all these examples have been pulled from the book on Swift that Chris Eidhof and I have been writing. The chapter on optionals has just been added to the preview.

First-class Initializers

In beta 2 came the ability to use initializers, partially-applied methods, and enum cases, as first-class functions. This gives you the ability to pass them directly into higher-order functions without needing a closure expression:

```// equivalent to [1,2,3].map { Optional.Some(\$0) }
let optionals = [1,2,3].map(Optional.Some)

// equivalent to [1,2,3].map { String(\$0) }
let strings = [1,2,3].map(String.init)

// equivalent to [1,2,3].map { 1.advancedBy(\$0) }
```

Given this, you might wonder why I wrote `strings.flatMap { Int(\$0) }` in the earlier section, instead of `strings.flatMap(Int.init)`. But this new ability is constrained by the fact that overload resolution works slightly differently for function invocation versus function assignment. If you try the latter version, you will get an error because there is no method for `Int.init` that takes just a single string as an argument. When you call `Int("1")`, you are really calling `Int("1", radix: 10)`, with the compiler filling out the default argument for you. But it won’t do that when passing `Int.init` in to `map`.
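Here’s a sketch of that behaviour – the commented-out line is the one that fails to compile:

```swift
let strings = ["1","2","elephant","3"]

// calling the initializer directly works – the compiler fills in
// the defaulted radix argument for you:
Int("2")                                 // .Some(2)

// but passing the unapplied initializer does not compile:
// let ints = strings.flatMap(Int.init)  // error

// so you have to wrap it in a closure expression:
let ints = strings.flatMap { Int($0) }   // [1, 2, 3]
```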

`String.init` on integers works though, I guess because the overloading is less ambiguous? On the other hand, `arrayOfCustomStructs.map(String.init)` won’t compile, probably because there is ambiguity between `String.init(T)` and `String.init(reflecting:T)` (the debug version). So for now, you’ll have to stick with writing `array.map { String(\$0) }`, like an animal.

One other thing to be careful of: this new terse syntax doesn’t work for when you want to apply a method to the `self` argument. The following will compile, but won’t do what might be intended – reverse each of the arrays:

```let reversedArrays = [[1,2,3],[4,5,6]].map(Array.reverse)
```

Instead, this will produce an array of functions – because `Array.reverse`, as an unbound method, takes the array as its argument and returns a function that performs the reversal when called.

So instead, you would have to write this:

```let reversedArrays = [[1,2,3],[4,5,6]].map { Array.reverse(\$0)() }
// which is just a long winded way of writing this…
let reversedArrays = [[1,2,3],[4,5,6]].map { \$0.reverse() }
// but bear with me...
```

You could always write a version of `map` that takes curried functions and does the above for you:

```extension CollectionType {
    func map<T>(@noescape transform: (Self.Generator.Element) -> () -> T) -> [T] {
        return self.map { transform($0)() }
    }
}
```

after defining which, `.map(Array.reverse)` will do what you probably want.1

This can also be nice to do for other functions. For example, if you defined a similar version of `sort`:

```extension CollectionType {
    public func sort(isOrderedBefore: Self.Generator.Element -> Self.Generator.Element -> Bool) -> [Self.Generator.Element] {
        return self.sort { isOrderedBefore($0)($1) }
    }
}
```

then, after also overloading `caseInsensitiveCompare` to have a version that returns a boolean in Swift’s `isOrderedBefore` style, you could do the same with methods that compare:

```import Foundation

extension String {
    func caseInsensitiveCompare(other: String) -> Bool {
        return self.caseInsensitiveCompare(other) == .OrderedAscending
    }
}

["Hello","hallo","Hullo","Hallo"].sort(String.caseInsensitiveCompare)
// returns ["hallo", "Hallo", "Hello", "Hullo"]
```

RangeReplaceableType: all your ExtensibleCollectionTypes are belong to us

`ExtensibleCollectionType` is gone. It used to provide `append` and `extend`, which are now amongst the features provided by `RangeReplaceableCollectionType`.

`RangeReplaceableCollectionType` is a great example of the power of protocol extensions. You implement one uber-flexible method, `replaceRange`, which takes a range to replace and a collection to replace with, and from that comes a whole bunch of derived methods for free:

• `append` and `extend`: replace `endIndex..<endIndex` (i.e. nothing, at the end) with the new element/elements.
• `removeAtIndex` and `removeRange`: replace `i...i` or `subRange` with an empty collection.
• `splice` and `insertAtIndex`: replace `atIndex..<atIndex` (i.e. don’t replace any elements but insert at that point) with a new element/elements.
• `removeAll`: replace `startIndex..<endIndex` with an empty collection.

The two remaining functions from `ExtensibleCollectionType` – an empty initializer, and `reserveCapacity` – have also moved into `RangeReplaceableCollectionType`.

As always, if a specific collection type can use knowledge about its implementation to perform these functions more efficiently, it can provide custom versions that will take priority over the default protocol extension ones.
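To make the derived operations concrete, here’s how they behave on an array (a sketch using the Swift 2 method names described above):

```swift
var a = [1, 2, 3]
a.replaceRange(1..<2, with: [9, 9])  // a is now [1, 9, 9, 3]
a.append(4)              // = replace endIndex..<endIndex with [4]
a.removeAtIndex(0)       // = replace 0...0 with []
a.insert(0, atIndex: 0)  // = replace 0..<0 with [0]
a.removeAll()            // = replace startIndex..<endIndex with []
```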

You get to be Sliceable. And you get to be Sliceable. Everybody gets to be Sliceable!

Things being `Sliceable` (i.e. having a subscript that takes a range, and returns a new collection of just that range) was incredibly useful. But it was also limiting, because you needed to rely on things implementing `Sliceable`.

But you could make anything sliceable if you needed it to be – by writing a wrapper view that took a subrange and implemented `CollectionType` to present only that subrange:

```// note, won't compile in beta 4 because Sliceable is gone!
public struct SubSliceView<Base: CollectionType>: Sliceable {
    let collection: Base
    let bounds: Range<Base.Index>

    public var startIndex: Base.Index { return bounds.startIndex }
    public var endIndex: Base.Index { return bounds.endIndex }

    public subscript(idx: Base.Index) -> Base.Generator.Element
        { return collection[idx] }

    typealias SubSlice = SubSliceView<Base>
    public subscript(bounds: Range<Base.Index>) -> SubSliceView<Base> {
        return SubSliceView(collection: collection, bounds: bounds)
    }
}

// AnyRandomAccessCollection is an example of a type that wasn't sliceable,
// but you could make it one by doing this:
extension AnyRandomAccessCollection: Sliceable {
    public subscript(bounds: Range<Index>) -> SubSliceView<AnyRandomAccessCollection> {
        return SubSliceView(collection: self, bounds: bounds)
    }
}

let a = [1,2,3]
let r = AnyRandomAccessCollection(a)
// dropFirst relies on things being sliceable:
print(dropFirst(dropFirst(r)).first)
```

As of beta 4, the new `Slice` type does pretty much this exact thing. Including the way the slice's start index is not zero based – if you create a slice starting at index 2, the `startIndex` of that slice will be 2. Another reason to avoid using indices directly in favour of things like `for...in` or higher-order functions like `map`. Index-based operations are so much easier to screw up, and your assumptions (e.g. every integer-indexed collection starts at zero) can easily be invalid.
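A quick sketch of this gotcha using array slices:

```swift
let a = [10, 20, 30, 40]
let s = a[2..<4]   // a slice of the last two elements

s.startIndex       // 2, not 0!
s[2]               // 30 – subscripting uses the original indices
// s[0] would trap: 0 isn't a valid index of this slice
```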

The `Sliceable` protocol has been subsumed into `CollectionType`, and the default implementation for the ranged subscript is to return a `Slice` view onto the collection. So now every collection supports a slicing subscript.

`CollectionType` in turn now conforms to `Indexable`, which is the new home for `startIndex`, `endIndex`, and index-based subscript. Note, `Indexable` does not conform to `SequenceType` – these two things are now brought together by `CollectionType`.

There is still a good reason for collections to implement their own custom slicing code – they can often do it more efficiently.

For example, `UnsafeBufferPointer` wasn't previously sliceable, but now is, with the default implementation. If you slice it, you'll get back a `Slice`:

```[1,2,3,4,5].withUnsafeBufferPointer { buf -> Void in
    let slice = buf[2..<4]
    sizeofValue(slice)  // returns 32 bytes
}
```

`sizeof` returns 32, because the type has to contain both the base buffer (which is 16 bytes – the base pointer address and the length), plus the start and end of the subrange (another 16 bytes).2

But alternatively, `UnsafeBufferPointer` could use itself as its subslice, like so:

```extension UnsafeBufferPointer {
    subscript(subRange: Range<Int>) -> UnsafeBufferPointer {
        return UnsafeBufferPointer(start: baseAddress + subRange.startIndex,
                                   count: subRange.count)
    }
}
```

If you do this, you’ll see that `UnsafeBufferPointer`‘s slice is now another `UnsafeBufferPointer`, and the memory requirement drops in half:

```[1,2,3,4,5].withUnsafeBufferPointer { buf -> Void in
    let slice = buf[2..<4]
    sizeofValue(slice)  // returns 16 bytes
}
```

This approach – making `Self` your subslice – is taken by `String` slices, but not by `Array`, which has a subslice type of `ArraySlice`.

This brings up another gotcha you might hit when writing generic methods on collections using slices – slices aren’t necessarily of the same type as the thing you are slicing. They are sometimes, but not always.

Even more strangely, the elements contained in the subslice aren’t guaranteed to be the same as the elements of the original collection! Ideally they would be, but this constraint can’t be expressed in Swift right now.

For example, suppose you wanted to overload `reduce` with a version that didn’t need an initial element – instead it used the first element of the collection, and returned an optional in case the collection was empty. You might try and implement it like this:

```extension CollectionType {
    /// Return the result of repeatedly calling `combine` with an
    /// accumulated value initialized to the first element, and each
    /// subsequent element of `self`, in turn, i.e. return
    /// `combine(combine(...combine(combine(self[0], self[1]),
    /// self[2]),...self[count-2]), self[count-1])`.
    /// Return `nil` in case of an empty collection.
    func reduce(@noescape combine: (Generator.Element, Generator.Element) -> Generator.Element) -> Generator.Element? {
        return first.map {
            dropFirst(self).reduce($0, combine: combine)
        }
    }
}
```

This will not compile (and beta 4 will lead you up the garden path with a spurious error for why). The problem being, the elements of the collection returned by `dropFirst` aren’t guaranteed to be the same as those of the collection. You would probably be right to be annoyed if they weren’t – but the type system doesn’t guarantee it.

The fix would be to constrain the extension to require it:

```extension CollectionType where Generator.Element == SubSequence.Generator.Element {
    func reduce(@noescape combine: (Generator.Element, Generator.Element) -> Generator.Element) -> Generator.Element? {
        return first.map {
            dropFirst(self).reduce($0, combine: combine)
        }
    }
}

// so now you can do this:
[1,2,3,4].reduce(+)  // returns 10
```

Grab Bag

There are a few other changes to the standard library:

• `Reflectable` became `_Reflectable` and `MirrorType` became `_MirrorType`
• `QuickLookObject` has been renamed `PlaygroundQuickLookObject`
• `_MirrorType` and `PlaygroundQuickLookObject` acquired documenting comments.
• `_MirrorDisposition` is now gone from the visible library, though `_MirrorType` still refers to it for now.
• And the `reflect` function is gone: Xcode tells you to use `Mirror.init(reflecting:)` instead.
• `RawOptionSetType` was removed
• The gradual disappearing of the `_` protocols continues. `_UnsignedIntegerType` is now gone (its methods – `toUIntMax` and `init(_: UIntMax)` moved into `UnsignedIntegerType`)
• Views onto other things (such as `FilterCollection`, a lazily filtered view onto a collection) have dropped their `View` suffix.
• Free functions continue to disappear into protocol extensions.
• As do unnecessarily non-default methods, such as `generate` for collections that now use the default indexing generator implementation.
• `LazyRandomAccessCollection.reverse` now returns a random-access reversed collection, not a bi-directional one.
• The `Zip2` type has been renamed `Zip2Sequence` – but you already switched to creating it using the `zip` function, right? Though perhaps the protocol extension wolves are circling its campfire.
• `SinkType` and `SinkOf` are gone. `SinkType` was used in two places – UTF `encoding` methods, where you now pass a closure, and `UnsafeMutablePointer`, where you could now write `(ptr++).memory = x` instead, I guess.

Beta 3 didn’t introduce many functional changes, but did do a lot of renaming, changing the placeholder names for generic types to something more descriptive than `T`, `U` etc:

• Sequence and collection-like things now use `Element`
• Pointer-like things now use `Memory`
• Intervals use `Bound`
• Views onto other types use `Base` for the thing they are viewing
• Unmanaged things are `Instance`s.
• Optionals still wrap a `T` though.

1. We’re not quite out of gotcha territory yet though. After defining that `map`, try and guess what `["foo","bar","baz"].map(String.hasPrefix("b"))` will do…
2. On 64-bit platforms, anyway. Cut all these numbers in half for 32-bit.

A persistent tree using indirect enums in Swift

UPDATE: now conforms to Swift 2.1b2

Suppose you want an ordered set. `Set` is great and all, but it’s unordered. You could always use a `Set` anyway, and sort the contents when needed. Or you could use an array, keep it sorted, and always insert new elements in the right place. Or, you could use a tree.

One way to build a tree in Swift would be with a class that refers to itself. Something like this for a binary search tree:

```class Tree<Element: Comparable> {
    let value: Element
    // entries < value go on the left
    let left: Tree<Element>?
    // entries > value go on the right
    let right: Tree<Element>?
}
```

For the tree’s logic, `Element` would only have to be `Comparable`, not `Hashable`, which is another potential benefit over `Set`.

The left and right subtrees are optional, with `nil` at the leaves when there’s no lesser/greater value than the current node. But there’s a problem with this approach – how do you represent an empty tree? `value` is not optional. Sure you could make it optional, and have an empty tree be represented by a nil value. But that doesn’t feel healthy.

Really, you want an enum with an associated type. Something like this:

```enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}
```

So, a tree is either empty, or it has a left and right subtree and a value. The subtrees no longer need to be optional, because if they’re empty, they can just hold the `.Empty` value.

No more boxes

Except, this won’t work. Swift enums are value types. So they can’t contain themselves like this. You need some kind of indirection – a reference type, such as a class.

Up until 2.0b4, you would have to use a `Box` trick – inserting a class in between each enum:

```final class Box<Element> {
    let unbox: Element
    init(_ x: Element) { self.unbox = x }
}

enum Tree<Element: Comparable> {
    case Empty
    case Node(Box<Tree<Element>>,Element,Box<Tree<Element>>)
}
```

Well that’s pretty gross. And we haven’t even got to the actual implementation logic yet. Trust me, `Box` is gonna stomp all over that code leaving its muddy footprints everywhere.

But good news! The latest release of Swift introduces the `indirect` keyword for enums. This is similar to a transparent version of the box above, putting a reference where the boxed value would go. Xcode will even suggest putting in the keyword for you.

So now we can write this:

```indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Tree<Element>,Element,Tree<Element>)
}
```

But really, we want a balanced tree. Simple binary search trees work well with random data, but if you insert already-ordered data into them, they degenerate into a linked list (because all the insertions happen down one side of the tree).

So let's go with a red-black tree – a kind of binary search tree that colours its nodes red or black, and then uses some invariants to guarantee it remains balanced on insertion. If you follow that link you’ll see quite a lot of ugly code for maintaining those invariants – but don’t worry, we’re going to write something that hopefully looks a lot simpler.

So here’s the tree, now with added colour, plus a couple of initializers to make things easier:

```enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
    case Empty
    case Node(Color,Tree<Element>,Element,Tree<Element>)

    init() { self = .Empty }

    init(_ x: Element, color: Color = .B,
         left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
    {
        self = .Node(color, left, x, right)
    }
}
```

Some might not like that I called my `Color` cases `R` and `B` instead of `Red` and `Black`. It’s a personal preference, but I like to keep them shorter because a) they’re arbitrary – they are only called that because those were the two colours on the laser printer they had at Xerox PARC, and b) it keeps the pattern matching shorter and (therefore, IMO) clearer. Judge for yourself when we get to that part.

By the way, this implementation is a Swift translation of an ML version from Chris Okasaki’s Purely Functional Data Structures which is definitely worth reading if you like this kind of stuff. What’s cool is that with the new 2.0 features of Swift, the Swift version is getting pretty close to the ML one in terms of expressivity/simplicity.

Checking if the tree contains an element

So, let’s start with `contains`. This needs to recur down the tree, checking each value to see if it’s empty, or if not, whether its value is less than, greater than or equal to the value we’re looking for.

Ideally I would write this using a `guard` statement like so:

```extension Tree {
    func contains(x: Element) -> Bool {
        guard case let .Node(_,left,y,right) = self
            else { return false }

        if x < y { return left.contains(x) }
        if y < x { return right.contains(x) }
        return true
    }
}
```

I like how `guard` helps here. It lets us unpack the important stuff used by the main body of the function (is it a node? If so, grab the left, right and value elements – discarding color, since it’s not important). But it also lets us put the failure case – if you hit an empty node, you know the tree does not contain the element – up front. Then, head left if the element we’re looking for is less than this one, or right if it’s greater.

We can rely on the fact that `Comparable` elements are required to be in a strict total order. This means if `x` is neither less than nor greater than a value, it must be equal to that value. So we’ve found the element and `contains` is true.
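For example, building a tiny tree by hand with the initializer from above (the colours here are arbitrary – they’re just there to make a plausible-looking tree):

```let t = Tree(5, left: Tree(2, color: .R), right: Tree(8, color: .R))
t.contains(2)  // true
t.contains(7)  // false: 7 > 5 sends us right, then 7 < 8 sends us into an empty subtree
```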

Inserting an element

OK, `insert` is a little trickier. We’re going to define `insert` as a method that returns a new tree – one with the new value inserted. This is a little different to `Set.insert`, which is an in-place insert.

In doing this, we’re going to create a “persistent” data structure – one that does not change the old structure when it updates it. This might sound like multiple insertions in a row are going to perform many unnecessary copies of the whole structure, but it isn’t necessarily that bad. Because nodes of the tree are read-only, two trees can share nodes. Only the nodes that are actually changed as a result of the insert need to be copied.
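Here’s a small sketch of what that sharing means in practice, using the `insert` method we’re about to define:

```let t1 = Tree(2).insert(1).insert(3)
let t2 = t1.insert(4)
// t1 is untouched – it still represents {1, 2, 3} – while t2
// represents {1, 2, 3, 4}. Only the nodes on the path to the new
// element were recreated; the rest are shared between the two values.
```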

First, the `insert` method. It doesn’t do much itself, beyond preserving one of the invariants needed to keep the tree balanced: the root node must always be black. It leaves most of the work to a second function, `ins`:

```private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
guard case let .Node(c, l, y, r) = into
else { return Tree(x, color: .R) }

if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
return into
}

extension Tree {
func insert(x: Element) -> Tree {
guard case let .Node(_,l,y,r) = ins(self, x)
else { fatalError("ins should never return an empty tree") }

return .Node(.B,l,y,r)
}
}
```

So, if we’ve hit an empty node, append the value here as a new node. New nodes are always red.

Otherwise, insert the value into the left or right subtree. And if the element is equal, stop and don’t insert anything, since we’re implementing set behaviour here.

But there’s also the call to `balance`. This function is going to ensure that, as we come back up the tree having inserted our new element, the tree is properly balanced, by looking at whether any invariants have been broken and fixing them.

This is normally where the ugliness of implementing a red-black tree resides. But we’re going to use Swift’s pattern matching to try and keep it simple. The balancing logic can be summarized as: “on the way back up from an insertion, if there’s a black node, with a red child and a red grandchild, this needs fixing”.

A red child and grandchild of a black node can happen in one of four different configurations. Here’s a visualization of one of those possibilities (excuse the ASCII art):

```        z: black
       /        \
      x: red     d
     /      \
    a        y: red
            /     \
           b       c
```

needs rearranging into:

```         y: red
        /       \
   x: black      z: black
     / \           / \
    a   b         c   d
```

Once you understand this visualization, you can encode it as a Swift pattern-matching `case` that extracts all the relevant variables from the possible imbalanced structure, then returns them reconfigured as the second, balanced, version.

The above re-balancing operation would end up encoded in Swift like this:

```case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
return .Node(.R, .Node(.B, a, x, b), y, .Node(.B, c, z, d))
```

This is why I prefer `.B` and `.R` for the colour case names – single-letter names look clearer to me at this stage, which is where they matter most. There’s no getting around it: the transformation code is going to be hard to read no matter what you do, so the best you can hope for is easy translation. The longer words wouldn’t help with the key problem, which is encoding the diagram above into a pattern-matching statement.

To write the full balancing function, you need to visualize all 4 imbalanced configurations, then encode them into 4 patterns. If a node doesn’t match any, it’s good and you can return it as-is:

```private func balance<T>(tree: Tree<T>) -> Tree<T> {
switch tree {
case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
default:
return tree
}
}
```

Note, this pattern match reaches 3 levels deep into the enum. This is also where getting rid of `Box` was critical. It seriously gets in the way – here is how it would look before `indirect`:

```if case let .Node(.B,l,z,d) = tree, case let .Node(.R,ll,y,c) = l.unbox, case let .Node(.R,a,x,b) = ll.unbox {
return .Node(.R, Box(.Node(.B,a,x,b)),y,Box(.Node(.B,c,z,d)))
}
```

Note, the return statement is the same in all 4 rebalance patterns. It’s only the pattern match that changes, based on which of the 4 imbalance possibilities is being checked.

It might be nice to be able to group these 4 possible matches together like so:

```    switch tree {
case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d),
.Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d),
.Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)),
.Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
default:
return tree
}
```

but Swift won’t allow multiple patterns in a case when the pattern declares variables, so we have to stick with the multiple returns for now.

Traversing in-order

So, now we have a nice balanced tree. But to get the benefit of an ordered set, we want to iterate over it in order. This sounds like a job for `SequenceType`. That way, you could use `for...in` over the sorted elements.

In-order traversal of a tree is usually done recursively, but that’s a little tricky to do when you want to give `anyGenerator` a closure expression that spits out the next element. So instead, we can use a stack to do it iteratively:

```extension Tree: SequenceType {
func generate() -> AnyGenerator<Element> {
var stack: [Tree] = []
var current: Tree = self

return anyGenerator { () -> Element? in
while true {
// if there's a left-hand node, head down it
if case let .Node(_,l,_,_) = current {
stack.append(current)
current = l
}
// if there isn’t, head back up, going right as
// soon as you can:
else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
current = r
return x
}
else {
// otherwise, we’re done
return nil
}
}
}
}
}
```
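With that conformance in place, you can `for...in` over a tree and the elements come out in ascending order, whatever order they were inserted in:

```let t = Tree(3).insert(1).insert(5).insert(2).insert(4)
for i in t {
    print(i)  // prints 1 through 5, in order
}
```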

Initializing from another sequence

One last extension. When you have a container, it’s nice to be able to initialize it from another sequence, as well as from a literal. These are both pretty easy:

```extension Tree: ArrayLiteralConvertible {
init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
self = source.reduce(Tree()) { $0.insert($1) }
}

init(arrayLiteral elements: Element...) {
self = Tree(elements)
}
}
```

Given these, you can now create new sets using literal syntax:

```let alphabet = Tree("the quick brown fox jumps over the lazy dog".characters)
let primes: Tree = [2, 3, 5, 7, 11, 13, 17, 19]
```

That’s it for now. Several features of Swift 2.0 make this code a lot more pleasant to read and write. There are various further enhancements you could make. The tree maybe ought to be wrapped in an outer struct to avoid exposing the enum implementation. The next logical extension would be conforming to `CollectionType`.

And there are also performance improvements you could make (the functional data structures book suggests a few as exercises). Honestly, the times when this data structure will do better than an unordered set or array + sort are probably going to be pretty rare. But it’s nice to have the option now.

Below is the final code, along with some tests, or you can find the full code for the tree in a gist, here.

If you found this interesting (and if you made it all the way down here, hopefully that’s true!), it’ll be part of Chris Eidhof and my book, which you can get a preview of here.

```enum Color { case R, B }

indirect enum Tree<Element: Comparable> {
case Empty
case Node(Color,Tree<Element>,Element,Tree<Element>)

init() { self = .Empty }

init(_ x: Element, color: Color = .B,
left: Tree<Element> = .Empty, right: Tree<Element> = .Empty)
{
self = .Node(color, left, x, right)
}
}

extension Tree {
func contains(x: Element) -> Bool {
guard case let .Node(_,left,y,right) = self
else { return false }

if x < y { return left.contains(x) }
if y < x { return right.contains(x) }
return true
}
}

private func balance<T>(tree: Tree<T>) -> Tree<T> {
switch tree {
case let .Node(.B, .Node(.R, .Node(.R, a, x, b), y, c), z, d):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
case let .Node(.B, .Node(.R, a, x, .Node(.R, b, y, c)), z, d):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
case let .Node(.B, a, x, .Node(.R, .Node(.R, b, y, c), z, d)):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
case let .Node(.B, a, x, .Node(.R, b, y, .Node(.R, c, z, d))):
return .Node(.R, .Node(.B,a,x,b),y,.Node(.B,c,z,d))
default:
return tree
}
}

private func ins<T>(into: Tree<T>, _ x: T) -> Tree<T> {
guard case let .Node(c, l, y, r) = into
else { return Tree(x, color: .R) }

if x < y { return balance(Tree(y, color: c, left: ins(l,x), right: r)) }
if y < x { return balance(Tree(y, color: c, left: l, right: ins(r, x))) }
return into
}

extension Tree {
func insert(x: Element) -> Tree {
guard case let .Node(_,l,y,r) = ins(self, x)
else { fatalError("ins should never return an empty tree") }

return .Node(.B,l,y,r)
}
}

extension Tree: SequenceType {
func generate() -> AnyGenerator<Element> {
var stack: [Tree] = []
var current: Tree = self

return anyGenerator { () -> Element? in
while true {
// if there's a left-hand node, head down it
if case let .Node(_,l,_,_) = current {
stack.append(current)
current = l
}
// if there isn’t, head back up, going right as
// soon as you can:
else if !stack.isEmpty, case let .Node(_,_,x,r) = stack.removeLast() {
current = r
return x
}
else {
// otherwise, we’re done
return nil
}
}
}
}
}

extension Tree: ArrayLiteralConvertible {
init <S: SequenceType where S.Generator.Element == Element>(_ source: S) {
self = source.reduce(Tree()) { $0.insert($1) }
}

init(arrayLiteral elements: Element...) {
self = Tree(elements)
}
}

import Darwin

extension Array {
func shuffle() -> [Element] {
var list = self
for i in 0..<(list.count - 1) {
let j = Int(arc4random_uniform(UInt32(list.count - i))) + i
guard i != j else { continue }
swap(&list[i], &list[j])
}
return list
}
}

let engines = [
"Daisy", "Salty", "Harold", "Cranky",
"Thomas", "Henry", "James", "Toby",
"Belle", "Diesel", "Stepney", "Gordon",
"Captain", "Percy", "Arry", "Bert",
"Spencer",
]

// test various inserting engines in various different permutations
for permutation in [engines, engines.sort(), engines.sort(>), engines.shuffle(), engines.shuffle()] {
let t = Tree(permutation)
assert(!t.contains("Fred"))
assert(t.elementsEqual(t.insert("Thomas")))
assert(!engines.contains { !t.contains($0) })
assert(t.elementsEqual(engines.sort()))
print(t.joinWithSeparator(","))
}
```

Protocol extensions and the death of the pipe-forward operator

So a while back I wrote about using some functional programming techniques in Swift to solve the Luhn credit card checksum problem.

One of the suggestions in the article was defining and using `|>` to avoid the annoying tennis-match bouncing left and right between free functions and methods.

Since 2.0 came out, I’ve been thinking I really needed to update some of my articles to the new syntax, and started on this one. Except as protocol extensions are the new hotness, using them would probably be the more “Swift-y” way of writing this. So here’s a solution using new 2.0 features.

As a reminder, here are the requirements, in the form of Wikipedia’s description of the algorithm:

1. From the rightmost digit, which is the check digit, moving left, double the value of every second digit; if the product of this doubling operation is greater than 9 (e.g., 8 × 2 = 16), then sum the digits of the products (e.g., 16: 1 + 6 = 7, 18: 1 + 8 = 9).
2. Take the sum of all the digits.
3. If the total modulo 10 is equal to 0 (if the total ends in zero) then the number is valid according to the Luhn formula; else it is not valid.
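Here’s a quick worked example for the made-up number “1784”, before we build it in code:

```// digits, right to left:        4  8  7  1
// double every second digit:    4 16  7  2
// 16 > 9, so sum its digits:    4  7  7  2   (1 + 6 = 7)
// total: 4 + 7 + 7 + 2 = 20, and 20 % 10 == 0, so "1784" passes
```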

So, here in turn are each of the components, all written as extensions.

First, converting characters to integers. Similar to `String.toInteger` becoming `Int.init(String)`, here’s an extension of `Int`:

```extension Int {
init?(c: Character) {
guard let i = Int(String(c)) else { return nil }

self = i
}
}
```

Here I’m using `guard` to check the value is convertible and return `nil` if not. Yes, I know, this is hardly different from an `if let/else`, but I like the use of `guard` whenever I want to handle a failure case like this “up front”.

It would be nice to make this a protocol extension rather than an extension of `Int`. All the std lib integer types support a from-string initializer, after all. But this would involve creating a protocol like `IntegerStringConvertible` and then extending all the integers to conform to it, since such a common grouping doesn’t exist.
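If you did want to go that route, it might look something like this (`IntegerStringConvertible` is a made-up name – nothing like it exists in the standard library):

```protocol IntegerStringConvertible {
    init?(_ text: String)
}

// every std lib integer type already has this initializer, so the
// conformances are empty:
extension Int: IntegerStringConvertible { }
extension UInt: IntegerStringConvertible { }
// ...and so on for the other integer types, after which the
// character initializer could be written just once:
extension IntegerStringConvertible {
    init?(c: Character) {
        self.init(String(c))
    }
}
```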

The previous post defined `mapSome`, which takes a transformation function that returns an optional, and returns an array of only those values that weren’t transformed into `nil`:

```extension SequenceType {
func mapSome<U>(transform: Generator.Element -> U?) -> [U] {
var result: [U] = []
for case let x? in lazy(self).map(transform) {
result.append(x)
}
return result
}
}
```

(the `for case let x?` is using the new pattern-matching syntax that lets you `for` over only the non-nil values in a sequence)

But as of 2.0, `flatMap` has been enhanced to do this exact thing. So we no longer need to define it.

We’re going to use this along with the character-to-integer initializer to extract only the numbers from the credit-card string, like so, using the new 2.0b2 feature of using an `init` as a function:

```":123-456:".characters.flatMap(Int.init)
// returns [1,2,3,4,5,6]
```

(note, we have to do this on the `.characters` view now, since `String` is no longer itself a sequence).

Next, a function that lets you apply a transformation only to every nth value in a sequence:

```extension SequenceType {
func mapEveryNth(n: Int, transform: Generator.Element -> Generator.Element)
-> [Generator.Element]  {

// enumerate starts from zero, so for this to work with the nth element,
// and not the 0th, n+1th etc, we need to add 1 to the index in the isNth check:
let isNth = { ($0 + 1) % n == 0 }

return enumerate().map { i, x in
isNth(i) ? transform(x) : x
}
}
}
```
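For example:

```[10, 20, 30, 40, 50, 60].mapEveryNth(2) { $0 + 1 }
// [10, 21, 30, 41, 50, 61]
```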

Then, a `sum` on any sequence that contains integers:

```extension SequenceType where Generator.Element: IntegerType {
func sum() -> Generator.Element {
return reduce(0, combine: +)
}
}
```

and an extension on any integer to check if it’s a multiple of another:

```extension IntegerType {
func isMultipleOf(of: Self) -> Bool {
return self % of == 0
}
}
```

And finally, putting them all together in a single extension on `String` that validates the checksum:

```extension String {
func luhnchecksum() -> Bool {
return characters
.flatMap(Int.init)
.reverse()
.mapEveryNth(2) { $0 < 5 ? $0*2 : $0*2 - 9 }
.sum()
.isMultipleOf(10)
}
}

let ccnum = "4012 8888 8888 1881"
print( ccnum.luhnchecksum() ? "👍" : "👎" )
```

This now looks very similar to the version that used the `|>` operator – but without having to define any custom operator at all. Which feels like a win to me.

Yes, you could have done all this before using extensions of concrete objects in 1.2. But most of these extensions are more general than that – for example, `mapSome` can work on the string character view, but also arrays, dictionaries, sets.

Anyway, now back to playing with beta 2…

Changes to the Swift Standard Library in 2.0 beta 1

OK don’t panic – it might look like a lot has changed, but not really. It’s more… transformed. For the better. Nothing the migration assistant shouldn’t be able to handle.

By far the biggest change in the standard library is due to the new protocol extensions. The free `map` function is dead, long live the `map` extensions to `CollectionType` and `SequenceType`. Which means now you only ever call `map` as a method – no more shuttling back and forth between functions and methods like you’re watching a tennis match.

To show how this works, let’s define my usual example, `mapSome`:

```extension SequenceType {
/// Return an `Array` containing the results of mapping `transform`
/// over `self`, discarding any elements where the result is `nil`.
///
/// - Complexity: O(N).
func mapSome<U>(@noescape transform: Generator.Element -> U?) -> [U] {
var result: [U] = []
for case let x? in self.map(transform) {
result.append(x)
}
return result
}
}
```

You can use this in the method-calling style, for example, with the brand new double-from-string failable initializer:

```let a = ["3.14", "foo", "6.02e23"]
let doubles = a.mapSome { Double($0) }
print(doubles) // no more println!
// prints [3.14, 6.02e+23]
```

Incidentally, this implementation of `mapSome` uses the new pattern-matching capabilities of `for` to filter out nils. The `case let` syntax is matching an enumeration – and since optionals are enumerations, it works for them too. `for` has also acquired `where` clauses:1

let isEven = { $0 % 2 == 0 }
for even in 0..<10 where isEven(even) {
print(even)
}
```

But there’s more. You can constrain the extension, similar to how you would constrain a placeholder in a generic function. And this means you can give protocols methods that only apply when they have certain properties. For example, there’s now a version of `sort` on arrays (or any other collection type) that doesn’t need to be given an `isOrderedBefore` argument, so long as the contents are `Comparable`. There’s still an overload that takes a closure if you want to do something more custom, but if you just want the default behaviour (ascending), you don’t need one.
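So with `Comparable` elements the closure-free version just works, and the overloads are there when you want something custom:

```let words = ["delta", "alpha", "charlie"]
words.sort()   // ["alpha", "charlie", "delta"]
words.sort(>)  // ["delta", "charlie", "alpha"]
```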

For example, here’s an implementation of `all` that returns true if all the values are equal to a certain value:

```// a version that only applies when the contents of the
// sequence are equatable
extension SequenceType where Generator.Element: Equatable {
/// Return `true` iff every element of `self` is `x`.
func all(equalTo: Generator.Element) -> Bool {
// of course, contains is now a method too
return !self.contains { $0 != equalTo }
}
}

// and an unconstrained version where the caller supplies a predicate
extension SequenceType {
/// Return `true` iff every element of `self` satisfies `predicate`.
func all(criteria: Generator.Element -> Bool) -> Bool {
return !self.contains { !criteria($0) }
}
}

[1,1,1].all(1)       // true

let isEven = { $0 % 2 == 0 }
[2,4,6].all(isEven)  // true
```

As a result, the number of free functions in the standard library has dropped from 101 down to 77. I wonder how far this will go – will it eventually just be a motley crew of `unsafeBitCast` and company left? Should `abs()` become an extension of `SignedNumberType`? And now it can be, should it be a property? This and more in the next exciting installment of Swift…

Grab Bag

Here are a few other changes to the standard library:

• `Array.withUnsafeMutableBufferPointer` has acquired a warning: do not use the array itself while calling it, only refer to the contents via the pointer. Presumably so updates to the array can potentially be deferred to after it’s finished executing. Same for `ContiguousArray` and `Slice`.
• They’ve also had some of their internal-type initializers made private.
• Some of the `_`-prefixed protocols have started to disappear. For example `_BidirectionalIndexType` is gone, and `BidirectionalIndexType` now has its `predecessor` method. And `_Comparable` is gone, `Comparable` now containing its < operator.
• `CollectionOfOne` and `EmptyCollection` now have a `count` property – just in case you want to check they’re not misleading you.
• So now for the start of the great extensionizing… `CollectionType` now has `isEmpty`, `count`, `map`, `filter`, `first`, `last`, `indexOf`, `indices` and `reverse`. So these are now available as methods on all collections. Corresponding methods in existing collections have been removed.
• `indexOf` is what `find` used to be called. And it now has a version that takes a predicate.
• An `ErrorType` protocol has been added to support the new error handling feature.
• `Printable` and `DebugPrintable` protocols are now the more descriptive `CustomStringConvertible` and `CustomDebugStringConvertible`
• There are also `CustomReflectable`, `CustomLeafReflectable` and `CustomPlaygroundQuickLookable` protocols for tweaking how your type behaves in a playground.
• And there’s a new `Mirror` type for returning from the `CustomReflectable` protocol.
• There’s a new `DictionaryLiteral` type, so you can now use the `[:]` syntax for more than just dictionaries without worrying about duplicate keys getting coalesced.
• As mentioned above, `Double`, `Float` and `Float80` have acquired failable initializers from strings.
• And the integers now have the same, replacing the `toInt` method on string. And they take a `radix` argument! You still can’t tell from the headers what default values are for defaulted arguments, but I'm guessing this one is 10.
• `FloatingPointType`’s `_toBitPattern` and `_fromBitPattern` are gone. I guess you can use `unsafeBitCast` if you like bits.
• All the lazy collections have acquired an `underestimateCount`.
• `MutableCollectionType` is a good example of extensions with `where` clauses – such as requiring the collection also be random-access in order for it to have `partition` and `sortInPlace` support.
• `SequenceType` sucks in `contains`, `sort`, `underestimateCount`, `enumerate`, `minElement`, `maxElement`, `elementsEqual`, `lexicographicalCompare` and `flatMap`
• `elementsEqual` being the new name for the `equal` function that compares two sequences.
• and `minElement` and `maxElement` now return optionals in case of empty sequences, and also now have versions that take `isOrderedBefore` closures.
• There’s a new `SetAlgebraType` protocol, for types that are “a generalized set whose distinct elements are not necessarily disjoint”. `Set` does not conform to it (though it has all the properties it requires).
• A type that does conform to it is `OptionSetType` protocol, for bitfield enums.
• `print` is gone! Now `println` is `print` and old `print` is `print(1, appendNewline: false)`
• `toString` is no more. Just use the `String` initializer that takes any type (and has a similar hierarchy of fallbacks)
• `readLine` reads from the standard input, returning an optional `String` so you can `while let` over it.
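That last one makes little line-by-line filters pleasantly terse – for example, numbering the lines of standard input:

```var n = 0
while let line = readLine() {
    n += 1
    print("\(n): \(line)")
}
```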

Protocol Extensions are Defaults

Silly jokes aside, there’s a good reason for why `CollectionOfOne` and `EmptyCollection` have implementations of `count`. If a protocol has a default implementation, but you know your specific type can do it faster because of how it works internally, a specific implementation will take precedence over the protocol version.

So for example, suppose we implemented the next (but fairly pointless) logical collection type: `CollectionOfTwo`:

```struct CollectionOfTwo<T>: CollectionType {
let first: T, second: T
subscript(idx: Int) -> T {
precondition(idx < 2, "Index out of bounds")
return idx == 0 ? first : second
}
var startIndex: Int { return 0 }
var endIndex: Int { return 2 }
}

let two = CollectionOfTwo(first: "42", second: "420")
",".join(two)  // "42,420"
```

Notice, that I didn’t need to define a generator at all – due to this handy new protocol extension:

```// CollectionType conforms to _CollectionGeneratorDefaultsType
// which is extended with:
extension _CollectionGeneratorDefaultsType {
func generate() -> IndexingGenerator<Self>
}
```

Anyway, because `CollectionOfTwo` conforms to `CollectionType` it gets a `count` property for free. But it gets it by subtracting the start index from the end, which is quite a roundabout way of doing it. So you can instead give it a hardcoded value by explicitly implementing it:

```extension CollectionOfTwo {
var count: Int { return 2 }
}
```

Now, when `count` is called, it will run this code instead of the default.

Protocols can also replace the default implementations of other protocols they conform to – hence `CollectionType` defines `map` even though `SequenceType` does too.

While we’re implementing simple little types – the default string rendering of custom types is now a lot nicer. Even though it doesn’t yet conform to `CustomStringConvertible`, if you print out `CollectionOfTwo` you’ll get something that looks like `CollectionOfTwo(first: 42, second: 420)`, which is much nicer than the `__lldb_expr_11.CollectionOfTwo` you used to get.

Strings are No Longer Collections

Something that might take you by surprise – `String` no longer conforms to `CollectionType`. Though it has all the required properties (just writing `extension String: CollectionType { }` without any implementation works), it’s no longer tagged as such. This is apparently due to concerns that, even with the `Character` representation, there were ways in which using collection algorithms on strings could produce non-Unicode-correct results.

Of course, you still need to manipulate strings, so rather than just resorting to staring at them extra intensely, you can use the new `CharacterView` accessible via the `characters` property:

```let s = "comma,separated,strings"
let fields = split(s.characters) { $0 == "," }.map { String($0) }
```

Since `String`'s `Index` is just a typealias for `String.CharacterView.Index`, you can use them interchangeably:

```let s = "Hello, world!"
if let comma = s.characters.indexOf(",") {
print(s[s.startIndex..<comma])
}
```

Nonetheless, the fact that you have to switch to a character-based view on the string rather than operate on it directly should act as a reminder that you might be doing something that doesn’t behave correctly under all edge cases.
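The `characters` view does handle grapheme clusters correctly, though, which is exactly why the separate views exist:

```let cafe = "cafe\u{301}"   // "café", with the é as an e plus a combining accent
cafe.characters.count      // 4 – one Character per grapheme cluster
cafe.utf16.count           // 5 – the combining accent is its own code unit
```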

Type-Erased Containers

Finally, there are now a collection of “type-erased” container types available: `AnyBidirectionalCollection`, `AnyRandomAccessCollection` and `AnyForwardCollection` (and associated indexes), plus `AnyGenerator` and `AnySequence`. These could be used, for example, to bully different kinds of collection into being in the same container:

```let set: Set = [1,2,3]
let array = [4,5,6]
let anySet = AnyForwardCollection(set)
let anyArray = AnyForwardCollection(array)
let anys = [anySet,anyArray]
anys.flatMap { $0 } // [2, 3, 1, 4, 5, 6]
```

However, this is probably not all that useful. Despite the name, this isn’t like the `Any` type – you can’t cast back into the original type, it’s more a one-way trip.

This is more useful when you want to expose an internal collection without exposing exactly what that collection type is. Instead you could create an `AnyXXXCollection` from your internal value, and return that, safe in the knowledge users of your class won’t complain when you switch to a different internal data structure later on.
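As a sketch (the `Department` type here is made up, just to show the shape of the technique):

```struct Department {
    private var members: Set<String> = ["Milo", "Tock"]

    // expose the members without revealing they're stored in a Set –
    // the storage could later change without breaking any callers:
    var names: AnyForwardCollection<String> {
        return AnyForwardCollection(members)
    }
}
```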

Well that’s it. The standard library has some sparkly new online documentation. And as part of the WWDC sample code, there’s a playground all about the Swift standard library that is worth checking out. Also it looks like the developer forums for Swift no longer require a developer login to read either, if you want to take a look without registering.

Here’s to a future using all this stuff on the back end once the port to Linux is available!

1. Of course, you wouldn’t write this would you – you’d write `for even in stride(from: 0, to: 10, by: 2)`, right?

My Talk at Swift Summit

Earlier in the year, I spoke at Swift Summit, and the video has just been posted.

I also added some annotations to the slides with more info, credits and further links, which you can find here on speaker deck.

Swift Summit was a really fun conference to attend with many great speakers, and assuming it happens again next year I’d heartily recommend it.