Randomly Lazy

Posted on July 30, 2014July 31, 2014 by airspeedvelocity

In the previous article on extending lazy, we extended the lazy views with first_n, which returned a sequence of the first n elements of a sequence.

But this isn’t great if what you passed in was a random-access indexed collection. Where possible, it’s nice to retain random access on the output of your lazy function. The lazy map does this, for example. If you pass an array into lazy then call map, the collection you get back is a LazyRandomAccessView, and you can index randomly into it just like you could into an array.

To do this for first_n, we need a struct that implements Collection, takes a Collection as an initializer plus a count n, and layers a view over the top of that collection that only exposes the first n elements.

Or, in the more general case, a view that takes a collection and a sub-range within that collection. Here it is:

struct SubrangeCollectionView<Base: Collection>: Collection {
    private let _base: Base
    private let _range: Range<Base.IndexType>
    init(_ base: Base, subrange: Range<Base.IndexType>) {
        _base = base
        _range = subrange
    }

    var startIndex: Base.IndexType {
      return _range.startIndex
    }
    var endIndex: Base.IndexType {
      return _range.endIndex
    }

    subscript(idx: Base.IndexType)
        -> Base.GeneratorType.Element {
        return _base[idx]
    }

    typealias GeneratorType
        = IndexingGenerator<SubrangeCollectionView>
    func generate() -> GeneratorType {
        return IndexingGenerator(self)
    }
}

This sub-range collection view is so useful that I was sure it was somewhere in the Swift standard library. That’s partly why I wrote this article on collection and sequence helpers – as a side-effect of going line by line looking for it. I’m still expecting someone to reply to this article saying “no you fool, you just use X”. If you are that person, call me a fool here.

The Sliceable protocol seems designed to provide this function. It requires a collection to typealias a SliceType and implement a subscript(Range)->SliceType method. But you need the collection to implement sliceable, which seems overly restrictive, whereas the view above works for any collection.

SubrangeCollectionView enables you to pass in a sub-range of any collection to any algorithm that takes a collection or sequence. For example:

let r = 1...10
let halfway = r.startIndex.advancedBy(5)
let top_half = SubrangeCollectionView(r,
                subrange: halfway..<r.endIndex)
reduce(top_half,0,+)  // returns 6+7+8+9+10

A nice feature is that if you pass a subrange collection into an algorithm that returns an index, such as find, the index returned can also be used on the base collection. So in the example above, if you ran if let idx = find(top_half,8), you could then use idx on not just top_half[idx], but also r[idx].

Operating on subranges is a capability that C++ developers, used to the STL, will be missing from Swift collection algorithms. In the STL, operations on containers are performed not by passing the container itself to the algorithm, but instead by passing two iterators on that container defining a range.

For example, here is the definition of the find algorithm in the C++ STL compared to the Swift version:

// C++ std::find
template <class InputIterator, class T>
  InputIterator find
    (InputIterator first, InputIterator last, const T& val);

// Swift.find
func find
  <C: Collection where C.GeneratorType.Element: Equatable>
    (domain: C, value: C.GeneratorType.Element) -> C.IndexType?

While Swift.find takes a collection as its first argument, std::find takes a first and last iterator.

STL container iterators are similar to Swift collection indices, in that they can be forward, bidirectional or random, and are used to move up and down the collection. Like Swift indices, the end iterator points to one past the last element – the equivalent of Swift.find returning nil for not found would be std::find returning end.

But unlike Swift indices, STL iterators model C pointer-like behaviour, so they are dereferencable. To get the value the iterator points to, you don’t have to have access to the underlying container, using something like Swift’s subscript. To get the value at first, you call *first. This is why std::find doesn’t need to take a container in it’s input. It just increments first until either it equals last, or *first equals val.

There are lots of reasons why the STL models iterators like this: because it’s like pointers into C arrays; because it dodges memory management issues (which won’t apply to Swift) etc. But also because, with this model, every algorithm will work just as well on a sub-range of a container as on the whole container. You don’t have to pass the starting or ending iterator for the container into find, you can pass in any two arbitrary points.¹

Swift indices, on the other hand, are pretty dumb beasts. They don’t have to know about what value they point to or what collection they index. Heck, the index for Array is just an Int.²

For all the advantages of Swift’s just passing in collections as arguments has (it looks a lot cleaner, doesn’t confuse beginners),³ it lacks this ability to apply algorithms to subranges. SubrangeCollectionView gives you that ability back.

Anyway, back to the original problem, which was to enhance LazyRandomAccessView with a version of first_n that returns another LazyRandomAccessView. With the help of SubrangeCollectionView, here it is:

extension LazyRandomAccessCollection {
  func first_n
    (n: LazyRandomAccessCollection.IndexType.DistanceType)
    -> LazyRandomAccessCollection<SubrangeCollectionView<LazyRandomAccessCollection>> {
      let start = self.startIndex
      let end = min(self.endIndex, start.advancedBy(n))
      let range = start..<end
      let perm = SubrangeCollectionView(self, subrange: range)
      return lazy(perm)
    }
}

let a = [1, 2, 3, 4, 5, 6, 7]
let first3 = lazy(a).first_n(3)
first3[1]  // returns 2

One final interesting question is whether to do the same for forward and bidirectional lazy views as well. The problem here is computing the endIndex for the view. Unlike with a random-access index, you can’t just advance them by n in constant time. It takes O(n) because the index has to be walked up one by one. For this reason, I’d stick with returning sequences for these.

Of course, you can also pass a later iterator to first than to last, at which point, kaboom! A risk which passing in the container itself eliminates. ↩
Of course, you could write an über-index type for your collection, that did understand about what it pointed at. Maybe you could even overload the * operator for it to dereference itself. ↩
And lets face it, you operate on the whole collection 99.9% of the time. ↩

Working at being lazy

Posted on July 27, 2014July 29, 2014 by airspeedvelocity

As teased in the previous article, suppose you want a new kind of lazy evaluation. You want to be able to generate a lazy sequence that represents the first n values of a container.

Here’s how you could add it to LazyRandomAccess:

extension LazyRandomAccessCollection {
    func first_n
      (n: LazyRandomAccessCollection.IndexType.DistanceType)
      -> LazySequence<
           PermutationGenerator<
             LazyRandomAccessCollection,
             Range<LazyRandomAccessCollection.IndexType>
           >
         > {
        let start = self.startIndex
        let end = min(self.endIndex, start.advancedBy(n))
        let range = start..<end
        let perm = PermutationGenerator(elements: self, indices: range)
        return lazy(perm)
    }
}

This uses PermutationGenerator, which is a lazy view class that returns a (sub-)sequence of values from a collection in the order given by another collection of indices into that collection. In first_n’s case, the indices are a range of the first n indices (or the full collection if there are fewer than n elements).

The return type of first_n is a bit crackers so I’ve broken it up over multiple lines. It returns a lazy sequence of a permutation generator of a lazy random access collection, permuted by a range of lazy random access collection indices. Phew.¹ This is why the lazy function and Swift’s type inference is more than just nice to have, it’s essential. The actual types you are using in a simple function can quickly get to a point where there’s no practical way you could declare them by hand.

The practice of returning another lazy object that wraps the results is copied from the lazy map and filter members, which do the same. Why wrap PermutationGenerator in another LazySequence instead of returning it directly? Chaining mostly I think.² Without doing that, you’d have to re-lazy the result if you wanted to run more lazy filters on it:

let r = 1...10
// with lazy wrapper
let l = lazy(r).first_n(5).filter(isOdd)
// without lazy wrapper
let l = lazy(lazy(r).first_n(5)).filter(isOdd)

That works for a collection,³ but what about a sequence? How can we generate a new sequence that only returns the first n values?

Instead of declaring that in one shot, we’ll follow the pattern of map and filter and first declare a view:

struct FirstNSequenceView<Base: Sequence> : Sequence {
    private let _n: Int
    private let _base: Base
    init(_ base: Base, _ n: Int) {
        self._base = base
        self._n = n
    }

    func generate()
      -> GeneratorOf<Base.GeneratorType.Element> {
        var i: Int = 0
        var g = _base.generate()
        return GeneratorOf {
            ++i <= self._n ? g.next() : nil
        }
    }
}

This uses GeneratorOf, which takes for its constructor a closure that serves up values for each call to next(). This helps avoid having to manually write a companion generator class when you implement your own sequence (IndexingGenerator is a similar helper that can be used to conform to Sequence when you implement your own Collection). In this case, generate() creates a closure that captures a counter that will count up, returning elements from the sequence until it hits n. It then returns a GeneratorOf that will call that closure each time next() is called on it. Since each time generate() is called, a new closure with a fresh i will be created, this means FirstNSequenceView conforms to the requirement of Sequence that it can be walked multiple times with multiple independent generators.

Now that we have this view, it’s easy to declare first_n for all the lazy views:

extension LazySequence {
    func first_n(n: Int) -> LazySequence<FirstNSequenceView<LazySequence>> {
        return lazy(FirstNSequenceView(self,n))
    }
}
extension LazyForwardCollection {
    func first_n(n: Int) -> LazySequence<FirstNSequenceView<LazyForwardCollection>> {
        return lazy(FirstNSequenceView(self,n))
    }
}
extension LazyBidirectionalCollection {
    func first_n(n: Int) -> LazySequence<FirstNSequenceView<LazyBidirectionalCollection>> {
        return lazy(FirstNSequenceView(self,n))
    }
}
extension LazyRandomAccessCollection {
    func first_n(n: Int) -> LazySequence<FirstNSequenceView<LazyRandomAccessCollection>> {
        return lazy(FirstNSequenceView(self,n))
    }
}

It’s a bit of a hassle having to implement each one separately, but that’s the price you pay for structs that aren’t in a hierarchy. All the more reason to factor the hard part out into a seperate view object.

Does first_n belong as a lazy view member? Unlike map and filter, it doesn’t take a closure so the laziness is less likely to bite you. Maybe take_while, a filter that returned members of an array until a supplied predicate returned false, would have been a better example. But first_n could take a lazy sequence that itself depended on a closure, so the laziness of first_n could still come as a surprise. Also, the chaining is nice. Does this mean other lazy sequences like enumerate should migrate into being lazy class members? I dunno, maybe.

Without the chaining, you have to wrap the results in a function call. But the function call goes on the left, while chained calls go on the right, which results in a confusing ping-pong effect:

let l = lazy(first_n(lazy(r).filter(isOdd),5)).map(...)

The ordering of this is not obvious at first glance, but it’s filter, then first_n, then map. But then where does it end? Should everything get piled into the lazy classes just to benefit from chaining? Nah. The better solution is to implement a pipe forward operator, of which more some other time.

I have a nagging feeling this could be simplified. The Swift standard lazy functions don’t look quite this bad, they use a type of S instead of a type of themselves for the inner parts of the return values of map and filter. But then the compiler doesn’t completely faithfully represent the generic types in the declarations you see in Xcode (which is why there is a lot of T == T in what we see of the Swift standard library). Let me know if can figure out a better way. ↩
Also, if someone option-clicks the results, it’s obvious at a glance that the results are lazy. ↩
In fact, while a variant of the permutation version would work for forward and bidirectional collections as well, it’s not a good idea, because advance will take O(n) which is unnecessary, as the next version will show. ↩

Lazy by name, lazy by nature

Posted on July 26, 2014March 9, 2015 by airspeedvelocity

When last we discussed lazy collections and sequences, I opened the article with an “ah-hah-hah, this doesn’t do what you might assume” number.¹ What map and filter returned were objects that would evaluate and return results lazily on demand, so when you called them, no mapping or filtering took place right away. Instead it happened later, when you accessed the elements of a MapFilterView and its kin.

Well, turns out Apple decided that cleverly not doing what people might expect isn’t necessarily the best move, so as of beta 4, map and filter return an Array. They still take in collections and sequences of any kind, but an array is what they spit out.²

This is probably for the best. If you didn’t realize map computed lazily, you could be surprised when the results changed each time you iterated over a map using a closure like this:

let r = 1...4
var i = 0
let m = map(r) { $0 * ++i }

for e in m {
    // loops over 1, 4, 9, 16
}
for e in m {
    // loops over 5, 12, 21, 32
}

Even the Swift dev team weren’t immune to the unexpected consequences of lazyness. There were some bugs in using FilterCollectionView to populate an Array, as it took two passes, one to determine the array size needed and another to populate the array. A predicate that returned fewer results on the first pass than the second would result in buffer overrun.

Explicitly lazy

Now, with beta 4, there’s no excuses for getting surprised by lazy evaluation. If you still want to be lazy, you first need to pass your sequence or collection into a call to lazy(), which will give you back a lazy view class. What you get back depends on what you pass in – if you pass in a sequence, you’ll get back a LazySequence. If you pass in a collection, you’ll get back one of the lazy collection structs – either LazyForwardCollection, LazyBidirectionalCollection, or LazyRandomAccessCollection.

These views get progressively more features depending on the capabilities of their base. LazySequence has lazy map and filter methods that work like the old lazy map and filter functions, by returning another LazySequence object.

It also has an array property for crystalizing the lazy results into an array. Should you decide at a later point you want to iterate over the collection more than once, you should use this. If you don’t, duplicate iterations will wastefully re-run the mapping or filtering function over and over.

This also means that you can now safely write this:

var i = 0
let lf = lazy(1...5).filter { _ in
    ++i % 2 == 0
}
let a1 = lf.array
// a1 is [2, 4]
let a2 = lf.array
// a2 is [1, 3, 5]
let a3 = lf.array
// a3 is [2, 4]

LazyForwardCollection only adds subscript, since forward-indexable collections can’t do much more than sequences.

Note, filter still returns a sequence, even when called on the lazy collections, to avoid the heartache described above where other collection constructors assumed they could rely a collection’s length being consistent. The results of map can be a collection, because it returns a value for every element in the base no matter what. That collection inherits the index properties of the base.

LazyBidirectionalCollection and LazyRandomAccessCollection add the ability to reverse lazily. So if you wanted to filter just the first few items starting at the back of a collection, you could call lazy(col).reverse().filter { ... }.

The collection returned by reverse can be used wherever you use a regular collection. If you’re a C++ programmer and you liked the benefits of rbegin/rend, this might be what you’re looking for:

let s = "The cat in the hat"
let rs = lazy(s).reverse()
if let idx = find(rs, "h") {
    // idx points to the h of hat
    // not the h of The
}

The lazy factory

How you get the best lazy view class is pretty cool. lazy is actually 4 overloaded generic functions:

func lazy<S: SequenceType>(s: S) -> LazySequence<S>
func lazy<S: CollectionType where S.Index: ForwardIndexType>(s: S) -> LazyForwardCollection<S>
func lazy<S: CollectionType where S.Index: BidirectionalIndexType>(s: S) -> LazyBidirectionalCollection<S>
func lazy<S: CollectionType where S.Index: RandomAccessIndexType>(s: S) -> LazyRandomAccessCollection<S>

When you call lazy, the Swift compiler picks the most specific overload possible, preferring more specialized inherited protocols over base ones. So CollectionType beats SequenceType, because CollectionType inherits from SequenceType. CollectionType where S.Index: RandomAccessIndexType beats CollectionType where S.Index: BidirectionalIndexType because RandomAccessIndexType inherits from BidirectionalIndexType.³ What is returned is an instance of another generic class, that implements a lazy view on any specific collection or sequence.

I don’t know if there’s an official term for this, but I call it a generic factory. It’s similar to the abstract factory design pattern, in that you call a function to get back one of a range of possible concrete types. But in this case, the type determination happens at compile time, and what you get back is not an implementation of an abstract interface, but the actual appropriate concrete type.

This all feels transparent to the caller because of Swift’s type inference capabilities. You call lazy, passing in your base object, assign the result to a variable, and then merrily start using it. But you aren’t constrained to an interface exposing only the common features of the possible concrete classes, like you would be with an interface and absract factory set-up. If you passed in collection that’s capable of it, you get a reverse method.

Other than help pick the best container type, lazy doesn’t do much. There’s nothing stopping you from declaring the lazy views directly:

let r = 1...500
let l = LazyBidirectionalCollection(r)
let evens = l.filter { $0%2 == 0 }

But if you were implementing your own set of classes generated by this generic factory pattern, you could also put common set-up code in your generic factory method (or even have a generic factory class if needed).

Stride aside

By the way, the new stride function in beta 4 follows a similar pattern of returning different types at compile time from an overloaded function. But in its case, the overloading isn’t done by what you pass in. It isn’t done by types at all.

func stride<T: Strideable>(from start: T, to end: T, by stride: T.Stride) -> StrideTo<T>
func stride<T: Strideable>(from start: T, through end: T, by stride: T.Stride) -> StrideThrough<T>

These two functions differ only by the name of their middle parameter. I don’t know about you, but that this was possible was an eye opener. Score one for the Objective-C named arguments enthusiasts.

Extending lazy

So what if you have your own idea for a lazily evaluated filter to apply to sequences or collections? Well, you could extend the lazy classes to support it. We’ll look at that in the next article. Follow @airspeedswift to catch it.

Which, frankly, was a bit smug of me. ↩
Which, in this authors opinion, means they should kill the Array members map and filter, since they’re now duplicative. They could still be special-cased for performance purposes. ↩
This behaviour can also be used to pick an optimized version of a collection algorithm that takes advantage of random access e.g. to find a subsequence inside a collection. ↩

Swift’s Lazy Collections and Sequences

Posted on July 10, 2014March 9, 2015 by airspeedvelocity

Warning: this article is out of date as of Swift 1.0 beta 4. Read this updated version instead.

The Swift non-member function map acts much like the Array.map member function, except you can pass in any kind of sequence or collection (click here for an article on how to write an algorithm like this).

But if you look at what map returns, it isn’t an array of results. It’s an object (a MapColectionView or a MapSequenceView).

That’s because map evaluates its results lazily. When you call map, no mapping takes place. Instead, the input and mapping function are stored for later, and values are only mapped as and when they are needed.

To see this happening, run the following code:

let r = 1...3
let mapped = map(r) {
    (i: Int) -> Int in
    println("mapping (i)")
    return i*2
}

println("crickets...")

for i in mapped {
    println("mapped (i)")
}

println("tumbleweed...")

println("index 2 = (mapped[2])")

// prints out
crickets...
mapping 1
mapped 2
mapping 2
mapped 4
mapping 3
mapped 6
tumbleweed...
mapping 2
index 2 = 4

The returned object from map mimics the properties from the object passed in. If you pass in a sequence, then you get back a sequence. If the collection you passed in only supported a bidirectional index (such as a Dictionary or a String, which can’t be indexed randomly), then you won’t be able to subscript arbitrarily into the mapped collection, you’ll have to walk through it.

map is not the only lazy function. filter and reverse also return these “view” classes. Combining these different lazily evaluated classes means you can build up quite complicated expressions without sacrificing efficiency.

Want to search backwards through an array or string? find(reverse(myarray), value) won't actually reverse the array to search through it, it just iterates over the ReverseView class, which serves up the array in reverse order.

Want the first 3 odd numbers that are a minimum of two arrays at each position? first_n(filter(map(Zip2(a1, a2), min) {$0%2 == 1}, 3) wont do any more zipping or minning or modding than necessary.¹²³

There are also several lazily generated sequences you can also use:

enumerate numbers each item in a collection — it returns an EnumerateGenerator, that serves up those numbered elements on demand. This is useful in a for (count, val) in enumerate(seq) loop where you want to know both the value and how many values so far.

PermutationGenerator is initialized with a collection and a collection of indices into that collection, and serves up each element at each index in that order (thus allowing you to lazily reorder any collection – for example, PermutationGenerator(elements: a, indices: reverse(a.startIndex..<a.endIndex)) would be the reverse of a.⁴

GeneratorOf just takes a closure and runs it each time to generate the next value. So var i = 0;var naturals = GeneratorOf({ ++i }) sets naturals to a sequence that counts up from 1 to infinity (or rather, to overflow).

These infinite virtual sequences can be pretty useful. For example, to do the same thing as enumerate, you could write Zip2(naturals, somecollection). Or you could pass your infinite sequence into map or filter to generate further inifinite sequences.

(at the opposite end, if you have a single value, but what a function needs is a Sequence, you can use GeneratorOfOne(value) to create a quick sequence that will just serve up just your one value)

It's not all rainbows and unicorn giggles though. Imagine you did the following:

let r = 1...10_000
mapped = map(r) { megaExpensiveOp($0) }

let a1 = mapped
let a2 = mapped

Here the lazy evaluation will work against you. megaExpensiveOp will be run not ten but twenty thousand times.

“Shouldn't map cache the data?” you ask. Well that leads to the next complication. Take this code:

var i =0
let mapped = map(1...5) {
    i += 1
    return $0 + i
}

Every time you iterate over mapped now, you'll get different results.⁵

This behaviour might be put to very good use (say you wanted to peturb a data set with small random values). But if you weren't expecting it, that could be one nasty bug to track down.

So its good to know when you're using these functions what they are actually doing under the hood. Just keep in mind these caveats and an exciting life of lazy evaluation awaits you in the off-world colonies.

OK I cheated, first_n isn't a Swift standard library function. But it should be! ↩
This code would look a lot nicer if Swift came with a pipe operator ↩
Yeah, this is a pretty rubbish example, I should have put the time in to think of a better real-life use case. ↩
Obviously, reverse(a) would work better, but a more useful example is tough to fit on one line… ↩
If you're unclear on how come, try reading this article. ↩

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

Airspeed Velocity

African or European Swift?

lazy evaluation