Randomly Lazy

In the previous article on extending lazy, we extended the lazy views with first_n, which returned a sequence of the first n elements of a sequence.

But this isn’t great if what you passed in was a random-access indexed collection. Where possible, it’s nice to retain random access on the output of your lazy function. The lazy map does this, for example. If you pass an array into lazy then call map, the collection you get back is a LazyRandomAccessView, and you can index randomly into it just like you could into an array.

To do this for first_n, we need a struct that implements Collection, takes a Collection as an initializer plus a count n, and layers a view over the top of that collection that only exposes the first n elements.

Or, in the more general case, a view that takes a collection and a sub-range within that collection. Here it is:

struct SubrangeCollectionView<Base: Collection>: Collection {
    private let _base: Base
    private let _range: Range<Base.IndexType>
    init(_ base: Base, subrange: Range<Base.IndexType>) {
        _base = base
        _range = subrange
    }

    var startIndex: Base.IndexType {
      return _range.startIndex
    }
    var endIndex: Base.IndexType {
      return _range.endIndex
    }

    subscript(idx: Base.IndexType)
        -> Base.GeneratorType.Element {
        return _base[idx]
    }

    typealias GeneratorType
        = IndexingGenerator<SubrangeCollectionView>
    func generate() -> GeneratorType {
        return IndexingGenerator(self)
    }
}

This sub-range collection view is so useful that I was sure it was somewhere in the Swift standard library. That’s partly why I wrote this article on collection and sequence helpers – as a side-effect of going line by line looking for it. I’m still expecting someone to reply to this article saying “no you fool, you just use X”. If you are that person, call me a fool here.

The Sliceable protocol seems designed to provide this function. It requires a collection to typealias a SliceType and implement a subscript(Range)->SliceType method. But you need the collection to implement sliceable, which seems overly restrictive, whereas the view above works for any collection.

SubrangeCollectionView enables you to pass in a sub-range of any collection to any algorithm that takes a collection or sequence. For example:

let r = 1...10
let halfway = r.startIndex.advancedBy(5)
let top_half = SubrangeCollectionView(r,
                subrange: halfway..<r.endIndex)
reduce(top_half,0,+)  // returns 6+7+8+9+10

A nice feature is that if you pass a subrange collection into an algorithm that returns an index, such as find, the index returned can also be used on the base collection. So in the example above, if you ran if let idx = find(top_half,8), you could then use idx on not just top_half[idx], but also r[idx].

Operating on subranges is a capability that C++ developers, used to the STL, will be missing from Swift collection algorithms. In the STL, operations on containers are performed not by passing the container itself to the algorithm, but instead by passing two iterators on that container defining a range.

For example, here is the definition of the find algorithm in the C++ STL compared to the Swift version:

// C++ std::find
template <class InputIterator, class T>
  InputIterator find
    (InputIterator first, InputIterator last, const T& val);

// Swift.find
func find
  <C: Collection where C.GeneratorType.Element: Equatable>
    (domain: C, value: C.GeneratorType.Element) -> C.IndexType?

While Swift.find takes a collection as its first argument, std::find takes a first and last iterator.

STL container iterators are similar to Swift collection indices, in that they can be forward, bidirectional or random, and are used to move up and down the collection. Like Swift indices, the end iterator points to one past the last element – the equivalent of Swift.find returning nil for not found would be std::find returning end.

But unlike Swift indices, STL iterators model C pointer-like behaviour, so they are dereferencable. To get the value the iterator points to, you don’t have to have access to the underlying container, using something like Swift’s subscript. To get the value at first, you call *first. This is why std::find doesn’t need to take a container in it’s input. It just increments first until either it equals last, or *first equals val.

There are lots of reasons why the STL models iterators like this: because it’s like pointers into C arrays; because it dodges memory management issues (which won’t apply to Swift) etc. But also because, with this model, every algorithm will work just as well on a sub-range of a container as on the whole container. You don’t have to pass the starting or ending iterator for the container into find, you can pass in any two arbitrary points.¹

Swift indices, on the other hand, are pretty dumb beasts. They don’t have to know about what value they point to or what collection they index. Heck, the index for Array is just an Int.²

For all the advantages of Swift’s just passing in collections as arguments has (it looks a lot cleaner, doesn’t confuse beginners),³ it lacks this ability to apply algorithms to subranges. SubrangeCollectionView gives you that ability back.

Anyway, back to the original problem, which was to enhance LazyRandomAccessView with a version of first_n that returns another LazyRandomAccessView. With the help of SubrangeCollectionView, here it is:

extension LazyRandomAccessCollection {
  func first_n
    (n: LazyRandomAccessCollection.IndexType.DistanceType)
    -> LazyRandomAccessCollection<SubrangeCollectionView<LazyRandomAccessCollection>> {
      let start = self.startIndex
      let end = min(self.endIndex, start.advancedBy(n))
      let range = start..<end
      let perm = SubrangeCollectionView(self, subrange: range)
      return lazy(perm)
    }
}

let a = [1, 2, 3, 4, 5, 6, 7]
let first3 = lazy(a).first_n(3)
first3[1]  // returns 2

One final interesting question is whether to do the same for forward and bidirectional lazy views as well. The problem here is computing the endIndex for the view. Unlike with a random-access index, you can’t just advance them by n in constant time. It takes O(n) because the index has to be walked up one by one. For this reason, I’d stick with returning sequences for these.

Of course, you can also pass a later iterator to first than to last, at which point, kaboom! A risk which passing in the container itself eliminates. ↩
Of course, you could write an über-index type for your collection, that did understand about what it pointed at. Maybe you could even overload the * operator for it to dereference itself. ↩
And lets face it, you operate on the whole collection 99.9% of the time. ↩

2 thoughts on “Randomly Lazy”

Tuple-wrangling | Airspeed Velocity says:

August 2, 2014 at 12:24 pm

[…] my recent posts, my mind immediately went to making unzip lazy. But before we get to that, let’s muck about […]

Null Pointer Exceptions fixed, next up… | Airspeed Velocity says:

August 6, 2014 at 5:51 pm

[…] about their container would have the side benefit of allowing them to become dereferenceable (see a previous post for more on […]

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

	Sorting Nibbles in S… on Sorting Nibbles in Swift
	Collection Data Stru… on Arrays, Linked Lists and …
	Swift化零为整：Reduce 详解… on Arrays, Linked Lists and …
	Writing A Generic St… on Generic Collections, SubSequen…
	Writing A Generic St… on Collection Indices, Slices, an…

Airspeed Velocity

African or European Swift?

Randomly Lazy

2 thoughts on “Randomly Lazy”

Leave a comment Cancel reply

Share this:

2 thoughts on “Randomly Lazy”

Leave a comment Cancel reply