So, you’ve figured out the whole indices versus distances situation, and now you’re feeling pretty happy and you start trying out some more generic collection algorithms. And before long you smack your head against one of the more confusing things about generic collections: a sequence’s SubSequence
doesn’t necessarily contain the same element type as that sequence. This usually manifests itself in one of those compiler errors that has you yelling “what are you talking about you infernal machine” at the compiler before eventually realizing what you’ve done wrong and sheepishly fixing it.
For example, suppose you want to find the location of the first subsequence in a collection. Like indexOf
, but for another sequence rather than an element. The simple brute-force approach would be to iterate over every subsequence, trying to match it. Something like this – except it won’t compile:
extension CollectionType where Generator.Element: Equatable { func search< Other: SequenceType where Other.Generator.Element == Generator.Element >(pattern: Other) -> Index? { for idx in self.indices { // error: cannot convert value of type 'Other' // to expected argument type... if suffixFrom(idx).startsWith(pattern) { return idx } } return nil } }
Until you know why, this can be very confusing. You’ve constrained the other sequence to have the same elements as the collection, so why can’t you use startsWith
on it?
Except you’re not calling startsWith
on Self
. You’re calling it on Self.SubSequence
, because that’s what suffixFrom
returns – it’s a slicing operation. And SubSequence.Generator.Element
isn’t guaranteed to be the same as Self.Generator.Element
.
Here’s the relevant declaration:
protocol CollectionType { typealias SubSequence : Indexable, SequenceType = Slice<Self> }
There’s nothing about this definition that forces the elements of SubSequence
to be anything specific. Really, this needs a where
clause on the typealias – something that Swift doesn’t currently support:1
protocol CollectionType { typealias SubSequence : Indexable, SequenceType // not (yet?) valid Swift where SubSequence.Generator.Element == Generator.Element }
So, in the mean-time, the way around this is to specify any requirements like this as part of the declaration. A fix that might seem to work, but is over-constraining, would be to require subsequences be the same type:
extension CollectionType where Generator.Element: Equatable, SubSequence == Self { // etc }
This would compile, and it would work fine with types whose slices are the same type as themselves, like strings:
// a slice of a String.CharacterView is // a String.CharacterView, so this is fine "hello".characters.search("ll".characters)
but not for arrays, who have an ArraySlice
type, or arbitrary collections, that have the default Slice
implementation.
// "type of expression is ambiguous without more context", apparently [1,2,3,4].search(2…4) // "ambiguous reference to member '..<'" uhhh what? (0..<5).search(2...4)
Instead, you can constrain the elements to be the same while allowing the slice type itself to differ:
extension CollectionType where Generator.Element: Equatable, // guarantee subsequences contain the same element SubSequence.Generator.Element == Generator.Element { func search< Other: SequenceType where Other.Generator.Element == Generator.Element >(pattern: Other) -> Index? { for idx in self.indices { // and now this line will happily compile... if suffixFrom(idx).startsWith(pattern) { return idx } } return nil } } [1,2,3,4].search(2...4) // returns 1 (0..<5).search([3,4]) // returns 3
Optimizing for Random Access
It’s often the case that you can optimize a collection algorithm if you know it has random access. For example, given the above algorithm, you can tweak it to avoid searching for a pattern that is longer than what remains of the collection, so long as both the collection and the pattern are random access and therefore able to calculate their lengths (and the lengths of their collection) in constant time. OK, maybe it’s not Boyer-Moore, but it’s better than nothing and the code to do this is pretty simple, in fact it’s more constraints than code at this point:
extension CollectionType where Generator.Element: Equatable, Index: RandomAccessIndexType, SubSequence.Generator.Element == Generator.Element { func search< Other: CollectionType where Other.Index: RandomAccessIndexType, Other.Index.Distance == Index.Distance, Other.Generator.Element == Generator.Element >(pat: Other) -> Index? { // if pattern is longer, this cannot match, exit early guard !isEmpty && pat.count <= count else { return nil } // otherwise, search from the start up to the // end less the space for the pattern for i in startIndex...endIndex.advancedBy(-pat.count) { // check if a slice from this point // starts with the pattern if self.suffixFrom(i).startsWith(pat) { // if it does, we've found it return i } } // otherwise, not found return nil } }
(one part of the where clause, Other.Index.Distance == Index.Distance
, is maybe overconstraining, but the alternative would be fiddling around with casts – non-Int
distances aren’t exactly common, even Bit
’s distance is an Int
).
We’ve talked about overload priority a long time ago, and it still pertains even with protocol extensions. The most specific function wins. Here, if your two collections are random access, this version will be called because it is more tightly constrained than the first version. So:
// calls the version that works for any index "hello".characters.search("ll".characters) // calls the version optimized for random-access [1,2,3,4].search(2...4) // returns 1?
Now, there is a potentially nasty bug you could stumble into here, and the compiler won’t save you from it. Try removing the constraints requiring the indices be random access on the second version. It still compiles! Except, now this version will be called strings as well. But in the case of non-random indices, it’s not an optimized version, it’s potentially slower. The reason being, pat.count
and endIndex.advancedBy
aren’t constant time, they’re linear, because they involve incrementing the index one by one. Now it could be worse – at least it’s not accidentally quadratic, but other examples could easily be (imagine implementing a sort that assumed constant-time advance but didn’t get it), so it’s something to watch out for.
Protocols, Extensions and Dynamic Dispatch
Once upon a time, it was harder to have made this mistake. In earlier versions of the standard library there were two ways to advance an index: a free advance
function, that was O(1)
for random access indices and O(n)
otherwise, and then there was an advancedBy
method on only the RandomAccessIndexType
.
As part of the great extensionizing, the free advance
function went away, and advancedBy
moved into ForwardIndexType
. So now you can call it on any index type, but you’ll get different performance characteristics depending on the type you call it on.
If you look at the declaration of ForwardIndex.advancedBy
, you’ll see that it’s declared twice. Once inside the protocol:
public protocol ForwardIndexType : _Incrementable { public func advancedBy(n: Self.Distance) -> Self }
and then again just below, as an extension:
extension ForwardIndexType { public func advancedBy(n: Self.Distance) -> Self }
This seems a little weird. Why declare it twice? Why not just declare it once as an extension? It has to do with static versus dynamic dispatch, and which actual implementation is called.
To show this, let’s create our own toy index, initially without random access:
struct MyIndex { let i: Int } extension MyIndex: BidirectionalIndexType { func successor() -> MyIndex { print("successing!") return MyIndex(i: i+1) } func predecessor() -> MyIndex { print("predecessing!") return MyIndex(i: i-1) } } // indices need to be Equatable func == (lhs: MyIndex, rhs: MyIndex) -> Bool { return lhs.i == rhs.i }
I’ve put prints in there so you can tell when the functions are called. Now, this index type will have the default advancedBy
implementation, which just calls successor()
multiple times. And because of the print
, you can see it happening:
func castHolyHandGrenadeOfAntioch< I: ForwardIndexType >(idx: I) { idx.advancedBy(3) } let idx = MyIndex(i: 1) castHolyHandGrenadeOfAntioch(idx) // prints “successing!” 3 times
OK, now, we conform the type to be random access by giving it advance and distance functions:
extension MyIndex: RandomAccessIndexType { func advancedBy(n: Int) -> MyIndex { return MyIndex(i: i+n) } func distanceTo(end: MyIndex) -> Int { return end.i - i } }
and now, when we call the castHolyHandGrenadeOfAntioch
, we see nothing printed. Instead, even though the function only required a forward index, it uses the better advancedBy
implementation of the random-access index.
But… this only works because advancedBy
had been declared in the original ForwardIndexType
protocol declaration. If, on the other hand, we added another method as an extension to different index types – say, a retreatedBy
function – this would not happen.
First, let’s do that for BidirectionalIndexType
:
extension BidirectionalIndexType { // just an example, you don't have to do this! // since BidirectionalIndexType also supports // advancedBy(-distance) func retreatedBy(n: Distance) -> Self { var i = n var result = self // since Distance isn’t strideable, this // is a rare case where you might mourn the // loss of C-style for and the ++ operator… while i > 0 { result = result.predecessor() i -= 1 } return result } }
and then, an optimized version for RandomAccessIndexType
:
extension RandomAccessIndexType { func retreatedBy(n: Distance) -> Self { return self.advancedBy(-n) } }
But now, if we repeat our earlier experiment but calling this function, we get a different result:
// only requires bidirectional access func runAway<I: BidirectionalIndexType>(idx: I) { idx.retreatedBy(10) } // and even though MyIndex is random access let braveSirRobin = MyIndex(i: 10) // the bidirectional version is called... runAway(braveSirRobin) // prints “predecessing!” 10 times
So we can see the difference between default implementations that are also declared as part of the protocol, versus ones just created as extensions. With the former, generic functions get to take advantage of specialized implementations, whereas with the latter, only the implementation that matches the generic constraint will be called. This is why you often see functions declared twice in the standard library. For example, on SequenceType
, underestimateCount
is declared like this, so that if you pass in a collection to a function that takes a sequence, it can check it’s length if it’s a collection, without risking consuming the sequence.
- That’s assuming that there isn’t a legitimate reason to vary elements in subsequences which I don’t… think… there is? ↩
[…] Airspeed Velocity: […]
I think (but am not certain) this implementation of ‘retreatedBy’ is simpler:
extension BidirectionalIndexType where Distance: SignedIntegerType {
func retreatedBy (n: Distance) -> Self {
return (0..<n).reduce(self) { (i,_) in i.predecessor() }
}
}
This requires constraining Distance, but I don't think that is a problem because as far as I can tell all Distance typealiases on Index types in the Standard Library are either Int or IntMax (a.k.a. Int64). In any case it's hard to imagine any Distance that is not at least an IntegerType (it is defined as a "number of steps"). Presumably BidirectionalIndexType.Distance is defined as _SignedIntegerType in the standard library for technical reasons.
I see my post was victim to one of the few disadvantages of using tabs for indentation 🙂
As someone who fights continually with wordpress to not have my generic placeholders stripped as HTML tags… I feel your pain.
Yeah I thought about mentioning that… the ideal would be that Distance be Strideable… but it isn’t so you either have to constrain, or just rely on -=.
[…] the where clauses. We know the index needs to be random-access. We also rely on slicing, so per the previous article on this topic, we need to require SubSequence contains the same elements as the […]