The Scala Collection API Sucks … Or is it a Work of Beauty?

A commenter on my last post was a little … irritated by the Scala Collection API (and others probably are too):

Scala libraries are a mess. Its baffling to look at the ScalaDoc for something like List, which should be simple. And then discover it extends 30 other classes and has 50 traits.

And I must admit that the first time I looked at it I was kind of confused. Let's look at a simple example: List, or scala.collection.immutable.List to be precise. It inherits directly from LinearSeq, Product, GenericTraversableTemplate and LinearSeqOptimized, and indirectly from: LinearSeqOptimized, Product, LinearSeq, LinearSeq, LinearSeqLike, Seq, SeqLike, PartialFunction, Iterable, IterableLike, Equals, Traversable, Immutable, GenericTraversableTemplate, TraversableLike, TraversableOnce, FilterMonadic, HasNewBuilder, AnyRef and Any. (LinearSeq shows up twice because ScalaDoc lists both the scala.collection.immutable and the scala.collection variant.) Most of these have type parameters, making them even more intimidating.

Obviously you are tempted to compare it to, let's say, java.util.ArrayList. It implements or inherits only the following interfaces and classes: Object, AbstractCollection, AbstractList, Cloneable, Collection, Iterable, List, RandomAccess, Serializable.

Why the huge difference? Scala's collection API is extremely modular. Every little aspect of what the unsuspecting developer might consider a simple class is factored into its own little trait, which gets reused for various ‘collections’. Let’s try to tear it apart.

If you consider the concept of a List you might say a List is just a bunch of elements I can access by their index. This is probably the way most developers think about lists most of the time. And that's OK. Actually, if you are operating on that level Scala is easier to work with than Java. In Java you write code like

  List<String> myList = Arrays.asList("one", "two");
  String one = myList.get(0);

in Scala you’ll simply write

  val myList = List("one", "two")
  val one = myList(0)

The Scala version is shorter, you don’t have to know any class other than List, and you don’t have to specify the type parameter, since it is inferred.

But if you are a specialist in collections you’ll know there is much more to a list.

Different implementations of lists may differ in the performance of various methods like

  • accessing the first element
  • accessing the last element
  • adding an element at the beginning or at the end

When this kind of difference is important to you, in Java you take care of it by using an appropriate implementation, like ArrayList or LinkedList. But this really overspecifies your requirements, since you don’t want the specific implementation; you only want a specific runtime behavior.

Asking only for the runtime behavior is exactly what you do when you specify that you want a LinearSeq (efficient access to the first element and the rest) or an IndexedSeq (efficient access by index) in Scala. Sorry Java, Scala wins again.
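
As a minimal sketch of that idea, the two methods below ask for a runtime behavior instead of a concrete class. The object name SeqKinds and both helper methods are made up for this illustration; only LinearSeq, IndexedSeq, List and Vector come from the standard library.

```scala
import scala.annotation.tailrec
import scala.collection.immutable.{IndexedSeq, LinearSeq}

object SeqKinds {
  // Hypothetical helper: walks the sequence via head and tail,
  // so it asks for a LinearSeq, where both operations are cheap.
  def sumLinear(xs: LinearSeq[Int]): Int = {
    @tailrec
    def loop(rest: LinearSeq[Int], acc: Int): Int =
      if (rest.isEmpty) acc else loop(rest.tail, acc + rest.head)
    loop(xs, 0)
  }

  // Hypothetical helper: jumps straight to an index,
  // so it asks for an IndexedSeq, where that is cheap.
  def middle[A](xs: IndexedSeq[A]): A = xs(xs.length / 2)
}
```

A List can be passed to sumLinear and a Vector to middle. Passing a List to middle should be rejected by the compiler, since List is not an IndexedSeq.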

There are two fundamentally different ways of dealing with a List (or actually with any collection): mutable or immutable. And of course there is the case when you just don’t care.

In Scala you can specify this by using the traits defined in scala.collection (if you don’t care), scala.collection.immutable or scala.collection.mutable. In Java you’ll wrap your instances with Collections.unmodifiableList(..) calls, which is quite noisy. And there is really no (clean) way to ensure that a given method only gets passed one or the other variation. In Scala the compiler will make sure you get the correct variant or a compile time error.
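
A small sketch of how this looks in method signatures. The object Variants and its three methods are hypothetical names chosen for the example; the traits are the ones from the three packages mentioned above.

```scala
import scala.collection.{immutable, mutable}

object Variants {
  // Accepts only immutable sequences: the data cannot change underneath us.
  def snapshotSize(xs: immutable.Seq[String]): Int = xs.size

  // Accepts only mutable buffers: the signature announces in-place modification.
  def appendDone(xs: mutable.Buffer[String]): Unit = {
    xs += "done"
    ()
  }

  // Accepts either variant through the general scala.collection.Seq.
  def firstOrEmpty(xs: scala.collection.Seq[String]): String =
    xs.headOption.getOrElse("")
}
```

Calling appendDone with a List, or snapshotSize with an ArrayBuffer, should fail at compile time, while firstOrEmpty takes both.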

When broadening the discussion to collections in general you’ll want to iterate through all elements. Or do you want to enumerate all elements? What is the difference anyway? Here Java bites you with its stubborn backwards compatibility, leaving you with two very similar interfaces which mainly differ in their respective time of conception: Iterator and Enumeration.

In Scala you have two traits as well: Traversable and Iterable. These look very similar but have specific, different meanings. Traversable only requires a foreach method, so it guarantees that each element gets processed, but possibly in parallel. Iterable additionally provides an iterator, which guarantees that the elements are processed one after the other. (As of now the actual implementation of parallel collections is a work in progress, but the traits are already there.) One more point for Scala.

On top of that, Scala provides a lot of methods on collections which only make sense in the presence of closures, like map, filter and many more. Obviously there is nothing comparable in the Java libraries.
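
To give a feeling for these closure based methods, here is a short sketch. The object name Closures and the sample data are invented for the example; map, filter and foldLeft are standard collection methods.

```scala
object Closures {
  val words = List("one", "two", "three")

  // map applies the closure to every element and collects the results.
  val lengths = words.map(_.length)

  // filter keeps only the elements for which the closure returns true.
  val shortWords = words.filter(_.length == 3)

  // foldLeft combines all elements, starting from an initial value.
  val total = lengths.foldLeft(0)(_ + _)
}
```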

But when everything is so nice, why do people make a statement like the one quoted at the beginning of this article? Well, all the different aspects mentioned above come in their own separate traits. Then there are the XXXLike traits, which implement all the methods of XXX based on very few methods, so that you only have to provide those few when you want to create an implementation of XXX. Then there are companion objects which provide a nice, concise way to instantiate default implementations of the collection at hand. And finally you have a similar set of traits for dealing with various aspects of the functional/closure based features of collections. Therefore you end up with a huge number of traits. If you want to know where method x is actually implemented you might have to do quite some searching. That's the cost of modularity.
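
The "implement very few methods, get the rest for free" idea can be sketched with a tiny custom collection. Countdown and CountdownDemo are hypothetical names made up for this example. The sketch extends Iterable directly and assumes that iterator is the only abstract method; which trait actually supplies the derived methods depends on the library version.

```scala
// Hypothetical collection, invented for this sketch: counts down from `from` to 1.
class Countdown(from: Int) extends Iterable[Int] {
  // The only method we implement ourselves.
  def iterator: Iterator[Int] = Iterator.range(from, 0, -1)
}

object CountdownDemo {
  val c = new Countdown(3)

  // None of the following methods is defined in Countdown; all are inherited.
  val doubled = c.map(_ * 2).toList
  val odd = c.filter(_ % 2 == 1).toList
  val total = c.sum
}
```

Finding out where map or filter is really implemented for Countdown is exactly the kind of searching described above.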

One thing to lessen the cost of modularity is documentation. Concerning Scala collections you should obviously consult ScalaDoc. Note that in ScalaDoc you can click on the different traits extended by a class to hide or show the methods coming from that trait. There is also a nice documentation of the different collection variants written by Odersky which will help you decide which trait you really need. And there is another piece by Odersky which describes the internal workings of the collection API.

So when starting with Scala, just start with Set, Seq (instead of List) and Map from scala.collection to get to the Java level. And from there start exploring.
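
A possible starting point along these lines, as a sketch only. The object name Starter and the sample values are made up; the three traits and their companion objects come from scala.collection.

```scala
import scala.collection.{Map, Seq, Set}

object Starter {
  // The companion objects pick a default implementation for us.
  val numbers: Seq[Int] = Seq(1, 2, 3)
  val unique: Set[String] = Set("a", "b", "a")
  val ages: Map[String, Int] = Map("ann" -> 31, "bob" -> 27)
}
```

Code written against these three types does not care whether it receives a mutable or an immutable implementation.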

