This course has already ended.

Luet oppimateriaalin englanninkielistä versiota. Mainitsit kuitenkin taustakyselyssä osaavasi suomea. Siksi suosittelemme, että käytät suomenkielistä versiota, joka on testatumpi ja hieman laajempi ja muutenkin mukava.

Suomenkielinen materiaali kyllä esittelee englanninkielisetkin termit. Myös suomenkielisessä materiaalissa käytetään ohjelmaprojektien koodissa englanninkielisiä nimiä kurssin alkupään johdantoesimerkkejä lukuunottamatta.

Voit vaihtaa kieltä A+:n valikon yläreunassa olevasta painikkeesta. Tai tästä: Vaihda suomeksi.


Chapter 7.1: Streams

About This Page

Questions Answered: How do I write a program that reads in an unspecified amount of input or repeats some other operation a initially unknown number of times? How can I operate on a bunch of data one at a time without first storing all of it in memory?

Topics: Finite and infinite streams. By-name parameters.

What Will I Do? Read and program.

Rough Estimate of Workload:? A couple of hours? This is a fairly long chapter but much of it is optional reading.

Points Available: A190. (There isn’t all that much to do, but the points value is high because the assignments are intended as near-mandatory.)

Related Projects: Sentiments (new), HigherOrder.

../_images/person04.png

Introduction

As a warm-up for the main subject of this chapter, streams, let’s consider another topic.

Suppose we intend to create a function fiveTimes. This function should return a five-element vector where each element is the value of a given Int expression:

fiveTimes(100)res0: Vector[Int] = Vector(100, 100, 100, 100, 100)
fiveTimes(1 + 1)res1: Vector[Int] = Vector(2, 2, 2, 2, 2)

We really want five separate evaluations of the same given expression. If different evaluations yield different results, that should be reflected in the output:

import scala.util.Randomimport scala.util.Random
fiveTimes(Random.nextInt(100))res2: Vector[Int] = Vector(43, 55, 21, 46, 87)
fiveTimes(Random.nextInt(100))res3: Vector[Int] = Vector(33, 65, 62, 31, 73)

Here’s an implementation for fiveTimes. Assess it.

def fiveTimes(numberExpression: Int) =
  Vector.tabulate(5)( anyIndex => numberExpression )

Which of the following claims are correct? Select all that apply.

Chapter 4.2 introduced you to the Option class and its method getOrElse.

Some(100).getOrElse(0)res4: Int = 100
None.getOrElse(0)res5: Int = 0
vectorOfWords.lift(-123).getOrElse("no word at that index")res6: String = "no word at that index"

Above, we used a literal as getOrElse’s parameter, but we could have used a compound expression as well:

None.getOrElse(1 + 1)res7: Int = 2
vectorOfNumbers.lift(-5).getOrElse(Random.nextInt(10))res8: Int = 8

There are a couple more expressions below. Work out how they behave.

Some(123).getOrElse(1 / 0)
Some(123).getOrElse(Random.nextInt(100))

Which of the following is correct?

Passing Unevaluated Parameters

Here’s a version of fiveTimes that works. It’s nearly identical to our earlier attempt:

def fiveTimes(numberExpression: =>Int) =
  Vector.tabulate(5)( anyIndex => numberExpression )
There is a lone arrow in front of the parameter type. It means that this is an unevaluated parameter, better known as a by-name parameter.

As we just re-established, ordinary parameters — also known as by-value parameters — are evaluated before the function call begins. What gets passed in is a value. In contrast, a by-name parameter is not evaluated before the function call begins: what gets passed in is an unevaluated expression. That parameter expression will be evaluated only when the function body uses it (if in fact it does use it). That is, the parameter is evaluated while the function is already running.

In case a function uses a by-name parameter multiple times, the parameter gets evaluated multiple times:

def fiveTimes(numberExpression: =>Int) =
  Vector.tabulate(5)( anyIndex => numberExpression )
tabulate forms a five-element vector, placing an Int at each of the indices from 0 to 4. Each element is obtained by evaluating numberExpression. If numberExpression is, say, Random.nextInt(100), five distinct random numbers will be placed in the vector.

You can think of by-name parameters as a sort of parameterless function that is passed to another function as a parameter.

Here’s an illustrated example of a by-name parameter:

In O1, you aren’t required to define methods that take by-name parameters. However, you will need to use such methods. There’s nothing much to it, and indeed you have already done so, since getOrElse on Options takes a by-name parameter. getOrElse has been defined to either a) return the contents of the Option wrapper and leave the parameter untouched, or b) evaluate the parameter expression and return the result if there’s nothing in the Option. This is why the method works as discussed above.

Another familiar example

In Chapter 4.4 you saw that the subexpression to the right of the logical operators && and || is evaluated only in case the subexpression on the left isn’t enough to decide the value of the logical expression. You also know that each of those operators is actually a method on Boolean objects (Chapter 4.5); the “subexpression to the right” is actually a by-name parameter of that method.

More jargon

If a parameter expression is evaluated at least once, the parameter is termed strict (tiukka). The parameter of fiveTimes is strict; so are all by-value parameters. If there is a possibility that a parameter does not get evaluated at all, it’s termed non-strict (väljä); getOrElse’s parameter is one example.

Now to the main subject of this chapter.

Challenge: An Unknown Number of Repetitions

So far in O1, we have repeated commands by gathering all the necessary data in a collection, then working through that collection one element at a time. The number of repetitions has equalled the size of the collection, which has been known to our programs by the time we start traversing the collection. For example, we have often collected data in a vector, which is wholly stored in the computer’s memory; we have then used a for loop or a higher-order method to do something with some or all of the vector’s elements.

In contrast, consider these common scenarios:

  • We intend to read inputs from the user for the program to process until the user indicates they wish to stop. The program doesn’t know in advance how many inputs the user will enter before stopping; the user may not know, either.
  • We intend to process data from a source (a file, a network, a physical sensor governed by the program, etc.), processinga single element at a time. We don’t know the number of elements before starting and there could be so many of them that we don’t know if it’s possible to store all of them at once in the computer’s memory.
  • We intend to compute increasingly precise approximations of a desired quantity by repeatedly applying a particular computation; Newton’s method for finding roots is an example. We need an iterative process that applies the computation to the previous result until we reach sufficient precision. We don’t know in advance how many iterations it will take before that happens.

More generally still: we wish to repeat an operation an initially unknown number of times. In principle, there is no limit to the number of repetions; we might have a source that continuously generates more data, for instance.

A more concrete example

Let’s write a toy program that reports the lengths of four strings. There is nothing new here yet.

object LengthReport extends App {
  def report(input: String) = "The input is " + input.length + " characters long."
  def lines = Vector("a line of text", "another", "not written by the user", "but hardcoded into the program")
  lines.map(report).foreach(println)
}
We have an auxiliary function that forms a report of a single input.
Another function gives us a collection with some example data. In this first version of the program, that collection is a vector and contains five specific strings.
The last command uses these functions: we take the strings, generate a report from each one, and print out the reports.

Now, let’s undertake to edit that program so that it works like this, in the text console:

Enter some text: hello
The input is 5 characters long.
Enter some text: hello again
The input is 11 characters long.
Enter some text: third input
The input is 11 characters long.
Enter some text: fourth
The input is 6 characters long.
Enter some text: stop already
The input is 12 characters long.
Enter some text: stop
The input is 4 characters long.
Enter some text: please
Our program should not use any standard set of strings. It should prompt the user for input.
It should keep doing that until the user enters the word "please". Before that, the user may any number of inputs large or small.

How can we specify when the input-reading and report-printing should stop when we don’t know the number of inputs?

One way to do that is to use something called a stream. Before we apply streams to that problem, however, let’s start from some basics.

A Stream of Elements

A stream (virta) is collection of elements. The word is meant to evoke the notion that a flowing stream “brings” elements for processing.

You can create a stream by listing its elements, just like you can create other collections:

val streamOfData = Stream(10.2, 32.1, 3.14159)streamOfData: Stream[Double] = Stream(10.2, ?)
The toString on Streams, which is here used by the REPL, doesn’t list the entire contents of the stream. This is a symptom of how streams work; we’ll say more about that in a bit.

Another approach is to use toStream, which creates a stream from an existing collection (like toVector, toBuffer, etc.; Chapter 4.1):

val vectorOfWords = Vector("first", "second", "third", "fourth")vectorOfWords: Vector[String] = Vector(first, second, third, fourth)
val streamOfWords = vectorOfWords.toStreamstreamOfWords: Stream[String] = Stream(first, ?)
This gives us a stream that contains the same elements as the vector.

Streams have the collection methods you know. As an example, the following command skips past a couple of the first elements in the stream, then picks out the first of the remaining ones:

streamOfWords.drop(2).headres9: String = third

A loop works, too:

for (word <- streamOfWords) {
  println("data from the stream: " + word)
}data from the stream: first
data from the stream: second
data from the stream: third
data from the stream: fourth

As do higher-order methods:

streamOfWords.filter( _.length > 5 ).map( _ + "!" ).foreach(println)second!
fourth!

Nothing groundbreaking there. How are streams different?

Infinite Streams

One special thing about streams is that their size doesn’t need to be finite. You can define an endless stream of values. (Cf. infinite sequences in math.)

The factory method continually creates an infinite stream:

val myStream = Stream.continually("SPAM")myStream: Stream[String] = Stream(SPAM, ?)

We have just created a collection that has the string "SPAM" as each of its elements and that contains an infinite number of these identical elements. Let’s take the first five and print them out:

myStream.take(5).foreach(println)SPAM
SPAM
SPAM
SPAM
SPAM
The original stream is infinite, but take returns a finite stream of the given length. That stream contains some of the elements from the infinite stream.
foreach doesn’t have an infinite amount of work to do since we apply it to only the five-element stream returned by take.

Consider another example of an infinite stream. (It features the ++ operator from Chapter 4.1, which combines two collections.)

val words = Vector("first", "second", "third", "fourth")words: Vector[String] = Vector(first, second, third, fourth)
def wordStream = words.toStream ++ Stream.continually("OUT OF WORDS")wordStream: Stream[String]
wordStream.take(7).foreach(println)first
second
third
fourth
OUT OF WORDS
OUT OF WORDS
OUT OF WORDS
We form a stream that has the four elements for starters, followed a specific string that repeats an arbitrary number of times.
If we take the first seven elements, we get the four different ones and three copies of the repeating message.
If we hadn’t used toStream here, our code wouldn’t work. Why? What would happen? You can try it in the REPL, but if you have any unsaved work, save it first.

In the examples above, the infinite part of each stream was made up of identical elements. In the example below, that isn’t the case. Let’s form a stream of pseudorandom numbers.

val randomStream = Stream.continually( Random.nextInt(10) )randomStream: Stream[Int] = Stream(8, ?)

Now let’s take a few numbers:

randomStream.take(5).foreach(println)8
9
5
6
8

And here we generate random numbers between 0 and 99 until we happen to hit one that exceeds 90:

Stream.continually( Random.nextInt(100) ).takeWhile( _ <= 90 ).mkString(",")res10: String = 31,84,16,45,72,81,41,36,87,19,79,62,13,60,47,45,66,58,85,15,8,9,7,30,68,41,48,80,21,78,72,27
Stream.continually( Random.nextInt(100) ).takeWhile( _ <= 90 ).mkString(",")res11: String = 0,65,83,38,75,33,11,18,75,51,3
takeWhile returns a stream that terminates just before the first element that fails to match the given criterion. Even though our original stream is infinite, the resulting stream is not. However, like continually and take, takeWhile does not yet generate all the numbers; it merely returns a stream that is capable of generating them until a terminating element is reached.
mkString produces a string that describes all the elements in the stream. Had we invoked it on an infinite stream, we would crash the program by filling the available memory. On a finite stream, however, this works. In this example, the streams’ contents are random and we get streams of different lengths when we run the command multiple times.

How Streams Work

Obviously, a computer cannot store an infinite number of distinct elements, which is where the key feature of streams comes in: the whole stream isn’t constructed as soon as you create a stream object. Instead, the stream generates new elements only as needed. For instance, in order to run mkString above, it was necessary to generate some random numbers by repeatedly evaluating Random.nextInt(100). Those parts of the stream were actually generated in memory, but only when mkString needed them and only to the extent necessary.

Earlier in this chapter, we discussed by-name parameters, which are evaluated by the called function as needed. It is precisely such a by-name parameter that we passed to continually; the method forms a collection that keeps evaluating the parameter whenever we access the stream for a new element.

This principle of generating elements only when needed will be crucial as we use streams for handling input, next.

Back to Our Input-Processing Program

Earlier, we came up with this program that reports the lengths of four specific strings.

object LengthReport extends App {
  def report(input: String) = "The input is " + input.length + " characters long."
  def lines = Vector("a line of text", "another", "not written by the user", "but hardcoded into the program")
  lines.map(report).foreach(println)
}

What we wanted instead is a program that reports on the lengths of an arbitrary number of inputs and terminates with "please". With streams, this is easy to accomplish. Here’s an almost working implementation:

object SayPlease extends App {
  def report(input: String) = "The input is " + input.length + " characters long."
  def inputs = Stream.continually( readLine("Enter some text: ") )
  inputs.map(report).foreach(println)
}
The only difference to the previous version is: we don’t return a vector of predefined strings but a stream that “brings” inputs for the program to process. This is an infinite stream of strings that are generated by asking the user to provide them. This command alone does not yet prompt the user for all those inputs! It only defines a stream whose elements come from calling readLine whenever we need a new element.
map generates a “stream of reports” whose each element is formed (as needed) by calling readLine and applying report to the resulting string. This command also doesn’t prompt the user for the inputs, nor does it call report on them. It simply prepares the stream to do so if and when we later access the elements.
We use foreach to print out the elements of the report stream. In order to print an element, it’s necessary to determine what that element is by prompting the user for input and applying report. In practice, what we get is a program that repeatedly receives keyboard input and reports its length.

That pretty much works. But we didn’t attend to the magic word yet: the above program just keeps prompting for new inputs till the cows come home. This one doesn’t:

object SayPlease extends App {
  def report(input: String) = "The input is " + input.length + " characters long."
  def inputs = Stream.continually( readLine("Enter some text: ") )
  inputs.takeWhile( _ != "please" ).map(report).foreach(println)
}
takeWhile stems the stream at "please". No more elements (user inputs) will be generated once that element is reached. (Cf. the earlier example of takeWhile on a random stream.)

Assignment: Sentiments about Movies

Let’s create a program that reads in the user’s short textual comments about movies. The program then tries to figure out whether each of those user inputs is positive or negative in tone. That estimate will be based on thousands of earlier movie reviews that have been written by real people, manually marked as positive or negative by a human, and made accessible to the program.

The more difficult and laborsome parts of this program have already been written. You’ll only need to put the user interface in order.

Task description

Your task is to fetch Sentiments and complete its user interface in MovieSentimentApp.scala. The UI should work in the text console as shown below.

Please comment on a movie or hit Enter to quit: This is a masterpiece, a truly fantastic work of cinema art.
I think this sentiment is positive. (Average word sentiment: 0.36.)

Please comment on a movie or hit Enter to quit: I hated it.
I think this sentiment is negative. (Average word sentiment: -0.41.)

Please comment on a movie or hit Enter to quit: The plot had holes in it.
I think this sentiment is negative. (Average word sentiment: -0.28.)

Please comment on a movie or hit Enter to quit: Adam Sandler
I think this sentiment is negative. (Average word sentiment: -0.80.)

Please comment on a movie or hit Enter to quit: It was great.
I think this sentiment is positive. (Average word sentiment: 0.04.)

Please comment on a movie or hit Enter to quit: It wasn't great.
I think this sentiment is negative. (Average word sentiment: -0.16.)

Please comment on a movie or hit Enter to quit: It was "great".
I think this sentiment is positive. (Average word sentiment: 0.04.)

Please comment on a movie, or hit Enter to quit: 
Bye.
The program doesn’t even try for a finespun analysis: it just lumps the user’s inputs in two categories: positive and negative.
The program also reports a number that it used in making each decision. A positive number means that, on average, the individual words of the input tend to occur in positive movie reviews, judging by the genuine data the program has been trained with.
If the input is blunt enough, the program classifies it quite accurately. It doesn’t get punctuation, though, or many of the other subtleties of human communication.
The user stops the program with an empty input. (The input must be genuinely empty with no characters, not even spaces.)

The training data tells our program which words occur in positive reviews and which words occur in negative ones. You can find that data in sample_reviews_from_rotten_tomatoes.txt within the Sentiments project. There’s no need for you to touch that data, but do take a look; it will help you understand how the program works.

The analyzer’s programmatic implementation is in SentimentAnalyzer.scala. Go ahead and browse the Scaladocs and the code if you want (but it's not required).

Implement the user interface in MovieSentimentApp.scala in package o1.sentiment.ui. The parts of the puzzle are there already, but you’ll need to use a stream and its methods to piece them together.

Instructions and hints

  • The UI should work much like our previous example program.
  • In fact, there is rather little the you need to do. You can solve the entire assignment with a couple of lines of code, or even just one.

Submission form

A+ presents the exercise submission form here.

Assignment: Reading in Measurements

This assignment provides a bit more practice on reading in a stream of keyboard input.

Task description

In Chapter 6.3, you wrote averageRainfall. That function computes the average of a vector of rainfall measurements (Ints), stopping at 999999, which marks the end of the first data series. Any negative numbers in the input vector are ignored.

Now implement the same functionality as an interactive program. Place the program in project HigherOrder, in RainfallApp.scala

The program should work precisely as illustrated in the examples below.

Enter rainfall (or 999999 to stop): 10
Enter rainfall (or 999999 to stop): 5
Enter rainfall (or 999999 to stop): 100
Enter rainfall (or 999999 to stop): 10
Enter rainfall (or 999999 to stop): 5
Enter rainfall (or 999999 to stop): -100
Enter rainfall (or 999999 to stop): -100
Enter rainfall (or 999999 to stop): 110
Enter rainfall (or 999999 to stop): 999999
The average is 40.
Enter rainfall (or 999999 to stop): 999999
No valid data. Cannot compute average.
Enter rainfall (or 999999 to stop): -123
Enter rainfall (or 999999 to stop): -1
Enter rainfall (or 999999 to stop): 999999
No valid data. Cannot compute average.

Instructions and hints

  • In this assignment, you do not have to trouble yourself with what happens if the user enters invalid values such as the empty string "", "llama", or "3.141". You can assume that the entered text can be successfully interpreted as an integer by the toInt method (Chapter 4.5).
  • You could solve the assignment by reading the inputs with a stream, copying them into a vector (toVector), and then computing the average as you did in the earlier rainfall assignment. On the other hand, you don’t need to use a vector at all, since a stream also has the necessary methods.
    • For instance, it’s fine to call size and sum on a stream, as long as it’s finite.
    • If you call those methods on a stream, use a val that refers to the stream.
  • Make sure to spell the program’s output exactly right so that the automatic grader will give you the points you deserve.

Submission form

A+ presents the exercise submission form here.

Streams of Numbers, Conveniently

Is it possible to create an infinite vector with all the natural numbers, or something like that?

A vector (or a buffer or any other strictly evaluated collection) won’t do, because all of the vector’s elements are stored in memory and an infinite vector would need an infinite amount of storage.

But a stream will do:

def positiveNumbers = Stream.from(1)positiveNumbers: Stream[Int]
positiveNumbers.take(3).foreach(println)1
2
3
As it happens, the Scala API gives us from, a helpful method that creates a stream of increasing numbers. We pass in the first number.

Here we increment by ten at each element:

def evenTens = Stream.from(0, 10)evenTens: Stream[Int]
evenTens.take(3).foreach(println)0
10
20

And here we increment by -1, which gives us a stream of decreasing numbers:

def negativeNumbers = Stream.from(-1, -1)negativeNumbers: Stream[Int]
negativeNumbers.take(3).foreach(println)-1
-2
-3

Of course, from also works in combination with other methods:

val firstBigSquare = Stream.from(0).map( n => n * n ).dropWhile( _ <= 1234567 ).headfirstBigSquare: Int = 1236544

Optional Reading: Different Kinds of Streams

A stream that iterates

iterate is another method worth mentioning. It creates a stream that generates each element by re-applying a function to the previous element:

def alternating = Stream.iterate(1)( x => -2 * x )alternating: Stream[Int]
alternating.take(4).foreach(println)1
-2
4
-8
def babble = Stream.iterate("")( "blah" + _ )babble: Stream[String]
babble.take(4).foreach(println)
blah
blahblah
blahblahblah

The example below uses iterate to generate a stream that implements an algorithm that estimates the square root of a given number without resorting to the sqrt library function. (The program applies Newton’s method for approximating square roots of positive numbers.)

def squareRoot(n: Double) = {
  def isTooFar(approx: Double) = (approx * approx - n).abs > 0.0001
  def nextApprox(prev: Double) = (prev + n / prev) / 2
  def streamOfApproximations   = Stream.iterate(1.0)(nextApprox)
  streamOfApproximations.dropWhile(isTooFar).head
}squareRoot: (n: Double)Double
squareRoot(9)res12: Double = 3.000000001396984
squareRoot(654321)res13: Double = 808.9011064400888
The first of three auxiliary functions checks whether a given approximation isn’t close enough to the desired solution.
The second function uses an earlier approximation to generate another, as per Newton’s method.
The third generates an infinite stream of approximate solutions. The stream starts with a “guess” of 1.0 and generates better approximations by applying nextApprox repeatedly.
We get a good-enough approximation by dropping all those approximations at the front of the stream that aren’t close enough. Then we pick out the first element that remains.

A recursively defined stream

Methods such as continually, from, and iterate work well for many needs. But what if we didn’t have these factory methods? Or what if you want to define a stream that doesn’t match what those methods do?

You can use recursion to tailor exactly the sort of stream that you need.

Below is a very simple example of a recursive (i.e., self-referential) definition of a stream. This definition produces a stream of positive integers (just like Stream.from(1)).

def positiveNumbers(first: Int): Stream[Int] = first #:: positiveNumbers(first + 1)positiveNumbers: (first: Int)Stream[Int]
positiveNumbers(1).take(3).foreach(println)1
2
3
The operator #:: combines a single value and a stream, yielding another stream. The value to the left of the operator becomes the first element; it’s followed by the elements of the stream on the right-hand side.
The definition is recursive: it refers to itself. We form a sequence of positive integers by starting at the given integer and following it with a stream of positive integers that starts at the next integer.

In the Scala API, methods such as from and continually have recursive implementations. The same basic principle works for defining any kind of stream; experiment!

Recursion is a versatile and powerful technique; streams are just one of its countless applications. Chapter 11.2 will say more.

A code-reading challenge

Here’s a more elaborate example of streams. Can you figure out what this function does and how it does it?

The names of the function and its components are vague but not deliberately misleading.

This is a math-y sort of program.

def mystery(limit: Int): Vector[Int] = {
  import scala.math.sqrt

  val odd = Stream.from(3, 2)
  def candidates = odd.takeWhile( _ <= limit )

  def initialVals = odd.takeWhile( _ <= sqrt(limit).toInt )
  def multiples(n: Int) = Stream.from(n * n, 2 * n).takeWhile( _ <= limit )
  def rejected = initialVals.flatMap(multiples)

  val result = candidates.diff(rejected)
  result.toVector
}

Optional Reading: Properties of Streams

What is the difference between these two programs?

val result = "llama".length
println(result)
println(result)
def result = "llama".length
println(result)
println(result)

The first program computes the length of a string, stores the result (in a val), and prints it out twice. The second program recomputes the length of the string upon request (using the def), which happens twice in the example. The first program uses an additional memory slot to avoid repeating work.

What does that have to do with streams?

Streams are non-strict and lazy

We’ve already established that all the elements of a stream aren’t necessarily generated at all; in other words, a stream is a non-strict collection. We also noted that a stream element is generated only when that element is actually needed for some purpose.

Let’s return to this example:

val randomStream = Stream.continually( Random.nextInt(100) )randomStream: Stream[Int] = Stream(12, ?)
randomStream.take(5).mkString(",")res14: String = 12,62,28,14,31
The stream evaluates the element-generating expression Random.nextInt(100) only when or if a new element is needed. In this example, that happens five times, since mkString forces the stream to access five of its elements. In order to do that, the stream needs to evaluate the randomizing expression repeatedly.

Careful now: an element is generated into the stream whenever a new element is needed. What if you reuse a previously generated element?

randomStream.take(5).mkString(",")res15: String = 12,62,28,14,31
randomStream.take(5).mkString(",")res16: String = 12,62,28,14,31
randomStream.take(10).mkString(",")res17: String = 12,62,28,14,31,27,79,18,78,43
Looking again at the first five elements of our stream, we get the same numbers as before. The stream object has stored those elements and doesn’t re-evaluate the randomizing expression again when we repeat our request.
When we take a slightly longer chunk of the stream and check its contents, we get the five stored elements followed by some additional ones that are randomly generated (and stored) in order to respond to our request.

In programmer-speak, a stream is not merely non-strict but lazy (laiska): it not only doesn’t generate elements unless it needs them but also stores the elements to save itself the future trouble of having to generate them again.

Laziness is even more conspicuous in the next example.

Let’s start by creating a function that generates new elements and notifies us about it:

var counter = 0counter: Int = 0
def generateElement() = {
  val newElem = counter
  println("I guess I have to bother to generate a new element: " + newElem)
  counter += 1
  newElem
}generateElement: ()Int

Now we form a stream that uses the above function to generate its elements:

val streamOfInts = Stream.continually( generateElement() )I guess I have to bother to generate a new element: 0
streamOfInts: Stream[Int] = Stream(0, ?)
The first element is immediately generated. The REPL reports it as part of the toString output. (N.B. In future versions of the Scala API, this behavior will not occur; a stream will not generate even the first element like this.)

Examining the first five elements necessitates additional generateElement calls:

streamOfInts.take(5).mkString(",")I guess I have to bother to generate a new element: 1
I guess I have to bother to generate a new element: 2
I guess I have to bother to generate a new element: 3
I guess I have to bother to generate a new element: 4
res18: String = 0,1,2,3,4

Re-examining them does not, and the stream doesn’t do anything it doesn’t need to:

streamOfInts.take(5).mkString(",")res19: String = 0,1,2,3,4

Examining further elements makes the stream work:

streamOfInts.take(10).mkString(",")I guess I have to bother to generate a new element: 5
I guess I have to bother to generate a new element: 6
I guess I have to bother to generate a new element: 7
I guess I have to bother to generate a new element: 8
I guess I have to bother to generate a new element: 9
res20: String = 0,1,2,3,4,5,6,7,8,9

val, def, streams, and memory usage

As long as you have a reference to a stream stored somewhere, you don’t lose the stream to the garbage collector. This was illustrated in the examples just above, where we had variables (randomStream, streamOfInts) that referred to our streams. What if we hadn’t had them?

def randomStream = Stream.continually( Random.nextInt(100) )randomStream: Stream[Int]
randomStream.take(5).mkString(",")res21: String = 42,72,7,86,30
randomStream.take(5).mkString(",")res22: String = 2,2,64,87,54
Instead of a variable, we define a function that returns a reference to a new stream. Apart from that, this example is identical to our earlier one.
The expression randomStream calls a parameterless function. Whenever we evaluate this expression, we get an entirely new stream.
The two streams are independent of each other and have different elements.

Now that we didn’t store a reference to either of the random streams in a variable, the stream objects won’t stay in memory. The garbage collector promptly releases the memory for other use.

Let’s compare two more code fragments. Here’s the first one, in which a val refers to a stream:

val longStreamOfInputs = Stream.continually( readSomeDataFromSomewhere() )
longStreamOfInputs.map(doStuffWithDataElement).foreach(println)
longStreamOfInputs.map(doSomethingElseWithElement).foreach(println)
We store the stream reference in a variable. We intend to read a large but finite amount of data from a source (a file, the internet, the keyboard; somewhere).
When we subsequently command the stream to process its elements and print out the results, the entire stream is traversed and each of its elements generated with a function that reads a new element into memory. The variable longStreamOfInputs stores a reference to all the data.
Later, we take the previously stored inputs and process them in some other way. The latter traversal doesn’t fetch the inputs again: it uses the stored elements in the stream.

Here’s the same code but with a def instead of a val:

def longStreamOfInputs = Stream.continually( readSomeDataFromSomewhere() )
longStreamOfInputs.map(doStuffWithDataElement).foreach(println)
longStreamOfInputs.map(doSomethingElseWithElement).foreach(println)

This program doesn’t store the stream and its contents anywhere. In fact, even while the computer processes the stream one element at a time, each previously processed element becomes fair game for the garbage collector, which operates in the background. Especially if the stream is long, it can happen that the first elements have been processed and removed from memory before the last elements have even been generated.

The input data is not stored anywhere. The second traversal causes all the input to be re-read.

Whether we store the stream in a variable can thus make a difference. Circumstances dictate whether it’s a good idea to do so:

  • If the stream is short, if there is plenty of memory available, and especially if the goal is to traverse the same elements multiple times, then it’s a good idea to use a variable.
  • If the elements are traversed only once and each one can be discarded as soon as it’s dealt with (as in our SentimentAnalyzer, for instance), then def is the better choice. This approach also has the benefit that all the elements don’t have to fit in memory at once.

When operating on small amounts of data, this difference between the alternatives is often negligible.

The internal implementation of streams is based on linking each element to the next element in the stream. This linked structure makes it possible to discard previously processed parts of the stream from memory, as described above; on the other hand, it also means that the only efficient way to process a stream is to advance linearly from the head rather than picking out elements at arbitrary indices (which is fine on buffers and vectors, for example). You’ll learn more about linked structures and efficiency in various follow-on courses, including Programming 2.

Summary of Key Points

  • A stream is a collection whose elements aren’t formed as soon as the collection is created. They are generated and stored as needed.
    • A stream is intended for traversing the elements in linear order.
    • You can process a stream’s elements one at a time without storing them all simultaneously in the computer’s memory or even knowing how many elements there are or will be.
    • Unlike the previously familiar collections, a stream can be either finite or infinite. You can use methods such as take and takeWhile to select a finite part of an infinite stream.
    • There are many ways to form a stream. For instance, you can repeat an operation that creates elements (continually), produce elements in numerical order (from), or repeatedly apply an operation to the previous element (iterate).
  • So-called by-name parameters are received by the function as unevaluated expressions. This contrasts with ordinary (by-value) parameters, which receive the previously computed values of expressions.
    • The receiving method may evaluate a by-name parameter once, more than once, or not at all, as appropriate.
    • These parameters are highly useful for some purposes, such as forming streams.
  • Links to the glossary: stream; by-name parameter; strict, non-strict, and lazy evaluation.

Feedback

Please note that this section must be completed individually. Even if you worked on this chapter with a pair, each of you should submit the form separately.

Credits

Thousands of students have given feedback that has contributed to this ebook’s design. Thank you!

Weeks 1 to 13 of the ebook, including the assignments and weekly bulletins, have been written in Finnish and translated into English by Juha Sorva.

Weeks 14 to 20 are by Otto Seppälä. That part of the ebook isn’t available during the fall term, but we’ll publish it when it’s time.

The appendices (glossary, Scala reference, FAQ, etc.) are by Juha Sorva unless otherwise specified on the page.

The automatic assessment of the assignments has been programmed by Riku Autio, Jaakko Kantojärvi, Teemu Lehtinen, Timi Seppälä, Teemu Sirkiä, and Aleksi Vartiainen.

The illustrations at the top of each chapter, and the similar drawings elsewhere in the ebook, are the work of Christina Lassheikki.

The animations that detail the execution Scala programs have been designed by Juha Sorva and Teemu Sirkiä. Teemu Sirkiä and Riku Autio have done the technical implementation, relying on Teemu’s Jsvee and Kelmu toolkits.

The other diagrams and interactive presentations in the ebook are by Juha Sorva.

The O1Library software has been developed by Aleksi Lukkarinen and Juha Sorva. Several of its key components are built upon Aleksi’s SMCL library.

The pedagogy of using tools from O1Library (such as Pic) for simple graphical programming is inspired by the textbooks How to Design Programs by Flatt, Felleisen, Findler, and Krishnamurthi and Picturing Programs by Stephen Bloch.

The course platform A+ has been created by Aalto’s LeTech research group and is largely developed by students. The current lead developer is Jaakko Kantojärvi; many other students of computer science and information networks are also active on the project.

For O1’s current teaching staff, please see Chapter 1.1.

Additional credits for this page

The rainfall assignment is a version of a classic programming problem by Elliot Soloway.

The sentiment-analyzer assignment is an adaptation of a programming assignment by Eric D. Manley and Timothy M. Urness, which is in turn based on a programming contest on the web site Kaggle.

../_images/imho7.png
Posting submission...