Chapter 11.3: An Introduction to File I/O

Using Data Files

So far, O1’s programming assignments haven’t required you to write code that reads data from files on your computer’s hard drive or saves data in such files. Some of the given programs have done that, however; for instance, the DNA program of Chapter 5.6 read its input from a text file, as did the Stars app and RobotTribes.

File operations are common in everyday applications. There are various reasons for a program to operate on files; here are just a few:

  • We want a program to save documents (e.g., a text document, presentation slides, entries of an experience diary) so that they persist in external memory (e.g., a hard drive) even while the program isn’t currently running.

  • We’d like a program to read data (e.g., scientific measurements) from a file and compute results from that data.

  • We’d like our application’s user to be able to save their personal preference settings (e.g., language). The app stores the settings in a file and loads them whenever the app is launched.

  • We’d like our computer game to save and load its state on the player’s command.

This chapter provides a very brief introduction to working with files. We’ll cover the topic in practical terms: you’ll learn about a handful of basic tools that you can apply in your Scala programs. Further information about files is available in O1’s follow-on courses and in various online resources.

On I/O libraries

When a file serves as a program’s input, we say that the program reads the file. Correspondingly, a program that stores information writes the file. Reading and writing data are jointly known as I/O (from input/output). The umbrella term I/O covers not just file I/O but other interactions as well, such as printing information onscreen and network access.

Nearly all programming languages’ standard libraries come with a toolkit for file I/O. Scala’s I/O package is unsurprisingly named: scala.io.

It’s easy to use Java libraries from Scala (Chapter 5.4), and Java’s standard API provides a decent I/O library. Therefore, creating a similar, Scala-specific I/O library hasn’t been a top priority for Scala’s standard-library developers so far. The package scala.io contains just a handful of fairly simple tools for some fairly simple but common needs, such as reading text files. For other needs, Scala programmers can use java.io or one of various third-party libraries.

On text files

Files, just like computer memory in general, can store data in a variety of formats (Chapter 5.4). In this chapter, we’ll stick with plain-text files — that is, files whose binary contents represent written characters.

Text files have the nice attribute of being easy to modify in a variety of editors (e.g., Emacs, Notepad, or IntelliJ). Many I/O libraries have tools that are specifically meant for handling text files.

Reading a File, Example 1 of 7

Our goal: read a few characters

Say we have a text file named example.txt that contains these three lines of text:

This is the first line from the file.
This is the second one.
This third line is the last.

You’ll find just such a file in the Files module, which also contains all of this chapter’s example programs.

As our first example, let’s create a program that reads in one character at a time and reports some of the first characters that it finds in the file. The program will produce this output:

The first  character in the file is T.
The second character in the file is h.
The third  character in the file is i.
The fourth character in the file is s.
The fifth  character in the file is  .
The sixth  character in the file is i.

Solution: fromFile and next

scala.io contains a class named Source, which represents data that the program reads from some source such as a file. The class has a companion object that provides a number of methods for constructing Source objects; one of them is named fromFile. This method creates a Source object that receives its data from a file. Here’s an example:

import scala.io.Source

@main def readingExample1() =
  val file = Source.fromFile("Files/example.txt")
  println("The first  character in the file is " + file.next() + ".")
  println("The second character in the file is " + file.next() + ".")
  println("The third  character in the file is " + file.next() + ".")
  println("The fourth character in the file is " + file.next() + ".")
  println("The fifth  character in the file is " + file.next() + ".")
  println("The sixth  character in the file is " + file.next() + ".")
  file.close()

The fromFile method takes in a file name and returns a Source object that is capable of accessing that file’s contents.

In this program, we have a file variable that refers to the object. That object is an iterator (iteraattori); an iterator is associated with a particular data source (such as a file or a collection) and is capable of working through those data elements once.

Having created the iterator, we can call next on it to obtain the next unprocessed character from the file. This iterator first brings us the initial character on the first line within the text file (a Char).

The iterator object internally keeps track of its progress through the data source. Its state changes every time we invoke next. Consecutive invocations of next bring us the following characters from the file.
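If you’d like to experiment with this behavior without a file on disk, one option is to build a Source from a string. The sketch below assumes Source.fromString, which is part of scala.io.Source but isn’t otherwise used in this chapter; the helper name firstChars is made up for illustration.

```scala
import scala.io.Source

// Hypothetical helper: collects up to `count` characters by calling next repeatedly.
def firstChars(source: Source, count: Int): String =
  var result = ""
  var taken = 0
  while taken < count && source.hasNext do   // hasNext tells us whether any characters remain
    result += source.next()                  // each call to next advances the iterator by one Char
    taken += 1
  result
```

For instance, calling firstChars(Source.fromString("This is"), 4) should produce "This", after which the same Source continues from the fifth character.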

So that the I/O library can read a file, it automatically requests access to that file from the operating system, which ties up some resources. Once your program is done with a file, it’s appropriate to release those resources. Calling close does that.

In this example, we used a relative path. This program works as long as there is a folder named Files and, therein, a file named example.txt in the program’s working directory. When you launch a program in IntelliJ, the working directory is the entire project’s root folder, under which the Files module folder resides.
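If a program can’t find a file that you think should be there, it may help to check which absolute path the relative path actually refers to. Here is a minimal sketch; it assumes Java’s java.nio.file.Paths, which this chapter doesn’t otherwise use, and the helper name absolutePathOf is made up for illustration.

```scala
import java.nio.file.Paths

// Hypothetical helper: resolves a relative path against the program's working directory.
def absolutePathOf(relativePath: String): String =
  Paths.get(relativePath).toAbsolutePath.toString
```

Printing absolutePathOf("Files/example.txt") shows where the program will look for the example file.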

Reading a File, Example 2 of 7: I/O Errors

I/O programming inescapably involves the possibility of runtime errors. For example, any of the previous program’s file.next() calls fails in case the file cannot be accessed. That might happen for a number of reasons; for instance, the file might be on a memory stick that someone disconnects from the computer just as the program needs the file. When that happens, calling next fails with a runtime error of type IOException, which crashes the example app. The remaining lines of code don’t run at all.

Our goal: always close the file

As we noted already, the “hygiene” of I/O programming demands that we release any resources allocated to our program once we no longer need them. In this example, we need to make sure to always call close once we’re done.

The above program code does indeed call close, but that call never happens if one of the next calls crashes with an error.

Closing a file even on errors is a good habit to learn. There are several ways to accomplish that; in this chapter, we’ll use the try construct.

Solution: try and finally

This program works just like the previous one but deals with errors more neatly:

import scala.io.Source

@main def readingExample2() =
  val file = Source.fromFile("Files/example.txt")
  try
    println("The first  character in the file is " + file.next() + ".")
    println("The second character in the file is " + file.next() + ".")
    println("The third  character in the file is " + file.next() + ".")
    println("The fourth character in the file is " + file.next() + ".")
    println("The fifth  character in the file is " + file.next() + ".")
    println("The sixth  character in the file is " + file.next() + ".")
  finally
    file.close()

The try keyword starts a code block whose contents will be run as per usual — or at least, the computer will try to run that code normally. The try block’s execution is interrupted if an error occurs. Without the surrounding try block, our example program would terminate right away, but...

... we’ve followed the try block with a finally block, in which we specify what we’d like to happen after the try block in all scenarios, even if the try block’s execution was interrupted by an error.

We essentially state: “Finally, no matter how things went in the try block, close the connection to the file.”
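For comparison, here is a sketch of one of the other ways to make sure a file gets closed. It assumes scala.util.Using from Scala’s standard library, which this chapter doesn’t cover; the helper name readPrefix is made up for illustration.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helper: reads the first `count` characters from a source.
// Using.resource closes the source afterwards, whether or not the body fails with an error.
def readPrefix(source: Source, count: Int): String =
  Using.resource(source) { file =>
    file.take(count).mkString
  }
```

The effect is similar to the try and finally version above: the close call is simply made for us.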

What about reacting to errors?

A finally block lets you specify which commands to run no matter if an error occurred or not. But what if you’d like to do something only in case an error occurred in a particular section of the program? You might like to inform your app’s user of a problem, for instance.

Moreover, the try-finally construct that we used doesn’t actually prevent the program from crashing once it has executed the finally block. What if you’d like your program to keep running despite the error? (Resuming after an error may or may not be a reasonable thing to do, depending on your program and the type of the error.)

One of the things you can do is add a catch block to your try: it specifies what the program does in case the code in the try block failed with an error. (The topic is covered in other courses, including Programming Studio 1 and Programming Studio 2. It should be easy to find other sources as well.)
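As a rough illustration of catch, the sketch below reports a problem to the user and carries on instead of crashing. The helper name firstCharOf and the messages are made up for this example; the exception types are from java.io.

```scala
import scala.io.Source
import java.io.{FileNotFoundException, IOException}

// Hypothetical helper: returns the file's first character,
// or None if the file is empty or cannot be read.
def firstCharOf(path: String): Option[Char] =
  try
    val file = Source.fromFile(path)   // fails with FileNotFoundException if there is no such file
    try
      if file.hasNext then Some(file.next()) else None
    finally
      file.close()                     // runs whether or not reading succeeded
  catch
    case problem: FileNotFoundException =>
      println("Could not find the file: " + path)
      None
    case problem: IOException =>       // other I/O failures
      println("Reading failed: " + problem.getMessage)
      None
```

The more specific case comes first, since a FileNotFoundException is also a kind of IOException.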

Reading a File, Example 3 of 7

Our goal: read every character

The previous program read in only the first six characters. Let’s develop it into a program that reads and prints out every single character in the file. Assuming the same text file, our next program will produce this output:

Character #1 is T.
Character #2 is h.
Character #3 is i.
Character #4 is s.
Character #5 is  .
Character #6 is i.
Character #7 is s.
Character #8 is  .
Character #9 is t.
[Long section of similar output snipped.]
Character #82 is s.
Character #83 is  .
Character #84 is t.
Character #85 is h.
Character #86 is e.
Character #87 is  .
Character #88 is l.
Character #89 is a.
Character #90 is s.
Character #91 is t.
Character #92 is ..

Solution: a Source object in a for loop

import scala.io.Source

@main def readingExample3() =
  val file = Source.fromFile("Files/example.txt")
  try
    var charCount = 1              // stepper: 1, 2, ...
    for char <- file do
      println("Character #" + charCount + " is " + char + ".")
      charCount += 1
  finally
    file.close()

You can process an iterator in a for loop. This corresponds to repeatedly asking the iterator for the next element by calling next until there are no more elements available.

This program keeps printing reports and counting characters until it has dealt with every character that’s stored in the text file.

More on iterators and for loops

As mentioned, an iterator object has the job and capability of iterating over a sequence of data elements once, in a particular order.

One way to obtain an iterator is to ask a collection for one: any Scala collection can create an iterator object that traverses the collection’s elements. Behind the scenes, Scala’s collection objects use iterators extensively to provide the various methods that you know.

An iterator object has access to the data that it traverses; moreover, the iterator tracks its own progress during that traversal. How it accomplishes that depends on the data source: an iterator that traverses a buffer, for example, may track its progress by storing the current index.

As an application programmer, when you process a Scala collection’s elements, you generally don’t have to expressly create an iterator. This doesn’t mean that an iterator isn’t being used; it means that Scala automatically manages the iterator object for you. Let’s take a brief but instructive look at what that means.

Consider this familiar sort of for loop, which prints out a vector’s contents:

// Version 1
val myData = Vector(10, 5, 2, 4)
for number <- myData do
  println(number)

Vectors have an iterator method, which creates a new iterator object that iterates over the vector (cf. Source.fromFile). Such an iterator knows how to provide one vector element at a time and how to track its own progress through the vector’s elements. To illustrate, here is some code that does the same as Version 1 above.

// Version 2
val myData = Vector(10, 5, 2, 4)
val myIterator = myData.iterator
for number <- myIterator do
  println(number)

Version 3, below, also has the same effect but is phrased differently: it uses the iterator’s next and hasNext methods explicitly instead of leaving that to the for loop to take care of.

// Version 3
val myData = Vector(10, 5, 2, 4)
val myIterator = myData.iterator
while myIterator.hasNext do
  println(myIterator.next())

Chapter 6.3 mentioned that Scala’s for loops are a different way to write foreach method calls. For instance, the for loop in Version 1 above is just a different way to call foreach. The Scala compiler turns the code in Version 1 essentially into this:

// Version 4
val myData = Vector(10, 5, 2, 4)
myData.foreach(println)

As you can see, the code invokes foreach on the vector. But what does the vector’s foreach method do? The answer is that since the vector needs to traverse its own elements, it creates an iterator object and invokes that iterator’s foreach method; the iterator takes care of the actual traversal. Version 4 works essentially identically to Version 5, below:

// Version 5
val myData = Vector(10, 5, 2, 4)
myData.iterator.foreach(println)

We won’t dig deeper into the internal implementation of different iterators here. But perhaps these examples shed some light on what happens as you use a for loop on a collection.

Reading a File, Example 4 of 7

Our goal: read the whole file into a variable

What if we’d like to take the entire contents of the file and store them in a String variable or pass them on to a function? For instance, we might want to read in musical notes that have been stored in a text file and pass the whole song to play.

Let’s do just that. You’ll find one suitable example file, running_up_that_hill.txt, in the Files module.

Solution: mkString

import scala.io.Source
import o1.play

@main def readingExample4() =
  val file = Source.fromFile("Files/running_up_that_hill.txt")
  try
    val entireContents = file.mkString
    play(entireContents)
  finally
    file.close()

Iterators have an mkString method (Chapter 4.2) just like collections do. Here, we use the method to make a string that contains the entire file contents, with line breaks and all.

Reading a File, Example 5 of 7

Our goal: print a file with line numbers

Let’s write a program that prints out a numbered report of what’s in our example file:

1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.

Towards a solution: getLines

Now, we could operate on the file one character at a time (as in Examples 1 to 3 above), or we could construct a multi-line string (as in Example 4) and then split it into lines. But it’s natural to process a text file one line at a time, and easy too, so let’s do that instead.

Before we get to the working program, let’s experiment a bit and write some code to print out the first two lines from the file:

This is the first line from the file.
This is the second one.

Here’s the code:

val file = Source.fromFile("Files/example.txt")
val linesFromFile = file.getLines
println(linesFromFile.next())
println(linesFromFile.next())

The getLines method returns another iterator object, which relies on and governs the original one-char-at-a-time iterator (which file refers to). This new iterator lets us access the file’s contents one line at a time...

... because its next method tells the file iterator to provide characters until it reaches a newline character. We get a return value that contains all the characters on a single line. (The return value is a String, not a Char as was the case with our original iterator.)

The next call to next returns the following line.
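The same experiment can be tried without a file. The sketch below assumes Source.fromString (part of scala.io.Source but not otherwise used in this chapter) and uses hasNext to check whether any lines remain before asking for one; the helper name firstTwoLines is made up for illustration.

```scala
import scala.io.Source

// Hypothetical helper: returns at most the first two lines of a source.
def firstTwoLines(source: Source): (Option[String], Option[String]) =
  val lines = source.getLines()     // line-by-line iterator on top of the Char iterator
  val first  = if lines.hasNext then Some(lines.next()) else None
  val second = if lines.hasNext then Some(lines.next()) else None
  (first, second)
```

Note that the strings we get don’t include the newline characters that separate the lines.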

Solution

Here’s a program that numbers all the lines as we intended:

import scala.io.Source

@main def readingExample5() =
  val file = Source.fromFile("Files/example.txt")
  try
    var lineNumber = 1              // stepper
    for line <- file.getLines do
      println(s"$lineNumber: $line")
      lineNumber += 1
  finally
    file.close()

We loop over each of the lines that getLines gives us.

Below is an alternative way to implement the try block. This version discards the index-counting stepper in favor of the zipWithIndex method (Chapter 9.2):

for (line, index) <- file.getLines.zipWithIndex do
  println(s"${index + 1}: $line")

We pair each line with a (zero-based) index.

We loop over those pairs.

Reading a File, Example 6 of 7

Our goal: first the lines, then their lengths

Now let’s try to process the file so that we get this output:

LINES OF TEXT:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
LENGTHS OF EACH LINE:
37 characters
23 characters
28 characters
THE END

Attempted solution: getLines twice

Here’s a simple way to solve the problem in principle:

  1. Iterate over each of the lines in the file, printing each one along with its number.

  2. Then iterate over the lines again and print each one’s length.

Let’s turn this idea into Scala code. We’ll use two consecutive loops. Each calls getLines and processes one line of text at a time:

import scala.io.Source

@main def readingExample6() =
  val file = Source.fromFile("Files/example.txt")
  try
    println("LINES OF TEXT:")
    var lineNumber = 1              // stepper
    for line <- file.getLines do
      println(s"$lineNumber: $line")
      lineNumber += 1
    end for

    println("LENGTHS OF EACH LINE:")
    for line <- file.getLines do
      println(s"${line.length} characters")
    end for

    println("THE END")
  finally
    file.close()
end readingExample6

Running that program is disappointing, however:

LINES OF TEXT:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
LENGTHS OF EACH LINE:
THE END

Our program completely failed to print the line lengths!

Reading a File, Example 7 of 7

Our goal: fix that bug

The reason the previous program didn’t work is that an iterator is capable of iterating over a data sequence only once. The iterator object keeps track of how far it has advanced, but it never steps backwards or starts over.

For instance, the iterator that fromFile returns (and that we indirectly use via getLines) is capable of advancing through the file’s contents just once, from beginning to end. When the second loop calls getLines, we get only any remaining lines that the iterator hasn’t yet processed. And since the first loop already processed every single line, there are no lines left for the second loop to do anything with.
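The same thing happens with any iterator, not just those that read files. Here is a small sketch that uses a vector’s iterator; the helper name remaining is made up for illustration.

```scala
// Hypothetical helper: counts the elements that an iterator still has left.
// Counting them consumes the iterator.
def remaining(elements: Iterator[Int]): Int =
  var count = 0
  for element <- elements do
    count += 1
  count
```

If you call remaining twice on the same iterator, say Vector(10, 5, 2, 4).iterator, the first call finds four elements and the second finds none.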

One possible solution is to call Source.fromFile twice: once before each loop. Each loop would then use a completely different iterator and each would read the data file independently of the other loop. The drawback of that solution is that our program would then unnecessarily read the file twice, and it is remarkably slow to read data from external memory compared to executing simple programming-language instructions.

Another solution is to copy the file’s entire contents into a collection; we can then iterate over that collection as many times as we like. The downside of this approach is that we need to store the file’s entire contents in main memory.
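For comparison, the first approach could look roughly like the sketch below, which opens the file separately for each pass. The helper names withLines and linesThenLengths are made up for illustration, and the functions return the results rather than printing them.

```scala
import scala.io.Source

// Hypothetical helper: opens the file, applies `process` to its lines, and always closes the file.
// `process` must finish using the iterator before it returns, since the file is closed right after.
def withLines[Result](path: String)(process: Iterator[String] => Result): Result =
  val file = Source.fromFile(path)
  try process(file.getLines())
  finally file.close()

// Reads the file twice: once for the numbered lines, once for the line lengths.
def linesThenLengths(path: String): (Vector[String], Vector[Int]) =
  val numbered = withLines(path)( lines =>
    lines.zipWithIndex.map( (line, index) => s"${index + 1}: $line" ).toVector )
  val lengths = withLines(path)( lines => lines.map( _.length ).toVector )
  (numbered, lengths)
```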

Below is an example of the second approach.

Solution: toVector

import scala.io.Source

@main def readingExample7() =
  val file = Source.fromFile("Files/example.txt")
  try
    val vectorOfLines = file.getLines.toVector

    println("LINES OF TEXT:")
    var lineNumber = 1              // stepper
    for line <- vectorOfLines do
      println(s"$lineNumber: $line")
      lineNumber += 1

    println("LENGTHS OF EACH LINE:")
    for line <- vectorOfLines do
      println(s"${line.length} characters")

    println("THE END")
  finally
    file.close()
end readingExample7

We ask the line-by-line iterator to put all the lines in a vector (cf. mkString in Example 4 above). Now we have the data in a Vector[String], which we can operate on just like any other vector.

We can loop over a vector however many times. The program works and produces the intended output.

Writing a File

Given a suitable library class, writing a text file is straightforward enough. One of the available tools, which we’ll use here, is the PrintWriter class from package java.io.

Here’s a little example program that writes ten thousand random numbers between 0 and 99 in a text file, one number per line.

import java.io.PrintWriter
import scala.util.Random

@main def writingExample() =
  val fileName = "Files/random.txt"
  val file = PrintWriter(fileName)
  try
    for n <- 1 to 10000 do
      file.println(Random.nextInt(100))
    println("Created a file " + fileName + " that contains pseudorandom numbers.")
    println("In case the file already existed, its old contents were replaced with new numbers.")
  finally
    file.close()

You can create a PrintWriter object as shown. Pass in the name of the target file. Note! If a file of that name didn’t exist yet, it will be created. If it did exist already, the file’s old contents will be lost and replaced by what our program writes in the file.

The println method writes a single line of text into the file.

Closing the file connection is also important when writing files. In fact, it’s particularly important when writing, as it ensures that all the data that you’ve scheduled for writing actually finds its way into the file on disk. (For improved efficiency, file-writing commands don’t always take place at once: they may instead collect data in a buffer in main memory. The write buffer’s contents are taken to the file in one go once a substantial chunk of data has accumulated in the buffer — or the connection is closed.)
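To see writing and reading work together, here is a sketch that writes some numbers to a file and then reads them back. The helper name roundTrip is made up for illustration; it takes a java.io.File so that you can try it on a temporary file rather than one of the course files.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

// Hypothetical helper: writes the given numbers one per line, then reads them back.
def roundTrip(target: File, numbers: Vector[Int]): Vector[Int] =
  val writer = PrintWriter(target)    // replaces the file's old contents, if any
  try
    for number <- numbers do
      writer.println(number)
  finally
    writer.close()                    // closing also writes out any buffered data

  val reader = Source.fromFile(target)
  try
    reader.getLines().map( _.toInt ).toVector
  finally
    reader.close()
```

Because the writer is closed before the reader is created, the reader should find every number that was written.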

Try it

That program, too, is available in the Files module under o1.io. Try running it.

Try running the program again. The old numbers get replaced by newly randomized ones.

File-handling practice

Write a program that asks the user to provide a file name, reads in numbers from the named file, and computes a few statistics: how many numbers there are in the data set, the average and median of the numbers, and the number of occurrences of the most frequently-occurring number.

Assume that the input file contains characters that correspond to Doubles, one number on each line, as shown below. (Exact integers are also OK as inputs.)

10.4
5
4.12
10.9
10.9

Below is an example run, which assumes that test.txt exists and contains the five lines listed above. Note that the user types in the file name.

Please enter the name of the input file: test.txt
Count: 5
Average: 8.264
Median: 10.4
Highest number of occurrences: 2

You’ll find some skeleton code in printFileStatistics.scala in the Files module. Write your code there.

Use try and finally and remember to close the file.

You may want to apply the program to the random numbers generated by writingExample. Does the random-number generator seem to be doing its job right?

Summary of Key Points

  • Tools for reading input and writing output (I/O) are available in the standard libraries that come with programming languages. These tools help the programmer work with files, among other things.

  • You can use the tools in scala.io to read text files and the PrintWriter class from java.io to write them.

  • Whenever you do file I/O, you need to be alert to the possibility of failures (exceptions) that may occur at runtime.

  • You can learn more about I/O, files, and exception handling in later courses and online.

  • Links to the glossary: I/O; file, text file; runtime error, exception handling; main memory, external memory, hard drive.

Credits

Thousands of students have given feedback and so contributed to this ebook’s design. Thank you!

The ebook’s chapters, programming assignments, and weekly bulletins have been written in Finnish and translated into English by Juha Sorva.

The appendices (glossary, Scala reference, FAQ, etc.) are by Juha Sorva unless otherwise specified on the page.

The automatic assessment of the assignments has been developed by: (in alphabetical order) Riku Autio, Nikolas Drosdek, Kaisa Ek, Joonatan Honkamaa, Antti Immonen, Jaakko Kantojärvi, Onni Komulainen, Niklas Kröger, Kalle Laitinen, Teemu Lehtinen, Mikael Lenander, Ilona Ma, Jaakko Nakaza, Strasdosky Otewa, Timi Seppälä, Teemu Sirkiä, Joel Toppinen, Anna Valldeoriola Cardó, and Aleksi Vartiainen.

The illustrations at the top of each chapter, and the similar drawings elsewhere in the ebook, are the work of Christina Lassheikki.

The animations that detail the execution of Scala programs have been designed by Juha Sorva and Teemu Sirkiä. Teemu Sirkiä and Riku Autio did the technical implementation, relying on Teemu’s Jsvee and Kelmu toolkits.

The other diagrams and interactive presentations in the ebook are by Juha Sorva.

The O1Library software has been developed by Aleksi Lukkarinen, Juha Sorva, and Jaakko Nakaza. Several of its key components are built upon Aleksi’s SMCL library.

The pedagogy of using O1Library for simple graphical programming (such as Pic) is inspired by the textbooks How to Design Programs by Flatt, Felleisen, Findler, and Krishnamurthi and Picturing Programs by Stephen Bloch.

The course platform A+ was originally created at Aalto’s LeTech research group as a student project. The open-source project is now shepherded by the Computer Science department’s edu-tech team and hosted by the department’s IT services; dozens of Aalto students and others have also contributed.

The A+ Courses plugin, which supports A+ and O1 in IntelliJ IDEA, is another open-source project. It has been designed and implemented by various students in collaboration with O1’s teachers.

For O1’s current teaching staff, please see Chapter 1.1.

Additional credits for this page

This chapter does injustice to music by Kate Bush. Thank you and sorry.
