Chapter 11.3: An Introduction to File I/O
About This Page
Questions Answered: How can I load data from a (text) file into my program? Or save data in a file?
Topics: Reading and writing text files. Some other topics will come up as well, including iterators, try, and finally.
What Will I Do? Read. There’s one small, optional assignment.
Rough Estimate of Workload: One hour. (This is an entirely optional chapter, so you may just skip it, too.)
Points Available: None.
Related Modules: Files (new).
Notes: One example uses sound via o1.play, but that isn’t really important for understanding the chapter.
Using Data Files
So far, O1’s programming assignments haven’t required you to write code that reads data from files on your computer’s hard drive or saves data in such files. Some of the given programs have done that, however; for instance, the DNA program of Chapter 5.6 read its input from a text file, as did the Stars app and RobotTribes.
File operations are common in everyday applications. There are various reasons for a program to operate on files; here are just a few:
We want a program to save documents (e.g., a text document, presentation slides, entries of an experience diary) so that they persist in external memory (e.g., a hard drive) even while the program isn’t currently running.
We’d like a program to read data (e.g., scientific measurements) from a file and compute results from that data.
We’d like our application’s user to be able to save their personal preference settings (e.g., language). The app stores the settings in a file and loads them whenever the app is launched.
We’d like our computer game to save and load its state on the player’s command.
This chapter provides a very brief introduction to working with files. We’ll cover the topic in practical terms: you’ll learn about a handful of basic tools that you can apply in your Scala programs. Further information about files is available in O1’s follow-on courses and in various online resources.
On I/O libraries
When a file serves as a program’s input, we say that the program reads the file. Correspondingly, a program that stores information writes the file. Reading and writing data are jointly known as I/O (from input/output). The umbrella term I/O covers not just file I/O but other interactions as well, such as printing information onscreen and network access.
Nearly all programming languages’ standard libraries come with a toolkit for file I/O.
Scala’s I/O package is unsurprisingly named: scala.io.
It’s easy to use Java libraries from Scala (Chapter 5.4), and Java’s standard API provides a decent I/O library. Therefore, creating a similar, Scala-specific I/O library hasn’t been a top priority for Scala’s standard-library developers so far. The package scala.io contains just a handful of fairly simple tools for some fairly simple but common needs, such as reading text files. For other needs, Scala programmers can use java.io or one of various third-party libraries.
On text files
Files, just like computer memory in general, can store data in a variety of formats (Chapter 5.4). In this chapter, we’ll stick with plain-text files — that is, files whose binary contents represent written characters.
Text files have the nice attribute of being easy to modify in a variety of editors (e.g., Emacs, Notepad, or IntelliJ). Many I/O libraries have tools that are specifically meant for handling text files.
Reading a File, Example 1 of 7
Our goal: read a few characters
Say we have a text file named example.txt that contains these three lines of text:
This is the first line from the file.
This is the second one.
This third line is the last.
You’ll find just such a file in the Files module, which also contains all of this chapter’s example programs.
As our first example, let’s create a program that reads in one character at a time and reports some of the first characters that it finds in the file. The program will produce this output:
The first character in the file is T.
The second character in the file is h.
The third character in the file is i.
The fourth character in the file is s.
The fifth character in the file is .
The sixth character in the file is i.
Solution: fromFile and next
scala.io contains a class named Source, which represents data that the program reads from some source such as a file. The class has a companion object that provides a number of methods for constructing Source objects; one of them is named fromFile. This method creates a Source object that receives its data from a file. Here’s an example:
import scala.io.Source
@main def readingExample1() =
  val file = Source.fromFile("Files/example.txt")
  println("The first character in the file is " + file.next() + ".")
  println("The second character in the file is " + file.next() + ".")
  println("The third character in the file is " + file.next() + ".")
  println("The fourth character in the file is " + file.next() + ".")
  println("The fifth character in the file is " + file.next() + ".")
  println("The sixth character in the file is " + file.next() + ".")
  file.close()
In this program, we have a file variable that refers to the Source object that fromFile returned. That object is an iterator (iteraattori); an iterator is associated with a particular data source (such as a file or a collection) and is capable of working through that source’s data elements once.
Having created the iterator, we can call next on it to obtain the next unprocessed character from the file. This iterator first brings us the initial character on the first line within the text file (a Char).
The iterator object internally keeps track of its progress through the data source. Its state changes every time we invoke next. Consecutive invocations of next bring us the following characters from the file.
So that the I/O library can read a file, it automatically requests access to that file from the operating system, which ties up some resources. Once your program is done with a file, it’s appropriate to release those resources. Calling close does that.
In this example, we used a relative path. This program works as long as there is a folder named Files and, therein, a file named example.txt in the program’s working directory. When you launch a program in IntelliJ, the working directory is the entire project’s root folder, under which the Files module folder resides.
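If a relative path doesn’t seem to lead where you expect, it can help to print out what it resolves to. The sketch below is only an illustration: the helper resolveFromWorkingDirectory and the program showResolvedPath are made-up names that aren’t part of the Files module, and the code relies on Java’s java.nio.file.Paths rather than anything introduced in this chapter.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper (not part of the Files module): returns the absolute
// location that a relative path refers to, given the current working directory.
def resolveFromWorkingDirectory(relativePath: String): Path =
  Paths.get(relativePath).toAbsolutePath

@main def showResolvedPath() =
  // An empty relative path resolves to the working directory itself.
  println("Working directory: " + Paths.get("").toAbsolutePath)
  println("The program will look for: " + resolveFromWorkingDirectory("Files/example.txt"))
```

When launched from IntelliJ as described above, the second line of output should point inside the project’s root folder.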
Reading a File, Example 2 of 7: I/O Errors
I/O programming inescapably involves the possibility of runtime errors. For example, any of the previous program’s file.next() calls will fail if the file cannot be accessed. That might happen for a number of reasons; for instance, the file might be on a memory stick that someone disconnects from the computer just as the program needs the file.
When that happens, calling next fails with a runtime error of type IOException, which crashes the example app. The remaining lines of code don’t run at all.
Our goal: always close the file
As we noted already, the “hygiene” of I/O programming demands that we release any resources allocated to our program once we no longer need them. In this example, we need to make sure to always call close once we’re done.
The above program code does indeed call close, but that call never happens if one of the next calls crashes with an error.
Closing a file even on errors is a good habit to learn. There are several ways to accomplish that; in this chapter, we’ll use the try construct.
Solution: try and finally
This program works just like the previous one but deals with errors more neatly:
import scala.io.Source
@main def readingExample2() =
  val file = Source.fromFile("Files/example.txt")
  try
    println("The first character in the file is " + file.next() + ".")
    println("The second character in the file is " + file.next() + ".")
    println("The third character in the file is " + file.next() + ".")
    println("The fourth character in the file is " + file.next() + ".")
    println("The fifth character in the file is " + file.next() + ".")
    println("The sixth character in the file is " + file.next() + ".")
  finally
    file.close()
The try keyword starts a code block whose contents will be run as per usual, or at least the computer will try to run that code normally. The try block’s execution is interrupted if an error occurs. Without the surrounding try block, our example program would terminate right away, but...
... we’ve followed the try block with a finally block, in which we specify what we’d like to happen after the try block in all scenarios, even if the try block’s execution was interrupted by an error.
We essentially state: “Finally, no matter how things went in the try block, close the connection to the file.”
What about reacting to errors?
A finally block lets you specify which commands to run no matter if an error occurred or not. But what if you’d like to do something only in case an error occurred in a particular section of the program? You might like to inform your app’s user of a problem, for instance.
Moreover, the try–finally construct that we used doesn’t actually prevent the program from crashing once it has executed the finally block. What if you’d like your program to keep running despite the error? (Resuming after an error may or may not be a reasonable thing to do, depending on your program and the type of the error.)
One of the things you can do is add a catch block to your try: it specifies what the program does in case the code in the try block failed with an error. (The topic is covered in other courses, including Programming Studio 1 and Programming Studio 2. It should be easy to find other sources as well.)
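To give a rough idea of what a catch block looks like, here is a minimal sketch. It is not one of the chapter’s example programs: the function firstCharOf and the message it prints are made up for illustration, and the choice to carry on by returning None is just one possible reaction to an error.

```scala
import scala.io.Source
import java.io.IOException

// Hypothetical helper: returns the first character of the named file,
// or None if the file is empty or cannot be read.
def firstCharOf(path: String): Option[Char] =
  try
    val file = Source.fromFile(path)   // fails if the file doesn't exist
    try
      if file.hasNext then Some(file.next()) else None
    finally
      file.close()                     // runs whether or not reading succeeded
  catch
    case problem: IOException =>       // runs only if an I/O error occurred
      println("Could not read " + path + ": " + problem.getMessage)
      None

@main def catchExample() =
  println(firstCharOf("Files/example.txt"))
  println(firstCharOf("Files/no_such_file.txt"))
```

The inner try–finally closes the file just as in Example 2; the outer catch turns an I/O error into an ordinary return value so that the program can keep running.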
Reading a File, Example 3 of 7
Our goal: read every character
The previous program read in only the first six characters. Let’s develop it into a program that reads and prints out every single character in the file. Assuming the same text file, our next program will produce this output:
Character #1 is T.
Character #2 is h.
Character #3 is i.
Character #4 is s.
Character #5 is .
Character #6 is i.
Character #7 is s.
Character #8 is .
Character #9 is t.
[Long section of similar output snipped.]
Character #82 is s.
Character #83 is .
Character #84 is t.
Character #85 is h.
Character #86 is e.
Character #87 is .
Character #88 is l.
Character #89 is a.
Character #90 is s.
Character #91 is t.
Character #92 is ..
Solution: a Source object in a for loop
import scala.io.Source
@main def readingExample3() =
  val file = Source.fromFile("Files/example.txt")
  try
    var charCount = 1 // stepper: 1, 2, ...
    for char <- file do
      println("Character #" + charCount + " is " + char + ".")
      charCount += 1
  finally
    file.close()
You can process an iterator in a for loop. This corresponds to repeatedly asking the iterator for the next element by calling next until there are no more elements available.
This program keeps printing reports and counting characters until it has dealt with every character that’s stored in the text file.
More on iterators and for loops
As mentioned, an iterator object has the job and capability of iterating over a sequence of data elements once, in a particular order.
One way to obtain an iterator is to ask a collection for one: any Scala collection can create an iterator object that traverses the collection’s elements. Behind the scenes, Scala’s collection objects use iterators extensively to provide the various methods that you know.
An iterator object has access to the data that it traverses; moreover, the iterator tracks its own progress during that traversal. How it accomplishes that depends on the data source: an iterator that traverses a buffer, for example, may track its progress by storing the current index.
As an application programmer, when you process a Scala collection’s elements, you generally don’t have to expressly create an iterator. This doesn’t mean that an iterator isn’t being used; it means that Scala automatically manages the iterator object for you. Let’s take a brief but instructive look at what that means.
Consider this familiar sort of for loop, which prints out a vector’s contents:
// Version 1
val myData = Vector(10, 5, 2, 4)
for number <- myData do
  println(number)
Vectors have an iterator method, which creates a new iterator object that iterates over the vector (cf. Source.fromFile). Such an iterator knows how to provide one vector element at a time and how to track its own progress through the vector’s elements. To illustrate, here is some code that does the same as Version 1 above.
// Version 2
val myData = Vector(10, 5, 2, 4)
val myIterator = myData.iterator
for number <- myIterator do
  println(number)
Version 3, below, also has the same effect but is phrased differently: it uses the iterator’s next and hasNext methods explicitly instead of leaving that to the for loop to take care of.
// Version 3
val myData = Vector(10, 5, 2, 4)
val myIterator = myData.iterator
while myIterator.hasNext do
  println(myIterator.next())
Chapter 6.3 mentioned that Scala’s for loops are a different way to write foreach method calls. For instance, the for loop in Version 1 above is just a different way to call foreach. The Scala compiler turns the code in Version 1 essentially into this:
// Version 4
val myData = Vector(10, 5, 2, 4)
myData.foreach(println)
As you can see, the code invokes foreach on the vector. But what does the vector’s foreach method do? The answer is that since the vector needs to traverse its own elements, it creates an iterator object and invokes that iterator’s foreach method; the iterator takes care of the actual traversal. Version 4 works essentially identically to Version 5, below:
// Version 5
val myData = Vector(10, 5, 2, 4)
myData.iterator.foreach(println)
We won’t dig deeper into the internal implementation of different iterators here. But perhaps these examples shed some light on what happens as you use a for loop on a collection.
Reading a File, Example 4 of 7
Our goal: read the whole file into a variable
What if we’d like to take the entire contents of the file and store them in a String variable or pass them on to a function? For instance, we might want to read in musical notes that have been stored in a text file and pass the whole song to play.
Let’s do just that. You’ll find one suitable example file, running_up_that_hill.txt, in the Files module.
Solution: mkString
import scala.io.Source
import o1.play
@main def readingExample4() =
  val file = Source.fromFile("Files/running_up_that_hill.txt")
  try
    val entireContents = file.mkString
    play(entireContents)
  finally
    file.close()
Iterators have an mkString method (Chapter 4.2) just like collections do. Here, we use the method to make a string that contains the entire file contents, with line breaks and all.
Reading a File, Example 5 of 7
Our goal: print a file with line numbers
Let’s write a program that prints out a numbered report of what’s in our example file:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
Towards a solution: getLines
Now, we could operate on the file one character at a time (as in Examples 1 to 3 above), or we could construct a multi-line string (as in Example 4) and then split it into lines. But it’s natural to process a text file one line at a time, and easy too, so let’s do that instead.
Before we get to the working program, let’s experiment a bit and write some code to print out the first two lines from the file:
This is the first line from the file.
This is the second one.
Here’s the code:
val file = Source.fromFile("Files/example.txt")
val linesFromFile = file.getLines
println(linesFromFile.next())
println(linesFromFile.next())
The getLines method returns another iterator object, which relies on and governs the original one-char-at-a-time iterator (which file refers to). This new iterator lets us access the file’s contents one line at a time...
... because its next method tells the file iterator to provide characters until it reaches a newline character. We get a return value that contains all the characters on a single line. (The return value is a String, not a Char as was the case with our original iterator.)
The next call to next returns the following line.
Solution
Here’s a program that numbers all the lines as we intended:
import scala.io.Source
@main def readingExample5() =
  val file = Source.fromFile("Files/example.txt")
  try
    var lineNumber = 1 // stepper
    for line <- file.getLines do
      println(s"$lineNumber: $line")
      lineNumber += 1
  finally
    file.close()
We loop over each of the lines that getLines gives us.
Below is an alternative way to implement the try block. This version discards the index-counting stepper in favor of the zipWithIndex method (Chapter 9.2):
for (line, index) <- file.getLines.zipWithIndex do
  println(s"${index + 1}: $line")
We pair each line with a (zero-based) index.
We loop over those pairs.
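The same pairing idea can be tried out without any file at all, since zipWithIndex works on any iterator of lines. The sketch below assumes a made-up helper named numbered that isn’t part of the chapter’s example programs; it builds the numbered lines as strings instead of printing them.

```scala
// Hypothetical helper: pairs each line with a zero-based index and
// formats the pair with a one-based line number.
def numbered(lines: Iterator[String]): Vector[String] =
  lines.zipWithIndex.map( (line, index) => s"${index + 1}: $line" ).toVector

@main def numberingExample() =
  val someLines = Iterator("This is the first line.", "This is the second one.")
  for text <- numbered(someLines) do
    println(text)
```

Given a Source object named file, the same function could be applied to the iterator that getLines returns.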
Reading a File, Example 6 of 7
Our goal: first the lines, then their lengths
Now let’s try to process the file so that we get this output:
LINES OF TEXT:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
LENGTHS OF EACH LINE:
37 characters
23 characters
28 characters
THE END
Attempted solution: getLines
twice
Here’s a simple way to solve the problem in principle:
Iterate over each of the lines in the file, printing each one along with its number.
Then iterate over the lines again and print the length of each one.
Let’s turn this idea into Scala code. We’ll use two consecutive loops. Each calls getLines and processes one line of text at a time:
import scala.io.Source
@main def readingExample6() =
  val file = Source.fromFile("Files/example.txt")
  try
    println("LINES OF TEXT:")
    var lineNumber = 1 // stepper
    for line <- file.getLines do
      println(s"$lineNumber: $line")
      lineNumber += 1
    end for
    println("LENGTHS OF EACH LINE:")
    for line <- file.getLines do
      println(s"${line.length} characters")
    end for
    println("THE END")
  finally
    file.close()
end readingExample6
Running that program is disappointing, however:
LINES OF TEXT:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
LENGTHS OF EACH LINE:
THE END
Our program completely failed to print the line lengths!
Reading a File, Example 7 of 7
Our goal: fix that bug
The reason the previous program didn’t work is that an iterator is capable of iterating over a data sequence only once. The iterator object keeps track of how far it has advanced but it never steps backwards or starts over.
For instance, the iterator that fromFile returns (and that we indirectly use via getLines) is capable of advancing through the file’s contents just once, from beginning to end. When the second loop calls getLines, we get only any remaining lines that the iterator hasn’t yet processed. And since the first loop already processed every single line, there are no lines left for the second loop to do anything with.
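The same one-pass behavior can be observed without a file. The following sketch uses a vector’s iterator and a made-up helper named countInTwoPasses (an illustrative name, not something from the Files module) that loops over one and the same iterator twice and counts what each loop receives.

```scala
// Hypothetical helper: goes through the same iterator in two consecutive
// loops and reports how many elements each loop received.
def countInTwoPasses(elements: Iterator[String]): (Int, Int) =
  var firstCount = 0
  for element <- elements do
    firstCount += 1
  var secondCount = 0
  for element <- elements do   // the iterator has already reached the end
    secondCount += 1
  (firstCount, secondCount)

@main def oneShotIterator() =
  val lines = Vector("first", "second", "third")
  println(countInTwoPasses(lines.iterator))   // prints (3,0)
```

The second loop’s body never runs, which mirrors what happened to the line-length loop in Example 6.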
One possible solution is to call Source.fromFile twice: once before each loop. Each loop would then use a completely different iterator and each would read the data file independently of the other loop. The drawback of that solution is that our program would then unnecessarily read the file twice, and it is remarkably slow to read data from external memory compared to executing simple programming-language instructions.
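For comparison, the two-reads approach might look roughly like the sketch below. The helper functions printNumberedLines and printLineLengths and the program name readingTwice are made up for this illustration; each helper opens the file anew and so gets a fresh iterator of its own.

```scala
import scala.io.Source

// Hypothetical helper: opens the file and prints each line with its number.
def printNumberedLines(path: String): Unit =
  val file = Source.fromFile(path)   // a fresh iterator for this pass
  try
    for (line, index) <- file.getLines().zipWithIndex do
      println(s"${index + 1}: $line")
  finally
    file.close()

// Hypothetical helper: opens the file again and prints each line's length.
def printLineLengths(path: String): Unit =
  val file = Source.fromFile(path)   // another, independent iterator
  try
    for line <- file.getLines() do
      println(s"${line.length} characters")
  finally
    file.close()

@main def readingTwice() =
  println("LINES OF TEXT:")
  printNumberedLines("Files/example.txt")
  println("LENGTHS OF EACH LINE:")
  printLineLengths("Files/example.txt")
  println("THE END")
```

This version never holds more than one line in memory at a time, at the cost of accessing the file twice.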
Another solution is to copy the file’s entire contents into a collection; we can then iterate over that collection as many times as we like. The downside of this approach is that we need to store the file’s entire contents in main memory.
Below is an example of the second approach.
Solution: toVector
import scala.io.Source
@main def readingExample7() =
  val file = Source.fromFile("Files/example.txt")
  try
    val vectorOfLines = file.getLines.toVector
    println("LINES OF TEXT:")
    var lineNumber = 1 // stepper
    for line <- vectorOfLines do
      println(s"$lineNumber: $line")
      lineNumber += 1
    println("LENGTHS OF EACH LINE:")
    for line <- vectorOfLines do
      println(s"${line.length} characters")
    println("THE END")
  finally
    file.close()
end readingExample7
We ask the line-by-line iterator to put all the lines in a vector (cf. mkString in Example 4 above). Now we have the data in a Vector[String], which we can operate on just like any other vector.
We can loop over a vector as many times as we like. The program works and produces the intended output.
Writing a File
Given a suitable library class, writing a text file is straightforward enough. One of the available tools, which we’ll use here, is the PrintWriter class from package java.io.
Here’s a little example program that writes ten thousand random numbers between 0 and 99 in a text file, one number per line.
import java.io.PrintWriter
import scala.util.Random
@main def writingExample() =
  val fileName = "Files/random.txt"
  val file = PrintWriter(fileName)
  try
    for n <- 1 to 10000 do
      file.println(Random.nextInt(100))
    println("Created a file " + fileName + " that contains pseudorandom numbers.")
    println("In case the file already existed, its old contents were replaced with new numbers.")
  finally
    file.close()
You can create a PrintWriter object as shown. Pass in the name of the target file. Note! If a file of that name didn’t exist yet, it will be created. If it did exist already, the file’s old contents will be lost and replaced by what our program writes in the file.
The println method writes a single line of text into the file.
Closing the file connection is also important when writing files. In fact, it’s particularly important when writing, as it ensures that all the data that you’ve scheduled for writing actually finds its way into the file on disk. (For improved efficiency, file-writing commands don’t always take place at once: they may instead collect data in a buffer in main memory. The write buffer’s contents are taken to the file in one go once a substantial chunk of data has accumulated in the buffer — or the connection is closed.)
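As a small way to convince yourself that the written data really ends up in the file once the connection is closed, you could write some lines and then read them back. The sketch below combines PrintWriter with Source.fromFile; the helpers writeLines and readLines, the program roundTripExample, and the file name Files/roundtrip.txt are all made-up names for illustration only.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical helper: writes the given lines into the named file,
// replacing any earlier contents.
def writeLines(path: String, lines: Vector[String]): Unit =
  val file = PrintWriter(path)
  try
    for line <- lines do
      file.println(line)
  finally
    file.close()   // also makes sure any buffered text reaches the file

// Hypothetical helper: reads all the lines of the named file into a vector.
def readLines(path: String): Vector[String] =
  val file = Source.fromFile(path)
  try
    file.getLines().toVector
  finally
    file.close()

@main def roundTripExample() =
  val path = "Files/roundtrip.txt"   // assumed location; any writable path will do
  writeLines(path, Vector("first", "second", "third"))
  println(readLines(path))
```

Because writeLines closes the file before readLines opens it, the reader sees everything that was written.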
Try it
That program, too, is available in the Files module under o1.io. Try running it.
Try running the program again. The old numbers get replaced by newly randomized ones.
File-handling practice
Write a program that asks the user to provide a file name, reads in numbers from the named file, and computes a few statistics: how many numbers there are in the data set, the average and median of the numbers, and the number of occurrences of the most frequently-occurring number.
Assume that the input file contains characters that correspond to Doubles, one number on each line, as shown below. (Exact integers are also OK as inputs.)
10.4
5
4.12
10.9
10.9
Below is an example run, which assumes that test.txt exists and contains the five lines listed above. Note that the user types in the file name.
Please enter the name of the input file: test.txt
Count: 5
Average: 8.264
Median: 10.4
Highest number of occurrences: 2
You’ll find some skeleton code in printFileStatistics.scala in the Files module. Write your code there.
Use try and finally and remember to close the file.
You may want to apply the program to the random numbers generated by writingExample. Does the random-number generator seem to be doing its job right?
Summary of Key Points
Tools for reading input and writing output (I/O) are available in the standard libraries that come with programming languages. These tools help the programmer work with files, among other things.
You can use the tools in scala.io and java.io to read and write text files (and other files).
Whenever you do file I/O, you need to be alert to the possibility of failures (exceptions) that may occur at runtime.
You can learn more about I/O, files, and exception handling in later courses and online.
Links to the glossary: I/O; file, text file; runtime error, exception handling; main memory, external memory, hard drive.
Credits
Thousands of students have given feedback and so contributed to this ebook’s design. Thank you!
The ebook’s chapters, programming assignments, and weekly bulletins have been written in Finnish and translated into English by Juha Sorva.
The appendices (glossary, Scala reference, FAQ, etc.) are by Juha Sorva unless otherwise specified on the page.
The automatic assessment of the assignments has been developed by: (in alphabetical order) Riku Autio, Nikolas Drosdek, Kaisa Ek, Joonatan Honkamaa, Antti Immonen, Jaakko Kantojärvi, Niklas Kröger, Kalle Laitinen, Teemu Lehtinen, Mikael Lenander, Ilona Ma, Jaakko Nakaza, Strasdosky Otewa, Timi Seppälä, Teemu Sirkiä, Anna Valldeoriola Cardó, and Aleksi Vartiainen.
The illustrations at the top of each chapter, and the similar drawings elsewhere in the ebook, are the work of Christina Lassheikki.
The animations that detail the execution of Scala programs have been designed by Juha Sorva and Teemu Sirkiä. Teemu Sirkiä and Riku Autio did the technical implementation, relying on Teemu’s Jsvee and Kelmu toolkits.
The other diagrams and interactive presentations in the ebook are by Juha Sorva.
The O1Library software has been developed by Aleksi Lukkarinen and Juha Sorva. Several of its key components are built upon Aleksi’s SMCL library.
The pedagogy of using O1Library for simple graphical programming (such as Pic) is inspired by the textbooks How to Design Programs by Flatt, Felleisen, Findler, and Krishnamurthi and Picturing Programs by Stephen Bloch.
The course platform A+ was originally created at Aalto’s LeTech research group as a student project. The open-source project is now shepherded by the Computer Science department’s edu-tech team and hosted by the department’s IT services. Markku Riekkinen is the current lead developer; dozens of Aalto students and others have also contributed.
The A+ Courses plugin, which supports A+ and O1 in IntelliJ IDEA, is another open-source project. It has been designed and implemented by various students in collaboration with O1’s teachers.
For O1’s current teaching staff, please see Chapter 1.1.
Additional credits for this page
This chapter does injustice to music by Kate Bush. Thank you and sorry.
The fromFile method takes in a file name and returns a Source object that is capable of accessing that file’s contents.