The latest instance of the course can be found at: O1: 2024
Chapter 11.3: An Introduction to File I/O
About This Page
Questions Answered: How can I load data from a (text) file into my program? Or save data in a file?
Topics: Reading and writing text files. Some other topics will come up as well, including iterators, try, and finally.
What Will I Do? Read. There’s one small, optional assignment.
Rough Estimate of Workload: One hour. (This is an entirely optional chapter, so you may just skip it, too.)
Points Available: None.
Related Modules: Files (new).
Notes: One example uses sound via o1.play, but that isn’t really important for understanding the chapter.
Using Data Files
So far, O1’s programming assignments haven’t required you to write code that reads data from files on your computer’s hard drive or saves data in such files. Some of the given programs have done that, however; for instance, the DNA program of Chapter 5.6 read its input from a text file, as did the Stars app and RobotTribes.
File operations are common in everyday applications. There are various reasons for a program to operate on files; here are just a few:
- We want a program to save documents (e.g., a text document, presentation slides, entries of an experience diary) so that they persist in external memory (e.g., a hard drive) even while the program isn’t currently running.
- We’d like a program to read data (e.g., scientific measurements) from a file and compute results from that data.
- We’d like our application’s user to be able to save their personal preference settings (e.g., language). The app stores the settings in a file and loads them whenever the app is launched.
- We’d like our computer game to save and load its state on the player’s command.
This chapter provides a very brief introduction to working with files. We’ll cover the topic in practical terms: you’ll learn about a handful of basic tools that you can apply in your Scala programs. Further information about files is available in O1’s follow-on courses and in various online resources.
On I/O libraries
When a file serves as a program’s input, we say that the program reads the file. Correspondingly, a program that stores information writes the file. Reading and writing data are jointly known as I/O (from input/output). The umbrella term I/O covers not just file I/O but other interactions as well, such as printing information onscreen and network access.
Nearly all programming languages’ standard libraries come with a toolkit for file I/O. Scala’s I/O package is unsurprisingly named scala.io.
It’s easy to use Java libraries from Scala (Chapter 5.4), and Java’s standard API provides a decent I/O library. Therefore, creating an extensive, Scala-specific I/O library hasn’t been a top priority for Scala’s developers so far. The package scala.io contains just a handful of fairly simple tools for some fairly simple but common needs, such as reading text files. For other needs, Scala programmers can use java.io or one of various third-party libraries.
On text files
Databases
A more sophisticated alternative to regular files is to use a database (tietokanta): an organized storage that makes it easier, more efficient, and more secure to store data. For more information, you could do worse than check out one of Aalto’s dedicated courses on the topic.
Files, just like computer memory in general, can store data in a variety of formats (Chapter 5.4). In this chapter, we’ll stick with plain-text files — that is, files whose binary contents represent written characters.
Text files have the nice attribute of being easy to modify in a variety of editors (e.g., Emacs, Notepad, or IntelliJ). Many I/O libraries have tools that are specifically meant for handling text files.
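To make the idea of “binary contents that represent characters” a bit more concrete, here is a minimal sketch that shows the numbers behind a short piece of text. It assumes the UTF-8 encoding (text files may use other encodings, too), and the names TextAsBytes and bytesOf are made up for this illustration.

```scala
import java.nio.charset.StandardCharsets

object TextAsBytes {
  // Encodes the given text as UTF-8 and returns the resulting byte values as integers.
  // (Assumption: UTF-8; plain English letters take one byte each in that encoding.)
  def bytesOf(text: String): Vector[Int] =
    text.getBytes(StandardCharsets.UTF_8).toVector.map( _.toInt )
}

object TextAsBytesDemo extends App {
  // Prints the numbers that a UTF-8 text file would store for the characters T, h, i, and s.
  println(TextAsBytes.bytesOf("This"))
}
```

A text editor does the reverse: it reads such numbers from the file and displays the corresponding characters.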
Reading a File, Example 1 of 7
Our goal: read a few characters
Say we have a text file named example.txt that contains these three lines of text:
This is the first line from the file.
This is the second one.
This third line is the last.
You’ll find just such a file in the Files module, which also contains all of this chapter’s example programs.
As our first example, let’s create a program that reads in one character at a time and reports some of the first characters that it finds in the file. The program will produce this output:
The first character in the file is T.
The second character in the file is h.
The third character in the file is i.
The fourth character in the file is s.
The fifth character in the file is .
The sixth character in the file is i.
Solution: fromFile and next
scala.io contains a class named Source, which represents data that the program reads from some source such as a file. The class has a companion object that provides a number of factory methods; one of them is named fromFile. This factory method creates a Source object that receives its data from a file. Here’s an example:
import scala.io.Source

object ReadingExample1 extends App {
  val file = Source.fromFile("Files/example.txt")
  println("The first character in the file is " + file.next() + ".")
  println("The second character in the file is " + file.next() + ".")
  println("The third character in the file is " + file.next() + ".")
  println("The fourth character in the file is " + file.next() + ".")
  println("The fifth character in the file is " + file.next() + ".")
  println("The sixth character in the file is " + file.next() + ".")
  file.close()
}
- We have a file variable that refers to the Source object. That object is an iterator (iteraattori); an iterator is associated with a particular data source (such as a file or a collection) and is capable of working through those data elements once.
- We call next on it to obtain the next unprocessed character from the file. This iterator first brings us the initial character on the first line within the text file (a Char).
- Consecutive invocations of next bring us the following characters from the file.
- Once we no longer need the file, we should release it; close does that.
- The program looks for a folder named Files and, therein, a file named example.txt in the program’s working directory. When you launch a program in IntelliJ, the working directory is the entire project’s root folder, under which the Files module folder resides.
Reading a File, Example 2 of 7: I/O Errors
I/O programming inescapably involves the possibility of runtime errors. For example, any of the previous program’s file.next() calls fails if the file cannot be accessed. That might happen for a number of reasons; for instance, the file might be on a memory stick that someone disconnects from the computer just as the program needs the file.
When that happens, calling next fails with a runtime error of type IOException, which crashes the example app. The remaining lines of code don’t run at all.
Our goal: always close the file
As we noted already, the “hygiene” of I/O programming demands that we release any resources allocated to our program once we no longer need them. In this example, we need to make sure to always call close once we’re done.
The above program code does indeed call close, but that call never happens if one of the next calls crashes with an error.
Closing a file even on errors is a good habit to learn. There are several ways to accomplish that; in this chapter, we’ll use the try construct.
Solution: try and finally
This program works just like the previous one but deals with errors more neatly:
import scala.io.Source

object ReadingExample2 extends App {
  val file = Source.fromFile("Files/example.txt")
  try {
    println("The first character in the file is " + file.next() + ".")
    println("The second character in the file is " + file.next() + ".")
    println("The third character in the file is " + file.next() + ".")
    println("The fourth character in the file is " + file.next() + ".")
    println("The fifth character in the file is " + file.next() + ".")
    println("The sixth character in the file is " + file.next() + ".")
  } finally {
    file.close()
  }
}
- The try keyword and the curlies mark a code block whose contents will be run as per usual (or at least, the computer will try to run that code normally). The try block’s execution is interrupted if an error occurs. Without the surrounding try block, our example program would terminate right away, but...
- ... we have followed the try block with a finally block, in which we specify what we’d like to happen after the try block in all scenarios, even if the try block’s execution was interrupted by an error.
- In effect, the finally block says: “No matter what happened in the try block, close the connection to the file.”
What about reacting to errors?
A finally block lets you specify which commands to run no matter if an error occurred or not. But what if you’d like to do something only in case an error occurred in a particular section of the program? You might like to inform your app’s user of a problem, for instance.
Moreover, the try–finally construct that we used doesn’t actually prevent the program from crashing once it has executed the finally block. What if you’d like your program to keep running despite the error? (Resuming after an error may or may not be a reasonable thing to do, depending on your program and the type of the error.)
One of the things you can do is add a catch block to your try: it specifies what the program does in case the code in the try block failed with an error. (The topic is covered in other courses, including Programming Studio 1 and Programming Studio 2. It should be easy to find other sources as well.)
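For the curious, here is a minimal sketch of what a catch block might look like. It reacts to just one kind of problem: the named file cannot be opened, in which case Source.fromFile fails with a FileNotFoundException (one kind of IOException). The names CatchExample and describeFirstChar and the file name are made up for this illustration; this is only one of many ways to structure such code.

```scala
import scala.io.Source
import java.io.FileNotFoundException

object CatchExample {
  // Returns a description of the first character in the named file,
  // or an error message if the file cannot be opened.
  def describeFirstChar(fileName: String): String =
    try {
      val file = Source.fromFile(fileName)    // fails if there is no such file
      try {
        if (file.hasNext) "The first character is " + file.next() + "."
        else "The file is empty."
      } finally {
        file.close()                           // runs whether or not reading succeeded
      }
    } catch {
      case _: FileNotFoundException =>
        "Could not open " + fileName + "."     // the program keeps running after this
    }
}

object CatchExampleDemo extends App {
  // Assumption: no file of this name exists, so we expect to see the error message.
  println(CatchExample.describeFirstChar("Files/no_such_file.txt"))
}
```

Note the two nested try constructs: the inner one makes sure that an opened file gets closed, and the outer one deals with the case where the file couldn’t be opened in the first place. Other kinds of errors still crash the program.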
Reading a File, Example 3 of 7
Our goal: read every character
The previous program read in only the first six characters. Let’s develop it into a program that reads and prints out every single character in the file. Assuming the same text file, our next program will produce this output:
Character #1 is T.
Character #2 is h.
Character #3 is i.
Character #4 is s.
Character #5 is .
Character #6 is i.
Character #7 is s.
Character #8 is .
Character #9 is t.
[Long section of similar output snipped.]
Character #82 is s.
Character #83 is .
Character #84 is t.
Character #85 is h.
Character #86 is e.
Character #87 is .
Character #88 is l.
Character #89 is a.
Character #90 is s.
Character #91 is t.
Character #92 is ..
Solution: a Source object in a for loop
import scala.io.Source

object ReadingExample3 extends App {
  val file = Source.fromFile("Files/example.txt")
  try {
    var charCount = 1 // stepper: 1, 2, ...
    for (char <- file) {
      println("Character #" + charCount + " is " + char + ".")
      charCount += 1
    }
  } finally {
    file.close()
  }
}
Since the Source object is an iterator, we can use it in a for loop. This corresponds to repeatedly asking the iterator for the next element by calling next until there are no more elements available.
More on iterators and for loops
As mentioned, an iterator object has the job and capability of iterating over a sequence of data elements once, in a particular order.
One way to obtain an iterator is to ask a collection for one: any Scala collection can create an iterator object that traverses the collection’s elements. Behind the scenes, Scala’s collection objects use iterators extensively to provide the various methods that you know.
An iterator object has access to the data that it traverses; moreover, the iterator tracks its own progress during that traversal. How it accomplishes that depends on the data source: an iterator that traverses a buffer, for example, may track its progress by storing the current index.
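The sketch below shows one way such progress tracking could look: a hand-written iterator that works through an indexed collection by storing the index of the next element. The class name IndexTrackingIterator is made up for this illustration, and the iterators in Scala’s standard library differ from it in their details.

```scala
// A hand-written iterator over the elements of an indexed collection (a sketch only).
class IndexTrackingIterator[Element](private val data: IndexedSeq[Element]) extends Iterator[Element] {

  private var nextIndex = 0                 // stepper: how far the traversal has advanced

  // Are there any elements that haven't been processed yet?
  def hasNext: Boolean = this.nextIndex < this.data.size

  // Returns the next unprocessed element and advances past it.
  def next(): Element = {
    if (!this.hasNext) throw new NoSuchElementException("no more elements")
    val element = this.data(this.nextIndex)
    this.nextIndex += 1                     // remember the progress for the next call
    element
  }
}

object IndexTrackingDemo extends App {
  val iterator = new IndexTrackingIterator(Vector(10, 5, 2, 4))
  while (iterator.hasNext) {
    println(iterator.next())
  }
}
```

Notice that nothing in the class ever decreases nextIndex: this iterator only moves forward.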
As an application programmer, when you process a Scala collection’s elements, you generally don’t have to expressly create an iterator. This doesn’t mean that an iterator isn’t being used; it means that Scala automatically manages the iterator object for you. Let’s take a brief but instructive look at what that means.
Consider this familiar sort of for loop, which prints out a vector’s contents:
// Version 1
val myData = Vector(10, 5, 2, 4)
for (number <- myData) {
  println(number)
}
Vectors have an iterator method, which creates a new iterator object that iterates over the vector (cf. Source.fromFile). Such an iterator knows how to provide one vector element at a time and how to track its own progress through the vector’s elements. To illustrate, here is some code that does the same as Version 1 above.
// Version 2
val myData = Vector(10, 5, 2, 4)
val myIterator = myData.iterator
for (number <- myIterator) {
  println(number)
}
Version 3, below, also has the same effect but is phrased differently: it uses the iterator’s next and hasNext methods explicitly instead of leaving that to the for loop to take care of.
// Version 3
val myData = Vector(10, 5, 2, 4)
val myIterator = myData.iterator
while (myIterator.hasNext) {
  println(myIterator.next())
}
Chapter 6.3 mentioned that Scala’s for loops are a different way to write foreach method calls. For instance, the for loop in Version 1 above is just a different way to call foreach. The Scala compiler turns the code in Version 1 essentially into this:
// Version 4
val myData = Vector(10, 5, 2, 4)
myData.foreach( println(_) )
As you can see, the code invokes foreach on the vector. But what does the vector’s foreach method do? The answer is that since the vector needs to traverse its own elements, it creates an iterator object and invokes that iterator’s foreach method; the iterator takes care of the actual traversal. Version 4 works essentially identically to Version 5, below:
// Version 5
val myData = Vector(10, 5, 2, 4)
myData.iterator.foreach( println(_) )
We won’t dig deeper into the internal implementation of different iterators here. But perhaps these examples shed some light on what happens as you use a for loop on a collection.
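One practical consequence of all this is worth a quick experiment: a single iterator object gets used up by a traversal, whereas a collection can always hand out a fresh iterator. The following sketch illustrates that with the same vector as above; the names OneTraversal and sumRemaining are made up for this example.

```scala
object OneTraversal {
  // Sums up whichever elements the given iterator has not yet processed.
  def sumRemaining(numbers: Iterator[Int]): Int = {
    var sum = 0                        // gatherer
    for (number <- numbers) {
      sum += number
    }
    sum
  }
}

object OneTraversalDemo extends App {
  val myData = Vector(10, 5, 2, 4)
  val myIterator = myData.iterator
  println(OneTraversal.sumRemaining(myIterator))       // traverses all four elements
  println(OneTraversal.sumRemaining(myIterator))       // the same iterator has nothing left to give
  println(OneTraversal.sumRemaining(myData.iterator))  // a fresh iterator starts from the beginning
}
```

The second call finds no elements to add up, because the first call already advanced the iterator past every element.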
Reading a File, Example 4 of 7
Our goal: read the whole file in a variable
What if we’d like to take the entire contents of the file and store them in a String variable or pass them on to a function? For instance, we might want to read in musical notes that have been stored in a text file and pass the whole song to play.
Let’s do just that. You’ll find one suitable example file, running_up_that_hill.txt, in the Files module.
Solution: mkString
import scala.io.Source
import o1.play

object ReadingExample4 extends App {
  val file = Source.fromFile("Files/running_up_that_hill.txt")
  try {
    val entireContents = file.mkString
    play(entireContents)
  } finally {
    file.close()
  }
}
A Source object has a mkString method (Chapter 4.2) just like collections do. Here, we use the method to make a string that contains the entire file contents, with line breaks and all.
Reading a File, Example 5 of 7
Our goal: print a file with line numbers
Let’s write a program that prints out a numbered report of what’s in our example file:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
Towards a solution: getLines
Now, we could operate on the file one character at a time (as in Examples 1 to 3 above), or we could construct a multi-line string (as in Example 4) and then split it into lines. But it’s natural to process a text file one line at a time, and easy too, so let’s do that instead.
Before we get to the working program, let’s experiment a bit and write some code to print out the first two lines from the file:
This is the first line from the file.
This is the second one.
Here’s the code:
val file = Source.fromFile("Files/example.txt")
val linesFromFile = file.getLines
println(linesFromFile.next())
println(linesFromFile.next())
- The getLines method returns another iterator object, which relies on and governs the original one-char-at-a-time iterator (which file refers to). This new iterator lets us access the file’s contents one line at a time...
- ... since the new iterator’s next method tells the file iterator to provide characters until it reaches a newline character. We get a return value that contains all the characters on a single line. (The return value is a String, not a Char as was the case with our original iterator.)
- Calling next again returns the following line.
Solution
Here’s a program that numbers all the lines as we intended:
import scala.io.Source

object ReadingExample5 extends App {
  val file = Source.fromFile("Files/example.txt")
  try {
    var lineNumber = 1 // stepper
    for (line <- file.getLines) {
      println(s"$lineNumber: $line")
      lineNumber += 1
    }
  } finally {
    file.close()
  }
}
The for loop goes through each of the lines that the iterator from getLines gives us.
Below is an alternative way to implement the try block. This version discards the index-counting stepper in favor of the zipWithIndex method (Chapter 8.4):
for ((line, index) <- file.getLines.zipWithIndex) {
  println(s"${index + 1}: $line")
}
Reading a File, Example 6 of 7
Our goal: first the lines, then their lengths
Now let’s try to process the file so that we get this output:
LINES OF TEXT:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
LENGTHS OF EACH LINE:
37 characters
23 characters
28 characters
THE END
Attempted solution: getLines twice
Here’s a simple way to solve the problem in principle:
- Iterate over each of the lines in the file, printing each one along with its number.
- Then iterate over the lines again and print the length of each one.
Let’s turn this idea into Scala code. We’ll use two consecutive loops. Each calls getLines and processes one line of text at a time:
import scala.io.Source

object ReadingExample6 extends App {
  val file = Source.fromFile("Files/example.txt")
  try {
    println("LINES OF TEXT:")
    var lineNumber = 1 // stepper
    for (line <- file.getLines) {
      println(s"$lineNumber: $line")
      lineNumber += 1
    }
    println("LENGTHS OF EACH LINE:")
    for (line <- file.getLines) {
      println(s"${line.length} characters")
    }
    println("THE END")
  } finally {
    file.close()
  }
}
Running that program is disappointing, however:
LINES OF TEXT:
1: This is the first line from the file.
2: This is the second one.
3: This third line is the last.
LENGTHS OF EACH LINE:
THE END
Our program completely failed to print the line lengths!
Reading a File, Example 7 of 7
Our goal: fix that bug
The reason the previous program didn’t work is that an iterator is capable of iterating over a data sequence only once. The iterator object keeps track of how far it has advanced but it never steps backwards or starts over.
For instance, the iterator that fromFile returns (and that we indirectly use via getLines) is capable of advancing through the file’s contents just once, from beginning to end. When the second loop calls getLines, we get only any remaining lines that the iterator hasn’t yet processed. And since the first loop already processed every single line, there are no lines left for the second loop to do anything with.
One possible solution is to call Source.fromFile twice: once before each loop. Each loop would then use a completely different iterator and each would read the data file independently of the other loop. The drawback of that solution is that our program would then unnecessarily read the file twice, and it is remarkably slow to read data from external memory compared to executing simple programming-language instructions.
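That first option, opening the file separately for each pass, could be sketched roughly as follows. The names ReadingTwice, forEachLine, and report are made up for this illustration; the helper function simply bundles up the open, loop, and close steps so that we don’t need to write them twice.

```scala
import scala.io.Source

object ReadingTwice {
  // Opens the named file, applies the given function to each line in turn,
  // and closes the file. Each call creates a new Source and thus a new iterator.
  def forEachLine(fileName: String)(action: String => Unit): Unit = {
    val file = Source.fromFile(fileName)
    try {
      for (line <- file.getLines()) {
        action(line)
      }
    } finally {
      file.close()
    }
  }

  // Reads the file twice: first prints the numbered lines, then the lines' lengths.
  def report(fileName: String): Unit = {
    println("LINES OF TEXT:")
    var lineNumber = 1                       // stepper
    forEachLine(fileName) { line =>
      println(s"$lineNumber: $line")
      lineNumber += 1
    }
    println("LENGTHS OF EACH LINE:")
    forEachLine(fileName) { line =>          // a second, independent pass over the file
      println(s"${line.length} characters")
    }
    println("THE END")
  }
}

object ReadingTwiceDemo extends App {
  ReadingTwice.report("Files/example.txt")
}
```

This version never holds more than one line in memory at a time, but it pays for that by accessing the file twice.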
Another solution is to copy the file’s entire contents into a collection; we can then iterate over that collection as many times as we like. The downside of this approach is that we need to store the file’s entire contents in main memory.
There are other ways
Can you come up with further alternative solutions?
Below is an example of the second approach.
Solution: toVector
import scala.io.Source

object ReadingExample7 extends App {
  val file = Source.fromFile("Files/example.txt")
  try {
    val vectorOfLines = file.getLines.toVector
    println("LINES OF TEXT:")
    var lineNumber = 1 // stepper
    for (line <- vectorOfLines) {
      println(s"$lineNumber: $line")
      lineNumber += 1
    }
    println("LENGTHS OF EACH LINE:")
    for (line <- vectorOfLines) {
      println(s"${line.length} characters")
    }
    println("THE END")
  } finally {
    file.close()
  }
}
We copy all the lines that the iterator provides into a vector by calling toVector (cf. mkString in Example 4 above). Now we have the data in a Vector[String], which we can operate on just like any other vector.
Writing a File
Given a suitable library class, writing a text file is straightforward enough. One of the available tools, which we’ll use here, is the PrintWriter class from package java.io.
Here’s a little example program that writes ten thousand random numbers between 0 and 99 in a text file, one number per line.
import java.io.PrintWriter
import scala.util.Random

object WritingExample extends App {
  val fileName = "Files/random.txt"
  val file = new PrintWriter(fileName)
  try {
    for (n <- 1 to 10000) {
      file.println(Random.nextInt(100))
    }
    println("Created a file " + fileName + " that contains pseudorandom numbers.")
    println("In case the file already existed, its old contents were replaced with new numbers.")
  } finally {
    file.close()
  }
}
- Create a PrintWriter object as shown. Pass in the name of the target file. Note! If a file of that name didn’t exist yet, it will be created. If it did exist already, the file’s old contents will be lost and replaced by what our program writes in the file.
- The println method writes a single line of text into the file.
Try it
That program, too, is available in the Files module under o1.io. Try running it.
Try running the program again. The old numbers get replaced by newly randomized ones.
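If you’d like to convince yourself that what was written really ends up in the file, you can combine the writing and reading tools from this chapter into a small round trip. Here is a minimal sketch of that idea; the names RoundTrip, writeNumbers, and readNumbers and the file name are made up for this illustration, and the reading function assumes that each line of the file holds exactly one integer.

```scala
import java.io.PrintWriter
import scala.io.Source

object RoundTrip {
  // Writes the given numbers in the named file, one number per line.
  // Any earlier contents of the file are replaced.
  def writeNumbers(fileName: String, numbers: Vector[Int]): Unit = {
    val file = new PrintWriter(fileName)
    try {
      for (number <- numbers) {
        file.println(number)
      }
    } finally {
      file.close()
    }
  }

  // Reads the numbers back from the file. (Assumption: one integer on each line.)
  def readNumbers(fileName: String): Vector[Int] = {
    val file = Source.fromFile(fileName)
    try {
      file.getLines().map( _.toInt ).toVector   // copy everything before the file is closed
    } finally {
      file.close()
    }
  }
}

object RoundTripDemo extends App {
  val fileName = "roundtrip_demo.txt"   // hypothetical name; the file appears in the working directory
  RoundTrip.writeNumbers(fileName, Vector(3, 14, 15))
  println(RoundTrip.readNumbers(fileName))
}
```

Calling writeNumbers a second time on the same file name and reading again shows only the newer numbers, in line with the note above about old contents being replaced.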
File-handling practice
Write a program that asks the user to provide a file name, reads in numbers from the named file, and computes a few statistics: how many numbers there are in the data set, the average and median of the numbers, and the number of occurrences of the most frequently-occurring number.
Assume that the input file contains characters that correspond to Doubles, one number on each line, as shown below. (Exact integers are also OK as inputs.)
10.4
5
4.12
10.9
10.9
Below is an example run, which assumes that test.txt exists and contains the five lines listed above. Note that the user types in the file name.
Please enter the name of the input file: test.txt
Count: 5
Average: 8.264
Median: 10.4
Highest number of occurrences: 2
You’ll find some skeleton code in FileStatistics.scala in the Files module. Write your code there.
Use try and finally and remember to close the file.
You may want to apply the program to the random numbers generated by WritingExample. Does the random-number generator seem to be doing its job right?
Summary of Key Points
The I/O tools in o1.util
O1Library has a handful of convenience functions for simple file-handling needs. These functions rely on the standard libraries scala.io and java.io.
- Tools for reading input and writing output (I/O) are available in the standard libraries that come with programming languages. These tools help the programmer work with files, among other things.
- You can use the tools in scala.io and java.io to read and write text files (and other files).
- Whenever you do file I/O, you need to be alert to the possibility of failures (exceptions) that may occur at runtime.
- You can learn more about I/O, files, and exception handling in later courses and online.
- Links to the glossary: I/O; file, text file; runtime error, exception handling; main memory, external memory, hard drive.
Credits
Thousands of students have given feedback that has contributed to this ebook’s design. Thank you!
The ebook’s chapters, programming assignments, and weekly bulletins have been written in Finnish and translated into English by Juha Sorva.
The appendices (glossary, Scala reference, FAQ, etc.) are by Juha Sorva unless otherwise specified on the page.
The automatic assessment of the assignments has been developed by: (in alphabetical order) Riku Autio, Nikolas Drosdek, Joonatan Honkamaa, Jaakko Kantojärvi, Niklas Kröger, Teemu Lehtinen, Strasdosky Otewa, Timi Seppälä, Teemu Sirkiä, and Aleksi Vartiainen.
The illustrations at the top of each chapter, and the similar drawings elsewhere in the ebook, are the work of Christina Lassheikki.
The animations that detail the execution of Scala programs have been designed by Juha Sorva and Teemu Sirkiä. Teemu Sirkiä and Riku Autio did the technical implementation, relying on Teemu’s Jsvee and Kelmu toolkits.
The other diagrams and interactive presentations in the ebook are by Juha Sorva.
The O1Library software has been developed by Aleksi Lukkarinen and Juha Sorva. Several of its key components are built upon Aleksi’s SMCL library.
The pedagogy of using O1Library for simple graphical programming (such as Pic) is inspired by the textbooks How to Design Programs by Flatt, Felleisen, Findler, and Krishnamurthi and Picturing Programs by Stephen Bloch.
The course platform A+ was originally created at Aalto’s LeTech research group as a student project. The open-source project is now shepherded by the Computer Science department’s edu-tech team and hosted by the department’s IT services. Markku Riekkinen is the current lead developer; dozens of Aalto students and others have also contributed.
The A+ Courses plugin, which supports A+ and O1 in IntelliJ IDEA, is another open-source project. It was created by Nikolai Denissov, Olli Kiljunen, Nikolas Drosdek, Styliani Tsovou, Jaakko Närhi, and Paweł Stróżański with input from Juha Sorva, Otto Seppälä, Arto Hellas, and others.
For O1’s current teaching staff, please see Chapter 1.1.
fromFile takes in a file name and returns a Source object that is capable of accessing that file’s contents.