
Chapter 5.4: Inside the Scala Toolkit

About This Page

Questions Answered: How does Scala work? What sorts of things does IntelliJ do for me behind the scenes? Why do some arithmetic operations work all funny?

Topics: Representing data as (binary) numbers. File formats. Compilers, virtual machines, bytecode, and machine language. Garbage collection. Restrictions on numeric data types. Scala tools at the command line.

What Will I Do? Mostly read. A few mini-assignments.

Rough Estimate of Workload: One hour. (More, if you read/watch the optional parts.)

Points Available: A5.

Related Modules: None.


Notes: There’s an embedded video, but it’s entirely optional.


Introduction

We’ll now take a break from learning to instruct the computer in new ways. In this chapter, we will instead investigate the different stages that a program goes through on its way to becoming a concrete program run.

Let’s get started with this short presentation:

[Figure: The illustrations in this chapter pay tribute to a classic book (Finnish edition pictured).]

An IDE such as IntelliJ takes care of these stages automatically and often unnoticeably. IntelliJ, too, uses a number of individual tools such as a compiler and a virtual machine; it’s just that their integration into the IDE is seamless enough that you often don’t need to think about the fact that you are actually using multiple auxiliary programs. So far in O1, we’ve let IntelliJ work its magic. But now we’ll take a peek under the hood.

Why?

If IntelliJ does things automatically for us, is there any call for this chapter?

Well, yes. For starters, a program can be viewed at different levels of abstraction. The execution of an object-oriented program, for example, can be described in highly abstract terms as communication among objects. That high level of abstraction is founded on a lower level where the program is a sequence of instructions in a programming language. At a lower level still, the execution of those instructions involves operations on bits within a computer’s processor.


Even as you work at a particular level of abstraction, it’s helpful to understand the basic principles of the level beneath. Peering inside an abstraction is particularly useful when your instructions to the computer fail to do what you meant or expected.

The same goes for the auxiliary programs that help us program. Even if you use an IDE, it pays off to have at least a passing understanding of how it works.

A second good reason for this chapter is that in the “real world” outside O1, IntelliJ is hardly the only tool that people use for programming. It makes sense to know the basic principles behind our toolkit so that you’ll find it easier to adapt to similar but different tools.

Another reason is that much of what we’re about to discuss is considered by many to be the sort of general knowledge that any programmer or informed citizen should have.

And of course, some programmers need to be able to create IntelliJs and the like for themselves and other programmers to use. (But that’s not something we’ll get to in O1.)

So let’s take a look at the way Scala works.

Distinct Tools for Distinct Purposes

It will be easier to appreciate the distinct stages of a Scala program if we leave the comfort of our IDE for a while and use separate utilities instead.

The official Scala toolkit (from http://www.scala-lang.org/) contains a number of programs:

  • a compiler named scalac (from Scala compiler);

  • a program for starting the virtual machine, named simply scala;

  • a program for creating documents, Scaladoc (Chapter 3.2); and

  • various other tools not discussed here.

The short presentation below demonstrates how to create and run a tiny Scala program using a text editor, a compiler, and a virtual machine as separate programs.

Know the command line?

To follow the presentation below, it’s helpful if you have even a passing familiarity with the command line — that is, if you know how to enter textual commands in an environment such as Windows’s Command Prompt or PowerShell or Linux’s Terminal windows. If you don’t, you may wish to look up a tutorial online. On the other hand, you can probably appreciate the main points even without such experience.

You should now have an inkling of how a Scala program gets processed. The next sections elaborate on the transformations and tools introduced above and link them to what you’ve done so far in IntelliJ.

Much of what’s coming up involves transformations of data within the computer, so you’ll need to know something about how computers represent data. Let’s discuss bits.

Numbers as Bits, Data as Numbers

A binary number consists of zeros and ones. A single binary digit, a single zero or one, is called a bit (bitti).

For instance, here are a couple of numbers in our usual (decimal) number system converted into binary:

$$14_{\text{decimal}} = 8 + 4 + 2 = 1 \cdot 2^3 + 1 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 1110_{\text{binary}}$$

$$5_{\text{decimal}} = 4 + 1 = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 101_{\text{binary}}$$
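If you want to check conversions like these yourself, a few lines of Scala will do. The helper `toBinary` below is a hypothetical name made up for this sketch; it spells out the method of repeatedly dividing by two. The standard library's `toBinaryString` on `Int` and Java's `Integer.parseInt` with a radix of 2 perform the same conversions for you. The sketch assumes Scala 3 syntax.

```scala
// A hypothetical helper: converts a non-negative Int into binary digits.
// The remainder n % 2 is the last binary digit; the digits before it are
// the binary form of n / 2, which we compute recursively.
def toBinary(n: Int): String =
  require(n >= 0, "this sketch handles only non-negative numbers")
  if n < 2 then n.toString
  else toBinary(n / 2) + (n % 2).toString

@main def binaryDemo(): Unit =
  println(toBinary(14))                 // 1110
  println(toBinary(5))                  // 101
  println(14.toBinaryString)            // 1110 (the standard library agrees)
  println(Integer.parseInt("1110", 2))  // 14 (converting back to decimal)
```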

A computer is a digital system. In its memory, all data is represented as binary numbers or other sequences of bits.

Images and text as bits

A picture is made up of colored dots, pixels. The color of each pixel can be expressed as numbers and those numbers can be expressed as bits.

Text is made up of characters. A standard can assign each character a unique number, and Scala strings follow the numbering scheme defined by the Unicode standard (http://www.unicode.org/). Here are a few example characters and their values in Unicode:

Value in Unicode    Character
32                  space
38                  &
48                  0
49                  1
65                  A
66                  B
97                  a
98                  b
122                 z
945                 α (alpha)

Knowing the number that matches a particular character, we can represent that character as a binary sequence (one way or another; direct conversion into binary as illustrated above is one option). Strings can then be represented as sequences of such binary numbers.
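In Scala, you can look up these numbers directly: calling `toInt` on a `Char` gives its Unicode value, and `toChar` goes the other way. The helper name `unicodeValues` is made up for this sketch, which assumes Scala 3. (This simple approach works for characters like those in the table; some rarer characters, such as many emoji, take up two `Char`s in a Scala string.)

```scala
// A hypothetical helper: the Unicode value of each character in a string.
def unicodeValues(text: String): List[Int] =
  text.map(_.toInt).toList

@main def charactersAsNumbers(): Unit =
  val codes = unicodeValues("Ab&")
  println(codes)                                  // List(65, 98, 38)
  println(codes.map(_.toBinaryString))            // List(1000001, 1100010, 100110)
  println(List(122, 945).map(_.toChar).mkString)  // zα
```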

References as bits

You’re familiar with the concept of a reference by now: a reference is a small granule of data that you can use to access some other data (an object). In this ebook’s animations, you have seen references stored in variables and passed as parameters. At a lower level of abstraction, those references, too, are binary sequences that — one way or another — identify a specific location in computer memory that stores an object.
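Scala doesn't let you inspect the bits of a reference, but you can observe that a reference identifies one specific object. In the sketch below (Scala 3 syntax), `==` compares the buffers' contents, whereas `eq` checks whether two references point to the very same object.

```scala
import scala.collection.mutable.Buffer

@main def referencesDemo(): Unit =
  val first  = Buffer(5, 2, 7)
  val second = Buffer(5, 2, 7)  // a separate object that happens to have equal contents
  val alias  = first            // a second reference to the object that first refers to
  println(first == second)      // true: the contents are equal
  println(first eq second)      // false: two different objects
  println(first eq alias)       // true: one object, two references
  alias += 32                   // modify the object through one reference...
  println(first)                // ...and see the change through the other: ArrayBuffer(5, 2, 7, 32)
```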

Programs as bits

It’s not just the data that a program processes that gets stored as bits; the program itself also consists of bits, usually stored in a file. In fact, the program is data, too, and programs can process other programs!

In the case of Scala and most other programming languages, source code is stored as so-called “plain text”, which consists of the binary representations of its characters and nothing more. The compiler takes in that data and produces an output that represents the program in a different binary format.
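Here is a small sketch (Scala 3 syntax) of source code as plain data. It encodes one line of code into bytes using UTF-8, a widely used text encoding; UTF-8 is simply our choice for the example, not the only option. For the characters in this line, each byte's value equals the character's Unicode value, while a character such as α takes two bytes.

```scala
import java.nio.charset.StandardCharsets

// One line of Scala source code, handled as plain data
val sourceLine = "val x = 1"

@main def sourceAsBytes(): Unit =
  val bytes = sourceLine.getBytes(StandardCharsets.UTF_8)
  println(bytes.map(_.toInt).toList)  // List(118, 97, 108, 32, 120, 32, 61, 32, 49)
  // The same bytes written out as bit patterns
  println(bytes.map(b => (b & 0xff).toBinaryString).mkString(" "))
  // Decoding the bytes with the same encoding gives back the original text.
  println(new String(bytes, StandardCharsets.UTF_8))    // val x = 1
  // Not every character fits in a single byte in UTF-8:
  println("α".getBytes(StandardCharsets.UTF_8).length)  // 2
```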

What are bits actually made of?

How did they make machines “understand” bits (1 and 0)?

I’ve always wondered what those “ones and zeros” really mean.

[Figure: J.M. Jacquard]

In earlier technology, bits were easier to see than they are in modern computers, where their physical representation is further removed from our everyday reality.

As recently as the 1970s, it was perfectly common to program computers using punched cards: pieces of perforated cardboard with patterns of holes that represented programs. For instance, the presence or absence of a hole in a specific spot on a card could represent a bit; a combination of such bits could represent an instruction.

Punched cards were already used in Joseph Marie Jacquard’s programmable loom in 1801. The Jacquard loom was a forerunner of the first true computers.

In modern computers, a single bit may be represented by the presence or absence of an electric charge in a hardware component, or the presence or absence of a tiny pit on a DVD’s surface, or a magnetic trace, just to name a few. Different devices store bits in different ways.

A 2016 study looked into storing bits using individual chlorine atoms on a copper surface. Here’s a quote from the article World’s Smallest Hard Drive Writes Data Atom-By-Atom: “When a chlorine atom is on top with a hole beneath it, it’s a one, the binary digit and when it’s the other way around it’s a zero — thus creating a hard drive.” Further progress in single-atom storage has been made since.

Goodness gracious, what is a qubit?

If you want to rattle your brain, do an internet search for quantum computers (kvanttitietokone) and find out how they use qubits (kubitti) instead of bits.

Now let’s get back to the tools that programmers use.

Editing Source Code

A “plain text” file — often called simply a text file — stores a sequence of bits. Parts of that sequence correspond to numbers, which in turn correspond to characters as defined by a standard. To read and edit such a file, you need an application that’s been programmed to interpret the bits as characters. Such applications include, among many others, IntelliJ’s editor, Emacs, Notepad in Windows, and TextEdit on Macs.

Other file formats (e.g., spreadsheets, GIF image files, Word documents) store bits according to different standards. To work on these files, you need an application that can interpret binary sequences according to the logic of the particular file format.

See for yourself

If you want, you can open, say, an image file in Notepad (or some other non-plain-text file in an editor meant for plain-text editing). The bits in the image file don’t represent written characters, which is why Notepad fails to interpret them correctly. Oblivious to its shortcoming, the application takes in the bits and interprets them as characters; a messy jumble of letters and special characters appears in the editor.
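The same kind of misreading can be sketched in Scala. The example assumes the PNG image format, whose files always begin with the same eight-byte signature, and decodes those bytes with the ISO-8859-1 character set, which turns each byte into exactly one character. Notepad's own interpretation differs in its details, but the effect is similar: a few recognizable letters amid meaningless symbols. The helper `naiveText` is a hypothetical name.

```scala
import java.nio.charset.StandardCharsets

// The fixed eight-byte signature that starts every PNG image file
val pngSignature: Array[Byte] = Array(137, 80, 78, 71, 13, 10, 26, 10).map(_.toByte)

// A hypothetical helper: reads bytes as if they were text, one character per byte.
// ISO-8859-1 maps each byte value from 0 to 255 to the character with that same value.
def naiveText(bytes: Array[Byte]): String =
  new String(bytes, StandardCharsets.ISO_8859_1)

@main def imageAsText(): Unit =
  val text = naiveText(pngSignature)
  // Show characters that aren't letters as their numeric values, so they become visible.
  val visible = text.map(c => if c.isLetter then c.toString else s"[${c.toInt}]").mkString
  println(visible)  // [137]PNG[13][10][26][10]
```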

You can write program code not just in IntelliJ but in any application that lets you edit and save text files. Notepad, for example, is a plausible environment for coding; it’s just that Notepad’s other features don’t provide much if any support for programming, so practically nobody wants to use it for that purpose. There are many other text editors that aren’t part of an IDE but nevertheless have features that support programming.

The files of Scala code that you edit in IntelliJ get stored in the file system just like the documents you save in other applications. The files are located in subfolders of your IntelliJ project folder.

Machine Language

Scala is a high-level language (korkean tason ohjelmointikieli). What that means is that the language has a high level of abstraction and its expressions don’t directly involve low-level concepts like binary numbers or the physical components of a computer.

No ordinary computer has a processor that executes commands in Scala or any other high-level language. A processor produces the program’s behavior by executing large numbers of primitive instructions in machine language (or machine code; konekieli).

An instruction in machine language is a sequence of bits that corresponds to some low-level operation such as arithmetic on binary numbers, comparison between sequences of bits, or switching zeros to ones. Different processors have their own machine languages where different binary sequences correspond to different operations.

Programming in a machine language is laborious and error-prone. Creating a very large software system directly in machine code is like building a real skyscraper from Lego: practically impossible.

Compilation, Intermediate Languages, and Bytecode

The source code that a programmer writes needs conversion into a form that the computer can execute. Traditionally, programmers have tended to distinguish between two ways of accomplishing this: To compile (kääntää) a program is to convert it into executable form before one instructs the computer to run it. In contrast, to interpret (tulkata) a program is to convert the program incrementally while it is already being run.

The first step in processing a Scala program is compilation.

Compilation is a systematic (and rather complex) process that an auxiliary program, a compiler, can take care of. It yields code (so-called “binaries”) that is meant for the machine to run. As it compiles a program, the compiler also checks in advance that the source program follows the programming language’s syntax.

IntelliJ is equipped with a compiler. On default settings, IntelliJ compiles Scala source code automatically “as needed”, such as when you launch a program that you have edited and that thus requires recompilation. You’ve also learned to tell IntelliJ to compile (build) your code via the Build menu or F10.

Many of the error messages that you’ve seen are a product of these automatic compilations. You’ll recall that several earlier chapters have mentioned compile-time errors.

Many programming languages are usually compiled directly into machine code; the resulting machine-code program is then provided for end users to run. For example, the .exe files that you may know from Windows contain machine code, usually produced by compiling a program written in some high-level language. There is an alternative, though:

Intermediate languages and bytecode

Using an intermediate language (välikieli) is an increasingly common alternative to directly translating source into machine code. Here’s the basic idea:

  • You use a compiler that converts the source code into an intermediate language. This happens in advance, before the program is run.

  • To run the program, you use a separate program that converts the intermediate code into machine code. This conversion happens at runtime. The computer runs the generated machine code.

Scala programs, for example, are (normally) first compiled into an intermediate language.

Some intermediate languages go by the name of bytecode (tavukoodi); the intermediate language used with Scala is one of them. The Scala compiler generates so-called class files that contain bytecode. A single .scala source file compiles into one or more .class files.
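As an illustration, consider a source file that defines a class along with its companion object (a sketch in Scala 3 syntax; the names are made up). On the JVM, the class and the object become two separate classes, so compiling this one file typically yields at least `Counter.class` and `Counter$.class`, and the `@main` method adds more. The exact set of output files depends on the compiler version; Scala 3, for instance, also writes `.tasty` files next to the class files.

```scala
// Counter.scala: one source file with a class and its companion object (hypothetical names)
class Counter:
  private var count = 0
  def increment(): Int =
    count += 1
    count

object Counter:
  val description = "counts upward from zero"

@main def classNames(): Unit =
  // The JVM sees the class and the object as two distinct classes.
  println(classOf[Counter].getName)  // Counter
  println(Counter.getClass.getName)  // Counter$
```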

Locate your class files

Browse the contents of your O1 project folder through the operating system. There is an out folder and within it, a production folder. In there, you’ll find folders that contain the compiled bytecode versions of each module’s Scala source, stored in files with the .class suffix.

Bytecode consists of rather primitive instructions and resembles machine code. Like machine code and unlike source code, bytecode isn’t meant primarily for humans to read and edit. The class files that store the bytecode aren’t text files, and a plain-text editor won’t display their contents properly.
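A program that knows the class file format can nevertheless make sense of the bytes. For instance, the JVM specification requires every class file to begin with the same four bytes, which read CAFEBABE in hexadecimal. The sketch below reads those bytes from the compiled `scala.Option` class that comes with the Scala standard library. The helper `classFileHeader` is hypothetical, and the code assumes Scala 3 and Java 11 or newer (for `readNBytes`).

```scala
import scala.util.Using

// A hypothetical helper: returns the first four bytes of a compiled class
// file, written in hexadecimal. The class must be available on the classpath.
def classFileHeader(className: String): String =
  val cls = Class.forName(className)
  val path = "/" + className.replace('.', '/') + ".class"
  Using.resource(cls.getResourceAsStream(path)) { in =>
    in.readNBytes(4).map(b => f"${b & 0xff}%02X").mkString
  }

@main def peekAtBytecode(): Unit =
  println(classFileHeader("scala.Option"))  // CAFEBABE
```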

[Figure: A class file loaded into the Notepad editor, which is fairly useless for examining the file.]

To make use of bytecode files, you need a program that specializes in just that. The most important of such programs is the virtual machine.

Virtual Machines and the JVM


The word virtual (virtuaalinen) has many meanings, such as:

  • artificial;

  • simulated;

  • unreal although appearing real;

  • being close to something but not quite being it; and

  • being something in effect or appearance without actually or officially being it.

A virtual machine is a computer program that operates as though it were a computer, without actually being a physical computer.

When you tell IntelliJ to run your program, it looks as though IntelliJ takes in your Scala code and runs it. IntelliJ itself does not in fact run your program, however; what it does is launch an auxiliary program — a virtual machine that takes care of running your program.

IntelliJ gives the virtual machine a bytecode version of the code you wrote. It tells the virtual machine where to start executing the program (i.e., where the code that should run first is). In essence, what IntelliJ does is exactly what we did as we manually entered the command scala Ave.scala on the command line above.

The virtual machine that you’ve used is, in effect, a computer whose processor can execute bytecode instructions; bytecode is that machine’s virtual machine language. The virtual machine processes each bytecode instruction, works out which instructions in actual machine code correspond to it, and passes that actual machine code to the physical computer. In practice, what happens is that your program gets executed.

The virtual machine governs the program run. It sees to it that the bytecode instructions are dealt with in the correct order and manages the program’s use of memory. In other words, the virtual machine takes care of exactly the sorts of things that you’ve seen illustrated as animated diagrams throughout this ebook: it allocates and deallocates frames in the call stack, creates objects in memory, and so forth.

Scala programs are usually run in a virtual machine called the JVM, which is short for the Java Virtual Machine.

Hold on! The Java virtual machine?!


Isn’t Java the name of a completely different programming language?

Indeed it is, but we nevertheless run our Scala programs in the Java Virtual Machine.

The JVM is a program that runs programs that have been compiled into a specific bytecode format. That bytecode format was originally created for use with the Java programming language, which is why the virtual machine is known as the JVM. However, the JVM can run any program that has been compiled into the JVM’s bytecode, no matter which programming language the source code was originally written in.

So we can perfectly well use the JVM to run Scala programs as long as we have a Scala compiler that transforms our programs into .class files. Over the past couple of decades, the JVM has become a popular platform for running programs written not only in Java but newer and more advanced programming languages as well, Scala being one example.

Despite appearances, the scala that we typed on the command line launched the Java Virtual Machine. This command would have failed if we had tried it on a computer that does not have both the Scala toolkit and the JVM installed.

Scala, Java, JavaScript

The most widely used implementation of Scala, which we use in O1 also, is the one that’s “built on top of Java”. The Scala libraries that we rely on have been implemented using the manifold libraries that are available for Java. If you pay attention, you can glimpse the underlying Java implementation here and there. Perhaps you’ve already noticed some of the following.

  • Our guide for installing IntelliJ and Scala notes that you need not only Scala but a Java toolkit as well.

  • When you launch a Scala application or the REPL in IntelliJ, a text shows up onscreen that mentions Java.

  • In Chapter 4.2, our program crashed with a NullPointerException. More specifically, the error message said java.lang.NullPointerException, because this error type has been defined in Java’s standard library and Scala’s standard library reuses the type.
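You can also confirm these Java roots from within Scala itself. The sketch below (Scala 3 syntax) asks the JVM for the full names of two types that Scala programs use routinely.

```scala
@main def javaUnderneath(): Unit =
  // Scala's String type is Java's java.lang.String.
  println(classOf[String].getName)                // java.lang.String
  // The exception type from Chapter 4.2 is defined in Java's standard library.
  println(classOf[NullPointerException].getName)  // java.lang.NullPointerException
  // Provoking that exception on purpose, just to see its full name:
  val missing: String = null
  try
    println(missing.length)
  catch
    case e: NullPointerException => println(s"Caught ${e.getClass.getName}")
```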

However, the Scala language isn’t dependent on Java as such. Alternative implementations for the Scala libraries and tools can be made and have been made. The most significant of these alternatives is Scala.js, which is built on top of the JavaScript programming language and makes it possible to run Scala programs in a web browser.

A rationale for intermediate languages and virtual machines

What are the benefits of first compiling to an intermediate language and then running the program in a virtual machine? Wouldn’t it be more straightforward just to compile directly to machine code?

Let’s consider a few important criteria:

Portability (siirrettävyys)

  • Intermediate language + virtual machine: A program compiled into an intermediate language can be run in any environment where a virtual machine is available. For example, Scala programs compiled for the JVM can be run on any platform for which a JVM implementation has been created. (In part, this is also a downside: a virtual machine is needed.)

  • Directly to machine code: The program will run only in a specific environment that can process the particular machine language. To run the program in different environments, we need to compile multiple different versions of the program.

Interoperability (yhdisteltävyys)

  • Intermediate language + virtual machine: Multiple source languages can be compiled into the same intermediate language. This makes it easier for program components to communicate even if they were originally written in different languages. For example, programs that run on the JVM can make use of the many libraries written for Java.

  • Directly to machine code: Using components written in another language may require special measures that are particular to the combination of languages.

Efficiency (suoritustehokkuus)

  • Intermediate language + virtual machine: Using an intermediate language isn’t maximally efficient. Some programs will run somewhat slower. Optimizations to the virtual machine can mitigate this drawback. When a virtual machine is in widespread use (like the JVM is), optimizations to the VM benefit many programmers who write their source code in any of a number of languages.

  • Directly to machine code: The compiler knows not only the source language but also the specific machine language that is being targeted. This information is useful for optimizing the efficiency of the generated machine code. Depending on circumstances, the resulting gain in efficiency may or may not be relevant in practice.
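The interoperability point above is easy to try out, since Scala code can use Java library classes just as it uses Scala ones. This sketch (Scala 3 syntax) borrows `LocalDate` from Java's `java.time` package and `ArrayList` from `java.util`; the date is arbitrary.

```scala
import java.time.LocalDate
import java.util.ArrayList

@main def javaLibraries(): Unit =
  // LocalDate comes from Java's standard library, not Scala's.
  val day = LocalDate.of(2024, 1, 31)
  println(day.plusDays(1))  // 2024-02-01
  // Java's own collection classes can be used from Scala as well.
  val words = new ArrayList[String]()
  words.add("bytecode")
  words.add("JVM")
  println(words.size)       // 2
```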

Virtual machines are more popular now than they were in the past, and the trend seems to be continuing.

On actually running bytecode

Some unusual computer hardware has been made that executes JVM bytecode “directly”. Most computers don’t, though, so we use a virtual machine that translates the bytecode into machine code.

Garbage in the Virtual Machine

We sometimes create an object that we need for only a short while. For example, an object might be used only while a single line of code is running. Consider the following code. What happens to the buffer object after we’ve executed the print command?

import scala.collection.mutable.Buffer

println(Buffer(5, 2, 7, 32, 9)) // Note: we don’t store a reference to the newly created buffer
// After the println, we might enter other commands that don't use the buffer.

Objects that have been abandoned in memory and are inaccessible to the program are called garbage (roska). Garbage just takes up computer memory but no longer serves any purpose.

So as not to waste memory resources, any memory allocated for garbage should be released for other use while the program is still running. That doesn’t always happen, though. A memory leak (muistivuoto) is a classic bug: a program allocates memory for its needs but fails to release its hold on that memory after it’s done with it. Little by little, a leaking program may end up reserving more memory than is available and crash.

Memory management

Some programming environments expect the programmer to issue explicit commands in order to release memory that is no longer needed. Consider an example in the C++ programming language. The first command below reserves memory for an object’s data and initializes a new object. It looks quite a bit like Scala.

SomeClass* myObj = new SomeClass();     // This is C++, not Scala.
var myObj = SomeClass()                 // This is the corresponding command in Scala.

Once the object has served its purpose, the C++ programmer removes the object from memory:

delete myObj;                           // This is C++, not Scala.

This prevents the unneeded object from cluttering up the computer’s memory.

We haven’t learned a Scala command that works like delete. There isn’t one.

If we have a long-running Scala program that keeps creating objects, will it also require an ever-increasing amount of memory?

No:

Automatic garbage collection


One of the jobs of the Java Virtual Machine is garbage collection (roskienkeruu). The JVM automatically keeps track of which objects have become garbage and releases the memory allocated to those objects. As programmers, we don’t have to deal with garbage directly.

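The JVM considers an object to be garbage once the program can no longer reach it: no variable that the program can still use refers to the object, either directly or through other objects that the program can reach.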

Automatic garbage collection eliminates most memory leaks. Even so, a Scala program isn’t wholly immune to leakage. Consider this example: A program uses a buffer to store assorted references to objects. As long as those references are stored in the buffer, JVM’s garbage collector won’t release the memory that the objects take up, even if the program no longer actually needs all the objects in the buffer for anything. However, if the buffer object itself becomes garbage, and if the objects it stores aren’t stored anywhere else, then both the buffer and its contents become garbage and will be automatically cleaned up. It’s up to the programmer to design their program so that it stores references in variables or collections only while the referenced objects have a purpose in the program.
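One way to follow that advice is to limit how many references a collection holds on to. The class `RecentMessages` below is a hypothetical sketch (Scala 3 syntax), not part of O1's modules: it keeps only the most recent messages and removes older ones from its buffer. Once nothing else refers to a removed message, it becomes garbage; when the garbage collector actually reclaims the memory is up to the JVM.

```scala
import scala.collection.mutable.Buffer

// A hypothetical class that remembers only the most recent messages.
class RecentMessages(val capacity: Int):
  require(capacity > 0, "capacity must be positive")
  private val messages = Buffer[String]()

  def add(message: String): Unit =
    messages += message
    if messages.size > capacity then
      messages.remove(0)  // drop the reference to the oldest message

  def current: Vector[String] = messages.toVector

@main def recentDemo(): Unit =
  val log = RecentMessages(3)
  for n <- 1 to 1000 do log.add(s"message $n")
  println(log.current)  // Vector(message 998, message 999, message 1000)
```

Removing the first element of a `Buffer` shifts the remaining elements, which is fine for a small capacity like this one.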

Optional Reading: Sharing Your Program

How can I share my Scala app so that others can use it?

There are many alternative ways to do that. Below is a brief summary. The topic will come up again in the spring course Programming Studio 2.

  1. If the end user’s computer already has the Java Virtual Machine and its associated libraries installed, a fairly simple option is to pack your app into an executable JAR. An executable JAR is a file that bundles together the compiled .class files as well as the libraries that Scala needs. Within the JAR file, you can also record which app object starts your program. SBT (Simple Build Tool) is one of the tools that you can use to bundle a Scala app in an executable JAR.

  2. Scala.js is a relatively new toolkit that compiles Scala into JavaScript rather than bytecode. Despite its name, JavaScript is not at all the same as Java; its unique merit is that modern web browsers can run it as part of a web page.

  3. Native compilers generate platform-specific machine code. Such a compiled program may then be bundled with a library that takes care of things like garbage collection. Even though Scala isn’t usually compiled this way, there are tools for it: Scala Native is a new native compiler for Scala. It’s based on the LLVM compiler toolkit.

  4. A “Java wrapper” is a program that bundles a compiled Java or Scala program together with a small helper program. When run, the wrapper program either uses an existing JVM if available or, failing that, installs the required software.

  5. Java Web Start is a technique for creating web links that launch the JVM. The Java Web Start tool helps the user install the application and the JVM as needed before starting the app.

For more information on several of the alternatives listed above, see:

When reading up on these matters, bear in mind that — as stated in our glossary — the word “Java” sometimes refers to the Java programming language and sometimes to the JVM and associated technologies (the latter being more relevant to us as Scala programmers). Authors aren’t always clear on which one they’re talking about.

Last but not least, it’s worth mentioning that these days many applications are provided to end users as web applications whose internal logic runs on a networked server while the user interface runs in the user’s web browser, a client. When Scala is used for implementing the server side of things, it suffices to install the appropriate tooling (the JVM, etc.) on the server rather than the client. Web applications will be discussed further in O1’s follow-on courses.

Optional Reading: Compilers and Virtual Machines

JVMs here and there

How dependent is the JVM on hardware? Will I find a JVM on PCs, on my Mac, and in a Mercedes E-Class car?

There are many JVM implementations for a variety of environments. There are JVMs on personal laptops and desktops but also in tiny embedded systems and on server farms that run high-traffic web sites. Embedded systems with limited computing capacity often run a lightweight version of Java while other systems run the same JVM that desktop computers do.

Java Technology can be found in a broad spectrum of products across a diverse set of industries that produce anything from RFID readers to parking meters to ATMs to in-flight video systems to POS terminals to wearable systems (just to name a few) — — across a large number of hardware and OS platform configurations.

—A sales pitch by Oracle, the company that shepherds the Java ecosystem

The JVM isn’t the only virtual machine capable of running Java or Scala programs. Android phones, for instance, feature a virtual machine that deviates from the JVM standard.

In some contexts, using a JVM or a similar virtual machine isn’t a good fit. Here’s one example:

Some time ago, I heard that Java cannot be used in nuclear plants and similar sites that require extreme reliability. The reason had something to do with garbage collection. Does the same apply to Scala?

Yes. The garbage collector, which the JVM automatically runs in the background, needs a bit of time to do its work every now and then: just a tiny moment, but still. That, in combination with other factors, makes it difficult or impossible to predict, with extreme precision, how much time it will take to run a particular piece of code. The difficulty is enough to interfere with life-critical systems such as nuclear plants.

Some operating systems and programming languages are designed specifically for life-critical systems (see, e.g., real-time operating system; real-time computing; the SPARK language). These tools support programming methodologies that are more formal and more dependable (see, e.g., model checking; formal verification).

Is it possible to “un-compile” a compiled program back to source code?

In principle, yes. In practice, sort of.

When source code is compiled into machine language, some of the information recorded in the source is generally lost. For instance, variable names, comments, and other constructs that structure the high-level program may be lost.

Depending on the original program, the programming language, and the compiler, the code produced by a reverse compiler may be very difficult for a human to make sense of. It is by no means guaranteed that the code is easy enough to comprehend that reverse compilation is worth the effort.

There are tools that purposely make it more difficult to recover the source code from its compiled version.

Look these up: decompiler, reverse engineering, code obfuscation.

Compilers vs. interpreters

Compilation and interpretation are separated by a line drawn in sand. It’s not always easy — but perhaps not necessary, either — to say if a tool is a compiler or an interpreter. Intermediate languages and virtual machines further muddle the distinction, as does just-in-time compilation or JIT (ajonaikainen kääntäminen). As an optional reading assignment, look up the latter on the internet.

Some programming environments thoroughly interleave the writing of source code, its compilation/interpretation, and its execution. The Scala REPL is one such environment.

Exploring the JVM bytecode

(This optional assignment is best suited to readers who have programming experience from before O1. This topic will be introduced in more detail in Programming 2.)

You can use auxiliary programs named javap and scalap to examine the bytecode instructions that a Scala program compiles to. Search online to find out how to use these utilities, then use them to examine a compiled Scala class of your choice.

More virtual machines

Find out what other virtual machines and intermediate languages exist apart from the JVM and its bytecode. For example: find out what the CLR is and how JavaScript is being used as an intermediate language for other high-level languages in web browsers.

Optional Reading: Open Source

Open-source software

Open source (avoin lähdekoodi) is yet another term whose precise meaning depends on who you’re talking to. Nevertheless, the core meaning is that open source is about making the source code of a program publicly available and allowing people other than the original creator to use it in their software projects.

Source code can be published under any of a number of open-source licences. Those licences set different policies on whether commercial use is allowed, whether the licensee must also publish derivative work under the same licence, and so on.

Open source can be an asset when multiple parties — companies and/or individuals — develop software in collaboration. An open-source community can brainstorm ideas, implement them as code, and jointly assess the work of community members to ensure better quality.

Open source is also associated with values and ideologies, as discussed on Wikipedia.

There are at least hundreds of thousands of open-source projects; the Linux operating system is probably the most famous one. Other well-known examples of open source include the Firefox browser and the Thunderbird email application, both developed by the Mozilla Foundation, a non-profit organization that promotes openness. IntelliJ is another example. So is Scala.

Each page that describes a class in the Scala API documentation has a Source link near the top. You can click it to take a look at that class’s source code. Scala has been published under a highly permissive licence.

Optional Viewing: Ethical Considerations

Ethics in Programming

Open source is one of several topics discussed in the conference presentation that is embedded as a video below. The end of the video discusses a specific open-source project and, more generally, the video serves as an illustration of goals and values that some open-source programmers embrace.

Other themes that surface during the presentation include: a programmer’s ethical responsibility for the programs they write; how to reconcile business goals, programming, and the desire to make the world a better place; safe email; and why mass surveillance matters even if you personally have nothing to hide.

Most of the video should be understandable even with limited prior knowledge. Unfortunately, the short introductory bit at the beginning is probably the most confusing part for many O1 students, but this primer may help you: The first of the two speakers, Martin Fowler, repeatedly refers to agile software development (ketterä ohjelmistokehitys), which has emerged over the past couple of decades as a mainstream approach to software development and of which Fowler himself is a renowned champion. Very briefly put, to be agile means to develop software in a way that makes it possible to react quickly to changing circumstances and requirements and involves continuous collaboration between developers and their customers (for more information, see Wikipedia). This video isn’t about agility as such; Fowler uses agile development as a point of departure as he guides the talk to other topics.

More on Bits, Numbers, and Data Types

Think about the command println(2147483647 + 1) and the output it produces. Then enter the command in the REPL. Did the output match your expectations?

Enter the output you got here:

Think about the command println(100.0 / 11.0 * 11.0) and the output it produces. Then enter the command in the REPL. Did the output match your expectations?

Enter the output you got here:

At the beginning of this chapter, we established that the computer stores numbers and other data as bits. That’s something we don’t need to constantly think about as programmers though. A Scala program is a high-level abstraction of a computer’s internal behavior, and much of the time we can think about our programs at that abstract level. However, even in Scala programs, bits do come into play sometimes. The two print commands above are cases in point. Let’s see why their output is so odd.

Counting bits

Consider how many different values can be represented with a given number of bits:

  • Zero bits can represent only a single value. (And a single value isn’t enough to express information; cf. Unit from Chapter 1.6.)

  • One bit can represent two different values. A single bit can be used for representing a Boolean value, for instance, or to indicate whether a number is 0 or 1.

  • Two bits can represent a maximum of four different values. For instance, we might specify that 00=0, 01=1, 10=2, and 11=3. Or that 00=0, 01=1, 10=-2, and 11=-1.

  • Three bits can represent a maximum of eight different values.

  • Given some number of bits, n, we can use them to represent a maximum of $2^n$ different values.
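The doubling pattern is easy to check in code. Below is a minimal Scala sketch; the names BitCounting and distinctValues are made up for this illustration. It uses BigInt, one of the numeric types introduced later in this chapter, so that the count itself cannot run into the size limits this chapter is about.

```scala
object BitCounting {
  // Number of distinct bit patterns of the given length: 2 to the power of bits.
  // BigInt is used so that the result cannot overflow even for large bit counts.
  def distinctValues(bits: Int): BigInt = BigInt(2).pow(bits)

  def main(args: Array[String]): Unit = {
    for (bits <- Seq(0, 1, 2, 3, 8, 32))
      println(s"$bits bits: ${distinctValues(bits)} different values")
  }
}
```

Each extra bit doubles the count, which is why 32 bits already yield over four billion patterns.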

On Ints

For reasons of efficiency, Scala specifies that a standard number of bits — 32 to be precise — is used for storing each and every value of type Int. Scala is a typical language in this sense; many other programming languages, too, similarly specify standard bit counts for numeric types.

The value of an Int variable thus invariably takes up 32 bits of memory. Put differently, an Int takes up four bytes (tavu) of memory; one byte equals eight bits.

32 bits is enough for $2^{32}$ different combinations of zeros and ones. This means that there are $2^{32}$ different values of type Int. A constant, finite number of bits simply isn’t enough to represent an infinite number of different integers.

The Scala language specification states that the $2^{32}$ values are the integers in this interval:

-2147483648 ... +2147483647

that is

$-2^{31} \ldots +2^{31}-1$
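The standard library exposes the endpoints of this interval as Int.MinValue and Int.MaxValue, and Java’s Integer class reports how much storage an int takes. The sketch below (IntRange is a made-up name) checks the bounds against $2^{31}$ computed with Long arithmetic, which has room to spare.

```scala
object IntRange {
  // The ends of the 32-bit Int interval, as provided by the standard library.
  val smallest: Int = Int.MinValue // -2147483648
  val largest: Int = Int.MaxValue  //  2147483647

  // 2^31 computed as a Long, so the value itself does not overflow.
  val twoToThe31: Long = 1L << 31

  // Storage size of an Int (the JVM's int): 32 bits, that is, 4 bytes.
  val bitsPerInt: Int = java.lang.Integer.SIZE
  val bytesPerInt: Int = java.lang.Integer.BYTES

  def main(args: Array[String]): Unit = {
    println(s"Int range: $smallest ... $largest")
    println(s"Size: $bitsPerInt bits = $bytesPerInt bytes")
  }
}
```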

The compiler rejects integer literals that are too large to be Ints, but it does not check whether the results of computations “fit” the data type, and neither does the virtual machine. It’s your job as a programmer to make sure you don’t attempt to represent numbers outside that interval as Ints.

Computing with numbers that are too large or too small can yield some unexpected results, as in 2147483647 + 1. As you can see, the first operand in our example wasn’t chosen at random: it is precisely the largest integer that fits the Int type. Adding one more causes the result to “wrap around”.
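Here is a small sketch of the wrap-around and of two ways to avoid it. It relies on java.lang.Math.addExact, a standard JVM method that throws an ArithmeticException on Int overflow instead of wrapping; the names IntOverflow and checkedSum are hypothetical.

```scala
object IntOverflow {
  // Adding 1 to the largest Int silently wraps around to the smallest Int.
  val wrapped: Int = Int.MaxValue + 1

  // Math.addExact does the same addition but throws ArithmeticException on overflow.
  // Here the exception is turned into None to signal "no valid Int result".
  def checkedSum(a: Int, b: Int): Option[Int] =
    try Some(Math.addExact(a, b))
    catch { case _: ArithmeticException => None }

  // Converting to Long before adding leaves room for the true result.
  val widened: Long = Int.MaxValue.toLong + 1

  def main(args: Array[String]): Unit = {
    println(wrapped)                     // -2147483648
    println(checkedSum(Int.MaxValue, 1)) // None
    println(widened)                     // 2147483648
  }
}
```

Which remedy fits depends on the program: detecting the overflow is useful when an out-of-range result signals a bug, while switching to a larger type is useful when such numbers are legitimately expected.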

The pros don’t always get it right, either

The limitations of numeric data types can impinge on user experience. Former O1 students have spotted this in games:

2,147,483,647 happened to be the gold limit in World of Warcraft way back when. The more recent expansions made it so easy to hoard money that they raised the gold limit.

In Hearthstone, I’ve noticed that if a minion’s health or attack strength goes past 2,147,483,647, the minion gets destroyed. Now I have a better idea why.

A good example of the same error in reverse is Gandhi’s behavior in Civilization I.

In 2014, Gangnam Style broke YouTube: “The music video for South Korean singer Psy’s Gangnam Style exceeded YouTube's view limit, prompting the site to upgrade its counter. YouTube’s counter previously used a 32-bit integer [which] means the maximum possible views it could count was 2,147,483,647.”

The failed 1996 launch of the Ariane 5 rocket that carried the Cluster spacecraft is an example of a more serious incident. Wikipedia reports:

“[The launch] ended in failure due to an error in the software design [which] caused inadequate protection from integer overflow. This resulted in the rocket veering off its flight path 37 seconds after launch, beginning to disintegrate under high aerodynamic forces, and finally self-destructing by its automated flight termination system. The failure has become known as one of the most infamous and expensive software bugs in history. The failure resulted in a loss of more than US$370 million.”

If you have a fear of flying, keep your eyes closed for this paragraph: in 2015, it came out that the electricity in the Boeing 787 Dreamliner would switch off unless the plane was “rebooted” at least every 248 days. “What’s 248 days in other time units?”, one wonders.

Sometimes, an oversight such as this becomes a data-security issue.

On Doubles and floating-point numbers

Scala allocates eight bytes, or 64 bits, to every Double. It uses those 64 bits to represent each Double as a so-called floating-point number (liukuluku; Double comes from “double-precision floating-point number”). The floating-point format consists of a sign bit (“plus or minus”), a multiplier known as the mantissa, and an exponent.

The details of the format are unimportant for present purposes (but you can easily find them on the internet if you’re interested). For us, the significant thing is that since the available memory is finite, it’s impossible to represent a decimal number of arbitrary precision as a Double. The number of bits allocated for each Double constrains the data type’s magnitude and precision. 64 bits is enough to reach a precision of roughly fifteen significant digits.
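For the curious, the sketch below pulls the three fields out of a Double’s 64 bits. It assumes the IEEE 754 double-precision layout used on the JVM: 1 sign bit, 11 exponent bits (stored with a bias of 1023), and 52 fraction bits that encode the mantissa. java.lang.Double.doubleToLongBits hands over the raw bits as a Long; DoubleBits and fields are made-up names.

```scala
object DoubleBits {
  // Splits a Double into (sign, biased exponent, fraction) according to IEEE 754.
  def fields(d: Double): (Long, Long, Long) = {
    val bits = java.lang.Double.doubleToLongBits(d)
    val sign = (bits >>> 63) & 1L               // topmost bit
    val exponent = (bits >>> 52) & 0x7FFL       // next 11 bits
    val fraction = bits & ((1L << 52) - 1)      // lowest 52 bits
    (sign, exponent, fraction)
  }

  def main(args: Array[String]): Unit = {
    println(fields(1.0))  // (0,1023,0)
    println(fields(-2.0)) // (1,1024,0)
    println(fields(0.1))  // 0.1 has no exact binary form, so its fraction bits are not all zero
  }
}
```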

Since Doubles aren’t stored to an arbitrary precision, computing with Doubles often yields “inexact” results such as the one you got when you evaluated 100.0 / 11.0 * 11.0. The division doesn’t produce an “infinitely precise” result; multiplying the approximate result by eleven gives only something close to a hundred.

Here’s another example:

2.999999999999999
res0: Double = 2.999999999999999
2.9999999999999999
res1: Double = 3.0

Due to finite precision, it’s rarely meaningful to use the == operator to compare floating-point numbers.
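Instead of ==, a common approach is to check whether two Doubles are within some small tolerance of each other. A minimal sketch follows; approximatelyEqual is a hypothetical helper, and its default tolerance of 1e-9 is an arbitrary choice for illustration.

```scala
object DoubleComparison {
  // Treats two Doubles as equal if they differ by at most the given tolerance.
  def approximatelyEqual(a: Double, b: Double, tolerance: Double = 1e-9): Boolean =
    math.abs(a - b) <= tolerance

  def main(args: Array[String]): Unit = {
    val sum = 0.1 + 0.1 + 0.1
    println(sum == 0.3)                                     // false
    println(approximatelyEqual(sum, 0.3))                   // true
    println(approximatelyEqual(100.0 / 11.0 * 11.0, 100.0)) // true
  }
}
```

An absolute tolerance like this suits numbers whose magnitude is close to one; for very large or very small values, a tolerance scaled to the size of the operands is usually more appropriate.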

The phenomenon has some odd-looking consequences. Let’s try rounding a couple of numbers to the nearest integer:

5.499999999999999.round
res2: Long = 5
5.4999999999999999.round
res3: Long = 6

The second Double literal actually stands for the number 5.5, which rounds upwards.
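One way to confirm this is to compare the literal with 5.5 directly. The sketch below (RoundingDemo is a made-up name) does so and also shows that round on a Double produces a Long.

```scala
object RoundingDemo {
  // This literal has more significant digits than a Double can hold,
  // so it is stored as the nearest representable Double, which is exactly 5.5.
  val literalIsFiveAndAHalf: Boolean = 5.4999999999999999 == 5.5

  // round on a Double returns a Long rather than an Int.
  val roundedDown: Long = 5.499999999999999.round
  val roundedUp: Long = 5.4999999999999999.round

  def main(args: Array[String]): Unit = {
    println(literalIsFiveAndAHalf) // true
    println(roundedDown)           // 5
    println(roundedUp)             // 6
  }
}
```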

But what does Long mean in the REPL’s output?

Other numerical types in Scala

Int and Double are the most common numeric types in Scala. But they aren’t the best fit for all situations, so there are alternatives available. It’s up to the programmer to use types that are “large enough” to fit whichever numbers the program needs. Moreover, it occasionally pays off to optimize resources by choosing types that take up fewer bits of memory than Ints and Doubles do.

  • You can represent large integers with the eight-byte data type Long (long integer), which is enough for numbers in the $10^{18}$ range.

  • You can save resources by representing integers as Shorts (short integer; 2 bytes) or Bytes (1 byte). Similarly, you can use the Float type (4 bytes) to represent floating-point numbers. Using these types is advisable only when you know the optimization is needed in your particular situation.

  • If Longs and Doubles aren’t enough, you can use scala.math.BigInt and scala.math.BigDecimal, which can represent numbers of arbitrary size. Objects of these types aren’t limited by a constant number of bits but use a flexible amount of space that depends on their actual numeric value. The downside is that the numbers take up more memory and calculations on them aren’t as fast.
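A short sketch contrasting these types. The object and value names are made up; the MaxValue constants and the BigInt and BigDecimal operations are standard Scala. The BigDecimal values are built from strings so that each 0.1 is stored exactly in decimal form.

```scala
object OtherNumericTypes {
  // Largest values of the fixed-size integer types.
  val byteMax: Byte = Byte.MaxValue    // 127 (1 byte)
  val shortMax: Short = Short.MaxValue // 32767 (2 bytes)
  val longMax: Long = Long.MaxValue    // 9223372036854775807 (8 bytes)

  // BigInt grows as needed, so this addition does not wrap around.
  val bigSum: BigInt = BigInt(Int.MaxValue) + 1

  // Decimal arithmetic with BigDecimal: three tenths add up to exactly 0.3.
  val exactTenths: BigDecimal = BigDecimal("0.1") + BigDecimal("0.1") + BigDecimal("0.1")

  // Float uses 32 bits, so its approximation of 1/3 differs from Double's.
  val floatThird: Float = 1.0f / 3.0f

  def main(args: Array[String]): Unit = {
    println(s"$byteMax $shortMax $longMax")
    println(bigSum)      // 2147483648
    println(exactTenths) // 0.3
    println(floatThird)
  }
}
```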

In O1, you’ll be fine using Ints for integers and Doubles for “decimal numbers”. You can look up the other types online if you’re interested.

Mini-assignment: colors as numbers

It already came up that each pixel of an image can be represented as numbers that define the pixel’s exact hue.

One common representation is the RGB model, which defines each color in terms of its red, green, and blue components. Each component is a number. The range of those numbers depends on how many different colors are available. A range of zero to 255 is common:

  • Each of a pixel’s three RGB components is one of the 256 numbers between zero and 255. These numbers indicate “how much red”, “how much green”, and “how much blue” there is in the pixel.

  • This means that each pixel has one of $256^3$ (roughly 16.8 million) colors. In order to represent a pixel’s three color components, we need 3 * 8 bits (three bytes) of memory, since eight bits (one byte) can represent exactly $2^8$, or 256, different values.
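The three-byte layout can be illustrated by packing the components into a single Int with bit shifts. This is a hypothetical sketch (RgbPacking, packRgb, and unpackRgb are made-up names) showing one common convention; it is not necessarily how o1’s Color stores its data internally.

```scala
object RgbPacking {
  // Puts red in bits 16-23, green in bits 8-15, and blue in bits 0-7.
  def packRgb(red: Int, green: Int, blue: Int): Int = {
    require(Seq(red, green, blue).forall(c => c >= 0 && c <= 255), "components must be 0 to 255")
    (red << 16) | (green << 8) | blue
  }

  // Recovers the components by shifting and masking with 0xFF (eight one-bits).
  def unpackRgb(rgb: Int): (Int, Int, Int) =
    ((rgb >> 16) & 0xFF, (rgb >> 8) & 0xFF, rgb & 0xFF)

  // 256 values for each of three components.
  val colorCount: Int = 256 * 256 * 256

  def main(args: Array[String]): Unit = {
    val packed = packRgb(220, 150, 220)
    println(packed)            // 14456540
    println(unpackRgb(packed)) // (220,150,220)
    println(colorCount)        // 16777216
  }
}
```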

The familiar Color type you know from the o1 package uses exactly such an RGB representation for colors. This is easy to observe:

Thus far, we’ve used only color constants like Red and LightBlue. You can also create an object that represents a specific hue:

val preciselyTheColorWeWant = Color(220, 150, 220)
preciselyTheColorWeWant: Color = Color(220, 150, 220)

The parameters (between 0 and 255) determine the color we get. In this example, we have a pale violet color with especially high values for the red and blue components.

Every Color object has the variables red, green, and blue:

preciselyTheColorWeWant.red
res4: Int = 220
preciselyTheColorWeWant.green
res5: Int = 150

The constant colors that you’ve used also have the same variables; see for yourself. Find out the component values of White, Black, and Brown, for example.

How much red is there in Brown?

Pic objects have a pixelColor method that you can call to get the color of a specific pixel: myPic.pixelColor(x, y). Try it.

How much red is there in the pixel that appears at (100,100) within Pic("defense.png")?

(That picture file comes with O1Library and is available to you in the REPL.)

More on colors

There are more things you can do with a Color object. Here are a couple of examples; the documentation lists more methods.

It’s easy to generate a slightly darker or lighter version of a color:

val slightlyLighter = Brown.lighter
slightlyLighter: Color = Color(174, 64, 64)
val somewhatDarker = Brown.darker.darker
somewhatDarker: Color = Color(133, 33, 33)
val testPic = rectangle(100, 100, Brown).leftOf(rectangle(100, 100, slightlyLighter))
                                        .leftOf(rectangle(100, 100, somewhatDarker))
testPic: Pic = combined pic
testPic.show()

In addition to their R, G, and B components, colors have an opacity value (sometimes called the alpha channel):

Blue.opacity
res6: Int = 255
val translucentRed = Color(255, 0, 0, 100)
translucentRed: Color = Color(255, 0, 0, 100)
rectangle(300, 100, translucentRed).onto(circle(200, Blue)).show()

Unless otherwise specified, colors are fully opaque: their opacity is at the highest value permitted.

Opacity is the fourth component of a color. A color with an opacity of one hundred is fairly translucent. An opacity of zero would have made it completely transparent.
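What a translucent color looks like on screen depends on how it is blended with whatever lies beneath it. A common rule, often called “source over” compositing, mixes each component in proportion to the opacity. The sketch below assumes that rule and uses integer division, which truncates; it illustrates the idea and does not describe how O1Library actually renders. The names OpacityBlend, blendComponent, and blend are hypothetical.

```scala
object OpacityBlend {
  // Mixes one component of a translucent source color over an opaque background:
  //   result = (source * opacity + background * (255 - opacity)) / 255
  def blendComponent(source: Int, background: Int, opacity: Int): Int =
    (source * opacity + background * (255 - opacity)) / 255

  // Applies the rule to each of the (red, green, blue) components.
  def blend(source: (Int, Int, Int), opacity: Int, background: (Int, Int, Int)): (Int, Int, Int) =
    (blendComponent(source._1, background._1, opacity),
     blendComponent(source._2, background._2, opacity),
     blendComponent(source._3, background._3, opacity))

  def main(args: Array[String]): Unit = {
    // translucentRed (opacity 100) drawn over pure blue.
    println(blend((255, 0, 0), 100, (0, 0, 255))) // (100,0,155)
  }
}
```

With opacity 255 the result is the source color unchanged, and with opacity 0 it is the background unchanged, which matches the description of fully opaque and fully transparent colors.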


Summary of Key Points

  • The data that computer programs process — including programs themselves — is represented in computer memory as bits.

  • Scala code isn’t executed as is. Typically, it’s first translated into an intermediate language known as bytecode. This translation is taken care of by an auxiliary program known as a compiler.

  • The bytecode is in turn translated into a machine language suitable for the computer system that runs the program. This is taken care of by an auxiliary program known as a virtual machine and happens when the program is run.

    • Scala programs are usually run in a virtual machine known as the JVM, which was originally created for running Java programs.

    • Virtual machines make programs more portable and program components more interoperable.

  • The numerical values that we commonly use, such as Ints and Doubles, are stored in memory as a predefined number of bits. This limits these data types from expressing numbers of arbitrary size and precision.

  • Links to the glossary: integrated development environment (IDE); source code, high-level language, machine language, processor, compiler, build; intermediate language, bytecode; virtual machine, the Java Virtual Machine (JVM); Java, JavaScript, Scala.js; command line, garbage collection; bit, byte; floating-point number; RGB.

In this chapter, you’ve seen but a tiny fraction of the vast conceptual and technological landscape around virtual machines, compilers, machine language, and bits and bytes. In follow-on courses, you’ll get to explore farther and delve deeper. If you’re champing at the bit, go ahead and take an advance look at CS-A1120 Programming 2.

P.S.

Here’s one more:

0.1 + 0.1 + 0.1
res7: Double = 0.30000000000000004

Feedback

Please note that this section must be completed individually. Even if you worked on this chapter with a pair, each of you should submit the form separately.

Credits

Thousands of students have given feedback and so contributed to this ebook’s design. Thank you!

The ebook’s chapters, programming assignments, and weekly bulletins have been written in Finnish and translated into English by Juha Sorva.

The appendices (glossary, Scala reference, FAQ, etc.) are by Juha Sorva unless otherwise specified on the page.

The automatic assessment of the assignments has been developed by: (in alphabetical order) Riku Autio, Nikolas Drosdek, Joonatan Honkamaa, Antti Immonen, Jaakko Kantojärvi, Niklas Kröger, Kalle Laitinen, Teemu Lehtinen, Jaakko Nakaza, Strasdosky Otewa, Timi Seppälä, Teemu Sirkiä, Anna Valldeoriola Cardó, and Aleksi Vartiainen.

The illustrations at the top of each chapter, and the similar drawings elsewhere in the ebook, are the work of Christina Lassheikki.

The animations that detail the execution of Scala programs have been designed by Juha Sorva and Teemu Sirkiä. Teemu Sirkiä and Riku Autio did the technical implementation, relying on Teemu’s Jsvee and Kelmu toolkits.

The other diagrams and interactive presentations in the ebook are by Juha Sorva.

The O1Library software has been developed by Aleksi Lukkarinen and Juha Sorva. Several of its key components are built upon Aleksi’s SMCL library.

The pedagogy of using O1Library for simple graphical programming (such as Pic) is inspired by the textbooks How to Design Programs by Flatt, Felleisen, Findler, and Krishnamurthi and Picturing Programs by Stephen Bloch.

The course platform A+ was originally created at Aalto’s LeTech research group as a student project. The open-source project is now shepherded by the Computer Science department’s edu-tech team and hosted by the department’s IT services. Markku Riekkinen is the current lead developer; dozens of Aalto students and others have also contributed.

The A+ Courses plugin, which supports A+ and O1 in IntelliJ IDEA, is another open-source project. It has been designed and implemented by various students in collaboration with O1’s teachers.

For O1’s current teaching staff, please see Chapter 1.1.

Additional credits appear at the ends of some chapters.
