A Guide To Porting C And C++ Code To Rust

Table of Contents
1.1 Introduction
1.2 Licence
1.3 Foreword
1.4 Credits
1.5 Notation used through this book
1.6 Setting Up Rust
1.7 C and C++ Background
1.8 Rust Background
1.9 Let's Start Simple
1.10 Compiling and Linking in More Detail
1.11 Source Layout and Other General Points
1.12 Namespacing With Modules
1.13 Porting Code
1.14 Features of Rust compared with C++
  1.14.1 Types
  1.14.2 Strings
  1.14.3 Variables
  1.14.4 Literals
  1.14.5 Collections
  1.14.6 Structs
  1.14.7 Comments
  1.14.8 Lifetimes, References and Borrowing
  1.14.9 Expressions
  1.14.10 Conditions
  1.14.11 Switch / Match
  1.14.12 Casting
  1.14.13 Enumerations
  1.14.14 Loops
  1.14.15 Functions
  1.14.16 Polymorphism
  1.14.17 Error Handling
  1.14.18 Lambda Expressions / Closures
  1.14.19 Templates / Generics
  1.14.20 Attributes
  1.14.21 Multi-threading
  1.14.22 Lint
  1.14.23 Macros
  1.14.24 Memory Allocation
  1.14.25 Foreign Function Interface
1.15 Porting from C/C++ to Rust
  1.15.1 Copy Constructor / Assignment Operators
  1.15.2 Missing Braces in Conditionals
  1.15.3 Assignment in Conditionals
  1.15.4 Class Member Initialisation
  1.15.5 Headers and Sources
  1.15.6 Forward Declarations
  1.15.7 Namespace Collisions
  1.15.8 Macros
  1.15.9 Type Mismatching
  1.15.10 Explicit / Implicit Class Constructors
  1.15.11 Poor Lifetime Enforcement
  1.15.12 Memory Allocation
  1.15.13 Null Pointers
  1.15.14 Virtual Destructors
  1.15.15 Exception Handling / Safety
  1.15.16 Templates vs Generics
  1.15.17 Multiple Inheritance
  1.15.18 Linker Errors
1.16 Debugging Rust
1.17 Memory Management
1.18 Rust's std:: library
1.19 Rust Cookbook

Introduction

A Guide to Porting C/C++ to Rust
This book is for people familiar with C or C++ who are thinking of using Rust.
Before we go into what Rust is or why it might be preferable to C/C++ in some cases, let's
think of software that is mission critical and must not or should not fail.
Operating system services and daemons
Internet of things devices
Industrial control software
Medical devices - MRI, ultrasound, X-ray, ventilators etc.
High availability servers / databases / cloud storage etc.
Avionics, telemetry, rocketry, drones etc.
All this code must run as efficiently and reliably as possible. It must run on devices for days,
weeks, months or preferably years without failure. It cannot suffer intermittent freezes,
erratic performance, memory leaks, crashes or other issues without impacting on its
purpose.
Normally such software would be written in C or C++, but consider these everyday
programming issues that can afflict these languages:
Dangling pointers. A program dereferences an invalid pointer, causing a crash.
Buffer overruns / underruns. Code writes beyond an allocated buffer causing memory
corruption or a page exception.
Memory leaks. Code allocates memory or resources without calling the
corresponding free action. C++ provides classes such as smart pointers and techniques
like RAII to mitigate these issues, but they still occur.
Data races. Multiple threads write to data at the same time causing corruption or other
destabilizing behavior.
Rust stops these bad things happening by design. And it does so with little or no impact on
runtime performance because most of these things are checked at compile time:
Object lifetimes are tracked automatically so that resources are freed when their owner
goes out of scope and dangling pointers cannot occur in safe code.
The length of arrays and collections is enforced. Accesses that cannot be verified at
compile time are bounds checked at runtime.
Data races are prevented by strict enforcement of mutex / guards and object
ownership.
Code that passes the compiler's checks is transformed into machine code with similar
performance and speed as the equivalent C or C++.
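As a rough illustration of the bounds and data race points above, here is a minimal sketch. The function names checked_read and parallel_count are made up for this example and are not part of any library.

```rust
// Rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Returns the element at `index`, or `None` when the index is out of range.
/// There is no way to read past the end of the slice.
fn checked_read(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

/// Increments a shared counter from several threads. The counter can only be
/// reached through the mutex, so an unsynchronized write would not compile.
fn parallel_count(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..increments {
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    let values = [10, 20, 30];
    println!("{:?}", checked_read(&values, 1)); // Some(20)
    println!("{:?}", checked_read(&values, 7)); // None
    println!("{}", parallel_count(4, 1000)); // 4000
}
```

Note that the slice access is checked when the program runs, while the rule that the counter must be locked before it is modified is enforced by the compiler.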

This is a "zero-cost" approach. The compiler enforces the rules so that there is zero runtime
cost over the equivalent and correctly written program in C or C++. Safety does not
compromise performance.
In addition Rust plays well with C. You may invoke C from Rust or invoke Rust from C using
foreign function interfaces. You can choose to rewrite a critical section of your codebase
and leave the remainder alone.
For example, the Firefox browser uses Rust to analyse video stream data - headers and
such like where corrupt or malicious data could destabilize the browser or even be
exploitable.
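To give a flavour of what parsing untrusted headers looks like, here is a minimal sketch. The header layout (a 4-byte big-endian size followed by a 4-byte type tag) and the names BoxHeader, ParseError and parse_header are assumptions made for illustration. This is not Firefox's code.

```rust
// Rust
/// A hypothetical container header: a 4-byte big-endian size followed by a
/// 4-byte type tag.
#[derive(Debug, PartialEq)]
struct BoxHeader {
    size: u32,
    kind: [u8; 4],
}

#[derive(Debug, PartialEq)]
enum ParseError {
    /// Fewer than 8 bytes were supplied.
    Truncated,
    /// The declared size is smaller than the header or larger than the input.
    BadSize(u32),
}

fn parse_header(data: &[u8]) -> Result<BoxHeader, ParseError> {
    // `get` returns None instead of reading past the end of the buffer.
    let size_bytes = data.get(0..4).ok_or(ParseError::Truncated)?;
    let kind_bytes = data.get(4..8).ok_or(ParseError::Truncated)?;

    let mut size_buf = [0u8; 4];
    size_buf.copy_from_slice(size_bytes);
    let size = u32::from_be_bytes(size_buf);

    // Never trust a length field that came from the input.
    if size < 8 || size as usize > data.len() {
        return Err(ParseError::BadSize(size));
    }

    let mut kind = [0u8; 4];
    kind.copy_from_slice(kind_bytes);
    Ok(BoxHeader { size, kind })
}

fn main() {
    let good = [0, 0, 0, 8, b'f', b't', b'y', b'p'];
    println!("{:?}", parse_header(&good));
    println!("{:?}", parse_header(&good[..3]));
}
```

A corrupt or hostile input produces an Err value that the caller has to handle. It cannot cause a read outside the buffer.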

Why Rust?
Let's start by saying that if what you have works and is reliable, then the answer to the question
is that there's no reason to consider porting.
However if you have code that doesn't work or isn't reliable, or hasn't been written yet or is
due a major rewrite then perhaps you have answered your own question.
You could write the code or fixes in C/C++ in which case you have to deal with all the unsafe
issues that the language does not protect you from. Or you might consider that choosing a
safe-by-design language is a good way to protect you from suffering bugs in the field when
the code is supposed to be ready for production.

Rust is not a magic wand
Despite the things the language can protect you against, it cannot protect you against the
following:
General race conditions and deadlocks between threads
Unbounded growth, e.g. a loop that pushes values onto a vector until memory is
exhausted.
Application logic errors, i.e. errors that have nothing to do with the underlying language,
e.g. missing out the line that should say "if door_open { sound_alarm(); }"
Explicit unsafe sections doing unsafe and erroneous things
Errors in LLVM or something outside of Rust's control.

Licence
The book is written under these terms:

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0
International License.
Refer to the link for the exact legal terms. But in essence you may share and modify this
book providing you do not sell or derive profit from doing so.

Foreword
BEGIN DRAFT BOOK DISCLAIMER
Some of the samples will not compile or may not have been syntax checked
C and Rust code snippets are not distinguished very well yet (styling)
Some of the text makes uncited assertions of fact
Some of the text is marked TODO
Some of the topics that should be covered are brushed over, given undue weight or
omitted entirely
Some of the text probably makes no sense or repeats itself
WITH ALL THAT IN MIND, read on!
END DRAFT BOOK DISCLAIMER
Think of all the software that needs to be reliable in this world. Software that can ill afford
downtime or crashes. Software that is mission critical and must not or should not fail.
Operating system services and daemons
Internet of things devices
Industrial control software
Medical devices, imagery etc.
High availability servers / databases / cloud storage etc.
Avionics, telemetry, rocketry, drones etc.
All this code has to run as efficiently and reliably as possible with the minimum of errors.
It also has to be predictable without sudden freezes or mystery-memory behavior due to
garbage collection.
C and C++ have the speed angle covered but are hard to make reliable. A language like Java
would have the reliability angle covered but is hard to make performant.
What we want is something which runs as fast as C or C++ but has the reliability of a
language like Java. And that is what Rust is about. It compiles into binary executables or libraries just like
C or C++ and can even be used to produce dynamic libraries that can be consumed by other
codebases.

Credits

All of the information found in this document can be gleaned from elsewhere but it tends to
be scattered across documents and sites that are focused on topics not and in many cases
I've sought inspiration and knowledge from TODO
1. Online documentation for Rust
2. The Rustonomicon - describes some of the esoteric reasoning and internals behind the
language and how to perform unsafe programming. Unsafe programming is not the
default but is necessary for interacting with external code.
3. TODO - Design patterns repo
4. TODO - Effective C++, Meyers
5. TODO – various stackoverflow questions and answers

Notation used through this book
Code samples given throughout this book are for C, C++, Rust and general configuration
/ console output.
In order to distinguish each kind they are styled as follows:
C / C++ samples are given in this style:
// C/C++
while (x < y) {
    cout << "x is less than y" << endl;
    ++x;
}

Rust samples are given in this style:
// Rust
if x == 20 {
    println!("Warning!");
}

Standard console output or script is given in this style:
cd myproject/
cargo build

Most of the code samples are abbreviated in some fashion, e.g. they assume the code is
running from within a main() function or they omit noise such as #includes, namespace
definitions and so forth.

Setting up Rust
This section will talk you through setting up Rust for the first time and also how to keep it up
to date.
Getting started is incredibly easy but some details vary depending on your target platform. Rust runs
on Windows, Linux and macOS. In addition you might wish to cross-compile code for
another platform.

Use Rustup
The easiest way to get started is to download and run rustup-init which you can do by
visiting the Rustup site.
The instructions differ for Windows and Unix-like systems:
On Windows, rustup-init is an exe installer.
On Unix / OS X / Linux rustup-init is a shell script.
Either way, when you follow the instructions the installer will download and install rustc,
cargo, and rustup in your bin directory which is ~/.cargo/bin on Unix and
%USERPROFILE%\.cargo\bin on Windows.
It will also set your PATH environment variable so that you can open a terminal and type
rustc, cargo, rustup etc.
Once rustup is installed you can also use the tool for maintenance:
Install additional Rust toolchains (e.g. if you are cross-compiling or supporting multiple
targets you may have more than one toolchain)
Change the default toolchain that is invoked when you type rustc or cargo. Rustup
will create symbolic links / scripts that invoke the appropriate toolchain
Update the toolchain when a new version of Rust is released
Fetch source and documentation

Unix / Linux
The process for running rustup-init.sh is as follows:
1. Open a terminal / console
2. Type "curl https://sh.rustup.rs -sSf | sh"
3. This will download and execute a script which will examine your environment,
recommend the toolchain to download, and offer to modify your PATH environment
variable.
4. Choose option 1 to proceed. Or customize if you want to modify something
5. Wait for download to complete
6. You're done.
If you don't have curl, then you must install it first to proceed, or save the shell script from a
browser to disk and execute that.
To install curl in Linux you would invoke a command like this:
Debian / Ubuntu - sudo apt-get install curl
Fedora / Redhat - sudo dnf install curl

Windows
1. Download rustup-init.exe from rustup.rs.
2. Double click on rustup-init.exe and a console will open
3. Choose option 1 to proceed. Or customize if you want to modify something
4. Wait for download to complete
5. You're done.
If you prefer not to go with the defaults, here are some choices you should decide upon:
1. 32/64 bit version. Most Windows installations are going to be 64-bit these days but you
may have a reason to choose 32-bit.
2. GNU or MSVC ABI. This depends on what toolchain and runtimes you wish to be
compatible with.
The second choice concerns the application binary interface (ABI) you want Rust to be
compatible with.
If you don't care about linking to anything then choose the GNU ABI. Also choose it if
you have DLLs produced by MinGW / MSYS. The advantage of this ABI is that it is more
mature.
If you have Visual Studio installed or intend to use Rust against DLLs created with
Visual Studio, choose the MSVC ABI. One advantage of this option is that you can
debug Rust inside of Visual Studio - the compiler will produce .pdb files that allow you to
step debug Rust.

Keeping Rust up to date

New versions of Rust appear on a semi-frequent basis. If you want to update your
environment to the latest version, it is as simple as this:
rustup update

Sometimes rustup will get an update of its own in which case you type:
rustup self update

Adding Rust source
Rustup installs a Rust toolchain but if you're writing code or debugging you probably should
also get the Rust source code so you can step into it or look at the implementation:
rustup component add rust-src

Manual installation
If you prefer manual installation of Rust then there are packages and instructions on the
Rust site.
Just be aware that Rust has a fairly rapid release cycle so you probably only want to do this
if you have a reason to choose a specific version of Rust and stick with it.
Otherwise you may find yourself uninstalling and reinstalling a new version 6 weeks later all
over again.

Setting up a debugger
Unix / Linux
Debugging Rust is little different from debugging C or C++.
You must install gdb for your platform and then you may invoke it from a console or your
favourite front-end to debug Rust code.
On Linux systems you would normally install gdb from a package with one of these
commands:

sudo apt-get install gdb
# or
sudo dnf install gdb

You may also prefer to use lldb which is a companion project to LLVM (the backend compiler
used by Rust). Refer to the lldb website for information on using it.
Rust comes with a few scripts that wrap gdb and lldb to provide pretty-printing to assist with
debugging. When debugging, you can invoke rust-gdb or rust-lldb to use them.

Windows
If you have chosen Rust with the MSVC ABI then you can debug through Visual Studio with
some limitations. When you create a debug build of your code, the compiler will also create a
.pdb file to go with it. You may open your executable in Visual Studio and step debug it,
inspect variables and so on.

GDB
GDB on Windows is available through MSYS / MinGW distributions.
For example, downloads of the TDM-GCC distribution of MSYS are available from the TDM-GCC site. At the
time of writing this, there is a standalone gdb-7.9.1-tdm64-2.zip download. Choose the
32 or 64-bit version according to your Rust environment.
Extract the zip file to a directory, e.g. C:\tools\gdb-7.9.1-tdm64-2 and add a value to your
PATH environment variable:
set PATH=%PATH%;C:\tools\gdb-7.9.1-tdm64-2\bin\
You can invoke gdb from the command line but more normally you'd prefer a front end.
At the time of writing, perhaps the best option is Visual Studio Code which has plugins for
debugging with GDB and for Rust development. So you can edit and debug from the same
IDE.
Pretty printer
Rust supplies a pretty printer for variable inspection that you can add to GDB. The pretty
printer is a script written in Python that GDB will invoke to display variables.
First ensure you have Python 2.7 installed in your path.
The script is bundled with the Rust source code so you need to have installed that first.
If you installed it with rustup then it can be found in your %USERPROFILE%\.rustup directory, e.g.
c:\users\MyName\.rustup\toolchains\stable-x86_64-pc-windows-gnu\lib\rustlib\src\rust\src\etc
Otherwise it can be found wherever you unzipped your Rust source code under
src\rust\src\etc.
Note the fully qualified path it is under and edit C:\tools\gdb-7.9.1-tdm64-2\bin\gdbinit to
insert the path using forward slashes.
python
import sys
print "---- Loading Rust pretty-printers ----"
sys.path.insert(0, "C:/users/MyName/.rustup/toolchains/stable-x86_64-pc-windows-gnu/lib/rustlib/src/rust/src/etc")
import gdb_rust_pretty_printing
gdb_rust_pretty_printing.register_printers(gdb)
end

Setting up an IDE
Rust is still behind some other languages when it comes to IDE integration but there are
already plugins that provide much of the functionality you need.
Popular IDEs such as Eclipse, IntelliJ, Visual Studio all have plugins that work to varying
degrees of integration with Rust.
Visual Studio Code (not to be confused with Visual Studio) is a cross-platform
programming editor and has a lot of plugins. It can be set up into a complete Rust
development environment by following this tutorial.
Rust plugin for IntelliJ IDEA is under active development. This plugin has a lot of
traction and is turning around new versions on a nearly weekly basis. Offers syntax
highlighting, autocomplete (via built-in parser), cargo builds and eventually other
functionality. IntelliJ is a commercial product but it comes in a community edition which
is sufficient for development.
Visual Rust plugin for Microsoft Visual Studio. Offers syntax highlighting, autocompletion,
interactive debugging.
RustDT for Eclipse is also under active development. It adds syntax highlighting,
autocomplete (via racer), cargo builds and rustfmt functionality to Eclipse.
Atom is a popular editor with heaps of plugins. These plugins are very useful for Rust:
language-rust provides basic syntax highlighting
racer for autocompletion functionality
atom-beautify invokes rustfmt to make code look pretty.
build-cargo invokes cargo for you showing errors and warnings inline.
For other editors and IDEs refer to the Rust and IDEs page on the Rust website.

Racer / Rustfmt
Some of the plugins above make use of Racer and Rustfmt.
Racer is used by some plugins to provide autocompletion functionality.
Rustfmt is a source code formatting tool that makes sure your Rust source code is pretty to
look at, adding spacing, indentation and so on.
You can get both just by typing these commands and waiting for the tools to download and
build themselves - they're written in Rust and built through cargo.
cargo install racer
cargo install rustfmt

C and C++ Background
This section talks about C and C++. It describes their history and standards, and provides
background as to how they ended up where they are today.

History of C
Early Days
The creation of C is closely associated with the early days of Unix. Bell Labs developed Unix
out of an earlier project called Multics. The first version of Unix ran on a PDP-7 minicomputer
and funding was given to move it to the PDP-11. Dennis Ritchie was a key member of this
project and set about creating a language that could help him develop Unix while minimizing
the amount of assembly language he had to write. Most of the code up to that point was
expressed in assembly language which was error prone and obviously non portable.
Ritchie developed C so that he could write code in terms of variables, expressions, loops,
functions etc. and use a compiler to translate C code into machine code. The generated
code ran almost as fast as hand written assembly and was more portable since only the
compiler had to be changed in order to support a new architecture. C itself was influenced
by B (hence why it was called C), which itself was influenced by BCPL.

De facto standard and emerging popularity
In 1978 C was formalised into a de facto standard called K&R C, named after Brian
Kernighan & Dennis Ritchie who published the standard as a book.
Over time the use of C became more widespread and compilers such as Turbo C, Lattice C,
Microsoft C popularized C on other operating systems including personal computers.

International Standards
C later became an ANSI standard, C89. A further standard followed with C99 and C is still
under review and development.
Some functionality that was introduced in C++ has also found its way back into C standards.
For example, the // style single-line comment and variable declaration rules in blocks.

History of C++
C++ first appeared in 1979 as C with Classes and was renamed C++ in 1983. It was invented by Bjarne Stroustrup as a way
to imbue C with Simula-like features. Simula is a language that allowed concepts such as
objects, classes and inheritance to be expressed in code and as its name suggests was
created for running simulations. However it was considered too slow for systems
programming and so something that combined the speed of C with object oriented concepts was
highly desirable.
C++ added these concepts as extensions to the C language and used a precompiler called
cfront to transform the C++ extensions into C code that could then be compiled into
machine code. So a C++ program could have the high level object oriented concepts but
without the overhead that came with Simula.
C++ became popular in its own right and outgrew the limitations of the cfront preprocessor to
become supported by compilers directly. Thus toolchains such as Microsoft Visual
C++, GCC, Clang etc. support both languages. Some toolchains have also tended to
favour C++ over C, for example Microsoft's compiler has been very slow to implement
C99.
Object oriented programming has mostly been used in higher level software - applications,
games, simulations and mathematical work.
C++ has also been formalised into standards with C++98, C++03, C++11 and so on.

Modern C++
C++11 onwards is a distinctly different beast from earlier iterations and strives to add
functionality that if used correctly can eliminate a lot of issues that will be discussed later on:
Scoped and shared pointers
auto keyword
move semantics (i.e. moving ownership of data from one variable to another)
rvalue references
perfect forwarding
nullptr explicit type
However it is worth noting that many of these things are late additions to C++. Things
like move semantics must be explicitly used and have implications that are not an issue for
Rust where they have been part of the language since early on.
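To make the comparison concrete, here is a minimal sketch of how assignment moves ownership by default in Rust. The function name consume is invented for this example.

```rust
// Rust
/// Takes ownership of the string and returns its length in bytes. The
/// caller's variable can no longer be used after the call.
fn consume(text: String) -> usize {
    text.len()
}

fn main() {
    let first = String::from("hello");
    let second = first; // ownership moves from `first` to `second`
    // println!("{}", first); // would not compile: `first` has been moved

    let copy = second.clone(); // an explicit deep copy
    let length = consume(second); // `second` is moved into the function
    println!("{} has {} bytes", copy, length);
}
```

There is no equivalent of std::move to remember. The move is the default, a copy has to be asked for with clone(), and using a moved variable is a compile error rather than undefined behavior.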

The relationship between C and C++

While C++ grew out of C and has developed alongside it, it is not true to say C++ is a
superset of C. Rather it is mostly a superset. There are differences such as keywords and
headers that C recognizes that C++ does not.
C++ has function overloading and classes and uses name mangling to disambiguate
overloaded functions. But in practice it is possible to write C as a subset of C++ and compile
the two into the same executable. Most real-world C code could be called C++ without
classes.
C and C++ are even usually handled by the same toolchain. Most compilers would consist of
a front half that parses the language into an intermediate form and a back half which turns
the intermediate form into optimized machine code. Finally the linker would join all the binary
objects together to form an executable. C and C++ would share most of this code path.
C++ tends to be more popular with applications level programming. Part of the reason C++
hasn't found itself in the lower layers is the perception that exception handling, name
mangling, linking and issues of that nature add unwanted complexity or that somehow the
generated code is less efficient. Arguments have been made that this is not the case, but the
perception still remains.
C still tends to be more popular in low level systems programming. Components such as the
Linux kernel are pure C with some assembly. Many popular open source libraries such as
sqlite3 are also written in C.

Objective-C
Objective-C is another C derived language that added objects and classes. Unlike C++,
Objective-C behaves as a strict superset of C.
The language was developed in the 1980s and was popularized in the NeXTSTEP operating
system and later in Apple's OS X and iOS. It hasn't gained much popularity outside of those
platforms but the success of the iPhone has ensured it has a sizeable developer base of its
own. It is also well supported by the GCC and Clang toolchains. Apple has begun to
deprecate Objective-C in favour of Swift which is a modern high level language similar in
some respects to Rust but more application focussed.
Objective-C is strongly influenced by Smalltalk (as opposed to Simula in C++) and so code
works somewhat differently than C++.
Notionally code calls objects by sending them a message. An object defines an interface
specifying what messages it accepts and an implementation that binds those messages to
code. The caller code sends a message to call a method. Objects can also receive dynamic
messages, i.e. ones not defined by their interfaces, so they can do certain tasks such as
intercepting and forwarding messages. In addition an object can ignore a message or not
implement it without it being considered an error. In a broad sense, an ObjC message and a
C++ method are more or less analogous in functionality.

C/C++ Timeline
These are the major revisions of C and C++
Year | Event | Description
1972 | C | C for PDP-11, other Unix systems
1978 | K&R C | C as defined in "The C Programming Language" book by Kernighan & Ritchie
1979 | C with classes -> C++ | Bjarne Stroustrup
1989 | C89 (ANSI X3.159-1989) | C is standardized as ANSI C, or C89. C90 (ISO/IEC 9899:1990) is the ISO ratified version of this same standard.
1995 | C95 (ISO/IEC 9899/AMD1:1995) | Wide character support, digraphs, new macros, and some other minor changes.
1998 | C++98 (ISO/IEC 14882:1998) | C++ is standardized for the first time.
1999 | C99 (ISO/IEC 9899:1999) | Single line (//) comments, mixing declarations with code, new intrinsic types, inlining, new headers, variable length arrays
2003 | C++03 (ISO/IEC 14882:2003) | Primarily a defect revision, addressing various defects in the specification.
2011 | C++11 (ISO/IEC 14882:2011) | A major revision that introduces type inference (auto), range based loops, lambdas, strongly typed enums, a nullptr constant, struct initialization. Improved unicode char16_t, char32_t, u, U and u8 string literals.
2011 | C11 (ISO/IEC 9899:2011) | Multi-threading support. Improved unicode char16_t, char32_t, u, U and u8 string literals. Other minor changes
2014 | C++14 (ISO/IEC 14882:2014) | Another major revision that introduces auto return types, variable templates, digit separators (1'000'000), generic lambdas, lambda capture expressions, deprecated attribute.

Rust Background
The catalyst for Rust was the Mozilla Firefox web browser. Firefox like most web browsers
is:
Written in C++. [1]
Complex with millions of lines of code.
Prone to bugs, vulnerabilities and exploits, many of which are attributable to the
language the software is written in.
Mostly single-threaded and therefore not well suited for many-core devices - PCs,
phones, tablets etc. Adding multi-threading to the existing engine would
doubtless cause even more bugs and vulnerabilities.
Rust was conceived as a way to obtain C or C++ levels of performance but also remove
entire classes of software problem that destabilize software and could be exploited. Code
that passes the compiler phase could be guaranteed to be memory safe and therefore could
be written in a way to take advantage of concurrency.
So Rust began life as a personal project by Graydon Hoare in 2006 and was sponsored by
Mozilla from 2009 to solve these issues. It progressed until the release of version 1.0 in 2015.
The project is hosted on GitHub. The language has been self-hosting for quite some time -
that is to say the Rust compiler is written in Rust, so compiling Rust happens from a
compiler written in Rust. Get your head around that! But it's the same way that C and C++
compilers are these days too.
[1] Read this Mozilla internal string guide to get a flavor of the sort of problems the browser
had to overcome. A browser obviously uses a lot of temporary strings. STL strings were too
inefficient / flakey from one compiler to the next and so the browser sprouted an entire tree
of string related classes to solve this issue. Similar tales are told in Qt and other large
libraries.

Problems with C/C++
It is trivial (by accident) to write code that is in error such as causing a memory leak. It is
easy (by malice) to exploit badly written code to force it into error. It is easy, even with the best
testing in the world, for some of these errors to only manifest themselves when the code is in
production.

At best, bugs are a costly burden for developers to find and fix, not just in time and dollars
but also in reputation. At worst, a bug could cause catastrophic failure but more
ordinarily it leaves code unstable or vulnerable to hacking.
Rust is a language that produces machine code that is comparable in performance to
C/C++ but enforces a safe-by-design philosophy. Simply put, the language and the compiler
try to stop errors from happening in the first place. For example the compiler rigorously
enforces lifetime tracking on objects and generates errors on violations. Most of these
checks and guards are done at compile time so there is zero cost at runtime.
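As a small illustration of lifetime tracking, consider the sketch below. The function longest is a made-up example. The 'a annotation tells the compiler that the returned reference must not outlive either input.

```rust
// Rust
/// Returns whichever string slice is longer. The lifetime `'a` ties the
/// result to both inputs, so it cannot outlive either of them.
fn longest<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() {
        left
    } else {
        right
    }
}

fn main() {
    let outer = String::from("a long-lived string");
    let result;
    {
        let inner = String::from("short");
        // Fine: the answer is copied into an owned String before `inner`
        // is dropped at the end of this block.
        result = longest(&outer, &inner).to_string();
        // Storing the reference itself in `result` and printing it after
        // the block would be rejected, because `inner` does not live long
        // enough.
    }
    println!("{}", result);
}
```

The check happens entirely in the compiler. No reference counting or runtime bookkeeping is added to the generated code for it.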

Active Development
The Rust team releases a new version of Rust approximately every 6 weeks. This means
Rust receives code and speed improvements over time.
Most releases focus on marking APIs as stable, improving code optimization and compile
times.

Open source and free
Rust is dual licensed under the Apache 2.0 and MIT open source licenses. The full copyright
message is viewable online.
Essentially the license covers your right to modify and distribute the Rust source code. Note
that Rust generates code for LLVM so LLVM also has its own software license (TODO link).
What you compile with Rust (or LLVM) is not affected by the open source license. So you
may compile, execute and distribute proprietary code without obligation to these licenses.

Is Rust for everybody?
No, of course not. Performance and safety are only two things to consider when writing
software.
Sometimes it's okay for a program to crash every so often
If you have code that's written and works then why throw that away?
Writing new code will always take effort and will still cause application level bugs of one
sort or another.
Performance may not be a big deal especially for network bound code and a higher
level language like Java, C#, Go may suit better.
Some people will find the learning curve extremely steep.
Rust is still relatively immature as a language and still has some rough edges -
compilation times, optimization, complex macros.
But you may still find there is benefit to moving some of your code to Rust. For example,
your C++ software might work great but it has to deal with a lot of user-generated data so
perhaps you want to reimplement that code path in Rust for extra safety.

Safe by design
Some examples of this safe-by-design philosophy:
Variables (bindings) are immutable by default. This is the opposite of C++ where mutable is
the default and we must explicitly say const to make something immutable. Immutability
extends to the &self reference on struct functions.
Lifetime tracking. The Rust compiler will track the lifetime of objects and can generate
code to automatically drop them when they become unused. It will generate errors if
lifetime rules are violated.
Borrowing / Variable binding. Rust enforces which variable "owns" an object at any
given time, and tracks values that are moved to other variables. It enforces rules about
who may hold a mutable or immutable reference to it. It will generate errors if the code
tries to use moved variables, or obtain multiple mutable references to it.
There is no NULL pointer in safe code. All references are valid because
their lifetimes and borrowing are tracked.
Rust uses LLVM for the backend so it generates optimized machine code.
Lint checking is builtin, e.g. style enforcement for naming conventions and code
consistency.
Unit tests can be integrated into the code and run automatically
Modules (equivalent to namespaces in C++) are automatic meaning we implicitly get them
by virtue of our file structure.
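A minimal sketch of the first and fourth points follows. The names Counter and first_even are invented for this example.

```rust
// Rust
struct Counter {
    count: u32,
}

impl Counter {
    /// `&self` is a read-only borrow, so this function cannot modify the struct.
    fn value(&self) -> u32 {
        self.count
    }

    /// Mutation has to be requested explicitly with `&mut self`.
    fn increment(&mut self) {
        self.count += 1;
    }
}

/// Returns a reference to the first even number, or `None` if there is none.
/// The "nothing found" case is part of the return type, not a null pointer.
fn first_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v % 2 == 0)
}

fn main() {
    let fixed = Counter { count: 0 };
    // fixed.increment(); // would not compile: `fixed` is not declared `mut`

    let mut counter = Counter { count: 0 };
    counter.increment();
    println!("{} {}", fixed.value(), counter.value());

    match first_even(&[1, 3, 4]) {
        Some(v) => println!("found {}", v),
        None => println!("no even number"),
    }
}
```

Because the result is an Option, the caller has to deal with the None case before it can get at the value. Forgetting to check is a compile error rather than a crash.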

Don't C++11 / C++14 get us this?
Yes and no. C++11 and C++14 certainly bring in some long overdue changes. Concurrency
primitives (threads at last!), move semantics, pointer ownership and other beneficial things
all come in with these latest standards. Conveniences such as type inference, lambdas et al
also come in.
And perhaps if you program the right subset of features and diligently work to avoid pitfalls of
C++ in general then you are more likely to create safe code.


But what is the right subset?
If you use someone else's library - are they using the right subset?
If one subset is right then why does C++ still contain all the stuff that is outside of that?
Why are all the things which are patently unsafe / dangerous still allowed?
Why are certain dangerous default behaviors such as default copy constructors not
flipped to improve code safety?
We could argue that C++ doesn't want to break existing code by introducing change that
requires code to be modified. That's fair enough but the flip-side is that future code is almost
certainly going to be broken by this decision. Perhaps it would be better to inflict a little pain
for some long term gain.

Unsafe programming / C interoperability
Rust recognizes you may need to call an external library, e.g. a C library or a system
API.
Therefore it provides an unsafe keyword that throws some of the safety switches when it is
necessary to talk to the outside world.
This allows you to consider the possibility of porting code partially to Rust while still allowing
some of it to remain as C.
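
As a minimal sketch of what an unsafe block looks like, the helper below (string_from_c, a name invented here) reads a NUL-terminated C string through a raw pointer. In real code the pointer would come from a C library; in this sketch a CString owned by Rust stands in for it.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Reads a NUL-terminated C string through a raw pointer, as we might when a
/// C library hands one back to us, and returns an owned Rust `String`.
fn string_from_c(ptr: *const c_char) -> String {
    // The compiler cannot check that `ptr` is valid and NUL-terminated,
    // so the call has to be wrapped in `unsafe`.
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_string_lossy().into_owned()
}

fn main() {
    // Stand-in for a buffer owned by C code (an assumption for this sketch).
    let owned = CString::new("hello from C").expect("no interior NUL bytes");
    let text = string_from_c(owned.as_ptr());
    println!("{}", text);
}
```

Only the pointer dereference sits inside the unsafe block; everything around it is still checked by the compiler.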


Let's Start Simple
The usual introduction to any language is "Hello, World!". A simple program that prints that
message out to the console.
Here is how we might write it for C:
#include <stdio.h>
int main(int argc, char *argv[]) {
    printf("Hello, World!\n");
    return 0;
}

C++ could write it the same way, or we could use the C++ stream classes if we preferred:
#include <iostream>
using namespace std;
int main(int argc, char *argv[]) {
    cout << "Hello, World!" << endl;
    return 0;
}

And here is the equivalent in Rust:
fn main() {
println!("Hello, World!");
}

There are some obvious points of similarity that we can observe:
C/C++ and Rust follow the convention of having a main() function as the entry point
into code. Note that Rust's main doesn't return anything. It's effectively a void method.
There is a general purpose print statement.
The general structure in terms of main, use of { } and semi-colons is mostly the same. In
all three languages a block of code is enclosed in curly braces, and a semi-colon is used as
a separator between statements.
Rust looks a little bit more terse than either C or C++ because it automatically includes
references to part of its standard runtime that it refers to as its "prelude".
The println!() is actually a macro that expands into code that writes to the standard
output. We know it's a macro because it ends in a ! character but you may treat it like a
function call for now. We'll see how Rust macros differ from those in C/C++ later.

Compiling our code
Open a command prompt and set up your compiler environments.
If you were using gcc, you’d compile your code like this:
gcc hw.cpp -o hw

If you were using Microsoft Visual C++ you'd compile like this:
cl /o hw.exe hw.cpp

To compile in Rust you invoke the rustc compiler.
rustc hw.rs

And to run either
./hw (or .\hw.exe)
Hello, World!

Again there are points of similarity:
There is a shell command that compiles the code and creates an executable from it.
The binary runs in the same way.
A less obvious point of similarity is that Rust shares its code generation backend with gcc-llvm
and clang. Rustc outputs llvm bitcode which is compiled (and optimized) into machine
code via LLVM. This means the resulting executable is very similar in form to that output by
C++ compilers. That includes the symbolic information it supplies for debugging purposes. A
rust executable can be debugged in gdb, lldb or Microsoft Visual Studio depending on the
target platform.


rustc -O hw.rs


Compiling and Linking in More Detail
Your main() entry point
Rust has a main function just like C/C++ which is usually called main(). [1]
It doesn't take any arguments and it doesn't return anything unlike C/C++. Let's see how we
might do those things.

Processing command-line arguments
In C/C++, the entry point takes argc, and argv arguments. Argc is the number of arguments
and argv is an array of char * pointers that specify those arguments.
int main(int argc, char **argv) {
// our code
}

Processing arguments can become inordinately complex (and buggy) so most software will
use a function like getopt() or getopt_long() to simplify the process.
Note that getopt() is not a standard C function and is not portable, e.g. to Windows. So
immediately we see an example of a problem that C/C++ forces us to solve.
Rust doesn't process arguments this way. Instead you access the command-line parameters
from std::env::args() from anywhere in the code. That is to say, there is a function called
args() under the namespace std::env that returns the strings on the command-line.
The function args() returns the parameters as an iterator of strings. As with C++, the first
element, at index 0, is the command itself:
use std::env;
fn main() {
for argument in env::args() {
println!("{}", argument);
}
}

Alternatively, since args() returns a type called Args that implements the Iterator trait
you can collect the arguments up into your own collection and process that:
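use std::env;
use std::collections::HashSet;
fn main() {
    // Collect the arguments into a set of owned strings
    let args: HashSet<String> = env::args().collect();
    let verbose_flag = args.contains("--verbose");
    println!("verbose: {}", verbose_flag);
}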

We can see some clear advantages to how Rust supplies args:
You don't need a separate argc parameter. You have a collection that defines its own
length.
You can access arguments from anywhere in your program, not just from main().
In C++ you would have to pass your args around from one place to another. In Rust you
can simply ask for them from anywhere.
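
The second point can be sketched as follows. The helpers contains_flag and verbose_requested are hypothetical names; the first is generic over any iterator of strings so that it can be exercised without a real command line.

```rust
use std::env;

/// Returns true if `flag` appears in `args`. Generic over any iterator of
/// strings so it can be exercised without a real command line.
fn contains_flag<I>(args: I, flag: &str) -> bool
where
    I: IntoIterator<Item = String>,
{
    args.into_iter().any(|arg| arg == flag)
}

/// Can be called from any function in the program, not only from `main()`.
fn verbose_requested() -> bool {
    contains_flag(env::args(), "--verbose")
}

fn main() {
    if verbose_requested() {
        println!("verbose output enabled");
    } else {
        println!("verbose output disabled");
    }
}
```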

Use a crate - easy command-line processing
Rust has a number of crates for processing arguments. The most popular crate for
processing arguments is clap.
It provides a very descriptive, declarative way of adding rules for processing arguments into
the code. It is especially useful if your program takes a lot of arguments, including
parameters and validation rules.
For example we add this to Cargo.toml:

[dependencies]
clap = "2.27"

And in our main.rs:
#[macro_use] extern crate clap;
use clap::*;
fn main() {
    let matches = App::new("Sample App")
        .author("My Name ")
        .about("Sample application")
        .arg(Arg::with_name("T")
            // Short form -T, long form --timetowait
            .short("T")
            .long("timetowait")
            .help("Waits some period of time for something to happen")
            .default_value("10")
            .takes_value(true)
            .possible_values(&["10", "20", "30"])
            .required(false))
        .get_matches();
    // Parse the value as a u32 or exit with an error message
    let time_to_wait = value_t_or_exit!(matches, "T", u32);
    println!("Time to wait value is {}", time_to_wait);
}

This code will process arguments for -T or --timetowait and ensure the value is one of the 3
accepted values. If the user doesn't supply a value, it defaults to 10. If the user doesn't
supply a valid integer it will terminate the application with a useful error.
The user can also provide --help as an argument and it will print out the usage.

Exit code
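If you want to exit with a code, you set it explicitly by calling std::process::exit():
fn main() {
    //... my code
    std::process::exit(1);
}

std::process::exit() ends the process immediately and returns the code to the environment.
It does not run destructors for values still alive on the stack, so finish any cleanup
before calling it.
Again there is no reason the exit code has to be set in main(), you could call
std::process::exit() somewhere else. A panic!() also causes the application to exit, with a
failure code.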
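
With Rust 1.61 or later, another option is for main() to return a std::process::ExitCode, which lets the function finish normally so that values are dropped before the process ends. A minimal sketch, where run() is a hypothetical stand-in for the application logic:

```rust
use std::process::ExitCode;

/// Hypothetical application logic: fails when given no input.
fn run(input: &str) -> Result<usize, String> {
    if input.is_empty() {
        Err(String::from("no input supplied"))
    } else {
        Ok(input.len())
    }
}

fn main() -> ExitCode {
    match run("some input") {
        Ok(len) => {
            println!("processed {} bytes", len);
            ExitCode::SUCCESS
        }
        Err(message) => {
            eprintln!("error: {}", message);
            // Any u8 can be converted into an exit code
            ExitCode::from(1)
        }
    }
}
```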


Optimized compilation
In a typical edit / compile / debug cycle there is no need to optimize code and so Rust
doesn't optimize unless you ask it to.
Optimization takes longer to happen and can reorder the code so that backtraces and
debugging may not point at the proper lines of code in the source.
If you want to optimize your code, add a -O argument to rustc:
rustc -O hw.rs

The act of optimization will cause Rust to invoke the LLVM optimizer prior to linking. This will
produce faster executable code at the expense of compile time.

Incremental compilation
Incremental compilation is also important for edit / compile / debug cycles. Incremental
compilation only rebuilds those parts of the code which have changed through modification
to minimize the amount of time it takes to rebuild the product.
Rust has a different incremental compilation model to C++.
C++ doesn't support incremental compilation per se. That function is left to the make /
project / solution tool. Most builders will track a list of project files and which file
depends on other files. So if file foo.h changes then the builder knows what other files
depend on it and ensures they are rebuilt before relinking the target executable.
In Rust incremental compilation is at the crate level - that is, if any file in a crate changes
then the crate as a whole has to be rebuilt. Thus larger code bases tend to be split up
into crates to reduce the incremental build time.
There is a recognition in the Rust community that the crate-level model can suck for large
crates so the Rust compiler is getting incremental per-file compilation support in addition to
per-crate.
At the time of writing this support is experimental because it is tied to refactoring the
compiler for other reasons to improve performance and optimization but will eventually be
enabled and supported by rustc and cargo.

Managing a project
In C++ we would use a makefile or a solution file of some kind to manage a real world
project and build it.
For small programs we might run a script or invoke a compiler directly but as our program
grows and takes longer to build, we would have to use a makefile to maintain our sanity.
A typical makefile has rules that say what files are our sources, how each source depends
on other sources (like headers), what our final executable is and a bunch of other mess
about compile and link flags that must be maintained.
There are lots of different makefile solutions which have cropped up over the years but a
simple gmake file might look like this one:
SRCS = main.cpp pacman.cpp sprites.cpp sfx.cpp
OBJS = $(SRCS:.cpp=.o)
EXE = pacman
$(EXE): $(OBJS)
	$(CC) $(CFLAGS) -o $(EXE) $(OBJS)
.cpp.o:
	$(CC) $(CFLAGS) -c $< -o $@

When you invoke make, the software will check all the dependencies of your target, looking
at their timestamps, and determine which rules need to be invoked and in which order to rebuild
your code.
Rust makes things a lot easier - there is no makefile! The source code is the makefile. Each
file says what other files it uses via dependencies on other crates, and on other modules.
Consider this main.rs for a pacman game:
mod pacman;
fn main() {
let mut game = pacman::Game::new();
game.start();
}

If we save this file and type rustc main.rs the compiler will notice the reference to
mod pacman and will search for a pacman.rs (or pacman/mod.rs) and compile that too. It will
continue doing this with any other modules referenced along the way.
In other words you could have a project with 1000 files and compile it as simply as
rustc main.rs. Anything referenced is automatically compiled and linked.
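
The text does not show pacman.rs itself, so here is one guess at a minimal version, written as an inline module so the sketch compiles as a single file. The running field and the is_running() method are assumptions made for illustration.

```rust
// Stand-in for the contents of a hypothetical pacman.rs, written inline
// so that the sketch compiles as one file.
mod pacman {
    pub struct Game {
        running: bool,
    }

    impl Game {
        pub fn new() -> Game {
            Game { running: false }
        }

        pub fn start(&mut self) {
            self.running = true;
        }

        pub fn is_running(&self) -> bool {
            self.running
        }
    }
}

fn main() {
    let mut game = pacman::Game::new();
    game.start();
    println!("game running: {}", game.is_running());
}
```

Moving the body of the mod block into a pacman.rs file next to main.rs and leaving `mod pacman;` behind gives the file layout described above.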

Okay, so we can call rustc, but what happens if our code has dependencies on other
projects? Or if our project is meant to be exported so other projects can use it?

Cargo
Cargo is a package manager and build tool rolled into one. Cargo can fetch dependencies, build
them, build and link your code, run unit tests, install binaries, produce documentation and
upload versions of your project to a repository.
The easiest way to create a new project in Rust is to use the cargo command to do it:
cargo new hello_world --bin

This creates:
hello_world/
    .git/ (git repo)
    .gitignore
    Cargo.toml
    src/
        main.rs

Building the project is then simply a matter of this:
cargo build

If you want to build for release you add a --release argument. This invokes the rust
compiler with optimizations enabled:
cargo build --release

If we wanted to build and run unit tests in our code we could write
cargo test

Crates and external dependencies
Cargo doesn't just take care of building our code, it also ensures that anything our code
depends on is also downloaded and built. These external dependencies are defined in a
Cargo.toml in our project root.
We can edit that file to say we have a dependency on an external "crate" such as the time
crate:
[package]
name = "hello_world"
version = "0.1.0"
authors = ["Joe Blogs "]
[dependencies]
time = "0.1.35"

Now when we run cargo build, it will fetch "time" from crates.io and also any dependencies
that "time" has itself. Then it will build each crate in turn automatically. It does this efficiently
so iterative builds do not incur a penalty. External crates are downloaded into your
.cargo home directory and built in your project's target directory.
To use our external crate we declare it in the main.rs of our code, e.g.
extern crate time;
fn main() {
let now = time::PreciseTime::now();
println!("The time is {:?}", now);
}

So the change to the Cargo.toml and a reference in the source is sufficient to:
1. Fetch the crate (and any dependencies)
2. Build the crate (and any dependencies)
3. Compile and link to the crate and dependencies
All that happened with a line in Cargo.toml and a line in our code to reference the crate. We
didn't have to mess around figuring out how to build the other library, or maintain multiple
makefiles, or get our compiler / linker flags right. It just happened.

Cargo.lock

Also note that once we build, cargo creates a Cargo.lock file in our root directory.
This file is made so that if cargo build is invoked again it has an exact list of what
packages need to be pulled and compiled. It stops situations where the code under our feet
(so to speak) moves and suddenly our project no longer builds. So if the lock file exists, the
same dependency configuration can be reproduced even from a clean build. If you want to force
cargo to rebuild a new lock file, e.g. after changing Cargo.toml, you can type cargo update.

[1] You can change the main entry point by using a special #[start] directive on another
function if you want, but the default is main().


Source Layout and Other General Points
Header files
C/ C++
C and C++ code tends to be split over two general kinds of file:
The Header file (.h, .hpp) contains class definitions, external function signatures,
macros, templates, inline functions. Sometimes inline functions get stored in their own
file. The standard template library C++ headers do not have a file extension. Some 3rd
party libraries like Qt may sometimes omit the extension.
The Source file (.c, .cc, .cpp) contains the implementation of classes and anything
private. Sometimes C++ will use tricks such as forward class references and Pimpl
patterns to keep complex or dependent code out of the header file.
Occasionally you may also see files with a .inl, or .ipp extension which are headers with a lot
of inline templates or functions.
Compilers are only interested in source files and what they #include so what's really
happening in most C/C++ code is that a preprocessor concatenates various header files to
the front of the source file according to the # directives within it and the resulting file is fed
to a compiler.
Splitting definition and implementation across multiple files can be a nuisance since it means
that changes to a single class can require modifications to multiple files.

Rust
Rust does not have header files. Every struct, implementation and macro resides in a file
ending in .rs. Code is made public or not by structuring .rs files into modules and exposing
functions via the pub keyword.
Ordering is less important too. It is possible to forward reference structs or functions, or even
use the very same module that a piece of code is a part of. The only time that ordering
matters is for macro definitions. A macro must be defined before a module that uses it.
Rust files reference non-dependent modules with the use keyword and pull in dependent
modules with the mod keyword.


Namespaces
C / C++
C does not use namespaces. Libraries tend to prefix their functions and structs with a
qualifying name of some sort.
C++ does have namespaces but their use is optional and varies from one piece of code to
the next.

Rust
Rust has modules which are like #include and namespaces rolled into one.
One major convenience is that definition and implementation are one and the same. Implementing
a function brings it into existence. Any other module that chooses to "use" it simply says so
and the compiler will ensure it compiles properly.
See Namespacing With Modules.

File name conventions
In C++ filenames typically end in:
.h, .hpp, .inl for headers or inline code
.c, .cpp, .cc for source code
Aside from the extension (which may kick off the compiler expecting C or C++) there is next
to no expected arrangement or naming convention for files.
You can compile a file called deeply/nested/Timbuktu.cpp which defines 20 classes and 30
interfaces if you like and the name does not matter.
Rust files are snake_case and end in .rs. The filename DOES matter because the name is
the module name that scopes whatever is in it. There are also some special files called
main.rs, lib.rs and mod.rs.
So if you name your file foo.rs, then everything inside is scoped foo::* when externally
referenced.

Unicode support

Using Unicode in C++ has always been a pain. Neither C nor C++ had support for it at all,
and various solutions have appeared over time. Recent versions of the C and C++ standards
provide string literal types for UTF encodings, but prior to that it was strictly ASCII
or wide characters.
Here are some general guidelines for Unicode in C / C++:
Source code is normally only safe to use characters 0-127 although some compilers
may have parameters that allow makefiles to specify other encodings.
C++ has char and wchar_t types for 8-bit and 32-bit or possibly 16-bit wide strings. Part
of the problem with wchar_t was the width was immediately subverted.
Char type implies no encoding. It normally means ASCII but could also mean UTF-8,
Latin1, or in fact any form of encoding that predates Unicode. Basically it is up to the
program to understand the meaning.
A "wide" wchar_t is NOT UTF-32. It might be, or it might be UTF-16 on some platforms
(e.g. Windows). This messed up definition makes operations such as slicing strings
dangerous due to the risk of cutting through a code point.
What if I want to read Unicode arguments from the command-line such as file paths - what
encoding are they in? The main() method passes them as char. Windows has a
wmain() that takes wchar_t. What am I supposed to do?
Windows favours wide (UTF-16) character strings for its APIs although it has ASCII
versions too. The ASCII versions are not UTF-8. Compiled code has #define UNICODE
to support multiple languages.
Linux tends to favour UTF-8 encoded char strings. Most languages, toolkits and tools
assume UTF-8. The exception is Qt which has chosen to use UTF-16 internally.
C-lib has acquired various wide versions of its strdup, strcmp, strcpy etc. It also
acquired wide versions of functions for opening files on disk and so forth.
C++ lib has acquired std::string / std::wstring classes. C++ has acquired explicit UTF-16
and UTF-32 versions of these classes.
C11 and C++11 introduce explicit string literals for various UTF widths.
Limited conversion capabilities between wide / narrow in C++. Some operating systems
have incomplete conversion capabilities.
3rd party conversion libraries like ICU4C are commonly used. Libraries like boost, Qt
use libicu for converting between encodings.
Embedding Unicode into C source involves using escape codes or hex values.
Rust simplifies things a lot by benefit of hindsight.
Source code is UTF-8 encoded.
Comments, characters and string literals can contain Unicode characters without
escaping.
The native char type is 4 bytes wide - wide enough to hold any Unicode character.

The native str & String types are internally UTF-8 to save space but may be iterated by
char or by byte according to what the function is doing.
Since source code is UTF-8 encoded you may embed strings straight into the source.
let hello = "你好";
for c in hello.chars() { /* iterate chars */
//...
}
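
The difference between iterating by char and by byte can be made concrete with a small sketch. The helper name char_and_byte_counts is invented for illustration.

```rust
use std::mem::size_of;

/// Returns (number of chars, number of UTF-8 bytes) for a string.
fn char_and_byte_counts(text: &str) -> (usize, usize) {
    (text.chars().count(), text.len())
}

fn main() {
    let hello = "你好";
    let (chars, bytes) = char_and_byte_counts(hello);
    println!("{} chars stored in {} bytes", chars, bytes);
    println!("a char value occupies {} bytes", size_of::<char>());

    // Iterating by byte exposes the underlying UTF-8 encoding
    for b in hello.bytes() {
        print!("{:02x} ", b);
    }
    println!();
}
```

Each of the two characters takes three bytes in UTF-8, so the string is six bytes long even though a standalone char value is four bytes wide.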


Namespacing With Modules
C++ namespaces allow you to group your functions, variables and classes into logical blocks
and allow the compiler to disambiguate them from other functions, variables and classes that
might otherwise have the same name.
// Namespacing is usually a good idea
namespace myapp {
void error() {
//...
}
const int SOME_VALUE = 20;
void doSomething(int value) {
//...
}
}
//... somewhere else in the code
myapp::doSomething(100);

Namespacing in C++ is completely optional which means some code may use nested
namespaces while other code may be content to cover its entire codebase with a single
namespace. Some code might even put its code into the global namespace. Other code
might control the use of namespaces with macros.
The equivalent to a namespace in Rust is a module and serves a similar purpose. Unlike
C++ though you get namespacing automatically from the structure of your files. Each file is a
module in its own right.
So we may have a file myapp.rs:
// myapp.rs
pub fn error() { /* ... */ }
pub const SOME_VALUE: i32 = 20;
pub fn doSomething(value: i32) { /* ... */ }

Everything in myapp.rs is automatically a module called myapp. That means modules are
implicit and you don't have to do anything except name your file something meaningful.


use myapp;
myapp::doSomething(myapp::SOME_VALUE);

You could also just bring in the whole of the mod if you like:
use myapp::*;
doSomething(SOME_VALUE);

Or just the types and functions within it that you use:
use myapp::{doSomething, SOME_VALUE};
doSomething(SOME_VALUE);
// Other bits can still be referenced by their qualifying mod
myapp::error();

But if you want an explicit module you may also write it in the code. So perhaps myapp
doesn't justify being a separate file.
// main.rs
mod myapp {
pub fn error() { /* ... */ }
pub const SOME_VALUE: i32 = 20;
pub fn doSomething(value: i32) { /* ... */ }
}

Modules can be nested so a combination of implicit modules (from file names) and explicit
modules can be used together.

Splitting modules across files
Namespacing with modules is pretty easy, but sometimes you might have lots of files in a
module and you want the outside world to see only a single module namespace.
In these cases you're more likely to use the myapp/mod.rs form. In this instance the mod.rs
file may pull in subordinate files:


// myapp/mod.rs
mod helpers;
mod gui;
#[cfg(test)]
mod tests;
// Perhaps we want the outside world to see myapp::Helper
pub use helpers::Helper;

In this example, the module pulls in submodules helpers and gui. Neither is marked as
pub mod so they are private to the module.
However the module also says pub use helpers::Helper which allows the outside to
reference myapp::Helper. Thus a module can act as a gatekeeper to the things it
references, keeping them private or selectively making parts public.
We haven't mentioned the other module here, tests. The attribute #[cfg(test)] indicates
it is only pulled in when a unit test executable is being built. The cfg attribute is used for
conditional compilation.
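
The gatekeeper idea can be tried in a single file by writing the modules inline. In this sketch the body of Helper and its describe() method are assumptions, since the text does not show what helpers contains.

```rust
mod myapp {
    // Private submodule: invisible outside `myapp`.
    mod helpers {
        pub struct Helper;

        impl Helper {
            pub fn describe(&self) -> &'static str {
                "helper"
            }
        }
    }

    // Re-export one item so that callers can write `myapp::Helper`.
    pub use self::helpers::Helper;
}

fn main() {
    let helper = myapp::Helper;
    println!("{}", helper.describe());
    // let h = myapp::helpers::Helper; // compile error: module `helpers` is private
}
```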

Using a module
Modules can be used once they are defined.
use helpers::*;

Note that the use command is relative to the toplevel main or lib module. So if you
declare a mod helpers at the top, then the corresponding use helpers will retrieve it. You
can also use relative use commands with the super and self keywords.

// TODOs

Module aliasing
TODO

External crates

TODO


Porting Code
Before starting, the assumption at this point is you need to port code. If you're not sure you
need to port code, then maybe you don't. After all, if your C/C++ code works fine, then why
change it?
TODO This section will provide a more real world C/C++ example and port it to the
equivalent Rust

Automation tools
Corrode
Corrode is a command-line tool that can partially convert C into Rust. At the very least it may
spare you some drudgery ensuring the functionality is as close to the original as possible.
Corrode will take a C file, e.g. somefile.c plus any arguments from gcc and produces a
somefile.rs which is the equivalent code in Rust.

It works by parsing the C code into an abstract syntax tree and then generating Rust from
that.
Interestingly Corrode is written in Haskell and more interestingly is written as a literate
Haskell source - the code is a markdown document interspersed with Haskell.

Bindgen
Bindgen is a tool for generating FFI interfaces for Rust from existing C and C++ header files.
You might find this beneficial if you're porting code from C / C++, or writing a new component
that must work with an existing code base.
Bindgen requires that you preinstall the Clang C++ compiler in order to parse code into a
structure it can digest.
The readme documentation on the site link provides more information on installing and using
the tool.

Experiences
A number of websites offer insights into the process of porting from C to Rust:
1. Porting Zopfli from C to Rust. Zopfli is a library that implements a good but slow deflate
algorithm. It produces smaller compressed files than zlib while still remaining compatible
with it.
2. TODO

Tips
Use references wherever you can
TODO references are equivalent to C++ references or to pointers in C. Passing by reference
is an efficient way of passing any object greater than 64-bits in size into a function without
relinquishing ownership. In some cases, it is even a good idea to return by reference since
the caller can always clone the object if they need to, and more often than not they can just
use the reference providing the lifetimes allow for it.
TODO you do not need to use references on intrinsic types such as integers, bools etc. and
there is no point unless you intend for them to be mutable and change.
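
A minimal sketch of these tips, using hypothetical Config and Service types: a struct is passed and returned by reference, the caller clones only when it needs its own copy, and a plain integer is passed by value.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Config {
    name: String,
    retries: u32,
}

struct Service {
    config: Config,
}

impl Service {
    /// Return by reference: the caller may read it, or clone it if needed.
    fn config(&self) -> &Config {
        &self.config
    }
}

/// Pass by reference: the function borrows the config without owning it.
fn describe(config: &Config) -> String {
    format!("{} ({} retries)", config.name, config.retries)
}

/// Small intrinsic values such as integers are simply passed by value.
fn double(retries: u32) -> u32 {
    retries * 2
}

fn main() {
    let service = Service {
        config: Config { name: String::from("db"), retries: 3 },
    };
    println!("{}", describe(service.config()));

    // The caller decides when a copy is really needed
    let mut own_copy = service.config().clone();
    own_copy.retries = double(own_copy.retries);
    println!("{:?}", own_copy);
}
```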

Learn move semantics
C and C++ default to copy on assign, Rust moves on assign unless the type implements the
Copy trait. This is easily one of the most mind boggling things that new Rust programmers
will encounter. Code that works perfectly well in C++ will instantly fail in Rust.
The way to overcome this is first use references and secondly don't move unless you intend
for the recipient to be the new owner of an object.
TODO if you intend for the recipient to own a copy of an object then implement the Clone
trait on your struct. Then you may call .clone() and the recipient becomes the owner of the
clone instead of your copy.
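
The following sketch shows a move, a clone and a Copy type side by side. Sprite and consume() are hypothetical names used only for illustration.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Sprite {
    name: String,
}

/// Takes ownership of the sprite it is given.
fn consume(sprite: Sprite) -> usize {
    sprite.name.len()
}

fn main() {
    let original = Sprite { name: String::from("ghost") };

    let copy = original.clone();   // the recipient owns the clone
    let moved = original;          // ownership moves to `moved`
    // println!("{:?}", original); // compile error: `original` was moved

    println!("{}", consume(copy));
    println!("{:?}", moved);

    // Integers implement Copy, so assignment copies and both stay usable
    let a = 5;
    let b = a;
    println!("{} {}", a, b);
}
```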

Use modules to naturally arrange your source code
TODO

Using composition and traits

TODO Rust does not allow you to inherit one struct from another. The manner of overcoming
this.
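
One common way to work without inheritance, sketched here with hypothetical Shape, Rectangle and LabelledRectangle types, is to put shared behaviour in a trait and to build larger types by containing smaller ones. This is an illustration of the general idea rather than the approach this section was planned to describe.

```rust
/// Shared behaviour lives in a trait rather than in a base class.
trait Shape {
    fn area(&self) -> f64;

    /// Default method, available to every implementor.
    fn describe(&self) -> String {
        format!("shape with area {}", self.area())
    }
}

struct Rectangle {
    width: f64,
    height: f64,
}

impl Shape for Rectangle {
    fn area(&self) -> f64 {
        self.width * self.height
    }
}

/// Composition: a labelled rectangle contains a Rectangle instead of
/// inheriting from it.
struct LabelledRectangle {
    label: String,
    rect: Rectangle,
}

impl Shape for LabelledRectangle {
    fn area(&self) -> f64 {
        // Delegate to the contained value
        self.rect.area()
    }

    fn describe(&self) -> String {
        format!("{}: {}", self.label, self.rect.describe())
    }
}

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Rectangle { width: 2.0, height: 3.0 }),
        Box::new(LabelledRectangle {
            label: String::from("box"),
            rect: Rectangle { width: 1.0, height: 4.0 },
        }),
    ];
    for shape in &shapes {
        println!("{}", shape.describe());
    }
}
```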

Using Cargo.toml to create your build profiles
Use Rust naming conventions and formatting
C and C++ have never had

Foreign Function Interface
TODO for now read the FFI omnibus.

Leaving function names unmangled
TODO attribute no_mangle

libc
TODO Rust provides a crate with bindings for C library functions. If you find yourself
receiving a pointer allocated with malloc you could free it with the corresponding call to free()
via the bindings.
TODO add the following to your Cargo.toml:

[dependencies]
libc = "*"

TODO example of using libc


Features of Rust compared with C++
Rust and C++ have roughly analogous functionality although they often go about it in
different ways.
Rust benefits from learning what works in C / C++ and what doesn't and indeed has
cherry-picked features from a variety of languages. It also enjoys a cleaner API in part because
things like Unicode dictate the design.
This section will cover such topics as types, strings, variables, literals, collections, structs,
loops and so on. In each case it will draw comparison between how things are in C/C++ and
how they are in Rust.
Also bear in mind that Rust compiles to binary code and is designed to use C binaries and
be used by C binaries. Therefore the generated code is similar, but it is different as source.


Types
C/C++ compilers implement a data model that affects what width the standard types are.
The general rule is that:
1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)

As you can see, potentially everything all the way to long long could be a single byte, or
there could be some other crazy definition. In practice however, data models come in four
common types which will be covered in the next section.
For this section, we'll cover the most likely analogous types between Rust and C/C++.

| C/C++ | Rust | Notes |
| --- | --- | --- |
| char | i8 (or u8) | The signedness of a C++ char can be signed or unsigned - the assumption here is signed but it varies by target system. A Rust char is not the same as a C/C++ char since it can hold any Unicode character. [1] |
| unsigned char | u8 | |
| signed char | i8 | |
| short int | i16 | |
| unsigned short int | u16 | |
| (signed) int | i32 or i16 | In C/C++ this is data model dependent [2] |
| unsigned int | u32 or u16 | In C/C++ this is data model dependent [2] |
| (signed) long int | i32 or i64 | In C/C++ this is data model dependent [2] |
| unsigned long int | u32 or u64 | In C/C++ this is data model dependent [2] |
| (signed) long long int | i64 | |
| unsigned long long int | u64 | |
| size_t | usize | usize holds numbers as large as the address space [3] |
| float | f32 | |
| double | f64 | |
| long double | f128 | f128 support was present in Rust but removed due to issues for some platforms in implementing it. |
| bool | bool | |
| void | () | The unit type (see below) |

[1] Rust's char type is 4 bytes wide, enough to hold any Unicode character. This is
equivalent to the belated char32_t that appears in C++11 to rectify the abused C++98
wchar_t type which on operating systems such as Windows is only 2 bytes wide. When you
iterate strings in Rust you may do so either by character or by u8, i.e. a byte.
[2] See the next section for a discussion on data models.
[3] Rust has a specific numeric type for indexing on arrays and collections called usize. A
usize is designed to be able to reference as many elements in an array as there is
addressable memory, i.e. if memory is 64-bit addressable then usize is 64-bits in length.
There is also a signed isize which is less used but also available.

Data model
The four common data models in C++ are:
LP32 - int is 16-bit, long and pointers are 32-bit. This is an uncommon model, a
throw-back to DOS / Windows 3.1
ILP32 - int, long and pointers are 32-bit. Used by Win32, Linux, OS X
LLP64 - int and long are 32-bit, long long and pointers are 64-bit. Used by Win64
LP64 - int is 32-bit, long / long long and pointers are 64-bit. Used by Linux, OS X
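
Rust exposes the C types of the current target through std::os::raw, so the data model can be inspected from Rust. The sketch below maps the sizes onto the four names listed above; the function names data_model_sizes and data_model_name are invented for illustration.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_long, c_longlong};

/// Sizes in bytes of (int, long, long long, pointer) for the current target.
fn data_model_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<c_int>(),
        size_of::<c_long>(),
        size_of::<c_longlong>(),
        size_of::<*const u8>(),
    )
}

/// Maps a tuple of sizes onto one of the four common data model names,
/// or None if the combination matches none of them.
fn data_model_name(sizes: (usize, usize, usize, usize)) -> Option<&'static str> {
    match sizes {
        (2, 4, _, 4) => Some("LP32"),
        (4, 4, _, 4) => Some("ILP32"),
        (4, 4, 8, 8) => Some("LLP64"),
        (4, 8, 8, 8) => Some("LP64"),
        _ => None,
    }
}

fn main() {
    let sizes = data_model_sizes();
    println!("int/long/long long/pointer sizes in bytes: {:?}", sizes);
    match data_model_name(sizes) {
        Some(name) => println!("data model: {}", name),
        None => println!("data model: not one of the four common ones"),
    }
}
```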

C/C++ types compared to Rust
C/C++ and Rust will share the same machine types for each corresponding language type
and the same compiler / backend technology, i.e.:
1. Signed types are two's complement
2. IEEE 754-2008 binary32 and binary64 floating points for float and double precision types.

stdint.h / cstdint
C provides a <stdint.h> header that provides unambiguous typedefs with length and signedness, e.g. uint32_t. The equivalent in C++ is <cstdint>.

If you use the types defined in this header file the types become directly analogous and
unambiguous between C/C++ and Rust.

C/C++ | Rust
int8_t | i8
uint8_t | u8
int16_t | i16
uint16_t | u16
uint32_t | u32
int32_t | i32
int64_t | i64
uint64_t | u64

Integer types
C++
C/C++ has primitive types for numeric values, floating point values and booleans. Strings will be dealt with in a separate section.
Integer types (char, short, int, long) come in signed and unsigned versions.
A char is 8-bits on practically every platform (the standards guarantee at least 8), but for historical reasons, the standards only guarantee the other types are "at least" a certain number of bits. So an int is ordinarily 32-bits but the standard only says it should be at least as large as a short, so potentially it could be 16-bits!
More recent versions of C and C++ provide a <cstdint> (or <stdint.h> for C) with typedefs that are unambiguous about their precision.
Even though <stdint.h> can clear up the ambiguities, code frequently sacrifices correctness for terseness. It is not unusual to see an int used as a temporary incremental value in a loop:
string s = read_file();
for (int i = 0; i < s.size(); ++i) {
//...
}

While int is unlikely to fail for most loops in a modern compiler supporting ILP32 or greater, it is still technically wrong. In a LP32 data model incrementing 32767 by one would typically become -32768 (strictly, signed overflow is undefined behaviour) so this loop would never terminate if s.size() was a value greater than that.

But look again at this snippet. What if the file read by read_file() is outside of our control? What if someone deliberately or accidentally feeds us a file so large that our loop will fail trying to iterate over it? In doing so our code is hopelessly broken.
This loop should be using the same type returned from string::size(), which is an opaque unsigned integer type called size_type. This is usually a typedef for std::size_t but not necessarily. Thus we have a type mismatch. A string has an iterator which could be used instead, but perhaps you need the index for some reason, and then it can get messy:
string s = read_file();
for (string::iterator i = s.begin(); i != s.end(); ++i) {
string::difference_type idx = std::distance(s.begin(), i);
//...
}

Now we've swapped from one opaque type size_type to another called difference_type. Ugh.
C/C++ types can also be needlessly wordy such as unsigned long long int. Again, this sort of puffery encourages code to make bad assumptions, use a less wordy type, or bloat the code with typedefs.

Rust
Rust benefits from integer types that unambiguously denote their signedness and width in their name - i16, u8 etc.
They are also extremely terse making it easy to declare and use them. For example a u32 is an unsigned 32-bit integer. An i64 is a signed 64-bit integer.
Types may be inferred, annotated on the variable, or explicitly suffixed to the value:
let v1 = 1000;
let v2 : u32 = 25;
let v3 = 126i8;

Rust also has two types called usize and isize respectively. These are equivalent to size_t (isize being the signed counterpart) in that they are large enough to hold as many elements as there is addressable memory. So in a 32-bit operating system they will be 32-bits in size, in a 64-bit operating system they will be 64-bits in size.

Rust will not implicitly coerce an integer from one size to another without explicit use of the as keyword.

let v1 = 1000u32;
let v2: u16 = v1 as u16;
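Note that as never fails: when the value does not fit, it silently keeps only the low bits, much like a C cast. Where that matters, the standard library's TryFrom trait gives a checked conversion. A small sketch of both (the function names are invented for this illustration):

```rust
use std::convert::TryFrom;

/// Narrow with `as`: always succeeds, keeping only the low 16 bits.
pub fn narrow_with_as(v: u32) -> u16 {
    v as u16
}

/// Narrow with `TryFrom`: returns None when the value does not fit in a u16.
pub fn narrow_checked(v: u32) -> Option<u16> {
    u16::try_from(v).ok()
}
```

So narrow_with_as(70_000) quietly wraps around while narrow_checked(70_000) reports the problem as None.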

Real types
C++
C/C++ has float, double and long double precision floating point types and they suffer the same vagueness as integer types.
float
double - "at least as much precision as a float"
long double - "at least as much precision as a double"

In most compilers and architectures however a float is a 32-bit single precision value, and a double is a 64-bit double precision value. The most common machine representation is the IEEE 754-2008 format.

Long double
The long double has proven quite problematic for compilers. Despite expectations that it is a quadruple precision value it usually isn't. Some compilers such as gcc may offer 80-bit extended precision on x86 processors with a floating point unit but it is implementation defined behaviour.
The Microsoft Visual C++ compiler treats it with the same precision as a double. Other architectures may treat it as quadruple precision. The fundamental problem with long double is that most desktop processors do not have the ability in hardware to perform 128-bit floating point operations so a compiler must either implement it in software or not bother.

Math functions
The <math.h> C header provides math functions for working with different precision types.

#include <math.h>
const double PI = 3.1415927;
double result = cos(45.0 * PI / 180.0);
//..
double result2 = fabs(-124.77); // abs() is the integer version in C
//..
float result3 = sqrtf(9.0f);
//
long double result4 = powl(9, 10);

Note how different calls are required according to the precision, e.g. sinf, sin or sinl. C99 supplies a "type-generic" set of macros in <tgmath.h> which allows sin to be used regardless of type.
C++11 provides a <cmath> that uses specialised inline functions for the same purpose:

#include <cmath>
float result = std::sqrt(9.0f);

Rust
Rust implements two floating point types, f32 and f64. These would be analogous to a 32-bit float and 64-bit double in C/C++.

let v1 = 10.0;
let v2 = 99.99f32;
let v3 = -10e4f64;

Unlike in C/C++, the math functions are directly bound to the type itself providing you
properly qualify the type.
let result = 10.0f32.sqrt();
//
let degrees = 45.0f64;
let result2 = degrees.to_radians().cos();
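Because the functions hang off the type, the same method name works for f32 and f64 without any sinf / sin / sinl style variants. A minimal sketch, with helper names that are invented here:

```rust
/// Cosine of an angle given in degrees, using the methods bound to f64.
pub fn cos_degrees(degrees: f64) -> f64 {
    degrees.to_radians().cos()
}

/// The same method name, this time on an f32.
pub fn root(value: f32) -> f32 {
    value.sqrt()
}
```

Here cos_degrees(60.0) comes out at (very nearly) 0.5 and root(9.0) is 3.0.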

Rust does not have a 128-bit double. A f128 did exist for a period of time but was removed due to portability, complexity and maintenance issues. Note how long double is treated (or not) according to the compiler and target platform.
At some point Rust might get a f128 or f80 but at this time does not have such a type.

Booleans
A bool (boolean) type in C/C++ can have the value true or false, however it can be promoted to an integer type (0 == false, 1 == true) and a bool even has a ++ operator for turning false to true (deprecated, and removed in C++17) although it has no -- operator!?
But inverting true with a ! becomes false and vice versa.
!false == true
!true == false

Rust also has a bool type that can have the value true or false. Unlike C/C++ it is a true type with no implicit promotion to integer type.
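If ported code really does rely on treating a flag as a number, the conversion has to be spelled out in Rust. A short sketch (function names are illustrative only):

```rust
/// bool to integer needs an explicit cast: false becomes 0, true becomes 1.
pub fn bool_to_int(flag: bool) -> i32 {
    flag as i32
}

/// There is no integer to bool cast at all, so C's "non-zero means true"
/// has to be written as a comparison.
pub fn int_to_bool(value: i32) -> bool {
    value != 0
}
```

An expression such as if count { ... } with an integer count, common in C, does not compile in Rust; it must be written as if count != 0 { ... }.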

void / Unit type
C/C++ uses void to specify a type of nothing or an indeterminate pointer to something.

// A function that doesn't return anything
void delete_directory(const std::string &path);
// Indeterminate pointer use
struct file_stat {
uint32_t creation_date;
uint32_t last_modified;
char file_name[MAX_PATH + 1];
};
// malloc returns a void * which must be cast to the type needed
file_stat *s = (file_stat *) malloc(sizeof(file_stat));
// But casting is not required when going back to void *
free(s);

The nearest thing to void in Rust is the Unit type. It's called a Unit type because its type is () and it has one value of ().
Technically void is absolutely nothing and () is a single value of type () so they're not analogous but they serve a similar purpose.
When a block evaluates to nothing it returns (). We can also use it in places where we don't care about one parameter. e.g. say we have a function do_action() that succeeds or fails for various reasons. We don't need any payload with the Ok response so specify () as the payload of success:
fn do_action() -> Result<(), String> {
//...
Result::Ok(())
}
let result = do_action();
if result.is_ok() {
println!("Success!");
}

Empty enums
Rust does have something closer to (but not the same as) void - empty enumerations.

enum Void {}

Essentially this enum has no values at all so anything that assigns or matches this nothingness is unreachable and the compiler can issue warnings or errors. If the code had used () the compiler might not be able to determine this.
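To make that concrete, here is a sketch of an operation that cannot fail, with the empty enum as its error type. The function names are hypothetical; the point is that the Err arm needs no code because no Void value can ever exist.

```rust
/// An enum with no variants: no value of this type can ever be constructed.
pub enum Void {}

/// A (hypothetical) operation that cannot fail, stated in its signature.
pub fn text_length(text: &str) -> Result<usize, Void> {
    Ok(text.len())
}

pub fn length_or_unreachable(text: &str) -> usize {
    match text_length(text) {
        Ok(n) => n,
        // An empty match is accepted because `never` has no possible
        // variants, so this arm can never execute.
        Err(never) => match never {},
    }
}
```

Had the error type been (), the compiler would have had to assume the Err arm could really be reached.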

Tuples
A tuple is a collection of values of the same or different type passed to a function or returned
by one as if it were a single value.
C/C++ has no concept of a tuple primitive type, however C++11 can construct a tuple using
a template:

std::tuple<std::string, int> v1 = std::make_tuple("Sally", 25);
//
std::cout << "Name = " << std::get<0>(v1)
          << ", age = " << std::get<1>(v1) << std::endl;

Rust supports tuples as part of its language:
let v1 = ("Sally", 25);
println!("Name = {}, age = {}", v1.0, v1.1);

As you can see this is more terse and more useful. Note that the way a tuple is indexed is
different from an array though, values are indexed via .0, .1 etc.
Tuples can also be returned by functions and assignment operators can ignore tuple
members we're not interested in.
let (x, y, _) = calculate_coords();
println!("x = {}, y = {}", x, y);
//...
pub fn calculate_coords() -> (i16, i16, i16) {
(11, 200, -33)
}

In this example, the calculate_coords() function returns a tuple containing three i16 values. We assign the first two values to x and y respectively and ignore the third by passing an underscore. The underscore tells the compiler we're aware of the 3rd value but we just don't care about it.
Tuples can be particularly useful with code blocks. For example, let's say we want to get
some values from a piece of code that uses a guard lock on a reference counted service.
We can lock the service in the block and return all the values as a tuple to the recipients
outside of the block:

let protected_service: Arc<Mutex<ProtectedService>> =
    Arc::new(Mutex::new(ProtectedService::new()));
//...
let (host, port, url) = {
    // Lock and acquire access to ProtectedService
    let protected_service = protected_service.lock().unwrap();
    let host = protected_service.host();
    let port = protected_service.port();
    let url = protected_service.url();
    (host, port, url)
};

This code is really neat - the lock allows us to obtain the values, the lock goes out of scope
and the values are returned in one go.
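For anyone who wants to try this, a self-contained version might look like the sketch below. ProtectedService and its fields are stand-ins invented for the example, since the text does not define them.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the ProtectedService used in the text.
pub struct ProtectedService {
    host: String,
    port: u16,
    url: String,
}

impl ProtectedService {
    pub fn new() -> Self {
        ProtectedService {
            host: "localhost".to_string(),
            port: 8080,
            url: "/status".to_string(),
        }
    }
    pub fn host(&self) -> String { self.host.clone() }
    pub fn port(&self) -> u16 { self.port }
    pub fn url(&self) -> String { self.url.clone() }
}

pub fn read_settings(service: &Arc<Mutex<ProtectedService>>) -> (String, u16, String) {
    let (host, port, url) = {
        // The guard only lives inside this block
        let guard = service.lock().unwrap();
        (guard.host(), guard.port(), guard.url())
    }; // guard is dropped here, which releases the lock
    (host, port, url)
}
```

The getters return owned values, which is what lets them outlive the guard. Once read_settings() returns, the mutex can be locked again straight away.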

Arrays
An array is a fixed size list of elements allocated either on the stack or the heap.
E.g. to create a 100 element array of double values in C++:

// Stack
double values[100];
// Heap
double *values = new double[100];
delete []values;
// C99 style brace enclosed lists
double values[100] = {0}; // Set all to 0
double values[100] = {1, 2, 3}; // 1,2,3,0,0,0,0...
// C99 with designator
double values[100] = {1, 2, 3, [99] = 99}; // 1,2,3,0,0,0,...,0,99

And in Rust:


// Stack
let mut values = [0f64; 100]; // 100 elements
let mut values = [1f64, 2f64, 3f64]; // 3 elements 1,2,3
// Heap
let mut values = Box::new([0f64; 100]);

Note how Rust provides a shorthand to initialise the array with the same value or assigns the
array with every value. Initialisation in C and C++ is optional however it is more expressive in
that portions of the array can be set or not set using enclosed list syntax.
Rust actually forces you to initialise an array to something. Attempting to declare an array
without assigning it a value is a compiler error.
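Rust has no designator syntax, so the closest equivalent of the C99 examples above is to initialise everything and then overwrite the interesting elements. A sketch (the function name is illustrative):

```rust
/// Rust counterpart of `double values[100] = {1, 2, 3, [99] = 99};`
pub fn sparse_values() -> [f64; 100] {
    let mut values = [0f64; 100]; // every element starts at 0.0
    values[..3].copy_from_slice(&[1.0, 2.0, 3.0]); // first three elements
    values[99] = 99.0; // the "designated" last element
    values
}
```

It is more typing than the C version, but there is never a moment where an element holds an indeterminate value.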

Slices
A slice is a runtime view of a part of an array or string. A slice is not a copy of the array /
string; rather it is a reference to a portion of it. The reference holds a pointer to the
starting element and the number of elements in the slice.
let array = ["Mary", "Sue", "Bob", "Michael"];
println!("{:?}", array);
let slice = &array[2..];
println!("{:?}", slice);

This slice represents the portion of array starting from index 2.
["Mary", "Sue", "Bob", "Michael"]
["Bob", "Michael"]

Size of the array
C and C++ basically give no easy way to know the length of the array unless you encapsulate the array with a std::array or happen to remember it from the code that declares it.
// C++11
std::array<Element, 100> elements;
std::cout << "Size of array = " << elements.size() << std::endl;

The std::array wrapper is of limited use because you cannot pass arrays of an unknown size to a function. Therefore even with this template you may pass the array into a function as one argument and its size as another.
Alternatively you might see code like this:
const size_t num_elements = 1024;
char buffer[num_elements];
//...
// fill_buffer needs to be told how many elements there are
fill_buffer(buffer, num_elements);

Or like this
Element elements[100];
//...
int num_elements = sizeof(elements) / sizeof(Element);

In Rust, the array has a function bound to it called len(). This always provides the length of the array. In addition if we take a slice of the array, that also has a len().

let buffer = [0u8; 1024];
println!("Buffer length = {}", buffer.len());
fill_buffer(&buffer[0..10]);
//...
fn fill_buffer(elements: &[u8]) {
    println!("Number of elements = {}", elements.len());
}
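A function that actually fills the buffer would take a mutable slice. The sketch below is one way it could look; the extra value parameter and the returned count are additions made for this example.

```rust
/// Fill every element of the slice it is given. The slice carries its own
/// length, so no separate size argument is needed.
pub fn fill_buffer(elements: &mut [u8], value: u8) -> usize {
    for element in elements.iter_mut() {
        *element = value;
    }
    elements.len()
}
```

Called as fill_buffer(&mut buffer[0..10], 7), it touches exactly the first ten elements and cannot run off the end of the slice.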


Strings
Strings in C++ are a bit messy thanks to the way languages and characters have been
mapped onto bytes in different ways. The explanation for this requires some backstory...

What is a character exactly?
Historically in C and C++, a char type is 8-bits. Whether a plain char is signed (usually -128 to 127)
or unsigned is implementation defined, but the values essentially represent the values 0-255.
The US-ASCII standard uses the first 7-bits (0-127) to assign values to upper and lower
case letters in the English alphabet, numbers, punctuation marks and certain control
characters.
It didn't help the rest of the world who use different character sets. And even ASCII was
competing with another standard called EBCDIC which was found on mainframe computers.
What about the upper values from 128-255? Some operating systems came up with a
concept called a "code page". According to what "code page" was in effect, the symbol that
the user would see for a character in the 128-255 range would change.
But even this is not enough. Some languages like Chinese, Japanese, Korean, Thai, Arabic
etc. have thousands of symbols that must be encoded with more than one byte. So the first
byte might be a modifier that combines with further bytes to render as a symbol. For
example Microsoft's code page 932 uses an encoding called Shift JIS (Japanese) where
some symbols are two bytes.
Obviously this was rapidly becoming a mess. Each code page interpreted the same byte
array differently according to some external setting. So you could not send a file written in
Chinese to someone with a different code page and expect it to render properly.

Unicode to the rescue
The Unicode standard assigns every printable symbol in existence a unique value, called a code point, which fits in 32-bits. Most symbols fall in the first 16-bits called the Basic Multilingual Plane (BMP).
China has mandated all software must support all 32-bits. We'll see how this has become a major headache for C and C++.

C / C++
There is no string primitive
C and C++ do not have a string primitive type, instead they have a char type, that is one byte.
A "string" is a pointer to an array of chars that are terminated with a zero byte, '\0'.

// The array that my_string points at ends with a hidden \0
const char *my_string = "This is as close to a string primitive as you can get";

In C, functions such as strlen(), strcpy(), strdup() etc. allow strings to be manipulated but they work by using the zero byte to figure out the length of things. So strlen() is the number of bytes that were encountered before a \0 was found. Since these functions run until they find a terminating character it is very easy to accidentally cause them to overrun a buffer.
In C++ the std::string class wraps a char pointer and provides methods for modifying the string in a safe manner. It is a vast improvement over C but it is still not a primitive - it is a class defined in a header that is compiled and linked to the executable just like every other class.
In addition, a std::string will usually use heap to store the string's data which can have repercussions for memory usage and fragmentation. There is usually a hidden cost to assigning one string to another because memory must be allocated to receive a copy of the string, even if the string itself is not modified during the assignment.

Unicode support
C/C++ added Unicode support by creating a wide character called wchar_t. And C++ has an equivalent std::wstring.
We're sorted now right?
Oops no, because the wchar_t type can be either 2 or 4 bytes wide and is a compiler / platform specific decision.
In Microsoft Visual C++ the wide char is an unsigned short (corresponding to Win32's Unicode API), in gcc it can be 32-bits or 16-bits according to the compile flags.
A 16-bit value will hold symbols from the Basic Multilingual Plane but not the full 32-bit range. This means that 16-bit wide strings should be assumed to be UTF-16 encoded because they cannot support Unicode properly otherwise.
C++11 rectifies this by introducing explicit char16_t and char32_t types and corresponding versions of string called std::u16string and std::u32string.

Character types
So now C++ has 4 character types. Great huh?

Character type | Encoding
char | C, ASCII, EBCDIC, UTF-8, ad hoc, ???
wchar_t | UTF-16 or UTF-32
char16_t | UTF-16
char32_t | UTF-32

Rust
Rust has been rather fortunate. Unicode preceded it so it makes a very simple design choice.
A char type is a 32-bit Unicode character, always enough to hold a single character.
A str type is a UTF-8 encoded string held in memory. Code tends to use &str which is a string slice, basically a reference to the str, or a portion of it. A str does not need to be terminated with a zero byte and can contain zero bytes if it wants.
A String (std::string::String) is a heap allocated string type used for manipulating strings, building them, reading them from file, cloning them etc.
Note that internally UTF-8 is used for encoding yet a char is 32-bits. The length of a string is considered to be its byte length. There are special iterators for walking the string and decoding UTF-8 into 32-bit characters.
Finally there is a platform specific type OsString that handles any differences in how the operating system sees strings compared to Rust.

Types Comparison

C++ | Rust
char * or wchar_t *, C++11 - char16_t *, char32_t * | str, &str
std::string, std::wstring, std::u16string, std::u32string | String (std::string::String)

char * vs str
C/C++ do not have a string primitive. A string is a pointer to some bytes in memory that are
nul terminated. The same applies for wider chars, except of course they require 2 or 4 bytes.
// The traditional way
const char *my_str = "Hello"; // Terminates with \0
// or
char my_str[] = "Hello"; // Terminates with \0
// or wide string with L prefix
const wchar_t *hello_chinese = L"\u4f60\u597d";
// C11 and C++11 add UTF string literal prefixes
auto hello_8 = u8"\u4f60\u597d"; // UTF-8 encoded
auto hello_16 = u"\u4f60\u597d"; // UTF-16
auto hello_32 = U"\u4f60\u597d"; // UTF-32

Rust would use a str for this purpose. A str is an immutable array of bytes somewhere in memory. The str could be on the heap when it points to a String object, or it could be in global memory if the string is static. A str slice, &str, is a reference to a str which also contains a length value.

let my_str = "Hello";
let hello_chinese = "你好";

Type inferences for these assignments will create a string slice pointing to the statically allocated string data. The data itself doesn't move and the &str is read-only.
We can also observe that Rust removes the mess of character width and literal prefixes that C and C++ have to suffer under because Unicode characters are implicitly supported.
The str has functions for iterating over the string in bytes / characters, splitting, finding a pattern etc.

let v = "Hello"; // v is a &'static str
println!("My string is {} and it is {} bytes long", v, v.len());

Note len() is the length in bytes because strings are UTF-8 encoded. A single character may be encoded as 1, 2, 3, or 4 bytes. It may not be the number of characters a human would actually see.
let my_str = "你好";
println!("Number of bytes = {}", my_str.len());
println!("Number of chars = {}", my_str.chars().count());

Number of bytes = 6
Number of chars = 2

You can split a &str to produce a left and a right &str slice like this:

let (part1, part2) = "Hello".split_at(3);
println!("Part 1 = {}", part1);
println!("Part 2 = {}", part2);

Part 1 = Hel
Part 2 = lo
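One thing to watch when porting index-based C code: the index given to split_at() is a byte offset, and it panics if that offset lands in the middle of a multi-byte character. A defensive sketch (try_split is a name invented here) checks the boundary first:

```rust
/// Split at byte offset `mid`, but only when it falls on a UTF-8
/// character boundary. Returns None instead of panicking otherwise.
pub fn try_split(s: &str, mid: usize) -> Option<(&str, &str)> {
    if s.is_char_boundary(mid) {
        Some(s.split_at(mid))
    } else {
        None
    }
}
```

With "你好", where each character occupies 3 bytes, offset 3 is a valid split point but offset 1 is not.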

std::basic_string (C++) vs String (Rust)
The standard C++ library also has a template class std::basic_string that acts as a wrapper around the various character types and can be used for manipulating a string of any width. This template is specialised as std::string, std::wstring, std::u16string and std::u32string.

std::string my_str = "Hello";
my_str += " world";
// C++14 also allows some type inference with autos and the s literal suffix
// (needs: using namespace std::string_literals;)
auto s1 = "Hello"s; // std::string
auto s2 = u8"Hello"s; // std::string, forces UTF-8 encoding
auto s3 = L"Hello"s; // std::wstring
auto s4 = u"Hello"s; // std::u16string
auto s5 = U"Hello"s; // std::u32string

In Rust, the String type serves the same purpose:

let mut v = String::from("Hello");
v.push_str(" world");

Using it is fairly simple
let mut v = String::from("This is a String");
v.push_str(" that we can modify");

A String has functions to do actions such as appending, e.g.

let b = String::from(" Bananas");
let mut result = String::new();
result.push_str("Apples ");
result.push('&'); // Push a char
result.push_str(b.as_str());
println!("result = {}", result);

Strings are always valid UTF-8.
Internally a String has a pointer to the data, its length and a capacity (max size). If you intend to expand a string, then you should ensure the String has sufficient capacity to accommodate its longest value otherwise you may cause it to reallocate itself excessively.
Strings will never shrink their capacity unless you explicitly call shrink_to_fit(). This means if you use a temporary string in a loop, it's probably best to place it outside the loop and reserve space to make it efficient.

let mut v = String::with_capacity(100);
// or
let mut v = String::new();
v.reserve_exact(100);
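As a rough sketch of the "temporary string outside the loop" advice (both function names and the "line N" text are made up for the example), clear() empties the string but keeps its allocation for the next iteration:

```rust
/// Build `count` lines, reusing one scratch String for every iteration.
pub fn build_lines(count: usize) -> Vec<String> {
    let mut scratch = String::with_capacity(32); // allocated once, outside the loop
    let mut lines = Vec::with_capacity(count);
    for i in 0..count {
        scratch.clear(); // length becomes 0, capacity is kept
        scratch.push_str("line ");
        scratch.push_str(&i.to_string());
        lines.push(scratch.clone());
    }
    lines
}

/// Shows that clear() leaves the capacity untouched.
pub fn capacity_survives_clear() -> bool {
    let mut s = String::with_capacity(100);
    let before = s.capacity();
    s.push_str("temporary");
    s.clear();
    s.capacity() == before
}
```

The clone pushed into the Vec still allocates, of course; the saving is that the scratch buffer itself is not rebuilt from nothing on every pass.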

Strings also have all the methods of str thanks to implementing the Deref trait.
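In practice this means a function written against &str accepts a &String as well, and str methods can be called directly on a String. A brief sketch with invented function names:

```rust
/// Accepts any string slice; a `&String` coerces to `&str` through Deref.
pub fn shout(text: &str) -> String {
    text.to_uppercase()
}

pub fn deref_demo() -> (String, bool) {
    let owned = String::from("hello world");
    // `&owned` is a &String, passed where a &str is expected.
    // `starts_with` is defined on str but is callable on the String.
    (shout(&owned), owned.starts_with("hello"))
}
```

This is why most Rust APIs take &str parameters: callers can pass literals, slices or Strings without converting anything.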

Formatting strings
Strings can be formatted in C with printf or sprintf or in C++ composed with stream operators, e.g. to a std::stringstream.
Rust uses format! and println! macros that more resemble the sprintf model.

C++ | Rust formatting trait | Purpose
%s, %u, %d, %i, %f, %c | {} | C/C++ require the type of the parameter to be specified. In Rust the type is inferred and {} will invoke the type's Display trait regardless of what it is. So a String outputs its text, numeric types return their value, a boolean returns true or false, and so on.
%lld, %llu | {} | C/C++ has extensions to deal with different size ints and floats, e.g. ll for long long due to the way arguments are passed to the function. In Rust, there is no need for that.
 | {:?}, {:#?} | In Rust {:?} returns whatever is implemented by a type's Debug trait. The {:#?} variant can be used to pretty-print the output for types that derive the Debug trait.
%-10s | {:<10} | Format left aligned string padded to minimum of 10 spaces
%04d | {:04} | Pad a number with zeros to a width of 4
%.3f | {:.3} | Pad a number's precision to 3 decimal places. Combine with a width and the 0 flag to zero-pad as well, e.g. {:08.3}
%e, %E | {:e}, {:E} | Exponent in lower or uppercase
%x, %X | {:x}, {:X} | Hexadecimal in lower or uppercase. Note {:#x}, {:#X} prefixes the output with 0x
%o | {:o} | Octal. Note {:#o} prefixes the output with 0o
 | {:b} | Binary. Note {:#b} prefixes the output with 0b
%p | {:p} | Presents a struct's memory location, i.e. pointer

Rust has many more formatting traits than this.
For example it allows named parameters like this example:
let message = format!("The temperature {temp}C is within {percent} of maximum",
    temp = 104, percent = 99);

Named parameters would be particularly useful for localization where the order of values
may be different in one language compared to another.
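A handful of these specifiers side by side, as a sketch you can run to check your expectations (the function name and the sample values are arbitrary):

```rust
/// Returns one formatted string per specifier, in the order listed.
pub fn formatting_samples() -> Vec<String> {
    vec![
        format!("{:<10}|", "ab"),   // left aligned, padded to 10
        format!("{:04}", 42),       // zero padded to width 4
        format!("{:.3}", 1.23456),  // 3 decimal places
        format!("{:e}", 42.0),      // lower case exponent
        format!("{:#x}", 255),      // hex with 0x prefix
        format!("{:#o}", 8),        // octal with 0o prefix
        format!("{:#b}", 5),        // binary with 0b prefix
        format!("{temp}C is {percent}%", temp = 104, percent = 99), // named
    ]
}
```

Unlike printf, a mismatch between the format string and its arguments is a compile error rather than a runtime surprise.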

Display and Debug traits
Rust allows types to be formatted as strings based upon the formatting traits they
implement.
The two main implementation traits are:
Display - this is for standard textual representation of a type.
Debug - this is for the debugging textual representation of a type. It might present additional information or be formatted separately to the Display trait. It is possible to #[derive(Debug)] this trait which is usually enough for the purpose of debugging.

If we look at the traits we can see they're identical:
// std::fmt::Display
pub trait Display {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}
// std::fmt::Debug
pub trait Debug {
    fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>;
}

All of the intrinsic types implement Display and Debug. We can explicitly implement Display on our own structs too:

use std::fmt::{self, Formatter, Display};
struct Person {
first_name: String,
last_name: String,
}
impl Display for Person {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
write!(f, "{} {}", self.first_name, self.last_name)
}
}
//...
let person = Person { first_name: "Susan".to_string(),
last_name: "Smith".to_string() };
println!("Person - {}", person);

Person - Susan Smith

Implementing Debug is usually done by #[derive(Debug)] but it could also be implemented manually. The derived Debug will print out the struct name, and then the members in curly braces:

#[derive(Debug)]
struct Person {
//...
}
//...
println!("Person - {:?}", person);

Person - Person { first_name: "Susan", last_name: "Smith" }

Many types process formatting traits which are values held between the {} braces in the string. These are fairly similar to the patterns used in C functions for printf, sprintf etc.

OsString / OsStr


Rust recognises there are times when you need to pass or receive a string from a system
API.
In this case you may use OsString which allows interchange between Rust and a system dependent representation of strings. On Windows it will return UTF-16 strings, on Linux / Unix systems it will return UTF-8.
An OsStr is a slice onto OsString, analogous to str and String.
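Going from a Rust string to an OsString always works, but the reverse can fail because the operating system may hand back something that is not valid Unicode. A small sketch using std::ffi (the helper names are invented):

```rust
use std::ffi::{OsStr, OsString};

/// Any Rust string can become an OsString.
pub fn to_os(text: &str) -> OsString {
    OsString::from(text)
}

/// The way back is fallible: to_str() yields None when the platform
/// string is not valid Unicode.
pub fn from_os(os: &OsStr) -> Option<&str> {
    os.to_str()
}
```

Paths are the usual place you meet these types: Path and PathBuf are built on OsStr and OsString.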


Variables
C++
Type Inference
C++11 has type inference, previous versions of C++ do not. Type inference allows the programmer to assign a value to an auto typed variable and let the compiler infer the type based on the assignment.
Boolean and numeric types are fairly easy to understand providing the code is as explicit as it needs to be.

auto x = true; // bool
auto y = 42; // int
auto z = 100.; // double

Where C++ gets messy is for arrays and strings. Recall that strings are not primitive types in the strong sense within C or C++ so auto requires they be explicitly defined or the type will be wrong.

auto s = std::string("Now is the winter of our discontent"); // char string
auto s = U"Battle of Waterloo"; // char32_t pointer to UTF-32 string literal

Strings are covered elsewhere, but essentially there are many kinds of strings and C++/C has grown a whole bunch of string prefixes to deal with them all.
Arrays are a more interesting problem. The auto keyword has no easy way to infer array type so one hack workaround is to assign a templatized array to an auto and coerce it.

template <typename T, int N> using raw_array = T[N];
auto a = raw_array<int, 5>{};

Rust
In Rust, variables are declared with a let command. The let may specify the variable's type, or it may also use type inference to infer it from the value it is assigned with.
let x = true; // x: bool
let y = 42; // y: i32
let z = 100.0; // z: f64
let v = vec![10, 20, 30]; // v: Vec<i32>
let s = "Now is the winter of our discontent".to_string(); // s: String
let s2 = "Battle of Waterloo"; // s2: &str
let a1: [i32; 5] = [1, 2, 3, 4, 5];

Rust has no problem with using type inference in array assignments:
let a2 = ["Mary", "Fred", "Sue"];

Note that all array elements must be the same type, inference would generate a compiler
error for an array like this:
// Compile error
let a3 = ["Mary", 32, true];

Scope rules
Scope rules in C, C++ and Rust are fairly similar - the scope that you declare the item
determines its lifetime.

Shadowing variables
One very useful feature of Rust is that you can declare the same named variable more than
once in the same scope or nested scopes and the compiler doesn't mind. In fact you'll use
this feature a lot.
This is called shadowing and works like this:


let result = do_something();
println!("Got result {:?}", result);
if let Some(result) = result {
println!("We got a result from do_something");
}
else {
println!("We didn't get a result from do_something");
}
let result = do_something_else();
//...

This example uses the variable name result 3 times. First to store the result of calling do_something(), then to extract some value Foo from Option<Foo> and a third time for calling something else. We could have assigned result to result2 and then later on assigned the value do_something_else() to result3 but we didn't need to because of shadowing.
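Here is a compilable sketch of the same idea. The two do_something functions are hypothetical stand-ins, since the text leaves them undefined, and the comments show the type each result has at that point.

```rust
/// Hypothetical stand-ins for the functions named in the text.
fn do_something() -> Option<u32> {
    Some(42)
}

fn do_something_else() -> &'static str {
    "done"
}

pub fn shadowing_demo() -> String {
    let result = do_something(); // result: Option<u32>
    let message = if let Some(result) = result {
        // Inside this block, result: u32 shadows the Option
        format!("got {}", result)
    } else {
        "got nothing".to_string()
    };
    let result = do_something_else(); // result: &str, a brand new variable
    format!("{}, then {}", message, result)
}
```

Each let creates a new binding that happens to reuse the name, so the type is free to change. Nothing is being mutated.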

Pointers
In C++
A pointer is a variable that points to an address somewhere in memory. The pointer's type indicates to the compiler what to expect at the address but there is no enforcement to ensure that the address actually holds that type. A pointer might be assigned NULL (or nullptr in C++11) or may even be garbage if nothing was assigned to it.

char *name = "David Jones";
int position = -1;
find_last_index("find the letter l", 'l', &position);

Generally pointers are used in situations where references cannot be used, e.g. functions
returning allocated memory or parent / child collection relationships where circular
dependencies would prevent the use of references.
C++11 discourages NULL in favour of a new keyword nullptr to solve a problem with function overloading.

void read(Data *data);
void read(int value);
// Which function are we calling here?
read(NULL);

Since NULL is essentially #define NULL 0 and 0 is an integer, we call the wrong function by accident. So C++ introduces an explicit nullptr for this purpose.

read(nullptr);

In Rust:
Rust supports pointers, normally called raw pointers, however you will rarely use them unless
you need to interact with a C API or for similar purposes.
A pointer looks fairly similar to that of C++:
// This is a reference coerced to a const pointer
let age: u16 = 27;
let age_ptr: *const u16 = &age;
// This is a mut reference coerced to a mutable pointer
let mut total: u32 = 0;
let total_ptr: *mut u32 = &mut total;

Although you can make a pointer outside of an unsafe block, many of the functions you might want to perform on pointers are unsafe by definition and must be inside unsafe blocks.
The documentation in full is here.
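To illustrate where the unsafe boundary falls, the sketch below creates pointers in safe code and only dereferences them inside unsafe blocks. The function names are invented for the example.

```rust
/// Writes through a *mut pointer and returns the updated value.
pub fn bump_through_pointer() -> u32 {
    let mut total: u32 = 0;
    let total_ptr: *mut u32 = &mut total; // creating the pointer is safe
    unsafe {
        *total_ptr += 5; // dereferencing it is not
    }
    total
}

/// Reads through a *const pointer made from a reference.
pub fn read_through_pointer(age: &u16) -> u16 {
    let age_ptr: *const u16 = age;
    // Sound here because the pointer was just made from a live reference
    unsafe { *age_ptr }
}
```

The unsafe keyword does not switch off the compiler's checks in general; it marks the spots where you, not the compiler, vouch that the pointer is valid.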

References
In C++
A reference is also a variable that points to an address but unlike a pointer, it cannot be reassigned and it cannot be NULL. Therefore a reference is generally assumed to be safer than a pointer. It is still possible for a reference to become dangling, assuming the address it referenced is no longer valid.

In Rust
A reference is also lifetime tracked by the compiler.
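What lifetime tracking looks like in a signature, as a minimal sketch (longest is just an illustrative function): the 'a annotation tells the compiler that the returned reference is only valid while both inputs are.

```rust
/// Returns whichever slice is longer. The lifetime `'a` ties the result
/// to both inputs, so it cannot outlive either of them.
pub fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}
```

If a caller tried to keep the result after one of the inputs had gone out of scope, the code would be rejected at compile time. The equivalent C++ would compile and leave a dangling reference.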

Tuples
A tuple is a list of values held in parentheses. They're useful in cases where transient or ad-hoc
data is being passed around and you cannot be bothered to write a special struct just for that
case.

In C++
C++ does not natively support tuples, but C++11 provides a template for passing them
around like so:
#include <tuple>
std::tuple<int, int> get_last_mouse_click() {
    return std::make_tuple(100, 20);
}
std::tuple<int, int> xy = get_last_mouse_click();
int x = std::get<0>(xy);
int y = std::get<1>(xy);

In Rust
Tuples are part of the language and therefore they're far more terse and easy to work with.
fn get_last_mouse_click() -> (i32, i32) {
(100, 20)
}
// Either
let (x, y) = get_last_mouse_click();
println!("x = {}, y = {}", x, y);
// or
let xy = get_last_mouse_click();
println!("x = {}, y = {}", xy.0, xy.1);


Literals
C++
Integers
Integer numbers are a decimal value followed by an optional type suffix.
In C++ an integer literal can be expressed as just the number or also with a suffix. Values in
hexadecimal, octal and binary are denoted with a prefix:
// Integers
42
999U
43424234UL
-3456676L
329478923874927ULL
-80968098606968LL
// C++14
329'478'923'874'927ULL
// Hex, octal, binary
0xfffe8899bcde3728 // or 0X
07583752256
0b111111110010000 // or 0B

The u, l, and ll suffixes on integers denote whether the value is an unsigned, long or long long type. The u and l / ll can be upper or lowercase. Conventionally the u precedes the size suffix, but the reverse order is also accepted.
C++14 also allows single quotes to be inserted into the number as separators - these quotes can appear between any two digits and are ignored.

Floating point numbers
Floating point numbers may represent whole or fractional numbers.

Boolean values

C/C++ bool literals are true or false.

Characters and Strings
A character literal is enclosed by single quotes and an optional width prefix. The prefix L indicates a wide character, u for UTF-16 and U for UTF-32.

'a'
L'a' // wchar_t
u'\u20AC' // char16_t
U'\U0001D11E' // char32_t

One oddity of a char literal is that sizeof('a') yields sizeof(int) in C but sizeof(char) in C++. It isn't a good idea to test the size of a character literal.
A char16_t and char32_t are sufficient to hold any UTF-16 and UTF-32 code unit respectively.
A string is a sequence of characters enclosed by double quotes. A zero value terminator is always appended to the end. Prefixes work the same as for character literals with an additional u8 type to indicate a UTF-8 encoded string.

"Hello"
u8"Hello" // char with UTF-8
L"Hello" // wchar_t
u"Hello" // char16_t with UTF-16
U"Hello" // char32_t with UTF-32

User-defined literals
C++11 introduced user-defined literals. These allow integers, floats, chars and strings to have a user-defined type suffix consisting of an underscore and a lowercase string. The suffix may act as a form of decorator or even a constant expression operator which modifies the value at compile time.
C++14 goes further and defines user-defined literals for complex numbers and units of time.
See the link for more information.

Rust

Integers
In Rust number literals can also be expressed as just the number or also with a suffix.
Values in hexadecimal, octal and binary are also denoted with a prefix:
// Integers
123i32;
123u32;
123_444_474u32;
0usize;
// Hex, octal, binary
0xff_u8;
0o70_i16;
0b111_111_11001_0000_i32;

The underscore in Rust is a separator and functions the same way as the single quote in
C++14.

Floating point numbers
Floating point numbers may represent whole or fractional numbers. As with integers they
may be suffixed to indicate their type.
let a = 100.0f64;
let b = 0.134f64;
let c = 2.3f32; // But 2.f32 is not valid (note 1)
let d = 12E+99_f64;

One quirk with floating point numbers is that the decimal point is used for float literals but it's also used for member and function invocation. So you can't say 2.f32 since the compiler thinks you are referencing f32 on 2. Instead the syntax requires you to say 2.0f32 or alter how you declare the type, e.g. let v: f32 = 2.;

Booleans
Boolean literals are simply true or false.

true
false

Characters and Strings
A character in Rust is any Unicode scalar value enclosed by single quotes; a char is 32 bits wide. The character may be written escaped or literally since .rs files are UTF-8 encoded.
A special prefix b may be used to denote a byte character or byte string, i.e. one where each character is a single byte.
'x'
'\'' // Escaped single quote
b'&' // byte character is a u8

Strings are the string text enclosed by double quotes:
"This is a string"
b"This is a byte string"

The prefix b denotes a byte string, i.e. single byte characters. Rust allows newlines, spaces, double quotes and backslashes to be escaped using backslash notation similar to C++.
"This is a \
multiline string"
"This string has a newline\nand an escaped \\ backslash"
"This string has a hex char \x52"

Strings can also be 'raw' to avoid escaping. In this case, the string is prefixed with r followed by zero or more hash marks, the string in double quotes and the same number of hash marks to close. Byte strings are uninterpreted byte values in a string.
r##"This is a raw string that can contain \n, \ and other stuff
without escaping"##
br##"A raw byte string with "stuff" like \n \, \u and other
things"##
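The literal forms above can be checked with a few assertions. This is only a sketch; the helper raw_len is a made-up name used to show that a raw string keeps the backslash and the n as two separate characters.

```rust
/// Returns the byte length of a raw string containing a backslash and an 'n'.
/// (Hypothetical helper for illustration.)
fn raw_len() -> usize {
    r"\n".len()
}

fn main() {
    // Underscores are separators and do not change the value.
    assert_eq!(123_444_474u32, 123444474);
    // Prefixes select the base, suffixes select the type.
    assert_eq!(0xff_u8, 255);
    assert_eq!(0o70_i16, 56);
    assert_eq!(0b1111_0000_u8, 240);
    // A byte character is a u8, a byte string is a reference to a u8 array.
    let amp: u8 = b'&';
    let bytes: &[u8; 2] = b"ok";
    assert_eq!(amp, 38);
    assert_eq!(bytes[0], b'o');
    // In a normal string \n is one character, in a raw string it is two.
    assert_eq!("\n".len(), 1);
    assert_eq!(raw_len(), 2);
    println!("all literal checks passed");
}
```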


Collections
A collection is something that holds zero or more elements in some fashion that allows you to enumerate those elements, add or remove elements, find them and so forth.
Vector - a dynamic array. Appending or removing elements from the end is cheap (providing the array is large enough to accommodate an additional item). Inserting items or removing them from any other part of the array is more expensive and involves memory movements. Generally speaking you should reserve enough space in a vector for the most elements you anticipate it will hold. Reallocating memory can be expensive and lead to fragmentation.
VecDeque - a ring buffer array. Items can be added or removed from either end relatively cheaply. Items in the array are not arranged sequentially so there is a little more complexity to managing wraparound and removal than a Vector.
LinkedList - a linked list individually allocates memory for each element making it cheap to add or remove elements from anywhere in the list. However there is a lot more overhead to iterating the list by index and much more heap allocation.
Set - a collection that holds a unique set of items. Inserting a duplicate item will not succeed. Some sets maintain the order of insertion. Sets are useful where you want to weed out duplicates from an input.
Map - a collection where each item is referenced by a unique key. Some maps can maintain the order of insertion.
C++ and Rust have collections as part of their standard library as is common with modern languages.
| C | C++ | Rust |
| --- | --- | --- |
| - | std::vector | std::vec::Vec or std::collections::VecDeque |
| - | std::list | std::collections::LinkedList |
| - | std::set | std::collections::HashSet, std::collections::BTreeSet |
| - | std::map | std::collections::HashMap, std::collections::BTreeMap |

C has no standard collection classes or types. Some libraries offer collection APIs such as glib or cii.

Iterators

Iterators are a reference to a position in a collection with the means to step through the
collection one element at a time.

C++
C++11 provides a shorthand way of iterating a collection:
std::vector<char> values;
for (const char &c : values) {
// do something to process the value in c
}

Iterators are more explicit in C++98 and before and the code in C++11 is basically equivalent
to this:
std::vector<char> values;
for (std::vector<char>::const_iterator i = values.begin(); i != values.end(); ++i) {
const char &c = *i;
// do something to process the value in c
}

This is quite verbose, but essentially each collection type defines a mutable iterator and immutable const_iterator type and calling begin returns an iterator to the beginning of the collection. Calling the ++ operator overload on the iterator causes it to advance to the next element in the collection. When it hits the exclusive value returned by end it has reached the end of the collection.
Obviously with an indexed type such as a vector you could also reference elements by index, but it is far more efficient to use iterators for other collection types.

Processing collections
C++ provides a number of utility templates in <algorithm> for modifying sequences in collections on the fly.

Rust


Rust also has iterators which work in a similar fashion to C++ - incrementing their way
through collections.
TODO
TODO chaining iterators together
TODO mapping one collection to another collection
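Until those sections are written, here is a minimal sketch of the three ideas: stepping through a collection, chaining two iterators together, and mapping one collection to another. The function names double_evens and join are invented for this example; the iterator methods themselves (iter, filter, map, chain, cloned, collect) are from the standard library.

```rust
/// Doubles every even number in `values`, collecting the results into a new Vec.
fn double_evens(values: &[i32]) -> Vec<i32> {
    values
        .iter()
        .filter(|v| **v % 2 == 0) // keep only the even numbers
        .map(|v| v * 2)           // map each one to a new value
        .collect()
}

/// Joins two slices into one Vec by chaining their iterators.
fn join(first: &[i32], second: &[i32]) -> Vec<i32> {
    first.iter().chain(second.iter()).cloned().collect()
}

fn main() {
    let values = vec![1, 2, 3, 4];
    // The for loop borrows the Vec and visits each element in turn.
    for v in &values {
        println!("value = {}", v);
    }
    println!("{:?}", double_evens(&values));
    println!("{:?}", join(&values, &[5, 6]));
}
```

Iterator adapters such as filter and map are lazy. Nothing is computed until a consumer such as collect or a for loop pulls values through the chain.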


Structs
C++
A class and a struct in C++ are largely the same thing from an implementation standpoint. They both hold fields and they both can have methods attached at the class (static) or instance level.

class Foo {
public:
// Methods and members here are publicly visible
double calculateResult();
protected:
// Elements here are only visible to ourselves and subclasses
virtual double doOperation(double lhs, double rhs);
private:
// Elements here are only visible to ourselves
bool debug_;
};

The default access level is public for struct and private for class. Some rules about templates only apply to classes.
From a psychological perspective a struct tends to be used to hold public data that is largely static and/or passed around. A class tends to be something more self contained with methods that are called to access or manage private fields.
So these are equivalents:


struct Foo { // as a struct
private:
};
class Foo { // As a class
};
// Or the other way around
struct Bar {
};
class Bar {
public:
};

Classes can also use an access specifier to inherit from a base class. So a class may specify public, protected or private when deriving from another class depending on whether it wants those methods to be visible to callers, or subclasses.
Classes and structs may have special constructor and destructor methods which are described in sections below.
class Size {
public:
Size(int width, int height);
int width_;
int height_;
int area() const;
};

Then in the .cpp file you might implement the constructor and method:
Size::Size(int width, int height) : width_(width), height_(height) {}
int Size::area() const { return width_ * height_; }


Rust
Rust only has structs. A struct consists of a definition which specifies the fields and their access level (public or not), and an impl section which contains the implementation of functions bound to the struct.
struct Size {
    pub width: i32,
    pub height: i32,
}

An impl section follows containing the associated functions:

impl Size {
pub fn new(width: i32, height: i32) -> Size {
Size { width: width, height: height, }
}
pub fn area(&self) -> i32 {
self.width * self.height
}
}

The new() function here is a convenience method that returns a struct preinitialised with the arguments supplied. The area() function specifies a &self argument and returns an area calculation. Any function that takes a &self or &mut self argument can be called from the variable bound to the struct.
let size = Size::new(10, 20);
println!("Size = {}", size.area());

The self keyword works in much the same way as C++ uses this, as a reference to the struct from which the function was invoked. If a function modifies the struct it must say &mut self, which indicates the function modifies the struct.
There is no inheritance in Rust. Instead, a struct may implement zero or more traits. A trait describes some kind of behavior that can be associated with the struct and is described further later on in this chapter.


Constructors
In C++ all classes have implicit or explicit constructors. Either the compiler generates them or you do, or a mix of both.
An implicit default constructor, copy constructor and assignment operator will be created when a class does not define its own. We saw earlier why this could be really bad news.
What becomes obvious is that there is a lot of noise and potential for error in C++. There would be even more if raw pointers were used instead of a std::unique_ptr here.
In Rust, things are simpler, and we'll see how it shakes out errors.
First off, let's declare our equivalent struct in Rust:
struct Person {
pub name: String,
pub age: i32,
pub credentials: Option<Credentials>,
}

Since credentials are optional, we wrap them in an Option object, i.e. credentials might be None or it might be Some(Credentials). Any code anywhere in the system can instantiate a Person simply by declaring an instance:

let person = Person { name: String::from("Bob"), age: 20, credentials: None };

In Rust you cannot create a struct without initialising all its members so we cannot have a situation where we don't know what is in each field - it MUST be set by our code.
But declaring the struct is a bit clumsy, especially if the struct is created in lots of places. So we can write a function that behaves like a constructor in C++: a static method in the impl of the struct which returns an initialised struct, e.g.

impl Person {
    pub fn new(name: String, age: i32) -> Person {
        Person { name: name, age: age, credentials: None }
    }
}

Note that Rust does not support overloads. So if we had multiple "constructor" methods, they would each have to have unique names.
Finally what if we wanted to copy the Person struct?
By default Rust does not allow copying of user-defined structs. Assigning a variable to another variable moves ownership, it doesn't copy.
There are two ways to make a user-defined struct copyable:
1. Implement the Copy trait, which means assignment copies implicitly. But is that what we want? Do we really want to make copies of a struct by accident? Copy is also only permitted when every field is itself Copy, which rules out Person because it holds a String.
2. Implement Clone instead to add a clone() method and require an explicit call to clone() in order to duplicate the struct.
The compiler can derive clone() providing all the members of the struct implement the Clone trait.


#[derive(Clone)]
struct Person {
    pub name: String,
    pub age: i32,
    pub credentials: Option<Credentials>, // Credentials must implement Clone
}
impl Person {
    pub fn new(name: String, age: i32) -> Person {
        Person { name: name, age: age, credentials: None }
    }
}
//...
let p = Person::new(String::from("Michael"), 20);
let p2 = p.clone();

What we can see is that Rust's construction and clone() behavior is basically declarative. We saw how C++ has all kinds of rules and nuances to construction, copy construction and assignment which make it complicated and prone to error.
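To see construction and cloning working together, here is a self-contained sketch. The Credentials type is never defined in this chapter, so its single username field is an assumption, and older_copy is a made-up helper.

```rust
// `Credentials` is hypothetical here; its field is an assumption.
#[derive(Clone, Debug, PartialEq)]
struct Credentials {
    username: String,
}

#[derive(Clone, Debug, PartialEq)]
struct Person {
    pub name: String,
    pub age: i32,
    pub credentials: Option<Credentials>,
}

impl Person {
    pub fn new(name: String, age: i32) -> Person {
        Person { name, age, credentials: None }
    }
}

/// Clones `original`, changes the copy and returns it. The original is untouched.
fn older_copy(original: &Person) -> Person {
    let mut copy = original.clone();
    copy.age += 1;
    copy
}

fn main() {
    let p = Person::new(String::from("Michael"), 20);
    let p2 = older_copy(&p);
    // p is still valid because clone() made an independent copy.
    println!("{:?} and {:?}", p, p2);
}
```

Because the copy is requested explicitly, there is no hidden copy constructor to get wrong. Each field's own Clone implementation does the work.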

Destructors
A C++ destructor is a specialized method called when your object goes out of scope or is
deleted.
class MyClass {
public:
MyClass() : someMember_(new Resource()) {}
~MyClass() {
delete someMember_;
}
private:
Resource *someMember_;
};


In C++ you can declare a class destructor to be called when the object is about to be destroyed. A class that is meant to be inherited from must declare its destructor virtual in case a caller calls delete through a pointer to the base class.
Since Rust does not do inheritance and does not have constructors, the manner in which you clean up is different and simpler. Instead of a destructor you implement the Drop trait.

impl Drop for Shape {
fn drop(&mut self) {
println!("Shape dropping!");
}
}

The compiler recognizes this trait. If you implement this trait then the compiler knows to call your drop() function prior to destroying your struct. It’s that simple.
Occasionally there might be a reason to explicitly drop a struct before it goes out of scope. Perhaps the resources held by the variable should be freed as soon as possible to release a resource which is in contention. Whatever the reason, the answer is to call drop like this:

{
let some_object = SomeObject::new();
//...
// Ordinarily some_object might get destroyed later,
// but this makes it explicitly happen here
drop(some_object);
//...
}
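The order in which drops happen can be observed with a small experiment. In this sketch the Resource type, the Log alias and the drop_order function are all invented for illustration; each Resource records its name in a shared log when it is dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log of events, an assumption made purely for this demonstration.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// A hypothetical resource that records its name when it is dropped.
struct Resource {
    name: &'static str,
    log: Log,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Creates two resources, drops the first one early and returns the event order.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let first = Resource { name: "first", log: log.clone() };
        let _second = Resource { name: "second", log: log.clone() };
        // Explicitly drop `first` now rather than at the end of the scope.
        drop(first);
        log.borrow_mut().push("end of scope");
    } // `_second` is dropped here
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```

The explicitly dropped value is cleaned up first, and the remaining value is cleaned up automatically when its scope ends.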

Access specifier rules
A C++ class can hide or show methods and members to any other class, or to things that inherit from itself using the public, private and protected keywords:
public – can be seen by any code internal or external to the class
private – can only be used by code internal to the class. Not even subclasses can access these members
protected – can be used by code internal to the class and by subclasses.
A class may designate another function or class as a friend which has access to the private and protected members of a class.
Rust makes things somewhat simpler.
If you want a struct to be visible outside your module you mark it pub. If you do not mark it pub then it is only visible within the module and submodules.

pub struct Person { /* ... */ }

If you want public access to a member of a struct (including modifying it if it's mutable), then mark it pub.

pub struct Person {
pub age: u16,
}

If you want something to be able to call a function on your struct you mark it pub.

impl Person {
pub fn is_adult(&self) -> bool {
self.age >= 18
}
}
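The visibility rules can be tried out with an inline module. This is a sketch only: the module name people, the nickname field and the greeting and short_name functions are assumptions added to show one private field and one private function alongside the public ones.

```rust
mod people {
    pub struct Person {
        pub age: u16,
        // Private field: only code inside the `people` module can touch it.
        nickname: String,
    }

    impl Person {
        pub fn new(age: u16, nickname: &str) -> Person {
            Person { age, nickname: nickname.to_string() }
        }

        pub fn is_adult(&self) -> bool {
            self.age >= 18
        }

        pub fn greeting(&self) -> String {
            format!("Hi {}", self.short_name())
        }

        // Private function, callable only within this module.
        fn short_name(&self) -> &str {
            &self.nickname
        }
    }
}

fn main() {
    let mut person = people::Person::new(17, "Bob");
    person.age += 1; // allowed because `age` is pub
    // person.nickname = String::new(); // would not compile: the field is private
    println!("{} adult: {}", person.greeting(), person.is_adult());
}
```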

Functions
Functions can be bound to a struct within an impl block:

impl Shape {
    pub fn new(width: u32, height: u32) -> Shape {
        Shape { width, height }
    }
    pub fn area(&self) -> u32 {
        self.width * self.height
    }
    pub fn set(&mut self, width: u32, height: u32) {
        self.width = width;
        self.height = height;
    }
}

Functions that start with a &self / &mut self parameter are bound to instances. Those without are bound to the type. So the new() function can be called as Shape::new().
Where &self is provided, the function is invoked on the instance. So for example:

let shape = Shape::new(100, 100);
let area = shape.area();

Where &mut self is provided it signifies that the function mutates the struct.
Unlike C++, all access to the struct has to be qualified. In C++ you don't have to say this->foo() to call foo() from another member of the class. Rust requires code to say unambiguously self.foo().

Static functions
Static functions are merely functions in the impl block that do not have &self or &mut self as their first parameter, e.g.

impl Circle {
fn pi() -> f64 { std::f64::consts::PI }
}
//...
let pi = Circle::pi();

In other words they're not bound to an instance of a type, but to the type itself. For example, Circle::pi().

Traits
C++ allows one class to inherit from another. Generally this is a useful feature although it
can get pretty complex if you implement multiple inheritance, particularly the dreaded
diamond pattern.
As we’ve found out, Rust doesn’t have classes at all – they’re structs with bound functions.
So how do you inherit code? The answer is you don’t.
Instead your struct may implement traits which are a bit like partial classes.
A trait is declared like so:
trait HasCircumference {
fn circumference(&self) -> f64;
}

Here the trait HasCircumference has a function called circumference() whose signature is defined but must be implemented.
A type can implement the trait by declaring an impl of it.

// Assumes a Circle struct with a radius: f64 field
impl HasCircumference for Circle {
    fn circumference(&self) -> f64 {
        2.0 * std::f64::consts::PI * self.radius
    }
}

A trait may supply default function implementations. For example, a HasDimensions trait might implement area() to spare the implementor the bother of doing it.

trait HasDimensions {
fn width(&self) -> u32;
fn height(&self) -> u32;
fn area(&self) -> u32 {
self.width() * self.height()
}
}
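To show the default implementation being picked up, the sketch below implements the trait for a Size struct without writing area(). The describe function is a made-up helper that works with anything implementing the trait.

```rust
trait HasDimensions {
    fn width(&self) -> u32;
    fn height(&self) -> u32;
    // Default implementation, available to every implementor.
    fn area(&self) -> u32 {
        self.width() * self.height()
    }
}

struct Size {
    width: u32,
    height: u32,
}

impl HasDimensions for Size {
    fn width(&self) -> u32 {
        self.width
    }
    fn height(&self) -> u32 {
        self.height
    }
    // area() is not written here, so the default from the trait is used.
}

/// Hypothetical helper that accepts any type implementing the trait.
fn describe<T: HasDimensions>(shape: &T) -> String {
    format!("{} x {} = {}", shape.width(), shape.height(), shape.area())
}

fn main() {
    let size = Size { width: 3, height: 4 };
    println!("{}", describe(&size));
}
```

An implementor remains free to supply its own area() if the default is not suitable, in which case its version replaces the default for that type.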

Lifetimes
In C++ an object lives from the moment it is constructed to the moment it is destructed.
That lifetime is implicit if you declare the object on the stack. The object will be created / destroyed as it goes in and out of scope. It is also implicit if your object is a member of another object - the lifetime is within the containing object, and follows the declaration order of the other members in the containing object.
However, if you allocate your object via new then it is up to you when to delete. If you delete too soon, or forget to delete, then you may destabilize your program. C++ encourages using smart pointers that manage the lifetime of your object, tying it to the implicit lifetime of the smart pointer itself - when the smart pointer is destroyed, it deletes the held pointer. A more sophisticated kind of smart pointer allows multiple instances of the same pointer to exist at once, and reference counting is used so that when the last smart pointer is destroyed, it destroys the pointer.
Even so, C++ itself will not care if you initialized a class with a reference or pointer to something that no longer lives. If you do this, the behaviour is undefined and your program may well crash.
Let's write an Incrementor class which increments an integer value and returns that value.

class Incrementor {
public:
Incrementor(int &value) : value_(value) {}
int increment() { return ++value_; }
private:
int &value_;
};


This seems fine, but what if we use it like this?
Incrementor makeIncrementor() {
// This is a bad idea
int value = 5;
return Incrementor(value);
}

This code passes a reference to an int into the class constructor and returns the Incrementor from the function itself. But when increment() is called the reference is dangling and anything can happen.

Rust lifetimes
Rust does care about the lifetime of objects and tracks them to ensure that you cannot
reference something that no longer exists. Most of the time this is automatic and self-evident
from the error message you get if you try something bad.
The compiler also implements a borrow checker which tracks references to objects to
ensure that:
1. References are held no longer than the lifetime of the object they refer to.
2. Only a single mutable reference is possible at a time and not concurrently with
immutable references. This is to prevent data races.
The compiler will generate compile errors if it finds code in violation of its rules.
So let's write the equivalent of Incrementor above but in Rust. The Rust code will hold a reference to an integer i32 and increment it from a bound function:

struct Incrementor {
value: &mut i32
}
impl Incrementor {
pub fn increment(&mut self) -> i32 {
*self.value += 1;
*self.value
}
}


Seems fine, but the first error we get is:
2 |     value: &mut i32
  |            ^ expected lifetime parameter

We tried to create a struct that manages a reference, but the compiler doesn't know anything
about this reference's lifetime and so it has generated a compile error.
To help the compiler overcome its problem, we will annotate our struct with a lifetime which we will call 'a. The label is anything you like but typically it'll be a letter.
This lifetime label is a hint on our struct that says the reference we use inside the struct must have a lifetime of at least as much as the struct itself - namely that Incrementor<'a> and value: &'a mut i32 share the same lifetime constraint and the compiler will enforce it.

struct Incrementor<'a> {
value: &'a mut i32
}
impl <'a> Incrementor<'a> {
pub fn increment(&mut self) -> i32 {
*self.value += 1;
*self.value
}
}

With the annotation in place, we can now use the code:
let mut value = 20;
let mut i = Incrementor { value: &mut value };
println!("value = {}", i.increment());

Note that the annotation 'a could be any label we like - 'increment would work if we wanted, but obviously it's more longwinded.
There is a special lifetime called 'static that refers to things like static strings and functions which have a lifetime as long as the runtime and may therefore be assumed to always exist.
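Putting the annotated struct to work, the sketch below wraps it in a helper called increment_n (a name invented here) and shows that the borrowed value becomes usable again once the Incrementor is gone. The commented-out function is the Rust counterpart of makeIncrementor() and is left as a comment because the compiler rejects it.

```rust
struct Incrementor<'a> {
    value: &'a mut i32,
}

impl<'a> Incrementor<'a> {
    pub fn increment(&mut self) -> i32 {
        *self.value += 1;
        *self.value
    }
}

/// Increments `value` `times` times through an Incrementor and returns the last result.
/// (Hypothetical helper for illustration.)
fn increment_n(value: &mut i32, times: u32) -> i32 {
    let mut i = Incrementor { value };
    let mut last = *i.value;
    for _ in 0..times {
        last = i.increment();
    }
    last
}

fn main() {
    let mut value = 20;
    println!("last = {}", increment_n(&mut value, 3));
    // The Incrementor no longer exists, so the borrow has ended
    // and `value` can be used directly again.
    println!("value = {}", value);

    // Rejected at compile time, because the reference would outlive `value`:
    // fn make_incrementor<'a>() -> Incrementor<'a> {
    //     let mut value = 5;
    //     Incrementor { value: &mut value }
    // }
}
```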

Lifetime elision

Rust allows reference lifetimes to be elided (a fancy word for omitted) in most function signatures.
Basically, it assumes that when passing a reference into a function, the lifetime of the reference is implicitly longer than the function itself so there is no need to annotate.
fn find_person(name: &str) -> Option<Person>
// instead of
fn find_person<'a>(name: &'a str) -> Option<Person>

The rules for elision are described in the further reference link.
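Elision matters most when a function returns a reference. In this sketch, first_word and first_word_explicit are invented names for two functions that mean exactly the same thing to the compiler; the first simply leaves the lifetime out.

```rust
/// Elided form: the compiler infers that the result borrows from `text`.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

/// Explicit form, identical in meaning to the elided one above.
fn first_word_explicit<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let sentence = String::from("hello world");
    println!("{}", first_word(&sentence));
    println!("{}", first_word_explicit(&sentence));
}
```

With a single reference parameter there is only one lifetime the returned reference could have, which is why the annotation can be left out.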

Further reference
Lifetimes are a large subject and the documentation is here.


Comments
Rust comments are similar to C++ except they may contain Unicode because .rs files are
UTF-8 encoded:
/*
This is a comment
*/
// This is a comment with Unicode, 你好

But in addition anything that uses the triple slash /// annotation can be parsed by a tool called rustdoc to produce documentation:
/// This is a comment that becomes documentation for do_thing below
pub fn do_thing() {}
/// Returned by server if the resource could not be found
pub const NOT_FOUND: u16 = 404;

Running cargo doc on a project will cause HTML documentation to be produced from annotated comments within the file.
Annotation is written in Markdown format. That means you have a human readable language for writing rich-text documentation and if it's not enough you can resort to HTML via tags.
See here for full documentation


Lifetimes, References and Borrowing
When you assign an object to a variable in Rust, you are said to be binding it, i.e. your variable "owns" the object for as long as it is in scope and when it goes out of scope the object is destroyed.
{
let v1 = vec![1, 2, 3, 4]; // Vec is created
...
} // v1 goes out of scope, Vec is dropped

So variables are scoped and the scope is the constraint that affects their lifetime. Outside of
the scope, the variable is invalid.
In this example, it is important to remember the Vec is on the stack but the memory it allocates to hold its elements is on the heap. The heap space will also be recovered when the Vec is dropped.

If we assign v1 to another variable, then all the object ownership is moved to that other
variable:
{
let v1 = vec![1, 2, 3, 4];
let v2 = v1;
...
println!("v1 = {:?}", v1); // Error!
}

This may seem weird but it's worth remembering a serious problem we saw in C++, that of copy constructor errors. It is too easy to duplicate a class and inadvertently share private data or state across multiple instances.
We don't want objects v1 and v2 to share internal state and in Rust they don't. Rust moves the data from v1 to v2 and marks v1 as invalid. If you attempt to reference v1 any more in your code, it will generate a compile error. This compile error will indicate that ownership was moved to v2.
Likewise, if we pass the value to a function then that also moves ownership:


{
    let v1 = vec![1, 2, 3, 4];
    we_own_it(v1);
    println!("v = {:?}", v1); // Error!
}
fn we_own_it(v: Vec<i32>) {
    // ...
}

When we call we_own_it() we moved ownership of the object to the function and it never came back. Therefore the following call using v1 is invalid. We could call a variation of the function called we_own_and_return_it() which does return ownership:
let v1 = we_own_and_return_it(v1);
...
fn we_own_and_return_it(v: Vec<i32>) -> Vec<i32> {
    // ...
    v
}

But that's pretty messy and there is a better way described in the next section called
borrowing.
These move assignments look weird but it is Rust protecting you from the kinds of copy
constructor error that is all too common in C++. If you assign a non-Copyable object from
one variable to another you move ownership and the old variable is invalid.
If you truly want to copy the object from one variable to another so that both hold
independent objects you must make your object implement the Copy trait. Normally it's
better to implement the Clone trait which works in a similar way but through an explicit
clone() operation.
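The difference between moving, returning and cloning can be seen side by side in one small program. This is a sketch: the function bodies (pushing a value, returning the length) are assumptions chosen so that there is something to observe.

```rust
/// Takes ownership of the vector, appends a value and hands ownership back.
fn we_own_and_return_it(mut v: Vec<i32>) -> Vec<i32> {
    v.push(5);
    v
}

/// Takes ownership and never gives it back; the Vec is dropped on return.
fn we_own_it(v: Vec<i32>) -> usize {
    v.len()
}

fn main() {
    let v1 = vec![1, 2, 3, 4];
    // An explicit clone gives v2 its own independent copy of the data.
    let v2 = v1.clone();
    // Ownership goes into the function and comes back out again.
    let v1 = we_own_and_return_it(v1);
    println!("v1 = {:?}, v2 = {:?}", v1, v2);

    // Ownership goes in and stays there.
    let len = we_own_it(v1);
    // println!("{:?}", v1); // would not compile: v1 has been moved
    println!("len = {}", len);

    // Simple types such as i32 implement Copy, so both variables stay valid.
    let a = 5;
    let b = a;
    println!("a = {}, b = {}", a, b);
}
```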

Variables must be bound to something
Another point. Variables must be bound to something. You cannot use a variable if it hasn't
been initialized with a value of some kind:


let x: i32;
println!("The value of x is {}", x);

It is quite valid in C++ to declare a variable and do nothing with it, or to assign it only on some code paths, in which case the compiler will at most generate a warning.
int result;
{
// The scope is to control the lifetime of a lock
lock_guard<mutex> guard(data_mutex);
result = do_something();
}
if (result == 0) {
debug("result succeeded");
}

The Rust compiler will throw an error, not a warning, if variables are uninitialised. It will also
warn you if you declare a variable and end up not using it.

References and Borrowing
We've seen that ownership of an object is tracked by the compiler. If you assign one variable
to another, ownership of the object is said to have moved to the assignee. The original
variable is invalid and the compiler will generate errors if it is used.
Unfortunately this extends to passing values into functions and this is a nuisance. But
variable bindings can be borrowed. If you wish to loan a variable to a function for its
duration, you can pass a reference to the object:


{
let mut v = Vec::new(); // empty vector
fill_vector(&mut v);
// ...
println!("Vector contains {:?}", v);
}
//...
fn fill_vector(v: &mut Vec<i32>) {
v.push(1);
v.push(2);
v.push(3);
}

Here we create an empty vector and pass a mutable reference to it to a function called fill_vector(). The compiler knows that the function is only borrowing v; when the function returns the borrow ends and v can be used again.
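A mutable borrow lets the callee change the value, while an immutable borrow only lets it look. The sketch below adds a sum function (an invented name) that borrows immutably, next to fill_vector which borrows mutably.

```rust
/// Borrows the vector mutably so that it can add elements.
fn fill_vector(v: &mut Vec<i32>) {
    v.push(1);
    v.push(2);
    v.push(3);
}

/// Borrows the elements immutably; it can read them but not change them.
/// (Hypothetical helper for illustration.)
fn sum(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn main() {
    let mut v = Vec::new(); // empty vector
    fill_vector(&mut v);    // mutable borrow lasts for the call only
    let total = sum(&v);    // immutable borrow lasts for the call only
    // v is still owned here and can be used freely.
    println!("Vector contains {:?}, total = {}", v, total);
}
```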


Expressions
An expression is something that evaluates to something. Just like C++ more or less...
let x = 5 + 5; // expression evaluates to 10

But blocks are expressions too
Where it gets more interesting is that a block of code, denoted by curly braces also
evaluates to an expression. This is legal code:
let x = {};
println!("x = {:?}", x);

What was assigned to x? In this case the block was empty so x was assigned the value (). The value () is the only value of a special unit type. It carries no information at all; it just means "value". It is the default type of any block or function that does not evaluate to something else. It works a little like void in C++, meaning the value is meaningless so don't even look at it.
x = ()

This block also returns a value of ().

let x = { println!("Hello"); };
println!("x = {:?}", x);

Again, that's because although the block does stuff (print Hello), it doesn't evaluate to anything so the compiler returns () for us.

So far so useless. But we can change what the block expression evaluates to:


let x = {
let pi = 3.14159265;
let r = 5.0;
2.0 * pi * r
};
println!("x = {}", x);

Now x is assigned the result of the last line, which is an expression. Note how the line is not terminated with a semicolon. That becomes the result of the block expression. If we’d put a semicolon on the end of that line as we did with the println!("Hello"), the expression would evaluate to ().

Use in functions
Trivial functions can just omit the return statement:
pub fn add_values(x: i32, y: i32) -> i32 {
x + y
}

You can use return in blocks too
Sometimes you might explicitly need to use the return statement. The block expression
evaluates at the end of the block so if you need to bail early you could just use return.
pub fn find(value: &str) -> i32 {
if value.len() == 0 {
return -1;
}
database.do_find(value)
}

Simplifying switch statements
In C or C++ you'll often see code like this:


std::string result;
switch (server_state) {
case WAITING:
result = "Waiting";
break;
case RUNNING:
result = "Running";
break;
case STOPPED:
result = "Stopped";
break;
}

The code wants to test a value in server_state and assign a string to result. Aside from
looking a bit clunky it introduces the possibility of error since we might forget to assign, or
add a break, or omit one of the values.
In Rust we can assign directly into result from a match because each match arm is an expression.
let result = match server_state {
ServerState::WAITING => "Waiting",
ServerState::RUNNING => "Running",
ServerState::STOPPED => "Stopped",
};

Not only is this half the length it reduces the scope for error. The compiler will assign the
block expression's value to the variable result. It will also check that each match block
returns the same kind of type (so you can't return a float from one match and strings from
others). It will also generate an error if the ServerState enum had other values that our
match didn't handle.

Ternary operator
The ternary operator in C/C++ is an abbreviated way to perform an if/else expression
condition, usually to assign the result to a variable.
bool x = (y / 2) == 4 ? true : false;


Rust has no such equivalent to a ternary operator but it can be accomplished using block
expressions.
let x = if y / 2 == 4 { true } else { false };

Unlike C/C++ you could add additional else ifs, matches or anything else to that, providing each branch returns the same type.


Conditions
Conditional code is similar between C++ and Rust. You test the boolean truth of an
expression and you can use boolean operators such as && and || to join expressions
together.
int x = 0;
while (x < 10) {
x++;
}
int y = 10;
bool doCompare = true;
if (doCompare && x == y) {
printf("They match!\n");
}

In Rust:
let mut x = 0;
while x < 10 {
x = x + 1;
}
let y = 10;
let do_compare = true;
if do_compare && x == y {
println!("They match!");
}

The most notable difference is that Rust omits the parentheses around the condition so the code is slightly cleaner. You don't have to omit the parentheses but the compiler will issue a warning if you leave them in.

Ternary operator
The ternary operator is that special ? : shorthand notation you can use in C++ for simple conditionals.


int x = (y > 200) ? 10 : 0;

Rust does not support this notation, however you may take advantage of how a block
evaluates as an expression to say this instead:
let x = if y > 200 { 10 } else { 0 };

So basically you can do one line conditional assignments using if and else. Also note that
you could even throw in an "else if" or two if that's what you wanted to do:
let c = get_temperature();
let water_is = if c >= 100 { "gas" } else if c < 0 { "solid" } else { "liquid" };

Conditional "if let"
One unusual feature is the "if let" pattern. This combines a test to see if something matches a pattern and, if it does, automatically binds the matched contents to a variable. It is most commonly seen in code that returns an enum such as a Result or Option.

For example:
fn search(name: &str) -> Option<String> { /* ... */ }
//...
if let Some(person) = search("fred") {
    println!("You found a person {}", person);
}
else {
    println!("Could not find person");
}


Switch / Match
C++
A switch statement in C or C++ allows a condition or a variable to be compared to a series of values and for code associated with those values to be executed as a result. There is also a default clause to match any value that is not caught explicitly.
int result = http_get();
switch (result) {
case 200:
success = true;
break;
case 404:
log_error(result);
// Drop through
default:
success = false;
break;
}

Switch statements can be a source of error because nothing at all happens when no case matches and a default clause is not supplied. It is also possible to inadvertently forget the break statement. In the above example, the code explicitly "drops" from the 404 handler into the default handler. This code would work fine providing someone didn't insert some extra clauses between 404 and default...
Additionally switch statements only work on integral values (including char, bool and enums).

Rust
Match is like a switch statement on steroids.
In C++ a switch is a straight comparison of an integer value of some kind (including chars and enums), against a list of values. If the comparison matches, the code next to it executes until the bottom of the switch statement or a break is reached.
TODO
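As a rough sketch of how the C++ switch above might be expressed in Rust, the following uses match on the status code. The names is_success and log_error are hypothetical stand-ins, not taken from the original text. Rust has no fall-through, so the 404 arm logs and then yields its own value.

```rust
// Hypothetical stand-in for the C++ log_error() call.
fn log_error(code: u16) {
    eprintln!("request failed with status {}", code);
}

// Returns true on success, mirroring the `success` flag in the C++ example.
fn is_success(result: u16) -> bool {
    match result {
        200 => true,
        404 => {
            // No fall-through: this arm does its logging and yields false itself.
            log_error(result);
            false
        }
        // `_` plays the role of `default`. It is required here because the
        // other arms do not cover every possible u16 value.
        _ => false,
    }
}

fn main() {
    println!("200 -> {}", is_success(200));
    println!("404 -> {}", is_success(404));
    println!("500 -> {}", is_success(500));
}
```

Because match is an expression, the function body needs no success variable and no break statements.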


Casting
Casting is the act of coercing one type to be another, or dynamically producing the
equivalent value in the other type.
C++ has a range of cast operators that turn a pointer or value of one kind into a pointer or
value of another kind.
const_cast<T>(value) - removes the const enforcement from a value so it may be modified.
static_cast<T>(value) - attempts to convert between types using implicit and user defined conversions.
reinterpret_cast<T>(value) - a compiler directive to just treat the input as some other kind. It does not involve any form of conversion.
dynamic_cast<T>(value) - attempts to convert a class pointer / reference to/from other classes in its inheritance hierarchy. Involves runtime checks.
Traditional C-style cast - a C++ compiler will attempt to interpret it as a const_cast, a static_cast and a reinterpret_cast in varying combinations.

That's a very brief summary of casting which probably raises more questions than it
answers. Casting in C++ is very complex and nuanced. Some casts merely instruct the
compiler to ignore const or treat one type as another. A static cast might involve code
generation to convert a type. A dynamic cast might add runtime checks and throw
exceptions.
Rust has nothing equivalent to this complexity. A numeric type may be converted to another numeric type using the as keyword.

let a = 123i32;
let b = a as usize;

Anything beyond this requires implementing the Into<> or From<> traits and making conversion an explicit action.
The compiler also does not allow code to cast away const-ness or treat one type as another except through unsafe code blocks.
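To make the difference between as and the conversion traits concrete, here is a minimal sketch. The Celsius and Fahrenheit types are hypothetical and exist only for this illustration; From, Into and TryFrom are the standard library traits.

```rust
use std::convert::TryFrom;

// Hypothetical types used only for this illustration.
struct Celsius(f64);
struct Fahrenheit(f64);

// An explicit, user-defined conversion between two unrelated types.
impl From<Celsius> for Fahrenheit {
    fn from(c: Celsius) -> Self {
        Fahrenheit(c.0 * 9.0 / 5.0 + 32.0)
    }
}

fn main() {
    // `as` between numeric types never fails, but it can silently truncate.
    let big = 300i32;
    let truncated = big as u8;
    println!("{} as u8 = {}", big, truncated);

    // Lossless widening is available through From.
    let wide = i64::from(big);
    println!("widened = {}", wide);

    // Narrowing that might fail can be checked with TryFrom.
    let checked = u8::try_from(big);
    println!("fits in u8? {}", checked.is_ok());

    // Implementing From also makes .into() available.
    let f: Fahrenheit = Celsius(100.0).into();
    println!("100C = {}F", f.0);
}
```

The as cast is convenient but silent about data loss, so TryFrom is often the safer choice when a value might not fit in the target type.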

Transmutation
Rust allows some types to be transmuted to others. Transmute is an unsafe action but it allows a memory location to be treated as another type, e.g. an array of bytes as an integer.
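As an illustration of the byte-array case mentioned above, this sketch reinterprets four bytes as a u32 with std::mem::transmute and compares the result with the safe u32::from_ne_bytes function. The byte values are arbitrary, and the printed number depends on the byte order of the machine.

```rust
use std::mem;

fn main() {
    let bytes: [u8; 4] = [0x78, 0x56, 0x34, 0x12];

    // Unsafe: treat the four bytes as a u32 in the machine's native byte order.
    // Source and destination types must have exactly the same size.
    let transmuted: u32 = unsafe { mem::transmute::<[u8; 4], u32>(bytes) };

    // Safe alternative that performs the same native-endian reinterpretation.
    let converted = u32::from_ne_bytes(bytes);

    assert_eq!(transmuted, converted);
    println!("value = {:#x}", transmuted);
}
```

Where a safe function such as from_ne_bytes exists it is usually preferable, since transmute bypasses the compiler's type checks.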


Enumerations
In C++ an enum is a bunch of labels assigned an int value, i.e. it is basically a bunch of constants with scalar values.
enum HttpResponse {
okay = 200,
not_found = 404,
internal_error = 500,
};

C++11 extends this concept a little, allowing you to declare an enum that uses another kind of integral type, e.g. a char to hold the values.

enum LibraryCode : char {
checked_in = 'I',
checked_out = 'O',
checked_out_late = 'L'
};

In Rust an enum can be a scalar value just like in C++.

enum HttpResponse {
    Ok = 200,
    NotFound = 404,
    InternalError = 500
}

But an enum can also hold actual data so you can convey far more information than a static
value could by itself.
enum HttpResponse {
    Ok,
    NotFound(String),
    InternalError(String, String, Vec<u8>)
}


You can also bind functions to the enum:
impl HttpResponse {
    pub fn code(&self) -> u16 {
        match *self {
            HttpResponse::Ok => 200,
            HttpResponse::NotFound(_) => 404,
            HttpResponse::InternalError(_, _, _) => 500,
        }
    }
}

So we might have a function that makes an http request and returns a response:
fn do_request(url: &str) -> HttpResponse {
if url == "/invalid" {
HttpResponse::NotFound(url.to_string())
}
else {
HttpResponse::Ok
}
}
//...
let result = do_request("/invalid");
if let HttpResponse::NotFound(url) = result {
println!("The url {} could not be found", url);
}

Now our code is able to return a more meaningful response in an enum and the code is able
to extract that response to print out useful information.


Loops
C++
For loops
A for loop in C/C++ consists of 3 expression sections housed in the for() section and a block of code to execute:
The three segments of a for statement allow:
Zero or more variables to be initialized (can be empty)
Zero or more conditions to be true for the loop to continue (can be empty)
Zero or more actions to perform on each iteration (can be empty).
So this is a valid for loop:
// Infinite
for (;;) {
//...
}

So is this:
for (int i = 10, j = 0; (j = i * i) <= 100; i--) {
//...
}

This is clearly a convoluted and somewhat confusing loop because it mixes assignment and conditional tests into the terminating test, but it is one which is entirely legal.

Iterating a range
A C++ loop consists of an initialising expression, a condition expression and a loop expression separated by semicolons. So a loop that iterates from 0 to 99 looks like this:


for (int i = 0; i < 100; i++ ) {
cout << "Number " << i << endl;
}

Iterating C++ collections
C++ introduces the concept of iterators to its collection classes. An iterator is something that can increment or decrement to traverse a collection.
So to iterate a collection from one end to the other, an iterator is assigned with the collection's begin() iterator and incremented until it matches the end() iterator.

for (std::vector<int>::const_iterator i = my_list.begin(); i != my_list.end(); ++i) {
    cout << "Value = " << *i << endl;
}

C++11 provides a new range-based for loop with simpler syntax when iterating over arrays and collections:
std::vector<int> values;
...
for (const auto & v: values) {
...
}
int x[5] = { 1, 2, 3, 4, 5 };
for (int y : x) {
...
}

Infinite Loop
An infinite loop is one that never ends. The typical way to do this in C++ is to test against an
expression that always evaluates to true or use an empty for loop:


while (true) {
poll();
do_work();
}
// Or with a for loop
for (;;) {
poll();
do_work();
}

While Loop
C++ has conditional while() {} and do { } while() forms. The former tests the expression before it even runs while the latter runs at least once before testing the expression.
while (!end) {
std::string next = getLine();
end = next == "END";
}

The do-while form in C++ will execute the loop body at least once because the condition is
only tested after each iteration instead of before.
int i = 0;
do {
i = rand();
} while (i < 20);

Break and Continue
If you need to exit a loop or start the next iteration early then you use the break and continue keywords. The break keyword terminates the loop, while the continue keyword causes the loop to proceed to the next iteration.


bool foundAdministrator = false;
for (int i = 0; i < loginCredentials; ++i) {
const LoginCredentials credentials = fetchLoginAt(i);
if (credentials.disabled) {
// This user login is disabled so skip it
continue;
}
if (credentials.isAdmin) {
// This user is an administrator so no need to search rest of list
foundAdministrator = true;
break;
}
// ...
}

Rust
For loop
Rust's for loop is actually sugar over the top of iterators. If a structured type implements the trait IntoIterator it can be looped over using a for loop.

Basically in pseudo code, the loop desugars to this:
If structure type can be turned `IntoIterator`
Loop
If let Some(item) = iterator.next() {
do_action_to_item(item)
Else
break;
End
Else
Compile Error
Done
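The pseudo code above can be sketched in real Rust roughly as follows. The two function names are hypothetical, and the hand-written version is only an approximation of what the compiler generates, but both functions visit the same elements in the same order.

```rust
// The ordinary for loop.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

// Approximately what the for loop expands to: obtain an iterator through
// IntoIterator, then call next() until it returns None.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iterator = IntoIterator::into_iter(values);
    loop {
        match iterator.next() {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}

fn main() {
    let values = [2, 4, 6];
    println!("for loop  = {}", sum_with_for(&values));
    println!("desugared = {}", sum_desugared(&values));
}
```

If the type passed to the loop does not implement IntoIterator, the code fails to compile, which corresponds to the "Compile Error" branch of the pseudo code.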

Iterating a range


A Range object in Rust is expressed as from..to where from and to are values or expressions that evaluate to values.
For example:
let range = 0..33;
// Variables
let min = 0;
let max = 100;
let range2 = min..max;

A range is inclusive / exclusive, i.e. the minimum value is included in the Range but the maximum value is excluded.
Here is a simple loop that counts from 0 to 9
for i in 0..10 {
println!("Number {}", i);
}

The value 0..10 is a Range that runs from 0 to exclusive of 10. A range implements the Iterator trait so the for loop advances one element at a time until it reaches the end.

Iterators have a lot of functions on them for doing fancy stuff, but one which is useful in loops is the enumerate() function. This transforms the iterator into returning a tuple containing the index and the value instead of just the value.
So for example:
for (i, x) in (30..50).enumerate() {
println!("Index {} is value {}", i, x);
}

For loop - Iterating arrays and collections
Here is a loop that iterates an array:


let values = [2, 4, 6, 7, 8, 11, 33, 111];
for v in &values {
println!("v = {}", v);
}

Note that iterating by reference leaves the array available for use after the loop, whereas iterating it by value would consume it.
We can directly use the iter() function that arrays and collections implement which works by reference:
let values = vec![2, 4, 6, 7, 8, 11, 33, 111];
for v in values.iter() {
println!("v = {}", v);
}

If the collection is a map, then iterators will return a key and value tuple:
use std::collections::HashMap;
let mut values = HashMap::new();
values.insert("hello", "world");
//...
for (k, v) in &values {
println!("key = {}, value = {}", k, v);
}

Another way to iterate is using the for_each() function on the iterator itself:

let values = [2, 4, 6, 7, 8, 11, 33, 111];
values.iter().for_each(|v| println!("v = {}", v));

Break and Continue
Rust also has break and continue keywords and they operate in a similar fashion - they operate on the innermost loop. A continue will start on the next iteration while a break will terminate the loop.


let values = vec![2, 4, 6, 7, 8, 11, 33, 111];
for v in &values {
if *v % 2 == 0 {
continue;
}
if *v > 20 {
break;
}
println!("v = {}", v);
}

Labels
The break and continue keywords work by default on the current loop. There will be occasions where you intend to break out of an enclosing loop instead. For those occasions you can label your loops and pass that label into the break or continue:

'x: for x in 0..10 {
'y: for y in 0..10 {
if x == 5 && y == 5 {
break 'x;
}
println!("x = {}, y = {}", x, y);
}
}

Infinite Loop
Rust has an explicit infinite loop that runs indefinitely:

loop {
poll();
do_work();
}

Rust recommends using this form when an infinite loop is required to assist with code generation. Note that an infinite loop can still be broken out of using a break statement.


While Loop
A while loop in Rust looks pretty similar to one written in C/C++. The main difference is that parentheses are not necessary around the conditional test.
while request_count < 1024 {
process_request();
request_count = request_count + 1;
}

Rust has no equivalent to the do-while loop form. It can be simulated but it looks a bit
inelegant:
let mut i = 0;
loop {
i = i + 1;
if i >= 20 { break; }
}

While let loop
Just as there is an if let which tests and assigns a value that matches a pattern, there is also a while let equivalent:

let mut iterator = vec.into_iter();
while let Some(value) = iterator.next() {
process(value);
}

This loop will break when the iterator returns None.


Functions
In C++ the standard form of a function is this:
// Declaration
int foo(bool parameter1, const std::string &parameter2);
// Implementation
int foo(bool parameter1, const std::string &parameter2) {
    return 1;
}

Usually you would declare the function, either as a forward reference in a source file, or in a
header. Then you would implement the function in a source file.
If a function does not return something, the return type is void. If the function does return something, then there should be return statements for each exiting branch within the function.
You can forego the function declaration in two situations:
1. If the function is inline, i.e. prefixed with the inline keyword. In which case the function in its entirety is declared and implemented in one place.
2. If the function is not inline but is declared before the code that calls it in the same source file. So if function foo above was only used by one source file, then just putting the implementation into the source would also act as the declaration.
In Rust the equivalent to foo above is this:

fn foo(parameter1: bool, parameter2: &str) -> i32 {
// implementation
1
}

The implementation is the declaration; there is no separation between the two. Functions that return nothing omit the -> return section. The function can also be declared before or after whatever calls it. By default the function is private to the module (and submodules) that implement it but making it pub fn exposes it to other modules.


Like C++, the function must evaluate to something for each exiting branch, but in Rust this is mandatory and enforced by the compiler.
Also note that the return keyword is usually unnecessary. Here is a function that adds two values together and returns the result with no return:
fn add(x: i32, y: i32) -> i32 {
x + y
}

Why is there no return? As we saw in the section on Expressions, a block evaluates to a value if we omit the semi-colon from the end so x + y is the result of evaluating the function block and becomes what we return.
There are occasions where you explicitly need the return keyword. Typically you do that if you want to exit the function before you get to the end of the function block:
fn process_data(number_of_times: u32) -> u32 {
    if number_of_times == 0 {
        return 0;
    }
    let mut result: u32 = 0;
    for i in 0..number_of_times {
        result += i;
    }
    result
}

Variable arguments
C++ functions can take a variable number of arguments with the ... ellipsis pattern. This is used in functions such as printf, scanf etc.
void printf_like(const char *pattern, ...);

Rust does not support variadic functions (the fancy name for this ability). However you could
pass additional arguments in an array slice if the values are the same, or as a dictionary or a
number of other ways.
TODO Rust example of array slice
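As one possible illustration of the array slice approach, the sketch below passes any number of same-typed values through a slice. The functions sum_all and join_all are hypothetical helpers invented for this example.

```rust
// Accepts any number of i32 values, including none at all.
fn sum_all(values: &[i32]) -> i32 {
    values.iter().sum()
}

// The same idea with string slices.
fn join_all(parts: &[&str]) -> String {
    parts.join(", ")
}

fn main() {
    println!("{}", sum_all(&[]));
    println!("{}", sum_all(&[1, 2, 3]));
    println!("{}", join_all(&["alpha", "beta", "gamma"]));
}
```

The caller wraps the arguments in &[...], which is slightly noisier than a C-style variadic call but keeps every argument type checked.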

Another option is to write your code as a macro. Macros can take any number of expressions so you are able to write code that takes variable arguments. This is how macros such as println!, format! and vec! work.

Default arguments
C++ arguments can have default values.
std::vector<Record> fetch_database_records(int number_to_fetch = 100);

A function defines what its name is, what types it takes and what value (if any) it returns.
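Rust functions cannot declare default values for their arguments. One common workaround, sketched here under the assumption of a hypothetical Record type, is to take an Option and substitute the default inside the function.

```rust
// Hypothetical record type, used only for this illustration.
#[derive(Debug)]
struct Record {
    id: usize,
}

// `None` means "use the default of 100", mirroring the C++ default argument.
fn fetch_database_records(number_to_fetch: Option<usize>) -> Vec<Record> {
    let count = number_to_fetch.unwrap_or(100);
    // Stand-in for a real query: fabricate `count` placeholder records.
    (0..count).map(|id| Record { id }).collect()
}

fn main() {
    let defaulted = fetch_database_records(None);
    let explicit = fetch_database_records(Some(5));
    println!("defaulted = {}, explicit = {}", defaulted.len(), explicit.len());
    println!("first explicit record = {:?}", explicit[0]);
    println!("its id = {}", explicit[0].id);
}
```

Other approaches include separate functions with distinct names, or a builder struct when there are many optional settings.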

Function overloading
C++ functions can be overloaded, e.g.
std::string to_string(int x);
std::string to_string(float x);
std::string to_string(bool x);

Rust does not support overloading. As an alternative, each variation of the function would
have to be named uniquely.

C++11 alternative syntax
C++11 introduces a new syntax which is slightly closer to Rust's in style.
auto function_name(type parameter1, type parameter2, ...) -> return-type;

This form was created to allow C++ function declarations to more closely resemble lambda functions in some scenarios and to help with decltype return values.


Polymorphism
C++
C++ has 4 types of polymorphism:
1. Function name overloading - multiple definitions of the same function taking different arguments.
2. Coercion - implicit type conversion, e.g. assigning a double to an int or a bool.
3. Parametric - compile time substitution of parameters in templates.
4. Inclusion - subtyping a class with virtual methods overrides their functionality. Your code can use the pointer to a base class, yet when you call the method you are calling the function implemented by the subtype.
That is to say, the same named function can be overloaded with different parameters.

Function name overloading
class Variant {
public:
void set(); // Null variant
void set(bool value);
void set(int value);
void set(float value);
void set(Array *value);
};

One of the biggest issues that you might begin to see from the above example is that it is too easy to inadvertently call the wrong function because C++ will also implicitly convert types. On top of that C++ also has default parameter values and default constructors. So you might call a function using one signature and be calling something entirely different after the compiler resolves it.


// Sample code
Variant v;
//...
v.set(NULL);

This example will call the integer overload because NULL evaluates to 0. One of the changes in C++11 was to introduce an explicit nullptr value and type to avoid this issue.

Rust
Rust has limited support for polymorphism.
1. Function name overloading - there is none. See section below for alternatives.
2. Coercion - Rust allows limited, explicit coercion between numeric types using the as keyword. Otherwise see below for use of the Into and From traits.
3. Parametric - similar to C++ via generics.
4. Inclusion - there is no inheritance in Rust. The nearest thing to a virtual method in Rust is a trait with an implemented function that an implementation overrides with its own. However this override is at compile time.
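To illustrate the last point, here is a minimal sketch of a trait whose default function is overridden by one implementation and inherited by another. The Greeter trait and both structs are hypothetical names chosen for this example.

```rust
trait Greeter {
    fn name(&self) -> String;

    // Default implementation that any implementor may override.
    fn greet(&self) -> String {
        format!("Hello, {}", self.name())
    }
}

struct English;
struct Pirate;

// Uses the default greet().
impl Greeter for English {
    fn name(&self) -> String {
        "Alice".to_string()
    }
}

// Overrides greet() with its own version.
impl Greeter for Pirate {
    fn name(&self) -> String {
        "Blackbeard".to_string()
    }
    fn greet(&self) -> String {
        format!("Ahoy, {}", self.name())
    }
}

// Generic over the trait: the concrete function is chosen at compile time.
fn announce<G: Greeter>(greeter: &G) -> String {
    greeter.greet()
}

fn main() {
    println!("{}", announce(&English));
    println!("{}", announce(&Pirate));
}
```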

Alternatives to function name overloading
If you have a few functions you can just disambiguate them, e.g.
fn new(name: &str) -> Foo { /* ... */ }
fn new_age(name: &str, age: u16) -> Foo { /* ... */ }

Use traits
A common way to do polymorphism is with traits.
There are two standard traits for this purpose:
The From<> trait converts from some type into our type.
The Into<> trait converts some type (consuming it in the process) into our type.
You only need to implement one of From or Into; implementing From automatically provides the matching Into.
The From trait is easier to implement:


use std::convert::From;
impl From<&'static str> for Foo {
fn from(v: &'static str) -> Self {
Foo { /* ... */ }
}
}
impl From<(&'static str, u16)> for Foo {
fn from(v: (&'static str, u16)) -> Self {
Foo { /* ... */ }
}
}
//...
let f = Foo::from("Bob");
let f = Foo::from(("Mary", 16));

But let's say we want an explicit new constructor function on type Foo. In that case, we could write it using the Into trait:

impl Foo {
    pub fn new<T>(v: T) -> Foo where T: Into<Foo> {
        let result: Foo = v.into();
        // we could add code here that runs after making Foo by whatever means
        result
    }
}

Since From implies Into we can just call the constructor like so:

let f = Foo::new("Bob");
let f = Foo::new(("Mary", 16));

If you prefer you could implement Into but it's more tricky since it consumes the input, which might not be what you want.


// This Into works on a string slice
impl Into<Foo> for &'static str {
    fn into(self) -> Foo {
        //... constructor
    }
}
// This Into works on a tuple consisting of a string slice and a u16
impl Into<Foo> for (&'static str, u16) {
    fn into(self) -> Foo {
        //... constructor
    }
}
//...
let f: Foo = "Bob".into();
let f: Foo = ("Mary", 16).into();
// OR
let f = Foo::new("Bob");
let f = Foo::new(("Mary", 16));

Use enums
Remember that an enumeration in Rust can contain actual data, so we could also implement
a function that takes an enumeration as an argument that has values for each kind of value it
accepts:


pub enum FooCtorArgs {
    String(String),
    StringU16(String, u16)
}
impl Foo {
    pub fn new(v: FooCtorArgs) -> Foo {
        match v {
            FooCtorArgs::String(s) => Foo { /* ... */ },
            FooCtorArgs::StringU16(s, i) => Foo { /* ... */ },
        }
    }
}
//...
let f = Foo::new(FooCtorArgs::String("Bob".to_string()));
let f = Foo::new(FooCtorArgs::StringU16("Mary".to_string(), 16));


Error Handling
C++ allows code to throw and catch exceptions. As the name suggests, exceptions indicate an exceptional error. An exception is thrown to interrupt the current flow of logic and allows something further up the stack to catch the exception and recover the situation. If nothing catches the exception then the program will terminate.
void do_something() {
if (!read_file()) {
throw std::runtime_error("read_file didn't work!");
}
}
...
try {
do_something();
}
catch (const std::exception &e) {
std::cout << "Caught exception -- " << e.what() << std::endl;
}

Most coding guidelines would say to use exceptions sparingly for truly exceptional situations,
and use return codes and other forms of error propagation for ordinary failures. However
C++ has no simple way to confer error information for ordinary failures and exceptions can
be complicated to follow and can cause their own issues.
Rust does not support exceptions. Rust programs are expected to use a type such as Option or Result to propagate errors to their caller. In other words, the code is expected to anticipate errors and have code to deal with them.
The Option enum is either None or Some, where the Some carries a payload of data. It's a generic enum that specifies the type of what it may contain:
enum Option<T> {
    None,
    Some(T)
}


For example, we might have a function that searches a database for a person's details, and
it either finds them or it doesn't.
struct Person { /* ... */ }
fn find_person(name: &str) -> Option<Person> {
    let records = run_query(format!("select * from persons where name = {}", sanitize_name(name)));
    if records.is_empty() {
        None
    }
    else {
        let person = Person::new(records[0]);
        Some(person)
    }
}

The Result enum either returns a value of some type or an error of some type.

enum Result<T, E> {
    Ok(T),
    Err(E)
}

So we might have a function set_thermostat for setting the room temperature.

fn set_thermostat(temperature: u16) -> Result<(), String> {
    if temperature < 10 {
        Err(format!("Temperature {} is too low", temperature))
    }
    else if temperature > 30 {
        Err(format!("Temperature {} is too high", temperature))
    }
    else {
        Ok(())
    }
}
// ...
let result = set_thermostat(20);
if result.is_ok() {
    // ...
}

This function will return a unit () value for success, or a String for failure.
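A caller will often want the error message as well as the success flag. The sketch below repeats set_thermostat so that it is self-contained and adds a hypothetical describe function that handles both outcomes with match.

```rust
fn set_thermostat(temperature: u16) -> Result<(), String> {
    if temperature < 10 {
        Err(format!("Temperature {} is too low", temperature))
    } else if temperature > 30 {
        Err(format!("Temperature {} is too high", temperature))
    } else {
        Ok(())
    }
}

// Hypothetical caller that turns either outcome into a message.
fn describe(temperature: u16) -> String {
    match set_thermostat(temperature) {
        Ok(()) => format!("Thermostat set to {}", temperature),
        Err(message) => format!("Failed: {}", message),
    }
}

fn main() {
    println!("{}", describe(20));
    println!("{}", describe(5));
    println!("{}", describe(35));
}
```

Because match must cover both Ok and Err, the compiler will not let the caller forget the failure case.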

The ? directive
Let's say you have 2 functions delete_user and find_user. The function delete_user first calls find_user to see if the user even exists and then proceeds to delete the user or return the error code that it got from find_user.

fn delete_user(name: &str) -> Result<(), ErrorCode> {
let result = find_user(name);
if let Ok(user) = result {
// ... delete the user
Ok(())
}
else {
Err(result.unwrap_err())
}
}
fn find_user(name: &str) -> Result<User, ErrorCode> {
//... find the user OR
Err(ErrorCode::UserDoesNotExist)
}

We have a lot of code in delete_user to handle success or failure in find_user and throw its failure code upwards. So Rust provides a convenience ? mark on the end of the call to a function that instructs the compiler to generate the if/else branch we hand wrote above, reducing the function to this:
fn delete_user(name: &str) -> Result<(), ErrorCode> {
let user = find_user(name)?;
// ... delete the user
Ok(())
}

Providing you want to propagate errors up the call stack, this can eliminate a lot of messy conditional testing in the code and make it more robust.
Older versions of Rust used a special try!() macro for this same purpose (not to be confused with try-catch in C++) which does the same thing. So if you see code like this, it would be the same as above.
fn delete_user(name: &str) -> Result<(), ErrorCode> {
let user = try!(find_user(name));
// ... delete the user
Ok(())
}


Nuclear option - panic!()
If code really wants to do something equivalent to a throw / catch in C++ it may call panic!().
This is NOT recommended for dealing with regular errors, only irregular ones that the code
has no way of dealing with.
This macro will cause the thread to abort and if the thread is the main programme thread,
the entire process will exit.
A panic!() can be caught and should be if Rust is being invoked from another language. The
way to catch an unwinding panic is a closure at the topmost point in the code where it can
be handled.
use std::panic;
let result = panic::catch_unwind(|| {
panic!("Bad things");
});
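The value returned by catch_unwind is a Result, so the caller can inspect what happened. The following sketch assumes the default unwinding panic strategy (it cannot catch anything if the program is built with panic=abort). The names risky and run_guarded are hypothetical.

```rust
use std::panic;

// Hypothetical function that may panic.
fn risky(fail: bool) -> u32 {
    if fail {
        panic!("Bad things");
    }
    42
}

// Runs risky() and converts a caught panic into an ordinary error value.
fn run_guarded(fail: bool) -> Result<u32, String> {
    panic::catch_unwind(|| risky(fail)).map_err(|payload| {
        // The payload is a Box<dyn Any + Send>. A panic raised with a plain
        // string literal carries a &'static str.
        match payload.downcast_ref::<&str>() {
            Some(message) => message.to_string(),
            None => "unknown panic".to_string(),
        }
    })
}

fn main() {
    println!("{:?}", run_guarded(false));
    println!("{:?}", run_guarded(true));
}
```

The panic message is still written to standard error by the default panic hook before the error value is returned.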


Lambda Expressions / Closures
Lambdas in C++11
A lambda expression, or lambda is an anonymous function that can be declared and passed
around from within the scope of the call itself.
This can be particularly useful when you want to sort, filter, search or otherwise do some
trivial small action without the bother of declaring and maintaining a separate function.
In C++ a lambda looks like this:
float values[10] = { 9, 3, 2.1, 3, 4, -10, 2, 4, 6, 7 };
std::sort(values, values + 10, [](float a, float b) {
return a < b;
});

This lambda is passed to a std::sort function to sort an array of values by some criteria.
A C++ lambda can (but doesn't have to) capture variables from the enclosing scope if it wishes and it can specify capture clauses in the [] section that define how capture is made. Captures can be by value or reference, and can explicitly list the variables to capture, or specify to capture everything by reference or by value. A lambda that captures variables effectively becomes a closure.
auto v1 = 10.;
auto v2 = 2.;
// Capture by value
auto multiply = [v1, v2]() { return v1 * v2; };
// Capture by reference
auto sum = [&v1, &v2]() { return v1 + v2; };
cout << multiply() << endl;
cout << sum() << endl;
v1 = 99; // Now v1 in sum() references 99
cout << multiply() << endl;
cout << sum() << endl;


We can see from the output that multiply() has captured copies of the values in v1 and v2, whereas sum() captures by reference and so it is sensitive to changes in the variables:
20
12
20
101

A capture can also specify a default capture mode by specifying = in the capture clause or by reference & and then specify capture behaviour for specific variables.

So our captures above could be simplified to:
// Capture by value
auto multiply = [=]() { return v1 * v2; };
// Capture by reference
auto sum = [&]() { return v1 + v2; };

Note that C++ lambdas can exhibit dangerous behaviour - if a lambda captures references
to variables that go out of scope, the lambda's behaviour is undefined. In practice that could
mean the application crashes.

Closures in Rust
Rust implements closures. A closure is like a lambda except it automatically captures
anything it references from the enclosing environment. i.e. by default it can access any
variable that is in the enclosing scope.
Here is the same sort snippet we saw in C++ expressed as Rust. This closure doesn't
borrow anything from its enclosing scope but it does take a pair of arguments to compare
two values for sorting. The

sort_by()

function repeatedly invokes the closure to sort the

array.
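let mut values = [ 9.0, 3.0, 2.1, 3.0, 4.0, -10.0, 2.0, 4.0, 6.0, 7.0 ];
// sort_by expects an Ordering. f64 has no total order (because of NaN),
// so compare with partial_cmp and unwrap the result.
values.sort_by(|a, b| a.partial_cmp(b).unwrap());
println!("values = {:?}", values);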


A closure that uses a variable from the enclosing scope borrows it by default. That means
the borrowed variable can't change while the closure is in scope. To change the value we
must ensure the closure goes out of scope to free the borrow, e.g. with a block:
let mut x = 100;
{
let square = || x * x;
println!("square = {}", square());
}
x = 200;

Alternatively you can move variables used by the closure so it owns them and they become inaccessible from the outer scope. Since our closure was accessing an integer, the move becomes an implicit copy. So our square closure has its own x assigned the value 100. Even if we change x in the outer scope to 200, the closure has its own independent copy.

let mut x = 100;
let square = move || x * x;
println!("square = {}", square()); // 10000
x = 200;
println!("square = {}", square()); // 10000

This is the equivalent to the C++ code above that used lambda expressions to bind to copies
and references:
let mut v1 = 10.0;
let v2 = 2.0;
let multiply = move || v1 * v2;
let sum = |x: &f64, y: &f64| x + y;
println!("multiply {}", multiply());
println!("sum {}", sum(&v1, &v2));
v1 = 99.0;
println!("multiply {}", multiply());
println!("sum {}", sum(&v1, &v2));

This will yield the same results as the C++ code. The main difference here is that rather than
binding our closure to a reference, we passed the reference values in as parameters to the
closure.


Templates / Generics
C++ offers templates as a way to write generic code using an abstract type and then
specialize it by substituting one or more types into a concrete class.
template <typename T>
inline void debug(const T &v) {
cout << "The value of object is " << v << endl;
}
//...
debug(10);

This template uses the type of the parameter (in this case 10) to create an inline function that prints out the value of that type:
The value of object is 10

Classes can also be made from templates:
template <typename T>
class Stack {
private:
    vector<T> elements;
public:
    void push(const T &v) {
        // ...
    }
    T pop() {
        // ...
    }
};
//...
Stack<double> doubleStack;

This class implements a simple stack using a template to indicate the type of object it
contains.


This is a very powerful mechanism and the C++ library makes extensive use of it.
Where templates can become a bit of a mess is that the templates are inline and the
compiler will expand out anything you call before attempting to compile it.
An innocuous error such as using a type that has no default copy constructor in a collection
can cause the compiler to go nuts and output a wall of indecipherable errors.

Generic Functions
Rust's equivalent to a template is called a generic. A generic generalizes a function or a trait
so it works with different types that match the criteria.
So the Rust equivalent of the debug() function in C++ would be this:

use std::fmt;
fn debug<T>(data: T) where T: fmt::Display {
    println!("The value of object is {}", data);
}
//...
debug(10);

Here we describe a function that takes a generic type T where the constraint is that T
must implement the trait std::fmt::Display. Any struct that implements this trait can be
passed into the call. Since integer types implement the trait, we can just call it directly as
debug(10) and the compiler is happy.
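
To illustrate the point about structs, here is a minimal sketch of a user-defined type that satisfies the same bound. The Point struct and the describe() function are hypothetical names invented for this example; describe() mirrors debug() but returns the text instead of printing it so the result is easy to check.

```rust
use std::fmt;

// Hypothetical type, used only for illustration.
struct Point {
    x: i32,
    y: i32,
}

// Implementing Display is what satisfies the `T: fmt::Display` bound.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

// Same shape as debug(), but returns the message rather than printing it.
fn describe<T>(data: T) -> String
where
    T: fmt::Display,
{
    format!("The value of object is {}", data)
}
```

Calling describe(Point { x: 1, y: 2 }) and describe(10) both compile because each argument type implements Display. A type without that implementation would be rejected at the call site.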

Generic structs
Similarly we can use generics on a struct. So the equivalent in Rust of the C++ template
class Stack is this:


struct Stack<T> {
    elements: Vec<T>
}
impl<T> Stack<T> {
    fn new() -> Stack<T> { Stack { elements: Vec::new() } }
    fn push(&mut self, v: T) {
        //...
    }
    fn pop(&mut self) -> Option<T> {
        //...
        None
    }
}
//...
let double_stack: Stack<f64> = Stack::new();
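
The push and pop bodies are elided above. One way they might be filled in, assuming the inner Vec does all of the work, is sketched here. This is an illustration rather than the only reasonable implementation.

```rust
// A generic stack backed by a Vec<T>.
struct Stack<T> {
    elements: Vec<T>,
}

impl<T> Stack<T> {
    fn new() -> Stack<T> {
        Stack { elements: Vec::new() }
    }

    // Takes ownership of the value and places it on top of the stack.
    fn push(&mut self, v: T) {
        self.elements.push(v);
    }

    // Returns the most recently pushed value, or None if the stack is empty.
    fn pop(&mut self) -> Option<T> {
        self.elements.pop()
    }
}
```

Returning Option<T> from pop() means an empty stack is an ordinary value the caller has to handle, where the C++ version would have to throw or return something undefined.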

Where clause
The where clause can be added to impose constraints on what a generic type must do to be
allowed to be supplied to the generic function or struct.
For example we might have a function that takes a closure as an argument. A closure is a
function and so we want to define the shape that the closure will take.
So:
fn compare<T, F>(a: T, b: T, f: F) -> bool
    where F: FnOnce(T, T) -> bool
{
    f(a, b)
}
let comparer = |a, b| a < b;
let result = compare(10, 20, comparer);

Here we have defined a compare() function that takes a couple of values of the same type.
The where clause states that the function must take two values of the same type and return
a boolean. The compiler will ensure any closure we pass in matches that criteria, as indeed
our comparer closure does.
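
A where clause can also list several constraints at once. The sketch below is a hypothetical variation on compare(): pick_and_label() is an invented name, and it requires T to be both printable and copyable in addition to constraining the closure.

```rust
use std::fmt::Display;

// Hypothetical helper: chooses one of two values using a caller-supplied
// rule and returns a label describing the choice.
fn pick_and_label<T, F>(a: T, b: T, prefer_first: F) -> String
where
    T: Display + Copy,   // T must be printable and cheap to copy
    F: Fn(T, T) -> bool, // the closure takes two T values and returns a bool
{
    let chosen = if prefer_first(a, b) { a } else { b };
    format!("chosen = {}", chosen)
}
```

Calling pick_and_label(10, 20, |a, b| a < b) selects the smaller value. Passing a type that is not Display, or a closure with the wrong shape, is reported as an error at the call.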


Attributes
C++ has various ways to give compiler directives during compilation:
Compile flags that control numerous behaviours
#pragma statements - once, optimize, comment, pack etc. Some pragmas such as comment
have been wildly abused in some compilers to insert "comments" into object files that
control the import / export of symbols, static linking and other functionality.
#define with ubiquitous #ifdef / #else / #endif blocks
Keywords inline, const, volatile etc. These hint the code and allow the compiler to make
decisions that might change its output or optimization. Compilers often have their own
proprietary extensions.
Rust uses a notation called attributes that serves a similar role to all of these things but in a
more consistent form.
An attribute #[foo] applies to the next item it is declared before. A common attribute is
#[test], used to denote a unit test case:

#[test]
fn this_is_a_test() {
//...
}

Attributes can also be expressed as #![foo] which affects the thing they're contained by
rather than the thing that follows them.
fn unused_is_fine() {
    // An inner attribute applies to the enclosing function
    #![allow(unused_variables)]
    let x = 1;
}

Attributes are enclosed in a #[ ] block and provide compiler directives that allow:

Functions to be marked as unit or benchmark tests
Functions to be marked for conditional compilation for a target OS. A function can be
defined that only compiles for one target. e.g. perhaps the code that communicates with
another process on Windows and Linux is encapsulated in the same function but
implemented differently.
Enable / disable lint rules
Enable / disable compiler features. Certain features of Rust may be experimental or
deprecated and may have to be enabled to be accessed.
Change the entry point function from main to something else
Conditional compilation according to target architecture, OS, family, endianness, pointer
width
Inline hinting
Deriving certain traits
Enabling compiler features such as plugins that implement procedural macros.
Importing macros from other crates
Used by certain crates like serde and rocket to instrument code - NB Rocket uses
unstable compiler hooks for this and in so doing limits itself to working in nightly builds
only.

Conditional compilation
Conditional compilation allows you to test the target configurations and optionally compile
functions or modules in or not.
The main configurations you will test include:
Target architecture - "x86", "x86_64", "mips", "arm" etc.
Target OS - "windows", "macos", "ios", "linux", "android", "freebsd" etc.
Target family - "unix" or "windows"
Target environment - "gnu", "msvc" etc
Target endianness
Target pointer width
So if you have a function which is implemented one way for Windows and another for Linux
you might code it like so:
#[cfg(windows)]
fn get_app_data_dir() -> String { /* ... */ }
#[cfg(not(windows))]
fn get_app_data_dir() -> String { /* ... */ }

Many more possibilities are listed in the documentation.
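
As a small sketch of the ideas in this section, the example below pairs two #[cfg] variants of one function with the cfg! macro, which evaluates the same kind of test as a compile-time boolean. The function names path_separator() and os_family() are hypothetical and chosen only for illustration.

```rust
// Exactly one of these two definitions is compiled in, depending on the target.
#[cfg(windows)]
fn path_separator() -> char {
    '\\'
}

#[cfg(not(windows))]
fn path_separator() -> char {
    '/'
}

// cfg! yields true or false at compile time. Unlike #[cfg], every branch
// must still be valid code for every target.
fn os_family() -> &'static str {
    if cfg!(target_family = "windows") {
        "windows"
    } else if cfg!(target_family = "unix") {
        "unix"
    } else {
        "other"
    }
}
```

Use #[cfg] when the alternative code would not even compile on the other platform, and cfg! when both branches compile everywhere and only the choice differs.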

Linking to native libraries

In C/C++ code is first compiled and then it is linked, either by additional arguments to the
compiler, or by invoking a linker.
In Rust most of your linking is taken care of for you provided you use cargo.

1. All your sources are compiled and linked together.
2. External crates are automatically built as static libs and linked in.
3. But if you have to link against something external through FFI you have to write a
#[link] attribute in your lib.rs or main.rs. This is somewhat analogous to
#pragma comment(lib, "somelib") in C++.

C++: #pragma comment(lib, "somelib")
Rust: #[link(name = "somelib")] or #[link(name = "somelib", kind = "static")]

The default kind for #[link] is a dynamic library but static can be explicitly specified.

Inlining code
Inlining happens where your function logic is inserted in-place to the code that invokes it. It
tends to happen when the function does something trivial such as return a value or execute
a simple conditional. The overhead of duplicating the code is outweighed by the
performance benefit.
Inlining is achieved in C++ by declaring and implementing a function, class method or
template method in a header or marking it with the inline keyword.
In Rust, inlining is only a hint. Rust recommends not forcing inlining, rather leaving it as a
hint for the LLVM compiler to do with as it sees fit.

C++: Explicitly with inline or implicitly through methods implemented in class / struct
Rust: #[inline], #[inline(always)], #[inline(never)]

Another alternative to explicitly inlining code is to use the link-time optimisation in LLVM.
rustc -C lto
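
A short sketch of how the inline attributes might be applied. The functions square(), describe_failure() and sum_of_squares() are hypothetical examples, and whether the compiler actually inlines any of them remains its own decision.

```rust
// A trivial function: a reasonable candidate for an inlining hint.
#[inline]
fn square(x: i32) -> i32 {
    x * x
}

// A rarely executed error path: ask the compiler to keep it out of line
// so that it does not bloat the callers.
#[inline(never)]
fn describe_failure(code: i32) -> String {
    format!("failed with code {}", code)
}

// The call to square() inside the closure is where inlining would pay off.
fn sum_of_squares(values: &[i32]) -> i32 {
    values.iter().map(|&v| square(v)).sum()
}
```

The attributes change nothing about what the functions compute, only how the compiler is asked to lay out the generated code.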


Multithreading
Multithreading allows you to run parts of your program concurrently, performing tasks in
parallel. Every program has a main thread - i.e. the one your main() started from, in
addition to which are any that you create.
Examples of reasons to use threads:
Long running operations, e.g. zipping up a large file.
Activity that is blocking in nature, e.g. listening for connections on a socket
Processing data in parallel, e.g. physics, collision detection etc.
Asynchronous activities, e.g. timers, polling operations.
In addition, if you use a graphical toolkit, or 3rd party libraries they may spawn their own
threads that you do not know about.

Thread safety
One word you will hear a lot in multithreading is thread safety.
By that we mean:
Threads should not be able to modify the data at the same time. When this happens it is
called a data race and can corrupt the data, causing a crash. e.g. two threads trying to
append to a string at the same time.
Threads must not lock resources in a way that could cause deadlock i.e. thread 1
obtains a lock on resource B and blocks on resource A, while thread 2 obtains a lock on
resource A and blocks on resource B. Both threads are locked forever waiting for a
resource to release that never will be.
Race conditions are bad, i.e. the order of thread execution produces unpredictable
results on the output from the same input.
APIs that can be called by multiple threads must either protect their data structures or
make it an explicit problem of the client to sort out.
Open files and other resources that are accessed by multiple threads must be managed
safely.

Protecting shared data
Data should never be read at the same time it is written to in another thread. Nor should
data be written to at the same time by two threads.


The common way to prevent this is either:
Use a mutex to guard access to the data. A mutex is a special class that only one
thread can lock at a time. Other threads that try to lock the mutex will wait until the lock
held by another thread is relinquished
Use a read-write lock. Similar to a mutex, it allows one thread to lock the data for
writing, however it permits multiple threads to have read access, providing nothing
is already writing to it. For data that is read more frequently than it is modified, this is a
lot more efficient than just a mutex.

Avoiding deadlock
The best way to avoid deadlock is to only ever hold one lock at a time and release it as
soon as you are done. But if you have to lock more than one thing, ensure the locking order
is consistent between all your threads. So if thread 1 locks A and B, then ensure that thread
2 also locks A and B in that order and not B then A. The latter can deadlock as soon as each
thread holds one lock and waits for the other.
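
The consistent ordering rule can be sketched in Rust, the language this guide is porting to. The function transfer_concurrently() and the two "account" balances are hypothetical. Two threads move amounts between the balances, and both always lock a before b.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Two threads repeatedly move money between two balances. Each thread takes
// the locks in the same order (a, then b), so neither can end up holding one
// lock while waiting forever for the other.
fn transfer_concurrently(rounds: u32) -> (i64, i64) {
    let a = Arc::new(Mutex::new(1000_i64));
    let b = Arc::new(Mutex::new(1000_i64));

    let handles: Vec<thread::JoinHandle<()>> = vec![5_i64, -3_i64]
        .into_iter()
        .map(|amount| {
            let (a, b) = (Arc::clone(&a), Arc::clone(&b));
            thread::spawn(move || {
                for _ in 0..rounds {
                    // Always lock `a` first, then `b`, in every thread.
                    let mut from = a.lock().unwrap();
                    let mut to = b.lock().unwrap();
                    *from -= amount;
                    *to += amount;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let totals = (*a.lock().unwrap(), *b.lock().unwrap());
    totals
}
```

If one thread locked b before a instead, the program would still compile. Rust prevents data races, but lock ordering remains the programmer's responsibility.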

C / C++
C and C++ predate threading to some extent so until C++11 the languages have had little
built-in support for multi-threading and what there was tended to be compiler specific
extensions.
A consequence of this is that C and C++ have ZERO ENFORCEMENT of thread safety. If
you data race - too bad. If you forget to write a lock in one function even if you remembered
all the others - too bad. You have to discipline yourself to think concurrently and apply the
proper protections where it is required.
The consequence of not doing so may not even be felt until your software is in production
and that one customer starts complaining that their server freezes about once a week. Good
luck finding that bug!

Multithreading APIs
The most common APIs would be:
std::thread, std::mutex and friends in the standard library - from C++11 onwards

POSIX threads, or pthreads. Exposed by POSIX systems such as Linux and most other
Unix derivatives, e.g. OS X. There is also pthread-win32 support built over the top of
Win32 threads.
Win32 threads. Exposed by the Windows operating system.

OpenMP. Supported by many C++ compilers.
3rd party libraries like Boost and Qt provide wrappers that abstract the differences
between thread APIs.
All APIs will have in common:
Thread creation, destruction, joins (waiting on threads) and detaches (freeing the thread
to do what it likes).
Synchronization between threads using locks and barriers.
Mutexes - mutual exclusion locks that protect shared data.
Condition variables - a means to signal and notify of conditions becoming true.

std::thread
The std::thread represents a single thread of execution and provides an abstraction over
platform dependent ways of threading.
#include <iostream>
#include <thread>
using namespace std;
void DoWork(int loop_count) {
    for (int i = 0; i < loop_count; ++i) {
        cout << "Hello world " << i << endl;
    }
}
int main() {
    thread worker(DoWork, 100);
    worker.join();
}

The example spawns a thread which invokes the function and passes the parameter into it,
printing a message 100 times.

std::mutex
C++ provides a family of various mutex types to protect access to shared data.
The mutex is obtained by a lock_guard and other attempts to obtain the mutex are blocked
until the lock is relinquished.
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;
mutex data_guard;
int result = 0;
void DoWork(int loop_count) {
    for (auto i = 0; i < loop_count; ++i) {
        // The guard holds the mutex until it goes out of scope
        lock_guard<mutex> guard(data_guard);
        result += 1;
    }
}
int main() {
    thread worker1(DoWork, 100);
    thread worker2(DoWork, 150);
    worker1.join();
    worker2.join();
    cout << "result = " << result << endl;
}

POSIX threads
The pthreads API is prefixed pthread_ and works like so:

#include <cstdint>
#include <iostream>
#include <pthread.h>
using namespace std;
void *DoWork(void *data) {
    // The loop count travels inside the void* payload
    const int loop_count = (int) (intptr_t) data;
    for (int i = 0; i < loop_count; ++i) {
        cout << "Hello world " << i << endl;
    }
    pthread_exit(NULL);
}
int main() {
    pthread_t worker_thread;
    int result = pthread_create(&worker_thread, NULL, DoWork,
        (void *) (intptr_t) 100);
    // Wait for the thread to end
    result = pthread_join(worker_thread, NULL);
}

This example spawns a thread which invokes DoWork with the payload of 100 which causes
the function to print a message 100 times.

Win32 Threads
Win32 threading has functions analogous to those in POSIX. They have names such as
CreateThread, ExitThread, SetThreadPriority etc.

OpenMP API
Open Multi-Processing (OpenMP) is an API for multi-threaded parallel processing. OpenMP
relies on compiler support because you use special #pragma directives in your source to
control thread creation and access to data.
GCC, Clang and Visual C++ have support for OpenMP so it is an option.
OpenMP is a complex standard but the use of directives can make for cleaner code than
invoking threading APIs directly. The downside is that it is also more opaque, hiding what
the software is doing and making it considerably more difficult to debug.
OpenMP is described in detail at the OpenMP website.

Thread local storage
Thread local storage, or TLS is static or global data which is private to every thread. Each
thread holds its own copy of this data so it can modify it without fear of causing a data race.
Compilers also have proprietary ways to decorate variables as thread local:
__thread int counter; // gcc / clang
__declspec(thread) int counter; // MSVC

C++11 has gained a thread_local keyword to decorate variables which should use TLS.
thread_local int counter;

Rust
We saw with C++ that you had to be disciplined to remember to protect data from race
conditions.
Rust doesn't give you that luxury:
1. Any data that you share must be protected in a thread safe fashion
2. Any data that you pass between threads must be marked thread safe

Spawning a thread
Spawning a thread is easy enough by calling spawn, supplying the closure you want to run
in the context of your new thread.
use std::thread;
thread::spawn(move || {
println!("Hello");
});

Alternatively you can supply a function to spawn which is called in the same manner.

fn my_thread() {
println!("Hello");
}
//...
thread::spawn(my_thread);

If you supply a closure then it must have a lifetime of 'static because threads can outlive
the thing that created them. i.e. they are detached by default.
A closure can take ownership of (move) values that are marked Send so the compiler allows
ownership to transfer between threads.
Likewise a function / closure may also return a value which is marked Send so the compiler
can transfer ownership between the terminating thread and the thread which calls join to
obtain the value.
So the thread above is detached. If we wanted to wait for the thread to complete, spawn
returns a JoinHandle on which we can call join to wait for termination.

let h = thread::spawn(move || {
println!("Hello");
});
h.join().unwrap();

If the closure or function returns a value, we can use

join

to obtain it.

let h = thread::spawn(move || 100 * 100);
let result = h.join().unwrap();
println!("Result = {}", result);
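
The two Send rules described above, moving a value into the thread and getting one back through join, can be combined in one small sketch. The function sum_on_worker() is a hypothetical helper invented for this illustration.

```rust
use std::thread;

// Sums the numbers on a worker thread and hands the total back via join().
fn sum_on_worker(numbers: Vec<u64>) -> u64 {
    // `move` transfers ownership of `numbers` into the closure. Vec<u64> is
    // Send, so the compiler allows it to cross the thread boundary.
    let handle = thread::spawn(move || numbers.iter().sum::<u64>());

    // join() returns Err if the worker panicked, otherwise the closure's value.
    handle.join().expect("worker thread panicked")
}
```

After the call to spawn, the original numbers binding can no longer be used in the calling thread. This is how the compiler rules out two threads touching the same Vec.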

Data race protection in the compiler
Data races are bad news, but fortunately in Rust the compiler has your back. You MUST
protect your shared data or it won't compile.
The simplest way to protect your data is to wrap the data in a mutex and provide each
thread instance with a reference counted copy of the mutex.


let shared_data = Arc::new(Mutex::new(MySharedData::new()));
// Each thread we spawn should have a clone of this Arc
let shared_data = shared_data.clone();
thread::spawn(move || {
let mut shared_data = shared_data.lock().unwrap();
shared_data.counter += 1;
});

Here is a full example that spawns 10 threads that each increment the counter.


use std::sync::{Arc, Mutex};
use std::thread;

struct MySharedData {
    pub counter: u32,
}
impl MySharedData {
    pub fn new() -> MySharedData {
        MySharedData {
            counter: 0
        }
    }
}
fn main() {
    spawn_threads();
}
fn spawn_threads() {
    let shared_data = Arc::new(Mutex::new(MySharedData::new()));
    // Spawn a number of threads and collect their join handles
    let handles: Vec<thread::JoinHandle<()>> = (0..10).map(|_| {
        let shared_data = shared_data.clone();
        thread::spawn(move || {
            let mut shared_data = shared_data.lock().unwrap();
            shared_data.counter += 1;
        })
    }).collect();
    // Wait for each thread to complete
    for h in handles {
        h.join().unwrap();
    }
    // Print the data
    let shared_data = shared_data.lock().unwrap();
    println!("Total = {}", shared_data.counter);
}

So the basic strategy will be this:
1. Every thread will get its own atomic reference to the mutex.
2. Each thread that wishes to access the shared data must obtain a lock on the mutex.
3. Once the lock is released, the next waiting thread can obtain access.
4. The compiler will enforce this and generate errors if ANYTHING is wrong.

Read Write Lock
A read write lock works much like a mutex - we wrap the shared data in a RwLock, and then
in an Arc.

let shared_data = Arc::new(RwLock::new(MySharedData::new()));

Each thread will then either need to obtain a read lock or a write lock on the shared data.
let shared_data = shared_data.read().unwrap();
// OR
let mut shared_data = shared_data.write().unwrap();

The advantage of a RwLock is that many threads can concurrently read the data, providing
nothing is writing to it. This may be more efficient in many cases.
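
A fuller sketch of the read-write lock in use. The function readers_and_writer() is a hypothetical name. One thread takes the write lock to append a value, and afterwards several threads take read locks and report the length they see.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Returns the length observed by each reader thread plus the final length.
fn readers_and_writer() -> (Vec<usize>, usize) {
    let shared = Arc::new(RwLock::new(vec![1, 2, 3]));

    // The writer takes the exclusive lock and appends a value.
    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            shared.write().unwrap().push(4);
        })
    };
    writer.join().unwrap();

    // Several readers may hold the read lock at the same time.
    let readers: Vec<thread::JoinHandle<usize>> = (0..4)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.read().unwrap().len())
        })
        .collect();

    let lengths: Vec<usize> = readers.into_iter().map(|h| h.join().unwrap()).collect();
    let final_len = shared.read().unwrap().len();
    (lengths, final_len)
}
```

The writer is joined before the readers start only to keep the result deterministic. The lock itself would be just as safe with readers and the writer running together.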

Sending data between threads using channels
TODO mpsc channel
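
As a minimal sketch of what this could look like, the standard library's std::sync::mpsc module provides a multi-producer, single-consumer channel. The function collect_from_workers() is a hypothetical name. Each worker thread sends one value and the calling thread receives them all.

```rust
use std::sync::mpsc;
use std::thread;

// Spawns `worker_count` threads that each send the square of their id,
// then gathers every message on the calling thread.
fn collect_from_workers(worker_count: u32) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();

    for id in 0..worker_count {
        // Every producer needs its own clone of the sending half.
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(id * id).unwrap();
        });
    }

    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);

    // iter() blocks for each message and ends when every sender is gone.
    let mut results: Vec<u32> = rx.iter().collect();
    results.sort();
    results
}
```

Messages arrive in whatever order the threads happen to run, which is why the results are sorted before being returned. Nothing is shared here: each value is moved through the channel, so no mutex is needed.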

Thread local storage
As with C++ you may have reason to use thread local storage
thread_local! {
// TODO
}
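
One way the macro might be used is sketched below, assuming a simple per-thread counter. COUNTER, bump() and bump_on_new_thread() are hypothetical names. The value is wrapped in a Cell so it can be updated through the shared reference that with() provides.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own independent COUNTER, initialised to 0.
    static COUNTER: Cell<u32> = Cell::new(0);
}

// Increments the calling thread's counter and returns the new value.
fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

// Bumps the counter `times` times on a fresh thread and returns the last value.
fn bump_on_new_thread(times: u32) -> u32 {
    thread::spawn(move || {
        let mut last = 0;
        for _ in 0..times {
            last = bump();
        }
        last
    })
    .join()
    .unwrap()
}
```

Because every new thread starts with its own zeroed counter, bump_on_new_thread(3) returns 3 however often the calling thread has bumped its own copy.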

Useful crates
Rayon


The rayon crate implements parallel iterators that allow your collections to be iterated in
parallel. The crate utilises work stealing and divide and conquer algorithms coupled to a
thread pool to process collections more quickly than they could be in a sequential fashion.
Generally speaking this is a drop-in replacement with the exception that you call par_iter
instead of iter. The crate implements a ParallelIterator trait on collection classes.

use rayon::prelude::*;
fn sum_of_squares(input: &[i32]) -> i32 {
input.par_iter()
.map(|&i| i * i)
.sum()
}

See the crate site for more information.


Lint
C/C++ compilers can issue many useful warnings but the amount of static analysis they can
do is usually quite limited.
The Rust compiler performs a far more rigorous lifecycle check on data and then follows up
with a lint check that inspects your code for potentially bad or erroneous code.
In particular it looks for:
Dead / unused code
Unreachable code
Deprecated methods
Undocumented functions
Camel case / snake case violations
Unbounded recursion code (i.e. no conditionals to stop recursion)
Use of heap memory when stack could be used
Unused extern crates, imports, variables, attributes, mut, parentheses
Using "while true {}" instead of "loop {}"
Lint rules can be enforced more strictly or ignored by using attributes:
#[allow(rule)]
#[warn(rule)]
#[deny(rule)]
#[forbid(rule)]
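
A brief sketch of these attributes applied to individual items. The function names are hypothetical. dead_code, unused_variables and non_snake_case are real lint names and stand in for "rule" above.

```rust
// Silence one lint for one item: this helper is intentionally unused for now.
#[allow(dead_code)]
fn not_called_yet() {}

// Turn a warning into a hard error, but only inside this function.
#[deny(unused_variables)]
fn area(width: u32, height: u32) -> u32 {
    width * height
}

// A name kept in camel case, e.g. to mirror an external API. The naming
// lint is silenced for this item alone.
#[allow(non_snake_case)]
fn getWidth() -> u32 {
    4
}
```

Attaching the attribute to a single item keeps the exception local. The same lints can be set for a whole crate by writing the inner form, such as #![deny(unused_variables)], at the top of main.rs or lib.rs.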

A full list of lint rules can be found by typing "rustc -W help":

name                           default  meaning
----                           -------  -------
box-pointers                   allow    use of owned (Box type) heap memory
fat-ptr-transmutes             allow    detects transmutes of fat pointers
missing-copy-implementations   allow    detects potentially-forgotten implementations of `Copy`
missing-debug-implementations  allow    detects missing implementations of fmt::Debug
missing-docs                   allow    detects missing documentation for public members
trivial-casts                  allow    detects trivial casts which could be removed
trivial-numeric-casts          allow    detects trivial casts of numeric types which could be removed
unsafe-code                    allow    usage of `unsafe` code
...

There are a lot more checks than are listed here.


Macros
C / C++ Preprocessor
C languages are a little unusual in that they are compiled in two phases. The first phase is
called the preprocess. In this phase, the preprocessor looks for directives starting with a #
symbol and runs string substitution and conditional inclusion / exclusion based on those
directives. Only after the file has been preprocessed does the compiler attempt to compile it.
Preprocessor directives start with a # symbol. For example the #define directive creates
a macro with an optional value:
#define IS_WINDOWS
#define SHAREWARE_VERSION 1

We'll explore macros more in a moment. Another directive is the #ifdef / #else / #endif or
#if / #else / #endif which can be used to include code from one branch or the other of a
test according to what matches.
#if SHAREWARE_VERSION == 1
showNagwarePopup();
#endif
//...
#ifdef IS_WINDOWS
writePrefsToRegistry();
#else
writePrefsToCfg();
#endif

Another directive is #include. In C and C++, public functions and structures are typically
declared and implemented in separate files. The #include directive allows a header to be
pulled in to the front of any file that makes use of those definitions.

// System / external headers tend to use angle style
#include <string>
#include <vector>
// Local headers tend to use double quotes
#include "MyClass.h"

The important thing to remember in all of this is ALL of these things happen before the
compiler even starts! Your main.c might only be 10 lines of code but if you #include some
headers the preprocessor may be feeding many thousands of lines of types and functions into
the compiler, all of which are evaluated before they get to your code.

C / C++ Macros
Macros are string substitution patterns performed by the preprocessor before the source is
compiled. As such they can be very prone to error and so are discouraged in favour of
constants and inline functions.
Here is a simple macro that would behave in an unexpected manner:
#define MULTIPLY(x, y) x * y
//
int x = 10, y = 20;
int result = MULTIPLY(x + 1, x + y);
// Value is NOT 330 (11 * 30), it's 40 because macro becomes x + 1 * x + y

The macro is very simple - multiply x by y. But it fails if either argument is an expression.
Judicious use of parentheses might avoid the error in this case, but we could break it again
using some pre or post increments.
Macros in C++ are also unhygienic, i.e. the macro can inadvertently conflict with or capture
values from outside of itself causing errors in the code.
#define SWAP(x, y) int tmp = y; y = x; x = tmp;
//
int tmp = 10;
int a = 20, b = 30;
SWAP(a, b); // ERROR

Here our SWAP macro uses a temporary value called tmp that already existed in the scope
and so the compiler complains. A macro might avoid this by using shadow variables
enclosed within a do / while(0) block to avoid conflicts but it is less than ideal.

#define SWAP(x, y) do { int tmp = y; y = x; x = tmp; } while (0)

Consequently inline functions are used wherever possible. Even so macros are still
frequently used in these roles:
To conditionally include for a command-line flag or directive, e.g. the compiler might
#define WIN32 so code can conditionally compile one way or another according to its
presence.
For adding guard blocks around headers to prevent them being #include'd more than
once. Most compilers implement a "#pragma once" directive which is an increasingly
common alternative
For generating snippets of boiler plate code (e.g. namespace wrappers), or things that
might be compiled away depending on #defines like DEBUG being set or not.
For making strings of values and other esoteric edge cases
Writing a macro is easy, perhaps too easy:
#define PRINT(x) \
printf("You printed %d", x);

This macro would expand to printf before compilation but it would fail to compile or print the
wrong thing if x were not an integer.

Rust macros
Macros in Rust are quite a complex topic but they are more powerful and safer than the
ones in C++.
Rust macros are hygienic. That is to say if a macro contains variables, their names do
not conflict with, hide, or otherwise interfere with named variables from the scope
they're used from.
The pattern supplied in between the brackets of the macro is tokenized and
designated as parts of the Rust language: identifiers, expressions etc. In C / C++ you
can #define a macro to be anything you like whether it is garbage or syntactically
correct. Furthermore you can call it from anywhere you like because it is preprocessed
even before the compiler sees the code.
Rust macros are either declarative and rule based, with each rule having a left hand side
pattern "matcher" and a right hand side "substitution", or they're procedural, where actual
Rust code turns an input into an output (see section below).
Macros must produce syntactically correct code.
Declarative macros can be exported by crates and used in other code providing the
other code elects to enable macro support from the crate. This is a little messy since the
macro must be marked with a #[macro_export] attribute in the crate that defines it and the
crate that uses it must opt in with #[macro_use].
With all that said, macros in Rust are complex - perhaps too complex - and generally
speaking should be used as sparingly as possible.
Here is a simple declarative macro demonstrating repetition called hello_x!(). It will take a
comma separated list of expressions and say hello to each one of them.
macro_rules! hello_x {
($($name:expr),*) => (
$(println!("Hello {}", $name);)*
)
}
// The code can supply as many arguments as it likes to this macro
hello_x!("Bob", "Sue", "John", "Ellen");

Essentially the matcher matches against our comma separated list and the substitution
generates one println!() with the message for each expression.
Hello Bob
Hello Sue
Hello John
Hello Ellen

What if we threw some other expressions into that array?
hello_x!("Bob", true, 1234.333, -1);

Well that works too:


Hello Bob
Hello true
Hello 1234.333
Hello -1

What about some illegal code:
hello_x!(Aardvark {});

We get a meaningful error originating from the macro.
error[E0422]: `Aardvark` does not name a structure
  |
8 | hello_x!(Aardvark {});
  |          ^^^^^^^^
:2:27: 2:58 note: in this expansion of format_args!
:3:1: 3:54 note: in this expansion of print! (defined in )
:5:7: 5:35 note: in this expansion of println! (defined in )
:8:1: 8:23 note: in this expansion of hello_x! (defined in )
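
Returning to the hygiene point made earlier, here is a small sketch of a Rust counterpart to the C SWAP macro. The swap! macro and swap_demo() function are hypothetical. The caller deliberately names its own variable tmp, the same name the macro uses internally.

```rust
// The `tmp` declared inside the macro lives in the macro's own syntax
// context, so it cannot clash with a variable called `tmp` at the call site.
macro_rules! swap {
    ($x:ident, $y:ident) => {{
        let tmp = $x;
        $x = $y;
        $y = tmp;
    }};
}

fn swap_demo() -> (i32, i32) {
    // The caller's variable is also called `tmp`.
    let mut tmp = 10;
    let mut b = 30;
    swap!(tmp, b);
    (tmp, b)
}
```

The equivalent C macro would fail to compile or silently misbehave here. In Rust the two tmp names are treated as different variables and the values are exchanged as expected.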

Real world example - vec!()
Rust comes with a lot of macros for reducing some of the leg work of tedious boiler plate.
For example the vec!() macro is a way to declare a Vec and prepopulate it with some
values.
Here is the vec! macro source code as it appeared in the Rust source at the time of writing:

macro_rules! vec {
($elem:expr; $n:expr) => (
$crate::vec::from_elem($elem, $n)
);
($($x:expr),*) => (
<[_]>::into_vec(box [$($x),*])
);
($($x:expr,)*) => (vec![$($x),*])
}

It looks complex but we will break it down to see what it does. Firstly it has a match-like
syntax with three branches that expand to anything that matches the left hand side:

First branch
The first matcher matches a pattern such as 1; 100. The value 1 goes into $elem, the
value 100 goes into $n:
($elem:expr; $n:expr) => (
    $crate::vec::from_elem($elem, $n)
);

The $crate is a special value that resolves to the crate that defines the macro, which
happens to be std.

So this expands to this:
let v = vec!(1; 100);
// 1st branch matches and it becomes this
let v = std::vec::from_elem(1, 100);

Second branch
The second matcher contains a glob expression - zero or more expressions separated by
commas (a trailing comma is handled by the third branch). Each matching expression ends
up in $x:

($($x:expr),*) => (
<[_]>::into_vec(box [$($x),*])
);


So we can write:
let v = vec!(1, 2, 3, 4, 5);
// 2nd branch matches and it becomes this
let v = <[_]>::into_vec(box [1, 2, 3, 4, 5]);

The box keyword tells Rust to allocate the supplied array on the heap and moves the
ownership by calling a helper function called into_vec() that wraps the memory array with a
Vec instance. The <[_]>:: at the front is a qualified path that names the slice type so the
compiler can find its into_vec() function.

Third branch
The third branch is a little odd and almost looks the same as the second branch. But take a
look at the comma. In the last branch it was next to the asterisk, this time it is inside the
inner $().
($($x:expr,)*) => (vec![$($x),*])

The matcher matches when the comma is there and if so recursively calls vec!() again to
resolve to the second branch matcher.
Basically it is there so that there can be a trailing comma in our declaration and it will still
generate the same code.
// 3rd branch matches this
let v = vec!(1, 2, 3, 4, 5,);
// and it becomes this
let v = vec!(1, 2, 3, 4, 5);
// which matches 2nd branch to become
let v = <[_]>::into_vec(box [1, 2, 3, 4, 5]);
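
The same three-branch structure can be written in ordinary stable Rust. The sketch below defines a hypothetical my_vec! macro that avoids the box keyword and the internal from_elem() helper by building the Vec with push() and resize(). It illustrates the pattern and is not the standard library's implementation.

```rust
macro_rules! my_vec {
    // my_vec![elem; n] - n copies of elem
    ($elem:expr; $n:expr) => {{
        let mut v = Vec::new();
        v.resize($n, $elem);
        v
    }};
    // my_vec![a, b, c] - one push per expression
    ($($x:expr),*) => {{
        let mut v = Vec::new();
        $(v.push($x);)*
        v
    }};
    // my_vec![a, b, c,] - strip the trailing comma and recurse
    ($($x:expr,)*) => { my_vec![$($x),*] };
}

// Exercises each of the three branches in turn.
fn build() -> (Vec<i32>, Vec<i32>, Vec<i32>) {
    (my_vec![7; 3], my_vec![1, 2, 3], my_vec![1, 2, 3,])
}
```

Newer macros often fold the second and third branches into one by ending the matcher with an optional separator, written $(,)?, but the recursive form above mirrors the structure being described.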

Procedural Macros
So far we've talked about declarative macros that expand out to be Rust code based upon
how they pattern match the rules defined by the macro.


A second kind of macro is the procedural macro. A procedural macro is a plugin written in
Rust that is compiled and loaded by the compiler to produce arbitrary Rust code as its
output.
A procedural macro can therefore be thought of as a code generator but one that forms part
of the actual compiler. Procedural macros can be particularly useful for:
Serialization / deserialization (e.g. the serde crate generates code for reading and
writing structs to a variety of formats - JSON, YAML, TOML, XML etc.)
Domain Specific Languages (e.g. embedded SQL, regular expressions etc).
Aspect oriented programming (e.g. extra debugging, performance metrics etc)
New lint and derive rules
For more information look at this section on compiler plugins in the Rust book.

Other forms of conditional compilation
We saw that the C / C++ preprocessor can be used for conditional compilation. The
equivalent in Rust is attributes. See the attributes section to see how they may be used.


Memory allocation
This section is concerned with memory allocation, i.e. creating objects that reside on the
heap and not on the stack, and the manner in which they are created and are destroyed.

C++
C and C++ have various standard ways to allocate memory:
1. malloc / calloc / realloc() and free() functions
2. new and delete (C++ only)
3. new[] and delete[] for arrays (C++ only)
Invoking malloc() and free() on a C++ class or struct is never a good idea since it will not
call the corresponding class constructor or destructor. The realloc() function allocates a
new piece of memory, copying the contents of an existing piece of memory before freeing
the original.
// malloc / free
char *buffer = (char *) malloc(1024);
...
free(buffer);
// new / delete
Stack *stack = new Stack();
...
delete stack;
// new[] / delete[]
Node *nodes = new Node[100];
...
delete []nodes;

In each case the allocation must be matched by the corresponding free action so
immediately we can see scope for error here:
1. Ownership rules can get messy, especially when a class is passed around a lot - who
deletes the object and when?
2. Not using the correct new & delete pair, causing a memory leak. e.g. calling delete
instead of delete[]
3. Forgetting to free memory at all causing a memory leak.
4. Freeing memory more than once.
5. Using a dangling pointer, i.e. a pointer which refers to freed memory.
6. Allocating / freeing in a way that causes heap fragmentation. Reallocation can cause
fragmentation to happen a lot faster.
C++ has smart pointers which manage the lifetime of objects and are a good way to
reduce programmer error:
{
    std::auto_ptr<Database> db(new Database());
    //... object is deleted when db goes out of scope
}
// C++11
{
    std::unique_ptr<Database> db(new Database());
    //... object is deleted when db goes out of scope
    std::unique_ptr<Node[]> nodes(new Node[100]);
    //... array is deleted when nodes goes out of scope
}
// C++11
{
    std::shared_ptr<Database> db(new Database());
    // Reference count db
    setDatabase(db);
    //... object is deleted when last shared_ptr reference to it goes out of scope
}
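
Rust

Box
A Box is a heap allocated value with exactly one owner, and the memory is freed when that
owner goes out of scope. For example, a struct can keep a large buffer on the heap:
struct Blob {
    data: Box<[u8; 16384]>,
}
impl Blob {
    pub fn new() -> Blob {
        Blob {
            data: Box::new([0u8; 16384]),
        }
    }
}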

Whoever owns the box can access it. Essentially, that means you can pass the box around
from one place to another and whatever binds to it last can open it. Everyone else’s binding
becomes invalid and will generate a compile error.
A box can be useful for abstraction since it can refer to a struct by a trait it implements
allowing decoupling between types.
TODO example of a struct holding a box with a trait implemented by another struct
It can be useful for situations where one piece of code creates an object on behalf of another
piece of code and hands it over. The Box makes sure that the ownership is explicit at all
times and when the box moves to its new owner, so does the lifetime of the object itself.
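
The two uses of a box described above, holding a struct by a trait it implements and handing
over an object created on someone else's behalf, might be sketched like this. The names
Shape, Circle, Square, Canvas and make_shape are invented for illustration and are not part
of any library.

```rust
// Hypothetical trait and types, for illustration only.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    radius: f64,
}

struct Square {
    side: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// Canvas only knows about the trait, not the concrete struct in the box.
struct Canvas {
    shape: Box<dyn Shape>,
}

impl Canvas {
    // Ownership of the box moves into the Canvas here.
    fn new(shape: Box<dyn Shape>) -> Canvas {
        Canvas { shape }
    }

    fn shape_area(&self) -> f64 {
        self.shape.area()
    }
}

// Creates an object on behalf of the caller and hands it over.
fn make_shape(kind: &str) -> Box<dyn Shape> {
    match kind {
        "circle" => Box::new(Circle { radius: 1.0 }),
        _ => Box::new(Square { side: 2.0 }),
    }
}
```

A caller would write something like Canvas::new(make_shape("square")). Once the box has been
passed to Canvas::new the caller's binding is no longer valid, so there is never any doubt
about who frees the shape.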

Cell
A Cell is something whose contents can be copied out with a get() or overwritten with a
set(), even through a shared reference. For get() to work the contents must be copyable,
i.e. they must implement the Copy trait.
The Cell has zero cost at runtime because it doesn't have to track borrows but the
restriction is that it is only practical for Copy types. Therefore it would not be suitable
for large objects or deep-copy objects.
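
As a rough sketch of where a Cell helps, the hypothetical Sensor struct below counts how many
times it has been read even though read() only takes &self. The struct and its methods are
made up for this example.

```rust
use std::cell::Cell;

// Hypothetical struct for illustration: it counts how often it is read.
struct Sensor {
    reading: f64,
    reads: Cell<u32>,
}

impl Sensor {
    fn new(reading: f64) -> Sensor {
        Sensor { reading, reads: Cell::new(0) }
    }

    // Takes &self, not &mut self, yet the counter can still change
    // because get() copies the value out and set() overwrites it.
    fn read(&self) -> f64 {
        self.reads.set(self.reads.get() + 1);
        self.reading
    }

    fn read_count(&self) -> u32 {
        self.reads.get()
    }
}
```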

RefCell
Somewhat more useful is the RefCell but it incurs a runtime penalty to maintain read-write
locks.

The RefCell holds an object that can be borrowed either mutably or immutably. These borrows
behave like a read-write lock, so there is a runtime cost since each borrow must check if
something else has already borrowed the object.
Typically a piece of code might borrow the object for a scope and then the borrow
disappears when it goes out of scope. Any number of immutable borrows can exist together,
but if a mutable borrow is requested while another borrow is still live, or an immutable
borrow is requested while a mutable borrow is live, it will cause a panic.
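
A minimal sketch of those borrow rules, using only the standard library. The function name
refcell_demo is invented for the example, and try_borrow_mut() is used instead of
borrow_mut() for the conflicting borrow so that the conflict is reported rather than causing
a panic.

```rust
use std::cell::RefCell;

// Returns the number of log entries and whether a conflicting borrow was refused.
fn refcell_demo() -> (usize, bool) {
    let log = RefCell::new(Vec::new());
    {
        // Mutable borrow for the duration of this scope.
        let mut writer = log.borrow_mut();
        writer.push("first");
        writer.push("second");
    } // writer goes out of scope here and the borrow is released

    // Immutable borrow, fine because nothing else holds a borrow now.
    let reader = log.borrow();

    // While reader is alive a mutable borrow is refused. borrow_mut() would
    // panic at this point, try_borrow_mut() returns an error instead.
    let conflict = log.try_borrow_mut().is_err();

    (reader.len(), conflict)
}
```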

Reference Counting objects
Rust implements Rc<> and Arc<> for the purpose of reference counting objects that need
to be shared and used by different parts of code. Rc<> is a single threaded reference
counted wrapper, while Arc<> is an atomic reference counted wrapper. You use one or the
other depending on whether threads are sharing the object.
A reference counted object is usually wrapping a Box, Cell or RefCell. So multiple
structs can hold a reference to the same object.

Rc
From std::rc::Rc. A reference counted object can be held by multiple owners at a time.
Each owner holds a cloned Rc<T> but the T contents are shared. When the last reference to
the object is dropped the contents are destroyed.
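
Here is a small sketch of two owners sharing one object through an Rc wrapping a RefCell.
Account, Customer and rc_demo are hypothetical names chosen for the example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical types for illustration.
struct Account {
    balance: i64,
}

struct Customer {
    account: Rc<RefCell<Account>>,
}

// Returns the final balance, the reference count while both customers
// exist, and the reference count after they have been dropped.
fn rc_demo() -> (i64, usize, usize) {
    let account = Rc::new(RefCell::new(Account { balance: 100 }));

    // Each clone bumps the reference count, the Account itself is shared.
    let alice = Customer { account: Rc::clone(&account) };
    let bob = Customer { account: Rc::clone(&account) };

    alice.account.borrow_mut().balance -= 30;
    bob.account.borrow_mut().balance += 5;

    let count_with_owners = Rc::strong_count(&account);
    drop(alice);
    drop(bob);
    let count_after = Rc::strong_count(&account);

    let balance = account.borrow().balance;
    (balance, count_with_owners, count_after)
}
```

The Account is only destroyed when the last Rc pointing at it, in this case the local
variable account, goes out of scope.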

Arc
From std::sync::Arc. An atomic reference counted object that works like Rc except it
uses an atomically incremented counter which makes it thread safe. There is more overhead
to maintain an atomic reference count. If multiple threads access the same object they are
compelled to use Arc.
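
As an illustration, the sketch below shares a counter between four threads. It assumes the
counter is protected by a std::sync::Mutex, which is one common choice rather than the only
one, and arc_demo is a made-up function name.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawns four threads that each add 100 to a shared counter.
fn arc_demo() -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let mut handles = Vec::new();

    for _ in 0..4 {
        // Each thread gets its own Arc pointing at the same Mutex.
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..100 {
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    // Wait for every thread to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Swapping Arc for Rc here would be rejected by the compiler because Rc cannot be sent to
another thread.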

Foreign Function Interface

Foreign Function Interface
Rust doesn't work in a vacuum and was never intended as such. Instead it was always
assumed that it would need to call other code and other code would need to call it:
Call other libraries via their entry points
Produce libraries in Rust that can be called by code written in another language. e.g. C,
C++, Python, Ruby etc.
To that end it has the Foreign Function Interface, the means to define external functions,
expose its own functions without name mangling and to invoke unsafe code that would
otherwise be illegal in Rust.

Calling out to C libraries
Rust supports the concept of a foreign function interface which is a definition of an external
function or type that is resolved at link time.
For example, we might wish to link to a library called foo.lib, and invoke a command
foo_command().
#[link(name = "foo")]
extern "C" {
    // The pointer is const because the caller below only has a shared slice
    fn foo_command(command: *const u8);
}

To call this function we have to step outside Rust's safety guarantees because the compiler
cannot check what the foreign code does with the pointer we give it. To do this we wrap the
call in an unsafe block, which permits operations that would otherwise be rejected:
pub fn run_command(command: &[u8]) {
unsafe {
foo_command(command.as_ptr());
}
}

Note how we can use unsafe features like pointers inside of this unsafe block. This allows
interaction with the outside world while still enforcing safety for the rest of our code.
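
Since foo.lib is imaginary, here is a sketch of the same idea that should link on a typical
platform because it calls strlen from the C runtime. The wrapper name c_string_length is
made up for the example, and the declaration is written by hand on the assumption that
size_t matches usize on the target.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Hand written declaration of the C function. Compilers older than Rust 1.82
// spell this block `extern "C"` without the leading `unsafe`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Safe wrapper: callers never see a raw pointer.
pub fn c_string_length(text: &str) -> Option<usize> {
    // CString appends the terminating NUL and fails on an interior NUL.
    let c_text = CString::new(text).ok()?;
    // Safety: c_text is a valid NUL terminated string that outlives the call.
    Some(unsafe { strlen(c_text.as_ptr()) })
}
```

Keeping the unsafe block inside a small safe function like this means the rest of the code
base never has to deal with the raw pointer.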

Making Rust code callable
The converse is also possible. We can produce a library from Rust that can be invoked by
some other code.
For example, imagine we have some code written in Python. The code works fine but it is
not performant and the bottleneck is in just one portion of the code, e.g. some file operation
like a checksum. We want our library to consist of a make_checksum() and a
release_checksum().
extern crate libc;
use std::ffi::CString;
use std::ptr;
use libc::{c_char, c_void, malloc, memset, strcpy, free};

#[no_mangle]
pub extern "C" fn make_checksum(filepath: *const c_char) -> *mut c_char {
    // Your code here
    if filepath.is_null() {
        return ptr::null_mut();
    }
    unsafe {
        // Imagine our checksum code here...
        let result = malloc(12);
        if result.is_null() {
            return ptr::null_mut();
        }
        memset(result, 0, 12);
        // Keep the CString alive in a variable while strcpy reads from it
        let placeholder = CString::new("abcdef").unwrap();
        strcpy(result as *mut c_char, placeholder.as_ptr());
        result as *mut c_char
    }
}

#[no_mangle]
pub extern "C" fn release_checksum(checksum: *const c_char) {
    unsafe {
        free(checksum as *mut c_void);
    }
}

Now in Python we can invoke the library simply:
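import ctypes
checksum = ctypes.CDLL("path/to/our/dll")
# Declare the signatures, otherwise ctypes assumes int and may truncate pointers
checksum.make_checksum.argtypes = [ctypes.c_char_p]
checksum.make_checksum.restype = ctypes.c_void_p
checksum.release_checksum.argtypes = [ctypes.c_void_p]
checksum.release_checksum.restype = None
cs = checksum.make_checksum(b"c:/somefile")
...
checksum.release_checksum(cs)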

The FFI specification goes into a lot more detail than this and explains concepts such as
callbacks, structure packing, stdcall, linking and other issues that allow full interoperability.

libc
Rust maintains a crate called libc which holds types and functions corresponding to C.
A dependency on libc would be added to the Cargo.toml of your project:

[dependencies]
libc = "0.2.17"

And the file that uses the functions would contain a preamble such as this saying what types
and functions it calls:
extern crate libc;
use libc::{c_char, malloc, free, atoi};

Other libraries
There are also crates that have the definitions of structures, types and functions.
WinAPI bindings for Win32 programming APIs.
OpenSSL bindings for OpenSSL

Porting from C/C++ to Rust

Fixing Problems in C/C++
This section is not so much concerned with the correct way code should be written but with
what the C/C++ languages allow. Successive versions of these languages have attempted to
retrofit good practice, but not by eliminating the bad!
All of these things should be considered bad and any construct in the language that enables
them is also bad:
Calling a pure virtual function in a constructor / destructor of a base class
Calling a dangling pointer
Freeing memory more than once
Using default copy constructors or assignment operators without following the rule of
three
Overflowing a buffer, e.g. being off by one with some string operation or not testing a
boundary condition
Memory leaks due to memory allocation / ownership issues
Heap corruption
The C++ programming language is a very large specification, one that only grows and gets
more nuanced and qualified with each release.
The problem from a programmer's perspective is understanding what things C++ allows
them to do as opposed to what things they should do.
In each case we'll see how Rust might have stopped us getting into this situation in the first
place.

What about C?
C++ will come in for most of the criticism in this section. Someone might be inclined to think
that therefore C does not suffer from problems.
Yes that is true to some extent, but it is akin to arguing we don't need shoes because we
have no legs. C++ exists and is popular because it is perceived as a step up from C.
Namespaces
Improved type checking
Classes and inheritance
Exception handling
More useful runtime library including collections, managed pointers, file io etc.

The ability to model classes and bind methods to them is a major advance. The ability to
write RAII style code does improve the software's chances of keeping its memory and
resource use under control.

Compilers Will Catch Some Errors
Modern C/C++ compilers can spot some of the errors mentioned in this section. But usually
they'll just throw a warning out. Large code bases always generate warnings, many of which
are innocuous and it's easy to see why some people become numb to them as they scroll
past.
The simplest way to protect C / C++ from dumb errors is to elevate serious warnings to be
errors. While it is not going to protect against every error it is still better than nothing.
In Microsoft VC++ enable a high warning level, e.g. /W4 and possibly /WX to turn warnings
into errors.
In GCC enable -Wall, -pedantic-errors and possibly -Werror to turn warnings into errors.
The pedantic flag rejects code that doesn't follow ISO C and C++ standards. There are
a lot of errors that can be configured.
However this will probably throw up a lot of noise in your compilation process and some of
these errors may be beyond your means to control.
In addition it is a good idea to run a source code analysis tool or linter. However these
tend to be expensive and in many cases can be extremely unwieldy.

Copy Constructor / Assignment Operators

Copy Constructor / Assignment Operators
In C++ you can construct one instance from another via a constructor and also by an
assignment operator. In some cases a constructor will be used instead of an assignment:
PersonList x;
PersonList y = x; // Copy constructor, not assignment
PersonList z;
z = x; // Assignment operator

By default C++ generates all the code to copy and assign the bytes in one class to another
without any effort. Lucky us!
So our class PersonList might look like this:
struct Person {
    //...
};
class PersonList {
    std::vector<Person> *personList_;
public:
    PersonList() : personList_(new std::vector<Person>) {
    }
    ~PersonList() {
        delete personList_;
    }
    // ... Methods to add / search list
};

Except we're not lucky, we just got slimed. The default byte copy takes the pointer in
personList_ and makes a copy of it. Now if we copy x to y, or assign x to z we have
three classes pointing to the same private data! Each of their destructors will delete that
same pointer, so the data is freed more than once. On top of that, z allocated its own
personList_ during its default constructor but the byte copy assignment overwrote it with
the one from x so its old personList_ value just leaks.

Of course we might be able to use a std::unique_ptr to hold our pointer. In which case the
compiler would generate an error. But it might not always be that simple. personList_ may
have been opaquely allocated by an external library so we have no choice but to manage its
lifetime through the constructor and destructor.

The Rule of Three
This is such a terrible bug enabling problem in C++ that it has given rise to the so-called
Rule of Three.
The rule says that if we explicitly declare a destructor, copy constructor or copy assignment
operator in a C++ class then we probably need to implement all three of them to safely
handle assignment and construction. In other words the burden for fixing C++'s default and
dangerous behaviour falls onto the developer.
So let's fix the class:

struct Person {
    //...
};
class PersonList {
    std::vector<Person> *personList_;
public:
    PersonList() : personList_(new std::vector<Person>) {
    }
    PersonList(const PersonList &other) :
            personList_(new std::vector<Person>) {
        personList_->insert(
            personList_->end(), other.personList_->begin(),
            other.personList_->end());
    }
    ~PersonList() {
        delete personList_;
    }
    PersonList & operator=(const PersonList &other) {
        // Don't forget to check if someone assigns an object to itself
        if (&other != this) {
            personList_->clear();
            personList_->insert(
                personList_->end(), other.personList_->begin(),
                other.personList_->end());
        }
        return *this;
    }
    // ... Methods to add / search list
};

What a mess!

We've added a copy constructor and an assignment operator to the class to handle copying
safely. The code even had to check if it was being assigned to itself in case someone wrote
x = x. Without that test, the receiving instance would clear itself in preparation to adding
elements from itself which would of course wipe out all its contents.
Alternatively we might disable copy / assignments by creating a private copy constructor and
assignment operator that prevent them being called by external code:
class PersonList {
    std::vector<Person> *personList_;
private:
    PersonList(const PersonList &other) {}
    PersonList & operator=(const PersonList &other) { return *this; }
public:
    PersonList() : personList_(new std::vector<Person>) {
    }
    ~PersonList() {
        delete personList_;
    }
    // ... Methods to add / search list
};

Another alternative would be to use noncopyable types within the class itself. For example,
the copy would fail if the pointer were managed with a C++11 std::unique_ptr (or Boost's
boost::scoped_ptr).
Boost also provides a boost::noncopyable class which provides yet another option. Classes
may inherit from noncopyable which implements a private copy constructor and assignment
operator so any code that tries to copy will generate a compile error.

The Rule of Five
The Rule of Three has become the Rule of Five(!) in C++11 because of the introduction of
move semantics.

If you have a class that can benefit from move semantics, the Rule of Five essentially says
that the existence of the user-defined destructor, copy constructor and copy assignment
operator requires you to also implement a move constructor and a move assignment
operator. So in addition to the code we wrote above we must also write two more methods.
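class PersonList {
    // See class above for other methods, rule of three....
public:
    PersonList(PersonList &&other) : personList_(other.personList_) {
        // Steal the pointer and leave the source holding nothing
        other.personList_ = nullptr;
    }
    PersonList &operator=(PersonList &&other) {
        if (&other != this) {
            // Free our own list before taking over the other one
            delete personList_;
            personList_ = other.personList_;
            other.personList_ = nullptr;
        }
        return *this;
    }
};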

How Rust helps
Move is the default
Rust helps by making move semantics the default. i.e. unless you need to copy data from
one instance to another, you don't. If you assign a struct from one variable to another,
ownership moves with it. The old variable is marked invalid by the compiler and it is an error
to access it.
But if you do want to copy data from one instance to another then you have two choices.
Implement the Clone trait. Your struct will have an explicit clone() function you can
call to make a copy of the data.
Implement the Copy trait. Your struct will now implicitly copy on assignment instead of
move. Implementing Copy also implies implementing Clone so you can still explicitly
call clone() if you prefer.
Primitive types such as integers, chars, bools etc. implement Copy so you can just assign
one to another:

// This is all good
let x = 8;
let mut y = x; // y must be mutable to be reassigned below
y = 20;
assert_eq!(x, 8);

But a String cannot be copied this way. A string has an internal pointer to heap allocated
data so copying is a more expensive operation. So String only implements the Clone trait
which requires you to explicitly duplicate it:
let copyright = "Copyright 2017 Acme Factory".to_string();
let copyright2 = copyright.clone();

The default for any struct is that it can neither be copied nor cloned.
struct Person {
name: String,
age: u8
}

The following code will create a Person object and assign it to person1. And when person1
is assigned to person2, ownership of the data also moves:

let person1 = Person { name: "Tony".to_string(), age: 38u8 };
let person2 = person1;

Attempting to use person1 after ownership moves to person2 will generate a compile
error:
println!("{}", person1.name); // Error, use of a moved value

To illustrate consider this Rust which is equivalent to the PersonList we saw in C++:
struct PersonList {
    pub persons: Vec<Person>,
}

We can see that PersonList has a Vec vector of Person objects. Under the covers the
Vec will allocate space in the heap to store its data.

Now let's use it.
let mut x = PersonList { persons: Vec::new(), };
let mut y = x;
// x is not the owner any more...
x.persons.push(Person{ name: "Fred".to_string(), age: 30u8} );

The variable x is bound to a PersonList on the stack, but the data of its persons member is
allocated from the heap. If we assign x to y then we could have two stack objects sharing
the same pointer on the heap in the same way we did in C++.
But Rust stops that from happening. When we assign x to y, the compiler will do a
bitwise copy of the data in x, but it will bind ownership to y. When we try to access the
data through the old variable Rust generates a compile error.
error[E0382]: use of moved value: `*x.persons`
   |
10 | let mut y = x;
   |     ----- value moved here
11 | x.persons.push(Person{});
   | ^^^^^^^^^ value used here after move
   |
   = note: move occurs because `x` has type `main::PersonList`,
     which does not implement the `Copy` trait

Rust has stopped the problem that we saw in C++. Not only stopped it but told us why it
stopped it - the value moved from x to y and so we can't use x any more.

Implementing the Copy trait
The Copy trait allows us to do direct assignment between variables. The trait has no
functions, and acts as a marker in the code to denote data that should be duplicated on
assignment.

You can implement the Copy trait by deriving it, or implementing it. But you can only do so
if all the members of the struct also implement the trait, and because Copy requires Clone
the two are normally derived together:
#[derive(Copy, Clone)]
struct PersonKey {
    id: u32,
    age: u8,
}
// Alternatively, instead of the derive...
impl Copy for PersonKey {}
impl Clone for PersonKey {
    fn clone(&self) -> PersonKey {
        *self
    }
}

So PersonKey is copyable because types u32 and u8 are also copyable and the compiler
will take the #[derive(Copy)] directive and modify the move / copy semantics for the struct.
But when a struct contains a type that does not implement Copy you will get a compiler
error. So this struct Person will cause a compiler error because String does not
implement Copy:

#[derive(Copy, Clone)]
struct Person {
    name: String,
    age: u8
}
// Compiler error! String is not Copy

Implementing the Clone trait
The Clone trait adds a clone() function to your struct that produces an independent copy
of it. We can derive it if every member of the struct can be cloned which in the case of
Person it can:

#[derive(Clone)]
struct Person {
name: String,
age: u8
}
...
let x = Person { /*...*/ };
let y = x.clone();

Now that Person derives Clone, we can do the same for PersonList because all its member
types implement that trait - a Person can be cloned, a Vec can be cloned, and a Box can be
cloned:
#[derive(Clone)]
struct PersonList {
    pub persons: Box<Vec<Person>>,
}

And now we can clone x into y and we have two independent copies.

//...
let mut x = PersonList { persons: Box::new(Vec::new()), };
let mut y = x.clone();
// x and y are two independent lists now, not shared
x.persons.push(Person{ name: "Fred".to_string(), age: 30} );
y.persons.push(Person{ name: "Mary".to_string(), age: 24} );
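
Putting the two traits side by side, a compilable sketch might look like the following. It
reuses the PersonKey and Person structs from above, with extra derives added so the values
can be compared, and the function name copy_vs_clone is invented for the example.

```rust
// PersonKey only holds Copy types so it can derive Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PersonKey {
    id: u32,
    age: u8,
}

// Person holds a String so it can only derive Clone.
#[derive(Debug, Clone, PartialEq)]
struct Person {
    name: String,
    age: u8,
}

fn copy_vs_clone() -> (PersonKey, PersonKey, Person, Person) {
    let key1 = PersonKey { id: 1, age: 38 };
    // Implicit copy: key1 remains valid after the assignment.
    let mut key2 = key1;
    key2.age = 39;

    let person1 = Person { name: "Tony".to_string(), age: 38 };
    // Explicit clone: without .clone() this would move person1.
    let mut person2 = person1.clone();
    person2.name.push_str(" Jr");

    (key1, key2, person1, person2)
}
```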

Summary
In summary, Rust stops us from getting into trouble by treating assignments as moves when a
non-copyable variable is assigned from one to another. But if we want to be able to clone /
copy we can make our intent explicit and do that too.
C++ just lets us dig a hole and fills the dirt in on top of us.

Missing Braces in Conditionals

Missing Braces in Conditionals
Every programmer eventually encounters an error like this and spends hours trying to figure
out why it wasn't working.
const bool result = fetch_files();
if (result) {
    process_files();
}
else
    print_error();
    return false;
// Now cleanup and return success
cleanup_files();
return true;

The reason of course was the else statement wasn't enclosed in braces so the wrong code
was executed. The compiler might spot dead code in this instance but that may not always
be the case. Even if it did, it might only issue a warning instead of an error.
The problem can be especially annoying in deeply nested conditions where a misplaced
brace can attach to the wrong level. This problem has led to real-world security issues. For
example here is the infamous "goto fail" bug that occurred in some Apple products. This
(intentional?) bug occurred during an SSL handshake and was exploitable:

static OSStatus
SSLVerifySignedServerKeyExchange(
    SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
    uint8_t *signature, UInt16 signatureLen)
{
    OSStatus err;
    //...
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    //...
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}

Note how the "goto fail" appears twice and the second one is not bound to the condition but
is indented as if it was. The code would jump straight to the fail label and return with an
err indicating success (since the prior SHA1 update had succeeded).

How Rust helps
Rust requires if-else expressions and loops to be associated with blocks.
So this code won't compile:

let mut x: i32 = do_something();
if x == 200 {
// ...
}
else
println!("Error");

If you try you will get an error like this.
rustc 1.13.0-beta.1 (cbbeba430 2016-09-28)
error: expected `{`, found `println`
  |
8 |     println!("Error");
  |     ^^^^^^^
  |
help: try placing this code inside a block
  |
8 |     println!("Error");
  |     ^^^^^^^^^^^^^^^^^^

error[E0425]: unresolved name `do_something`
  |
3 | let mut x: i32 = do_something();
  |                  ^^^^^^^^^^^^ unresolved name
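
For comparison, a version that does compile might look like the sketch below. The function
names describe_status and count_failures are made up for the example. Every branch and loop
body has its own block, so there is no way for a stray statement to look as if it belongs
to a condition when it does not.

```rust
// Both arms of the if-else must be blocks.
fn describe_status(code: i32) -> &'static str {
    if code == 200 {
        "Success"
    } else {
        "Error"
    }
}

// Loop bodies must be blocks too.
fn count_failures(codes: &[i32]) -> usize {
    let mut failures = 0;
    for code in codes {
        if *code != 200 {
            failures += 1;
        }
    }
    failures
}
```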

Assignment in Conditionals

Assignment in Conditionals
The omission of an = in an == condition turns it into an assignment whose result is then
tested, so any non-zero value evaluates to true:

int result = getResponseCode();
if (result = 200) { // BUG!
// Success
}
else {
//... Process error
}

So here, result was assigned the value 200 rather than compared to the value 200.
Compilers should issue a warning for these cases, but an error would be better.
Developers might also try to reverse the left and right hand side to mitigate the issue:
if (200 = result) { // Compiler error
// Success
}
else {
// ... Process error
}

Now the compiler will complain because the value of result is being assigned to a constant
which makes no sense. This may work if a variable is compared to a constant but arguably it
makes the code less readable and wouldn't help if the left and right hand sides were both
assignable so their order didn't matter.
The goto fail example that we saw in section "Missing braces in conditionals" also
demonstrates the real world dangers of combining assignment and comparison into a single line:
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
goto fail;

This line itself is not broken, but it's easy to see how it might be, especially if this
pattern were repeated all over the place. The programmer might have saved a few lines of
code by combining everything in this way but at a greater risk. In this case, the risk might
be inadvertently turning the = into an ==, i.e. comparing err to the function call and then
comparing that to 0.
if ((err == SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
goto fail;

How Rust helps
This code just won't compile:
let mut result = 0;
if result = 200 { // Compile Error
//...
}

The only forms of assignment inside a conditional are the specialised and explicit if let
and while let forms which are explained elsewhere.
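
As a brief taste of those two forms, here is a sketch. The helper parse_response_code is
hypothetical and simply stands in for something like the getResponseCode() call in the C++
example.

```rust
// Hypothetical helper: yields a code only if the text is a valid number.
fn parse_response_code(line: &str) -> Option<u16> {
    line.trim().parse().ok()
}

fn is_success(line: &str) -> bool {
    // if let binds `code` only when the pattern matches, so the binding
    // cannot be confused with a comparison.
    if let Some(code) = parse_response_code(line) {
        code == 200
    } else {
        false
    }
}

fn drain_sum(mut stack: Vec<u32>) -> u32 {
    let mut sum = 0;
    // while let keeps looping for as long as pop() yields a value.
    while let Some(top) = stack.pop() {
        sum += top;
    }
    sum
}
```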

Class Member Initialisation

Class Member Initialisation
C++ does not require that you initialise all variables in every constructor.
A member that is a C++ class with its own default constructor doesn't need to be
initialised
A member that is a C++ class without a default constructor must be explicitly initialised.
A member that is a reference must be explicitly initialised
Primitive types, including pointers do not have to be initialised although the compiler
may warn if they are not
Members do not have to be listed in a constructor's initialiser list in the order they are
declared, although they are always initialised in declaration order
Some compilers may issue warnings if you forget to initialise members or their ordering, but
they will still compile the code.
C++11 allows classes to have default member initializers which are used in the absence of a
constructor setting the value to something else:
class Coords {
public:
    double x = 0.0;
    double y = 0.0;
    double z = 0.0;
    // 2D initializer, x and y are set with the inputs, z is set to 0
    Coords(double x, double y) : x(x), y(y) {}
};

This is obviously a lot easier to read and ensures that if we have multiple constructors that
we don't have to initialize members if the default value will do.

How Rust helps
You MUST initialise all members of a struct. If your code does not initialise a struct you will
get a compiler error.
This will not compile:

struct Alphabet {
a: i32,
b: u32,
c: bool,
}
let a = Alphabet { a: -10, c: true };

If you try you will get an error like this:
rustc 1.13.0-beta.1 (cbbeba430 2016-09-28)
error[E0063]: missing field `b` in initializer of `main::Alphabet`
  |
9 |     let a = Alphabet { a: -10, c: true };
  |             ^^^^^^^^ missing `b`

Forcing you to initialise the members of the struct ensures the struct is always in a
consistent predictable state.
Ordering of initialisation does not matter providing all of the fields are set.
Structs often implement a new() function which encapsulates this initialisation and acts
like a constructor in C++, e.g.
struct Coord {
    pub x: f64,
    pub y: f64,
    pub z: f64,
}
impl Coord {
    // The return type is required, without it the function returns ()
    pub fn new(x: f64, y: f64) -> Coord {
        Coord { x: x, y: y, z: 0f64 }
    }
}
//...
let coord1 = Coord::new(100f64, 200f64);

Alternatively the struct might implement one or more From<> traits:

impl From<(f64, f64)> for Coord {
fn from(value: (f64, f64)) -> Coord {
Coord { x: value.0, y: value.1, z: 0.0 }
}
}
impl From<(f64, f64, f64)> for Coord {
fn from(value: (f64, f64, f64)) -> Coord {
Coord { x: value.0, y: value.1, z: value.2 }
}
}

//...
let coord = Coord::from((10.0, 20.0));
let coord = Coord::from((10.0, 20.0, 30.0));

There can be multiple From trait implementations so we can implement a form of
polymorphism.
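
One way that polymorphism might be put to use is a function that accepts anything which can
be converted into a Coord. This is only a sketch: it repeats the Coord struct and From
implementations from above so that it compiles on its own, and distance_from_origin is a
made-up function.

```rust
#[derive(Debug, PartialEq)]
struct Coord {
    x: f64,
    y: f64,
    z: f64,
}

impl From<(f64, f64)> for Coord {
    fn from(value: (f64, f64)) -> Coord {
        Coord { x: value.0, y: value.1, z: 0.0 }
    }
}

impl From<(f64, f64, f64)> for Coord {
    fn from(value: (f64, f64, f64)) -> Coord {
        Coord { x: value.0, y: value.1, z: value.2 }
    }
}

// Implementing From gives us Into for free, so this function accepts a
// Coord, a 2-tuple or a 3-tuple.
fn distance_from_origin<C: Into<Coord>>(value: C) -> f64 {
    let c: Coord = value.into();
    (c.x * c.x + c.y * c.y + c.z * c.z).sqrt()
}
```

The caller picks the "constructor" simply by the shape of the value passed in, e.g.
distance_from_origin((3.0, 4.0)) or distance_from_origin((1.0, 2.0, 2.0)).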

Headers and Sources

Headers and Sources
A header file contains definitions of classes, types, macros etc that other files need to
#include in order to resolve their use of those things.
Splitting the implementation and definition across different files is an added burden for
maintaining code but it can also lead to some serious errors.
Headers used across multiple projects that have different compiler settings
Issues with pragmas and alignment
Issues with different #defines that affect byte length
Issues with different typedefs that affect byte length
Each consumer of the header must do so with the exact same settings that affect the size of
every type, struct and class in the file plus any issues with packing / alignment. If these
settings are not the same, it can cause instability, corruption or problems that only manifest
themselves at runtime.
Headers also make the compiler slower because source that consumes the header
inevitably pulls in other headers which pull in other headers.

Guard blocks / #pragma once
Headers will also be expanded as many times as they are #include'd. To prevent the
expansion happening more than once per source file, they're usually protected by guard
blocks.
#ifndef FOO_H
#define FOO_H
....
#endif

If the same header is included more than once, the second time through it is preprocessed
into nothing.

#pragma once

Most modern compilers also support a #pragma once directive. This allows the compiler to
completely ignore an #include which it knows it has already included at least once before
per source file.
This is more efficient than guard blocks because the compiler doesn't even bother opening or
processing the file again and just skips over it. There may be situations where this is not
suitable, but usually it results in faster compilation.

Precompiled Headers
Some compilers also support precompiled headers to speed up compilation. The compiler
builds a database lookup when compiling a single source file and subsequent source
compiles with reference to that database. This solution can speed up compilation but it
complicates the build process since one file has flags to generate the precompiled header
file and other sources have flags to reference it.

Pimpl pattern
A popular workaround for header issues is the Pimpl pattern. It is a way to separate a class
into a public part and a private implementation part.
The public class is almost an interface definition in its purity that can be defined in the
header with minimal dependencies. It forward references the implementation class and
stores it as a member:
#pragma once
// Gory details are in the .cpp file
class ComplexThingImpl;
class ComplexThing {
ComplexThingImpl *pimpl_;
public:
ComplexThing();
~ComplexThing();
// See note 1 below
void somethingReallyComplex();
};

The constructor for the outer class would allocate the implementation class and method calls
would call through to the inner.
The private implementation class is defined in the source file and can pull in as many extra
headers, pragmas or whatever else it needs without hurting consumers or compile times of the
header.
// source file
#include "random/header.hpp"
// Lots of includes here
#include <...>
#include "more/stuff.hpp"
class ComplexThingImpl {
    // Lots of member variables and stuff here
    // ...
public:
    void somethingReallyComplex();
};
void ComplexThingImpl::somethingReallyComplex() {
    // Lots of complex stuff here
    // ...
}
ComplexThing::ComplexThing() :
    pimpl_(new ComplexThingImpl()) {
}
ComplexThing::~ComplexThing() {
    delete pimpl_;
}
void ComplexThing::somethingReallyComplex() {
    pimpl_->somethingReallyComplex();
}

This solution is known as the Pimpl (private implementation) pattern and while it can work
to protect consumers and speed up builds it also adds complexity and overhead to
development. Instead of 2 definitions of a class to maintain (header / source) you now have
4(!) because there is a public and private impl class. Changing the signature of a method
means changing it in potentially 4 places, plus the line in the public class that invokes the
private counterpart.
One danger for Pimpl is that the private class is allocated from the heap. Code that uses a
lot of temporary Pimpl objects could contribute to heap fragmentation.
Note 1: Remember the rule of three? That applies to this object too. The example doesn't
show it but if we copy constructed or assigned ComplexThing to another instance we'd be in
a heap of trouble. So on top of the issues with making Pimpl work we also have to prevent
the other ones. The easiest way to lock it down would be to derive from boost::noncopyable
if you were using boost, or make the copy constructor private, or delete it in C++11.

How Rust helps
In Rust the definition and the implementation are the same thing. So immediately we have
exactly one thing to maintain.
Writing a function defines the function. Let's assume we have a functions.rs file
// functions.rs
pub fn create_directory_structure() {
// Implementation
}

Anyone can call it as functions::create_directory_structure(). The compiler will validate
the call is correct.
A struct's definition and its implementation are also written once. e.g. directory.rs

// directory.rs
pub struct Directory {
pub path: String,
}
impl Directory {
pub fn mkdir(&self) {
// implementation
}
}


Implementations can be defined in a private Rust module and only public structs exposed to
consumers.
If we were a library crate (which we'll call file_utils) wishing to expose these objects to
consumers we would write a top-level lib.rs which says what files our lib comprises
and what we want to expose.
// lib.rs for file_utils
mod functions;
mod directory;
pub use functions::*;
pub use directory::Directory;

Now a consumer can use our crate easily:
extern crate file_utils;
use file_utils::*;
fn main() {
create_directory_structure();
let d = Directory { /* ... */ };
}
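As a rough single-file sketch of the same idea, the file_utils layout above can be mimicked with inline modules standing in for separate .rs files. The private path field and the new() and path() functions are illustrative assumptions, not part of the original example. Private fields do the job that the Pimpl class does in C++: consumers cannot see or depend on them, yet there is no second class to maintain.

```rust
// A single-file sketch: inline modules stand in for separate .rs files.
mod file_utils {
    // Private module: consumers cannot name `directory` directly.
    mod directory {
        pub struct Directory {
            // Private field: an implementation detail hidden from consumers.
            path: String,
        }

        impl Directory {
            pub fn new(path: &str) -> Directory {
                Directory { path: path.to_string() }
            }

            pub fn path(&self) -> &str {
                &self.path
            }
        }
    }

    // Re-export only what consumers are meant to see.
    pub use self::directory::Directory;
}

use file_utils::Directory;

fn main() {
    let d = Directory::new("/tmp/example");
    println!("{}", d.path());
    // d.path = String::new(); // would not compile: field `path` is private
}
```

Changing the private field later only affects the code inside the module, which is the property Pimpl buys in C++ at the cost of an extra class.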


Forward Declarations
C++ prevents us from referring to a class or function which has not been declared yet. The
compiler will complain even if the class or function is in the same file it is referenced from.
This means ordering matters. If our function or class is used by other files, we have to
declare the function in a header. If our function is private to a source file, we have to declare
it in the source file, and possibly make it static.
For classes we can make a forward reference. This acts as a hint to the compiler to say a class
does exist with this name and it will be told about it shortly. But it's a hack and it imposes
limits on how we can use the forward declared class.
For example, DataManager below can hand out Data objects but the Data object has a
reference to the DataManager. Since each class refers to each other there is no simple way
to make the compiler happy except with a forward declaration.
class Data; // Forward declaration
class DataManager {
public:
Data *getDataById(const std::string &id);
};
class Data {
public:
Data(DataManager &dataManager);
};

But forward declaration compromises the design of the code. For example we couldn't hold
the Data objects in a collection class:
class Data;
class DataManager {
std::map<std::string, Data> data_;
public:
Data *getDataById(const std::string &id);
};


The compiler would complain because it doesn't know anything about the constructors or
size of Data. So instantly the design has to change because of a dumb compiler restriction.
e.g. we might store a pointer to Data instead in the map but then we'd have to remember to
delete it. So forward references increase the potential for bugs.
class Data;
class DataManager {
// Great, now we have to remember to new / delete Data and we
// increase memory fragmentation
std::map<std::string, Data *> data_;
public:
Data *getDataById(const std::string &id);
};

How Rust helps
In Rust forward declarations are unnecessary. A struct's or function's definition resides in a
.rs file and can be referenced with a use directive, regardless of the order in which items appear.
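A minimal sketch of the DataManager example in Rust might look like the following. The value field and the add() helper are assumptions made up for illustration, and the sketch leaves out the back-reference from Data to DataManager. The point is only that DataManager can hold Data by value in a collection even though Data is defined further down the file.

```rust
use std::collections::HashMap;

// DataManager refers to Data before Data is defined; no forward
// declaration is needed because item order does not matter in Rust.
struct DataManager {
    data: HashMap<String, Data>,
}

impl DataManager {
    fn new() -> DataManager {
        DataManager { data: HashMap::new() }
    }

    // Hypothetical helper that stores a Data by value in the map.
    fn add(&mut self, id: &str, value: i32) {
        self.data.insert(id.to_string(), Data { value });
    }

    fn get_data_by_id(&self, id: &str) -> Option<&Data> {
        self.data.get(id)
    }
}

// Defined after its first use above.
struct Data {
    value: i32,
}

fn main() {
    let mut manager = DataManager::new();
    manager.add("answer", 42);
    match manager.get_data_by_id("answer") {
        Some(data) => println!("found {}", data.value),
        None => println!("not found"),
    }
}
```

Returning Option<&Data> rather than a raw pointer also forces the caller to handle the "no such id" case.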


Namespace Collisions
C code has no namespaces at all and namespaces in C++ are optional.
C has learned to live without namespaces. Most C code uses prefixes on functions and
structs to avoid collisions, e.g. sqlite3_exec() is a function belonging to SQLite3. The
prefix stops the function colliding with exec() which is a standard POSIX function that
got there first. So the prefix acts as a pseudo namespace. But it adds noise to our code
and would not be necessary if namespaces were supported and enforced.
C++ makes them easy to declare but there is no compulsion for any code to bother or
to do so in anything but the most perfunctory way.
Macros are not affected by namespaces. For example, if TRUE and FALSE are defined
by some header they taint everything that #include's those definitions.

By default all C++ code resides in a global namespace:
void hello() {
// My function hello is in the global namespace, i.e. ::hello()
}
int main() {
// Main entry point
hello();
}

The function hello() is part of the global namespace. The call to it within main could be
replaced with calls to ::hello(). The problem of course is that the more code we write into
the global namespace, or the more libraries we pull in that have no namespaces, the more
chance there is of collisions.
Namespacing requires that code enclose the namespaced portion in a block.
namespace application {
// stuff in here belongs to application::
}
//...
application::App app("my app");


It is also easy to abuse namespaces, for example this happens sometimes and is NOT a
good idea:
// Inside of foo.h...
using namespace std;
//... all code after here is tainted with std

Any file that says #include "foo.h" will inadvertently tell the compiler to automatically look
up unscoped types and functions against std which may not be what the code wants at all.
Nested namespacing is also possible but it can look messy.
namespace application { namespace gui {
// stuff in here belongs to application::gui::
} }
//... eg.
application::gui::Point2d point(100,100);

If we forget to close a brace when nesting headers it becomes very easy to make C++ throw
up a wall of incoherent errors.

How Rust helps
In Rust every file is implicitly a module (equivalent to a namespace). You cannot NOT use
modules because you get them automatically.
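As a small sketch of what this means in practice, two modules can each define a function with the same name without any prefixing. The module names sqlite and process below are made up for illustration and are written inline here, whereas in a real project each would normally be its own file.

```rust
// Two modules may define the same name without colliding.
mod sqlite {
    pub fn exec() -> &'static str {
        "sqlite::exec"
    }
}

mod process {
    pub fn exec() -> &'static str {
        "process::exec"
    }
}

fn main() {
    // The module path says which exec() is meant.
    println!("{}", sqlite::exec());
    println!("{}", process::exec());
}
```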
If you have a collision between the names of crates or modules y


Macros
Macros in C/C++ are basically little rules that are defined by a preprocessor and substituted
into the code that the compiler ultimately attempts to compile.
Modern coding practice is to use inline functions and constants instead of macros.
But the reality is that macros can still be (ab)used and code often does so. For example code
might insert debug statements or logging which is compiled away in release mode.
Another common use is on Windows where the type TCHAR compiles to be either char or
wchar_t depending on #define UNICODE being present or not. Along with it go macros like
USES_CONVERSION, A2CT, T2CW etc. Code should compile cleanly either way but the reality
is usually it doesn't.
A classic problem would be something like this:
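#define SQUARED(x) x * x
// And in code
float result = SQUARED(++x);

That would expand to

float result = ++x * ++x;

So the value in result would be wrong and the value in x would be incremented twice.
Strictly, modifying x twice in one expression like this is undefined behaviour in C and C++,
so the compiler is free to produce any result at all.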

Compilation errors
Consider we are compiling this structure:
// Header
struct Tooltip {
#if TOOLTIP_VERSION > 4
char buffer[128];
#else
char buffer[64];
#endif
};


And in C++
Tooltip tooltip;
memset(&tooltip, 0, sizeof(tooltip));

If we fail to define TOOLTIP_VERSION to the same value in the implementation as in the caller,
then this code may stomp all over memory because it thinks the struct is 128 bytes in one
place and 64 bytes in another.

Namespace issues
Macros aren't namespaced and in some cases this leads to problems where a macro
definition collides with a well qualified symbol. For example code that #includes a Windows
header gets a #define TRUE 1. But that excludes any other code that expects to compile on
Windows from ever using TRUE as a const no matter how well they qualify it. Consequently
code has to do workarounds such as #undef'ing macros to make code work or using another
value.
#ifdef TRUE
#define TMP_TRUE TRUE
#undef TRUE
#endif
bool value = myapp::TRUE;
#ifdef TMP_TRUE
#define TRUE TMP_TRUE
#undef TMP_TRUE
#endif

Ugh. But more likely we'll rename myapp::TRUE to something like myapp::MYAPP_TRUE to
avoid the conflict. It's still an ugly workaround for a problem caused by inconsiderate use of
macros.
Commonly used words like TRUE, FALSE, ERROR, OK, SUCCESS, FAIL are more or less
unusable thanks to macros.

How Rust helps


Rust provides developers with consts, inline attributes, and platform / architecture attributes
for the purpose of conditional compilation.
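A brief sketch of those three tools follows. All of the names (myapp, TOOLTIP_BUFFER_LEN, platform_name, squared) are invented for illustration. Note that the constant TRUE lives inside the myapp module, so it cannot collide with a TRUE defined elsewhere.

```rust
// A constant is scoped to its module, unlike a C macro.
mod myapp {
    pub const TRUE: bool = true;
    pub const TOOLTIP_BUFFER_LEN: usize = 128;
}

// Conditional compilation: exactly one of these two functions is compiled in.
#[cfg(target_os = "windows")]
fn platform_name() -> &'static str {
    "windows"
}

#[cfg(not(target_os = "windows"))]
fn platform_name() -> &'static str {
    "not windows"
}

// An inlining hint, used where C or C++ code might have reached for a macro.
#[inline]
fn squared(x: f32) -> f32 {
    x * x
}

fn main() {
    let buffer = [0u8; myapp::TOOLTIP_BUFFER_LEN];
    println!(
        "{} {} {} {}",
        myapp::TRUE,
        buffer.len(),
        platform_name(),
        squared(3.0)
    );
}
```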
Rust offers macros but they consist of a set of matching rules that must generate
syntactically valid Rust. Macro expansion is performed by the compiler so it is capable of
generating errors on the macro if the macro is in error.
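To contrast with the SQUARED example, here is a sketch of a similar macro written with macro_rules!. The names squared! and next() are illustrative assumptions. Because the argument is matched as an expression and bound to a local variable, it is evaluated once, so a side effect in the argument happens once.

```rust
// The argument is matched as a whole expression and bound once,
// so it is evaluated exactly once.
macro_rules! squared {
    ($x:expr) => {{
        let value = $x;
        value * value
    }};
}

// Hypothetical helper with a side effect, standing in for ++x.
fn next(counter: &mut i32) -> i32 {
    *counter += 1;
    *counter
}

fn main() {
    let mut counter = 2;
    let result = squared!(next(&mut counter));
    // counter was incremented once (to 3) and result is 3 * 3.
    println!("result = {}, counter = {}", result, counter);
}
```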


Type Mismatching
Consider two methods. Both are called evaluate() and they are overloaded. The main()
method calls evaluate("Hello world"). What version is called in the compiled code?
#include <iostream>
#include <string>
using namespace std;
void evaluate(bool value) {
cout << "Evaluating a bool " << value << endl;
}
void evaluate(const std::string &value) {
cout << "Evaluating a string " << value << endl;
}
int main() {
evaluate("Hello world");
return 0;
}

It may surprise you to know that the bool version is called and the compiler doesn't even
complain about it either:
Evaluating a bool 1

This is an example of bad type inference. A string literal (a const char *) should be turned
into a std::string (a C++ string has a constructor that takes const char *) but the compiler
chose to treat it as a bool instead, because the built-in pointer-to-bool conversion outranks
the user-defined conversion to std::string during overload resolution.
On other occasions the compiler might spot ambiguity and complain but the blurred lines
between types in C++ combined with overloading lead to errors. Here is another example
where the compiler is a little more useful by generating an error, but in doing so it
demonstrates the limits of overloading:


bool evaluate(bool value);
bool evaluate(double value);

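These overloaded methods should be distinct but they're not distinct enough as far as the
compiler is concerned: a call such as evaluate(1) is ambiguous because the int argument
converts equally well to bool and to double.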
In summary, blurred and confusing rules about types in C++ can cause unexpected errors
that can propagate to runtime.

How Rust helps
In Rust the functions cannot be overloaded in this manner.
Rust is also more strict about type coercion - if you have a bool you cannot pass it to a
function that takes an integer.
Nor can you pass an integer of one size to a function taking an integer of another size.
fn print_i32(value: i32) {
println!("Your value is {}", value);
}
let value = 20i16; // 16-bit int
print_i32(value);

This will yield an error:
error[E0308]: mismatched types
  |
7 | print_i32(value);
  |           ^^^^^ expected i32, found i16

You must use an explicit numeric cast to turn the value into the type the function expects:
print_i32(value as i32);
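As a hedged aside, the as cast is not the only option. For widening conversions such as i16 to i32 the standard library also implements From, which only exists where the conversion cannot lose data. The sketch below uses a hypothetical describe_i32() function so that the result can be checked. It also shows that a narrowing as cast compiles but truncates.

```rust
// Hypothetical stand-in for print_i32 that returns the text instead.
fn describe_i32(value: i32) -> String {
    format!("Your value is {}", value)
}

fn main() {
    let small = 20i16;
    // Widening: i32 implements From<i16>, so this can never lose data.
    println!("{}", describe_i32(i32::from(small)));

    let big = 70_000i32;
    // Narrowing with `as` compiles but silently truncates to 16 bits.
    let truncated = big as i16;
    println!("{} as i16 is {}", big, truncated);
}
```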


Explicit / Implicit Class Constructors
It's not just overloading that can be a mess. C++ has a bunch of rules about implicit / explicit
type conversion for single argument constructors.
For example:
class MagicNumber {
public:
MagicNumber(int value) {}
};
void magic(const MagicNumber &m) {
//...
}
int main() {
//...
magic(2016);
return 0;
}

The function magic() takes a const MagicNumber & yet we called it with 2016 and it still
compiled. How did it do that? Well our MagicNumber class has a constructor that takes an
int so the compiler implicitly called that constructor and used the MagicNumber it yielded.

If we didn't want the implicit conversion (e.g. maybe it's horribly expensive to do this without
knowing), then we'd have to tack an explicit keyword to the constructor to negate the
behaviour.
explicit MagicNumber(int value) {}

It demonstrates an instance where the default behavior is probably wrong. The default
should be explicit and if programmers want implicit they should be required to say it.

C++11 adds to the confusion by allowing classes to declare deleted constructors which are
anti-constructors that generate an error instead of code if they match. For example, perhaps
we only want implicit int constructors to match but we want to stop somebody passing in a
double. In that case we can make a constructor for double and then delete it.


class MagicNumber {
public:
MagicNumber(int value) {}
MagicNumber(double value) = delete;
};
void magic(const MagicNumber &m) {
//...
}
//...
magic(2016); // OK
magic(2016.0); // error: use of deleted function 'MagicNumber::MagicNumber(double)'

How Rust helps
Rust does not have constructors and so there is no implicit conversion during construction.
And since there is no implicit conversion there is no reason to have C++11 style deleted
functions either.
You must write explicit "constructor" functions and call them explicitly. If you want to
overload the function you can use Into<> patterns to achieve it.
For example we might write our MagicNumber constructor like this:

struct MagicNumber { /* ... */ }
impl MagicNumber {
fn new<T>(value: T) -> MagicNumber where T: Into<MagicNumber>
{
value.into()
}
}

We have said here that the new() function takes as its argument any type T which
implements the trait Into<MagicNumber>.
So we could implement it for i32:


impl Into<MagicNumber> for i32 {
fn into(self) -> MagicNumber {
MagicNumber { /* ... */ }
}
}

Now our client code can just call new and, providing it passes a type which implements
that trait, our constructor will work:
let magic = MagicNumber::new(2016);
// But this won't work because f64 doesn't implement the Into<MagicNumber> trait
let magic = MagicNumber::new(2016.0);
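Putting the pieces together, a complete version might look like the sketch below. Two things here are assumptions rather than part of the original example: the value field, and the choice to implement From<i32> for MagicNumber instead of implementing Into directly. The standard library provides Into<MagicNumber> for i32 automatically once From<i32> exists, so new() is unchanged.

```rust
#[derive(Debug, PartialEq)]
struct MagicNumber {
    // Hypothetical field; the original leaves the struct body elided.
    value: i32,
}

// Implementing From<i32> also provides Into<MagicNumber> for i32.
impl From<i32> for MagicNumber {
    fn from(value: i32) -> MagicNumber {
        MagicNumber { value }
    }
}

impl MagicNumber {
    fn new<T>(value: T) -> MagicNumber
    where
        T: Into<MagicNumber>,
    {
        value.into()
    }
}

fn main() {
    let magic = MagicNumber::new(2016);
    println!("{:?}", magic);
    // let bad = MagicNumber::new(2016.0); // would not compile: no From<f64>
}
```

The conversion is still opt-in per type: nothing converts to MagicNumber unless someone has written an impl for it.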


Poor Lifetime Enforcement
A function like this is completely legal and dangerous:
std::string &getValue() {
std::string value("Hello world");
return value;
}

This function returns a reference to a local variable. Whoever calls it will get a
reference to garbage on the stack. Even if it appears to work (e.g. if we used the reference
immediately) it is only through luck that it does.
Our compiler will probably issue a warning for this trivial example but it won't stop us from
compiling it.

How Rust helps
Rust tracks the lifetime of all objects and knows when their lifetime begins and ends. It
tracks references to the object, knows when it is being borrowed (being passed to a function
/ scope).
It generates a compiler error if it detects any violations of its lifetime / borrowing rules. So the
above code would fail to compile.
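A sketch of how the getValue() example might be ported is shown below. The function names are illustrative. The direct translation is left as a comment because it does not compile. The working alternatives are to return the value itself, or to return a reference only when it borrows from something the caller already owns.

```rust
// Returning a reference to a local does not compile:
//
// fn get_value() -> &String {
//     let value = String::from("Hello world");
//     &value
// }
//
// Returning the value itself moves ownership to the caller instead.
fn get_value() -> String {
    let value = String::from("Hello world");
    value
}

// A reference may be returned when it borrows from an argument,
// because the caller's value outlives the call.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let value = get_value();
    println!("{}", first_word(&value));
}
```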


Memory Allocation
Allocated memory is memory that is requested from a portion of memory called a heap, used
for some purpose and returned to the free space when it is no longer required.
In C memory is allocated and freed through a relatively simple API: malloc and calloc
allocate memory and free releases it.
However C++ also needs allocators that call the appropriate constructors and destructors so
in addition to C's memory allocation functions, there are keywords for allocation / free:
new / delete for C++ class instances
new[] and delete[] for arrays of classes
The above but through scoped / shared pointer classes that take ownership of the
pointer and free it when appropriate.
If we fail to free / delete memory that we've allocated, the program will leak memory. If we
free / delete memory we've already deallocated, the program may crash. If we free a C++
class with a C free() the program may leak memory because any member variables will
not be destroyed properly. If we fail to call the correct constructor and destructor pair the
program may leak / crash.
A cottage industry of tools has sprung up just to try and debug issues with memory leaks,
crashes and so forth. Tools like Valgrind etc. specialise in trying to figure out who allocated
something without freeing it.
For example, what's wrong with this?
std::string *strings = new std::string[100];
//...
delete strings;

Oops we allocated an array of strings with new[] but called delete instead of delete[].
So instead of deleting an array of strings we called delete on the first member. At best 99 of
those strings' destructors will never be called; strictly the mismatch is undefined behaviour.
We should have written:
delete[] strings;

But the compiler doesn't care and so we have created a potentially hard-to-find bug.


Some of the problems with memory allocation can be mitigated by wrapping pointers with
scoped or shared pointer classes. But there are even problems which can prevent them from
working.
It's not a good idea to allow memory allocation to cross a library boundary. So many libraries
provide new / free functions through their API. Issues about balancing calls apply to them
too.

How Rust helps
During normal safe programming Rust has no explicit memory allocation or deallocation. We
simply declare an object and it continues to exist until it goes out of scope (i.e.
nothing owns it any more).
This is NOT garbage collection. The compiler tracks the lifetime of the object and generates
code to automatically delete it at the point it is no longer used. The compiler also knows if we
enclose an object's declaration inside a Box, Rc or similar construct that the object
should be allocated on the heap and otherwise it should go on the stack.
Manual allocation / deallocation is only available in unsafe programming. We would not
ordinarily do this except when we are interacting with an external library or function call and
explicitly tag the section as unsafe.
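The following sketch illustrates the point with a made-up Point struct and sum_of_points() function. It mixes a stack value, a boxed heap value and a growable Vec, and contains no call that frees anything. Everything is released when the function returns.

```rust
struct Point {
    x: i32,
    y: i32,
}

fn sum_of_points() -> i32 {
    // Lives on the stack.
    let a = Point { x: 1, y: 2 };
    // Box places the Point on the heap; there is no explicit new / delete.
    let b = Box::new(Point { x: 3, y: 4 });
    // A Vec owns a heap buffer that grows as needed.
    let mut values = Vec::new();
    values.push(a.x + a.y);
    values.push(b.x + b.y);
    values.iter().sum()
    // a, b and values are all freed here, when they go out of scope.
}

fn main() {
    println!("{}", sum_of_points());
}
```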


Null Pointers
The need to test a pointer for NULL, or blindly dereference a pointer that might be NULL, has
caused so many errors that it has even been called the billion dollar mistake.
TODO


Virtual Destructors
C++ allows classes to inherit from other classes.
In some cases, such as this example, this can lead to memory leaks:
class ABase {
public:
~ABase() {}
};
class A : public ABase {
std::string *value_;
public:
A() : value_(new std::string) {}
~A() { delete value_; }
};
void do_something() {
ABase *instance = new A();
//...
delete instance;
}

So here we allocate a pointer to A, assign it to "instance" which is of type ABase, do
something with it and finally delete it. It looks fine but we just leaked memory! When we
called "delete instance" the code invoked the destructor ~ABase() and NOT the destructor
~A(). And value_ was not deleted and the memory leaked. Even if we'd used a scoped
pointer to wrap value_ it would still have leaked.

The code should have said
class ABase {
public:
virtual ~ABase() {}
};


The compiler didn't care our code was in error. It just allowed us to leak for the sake of a
missing keyword.

How Rust helps
Rust does not use inheritance so problems like ABase above cannot exist. In Rust ABase
would be declared as a trait that A implements.

trait ABase {
//...
}
struct A {
value: String,
}
impl ABase for A {
//...
}

Rust also allows our struct to implement another trait called Drop which is equivalent to a
C++ destructor.
impl Drop for A {
fn drop(&mut self) {
println!("A has been dropped!");
}
}

It allows our code to do something during destruction such as to free an open resource, log
a message or whatever.
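To connect this back to the C++ example, the sketch below handles an A purely through the ABase trait, as the C++ code did through an ABase pointer, and checks that A's drop() still runs. The dropped flag, the name() method and the drop_through_trait() function are assumptions added so the effect can be observed. There is no equivalent of forgetting the virtual keyword.

```rust
use std::cell::Cell;
use std::rc::Rc;

trait ABase {
    fn name(&self) -> &'static str;
}

struct A {
    // Shared flag so the caller can observe that drop() ran.
    dropped: Rc<Cell<bool>>,
}

impl ABase for A {
    fn name(&self) -> &'static str {
        "A"
    }
}

impl Drop for A {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Creates an A, handles it only through the trait, then lets it go.
fn drop_through_trait() -> bool {
    let dropped = Rc::new(Cell::new(false));
    {
        let instance: Box<dyn ABase> = Box::new(A { dropped: dropped.clone() });
        println!("using {}", instance.name());
    } // instance goes out of scope here and A's drop() runs
    dropped.get()
}

fn main() {
    println!("dropped: {}", drop_through_trait());
}
```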


Exception Handling / Safety
There are no hard and fast rules for when a function in C++ should throw an exception and
when it should return a code. So one codebase may have a tendency to throw lots of
exceptions while another might throw none at all.
Aside from that, code may or may not be exception safe. That is, it may or may not free up
its resources if it suffers an exception.
Articles have been written to describe the levels of guarantees that code can aim for with
exception safety.

Constructors
You may also be advised to throw exceptions in constructors because there is no easy way
to signal that the object is in error otherwise, except to set the new object into some kind of
zombie / dead state via a flag that has to be tested.
DatabaseConn::DatabaseConn() {
db_ = connect();
if (db_ == NULL) {
throw string("The database connection is null");
}
}
// These both recover their memory
DatabaseConn db1;
DatabaseConn *db2 = new DatabaseConn();

But if DatabaseConn() had allocated some memory before throwing an exception, this would
NOT be recovered. The destructor ~DatabaseConn does not run for an object whose
constructor threw, so the constructor would have to clean up before throwing.


DatabaseConn::DatabaseConn() {
buffer_ = new char[100];
// ... exception throwing code
}
DatabaseConn::~DatabaseConn() {
if (buffer_) {
delete[] buffer_;
}
}

But if we waited until after the exception throwing to allocate memory then maybe buffer_ is
not set to NULL, so we'd have to ensure we initialised it to NULL.
DatabaseConn::DatabaseConn() : buffer_(NULL) {
// ... exception throwing code
buffer_ = new char[100];
}

Destructors
But you will be advised NOT to throw exceptions in destructors because throwing an
exception during a stack unwind from handling another exception is fatal.
BadNews::~BadNews() {
if (ptr == NULL) {
throw string("This is a bad idea");
}
}

How Rust helps
The recommended way of dealing with errors is to use the Option and Result types to
formally pass errors to your caller.
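As a sketch of what that looks like for the DatabaseConn example, the "constructor" below returns a Result instead of throwing. The ConnError type, the address field and the rule that an empty address fails are all assumptions made for illustration. No real database is involved.

```rust
#[derive(Debug, PartialEq)]
enum ConnError {
    EmptyAddress,
}

#[derive(Debug)]
struct DatabaseConn {
    address: String,
}

impl DatabaseConn {
    // A "constructor" that reports failure through its return type
    // instead of throwing.
    fn connect(address: &str) -> Result<DatabaseConn, ConnError> {
        if address.is_empty() {
            return Err(ConnError::EmptyAddress);
        }
        Ok(DatabaseConn { address: address.to_string() })
    }
}

fn main() {
    match DatabaseConn::connect("localhost") {
        Ok(conn) => println!("connected to {}", conn.address),
        Err(err) => println!("failed: {:?}", err),
    }
    // The failure case is an ordinary value the caller has to deal with.
    assert!(DatabaseConn::connect("").is_err());
}
```

A half-constructed object never exists here: either the caller gets a complete DatabaseConn or it gets an error value.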


For irregular errors your code can choose to invoke panic!() which is a little like an
exception in that it will cause the entire thread to unwind. If the main thread panics then the
process terminates.
A panic!() can be caught and recovered from in some scenarios but it is the nuclear
option.
Lacking exceptions might seem a bad idea but C++ demonstrates that they come with a
whole raft of considerations of their own.


Templates vs Generics
What's a template?
C++ provides a way of substituting types and values into inline classes and functions called
templates. Think of it as a sophisticated substitution macro - you specify a type T in the
template and this can substitute for a type int or something else at compile time. During
compilation you'll be told if there are any errors with the type you supply. This is a very
powerful feature since it allows a class to be reused for many different types.
Templates are used extensively in the C++ library, Boost and in other places. Collections,
strings, algorithms and various other pieces of code use templates in one form or another.
However, templates only expand into code when something actually calls the inline function.
Then, if the template calls other templates, the inline code is expanded again and again until
there is a large body of code which can be compiled. A small error in our code can
propagate into an enormous wall of noise in the middle of some expanded template.
For example a vector takes a type it holds as a template parameter. So we can create a
vector of PatientRecords.
class PatientRecord {
std::string name_;
PatientRecord() {}
PatientRecord operator= (const PatientRecord &other) { return *this; }
public:
PatientRecord(const std::string &name) : name_(name) {
}
};
...
std::vector<PatientRecord> records;

So far so good. So let's add a record:
records.push_back(PatientRecord("John Doe"));


That works too! Now let's try to erase the record we just added:
records.erase(records.begin());

Boom!
c:/mingw/i686-w64-mingw32/include/c++/bits/stl_algobase.h: In
instantiation of 'static _OI std::__copy_move::__copy_m(_II, _II, _OI) [with
_II = PatientRecord*; _OI = PatientRecord*]':
c:/mingw/i686-w64mingw32/include/c++/bits/stl_algobase.h:396:70:

required from

'_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove =
true; _II = PatientRecord*; _OI = PatientRecord*]'
c:/mingw/i686-w64mingw32/include/c++/bits/stl_algobase.h:434:38:

required from

'_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove =
true; _II = __gnu_cxx::__normal_iterator >; _OI =
__gnu_cxx::__normal_iterator >]'
c:/mingw/i686-w64mingw32/include/c++/bits/stl_algobase.h:498:47:

required from

'_OI std::move(_II, _II, _OI) [with _II =
__gnu_cxx::__normal_iterator >; _OI =
__gnu_cxx::__normal_iterator >]'
c:/mingw/i686-w64-mingw32/include/c++/bits/vector.tcc:145:2:
required from 'std::vector<_Tp, _Alloc>::iterator
std::vector<_Tp, _Alloc>::_M_erase(std::vector<_Tp,
_Alloc>::iterator) [with _Tp = PatientRecord; _Alloc =
std::allocator; std::vector<_Tp,
_Alloc>::iterator = __gnu_cxx::__normal_iterator >; typename std::_Vector_base<_Tp,
_Alloc>::pointer = PatientRecord*]'
c:/mingw/i686-w64-mingw32/include/c++/bits/stl_vector.h:1147:58:
required from 'std::vector<_Tp, _Alloc>::iterator
std::vector<_Tp, _Alloc>::erase(std::vector<_Tp,
_Alloc>::const_iterator) [with _Tp = PatientRecord; _Alloc =


std::allocator; std::vector<_Tp,
_Alloc>::iterator = __gnu_cxx::__normal_iterator >; typename std::_Vector_base<_Tp,
_Alloc>::pointer = PatientRecord*; std::vector<_Tp,
_Alloc>::const_iterator = __gnu_cxx::__normal_iterator >; typename
__gnu_cxx::__alloc_traits::_Tp_alloc_type>::const_pointer = const PatientRecord*]'
..\vectest\main.cpp:22:34:

required from here

..\vectest\main.cpp:8:19: error: 'PatientRecord
PatientRecord::operator=(const PatientRecord&)' is private
PatientRecord operator= (const PatientRecord &other) {
return *this; }

If you wade through that noise to the bottom we can see the erase() function wanted to call
the assignment operator on PatientRecord, but couldn't because it was private.
But why did vector allow us to declare a vector with a class which didn't meet its
requirements?
We were able to declare the vector, use the std::vector::push_back() function but when we
called std::vector::erase() the compiler discovered some deeply nested error and threw
these errors back at us.
The reason is that C++ only generates code for a template's member functions when they are
called. So the declaration was not in violation, the push_back() was not in violation but the
erase() was.

How Rust helps
Rust has a concept similar to templates called generics. A generic is a struct, trait or function
that takes type parameters just like a template.
However the type parameters can be constrained by naming the traits that they must
implement. In addition any errors are meaningful.
Say we want to write a generic function that clones the input value:
fn clone_something<T>(value: T) -> T {
value.clone()
}


We haven't even called the function yet, merely defined it. When we compile this, we'll
instantly get an error in Rust.
error: no method named `clone` found for type `T` in the current scope
  |
4 |     value.clone();
  |           ^^^^^
  |
  = help: items from traits can only be used if the trait is
implemented and in scope; the following trait defines an item
`clone`, perhaps you need to implement it:
  = help: candidate #1: `std::clone::Clone`

Rust is saying we never said what T was and because some-random-type has no method
called clone() we got an error. So we'll modify the function to add a trait bound to T. This
bound says T must implement Clone:
fn clone_something<T: Clone>(value: T) -> T {
value.clone()
}

Now the compiler knows T must implement Clone it is able to resolve clone() and be
happy. Next we actually call it to see what happens:
struct WhatHappensToMe;
let x = clone_something(10);
let y = clone_something(WhatHappensToMe{});

We can clone the integer 10 because integers implement the Clone trait, but our empty
struct WhatHappensToMe does not implement Clone trait. So when we compile it we get an
error.


error[E0277]: the trait bound `main::WhatHappensToMe:
std::clone::Clone` is not satisfied
  |
8 | let y = clone_something(WhatHappensToMe{});
  |         ^^^^^^^^^^^^^^^
  |
  = note: required by `main::clone_something`

In summary, Rust improves on templates by:
Compiling generic functions / structs even when they are unused and offering meaningful errors
immediately.
Allowing us to bind traits to generic types to constrain what we can pass into them.
Offering meaningful errors if we violate the requirements of the trait bounds.
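For completeness, here is a sketch of the version that does compile. It differs from the snippets above in two assumed details: clone_something() takes a reference so the original is still usable afterwards, and the struct (a Rust take on PatientRecord) opts in to Clone with a derive attribute.

```rust
// The bound `T: Clone` is checked when the function is defined
// and again at every call site.
fn clone_something<T: Clone>(value: &T) -> T {
    value.clone()
}

// Deriving Clone is what makes this type acceptable to clone_something.
#[derive(Clone, Debug, PartialEq)]
struct PatientRecord {
    name: String,
}

fn main() {
    let x = clone_something(&10);
    let record = PatientRecord { name: String::from("John Doe") };
    let copy = clone_something(&record);
    println!("{} {:?}", x, copy);
}
```

Removing the derive line brings back the E0277 error at the call site, not somewhere deep inside library code.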


Multiple Inheritance
C++ allows code to inherit from multiple classes and they in turn could inherit from other
classes. This gives rise to the dreaded diamond pattern.
e.g. D inherits from B and C but B and C both inherit from A. So does D have two instances
of A or one?
This can cause compiler errors which are only partially solved by using something called
"virtual inheritance" to convince the compiler to share A between B and C.
i.e. if we knew B and C could potentially be multiply inherited we might declare them with a
virtual keyword in their inheritance:
class B : public virtual A {
//...
};
class C: public virtual A {
};
class D: public B, public C {
//...
};

When D inherits from B and C, both share the same instance of A. But that assumes the
authors of A, B and C were aware of this problem arising and coded themselves with the
assumption that A could be shared.
The more usual solution for diamond patterns is "don't do it", i.e. use composition or
something to avoid the problem.

How Rust helps
Rust also does not use class inheritance so problems like diamond patterns cannot exist.
However traits in Rust can inherit from other traits, so potentially it could have diamond-like
issues. But to ensure it doesn't, the base trait is implemented separately from any traits that
inherit from it.
So if struct D implements traits B & C and they inherit from A, then A, B and C must have
impl blocks.


trait A {
//...
}
trait B : A {
//...
}
trait C : A {
//...
}
struct D;
impl A for D {
//...
}
impl B for D {
//...
}
impl C for D {
//...
}
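A filled-in sketch of the same shape is shown below, with method names (name, b, c) invented for illustration. B and C both rely on A, but since traits carry no data and A is implemented for D exactly once, there is only ever one A to talk about.

```rust
trait A {
    fn name(&self) -> String;
}

// B and C both require A but cannot carry their own copy of it.
trait B: A {
    fn b(&self) -> String {
        format!("B of {}", self.name())
    }
}

trait C: A {
    fn c(&self) -> String {
        format!("C of {}", self.name())
    }
}

struct D;

// Exactly one implementation of A exists for D.
impl A for D {
    fn name(&self) -> String {
        String::from("D")
    }
}

// The default method bodies from B and C are used as-is.
impl B for D {}
impl C for D {}

fn main() {
    let d = D;
    println!("{} / {}", d.b(), d.c());
}
```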


Linker Errors
C and C++ require you to supply a list of all the .obj files that form part of your library or
executable.
If you omit a file by accident you will get undefined or missing references. Maintaining this
list of files is an additional burden of development: you must remember to update your
makefile or solution every time you add a file to your project.

How Rust Helps
Rust includes everything in your library / executable that is directly or indirectly referenced
by mod commands, starting from your toplevel lib.rs or main.rs and working all the way
down.
Providing you reference a module, it will be automatically built and linked into your binary.
If you use the cargo command, then the above also applies for external crates that you link
with. The cargo command will also check for version conflicts between external libraries. If
you find your cargo generating errors about compatibility conflicts between crates you may
be able to resolve them by updating the Cargo.lock file like so:
cargo update


Debugging Rust
Rust compiles into machine code the same as C and benefits from sharing the same ABI
and compiler backend formats as C/C++.
So you can debug Rust in the same way as C/C++. If you built your Rust executable in a gcc
compatible binary format you can just invoke gdb on it:
gdb my_executable

Rust comes with a gdb wrapper script called rust-gdb that loads pretty-printers so that
Rust types are displayed more readably.

Enabling backtrace
If your code is crashing because of a panic!() you can get a backtrace on the console by
setting the RUST_BACKTRACE environment variable.

# Windows
set RUST_BACKTRACE=1
# Unix/Linux
export RUST_BACKTRACE=1

Find out your target binary format
If you are in doubt what you are targeting, you may use rustup to show you.

c:\dev\visu>rustup show
Default host: x86_64-pc-windows-msvc
stable-x86_64-pc-windows-msvc (default)
rustc 1.13.0 (2c6933acc 2016-11-07)

Or perhaps:


[foo@localhost ~]$ rustup show
Default host: x86_64-unknown-linux-gnu
stable-x86_64-unknown-linux-gnu (default)
rustc 1.13.0 (2c6933acc 2016-11-07)

The information will tell you which debugger you can use to debug your code.

Microsoft Visual Studio
If you have the MSVC toolchain (32 or 64-bit), the LLVM backend will generate a .pdb file
and binaries will be compatible with the standard MSVC runtime.
To debug your code:
1. Open Visual Studio
2. Choose File | Open | Project/Solution...
3. Select the compiled executable
4. Open a source file to debug and set a breakpoint
5. Click the "Start" button

GDB
GDB can be invoked directly from the command line or through a plugin / IDE. From the
command line it's a
TODO

LLDB
TODO


Memory Management
The memory model of Rust is quite close to C++. Structures that you declare in Rust reside
on the stack or they reside in the heap.

Stack
The stack is a region of memory reserved by the operating system for each thread in your
program. Stack is reserved for local variables based upon their predetermined size by moving
a stack pointer register forward by that amount. When the local variables go out of scope, the
stack pointer moves back by the same amount.
// Stack allocated
double pi = 3.141592735;
{
// Stack pointer moves as values goes in and out of scope
int values[20] = { 0, 1, 2, 3, ... , 19, 20 };
}

In C-style languages it is normal for the stack in each thread to be a single contiguous slab
of memory that represents the "worst case" scenario for your program i.e. you will never
need any more stack than the thread allocated at start. If you do exceed the stack, then you
cause a stack overflow.
Some languages support the concept of split or segmented stack. In this case, the stack is a
series of "stacklets" joined together by a linked list. When the stack is insufficient for the next
call, it allocates another stacklet.
gcc can support a segmented stack, but it greatly complicates stack unwinding when an exception is thrown and also when calls are made across linker boundaries, e.g. between a segmented-stack aware process and a non-segmented-stack dynamic library.

Stack Overflows
The main worry when using the stack is the possibility of a stack overflow, i.e. the stack pointer moves out of the memory reserved for the stack and starts trampling on other memory.

This can occur in two common ways, in isolation or in combination:
Deeply nested function calls, e.g. a recursive function that traverses a binary tree, or a recursive function that never stops
Exhausting the stack by using excessive and/or large local variables in functions, e.g. lots of 64KB byte arrays.

C++
Some C++ compilers won't catch an overflow at all. They have no guard page and thus allow the stack pointer to just grow wherever memory takes it until the program is destabilized and crashes.
The gcc compiler has support for segmented stacks but, as described earlier, not without issue.
The MSVC compiler adds a guard page and stack pointer checks when the stack pointer could advance more than a page in a single jump and potentially miss the guard page.

Rust
Rust used to support a segmented stack as a means of detecting memory violation but since
1.4 has replaced it with a guard page at the end of the stack space. If the guard page is
touched by a memory write, it will generate a segmentation fault that halts the thread. Guard
pages open up a small risk that the stack could grow well in excess of the guard and it might
take some time for a write to the guard to generate a fault.
Rust aspires to support stack probe code generation on all platforms at which point it is likely
to use that in addition to a guard page. A stack probe is additional generated code on
functions that use more than a page of space for local variables to test if they exceed the
stack.
Rust reduces the risk of stack overflows in some indirect ways. In C++ it is easy, through inheritance or by calling a polymorphic method, to inadvertently set off a recursive loop.

Heap
The heap is memory that the language runtime requests from the operating system and makes available to your code through memory allocation calls.
C++

char * v = (char *) malloc(128);
memset(v, 0, 128);
strcpy(v, "Hello world");
//...
free(v);
double *values = new double[10];
for (int i = 0; i < 10; i++) {
    values[i] = double(i);
}
delete []values;

Allocation simply means a portion of the heap is marked as in-use and the code is provided with a pointer to the reserved area to do what it likes with. Free causes the portion to be returned to its free state, coalescing with any free areas that it resides next to in memory.
A heap can grow, and code might create multiple heaps and might even be compelled to in order to control problems such as heap fragmentation.

Rust
To allocate memory on the heap in Rust you declare data inside of a box. For example to create a 1k block of bytes:
let x: Box<[u8]> = Box::new([0; 1024]);

Many structs in std:: and elsewhere will have a stack based portion and also use the heap internally to hold their buffers.
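As a rough illustration of that stack / heap split, the sketch below boxes a buffer and inspects a Vec. The helper name make_buffer is hypothetical, and the size printed for the Vec handle reflects current implementations rather than a language guarantee.

```rust
use std::mem::size_of;

// Hypothetical helper: allocate a zeroed 1 KB buffer on the heap.
fn make_buffer() -> Box<[u8]> {
    Box::new([0u8; 1024])
}

fn main() {
    let buffer = make_buffer();
    println!("buffer holds {} bytes on the heap", buffer.len());

    // The Vec handle itself lives on the stack; its elements live on the heap.
    let values: Vec<u64> = vec![1, 2, 3];
    println!("Vec handle occupies {} bytes on the stack", size_of::<Vec<u64>>());
    println!("it manages {} elements on the heap", values.len());
}
```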

Heap fragmentation
Heap fragmentation happens when contiguous space in the heap is limited by the pattern of
memory allocations that it already contains. When this happens a memory allocation can fail
and the heap must be grown to make it succeed. In systems which do not have virtual
memory / paging, memory exhaustion caused by fragmentation can cause the program or
even the operating system to fail completely.

The easiest way to see fragmentation is with a simple example. We'll pretend there are no housekeeping structures, guard blocks or other things to get in the way. Imagine a 10 byte heap, where every byte is initially free.
Now allocate 5 bytes for an object of type A. The heap reserves 5 bytes and marks them used.

Now allocate 1 byte for an object of type B. This is also marked used.

Now free object A. That portion of the heap is marked unused. Now we have a block of 5 bytes free and a block with 4 bytes free.

Now allocate 2 bytes for an object of type C. Now we have a block of 3 bytes free and a block with 4 bytes free.

Now allocate 5 bytes for another object of type A - Oops we can't! The heap has 7 bytes free but they are not contiguous. At this point the runtime would be forced to grow the heap, i.e. ask the operating system for another chunk of memory, at which point it can allocate 5 bytes for A.

The above assumes the heap is contiguous, or that memory paging makes it seem so. On some systems, it might be that the heap is a linked list of chunks, in which case the allocated space for A would have to reside in a single chunk, the newly allocated portion above.
This is also an exaggerated example, but it demonstrates how a heap can have space, but not enough to fulfil allocations without growing.
Software running in embedded devices is particularly vulnerable to fragmentation because such devices do not have virtual memory, have low physical memory and normally have to run for days, weeks or years at a time.
One major problem for C++ is that heap fragmentation is almost impossible to avoid. The
standard template library allocates memory for virtually all string and collection work, and if a
string / collection grows then it may have to reallocate more memory.
The only way to mitigate the issue is to choose the best collection, and to reserve capacity
wherever possible.

std::vector<double> values;
values.reserve(10);
for (int i = 0; i < 10; i++) {
    values.push_back(double(i));
}

Rust also has this issue and strings / collections have methods to reserve capacity. But as a consequence of its design it prefers the stack over the heap. Unless you explicitly put a value into a heap-allocating type such as Box, Rc or Arc, or into a collection such as Vec or String, it is not allocated on the heap.
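A minimal sketch of reserving capacity in Rust, mirroring the C++ reserve() example. The function name build_values is hypothetical; Vec::with_capacity, String::reserve and capacity() are standard library calls.

```rust
// Build a vector of n values, reserving the space up front so that
// pushing the elements does not trigger repeated reallocation.
fn build_values(n: usize) -> Vec<f64> {
    let mut values = Vec::with_capacity(n);
    for i in 0..n {
        values.push(i as f64);
    }
    values
}

fn main() {
    let values = build_values(10);
    println!("len = {}, capacity = {}", values.len(), values.capacity());

    // Strings can reserve capacity in the same way.
    let mut s = String::new();
    s.reserve(64);
    s.push_str("Hello world");
    println!("string capacity is at least {}", s.capacity());
}
```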

RAII
RAII stands for Resource Acquisition Is Initialization. It's a programming pattern that ties access to some resource to an object's lifetime.
C++ classes allow this pattern. A class constructor acquires some resource, the destructor releases that resource. As soon as the object goes out of scope, the resource is released.
TODO C++ example
Rust is inherently RAII and enforces it through lifetimes. When an object goes out of scope,
the thing it holds is released. Rust also allows the programmer to explicitly drop a struct
earlier than its natural lifetime if there is a reason to.
RAII is most commonly seen for heap allocated memory but it can apply to files, system
handles etc.
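One way to picture this in Rust is a guard type that implements Drop. The sketch below is illustrative only: Guard, acquire and run are hypothetical names, and the "resource" is simply a log of events so the release order is visible.

```rust
use std::cell::RefCell;

// Hypothetical resource guard: records when it is acquired and released.
struct Guard<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> Guard<'a> {
    fn acquire(name: &'static str, log: &'a RefCell<Vec<String>>) -> Guard<'a> {
        log.borrow_mut().push(format!("acquire {}", name));
        Guard { name, log }
    }
}

impl<'a> Drop for Guard<'a> {
    // Called automatically when the guard goes out of scope or is dropped.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Guard::acquire("a", &log);
        let b = Guard::acquire("b", &log);
        drop(b); // explicitly released before the end of the scope
        log.borrow_mut().push("end of scope".to_string());
    } // _a is released here
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{}", line);
    }
}
```

The same shape applies to files, sockets or locks: the release logic lives in drop() and callers cannot forget to run it.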
TODO Rust example

Rust's standard library
The core functionality in Rust is provided by a module called std. This is the standard runtime library.
As with its C++ namesake, everything can be referenced through a std:: namespace prefix or via a use std::{foo} import.
Some of std is implicitly available by a special std::prelude that is automatically used (along with a reference to the std crate) without declaration. The prelude contains functionality that virtually all code is likely to use and therefore Rust spares code from having to import it:
String and ToString trait
Iterator traits of various kinds - Iterator, Extend, IntoIterator etc.
Result<> and Option<> enums
Conversion traits AsRef, AsMut, Into, From
Vec heap allocated vector
Box heap allocated pointer
Other traits such as Drop, Fn, FnMut, FnOnce, Clone, Copy, Send, Sized, Sync, PartialEq, PartialOrd etc.
Macros such as println!, format!, assert! etc.
// You don't need these
extern crate std;
use std::prelude::v1::*;

There are various sub-modules under std that concern themselves with aspects of
development. Here are just some of them:
1. clone – the Clone trait
2. cmp – Eq, Ord, PartialEq, PartialOrd traits. These traits are used for equality and
ordering functionality.
3. collections - contains the standard collection types for sequences, maps, sets, and miscellaneous. e.g. VecDeque and HashMap are members of this module.
4. env – environmental helpers - command line arguments, status codes, environment
variables, temporary folder
5. fmt – utilities for formatting and printing strings
6. fs - filesystem manipulation
7. io – Read and Write traits that are implemented by streams / buffers in file system and
networking, stdio functionality

8. mem – memory primitives
9. net – networking
10. path – path manipulation
11. process – spawn, fork, exec etc.

C / C++ lib to Rust lib cross reference
TODO
Note that Rust's std namespace contains a lot of stuff not in the standard C or C++ libraries and a lot of things are not directly analogous. For example the standard C / C++ libraries have very little to say about sockets, or path manipulation, or atomically incrementing numbers, or creating threads.
| C | C++ | Rust |
| --- | --- | --- |
| T [S], e.g. char foo[20] | std::array (C++11) | [T; S], e.g. let foo: [u8; 20] = [0; 20] |
| char * or char[] with functions such as strcmp, strcpy, strstr, strdup etc. Plus wide equivalents to these. | std::string, std::wstring, std::u16string (C++11), std::u32string (C++11) | &str or String as appropriate |
| - | std::vector | std::vec::Vec or std::collections::VecDeque |
| - | std::list | std::collections::LinkedList |
| - | std::set | std::collections::HashSet, std::collections::BTreeSet |
| - | std::map | std::collections::HashMap, std::collections::BTreeMap |
| fopen, fclose, fread / fwrite, fseek etc. | std::ofstream, std::ifstream, std::fstream | TODO |
| Math functions such as cos, sin, tan, acos, asin, atan, pow, abs, log, log10, floor, ceil are defined in <math.h> | - | Math functions are directly accessible from the f64 and f32 types, e.g. 1.0f64.cos(). |

Note that because a literal with a decimal point could be either float type, you have to add an f32 or f64 suffix to literals when you call methods on them so the compiler can figure out what you're doing.
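A short sketch of calling math functions as methods on floats. The function hypotenuse is a hypothetical example; cos, floor, powi and sqrt are standard methods on f32 / f64.

```rust
// Length of the hypotenuse computed with methods on f64.
fn hypotenuse(a: f64, b: f64) -> f64 {
    (a.powi(2) + b.powi(2)).sqrt()
}

fn main() {
    // The literal carries a type suffix so the compiler knows which
    // float type's method is being called.
    println!("cos(0) = {}", 0.0f64.cos());
    println!("floor(2.7) = {}", 2.7f32.floor());
    println!("hypotenuse(3, 4) = {}", hypotenuse(3.0, 4.0));
}
```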

Standard Traits
Some traits are system defined and in some cases can be derived automatically.
In others they cause the compiler to generate additional code for you such as the Drop trait
(described in class destructor section)

Drop
The Drop trait allows you to do something when an object is dropped, such as releasing a resource or adding logging.

Copy
A struct implementing the Copy trait can be copied through assignment, i.e. if you assign a to b then a and b now hold copies of the object, independent of each other. The Copy trait is really only useful when you have small amounts of data that represent a type or value of some kind.
TODO copy example, e.g. struct PlayingCard { suit: Suit, rank: Rank }
If you find yourself with a type that is larger, or contains heap allocated memory, then you should use Clone.
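A minimal sketch of a Copy type along the lines of the PlayingCard idea. The Suit and Rank enums and their variants are assumptions made up for illustration.

```rust
// Suit and Rank are hypothetical enums standing in for the card's fields.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Suit { Clubs, Diamonds, Hearts, Spades }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Rank { Ace, King, Queen, Jack, Number(u8) }

#[derive(Debug, Clone, Copy, PartialEq)]
struct PlayingCard {
    suit: Suit,
    rank: Rank,
}

fn main() {
    let a = PlayingCard { suit: Suit::Spades, rank: Rank::Ace };
    let mut b = a; // a is copied, not moved
    b.rank = Rank::Number(7);
    // a is still usable and unchanged
    println!("a = {:?}, b = {:?}", a, b);
}
```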

Clone
A struct implementing the Clone trait has a .clone() method. Unlike Copy you must explicitly
.clone() the instance to create another. TODO clone example
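A minimal sketch of Clone on a type that owns heap data. The Deck struct is a hypothetical example; because it contains a String and a Vec it cannot be Copy, so duplication has to be requested explicitly.

```rust
// Hypothetical type that owns heap data, so it is Clone but not Copy.
#[derive(Debug, Clone, PartialEq)]
struct Deck {
    name: String,
    cards: Vec<u8>,
}

fn main() {
    let original = Deck { name: String::from("standard"), cards: vec![1, 2, 3] };
    let mut copy = original.clone(); // explicit deep copy
    copy.cards.push(4);
    println!("original = {:?}", original);
    println!("copy = {:?}", copy);
}
```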

Eq, PartialEq
TODO equality

Ord, PartialOrd
TODO ordering

Rust Cookbook
Numbers
Convert a number to a string
Let's say you have an integer you want to turn into a string.
In C++ you might do one of the following:
const int value = 17;
std::string value_as_string;
// Nonstandard C itoa() (also not thread safe)
value_as_string = itoa(value);
// OR _itoa()
char buffer[16];
_itoa(value, buffer, 10);
// OR
sprintf(buffer, "%d", value);
// OR
std::stringstream ss;
ss << value;
value_as_string = ss.str();
// OR (boost)
value_as_string = boost::lexical_cast<std::string>(value);

All of these have issues. Some are extensions to the standard, others may not be thread safe, some may break if value was changed to another type, e.g. long long.

Rust makes it far easier because numeric primitives implement a trait called ToString. The
ToString trait has a to_string() function. So to convert the number to string is as simple as
this:

let value = 17u32;
let value_as_string = value.to_string();

The same is true for a floating point number:
let value = 100.00345f32;
let value_as_string = value.to_string();

Convert a number to a string with precision / padding
In C you would add precision or padding using printf operations:
double value = 1234.66667;
char result[32];
sprintf(result, "%08.2f", value);

In C++ you could use the C way (and to be honest it's easier than what is written below), or
you can set padding and precision through an ostream:
// Requires <sstream> and <iomanip>
double value = 1234.66667;
std::ostringstream ss;
ss << std::setfill('0') << std::setw(8) << std::fixed << std::setprecision(2) << value;

In Rust you can use format!() [https://doc.rust-lang.org/std/fmt/] for this purpose and it is
similar to printf / sprintf:
let value = 1234.66667;
let value_as_string = format!("{:08.2}", value);
println!("value = {}", value_as_string);

Output
value = 01234.67

Convert a number to a localized string

Some locales will use dots or commas for separators. Some languages will use dots or
commas for the decimal place. In order to format these strings we need to make use of the
locale.
TODO

Convert a string to a number
In C / C++ a number might be converted from a string to a number in a number of ways

int value = atoi(value_as_str);

TODO
In Rust we have a &str containing a number:
let value_as_str = "12345";

Any type that implements a trait called FromStr can take its type from a string. All the
standard primitive types implement FromStr so we can simply say this:
use std::str::FromStr;
let value_as_str = "12345";
let value = i32::from_str(value_as_str).unwrap();

Note the unwrap() at the end - the FromStr::from_str() returns the value inside a Result, to
allow for the possibility that the string cannot be parsed. Production code should test for
errors before calling unwrap() or it will panic.
Another way to get the number is to call parse() on the &str or String itself. In this case, you use a slightly odd looking syntax nicknamed 'turbofish' which looks like this:
let value_as_str = "12345";
let value = value_as_str.parse::<i32>().unwrap();

The string's implementation of parse() is a generic that works with any type implementing FromStr. So calling parse::<i32>() is equivalent to calling i32::from_str().
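Since parsing returns a Result, the error can be handled instead of unwrapped. A small sketch, assuming a hypothetical helper called parse_number:

```rust
use std::num::ParseIntError;

// Parse a string as i32, returning the error to the caller instead of panicking.
fn parse_number(text: &str) -> Result<i32, ParseIntError> {
    text.trim().parse::<i32>()
}

fn main() {
    for input in ["12345", " 42 ", "12x"].iter() {
        match parse_number(input) {
            Ok(value) => println!("{:?} parsed as {}", input, value),
            Err(e) => println!("{:?} could not be parsed: {}", input, e),
        }
    }
}
```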

Note one immediate advantage of Rust is it uses string slices. That means you could have a
long string with many numbers separated by delimiters and parse numbers straight out of
the middle of it without constructing intermediate copies.
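To make the point about slices concrete, here is a sketch that parses numbers out of the middle of a longer string. The input format and the helper name parse_fields are invented for the example.

```rust
// Parse every comma separated field. Each field is a slice of `input`,
// so no intermediate String is allocated. Fields that fail to parse are skipped.
fn parse_fields(input: &str) -> Vec<i32> {
    input
        .split(',')
        .filter_map(|field| field.trim().parse::<i32>().ok())
        .collect()
}

fn main() {
    let line = "id=7;values=20,30,400;end";
    // Slice out the part between "values=" and the following ';' by byte index.
    let start = line.find("values=").map(|i| i + "values=".len()).unwrap_or(0);
    let end = line[start..].find(';').map(|i| start + i).unwrap_or(line.len());
    println!("{:?}", parse_fields(&line[start..end]));
}
```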

Converting between numeric types
Converting between numeric types is as easy as using the "as" keyword.
let f = 1234.42f32;
let i = f as i32;
println!("Value = {}", i);

The result in i is the integer part of f.
Value = 1234

Strings
Rust comes with some very powerful functions that are attached to every &str and String
type. These mostly correspond to what you may be used to on the std::string class and in
boost string algorithms.
Most find / match / trim / split string operations in Rust are efficient because they neither
modify the existing string, nor return a duplicate to you. Instead they return slices, i.e. a
pointer and a length into your existing string to denote the range that is the result.
It is only operations that modify the string contents themselves such as creating upper or
lowercase versions that will return a new copy of a string.

Trimming a string
Spaces, tabs and other Unicode characters defined as whitespace can be trimmed from a
string.
All strings have access to the following functions
fn trim(&self) -> &str
fn trim_left(&self) -> &str
fn trim_right(&self) -> &str

Note the signatures of these functions - they do not take a mutable reference. The functions return a slice of the string that excludes the leading and / or trailing whitespace. In other words it is not duplicating the string, nor is it modifying the existing string. Instead it is just telling you what the trimmed range is within the &str you're already looking at.
So
let untrimmed_str = " this is test with whitespace    \t";
let trimmed_str = untrimmed_str.trim();
println!("Trimmed str = \"{}\"", trimmed_str);

Yields:
Trimmed str = "this is test with whitespace"

Also be aware that trim_left() and trim_right() are defined in terms of the start and end of the string rather than its visual direction.
Most strings read from left-to-right, but text in Arabic or Hebrew is read right-to-left. For such text, trim_left() removes whitespace from what is displayed as the right-hand side and trim_right() from what is displayed as the left. Recent versions of Rust name these functions trim_start() and trim_end() for this reason and deprecate the older names.

Get the length of a string
Every &str and String has a len() function.
let message = "All good things come to those who wait";
println!("Length = {}", message.len());

Note that len() is the length in bytes. If you want the number of characters you need to call
message.chars().count(), e.g.
let message = "文字列の長さ";
assert_eq!(message.chars().count(), 6);

Splitting a string
String slices and String have a variety of split methods that return an iterable collection of slices on a string:

let input = "20,30,400,100,21,-1";
let values : Vec<&str> = input.split(",").collect();
for (i, s) in values.iter().enumerate() {
    println!("Value {} = {}", i, s);
}

The standard split() takes a string pattern for the delimiter and returns a std::str::Split struct that is a double-ended iterator representation of the matching result. We could call the iterator directly if we so wished but the collect() method above puts the values of the iterator into a Vec<&str>.

Value 0 = 20
Value 1 = 30
Value 2 = 400
Value 3 = 100
Value 4 = 21
Value 5 = -1

A string can also be split on an index, e.g.
let (left, right) = "No Mister Bond I expect you to die".split_at(14);
println!("Left = {}", left);
println!("Right = {}", right);

Note that index is the byte index! The function will panic if the index is in the centre of a
UTF-8 codepoint.
Another useful function is split_whitespace that splits on tabs, spaces, newlines and other Unicode whitespace. Any amount of whitespace is treated as a single delimiter.
// Split whitespace
for s in " All good   \n\n\tthings   to those who   wait".split_whitespace() {
    println!("Part - {}", s);
}

Yields the output.

Part - All
Part - good
Part - things
Part - to
Part - those
Part - who
Part - wait

Tokenizing a string
TODO

Joining strings together
TODO

Getting a substring
TODO

Converting a string between upper and lower case
Strings have these functions for converting between upper and lower case:
fn to_lowercase(&self) -> String
fn to_uppercase(&self) -> String

These functions will return a new String that contains the upper or lower case version of the
input. Upper and lower case are defined by Unicode rules. Languages that have no upper or
lowercase strings may return the same characters.
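A short sketch of the case conversion functions in use. The helper same_ignoring_case is a hypothetical convenience, not a standard function, and lowercasing both sides is only one simple way to compare without regard to case.

```rust
// Compare two strings ignoring case by lowercasing both.
// This allocates two new Strings, which is fine for a sketch.
fn same_ignoring_case(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

fn main() {
    let original = "Hello Wörld";
    println!("{}", original.to_uppercase());
    println!("{}", original.to_lowercase());
    println!("{}", same_ignoring_case("RUST", "rust"));
}
```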

Doing a case insensitive compare
TODO

Using regular expression matches
TODO

Date and Time
Get the current date and time
TODO time_rs

UTC
TODO explain what UTC is and why maintaining time in UTC is vital Epochs etc. TODO
preamble about what an epoch is, the Unix epoch and other epochs

Setting a timer
TODO setting a timer

System time vs UTC
TODO the reason timers might be set in system uptime vs timers being set in UTC. Answer: because users and NTP can change the UTC time whereas system time is relative to bootup. So setting a timer to run 10s from now will always work against system time, whereas setting a timer to run 10s from now in UTC could fail if the OS sets time back by an hour.
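In the standard library this distinction roughly corresponds to std::time::Instant (a monotonic clock) versus std::time::SystemTime (the wall clock). The sketch below is an illustration under that assumption; timed_wait is a hypothetical helper.

```rust
use std::thread;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

// Sleep for `wait` and measure the elapsed time with the monotonic clock.
fn timed_wait(wait: Duration) -> Duration {
    let start = Instant::now();
    thread::sleep(wait);
    start.elapsed()
}

fn main() {
    // Monotonic: unaffected by the user or NTP changing the clock.
    let elapsed = timed_wait(Duration::from_millis(50));
    println!("waited {:?}", elapsed);

    // Wall clock: time since the Unix epoch, which can jump if the clock is adjusted.
    match SystemTime::now().duration_since(UNIX_EPOCH) {
        Ok(since_epoch) => println!("{} seconds since the Unix epoch", since_epoch.as_secs()),
        Err(e) => println!("system clock is before the epoch: {}", e),
    }
}
```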

Formatting a date as a string
TODO standard date formatting UTC TODO example

Parsing a date from a string
TODO parsing a date from a string's TODO example

Performing date / time arithmetic

Collections
Creating a static array
An array primitive consists of a type and a length. e.g. a 16 kilobyte array of bytes can be
created and zeroed like this:

let values: [u8; 16384] = [0; 16384];

The type annotation specifies the element type and length, and the initializer [0; 16384] assigns 0 to every element.
The type, length and values can be initialized implicitly in-place like this:
let my_array = [ "Cat", "Dog", "Fish", "Donkey", "Albatross" ];
println!("{:?}", my_array);

This is an array of 5 &str values. The compiler will complain if we try to mix types in the
array. We could also declare the array and manipulate it:
let mut my_array: [&'static str; 5] = [""; 5];
// Set some values
my_array[0] = "Cat";
my_array[1] = "Dog";
my_array[2] = "Fish";
my_array[3] = "Donkey";
my_array[4] = "Albatross";
println!("{:?}", my_array);

Note in this case we declared the array and each element received an empty value. Then our code programmatically set the new element values. The latter form would obviously be useful for arrays that change. The former would be useful for arrays which do not.

Creating a dynamic vector
A vector is a linear array of values. Unlike an array which has a fixed length, a vector can
grow or shrink over time.
A vector can be created using the vec! macro like this:
let mut my_vector = vec![1984, 1985, 1988, 1995, 2001];

This creates a mutable Vec and prepopulates it with 5 values. Note how the vec! macro can
use square brackets for its arguments. We could have used round brackets and it would
have meant the same.
A new Vec can also be made using Vec::new() or Vec::with_capacity(size):

let mut my_array = Vec::new();
my_array.push("Hello");
// The element type must be stated here because nothing is pushed yet
let my_presized_array: Vec<&str> = Vec::with_capacity(100);

It is strongly recommended you use Vec::with_capacity() to create a vector with enough
capacity for maximum number of elements you expect the vector to contain. It prevents the
runtime from having to reallocate and copy data if you keep exceeding the existing capacity.
It also significantly reduces heap fragmentation.

Removing values from a vector
Sometimes you want to strip out values from a list which match some predicate. There is a handy function for that purpose. TODO

.retain
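A minimal sketch of Vec::retain. Note that retain keeps the elements for which the closure returns true, so to strip out matching values the predicate is written the other way round. The function name strip_negatives is hypothetical.

```rust
// Remove every negative value by keeping only the non-negative ones.
fn strip_negatives(values: &mut Vec<i32>) {
    values.retain(|&v| v >= 0);
}

fn main() {
    let mut values = vec![20, -3, 400, -1, 21];
    strip_negatives(&mut values);
    println!("{:?}", values);
}
```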

Sorting a vector
A vector can be sorted by the natural sort order of the elements it contains:
let mut values = vec![ 99, -1, 3, 555, 76];
values.sort();
println!("Values = {:?}", values);

Sorting is done using the Ord trait and calling Ord::cmp() on the elements to compare them
to each other.
Comparison can also be done through a closure and Vec::sort_by()
TODO

.sort_by

TODO

.sort_by_key
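The following sketch shows sort_by with a comparison closure and sort_by_key with a key extraction closure. The two function names are hypothetical wrappers for illustration.

```rust
// Sort descending using a comparison closure.
fn sort_descending(values: &mut Vec<i32>) {
    values.sort_by(|a, b| b.cmp(a));
}

// Sort words by their length using a key extraction closure.
// The sort is stable, so words of equal length keep their relative order.
fn sort_by_length(words: &mut Vec<&str>) {
    words.sort_by_key(|w| w.len());
}

fn main() {
    let mut values = vec![99, -1, 3, 555, 76];
    sort_descending(&mut values);
    println!("Values = {:?}", values);

    let mut words = vec!["Donkey", "Cat", "Fish", "Dog"];
    sort_by_length(&mut words);
    println!("Words = {:?}", words);
}
```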

Stripping out duplicates from a vector
Assuming your vec is sorted, you can strip out duplicate entries using dedup(). This function only removes consecutive duplicates, so if your vector is not sorted, equal values that are not adjacent will remain. TODO
.dedup
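A small sketch of dedup() on sorted and unsorted input. The helper unique_sorted is a hypothetical name.

```rust
// Sort first so that equal values become adjacent, then remove the repeats.
fn unique_sorted(mut values: Vec<i32>) -> Vec<i32> {
    values.sort();
    values.dedup();
    values
}

fn main() {
    println!("{:?}", unique_sorted(vec![3, 1, 3, 2, 1]));

    // Without sorting, only runs of adjacent duplicates collapse.
    let mut unsorted = vec![1, 1, 2, 1, 1];
    unsorted.dedup();
    println!("{:?}", unsorted);
}
```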

Creating a linked list
A linked list is more suitable than a vector when items are likely to be inserted or removed
from either end or from points within the list.
std::collections::LinkedList
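A minimal sketch of a LinkedList with insertion and removal at both ends. The function build_list is hypothetical.

```rust
use std::collections::LinkedList;

// Build a small list by pushing at both ends.
fn build_list() -> LinkedList<&'static str> {
    let mut list = LinkedList::new();
    list.push_back("middle");
    list.push_front("first");
    list.push_back("last");
    list
}

fn main() {
    let mut list = build_list();
    println!("{:?}", list);
    // Removal from either end is also cheap.
    println!("front = {:?}", list.pop_front());
    println!("back = {:?}", list.pop_back());
    println!("remaining = {}", list.len());
}
```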

Creating a hash set
A hash set is a unique collection of objects. It is particularly useful for removing duplicates
that might occur in the input.

std::collections::HashSet
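A short sketch of using a HashSet to remove duplicates from some input. The function count_unique is a hypothetical helper.

```rust
use std::collections::HashSet;

// Count the distinct words by collecting them into a set.
fn count_unique(words: &[&str]) -> usize {
    let unique: HashSet<&str> = words.iter().cloned().collect();
    unique.len()
}

fn main() {
    let words = ["cat", "dog", "cat", "fish", "dog"];
    println!("{} unique words", count_unique(&words));

    // insert() returns false when the value was already present.
    let mut seen = HashSet::new();
    println!("{}", seen.insert("cat"));
    println!("{}", seen.insert("cat"));
}
```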

Creating a hash map
A hash map consists of a key and a value. It is used for look up operations
std::collections::HashMap
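A minimal sketch of a HashMap used for counting and lookup. The function word_counts is hypothetical; entry() and or_insert() are standard HashMap methods.

```rust
use std::collections::HashMap;

// Count how many times each word occurs in the text.
fn word_counts(text: &str) -> HashMap<&str, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // entry() finds the slot for the key, inserting 0 if it is missing.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("a b a c a");
    // Lookups return an Option because the key may be absent.
    match counts.get("a") {
        Some(n) => println!("a occurs {} times", n),
        None => println!("a does not occur"),
    }
}
```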

Iterating collections
TODO

Iterator adaptors
TODO
An adaptor turns an iterator into a new iterator, e.g.
.enumerate()

.map(X)

.take(N)

.filter(X)
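The adaptors can be chained. A sketch, with a hypothetical function first_even_squares, that filters, maps, takes and enumerates before collecting:

```rust
// Adaptors are lazy: nothing runs until collect() consumes the chain.
fn first_even_squares(values: &[i32], n: usize) -> Vec<(usize, i32)> {
    values
        .iter()
        .filter(|v| **v % 2 == 0) // keep even values
        .map(|v| v * v)           // square them
        .take(n)                  // stop after n results
        .enumerate()              // pair each result with its index
        .collect()
}

fn main() {
    let values = [1, 2, 3, 4, 5, 6];
    println!("{:?}", first_even_squares(&values, 2));
}
```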

Consuming iterators
A consumer is a convenient way of iterating a collection and producing a value or a set of values from the result.
.collect()
.find() will return the first element that matches the closure predicate. TODO
.fold() is a way of doing calculations on the collection. It takes a base value, and then calls a closure to accumulate the value upon the result of the last value.
TODO Processing collections
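A small sketch of find() and fold() as consumers. The function names first_over and total are hypothetical.

```rust
// Return the first value greater than `limit`, if there is one.
fn first_over(values: &[i32], limit: i32) -> Option<i32> {
    values.iter().cloned().find(|&v| v > limit)
}

// Sum the values by folding from a base value of 0.
fn total(values: &[i32]) -> i32 {
    values.iter().fold(0, |acc, v| acc + v)
}

fn main() {
    let values = [3, 8, 12, 5];
    println!("first over 7 = {:?}", first_over(&values, 7));
    println!("total = {}", total(&values));
}
```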

Localization
Unicode considerations
TODO

Externalizing strings

TODO

Building strings from parameters
TODO

Creating a localization file
TODO

Logging
Files and streams
Rust comes with two standard modules:
std::io contains various stream related traits and other functionality.
std::fs contains filesystem related functionality including the implementation of IO traits
to work with files.

Creating a directory
A directory can be created with std::fs::DirBuilder, e.g.

use std::fs::DirBuilder;
let result = DirBuilder::new().recursive(true).create("/tmp/work_dir");

File paths
Windows and Unix systems have different notation for path separators and a number of
other differences. e.g. Windows has drive letters, long paths, and network paths called
UNCs.
Rust provides a PathBuf struct for manipulating paths and a Path which acts like a slice and
can be the full path or just a portion of one.
TODO simple example of a path being created
TODO simple example of a Path slice in actively
TODO simple example of relative path made absolute
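A minimal sketch of building a path with PathBuf and inspecting it through Path. The directory and file names, and the helper log_path, are invented for illustration.

```rust
use std::path::{Path, PathBuf};

// Build <base>/logs/<name>.txt using the platform's separator.
fn log_path(base: &Path, name: &str) -> PathBuf {
    let mut path = base.to_path_buf();
    path.push("logs");
    path.push(name);
    path.set_extension("txt");
    path
}

fn main() {
    let path = log_path(Path::new("work_dir"), "today");
    println!("{}", path.display());
    // Path methods return borrowed portions of the full path.
    println!("file name = {:?}", path.file_name());
    println!("parent = {:?}", path.parent());
}
```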

Windows has a bunch of path prefixes so std::path::Prefix provides a way of accessing those.
TODO example of a path being made from a drive letter and path

Opening a file
A File is a reference to an open file on the filesystem. When the struct goes out of scope the file is closed. There are static functions for creating or opening a file:
use std::io::prelude::*;
use std::fs::File;
// Inside a function that returns std::io::Result
let mut f = File::open("myfile.txt")?;
TODO

Note that File::open() opens a file read-only by default. To open a file read-write, there is an
OpenOptions struct that has methods to set the behaviour of the open file - read, write,
create, append and truncate.
e.g. to open a file with read/write access, creating it if it does not already exist.
use std::fs::OpenOptions;
let file = OpenOptions::new()
.read(true)
.write(true)
.create(true)
.open("myfile.txt");

Writing to a file
TODO simple example of opening file to write

Reading lines from a file
TODO simple example of opening file text mode, printing contents
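One possible sketch of writing lines to a file and reading them back. The file name and the helpers write_lines and read_lines are hypothetical; the example writes into the system temporary directory.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

// Create (or truncate) the file and write one line per entry.
fn write_lines(path: &Path, lines: &[&str]) -> io::Result<()> {
    let mut file = File::create(path)?;
    for line in lines {
        writeln!(file, "{}", line)?;
    }
    Ok(())
} // the file is closed here when `file` goes out of scope

// Open the file read-only and collect its lines.
fn read_lines(path: &Path) -> io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    reader.lines().collect()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("cookbook_example.txt");
    write_lines(&path, &["first", "second"])?;
    for line in read_lines(&path)? {
        println!("{}", line);
    }
    std::fs::remove_file(&path)
}
```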

Threading
Rust actively enforces thread safety in your code. If you attempt to pass around data which is not marked thread safe (i.e. does not implement the Send or Sync trait as required), you will get a compile error. If you use code which is implicitly not thread safe such as Rc<> you will get a compile error.
This enforcement means that Rust protects against data race conditions, however be aware
it cannot protect against other forms of race conditions or deadlocks, e.g. thread 1 waits for
resource B (held by thread 2) while thread 2 waits for resource A (held by thread 1).

Creating a thread
Creating a thread is simple with a closure.
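A minimal sketch of spawning a thread with a closure and waiting for its result. The function sum_in_thread is hypothetical; thread::spawn and JoinHandle::join are the standard calls.

```rust
use std::thread;

// Sum the values on a separate thread and wait for the answer.
fn sum_in_thread(values: Vec<u32>) -> u32 {
    // `move` transfers ownership of `values` into the new thread.
    let handle = thread::spawn(move || values.iter().sum::<u32>());
    // join() blocks until the thread finishes and returns its result.
    handle.join().expect("worker thread panicked")
}

fn main() {
    println!("sum = {}", sum_in_thread(vec![1, 2, 3, 4]));
}
```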
TODO

Waiting for a thread to complete
TODO

Using atomic reference counting
Rust provides two reference counting types. Type Rc<> is for code residing on the same
thread and so the reference counting is not atomic. Type Arc<> is for code that runs on
different threads and the reference counting is atomic.
An Arc<> can only be shared between threads if the object it holds is Send and Sync. Whenever you clone an Arc<> or its lifetime ends, the counter is atomically incremented or decremented. The last decrement to zero causes the object to be deleted.
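A sketch of sharing read-only data between threads through an Arc. The function share_across_threads is a hypothetical example; each worker gets its own clone of the Arc, which points at the same data.

```rust
use std::sync::Arc;
use std::thread;

// Each worker sums the shared data and adds its own index.
fn share_across_threads(data: Vec<i32>, workers: usize) -> Vec<i32> {
    let shared = Arc::new(data);
    let handles: Vec<_> = (0..workers)
        .map(|i| {
            // Cloning the Arc atomically increments the reference count.
            let local = Arc::clone(&shared);
            thread::spawn(move || local.iter().sum::<i32>() + i as i32)
        })
        .collect();
    // Collect the results in the order the workers were started.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", share_across_threads(vec![1, 2, 3], 3));
}
```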
TODO example

Locking a shared resource
Message passing is a preferable way to prevent threads from sharing state but it's not always possible.
Therefore Rust allows you to create a mutex and lock access to shared data. Locking the mutex returns a guard through which the data is accessed, and when the guard goes out of scope the mutex is unlocked again.
This style of guard is called TODO
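A minimal sketch of a counter protected by a Mutex and shared through an Arc. The function count_with_threads is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Increment a shared counter from several threads.
fn count_with_threads(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // lock() returns a guard that gives access to the data.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                } // the guard is dropped here, which unlocks the mutex
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_with_threads(4, 1000));
}
```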

Data race protection

Rust can guarantee protection from data races, i.e. more than one thread accessing / writing to the same data at the same time.
However even Rust cannot protect against the more general problem of race conditions, e.g. if two threads lock each other's data, then the code will deadlock. This is a problem that no language can solve.

Waiting for multiple threads to finish
TODO

Sending data to a thread
Any struct that implements the Send trait is treated safe to send to another thread. Of course
that applies to

Receiving data from a thread
A thread can receive messages and block until it receives one. Thus it is easy to create a
worker thread of some kind.
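A sketch of a worker thread fed through a channel, using std::sync::mpsc. The function double_in_worker and the doubling task are invented for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Send each input to a worker thread, which sends back double the value.
fn double_in_worker(inputs: Vec<i32>) -> Vec<i32> {
    let (to_worker, worker_inbox) = mpsc::channel::<i32>();
    let (to_main, main_inbox) = mpsc::channel::<i32>();

    let worker = thread::spawn(move || {
        // Iterating the receiver blocks until a message arrives and
        // ends once every sender has been dropped.
        for value in worker_inbox {
            to_main.send(value * 2).unwrap();
        }
    });

    for value in inputs {
        to_worker.send(value).unwrap();
    }
    drop(to_worker); // closing the channel lets the worker finish

    worker.join().unwrap();
    main_inbox.into_iter().collect()
}

fn main() {
    println!("{:?}", double_in_worker(vec![1, 2, 3]));
}
```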
TODO

Networking
Connecting to a server
TODO

Listening to a socket
TODO

Interacting with C
Using libc functions and types
Calling a C library
Generating a dynamic library
Calling Win32 functions

Common design patterns
Singleton
A singleton has one instance ever in your application. TODO

Factory
TODO

Observer
TODO

Facade
TODO

Flyweight
TODO

Adapter
An adapter is where we present a different interface to a client calling the adapter than the
interface the code is implemented in. This might be done to make some legacy code
conform to a new interface, or to manage / hide complexity which might leak out into the
client.
As Rust is a relatively new language you are most likely to use an adapter pattern to wrap
some existing code in C. A common use for the adapter in C++ is to wrap up a C library in
RAII classes or similar.
TODO



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
PDF Version                     : 1.4
Linearized                      : No
Author                          : locka99
Create Date                     : 2018:02:08 21:34:18+00:00
Producer                        : calibre 2.57.1 [http://calibre-ebook.com]
Description                     : A guide to porting C and C++ code to Rust.
Title                           : A Guide to Porting C and C++ code to Rust
Subject                         : 
Publisher                       : GitBook
Creator                         : locka99
Language                        : en
Metadata Date                   : 2018:02:08 21:34:18.978012+00:00
Timestamp                       : 2018:02:08 21:34:08.740333+00:00
Page Count                      : 254