Effective Computation In Physics Anthony Scopatz, Kathryn D. Huff Field Guide To Research With%
User Manual:
Open the PDF directly: View PDF .
Page Count: 552
Effective Computation in Physics
Written by two PhDs in nuclear engineering, this book includes practical
examples drawn from a working knowledge of physics concepts. You’ll
learn how to use the Python programming language to perform everything
from collecting and analyzing data to building software and publishing
your results.
In four parts, this book includes:
■■
Getting Started: Jump into Python, the command line, data
containers, functions, flow control and logic, and classes
and objects
■■
Getting It Done: Learn about regular expressions, analysis
and visualization, NumPy, storing data in files and HDF5,
important data structures in physics, computing in parallel,
and deploying software
■■
Getting It Right: Build pipelines and software, learn to use
local and remote version control, and debug and test your code
■■
Getting It Out There: Document your code, process and
publish your findings, and collaborate efficiently; dive into
software licenses, ownership, and copyright procedures
is the book I wish
“This
had existed when I
was a physics graduate
student. Now that
computing has become
central to virtually all
scientific research, it
should be essential
reading for scientists
from many disciplines:
practical, hands-on
knowledge that will help
with all stages of the
research cycle.
”
—Fernando Perez
Staff Scientist,
Lawrence Berkeley National Laboratory
Anthony Scopatz, a computational physicist and longtime Python developer,
holds a Ph.D. in Mechanical/Nuclear Engineering from the University of Texas at
Austin. In August 2015, he'll start as a professor in Mechanical Engineering at the
University of South Carolina.
SCIENCE/PROGR AMMING
US $49.99
Twitter: @oreillymedia
facebook.com/oreilly
Effective
Computation
in Physics
FIELD GUIDE TO RESEARCH
WITH PYTHON
Scopatz & Huff
Kathryn Huff is a fellow with the Berkeley Institute for Data Science and a
postdoctoral scholar with the Nuclear Science and Security Consortium at the
University of California Berkeley. She received her Ph.D. in Nuclear Engineering
from the University of Wisconsin-Madison.
Effective Computation
in Physics
More physicists today are taking on the role of software developer as
part of their research, but software development isn’t always easy or
obvious, even for physicists. This practical book teaches essential software
development skills to help you automate and accomplish nearly any aspect
of research in a physics-based field.
CAN $57.99
Anthony Scopatz &
Kathryn D. Huff
ISBN: 978-1-491-90153-3
www.it-ebooks.info
Effective Computation in Physics
Written by two PhDs in nuclear engineering, this book includes practical
examples drawn from a working knowledge of physics concepts. You’ll
learn how to use the Python programming language to perform everything
from collecting and analyzing data to building software and publishing
your results.
In four parts, this book includes:
■■
Getting Started: Jump into Python, the command line, data
containers, functions, flow control and logic, and classes
and objects
■■
Getting It Done: Learn about regular expressions, analysis
and visualization, NumPy, storing data in files and HDF5,
important data structures in physics, computing in parallel,
and deploying software
■■
Getting It Right: Build pipelines and software, learn to use
local and remote version control, and debug and test your code
■■
Getting It Out There: Document your code, process and
publish your findings, and collaborate efficiently; dive into
software licenses, ownership, and copyright procedures
is the book I wish
“This
had existed when I
was a physics graduate
student. Now that
computing has become
central to virtually all
scientific research, it
should be essential
reading for scientists
from many disciplines:
practical, hands-on
knowledge that will help
with all stages of the
research cycle.
”
—Fernando Perez
Staff Scientist,
Lawrence Berkeley National Laboratory
Anthony Scopatz, a computational physicist and longtime Python developer,
holds a Ph.D. in Mechanical/Nuclear Engineering from the University of Texas at
Austin. In August 2015, he'll start as a professor in Mechanical Engineering at the
University of South Carolina.
SCIENCE/PROGR AMMING
US $49.99
Twitter: @oreillymedia
facebook.com/oreilly
CAN $57.99
Effective
Computation
in Physics
FIELD GUIDE TO RESEARCH
WITH PYTHON
Scopatz & Huff
Kathryn Huff is a fellow with the Berkeley Institute for Data Science and a
postdoctoral scholar with the Nuclear Science and Security Consortium at the
University of California Berkeley. She received her Ph.D. in Nuclear Engineering
from the University of Wisconsin-Madison.
Effective Computation
in Physics
More physicists today are taking on the role of software developer as
part of their research, but software development isn’t always easy or
obvious, even for physicists. This practical book teaches essential software
development skills to help you automate and accomplish nearly any aspect
of research in a physics-based field.
Anthony Scopatz &
Kathryn D. Huff
ISBN: 978-1-491-90153-3
www.it-ebooks.info
Effective Computation in Physics
Anthony Scopatz and Kathryn D. Huff
Boston
www.it-ebooks.info
Effective Computation in Physics
by Anthony Scopatz and Kathryn D. Huff
Copyright © 2015 Anthony Scopatz and Kathryn D. Huff. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/
institutional sales department: 800-998-9938 or corporate@oreilly.com.
Editor: Meghan Blanchette
Production Editor: Nicole Shelby
Copyeditor: Rachel Head
Proofreader: Rachel Monaghan
June 2015:
Indexer: Judy McConville
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
Illustrator: Rebecca Demarest
First Edition
Revision History for the First Edition
2015-06-09: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781491901533 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Effective Computation in Physics, the
cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
978-1-491-90153-3
[LSI]
www.it-ebooks.info
To THW and friends: gonuke, animal1, kmo, redbeard, spidr, slayer, nicopresto,
wolfman, blackbeard, johnnyb, jdangerx, punkish, radio, crbates, 3rdbit, fastmath, and
others, this one is for you.
www.it-ebooks.info
www.it-ebooks.info
Table of Contents
Foreword. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
Part I.
Getting Started
1. Introduction to the Command Line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Navigating the Shell
The Shell Is a Programming Language
Paths and pwd
Home Directory (~)
Listing the Contents (ls)
Changing Directories (cd)
File Inspection (head and tail)
Manipulating Files and Directories
Creating Files (nano, emacs, vi, cat, >, and touch)
Copying and Renaming Files (cp and mv)
Making Directories (mkdir)
Deleting Files and Directories (rm)
Flags and Wildcards
Getting Help
Reading the Manual (man)
Finding the Right Hammer (apropos)
Combining Utilities with Redirection and Pipes (>, >>, and |)
Permissions and Sharing
Seeing Permissions (ls -l)
Setting Ownership (chown)
1
2
3
5
6
7
10
11
11
17
18
18
20
21
21
24
25
26
26
28
v
www.it-ebooks.info
Setting Permissions (chmod)
Creating Links (ln)
Connecting to Other Computers (ssh and scp)
The Environment
Saving Environment Variables (.bashrc)
Running Programs (PATH)
Nicknaming Commands (alias)
Scripting with Bash
Command Line Wrap-up
29
29
30
31
33
34
36
36
38
2. Programming Blastoff with Python. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Running Python
Comments
Variables
Special Variables
Boolean Values
None Is Not Zero!
NotImplemented Is Not None!
Operators
Strings
String Indexing
String Concatenation
String Literals
String Methods
Modules
Importing Modules
Importing Variables from a Module
Aliasing Imports
Aliasing Variables on Import
Packages
The Standard Library and the Python Ecosystem
Python Wrap-up
40
41
42
44
45
45
45
46
49
50
53
54
55
57
58
58
59
59
60
62
63
3. Essential Containers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Lists
Tuples
Sets
Dictionaries
Containers Wrap-up
66
70
71
73
75
4. Flow Control and Logic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Conditionals
vi
77
| Table of Contents
www.it-ebooks.info
if-else Statements
if-elif-else Statements
if-else Expression
Exceptions
Raising Exceptions
Loops
while Loops
for Loops
Comprehensions
Flow Control and Logic Wrap-up
80
81
82
82
84
85
86
88
90
93
5. Operating with Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Functions in Python
Keyword Arguments
Variable Number of Arguments
Multiple Return Values
Scope
Recursion
Lambdas
Generators
Decorators
Function Wrap-up
96
99
101
103
104
107
108
109
112
116
6. Classes and Objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Object Orientation
Objects
Classes
Class Variables
Instance Variables
Constructors
Methods
Static Methods
Duck Typing
Polymorphism
Decorators and Metaclasses
Object Orientation Wrap-up
Part II.
118
119
123
124
126
127
129
132
133
135
139
141
Getting It Done
7. Analysis and Visualization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Preparing Data
145
Table of Contents
www.it-ebooks.info
|
vii
Experimental Data
Simulation Data
Metadata
Loading Data
NumPy
PyTables
Pandas
Blaze
Cleaning and Munging Data
Missing Data
Analysis
Model-Driven Analysis
Data-Driven Analysis
Visualization
Visualization Tools
Gnuplot
matplotlib
Bokeh
Inkscape
Analysis and Visualization Wrap-up
149
150
151
151
152
153
153
155
155
158
159
160
162
162
164
164
167
172
174
175
8. Regular Expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Messy Magnetism
Metacharacters on the Command Line
Listing Files with Simple Patterns
Globally Finding Filenames with Patterns (find)
grep, sed, and awk
Finding Patterns in Files (grep)
Finding and Replacing Patterns in Files (sed)
Finding and Replacing a Complex Pattern
sed Extras
Manipulating Columns of Data (awk)
Python Regular Expressions
Regular Expressions Wrap-up
178
179
180
182
187
188
190
192
193
195
197
199
9. NumPy: Thinking in Arrays. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Arrays
dtypes
Slicing and Views
Arithmetic and Broadcasting
Fancy Indexing
Masking
viii
|
202
204
208
211
215
217
Table of Contents
www.it-ebooks.info
Structured Arrays
Universal Functions
Other Valuable Functions
NumPy Wrap-up
220
223
226
227
10. Storing Data: Files and HDF5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Files in Python
An Aside About Computer Architecture
Big Ideas in HDF5
File Manipulations
Hierarchy Layout
Chunking
In-Core and Out-of-Core Operations
In-Core
Out-of-Core
Querying
Compression
HDF5 Utilities
Storing Data Wrap-up
230
235
237
239
242
245
249
249
250
252
252
254
255
11. Important Data Structures in Physics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Hash Tables
Resizing
Collisions
Data Frames
Series
The Data Frame Structure
B-Trees
K-D Trees
Data Structures Wrap-up
258
259
261
263
264
266
269
272
277
12. Performing in Parallel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Scale and Scalability
Problem Classification
Example: N-Body Problem
No Parallelism
Threads
Multiprocessing
MPI
Parallelism Wrap-up
280
282
284
285
290
296
300
307
Table of Contents
www.it-ebooks.info
|
ix
13. Deploying Software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
Deploying the Software Itself
pip
Conda
Virtual Machines
Docker
Deploying to the Cloud
Deploying to Supercomputers
Deployment Wrap-up
Part III.
311
312
316
319
321
325
327
329
Getting It Right
14. Building Pipelines and Software. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
make
Running make
Makefiles
Targets
Special Targets
Building and Installing Software
Configuration of the Makefile
Compilation
Installation
Building Software and Pipelines Wrap-up
334
337
337
338
340
341
343
345
346
346
15. Local Version Control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
What Is Version Control?
The Lab Notebook of Computational Physics
Version Control Tool Types
Getting Started with Git
Installing Git
Getting Help (git --help)
Control the Behavior of Git (git config)
Local Version Control with Git
Creating a Local Repository (git init)
Staging Files (git add)
Checking the Status of Your Local Copy (git status)
Saving a Snapshot (git commit)
git log: Viewing the History
Viewing the Differences (git diff)
Unstaging or Reverting a File (git reset)
Discard Revisions (git revert)
x
|
Table of Contents
www.it-ebooks.info
349
350
351
352
352
352
354
355
355
357
357
358
361
362
363
364
Listing, Creating, and Deleting Branches (git branch)
Switching Between Branches (git checkout)
Merging Branches (git merge)
Dealing with Conflicts
Version Conrol Wrap-Up
365
366
367
369
369
16. Remote Version Control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Repository Hosting (github.com)
Creating a Repository on GitHub
Declaring a Remote (git remote)
Sending Commits to Remote Repositories (git push)
Downloading a Repository (git clone)
Fetching the Contents of a Remote (git fetch)
Merging the Contents of a Remote (git merge)
Pull = Fetch and Merge (git pull)
Conflicts
Resolving Conflicts
Remote Version Control Wrap-up
371
373
373
374
375
379
380
380
381
382
384
17. Debugging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Encountering a Bug
Print Statements
Interactive Debugging
Debugging in Python (pdb)
Setting the Trace
Stepping Forward
Querying Variables
Setting the State
Running Functions and Methods
Continuing the Execution
Breakpoints
Profiling
Viewing the Profile with pstats
Viewing the Profile Graphically
Line Profiling with Kernprof
Linting
Debugging Wrap-up
386
387
389
390
391
392
393
393
394
394
395
396
396
397
400
401
402
18. Testing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Why Do We Test?
When Should We Test?
Where Should We Write Tests?
404
405
405
Table of Contents
www.it-ebooks.info
|
xi
What and How to Test?
Running Tests
Edge Cases
Corner Cases
Unit Tests
Integration Tests
Regression Tests
Test Generators
Test Coverage
Test-Driven Development
Testing Wrap-up
Part IV.
406
409
409
410
412
414
416
417
418
419
422
Getting It Out There
19. Documentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
Why Prioritize Documentation?
Documentation Is Very Valuable
Documentation Is Easier Than You Think
Types of Documentation
Theory Manuals
User and Developer Guides
Readme Files
Comments
Self-Documenting Code
Docstrings
Automation
Sphinx
Documentation Wrap-up
427
428
429
429
430
431
431
432
434
435
436
436
440
20. Publication. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
Document Processing
Separation of Content from Formatting
Tracking Changes
Text Editors
Markup Languages
LaTeX
Bibliographies
Publication Wrap-up
441
442
443
443
444
445
456
459
21. Collaboration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Ticketing Systems
xii
|
462
Table of Contents
www.it-ebooks.info
Workflow Overview
Creating an Issue
Assigning an Issue
Discussing an Issue
Closing an Issue
Pull Requests and Code Reviews
Submitting a Pull Request
Reviewing a Pull Request
Merging a Pull Request
Collaboration Wrap-up
462
464
466
467
468
468
469
469
470
470
22. Licenses, Ownership, and Copyright. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
What Is Copyrightable?
Right of First Publication
What Is the Public Domain?
Choosing a Software License
Berkeley Software Distribution (BSD) License
GNU General Public License (GPL)
Creative Commons (CC)
Other Licenses
Changing the License
Copyright Is Not Everything
Licensing Wrap-up
472
473
473
474
475
477
478
480
482
483
485
23. Further Musings on Computational Physics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
Where to Go from Here
487
Glossary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
Bibliography. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
Table of Contents
www.it-ebooks.info
|
xiii
www.it-ebooks.info
Foreword
Right now, somewhere, a grad student is struggling to make sense of some badly for‐
matted data in a bunch of folders called final, final_revised, and final_updated.
Nearby, her supervisor has just spent four hours trying to reconstruct the figures in a
paper she wrote six months ago so that she can respond to Reviewer Number Two.
Down the hall, the lab intern is pointing and clicking in a GUI to run an analysis pro‐
gram for the thirty-fifth of two hundred input files. He won’t realize that he used the
wrong alpha for all of them until Thursday…
This isn’t science: it’s what scientists do when they don’t have the equivalent of basic
lab skills for scientific computing. They spend hours, days, or even weeks doing
things that the computer could do for them, or trying to figure out what they or their
colleagues did last time when the computer could tell them. What’s worse, they usu‐
ally have no idea when they’re done how reliable their results are.
Starting with their work at the Hacker Within, a grassroots group at the University of
Wisconsin that they helped found, Katy and Anthony have shown that none of this
pain is necessary. A few basic tools like the command shell and version control, and a
few basic techniques like writing modular code, can save scientists hours or days of
work per week today, and simultaneously make it easier for others (including their
future selves) to reproduce and build on their work tomorrow.
This book won’t make you a great programmer—not on its own—but it will make
you a better programmer. It will teach you how to do everyday tasks without feeling
like you’re wading through mud, and give you the background knowledge you need
to make effective use of the thousands of tutorials and Q&A forums now available on
the Web. I really wish I had written it, but if I had, I couldn’t have done a better job
than Anthony and Katy. I hope you enjoy it as much as I have.
—Gregory V. Wilson
xv
www.it-ebooks.info
www.it-ebooks.info
Preface
Welcome to Effective Computation in Physics. By reading this book, you will learn the
essential software skills that are needed by anyone in a physics-based field. From
astrophysics to nuclear engineering, this book will take you from not knowing how to
make a computer add two variables together to being the software development guru
on your team.
Physics and computation have a long history together. In many ways, computers and
modern physics have co-evolved. Only cryptography can really claim the same time‐
line with computers as physics. Yet in spite of this shared growth, physicists are not
the premier software developers that you would expect. Physicists tend to suffer from
two deadly assumptions:
1. Software development and software engineering are easy.
2. Simply by knowing physics, someone knows how to write code.
While it is true that some skills are transferable—for example, being able to reason
about abstract symbols is important to both—the fundamental concerns, needs, inter‐
ests, and mechanisms for deriving the truth of physics and computation are often dis‐
tinct.
For physicists, computers are just another tool in the toolbox. Computation plays a
role in physics that is not unlike the role of mathematics. You can understand physi‐
cal concepts without a computer, but knowing how to speak the language(s) of com‐
puters makes practicing physics much easier. Furthermore, a physical computer is not
unlike a slide rule or a photon detector or an oscilloscope. It is an experimental device
that can help inform the science at hand when set up properly. Because computers are
much more complicated and configurable than any previous experimental device,
however, they require more patience, care, and understanding to properly set up.
xvii
www.it-ebooks.info
More and more physicists are being asked to be software developers as part of their
work or research. This book aims to make growing as a software developer as easy as
possible. In the long run, this will enable you to be more productive as a physicist.
On the other end of the spectrum, computational modeling and simulation have
begun to play an important part in physics. When experiments are too big or expen‐
sive to perform in statistically significant numbers, or when theoretical parameters
need to be clamped down, simulation science fills a vital role. Simulations help tell
experimenters where to look and can validate a theory before it ever hits a bench.
Simulation is becoming a middle path for physicists everywhere, separate from
theory and experiment. Many simulation scientists like to think of themselves as
being more theoretical. In truth, though, the methods that are used in simulations are
more similar to experimentalism.
What Is This Book?
All modern physicists, no matter how experimental, rely on a computer in some part
of their scientific workflow. Some researchers only use computers as word processing
devices. Others may employ computers that tirelessly collect data and churn analyses
through the night, outpacing most other members of their research teams. This book
introduces ways to harness computers to accomplish and automate nearly any aspect
of research, and should be used as a guide during each phase of research.
Reading this book is a great way to learn about computational physics from all angles.
It will help you to gain and hone software development skills that will be invaluable in
the context of your work as a physicist. To the best of our knowledge, another book
like this does not exist. This is not a physics textbook. This book is not the only way
to learn about Python and other programming concepts. This book is about what
happens when those two worlds inelastically collide. This book is about computa‐
tional physics. You are in for a treat!
Who This Book Is For
This book is for anyone in a physics-based field who must do some programming as a
result of their job or one of their interests. We specifically cast a wide net with the
term “physics-based field.” We take this term to mean any of the following fields:
physics, astronomy, astrophysics, geology, geophysics, climate science, applied math,
biophysics, nuclear engineering, mechanical engineering, material science, electrical
engineering, and more. For the remainder of this book, when the term physics is used
it refers to this broader sense of physics and engineering. It does not simply refer to
the single area of study that shares that name.
Even though this book is presented in the Python programming language, the con‐
cepts apply to a wide variety of programming languages, both modern and historical.
xviii
|
Preface
www.it-ebooks.info
Python was chosen here because it is easy and intuitive to use in a wide variety of
situations. While you are trying to learn concepts in computational physics, Python
gets out of your way. You can take the skills that you learn here and apply them
equally well in other programming contexts.
Who This Book Is Not For
While anyone is welcome to read this book and learn, it is targeted at people in phys‐
ics who need to learn computational skills. The examples will draw from a working
knowledge of physics concepts. If you primarily work as a linguist or anthropologist,
this book is probably not for you. No knowledge of computers or programming is
assumed. If you have already been working as a software developer for several years,
this book will help you only minimally.
Case Study on How to Use This Book: Radioactive
Decay Constants
To demonstrate, let’s take the example of a team of physicists using a new detector to
measure the decay constants of radium isotopes at higher precision. The physicists
will need to access data that holds the currently accepted values. They may also want
to write a small program that gives the expected activity of each isotope as a function
of time. Next, the scientists will collect experimental data from the detector, store the
raw output, compare it to the expected values, and publish a paper on the differences.
Since the heroes of this story value the tenets of science and are respectful of their
colleagues, they’ll have been certain to test all of their analyses and to carefully docu‐
ment each part of the process along the way. Their colleagues, after all, will need to
repeat this process for the thousands of other isotopes in the table of nuclides.
Accessing Data and Libraries
To access a library that holds nuclear data such as currently accepted nuclear decay
constants, λi, for each isotope i, our heroes may have to install the ENSDF database
into their filesystem. Insights about the shell (Chapter 1) and systems for building
software (Chapter 14) will be necessary in this simple endeavor.
Creating a Simple Program
−λ t
The expected activity for an isotope as a function of time is very simple (Ai = N ie i ).
No matter how simple the equation, though, no one wants to solve it by hand (or by
copying and pasting in Excel) for every 10−10 second of the experiment. For this step,
Chapter 2 provides a guide for creating a simple function in the Python program‐
ming language. For more sophisticated mathematical models, object orientation
Preface
www.it-ebooks.info
|
xix
(Chapter 6), numerical Python (Chapter 9), and data structures (Chapter 11) may be
needed.
Automating Data Collection
A mature experiment is one that requires no human intervention. Said another way, a
happy physicist sleeps at home while the experiment is running unaided all night
back at the lab. The skills gained in Chapter 1 and Chapter 2 can help to automate
data collection from an experiment. Methods for storing that data can be learned in
Chapter 10, which covers HDF5.
Analyzing and Plotting the Data
Once the currently accepted values are known and the experimental data has been
collected, the next step of the experiment is to compare the two datasets. Along with
lessons learned from Chapter 1 and Chapter 2, this step will be aided by a familiarity
with sophisticated tools for analysis and visualization (Chapter 7). For very complex
data analysis, parallelism (the basics of which are discussed in Chapter 12) can speed
up the work by employing many processors at once.
Keeping Track of Changes
Because this is science, reproducibility is paramount. To make sure that they can
repeat their results, unwind their analysis to previous versions, and replicate their
plots, all previous versions of the scientists’ code and data should be under version
control. This tool may be the most essential one in this book. The basics of version
control can be found in Chapter 15, and the use of version control within a collabora‐
tion is discussed in Chapter 16.
Testing the Code
In addition to being reproducible, the theory, data collection, analysis, and plots must
be correct. Accordingly, Chapter 17 will cover the basics of how to debug software
and how to interpret error messages. Even after debugging, the fear of unnoticed soft‐
ware bugs (and subsequent catastrophic paper retractions) compels our hero to test
the code that’s been written for this project. Language-independent principles for
testing code will be covered in Chapter 18, along with specific tools for testing Python
code.
Documenting the Code
All along, our physicists should have been documenting their computing processes
and methods. With the tools introduced in Chapter 19, creating a user manual for
code doesn’t have to be its own project. That chapter will demonstrate how a clicka‐
xx
|
Preface
www.it-ebooks.info
ble, Internet-publishable manual can be generated in an automated fashion based on
comments in the code itself. Even if documentation is left to the end of a project,
Chapter 19 can still help forward-thinking physicists to curate their work for poste‐
rity. The chapters on licenses (Chapter 22) and collaboration (Chapter 21) will also be
helpful when it’s time to share that well-documented code.
Publishing
Once the software is complete, correct, and documented, our physicists can then
move on to the all-important writing phase. Sharing their work in a peer-reviewed
publication is the ultimate reward of this successful research program. When the data
is in and the plots are generated, the real challenge has often only begun, however.
Luckily, there are tools that help authors be more efficient when writing scientific
documents. These tools will be introduced in Chapter 20.
What to Do While Reading This Book
You learn by doing. We want you to learn, so we expect you to follow along with the
examples. The examples here are practical, not theoretical. In the chapters on Python,
you should fire up a Python session (don’t worry, we’ll show you how). Try the code
out for yourself. Try out your own variants of what is presented in the book. Writing
out the code yourself makes the software and the physics real.
If you run into problems, try to solve them by thinking about what went wrong. Goo‐
gling the error messages you see is a huge help. The question and answer website
Stack Overflow is your new friend. If you find yourself truly stuck, feel free to contact
us. This book can only give you a finite amount of content to study. However, with
your goals and imagination, you will be able to practice computational physics until
the end of time.
Furthermore, if there are chapters or sections whose topics you already feel comforta‐
ble with or that you don’t see as being directly relevant to your work, feel free to skip
them! You can always come back to a section if you do not understand something or
you need a refresher. We have inserted many back and forward references to topics
throughout the course of the text, so don’t worry if you have skipped something that
ends up being important later. We’ve tried to tie everything together so that you can
know what is happening, while it is happening. This book is one part personal odys‐
sey and one part reference manual. Please use it in both ways.
Preface
www.it-ebooks.info
|
xxi
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program ele‐
ments such as variable or function names, databases, data types, environment
variables, statements, and keywords.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
This book also makes use of a fair number of “code callouts.” This is where the coding
examples are annotated with numbers in circles. For example:
print("This is code that you should type.")
This is used to annotate something special about the software you are writing.
These are useful for drawing your attention to specific parts of the code and to
explain what is happening on a step-by-step basis. You should not type the circled
numbers, as they are not part of the code itself.
Using Code Examples
Supplemental material (code examples, exercises, etc.) is available for download at
https://github.com/physics-codes/examples.
xxii
|
Preface
www.it-ebooks.info
This book is here to help you get your job done. In general, if example code is offered
with this book, you may use it in your programs and documentation. You do not
need to contact us for permission unless you’re reproducing a significant portion of
the code. For example, writing a program that uses several chunks of code from this
book does not require permission. Selling or distributing a CD-ROM of examples
from O’Reilly books does require permission. Answering a question by citing this
book and quoting example code does not require permission. Incorporating a signifi‐
cant amount of example code from this book into your product’s documentation does
require permission.
We appreciate, but do not require, attribution. An attribution usually includes the
title, author, publisher, and ISBN. For example: “Effective Computation in Physics by
Anthony Scopatz and Kathryn D. Huff (O’Reilly). Copyright 2015 Anthony Scopatz
and Kathryn D. Huff, 978-1-491-90153-3.”
If you feel your use of code examples falls outside fair use or the permission given
above, feel free to contact us at permissions@oreilly.com.
Installation and Setup
This book will teach you to use and master many different software projects. That
means that you will have to have a lot of software packages on your computer to fol‐
low along. Luckily, the process of installing the packages has recently become much
easier and more consistent. We will be using the conda package manager for all of our
installation needs.
Step 1: Download and Install Miniconda (or Anaconda)
If you have not done so already, please download and install Miniconda. Alterna‐
tively, you can install Anaconda. Miniconda is a stripped-down version of Anaconda,
so if you already have either of these, you don’t need the other. Miniconda is a Python
distribution that comes with Conda, which we will then use to install everything else
we need. The Conda website will help you download the Miniconda version that is
right for your system. Linux, Mac OS X, and Windows builds are available for 32- and
64-bit architectures. You do not need administrator privileges on your computer to
install Miniconda. We recommend that you install the Python 3 version, although all
of the examples in this book should work with Python 2 as well.
If you are on Windows, we recommend using Anaconda because it allievates some of
the other package installation troubles. However, on Windows you can install Mini‐
conda simply by double-clicking on the executable and following the instructions in
the installation wizard.
Preface
www.it-ebooks.info
|
xxiii
Special Windows Instructions Without Anaconda: msysGit and Git Bash
If you are on Windows and are not using Anaconda, please down‐
load and install msysGit, which you can find on GitHub. This will
provide you with the version control system called Git as well as
the bash shell, both of these, which we will discuss at length. Nei‐
ther is automatically available on Windows or through Miniconda.
The default install settings should be good enough for our purposes
here.
If you are on Linux or Mac OS X, first open your Terminal application. If you do not
know where your Terminal lives, use your operating system’s search functionality to
find it. Once you have an open terminal, type in the following after the dollar sign ($).
Note that you may have to change the version number in the filename (the
Miniconda-3.7.0-Linux-x86_64.sh part) to match the file that you downloaded:
# On Linux, use the following to install Miniconda:
$ bash ~/Downloads/Miniconda-3.7.0-Linux-x86_64.sh
# On Mac OS X, use the following to install Miniconda:
$ bash ~/Downloads/Miniconda3-3.7.0-MacOSX-x86_64.sh
Here, we have downloaded Miniconda into our default download directory, ~/Down‐
loads. The file we downloaded was the 64-bit version; if you’re using the 32-bit ver‐
sion you will have to adjust the filename accordingly.
On Linux, Mac OS X, and Windows, when the installer asks you if you would like to
automatically change or update the .bashrc file or the system PATH, say yes. That will
make it so that Miniconda is automatically in your environment and will ease further
installation. Otherwise, all of the other default installation options should be good
enough.
Step 2: Install the Packages
Now that you have Conda installed, you can install the packages that you’ll need for
this book. On Windows, open up the command prompt, cmd.exe. On Linux and Mac
OS X, open up a terminal. You may need to open up a new terminal window for the
installation of Miniconda to take effect. Now, no matter what your operating system
is, type the following command:
$ conda install --yes numpy scipy ipython ipython-notebook matplotlib pandas \
pytables nose setuptools sphinx mpi4py
This may take a few minutes to download. After this, you are ready to go!
xxiv
|
Preface
www.it-ebooks.info
Safari® Books Online
Safari Books Online is an on-demand digital library that deliv‐
ers expert content in both book and video form from the
world’s leading authors in technology and business.
Technology professionals, software developers, web designers, and business and crea‐
tive professionals use Safari Books Online as their primary resource for research,
problem solving, learning, and certification training.
Safari Books Online offers a range of plans and pricing for enterprise, government,
education, and individuals.
Members have access to thousands of books, training videos, and prepublication
manuscripts in one fully searchable database from publishers like O’Reilly Media,
Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que,
Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kauf‐
mann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders,
McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more
information about Safari Books Online, please visit us online.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at http://bit.ly/effective-comp.
To comment or ask technical questions about this book, send email to bookques‐
tions@oreilly.com.
For more information about our books, courses, conferences, and news, see our web‐
site at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Preface
www.it-ebooks.info
|
xxv
Acknowledgments
This work owes a resounding thanks to Greg Wilson and to Software Carpentry. The
work you have done has changed the conversation surrounding computational sci‐
ence. You have set the stage for this book to even exist. The plethora of contributions
to the community cannot be understated.
Equally, we must thank Paul P.H. Wilson and The Hacker Within for continuing to
inspire us throughout the years. Independent of age and affiliation, you have always
challenged us to learn from each other and unlock what was already there.
Stephen Scopatz and Bruce Rowe also deserve the special thanks afforded only to
parents and professors. Without them helping connect key synapses at the right time,
this book would never have been proposed.
The African Institute for Mathematical Sciences deserves special recognition for
demonstrating the immense value of scientific computing, even to those of us who
have been in the field for years. Your work inspired this book, and we hope that we
can give back to your students by writing it.
We also owe thanks to our reviewers for keeping us honest: Jennifer Klay, Daniel
Wooten, Michael Sarahan, and Denia Djokić.
To baristas all across the world, in innumerable cafés, we salute you.
xxvi
|
Preface
www.it-ebooks.info
PART I
Getting Started
www.it-ebooks.info
www.it-ebooks.info
CHAPTER 1
Introduction to the Command Line
The command line, or shell, provides a powerful, transparent interface between the
user and the internals of a computer. At least on a Linux or Unix computer, the com‐
mand line provides total access to the files and processes defining the state of the
computer—including the files and processes of the operating system.
Also, many numerical tools for physics can only be installed and run through this
interface. So, while this transparent interface could inspire the curiosity of a physicist
all on its own, it is much more likely that you picked up this book because there is
something you need to accomplish that only the command line will be capable of.
While the command line may conjure images of The Matrix, do not let it intimidate
you. Let’s take the red pill.
Navigating the Shell
You can access the shell by opening a terminal emulator (“terminal” for short) on a
Linux or Unix computer. On a Windows computer, the Git Bash program is equiva‐
lent. Launching the terminal opens an interactive shell program, which is where you
will run your executable programs. The shell provides an interface, called the
command-line interface, that can be used to run commands and navigate through the
filesystem(s) to which your computer is connected. This command line is also some‐
times called the prompt, and in this book it will be denoted with a dollar sign ($) that
points to where your cursor is ready to enter input. It should look something like
Figure 1-1.
1
www.it-ebooks.info
Figure 1-1. A terminal instance
This program is powerful and transparent, and provides total access to the files and
processes on a computer. But what is the shell, exactly?
The Shell Is a Programming Language
The shell is a programming language that is run by the terminal. Like other program‐
ming languages, the shell:
• Can collect many operations into single entities
• Requires input
• Produces output
• Has variables and state
• Uses irritating syntax
• Uses special characters
Additionally, as with programming languages, there are more shells than you’ll really
care to learn. Among shells, bash is most widely used, so that is what we’ll use in this
discussion. The csh, tcsh, and ksh shell types are also popular. Features of various
shells are listed in Table 1-1.
2
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
Table 1-1. Shell types
Shell
Name
Description
sh
Bourne shell
Popular, ubiquitous shell developed in 1977, still guaranteed on all Unixes
csh
C shell
Improves on sh
ksh
Korn shell
Backward-compatible with sh, but extends and borrows from other shells
bash
Bourne again shell
Free software replacement for sh, much evolved
tcsh
Tenex C shell
Updated and extended C shell
Exercise: Open a Terminal
1. Search your computer’s programs to find one called Terminal.
On a Windows computer, remember to use Git Bash as your
bash terminal.
2. Open an instance of that program. You’re in the shell!
The power of the shell resides in its transparency. By providing direct access to the
entire filesystem, the shell can be used to accomplish nearly any task. Tasks such as
finding files, manipulating them, installing libraries, and running programs begin
with an understanding of paths and locations in the terminal.
Paths and pwd
The space where your files are—your file space—is made up of many nested directo‐
ries (folders). In Unix parlance, the location of each directory (and each file inside
them) is given by a “path.” These can be either absolute paths or relative paths.
Paths are absolute if they begin at the top of the filesystem directory tree. The very top
of the filesystem directory tree is called the root directory. The path to the root direc‐
tory is /. Therefore, absolute paths start with /.
In many UNIX and Linux systems, the root directory contains directories like bin and
lib. The absolute paths to the bin and lib directories are then /bin and /lib, respec‐
tively. A diagram of an example directory tree, along with some notion of paths, can
be seen in Figure 1-2.
Navigating the Shell
www.it-ebooks.info
|
3
Figure 1-2. An example directory tree
The / syntax is used at the beginning of a path to indicate the toplevel directory. It is also used to separate the names of directories in
a path, as seen in Figure 1-2.
Paths can, instead, be relative to your current working directory. The current working
directory is denoted with one dot (.), while the directory immediately above it (its
“parent”) is denoted with two dots (..). Relative paths therefore often start with a dot
or two.
As we have learned, absolute paths describe a file space location relative to the root
directory. Any path that describes a location relative to the current working directory
4
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
instead is a relative path. Bringing these together, note that you can always print out
the full, absolute path of the directory you’re currently working in with the command
pwd (print working directory).
Bash was not available in the 1930s, when Lise Meitner was developing a theoretical
framework for neutron-induced fission. However, had Bash been available, Prof.
Meitner’s research computer might have contained a set of directories holding files
about her theory of fission as well as ideas about its application (see Figure 1-2). Let’s
take a look at how Lise would have navigated through this directory structure.
You can work along with Lise while you read this book. The direc‐
tory tree she will be working with in this chapter is available in a
repository on GitHub. Read the instructions at that site to down‐
load the files.
When she is working, Lise enters commands at the command prompt. In the follow‐
ing example, we can see that the command prompt gives an abbreviated path name
before the dollar sign (this is sometimes a greater-than sign or other symbol). That
path is ~/fission, because fission is the directory that Lise is currently working in:
~/fission $
When she types pwd at the command prompt, the shell returns (on the following line)
the full path to her current working directory:
~/fission $ pwd
/filespace/people/l/lisemeitner/fission/
When we compare the absolute path and the abbreviated prompt, it seems that the
prompt replaces all the directories up to and including lisemeitner with a single char‐
acter, the tilde (~). In the next section, we’ll see why.
Home Directory (~)
The shell starts your session from a special directory called your home directory. The
tilde (~) character can be used as a shortcut to your home directory. Thus, when you
log in, you probably see the command prompt telling you you’re in your home direc‐
tory:
~ $
These prompts are not universal. Sometimes, the prompt shows the username and
the name of the computer as well:
@:~ $
For Prof. Meitner, who held a research position at the prestigious Kaiser Wilhelm
Institute, this might appear as:
Navigating the Shell
www.it-ebooks.info
|
5
meitner@kaiser-wilhelm-cluster:~ $
Returning to the previous example, let us compare:
~/fission
to:
/filespace/people/l/lisemeitner/fission
It seems that the tilde has entirely replaced the home directory path (/filespace/
people/l/lisemeitner). Indeed, the tilde is an abbreviation for the home directory path
—that is, the sequence of characters (also known as a string) beginning with the root
directory (/). Because the path is defined relative to the absolute top of the directory
tree, this:
~/fission
and this:
/filespace/people/l/lisemeitner/fission
are both absolute paths.
Exercise: Find Home
1. Open the Terminal.
2. Type pwd at the command prompt and press Enter to see the
absolute path to your home directory.
Now that she knows where she is in the filesystem, curious Lise is interested in what
she’ll find there. To list the contents of a directory, she’ll need the ls command.
Listing the Contents (ls)
The ls command allows the user to print out a list of all the files and subdirectories
in a directory.
Exercise: List the Contents of a Directory
1. Open the Terminal.
2. Type ls at the command prompt and press Enter to see the
contents of your home directory.
From the fission directory in Professor Meitner’s home directory, ls results in the fol‐
lowing list of its contents:
6
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
~/fission $ ls
applications/ heat-production.txt
neutron-release.txt
In the fission directory within her home directory, Lise types ls and then presses
Enter.
The shell responds by listing the contents of the current directory.
When she lists the contents, she sees that there are two files and one subdirectory. In
the shell, directories may be rendered in a different color than files or may be indica‐
ted with a forward slash (/) at the end of their name, as in the preceding example.
Lise can also provide an argument to the ls command. To list the contents of the
applications directory without entering it, she can execute:
~/fission $ ls applications
power/ propulsion/ weapons/
Lise lists the contents of the applications directory without leaving the fission
directory.
The shell responds by listing the three directories contained in the applications
directory.
The ls command can inform Lise about the contents of directories in her filesystem.
However, to actually navigate to any of these directories, Lise will need the command
cd.
Changing Directories (cd)
Lise can change directories with the cd command. When she types only those letters,
the cd command assumes she wants to go to her home directory, so that’s where it
takes her:
~/fission $ cd
~ $
Change directories to the default location, the home directory!
As you can see in this example, executing the cd command with no arguments results
in a new prompt. The prompt reflects the new current working directory, home (~).
To double-check, pwd can be executed and the home directory will be printed as an
absolute path:
~ $ pwd
/filespace/people/l/lisemeitner
Print the working directory.
Navigating the Shell
www.it-ebooks.info
|
7
The shell responds by providing the absolute path to the current working
directory.
However, the cd command can also be customized with an argument, a parameter
that follows the command to help dictate its behavior:
~/fission $ cd [path]
If Lise adds a space followed by the path of another directory, the shell navigates to
that directory. The argument can be either an absolute path or a relative path.
Angle and Square Bracket Conventions
Using is a common convention for terms that
must be included and for which a real value must be substituted.
You should not type in the less-than (<) and greater-than (>) sym‐
bols themselves. Thus, if you see cd , you should type in
something like cd mydir. The [square brackets] convention
denotes optional terms that may be present. Likewise, if they do
exist, do not type in the [ or ]. Double square brackets ([[]]) are
used to denote optional arguments that are themselves dependent
on the existence of other [optional] arguments.
In the following example, Lise uses an absolute path to navigate to a sub-subdirectory.
This changes the current working directory, which is visible in the prompt that
appears on the next line:
~ $ cd /filespace/people/l/lisemeitner/fission
~/fission $
Lise uses the full, absolute path to the fission directory. This means, “change
directories into the root directory, then the filespace directory, then the people
directory, and so on until you get to the fission directory.” She then presses Enter.
She is now in the directory ~/fission. The prompt has changed accordingly.
Of course, that is a lot to type. We learned earlier that the shorthand ~ means “the
absolute path to the home directory.” So, it can be used to shorten the absolute path,
which comes in handy here, where that very long path can be replaced with ~/
fission:
~/ $ cd ~/fission
~/fission $
The tilde represents the home directory, so the long absolute path can be short‐
ened, accomplishing the same result.
8
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
Another succinct way to provide an argument to cd is with a relative path. A relative
path describes the location of a directory relative to the location of the current direc‐
tory. If the directory where Lise wants to move is inside her current directory, she can
drop everything up to and including the current directory’s name. Thus, from the
fission directory, the path to the applications directory is simply its name:
~/fission $ cd applications
~/fission/applications $
The applications directory must be present in the current directory for this com‐
mand to succeed.
If a directory does not exist, bash will not be able to change into that location and will
report an error message, as seen here. Notice that bash stays in the original directory,
as you might expect:
~/fission $ cd biology
-bash: cd: biology: No such file or directory
~/fission $
Another useful convention to be aware of when forming relative paths is that the cur‐
rent directory can be represented by a single dot (.). So, executing cd ./power is
identical to executing cd power:
~/fission/applications/ $ cd ./power
~/fission/applications/power/ $
Change directories into this directory, then into the power directory.
Similarly, the parent of the current directory’s parent is represented by two dots (..).
So, if Lise decides to move back up one level, back into the applications directory, this
is the syntax she could use:
~/fission/applications/power/ $ cd ..
~/fission/applications/ $
Using the two-dots syntax allows relative paths to point anywhere, not just at subdir‐
ectories of your current directory. For example, the relative path ../../../ means three
directories above the current directory.
Navigating the Shell
www.it-ebooks.info
|
9
Exercise: Change Directories
1. Open the Terminal.
2. Type cd .. at the command prompt and press Enter to move
from your home directory to the directory above it.
3. Move back into your home directory using a relative path.
4. If you have downloaded Lise’s directory tree from the book’s
GitHub repository, can you navigate to that directory using
what you know about ls, cd, and pwd?
A summary of a few of these path-generating shortcuts is listed in Table 1-2.
Table 1-2. Path shortcuts
Syntax Meaning
/
The root, or top-level, directory of the filesystem (also used for separating the names of
directories in paths)
~
The home directory
.
This directory
..
The parent directory of this directory
../..
The parent directory of the parent directory of this directory
While seeing the names of files and directories is helpful, the content of the files is
usually the reason to navigate to them. Thankfully, the shell provides myriad tools for
this purpose. In the next section, we’ll learn how to inspect that content once we’ve
found a file of interest.
File Inspection (head and tail)
When dealing with input and output files for scientific computing programs, you
often only need to see the beginning or end of the file (for instance, to check some
important input parameter or see if your run completed successfully). The command
head prints the first 10 lines of the given file:
~/fission/applications/power $ head reactor.txt
# Fission Power Idea
10
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
The heat from the fission reaction could be used to heat fluids. In
the same way that coal power starts with the production heat which
turns water to steam and spins a turbine, so too nuclear fission
might heat fluid that pushes a turbine. If somehow there were a way to
have many fissions in one small space, the heat from those fissions
could be used to heat quite a lot of water.
As you might expect, the tail command prints the last 10:
~/fission/applications/power $ head reactor.txt
the same way that coal power starts with the production heat which
turns water to steam and spins a turbine, so too nuclear fission
might heat fluid that pushes a turbine. If somehow there were a way to
have many fissions in one small space, the heat from those fissions
could be used to heat quite a lot of water.
Of course, it would take quite a lot of fissions.
Perhaps Professors Rutherford, Curie, or Fermi have some ideas on this
topic.
Exercise: Inspect a File
1. Open a terminal program on your computer.
2. Navigate to a text file.
3. Use head and tail to print the first and last lines to the
terminal.
This ability to print the first and last lines of a file to the terminal output comes in
handy when inspecting files. Once you know how to do this, the next tasks are often
creating, editing, and moving files.
Manipulating Files and Directories
In addition to simply finding files and directories, the shell can be used to act on
them in simple ways (e.g., copying, moving, deleting) and in more complex ways
(e.g., merging, comparing, editing). We’ll explore these tasks in more detail in the fol‐
lowing sections.
Creating Files (nano, emacs, vi, cat, >, and touch)
Creating files can be done in a few ways:
• With a graphical user interface (GUI) outside the terminal (like Notepad, Eclipse,
or the IPython Notebook)
Manipulating Files and Directories
www.it-ebooks.info
|
11
• With the touch command
• From the command line with cat and redirection (>)
• With a sophisticated text editor inside the terminal, like nano, emacs, or vi
Each has its own place in a programming workflow.
GUIs for file creation
Readers of this book will have encountered, at some point, a graphical user interface
for file creation. For example, Microsoft Paint creates .bmp files and word processors
create .doc files. Even though they were not created in the terminal, those files are
(usually) visible in the filesystem and can be manipulated in the terminal. Possible
uses in the terminal are limited, though, because those file types are not plain text.
They have binary data in them that is not readable by a human and must be inter‐
preted through a GUI.
Source code, on the other hand, is written in plain-text files. Those files, depending
on the conventions of the language, have various filename extensions. For example:
• .cc indicates C++
• .f90 indicates Fortran90
• .py indicates Python
• .sh indicates bash
Despite having various extensions, source code files are plain-text files and should
not be created in a GUI (like Microsoft Word) unless it is intended for the creation of
plain-text files. When creating and editing these source code files in their language of
choice, software developers often use interactive development environments (IDEs),
specialized GUIs that assist with the syntax of certain languages and produce plaintext code files. Depending on the code that you are developing, you may decide to use
such an IDE. For example, MATLAB is the appropriate tool for creating .m files, and
the IPython Notebook is appropriate for creating .ipynb files.
Some people achieve enormous efficiency gains from IDEs, while others prefer tools
that can be used for any text file without leaving the terminal. The latter type of text
editor is an essential tool for many computational scientists—their hammer for every
nail.
Creating an empty file (touch)
A simple, empty text file, however, can be created with a mere “touch” in the terminal.
The touch command, followed by a filename, will create an empty file with that
name.
12
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
Suppose Lise wants to create a file to act as a placeholder for a new idea for a nuclear
fission application, like providing heat sources for remote locations such as Siberia.
She can create that file with the touch command:
~/fission/applications $ touch remote_heat.txt
If the file already exists, the touch command does no damage. All files have metadata,
and touch simply updates the file’s metadata with a new “most recently edited” time‐
stamp. If the file does not already exist, it is created.
Note how the remote_heat.txt file’s name uses an underscore
instead of a space. This is because spaces in filenames are errorprone on the command line. Since the command line uses spaces
to separate arguments from one another, filenames with spaces can
confuse the syntax. Try to avoid filenames with spaces. If you can’t
avoid them, note that the escape character (\) can be used to alert
the shell about a space. A filename with spaces would then be
referred to as my\ file\ with\ spaces\ in\ its\ name.txt.
While the creation of empty files can be useful sometimes, computational scientists
who write code do so by adding text to code source files. For that, they need text
editors.
The simplest text editor (cat and >)
The simplest possible way, on the command line, to add text to a file without leaving
the terminal is to use a program called cat and the shell syntax >, which is called redi‐
rection.
The cat command is meant to help concatenate files together. Given a filename as its
argument, cat will print the full contents of the file to the terminal window. To out‐
put all content in reactor.txt, Lise could use cat as follows:
~fission/applications/power $ cat reactor.txt
# Fission Power Idea
The heat from the fission reaction could be used to heat fluids. In
the same way that coal power starts with the production heat which
turns water to steam and spins a turbine, so too nuclear fission
might heat fluid that pushes a turbine. If somehow there were a way to
have many fissions in one small space, the heat from those fissions
could be used to heat quite a lot of water.
Of course, it would take quite a lot of fissions.
Perhaps Professors Rutherford, Curie, or Fermi have some ideas on this topic.
Manipulating Files and Directories
www.it-ebooks.info
|
13
This quality of cat can be combined with redirection to push the output of one file
into another. Redirection, as its name suggests, redirects output. The greater-than
symbol, >, is the syntax for redirection. The arrow collects any output from the com‐
mand preceding it and redirects that output into whatever file or program follows it.
If you specify the name of an existing file, its contents will be overwritten. If the file
does not already exist, it will be created. For example, the following syntax pushes the
contents of reactor.txt into a new file called reactor_copy.txt:
~fission/applications/power $ cat reactor.txt > reactor_copy.txt
Without any files to operate on, cat accepts input from the command prompt.
Killing or Interrupting Programs
In the exercise above, you needed to use Ctrl-d to escape the cat program. This is not
uncommon. Sometimes you’ll run a program and then think better of it, or, even
more likely, you’ll run it incorrectly and need to stop its execution. Ctrl-c will usually
accomplish this for noninteractive programs. Interactive programs (like less) typi‐
cally define some other keystroke for killing or exiting the program. Ctrl-d will nor‐
mally do the trick in these cases.
As an example of a never-terminating program, let’s use the yes program. If you call
yes, the terminal will print y ad infinitum. You can use Ctrl-c to make it stop.
~/fission/supercritical $ yes
y
y
y
y
y
y
y
y
Ctrl-c
Exercise: Learn About a Command
1. Open a terminal.
2. Type cat and press Enter. The cursor will move to a blank line.
3. Try typing some text. Note how every time you press Enter, a
copy of your text is repeated.
4. To exit, type Ctrl-d. That is, hold down the Control key and
press the lowercase d key at the same time.
14
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
Used this way, cat reads any text typed into the prompt and emits it back out. This
quality, combined with redirection, allows you to push text into a file without leaving
the command line. Therefore, to insert text from the prompt into the remote_heat.txt
file, the following syntax can be used:
~fission/applications/power $ cat > remote_heat.txt
After you press Enter, the cursor will move to a blank line. At that point, any text
typed in will be inserted into remote_heat.txt. To finish adding text and exit cat, type
Ctrl-d.
Be careful. If the file you redirect into is not empty, its contents will
be erased before it adds what you’re writing.
Using cat this way is the simplest possible way to add text to a file. However, since
cat doesn’t allow the user to go backward in a file for editing, it isn’t a very powerful
text editor. It would be incredibly difficult, after all, to type each file perfectly the first
time. Thankfully, a number of more powerful text editors exist that can be used for
much more effective text editing.
More powerful text editors (nano, emacs, and vim)
A more efficient way to create and edit files is with a text editor. Text editors are pro‐
grams that allow the user to create, open, edit, and close plain-text files. Many text
editors exist. nano is a simple text editor that is recommended for first-time users.
The most common text editors in programming circles are emacs and vim; these pro‐
vide more powerful features at the cost of a sharper learning curve.
Typing the name of the text editor opens it. If the text editor’s name is followed by the
name of an existing file, that file is opened with the text editor. If the text editor’s
name is followed by the name of a nonexistent file, then the file is created and
opened.
To use the nano text editor to open or create the remote_heat.txt file, Lise Meitner
would use the command:
~fission/applications/power $ nano remote_heat.txt
Figure 1-3 shows the nano text editor interface that will open in the terminal. Note
that the bottom of the interface indicates the key commands for saving, exiting, and
performing other tasks.
Manipulating Files and Directories
www.it-ebooks.info
|
15
Figure 1-3. The nano text editor
If Lise wanted to use the vim text editor, she could use either the command vim or the
command vi on the command line to open it in the same way. On most modern Unix
or Linux computers, vi is a short name for vim (vim is vi, improved). To use emacs,
she would use the emacs command.
Choose an Editor, Not a Side
A somewhat religious war has raged for decades in certain circles on the topic of
which text editor is superior. The main armies on this battlefield are those that herald
emacs and those that herald vim. In this realm, the authors encourage the reader to
maintain an attitude of radical acceptance. In the same way that personal choices in
lifestyle should be respected unconditionally, so too should be the choice of text edi‐
tor. While the selection of a text editor can powerfully affect one’s working efficiency
and enjoyment while programming, the choice is neither permanent nor an indica‐
tion of character.
Because they are so powerful, many text editors have a steep learning curve. The
many commands and key bindings in a powerful text editor require practice to mas‐
ter. For this reason, readers new to text editors should consider starting with nano, a
low-powered text editor with a shallower learning curve.
16
|
Chapter 1: Introduction to the Command Line
www.it-ebooks.info
Exercise: Open nano
1. Open the Terminal.
2. Execute the command nano.
3. Add some text to the file.
4. Use the instructions at the bottom of the window to name and
save the file, then exit nano.
Copying and Renaming Files (cp and mv)
Now that we’ve explored how to create files, let’s start learning how to move and
change them. To make a copy of a file, use the cp command. The cp command has
the syntax cp
Source Exif Data:
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.6
Linearized : No
Author : Anthony Scopatz & Kathryn Huff
Create Date : 2015:06:09 16:04:21Z
Modify Date : 2015:08:10 10:14:56+03:00
Www It-ebooks Info : {6F114860-FA26-42C9-B26F-48EC50C4FBE9}
Keywords : www.it-ebooks.info
Language : EN
XMP Toolkit : Image::ExifTool 9.90
Creator : www.it-ebooks.info
Format : application/pdf
Subject : www.it-ebooks.info
Title : Effective Computation in Physics
Producer : www.it-ebooks.info
Trapped : False
Creator Tool : AH CSS Formatter V6.2 MR4 for Linux64 : 6.2.6.18551 (2014/09/24 15:00JST)
Metadata Date : 2015:06:29 12:33:04-04:00
Document ID : uuid:5e46320e-2d7f-dc4f-81b9-c901bdcc2256
Instance ID : uuid:86f18c08-7eba-114c-b9bc-3f6431a1f243
Page Layout : SinglePage
Page Mode : UseOutlines
Page Count : 552
EXIF Metadata provided by EXIF.tools