Concise Linux//An Introduction To Linux Use And Administration Lxk1 En Manual

Page count: 484


Concise Linux

An Introduction to Linux Use and Administration

$ echo tux

tux

$ ls

hallo.c

hallo.o

$ /bin/su -

Password:

tuxcademy – Linux and Open Source learning materials for everyone

www.tuxcademy.org ⋅info@tuxcademy.org

These training materials have been certied by the Linux Professional Institute (LPI) under the

auspices of the LPI ATM programme. They are suitable for preparation for the LPIC-1 certication.

The Linux Professional Institute does not endorse specic exam preparation materials or

techniques—refer to

info@lpi.org

for details.

The tuxcademy project aims to supply freely available high-quality training materials on Linux and Open Source topics – for self-study, school, higher and continuing education and professional training.

Please visit http://www.tuxcademy.org/! Do contact us with questions or suggestions.

Concise Linux: An Introduction to Linux Use and Administration

Revision: lxk1:807d647231c25323:2015-08-21

adm1:33e55eeadba676a3:2015-08-08 10–18, 26–27
adm2:0cd20ee1646f650c:2015-08-21 20–25
grd1:be27bba8095b329b:2015-08-04 1–9, B
grd2:6eb247d0aa1863fd:2015-08-05

lxk1:qPeeTb2oHiy6EUuPrr0DT

http://www.tuxcademy.org ⋅ info@tuxcademy.org

Linux penguin “Tux” © Larry Ewing (CC-BY licence)

All representations and information contained in this document have been compiled to the best of our knowledge and carefully tested. However, mistakes cannot be ruled out completely. To the extent of applicable law, the authors and the tuxcademy project assume no responsibility or liability resulting in any way from the use of this material or parts of it or from any violation of the rights of third parties.

Reproduction of trade marks, service marks and similar monikers in this document, even if not specially marked, does not imply the stipulation that these may be freely usable according to trade mark protection laws. All trade marks are used without a warranty of free usability and may be registered trade marks of third parties.

This document is published under the “Creative Commons-BY-SA 4.0 International” licence. You may copy and distribute it and make it publicly available as long as the following conditions are met:

Attribution: You must make clear that this document is a product of the tuxcademy project.

Share-Alike: You may alter, remix, extend, or translate this document or modify or build on it in other ways, as long as you make your contributions available under the same licence as the original.

Further information and the full legal licence grant may be found at http://creativecommons.org/licenses/by-sa/4.0/

Authors: Tobias Elsner, Thomas Erker, Anselm Lingnau

Technical Editor: Anselm Lingnau ⟨anselm.lingnau@linupfront.de⟩

Typeset in Palatino, Optima and DejaVu Sans Mono


Contents

1 Introduction
1.1 What is Linux?
1.2 Linux History
1.3 Free Software, “Open Source” and the GPL
1.4 Linux—The Kernel
1.5 Linux Properties
1.6 Linux Distributions

2 Using the Linux System
2.1 Logging In and Out
2.2 Switching On and Off
2.3 The System Administrator

3 Who’s Afraid Of The Big Bad Shell?
3.1 Why?
3.1.1 What Is The Shell?
3.2 Commands
3.2.1 Why Commands?
3.2.2 Command Structure
3.2.3 Command Types
3.2.4 Even More Rules

4 Getting Help
4.1 Self-Help
4.2 The help Command and the --help Option
4.3 The On-Line Manual
4.3.1 Overview
4.3.2 Structure
4.3.3 Chapters
4.3.4 Displaying Manual Pages
4.4 Info Pages
4.5 HOWTOs
4.6 Further Information Sources

5 The vi Editor
5.1 Editors
5.2 The Standard—vi
5.2.1 Overview
5.2.2 Basic Functions
5.2.3 Extended Commands
5.3 Other Editors

6 Files: Care and Feeding
6.1 File and Path Names
6.1.1 File Names
6.1.2 Directories
6.1.3 Absolute and Relative Path Names
6.2 Directory Commands
6.2.1 The Current Directory: cd & Co.
6.2.2 Listing Files and Directories—ls
6.2.3 Creating and Deleting Directories: mkdir and rmdir
6.3 File Search Patterns
6.3.1 Simple Search Patterns
6.3.2 Character Classes
6.3.3 Braces
6.4 Handling Files
6.4.1 Copying, Moving and Deleting—cp and Friends
6.4.2 Linking Files—ln and ln -s
6.4.3 Displaying File Content—more and less
6.4.4 Searching Files—find
6.4.5 Finding Files Quickly—locate and slocate

7 Standard I/O and Filter Commands
7.1 I/O Redirection and Command Pipelines
7.1.1 Standard Channels
7.1.2 Redirecting Standard Channels
7.1.3 Command Pipelines
7.2 Filter Commands
7.3 Reading and Writing Files
7.3.1 Outputting and Concatenating Text Files—cat and tac
7.3.2 Beginning and End—head and tail
7.3.3 Just the Facts, Ma’am—od and hexdump
7.4 Text Processing
7.4.1 Character by Character—tr, expand and unexpand
7.4.2 Line by Line—fmt, pr and so on
7.5 Data Management
7.5.1 Sorted Files—sort and uniq
7.5.2 Columns and Fields—cut, paste etc.

8 More About The Shell
8.1 Simple Commands: sleep, echo, and date
8.2 Shell Variables and The Environment
8.3 Command Types—Reloaded
8.4 The Shell As A Convenient Tool
8.5 Commands From A File
8.6 The Shell As A Programming Language
8.6.1 Foreground and Background Processes

9 The File System
9.1 Terms
9.2 File Types
9.3 The Linux Directory Tree
9.4 Directory Tree and File Systems
9.5 Removable Media

10 System Administration
10.1 Introductory Remarks
10.2 The Privileged root Account
10.3 Obtaining Administrator Privileges
10.4 Distribution-specific Administrative Tools

11 User Administration
11.1 Basics
11.1.1 Why Users?
11.1.2 Users and Groups
11.1.3 People and Pseudo-Users
11.2 User and Group Information
11.2.1 The /etc/passwd File
11.2.2 The /etc/shadow File
11.2.3 The /etc/group File
11.2.4 The /etc/gshadow File
11.2.5 The getent Command
11.3 Managing User Accounts and Group Information
11.3.1 Creating User Accounts
11.3.2 The passwd Command
11.3.3 Deleting User Accounts
11.3.4 Changing User Accounts and Group Assignment
11.3.5 Changing User Information Directly—vipw
11.3.6 Creating, Changing and Deleting Groups

12 Access Control
12.1 The Linux Access Control System
12.2 Access Control For Files And Directories
12.2.1 The Basics
12.2.2 Inspecting and Changing Access Permissions
12.2.3 Specifying File Owners and Groups—chown and chgrp
12.2.4 The umask
12.3 Access Control Lists (ACLs)
12.4 Process Ownership
12.5 Special Permissions for Executable Files
12.6 Special Permissions for Directories
12.7 File Attributes

13 Process Management
13.1 What Is A Process?
13.2 Process States
13.3 Process Information—ps
13.4 Processes in a Tree—pstree
13.5 Controlling Processes—kill and killall
13.6 pgrep and pkill
13.7 Process Priorities—nice and renice
13.8 Further Process Management Commands—nohup and top

14 Hard Disks (and Other Secondary Storage)
14.1 Fundamentals
14.2 Bus Systems for Mass Storage
14.3 Partitioning
14.3.1 Fundamentals
14.3.2 The Traditional Method (MBR)
14.3.3 The Modern Method (GPT)
14.4 Linux and Mass Storage
14.5 Partitioning Disks
14.5.1 Fundamentals
14.5.2 Partitioning Disks Using fdisk
14.5.3 Formatting Disks using GNU parted
14.5.4 gdisk
14.5.5 More Partitioning Tools
14.6 Loop Devices and kpartx
14.7 The Logical Volume Manager (LVM)

15 File Systems: Care and Feeding
15.1 Creating a Linux File System
15.1.1 Overview
15.1.2 The ext File Systems
15.1.3 ReiserFS
15.1.4 XFS
15.1.5 Btrfs
15.1.6 Even More File Systems
15.1.7 Swap space
15.2 Mounting File Systems
15.2.1 Basics
15.2.2 The mount Command
15.2.3 Labels and UUIDs
15.3 The dd Command

16 Booting Linux
16.1 Fundamentals
16.2 GRUB Legacy
16.2.1 GRUB Basics
16.2.2 GRUB Legacy Configuration
16.2.3 GRUB Legacy Installation
16.2.4 GRUB 2
16.2.5 Security Advice
16.3 Kernel Parameters
16.4 System Startup Problems
16.4.1 Troubleshooting
16.4.2 Typical Problems
16.4.3 Rescue systems and Live Distributions

17 System-V Init and the Init Process
17.1 The Init Process
17.2 System-V Init
17.3 Upstart
17.4 Shutting Down the System

18 Systemd
18.1 Overview
18.2 Unit Files
18.3 Unit Types
18.4 Dependencies
18.5 Targets
18.6 The systemctl Command
18.7 Installing Units

19 Time-controlled Actions—cron and at
19.1 Introduction
19.2 One-Time Execution of Commands
19.2.1 at and batch
19.2.2 at Utilities
19.2.3 Access Control
19.3 Repeated Execution of Commands
19.3.1 User Task Lists
19.3.2 System-Wide Task Lists
19.3.3 Access Control
19.3.4 The crontab Command
19.3.5 Anacron

20 System Logging
20.1 The Problem
20.2 The Syslog Daemon
20.3 Log Files
20.4 Kernel Logging
20.5 Extended Possibilities: Rsyslog
20.6 The “next generation”: Syslog-NG
20.7 The logrotate Program

21 System Logging with Systemd and “The Journal”
21.1 Fundamentals
21.2 Systemd and journald
21.3 Log Inspection

22 TCP/IP Fundamentals
22.1 History and Introduction
22.1.1 The History of the Internet
22.1.2 Internet Administration
22.2 Technology
22.2.1 Overview
22.2.2 Protocols
22.3 TCP/IP
22.3.1 Overview
22.3.2 End-to-End Communication: IP and ICMP
22.3.3 The Base for Services: TCP and UDP
22.3.4 The Most Important Application Protocols
22.4 Addressing, Routing and Subnetting
22.4.1 Basics
22.4.2 Routing
22.4.3 IP Network Classes
22.4.4 Subnetting
22.4.5 Private IP Addresses
22.4.6 Masquerading and Port Forwarding
22.5 IPv6
22.5.1 IPv6 Addressing

23 Linux Network Configuration
23.1 Network Interfaces
23.1.1 Hardware and Drivers
23.1.2 Configuring Network Adapters Using ifconfig
23.1.3 Configuring Routing Using route
23.1.4 Configuring Network Settings Using ip
23.2 Persistent Network Configuration
23.3 DHCP
23.4 IPv6 Configuration
23.5 Name Resolution and DNS

24 Network Troubleshooting
24.1 Introduction
24.2 Local Problems
24.3 Checking Connectivity With ping
24.4 Checking Routing Using traceroute And tracepath
24.5 Checking Services With netstat And nmap
24.6 Testing DNS With host And dig
24.7 Other Useful Tools For Diagnosis
24.7.1 telnet and netcat
24.7.2 tcpdump
24.7.3 wireshark

25 The Secure Shell
25.1 Introduction
25.2 Logging Into Remote Hosts Using ssh
25.3 Other Useful Applications: scp and sftp
25.4 Public-Key Client Authentication
25.5 Port Forwarding Using SSH
25.5.1 X11 Forwarding
25.5.2 Forwarding Arbitrary TCP Ports

26 Software Package Management Using Debian Tools
26.1 Overview
26.2 The Basis: dpkg
26.2.1 Debian Packages
26.2.2 Package Installation
26.2.3 Deleting Packages
26.2.4 Debian Packages and Source Code
26.2.5 Package Information
26.2.6 Package Verification
26.3 Debian Package Management: The Next Generation
26.3.1 APT
26.3.2 Package Installation Using apt-get
26.3.3 Information About Packages
26.3.4 aptitude
26.4 Debian Package Integrity
26.5 The debconf Infrastructure
26.6 alien: Software From Different Worlds

27 Package Management with RPM and YUM
27.1 Introduction
27.2 Package Management Using rpm
27.2.1 Installation and Update
27.2.2 Deinstalling Packages
27.2.3 Database and Package Queries
27.2.4 Package Verification
27.2.5 The rpm2cpio Program
27.3 YUM
27.3.1 Overview
27.3.2 Package Repositories
27.3.3 Installing and Removing Packages Using YUM
27.3.4 Information About Packages
27.3.5 Downloading Packages

A Sample Solutions

B Example Files

C LPIC-1 Certification
C.1 Overview
C.2 Exam LPI-101
C.3 Exam LPI-102
C.4 LPI Objectives In This Manual

D Command Index

Index


List of Tables

4.1 Manual page sections
4.2 Manual Page Topics
5.1 Insert-mode commands for vi
5.2 Cursor positioning commands in vi
5.3 Editing commands in vi
5.4 Replacement commands in vi
5.5 ex commands in vi
6.1 Some file type designations in ls
6.2 Some ls options
6.3 Options for cp
6.4 Keyboard commands for more
6.5 Keyboard commands for less
6.6 Test conditions for find
6.7 Logical operators for find
7.1 Standard channels on Linux
7.2 Options for cat (selection)
7.3 Options for tac (selection)
7.4 Options for od (excerpt)
7.5 Options for tr
7.6 Characters and character classes for tr
7.7 Options for (selection)
7.8 Options for (selection)
7.9 Options for (selection)
7.10 Options for sort (selection)
7.11 Options for join (selection)
8.1 Important Shell Variables
8.2 Key Strokes within bash
8.3 Options for jobs
9.1 Linux file types
9.2 Directory division according to the FHS
12.1 The most important file attributes
14.1 Different SCSI variants
14.2 Partition types for Linux (hexadecimal)
14.3 Partition type GUIDs for GPT (excerpt)
18.1 Common targets for systemd (selection)
18.2 Compatibility targets for System-V init
20.1 syslogd facilities
20.2 syslogd priorities (with ascending urgency)
20.3 Filtering functions for Syslog-NG
22.1 Common application protocols based on TCP/IP
22.2 Addressing example
22.3 Traditional IP Network Classes
22.4 Subnetting Example
22.5 Private IP address ranges according to RFC 1918
23.1 Options within /etc/resolv.conf
24.1 Important ping options


List of Figures

1.1 Ken Thompson and Dennis Ritchie with a PDP-11
1.2 Linux development
1.3 Organizational structure of the Debian project
2.1 The login screens of some common Linux distributions
2.2 Running programs as a different user in KDE
4.1 A manual page
5.1 vi’s modes
7.1 Standard channels on Linux
7.2 The tee command
8.1 Synchronous command execution in the shell
8.2 Asynchronous command execution in the shell
9.1 Content of the root directory (SUSE)
13.1 The relationship between various process states
15.1 The /etc/fstab file (example)
17.1 A typical /etc/inittab file (excerpt)
17.2 Upstart configuration file for job rsyslog
18.1 A systemd unit file: console-getty.service
20.1 Example configuration for logrotate (Debian GNU/Linux 8.0)
21.1 Complete log output of journalctl
22.1 Protocols and service interfaces
22.2 ISO/OSI reference model
22.3 Structure of an IP datagram
22.4 Structure of an ICMP packet
22.5 Structure of a TCP Segment
22.6 Starting a TCP connection: The Three-Way Handshake
22.7 Structure of a UDP datagram
22.8 The /etc/services file (excerpt)
23.1 /etc/resolv.conf example
23.2 The /etc/hosts file (SUSE)
26.1 The aptitude program


Preface

This manual oers a concise introduction to the use and administration of Linux.

It is aimed at students who have had some experience using other operating sys-

tems and want to transition to Linux, but is also suitable for use at schools and

universities.

Topics include a thorough introduction to the Linux shell, the vi editor, and the most important file management tools, as well as a primer on basic administration tasks like user, permission, and process management. We present the organisation of the file system and the administration of hard disk storage, and describe the system boot procedure, the configuration of services, the time-based automation of tasks and the operation of the system logging service. The course is rounded out by an introduction to TCP/IP and the configuration and operation of Linux hosts as network clients, with particular attention to troubleshooting, and by chapters on the Secure Shell and on software package management with the Debian and RPM tools.

Together with the subsequent volume, Concise Linux—Advanced Topics, this manual covers all of the objectives of the Linux Professional Institute’s LPIC-1 certificate exams and is therefore suitable for exam preparation.

This courseware package is designed to support the training course as efficiently as possible, by presenting the material in a dense, extensive format for reading along, revision or preparation. The material is divided into self-contained chapters, each detailing a part of the curriculum; a chapter’s goals and prerequisites are summarized clearly at its beginning, while at the end there is a summary and (where appropriate) pointers to additional literature or web pages with further information.

Additional material or background information is marked by the “lightbulb” icon at the beginning of a paragraph. Occasionally these paragraphs make use of concepts that are explained only later in the courseware, in order to establish a broader context for the material just introduced; these “lightbulb” paragraphs may be fully understandable only when the courseware package is perused for a second time after the actual course.

Paragraphs with the “caution sign” direct your attention to possible problems or issues requiring particular care. Watch out for the dangerous bends!

Most chapters also contain exercises, which are marked with a “pencil” icon at the beginning of each paragraph. The exercises are numbered, and sample solutions for the most important ones are given at the end of the courseware package. Each exercise features a level of difficulty in brackets. Exercises marked with an exclamation point (“!”) are especially recommended.

Excerpts from conguration les, command examples and examples of com-

puter output appear in

typewriter type

. In multiline dialogs between the user and

the computer, user input is given in

bold typewriter type

in order to avoid misun-

derstandings. The “” symbol appears where part of a command’s output

had to be omitted. Occasionally, additional line breaks had to be added to make

things t; these appear as “

”. When command syntax is discussed, words enclosed in angle brack-

ets (“⟨Word⟩”) denote “variables” that can assume dierent values; material in

14 Preface

brackets (“[

-f

⟨le⟩]”) is optional. Alternatives are separated using a vertical bar

(“

-a

-b

”).

Important concepts are emphasized using “marginal notes” so they can be easily located; definitions of important terms appear in bold type in the text as well as in the margin.

References to the literature and to interesting web pages appear as “[GPL91]”

in the text and are cross-referenced in detail at the end of each chapter.

We endeavour to provide courseware that is as up-to-date, complete and error-

free as possible. In spite of this, problems or inaccuracies may creep in. If you

notice something that you think could be improved, please do let us know, e.g.,

by sending e-mail to info@tuxcademy.org.

(For simplicity, please quote the title of the courseware package, the revision ID

on the back of the title page and the page number(s) in question.) Thank you very

much!

LPIC-1 Certification

These training materials are part of a recommended curriculum for LPIC-1 prepa-

ration. Refer to Appendix C for further information.


Introduction

Contents

1.1 What is Linux? . . . . . . . . . . . . . . . . . . . . . 16

1.2 Linux History . . . . . . . . . . . . . . . . . . . . . 16

1.3 Free Software, “Open Source” and the GPL . . . . . . . . . . 18

1.4 Linux—The Kernel . . . . . . . . . . . . . . . . . . . 21

1.5 Linux Properties . . . . . . . . . . . . . . . . . . . . 23

1.6 Linux Distributions . . . . . . . . . . . . . . . . . . . 26

Goals

• Knowing about Linux, its properties and its history

• Differentiating between the Linux kernel and Linux distributions

• Understanding the terms “GPL”, “free software”, and “open-source software”

Prerequisites

• Knowledge of other operating systems is useful to appreciate similarities and differences


1.1 What is Linux?

Linux is an operating system. As such, it manages a computer’s basic functionality, and application programs build on it. The operating system forms the interface between the hardware and application programs as well as the interface between the hardware and people (users). Without an operating system, the computer is unable to “understand” or process our input.

Various operating systems differ in the way they go about these tasks. The functions and operation of Linux are inspired by the Unix operating system.

1.2 Linux History

The history of Linux is something special in the computer world. While most other operating systems are commercial products produced by companies, Linux was started by a university student as a hobby project. Since then, hundreds of professionals and enthusiasts all over the world have collaborated on it, from hobbyists and computer science students to operating systems experts funded by major IT corporations to do Linux development. The basis for the existence of such a project is the Internet: Linux developers make extensive use of services like electronic mail, distributed version control, and the World Wide Web and, through these, have made Linux what it is today. Hence, Linux is the result of an international collaboration across national and corporate boundaries, now as then led by Linus Torvalds, its original author.

To explain the background of Linux, we need to digress for a bit: Unix, the operating system that inspired Linux, was begun in 1969. It was developed by Ken Thompson and his colleagues at Bell Laboratories (the US telecommunication giant AT&T’s research institute).¹ Unix caught on rapidly, especially at universities, because Bell Labs furnished source code and documentation at cost (due to an anti-trust decree, AT&T was barred from selling software). Unix was, at first, an operating system for Digital Equipment’s PDP-11 line of minicomputers, but was ported to other platforms during the 1970s. This was a reasonably feasible endeavour, since the Unix software, including the operating system kernel, was mostly written in Dennis Ritchie’s purpose-built C programming language. Possibly the most important of all Unix ports was the one to the PDP-11’s successor platform, the VAX, at the University of California in Berkeley, which came to be distributed as “BSD” (short for Berkeley Software Distribution). By and by, various computer manufacturers developed different Unix derivatives based either on AT&T code or on BSD (e. g., Sinix by Siemens, Xenix by Microsoft (!), SunOS by Sun Microsystems, HP-UX by Hewlett-Packard or AIX by IBM). Even AT&T was finally allowed to market Unix, with the commercial versions System III and (later) System V. This led to a fairly incomprehensible multitude of different Unix products. A real standardisation never happened, but it is possible to distinguish roughly between BSD-like and System-V-like Unix variants. The BSD and System V lines of development were mostly unified by “System V Release 4” (SVR4), which exhibited the characteristics of both factions.

The very rst parts of Linux were developed in 1991 by Linus Torvalds, then

a 21-year-old student in Helsinki, Finland, when he tried to fathom the capabil-

ities of his new PC’s Intel 386 processor. After a few months, the assembly lan-

guage experiments had matured into a small but workable operating system ker-

nel that could be used in a Minix system—Minix was a small Unix-like operatingMinix

system that computer science professor Andrew S. Tanenbaum of the Free Uni-

versity of Amsterdam, the Netherlands, had written for his students. Early Linux

had properties similar to a Unix system, but did not contain Unix source code.

Linus Torvalds made the program’s source code available on the Internet, and the

1The name “Unix” is a pun on “Multics”, the operating system that Ken Thompson and his col-

leagues worked on previously. Early Unix was a lot simpler than Multics. How the name came to be

spelled with an “x” is no longer known.

1.2 Linux History 17

Figure 1.1: Ken Thompson (sitting) and Dennis Ritchie (standing) with a

PDP-11, approx. 1972. (Photograph courtesy of Lucent Technologies.)

idea was eagerly taken up and developed further by many programmers. Version

0.12, issued in January, 1992, was already a stable operating system kernel. There

was—thanks to Minix—the GNU C compiler (

gcc

), the

bash

shell, the

emacs

editor

and many other GNU tools. The operating system was distributed world-wide by

anonymous FTP. The number of programmers, testers and supporters grew very

rapidly. This catalysed a rate of development only dreamed of by powerful soft-

ware companies. Within months, the tiny kernel grew into a full-blown operating

system with fairly complete (if simple) Unix functionality.

The “Linux” project is not finished even today. Linux is constantly updated and added to by hundreds of programmers throughout the world, catering to millions of satisfied private and commercial users. In fact it is inappropriate to say that the system is developed “only” by students and other amateurs: many contributors to the Linux kernel hold important posts in the computer industry and are among the most professionally reputable system developers anywhere. By now it is fair to claim that Linux is the operating system with the widest range of supported hardware ever, not just with respect to the platforms it is running on (from PDAs to mainframes) but also with respect to driver support on, e. g., the Intel PC platform. Linux also serves as a research and development platform for new operating systems ideas in academia and industry; it is without doubt one of the most innovative operating systems available today.

Exercises

1.1 [4] Use the Internet to locate the famous (notorious?) discussion between Andrew S. Tanenbaum and Linus Torvalds, in which Tanenbaum says that, with something like Linux, Linus Torvalds would have failed his (Tanenbaum’s) operating systems course. What do you think of the controversy?

1.2 [2] Give the version number of the oldest version of the Linux source code that you can locate.

[Figure 1.2: size of linux-*.tar.gz (5 MiB to 55 MiB) plotted against the years 1997 to 2007, with series for Linux 2.0, 2.1, 2.2, 2.3, 2.4, 2.5 and 2.6]

Figure 1.2: Linux development, measured by the size of linux-*.tar.gz. Each marker corresponds to a Linux version. During the 10 years between Linux 2.0 and Linux 2.6.18, the size of the compressed Linux source code has increased roughly tenfold.

1.3 Free Software, “Open Source” and the GPL

From the very beginning of its development, Linux was placed under the GNU General Public License (GPL) promulgated by the Free Software Foundation (FSF). The FSF was founded by Richard M. Stallman, the author of the Emacs editor and other important programs, with the goal of making high-quality software “freely” available, in the sense that users are “free” to inspect it, to change it and to redistribute it with or without changes, not necessarily in the sense that it does not cost anything.² In particular, he was after a freely available Unix-like operating system, hence “GNU” as a (recursive) acronym for “GNU’s Not Unix”.

The main tenet of the GPL is that software distributed under it may be changed as well as sold at any time, but that the (possibly modified) source code must always be passed along (thus “Open Source”) and that the recipient must receive the same rights of modification and redistribution. Thus there is little point in selling GPL software “per seat”, since the recipient must be allowed to copy and install the software as often as wanted. (It is of course permissible to sell support for the GPL software “per seat”.) New software resulting from the extension or modification of GPL software must, as a “derived work”, also be placed under the GPL.

Therefore, the GPL governs the distribution of software, not its use, and allows the recipient to do things that he would not be allowed to do otherwise, for example, to copy and distribute the software, which according to copyright law is the a priori prerogative of the copyright owner. Consequently, it differs markedly from the “end user license agreements” (EULAs) of “proprietary” software, whose owners try to take away a recipient’s rights to do various things. (For example, some EULAs try to forbid a software recipient from talking critically, or at all, about the product in public.)

² The FSF says “free as in speech, not as in beer”.

The GPL is a license, not a contract, since it is a one-sided grant of rights to the recipient (albeit with certain conditions attached). The recipient of the software does not need to “accept” the GPL explicitly. The common EULAs, on the other hand, are contracts, since the recipient of the software is supposed to waive certain rights in exchange for being allowed to use the software. For this reason, EULAs must be explicitly accepted. The legal barriers for this may be quite high: in many jurisdictions (e. g., Germany), any EULA restrictions must be known to the buyer before the actual sale in order to become part of the sales contract. Since the GPL does not in any way restrict a buyer’s rights (in particular as far as use of the software is concerned) compared to what they would have to expect when buying any other sort of goods, these barriers do not apply to the GPL; the additional rights conferred on the buyer by the GPL are a kind of extra bonus.

Currently two versions of the GPL are in widespread use. The newer version 3 (also called “GPLv3”) was published in June 2007. It differs from the older version 2 (also “GPLv2”) by more precise language dealing with areas such as software patents and compatibility with other free licenses, and by introducing restrictions against using special hardware to make it impossible to change the software on theoretically “free” devices (“Tivoisation”, after a Linux-based personal video recorder whose kernel is impossible to alter or exchange). In addition, GPLv3 allows its users to add further clauses. Within the community, the GPLv3 was not embraced with unanimous enthusiasm, and many projects, in particular the Linux kernel, have intentionally stayed with the simpler GPLv2. Many other projects are made available under the GPLv2 “or any newer version”, so you get to decide which version of the GPL you want to follow when distributing or modifying such software.

You should also not confuse GPL software with “public-domain” software. The latter belongs to nobody; everybody can do with it what they want. A GPL program’s copyright still rests with its developer or developers, and the GPL states very clearly what one may do with the software and what one may not.

It is considered good form among free software developers to place contributions to a project under the same license that the project is already using, and in fact most projects insist on this, at least for code that is supposed to become part of the “official” version. Indeed, some projects insist on “copyright assignments”, where the code author signs the copyright over to the project (or a suitable organisation). The advantage of this is that, legally, only the project is responsible for the code and that licensing violations, where only the copyright owner has legal standing, are easier to prosecute. A side effect that is either desired or else explicitly unwanted is that copyright assignment makes it easier to change the license for the entire project, as this is an act that only the copyright owner may perform.

In the case of the Linux operating system kernel, which explicitly does not require copyright assignment, a licensing change is very difficult, if not impossible, in practice, since the code is a patchwork of contributions from more than a thousand authors. The issue was discussed during the GPLv3 process, and there was general agreement that it would be a giant project to ascertain the copyright provenance of every single line of the Linux source code and to get the authors to agree to a license change. In any case, some Linux developers would be violently opposed, while others are impossible to find or even deceased, and the code in question would have to be replaced by something similar with a clear copyright. At least Linus Torvalds is still in the GPLv2 camp, so the problem does not (yet) arise in practice.

The GPL does not stipulate anything about the price of the product. It is perfectly legal to give away copies of GPL programs, or to sell them for money, as long as you provide source code or make it available upon request, and the software recipient gets the GPL rights as well. Therefore, GPL software is not necessarily “freeware”.

You can nd out more by reading the GPL [GPL91], which incidentally must

accompany every GPLlicensed product (including Linux).

There are other “free” software licenses which give similar rights to the software recipient, for example the “BSD license”, which lets appropriately licensed software be included in non-free products. The GPL is considered the most thorough of the free licenses in the sense that it tries to ensure that code, once published under the GPL, remains free. Every so often, companies have tried to include GPL code in their own non-free products. However, after being admonished by (usually) the FSF as the copyright holder, these companies have always complied with the GPL’s requirements. In various jurisdictions the GPL has been validated in courts of law. For example, in the Frankfurt (Germany) Landgericht (state court), a Linux kernel developer obtained a judgement against D-Link (a manufacturer of network components, in this case a Linux-based NAS device), which was sued for damages because it did not adhere to the GPL conditions when distributing the device [GPL-Urteil06].

Why does the GPL work? Some companies that thought the GPL conditions onerous have tried to declare it invalid or to have it declared invalid. For example, it was called “un-American” or “unconstitutional” in the United States; in Germany, anti-trust law was used in an attempt to prove that the GPL amounts to price fixing. The general idea seems to be that GPL-ed software can be used by anybody if something is demonstrably wrong with the GPL. All these attacks ignore one fact: Without the GPL, nobody except the original author has the right to do anything with the code, since actions like sharing (let alone selling) the code are the author’s prerogative. So if the GPL goes away, all other interested parties are worse off than they were.

A lawsuit where a software author sues a company that distributes his GPL code without complying with the GPL would look roughly like this:

Judge: What seems to be the problem?

Software Author: Your Lordship, the defendant has distributed my software without a license.

Judge (to the defendant’s counsel): Is that so?

At this point the defendant can say “yes”, and the lawsuit is essentially over (except for the verdict). They can also say “no”, but then it is up to them to justify why copyright law does not apply to them. This is an uncomfortable dilemma and the reason why few companies actually do this to themselves; most GPL disagreements are settled out of court.

If a manufacturer of proprietary software violates the GPL (e. g., by including a few hundred lines of source code from a GPL project in their product), this does not imply that all of that product’s code must now be released under the terms of the GPL. It only implies that they have distributed GPL code without a license. The manufacturer can solve this problem in various ways:

• They can remove the GPL code and replace it by their own code. The GPL then becomes irrelevant for their software.

• They can negotiate with the GPL code’s copyright holder (if he is available and willing to go along) and, for instance, agree to pay a license fee. See also the section on multiple licenses below.

• They can release their entire program under the GPL voluntarily and thereby comply with the GPL’s conditions (the most unlikely method).

Independently of this, there may be damages payable for the prior violations. The copyright status of the proprietary software, however, is not affected in any way.

When is a software package considered “free” or “open source”? There are no definite criteria, but a widely-accepted set of rules is the Debian Free Software Guidelines [DFSG]. The FSF summarizes its criteria as the Four Freedoms, which must hold for a free software package:

• The freedom to use the software for any purpose (freedom 0)

• The freedom to study how the software works, and to adapt it to one’s requirements (freedom 1)

• The freedom to pass the software on to others, in order to help one’s neighbours (freedom 2)

• The freedom to improve the software and publish the improvements, in order to benefit the general public (freedom 3)

Access to the source code is a prerequisite for freedoms 1 and 3. Of course, com-

mon free-software licenses such as the GPL or the BSD license conform to these

freedoms.

In addition, the owner of a software package is free to distribute it under different licenses at the same time, e.g., the GPL and, alternatively, a “commercial” license that frees the recipient from the GPL restrictions such as the duty to make available the source code for modifications. This way, private users and free software authors can enjoy the use of a powerful programming library such as the “Qt” graphics package (published by Qt Software, formerly Trolltech, a Nokia subsidiary), while companies that do not want to make their own source code available may “buy themselves freedom” from the GPL.

Exercises

1.3 [!2] Which of the following statements concerning the GPL are true and which are false?

1. GPL software may not be sold.

2. GPL software may not be modified by companies in order to base their own products on it.

3. The owner of a GPL software package may distribute the program under a different license as well.

4. The GPL is invalid, because one sees the license only after having obtained the software package in question. For a license to be valid, one must be able to inspect it and accept it before acquiring the software.

1.4 [4] Some software licenses require that when a file from a software distribution is changed, it must be renamed. Is software distributed under such a license considered “free” according to the DFSG? Do you think this practice makes sense?

1.4 Linux—The Kernel

Strictly speaking, the name “Linux” only applies to the operating system “kernel”, which performs the actual operating system tasks. It takes care of elementary functions like memory and process management and hardware control. Application programs must call upon the kernel to, e.g., access files on disk. The kernel validates such requests and in doing so can prevent anybody from accessing other users’ private files. In addition, the kernel ensures that all processes in the system (and hence all users) get the appropriate fraction of the available CPU time.

Of course there is not just one Linux kernel, but many different versions. Until kernel version 2.6, stable “end-user versions” and unstable “developer versions” were distinguished as follows:

• In version numbers such as 1.x.y or 2.x.y, x denotes a stable version if it is even. There should be no radical changes in stable versions; mistakes should be corrected, and every so often drivers for new hardware components or other very important improvements are added or “back-ported” from the development kernels.

• Versions with odd x are development versions, which are unsuitable for productive use. They may contain inadequately tested code and are mostly meant for people wanting to take an active part in Linux development. Since Linux is constantly being improved, there is a constant stream of new kernel versions. Changes mostly concern adaptations to new hardware or the optimization of various subsystems, sometimes even completely new extensions.
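The even/odd rule for kernels before 2.6 is mechanical enough to express as a small shell function. This is only a sketch. It assumes the version string has the form major.minor.patch, and it does no input validation. The function name classify_old_kernel is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: classify a pre-2.6 kernel version ("major.minor.patch")
# by the parity of its second number: even means stable, odd means development.
classify_old_kernel() {
    # Extract the second dot-separated field, e.g. "4" from "2.4.19".
    minor=$(printf '%s\n' "$1" | cut -d. -f2)
    if [ $((minor % 2)) -eq 0 ]; then
        echo stable
    else
        echo development
    fi
}

for v in 2.0.40 2.1.132 2.4.19 2.5.75; do
    printf '%s: %s\n' "$v" "$(classify_old_kernel "$v")"
done
```

The loop should report 2.0.40 and 2.4.19 as stable and 2.1.132 and 2.5.75 as development. The rule stopped carrying this meaning with the 2.6 series.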

The procedure changed with kernel 2.6: Instead of starting version 2.7 for new development after a brief stabilisation phase, Linus Torvalds and the other kernel developers decided to keep Linux development closer to the stable versions. This is supposed to avoid the divergence of developer and stable versions that grew to be an enormous problem in the run-up to Linux 2.6. The main reason was that corporations like SUSE and Red Hat took great pains to backport interesting properties of the developer version 2.5 to their versions of the 2.4 kernel. They did so to such an extent that, for example, a SUSE 2.4.19 kernel contained many hundreds of differences from the “vanilla” 2.4.19 kernel.

The current procedure consists of “test-driving” proposed changes and enhancements in a new kernel version which is then declared “stable” in a shorter timeframe. For example, after version 2.6.37 there is a development phase during which Linus Torvalds accepts enhancements and changes for the 2.6.38 version. Other kernel developers (or whoever else fancies it) have access to Linus’ internal development version, which, once it looks reasonable enough, is made available as the “release candidate” 2.6.38-rc1. This starts the stabilisation phase, where this release candidate is tested by more people until it looks stable enough to be declared the new version 2.6.38 by Linus Torvalds. Then follows the 2.6.39 development phase and so on.

In parallel to Linus Torvalds’ “official” version, Andrew Morton maintains a more experimental version, the so-called “-mm tree”. This is used to test larger and more sweeping changes until they are mature enough to be taken into the official kernel by Linus.

Some other developers maintain the “stable” kernels. Accordingly, there may be kernels numbered 2.6.38.1, 2.6.38.2, …, which each contain only small and straightforward changes such as fixes for grave bugs and security issues. This gives Linux distributors the opportunity to rely on kernel versions maintained for longer periods of time.

On 21 July 2011, Linus Torvalds officially released version 3.0 of the Linux kernel. This was really supposed to be version 2.6.40, but he wanted to simplify the version numbering scheme. “Stable” kernels based on 3.0 are accordingly numbered 3.0.1, 3.0.2, …, and the next kernels in Linus’ development series are 3.1-rc1, etc., leading up to 3.1 and so forth.

Linus Torvalds insists that there was no big difference in functionality between the 2.6.39 and 3.0 kernels (at least not more so than between any two other consecutive kernels in the 2.6 series), but that there was just a renumbering. Linux’s 20th anniversary was put forward as the occasion for it.
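The numbering conventions described above for 2.6 and later can also be captured in a short shell function. Release candidates are marked “-rcN”, and stable updates carry one extra number beyond their base release. This is an illustrative sketch only. describe_kernel is a hypothetical name, the function assumes well-formed version strings like the examples in the text, and it does not handle the pre-2.6 scheme.

```sh
#!/bin/sh
# Hypothetical helper: describe a kernel version under the post-2.6 scheme.
# "-rcN" marks a release candidate; one number more than the base release
# (2.6.38 -> 2.6.38.2, 3.0 -> 3.0.1) marks a "stable" update.
describe_kernel() {
    v=$1
    case $v in
        *-rc*)
            # Strip everything from "-rc" on to get the upcoming release.
            echo "release candidate for ${v%%-rc*}"
            return
            ;;
    esac
    # Count the dot-separated fields of the version string.
    fields=$(printf '%s\n' "$v" | awk -F. '{ print NF }')
    case $v in
        2.6.*) base_fields=3 ;;   # base releases look like 2.6.38
        *)     base_fields=2 ;;   # base releases look like 3.0, 3.1
    esac
    if [ "$fields" -gt "$base_fields" ]; then
        # Drop the last field to name the base release.
        echo "stable update of ${v%.*}"
    else
        echo "release"
    fi
}

for v in 2.6.38-rc1 2.6.38 2.6.38.2 3.0 3.0.1 3.1-rc1; do
    printf '%s: %s\n' "$v" "$(describe_kernel "$v")"
done
```

For the sample versions, the loop should print “release candidate for 2.6.38”, “release”, “stable update of 2.6.38”, “release”, “stable update of 3.0” and “release candidate for 3.1”, in that order.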


You can obtain source code for “official” kernels on the Internet from ftp.kernel.org. However, only very few Linux distributors use the original kernel sources. Distribution kernels are usually modified more or less extensively, e. g., by integrating additional drivers or features that are desired by the distribution but not part of the standard kernel. The Linux kernel used in SUSE’s Linux Enterprise Server 8, for example, reputedly contained approximately 800 modifications to the “vanilla” kernel source. (Thanks to the changes in the Linux development process, the difference is not as great today.)
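To see which kernel a system is actually running, you can ask uname(1) for the kernel release string. On distribution kernels this string often carries a distribution-specific suffix after the upstream version number. The sketch below assumes that this suffix is separated by a hyphen, which is common but not universal.

```sh
#!/bin/sh
# Print the release string of the running kernel (format varies by distribution).
release=$(uname -r)
echo "Running kernel release: $release"

# Assumption: the upstream version is everything before the first "-".
# A release string without a hyphen is printed unchanged.
upstream=${release%%-*}
echo "Upstream version part:  $upstream"
```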

Today most kernels are modular. This was not always the case; former kernels consisted of a single piece of code fulfilling all necessary functions such as the support of particular devices. If you wanted to add new hardware or make use of a different feature like a new type of file system, you had to compile a new kernel from sources, a very time-consuming process. To avoid this, the kernel was endowed with the ability to integrate additional features by way of modules. Modules are pieces of code that can be added to the kernel dynamically (at runtime) as well as removed. Today, if you want to use a new network adapter, you do not have to compile a new kernel but merely need to load a new kernel module.
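The kernel exposes the list of currently loaded modules in /proc/modules, which lsmod(8) presents in tabular form. The sketch below only reads that file. Actually loading or removing a module (with modprobe and modprobe -r) requires root privileges, so it is only mentioned in a comment. The sketch assumes a kernel built with module support; without it, the file does not exist.

```sh
#!/bin/sh
# List loaded kernel modules by reading /proc/modules directly.
# The first space-separated field of each line is the module name.
# (To load or unload a module, root would run "modprobe NAME" or
#  "modprobe -r NAME"; this script deliberately does neither.)
if [ -r /proc/modules ]; then
    count=$(wc -l < /proc/modules)
    echo "Loaded modules: $count"
    # Show only the first five module names.
    cut -d' ' -f1 /proc/modules | head -n 5
else
    echo "/proc/modules not readable; no module information available"
fi
```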

Modern Linux distributions support automatic hardware recognition, which can analyze a system’s properties and locate and configure the correct driver modules.

Exercises

1.5 [1] What is the version number of the current stable Linux kernel? The current developer kernel? Which Linux kernel versions are still being supported?

1.5 Linux Properties

As a modern operating system kernel, Linux has a number of properties, some

of which are part of the “state of the art” (i. e., exhibited by similar systems in an

equivalent form) and some of which are unique to Linux.

• Linux supports a large selection of processors and computer architectures, ranging from mobile phones (the very successful “Android” operating system by Google, like some other similar systems, is based on Linux) through PDAs and tablets, all sorts of new and old PC-like computers and server systems of various kinds up to the largest mainframe computers (the vast majority of the machines on the list of the fastest computers in the world are running Linux).

A huge advantage of Linux in the mobile arena is that, unlike Microsoft Windows, it supports the energy-efficient and powerful ARM processors that most mobile devices are based upon. In 2012, Microsoft released an ARM-based, partially Intel-compatible, version of Windows 8 under the name of “Windows RT”, but that did not exactly prove popular in the market.

• Of all currently available operating systems, Linux supports the broadest selection of hardware. For the very newest components there may not be drivers available immediately, but on the other hand Linux still works with devices that systems like Windows have long since left behind. Thus, your investments in printers, scanners, graphics cards, etc. are protected optimally.

• Linux supports “preemptive multitasking”, that is, several processes are running in parallel, virtually or, on systems with more than one CPU, even actually. These processes cannot obstruct or damage one another; the kernel ensures that every process is allotted CPU time according to its priority.

This is nothing special today; when Linux was new, this was much more remarkable.

On carefully configured systems this may approach real-time behaviour, and in fact there are Linux variants that are being used to control industrial plants requiring “hard” real-time ability, as in guaranteed (quick) response times to external events.

• Linux supports several users on the same system, even at the same time (via the network or serially connected terminals, or even several screens, keyboards, and mice connected to the same computer). Different access permissions may be assigned to each user.

• Linux can eortlessly be installed alongside other operating systems on the

same computer, so you can alternately start Linux or another system. By

means of “virtualisation”, a Linux system can be split into independentvirtualisation

parts that look like separate computers from the outside and can run Linux

or other operating systems. Various free or proprietary solutions are avail-

able that enable this.

• Linux uses the available hardware eciently. The dual-core CPUs commonefficiency

today are as fully utilised as the 4096 CPU cores of a SGI Altix server. Linux

does not leave working memory (RAM) unused, but uses it to cache data

from disk; conversely, available working memory is used reasonably in or-

der to cope with workloads that are much larger than the amount of RAM

inside the computer.

• Linux is source-code compatible with POSIX, System V and BSD and hence allows the use of nearly all Unix software available in source form.

• Linux not only oers powerful “native” le systems with properties suchfile systems

as journaling, encryption, and logical volume management, but also allows

access to the le systems of various other operating systems (such as the

Microsoft Windows FAT, VFAT, and NTFS le systems), either on local disks

or across the network on remote servers. Linux itself can be used as a le

server in Linux, Unix, or Windows networks.

• The Linux TCP/IP stack is arguably among the most powerful in the industry (largely because much of the research and development in this area is done on Linux). It supports IPv4 and IPv6 and all important options and protocols.

• Linux oers powerful and elegant graphical environments for daily workgraphical environments

and, in X11, a very popular network-transparent base graphics system. Ac-

celerated 3D graphics is supported on most popular graphics cards.

• All important productivity applications are available—oce-type pro-productivity applications

grams, web browsers, programs to access electronic mail and other com-

munication media, multimedia tools, development environments for a di-

verse selection of programming languages, and so on. Most of this software

comes with the system at no cost or can be obtained eortlessly and cheaply

over the Internet. The same applies to servers for all important Internet pro-

tocols as well as entertaining games.

The exibility of Linux not only makes it possible to deploy the system on all

sorts of PC-class computers (even “old chestnuts” that do not support current

Windows can serve well in the kids’ room, as a le server, router, or mail server),

but also helps it make inroads in the “embedded systems” market, meaning com-embedded systems

plete appliances for network infrastructure or entertainment electronics. You will,

for example, nd Linux in the popular AVM FRITZ!Box and similar WLAN, DSL

or telephony devices, in various set-top boxes for digital television, in PVRs, digi-

tal cameras, copiers, and many other devices. Your author has seen the bottle bank

1.5 Linux Properties 25

in the neighbourhood supermarket boot Linux. This is very often not trumpeted

all over the place, but, in addition to the power and convenience of Linux itself

the manufacturers appreciate the fact that, unlike comparable operating systems,

Linux does not require licensing fees “per unit sold”.

Another advantage of Linux and free software is the way the community deals with security issues. In practice, security issues are as unavoidable in free software as they are in proprietary code; at least, nobody so far has written and published a software system of interesting size that proved completely free of them in the long run. In particular, it would be improper to claim that free software has no security issues. The differences are more likely to be found on a philosophical level:

• As a rule, a vendor of proprietary software has no interest in xing security

issues in their code—they will try to cover up problems and to talk down

possible dangers for as long as they possibly can, since constantly publish-

ing “patches” means, in the best case, terrible PR (“where there is smoke,

there must be a re”; the competition, which just happens not to be in the

spotlight of scrutiny at the moment, is having a secret laugh), and, in the

worst case, great expense and lots of hassle if exploits are around that make

active use of the security holes. Besides, there is the usual danger of intro-

ducing three new errors while xing one known one, which is why xing

bugs in released software is normally not an econonomically viable propo-

sition.

• A free-software publisher does not gain anything by sitting on information about security issues, since the source code is generally available, and everybody can find the problems. It is, in fact, a matter of pride to fix known security issues as quickly as possible. The fact that the source code is publicly available also implies that third parties find it easy to audit code for problems that can be repaired proactively. (A common claim is that the availability of source code exerts a very strong attraction on crackers and other unsavoury vermin. In fact, these low-lifes do not appear to have major difficulties identifying large numbers of security issues in proprietary systems such as Windows, whose source code is not generally available. Thus any difference, if it exists, must be minute indeed.)

• Especially as far as software dealing with cryptography (the encryption and decryption of confidential information) is concerned, there is a strong argument that availability of source code is an indispensable prerequisite for trust that a program really does what it is supposed to do, i. e., that the claimed encryption algorithm has been implemented completely and correctly. Linux does have an obvious advantage here.

Linux is used throughout the world by private and professional users: companies, research establishments, universities. It plays an important role particularly as a system for web servers (Apache), mail servers (Sendmail, Postfix), file servers (NFS, Samba), print servers (LPD, CUPS), ISDN routers, X terminals, scientific/engineering workstations etc. Linux is an essential part of industrial IT departments. Widespread adoption of Linux in public administration, such as the city of Munich, also serves as a signal. In addition, many reputable IT companies such as IBM, Hewlett-Packard, Dell, Oracle, Sybase, Informix, SAP, Lotus etc. are adapting their products to Linux or already selling Linux versions. Furthermore, ever more computers (such as “netbooks”) come with Linux or are at least tested for Linux compatibility by their vendors.

Exercises

C1.6 [4] Imagine you are responsible for IT in a small company (20–30 employees). In the office there are approximately 20 desktop PCs and two servers (a file and printer server and a mail and Web proxy server). So far everything runs on Windows. Consider the following scenarios:


• The le and printer server is replaced by a Linux server using Samba

(a Linux/Unix-based server for Windows clients).

• The mail and proxy server is replaced by a Linux server.

• The twenty oce desktop PCs are replaced by Linux machines.

Comment on the dierent scenarios and draw up short lists of their advan-

tages and disadvantages.

1.6 Linux Distributions

Linux in the proper sense of the word consists only of the operating system kernel. To accomplish useful work, a multitude of system and application programs, libraries, documentation etc. is necessary. “Distributions” are nothing but up-to-date selections of these, together with special programs (usually tools for installation and maintenance), provided by companies or other organisations, possibly together with other services such as support, documentation, or updates. Distributions differ mostly in the selection of software they offer, their administration tools, extra services, and price.

“Fedora” is a freely available Linux distribution developed under the guidance of the US-based company Red Hat. It is the successor of the “Red Hat Linux” distribution; Red Hat has withdrawn from the private end-user market and aims its “Red Hat” branded distributions at corporate customers. Red Hat was founded in 1993 and became a publicly traded corporation in August 1999; the first Red Hat Linux was issued in 1994, the last (version 9) in March 2003. “Red Hat Enterprise Linux” (RHEL), the current product, appeared for the first time in March 2002. Fedora is a freely available offering and serves as a development platform for RHEL. Red Hat only makes Fedora available for download; while Red Hat Linux was sold as a “boxed set” with CDs and manuals, Red Hat now leaves this to third-party vendors.

The SUSE company was founded in 1992 under the name “Gesellschaft für Software und Systementwicklung” (“Company for Software and Systems Development”) as a Unix consultancy and accordingly abbreviated itself as “S.u.S.E.” One of its products was a version of Patrick Volkerding’s Linux distribution, Slackware, that was adapted to the German market. (Slackware, in turn, was derived from the first complete Linux distribution, “Softlanding Linux System” or SLS.) S.u.S.E. Linux 1.0 came out in 1994 and gradually diverged from Slackware, for example by taking on Red Hat features such as the RPM package manager or the /etc/sysconfig directory. The first version of S.u.S.E. Linux that no longer looked like Slackware was version 4.2 of 1996. SuSE (the dots were dropped at some point) soon gained market leadership in German-speaking Europe and published SuSE Linux in a “box” in two flavours, “Personal” and “Professional”; the latter was markedly more expensive and contained more server software. Like Red Hat, SuSE offered an enterprise-grade Linux distribution called “SuSE Linux Enterprise Server” (SLES), with some derivatives like a specialised server for mail and groupware (“SuSE Linux OpenExchange Server” or SLOX). In addition, SuSE endeavoured to make their distribution available on IBM’s mid-range and mainframe computers.

In November 2003, the US software company Novell announced its intention of taking over SuSE for 210 million dollars; the deal was concluded in January 2004. (The “U” went uppercase on that occasion.) Like Red Hat, SUSE has by now taken the step to open the “private customer” distribution and make it freely available as “openSUSE” (earlier versions appeared for public download only after a delay of several months). Unlike Red Hat, Novell/SUSE still offers a “boxed” version containing additional proprietary software. Among others, SUSE still sells SLES and a corporate desktop platform called “SUSE Linux Enterprise Desktop” (SLED).

Figure 1.3: Organizational structure of the Debian project. (Graphic by Martin F. Krafft.)

In early 2011, Novell was acquired by Attachmate, which in turn was taken over by Micro Focus in 2014. Both are companies whose main field of business is traditional mainframe computers and which so far have not distinguished themselves in the Linux and open-source arena. These maneuverings, however, have had fairly little influence on SUSE and its products.

A distinctive feature of SUSE distributions is “YaST”, a comprehensive graphical administration tool.

Unlike the two big Linux distribution companies Red Hat and Novell/SUSE, the Debian project is a collaboration of volunteers whose goal is to make a high-quality Linux distribution called “Debian GNU/Linux” available.

The Debian project was announced on 16 August 1993 by Ian Murdock; the name combines his first name with that of his then-girlfriend (now ex-wife) Debra (and is hence pronounced “debb-ian”). By now the project includes more than 1000 volunteers.

Debian is based on three documents:

• The Debian Free Software Guidelines (DFSG) dene which software the

project considers “free”. This is important, since only DFSG-free soft-

ware can be part of the Debian GNU/Linux distribution proper. The

project also distributes non-free software, which is strictly separated

from the DFSG-free software on the distribution’s servers: The latter

is in subdirectory called

main

, the former in

non- free

. (There is an inter-

mediate area called

contrib

; this contains software that by itself would

be DFSG-free but does not work without other, non-free, components.)


• The Social Contract describes the project’s goals.

• The Debian Constitution describes the project’s organisation.

At any given time there are at least three versions of Debian GNU/Linux: new or corrected versions of packages are put into the unstable branch. If, for a certain period of time, no grave errors have appeared in a package, it is copied to the testing branch. Every so often the content of testing is “frozen”, tested very thoroughly, and finally released as stable. A frequently voiced criticism of Debian GNU/Linux is the long timespan between stable releases; many, however, consider this an advantage. The Debian project makes Debian GNU/Linux available for download only; media are available from third-party vendors.

By virtue of its organisation, its freedom from commercial interests, and its clean separation between free and non-free software, Debian GNU/Linux is a sound basis for derivative projects. Some of the more popular ones include Knoppix (a “live CD” which makes it possible to test Linux on a PC without having to install it first), SkoleLinux (a version of Linux especially adapted to the requirements of schools), or commercial distributions such as Xandros. Limux, the desktop Linux variant used in the Munich city administration, is also based on Debian GNU/Linux.

One of the most popular Debian derivatives is Ubuntu, which is oeredUbuntu

by the British company, Canonical Ltd., founded by the South African

entrepreneur Mark Shuttleworth. (“Ubuntu” is a word from the Zulu lan-

guage and roughly means “humanity towards others”.) The goal of Ubuntugoal

is to oer, based on Debian GNU/Linux, a current, capable, and easy-to-

understand Linux which is updated at regular intervals. This is facilitated,

for example, by Ubuntu being oered on only three computer architec-

tures as opposed to Debian’s ten, and by restricting itself to a subset of the

software oered by Debian GNU/Linux. Ubuntu is based on the

unstable

branch of Debian GNU/Linux and uses, for the most part, the same tools

for software distribution, but Debian and Ubuntu software packages are

not necessarily mutually compatible.

Some Ubuntu developers are also active participants in the Debian project, which ensures a certain degree of exchange. On the other hand, not all Debian developers are enthusiastic about the shortcuts Ubuntu takes every so often in the interest of pragmatism, where Debian might look for more comprehensive solutions even if these require more effort. In addition, Ubuntu does not appear to feel as indebted to the idea of free software as does Debian; while all of Debian’s infrastructure tools (such as the bug management system) are available as free software, this is not always the case for those of Ubuntu.

Ubuntu not only wants to oer an attractive desktop system, but also takeUbuntu vs. SUSE/Red Hat

on the more established systems like RHEL or SLES in the server space, by

oering stable distributions with a long life cycle and good support. It is

unclear how Canonical Ltd. intends to make money in the long run; for the

time being the project is mostly supported out of Mark Shuttleworth’s pri-

vate coers, which are fairly well-lled since he sold his Internet certicate

authority, Thawte, to Verisign …

In addition to these distributions there are many more, such as Mageia or Linux Mint as smaller “generally useful” distributions, various “live systems” for different uses from firewalls to gaming or multimedia platforms, or very compact systems usable as routers, firewalls, or rescue systems.

Even though there is a vast number of distributions, most look fairly similar in daily life. This is due, on the one hand, to the fact that they use the same basic programs; for example, the command line interpreter is nearly always bash. On the other hand, there are standards that try to keep distributions from drifting apart, most notably the “Filesystem Hierarchy Standard” (FHS) and the “Linux Standard Base” (LSB).

Exercises

C1.7 [2] Some Linux hardware platforms have been enumerated above. For which of those platforms are there actual Linux distributions available? (Hint: http://www.distrowatch.org/)

Summary

• Linux is a Unix-like operating system.

• The rst version of Linux was developed by Linus Torvalds and made avail-

able on the Internet as “free software”. Today, hundreds of developers all

over the world contribute to updating and extending the system.

• The GPL is the best-known “free software” license. It tries to ensure that the recipients of software can modify and redistribute the package, and that these “freedoms” are passed on to future recipients. GPL software may also be sold.

• To the user, “open source” means approximately the same as “free software”.

• There are other free licenses besides the GPL. Software may also be distributed by the copyright owner under several licenses at the same time.

• Linux is actually just the operating system kernel. We distinguish “stable” and “development” kernels; with the former, the second part of the version number is even and with the latter, odd. Stable kernels are meant for end users, while development kernels are not necessarily functional, representing interim versions of Linux development.

• There are numerous Linux distributions bringing together a Linux kernel and additional software, documentation and installation and administration tools.
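The even/odd rule from the summary above can be written down as a small shell function. This is only a sketch: the function name kernel_series is made up for illustration, and it assumes well-formed version strings such as 2.4.37. Keep in mind that the rule describes the older kernel series. 2.6 had no odd-numbered development successor, and from 3.0 onward the version number no longer distinguishes stable from development kernels, so the sketch declines to classify those.

```shell
#!/bin/sh
# Sketch: classify a kernel version under the historical even/odd rule
# (second number even = stable series, odd = development series).
# Assumption: only 1.x and 2.x versions follow this rule.
kernel_series() {
    local major rest minor
    major=${1%%.*}      # "2.4.37" -> "2"
    rest=${1#*.}        # "2.4.37" -> "4.37"
    minor=${rest%%.*}   # "4.37"   -> "4"
    if [ "$major" -ge 3 ]; then
        echo "not applicable"
    elif [ $((minor % 2)) -eq 0 ]; then
        echo "stable"
    else
        echo "development"
    fi
}

kernel_series 2.4.37   # prints "stable"
kernel_series 2.5.75   # prints "development"
kernel_series 6.1      # prints "not applicable"
```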

Bibliography

DFSG “Debian Free Software Guidelines”. http://www.debian.org/social_contract

GPL-Urteil06 Landgericht Frankfurt am Main. “Urteil 2-6 0 224/06”, July 2006. http://www.jbb.de/urteil_lg_frankfurt_gpl.pdf

GPL91 Free Software Foundation, Inc. “GNU General Public License, Version 2”, June 1991. http://www.gnu.org/licenses/gpl.html

LR89 Don Libes, Sandy Ressler. Life with UNIX: A Guide for Everyone. Prentice-Hall, 1989. ISBN 0-13-536657-7.

Rit84 Dennis M. Ritchie. “The Evolution of the Unix Time-sharing System”. AT&T Bell Laboratories Technical Journal, October 1984. 63(6p2):1577–93. http://cm.bell-labs.com/cm/cs/who/dmr/hist.html

RT74 Dennis M. Ritchie, Ken Thompson. “The Unix Time-sharing System”. Communications of the ACM, July 1974. 17(7):365–73. The classical paper on Unix.

TD02 Linus Torvalds, David Diamond. Just For Fun: The Story of an Accidental Revolutionary. HarperBusiness, 2002. ISBN 0-066-62073-2.


Using the Linux System

Contents

2.1 Logging In and Out
2.2 Switching On and Off
2.3 The System Administrator

Goals

• Logging on and o the system

• Understanding the dierence between normal user accounts and the system

administrator’s account

Prerequisites

• Basic knowledge of using computers is helpful



Figure 2.1: The login screens of some common Linux distributions

2.1 Logging In and Out

The Linux system distinguishes between dierent users. Consequently, it may

be impossible to start working right after the computer has been switched on.

First you have to tell the computer who you are—you need to “log in” (or “on”).

Based on the information you provide, the system can decide what you may do

(or may not do). Of course you need access rights to the system (an “account”) –access rights

the system administrator must have entered you as a valid user and assigned you

a user name (e. g.,

joe

) and a password (e. g.,

secret

). The password is supposed to

ensure that only you can use your account; you must keep it secret and should not

make it known to anybody else. Whoever knows your user name and password

can pretend to be you on the system, read (or delete) all your les, send electronic

mail in your name and generally get up to all kinds of shenanigans.

Modern Linux distributions want to make it easy on you and allow you to skip the login process on a computer that only you will be using anyway. If you use such a system, you will not have to log in explicitly, but the computer boots straight into your session. You should of course take advantage of this only if you do not foresee that third parties have access to your computer; refrain from this in particular on laptop computers or other mobile systems that tend to get lost or stolen.

Logging in in a graphical environment These days it is common for Linux workstations to present a graphical environment (as they should), and the login process takes place in a graphical environment as well. Your computer shows a dialog that lets you enter your user name and password (Figure 2.1 shows some representative examples).

Don’t be surprised if you only see asterisks while you are entering your password. This does not mean that your computer misunderstands your input, but that it wants to make life more difficult for people who are watching over your shoulder in order to find out your password.

After you have logged in, the computer starts a graphical session for you, in which you have convenient access to your application programs by means of menus and icons (small pictures on the “desktop” background). Most graphical environments for Linux support “session management” in order to restore your session the way it was when you finished it the time before (as far as possible, anyway). That way you do not need to remember which programs you were running, where their windows were placed on the screen, and which files you had been using.

Logging out in a graphical environment If you are done with your work or want to free the computer for another user, you need to log out. This is also important because the session manager needs to save your current session for the next time. How logging out works in detail depends on your graphical environment, but as a rule there is a menu item somewhere that does everything for you. If in doubt, consult the documentation or ask your system administrator (or knowledgeable buddy).

Logging in on a text console Unlike workstations, server systems often support only a text console or are installed in draughty, noisy machine halls, where you don’t want to spend more time than absolutely necessary. So you will prefer to log into such a computer via the network. In both cases you will not see a graphical login dialog, only a text prompt. For example, you might simply see something like

computer login: _

(if we stipulate that the computer in question is called “computer”). Here you must enter your user name and finish it off with the ↩ key. The computer will continue by asking you for your password:

Password: _

Enter your password here. (This time you won’t even see asterisks, simply nothing at all.) If both the user name and password were correct, the system will accept your login. It starts the command line interpreter (the shell), and you can enter commands and invoke programs. After logging in you will be placed in your “home directory”, where you will be able to find your files.

If you use the “secure shell”, for example, to log in to another machine over the network, the user name question is usually skipped, since unless you specify otherwise the system will assume that your user name on the remote computer will be the same as on the computer you are initiating the session from. The details are beyond the scope of this manual; the secure shell is discussed in detail in the Linup Front training manual Linux Administration II.

Logging out on a text console On the text console, you can log out using, for example, the logout command:

$ logout


Once you have logged out, on a text console the system once more displays the start message and a login prompt for the next user. With a secure shell session, you simply get another command prompt from your local computer.

Exercises

C2.1 [!1] Try logging in to the system. After that, log out again. (You will nd

a user name and password in your system documentation, or—in a training

centre—your instructor will tell you what to use.)

C2.2 [!2] What happens if you give (a) a non-existing user name, (b) a wrong password? Do you notice anything unusual? What reasons could there be for the system to behave as it does?

2.2 Switching On and Off

A Linux computer can usually be switched on by whoever is able to reach the switch (local policy may vary). On the other hand, you should not switch off a Linux machine on a whim: there might be data left in main memory that really belongs on disk and will be lost, or, which would be worse, the data on the hard disk could get completely addled. Besides, other users might be logged in to the machine via the network, be surprised by the sudden shutdown, and lose valuable work. For this reason, important computers are usually only “shut down” by the system administrator. Single-user workstations, though, can usually be shut down cleanly via the graphical desktop; depending on the system’s settings, normal user privileges may suffice, or you may have to enter the administrator’s password.

Exercises

C2.3 [2] Check whether you can shut down your system cleanly as a normal (non-administrator) user, and if so, try it.

2.3 The System Administrator

As a normal user, your privileges on the system are limited. For example, you may not write to certain files (in fact most files, namely those that do not belong to you) and not even read some files (e. g., the file containing the encrypted passwords of all users). However, there is a user account for system administration which is not subject to these restrictions: the user “root” may read and write all files, and do various other things normal users aren’t entitled to. Having administrator (or “root”) rights is a privilege as well as a danger; therefore you should only log on to the system as root if you actually want to exercise these rights, not just to read your mail or surf the Internet.

Simply pretend you are Spider-Man: “With great power comes great responsibility”. Even Spider-Man wears his Spandex suit only if he must …

In particular, you should avoid logging in as root via the graphical user interface, since all of the GUI will run with root’s privileges. This is a possible security risk: GUIs like KDE contain lots of code which is not vetted as thoroughly for security holes as the textual shell (which is, by comparison, relatively compact).

Normally you can use the command “/bin/su -” to assume root’s identity (and thus root’s privileges). su asks for root’s password and then starts a new shell, which lets you work as if you had logged in as root directly. You can leave the shell again using the exit command.


Figure 2.2: Running programs as a dierent user in KDE

You should get used to invoking su via its full path name, “/bin/su -”. Otherwise, a user could trick you by calling you to her computer, getting you to enter “su -” in one of her windows and to input the root password. What you don’t realize at that moment is that the clever user wrote her own “Trojan” su command, which doesn’t do anything except write the password to a file, output the “wrong password” error message and remove itself. When you try again (gritting your teeth) you get the correct su, and your user possesses the coveted administrator’s privileges …
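As a complement to this advice, you can ask a shell which file a bare command name would actually run. The following sketch uses the POSIX command -v builtin; the function name check_command_path and the list of “trusted” directories are assumptions chosen for illustration. It warns if the name resolves to anything other than a file in the usual system directories, which also catches an alias or shell function called su. Bear in mind that the check runs inside the very shell it inspects, so in someone else’s window it is no substitute for typing the full path /bin/su.

```shell
#!/bin/sh
# Sketch: report which file a command name resolves to, and warn if it
# is not in one of the (assumed) trusted system directories.
check_command_path() {
    local name path
    name=$1
    # command -v prints an absolute path for programs found via $PATH;
    # for aliases, functions and builtins it prints something else.
    if ! path=$(command -v "$name"); then
        echo "$name: not found"
        return 2
    fi
    case $path in
        /bin/"$name" | /usr/bin/"$name" | /sbin/"$name" | /usr/sbin/"$name")
            echo "$name -> $path"
            return 0 ;;
        *)
            echo "WARNING: $name resolves to '$path'"
            return 1 ;;
    esac
}
```

Typing check_command_path su then prints either the resolved path or a warning. A non-zero return status means the result deserves a closer look.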

You can usually tell that you actually have administrator privileges by looking at the shell prompt: for root, it customarily ends with the “#” character. (For normal users, the shell prompt usually ends in “$” or “>”.)
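The trailing character is only a convention of the prompt. Bash, for instance, offers the prompt escape \$, which expands to # when the effective user ID is 0 and to $ otherwise. A more direct check asks for the effective user ID itself. The following sketch uses the standard id -u command; the function name privilege_level is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether this shell runs with administrator privileges.
# What matters is the effective user ID: 0 means root, whatever the
# account happens to be called.
privilege_level() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "root"
    else
        echo "normal user"
    fi
}

privilege_level
```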

In Ubuntu you can’t even log in as root by default. Instead, the system allows the first user created during installation to execute commands with administrator privileges by prefixing them with the sudo command. With

$ sudo chown joe file.txt

for example, this user could sign over the file file.txt to user joe, an operation that is restricted to the system administrator.

Recent versions of Debian GNU/Linux oer a similar arrangement to

Ubuntu.

Incidentally, with the KDE GUI, it is very easy to start arbitrary programs as root: Select “Run command” from the “KDE” menu (usually the entry at the very left of the command panel, like the “Start” menu on Windows systems), and enter a command in the dialog window. Before executing the command, click on the “Settings” button; an area with additional settings appears, where you can check “As different user” (root is helpfully set up as the default value). You just have to enter the root password at the bottom (Figure 2.2).


Alternatively, you can put “kdesu” in front of the actual command in the dialog window (or indeed any shell command line in a KDE session). This will ask you for root’s password before starting the command with administrator privileges.

Exercises

C2.4 [!1] Use the su command to gain administrator privileges, and change back to your normal account.

C2.5 [5] (For programmers.) Write a convincing “Trojan” su program. Use it to try and fool your system administrator.

C2.6 [2] Try to run the

program as

root

in a terminal session under KDE, using “Run command …”. Check the appropriate box in the extended settings to do so.

Commands in this Chapter

exit: Quits a shell (bash(1))

kdesu: Starts a program as a different user on KDE (KDE: help:/kdesu)

logout: Terminates a shell session (bash(1))

su: Starts a shell using a different user’s identity (su(1))

sudo: Allows normal users to execute certain commands with administrator privileges (sudo(8))

Summary

• Before using a Linux system, you have to log in giving your user name and password. After using the system, you have to log out again.

• Normal access rights do not apply to user root, who may do (essentially) everything. These privileges should be used as sparingly as possible.

• You should not log in to the GUI as root but use (e. g.) su to assume administrator privileges if necessary.


Who’s Afraid Of The Big Bad Shell?

Contents

3.1 Why?
3.1.1 What Is The Shell?
3.2 Commands
3.2.1 Why Commands?
3.2.2 Command Structure
3.2.3 Command Types
3.2.4 Even More Rules

Goals

• Appreciating the advantages of a command-line user interface

• Knowing about common Linux shells

• Working with Bourne-Again Shell (Bash) commands

• Understanding the structure of Linux commands

Prerequisites

• Basic knowledge of using computers is helpful


38 3 Who’s Afraid Of The Big Bad Shell?

3.1 Why?

More so than other modern operating systems, Linux (like Unix) is based on the idea of entering textual commands via the keyboard. This may sound antediluvian to some, especially if one is used to systems like Windows, which have been trying for 15 years or so to brainwash their audience into thinking that graphical user interfaces are the be-all and end-all. For many people who come to Linux from Windows, the comparative prominence of the command line interface is at first a “culture shock” like that suffered by a 21st-century person if they suddenly got transported to King Arthur’s court: no cellular coverage, bad table manners, and dreadful dentists!

However, things aren’t as bad as all that. On the one hand, nowadays there are graphical interfaces even for Linux, which are equal to what Windows or MacOS X have to offer, or in some respects even surpass these as far as convenience and power are concerned. On the other hand, graphical interfaces and the text-oriented command line are not mutually exclusive, but in fact complementary (according to the philosophy “the right tool for every job”).

At the end of the day this only means that you as a budding Linux user will do well to also get used to the text-oriented user interface, known as the “shell”. Of course nobody wants to prevent you from using a graphical desktop for everything you care to do. The shell, however, is a convenient way to perform many extremely powerful operations that are rather difficult to express graphically. To reject the shell is like rejecting all gears except first in your car¹. Sure, you’ll get there eventually even in first gear, but only comparatively slowly and with a horrible amount of noise. So why not learn how to really floor it with Linux? And if you watch closely, we’ll be able to show you another trick or two.

3.1.1 What Is The Shell?

Users cannot communicate directly with the operating system kernel. This is only

possible through programs accessing it via “system calls”. However, you must be

able to start such programs in some way. This is the task of the shell, a special user

program that (usually) reads commands from the keyboard and interprets them

(for example) as commands to be executed. Accordingly, the shell serves as an

“interface” to the computer that encloses the actual operating system like a shell

(as in “nutshell”—hence the name) and hides it from view. Of course the shell is

only one program among many that access the operating system.

Even today’s graphical “desktops” like KDE can be considered “shells”. Instead of reading text commands via the keyboard, they read graphical commands via the mouse, and just as the text commands follow a certain “grammar”, so do the mouse commands. For example, you select objects by clicking on them and then determine what to do with them: opening, copying, deleting, …

Even the very rst Unix—end-1960s vintage—had a shell. The oldest shell to

be found outside museums today was developed in the mid-1970s for “Unix ver-

sion 7” by Stephen L. Bourne. This so-called “Bourne shell” contains most basicBourne shell

functions and was in very wide-spread use, but is very rarely seen in its original

form today. Other classic Unix shells include the C shell, created at the UniversityC shell

of California in Berkeley and (very vaguely) based on the C programming lan-

guage, and the largely Bourne-shell compatible, but functionally enhanced, KornKorn shell

shell (by David Korn, also at AT&T).

Standard on Linux systems is the Bourne-again shell, bash for short. It was developed under the auspices of the Free Software Foundation’s GNU project by Brian Fox and Chet Ramey and unifies many functions of the Korn and C shells.

¹ This metaphor is for Europeans and other people who can manage a stick shift; our American readers of course all use those wimpy automatic transmissions. It’s like they were all running Windows.

3.1 Why? 39

Besides the mentioned shells, there are many more. On Unix, a shell is simply an application program like all others, and you need no special privileges to write one; you simply need to adhere to the “rules of the game” that govern how a shell communicates with other programs.

Shells may be invoked interactively to read user commands (normally on a “terminal” of some sort). Most shells can also read commands from files containing pre-cooked command sequences. Such files are called “shell scripts”.

A shell performs the following steps:

1. Read a command from the terminal (or the le)

2. Validate the command

3. Run the command directly or start the corresponding program

4. Output the result to the screen (or elsewhere)

5. Continue at step 1.
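The loop above can be imitated, very crudely, in a few lines of bash. This is a toy sketch and not how bash works internally: the function name toy_shell is made up, “validation” is reduced to skipping empty lines, and eval simply hands each line back to the real shell for execution.

```shell
#!/bin/bash
# A toy version of the shell's command loop (hypothetical, for illustration only).
toy_shell() {
  local line
  while true; do
    printf 'toy> '                # show a prompt
    IFS= read -r line || break    # 1. read a command; stop at end of input
    [ -z "$line" ] && continue    # 2. crude "validation": ignore empty lines
    eval "$line"                  # 3. run it; 4. its output goes to the screen
  done                            # 5. continue at step 1
  echo                            # finish the last prompt line
}

# Feed two commands to the toy shell through a pipe:
printf 'echo hello\necho world\n' | toy_shell
```

Because the commands arrive through a pipe instead of the keyboard, the loop ends when the input runs out, just as a real shell stops at the end of a script.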

In addition to this standard command loop, a shell generally contains further features such as a programming language. This includes complex command structures involving loops, conditions, and variables (usually in shell scripts, less frequently in interactive use). A sophisticated method for recycling recently used commands also makes a user’s life easier.

Shell sessions can generally be terminated using the exit command. This also applies to the shell that you obtained immediately after logging in.

Although, as we mentioned, there are several dierent shells, we shall concen-

trate here on

bash

as the standard shell on most Linux distributions. The LPI exams

also refer to

bash

exclusively.

If there are several shells available on the system (the usual case), you can use the following commands to switch between them:

sh     for the classic Bourne shell (if available; on many Linux systems, sh refers to the Bourne-again shell).
bash   for the Bourne-again shell (bash).
ksh    for the Korn shell.
csh    for the C shell.
tcsh   for the “Tenex C shell”, an extended and improved version of the normal C shell. On many Linux systems, the csh command really refers to tcsh.

In case you cannot remember which shell you are currently running, the “echo $0” command should work in any shell and output the current shell’s name.

Exercises

C3.1 [2] How many different shells are installed on your system? Which ones? (Hint: Check the file /etc/shells.)

C3.2 [2] Log off and on again and check the output of the “echo $0” command in the login shell. Start a new shell using the “bash” command and enter “echo $0” again. Compare the output of the two commands. Do you notice anything unusual?


3.2 Commands

3.2.1 Why Commands?

A computer’s operation, no matter which operating system it is running, can be

loosely described in three steps:

1. The computer waits for user input

2. The user selects a command and enters it via the keyboard or mouse

3. The computer executes the command

In a Linux system, the shell displays a “prompt”, meaning that commands can be entered. This prompt usually consists of a user and host (computer) name, the current directory, and a final character:

joe@red:/home > _

In this example, user joe works on computer red in the /home directory.
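In bash, the prompt is controlled by the PS1 shell variable. As a minimal sketch, assuming bash, a prompt of the form shown above could be set up as follows; distributions configure their own default in their startup files, so the exact value on your system will differ.

```shell
#!/bin/bash
# \u = user name, \h = host name (up to the first dot), \w = current directory.
# In an interactive bash, assigning PS1 changes the prompt immediately.
PS1='\u@\h:\w > '
```

For user joe on host red, working in /home, this yields a prompt like the one in the example.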

3.2.2 Command Structure

A command is essentially a sequence of characters which ends with a press of the ↩ key and is subsequently evaluated by the shell. Many commands are vaguely inspired by the English language and form part of a dedicated “command language”. Commands in this language must follow certain rules, a “syntax”, for the shell to be able to interpret them.

To interpret a command line, the shell first tries to divide the line into words. Just like in real life, words are separated by spaces. The first word on a line is usually the actual command. All other words on the line are parameters that explain what is wanted in more detail.

DOS and Windows users may be tripped up here by the fact that the shell distinguishes between uppercase and lowercase letters. Linux commands are usually spelled in lowercase letters only (exceptions prove the rule) and not understood otherwise. See also Section 3.2.4.

When dividing a command into words, one space character is as good as many; the difference does not matter to the shell. In fact, the shell does not even insist on spaces; tab characters are also allowed, which is however mostly of importance when reading commands from files, since the shell will not let you enter a tab character directly (not without jumping through hoops, anyway).

You may even use the line terminator (↩) to distribute a long command across several input lines, but you must put a “\” immediately in front of it so the shell will not consider your command finished already.
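Both rules are easy to try out with echo, which prints the words it receives separated by single spaces (a minimal sketch, assuming bash):

```shell
#!/bin/bash
# Any run of spaces between words counts as a single separator:
echo one     two      three     # prints: one two three
# A backslash right before the end of the line continues the command:
echo this is \
  still one command             # prints: this is still one command
```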

A command’s parameters can be roughly divided into two types:

• Parameters starting with a dash (“-”) are called options. These are usually, er, optional; the details depend on the command in question. Figuratively speaking, they are “switches” that allow certain aspects of the command to be switched on or off. If you want to pass several options to a command, they can (often) be accumulated behind a single dash, i. e., the option sequence “-a -l -F” corresponds to “-alF”. Many programs have more options than can be conveniently mapped to single characters, or support “long options” for readability (frequently in addition to equivalent single-character options). Long options most often start with two dashes and cannot be accumulated: “foo --bar --baz”.

3.2 Commands 41

• Parameters with no leading dash are called arguments. These are often the names of files that the command should process.

The general command structure can be displayed as follows:

• Command—“What to do?”

• Options—“How to do it?”

• Arguments—“What to do it with?”

Usually the options follow the command and precede the arguments. However,

not all commands insist on this—with some, arguments and options can be mixed

arbitrarily, and they behave as if all options came immediately after the command.

With others, options are taken into account only when they are encountered while

the command line is processed in sequence.
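To illustrate bundled options and the usual “options first” order, here is a sketch of a made-up command demo, written as a bash function (assuming bash). It relies on the bash builtin getopts, which treats “-a -l -F” and “-alF” alike and stops looking for options at the first word that does not start with a dash.

```shell
#!/bin/bash
# "demo" is a made-up command that understands the options -a, -l and -F.
demo() {
  local opt flags="" OPTIND=1
  while getopts "alF" opt; do
    case $opt in
      a|l|F) flags+=$opt ;;
      *) return 1 ;;               # unknown option: getopts already complained
    esac
  done
  shift $((OPTIND - 1))            # drop the options, keep the arguments
  echo "options: $flags; arguments: $*"
}

demo -a -l -F notes.txt   # options: alF; arguments: notes.txt
demo -alF notes.txt       # options: alF; arguments: notes.txt
demo notes.txt -a         # options: ; arguments: notes.txt -a
```

Real commands differ here: GNU programs such as ls, for instance, also accept options placed after the arguments.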

The command structure of current Unix systems (including Linux) has grown organically over a period of almost 40 years and thus exhibits various inconsistencies and small surprises. We too believe that there ought to be a thorough clean-up, but 30 years’ worth of shell scripts are difficult to ignore completely … Therefore be prepared to get used to little weirdnesses every so often.

3.2.3 Command Types

In shells, there are essentially two kinds of commands:

Internal commands: These commands are made available by the shell itself. The Bourne-again shell contains several dozen such commands, which can be executed very quickly. Some commands (such as exit) alter the state of the shell itself and thus cannot be provided externally.

External commands: The shell does not execute these commands by itself but launches executable files, which within the file system are usually found in directories like /bin or /usr/bin. As a user, you can provide your own programs, which the shell will execute like all other external commands.
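Why exit (and likewise cd, which changes the current directory) has to be internal becomes clear when you run it in a separate bash process, just as an external program would run: the child process changes only its own state, not that of the shell that started it. A small sketch, assuming bash and a /tmp directory:

```shell
#!/bin/bash
cd /
bash -c 'cd /tmp'      # a separate process changes only its own directory ...
pwd                    # ... so this still prints: /
cd /tmp                # the builtin cd changes this shell's directory
pwd                    # prints: /tmp

bash -c 'exit 3'       # only the separate bash process exits, with status 3
echo "still running; the child exited with status $?"
```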

You can use the type command to find out the type of a command. If you pass a command name as the argument, it outputs the type of command or the corresponding file name, such as

$ type echo
echo is a shell builtin
$ type date
date is /bin/date

(echo is an interesting command which simply outputs its parameters:

$ echo Thou hast it now, king, Cawdor, Glamis, all
Thou hast it now, king, Cawdor, Glamis, all

date displays the current date and time, possibly adjusted to the current time zone and language setup:

$ date
Mon May 7 15:32:03 CEST 2012

You will find out more about echo and date in Chapter 8.)
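The -t option of type (listed in the help output below) prints just a single word, which is convenient in scripts; -a lists every match instead of only the first. A small sketch, assuming bash and an installed date program:

```shell
#!/bin/bash
type -t echo    # builtin
type -t exit    # builtin
type -t if      # keyword (part of the shell's own syntax)
type -t date    # file (an external program)
type -a echo    # the builtin first, then any echo programs found on disk
```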

You can obtain help for internal Bash commands via the help command:


$ help type

type: type [-afptP] name [name ...]

For each NAME, indicate how it would be interpreted if used as a

command name.

If the -t option is used, `type' outputs a single word which is one of

`alias', `keyword', `function', `builtin', `file' or `', if NAME is an



Exercises

C3.3 [2] With bash, which of the following programs are provided externally

and which are implemented within the shell itself:

alias

echo

test

3.2.4 Even More Rules

As mentioned above, the shell distinguishes between uppercase and lowercase letters when commands are input. This does not apply to commands only, but consequently to options and parameters (usually file names) as well.

Furthermore, you should be aware that the shell treats certain characters in the input specially. Most importantly, the already-mentioned space character is used to separate words on the command line. Other characters with a special meaning include

$&;(){}[]*?!<>"'

If you want to use any of these characters without the shell interpreting it according to its special meaning, you need to “escape” it. You can use the backslash “\” to escape a single special character or else single or double quotes ('…' or "…") to escape several special characters. For example:

$ touch 'New File'

Due to the quotes this command applies to a single file called New File. Without quotes, two files called New and File would have been involved.
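The effect of quoting and escaping on word splitting can be checked with a tiny helper function that just reports how many words it was given. The helper count_args is invented for this illustration (assuming bash); note that double quotes are weaker than single quotes, since the shell still interprets $ inside them.

```shell
#!/bin/bash
# count_args is a made-up helper that prints how many words it received.
count_args() { echo "$#"; }

count_args New File        # 2: the space separates two words
count_args 'New File'      # 1: single quotes protect the space
count_args "New File"      # 1: so do double quotes
count_args New\ File       # 1: the backslash escapes just the space
echo 'Total: $5 & more'    # prints: Total: $5 & more
echo "Home: $HOME"         # inside double quotes, $ is still special
```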

We can’t explain all the other special characters here. Most of them will show up elsewhere in this manual, or else check the Bash documentation.

Commands in this Chapter

bash    The “Bourne-Again-Shell”, an interactive command interpreter            bash(1)            38, 39
csh     The “C-Shell”, an interactive command interpreter                       csh(1)             39
date    Displays the date and time                                              date(1)            41
echo    Writes all its parameters to standard output, separated by spaces      bash(1), echo(1)   41
help    Displays on-line help for bash commands                                 bash(1)            41
ksh     The “Korn shell”, an interactive command interpreter                    ksh(1)             39
sh      The “Bourne shell”, an interactive command interpreter                  sh(1)              39
tcsh    The “Tenex C shell”, an interactive command interpreter                 tcsh(1)            39
type    Determines the type of command (internal, external, alias)             bash(1)            41


Summary

• The shell reads user commands and executes them.
• Most shells have programming language features and support shell scripts containing pre-cooked command sequences.
• Commands may have options and arguments. Options determine how the command operates, and arguments determine what it operates on.
• Shells differentiate between internal commands, which are implemented in the shell itself, and external commands, which correspond to executable files that are started in separate processes.


Getting Help

Contents

4.1     Self-Help                                    46
4.2     The help Command and the --help Option       46
4.3     The On-Line Manual                           46
4.3.1   Overview                                     46
4.3.2   Structure                                    47
4.3.3   Chapters                                     48
4.3.4   Displaying Manual Pages                      48
4.4     Info Pages                                   49
4.5     HOWTOs                                       50
4.6     Further Information Sources                  50

Goals

• Being able to handle manual and info pages

• Knowing about and nding HOWTOs

• Being familiar with the most important other information sources

Prerequisites

• Linux Overview

• Basic command-line Linux usage (e. g., from the previous chapters)


46 4 Getting Help

4.1 Self-Help

Linux is a powerful and intricate system, and powerful and intricate systems are,

as a rule, complex. Documentation is an important tool to manage this complex-

ity, and many (unfortunately not all) aspects of Linux are documented very exten-

sively. This chapter describes some methods to access this documentation.

“Help” on Linux in many cases means “self-help”. The culture of free software implies not unnecessarily imposing on the time and goodwill of other people who are spending their free time in the community by asking things that are obviously explained in the first few paragraphs of the manual. As a Linux user, you do well to have at least an overview of the available documentation and the ways of obtaining help in cases of emergency. If you do your homework, you will usually find that people will help you out of your predicament, but any tolerance towards lazy individuals who expect others to tie themselves in knots on their behalf, on their own time, is not necessarily very pronounced.

If you would like to have somebody listen around the clock, seven days a week, to your not-so-well-researched questions and problems, you will have to take advantage of one of the numerous “commercial” support offerings. These are available for all common distributions and are offered either by the distribution vendor themselves or else by third parties. Compare the different service vendors and pick one whose service level agreements and pricing suit you.

4.2 The help Command and the --help Option

In bash, internal commands are described in more detail by the help command, giving the command name in question as an argument:

$ help exit

exit: exit [n]

Exit the shell with a status of N.

If N is omitted, the exit status

is that of the last command executed.

$ _

More detailed explanations are available from the shell’s manual page and info documentation. These information sources will be covered later in this chapter.

Many external commands (programs) support a

--help

option instead. Most

commands display a brief listing of their parameters and syntax.

Not every command reacts to --help; frequently the option is called -h, or help will be output if you specify any invalid option or otherwise illegal command line. Unfortunately there is no universal convention.
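Both conventions can be combined in a small helper that uses type -t to decide which one applies. This is a sketch under assumptions (bash): the function name gethelp is invented here, and programs that do not understand --help will typically print an error or usage message instead.

```shell
#!/bin/bash
# gethelp is a made-up helper: bash's help for builtins,
# the program's own --help option for everything else.
gethelp() {
  if [ "$(type -t "$1")" = builtin ]; then
    help "$1"
  else
    "$1" --help
  fi
}

gethelp exit               # same as: help exit
gethelp date | head -n 3   # first lines of: date --help
```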

4.3 The On-Line Manual

4.3.1 Overview

Nearly every command-line program comes with a “manual page” (or “man page”), as do many configuration files, system calls etc. These texts are generally installed with the software, and can be perused with the “man ⟨name⟩” command.

4.3 The On-Line Manual 47

Table 4.1: Manual page sections

Section       Content
NAME          Command name and brief description
SYNOPSIS      Description of the command syntax
DESCRIPTION   Verbose description of the command’s effects
OPTIONS       Available options
ARGUMENTS     Available arguments
FILES         Auxiliary files
EXAMPLES      Sample command lines
SEE ALSO      Cross-references to related topics
DIAGNOSTICS   Error and warning messages
BUGS          Known limitations of the command

Here, ⟨name⟩ is the command or file name that you would like explained. “man bash”, for example, describes (among many other things) the aforementioned internal shell commands.

However, the manual pages have some disadvantages: Many of them are only available in English; there are sets of translations for different languages which are often incomplete. Besides, the explanations are frequently very complex. Every single word can be important, which does not make the documentation accessible to beginners. In addition, especially with longer documents the structure can be obscure. Even so, the value of this documentation should not be underestimated. Instead of deluging the user with a large amount of paper, the on-line manual is always available with the system.

Many Linux distributions pursue the philosophy that there should be a manual page for every command that can be invoked on the command line. This does not apply to the same extent to programs belonging to the graphical desktop environments KDE and GNOME, many of which not only do not come with a manual page at all, but which are also very badly documented even inside the graphical environment itself. The fact that many of these programs have been contributed by volunteers is only a weak excuse.

4.3.2 Structure

The structure of the man pages loosely follows the outline given in Table 4.1, even though not every manual page contains every section mentioned there. In particular, the EXAMPLES are frequently given short shrift.

The BUGS heading is often misunderstood: Real bugs within the implementation get fixed, of course; what is documented here are usually restrictions which follow from the approach the command takes, which cannot be lifted with reasonable effort, and which you as a user ought to know about. For example, the documentation for the grep command points out that various constructs in the regular expression to be located may lead to the grep process using very much memory. This is a consequence of the way grep implements searching and not a trivial, easily fixed error.

Man pages are written in a special input format which can be processed for text display or printing by a program called groff. Source code for the manual pages is stored in the /usr/share/man directory in subdirectories called man𝑛, where 𝑛 is one of the chapter numbers from Table 4.2.

You can integrate man pages from additional directories by setting the MANPATH environment variable, which contains the directories which will be searched by man, in order. The manpath command gives hints for setting up MANPATH.
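Because each chapter has its own man𝑛 directory, you can see in which chapters a page exists just by looking at the file names. A rough sketch, assuming bash and the usual /usr/share/man layout (pages are frequently stored compressed, as in crontab.5.gz). The helper name sections_of is invented here, and directories added via MANPATH are only searched if you pass them as the second argument.

```shell
#!/bin/bash
# sections_of is a made-up helper: it lists the manual chapters in which
# a page exists, by looking at the man<n> subdirectories of a manual tree.
sections_of() {
  local page=$1 base=${2:-/usr/share/man} f dir
  for f in "$base"/man*/"$page".*; do
    [ -e "$f" ] || continue     # the pattern matched nothing
    dir=${f%/*}                 # e.g. /usr/share/man/man5
    echo "${dir##*/man}"        # e.g. 5
  done
}

sections_of crontab   # e.g. 1 and 5, if cron's manual pages are installed
```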


Table 4.2: Manual Page Topics

No.  Topic
1    User commands
2    System calls
3    C language library functions
4    Device files and drivers
5    Configuration files and file formats
6    Games
7    Miscellaneous (e. g. groff macros, ASCII tables, …)
8    Administrator commands
9    Kernel functions
n    »New« commands

4.3.3 Chapters

Every manual page belongs to a “chapter” of the conceptual “manual” (Table 4.2). Chapters 1, 5 and 8 are most important. You can give a chapter number on the man command line to narrow the search. For example, “man 1 crontab” displays the man page for the crontab command, while “man 5 crontab” explains the format of crontab files. When referring to man pages, it is customary to append the chapter number in parentheses; we differentiate accordingly between crontab(1), the crontab command manual, and crontab(5), the description of the file format.

With the -a option, man displays all man pages matching the given name; without this option, only the first page found (generally from chapter 1) will be displayed.

4.3.4 Displaying Manual Pages

The program actually used to display man pages on a text terminal is usually less, which will be discussed in more detail later on. At this stage it is important to know that you can use the cursor keys ↑ and ↓ to navigate within a man page. You can search for keywords inside the text by pressing /. After you enter the word and press the return key, the cursor jumps to the next occurrence of the word (if it does occur at all). Once you are happy, you can quit the display using q to return to the shell.

Using the KDE web browser, Konqueror, it is convenient to obtain nicely formatted man pages. Simply enter the URL “man:/⟨name⟩” (or even “⟨name⟩”) in the browser’s address line. This also works on the KDE command line (Figure 2.2).

Figure 4.1: A manual page in a text terminal (left) and in Konqueror (right)

4.4 Info Pages 49

Before rummaging aimlessly through innumerable man pages, it is often sensible to try to access general information about a topic via apropos. This command works just like “man -k”; both search the “NAME” sections of all man pages for a keyword given on the command line. The output is a list of all manual pages containing the keyword in their name or description.

A related command is whatis. This also searches all manual pages, but for a command (file, …) name rather than a keyword (in other words, the part of the “NAME” section to the left of the dash). This displays a brief description of the desired command, system call, etc.; in particular the second part of the “NAME” section of the manual page(s) in question. whatis is equivalent to “man -f”.
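The difference between the two searches can be imitated with grep on a few NAME lines. This sketch is purely illustrative and assumes bash: the lines below are simplified and made up rather than copied from real manual pages, the helper names toy_apropos and toy_whatis are invented, and the real commands consult the installed manual pages instead.

```shell
#!/bin/bash
# Made-up, simplified "NAME" lines standing in for real manual pages.
names='crontab (1) - maintain crontab files for individual users
crontab (5) - tables for driving cron
ps (1) - report a snapshot of the current processes
kill (1) - send a signal to a process'

# Like apropos / man -k: the keyword may appear anywhere (case-insensitive).
toy_apropos() { grep -i -- "$1" <<<"$names"; }
# Like whatis / man -f: only the exact page name before the section counts.
toy_whatis() { grep -- "^$1 (" <<<"$names"; }

toy_apropos process   # the ps and kill lines
toy_whatis cron       # nothing: "cron" is not a page name in this list
toy_whatis crontab    # both crontab lines
```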

Exercises

C4.1 [!1] View the manual page for the command. Use the text-based man command and, if available, the Konqueror browser.

C4.2 [2] Which manual pages on your system deal (at least according to their “NAME” sections) with processes?

C4.3 [5] (Advanced.) Use a text editor to write a manual page for a hypothetical command. Read the man(7) man page beforehand. Check the appearance of the man page on the screen (using “groff -Tascii -man ⟨file⟩ | less”) and as printed output (using something like “groff -Tps -man ⟨file⟩ | gv -”).

4.4 Info Pages

For some commands (often more complicated ones) there are so-called “info pages” instead of (or in addition to) the more usual man pages. These are usually more extensive and based on the principles of hypertext, similar to the World Wide Web.

The idea of info pages originated with the GNU project; they are therefore most frequently found with software published by the FSF or otherwise belonging to the GNU project. Originally there was supposed to be only info documentation for the “GNU system”; however, since GNU also takes on board lots of software not created under the auspices of the FSF, and GNU tools are being used on systems pursuing a more conservative approach, the FSF has relented in many cases.

Analogously to man pages, info pages are displayed using the “

info

⟨command⟩”

command (the package containing the

info

program may have to be installed

explicitly). Furthermore, info pages can be viewed using the

emacs

editor or dis-

played in the KDE web browser, Konqueror, via URLs like “

info:/

⟨command⟩”.

One advantage of info pages is that, like man pages, they are written in a source format which can conveniently be processed either for on-screen display or for printing manuals using PostScript or PDF. Instead of groff, the TeX typesetting program is used to prepare output for printing.

Exercises

C4.4 [!1] Look at the info page for the program. Try the text-based info browser and, if available, the Konqueror browser.

C4.5 [2] Info files use a crude (?) form of hypertext, similar to HTML files on the World Wide Web. Why aren’t info files written in HTML to begin with?


4.5 HOWTOs

Both manual and info pages share the problem that the user must basically know the name of the program to use. Even searching with apropos is frequently nothing but a game of chance. Besides, not every problem can be solved using one single command. Accordingly, “problem-oriented” rather than “command-oriented” documentation is often called for. The HOWTOs are designed to help with this.

HOWTOs are more extensive documents that do not restrict themselves to sin-

gle commands in isolation, but try to explain complete approaches to solving

problems. For example, there is a “DSL HOWTO” detailing ways to connect a

Linux system to the Internet via DSL, or an “Astronomy HOWTO” discussing as-

tronomy software for Linux. Many HOWTOs are available in languages other

than English, even though the translations often lag behind the English-language

originals.

Most Linux distributions furnish the HOWTOs (or significant subsets) as packages to be installed locally. They end up in a distribution-specific directory (/usr/share/doc/howto for SUSE distributions, /usr/share/doc/HOWTO for Debian GNU/Linux), typically either as plain text or else HTML files. Current versions of all HOWTOs and other formats such as PostScript or PDF can be found on the Web on the site of the “Linux Documentation Project” (http://www.tldp.org) which also offers other Linux documentation.

4.6 Further Information Sources

You will nd additional documentation and example les for (nearly) every in-Additional information

stalled software package under

/usr/share/doc

/usr/share/doc/packages

(depend-

ing on your distribution). Many GUI applications (such as those from the KDE or

GNOME packages) oer “help” menus. Besides, many distributions oer special-

ized “help centers” that make it convenient to access much of the documentation

on the system.

Independently of the local system, there is a lot of documentation available on the Internet, among other places on the WWW and in USENET archives. Some of the more interesting web sites for Linux include:

http://www.tldp.org/

The “Linux Documentation Project”, which is in charge of

man pages and HOWTOs (among other things).

http://www.linux.org/

A general “portal” for Linux enthusiasts.

http://www.linuxwiki.de/

A “free-form text information database for everything

pertaining to Linux” (in German).

http://lwn.net/

Linux Weekly News—probably the best web site for Linux news of

all sorts. Besides a daily overview of the newest developments, products,

security holes, Linux advocacy in the press, etc., on Thursdays there is an

extensive on-line magazine with well-researched background reports about

the preceding week’s events. The daily news is freely available, while the weekly issues must be paid for (various pricing levels starting at US-$ 5 per month). One week after their first appearance, the weekly issues are made available for free as well.

http://freecode.com/

This site publishes announcements of new (predominantly

free) software packages, which are often available for Linux. In addition to

this there is a database allowing queries for interesting projects or software

packages.

http://www.linux-knowledge-portal.de/

A site collecting “headlines” from other in-

teresting Linux sites, including LWN and Freshmeat.

4.6 Further Information Sources 51

If there is nothing to be found on the Web or in Usenet archives, it is possible to ask questions in mailing lists or Usenet groups. In this case you should note that many users of these forums consider it very bad form to ask questions answered already in the documentation or in a “FAQ” (frequently asked questions) resource. Try to prepare a detailed description of your problem, giving relevant excerpts of log files, since a complex problem like yours is difficult to diagnose at a distance (and you will surely be able to solve non-complex problems by yourself).

A news archive is accessible on http://groups.google.com/ (formerly DejaNews).

Interesting news groups for Linux can be found in the English-language comp.os.linux.* or the German-language de.comp.os.unix.linux.* hierarchies. Many Unix groups are appropriate for Linux topics; a question about the shell should be asked in a group dedicated to shell programming rather than a Linux group, since shells are usually not specific to Linux.

Linux-oriented mailing lists can be found, for example, at majordomo@vger.kernel.org. You should send an e-mail message including “subscribe LIST” to this address in order to subscribe to a list called LIST. A commented list of all available mailing lists on the system may be found at http://vger.kernel.org/vger-lists.html.

An established strategy for dealing with seemingly inexplicable problems is to search for the error message in question using Google (or another search engine you trust). If you do not obtain a helpful result outright, leave out those parts of your query that depend on your specific situation (such as domain names that only exist on your system). The advantage is that Google indexes not just the common web pages, but also many mailing list archives, and chances are that you will encounter a dialogue where somebody else had a problem very much like yours.

Incidentally, the great advantage of open-source software is not only the large amount of documentation, but also the fact that most documentation is no more restricted than the software itself. This facilitates collaboration between software developers and documentation authors, and makes it easier to translate documentation into different languages. In fact, there is ample opportunity for non-programmers to help with free software projects, e. g., by writing good documentation. The free-software scene should try to give documentation authors the same respect it gives programmers, a paradigm shift that has begun but is by no means finished yet.

Commands in this Chapter

apropos    Shows all manual pages whose NAME sections contain a given keyword    apropos(1)
groff      Sophisticated typesetting program    groff(1)
help       Displays on-line help for bash commands    bash(1)
info       Displays GNU Info pages on a character-based terminal    info(1)
less       Displays texts (such as manual pages) by page    less(1)
man        Displays system manual pages    man(1)
manpath    Determines the search path for system manual pages    manpath(1)
whatis     Displays the one-line descriptions of manual pages with a given name    whatis(1)


Summary

• “help ⟨command⟩” explains internal bash commands. Many external commands support a --help option.

• Most programs come with manual pages that can be perused using man. apropos searches all manual pages for keywords, whatis looks for manual page names.

• For some programs, info pages are an alternative to manual pages.

• HOWTOs form a problem-oriented kind of documentation.

• There is a multitude of interesting Linux resources on the World Wide Web and USENET.
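A quick way to see the first point in action is sketched below. It is a minimal example that assumes bash and the GNU versions of the standard tools, as shipped by most Linux distributions; other implementations may word their output differently. man, apropos and whatis are left out because their output depends on which manual pages are installed.

```shell
# help is built into bash and documents bash's internal commands, e.g. cd
bash -c 'help cd' | head -n 1

# External programs such as ls usually explain themselves via --help
ls --help | head -n 1
```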


The vi Editor

Contents

5.1 Editors
5.2 The Standard—vi
5.2.1 Overview
5.2.2 Basic Functions
5.2.3 Extended Commands
5.3 Other Editors

Goals

• Becoming familiar with the vi editor

• Being able to create and change text files

Prerequisites

• Basic shell operation (see Chapter 2)



5.1 Editors

Most operating systems oer tools to create and change text documents. Such

programs are commonly called “editors” (from the Latin “edire”, “to work on”).

Generally, text editors oer functions considerably exceeding simple text input

and character-based editing. Good editors allow users to remove, copy or insert

whole words or lines. For long les, it is helpful to be able to search for partic-

ular sequences of characters. By extension, “search and replace” commands can

make tedious tasks like “replace every

by a

” considerably easier. Many editors

contain even more powerful features for text processing.

In contrast to widespread “word processors” such as OpenOffice.org Writer or Microsoft Word, text editors usually do not offer markup elements such as various fonts (Times, Helvetica, Courier, …), type attributes (boldface, italic, underlined, …), typographical features (justified type, …) and so on. They are predominantly intended for the creation and editing of pure text files, where these things would really be a nuisance.

Of course there is nothing wrong with using a text editor to prepare input files for typesetting systems such as groff or LaTeX that offer all these typographic features. However, chances are you won’t see much of these in your original input, which can really be an advantage: after all, much of the typography serves as a distraction when writing, and authors are tempted to fiddle with a document’s appearance while inputting text, rather than concentrating on its content.

Most text editors today support syntax highlighting, that is, identifying certain elements of a program text (comments, variable names, reserved words, strings) by colours or special fonts. This does look spiffy, even though the question of whether it really helps with programming has not yet been answered through suitable psychological studies.

In the rest of the chapter we shall introduce what is possibly the most important Linux editor, vi. However, we shall restrict ourselves to the most basic functionality; it would be easy to fill a multi-day training course with vi alone. As with the shells, the choice of text editor is up to a user’s own personal preference.

Exercises

C5.1 [2] Which text editors are installed on your system? How can you find out?

5.2 The Standard—vi

5.2.1 Overview

The only text editor that is probably part of every Linux system is called vi (from “visual”, not Roman 6; usually pronounced “vee-i”). For practical reasons, this usually doesn’t mean the original vi (which was part of BSD and is decidedly long in the tooth today), but more modern derivatives such as vim (from “vi improved”) or elvis; these editors are, however, sufficiently close to the original vi to be all lumped together.

vi, originally developed by Bill Joy for BSD, was one of the first “screen-oriented” editors in widespread use for Unix. This means that it allowed users to use the whole screen for editing rather than letting them edit just one line at a time. This is considered trivial today, but used to be an innovation. That is not to say that earlier programmers were too stupid to figure it out; rather, text terminals allowing free access to arbitrary points on the screen (a mandatory feature for programs like vi) had only just become affordable. Out of consideration for older systems using teletypes or “glass ttys” (terminals that could only add material at the bottom of the screen), vi also supports a line-oriented editor under the name ex.

Even with the advanced terminals of that time, one could not rely on the availability of keyboards with special keys for cursor positioning or advanced functions; today’s standard PC keyboards would have been considered luxurious, if not overloaded. This explains vi’s unusual concepts of operation, which today could rightly be considered antediluvian. Nobody can blame people for rejecting vi on these grounds. Even so, having rudimentary knowledge of vi cannot possibly hurt, even if you select a different text editor for your daily work, which you should by all means do if vi does not agree with you. It is not as if there were no alternatives, and we shall not get into childish games such as “Whoever does not use vi is not a proper Linux user”. Today’s graphical desktops such as KDE do contain very nice and powerful text editors.

There is, in fact, an editor which is even cruder than vi: the ed program. The title “the only editor that is guaranteed to be available on any Unix system” rightfully belongs to ed instead of vi, but ed as a pure line editor with a teletype-style user interface is too basic for even hardcore Unix advocates. (ed can be roughly compared with the DOS program EDLIN; ed, however, is vastly more powerful than the Redmond offering.) The reason why ed is still available in spite of the existence of dozens of more convenient text editors is unobvious, but very Unix-like: ed accepts commands on its standard input and can therefore be used in shell scripts to change files programmatically. ed allows editing operations that apply to the whole file at once and is thus more powerful than its colleague, the “stream editor” sed, which copies its standard input to its standard output with certain modifications. Normally one would use sed and revert to ed for exceptional cases, but ed is still useful every so often.
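To make the difference concrete, here is a minimal sketch of sed at work on a throwaway file. The file name demo.conf and its contents are invented for the example. Note that sed by itself only writes the modified text to standard output; the file changes only when you save that output and put it in place yourself.

```shell
# Work in an empty scratch directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# A made-up two-line configuration file
printf 'colour=red\nsize=big\n' > demo.conf

# sed copies its input to standard output, changing "red" to "blue";
# demo.conf itself is not modified by this command
sed 's/red/blue/' demo.conf

# To change the file, write the result to a new file and rename it
sed 's/red/blue/' demo.conf > demo.conf.new && mv demo.conf.new demo.conf
cat demo.conf
```

The first sed command only prints the changed text; demo.conf is rewritten by the second one.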

5.2.2 Basic Functions

The Buffer Concept

works in terms of so-called buers. If you invoke

with buffers

a le name as an argument, the content of that le will be read into a buer. If no

le exists by that name, an empty buer is created.

All the modications made with the editor are only applied inside the buer.

To make these modications permanent, the buer content must be explicitly

written back to the le. If you really want to discard the modications, simply

leave

without storing the buer content—the le on the storage medium will

remain unchanged.

In addition to a le name as an argument, you can pass options to

as usual.

Refer to the documentation for the details.

Modes

As mentioned earlier, one of the characteristics of vi is its unusual manner of operation. vi supports three different working “modes”:

Command mode: All keyboard input consists of commands that do not appear on screen and mostly do not need to be finalized using the return key. After invoking vi, you end up in this mode. Be careful: any key press could invoke a command.

Insert mode: All keyboard input is considered text and displayed on the screen. vi behaves like a “modern” editor, albeit with restricted navigation and correction facilities.

Command-line mode: This is used to enter long commands. These usually start with a colon (“:”) and are finished using the return key.

In insert mode, nearly all navigation or correction commands are disabled, which requires frequent alternation between insert and command modes. The fact that it may be difficult to find out which mode the editor is currently in (depending on the vi implementation used and its configuration) does not help to make things easier for beginners. An overview of vi’s modes may be found in Figure 5.1.

Figure 5.1: vi’s modes (the vi command starts in command mode; a, i, o, … lead to insert mode and Esc back to command mode; : leads to command-line mode and Return back to command mode; ZZ, … quit)

Table 5.1: Insert-mode commands for vi

Command    Result
a          Appends new text after the cursor
A          Appends new text at the end of the line
i          Inserts new text at the cursor position
I          Inserts new text at the beginning of the line
o          Inserts a new line below the line containing the cursor
O          Inserts a new line above the line containing the cursor

Consider: vi originated when keyboards consisting only of the “letter block” of modern keyboards were common (128 ASCII characters). There was really no way around the scheme used in the program.

After invoking vi without a file name you end up in command mode. In contrast to most other editors, direct text input is not possible. There is a cursor at the top left corner of the screen above a column filled with tildes. The last line, also called the “status line”, displays the current mode (maybe), the name of the file currently being edited (if available) and the current cursor position.

If your version of vi does not display status information, try your luck with Esc :set showmode ↩.

Shortened by a few lines, this looks roughly like this:

~
~
~
Empty Buffer                                      0,0-1

Table 5.2: Cursor positioning commands in vi

Command          Cursor moves …
h or ←           one character to the left
l or →           one character to the right
k or ↑           one line up
j or ↓           one line down
0                to the beginning of the line
$                to the end of the line
w                to the next word
b                to the previous word
f⟨character⟩     to the next ⟨character⟩ on the line
Ctrl+F           to the next page (screenful)
Ctrl+B           to the previous page
G                to the last line of the file
⟨n⟩G             to line no. ⟨n⟩

Only after a command such as a (“append”), i (“insert”), or o (“open”) will vi change into “insert mode”. The status line displays something like “-- INSERT --”, and keyboard input will be accepted as text.

The possible commands to enter insert mode are listed in Table 5.1; note that lower-case and upper-case commands are different. To leave insert mode and go back to command mode, press the Esc key. In command mode, enter ZZ to write the buffer contents to disk and quit vi.

If you would rather discard the modications you made, you need to quit the

editor without saving the buer contents rst. Use the command :

↩. The

leading colon emphasises that this is a command-line mode command.

When : is entered in command mode, vi changes to command-line mode. You can recognize this by the colon appearing in front of the cursor on the bottom line of the screen. All further keyboard input is appended to that colon, until the command is finished with the return key (↩); vi executes the command and reverts to command mode. In command-line mode, vi processes the line-oriented commands of its alter ego, the ex line editor.

There is an ex command to save an intermediate version of the buffer called :w (“write”). The commands :x and :wq save the buffer contents and quit the editor; both commands are therefore identical to the ZZ command.

Movement Through the Text

In insert mode, newly entered characters will be put into the current line. The return key starts a new line. You can move about the text using the cursor keys, but you can remove characters only on the current line using ⇐ (backspace), an inheritance of vi’s line-oriented predecessors. More extensive navigation is only possible in command mode (Table 5.2).

Once you have moved the cursor to the proper location, you can begin correcting text in command mode.

Deleting characters

The d command is used to delete characters; it is always followed by another character that specifies exactly what to delete (Table 5.3). To make editing easier, you can prefix a repeat count to each of the listed commands. For example, the 3x command will delete the next three characters.

If you have been too eager and deleted too much material, you can revert the last change (or even all changes one after the other) using the u (“undo”) command. This is subject to appropriate configuration settings.

Table 5.3: Editing commands in vi

Command      Result
x            Deletes the character under the cursor
X            Deletes the character to the left of the cursor
r⟨char⟩      Replaces the character under the cursor by ⟨char⟩
dw           Deletes from cursor to end of current word
d$           Deletes from cursor to end of current line
d0           Deletes from cursor to start of current line
df⟨char⟩     Deletes from cursor to next occurrence of ⟨char⟩ on the current line
dd           Deletes current line
dG           Deletes from current line to end of text
d1G          Deletes from current line to beginning of text

Table 5.4: Replacement commands in vi

Command      Result
cw           Replace from cursor to end of current word
c$           Replace from cursor to end of current line
c0           Replace from cursor to start of current line
cf⟨char⟩     Replace from cursor to next occurrence of ⟨char⟩ on the current line
c/abc        Replace from cursor to next occurrence of character sequence abc

Replacing characters

The c command (“change”) serves to overwrite a selected part of the text. c is a “combination command” similar to d, requiring an additional specification of what to overwrite. vi will remove that part of the text before changing to insert mode automatically. You can then enter new material and return to command mode using Esc (Table 5.4).

5.2.3 Extended Commands

Cutting, Copying, and Pasting Text

A frequent operation in text editing is to move or copy existing material elsewhere in the document. vi offers handy combination commands to do this, which take specifications similar to those for the c command. y (“yank”) copies material to an interim buffer without changing the original text, whereas d moves it to the interim buffer, i. e., it is removed from its original place and only available in the interim buffer afterwards. (We have introduced this as “deletion” above.)

Of course there is a command to re-insert (or “paste”) material from an interim buffer. This is done using p (to insert after the current cursor position) or P (to insert before the current cursor position).

A peculiarity of vi is that there is not just one interim buffer but 26. This makes it easy to paste different texts (phrases, …) to different places in the file. The interim buffers are called “a” through “z” and can be invoked using a combination of double quotes and buffer names. The command sequence "cy4w, for instance, copies the next four words to the interim buffer called c; the command sequence "gp inserts the contents of interim buffer g after the current cursor position.


Regular-Expression Text Search

Like every good editor, vi offers well-thought-out search commands. “Regular expressions” make it possible to locate character sequences that fit elaborate search patterns. To start a search, enter a slash / in command mode. This will appear on the bottom line of the terminal followed by the cursor. Enter the search pattern and start the search using the return key. vi will start at the current cursor position and work towards the end of the document. To search towards the top, the search must be started using ? instead of /. Once vi has found a matching character sequence, it stops the search and places the cursor on the first character of the sequence. You can repeat the same search towards the end using n (“next”) or towards the beginning using N (for a search started with /; after ?, the directions are reversed).

Searching and Replacing

Locating character sequences is often not all that is desired. Therefore, vi also allows replacing found character sequences by others. The following ex command can be used:

:[⟨start line⟩,⟨end line⟩]s/⟨regexp⟩/⟨replacement⟩[/g]

The parts of the command within square brackets are optional. What do the different components of the command mean?

⟨Start line⟩ and ⟨end line⟩ determine the range of lines to be searched. Without these, only the current line will be looked at! Instead of line numbers, you can use a dot to specify the current line or a dollar sign to specify the last line, but do not confuse the meanings of these characters with their meanings within regular expressions:

:5,$s/red/blue/
replaces the first occurrence of red on each line by blue, where the first four lines are not considered.

:5,$s/red/blue/g
replaces every occurrence of red in those lines by blue. (Watch out: even Fred Flintstone will become Fblue Flintstone.)

Instead of line numbers, “.”, and “$”, vi also allows regular expressions within slashes as start and end markers:

:/^BEGIN/,/^END/s/red/blue/g
replaces red by blue only in lines located between a line starting with BEGIN and the next line starting with END.
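sed shares ed’s notation for line addresses and the s command, so the examples above can be tried out non-interactively. The sketch below uses made-up sample files (colours.txt, block.txt); in vi you would type the same commands after a colon. One difference to keep in mind: sed applies a /…/,/…/ range to every such block in the file, whereas vi works on the single range it finds by searching from the cursor position.

```shell
# Work in an empty scratch directory
cd "$(mktemp -d)" || exit 1

# Six sample lines: four plain "red" lines, one with two matches, and Fred
printf '%s\n' red red red red 'red red' 'Fred Flintstone' > colours.txt

# Like :5,$s/red/blue/ : lines 5 to the end, first match per line only
sed '5,$s/red/blue/' colours.txt

# Like :5,$s/red/blue/g : every match on lines 5 to the end
sed '5,$s/red/blue/g' colours.txt

# Like :/^BEGIN/,/^END/s/red/blue/g : only from BEGIN through END
printf '%s\n' red BEGIN 'red red' END red > block.txt
sed '/^BEGIN/,/^END/s/red/blue/g' block.txt
```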

After the command name s and a slash, you must enter the desired regular expression. After another slash, ⟨replacement⟩ gives a character sequence by which the original text is to be replaced.

There is a special function for this argument: with a & character you can “reference back” to the text matched by the ⟨regexp⟩ in each case. That is, “s/bull/& frog” changes each bull within the search range into a bull frog (only the first one on each line, unless you append g), a task which will probably give genetic engineers trouble for some time to come.
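The effect of & is easy to try with sed, whose s command treats & the same way. The input lines below are made up; the g flag decides whether only the first or every match on a line is changed.

```shell
# & in the replacement stands for the text the regular expression matched
echo 'a bull and another bull' | sed 's/bull/& frog/'    # first match only
echo 'a bull and another bull' | sed 's/bull/& frog/g'   # every match

# Handy with patterns, too: wrap every number on the line in brackets
echo 'lines 5 to 42' | sed 's/[0-9][0-9]*/[&]/g'
```

The three commands print “a bull frog and another bull”, “a bull frog and another bull frog” and “lines [5] to [42]”.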

Command-line Mode Commands

So far we have described some command-line mode (or “ex mode”) commands. There are several more, all of which can be accessed from command mode by prefixing them with a colon and finishing them with the return key (Table 5.5).

Exercises

C5.2 [5] (For systems with vim, e. g., the SUSE distributions.) Find out how to access the interactive vim tutorial and work through it.


Table 5.5: ex commands in vi

Command                      Result
:w ⟨file name⟩               Writes the complete buffer content to the designated file
:w! ⟨file name⟩              Writes to the file even if it is write-protected (if possible)
:e ⟨file name⟩               Reads the designated file into the buffer
:e #                         Reads the last-read file again
:r ⟨file name⟩               Inserts the content of the designated file after the line containing the cursor
:!⟨shell command⟩            Executes the given shell command and returns to vi afterwards
:r !⟨shell command⟩          Inserts the output of ⟨shell command⟩ after the line containing the cursor
:s/⟨regexp⟩/⟨replacement⟩    Searches for ⟨regexp⟩ and replaces it by ⟨replacement⟩
:q!                          Quits vi even if the buffer contents are unsaved
:wq or :x                    Saves the buffer contents and quits vi

5.3 Other Editors

We have already alluded to the fact that your choice of editor is just as much down to personal preference as your choice of car, and probably says as much about you as a user: Do you drive a polished BMW or are you happy with a dented Astra? Or would you prefer a Land Rover? As far as choice is concerned, the editor market offers no less than the vehicle market. We have presented possibly the most important Linux editor, but of course there are many others.

kate on KDE and gedit on GNOME, for example, are straightforward and easy-to-learn editors with a graphical user interface that are perfectly adequate for the requirements of a normal user. GNU Emacs, on the other hand, is an extremely powerful and customisable editor for programmers and authors, and its extensive “ecosystem” of extensions leaves few desires uncatered for. Do browse through the package lists of your distribution and check whether you can find the editor of your dreams there.

Commands in this Chapter

ed       Primitive (but useful) line-oriented text editor    ed(1)
elvis    Popular “clone” of the vi editor    elvis(1)
ex       Powerful line-oriented text editor (really vi)    vi(1)
sed      Stream-oriented editor, copies its input to its output making changes in the process    sed(1)
vi       Screen-oriented text editor    vi(1)
vim      Popular “clone” of the vi editor    vim(1)


Summary

• Text editors are important for changing configuration files and programming. They often offer special features to make these tasks easier.

• vi is a traditional, very widespread and powerful text editor with an idiosyncratic user interface.


Files: Care and Feeding

Contents

6.1 File and Path Names
6.1.1 File Names
6.1.2 Directories
6.1.3 Absolute and Relative Path Names
6.2 Directory Commands
6.2.1 The Current Directory: cd & Co.
6.2.2 Listing Files and Directories—ls
6.2.3 Creating and Deleting Directories: mkdir and rmdir
6.3 File Search Patterns
6.3.1 Simple Search Patterns
6.3.2 Character Classes
6.3.3 Braces
6.4 Handling Files
6.4.1 Copying, Moving and Deleting—cp and Friends
6.4.2 Linking Files—ln and ln -s
6.4.3 Displaying File Content—more and less
6.4.4 Searching Files—find
6.4.5 Finding Files Quickly—locate and slocate

Goals

• Being familiar with Linux conventions concerning file and directory names

• Knowing the most important commands to work with files and directories

• Being able to use shell filename search patterns

Prerequisites

• Using a shell (see Chapter 2)

• Use of a text editor (see Chapter 5)



6.1 File and Path Names

6.1.1 File Names

One of the most important services of an operating system like Linux consists of storing data on permanent storage media like hard disks or USB keys and retrieving them later. To make this bearable for humans, related data are usually collected into “files” that are stored on the medium under a name.

Even if this seems trivial to you, it is by no means a given. In former times, some operating systems made it necessary to know abominations like track numbers on a disk in order to retrieve one’s data.

Thus, before we can explain to you how to handle les, we need to explain to

you how Linux names les.

In Linux le names, you are essentially allowed to use any character that yourAllowed characters

computer can display (and then some). However, since some of the characters

have a special meaning, we would recommend against their use in le names.

Only two characters are completely disallowed, the slash and the zero byte (the

character with ASCII value 0). Other characters like spaces, umlauts, or dollar

signs may be used freely, but must usually be escaped on the command line by

means of a backslash or quotes in order to avoid misinterpretations by the shell.
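Here is a small sketch of both escaping methods, run in an empty scratch directory created with mktemp (the file names are invented for the example). Quotes protect a whole name, a backslash protects only the character that follows it, and without either the shell splits the name at the space.

```shell
# Work in an empty scratch directory
cd "$(mktemp -d)" || exit 1

# Single quotes protect the spaces and the dollar sign
touch 'price in $.txt'

# A backslash escapes just the next character, here the space
touch my\ letter.txt

# Unquoted, the shell passes two arguments and touch creates two files
touch two words

ls -l
```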

An easy trap for beginners to fall into is the fact that Linux distinguishes uppercase and lowercase letters in file names. Unlike Windows, where uppercase and lowercase letters in file names are displayed but treated the same, Linux considers x-files and X-Files two different file names.

Under Linux, le names may be “quite long”—there is no denite upper

bound, since the maximum depends on the “le system”, which is to say the

specic way bytes are arranged on the medium (there are several methods on

Linux). A typical upper limit is 255 characters—but since such a name would

take somewhat more than three lines on a standard text terminal this shouldn’t

really cramp your style.
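If you want to know the limit for a particular file system, the standard getconf command can query it; the sketch below asks about the file system holding the current directory. The value is given in bytes, so names containing multi-byte characters (such as umlauts in UTF-8) reach it sooner than the character count suggests.

```shell
# Maximum length of a single file name (not a whole path) on the
# file system that contains the current directory
getconf NAME_MAX .

# The limit for a complete path name is a separate setting
getconf PATH_MAX .
```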

A further dierence from DOS and Windows computers is that Linux does not

use suxes to characterise a le’s “type”. Hence, the dot is a completely ordi-suffixes

nary character within a le name. You are free to store a text as

mumble.txt

, but

mumble

would be just as acceptable in principle. This should of course not turn you

o using suxes completely—you do after all make it easier to identify the le

content.

BSome programs insist on their input les having specic suxes. The C

compiler,

gcc

, for example, considers les with names ending in “

” C

source code, those ending in “

” assembly language source code, and

those ending in “

” precompiled object les.

You may freely use umlauts and other special characters in file names. However, if files are to be used on other systems it is best to stay away from special characters in file names, as it is not guaranteed that they will show up as the same characters elsewhere.

What happens to special characters also depends on your locale settings, since there is no general standard for representing characters beyond the ASCII character set (128 characters covering mostly the English language, digits and the most common special characters). Widely used encodings are, for example, ISO 8859-1 and ISO 8859-15 (popularly known as ISO-Latin-1 and ISO-Latin-9, respectively … don’t ask) as well as ISO 10646, casually and not quite correctly called “Unicode” and usually encoded as “UTF-8”. File names you created while encoding X was active may look completely different when you look at the directory while encoding Y is in force. The whole topic is nothing you want to think about during meals.
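To see why this happens, compare how the letter ö is stored under two of these encodings. This sketch assumes that the script itself is saved as UTF-8; od merely prints the bytes in hexadecimal.

```shell
# "ö" as typed in UTF-8: two bytes, c3 b6
printf '%s' 'ö' | od -An -tx1

# "ö" as ISO 8859-1 stores it: one byte, f6 (octal 366)
printf '\366' | od -An -tx1
```

A name containing the single byte f6 is not valid UTF-8, so a UTF-8 terminal will typically show a placeholder such as a question mark instead of ö; a Latin-1 terminal, conversely, shows the two UTF-8 bytes as the two unrelated characters “Ã¶”.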


AShould you ever nd yourself facing a pile of les whose names are encoded

according to the wrong character set, the

convmv

program, which can con-

convmv

vert le names between various character encodings, may be able to help

you. (You will probably have to install it yourself since it is not part of

the standard installation of most distributions.) However, you should re-

ally get down to this only after working through the rest of this chapter, as

we haven’t even explained the regular

yet …

All characters from the following set may be used freely in file names:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

abcdefghijklmnopqrstuvwxyz

0123456789+-._

However, you should pay attention to the following hints:

• To allow moving files between Linux and older Unix systems, the length of a file name should be at most 14 characters. (Make that “ancient”, really.)

• File names should always start with one of the letters or a digit; the other four characters can be used without problems only inside a file name.

These conventions are easiest to understand by looking at some examples. Allowable file names would be, for instance:

X-files

foo.txt.bak

50.something

7_of_9

By contrast, problems would be possible (if not likely or even assured) with:

-10°F         Starts with “-”, includes special character
.profile      Will be hidden
3/4-metre     Contains illegal character
Smörrebröd    Contains umlauts
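These rules can be checked mechanically. The shell function below, is_portable_name, is a hypothetical helper written for this illustration (it is not a standard command). It tests the 14-character limit, the first character, and the allowed character set, spelling the characters out instead of using ranges such as A-Z so that the result does not depend on the locale.

```shell
# Hypothetical helper: succeeds (exit status 0) if $1 follows the
# portable file name rules described above, fails otherwise
is_portable_name() {
    local name=$1
    # Not empty, and at most 14 characters
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 14 ] || return 1
    # Must start with a letter or a digit
    case $name in
        [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789]*) ;;
        *) return 1 ;;
    esac
    # Must not contain anything except letters, digits and + - . _
    case $name in
        *[!ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+._-]*) return 1 ;;
    esac
    return 0
}

for f in X-files foo.txt.bak 50.something 7_of_9 -10°F .profile 3/4-metre Smörrebröd; do
    if is_portable_name "$f"; then
        echo "ok:      $f"
    else
        echo "problem: $f"
    fi
done
```

The function only checks the conventions: of the four flagged names, only 3/4-metre is actually impossible as a file name, while the file system would accept the other three.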

As another peculiarity, le names starting with a dot (“

”) will be skipped in Hidden files

some places, for example when the les within a directory are listed—les with

such names are considered “hidden”. This feature is often used for les contain-

ing settings for programs and which should not distract users from more impor-

tant les in directory listings.

For DOS and Windows experts: These systems allow “hiding” files by means of a “file attribute” which can be set independently of the file’s name. Linux and Unix do not support such a thing.

6.1.2 Directories

Since potentially many users may work on the same Linux system, it would be problematic if each file name could occur just once. It would be difficult to make clear to user Joe that he cannot create a file called letter.txt since user Sue already has a file by that name. In addition, there must be a (convenient) way of ensuring that Joe cannot read all of Sue’s files and the other way round.

For this reason, Linux supports the idea of hierarchical “directories” which are used to group files. File names do not need to be unique within the whole system, but only within the same directory. This means in particular that the system can assign different directories to Joe and Sue, and that within those they may call their files whatever they please without having to worry about each other’s files. In addition, we can forbid Joe from accessing Sue’s directory (and vice versa) and no longer need to worry about the individual files within them.

On Linux, directories are simply les, even though you cannot access them

using the same methods you would use for “plain” les. However, this implies

that the rules we discussed for le names (see the previous section) also apply to

the names of directories. You merely need to learn that the slash (“

”) serves toslash

separate le names from directory names and directory names from one another.

joe/letter.txt

would be the le

letter.txt

in the directory

joe
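The following sketch, run in a scratch directory, gives Joe and Sue a directory each (the names and letter texts are only examples) and shows that both may keep a file called letter.txt without any conflict.

```shell
# Work in an empty scratch directory
cd "$(mktemp -d)" || exit 1

# One directory per user
mkdir joe sue

# The same file name in two different directories: no conflict
echo 'Hi Sue, this is Joe.' > joe/letter.txt
echo 'Hi Joe, this is Sue.' > sue/letter.txt

# The slash separates the directory name from the file name
cat joe/letter.txt
cat sue/letter.txt
```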

Directories may contain other directories (this is the term “hierarchical” we mentioned earlier), which results in a tree-like structure (inventively called a “directory tree”). A Linux system has a special directory which forms the root of the tree and is therefore called the “root directory”. Its name is “/” (slash).

In spite of its name, the root directory has nothing to do with the system administrator, root. It’s just that their names are similar.

The slash does double duty here: it serves both as the name of the root directory and as the separator between other directory names. We’ll come back to this presently.

The basic installation of common Linux distributions usually contains tens of thousands of files in a directory hierarchy that is mostly structured according to certain conventions. We shall tell you more about this directory hierarchy in Chapter 9.

6.1.3 Absolute and Relative Path Names

Every le in a Linux system is described by a name which is constructed by start-

ing at the root directory and mentioning every directory down along the path to

the one containing the le, followed by the name of the le itself. For example,

/home/joe/letter.txt

names the le

letter.txt

, which is located within the

joe

direc-

tory, which in turn is located within the

home

directory, which in turn is a direct

descendant of the root directory. A name that starts with the root directory is

called an “absolute path name”—we talk about “path names” since the name de-absolute path name

scribes a “path” through the directory tree, which may contain directory and le

names (i. e., it is a collective term).

Each process within a Linux system has a “current directory” (often also called “working directory”). File names that do not start at the root directory are looked up relative to this directory; letter.txt is thus a convenient abbreviation for “the file called letter.txt in the current directory”, and sue/letter.txt stands for “the file letter.txt within the sue directory within the current directory”. Such names, which start from the current directory, are called “relative path names”.

It is trivial to tell absolute from relative path names: A path name starting with a “/” is absolute; all others are relative.

The current directory is “inherited” by a child process from its parent. So if you start a new shell (or any program) from a shell, that new shell uses the same current directory as the shell you used to start it. In your new shell, you can change into another directory using the cd command, but the current directory of the old shell does not change: if you leave the new shell, you are back to the (unchanged) current directory of the old shell.
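You can watch this inheritance with a subshell, which is a child process of the current shell. The following is a minimal sketch, assuming bash or a compatible shell; the scratch directory created by mktemp merely stands in for whatever directory you happen to be in.

```bash
# Work in a throwaway directory (a stand-in for "wherever you are").
demo=$(mktemp -d)
cd "$demo" || exit 1
parent_before=$(pwd)

# $( ... ) runs its commands in a child shell (a subshell).
child_start=$(pwd)              # the child starts in its parent's directory
child_after_cd=$(cd / && pwd)   # the child moves to / for itself only ...
parent_after=$(pwd)             # ... so the parent has not moved at all

printf 'parent before:  %s\n' "$parent_before"
printf 'child started:  %s\n' "$child_start"
printf 'child moved to: %s\n' "$child_after_cd"
printf 'parent after:   %s\n' "$parent_after"

cd / && rm -rf "$demo"          # tidy up
```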

There are two convenient shortcuts in relative path names (and even absolute ones): The name “..” always refers to the directory above the directory in question in the directory tree; for example, in the case of /home/joe, it refers to /home. This frequently allows you to refer conveniently to files in a “side branch” of the directory tree as viewed from the current directory, without having to resort to absolute path names. Assume /home/joe has the subdirectories letters and novels. With letters as the current directory, you could refer to the ivanhoe.txt file within the novels directory by means of the relative path name ../novels/ivanhoe.txt, without having to use the unwieldy absolute path name /home/joe/novels/ivanhoe.txt.
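The following sketch recreates this example so you can try it safely. The directory and file names follow the text, while a scratch directory made by mktemp stands in for /home/joe (an assumption purely for the demonstration).

```bash
# A scratch directory plays the role of /home/joe.
joe=$(mktemp -d)
mkdir "$joe/letters" "$joe/novels"
echo "Chapter 1" > "$joe/novels/ivanhoe.txt"

cd "$joe/letters" || exit 1
# ".." is the parent of letters, so this reaches into the side branch novels.
via_relative=$(cat ../novels/ivanhoe.txt)
# The same file, named by its absolute path name:
via_absolute=$(cat "$joe/novels/ivanhoe.txt")
echo "$via_relative"
echo "$via_absolute"

cd / && rm -rf "$joe"     # tidy up
```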

The second shortcut does not make quite as obvious sense: the “.” name within a directory always stands for the directory itself. It is not immediately clear why one would need a method to refer to a directory which one has already reached, but there are situations where this comes in quite handy. For example, you may know (or could look up in Chapter 8) that the shell searches program files for external commands in the directories listed in the environment variable PATH. If you, as a software developer, want to invoke a program, let’s call it prog, which (a) resides in a file within the current directory, where (b) that directory is not listed in PATH (always a good idea for security reasons), you can still get the shell to start your file as a program by saying

$ ./prog

without having to enter an absolute path name.
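Here is a small sketch of that situation. The script prog is a hypothetical stand-in for your own program, created in a scratch directory that is assumed not to be listed in PATH (and assumed to allow executing programs, which some systems forbid on /tmp).

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1

# A tiny hypothetical program "prog" in the current directory.
printf '#!/bin/sh\necho "prog is running"\n' > prog
chmod +x prog

# Without a slash, the shell only searches the directories in PATH,
# so this is expected to fail with "command not found".
prog 2>/dev/null || echo "prog: not found via PATH"

# "./prog" contains a slash, so the shell skips the PATH search.
output=$(./prog)
echo "$output"

cd / && rm -rf "$dir"     # tidy up
```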

As a Linux user you have a “home directory” which you enter immediately after logging in to the system. The system administrator determines that directory’s name when they create your user account, but it is usually called the same as your user name and located below /home, something like /home/joe for the user joe.

6.2 Directory Commands

6.2.1 The Current Directory: cd & Co.

You can use the cd shell command to change the current directory: Simply give the desired directory as a parameter:

$ cd letters                 Change to the letters directory
$ cd ..                      Change to the directory above

If you do not give a parameter you will end up in your home directory:

$ cd
$ pwd
/home/joe

You can output the absolute path name of the current directory using the pwd (“print working directory”) command.

Possibly you can also see the current directory as part of your prompt: Depending on your system settings there might be something like

joe@red:~/letters> _

where ~/letters is short for /home/joe/letters; the tilde (“~”) stands for the current user’s home directory.

The “cd -” command changes to the directory that used to be current before the most recent cd command. This makes it convenient to alternate between two directories.
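A short sketch of “cd -” in action; the two directories / and /usr are arbitrary choices that exist on practically every Linux system. The shell keeps the previous directory in the OLDPWD variable, and “cd -” changes there (printing the new directory's name, which the sketch discards).

```bash
cd / || exit 1
cd /usr || exit 1
first_prev=$OLDPWD        # the shell remembers where we came from: /

cd - > /dev/null          # back to /
now=$(pwd)
cd - > /dev/null          # and back to /usr again
again=$(pwd)

echo "previous: $first_prev, after cd -: $now, after another cd -: $again"
```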

Exercises

6.1 [2] In the shell, is cd an internal or an external command? Why?

6.2 [3] Read about the pushd, popd, and dirs commands in the bash man page. Convince yourself that these commands work as described there.

Table 6.1: Some file type designations in ls

File type         Colour   Suffix (ls -F)   Type letter (ls -l)
plain file        black    none             -
executable file   green    *                -
directory         blue     /                d
link              cyan     @                l

Table 6.2: Some ls options

Option   Long form           Result
-a       --all               Displays hidden files as well
-i       --inode             Displays the unique file number (inode number)
-l       --format=long       Displays extra information
         --color=never       Omits colour-coding the output
-F       --classify          Marks file type by appending a special character (-p marks directories only)
-r       --reverse           Reverses sort order
-R       --recursive         Recurses into subdirectories (DOS: DIR/S)
-S       --sort=size         Sorts files by size (largest first)
-t       --sort=time         Sorts files by modification time (newest first)
-X       --sort=extension    Sorts files by extension (“file type”)

6.2.2 Listing Files and Directories: ls

To find one’s way around the directory tree, it is important to be able to find out which files and directories are located within a directory. The ls (“list”) command does this.

Without options, this information is output as a multi-column table sorted by file name. With colour screens being the norm rather than the exception today, it has become customary to display the names of files of different types in various colours. (We have not talked about file types yet; this topic will be mentioned in Chapter 9.)

Thankfully, by now most distributions have agreed about the colours to use. Table 6.1 shows the most common assignment.

On monochrome monitors (which can still be found), the options -F or -p recommend themselves. These will cause special characters to be appended to the file names according to the file’s type (-p marks only directories). A subset of these characters is given in Table 6.1.

You can display hidden les (whose names begin with a dot) by giving the

-a

Hidden files

(“all”) option. Another very useful option is

-l

(a lowercase “L”, for “long”, rather

than the digit “1”). This displays not only the le names, but also some additionalAdditional information

information about each le.

Some Linux distributions pre-set abbreviations for some combinations of helpful options; the SUSE distributions, for example, use a simple “l” as an abbreviation of “ls -alF”. “la” and “ll” are also abbreviations for ls variants.

Here is an example of ls without and with -l:

$ ls
file.txt  file2.dat
$ ls -l
-rw-r--r-- 1 joe users 4711 Oct 4 11:11 file.txt
-rw-r--r-- 1 joe users  333 Oct 2 13:21 file2.dat

In the rst case, all visible (non-hidden) les in the directory are listed; the second

case adds the extra information.

The dierent parts of the long format have the following meanings: The rst Long format

character gives the le type (see Chapter 9); plain les have “

”, directories “

”

and so on (“type character” in Table 6.1).

The next nine characters show the access permissions. Next come a reference counter, the owner of the file (joe here), and the file’s group (users). After the size of the file in bytes, you can see the date and time of the last modification of the file’s content. On the very right there is the file’s name.

Depending on the language you are using, the date and time columns in particular may look completely different from the ones in our example (which we generated using the minimal language environment “C”). This is usually not a problem in interactive use, but may prove a major nuisance if you try to take the output of “ls -l” apart in a shell script. (Without wanting to anticipate the training manual Advanced Linux, we recommend setting the language environment to a defined value in shell scripts.)

If you want to see the extra information for a directory (such as /tmp), “ls -l /tmp” doesn’t really help, because ls will list the data for all the files within /tmp. Use the -d option to suppress this and obtain the information about /tmp itself.
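The difference is easy to see with a scratch directory holding a single file (the name inside.txt is invented for this sketch): without -d, ls -l describes the file inside; with -d, it describes the directory itself, whose line starts with the type letter d.

```bash
dir=$(mktemp -d)
touch "$dir/inside.txt"

# Lists the contents: a "total" line followed by one line for inside.txt.
LC_ALL=C ls -l "$dir"

# Describes the directory itself; the line starts with the type letter "d".
entry=$(LC_ALL=C ls -ld "$dir")
echo "$entry"
type_letter=$(printf '%s' "$entry" | cut -c1)

rm -rf "$dir"             # tidy up
```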

ls supports many more options than the ones mentioned here; a few of the more important ones are shown in Table 6.2.

In the LPI exams, Linux Essentials and LPI-101, nobody expects you to know all 57 varieties of ls options by heart. However, you may wish to commit the most important half dozen or so (the content of Table 6.2, approximately) to memory.

Exercises

6.3 [1] Which files does the /boot directory contain? Does the directory have subdirectories and, if so, which ones?

6.4 [2] Explain the difference between ls with a file name argument and ls with a directory name argument.

6.5 [2] How do you tell ls to display information about a directory rather than the files in that directory, if a directory name is passed to the program? (Hint: Documentation.)

6.2.3 Creating and Deleting Directories: mkdir and rmdir

To keep your own les in good order, it makes sense to create new directories. You

can keep les in these “folders” according to their subject matter (for example).

Of course, for further structuring, you can create further directories within such

directories—your ambition will not be curbed by arbitrary limits.

To create new directories, the mkdir command is available. It requires one or more directory names as arguments, otherwise you will only obtain an error message instead of a new directory. To create nested directories in a single step, you can use the -p option; otherwise the command assumes that all directories in a path name except the last one already exist. For example:

$ mkdir pictures/holiday
mkdir: cannot create directory `pictures/holiday': No such file or directory
$ mkdir -p pictures/holiday
$ cd pictures
$ ls -F
holiday/

Sometimes a directory is no longer required. To reduce clutter, you can remove it using the rmdir (“remove directory”) command.

As with mkdir, at least one path name of a directory to be deleted must be given. In addition, the directories in question must be empty, i. e., they may not contain entries for files, subdirectories, etc. Again, only the last directory in every name will be removed:

$ cd ..
$ rmdir pictures/holiday
$ ls -F
pictures/

With the -p option, all empty directories mentioned in a name can be removed in one step, beginning with the one on the very right.

$ mkdir -p pictures/holiday/summer
$ rmdir pictures/holiday/summer
$ ls -F pictures
holiday/
$ rmdir -p pictures/holiday
$ ls -F pictures
ls: pictures: No such file or directory

Exercises

6.6 [!2] In your home directory, create a directory grd1-test with subdirectories dir1, dir2, and dir3. Change into directory grd1-test/dir1 and create (e. g., using a text editor) a file called hello containing “hello”. In grd1-test/dir2 create a file howdy containing “howdy”. Check that these files exist. Delete the subdirectory dir3 using rmdir. Next, attempt to remove the subdirectory dir2 using rmdir. What happens, and why?

6.3 File Search Patterns

6.3.1 Simple Search Patterns

You will often want to apply a command to several files at the same time. For example, if you want to copy all files whose names start with “p” and end with “.c” from the prog1 directory to the prog2 directory, it would be quite tedious to have to name every single file explicitly, at least if you need to deal with more than a couple of files! It is much more convenient to use the shell’s search patterns.

If you specify a parameter containing an asterisk on the shell command line, like

prog1/p*.c

the shell replaces this parameter in the actual program invocation by a sorted list of all file names that “match” the parameter. “Match” means that in the actual file name there may be an arbitrary-length sequence of arbitrary characters in place of the asterisk. For example, names like

prog1/p1.c
prog1/polly.c
prog1/pop-rock.c
prog1/p.c

are eligible (note in particular the last name in the example: “arbitrary length” does include “length zero”!). The only character the asterisk will not match is (can you guess it?) the slash; a search pattern like the asterisk is therefore confined to a single directory level.

You can test these search patterns conveniently using echo. The

$ echo prog1/p*.c

command will output the matching file names without any obligation or consequence of any kind.
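Here is a small sketch along these lines. It creates empty files in a scratch directory, using the names from the text plus two made-up names that should not match (pa.h and q1.c), and lets echo show what the shell makes of prog1/p*.c. Note that p.c is included, where the asterisk matched zero characters; the sort order of the list follows your locale.

```bash
dir=$(mktemp -d)
mkdir "$dir/prog1"
cd "$dir/prog1" || exit 1
touch p1.c polly.c pop-rock.c p.c pa.h q1.c
cd "$dir" || exit 1

# The shell expands the pattern before echo ever runs.
matches=$(echo prog1/p*.c)
echo "$matches"

cd / && rm -rf "$dir"     # tidy up
```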

If you really want to apply a command to all files in the directory tree starting with a particular directory, there are ways to do that, too. We will discuss this in Section 6.4.4.

The search pattern “*” describes “all files in the current directory”, excepting hidden files whose name starts with a dot. To avert possibly inconvenient surprises, search patterns diligently ignore hidden files unless you explicitly ask for them to be included by means of something like “.*”.
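The following sketch contrasts “*” with “.*” in a scratch directory; the file names visible.txt and .hidden are made up. Depending on the shell and its version, “.*” may also match the “.” and “..” entries, so the sketch only checks that .hidden shows up.

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch visible.txt .hidden

all_visible=$(echo *)     # hidden files are skipped
dot_names=$(echo .*)      # names starting with a dot (possibly also . and ..)

echo "*  -> $all_visible"
echo ".* -> $dot_names"

cd / && rm -rf "$dir"     # tidy up
```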

You may have encountered the asterisk at the command line of operating systems like DOS or Windows¹ and may be used to specifying the “*.*” pattern to refer to all files in a directory. On Linux, this is not correct: the “*.*” pattern matches “all files whose name contains a dot”, but a file name need not contain a dot at all. The Linux equivalent, as we said, is “*”.

A question mark as a search pattern stands for exactly one arbitrary character (again excluding the slash). A pattern like

p?.c

thus matches the names

p1.c
pa.c
p-.c
p..c

(among others). Note that there must be one character; the “nothing” option does not exist here.
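A sketch with the names from the text plus two made-up names that should not match: p.c (no character in place of the question mark) and p12.c (two characters).

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch p1.c pa.c p-.c p..c p.c p12.c

# Exactly one character must stand in place of the question mark.
matches=$(echo p?.c)
echo "$matches"

cd / && rm -rf "$dir"     # tidy up
```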

You should take particular care to remember a very important fact: The expansion of search patterns is the responsibility of the shell! The commands that you execute usually know nothing about search patterns and don’t care about them, either. All they get to see are lists of path names, but not where they come from, i. e., whether they have been typed in directly or resulted from the expansion of search patterns.

¹ You’re probably too young for CP/M.

Incidentally, nobody says that the results of search patterns always need to be interpreted as path names. For example, if a directory contains a file called “-l”, an “ls *” in that directory will yield an interesting and perhaps surprising result (see Exercise 6.9).

What happens if the shell cannot find a file whose name matches the search pattern? In this case the command in question is passed the search pattern as such; what it makes of that is its own affair. Typically such search patterns are interpreted as file names, but the “file” in question is not found and an error message is issued. However, there are commands that can do useful things with search patterns that you pass them; with them, the challenge is really to ensure that the shell invoking the command does not try to cut in with its own expansion. (Cue: quotes.)
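The sketch below shows both behaviours in a scratch directory, using the made-up suffix .xyz: an unmatched pattern reaches the command unchanged, and quotes keep the shell from expanding a pattern even where it would match. (bash can change the first behaviour with its nullglob and failglob options; the sketch assumes the default settings.)

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1

# Nothing matches yet, so echo receives the pattern itself.
unmatched=$(echo *.xyz)

touch a.xyz
# Now the pattern matches, and echo receives the file name instead ...
expanded=$(echo *.xyz)
# ... unless quotes hide the pattern from the shell.
quoted=$(echo '*.xyz')

printf '%s\n' "$unmatched" "$expanded" "$quoted"

cd / && rm -rf "$dir"     # tidy up
```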

6.3.2 Character Classes

A somewhat more precise specication of the matching characters in a search pat-

tern is oered by “character classes”: In a search pattern of the form

prog[123].c

the square brackets match exactly those characters that are enumerated within

them (no others). The pattern in the example therefore matches

prog1.c

prog2.c

prog3.c

but not

prog.c

There needs to be exactly one character

prog4.c 4

was not enumerated

proga.c a

neither

prog12.c

Exactly one character, please

As a more convenient notation, you may specify ranges as in

prog[1-9].c
[A-Z]bracadabra.txt

The square brackets in the first line match the digits 1 to 9, the ones in the second all uppercase letters.

Note that in the common character encodings the uppercase and lowercase letters do not form one contiguous block: A pattern like prog[A-z].c not only matches progQ.c and progx.c, but also prog_.c. (Check an ASCII table, e. g. using “man ascii”.) If you want to match “uppercase and lowercase letters only”, you need to use

prog[A-Za-z].c

A construct like prog[A-Za-z].c does not catch umlauts, even if they look suspiciously like letters.

As a further convenience, you can specify negated character classes, which are interpreted as “all characters except these”: Something like

prog[!A-Za-z].c

matches all names where the character between “prog” and “.c” is not a letter. As usual, the slash is excepted.
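The following sketch exercises enumerated, ranged and negated character classes on the example names from the text, plus the made-up name prog_.c, all created as empty files in a scratch directory.

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1
touch prog.c prog1.c prog2.c prog3.c prog4.c proga.c prog12.c prog_.c

enumerated=$(echo prog[123].c)     # exactly one of 1, 2, 3
range=$(echo prog[1-9].c)          # one of the digits 1 to 9
negated=$(echo prog[!A-Za-z].c)    # exactly one character that is not a letter

echo "$enumerated"
echo "$range"
echo "$negated"

cd / && rm -rf "$dir"              # tidy up
```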

6.3.3 Braces

The expansion of braces in expressions like

{red,yellow,blue}.txt

is often mentioned in conjunction with shell search patterns, even though it is really just a distant relative. The shell replaces this by

red.txt yellow.txt blue.txt

In general, a word on the command line that contains several comma-separated pieces of text within braces is replaced by as many words as there are pieces of text between the braces, where in each of these words the whole brace expression is replaced by one of the pieces. This replacement is purely based on the command line text and is completely independent of the existence or non-existence of any files or directories, unlike search patterns, which (when they match at all) always produce only those names that actually exist as path names on the system.

You can have more than one brace expression in a word, which will result in the cartesian product, in other words all possible combinations:

{a,b,c}{1,2,3}.dat

produces

a1.dat a2.dat a3.dat b1.dat b2.dat b3.dat c1.dat c2.dat c3.dat

This is useful, for example, to create new directories systematically; the usual search patterns cannot help there, since they can only find things that already exist:

$ mkdir -p revenue/200{8,9}/q{1,2,3,4}
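The sketch below, which needs bash (brace expansion is not part of the POSIX shell), shows that braces are expanded from the command-line text alone, and then runs the mkdir example from the text inside a scratch directory.

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1

# Brace expansion works even though none of these files exist.
colours=$(echo {red,yellow,blue}.txt)
combos=$(echo {a,b,c}{1,2,3}.dat)

# Two brace expressions in one word: 2 x 4 = 8 quarter directories.
mkdir -p revenue/200{8,9}/q{1,2,3,4}
count=$(find revenue -mindepth 2 -type d | wc -l)

echo "$colours"
echo "$combos"
echo "created $count quarter directories"

cd / && rm -rf "$dir"     # tidy up
```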

Exercises

6.7 [!1] The current directory contains the files

prog.c prog1.c prog2.c progabc.c prog
p.txt p1.txt p21.txt p22.txt p22.dat

Which of these names match the search patterns (a) prog*.c, (b) prog?.c, (c) p?*.txt, (d) p[12]*, (e) , (f) *.*?

6.8 [!2] What is the difference between “ls” and “ls *”? (Hint: Try both in a directory containing subdirectories.)

6.9 [2] Explain why the following command leads to the output shown:

$ ls
-l  file1  file2  file3
$ ls *
-rw-r--r-- 1 joe users 0 Dec 19 11:24 file1
-rw-r--r-- 1 joe users 0 Dec 19 11:24 file2
-rw-r--r-- 1 joe users 0 Dec 19 11:24 file3

6.10 [2] Why does it make sense for “*” not to match file names starting with a dot?

Table 6.3: Options for cp

Option   Result
-b       (backup) Makes backup copies of existing target files by appending a tilde to their names
-f       (force) Overwrites existing target files without prompting
-i       (interactive) Asks (once per file) whether existing target files should be overwritten
-p       (preserve) Tries to preserve all attributes of the source file for the copy
-R       (recursive) Copies directories with all their content
-u       (update) Copies only if the source file is newer than the target file (or the target file doesn’t exist)
-v       (verbose) Displays all activity on screen

6.4 Handling Files

6.4.1 Copying, Moving and Deleting: cp and Friends

You can copy arbitrary les using the

(“copy”) command. There are two basicCopying files

approaches:

If you tell

the source and target le names (two arguments), then a 1∶1copy1 ∶ 1 copy

of the content of the source le will be placed in the target le. Normally

does

not ask whether it should overwrite the target le if it already exists, but just does

it—caution (or the

-i

option) is called for here.

You can also give a target directory name instead of a target file name. The source file will then be copied to that directory, keeping its old name.

$ cp list list2
$ cp /etc/passwd .
$ ls -l
-rw-r--r-- 1 joe users 2500 Oct 4 11:11 list
-rw-r--r-- 1 joe users 2500 Oct 4 11:25 list2
-rw-r--r-- 1 joe users 8765 Oct 4 11:26 passwd

In this example, we rst created an exact copy of le

list

under the name

list2

After that, we copied the

/etc/passwd

le to the current directory (represented by

the dot as a target directory name). The most important

options are listed in

Table 6.3.

Instead of a single source le, a longer list of source les (or a shell wildcardList of source files

pattern) is allowed. However, this way it is not possible to copy a le to a dierent

name, but only to a dierent directory. While in DOS it is possible to use “

COPY

*.TXT *.BAK

” to make a backup copy of every

TXT

le to a le with the same name

and a

BAK

sux, the Linux command “

cp *.txt *.bak

” usually fails with an error

message.

To understand this, you have to visualise how the shell executes this command. It first tries to replace all wildcard patterns with the corresponding file names, for example *.txt with letter1.txt and letter2.txt. What happens to *.bak depends on the expansion of *.txt and on whether there are matching file names for *.bak in the current directory, but the outcome will almost never be what a DOS user would expect! Usually the shell will pass the cp command the unexpanded *.bak wildcard pattern as the final argument, which fails from the point of view of cp since it is (most likely) not the name of an existing directory.
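What a DOS user actually wants here has to be spelled out as one cp call per file. A common idiom is a loop that uses the shell's ${name%suffix} expansion to swap the suffix; the sketch below, with made-up letter files in a scratch directory, is one way to do it rather than the only one.

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1
echo one > letter1.txt
echo two > letter2.txt

# One cp per file: "${f%.txt}" is $f with the trailing ".txt" removed.
for f in *.txt; do
    cp "$f" "${f%.txt}.bak"
done

backups=$(echo *.bak)
content=$(cat letter2.bak)
echo "$backups"

cd / && rm -rf "$dir"     # tidy up
```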

While the cp command makes an exact copy of a file, physically duplicating the file on the storage medium or creating a new, identical copy on a different storage medium, the mv (“move”) command serves to move a file to a different place or change its name. This is strictly an operation on directory contents, unless the file is moved to a different file system, for example from a hard disk partition to a USB key. In this case it is necessary to move the file around physically, by copying it to the new place and removing it from the old.

The syntax and rules of mv are identical to those of cp: you can again specify a list of source files instead of merely one, and in this case the command expects a directory name as the final argument. The main difference is that mv lets you rename directories as well as files.

The -b, -f, -i, -u, and -v options of mv correspond to the eponymous ones described with cp.

$ mv passwd list2
$ ls -l
-rw-r--r-- 1 joe users 2500 Oct 4 11:11 list
-rw-r--r-- 1 joe users 8765 Oct 4 11:26 list2

In this example, the original le

list2

is replaced by the renamed le

passwd

. Like

does not ask for conrmation if the target le name exists, but overwrites

the le mercilessly.

The command to delete les is called

(“remove”). To delete a le, you must Deleting files

have write permission in the corresponding directory. Therefore you are “lord of

the manor” in your own home directory, where you can remove even les that do

not properly belong to you.

AWrite permission on a le, on the other hand, is completely irrelevant as far

as deleting that le is concerned, as is the question to which user or group

the le belongs.

rm goes about its work just as ruthlessly as cp: the files in question are obliterated from the file system without confirmation. You should be especially careful, in particular when shell wildcard patterns are used. Unlike in DOS, the dot in a Linux file name is a character without special significance. For this reason, the “rm *” command deletes all non-hidden files from the current directory. Subdirectories will remain unscathed; with “rm -r *” they can also be removed.

As the system administrator, you can trash the whole system with a command such as “rm -rf /*”; utmost care is required! It is easy to type “rm -rf foo *” instead of “rm -rf foo*”.

Where rm removes all files whose names are passed to it, “rm -i” proceeds a little more carefully:

$ rm -i lis*
rm: remove 'list'? n
rm: remove 'list2'? y
$ ls -l
-rw-r--r-- 1 joe users 2500 Oct 4 11:11 list

The example illustrates that, for each file, rm asks whether it should be removed (“y” for “yes”) or not (“n” for “no”).

Desktop environments such as KDE usually support the notion of a “dustbin” which receives files deleted from within the file manager, and which makes it possible to retrieve files that have been removed inadvertently. There are similar software packages for the command line.

In addition to the -i and -r options, rm allows cp’s -v and -f options, with similar results.

Exercises

6.11 [!2] Create, within your home directory, a copy of the file /etc/services called myservices. Rename this file to srv.dat and copy it to the /tmp directory (keeping the new name intact). Remove both copies of the file.

6.12 [1] Why doesn’t mv have an -R option (like cp has)?

6.13 [!2] Assume that one of your directories contains a file called “-file” (with a dash at the start of the name). How would you go about removing this file?

6.14 [2] If you have a directory where you do not want to inadvertently fall victim to a “rm *”, you can create a file called “-i” there, as in

$ > -i

(this will be explained in more detail in Chapter 7). What happens if you now execute the “rm *” command, and why?

6.4.2 Linking Files: ln and ln -s

Linux allows you to create references to files, so-called “links”, and thus to assign several names to the same file. But what purpose does this serve? The applications range from shortcuts for file and directory names to a “safety net” against unwanted file deletions, to convenience for programmers, to space savings for large directory trees that should be available in several versions with only small differences.

The ln (“link”) command assigns a new name (second argument) to a file in addition to its existing one (first argument):

$ ln list list2
$ ls -l
-rw-r--r-- 2 joe users 2500 Oct 4 11:11 list
-rw-r--r-- 2 joe users 2500 Oct 4 11:11 list2

The directory now appears to contain a new file called list2. Actually, there are just two references to the same file. This is hinted at by the reference counter in the second column of the “ls -l” output. Its value is 2, denoting that the file really has two names. Whether the two file names really refer to the same file can be decided reliably using the “ls -i” command. If this is the case, the file number in the first column must be identical for both files. File numbers, also called inode numbers, identify files uniquely within their file system:

$ ls -i
876543 list  876543 list2

“Inode” is usually taken to be short for “index node”. Inodes store all the information that the system has about a file, except for the name. There is exactly one inode per file.

If you change the content of one of the files, the other’s content changes as well, since in fact there is only one file (with the unique inode number 876543). We only gave that file another name.

Directories are simply tables mapping file names to inode numbers. Obviously there can be several entries in a table that contain different names but the same inode number. A directory entry with a name and inode number is called a “link”.

You should realise that, for a le with two links, it is quite impossible to nd

out which name is “the original”, i. e., the rst parameter within the

command.

From the system’s point of view both names are completely equivalent and indis-

tinguishable.

Incidentally, hard links to directories are not allowed on Linux. The only exceptions are “.” and “..”, which the system maintains for each directory. Since directories are also files and have their own inode numbers, you can keep track of how the file system fits together internally. (See also Exercise 6.19.)

Deleting one of the two les decrements the number of names for le no.

876543 (the reference counter is adjusted accordingly). Not until the reference

counter reachers the value of 0will the le’s content actually be removed.

$rm list

$ls -li

876543 -rw-r--r-- 1 joe users 2500 Oct 4 11:11 list2

Since inode numbers are only unique within the same physical file system (disk partition, USB key, …), such links are only possible within the same file system where the file resides.

The explanation about deleting a file’s content was not exactly correct: If the last file name is removed, a file can no longer be opened, but if a process is still using the file it can go on to do so until it explicitly closes the file or terminates. In Unix software this is a common idiom for handling temporary files that are supposed to disappear when the program exits: You create them for reading and writing and “delete” them immediately afterwards without closing them within your program. You can then write data to the file and later jump back to the beginning to reread it.
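The shell can imitate this idiom with file descriptors, which gives a feel for what such programs do. In the sketch below (the name scratch.tmp is made up), descriptor 3 is opened for writing and descriptor 4 for reading before the name is removed, and both keep working afterwards. A real program would more likely use a single descriptor opened for reading and writing and seek back to the start, which the shell cannot easily do.

```bash
dir=$(mktemp -d)
cd "$dir" || exit 1

exec 3> scratch.tmp      # open for writing on descriptor 3
exec 4< scratch.tmp      # open for reading on descriptor 4
rm scratch.tmp           # the last name is gone ...

echo "still here" >&3    # ... but the file itself still exists
remaining=$(ls | wc -l)  # the directory no longer lists it
content=$(cat <&4)       # read from the start via descriptor 4

exec 3>&- 4<&-           # closing both descriptors finally frees the file
echo "entries in directory: $remaining, content: $content"
cd / && rm -rf "$dir"    # tidy up
```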

You can invoke ln not just with two file name arguments but also with one or with many. In the first case, a link with the same name as the original will be created in the current directory (which should really be different from the one where the file is located); in the second case all named files will be “linked” under their original names into the directory given as the last argument (compare cp and mv).

You can use the “cp -l” command to create a “link farm”. This means that instead of copying the files to the destination (as would otherwise be usual), links to the originals will be created:

$ mkdir prog-1.0.1                  New directory
$ cp -l prog-1.0/* prog-1.0.1

The advantage of this approach is that the files still exist only once on the disk, and thus take up space only once. With today’s prices for disk storage this may not be compellingly necessary, but a common application of this idea, for example, consists of making periodic backup copies of large file hierarchies which should appear on the backup medium (disk or remote computer) as separate, date-stamped file hierarchies. Experience teaches that most files only change very rarely, and if these files then need to be stored just once instead of over and over again, the savings tend to add up over time. In addition, the files do not need to be written to the backup medium time and again, and that can save considerable time.

Backup packages that adopt this idea include, for example, Rsnapshot (http://www.rsnapshot.org/) or Dirvish (http://www.dirvish.org/).

This approach should be taken with a certain amount of caution. Using links may let you “deduplicate” identical files, but not identical directories. This means that for every date-stamped file hierarchy on the backup medium, all directories must be created anew, even if the directories only contain links to existing files. This can lead to very complicated directory structures and, in the extreme case, to consistency checks on the backup medium failing because the computer does not have enough virtual memory to check the directory hierarchy.

AYou will also need to watch out if – as alluded to in the example – you make

a “copy” of a program’s source code as a link farm (which in the case of,

e. g., the Linux source code could really pay o): Before you can modify a

le in your newly-created version, you will need to ensure that it is really a

separate le and not just a link to the original (which you will very probably

not want to change). This means that you either need to manually replace

the link to the le by an actual copy of the le, or else use an editor which

writes modied versions as separate les automatically2.

This is not all, however: There are two different kinds of link in Linux systems. The type explained above is the default case for the ln command and is called a “hard link”. It always uses a file’s inode number for identification. In addition, there are symbolic links (also called “soft links” in contrast to “hard links”). Symbolic links are really files containing the name of the link’s “target file”, together with a flag signifying that the file is a symbolic link and that accesses should be redirected to the target file. Unlike with hard links, the target file does not “know” about the symbolic link. Creating or deleting a symbolic link does not impact the target file in any way; when the target file is removed, however, the symbolic link “dangles”, i.e., points nowhere (accesses elicit an error message).

In contrast to hard links, symbolic links allow links to directories as well as to files on different physical file systems. In practice, symbolic links are often preferred, since it is easier to keep track of the linkage by means of the path name.
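The difference between the two kinds of link shows up clearly once the target is deleted. This small sketch uses a scratch directory and made-up file names; “ln -s”, which creates symbolic links, is explained in more detail below.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
echo data > target
ln target hard        # hard link: a second name for the same inode
ln -s target soft     # symbolic link: a small file holding the name "target"

ls -li target hard soft   # "target" and "hard" share an inode number; "soft" has its own

rm target
cat hard                                   # still works: the data survives
cat soft 2>/dev/null || echo 'soft is dangling'
```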

Symbolic links are popular if file or directory names change but a certain backwards compatibility is desired. For example, it was agreed that user mailboxes (that store unread e-mail) should be stored in the /var/mail directory. Traditionally, this directory was called /var/spool/mail, and many programs hard-code this value internally. To ease a transition to /var/mail, a distribution can set up a symbolic link under the name of /var/spool/mail which points to /var/mail. (This would be impossible using hard links, since hard links to directories are not allowed.)
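The mailbox transition can be rehearsed without touching the real /var. In this sketch a scratch directory stands in for “/”, the mailbox file is invented, and the link target is relative (an assumption of the sketch; a relative target is resolved relative to the directory that contains the link). readlink is from GNU coreutils.

```shell
root=$(mktemp -d)                       # stands in for "/"
mkdir -p "$root/var/mail" "$root/var/spool"
echo 'From: root' > "$root/var/mail/joe"

# The old location becomes a symbolic link to the new one
ln -s ../mail "$root/var/spool/mail"

cat "$root/var/spool/mail/joe"          # old path still works
readlink "$root/var/spool/mail"         # shows the stored target name

# A hard link to a directory is refused
ln "$root/var/mail" "$root/var/mail2" 2>/dev/null || echo 'hard link refused'
```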

To create a symbolic link, you must pass the -s option to ln:

$ ln -s /var/log short
$ ls -l
-rw-r--r-- 1 joe users 2500 Oct  4 11:11 liste2
lrwxrwxrwx 1 joe users    8 Oct  4 11:40 short -> /var/log
$ cd short
$ pwd -P
/var/log

² If you use Vim (a.k.a. vi), you can add the “set backupcopy=auto,breakhardlink” command to the .vimrc file in your home directory.

Besides the -s option to create “soft links”, the ln command supports (among others) the -b, -f, -i, and -v options discussed earlier on.

To remove symbolic links that are no longer required, delete them using rm just like plain files. This operation applies to the link rather than the link’s target.

$ cd
$ rm short
$ ls
liste2

As you have seen above, “ls -l” will, for symbolic links, also display the file that the link is pointing to. With the -L and -H options, you can get ls to resolve symbolic links directly:

$ mkdir dir
$ echo XXXXXXXXXX >dir/file
$ ln -s file dir/symlink
$ ls -l dir
total 4
-rw-r--r-- 1 hugo users 11 Jan 21 12:29 file
lrwxrwxrwx 1 hugo users  4 Jan 21 12:29 symlink -> file
$ ls -lL dir
-rw-r--r-- 1 hugo users 11 Jan 21 12:29 file
-rw-r--r-- 1 hugo users 11 Jan 21 12:29 symlink
$ ls -lH dir
-rw-r--r-- 1 hugo users 11 Jan 21 12:29 file
lrwxrwxrwx 1 hugo users  4 Jan 21 12:29 symlink -> file
$ ls -l dir/symlink
lrwxrwxrwx 1 hugo users  4 Jan 21 12:29 dir/symlink -> file
$ ls -lH dir/symlink
-rw-r--r-- 1 hugo users 11 Jan 21 12:29 dir/symlink

The dierence between

-L

and

-H

is that the

-L

option always resolves symbolic links

and displays information about the actual le (the name shown is still always the

one of the link, though). The

-H

, as illustrated by the last three commands in the

example, does that only for links that have been directly given on the command

line.

By analogy to “

cp -l

”, the “

cp -s

” command creates link farms based on sym-

and symbolic links

bolic links. These, however, are not quite as useful as the hard-link-based ones

shown above. “

cp -a

” copies directory hierarchies as they are, keeping symbolic

links as they are; “

cp -L

” arranges to replace symbolic links by their targets in the

copy, and “

cp -P

” precludes that.
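The effect of these cp options on a symbolic link inside a copied directory can be checked quickly. This is a sketch with invented names, assuming GNU cp; -r is added to -L and -P so that the directory is copied recursively.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir src
echo hello > src/file
ln -s file src/link

cp -a  src copy-a    # archive mode: symbolic links stay symbolic links
cp -rL src copy-L    # -L: links are replaced by copies of their targets
cp -rP src copy-P    # -P: links are never followed, but copied as links

ls -l copy-a copy-L copy-P
```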

Exercises

C6.15 [!2] In your home directory, create a file with arbitrary content (e.g., using “echo Hello >~/hello” or a text editor). Create a hard link to that file called link. Make sure that the file now has two names. Try changing the file with a text editor. What happens?

C6.16 [!2] Create a symbolic link called ~/symlink to the file in the previous exercise. Check whether accessing the file via the symbolic link works. What happens if you delete the file (name) the symbolic link is pointing to?


Table 6.4: Keyboard commands for more

Key                 Result
↩                   Scrolls up a line
Space               Scrolls up a screenful
b                   Scrolls back a screenful
h                   Displays help
q                   Quits
/⟨word⟩↩            Searches for ⟨word⟩
!⟨command⟩↩         Executes ⟨command⟩ in a subshell
v                   Invokes editor (vi)
Ctrl+l              Redraws the screen

C6.17 [!2] What directory does the “..” link in the “/” directory point to?

C6.18 [3] Consider the following command and its output:

$ ls -ai /
     2 .       330211 etc          1 proc    4303 var
     2 ..           2 home     65153 root
  4833 bin     244322 lib     313777 sbin
228033 boot    460935 mnt     244321 tmp
330625 dev     460940 opt     390938 usr

Obviously, the / and /home directories have the same inode number. Since the two evidently cannot be the same directory, can you explain this phenomenon?

C6.19 [3] We mentioned that hard links to directories are not allowed. What could be a reason for this?

C6.20 [3] How can you tell from the output of “ls -l ~” that a subdirectory of ~ contains no further subdirectories?

C6.21 [2] How do “ls -lH” and “ls -lL” behave if a symbolic link points to a different symbolic link?

C6.22 [3] What is the maximum length of a “chain” of symbolic links? (In other words, if you start with a symbolic link to a file, how often can you create a symbolic link that points to the previous symbolic link?)

C6.23 [4] (Brainteaser/research exercise:) What requires more space on disk,

a hard link or a symbolic link? Why?

6.4.3 Displaying File Content—more and less

A convenient display of text files on screen is possible using the more command, which lets you view long documents page by page. The output is stopped after one screenful, and “--More--” appears in the final line (possibly followed by the percentage of the file already displayed). The output is continued after a key press. The meanings of various keys are explained in Table 6.4.

more also understands some options. With -s (“squeeze”), runs of empty lines are compressed to just one; the -l option ignores page ejects (usually represented by “^L”) which would otherwise stop the output. The -n ⟨number⟩ option sets the number of screen lines to ⟨number⟩; otherwise more takes the number from the terminal definition pointed to by TERM.

more’s output is still subject to vexing limitations such as the general impossibility of moving back towards the beginning of the output. Therefore, the improved version less (a weak pun: think “less is more”) is more [sic!] commonly seen today.

Table 6.5: Keyboard commands for less

Key                     Result
↓ or e or j or ↩        Scrolls up one line
f or Space              Scrolls up one screenful
↑ or y or k             Scrolls back one line
b                       Scrolls back one screenful
Home or g               Jumps to the beginning of the text
End or Shift+g          Jumps to the end of the text
p⟨percent⟩↩             Jumps to position ⟨percent⟩ (in %) of the text
h                       Displays help
q                       Quits less
/⟨word⟩↩                Searches for ⟨word⟩ towards the end
n                       Continues search towards the end
?⟨word⟩↩                Searches for ⟨word⟩ towards the beginning
Shift+n                 Continues search towards the beginning
!⟨command⟩↩             Executes ⟨command⟩ in subshell
v                       Invokes editor (vi)
r or Ctrl+l             Redraws screen

less lets you use the cursor keys to move around the text as usual, and its search routines have been extended to allow searching both towards the end and towards the beginning of the text. The most common keyboard commands are summarised in Table 6.5.

As mentioned in Chapter 4, less usually serves as the display program for manual pages via man. All the commands are therefore available when perusing manual pages.

6.4.4 Searching Files—find

Who does not know the following feeling: “There used to be a file foobar … but where did I put it?” Of course you can tediously sift through all your directories by hand. But Linux would not be Linux if it did not have something handy to help you.

The find command searches the directory tree recursively for files matching a set of criteria. “Recursively” means that it considers subdirectories, their subdirectories and so on. find’s result consists of the path names of matching files, which can then be passed on to other programs. The following example introduces the command structure:

$ find . -user joe -print
./list

This searches the current directory including all subdirectories for files belonging to the user joe. The -print command displays the result (a single file in our case) on the terminal. For convenience, if you do not specify what to do with matching files, -print will be assumed.

Note that find needs some arguments to go about its task.

Starting Directory The starting directory should be selected with care. If you pick the root directory, the required file(s), if they exist, will surely be found, but the search may take a long time. Of course you only get to search those files where you have appropriate privileges.

An absolute path name for the start directory causes the file names in the output to be absolute; a relative path name for the start directory accordingly produces relative path names.

Instead of a single start directory, you can specify a list of directories that will be searched in turn.
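Both points can be seen side by side in a scratch directory. The directory and file names here are invented for the sketch; the order in which find lists the two matches depends on the directory contents and is not guaranteed.

```shell
work=$(mktemp -d)
mkdir -p "$work/a" "$work/b"
touch "$work/a/report.txt" "$work/b/report.txt"
cd "$work" || exit 1

find . -name report.txt          # relative start: names begin with ./
find "$work" -name report.txt    # absolute start: names begin with the full path
find a b -name report.txt        # a list of start directories, searched in turn
```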

Test Conditions These options describe the requirements on the files in detail. Table 6.6 shows the most important tests. The find documentation explains many more.

Table 6.6: Test conditions for find

Test      Description
-name     Specifies a file name pattern. All shell search pattern characters are allowed. The -iname option ignores case differences.
-type     Specifies a file type (see Section 9.2). This includes:
            b   block device file
            c   character device file
            d   directory
            f   plain file
            l   symbolic link
            p   FIFO (named pipe)
            s   Unix domain socket
-user     Specifies a user that the file must belong to. User names as well as numeric UIDs can be given.
-group    Specifies a group that the file must belong to. As with -user, a numeric GID can be specified as well as a group name.
-size     Specifies a particular file size. Plain numbers signify 512-byte blocks; bytes or kibibytes can be given by appending c or k, respectively. A preceding plus or minus sign stands for a lower or upper limit; -size +10k, for example, matches all files bigger than 10 KiB.
-atime    (from “access”) allows searching for files based on the time of last access (reading). This and the next two tests take their argument in days; …min instead of …time (e.g., -amin) produces 1-minute accuracy.
-mtime    (from “modification”) selects according to the time of modification.
-ctime    (from “change”) selects according to the time of the last inode change (including changes to the content, permission changes, renaming, etc.).
-perm     Specifies a set of permissions that a file must match. The permissions are given as an octal number (see the chmod command). To search for a permission in particular, the octal number must be preceded by a minus sign, e.g., -perm -20 matches all files with group write permission, regardless of their other permissions.
-links    Specifies a reference count value that eligible files must match.
-inum     Finds links to a file with a given inode number.

If multiple tests are given at the same time, they are implicitly ANDed together: all of them must match. find does support additional logical operators (see Table 6.7).

Table 6.7: Logical operators for find

Option   Operator   Meaning
!        Not        The following test must not match
-a       And        Both tests to the left and right of -a must match
-o       Or         At least one of the tests to the left and right of -o must match

In order to avoid mistakes when evaluating logical operators, the tests are best enclosed in parentheses. The parentheses must of course be escaped from the shell:

$ find . \( -type d -o -name "A*" \) -print
.
./bilder
./Attic
$ _

This example lists all names that either refer to directories or that begin with “A”, or both.
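Why the parentheses matter becomes clear when you leave them out. Since the implicit -a binds more tightly than -o, a -print without parentheses only belongs to the test right before it. This sketch uses invented names in a scratch directory.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir sub
touch Attic.txt plain.txt

# Read as: -name 'A*' -o ( -type d -a -print ). Attic.txt matches the
# -name test, but the -print action is never reached for it.
find . -name 'A*' -o -type d -print

# With escaped parentheses, -print applies to the whole alternative
find . \( -name 'A*' -o -type d \) -print
```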

Actions As mentioned before, the search results can be displayed on the screen using the -print option. In addition to this, there are two options, -exec and -ok, which execute commands incorporating the file names. The single difference between -ok and -exec is that -ok asks the user for confirmation before actually executing the command; with -exec, this is tacitly assumed. We will restrict ourselves to discussing -exec.

There are some general rules governing the -exec option:

• The command following -exec must be terminated with a semicolon (“;”). Since the semicolon is a special character in most shells, it must be escaped (e.g., as “\;” or using quotes) in order to make it visible to find.
• Two braces (“{}”) within the command are replaced by the file name that was found. It is best to enclose the braces in quotes to protect them from being interpreted by some shells. Spaces in file names do not cause problems here, since find passes each name to the command as a single argument without involving the shell.

For example:

$ find . -user joe -exec ls -l '{}' \;
-rw-r--r-- 1 joe users 4711 Oct  4 11:11 ./file.txt
$ _

This example searches for all files within the current directory (and below) belonging to user joe, and executes the “ls -l” command for each of them. The following makes more sense:

$ find . -atime +13 -exec rm -i '{}' \;

This interactively deletes all files within the current directory (and below) that have not been accessed for two weeks.

Sometimes (say, in the last example above) it is very inefficient to use -exec to start a new process for every single file name found. In this case, the xargs command, which collects as many file names as possible before actually executing a command, can come in useful:

$ find . -atime +13 | xargs -r rm -i

xargs reads its standard input up to a (configurable) maximum of characters or lines and uses this material as arguments for the specified command (here rm). On input, arguments are separated by space characters (which can be escaped using quotes or “\”) or newlines. The command is invoked as often as necessary to exhaust the input. The -r option ensures that rm is executed only if find actually sends a file name; otherwise it would be executed at least once.
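The difference in the number of processes, and the effect of -r, can be made visible with echo standing in for the real command. This is a sketch assuming GNU xargs (other implementations may skip the command on empty input by default); the file names are arbitrary.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch one two three

# -exec ... \; starts one echo process per file: one line per file
find . -type f -exec echo '{}' \;

# xargs collects the names and starts echo as few times as possible: one line
find . -type f | xargs echo

# With empty input, plain GNU xargs still runs the command once; -r prevents that
printf '' | xargs echo 'ran anyway'
printf '' | xargs -r echo 'never printed'
```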

BWeird lenames can get the

find

xargs

combination in trouble, for example

ones that contain spaces or, indeed, newlines which may be mistaken as

separators. The silver bullet consists of using the “

-print0

” option to

find

which outputs the le names just as “

-print

” does, but uses null bytes to

separate them instead of newlines. Since the null byte is not a valid character

in path names, confusion is no longer possible.

xargs

must be invoked using

the “

-0

” option to understand this kind of input:

$find . -atime +13 -print0 | xargs -0r rm -i
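The following sketch creates deliberately awkward names (one with a space, one with a newline) in a scratch directory and removes them with the null-separated pipeline. It calls rm without -i so that it runs without prompting; the names are invented.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch 'with space'
touch "$(printf 'with\nnewline')"     # a file name containing a newline
touch ordinary

# Null-separated names survive spaces and newlines intact
find . -type f -print0 | xargs -0r rm --
ls -A                                 # nothing left
```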

Exercises

C6.24 [!2] Find all files on your system which are longer than 1 MiB, and output their names.

C6.25 [2] How could you use find to delete a file with an unusual name (e.g., containing invisible control characters or umlauts that older shells cannot deal with)?

C6.26 [3] (Second time through the book.) How would you ensure that files in /tmp which belong to you are deleted once you log out?

6.4.5 Finding Files Quickly—locate and slocate

The find command searches files according to many different criteria but needs to walk the complete directory tree below the starting directory. Depending on the tree size, this may take considerable time. For the typical application, searching files with particular names, there is an accelerated method.

The locate command lists all files whose names match a given shell wildcard pattern. In the most trivial case, this is a simple string of characters:

$ locate letter.txt
/home/joe/Letters/letter.txt
/home/joe/Letters/grannyletter.txt
/home/joe/Letters/grannyletter.txt~

Although locate is a fairly important service (as emphasised by the fact that it is part of the LPIC-1 curriculum), not all Linux distributions include it as part of the default installation. For example, if you are using a SUSE distribution, you must explicitly install the findutils-locate package before being able to use locate.

The “*”, “?”, and “[…]” characters mean the same thing to locate as they do to the shell. But while a query without wildcard characters locates all file names that contain the pattern anywhere, a query with wildcard characters returns only those names which the pattern describes completely, from beginning to end. Therefore pattern queries to locate usually start with “*”:

$ locate "*/letter.t*"
/home/joe/Letters/letter.txt
/home/joe/Letters/letter.tab


Be sure to put quotes around locate queries including shell wildcard characters, to keep the shell from trying to expand them.

The slash (“/”) is not handled specially:

$ locate Letters/granny
/home/joe/Letters/grannyletter.txt
/home/joe/Letters/grannyletter.txt~

locate is so fast because it does not walk the file system tree, but checks a “database” of file names that must have been previously created using the updatedb program. This means that locate does not catch files that have been added to the system since the last database update, and conversely may output the names of files that have been deleted in the meantime.

You can get locate to return existing files only by using the “-e” option, but this negates locate’s speed advantage.

The updatedb program constructs the database for locate. Since this may take considerable time, your system administrator usually sets this up to run when the system does not have a lot to do anyway, presumably late at night.

The cron service which is necessary for this will be explained in detail in Advanced Linux. For now, remember that most Linux distributions come with a mechanism which causes updatedb to be run every so often.

As the system administrator, you can tell updatedb which files to consider when setting up the database. How that happens in detail depends on your distribution: updatedb itself does not read a configuration file, but takes its settings from the command line and (partly) environment variables. Even so, most distributions call updatedb from a shell script which usually reads a file like /etc/updatedb.conf or /etc/sysconfig/locate, where appropriate environment variables can be set up.

You may find such a script, e.g., in /etc/cron.daily (details may vary according to your distribution).

You can, for instance, cause updatedb to search certain directories and omit others; the program also lets you specify “network file systems” that are used by several computers and that should have their own database in their root directories, such that only one computer needs to construct the database.

BAn important conguration setting is the identity of the user that runs

up-

datedb

. There are essentially two possibilities:

•

updatedb

runs as

root

and can thus enter every le in its database. This

also means that users can ferret out le names in directories that they

would not otherwise be able to look into, for example, other users’

home directories.

•

updatedb

runs with restricted privileges, such as those of user

nobody

. In

this case, only names within directories readable by

nobody

end up in

the database.

BThe

slocate

program—an alternative to the usual

locate

—circumvents this

problem by storing a le’s owner, group and permissions in the database in

addition to the le’s name. It outputs a le name only if the user who runs

slocate

can, in fact, access the le in question.

slocate

comes with an

updatedb

program, too, but this is merely another name for

slocate

itself.

BIn many cases,

slocate

is installed such that it can also be invoked using the

locate

command.


Exercises

C6.27 [!1] README is a very popular file name. Give the absolute path names of all files on your system called README.

C6.28 [2] Create a new file in your home directory and convince yourself by calling locate that this file is not listed (use an appropriately outlandish file name to make sure). Call updatedb (possibly with administrator privileges). Does locate find your file afterwards? Delete the file and repeat these steps.

C6.29 [1] Convince yourself that the slocate program works, by searching for files like /etc/shadow as normal user.

Commands in this Chapter

cd         Changes a shell’s current working directory                bash(1)
convmv     Converts file names between character encodings            convmv(1)
cp         Copies files                                               cp(1)
find       Searches files matching certain given criteria             find(1), Info: find
less       Displays texts (such as manual pages) by page              less(1)
ln         Creates (“hard” or symbolic) links                         ln(1)
locate     Finds files by name in a file name database                locate(1)
ls         Lists file information or directory contents               ls(1)
mkdir      Creates new directories                                    mkdir(1)
more       Displays text data by page                                 more(1)
mv         Moves files to different directories or renames them       mv(1)
pwd        Displays the name of the current working directory         pwd(1), bash(1)
rm         Removes files or directories                               rm(1)
rmdir      Removes (empty) directories                                rmdir(1)
slocate    Searches file by name in a file name database, taking file permissions into account   slocate(1)
updatedb   Creates the file name database for locate                  updatedb(1)
xargs      Constructs command lines from its standard input           xargs(1), Info: find

Summary

• Nearly all possible characters may occur in file names. For portability’s sake, however, you should restrict yourself to letters, digits, and some special characters.
• Linux distinguishes between uppercase and lowercase letters in file names.
• Absolute path names always start with a slash and mention all directories from the root of the directory tree to the directory or file in question. Relative path names start from the “current directory”.
• You can change the current directory of the shell using the cd command. You can display its name using pwd.
• ls displays information about files and directories.
• You can create or remove directories using mkdir and rmdir.
• The cp, mv and rm commands copy, move, and delete files and directories.
• The ln command allows you to create “hard” and “symbolic” links.
• more and less display files (and command output) by pages on the terminal.
• find searches for files or directories matching certain criteria.


Standard I/O and Filter Commands

Contents

7.1 I/O Redirection and Command Pipelines
    7.1.1 Standard Channels
    7.1.2 Redirecting Standard Channels
    7.1.3 Command Pipelines
7.2 Filter Commands
7.3 Reading and Writing Files
    7.3.1 Outputting and Concatenating Text Files—cat and tac
    7.3.2 Beginning and End—head and tail
    7.3.3 Just the Facts, Ma’am—od and hexdump
7.4 Text Processing
    7.4.1 Character by Character—tr, expand and unexpand
    7.4.2 Line by Line—fmt, pr and so on
7.5 Data Management
    7.5.1 Sorted Files—sort and uniq
    7.5.2 Columns and Fields—cut, paste etc.

Goals

• Mastering shell I/O redirection
• Knowing the most important filter commands

Prerequisites

• Shell operation (see Chapter 2)
• Use of a text editor (see Chapter 5)
• File and directory handling (see Chapter 6)


88 7 Standard I/O and Filter Commands

Figure 7.1: Standard channels on Linux (diagram labels: keyboard, stdin, process, stdout, screen, file)

7.1 I/O Redirection and Command Pipelines

7.1.1 Standard Channels

Many Linux commands, like grep and friends, are designed to read input data, manipulate it in some way, and output the result of these manipulations. For example, if you enter

$ grep xyz

you can type lines of text on the keyboard, and grep will only let those pass that contain the character sequence “xyz”:

$ grep xyz
abc def
xyz 123
aaa bbb
YYYxyzZZZ
Ctrl+d

(The key combination at the end lets grep know that the input is at an end.)

We say that grep reads data from “standard input” (in this case, the keyboard) and writes to “standard output” (in this case, the console screen or, more likely, a terminal program in a graphical desktop environment). The third of these “standard channels” is “standard error output”; while the “payload data” grep produces are written to standard output, standard error output takes any error messages (e.g., about a non-existent input file or a syntax error in the regular expression).

In this chapter you will learn how to redirect a program’s standard output to a file or take a program’s standard input from a file. Even more importantly, you will learn how to feed one program’s output directly (without the detour via a file) into another program as that program’s input. This opens the door to using the Linux commands, which taken on their own are all fairly simple, as building blocks to construct very complex applications. (Think of a Lego set.)

We will not be able to exhaust this topic in this chapter. Do look forward to the manual, Advanced Linux, where constructing shell scripts with the commands from the Unix “toolchest” plays a very important rôle! Here is where you learn the very important fundamentals of cleverly combining Linux commands even on the command line.

7.1 I/O Redirection and Command Pipelines 89

Table 7.1: Standard channels on Linux

Channel  Name                   Abbreviation  Device    Use
0        standard input         stdin         keyboard  Input for programs
1        standard output        stdout        screen    Output of programs
2        standard error output  stderr        screen    Programs’ error messages

The standard channels are summarised once more in Table 7.1. In the patois, they are normally referred to using their abbreviated names, stdin, stdout and stderr, for standard input, standard output, and standard error output. These channels are respectively assigned the numbers 0, 1, and 2, which we are going to use later on.
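As a preview of how the channel numbers are used, the following sketch sends standard output and standard error output of one ls call to two different files. The “1>” and “2>” operators are taken slightly ahead of their detailed treatment; the file names are invented, and “|| true” merely ignores ls’s failure exit status for the missing file.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
touch exists

# 1> (the same as >) redirects channel 1, 2> redirects channel 2
ls exists missing 1>out.txt 2>err.txt || true

cat out.txt     # the "payload": the name of the existing file
cat err.txt     # ls's complaint about "missing"
```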

The shell can redirect these standard channels for individual commands, without the programs in question noticing anything. These always use the standard channels, even though the output might no longer be written to the screen or terminal window but to some arbitrary other file. That file could be a different device, like a printer, but it is also possible to specify a text file which will receive the output. That file does not even have to exist but will be created if required.

The standard input channel can be redirected in the same way. A program no longer receives its input from the keyboard, but takes it from the specified file, which can refer to another device or a file in the proper sense.

The keyboard and screen of the “terminal” you are working on (no matter whether this is a Linux text console, a “genuine” terminal on a serial port, a terminal window in a graphical environment, or a network session using, say, the secure shell) can be accessed by means of the /dev/tty file: if you want to read data this means the keyboard, for output the screen (the other way round would be quite silly). The command

$ grep xyz /dev/tty

would be equivalent to our example earlier on in this section. You can find out more about such “special files” from Chapter 9.

7.1.2 Redirecting Standard Channels

You can redirect the standard output channel using the shell operator “>” (the “greater-than” sign). In the following example, the output of “ls -laF” is redirected to a file called filelist; the screen output consists merely of

$ ls -laF >filelist
$ _

If the filelist file does not exist it is created. Should a file by that name exist, however, its content will be overwritten. The shell arranges for this even before the program in question is invoked: the output file will thus be created even if the actual command invocation contained typos, or if the program did not actually write any output at all (in which case the filelist file will remain empty).

If you want to avoid overwriting existing files using shell output redirection, you can give the bash command “set -o noclobber”. In this case, if output is redirected to an existing file, an error occurs.

You can look at the filelist file in the usual way, e.g., using less:

$ less filelist
total 7
drwxr-xr-x 12 joe users  1024 Aug 26 18:55 ./
drwxr-xr-x  5 root root  1024 Aug 13 12:52 ../
drwxr-xr-x  3 joe users  1024 Aug 20 12:30 photos/
-rw-r--r--  1 joe users     0 Sep  6 13:50 filelist
-rw-r--r--  1 joe users 15811 Aug 13 12:33 pingu.gif
-rw-r--r--  1 joe users 14373 Aug 13 12:33 hobby.txt
-rw-r--r--  2 joe users  3316 Aug 20 15:14 chemistry.txt

If you look closely at the content of filelist, you can see a directory entry for filelist with size 0. This is due to the shell’s way of doing things: When parsing the command line, it notices the output redirection first and creates a new filelist file (or removes its content). After that, the shell executes the command, in this case ls, while connecting ls’s standard output to the filelist file instead of the terminal.

The file’s length in the ls output is 0 because the ls command looked at the file information for filelist before anything was written to that file, even though there are three other entries above that of filelist. This is because ls first reads all directory entries, then sorts them by file name, and only then starts writing to the file. Thus ls sees the newly created (or emptied) file filelist, with no content so far.

If you want to append a command’s output to an existing file without replacing its previous content, use the >> operator. If that file does not exist, it will be created in this case, too.

$ date >> filelist
$ less filelist
total 7
drwxr-xr-x 12 joe users  1024 Aug 26 18:55 ./
drwxr-xr-x  5 root root  1024 Aug 13 12:52 ../
drwxr-xr-x  3 joe users  1024 Aug 20 12:30 photos/
-rw-r--r--  1 joe users     0 Sep  6 13:50 filelist
-rw-r--r--  1 joe users 15811 Aug 13 12:33 pingu.gif
-rw-r--r--  1 joe users 14373 Aug 13 12:33 hobby.txt
-rw-r--r--  2 joe users  3316 Aug 20 15:14 chemistry.txt
Wed Oct 22 12:31:29 CEST 2003

In this example, the current date and time were appended to the filelist file.
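The difference between “>” and “>>” in a nutshell: this sketch writes an invented log file three times, counts its lines, and then truncates it again. It borrows “<” for wc, which is explained a little further on.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
echo 'line 1' >  log.txt    # > creates (or truncates) the file
echo 'line 2' >> log.txt    # >> appends
echo 'line 3' >> log.txt
count=$(wc -l < log.txt)
echo "log.txt has $((count)) lines"

echo 'fresh' > log.txt      # > again: the previous content is gone
cat log.txt
```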

Another way to redirect the standard output of a command is by using “backticks” (`…`). This is also called command substitution: The standard output of a command in backticks will be inserted into the command line instead of the command (and backticks); whatever results from the replacement will be executed. For example:

$ cat dates                      # Our little diary
22/12 Get presents
23/12 Get Christmas tree
24/12 Christmas Eve
$ date +%d/%m                    # What's the date?
23/12
$ grep ^`date +%d/%m` dates      # What's up?
23/12 Get Christmas tree

A possibly more convenient syntax for “`date`” is “$(date)”. This makes it easier to nest such calls. However, this syntax is only supported by modern shells such as bash.
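A short sketch, assuming bash or another POSIX shell: both forms yield the same result, but only $(…) nests without extra escaping.

```shell
old=`echo Linux`                          # backtick form
new=$(echo Linux)                         # $(...) form, same result
nested=$(echo "Hello $(echo World)")      # inner command runs first

echo "$old $new"    # Linux Linux
echo "$nested"      # Hello World
```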

You can use <, the “less-than” sign, to redirect the standard input channel. This will read the content of the specified file instead of keyboard input:

7.1 I/O Redirection and Command Pipelines 91

$ wc -w <frog.txt
1397

In this example, the wc filter command counts the words in file frog.txt.

There is no << redirection operator to concatenate multiple input files; to pass the content of several files as a command's input you need to use cat:

$ cat file1 file2 file3 | wc -w

(We shall find out more about the “|” operator in the next section.) Most programs, however, do accept one or more file names as command line arguments.

You can, however, use the << operator to take input data for a command from the lines following the command invocation in the shell. This is less interesting for interactive use than it is for shell scripts, but must be mentioned here for completeness. The feature is called a “here document”. For example, in

$ grep Linux <<END
Roses are red,
Violets are blue,
Linux is lovely,
I know this is true.
END

the input to grep consists of the lines following the grep call up to the line containing only “END”. The output of the command is

Linux is lovely,

If you specify the “end string” of a here document without quotes, shell variables will be evaluated and command substitution (using `…`) will be performed on the lines of the here document. However, if the end string is quoted (single or double quotes), the here document will be processed verbatim. Compare the output of

$ cat <<EOF
Today's date: `date`
EOF

to that of

$ cat <<"EOF"
Today's date: `date`
EOF

Finally: If the here document is introduced by “<<-” instead of “<<”, all tab characters will be removed from the beginning of the here document's lines. This lets you indent here documents properly in shell scripts.
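The difference between an unquoted and a quoted end string can be captured in variables. This sketch assumes a POSIX shell; the variable name is made up for the example:

```shell
name=World

# Unquoted end string: $name is expanded
expanded=$(cat <<EOF
Hello $name
EOF
)

# Quoted end string: the text is passed on verbatim
verbatim=$(cat <<'EOF'
Hello $name
EOF
)

echo "$expanded"   # Hello World
echo "$verbatim"   # Hello $name
```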

Of course, standard input and standard output may be redirected at the same time. The output of the word-count example is written to a file called wordcount here:

$ wc -w <frog.txt >wordcount
$ cat wordcount
1397

92 7 Standard I/O and Filter Commands

Besides the standard input and standard output channels, there is also the standard error output channel. If errors occur during a program's operation, the corresponding messages will be written to that channel. That way you will see them even if standard output has been redirected to a file. If you want to redirect standard error output to a file as well, you must state the channel number for the redirection operator. This is optional for stdin (0<) and stdout (1>) but mandatory for stderr (2>).

You can use the >& operator to redirect a channel to a different one:

make >make.log 2>&1

redirects standard output and standard error output of the make command to make.log.

Watch out: Order is important here! The two commands

make >make.log 2>&1
make 2>&1 >make.log

lead to completely different results. In the second case, standard error output will be redirected to wherever standard output goes (/dev/tty, where standard error output would go anyway), and then standard output will be sent to make.log, which, however, does not change the target for standard error output.
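The following sketch makes the difference visible without make, assuming a POSIX shell. both is a hypothetical shell function that writes one line to each channel, and the file names are arbitrary:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical helper: one line to standard output, one to standard error
both() { echo to-stdout; echo to-stderr >&2; }

# Correct order: stdout goes to all.log, then stderr is sent where stdout now goes
both >all.log 2>&1

# Reversed order: stderr is duplicated onto the OLD stdout (here the $(...) capture),
# and only afterwards is stdout redirected to only-out.log
leaked=$(both 2>&1 >only-out.log)

cat all.log                 # to-stdout, then to-stderr
echo "captured: $leaked"    # captured: to-stderr
```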

Exercises

C7.1 [2] You can use the -U option to get ls to output a directory's entries without sorting them. Even so, after “ls -laU >filelist”, the entry for filelist in the output file gives length zero. What could be the reason?

C7.2 [!2] Compare the output of the commands “ls /tmp” and “ls /tmp >ls-tmp.txt” (where, in the second case, we consider the content of the ls-tmp.txt file to be the output). Do you notice something? If so, how could you explain the phenomenon?

C7.3 [!2] Why isn't it possible to replace a file by a new version in one step, for example using “grep xyz file >file”?

C7.4 [!1] And what is wrong with “cat foo >>foo”, assuming a non-empty file foo?

C7.5 [2] In the shell, how would you output an error message such that it goes to standard error output?

7.1.3 Command Pipelines

Output redirection is frequently used to store the result of a program in order to continue processing it with a different command. However, this type of intermediate storage is not only quite tedious, but you must also remember to get rid of the intermediate files once they are no longer required. Therefore, Linux offers a way of linking commands directly via pipes: A program's output automatically becomes another program's input.

This direct connection of several commands into a pipeline is done using the | operator. Instead of first redirecting the output of “ls -laF” to a file and then looking at that file using less, you can do the same thing in one step without an intermediate file:


[Figure 7.2: The tee command. Diagram: Command → tee → Command (stdout to stdin), with tee also writing a copy of the data stream to a File]

$ ls -laF | less
total 7
drwxr-xr-x 12 joe users 1024 Aug 26 18:55 ./
drwxr-xr-x 5 root root 1024 Aug 13 12:52 ../
drwxr-xr-x 3 joe users 1024 Aug 20 12:30 photos/
-rw-r--r-- 1 joe users 449 Sep 6 13:50 filelist
-rw-r--r-- 1 joe users 15811 Aug 13 12:33 pingu.gif
-rw-r--r-- 1 joe users 14373 Aug 13 12:33 hobby.txt
-rw-r--r-- 2 joe users 3316 Aug 20 15:14 chemistry.txt

These command pipelines can be almost any length. Besides, the final result can be redirected to a file:

$ cut -d: -f1 /etc/passwd | sort | pr -2 >userlst

This command pipeline takes all user names from the first colon-separated column of the /etc/passwd file, sorts them alphabetically and writes them to the userlst file in two columns. The commands used here will be described in the remainder of this chapter.
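Since /etc/passwd differs between systems, this sketch runs the same cut and sort steps on a few made-up lines in /etc/passwd format (the user names are invented):

```shell
# Made-up lines in /etc/passwd format: fields separated by colons
sample='root:x:0:0:root:/root:/bin/bash
bob:x:1001:100::/home/bob:/bin/bash
alice:x:1000:100::/home/alice:/bin/bash'

# Field 1 is the user name; sort puts the names in order
names=$(printf '%s\n' "$sample" | cut -d: -f1 | sort)
printf '%s\n' "$names"   # alice, bob, root (one per line)
```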

Sometimes it is helpful to store the data stream inside a command pipeline at a certain point, for example because the intermediate result at that stage is useful for different tasks. The tee command copies the data stream and sends one copy to standard output and another copy to a file. The command name should be obvious if you know anything about plumbing (see Figure 7.2).

The tee command with no options creates the specified file or overwrites it if it exists; with -a (“append”), the output can be appended to an existing file.

$ ls -laF | tee list | less
total 7
drwxr-xr-x 12 joe users 1024 Aug 26 18:55 ./
drwxr-xr-x 5 root root 1024 Aug 13 12:52 ../
drwxr-xr-x 3 joe users 1024 Aug 20 12:30 photos/
-rw-r--r-- 1 joe users 449 Sep 6 13:50 content
-rw-r--r-- 1 joe users 15811 Aug 13 12:33 pingu.gif
-rw-r--r-- 1 joe users 14373 Aug 13 12:33 hobby.txt
-rw-r--r-- 2 joe users 3316 Aug 20 15:14 chemistry.txt

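In this example the content of the current directory is written both to the list file and the screen. (The list file does not show up in the ls output because tee only created it after ls had read the directory. Since both programs run at the same time, this is a matter of timing and not guaranteed.)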
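A small sketch of both tee modes, assuming a POSIX shell and GNU coreutils; copy.txt is a scratch file name chosen for the example:

```shell
cd "$(mktemp -d)" || exit 1

# tee writes a copy to copy.txt and passes the data on to wc
passed_on=$(printf 'one\ntwo\n' | tee copy.txt | wc -l)

# With -a, tee appends to copy.txt instead of overwriting it
printf 'three\n' | tee -a copy.txt >/dev/null
in_file=$(wc -l <copy.txt)

echo "$passed_on $in_file"   # 2 lines passed on, 3 lines in copy.txt
```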

Exercises

C7.6 [!2] How would you write the same intermediate result to several files at the same time?


Table 7.2: Options for cat (selection)

Option  Result
-b      (number non-blank lines) Numbers all non-blank lines in the output, starting at 1.
-E      (end-of-line) Displays a “$” at the end of each line (useful to detect otherwise invisible space characters).
-n      (number) Numbers all lines in the output, starting at 1.
-s      (squeeze) Replaces sequences of empty lines by a single empty line.
-T      (tabs) Displays tab characters as “^I”.
-v      (visible) Makes control characters c visible as “^c”, characters α with character codes greater than 127 as “M-α”.
-A      (show all) Same as -vET.

7.2 Filter Commands

One of the basic ideas of Unix (and, consequently, Linux) is the “toolkit principle”. The system comes with a great number of system programs, each of which performs a (conceptually) simple task. These programs can be used as “building blocks” to construct other programs, to save the authors of those programs from having to develop the requisite functions themselves. For example, not every program contains its own sorting routines, but many programs avail themselves of the sort command provided by Linux. This modular structure has several advantages:

• It makes life easier for programmers, who do not need to develop (or incorporate) new sorting routines all the time.
• If sort receives a bug fix or performance improvement, all programs using sort benefit from it, too, and in most cases do not even need to be changed.

Tools that take their input from standard input and write their output to standard output are called “filter commands” or “filters” for short. Without input redirection, a filter will read its input from the keyboard. To finish off keyboard input for such a program, you must enter the key sequence Ctrl+d, which is interpreted as “end of file” by the terminal driver.

Note that this applies to keyboard input only. Files on the disk may of course contain the Ctrl+d character (ASCII 4) without the system believing that the file ended at that point. This is in contrast to a certain very popular operating system, which traditionally has a somewhat quaint notion of the meaning of the Control-Z (ASCII 26) character even in text files …

Many “normal” commands, such as the aforementioned grep, operate like filters if you do not specify input file names for them to work on.

In the remainder of the chapter you will become familiar with a selection of the most important such commands. Some commands have crept in that are not technically genuine filter commands, but all of them form important building blocks for pipelines.

7.3 Reading and Writing Files

7.3.1 Outputting and Concatenating Text Files—cat and tac

The cat (“concatenate”) command is really intended to join several files named on the command line into one. If you pass just a single file name, the content of that file will be written to standard output. If you do not pass a file name at all, cat reads its standard input. This may seem useless, but cat offers options to number lines, make line ends and special characters visible or compress runs of blank lines into one (Table 7.2).

Table 7.3: Options for tac (selection)

Option  Result
-b      (before) The separator is considered to occur (and be output) in front of a part, not behind it.
-r      (regular expression) The separator is interpreted as a regular expression.
-s s    (separator) Defines a different separator s (in place of the newline character). The separator may be several characters long.
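As an example of the cat options, GNU cat's -A makes a tab and a trailing space visible. This is a sketch assuming GNU coreutils; other cat implementations may not support -A:

```shell
# A tab and a trailing space look like ordinary blanks on screen
shown=$(printf 'a\tb \n' | cat -A)
echo "$shown"    # a^Ib $  (^I is the tab, $ marks the end of the line)
```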

It goes without saying that only text files lead to sensible screen output with cat. If you apply the command to other types of files (such as the binary file /bin/cat), it is more than probable, on a text terminal at least, that the shell prompt will consist of unreadable characters once the output is done. In this case you can restore the normal character set by (blindly) typing reset. If you redirect cat output to a file this is of course not a problem.

The “Useless Use of cat Award” goes to people using cat where it is extraneous. In most cases, commands do accept file names and don't just read their standard input, so cat is not required to pass a single file to them on standard input. A command like “cat data.txt | grep foo” is unnecessary if you can just as well write “grep foo data.txt”. Even if grep could only read its standard input, “grep foo <data.txt” would be shorter and would not involve an additional cat process. However, the whole issue is a bit more subtle; see Exercise 7.21.

The tac command's name is “cat backwards”, and it works like that, too: It reads a number of named files or its standard input and outputs the lines it has read in reverse order:

$ tac <<END
Alpha
Beta
Gamma
END
Gamma
Beta
Alpha

However, this is where the similarity ends already: tac does not support the same options as cat but features its own (Table 7.3). For example, you can use the -s option to set up an alternative separator which the program will use when reversing the input. Normally the separator is a newline character, so the input is reversed line by line. Consider, for example

$ echo A:B:C:D | tac -s :
D
C:B:A:$ _

(where the new shell prompt is appended directly to the last output line). This output, which at first glance looks totally weird, can be explained as follows: The input consists of the four parts “A:”, “B:”, “C:”, and “D\n” (the separator, here “:”, is considered to belong to the immediately preceding part, and the final newline character is contributed by echo). These parts are output in reverse order, i.e., “D\n” comes first and then the other three, with no other intervening separators (since every part contains a perfectly workable separator already); the next shell prompt is appended immediately (without a new line) to the output. The -b option considers the separator to belong to the following part rather than the preceding one; with “tac -s : -b”, our example would produce the following output:

:D
:C:BA$ _

(think it through!).
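The separator can be any string. Assuming GNU tac, this sketch reverses a comma-separated list, once with the default attachment and once with -b (the sample lists are arbitrary):

```shell
# Default: each "," belongs to the part before it
plain=$(printf 'x,y,z,' | tac -s ,)
echo "$plain"     # z,y,x,

# With -b: each "," belongs to the part after it
before=$(printf ',x,y,z' | tac -b -s ,)
echo "$before"    # ,z,y,x
```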

Exercises

C7.7 [2] How can you check whether a directory contains files with “weird” names (e.g., ones with spaces at the end or invisible control characters in the middle)?

7.3.2 Beginning and End—head and tail

Sometimes you are only interested in part of a file: The first few lines to check whether it is the right file, or, in particular with log files, the last few entries. The head and tail commands deliver exactly that: by default, the first ten and the last ten lines of every file passed as an argument, respectively (or else as usual the first or last ten lines of their standard input). The -n option lets you specify a different number of lines: “head -n 20” returns the first 20 lines of its standard input, “tail -n 5 data.txt” the last 5 lines of file data.txt.

Tradition dictates that you can specify the number n of desired lines directly as an option “-n” (as in “head -20”). Officially this is no longer allowed, but the Linux versions of head and tail still support it.

You can use the -c option to specify that the count should be in bytes, not lines: “head -c 20” displays the first 20 bytes of standard input, no matter how many lines they occupy. If you append a “b”, “k”, or “m” (for “blocks”, “kibibytes”, and “mebibytes”, respectively) to the count, the count will be multiplied by 512, 1024, or 1048576, respectively. head also lets you use a minus sign: “head -c -20” displays all of its standard input but the last 20 bytes.
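A quick check of both byte counts, as a sketch assuming GNU head (the negative count is a GNU extension):

```shell
first2=$(printf 'abcdef' | head -c 2)       # first 2 bytes
all_but2=$(printf 'abcdef' | head -c -2)    # everything except the last 2 bytes
echo "$first2 $all_but2"    # ab abcd
```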

By way of revenge, tail can do something that head does not support: If the number of lines starts with “+”, it displays everything starting with the given line:

$ tail -n +3 file                # Everything from line 3
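Using printf to generate five numbered lines (an arbitrary test input), this sketch compares “+3” with a plain count, assuming a POSIX tail:

```shell
from3=$(printf '%s\n' 1 2 3 4 5 | tail -n +3)   # from line 3 to the end
last2=$(printf '%s\n' 1 2 3 4 5 | tail -n 2)    # just the last 2 lines
echo $from3     # 3 4 5
echo $last2     # 4 5
```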

The tail command also supports the important -f option. This makes tail wait after outputting the current end of file, to also output data that is appended later on. This is very useful if you want to keep an eye on some log files. If you pass several file names to tail -f, it puts a header line in front of each block of output lines telling what file the new data was written to.

Exercises

C7.8 [!2] How would you output just the 13th line of the standard input?

C7.9 [3] Check out “tail -f”: Create a file and invoke “tail -f” on it. Then, from another window or virtual console, append something to the file using, e.g., “echo >>…”, and observe the output of tail. What does it look like when tail is watching several files simultaneously?


Table 7.4: Options for od (excerpt)

Option  Result
-A r    Base of the offset at the beginning of the line. Valid values are: d (decimal), o (octal), x (hexadecimal), n (no offset at all).
-j o    Skip o bytes at the beginning of the input, then start writing output.
-N n    Output at most n bytes.
-t t    Use type specification t. Several -t options may occur, and one line will be output for each of them in the requisite format.
        Possible values for t: a (named character), c (ASCII character), d (signed decimal number), f (floating-point number), o (octal number), u (unsigned decimal number), x (hexadecimal number).
        You can append a digit to all options except a and c. This specifies how many bytes of the input should be interpreted as a unit. Details for this and for letter-based width specifiers can be found in od(1).
        If you append a z to an option, the printable characters of that line will be displayed to the right.
-v      Outputs all duplicate lines as well.
-w w    Writes w bytes per line; default value is 16.

C7.10 [3] What happens to “tail -f” if the file being observed shrinks?

C7.11 [3] Explain the output of the following commands:

$ echo Hello >/tmp/hello
$ echo "Hiya World" >/tmp/hello

when you have started the command

$ tail -f /tmp/hello

in a different window after the first echo above.

7.3.3 Just the Facts, Ma'am—od and hexdump

cat, tac, head, and tail work best with text files: Arbitrary binary files can in principle be processed, but the last three programs in particular prefer dealing with files that consist of recognizable lines. Even so, it is often useful to be able to check exactly what is in a file. A suitable tool is the od (“octal dump”) command, which can display arbitrary data in different formats. Binary data can be displayed byte by byte or word by word in octal, hexadecimal, decimal or ASCII coding. The standard display style of od is as follows:

$ od /etc/passwd | head -3
0000000 067562 072157 074072 030072 030072 071072 067557 035164
0000020 071057 067557 035164 061057 067151 061057 071541 005150
0000040 060563 064163 067562 072157 074072 030072 030072 071072

At the very left there is the (octal) offset in the file where the output line starts. The eight following numbers each correspond to two bytes from the file, printed in octal. This is only useful in very specific circumstances.

Fortunately od supports options that let you change the output format in very many ways (Table 7.4). Most important is the -t option, which describes the format of the data lines. For byte-by-byte hexadecimal output, you could use, for example,


$ od -txC /etc/passwd
0000000 72 6f 6f 74 3a 78 3a 30 3a 30 3a 72 6f 6f 74 3a
0000020 2f 72 6f 6f 74 3a 2f 62 69 6e 2f 62 61 73 68 0a
0000040 73 61 73 68 72 6f 6f 74 3a 78 3a 30 3a 30 3a 72
…

(the offset remains octal). Here, x specifies “hexadecimal”, and C specifies “bytewise”. If you want to see the characters themselves in addition to the hexadecimal numbers, you can append a z:

$ od -txCz /etc/passwd
0000000 72 6f 6f 74 3a 78 3a 30 3a 30 3a 72 6f 6f 74 3a >root:x:0:0:root:<
0000020 2f 72 6f 6f 74 3a 2f 62 69 6e 2f 62 61 73 68 0a >/root:/bin/bash.<
0000040 73 61 73 68 72 6f 6f 74 3a 78 3a 30 3a 30 3a 72 >sashroot:x:0:0:r<
…

Non-printable characters (here the 0a, a newline character, at the end of the second line) are replaced by “.”.
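For scripts, the offset column is often just in the way. This sketch, assuming GNU od, combines -A n from Table 7.4 with -t xC:

```shell
# -A n: no offset column; -t xC: one hexadecimal number per byte
raw=$(printf 'Hi\n' | od -An -txC)
# od pads the line with leading blanks; let the shell squeeze them
bytes=$(echo $raw)
echo "$bytes"    # 48 69 0a
```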

You can also concatenate several type specifiers or put them into separate -t options. This gives you one line per type specifier:

$ od -txCc /etc/passwd
0000000 72 6f 6f 74 3a 78 3a 30 3a 30 3a 72 6f 6f 74 3a
         r  o  o  t  :  x  :  0  :  0  :  r  o  o  t  :
0000020 2f 72 6f 6f 74 3a 2f 62 69 6e 2f 62 61 73 68 0a
         /  r  o  o  t  :  /  b  i  n  /  b  a  s  h \n
0000040 73 61 73 68 72 6f 6f 74 3a 78 3a 30 3a 30 3a 72
         s  a  s  h  r  o  o  t  :  x  :  0  :  0  :  r
…

(which is identical to “od -txC -tc /etc/passwd”).

A sequence of lines that would be equal to the last previously-output line is replaced by an asterisk (“*”) at the left margin:

$od -tx -N 64 /dev/zero

0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0000100

(/dev/zero produces an unlimited supply of null bytes, and the -N option to od limits the output to 64 of them.) The -v option suppresses the abbreviation:

$ od -txC -N 64 -v /dev/zero
0000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0000100

The hexdump (or hd) program does a very similar job. It supports output formats that are very like those of od, even though the corresponding options are completely different. For example, the command

$ hexdump -o -s 446 -n 64 /etc/passwd

is mostly equivalent to the first od example above. Most od options have fairly similar counterparts in hexdump.


A major difference between od and hexdump is hexdump's support for output formats. These let you specify in much more detail than is possible with od what the output should look like. Consider the following example:

$ cat hexdump.txt
0123456789ABC…XYZabc…xyz
$ hexdump -e '"%x-"' hexdump.txt
33323130-37363534-…-a7a79-$

The following points are notable:

• The “"%x"” output format writes 4 bytes' worth of the input in a hexadecimal representation: “30” is the hexadecimal equivalent of 48, the numerical value of the “0” character according to the ASCII. The “-” suffix is output as-is.
• The 4 bytes are output in reverse order. This is an artefact of the Intel processor architecture (its little-endian byte order).
• The double quotes are part of the syntax of hexdump and need to be protected using single quotes (or equivalent) lest the shell remove them.
• The $ at the end is the next command prompt; hexdump does not output newline characters of its own.

Conveniently for programmers, the possible output formats derive from those used by the printf(3) function found in programming languages like C, Perl, awk and so on (even bash supports a printf command). Check the documentation for details!
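Because the formats come from printf, you can try a format with the shell's printf command before passing it to hexdump. This sketch assumes a printf that handles %x like printf(3), as the bash builtin and GNU coreutils do:

```shell
plain=$(printf '%x-' 48)      # 48 decimal is 30 hexadecimal
padded=$(printf '%10x-' 48)   # flush right in a ten-character field
echo "$plain"     # 30-
echo "[$padded]"  # [        30-]
```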

hexdump output formats are much more sophisticated than the simple example shown above. As usual with printf, you get to specify a “field width” for the output:

$ hexdump -e '"%10x-"' hexdump.txt
  33323130-  37363534-…     a7a79-$

(In this case every sequence of hexadecimal digits, eight characters in length, appears flush-right in a ten-character field.)

You can also specify how often a format will be “executed”:

$ hexdump -e '4 "%x-" "\n"' hexdump.txt
33323130-37363534-42413938-46454443-
4a494847-4e4d4c4b-5251504f-56555453-
5a595857-64636261-68676665-6c6b6a69-
706f6e6d-74737271-78777675-a7a79-

The 4 preceding the “"%x-"” format says that this format is to be applied four times. After that, we continue with the next format, “"\n"”, which produces a newline character. After that, hexdump starts again from the front.

In addition, you can determine how many bytes a format should process (with the numerical formats you usually have a choice of 1, 2, and 4):

$ hexdump -e '/2 "%x-" "\n"' hexdump.txt
3130-
3332-
…
7a79-

(“/2” is an abbreviation for “1/2”, which is why every format appears just once per line.) Repeat count and byte count may be combined:


Table 7.5: Options for tr

Option  Result
-c      (complement) Replaces all characters not in ⟨s1⟩ by characters from ⟨s2⟩
-d      (delete) Removes all characters in ⟨s1⟩ without substitution
-s      (squeeze) Runs of identical characters from ⟨s2⟩ are replaced by a single character

$ hexdump -e '4/2 "%x-" "\n"' hexdump.txt
3130-3332-3534-3736-
3938-4241-4443-4645-
…
7675-7877-7a79-a-

And you may also mix different output formats:

$ hexdump -e '"%2_ad" "%2.2s" 3/2 " %x" " %1.1s" "\n"' hexdump.txt
 0 01 3332 3534 3736 8
 9 9A 4342 4544 4746 H
…

In this case we output the first two characters from the file as characters rather than numerical codes (“"%2.2s"”), then three times the codes of two characters in hexadecimal form (“3/2 " %x"”), followed by another character as a character (“"%1.1s"”) and a newline character. Then we start again from the front. The “"%2_ad"” at the beginning of the line outputs the current offset in the file (counted in bytes from the start of the file) in decimal form in a 2-character field.

Exercises

C7.12 [2] What is the difference between the “a” and “c” type specifiers of od?

C7.13 [3] The /dev/random “device” returns random bytes (see Section 9.3). Use od with /dev/random to assign a decimal random number from 0 to 65535 to a shell variable.

7.4 Text Processing

7.4.1 Character by Character—tr, expand and unexpand

The tr command is used to replace single characters by different ones inside a text, or to delete them outright. tr is strictly a filter command: it does not take filename arguments but works with the standard channels only.

For substitutions, the command syntax is “tr ⟨s1⟩ ⟨s2⟩”. The two parameters are character strings describing the substitution: In the simplest case the first character in ⟨s1⟩ will be substituted by the first character in ⟨s2⟩, the second character in ⟨s1⟩ by the second in ⟨s2⟩, and so on. If ⟨s1⟩ is longer than ⟨s2⟩, the “excess” characters in ⟨s1⟩ are replaced by the final character in ⟨s2⟩; if ⟨s2⟩ is longer than ⟨s1⟩, the extra characters in ⟨s2⟩ are ignored.

A little example by way of illustration:

$ tr AEiu aey <example.txt >new1.txt

7.4 Text Processing 101

Table 7.6: Characters and character classes for tr

Class       Meaning
\a          Control-G (ASCII 7), audible alert
\b          Control-H (ASCII 8), backspace
\f          Control-L (ASCII 12), form feed
\n          Control-J (ASCII 10), line feed
\r          Control-M (ASCII 13), carriage return
\t          Control-I (ASCII 9), tabulator character
\v          Control-K (ASCII 11), vertical tabulator
\kkk        the character with octal code kkk
\\          a backslash
[c*n]       in ⟨s2⟩: n times character c
[c*]        in ⟨s2⟩: character c as often as needed to make ⟨s2⟩ as long as ⟨s1⟩
[:alnum:]   all letters and digits
[:alpha:]   all letters
[:blank:]   all horizontal whitespace characters
[:cntrl:]   all control characters
[:digit:]   all digits
[:graph:]   all printable characters (excluding space)
[:lower:]   all lowercase letters
[:print:]   all printable characters (including space)
[:punct:]   all punctuation characters
[:space:]   all horizontal or vertical whitespace characters
[:upper:]   all capital letters
[:xdigit:]  all hexadecimal digits (0–9, A–F, a–f)
[=c=]       all characters equivalent to c (at this point only c itself)

This command reads file example.txt and replaces all “A” characters by “a”, all “E” characters by “e”, and all “i” and “u” characters by “y”. The result is stored in file new1.txt.

It is permissible to express sequences of characters by ranges of the form “m-n”, where m must precede n in the character collating order. With the -c option, tr does not replace the content of ⟨s1⟩ but its “complement”, all characters not contained in ⟨s1⟩. The command

$ tr -c A-Za-z ' ' <example.txt >new1.txt

replaces all non-letters in example.txt by spaces.
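Note that the newline character is not a letter, so the command above turns line ends into spaces, too. A variant that keeps them, as a sketch assuming GNU tr (the sample text is arbitrary):

```shell
# -c complements s1: everything except letters and the newline becomes a space
out=$(echo 'Hi, there!' | tr -c 'A-Za-z\n' ' ')
echo "[$out]"    # [Hi  there ]
```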

It is also possible to use character classes of the form [:k:] (the valid class names are shown in Table 7.6); in many cases this makes sense in order to construct commands that work in different language environments. In a German-language environment, the character class “[:alpha:]”, for example, contains the umlauts; a “home-cooked” class like “A-Za-z”, which works for English, doesn't. There are some other restrictions on character classes which you can look up in the tr documentation (see “info tr”).
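A short sketch of two classes in action, assuming GNU tr (the sample text is arbitrary):

```shell
text='Hello, World 42'
upper=$(echo "$text" | tr '[:lower:]' '[:upper:]')   # HELLO, WORLD 42
digits=$(echo "$text" | tr -cd '[:digit:]')          # 42 (-c -d deletes all non-digits)
echo "$upper"
echo "$digits"
```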

To delete characters, you need only specify ⟨s1⟩: The

$ tr -d a-z <example.txt >new2.txt

command removes all lowercase letters from example.txt. Furthermore, you can replace runs of equivalent input characters by a single output character: The

$ tr -s '\n' <example.txt >new3.txt

command removes empty lines by replacing sequences of newline characters by a single one.

The -s option (“squeeze”) also makes it possible to substitute two different input characters by two identical ones, and to replace runs of them by a single one (as with -s with a single argument). The following turns all “A” and “E” characters (and sequences of those) into a single “X” in new3.txt:

$ tr -s AE X <example.txt >new3.txt
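The three variants side by side, as a sketch assuming GNU tr; LC_ALL=C keeps the a-z range to plain ASCII lowercase letters, and the sample strings are arbitrary:

```shell
no_lower=$(echo 'Tr Example' | LC_ALL=C tr -d a-z)   # T E
squeezed=$(printf 'a\n\n\nb\n' | tr -s '\n')         # empty lines gone: a, b
xs=$(echo 'AAEbE' | tr -s AE X)                      # XbX
echo "$no_lower"; echo "$squeezed"; echo "$xs"
```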

The “tabulator”, a good old typewriter feature, is a convenient way of producing indentation when programming (or entering text in general). By convention, “tabulator stops” are set in certain columns (usually every 8 columns, i.e., at positions 8, 16, 24, …), and common editors move to the next tabulator stop to the right when the Tab key is pressed: if you press Tab when the cursor is at column 11 on the screen, the cursor goes to column 16. In spite of this, the resulting “tabulator characters” (or “tabs”) will be written to the file verbatim, and many programs cannot interpret them correctly. The expand command helps here: It reads the files named as its parameters (or else, you knew it, its standard input) and writes them to its standard output with all tabs replaced by the appropriate number of spaces to keep the tabulator stops every 8 columns. With the -t option you can define a different “scale factor” for the tabulator stops; a common value is, e.g., “-t 4”, which sets up tabulator stops at columns 4, 8, 12, 16, etc.

If you give several comma-separated numbers with -t, tabulator stops will be set at the named columns exactly: “expand -t 4,12,32” sets tabulator stops at columns 4, 12 and 32. Additional tabs in an input line will be replaced by spaces.

The -i (“initial”) option causes only tabs at the beginning of the line to be expanded to spaces.
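A sketch showing how the number of spaces depends on the current column, assuming GNU expand (the one-line input is arbitrary):

```shell
# 'a' occupies column 0, so the tab only needs to fill up to the next stop
four=$(printf 'a\tb\n' | expand -t 4)   # stop at column 4: 3 spaces
eight=$(printf 'a\tb\n' | expand)       # default stop at column 8: 7 spaces
echo "[$four]"    # [a   b]
echo "[$eight]"   # [a       b]
```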

The unexpand command more or less reverses the effect of expand: All runs of tabs and spaces at the beginning of the input lines (as usual taken from named files or standard input) are replaced by the shortest sequence of tabs and spaces resulting in the same indentation. A line starting with a tab, two spaces, another tab and nine spaces will, for example (assuming standard tabulator stops every eight columns), be replaced by a line starting with three tabs and a space. The -a (“all”) option causes all sequences of two or more tabs and spaces to be “optimized”, not just those at the beginning of a line.
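To check the example from the previous paragraph, this sketch (assuming GNU unexpand and GNU cat) feeds it exactly that line; the trailing y is just a visible marker:

```shell
# tab, two spaces, tab, nine spaces, then a marker character
packed=$(printf '\t  \t         y\n' | unexpand)
# Make tabs visible: expect three tabs and a single space before y
printf '%s\n' "$packed" | cat -A
```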

Exercises

C7.14 [!2] The famous Roman general Julius Caesar supposedly used the following cipher to transmit secret messages: The letter “A” was replaced by “D”, “B” by “E” and so on; “X” was replaced by “A”, “Y” by “B” and “Z” by “C” (if we start from today's 26-letter alphabet, disregarding the fact that the ancient Romans did not use J, U, or W). Imagine you are a programmer in Caesar's legion. Which tr commands would you use to encrypt the general's messages and to decrypt them again?

C7.15 [3] What tr command would you use to replace all vowels in a text by a single one? Consider the (German) children's game:

DREI CHINESEN MIT DEM KONTRABASS
DRAA CHANASAN MAT DAM KANTRABASS

C7.16 [3] How would you transform a text file such that all punctuation is removed and every word appears on a line on its own?

7.4 Text Processing 103

7.17 [2] Give a tr command to remove the characters “”, “”, and “” from the standard input.

7.18 [1] How would you convince yourself that unexpand really replaces spaces and tabulator characters by an “optimal” sequence?

7.4.2 Line by Line—fmt, pr and so on

While the commands from the previous section considered their input characters singly or in small groups, Linux also contains many commands that deal with whole input lines. Some of them are introduced in this and the subsequent sections.

The fmt program wraps the input lines (as usual taken from the files mentioned on the command line, or the standard input) so that they do not exceed a given maximum line length: 75 characters, unless otherwise specified using the -w option. It is quite concerned with producing pleasant-looking output.
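The following sketch shows fmt's line-length limit at work. It wraps a two-line sample to 30 characters and then uses awk to list any output line that is longer. The sample is written to a hypothetical file, wrap-test.txt. Assuming GNU fmt, the awk check should print nothing:

```shell
#!/bin/sh
# Hypothetical sample file: two long input lines forming one paragraph.
printf '%s\n' \
  'In olden times when wishing still helped one, there lived a king' \
  'whose daughters were all beautiful, but the youngest was so beautiful' \
  > wrap-test.txt

# Rewrap to at most 30 characters per line.
fmt -w 30 wrap-test.txt

# Print every output line longer than 30 characters (expected: none).
fmt -w 30 wrap-test.txt | awk 'length > 30'
```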

Let us consider some examples of fmt (the frog0.txt file is equivalent to frog.txt except that the first line of each paragraph is indented by two spaces):

$ head frog0.txt

The Frog King, or Iron Henry

In olden times when wishing still helped one, there lived a king

whose daughters were all beautiful, but the youngest was so beautiful

that the sun itself, which has seen so much, was astonished whenever

it shone in her face.

Close by the king's castle lay a great dark forest, and under an old

lime-tree in the forest was a well, and when the day was very warm,

the king's child went out into the forest and sat down by the side of

In the rst example we reduce the line length to 40 characters:

$fmt -w 40 frog0.txt

The Frog King, or Iron Henry

In olden times when wishing still

helped one, there lived a king

whose daughters were all beautiful,

but the youngest was so beautiful that

the sun itself, which has seen so much,

was astonished whenever it shone in



Note that the second line of the rst paragraph is indented by spaces for no ap-

parent reason. This is due to the fact that

fmt

usually only considers those ranges

of lines for wrapping that are indented the same. The indented rst line of the

rst paragraph of the example le is therefore considered its own paragraph, and

all resulting lines are indented like the input paragraph’s rst (and only) line.

The second and subsequent input lines are considered an independent, additional

“paragraph”, and wrapped accordingly (using the indentation of the second line).

fmt tries to keep empty lines, the word spacing, and the indentation from the input. It prefers line breaks at the end of a sentence and tries to avoid them after the first and before the last word of a sentence. The “end of a sentence” according to fmt is either the end of a paragraph or a word ending with “.”, “?”, or “!”, followed by two (!) spaces or by the end of the line.

There is more information about the way fmt works in the info page of the program. (Hint for finding it: fmt is part of the GNU “coreutils” collection.)


Table 7.7: Options for pr (selection)

Option   Result
-𝑛       Creates 𝑛-column output (any positive integer value is permissible, but only 2 to 5 usually make sense)
-h 𝑡     (header) Outputs 𝑡 instead of the source file name at the top of each page
-l 𝑛     (length) Sets the number of lines per page, default value is 66
-n       (number) Labels each line with a five-digit line number separated from the rest of the line by a tab character
-o 𝑛     (offset) Indents the text 𝑛 characters from the left margin
-t       (omit) Suppresses the header and footer lines (5 each)
-w 𝑛     (width) Sets the number of characters per line, default value is 72

In the next example we use the -c (“crown-margin mode”) option to avoid the phenomenon we just explained:

$ fmt -c -w 40 frog0.txt

The Frog King, or Iron Henry

In olden times when wishing still

helped one, there lived a king whose

daughters were all beautiful, but the

youngest was so beautiful that the

sun itself, which has seen so much,

was astonished whenever it shone in



Here the indentation of the (complete) paragraph is taken from the first two lines of the input; their indentation is kept, and the subsequent input lines follow the indentation of the second.
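You can reproduce the difference between the default behaviour and -c with a two-line sample whose first line is indented, like the paragraphs of frog0.txt. This sketch assumes GNU fmt. crown-test.txt is a hypothetical file name:

```shell
#!/bin/sh
# Hypothetical sample: first line indented by two spaces, second line not.
printf '%s\n' \
  '  In olden times when wishing still helped one, there lived a king' \
  'whose daughters were all beautiful, but the youngest was so beautiful' \
  > crown-test.txt

echo '--- default: the indented first line is wrapped as a paragraph of its own'
fmt -w 40 crown-test.txt
echo '--- with -c: one paragraph; later lines follow the second input line'
fmt -c -w 40 crown-test.txt
```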

Finally, an example featuring long lines:

$ fmt -w 100 frog0.txt

The Frog King, or Iron Henry

In olden times when wishing still helped one, there lived a king

whose daughters were all beautiful, but the youngest was so beautiful that the sun itself,

which has seen so much, was astonished whenever it shone in her face.

Close by the king's castle lay a great dark forest, and under an old

lime-tree in the forest was a well, and when the day was very warm, the king's child went out

into the forest and sat down by the side of the cool fountain, and when she was bored she took

a golden ball, and threw it up on high and caught it, and this ball was her favourite plaything.



We could have used -c here as well to avoid the “short” first lines of the paragraphs. Without this option, the first line is once more considered a paragraph of its own, and not amalgamated with the subsequent lines.

The name of the pr (“print”) command may be misleading at first. It does not, as might be surmised, output files to the printer; that is the domain of the lpr command. Instead, pr formats a text for printed output, including page breaks, indentation and header and footer lines. You can either specify input files on the command line or have pr process its standard input (Table 7.7).

Here is a somewhat more complex example to illustrate pr’s workings:

$ fmt -w 34 frog.txt | pr -h "Grimm Fairy-Tales" -2

Table 7.8: Options for nl (selection)

Option   Result
-b 𝑠     (body style) Numbers the body lines according to 𝑠. Possible values for 𝑠 are a (number all lines), t (number only non-blank lines), n (number no lines at all), and p⟨regex⟩ (number only the lines matching regular expression ⟨regex⟩). The default value is t.
-d 𝑝[𝑞]  (delimiter) Uses the two characters 𝑝𝑞 instead of “\:” in delimiter lines. If only 𝑝 is given, 𝑞 remains set to “:”.
-f 𝑠     (footer style) Formats the footer lines according to 𝑠. The possible values of 𝑠 correspond to those of -b. The default value is n.
-h 𝑠     (header style) Similar to -f, for header lines.
-i 𝑛     (increment) Increments the line number by 𝑛 for every line.
-n 𝑓     (number format) Determines the line number format. Possible values for 𝑓: ln (flush-left with no leading zeroes), rn (flush-right with no leading zeroes), rz (flush-right with leading zeroes).
-p       (page) Does not reset the line number to its original value between logical pages.
-v 𝑛     Starts numbering at line number 𝑛.
-w 𝑛     (width) Outputs an 𝑛-character line number (according to -n).

2004-09-13 08:42 Grimm Fairy-Tales Page 1

The Frog King, or Iron Henry >>Whatever you will have, dear

frog,« said she, >>My clothes, my

In olden times when wishing pearls and jewels, and even the

still helped one, there lived a golden crown which I am wearing.«

king whose daughters were all

beautiful, but the youngest The frog answered, >>I do not care

was so beautiful that the sun for your clothes, your pearls

itself, which has seen so much, and jewels, nor for your golden

was astonished whenever it shone crown, but if you will love me

in her face. and let me be your companion and



Here we use fmt to format the text of the Frog King in a long narrow column, and pr to display the text in two columns.
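This short sketch tries out some options from Table 7.7. It assumes GNU pr. seq generates ten sample lines, and -t keeps the header and footer out of the way:

```shell
#!/bin/sh
# Assumes GNU pr; -t suppresses the header and footer lines.

# Two columns: pr fills the columns top to bottom and balances them, so
# the ten input lines become five output lines ("1 6", "2 7", ...).
seq 1 10 | pr -2 -t

# -o 4 shifts every output line four characters to the right.
printf 'Frog King\n' | pr -t -o 4
```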

The nl command specialises in line numbering. If nothing else is specified, it numbers the non-blank lines of its input (which as usual will be taken from named files or else standard input) in sequence:

$ nl frog.txt

1 The Frog King, or Iron Henry

2 In olden times when wishing still helped one, there lived a king whose

3 daughters were all beautiful, but the youngest was so beautiful that

4 the sun itself, which has seen so much, was astonished whenever it

5 shone in her face.

6 Close by the king's castle lay a great dark forest, and under an old



This by itself is nothing you could not manage using “cat -b”. For one thing, though, nl allows for much closer control of the line numbering process:

$ nl -b a -n rz -w 5 -v 1000 -i 10 frog.txt

01000 The Frog King, or Iron Henry

01010

01020 In olden times when wishing still helped one, there lived a king whose

01030 daughters were all beautiful, but the youngest was so beautiful that

01040 the sun itself, which has seen so much, was astonished whenever it

01050 shone in her face.

01060

01070 Close by the king's castle lay a great dark forest, and under an old

01080 lime-tree in the forest was a well, and when the day was very warm,

01090 the king's child went out into the forest and sat down by the side of



Taken one by one, the options do the following (see also Table 7.8): “-b a” causes all lines to be numbered, not just the non-blank ones as in the previous example. “-n rz” formats line numbers flush-right with leading zeroes, “-w 5” caters for a five-column line number, “-v 1000” starts the numbering at 1000, and “-i 10” increments the line number by 10 per line (not, as usual, 1).
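You can try the numbering options on a tiny sample file. This sketch assumes GNU nl. The file name nl-sample.txt is made up:

```shell
#!/bin/sh
# Hypothetical three-line sample with an empty middle line.
printf 'one\n\ntwo\n' > nl-sample.txt

# Default (body style t): only non-empty lines are numbered, so "two" gets 2.
nl nl-sample.txt

# The options from the text: all lines (-b a), flush right with leading
# zeroes (-n rz), five digits (-w 5), start at 1000 (-v 1000), step 10 (-i 10).
# Expected numbers: 01000, 01010 (the empty line), 01020.
nl -b a -n rz -w 5 -v 1000 -i 10 nl-sample.txt
```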

In addition, nl can also handle per-page line numbers. This is organized using the “magical” strings “\:\:\:”, “\:\:” and “\:”, as shown in the following example:

$ cat nl-test
\:\:\:
Header of first page
\:\:
First line of first page
Second line of first page
Last line of first page
\:
Footer of first page
\:\:\:
Header of second page
(Two lines high)
\:\:
First line of second page
Second line of second page
Second-to-last line of second page
Last line of second page
\:
Footer of second page
(Two lines high)

Each (logical) page has a header and footer as well as a “body” containing the text proper. The header is introduced using “\:\:\:”, and separated from the body using “\:\:”. The body, in turn, ends at a “\:” line, which introduces the footer. Header and footer may also be omitted.

Table 7.9: Options for wc (selection)

Option   Result
-l       (lines) outputs line count
-w       (words) outputs word count
-c       (characters) outputs character count (strictly speaking, the byte count)

By default, nl numbers the lines on each page starting at 1; header and footer lines are not numbered:

$ nl nl-test

Header of first page

1 First line of first page
2 Second line of first page
3 Last line of first page

Footer of first page

Header of second page
(Two lines high)

1 First line of second page
2 Second line of second page
3 Second-to-last line of second page
4 Last line of second page

Footer of second page
(Two lines high)

The “\:…” separator lines are replaced by blank lines in the output.
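You can check the logical-page mechanism with a smaller file of the same shape. This sketch assumes GNU nl. pages.txt is a hypothetical name. printf '%s\n' is used so that the backslashes reach the file unchanged:

```shell
#!/bin/sh
# Two logical pages: header, body and footer on page 1; header and body on page 2.
printf '%s\n' \
  '\:\:\:' 'Header 1' \
  '\:\:'   'Body 1a' 'Body 1b' \
  '\:'     'Footer 1' \
  '\:\:\:' 'Header 2' \
  '\:\:'   'Body 2a' \
  > pages.txt

# Body lines are numbered and the count restarts at 1 on page 2; header and
# footer lines stay unnumbered; delimiter lines are printed as empty lines.
nl pages.txt
```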

The name of the wc command is an abbreviation of “word count”. In spite of this moniker, wc can determine not just a word count, but also a count of total characters and lines in the input (files, standard input). This is done using the options in Table 7.9. A “word”, from wc’s point of view, is a sequence of one or more characters delimited by white space. Without an option, all three values are output in the order given in Table 7.9:

$ wc frog.txt

144 1397 7210 frog.txt

With the options in Table 7.9, you can limit wc’s output to only some of the values:

$ ls | wc -l

The example shows how to use wc to determine the number of entries in the current directory by counting the lines in the output of the ls command.
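Here is a sketch of wc on a hypothetical two-line file. It should work with any POSIX wc. For this sample the counts are 2 lines, 3 words and 14 characters, including the two newline characters:

```shell
#!/bin/sh
# Hypothetical sample: "one two" and "three".
printf 'one two\nthree\n' > wc-sample.txt

wc wc-sample.txt        # lines, words, characters, file name
wc -l wc-sample.txt     # only the line count
wc -w < wc-sample.txt   # only the word count; no file name when reading stdin
```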

Exercises

7.19 [1] Number the lines of file frog.txt with an increment of 2 per line starting at 100.

7.20 [3] How can you number the lines of a file in reverse order, as in the following output? (Hint: Two reversals give the original.)

144 The Frog King, or Iron Henry
143
142 In olden times when wishing still helped one, there lived a king whose
141 daughters were all beautiful, but the youngest was so beautiful that

7.21 [!2] How does the output of the “wc a.txt b.txt c.txt” command differ from that of the “cat a.txt b.txt c.txt | wc” command?


7.5 Data Management

7.5.1 Sorted Files—sort and uniq

The sort command lets you sort the lines of text files according to predetermined criteria. The default setting is ascending order (from A to Z) according to the ASCII values¹ of the first few characters of each line. This is why special characters such as German umlauts are frequently sorted incorrectly. For example, the character code of “Ä” in ISO-8859-1 is 196, so that character ends up far beyond “Z” with its character code of 90. Even the lowercase letter “a” is considered “greater than” the uppercase letter “Z”.

Of course, sort can adjust itself to different languages and cultures. To sort according to German conventions, set one of the environment variables LANG, LC_ALL, or LC_COLLATE to a value such as “de_DE” or “de_DE.UTF-8” (the actual value depends on your distribution). If you want to set this up for a single sort invocation only, do

$ … | LC_COLLATE=de_DE.UTF-8 sort

The value of LC_ALL has precedence over the value of LC_COLLATE and that, again, has precedence over the value of LANG. As a side effect, German sort order causes the case of letters to be ignored when sorting, except to break ties between otherwise equal lines.

Unless you specify otherwise, the sort proceeds “lexicographically” considering all of the input line. That is, if the initial characters of two lines compare equal, the first differing character within the line governs their relative positioning. Of course sort can sort not just according to the whole line, but more specifically according to the values of certain “columns” or fields of a (conceptual) table. Fields are numbered starting at 1; with the “-k 2” option, the first field would be ignored and the second field of each line considered for sorting. If the values of two lines are equal in the second field, the rest of the line will be looked at, unless you specify the last field to be considered using something like “-k 2,3”. Incidentally, it is permissible to specify several -k options with the same sort command.
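You can see the effect of -k on a three-line sample. This sketch sets LC_ALL=C so that the result does not depend on the locales installed on the system. keys.txt is a made-up file name:

```shell
#!/bin/sh
printf '%s\n' 'b 2' 'a 3' 'c 1' > keys.txt

# Whole line, plain byte order: "a 3", "b 2", "c 1".
LC_ALL=C sort keys.txt
# Second field only: "c 1", "b 2", "a 3".
LC_ALL=C sort -k 2,2 keys.txt
```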

In addition, sort supports an obsolete form of position specification: Here fields are numbered starting at 0, the initial field is specified as “+𝑚” and the final field as “-𝑛”. To complete the differences to the modern form, the final field is specified “exclusively”: you give the first field that should not be taken into account for sorting. In this notation, “-k 2” would be written “+1”, “-k 2,3” would be “+1 -3”, and “-k 2,2” would be “+1 -2”.

The space character serves as the separator between fields. Note, though, that the blanks in front of a field count as part of that field: if several spaces occur in sequence, they all become part of the value of the following field. Here is a little example, namely the list of participants for the annual marathon run of the Lameborough Track & Field Club. To start, we ensure that we use the system’s standard language environment (“POSIX”) by resetting the corresponding environment variables. (Incidentally, the fourth column gives a runner’s bib number.)

$ unset LANG LC_ALL LC_COLLATE
$ cat participants.dat

Smith Herbert Pantington AC 123 Men

Prowler Desmond Lameborough TFC 13 Men

Fleetman Fred Rundale Sportsters 217 Men

Jumpabout Mike Fairing Track Society 154 Men

¹ Of course ASCII only goes up to 127. What is really meant here is ASCII together with whatever extension for the characters with codes from 128 up is currently used, for example ISO-8859-1, also known as ISO-Latin-1.


de Leaping Gwen Fairing Track Society 26 Ladies

Runnington Vivian Lameborough TFC 117 Ladies

Sweat Susan Rundale Sportsters 93 Ladies

Runnington Kathleen Lameborough TFC 119 Ladies

Longshanks Loretta Pantington AC 55 Ladies

O'Finnan Jack Fairing Track Society 45 Men

Oblomovsky Katie Rundale Sportsters 57 Ladies

Let’s try a list sorted by last name first. This is easy in principle, since the last names are at the front of each line:

$ sort participants.dat

Fleetman Fred Rundale Sportsters 217 Men

Jumpabout Mike Fairing Track Society 154 Men

Longshanks Loretta Pantington AC 55 Ladies

O'Finnan Jack Fairing Track Society 45 Men

Oblomovsky Katie Rundale Sportsters 57 Ladies

Prowler Desmond Lameborough TFC 13 Men

Runnington Kathleen Lameborough TFC 119 Ladies

Runnington Vivian Lameborough TFC 117 Ladies

Smith Herbert Pantington AC 123 Men

Sweat Susan Rundale Sportsters 93 Ladies

de Leaping Gwen Fairing Track Society 26 Ladies

You will surely notice the two small problems with this list: “Oblomovsky” should

really be in front of “O’Finnan”, and “de Leaping” should end up at the front of

the list, not the end. These will disappear if we specify “English” sorting rules:

$ LC_COLLATE=en_GB sort participants.dat

de Leaping Gwen Fairing Track Society 26 Ladies

Fleetman Fred Rundale Sportsters 217 Men

Jumpabout Mike Fairing Track Society 154 Men

Longshanks Loretta Pantington AC 55 Ladies

Oblomovsky Katie Rundale Sportsters 57 Ladies

O'Finnan Jack Fairing Track Society 45 Men

Prowler Desmond Lameborough TFC 13 Men

Runnington Kathleen Lameborough TFC 119 Ladies

Runnington Vivian Lameborough TFC 117 Ladies

Smith Herbert Pantington AC 123 Men

Sweat Susan Rundale Sportsters 93 Ladies

(en_GB is short for “British English”; en_US, for “American English”, would also work here.) Let’s sort according to the first name next:

$ sort -k 2,2 participants.dat

Smith Herbert Pantington AC 123 Men

Sweat Susan Rundale Sportsters 93 Ladies

Prowler Desmond Lameborough TFC 13 Men

Fleetman Fred Rundale Sportsters 217 Men

O'Finnan Jack Fairing Track Society 45 Men

Jumpabout Mike Fairing Track Society 154 Men

Runnington Kathleen Lameborough TFC 119 Ladies

Oblomovsky Katie Rundale Sportsters 57 Ladies

de Leaping Gwen Fairing Track Society 26 Ladies

Longshanks Loretta Pantington AC 55 Ladies

Runnington Vivian Lameborough TFC 117 Ladies

Table 7.10: Options for sort (selection)

Option                 Result
-b                     (blank) Ignores leading blanks in field contents
-d                     (dictionary) Sorts in “dictionary order”, i. e., only letters, digits and spaces are taken into account
-f                     (fold) Makes uppercase and lowercase letters equivalent
-i                     (ignore) Ignores non-printing characters
-k ⟨field⟩[,⟨field’⟩]  (key) Sorts according to ⟨field⟩ (up to and including ⟨field’⟩)
-n                     (numeric) Considers the field value as a number and sorts according to its numeric value; leading blanks will be ignored
-o ⟨file⟩              (output) Writes results to a file, whose name may match the original input file
-r                     (reverse) Sorts in descending order, i. e., Z to A
-t ⟨char⟩              (terminate) The ⟨char⟩ character is used as the field separator
-u                     (unique) Writes only the first of a sequence of equal output lines

This illustrates the property of sort mentioned above: the spaces in front of a field are made part of that field’s value. As you can see, the first names are listed alphabetically, but only among runners whose last names have the same length. This can be fixed using the -b option, which ignores leading blanks in the sort fields, so that a run of spaces acts like a single separator:

$ sort -b -k 2,2 participants.dat

Prowler Desmond Lameborough TFC 13 Men

Fleetman Fred Rundale Sportsters 217 Men

Smith Herbert Pantington AC 123 Men

O'Finnan Jack Fairing Track Society 45 Men

Runnington Kathleen Lameborough TFC 119 Ladies

Oblomovsky Katie Rundale Sportsters 57 Ladies

de Leaping Gwen Fairing Track Society 26 Ladies

Longshanks Loretta Pantington AC 55 Ladies

Jumpabout Mike Fairing Track Society 154 Men

Sweat Susan Rundale Sportsters 93 Ladies

Runnington Vivian Lameborough TFC 117 Ladies

This sorted list still has a little blemish; see Exercise 7.24.
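You can reproduce the effect of -b with two made-up lines in which the second field is preceded by different numbers of spaces. This sketch assumes GNU sort and the C locale, in which a space sorts before any letter:

```shell
#!/bin/sh
# Two spaces in front of "b", one in front of "a".
printf '%s\n' 'x  b' 'yy a' > blanks.txt

# Without -b the extra space in front of "b" is compared as part of
# field 2, so "x  b" comes first.
LC_ALL=C sort -k 2,2 blanks.txt
# With -b leading blanks are skipped and "a" is compared with "b".
LC_ALL=C sort -b -k 2,2 blanks.txt
```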

The sort eld can be specied in even more detail, as the following exampleMore detailed field specification

shows:

$sort -br -k 2.2 participants.dat

Sweat Susan Rundale Sportsters 93 Ladies

Fleetman Fred Rundale Sportsters 217 Men

Longshanks Loretta Pantington AC 55 Ladies

Runnington Vivian Lameborough TFC 117 Ladies

Jumpabout Mike Fairing Track Society 154 Men

Prowler Desmond Lameborough TFC 13 Men

Smith Herbert Pantington AC 123 Men

de Leaping Gwen Fairing Track Society 26 Ladies

Oblomovsky Katie Rundale Sportsters 57 Ladies

Runnington Kathleen Lameborough TFC 119 Ladies

O'Finnan Jack Fairing Track Society 45 Men

Here, the participants.dat file is sorted in descending order (-r) according to the second character of the second table field, i. e., the second character of the first name (very meaningful!). In this case as well it is necessary to ignore leading spaces using the -b option. (The blemish from Exercise 7.24 still manifests itself here.)
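This sketch shows how character positions within a field interact with -b. It assumes GNU sort in the C locale, and chars.txt is a made-up file name. In GNU sort, the blank in front of a field counts as its first character unless -b is given:

```shell
#!/bin/sh
printf '%s\n' 'x ab' 'y ba' > chars.txt

# -b skips the leading blank, so -k 2.2 starts at "b" (first line) and
# "a" (second line): "y ba" sorts first.
LC_ALL=C sort -b -k 2.2 chars.txt
# Without -b the blank is character 1 of field 2, so the keys are
# "ab" and "ba": "x ab" sorts first.
LC_ALL=C sort -k 2.2 chars.txt
```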

With the -t (“terminate”) option you can select an arbitrary character in place of the field separator. This is a good idea in principle, since the fields may then contain spaces. Here is a more usable (if less readable) version of our example file, participants0.dat:

Smith:Herbert:Pantington AC:123:Men

Prowler:Desmond:Lameborough TFC:13:Men

Fleetman:Fred:Rundale Sportsters:217:Men

Jumpabout:Mike:Fairing Track Society:154:Men

de Leaping:Gwen:Fairing Track Society:26:Ladies

Runnington:Vivian:Lameborough TFC:117:Ladies

Sweat:Susan:Rundale Sportsters:93:Ladies

Runnington:Kathleen:Lameborough TFC:119:Ladies

Longshanks:Loretta: Pantington AC:55:Ladies

O'Finnan:Jack:Fairing Track Society:45:Men

Oblomovsky:Katie:Rundale Sportsters:57:Ladies

Sorting by rst name now leads to correct results using “

LC_COLLATE=en_GB sort -t:

-k2,2

”. It is also a lot easier to sort, e.g., by a participant’s number (now eld 4, no

matter how many spaces occur in their club’s name:

$sort -t: -k4 participants0.dat

Runnington:Vivian:Lameborough TFC:117:Ladies

Runnington:Kathleen:Lameborough TFC:119:Ladies

Smith:Herbert:Pantington AC:123:Men

Prowler:Desmond:Lameborough TFC:13:Men

Jumpabout:Mike:Fairing Track Society:154:Men

Fleetman:Fred:Rundale Sportsters:217:Men

de Leaping:Gwen:Fairing Track Society:26:Ladies

O'Finnan:Jack:Fairing Track Society:45:Men

Longshanks:Loretta: Pantington AC:55:Ladies

Oblomovsky:Katie:Rundale Sportsters:57:Ladies

Sweat:Susan:Rundale Sportsters:93:Ladies

Of course the “number” sort is done lexicographically, unless otherwise specified: “117” and “123” are put before “13”, and that in turn before “154”. This can be fixed by giving the -n option to force a numeric comparison:

$ sort -t: -k4 -n participants0.dat

Prowler:Desmond:Lameborough TFC:13:Men

de Leaping:Gwen:Fairing Track Society:26:Ladies

O'Finnan:Jack:Fairing Track Society:45:Men

Longshanks:Loretta: Pantington AC:55:Ladies

Oblomovsky:Katie:Rundale Sportsters:57:Ladies

Sweat:Susan:Rundale Sportsters:93:Ladies

Runnington:Vivian:Lameborough TFC:117:Ladies

Runnington:Kathleen:Lameborough TFC:119:Ladies

Smith:Herbert:Pantington AC:123:Men

Jumpabout:Mike:Fairing Track Society:154:Men

Fleetman:Fred:Rundale Sportsters:217:Men
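The difference between textual and numeric comparison shows up even in a hypothetical three-line excerpt in the same colon-separated format. This sketch assumes GNU sort in the C locale:

```shell
#!/bin/sh
# Hypothetical excerpt: last name and bib number.
printf '%s\n' 'Smith:123' 'Prowler:13' 'Fleetman:217' > bibs.txt

# Field 2 compared as text: "123" < "13" < "217".
LC_ALL=C sort -t: -k2,2 bibs.txt
# Field 2 compared as a number: 13 < 123 < 217.
LC_ALL=C sort -t: -n -k2,2 bibs.txt
```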

These and some more important options for sort are shown in Table 7.10; studying the program’s documentation is well worthwhile. sort is a versatile and powerful command which will save you a lot of work.

The uniq command does the important job of letting through only the first of a sequence of equal lines in the input (or the last, just as you prefer). What is considered “equal” can, as usual, be specified using options. uniq differs from most of the programs we have seen so far in that it does not accept an arbitrary number of named input files but just one; a second file name, if it is given, is considered the name of the desired output file (if not, standard output is assumed). If no file is named in the uniq call, uniq reads standard input (as it ought).

uniq works best if the input lines are sorted such that all equal lines occur one after another. If that is not the case, it is not guaranteed that each line occurs only once in the output:

$ cat uniq-test

Hipp

Hopp

Hipp

Hopp

$ uniq uniq-test

Hipp

Hopp

Hipp

Hopp

Compare this to the output of “sort -u”:

$ sort -u uniq-test

Hipp

Hopp
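The usual remedy is to sort the input before handing it to uniq. This sketch recreates the uniq-test file from the text and should work with any POSIX sort and uniq:

```shell
#!/bin/sh
printf '%s\n' Hipp Hopp Hipp Hopp > uniq-test

uniq uniq-test          # no adjacent duplicates: all four lines survive
sort uniq-test | uniq   # sorting brings equal lines together: Hipp, Hopp
sort -u uniq-test       # the same result in a single command
```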

Exercises

7.22 [!2] Sort the list of participants in participants0.dat (the file with colon separators) according to the club’s name and, within clubs, the last and first names of the runners (in that order).

7.23 [3] How can you sort the list of participants by club name in ascending order and, within clubs, by number in descending order? (Hint: Read the documentation!)

7.24 [!2] What is the “blemish” alluded to in the examples and why does it occur?

7.25 [2] A directory contains files with the following names:

01-2002.txt 01-2003.txt 02-2002.txt 02-2003.txt

03-2002.txt 03-2003.txt 04-2002.txt 04-2003.txt



11-2002.txt 11-2003.txt 12-2002.txt 12-2003.txt

Give a sort command to sort the output of ls into “chronologically correct” order:

01-2002.txt

02-2002.txt



12-2002.txt

01-2003.txt



12-2003.txt

7.26 [3] How can you produce a sorted list of all words in a text file? Each word should occur only once in the list. (Hint: Exercise 7.16)


7.5.2 Columns and Fields—cut, paste etc.

While you can locate and “cut out” lines of a text file using grep, the cut command works through a text file “by column”. This works in one of two ways:

One possibility is the absolute treatment of columns. These columns correspond to single characters in a line. To cut out such columns, the column number must be given after the -c option (“column”). To cut several columns in one step, these can be specified as a comma-separated list. Even column ranges may be specified.

$ cut -c 12,1-5 participants.dat

SmithH

ProwlD

FleetF

JumpaM

de LeG



In this example, the rst letter of the rst name and the rst ve letters of the

last name are extracted. It also illustrates the notable fact that the output always

contains the columns in the same order as in input. Even if the selected column

ranges overlap, every input character is output at most once:

$cut -c 1-5,2-6,3-7 participants.dat

Smith

Prowler

Fleetma

Jumpabo

de Leap
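You can check both rules on a single made-up line. The sketch should work with GNU or any POSIX cut:

```shell
#!/bin/sh
# Columns always come out in input order, whatever the order of the list.
printf 'abcdefgh\n' | cut -c 5,1-2      # prints "abe"
# Overlapping ranges: every character is output at most once.
printf 'abcdefgh\n' | cut -c 1-3,2-5    # prints "abcde"
```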



The second method is to cut relative fields, which are delimited by separator characters. If you want to cut delimited fields, cut needs the -f (“field”) option and the desired field number. The same rules as for columns apply. The -c and -f options are mutually exclusive.

The default separator is the tab character; other separators may be specified with the -d option (“delimiter”):

$ cut -d: -f 1,4 participants0.dat

Smith:123

Prowler:13

Fleetman:217

Jumpabout:154

de Leaping:26



In this way, the participants’ last names (field 1) and numbers (field 4) are taken from the list. For readability, only the first few lines are displayed.

Incidentally, using the --output-delimiter option you can specify a different separator for the output fields than is used for the input fields (it may even consist of several characters):

$ cut -d: --output-delimiter=': ' -f 1,4 participants0.dat

Smith: 123

Prowler: 13

Fleetman: 217

Jumpabout: 154

de Leaping: 26


If you really want to change the order of columns and fields, you have to bring in the big guns, such as awk or perl; you could do it using the paste command, which will be introduced presently, but that is rather tedious.

When les are treated by elds (rather than columns), the

-s

option (“sepa-Suppressing no-field lines

rator”) is helpful. If “

cut -f

” encounters lines that do not contain the separator

character, these are normally output in their entirety;

-s

suppresses these lines.
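Here is a sketch of -d, -f and -s. It uses a made-up file, fields.txt, in which one line contains no separator at all:

```shell
#!/bin/sh
printf '%s\n' 'Smith:Herbert:123' 'no separator here' > fields.txt

cut -d: -f 3 fields.txt      # "123", then the second line unchanged
cut -d: -f 3 -s fields.txt   # only "123"; -s drops lines without ":"
```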

The paste command joins the lines of the specified files. It is thus frequently used together with cut. As you will have noticed immediately, paste is not a filter command. You may however give a minus sign in place of one of the input file names for paste to read its standard input at that point. Its output always goes to standard output.

As we said, paste works by lines. If two file names are specified, the first line of the first file and the first line of the second are joined (using a tab character as the separator) to form the first line of the output. The same is done with all other lines in the files. To specify a different separator, use the -d option.

By way of an example, we can construct a version of the list of marathon runners with the participants’ numbers in front:

$ cut -d: -f4 participants0.dat >number.dat
$ cut -d: -f1-3,5 participants0.dat \
> | paste -d: number.dat - >p-number.dat
$ cat p-number.dat

123:Smith:Herbert:Pantington AC:Men

13:Prowler:Desmond:Lameborough TFC:Men

217:Fleetman:Fred:Rundale Sportsters:Men

154:Jumpabout:Mike:Fairing Track Society:Men

26:de Leaping:Gwen:Fairing Track Society:Ladies

117:Runnington:Vivian:Lameborough TFC:Ladies

93:Sweat:Susan:Rundale Sportsters:Ladies

119:Runnington:Kathleen:Lameborough TFC:Ladies

55:Longshanks:Loretta: Pantington AC:Ladies

45:O'Finnan:Jack:Fairing Track Society:Men

57:Oblomovsky:Katie:Rundale Sportsters:Ladies

This le may now conveniently be sorted by number using “

sort -n p-number.dat

”.
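Two tiny made-up inputs show the role of the minus sign: one comes from a file (numbers.txt, a hypothetical name) and one from standard input. A sketch:

```shell
#!/bin/sh
printf '%s\n' 123 13 > numbers.txt

# "-" stands for standard input, so each number is paired with a name:
# "123:Smith" and "13:Prowler".
printf '%s\n' Smith Prowler | paste -d: numbers.txt -
```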

With -s (“serial”), the given files are processed in sequence. First, all the lines of the first file are joined into one single line (using the separator character), then all lines from the second file make up the second line of the output, etc.

$ cat list1

Wood

Bell

Potter

$ cat list2

Keeper

Chaser

Seeker

$ paste -s list*

Wood Bell Potter

Keeper Chaser Seeker

All les matching the

list*

wildcard pattern—in this case,

list1

and

list2

—are

joined using

paste

. The

-s

option causes every line of these les to make up one

column of the output.
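This sketch reproduces the serial mode with the two lists from the text. The -d option described above for the parallel mode also works here:

```shell
#!/bin/sh
printf '%s\n' Wood Bell Potter > list1
printf '%s\n' Keeper Chaser Seeker > list2

paste -s list1 list2       # one output line per file, tab-separated
paste -s -d, list1 list2   # "Wood,Bell,Potter" and "Keeper,Chaser,Seeker"
```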

The join command joins the lines of files, too, but in a much more sophisticated manner. Instead of just joining the first lines, second lines, …, it considers one designated field per line and joins two lines only if the values in these fields are equal. Hence, join implements the eponymous operator from relational algebra, as seen in SQL databases, even though the actual operation is a lot cruder and less efficient than with a “real” database.

Table 7.11: Options for join (selection)

Option   Result
-j1 𝑛    Uses field 𝑛 of the first file as the “join field” (𝑛 ≥ 1). Synonym: -1 𝑛.
-j2 𝑛    Uses field 𝑛 of the second file as the “join field” (𝑛 ≥ 1). Synonym: -2 𝑛.
-j 𝑛     (join) Abbreviation for “-j1 𝑛 -j2 𝑛”
-o 𝑓     (output) Output line specification. 𝑓 is a comma-separated sequence of field specifications, where each field specification is either the digit “0” or a field number 𝑚.𝑛. “0” is the “join field”, 𝑚 is 1 or 2, and 𝑛 is a field number in the first or second file.
-t 𝑐     The 𝑐 character will be used as the field separator for input and output.

Even so, the join command does come in useful. Imagine that the big day has arrived and the Lameborough TFC’s marathon has been run. The umpires have been diligent and have not only timed how long everybody took, but also entered the times into a file times.dat. The first column is always a participant’s number, the second the time achieved (in whole seconds, for simplicity):

$ cat times.dat

45:8445

123:8517

217:8533

93:8641

154:8772

119:8830

13:8832

117:8954

57:9111

26:9129

Now we want to join this le with the list of participants, in order to assign each

time to the corresponding participant. To do so, we must rst sort the result le

by participant number:

$sort -n times.dat >times-s.dat

Next we can use

join

to join the lines of le

times-s.dat

to the corresponding lines of

the modied list of participants from the

paste

example—

join

presumes by default

that the input les are sorted by the value of the “join eld”, and that the “join

eld” is the rst eld of each line.

$ cat p-number.dat
123:Smith:Herbert:Pantington AC:Men
13:Prowler:Desmond:Lameborough TFC:Men
217:Fleetman:Fred:Rundale Sportsters:Men
154:Jumpabout:Mike:Fairing Track Society:Men
26:de Leaping:Gwen:Fairing Track Society:Ladies
117:Runnington:Vivian:Lameborough TFC:Ladies
93:Sweat:Susan:Rundale Sportsters:Ladies
119:Runnington:Kathleen:Lameborough TFC:Ladies
55:Longshanks:Loretta: Pantington AC:Ladies
45:O'Finnan:Jack:Fairing Track Society:Men
57:Oblomovsky:Katie:Rundale Sportsters:Ladies
$ sort -n p-number.dat \
> | join -t: times-s.dat - >p-times.dat


$ cat p-times.dat
13:8832:Prowler:Desmond:Lameborough TFC:Men
26:9129:de Leaping:Gwen:Fairing Track Society:Ladies
45:8445:O'Finnan:Jack:Fairing Track Society:Men
57:9111:Oblomovsky:Katie:Rundale Sportsters:Ladies
93:8641:Sweat:Susan:Rundale Sportsters:Ladies
117:8954:Runnington:Vivian:Lameborough TFC:Ladies
119:8830:Runnington:Kathleen:Lameborough TFC:Ladies
123:8517:Smith:Herbert:Pantington AC:Men
154:8772:Jumpabout:Mike:Fairing Track Society:Men
217:8533:Fleetman:Fred:Rundale Sportsters:Men

The resulting le

p-times.dat

now just needs to be sorted by time:

$sort -t: -k2,2 p-times.dat

45:8445:O'Finnan:Jack:Fairing Track Society:Men

123:8517:Smith:Herbert:Pantington AC:Men

217:8533:Fleetman:Fred:Rundale Sportsters:Men

93:8641:Sweat:Susan:Rundale Sportsters:Ladies

154:8772:Jumpabout:Mike:Fairing Track Society:Men

119:8830:Runnington:Kathleen:Lameborough TFC:Ladies

13:8832:Prowler:Desmond:Lameborough TFC:Men

117:8954:Runnington:Vivian:Lameborough TFC:Ladies

57:9111:Oblomovsky:Katie:Rundale Sportsters:Ladies

26:9129:de Leaping:Gwen:Fairing Track Society:Ladies

This is a nice example of how Linux’s standard tools make even fairly complicated text and data processing possible. In “real life”, one would use shell scripts to prepare these processing steps and automate them as far as possible.
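The whole marathon pipeline can be tried out in a scratch directory. Below is a minimal sketch under a few assumptions: it uses only a three-runner subset of the data above (plus runner 55, who has no time and therefore drops out), the files live in a temporary directory created by mktemp, and both join inputs are sorted with plain sort on the first field, which is the order in which join compares its join fields.

```shell
#!/bin/bash
# Sketch: the marathon pipeline on a small subset of the data above.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 45:8445 123:8517 13:8832 > times.dat
printf '%s\n' \
    '123:Smith:Herbert:Pantington AC:Men' \
    '13:Prowler:Desmond:Lameborough TFC:Men' \
    '55:Longshanks:Loretta:Pantington AC:Ladies' \
    "45:O'Finnan:Jack:Fairing Track Society:Men" > p-number.dat

# Sort both inputs as strings on field 1, the order join expects
sort -t: -k1,1 times.dat > times-s.dat
sort -t: -k1,1 p-number.dat | join -t: times-s.dat - > p-times.dat

# Runner 55 has no time, so join drops that line; now order by time
sort -t: -k2,2 p-times.dat
```

The last command should list O'Finnan (8445 seconds) first and Prowler (8832 seconds) last.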

Exercises

C7.27 [!2] Generate a new version of the participants.dat file (the one with fixed-width columns) in which the participant numbers and club affiliations do not occur.

C7.28 [!2] Generate a new version of the participants0.dat file (the one with fields separated using colons) in which the participant numbers and club affiliations do not occur.

C7.29 [3] Generate a version of participants0.dat in which the fields are not separated by colons but by the string “,␣” (a comma followed by a space character).

C7.30 [3] How many groups are used as primary groups by users on your system? (The primary group of a user is the fourth field in /etc/passwd.)


Commands in this Chapter

cat       Concatenates files (among other things)                                  cat(1)
cut       Extracts fields or columns from its input                                cut(1)
expand    Replaces tab characters in its input by an equivalent number of spaces   expand(1)
fmt       Wraps the lines of its input to a given width                            fmt(1)
hd        Abbreviation for hexdump                                                 hexdump(1)
head      Displays the beginning of a file                                         head(1)
hexdump   Displays file contents in hexadecimal (octal, …) form                    hexdump(1)
join      Joins the lines of two files according to relational algebra             join(1)
od        Displays binary data in decimal, octal, hexadecimal, … formats           od(1)
paste     Joins lines from different input files                                   paste(1)
pr        Prepares its input for printing, with headers, footers, etc.             pr(1)
reset     Resets a terminal’s character set to a “reasonable” value                tset(1)
sort      Sorts its input by line                                                  sort(1)
tac       Displays a file back to front                                            tac(1)
tail      Displays a file’s end                                                    tail(1)
tr        Substitutes or deletes characters on its standard input                  tr(1)
unexpand  “Optimises” tabs and spaces in its input lines                           unexpand(1)
uniq      Replaces sequences of identical lines in its input by single specimens   uniq(1)
wc        Counts the characters, words and lines of its input                      wc(1)

Summary

• Every Linux program supports the standard I/O channels stdin, stdout, and stderr.
• Standard output and standard error output can be redirected using operators > and 2>, standard input using operator <.
• Pipelines can be used to connect the standard output and input of programs directly (without intermediate files).
• Using the tee command, intermediate results of a pipeline can be stored to files.
• Filter commands (or “filters”) read their standard input, manipulate it, and write the results to standard output.
• The tr command substitutes or deletes single characters. expand and unexpand convert tabs to spaces and vice-versa.
• With pr, you can prepare data for printing, not actually print it.
• wc can be used to count the lines, words and characters of the standard input (or a number of named files).
• sort is a versatile program for sorting.
• The cut command cuts specified ranges of columns or fields from every line of its input.
• With paste, the lines of files can be joined.


More About The Shell

Contents

8.1 Simple Commands: sleep, echo, and date
8.2 Shell Variables and The Environment
8.3 Command Types—Reloaded
8.4 The Shell As A Convenient Tool
8.5 Commands From A File
8.6 The Shell As A Programming Language
  8.6.1 Foreground and Background Processes

Goals

• Knowing about shell variables and environment variables
• Handling foreground and background processes

Prerequisites

• Basic shell knowledge (Chapter 3)
• File management and simple filter commands (Chapter 6, Chapter 7)
• Use of a text editor (Chapter 5)


8.1 Simple Commands: sleep, echo, and date

To give you some tools for experiments, we shall now explain some very simple commands:

sleep This command does nothing for the number of seconds specified as the argument. You can use it if you want your shell to take a little break:

$ sleep 10
                       Nothing happens for approximately 10 seconds
$ _

echo The command echo outputs its arguments (and nothing else), separated by spaces. It is still interesting and useful, since the shell replaces variable references (see Section 8.2) and similar things first:

$ p=Planet
$ echo Hello $p
Hello Planet
$ echo Hello ${p}oid
Hello Planetoid

(The second echo illustrates what to do if you want to append something directly to the value of a variable.)

If echo is called with the -n option, it does not write a line terminator at the end of its output:

$ echo -n Hello
Hello_

date The date command displays the current date and time. You have considerable leeway in determining the format of the output: call “date --help”, or read the online documentation using “man date”.
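To get a feel for format strings, it helps to let date describe a fixed moment rather than the current time, so the output is predictable. This sketch assumes GNU date, whose -d option (not available in every date implementation) selects the moment to show; @0 means zero seconds after the start of the Unix epoch. LC_ALL=C merely ensures English weekday and month names.

```shell
#!/bin/bash
# Sketch assuming GNU date: -d picks the moment to display, -u uses UTC,
# and the argument starting with "+" describes the output layout.
iso=$(date -u -d @0 '+%Y-%m-%d %H:%M:%S')
long=$(LC_ALL=C date -u -d @0 '+%A, %d %B %Y')
echo "$iso"      # numeric form
echo "$long"     # weekday and month spelled out
```

The first line should read 1970-01-01 00:00:00, the second Thursday, 01 January 1970; the complete list of % letters is in date(1).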

(When reading through this manual for the second time:) In particular, date serves as a world clock, if you first set the TZ environment variable to the name of a time zone or important city (usually a capital):

$ date
Thu Oct 5 14:26:07 CEST 2006
$ export TZ=Asia/Tokyo
$ date
Thu Oct 5 21:26:19 JST 2006
$ unset TZ

You can find out about valid time zone and city names by rooting around /usr/share/zoneinfo.
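Instead of exporting TZ and unsetting it again, you can also set it for a single command by writing the assignment in front of the command (a form that Section 8.2 comes back to). The following sketch again assumes GNU date and uses POSIX-style TZ strings such as JST-9, which work even where the zone files under /usr/share/zoneinfo are missing. With those files installed, TZ=Asia/Tokyo would do the same.

```shell
#!/bin/bash
# Sketch assuming GNU date. "TZ=... date" sets TZ for that one command only.
# UTC0 and JST-9 are POSIX-style zone descriptions: a name, then the offset
# west of UTC, so JST-9 means nine hours ahead of UTC.
utc=$(TZ=UTC0 date -d @0 '+%H:%M %Z')
tokyo=$(TZ=JST-9 date -d @0 '+%H:%M %Z')
echo "Start of the epoch, UTC:   $utc"
echo "Start of the epoch, Tokyo: $tokyo"
```

Note the inverted sign convention of these strings: the offset counts hours west of UTC, hence the minus sign for a zone east of it.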

While every user is allowed to read the system time, only the system administrator root may change the system time using the date command and an argument of the form MMDDhhmm, where MM is the calendar month, DD the calendar day, hh the hour, and mm the minute. You can optionally add two digits for the year (plus possibly another two for the century) and the seconds (separated with a dot), which should, however, prove necessary only in very rare cases.

$ date
Thu Oct 5 14:28:13 CEST 2006
$ date 08181715
date: cannot set date: Operation not permitted
Fri Aug 18 17:15:00 CEST 2006

The date command only changes the internal time of the Linux system. This time will not necessarily be transferred to the CMOS clock on the computer’s mainboard, so a special command may be required to do so. Many distributions will do this automatically when the system is shut down.

Exercises

C8.1 [!3] Assume now is 22 October 2003, 12:34 hours and 56 seconds. Study the date documentation and state formatting instructions to achieve the following output:

22-10-2003
03-295 (WK43)    (Two-digit year, number of day within year, calendar week)
12h34m56s

C8.2 [!2] What time is it now in Los Angeles?

8.2 Shell Variables and The Environment

Like most common shells, bash has features otherwise found in programming languages. For example, it is possible to store pieces of text or numbers in variables and retrieve them later. Variables also control various aspects of the operation of the shell itself.

Within the shell, a variable is set by means of a command like “foo=bar” (this command sets the foo variable to the textual value bar). Take care not to insert spaces in front of or behind the equals sign! You can retrieve the value of the variable by using the variable name with a dollar sign in front:

$ foo=bar
$ echo foo
foo
$ echo $foo
bar

(note the difference).

We distinguish environment variables from shell variables. Shell variables are only visible in the shell in which they have been defined. On the other hand, environment variables are passed to the child process when an external command is started and can be used there. (The child process does not have to be a shell; every Linux process has environment variables.) All the environment variables of a shell are also shell variables but not vice versa.
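Whether a child process sees a variable can be checked directly, since bash -c starts a new shell as a child process. A minimal sketch; the variable names shellvar and envvar are made up, and ${name-unset} is the shell’s notation for “the value of name, or the word unset if name is not defined”.

```shell
#!/bin/bash
# Sketch: which variables does a child process actually get to see?
shellvar=only-here            # plain shell variable, not passed on
export envvar=passed-along    # environment variable, copied to children

# bash -c runs the quoted command in a child shell; the single quotes
# make sure the child, not this shell, expands the variables.
child=$(bash -c 'echo "shellvar=${shellvar-unset} envvar=${envvar-unset}"')
echo "$child"                 # prints: shellvar=unset envvar=passed-along
```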

Using the export command, you can declare an existing shell variable an environment variable:

$ foo=bar              foo is now a shell variable
$ export foo           foo is now an environment variable

Or you dene a new variable as a shell and environment variable at the same time:

122 8 More About The Shell

Table 8.1: Important Shell Variables

Variable Meaning

PWD

Name of the current directory

EDITOR

Name of the user’s favourite editor

PS1

Shell command prompt template

UID

Current user’s user name

HOME

Current user’s home directory

PATH

List of directories containing executable programs that are

eligible as external commands

LOGNAME

Current user’s user name (again)

$export foo=bar

The same works for several variables simultaneously:

$export foo baz

$export foo=bar baz=quux

You can display all environment variables using the

export

command (with no

parameters). The

env

command (also with no parameters) also displays the cur-

rent environment. All shell variables (including those which are also environment

variables) can be displayed using the

set

command. The most common variables

and their meanings are shown in Table 8.1.

The set command also does many other strange and wonderful things. You will encounter it again in the Linup Front training manual Advanced Linux, which covers shell programming.

env, too, is actually intended to manipulate the process environment rather than just display it. Consider the following example:

$ env foo=bar bash          Launch child shell with foo
$ echo $foo
bar
$ exit                      Back to the parent shell
$ echo $foo
                            Not defined
$ _

At least with bash (and relations) you don’t really need env to execute commands with an extended environment – a simple

$ foo=bar bash

does the same thing. However, env also allows you to remove variables from the environment temporarily (how?).
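The two ways of extending the environment for one command can be compared side by side. In this sketch (foo is just the example name used in the text), both child shells see the extra variable, while the parent shell does not.

```shell
#!/bin/bash
# Sketch: two ways to give one command an extra environment variable.
unset foo                                       # start with no foo at all
via_env=$(env foo=bar bash -c 'echo "$foo"')    # the env program sets it for the child
via_prefix=$(foo=bar bash -c 'echo "$foo"')     # the shell's own syntax does the same
in_parent=${foo-unset}                          # the parent shell never got foo
echo "$via_env $via_prefix $in_parent"          # prints: bar bar unset
```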

If you have had enough of a shell variable, you can delete it using the unset command. This also removes it from the environment. If you want to remove a variable from the environment but keep it on as a shell variable, use “export -n”:

$ export foo=bar         foo is an environment variable
$ export -n foo          foo is a shell variable (only)
$ unset foo              foo is gone and lost forever
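The effect of export -n and unset can likewise be checked by asking a child shell each time. peek is a hypothetical helper function defined only for this sketch.

```shell
#!/bin/bash
# Sketch: checking export, export -n and unset from a child shell.
# peek (hypothetical helper) prints what a child shell sees as foo.
peek() { bash -c 'echo "${foo-unset}"'; }

export foo=bar
step1=$(peek)         # environment variable: child sees "bar"
export -n foo
step2=$(peek)         # no longer exported: child sees "unset"
step3=$foo            # ... yet still a shell variable here: "bar"
unset foo
step4=${foo-unset}    # gone completely: "unset"
echo "$step1 $step2 $step3 $step4"
```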


8.3 Command Types—Reloaded

One application of shell variables is controlling the shell itself. Here’s another example: As we discussed in Chapter 3, the shell distinguishes internal and external commands. External commands correspond to executable programs, which the shell looks for in the directories that make up the value of the PATH environment variable. Here is a typical value for PATH:

$ echo $PATH
/home/joe/bin:/usr/local/bin:/usr/bin:/bin:/usr/games

Individual directories are separated in the list by colons, therefore the list in the example consists of five directories. If you enter a command like

$ ls

the shell knows that this isn’t an internal command (it knows its internal commands) and thus begins to search the directories in PATH, starting with the leftmost directory. In particular, it checks whether the following files exist:

/home/joe/bin/ls         Nope …
/usr/local/bin/ls        Still no luck …
/usr/bin/ls              Again no luck …
/bin/ls                  Gotcha!
                         The directory /usr/games is not checked.

This implies that the /bin/ls file will be used to execute the ls command.

Of course this search is a fairly involved process, which is why the shell prepares for the future: If it has once identified the /bin/ls file as the implementation of the ls command, it remembers this correspondence for the time being. This process is called “hashing”, and you can see that it did take place by applying type to the ls command.

$ type ls
ls is hashed (/bin/ls)

The hash command tells you which commands your bash has “hashed” and how often they have been invoked in the meantime. With “hash -r” you can delete the shell’s complete hashing memory. There are a few other options which you can look up in the bash manual or find out about using “help hash”.

Strictly speaking, the PATH variable does not even need to be an environment variable; for the current shell a shell variable would do just fine (see Exercise 8.5). However, it is convenient to define it as an environment variable so the shell’s child processes (often also shells) use the desired value.

If you want to nd out exactly which program the shell uses for a given external

command, you can use the

which

command:

$which grep

/bin/grep

which

uses the same method as the shell—it starts at the rst directory in

PATH

and

checks whether the directory in question contains an executable le with the same

name as the desired command.
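The search can be mimicked in a few lines of bash, which may make the procedure more tangible. This is a minimal sketch, not how which is actually implemented; mywhich is a made-up name, and the function ignores aliases, functions and built-ins just as which does.

```shell
#!/bin/bash
# mywhich NAME (hypothetical helper): try each PATH directory from left
# to right and report the first regular, executable file called NAME.
mywhich() {
    local dir dirs
    IFS=: read -r -a dirs <<< "$PATH"    # split PATH at the colons
    for dir in "${dirs[@]}"; do
        dir=${dir:-.}                      # an empty entry means "current directory"
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            echo "$dir/$1"
            return 0
        fi
    done
    return 1                               # not found anywhere in PATH
}

mywhich sh
```

Like which, the function stops at the first match, so a program in /home/joe/bin would shadow one of the same name in /usr/bin.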


which knows nothing about the shell’s internal commands; even though something like “which test” returns “/usr/bin/test”, this does not imply that this program will, in fact, be executed, since internal commands have precedence. If you want to know for sure, you need to use the “type” shell command.
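Instead of which, you can ask bash itself. As a sketch (assuming bash, since type is a shell built-in and its options differ between shells): type -t prints a one-word classification such as builtin or file, and type -P restricts itself to the PATH search, which shows the program that which would report.

```shell
#!/bin/bash
# Sketch: bash's type built-in versus a pure PATH search.
kind=$(type -t test)     # one-word answer: builtin, file, alias, function, keyword
file=$(type -P test)     # PATH search only, ignoring built-ins (like which)
echo "test is a $kind; the program found via PATH is $file"
type test                # human-readable form of the same information
```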

The whereis command not only returns the names of executable programs, but also documentation (man pages), source code and other interesting files pertaining to the command(s) in question. For example:

$ whereis passwd
passwd: /usr/bin/passwd /etc/passwd /etc/passwd.org /usr/share/passwd
  /usr/share/man/man1/passwd.1.gz /usr/share/man/man1/passwd.1ssl.gz
  /usr/share/man/man5/passwd.5.gz

This uses a hard-coded method which is explained (sketchily) in whereis(1).

Exercises

C8.3 [!2] Convince yourself that passing (or not passing) environment and shell variables to child processes works as advertised, by working through the following command sequence:

$ foo=bar            foo is a shell variable
$ bash               New shell (child process)
$ echo $foo
                     foo is not defined
$ exit               Back to the parent shell
$ export foo         foo is an environment variable
$ bash               New shell (child process)
$ echo $foo
bar                  Environment variable was passed along
$ exit               Back to the parent shell

C8.4 [!2] What happens if you change an environment variable in the child process? Consider the following command sequence:

$ export foo=bar     foo is an environment variable
$ bash               New shell (child process)
$ echo $foo
bar                  Environment variable was passed along
$ foo=baz            New value
$ exit               Back to the parent shell
$ echo $foo
                     What do we get??

C8.5 [2] Make sure that the shell’s command line search works even if PATH is “only” a simple shell variable rather than an environment variable. What happens if you remove PATH completely?

C8.6 [!1] Which executable programs are used to handle the following commands: fgrep, sort, mount, xterm?

C8.7 [!1] Which files on your system contain the documentation for the “crontab” command?

8.4 The Shell As A Convenient Tool

Since the shell is the tool many Linux users use most often, its developers have spared no trouble to make its use convenient. Here are some more useful tricks:

Command Editor You can edit command lines like in a simple text editor. Hence, you can move the cursor around in the input line and delete or add characters arbitrarily before finishing the input using the return key. The behaviour of this editor can be adapted to that of the most popular editors on Linux (Chapter 5) using the “set -o vi” and “set -o emacs” commands.

Aborting Commands With so many Linux commands around, it is easy to confuse a name or pass a wrong parameter. Therefore you can abort a command while it is being executed. You simply need to press Ctrl+c.

The History The shell remembers ever so many of your most recent commands as part of the “history”, and you can move through this list using the ↑ and ↓ cursor keys. If you find a previous command that you like you can either re-execute it unchanged using ↩, or else edit it as described above. You can search the list “incrementally” using Ctrl+r: simply type a sequence of characters, and the shell shows you the most recently executed command containing this sequence. The longer your sequence, the more precise the search.

When you log out of the system, the shell stores the history in the hidden file ~/.bash_history and makes it available again after your next login. (You may use a different file name by setting the HISTFILE variable to the name in question.)

A consequence of the fact that the history is stored in a “plain” file is that you can edit it using a text editor (Chapter 5 tells you how). So in case you accidentally enter your password on the command line, you can (and should!) remove it from the history manually, in particular if your system is one of the more freewheeling ones where home directories are visible to anybody.

By default, the shell remembers the last 500 commands; you can change this by putting the desired number into the HISTSIZE variable. The HISTFILESIZE variable specifies how many lines are kept in the HISTFILE file, usually 500 as well.

Besides the arrow keys you can access the history also via “magical” character sequences in new commands. The shell replaces these character sequences first, immediately after the command line has been read. Replacement proceeds in two stages:

• At first the shell determines which command from the history to use for the replacement. The !! sequence stands for the immediately preceding command, !-n refers to the nth command before the current one (!-2, for example, to the penultimate one), and !n to the command with number n in the history. (The history command outputs the whole history including numbers for the commands.) !xyz selects the most recent command starting with xyz, and !?xyz the most recent command containing xyz.

• After that, the shell decides which part of the selected command will be “recycled” and how. If you do not specify anything else, the complete command will be inserted; otherwise there are various selection methods. All these selection methods are separated from the command selection character sequence by a colon (“:”).

  n      Selects the n-th word. Word 0 is the command itself.
  ^      Selects the first word (immediately after the command).
  $      Selects the final word.
  m-n    Selects words m through n.
  n*     Selects all words starting at word n.
  n-     Selects all words starting at word n except for the final one.

Some examples for clarity:

!-2:$    Picks the final word of the penultimate command.
!!:0-    Picks the complete immediately preceding command except for the final word.
!333^    Picks the first word from command 333.

The final example, incidentally, is not a typo; if the first character of the intra-command selection is from the list ^$*-% you may leave out the colon. If you like, look at the bash documentation (section HISTORY EXPANSION) to find out what else the shell has in store. As far as we (and the LPI) are concerned you do not need to learn all of this off by heart.

The history is one of the things that bash took over from the C shell, and whoever did not use Unix during the 1980s may have some trouble imagining what the world looked like before interactive command line editing was invented. (For Windows users, this time doesn’t even go that far back.) During that time, the history with all its selectors and transformations was widely considered the best idea since sliced bread; today its documentation exudes the sort of morbid fascination one would otherwise associate with the user manual for a Victorian steam engine.

Some more remarks concerning the history command: An invocation like

$ history 33

(with a number as the parameter) only outputs that many history lines. “history -c” empties the history completely. There are some more options; check the bash documentation or try “help history”.

Autocompletion A massive convenience is bash’s ability to automatically complete command and file names. If you hit the Tab key, the shell completes an incomplete input if the continuation can be identified uniquely. For the first word of a command, bash considers all executable programs, within the rest of the command line all the files in the current or specified directory. If several commands or files exist whose names start out equal, the shell completes the name as far as possible and then signals acoustically that the command or file name may still be incomplete. Another Tab press then lists the remaining possibilities.

It is possible to adapt the shell’s completion mechanism to specific programs. For example, on the command line of an FTP client it might offer the names of recently visited FTP servers in place of file names. Check the bash documentation for details.
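As a sketch of such an adaptation, bash’s complete built-in can attach a fixed word list to a command name, and compgen shows which candidates a given prefix would produce. The command name myftp and the host names are invented for this example; the complete setting only takes effect for interactive use of the shell in which it was made, and real completion setups are usually considerably more elaborate.

```shell
#!/bin/bash
# Sketch: word-list completion for a hypothetical command "myftp".
hosts="ftp.example.org ftp.example.net files.example.com"
complete -W "$hosts" myftp     # interactively, "myftp ftp<Tab>" would offer these names

# compgen prints the candidates bash generates for the prefix "ftp"
compgen -W "$hosts" -- ftp
```

Only the two names starting with ftp are printed; files.example.com does not match the prefix.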

Table 8.2 gives an overview of the most important key strokes within bash.

Multiple Commands On One Line You are perfectly free to enter several commands on the same input line. You merely need to separate them using a semicolon:

$ echo Today is; date
Today is
Fri 5 Dec 12:12:47 CET 2008

In this instance the second command will be executed once the first is done.

Table 8.2: Key Strokes within bash

Key Stroke         Function
↑ or ↓             Scroll through most recent commands
Ctrl+r             Search command history
← or →             Move cursor within current command line
Home or Ctrl+a     Jump to the beginning of the command line
End or Ctrl+e      Jump to the end of the command line
⇐ or Del           Delete character in front of/under the cursor, respectively
Ctrl+t             Swap the two characters in front of and under the cursor
Ctrl+l             Clear the screen
Ctrl+c             Interrupt a command
Ctrl+d             End the input (for login shells: log off)

Conditional Execution Sometimes it is useful to make the execution of the second command depend on whether the first was executed correctly or not. Every Unix process yields a return value which states whether it was executed correctly or whether errors of whatever kind have occurred. In the former case, the return value is 0; in the latter, it is different from 0.

BYou can nd the return value of a child process of your shell by looking at

the

variable:

$bash

Start a child shell …

$exit 33

… and exit again immediately

exit

$echo $?

The value from our

exit

above

$ _

But this really has no bearing on the following.

With && as the “separator” between two commands (where there would otherwise be the semicolon), the second command is only executed when the first has exited successfully. To demonstrate this, we use the shell’s -c option, with which you can pass a command to the child shell on the command line (impressive, isn’t it?):

$ bash -c "exit 0" && echo "Successful"
Successful
$ bash -c "exit 33" && echo "Successful"
                       Nothing -- 33 isn’t success!

Conversely, with || as the “separator”, the second command is only executed if the first did not finish successfully:

$ bash -c "exit 0" || echo "Unsuccessful"
$ bash -c "exit 33" || echo "Unsuccessful"
Unsuccessful
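The exit-status logic behind && and || can also be watched in a script. This sketch uses the true and false commands, which do nothing except exit with status 0 and 1, respectively; the variable names log and status are arbitrary.

```shell
#!/bin/bash
# Sketch: && and || look at the exit status of the command on their left.
log=""
true  && log+="a"    # runs: true succeeded
false && log+="b"    # skipped: false failed
true  || log+="c"    # skipped: true succeeded
false || log+="d"    # runs: false failed
echo "$log"          # prints: ad

# Keeping a failing command's exit status for later use
bash -c "exit 33" || status=$?
echo "exit status: $status"    # prints: exit status: 33
```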

Exercises

C8.8 [3] What is wrong about the command “echo "Hello!"”? (Hint: Experiment with commands of the form “!-2” or “!ls”.)

8.5 Commands From A File

You can store shell commands in a file and execute them en bloc. (Chapter 5 explains how to conveniently create files.) You just need to invoke the shell and pass the file name as a parameter:

$ bash my-commands

Such a file is also called a shell script, and the shell has extensive programming features that we can only outline very briefly here. (The Linup Front training manual Advanced Linux explains shell programming in great detail.)

You can avoid having to prepend the bash command by inserting the magical incantation

#!/bin/bash

as the first line of your file and making the file “executable”:

$ chmod +x my-commands

(You will find out more about chmod and access rights in Chapter 12.) After this, the

$ ./my-commands

command will suffice.

If you invoke a shell script as above, whether with a prepended bash or as an executable file, it is executed in a subshell, a shell that is a child process of the current shell. This means that changes to, e. g., shell or environment variables do not influence the current shell. For example, assume that the file assignment contains the line

foo=bar

Consider the following command sequence:

$ foo=quux
$ bash assignment        Contains foo=bar
$ echo $foo
quux                     No change; assignment was only in subshell

This is generally considered a feature, but every now and then it would be quite desirable to have commands from a file affect the current shell. That works, too: The source command reads the lines in a file exactly as if you would type them directly into the current shell; all changes to variables (among other things) hence take effect in your current shell:

$ foo=quux
$ source assignment      Contains foo=bar
$ echo $foo
bar                      Variable was changed!

A different name for the source command, by the way, is “.”. (You read correctly: dot!) Hence

$ source assignment

is equivalent to

$ . assignment

BLike program les for external commands, the les to be read using

source

are searched in the directories given by the

PATH

variable.
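The difference between running a file in a subshell and reading it with source can be reproduced with the assignment file from the text. The sketch works in a temporary directory and uses “source ./assignment”, which names the file explicitly so that no search is involved.

```shell
#!/bin/bash
# Sketch: subshell versus source, using the one-line file from the text.
cd "$(mktemp -d)" || exit 1
echo 'foo=bar' > assignment

foo=quux
bash assignment          # the assignment happens in a child shell ...
after_bash=$foo          # ... so foo is still "quux" here
source ./assignment      # read into the current shell
after_source=$foo        # now foo is "bar"
echo "$after_bash $after_source"   # prints: quux bar
```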

8.6 The Shell As A Programming Language

Being able to execute shell commands from a file is a good thing, to be sure. However, it is even better to be able to structure these shell commands such that they do not have to do the same thing every time, but (for example) can obtain command-line parameters. The advantages are obvious: In often-used procedures you save a lot of tedious typing, and in seldom-used procedures you can avoid mistakes that might creep in because you accidentally leave out some important step. We do not have space here for a full explanation of the shell as a programming language, but fortunately there is enough room for a few brief examples.

Command-line parameters When you pass command-line parameters to a shell script, the shell makes them available in the variables $1, $2, …. Consider the following example:

$ cat hello
#!/bin/bash
echo Hello $1, are you free $2?
$ ./hello Joe today
Hello Joe, are you free today?
$ ./hello Sue tomorrow
Hello Sue, are you free tomorrow?

The variable $* contains all parameters at once, and the number of parameters is in $#:

$ cat parameter
#!/bin/bash
echo $# parameters: $*
$ ./parameter
0 parameters:
$ ./parameter dog
1 parameters: dog
$ ./parameter dog cat mouse tree
4 parameters: dog cat mouse tree

Loops The for command lets you construct loops that iterate over a list of words (separated by white space):

$ for i in 1 2 3
> do
>   echo And $i!
> done
And 1!
And 2!
And 3!

Here, the i variable assumes each of the listed values in turn as the commands between do and done are executed.

This is even more fun if the words are taken from a variable:

$ list='4 5 6'
$ for i in $list
> do
>   echo And $i!
> done
And 4!
And 5!
And 6!

If you omit the “in …”, the loop iterates over the command line parameters:

$ cat sort-wc
#!/bin/bash
# Sort files according to their line count
for f
do
    echo `wc -l <"$f"` lines in $f
done | sort -n
$ ./sort-wc /etc/passwd /etc/fstab /etc/motd

(The “wc -l” command counts the lines of its standard input or the file(s) passed on the command line.) Do note that you can redirect the standard output of a loop to sort using a pipeline!
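The sort-wc loop can also be packaged as a shell function and tried on two small files. In this sketch the function name sort_wc and the file names are made up, and the command substitution uses the $(…) notation, which bash treats like the backquotes in the original but which is easier to read and to nest.

```shell
#!/bin/bash
# Sketch: sort-wc as a shell function (sort_wc is a made-up name).
sort_wc() {
    local f
    for f                # no "in ...": iterate over the function's arguments
    do
        echo "$(wc -l <"$f") lines in $f"
    done | sort -n       # the whole loop output is piped into sort
}

cd "$(mktemp -d)" || exit 1
printf 'a\nb\nc\n' > three.txt
printf 'x\n' > one.txt
sort_wc three.txt one.txt
```

The shorter file comes first even though it was named last, because sort -n orders the lines by their leading number.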

Alternatives You can use the aforementioned && and || operators to execute certain commands only under specific circumstances. The following script, grepcp, for example, copies a file to the backup directory only if its name ends with .txt (the for loop ensures this) and it contains at least one line matching the regular expression that is passed as a parameter:

#!/bin/bash
# grepcp REGEX
rm -rf backup; mkdir backup
for f in *.txt
do
    grep "$1" "$f" && cp "$f" backup
done
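A slightly more careful variant of grepcp might look like the following sketch. It uses grep -q (a standard option), which suppresses the matching lines that the original script prints in passing, and “--”, which stops a regular expression or file name starting with “-” from being taken as an option. Instead of a real command-line parameter it uses a fixed expression and made-up files in a temporary directory.

```shell
#!/bin/bash
# Sketch of a grepcp variant; the files and the regex "pears" are made up.
cd "$(mktemp -d)" || exit 1
echo 'apples and pears' > fruit.txt
echo 'carrots' > veg.txt
echo 'pears' > notes.md            # not considered: name does not end in .txt

regex=pears                        # stands in for $1 of the real script
rm -rf backup; mkdir backup
for f in *.txt
do
    grep -q -- "$regex" "$f" && cp -- "$f" backup
done
ls backup                          # lists fruit.txt only
```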

A useful tool for alternatives is the test command, which can check a large variety of conditions. It returns an exit code of 0 (success) if the condition holds, else a non-zero exit code (failure). For example, consider

#!/bin/bash
# filetest NAME1 NAME2 ...
for name
do
    test -d "$name" && echo $name: directory
    test -f "$name" && echo $name: file
    test -L "$name" && echo $name: symbolic link
done

This script looks at a number of file names passed as parameters and outputs for each one whether it refers to a directory, a (plain) file, or a symbolic link.

The test command exists both as a free-standing program in /bin/test and as a built-in command in bash and other shells. These variants can differ subtly, especially as far as more outlandish tests are concerned. If in doubt, read the documentation.

You can use the if command to make more than one command depend on a condition (in a convenient and readable fashion). You may write “[ … ]” instead of “test …”:

#!/bin/bash
# filetest2 NAME1 NAME2 ...
for name
do
    if [ -L "$name" ]
    then
        echo $name: symbolic link
    elif [ -d "$name" ]
    then
        echo $name: directory
    elif [ -f "$name" ]
    then
        echo $name: file
    else
        echo $name: no idea
    fi
done

If the command after the if signals “success” (exit code 0), the commands after then will be executed, up to the next elif, else, or fi. If on the other hand it signals “failure”, the command after the next elif will be evaluated next and its exit code will be considered. The shell continues the pattern until the matching fi is reached. Commands after the else are executed if none of the elif commands resulted in “success”. The elif and else branches may be omitted if they are not required.
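The if/elif chain can be checked against a scratch directory holding one object of each kind. In this sketch the function name classify and all file names are made up. The order of the tests matters: a symbolic link pointing to a plain file also passes the -f test, which is why -L comes first, as in filetest2.

```shell
#!/bin/bash
# Sketch: the filetest2 logic as a function (classify is a made-up name).
classify() {
    if [ -L "$1" ]; then
        echo "$1: symbolic link"
    elif [ -d "$1" ]; then
        echo "$1: directory"
    elif [ -f "$1" ]; then
        echo "$1: file"
    else
        echo "$1: no idea"
    fi
}

cd "$(mktemp -d)" || exit 1
mkdir adir; touch afile; ln -s afile alink
for name in alink adir afile missing
do
    classify "$name"
done
```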

With the for loop, the number of trips through the loop is fixed at the beginning (the number of words in the list). However, we often need to deal with situations where it is not clear at the beginning how often a loop should be executed. To handle this, the shell offers the while loop, which (like if) executes a command whose success or failure determines what to do about the loop: on success, the “dependent” commands are executed and the command is then run again; on failure, execution continues after the loop.
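A minimal while loop, sketched as a hypothetical countdown function: the “[ … ]” test is evaluated before every trip through the loop, and the loop ends as soon as it fails. The counter is decremented with bash's $(( … )) arithmetic expansion.

```shell
#!/bin/bash
# countdown N: hypothetical example of a while loop with a changing condition
countdown() {
  local n=$1
  while [ "$n" -gt 0 ]        # evaluated again before every trip through the loop
  do
    echo "$n"
    n=$((n - 1))              # bash arithmetic expansion
  done
  echo "liftoff"              # reached as soon as the test fails
}

countdown 3
```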

The following script reads a le like

Aunt Maggie:maggie@example.net:the delightful tea cosy

Uncle Bob:bob@example.com:the great football

(whose name is passed on the command line) and constructs a thank-you e-mail

message from each line (Linux is very useful in daily life):

#!/bin/bash
# birthday FILE
IFS=:
while read name email present
do
    (echo "$name"
     echo ""
     echo "Thank you very much for $present!"
     echo "I enjoyed it very much."
     echo ""
     echo "Best wishes"
     echo "Tim") | mail -s "Many thanks!" "$email"
done <"$1"

The read command reads one line of the input file at a time and splits it at the colons (the separator is taken from the variable IFS) into the three fields name, email and present, which are then made available as variables inside the loop. Somewhat counterintuitively, the input redirection for the loop can be found at the very end.

Please test this script with innocuous e-mail addresses only, lest your relations become confused!
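A safe way to try out the parsing part is a dry run that prints instead of mailing. The following is a variation, not the manual's script: the hypothetical function parse_thanks sets IFS=: only for the read command (so the rest of the script keeps the default separators), and read -r keeps any backslashes in the input literal.

```shell
#!/bin/bash
# parse_thanks: hypothetical dry-run variant of the birthday script.
# Reads NAME:EMAIL:PRESENT lines from standard input and prints what would be mailed.
parse_thanks() {
  # IFS=: applies only to this read command; -r keeps backslashes literal
  while IFS=: read -r name email present
  do
    echo "To: $email | Dear $name, thank you very much for $present!"
  done
}

printf '%s\n' 'Aunt Maggie:maggie@example.net:the delightful tea cosy' \
              'Uncle Bob:bob@example.com:the great football' | parse_thanks
```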

Exercises

C8.9 [1] What is the dierence (as far as loop execution is concerned) between

for f; do …; done

and

for f in $*; do …; done

? (Try it, if necessary)

8.10 [2] In the sort-wc script, why do we use “wc -l <$f” instead of “wc -l $f”?

8.11 [2] Alter the grepcp script such that the list of files to be considered is also taken from the command line. (Hint: The shift shell command removes the first command line parameter and moves all others up to close the gap. After a shift, the previous $2 is now $1, and so on.)

8.12 [2] Why does the filetest script output

$ ./filetest foo
foo: file
foo: symbolic link

for symbolic links (instead of just “foo: symbolic link”)?

8.6.1 Foreground and Background Processes

After a command has been entered, it is processed by the shell. The shell executes internal commands directly; for external commands, the shell creates a child process, which executes the command and terminates afterwards. In Unix, a process is a running program; the same program can be executed several times simultaneously (e.g., by different users) and then corresponds to several processes. Every process can create child processes (even if most of them, unlike shells, don’t).
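You can observe the relationship between a shell and its child processes yourself: bash stores the process ID of the running shell in $$, and a child process finds its parent's process ID in $PPID. In this sketch, the shell starts a second bash as a child process via bash -c; the single quotes keep the outer shell from expanding $$ and $PPID itself.

```shell
#!/bin/bash
# $$ is the PID of the running shell; a child process sees that number as $PPID
echo "shell PID: $$"

# Start a second bash as a child process. The single quotes stop the outer shell
# from expanding $$ and $PPID, so the child expands them itself.
bash -c 'echo "child PID: $$, parent PID: $PPID"'
```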

Usually, the shell waits until the child process has done its work and terminated. You can tell by the fact that no new shell prompt is displayed while the child process is running. After the child process has exited, the shell reads and processes its return value, and only then does it display a new shell prompt. The execution of the shell and the child process is, so to speak, synchronised. This “synchronous” manner of processing commands is illustrated in Figure 8.1; from the user’s point of view it looks like the following:

$ sleep 10
Nothing happens for approximately 10 seconds
$ _


Figure 8.1: Synchronous command execution in the shell (timeline: the shell waits while the child process runs from start to end)

Figure 8.2: Asynchronous command execution in the shell (timeline: the shell and the child process, from the child's start to its end)


Table 8.3: Options for jobs

Option   Meaning
-l       (long) Adds PIDs to the output
-n       (notify) Displays only jobs whose status has changed (e.g., that have terminated) since the last report
-p       (process) Displays only PIDs

If you do not want the shell to wait until the child process has nished, you

have to append an ampersand (

) to the command line. Then, while the child

process is executed in the background, a short message appears on the terminal,

immediately followed by the shell’s command prompt:

$ sleep 10 &
[2] 6210

And then immediately:

$ _

This mode of operation is called “asynchronous”, since the shell does not wait idly for the child process to finish (see Figure 8.2).

The “[2] 6210” means that the system has created the process with the number (or “process ID”) 6210 and that the shell manages it as “job” number 2. These numbers will probably differ on your system.

Syntactically, the & really acts like a semicolon and can therefore serve as a separator between commands. See Exercise 8.14.
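In scripts, asynchronous execution is often combined with the special parameter $!, which holds the process ID of the most recently started background process, and the bash built-in wait, which blocks until that process has finished and then returns its exit code. A sketch with a hypothetical function, run_in_background:

```shell
#!/bin/bash
# run_in_background: hypothetical demo of asynchronous execution
run_in_background() {
  sleep 1 &                  # start a child process without waiting for it
  local pid=$!               # $! is the PID of the most recent background process
  echo "started background process $pid; the shell continues at once"
  wait "$pid"                # now block until that process has finished ...
  echo "background process $pid finished with exit code $?"   # ... and report its exit code
}

run_in_background
```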

Here are some hints for successful background process operation:

• The background process should not expect keyboard input, since the shell cannot determine to which process (foreground or background) any keyboard input should be assigned. If necessary, input can be taken from a file. This is covered more extensively in Chapter 7.

• The background process should not direct output to the terminal, since this output may be mixed up with the output of foreground processes or discarded altogether. Again, there is more about this in Chapter 7.

• If the parent process (the shell) is aborted, all its children (and consequently their children etc.) will in many cases be terminated as well. Only processes that completely disavow their parents are exempt from this; this applies, e. g., to processes that perform system services in the background.
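Following the first two hints, a background job can take its standard input from a file and send its standard output and error output to files (redirection is covered in Chapter 7). A sketch; the file names input.txt, sorted.txt and sort.log are hypothetical and live in a scratch directory created by mktemp -d.

```shell
#!/bin/bash
# A background job that neither reads from nor writes to the terminal.
# input.txt, sorted.txt and sort.log are hypothetical names in a scratch directory.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' >"$workdir/input.txt"

# stdin from a file, stdout and stderr (2>) into files, then run asynchronously
sort <"$workdir/input.txt" >"$workdir/sorted.txt" 2>"$workdir/sort.log" &

echo "sort is running in the background as process $!"
wait $!                      # wait for it before using its result
cat "$workdir/sorted.txt"
```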

When several processes are executed in the background from the same shell, it is easy to lose track. Therefore the shell provides an (internal) command, jobs, that you can use to find out about the state of background processes. If jobs is invoked without options, its output consists of a list of job numbers, process states and command lines. This looks approximately like the following:

$ jobs
[1] Done sleep
$ _

In this case, job number 1 has already nished (“Done”), otherwise the message

“

Running

” would have appeared. The

jobs

command supports various options, the

most important of which are shown in Table 8.3.
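The options from Table 8.3 can be tried out with a throwaway background job. A sketch, assuming bash (jobs is a shell built-in, so its output format may differ in other shells); the job is cleaned up with kill and wait at the end.

```shell
#!/bin/bash
# A throwaway background job for trying out the jobs options from Table 8.3
sleep 30 &
bg_pid=$!

jobs        # job number, state and command line
jobs -l     # the same, with the PID added
jobs -p     # just the PID, the same number as in $!
echo "PID according to \$!: $bg_pid"

kill "$bg_pid"                         # clean up: terminate the job ...
wait "$bg_pid" 2>/dev/null || true     # ... and collect its (non-zero) exit status
```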

The shell makes it possible to stop a foreground process using Ctrl+z. This process is displayed by jobs with a “Stopped” status and can be continued as a background process using the bg command. (Otherwise, processes stay stopped until hell freezes over or the next system restart, whichever occurs earlier.) For example, “bg %5” will send job 5 to the background, where it will continue to run.

Conversely, you can select one of a number of background processes and fetch it back to the foreground using the fg command. The syntax of the fg command is equivalent to that of the bg command.

You can terminate a foreground process from the shell with the Ctrl+c key sequence. A background process can be terminated directly using the kill command followed by a job number with a leading percent character (as with bg and fg).
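Behind the scenes, job control works with signals: Ctrl+z sends the foreground job the TSTP signal, and bg and fg continue a stopped job by sending it CONT. Since job control is normally switched off in scripts, the following sketch imitates these steps with kill directly. It uses STOP instead of TSTP because STOP, unlike TSTP, cannot be caught or ignored, and it reads the process state from /proc (see Chapter 9) via a hypothetical helper, proc_state.

```shell
#!/bin/bash
# proc_state PID: hypothetical helper printing a process's state letter from
# /proc/PID/stat (third field: R running, S sleeping, T stopped, ...)
proc_state() {
  awk '{ print $3 }' "/proc/$1/stat"
}

sleep 30 &
pid=$!

kill -STOP "$pid"    # stop the process, much like Ctrl+z does for a foreground job
sleep 0.5            # give the kernel a moment
echo "after STOP: $(proc_state "$pid")"    # T: stopped

kill -CONT "$pid"    # continue it, which is what bg and fg do for a stopped job
sleep 0.5
echo "after CONT: $(proc_state "$pid")"    # S: sleeping, i.e. running normally again

kill "$pid"          # terminate it (in an interactive shell: kill %N)
wait "$pid" 2>/dev/null || true
```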

Exercises

8.13 [2] Use a suitably spectacular program (such as the OpenGL demo gears under X11 in the SUSE distributions or, alternatively, “xclock -update 1”) to experiment with background processes and job control. Make sure that you are able to start background processes, to stop foreground processes using Ctrl+z and send them to the background using bg, to list background processes using jobs, and so on.

C8.14 [3] Describe (and explain) the dierences between the following three

command lines:

$ sleep 5 ; sleep 5

$ sleep 5 ; sleep 5 &

$ sleep 5 & sleep 5 &

Commands in this Chapter

.         Reads a file containing shell commands as if they had been entered on the command line   bash(1)   128
bg        Continues a (stopped) process in the background   bash(1)   134
date      Displays the date and time   date(1)   120
env       Outputs the process environment, or starts programs with an adjusted environment   env(1)   122
export    Defines and manages environment variables   bash(1)   121
fg        Fetches a background process back to the foreground   bash(1)   134
gears     Displays turning gears on X11   gears(1)   135
hash      Shows and manages “seen” commands in bash   bash(1)   123
history   Displays recently used bash command lines   bash(1)   125
jobs      Reports on background jobs   bash(1)   134
kill      Terminates a background process   bash(1), kill(1)   135
set       Manages shell variables and options   bash(1)   122
source    Reads a file containing shell commands as if they had been entered on the command line   bash(1)   128
test      Evaluates logical expressions on the command line   test(1), bash(1)   130
unset     Deletes shell or environment variables   bash(1)   122
whereis   Searches executable programs, manual pages, and source code for given programs   whereis(1)   123
which     Searches programs along PATH   which(1)   123
xclock    Displays a graphical clock   xclock(1x)   135


Summary

• The sleep command waits for the number of seconds specified as the argument.
• The echo command outputs its arguments.
• The date and time may be determined using date.
• Various bash features support interactive use, such as command and file name autocompletion, command line editing, alias names and variables.
• External programs can be started asynchronously in the background. The shell then immediately prints another command prompt.


The File System

Contents

9.1 Terms . . . . . . . . . . . . . . . . . . . . . . . . . 138
9.2 File Types . . . . . . . . . . . . . . . . . . . . . . . 138
9.3 The Linux Directory Tree . . . . . . . . . . . . . . . . 139
9.4 Directory Tree and File Systems . . . . . . . . . . . . . 147
9.5 Removable Media . . . . . . . . . . . . . . . . . . . . 148

Goals

• Understanding the terms “file” and “file system”
• Recognising the different file types
• Knowing your way around the directory tree of a Linux system
• Knowing how external file systems are integrated into the directory tree

Prerequisites

• Basic Linux knowledge (from the previous chapters)
• Handling files and directories (Chapter 6)



Table 9.1: Linux le types

Type

ls -l ls -F

Create using …

plain le

- name

diverse programs

directory

d name/ mkdir

symbolic link

l name@ ln -s

device le

c name mknod

FIFO (named pipe)

p name| mkfifo

Unix-domain socket

s name=

no command

9.1 Terms

Generally speaking, a le is a self-contained collection of data. There is no re-file

striction on the type of the data within the le; a le can be a text of a few letters

or a multi-megabyte archive containing a user’s complete life works. Files do not

need to contain plain text. Images, sounds, executable programs and lots of other

things can be placed on a storage medium as les. To guess at the type of data

contained in a le you can use the

file

command:

$ file /bin/ls /usr/bin/groups /etc/passwd
/bin/ls: ELF 32-bit LSB executable, Intel 80386,
  version 1 (SYSV), for GNU/Linux 2.4.1,
  dynamically linked (uses shared libs), for GNU/Linux 2.4.1, stripped
/usr/bin/groups: Bourne shell script text executable
/etc/passwd: ASCII text

file guesses the type of a file based on rules in the /usr/share/file directory. /usr/share/file/magic contains a clear-text version of the rules. You can define your own rules by putting them into the /etc/magic file. Check magic(5) for details.
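The rules in the magic file mostly describe characteristic byte sequences (“magic numbers”) at fixed positions in a file. As an illustration of a single such rule, here is a sketch of a hypothetical function, is_elf: ELF executables such as /bin/ls start with the byte 0x7f followed by the letters E, L and F. The real file command knows a great many such rules and far more elaborate tests.

```shell
#!/bin/bash
# is_elf FILE: hypothetical helper imitating one magic(5)-style rule.
# It succeeds if the first four bytes are 7f 45 4c 46, i.e. 0x7f followed by "ELF".
is_elf() {
  local magic
  magic=$(head -c 4 -- "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "7f454c46" ]
}

for f in /bin/sh /etc/passwd
do
  if is_elf "$f"
  then
    echo "$f: ELF file"
  else
    echo "$f: not an ELF file"
  fi
done
```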

To function properly, a Linux system normally requires several thousand dierent

les. Added to that are the many les created and owned by the system’s various

users.

Ale system determines the method of arranging and managing data on afile system

storage medium. A hard disk basically stores bytes that the system must be able

to nd again somehow—and as eciently and exibly as possible at that, even

for very huge les. The details of le system operation may dier (Linux knows

lots of dierent le systems, such as

ext2

ext3

ext4

, ReiserFS, XFS, JFS, btrfs, …)

but what is presented to the user is largely the same: a tree-structured hierarchy

of le and directory names with les of dierent types. (See also Chapter 6.)

BIn the Linux community, the term “le system” carries several meanings. In

addition to the meaning presented here—“method of arranging bytes on a

medium”—, a le system is often considered what we have been calling a

“directory tree”. In addition, a specic medium (hard disk partition, USB

key, …) together with the data on it is often called a “le system”—in the

sense that we say, for example, that hard links (Section 6.4.2) do not work

“across le system boundaries”, that is, between two dierent partitions on

hard disk or between the hard disk and a USB key.

9.2 File Types

Linux systems subscribe to the basic premise “Everything is a le”. This may seem

confusing at rst, but is a very useful concept. Six le types may be distinguished

in principle:


Plain les This group includes texts, graphics, sound les, etc., but also exe-

cutable programs. Plain les can be generated using the usual tools like

editors,

cat

, shell output redirection, and so on.

Directories Also called “folders”; their function, as we have mentioned, is to help

structure storage. A directory is basically a table giving le names and as-

sociated inode numbers. Directories are created using the

mkdir

command.

Symbolic links Contain a path specication redirecting accesses to the link to

a dierent le (similar to “shortcuts” in Windows). See also Section 6.4.2.

Symbolic links are created using

ln -s

Device les These les serve as interfaces to arbitrary devices such as disk drives.

For example, the le

/dev/fd0

represents the rst oppy drive. Every write

or read access to such a le is redirected to the corresponding device. De-

vice les are created using the

mknod

command; this is usually the system

administrator’s prerogative and is thus not explained in more detail in this

manual.

FIFOs Often called “named pipes”. Like the shell’s pipes, they allow the direct

communication between processes without using intermediate les. A pro-

cess opens the FIFO for writing and another one for reading. Unlike the

pipes that the shell uses for its pipelines, which behave like les from a pro-

gram’s point of view but are “anonymous”—they do not exist within the le

system but only between related processes—, FIFOs have le names and can

thus be opened like les by arbitrary programs. Besides, FIFOs may have

access rights (pipes may not). FIFOs are created using the

mkfifo

command.

Unix-domain sockets Like FIFOs, Unix-domain sockets are a method of inter-

process communication. They use essentially the same programming in-

terface as “real” network communications across TCP/IP, but only work

for communication peers on the same computer. On the other hand, Unix-

domain sockets are considerably more ecient than TCP/IP. Unlike FIFOs,

Unix-domain sockets allow bi-directional communications—both partici-

pating processes can send as well as receive data. Unix-domain sockets are

used, e. g., by the X11 graphic system, if the X server and clients run on the

same computer. There is no special program to create Unix-domain sockets.
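A FIFO can be tried out with nothing but the shell. The sketch below creates one in a scratch directory made by mktemp -d, writes a line into it from a background process (opening a FIFO for writing blocks until another process opens it for reading) and reads the line back in the foreground.

```shell
#!/bin/bash
# Create a FIFO in a scratch directory, write to it in the background, read it back
dir=$(mktemp -d)
mkfifo "$dir/pipe"
[ -p "$dir/pipe" ] && echo "$dir/pipe is a FIFO"    # test -p checks for FIFOs

# Opening a FIFO for writing blocks until some process opens it for reading,
# so the writer has to run in the background
echo "hello through the FIFO" >"$dir/pipe" &
read -r line <"$dir/pipe"
echo "received: $line"

wait              # let the writer finish
rm -r "$dir"      # a FIFO is removed like any other file
```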

Exercises

C9.1 [3] Check your system for examples of the various le types. (Table 9.1

shows you how to recognise the les in question.)

9.3 The Linux Directory Tree

A Linux system consists of hundreds of thousands of les. In order to keep track,

there are certain conventions for the directory structure and the les comprising a

Linux system, the Filesystem Hierarchy Standard (FHS). Most distributions adhere FHS

to this standard (possibly with small deviations). The FHS describes all directories

immediately below the le system’s root as well as a second level below

/usr

The le system tree starts at the root directory, “

” (not to be confused with root directory

/root

, the home directory of user

root

). The root directory contains either just sub-

directories or else additionally, if no

/boot

directory exists, the operating system

kernel.

You can use the “ls -la /” command to list the root directory’s subdirectories. The result should look similar to Figure 9.1. The individual subdirectories follow the FHS and therefore contain approximately the same files on every distribution. We shall now take a closer look at some of the directories:


$ cd /
$ ls -l
total 125
drwxr-xr-x  2 root root  4096 Dec 20 12:37 bin
drwxr-xr-x  2 root root  4096 Jan 27 13:19 boot
lrwxrwxrwx  1 root root    17 Dec 20 12:51 cdrecorder -> /media/cdrecorder
lrwxrwxrwx  1 root root    12 Dec 20 12:51 cdrom -> /media/cdrom
drwxr-xr-x 27 root root 49152 Mar  4 07:49 dev
drwxr-xr-x 40 root root  4096 Mar  4 09:16 etc
lrwxrwxrwx  1 root root    13 Dec 20 12:51 floppy -> /media/floppy
drwxr-xr-x  6 root root  4096 Dec 20 16:28 home
drwxr-xr-x  6 root root  4096 Dec 20 12:36 lib
drwxr-xr-x  6 root root  4096 Feb  2 12:43 media
drwxr-xr-x  2 root root  4096 Mar 21  2002 mnt
drwxr-xr-x 14 root root  4096 Mar  3 12:54 opt
dr-xr-xr-x 95 root root     0 Mar  4 08:49 proc
drwx------ 11 root root  4096 Mar  3 16:09 root
drwxr-xr-x  4 root root  4096 Dec 20 13:09 sbin
drwxr-xr-x  6 root root  4096 Dec 20 12:36 srv
drwxrwxrwt 23 root root  4096 Mar  4 10:45 tmp
drwxr-xr-x 13 root root  4096 Dec 20 12:55 usr
drwxr-xr-x 17 root root  4096 Dec 20 13:02 var

Figure 9.1: Content of the root directory (SUSE)

There is considerable consensus about the FHS, but it is just as “binding” as anything on Linux, i. e., not that much. On the one hand, there certainly are Linux systems (for example the one on your broadband router or PVR) that are mostly touched only by the manufacturer and where conforming to every nook and cranny of the FHS does not gain anything. On the other hand, you may do whatever you like on your own system, but must be prepared to bear the consequences: your distributor promises to keep to its side of the FHS bargain, but also expects you not to complain if you are not playing completely by the rules and problems do occur. For example, if you install a program in /usr/bin and the file in question gets overwritten during the next system upgrade, this is your own fault since, according to the FHS, you are not supposed to put your own programs into /usr/bin (/usr/local/bin would have been correct).

The Operating System Kernel—

/boot

The /boot directory contains the actual operating system: vmlinuz is the Linux kernel. In the /boot directory there are also other files required for the boot loader (usually GRUB).

On some systems, /boot is placed on its own separate partition. This can be necessary if the actual file system is encrypted or otherwise difficult for the boot loader to reach, possibly because special drivers are required to access a hardware RAID system.

General Utilities—

/bin

In /bin there are the most important executable programs (mostly system programs) which are necessary for the system to boot. This includes, for example, mount and mkdir. Many of these programs, such as grep, are so essential that they are needed not just during system startup, but also when the system is running. /bin also contains programs that are necessary to get a damaged system running again if only the file system containing the root directory is available. Additional programs that are not required on boot or for system repair can be found in /usr/bin.

Special System Programs—

/sbin

Like /bin, /sbin contains programs that are necessary to boot or repair the system. However, for the most part these are system configuration tools that can really be used only by root. “Normal” users can use some of these programs to query the system, but can’t change anything. As with /bin, there is a directory called /usr/sbin containing more system programs.

System Libraries—

/lib

This is where the “shared libraries” used by programs in /bin and /sbin reside, as files and (symbolic) links. Shared libraries are pieces of code that are used by various programs. Such libraries save a lot of resources, since many processes use the same basic parts, and these basic parts must then be loaded into memory only once; in addition, it is easier to fix bugs in such libraries when they are in the system just once and all programs fetch the code in question from one central file. Incidentally, below /lib/modules there are kernel modules, i. e., kernel code which is not necessarily in use, such as device drivers, file systems, or network protocols. These modules can be loaded by the kernel when they are needed, and in many cases also be removed after use.
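Which shared libraries a program uses can be listed with ldd, which comes with the C library on most Linux distributions. The sketch also checks for the module directory of the running kernel, whose version is printed by uname -r; inside containers this directory is often missing.

```shell
#!/bin/bash
# Which shared libraries does ls use? (ldd ships with the C library on most systems)
ldd /bin/ls

# Kernel modules of the running kernel live below /lib/modules/<kernel version>
modules_dir="/lib/modules/$(uname -r)"
if [ -d "$modules_dir" ]
then
  echo "modules for this kernel: $modules_dir"
else
  echo "no $modules_dir here (common inside containers)"
fi
```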

Device Files—

/dev

This directory and its subdirectories contain a plethora of entries for device files. Device files form the interface between the shell (or, generally, the part of the system that is accessible to command-line users or programmers) and the device drivers inside the kernel. They have no “content” like other files, but refer to a driver within the kernel via “device numbers”.

In former times it was common for Linux distributors to include an entry in /dev for every conceivable device. So even a laptop Linux system included the device files required for ten hard disks with 63 partitions each, eight ISDN adapters, sixteen serial and four parallel interfaces, and so on. Today the trend is away from overfull /dev directories with one entry for every imaginable device and towards systems more closely tied to the running kernel, which only contain entries for devices that actually exist. The magic word in this context is udev (short for userspace /dev), which will be discussed in more detail in Linux Administration I.

Linux distinguishes between character devices and block devices. A character device is, for instance, a terminal, a mouse or a modem: a device that provides or processes single characters. A block device treats data in blocks; this includes hard disks or floppy disks, where bytes cannot be read singly but only in groups of 512 (or some such). Depending on their flavour, device files are labelled in “ls -l” output with a “c” or “b”:

crw-rw-rw- 1 root root 10, 4 Oct 16 11:11 amigamouse

brw-rw---- 1 root disk 8, 1 Oct 16 11:11 sda1

brw-rw---- 1 root disk 8, 2 Oct 16 11:11 sda2

crw-rw-rw- 1 root root 1, 3 Oct 16 11:11 null

Instead of the le length, the list contains two numbers. The rst is the “major

device number” specifying the device’s type and governing which kernel driver

is in charge of this device. For example, all SCSI hard disks have major device

number 8. The second number is the “minor device number”. This is used by the

driver to distinguish between dierent similar or related devices or to denote the

various partitions of a disk.

There are several notable pseudo devices. The null device, /dev/null, is like a “dust bin” for program output that is not actually required, but must be directed somewhere. With a command like

$ program >/dev/null


the program’s standard output, which would otherwise be displayed on the terminal, is discarded. If /dev/null is read, it pretends to be an empty file and returns end-of-file at once. /dev/null must be accessible to all users for reading and writing.
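Both properties of /dev/null can be checked in a few lines. A minimal sketch:

```shell
#!/bin/bash
# Anything written to /dev/null is thrown away ...
echo "this line disappears" >/dev/null

# ... and reading from it hits end-of-file immediately, so zero bytes arrive
bytes=$(wc -c </dev/null)
echo "bytes read from /dev/null: $bytes"

# It is a character device that everybody may read and write
ls -l /dev/null
```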

The “devices” /dev/random and /dev/urandom return random bytes of “cryptographic quality” that are created from “noise” in the system, such as the intervals between unpredictable events like key presses. Data from /dev/random is suitable for creating keys for common cryptographic algorithms. The /dev/zero file returns an unlimited supply of null bytes; you can use these, for example, to create or overwrite files with the dd command.
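Two of these devices in action: dd copies a fixed number of blocks from /dev/zero into a file (here 1024 blocks of 1024 bytes, i.e., 1 MiB), and head -c reads a few bytes from /dev/urandom, which od then prints in hexadecimal. A sketch; the output file is a scratch file created by mktemp.

```shell
#!/bin/bash
# Create a 1 MiB file consisting of null bytes: 1024 blocks of 1024 bytes from /dev/zero
zeros=$(mktemp)
dd if=/dev/zero of="$zeros" bs=1024 count=1024 2>/dev/null   # 2>/dev/null hides dd's statistics
ls -l "$zeros"

# Read 16 random bytes from /dev/urandom and print them in hexadecimal
head -c 16 /dev/urandom | od -An -tx1
```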

Configuration Files—

/etc

The /etc directory is very important; it contains the configuration files for most programs. The files /etc/inittab and /etc/init.d/*, for example, contain most of the system-specific data required to start system services. Here is a more detailed description of the most important files. Except for a few of them, only user root has write permission, but everyone may read them.

/etc/fstab This describes all mountable file systems and their properties (type, access method, “mount point”).

/etc/hosts This file is one of the configuration files of the TCP/IP network. It maps the names of network hosts to their IP addresses. In small networks and on freestanding hosts this can replace a name server.

/etc/inittab The /etc/inittab file is the configuration file for the init program and thus for the system start.

/etc/init.d/* This directory contains the “init scripts” for various system services. These are used to start up or shut down system services when the system is booted or switched off.

On Red Hat distributions, this directory is called /etc/rc.d/init.d.

/etc/issue This file contains the greeting that is output before a user is asked to log in. […] name of the vendor.

/etc/motd This file contains the “message of the day” that appears after a user has successfully logged in. The system administrator can use this file to notify users of important facts and events.¹

/etc/mtab This is a list of all mounted file systems including their mount points. /etc/mtab differs from /etc/fstab in that it contains all currently mounted file systems, while /etc/fstab contains only settings and options for file systems that might be mounted, typically on system boot but also later. Even that list is not exhaustive, since you can mount file systems via the command line where and how you like.

BWe’re really not supposed to put that kind of information in a le

within

/etc

, where les ought to be static. Apparently, tradition has

carried the day here.

/etc/passwd This file contains a list of all users that are known to the system, together with various items of user-specific information. In spite of the name of the file, on modern systems the passwords are not stored in this file but in another one called /etc/shadow. Unlike /etc/passwd, that file is not readable by normal users.

¹ There is a well-known claim that the only thing all Unix systems in the world have in common is the “message of the day” asking users to remove unwanted files since all the disks are 98% full.


Accessories—

/opt

This directory is really intended for third-party software: complete packages prepared by vendors that are supposed to be installable without conflicting with distribution files or locally-installed files. Such software packages occupy a subdirectory /opt/⟨package⟩. By rights, the /opt directory should be completely empty after a distribution has been installed on an empty disk.

“Unchanging Files”—

/usr

In /usr there are various subdirectories containing programs and data files that are not essential for booting or repairing the system or otherwise indispensable. The most important directories include:

/usr/bin System programs that are not essential for booting or otherwise important

/usr/sbin More system programs for root

/usr/lib Further libraries (not used for programs in /bin or /sbin)

/usr/local Directory for files installed by the local system administrator. Corresponds to the /opt directory in that the distribution may not put anything here

/usr/share Architecture-independent data. In principle, a Linux network consisting, e. g., of Intel, SPARC and PowerPC hosts could share a single copy of /usr/share on a central server. However, today disk space is so cheap that no distribution takes the trouble of actually implementing this.

/usr/share/doc

Documentation, e. g., HOWTOs

/usr/share/info

Info pages

/usr/share/man

Manual pages (in subdirectories)

/usr/src

Source code for the kernel and other programs (if available)

The name /usr is often erroneously considered an acronym of “Unix system resources”. Originally this directory derives from the time when computers often had a small, fast hard disk and another one that was bigger but slower. All the frequently-used programs and files went to the small disk, while the big disk (mounted as /usr) served as a repository for files and programs that were either less frequently used or too big. Today this separation can be exploited in another way: with care, you can put /usr on its own partition and mount that partition “read-only”. It is even possible to import /usr from a remote server, even though the falling prices for disk storage no longer make this necessary (the common Linux distributions do not support this, anyway).

A Window into the Kernel—

/proc

This is one of the most interesting and important directories. /proc is really a “pseudo file system”: it does not occupy space on disk, but its subdirectories and files are created by the kernel if and when someone is interested in their content. You will find lots of data about running processes as well as other information the kernel possesses about the computer’s hardware. For instance, in some files you will find a complete hardware analysis. The most important files include:

/proc/cpuinfo This contains information about the CPU’s type and clock frequency.

/proc/devices This is a complete list of devices supported by the kernel including their major device numbers. This list is consulted when device files are created.

/proc/dma A list of DMA channels in use. On today’s PCI-based systems this is neither very interesting nor important.


/proc/interrupts A list of all hardware interrupts in use. This contains the interrupt number, number of interrupts triggered and the drivers handling that particular interrupt. (An interrupt occurs in this list only if there is a driver in the kernel claiming it.)

/proc/ioports Like /proc/interrupts, but for I/O ports.

/proc/kcore This file is conspicuous for its size. It makes available the computer’s complete RAM and is required for debugging the kernel. This file requires root privileges for reading. You do well to stay away from it!

/proc/loadavg This file starts with three numbers measuring the CPU load during the last 1, 5 and 15 minutes (followed by some further values). These values are usually output by the uptime program.

/proc/meminfo Displays the memory and swap usage. This file is used by the free program.

/proc/mounts Another list of all currently mounted file systems, mostly identical to /etc/mtab.

/proc/scsi In this directory there is a file called scsi listing the available SCSI devices. There is another subdirectory for every type of SCSI host adapter in the system containing a file 0 (1, …, for multiple adapters of the same type) giving information about the SCSI adapter.

/proc/version Contains the version number and compilation date of the current kernel.
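Because these files are plain text, ordinary shell tools suffice to read them. A sketch that takes apart /proc/loadavg with read (the remaining fields land in a throwaway variable) and picks two lines from /proc/meminfo with grep -E:

```shell
#!/bin/bash
# The first three fields of /proc/loadavg are the 1, 5 and 15 minute load averages
read -r load1 load5 load15 _ </proc/loadavg
echo "load averages: $load1 (1 min), $load5 (5 min), $load15 (15 min)"

# Total and free memory from /proc/meminfo (the kernel reports these in kB)
grep -E '^(MemTotal|MemFree):' /proc/meminfo

# Version and compilation date of the running kernel
cat /proc/version
```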

Back when /proc had not been invented, programs like the process status display tool, ps, which had to access kernel information, needed to include considerable knowledge about internal kernel data structures as well as the appropriate access rights to read the data in question from the running kernel. Since these data structures used to change fairly rapidly, it was often necessary to install a new version of these programs along with a new version of the kernel. The /proc file system serves as an abstraction layer between these internal data structures and the utilities: today you just need to ensure that after an internal change the data formats in /proc remain the same, and ps and friends continue working as usual.

Hardware Control—

/sys

The Linux kernel has featured this directory since version 2.6. Like /proc, it is made available on demand by the kernel itself and offers, in an extensive hierarchy of subdirectories, a consistent view of the available hardware. It also supports management operations on the hardware via various special files.

Theoretically, all entries in /proc that have nothing to do with individual processes should slowly migrate to /sys. When this strategic goal is going to be achieved, however, is anybody’s guess.

Dynamically Changing Files—

/var

This directory contains dynamically changing files, distributed across different directories. When executing various programs, the user often creates data (frequently without being aware of the fact). For example, the man command causes compressed manual page sources to be uncompressed, while formatted man pages may be kept around for a while in case they are required again soon. Similarly, when a document is printed, the print data must be stored before being sent to the printer, e. g., in /var/spool/cups. Files in /var/log record login and logout times and other system events (the “log files”), /var/spool/cron contains information about regular automatic command invocations, and users’ unread electronic mail is kept in /var/mail.


Just so you have heard about it once (it might be on the exam): on Linux, the system log files are generally handled by the “syslog” service. A program called syslogd accepts messages from other programs and sorts these according to their origin and priority (from “debugging help” to “error” and “emergency, system is crashing right now”) into files below /var/log, where you can find them later on. Besides files, the syslog service can also write its messages elsewhere, such as to the console or via the network to another computer serving as a central “management station” that consolidates all log messages from your data center.

Besides syslogd, some Linux distributions also contain a klogd service. Its job is to accept messages from the operating system kernel and to pass them on to syslogd. Other distributions do not need a separate klogd since their syslogd can do that job itself.

The Linux kernel emits all sorts of messages even before the system is booted far enough to run syslogd (and possibly klogd) to accept them. Since these messages might still be important, the Linux kernel stores them internally, and you can access them using the dmesg command.

Transient Files—

/tmp

Many utilities require temporary file space, for example some editors or sort. In /tmp, all programs can deposit temporary data. Many distributions can be set up to clean out /tmp when the system is booted; thus you should not put anything of lasting importance there.

According to tradition, /tmp is emptied during system startup but /var/tmp isn’t. You should check what your distribution does.
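For scripts that need temporary file space, a common pattern is sketched below (an illustration, not the manual’s own example): mktemp creates a file with a unique name that only its owner can access, and a trap removes it again when the script exits, so nothing is left behind in /tmp. Honouring the TMPDIR variable with /tmp as fallback, and the name prefix example, are assumptions of this sketch.

```shell
#!/bin/sh
# Create a private temporary file below $TMPDIR (default /tmp) and make sure
# it is removed again when the script exits, even after an error.
tmpfile=$(mktemp "${TMPDIR:-/tmp}/example.XXXXXX") || exit 1
trap 'rm -f "$tmpfile"' EXIT

# Stand-in for real work: collect data in the temporary file, then use it.
printf 'pear\napple\nfig\n' > "$tmpfile"
sort "$tmpfile"
```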

Server Files—

/srv

Here you will find files offered by various server programs, such as:

drwxr-xr-x 2 root root 4096 Sep 13 01:14 ftp

drwxr-xr-x 5 root root 4096 Sep 9 23:00 www

This directory is a relatively new invention, and it is quite possible that it does not yet exist on your system. Unfortunately there is no other obvious place for web pages, an FTP server’s documents, etc., that the FHS authors could agree on (the actual reason for the introduction of /srv), so that on a system without /srv these files could end up somewhere completely different, e. g., in subdirectories of /usr/local or /var.

Access to CD-ROM or Floppies—

/media

This directory is often generated automatically; it contains additional empty directories, like /media/cdrom and /media/floppy, that can serve as mount points for CD-ROMs and floppies. Depending on your hardware setup you should feel free to add further directories such as /media/dvd, if these make sense as mount points and have not been preinstalled by your distribution vendor.

Access to Other Storage Media—

/mnt

This directory (also empty) serves as a mount point for short-term mounting of additional storage media. With some distributions, such as those by Red Hat, media mount points for CD-ROM, floppy, … might show up here instead of below /media.

User Home Directories—

/home

This directory contains the home directories of all users except root (whose home directory is located elsewhere).

If you have more than a few hundred users, it is sensible, for privacy protection and efficiency, not to keep all home directories as immediate children of /home. You could, for example, use the users’ primary group as a criterion for further subdivision:

/home/support/jim
/home/develop/bob

Table 9.2: Directory division according to the FHS

          static                    dynamic
local     /etc, /bin, /sbin, /lib   /dev, /var/log
remote    /usr, /opt                /home, /var/mail



Administrator’s Home Directory—

/root

The system administrator’s home directory is located in /root. This is a completely normal home directory similar to that of the other users, with the marked difference that it is not located below /home but immediately below the root directory (/). The reason for this is that /home is often located on a file system on a separate partition or hard disk. However, root must be able to access their own user environment even if the separate /home file system is not accessible for some reason.

Lost property—

lost+found

(ext file systems only; not mandated by FHS.) This directory is used for files that look reasonable but do not seem to belong to any directory. The file system consistency checker creates links to such files in the lost+found directory on the same file system, so the system administrator can figure out where the file really belongs; lost+found is created “on the off-chance” for the file system consistency checker to find in a fixed place (by convention, on the ext file systems, it always uses inode number 11).

Another motivation for the directory arrangement is as follows: the FHS divides files and directories roughly according to two criteria: do they need to be available locally or can they reside on another computer and be accessed via the network, and are their contents static (do files only change by explicit administrator action) or do they change while the system is running? (See Table 9.2.)

The idea behind this division is to simplify system administration: directories can be moved to file servers and maintained centrally. Directories that do not contain dynamic data can be mounted read-only and are more resilient to crashes.

Exercises

C9.2 [1] How many programs does your system contain in the “usual”

places?

C9.3 If grep is called with more than one file name on the command line, it outputs the name of the file in question in front of every matching line. This is possibly a problem if you invoke grep with a shell wildcard pattern (such as “*.txt”), since the exact format of the grep output cannot be foreseen, which may mess up programs further down the pipeline. How can you enforce output of the file name, even if the search pattern expands to a single file name only? (Hint: There is a very useful “file” in /dev.)

C9.4 The “cp foo.txt /dev/null” command does basically nothing, but the “mv foo.txt /dev/null” command, assuming suitable access permissions, replaces /dev/null by foo.txt. Why?


C9.5 [2] On your system, which (if any) software packages are installed below /opt? Which ones are supplied by the distribution and which ones are third-party products? Should a distribution install a “teaser version” of a third-party product below /opt or elsewhere? What do you think?

C9.6 [1] Why is it inadvisable to make backup copies of the directory tree rooted at /proc?

9.4 Directory Tree and File Systems

A Linux system’s directory tree usually extends over more than one partition on disk, and removable media like CD-ROM disks, USB keys as well as portable MP3 players, digital cameras and so on must be taken into account. If you know your way around Microsoft Windows, you are probably aware that this problem is solved there by identifying different “drives” by means of letters. On Linux, all available disk partitions and media are integrated in the directory tree starting at “/”.

In general, nothing prevents you from installing a complete Linux system on a single hard disk partition. However, it is common to put at least the /home directory, where users’ home directories reside, on its own partition. The advantage of this approach is that you can re-install the actual operating system, your Linux distribution, completely from scratch without having to worry about the safety of your own data (you simply need to pay attention at the correct moment, namely when you pick the target partition(s) for the installation in your distribution’s installer). This also simplifies the creation of backup copies.

On larger server systems it is also quite usual to assign other directories, typically /tmp, /var/tmp, or /var/spool, their own partitions. The goal is to prevent users from disturbing system operations by filling important partitions completely. For example, if /var is full, no log messages can be written to disk, so we want to keep users from filling up the file system with large amounts of unread mail, unprinted print jobs, or giant files in /var/tmp. On the other hand, all these partitions tend to clutter up the system.

More information and strategies for partitioning are presented in the Linup Front training manual, Linux Administration I.

The /etc/fstab file describes how the system is assembled from various disk partitions. During startup, the system arranges for the various file systems to be made available (the Linux insider says “mounted”) in the correct places, which you as a normal user do not need to worry about. What you may in fact be interested in, though, is how to access your CD-ROM disks and USB keys, and these need to be mounted, too. Hence we do well to cover this topic briefly even though it is really administrator country.

To mount a medium, you require both the name of the device file for the medium (usually a block device such as /dev/sda1) and a directory somewhere in the directory tree where the content of the medium should appear, the so-called mount point. This can be any directory.

The directory doesn’t even have to be empty, although you cannot access the original content once you have mounted another medium “over” it. (The content reappears after you unmount the medium.)

In principle, somebody could mount a removable medium over an important system directory such as /etc (ideally with a file called passwd containing a root entry without a password). This is why mounting file systems in arbitrary places within the directory tree is restricted to the system administrator, who will have no need for shenanigans like these, as they are already root.


Earlier on, we called the “device file for the medium” /dev/sda1. This is really the first partition on the first SCSI disk drive in the system; the real name may be completely different depending on the type of medium you are using. Still it is an obvious name for USB keys, which for technical reasons are treated by the system as if they were SCSI devices.

With this information (device name and mount point) a system administrator can mount the medium as follows:

# mount /dev/sda1 /media/usb

This means that a file called file on the medium would appear as /media/usb/file in the directory tree. With a command such as

# umount /media/usb

(note the spelling: umount, with no “n”) the administrator can also unmount the medium again.

9.5 Removable Media

The explicit mounting of removable media is a tedious business, and the explicit unmounting before removing a medium even more so, but skipping the latter can lead to problems if you physically remove the medium before Linux has completely finished with it. Linux tries to speed up the system by not executing slow operations like writing to media immediately but later, when the “right moment” has arrived. If you pull out your USB key before the data have actually been written to it, you have gained nothing in the best case, and in the worst case the data on it have descended into chaos.

As a user of a graphical desktop interface on a modern Linux system, you have it easy: if you insert or plug in a medium, no matter whether it is an audio CD, USB key, or digital camera, a dialog appears suggesting various interesting actions that you can perform on the medium. “Mounting” is usually one of those, and the system also figures out a nice mount point for you. It is just as easy to remove the medium later by means of an icon on the desktop background or the desktop environment’s control panel. We don’t need to cover this in detail here.

Things look different on the command line, though, where you must mount and unmount removable media explicitly as discussed in the previous section. As we said, as a normal user you are not allowed to do this for arbitrary media in arbitrary places, but only for media that your system administrator has prepared for this, and then only at “pre-cooked” mount points. You can recognise these because they have been marked with the user or users option:

$ grep user /etc/fstab

/dev/hdb /media/cdrom0 udf,iso9660 ro,user,noauto 0 0

/dev/sda1 /media/usb auto user,noauto 0 0

/dev/mmcblk0p1 /media/sd auto user,noauto 0 0

For the details of /etc/fstab entries we need to refer you to the Linup Front training manual, Linux Administration I (O. K., fstab(5) also works, but our manual is nicer); here and now we shall restrict ourselves to pointing out that in our example three types of removable media are available, namely CD-ROM disks (the first entry), USB-based media such as USB keys, digital cameras or MP3 players (the second entry), and SD cards (the third entry). As a “normal user”, you have to stick to the given mount points and can (after inserting the medium in question) say things like:

$ mount /dev/hdb                 # for the CD-ROM
$ mount /media/cdrom0            # ditto
$ mount /dev/sda1                # for the USB key
$ mount /media/sd                # for the SD card

That is, Linux expects either the device name or the mount point; the matching counterpart always derives from the /etc/fstab entry. Unmounting using umount works similarly.
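The grep user /etc/fstab command shown above is a quick check, but it also matches comment lines and options such as user_xattr that merely contain the string “user”. A slightly more careful sketch (user_mountable is a made-up name) looks only at the fourth, comma-separated options field:

```shell
#!/bin/sh
# Print device and mount point of every fstab entry whose options
# (fourth field) include the whole option "user" or "users".
user_mountable() {
    awk '
        /^[ \t]*#/ || NF < 4 { next }   # skip comments, blank and short lines
        {
            n = split($4, opt, ",")
            for (i = 1; i <= n; i++)
                if (opt[i] == "user" || opt[i] == "users") {
                    print $1, $2          # device and mount point
                    break
                }
        }
    ' "${1:-/etc/fstab}"
}

[ -r /etc/fstab ] && user_mountable
```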

The user option in /etc/fstab makes this work (it also produces some other effects that we shall not be treating in detail here). The users option is roughly the same; the difference between the two (and you may want to remember this, as it may occur on the exam) is that, with user, only the user who mounted the file system originally may unmount it again. With users, any user may do so (!). (And root can do it all the time, anyway.)

Exercises

C9.7 [1] Insert a floppy in the drive, mount it, copy a file (like /etc/passwd) to the floppy, and unmount the floppy again. (If your system is “legacy-free” and no longer sports a floppy disk drive, then do the same with a USB key or a similar suitable removable medium.)

Commands in this Chapter

dmesg     Outputs the content of the kernel message buffer                dmesg(8)     145
file      Guesses the type of a file’s content, according to rules        file(1)      138
free      Displays main memory and swap space usage                       free(1)      144
klogd     Accepts kernel log messages                                     klogd(8)     145
mkfifo    Creates FIFOs (named pipes)                                     mkfifo(1)    139
mknod     Creates device files                                            mknod(1)     139
syslogd   Handles system log messages                                     syslogd(8)   145
uptime    Outputs the time since the last system boot as well as the system load averages    uptime(1)    144

Summary

• Files are self-contained collections of data stored under a name. Linux uses the “file” abstraction also for devices and other objects.
• The method of arranging data and administrative information on a disk is called a file system. The same term covers the complete tree-structured hierarchy of directories and files in the system or a specific storage medium together with the data on it.
• Linux file systems contain plain files, directories, symbolic links, device files (two kinds), FIFOs, and Unix-domain sockets.
• The Filesystem Hierarchy Standard (FHS) describes the meaning of the most important directories in a Linux system and is adhered to by most Linux distributions.
• Removable media must be mounted into the Linux directory tree to be accessible, and be unmounted after use. The mount and umount commands are used to do this. Graphical desktop environments usually offer more convenient methods.


System Administration

Contents

10.1 Introductory Remarks . . . . . . . . . . . . . . . . . . 152

10.2 The Privileged root Account . . . . . . . . . . . . . . . . 152

10.3 Obtaining Administrator Privileges . . . . . . . . . . . . . 154

10.4 Distribution-specic Administrative Tools . . . . . . . . . . . 156

Goals

• Reviewing a system administrator’s tasks

• Being able to log on as the system administrator

• Being able to assess the advantages and disadvantages of (graphical) administration tools

Prerequisites

• Basic Linux skills

• Administration skills for other operating systems are helpful


10.1 Introductory Remarks

As a mere user of a Linux system, you are well off: you sit down in front of your computer, everything is configured correctly, all the hardware is supported and works. You have no care in the world since you can call upon a system administrator who will handle all administrative tasks for you promptly and thoroughly (that’s what we wish your environment were like, anyway).

Should you be (or strive to be) the system administrator yourself, within your company or the privacy of your home, then you have your work cut out for you: you must install and configure the system and connect any peripherals. Having done that, you need to keep the system running, for example by checking the system logs for unusual events, regularly getting rid of old log files, making backup copies, installing new software and updating existing programs, and so on.

Today, in the age of Linux distributions with luxurious installation tools, system installation is no longer rocket science. However, an ambitious administrator can spend lots of time mobilising every last resource on their system. In general, system administration mostly takes place when a noticeable change occurs, for example when new hardware or software is to be integrated, new users arrive or existing ones disappear, or hardware problems arise.

Many Linux distributions these days contain specialised tools to facilitate system administration. These tools perform different tasks ranging from user management and creating file systems to complete system updates. Utilities like these can make these tasks a lot easier but sometimes a lot more difficult. Standard procedures are simplified, but for specialised settings you should know the exact relationships between system components. Furthermore, most of these tools are only available for certain distributions.

The administration of a Linux system, as of any other computer system, requires a considerable amount of responsibility and care. You should not see yourself as a demigod (at least) but as a service provider. No matter whether you are the only system administrator (say, on your own computer) or working in a team of colleagues to support a company network: communication is paramount. You should get used to documenting configuration changes and other administrative decisions in order to be able to retrace them later. The Linux way of directly editing text files makes this convenient, since you can comment configuration settings right where they are made (a luxury not usually enjoyed with graphical administration tools). Do so.

10.2 The Privileged root Account

For many tasks, the system administrator needs special privileges. Accordingly, he can make use of a special user account called root. As root, a user is the so-called super user. In brief: he may do anything.

The normal le permissions and security precautions do not apply to

root

. He

has allowing him nearly unbounded access to all data, devices and system compo-unlimited privileges

nents. He can institute system changes that all other users are prohibited from by

the Linux kernel’s security mechanisms. This means that, as

root

, you can change

every le on the system no matter who it belongs to. While normal users cannot

wreak damage (e.g., by destroying le systems or manipulating other users’ les),

root

is not thus constrained.

In many cases, these extensive system administrator privileges are really a liability. For example, when making backup copies it is necessary to be able to read all files on the system. However, this by no means implies that the person making the backup (possibly an intern) should be empowered to open all files on the system with a text editor, to read them or change them, or to start a network service which might be accessible from anywhere in the world. There are various ways of giving out administrator privileges only in controlled circumstances (such as sudo, a system which lets normal users execute certain commands using administrator privileges), of selectively giving particular privileges to individual processes rather than operating on an “all or nothing” principle (cue POSIX capabilities), or of doing away with the idea of an “omnipotent” system administrator completely (for instance, SELinux, “security-enhanced Linux”, a freely available software package by the American intelligence agency, NSA, contains a “role-based” access control system that can get by without an omnipotent system administrator).

Why does Linux contain security precautions in the first place? The most important reason is for users to be able to determine the access privileges that apply to their own files. By setting permission bits (using the chmod command), users can ensure that certain files may be read, written to or executed by certain other users (or by none). This helps safeguard the privacy and integrity of their data. You would certainly not approve of other users being able to read your private e-mail or change the source code of an important program behind your back.
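For example, the following sketch gives a file’s owner read and write access, its group read access, and all other users no access, and then shows the result. It assumes GNU coreutils for stat -c; ls -l displays the same permission string.

```shell
#!/bin/sh
# Restrict access to a (temporary, throw-away) file and show the result.
f=$(mktemp) || exit 1
trap 'rm -f "$f"' EXIT

chmod u=rw,g=r,o= "$f"    # owner: read+write, group: read, others: nothing
stat -c '%A %a %n' "$f"   # prints -rw-r----- 640 followed by the file name
```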

The security mechanisms are also supposed to keep users from damaging the system. Access to many of the device files in /dev corresponding to hardware components such as hard disks is constrained by the system. If normal users could access disk storage directly, all sorts of mayhem might occur (a user might overwrite the complete content of a disk or, having obtained information about the layout of the file system on the disk, access files that are none of his business). Instead, the system forces normal users to access the disks via the file system and protects their data in that way.

It is important to stress that damage is seldom caused on purpose. The system’s security mechanisms serve mostly to save users from unintentional mistakes and misunderstandings; only secondarily are they meant to protect the privacy of users and data.

On the system, users can be pooled into groups to which you may assign their own access privileges. For example, a team of software developers could have read and write permission to a number of files, while other users are not allowed to change these files. Every user can determine for their own files how permissive or restrictive access to them should be.

The security mechanisms also prevent normal users from performing certain actions such as the invocation of specific system calls from a program. For example, there is a system call that will halt the system, which is executed by programs such as shutdown when the system is to be powered down or rebooted. If normal users were allowed to invoke this routine from their own programs, they could inadvertently (or intentionally) stop the system at any time.

The administrator frequently needs to circumvent these security mechanisms in order to maintain the system or install updated software versions. The root account is meant to allow exactly this. A good administrator can do his work without regard for the usual access permissions and other constraints, since these do not apply to root. The root account is not better than a normal user account because it has more privileges; the restriction of these privileges to root is a security measure. Since the operating system’s reasonable and helpful protection and security mechanisms do not apply to the system administrator, working as root is very risky. You should therefore use root to execute only those commands that really require the privileges.

Many of the security problems of other popular operating systems can be traced back to the fact that normal users generally enjoy administrator privileges. Thus, programs such as “worms” or “Trojan horses”, which users often execute by accident, find it easy to establish themselves on the system. With a Linux system that is correctly installed and operated, this is hardly possible since users read their e-mail without administrator privileges, but administrator privileges are required for all system-wide configuration changes.

Of course, Linux is not magically immune against malicious pests like “mail worms”; somebody could write and make popular a mail program that would execute “active content” such as scripts or binary programs within messages, like some such programs do on other operating systems. On Linux, such a “malicious” program from elsewhere could remove all the caller’s files or try to introduce “Trojan” code to his environment, but it could not harm other users or the system itself, unless it exploited a security vulnerability in Linux that would let a local user gain administrator privileges “through the back door” (such vulnerabilities are detected now and again, and patches are promptly published which you should install in a timely manner).

Exercises

C10.1 [2] What is the difference between a user and an administrator? Name examples of tasks and actions (and suitable commands) that are typically performed from a user account and from the root account, respectively.

C10.2 [!1] Why should you, as a normal user, not use the root account for your daily work?

C10.3 What about access control on your computer at home? Do you work from an administrator account?

10.3 Obtaining Administrator Privileges

There are two ways of obtaining administrator privileges:

1. You can log in as user root directly. After entering the correct root password you will obtain a shell with administrator privileges. However, you should avoid logging in to the GUI as root, since then all graphical applications including the X server would run with root privileges, which is not necessary and can lead to security problems. Nor should direct root logins be allowed across the network.

You can determine which terminals are eligible for direct root logins by listing them in the /etc/securetty file. The default setting is usually “all virtual consoles and /dev/ttyS0” (the latter for users of the “serial console”).

2. You can, from a normal shell, use the su command to obtain a new shell with administrator privileges. su, like a direct login, asks for a password and opens the root shell only after the correct root password has been input. In GUIs like KDE there are similar methods.

(See also Introduction to Linux for Users and Administrators.)

Even if a Linux system is used by a single person only, it makes sense to create a normal account for this user. During everyday work on the system as root, most of the kernel’s normal security precautions are circumvented. That way errors can occur that affect the whole system. You can avoid this danger by logging into your normal account and starting a root shell via “/bin/su -” if and when required.

Using su, you can also assume the identity of arbitrary other users (here hugo) by invoking it like

$ /bin/su - hugo

You need to know the target user’s password unless you are calling su as user root.

The second method is preferable to the first for another reason, too: if you use the su command to become root after logging in to your own account, su creates a message like

Apr 1 08:18:21 HOST su: (to root) user1 on /dev/tty2

in the system log (such as /var/log/messages). This entry means that user user1 successfully executed su to become root on terminal 2. If you log in as root directly, no such message is logged; there is no way of figuring out which user has fooled around with the root account. On a system with several administrators it is often important to retrace who entered the su command when.
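To pull such entries out of the log, a sed one-liner is enough. This is a sketch under assumptions: the log file is called /var/log/messages on some distributions and /var/log/auth.log on others, reading it usually requires root privileges, and other versions of su log in a different format that this pattern does not match. The name su_to_root is made up.

```shell
#!/bin/sh
# Print "timestamp user terminal" for every syslog line of the form
#   <timestamp> <host> su: (to root) <user> on <terminal>
# and ignore all other lines.
su_to_root() {
    sed -n 's/^\(.*\) [^ ]* su: (to root) \([^ ]*\) on \([^ ]*\).*$/\1 \2 \3/p' \
        "${1:-/var/log/messages}"
}

[ -r /var/log/messages ] && su_to_root
```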

Ubuntu is one of the “newfangled” distributions that deprecate (and, in the default setup, even disable) logging in as root. Instead, particular users may use the sudo mechanism to execute individual commands with administrator privileges. Upon installation, you are asked to create a “normal” user account, and that user account is automatically endowed with “indirect” administrator privileges.

When installing Debian GNU/Linux, you can choose between assigning a password to the root account and thereby enabling direct administrator logins, and declining this and, as on Ubuntu, giving sudo-based administrator privileges to the first unprivileged user account created as part of the installation process.

On many systems, the shell prompt differs between root and the other users. The classic root prompt contains a hash mark (#), while other users see a prompt containing a dollar sign ($) or greater-than sign (>). The # prompt is supposed to remind you that you are root with all ensuing privileges. However, the shell prompt is easily changed, and it is your call whether to follow this convention or not.
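Since the prompt is only a convention, a script should check the effective user ID instead, which is always 0 for root. (In bash, the prompt escape \$ in PS1 performs the same check, showing # for user ID 0 and $ otherwise.) A minimal sketch; am_i_root is a made-up name:

```shell
#!/bin/sh
# Succeed if the shell runs with root privileges, i.e. effective UID 0,
# no matter what the prompt looks like.
am_i_root() {
    [ "$(id -u)" -eq 0 ]
}

if am_i_root; then
    echo "working as root"
else
    echo "working as $(id -un)"
fi
```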

Of course, if you are using sudo, you never get to see a prompt for root.

Like all powerful tools, the root account can be abused. Therefore it is important for you as the system administrator to keep the root password secret. It should only be passed on to users who are trusted both professionally and personally (or who can be held responsible for their actions). If you are the sole user of the system this problem does not apply to you.

Too many cooks spoil the broth! This principle also applies to system administration. The main benefit of “private” use of the root account is not that the possibility of misuse is minimised (even though this is surely a consequence). More importantly, the administrator, as the sole user of the root account, knows the complete system configuration. If somebody besides the administrator can, for example, change important system files, then the system configuration could be changed without the administrator’s knowledge. In a commercial environment, it is necessary to have several suitably privileged employees for various reasons (for example, safeguarding system operation during holidays or sudden severe illness of the administrator); this requires close cooperation and communication.

If only one system administrator is responsible for system configuration, you can be sure that one person really knows what is going on on the system (at least in theory), and the question of accountability also has an obvious answer. The more users have access to root, the greater the probability that somebody will commit an error as root at some stage. Even if all users with root access possess suitable administration skills, mistakes can happen to anybody. Prudence and thorough training are the only precautions against accidents.

156 10 System Administration

There are a few other useful tools for team-based system administration. For example, Debian GNU/Linux and Ubuntu support a package called etckeeper, which allows storing the complete content of the /etc directory in a revision control system such as Git or Mercurial. Revision control systems (which we cannot cover in detail here) make it possible to track changes to files in a directory hierarchy in a very detailed manner, to comment them and, if necessary, to undo them. With Git or Mercurial it is even possible to store a copy of the /etc directory on a completely different computer and to keep it in sync automatically, which offers great protection from accidents.

Exercises

10.4 [2] What methods exist to obtain administrator rights? Which method is better? Why?

10.5 [!2] On a conventionally configured system, how can you recognise whether you are working as root?

10.6 [2] Log in as a normal user (e. g., test). Change over to root and back to test. How do you work best if you frequently need to change between both these accounts (for example, to check on the results of a new configuration)?

10.7 [!2] Log in as a normal user and change to root using su. Where do you find a log entry documenting this change? Look at that message.

10.4 Distribution-specific Administrative Tools

Many Linux distributions try to stand out in the crowd by providing more or less ingenious tools that are supposed to simplify system administration. These tools are usually tailored to the distributions in question. Here are a few comments about typical specimens:

A familiar sight to SUSE administrators is “YaST”, the graphical administration interface of the SUSE distributions (it also runs on a text screen). YaST can configure many aspects of the system in two ways:

• It can directly change the configuration files concerned.
• It can manipulate abstract configuration files below /etc/sysconfig, which the SuSEconfig tool then uses to adapt the real configuration files.

For some tasks, such as network configuration, the files below /etc/sysconfig are the actual configuration files.

Unfortunately, YaST is not a silver bullet for all problems of system administration. Many aspects of the system are amenable to YaST-based administration. However, important settings may not be accessible via YaST, or the YaST modules in question may simply not work correctly.

The danger zone starts where you try to administer the computer partly through YaST and partly through changing configuration files manually. YaST does exercise some care not to overwrite your changes. (This wasn’t the case in the past: up until SuSE 6 or so, YaST and SuSEconfig used to be quite reckless.) However, YaST will then not apply its own changes in a way that really takes effect in the system. In other places, manual changes to the configuration files will actually show up in YaST. Hence you need some “insider knowledge” and experience in order to assess which configuration files you may change directly and which your grubby fingers had better not touch.

Some time ago, Novell released the YaST source code under the GPL (in SUSE’s time it used to be available but not under a “free” licence). However, so far no other distribution of consequence has adapted YaST to its purposes, let alone made it a standard tool (SUSE fashion).


The Webmin package by Jamie Cameron (http://www.webmin.com/) allows the convenient administration of various Linux distributions (or Unix versions) via a web-based interface. Webmin is very extensive and offers special facilities for administering “virtual” servers (for web hosters and their customers). However, you may have to install it yourself, since most distributions do not provide it. Webmin manages its own users, which means that you can extend administrator privileges to users who do not have interactive system access. (Whether that is a smart idea is a completely different question.)

Most administration tools like YaST and Webmin share the same disadvantages:

• They are not extensive enough to take over all aspects of system administration. As an administrator you need detailed knowledge of their limits in order to decide where to intervene manually.

• They make system administration possible for people whose expertise is not adequate to assess the possible consequences of their actions or to find and correct mistakes. Creating a user account using an administration tool is certainly not a critical job, and surely more convenient than editing four different system files using vi. Other tasks, however, such as configuring a firewall or mail server, are not suitable for laypeople even with a convenient administration tool. The danger is that inexperienced administrators will use an administration tool to attempt tasks that look no more complicated than others. Without adequate background knowledge, such tasks may endanger the safety and/or reliability of the system.

• They usually do not offer a facility to version-control or document any changes made. They thus complicate teamwork and auditing by requiring logs to be kept externally.

• They are often opaque, i. e., they do not document the actual steps they take on the system to perform administrative tasks. This keeps the knowledge about the necessary procedures buried in the programs. As the administrator you have no direct way of “learning” from the programs, as you could by observing an experienced administrator. Thus the administration tools keep you artificially stupid.

• As an extension of the previous point: If you need to administer several computers, common administration tools force you to execute the same steps repeatedly on every single machine. Often it would be more convenient to write a shell script automating the required procedure and to execute it automatically on every computer using, e. g., the “secure shell”. However, the administration tool does not tell you what to put into this shell script. Therefore, viewed in a larger context, their use is inefficient.

From various practical considerations like these, we recommend against relying too much on the “convenient” administration tools provided by the distributions. They are very much like training wheels on a bicycle. They work effectively against falling over too early and provide a very large sense of achievement very quickly. But the longer the little ones zoom about with them, the more difficult it becomes to get them used to “proper” bike-riding. Here, that means doing administration in the actual configuration files, with all its advantages such as documentation, transparency, auditing, team capability, portability, …

Excessive dependence on an administration tool also leads to excessive dependence on the distribution featuring that tool. This may not seem like a real liability. On the other hand, one of the more important advantages of Linux is the fact that there are multiple independent vendors. So, if one day you should be fed up with the SUSE distributions (for whatever reason) and want to move over to Red Hat or Debian GNU/Linux, it would be very inconvenient if your administrators knew only YaST and had to relearn Linux administration from scratch. (Third-party administration tools like Webmin do not exhibit this problem to the same degree.)

Exercises

10.8 [!2] Does your distribution provide an administration tool (such as YaST)? What can you do with it?

10.9 [3] (Continuation of the previous exercise, when working through the manual for the second time.) Find out how your administration tool works. Can you change the system configuration manually so the administration tool will notice your changes? Only under some circumstances?

10.10 [!1] Administration tools like Webmin are potentially accessible to everybody with a browser. Which advantages and disadvantages result from this?

Commands in this Chapter

su: Starts a shell using a different user’s identity (su(1))
sudo: Allows normal users to execute certain commands with administrator privileges (sudo(8))

Summary

• Every computer installation needs a certain amount of system administration. In big companies, universities and similar institutions these services are provided by (teams of) full-time administrators; in smaller companies or private households, (some) users usually serve as administrators.

• Linux systems are, on the whole, straightforward to administer. Work arises mostly during the initial installation and, during normal operation, when the configuration changes noticeably.

• On Linux systems, there usually is a privileged user account called root, to which the normal security mechanisms do not apply.

• As an administrator, one should not work as root exclusively, but use a normal user account and assume root privileges only if necessary.

• Administration tools such as YaST or Webmin can help perform some administrative duties, but are no substitute for administrator expertise and may have other disadvantages as well.


User Administration

Contents

11.1 Basics
  11.1.1 Why Users?
  11.1.2 Users and Groups
  11.1.3 People and Pseudo-Users
11.2 User and Group Information
  11.2.1 The /etc/passwd File
  11.2.2 The /etc/shadow File
  11.2.3 The /etc/group File
  11.2.4 The /etc/gshadow File
  11.2.5 The getent Command
11.3 Managing User Accounts and Group Information
  11.3.1 Creating User Accounts
  11.3.2 The passwd Command
  11.3.3 Deleting User Accounts
  11.3.4 Changing User Accounts and Group Assignment
  11.3.5 Changing User Information Directly: vipw
  11.3.6 Creating, Changing and Deleting Groups

Goals

• Understanding the user and group concepts of Linux

• Knowing how user and group information is stored on Linux

• Being able to use the user and group administration commands

Prerequisites

• Knowledge about handling configuration files



11.1 Basics

11.1.1 Why Users?

Computers used to be large and expensive, but today an office workplace without its own PC (“personal computer”) is nearly inconceivable, and a computer is likely to be encountered in most domestic “dens” as well. A family may agree that Dad, Mom and the kids will put their files into different directories. That will no longer do in companies or universities, however. Once shared disk space or other facilities are provided by central servers accessible to many users, the computer system must be able to distinguish between different users and to assign different access rights to them. After all, Ms Jones from the Development Division has as little business looking at the company’s payroll data as Mr Smith from Human Resources has accessing the detailed plans for next year’s products. And a measure of privacy may be desired even at home: the Christmas present list or teenage daughter’s diary (erstwhile fitted with a lock) should not be open to prying eyes as a matter of course.

We shall be discounting the fact that teenage daughter’s diary may be visible to the entire world on Facebook (or some such). Even if that is the case, the entire world should surely not be allowed to write to teenage daughter’s diary. (Which is why even Facebook supports the notion of different users.)

The second reason for distinguishing between different users follows from the fact that various aspects of the system should not be visible, much less changeable, without special privileges. Therefore Linux manages a separate user identity (root) for the system administrator, which makes it possible to keep information such as users’ passwords hidden from “common” users. The bane of older Windows systems was programs obtained by e-mail or indiscriminate web surfing that then wreak havoc on the entire system. This will not plague you on Linux, since anything you can execute as a common user will not be in a position to wreak system-wide havoc.

Unfortunately this is not entirely correct: Every now and then a bug comes to light that enables a “normal user” to do things otherwise restricted to administrators. This sort of error is extremely nasty and is usually corrected very quickly after being found. There is a considerable chance, however, that such a bug has remained undetected in the system for an extended period of time. Therefore, on Linux (as on all other operating systems) you should strive to run the most current version of critical system parts like the kernel that your distributor supports.

Even the fact that Linux safeguards the system configuration from unauthorised access by normal users should not entice you to shut down your brain. We do give you some advice (such as not to log in to the graphical user interface as root), but you should keep thinking for yourself. E-mail messages asking you to view web site X and enter your credit card number and PIN there can reach you even on Linux, and you should disregard them in the same way as everywhere else.

Linux distinguishes between dierent users by means of dierent user ac-user accounts

counts. The common distributions typically create two user accounts during

installation, namely

root

for administrative tasks and another account for a “nor-

mal” user. You (as the administrator) may add more accounts later, or, on a client

PC in a larger network, they may show up automatically from a user account

database stored elsewhere.

Linux distinguishes between user accounts, not users. For example, no one keeps you from using a separate user account for reading e-mail and surfing the web. That way you can be 100% sure that things you download from the Net have no access to your important data (which might otherwise happen in spite of the user/administrator divide). With a little cunning you can even display a browser and e-mail program running under your “surfing account” among your “normal” programs¹.

Under Linux, every user account is assigned a unique number, the so-called user ID (or UID, for short). Every user account also features a textual user name (such as root or joe), which is easier for humans to remember. In most places where it counts (e. g., when logging in, or in a list of files and their owners), Linux will use the textual name whenever possible.

The Linux kernel does not know anything about textual user names; process data and the ownership data in the filesystem use the UID exclusively. This may lead to difficulties if a user is deleted while he still owns files on the system and the UID is then reassigned to a different user. That user “inherits” the previous UID owner’s files.

There is no technical problem with assigning the same (numerical) UID to different user names. These users have equal access to all files owned by that UID, but every user can have his own password. You should not actually use this (or if you do, use it only with great circumspection).
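To find out whether any UID in a passwd-format file is shared by several user names, you can collect the user names for each value in the third field. The following POSIX shell sketch does this. The function name dup_uids is our own invention, not a standard command:

```shell
# dup_uids [FILE]: print every UID that occurs on more than one line of a
# passwd-format file (default: /etc/passwd), followed by the user names
# sharing it. Field 1 is the user name, field 3 the numerical UID.
dup_uids() {
    awk -F: '
        { names[$3] = names[$3] " " $1; count[$3]++ }
        END {
            for (uid in count)
                if (count[uid] > 1)
                    print uid ":" names[uid]
        }
    ' "${1:-/etc/passwd}"
}
```

On a well-kept system, "dup_uids" without an argument prints nothing. If several UIDs are duplicated, they appear in no particular order, because awk does not define the iteration order of "for (uid in count)".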

11.1.2 Users and Groups

To work with a Linux computer you need to log in first. This allows the system to recognise you and to assign you the correct access rights (of which more later). Everything you do during your session (from logging in to logging out) happens under your user account. In addition, every user has a home directory, where only they can store and manage their own files, and where other users often have no read permission and very emphatically no write permission. (Only the system administrator, root, may read and write all files.)

Depending on which Linux distribution you use (cue: Ubuntu), you may not have to log into the system explicitly. This is because the computer “knows” that it will usually be you and simply assumes that this is going to be the case. You are trading security for convenience. This particular deal probably makes sense only where you can stipulate with reasonable certainty that nobody except you will switch on your computer. By rights, it should therefore be restricted to the computer in your single-person household without a cleaner. We told you so.

Several users who want to share access to certain system resources or files can form a group. Linux identifies group members either fixedly by name or transiently by a login procedure similar to that for users. Groups have no “home directories” like users do, but as the administrator you can of course create arbitrary directories meant for certain groups and having appropriate access rights.

Groups, too, are identied internally using numerical identiers (“group IDs”

or GIDs).

BGroup names relate to GIDs as user names to UIDs: The Linux kernel only

knows about the former and stores only the former in process data or the

le system.

Every user belongs to a primary group and possibly several secondary or additional groups. In a corporate setting it would, for example, be possible to introduce project-specific groups and to assign the people collaborating on those projects to the appropriate group. This would allow them to manage common data in a directory only accessible to group members.

¹ Which of course is slightly more dangerous again, since programs running on the same screen can communicate with one another.


For the purposes of access control, all groups carry equivalent weight: every user always enjoys all rights deriving from all the groups that he is a member of. The only difference between the primary and secondary groups is that files newly created by a user are usually² assigned to his primary group.

Up to (and including) version 2.4 of the Linux kernel, a user could be a member of at most 32 additional groups. Since Linux 2.6 the limit is 65536 groups, which is unlimited for all practical purposes.

You can nd out a user account’s UID, the primary and secondary groups and

the corresponding GIDs by means of the

program:

$id

uid=1000(joe) gid=1000(joe) groups=24(cdrom),29(audio),44(video),





1000(joe)

$id root

uid=0(root) gid=0(root) groups=0(root)

The options -u, -g and -G make id output just the account’s UID, the GID of the primary group, or the GIDs of all groups the account belongs to, respectively. (These options cannot be combined.) With the additional option -n you get names instead of numbers:

$ id -G
1000 24 29 44
$ id -Gn
joe cdrom audio video

The groups command yields the same result as the “id -Gn” command.
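In a script you often only want to know whether an account belongs to one particular group. A minimal sketch building on "id -Gn" follows. The helper name in_group is hypothetical. The grep option -x matches only whole lines, so that, for example, audio does not match a group called audiodev:

```shell
# in_group USER GROUP: succeed if USER belongs to GROUP, as primary or
# secondary group. "id -Gn USER" prints all group names on one line,
# separated by blanks; tr puts each name on its own line for grep -x.
in_group() {
    id -Gn "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

if in_group root root; then
    echo "root is a member of group root"
fi
```

If the user does not exist, id prints an error message and grep receives no input, so in_group simply fails.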

You can use the last command to find out who logged into your computer and when (and, in the case of logins via the network, from where):

$ last

joe pts/1 pcjoe.example.c Wed Feb 29 10:51 still logged in

bigboss pts/0 pc01.example.c Wed Feb 29 08:44 still logged in

joe pts/2 pcjoe.example.c Wed Feb 29 01:17 - 08:44 (07:27)

sue pts/0 :0 Tue Feb 28 17:28 - 18:11 (00:43)



reboot system boot 3.2.0-1-amd64 Fri Feb 3 17:43 - 13:25 (4+19:42)



For network-based sessions, the third column specifies the name of the ssh client computer. “:0” denotes the graphical screen (the first X server, to be exact, since there might be more than one).

Do also note the reboot entry, which tells you that the computer was started. The third column contains the version number of the Linux operating system kernel as provided by “uname -r”.

With a user name, last provides information about a particular user:

$ last joe

joe pts/1 pcjoe.example.c Wed Feb 29 10:51 still logged in

joe pts/2 pcjoe.example.c Wed Feb 29 01:17 - 08:44 (07:27)



² The exception occurs where the owner of a directory has decreed that new files and subdirectories within this directory are to be assigned to the same group as the directory itself. We mention this strictly for completeness.


You might be bothered (and rightfully so!) that this somewhat sensitive information is apparently made available on a casual basis to arbitrary system users. You may, as the administrator, want to protect your users’ privacy somewhat better than your Linux distribution does by default. In that case you can use the command

# chmod o-r /var/log/wtmp

to remove general read permissions from the file that last consults for the telltale data. Users without administrator privileges then get to see something like

$ last
last: /var/log/wtmp: Permission denied

11.1.3 People and Pseudo-Users

Besides “natural” persons (the system’s human users), the user and group concept is also used to allocate access rights to certain parts of the system. This means that, in addition to the personal accounts of the “real” users like you, there are further accounts that do not correspond to actual human users but are assigned to administrative functions internally. They define functional “roles” with their own accounts and groups.

After installing Linux, you will find several such pseudo-users and groups in the /etc/passwd and /etc/group files. The most important role is that of the root user (which you know) and its eponymous group. The UID and GID of root are 0 (zero).

root’s privileges are tied to UID 0; GID 0 does not confer any additional access privileges.
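Because root’s privileges depend on UID 0 rather than on the name root, a script that needs administrator privileges should check the numerical UID. A minimal sketch follows, assuming the id command shown earlier is available. The function name is_root is our own:

```shell
# is_root: succeed if the current process runs with effective UID 0.
# Checking the number rather than the name also catches other accounts
# that (unwisely) share UID 0.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

if is_root; then
    echo "running with root privileges (UID 0)"
else
    echo "running as an ordinary user (UID $(id -u))"
fi
```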

Further pseudo-users belong to certain software systems (e. g., news for Usenet news using INN, or postfix for the Postfix mail server) or to certain components or devices (such as printers, tape or floppy drives). You can access these accounts, if necessary, like other user accounts via the su command. These pseudo-users are helpful as file or directory owners. They let you fit the access rights tied to file ownership to special requirements without having to use the root account. The same applies to groups; the members of the disk group, for example, have block-level access to the system’s disks.

Exercises

11.1 [1] How does the operating system kernel differentiate between various users and groups?

11.2 [2] What happens if a UID is assigned to two different user names? Is that allowed?

11.3 [1] What is a pseudo-user? Give examples!

11.4 [2] (On the second reading.) Is it acceptable to assign a user to group disk whom you would not want to trust with the root password? Why (not)?

11.2 User and Group Information

11.2.1 The /etc/passwd File

The /etc/passwd file is the system user database. There is an entry in this file for every user on the system: a line consisting of attributes like the Linux user name, “real” name, etc. After the system is first installed, the file contains entries for most pseudo-users.

The entries in /etc/passwd have the following format:

⟨user name⟩:⟨password⟩:⟨UID⟩:⟨GID⟩:⟨GECOS⟩:⟨home directory⟩:⟨shell⟩
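The fields are separated by colons, so the shell’s read builtin can split an entry directly if IFS is set to a colon. The following sketch prints some fields of each entry; list_accounts is a name made up for illustration. Because read assigns everything left over to its last variable, the shell field extends to the end of the line, matching the rule for the seventh field:

```shell
# list_accounts [FILE]: show user name, UID, home directory and shell
# for every entry of a passwd-format file (default: /etc/passwd).
list_accounts() {
    while IFS=: read -r name password uid gid gecos home shell; do
        printf '%s uid=%s home=%s shell=%s\n' "$name" "$uid" "$home" "$shell"
    done < "${1:-/etc/passwd}"
}

list_accounts /etc/passwd | head -n 3
```

The unused variables password, gid and gecos are there only so that each field lands in the right variable.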

⟨user name⟩ This name should consist of lowercase letters and digits; the first character should be a letter. Unix systems often consider only the first eight characters. Linux does not have this limitation, but in heterogeneous networks you should take it into account.

Resist the temptation to use umlauts, punctuation and similar special characters in user names, even if the system lets you do so. Not all tools that create new user accounts are picky, and you could of course edit /etc/passwd by hand. What seems to work splendidly at first glance may lead to problems elsewhere later.

You should also stay away from user names consisting of only uppercase letters or only digits. The former may give their owners trouble logging in (see Exercise 11.6). The latter can lead to confusion, especially if the numerical user name does not equal the account’s numerical UID. Commands such as “ls -l” will display the UID if there is no corresponding entry for it in /etc/passwd, and it is not exactly straightforward to tell UIDs from purely numerical user names in ls output.

⟨password⟩ Traditionally, this field contains the user’s encrypted password. Today, most Linux distributions use “shadow passwords”. Instead of being stored in the publicly readable /etc/passwd file, the password is stored in /etc/shadow, which can only be accessed by the administrator and some privileged programs. In /etc/passwd, an “x” calls attention to this circumstance. Every user can avail himself of the passwd program to change his password.

⟨UID⟩ The numerical user identifier, a number between 0 and 2³² − 1. By convention, UIDs from 0 to 99 (inclusive) are reserved for the system, and UIDs from 100 to 499 are for use by software packages that need pseudo-user accounts. With most popular distributions, “real” users’ UIDs start from 500 (or 1000).

Precisely because the system differentiates between users not by name but by UID, the kernel treats two accounts as completely identical if they contain different user names but the same UID, at least as far as the access privileges are concerned. Commands that display a user name (e.g., “ls -l”) show the one used when the user logged in.

⟨GID⟩ The GID of the user’s primary group after logging in.

The Novell/SUSE distributions (among others) assign a single group such as users as the shared primary group of all users. This method is quite established as well as easy to understand.

Many distributions, such as those by Red Hat or Debian GNU/Linux, create a new group whenever a new account is created, with the GID equalling the account’s UID. The idea behind this is to allow more sophisticated assignments of rights than with the approach that puts all users into the same group users. Consider the following situation: Jim (user name jim) is the personal assistant of CEO Sue (user name sue). In this capacity he sometimes needs to access files stored inside Sue’s home directory that other users should not be able to get at. The method used by Red Hat, Debian & co., “one group per user”, makes this straightforward. You put user jim into group sue and arrange for Sue’s files to be readable for all group members (the default case) but not for others. With the “one group for everyone” approach it would have been necessary to introduce a new group completely from scratch, and to reconfigure the jim and sue accounts accordingly.

By virtue of the assignment in /etc/passwd, every user must be a member of at least one group.

The user’s secondary groups (if applicable) are determined from entries in the /etc/group file.

⟨GECOS⟩ This is the comment field, also known as the “GECOS field”.

GECOS stands for “General Electric Comprehensive Operating System” and has nothing whatever to do with Linux. The only connection is that, in the early days of Unix, this field was added to /etc/passwd in order to keep compatibility data for a GECOS remote job entry service.

This field contains various bits of information about the user, in particular his “real” name and optional data such as the office number or telephone number. This information is used by programs such as mail or finger. The full name is often included in the sender’s address by news and mail software.

Theoretically there is a program called chfn that lets you (as a user) change the content of your GECOS field. Whether that works in any particular case is a different question. At least in a corporate setting, one does not necessarily want to allow people to change their names on a whim.

⟨home directory⟩ This directory is that user’s personal area for storing his own files. A newly created home directory is by no means empty, since a new user normally receives a number of “profile” files as his basic equipment. When a user logs in, his shell uses his home directory as its current directory, i. e., immediately after logging in the user is deposited there.

⟨shell⟩ The name of the program to be started by login after successful authentication; this is usually a shell. The seventh field extends through the end of the line.

The user can change this entry by means of the chsh program. The eligible programs (shells) are listed in the /etc/shells file. If a user is not supposed to have an interactive shell, an arbitrary program, with arguments, can be entered here (a common candidate is /bin/true). This field may also remain empty, in which case the standard shell /bin/sh will be started.

If you log in to a graphical environment, various programs will be started on your behalf, but not necessarily an interactive shell. The shell entry in /etc/passwd comes into its own, however, when you invoke a terminal emulator such as xterm or konsole, since these programs usually check it to identify your preferred shell.

Some of the elds shown here may be empty. Absolutely necessary are only the

user name, UID, GID and home directory. For most user accounts, all the elds

will be lled in, but pseudo-users might use only part of the elds.

The home directories are usually located below /home and take their name from their owner’s user name. In general this is a fairly sensible convention which makes a given user’s home directory easy to find. In theory, a home directory might be placed anywhere in the file system under a completely arbitrary name.

On large systems it is common to introduce one or more additional levels of directories between /home and the “user name” directory, such as

/home/hr/joe       Joe from Human Resources
/home/devel/sue    Sue from Development
/home/exec/bob     Bob the CEO

There are several reasons for this. On the one hand, this makes it easier to keep one department’s home directories on a server within that department while still making them available to other client computers. On the other hand, Unix (and some Linux) file systems used to be slow dealing with directories containing very many files. This would have had an unfortunate impact on a /home with several thousand entries. However, with current Linux file systems (ext3 with dir_index and similar) this is no longer an issue.

Note that as an administrator you should not really be editing /etc/passwd by hand. There are a number of programs that will help you create and maintain user accounts.

In principle it is also possible to store the user database elsewhere than in /etc/passwd. On systems with very many users (thousands), storing user data in a relational database is preferable. In heterogeneous networks, a shared multi-platform user database, e. g., based on an LDAP directory, might recommend itself. The details of this, however, are beyond the scope of this course.

11.2.2 The /etc/shadow File

For security, nearly all current Linux distributions store encrypted user passwords in the /etc/shadow file (“shadow passwords”). This file is unreadable for normal users. Only root may write to it, while members of the shadow group may read it in addition to root. If you try to display the file as a normal user, an error occurs.

Use of /etc/shadow is not mandatory but highly recommended. However, there may be system configurations where the additional security afforded by shadow passwords is nullified, for example if NIS is used to export user data to other hosts (especially in heterogeneous Unix environments).

Again, this le contains one line for each user, with the following format:format

⟨user name⟩

⟨password⟩

⟨change⟩

⟨min⟩

⟨max⟩



⟨warn⟩

⟨grace⟩

⟨lock⟩

⟨reserved⟩

For example:

root:gaY2L19jxzHj5:10816:0:10000::::

daemon:*:8902:0:10000::::

joe:GodY6c5pZk1xs:10816:0:10000::::

Here is the meaning of the individual elds:

⟨user name⟩ This must correspond to an entry in the /etc/passwd file. This field “joins” the two files.

⟨password⟩ The user’s encrypted password. An empty field generally means that the user can log in without a password. An asterisk or an exclamation point prevents the user in question from logging in. It is common to lock users’ accounts without deleting them entirely by placing an asterisk or exclamation point at the beginning of the corresponding password.

⟨change⟩ The date of the last password change, in days since 1 January 1970.

11.2 User and Group Information 167

⟨min⟩ The minimum number of days that must have passed since the last password change before the password may be changed again.

⟨max⟩ The maximum number of days that a password remains valid without having to be changed. After this time has elapsed the user must change his password.

⟨warn⟩ The number of days before the expiry of the ⟨max⟩ period during which the user will be warned about having to change his password. Generally, the warning appears when logging in.

⟨grace⟩ The number of days, counting from the expiry of the ⟨max⟩ period, after which the account will be locked if the user does not change his password. (Between the expiry of the ⟨max⟩ period and the end of this grace period, the user may log in but must immediately change his password.)

⟨lock⟩ The date on which the account will be definitively locked, again in days since 1 January 1970.
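The day counts are easier to read once they are converted back into calendar dates. The following sketch assumes a POSIX shell with GNU date (for `date -d @SECONDS`). It splits one line in /etc/shadow format at the colons and prints the dates implied by ⟨change⟩, ⟨max⟩, ⟨warn⟩, ⟨grace⟩ and ⟨lock⟩. It works on a sample string, since normal users cannot read the real /etc/shadow. The function names are made up for this illustration.

```sh
#!/bin/sh
# Sketch: interpret the aging fields of one /etc/shadow-format line.
# Assumes GNU date (for -d @SECONDS); day counts are days since
# 1 January 1970, UTC. days_to_date and shadow_aging are hypothetical names.

days_to_date() {                  # days since the epoch -> YYYY-MM-DD
    date -u -d "@$(( $1 * 86400 ))" +%F
}

shadow_aging() (                  # ( ) body: IFS and "set" changes stay local
    set -f                        # a "*" password must not be glob-expanded
    IFS=:
    set -- $1                     # split at the colons into $1, $2, ...
    user=$1 change=$3 max=$5 warn=${6-} grace=${7-} lock=${8-}
    echo "$user: last change $(days_to_date "$change")"
    if [ -n "$max" ]; then
        expire=$(( change + max ))            # password must be changed by then
        echo "$user: password expires $(days_to_date "$expire")"
        if [ -n "$warn" ]; then
            echo "$user: warnings start $(days_to_date $(( expire - warn )))"
        fi
        if [ -n "$grace" ]; then
            echo "$user: locked after $(days_to_date $(( expire + grace )))"
        fi
    fi
    if [ -n "$lock" ]; then
        echo "$user: account expires $(days_to_date "$lock")"
    fi
)

shadow_aging 'joe:GodY6c5pZk1xs:10816:0:10000::::'
```

For joe's sample entry it prints “joe: last change 1999-08-13” and “joe: password expires 2026-12-29”. The empty ⟨warn⟩, ⟨grace⟩ and ⟨lock⟩ fields produce no further lines.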

Some brief remarks concerning password encryption are in order. You might think that if passwords are encrypted they can also be decrypted again. That would open all of the system’s accounts to a clever cracker who manages to obtain a copy of /etc/shadow. In reality this is not the case, because password “encryption” is a one-way street. The encryption method makes it impossible to recover the decrypted representation of a Linux password from the “encrypted” form. The only way to “crack” the encryption is by encrypting likely passwords and checking whether they match what is in /etc/shadow.
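The “encrypt likely passwords and compare” method fits in a few lines of shell. This sketch assumes the openssl command is available and that the stored hash uses the MD5-based “$1$⟨salt⟩$⟨digest⟩” format that `openssl passwd -1` produces. The function name and the word list are hypothetical. The function reads the salt out of the stored hash, encrypts each candidate with that same salt, and reports the candidate that reproduces the hash.

```sh
#!/bin/sh
# Sketch: "crack" an MD5-crypt hash ($1$salt$digest) by trial encryption.
# Assumes the openssl command is installed; try_words is a hypothetical name.
try_words() {               # $1 = stored hash, remaining args = candidates
    target=$1; shift
    salt=$(printf '%s\n' "$target" | cut -d'$' -f3)   # "$1$salt$digest" -> salt
    for word in "$@"; do
        if [ "$(openssl passwd -1 -salt "$salt" "$word")" = "$target" ]; then
            printf '%s\n' "$word"     # this candidate reproduces the hash
            return 0
        fi
    done
    return 1                          # no candidate matched
}

stored=$(openssl passwd -1 -salt Xy9 bob2)    # stands in for a stolen entry
try_words "$stored" bob1 bob2 bob3            # prints bob2
```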

Let’s assume you select the characters of your password from the 95 visible ASCII characters (uppercase and lowercase letters are distinguished). This means that there are 95 different one-character passwords, $95^2 = 9025$ two-character passwords, and so on. With eight characters you are already up to 6.6 quadrillion ($6.6 \cdot 10^{15}$) possibilities. Assuming that you can trial-encrypt 10 million passwords per second (not entirely unrealistic on current hardware), you would need approximately 21 years to work through all possible passwords. If you are in the fortunate position of owning a modern graphics card, another acceleration by a factor of 50–100 is quite feasible, which brings that down to roughly two and a half to five months. And then of course there are handy services like Amazon’s EC2, which will provide you (or random crackers) with almost arbitrary CPU power, or the friendly neighbourhood Russian bot net … so don’t feel too safe.
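The arithmetic is easy to check with the shell’s own integer arithmetic. The rate of 10 million trial encryptions per second is the assumption made in the text, not a measurement. keyspace_time is a made-up helper name. The sketch assumes 64-bit shell arithmetic, which bash and dash provide on current Linux systems.

```sh
#!/bin/sh
# Sketch: size of a password search space and the time to try all of it.
# keyspace_time is a hypothetical helper; needs 64-bit shell arithmetic.
keyspace_time() {   # $1 = alphabet size, $2 = length, $3 = guesses per second
    n=1 i=0
    while [ "$i" -lt "$2" ]; do
        n=$(( n * $1 ))
        i=$(( i + 1 ))
    done
    seconds=$(( n / $3 ))
    # 86400 seconds per day; 31557600 seconds per 365.25-day year
    echo "$n combinations, $(( seconds / 86400 )) days (about $(( seconds / 31557600 )) years)"
}

keyspace_time 95 8 10000000     # 95 visible ASCII characters, 8 positions
```

This prints “6634204312890625 combinations, 7678 days (about 21 years)”. Raising the rate a hundredfold to 1000000000, the graphics-card factor from the text, gives 76 days.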

There are a few other problems. You should avoid the traditional method if you can. It is usually called “crypt” or “DES”, the latter because it is based on, but not identical to, the eponymous encryption method³. It has the unpleasant property of only looking at the first eight characters of the entered password. Also, clever crackers can nowadays buy enough disk space to build a pre-encrypted cache of the 50 million (or so) most common passwords. To “crack” a password they only need to search their cache for the encrypted password, which can be done extremely quickly, and read off the corresponding clear-text password.

To make things even more laborious, the system traditionally adds a random element (the so-called “salt”) when a newly entered password is encrypted. The salt selects one of 4096 ($2^{12}$) different possibilities for the encrypted password. Its main purpose is to avoid random hits. Suppose user 𝑋, for some reason or other, gets a peek at the content of /etc/shadow and notices that his encrypted password looks just like that of user 𝑌. He could then log into user 𝑌’s account using his own clear-text password. As a pleasant side effect, the salt blows up the disk space required for the cracker’s pre-encrypted dictionary from the previous paragraph by a factor of 4096.

³ If you must know exactly: The clear-text password is used as the key (!) to encrypt a constant string (typically a sequence of zero bytes). A DES key is 56 bits, which just happens to be 8 characters of 7 bits each (as the leftmost bit in each character is ignored). This process is repeated for a total of 25 rounds, with the previous round’s output serving as the new input. Strictly speaking, the encryption scheme used isn’t quite DES but has been changed in a few places. This makes it less feasible to construct a special password-cracking computer from commercially available DES encryption chips.

Nowadays, password encryption is commonly based on the MD5 algorithm, which allows for passwords of arbitrary length and uses a 48-bit salt instead of the traditional 12 bits. Kindly enough, the encryption works much more slowly than “crypt”. This is irrelevant for the usual purpose (checking a password upon login: you can still encrypt several hundred passwords per second) but does encumber clever crackers to a certain extent. (You should not let yourself be bothered by the fact that cryptographers pooh-pooh the MD5 scheme as such due to its insecurity. As far as password encryption is concerned, this is fairly meaningless.)
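You can watch the salt at work with `openssl passwd`, assuming OpenSSL is installed. Its -1 option produces the MD5-based format “$1$⟨salt⟩$⟨digest⟩”. The same password with two different salts gives unrelated digests. Encrypting with the stored salt, on the other hand, reproduces the stored string exactly, and that is all a login check needs. The salts here are arbitrary examples.

```sh
#!/bin/sh
# Sketch: one password, MD5-crypt ("$1$") under two different salts.
# Assumes OpenSSL's "passwd" subcommand; the salts are arbitrary examples.
pw=secret123
a=$(openssl passwd -1 -salt saltAAAA "$pw")
b=$(openssl passwd -1 -salt saltBBBB "$pw")
echo "$a"            # $1$saltAAAA$ followed by the digest
echo "$b"            # $1$saltBBBB$ followed by an unrelated digest

# Checking a login attempt: encrypt the input with the stored salt, compare.
attempt=$(openssl passwd -1 -salt saltAAAA "secret123")
if [ "$attempt" = "$a" ]; then echo "password accepted"; fi
```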

You should not expect too much of the various password administration parameters. The text console login process uses them, but whether other parts of the system (such as the graphical login screen) pay them any notice depends on your setup. Nor is there usually an advantage in forcing new passwords on users at short intervals. This usually results in a sequence like bob1, bob2, bob3, …, or users alternate between two passwords.

A minimum interval that must pass before a user is allowed to change their password again is outright dangerous. It may give a cracker a “window” for illicit access even though the user knows their password has been compromised.

As a system administrator, the problem you usually need to cope with is not people trying to crack your system’s passwords by “brute force”. As a rule, “social engineering” is much more promising. To guess your password, the clever cracker does not start by systematically working through all possible character combinations. He starts with your spouse’s first name, your kids’ first names, your car’s plate number, your dog’s birthday et cetera. (We do not in any way mean to imply that you would use such a stupid password. No, no, not you by any means. However, we are not quite so positive about your boss …) And then there is of course the time-honoured phone call approach: “Hi, this is the IT department. We’re doing a security systems test and urgently require your user name and password.”

There are diverse ways of making Linux passwords more secure. One is the improved encryption scheme mentioned above, which most Linux distributions now use by default. Others include complaining about (too) weak passwords when they are first set up, or proactively running software that will try to identify weak encrypted passwords, just like clever crackers would (Caution: Do this in your workplace only with written (!) pre-approval from your boss!). Other methods avoid passwords completely in favour of constantly changing magic numbers (as in SecurID) or smart cards. All of this is beyond the scope of this manual, and therefore we refer you to the Linup Front manual Linux Security.

11.2.3 The /etc/group File

By default, Linux keeps group information in the /etc/group file. This file contains a one-line entry for each group in the system. Like the entries in /etc/passwd, each entry consists of fields separated by colons (:). More precisely, /etc/group contains four fields per line.

⟨group name⟩:⟨password⟩:⟨GID⟩:⟨members⟩


Their meaning is as follows:

⟨group name⟩ The name of the group, for use in directory listings, etc.

⟨password⟩ An optional password for this group. It lets users who are not members of the group via /etc/passwd or /etc/group assume membership of the group using newgrp. A “*” as an invalid character prevents normal users from changing to the group in question. An “x” refers to the separate password file /etc/gshadow.

⟨GID⟩The group’s numerical group identier.

⟨members⟩ A comma-separated list of user names. This list contains all users who have this group as a secondary group, i. e., who are members of this group but have a different value in the GID field of their /etc/passwd entry. (Users with this group as their primary group may also be listed here, but that is unnecessary.)

An /etc/group file could, for example, look like this:

root:x:0:root

bin:x:1:root,daemon

users:x:100:

project1:x:101:joe,sue

project2:x:102:bob

The root and bin entries are for administrative groups, similar to the system’s pseudo-user accounts. Many files are assigned to groups like this. The other groups contain user accounts.
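Every line has the same four colon-separated fields, so standard tools can directly answer questions such as “which groups list joe as a member?”. This sketch, with a made-up function name, runs awk over the sample lines above. It looks only at the ⟨members⟩ field, so a user’s primary group shows up only if it is listed there too. On a real system you would feed it the output of getent group instead of a here-document.

```sh
#!/bin/sh
# Sketch: list the groups whose <members> field (field 4) contains a user.
# Reads group-format lines on stdin; member_of is a hypothetical name.
member_of() {
    awk -F: -v user="$1" '{
        n = split($4, members, ",")       # comma-separated member list
        for (i = 1; i <= n; i++)
            if (members[i] == user) { print $1; break }
    }'
}

member_of joe <<'EOF'
root:x:0:root
bin:x:1:root,daemon
users:x:100:
project1:x:101:joe,sue
project2:x:102:bob
EOF
```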

Like UIDs, GIDs are counted from a specic value, typically 100. For a valid GID values

entry, at least the rst and third eld (group name and GID) must be lled in.

Such an entry assigns a GID (which might occur in a user’s primary GID eld in

/etc/passwd

) a textual name.

The password and/or membership elds must only be lled in for groups that

are assigned to users as secondary groups. The users listed in the membership membership list

list are not asked for a password when they want to change GIDs using the

newgrp

command. If an encrypted password is given, users without an entry in the mem- group password

bership list can authenticate using the password to assume membership of the

group.

In practice, group passwords are hardly if ever used, because the benefits to be derived from them barely justify the administrative overhead. On the one hand, it is more convenient to assign the group directly to the users in question. From version 2.6 of the Linux kernel on, there is no practical limit to the number of secondary groups a user can join. On the other hand, a single password that must be known by all group members does not exactly make for bullet-proof security.

If you want to be safe, ensure that there is an asterisk (“*”) in every group password slot.

11.2.4 The /etc/gshadow File

As for the user database, there is a shadow password extension for the group database. Without it, the encrypted group passwords in /etc/group would be readable for anyone (similar to /etc/passwd). With it, they are stored in the separate file /etc/gshadow. This file also contains additional information about the group, for example the names of the group administrators who are entitled to add or remove members from the group.


11.2.5 The getent Command

Of course you can read and process the /etc/passwd, /etc/shadow, and /etc/group files, like all other text files, using programs such as cat, less or grep (OK, OK, you need to be root to get at /etc/shadow). There are, however, some practical problems:

• You may not be able to see the whole truth: Your user database (or parts of it) might be stored on an LDAP server, SQL database, or a Windows domain controller, and there really may not be much of interest in /etc/passwd.

• If you want to look for a specific user’s entry, it is slightly inconvenient to type this using grep if you want to avoid “false positives”.
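The “false positives” are easy to demonstrate. A plain grep for joe also matches joey, as well as any line whose GECOS field happens to mention joe. Anchoring the pattern at the start of the line and including the colon avoids this. The three accounts below are invented for the example.

```sh
#!/bin/sh
# Sketch: naive vs. anchored search in passwd-format data.
# The three sample accounts are made up for this example.
sample='joe:x:1000:100:Joe Smith:/home/joe:/bin/bash
joey:x:1001:100:Joey Bloggs:/home/joey:/bin/bash
sue:x:1002:100:Sue, deputy to joe:/home/sue:/bin/bash'

printf '%s\n' "$sample" | grep -c 'joe'      # 3 matching lines: too many
printf '%s\n' "$sample" | grep -c '^joe:'    # 1 matching line: user joe only
```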

The getent command makes it possible to query the various databases for user and group information directly. With

$ getent passwd

you will be shown something that looks like /etc/passwd, but has been assembled from all sources of user information that are currently configured on your computer. With

$ getent passwd hugo

you can obtain user hugo’s entry, no matter where it is actually stored. Instead of passwd, you may also specify shadow, group, or gshadow to consult the respective database. (Naturally, even with getent you can only access shadow and gshadow as user root.)

The term “database” is understood as “totality of all sources from where the C library can obtain information on that topic (such as users)”. If you want to know exactly where that information comes from (or might come from), then read nsswitch.conf(5) and examine the /etc/nsswitch.conf file on your system.

You may also specify several user or group names. In that case, information on all the named users or groups will be output:

$ getent passwd hugo susie fritz
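In scripts, getent’s exit status is often more useful than its output. According to getent(1) it is 0 when the entry was found and 2 when a requested key does not exist. The sketch below (the function name is made up) uses this to check user names one at a time. Because it goes through getent, it sees all sources configured in /etc/nsswitch.conf, not just /etc/passwd.

```sh
#!/bin/sh
# Sketch: report which of several user names exist in the passwd database.
# Relies on getent's non-zero exit status for unknown keys (see getent(1)).
check_users() {
    for name in "$@"; do
        if getent passwd "$name" >/dev/null; then
            echo "$name: exists"
        else
            echo "$name: unknown"
        fi
    done
}

check_users root no-such-user-here
```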

Exercises

C11.5 [1] Which value will you nd in the second column of the

/etc/passwd

le? Why do you nd that value there?

C11.6 [2] Switch to a text console (using, e. g., Alt+F1) and try logging in, but enter your user name in uppercase letters. What happens?

C11.7 [2] How can you check that there is an entry in the shadow database for every entry in the passwd database? (pwconv only considers the /etc/passwd and /etc/shadow files, and also rewrites the /etc/shadow file, which we don’t want.)

11.3 Managing User Accounts and Group Information

After a new Linux distribution has been installed, there is often just the root account for the system administrator and the pseudo-users’ accounts. Any other user accounts must be created first (and most distributions today will gently but firmly nudge the installing person to create at least one “normal” user account).

As the administrator, it is your job to create and manage the accounts for all required users (real and pseudo). To facilitate this, Linux comes with several tools for user management. With them, this is mostly a straightforward task, but it is important that you understand the background.

11.3 Managing User Accounts and Group Information 171

11.3.1 Creating User Accounts

The procedure for creating a new user account is always the same (in principle) and consists of the following steps:

1. You must create entries in the /etc/passwd (and possibly /etc/shadow) files.

2. If necessary, you must add an entry (or several) to the /etc/group file.

3. You must create the home directory, copy a basic set of files into it, and transfer ownership of the lot to the new user.

4. If necessary, you must enter the user in further databases, e. g., for disk quotas, database access privilege tables and special applications.
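To see what steps 1 to 3 amount to, here is a minimal sketch that performs them by hand. To keep it harmless, it works below a scratch directory given as the first argument instead of /. The function name is hypothetical. Its other choices are illustrative assumptions rather than what any particular useradd does:

- “!” as the password, so the account stays locked until a password is set
- /bin/sh as the login shell
- aging values of 0, 99999 and 7 days
- chown only when run as root

It also skips the file locking and UID checks that a real tool performs, and leaves out step 4 entirely.

```sh
#!/bin/sh
# Sketch of steps 1-3 for creating an account by hand, below root dir $1.
# Hypothetical helper: no file locking, no UID collision checks, no step 4.
manual_add_user() {   # $1=root dir  $2=name  $3=UID  $4=GID  $5=full name
    root=$1 name=$2 uid=$3 gid=$4 home=/home/$2
    today=$(( $(date +%s) / 86400 ))                 # days since 1970-01-01
    # Step 1: passwd and shadow entries ("!" = no usable password yet)
    echo "$name:x:$uid:$gid:$5:$home:/bin/sh" >> "$root/etc/passwd"
    echo "$name:!:$today:0:99999:7:::" >> "$root/etc/shadow"
    # Step 2 (if needed) would add $name to member lists in $root/etc/group.
    # Step 3: home directory with the basic set of files from etc/skel
    mkdir -p "$root$home"
    cp -R "$root/etc/skel/." "$root$home/"
    if [ "$(id -u)" -eq 0 ]; then
        chown -R "$uid:$gid" "$root$home"            # hand the lot to the user
    fi
}
```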

All les involved in adding a new account are plain text les. You can perform

each step manually using a text editor. However, as this is a job that is as tedious

as it is elaborate, it behooves you to let the system help you, by means of the

useradd

program.

In the simplest case, you pass useradd merely the new user’s user name. Optionally, you can enter various other user parameters. For unspecified parameters (typically the UID), “reasonable” default values will be chosen automatically. On request, useradd creates the user’s home directory and endows it with a basic set of files taken from the /etc/skel directory. The useradd command’s syntax is:

useradd [⟨options⟩] ⟨user name⟩

The following options (among others) are available:

-c ⟨comment⟩ GECOS field entry

-d ⟨home directory⟩ If this option is missing, /home/⟨user name⟩ is assumed

-e ⟨date⟩ On this date the account will be deactivated automatically (format “YYYY-MM-DD”)

-g ⟨group⟩ The new user’s primary group (name or GID). This group must exist.

-G ⟨group⟩[,⟨group⟩]… Supplementary groups (names or GIDs). These groups must also exist.

-s ⟨shell⟩ The new user’s login shell

-u ⟨UID⟩ The new user’s numerical UID. This UID must not already be in use, unless the “-o” option is given

-m Creates the home directory and copies the basic set of files to it. These files come from /etc/skel, unless a different directory was named using “-k ⟨directory⟩”.

For instance, the

# useradd -c "Joe Smith" -m -d /home/joe -g devel \
> -k /etc/skel.devel joe

command creates an account by the name of joe for a user called Joe Smith, and assigns it to the devel group. joe’s home directory is created as /home/joe, and the files from /etc/skel.devel are copied into it.

With the -D option (on SUSE distributions, --show-defaults) you may set default values for some of the properties of new user accounts. Without additional options, the default values are displayed:


# useradd -D

GROUP=100

HOME=/home

INACTIVE=-1

EXPIRE=

SHELL=/bin/sh

SKEL=/etc/skel

CREATE_MAIL_SPOOL=no

You can change these values using the -g, -b, -f, -e, and -s options, respectively:

# useradd -D -s /usr/bin/zsh        zsh as the default shell

The final two values in the list cannot be changed this way.

useradd is a fairly low-level tool. In real life, you as an experienced administrator will likely not be adding new user accounts by means of useradd. Instead you will use a shell script that incorporates your local policies (just so you don’t have to remember them all the time). Unfortunately you will have to come up with this shell script by yourself, at least unless you are using Debian GNU/Linux or one of its derivatives (see below).
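Such a local script might look roughly like the sketch below. Everything in it is an assumption for illustration:

- the function name
- the rule for acceptable account names
- a site policy that puts home directories below /home/⟨department⟩ (as in the /home/hr/joe example earlier in this chapter)
- a department-specific skeleton directory /etc/skel.⟨department⟩
- the department doubling as the primary group

With DRY_RUN set, the function only prints the useradd command line instead of running it.

```sh
#!/bin/sh
# Sketch of a site-specific wrapper around useradd (hypothetical policy):
#  - account names: a lowercase letter, then lowercase letters/digits/_/-
#  - home directory /home/<department>/<name>, skeleton /etc/skel.<department>
#  - the department doubles as the primary group, which must already exist
add_local_user() {   # $1=name  $2=full name  $3=department
    case $1 in
        [[:lower:]]*) ;;
        *) echo "bad user name: $1" >&2; return 1 ;;
    esac
    case $1 in
        *[![:lower:][:digit:]_-]*) echo "bad user name: $1" >&2; return 1 ;;
    esac
    # All "$1"/"$2"/"$3" below are expanded before "set" replaces them.
    set -- useradd -c "$2" -m -d "/home/$3/$1" -g "$3" -k "/etc/skel.$3" "$1"
    if [ -n "${DRY_RUN-}" ]; then
        echo "$*"            # only show what would be run
    else
        "$@"
    fi
}

DRY_RUN=1 add_local_user joe "Joe Smith" devel
```

Without DRY_RUN, and run as root, the same call would actually create the account. That assumes the group and skeleton directory exist and the local useradd accepts these options.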

Watch out: Even though every serious Linux distribution comes with a program called useradd, the implementations differ in their details.

The Red Hat distributions include a fairly run-of-the-mill version of useradd without bells and whistles, which provides the features discussed above.

The SUSE distributions’ useradd is geared towards optionally adding users to an LDAP directory rather than the /etc/passwd file. (This is why the -D option cannot be used to query or set default values like it can elsewhere: it is already spoken for to do LDAPy things.) The details are beyond the scope of this manual.

On Debian GNU/Linux and Ubuntu, useradd does exist, but the recommended method to create new user accounts is a program called adduser (thankfully this is not confusing). The advantage of adduser is that it plays according to Debian GNU/Linux’s rules. Furthermore, it can execute arbitrary other actions for a new account besides creating the actual account. For example, one might create a directory in a web server’s document tree so that the new user (and nobody else) can publish files there, or the user could automatically be authorised to access a database server. You can find the details in adduser(8) and adduser.conf(5).

After it has been created using useradd, the new account is not yet accessible; the system administrator must first set up a password. We shall be explaining this presently.

11.3.2 The passwd Command

The passwd command is used to set up passwords for users. If you are logged in as root, then

# passwd joe

asks for a new password for user joe. (You must enter it twice, as it will not be echoed to the screen.)

The passwd command is also available to normal users, to let them change their own passwords (changing other users’ passwords is root’s prerogative):


$ passwd

Changing password for joe.

(current) UNIX password: secret123

Enter new UNIX password: 321terces

Retype new UNIX password: 321terces

passwd: password updated successfully

Normal users must enter their own password correctly once before being allowed to set a new one. This is supposed to make life difficult for practical jokers. Think of someone playing around on your computer because you had to step out very urgently and didn’t have time to engage the screen lock.

In addition, passwd serves to manage various settings in /etc/shadow. For example, you can look at a user’s “password state” by calling the passwd command with the -S option:

# passwd -S bob

bob LK 10/15/99 0 99999 7 0

The rst eld in the output is (once more) the user name, followed by the password

state: “

” or “

” if a password is set, “

” or “

” for a locked account, and “

” for

an account with no password at all. The other elds are, respectively, the date of

the last password change, the minimum and maximum interval for changing the

password, the expiry warning interval and the “grace period” before the account

is locked completely after the password has expired. (See also Section 11.2.2.)
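If you need to check many accounts, a small filter can translate the status code. This sketch assumes the layout shown above (user name first, status code second) and accepts both the short and the long codes. The function name is hypothetical. You would feed it the output of passwd -S.

```sh
#!/bin/sh
# Sketch: translate the status code in "passwd -S" output.
# Expects lines like "bob LK 10/15/99 0 99999 7 0" on stdin.
describe_status() {
    while read -r user code rest; do
        case $code in
            P|PS) state="password set" ;;
            L|LK) state="locked" ;;
            NP)   state="no password" ;;
            *)    state="unknown status '$code'" ;;
        esac
        echo "$user: $state"
    done
}

echo "bob LK 10/15/99 0 99999 7 0" | describe_status     # bob: locked
```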

You can change some of these settings by means of passwd options. Here are a few examples:

# passwd -l joe          Lock the account

# passwd -u joe          Unlock the account

# passwd -n 7 joe        Password change at most every 7 days

# passwd -x 30 joe       Password change at least every 30 days

# passwd -w 3 joe        Warning 3 days before the password expires

Locking and unlocking accounts by means of -l and -u works by putting a “!” in front of the encrypted password in /etc/shadow. Password encryption can never produce a “!”. As a result, nothing a user enters at login can match the modified entry, and access via the usual login procedure is prevented. Once the “!” is removed, the original password is back in force. (Astute, innit?) However, keep in mind that users may be able to reach the system by other means that do not check the encrypted password in the user database, such as the secure shell with public-key authentication.
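You can imitate the effect of -l and -u on the ⟨password⟩ field with sed, which also shows why unlocking restores the old password exactly. This sketch works on the sample line from Section 11.2.2 rather than on /etc/shadow itself. The function names are made up. On a real system you would of course use passwd -l and passwd -u.

```sh
#!/bin/sh
# Sketch: what "passwd -l"/"passwd -u" do to the password field (field 2).
# Works on a sample shadow-format line; the function names are hypothetical.
lock_entry()   { sed 's/^\([^:]*\):/\1:!/'; }      # insert "!" before the hash
unlock_entry() { sed 's/^\([^:]*\):!/\1:/'; }      # remove a leading "!" again

line='joe:GodY6c5pZk1xs:10816:0:10000::::'
locked=$(printf '%s\n' "$line" | lock_entry)
echo "$locked"                            # joe:!GodY6c5pZk1xs:10816:0:10000::::
printf '%s\n' "$locked" | unlock_entry    # the original line again
```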

Changing the remaining settings in /etc/shadow requires the chage command:

# chage -E 2009-12-01 joe     Lock account from 1 Dec 2009

# chage -E -1 joe             Cancel expiry date

# chage -I 7 joe              Grace period 1 week from password expiry

# chage -m 7 joe              Like passwd -n (Grr.)

# chage -M 7 joe              Like passwd -x (Grr, grr.)

# chage -W 3 joe              Like passwd -w (Grr, grr, grr.)

(chage can change all settings that passwd can change, and then some.)
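chage accepts dates in the form YYYY-MM-DD, but what ends up in /etc/shadow is the day count described in Section 11.2.2. You can sketch the conversion in both directions with GNU date (an assumption; other date implementations use different options). Both helper names are made up. Both helpers work in UTC so that the local time zone cannot shift the result by a day.

```sh
#!/bin/sh
# Sketch: convert between calendar dates and /etc/shadow day numbers.
# Assumes GNU date; both helpers work in UTC to avoid time-zone offsets.
date_to_days() { echo $(( $(date -u -d "$1" +%s) / 86400 )); }
days_to_date() { date -u -d "@$(( $1 * 86400 ))" +%F; }

date_to_days 2009-12-01     # prints 14579
days_to_date 14579          # prints 2009-12-01
```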

If you cannot remember the option names, invoke chage with the name of a user account only. The program will present you with a sequence of the current values to change or confirm.


You cannot retrieve a clear-text password even if you are the administrator. Even checking /etc/shadow doesn’t help, since this file stores all passwords already encrypted. If a user forgets their password, it is usually sufficient to reset their password using the passwd command.

Should you have forgotten the root password and not be logged in as root by any chance, your last option is to boot Linux to a shell, or boot from a rescue disk or CD. (See Chapter 16.) After that, you can use an editor to clear the ⟨password⟩ field of the root entry in /etc/passwd (or in /etc/shadow, if shadow passwords are in use).

Exercises

C11.8 [3] Change user joe’s password. How does the /etc/shadow file change? Query that account’s password state.

C11.9 [!2] The user dumbo has forgotten his password. How can you help him?

C11.10 [!3] Adjust the settings for user joe’s password such that he can change his password after at least a week, and must change it after at most two weeks. There should be a warning two days before the two weeks are up. Check the settings afterwards.

11.3.3 Deleting User Accounts

To delete a user account, you need to:

• remove the user’s entries from /etc/passwd and /etc/shadow
• delete all references to that user in /etc/group
• remove the user’s home directory as well as all other files created or owned by that user

If the user has, e. g., a mail box for incoming messages in /var/mail, that should also be removed.

Again there is a suitable command to automate these steps. The userdel command removes a user account completely. Its syntax:

userdel [-r] ⟨user name⟩

The -r option ensures that the user’s home directory (including its content) and his mail box in /var/mail will be removed. Other files belonging to the user, e. g., crontab files, must be deleted manually. A quick way to locate and remove files belonging to a certain user is the

find / -uid ⟨UID⟩ -delete

command. Without the -r option, only the user information is removed from the user database; the home directory remains in place.
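-delete acts immediately and cannot be undone, so it is prudent to look at the list first. This minimal sketch assumes GNU find for -delete, and the helper names are hypothetical. It runs the same find expression once with -print for review and once with -delete. Because the starting directory is an argument, you can try the sketch on a scratch directory before pointing it at /.

```sh
#!/bin/sh
# Sketch: look before you delete. Lists, then removes, everything below
# directory $1 that belongs to UID $2. Assumes GNU find (for -delete);
# list_owned and delete_owned are hypothetical names.
list_owned()   { find "$1" -uid "$2" -print; }
delete_owned() { find "$1" -uid "$2" -delete; }   # -delete implies -depth

# Typical use after "userdel joe", if joe had UID 1001:
#   list_owned / 1001 > /root/joe-files.txt      # review this list first
#   delete_owned / 1001
```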

11.3.4 Changing User Accounts and Group Assignment

User accounts and group assignments are traditionally changed by editing the /etc/passwd and /etc/group files. However, many systems contain commands like usermod and groupmod for the same purpose. You should prefer these, since they are safer and (mostly) more convenient to use.

The usermod program accepts mostly the same options as useradd, but changes existing user accounts instead of creating new ones. For example, with

usermod -g ⟨group⟩ ⟨user name⟩

you could change a user’s primary group.

Caution! If you want to change an existing user account’s UID, you could edit the ⟨UID⟩ field in /etc/passwd directly. However, you should at the same time transfer that user’s files to the new UID using chown. For example, once you have changed the UID of user tux’s account, “chown -R tux /home/tux” re-confers ownership of all files below tux’s home directory to tux. If “ls -l” displays a numerical UID instead of a textual name, this implies that there is no user name for the UID of these files. You can fix this using chown.
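chown -R only covers the home directory. Files the user left elsewhere (in /tmp, say) still carry the old number. The sketch below, with a made-up helper name, finds everything below a starting directory that still belongs to the old numeric UID and hands it to the account under its new UID. It uses chown -h so that symbolic links themselves are re-owned rather than the files they point to.

```sh
#!/bin/sh
# Sketch: after changing an account's UID, re-own files that still carry
# the old number. Run as root on "/" in practice; the helper name is made up.
reown_old_uid() {     # $1=start directory  $2=old UID  $3=user (new owner)
    find "$1" -uid "$2" -exec chown -h "$3" {} +
}

# e.g. after changing tux's UID from 1000 to 1050:
#   reown_old_uid / 1000 tux
```

You can list files whose UID has no user name at all, whatever the cause, with find / -nouser.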

11.3.5 Changing User Information Directly—vipw

The vipw command invokes an editor (vi or a different one) to edit /etc/passwd directly. At the same time, it locks the file. This keeps other users from changing it simultaneously using, e. g., passwd, whose changes would otherwise be lost. With the -s option, /etc/shadow can be edited.

The actual editor that is invoked is determined by the value of the VISUAL environment variable, alternatively that of the EDITOR environment variable; if neither exists, vi will be launched.

Exercises

C11.11 [!2] Create a user called test. Change to the test account and create a few files using touch, including a few in a different directory than the home directory (say, /tmp). Change back to root and change test’s UID. What do you see when listing user test’s files?

C11.12 [!2] Create a user called test1 using your distribution’s graphical tool (if available), test2 by means of the useradd command, and another, test3, manually. Look at the configuration files. Can you work without problems using any of these three accounts? Create a file using each of the new accounts.

C11.13 [!2] Delete user test2’s account and ensure that there are no files left on the system that belong to that user.

C11.14 [2] Change user test1’s UID. What else do you need to do?

C11.15 [2] Change user test1’s home directory from /home/test1 to /home/user/test1.

11.3.6 Creating, Changing and Deleting Groups

Like user accounts, you can create groups using any of several methods. The “manual” method is much less tedious here than when creating new user accounts. Since groups do not have home directories, it is usually sufficient to edit the /etc/group file using any text editor and add a suitable new line (see below for vigr). When group passwords are used, another entry must be added to /etc/gshadow.

Incidentally, there is nothing wrong with creating directories for groups. Group members can place the fruits of their collective labour there. The approach is similar to creating user home directories, although no basic set of configuration files needs to be copied.

For group management, there are groupadd, groupmod, and groupdel programs, by analogy to useradd, usermod, and userdel. You should use these in favour of editing /etc/group and /etc/gshadow directly. With groupadd you can create new groups simply by giving the correct command parameters:

groupadd [-g ⟨GID⟩] ⟨group name⟩

The -g option allows you to specify a given group number. As mentioned before, this is a positive integer. The values up to 99 are usually reserved for system groups. If -g is not specified, the next free GID is used.

You can edit existing groups with groupmod without having to write to /etc/group directly:

groupmod [-g ⟨GID⟩] [-n ⟨name⟩] ⟨group name⟩

The “-g ⟨GID⟩” option changes the group’s GID. Unresolved file group assignments must be adjusted manually. The “-n ⟨name⟩” option sets a new name for the group without changing the GID; manual adjustments are not necessary.

There is also a tool to remove group entries. This is unsurprisingly called groupdel:

groupdel ⟨group name⟩

Here, too, it makes sense to check the file system and adjust “orphaned” group assignments for files with the chgrp command. Users’ primary groups may not be removed. The users in question must either be removed beforehand, or they must be reassigned to a different primary group.

The gpasswd command is mainly used to manipulate group passwords in a way similar to the passwd command. The system administrator can, however, delegate the administration of a group’s membership list to one or more group administrators. Group administrators also use the gpasswd command:

gpasswd -a ⟨user⟩ ⟨group⟩

adds the ⟨user⟩ to the ⟨group⟩, and

gpasswd -d ⟨user⟩ ⟨group⟩

removes him again. With

gpasswd -A ⟨user⟩,… ⟨group⟩

the system administrator can nominate users who are to serve as group administrators.

The SUSE distributions haven’t included gpasswd for some time. Instead there are modified versions of the user and group administration tools that can handle an LDAP directory.

As the system administrator, you can change the group database directly using the vigr command. It works like vipw, by invoking an editor for “exclusive” access to /etc/group. Similarly, “vigr -s” gives you access to /etc/gshadow.

Exercises

C11.16 [2] What are groups needed for? Give possible examples.

C11.17 [1] Can you create a directory that all members of a group can access?

C11.18 [!2] Create a supplementary group test. Only user test1 should be a member of that group. Set a group password. Log in as user test1 or test2 and try to change over to the new group.


Commands in this Chapter

• adduser: Convenient command to create new user accounts (Debian); see adduser(8)

• chfn: Allows users to change the GECOS field in the user database; see chfn(1)

• getent: Gets entries from administrative databases; see getent(1)

• gpasswd: Allows a group administrator to change a group’s membership and update the group password; see gpasswd(1)

• groupadd: Adds user groups to the system group database; see groupadd(8)

• groupdel: Deletes groups from the system group database; see groupdel(8)

• groupmod: Changes group entries in the system group database; see groupmod(8)

• groups: Displays the groups that a user is a member of; see groups(1)

• id: Displays a user’s UID and GIDs; see id(1)

• last: Lists recently logged-in users; see last(1)

• useradd: Adds new user accounts; see useradd(8)

• userdel: Removes user accounts; see userdel(8)

• usermod: Modifies the user database; see usermod(8)

• vigr: Allows editing /etc/group or /etc/gshadow with “file locking”, to avoid conflicts; see vipw(8)

Summary

• Access to the system is governed by user accounts.
• A user account has a numerical UID and (at least) one textual user name.
• Users can form groups. Groups have names and numerical GIDs.
• “Pseudo-users” and “pseudo-groups” serve to further refine access rights.
• The central user database is (normally) stored in the /etc/passwd file.
• The users’ encrypted passwords are stored, together with other password parameters, in the /etc/shadow file, which is unreadable for normal users.
• Group information is stored in the /etc/group and /etc/gshadow files.
• Passwords are managed using the passwd program.
• The chage program is used to manage password parameters in /etc/shadow.
• User information is changed using vipw or, better, using the specialised tools useradd, usermod, and userdel.
• Group information can be manipulated using the groupadd, groupmod, groupdel and gpasswd programs.


Access Control

Contents

12.1 The Linux Access Control System  180
12.2 Access Control For Files And Directories  180
    12.2.1 The Basics  180
    12.2.2 Inspecting and Changing Access Permissions  181
    12.2.3 Specifying File Owners and Groups—chown and chgrp  182
    12.2.4 The umask  183
12.3 Access Control Lists (ACLs)  185
12.4 Process Ownership  185
12.5 Special Permissions for Executable Files  185
12.6 Special Permissions for Directories  186
12.7 File Attributes  188

Goals

• Understanding the Linux access control/privilege mechanisms
• Being able to assign access permissions to files and directories
• Knowing about the “umask”, SUID, SGID and the “sticky bit”
• Knowing about file attributes in the ext file systems

Prerequisites

• Knowledge of Linux user and group concepts (see Chapter 11)
• Knowledge of Linux files and directories


180 12 Access Control

12.1 The Linux Access Control System

Whenever several users have access to the same computer system there must be an access control system for processes, files and directories in order to ensure that user 𝐴 cannot access user 𝐵’s private files just like that. To this end, Linux implements the standard system of Unix privileges.

In the Unix tradition, every file and directory is assigned to exactly one user (its owner) and one group. Every file supports separate privileges for its owner, the members of the group it is assigned to (“the group”, for short), and all other users (“others”). Read, write and execute privileges can be enabled individually for these three sets of users. The owner may determine a file’s access privileges. The group and others may only access a file if the owner grants them suitable privileges. The sum total of a file’s access permissions is also called its access mode.

In a multi-user system which stores private or group-internal data on a generally accessible medium, the owner of a file can keep others from reading or modifying his files by instituting suitable access control. The rights to a file can be determined separately and independently for its owner, its group and the others. Access permissions thus allow users to reflect, in the files a group is working with, how responsibilities are divided up in the group’s collaborative process.

12.2 Access Control For Files And Directories

12.2.1 The Basics

For each le and each directory in the system, Linux allows separate access rights

for each of the three classes of users—owner, members of the le’s group, others.

These rights include read permission, write permission, and execute permission.

As far as les are concerned, these permissions control approximately whatfile permissions

their names suggest: Whoever has read permission may look at the le’s content,

whoever has write permission is allowed to change its content. Execute permis-

sion is necessary to launch the le as a process.

BExecuting a binary “machine-language program” requires only execute per-

mission. For les containing shell scripts or other types of “interpreted”

programs, you also need read permission.

For directories, things look somewhat dierent: Read permission is requireddirectory permissions

to look at a directory’s content—for example, by executing the

command. You

need write permission to create, delete, or rename les in the directory. “Execute”

permission stands for the possibility to “use” the directory in the sense that you

can change into it using

, or use its name in path names referring to les farther

down in the directory tree.

BIn directories where you have only read permission, you may read the le

names but cannot nd out anything else about the les. If you have only “ex-

ecute permission” for a directory, you can access les as long as you know

their names.

Usually it makes little sense to assign write and execute permission to a directory

separately; however, it may be useful in certain special cases.

It is important to emphasise that write permission on a file is completely immaterial if the file is to be deleted: you need write permission to the directory that the file is in and nothing else! Since “deleting” a file only removes a reference to the actual file information (the inode) from the directory, this is purely a directory operation. The rm command does warn you if you’re trying to delete a file that you do not have write permission for, but if you confirm the operation and have write permission to the directory involved, nothing will stand in the way of the operation’s success. (Like any other Unix-like system, Linux has no way of “deleting” a file outright; you can only remove all references to a file, in which case the Linux kernel decides on its own that no one will be able to access the file any longer, and gets rid of its content.)

If you do have write permission to the file but not its directory, you cannot remove the file completely. You can, however, truncate it down to 0 bytes and thereby remove its content, even though the file itself still exists in principle.
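Both rules can be checked on a scratch directory. The following is a minimal sketch, assuming a POSIX shell with mktemp and GNU stat (for “stat -c”); the file names are made up for the illustration. Since root may bypass permission checks, the sketch only tests outcomes that do not depend on whether you run it as root.

```shell
#!/bin/sh
# Sketch: deleting needs write permission on the directory, truncating needs
# write permission on the file. Assumes mktemp and GNU stat.
dir=$(mktemp -d)

# A read-only file in a writable directory can still be removed.
echo "data" > "$dir/readonly.txt"
chmod 444 "$dir/readonly.txt"            # nobody may write to the file itself
rm -f "$dir/readonly.txt"                # -f: no confirmation question from rm
if [ -e "$dir/readonly.txt" ]; then deleted=no; else deleted=yes; fi

# A writable file in a read-only directory can still be emptied.
echo "data" > "$dir/writable.txt"
chmod 555 "$dir"                         # the directory is no longer writable
: > "$dir/writable.txt"                  # truncate to 0 bytes via the file's write permission
size=$(stat -c %s "$dir/writable.txt")

echo "read-only file deleted: $deleted; size after truncation: $size"
chmod 755 "$dir" && rm -rf "$dir"        # make the directory writable again to clean up
```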

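For each user, Linux determines the single “most appropriate” set of access rights: if the user owns the file, the owner’s permissions apply; otherwise, if the user is a member of the file’s group, the group permissions apply; only if neither is the case do the permissions for “others” apply. For example, if the members of a file’s group do not have read permission for the file but “others” do, then the group members may not read the file. The (admittedly enticing) rationale that, if all others may look at the file, then the group members, who are in some sense also part of “all others”, should be allowed to read it as well, does not apply.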

12.2.2 Inspecting and Changing Access Permissions

You can obtain information about the rights, user and group assignment that apply to a file using “ls -l”:

$ ls -l
-rw-r--r-- 1 joe users  4711 Oct 4 11:11 datei.txt
drwxr-x--- 2 joe group2 4096 Oct 4 11:12 testdir

The string of characters in the first column of the table details the access permissions for the owner, the file’s group, and others (the very first character is just the file type and has nothing to do with permissions). The third column gives the owner’s user name, and the fourth that of the file’s group.

In the permissions string, “r”, “w”, and “x” signify existing read, write, and execute permission, respectively. If there is just a “-” in the list, then the corresponding category does not enjoy the corresponding privilege. Thus, “rw-r--r--” stands for “read and write permission for the owner, but read permission only for group members and others”.

As the le owner, you may set access permissions for a le using the

chmod

com-

chmod

command

mand (from “change mode”). You can specify the three categories by means of the

abbreviations “

” (user) for the owner (yourself), “

” (group) for the le’s group’s

members, and “

” (others) for everyone else. The permissions themselves are

given by the already-mentioned abbreviations “

”, “

”, and “

”. Using “

”, “

”,

and “

”, you can specify whether the permissions in question should be added to

any existing permissions, “subtracted” from the existing permissions, or used to

replace whatever was set before. For example:

$chmod u+x file

Execute permission for owner

$chmod go+w file

sets write permission for group and others

$chmod g+rw file

sets read and write permission for group

$chmod g=rw,o=r file

sets read and write permission,

removes group execute permission;

sets just read permission for others

$chmod a+w file

equivalent to

ugo+w
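To watch “+”, “=” and “-” at work, you can apply a sequence of chmod calls to a scratch file and record the octal mode after each step. This is a minimal sketch, assuming mktemp and GNU stat, whose “%a” format prints the mode in octal; the comment on each step gives the mode it produces.

```shell
#!/bin/sh
# Sketch: apply "+", "=" and "-" one after another to a scratch file and
# record the octal mode after each step (mktemp and GNU stat assumed).
f=$(mktemp)
chmod 600 "$f"                                  # start from rw-------
chmod g+rw "$f";     m1=$(stat -c %a "$f")      # add rw for group        -> 660
chmod g=rw,o=r "$f"; m2=$(stat -c %a "$f")      # set group and others    -> 664
chmod g-w "$f";      m3=$(stat -c %a "$f")      # remove write for group  -> 644
chmod a+w "$f";      m4=$(stat -c %a "$f")      # same as ugo+w           -> 666
echo "$m1 $m2 $m3 $m4"
rm -f "$f"
```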

In fact, permission specifications can be considerably more complex. Consult the info documentation for chmod to find out all the details.

A le’s owner is the single user (apart from

root

) who is allowed to change a

le’s or directory’s access permissions. This privilege is independent of the actual

permissions; the owner may take away all their own permissions, but that does

not keep them from giving them back later.

The general syntax of the

chmod

command is

182 12 Access Control

chmod

[⟨options⟩] ⟨permissions⟩ ⟨name⟩

…

You can give as many le or directory names as desired. The most important

options include:

-R    If a directory name is given, the permissions of files and directories inside this directory will also be changed (and so on all the way down the tree).

--reference=⟨name⟩    Uses the access permissions of file ⟨name⟩. In this case the ⟨permissions⟩ argument is omitted.

You may also specify a file’s access mode “numerically” instead of “symbolically” (what we just discussed). In practice this is very common for setting all permissions of a file or directory at once, and works like this: the three permission triples are represented as a three-digit octal number. The first digit describes the owner’s rights, the second those of the file’s group, and the third those that apply to “others”. Each of these digits derives from the sum of the individual permissions, where read permission has value 4, write permission 2, and execute permission 1. Here are a few examples for common access modes in “ls -l” and octal form:

rw-r--r--    644
r--------    400
rwxr-xr-x    755

Using numerical access modes, you can only set all permissions at once: there is no way of setting or removing individual rights while leaving the others alone, like you can do with the “+” and “-” operators of the symbolic representation. Hence, the command

$ chmod 644 file

is equivalent to the symbolic

$ chmod u=rw,go=r file
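The “sum of 4, 2 and 1” rule can be spelled out as a small shell function. perm_to_octal is a hypothetical helper written only for this illustration, not a standard command; it expects a plain nine-character string such as “rw-r--r--” (no “s” or “t” letters).

```shell
#!/bin/sh
# Sketch: turn a symbolic mode string such as "rw-r--r--" into octal by
# adding r=4, w=2 and x=1 within each of the three triples.
# perm_to_octal is a hypothetical helper, not a standard command.
perm_to_octal() {
    perms=$1
    result=""
    for start in 1 4 7; do                     # owner, group, others
        triple=$(printf '%s\n' "$perms" | cut -c "$start-$((start + 2))")
        digit=0
        case $triple in r??) digit=$((digit + 4)) ;; esac
        case $triple in ?w?) digit=$((digit + 2)) ;; esac
        case $triple in ??x) digit=$((digit + 1)) ;; esac
        result=$result$digit
    done
    printf '%s\n' "$result"
}

perm_to_octal rw-r--r--      # 644
perm_to_octal r--------      # 400
perm_to_octal rwxr-xr-x      # 755
```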

12.2.3 Specifying File Owners and Groups—chown and chgrp

The chown command lets you set the owner and group of a file or directory. This command takes the desired owner’s user name and/or group name and the file or directory name the change should apply to. It is called like

chown ⟨user name⟩[:][⟨group name⟩] ⟨name⟩ …
chown :⟨group name⟩ ⟨name⟩ …

If both a user and group name are given, both are changed; if just a user name is given, the group remains as it was; if a user name followed by a colon is given, then the file is assigned to the user’s primary group. If just a group name is given (with the colon in front), the owner remains unchanged. For example:

# chown joe:devel letter.txt
# chown www-data foo.html          new user www-data
# chown :devel /home/devel         new group devel

chown also supports an obsolete syntax where a dot is used in place of the colon.


To “give away” files to other users or arbitrary groups you need to be root. The main reason for this is that normal users could otherwise annoy one another if the system uses quotas (i.e., every user can only use a certain amount of storage space): by giving away large files, one user could use up another user’s quota.

Using the chgrp command, you can change a file’s group even as a normal user, as long as you own the file and are a member of the new group:

chgrp ⟨group name⟩ ⟨name⟩ …

Changing a file’s owner or group does not change the access permissions for the various categories.

chown and chgrp also support the -R option to apply changes recursively to part of the directory hierarchy.
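You can check that a group change leaves the access mode alone without root privileges by assigning a scratch file to your own primary group. This is a minimal sketch, assuming mktemp and GNU stat (“%a” for the octal mode, “%g” for the numeric group ID).

```shell
#!/bin/sh
# Sketch: chgrp changes the group but leaves the access mode alone.
# Uses your own primary group, so no root privileges are needed.
# Assumes mktemp and GNU stat (%a octal mode, %g numeric group ID).
f=$(mktemp)
chmod 640 "$f"
before=$(stat -c %a "$f")
chgrp "$(id -g)" "$f"
after=$(stat -c %a "$f")
group=$(stat -c %g "$f")
echo "mode before: $before, after: $after; group now $group"
rm -f "$f"
```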

Of course you can also change a file’s permissions, group, and owner using most of the popular file browsers (such as Konqueror or Nautilus).

Exercises

12.1 [!2] Create a new file. What is that file’s group? Use chgrp to assign the file to one of your secondary groups. What happens if you try to assign the file to a group that you are not a member of?

12.2 [4] Compare the mechanisms that various file browsers (like Konqueror, Nautilus, …) offer for setting a file’s permissions, owner, group, … Are there notable differences?

12.2.4 The umask

New les are usually created using the (octal) access mode 666 (read and write

permission for everyone). New directories are assigned the access mode 777.

Since this is not always what is desired, Linux oers a mechanism to remove cer-

tain rights from these access modes. This is called “umask”.

BNobody knows exactly where this name comes from—even though there

are a few theories that all sound fairly implausible.

The umask is an octal number whose complement is ANDed bitwise with the standard access mode (666 or 777) to arrive at the new file’s or directory’s actual access mode. In other words, you can consider the umask an access mode containing exactly those rights that the new file should not have. Here’s an example; let the umask be 027:

1. Umask value:                        027   ----w-rwx
2. Complement of umask value:          750   rwxr-x---
3. A new file’s access mode:           666   rw-rw-rw-
4. Result (2 and 3 ANDed together):    640   rw-r-----

The third column shows the octal value, the fourth a symbolic representation. The AND operation in step 4 can also be read off the fourth column of the second and third lines: in the fourth line there is a letter in each position that had a letter in both the second and the third line; wherever either of them has a dash (“-”), the result will be a dash.

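If you’d rather not bother with the complement and AND, you can simply imagine that each digit of the umask is subtracted from the corresponding digit of the actual access mode and negative results are considered as zero (so no “borrowing” from the place to the left). For our example (access mode 666 and umask 027) this means something like

666 ⊖ 027 = 640,

since 6 ⊖ 0 = 6, 6 ⊖ 2 = 4, and 6 ⊖ 7 = 0. Be careful, though: this shortcut breaks down for file modes when a umask digit contains the execute bit (1, 3, or 5). With a umask digit of 1, for instance, 6 ⊖ 1 gives 5, whereas the actual result of the AND is still 6; only the complement-and-AND method is always right.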
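Both views of the umask can be checked with shell arithmetic, in which constants with a leading 0 are read as octal. The sketch below assumes a POSIX shell, mktemp and GNU stat. It also creates a file and a directory under umask 027 to compare the computed modes with what the system actually does. The umask 011 case shows why digit-by-digit subtraction is only a shortcut: the AND leaves 666 untouched.

```shell
#!/bin/sh
# Sketch: effective mode = requested mode AND NOT umask.
# POSIX shell arithmetic reads constants with a leading 0 as octal.
file_mode=$(printf '%03o' $(( 0666 & ~0027 )))   # new file, umask 027
dir_mode=$(printf '%03o' $(( 0777 & ~0027 )))    # new directory, umask 027
odd_mode=$(printf '%03o' $(( 0666 & ~0011 )))    # umask 011: the x bits are absent anyway
echo "files: $file_mode  directories: $dir_mode  file with umask 011: $odd_mode"

# Cross-check with real objects (mktemp and GNU stat assumed).
d=$(mktemp -d)
( umask 027; touch "$d/newfile"; mkdir "$d/newdir" )   # subshell keeps the umask change local
real_file=$(stat -c %a "$d/newfile")
real_dir=$(stat -c %a "$d/newdir")
echo "created with umask 027: file $real_file, directory $real_dir"
rm -rf "$d"
```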


The umask is set using the umask shell command, either by invoking it directly or via a shell startup file, typically ~/.profile, ~/.bash_profile, or ~/.bashrc.

The umask is a process attribute similar to the current directory or the process environment, i.e., it is passed to child processes, but changes in a child process do not modify the parent process’s settings.
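You can observe the inheritance rule by starting child shells with “sh -c”. This is a minimal sketch, assuming a POSIX shell. bash and dash both print the umask as four octal digits, such as 0022.

```shell
#!/bin/sh
# Sketch: the umask is inherited by child processes, but changes made in a
# child never flow back to the parent. `sh -c` starts a separate child process.
umask 022
inherited=$(sh -c 'umask')             # the child starts with the parent's umask
changed=$(sh -c 'umask 077; umask')    # the child changes only its own copy
parent=$(umask)                        # reports the parent's (unchanged) value
echo "child inherited: $inherited, child changed to: $changed, parent still: $parent"
```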

The umask command takes a parameter specifying the desired umask:

umask [-S|⟨umask⟩]

The umask may be given as an octal number or in a symbolic representation similar to that used by chmod. Deviously enough, the symbolic form contains the permissions that should be left (rather than those to be taken away):

$ umask 027
… is equivalent to …
$ umask u=rwx,g=rx,o=

This means that in the symbolic form you must give the exact complement of the value that you would specify in the octal form: exactly those rights that do not occur in the octal specification.

If you specify no value at all, the current umask is displayed. If the -S option is given, the current umask is displayed in symbolic form (where, again, the remaining permissions are set):

$ umask
0027
$ umask -S
u=rwx,g=rx,o=

Note that you can only remove permissions using the umask. There is no way of making files executable by default.

Incidentally, the umask also influences the chmod command. If you invoke chmod with a “+” mode (e.g., “chmod +w file”) without referring to the owner, group or others, this is treated like “a+w”, but the permissions set in the umask are not modified. Consider the following example:

$ umask 027
$ touch file
$ chmod +x file
$ ls -l file
-rwxr-x--- 1 tux users 0 May 25 14:30 file

The “chmod +x” command sets execute permission for the user and group, but not the others, since the umask contains the execute bit for “others”. Thus with the umask you can take precautions against giving excessive permissions to files.
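The difference between “chmod +x” and “chmod a+x” shows up clearly when both are applied to fresh files under the same umask. This is a minimal sketch, assuming GNU chmod and GNU stat, with a scratch directory from mktemp.

```shell
#!/bin/sh
# Sketch: "chmod +x" honours the umask, "chmod a+x" does not.
# Assumes GNU chmod/stat and mktemp; file names are arbitrary.
d=$(mktemp -d)
umask 027
touch "$d/one" "$d/two"          # both are created as 640 (rw-r-----)
chmod +x  "$d/one"               # no "u", "g", "o" or "a": umask bits are left alone
chmod a+x "$d/two"               # explicit "a": the umask plays no role
one=$(stat -c %a "$d/one")
two=$(stat -c %a "$d/two")
echo "chmod +x gave $one, chmod a+x gave $two"   # 750 and 751
rm -rf "$d"
```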

Theoretically, this also works for the chmod operators “-” and “=”, but this does not make a lot of sense in practice.

Exercises

12.3 [!1] State a numerical umask that leaves the user all permissions, but removes all permissions from group members and others. What is the corresponding symbolic umask?

12.4 [2] Convince yourself that the “chmod +x” and “chmod a+x” commands indeed differ from each other as advertised.


12.3 Access Control Lists (ACLs)

As mentioned above, Linux allows you to assign permissions for a file’s owner, group, and all others separately. For some applications, though, this three-tier system is too simple-minded, or the more sophisticated permission schemes of other operating systems must be mapped to Linux. Access control lists (ACLs) can be used for this.

On most file systems, Linux supports “POSIX ACLs” according to IEEE 1003.1e (draft 17) with some Linux-specific extensions. This lets you specify additional groups and users for files and directories, who then can be assigned read, write, and execute permissions that differ from those of the file’s group and “others”. Other rights, such as that to assign permissions, are still restricted to a file’s owner (or root) and cannot be delegated even with ACLs. The setfacl and getfacl commands are used to set and query ACLs.

ACLs are a fairly new and rarely-used addition to Linux, and their use is subject to certain restrictions. The kernel does oversee compliance with them, but, for instance, not every program is able to copy ACLs along with a file’s content; you may have to use a specially-adapted tar (star) for backups of a file system using ACLs. ACLs are supported by Samba, so Windows clients get to see the correct permissions, but if you export file systems to other (proprietary) Unix systems, your ACLs may be ignored by Unix clients that do not support ACLs.

You can read up on ACLs on Linux on http://acl.bestbits.at/ and in acl(5) as well as getfacl(1) and setfacl(1).

Detailed knowledge of ACLs is not required for the LPIC-1 exams.

12.4 Process Ownership

Linux considers not only the data on a storage medium as objects that can be owned. The processes on the system have owners, too.

Many commands create a process in the system’s memory. During normal use, there are always several processes that the system protects from each other. Every process together with all data within its virtual address space is assigned to a user, its owner. This is most often the user who started the process, but processes created using administrator privileges may change their ownership, and the SUID mechanism (Section 12.5) can also have a hand in this.

The owners of processes are displayed by the ps program if it is invoked using the -u option.

# ps -u
USER     PID %CPU %MEM  SIZE  RSS TTY STAT START  TIME COMMAND
bin       89  0.0  1.0   788  328 ?   S    13:27  0:00 rpc.portmap
test1    190  0.0  2.0  1100   28 3   S    13:27  0:00 bash
test1    613  0.0  1.3   968   24 3   S    15:05  0:00 vi XF86.tex
nobody   167  0.0  1.4   932   44 ?   S    13:27  0:00 httpd
root       1  0.0  1.0   776   16 ?   S    13:27  0:03 init [3]
root       2  0.0  0.0     0    0 ?   SW   13:27  0:00 (kflushd)

12.5 Special Permissions for Executable Files

When listing les using the “

ls -l

” command, you may sometimes encounter per-

mission sets that dier from the usual

rwx

, such as

-rwsr-xr-x 1 root shadow 32916 Dec 11 20:47 /usr/bin/passwd


What does that mean? We have to digress here for a bit: Assume that the passwd program carries the usual access mode:

-rwxr-xr-x 1 root shadow 32916 Dec 11 20:47 /usr/bin/passwd

A normal (unprivileged) user, say joe, wants to change his password and invokes the passwd program. Next, he receives the message “permission denied”. What is the reason? The passwd process (which uses joe’s privileges) tries to open the /etc/shadow file for writing and fails, since only root may write to that file. This cannot be different, since otherwise everybody would be able to manipulate passwords arbitrarily and, for example, change the root password.

By means of the set-UID bit (frequently called “SUID bit”, for short) a program can be caused to run not with the invoker’s privileges but with those of the file owner, here, root. In the case of passwd, the process executing passwd has write permission to /etc/shadow, even though the invoking user, not being a system administrator, generally doesn’t. It is the responsibility of the author of the passwd program to ensure that no monkey business goes on, e.g., that nobody can exploit programming errors to change files other than /etc/shadow, or entries in /etc/shadow other than the password field of the invoking user. On Linux, by the way, the set-UID mechanism works only for binary programs, not shell or other interpreter scripts.

Bell Labs used to hold a patent on the SUID mechanism, which was invented by Dennis Ritchie [SUID]. Originally, AT&T distributed Unix with the caveat that license fees would be levied after the patent had been granted; however, due to the logistical difficulties of charging hundreds of Unix installations small amounts of money retroactively, the patent was released into the public domain.

By analogy to the set-UID bit there is a SGID bit, which causes a process to be executed with the program file’s group and the corresponding privileges (usually to access other files assigned to that group) rather than the invoker’s group setting.

The SUID and SGID modes, like all other access modes, can be changed using the chmod program, by giving symbolic permissions such as u+s (sets the SUID bit) or g-s (deletes the SGID bit). You can also set these bits in octal access modes by adding a fourth digit at the very left: the SUID bit has the value 4, the SGID bit the value 2. Thus you can assign the access mode 4755 to a file to make it readable and executable to all users (the owner may also write to it) and to set the SUID bit.

You can recognise set-UID and set-GID programs in the output of “ls -l” by the symbolic abbreviation “s” in place of “x” for executable files.
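You do not need root privileges to set these bits on a file you own. The scratch file then simply confers your own privileges, so nothing is gained. This is a minimal sketch, assuming mktemp and GNU stat: “%A” prints the mode in “ls -l” style, and “%a” prints it in octal, including the special bits.

```shell
#!/bin/sh
# Sketch: set and clear SUID/SGID on a scratch file you own (no root needed).
# Assumes mktemp and GNU stat: %A prints the mode like "ls -l", %a in octal.
f=$(mktemp)
chmod 4755 "$f"; a=$(stat -c %A "$f")    # -rwsr-xr-x: "s" in the owner's x position
chmod u-s  "$f"; b=$(stat -c %A "$f")    # -rwxr-xr-x: SUID removed symbolically
chmod g+s  "$f"; c=$(stat -c %A "$f")    # -rwxr-sr-x: "s" in the group's x position
octal=$(stat -c %a "$f")                 # 2755: SGID contributes the leading 2
printf '%s\n' "$a" "$b" "$c" "$octal"
rm -f "$f"
```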

12.6 Special Permissions for Directories

There is another exception from the principle of assigning file ownership according to the identity of its creator: a directory’s owner can decree that files created in that directory should belong to the same group as the directory itself. This can be specified by setting the SGID bit on the directory. (As directories cannot be executed, the SGID bit is available to be used for such things.)

A directory’s access permissions are not changed via the SGID bit. To create a file in such a directory, a user must have write permission in the category (owner, group, others) that applies to him. If, for example, a user is neither the owner of a directory nor a member of the directory’s group, the directory must be writable for “others” for him to be able to create files there. A file created in a SGID directory then belongs to that directory’s group, even if the user is not a member of that group at all.

The typical application for the SGID bit on a directory is a directory that is used as file storage for a “project group”. (Only) the members of the project group are supposed to be able to read and write all files in the directory, and to create new files. This means that you need to put all users collaborating on the project into a project group (a secondary group will suffice):

# groupadd project                     Create new group
# usermod -a -G project joe            joe into the group
# usermod -a -G project sue            sue too

Now you can create the directory and assign it to the new group. The owner and group are given all permissions, the others none; you also set the SGID bit:

# mkdir /home/project
# chgrp project /home/project
# chmod u=rwx,g=srwx,o= /home/project

Now, if user joe creates a file in /home/project, that file should be assigned to group project:

$ id
uid=1000(joe) gid=1000(joe) groups=101(project),1000(joe)
$ touch /tmp/joe.txt                   Test: standard directory
$ ls -l /tmp/joe.txt
-rw-r--r-- 1 joe joe 0 Jan 6 17:23 /tmp/joe.txt
$ touch /home/project/joe.txt          project directory
$ ls -l /home/project/joe.txt
-rw-r--r-- 1 joe project 0 Jan 6 17:24 /home/project/joe.txt

There is just a little fly in the ointment, which you will be able to discern by looking closely at the final line in the example: the file does belong to the correct group, but other members of group project may nevertheless only read it. If you want all members of group project to be able to write to it as well, you must either apply chmod after the fact (a nuisance) or else set the umask such that group write permission is retained (see Exercise 12.6).
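The group inheritance can be reproduced without root privileges in a scratch directory. This minimal sketch assumes mktemp, awk and GNU stat. Instead of a real project group, it uses the last group reported by “id -G” (a supplementary group, if you have one, which makes the effect visible). If that assignment fails, it falls back to your primary group.

```shell
#!/bin/sh
# Sketch: a new file in a SGID directory gets the directory's group.
# Assumes mktemp, awk and GNU stat (%A ls-style mode, %g numeric group ID).
d=$(mktemp -d)
grp=$(id -G | awk '{ print $NF }')                        # last of your groups
chgrp "$grp" "$d" 2>/dev/null || chgrp "$(id -g)" "$d"    # fall back to the primary group
chmod u=rwx,g=srwx,o= "$d"           # rwx plus SGID for the group, nothing for others
mode=$(stat -c %A "$d")              # drwxrws---
dir_group=$(stat -c %g "$d")
touch "$d/new.txt"
file_group=$(stat -c %g "$d/new.txt")   # same as dir_group because of the SGID bit
echo "directory $mode, group $dir_group; new file's group $file_group"
rm -rf "$d"
```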

The SGID mode only changes the system’s behaviour when new files are created. Existing files work just the same as everywhere else. This means, for instance, that a file created outside the SGID directory keeps its existing group assignment when moved into it (whereas on copying, the new copy would be put into the directory’s group).

The chgrp program works as always in SGID directories, too: the owner of a file can assign it to any group he is a member of. If the owner is not a member of the directory’s group, he cannot put the file into that group using chgrp; he must create it afresh within the directory.

It is possible to set the SUID bit on a directory, but this permission does not signify anything.

Linux supports another special mode for directories, where only a file’s owner may delete or rename files within that directory:

drwxrwxrwt 7 root root 1024 Apr 7 10:07 /tmp

This “t” mode, the “sticky bit”, can be used to counter a problem which arises when public directories are in shared use: write permission to a directory lets a user delete other users’ files, regardless of their access mode and owner! For example, the /tmp directory is common ground, and many programs create their temporary files there. To do so, all users have write permission to that directory. This implies that any user has permission to delete files there.


Table 12.1: The most important file attributes

Attribute   Meaning
A           atime is not updated (interesting for mobile computers)
a           (append-only) The file can only be appended to
c           The file’s content is compressed transparently (not implemented)
d           The file will not be backed up by dump
i           (immutable) The file cannot be changed at all
j           Write operations to the file’s content are passed through the journal (ext3 only)
s           File data will be overwritten with zeroes on deletion (not implemented)
S           Write operations to the file are performed “synchronously”, i.e., without buffering them internally
u           The file may be “undeleted” after deletion (not implemented)

Usually, when deleting or renaming a file, the system does not consider that file’s access permissions. If the “sticky bit” is set on a directory, a file in that directory can subsequently be deleted (or renamed) only by its owner, the directory’s owner, or root. The “sticky bit” can be set or removed by specifying the symbolic +t and -t modes; in the octal representation it has value 1 in the same digit as SUID and SGID.
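Like the other special bits, the sticky bit can be set on any directory you own. This minimal sketch assumes mktemp and GNU stat. It gives a scratch directory the same mode as /tmp and then toggles the bit symbolically.

```shell
#!/bin/sh
# Sketch: give a scratch directory the mode of /tmp, then toggle the sticky bit.
# Assumes mktemp and GNU stat (%a octal, %A ls-style).
d=$(mktemp -d)
chmod 1777 "$d"
octal=$(stat -c %a "$d")        # 1777: the sticky bit is the leading 1
with_t=$(stat -c %A "$d")       # drwxrwxrwt
chmod -t "$d"
without_t=$(stat -c %A "$d")    # drwxrwxrwx
chmod +t "$d"
again=$(stat -c %A "$d")        # drwxrwxrwt
echo "$octal $with_t $without_t $again"
rm -rf "$d"
```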

The “sticky bit” derives its name from an additional meaning it used to have in earlier Unix systems: at that time, programs were copied to swap space in their entirety when started, and removed completely after having terminated. Program files with the sticky bit set would be left in swap space instead of being removed. This would accelerate subsequent invocations of those programs, since no copy would have to be done. Like most current Unix systems, Linux uses demand paging, i.e., it fetches only those parts of the code from the program’s executable file that are really required, and does not copy anything to swap space at all; on Linux, the sticky bit never had its original meaning.

Exercises

12.5 [2] What does the special “s” privilege mean? Where do you find it? Can you set this privilege on a file that you created yourself?

12.6 [!1] Which umask invocation can be used to set up a umask that would, in the project directory example above, allow all members of the project group to read and write files in the project directory?

12.7 [2] What does the special “t” privilege mean? Where do you find it?

12.8 [4] (For programmers.) Write a C program that invokes a suitable command. Set this program SUID root (or SGID root) and observe what happens when you execute it.

12.9 If you leave them alone with a root shell for a few minutes, clever users might try to stash a SUID root shell somewhere in the system, in order to assume administrator privileges whenever they like. Does that work with bash? With other shells?

12.7 File Attributes

Besides the access permissions, the ext2 and ext3 file systems support further file attributes enabling access to special file system features. The most important file attributes are summarised in Table 12.1.

Most interesting are perhaps the “append-only” and “immutable” attributes (a and i), which you can use to protect log files and configuration files from modification; only root may set or reset these attributes, and once set they also apply to processes running as root.

In principle, an attacker who has gained root privileges may reset these attributes. However, attackers do not necessarily consider that they might be set.

The A attribute may also be useful; you can use it on mobile computers to ensure that the disk isn’t always running, in order to save power. Usually, whenever a file is read, its “atime” (the time of last access) is updated, which of course entails an inode write operation. Certain files are very frequently looked at in the background, such that the disk never gets to rest, and you can help here by judiciously applying the A attribute.

The attributes marked “not implemented” in Table 12.1 sound very nice in theory, but are not (yet) supported by “normal” kernels. There are some more or less experimental enhancements making use of these attributes, and in part they are still pipe dreams.

You can set or reset attributes using the chattr command. This works rather like chmod: a preceding “+” sets one or more attributes, “-” deletes one or more attributes, and “=” causes the named attributes to be the only enabled ones. The -R option, as in chmod, lets chattr operate on all files in any subdirectories passed as arguments and their nested subdirectories. Symbolic links are ignored in the process.

# chattr +a /var/log/messages              Append only
# chattr -R +j /data/important            Data journaling …
# chattr -j /data/important/notso         … with exception

With the lsattr command, you can review the attributes set on a file. The program behaves similarly to “ls -l”:

# lsattr /var/log/messages
-----a----------- /var/log/messages

Every dash stands for a possible attribute. lsattr supports various options such as -R, -a, and -d, which generally behave like the eponymous options to ls.
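When a script needs to check for one particular attribute, the lsattr output shown above can be tested letter by letter. This is only a sketch: has_attr is a hypothetical helper of our own (not part of e2fsprogs), and it assumes that the attribute field is the first whitespace-separated word of the line, as in the example output.

```shell
#!/bin/sh
# has_attr LETTER: read one line of lsattr output from standard input and
# succeed (exit status 0) if LETTER occurs in the attribute field.
# has_attr is a hypothetical helper, not part of e2fsprogs.
has_attr() {
    read -r flags _name || return 1
    case "$flags" in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage (on an ext2/ext3 file system):
#   lsattr /var/log/messages | has_attr a && echo "append-only is set"
```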

Exercises

C12.10 [!2] Convince yourself that the a and i attributes work as advertised.

C12.11 [2] Can you make all dashes disappear in the lsattr output for a given file?


Commands in this Chapter

chattr     Sets file attributes for ext2 and ext3 file systems            chattr(1)    189
chgrp      Sets the assigned group of a file or directory                 chgrp(1)     182
chmod      Sets access modes for files and directories                    chmod(1)     181
chown      Sets the owner and/or assigned group of a file or directory    chown(1)     182
getfacl    Displays ACL data                                              getfacl(1)   185
lsattr     Displays file attributes on ext2 and ext3 file systems         lsattr(1)    189
setfacl    Enables ACL manipulation                                       setfacl(1)   185
star       POSIX-compatible tape archive with ACL support                 star(1)      185

Summary

• Linux supports file read, write and execute permissions, where these permissions can be set separately for a file’s owner, the members of the file’s group and “all others”.
• The sum total of a file’s permissions is also called its access mode.
• Every file (and directory) has an owner and a group. Access rights (read, write, and execute permission) are assigned to these two categories and “others” separately. Only the owner is allowed to set access rights.
• Access rights do not apply to the system administrator (root). He may read or write all files.
• File permissions can be manipulated using the chmod command.
• Using chown, the system administrator can change the user and group assignment of arbitrary files.
• Normal users can use chgrp to assign their files to different groups.
• The umask can be used to limit the standard permissions when files and directories are being created.
• The SUID and SGID bits allow the execution of programs with the privileges of the file owner or file group instead of those of the invoker.
• The SGID bit on a directory causes new files in that directory to be assigned the directory’s group (instead of the primary group of the creating user).
• The “sticky bit” on a directory lets only the owner (and the system administrator) delete files.
• The ext file systems support special additional file attributes.



Process Management

Contents

13.1 What Is A Process? 192
13.2 Process States 193
13.3 Process Information—ps 194
13.4 Processes in a Tree—pstree 195
13.5 Controlling Processes—kill and killall 196
13.6 pgrep and pkill 197
13.7 Process Priorities—nice and renice 199
13.8 Further Process Management Commands—nohup and top 199

Goals

• Knowing the Linux process concept

• Using the most important commands to query process information

• Knowing how to signal and stop processes

• Being able to inuence process priorities

Prerequisites

• Linux commands



13.1 What Is A Process?

A process is, in eect, a “running program”. Processes have code that is executed,

and data on which the code operates, but also various attributes the operating uses

to manage them, such as:

• The unique process number (PID or “process identity”) serves to identify the process and can only be assigned to a single process at a time.
• All processes know their parent process number, or PPID. Every process can spawn others (“children”) that then contain a reference to their procreator. The only process that does not have a parent process is the “pseudo” process with PID 0, which is generated during system startup and creates the “init” process with a PID of 1, which in turn is the ancestor of all other processes in the system.
• Every process is assigned to a user and a set of groups. These are important to determine the access the process has to files, devices, etc. (See Section 12.4.) In addition, the user the process is assigned to may stop, terminate, or otherwise influence the process. The owner and group assignments are passed on to child processes.
• The system splits the CPU time into little chunks (“time slices”), each of which lasts only for a fraction of a second. The current process is entitled to such a time slice, and afterwards the system decides which process should be allowed to execute during the next time slice. This decision is made by the appropriate “scheduler” based on the priority of a process.
  In multi-processor systems, Linux also takes into account the particular topology of the computer in question when assigning CPU time to processes: it is simple to run a process on any of the different cores of a multi-core CPU which share the same memory, while the “migration” of a process to a different processor with separate memory entails a noticeable administrative overhead and is therefore less often worthwhile.
• A process has other attributes (a current directory, a process environment, …) which are also passed on to child processes.

You can consult the /proc file system for this type of information. This process file system is used to make available data from the system kernel which is collected at run time and presented by means of directories and files. In particular, there are various directories using numbers as names; every such directory corresponds to one process and its name to the PID of that process. For example:

dr-xr-xr-x 3 root root 0 Oct 16 11:11 1

dr-xr-xr-x 3 root root 0 Oct 16 11:11 125

dr-xr-xr-x 3 root root 0 Oct 16 11:11 80

In the directory of a process, there are various “files” containing process information. Details may be found in the proc(5) man page.
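The numeric directories below /proc can be read with ordinary shell tools. A minimal sketch, assuming a Linux system with /proc mounted; list_pids and proc_state are made-up names for this illustration, and the “State:” line comes from the status file inside each process directory.

```shell
#!/bin/sh
# list_pids: print one PID per line, taken from the numeric
# directory names below /proc (Linux-specific).
list_pids() {
    for dir in /proc/[0-9]*; do
        if [ -d "$dir" ]; then
            printf '%s\n' "${dir#/proc/}"
        fi
    done
}

# proc_state PID: print the "State:" line from /proc/PID/status.
proc_state() {
    grep '^State:' "/proc/$1/status"
}

# The state of the current shell; it is usually sleeping while it
# waits for grep to finish.
proc_state "$$"
```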

The job control available in many shells is also a form of process management: a “job” is a process whose parent process is a shell. From the corresponding shell, its jobs can be controlled using commands like jobs, bg, and fg, as well as the key combinations Ctrl+z and Ctrl+c (among others). More information is available from the manual page of the shell in question, or from the Linup Front training manual, Introduction to Linux for Users and Administrators.


Figure 13.1: The relationship between various process states (process created, runnable, operating, sleeping, process terminates)

Exercises

C13.1 [3] How can you view the environment variables of any of your processes? (Hint: /proc file system.)

C13.2 [2] (For programmers.) What is the maximum possible PID? What happens when this limit is reached? (Hint: Look for the string “PID_MAX” in the files below /usr/include/linux.)

13.2 Process States

Another important property of a process is its process state. A process in memory that is waiting to be executed by the CPU is in the “runnable” state. Linux uses pre-emptive multitasking, i.e., a scheduler distributes the available CPU time to waiting processes in pieces called “time slices”. If a process is actually executing on the CPU, this state is called “operating”, and after its time slice is over the process reverts to the “runnable” state.

From an external point of view, Linux does not distinguish between these two process states; the process in question is always marked “runnable”.

It is quite possible that a process requires further input or needs to wait for peripheral device operations to complete; such a process cannot be assigned CPU time, and its state is considered to be “sleeping”. Processes that have been stopped by means of Ctrl+z using the shell’s job control facility are in state “stopped”.

Once the execution of a process is over, it terminates itself and makes a return code available, which it can use to signal, for example, whether it completed successfully or not (for a suitable definition of “success”).

Once in a while processes appear that are marked as zombies using the “Z” state. These “living dead” usually exist only for a brief instant. A process becomes a zombie when it finishes, and dies for good once its parent process has queried its return code. If a zombie does not disappear from the process table, this means that its parent should really have picked up the zombie’s return code but didn’t. A zombie cannot be removed from the process table. Because the original process no longer exists and takes up neither RAM nor CPU time, a zombie has no impact on the system except for an unattractive entry in the system state. Persistent or very numerous zombies usually indicate programming errors in the parent process; when the parent process terminates, they should disappear as well.

Zombies disappear when their parent process disappears because “orphaned” processes are “adopted” by the init process. Since the init process spends most of its time waiting for other processes to terminate so that it can collect their return code, the zombies are then disposed of fairly quickly.

Of course, zombies take up room in the process table that might be required for other processes. If that proves a problem, look at the parent process.
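To find the parent to look at, you can search the status files below /proc for processes in state “Z” and print their PPID. This is a sketch assuming Linux and a POSIX shell with awk; list_zombies is a hypothetical name, and the same information is also available through ps.

```shell
#!/bin/sh
# list_zombies: print "PID PPID NAME" for every process whose
# /proc/PID/status reports state "Z". The PPID column names the parent
# that has not yet collected the zombie's return code.
list_zombies() {
    for status in /proc/[0-9]*/status; do
        # A process may vanish between globbing and reading; ignore such errors.
        awk '/^Name:/  { name = $2 }
             /^State:/ { state = $2 }
             /^Pid:/   { pid = $2 }
             /^PPid:/  { ppid = $2 }
             END { if (state == "Z") print pid, ppid, name }' "$status" 2>/dev/null
    done
    return 0
}

list_zombies
```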

Exercises

C13.3 [2] Start an xclock process in the background. In the $! shell variable you will find the PID of that process (it always contains the PID of the most recently launched background process). Check the state of that process by means of the “grep ^State: /proc/$!/status” command. Stop the xclock by moving it to the foreground and stopping it using Ctrl+z. What is the process state now? (Alternatively, you may use any other long-running program in place of xclock.)

C13.4 [4] (When going over this manual for the second time.) Can you create

a zombie process on purpose?

13.3 Process Information—ps

You would normally not access the process information in /proc directly but use the appropriate commands to query it.

The ps (“process status”) command is available on every Unix-like system. Without any options, all processes running on the current terminal are output. The resulting list contains the process number PID, the terminal TTY, the process state STAT, the CPU time used so far TIME and the command being executed.

$ ps
  PID TTY STAT TIME COMMAND
  997   1 S    0:00 -bash
 1005   1 R    0:00 ps
$ _

There are two processes currently executing on the tty1 terminal: apart from the bash with PID 997, which is currently sleeping (state “S”), a ps command is executed using PID 1005 (state “R”). The “operating” state mentioned above is not displayed in ps output.

The syntax of ps is fairly confusing. Besides Unix98-style options (like -l) and GNU-style long options (such as --help), it also allows BSD-style options without a leading dash. Here is a selection out of all possible parameters:

a (“all”) displays all processes with a terminal
--forest displays the process hierarchy
l (“long”) outputs extra information such as the priority
r (“running”) displays only runnable processes
T (“terminal”) displays all processes on the current terminal
U ⟨name⟩ (“user”) displays processes owned by user ⟨name⟩
x also displays processes without a terminal

The unusual syntax of ps derives from the fact that AT&T’s ps traditionally used leading dashes on options while BSD’s didn’t (and the same option can have quite different results in both flavours). When the big reunification came in System V Release 4, one could hang on to most options with their customary meaning.


If you give ps a PID, only information pertaining to the process in question will be displayed (if it exists):

$ ps 1
  PID TTY STAT TIME COMMAND
    1 ?   Ss   0:00 init [2]

With the -C option, ps displays information about the process (or processes) based on a particular command:

$ ps -C konsole
  PID TTY          TIME CMD
 4472 ?        00:00:10 konsole
13720 ?        00:00:00 konsole
14045 ?        00:00:14 konsole

(Alternatively, searching the output of ps with grep would help here as well.)
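One pitfall with grep is that the grep process itself can show up in the result, because its own command line contains the search word. A common workaround is to write the first character as a bracket expression. The following sketch assumes the search word contains no regular-expression special characters; grep_ps is a made-up helper name.

```shell
#!/bin/sh
# grep_ps NAME: search "ps ax" output for NAME without also matching the
# grep process itself. The pattern "[k]onsole" still matches "konsole",
# but grep's own command line (which contains "[k]onsole") does not.
# Assumption: NAME is a plain word without regex special characters.
grep_ps() {
    first=${1%"${1#?}"}     # first character of NAME
    rest=${1#?}             # NAME without its first character
    ps ax | grep -- "[$first]$rest"
}

# Usage: grep_ps konsole
```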

Exercises

C13.5 [!2] What does the information obtainable with the ps command mean? Invoke ps without an option, then with the a option, and finally with the ax option. What does the x option do?

C13.6 [3] The ps command allows you to determine the output format yourself by means of the -o option. Study the ps(1) manual page and specify a ps command line that will output the PID, PPID, the process state and the command.

13.4 Processes in a Tree—pstree

If you do not want to obtain every bit of information about a process but are rather interested in the relationships between processes, the pstree command is helpful. pstree displays a process tree in which the child processes are shown as depending on their parent process. The processes are displayed by name:

$ pstree
init-+-apache---7*[apache]
     |-apmd
     |-atd
     |-cannaserver
     |-cardmgr
     |-chronyd
     |-cron
     |-cupsd
     |-dbus-daemon-1
     |-events/0-+-aio/0
     |          |-kblockd/0
     |          `-2*[pdflush]
     |-6*[getty]
     |-ifd
     |-inetd
     |-kapmd
     |-kdeinit-+-6*[kdeinit]
     |         |-kdeinit-+-bash---bash
     |         |         |-2*[bash]
     |         |         |-bash---less
     |         |         |-bash-+-pstree
     |         |         |      `-xdvi---xdvi.bin---gs
     |         |         `-bash---emacs---emacsserver
     |         |-kdeinit---3*[bash]
     |         |-kteatime
     |         `-tclsh
     |-10*[kdeinit]
     |-kdeinit---kdeinit



Identical processes are combined into one entry: the name appears in brackets, preceded by a count and “*”.

The most important options of pstree include:

-p displays PIDs along with process names
-u displays process owners’ user names
-G makes the display prettier by using terminal graphics characters; whether this is in fact an improvement depends on your terminal

You can also obtain an approximated tree structure using “ps --forest”. The tree structure is part of the COMMAND column in the output.

13.5 Controlling Processes—kill and killall

The kill command sends signals to selected processes. The desired signal can be specified either numerically or by name; you must also pass the process number in question, which you can find out using ps:

$ kill -15 4711              Send signal SIGTERM to process 4711
$ kill -TERM 4711            Same thing
$ kill -SIGTERM 4711         Same thing again
$ kill -s TERM 4711          Same thing again
$ kill -s SIGTERM 4711       Same thing again
$ kill -s 15 4711            Guess what

Here are the most important signals with their numbers and meaning:

SIGHUP (1, “hang up”) causes the shell to terminate all of its child processes that use the same controlling terminal as itself. For background processes without a controlling terminal, this is frequently used to cause them to re-read their configuration files (see below).
SIGINT (2, “interrupt”) Interrupts the process; equivalent to the Ctrl+c key combination.
SIGKILL (9, “kill”) Terminates the process and cannot be ignored; the “emergency brake”.
SIGTERM (15, “terminate”) Default for kill and killall; terminates the process.
SIGCONT (18, “continue”) Lets a process that was stopped using SIGSTOP continue.
SIGSTOP (19, “stop”) Stops a process temporarily.
SIGTSTP (20, “terminal stop”) Equivalent to the Ctrl+z key combination.

You shouldn’t get hung up on the signal numbers, which are not all guaranteed to be the same on all Unix versions (or even Linux platforms). You’re usually safe as far as 1, 9, or 15 are concerned, but for everything else you should rather be using the names.
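The effect of SIGSTOP and SIGCONT can be watched in the process state under /proc. This is a sketch assuming Linux and a POSIX shell; the function name stop_and_continue is invented for this example, and the state strings in the comments are what current Linux kernels typically report.

```shell
#!/bin/sh
# stop_and_continue: stop a background "sleep" with SIGSTOP, show its state,
# resume it with SIGCONT, show the state again, then end it with SIGTERM.
stop_and_continue() {
    sleep 30 &
    pid=$!
    sleep 1                              # let the child start sleeping
    kill -STOP "$pid"
    sleep 1
    grep '^State:' "/proc/$pid/status"   # typically "T (stopped)"
    kill -CONT "$pid"
    sleep 1
    grep '^State:' "/proc/$pid/status"   # typically "S (sleeping)" again
    kill -TERM "$pid"
    wait "$pid" 2>/dev/null || true      # collect the return code, so no zombie is left
    return 0
}

# Usage: stop_and_continue
```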


Unless otherwise specified, the signal SIGTERM (“terminate”) will be sent, which (usually) ends the process. Programs can be written such that they “trap” signals (handle them internally) or ignore them altogether. Signals that a process neither traps nor ignores usually cause it to crash hard. Some (few) signals are ignored by default.

The SIGKILL and SIGSTOP signals are not handled by the process but by the kernel and hence cannot be trapped or ignored. SIGKILL terminates a process without giving it a chance to object (as SIGTERM would), and SIGSTOP stops the process such that it is no longer given CPU time.

kill does not always stop processes. Background processes which provide system services without a controlling terminal (daemons) usually reread their configuration files without a restart if they are sent SIGHUP (“hang up”).
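Signal handling of this kind can be imitated with the shell’s trap builtin. The following sketch assumes a POSIX shell; signal_demo is a hypothetical name, and the “configuration re-read” is only an echo standing in for what a real daemon would do. SIGKILL could not be caught this way.

```shell
#!/bin/sh
# signal_demo: a tiny stand-in for a daemon. The background subshell traps
# SIGHUP (pretending to re-read its configuration) and SIGTERM (cleaning up
# and exiting with status 0).
signal_demo() {
    (
        trap 'echo "HUP: re-reading configuration"' HUP
        trap 'echo "TERM: cleaning up"; exit 0' TERM
        while :; do sleep 1; done   # a trap runs once the current sleep ends
    ) &
    pid=$!
    sleep 1                  # give the child time to install its traps
    kill -HUP "$pid"         # handled by the trap; the loop keeps running
    sleep 2
    kill -TERM "$pid"        # handled by the trap; the child exits voluntarily
    status=0
    wait "$pid" || status=$?
    echo "exit status: $status"
}

# Usage: signal_demo   (takes about four seconds)
```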

Like many other Linux commands, kill can only be applied to processes that you actually own. Only root is not subject to this restriction.

Sometimes a process will not even react to SIGKILL. The reason for this is either that it is a zombie (which is already dead and cannot be killed again) or else blocked in a system call. The latter situation occurs, for example, if a process waits for a write or read operation on a slow device to finish.

An alternative to the kill command is the killall command. killall acts just like kill: it sends a signal to the process. The difference is that the process must be named instead of addressed by its PID, and that all processes of the same name are signalled. If no signal is specified, it sends SIGTERM by default (like kill). killall outputs a warning if there was nothing to signal under the specified name.

The most important options for killall include:

-i killall will ask you whether it is actually supposed to signal the process in question.
-l outputs a list of all available signals.
-w waits for the signalled processes to actually terminate: killall checks every second whether the process still exists, and only terminates once it is gone.

Be careful with killall if you get to use Solaris or BSD every now and then. On these systems, the command does exactly what its name suggests: it kills all processes.

Exercises

C13.7 [2] Which signals are being ignored by default? (Hint: signal(7))

13.6 pgrep and pkill

As useful as ps and kill are, it can sometimes be difficult to identify exactly the right processes of interest. Of course you can look through the output of ps using grep, but making this “foolproof” without allowing too many false positives is at least inconvenient, if not tricky. Nicely enough, Kjetil Torgrim Homme has taken this burden off us and developed the pgrep program, which enables us to search the process list conveniently. A command like

$ pgrep -u root sshd

will, for example, list the PIDs of all sshd processes belonging to root.

By default, pgrep restricts itself to outputting PIDs. Use the -l option to get it to show the command name, too. With -a it will list the full command line.

The -d option allows you to specify a separator (the default is “\n”, i.e., a newline):

$ pgrep -d, -u hugo bash
4261,11043,11601,12289

You can obtain more detailed information on the processes by feeding the PIDs to ps:

$ ps up $(pgrep -d, -u hugo bash)

(The p option lets you give a comma-separated list of PIDs of interest.)
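Should pgrep find nothing, the ps command line above receives an empty PID list. A small wrapper can check pgrep’s exit status first. This is a sketch assuming the procps versions of pgrep and ps; show_procs is a made-up name.

```shell
#!/bin/sh
# show_procs PGREP-ARGS...: pass the arguments to pgrep and show the
# matching processes in "ps up" format. Unlike a bare
# "ps up $(pgrep -d, ...)", it reports an empty result instead of
# handing ps an empty PID list.
show_procs() {
    pids=$(pgrep -d, "$@") || {
        echo "show_procs: no matching processes" >&2
        return 1
    }
    ps up "$pids"
}

# Usage: show_procs -u hugo bash
```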

pgrep’s parameter is really an (extended) regular expression (consider egrep) which is used to examine the process names. Hence something like

$ pgrep '^([bd]a|t?c|k|z|)sh$'

will look for the common shells.

Normally pgrep considers only the process name (the first 15 characters of the process name, to be exact). Use the -f option to search the whole command line.

You can add search criteria by means of options. Here is a small selection:

-G Consider only processes belonging to the given group(s). (Groups can be specified using names or GIDs.)
-n Only display the newest (most recently started) of the found processes.
-o Only display the oldest (least recently started) of the found processes.
-P Consider only processes whose parent processes have one of the given PIDs.
-t Consider only processes whose controlling terminal is listed. (Terminal names should be given without the leading “/dev/”.)
-u Consider only processes with the given (effective) UIDs.

If you specify search criteria but no regular expression for the process name, all processes matching the search criteria will be listed. If you omit both you will get an error message.

The pkill command behaves like pgrep, except that it does not list the found processes’ PIDs but sends them a signal directly (by default, SIGTERM). As in kill you can specify another signal:

# pkill -HUP syslogd

The --signal option would also work:

# pkill --signal HUP syslogd

The advantage of pkill compared to killall is that pkill can be much more specific.
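As an illustration of how specific pkill can be, the -P option restricts it to the children of a given process. This sketch assumes a POSIX shell and procps pkill, and that it runs directly in the shell that starts the sleeps (not in a subshell), because $$ always names the main shell. It starts two sleep processes and then signals only those sleep processes whose parent is the current shell.

```shell
#!/bin/sh
# Start two background "sleep" processes as children of this shell.
sleep 30 &
first=$!
sleep 30 &
second=$!
sleep 1                        # let both children exec "sleep" before matching

# Signal only processes named exactly "sleep" whose parent is this
# shell ($$); sleep processes belonging to anyone else are left alone.
pkill -P "$$" -x sleep

# Collect the exit statuses. A process terminated by SIGTERM is
# reported by the shell as 128 + 15 = 143.
s1=0; wait "$first" || s1=$?
s2=0; wait "$second" || s2=$?
echo "exit statuses: $s1 $s2"
```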

Exercises

C13.8 [!1] Use pgrep to determine the PIDs of all processes belonging to user hugo. (If you don’t have a user hugo, then specify some other user instead.)

C13.9 [2] Use two separate terminal windows (or text consoles) to start one “sleep 60” command each. Use pkill to terminate (a) the first command started, (b) the second command started, (c) the command started in one of the two terminal windows.


13.7 Process Priorities—nice and renice

In a multi-tasking operating system such as Linux, CPU time must be shared among various processes. This is the scheduler’s job. There is normally more than one runnable process, and the scheduler must allot CPU time to runnable processes according to certain rules. The deciding factor for this is the priority of a process. The priority of a process changes dynamically according to its prior behaviour: “interactive” processes, i.e., ones that do I/O, are favoured over those that just consume CPU time.

As a user (or administrator) you cannot set process priorities directly. You can merely ask the kernel to prefer or penalise processes. The “nice value” quantifies the degree of favouritism exhibited towards a process, and is passed along to child processes.

A new process’s nice value can be specified with the nice command. Its syntax is

nice [-n ⟨nice value⟩] ⟨command⟩ ⟨parameter⟩ …

(nice is used as a “prefix” for another command).

The possible nice values are numbers between −20 and +19. A negative nice value increases the priority, a positive value decreases it (the higher the value, the “nicer” you are towards the system’s other users by giving your own processes a lower priority). If no nice value is specified, the default value of +10 is assumed. Only root may start processes with a negative nice value (negative nice values are not generally nice for other users).
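With no arguments, the GNU coreutils version of nice prints the current nice value, which makes the inheritance of nice values easy to observe. A sketch under that assumption; the relationships in the comments assume a starting value low enough that nothing is capped at +19.

```shell
#!/bin/sh
# GNU coreutils "nice" without arguments prints the current nice value.
# Each inner "nice" runs as a child, so it reports the value it inherited.
base=$(nice)
echo "current shell:          $base"
echo "nice -n 5:              $(nice -n 5 nice)"              # base + 5
echo "nice without a value:   $(nice nice)"                   # base + 10
echo "nice -n 5, then -n 3:   $(nice -n 5 nice -n 3 nice)"    # base + 8: adjustments add up
# Results are capped at +19, the lowest possible priority.
```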

The priority of a running process can be influenced using the renice command. You call renice with the desired new nice value and the PID (or PIDs) of the process(es) in question:

renice ⟨nice value⟩ ⟨PID⟩ …

Again, only the system administrator may assign arbitrary nice values. Normal users may only increase the nice value of their own processes using renice; for example, it is impossible to revert a process started with nice value 5 back to nice value 0, while it is absolutely all right to change its nice value to 10. (Think of a ratchet.)

Exercises

C13.10 [2] Try to give a process a higher priority. This may possibly not work. Why? Check the process priority using ps.

13.8 Further Process Management Commands—nohup and top

When you invoke a command using nohup, that command will ignore a SIGHUP signal and thus survive the demise of its parent process:

nohup ⟨command⟩ …

The process is not automatically put into the background but must be placed there by appending a & to the command line. If the program’s standard output is a terminal and the user has not specified anything else, the program’s output together with its standard error output will be redirected to the nohup.out file. If the current directory is not writable for the user, the file is created in the user’s home directory instead.
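Whether nohup really protects a command can be checked by sending it SIGHUP yourself. This is a sketch assuming GNU nohup and a POSIX shell; survives_hup is a hypothetical name. Redirecting the output explicitly, as done here, also means no nohup.out file is created.

```shell
#!/bin/sh
# survives_hup: start "sleep" under nohup, send it SIGHUP, then end it
# with SIGTERM and look at how it died. Exit status 143 (128 + 15) means
# SIGHUP was ignored and SIGTERM did the job; 129 (128 + 1) would mean
# SIGHUP had killed it.
survives_hup() {
    nohup sleep 30 >/dev/null 2>&1 &   # explicit redirection: no nohup.out
    pid=$!
    sleep 1                # give nohup time to ignore SIGHUP and exec sleep
    kill -HUP "$pid"
    sleep 1
    kill -TERM "$pid" 2>/dev/null
    status=0
    wait "$pid" || status=$?
    echo "exit status: $status"
    [ "$status" -eq 143 ]
}

# Usage: survives_hup && echo "nohup protected the job"
```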


top unifies the functions of many process management commands in a single program. It also provides a process table which is constantly being updated. You can interactively execute various operations; an overview is available using h. For example, it is possible to sort the list according to several criteria, send signals to processes (k), or change the nice value of a process (r).

Commands in this Chapter

kill       Terminates a background process                                  bash(1), kill(1)   196
killall    Sends a signal to all processes matching the given name          killall(1)         197
nice       Starts programs with a different nice value                      nice(1)            199
nohup      Starts a program such that it is immune to SIGHUP signals        nohup(1)           199
pgrep      Searches processes according to their name or other criteria     pgrep(1)           197
pkill      Sends signals to processes according to their name or other criteria   pkill(1)     198
ps         Outputs process status information                               ps(1)              194
pstree     Outputs the process tree                                         pstree(1)          195
renice     Changes the nice value of running processes                      renice(8)          199
top        Screen-oriented tool for process monitoring and control          top(1)             199

Summary

• A process is a program that is being executed.
• Besides a program text and the corresponding data, a process has attributes such as a process number (PID), parent process number (PPID), owner, groups, priority, environment, current directory, …
• All processes derive from the init process (PID 1).
• ps can be used to query process information.
• The pstree command shows the process hierarchy as a tree.
• Processes can be controlled using signals.
• The kill and killall commands send signals to processes.
• The nice and renice commands are used to influence process priorities.
• ulimit limits the resource usage of a process.
• top is a convenient user interface for process management.


Hard Disks (and Other Secondary Storage)