SCO® OpenServer™
Performance Guide
© 1983-1995 The Santa Cruz Operation, Inc. All rights reserved.
© 1992-1994 AT&T Global Information Solutions Company; © 1987-1989 Legent Corporation; ©
1980-1989 Microsoft Corporation; © 1993-1994 Programmed Logic Corporation; © 1988 UNIX Systems
Laboratories, Inc. All rights reserved.

No part of this publication may be reproduced, transmitted, stored in a retrieval system, nor translated into
any human or computer language, in any form or by any means, electronic, mechanical, magnetic, optical,
chemical, manual, or otherwise, without the prior written permission of the copyright owner, The Santa
Cruz Operation, Inc., 400 Encinal Street, Santa Cruz, California, 95060, USA. Copyright infringement is a
serious matter under the United States and foreign Copyright Laws.
Information in this document is subject to change without notice and does not represent a commitment on
the part of The Santa Cruz Operation, Inc.
SCO, the SCO logo, The Santa Cruz Operation, Open Desktop, ODT, Panner, SCO Global Access, SCO OK, SCO
OpenServer, SCO MultiView, SCO Visual Tcl, Skunkware, and VP/ix are trademarks or registered
trademarks of The Santa Cruz Operation, Inc. in the USA and other countries. UNIX is a registered
trademark in the USA and other countries, licensed exclusively through X/Open Company Limited. All
other brand and product names are or may be trademarks of, and are used to identify products or services
of, their respective owners.

Document Version: 5.0
1 May 1995

The SCO software that accompanies this publication is commercial computer software and, together with
any related documentation, is subject to the restrictions on US Government use as set forth below. If this
procurement is for a DOD agency, the following DFAR Restricted Rights Legend applies:
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the Government is subject to restrictions
as set forth in subparagraph (c)(1)(ii) of Rights in Technical Data and Computer Software Clause at DFARS
252.227-7013. Contractor/Manufacturer is The Santa Cruz Operation, Inc., 400 Encinal Street, Santa Cruz,
CA 95060.

If this procurement is for a civilian government agency, the following FAR Restricted Rights Legend applies:

RESTRICTED RIGHTS LEGEND: This computer software is submitted with restricted rights under
Government Contract No. __________ (and Subcontract No. __________, if appropriate). It may not be used,
reproduced, or disclosed by the Government except as provided in paragraph (g)(3)(i) of FAR Clause
52.227-14 alt III or as otherwise expressly stated in the contract. Contractor/Manufacturer is The Santa
Cruz Operation, Inc., 400 Encinal Street, Santa Cruz, CA 95060.

The copyrighted software that accompanies this publication is licensed to the End User only for use in strict
accordance with the End User License Agreement, which should be read carefully before commencing use
of the software. This SCO software includes software that is protected by these copyrights:
© 1983-1995 The Santa Cruz Operation, Inc.; © 1989-1994 Acer Incorporated; © 1989-1994 Acer America
Corporation; © 1990-1994 Adaptec, Inc.; © 1993 Advanced Micro Devices, Inc.; © 1990 Altos Computer
Systems; © 1992-1994 American Power Conversion, Inc.; © 1988 Archive Corporation; © 1990 ATI
Technologies, Inc.; © 1976-1992 AT&T; © 1992-1994 AT&T Global Information Solutions Company; © 1993
Berkeley Network Software Consortium; © 1985-1986 Bigelow & Holmes; © 1988-1991 Carnegie Mellon
University; © 1989-1990 Cipher Data Products, Inc.; © 1985-1992 Compaq Computer Corporation; ©
1986-1987 Convergent Technologies, Inc.; © 1990-1993 Cornell University; © 1985-1994 Corollary, Inc.; ©
1988-1993 Digital Equipment Corporation; © 1990-1994 Distributed Processing Technology; © 1991 D.L.S.
Associates; © 1990 Free Software Foundation, Inc.; © 1989-1991 Future Domain Corporation; © 1994
Gradient Technologies, Inc.; © 1991 Hewlett-Packard Company; © 1994 IBM Corporation; © 1990-1993
Intel Corporation; © 1989 Irwin Magnetic Systems, Inc.; © 1988-1994 IXI Limited; © 1988-1991 JSB
Computer Systems Ltd.; © 1989-1994 Dirk Koeppen EDV-Beratungs-GmbH; © 1987-1994 Legent
Corporation; © 1988-1994 Locus Computing Corporation; © 1989-1991 Massachusetts Institute of
Technology; © 1985-1992 Metagraphics Software Corporation; © 1980-1994 Microsoft Corporation; ©
1984-1989 Mouse Systems Corporation; © 1989 Multi-Tech Systems, Inc.; © 1991 National Semiconductor
Corporation; © 1990 NEC Technologies, Inc.; © 1989-1992 Novell, Inc.; © 1989 Ing. C. Olivetti & C. SpA; ©
1989-1992 Open Software Foundation, Inc.; © 1993-1994 Programmed Logic Corporation; © 1989 Racal
InterLan, Inc.; © 1990-1992 RSA Data Security, Inc.; © 1987-1994 Secureware, Inc.; © 1990 Siemens Nixdorf
Informationssysteme AG; © 1991-1992 Silicon Graphics, Inc.; © 1987-1991 SNMP Research, Inc.; ©
1987-1994 Standard Microsystems Corporation; © 1984-1994 Sun Microsystems, Inc.; © 1987 Tandy
Corporation; © 1992-1994 3COM Corporation; © 1987 United States Army; © 1979-1993 Regents of the
University of California; © 1993 Board of Trustees of the University of Illinois; © 1989-1991 University of
Maryland; © 1986 University of Toronto; © 1976-1990 UNIX System Laboratories, Inc.; © 1988 Wyse
Technology; © 1992-1993 Xware; © 1983-1992 Eric P. Allman; © 1987-1989 Jeffery D. Case and Kenneth W.
Key; © 1985 Andrew Cherenson; © 1989 Mark H. Colburn; © 1993 Michael A. Cooper; © 1982 Pavel Curtis;
© 1987 Owen DeLong; © 1989-1993 Frank Kardel; © 1993 Carlos Leandro and Rui Salgueiro; © 1986-1988
Larry McVoy; © 1992 David L. Mills; © 1992 Rainer Pruy; © 1986-1988 Larry Wall; © 1992 Q. Frank Xia. All
rights reserved. SCO NFS was developed by Legent Corporation based on Lachman System V NFS. SCO
TCP/IP was developed by Legent Corporation and is derived from Lachman System V STREAMS TCP, a
joint development of Lachman Associates, Inc. (predecessor of Legent Corporation) and Convergent
Technologies, Inc.
joint development of Lachman Associates, Inc. (predecessor of Legent Corporation) and Convergent
Technologies, Inc.

Table of contents

About this book .... 1
  How this book is organized .... 1
  Related documentation .... 2
  Typographical conventions .... 5
  How can we improve this book? .... 6

Chapter 1
What determines performance .... 7
  Hardware factors that influence performance .... 8
  Software factors that influence performance .... 9

Chapter 2
Managing performance .... 13
  Tuning methodology .... 14
  Defining performance goals .... 16
  Collecting data .... 16
  Formulating a hypothesis .... 17
  Getting more specifics .... 17
  Making adjustments to the system .... 18
  Performance tuning case studies .... 19
  Managing the workload .... 19

Chapter 3
Tuning CPU resources .... 21
  Operating system states .... 22
  Viewing CPU activity .... 23
  Process states .... 24
  Clock ticks and time slices .... 26
  Context switching .... 26
  Interrupts .... 28
  Calculation of process priorities .... 28
  Examining the run queue .... 30
  Multiprocessor systems .... 31
  Support for multiple processors .... 33
  Using the mpstat load displayer .... 34
  Examining interrupt activity on multiprocessor systems .... 34
  Process scheduling .... 34
  Adjusting the scheduling of processes .... 34
  Controlling priority calculations - dopricalc .... 35
  Controlling the effective priority of processes - primove .... 36
  Controlling cache affinity - cache_affinity .... 37
  Controlling process preemption - preemptive .... 37
  Load balancing - loadbalance .... 37
  Identifying CPU-bound systems .... 38
  Tuning CPU-bound systems .... 40

Chapter 4
Tuning memory resources .... 41
  Physical memory .... 42
  Virtual memory .... 42
  Paging .... 44
  Swapping .... 47
  Viewing physical memory usage .... 48
  Viewing swap space usage .... 48
  Viewing swapping and paging activity .... 49
  Identifying memory-bound systems .... 51
  Tuning memory-bound systems .... 52
  Reducing disk activity caused by swapping and paging .... 53
  Increasing memory by reducing the buffer cache size .... 54
  Investigating memory usage by system tables .... 55
  Using graphical clients on low memory systems .... 56
  Tuning X server performance .... 57
  Kernel parameters that affect the X Window System .... 58
  Case study: memory-bound workstation .... 59
  System configuration .... 59
  Defining a performance goal .... 59
  Collecting data .... 60
  Formulating a hypothesis .... 61
  Getting more specifics .... 61
  Making adjustments to the system .... 65
  Case study: memory-bound software development system .... 65
  System configuration .... 65
  Defining a performance goal .... 66
  Collecting data .... 66
  Formulating a hypothesis .... 67
  Getting more specifics .... 67
  Making adjustments to the system .... 69

Chapter 5
Tuning I/O resources .... 71
  Subsystems that affect disk and other I/O .... 71
  How the buffer cache works .... 73
  Viewing buffer cache activity .... 75
  Increasing disk I/O throughput by increasing the buffer cache size .... 75
  Positioning the buffer cache in memory .... 79
  Tuning the number of buffer cache hash queues .... 80
  How the namei cache works .... 81
  Viewing namei cache activity .... 82
  Reducing disk I/O by increasing the size of the namei cache .... 83
  How multiphysical buffers are used .... 84
  Tuning the number of multiphysical buffers .... 86
  The mechanics of a disk transfer .... 87
  Viewing disk and other block I/O activity .... 89
  Identifying disk I/O-bound systems .... 90
  Tuning disk I/O-bound systems .... 92
  SCSI disk driver request queue .... 93
  Tuning the number of SCSI disk request blocks .... 93
  Filesystem factors affecting disk performance .... 94
  Overcoming performance limitations of hard disks .... 96
  Tuning virtual disk performance .... 100
  Performance considerations for RAID 4 and 5 .... 102
  Choosing a cluster size .... 103
  Balancing disk load in virtual disk arrays .... 105
  Tuning virtual disk kernel parameters .... 106
  Serial device resources .... 108
  Tuning serial device resources .... 110
  Case study: I/O-bound multiuser system .... 113
  System configuration .... 113
  Defining a performance goal .... 113
  Collecting data .... 113
  Formulating a hypothesis .... 115
  Getting more specifics .... 115
  Making adjustments to the system .... 117
  Case study: unbalanced disk activity on a database server .... 118
  System configuration .... 118
  Defining a performance goal .... 119
  Collecting data .... 119
  Formulating a hypothesis .... 120
  Getting more specifics .... 120
  Making adjustments to the system .... 122

Chapter 6
Tuning networking resources .... 123
  STREAMS resources .... 123
  Monitoring STREAMS performance .... 129
  Tuning STREAMS usage .... 130
  TCP/IP resources .... 131
  Tuning TCP/IP performance .... 131
  Monitoring TCP/IP performance .... 133
  NFS resources .... 142
  Monitoring NFS performance .... 144
  Tuning NFS performance .... 146
  LAN Manager Client Filesystem resources .... 154
  Tuning LAN Manager Client Filesystem performance .... 155
  Other networking resources .... 157
  Case study: network overhead caused by X clients .... 158
  System configuration .... 158
  Defining a performance goal .... 158
  Collecting data .... 158
  Formulating a hypothesis .... 159
  Getting more specifics .... 159
  Making adjustments to the system .... 160

Chapter 7
Tuning system call activity .... 161
  Viewing system call activity .... 161
  Identifying excessive read and write system call activity .... 162
  Viewing process fork and exec activity .... 162
  Viewing AIO activity .... 162
  Viewing IPC activity .... 162
  Reducing system call activity .... 166
  Case study: semaphore activity on a database server .... 167
  System configuration .... 167
  Defining a performance goal .... 167
  Collecting data .... 168
  Formulating a hypothesis .... 168
  Getting more specifics .... 168
  Making adjustments to the system .... 169

Appendix A
Tools reference .... 171
  df - report disk space usage .... 172
  ps - check process activity .... 173
  sar - system activity reporter .... 176
  How sar works .... 177
  Running sar .... 178
  swap - check and add swap space .... 179
  timex - examine system activity per command .... 180
  vmstat - virtual memory statistics .... 181

Appendix B
Configuring kernel parameters .... 185
  When to change system parameters .... 186
  Configuration tools .... 188
  Using configure to change kernel resources .... 189
  Using idtune to reallocate kernel resources .... 190
  Kernel parameters that you can change using configure .... 191
  Examining and changing configuration-dependent values .... 223

Appendix C
Configuring TCP/IP tunable parameters .... 225
  Using ifconfig to change parameters for a network card .... 225
  Using inconfig to change global TCP/IP parameters .... 226
  TCP/IP parameters .... 227

Appendix D
Quick system tuning reference .... 235

Bibliography .... 241
Glossary of performance terminology .... 243

About this book
This book is for administrators of SCO OpenServer™ systems who are
interested in investigating and improving system performance. It describes
performance tuning for uniprocessor, multiprocessor, and networked systems, including those with TCP/IP, NFS®, and X clients. It discusses how the
various subsystems function, possible performance constraints due to hardware limitations, and optimizing system configuration for various uses. Concepts and strategies are illustrated with case studies.
You will find the information you need more quickly if you are familiar with:
• "How this book is organized" (this page)
• "Related documentation" (page 2)
• "Typographical conventions" (page 5)
Although we try to present information in the most useful way, you are the
ultimate judge of how well we succeed. Please let us know how we can
improve this book (page 6).

How this book is organized
This book tells you:
• what is meant by system performance (page 7)
• how to tune a system (page 13)
• how the configuration of various system components influences the performance of the operating system:
- Central Processing Units (CPUs) (page 21) for single and multiprocessor
systems

- memory (page 41) including physical (main) memory in Random Access
Memory (RAM) and swap areas on disk
- Input/Output (I/O) (page 71) including hard disks and serial devices
- networking (page 123) including STREAMS I/O, TCP/IP and NFS
• how you can examine system call activity (page 161) if you are an application programmer
A set of case studies (page 19) illustrates the methodology of system tuning,
and the tools that you can use to examine performance.
Appendixes provide additional information about:
• the tools (page 171) that you can use to examine performance
• the kernel parameters (page 185) that you can use to tune performance
• a quick guide to system tuning (page 235)
There is also a glossary (page 243) which explains technical terms and acronyms used throughout the book.

Related documentation
SCO OpenServer systems include comprehensive documentation. Depending
on which SCO OpenServer system you have, the following books are available

in online and/or printed form. Access online books by double-clicking on the
Desktop Help icon. Additional printed versions of the books are also available. The Desktop and most SCO OpenServer programs and utilities are
linked to extensive context-sensitive help, which in turn is linked to relevant
sections in the online versions of the following books. See "Getting help" in
the SCO OpenServer Handbook.
NOTE When you upgrade or supplement your SCO OpenServer software,
you might also install online documentation that is more current than the
printed books that came with the original system. For the most up-to-date
information, check the online documentation.

Release Notes
contain important late-breaking information about installation, hardware
requirements, and known limitations. The Release Notes also highlight the
new features added for this release.

SCO OpenServer Handbook

provides the information needed to get your SCO OpenServer system up
and running, including installation and configuration instructions, and
introductions to the Desktop, online documentation, system administration, and troubleshooting.


Graphical Environment Guide
describes how to customize and administer the Graphical Environment,
including the X Window System™ server, the SCO® Panner™ window
manager, the Desktop, and other X clients.

Graphical Environment help
provides online context-sensitive help for Calendar, Edit, the Desktop,
Help, Mail, Paint, the SCO Panner window manager, and the UNIX®
command-line window.

Graphical Environment Reference
contains the manual pages for the X server (section X), the Desktop, and X
clients from SCO and MIT (section XC).

Guide to Gateways for LAN Servers
describes how to set up SCO® Gateway for NetWare® and LAN Manager
Client software on an SCO OpenServer system to access printers, filesystems, and other services provided by servers running Novell®
NetWare® and by servers running LAN Manager over DOS, OS/2®, or UNIX
systems. This book contains the manual pages for LAN Manager Client
commands (section LMC).

Mail and Messaging Guide
describes how to configure and administer your mail system. Topics
include sendmail, MMDF, SCO Shell Mail, mailx, and the Post Office
Protocol (POP) server.

Networking Guide
provides information on configuring and administering TCP/IP, NFS®, and
IPX/SPX™ software to provide networked and distributed functionality,
including system and network management, applications support, and
file, name, and time services.

Networking Reference
contains the command, file, protocol, and utility manual pages for the
IPX/SPX (section PADM), NFS (sections NADM, NC, and NF), and TCP/IP
(sections ADMN, ADMP, SFF, and TC) networking software.

Operating System Administrator's Reference
contains the manual pages for system administration commands and utilities (section ADM), system file formats (section F), hardware-specific information (section HW), miscellaneous commands (section M), and SCO
Visual Tcl™ commands (section TCL).

Operating System Tutorial
provides a basic introduction to the SCO OpenServer operating system.
This book can also be used as a refresher course or a quick-reference guide.
Each chapter is a self-contained lesson designed to give hands-on experience using the SCO OpenServer operating system.


Operating System User's Guide
provides an introduction to SCO OpenServer command-line utilities, the
SCO Shell utilities, working with files and directories, editing files with the
vi editor, transferring files to disks and tape, using DOS disks and files in
the SCO OpenServer environment, managing processes, shell programming, regular expressions, awk, and sed.

Operating System User's Reference
contains the manual pages for user-accessible operating system commands and utilities (section C).

PC-Interface Guide
describes how to set up PC-Interface™ software on an SCO OpenServer
system to provide print, file, and terminal emulation services to computers
running PC-Interface client software under DOS or Microsoft® Windows™.

sea Merge User's Guide
describes how to use and configure an SCO® Merge™ system. Topics
include installing Windows, installing DOS and Windows applications,
using DOS with the SCO OpenServer operating system, configuring hardware and software resources, and using SCO Merge in an international
environment.

sea Wabi User's Guide
describes how to use SCO® Wabi™ software to run Windows 3.1 applications on the SCO OpenServer operating system. Topics include installing
the SCO Wabi software, setting up drives, configuring ports, managing
printing operations, and installing and running applications.

System Administration Guide
describes configuration and maintenance of the base operating system,
including account, file system, printer, backup, security, UUCP, and virtual
disk management.
The SCO OpenServer Development System includes extensive documentation
of application development issues and tools.
Many other useful publications about SCO systems by independent authors
are available from technical bookstores.


Typographical conventions
This publication presents commands, filenames, keystrokes, and other special
elements in these typefaces:
Example:              Used for:

lp or lp(C)           commands, device drivers, programs, and utilities (names,
                      icons, or windows); the letter in parentheses indicates the
                      reference manual section in which the command, driver,
                      program, or utility is documented

/new/client.list      files, directories, and desktops (names, icons, or windows)

root                  system, network, or user names

filename              placeholders (replace with appropriate name or value)

(Esc)                 keyboard keys

Exit program?         system output such as prompts and messages

yes or yes            user input

"Description"         field names or column headings (on screen or in database)

open or open(S)       library routines, system calls, kernel functions, C keywords;
                      the letter in parentheses indicates the reference manual
                      section in which the file is documented

$HOME                 environment or shell variables

SIGHUP                named constants or signals

buf                   C program structures; C program structure members and
                      variables


How can we improve this book?
What did you find particularly helpful in this book? Are there mistakes in this
book? Could it be organized more usefully? Did we leave out information you
need or include unnecessary material? If so, please tell us.
To help us implement your suggestions, include relevant details, such as book
title, section name, page number, and system component. We would appreciate information on how to contact you in case we need additional explanation.
To contact us, use the card at the back of the SCO OpenServer Handbook or
write to us at:

Technical Publications
Attn: CFT

The Santa Cruz Operation, Inc.
PO Box 1900
Santa Cruz, California 95061-9969
USA

or e-mail us at:

techpubs@sco.com or ...uunet!sco!techpubs
Thank you.


Chapter 1

What determines performance
A computer system consists of a finite set of hardware and software components. These components constitute the resources of the system. One of the
tasks of the operating system is to share these resources between the programs that are running on the system. Performance is a measure of how well
the operating system does this task; the aim of performance tuning is to make
it do this task better.
A system's hardware resources have inherent physical limits in the quantity of
data they can handle and the speed with which they can do this. The physical
subsystems that compose hardware include:
• One or more central processing units (CPUs), and the ancillary processors
that support them.
• Memory - both in Random Access Memory (RAM) and as swap space on
disk.
• I/O devices including hard and floppy disk drives, tape drives, serial ports,

and network cards.
• Networks - both Local Area Networks (LANs) and Wide Area Networks
(WANs).
Operating system resources are limited by the hardware resources such as the
amount of memory available and how it is accessed. The internal resources of
the operating system are usually configurable and control such things as the
size of data structures, security policy, standards conformance, and hardware
modes.


Examples of operating system resources are:
• The tables that the operating system uses to keep track of users and the
programs they are running.
• The buffer cache and other memory buffers that reduce dependence on
accessing slow peripheral devices.
If your system is connected to one or more networks, it may depend on
remote machines to serve files, perform database transactions, perform calculations, run X clients, and provide swap space, or it may itself provide some
of these services. Your system may be a router or gateway if it is connected to
more than one network. In such cases, the performance of the network and
the remote machines will have a direct influence on the performance of your
system.

Hardware factors that influence performance
Your system's hardware has the greatest influence on its performance. It is the
ultimate limiting factor on how fast a process will run before it has to start
sharing what is available with the operating system and other user processes.
Performance tuning can require you to add hardware or upgrade existing
hardware if a system's physical subsystems are unbalanced in power, or
insufficiently powerful to satisfy the demands being put on them. There may
come a time when, despite your best efforts, you cannot please enough people
enough of the time with the hardware resources at your disposal. If so, you
will have to go and buy some more hardware. This is one reason why monitoring and recording your system's performance is important if you are not
the person spending the money. With the information that you have gathered, you can make a strong case for upgrading your system.
It is important to balance the power of your computer's subsystems with each
other; the power of the CPU(s) is not enough in itself. If the other subsystems
are slow relative to the available processing power, they will act to constrain
it. If they are more powerful, you have possibly overspent, although you
should be able to upgrade processing power without much extra expenditure.
There are many hardware factors that can limit the overall system performance:
• The speed and width of the system's address and data buses.
• The model, clock speed, and the size of the internal level-one (Ll) memory
cache of the system's CPU or CPUs.
• The size of the level-two (L2) cache memory which is external to the CPU.
This should be capable of working with all of physical memory.


• The amount of memory, the width of its data path, and its access time. The
time that the CPU has to wait for memory to be accessed limits its performance.
• The speed and width of a SCSI bus controlled by a host adapter.
• The width of the data path on peripheral controller cards (32, 16, or 8-bit).
• Whether controllers have built-in cache. This is particularly important for
disk and network controllers.
• Access time for hard disks.
• Whether intelligent or dumb serial cards are used; intelligent cards offload
much of the work that would otherwise be performed by the CPU.
On multiprocessor machines, the following considerations also become

important:
• Write-back L2 cache (for instructions and data) with cache coherency on
each CPU to reduce the number of accesses to main memory. This has the
benefit of improving CPU performance as well as improving general system performance by reducing contention for the system bus.
• Support for fully distributed interrupts to allow any CPU to service interrupts from I/O devices such as network and disk controllers.
• The memory and I/O subsystems must be as fast as possible to keep up
with the demands of the enhanced CPU performance. Use of intelligent
peripheral controllers is particularly desirable.

Software factors that influence performance
The way in which applications are written usually has a large impact on performance. If they make inefficient use of processing power, memory, disk, or
other subsystems, it is unlikely that you will improve the situation significantly by tuning the operating system.
The efficiency of the algorithms used by an application, or the way that it uses
system services, are usually beyond your control unless you have access to
source code. Some applications such as large relational database systems provide extensive facilities for performance monitoring and tuning which you
should study separately.


• Is it using large numbers of system calls? System calls are expensive in processing overhead and may cause a context switch on the return from the
call. You can use trace(CP) to discover the system call usage of a program, as shown in the example after this list.
• Is it using inefficient read(S) and write(S) system calls to move small numbers of characters at a time between user space and kernel space? If possible use buffered I/O to avoid this.
• Are formatted reads and writes to disk being used? Unformatted reads and
writes are much more efficient, maintain precision and speed of access,
and generally need less disk space.
• Is the application using memory efficiently? Many older applications use
disk extensively since they were written in the days of limited core storage
and expensive memory.
• What version of malloc(S) does the application use (if it uses it at all)? The
version in the libmalloc.a library allows more control over the allocation of
memory than the version in libc.a. Memory leakage can occur if you do not
call free(S) to place blocks of memory back in the malloc pool when you
have finished with them.
• Does the application group together routines that are used together? This
technique (known as localization of reference) tends to reduce the number
of text pages that need to be accessed when the program runs. (The system
does not load pages of program text into memory when a program runs
unless they are needed for the program's execution.)
• Does the application use shared libraries or dynamic linked libraries
(DLLs)? The object code of shared libraries can be used by several applications at the same time; the object code of DLLs is also shared and is only
loaded when an application needs to access it. Using either type of library
is preferable to using statically linked libraries which cannot be shared.
• Does the application use library routines and system calls that are intended
to enhance performance? Examples of the APIs provided are:
Memory-mapping loads files directly into memory for processing (see
mmap(S)).
Fixed-priority scheduling allows selected time-critical processes to control how they are scheduled and ensure that they execute when they
have work to perform. Applications can use the predictable scheduling
behavior to improve throughput and reduce contention (see
sched_setparam(S) and sched_getparam(S)).
Support for high performance asynchronous I/O, semaphores and
latches, and high-resolution timers and spin locks for use by threaded
applications (see aio(FP), semaphore(FP), and time(FP)).
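
As an illustration of the first point above, the following sketch shows one way to count the system calls that a program makes using trace(CP). The program name myapp is a placeholder, and the exact command syntax and output format of trace depend on your Development System release, so check the trace(CP) manual page before relying on these lines.

   # Run the program under trace, saving the system call trace to a file.
   $ trace myapp > myapp.trace 2>&1

   # Count the read and write calls that the program issued.
   $ grep -c "read(" myapp.trace
   $ grep -c "write(" myapp.trace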


Chapter 2

Managing performance
To manage the performance of a system, you normally try to share the
available resources equally between its users. However, different users perceive performance according to their own needs and the demands of the
applications that they are running. If they use interactive programs, response
time is likely to be their main index of performance. Someone interested in
performing numeric analysis may only be worried about the turnaround time
for off-line batch mode processing. Another person may wish to perform
sophisticated image processing in real time and requires quick access to
graphics data files. You, as the administrator, are interested in maximizing
the throughput of all jobs submitted to the system - in fact, keeping everyone happy. Unfortunately, such differing requirements may be difficult to
reconcile.
For example, if you administer a single standalone system, you may decide
that your main priority is to improve the interactive response time. You may
be able to do this by decreasing the overall workload at peak usage times.
This would involve scheduling some work to run as batch jobs at quieter
times, or perhaps restricting simultaneous access to your system to a smaller
number of users. However, in speeding up your system's response you now
have the additional problem of decreased throughput, which results in the
completion of fewer jobs, potentially at critical times. In pursuing any particular performance improvement policy there are always likely to be trade-offs,
especially in a situation where resources are at a premium.
The next section covers the setting of realistic performance goals as the first
step in improving the performance of your computer system. You are then
given a method for observing and tuning a system.


Tuning methodology
You can optimize performance by system tuning. This is the process of making adjustments to the way in which various system resources are used so
that they correspond more closely to the overall usage patterns of the system.
You can improve the overall response time of the system by locating and
removing system bottlenecks. You can also customize the various resources
to correspond to the needs of an application that is run frequently on the system. Any system tuning that you perform is limited because the performance
of an operating system depends closely on the hardware on which it is
installed.
To tune a system efficiently, you need a good understanding both of the various system resources, and of how the system is going to be used. This might
also involve understanding how different applications use system resources.
System tuning is an ongoing process. A well-tuned system may not remain so
if the mix of applications and users changes. Once a system has been successfully tuned, you should monitor performance regularly as part of routine system administration. This allows you to make modifications when changes in
performance first occur, and not when the performance degrades to the point
where the system becomes unusable.
You may be able to extend a system's resources by adding or reconfiguring
hardware, but remember that these resources always remain finite. Also you
should always bear in mind that there is no exact formula for tuning a system
- performance is based on the mixture of applications running on the system, the individuals using them, and your perception of the system's performance.
The flowchart shown in Figure 2-1 (page 15) illustrates the tuning methodology we recommend you follow. Its most important feature is its feedback
loop - you may not always get the result you expect when you make
changes to your system. You must be prepared to undo your changes so that
you can restore your system to its earlier state.
The steps outlined in the methodology are described in the following sections.
They are further illustrated by the set of case studies discussed in "Performance tuning case studies" (page 19).


Figure 2-1 Flowchart illustrating the methodology for system performance tuning


Defining performance goals
The first step in tuning a system is to define a set of performance goals. This
can range from discovering and removing system bottlenecks in order to
improve overall performance, to tuning the system specifically to run a single
application, set of applications, or benchmark as efficiently as possible.
The performance goals should be listed in order of priority. Often goals can
conflict; for example, a system running a database that uses a large cache
might also require a large portion of memory to compile programs during
software development. Assigning priority to these goals might involve deciding whether the database performance or the speed of the compilations is
more important.
You should attempt to understand all goals as well as possible. If possible,
you should note which resources will be affected by each goal. If you specify
several goals, it is important that you understand where they might conflict.
Although this guide assumes that you are a system administrator, the goals
identified for the tuning of the various subsystems also reflect the perspectives and needs of users, and application developers.

Collecting data
Once you have identified your performance goals, your next step is to determine how the system is performing at present. The aspects of a system's performance that you measure depend on the sort of tasks you expect it to carry
out. These are some typical criteria that you might use to judge a system:
• The time taken for an interactive application to perform a task.
• The time taken to process a database transaction.
• The time taken for an application to perform a set number of calculations.
If the system is meant to perform a single function, or run a particular application or benchmark, then you might only look at specific resources. However,
it can still be helpful to acquire a sense of the performance of the entire system. If the goals set for the system involve the tuning of applications, then the
tuning information provided with the application should be applied before
looking at more general system performance.


NOTE It is often possible to improve performance by the careful design and
implementation of an application, or by tuning an existing application,
rather than by tuning the operating system.


To gain an overview of the system's current performance, you should read
and use Appendix A, "Tools reference" (page 171) which discusses the various
system resources, and how you can monitor these.
You should collect data over a period that is long enough for you to be able
to establish normal patterns of usage. Do not base tuning decisions on
observations of performance anomalies, even though your goal may be to
smooth these out.
If your goal involves improving the performance of a particular application,
you must understand the application's use of systems resources if you suspect

that it is not performing as well as it should. Tuning information may be
available in the documentation provided with the application. If this is not
available, then an indication of how the application uses resources can be
gained by gathering information for a period before installing the application,
and comparing that information with information gathered while the application is in use.

Formulating a hypothesis
The next step is to determine what is causing the difference between the
current performance and your performance goal. You need to understand the
subsystems that have an influence on being able to achieve this goal. Begin
with a hypothesis, that is, your best informed guess, of the factors that are
critical for moving the system toward the goal. You can then use this
hypothesis to make adjustments to the system on a trial basis.
If this approach is used then you should maintain a record of adjustments that

you have made. You should also keep all the data files produced with the
various monitoring commands such as timex(ADM), and sar(ADM). This is
useful when you want to confirm that a side effect noticed after a recent
change was caused by that change and did not occur previously.
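
A minimal sketch of how such data files might be collected follows; the file names and sampling intervals are arbitrary, and the exact options supported depend on your release, so check the sar(ADM) and timex(ADM) manual pages.

   # Record system activity every 60 seconds for an hour into a binary
   # file that sar can read back later with the -f option.
   $ sar -o /tmp/sar.data 60 60

   # Re-examine CPU utilization from the saved file.
   $ sar -u -f /tmp/sar.data

   # Summarize the system activity generated by a single command
   # (myapp is a placeholder for the program being measured).
   $ timex -s myapp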

Getting more specifics
Once you have formulated your hypothesis, look for more specific information. If this information supports the hypothesis, then you can make adjustments to kernel parameters or the hardware configuration to try to improve
the performance. If the new information indicates that your hypothesis is
wrong then you need to form another.
See Appendix D, "Quick system tuning reference" (page 235) for a description
of how to diagnose common performance problems.


Making adjustments to the system
Once it appears that the hypothesis is correct, you can make adjustments to
the system. It is vital that you record the parameters that the system had initially, and the changes that you make at each stage. Make all adjustments in
small steps to ensure that they have the desired effect. After each adjustment,
reassess the system's performance using the same commands that you used to
measure its initial performance.
You should normally adjust kernel parameters one at a time so that you can
uniquely identify the effect that an adjustment has. If you adjust several
things at once, the interaction between them may mask the effect of the
change. Some parameters, however, are intended to be adjusted in groups
rather than singly. In such a case, always adjust the minimum number of
parameters, and always adjust the same set of parameters. Examples of such
groups of parameters are NBUF and NHBUF, and HTCACHEENTS,
HTHASHQS and HTOFBIAS.
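
For example, NBUF and NHBUF are normally adjusted as a pair. The following is a minimal sketch of how this might be done with idtune(ADM); the values shown are purely illustrative, and the kernel must be relinked and the system rebooted before they take effect (see Appendix B, "Configuring kernel parameters" (page 185)).

   # Set the number of buffer cache buffers and hash queues together
   # (2000 and 512 are placeholder values, not recommendations).
   $ /etc/conf/bin/idtune NBUF 2000
   $ /etc/conf/bin/idtune NHBUF 512

   # Relink the kernel so that the new values take effect, then reboot.
   $ cd /etc/conf/cf.d
   $ ./link_unix
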
If your adjustment degrades system performance, retrace your steps to a point

where it was at its peak before trying to adjust any other parameters on the
system. If your performance goals are not met, you must further evaluate and
tune the system. This may mean making changes similar to the ones that you
have already made, or you may need to consider improving the performance of
other subsystems.
If you have attained your performance goals then you can check the system

against the lists of desired attributes of well-tuned multiuser or database
server systems given in Appendix D, "Quick system tuning reference" (page
235). You should continue to monitor system performance as part of routine
system administration to ensure that you recognize and treat any possible
future degradation in performance at an early stage.
If you adopt the habit of monitoring performance on a regular basis, you

should be able to spot correlations between the numbers recorded and changing demands on the system. Bursts of high system activity during the day, on
a particular day of the week, month, or quarter almost certainly reflect the
pattern of activity by users, either logged on or running batch jobs. It is up to
you to decide how to manage this. You can choose to tune or upgrade the system to cope with peak demand, to reschedule jobs to make use of periods of
normally low activity, or both.
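
One common way to automate this kind of routine monitoring is to run the standard sar data-collection scripts from root's crontab. The entries below are only a sketch: sa1 and sa2 and their paths are the usual System V locations, but confirm them on your own system, and the sampling times shown are arbitrary.

   # Example root crontab entries (edit with "crontab -e"):
   # take a sar snapshot every 20 minutes during working hours ...
   0,20,40 8-17 * * 1-5  /usr/lib/sa/sa1
   # ... and produce a daily summary report at 6 pm.
   0 18 * * 1-5  /usr/lib/sa/sa2 -s 8:00 -e 18:00 -i 1200 -A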


Performance tuning case studies
We have provided several case studies that you can use as starting points for
your own investigations. Each study is discussed in terms of the five steps
described in "Tuning methodology" (page 14):
1. Define a performance goal for the system.

2. Collect data to get a general picture of how the system is behaving.
3. Formulate a hypothesis based on your observations.
4. Get more specifics to enable you to test the validity of your hypothesis.
5. Make adjustments to the system, and test the outcome of these. If necessary, repeat steps 2 to 5 until your goal is achieved.
The case studies have been chosen to represent a variety of application mixes
on different systems:
• memory-bound workstation (page 59)
• memory-bound software development system (page 65)
• I/O-bound multiuser system (page 113)
• unbalanced disk activity on a database server (page 118)
• semaphore activity on a database server (page 167)
• network overhead caused by X clients (page 158)

Managing the workload
If a system is sufficiently well tuned for the applications and uses to which it is
normally put, you still have a number of options open to you if you are

looking for further performance gains. This involves managing the system's
workload with the cooperation of the system's users. If they can be persuaded
to take some responsibility with you (as the system administrator) for the
system's performance then significant improvements can usually be made.
Below are some steps that users and administrators can take to alleviate
excessive demands on a system without reconfiguring the kernel.
• Move jobs that do not have to run at a particular time of day to off-peak
hours. Encourage users to submit jobs using at(C), batch(C), or crontab(C)
depending on whether they are one-off (at or batch) or periodic jobs
(crontab); see the example at the end of this list.
• Collect data on the average system workload and publish it to users so that
they are aware of the daily peaks and troughs. If they have the flexibility to
choose when to run a program, they will know when they can achieve
more work.


• Adjust the default nice value of user processes using the Hardware/Kernel
Manager. This will set a lower CPU priority for all user processes, and will
allow critical jobs with higher priority to use the CPU more frequently.
• Encourage users to reduce the priority of their own processes using nice(C)
and renice(C); this is especially important for those jobs that do not
perform much I/O activity - these CPU-intensive jobs are likely to monopolize the available processing time.
• The default action of the Korn shell (ksh(C)) is to run background jobs at a
reduced priority. Make sure users have not altered this setting in their
.profile or .kshrc files.
• Encourage users to kill unnecessary processes, and to log out when they
have finished rather than locking their screen.
• Reduce the maximum number of processes that a user can run
concurrently by lowering the value of the kernel parameter MAXUP. For
example, MAXUP set to 20 means that a user can run 19 other processes in
addition to their login shell.
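
The following sketch shows how a user might act on the scheduling and priority suggestions above. The job name is a placeholder, and the nice and renice increments you choose will depend on your workload.

   # Submit a long-running job to run at 11 pm at reduced priority
   # instead of running it now (nightly_report is a placeholder).
   $ echo "nice -10 /usr/local/bin/nightly_report" | at 23:00

   # Lower the priority of an already-running CPU-intensive process
   # (process ID 1234 is a placeholder).
   $ renice +5 1234
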
If you do not have access to additional hardware and your system is well
tuned, you may have to implement some of the above recommendations.


Chapter 3

Tuning CPU resources
Your system hardware contains one or more central processing units (CPUs)
plus a host of ancillary processors that relieve the CPU from having to perform certain tasks:
• Math coprocessors perform floating point calculations much more
efficiently than software can. The 80486DX™, 80486DX2™, 80486DX4™, and
Pentium™ include floating-point capability on the chip itself. Without a
floating point coprocessor, the CPU must emulate it using software - this
is considerably slower. On systems with an SCO® SMP® License, you can
use the -F option to mpsar(ADM) to monitor how many processes are using
floating point arithmetic. This command displays information about the
usage of both floating point hardware and software emulation.
• Direct memory access (DMA) controllers handle memory transfer between
devices and memory, or memory and memory. Many hardware peripheral
controllers on EISA and MCA bus machines have a built-in Bus Master DMA
chip that can perform DMA rather than relying on the DMA controller on
the motherboard. On MCA bus machines, a chip called a Central Arbitration Control Point (CACP) decides which Bus Master DMA controller gets
control of the bus.
An important limitation of all DMA controllers on ISA and early-series MCA
bus machines, and some peripheral controllers on all bus architectures, is
that they cannot address more than the first 16MB of memory (24-bit
addressing). When the operating system encounters hardware with such
limitations, it must instruct the CPU to transfer data between the first 16MB
and higher memory.


Some peripheral controllers (including IDE disk controllers) and older SCSI
host adapters either cannot perform DMA or the device driver may not support its use. In this case, the operating system instructs the CPU to transfer
data between the peripheral and memory on behalf of the hardware. This
is known as programmed I/O (PIO).
• Graphics adapters that can take advantage of a local bus architecture (such
as VL Bus or PCI) operating at the same speed as the CPU produce a substantial improvement in the performance of the graphics subsystem.
• Universal asynchronous receiver/transmitters (UARTs) control input and
output (I/O) on serial lines. Buffering on UARTs enables more efficient use
of the CPU in processing characters input or output over serial lines.
Intelligent serial cards are able to offload much of the character processing
that the CPU might otherwise have to perform.
• Programmable interrupt controllers (PICs) handle interrupts from hardware peripheral devices when they are trying to get the attention of the
CPU.
The operating system handles these resources for you - reprogramming the
various peripheral processor chips to perform tasks on behalf of the CPU.

Operating system states
The operating system can be in one of four states:

executing in user mode
The CPU is executing the text (machine code) of a process that accesses its
own data space in memory.
executing in system mode
If a process makes a system call in order to perform a privileged task
requiring the services of the kernel (such as accessing a disk), then the operating system places the CPU in system mode (also known as kernel
mode).
idle waiting for I/O
Processes are sleeping while waiting for the completion of I/O to disk or
other block devices.
idle
No processes are ready-to-run on the CPU or are sleeping waiting for
block I/O. Processes waiting for keyboard input or network I/O are
counted as idle.
The combination of time spent waiting for I/O and time spent idle makes up
the total time that the operating system spends idle.


Viewing CPU activity
You can view CPU activity using sar -u on single processor systems:
23:59:44    %usr    %sys    %wio   %idle
23:59:49       4      24       6      66
23:59:54       7      84       0       9
23:59:59       6      70       1      23

Average        5      59       2      32

On systems with an SCO SMP License, use mpsar -u to see activity averaged
over all the CPUs and cpusar -u to report activity for an individual CPU.
%usr indicates the percentage of time that the operating system is executing
processes in user mode.
%sys indicates the percentage of time that the operating system is executing in
system mode.

%wio indicates the percentage of time that the operating system is idle with
processes that could run if they were not waiting for I/O to complete.
%idle indicates the percentage of time that the operating system is idle with
no runnable processes. On systems with an SCO SMP License, a CPU runs a
process called idle if there are no other runnable processes.

On systems using SMP, root can make a CPU inactive using the
cpuonoff(ADM) command. The -c option displays the number of active and
inactive CPUs:
$ cpuonoff -c
cpu 1: active
cpu 2: inactive
cpu 3: active

The base processor, which cannot be made inactive, is always indicated by
cpu 1. An inactive CPU shows 100% idle time with the cpusar -u command.
The following sections outline the different process states and how processes
can share the same CPU.

23

Tuning CPU resources

Process states
As soon as a process has been created, the system assigns it a state. A process
can be in one of several states. You can view the state of the processes on a
system using the ps(C) command with the -el options. The "S" field displays
the current state as a single letter.
The important states for performance tuning are:

O On processor - the process is executing on the CPU in either user or system
mode.
R Runnable - the process is on a run queue and is ready-to-run. A runnable
process has every resource that it needs to execute except the CPU itself.
S Sleeping - the process is waiting for some I/O event to complete such as
keyboard input or a disk transfer. Sleeping processes are not runnable
until the I/O resource becomes available.
Figure 3-1 (page 25) represents these process states and the possible transitions between them.
On single CPU systems only one process can run on the CPU at a time. All
other runnable processes have to wait on the run queue.
A portion of the kernel known as the scheduler chooses which process to run
on the CPU(s). When the scheduler wants to run a different process on the
CPU, it scans the run queue from the highest priority to the lowest looking for
the first runnable process it can find.
When a process becomes runnable, the kernel calculates its priority and places
it on the run queue at that priority. While it remains runnable, the process'
priority is recalculated once every second, and its position in the run queue is
adjusted. When there are no higher-priority runnable processes on the run
queue, the process is placed on the CPU to run for a fixed amount of time
known as a time slice.
The operation of the scheduler is more sophisticated for SMP. See "Process
scheduling" (page 34) for more information.
For certain mixes of applications, it may be beneficial to performance to adjust
the way that the scheduler operates. This is discussed in "Adjusting the
scheduling of processes" (page 34).


[Figure: a) Main process states - a process running on the CPU, processes on the
run queue, processes sleeping on I/O, swapped-out runnable processes (on swap),
and swapped-out processes sleeping on I/O; b) Transitions between process
states - main flow and swapping]

Figure 3-1 Main process states in a system and the transitions between them


Clock ticks and time slices
The system motherboard has a programmable interval timer which is used as
the system clock; this generates 100 clock interrupts or clock ticks per second
(this value is defined as the constant HZ in the header file
/usr/include/sys/param.h).
The tunable kernel parameter MAXSLICE sets the maximum time slice for a
process. Its default value is 100 clock ticks (one second). The range of permissible values is between 25 and 100 (between one quarter of a second and one
second).
The effect of reducing MAXSLICE is to allow each process to run more often
but for a shorter period of time. This can make interactive applications running on the system seem more responsive. However, you should note that
adjusting the value of MAXSLICE may have little effect in practice. This is
because most processes will need to sleep before their time slice expires in
order to wait for an I/O resource. Even a calculation-intensive process, which
performs little I/O, will tend to be replaced on the CPU by processes woken
when an I/O resource becomes available.
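The following is a minimal sketch of how a tunable such as MAXSLICE might be
changed. The interactive configure(ADM) utility and the link_unix relink script
shown here are assumptions about a typical SCO OpenServer installation, so verify
the exact procedure for your release before using it. Enter the new MAXSLICE
value when prompted, relink, and then reboot so that the new kernel takes effect:

   cd /etc/conf/cf.d
   ./configure         # select the scheduling parameters and enter a new MAXSLICE value
   ./link_unix         # relink the kernel, then reboot the system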

Context switching
A process runs on the CPU until it is context switched. This happens when
one of the following occurs:
• The process exits.
• The process uses up its time slice.
• The process requires another resource that is not currently available or
needs to wait for I/O to complete.
• A resource has become available for a sleeping process. If there is a higher
priority process ready to run, the kernel will run this instead (the current
process is preempted).
• The process relinquishes the CPU using a semaphore or similar system call.
The scheduler can only take a process off the CPU when returning to user
mode from system mode, or if the process voluntarily relinquishes the CPU
from system mode.
If the process has used up its time slice or is preempted, it is returned to the
run queue. If it cannot proceed without access to a resource such as disk I/O,
it sleeps until the resource is available. Once access to that resource is available, the process is placed on the run queue before being put on the processor.
Figure 3-2 (page 27) illustrates this for a process O1 which goes to sleep waiting for I/O.


[Figure panels, showing processes on the CPU and in memory:
a) Runnable process R1 put on CPU as O1
b) Process O1 goes to sleep waiting for I/O as S1
c) Context switch - runnable process R2 put on CPU as O2
d) Process S1 is woken when resource becomes available; put on run queue as R1
e) Process O2 is preempted and put back on run queue as R2. R1 is put on CPU
   next, as shown in figure a, because it has higher priority than R2]

Figure 3-2 Preemption of a process that goes to sleep waiting for I/O


A context switch occurs when the kernel transfers control of the CPU from an
executing process to another that is ready to run. The kernel first saves the
context of the process. The context is the set of CPU register values and other
data that describes the process' state. The kernel then loads the context of the
new process which then starts to execute.
When the process that was taken off the CPU next runs, it resumes from the
point at which it was taken off the CPU. This is possible because the saved
context includes the instruction pointer. This indicates the point in the executable code that the CPU had reached when the context switch occurred.

Interrupts
An interrupt is a notification from a device that tells the kernel that:

• An action such as a disk transfer has been completed.
• Data such as keyboard input or a mouse event has been received.
The kernel services an interrupt within the context of the current process that
is running on the CPU. The execution of the current process is suspended
while the kernel deals with the interrupt in system mode. The process may
then lose its place on the CPU as a result of a context switch. If the interrupt
signaled the completion of an I/O transfer, the scheduler wakes the process
that was sleeping on that event, and puts it on a run queue at a newly
calculated numeric priority. It may or may not be the next process to run
depending on this priority.

Calculation of process priorities
A process' priority can range between 0 (lowest priority) and 127 (highest
priority). User mode processes run at lower priorities (lower values) than system mode processes. A user mode process can have a priority of 0 to 65,
whereas a system mode process has a priority of 66 to 95. Some of the system
mode priorities indicate what a process is waiting for. For example, a priority
of 81 indicates that a process is waiting for I/O to complete whereas a value of
75 means that it is waiting for keyboard input. The ps command with the -l
option lists process priorities under the PRI column.
Processes with priorities in the range 96 to 127 have fixed priority and control
their own scheduling.


NOTE You can find a list of priority values in Table A-2, "Priority values"
(page 175).


Figure 3-3 (this page) shows the division of process priorities into user mode,
system mode, and fixed-priority processes.
[Figure: priority scale from 0 (lowest) to 127 (highest); priorities 0 to 65 are
user mode, 66 to 95 are system mode, and 96 to 127 are fixed-priority processes]

Figure 3-3 System process priorities

The operating system varies the priorities of executing processes according to
a simple scheduling algorithm which ensures that each process on the system
gets fair access to the CPU. Every process receives a base level priority (of
default value 51) when it is created. However, this soon loses any influence on
whether a process is selected to run. Note that the priorities of kernel daemons such as sched, vhand, and bdflush are not adjusted. Fixed-priority processes are also exempt - such processes have the ability to adjust their own
priority.
The kernel recalculates the priority of a running process every clock tick. The
new priority is based on the process' nice value, and how much CPU time the
process has used (if any). When the process is taken off the CPU, its lowered
priority pushes it down the run queue to decrease the probability that it will
be chosen to run in the near future.
A process that manages to run for an entire time slice will have had its priority reduced by the maximum amount.


The kernel recalculates the priorities of all runnable processes (those with a
user mode priority less than 65) once every second by successively reducing
the negative weighting given to their recent CPU usage. This increases the
probability that these processes will be selected to run again in the near
future.
The default nice value of a user's process is 20. An ordinary user can increase
this value to 39 and in so doing reduce a process' chance of running on the
CPU. Processes with low nice values will on average get more CPU time
because of the effect the values have on the scheduling algorithm.
There are three ways to control the nice value of a process:
• nice(C) raises the nice value of a new process, reducing its share of the
CPU; root can also lower the nice value using this command.
• renice(C) raises the nice value of a process that is already running; root
can also lower the nice value using this command.
• If the option bgnice is set in the Korn shell, it runs background jobs at a
nice value of 24. If this option is not set, background jobs run at an equal
priority to foreground jobs.
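As an illustration, the commands below start a job with a raised nice value and
then adjust the nice value of a process that is already running. The command
name bigjob and the process ID 1234 are hypothetical, and the exact option
syntax should be checked against the nice(C) and renice(C) manual pages for
your release:

   $ nice -10 bigjob &
   $ renice 5 1234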

Examining the run queue
Run queue statistics can be seen with sar -q on single processor systems or
mpsar -q on multiprocessor systems:
23:59:44 runq-sz %runocc swpq-sz %swpocc
23:59:49     1.7      98      1.5      36
23:59:54     1.0      63      1.0      31
23:59:59     1.0      58      1.0      49

Average      1.3      74      1.2      39

runq-sz indicates the number of processes that are ready to run (on the run
queue) and %runocc indicates the percentage of time that the run queue was
occupied by at least one process.
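To take fresh samples rather than reading the daily data file, give sar an
interval and a count; for example, to sample the run queue every 5 seconds for
one minute (use mpsar -q in the same way on SMP systems):

   $ sar -q 5 12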

See "Identifying CPu-bound systems" (page 38) for a discussion of how to
identify if your system is CPU bound.


Multiprocessor systems
The SCO OpenServer system is a multitasking, multiuser operating system,
designed to share resources on a computer with a single CPU. It can run on a
more powerful multiprocessor system but it cannot use more than one of the
available CPUs.

SCO SMP License software adds multiprocessing-specific components to the
standard operating system kernel, enabling it to recognize and use additional
processors automatically. SMP is implemented as an extension to, and is
completely compatible with, the version of the kernel that supports a single
CPU. With SCO SMP License software installed, the operating system retains
its multitasking, multiuser functionality. There is no impact on existing utilities, system administration, or filesystems. SMP can execute standard OMF
(x.out), COFF, and ELF binaries without modification.
SMP is modular. As your system requires more processing power, you can
add additional processors. For example, two processors give you twice the
processing power of a single processor of identical specification in terms of
the number of instructions per second that they can execute.
If the operating system can gain extra performance in direct proportion to the
number of processors, it is said to exhibit perfect scaling as shown in Figure
3-4 (page 32). In practice, the processors have to compete for other resources
such as memory and disk, they have to co-operate in how they handle interrupts from peripherals and from other CPUs, and they may have to wait to
gain access to data structures and devices.


[Figure: throughput (1 CPU = 100%) plotted against the number of CPUs (1 to 6);
perfect scaling appears as a straight line rising through 200%, 300%, and 500%]

Figure 3-4 Perfect multiprocessor scaling

To ensure good scaling, you should ensure that the memory and I/O subsystems (particularly hard disks) are powerful enough to satisfy the demands
that multiple processors put on them. If you do not match the power of your
subsystems to that of the processors, your system is likely to be memory or
I/O bound, and it will not utilize the potential performance of the processors.
A system will scale well when there are many ready-to-run processes.
Multithreaded applications are also well suited to take advantage of a multiprocessing environment.

32

Performance Guide

Multiprocessor systems

Support for multiple processors
In SMP, all CPUs can access the same memory, and they all run the same copy
of the kernel. As in the single processor version of the operating system, the
operating system state on each CPU may be executing in kernel mode, executing in user mode, idle, or idle waiting for I/O.
All processors can run the kernel simultaneously because it is multithreaded;
that is, it is designed to run simultaneously on several processors while protecting shared memory structures. Any processor can execute primary kernel
functions such as file system access, memory and buffer management, distributed interrupt and trap handling, process scheduling, and system calls.
Most standard device drivers provided with the system are also
multithreaded. Any unmodified driver or kernel module that does not register
itself as multithreaded runs on the base processor.
Figure 3-5 (this page) shows how we can modify the process state diagram
introduced in "Process states" (page 24) and apply it to a multiprocessor system. Note that this diagram implies that the kernel not only has to consider
when to run a process but also on which CPU to run it.

[Figure: process states on a multiprocessor computer, showing processes on the
CPUs, in memory, and on swap]

Figure 3-5 Process states on a multiprocessor computer


Using the mpstat load displayer
On systems with an SCO SMP License, the mpstat utility visually displays processor activity for each of the processors installed on your system. It allows
you to verify at a glance that the system load is balanced across all available
processors. See the mpstat(ADM) manual page for more information.

Examining interrupt activity on multiprocessor systems
On multiprocessor systems, interrupts sent between the CPUs coordinate and
synchronize timing, I/O, and other cooperative activity.

You can use cpusar -j to see how active interrupt handling routines are on a
particular CPU in a multiprocessor system. If device drivers are not written to
be multithreaded they will only run on the base processor. You can examine
which device drivers are multithreaded using the mthread(ADM) command.
You can also use the displayintr(ADM) command to see how interrupt
handlers are distributed across the system's CPUs and whether they are movable from one CPU to another.
The number of inter-CPU interrupts can be examined using mpsar -I to view
systemwide activity or cpusar -I to examine an individual CPU. The output of
these commands depends on your system hardware.
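For example, you might sample this activity at 10-second intervals for one
minute; the interval and count arguments work in the same way as for sar:

   $ mpsar -I 10 6
   $ cpusar -I 10 6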

Process scheduling
In a single processor UNIX operating system, the scheduler only concerns
itself with when to run a process on the CPU. In a multiprocessor UNIX operating system, the scheduler not only has to consider when to run a process,
but also where to run it. Because the kernel runs on all the processors, the
process scheduler may be active on any or all of the CPUs. You can adjust
how the process scheduler works in order to improve performance as
described in "Adjusting the scheduling of processes" (this page).

Adjusting the scheduling of processes
You can configure the process scheduling policy to suit a particular application mix by adjusting the values of a few kernel variables as described in the
following sections.
The variables dopricalc, primove, and cache_affinity control the behavior of
priority calculations and the scheduler on both single processor and multiprocessor machines; they are to be found in the file /etc/conf/pack.d/kernel/space.c.


The variables preemptive and loadbalance only apply to SMP and can be
found in /etc/conf/pack.d/crllry/space.c. To change the values of these variables,
edit the files, then relink and reboot the kernel.
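The change itself is a one-line edit followed by a kernel relink and reboot. A
minimal sketch follows, assuming the variable is declared in space.c in the form
int dopricalc = 1; and that the link_unix script in /etc/conf/cf.d is used to
relink the kernel; both assumptions should be checked against your own release.
Edit the appropriate space.c file to set the variable to the value you want
(for example, change dopricalc from 1 to 0), then run:

   cd /etc/conf/cf.d
   ./link_unix

and reboot the system so that the new kernel is used.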
It is not possible to predict the effect of the settings on a particular system. It
is likely that you will have to try alternative values to determine whether
there is a gain.
For database servers on systems with an SCO SMP License, you may find that
setting preemptive, loadbalance and dopricalc to zero gives a performance
improvement.
The following sections describe the effect of adjusting these variables:
• "Controlling priority calculations - dopricalc" (this page)
• "Controlling the effective priority of processes - primove" (page 36)
• "Controlling cache affinity - cache_affinity" (page 37)
• "Controlling process preemption - preemptive" (page 37)
• "Load balancing - loadbalance" (page 37)

Controlling priority calculations - dopricalc

The dopricalc variable controls whether the kernel adjusts the priorities of all
runnable processes at one-second intervals. Its value has no effect on the
recalculation every clock tick of the priority of a process that is currently running.
For some application mixes, such as database servers which have no logged-in
users and which make frequent I/O requests, disabling the recalculation of
the priorities of ready-to-run processes may improve performance. This is
because a process running on a CPU is more likely to continue to run until it
reaches the end of its time slice or until it sleeps on starting an I/O request.
The default value of dopricalc is 1 which enables the one-second priority calculations. To turn off the calculations, set the value of dopricalc to 0, relink
the kernel, and reboot the system. This modification will reduce the number
of context switches, and may increase the efficiency of the L2 cache. However,
it may impair the performance of the system if there is a mixture of interactive
and CPU-intensive processes. CPU-intensive processes spend all or nearly all
of their time in user space; they do not go to sleep waiting for I/O, and so they
are unlikely to be context switched except at the end of their time slice. As a
consequence, interactive processes may receive less access to the CPU.


Controlling the effective priority of processes - primove

Until now, the discussion of process priorities has assumed that the scheduler
uses a process' calculated priority to decide whether the process should be
put on the CPU to run. In the default configuration of the kernel, this is
effectively true. In fact, the kernel implements the run queue as separate lists
of runnable processes for each priority value. The scheduler examines the
priority value assigned to each list rather than the priorities of the processes
that they contain when looking for a process to run. Provided the kernel
assigns processes to the list corresponding to their priority, the lists are invisible. Under some circumstances, it may be beneficial to performance to allow
processes to remain in a list after their priority has been changed.
When the priority of a user process is adjusted, the variable primove controls
whether the kernel moves the process to a higher or lower value priority list.
The process will only be moved to a new list if its priority differs from the
present list priority by at least the value of primove. The effect of increasing
primove is to make a process remain at a low or high priority for longer. It
also means that the operating system has less work to do moving processes
between different lists. The default value of primove is 0 for compliance with
POSIX.1b. This means that any change in a process' priority will cause it to be
moved to a different list.
For an example of the use of primove, assume that it is given a value of 10. If
the priority of a process begins at 51 and rises by at least a value of 10, it is
moved to the list corresponding to priority 61. The process does not move
between lists until its priority rises by at least the value of primove. So if the
process' priority rose to 60, it would remain on the priority 51 list. The kernel,
however, would still see the process as having a lower priority than another
in the priority 55 list. Conversely, a process in the priority 71 list will stay
there until its priority falls to 61.
Increasing the value of primove makes the kernel less sensitive to process
priorities.
Reducing the value of primove produces fairer scheduling for all processes
but increases the amount of kernel overhead that is needed to manipulate the
run queue.


Controlling cache affinity - cache_affinity

By default, the scheduler does not give preference to a process that last executed on a CPU. The advantage of giving preference to these processes is to
improve the hit rate on the level-one (L1) and L2 caches. As a consequence, the hardware is less likely to have to reload the caches from memory,
an action that could slow down the processor. It also means that the process
selected to run does not necessarily have the highest priority.
Cache affinity behavior is controlled by the value of the variable
cache_affinity. If the value of cache_affinity is changed to 1, the kernel gives
preference to processes which previously ran on a CPU. Valid data and text is
more likely to be found in the caches for small processes. If your system tends
to run large processes, leave cache_affinity set to 0.

Controlling process preemption - preemptive

On multiprocessor systems, the scheduler looks for a CPU on which to run a
process when that process becomes runnable, or when its time slice has
expired. The scheduler first looks for an idle CPU. If it cannot find an idle
CPU, it next considers preempting the process on the current CPU if it has a
lower priority; it is quicker to preempt the current process as this does not
require an interprocessor interrupt. With some application mixes, however,
this can increase the number of context switches. For example, when a database server wakes a client, it may be more efficient, in terms of system
resources, for the server to continue to run for a period of time after that
wakeup.
To prevent the scheduler from preempting the process on the current processor, change the value of preemptive to 0.

Load balancing - loadbalance

On multiprocessor systems, the default behavior of the scheduler is to run the
highest priority jobs on each of the processors. For example, when a process
is woken after a disk transfer completes, the scheduler checks if any CPU is
running a process with a lower priority. If so, the processor is instructed to
reschedule and run the newly woken process. This load balancing feature is
also used when a process is taken off a CPU; it is possible that the process has
a higher priority than one on another CPU.


If you change the value of loadbalance to 0, the scheduler no longer looks for
lower priority processes on other CPUs. This reduces the probability that a
process will be preempted. On a system that is performing a reasonable
amount of I/O requests, this should reduce the number of context switches
and interprocessor interrupts. This provides more processor cycles for executing user applications and should increase overall performance. Processors
which are idle can still be selected so idle time is minimized. This adjustment
is likely to improve performance where context switching frequency is high,
or on database servers where user processes should not be disturbed once
they are running. If the system is already spending a significant amount of
time idle, it is unlikely that this adjustment will improve performance.

Identifying CPU-bound systems
A system is CPU bound (has a CPU bottleneck) if the processor cannot execute
fast enough to keep the number of processes on the run queue consistently
low. To determine if a system is CPU bound, run sar -u (or cpusar -u for each
processor on a system with an SCO SMP License) and examine the %idle
value.
If %idle is consistently less than 5% (for all CPUs) on a heavily loaded database server system, then the system may be lacking in processing power. On
a heavily loaded system with many logged-in users, a %idle value that is persistently less than 20% suggests that the system may not be able to cope with a
much larger load. Examination of the number of processes on the run queue
shows whether there is an unacceptable buildup of runnable processes. If
processes are not building up on the run queue, a low idle time need not indicate an immediate problem provided that the other subsystems (memory and
I/O) can cope with the demands placed upon them.

Run queue activity can be considered heavy if sar -q (mpsar -q for SMP)
reports that runq-sz is consistently greater than 2 (and %runocc is greater than
90% for SMP). If low %idle values are combined with heavy run queue
activity then the system is CPU bound.
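For example, to gather both sets of figures over the same busy ten-minute
period, one sample per minute (the output file names are arbitrary):

   $ sar -u 60 10 > /tmp/cpu.out &
   $ sar -q 60 10 > /tmp/runq.out &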
If low %idle values are combined with low or non-existent run queue activity,
it is possible that the system is running CPU-intensive processes. This in itself
is not a problem unless an increase in the number of executing processes
causes a buildup of numbers of processes on the run queue.


If %wio values are consistently high (greater than 15%), this is more likely to
indicate a potential I/O bottleneck than a problem with CPU resources. See
Chapter 5, "Tuning I/O resources" (page 71) for more information on identifying I/O bottlenecks.

High values of %wio may also be seen if the system is swapping and paging.
Memory shortages can also lead to a disk I/O bottleneck because the system
spends so much time moving processes and pages between memory and
swap areas on disk. If the value of %sys is high relative to %usr, and %idle is
close to zero, this could indicate that the kernel is consuming a large amount
of CPU time running the swapping and page stealing daemons (sched and
vhand). These daemons are part of the kernel and cannot be context
switched; this may lead to several processes being stuck on the run queue
waiting to run. For details of how to identify and tune memory-bound systems, see Chapter 4, "Tuning memory resources" (page 41) and "Tuning
memory-bound systems" (page 52).
The following table summarizes the commands that you can use to determine
if a system is CPU bound:
Table 3-1 Identifying a CPU-bound system

Command      Field      Description
sar -u       %idle      percentage of time that the CPU was idle
mpsar -u     %idle      average percentage of time all CPUs are idle (SMP only)
cpusar -u    %idle      percentage of time the specified CPU was idle (SMP only)
[mp]sar -q   %runocc    percentage of time the run queue is occupied
             runq-sz    number of processes on the run queue

See "Tuning CPU-bound systems" (page 40) for a discussion of how to tune
CPU-bound systems.


Tuning CPU-bound systems
If it has been determined that the system is CPU bound, there are a number of
things that can be done:
• If possible, consider rescheduling the existing job load on your system. If
many large jobs are being run at once, rescheduling them to run at different
times may improve performance. You should also check the system's crontab(C) files to see if any jobs running at peak times can be scheduled to
run at other times.
• If possible, tune the applications so that they require less CPU power.
• Consider replacing non-critical applications with ones that require a less
powerful system.

• If you have evidence that the system is I/O bound serving interrupts from
non-intelligent serial cards, replacing these with intelligent serial cards will
offload some of the I/O burden from the CPUs. See "Serial device
resources" (page 108) for more details.
• Check if the hard disk controllers in the system are capable of using DMA
to transfer data to and from memory. If the CPU has to perform programmed I/O on behalf of the controller, this can limit its performance.
• It is possible that because of a lack of free memory the system is swapping,
which could result in a considerable portion of the CPU resources being
used to transfer processes back and forth between the disk and memory.
To determine if this is the case, see Chapter 4, "Tuning memory
resources" (page 41).
• Upgrade to a faster CPU or CPUs.
• Upgrade to a multiprocessor system from a single processor system. This
will help if there are runnable jobs on the run queue or the applications
being run are multithreaded.
• Add one or more CPUs to a multiprocessor system.
• Purchase an additional system and divide your processing requirements
between it and your current system.


Chapter 4

Tuning memory resources
The SCO OpenServer system is a virtual memory operating system. Virtual
memory is implemented using various physical resources:
• Physical memory as RAM chips; sometimes referred to as primary, main, or
core memory.
• Program text (machine code instructions) as files within filesystems on disk
or ramdisks.
• Swap space consisting of one or more disk divisions or swap files within
file systems dedicated to this purpose. The individual pieces of swap space
are known as swap areas. Swap space is also referred to as secondary
memory.
Depending on the system hardware, there may also be physical cache
memory on the CPU chip itself (level-one (L1) cache), or on the computer's
motherboard (level-two (L2) cache), and on peripheral hardware controller
cards. If recently accessed data (or, for some L1 and L2 caches, machine
instructions) exists in this memory, it can be accessed immediately rather than
having to retrieve it from more distant memory.
Write-through caches store data read from memory or a peripheral device;
they ensure that data is written synchronously to memory or a physical
device before allowing the CPU to continue. Write-back caches retain both
read and written data and do not require the CPU to synchronize with data
being written.


NOTE Most L2 caches work with a limited amount of main memory. Adding more RAM than the cache can handle may actually make the machine
slower. For some machines with a 64KB L2 cache, this only covers the first
16MB of physical memory. See the documentation provided with your computer or motherboard hardware for more details.

Physical memory
Physical memory on the system is divided between the area occupied by the
kernel and the area available to user processes. Whenever the system is
rebooted the size of these areas, as well as the total amount of physical memory, is logged in the file /usr/adm/messages under the heading mem:, for example:
mem: total = 32384k, kernel = 4484k, user = 27900k

This shows a system with 32MB of physical memory; the kernel is using just
over 4MB of this memory with the remainder being available for user processes.
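You can check these figures on a running system by searching the message file
directly:

   $ grep mem: /usr/adm/messages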
Physical memory is divided into equal-sized (4KB) pieces known as pages.
When a process starts to run, the first 4KB of the program's text (executable
machine instructions) is copied into a page of memory. Each subsequent portion of memory that a process requires is assigned an additional page.
When a process terminates, its pages are returned to the free list of unused
pages.
Physical memory is continually used in this way unless the running processes require more pages of memory than currently exist on the system. In this case the system must redistribute the available memory by either
paging out or swapping.

Virtual memory
The operating system uses virtual memory to manage the memory requirements of its processes by combining physical memory with secondary memory (swap space) on disk. The swap area is usually located on a local disk
drive. Diskless systems use a page server to maintain their swap areas on its
local disk.
The amount of swap space is usually configured to be larger than physical
memory; the sum of physical memory and swap space defines the total virtual memory that is available to the system.


Having swap space on disk means that the CPU's access to it is very much
slower than to physical memory. Conventionally, the swap area uses an
entire division on a hard disk. It is also possible to configure a regular file
from within a file system for use as swap. Although this is intended for use by
diskless workstations, a server can also increase its swap area in this way.
The swap area is used as an additional memory resource for processes which
are too large for the available physical user memory. In this way, it is possible
to run a process whose entire executable image will not fit into physical memory. However, a process does not have to be completely loaded into physical
memory to run, and in most cases is unlikely to be completely loaded anyway.
The virtual address space of a process is divided into separate areas known as
regions that it uses to hold its text, data, and stack pages. When a program is
loaded, its data region consists of data pages that were initialized when the
program was compiled. If the program creates uninitialized data (often
known as bss for historical reasons) the kernel adds more pages of memory to
the process' data region.
If the operating system is running low on physical memory, it can start to
write pages of physical memory belonging to processes out to the swap area.
See "Paging" (page 44) and "Swapping" (page 47) for more details.
Figure 4-1 (page 44) illustrates how a process' virtual memory might correspond to what exists in physical memory, on swap, and in the file system. The
u-area of a process consists of two 4KB pages (displayed here as U1 and U2) of
virtual memory that contain information about the process needed by the system when the process is running. In this example, these pages are shown
existing in physical memory. Two of the data pages are shown as being
paged out to the swap area on disk. The text page T4 has also been paged
out but it is not written to the swap area as it exists in the filesystem. Those
pages which have not yet been accessed by the process (D5, T2, and T5) do not
occupy any resources in physical memory or in the swap area.

43

Tuning memory resources

[Figure 4-1: a process' virtual memory (u-area, stack, data, and text pages)
mapped onto pages in physical memory, swap areas (S1, S2, S3) on disk, and
program text in the filesystem]
The administrator runs the command on several workstations to try to eliminate the possibility that faulty network interface cards are the cause of the
problem.


The recorded output shows occasional short periods when the network is
overloaded (for clarity, only the statistics for the network interface xxx0 are
shown in this example):

            input    (xxx0)          output
          packets     errs   packets    errs   colls
              110        0       101       0       0
               78        0        66       0       0
               85        1        75       2      23
              180        2       123       1      42
              120        1        55       1      18
               87        0        67       0       2
               67        0        54       0       0
At these times, the numbers of input and output errors are non-zero, and the
number of collisions approaches 30% of output packets. The same behavior is
observed on all the workstations on which statistics were gathered.
If the periods of heavy loading are excluded, the frequency of packet collisions approaches 0%.

Formulating a hypothesis
From the results of running netstat, the system administrator suspects that
some applications must be moving large amounts of data across the network.
Careful examination of the figures shows that the network is overloaded
approximately 5% of the time. Periods of high loading generally last only a
few minutes and seem to occur in bursts. Such behavior is typical if large files
are transferred using NFS. It is unlikely to be the result of network traffic
caused by remote X clients as these are run locally where possible. Possible
culprits are programs used to preview PostScript and graphics image files,
DTP packages, and screen-capture utilities.

Getting more specifics
With the cooperation of several users, the administrator monitors network
performance using netstat over a period of 30 minutes. During this period the
users run the suspect applications to load and manipulate large files across
the network. The outcome of this investigation is that graphics image previewers and screen-capture utilities seem to cause the most network overhead. The files being viewed or created are often several megabytes in size.


Making adjustments to the system
There are several things that can be done to reduce the peak load on the network:
• Encourage users to save and load graphics images to and from the local
disk on the workstation they are using. These files may then be copied to
the file server when the network is less busy.
• Run screen capture and graphics preview utilities on dedicated workstations rather than on X terminals.
• Splitting the network into several subnets might help if the nodes on the
network can easily be divided into logically distinct groups. However, this
may cause more CPU overhead on the file server if it is used as the router
between the subnets. This solution is more expensive and may make the
problem worse if the wrong network topology is chosen.


Chapter 7

Tuning system call activity
This chapter is of interest to application programmers who need to investigate the level of activity of system calls on a system.
System calls are used by programs and utilities to request services from the
kernel. These can involve passing data to the kernel to be written to disk,
finding process information and creating new processes. By allowing the kernel to perform these services on behalf of an application program, they can be
provided transparently. For example, a program can write data without
needing to be concerned whether this is to a file, memory, or a physical device
such as disk or tape. It also prevents programs from directly manipulating
and accidentally damaging system structures.
System calls can adversely affect performance because of the overhead
required to go into system mode and the extra context switching that may
result.

Viewing system call activity
System call activity can be seen with sar -c (or mpsar -c for SMP):
23:59:44  scall/s  sread/s  swrit/s   fork/s   exec/s   rchar/s   wchar/s
23:59:49      473        9        0     0.09     0.12    292077       421
23:59:54      516       13        3     0.03     0.03    367668       574
23:59:59      483       13        3     0.01     0.02    366992       566

Average       489       12        2     0.04     0.06    338280       512

scall/s indicates the average number of system calls per second averaged
over the sampling interval. Also of interest are sread/s and swrit/s which
indicate the number of read(S) and write(S) calls, and rchar/s and wchar/s
which show the number of characters transferred by them.


If you are an applications programmer and the SCO OpenServer Development
System is installed on your system, you can use prof(CP) to examine the
results of execution profiling provided by the monitor(S) function. This
should show where a program spends most of its time when it is executing.
You can also use the trace(CP) utility to investigate system call usage by a program.

Identifying excessive read and write system call activity
Normally, read and write system calls should not account for more than half
of the total number of system calls. If the number of characters transferred by
each read (rchar/s / sread/s) or write (wchar/s / swrit/s) call is small, it is
likely that some applications are reading and writing small amounts of data
for each system call. It is wasteful for the system to spend much of its time
switching between system and user mode because of the overhead this incurs.
It may be possible to reduce the number of read and write calls by tuning the
application that uses them. For example, a database management system
may provide its own tunable parameters to enable you to tune the caching it
provides for disk I/O.
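A rough way to estimate the average transfer size is to divide the character
rates by the call rates in the Average line of the sar -c report. The awk field
numbers below are an assumption based on the column order shown in the example
above, and the write calculation should be omitted if swrit/s is zero over the
sampling period:

   $ sar -c | awk '/^Average/ { print "bytes/read:", $7/$3, "  bytes/write:", $8/$4 }'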

Viewing process fork and exec activity
fork/s and exec/s show the number of fork(S) and exec(S) calls per second. If
the system shows high fork and exec activity, this may be due to it running a
large number of shell scripts. To avoid this, one possibility is to rewrite the
shell scripts in a high-level compiled language such as C.
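One way to gauge how much a particular script costs is to run it under
timex(ADM), which reports the real, user, and system time it consumes; the
script name here is hypothetical:

   $ timex sh monthly_report.sh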

Viewing AIO activity
If applications are using asynchronous I/O (AIO) to disk, you can use the -O
option to sar(ADM) (or mpsar(ADM) for SMP) to examine the performance of
AIO requests. The values reported include the number of AIO read and write
requests per second, and the total number of 1KB blocks (both read and write)
being handled per second. The %direct column of the report shows the percentage of AIO requests that are passed directly to the disk driver by the
POSIX.1b aio functions defined in the Software Update for Database Systems
(SUDS) library. Other AIO requests are handled by the aio(HW) driver.

Viewing IPC activity
You can use the sar -m command (or mpsar -m for SMP) to see how many System V interprocess communication (IPC) message queue and semaphore
primitives are issued per second. Note that you can also use the ipcs(ADM)
command to report the status of active message queues, shared memory segments, and semaphores.


Semaphore resources
Semaphores are used to prevent processes from accessing the same resource,
usually shared memory, at the same time.
The number of System V semaphores configured for use is controlled by the
kernel parameter SEMMNS.
If the sema/s column in the output from sar -m shows that the number of
semaphore primitives called per second is high (for example, greater than
100), the application may not be using IPC efficiently. It is not possible to
recommend a value here. What constitutes a high number of semaphore calls
depends on the use to which the application puts them and the processing
power of the system running the application.
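To see how heavily semaphores are being used while the application runs, you
could combine the activity report with a listing of the semaphore sets that are
currently allocated; the interval and count arguments are standard sar usage:

   $ sar -m 10 6        # use mpsar -m 10 6 on SMP systems
   $ ipcs -s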

System V semaphores are known to be inefficient and adversely affect the performance of multiprocessor systems. This is because:
• They increase contention between processors - this reduces scaling and
prevents the available CPU power being used effectively.
• They increase activity on the run queues as several processes sleeping on a
semaphore may be woken when its state changes - this increases system
overhead.
• They increase the likelihood of context switching - this also increases system overhead.
If you are an applications programmer, consider using the SUDS library
routines instead; these implement more efficient POSIX.1b semaphores. The
number of POSIX.1b semaphores configured for use is controlled by the kernel
parameter SEM_NSEMS_MAX.

Some database management systems may use a sleeper driver to synchronize
processes. (This may also be referred to as a post-wait driver.) If this is not
enabled, they may revert to using less efficient System V semaphores. See the
documentation provided with the database management system for more information.
For more information on the kernel parameters that you can use to configure
semaphores, see "Semaphores" (page 216) and "Semaphore parameters" (page
217).

Messages and message queue resources
Messages are intended for interprocess communication which involves small
quantities of data, usually less than 1KB. Between being sent and being
received, the messages are stored on message queues. These queues are
implemented as linked lists within the kernel.


Under some circumstances, you may need to increase resources allocated for
messages and message queues above the default values defined in the
mtune(F) file. Note that the kernel parameters defined in mtune set systemwide limits, not per-process limits.
Follow the guidelines below when changing the kernel parameters that control the configuration of message queues:
• Each process that calls msgget(S) with either of the flags IPC_CREAT or
IPC_PRIVATE set obtains an ID for a new message queue.
• The total number of available message headers (MSGTQL) must be less
than or equal to 16383. This limits the total number of messages systemwide because each unread message must have a header.
• The total number of segments configured for use (MSGSEG) must be less
than or equal to 32768. This limits the total number of messages systemwide because each message consists of at least one segment.
• The size of each message segment (MSGSSZ) is specified in bytes and must
be a multiple of 4 in the range 4 to 4096. Each message is allocated enough
segments to hold it; any remaining space in the last segment allocated to a
message is unused. A small value of MSGSSZ is suitable for systems which
will send and receive many small messages. A large value is suitable if
messages are fewer and larger. Small segments require more processing
overhead by the kernel as it keeps track of them; large segments can be
wasteful of memory.
• The total amount of memory reserved for use by message data is controlled
by the product of the number of segments and the segment size,
MSGSEG * MSGSSZ. This value must be less than or equal to 128KB
(131072 bytes); a worked example follows this list.
• Increase the size of the map used for managing messages (MSGMAP) if a
large number of small messages are processed. Typically, you should set
the map size to half the number of memory segments configured
(MSGSEG). Do not increase MSGMAP to a value greater than that of
MSGSEG.

• The amount of message data allowed in an individual queue (MSGMNB)
must be less than or equal to 64KB - 4 bytes (that is, less than or equal to
65532 bytes).
• The maximum length of an individual message is limited by the value of
MSGMAX. Although the recommended maximum is 8192 bytes (8KB), the
kernel can support messages up to 32767 bytes in length. Note, however,
that the message size may also be limited by the value of MSGMNB.
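As a worked example of the sizing rules above (the figures are purely
illustrative): setting MSGSSZ to 64 bytes and MSGSEG to 2048 reserves
64 * 2048 = 131072 bytes (128KB) for message data, the maximum allowed;
following the guideline for the map size, MSGMAP would then be set to 1024,
half the value of MSGSEG.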


The following table shows how to calculate the maximum values for these
parameters based on the value of MSGSSZ. Note that MSGSSZ must be a
multiple of 4 in the range 4 to 4096:
Table 7-1 Calculation of maximum value of message parameters

Parameter   Maximum value
MSGMAP      131072 / MSGSSZ
MSGMAX      32767
MSGMNB      65532
MSGMNI      1024
MSGSEG      131072 / MSGSSZ
MSGTQL      MSGMNB / MSGSSZ

For more information on the kernel parameters that you can use to configure
message queues, see "Message queues" (page 213) and "Message queue
parameters" (page 215).

Shared memory resources
Shared memory is an extremely fast method of interprocess communication.
As its name suggests, it operates by allowing processes to share memory segments within their address spaces. Data written by one process is available
immediately for reading by another process. To prevent processes trying to
access the same memory addresses at the same time, known as a race condition, the processes must be synchronized using a mechanism such as a semaphore.
The maximum number of shared-memory segments available for use is controlled by the value of the kernel parameter SHMMNI. The maximum size in
bytes of a segment is determined by the value of the kernel parameter
SHMMAX.
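To list the shared memory segments currently allocated on the system, and so
get a feel for how close you are to these limits, you can use ipcs with its
shared memory option:

   $ ipcs -m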
For more information on the kernel parameters that you can use to configure
shared memory, see "Shared memory" (page 218) and "Shared memory
parameters" (page 218).
SUDS library spin locks and latches
If your application uses spin locks and latches from the SUDS library to synchronize processes, you can use the -L option to sar(ADM) (or mpsar(ADM)
for systems with an SCO SMP License) to view their activity.


These latches allow processes to spin or sleep while waiting to acquire a latch.
Alternatively, a process can be made to sleep if it has been spinning for a
given time period without being able to acquire a latch. This prevents it
spending an unnecessarily long time spinning. It is efficient for a process to
spin for a short time to avoid the system overhead that a context switch
would cause. Processes that wait a long time for a latch should sleep to avoid
wasting CPU time.
See the sar(ADM) manual page for more information about the latch activity
reported by the -L option.
The following table summarizes the commands that can be used to determine
if a system is suffering under heavy system call activity:
Table 7-2 Viewing system call activity

Command      Field      Description
[mp]sar -c   scall/s    total number of all system calls per second
             sread/s    read system calls per second
             swrit/s    write system calls per second
             fork/s     fork system calls per second
             exec/s     exec system calls per second
             rchar/s    characters transferred by read system calls per second
             wchar/s    characters transferred by write system calls per second
ipcs -a                 status of System V IPC facilities
[mp]sar -m   msg/s      message queue primitives per second
             sema/s     semaphore primitives per second
[mp]sar -O   %direct    percentage of AIO requests using the POSIX.1b aio functions

Reducing system call activity
Reducing most system call activity is only possible if the source code for the
programs making the system calls is available. If a program is making a large
number of read and write system calls that each transfer a small number of
bytes, then the program needs to be rewritten to make fewer system calls that
each transfer larger numbers of bytes.
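The effect of transfer size is easy to demonstrate with dd, which lets you read
the same file with different block sizes while timex reports the time consumed;
the file name is hypothetical:

   $ timex dd if=datafile of=/dev/null bs=512      # many small read system calls
   $ timex dd if=datafile of=/dev/null bs=16k      # far fewer, larger reads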


Other possible sources of system call activity are applications that use
interprocess communication (semaphores, shared memory, and message
queues), and record locking. You should ensure that the system has sufficient
of these resources to meet the demands of the application. Most large applications such as database management systems include advice on tuning the
application for the host operating system. They may also include their own
tuning facilities, so you should always check the documentation that was supplied with the application.

Case study: semaphore activity on a database server
In this study, a site has installed a relational database on a multiprocessor system. The database gives the choice of using System V semaphores or the
sleeper driver (sometimes called the post-wait driver) to synchronize processes. The object is to investigate which of these options will maximize the
number of transactions that can be processed per second and the response
time for the user.

System configuration
The system's configuration is as follows:
• Multiprocessor - 2 Pentium 60MHz processors.
• EISA bus.
• 96MB of RAM.
• 96MB of swap space.
• 14GB of hard disk (two arrays of seven 1GB SCSI-2 disks).
• One Bus Mastering DMA Ethernet network card with a 16KB buffer and
  32-bit wide data path.
The database server does not act as host machine to any users directly; instead
there are five host machines connected to the LAN which serve an average of
100 users each.

Defining a performance goal
The performance goal in this study is to compare the performance of the database when using System V semaphores and when using the sleeper driver.
NOTE To configure the sleeper driver into the kernel, change the second
field of the line in the file /etc/conf/sdevice.d/sleeper to read "Y". Then relink
and reboot the kernel.


Collecting data
To monitor the performance, an in-house benchmark is used for an hour with
the system configured to use System V semaphores, and then with it using the
sleeper driver. The benchmark measures the minimum, maximum, and average transaction times and the total throughput in transactions per second.
The result of running the benchmark is that the best performance is achieved
using the sleeper driver.

Formulating a hypothesis
When the database is using System V semaphores, the system may be spending too much time in kernel mode executing semaphore calls. The benchmark
run using the sleeper driver gives better results because it is an enhancement
specifically aimed at improving the performance of relational databases. It
allows an RDBMS to synchronize built-in processes without the high overhead
of switching between user mode and system mode associated with System V
semaphores.

Getting more specifics
To test the hypothesis, mpsar -u is used to display the time that the system
spent in system mode while each benchmark was being run. For the benchmark using the sleeper driver, typical results were:
13:55:00    %usr    %sys    %wio   %idle

14:20:00      75      20       2       3
14:25:00      72      23       1       4
14:30:00      69      24       5       2
14:35:00      77      19       4       0

The averaged performance of all the CPUs was excellent with low percentages
spent in system mode, idle waiting for I/O, or idle.


For the run using semaphores, the results were:
16:08:00    %usr    %sys    %wio   %idle

16:43:00      55      38       6       1
16:48:00      59      32       4       2
16:53:00      61      34       0       2
16:58:00      58      38       2       7

The system spends more time in system mode and waiting for I/O when System V semaphores are used. The benchmark results indicate that transaction
throughput and response time are approximately 10% better when the sleeper
driver is used.

Making adjustments to the system
The database is configured to use the sleeper driver as this provides the best
performance for the benchmark. The system should be monitored in everyday use to evaluate its performance under real loading.
Vendors of the database management systems are continually improving their
products to use more sophisticated database technologies. If you upgrade the
database management system to a version that supports POSIX.1b semaphores, you may need to evaluate if these should be used instead of the
sleeper driver.


Appendix A

Tools reference
A variety of tools are available to monitor system performance or report on
the usage of system resources such as disk space, interprocess communication
(IPC) facilities, and pipes:
df

Reports the amount of free disk blocks on local disk divisions. See
"df - report disk space usage" (page 172) and df(C) for more information. Also see the descriptions of the related commands:
dfspace(C) and du(C).

ipcs

Reports the status of System V interprocess communication (IPC)
facilities - message queues, semaphores, and shared memory. See
ipcs(ADM) for more information.

netstat Reports on STREAMS usage and various network performance statistics. It is particularly useful for diagnosing if a network is overloaded or a network card is faulty. See netstat(TC) for more information. See also ndstat(ADM) which reports similar information.
nfsstat Reports NFS statistics on NFS servers and clients. It is particularly
useful for detecting problems with NFS configuration. See
nfsstat(NADM) for more information.
ping

Can be used to test connectivity over a network. See ping(ADMN)
for more information.

pipestat Reports on the usage of ordinary and high performance pipes. See
pipe(ADM) for more information.
ps

Reports on processes currently occupying the process table. See "ps
- check process activity" (page 173) and ps(C) for more information.

sar

Samples the state of the system and provides reports on various
system-wide activities. See "sar - system activity reporter" (page
176) and sar(ADM) for more information.


swap

Reports on the amount of available swap space or configures additional swap devices. See "swap - check and add swap space" (page
179) and swap(ADM) for more information.

timex

Reports on system resource usage during the execution of a
command or program. See "timex - examine system activity per
command" (page 180) and timex(ADM) for more information. See
also the description of the related command, time(C).

traceroute
Traces the route that network packets take to reach a given destination. See traceroute(ADMN) for more information.
vmstat Reports on process states, paging and swapping activity, system
calls, context switches and CPU usage. See "vmstat - virtual
memory statistics" (page 181) and vmstat(C) for more information.

df - report disk space usage
When attempting to achieve optimal performance for the I/O subsystem, it is
important to make sure that the disks have enough free space to do their job
efficiently. The df(C) command, and its close relative dfspace(C), enable you
to see how much free space there is. The following example shows the output
from df and dfspace on the same system:
$ df
/           (/dev/root     ):    37872 blocks    46812 i-nodes
/u          (/dev/u        ):   270814 blocks    36874 i-nodes
/public     (/dev/public   ):   191388 blocks    55006 i-nodes
/london     (wansvr:/london):   149750 blocks        0 i-nodes
$ dfspace
/        :  Disk space:  18.49 MB of  292.96 MB available ( 6.31%).
/u       :  Disk space: 132.23 MB of  629.98 MB available (20.99%).
/public  :  Disk space:  93.45 MB of  305.77 MB available (30.56%).
/london  :  Disk space:  73.12 MB of  202.56 MB available (36.10%).

Total Disk Space: 317.29 MB of 1431.29 MB available (22.17%).
$ df -v
Mount Dir  Filesystem          blocks      used      free   %used
/          /dev/root           600000    562128     37872     93%
/u         /dev/u             1290218   1019404    270814     79%
/public    /dev/public         626218    434830    191388     69%
/london    wansvr:/london      414858    265108    149750     63%

The -i option to df also provides additional information about the number of
free and used inodes.
dfspace is a shell script interface to df. Without options, it presents the filesystem data in a more readable format than df. When used with its options,
df provides more comprehensive information than dfspace.

In the above example, there are three local filesystems:

• /dev/root
• /dev/u
• /dev/public
and one remote filesystem:

• wansvr:/london
All of these local filesystems have adequate numbers of blocks and inodes
remaining for use. You should aim to keep at least 15% of free space on each
filesystem. This helps to prevent fragmentation which slows down disk I/O.
In the above example there are no problems with the filesystems /dev/u and
/dev/public which are less than 85% used. The root filesystem (/dev/root), however, is 93% full. This filesystem is relatively static apart from the temporary
file storage directories /tmp and /usr/tmp. In the configuration shown, there is
very little free space in these directories. Possible solutions are to create divisions to hold these directories on other disks, or increase the size of the root
filesystem.
du(C) is another command that can be used to investigate disk usage. It differs
from df and dfspace because it reports the number of 512-byte blocks that
files and directories contain rather than the contents of an entire filesystem. If
no path is specified, du reports recursively on files and directories in and
below the current directory. Its use is usually confined to sizing file and directory contents.

ps - check process activity
The ps(C) command obtains information about active processes. It gives a
"snapshot" picture of what processes are executing, which is useful when you
are trying to identify what processes are loading the system. Without options,
ps gives information about the login session from which it was invoked. If
you use ps as user root, you can obtain information about all the system's processes. The most useful options are as follows:
Table A-1 ps options

Option   Reports on:
-e       print information on all processes
-f       generate a full listing
-l       generate a long listing (includes more fields)
-u       print information on a specified user (or users)


With various combinations of the above options you can, amongst other
things, find out about the resource usage, priority and state of a process or
groups of processes on the system. For example, below is an extract of output
after typing ps -el:
 F S   UID   PID  PPID  C PRI NI ADDR  SZ    WCHAN TTY   TIME CMD
31 S     0     0     0  0  95 20 1f21   0 f0299018        0:00 sched
20 S     0     1     0  0  66 20  252  40 e0000000       30:37 init
31 S     0     2     0  0  95 20  254   0 f00c687c        0:01 vhand
31 S     0     3     0  0  81 20  256   0 f00be318        5:19 bdflush
20 S     0   204     1  0  76 20  416  96 f023451a        1:56 cron
20 S     0   441     1  0  75 20  972  44 f01076b8 03     0:00 getty
20 S 20213  8783     1  0  73 20 1855  48 f011bae4 006    0:04 ksh
20 S 13079 25014 24908  0  75 20 155c  48 f010ee28 p4     0:01 ksh
20 R 13079 25016 24910 22  36 20  506 144 f010ed58 p2     0:03 vi
20 S 12752 27895 26142  0  73 20  7b0  40 f011f75c 010    0:00 sh
20 Z 13297 25733 25153  0  51 20  8a8  48 f012123c p12    0:00
20 R 13297 26089 25148 45  28 20 1ce2  48 f01214ec 010    0:01 ksh
20 S 12752 26142     1  0  73 20 1e16 188 f010f6b0 p25    0:04 csh
20 R 12752 28220 27898 55  25 20 161c  44 f012179c p13    0:01 email
20 S 12353 27047 25727  0  73 20  cc9              p23    0:00 ksh
20 O 13585 28248 28205 36  37 20  711  92                 0:00 ps
20 S 20213 28240  8783  0  75 20      140 f01156f8 006    0:00 vi

The field headed F gives information about the status of a process as a combination of one or more octal flags. For example, the sched process at the top
has a setting of 31 which is the sum of the flags 1, 10, and 20. This means that
the sched process is part of the kernel (1), sleeping at a priority of 77 or more
(10), and is loaded in primary memory (20). The priority is confirmed by consulting the PRI field further along the line which displays a priority of 95. In
fact both sched (the swapper) and vhand (the paging daemon) are inactive
but have the highest possible priority. Should either of them need to run in the
future they will do so at the context switch following their waking up as no
other process will have a higher priority. For more information on the octal
flags displayed and their interpretation see ps(C).
The S column shows the state of each process. The states shown in the example, S, R, O, and Z, mean sleeping (waiting for an event), ready-to-run, on the
processor (running) and zombie (defunct) respectively. There is only one process running, which is the ps command itself (see the penultimate line). Every
other process is either waiting to run or waiting for a resource to become
available. The exception is the zombie process which is currently terminating;
this entry will only disappear from the process table if the parent issues a
wait(S) system call.


The current priority of a process is also a useful indicator of what a process is
doing. Check the value in the PRI field which can be interpreted as shown in
the following table:
Table A-2 Priority values

Priority   Meaning
95         swapping/paging
88         waiting for an inode
81         waiting for I/O
80         waiting for buffer
76         waiting for pipe
75         waiting for tty input
74         waiting for tty output
73         waiting for exit
66         sleeping - lowest system mode priority
65         highest user mode priority
51         default user mode priority
0          lowest user mode priority

Looking back at the above ps output you can see, for example, that the getty
process has a priority of 75, as it is (not surprisingly) waiting for some
keyboard input. Whereas priority values between 66 and 95 are fixed for a
specific action to be taken, anything lower than 66 indicates a user mode process. The running process in the above example (ps) is at priority 37 and is
therefore in user mode.
The C field indicates the recent usage of CPU time by a process. This is useful
for determining which processes are currently making a machine slow.
The NI field shows the nice value of a process. This directly affects the calculation of its priority when it is being scheduled. All processes in the above
example are running with the default nice value of 20.
The TIME field shows the minutes and seconds of CPU time used by processes.
This is useful for seeing if any processes are CPU hogs, or runaway, gobbling
up large amounts of CPU time.


The SZ field shows the swappable size of the process's data and stack in 1KB
units. This information is of limited use in determining how much memory is
currently occupied by a process as it does not take into account how much of
the reported memory usage is shared. Totaling up this field for all memory
resident processes will not produce a meaningful figure for current memory
usage. It is useful on a per process basis as you can use it to compare the
memory usage of different versions of an application.
NOTE If you booted your system from a file other than /unix (such as
/unix.old), you must specify the name of that file with the -n option to ps.


For example, ps -ef -n unix.old.

sar - system activity reporter
sar(ADM) provides information that can help you understand how system
resources are being used on your system. This information can help you solve
and avoid serious performance problems on your system.
The individual sar options are described on the sar(ADM) manual page.
For systems with an SCO SMP License, mpsar(ADM) reports systemwide
statistics, and cpusar(ADM) reports per-CPU statistics.
The following table summarizes the functionality of each sar, mpsar, and cpusar option that reports an aspect of system activity:
Table A-3 sar, cpusar, and mpsar options

Option   Activity reported
-a       file access operations
-A       summarize all reports
-b       buffer cache
-B       copy buffers
-c       system calls
-d       block devices including disks and all SCSI peripherals
-F       floating point activity (mpsar only)
-g       serial I/O including overflows and character block usage
-h       scatter-gather and physical transfer buffers
-I       inter-CPU interrupts (cpusar and mpsar only)
-j       interrupts serviced per CPU (cpusar only)
-L       latches
-m       System V message queue and semaphores
-n       namei cache
-o       asynchronous I/O (AIO)
-p       paging
-q       run and swap queues
-Q       processes locked to CPUs (cpusar and mpsar only)
-r       unused memory and swap
-R       process scheduling
-S       SCSI request blocks
-u       CPU utilization (default option for all sar commands)
-v       kernel tables
-w       paging and context switching
-y       terminal driver including hardware interrupts

How sar works
System activity recording is disabled by default on your system. If you wish
to enable it, log in as root, enter the command /usr/lib/sar/sar_enable -y, then
shut down and reboot the system. See sar_enable(ADM) for more information.
Once system activity recording has been started on your system, it measures
internal activity using a number of counters contained in the kernel. Each
time an operation is performed, this increments an associated counter.
sar(ADM) can generate reports based on the raw data gathered from these
counters. sar reports provide useful information to administrators who wish
to find out if the system is performing adequately. sar can either gather system activity data at the present time, or extract historic information collected
in data files created by sadc(ADM) (System Activity Data Collector) or
sal(ADM).


If system activity recording has been started, the following crontab entries
exist for user sys in the file /usr/spool/cron/crontabs/sys:
0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1

The first sa1 entry produces records every hour of every day of the week. The
second entry does the same but at 20 and 40 minutes past the hour between 8
am and 5 pm from Monday to Friday. So, there is always a record made every
hour, and at anticipated peak times of activity recordings are made every 20
minutes. If necessary, root can modify these entries using the crontab(C) command.
The output files are in binary format (for compactness) and are stored in
/usr/adm/sa. The filenames have the format sadd, where dd is the day of the
month.

Running sar
To record system activity every t seconds for n intervals and save this data to
datafile, enter sar -o datafile t n on a single processor system, or mpsar -o
datafile t n on a multiprocessor system.
For example, to collect data every 60 seconds for 10 minutes into the file
/tmp/sar_data on a single CPU machine, you would enter:
sar -o /tmp/sar_data 60 10
To examine the data from datafile, the sar(ADM) command is:
sar [ option ... ] [ -f datafile ]
and the mpsar(ADM) and cpusar(ADM) commands are:
mpsar [ option ... ] [ -f datafile ]
cpusar [ option ... ] [ -f datafile ]
Each option specifies the aspect of system activity that you want to examine.
datafile is the name of the file that contains the statistics you want to view.
For example, to view the sar -v report for the tenth day of the most recent
month, enter:
sar -v -f /usr/adm/sa/sa10
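Assuming the standard -s (start time) and -e (end time) options described on the sar(ADM) manual page, you could also restrict that report to working hours by entering, for example:
sar -v -s 09:00 -e 17:00 -f /usr/adm/sa/sa10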
You can also run sar to view system activity in "real time" rather than examining previously collected data. To do this, specify the sampling interval in
seconds followed by the number of repetitions required. For example, to take
20 samples at an interval of 15 seconds, enter:
sar -v 15 20


As shipped, the system allows any user to run sar in real time. However, the
files in the /usr/adm/sa directory are readable only by root. You must change
the permissions on the files in that directory if you want other users to be able
to access sar data.
With certain options, if there is no information to display in any of the
relevant fields after a specified time interval then a time stamp will be the
only output to the screen. In all other cases zeros are displayed under each
relevant column.
When tuning your system, we recommend that you use a benchmark and
have the system under normal load for your application.

swap - check and add swap space

Swap space is secondary disk storage that is used when the system considers
that there is insufficient main memory. On a well-configured system, it is primarily used for processing dirty pages when free memory drops below the
value of the kernel parameter GPGSLO. If memory is very short, the kernel
may swap whole processes out to swap. Candidates for swapping out are
processes that have been waiting for an event to complete or have been
stopped by a signal for more than two seconds. If a process is chosen to be
swapped out then its stack and data pages are written to the swap device.
(Initialized data and program text can always be reread from the original executable file on disk).
The system comes configured with one swap device. Adding additional swap
devices with the swap(ADM) command makes more memory available to
user processes. Swapping and excessive paging degrade system performance
but augmenting the swap space is a way to make more memory available to
executing processes without optimizing the size of the kernel and its internal
data structures and without adding physical memory.
The following command adds a second swap device, /dev/swap1, to the system. The swap area starts 0 blocks into the swap device and the swap device
is 16000 512-byte blocks in size.
swap -a /dev/swap1 0 16000
Use the swap -l command to see statistics about all the swap devices
currently configured on the system. You can also see how much swap is configured on your system at startup by checking nswap. This is listed in the configuration and diagnostic file /usr/adm/messages as a number of 512-byte blocks.


Running the swap -a command adds a second swap device only until the system is rebooted. To ensure that the second swap device is available every
time the system is rebooted, use a startup script in the /etc/rc2.d directory. For
example, you could call it S09AddSwap.
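A minimal sketch of such a script, assuming the same /dev/swap1 device and geometry used in the example above, might look like this:

# /etc/rc2.d/S09AddSwap -- illustrative only; adjust the device name,
# offset, and size (in 512-byte blocks) to match your own swap division.
swap -a /dev/swap1 0 16000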
In this release, a swap area can also be created within a filesystem to allow
swapping to a file. To do this, you must marry a block special device to a regular file. For more information, see swap(ADM) and marry(ADM).

timex - examine system activity per command

timex(ADM) times a command and reports the system activities that occurred
on behalf of the command as it executed. Run without options it reports the
amount of real (clock) time that expired while the command was executing
and the amount of CPU time (user and system) that was devoted to the process. For example:
# timex command command_options
real     6:54.30
user       53.98
sys        14.86

Running timex -s is roughly equivalent to running sar -A, but it displays system statistics only from when you issued the command until the command
finished executing. If no other programs are running, this information can
help identify which resources a specific command uses during its execution.
System consumption can be collected for each application program and used
for tuning the heavily loaded resources. Other information is available if the
process accounting software is installed; see timex(ADM) for more information.
NOTE To enable process accounting, log in as root, enter the command
/usr/lib/acct/acct_enable -y, then shut down and reboot the system. See
acct_enable(ADM) for more information.


timex belongs to a family of commands that report command resource usage.
It can be regarded as an extension to time(C) which has no options and produces output identical to timex without options. If you wish to use time then
you must invoke it by its full pathname as each of the Bourne, Korn and C
shells have their own built-in version. The output from each of the shell built-ins varies slightly but is just as limited. The C shell, however, does add in
average CPU usage of the specified command.


vmstat - virtual memory statistics

vmstat(C) is a useful tool for monitoring system performance but is not as
comprehensive as sar. vmstat gives an immediate picture of how a system is
functioning. It enables you to see if system resources are being used within
their capacity.
vmstat's default output concentrates on four types of system activity - process, paging/swapping, system and CPU activity. If a timing interval is
specified then vmstat produces indefinite output until you press (Del). Consider the following example for the command vmstat 5:
 PROCS                        PAGING                                   SYSTEM       CPU
 r   b  w   frs  dmd sw cch fil pft frp pas pif pis rsa rsi    sy   cs   us su id
 1 126  0 64000    0  0   0   0   0   0   0   0   0   0   0    59   34    0  3 97
 0 127  0 64000    8  0   0   0   0   0   0   0   0   0   0    47   22    0  2 98
 1 126  0 64000    0  0   0   0   0   0   0   0   0   0   0    45   16    0  2 98
 0 127  0 64000    0  0   0   0   0   0   0   0   0   0   0    86   23    1  5 94
 0 127  0 64000    0  0   0   0   0   0   0   0   0   0   0    24   12    0  1 99
 1 129  0 64000   10  0  15   0  55   0   0   0   0   0   0  1369   43   19 42 39
 0 130  0 64000    0  0   0   0   0   0   0   0   0   0   0   277   36    2  6 92
 0 130  0 64000    0  0   0   0   0   0   0   0   0   0   0    78   26    0  1 99
 0 130  0 64000    0  0   0   0   0   0   0   0   0   0   0   117   36    0  1 99
 0 130  0 64000    0  0   0   0   0   0   0   0   0   0   0   138   46    0  2 98
 0 130  0 64000    0  0   0   0   0   0   0   0   0   0   0   144   51    1  2 97

In this case vmstat displays data at regular intervals, each display representing an average of the activity over the preceding five-second interval.
The PROCS heading encompasses the first three fields of output:
r   number of processes on the run queue
b   number of processes blocked waiting for a resource
w   number of processes swapped out

During the sample period there were no swapped out processes, hardly any
processes on the run queue, and between 126 and 130 blocked processes. Any
process which was ready to run would not spend much time on the run
queue. This conclusion is reinforced by the value of id under the CPU heading
which shows that the system is almost 100% idle most of the time.
The PAGING heading encompasses both paging and swapping activity on the
system. The operating system does not preallocate swap space to running
processes. It only allocates swap space to processes that have been swapped
at least once; this space is only relinquished when such a process terminates.
It does, however, decrease its internal count of available swappable memory.


In the above example, the amount of free swap space (frs) remains a constant

64000 (roughly 32MB in 512-byte units). Because this is the amount of swap
originally configured for this system, no swapping or paging out to disk
occurred during the sampling period. This is confirmed by the zero value of
the w field. The fields from pas to rsi also show that no processes or regions
were swapped in or out during the time that vmstat was running.
There is a brief amount of paging activity on the sixth line of output. One or
more processes attempted to access pages that were not currently valid. To
satisfy the demand for these pages, the kernel obtained them from the page
cache (cch) in memory or from file systems on disk but not from swap (sw).
If a process invokes the fork(S) system call, this creates an additional copy, or
child process, of the original process. The new process shares the data or stack
regions of its parent. The pages in these regions are marked copy-on-write
(COW). This is to avoid wasting CPU and memory resources because the usual
purpose of a fork is for either the parent or child process to execute a new
command in place of itself. If, instead, the parent or child process tries to
write to a page marked COW, this generates a protection fault (pft) causing
the page fault handler in the kernel to make a copy of the page.
The dmd field accounts for a combination of demand zero pages (those created
and initialized with zeros for data storage) and demand fill pages (those created and filled with text).
System call (sy) and context switching activity (cs) can also be seen under the
SYSTEM heading.
The -s option to vmstat reports statistics about paging activity since the system was started or in a specified time interval:
  64000 free swap space
  12222 demand zero and demand fill pages
  25932 pages on swap
  44589 pages in cache
  28719 pages on file
  33791 protection fault
  84644 pages are freed
     23 success in swapping out a process
      0 fail in swapping in a process
     22 success in swapping in a process
     98 swapping out a region
     64 swapping in a region
 457461 cpu context switches
1870524 system calls


Lines showing large values for pages on swap, success in swapping out a
process, success in swapping in a process, swapping out a region, and
swapping in a region may indicate that excessive swapping or paging is
degrading performance.
The -f option to vmstat provides information about the number of forks (that
is, new processes created) since the system was started or in a specified time
interval. For example, to monitor how many fork system calls are being
invoked every second, use the command vmstat -f 1:
0 forks
0 forks
2 forks
1 forks
0 forks


Appendix B

Configuring kernel parameters
Kernel parameters control the allocation of various kernel resources. These
resources are constantly being used, released and recycled, and include:
buffers

Recently used data is cached in memory; buffers increase performance by reducing the need to read data from disk. Buffers
also allow efficient transfer of data by moving it in large units.

table entries Space in system tables that the kernel uses to keep track of
current tasks, resources, and events.
policies

Governing such things as security, and conformance to various
standards.

Other parameters are used to control the behavior of device drivers
or the available quantity of special resources such as the number of
multiscreens or semaphores.
Each resource limit is represented by a separate kernel parameter. The limit
imposed by a parameter can be decreased or extended, sometimes at the
expense of other resources. Deciding how to optimize the use of these
resources is one aspect of kernel performance tuning.
For a description of the tools available for examining and changing parameters, see "Configuration tools" (page 188).
For a description of the various kernel parameters that you can change using
the configure(ADM) utility or via the Hardware/Kernel Manager, see "Kernel
parameters that you can change using configure" (page 191).
For a description of the various kernel parameters that you can only change
from the command line using the idtune(ADM) utility, see "Using idtune to
reallocate kernel resources" (page 190).


See "Using configure to change kernel resources" (page 189) for a description
of how to run the configure(ADM) utility.
If you have TCP/IP installed on your system, see Appendix C, "Configuring
TCP/IP tunable parameters" (page 225).
If you are using the LAN Manager Client Filesystem (LMCFS), see

"LAN Manager Client Filesystem parameters" (page 223).

When to change system parameters
Among the cases in which you may need to reallocate system resources are:
• You install additional physical memory and thus have greater memory
resources to allocate.
• Persistent error messages are being displayed on the system console
indicating that certain resources are used up, such as inodes or table
entries.
• The system response time is consistently slow, indicating that other
resources are too constrained for the system to operate efficiently (as when
too little physical memory is installed).
• Resource usage needs to be tailored to meet the needs of a particular
application.
If one of your performance goals is to reduce the size of the kernel (usually

because the system is paging excessively or swapping), first concentrate on
tunable parameters that control large structures. The following table lists a
small subset of kernel tunable parameters and indicates the cost (or benefit) in
bytes of incrementing (or decrementing) each parameter by a single unit. For
example, if NCLIST is set to 200, this requires 200 times 72 bytes, or approximately 14KB of memory.


Parameter      Number of bytes per unit parameter

DTCACHEENTS    44
DTHASHQS       8
HTCACHEENTS    44
HTHASHQS       8
NBUF           1024
NCLIST         72 (64 for the buffer + 8 for the header)
NHBUF          8
NHINODE        8
NMPBUF         4096
MSGMAP         8
NSPTTYS        246
NSTREAM        80 (52 for the STREAMS header + 28 for the extended header)
MAX_INODE      76 per entry added to the dynamic in-core inode table
MAX_PROC       344 per entry added to the dynamic process table
MAX_FILE       12 per entry added to the dynamic open file table
MAX_REGION     76 per entry added to the dynamic region table

Dynamic table parameters such as MAX_PROC usually have their values set
to O. Each table grows in size as more entries are needed. The memory overhead of the grown kernel table can be found by multiplying the values shown
above by the number of table entries reported by getconf(C). For example,
from the Korn shell, you can find the current size of the process table by
entering:
let nproc=344*$(getconf KERNEL_PROC)
echo "Size of process table in bytes is $nproc"
Specialized applications often require the reallocation of key system resources
for optimum performance. For example, users with large databases may find
that they need more System V semaphores than are currently allocated.
Most of the tunable parameters discussed in this chapter are defined in

/etc/conf/cf.d/mtune. This file lists the default, maximum and minimum values
respectively of each of the parameters specified. To change the values of
specific tunable parameters manually, use the appropriate tool as described in
"Configuration tools" (page 188).


Configuration tools
The following tools are available for examining and/or changing tunable
parameters:
configure

A menu-driven program that allows you to examine and modify
the value of tunable kernel parameters. This program is also
accessible via the Hardware/Kernel Manager. See "Using
configure to change kernel resources" (page 189) and configure(ADM) for more information.

getconf

This utility reports configuration-dependent values for various
standards and for dynamic kernel tables; use setconf to modify
temporarily those values that relate to dynamic kernel tables. See
"Examining and changing configuration-dependent values" (page
223) and getconf(C) for more information.

idtune

Modify the values of some tunable parameters (defined in
/etc/conf/cf.dlmtune) that cannot be modified with configure. See
"Using idtune to reallocate kernel resources" (page 190) and
idtune(ADM) for more information.

iddeftune

Run this command to modify the values of certain tunable parameters if you increase the amount of physical memory (RAM)
to more than 32MB. See iddeftune(ADM) for more information.

ifconfig

Reconfigure the TCP/IP protocol stack belonging to a single network interface. See "Using ifconfig to change parameters for a
network card" (page 225) and ifconfig(ADMN) for more information.

inconfig

Reconfigure default TCP/IP settings for all network interfaces. See
"Using inconfig to change global TCP/IP parameters" (page 226)
and inconfig(ADMN) for more information.

Network Configuration Manager
Examine, configure, or modify network protocol stacks (chains).
The Network Configuration Manager is the graphical version of
netconfig(ADM). See Chapter 25, "Configuring network connections" in the SCO OpenServer Handbook for more information.
setconf


Increase dynamic kernel table sizes, or decrease maximum size of
dynamic kernel tables. The new value only remains in force until
the system is next rebooted. See "Examining and changing
configuration-dependent values" (page 223) and setconf(ADM)
for more information.


Using configure to change kernel resources
The configure(ADM) utility is a menu-driven program that presents each tunable kernel parameter and prompts for modification.
To change a kernel parameter using configure, do the following:
1. Enter the following commands as root to run configure:
cd /etc/conf/cf.d
./configure
2. The configure menu displays groups of parameter categories; their individual meanings are discussed in "Kernel parameters that you can change
using configure" (page 191).
Choose a category by entering the number preceding it. The resources in
that category are displayed, one by one, each with its current value. Enter
a new value for the resource, or to retain the current value, press (Enter).
After all the resources in the category are displayed, configure returns to
the category menu prompt. Return to the Main Menu to choose another
category or exit configure by entering "q".


NOTE The software drivers associated with a parameter must be
present in the kernel for the setting of the parameter to have any effect.

3. After you finish changing parameters, link them into a new kernel and
reboot your system as described in "Relinking the kernel" in the SCO OpenServer Handbook.
NOTE If you wish to set the values of parameters defined in
/etc/conf/cf.d/mtune from a shell script, you should use the idtune(ADM)
command as described in "Using idtune to reallocate kernel resources"
(page 190).


Using idtune to reallocate kernel resources
You cannot use configure to change some kernel parameters because they are
not generally considered to need adjusting. If you do need to alter such a
parameter, log in as root and use the idtune(ADM) command:
cd /etc/conf/cf.d
/etc/conf/bin/idtune resource value

resource is the name of the tunable parameter in uppercase as it appears in
/etc/conf/cf.d/mtune (see mtune(F)). value is the parameter's new value. After
changing the parameter values, relink the kernel, shut down and reboot the
system as described in "Relinking the kernel" in the SCO OpenServer Handbook.
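For example, to set BFREEMIN (described in "Buffer cache free list" (page 195)) to 50, a value chosen here purely for illustration, you would enter:
cd /etc/conf/cf.d
/etc/conf/bin/idtune BFREEMIN 50
and then relink and reboot as described above.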
You can use the -f option to idtune to force it to accept a value outside the
range specified by the minimum and maximum values defined in mtune. If
necessary, you can also use the -min and -max options to write new minimum
and maximum values to the mtune file.
WARNING The configure and idtune commands write new values defined
for kernel parameters to /etc/conf/cf.d/stune (see stune(F)). Do not edit mtune
itself as it can be a valuable reference.


The following sections describe the parameters that can only be tuned using
idtune:
• "Boot load extension parameters" (page 222)
• "Buffer cache free list" (page 195)
• "Hardware and device driver parameters" (page 222)
• "Memory management parameters" (page 197)
• "Message queue parameters" (page 215)
• "Semaphore parameters" (page 217)
• "Shared memory parameters" (page 218)
• "STREAMS parameters" (page 213)
• "System parameters" (page 219)
• "LAN Manager Client Filesystem parameters" (page 223)


Kernel parameters that you can change using configure
The tunable parameters that you can change using configure(ADM) are
grouped into two sets of categories depending on whether they affect system
performance or configuration:

Performance tunables
• "Buffer management" (page 192)
• "Processes and paging" (page 195)
• "TTYs" (page 197)
• "Name cache" (page 198)
• "Asynchronous I/O" (page 199)
• "Virtual disks" (page 200)

Configuration tunables
• "User and group configuration" (page 201)
• "Security" (page 203)
• "TTY and console configuration" (page 204)
• "Filesystem configuration" (page 205)
• "Table limits" (page 207)
• "STREAMS" (page 209)

• "Message queues" (page 213)
• "Event queues" (page 216)
• "Semaphores" (page 216)
• "Shared memory" (page 218)
• "Miscellaneous system parameters" (page 219)
• "Miscellaneous device drivers and hardware parameters" (page 220)


Buffer management
The following tunables may be used to tune the performance of your system's
buffers.
NBUF

The amount of memory in 1KB units allocated for use by the system buffer
cache at boot time. The system buffer cache is memory used as a temporary storage area between the disk and user address space when reading to or writing from mounted filesystems.
If NBUF is set to the default of 0, the system calculates the size of the buffer

cache automatically.
The size of the buffer cache is displayed as "kernel i/o bufs" at boot time,
and is recorded along with other configuration information in
lusrladmlmessages. The hit rate on the buffer cache increases as the number
of buffers is increased. Cache hits reduce the number of disk accesses and
thus may improve overall disk I/O performance. Study the sar -b report
for statistics about the cache hit rate on your system. See "Increasing disk
I/O throughput by increasing the buffer cache size" (page 75) for more information.
The system buffer cache typically contains between 300 and 600 buffers,
but may contain 8000 or more buffers on a large server system. The maximum possible number of buffers is 450000. On HTFS, EAFS, AFS, and S51K
filesystems, each buffer uses 1KB of memory plus a 72-byte header. Having an unnecessarily large buffer cache can degrade system performance
because too little space is available for executing processes.
If you are using the DTFS filesystem, buffers are multiples of 512 bytes in

size ranging from 512 bytes to 4KB. The number of buffers in the buffer
cache is not constant in this case and varies with demand.
For optimal performance, you should adjust the number of hash queues
(NHBUF) when you adjust the value of NBUF.
NHBUF

Specifies how many hash queues to allocate for buffer in the buffer cache.
These are used to search for a buffer (given a device number and block
number) rather than have to search through the entire list of buffers. This
value of NHBUF must be a power of 2 ranging between 32 and 524288.
Each hash queue costs 8 bytes of memory. The default value of NHBUF is
0 which sets the number of hash queues automatically:
• On single processor machines, NHBUF is set to the power of 2 that is
less than or equal to half the value of NBUF.

• On multiprocessor machines, NHBUF is set to the power of 2 that is
greater than or equal to twice the value of NBUF. This reduces the
likelihood of contention between processors wanting to access the same
hash queue. (A worked example follows.)
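As an illustration of the automatic sizing, using figures chosen only for this example: on a single processor system where NBUF works out to 600 buffers, half of NBUF is 300 and the largest power of 2 not exceeding 300 is 256, so NHBUF would be set to 256. On a multiprocessor system with the same NBUF, twice NBUF is 1200 and the smallest power of 2 that is at least 1200 is 2048, so NHBUF would be set to 2048 (assuming, as above, that the multiprocessor calculation is based on NBUF).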

NMPBUF

Number of 4KB pages of memory used for the following types of multiphysical buffers:
• 16KB scatter-gather buffers (also known as cluster buffers). These are

used to perform transfers of contiguous blocks of data on disk to and
from the buffer cache.
• 4KB transfer buffers. These are used as intermediate storage when mov-

ing data between memory and peripheral devices with controllers that
cannot access memory above 16MB.
• 1KB copy request buffers. These are used as intermediate storage when

moving data between the buffer cache and peripheral devices with controllers that cannot access memory above 16MB.
NMPBUF should be set larger than 40 for machines with more than 16MB
of memory and many users. The maximum possible size is 512.

If the value of NMPBUF is set to zero (default), the kernel determines a
suitable value automatically at startup. In this case, it sets the value of
NMPBUF in the range 40 to 64 depending on the amount of available
memory.
PLOWBUFS

Amount of buffer cache that is contained in the first 16MB of RAM. It is
expressed as a percentage, and should be as high as possible if the controllers for the peripheral devices (such as the disks) in your system cannot
perform DMA to memory above the first 16MB (24-bit addressing controllers). If possible, set PLOWBUFS to 100 to eliminate the need to copy
between buffers above 16MB and the copy buffers (see NMPBUF).
To ascertain if a SCSI host adapter can access memory above the first 16MB
(32-bit addressing), consult the initialization message for its driver in the
file /usr/adm/messages. If the string fts= is followed by one or more characters including a d, the controller is 32-bit, otherwise it is 24-bit.
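For example, one way to locate the initialization messages for your host adapter drivers is to search the message file:
grep fts= /usr/adm/messages
and then check whether the characters following fts= in the relevant line include a d.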
The default value of PLOWBUFS is 30, and can range between 1 and 100%.
You need only change this parameter if your system has more than 16MB
of RAM.
PUTBUFSZ

Specifies the size of the circular buffer, putbuf, that contains a copy of the
last PUTBUFSZ characters written to the console by the operating system.
The contents of putbuf can be viewed by using crash(ADM). The default
and minimum value is 2000; the maximum is 10000.
NHINODE

Specifies the size of the inode hash table which must be a power of 2. It
ranges from 64 to 8192 with a default value of 128.


BDFLUSHR

Specifies the rate for the bdflush daemon process to run, checking the need
to write the filesystem buffers to the disk. The range is 1 to 300 seconds.
The value of this parameter must be chosen in conjunction with the value
of NAUTOUP. For example, it is nonsensical to set NAUTOUP to 10 and
BDFLUSHR to 100; some buffers would be marked dirty 10 seconds after
they were written, but would not be written to disk for another 90 seconds.
Choose the values for these two parameters considering how long a dirty
buffer may have to wait to be written to disk and how much disk-writing
activity will occur each time bdflush becomes active. For example, if both
NAUTOUP and BDFLUSHR are set to 40, buffers are 40 to 80 seconds old
when written to disk and the system will sustain a large amount of diskwriting activity every 40 seconds. If NAUTOUP is set to 10 and BDFLUSHR
is set to 40, buffers are 10 to 50 seconds old when written to disk and the
system sustains a large amount of disk-writing activity every 40 seconds.
Setting NAUTOUP to 40 and BDFLUSHR to 10 means that buffers are 40 to
50 seconds old when written, but the system sustains a smaller amount of
disk writing activity every 10 seconds. With this setting, however, the system may devote more overhead time to searching the block lists.
WARNING If the system crashes with BDFLUSHR set to 300 (its maximum possible value) then 150 seconds worth of data, on average, will
be lost from the buffer cache. A high value of BDFLUSHR may radically
improve disk I/O performance but will do so at the risk of significant
data loss.
NAUTOUP

Specifies the buffer age in seconds for automatic filesystem updates. A
system buffer is written to disk when the bdflush daemon process runs
and the buffer has been scheduled for a write for NAUTOUP seconds or
more. This means that not all write buffers will be flushed each time
bdflush runs. This enables a process to perform multiple writes to a buffer
but fewer actual writes to a disk. This is because bdflush will sometimes
run less than NAUTOUP seconds after certain buffers were written to.
These will remain scheduled to be written until the next appropriate flush.
The ratio of writes between physical memory to kernel buffer and buffer to
disk will tend to increase (that is, fewer actual disk writes) if the ratio
between the flush rate BDFLUSHR and NAUTOUP decreases. Specifying a
smaller limit increases system reliability by writing the buffers to disk
more frequently and decreases system performance. Specifying a larger
limit increases system performance at the expense of reliability. The
default value is 10, and ranges between 0 (flush all buffers regardless of
how short a time they were scheduled to be written) and 60 seconds.


Buffer cache free list
NOTE This parameter is not tunable using configure(ADM); you must use
the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).


BFREEMIN

Sets a lower limit on the number of buffers that must remain in the free list.
This allows some (possibly useful) blocks to remain on the free list even
when a large file is accessed. If only BFREEMIN buffers remain on the
freelist, a process requiring one or more buffers may sleep until more
become available. The value of BFREEMIN is usually set to the default and
minimum value of 0; the maximum value is 100. You may see an improvement in the buffer cache read and write hit rates reported by sar -b if you
set the value of BFREEMIN to the smaller of NBUF /10 or 100. An improvement in performance is most likely on machines that are used primarily for
media copying, uucp transfers, and running other applications that are
both quasi-single-user and access many files.
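For example, on a hypothetical system where the buffer cache works out to 600 buffers, NBUF/10 is 60, which is smaller than 100, so you would try setting BFREEMIN to 60.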

Processes and paging
The tunable parameters GPGSLO and GPGSHI determine how often the paging daemon vhand runs. vhand can only run at clock ticks and it is responsible for freeing up memory when needed by processes. It uses a "least recently
used" algorithm as an approximation of process working sets, and it writes
out pages to disk that are not modified during a defined time period.
GPGSLO

Specifies the low value of free memory pages at which vhand will start
stealing pages from processes. Normally, GPGSLO is tuned to a value that
is about 1/16 of pagable memory. Increase the value to make the vhand
daemon more likely to become active; decrease the value to make it less
likely to become active.
The value of GPGSLO must be a positive whole number greater than or
equal to 0 and less than or equal to 200. Its value must also be less than
that of GPGSHI.
If GPGSLO is too large a fraction of the pages that are available, vhand
becomes active before memory starts to become really short and useful
pages may be paged out. If GPGSLO is too small, the system may run out
of memory altogether between clock ticks. If this happens, the swapper
daemon sched runs to swap whole processes out to disk.
GPGSHI

Specifies the high value of free memory pages at which vhand will stop
stealing pages from processes. Normally GPGSHI is set to a value that is
about 1/10 of pagable memory.


The value of GPGSHI must be a positive whole number greater than or
equal to 1 and less than or equal to 300. Its value must also be greater than
that of GPGSLO.
If the interval between GPGSLO and GPGSHI is too small, there will be a

tendency for vhand to be constantly active once the number of free pages
first drops below GPGSLO. If the interval is too large, a large amount of
disk activity is required to write pages to disk.
MINARMEM

Threshold value that specifies the minimum amount (in pages) of physical
memory that is available for the text and data segments of user processes.
(Available physical memory for user processes is shown by the command
od -d availrmem in crash(ADM).) The default and minimum is 25; the
maximum is 40 pages.
If there is ever insufficient physical memory available to allocate to

STREAMS or kernel memory allocated resources, an application may fail or
hang, and the system will display the following message on the console:
CONFIG: routine - n resident pages wanted

If you see this message, it is likely that your system has insufficient RAM.
MINASMEM

Threshold value that specifies the minimum size (in pages) that available
virtual memory is allowed to reach. (Available virtual memory is shown
by the command od -d availsmem in crash(ADM).) More swap space or
physical memory must be added to the system if it runs out of virtual
memory. In the case of adding swap space, this can be done dynamically
using swap-to-file. If system performance is still poor because it is swapping or paging out excessively, add more RAM to the system. The default
and minimum is 25; the maximum is 40 pages. If this limit is exceeded, the
following message is displayed on the console:
CONFIG: swapdel - Total swap area too small (MINASMEM = number exceeded)

If there is ever insufficient physical memory available to allocate to

STREAMS or kernel memory allocated resources, an application may fail or
hang, and the system will display the following message on the console:
CONFIG: routine - n swappable pages wanted

If you see this message, increasing the value of MINASMEM may help but

it is more likely that your system has insufficient memory or swap space.
MAXSLICE

Specifies in clock ticks the maximum time slice for user processes. After a
process executes for its allocated time slice, that process is suspended. The
operating system then dispatches the highest priority process from the run
queue, and allocates to it MAXSLICE clock ticks. MAXSLICE must be a
value from 25 to 100; the default is 100.


SPTMAP

Determines the size of the map entry array used for managing kernel virtual address space. The default value is 200; the minimum and maximum
values are 100 and 500.

Memory management parameters
NOTE This group of parameters is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).


MAXSC

Specifies the maximum number of pages that are swapped out in a single
operation. The default and maximum value is 8.
MAXFC

Maximum number of pages that are added to the free list in a single operation. The default and maximum value is 8.

TTYs
The following parameters control various data structure sizes and other limits
in character device drivers provided with the operating system.
NCLIST

Specifies the number of character list buffers to allocate. Each buffer contains up to 64 bytes of data. The buffers are dynamically linked to form
input and output queues for the terminal lines and other slow-speed devices. The average number of buffers needed for each terminal is in the
range of 5 to 10. Each entry (buffer space plus header) costs 72 bytes.
When full, input and output characters dealing with terminals are lost,
although echoing continues, and the following message is displayed on
the console:
CONFIG: Out of clists (NCLIST = number exceeded)

The default and minimum value of NCLIST is 120, and the maximum is
16640.
For users logged in over serial lines with speeds up to 9600 bps, the recommended setting of NCLIST is 10 times the maximum number of users that
you expect will log in simultaneously. You should also increase the
TTHOG parameter; this controls the effective maximum size of the raw
input queue for fast serial lines.

197

Configuring kernel parameters

Since each buffer is 64 bytes in size, you should increase NCLIST by
TTHOG divided by 64 and multiplied by the number of fast serial lines, as
shown in the following table:
TTHOG value   Increase NCLIST by
2048          32 * number of fast serial lines
4096          64 * number of fast serial lines
8192          128 * number of fast serial lines
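For example, on a hypothetical system with four fast serial lines and TTHOG raised to 2048, you would increase NCLIST by 32 * 4 = 128; starting from the default of 120 that gives an NCLIST of 248.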

TTHOG

Sets the effective size of the raw queue of the tty driver. The default and
minimum value is 256 bytes; the maximum is 8192 bytes. Increasing the
value of this parameter allows more unprocessed characters to be retained
in the tty buffer, which may prevent input characters from being lost if the
system is extremely busy.
If you are using sustained data transfer rates greater than 9600 bps, you
should increase TTHOG to 2048 or 4096 bytes depending on the demands
of the application. You must also increase the value of NCLIST to match
the increased value of TTHOG.

Name cache
The following parameters control the performance of the namei caches that
are used to speed the translation of filenames to inode numbers.
Parameters beginning with HT control the namei cache used with HTFS, EAFS,
and AFS file systems (all based on the ht filesystem driver).
HTCACHEENTS

Number of name components in the ht namei cache. It must have a value
of between 1 and 4096; the default is 256. The recommended value for
diverse workgroups is to make HTCACHEENTS large, roughly three times
the maximum grown size of the in-core inode table reported by sar -v.
HTHASHQS

Number of hash queues for the ht namei cache. HTHASHQS must be a
prime number between 1 and 8191; the default is 61. The recommended
value of HTHASHQS for diverse workgroups is to make it at least half the
size of HTCACHEENTS.
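For example, if sar -v on a hypothetical system reports a maximum grown in-core inode table size of 300 entries, you might set HTCACHEENTS to about 900 and HTHASHQS to a prime number of at least 450, such as 457.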
HTOFBIAS

Determines the bias towards keeping the names of open files in the ht
namei cache. It must have a value of between 1 and 256; the default is 8.
The higher that you make the value of HTOFBIAS, the longer the names
will remain in the cache. A value of 0 means that the names have no special caching priority.


Parameters beginning with DT control the namei cache used with DTFS filesystems (based on the dt filesystem driver).
DTCACHEENTS

Number of name components in the dt namei cache. It must have a value
of between 1 and 4096; the default is 256. The recommended value for
diverse workgroups is to make DTCACHEENTS large, roughly three times
the maximum grown size of the in-core inode table reported by sar -v.
DTHASHQS

Number of hash queues for the dt namei cache. DTHASHQS must be a
prime number between 1 and 8191; the default is 61. The recommended
value of DTHASHQS for diverse workgroups is to make it at least half the
size of DTCACHEENTS.
DTOFBIAS

Determines the bias towards keeping the names of open files in the dt
namei cache. It must have a value of between 1 and 256; the default is 8.
The higher that you make the value of DTOFBIAS, the longer the names
will remain in the cache. A value of 0 means that the names have no special caching priority.

Asynchronous I/O
The asynchronous I/O feature supports asynchronous I/O operations on raw
disk partitions. It must be added to the kernel using the mkdev aio command
for these parameters to have any effect (see aio(HW) for more information).
NAIOPROC

Size of the AIO process table that determines the number of processes that
may be simultaneously performing asynchronous I/O. The range of values
is between 1 and 16; the default is 5. When the AIO process table
overflows, the following message is displayed on the console:
CONFIG: aio_memlock - AIO process table overflow (NAIOPROC = number exceeded)
NAIOREQ

Size of the AIO request table that determines the maximum number of
pending asynchronous I/O requests. The range of values is between 5 and
200; the default is 120. When the AIO request table overflows, the following message is displayed on the console:
CONFIG: aio_breakup - AIO request table overflow (NAIOREQ = number exceeded)
NAIOBUF

Size of the AIO buffer table that determines the number of asynchronous I/O
buffers. This should always be set to the same value as NAIOREQ. When
the AIO buffer table overflows, the following message is displayed on the
console:
CONFIG: aio_breakup - AIO buffer table overflow (NAIOBUF = number exceeded)


NAIOHBUF

Number of internal asynchronous hash queues. The range of values is
between 1 and 50; the default is 25.
NAIOREQPP

Maximum number of asynchronous I/O requests that a single process can
have pending. The default value is 120, meaning that a single process can
potentially exhaust all asynchronous I/O resources. The range of values is
between 30 and 200.
NAIOLOCKTBL

Number of entries in the internal kernel table for asynchronous I/O lock
permissions. The range of values is between 5 and 20; the default is 10. If
there are many entries in the /usr/lib/aiomemlock file, this value may need to
be increased. When the AIO lock table overflows, the following message is
displayed on the console:
CONFIG: aio_setlockauth - AIO lock table overflow (NAIOLOCKTBL = number exceeded)

Virtual disks
The following parameters control the performance of virtual disk arrays if
these are configured on your system.
VDUNITMAX

The maximum number of virtual disks that can be configured. This
parameter defines the size of several structures used by the vd driver. On
systems where the number of virtual disks is likely to be constant, set
VDUNITMAX equal to the number of virtual disks. The default value is
100; the minimum and maximum values are 5 and 256.
VDJOBS

The maximum number of virtual disk jobs that can exist in the global job
pool. The default value is 200; the minimum and maximum values are 100
and 400.
VDUNITJOBS

The maximum number of job structures and piece pool entries for each
virtual disk in the system. A piece pool entry contains a piece structure for
each disk piece in a virtual disk array. For example, a piece pool entry for
a three-piece RAID 5 array contains three piece structures. Each job structure is 88 bytes in size. Each piece structure is 84 bytes in size. The default
value of VDUNITJOBS is 100; the minimum and maximum values are 50
and 200.
VDHASHMAX

The size of the hash table used for protecting the integrity of data during
read, modify, and write operations. Each hash table entry requires 24
bytes of memory. The value of VDHASHMAX must be a power of 2; the
minimum and maximum values are 512 and 8192. The default value is
1024.


VDASYNCPARITY

Controls whether writes to the parity device on RAID 4 and 5 devices are
performed asynchronously. The default is 1 (write asynchronously). If set
to 0, the system waits for all I/O to complete.
VDASYNCWRITES

Controls whether writes to the other half of a RAID 1 device (mirror) are
performed asynchronously. The default is 1 (write asynchronously). If set
to 0, the system waits for I/O on both halves of a mirror to complete.
VDASYNCMAX

Sets the maximum number of outstanding asynchronous writes for RAID
1, 4 and 5 configurations in asynchronous mode (that is,
VDASYNCWRITES or VDASYNCPARITY are set to 1). The default value is
20; the minimum and maximum values are 20 and 64.
VDWRITEBACK

Enables write-back caching. This increases the throughput of a virtual
disk by writing data asynchronously during the last phase of a readmodify-write job. The default value is 0 (do not use write-back caching). If
set to 1, write-back caching is enabled.
WARNING Enabling write-back caching may compromise the integrity of
the data if the system crashes. Use this feature only at your own discretion.


VDRPT

The interval in seconds between error conditions being reported. The
default value is 3600; the minimum and maximum values are 0 and 86400
seconds. If set to 0, errors are only reported when detected.

User and group configuration
The following parameters control resources that are specific to individual
users or groups.
NOFILES

Specifies the maximum number of open files for each process. Unless an
application package recommends that NOFILES be changed, the default
setting should be left unaltered.
The Bourne, C and Korn shells all use three file table entries: standard
input, standard output, and standard error (file descriptors 0, 1, and 2
respectively). This leaves the value of NOFILES minus 3 as the number of
other open files available for each process. If a process requires up to three
more than this number, then the standard files must be closed. This practice is not recommended and must be used with caution, if at all. If the
configured value of NOFILES is greater than the maximum (11000) or less
than the minimum (60), the configured value is set to the default (110), and
a message is sent to the console.

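For reference, an application can query the limit that NOFILES imposes at run time. The following minimal C sketch (assuming a POSIX-style sysconf(S) interface) prints the per-process open file limit:

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    /* _SC_OPEN_MAX reports the per-process open file limit (NOFILES) */
    long open_max = sysconf(_SC_OPEN_MAX);

    if (open_max == -1)
        printf("open file limit is indeterminate\n");
    else
        printf("this process may open up to %ld files\n", open_max);
    return 0;
}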
ULIMIT
Specifies in 512-byte blocks the size of the largest file that an ordinary user
can write. The default value is 2097151; that is, the largest file an ordinary
user can write is approximately 1GB (one gigabyte). A lower limit can be
enforced on users by changing the value of ULIMIT in the file
/etc/default/login; see login(M).
The ULIMIT parameter does not apply to reads; any user can read a file of
any size.
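A process can examine its own file size limit with the ulimit(S) system call. The following C fragment is an illustrative sketch only; it assumes the UL_GETFSIZE request defined in <ulimit.h>:

#include <stdio.h>
#include <ulimit.h>

int
main(void)
{
    /* UL_GETFSIZE returns the file size limit in 512-byte blocks */
    long blocks = ulimit(UL_GETFSIZE);

    if (blocks < 0)
        perror("ulimit");
    else
        printf("largest writable file: %ld blocks (%ld bytes)\n",
            blocks, blocks * 512L);
    return 0;
}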
MAXUP
Specifies how many concurrent user processes an ordinary user is allowed
to run. The entry is in the range of 15 to 16000, with a default value of 100
processes. This value should be at least 10% smaller than the value of
MAX_PROC (or the maximum grown size of the process table reported by
sar -v if MAX_PROC is set to 0). This value is determined by the user
identification number, not by the terminal. For example, the more people
that are logged in on the same user identification, the quicker the default
limit would be reached.
MAXUMEM
Maximum size of a process' virtual address space in 4096-byte pages. The
allowed range of values is between 2560 and 131072; the default is 131072
pages (512MB). If you decrease this value and a process will not start due
to lack of memory, its parent shell reports one of the messages: "Too big"
or "Not enough space".
NGROUPS
Maximum number of simultaneous supplemental process groups per process. The value of NGROUPS can be set to any integral value from 0 to 128; the default value is 8.
NGROUPS maps to the POSIX.1 runtime value NGROUPS_MAX for which the minimum value allowed by FIPS is 8. To retain FIPS and XPG4 compliance, you must restrict the value of NGROUPS to be greater than or equal to 8.
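A process can list its supplemental groups with the getgroups(S) system call; the length of the list is bounded by NGROUPS. The following C sketch (illustrative only) uses a fixed buffer of 128 entries, the documented upper bound for NGROUPS:

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int
main(void)
{
    gid_t groups[128];   /* 128 is the documented upper bound for NGROUPS */
    int i, n = getgroups(128, groups);

    if (n < 0) {
        perror("getgroups");
        return 1;
    }
    printf("%d supplemental group(s):", n);
    for (i = 0; i < n; i++)
        printf(" %ld", (long)groups[i]);
    printf("\n");
    return 0;
}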

CMASK
The default mask used by umask(S) for file creation. By default this is zero,
meaning that the umask is not set in the kernel. The range of values is
between 0 and 0777. See chmod(C) and umask(C) for an explanation of setting absolute mode file permissions.
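CMASK supplies only the kernel-wide default; each process can set its own mask with umask(S). The following C sketch (illustrative only) clears group and other write permission for files the process creates:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
    /* umask(S) installs a new mask and returns the previous one */
    mode_t old = umask(022);

    /* the file is created with mode 0666 & ~022 = 0644 */
    int fd = open("example.dat", O_WRONLY | O_CREAT, 0666);

    if (fd >= 0)
        close(fd);
    printf("previous umask was %03o\n", (unsigned)old);
    return 0;
}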


CHOWN_RES
Controls the system-wide chown kernel privilege (formerly known as the chown kernel authorization) on all filesystems that set the POSIX.1 constant _POSIX_CHOWN_RESTRICTED (also defined in X/Open CAE Specification, System Interfaces and Headers, Issue 4, 1992). See getconf(C) for more information.
If set, CHOWN_RES prevents all users except root from changing ownership of files on all filesystems that support _POSIX_CHOWN_RESTRICTED. The default value of CHOWN_RES is 0 (not set), which causes the restriction not to be enforced.

You can also use the chown kernel privilege to control users' privilege to
change file ownership. If chown kernel privilege is removed, some XPG4-conformant applications may fail if they use interprocess communication
(semaphores, shared memory, and message passing). You should only set
chown kernel privilege in this way if you require C2-level security.
IOV_MAX
Maximum size of the I/O vector (struct iovec) array (number of noncontiguous buffers) that can be used by the readv(S) (scatter read) and
writev(S) (gather write) system calls. The default value is 512; the minimum and maximum values are 16 and 1024.
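The following C sketch illustrates a gather write with two iovec entries; IOV_MAX bounds how many such entries a single readv or writev call may pass:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

int
main(void)
{
    /* two non-contiguous buffers written with a single system call */
    char *part1 = "hello, ";
    char *part2 = "world\n";
    struct iovec iov[2];

    iov[0].iov_base = part1;
    iov[0].iov_len  = strlen(part1);
    iov[1].iov_base = part2;
    iov[1].iov_len  = strlen(part2);

    if (writev(1, iov, 2) < 0)   /* gather write to standard output */
        perror("writev");
    return 0;
}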

Security
The security profile (High, Improved, Traditional, or Low) can be selected as
discussed in "Changing the system security profile" in the System Administration Guide. The security parameters can be set to modify the behavior of the
security features and to ensure compatibility with utilities that expect traditional UNIX system behavior. Each of these parameters can be set to 0 (off) or
1 (on).
SECLUID
Controls the enforcement of login user ID (LUID). Under SCO's implementation of C2 requirements, every process must have an LUID. This means
that processes that set UIDs or GIDs, such as the printer scheduler
(lpsched), must have an LUID set when started at system startup in
/etc/rc2.d/S80lp. This can cause problems with setuid programs. When the
security default is set to a profile other than "High", enforcement of LUID
is relaxed and setuid programs do not require an LUID to run.


SECSTOPIO

Controls whether the kernel implements the stopio(S) system call. When
SECSTOPIO is set to 1, the kernel acts on stopio(S) calls; when it is set to 0,
the kernel ignores stopio calls. The stopio system call is used under C2 to
ensure that a device is not held open by another process after it is
reallocated. This means that other processes attempting to access the
same device may be killed.
stopio(S) is used by initcond(ADM), which is called by getty(M) immediately before starting user interaction and by init(M) immediately after an
interactive session has terminated.
SECCLEARID

Controls the clearing of SUID/SGID bits when a file is written. Under C2
requirements, the set user ID (SUID or setuid) and set group ID (SGID or
setgid) bits on files must be cleared (removed) when a file is written. This
prevents someone from replacing the contents of a setuid binary. This can
cause problems with programs that do not expect this behavior. In the
#Low" security profile, SUID and SGID bits are not cleared when files are
written.
The following table summarizes the initial settings of the security parameters
for each security profile.
Parameter     Low   Traditional   Improved   High

SECLUID       off   off           off        on
SECSTOPIO     off   on            on         on
SECCLEARID    off   on            on         on
TTY and console configuration
The multiscreen parameters determine the number of console multiscreens
that can run simultaneously on the system. Each multiscreen requires about 4
to 8KB of memory depending on the number of lines (25 or 43). If you need to
save memory and are not using multiscreens heavily, set NSCRN to 4 and
SCRNMEM to 16 or 32. When you do this, you must also disable(C) multiscreens 5-12 (tty5 to tty12) or getty will generate warning messages when the
system goes to multiuser mode. NSCRN and SCRNMEM can be set to smaller
values than this if you are sure that you need fewer multiscreens.
TBLNK

Controls the console screen saver feature on VGA consoles (only). It is the
number of seconds before the screen blanks to save wear on the monitor.
TBLNK can have a value of 0 to 32767, with 0 (default) disabling screen
blanking.


NSCRN

The number of console multiscreens. A value of 0 configures this value at
boot time. The maximum value is 12.
SCRNMEM

Number of 1024-byte blocks used for console screen memory. A value of 0
(the default) configures this value at boot time based on the amount of
memory installed. The range of values is between 9 and 128. Each multiscreen uses from 4 to 8KB of memory, so when using a non-zero value for
this parameter, make SCRNMEM equal to 4 or 8 times the value of NSCRN.
NSPTTYS

Number of pseudo-ttys on the system. The default value is 16; the minimum and maximum values are 1 and 256. Each NSPTTYS requires 246
bytes of memory. This parameter should only be altered using the mkdev
ptty command, which also creates the additional device nodes. Pseudo-ttys are not related to console multiscreens; they are used for features such
as serial multiscreens mscreen(M), for shell windows, and for remote logins.
NUMXT

Number of layers a sub device can configure to support bitmapped display
devices such as the BLIT or the AT&T 5620 and 730 terminals. The range of
values is between 1 and 32; the default is 3. When this number is exceeded,
the following message is displayed on the console:
CONFIG: xtinit - Cannot allocate xt link buffers (NUMXT = number exceeded)

Note that the xt driver must have been linked into the kernel using the
mkdev layers command or the Hardware/Kernel Manager in order to use
these display devices.
NUMSXT

Number of shell layers (shl(C)) a subdevice can configure. The range of
values is between 1 and 32; the default is 6.
Note that the sxt driver must have been linked into the kernel using the
mkdev shl command or the Hardware/Kernel Manager in order to use
shell layers.

Filesystem configuration
The following parameters control the configuration of different file system
types.
MAXVDEPTH

Maximum number of undeletable (versioned) files allowed in the DTFS
and HTFS filesystems. A value of 0 disables versioning; the maximum
value is 65535. This parameter can be overridden when the file system is
mounted.


MINVTIME

Minimum time before a file is made undeletable (versioned) in the DTFS
and HTFS filesystems. If set to 0, a file is always versioned (as long as
MAXVDEPTH is greater than 0); if set to a value greater than 0, the file is
versioned after it has existed for that number of seconds. The maximum
value is 32767.
This parameter can be overridden when the filesystem is mounted.
ROOTCHKPT

If set to 0, disable checkpointing in a root HTFS filesystem; if set to 1

(default), enable checkpointing.
ROOTLOG

If set to 0, disable transaction intent logging in a root HTFS filesystem; if set

to 1 (default), enable logging.
ROOTSYNC

If set to 0 (default), disable file synchronization on close on a root DTFS file-

system; if set to 1, enable synchronization on close.
ROOTNOCOMP

If set to 1, disable compression in a root DTFS filesystem; if set to 0

(default), enable compression.
ROOTMAXVDEPTH

Maximum number of undeletable (versioned) files on a root DTFS or HTFS
file system. A value of 0 disables versioning.
ROOTMINVTIME

Minimum time before a file is made undeletable (versioned) on a root DTFS
or HTFS filesystem. If set to 0, a file is always versioned (as long as ROOTMAXVDEPTH is greater than 0); if set to a value greater than 0, the
file is versioned after it has existed for that number of seconds.
DOSNMOUNT

Maximum number of mounted DOS filesystems. The range of values is
between 0 and 25; the default is 4.
DOSNINODE

Maximum number of open inodes for DOS filesystems. The range of
values is between 0 and 300; the default is 40.


Table limits
The following parameters control the allocation of memory to dynamic kernel
tables.
TBLPAGES

The maximum number of pages of memory for dynamic tables. The range
of values is between 10 and 10000; the default is 0 which means that the
kernel configures the value based on the amount of memory available at
system startup.
TBLDMAPAGES

The maximum number of pages of "dma-able" memory for dynamic tables.
The range of values is between 10 and 1000 pages; the default is 100.

TBLLIMIT

The percentage of TBLPAGES or TBLDMAPAGES to which a single table
may grow. The range of values is between 10 and 100%; the default is 70.
TBLSYSLIMIT

The percentage of memory allowed for dynamic tables if TBLPAGES is set
to 0. The range of values is between 10 and 90%; the default is 25.
TBLMAP

The size of the dynamic table virtual space allocation map. The range of
values is between 50 (default) and 500.
The following parameters control the maximum grown sizes of dynamic kernel tables. If set to 0, the maximum possible size defaults to the value shown
by getconf(C) provided that sufficient TBLPAGES of memory have been allocated. For example, the command getconf KERNEL_MOUNT_MAX displays the maximum possible size of the mount table.
MAX_DISK

The maximum number of disk drives attached to the system. When the
Diskinfo table overflows, the following message is displayed on the console:
CONFIG: dk_name - Diskinfo table overflow (MAX_DISK = number exceeded)

The minimum and maximum configurable values of MAX_DISK are 1 and
1024; the default value of 0 means that the kernel determines the number
of disk drives dynamically.
MAX_INODE

Specifies the maximum number of inode table entries that can be allocated.
Each table entry represents an in-core inode that is an active file such as a
current directory, an open file, or a mount point. Pipes, clone drivers,
sockets, semaphores and shared data also use inodes, although they are
not associated with a disk file. The number of entries used depends on the
number of opened files.


The minimum and maximum configurable values of MAX_INODE are 100
and 64000; the default value of 0 means that the in-core inode table grows
dynamically.
Each open file requires an inode entry in the in-core inode table. If the
inode table is too small, a message similar to the following is displayed on
the console:
CONFIG: routine - Inode table overflow (MAX_INODE = number exceeded)
When the inode table overflows, the specific request is refused. Although
not fatal to the system, inode table overflow may damage the operation of
various spoolers, daemons, the mailer, and other important utilities.
Abnormal results and missing data files are a common result.
If the system consistently displays this error message, use sar -v to evaluate whether your system needs tuning. The inod-sz value shows the number of inode table entries being used and the number of entries that have been allocated for use by the table.
MAX_PROC

Specifies the maximum number of process table entries that can be allocated. Each table entry represents an active process. The number of
entries depends on the number of terminal lines available and the number
of processes spawned by each user. If the process table is full, the following message appears on the console and in the file /usr/adm/messages:
CONFIG: newproc - Process table overflow (MAX_PROC = number exceeded)
The minimum and maximum values of MAX_PROC that can be set are 50
and 16000; the default value is 0 which means that the process table grows
dynamically. The proc-sz values shown by sar -v show how many process table entries are being used compared to those that have been dynamically allocated.
MAX_FILE

Specifies the maximum number of open file table entries that can be allocated. Each entry represents an open file.
The minimum and maximum values of MAX_FILE that can be set are 100
and 64000; the default value is 0 which means that the file table grows
dynamically.
When the file table overflows, the following warning message is displayed
on the system console:
CONFIG: falloc - File table overflow (MAX_FILE = number exceeded)
This parameter does not control the number of open files per process; see
the description of the NOFILES parameter.


MAX_REGION

Specifies the maximum number of region table entries that can be allocated. Most processes have three regions: text, data, and stack. Additional
regions are needed for each shared memory segment and shared library
(text and data) attached. However, the region table entry for the text of a
"shared text" program is shared by all processes executing that program.
Each shared-memory segment attached to one or more processes uses
another region table entry.
The minimum and maximum values of MAX_REGION that can be set are
500 and 160000; the default value is 0 which means that the region table
grows dynamically.
If you do configure MAX_REGION, as a general rule you should set its value to slightly more than three times the value of MAX_PROC. When
the region table overflows, the following message is displayed on the console:
CONFIG: allocreg - Region table overflow (MAX_REGION = number exceeded)

MAX_MOUNT

Specifies the maximum number of mount table entries that can be allocated. Each entry represents a mounted filesystem. The root filesystem (/) is
always the first entry. When full, the mount(S) system call returns the
EBUSY error code.
The minimum and maximum values of MAX_MOUNT that can be configured are 4 and 4096; the default value of 0 means that the kernel grows the
size of the mount table dynamically.
MAX_FLCKREC

Specifies the maximum number of lock table entries that can be allocated.
This determines the number of file regions that can be locked by the system. The "lock-sz" value reported by sar -v shows the number of entries
that are being used in comparison to the number that have been allocated.
The minimum and maximum values of MAX_FLCKREC that can be configured are 50 and 16000; the default value is 0 which means that the kernel
grows the size of the record lock table dynamically according to the needs
of the applications running on your system.
STREAMS
STREAMS is a facility for UNIX system communication services. It supports
the implementation of services ranging from complete networking protocol
suites (such as TCP/IP and IPX/SPX) to individual device drivers. STREAMS
defines standard interfaces for character I/O. The associated mechanism is
simple and open-ended, consisting of a set of system calls, kernel resources
and kernel routines.


STREAMS uses system resources that are limited by values defined in kernel
configuration modules. Depending on the demand that you and other system
users place on these resources, your system could run out of STREAMS
resources if you do not first reset the allocations in the kernel configuration
modules.
Running out of some STREAMS resources (such as those controlled by the
NSTREAM parameter) generates kernel configuration error messages.
STREAMS message buffers are dynamically allocated from memory up to a
limit set by the value of the kernel parameter NSTRPAGES. This parameter
sets the maximum number of pages of physical memory that can be dynamically allocated for use by STREAMS.
Before changing the STREAMS parameters NSTREAM or NSTRPAGES, you
should check the current usage of STREAMS resources using the strstat command of the crash(ADM) utility or netstat(TC) with the -m option.
The following tunable parameters are associated with STREAMS processing:
NSTREAM
Number of stream head (stdata and estdata) data structures configured.
One of each structure is needed for each stream opened, including both
streams currently open from user processes and streams linked under
multiplexers. The allowed range of values is between 1 and 512; the
default is 32. The recommended configuration value is highly
application-dependent, but a value of 256 usually suffices on a computer running a single transport provider with moderate traffic. On Open
Desktop, each X client also uses a pair of stdata and a pair of estdata
structures. You should set NSTREAM to at least 256 on systems that are
running X clients. When the number of stream head structures is
exceeded, the following message is displayed on the console:
CONFIG: stropen - Out of streams (NSTREAM = n exceeded)

NSTRPAGES
The maximum number of pages of virtual memory that can be allocated
dynamically for use by STREAMS message buffers. The allowed range of
values is between 0 and 8000 pages; the default is 500.
If NSTRPAGES pages of virtual memory are not available when STREAMS

are initialized at startup, the system displays the following message on the
console for each STREAMS table that is affected:
CONFIG: strinit - Cannot alloc STREAMS name table (NSTRPAGES = n too big)


If more buffers are requested than there are available pages of physical

memory to create them, the system displays the following message on the
console:
CONFIG: allocb - Out of streams memory (NSTRPAGES = n exceeded)

Extra memory is allocated temporarily for high priority buffers only. The
system will then try to reduce STREAMS memory usage until it is less than
NSTRPAGES.

NOTE Memory used by STREAMS for buffers is fully dynamic; memory
can be freed as well as allocated.

The value of NSTRPAGES does not affect the size of the kernel at system
startup although the size of the kernel will grow and shrink over time as
pages of memory are allocated for use by STREAMS and subsequently
released.
STRSPLITFRAC

Sets the percentage of NSTRPAGES above which the system tries to create
buffers by splitting larger buffers that are on the free list. Below this limit,
the system tries to allocate new pages of memory to create the buffers.
STRSPLITFRAC can range between 50 and 100 (percent); the
default is 80. If you set STRSPLITFRAC lower than this, the system will use
less memory for STREAMS but the memory that is used will tend to
become fragmented and the kernel will require more CPU time to manage
it.
NSTREVENT

Initial number of stream event structures configured. Stream event cells
are used for recording process-specific information in the poll system call.
They are also used in the implementation of the STREAMS I_SETSIG ioctl
and in the kernel bufcall mechanism. A rough minimum value to configure would be the expected number of processes to be simultaneously
using poll times the expected number of STREAMS being polled for each
process, plus the number of processes expected to be using STREAMS concurrently. The default and minimum value is 256; the maximum is 512. Note that this number is not necessarily a hard upper limit
on the number of event cells that are available on the system (see MAXSEPGCNT).
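As a point of reference, the following C sketch shows the kind of poll(S) usage that consumes stream event cells; it assumes the standard <poll.h> interface and waits up to five seconds for input on the standard input stream:

#include <stdio.h>
#include <poll.h>

int
main(void)
{
    struct pollfd pfd;

    pfd.fd = 0;               /* standard input */
    pfd.events = POLLIN;      /* wait for data to read */

    /* each stream being polled accounts for one event cell */
    switch (poll(&pfd, 1, 5000)) {
    case -1:
        perror("poll");
        break;
    case 0:
        printf("timed out\n");
        break;
    default:
        if (pfd.revents & POLLIN)
            printf("input is ready\n");
        break;
    }
    return 0;
}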


MAXSEPGCNT

The maximum (4KB) page count for stream events. If this value is 0 (minimum), only the amount defined by NSTREVENT is available for use. If the
value is not 0 and if the kernel runs out of event cells, it will under some
circumstances attempt to allocate an extra page of memory from which
new event cells can be created. MAXSEPGCNT places a limit on the number of pages that can be allocated for this purpose. Once a page is allocated for event cells, however, it cannot be recovered later for use elsewhere. The default value is 1 and the maximum 32.
STRMSGSZ

Maximum allowable size of the data portion of any STREAMS message.
This should usually be set just large enough to accommodate the maximum packet size restrictions of the configured STREAMS modules. If it is
larger than necessary, a single write or putmsg can consume an inordinate
number of message headers. The range of values is between 4096 and
524288; the default value of 16384 is sufficient for existing applications.
NUMSP

Determines the number of STREAMS pipe devices (/dev/spx, see spx(HW))
supported by the system. The default value is 64; the maximum and minimum values are 1 and 256. Administrators do not normally need to
modify this parameter unless certain applications state that they require it.
NUMTIM

Maximum number of timod(M) STREAMS modules that can be pushed by
the Transport Layer Interface (TLI) onto a stream head. This parameter
limits the number of streams that can be opened. The default value is 16
but various protocol stacks (for example, TCP, LMU, or NETBIOS) may
require its value to be set to 32, 64, or 128. Administrators do not normally
need to modify this parameter.
NUMTRW

Maximum number of timod(M) STREAMS modules that can be pushed by
the Transport Layer Interface (TLI) onto a stream head in order that the
stream will accept read(S) and write(S) system calls. This parameter
effectively limits the number of streams onto which the module can be
pushed. The default value is 16 but various protocol stacks (for example,
TCP, LMU, or NETBIOS) may require its value to be set to 32, 64, or 128.
Administrators do not normally need to modify this parameter.
See "STREAMS parameters" (page 213) for a description of the STREAMS parameters that can only be tuned using idtune(ADM).


STREAMS parameters
NOTE This group of parameters is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).
NMUXLINK

Number of stream multiplexer links configured. One link structure is
required for each active multiplexer link (STREAMS I_LINK ioctl) in networking protocol stacks such as those used to implement TCP/IP and NFS.
Each PPP link also requires such a structure. The number needed is
application-dependent; the default value is 192. The minimum and maximum configurable values are 1 and 4096.
NSTRPUSH

Maximum number of modules that may be pushed onto a stream. This
prevents an errant user process from consuming all of the available queues
on a single stream. The default possible value is 9. In practice, applications usually push at most four modules onto a stream.
NLOG

Number of minor devices to be configured for the log driver; the active
minor devices are 0 through (NLOG-1). The default value of 3 services an
error logger (strerr) and a trace command (strace), with one left over for
miscellaneous usage.
STRCTLSZ

Maximum allowable size of the control portion of any STREAMS message.
The control portion of a putmsg message is not subject to the constraints
of the minimum/maximum packet size, so the value entered here is the
only way of providing a limit for the control part of a message. The default value of 1024 is more than sufficient for existing applications.

Message queues
The following tunable parameters are associated with interprocess communication message queues:
MSGMAP

Specifies the number of entries in the memory map for messages. An
entry in the message map table says that MSGSEG / MSGMAP memory
segments are free at a particular address.


MSGMAP measures how fragmented you expect your map to get. Its
value can be small if you always send a few large messages, or it can be
large if you send a lot of small messages. The suggested value for
MSGMAP is approximately half the value of MSGSEG; this allocates two
message segments per map entry. If the value of MSGMAP is set equal to
MSGSEG, long messages may become totally fragmented with their component segments being randomly scattered across the map.

Do not set MSGMAP to a value greater than that of MSGSEG. The range of
configurable values is from 4 to 32768; the default value is 512. Each entry
costs 8 bytes.
MSGMAX

Maximum size of a message in bytes. The minimum value is 128, the
default value is 8192 bytes, and the maximum possible size the kernel can
process is 32767 bytes.
MSGMNB

Maximum number of bytes of memory that all the messages in any one message queue can occupy. The default value is 8192 bytes; the minimum and maximum values are 128 bytes and 65532 bytes.
MSGSEG

Number of MSGSSZ segments of memory allocated at kernel startup for
holding messages. Therefore a total of MSGSEG*MSGSSZ bytes of memory are allocated for messages.


NOTE The amount of memory allocated for messages must not exceed

128KB.

If MSGSEG is set at 0, then the kernel will auto-configure the values of
MSGSEG, MSGMAX, and MSGMNB. For most memory configurations,
MSGSEG is set to 1024, and MSGMAX and MSGMNB are both set to
MSGSEG*MSGSSZ.

The IPC_NOWAIT flag can be passed into many of the msg system calls. If
this flag is passed, then the system calls will fail immediately if there is no
space for a message. If this flag is not passed, then the system calls will
sleep until there is room for the message.


To determine adequate values for each of the parameters, compute the
maximum size and number of messages desired, and allocate that amount
of space. For example, if the system will have at most 40 messages of 1KB
each pending, then MSGTQL should be set to 40, and MSGSEG is computed as:
• 40 messages of 1KB each = 40KB total message space.
• Divide total message space by MSGSSZ to get MSGSEG. If MSGSSZ=8
bytes, then MSGSEG = 40*1024/8 = 5120.
The default value of MSGSEG is 1024; the minimum and maximum values
are 32 and 32768.
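The IPC_NOWAIT behavior described above can be seen in a short C sketch (illustrative only; it assumes the System V <sys/msg.h> interface): msgsnd(S) returns immediately with EAGAIN, rather than sleeping, if the queue is full or no message segments are free.

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct mymsg {
    long mtype;
    char mtext[64];           /* must fit within MSGMAX bytes */
};

int
main(void)
{
    struct mymsg m;
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

    if (qid < 0) {
        perror("msgget");
        return 1;
    }
    m.mtype = 1;
    strcpy(m.mtext, "hello");

    /* without IPC_NOWAIT this call would sleep until space is available */
    if (msgsnd(qid, (void *)&m, strlen(m.mtext) + 1, IPC_NOWAIT) < 0 &&
        errno == EAGAIN)
        printf("no room for the message at the moment\n");

    msgctl(qid, IPC_RMID, NULL);   /* remove the queue again */
    return 0;
}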
See "Message queue parameters" (this page) for a description of the message
queue parameters that can only be tuned using idtune(ADM).

Message queue parameters
The following parameters are associated with System V IPC message queues.

NOTE This group of parameters is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).


MSGMNI
Maximum number of different message queues allowed system-wide. The
default value of MSGMNI is 50; the minimum and maximum values are 1
and 1024. You should not normally need to adjust the value of this parameter.
MSGTQL
Number of system message headers that can be stored by the kernel; that
is, the maximum number of unread messages at any given time. Each
header costs 12 bytes. The default value of MSGTQL is 1024; the minimum
and maximum values are 32 and 16383. You should not normally need to
adjust the value of this parameter unless an application needs a large
number of messages.
MSGSSZ
Size in bytes of the memory segment used for storing a message in a message queue.
A message that is shorter than a whole number multiple of memory segments will waste some bytes. For example, an 18-byte message requires
three message segments if MSGSSZ is set to 8 bytes. In this case, 6 bytes of
memory are unused, and unusable by other messages.

The product of the values of MSGSSZ and MSGSEG determines the total
amount of data that can be present in all message queues on a system. This
product should not be greater than 128KB.

The default value of MSGSSZ is 8 bytes; the minimum and maximum
values are 4 bytes and 4096 bytes. The configured value of MSGSSZ must
be divisible by 4. You should not normally need to adjust the value of this
parameter.

Event queues
The following parameters control the configuration of the event queues.
EVQUEUES

Maximum number of open event queues systemwide. Each EVQUEUES
costs 88 + (2 * EVDEVSPERQ) bytes of memory. The range of values is
between 1 and 256; the default is 8.
EVDEVS

Maximum number of devices attached to event queues systemwide. Each
EVDEVS costs 48 bytes of memory. The range of values is between 1 and
256; the default is 16. When the event table overflows, the following message is displayed on the console:
CONFIG: event - Event table full (EVDEVS = number exceeded)

EVDEVSPERQ

Maximum number of devices for each event queue. The range of values is
between 1 and 16; the default is 3. When the event channel overflows, the
following message is displayed on the console:
CONFIG: event - Event channel full (EVDEVSPERQ = number exceeded)

Semaphores
The following tunable parameters are associated with interprocess communication semaphores:
SEMMAP

Size of the control map used to manage semaphore sets. The default and
minimum value is 10; the maximum is 100. Each entry costs 8 bytes.
SEMMNI

Number of semaphore identifiers in the kernel. This is the number of
unique semaphore sets that can be active at any given time. The default
and minimum value is 10; the maximum is 300. Each entry costs 32 bytes.
SEMMNU

Number of semaphore undo structures in the system. The size is equal to
8*(SEMUME + 2) bytes. See "Semaphore parameters" (page 217) for a
definition of SEMUME. The range of values is between 10 and 100; the
default is 30.


XSEMMAX
Size of the XENIX® semaphore table that determines the maximum number
of XENIX semaphores allowed systemwide. The minimum value for
XSEMMAX is 20, the maximum value is 90, and the default value is 60.
When the XENIX semaphore table overflows, the following message is displayed on the console:
CONFIG: xsem_alloc - XENIX semaphore table overflow (XSEMMAX = number exceeded)

See "Semaphore parameters" (this page) for a description of the semaphore
parameters that can only be tuned using idtune(ADM).

Semaphore parameters
NOTE This group of parameters is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).


SEM_NSEMS_MAX
Maximum number of POSIX.1b semaphores available for use on the system
(provided by the SUDS library). The default value is 100; the minimum
and maximum configurable values are 1 and 255 respectively.
The following parameters are associated with System V IPC semaphores only:
SEMMSL
Maximum number of semaphores for each semaphore identifier. The
default and minimum value is 25; the maximum value is 60.
SEMOPM
Maximum number of semaphore operations that can be executed for each
semop(S) call. The default value is 10; each entry costs 8 bytes.
SEMUME
Number of undo entries for each process. The default value is 10.
SEMVMX
Maximum value a semaphore can have. The default value is 32767.
SEMAEM
Maximum value for adjustment on exit, alias semadj. This value is used
when a semaphore value becomes greater than or equal to the absolute
value of semop, unless the program has set its own value. The default
value is 16384.
SEMMNS
Number of semaphores in the system. The default and minimum value is
60; the maximum value is 300. Each entry costs 8 bytes.
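To illustrate how these limits interact, the following C sketch creates a set of two semaphores (bounded by SEMMSL and SEMMNS) and performs two operations in a single semop(S) call (bounded by SEMOPM). It is an illustrative sketch only, not a tuning tool:

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int
main(void)
{
    struct sembuf ops[2];
    int sid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);  /* 2 <= SEMMSL */

    if (sid < 0) {
        perror("semget");
        return 1;
    }

    /* two operations in one call; SEMOPM limits how many are allowed */
    ops[0].sem_num = 0; ops[0].sem_op = 1; ops[0].sem_flg = 0;
    ops[1].sem_num = 1; ops[1].sem_op = 1; ops[1].sem_flg = 0;
    if (semop(sid, ops, 2) < 0)
        perror("semop");

    semctl(sid, 0, IPC_RMID, 0);   /* remove the semaphore set again */
    return 0;
}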


Shared memory
The following tunable parameters are associated with interprocess communication shared memory:
SHMMAX
Maximum shared-memory segment size. The range of values is between
131072 and 80530637 bytes; the default value is 524288 bytes.
SHMMIN
Minimum shared-memory segment size. The default value is 1 byte.
XSDSEGS
Maximum number of XENIX special shared-data segments allowed system
wide. The range of values is between 1 and 150; the default is 25. When
the XENIX shared data table overflows, the following message is displayed
on the console:
CONFIG: xsd_alloc - XENIX shared data table overflow (XSDSEGS = number exceeded)

XSDSLOTS
Number of slots for each XENIX shared data segment. The maximum
number of XENIX special shared data segment attachments system wide is
XSDSEGS*XSDSLOTS. The range of values is between 1 and 10; the
default is 3.
See "Shared memory parameters" (this page) for a description of the shared
memory parameters that can only be tuned using idtune(ADM).

Shared memory parameters
NOTE The following parameter is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).


SHMMNI
Maximum number of shared-memory identifiers systemwide. The minimum and default value is 100; the maximum is 2000. Each entry costs 52
bytes.
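The following C sketch (illustrative only) creates, attaches, and removes a small shared-memory segment; the requested size must lie between SHMMIN and SHMMAX, and each segment consumes one of the SHMMNI identifiers:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int
main(void)
{
    char *addr;
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

    if (id < 0) {
        perror("shmget");
        return 1;
    }
    addr = (char *)shmat(id, NULL, 0);    /* attach at a system-chosen address */
    if (addr != (char *)-1) {
        strcpy(addr, "shared data");      /* the segment is ordinary memory */
        shmdt(addr);                      /* detach again */
    }
    shmctl(id, IPC_RMID, NULL);           /* remove the segment */
    return 0;
}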


Miscellaneous system parameters
The following parameters control the size of the configuration string buffer,
and the size of the kernel profiler symbol table.
MAX_CFGSIZE

Maximum size of configuration information saved by the tab(HW) driver.
This is the maximum size of information available using /dev/string/cfg as
described on the string(M) manual page. If this limit is exceeded, the following message is displayed on the console:
CONFIG: string: Configuration buffer full (MAX_CFGSIZE = number exceeded)

MAX_CFGSIZE ranges from 256 to 32768 bytes; the default is 1024 bytes.
PRFMAX

Sets the maximum number of text symbols that the kernel profiler, /dev/prf,
can properly process. The range of values is between 2048 and 8192; the
default is 4500. See profiler(ADM) for information about the kernel profiler.

System parameters
NOTE This group of parameters is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).


NODE

System name. The value of NODE must not be greater than eight characters. The default value is "scosysv".
TIMEZONE

Specifies the timezone in units of minutes different from Greenwich Mean
Time (GMT). Note that the value specifies the system default timezone and
not the value of the TZ environment variable. TIMEZONE can have a
value from -1440 (east of GMT) to 1440 (west of GMT); the default is 480.
DSTFLAG

Specifies the dstflag described for the ctime(S) system call. A value of 1
indicates Daylight Savings Time applies locally, zero is used otherwise.
KDBSYMSIZE

Size of the kernel debugger symbol table in bytes. (This parameter is only
useful if a debugger is linked into the kernel.) It must have a value of
between 50000 and 500000; the default is 300000.
NCPYRIGHT

Defines the maximum number of strings used to store some vendor driver
copyright messages that may be displayed on the console when the system
is booted. Modifying this parameter is unlikely to affect the display of
most copyright messages.


Miscellaneous device drivers and hardware parameters
The following parameters control the configuration of various device drivers
and hardware behavior.
CTBUFSIZE

Size of the tape buffer in kilobytes. This static buffer is allocated by the
QIC-02 cartridge tape device driver (ct) when it is initialized at system
startup. This parameter should have a value of between 32 and 256. Set
this parameter to 0 if the ct driver is linked into the kernel but you either
do not have or do not use a cartridge tape drive. The following are values
that this parameter can take in various circumstances:
32KB    bare minimum: this is insufficient to stream
64KB    minimum to allow streaming (good for systems with little memory) or little tape use (if tape I/O performance is not critical)
96KB    reduce to this at first if the default uses too much memory
128KB   default: this offers a good tradeoff between I/O performance and memory use
192KB   increase to this at first if the default provides poor I/O performance
256KB   maximum size

NOTE The SCSI tape device driver (Stp) allocates a statically configured
128KB buffer for each device which is not controlled by this parameter.
All SCSI tape drives including SCSI cartridge tape drives use the Stp
driver.
SDSKOUT

Maximum number of simultaneous requests that can be queued for each
SCSI disk. The SCSI disk driver (Sdsk) will sleep if no request blocks are
available. The default value of this parameter is 4; the minimum and maximum values are 1 and 256. You should set SDSKOUT higher if the -S
option to sar(ADM) (or mpsar(ADM) for SMP) reports that the system is
running out of request blocks.


DMAEXCL
Specifies whether simultaneous DMA requests are allowed. Some computers have DMA chips that malfunction when more than one allocated channel is used simultaneously. DMAEXCL is set to 0 by default to allow
simultaneous DMA on multiple channels. Set its value to 1 if this causes a
problem.
KBTYPE
Determines the logical character protocol used between the keyboard and
the keyboard driver. This tunable is set by default to 0 (XT scancodes), which is the recommended setting; a value of 1 specifies AT scancodes, which are recognized by the console driver but not by the X server or by DOS emulators.
All AT-compatible keyboards support both modes.
VGA_PLASMA
Set to 1 if an IBM® PS/2® model P70 or P75 VGA plasma display is present;
set to 0 (default) if not.
NSHINTR
Maximum number of devices sharing the same interrupt vector. This has
a default value of 8; the minimum and maximum values are 2 and 20. You
should not normally need to modify this parameter.
D0387CR3
Controls the setting of high-order bits of Control Register 3 (CR3) when an
80387™ math coprocessor is installed. Because of design defects in early
versions of the Intel® 80387™ chip (B1 stepping), this math coprocessor
may not operate correctly in some computers. The problem causes a CPU
to hang when DMA, paging, or coprocessor accesses occur. You can work
around this problem by changing the D0387CR3 parameter from the
default value of 0 (switched off) to 1.


WARNING Do not set this parameter to 1 on 80486™ or Pentium™ machines.

DOWPCR0
If set, the kernel uses the write protection bit in Control Register 0 (CR0) to enable write protection in kernel mode. The default value is 1, which sets
this parameter. This parameter is effectively disabled on machines which
contain one or more 80386™ CPUs which do not support this feature.
MODE_SELECT
No effect. Mode-select checking on parallel (printer) ports can be adjusted
on a per-printer basis using the pa_tune [] array defined in 
and documented in the file letc!conflpack.dlpalspace.c.


Hardware and device driver parameters
NOTE This group of parameters is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).
NAHACCB

Number of mailboxes available for the Adaptec 154X/164X host adapter
driver to talk to other Adaptec hardware. The higher the number, the less
likely it is that the driver has to sleep. It is not normally necessary to
modify this parameter.
NEMAP

Specifies the maximum number of mapchan(M) I/O translation mappings
that can be in effect at the same time. The default value of this parameter
is 10.
NKDVTTY

Number of virtual terminals (8) supported by the console keyboard driver.
Administrators should not modify this parameter.

Boot load extension parameters
NOTE This group of parameters is not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).
EXTRA_NDEV

Number of extra device slots in fmodsw[], io_init[], and io...[]. It
defines the number of slots reserved in the device driver tables for Boot
Time Loadable Drivers (BTLDs).
EXTRA_NEVENT

Number of extra event slots. It defines the number of slots reserved in the
event driver tables for BTLDs.
EXTRA_NFILSYS

Number of extra types of filesystem. It defines the number of extra types
of filesystem that can be loaded using BTLDs.
MAX_BDEV

Maximum number of block devices (bdevcnt is at least this value). It
defines the minimum number of entries in bdevsw [], the block device
switch table.
MAX_CDEV

Maximum number of character devices (cdevcnt is at least this value). It
defines the minimum number of entries in cdevsw [ ] , the character device
switch table.


LAN Manager Client Filesystem parameters
NOTE The LAN Manager Client Filesystem (LMCFS) adds several kernel parameters to the mtune file that are not tunable using configure(ADM); you
must use the idtune(ADM) command instead as described in "Using idtune
to reallocate kernel resources" (page 190).
LMCFS_BUF_SZ
Determines the maximum amount of data that LMCFS can transmit or
receive in a single network packet. The default value is 4096 bytes.
LMCFS_LMINUM
Controls the number of allocatable inodes. The default value is 150; the
maximum value is 600. Set this value higher if users have many LMCFS
files open simultaneously.
LMCFS_NUM_BUF
Sets the number of server message block (SMB) data buffers used by LMCFS. The default value is 256; the maximum value is 8192. The size of
each buffer is determined by LMCFS_BUF_SZ.
LMCFS_NUM_REQ
Constrains the number of simultaneous SMB requests that can be made on
the network. The default value is 64; the maximum value is 1024. This
parameter should be set to at least one quarter of the value of
LMCFS_NUM_BUF.

Examining and changing configuration-dependent values
getconf allows you to inspect the values of configuration-dependent variables
for various standards, and the values of dynamic kernel table parameters.
Below is an example of the use of getconf:
$ getconf NZERO
20
$ getconf CLK_TCK
100

This indicates that the default process priority on the system is 20 and the system clock runs at 100 ticks per second.


Path variables, such as NAME_MAX which defines the maximum filename
length, depend on the filesystem type and therefore the pathname. These
examples show the values of NAME_MAX for an HTFS and a XENIX filesystem:
# getconf NAME_MAX /htfs_filesystem
255
# getconf NAME_MAX /xenix_filesystem
14

For a complete list of the variable names to use with the command see
getconf(C).
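Programs can obtain the same per-filesystem values with the pathconf(S) call. The following C sketch (assuming the standard _PC_NAME_MAX symbol from <unistd.h>) reports the maximum filename length for a directory given on the command line:

#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    char *dir = (argc > 1) ? argv[1] : "/";
    long name_max = pathconf(dir, _PC_NAME_MAX);  /* analogous to getconf NAME_MAX */

    if (name_max == -1)
        printf("NAME_MAX is indeterminate for %s\n", dir);
    else
        printf("NAME_MAX for %s is %ld\n", dir, name_max);
    return 0;
}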
If you are logged in as root, you can use the setconf(ADM) command to

change a subset of the configuration dependent parameters. Using setconf,
you can increase the current size of the dynamic kernel tables or decrease
their maximum possible size. You can also dynamically increase the number
of character buffers available for use by the serial driver, for example:
setconf KERNEL_CLISTS 1024
The maximum possible number of such buffers that you can allocate is controlled by the KERNEL_CLISTS_MAX parameter.
NOTE Any change that you make using setconf remains in force only until
the system is next rebooted. Use the Hardware/Kernel Manager or configure to make the change permanent.



Appendix C

Configuring TCP/IP tunable parameters
You can adjust the configuration parameters for TCP/IP using the
ifconfig(ADMN) and inconfig(ADMN) utilities as described in the following
sections:
• "Using ifconfig to change parameters for a network card" (this page)
• "Using inconfig to change global TCP/IP parameters" (page 226)
If you need to change STREAMS resources, you must use the configure(ADM)
command as described in "Using configure to change kernel resources" (page
189).

Using ifconfig to change parameters for a network card
You can use the ifconfig(ADMN) command to reconfigure performance parameters for a single network interface. If you wish to make this change permanent you must edit the entry for the interface in the /etc/tcp script.
The metric, onepacket, and perf parameters affect performance.
metric can be used to artificially raise the routing metric of the interface used
by the routing daemon, routed(ADMN). This has the effect of making a route
using this interface less favorable. For example, to set the metric for the sme0
interface to 10, enter:
/etc/ifconfig sme0 inet metric 10
onepacket enables one-packet at a time operation for interfaces with small
buffers that are unable to handle continuous streams of back-to-back packets.
This parameter takes two arguments that allow you to define a small packet
size, and the number of these that you will permit in the receive window.


This deals with TCP/IP implementations that can send more than one packet
within the window size for the connection. Set the small packet size and
count to zero if you are not interested in detecting small packets. For example,
to set one-packet mode with a small packet threshold of one small packet of
512 bytes on the e3A0 interface, enter:
/etc/ifconfig e3A0 inet onepacket 512 1
To turn off one-packet mode for this interface, enter:
/etc/ifconfig e3A0 inet -onepacket
perf allows you to tune performance parameters on a per-interface basis. The
arguments to perf specify the receive and send window sizes in bytes, and
whether TCP should restrict the data in a segment to a multiple of 1KB (a
value of 0 restricts; 1 uses the full segment size).
The following example sets the receive and send window size to 4KB, and
uses the maximum 1464-byte data size available in an Ethernet frame:
/etc/ifconfig sme0 inet perf 4096 4096 1
NOTE Segment truncation does not change the size of the Ethernet frame;
this is fixed at 1530 bytes.


Using inconfig to change global TCP/IP parameters
As root, you can use the inconfig(ADMN) command to change the global
default TCP/IP configuration values.


NOTE Any global performance parameters that you set using inconfig are
overridden by per-interface values specified using ifconfig.

For example, to enable forwarding of IP packets, you would enter:
inconfig ipforwarding 1
inconfig updates the values of the parameters defined in /etc/default/inet and
those in use by the currently executing kernel. You do not need to reboot your
system for these changes to take effect; inconfig dynamically updates the kernel with the changes you specify. Before doing so, it verifies that the values
you input are valid. If they are not, the current values of the parameters are
retained.
See "TCP lIP parameters" (page 227) for a description of the TCP lIP parameters that you can tune using inconfig.


TCP/IP parameters
The parameters that control the operation of TCP/IP are defined in the file
/etc/default/inet.
The parameters are grouped according to function:
• "Address Resolution Protocol (ARP) parameters" (this page)
• "Asynchronous half-duplex (ASYH) line connection parameters" (page 228)
• "Internet Control Message Protocol (ICMP) parameters" (page 228)
• "Internet Group Management Protocol (IGMP) parameters" (page 229)
• "Configuring the in-kernel network terminal (IKNT) driver" (page 229)
• "Internet Protocol (IP) parameters" (page 229)
• "Message block control logging (MBCL) parameters" (page 232)
• "NetBIOS parameters" (page 232)
• "Transmission Control Protocol (TCP) parameters" (page 232)
• "User Datagram Protocol (UDP) parameters" (page 234)
You should read the description for a parameter before you change it using
inconfig(ADMN) as described in "Using inconfig to change global TCP/IP
parameters" (page 226). The default values of the parameters are configured
to work efficiently in most situations.


NOTE Never edit the settings for these parameters in the file /etc/default/inet;
always use inconfig to change them.

Address Resolution Protocol (ARP) parameters
The following parameters control the behavior of the Address Resolution Protocol (ARP).
arpprintfs
Controls logging of warnings from the kernel ARP driver. These are displayed on the console. If set to 0 (the default), debugging information is
not displayed.
arp_maxretries
Sets the maximum number of retries for the address resolution protocol
(ARP) before it gives up. The default value is 5; the minimum and maximum configurable values are 1 and 128.


arpt_down
Sets the time to hold onto an incomplete ARP cache entry if ARP lookup
fails. The default value is 20 seconds; the minimum and maximum
configurable values are 1 and 600 seconds.
arpt_keep
Sets the time to keep a valid entry in the ARP cache. The default value is
1200 seconds; the minimum and maximum configurable values are 1 and
2400 seconds.
arpt_prune
Sets the interval between scanning the ARP table for stale entries. The
default value is 300 seconds; the minimum and maximum configurable
values are 1 and 1800 seconds.
The number of ARP units is controlled by the value of the defined constant
ARP_UNITS.

Asynchronous half-duplex (ASYH) line connection parameters
The following parameter controls the behavior of asynchronous half-duplex
(ASYH) line connections used by PPP.
ahdlcmtu
Sets the maximum transmission unit (MTU) for an asynchronous PPP link.
This is normally set on a per-system basis in the /etc/ppphosts file - if not
defined there, this value is used.
The default value of ahdlcmtu is 296 bytes; the minimum and maximum
configurable values are 128 and 2048 bytes.

Internet Control Message Protocol (ICMP) parameters
The following parameters control the behavior of the Internet Control Message Protocol (ICMP).
icmp_answermask
If set to 1, the system will respond to ICMP subnet mask request messages.
This variable must be set to 1 to support diskless workstations. The
default value is 0, do not respond, as specified in RFC 1122.
icmpprintfs
Controls logging of warnings from the kernel ICMP driver. These are displayed on the console. If set to 0 (the default), debugging information is
not displayed.


Internet Group Management Protocol (IGMP) parameters
The following parameter controls the behavior of the Internet Group Management Protocol (IGMP).
igmpprintfs
Controls logging of warnings from the kernel IGMP driver. These are displayed on the console. If set to 0 (the default), debugging information is
not displayed.

Configuring the in-kernel network terminal (IKNT) driver
The number of IKNT driver units is determined by the number of pseudo-ttys
configured on the system. Use mkdev ptty to tune the number of pseudo-ttys.

Internet Protocol (IP) parameters
The following parameters control the behavior of the Internet Protocol (IP).
The number of interfaces supported by IP is dynamic and does not need tuning.

NOTE The value of the parameters in_fullsize, in_recvspace, and
in_sendspace affect the systemwide interface defaults. Their values may be
overridden on a per-interface basis by ifconfig(ADMN). This allows you to
mix fast and slow network hardware on the same system with optimal performance parameters defined for each interface.
in_fullsize
Controls the systemwide default TCP behavior for attempting to negotiate
the use of full-sized segments. If set to 1 (the default), TCP attempts to use
a segment size equal to the interface MTU minus the size of the TCP/IP
headers. If set to 0, TCP rounds the segment size down to the nearest
power of 2.
in_loglimit
Controls how many bytes of the error packet to display when debugging.
Note that the appropriate xxxprintfs parameter (such as tcpprintfs) must
be set to a non-zero value to enable logging. The default value is 64. The
minimum and maximum configurable values are 1 and 255.


in_recvspace
Sets the systemwide default size of the TCP/IP receive window in bytes.
The default value is 4096 bytes. The minimum and maximum
configurable values are 2048 and 65535 bytes.
in_sendspace
Sets the systemwide default size of the TCP/IP send window in bytes. This
should be at least as large as the loopback MTU. The default value is 8192
bytes. The minimum and maximum configurable values are 2048 and
65535 bytes.
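These parameters set the systemwide defaults only. Applications written to the BSD sockets interface can ask for different buffer sizes on an individual socket with setsockopt; whether and how the TCP/IP stack honors such requests is implementation-dependent. A minimal C sketch:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

int
main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);
    int size = 4096;          /* bytes, comparable to in_recvspace */

    if (s < 0) {
        perror("socket");
        return 1;
    }
    /* per-socket requests; the inconfig defaults are not changed */
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, (char *)&size, sizeof(size)) < 0)
        perror("setsockopt SO_RCVBUF");
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, (char *)&size, sizeof(size)) < 0)
        perror("setsockopt SO_SNDBUF");
    close(s);
    return 0;
}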
ip_checkbroadaddr
Controls whether IP validates broadcast addresses. If set to 1 (the default
as specified in RFC 1122), IP discards non-broadcast packets sent to a link-level broadcast address. In the unlikely event that a data-link driver does
not support this, packets may be discarded erroneously. If the netstat -sp
ip command shows that many packets cannot be forwarded, set this
parameter to 0 to turn off checking.
ip_dirbroadcast
If set to 1 (the default), allows receipt of broadcast packets only if they
match one of the broadcast addresses configured for the interface upon
which the packet was received. If set to 0, allows receipt of broadcast
packets that match any configured broadcast address.
ip_perform_pmtu
IP performs Path MTU (PMTU) discovery as specified in RFC 1191 if set to 1
(the default). This causes IP to send packets with the "do not fragment" bit
set so that routers will generate "Fragmentation Required" messages. If
this causes interoperability problems, a value of 0 disables PMTU.
If you disable PMTU, you should also set tcp_offer_big_mss (described in
"Transmission Control Protocol (TCP) parameters" (page 232)) to 0.
ip_pmtu_decrease_age
Controls how many seconds IP will wait (while performing PMTU) after
decreasing an MTU estimate before it starts raising it. The default value is
600 seconds. The maximum configurable value is 32667. If set to
0xffffffff, the estimate is never raised; this is useful if there is only one
path out of your local network and its MTU is known to be constant.
ip_pmtu_increase_age
Sets the number of seconds between increasing the MTU estimate for a
destination once it starts to increase. The default value is 120 seconds. The
minimum and maximum configurable values are 0 and 600 seconds.
ip_settos
If set to 1 (the default), IP sets type-of-service (TOS) information (as specified
in RFC 1122) in packets that it sends down to the data-link layer. Set this to
0 if your network card link-level driver cannot handle this.


ip_subnetsarelocal
The default value of 1 specifies that other subnets of the network are to be
considered as local - that is, TCP assumes them to be connected via high-MSS
paths and adjusts its idea of the MSS to be negotiated. Otherwise, TCP
uses the default MSS specified by tcp_mssdflt (described in "Transmission
Control Protocol (TCP) parameters" (page 232)) - this is typically 512
bytes in accordance with RFC 793 and 1122. By default, the parameter
tcp_offer_big_mss is non-zero so that Path MTU discovery will provide
the maximum benefit. If the value of tcp_offer_big_mss is zero, the value
of ip_subnetsarelocal is not checked. This allows for good local performance
even when PMTU discovery is not used.
The message "ICMP Host Unreachable" is generated for local subnet routing
failures. When this value is set to 0, the packet size is set to 576 bytes,
as specified in RFC 1122.
The default value of 1 enables this feature; if set to 0, it is disabled.
ip_ttl
Sets the time to live (TTL) of an IP packet as a number of hops. This value
is used by all kernel drivers that need it (including TCP). The default value
is 64 as recommended by RFC 1340. The minimum and maximum
configurable values are 1 and 255.
ipforwarding
ipsendredirects
If you want to use your machine as a gateway, set both these parameters
to 1.
ipforwarding controls whether the system will forward packets sent to it
which are destined for another system (that is, act as a router). The default
value is 0 (off) as defined by RFC 1122. A system acting as a host will still
forward source-routed datagrams unless ipnonlocalsrcroute is set to 0.
ipsendredirects controls whether IP will redirect hosts when forwarding a
packet out of the same interface on which it was received. This should be
set to 1 if ipforwarding is set to 1.
The Network Configuration Manager configures these values when additional drivers are added. This feature usually makes it unnecessary to
change ipforwarding and ipsendredirects with inconfig.


ipnonlocalsrcroute
Controls whether source-routed datagrams will be forwarded if they are
not destined for the local system. On hosts, the default value is 0 (off). If
your machine is acting as a router (ipforwarding is set to 1), the Network
Configuration Manager sets its value to 1. Set its value back to 0 if you are
concerned that this may open a security hole.
ipprintfs
Controls logging of warnings from the kernel IP driver. These are displayed on the console. If set to 0 (the default), debugging information is
not displayed.

Message block control logging (MBCL) parameters
The following parameter controls the behavior of message block control logging (MBCL).
mbclprintfs
Controls logging of warnings from the kernel MBCL driver which converts
STREAMS messages (mblock) to character lists (clist). The warnings are
displayed on the console. If set to 0 (the default), debugging information
is not displayed.

NetBIOS parameters
The following parameters control the behavior of NetBIOS.
nb _sendkeepalives
Turns NetBIOS level keepalives on or off. When turned on, NetBIOS
keepalives are sent periodically on dormant NetBIOS connections. NetBIOS
keepalives are independent of TCP/IP keepalives, and are useful for systems that do not use TCP/IP keepalives. This parameter is set to 0 (turned
off) by default. Set it to 1 to enable NetBIOS keepalives.
nbprintfs
Controls logging of warnings from the kernel NetBIOS driver as specified
in RFC 1001/2. The warnings are displayed on the console. If set to 0 (the
default), debugging information is not displayed.

Transmission Control Protocol (TCP) parameters
The following parameters control the behavior of the Transmission Control
Protocol (TCP). You can increase the number of TCP units beyond the default
number (256) using the Network Configuration Manager for the appropriate
sco_tcp chain.
tcp_initial_timeout
Sets the TCP/IP retransmit time for an initial SYN segment. The default
value is 180 seconds as defined by RFC 1122. The minimum and maximum
configurable values are 1 and 7200 seconds.


tcp_keepidle
Sets the idle time before TCP/IP keepalives are sent (if enabled). The
default value is 7200 seconds. The minimum and maximum configurable
values are 300 and 86400 seconds.
tcp_keepintvl
Sets the TCP/IP keepalive interval between keepalive packets once they
start being sent. The default value is 75 seconds. The minimum and maximum configurable values are 1 and 43200 seconds.
tcp_mss_sw_threshold
Defines the small window threshold for interface MTUs. If the MTU of an
interface is small enough to force TCP to use an MSS smaller than this
threshold, then TCP will use the receive window size specified by
tcp_small_recvspace. This is an optimization to avoid buffering too much
data on low-speed links such as SLIP and PPP. The default value is 1024
bytes. The minimum and maximum configurable values are 512 and 4096
bytes.
tcp_mssdflt
Sets the default TCP segment size to use on interfaces for which no MSS
and Path MTU information is available. The default and minimum value is
512 bytes. The maximum configurable value is 32768. You should keep
the value of this parameter small if possible.
tcp_nkeep
Sets the number of TCP/IP keepalives that will be sent before giving up.
The default value is 8. The minimum and maximum configurable values
are 1 and 256.
tcp_offer_big_mss
In order to get the maximum benefit out of Path MTU (PMTU) discovery,
TCP normally offers an MSS that is derived from the local interface MTU
(after subtracting the packet header sizes). This allows the remote system
to send the biggest segments that the network can handle. Set this parameter to 0 for systems that cannot handle this, or that do not implement
PMTU discovery. This causes TCP to offer a smaller MTU for non-local connections (see ip_subnetsarelocal in "Internet Protocol (IP) parameters"
(page 229)). The default value of 1 (offer it) allows maximum benefit to be
gained from PMTU discovery; a value of 0 disables this.


tcp_small_recvspace
Sets the receive window size to use on interfaces that require small windows, that is, interfaces whose MTU is small enough to force the MSS below
tcp_mss_sw_threshold (see also tcp_mss_sw_threshold). The default value
is 4096 bytes. The minimum and maximum configurable values are 1024
and 16384 bytes.
tcp_urgbehavior
Controls how TCP interprets the urgent pointer. If set to 0, it interprets it
in RFC 1122 mode; if set to 1 (the default), it interprets it in BSD mode.
tcpalldebug
If non-zero, captures trace information for all connections. The default
value is 0 which causes TCP to trace only those connections that set the
SO_DEBUG option. This information can be retrieved using the
trpt(ADMN) command, or displayed on the console if tcpconsdebug is set.
tcpconsdebug
Directs TCP/IP connection trace output to the console if set to 1 (see also
tcpalldebug). The default value is 0.
tcpprintfs
Controls logging of warnings from the kernel TCP driver. These are displayed on the console. If set to 0 (the default), debugging information is
not displayed.

User Datagram Protocol (UDP) parameters
The following parameter controls the behavior of the User Datagram Protocol
(UDP).
udpprintfs
Controls logging of warnings from the kernel UDP driver. These are displayed on the console. If set to 0 (the default), debugging information is
not displayed.


Appendix D

Quick system tuning reference
Table D-1, "Diagnosing performance problems" (this page) summarizes the
symptoms and possible solutions for some important performance problems.
Note that the measured values represent averages over time. The suggested
critical values may not be suitable for all systems. For example, you may be
able to tolerate a system that is paging out if this does not seriously impact
the performance of the rest of the system.
Table D-1 Diagnosing performance problems

Insufficient CPU power at high load
  [mp]sar -q shows runq-sz > 2
  [mp]sar -u shows %idle < 20% on multiuser system
  [mp]sar -u shows %idle < 5% on dedicated database server
  Additionally for SMP:
  mpsar -q shows %runocc > 90%
  cpusar -u shows %idle < 20% on any CPU of multiuser system
  cpusar -u shows %idle < 5% on any CPU of dedicated database server

Possible solutions
  Measures that can be taken include:
  • check that the system is not swapping or paging out excessively
  • reschedule jobs to run at other times
  • tune applications to use less CPU power
  • replace applications with ones needing less CPU power
  • replace non-intelligent serial cards with intelligent ones
  • upgrade the system to use faster CPU(s)
  • upgrade to a multiprocessor system
  • add more CPUs to a multiprocessor system
  • purchase an additional system to share the load


Excessive paging out or swapping
  [mp]sar -p shows rclm/s >> 0
  [mp]sar -q shows %swpocc > 20%
  [mp]sar -w shows swpot/s > 1
  swap -l shows free < 50% of blocks

Possible solutions
  Increase free memory until swapping does not occur by:
  • reducing number of buffers (watch out for reduced cache hit rates)
  • running fewer large applications locally
  • moving users to another machine
  • adding RAM

Poor disk performance
  [mp]sar -u shows %wio > 15%
  [mp]sar -d shows avque >> 1 and %busy > 80%

Possible solutions
  Increase disk performance by:
  • using HTFS filesystem(s)
  • using striping across several disks to balance load
  • keeping filesystems < 90% full
  • reorganizing directories
  • keeping directories small
  • distributing different types of activity to different disks
  • adding more disks
  • using faster disks, controllers, and host adapters
  • improving buffer cache performance
  • improving namei cache performance
  • reducing filesystem fragmentation

Poor buffer cache performance
  [mp]sar -b shows %rcache < 90% and %wcache < 65%

Possible solutions
  Improve buffer cache performance by:
  • increasing number of buffers
  • increasing number of buffer hash queues per buffer


Poor namei cache performance
  [mp]sar -n shows %Hhit < 65% or %Dhit < 65%

Possible solutions
  Increase namei cache hit rate by:
  • tuning namei cache parameters for each filesystem type
  • making each pathname component less than or equal to 14 characters

Fragmented filesystem
  df -v shows blocks %used > 90%

Possible solutions
  Reduce the number of disk blocks used by:
  • using DTFS filesystem(s)
  • removing unwanted files regularly
  • archiving and removing, or compressing infrequently used files
  • mounting commonly used resources across the network using NFS
  • adding disk(s)

  Reduce fragmentation by:
  • archiving and removing the files, and rebuilding the filesystem

Kernel tables too small
  error messages displayed on console
  [mp]sar -v shows ov > 0 (overflows)

Possible solutions
  Allow table sizes to grow dynamically; for example, set MAX_PROC to 0
  for the process table

The desirable attributes of systems with many logged-in users and database
server systems differ in some respects. Use the following tables to check that
you have not overlooked anything:
• Table D-2, "Attributes of a well-tuned multiuser system" (page 238)

• Table D-3, "Attributes of a well-tuned dedicated database server system"
(page 239)
Note that the performance values suggested in these tables may not be suitable for all systems. The appropriate values depend greatly on the mix of applications that is running and the likely demands placed on the system.


To record system activity to a file for later analysis, use the -o option of
sar(ADM) on a single processor system, and of mpsar(ADM) on a multiprocessor system. Take the measurements over a period of at least an hour with a
sampling interval sufficiently small to capture the level of detail which you
are interested in. Record the system's activity at varying levels of loading so
that you can identify when bottlenecks are appearing.
Table D-2 Attributes of a well-tuned multiuser system

CPU performance                                Explanation
[cpu]sar -u shows %idle > 20%                  Some idle time on each CPU at high load
[mp]sar -q shows runq-sz < 2                   Few processes waiting to run
mpsar -q shows %runocc < 90% (SMP only)        Run queue is not continually occupied

See Chapter 3, "Tuning CPU resources" (page 21).

Memory performance                             Explanation
[mp]sar -p shows rclm/s ≈ 0                    Little or no swapping or paging out activity
[mp]sar -w shows swpot/s ≈ 0                   Little or no activity on the swap device(s)
[mp]sar -q shows swpq-sz ≈ 0 and %swpocc ≈ 0%  No swapped-out runnable processes
[mp]sar -r shows freemem >> GPGSHI and         Ample free memory and swap space
freeswp ≈ constant

See Chapter 4, "Tuning memory resources" (page 41).

Disk I/O performance                           Explanation
[cpu]sar -u shows %wio < 15%                   Little time spent waiting for I/O to complete
[mp]sar -b shows %rcache > 90% and             Good hit rate for reading and writing to the
%wcache > 65%                                  buffer cache
[mp]sar -d shows avque ≈ 1                     Low average number of disk requests queued
[mp]sar -n shows %Hhit > 65% or %Dhit > 65%    Good hit rate for namei cache

See Chapter 5, "Tuning I/O resources" (page 71).


Table D-3 Attributes of a well-tuned dedicated database server system

CPU performance                                Explanation
[cpu]sar -u shows %idle > 5%                   Some idle time on each CPU at high load
[mp]sar -q shows runq-sz < 2                   Few processes waiting to run
mpsar -q shows %runocc < 90% (SMP only)        Run queue is not continually occupied

See your database documentation and Chapter 3, "Tuning CPU resources" (page 21).

Memory performance                             Explanation
[mp]sar -p shows rclm/s ≈ 0                    Little or no swapping or paging out activity
[mp]sar -w shows swpot/s ≈ 0                   Little or no activity on the swap device(s)
[mp]sar -q shows swpq-sz ≈ 0 and %swpocc ≈ 0%  No swapped-out runnable processes
[mp]sar -r shows freemem ≈ GPGSHI and          Little excess free memory; allow the database
freeswp ≈ constant                             to use any excess memory by increasing its
                                               internal work area

See your database documentation and Chapter 4, "Tuning memory resources" (page 41).

Disk I/O performance                           Explanation
[cpu]sar -u shows %wio < 15%                   Little time spent waiting for I/O to complete
[mp]sar -d shows avque ≈ 1                     Low average number of disk requests queued

See your database documentation and Chapter 5, "Tuning I/O resources" (page 71).


Bibliography

The following books provide more information about topics outlined in this
guide. This list is provided for reference only; it is not comprehensive and The
Santa Cruz Operation, Inc. does not guarantee the accuracy of these publications. The implementation of the UNIX system, networking and performance
analysis software described in these books may differ in some details from
that of the current SCO OpenServer software.
Several references are also included on the subject of algorithmics, which has
direct relevance to programmers who wish to improve the performance of
application programs.
Ammeraal, Leendert. Programs and Data Structures in C, Second Edition. New
York, NY: Wiley, 1992. A practical introduction to the implementation and
manipulation of data structures using the ANSI C programming language.
Bach, Maurice J. The Design of the UNIX Operating System. Englewood Cliffs,
NJ: Prentice Hall, 1986. A technical discussion of the internals of the UNIX
System V Operating System, written shortly before the release of UNIX System
V Release 3.
Deitel, Harvey M. An Introduction to Operating Systems, Second Edition. Reading, MA: Addison-Wesley, 1990. Discusses general performance issues for operating systems.
Harel, David. Algorithmics: The Spirit of Computing, Second Edition. Reading,
MA: Addison-Wesley, 1992. A very readable introduction to the subject of
algorithmics.
Hunt, Craig. TCP/IP Network Administration. Sebastopol, CA: O'Reilly and
Associates, 1993. Contains information about the configuration of IP packet
routing and name service.
Knuth, Donald E. The Art of Computer Programming, Volume I: Fundamental
Algorithms. Reading, MA: Addison-Wesley, 1968. The first volume of the classic three-volume series on the subject of computer programming.
Knuth, Donald E. The Art of Computer Programming, Volume II: Seminumerical
Algorithms. Reading, MA: Addison-Wesley, 1969.
Knuth, Donald E. The Art of Computer Programming, Volume III: Sorting and
Searching. Reading, MA: Addison-Wesley, 1973.


Loukides, Mike. System Performance Tuning. Sebastopol, CA: O'Reilly and
Associates, 1991. Includes many excellent tips for getting the best performance out of UNIX systems.
Mansfield, Niall. The Joy of X. Reading, MA: Addison-Wesley, 1993. Contains
useful information about performance issues for the X Window System.
Messmer, Hans-Peter. The Indispensable PC Hardware Book. Reading, MA:
Addison-Wesley, 1994. Provides comprehensive information about system
hardware issues.
Miscovitch, Gina and David Simons. The SCO Performance Tuning Handbook.
Englewood Cliffs, NJ: Prentice Hall, 1994. Written by two senior kernel
engineers at SCO, this book describes performance tuning for SCO® UNIX®
Release 3.2 Version 4.2, SCO MPX™ 3.0, SCO Open Desktop 3.0, and SCO Open
Server™ 3.0 systems.
Press, William H., Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing, Second Edition.
Cambridge University Press, 1994. Includes many numerical algorithms for
scientific and engineering applications.
Stern, Hal. Managing NFS and NIS. Sebastopol, CA: O'Reilly and Associates,
1991. Contains a detailed chapter on performance analysis and tuning as well
as useful references on IP packet routing and NFS benchmarks.


Glossary of performance terminology

This section contains definitions of the key terms used throughout this book
in discussing the performance of computer systems.
AIO

See asynchronous I/O.
asymmetric multiprocessing
A multiprocessor system is asymmetric when processors are not equally able
to perform all tasks. For example, only the base processor is able to control
I/O. Most machines acknowledged to be symmetric may still have some
asymmetric features present such as only being able to boot using the base
processor.
asynchronous I/O
Provides non-blocking I/O access through a raw device interface.
bandwidth
The maximum I/O throughput of a system.
base processor
The first CPU in a multiprocessor system. The system normally boots using
this CPU. Also called the default processor, it cannot be deactivated.
bdflush
The system name for the buffer flushing daemon.
benchmark
Software run on a computer system to measure its performance under specific
operating conditions.
block device interface
Provides access to block-structured peripheral devices (such as hard disks)
which allow data to be read and written in fixed-sized blocks.
blocking I/O
Forces a process to wait for the I/O operation to complete. Also known as
synchronous I/O.
bottleneck
Occurs when demand for a particular resource is beyond the capacity of that
resource and this adversely affects other resources. For example, a system has
a disk bottleneck if it is unable to use all of its CPU power because processes
are blocked waiting for disk access.
bss
Another name for data which was not initialized when a program was compiled. The name is an acronym for block started by symbol.

buffer
A temporary data storage area used to allow for the different capabilities
(speed, addressing limits, or transfer size) of two communicating computer
subsystems.
buffer cache
Stores the most-recently accessed blocks on block devices. This avoids having
to re-read the blocks from the physical device.
buffer flushing daemon
Writes the contents of dirty buffers from the buffer cache to disk.
cache memory
High-speed, low access-time memory placed between a CPU and main memory in order to enhance performance. See also level-one (L1) cache and level-two (L2) cache.
checkpointing
One of the functions of the htepi_daemon; marking a filesystem state as clean
after it flushes changed metadata to disk.
child process
A new process created when a parent process calls the fork(S) system call.
clean
The state of a system buffer or memory page that has not had its contents
altered.
client-server model
A method of implementing application programs and operating system
services which divides them into one or more client programs whose requests
for service are satisfied by one or more server programs. The client-server
model is suitable for implementing applications in a networked computer
environment.
Examples of application of the client-server model are:
• page serving to diskless clients
• file serving using NFS and NUCFS
• Domain Name Service (DNS)
• the X Window System
• many relational database management systems (RDBMSs)
clock interrupt
See clock tick.
clock tick
An interrupt received at regular intervals from the programmable interrupt
timer. This interrupt is used to invoke kernel activities that must be performed on a regular basis.


contention
Occurs when several CPUs or processes need to access the same resource at
the same time.
context
The set of CPU register values and other data, including the u-area, that
describe the state of a process.
context switch
Occurs when the scheduler replaces one process executing on a CPU with
another.
copy-on-write page
A memory page that is shared by several processes until one tries to write to
it. When this happens, the process is given its own private copy of the page.

COW page
See copy-on-write page.
CPU

Abbreviation of central processing unit. One or more CPUs give a computer
the ability to execute software such as operating systems and application programs. Modern systems may use several auxiliary processors to reduce the
load on the CPU(s).
CPU bound

A system in which there is insufficient CPU power to keep the number of
runnable processes on the run queue low. This results in poor interactive
response by applications.
daemon
A process that performs a service on behalf of the kernel. Since daemons
spend most of their time sleeping, they usually do not consume much CPU
power.
device driver
Performs I/O with a peripheral device on behalf of the operating system kernel. Most device drivers must be linked into the kernel before they can be
used.
dirty
The state of a system buffer or memory page that has had its contents altered.
distributed interrupts
Interrupts from devices that can be serviced by any CPU in a multiprocessor
system.
event
In the X Window System, an event is the notification that the X server sends
an X client to tell it about changes such as keystrokes, mouse movement, or
the moving or resizing of windows.


executing
Describes machine instructions belonging to a program or the kernel being
interpreted by a CPU.
fragmentation
The propensity of the component disk blocks of a file or memory segments of
a kernel data structure to become separated from each other. The greater the
fragmentation, the more work has to be performed to retrieve the data.
free list
A chain of unallocated data structures which are available for use.
garbage collection
The process of compacting data structures to retrieve unused memory.
htepi_daemon
A kernel daemon that handles filesystem metadata. It can also perform
optional transaction intent logging and checkpointing on behalf of the HTFS
filesystem.
idle
The operating system is idle if no processes are ready-to-run or are sleeping
while waiting for block I/O to complete.
idle waiting for I/O
The operating system is idle waiting for I/O if processes that would otherwise
be runnable are sleeping while waiting for I/O to a block device to complete.
in-core
Describes something that is internal to the operating system kernel.
in-core inode
An entry in the kernel table describing the status of a file system inode that is
being accessed by processes.
inode
Abbreviation of Index Node. An inode is a data structure that represents a file
within a traditional UNIX filesystem. It consists of a file's metadata and the
numbers of the blocks that can be used to access the file's data.
interrupt
A notification from a hardware device about an event that is external to the
CPU. Interrupts may be generated for events such as the completion of a
transfer of data to or from disk, or a key being pressed.
interrupt bound
A system which is unable to handle all the interrupts that are arriving.
interrupt latency
The time that the kernel takes to handle an interrupt.


interrupt overrun
Occurs when too many interrupts arrive while the kernel is trying to handle a
previous interrupt.
I/O

Abbreviation of input/output. The transfer of data to and from peripheral
devices such as hard disks, tape drives, the keyboard, and the screen.
I/O bound

A system in which the peripheral devices cannot transfer data as fast as
requested.
job
One or more processes grouped together but issued as a single command. For
example, a job can be a shell script containing several commands or a series of
commands issued on the command line connected by a pipeline.
kernel
The name for the operating system's central set of intrinsic services. These
services provide the interface between user processes and the system's hardware allowing access to virtual memory, I/O from and to peripheral devices,
and sharing resources between the user processes running on the system.

kernel mode
See system mode.
kernel parameter
A constant defined in the file /etc/conf/cf.d/mtune (see mtune(F)) that controls
the configuration of the kernel.
level-one (L1) cache
Cache memory that is implemented on the CPU itself.
level-two (L2) cache
Cache memory that is implemented externally to the CPU.
load average
The utilization of the CPU measured as the average number of processes on
the run queue over a certain period of time.
logging
See transaction intent logging.
marry driver
A pseudo-device driver that allows a regular file within a filesystem to be
accessed as a block device, and, hence, as a swap area.
memory bound
A system which is short of physical memory, and in which pages of physical
memory, but not their contents, must be shared by different processes. This is
achieved by paging out, and swapping in cases of extreme shortage of physical memory.


memory leak
An application program has a memory leak if its size is constantly growing in
virtual memory. This may happen if the program is continually requesting
more memory without re-using memory allocated to data structures that are
no longer in use. A program with a memory leak can eventually make the
whole system memory bound, at which time it may start paging out or swapping.
metadata
The data that an inode stores concerning file attributes and directory entries.
multiprocessor system
A computer system with more than one CPU.
multithreaded program
A program is multithreaded if it can be accessed simultaneously by different
CPUs. Multithreaded device drivers can run on any CPU in a multiprocessor
system. The kernel is multithreaded to allow equal access by all CPUs to its
tables and the scheduler. Only one copy of the kernel resides in memory.
namei cache
A kernel data structure that stores the most-commonly accessed translations
of file system pathname components to inode number. The namei cache
improves I/O performance by reducing the need to retrieve such information
from disk.
nice value
A weighting factor in the range 0 to 39 that influences how great a share of
CPU time a process will receive. A high value means that a process will run
on the CPU less often.
non-blocking I/O
Allows a process to continue executing without waiting for an I/O operation
to complete. Also known as asynchronous I/O.
operating system
The software that manages access to a computer system's hardware resources.
overhead
The load that an operating system incurs while sharing resources between
user processes and performing its internal accounting.
page
A fixed-size (4KB) block of memory.
page fault
A hardware event that occurs when a process tries to access an address in virtual memory that does not have a location in physical memory associated
with it. In response, the system tries to load the appropriate data into a newly
assigned physical page.


page stealing daemon
The daemon responsible for releasing pages of memory for use by other processes. Also known as vhand.
paging in
Reading pages of program text and pre-initialized data from the filesystems,
or stack and data pages from swap.
paging out
Releasing pages of physical memory for use by making temporary copies of
the contents of dirty pages to swap space. Clean pages of program text and
pre-initialized data are not copied to swap space because they can be paged in
from the filesystems.
parent process
A process that executes a fork(S) system call to create a new child process.
The child process usually executes an exec(S) system call to invoke a new program in its place.
physical memory
Storage implemented using RAM chips.
preemption
A process that was running on a CPU is replaced by a higher priority process.
priority
A value that the scheduler calculates to determine which process(es) should
next run on the CPUs. A process' priority is calculated from its nice value and
its recent CPU usage.
process
A single instance of a program in execution. This can be a login shell or an
operating system command, but not a built-in shell command. If a command
is built into the shell a separate process is not created on its invocation; the
built-in command is issued within the context of the shell process.
process table
A data structure inside the kernel that stores information about all the processes that are present on a system.
protocol
A set of rules and procedures used to establish and maintain communication
between hardware or software subsystems.
protocol stack
Allows two high-level systems to communicate by passing messages through
a low-level physical interface.
pseudo-device driver
A device driver that allows software to behave as though it is a physical device. Examples are ram disks and pseudo-ttys.


pseudo-tty
A pseudo-terminal is a device driver that allows one process to communicate
with another as though it were a physical terminal. Pseudo-ttys are used to
interface to programs that expect to receive non-blocking input and to send
terminal control characters.
queue
An ordered list of entities.
race condition
The condition which occurs when several processes or CPUs are trying to
write to the same memory or disk locations at the same time. The data that is
eventually stored depends on the order that the writes occur. A synchronization mechanism must be used to enforce the desired order in which the
writes are to take place.
RAID
Abbreviation of redundant array of inexpensive disks. Used to implement
high performance and/or high integrity disk storage.
ramdisk
A portion of physical memory configured to look like a physical disk but
capable of fast access times. Data written to a ramdisk is lost when the operating system is shut down. Ramdisks are, therefore, only suitable for implementing temporary filesystems.
raw device interface
Provides access to block-structured peripheral devices which bypasses the
block device interface and allows variable-sized transfers of data. The raw
interface also allows control of a peripheral using the ioctl(S) system call. This
allows, for example, for low-level operations such as formatting a disk or
rewinding a tape.
region
A region groups a process' pages by their function. A process has at least
three regions for its data, stack, and text.
resource
Can be divided into software and hardware resources. Software resources
may be specific to applications, or they may be kernel data structures such as
the process table, open file, and in-core inode tables, buffer and namei caches,
multiphysical buffers, and character lists. Hardware resources are a
computer's physical subsystems. The three main subsystems are CPU, memory and I/O. The memory subsystem can be divided into two resources - physical memory (or main memory) and swap space (or secondary memory).
The I/O subsystem comprises one or more resources of similar or different
types - hard and floppy disk drives, tape drives, CD-ROMs, graphics displays
and network devices.
ready-to-run process
A process that has all the system resources that it needs in order to be able to
run on a CPU.

response time
The time taken between issuing a command and receiving some feedback
from the system. This is not to be confused with turnaround time which is a
measure of how long a particular task takes from invocation to completion.
run queue
The list of ready-to-run processes maintained by the kernel.
runnable process
See ready-to-run process.
scaling
A computer system's ability to increase its processing capacity as CPUs are
added. If the processing capacity increases in direct proportion to the number
of CPUs, a system is said to exhibit 100% scaling. In practice, a system's ability
to scale is limited by contention between the CPUs for resources and depends
on the mix of applications being run.
sched
The system name for the swapper daemon.
scheduler
The part of the kernel that chooses which process(es) to run on the CPUs.
single threaded program
A program is single threaded if it can only run on one CPU at a time. Single
threaded device drivers can only run on the base processor in a multiprocessor system.
sleeping on I/O
See waiting for I/O.
spin lock
A method of synchronizing processes on a multiprocessor system. A process
waiting for a resource which is currently in use (locked) by a process running
on a different CPU repeatedly executes a short section of kernel code (spins)
until the lock is released.
stack
A list of temporary data used by a program to handle function calls.
strd
The system name for the STREAMS daemon.
stream head
The level of the STREAMS I/O interface with which a user process communicates.
STREAMS I/O

A mechanism for implementing a layered interface between applications
running in user space and a device driver. Most often used to implement network protocol stacks.


STREAMS daemon

The daemon used by the STREAMS I/O subsystem to manage STREAMS memory.
swap area
A piece of swap space implemented as a disk division or as a block device
married to a regular file in a filesystem.
swap space
A collection of swap areas used to store the contents of stack and data memory pages temporarily while they are used by other processes.
swapper daemon
Part of the kernel that reclaims physical pages of memory for use by copying
whole regions of processes to swap space.
swapping
The action taken by the swapper daemon when the system is extremely short
of physical memory needed for use by processes. Swapping can place a
heavy load on the CPU and disk I/O subsystems.
symmetric multiprocessing
A multiprocessor system is symmetric when any processor can perform any
function. This ensures an even load distribution because no processor
depends on another. Each process is executed by a single processor.
system mode
The state of a CPU when the kernel needs to ensure that it has privileged
access to its data and physical devices. Also known as kernel mode.
text
Executable machine instructions (code) that a CPU can interpret and act on.
throughput
The amount of work (measured in number of jobs completed, disk requests
handled, and so on) that a system processes in a specified time.
time slice
The maximum amount of time for which a process can run without being
preempted.
transaction intent logging
One of the functions of the htepi_daemon; writing the intention to change
filesystem metadata to a log file on disk.
u-area
Abbreviation of user area and also known as a u-block. A data structure possessed by every process. The u-area contains private data about the process
that only the kernel may access.
user mode
The state of a CPU when it is executing the code of a user program that
accesses its own data space in memory.

vhand
The system name for the page stealing daemon.
virtual disk
A disk composed of pieces of several physical disks.
virtual memory
A method of expanding the amount of available memory by combining physical memory (RAM) with cheaper and slower storage such as a swap area on a
hard disk.
waiting for I/O
A process goes to sleep if it has to wait for an I/O operation to complete.
X client
An application program that communicates with an X server to request that
it display information on a screen or to receive input events from the keyboard or a pointing device such as a mouse. The client may be running on the
same computer as the server (local), or it may be connected via a network
(remote).

X server
The software that controls the screen, keyboard and pointing device under the
X Window System.

X terminal

A display device that is able to run X server software. All of an X terminal's
clients must run on remote machines.
X Window System

A windowing system based on the client-server model.
zombie process
An entry in the process table corresponding to a process that no longer exists.
The entry will only be removed if its parent process invokes a wait(S) system
call. A zombie process does not consume any system resources apart from its
slot in the process table. However, you should beware of runaway processes
that generate many zombies. These will cause the system to become short of
memory as the process table grows to accommodate them.


Index

Symbols, numbers
16450, UART, 108
16550, UART, 108
80387, math coprocessor, 221
8250, UART, 108
10Base2, 137
10Base5, 137
10Base-T, 137

asynchronous writes, configuring on NFS
server, 151
automount(NADM), performance
considerations, 153
avque field, sar -d, 69, 89, 90
avserv field, sar -d, 89, 90
avwait field, sar -d, 89, 90

A
Address Resolution Protocol, parameters,
227
address space, limiting, 202
ahdlcmtu, 228
AIO. See asynchronous I/O
aio_breakup - AIO buffer table overflow, 199
aio_breakup - AIO request table overflow, 199
aio_memlock - AIO process table overflow, 199
aio_setlockauth - AIO lock table overflow, 200
allocb - Out of streams memory, 211
allocreg - Region table overflow, 209
applications
performance tuning, 9
using STREAMS, 129
ARP, parameters, 227
arp_maxretries, 227
arpprintfs, 227
arpt_down, 228
arpt_keep, 228
arpt_prune, 228
ASYH, parameters, 228
asynchronous I/O
control blocks, 85
high performance, 11
introduced,72
kernel parameters, 199
POSIX.lb,162
viewing activity of, 162

B
back-to-back packets, 225
bad line, 110
badcalls field, nfsstat -c, 144
badlen field, nfsstat -s, 145
badxid field, nfsstat -c, 145
balancing hard disk activity, 98
base processor,23
bdflush, 73, 194
BDFLUSHR, 93, 194
benchmarks, 118
BFREEMIN, 195
biod daemons, performance tuning, 149
blks/s field, sar -d, 90
block device, switch table size, 222
block I/O, viewing, 89
blocks field, swap -1,48
boot, load extension, kernel parameters,
222
Boot Time Loadable Driver, kernel parameters,222
bread/s field, sar -b, 75
bridges, 137
bswot/s field, sar -w, 49, 68
BTLD, kernel parameters, 222
buffer cache
changing size at boot time, 79
disk blocks read to, 75
disk blocks written from, 75
effect of large, 77
finding size of, 74
free list, 195
hash queues, setting number of, 80
hit rates, 54
increasing available memory, 54
increasing size of, 75

buffer cache (continued)
number of reads from, 54, 75
number of writes to, 54, 75
position in memory of, 79
reducing contention, 81
reducing size of, 53, 54, 70
too small, 67
used by a database, 122
used by system, 72
viewing activity of, 75
buffer flushing daemon, 73
buffer header, STREAMS, 127
buffers
allocating character list, 197
configuration string, 219
increasing cache hit rate, 192
kernel parameters, 192
specifying age for filesystem updates,
194
splitting threshold, 211
writing to disk, 194
%busy field, sar -d, 69, 89, 90
buying hardware, 8
bwrit/s field, sar -b, 75

C
C2, disabling features, 203
cache hits, reducing disk accesses, 192
cache_affinity variable, 36
cblock,110
character block, 110
character buffers
allocating number of, 197
kernel parameters, 197
character device, switch table size, 222
character lists
introduced, 110
tuning, 112
chattering terminal, 110
CheaperNet, 137
checkpointing, 206
chown kernel privilege, controlling, 203
CHOWN_RES, 203
client-server
applications, 139
running applications over network, 140
clist,110
CLK_TCK,223
clock interrupt, 26

clock tick, 26
cluster, filesystem, 94
cluster buffers, number set using NMPBUF,
193
cluster size, 103
CMASK,202
Collis field, netstat -i, 133
configuration, tunable kernel parameters,
191
configuration string, size of buffer, 219
configuration-dependent values, changing,
223
configure(ADM), 189
console
kernel parameters, 204
plasma display, 221
console screen saver, 204
contention, locking, 10
context switch, 28
control, map size, specifying, 216
Control Register 0 (CR0), 221
Control Register 3 (CR3), 221
copy buffers
number set using NMPBUF, 193
tuning number of, 86
used by system, 79, 84
CPU
adding, 31
base processor, 23
disabling, 23
enabling, 23
idle, 22
number currently active, 23
turning on/off, 23
viewing activity of, 23
CPU-bound system
identifying, 38
tuning, 40
cpuonoff(ADM), 23
cpusar(ADM)
-1,34
-j,34
-u, 23,119
crash(ADM)
available swappable memory, 49
reading putbuf buffer, 193
crontab(C), 40,52
CTBUFSIZE, 220

D
data, region, 209
database server, adjusting scheduler
behavior for, 35
database systems, 118
databases
arranging disks on server, 122
buffer cache used by, 122
disk layout of journal logs, 122
profiling files in, 122
shared memory, 122
desktop, reducing memory usage, 56
desktop client, performance, 56
device driver
kernel parameters, 222
multithreaded, 33
third party, 33
device field, sar -d, 89
/dev/spx, 212
df(C), 172
dfspace(C), 172
%Dhit field, sar -n, 82, 91
D_hits field, sar -n, 82
%direct field, sar -0, 162
directories, efficiency of searching, 95
disk controllers
block caching, 92
effect of slow, 67
multiple, 92
track caching, 92
disk I/O-bound system, identifying, 90
diskless clients, NIS, 154
disks
average number of requests waiting for,
89
average size of data transfers, 90
average time for request to, 89
configuration for database server, 122
estimating throughput of, 90
even distribution of activity, 120
examining amount of space, 172
kernel parameters, 192
percentage of time busy, 89
redistributing data, 122
time request waits in driver, 89
dkconfig(ADM)
-ps,104
-Tp,106
dk_name - Diskinfo table overflow, 207

DMA (Direct Memory Access)
buffers, 84
simultaneous requests on channel, 221
transfers, 79
use by hard disk controllers, 40
DMAEXCL,221
Dmisses field, sar -n, 82
DNS (Domain Name Service), performance
considerations, 141-142
D0387CR3, 221
dopricalc variable, 35
DOS filesystem kernel parameters, 206
DOSNINODE,206
DOSNMOUNT,206
DOWPCRO, 221
DSTFLAG,219
DTCACHEENTS, 199
DTHASHQS, 199
DTOFBIAS,199
dynamic kernel table parameters, 56, 207
dynamic linked libraries, 11
dynamic tables, kernel parameters, 207

E
environment variables, TZ (timezone), 219
/etc/conf/cf.d/mtune,190
/etc/conf/cf.d/stune,190
/etc/default/inet, 227
TCP/IP configuration, 226
/etc/default/login, 202
/etc/tcp script, TCP/IP configuration, 225
Ethernet, 137
EVDEVS,216
EVDEVSPERQ,216
event-Event channel full, 216
event - Event table full, 216
event queue, kernel parameters, 216
EVQUEUES,216
exec/s field, sar -c, 162
execution profiler, 10, 162
EXTRA_NDEV,222
EXTRA_NEVENT,222
EXTRA_NFILSYS, 222


F


factor(C), testing for prime, 83
fail field, netstat -m, 130
falloc - File table overflow, 208
file table, viewing, 55
files
compression, 206
controlling depth of versioning, 205
controlling undelete time, 206
default mask used on creation of, 202
maximum number of open, 201
size limit, 202
synchronization, 206
filesystem configuration, kernel parameters,205
filesystems
cluster, 94
defragmenting, 94
examining amount of space, 172
factors that affect performance of, 94
fragmentation, 94
nameicache, 198
writing buffers to disk, 194
file-sz field, sar -v, 55
fixed-priority process, 28
floating point coprocessors, 21
fork/s field, sar -c, 162
fragmentation, filesystem, 94
Fragmentation Required, 230
free list
used by buffer cache, 195
used by paged memory, 42
free memory pages, 48
freemem field, sar -r, 48, 51, 68
freeswp field, sar -r, 48
full frames, 131
full stripe, 103

G
Gateway for NetWare, performance tuning, 157
getconf(C), 207, 223
GPGSHI, 45, 53, 67, 195
GPGSLO, 44, 51, 53, 195
group configuration, kernel parameters, 201
groups, limiting supplemental, 202

H
hard disks
balancing activity of, 98
performance limitations, 96
hardware
kernel parameters, 222
performance, 8
performance considerations, 8
upgrading, 8
hardware-dependent kernel parameters, 220
Hardware/Kernel Manager, 188
hash queues
increasing with system buffers, 192
setting number of, 80
%Hhit field, sar -n, 82, 83, 91
H_hits field, sar -n, 82
Hmisses field, sar -n, 82
hop count, increasing on interface, 140
host adapter
scatter-gather, 92
tagged command queuing, 92
HTCACHEENTS, 198
HTFS filesystems, increasing performance of, 95
HTHASHQS, 198
HTOFBIAS, 198
HZ, clock interrupt rate, 26


I
ICMP Host Unreachable, 231
ICMP (Internet Control Message Protocol)
parameters, 228
icmp_answermask, 228
icmpprintfs,228
iddeftune(ADM),53
idle
no runnable processes, 23
operating system state, 22
waiting for I/O, 22
%idle field, sar -u, 23
idle waiting for I/O, 23
idtune(ADM), 190
changing kernel parameters using, 190
Ierrs field, netstat -i, 133
ifconfig(ADMN), 131,225
IGMP (Internet Group Management
Protocol) parameters, 229
igmpprintfs, 229

IKNT (in-kernel network terminal) driver,
configuring, 229
inconfig(ADMN), 132,226
in-core inode table, viewing, 55
indirect blocks, 95
inet file, TCP/IP configuration, 226
in_fullsize, 229
in-kernel network terminal driver, configuring, 229
in_loglimit, 229
inode table
allocating entries, 207
viewing, 55
Inode table overflow, 208
inodes
indirect blocks, 95
number in DOS filesystem, 206
inod-sz field, sar -v, 55, 83
in_recvspace, 230
in_sendspace, 230
intelligent serial cards, 40
inter-CPU interrupts, examining, 34
interface cards, performance tuning, 225
Internet Control Message Protocol parameters,228
Internet Group Management Protocol parameters, 229
Internetwork Packet Exchange, IPX, 157
interrupt
bound, 111
examining activity, 34
examining inter-CPU activity, 34
inter-CPU, 34
introduced, 28
latency, 111
overrun, 111
sharing, 221
trigger level, 111
I/O
asynchronous, 72
buffers, 74
programmed,22
synchronous, 72
tuning, 71
I/O bottlenecks
due to LMCFS performance, 155
due to NFS performance, 146
I/O vector size, setting, 203
I/O-bound system, identifying, 91
IOV_MAX, 203

IP (Internet Protocol)
configuring for NFS, 152
introduced, 131
parameters, 229
IPC activity, viewing, 162
ip_checkbroadaddr, 230
IPC_NOWAIT, 214
ip_dirbroadcast, 230
ipforwarding, 231
ipnonlocalsrcroute, 232
ip_perform_pmtu, 230
ip_pmtu_decrease_age, 230
ip_pmtu_increase_age, 230
ipprintfs, 232
ipsendredirects, 231
ip_settos, 230
ip_subnetsarelocal, 231
ip_ttl, 231
IPX (Internetwork Packet Exchange), 157
IPX/SPX, performance tuning, 157

J
job structure, 107
journal logs
bottleneck, 98
disk layout, 122

K
KBTYPE, 221
KDBSYMSIZE, 219
kernel
managing virtual address space, 197
relinking with link_unix, 189
resources, 189, 190
kernel debugger, size of symbol table, 219
kernel mode, operating system state, 22
kernel parameters
AIO,199
boot load extension, 222
BTLD,222
buffers, 192
changing,189
changing using configure(ADM), 191
changing using idtune(ADM), 190
console, 204
disks,192
event queues, 216
filesystem, 205

kernel parameters (continued)
hardware-dependent, 220
math coprocessor, 221
memory management, 195,197
message queues, 213, 215
multiphysical buffers, 192
multiscreens,204
namei cache, 198
paging, 195
processes, 195
semaphores, 216
shared memory, 218
STREAMS, 209-213

swapping, 195
system name, 219
tunable for configuration, 191
tunable for performance, 191
user and group configuration, 201
virtual disk, 200
kernel profiler, text symbols, 219
kernel tables
dynamic, 56
kernel parameters, 207
KERNEL_CLISTS, 224
adjusting number of, 112
KERNEL_CLISTS_MAX,224
KERNEL_MOUNT_MAX,207
keyboard, logical character protocol, 221

L
L1 cache, 37, 41
L2 cache, 35, 37, 41
LAN Manager Client Filesystem, LMCFS
kernel parameters, 223
performance tuning, 154
latches, 165
latency, interrupt, 111
layers, setting number of, 205
libraries, 11
link_unix(ADM), 189
LMCFS (LAN Manager Client Filesystem)
kernel parameters, 223
performance tuning, 154,155
LMCFS_BUF_SZ, 223
LMCFS_L~,156,223

LMCFS_NUM_BUF, 156,223
LMCFS_NUM_RECb156,223


lmc(LMC)
mntstats, 156
stats, 156
load displayer,34
loadbalance variable, 37
localization of reference, 11
lock table, viewing, 55
locks, contention, 10
lock-sz field, sar -v, 55
log driver, number of minor devices, 213
logging, 206
login user ID, LUID,203
logs, disk layout, 122
LUID (login user ID), 203

M
math coprocessor, kernel parameter, 221
MAX_BDEV,222
MAX_CDEV,222
MAX_CFGSIZE,219
MAX_DISK, 207
MAXFC,197
MAX_FILE, 208
MAX_FLCKREC,209
maximum segment size, adjusting, 131
MAX_INODE,207
MAX_MOUNT, 153,209
MAX_PROC, 202,208
MAX_REGION, 209
MAXSC,197
MAXSEPGCNT,212
MAXSLICE,26,196
MAXUMEM,202
MAXUP,202
MAXVDEPTH,205
MBCL (message block control logging), parameters,232
mbclprintfs, 232
mdmin/s field, sar -y, 110
memory
adding more, 53
cause of leak, 52
finding amount of, 42
greater than 32MB, 53
management 195
maximum used by STREAMS, 210
pages, 42
setting maximum used by process, 202
shared segment size, 218

networking
memory (continued)
swappable, 49
used by virtual disk driver, 101
memory management, kernel parameters
197
'
memory-bound system
identifying, 51
tuning, 52
message block control logging, MBCL, parameters, 232
message buffers, 127
message header, STREAMS, 127
message map, size of, 164
message queue, kernel parameters, 213, 215
messages
data per queue, 164
file, 42, 74
length of, 164
memory reserved for, 164
message queues, 163
number of segments, 164
size of message map, 164
size of segment, 164
using, 163
viewing activity of, 162
MINARMEM, 196
MINASMEM,196
MINVTIME,206
mkdev(ADM)
configuring layers, 205
configuring pseudo-ttys, 205
configuring shell layers, 205
modems, tuning serial port parameters, 111
MODE_SELECT, 221
mpsar(ADM). See sar(ADM)
mpstat(ADM),34
MSGMAP, 164, 213
MSGMAX, 164, 214
MSGMNB, 164, 214
MSGMNI, 215

msg/s field, sar -m, 163
MSGSEG, 164,214
MSGSSZ, 164, 215
MSGTQL,215
mtune file, 190
mtune(F), kernel parameters file, 187
multiphysical buffers
configuring number of, 193
kernel parameters, 192
tuning number of, 86
used by system, 72, 79, 84

multiplexer links, 213
multiscreen, kernel parameters, 204

N
NAHACCB, 222
NAIOBUF, 199
NAIOHBUF, 200
NAIOLOCKTBL, 200
NAIOPROC,199
NAIOREQ,199
NAIOREQPP, 200
name service, performance considerations, 141-142
name to inode (namei) translation cache, 72
namei cache
DTFS kernel parameters, 199
HTFS kernel parameters, 198
kernel parameters, 198
low hit rate for, 91
number of components found in, 82
number of misses in, 82
operation of, 81
percentage of hits in, 82
tuning performance of, 83
used by system, 72
NAME_MAX,224
NAUTOUP, 93, 194
nbprintfs, 232
nb_sendkeepalives, 232
NBUF,54,74,78,192
nbuf bootstring, 79
NCLIST, 110, 111, 112, 197
NCPYRIGHT, 219
NEMAP,222
NetBEUI
performance tuning, 157
protocol stack, 157
NetBIOS
interface, 157
parameters, 232
performance tuning, 157
netstat(TC)
-i, 133,158
-m,130
networking, performance tuning, 123-157
networking parameters, TCP/IP, 227-234

networks
configuring topology of, 137-140
interface cards, performance tuning, 225
monitoring activity of, 138
packet collisions, 133, 138
packet corruption, 133
packet transmit errors, 133
route tracing, 136
server types, 139
sniffer, 138
subnets, 139
testing connectivity, 136
newproc - Process table overflow, 208
NFS (Network File System)
asynchronous writes, configuring, 151
configuring IP for, 152
configuring to use TCP, 152
daemons, tuning, 147
examining client performance, 144
mount(ADM) options, configuring, 153
performance implications of daemons,
146
performance tuning, 142-154
server, examining performance, 145
synchronous writes, configuring, 151
tuning client performance, 147
nfsd daemons, performance tuning, 148
nfsstat(NADM)
-c,l44
-s,145
NGROUPS,202
NGROUPS_MAX runtime value, 202
NHBUF, 78, 80, 192

NHINODE, 193
nice value
changing, 30
use in calculating priority,29
NIS (Network Information Service)
clients, 154
performance considerations, 154
NKDVTTY, 222

NLOG,213
NMPBUF, 79, 84, 193
NMUXLINK,213
NODE,219
NOFILES,201
non-intelligent serial cards, 40, 108
Not enough space, 202
nping(PADM), 157
NSCRN,205
NSHINTR, 221


NSPTTYS, 58, 205
NSTREAM, 58, 129, 130, 210
NSTREVENT, 211
NSTRPAGES, 58, 128, 130
NSTRPUSH, 213
NSXT, 205
NUMSP, 58, 212
NUMTIM, 212
NUMTRW, 212
~,205

NZERO, 223

O
Oerrs field, netstat -i, 133
ompb/s field, sar -h, 86
one-packet mode
disabling, 226
enabling, 225
setting, 132
open file table, viewing, 55
operating system states, 22
oreqblk/s field, sar -S, 94
OSI protocol stack, 157
Out of clists ..., 197
out of streams, 129
ovclist/s field, sar -g, 112
overrun, interrupt, 111
ovsiodma/s field, sar -g, 112
ovsiohw/s field, sar -g, 111

P
packets
collisions, 133, 138
corrupted,133
output errors, 133
page stealing daemon, vhand, 44
pages, 42
paging
affecting I/O throughput, 69
heavy activity, 51
indicating memory shortage, 67
memory, 195
pages added to the free list, 50
pages not found in memory, 50
used by system, 44
Path MTU discovery, 230,233
perfect scaling, 31


performance
buying hardware, 8
collecting data, 16
defining goals, 16
formulating a hypothesis, 17
getting more specifics, 17
hard disk, 96
hardware considerations, 8
introduced, 7
making adjustments, 18
managing, 13
managing the workload, 19
tunable kernel parameters, 191
tuning applications, 9
tuning methodology, 14
upgrading hardware, 8
performance tuning, quick guide, 235-239
piece structure, 107
ping(ADMN),136
PIO (programmed I/O), 22
PLOWBUFS, 79,193
PMTU discovery, 230, 233
_POSIX_CHOWN_RESTRICTED, 203
PPP
ASYH parameters, 228
performance tuning, 136
preemption variable, 37
PRFMAX,219
prime number, testing for, 83
primove variable, 36
priority
table of values, 175
types of, 29
process
context, 28
examining activity of, 173
finding virtual size of, 52
fixed-priority, 28
kernel parameters, 195
limiting number of, 202
memory management, paging and
swapping, 195
nice value, 29
priority of, 29
regions, 209
scheduling, 24
specifying maximum time slice, 196
process table
allocating entries in, 208
viewing, 55
processes, reducing number of, 70

proc-sz field, sar -v, 55
profiling, 10, 162
profiling files, in databases, 122
programmed I/O, PIO, 22
protocol stack, implementation of, 125
ps(C)
-el,52
using, 173
pseudo-ttys, configuring, 205
putbuf buffer, 193
PUTBUFSZ, 193

Q
QIC-02 tape drive, size of buffer, 220

R
RAID (redundant array of inexpensive
disks), performance, 102
raw I/O, 118
rawch/s field, sar -y, 110
%rcache field, sar -b, 54, 69, 75,91
rchar/s field, sar -c, 161
rclm/s field, sar -p, 50, 51
rcvin/s field, sar -y, 110
read-ahead, 73
readv(S), I/O vector size, 203
receive window size
adjusting, 131
setting for each interface, 226
records, locked by system, 209
region table, 209
repeaters, 137
reqblk/s field, sar -S, 94
request counts, 104
rescheduling jobs, 40
resident pages wanted, 196
retrans field, nfsstat -c, 145
root filesystem
checkpointing,206
compression, 206
file synchronization, 206
logging, 206
undelete depth, 206
undelete time, 206
ROOTCHKPT,206
ROOTLOG,206
ROOTMAXVDEPTH,206
ROOTMINVTIME,206

ROOTNOCOMP,206
ROOTSYNC,206
routers, 125, 137
routing, performance considerations, 140-141
routing metric, adjusting, 225
run queue
heavy activity on, 38
runnable processes on, 24
viewing activity of, 30
viewing occupancy of, 30
viewing size of, 30
%runocc field, sar -q, 30
runq-sz field, sar -q, 30
r+w/s field, sar -d, 90

S
sadc (System Activity Data Collector), 177
sar(ADM), 176
-B,86
-b, 54, 69, 75, 91
-c,161
-d, 69,89,90
enabling for use, 177
-g,111
-h,86
-L,165
-m,162
-n, 82, 83, 91
-0,162
-p,50,51
-q, 30,49,68
-r, 48, 51, 68
-S, 93, 220
system activity reporter, 176
-u, 23, 146, 155 ,
-v, 55, 83,202,208
-w,49,68
-y,110
scaling, perfect, 31
scall/s field, sar -c, 161
scatter-gather buffer headers, used by
system, 85
scatter-gather buffers
number set using NMPBUF, 193
tuning number of, 86
used by system, 84


sched
daemon, 47
heavy activity by, 51
scheduler, purpose of, 24
scheduling
cache affinity, 36
cache_affinity variable, 36
dopricalc variable, 35
fixed-priority, 11
load balancing, 37
loadbalance variable, 37
of processes, 34
preemption variable, 37
primove variable, 36
priority calculations, 35
screen saver, 204
SCRNMEM,205
SCSI disk request blocks, tuning, 93
SCSI disks
request queue, 93
tuning number of request blocks, 93
sdevice file, entry for sleeper driver, 167
sdmabuf/s field, sar -h, 86
SDSKOUT, 93, 220
SECCLEARID,204
SECLUID,203
SECSTOPIO, 204
security
disabling C2 features, 203
kernel parameters, 203
SEMAEM,217
semaphores
kernel parameters, 216-217
POSIX.lb,163
System V, 162
used by database, 168
using, 163
viewing activity of, 162
sema/s field, sar -m, 163
SEMMAP,216
SEMMNI,216
SEMMNS,163,217
SEMMNU,216
SEMMSL,217
SEM_NSEMS_MAX, 163, 217
SEMOPM,217
SEMUME,217
SEMVMX,217

send window size
adjusting, 131
setting for each interface, 226
Sequenced Packet Exchange, SPX, 157
serial I/O
device driver, 108
tuning, 110
server types, 139
setconf(ADM), 224
SGID bits, 204
shared memory
by CPUs,33
kernel parameters, 218
used by databases, 122
using, 165
shell layers, setting number of, 205
SHMMAX, 165, 218
SHMMIN,218
SHMMNI, 165, 218

sio, serial driver, 108
sleeper driver, 163, 167, 168
SLIP, performance tuning, 135
slpcpybufs/s field, sar -B, 86
spin locks, 165
split job, 103
SPTMAP, 197
spurious interrupts, 110
SPX (Sequenced Packet Exchange), 157
sread/s field, sar -c, 161
stack, region, 209
static shared libraries, 11
stopio(S), 204
STRCTLSZ, 213
strd, daemon, 129
stream event, structures, 211
stream head, 123
stream heads
configuring number of, 129, 210
structures, 210
STREAMS
applications using, 129
buffer splitting, 211
configuring number of pipes, 212
kernel parameters, 209-213
message buffers, 127
message header, 127
messages, 124, 127

monitoring use, 129
multiplexer links, 213
performance tuning, 123-131
too few stream heads, 129

STREAMS message, control portion, 213
STREAMS modules
kernel parameters,212
number on stream, 213
STREAMS pipes, configuring number of, 212
string: Configuration buffer full, 219
strinit - Cannot alloc STREAMS table, 210
striped disks, 98
STRMSGSZ, 212
stropen - Out of streams,210
STRSPLITFRAC, 128, 211
stune file, 190
subnets, 139
SUDS library
AIO, 162
semaphores, 163
spin locks and latches, 165
SUID bits, 204
supplemental groups, limiting, 202
swap area
adding, 179
deleting, 179
examining usage of, 179
size of, 48
unused disk blocks in,48
used by system, 47
swap queue
activity on, 51
used by system, 49
viewing occupancy of, 49
viewing size of, 49
swap(ADM)
-l, 48, 68
using,179
swapdel - Total swap area too small, 196
swappable pages wanted, 196
swapper daemon, sched, 47
swapping
activity,51
affecting I/O throughput, 69
consuming CPU resources, 40
heavy activity, 51
indicating memory shortage, 67
kernel parameters, 195
memory,195
used by system, 47
viewing activity of, 49
%swpocc field, sar -q, 49, 68
swpot/s field, sar -w, 68
swpq-sz field, sar -q, 49, 68
swrit/s field, sar -c, 161

synchronous I/O, 72
synchronous writes, configuring on NFS
server, 151
%sys field, sar -u, 23
system, increasing reliability of, 194
system activity, per command, 180
System Activity Data Collector, sadc, 177
system activity recording, enabling, 177
system calls
excessive number of, 162, 168
investigating activity of, 161
number of characters read, 161
number of characters written, 161
number of execs, 162
number of forks, 162
number of reads, 161
number of writes, 161
reducing, 166
total number of, 161
system mode, operating system state, 22
system name, 219
system resources, kernel, 189, 190
system tables, viewing, 55
SZ field, ps -el, 52

T
tape drive buffer, size of, 220
TBLDMAPAGES,207
TBLLIMIT, 207
TBLMAP,207
TBLNK,204
TBLPAGES, 207
TBLSYSLIMIT,207
TCP (Transmission Control Protocol)
introduced, 131
parameters, 232
tcpalldebug, 234
tcpconsdebug, 234
tcp_initial_timeout, 232
TCP/IP

daemons, performance implications, 134
global parameters, changing, 226
maximum segment size, adjusting, 131
one-packet mode, setting, 132
parameters, 227-234
performance considerations, 132
performance tuning, 131-142
problem solving, 132
receive window size, adjusting, 131

send window size, adjusting, 131
setting receive window size, 226
setting send window size, 226
setting truncate segment, 226
time-to-live, setting, 132
using with NFS, 152
tcp_keepidle, 233
tcp_keepintvl, 233
tcp_mssdflt,233
tcp_mss_sw_threshold, 233
tcp_nkeep, 233
tcp_offer_big_mss, 233
tcpprintfs, 234
tcp_small_recvspace, 234
tcp_urgbehavior,234
terminal driver,109
text
of program, 42
region, 209
shared,209
ThickNet, 137
ThinNet, 137
throughput, disk, 90
time slice, 26
timeout field, nfsstat -c, 145
time-to-live, setting, 132
timex(ADM), 180
TIMEZONE, 219
timezone variable, TZ, 219
timod(M), STREAMS modules, 212
TLI (Transport Library Interface), kernel parameters, 212
Too big, 202
traceroute(ADMN),136
transfer buffers
number set using NMPBUF, 193
tuning number of, 86
used by system, 84
Transport Library Interface, TLI, kernel parameters, 212
trigger level, UART, 111
truncate segment, setting for each interface,
226
TTHOG,111

tty
configuration parameters, 197
kernel parameters, 204
terminal driver, 109

tuning
CPU resources, 21-40
CPU-bound systems, 40
disk I/O-bound systems, 92
increasing disk I/O throughput, 75
increasing speed of access to buffers, 80
I/O resources, 71-122
LMCFS performance, 155
memory resources, 41-70
memory-bound systems, 52
methodology, 14
networking resources, 123-160
NFS client performance, 147
NFS performance, 146
number of biod daemons, 149
number of nfsd daemons, 148
PPP performance, 136
reducing contention for buffers, 80
reducing contention for multiphysical
buffers, 86
reducing contention for SCSI disk request
blocks, 93
reducing disk I/O using the namei cache,
83
serial I/O, 110
SLIP performance, 135
STREAMS resources, 130
system call activity, 161-169
TCP/IP performance, 131
virtual disk performance, 100
X server performance, 57
twisted pair, 137
TZ (timezone) variable, 219

U
UART (universal asynchronous
receiver/transmitter), 108
UDP, parameters, 234
udpprintfs, 234
ULIMIT,202
umask(S), default mask, 202
undelete, controlling time, 206
undelete depth, 206
undelete time, 206
undo structures, number in system, 216
universal asynchronous
receiver/transmitter, UART, 108
upgrading hardware, 8
user configuration, kernel parameters, 201

User Datagram Protocol, parameters, 234
user mode, operating system state, 22
%usr field, sar -u, 23
/usr/adm/messages, 42, 74

V
vcview(LMC), -v, 155
VDASYNCMAX, 107,201
VDASYNCPARITY,201
VDASYNCWRITES, 201
VDHASHMAX, 108,200
vdisk - job pool is empty, 107
vdisk - job queue is full, 107
vdisk - piece pool is empty, 107
vdisk driver, 100, 105
VDJOBS, 107,200
VDRPT,201
VDUNITJOBS, 107, 200
VDUNITMAX,200
VDWRITEBACK, 201
versioning, controlling depth of, 205
vflt/s field, sar -p, 50
VGA_PLASMA,221
vhand
daemon, 44, 195
heavy activity by, 51
viewing, block I/O, 89
virtual, terminals, 222
virtual connection, 126
virtual disks
asynchronous writes, 107
balancing disk load, 105
buffer headers, 106
choosing a cluster size, 103
comparison of configurations, 100
CPU requirements, 100
examining request counts, 104
full stripe, 103
hash table, 108
job pool, 107
kernel parameters, 200
memory requirements, 101
number of job structures, 107
number of jobs, 107
number of piece pool entries, 107
piece pool, 107
RAID 4 and 5 performance, 102
split job, 103
striping, 98

tuning, 100-108
tuning kernel parameters, 106
used by databases, 103
using, 98
vdisk driver, 100
write-back caching, 201
virtual memory statistics, examining, 181
vmstat(C), virtual memory statistics, 181

W
wait field, nfsstat -c, 145
waiting for I/O, operating system state, 22
%wcache field, sar -b, 54,69,75,91
wchar/s field, sar -c, 161
window sizes, setting for each interface,
226
%wio field
cpusar -u, 119
sar -u, 23, 39, 90
workload, managing, 19
writev(S), I/O vector size, 203

X
X client
performance, 56
unable to start, 58
X server, tuning, 57
X Window System, configuration, 58
xdrcall field, nfsstat -s, 145
XENIX, shared data segments, 218
XENIX semaphores, 217
xsd_alloc - XENIX shared data table
overflow, 218
XSDSEGS, 218
XSDSLOTS, 218
xsem_alloc - XENIX semaphore table
overflow, 217
XSEMMAX,217
xtinit - Cannot allocate xt link buffers, 205


1 May 1995

AU20004P001


