SCO OpenServer™ Performance Guide

© 1983-1995 The Santa Cruz Operation, Inc. All rights reserved. © 1992-1994 AT&T Global Information Solutions Company; © 1987-1989 Legent Corporation; © 1980-1989 Microsoft Corporation; © 1993-1994 Programmed Logic Corporation; © 1988 UNIX Systems Laboratories, Inc. All rights reserved.

No part of this publication may be reproduced, transmitted, stored in a retrieval system, nor translated into any human or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual, or otherwise, without the prior written permission of the copyright owner, The Santa Cruz Operation, Inc., 400 Encinal Street, Santa Cruz, California, 95060, USA. Copyright infringement is a serious matter under the United States and foreign Copyright Laws.

Information in this document is subject to change without notice and does not represent a commitment on the part of The Santa Cruz Operation, Inc.

The SCO logo, The Santa Cruz Operation, Open Desktop, ODT, Panner, SCO Global Access, SCO OK, SCO OpenServer, SCO MultiView, SCO Visual Tcl, Skunkware, and VP/ix are trademarks or registered trademarks of The Santa Cruz Operation, Inc. in the USA and other countries. UNIX is a registered trademark in the USA and other countries, licensed exclusively through X/Open Company Limited. All other brand and product names are or may be trademarks of, and are used to identify products or services of, their respective owners.

Document Version: 5.0, 1 May 1995

The SCO software that accompanies this publication is commercial computer software and, together with any related documentation, is subject to the restrictions on US Government use as set forth below. If this procurement is for a DOD agency, the following DFAR Restricted Rights Legend applies:

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of Rights in Technical Data and Computer Software Clause at DFARS 252.227-7013. Contractor/Manufacturer is The Santa Cruz Operation, Inc., 400 Encinal Street, Santa Cruz, CA 95060.

If this procurement is for a civilian government agency, the following FAR Restricted Rights Legend applies:

This computer software is submitted with restricted rights under Government Contract No. (and Subcontract No. , if appropriate). It may not be used, reproduced, or disclosed by the Government except as provided in paragraph (g)(3)(i) of FAR Clause 52.227-14 alt III or as otherwise expressly stated in the contract. Contractor/Manufacturer is The Santa Cruz Operation, Inc., 400 Encinal Street, Santa Cruz, CA 95060.

RESTRICTED RIGHTS LEGEND: The copyrighted software that accompanies this publication is licensed to the End User only for use in strict accordance with the End User License Agreement, which should be read carefully before commencing use of the software.

This SCO software includes software that is protected by these copyrights: © 1983-1995 The Santa Cruz Operation, Inc.; © 1989-1994 Acer Incorporated; © 1989-1994 Acer America Corporation; © 1990-1994 Adaptec, Inc.; © 1993 Advanced Micro Devices, Inc.; © 1990 Altos Computer Systems; © 1992-1994 American Power Conversion, Inc.; © 1988 Archive Corporation; © 1990 ATI
Technologies, Inc.; © 1976-1992 AT&T; © 1992-1994 AT&T Global Information Solutions Company; © 1993 Berkeley Network Software Consortium; © 1985-1986 Bigelow & Holmes; © 1988-1991 Carnegie Mellon University; © 1989-1990 Cipher Data Products, Inc.; © 1985-1992 Compaq Computer Corporation; © 1986-1987 Convergent Technologies, Inc.; © 1990-1993 Cornell University; © 1985-1994 Corollary, Inc.; © 1988-1993 Digital Equipment Corporation; © 1990-1994 Distributed Processing Technology; © 1991 D.L.S. Associates; © 1990 Free Software Foundation, Inc.; © 1989-1991 Future Domain Corporation; © 1994 Gradient Technologies, Inc.; © 1991 Hewlett-Packard Company; © 1994 IBM Corporation; © 1990-1993 Intel Corporation; © 1989 Irwin Magnetic Systems, Inc.; © 1988-1994 IXI Limited; © 1988-1991 JSB Computer Systems Ltd.; © 1989-1994 Dirk Koeppen EDV-Beratungs-GmbH; © 1987-1994 Legent Corporation; © 1988-1994 Locus Computing Corporation; © 1989-1991 Massachusetts Institute of Technology; © 1985-1992 Metagraphics Software Corporation; © 1980-1994 Microsoft Corporation; © 1984-1989 Mouse Systems Corporation; © 1989 Multi-Tech Systems, Inc.; © 1991 National Semiconductor Corporation; © 1990 NEC Technologies, Inc.; © 1989-1992 Novell, Inc.; © 1989 Ing. C. Olivetti & C. SpA; © 1989-1992 Open Software Foundation, Inc.; © 1993-1994 Programmed Logic Corporation; © 1989 Racal InterLan, Inc.; © 1990-1992 RSA Data Security, Inc.; © 1987-1994 Secureware, Inc.; © 1990 Siemens Nixdorf Informationssysteme AG; © 1991-1992 Silicon Graphics, Inc.; © 1987-1991 SNMP Research, Inc.; © 1987-1994 Standard Microsystems Corporation; © 1984-1994 Sun Microsystems, Inc.; © 1987 Tandy Corporation; © 1992-1994 3COM Corporation; © 1987 United States Army; © 1979-1993 Regents of the University of California; © 1993 Board of Trustees of the University of Illinois; © 1989-1991 University of Maryland; © 1986 University of Toronto; © 1976-1990 UNIX System Laboratories, Inc.; © 1988 Wyse Technology; © 1992-1993 Xware; © 1983-1992 Eric P. Allman; © 1987-1989 Jeffery D. Case and Kenneth W. Key; © 1985 Andrew Cherenson; © 1989 Mark H. Colburn; © 1993 Michael A. Cooper; © 1982 Pavel Curtis; © 1987 Owen DeLong; © 1989-1993 Frank Kardel; © 1993 Carlos Leandro and Rui Salgueiro; © 1986-1988 Larry McVoy; © 1992 David L. Mills; © 1992 Rainer Pruy; © 1986-1988 Larry Wall; © 1992 Q. Frank Xia. All rights reserved.

SCO NFS was developed by Legent Corporation based on Lachman System V NFS. SCO TCP/IP was developed by Legent Corporation and is derived from Lachman System V STREAMS TCP, a joint development of Lachman Associates, Inc. (predecessor of Legent Corporation) and Convergent Technologies, Inc.

Table of contents

About this book
    How this book is organized
    Related documentation
    Typographical conventions
    How can we improve this book?

Chapter 1  What determines performance
    Hardware factors that influence performance
    Software factors that influence performance

Chapter 2  Managing performance
    Tuning methodology
        Defining performance goals
        Collecting data
        Formulating a hypothesis
        Getting more specifics
        Making adjustments to the system
    Performance tuning case studies
    Managing the workload

Chapter 3  Tuning CPU resources
    Operating system states
        Viewing CPU activity
        Process states
        Clock ticks and time slices
        Context switching
        Interrupts
        Calculation of process priorities
        Examining the run queue
    Multiprocessor systems
        Support for multiple processors
        Using the mpstat load displayer
        Examining interrupt activity on multiprocessor systems
    Process scheduling
        Adjusting the scheduling of processes
        Controlling priority calculations - dopricalc
        Controlling the effective priority of processes - primove
        Controlling cache affinity - cache_affinity
        Controlling process preemption - preemptive
        Load balancing - loadbalance
    Identifying CPU-bound systems
    Tuning CPU-bound systems

Chapter 4  Tuning memory resources
    Physical memory
    Virtual memory
        Paging
        Swapping
        Viewing physical memory usage
        Viewing swap space usage
        Viewing swapping and paging activity
    Identifying memory-bound systems
    Tuning memory-bound systems
        Reducing disk activity caused by swapping and paging
        Increasing memory by reducing the buffer cache size
        Investigating memory usage by system tables
    Using graphical clients on low memory systems
        Tuning X server performance
        Kernel parameters that affect the X Window System
    Case study: memory-bound workstation
        System configuration
        Defining a performance goal
        Collecting data
        Formulating a hypothesis
        Getting more specifics
        Making adjustments to the system
    Case study: memory-bound software development system
        System configuration
        Defining a performance goal
        Collecting data
        Formulating a hypothesis
        Getting more specifics
        Making adjustments to the system

Chapter 5  Tuning I/O resources
    Subsystems that affect disk and other I/O
    How the buffer cache works
        Viewing buffer cache activity
        Increasing disk I/O throughput by increasing the buffer cache size
        Positioning the buffer cache in memory
        Tuning the number of buffer cache hash queues
    How the namei cache works
        Viewing namei cache activity
        Reducing disk I/O by increasing the size of the namei cache
    How multiphysical buffers are used
        Tuning the number of multiphysical buffers
    The mechanics of a disk transfer
        Viewing disk and other block I/O activity
        Identifying disk I/O-bound systems
        Tuning disk I/O-bound systems
        SCSI disk driver request queue
        Tuning the number of SCSI disk request blocks
        Filesystem factors affecting disk performance
    Overcoming performance limitations of hard disks
    Tuning virtual disk performance
        Performance considerations for RAID 4 and 5
        Choosing a cluster size
        Balancing disk load in virtual disk arrays
        Tuning virtual disk kernel parameters
    Serial device resources
        Tuning serial device resources
    Case study: I/O-bound multiuser system
        System configuration
        Defining a performance goal
        Collecting data
        Formulating a hypothesis
        Getting more specifics
        Making adjustments to the system
    Case study: unbalanced disk activity on a database server
        System configuration
        Defining a performance goal
        Collecting data
        Formulating a hypothesis
        Getting more specifics
        Making adjustments to the system

Chapter 6  Tuning networking resources
    STREAMS resources
        Monitoring STREAMS performance
        Tuning STREAMS usage
    TCP/IP resources
        Tuning TCP/IP performance
        Monitoring TCP/IP performance
    NFS resources
        Monitoring NFS performance
        Tuning NFS performance
    LAN Manager Client Filesystem resources
        Tuning LAN Manager Client Filesystem performance
    Other networking resources
    Case study: network overhead caused by X clients
        System configuration
        Defining a performance goal
        Collecting data
        Formulating a hypothesis
        Getting more specifics
        Making adjustments to the system

Chapter 7  Tuning system call activity
    Viewing system call activity
        Identifying excessive read and write system call activity
        Viewing process fork and exec activity
        Viewing AIO activity
        Viewing IPC activity
    Reducing system call activity
    Case study: semaphore activity on a database server
        System configuration
        Defining a performance goal
        Collecting data
        Formulating a hypothesis
        Getting more specifics
        Making adjustments to the system

Appendix A  Tools reference
    df - report disk space usage
    ps - check process activity
    sar - system activity reporter
        How sar works
        Running sar
    swap - check and add swap space
    timex - examine system activity per command
    vmstat - virtual memory statistics

Appendix B  Configuring kernel parameters
    When to change system parameters
    Configuration tools
        Using configure to change kernel resources
        Using idtune to reallocate kernel resources
    Kernel parameters that you can change using configure
    Examining and changing configuration-dependent values

Appendix C  Configuring TCP/IP tunable parameters
    Using ifconfig to change parameters for a network card
    Using inconfig to change global TCP/IP parameters
    TCP/IP parameters

Appendix D  Quick system tuning reference

Bibliography

Glossary of performance terminology

About this book

This book is for administrators of SCO OpenServer™ systems who are interested in investigating and improving system performance.
It describes performance tuning for uniprocessor, multiprocessor, and networked systems, including those with TCP/IP, NFS®, and X clients. It discusses how the various subsystems function, possible performance constraints due to hardware limitations, and optimizing system configuration for various uses. Concepts and strategies are illustrated with case studies.

You will find the information you need more quickly if you are familiar with:

• "How this book is organized" (this page)
• "Related documentation" (page 2)
• "Typographical conventions" (page 5)

Although we try to present information in the most useful way, you are the ultimate judge of how well we succeed. Please let us know how we can improve this book (page 6).

How this book is organized

This book tells you:

• what is meant by system performance (page 7)
• how to tune a system (page 13)
• how the configuration of various system components influences the performance of the operating system:
  - Central Processing Units (CPUs) (page 21) for single and multiprocessor systems
  - memory (page 41) including physical (main) memory in Random Access Memory (RAM) and swap areas on disk
  - Input/Output (I/O) (page 71) including hard disks and serial devices
  - networking (page 123) including STREAMS I/O, TCP/IP, and NFS
• how you can examine system call activity (page 161) if you are an application programmer

A set of case studies (page 19) illustrates the methodology of system tuning, and the tools that you can use to examine performance.

Appendixes provide additional information about:

• the tools (page 171) that you can use to examine performance
• the kernel parameters (page 185) that you can use to tune performance
• a quick guide to system tuning (page 235)

There is also a glossary (page 243) which explains technical terms and acronyms used throughout the book.

Related documentation

SCO OpenServer systems include comprehensive documentation. Depending on which SCO OpenServer system you have, the following books are available in online and/or printed form. Access online books by double-clicking on the Desktop Help icon. Additional printed versions of the books are also available.

The Desktop and most SCO OpenServer programs and utilities are linked to extensive context-sensitive help, which in turn is linked to relevant sections in the online versions of the following books. See "Getting help" in the SCO OpenServer Handbook.

NOTE: When you upgrade or supplement your SCO OpenServer software, you might also install online documentation that is more current than the printed books that came with the original system. For the most up-to-date information, check the online documentation.

Release Notes contain important late-breaking information about installation, hardware requirements, and known limitations. The Release Notes also highlight the new features added for this release.

SCO OpenServer Handbook provides the information needed to get your SCO OpenServer system up and running, including installation and configuration instructions, and introductions to the Desktop, online documentation, system administration, and troubleshooting.

Graphical Environment Guide describes how to customize and administer the Graphical Environment, including the X Window System™ server, the SCO® Panner™ window manager, the Desktop, and other X clients.
Graphical Environment help provides online context-sensitive help for Calendar, Edit, the Desktop, Help, Mail, Paint, the SCO Panner window manager, and the UNIX® command-line window.

Graphical Environment Reference contains the manual pages for the X server (section X), the Desktop, and X clients from SCO and MIT (section XC).

Guide to Gateways for LAN Servers describes how to set up SCO® Gateway for NetWare® and LAN Manager Client software on an SCO OpenServer system to access printers, filesystems, and other services provided by servers running Novell® NetWare® and by servers running LAN Manager over DOS, OS/2®, or UNIX systems. This book contains the manual pages for LAN Manager Client commands (section LMC).

Mail and Messaging Guide describes how to configure and administer your mail system. Topics include sendmail, MMDF, SCO Shell Mail, mailx, and the Post Office Protocol (POP) server.

Networking Guide provides information on configuring and administering TCP/IP, NFS®, and IPX/SPX™ software to provide networked and distributed functionality, including system and network management, applications support, and file, name, and time services.

Networking Reference contains the command, file, protocol, and utility manual pages for the IPX/SPX (section PADM), NFS (sections NADM, NC, and NF), and TCP/IP (sections ADMN, ADMP, SFF, and TC) networking software.

Operating System Administrator's Reference contains the manual pages for system administration commands and utilities (section ADM), system file formats (section F), hardware-specific information (section HW), miscellaneous commands (section M), and SCO Visual Tcl™ commands (section TCL).

Operating System Tutorial provides a basic introduction to the SCO OpenServer operating system. This book can also be used as a refresher course or a quick-reference guide. Each chapter is a self-contained lesson designed to give hands-on experience using the SCO OpenServer operating system.

Operating System User's Guide provides an introduction to SCO OpenServer command-line utilities, the SCO Shell utilities, working with files and directories, editing files with the vi editor, transferring files to disks and tape, using DOS disks and files in the SCO OpenServer environment, managing processes, shell programming, regular expressions, awk, and sed.

Operating System User's Reference contains the manual pages for user-accessible operating system commands and utilities (section C).

PC-Interface Guide describes how to set up PC-Interface™ software on an SCO OpenServer system to provide print, file, and terminal emulation services to computers running PC-Interface client software under DOS or Microsoft® Windows™.

SCO Merge User's Guide describes how to use and configure an SCO® Merge™ system. Topics include installing Windows, installing DOS and Windows applications, using DOS with the SCO OpenServer operating system, configuring hardware and software resources, and using SCO Merge in an international environment.

SCO Wabi User's Guide describes how to use SCO® Wabi™ software to run Windows 3.1 applications on the SCO OpenServer operating system. Topics include installing the SCO Wabi software, setting up drives, configuring ports, managing printing operations, and installing and running applications.

System Administration Guide describes configuration and maintenance of the base operating system, including account, filesystem, printer, backup, security, UUCP, and virtual disk management.
The SCO OpenServer Development System includes extensive documentation of application development issues and tools. Many other useful publications about SCO systems by independent authors are available from technical bookstores.

Typographical conventions

This publication presents commands, filenames, keystrokes, and other special elements in these typefaces:

lp or lp(C)
    commands, device drivers, programs, and utilities (names, icons, or windows); the letter in parentheses indicates the reference manual section in which the command, driver, program, or utility is documented

/new/client.list
    files, directories, and desktops (names, icons, or windows)

root
    system, network, or user names

filename
    placeholders (replace with appropriate name or value)

(Esc)
    keyboard keys

Exit program?
    system output such as prompts and messages

yes
    user input

"Description"
    field names or column headings (on screen or in database)

open or open(S)
    library routines, system calls, kernel functions, C keywords; the letter in parentheses indicates the reference manual section in which the file is documented

$HOME
    environment or shell variables

SIGHUP
    named constants or signals

buf
    C program structures, structure members, and variables

How can we improve this book?

What did you find particularly helpful in this book? Are there mistakes in this book? Could it be organized more usefully? Did we leave out information you need or include unnecessary material? If so, please tell us.

To help us implement your suggestions, include relevant details, such as book title, section name, page number, and system component. We would appreciate information on how to contact you in case we need additional explanation.

To contact us, use the card at the back of the SCO OpenServer Handbook, or write to us at:

    Technical Publications
    Attn: CFT
    The Santa Cruz Operation, Inc.
    PO Box 1900
    Santa Cruz, California 95061-9969
    USA

or e-mail us at:

    techpubs@sco.com or ... uunet!sco!techpubs

Thank you.

Chapter 1
What determines performance

A computer system consists of a finite set of hardware and software components. These components constitute the resources of the system. One of the tasks of the operating system is to share these resources between the programs that are running on the system. Performance is a measure of how well the operating system does this task; the aim of performance tuning is to make it do this task better.

A system's hardware resources have inherent physical limits in the quantity of data they can handle and the speed with which they can do this. The physical subsystems that compose the hardware include:

• One or more central processing units (CPUs), and the ancillary processors that support them.

• Memory - both in Random Access Memory (RAM) and as swap space on disk.

• I/O devices including hard and floppy disk drives, tape drives, serial ports, and network cards.

• Networks - both Local Area Networks (LANs) and Wide Area Networks (WANs).

Operating system resources are limited by the hardware resources such as the amount of memory available and how it is accessed. The internal resources of the operating system are usually configurable and control such things as the size of data structures, security policy, standards conformance, and hardware modes.
Examples of operating system resources are:

• The tables that the operating system uses to keep track of users and the programs they are running.

• The buffer cache and other memory buffers that reduce dependence on accessing slow peripheral devices.

If your system is connected to one or more networks, it may depend on remote machines to serve files, perform database transactions, perform calculations, run X clients, and provide swap space, or it may itself provide some of these services. Your system may be a router or gateway if it is connected to more than one network. In such cases, the performance of the network and the remote machines will have a direct influence on the performance of your system.

Hardware factors that influence performance

Your system's hardware has the greatest influence on its performance. It is the ultimate limiting factor on how fast a process will run before it has to start sharing what is available with the operating system and other user processes.

Performance tuning can require you to add hardware or upgrade existing hardware if a system's physical subsystems are unbalanced in power, or insufficiently powerful to satisfy the demands being put on them. There may come a time when, despite your best efforts, you cannot please enough people enough of the time with the hardware resources at your disposal. If so, you will have to go and buy some more hardware. This is one reason why monitoring and recording your system's performance is important if you are not the person spending the money. With the information that you have gathered, you can make a strong case for upgrading your system.

It is important to balance the power of your computer's subsystems with each other; the power of the CPU(s) is not enough in itself. If the other subsystems are slow relative to the available processing power, they will act to constrain it. If they are more powerful, you have possibly overspent, although you should be able to upgrade processing power without much extra expenditure.

There are many hardware factors that can limit the overall system performance:

• The speed and width of the system's address and data buses.

• The model, clock speed, and the size of the internal level-one (L1) memory cache of the system's CPU or CPUs.

• The size of the level-two (L2) cache memory which is external to the CPU. This should be capable of working with all of physical memory.
• Support for fully distributed interrupts to allow any CPU to service interrupts from I/O devices such as network and disk controllers. • The memory and I/O subsystems must be as fast as possible to keep up with the demands of the enhanced cPU performance. Use of intelligent peripheral controllers is particularly desirable. Software factors that influence performance The way in which applications are written usually has a large impact on performance. If they make inefficient use of processing power, memory, disk, or other subsystems, it is unlikely that you will improve the situation significantly by tuning the operating system. The efficiency of the algorithms used by an application, or the way that it uses system services, are usually beyond your control unless you have access to source code. Some applications such as large relational database systems provide extensive facilities for performance monitoring and tuning which you should study separately. 9 Software factors that in fluence performance • Is it using large numbers of system calls? System calls are expensive in processing overhead and may cause a context switch on the return from the call. You can use trace(CP) to discover the system call usage of a program. • Is it using inefficient read(S) and write(S) system calls to move small numbers of characters at a time between user space and kernel space? If possible use buffered I/O to avoid this. • Are formatted reads and writes to disk being used? Unformatted reads and writes are much more efficient for maintaining precision, speed of access, and generally need less disk space. • Is the application using memory efficiently? Many older applications use disk extensively since they were written in the days of limited core storage and expensive memory. • What version of malloc(S) does the application use (if it uses it at all)? The version in the libmalloc.a library allows more control over the allocation of memory than the version in libc.a. Memory leakage can occur if you do not call free(S) to place blocks of memory back in the malloc pool when you have finished with them. • Does the application group together routines that are used together? This technique (known as localization of reference) tends to reduce the number of text pages that need to be accessed when the program runs. (The system does not load pages of program text into memory when a program runs unless they are needed for the program's execution.) • Does the application use shared libraries or dynamic linked libraries (DLLs)? The object code of shared libraries can be used by several applications at the same time; the object code of DLLs is also shared and is only loaded when an application needs to access it. Using either type of library is preferable to using statically linked libraries which cannot be shared. • Does the application use library routines and system calls that are intended to enhance performance? Examples of the APls provided are: Memory-mapping loads files directly into memory for processing (see mmap(S)). Fixed-priority scheduling allows selected time-critical processes to control how they are scheduled and ensure that they execute when they have work to perform. Applications can use the predictable scheduling behavior to improve throughput and reduce contention (see sched_setparam(S) and sched~etparam(S)). Support for high performance asynchronous I/O, semaphores and latches, and high-resolution timers and spin locks for use by threaded applications (see aio(FP), semaphore(FP), and time(FP)). 
11 ~hatderermmesperlormance 12 Perlormance Guide Chapter 2 Managing perfonnance To manage the performance of a system, you normally try to share the available resources equally between its users. However, different users perceive performance according on their own needs and the demands of the applications that they are running. If they use interactive programs, response time is likely to be their main index of performance. Someone interested in performing numeric analysis may only be worried about the turnaround time for off-line batch mode processing. Another person may wish to perform sophisticated image processing in real time and requires quick access to graphics data files. You, as the administrator, are interested in maximizing the throughput of all jobs submitted to the system - in fact, keeping everyone happy. Unfortunately, such differing requirements may be difficult to reconcile. For example, if you administer a single standalone system, you may decide that your main priority is to improve the interactive response time. You may be able to do this by decreasing the overall workload at peak usage times. 'This would involve scheduling some work to run as batch jobs at quieter times, or perhaps restricting simultaneous access to your system to a smaller number of users. However, in speeding up your system's response you now have the additional problem of decreased throughput, which results in the completion of fewer jobs, potentially at critical times. In pursuing any particular performance improvement policy there are always likely to be trade-offs, especially in a situation where resources are at a premium. The next section covers the setting of realistic performance goals as the first step in improving the performance of your computer system. You are then given a method for observing and tuning a system. 13 Managing performance Tuning methodology You can optimize performance by system tuning. This is the process of making adjustments to the way in which various system resources are used so that they correspond more closely to the overall usage patterns of the system. You can improve the overall response time of the system by locating and removing system bottlenecks. You can also customize the various resources to correspond to the needs of an application that is run frequently on the system. Any system tuning that you perform is limited because the performance of an operating system depends closely on the hardware on which it is installed. To tune a system efficiently, you need a good understanding both of the various system resources, and of how the system is going to be used. This might also involve understanding how different applications use system resources. System tuning is an ongoing process. A well-tuned system may not remain so if the mix of applications and users changes. Once a system has been successfully tuned, you should monitor performance regularly as part of routine system administration. This allows you to make modifications when changes in performance first occur, and not when the performance degrades to the point where the system becomes unusable. You may be able to extend a system's resources by adding or reconfiguring hardware, but remember that these resources always remain finite. Also you should always bear in mind that there is no exact formula for tuning a system - performance is based on the mixture of applications running on the system, the individuals using them, and your perception of the system's performance. 
The flowchart shown in Figure 2-1 (page 15) illustrates the tuning methodology we recommend you follow. Its most important feature is its feedback loop - you may not always get the result you expect when you make changes to your system. You must be prepared to undo your changes so that you can restore your system to its earlier state. The steps outlined in the methodology are described in the following sections. They are further illustrated by the set of case studies discussed in "Performance tuning case studies" (page 19). 14 Performance Guide Tuning methodology Figure 2-1 Flowchart illustrating the methodology for system performance tuning 15 Managing performance Defining performance goals The first step in tuning a system is to define a set of performance goals. This can range from discovering and removing system bottlenecks in order to improve overall performance, to tuning the system specifically to run a single application, set of applications, or benchmark as efficiently as possible. The performance goals should be listed in order of priority. Often goals can conflict; for example, a system running a database that uses a large cache might also require a large portion of memory to compile programs during software development. Assigning priority to these goals might involve deciding whether the database performance or the speed of the compilations is more important. You should attempt to understand all goals as well as possible. If possible, you should note which resources will be affected by each goal. If you specify several goals, it is important that you understand where they might conflict. Although this guide assumes that you are a system administrator, the goals identified for the tuning of the various subsystems also reflect the perspectives and needs of users, and application developers. Collecting data Once you have identified your performance goals, your next step is to determine how the system is performing at present. The aspects of a system's performance that you measure depend on the sort of tasks you expect it to carry out. These are some typical criteria that you might use to judge a system: • The time taken for an interactive application to perform a task. • The time taken to process a database transaction. • The time taken for an application to perform a set number of calculations. If the system is meant to perform a single function, or run a particular applica- tion or benchmark, then you might only look at specific resources. However, it can still be helpful to acquire a sense of the performance of the entire system. If the goals set for the system involve the tuning of applications, then the tuning information provided with the application should be applied before looking at more general system performance. I NOTE It is often possible to improve performance by the careful design and implementation of an application, or by tuning an existing application, rather than by tuning the operating system. 16 Performance Guide Tuning methodology To gain an overview of the system's current performance, you should read and use Appendix A, "Tools reference" (page 171) which discusses the various system resources, and how you can monitor these. You should collect data over a duration that is long enough for you to be able to establish normal patterns of usage. You should not make decisions that may be based on observations of performance anomalies though your goal may be to smooth these out. 
If your goal involves improving the performance of a particular application, you must understand the application's use of systems resources if you suspect that it is not performing as well as it should. Tuning information may be available in the documentation provided with the application. If this is not available, then an indication of how the application uses resources can be gained by gathering information for a period before installing the application, and comparing that information with information gathered while the application is in use. Formulating a hypothesis The next step is to determine what is causing the difference between the current performance and your performance goal. You need to understand the subsystems that have an influence on being able to achieve this goal. Begin with a hypothesis, that is, your best informed guess, of the factors that are critical for moving the system toward the goal. You can then use this hypothesis to make adjustments to the system on a trial basis. If this approach is used then you should maintain a record of adjustments that you have made. You should also keep all the data files produced with the various monitoring commands such as timex(ADM), and sar(ADM). This is useful when you want to confirm that a side effect noticed after a recent change was caused by that change and did not occur previously. Getting more specifics Once you have formulated your hypothesis, look for more specific information. If this information supports the hypothesis, then you can make adjustments to kernel parameters or the hardware configuration to try to improve the performance. If the new information indicates that your hypothesis is wrong then you need to form another. See Appendix D, "Quick system tuning reference" (page 235) for a description of how to diagnose common performance problems. 17 Managing performance Making adjustments to the system Once it appears that the hypothesis is correct, you can make adjustments to the system. It is vital that you record the parameters that the system had initially, and the changes that you make at each stage. Make all adjustments in small steps to ensure that they have the desired effect. After each adjustment, reassess the system's performance using the same commands that you used to measure its initial performance. You should normally adjust kernel parameters one at a time so that you can uniquely identify the effect that an adjustment has. If you adjust several things at once, the interaction between them may mask the effect of the change. Some parameters, however, are intended to be adjusted in groups rather than singly. In such a case, always adjust the minimum number of parameters, and always adjust the same set of parameters. Examples of such groups of parameters are NBUF and NHBUF, and HTCACHEENTS, HTHASHQS and HTOFBIAS. If your adjustment degrades system performance, retrace your steps to a point where it was at its peak before trying to adjust any other parameters on the system. If your performance goals are not met, you must further evaluate and tune the system. This may mean making changes similar to the ones that you have already made, or you may need consider improving the performance of other subsystems. If you have attained your performance goals then you can check the system against the lists of desired attributes of well-tuned multiuser or database server systems given in Appendix D, "Quick system tuning reference" (page 235). 
Performance tuning case studies

We have provided several case studies that you can use as starting points for your own investigations. Each study is discussed in terms of the five steps described in "Tuning methodology" (page 14):

1. Define a performance goal for the system.
2. Collect data to get a general picture of how the system is behaving.
3. Formulate a hypothesis based on your observations.
4. Get more specifics to enable you to test the validity of your hypothesis.
5. Make adjustments to the system, and test the outcome of these.

If necessary, repeat steps 2 to 5 until your goal is achieved.

The case studies have been chosen to represent a variety of application mixes on different systems:

• memory-bound workstation (page 59)
• memory-bound software development system (page 65)
• I/O-bound multiuser system (page 113)
• unbalanced disk activity on a database server (page 118)
• semaphore activity on a database server (page 167)
• network overhead caused by X clients (page 158)

Managing the workload

If a system is sufficiently well tuned for its applications and the uses to which it is normally put, you still have a number of options open to you if you are looking for further performance gains. This involves managing the system's workload with the cooperation of the system's users. If they can be persuaded to take some responsibility with you (as the system administrator) for the system's performance, then significant improvements can usually be made.

Below are some steps that users and administrators can take to alleviate excessive demands on a system without reconfiguring the kernel (some of these commands are illustrated after the list):

• Move jobs that do not have to run at a particular time of day to off-peak hours. Encourage users to submit jobs using at(C), batch(C), or crontab(C), depending on whether they are one-off (at or batch) or periodic jobs (crontab).

• Collect data on the average system workload and publish it to users so that they are aware of the daily peaks and troughs. If they have the flexibility to choose when to run a program, they will know when they can achieve more work.

• Adjust the default nice value of user processes using the Hardware/Kernel Manager. This will set a lower CPU priority for all user processes, and will allow critical jobs with higher priority to use the CPU more frequently.

• Encourage users to reduce the priority of their own processes using nice(C) and renice(C); this is especially important for those jobs that do not perform much I/O activity - these CPU-intensive jobs are likely to monopolize the available processing time.

• The default action of the Korn shell (ksh(C)) is to run background jobs at a reduced priority. Make sure users have not altered this setting in their .profile or .kshrc files.

• Encourage users to kill unnecessary processes, and to log out when they have finished rather than locking their screen.

• Reduce the maximum number of processes that a user can run concurrently by lowering the value of the kernel parameter MAXUP. For example, MAXUP set to 20 means that a user can run 19 other processes in addition to their login shell.

If you do not have access to additional hardware and your system is well tuned, you may have to implement some of the above recommendations.
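For example, a user can defer a long-running job to an off-peak hour with at(C), or start it immediately at reduced priority with nice(C). The script name below is hypothetical, and the time formats accepted by at are described on its manual page.

    $ at 2:00am tomorrow < nightly.sh    # queue a one-off job for an off-peak hour
    $ nice -10 ./bigcalc &               # run a CPU-intensive job now with its nice value raised by 10

renice(C) can similarly lower the priority of a job that is already running; see its manual page for the syntax.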
Chapter 3
Tuning CPU resources

Your system hardware contains one or more central processing units (CPUs) plus a host of ancillary processors that relieve the CPU from having to perform certain tasks:

• Math coprocessors perform floating point calculations much more efficiently than software can. The 80486DX™, 80486DX2™, 80486DX4™, and Pentium™ include floating-point capability on the chip itself. Without a floating point coprocessor, the CPU must emulate it using software - this is considerably slower. On systems with an SCO® SMP® License, you can use the -F option to mpsar(ADM) to monitor how many processes are using floating point arithmetic. This command displays information about the usage of both floating point hardware and software emulation.

• Direct memory access (DMA) controllers handle memory transfer between devices and memory, or memory and memory. Many hardware peripheral controllers on EISA and MCA bus machines have a built-in Bus Master DMA chip that can perform DMA rather than relying on the DMA controller on the motherboard. On MCA bus machines, a chip called a Central Arbitration Control Point (CACP) decides which Bus Master DMA controller gets control of the bus. An important limitation of all DMA controllers on ISA and early-series MCA bus machines, and some peripheral controllers on all bus architectures, is that they cannot address more than the first 16MB of memory (24-bit addressing). When the operating system encounters hardware with such limitations, it must instruct the CPU to transfer data between the first 16MB and higher memory.
Operating system states The operating system can be in one of four states: executing in user mode The CPU is executing the text (machine code) of a process that accesses its own data space in memory. executing in system mode If a process makes a system call in order to perform a privileged task requiring the services of the kernel (such as accessing a disk), then the operating system places the CPU in system mode (also known as kernel mode). idle waiting for I/O Processes are sleeping while waiting for the completion of I/O to disk or other block devices. idle No processes are read y-to-run on the CPU or are sleeping waiting for block I/O. Processes waiting for keyboard input or network I/O are counted as idle. The combination of time spent waiting for I/O and time spent idle makes up the total time that the operating system spends idle. 22 Performance Guide Operating system states Viewing CPU activity You can view CPU activity using sar -u on single processor systems: 23:59:44 23:59:49 23:59:54 23:59:59 %usr 4 7 6 %sys 24 84 70 %wio 6 0 1 %idle 66 9 23 5 59 2 32 Average On systems with an SCQ SMP License, use mpsar -u to see activity averaged over all the CPUs and cpusar -u to report activity for an individual CPU. %usr indicates the percentage of time that the operating system is executing processes in user mode. %sys indicates the percentage of time that the operating system is executing in system mode. %wio indicates the percentage of time that the operating system is idle with processes that could run if they were not waiting for I/O to complete. %idle indicates the percentage of time that the operating system is idle with no runnable processes. On systems with an seQ SMP License, a CPU runs a process called idle if there are no other runnable processes. On systems using SMP, root can make a CPU inactive using the cpuonoff(ADM) command. The -c option displays the number of active and inactive CPUs: $ cpuonoff -c cpu 1: active cpu 2: inactive cpu 3: active The base processor, which cannot be made inactivate, is always indicated by cpu 1. An inactive CPU shows 100% idle time with the cpusar -u command. The following sections outline the different process states and how processes can share the same CPU. 23 Tuning CPU resources Process states As soon as a process has been created, the system assigns it a state. A process can be in one of several states. You can view the state of the processes on a system using the ps(C) command with the -el options. The "5" field displays the current state as a single letter. The important states for performance tuning are: a On processor - the processor is executing on the CPU in either user or systemmode. R Runnable - the process is on a run queue and is ready-to-run. A runnable process has every resource that it needs to execute except the CPU itself. S Sleeping - the process is waiting for some I/O event to complete such as keyboard input or a disk transfer. Sleeping processes are not runnable until the I/O resource becomes available. Figure 3-1 (page 25) represents these process states and the possible transitions between them. On single CPU systems only one process can run on the CPU at a time. All other runnable processes have to wait on the run queue. A portion of the kernel known as the scheduler chooses which process to run on the CPU(s). When the scheduler wants to run a different process on the CPU, it scans the run queue from the highest priority to the lowest looking for the first runnable process it can find. 
When a process becomes runnable, the kernel calculates its priority and places it on the run queue at that priority. While it remains runnable, the process' priority is recalculated once every second, and its position in the run queue is adjusted. When there are no higher-priority runnable processes on the run queue, the process is placed on the CPU to run for a fixed amount of time known as a time slice. The operation of the scheduler is more sophisticated for SMP. See "Process scheduling" (page 34) for more information. For certain mixes of applications, it may be beneficial to performance to adjust the way that the scheduler operates. This is discussed in "Adjusting the scheduling of processes" (page 34).

Figure 3-1 Main process states in a system and the transitions between them: (a) the main process states - a process running on the CPU, processes on the run queue, processes sleeping on I/O, and their swapped-out counterparts on swap; (b) the transitions between these states, distinguishing the main flow from swapping.

Clock ticks and time slices

The system motherboard has a programmable interval timer which is used as the system clock; this generates 100 clock interrupts or clock ticks per second (this value is defined as the constant HZ in the header file /usr/include/sys/param.h). The tunable kernel parameter MAXSLICE sets the maximum time slice for a process. Its default value is 100 clock ticks (one second). The range of permissible values is between 25 and 100 (between one quarter of a second and one second). The effect of reducing MAXSLICE is to allow each process to run more often but for a shorter period of time. This can make interactive applications running on the system seem more responsive. However, you should note that adjusting the value of MAXSLICE may have little effect in practice. This is because most processes need to sleep before their time slice expires in order to wait for an I/O resource. Even a calculation-intensive process, which performs little I/O, will tend to be replaced on the CPU by processes woken when an I/O resource becomes available.

Context switching

A process runs on the CPU until it is context switched. This happens when one of the following occurs: • The process exits. • The process uses up its time slice. • The process requires another resource that is not currently available or needs to wait for I/O to complete. • A resource has become available for a sleeping process. If there is a higher priority process ready to run, the kernel will run this instead (the current process is preempted). • The process relinquishes the CPU using a semaphore or similar system call. The scheduler can only take a process off the CPU when returning to user mode from system mode, or if the process voluntarily relinquishes the CPU from system mode. If the process has used up its time slice or is preempted, it is returned to the run queue. If it cannot proceed without access to a resource such as disk I/O, it sleeps until the resource is available. Once access to that resource is available, the process is placed on the run queue before being put on the processor. Figure 3-2 (page 27) illustrates this for a process O1 which goes to sleep waiting for I/O.
Figure 3-2 Preemption of a process that goes to sleep waiting for I/O: (a) runnable process R1 is put on the CPU as O1; (b) process O1 goes to sleep waiting for I/O as S1; (c) context switch - runnable process R2 is put on the CPU as O2; (d) process S1 is woken when the resource becomes available and is put on the run queue as R1; (e) process O2 is preempted and put back on the run queue as R2. R1 is put on the CPU next, as in (a), because it has a higher priority than R2.

A context switch occurs when the kernel transfers control of the CPU from an executing process to another that is ready to run. The kernel first saves the context of the process. The context is the set of CPU register values and other data that describes the process' state. The kernel then loads the context of the new process which then starts to execute. When the process that was taken off the CPU next runs, it resumes from the point at which it was taken off the CPU. This is possible because the saved context includes the instruction pointer. This indicates the point in the executable code that the CPU had reached when the context switch occurred.

Interrupts

An interrupt is a notification from a device that tells the kernel that: • An action such as a disk transfer has been completed. • Data such as keyboard input or a mouse event has been received. The kernel services an interrupt within the context of the current process that is running on the CPU. The execution of the current process is suspended while the kernel deals with the interrupt in system mode. The process may then lose its place on the CPU as a result of a context switch. If the interrupt signaled the completion of an I/O transfer, the scheduler wakes the process that was sleeping on that event, and puts it on a run queue at a newly calculated numeric priority. It may or may not be the next process to run depending on this priority.

Calculation of process priorities

A process' priority can range between 0 (lowest priority) and 127 (highest priority). User mode processes run at lower priorities (lower values) than system mode processes. A user mode process can have a priority of 0 to 65, whereas a system mode process has a priority of 66 to 95. Some of the system mode priorities indicate what a process is waiting for. For example, a priority of 81 indicates that a process is waiting for I/O to complete whereas a value of 75 means that it is waiting for keyboard input. The ps command with the -l option lists process priorities under the PRI column. Processes with priorities in the range 96 to 127 have fixed priority and control their own scheduling.

NOTE You can find a list of priority values in Table A-2, "Priority values" (page 175).

Figure 3-3 (this page) shows the division of process priorities into user mode, system mode, and fixed-priority processes.

Figure 3-3 System process priorities: fixed-priority processes occupy priorities 96 to 127 (highest), system mode processes 66 to 95, and user mode processes 0 to 65 (lowest).

The operating system varies the priorities of executing processes according to a simple scheduling algorithm which ensures that each process on the system gets fair access to the CPU. Every process receives a base level priority (of default value 51) when it is created. However, this soon loses any influence on whether a process is selected to run. Note that the priorities of kernel daemons such as sched, vhand, and bdflush are not adjusted. Fixed-priority processes are also exempt - such processes have the ability to adjust their own priority.

The kernel recalculates the priority of a running process every clock tick. The new priority is based on the process' nice value, and how much CPU time the process has used (if any). When the process is taken off the CPU, its lowered priority pushes it down the run queue to decrease the probability that it will be chosen to run in the near future. A process that manages to run for an entire time slice will have had its priority reduced by the maximum amount. The kernel recalculates the priorities of all runnable processes (those with a user mode priority less than 65) once every second by successively reducing the negative weighting given to their recent CPU usage. This increases the probability that these processes will be selected to run again in the near future.
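The following sketch, in C, shows the general style of such a calculation. It is a simplified model in the manner of classic System V schedulers; the field names, the constants, and the exact arithmetic are assumptions made for illustration and are not the kernel's actual algorithm:

#define PUSER   51              /* base user mode priority */
#define PRIMAX  65              /* highest user mode priority */

struct proc {
    int p_cpu;                  /* recent CPU usage in clock ticks */
    int p_nice;                 /* nice value, 0 to 39, default 20 */
    int p_pri;                  /* calculated priority */
};

static void
calc_pri(struct proc *p)
{
    /* Recent CPU usage and a high nice value both push the
     * priority down; in this guide's convention a smaller
     * number means a lower priority. */
    p->p_pri = PUSER - (p->p_cpu / 4) - (p->p_nice - 20);
    if (p->p_pri < 0)
        p->p_pri = 0;
    if (p->p_pri > PRIMAX)
        p->p_pri = PRIMAX;
}

/* Every clock tick: charge the running process for the tick. */
void
tick(struct proc *running)
{
    if (running->p_cpu < 80)
        running->p_cpu++;
    calc_pri(running);
}

/* Once a second: decay the recorded usage of a runnable process
 * so that it drifts back up the run queue. */
void
decay(struct proc *runnable)
{
    runnable->p_cpu /= 2;
    calc_pri(runnable);
}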
The default nice value of a user's process is 20. An ordinary user can increase this value to 39 and in so doing reduce a process' chance of running on the CPU. Processes with low nice values will on average get more CPU time because of the effect the values have on the scheduling algorithm. There are three ways to control the nice value of a process (a programmatic sketch follows this list): • nice(C) lowers the scheduling priority of a new process by increasing its nice value; root can also raise the priority by decreasing the nice value with this command. • renice(C) lowers the scheduling priority of a process that is already running by increasing its nice value; root can also raise the priority using this command. • If the option bgnice is set in the Korn shell, it runs background jobs at a nice value of 24. If this option is not set, background jobs run at an equal priority to foreground jobs.
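A process can also make itself nicer from within a program using the standard nice(S) system call. A minimal sketch follows; the increment of 10 is an arbitrary choice for the example:

#include <stdio.h>
#include <errno.h>
#include <unistd.h>

int
main(void)
{
    /* Raise our nice value by 10 (default 20 -> 30), reducing
     * our share of the CPU.  nice() can legitimately return -1,
     * so errno must be checked to detect a real error. */
    errno = 0;
    if (nice(10) == -1 && errno != 0) {
        perror("nice");
        return 1;
    }

    /* ... long-running, CPU-intensive work goes here ... */

    return 0;
}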
Examining the run queue

Run queue statistics can be seen with sar -q on single processor systems or mpsar -q on multiprocessor systems:

23:59:44  runq-sz  %runocc  swpq-sz  %swpocc
23:59:49      1.7       98      1.5       36
23:59:54      1.0       63      1.0       31
23:59:59      1.0       58      1.0       49
Average       1.3       74      1.2       39

runq-sz indicates the number of processes that are ready to run (on the run queue) and %runocc indicates the percentage of time that the run queue was occupied by at least one process. See "Identifying CPU-bound systems" (page 38) for a discussion of how to identify if your system is CPU bound.

Multiprocessor systems

The SCO OpenServer system is a multitasking, multiuser operating system, designed to share resources on a computer with a single CPU. It can run on a more powerful multiprocessor system but it cannot use more than one of the available CPUs. SCO SMP License software adds multiprocessing-specific components to the standard operating system kernel, enabling it to recognize and use additional processors automatically. SMP is implemented as an extension to, and is completely compatible with, the version of the kernel that supports a single CPU. With SCO SMP License software installed, the operating system retains its multitasking, multiuser functionality. There is no impact on existing utilities, system administration, or filesystems. SMP can execute standard OMF (x.out), COFF, and ELF binaries without modification. SMP is modular. As your system requires more processing power, you can add additional processors. For example, two processors can give you twice the processing power of a single processor of identical specification in terms of the number of instructions per second that they can execute.

If the operating system can gain extra performance in direct proportion to the number of processors, it is said to exhibit perfect scaling as shown in Figure 3-4 (page 32). In practice, the processors have to compete for other resources such as memory and disk, they have to co-operate in how they handle interrupts from peripherals and from other CPUs, and they may have to wait to gain access to data structures and devices.

Figure 3-4 Perfect multiprocessor scaling: throughput (1 CPU = 100%) rises in a straight line - 200%, 300%, and so on up to 500% and beyond - as the number of CPUs increases from 1 to 6.

To ensure good scaling, you should ensure that the memory and I/O subsystems (particularly hard disks) are powerful enough to satisfy the demands that multiple processors put on them. If you do not match the power of your subsystems to that of the processors, your system is likely to be memory or I/O bound, and it will not utilize the potential performance of the processors. A system will scale well when there are many ready-to-run processes. Multithreaded applications are also well suited to take advantage of a multiprocessing environment.

Support for multiple processors

In SMP, all CPUs can access the same memory, and they all run the same copy of the kernel. As in the single processor version of the operating system, the operating system state on each CPU may be executing in kernel mode, executing in user mode, idle, or idle waiting for I/O. All processors can run the kernel simultaneously because it is multithreaded; that is, it is designed to run simultaneously on several processors while protecting shared memory structures. Any processor can execute primary kernel functions such as filesystem access, memory and buffer management, distributed interrupt and trap handling, process scheduling, and system calls. Most standard device drivers provided with the system are also multithreaded. Any unmodified driver or kernel module that does not register itself as multithreaded runs on the base processor. Figure 3-5 (this page) shows how we can modify the process state diagram introduced in "Process states" (page 24) and apply it to a multiprocessor system. Note that this diagram implies that the kernel not only has to consider when to run a process but also on which CPU to run it.

Figure 3-5 Process states on a multiprocessor computer: the states of Figure 3-1, with runnable and sleeping processes in memory or on swap, and a running process on each of several CPUs.

Using the mpstat load displayer

On systems with an SCO SMP License, the mpstat utility visually displays processor activity for each of the processors installed on your system. It allows you to verify at a glance that the system load is balanced across all available processors. See the mpstat(ADM) manual page for more information.

Examining interrupt activity on multiprocessor systems

On multiprocessor systems, interrupts sent between the CPUs coordinate and synchronize timing, I/O, and other cooperative activity. You can use cpusar -j to see how active interrupt handling routines are on a particular CPU in a multiprocessor system. If device drivers are not written to be multithreaded they will only run on the base processor. You can examine which device drivers are multithreaded using the mthread(ADM) command. You can also use the displayintr(ADM) command to see how interrupt handlers are distributed across the system's CPUs and whether they are movable from one CPU to another.
The number of inter-CPU interrupts can be examined using mpsar -I to view systemwide activity or cpusar -I to examine an individual CPU. The output of these commands depends on your system hardware.

Process scheduling

In a single processor UNIX operating system, the scheduler only concerns itself with when to run a process on the CPU. In a multiprocessor UNIX operating system, the scheduler not only has to consider when to run a process, but also where to run it. Because the kernel runs on all the processors, the process scheduler may be active on any or all of the CPUs. You can adjust how the process scheduler works in order to improve performance as described in "Adjusting the scheduling of processes" (this page).

Adjusting the scheduling of processes

You can configure the process scheduling policy to suit a particular application mix by adjusting the values of a few kernel variables as described in the following sections. The variables dopricalc, primove, and cache_affinity control the behavior of priority calculations and the scheduler on both single processor and multiprocessor machines; they are to be found in the file /etc/conf/pack.d/kernel/space.c. The variables preemptive and loadbalance only apply to SMP and can be found in /etc/conf/pack.d/crllry/space.c. To change the values of these variables, edit the files, then relink and reboot the kernel. It is not possible to predict the effect of the settings on a particular system. It is likely that you will have to try alternative values to determine whether there is a gain. For database servers on systems with an SCO SMP License, you may find that setting preemptive, loadbalance, and dopricalc to zero gives a performance improvement. The following sections describe the effect of adjusting these variables: • "Controlling priority calculations - dopricalc" (this page) • "Controlling the effective priority of processes - primove" (page 36) • "Controlling cache affinity - cache_affinity" (page 37) • "Controlling process preemption - preemptive" (page 37) • "Load balancing - loadbalance" (page 37)

Controlling priority calculations - dopricalc

The dopricalc variable controls whether the kernel adjusts the priorities of all runnable processes at one-second intervals. Its value has no effect on the recalculation every clock tick of the priority of a process that is currently running. For some application mixes, such as database servers which have no logged-in users and which make frequent I/O requests, disabling the recalculation of the priorities of ready-to-run processes may improve performance. This is because a process running on a CPU is more likely to continue to run until it reaches the end of its time slice or until it sleeps on starting an I/O request. The default value of dopricalc is 1 which enables the one-second priority calculations. To turn off the calculations, set the value of dopricalc to 0, relink the kernel, and reboot the system. This modification will reduce the number of context switches, and may increase the efficiency of the L2 cache. However, it may impair the performance of the system if there is a mixture of interactive and CPU-intensive processes. CPU-intensive processes spend all or nearly all of their time in user space; they do not go to sleep waiting for I/O, and so they are unlikely to be context switched except at the end of their time slice. As a consequence, interactive processes may receive less access to the CPU.
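For illustration only, such tunables appear in the space.c files as ordinary C variable definitions along the following lines; the exact layout and comments in the shipped files differ, so always edit the files themselves rather than recreating them:

/* Scheduler tunables, in the style of
 * /etc/conf/pack.d/kernel/space.c */

int dopricalc      = 1;     /* 1 = recalculate the priorities of all
                             * runnable processes once a second;
                             * 0 = disable the recalculation */
int primove        = 0;     /* minimum priority change before a process
                             * is moved to a different run queue list */
int cache_affinity = 0;     /* 1 = prefer the CPU on which a process
                             * last ran */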
Controlling the effective priority of processes - primove

Until now, the discussion of process priorities has assumed that the scheduler uses a process' calculated priority to decide whether the process should be put on the CPU to run. In the default configuration of the kernel, this is effectively true. In fact, the kernel implements the run queue as separate lists of runnable processes for each priority value. The scheduler examines the priority value assigned to each list rather than the priorities of the processes that they contain when looking for a process to run. Provided the kernel assigns processes to the list corresponding to their priority, the lists are invisible. Under some circumstances, it may be beneficial to performance to allow processes to remain in a list after their priority has been changed. When the priority of a user process is adjusted, the variable primove controls whether the kernel moves the process to a higher or lower value priority list. The process will only be moved to a new list if its priority differs from the present list priority by at least the value of primove. The effect of increasing primove is to make a process remain at a low or high priority for longer. It also means that the operating system has less work to do moving processes between different lists. The default value of primove is 0 for compliance with POSIX.1b. This means that any change in a process' priority will cause it to be moved to a different list. For an example of the use of primove, assume that it is given a value of 10. If the priority of a process begins at 51 and rises by at least 10, it is moved to the list corresponding to priority 61. The process does not move between lists until its priority changes by at least the value of primove. So if the process' priority rose only to 60, it would remain on the priority 51 list. The kernel, however, would still see the process as having a lower priority than another process in the priority 55 list. Conversely, a process in the priority 71 list will stay there until its priority falls to 61. Increasing the value of primove makes the kernel less sensitive to process priorities. Reducing the value of primove produces fairer scheduling for all processes but increases the amount of kernel overhead that is needed to manipulate the run queue.

Controlling cache affinity - cache_affinity

By default, the scheduler does not give preference to a process that last executed on a CPU. The advantage of giving preference to these processes is to improve the hit rate on the level-one (L1) and level-two (L2) caches. As a consequence, the hardware is less likely to have to reload the caches from memory, an action that could slow down the processor. It also means that the process selected to run does not necessarily have the highest priority. Cache affinity behavior is controlled by the value of the variable cache_affinity. If the value of cache_affinity is changed to 1, the kernel gives preference to processes which previously ran on a CPU. Valid data and text are more likely to be found in the caches for small processes. If your system tends to run large processes, leave cache_affinity set to 0.

Controlling process preemption - preemptive

On multiprocessor systems, the scheduler looks for a CPU on which to run a process when that process becomes runnable, or when its time slice has expired. The scheduler first looks for an idle CPU.
If it cannot find an idle CPU, it next considers preempting the process on the current CPU if that process has a lower priority; it is quicker to preempt the current process as this does not require an interprocessor interrupt. With some application mixes, however, this can increase the number of context switches. For example, when a database server wakes a client, it may be more efficient, in terms of system resources, for the server to continue to run for a period of time after that wakeup. To prevent the scheduler from preempting the process on the current processor, change the value of preemptive to 0.

Load balancing - loadbalance

On multiprocessor systems, the default behavior of the scheduler is to run the highest priority jobs on each of the processors. For example, when a process is woken after a disk transfer completes, the scheduler checks if any CPU is running a process with a lower priority. If so, that processor is instructed to reschedule and run the newly woken process. This load balancing feature is also used when a process is taken off a CPU; it is possible that the process has a higher priority than one running on another CPU. If you change the value of loadbalance to 0, the scheduler no longer looks for lower priority processes on other CPUs. This reduces the probability that a process will be preempted. On a system that is performing a reasonable number of I/O requests, this should reduce the number of context switches and interprocessor interrupts. This provides more processor cycles for executing user applications and should increase overall performance. Processors which are idle can still be selected so idle time is minimized. This adjustment is likely to improve performance where context switching frequency is high, or on database servers where user processes should not be disturbed once they are running. If the system is already spending a significant amount of time idle, it is unlikely that this adjustment will improve performance.

Identifying CPU-bound systems

A system is CPU bound (has a CPU bottleneck) if the processor cannot execute fast enough to keep the number of processes on the run queue consistently low. To determine if a system is CPU bound, run sar -u (or cpusar -u for each processor on a system with an SCO SMP License) and examine the %idle value. If %idle is consistently less than 5% (for all CPUs) on a heavily loaded database server system, then the system may be lacking in processing power. On a heavily loaded system with many logged-in users, a %idle value that is persistently less than 20% suggests that the system may not be able to cope with a much larger load. Examination of the number of processes on the run queue shows whether there is an unacceptable buildup of runnable processes. If processes are not building up on the run queue, a low idle time need not indicate an immediate problem provided that the other subsystems (memory and I/O) can cope with the demands placed upon them. Run queue activity can be considered heavy if sar -q (mpsar -q for SMP) reports that runq-sz is consistently greater than 2 (and %runocc is greater than 90% for SMP). If low %idle values are combined with heavy run queue activity then the system is CPU bound. If low %idle values are combined with low or non-existent run queue activity, it is possible that the system is running CPU-intensive processes. This in itself is not a problem unless an increase in the number of executing processes causes a buildup of processes on the run queue.
If %wio values are consistently high (greater than 15%), this is more likely to indicate a potential I/O bottleneck than a problem with CPU resources. See Chapter 5, "Tuning I/O resources" (page 71) for more information on identifying I/O bottlenecks. High values of %wio may also be seen if the system is swapping and paging. Memory shortages can also lead to a disk I/O bottleneck because the system spends so much time moving processes and pages between memory and swap areas on disk. If the value of %sys is high relative to %usr, and %idle is close to zero, this could indicate that the kernel is consuming a large amount of CPU time running the swapping and page stealing daemons (sched and vhand). These daemons are part of the kernel and cannot be context switched; this may leave several processes stuck on the run queue waiting to run. For details of how to identify and tune memory-bound systems, see Chapter 4, "Tuning memory resources" (page 41) and "Tuning memory-bound systems" (page 52). The following table summarizes the commands that you can use to determine if a system is CPU bound:

Table 3-1 Identifying a CPU-bound system

Command     Field     Description
sar -u      %idle     percentage of time that the CPU was idle
mpsar -u    %idle     average percentage of time all CPUs are idle (SMP only)
cpusar -u   %idle     percentage of time the specified CPU was idle (SMP only)
[mp]sar -q  %runocc   percentage of time the run queue is occupied
            runq-sz   number of processes on the run queue

See "Tuning CPU-bound systems" (page 40) for a discussion of how to tune CPU-bound systems.

Tuning CPU-bound systems

If it has been determined that the system is CPU bound, there are a number of things that can be done: • If possible, consider rescheduling the existing job load on your system. If many large jobs are being run at once, rescheduling them to run at different times may improve performance. You should also check the system's crontab(C) files to see if any jobs running at peak times can be scheduled to run at other times. • If possible, tune the applications so that they require less CPU power. Consider replacing non-critical applications with ones that require a less powerful system. • If you have evidence that the system is I/O bound serving interrupts from non-intelligent serial cards, replacing these with intelligent serial cards will offload some of the I/O burden from the CPUs. See "Serial device resources" (page 108) for more details. • Check if the hard disk controllers in the system are capable of using DMA to transfer data to and from memory. If the CPU has to perform programmed I/O on behalf of the controller, this can limit its performance. • It is possible that because of a lack of free memory the system is swapping, which could result in a considerable portion of the CPU resources being used to transfer processes back and forth between the disk and memory. To determine if this is the case, see Chapter 4, "Tuning memory resources" (page 41). • Upgrade to a faster CPU or CPUs. • Upgrade to a multiprocessor system from a single processor system. This will help if there are runnable jobs on the run queue or the applications being run are multithreaded. • Add one or more CPUs to a multiprocessor system. • Purchase an additional system and divide your processing requirements between it and your current system.

Chapter 4 Tuning memory resources

The SCO OpenServer system is a virtual memory operating system.
Virtual memory is implemented using various physical resources: • Physical memory as RAM chips; sometimes referred to as primary, main, or core memory. • Program text (machine code instructions) as files within filesystems on disk or ramdisks. • Swap space consisting of one or more disk divisions or swap files within filesystems dedicated to this purpose. The individual pieces of swap space are known as swap areas. Swap space is also referred to as secondary memory. Depending on the system hardware, there may also be physical cache memory on the CPU chip itself (level-one (L1) cache), or on the computer's motherboard (level-two (L2) cache), and on peripheral hardware controller cards. If recently accessed data (or, for some L1 and L2 caches, machine instructions) exists in this memory, it can be accessed immediately rather than having to retrieve it from more distant memory. Write-through caches store data read from memory or a peripheral device; they ensure that data is written synchronously to memory or a physical device before allowing the CPU to continue. Write-back caches retain both read and written data and do not require the CPU to synchronize with data being written.

NOTE Most L2 caches work with a limited amount of main memory. Adding more RAM than the cache can handle may actually make the machine slower. For some machines with a 64KB L2 cache, this only covers the first 16MB of physical memory. See the documentation provided with your computer or motherboard hardware for more details.

Physical memory

Physical memory on the system is divided between the area occupied by the kernel and the area available to user processes. Whenever the system is rebooted, the size of these areas, as well as the total amount of physical memory, is logged in the file /usr/adm/messages under the heading mem:, for example:

mem: total = 32384k, kernel = 4484k, user = 27900k

This shows a system with 32MB of physical memory; the kernel is using just over 4MB of this memory with the remainder being available for user processes. Physical memory is divided into equal-sized (4KB) pieces known as pages. When a process starts to run, the first 4KB of the program's text (executable machine instructions) is copied into a page of memory. Each subsequent portion of memory that a process requires is assigned an additional page. When a process terminates, its pages are returned to the free list of unused pages. Physical memory is continually used in this way unless the running processes require more pages of memory than currently exist on the system. In this case the system must redistribute the available memory by either paging out or swapping.

Virtual memory

The operating system uses virtual memory to manage the memory requirements of its processes by combining physical memory with secondary memory (swap space) on disk. The swap area is usually located on a local disk drive. Diskless systems use a page server to maintain their swap areas on its local disk. The amount of swap space is usually configured to be larger than physical memory; the sum of physical memory and swap space defines the total virtual memory that is available to the system. Having swap space on disk means that the CPU's access to it is very much slower than to physical memory. Conventionally, the swap area uses an entire division on a hard disk. It is also possible to configure a regular file from within a filesystem for use as swap.
Although this is intended for use by diskless workstations, a server can also increase its swap area in this way. The swap area is used as an additional memory resource for processes which are too large for the available physical user memory. In this way, it is possible to run a process whose entire executable image will not fit into physical memory. However, a process does not have to be completely loaded into physical memory to run, and in most cases is unlikely to be completely loaded anyway. The virtual address space of a process is divided into separate areas known as regions that it uses to hold its text, data, and stack pages. When a program is loaded, its data region consists of data pages that were initialized when the program was compiled. If the program creates uninitialized data (often known as bss for historical reasons) the kernel adds more pages of memory to the process' data region. If the operating system is running low on physical memory, it can start to write pages of physical memory belonging to processes out to the swap area. See "Paging" (page 44) and "Swapping" (page 47) for more details. Figure 4-1 (page 44) illustrates how a process' virtual memory might correspond to what exists in physical memory, on swap, and in the filesystem. The u-area of a process consists of two 4KB pages (displayed in the figure as U1 and U2) of virtual memory that contain information about the process needed by the system when the process is running. In this example, these pages are shown existing in physical memory. The data pages D3 and D4 are shown as being paged out to the swap area on disk. The text page T4 has also been paged out but it is not written to the swap area as it exists in the filesystem. Those pages which have not yet been accessed by the process (D5, T2, and T5) do not occupy any resources in physical memory or in the swap area.

Figure 4-1 A process' virtual memory (u-area, stack, data, and text regions) and its correspondence to pages held in physical memory and on disk.

Case study: network overhead caused by X clients

> /tmp/netstat_op

The administrator runs the command on several workstations to try to eliminate the possibility that faulty network interface cards are the cause of the problem. The recorded output shows occasional short periods when the network is overloaded (for clarity, only the statistics for the network interface xxx0 are shown in this example):

        input  (xxx0)     output
packets    errs   packets    errs   colls
    110       0       101       0       0
     78       0        66       0       0
     85       1        75       2      23
    180       2       123       1      42
    120       1        55       1      18
     87       0        67       0       2
     67       0        54       0       0

At these times, the numbers of input and output errors are non-zero, and the number of collisions approaches 30% of output packets. The same behavior is observed on all the workstations on which statistics were gathered. If the periods of heavy loading are excluded, the frequency of packet collisions approaches 0%.

Formulating a hypothesis

From the results of running netstat, the system administrator suspects that some applications must be moving large amounts of data across the network. Careful examination of the figures shows that the network is overloaded approximately 5% of the time. Periods of high loading generally last only a few minutes and seem to occur in bursts. Such behavior is typical if large files are transferred using NFS. It is unlikely to be the result of network traffic caused by remote X clients as these are run locally where possible.
Possible culprits are programs used to preview PostScript and graphics image files, DTP packages, and screen-capture utilities.

Getting more specifics

With the cooperation of several users, the administrator monitors network performance using netstat over a period of 30 minutes. During this period the users run the suspect applications to load and manipulate large files across the network. The outcome of this investigation is that graphics image previewers and screen-capture utilities seem to cause the most network overhead. The files being viewed or created are often several megabytes in size.

Making adjustments to the system

There are several things that can be done to reduce the peak load on the network: • Encourage users to save and load graphics images to and from the local disk on the workstation they are using. These files may then be copied to the file server when the network is less busy. • Run screen capture and graphics preview utilities on dedicated workstations rather than on X terminals. • Splitting the network into several subnets might help if the nodes on the network can easily be divided into logically distinct groups. However, this may cause more CPU overhead on the file server if it is used as the router between the subnets. This solution is more expensive and may make the problem worse if the wrong network topology is chosen.

Chapter 7 Tuning system call activity

This chapter is of interest to application programmers who need to investigate the level of activity of system calls on a system. System calls are used by programs and utilities to request services from the kernel. These can involve passing data to the kernel to be written to disk, finding process information, and creating new processes. By allowing the kernel to perform these services on behalf of an application program, they can be provided transparently. For example, a program can write data without needing to be concerned whether this is to a file, memory, or a physical device such as disk or tape. It also prevents programs from directly manipulating and accidentally damaging system structures. System calls can adversely affect performance because of the overhead required to go into system mode and the extra context switching that may result.

Viewing system call activity

System call activity can be seen with sar -c (or mpsar -c for SMP):

23:59:44  scall/s  sread/s  swrit/s  fork/s  exec/s  rchar/s  wchar/s
23:59:49      473        9        0    0.09    0.12   292077      421
23:59:54      516       13        3    0.03    0.03   367668      574
23:59:59      483       13        3    0.01    0.02   366992      566
Average       489       12        2    0.04    0.06   338280      512

scall/s indicates the average number of system calls per second averaged over the sampling interval. Also of interest are sread/s and swrit/s which indicate the number of read(S) and write(S) calls, and rchar/s and wchar/s which show the number of characters transferred by them. If you are an applications programmer and the SCO OpenServer Development System is installed on your system, you can use prof(CP) to examine the results of execution profiling provided by the monitor(S) function. This should show where a program spends most of its time when it is executing. You can also use the trace(CP) utility to investigate system call usage by a program.

Identifying excessive read and write system call activity

Normally, read and write system calls should not account for more than half of the total number of system calls. If the number of characters transferred by each read (rchar/s / sread/s) or write (wchar/s / swrit/s) call is small, it is likely that some applications are reading and writing small amounts of data for each system call. It is wasteful for the system to spend much of its time switching between system and user mode because of the overhead this incurs. It may be possible to reduce the number of read and write calls by tuning the application that uses them. For example, a database management system may provide its own tunable parameters to enable you to tune the caching it provides for disk I/O.
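As a simple illustration of the difference this makes, the two loops below transfer the same 64KB of data; the file name and sizes are arbitrary choices for the example. The first loop issues one write(S) call per byte, the second one call per 4KB block, so sar -c would report the same wchar/s spread across far fewer swrit/s for the second:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define TOTAL 65536
#define BLOCK 4096

int
main(void)
{
    static char buf[TOTAL];     /* zero-filled test data */
    int fd, i;

    if ((fd = open("/tmp/demo", O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        perror("open");
        return 1;
    }

    /* Wasteful: 65536 system calls, one byte per write. */
    for (i = 0; i < TOTAL; i++)
        write(fd, &buf[i], 1);

    /* Better: 16 system calls, one 4KB block per write. */
    for (i = 0; i < TOTAL; i += BLOCK)
        write(fd, &buf[i], BLOCK);

    close(fd);
    return 0;
}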
Viewing process fork and exec activity

fork/s and exec/s show the number of fork(S) and exec(S) calls per second. If the system shows high fork and exec activity, this may be due to it running a large number of shell scripts. To avoid this, one possibility is to rewrite the shell scripts in a high-level compiled language such as C.

Viewing AIO activity

If applications are using asynchronous I/O (AIO) to disk, you can use the -O option to sar(ADM) (or mpsar(ADM) for SMP) to examine the performance of AIO requests. The values reported include the number of AIO read and write requests per second, and the total number of 1KB blocks (both read and write) being handled per second. The %direct column of the report shows the percentage of AIO requests that are passed directly to the disk driver by the POSIX.1b aio functions defined in the Software Update for Database Systems (SUDS) library. Other AIO requests are handled by the aio(HW) driver.

Viewing IPC activity

You can use the sar -m command (or mpsar -m for SMP) to see how many System V interprocess communication (IPC) message queue and semaphore primitives are issued per second. Note that you can also use the ipcs(ADM) command to report the status of active message queues, shared memory segments, and semaphores.

Semaphore resources

Semaphores are used to prevent processes from accessing the same resource, usually shared memory, at the same time. The number of System V semaphores configured for use is controlled by the kernel parameter SEMMNS. If the sema/s column in the output from sar -m shows that the number of semaphore primitives called per second is high (for example, greater than 100), the application may not be using IPC efficiently. It is not possible to recommend a value here. What constitutes a high number of semaphore calls depends on the use to which the application puts them and the processing power of the system running the application. System V semaphores are known to be inefficient and can adversely affect the performance of multiprocessor systems. This is because: • They increase contention between processors - this reduces scaling and prevents the available CPU power being used effectively. • They increase activity on the run queues as several processes sleeping on a semaphore may be woken when its state changes - this increases system overhead. • They increase the likelihood of context switching - this also increases system overhead. If you are an applications programmer, consider using the SUDS library routines instead; these implement more efficient POSIX.1b semaphores. The number of POSIX.1b semaphores configured for use is controlled by the kernel parameter SEM_NSEMS_MAX.
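For reference, the primitives that the sema/s column counts are calls such as semop(S). A minimal, self-contained sketch that creates one semaphore and performs a single V (signal) and P (wait) operation on it:

#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int
main(void)
{
    int semid;
    struct sembuf op;

    /* One private semaphore with an initial value of 0;
     * SEMMNS limits how many such semaphores can exist. */
    if ((semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600)) == -1) {
        perror("semget");
        return 1;
    }

    op.sem_num = 0;
    op.sem_flg = 0;

    op.sem_op = 1;              /* V: release the resource */
    if (semop(semid, &op, 1) == -1)
        perror("semop (V)");

    op.sem_op = -1;             /* P: acquire the resource */
    if (semop(semid, &op, 1) == -1)
        perror("semop (P)");

    semctl(semid, 0, IPC_RMID, 0);  /* remove the semaphore set */
    return 0;
}

Each semop() call above contributes to the sema/s figure reported by sar -m.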
Some database management systems may use a sleeper driver to synchronize processes. (This may also be referred to as a post-wait driver.) If this is not enabled, they may revert to using less efficient System V semaphores. See the documentation provided with the database management system for more information. For more information on the kernel parameters that you can use to configure semaphores, see "Semaphores" (page 216) and "Semaphore parameters" (page 217).

Messages and message queue resources

Messages are intended for interprocess communication which involves small quantities of data, usually less than 1KB. Between being sent and being received, the messages are stored on message queues. These queues are implemented as linked lists within the kernel. Under some circumstances, you may need to increase the resources allocated for messages and message queues above the default values defined in the mtune(F) file. Note that the kernel parameters defined in mtune set systemwide limits, not per-process limits. Follow the guidelines below when changing the kernel parameters that control the configuration of message queues (a short usage sketch follows this list): • Each process that calls msgget(S) with either of the flags IPC_CREAT or IPC_PRIVATE set obtains an ID for a new message queue. • The total number of available message headers (MSGTQL) must be less than or equal to 16383. This limits the total number of messages systemwide because each unread message must have a header. • The total number of segments configured for use (MSGSEG) must be less than or equal to 32768. This limits the total number of messages systemwide because each message consists of at least one segment. • The size of each message segment (MSGSSZ) is specified in bytes and must be a multiple of 4 in the range 4 to 4096. Each message is allocated enough segments to hold it; any remaining space in the last segment allocated to a message is unused. A small value of MSGSSZ is suitable for systems which will send and receive many small messages. A large value is suitable if messages are fewer and larger. Small segments require more processing overhead by the kernel as it keeps track of them; large segments can be wasteful of memory. • The total amount of memory reserved for use by message data is controlled by the product of the number of segments and the segment size: MSGSEG * MSGSSZ. This value must be less than or equal to 128KB (131072 bytes). • Increase the size of the map used for managing messages (MSGMAP) if a large number of small messages are processed. Typically, you should set the map size to half the number of message segments configured (MSGSEG). Do not increase MSGMAP to a value greater than that of MSGSEG. • The amount of message data allowed in an individual queue (MSGMNB) must be less than or equal to 64KB - 4 bytes (that is, less than or equal to 65532 bytes). • The maximum length of an individual message is limited by the value of MSGMAX. Although the recommended maximum is 8192 bytes (8KB), the kernel can support messages up to 32767 bytes in length. Note, however, that the message size may also be limited by the value of MSGMNB.
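A minimal sketch of the msgget(S), msgsnd(S), and msgrcv(S) sequence that these parameters govern; the message text and sizes are arbitrary choices for the example:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct mymsg {
    long mtype;                 /* message type, must be greater than 0 */
    char mtext[64];             /* message body */
};

int
main(void)
{
    int qid;
    struct mymsg m;

    /* One private queue; MSGMNI limits how many queues can exist. */
    if ((qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600)) == -1) {
        perror("msgget");
        return 1;
    }

    m.mtype = 1;
    strcpy(m.mtext, "hello");   /* 6 bytes including the terminator:
                                 * two segments if MSGSSZ is 4 */

    if (msgsnd(qid, &m, strlen(m.mtext) + 1, 0) == -1)
        perror("msgsnd");

    if (msgrcv(qid, &m, sizeof(m.mtext), 1, 0) == -1)
        perror("msgrcv");

    msgctl(qid, IPC_RMID, (struct msqid_ds *)0);    /* remove the queue */
    return 0;
}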
The following table shows how to calculate the maximum values for these parameters based on the value of MSGSSZ. Note that MSGSSZ must be a multiple of 4 in the range 4 to 4096:

Table 7-1 Calculation of maximum value of message parameters

Parameter   Maximum value
MSGMAP      131072 / MSGSSZ
MSGMAX      32767
MSGMNB      65532
MSGMNI      1024
MSGSEG      131072 / MSGSSZ
MSGTQL      MSGMNB / MSGSSZ

For more information on the kernel parameters that you can use to configure message queues, see "Message queues" (page 213) and "Message queue parameters" (page 215).

Shared memory resources

Shared memory is an extremely fast method of interprocess communication. As its name suggests, it operates by allowing processes to share memory segments within their address spaces. Data written by one process is available immediately for reading by another process. To prevent processes trying to access the same memory addresses at the same time, known as a race condition, the processes must be synchronized using a mechanism such as a semaphore. The maximum number of shared-memory segments available for use is controlled by the value of the kernel parameter SHMMNI. The maximum size in bytes of a segment is determined by the value of the kernel parameter SHMMAX. For more information on the kernel parameters that you can use to configure shared memory, see "Shared memory" (page 218) and "Shared memory parameters" (page 218).
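A minimal sketch of the shmget(S) and shmat(S) calls that these parameters control; the 4KB segment size is an arbitrary choice, and the synchronization that a real application would need (for example, a semaphore) is omitted:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int
main(void)
{
    int shmid;
    char *addr;

    /* One 4KB private segment; SHMMAX caps a segment's size and
     * SHMMNI the number of segments systemwide. */
    if ((shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600)) == -1) {
        perror("shmget");
        return 1;
    }

    /* Attach the segment to our address space. */
    if ((addr = (char *)shmat(shmid, (void *)0, 0)) == (char *)-1) {
        perror("shmat");
        return 1;
    }

    strcpy(addr, "shared data");    /* immediately visible to any
                                     * other attached process */

    shmdt(addr);                                    /* detach the segment */
    shmctl(shmid, IPC_RMID, (struct shmid_ds *)0);  /* and remove it */
    return 0;
}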
You should ensure that the system has sufficient of these resources to meet the demands of the application. Most large applications such as database management systems include advice on tuning the application for the host operating system. They may also include their own tuning facilities, so you should always check the documentation that was supplied with the application. Case study: semaphore activity on a database server In this study, a site has installed a relational database on a multiprocessor system. The database gives the choice of using System V semaphores or the sleeper driver (sometimes called the post-wait driver) to synchronize processes. The object is to investigate which of these options will maximize the number of transactions that can be processed per second and the response time for the user. System configuration The system's configuration is as follows: • Multiprocessor - 2 Pentium 60MHz processors. • EISA bus. • 96MB of RAM. 96MB of swap space. • • 14GB of hard disk (two arrays of seven 1GB SCSI-2 disks). • One Bus Mastering DMA Ethernet network card with a 16KB buffer and 32bit wide data path. The database server does not act as host machine to any users directly; instead there are five host machines connected to the LAN which serve an average of 100 users each. Defining a performance goal The performance goal in this study is to compare the performance of the database when using System V semaphores and when using the sleeper driver. NOTE To configure the sleeper driver into the kernel, change the second field of the line in the file letc/conflsdevice.dlsleeper to read Y ". Then relink and reboot the kernel. I 1/ 167 Tuning system call activity Collecting data To monitor the performance, an in-house benchmark is used for an hour with the system configured to use System V semaphores, and then with it using the sleeper driver. The benchmark measures the minimum, maximum, and average transaction times and the total throughput in transactions per second. The result of running the benchmark is that the best performance is achieved using the sleeper driver. Formulating a hypothesis When the database is using System V semaphores, the system may be spending too much time in kernel mode executing semaphore calls. The benchmark run using the sleeper driver gives better results because it is an enhancement specifically aimed at improving the performance of relational databases. It allows an RDBMS to synchronize built-in processes without the high overhead of switching between user mode and system mode associated with System V semaphores. Getting more specifics To test the hypothesis, mpsar -u is used to display the time that the system spent in system mode while each benchmark was being run. For the benchmark using the sleeper driver, typical results were: 13:55:00 %usr %sys %wio %idle 14:20:00 14:25:00 14:30:00 14:35:00 75 72 69 77 20 23 24 19 2 1 5 4 3 4 2 o The averaged performance of all the CPUs was excellent with low percentages spent in system mode, idle waiting for I/O, or idle. 168 Performance Guide Case study: semaphore activity on a database server For the run using semaphores, the results were: 16:08:00 %usr %sys %wio %idle 16:48:00 16:43:00 16:58:00 16:53:00 55 59 61 58 38 32 34 38 6 4 0 2 1 2 2 7 The system spends more time in system mode and waiting for I/O when System V semaphores are used. The benchmark results indicate that transaction throughput and response time are approximately 10% better when the sleeper driver is used. 
Making adjustments to the system The database is configured to use the sleeper driver as this provides the best performance for the benchmark. The system should be monitored in everyday use to evaluate its performance under real loading. Vendors of the database management systems are continually improving their products to use more sophisticated database technologies. If you upgrade the database management system to a version that supports POSIX.1b semaphores, you may need to evaluate if these should be used instead of the sleeper driver. 169 Tuning system call activity 170 Performance Guide Appendix A Tools reference A variety of tools are available to monitor system performance or report on the usage of system resources such as disk space, interprocess communication (IPC) facilities, and pipes: d£ Reports the amount of free disk blocks on local disk divisions. See "df - report disk space usage" (page 172) and d£(C) for more information. Also see the descriptions of the related commands: dfspace(C) and du(C). ipcs Reports the status of System V interprocess communication (IPC) facilities - message queues, semaphores, and shared memory. See ipcs(ADM) for more information. netstat Reports on STREAMS usage and various network performance statistics. It is particularly useful for diagnosing if a network is overloaded or a network card is faulty. See netstat(TC) for more information. See also ndstat(ADM) which reports similar information. nfsstat Reports NFS statistics on NFS servers and clients. It is particularly useful for detecting problems with NFS configuration. See nfsstat(NADM) for more information. ping Can be used to test connectivity over a network. See ping(ADMN) for more information. pipestat Reports on the usage of ordinary and high performance pipes. See pipe(ADM) for more information. ps Reports on processes currently occupying the process table. See lipS - check process activity" (page 173) and ps(C) for more information. sar Samples the state of the system and provides reports on various system-wide activities. See "sar - system activity reporter" (page 176) and sar(ADM) for more information. 171 Tools reference swap Reports on the amount of available swap space or configures additional swap devices. See "swap - check and add swap space" (page 179) and swap(ADM) for more information. timex Reports on system resource usage during the execution of a command or program. See "timex - examine system activity per command" (page 180) and timex(ADM) for more information. See also the description of the related command, time(C). traceroute Traces the route that network packets take to reach a given destination. ~ee traceroute(ADMN) for more information. vmstat Reports on process states, paging and swapping activity, system calls, context switches and CPU usage. See "vmstat - virtual memory statistics" (page 181) and vmstat(C) for more information. elf - report disk space usage When attempting to achieve optimal performance for the I/O subsystem, it is important to make sure that the disks have enough free space to do their job efficiently. The d£(C) command, and its close relative d£space(C), enable you to see how much free space there is. 
The following example shows the output from df and dfspace on the same system:

$ df
/        (/dev/root      ):   37872 blocks   46812 i-nodes
/u       (/dev/u         ):  270814 blocks   36874 i-nodes
/public  (/dev/public    ):  191388 blocks   55006 i-nodes
/london  (wansvr:/london ):  149750 blocks       0 i-nodes

$ dfspace
/        : Disk space:  18.49 MB of  292.96 MB available ( 6.31%).
/u       : Disk space: 132.23 MB of  629.98 MB available (20.99%).
/public  : Disk space:  93.45 MB of  305.77 MB available (30.56%).
/london  : Disk space:  73.12 MB of  202.56 MB available (36.10%).

Total Disk Space: 317.29 MB of 1431.29 MB available (22.17%).

$ df -v
Mount Dir  Filesystem        blocks     used     free  %used
/          /dev/root         600000   562128    37872    93%
/u         /dev/u           1290218  1019404   270814    79%
/public    /dev/public       626218   434830   191388    69%
/london    wansvr:/london    414858   265108   149750    63%

The -i option to df also provides additional information about the number of free and used inodes. dfspace is a shell script interface to df. Without options, it presents the filesystem data in a more readable format than df. When used with its options, df provides more comprehensive information than dfspace. In the above example, there are three local filesystems: • /dev/root • /dev/u • /dev/public and one remote filesystem: • wansvr:/london All of these local filesystems have adequate numbers of blocks and inodes remaining for use. You should aim to keep at least 15% of free space on each filesystem. This helps to prevent fragmentation which slows down disk I/O. In the above example there are no problems with the filesystems /dev/u and /dev/public which are less than 85% used. The root filesystem (/dev/root), however, is 93% full. This filesystem is relatively static apart from the temporary file storage directories /tmp and /usr/tmp. In the configuration shown, there is very little free space in these directories. Possible solutions are to create divisions to hold these directories on other disks, or to increase the size of the root filesystem. du(C) is another command that can be used to investigate disk usage. It differs from df and dfspace because it reports the number of 512-byte blocks that files and directories contain rather than the contents of an entire filesystem. If no path is specified, du reports recursively on files and directories in and below the current directory. Its use is usually confined to sizing file and directory contents.

ps - check process activity

The ps(C) command obtains information about active processes. It gives a "snapshot" picture of what processes are executing, which is useful when you are trying to identify what processes are loading the system. Without options, ps gives information about the login session from which it was invoked. If you use ps as user root, you can obtain information about all the system's processes. The most useful options are as follows:

Table A-1 ps options

Option  Reports on:
-e      print information on all processes
-f      generate a full listing
-l      generate a long listing (includes more fields)
-u      print information on a specified user (or users)
For example, below is an extract of output after typing ps -el:

 F S   UID   PID  PPID  C PRI NI ADDR   SZ  WCHAN    TTY   TIME CMD
31 S     0     0     0  0  95 20 1f21    0 f0299018   ?    0:00 sched
20 S     0     1     0  0  66 20  252   40 e0000000   ?   30:37 init
31 S     0     2     0  0  95 20  254    0 f00c687c   ?    0:01 vhand
31 S     0     3     0  0  81 20  256    0 f00be318   ?    5:19 bdflush
20 S     0   204     1  0  76 20  416   96 f023451a   ?    1:56 cron
20 S     0   441     1  0  75 20  506   44 f01076b8  co1   0:00 getty
20 S 13079 25014 24908  0  73 20  8a8   48 f011bae4  p2    0:04 ksh
20 R 12752 28220 27895 55  36 20 1ce2  144           p4    0:00 sh
20 Z 13297 26089 25733  0  51 20                     p10   0:00 <defunct>
20 O 20213 28240  8783 45  37 20  711  140           p13   0:00 ps
20 S 13585 28248  8783  0  73 20  cc9  188 f01156f8  p23   0:01 vi

The field headed F gives information about the status of a process as a combination of one or more octal flags. For example, the sched process at the top has a setting of 31, which is the sum of the flags 1, 10 and 20. This means that the sched process is part of the kernel (1), sleeping at a priority of 77 or more (10), and is loaded in primary memory (20). The priority is confirmed by consulting the PRI field further along the line, which displays a priority of 95. In fact, both sched (the swapper) and vhand (the paging daemon) are inactive but have the highest possible priority. Should either of them need to run, it will do so at the context switch following its waking up, as no other process will have a higher priority. For more information on the octal flags displayed and their interpretation, see ps(C).

The S column shows the state of each process. The states shown in the example: S, R, O and Z mean sleeping (waiting for an event), ready-to-run, on the processor (running), and zombie (defunct) respectively. There is only one process running, which is the ps command itself (see the penultimate line). Every other process is either waiting to run or waiting for a resource to become available. The exception is the zombie process, which is currently terminating; this entry will only disappear from the process table when the parent issues a wait(S) system call.

The current priority of a process is also a useful indicator of what a process is doing. Check the value in the PRI field, which can be interpreted as shown in the following table:

Table A-2 Priority values

Priority   Meaning
95         swapping/paging
88         waiting for an inode
81         waiting for I/O
80         waiting for buffer
76         waiting for pipe
75         waiting for tty input
74         waiting for tty output
73         waiting for exit
66         sleeping - lowest system mode priority
65         highest user mode priority
51         default user mode priority
0          lowest user mode priority

Looking back at the above ps output you can see, for example, that the getty process has a priority of 75, as it is (not surprisingly) waiting for some keyboard input. Whereas priority values between 66 and 95 are fixed for a specific action to be taken, anything lower than 66 indicates a user mode process. The running process in the above example (ps) is at priority 37 and is therefore in user mode.

The C field indicates the recent usage of CPU time by a process. This is useful for determining which processes are currently making a machine slow.
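If you want a quick list of the processes with the highest recent CPU usage, you can sort the long listing on the C column. The following pipeline is an illustrative sketch; the field position assumes the -el layout shown above, where C is the sixth column:

ps -el | sort -nr +5 -6 | head

Here sort +5 -6 -nr orders the lines numerically on the sixth field (the header line, having no number there, falls to the bottom of the list), and head displays the first ten lines.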
The NI field shows the nice value of a process. This directly affects the calculation of its priority when it is being scheduled. All processes in the above example are running with the default nice value of 20.

The TIME field shows the minutes and seconds of CPU time used by processes. This is useful for seeing if any processes are CPU hogs, or runaway, gobbling up large amounts of CPU time.

The SZ field shows the swappable size of the process's data and stack in 1KB units. This information is of limited use in determining how much memory is currently occupied by a process, as it does not take into account how much of the reported memory usage is shared. Totaling up this field for all memory-resident processes will not produce a meaningful figure for current memory usage. It is useful on a per-process basis, as you can use it to compare the memory usage of different versions of an application.

NOTE If you booted your system from a file other than /unix (such as /unix.old), you must specify the name of that file with the -n option to ps. For example, ps -ef -n unix.old.

sar - system activity reporter

sar(ADM) provides information that can help you understand how system resources are being used on your system. This information can help you solve and avoid serious performance problems on your system. The individual sar options are described on the sar(ADM) manual page. For systems with an SCO SMP License, mpsar(ADM) reports systemwide statistics, and cpusar(ADM) reports per-CPU statistics. The following table summarizes the functionality of each sar, mpsar, and cpusar option that reports an aspect of system activity:

Table A-3 sar, cpusar, and mpsar options

Option   Activity reported
-a       file access operations
-A       summarize all reports
-b       buffer cache
-B       copy buffers
-c       system calls
-d       block devices including disks and all SCSI peripherals
-F       floating point activity (mpsar only)
-g       serial I/O including overflows and character block usage
-h       scatter-gather and physical transfer buffers
-I       inter-CPU interrupts (cpusar and mpsar only)
-j       interrupts serviced per CPU (cpusar only)
-L       latches
-m       System V message queues and semaphores
-n       namei cache
-O       asynchronous I/O (AIO)
-p       paging
-q       run and swap queues
-Q       processes locked to CPUs (cpusar and mpsar only)
-r       unused memory and swap
-R       process scheduling
-S       SCSI request blocks
-u       CPU utilization (default option for all sar commands)
-v       kernel tables
-w       paging and context switching
-y       terminal driver including hardware interrupts

How sar works

System activity recording is disabled by default on your system. If you wish to enable it, log in as root, enter the command /usr/lib/sar/sar_enable -y, then shut down and reboot the system. See sar_enable(ADM) for more information.

Once system activity recording has been started on your system, it measures internal activity using a number of counters contained in the kernel. Each time an operation is performed, this increments an associated counter. sar(ADM) can generate reports based on the raw data gathered from these counters. sar reports provide useful information to administrators who wish to find out if the system is performing adequately. sar can either gather system activity data at the present time, or extract historic information collected in data files created by sadc(ADM) (the System Activity Data Collector) or sa1(ADM).
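You can also invoke the data collector directly for a one-off sample outside the cron-driven collection described next. A sketch, in which the sampling interval, count, and output filename are illustrative:

/usr/lib/sa/sadc 60 5 /tmp/sa_snapshot

This takes 5 samples at 60-second intervals and writes binary records to /tmp/sa_snapshot, which can then be examined with sar and its -f option.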
If system activity recording has been started, the following crontab entries exist for user sys in the file /usr/spool/cron/crontabs/sys:

0 * * * 0-6 /usr/lib/sa/sa1
20,40 8-17 * * 1-5 /usr/lib/sa/sa1

The first sa1 entry produces records every hour of every day of the week. The second entry does the same but at 20 and 40 minutes past the hour between 8am and 5pm from Monday to Friday. So, there is always a record made every hour, and at anticipated peak times of activity recordings are made every 20 minutes. If necessary, root can modify these entries using the crontab(C) command. The output files are in binary format (for compactness) and are stored in /usr/adm/sa. The filenames have the format sadd, where dd is the day of the month.

Running sar

To record system activity every t seconds for n intervals and save this data to datafile, enter sar -o datafile t n on a single processor system, or mpsar -o datafile t n on a multiprocessor system. For example, to collect data every 60 seconds for 10 minutes into the file /tmp/sar_data on a single CPU machine, you would enter:

sar -o /tmp/sar_data 60 10

To examine the data from datafile, the sar(ADM) command is:

sar [ option ... ] [ -f datafile ]

and the mpsar(ADM) and cpusar(ADM) commands are:

mpsar [ option ... ] [ -f datafile ]
cpusar [ option ... ] [ -f datafile ]

Each option specifies the aspect of system activity that you want to examine. datafile is the name of the file that contains the statistics you want to view. For example, to view the sar -v report for the tenth day of the most recent month, enter:

sar -v -f /usr/adm/sa/sa10

You can also run sar to view system activity in "real time" rather than examining previously collected data. To do this, specify the sampling interval in seconds followed by the number of repetitions required. For example, to take 20 samples at an interval of 15 seconds, enter:

sar -v 15 20

As shipped, the system allows any user to run sar in real time. However, the files in the /usr/adm/sa directory are readable only by root. You must change the permissions on the files in that directory if you want other users to be able to access sar data.

With certain options, if there is no information to display in any of the relevant fields after a specified time interval, then a time stamp is the only output to the screen. In all other cases zeros are displayed under each relevant column. When tuning your system, we recommend that you use a benchmark and have the system under normal load for your application.

swap - check and add swap space

Swap space is secondary disk storage that is used when the system considers that there is insufficient main memory. On a well-configured system, it is primarily used for processing dirty pages when free memory drops below the value of the kernel parameter GPGSLO. If memory is very short, the kernel may swap whole processes out to swap. Candidates for swapping out are processes that have been waiting for an event to complete or have been stopped by a signal for more than two seconds. If a process is chosen to be swapped out, its stack and data pages are written to the swap device. (Initialized data and program text can always be reread from the original executable file on disk.)

The system comes configured with one swap device. Adding additional swap devices with the swap(ADM) command makes more memory available to user processes.
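Note that a swap device added in this way is forgotten at the next reboot; as discussed below, a startup script can re-add it automatically. A minimal sketch of such a script, using the example device and sizes described in the following paragraphs (the script name is illustrative):

#!/bin/sh
# /etc/rc2.d/S09AddSwap (illustrative name):
# re-add the second swap device at every system startup
swap -a /dev/swap1 0 16000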
Swapping and excessive paging degrade system performance, but augmenting the swap space is a way to make more memory available to executing processes without optimizing the size of the kernel and its internal data structures, and without adding physical memory. The following command adds a second swap device, /dev/swap1, to the system. The swap area starts 0 blocks into the swap device, and the swap device is 16000 512-byte blocks in size:

swap -a /dev/swap1 0 16000

Use the swap -l command to see statistics about all the swap devices currently configured on the system. You can also see how much swap is configured on your system at startup by checking nswap. This is listed in the configuration and diagnostic file /usr/adm/messages as a number of 512-byte blocks.

Running the swap -a command adds a second swap device only until the system is rebooted. To ensure that the second swap device is available every time the system is rebooted, use a startup script in the /etc/rc2.d directory. For example, you could call it S09AddSwap; a sketch of such a script appears above.

In this release, a swap area can also be created within a filesystem to allow swapping to a file. To do this, you must marry a block special device to a regular file. For more information, see swap(ADM) and marry(ADM).

timex - examine system activity per command

timex(ADM) times a command and reports the system activities that occurred on behalf of the command as it executed. Run without options, it reports the amount of real (clock) time that expired while the command was executing and the amount of CPU time (user and system) that was devoted to the process. For example:

# timex command command_options

real     6:54.30
user       53.98
sys        14.86

Running timex -s is roughly equivalent to running sar -A, but it displays system statistics only from when you issued the command until the command finished executing. If no other programs are running, this information can help identify which resources a specific command uses during its execution. System consumption can be collected for each application program and used for tuning the heavily loaded resources. Other information is available if the process accounting software is installed; see timex(ADM) for more information.

NOTE To enable process accounting, log in as root, enter the command /usr/lib/acct/acct_enable -y, then shut down and reboot the system. See acct_enable(ADM) for more information.

timex belongs to a family of commands that report command resource usage. It can be regarded as an extension to time(C), which has no options and produces output identical to timex without options. If you wish to use time, you must invoke it by its full pathname, as each of the Bourne, Korn and C shells has its own built-in version. The output from each of the shell builtins varies slightly but is just as limited. The C shell, however, does add the average CPU usage of the specified command.

vmstat - virtual memory statistics

vmstat(C) is a useful tool for monitoring system performance, but it is not as comprehensive as sar. vmstat gives an immediate picture of how a system is functioning. It enables you to see if system resources are being used within their capacity. vmstat's default output concentrates on four types of system activity - process, paging/swapping, system and CPU activity. If a timing interval is specified, vmstat produces indefinite output until you press (Del).
Consider the following example for the command vmstat 5:

PROCS          PAGING                                               SYSTEM    CPU
 r   b w   frs dmd sw cch fil pft frp pos pif pis rsa rsi    sy  cs  us su id
 1 126 0 64000   0  0   0   0   0   0   0   0   0   0   0    59  34   0  3 97
 0 127 0 64000   0  0   0   0   0   0   0   0   0   0   0    47  22   0  2 98
 1 126 0 64000   0  0   0   0   0   0   0   0   0   0   0    45  16   0  2 98
 0 127 0 64000   0  0   0   0   0   0   0   0   0   0   0    86  23   1  5 94
 0 127 0 64000   0  0   0   0   0   0   0   0   0   0   0    24  12   0  1 99
 1 129 0 64000   8  0  10  55  15   0   0   0   0   0   0  1369  43   2  6 92
 0 130 0 64000   0  0   0   0   0   0   0   0   0   0   0   277  19   0  1 99
 0 130 0 64000   0  0   0   0   0   0   0   0   0   0   0    78  26   0  1 99
 0 130 0 64000   0  0   0   0   0   0   0   0   0   0   0   117  36   0  1 99
 0 130 0 64000   0  0   0   0   0   0   0   0   0   0   0   138  46   0  2 98
 0 130 0 64000   0  0   0   0   0   0   0   0   0   0   0   144  51   1  2 97

In this case vmstat displays data at regular intervals, each display representing an average of the activity over the preceding five-second interval. The PROCS heading encompasses the first three fields of output:

r  number of processes on the run queue
b  number of processes blocked waiting for a resource
w  number of processes swapped out

During the sample period there were no swapped-out processes, hardly any processes on the run queue, and between 126 and 130 blocked processes. Any process which was ready to run would not spend much time on the run queue. This conclusion is reinforced by the value of id under the CPU heading, which shows that the system is almost 100% idle most of the time.

The PAGING heading encompasses both paging and swapping activity on the system. The operating system does not preallocate swap space to running processes. It only allocates swap space to processes that have been swapped at least once; this space is only relinquished when such a process terminates. It does, however, decrease its internal count of available swappable memory.

In the above example, the amount of free swap space (frs) remains a constant 64000 (roughly 32MB in 512-byte units). Because this is the amount of swap originally configured for this system, no swapping or paging out to disk occurred during the sampling period. This is confirmed by the zero value of the w field. The fields from pos to rsi also show that no processes or regions were swapped in or out during the time that vmstat was running.

There is a brief amount of paging activity on the sixth line of output. One or more processes attempted to access pages that were not currently valid. To satisfy the demand for these pages, the kernel obtained them from the page cache (cch) in memory or from filesystems on disk, but not from swap (sw).

If a process invokes the fork(S) system call, this creates an additional copy, or child process, of the original process. The new process shares the data and stack regions of its parent. The pages in these regions are marked copy-on-write (COW). This is to avoid wasting CPU and memory resources, because the usual purpose of a fork is for either the parent or child process to execute a new command in place of itself. If, instead, the parent or child process tries to write to a page marked COW, this generates a protection fault (pft), causing the page fault handler in the kernel to make a copy of the page. The dmd field accounts for a combination of demand zero pages (those created and initialized with zeros for data storage) and demand fill pages (those created and filled with text).

System call (sy) and context switching activity (cs) can also be seen under the SYSTEM heading.
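To examine this activity in more detail, you can run the corresponding sar reports alongside vmstat; for example (the sampling values are illustrative):

sar -c 5 5
sar -w 5 5

Table A-3 lists -c as reporting system calls and -w as reporting paging and context switching, so these reports can be used to cross-check the sy and cs columns above.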
The -s option to vmstat reports statistics about paging activity since the system was started or in a specified time interval:

  64000 free swap space
  12222 demand zero and demand fill pages
  25932 pages on swap
  44589 pages in cache
  28719 pages on file
  33791 protection fault
  84644 pages are freed
     23 success in swapping out a process
      0 fail in swapping in a process
     22 success in swapping in a process
     98 swapping out a region
     64 swapping in a region
 457461 cpu context switches
1870524 system calls

Lines showing large values for pages on swap, success in swapping out a process, success in swapping in a process, swapping out a region, and swapping in a region may indicate that excessive swapping or paging is degrading performance.

The -f option to vmstat provides information about the number of forks (that is, new processes created) since the system was started or in a specified time interval. For example, to monitor how many fork system calls are being invoked every second, use the command vmstat -f 1:

0 forks
0 forks
2 forks
1 forks
0 forks

Appendix B Configuring kernel parameters

Kernel parameters control the allocation of various kernel resources. These resources are constantly being used, released and recycled, and include:

buffers  Recently used data is cached in memory; buffers increase performance by reducing the need to read data from disk. Buffers also allow efficient transfer of data by moving it in large units.

table entries  Space in system tables that the kernel uses to keep track of current tasks, resources, and events.

policies  Governing such things as security, and conformance to various standards.

Other parameters control the behavior of device drivers or the available quantity of special resources, such as the number of multiscreens or semaphores. Each resource limit is represented by a separate kernel parameter. The limit imposed by a parameter can be decreased or extended, sometimes at the expense of other resources. Deciding how to optimize the use of these resources is one aspect of kernel performance tuning.

For a description of the tools available for examining and changing parameters, see "Configuration tools" (page 188). For a description of the various kernel parameters that you can change using the configure(ADM) utility or via the Hardware/Kernel Manager, see "Kernel parameters that you can change using configure" (page 191). For a description of the various kernel parameters that you can only change from the command line using the idtune(ADM) utility, see "Using idtune to reallocate kernel resources" (page 190).

See "Using configure to change kernel resources" (page 189) for a description of how to run the configure(ADM) utility. If you have TCP/IP installed on your system, see Appendix C, "Configuring TCP/IP tunable parameters" (page 225). If you are using the LAN Manager Client Filesystem (LMCFS), see "LAN Manager Client Filesystem parameters" (page 223).

When to change system parameters

Among the cases in which you may need to reallocate system resources are:

• You install additional physical memory and thus have greater memory resources to allocate.
• Persistent error messages are being displayed on the system console indicating that certain resources are used up, such as inodes or table entries.
• The system response time is consistently slow, indicating that other resources are too constrained for the system to operate efficiently (as when too little physical memory is installed).
• Resource usage needs to be tailored to meet the needs of a particular application.

If one of your performance goals is to reduce the size of the kernel (usually because the system is paging excessively or swapping), first concentrate on tunable parameters that control large structures. The following table lists a small subset of kernel tunable parameters and indicates the cost (or benefit) in bytes of incrementing (or decrementing) each parameter by a single unit. For example, if NCLIST is set to 200, this requires 200 times 72 bytes, or approximately 14KB of memory.

Parameter      Number of bytes per unit of parameter
DTCACHEENTS    44
DTHASHQS       8
HTCACHEENTS    44
HTHASHQS       8
NBUF           1024
NCLIST         72 (64 for the buffer + 8 for the header)
NHBUF          8
NHINODE        8
NMPBUF         4096
MSGMAP         8
NSPTTYS        246
NSTREAM        80 (52 for the STREAMS header + 28 for the extended header)
MAX_INODE      76 per entry added to the dynamic in-core inode table
MAX_PROC       344 per entry added to the dynamic process table
MAX_FILE       12 per entry added to the dynamic open file table
MAX_REGION     76 per entry added to the dynamic region table

Dynamic table parameters such as MAX_PROC usually have their values set to 0. Each table grows in size as more entries are needed. The memory overhead of a grown kernel table can be found by multiplying the values shown above by the number of table entries reported by getconf(C). For example, from the Korn shell, you can find the current size of the process table by entering:

let nproc=344*$(getconf KERNEL_PROC)
echo "Size of process table in bytes is $nproc"

Specialized applications often require the reallocation of key system resources for optimum performance. For example, users with large databases may find that they need more System V semaphores than are currently allocated.

Most of the tunable parameters discussed in this chapter are defined in /etc/conf/cf.d/mtune. This file lists the default, minimum and maximum values of each of the parameters specified. To change the values of specific tunable parameters manually, use the appropriate tool as described in "Configuration tools" (page 188).

Configuration tools

The following tools are available for examining and/or changing tunable parameters:

configure  A menu-driven program that allows you to examine and modify the value of tunable kernel parameters. This program is also accessible via the Hardware/Kernel Manager. See "Using configure to change kernel resources" (page 189) and configure(ADM) for more information.

getconf  This utility reports configuration-dependent values for various standards and for dynamic kernel tables; use setconf to modify temporarily those values that relate to dynamic kernel tables. See "Examining and changing configuration-dependent values" (page 223) and getconf(C) for more information.

idtune  Modify the values of some tunable parameters (defined in /etc/conf/cf.d/mtune) that cannot be modified with configure. See "Using idtune to reallocate kernel resources" (page 190) and idtune(ADM) for more information.

iddeftune  Run this command to modify the values of certain tunable parameters if you increase the amount of physical memory (RAM) to more than 32MB. See iddeftune(ADM) for more information.
ifconfig  Reconfigure the TCP/IP protocol stack belonging to a single network interface. See "Using ifconfig to change parameters for a network card" (page 225) and ifconfig(ADMN) for more information.

inconfig  Reconfigure default TCP/IP settings for all network interfaces. See "Using inconfig to change global TCP/IP parameters" (page 226) and inconfig(ADMN) for more information.

Network Configuration Manager  Examine, configure, or modify network protocol stacks (chains). The Network Configuration Manager is the graphical version of netconfig(ADM). See Chapter 25, "Configuring network connections" in the SCO OpenServer Handbook for more information.

setconf  Increase dynamic kernel table sizes, or decrease the maximum size of dynamic kernel tables. The new value only remains in force until the system is next rebooted. See "Examining and changing configuration-dependent values" (page 223) and setconf(ADM) for more information.

Using configure to change kernel resources

The configure(ADM) utility is a menu-driven program that presents each tunable kernel parameter and prompts for modification. To change a kernel parameter using configure, do the following:

1. Enter the following commands as root to run configure:

   cd /etc/conf/cf.d
   ./configure

2. The configure menu displays groups of parameter categories; their individual meanings are discussed in "Kernel parameters that you can change using configure" (page 191). Choose a category by entering the number preceding it. The resources in that category are displayed, one by one, each with its current value. Enter a new value for the resource, or to retain the current value, press (Enter). After all the resources in the category are displayed, configure returns to the category menu prompt. Return to the Main Menu to choose another category, or exit configure by entering "q".

NOTE The software drivers associated with a parameter must be present in the kernel for the setting of the parameter to have any effect.

3. After you finish changing parameters, link them into a new kernel and reboot your system as described in "Relinking the kernel" in the SCO OpenServer Handbook.

NOTE If you wish to set the values of parameters defined in /etc/conf/cf.d/mtune from a shell script, you should use the idtune(ADM) command as described in "Using idtune to reallocate kernel resources" (page 190).

Using idtune to reallocate kernel resources

You cannot use configure to change some kernel parameters because they are not generally considered to need adjusting. If you do need to alter such a parameter, log in as root and use the idtune(ADM) command:

cd /etc/conf/cf.d
/etc/conf/bin/idtune resource value

resource is the name of the tunable parameter in uppercase as it appears in /etc/conf/cf.d/mtune (see mtune(F)). value is the parameter's new value. After changing the parameter values, relink the kernel, then shut down and reboot the system as described in "Relinking the kernel" in the SCO OpenServer Handbook.

You can use the -f option to idtune to force it to accept a value outside the range specified by the minimum and maximum values defined in mtune. If necessary, you can also use the -min and -max options to write new minimum and maximum values to the mtune file.

WARNING The configure and idtune commands write new values defined for kernel parameters to /etc/conf/cf.d/stune (see stune(F)). Do not edit mtune itself as it can be a valuable reference.
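For example, to set the BFREEMIN parameter described in "Buffer cache free list" (page 195) to 50 (an illustrative value within its 0 to 100 range), you would enter:

cd /etc/conf/cf.d
/etc/conf/bin/idtune BFREEMIN 50

and then relink the kernel and reboot as described above.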
The following sections describe the parameters that can only be tuned using idtune:

• "Boot load extension parameters" (page 222)
• "Buffer cache free list" (page 195)
• "Hardware and device driver parameters" (page 222)
• "Memory management parameters" (page 197)
• "Message queue parameters" (page 215)
• "Semaphore parameters" (page 217)
• "Shared memory parameters" (page 218)
• "STREAMS parameters" (page 213)
• "System parameters" (page 219)
• "LAN Manager Client Filesystem parameters" (page 223)

Kernel parameters that you can change using configure

The tunable parameters that you can change using configure(ADM) are grouped into two sets of categories depending on whether they affect system performance or configuration:

Performance tunables
• "Buffer management" (page 192)
• "Processes and paging" (page 195)
• "TTYs" (page 197)
• "Name cache" (page 198)
• "Asynchronous I/O" (page 199)
• "Virtual disks" (page 200)

Configuration tunables
• "User and group configuration" (page 201)
• "Security" (page 203)
• "TTY and console configuration" (page 204)
• "Filesystem configuration" (page 205)
• "Table limits" (page 207)
• "STREAMS" (page 209)
• "Message queues" (page 213)
• "Event queues" (page 216)
• "Semaphores" (page 216)
• "Shared memory" (page 218)
• "Miscellaneous system parameters" (page 219)
• "Miscellaneous device drivers and hardware parameters" (page 220)

Buffer management

The following tunables may be used to tune the performance of your system's buffers.

NBUF  The amount of memory in 1KB units allocated for use by the system buffer cache at boot time. The system buffer cache is memory used as a temporary storage area between the disk and user address space when reading from or writing to mounted filesystems. If NBUF is set to the default of 0, the system calculates the size of the buffer cache automatically. The size of the buffer cache is displayed as "kernel i/o bufs" at boot time, and is recorded along with other configuration information in /usr/adm/messages.

The hit rate on the buffer cache increases as the number of buffers is increased. Cache hits reduce the number of disk accesses and thus may improve overall disk I/O performance. Study the sar -b report for statistics about the cache hit rate on your system. See "Increasing disk I/O throughput by increasing the buffer cache size" (page 75) for more information.

The system buffer cache typically contains between 300 and 600 buffers, but may contain 8000 or more buffers on a large server system. The maximum possible number of buffers is 450000. On HTFS, EAFS, AFS, and S51K filesystems, each buffer uses 1KB of memory plus a 72-byte header. Having an unnecessarily large buffer cache can degrade system performance because too little space is available for executing processes. If you are using the DTFS filesystem, buffers are multiples of 512 bytes in size, ranging from 512 bytes to 4KB. The number of buffers in the buffer cache is not constant in this case and varies with demand.

For optimal performance, you should adjust the number of hash queues (NHBUF) when you adjust the value of NBUF.

NHBUF  Specifies how many hash queues to allocate for buffers in the buffer cache. These are used to search for a buffer (given a device number and block number) rather than having to search through the entire list of buffers. The value of NHBUF must be a power of 2 ranging between 32 and 524288. Each hash queue costs 8 bytes of memory.
The default value of NHBUF is 0, which sets the number of hash queues automatically:

• On single processor machines, NHBUF is set to the power of 2 that is less than or equal to half the value of NBUF.
• On multiprocessor machines, NHBUF is set to the power of 2 that is greater than or equal to twice the value of NBUF. This reduces the likelihood of contention between processors wanting to access the same hash queue.

For example, if NBUF works out at 600 buffers, a single processor system would set NHBUF to 256, the largest power of 2 that does not exceed 300.

NMPBUF  Number of 4KB pages of memory used for the following types of multiphysical buffers:

• 16KB scatter-gather buffers (also known as cluster buffers). These are used to perform transfers of contiguous blocks of data on disk to and from the buffer cache.
• 4KB transfer buffers. These are used as intermediate storage when moving data between memory and peripheral devices with controllers that cannot access memory above 16MB.
• 1KB copy request buffers. These are used as intermediate storage when moving data between the buffer cache and peripheral devices with controllers that cannot access memory above 16MB.

NMPBUF should be set larger than 40 for machines with more than 16MB of memory and many users. The maximum possible size is 512. If the value of NMPBUF is set to zero (the default), the kernel determines a suitable value automatically at startup. In this case, it sets the value of NMPBUF in the range 40 to 64 depending on the amount of available memory.

PLOWBUFS  Amount of the buffer cache that is contained in the first 16MB of RAM. It is expressed as a percentage, and should be as high as possible if the controllers for the peripheral devices (such as the disks) in your system cannot perform DMA to memory above the first 16MB (24-bit addressing controllers). If possible, set PLOWBUFS to 100 to eliminate the need to copy between buffers above 16MB and the copy buffers (see NMPBUF). To ascertain if a SCSI host adapter can access memory above the first 16MB (32-bit addressing), consult the initialization message for its driver in the file /usr/adm/messages. If the string fts= is followed by one or more characters including a d, the controller is 32-bit; otherwise it is 24-bit. The default value of PLOWBUFS is 30, and it can range between 1 and 100%. You need only change this parameter if your system has more than 16MB of RAM.

PUTBUFSZ  Specifies the size of the circular buffer, putbuf, that contains a copy of the last PUTBUFSZ characters written to the console by the operating system. The contents of putbuf can be viewed using crash(ADM). The default and minimum value is 2000; the maximum is 10000.

NHINODE  Specifies the size of the inode hash table, which must be a power of 2. It ranges from 64 to 8192 with a default value of 128.

BDFLUSHR  Specifies the rate at which the bdflush daemon process runs, checking the need to write the filesystem buffers to the disk. The range is 1 to 300 seconds. The value of this parameter must be chosen in conjunction with the value of NAUTOUP. For example, it is nonsensical to set NAUTOUP to 10 and BDFLUSHR to 100; some buffers would be marked dirty 10 seconds after they were written, but would not be written to disk for another 90 seconds. Choose the values for these two parameters considering how long a dirty buffer may have to wait to be written to disk and how much disk-writing activity will occur each time bdflush becomes active.
For example, if both NAUTOUP and BDFLUSHR are set to 40, buffers are 40 to 80 seconds old when written to disk and the system will sustain a large amount of disk-writing activity every 40 seconds. If NAUTOUP is set to 10 and BDFLUSHR is set to 40, buffers are 10 to 50 seconds old when written to disk and the system sustains a large amount of disk-writing activity every 40 seconds. Setting NAUTOUP to 40 and BDFLUSHR to 10 means that buffers are 40 to 50 seconds old when written, but the system sustains a smaller amount of disk-writing activity every 10 seconds. With this setting, however, the system may devote more overhead time to searching the block lists.

WARNING If the system crashes with BDFLUSHR set to 300 (its maximum possible value), then 150 seconds' worth of data, on average, will be lost from the buffer cache. A high value of BDFLUSHR may radically improve disk I/O performance, but it does so at the risk of significant data loss.

NAUTOUP  Specifies the buffer age in seconds for automatic filesystem updates. A system buffer is written to disk when the bdflush daemon process runs and the buffer has been scheduled for a write for NAUTOUP seconds or more. This means that not all write buffers will be flushed each time bdflush runs; bdflush will sometimes run less than NAUTOUP seconds after certain buffers were written to, and these remain scheduled to be written until the next appropriate flush. This enables a process to perform multiple writes to a buffer but fewer actual writes to a disk. The ratio of writes between physical memory to kernel buffer and buffer to disk will tend to increase (that is, fewer actual disk writes) if the ratio between the flush rate BDFLUSHR and NAUTOUP decreases. Specifying a smaller limit increases system reliability by writing the buffers to disk more frequently, and decreases system performance. Specifying a larger limit increases system performance at the expense of reliability. The default value is 10, and it ranges between 0 (flush all buffers regardless of how short a time they were scheduled to be written) and 60 seconds.

Buffer cache free list

NOTE This parameter is not tunable using configure(ADM); you must use the idtune(ADM) command instead, as described in "Using idtune to reallocate kernel resources" (page 190).

BFREEMIN  Sets a lower limit on the number of buffers that must remain in the free list. This allows some (possibly useful) blocks to remain on the free list even when a large file is accessed. If only BFREEMIN buffers remain on the free list, a process requiring one or more buffers may sleep until more become available. The value of BFREEMIN is usually set to the default and minimum value of 0; the maximum value is 100. You may see an improvement in the buffer cache read and write hit rates reported by sar -b if you set the value of BFREEMIN to the smaller of NBUF/10 or 100. (With NBUF at 600, for example, the smaller of 600/10 and 100 is 60, so BFREEMIN would be set to 60.) An improvement in performance is most likely on machines that are used primarily for media copying, uucp transfers, and running other applications that are both quasi-single-user and access many files.

Processes and paging

The tunable parameters GPGSLO and GPGSHI determine how often the paging daemon vhand runs. vhand can only run at clock ticks, and it is responsible for freeing up memory when needed by processes. It uses a "least recently used" algorithm as an approximation of process working sets, and it writes out to disk pages that are not modified during a defined time period.
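To watch the effect of page stealing while the system is under load, you can sample the relevant sar reports (the sampling values are illustrative):

sar -p 10 6
sar -r 10 6

Table A-3 lists -p as reporting paging activity and -r as reporting unused memory and swap, so together they show how free memory responds as vhand runs.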
GPGSLO  Specifies the low value of free memory pages at which vhand starts stealing pages from processes. Normally, GPGSLO is tuned to a value that is about 1/16 of pagable memory. Increase the value to make the vhand daemon more likely to become active; decrease the value to make it less likely to become active. The value of GPGSLO must be a whole number greater than or equal to 0 and less than or equal to 200. Its value must also be less than that of GPGSHI. If GPGSLO is too large a fraction of the pages that are available, vhand becomes active before memory starts to become really short, and useful pages may be paged out. If GPGSLO is too small, the system may run out of memory altogether between clock ticks. If this happens, the swapper daemon sched runs to swap whole processes out to disk.

GPGSHI  Specifies the high value of free memory pages at which vhand stops stealing pages from processes. Normally GPGSHI is set to a value that is about 1/10 of pagable memory.
MAXSLICE must be a value from 25 to 100; the default is 100. 196 Performance Guide Configuration tools SPTMAP Determines the size of the map entry array used for managing kernel virtual address space. The default value is 200; the minimum and maximum values are 100 and 500. Memory management parameters NOTE This group of parameters is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190). I MAXSC Specifies the maximum number of pages that are swapped out in a single operation. The default and maximum value is 8. MAXFC Maximum number of pages that are added to the free list in a single operation. The default and maximum value is 8. TTYs The following parameters control various data structure sizes and other limits in character device drivers provided with the operating system. NCLIST Specifies the number of character list buffers to allocate. Each buffer contains up to 64 bytes of data. The buffers are dynamically linked to form input and output queues for the terminal lines and other slow-speed devices. The average number of buffers needed for each terminal is in the range of 5 to 10. Each entry (buffer space plus header) costs 72 bytes. When full, input and output characters dealing with terminals are lost, although echoing continues, and the following message is displayed on the console: CONFIG: OUt of clists (NCLIST = nu~ber exceeded) The default and minimum value of NCLIST is 120, and the maximum is 16640. For users logged in over serial lines with speeds up to 9600 bps, the recommended setting of NCLIST is 10 times the maximum number of users that you expect will log in simultaneously. You should also increase the TTHOG parameter; this controls the effective maximum size of the raw input queue for fast serial lines. 197 Configuring kernel parameters Since each buffer is 64 bytes in size, you should increase NCLIST by TTHOG divided by 64 and multiplied by the number of fast serial lines, as shown in the following table: lTHOG value Increase NCLIST by 2048 4096 8192 32 * number of fast serial lines 64 * number of fast serial lines 128 * number of fast serial lines TTHOG Sets the effective size of the raw queue of the tty driver. The default and minimum value is 256 bytes; the maximum is 8192 bytes. Increasing the value of this parameter allows more unprocessed characters to be retained in the tty buffer, which may prevent input characters from being lost if the system is extremely busy. If you are using sustained data transfer rates greater than 9600 bps, you should increase TTHOG to 2048 or 4096 bytes depending on the demands of the application. You must also increase the value of NCLIST to match the increased value of TTHOG. Name cache The following parameters control the performance of the namei caches that are used to speed the translation of filenames to inode numbers. Parameters beginning with HT control the namei cache used with HTFS, EAFS, and AFS file systems (all based on the ht filesystem driver). HTCACHEENTS Number of name components in the ht namei cache. It must have a value of between 1 and 4096; the default is 256. The recommended value for diverse workgroups is to make HTCACHEENTS large, roughly three times the maximum grown size of the in-core inode table reported by sar -v. HTHASHQS Number of hash queues for the ht namei cache. HTHASHQS must be a prime number between 1 and 8191; the default is 61. 
The recommended value of HTHASHQS for diverse workgroups is to make it at least half the size of HTCACHEENTS. HTOFBIAS Determines the bias towards keeping the names of open files in the ht namei cache. It must have a value of between 1 and 256; the default is 8. The higher that you make the value of HTOFBIAS, the longer the names will remain in the cache. A value of 0 means that the names have no special caching priority. 198 Performance Guide Configuration tools Parameters beginning with DT control the namei cache used with DTFS filesystems (based on the dt filesystem driver). DTCACHEENTS Number of name components in the dt namei cache. It must have a value of between 1 and 4096; the default is 256. The recommended value for diverse workgroups is to make DTCACHEENTS large, roughly three times the maximum grown size of the in-core inode table reported by sar -v. DTHASHQS Number of hash queues for the dt namei cache. DTHASHQS must be a prime number between 1 and 8191; the default is 61. The recommended value of DTHASHQS for diverse workgroups is to make it at least half the size of DTCACHEENTS. DTOFBIAS Determines the bias towards keeping the names of open files in the dt namei cache. It must have a value of between 1 and 256; the default is 8. The higher that you make the value of DTOFBIAS, the longer the names will remain in the cache. A value of 0 means that the names have no special caching priority. Asynchronous 110 The asynchronous I/O feature supports asynchronous I/O operations on raw disk partitions. It must be added to the kernel using the mkdev aio command for these parameters to have any effect (see aio(HW) for more information). NAIOPROC Size of the AID process table that determines the number of processes that may be simultaneously performing asynchronous I/O. The range of values is between 1 and 16; the default is 5. When the AID process table overflows, the following message is displayed on the console: CONFIG: aio_rnemlock - AIO process table overflow (NAIOPROC =nur.nber exceeded) NAIOREQ Size of the AID request table that determines the maximum number of pending asynchronous I/O requests. The range of values is between 5 and 200; the default is 120. When the AID request table overflows, the following message is displayed on the console: CONFIG: aio_breakup - AIO request table overflow (NAIOREQ = nur.nber exceeded) NAIOBUF Size of the AID buffer table that determines number of asynchronous I/O buffers. This should always be set to the same value as NAIOREQ. When the AID buffer table overflows, the following message is displayed on the console: CONFIG: aio_breakup - AIO buffer table overflow (NAIOBUF =nu~ber exceeded) 199 Configuring kernel parameters NAIOHBUF Number of internal asynchronous hash queues. The range of values is between 1 and 50; the default is 25. NAIOREQPP Maximum number of asynchronous I/O requests that a single process can have pending. The default value is 120, meaning that a single process can potentially exhaust all asynchronous I/O resources. The range of values is between 30 and 200. NAIOLOCKTBL Number of entries in the internal kernel table for asynchronous I/O lock permissions. The range of values is between 5 and 20; the default is 10. If there are many entries in the /usr/lib/aiomemlock file, this value may need to be increased. 
When the AIO lock table overflows, the following message is displayed on the console: CONFIG: aio_setlockauth - AIO lock table overflow (NAIOLOCKTBL =nur.nber exceeded) Virtual disks The following parameters control the performance of virtual disk arrays if these are configured on your system. VDUNITMAX The maximum number of virtual disks that can be configured. This parameter defines the size of several structures used by the vd driver. On systems where the number of virtual disks is likely to be constant, set VDUNITMAX equal to the number of virtual disks. The default value is 100; the minimum and maximum values are 5 and 256. VDJOBS The maximum number of virtual disk jobs that can exist in the global job pool. The default value is 200; the minimum and maximum values are 100 and 400. VDUNITJOBS The maximum number of job structures and piece pool entries for each virtual disk in the system. A piece pool entry contains a piece structure for each disk piece in a virtual disk array. For example, a piece pool entry for a three-piece RAID 5 array contains three piece structures. Each job structure is 88 bytes in size. Each piece structure is 84 bytes in size. The default value of VDUNITJOBS is 100; the minimum and maximum values are 50 and 200. VDHASHMAX The size of the hash table used for protecting the integrity of data during read, modify, and write operations. Each hash table entry requires 24 bytes of memory. The value of VDHASHMAX must be a power of 2; the minimum and maximum values are 512 and 8192. The default value is 1024. 200 Performance Guide Configuration tools VDASYNCPARITY Controls whether writes to the parity device on RAID 4 and 5 devices are performed asynchronously. The default is 1 (write asynchronously). If set to 0, the system waits for all I/O to complete. VDASYNCWRITES Controls whether writes to the other half of a RAID 1 device (mirror) are performed asynchronously. The default is 1 (write asynchronously). If set to 0, the system waits for I/O on both halves of a mirror to complete. VDASYNCMAX Sets the maximum number of outstanding asynchronous writes for RAID 1, 4 and 5 configurations in asynchronous mode (that is, VDASYNCWRITES or VDASYNCPARITY are set to 1). The default value is 20; the minimum and maximum values are 20 and 64. VDWRITEBACK Enables write-back caching. This increases the throughput of a virtual disk by writing data asynchronously during the last phase of a readmodify-write job. The default value is 0 (do not use write-back caching). If set to 1, write-back caching is enabled. WARNING Enabling write-back caching may compromise the integrity of the data if the system crashes. Use this feature only at your own discretion. I VDRPT The interval in seconds between error conditions being reported. The default value is 3600; the minimum and maximum values are 0 and 86400 seconds. If set to 0, errors are only reported when detected. User and group configuration The following parameters control resources that are specific to individual users or groups. NOFILES Specifies the maximum number of open files for each process. Unless an application package recommends that NOFILES be changed, the default setting should be left unaltered. The Bourne, C and Kom shells all use three file table entries: standard input, standard output, and standard error (file descriptors 0, 1, and 2 respectively). This leaves the value of NOFILES minus 3 as the number of other open files available for each process. 
If a process requires up to three more than this number, then the standard files must be closed. This practice is not recommended and must be used with caution, if at all. If the configured value of NOFILES is greater than the maximum (11000) or less than the minimum (60), the configured value is set to the default (110), and a message is sent to the console. 201 Configuring kernel parameters Unless an application package recommends that NOFILES be changed, the default setting should be left as is. ULIMIT Specifies in 512-byte blocks the size of the largest file that an ordinary user can write. The default value is 2097151; that is, the largest file an ordinary user can write is approximately 1GB (one gigabyte). A lower limit can be enforced on users by changing the value of ULIMIT in the file /etc/dejault/login; see login(M). The ULIMIT parameter does not apply to reads; any user can read a file of any size. MAXUP Specifies how many concurrent user processes an ordinary user is allowed to run. The entry is in the range of 15 to 16000, with a default value of 100 processes. This value should be at least 10% smaller than the value of MAX_PROC (or the maximum grown size of the process table reported by sar -v if MAX_PROC is set to 0). This value is determined by the user identification number, not by the terminal. For example, the more people that are logged in on the same user identification, the quicker the default limit would be reached. MAXUMEM Maximum size of a process' virtual address space in 4096-byte pages. The allowed range of values is between 2560 and 131072; the default is 131072 pages (512MB). If you decrease this value and a process will not start due to lack of memory, its parent shell reports one of the messages: "Too big" or "Not enough space". NGROUPS Maximum number of simultaneous supplemental process groups per process. The value of NGROUPS can be set to any integral value from 0 to 12B; the default value is B. NGROUPS maps to the POSIX.1 runtime value NGROUPS_MAX for which the minimum value allowed by FIPS is B. To retain FIPS and XPG4 compliance, you must restrict the value of NGROUPS to be greater than or equal toB. CMASK The default mask used by umask(S) for file creation. By default this is zero, meaning that the umask is not set in the kernel. The range of values is between 0 and 0777. See chmod(C)and umask(C) for an explanation of setting absolute mode file permissions. 202 Performance Guide Configuration tools CHOWN_RES Controls system-wide chown kernel privilege (formally known as the chown kernel authorization) on all filesystems that set the POSIX.l constant _POSIX_CHOWN_RESTRICTED (also defined in X/Open CAE Specification, System Interfaces and Headers, Issue 4, 1992). See getconf(C) for more information. If set, CHOWN_RES prevents all users except root from changing ownership of files on all filesystems that support _POSIX_CHOWN_RESTRICTED. The default value of CHOWN_RES is 0 (not set) which causes the restriction not to be enforced. You can also use the chown kernel privilege to control users' privilege to change file ownership. If chown kernel privilege is removed, some XPG4conformant applications may fail if they use interprocess communication (semaphores, shared memory, and message passing). You should only set chown kernel privilege in this way if you require C2-level security. 
IOV_MAX Maximum size of the I/O vector (struct iovec) array (number of noncontiguous buffers) that can be used by the readv(S) (scatter read) and writev(S) (gather write) system calls. The default value is 512; the minimum and maximum values are 16 and 1024. Security The security profile (High, Improved, Traditional, or Low) can be selected as discussed in "Changing the system security profile" in the System Administration Guide. The security parameters can be set to modify the behavior of the security features and to ensure compatibility with utilities that expect traditional UNIX system behavior. Each of these parameters can be set to 0 (off) or 1 (on). SECLUID Controls the enforcement of login user ID (LUID). Under SCO's implementation of C2 requirements, every process must have an LUID. This means that processes that set UIDs or GIDs, such as the printer scheduler (lpsched), must have an LUID set when started at system startup in /etc/rc2.d/S80lp. This can cause problems with setuid programs. When the security default is set to a profile other than "High", enforcement of LUID is relaxed and setuid programs do not require an LUID to run. 203 Configuring kernel parameters SECSTOPIO Controls whether the kernel implements the stopio(S) system call. When SECSTOPIO is set to 1, the kernel acts on stopio(S) calls; when it is set to 0, the kernel ignores stopio calls. The stopio system call is used under C2 to ensure that a device is not held open by another process after it is reallocated. This means that other processes attempting to access the same device may be killed. stopio(S) is used by initcond(ADM), which is called by getty(M) immediately before starting user interaction and by init(M) immediately after an interactive session has terminated. SECCLEARID Controls the clearing of SUID/SGID bits when a file is written. Under C2 requirements, the set user ID (SUID or setuid) and set group ID (SGID or setgid) bits on files must be cleared (removed) when a file is written. This prevents someone from replacing the contents of a setuid binary. This can cause problems with programs that do not expect this behavior. In the #Low" security profile, SUID and SGID bits are not cleared when files are written. The following table summarizes the initial settings of the security parameters for each security profile. Parameter Low Traditional Improved High SECLUID SECSTOPIO SECCLEARID off off off off on on off on on on on on TTY and console configuration The multiscreen parameters determine the number of console multiscreens that can run simultaneously on the system. Each multiscreen requires about 4 to 8KB of memory depending on the number of lines (25 or 43). H you need to save memory and are not using multiscreens heavily, set NSCRN to 4 and SCRNMEM to 16 or 32. When you do this, you must also disable(C) multiscreens 5-12 (tty5 to tty12) or getty will generate warning messages when the system goes to multiuser mode. NSCRN and SCRNMEM can be set to smaller values than this if you are sure that you need fewer multiscreens. TBLNK Controls the console screen saver feature on VGA consoles (only). It is the number of seconds before the screen blanks to save wear on the monitor. TBLNK can have a value of 0 to 32767, with 0 (default) disabling screen blanking. 204 Performance Guide Configuration tools NSCRN The number of console multiscreens. A value of 0 configures this value at boot time. The maximum value is 12. SCRNMEM Number of 1024-byte blocks used for console screen memory. 
TBLNK Controls the console screen saver feature on VGA consoles (only). It is the number of seconds before the screen blanks to save wear on the monitor. TBLNK can have a value of 0 to 32767, with 0 (default) disabling screen blanking.

NSCRN The number of console multiscreens. A value of 0 configures this value at boot time. The maximum value is 12.

SCRNMEM Number of 1024-byte blocks used for console screen memory. A value of 0 (the default) configures this value at boot time based on the amount of memory installed. The range of values is between 9 and 128. Each multiscreen uses from 4 to 8KB of memory, so when using a non-zero value for this parameter, make SCRNMEM equal to 4 or 8 times the value of NSCRN.

NSPTTYS Number of pseudo-ttys on the system. The default value is 16; the minimum and maximum values are 1 and 256. Each NSPTTYS requires 246 bytes of memory. This parameter should only be altered using the mkdev ptty command which also creates the additional device nodes. Pseudo-ttys are not related to console multiscreens; they are used for features such as serial multiscreens mscreen(M), for shell windows, and for remote logins.

NUMXT Number of layers a subdevice can configure to support bitmapped display devices such as the BLIT or the AT&T 5620 and 730 terminals. The range of values is between 1 and 32; the default is 3. When this number is exceeded, the following message is displayed on the console:

CONFIG: xtinit - Cannot allocate xt link buffers (NUMXT = number exceeded)

Note that the xt driver must have been linked into the kernel using the mkdev layers command or the Hardware/Kernel Manager in order to use these display devices.

NUMSXT Number of shell layers (shl(C)) a subdevice can configure. The range of values is between 1 and 32; the default is 6. Note that the sxt driver must have been linked into the kernel using the mkdev shl command or the Hardware/Kernel Manager in order to use shell layers.

Filesystem configuration

The following parameters control the configuration of different filesystem types.

MAXVDEPTH Maximum number of undeletable (versioned) files allowed in the DTFS and HTFS filesystems. A value of 0 disables versioning; the maximum value is 65535. This parameter can be overridden when the filesystem is mounted.

MINVTIME Minimum time before a file is made undeletable (versioned) in the DTFS and HTFS filesystems. If set to 0, a file is always versioned (as long as MAXVDEPTH is greater than 0); if set to a value greater than 0, the file is versioned after it has existed for that number of seconds. The maximum value is 32767. This parameter can be overridden when the filesystem is mounted.

ROOTCHKPT If set to 0, disable checkpointing in a root HTFS filesystem; if set to 1 (default), enable checkpointing.

ROOTLOG If set to 0, disable transaction intent logging in a root HTFS filesystem; if set to 1 (default), enable logging.

ROOTSYNC If set to 0 (default), disable file synchronization on close on a root DTFS filesystem; if set to 1, enable synchronization on close.

ROOTNOCOMP If set to 1, disable compression in a root DTFS filesystem; if set to 0 (default), enable compression.

ROOTMAXVDEPTH Maximum number of undeletable (versioned) files on a root DTFS or HTFS filesystem. A value of 0 disables versioning.

ROOTMINVTIME Minimum time before a file is made undeletable (versioned) on a root DTFS or HTFS filesystem. If set to 0, a file is always versioned (as long as ROOTMAXVDEPTH is greater than 0); if set to a value greater than 0, the file is versioned after it has existed for that number of seconds.

DOSNMOUNT Maximum number of mounted DOS filesystems. The range of values is between 0 and 25; the default is 4.

DOSNINODE Maximum number of open inodes for DOS filesystems. The range of values is between 0 and 300; the default is 40.
Table limits

The following parameters control the allocation of memory to dynamic kernel tables.

TBLPAGES The maximum number of pages of memory for dynamic tables. The range of values is between 10 and 10000; the default is 0 which means that the kernel configures the value based on the amount of memory available at system startup.

TBLDMAPAGES The maximum number of pages of "dmaable" memory for dynamic tables. The range of values is between 10 and 1000 pages; the default is 100.

TBLLIMIT The percentage of TBLPAGES or TBLDMAPAGES to which a single table may grow. The range of values is between 10 and 100%; the default is 70.

TBLSYSLIMIT The percentage of memory allowed for dynamic tables if TBLPAGES is set to 0. The range of values is between 10 and 90%; the default is 25.

TBLMAP The size of the dynamic table virtual space allocation map. The range of values is between 50 (default) and 500.

The following parameters control the maximum grown sizes of dynamic kernel tables. If set to 0, the maximum possible size defaults to the value shown by getconf(C) provided that sufficient TBLPAGES of memory have been allocated. For example, the command getconf KERNEL_MOUNT_MAX displays the maximum possible size of the mount table.

MAX_DISK The maximum number of disk drives attached to the system. When the Diskinfo table overflows, the following message is displayed on the console:

CONFIG: dk_name - Diskinfo table overflow (MAX_DISK = number exceeded)

The minimum and maximum configurable values of MAX_DISK are 1 and 1024; the default value of 0 means that the kernel determines the number of disk drives dynamically.

MAX_INODE Specifies the maximum number of inode table entries that can be allocated. Each table entry represents an in-core inode that is an active file such as a current directory, an open file, or a mount point. Pipes, clone drivers, sockets, semaphores and shared data also use inodes, although they are not associated with a disk file. The number of entries used depends on the number of opened files. The minimum and maximum configurable values of MAX_INODE are 100 and 64000; the default value of 0 means that the in-core inode table grows dynamically. Each open file requires an inode entry in the in-core inode table. If the inode table is too small, a message similar to the following is displayed on the console:

CONFIG: routine - Inode table overflow (MAX_INODE = number exceeded)

When the inode table overflows, the specific request is refused. Although not fatal to the system, inode table overflow may damage the operation of various spoolers, daemons, the mailer, and other important utilities. Abnormal results and missing data files are a common result. If the system consistently displays this error message, use sar -v to evaluate whether your system needs tuning. The inod-sz value shows the number of inode table entries being used and the number of entries that have been allocated for use by the table.
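For example, to watch the dynamic table usage reported by sar(ADM) at five-second intervals (the interval and sample count are illustrative), enter:

$ sar -v 5 5

Compare the entries-used figures against the allocated sizes to decide whether the tables need tuning.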
MAX_PROC Specifies the maximum number of process table entries that can be allocated. Each table entry represents an active process. The number of entries depends on the number of terminal lines available and the number of processes spawned by each user. If the process table is full, the following message appears on the console and in the file /usr/adm/messages:

CONFIG: newproc - Process table overflow (MAX_PROC = number exceeded)

The minimum and maximum values of MAX_PROC that can be set are 50 and 16000; the default value is 0 which means that the process table grows dynamically. The proc-sz values shown by sar -v show how many process table entries are being used compared to those that have been dynamically allocated.

MAX_FILE Specifies the maximum number of open file table entries that can be allocated. Each entry represents an open file. The minimum and maximum values of MAX_FILE that can be set are 100 and 64000; the default value is 0 which means that the file table grows dynamically. When the file table overflows, the following warning message is displayed on the system console:

CONFIG: falloc - File table overflow (MAX_FILE = number exceeded)

This parameter does not control the number of open files per process; see the description of the NOFILES parameter.

MAX_REGION Specifies the maximum number of region table entries that can be allocated. Most processes have three regions: text, data, and stack. Additional regions are needed for each shared memory segment and shared library (text and data) attached. However, the region table entry for the text of a "shared text" program is shared by all processes executing that program. Each shared-memory segment attached to one or more processes uses another region table entry. The minimum and maximum values of MAX_REGION that can be set are 500 and 160000; the default value is 0 which means that the region table grows dynamically. If you do configure MAX_REGION, as a general rule you should set its value to slightly more than three times greater than MAX_PROC. When the region table overflows, the following message is displayed on the console:

CONFIG: allocreg - Region table overflow (MAX_REGION = number exceeded)

MAX_MOUNT Specifies the maximum number of mount table entries that can be allocated. Each entry represents a mounted filesystem. The root filesystem (/) is always the first entry. When full, the mount(S) system call returns the EBUSY error code. The minimum and maximum values of MAX_MOUNT that can be configured are 4 and 4096; the default value of 0 means that the kernel grows the size of the mount table dynamically.

MAX_FLCKREC Specifies the maximum number of lock table entries that can be allocated. This determines the number of file regions that can be locked by the system. The "lock-sz" value reported by sar -v shows the number of entries that are being used in comparison to the number that have been allocated. The minimum and maximum values of MAX_FLCKREC that can be configured are 50 and 16000; the default value is 0 which means that the kernel grows the size of the record lock table dynamically according to the needs of the applications running on your system.

STREAMS

STREAMS is a facility for UNIX system communication services. It supports the implementation of services ranging from complete networking protocol suites (such as TCP/IP and IPX/SPX) to individual device drivers. STREAMS defines standard interfaces for character I/O. The associated mechanism is simple and open-ended, consisting of a set of system calls, kernel resources and kernel routines. STREAMS uses system resources that are limited by values defined in kernel configuration modules.
Depending on the demand that you and other system users place on these resources, your system could run out of STREAMS resources if you do not first reset the allocations in the kernel configuration modules. Running out of some STREAMS resources (such as those controlled by the NSTREAM parameter) generates kernel configuration error messages. STREAMS message buffers are dynamically allocated from memory up to a limit set by the value of the kernel parameter NSTRPAGES. This parameter sets the maximum number of pages of physical memory that can be dynamically allocated for use by STREAMS. Before changing the STREAMS parameters NSTREAM or NSTRPAGES, you should check the current usage of STREAMS resources using the strstat command of the crash(ADM) utility or netstat(TC) with the -m option. The following tunable parameters are associated with STREAMS processing:

NSTREAM Number of stream head (stdata and estdata) data structures configured. One of each structure is needed for each stream opened, including both streams currently open from user processes and streams linked under multiplexers. The allowed range of values is between 1 and 512; the default is 32. The recommended configuration value is highly application-dependent, but a value of 256 usually suffices on a computer for running a single transport provider with moderate traffic. On Open Desktop, each X client also uses a pair of stdata and a pair of estdata structures. You should set NSTREAM to at least 256 on systems that are running X clients. When the number of stream head structures is exceeded, the following message is displayed on the console:

CONFIG: stropen - Out of streams (NSTREAM = n exceeded)

NSTRPAGES The maximum number of pages of virtual memory that can be allocated dynamically for use by STREAMS message buffers. The allowed range of values is between 0 and 8000 pages; the default is 500. If NSTRPAGES pages of virtual memory are not available when STREAMS are initialized at startup, the system displays the following message on the console for each STREAMS table that is affected:

CONFIG: strinit - Cannot alloc STREAMS name table (NSTRPAGES = n too big)

If more buffers are requested than there are available pages of physical memory to create them, the system displays the following message on the console:

CONFIG: allocb - Out of streams memory (NSTRPAGES = n exceeded)

Extra memory is allocated temporarily for high priority buffers only. The system will then try to reduce STREAMS memory usage until it is less than NSTRPAGES.

NOTE Memory used by STREAMS for buffers is fully dynamic; memory can be freed as well as allocated. The value of NSTRPAGES does not affect the size of the kernel at system startup although the size of the kernel will grow and shrink over time as pages of memory are allocated for use by STREAMS and subsequently released.

STRSPLITFRAC Sets the percentage of NSTRPAGES above which the system tries to create buffers by splitting larger buffers that are on the free list. Below this limit, the system tries to allocate new pages of memory to create the buffers. STRSPLITFRAC can range between 50 and 100 (percent); the default is 80. If you set STRSPLITFRAC lower than this, the system will use less memory for STREAMS but the memory that is used will tend to become fragmented and the kernel will require more CPU time to manage it.
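To see how close the current STREAMS buffer usage comes to these limits before retuning, you can use netstat(TC); for example:

$ netstat -m

The strstat command of the crash(ADM) utility reports similar information.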
NSTREVENT Initial number of stream event structures configured. Stream event cells are used for recording process-specific information in the poll system call. They are also used in the implementation of the STREAMS I_SETSIG ioctl and in the kernel bufcall mechanism. A rough minimum value to configure would be the expected number of processes to be simultaneously using poll times the expected number of STREAMS being polled for each process, plus the expected number of processes expected to be using STREAMS concurrently. The default and minimum value is 256; the maximum is 512. Note that this number is not necessarily a hard upper limit on the number of event cells that are available on the system (see MAXSEPGCNT).

MAXSEPGCNT The maximum (4KB) page count for stream events. If this value is 0 (minimum), only the amount defined by NSTREVENT is available for use. If the value is not 0 and if the kernel runs out of event cells, it will under some circumstances attempt to allocate an extra page of memory from which new event cells can be created. MAXSEPGCNT places a limit on the number of pages that can be allocated for this purpose. Once a page is allocated for event cells, however, it cannot be recovered later for use elsewhere. The default value is 1 and the maximum 32.

STRMSGSZ Maximum allowable size of the data portion of any STREAMS message. This should usually be set just large enough to accommodate the maximum packet size restrictions of the configured STREAMS modules. If it is larger than necessary, a single write or putmsg can consume an inordinate number of message headers. The range of values is between 4096 and 524288; the default value of 16384 is sufficient for existing applications.

NUMSP Determines the number of STREAMS pipe devices (/dev/spx, see spx(HW)) supported by the system. The default value is 64; the maximum and minimum values are 1 and 256. Administrators do not normally need to modify this parameter unless certain applications state that they require it.

NUMTIM Maximum number of timod(M) STREAMS modules that can be pushed by the Transport Layer Interface (TLI) onto a stream head. This parameter limits the number of streams that can be opened. The default value is 16 but various protocol stacks (for example, TCP, LMU, or NETBIOS) may require its value to be set to 32, 64, or 128. Administrators do not normally need to modify this parameter.

NUMTRW Maximum number of timod(M) STREAMS modules that can be pushed by the Transport Layer Interface (TLI) onto a stream head in order that the stream will accept read(S) and write(S) system calls. This parameter effectively limits the number of streams onto which the module can be pushed. The default value is 16 but various protocol stacks (for example, TCP, LMU, or NETBIOS) may require its value to be set to 32, 64, or 128. Administrators do not normally need to modify this parameter.

See "STREAMS parameters" (page 213) for a description of the STREAMS parameters that can only be tuned using idtune(ADM).

STREAMS parameters

NOTE This group of parameters is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).
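A typical idtune invocation names the parameter and its new value; for example (the parameter and value shown here are purely illustrative):

# idtune NMUXLINK 256

The change does not take effect until the kernel has been relinked and the system rebooted, as described in the section referenced above.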
NMUXLINK Number of stream multiplexer links configured. One link structure is required for each active multiplexer link (STREAMS I_LINK ioctl) in networking protocol stacks such as those used to implement TCP/IP and NFS. Each PPP link also requires such a structure. The number needed is application-dependent; the default value is 192. The minimum and maximum configurable values are 1 and 4096.

NSTRPUSH Maximum number of modules that may be pushed onto a stream. This prevents an errant user process from consuming all of the available queues on a single stream. The default value is 9. In practice, applications usually push at most four modules onto a stream.

NLOG Number of minor devices to be configured for the log driver; the active minor devices are 0 through (NLOG-1). A value of 3 services an error logger (strerr) and a trace command (strace), with one left over for miscellaneous usage.

STRCTLSZ Maximum allowable size of the control portion of any STREAMS message. The control portion of a putmsg message is not subject to the constraints of the minimum/maximum packet size, so the value entered here is the only way of providing a limit for the control part of a message. The default value of 1024 is more than sufficient for existing applications.

Message queues

The following tunable parameters are associated with interprocess communication message queues:

MSGMAP Specifies the number of entries in the memory map for messages. An entry in the message map table says that MSGSEG / MSGMAP memory segments are free at a particular address. MSGMAP measures how fragmented you expect your map to get. Its value can be small if you always send a few large messages, or it can be large if you send a lot of small messages. The suggested value for MSGMAP is approximately half the value of MSGSEG; this allocates two message segments per map entry. If the value of MSGMAP is set equal to MSGSEG, long messages may become totally fragmented with their component segments being randomly scattered across the map. Do not set MSGMAP to a value greater than that of MSGSEG. The range of configurable values is from 4 to 32768; the default value is 512. Each entry costs 8 bytes.

MSGMAX Maximum size of a message in bytes. The minimum value is 128, the default value is 8192 bytes, and the maximum possible size the kernel can process is 32767 bytes.

MSGMNB Maximum number of bytes of memory that all the messages in any one message queue can occupy. The default value is 8192; the minimum and maximum values are 128 bytes and 65532 bytes.

MSGSEG Number of MSGSSZ segments of memory allocated at kernel startup for holding messages. Therefore a total of MSGSEG*MSGSSZ bytes of memory are allocated for messages.

NOTE The amount of memory allocated for messages must not exceed 128KB.

If MSGSEG is set at 0, then the kernel will auto-configure the values of MSGSEG, MSGMAX, and MSGMNB. For most memory configurations, MSGSEG is set to 1024, and MSGMAX and MSGMNB are both set to MSGSEG*MSGSSZ. The IPC_NOWAIT flag can be passed into many of the msg system calls. If this flag is passed, then the system calls will fail immediately if there is no space for a message. If this flag is not passed, then the system calls will sleep until there is room for the message. To determine adequate values for each of the parameters, compute the maximum size and number of messages desired, and allocate that amount of space. For example, if the system will have at most 40 messages of 1KB each pending, then MSGTQL should be set to 40, and MSGSEG is computed as:

• 40 messages of 1KB each = 40KB total message space.
• Divide total message space by MSGSSZ to get MSGSEG. If MSGSSZ=8 bytes, then MSGSEG = 40*1024/8 = 5120.

The default value of MSGSEG is 1024; the minimum and maximum values are 32 and 32768.
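To see how heavily the message queue resources are being used on a running system, you can examine the queues with the ipcs command; for example, the following (option combination is illustrative) lists each queue together with its current and maximum byte counts:

$ ipcs -qob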
See "Message queue parameters" (this page) for a description of the message queue parameters that can only be tuned using idtune(ADM).

Message queue parameters

The following parameters are associated with System V IPC message queues.

NOTE This group of parameters is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).

MSGMNI Maximum number of different message queues allowed system-wide. The default value of MSGMNI is 50; the minimum and maximum values are 1 and 1024. You should not normally need to adjust the value of this parameter.

MSGTQL Number of system message headers that can be stored by the kernel; that is, the maximum number of unread messages at any given time. Each header costs 12 bytes. The default value of MSGTQL is 1024; the minimum and maximum values are 32 and 16383. You should not normally need to adjust the value of this parameter unless an application needs a large number of messages.

MSGSSZ Size in bytes of the memory segment used for storing a message in a message queue. A message that is shorter than a whole number multiple of memory segments will waste some bytes. For example, an 18-byte message requires three message segments if MSGSSZ is set to 8 bytes. In this case, 6 bytes of memory are unused, and unusable by other messages. The product of the values of MSGSSZ and MSGSEG determines the total amount of data that can be present in all message queues on a system. This product should not be greater than 128KB. The default value of MSGSSZ is 8 bytes; the minimum and maximum values are 4 bytes and 4096 bytes. The configured value of MSGSSZ must be divisible by 4. You should not normally need to adjust the value of this parameter.

Event queues

The following parameters control the configuration of the event queues.

EVQUEUES Maximum number of open event queues systemwide. Each EVQUEUES costs 88 + (2 * EVDEVSPERQ) bytes of memory. The range of values is between 1 and 256; the default is 8.

EVDEVS Maximum number of devices attached to event queues systemwide. Each EVDEVS costs 48 bytes of memory. The range of values is between 1 and 256; the default is 16. When the event table overflows, the following message is displayed on the console:

CONFIG: event - Event table full (EVDEVS = number exceeded)

EVDEVSPERQ Maximum number of devices for each event queue. The range of values is between 1 and 16; the default is 3. When the event channel overflows, the following message is displayed on the console:

CONFIG: event - Event channel full (EVDEVSPERQ = number exceeded)

Semaphores

The following tunable parameters are associated with interprocess communication semaphores:

SEMMAP Size of the control map used to manage semaphore sets. The default and minimum value is 10; the maximum is 100. Each entry costs 8 bytes.

SEMMNI Number of semaphore identifiers in the kernel. This is the number of unique semaphore sets that can be active at any given time. The default and minimum value is 10; the maximum is 300. Each entry costs 32 bytes.

SEMMNU Number of semaphore undo structures in the system. The size is equal to 8*(SEMUME + 2) bytes. See "Semaphore parameters" (page 217) for a definition of SEMUME. The range of values is between 10 and 100; the default is 30.
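Similarly, you can check current semaphore usage against these limits with ipcs; for example (the -b option, where supported, lists each set with the number of semaphores it contains):

$ ipcs -sb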
XSEMMAX Size of the XENIX® semaphore table that determines the maximum number of XENIX semaphores allowed systemwide. The minimum value for XSEMMAX is 20, the maximum value is 90, and the default value is 60. When the XENIX semaphore table overflows, the following message is displayed on the console:

CONFIG: xsem_alloc - XENIX semaphore table overflow (XSEMMAX = number exceeded)

See "Semaphore parameters" (this page) for a description of the semaphore parameters that can only be tuned using idtune(ADM).

Semaphore parameters

NOTE This group of parameters is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).

SEM_NSEMS_MAX Maximum number of POSIX.1b semaphores available for use on the system (provided by the SUDS library). The default value is 100; the minimum and maximum configurable values are 1 and 255 respectively.

The following parameters are associated with System V IPC semaphores only:

SEMMSL Maximum number of semaphores for each semaphore identifier. The default and minimum value is 25; the maximum value is 60.

SEMOPM Maximum number of semaphore operations that can be executed for each semop(S) call. The default value is 10; each entry costs 8 bytes.

SEMUME Number of undo entries for each process. The default value is 10.

SEMVMX Maximum value a semaphore can have. The default value is 32767.

SEMAEM Maximum value for adjustment on exit, alias semadj. This value is used when a semaphore value becomes greater than or equal to the absolute value of semop, unless the program has set its own value. The default value is 16384.

SEMMNS Number of semaphores in the system. The default and minimum value is 60; the maximum value is 300. Each entry costs 8 bytes.

Shared memory

The following tunable parameters are associated with interprocess communication shared memory:

SHMMAX Maximum shared-memory segment size. The range of values is between 131072 and 80530637 bytes; the default value is 524288 bytes.

SHMMIN Minimum shared-memory segment size. The default value is 1 byte.

XSDSEGS Maximum number of XENIX special shared-data segments allowed systemwide. The range of values is between 1 and 150; the default is 25. When the XENIX shared data table overflows, the following message is displayed on the console:

CONFIG: xsd_alloc - XENIX shared data table overflow (XSDSEGS = number exceeded)

XSDSLOTS Number of slots for each XENIX shared data segment. The maximum number of XENIX special shared data segment attachments systemwide is XSDSEGS*XSDSLOTS. The range of values is between 1 and 10; the default is 3.

See "Shared memory parameters" (this page) for a description of the shared memory parameters that can only be tuned using idtune(ADM).

Shared memory parameters

NOTE The following parameter is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).

SHMMNI Maximum number of shared-memory identifiers systemwide. The minimum and default value is 100; the maximum is 2000. Each entry costs 52 bytes.

Miscellaneous system parameters

The following parameters control the size of the configuration string buffer, and the size of the kernel profiler symbol table.

MAX_CFGSIZE Maximum size of configuration information saved by the tab(HW) driver.
This is the maximum size of information available using /dev/string/cfg as described on the string(M) manual page. If this limit is exceeded, the following message is displayed on the console:

CONFIG: string: Configuration buffer full (MAX_CFGSIZE = number exceeded)

MAX_CFGSIZE ranges from 256 to 32768 bytes; the default is 1024 bytes.

PRFMAX Sets the maximum number of text symbols that the kernel profiler, /dev/prf, can properly process. The range of values is between 2048 and 8192; the default is 4500. See profiler(ADM) for information about the kernel profiler.

System parameters

NOTE This group of parameters is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).

NODE System name. The value of NODE must not be greater than eight characters. The default value is "scosysv".

TIMEZONE Specifies the timezone in units of minutes different from Greenwich Mean Time (GMT). Note that the value specifies the system default timezone and not the value of the TZ environment variable. TIMEZONE can have a value from -1440 (east of GMT) to 1440 (west of GMT); the default is 480.

DSTFLAG Specifies the dstflag described for the ctime(S) system call. A value of 1 indicates Daylight Savings Time applies locally; zero is used otherwise.

KDBSYMSIZE Size of the kernel debugger symbol table in bytes. (This parameter is only useful if a debugger is linked into the kernel.) It must have a value of between 50000 and 500000; the default is 300000.

NCPYRIGHT Defines the maximum number of strings used to store some vendor driver copyright messages that may be displayed on the console when the system is booted. Modifying this parameter is unlikely to affect the display of most copyright messages.

Miscellaneous device drivers and hardware parameters

The following parameters control the configuration of various device drivers and hardware behavior.

CTBUFSIZE Size of the tape buffer in kilobytes. This static buffer is allocated by the QIC-02 cartridge tape device driver (ct) when it is initialized at system startup. This parameter should have a value of between 32 and 256. Set this parameter to 0 if the ct driver is linked into the kernel but you either do not have or do not use a cartridge tape drive. The following are values that this parameter can take in various circumstances:

32KB    bare minimum; this is insufficient to stream
64KB    minimum to allow streaming (good for systems with little memory or little tape use, if tape I/O performance is not critical)
96KB    reduce to this at first if the default uses too much memory
128KB   default; this offers a good tradeoff between I/O performance and memory
192KB   increase to this at first if the default provides poor I/O performance
256KB   maximum size

NOTE The SCSI tape device driver (Stp) allocates a statically configured 128KB buffer for each device which is not controlled by this parameter. All SCSI tape drives including SCSI cartridge tape drives use the Stp driver.

SDSKOUT Maximum number of simultaneous requests that can be queued for each SCSI disk. The SCSI disk driver (Sdsk) will sleep if no request blocks are available. The default value of this parameter is 4; the minimum and maximum values are 1 and 256. You should set SDSKOUT higher if the -S option to sar(ADM) (or mpsar(ADM) for SMP) reports that the system is running out of request blocks.
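For example, to sample the SCSI request block usage ten times at ten-second intervals while the disks are busy (interval and count are illustrative), enter:

$ sar -S 10 10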
DMAEXCL Specifies whether simultaneous DMA requests are allowed. Some computers have DMA chips that malfunction when more than one allocated channel is used simultaneously. DMAEXCL is set to 0 by default to allow simultaneous DMA on multiple channels. Set its value to 1 if this causes a problem.

KBTYPE Determines the logical character protocol used between the keyboard and the keyboard driver. This tunable is set by default to 0 for XT scancodes and is recommended; a value of 1 specifies AT scancodes which are recognized by the console driver but not by the X server or by DOS emulators. All AT-compatible keyboards support both modes.

VGA_PLASMA Set to 1 if an IBM® PS/2® model P70 or P75 VGA plasma display is present; set to 0 (default) if not.

NSHINTR Maximum number of devices sharing the same interrupt vector. This has a default value of 8; the minimum and maximum values are 2 and 20. You should not normally need to modify this parameter.

D0387CR3 Controls the setting of high-order bits of Control Register 3 (CR3) when an 80387™ math coprocessor is installed. Because of design defects in early versions of the Intel® 80387™ chip (B1 stepping), this math coprocessor may not operate correctly in some computers. The problem causes a CPU to hang when DMA, paging, or coprocessor accesses occur. You can work around this problem by changing the D0387CR3 parameter from the default value of 0 (switched off) to 1.

WARNING Do not set this parameter to 1 on 80486™ or Pentium™ machines.

DOWPCR0 If set, the kernel uses the write protection bit in Control Register 0 (CR0) to enable write protection in kernel mode. The default value is 1 which sets this parameter. This parameter is effectively disabled on machines which contain one or more 80386™ CPUs which do not support this feature.

MODE_SELECT No effect. Mode-select checking on parallel (printer) ports can be adjusted on a per-printer basis using the pa_tune[] array defined and documented in the file /etc/conf/pack.d/pa/space.c.

Hardware and device driver parameters

NOTE This group of parameters is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).

NAHACCB Number of mailboxes available for the Adaptec 154X/164X host adapter driver to talk to other Adaptec hardware. The higher the number, the less likely it is that the driver has to sleep. It is not normally necessary to modify this parameter.

NEMAP Specifies the maximum number of mapchan(M) I/O translation mappings that can be in effect at the same time. The default value of this parameter is 10.

NKDVTTY Number of virtual terminals (8) supported by the console keyboard driver. Administrators should not modify this parameter.

Boot load extension parameters

NOTE This group of parameters is not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).

EXTRA_NDEV Number of extra device slots in fmodsw[], io_init[], and io...[]. It defines the number of slots reserved in the device driver tables for Boot Time Loadable Drivers (BTLDs).

EXTRA_NEVENT Number of extra event slots. It defines the number of slots reserved in the event driver tables for BTLDs.

EXTRA_NFILSYS Number of extra types of filesystem. It defines the number of extra types of filesystem that can be loaded using BTLDs.
MAX_BDEV Maximum number of block devices (bdevcnt is at least this value). It defines the minimum number of entries in bdevsw[], the block device switch table.

MAX_CDEV Maximum number of character devices (cdevcnt is at least this value). It defines the minimum number of entries in cdevsw[], the character device switch table.

LAN Manager Client Filesystem parameters

NOTE The LAN Manager Client Filesystem (LMCFS) adds several kernel parameters to the mtune file that are not tunable using configure(ADM); you must use the idtune(ADM) command instead as described in "Using idtune to reallocate kernel resources" (page 190).

LMCFS_BUF_SZ Determines the maximum amount of data that LMCFS can transmit or receive in a single network packet. The default value is 4096 bytes.

LMCFS_LMINUM Controls the number of allocatable inodes. The default value is 150; the maximum value is 600. Set this value higher if users have many LMCFS files open simultaneously.

LMCFS_NUM_BUF Sets the number of server message block (SMB) data buffers used by LMCFS. The default value is 256; the maximum value is 8192. The size of each buffer is determined by LMCFS_BUF_SZ.

LMCFS_NUM_REQ Constrains the number of simultaneous SMB requests that can be made on the network. The default value is 64; the maximum value is 1024. This parameter should be set to at least one quarter of the value of LMCFS_NUM_BUF.

Examining and changing configuration-dependent values

getconf allows you to inspect the values of configuration-dependent variables for various standards, and the values of dynamic kernel table parameters. Below is an example of the use of getconf:

$ getconf NZERO
20
$ getconf CLK_TCK
100

This indicates that the default process priority on the system is 20 and the system clock runs at 100 ticks per second. Path variables, such as NAME_MAX which defines the maximum filename length, depend on the filesystem type and therefore the pathname. These examples show the values of NAME_MAX for an HTFS and a XENIX filesystem:

# getconf NAME_MAX /htfs_filesystem
255
# getconf NAME_MAX /xenix_filesystem
14

For a complete list of the variable names to use with the command see getconf(C). If you are logged in as root, you can use the setconf(ADM) command to change a subset of the configuration-dependent parameters. Using setconf, you can increase the current size of the dynamic kernel tables or decrease their maximum possible size. You can also dynamically increase the number of character buffers available for use by the serial driver, for example:

setconf KERNEL_CLISTS 1024

The maximum possible number of such buffers that you can allocate is controlled by the KERNEL_CLISTS_MAX parameter.

NOTE Any change that you make using setconf remains in force only until the system is next rebooted. Use the Hardware/Kernel Manager or configure to make the change permanent.

Appendix C

Configuring TCP/IP tunable parameters

You can adjust the configuration parameters for TCP/IP using the ifconfig(ADMN) and inconfig(ADMN) utilities as described in the following sections:

• "Using ifconfig to change parameters for a network card" (this page)
• "Using inconfig to change global TCP/IP parameters" (page 226)

If you need to change STREAMS resources, you must use the configure(ADM) command as described in "Using configure to change kernel resources" (page 189).
Using ifconfig to change parameters for a network card

You can use the ifconfig(ADMN) command to reconfigure performance parameters for a single network interface. If you wish to make this change permanent you must edit the entry for the interface in the /etc/tcp script. The metric, onepacket, and perf parameters affect performance.

metric can be used to artificially raise the routing metric of the interface used by the routing daemon, routed(ADMN). This has the effect of making a route using this interface less favorable. For example, to set the metric for the sme0 interface to 10, enter:

/etc/ifconfig sme0 inet metric 10

onepacket enables one-packet-at-a-time operation for interfaces with small buffers that are unable to handle continuous streams of back-to-back packets. This parameter takes two arguments that allow you to define a small packet size, and the number of these that you will permit in the receive window. This deals with TCP/IP implementations that can send more than one packet within the window size for the connection. Set the small packet size and count to zero if you are not interested in detecting small packets. For example, to set one-packet mode with a small packet threshold of one small packet of 512 bytes on the e3A0 interface, enter:

/etc/ifconfig e3A0 inet onepacket 512 1

To turn off one-packet mode for this interface, enter:

/etc/ifconfig e3A0 inet -onepacket

perf allows you to tune performance parameters on a per-interface basis. The arguments to perf specify the receive and send window sizes in bytes, and whether TCP should restrict the data in a segment to a multiple of 1KB (a value of 0 restricts; 1 uses the full segment size). The following example sets the receive and send window size to 4KB, and uses the maximum 1464-byte data size available in an Ethernet frame:

/etc/ifconfig sme0 inet perf 4096 4096 1

NOTE Segment truncation does not change the size of the Ethernet frame; this is fixed at 1530 bytes.

Using inconfig to change global TCP/IP parameters

As root, you can use the inconfig(ADMN) command to change the global default TCP/IP configuration values.

NOTE Any global performance parameters that you set using inconfig are overridden by per-interface values specified using ifconfig.

For example, to enable forwarding of IP packets, you would enter:

inconfig ipforwarding 1

inconfig updates the values of the parameters defined in /etc/default/inet and those in use by the currently executing kernel. You do not need to reboot your system for these changes to take effect; inconfig dynamically updates the kernel with the changes you specify. Before doing so, it verifies that the values you input are valid. If they are not, the current values of the parameters are retained. See "TCP/IP parameters" (page 227) for a description of the TCP/IP parameters that you can tune using inconfig.

TCP/IP parameters

The parameters that control the operation of TCP/IP are defined in the file /etc/default/inet.
The parameters are grouped according to function:

• "Address Resolution Protocol (ARP) parameters" (this page)
• "Asynchronous half-duplex (ASYH) line connection parameters" (page 228)
• "Internet Control Message Protocol (ICMP) parameters" (page 228)
• "Internet Group Management Protocol (IGMP) parameters" (page 229)
• "Configuring the in-kernel network terminal (IKNT) driver" (page 229)
• "Internet Protocol (IP) parameters" (page 229)
• "Message block control logging (MBCL) parameters" (page 232)
• "NetBIOS parameters" (page 232)
• "Transmission Control Protocol (TCP) parameters" (page 232)
• "User Datagram Protocol (UDP) parameters" (page 234)

You should read the description for a parameter before you change it using inconfig(ADMN) as described in "Using inconfig to change global TCP/IP parameters" (page 226). The default values of the parameters are configured to work efficiently in most situations.

NOTE Never edit the settings for these parameters in the file /etc/default/inet; always use inconfig to change them.

Address Resolution Protocol (ARP) parameters

The following parameters control the behavior of the Address Resolution Protocol (ARP).

arpprintfs Controls logging of warnings from the kernel ARP driver. These are displayed on the console. If set to 0 (the default), debugging information is not displayed.

arp_maxretries Sets the maximum number of retries for the address resolution protocol (ARP) before it gives up. The default value is 5; the minimum and maximum configurable values are 1 and 128.

arpt_down Sets the time to hold onto an incomplete ARP cache entry if ARP lookup fails. The default value is 20 seconds; the minimum and maximum configurable values are 1 and 600 seconds.

arpt_keep Sets the time to keep a valid entry in the ARP cache. The default value is 1200 seconds; the minimum and maximum configurable values are 1 and 2400 seconds.

arpt_prune Sets the interval between scanning the ARP table for stale entries. The default value is 300 seconds; the minimum and maximum configurable values are 1 and 1800 seconds.

The number of ARP units is controlled by the value of the defined constant ARP_UNITS.

Asynchronous half-duplex (ASYH) line connection parameters

The following parameter controls the behavior of asynchronous half-duplex (ASYH) line connections used by PPP.

ahdlcmtu Sets the maximum transmission unit (MTU) for an asynchronous PPP link. This is normally set on a per-system basis in the /etc/ppphosts file; if not defined there, this value is used. The default value of ahdlcmtu is 296 bytes; the minimum and maximum configurable values are 128 and 2048 bytes.

Internet Control Message Protocol (ICMP) parameters

The following parameters control the behavior of the Internet Control Message Protocol (ICMP).

icmp_answermask If set to 1, the system will respond to ICMP subnet mask request messages. This variable must be set to 1 to support diskless workstations. The default value is 0, do not respond, as specified in RFC 1122.

icmpprintfs Controls logging of warnings from the kernel ICMP driver. These are displayed on the console. If set to 0 (the default), debugging information is not displayed.
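For example, if your network includes diskless workstations that broadcast subnet mask requests, you would enable replies by entering:

inconfig icmp_answermask 1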
Internet Group Management Protocol (IGMP) parameters

The following parameter controls the behavior of the Internet Group Management Protocol (IGMP).

igmpprintfs Controls logging of warnings from the kernel IGMP driver. These are displayed on the console. If set to 0 (the default), debugging information is not displayed.

Configuring the in-kernel network terminal (IKNT) driver

The number of IKNT driver units is determined by the number of pseudo-ttys configured on the system. Use mkdev ptty to tune the number of pseudo-ttys.

Internet Protocol (IP) parameters

The following parameters control the behavior of the Internet Protocol (IP). The number of interfaces supported by IP is dynamic and does not need tuning.

NOTE The values of the parameters in_fullsize, in_recvspace, and in_sendspace affect the systemwide interface defaults. Their values may be overridden on a per-interface basis by ifconfig(ADMN). This allows you to mix fast and slow network hardware on the same system with optimal performance parameters defined for each interface.

in_fullsize Controls the systemwide default TCP behavior for attempting to negotiate the use of full-sized segments. If set to 1 (the default), TCP attempts to use a segment size equal to the interface MTU minus the size of the TCP/IP headers. If set to 0, TCP rounds the segment size down to the nearest power of 2.

in_loglimit Controls how many bytes of the error packet to display when debugging. Note that the appropriate xxxprintfs parameter (such as tcpprintfs) must be set to a non-zero value to enable logging. The default value is 64. The minimum and maximum configurable values are 1 and 255.

in_recvspace Sets the systemwide default size of the TCP/IP receive window in bytes. The default value is 4096 bytes. The minimum and maximum configurable values are 2048 and 65535 bytes.

in_sendspace Sets the systemwide default size of the TCP/IP send window in bytes. This should be at least as large as the loopback MTU. The default value is 8192 bytes. The minimum and maximum configurable values are 2048 and 65535 bytes.

ip_checkbroadaddr Controls whether IP validates broadcast addresses. If set to 1 (the default as specified in RFC 1122), IP discards non-broadcast packets sent to a link-level broadcast address. In the unlikely event that a data-link driver does not support this, packets may be discarded erroneously. If the netstat -sp ip command shows that many packets cannot be forwarded, set this parameter to 0 to turn off checking.

ip_dirbroadcast If set to 1 (the default), allows receipt of broadcast packets only if they match one of the broadcast addresses configured for the interface upon which the packet was received. If set to 0, allows receipt of broadcast packets that match any configured broadcast address.

ip_perform_pmtu IP performs Path MTU (PMTU) discovery as specified in RFC 1191 if set to 1 (the default). This causes IP to send packets with the "do not fragment" bit set so that routers will generate "Fragmentation Required" messages. If this causes interoperability problems, a value of 0 disables PMTU. If you disable PMTU, you should also set tcp_offer_big_mss (described in "Transmission Control Protocol (TCP) parameters" (page 232)) to 0.

ip_pmtu_decrease_age Controls how many seconds IP will wait (while performing PMTU) after decreasing an MTU estimate before it starts raising it. The default value is 600 seconds. The maximum configurable value is 32667. If set to 0xffffffff, the estimate is never raised; this is useful if there is only one path out of your local network and its MTU is known to be constant.

ip_pmtu_increase_age Sets the number of seconds between increasing the MTU estimate for a destination once it starts to increase.
The default value is 120 seconds. The minimum and maximum configurable values are 0 and 600 seconds.

ip_settos If set to 1 (the default), IP sets type-of-service (TOS) information (as specified in RFC 1122) in packets that it sends down to the data-link layer. Set this to 0 if your network card link-level driver cannot handle this.

ip_subnetsarelocal The default value of 1 specifies that other subnets of the network are to be considered as local; that is, TCP assumes them to be connected via high-MSS paths and adjusts its idea of the MSS to be negotiated. Otherwise, TCP uses the default MSS specified by tcp_mssdflt (described in "Transmission Control Protocol (TCP) parameters" (page 232)); this is typically 512 bytes in accordance with RFCs 793 and 1122. By default, the parameter tcp_offer_big_mss is non-zero so that Path MTU discovery will provide the maximum benefit. If the value of tcp_offer_big_mss is zero, the value of ip_subnetsarelocal is not checked. This allows for good local performance even when PMTU discovery is not used. The message "ICMP Host Unreachable" is generated for local subnet routing failures. When this value is set to 0, the packet size is set to 576 bytes, as specified in RFC 1122. The default value of 1 enables this feature; if set to 0, it is disabled.

ip_ttl Sets the time to live (TTL) of an IP packet as a number of hops. This value is used by all kernel drivers that need it (including TCP). The default value is 64 as recommended by RFC 1340. The minimum and maximum configurable values are 1 and 255.

ipforwarding
ipsendredirects If you want to use your machine as a gateway, set both these parameters to 1. ipforwarding controls whether the system will forward packets sent to it which are destined for another system (that is, act as a router). The default value is 0 (off) as defined by RFC 1122. A system acting as a host will still forward source-routed datagrams unless ipnonlocalsrcroute is set to 0. ipsendredirects controls whether IP will redirect hosts when forwarding a packet out of the same interface on which it was received. This should be set to 1 if ipforwarding is set to 1. The Network Configuration Manager configures these values when additional drivers are added. This feature usually makes it unnecessary to change ipforwarding and ipsendredirects with inconfig.

ipnonlocalsrcroute Controls whether source-routed datagrams will be forwarded if they are not destined for the local system. On hosts, the default value is 0 (off). If your machine is acting as a router (ipforwarding is set to 1), the Network Configuration Manager sets its value to 1. Set its value back to 0 if you are concerned that this may open a security hole.

ipprintfs Controls logging of warnings from the kernel IP driver. These are displayed on the console. If set to 0 (the default), debugging information is not displayed.

Message block control logging (MBCL) parameters

The following parameter controls the behavior of message block control logging (MBCL).

mbclprintfs Controls logging of warnings from the kernel MBCL driver which converts STREAMS messages (mblock) to character lists (clist). The warnings are displayed on the console. If set to 0 (the default), debugging information is not displayed.

NetBIOS parameters

The following parameters control the behavior of NetBIOS.

nb_sendkeepalives Turns NetBIOS-level keepalives on or off. When turned on, NetBIOS keepalives are sent periodically on dormant NetBIOS connections. NetBIOS keepalives are independent of TCP/IP keepalives, and are useful for systems that do not use TCP/IP keepalives. This parameter is set to 0 (turned off) by default. Set it to 1 to enable NetBIOS keepalives.

nbprintfs Controls logging of warnings from the kernel NetBIOS driver as specified in RFC 1001/2. The warnings are displayed on the console. If set to 0 (the default), debugging information is not displayed.

Transmission Control Protocol (TCP) parameters

The following parameters control the behavior of the Transmission Control Protocol (TCP). You can increase the number of TCP units beyond the default number (256) using the Network Configuration Manager for the appropriate sco_tcp chain.

tcp_initial_timeout Sets the TCP/IP retransmit time for an initial SYN segment. The default value is 180 seconds as defined by RFC 1122. The minimum and maximum configurable values are 1 and 7200 seconds.
NetBIOS keepalives are independent of TCP lIP keepalives, and are useful for systems that do not use TCP lIP keepalives. This parameter is set to 0 (turned off) by default. Set it to 1 to enable NetBIOS keepalives. nbprintfs Controls logging of warnings from the kernel NetBIOS driver as specified in RFC 1001/2. The warnings are displayed on the console. H set to 0 (the default), debugging information is not displayed. Transmission Control Protocol (rCP) parameters The following parameters control the behavior of the Transmission Control Protocol (TCP). You can increase the number of TCP units beyond the default number (256) using the Network Configuration Manager for the appropriate sco_tcp chain. tcp_initial_timeout Sets the TCP lIP retransmit time for an initial SYN segment. The default value is 180 seconds as defined by RFC 1122. The minimum and maximum configurable values are 1 and 7200 seconds. 232 Performance Guide tcp_keep idle Sets the idle time before TCP lIP keepalives are sent (if enabled). The default value is 7200 seconds. The minimum and maximum configurable values are 300 and 86400 seconds. tcp _keepintvl Sets the TCP /IP keep alive interval between keep alive packets once they start being sent. The default value is 75 seconds. The minimum and maximum configurable values are 1 and 43200 seconds. tcp_mss_sw_threshold Defines the small window threshold for interface MTUs. If the MTU of an interface is small enough to force TCP to use an MSS smaller than this threshold, then TCP will use the receive window size specified by tcp _small_recvspace. This is an optimization to avoid buffering too much data on low-speed links such as SLIP and PPP. The default value is 1024 bytes. The minimum and maximum configurable values are 512 and 4096 bytes. tcp _mssdflt Sets the default TCP segment size to use on interfaces for which no MSS and Path MTU information is available. The default and minimum value is 512 bytes. The maximum configurable values is 32768. You should keep the value of this parameter small if possible. tcp_nkeep Sets the number of TCP lIP keepalives that will be sent before giving up. The default value is 8. The minimum and maximum configurable values are 1 and 256. tcp_offer_bi~mss In order to get the maximum benefit out of Path MTU (PMTU) discovery, TCP normally offers an MSS that is derived from the local interface MTU (after subtracting the packet header sizes). This allows the remote system to send the biggest segments that the network can handle. Set this parameter to 0 for systems that cannot handle this, or that do not implement PMTU discovery. This causes TCP to offer a smaller MTU for non-local connections (see ip_subnetsarelocal in "Internet Protocol (IP) parameters" (page 229)). The default value of 1 (offer it) allows maximum benefit to be gained from PMTU discovery; a value of 0 disables this. 233 Configuring TCPI/P tunable parameters tcp_small_recvspace Sets the receive window size to use on interfaces that require small windows (see also tcp_mss_sw_threshold). MTU is less than tcp_mss_sw_threshold. The default value is 4096 bytes. The minimum and maximum configurable values are 1024 and 16384 bytes. tcp _urgbehavior Controls how TCP interprets the urgent pointer. If set to 0, it interprets it in RFC 1122 mode; if set to 1 (the default), it interprets it in BSD mode. tcpalldebug If non-zero~ captures trace information for all connections. The default value is 0 which causes TCP to trace only those connections that set the SO_DEBUG option. 
This information can be retrieved using the trpt(ADMN) command, or displayed on the console if tcpconsdebug is set.

tcpconsdebug Directs TCP/IP connection trace output to the console if set to 1 (see also tcpalldebug). The default value is 0.

tcpprintfs Controls logging of warnings from the kernel TCP driver. These are displayed on the console. If set to 0 (the default), debugging information is not displayed.

User Datagram Protocol (UDP) parameters

The following parameter controls the behavior of the User Datagram Protocol (UDP).

udpprintfs Controls logging of warnings from the kernel UDP driver. These are displayed on the console. If set to 0 (the default), debugging information is not displayed.

Appendix D

Quick system tuning reference

Table D-1, "Diagnosing performance problems" (this page) summarizes the symptoms and possible solutions for some important performance problems. Note that the measured values represent averages over time. Suggested critical values may not be suitable for all systems. For example, you may be able to tolerate a system that is paging out if this is not impacting the performance of the rest of the system seriously.

Table D-1 Diagnosing performance problems

Insufficient CPU power at high load:
  [mp]sar -q shows runq-sz > 2
  [mp]sar -u shows %idle < 20% on multiuser system
  [mp]sar -u shows %idle < 5% on dedicated database server
  Additionally for SMP:
  mpsar -q shows %runocc > 90%
  cpusar -u shows %idle < 20% on any CPU of multiuser system
  cpusar -u shows %idle < 5% on any CPU of dedicated database server

Possible solutions. Measures that can be taken include:
• check that the system is not swapping or paging out excessively
• reschedule jobs to run at other times
• tune applications to use less CPU power
• replace applications with ones needing less CPU power
• replace non-intelligent serial cards with intelligent ones
• upgrade the system to use faster CPU(s)
• upgrade to a multiprocessor system
• add more CPUs to a multiprocessor system
• purchase an additional system to share the load

Excessive paging out or swapping:
  [mp]sar -p shows rclm/s >> 0
  [mp]sar -q shows %swpocc > 20%
  [mp]sar -w shows swpot/s > 1
  swap -l shows free < 50% of blocks

Possible solutions. Increase free memory until swapping does not occur by:
• reducing number of buffers (watch out for reduced cache hit rates)
• running fewer large applications locally
• moving users to another machine
• adding RAM

Poor disk performance:
  [mp]sar -u shows %wio > 15%
  [mp]sar -d shows avque >> 1 and %busy > 80%

Possible solutions. Increase disk performance by:
• using HTFS filesystem(s)
• using striping across several disks to balance load
• keeping filesystems < 90% full
• reorganizing directories
• keeping directories small
• distributing different types of activity to different disks
• adding more disks
• using faster disks, controllers, and host adapters
• improving buffer cache performance
• improving namei cache performance
• reducing filesystem fragmentation

Poor buffer cache performance:
  [mp]sar -b shows %rcache < 90% and %wcache < 65%

Possible solutions. Improve buffer cache performance by:
• increasing number of buffers
• increasing number of buffer hash queues per buffer

Poor namei cache performance:
  [mp]sar -n shows %Hhit < 65% or %Dhit < 65%

Possible solutions. Increase namei cache hit rate by:
• tuning namei cache parameters for each filesystem type
• making each pathname component less than or equal to 14 characters
Appendix D

Quick system tuning reference

Table D-1, "Diagnosing performance problems" (this page) summarizes the symptoms and possible solutions for some important performance problems. Note that the measured values represent averages over time. Suggested critical values may not be suitable for all systems. For example, you may be able to tolerate a system that is paging out if this is not impacting the performance of the rest of the system seriously.

Table D-1 Diagnosing performance problems

Insufficient CPU power at high load

Symptoms:
• [mp]sar -q shows runq-sz > 2
• [mp]sar -u shows %idle < 20% on a multiuser system
• [mp]sar -u shows %idle < 5% on a dedicated database server
Additionally, for SMP:
• mpsar -q shows %runocc > 90%
• cpusar -u shows %idle < 20% on any CPU of a multiuser system
• cpusar -u shows %idle < 5% on any CPU of a dedicated database server

Possible solutions. Measures that can be taken include:
• check that the system is not swapping or paging out excessively
• reschedule jobs to run at other times
• tune applications to use less CPU power
• replace applications with ones needing less CPU power
• replace non-intelligent serial cards with intelligent ones
• upgrade the system to use faster CPU(s)
• upgrade to a multiprocessor system
• add more CPUs to a multiprocessor system
• purchase an additional system to share the load

Excessive paging out or swapping

Symptoms:
• [mp]sar -p shows rclm/s >> 0
• [mp]sar -q shows %swpocc > 20%
• [mp]sar -w shows swpot/s > 1
• swap -l shows free < 50% of blocks

Possible solutions. Increase free memory until swapping does not occur by:
• reducing the number of buffers (watch out for reduced cache hit rates)
• running fewer large applications locally
• moving users to another machine
• adding RAM

Poor disk performance

Symptoms:
• [mp]sar -u shows %wio > 15%
• [mp]sar -d shows avque >> 1 and %busy > 80%

Possible solutions. Increase disk performance by:
• using HTFS filesystem(s)
• using striping across several disks to balance load
• keeping filesystems < 90% full
• reorganizing directories
• keeping directories small
• distributing different types of activity to different disks
• adding more disks
• using faster disks, controllers, and host adapters
• improving buffer cache performance
• improving namei cache performance
• reducing filesystem fragmentation

Poor buffer cache performance

Symptoms:
• [mp]sar -b shows %rcache < 90% and %wcache < 65%

Possible solutions. Improve buffer cache performance by:
• increasing the number of buffers
• increasing the number of buffer hash queues per buffer

Poor namei cache performance

Symptoms:
• [mp]sar -n shows %Hhit < 65% or %Dhit < 65%

Possible solutions. Increase the namei cache hit rate by:
• tuning namei cache parameters for each filesystem type
• making each pathname component less than or equal to 14 characters

Fragmented filesystem

Symptoms:
• df -v shows blocks %used > 90%

Possible solutions. Reduce the number of disk blocks used by:
• using DTFS filesystem(s)
• removing unwanted files regularly
• archiving and removing, or compressing, infrequently used files
• mounting commonly used resources across the network using NFS
• adding disk(s)
Reduce fragmentation by:
• archiving and removing the files, and rebuilding the filesystem

Kernel tables too small

Symptoms:
• error messages displayed on the console
• [mp]sar -v shows ov > 0 (overflows)

Possible solutions:
• allow table sizes to grow dynamically; for example, set MAX_PROC to 0 for the process table (see the example following this table)
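For example, to let the process table size itself dynamically, you might use idtune(ADM) and then relink the kernel. This is a sketch; the pathnames shown are the usual ones, but verify them against idtune(ADM) and link_unix(ADM) on your system.

    # Set MAX_PROC to 0 so the process table can grow on demand.
    /etc/conf/bin/idtune MAX_PROC 0

    # Rebuild the kernel; the new value takes effect after a reboot.
    cd /etc/conf/cf.d
    ./link_unix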
The desirable attributes of systems with many logged-in users and of database server systems differ in some respects. Use the following tables to check that you have not overlooked anything:

• Table D-2, "Attributes of a well-tuned multiuser system" (page 238)
• Table D-3, "Attributes of a well-tuned dedicated database server system" (page 239)

Note that the performance values suggested in these tables may not be suitable for all systems. The appropriate values depend greatly on the mix of applications that is running and the likely demands placed on the system.

To record system activity to a file for later analysis, use the -o option of sar(ADM) on a single processor system, and of mpsar(ADM) on a multiprocessor system; a short example follows. Take the measurements over a period of at least an hour, with a sampling interval sufficiently small to capture the level of detail that you are interested in. Record the system's activity at varying levels of loading so that you can identify when bottlenecks are appearing.
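As a brief sketch of this procedure (the one-minute interval and one-hour duration are only illustrative):

    # Sample every 60 seconds for an hour, saving binary records to
    # a file; substitute mpsar on a multiprocessor system.
    sar -o /tmp/sa.today 60 60

    # Later, replay the saved records, here reporting CPU utilization.
    sar -u -f /tmp/sa.today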
Table D-2 Attributes of a well-tuned multiuser system

CPU performance:
• [cpu]sar -u shows %idle > 20% (some idle time on each CPU at high load)
• [mp]sar -q shows runq-sz < 2 (few processes waiting to run)
• mpsar -q shows %runocc < 90%, SMP only (run queue is not continually occupied)
See Chapter 3, "Tuning CPU resources" (page 21).

Memory performance:
• [mp]sar -p shows rclm/s ≈ 0 (little or no swapping or paging-out activity)
• [mp]sar -w shows swpot/s ≈ 0 (little or no activity on the swap device(s))
• [mp]sar -q shows swpq-sz ≈ 0 and %swpocc ≈ 0% (no swapped-out runnable processes)
• [mp]sar -r shows freemem >> GPGSHI and freeswp ≈ constant (ample free memory and swap space)
See Chapter 4, "Tuning memory resources" (page 41).

Disk I/O performance:
• [cpu]sar -u shows %wio < 15% (little time spent waiting for I/O to complete)
• [mp]sar -b shows %rcache > 90% and %wcache > 65% (good hit rate for reading and writing to the buffer cache)
• [mp]sar -d shows avque ≈ 1 (low average number of disk requests queued)
• [mp]sar -n shows %Hhit > 65% or %Dhit > 65% (good hit rate for the namei cache)
See Chapter 5, "Tuning I/O resources" (page 71).

Table D-3 Attributes of a well-tuned dedicated database server system

CPU performance:
• [cpu]sar -u shows %idle > 5% (some idle time on each CPU at high load)
• [mp]sar -q shows runq-sz < 2 (few processes waiting to run)
• mpsar -q shows %runocc < 90%, SMP only (run queue is not continually occupied)
See your database documentation and Chapter 3, "Tuning CPU resources" (page 21).

Memory performance:
• [mp]sar -p shows rclm/s ≈ 0 (little or no swapping or paging-out activity)
• [mp]sar -w shows swpot/s ≈ 0 (little or no activity on the swap device(s))
• [mp]sar -q shows swpq-sz ≈ 0 and %swpocc ≈ 0% (no swapped-out runnable processes)
• [mp]sar -r shows freemem ≈ GPGSHI and freeswp ≈ constant (little excess free memory; allow the database to use any excess memory by increasing its internal work area)
See your database documentation and Chapter 4, "Tuning memory resources" (page 41).

Disk I/O performance:
• [cpu]sar -u shows %wio < 15% (little time spent waiting for I/O to complete)
• [mp]sar -d shows avque ≈ 1 (low average number of disk requests queued)
See your database documentation and Chapter 5, "Tuning I/O resources" (page 71).
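As an illustrative spot check against Table D-3 (the interval and count are arbitrary), assuming cpusar and mpsar accept the usual sar-style interval and count arguments:

    # Per-CPU utilization on an SMP database server; look for
    # %idle > 5% on every CPU at peak load.
    cpusar -u 60 10

    # Check that the run queue is not continually occupied.
    mpsar -q 60 10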
Bibliography

The following books provide more information about topics outlined in this guide. This list is provided for reference only; it is not comprehensive and The Santa Cruz Operation, Inc. does not guarantee the accuracy of these publications. The implementation of the UNIX system, networking, and performance analysis software described in these books may differ in some details from that of the current SCO OpenServer software. Several references are also included on the subject of algorithmics, which has direct relevance to programmers who wish to improve the performance of application programs.

Ammeraal, Leendert. Programs and Data Structures in C, Second Edition. New York, NY: Wiley, 1992. A practical introduction to the implementation and manipulation of data structures using the ANSI C programming language.

Bach, Maurice J. The Design of the UNIX Operating System. Englewood Cliffs, NJ: Prentice Hall, 1986. A technical discussion of the internals of the UNIX System V Operating System, written shortly before the release of UNIX System V Release 3.

Deitel, Harvey M. An Introduction to Operating Systems, Second Edition. Reading, MA: Addison-Wesley, 1990. Discusses general performance issues for operating systems.

Harel, David. Algorithmics: The Spirit of Computing, Second Edition. Reading, MA: Addison-Wesley, 1992. A very readable introduction to the subject of algorithmics.

Hunt, Craig. TCP/IP Network Administration. Sebastopol, CA: O'Reilly and Associates, 1993. Contains information about the configuration of IP packet routing and name service.

Knuth, Donald E. The Art of Computer Programming, Volume I: Fundamental Algorithms. Reading, MA: Addison-Wesley, 1968. The first volume of the classic three-volume series on the subject of computer programming.

Knuth, Donald E. The Art of Computer Programming, Volume II: Seminumerical Algorithms. Reading, MA: Addison-Wesley, 1969.

Knuth, Donald E. The Art of Computer Programming, Volume III: Sorting and Searching. Reading, MA: Addison-Wesley, 1973.

Loukides, Mike. System Performance Tuning. Sebastopol, CA: O'Reilly and Associates, 1991. Includes many excellent tips for getting the best performance out of UNIX systems.

Mansfield, Niall. The Joy of X. Reading, MA: Addison-Wesley, 1993. Contains useful information about performance issues for the X Window System.

Messmer, Hans-Peter. The Indispensable PC Hardware Book. Reading, MA: Addison-Wesley, 1994. Provides comprehensive information about system hardware issues.

Miscovitch, Gina, and David Simons. The SCO Performance Tuning Handbook. Englewood Cliffs, NJ: Prentice Hall, 1994. Written by two senior kernel engineers at SCO, this book describes performance tuning for SCO® UNIX® Release 3.2 Version 4.2, SCO MPX™ 3.0, SCO Open Desktop 3.0, and SCO OpenServer™ 3.0 systems.

Press, William H., Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing, Second Edition. Cambridge University Press, 1994. Includes many numerical algorithms for scientific and engineering applications.

Stern, Hal. Managing NFS and NIS. Sebastopol, CA: O'Reilly and Associates, 1991. Contains a detailed chapter on performance analysis and tuning, as well as useful references on IP packet routing and NFS benchmarks.

Glossary of performance terminology

This section contains definitions of the key terms used throughout this book in discussing the performance of computer systems.

AIO
    See asynchronous I/O.

asymmetric multiprocessing
    A multiprocessor system is asymmetric when processors are not equally able to perform all tasks. For example, only the base processor is able to control I/O. Most machines acknowledged to be symmetric may still have some asymmetric features present, such as only being able to boot using the base processor.

asynchronous I/O
    Provides non-blocking I/O access through a raw device interface.

bandwidth
    The maximum I/O throughput of a system.

base processor
    The first CPU in a multiprocessor system. The system normally boots using this CPU. Also called the default processor, it cannot be deactivated.

bdflush
    The system name for the buffer flushing daemon.

benchmark
    Software run on a computer system to measure its performance under specific operating conditions.

block device interface
    Provides access to block-structured peripheral devices (such as hard disks) which allow data to be read and written in fixed-sized blocks.

blocking I/O
    Forces a process to wait for the I/O operation to complete. Also known as synchronous I/O.

bottleneck
    Occurs when demand for a particular resource is beyond the capacity of that resource and this adversely affects other resources. For example, a system has a disk bottleneck if it is unable to use all of its CPU power because processes are blocked waiting for disk access.

bss
    Another name for data which was not initialized when a program was compiled. The name is an acronym for "block started by symbol."

buffer
    A temporary data storage area used to allow for the different capabilities (speed, addressing limits, or transfer size) of two communicating computer subsystems.

buffer cache
    Stores the most-recently accessed blocks on block devices. This avoids having to re-read the blocks from the physical device.

buffer flushing daemon
    Writes the contents of dirty buffers from the buffer cache to disk.

cache memory
    High-speed, low-access-time memory placed between a CPU and main memory in order to enhance performance. See also level-one (L1) cache and level-two (L2) cache.

checkpointing
    One of the functions of the htepi_daemon; marking a filesystem state as clean after it flushes changed metadata to disk.

child process
    A new process created when a parent process calls the fork(S) system call.

clean
    The state of a system buffer or memory page that has not had its contents altered.

client-server model
    A method of implementing application programs and operating system services which divides them into one or more client programs whose requests for service are satisfied by one or more server programs. The client-server model is suitable for implementing applications in a networked computer environment. Examples of application of the client-server model are:
    • page serving to diskless clients
    • file serving using NFS and NUCFS
    • Domain Name Service (DNS)
    • the X Window System
    • many relational database management systems (RDBMSs)

clock interrupt
    See clock tick.

clock tick
    An interrupt received at regular intervals from the programmable interrupt timer. This interrupt is used to invoke kernel activities that must be performed on a regular basis.

contention
    Occurs when several CPUs or processes need to access the same resource at the same time.

context
    The set of CPU register values and other data, including the u-area, that describe the state of a process.

context switch
    Occurs when the scheduler replaces one process executing on a CPU with another.

copy-on-write page
    A memory page that is shared by several processes until one tries to write to it. When this happens, the process is given its own private copy of the page.

COW page
    See copy-on-write page.

CPU
    Abbreviation of central processing unit. One or more CPUs give a computer the ability to execute software such as operating systems and application programs. Modern systems may use several auxiliary processors to reduce the load on the CPU(s).

CPU bound
    A system in which there is insufficient CPU power to keep the number of runnable processes on the run queue low. This results in poor interactive response by applications.

daemon
    A process that performs a service on behalf of the kernel. Since daemons spend most of their time sleeping, they usually do not consume much CPU power.

device driver
    Performs I/O with a peripheral device on behalf of the operating system kernel. Most device drivers must be linked into the kernel before they can be used.

dirty
    The state of a system buffer or memory page that has had its contents altered.

distributed interrupts
    Interrupts from devices that can be serviced by any CPU in a multiprocessor system.

event
    In the X Window System, an event is the notification that the X server sends an X client to tell it about changes such as keystrokes, mouse movement, or the moving or resizing of windows.

executing
    Describes machine instructions belonging to a program or the kernel being interpreted by a CPU.

fragmentation
    The propensity of the component disk blocks of a file, or of the memory segments of a kernel data structure, to become separated from each other. The greater the fragmentation, the more work has to be performed to retrieve the data.

free list
    A chain of unallocated data structures which are available for use.

garbage collection
    The process of compacting data structures to retrieve unused memory.

htepi_daemon
    A kernel daemon that handles filesystem metadata. It can also perform optional transaction intent logging and checkpointing on behalf of the HTFS filesystem.

idle
    The operating system is idle if no processes are ready-to-run or are sleeping while waiting for block I/O to complete.

idle waiting for I/O
    The operating system is idle waiting for I/O if processes that would otherwise be runnable are sleeping while waiting for I/O to a block device to complete.
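The idle and waiting-for-I/O states defined above can be observed directly with sar; a minimal sketch (the interval and count are arbitrary):

    # Report CPU utilization every 5 seconds, 12 times; %idle is time
    # spent truly idle, %wio is time idle while waiting for block I/O.
    sar -u 5 12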
in-core
    Describes something that is internal to the operating system kernel.

in-core inode
    An entry in the kernel table describing the status of a filesystem inode that is being accessed by processes.

inode
    Abbreviation of index node. An inode is a data structure that represents a file within a traditional UNIX filesystem. It consists of a file's metadata and the numbers of the blocks that can be used to access the file's data.

interrupt
    A notification from a hardware device about an event that is external to the CPU. Interrupts may be generated for events such as the completion of a transfer of data to or from disk, or a key being pressed.

interrupt bound
    A system which is unable to handle all the interrupts that are arriving.

interrupt latency
    The time that the kernel takes to handle an interrupt.

interrupt overrun
    Occurs when too many interrupts arrive while the kernel is trying to handle a previous interrupt.

I/O
    Abbreviation of input/output. The transfer of data to and from peripheral devices such as hard disks, tape drives, the keyboard, and the screen.

I/O bound
    A system in which the peripheral devices cannot transfer data as fast as requested.

job
    One or more processes grouped together but issued as a single command. For example, a job can be a shell script containing several commands, or a series of commands issued on the command line connected by a pipeline.

kernel
    The name for the operating system's central set of intrinsic services. These services provide the interface between user processes and the system's hardware, allowing access to virtual memory, I/O from and to peripheral devices, and sharing of resources between the user processes running on the system.

kernel mode
    See system mode.

kernel parameter
    A constant defined in the file /etc/conf/cf.d/mtune (see mtune(F)) that controls the configuration of the kernel.

level-one (L1) cache
    Cache memory that is implemented on the CPU itself.

level-two (L2) cache
    Cache memory that is implemented externally to the CPU.

load average
    The utilization of the CPU measured as the average number of processes on the run queue over a certain period of time.

logging
    See transaction intent logging.

marry driver
    A pseudo-device driver that allows a regular file within a filesystem to be accessed as a block device, and, hence, as a swap area.

memory bound
    A system which is short of physical memory, and in which pages of physical memory, but not their contents, must be shared by different processes. This is achieved by paging out, and by swapping in cases of extreme shortage of physical memory.

memory leak
    An application program has a memory leak if its size is constantly growing in virtual memory. This may happen if the program continually requests more memory without re-using memory allocated to data structures that are no longer in use. A program with a memory leak can eventually make the whole system memory bound, at which time it may start paging out or swapping.

metadata
    The data that an inode stores concerning file attributes and directory entries.

multiprocessor system
    A computer system with more than one CPU.

multithreaded program
    A program is multithreaded if it can be accessed simultaneously by different CPUs. Multithreaded device drivers can run on any CPU in a multiprocessor system. The kernel is multithreaded to allow equal access by all CPUs to its tables and the scheduler. Only one copy of the kernel resides in memory.

namei cache
    A kernel data structure that stores the most-commonly accessed translations of filesystem pathname components to inode number. The namei cache improves I/O performance by reducing the need to retrieve such information from disk.
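The namei cache hit rates can be checked with sar; a minimal sketch:

    # %Hhit and %Dhit report the namei cache hit rates; values above
    # 65% are generally considered healthy.
    sar -n 5 12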
nice value
    A weighting factor in the range 0 to 39 that influences how great a share of CPU time a process will receive. A high value means that a process will run on the CPU less often.

non-blocking I/O
    Allows a process to continue executing without waiting for an I/O operation to complete. Also known as asynchronous I/O.

operating system
    The software that manages access to a computer system's hardware resources.

overhead
    The load that an operating system incurs while sharing resources between user processes and performing its internal accounting.

page
    A fixed-size (4KB) block of memory.

page fault
    A hardware event that occurs when a process tries to access an address in virtual memory that does not have a location in physical memory associated with it. In response, the system tries to load the appropriate data into a newly assigned physical page.

page stealing daemon
    The daemon responsible for releasing pages of memory for use by other processes. Also known as vhand.

paging in
    Reading pages of program text and pre-initialized data from the filesystems, or stack and data pages from swap.

paging out
    Releasing pages of physical memory for use by making temporary copies of the contents of dirty pages to swap space. Clean pages of program text and pre-initialized data are not copied to swap space because they can be paged in from the filesystems.

parent process
    A process that executes a fork(S) system call to create a new child process. The child process usually executes an exec(S) system call to invoke a new program in its place.

physical memory
    Storage implemented using RAM chips.

preemption
    Occurs when a process that was running on a CPU is replaced by a higher-priority process.

priority
    A value that the scheduler calculates to determine which process(es) should next run on the CPUs. A process's priority is calculated from its nice value and its recent CPU usage.

process
    A single instance of a program in execution. This can be a login shell or an operating system command, but not a built-in shell command. If a command is built into the shell, a separate process is not created on its invocation; the built-in command is issued within the context of the shell process.

process table
    A data structure inside the kernel that stores information about all the processes that are present on a system.

protocol
    A set of rules and procedures used to establish and maintain communication between hardware or software subsystems.

protocol stack
    Allows two high-level systems to communicate by passing messages through a low-level physical interface.

pseudo-device driver
    A device driver that allows software to behave as though it is a physical device. Examples are ramdisks and pseudo-ttys.

pseudo-tty
    A pseudo-terminal is a device driver that allows one process to communicate with another as though it were a physical terminal. Pseudo-ttys are used to interface to programs that expect to receive non-blocking input and to send terminal control characters.

queue
    An ordered list of entities.

race condition
    The condition which occurs when several processes or CPUs are trying to write to the same memory or disk locations at the same time. The data that is eventually stored depends on the order in which the writes occur. A synchronization mechanism must be used to enforce the desired order in which the writes are to take place.

RAID
    Abbreviation of redundant array of inexpensive disks. Used to implement high-performance and/or high-integrity disk storage.
ramdisk
    A portion of physical memory configured to look like a physical disk but capable of fast access times. Data written to a ramdisk is lost when the operating system is shut down. Ramdisks are, therefore, only suitable for implementing temporary filesystems.

raw device interface
    Provides access to block-structured peripheral devices which bypasses the block device interface and allows variable-sized transfers of data. The raw interface also allows control of a peripheral using the ioctl(S) system call. This allows, for example, for low-level operations such as formatting a disk or rewinding a tape.

region
    A region groups a process's pages by their function. A process has at least three regions, for its data, stack, and text.

resource
    Can be divided into software and hardware resources. Software resources may be specific to applications, or they may be kernel data structures such as the process table, open file and in-core inode tables, buffer and namei caches, multiphysical buffers, and character lists. Hardware resources are a computer's physical subsystems. The three main subsystems are CPU, memory, and I/O. The memory subsystem can be divided into two resources: physical memory (or main memory) and swap space (or secondary memory). The I/O subsystem comprises one or more resources of similar or different types: hard and floppy disk drives, tape drives, CD-ROMs, graphics displays, and network devices.

ready-to-run process
    A process that has all the system resources that it needs in order to be able to run on a CPU.

response time
    The time taken between issuing a command and receiving some feedback from the system. This is not to be confused with turnaround time, which is a measure of how long a particular task takes from invocation to completion.

run queue
    The list of ready-to-run processes maintained by the kernel.

runnable process
    See ready-to-run process.

scaling
    A computer system's ability to increase its processing capacity as CPUs are added. If the processing capacity increases in direct proportion to the number of CPUs, a system is said to exhibit 100% scaling. In practice, a system's ability to scale is limited by contention between the CPUs for resources, and depends on the mix of applications being run.

sched
    The system name for the swapper daemon.

scheduler
    The part of the kernel that chooses which process(es) to run on the CPUs.

single-threaded program
    A program is single-threaded if it can only run on one CPU at a time. Single-threaded device drivers can only run on the base processor in a multiprocessor system.

sleeping on I/O
    See waiting for I/O.

spin lock
    A method of synchronizing processes on a multiprocessor system. A process waiting for a resource which is currently in use (locked) by a process running on a different CPU repeatedly executes a short section of kernel code (spins) until the lock is released.

stack
    A list of temporary data used by a program to handle function calls.

strd
    The system name for the STREAMS daemon.

stream head
    The level of the STREAMS I/O interface with which a user process communicates.

STREAMS I/O
    A mechanism for implementing a layered interface between applications running in user space and a device driver. Most often used to implement network protocol stacks.

STREAMS daemon
    The daemon used by the STREAMS I/O subsystem to manage STREAMS memory.

swap area
    A piece of swap space implemented as a disk division, or as a block device married to a regular file in a filesystem.
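You can see how the configured swap areas are being used with swap(ADM); for example:

    # List each swap area with its total and free blocks; free space
    # below half the total suggests a memory shortage.
    swap -l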
swap space
    A collection of swap areas used to store the contents of stack and data memory pages temporarily while the physical pages they occupied are used by other processes.

swapper daemon
    Part of the kernel that reclaims physical pages of memory for use by copying whole regions of processes to swap space.

swapping
    The action taken by the swapper daemon when the system is extremely short of physical memory needed for use by processes. Swapping can place a heavy load on the CPU and disk I/O subsystems.

symmetric multiprocessing
    A multiprocessor system is symmetric when any processor can perform any function. This ensures an even load distribution because no processor depends on another. Each process is executed by a single processor.

system mode
    The state of a CPU when the kernel needs to ensure that it has privileged access to its data and physical devices. Also known as kernel mode.

text
    Executable machine instructions (code) that a CPU can interpret and act on.

throughput
    The amount of work (measured in number of jobs completed, disk requests handled, and so on) that a system processes in a specified time.

time slice
    The maximum amount of time for which a process can run without being preempted.

transaction intent logging
    One of the functions of the htepi_daemon; writing the intention to change filesystem metadata to a log file on disk.

u-area
    Abbreviation of user area; also known as a u-block. A data structure possessed by every process. The u-area contains private data about the process that only the kernel may access.

user mode
    The state of a CPU when it is executing the code of a user program that accesses its own data space in memory.

vhand
    The system name for the page stealing daemon.

virtual disk
    A disk composed of pieces of several physical disks.

virtual memory
    A method of expanding the amount of available memory by combining physical memory (RAM) with cheaper and slower storage, such as a swap area on a hard disk.

waiting for I/O
    A process goes to sleep if it has to wait for an I/O operation to complete.

X client
    An application program that communicates with an X server to request that it display information on a screen, or to receive input events from the keyboard or a pointing device such as a mouse. The client may be running on the same computer as the server (local), or it may be connected via a network (remote).

X server
    The software that controls the screen, keyboard, and pointing device under the X Window System.

X terminal
    A display device that is able to run X server software. All of an X terminal's clients must run on remote machines.

X Window System
    A windowing system based on the client-server model.

zombie process
    An entry in the process table corresponding to a process that no longer exists. The entry is only removed if its parent process invokes a wait(S) system call. A zombie process does not consume any system resources apart from its slot in the process table. However, you should beware of runaway processes that generate many zombies. These will cause the system to become short of memory as the process table grows to accommodate them.
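A quick way to spot accumulating zombies is to look for processes in the Z state. This is a sketch; it assumes that ps -el on your release marks zombies with a Z in the state column.

    # List zombie processes; a steadily growing count usually points
    # to a parent that never calls wait(S).
    ps -el | grep ' Z '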