Apache: The Definitive Guide, Third Edition

Copyright
Preface
Who Wrote Apache, and Why?
The Demonstration Code
Conventions Used in This Book
Organization of This Book
Acknowledgments
Chapter 1. Getting Started
Section 1.1. What Does a Web Server Do?
Section 1.2. How Apache Works
Section 1.3. Apache and Networking
Section 1.4. How HTTP Clients Work
Section 1.5. What Happens at the Server End?
Section 1.6. Planning the Apache Installation
Section 1.7. Windows?
Section 1.8. Which Apache?
Section 1.9. Installing Apache
Section 1.10. Building Apache 1.3.X Under Unix
Section 1.11. New Features in Apache v2
Section 1.12. Making and Installing Apache v2 Under Unix
Section 1.13. Apache Under Windows
Chapter 2. Configuring Apache: The First Steps
Section 2.1. What's Behind an Apache Web Site?
Section 2.2. site.toddle
Section 2.3. Setting Up a Unix Server
Section 2.4. Setting Up a Win32 Server
Section 2.5. Directives
Section 2.6. Shared Objects
Chapter 3. Toward a Real Web Site
Section 3.1. More and Better Web Sites: site.simple
Section 3.2. Butterthlies, Inc., Gets Going
Section 3.3. Block Directives
Section 3.4. Other Directives
Section 3.5. HTTP Response Headers
Section 3.6. Restarts
Section 3.7. .htaccess
Section 3.8. CERN Metafiles
Section 3.9. Expirations
Chapter 4. Virtual Hosts
Section 4.1. Two Sites and Apache
Section 4.2. Virtual Hosts

Section 4.3. Two Copies of Apache
Section 4.4. Dynamically Configured Virtual Hosting
Chapter 5. Authentication
Section 5.1. Authentication Protocol
Section 5.2. Authentication Directives
Section 5.3. Passwords Under Unix
Section 5.4. Passwords Under Win32
Section 5.5. Passwords over the Web
Section 5.6. From the Client's Point of View
Section 5.7. CGI Scripts
Section 5.8. Variations on a Theme
Section 5.9. Order, Allow, and Deny
Section 5.10. DBM Files on Unix
Section 5.11. Digest Authentication
Section 5.12. Anonymous Access
Section 5.13. Experiments
Section 5.14. Automatic User Information
Section 5.15. Using .htaccess Files
Section 5.16. Overrides
Chapter 6. Content Description and Modification
Section 6.1. MIME Types
Section 6.2. Content Negotiation
Section 6.3. Language Negotiation
Section 6.4. Type Maps
Section 6.5. Browsers and HTTP 1.1
Section 6.6. Filters
Chapter 7. Indexing
Section 7.1. Making Better Indexes in Apache
Section 7.2. Making Our Own Indexes
Section 7.3. Imagemaps
Section 7.4. Image Map Directives
Chapter 8. Redirection
Section 8.1. Alias
Section 8.2. Rewrite
Section 8.3. Speling
Chapter 9. Proxying
Section 9.1. Security
Section 9.2. Proxy Directives
Section 9.3. Apparent Bug
Section 9.4. Performance
Section 9.5. Setup

Chapter 10. Logging
Section 10.1. Logging by Script and Database
Section 10.2. Apache's Logging Facilities
Section 10.3. Configuration Logging
Section 10.4. Status
Chapter 11. Security
Section 11.1. Internal and External Users
Section 11.2. Binary Signatures, Virtual Cash
Section 11.3. Certificates
Section 11.4. Firewalls
Section 11.5. Legal Issues
Section 11.6. Secure Sockets Layer (SSL)
Section 11.7. Apache's Security Precautions
Section 11.8. SSL Directives
Section 11.9. Cipher Suites
Section 11.10. Security in Real Life
Section 11.11. Future Directions
Chapter 12. Running a Big Web Site
Section 12.1. Machine Setup
Section 12.2. Server Security
Section 12.3. Managing a Big Site
Section 12.4. Supporting Software
Section 12.5. Scalability
Section 12.6. Load Balancing
Chapter 13. Building Applications
Section 13.1. Web Sites as Applications
Section 13.2. Providing Application Logic
Section 13.3. XML, XSLT, and Web Applications
Chapter 14. Server-Side Includes
Section 14.1. File Size
Section 14.2. File Modification Time
Section 14.3. Includes
Section 14.4. Execute CGI
Section 14.5. Echo
Section 14.6. Apache v2: SSI Filters
Chapter 15. PHP
Section 15.1. Installing PHP
Section 15.2. Site.php
Chapter 16. CGI and Perl

Section 16.1. The World of CGI
Section 16.2. Telling Apache About the Script
Section 16.3. Setting Environment Variables
Section 16.4. Cookies
Section 16.5. Script Directives
Section 16.6. suEXEC on Unix
Section 16.7. Handlers
Section 16.8. Actions
Section 16.9. Browsers

Chapter 17. mod_perl
Section 17.1. How mod_perl Works
Section 17.2. mod_perl Documentation
Section 17.3. Installing mod_perl — The Simple Way
Section 17.4. Modifying Your Scripts to Run Under mod_perl
Section 17.5. Global Variables
Section 17.6. Strict Pregame
Section 17.7. Loading Changes
Section 17.8. Opening and Closing Files
Section 17.9. Configuring Apache to Use mod_perl
Chapter 18. mod_jserv and Tomcat
Section 18.1. mod_jserv
Section 18.2. Tomcat
Section 18.3. Connecting Tomcat to Apache
Chapter 19. XML and Cocoon
Section 19.1. XML
Section 19.2. XML and Perl
Section 19.3. Cocoon
Section 19.4. Cocoon 1.8 and JServ
Section 19.5. Cocoon 2.0.3 and Tomcat
Section 19.6. Testing Cocoon
Chapter 20. The Apache API
Section 20.1. Documentation
Section 20.2. APR
Section 20.3. Pools
Section 20.4. Per-Server Configuration
Section 20.5. Per-Directory Configuration
Section 20.6. Per-Request Information
Section 20.7. Access to Configuration and Request Information
Section 20.8. Hooks, Optional Hooks, and Optional Functions
Section 20.9. Filters, Buckets, and Bucket Brigades
Section 20.10. Modules

Chapter 21. Writing Apache Modules
Section 21.1. Overview
Section 21.2. Status Codes
Section 21.3. The Module Structure
Section 21.4. A Complete Example
Section 21.5. General Hints
Section 21.6. Porting to Apache 2.0
Appendix A. The Apache 1.x API
Section A.1. Pools
Section A.2. Per-Server Configuration
Section A.3. Per-Directory Configuration
Section A.4. Per-Request Information
Section A.5. Access to Configuration and Request Information
Section A.6. Functions
Colophon
Index

Copyright
Copyright © O'Reilly & Associates, Inc.
Printed in the United States of America.
Published by O'Reilly & Associates, Inc., 1005 Gravenstein Highway North, Sebastopol,
CA 95472.
O'Reilly & Associates books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(http://safari.oreilly.com). For more information, contact our corporate/institutional sales
department: (800) 998-9938 or corporate@oreilly.com.
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
trademarks of O'Reilly & Associates, Inc. Many of the designations used by
manufacturers and sellers to distinguish their products are claimed as trademarks. Where
those designations appear in this book, and O'Reilly & Associates, Inc. was aware of a
trademark claim, the designations have been printed in caps or initial caps. The
association between the image of an Appaloosa horse and the topic of Apache is a trademark
of O'Reilly & Associates, Inc.
While every precaution has been taken in the preparation of this book, the publisher and
authors assume no responsibility for errors or omissions, or for damages resulting from
the use of the information contained herein.

Preface
Apache: The Definitive Guide, Third Edition, is principally about the Apache web-server
software. We explain what a web server is and how it works, but our assumption is that
most of our readers have used the World Wide Web and understand in practical terms
how it works, and that they are now thinking about running their own servers and sites.
This book takes the reader through the process of acquiring, compiling, installing,
configuring, and modifying Apache. We exercise most of the package's functions by
showing a set of example sites that take a reasonably typical web business — in our case,
a postcard publisher — through a process of development and increasing complexity.
However, we have deliberately tried to make each site as simple as possible, focusing on
the particular feature being described. Each site is pretty well self-contained, so that the
reader can refer to it while following the text without having to disentangle the meat from
extraneous vegetables. If desired, it is possible to install and run each site on a suitable
system.
Perhaps it is worth saying what this book is not. It is not a manual, in the sense of
formally documenting every command — such a manual exists on the Apache site and
has been much improved with Versions 1.3 and 2.0; we assume that if you want to use
Apache, you will download it and keep it at hand. Rather, if the manual is a road map that
tells you how to get somewhere, this book tries to be a tourist guide that tells you why
you might want to make the journey.
In passing, we do reproduce some sections of the web site manual simply to save the
reader the trouble of looking up the formal definitions as she follows the argument.
Occasionally, we found the manual text hard to follow and in those cases we have
changed the wording slightly. We have also interspersed comments as seemed useful at
the time.
This is not a book about HTML or creating web pages, or one about web security or even
about running a web site. These are all complex subjects that should be either treated
thoroughly or left alone. As a result, a webmaster's library might include books on the
following topics:
• The Web and how it works
• HTML: formal definitions, what you can do with it
• How to decide what sort of web site you want, how to organize it, and how to
  protect it
• How to implement the site you want using one of the available servers (for
  instance, Apache)
• Handbooks on Java, Perl, and other languages
• Security
Apache: The Definitive Guide is just one of the six or so possible titles in the fourth
category.

Apache is a versatile package and is becoming more versatile every day, so we have not
tried to illustrate every possible combination of commands; that would require a book of
a million pages or so. Rather, we have tried to suggest lines of development that a typical
webmaster could follow once an understanding of the basic concepts is achieved.
We realized from our own experience that the hardest stage of learning how to use
Apache in a real-life context is right at the beginning, where the novice webmaster often
has to get Apache, a scripting language, and a database manager to collaborate. This can
be very puzzling. In this new edition we have therefore included a good deal of new
material which tries to take the reader up these conceptual precipices. Once the
collaboration is working, development is much easier. These new chapters are not
intended to be an experts' account of, say, the interaction between Apache, Perl, and
MySQL — but a simple beginners' guide, explaining how to make these things work with
Apache. In the process we make some comments, from our own experience, on the merits
of the various software products from which the user has to choose.
As with the first and second editions, writing the book was something of a race with
Apache's developers. We wanted to be ready as soon as Version 2 was stable, but not
before the developers had finished adding new features.
In many of the examples that follow, the motivation for what we make Apache do is
simple enough and requires little explanation (for example, the different index formats in
Chapter 7). Elsewhere, we feel that the webmaster needs to be aware of wider issues (for
instance, the security issues discussed in Chapter 11) before making sensible decisions
about his site's configuration, and we have not hesitated to branch out to deal with them.

Who Wrote Apache, and Why?
Apache gets its name from the fact that it consists of some existing code plus some
patches. The FAQ thinks that this is cute; others may think it's the sort of joke that gets
programmers a bad name. A more responsible group thinks that Apache is an appropriate
title because of the resourcefulness and adaptability of the American Indian tribe.
(FAQ is netspeak for Frequently Asked Questions. Most sites/subjects have an FAQ file
that tells you what the thing is, why it is, and where it's going. It is perfectly reasonable
for the newcomer to ask for the FAQ to look up anything new to her, and indeed this is a
sensible thing to do, since it reduces the number of questions asked. Apache's FAQ can
be found at http://www.apache.org/docs/FAQ.html.)
You have to understand that Apache is free to its users and is written by a team of
volunteers who do not get paid for their work. Whether they decide to incorporate your or
anyone else's ideas is entirely up to them. If you don't like what they do, feel free to
collect a team and write your own web server or to adapt the existing Apache code — as
many have.
The first web server was built by the British physicist Tim Berners-Lee at CERN, the
European Centre for Nuclear Research at Geneva, Switzerland. The immediate ancestor
of Apache was built by the U.S. government's NCSA, the National Center for
Supercomputing Applications. Because this code was written with (American) taxpayers'
money, it is available to all; you can, if you like, download the source code in C from
http://www.ncsa.uiuc.edu, paying due attention to the license conditions.
There were those who thought that things could be done better, and in the FAQ for
Apache (at http://www.apache.org ), we read:
...Apache was originally based on code and ideas found in the most popular HTTP server
of the time, NCSA httpd 1.3 (early 1995).
That phrase "of the time" is nice. It usually refers to good times back in the 1700s or the
early days of technology in the 1900s. But here it means back in the deliquescent bogs of
a few years ago!
While the Apache site is open to all, Apache is written by an invited group of (we hope)
reasonably good programmers. One of the authors of this book, Ben, is a member of this
group.
Why do they bother? Why do these programmers, who presumably could be well paid for
doing something else, sit up nights to work on Apache for our benefit? There is no such
thing as a free lunch, so they do it for a number of typically human reasons. One might
list, in no particular order:
• They want to do something more interesting than their day job, which might be
  writing stock control packages for BigBins, Inc.
• They want to be involved on the edge of what is happening. Working on a project
  like this is a pretty good way to keep up-to-date. After that comes consultancy on
  the next hot project.
• The more worldly ones might remember how, back in the old days of 1995, quite
  a lot of the people working on the web server at NCSA left for a thing called
  Netscape and became, in the passage of the age, zillionaires.
• It's fun. Developing good software is interesting and amusing, and you get to meet
  and work with other clever people.
• They are not doing the bit that programmers hate: explaining to end users why
  their treasure isn't working and trying to fix it in 10 minutes flat. If you want
  support on Apache, you have to consult one of several commercial organizations
  (see Appendix A), who, quite properly, want to be paid for doing the work
  everyone loathes.

The Demonstration Code
The code for the demonstration web sites referred to throughout the book is available at
http://www.oreilly.com/catalog/apache3/. It contains the requisite README file with
installation instructions and other useful information. The contents of the download are
organized into two directories:
install/
This directory contains scripts to install the sample sites:
install
Run this script to install the sites.
install.conf
Unix configuration file for install.
installwin.conf
Win32 configuration file for install.
sites/
This directory contains the sample sites used in the book.

Conventions Used in This Book
This section covers the various conventions used in this book.
Typographic Conventions
Constant width
Used for HTTP headers, status codes, MIME content types, directives in
configuration files, commands, options/switches, functions, methods, variable
names, and code within body text
Constant width bold

Used in code segments to indicate input to be typed in by the user
Constant width italic

Used for replaceable items in code and text

Italic
Used for filenames, pathnames, newsgroup names, Internet addresses (URLs),
email addresses, variable names (except in examples), terms being introduced,
program names, subroutine names, CGI script names, hostnames, usernames, and
group names
Icons

Text marked with this icon applies to the Unix version of Apache.

Text marked with this icon applies to the Win32 version of Apache.
This icon designates a note relating to the surrounding text.

This icon designates a warning related to the surrounding text.

Pathnames
We use the text convention .../ to indicate your path to the demonstration sites, which
may well be different from ours. For instance, on our Apache machine, we kept all the
demonstration sites in the directory /usr/www. So, for example, our path would be
/usr/www/site.simple. You might want to keep the sites somewhere other than /usr/www,
so we refer to the path as .../site.simple.
Don't type .../ into your computer. The attempt will upset it!

Directives
Apache is controlled through roughly 150 directives. For each directive, a formal
explanation is given in the following format:
Directive
Syntax
Where used

An explanation of the directive is located here.
So, for instance, we have the following directive:
ServerAdmin
ServerAdmin email address
Server config, virtual host
ServerAdmin gives the email address for correspondence. Apache includes this address
in the error messages it generates automatically, so the user has someone to write to in
case of problems.

The Where used line explains the appropriate environment for the directive. This will
become clearer later.

Organization of This Book
The chapters that follow and their contents are listed here:
Chapter 1
Covers web servers, how Apache works, TCP/IP, HTTP, hostnames, what a client
does, what happens at the server end, choosing a Unix version, and compiling and
installing Apache under both Unix and Win32.
Chapter 2
Discusses getting Apache to run, creating Apache users, runtime flags,
permissions, and site.simple.
Chapter 3
Introduces a demonstration business, Butterthlies, Inc.; some HTML; default
indexing of web pages; server housekeeping; and block directives.

Chapter 4
Explains how to connect web sites to network addresses, including the common
case where more than one web site is hosted at a given network address.
Chapter 5
Explains controlling access, collecting information about clients, cookies, DBM
control, digest authentication, and anonymous access.
Chapter 6
Covers content and language arbitration, type maps, and expiration of
information.
Chapter 7
Discusses better indexes, index options, your own indexes, and imagemaps.
Chapter 8
Describes Alias, ScriptAlias, and the amazing Rewrite module.
Chapter 9
Covers remote proxies and proxy caching.
Chapter 10
Explains Apache's facilities for tracking activity on your web sites.
Chapter 11
Explores the many aspects of protecting an Apache server and its content from
uninvited guests and intruders, including user validation, binary signatures, virtual
cash, certificates, firewalls, packet filtering, secure sockets layer (SSL), legal
issues, patent rights, national security, and Apache-SSL directives.
Chapter 12
Explains best practices for running large sites, including support for multiple
content-creators, separating test sites from production sites, and integrating the
site with other Internet technologies.

Chapter 13
Explores the options available for using Apache to host automatically changing
content and interactive applications.
Chapter 14
Explains using runtime commands in your HTML and XSSI — a more secure
server-side include.
Chapter 15
Explains how to install and configure PHP, with an example for connecting it to
MySQL.
Chapter 16
Demonstrates aliases, logs, HTML forms, a shell script, a CGI script in Perl,
environment variables, and using MySQL through Perl and Apache.
Chapter 17
Demonstrates how to install, configure, and use the mod_perl module for efficient
processing of Perl applications.
Chapter 18
Explains how to install mod_jserv and Tomcat, which support Java in the Apache
environment.
Chapter 19
Explains how to use XML in conjunction with Apache and how to install and
configure the Cocoon set of tools for presenting XML content.
Chapter 20
Explores the foundations of the Apache 2.0 API.
Chapter 21
Describes how to create Apache modules using the Apache 2.0 Apache Portable
Runtime, including how to port modules from 1.3 to 2.0.

Appendix A
Describes pools; per-server, per-directory, and per-request information; functions;
warnings; and parsing.
In addition, the Apache Quick Reference Card provides an outline of Apache 1.3 and 2.0
syntax.

Acknowledgments
First, thanks to Robert S. Thau, who gave the world the Apache API and the code that
implements it, and to the Apache Group, who worked on it before and have worked on it
since. Thanks to Eric Young and Tim Hudson for giving SSLeay to the Web.
Thanks to Bryan Blank, Aram Mirzadeh, Chuck Murcko, and Randy Terbush, who read
early drafts of the first edition text and made many useful suggestions; and to John
Ackermann, Geoff Meek, and Shane Owenby, who did the same for the second edition.
For the third edition, we would like to thank our reviewers Evelyn Mitchell, Neil Neely,
Lemon, Dirk-Willem van Gulik, Richard Sonnen, David Reid, Joe Johnston, Mike Stok,
and Steven Champeon.
We would also like to offer special thanks to Andrew Ford for giving us permission to
reprint his Apache Quick Reference Card.
Many thanks to Simon St.Laurent, our editor at O'Reilly, who patiently turned our text
into a book — again. The two layers of blunders that remain are our own contribution.
And finally, thanks to Camilla von Massenbach and Barbara Laurie, who have continued
to put up with us while we rewrote this book.

Chapter 1. Getting Started
• 1.1 What Does a Web Server Do?
• 1.2 How Apache Works
• 1.3 Apache and Networking
• 1.4 How HTTP Clients Work
• 1.5 What Happens at the Server End?
• 1.6 Planning the Apache Installation
• 1.7 Windows?
• 1.8 Which Apache?
• 1.9 Installing Apache
• 1.10 Building Apache 1.3.X Under Unix
• 1.11 New Features in Apache v2
• 1.12 Making and Installing Apache v2 Under Unix
• 1.13 Apache Under Windows

Apache is the dominant web server on the Internet today, filling a key place in the
infrastructure of the Internet. This chapter will explore what web servers do and why you
might choose the Apache web server, examine how your web server fits into the rest of
your network infrastructure, and conclude by showing you how to install Apache on a
variety of different systems.

1.1 What Does a Web Server Do?
The whole business of a web server is to translate a URL either into a filename, and then
send that file back over the Internet, or into a program name, and then run that program
and send its output back. That is the meat of what it does: all the rest is trimming.
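As a rough illustration of that translation, the following Python sketch maps a URL path either to a file under a document root or to a program under a script directory. The directory names, the /cgi-bin/ prefix, the index.html default, the order.pl script, and the function name are all assumptions made for this example; this is not how Apache itself is implemented, and the sketch makes no attempt to reject hostile paths.

```python
from pathlib import PurePosixPath

# Hypothetical locations, chosen only for this illustration.
DOC_ROOT = PurePosixPath("/usr/www/site.simple/htdocs")
CGI_DIR = PurePosixPath("/usr/www/site.simple/cgi-bin")
CGI_PREFIX = "/cgi-bin/"


def translate(url_path):
    """Return ("run", program) or ("send", filename) for a URL path."""
    if url_path.startswith(CGI_PREFIX):
        # A program name: the server would run it and send back its output.
        return ("run", str(CGI_DIR / url_path[len(CGI_PREFIX):]))
    # A filename: the server would send the file itself.
    relative = url_path.lstrip("/") or "index.html"
    return ("send", str(DOC_ROOT / relative))


print(translate("/"))
print(translate("/cgi-bin/order.pl"))
```

Everything else a server does (logging, authentication, negotiation) is layered around a decision of this kind.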
When you fire up your browser and connect to the URL of someone's home page — say
the notional http://www.butterthlies.com/ we shall meet later on — you send a message
across the Internet to the machine at that address. That machine, you hope, is up and
running; its Internet connection is working; and it is ready to receive and act on your
message.
URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/
comes in three parts:
<scheme>://<host>/<path>

So, in our example, <scheme> is http, meaning that the browser should use HTTP
(Hypertext Transfer Protocol); <host> is www.butterthlies.com; and <path> is /,
traditionally meaning the top page of the host.[1] The <host> may contain either an IP
address or a name, which the browser will then convert to an IP address. Using HTTP
1.1, your browser might send the following request to the computer at that IP address:
GET / HTTP/1.1
Host: www.butterthlies.com
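The split of a URL into its three parts, and the request built from them, can be sketched in Python with the standard urllib.parse module. The helper name build_request is ours, invented for this illustration; a real browser sends many more headers than this.

```python
from urllib.parse import urlsplit


def build_request(url):
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # Request line, then the Host header, then a blank line ending the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n"


parts = urlsplit("http://www.butterthlies.com/")
print(parts.scheme, parts.hostname, parts.path)
print(build_request("http://www.butterthlies.com/"))
```

On the wire each line ends with a carriage return and line feed, and an empty line marks the end of the headers.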

The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com.
The message is in four parts: a method (an HTTP method, not a URL method), which
in this case is GET, but could equally be PUT, POST, DELETE, or CONNECT; the Uniform
Resource Identifier (URI) /; the version of the protocol we are using; and a series of
headers that modify the request (in this case, a Host header, which is used for name-based
virtual hosting: see Chapter 4). It is then up to the web server running on that host
to make something of this message.
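To make the four parts concrete, here is a minimal Python sketch that takes a raw request apart in the way a server must before it can act on it. The function name parse_request is an assumption of ours, and the sketch ignores the many malformed inputs a real server has to tolerate.

```python
def parse_request(raw):
    """Split a raw HTTP request into method, URI, version, and headers."""
    head = raw.split("\r\n\r\n", 1)[0]  # ignore any body after the blank line
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


raw = "GET / HTTP/1.1\r\nHost: www.butterthlies.com\r\n\r\n"
method, uri, version, headers = parse_request(raw)
print(method, uri, version, headers["host"])
```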
The host machine may be a whole cluster of hypercomputers costing an oil sheik's
ransom or just a humble PC. In either case, it had better be running a web server, a
program that listens to the network and accepts and acts on this sort of message.
1.1.1 Criteria for Choosing a Web Server
What do we want a web server to do? It should:
• Run fast, so it can cope with a lot of requests using a minimum of hardware.
• Support multitasking, so it can deal with more than one request at once and so that
  the person running it can maintain the data it hands out without having to shut the
  service down. Multitasking is hard to arrange within a program: the only way to
  do it properly is to run the server on a multitasking operating system.
• Authenticate requesters: some may be entitled to more services than others. When
  we come to handling money, this feature (see Chapter 11) becomes essential.
• Respond to errors in the messages it gets with answers that make sense in the
  context of what is going on. For instance, if a client requests a page that the server
  cannot find, the server should respond with a "404" error, which is defined by the
  HTTP specification to mean "page does not exist."
• Negotiate a style and language of response with the requester. For instance, it
  should (if the people running the server can rise to the challenge) be able to
  respond in the language of the requester's choice. This ability, of course, can open
  up your site to a lot more action. There are parts of the world where a response in
  the wrong language can be a bad thing.
• Support a variety of different formats. On a more technical level, a user might
  want JPEG image files rather than GIF, or TIFF rather than either of those. He
  might want text in vdi format rather than PostScript.
• Be able to run as a proxy server. A proxy server accepts requests for clients,
  forwards them to the real servers, and then sends the real servers' responses back
  to the clients. There are two reasons why you might want a proxy server:
    o The proxy might be running on the far side of a firewall (see Chapter 11),
      giving its users access to the Internet.
    o The proxy might cache popular pages to save reaccessing them.
• Be secure. The Internet world is like the real world, peopled by a lot of lambs and
  a few wolves.[2] The aim of a good server is to prevent the wolves from troubling
  the lambs. The subject of security is so important that we will come back to it
  several times.
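Negotiating the language of a response, one of the criteria listed above, can be sketched as follows. HTTP clients state their preferences in an Accept-Language header with optional q weights, and the function below picks the best match among the languages a site offers. The function name, the sample header, and the fallback rule are assumptions for this illustration; Apache's own negotiation (see Chapter 6) is more elaborate.

```python
def choose_language(accept_language, available, default="en"):
    """Pick the available language the client weights most highly."""
    preferences = []
    for item in accept_language.split(","):
        tag, _, params = item.strip().partition(";")
        quality = 1.0  # a language with no q parameter has weight 1
        if params.strip().startswith("q="):
            quality = float(params.strip()[2:])
        preferences.append((quality, tag.strip().lower()))
    # Highest weight first; sorted() is stable, so ties keep header order.
    for quality, tag in sorted(preferences, key=lambda p: -p[0]):
        if quality > 0 and tag in available:
            return tag
    return default


print(choose_language("fr, en;q=0.8, de;q=0.5", {"en", "de"}))
```

Here the client prefers French, but the hypothetical site offers only English and German, so the highest-weighted language that is actually available wins.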
1.1.2 Why Apache?
Apache has more than twice the market share of its next competitor, Microsoft. This is
not just because it is freeware and costs nothing. It is also open source,[3] which means
that the source code can be examined by anyone so inclined. Thousands of pairs of eyes
scan it for errors. Because of this constant examination by
outsiders, it is substantially more reliable[4] than any commercial software product that
can only rely on the scrutiny of a closed list of employees. This is particularly important
in the field of security, where apparently trivial mistakes can have horrible consequences.
Anyone is free to take the source code and change it to make Apache do something
different. In particular, Apache is extensible through an established technology for
writing new modules (described in more detail in Chapter 20), which many people have
used to introduce new features.
Apache suits sites of all sizes and types. You can run a single personal page on it or an
enormous site serving millions of regular visitors. You can use it to serve static files over
the Web or as a frontend to applications that generate customized responses for visitors.
Some developers use Apache as a test-server on their desktops, writing and trying code in
a local environment before publishing it to a wider audience. Apache can be an
appropriate solution for practically any situation involving the HTTP protocol.
Apache is freeware. The intending user downloads the source code and compiles it
(under Unix) or downloads the executable (for Windows) from http://www.apache.org or
a suitable mirror site. Although it sounds difficult to download the source code and
configure and compile it, it only takes about 20 minutes and is well worth the trouble.
Many operating system vendors now bundle appropriate Apache binaries.
The result of Apache's many advantages is clear. There are about 75 web-server software
packages on the market. Their relative popularity is charted every month by Netcraft
(http://www.netcraft.com). Their June 2002 survey of active sites, shown in Table
1-1, found that Apache ran nearly two-thirds of the sites they surveyed (continuing a
trend that has been apparent for several years).

Table 1-1. Active sites counted by Netcraft survey, June 2002

Developer    May 2002    Percent    June 2002    Percent
Apache       10411000    65.11      10964734     64.42
Microsoft     4121697    25.78       4243719     24.93
iPlanet        247051     1.55        281681      1.66
Zeus           214498     1.34        227857      1.34
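As a quick check on the claim that Apache has more than twice the share of its nearest competitor, the June 2002 figures from Table 1-1 can be compared directly. The short Python sketch below uses only the numbers in the table; the variable names are ours.

```python
# June 2002 figures from Table 1-1: (active sites, percent share).
june_2002 = {
    "Apache": (10964734, 64.42),
    "Microsoft": (4243719, 24.93),
    "iPlanet": (281681, 1.66),
    "Zeus": (227857, 1.34),
}

apache_sites, apache_share = june_2002["Apache"]
microsoft_sites, microsoft_share = june_2002["Microsoft"]

# Ratio of Apache to its nearest competitor, by site count and by share.
print(round(apache_sites / microsoft_sites, 2))
print(round(apache_share / microsoft_share, 2))
```

Both ratios come out a little over two and a half.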

1.2 How Apache Works
Apache is a program that runs under a suitable multitasking operating system. In the
examples in this book, the operating systems are Unix and Windows
95/98/2000/Me/NT/..., which we call Win32. There are many others: flavors of Unix,
IBM's OS/2, and Novell Netware. Mac OS X has a FreeBSD foundation and ships with
Apache.
The Apache binary is called httpd under Unix and apache.exe under Win32 and normally
runs in the background.[5] Each copy of httpd/apache that is started has its attention
directed at a web site, which is, for our purposes, a directory. Regardless of operating
system, a site directory typically contains four subdirectories:
conf
Contains the configuration file(s), of which httpd.conf is the most important. It is
referred to throughout this book as the Config file. It specifies the URLs that will
be served.
htdocs
Contains the HTML files to be served up to the site's clients. This directory and
those below it, the web space, are accessible to anyone on the Web and therefore
pose a severe security risk if used for anything other than public data.
logs
Contains the log data, both of accesses and errors.
cgi-bin
Contains the CGI scripts. These are programs or shell scripts written by or for the
webmaster that can be executed by Apache on behalf of its clients. It is most
important, for security reasons, that this directory not be in the web space — that
is, in .../htdocs or below.
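The layout just described can be sketched in a few lines of Python. The sketch creates the four subdirectories under a temporary site directory and checks that cgi-bin lies outside the web space; the function names are assumptions of ours, and the temporary directory merely stands in for .../site.simple.

```python
import tempfile
from pathlib import Path


def make_site(root):
    """Create the four conventional subdirectories under a site directory."""
    site = Path(root)
    for name in ("conf", "htdocs", "logs", "cgi-bin"):
        (site / name).mkdir(parents=True, exist_ok=True)
    return site


def in_web_space(site, directory):
    """True if directory is htdocs itself or lies below it."""
    web_space = (site / "htdocs").resolve()
    target = Path(directory).resolve()
    return target == web_space or web_space in target.parents


with tempfile.TemporaryDirectory() as tmp:
    site = make_site(Path(tmp) / "site.simple")
    print(sorted(p.name for p in site.iterdir()))
    print(in_web_space(site, site / "cgi-bin"))
```

A check like in_web_space is the sort of test a cautious webmaster might run before placing scripts or private data anywhere on the server.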
In its idling state, Apache does nothing but listen to the IP addresses specified in its
Config file. When a request appears, Apache receives it and analyzes the headers. It then
applies the rules it finds in the Config file and takes the appropriate action.
The webmaster's main control over Apache is through the Config file. The webmaster has
some 200 directives at her disposal, and most of this book is an account of what these
directives do and how to use them to reasonable advantage. The webmaster also has a
dozen flags she can use when Apache starts up.

We've quoted most of the formal definitions of the directives directly
from the Apache site manual pages because rewriting seemed
unlikely to improve them, but very likely to introduce errors. In a
few cases, where they had evidently been written by someone who
was not a native English speaker, we rearranged the syntax a little.
As they stand, they save the reader having to break off and go to the
Apache site.

1.3 Apache and Networking
At its core, Apache is about communication over networks. Apache uses the TCP/IP
protocol as its foundation, providing an implementation of HTTP. Developers who want
to use Apache should have at least a foundational understanding of TCP/IP and may need
more advanced skills if they need to integrate Apache servers with other network
infrastructure like firewalls and proxy servers.
1.3.1 What to Know About TCP/IP
To understand the substance of this book, you need a modest knowledge of what TCP/IP
is and what it does. You'll find more than enough information in Craig Hunt and Robert
Bruce Thompson's books on TCP/IP,[6] but what follows is, we think, what is necessary
to know for our book's purposes.
TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of protocols enabling
computers to talk to each other over networks. The two protocols that give the suite its
name are among the most important, but there are many others, and we shall meet some
of them later. These protocols are embodied in programs on your computer written by
someone or other; it doesn't much matter who. TCP/IP seems unusual among computer
standards in that the programs that implement it actually work, and their authors have not
tried too much to improve on the original conceptions.
TCP/IP is generally only used where there is a network.[7] Each computer on a network
that wants to use TCP/IP has an IP address, for example, 192.168.123.1.
There are four parts in the address, separated by periods. Each part corresponds to a byte,
so the whole address is four bytes long. You will, in consequence, seldom see any of the
parts outside the range 0-255.
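As a rough illustration of the four-byte layout just described, the following Python sketch converts the example address into its raw bytes and back again. It uses only the standard ipaddress module; the variable names are ours and have nothing to do with Apache itself.

```python
import ipaddress

# The example address from the text, parsed by the standard library.
addr = ipaddress.IPv4Address("192.168.123.1")

# An IPv4 address is four bytes; each dotted part is one byte (0-255).
raw = addr.packed
parts = list(raw)

print(len(raw))   # number of bytes in the address
print(parts)      # the four parts as integers

# Rebuilding the address from the bytes gives the same dotted form back.
rebuilt = ipaddress.IPv4Address(bytes(parts))
print(rebuilt)
```

Because each part is stored in a single byte, a value such as 256 cannot be represented, which is why the parts stay within 0-255.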
Although not required by the protocol, by convention there is a dividing line somewhere
inside this number: to the left is the network number and to the right, the host number.
Two machines on the same physical network — usually a local area network (LAN) —
normally have the same network number and communicate directly using TCP/IP.
How do we know where the dividing line is between network number and host number?
The default dividing line used to be determined by the first of the four numbers, but a
shortage of addresses required a change to the use of subnet masks. These allow us to
further subdivide the network by using more of the bits for the network number and fewer
for the host number. Their correct use is rather technical, so we leave it to the routing
experts. (You should not need to know the details of how this works in order to run a
host, because the numbers you deal with are assigned to you by your network
administrator or are just facts of the Internet.)
Now we can think about how two machines with IP addresses X and Y talk to each other.
If X and Y are on the same network and are correctly configured so that they have the
same network number and different host numbers, they should be able to fire up TCP/IP
and send packets to each other down their local, physical network without any further
ado.
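The same-network test can be sketched in a few lines of Python. This is only an illustration under an assumed subnet mask of 255.255.255.0 (the text does not fix one), and the function name same_network is ours.

```python
import ipaddress

def same_network(x: str, y: str, netmask: str) -> bool:
    """Return True if hosts x and y share a network number under netmask."""
    net_x = ipaddress.IPv4Interface(f"{x}/{netmask}").network
    net_y = ipaddress.IPv4Interface(f"{y}/{netmask}").network
    return net_x == net_y

# Assumed mask: 255.255.255.0 puts the dividing line after the third byte.
MASK = "255.255.255.0"

print(same_network("192.168.123.1", "192.168.123.7", MASK))
print(same_network("192.168.123.1", "192.168.124.1", MASK))
```

With this mask the first pair can talk directly, while the second pair would need a router between them.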
If the network numbers are not the same, the packets are sent to a router, a special
machine able to find out where the other machine is and deliver the packets to it. This
communication may be over the Internet or might occur on your wide area network
(WAN). There are several ways computers use IP to communicate. These are two of
them:
UDP (User Datagram Protocol)
A way to send a single packet from one machine to another. It does not guarantee
delivery, and there is no acknowledgment of receipt. DNS uses UDP, as do other
applications that manage their own datagrams. Apache doesn't use UDP.
TCP (Transmission Control Protocol)
A way to establish communications between two computers. It reliably delivers
messages of any size in the order they are sent. This is a better protocol for our
purposes.
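The difference between the two can be seen with Python's standard socket module. The sketch below is ours, runs entirely over the loopback address 127.0.0.1, and assumes the operating system allows loopback sockets; it sends one UDP datagram and then two writes over a TCP connection.

```python
import socket

LOOPBACK = "127.0.0.1"

# UDP: no connection and no acknowledgment; each sendto() is one datagram.
udp_rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_rx.bind((LOOPBACK, 0))            # port 0 lets the OS choose a free port
udp_rx.settimeout(2.0)                # do not wait forever if it is lost
udp_tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_tx.sendto(b"hello", udp_rx.getsockname())
udp_data, _ = udp_rx.recvfrom(1024)
udp_tx.close()
udp_rx.close()

# TCP: a connection is set up first, then bytes arrive in the order sent.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind((LOOPBACK, 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname(), timeout=2.0)
server_side, _ = listener.accept()
server_side.settimeout(2.0)
client.sendall(b"first,")
client.sendall(b"second")
client.close()                        # closing marks the end of the stream

tcp_data = b""
while True:
    chunk = server_side.recv(1024)
    if not chunk:                     # an empty read means the peer closed
        break
    tcp_data += chunk
server_side.close()
listener.close()

print(udp_data, tcp_data)
```

The TCP side sees one ordered stream of bytes rather than two separate messages, which is the behavior HTTP relies on.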
1.3.2 How Apache Uses TCP/IP
Let's look at a server from the outside. We have a box in which there is a computer,
software, and a connection to the outside world — Ethernet or a serial line to a modem,
for example. This connection is known as an interface and is known to the world by its IP
address. If the box had two interfaces, they would each have an IP address, and these
addresses would normally be different. A single interface, on the other hand, may have
more than one IP address (see Chapter 3).
Requests arrive on an interface for a number of different services offered by the server
using different protocols:
• Network News Transfer Protocol (NNTP): news
• Simple Mail Transfer Protocol (SMTP): mail
• Domain Name Service (DNS)
• HTTP: World Wide Web

The server can decide how to handle these different requests because the four-byte IP
address that leads the request to its interface is followed by a two-byte port number.
Different services attach to different ports:
• NNTP: port number 119
• SMTP: port number 25
• DNS: port number 53
• HTTP: port number 80

As the local administrator or webmaster, you can decide to attach any service to any port.
Of course, if you decide to step outside convention, you need to make sure that your
clients share your thinking. Our concern here is just with HTTP and Apache. Apache, by
default, listens to port number 80 because it deals in HTTP business.
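
To make the sizes concrete, here is a small Python sketch that packs a four-byte address and a two-byte port number together. It illustrates the sizes only and is not the layout of any real packet; the function name endpoint_bytes and the dictionary are ours.

```python
import socket
import struct

# Conventional port numbers listed in the text.
WELL_KNOWN_PORTS = {"NNTP": 119, "SMTP": 25, "DNS": 53, "HTTP": 80}

def endpoint_bytes(ip: str, port: int) -> bytes:
    """Pack a four-byte IPv4 address followed by a two-byte port number."""
    return socket.inet_aton(ip) + struct.pack("!H", port)

packed = endpoint_bytes("192.168.123.1", WELL_KNOWN_PORTS["HTTP"])
print(len(packed), packed.hex())
```

Two bytes give 65,536 possible values, so port numbers run from 0 to 65535.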

Port numbers below 1024 can only be used by the superuser (root, under Unix); this
prevents other users from running programs masquerading as standard services, but
brings its own problems, as we shall see.

Under Win32 there is currently no security directly related to port numbers and no
superuser (at least, not as far as port numbers are concerned).
This basic setup is fine if our machine is providing only one web server to the world. In
real life, you may want to host several, many, dozens, or even hundreds of servers, which
appear to the world as completely different from each other. This situation was not
anticipated by the authors of HTTP 1.0, so handling a number of hosts on one machine
has to be done by a kludge, assigning multiple addresses to the same interface and
distinguishing the virtual host by its IP address. This technique is known as IP-intensive
virtual hosting. Using HTTP 1.1, virtual hosts may be created by assigning multiple
names to the same IP address. The browser sends a Host header to say which name it is
using.
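The name-based approach can be sketched as follows. This is a simplified illustration, not Apache's actual matching logic: the table VIRTUAL_HOSTS, its made-up document roots, and the function pick_virtual_host are all assumptions of ours.

```python
# Hypothetical table mapping host names to document roots; the names are
# borrowed from the book's examples and the paths are made up.
VIRTUAL_HOSTS = {
    "www.butterthlies.com": "/usr/www/example/customers",
    "sales.butterthlies.com": "/usr/www/example/salesmen",
}
DEFAULT_HOST = "www.butterthlies.com"

def pick_virtual_host(request_text: str) -> str:
    """Return the host name selected by the Host header, if any."""
    lines = request_text.split("\r\n")
    for line in lines[1:]:                  # skip the request line
        if line == "":                      # a blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower()
            host = host.split(":")[0]       # drop an optional :port suffix
            if host in VIRTUAL_HOSTS:
                return host
    return DEFAULT_HOST

print(pick_virtual_host("GET / HTTP/1.1\r\nHost: sales.butterthlies.com\r\n\r\n"))
print(pick_virtual_host("GET / HTTP/1.0\r\n\r\n"))
```

An HTTP 1.0 request carries no Host header, so the sketch falls back to a default site, which is why the older protocol needs a separate IP address per site.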
1.3.3 Apache and Domain Name Servers
In one way the Web is like the telephone system: each site has a number that uniquely
identifies it — for instance, 192.168.123.5. In another way it is not: since these numbers
are hard to remember, they are automatically linked to domain names —
www.amazon.com, for instance, or www.butterthlies.com, which we shall meet later in
examples in this book.

When you surf to http://www.amazon.com, your browser actually goes first to a specialist
server called a Domain Name Server (DNS), which knows (how it knows doesn't concern
us here) that this name translates into 208.202.218.15. It then asks the Web to connect it
to that IP number. When you get an error message saying something like "DNS not
found," it means that this process has broken down. Maybe you typed the URL
incorrectly, or the server is down, or the person who set it up made a mistake — perhaps
because he didn't read this book.
A DNS error impacts Apache in various ways, but one that often catches the beginner is
this: if Apache is presented with a URL that corresponds to a directory, but does not have
a / at the end of it, then Apache will send a redirect to the same URL with the trailing /
added. In order to do this, Apache needs to know its own hostname, which it will attempt
to determine from DNS (unless it has been configured with the ServerName directive,
covered in Chapter 2). Often when beginners are experimenting with Apache, their DNS
is incorrectly set up, and great confusion can result. Watch out for it! Usually what will
happen is that you will type in a URL to a browser with a name you are sure is correct,
yet the browser will give you a DNS error, saying something like "Cannot find server."
Usually, it is the name in the redirect that causes the problem. If adding a / to the end of
your URL causes it, then you can be pretty sure that's what has happened.
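
The following Python sketch shows why the server's idea of its own name matters for this redirect. It is an illustration only: the function trailing_slash_redirect, the path /catalog, and the host misnamed.example are hypothetical.

```python
def trailing_slash_redirect(server_name: str, port: int, path: str) -> str:
    """Build the absolute URL a server could redirect to for a directory."""
    host = server_name if port == 80 else f"{server_name}:{port}"
    return f"http://{host}{path}/"

# If the server's idea of its own name is wrong, the client is sent to a
# host it may not be able to resolve, even though the original URL worked.
good = trailing_slash_redirect("www.butterthlies.com", 80, "/catalog")
bad = trailing_slash_redirect("misnamed.example", 80, "/catalog")

print(good)
print(bad)
```

The second URL is what the browser would try next, and it is that lookup, not the one you typed, that fails.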

1.3.3.1 Multiple sites: Unix
It is fortunate that the crucial Unix utility ifconfig, which binds IP addresses to physical
interfaces, often allows the binding of multiple IP numbers to a single interface so that
people can switch from one IP number to another and maintain service during the
transition. This is known as "IP aliasing" and can be used to maintain multiple "virtual"
web servers on a single machine.
In practical terms, on many versions of Unix, we run ifconfig to give multiple IP
addresses to the same interface. The interface in this context is actually the bit of software
— the driver — that handles the physical connection (Ethernet card, serial port, etc.) to
the outside. While writing this book, we accessed the practice sites through an Ethernet
connection between a Windows 95 machine (the client) and a FreeBSD box (the server)
running Apache.
Our environment was very untypical, since the whole thing sat on a desktop with no
access to the Web. The FreeBSD box was set up using ifconfig in a script lan_setup,
which contained the following lines:
ifconfig ep0 192.168.123.2
ifconfig ep0 192.168.123.3 alias netmask 0xFFFFFFFF
ifconfig ep0 192.168.124.1 alias

The first line binds the IP address 192.168.123.2 to the physical interface ep0. The
second binds an alias of 192.168.123.3 to the same interface. We used a subnet mask
(netmask 0xFFFFFFFF) to suppress a tedious error message generated by the FreeBSD
TCP/IP stack. This address was used to demonstrate virtual hosts. We also bound yet
another IP address, 192.168.124.1, to the same interface, simulating a remote server to
demonstrate Apache's proxy server. The important feature to note here is that the address
192.168.124.1 is on a different IP network from the address 192.168.123.2, even though
it shares the same physical network. No subnet mask was needed in this case, as the error
message it suppressed arose from the fact that 192.168.123.2 and 192.168.123.3 are on
the same network.
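The addresses in the script can be checked with Python's ipaddress module. The sketch assumes that the addresses given without a mask fall back to a 24-bit network number; that assumption is ours and is not stated by the ifconfig lines.

```python
import ipaddress

# 0xFFFFFFFF is the all-ones mask: every bit belongs to the network number.
alias_mask = ipaddress.IPv4Address(0xFFFFFFFF)
alias = ipaddress.IPv4Interface(f"192.168.123.3/{alias_mask}")

# Assumption: the unmasked addresses fall back to a 24-bit network number.
primary = ipaddress.IPv4Interface("192.168.123.2/24")
remote = ipaddress.IPv4Interface("192.168.124.1/24")

print(alias_mask, alias.network.prefixlen)
print(primary.network, remote.network)
```

Under that assumption the first two addresses share the network 192.168.123.0, while 192.168.124.1 lands on a different one.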
Unfortunately, each Unix implementation tends to do this slightly differently, so these
commands may not work on your system. Check your manuals!
In real life, we do not have much to do with IP addresses. Web sites (and Internet hosts
generally) are known by their names, such as www.butterthlies.com or
sales.butterthlies.com, which we shall meet later. On the authors' desktop system, these
names both translate into 192.168.123.2. The distinction between them is made by
Apache's Virtual Hosting mechanism (see Chapter 4).

1.3.3.2 Multiple sites: Win32
As far as we can discern, it is not possible to assign multiple IP addresses to a single
interface under a standard Windows 95 system. On Windows NT it can be done via
Control Panel → Networks → Protocols → TCP/IP/Properties... → IP Address → Advanced.
Later versions of Windows, notably Windows 2000 and XP, support multiple IP addresses
through the TCP/IP properties dialog of the Local Area Network in the Network and Dial-up
Settings area of the Start menu.

1.4 How HTTP Clients Work
Once the server is set up, we can get down to business. The client has the easy end: it
wants web action on a particular site, and it sends a request with a URL that begins with
http to indicate what service it wants (other common services are ftp for File Transfer
Protocol or https for HTTP over Secure Sockets Layer, SSL) and continues with these
possible parts:
//<user>:<password>@<host>:<port>/<url-path>

RFC 1738 says:
Some or all of the parts "<user>:<password>@", ":<password>", ":<port>", and
"/<url-path>" may be omitted. The scheme specific data start with a double slash "//" to
indicate that it complies with the common Internet scheme syntax.
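As a small illustration of the URL parts that RFC 1738 describes, Python's standard urllib.parse module can pull them apart. The URL below, including the user name fred and the password secret, is made up for the example.

```python
from urllib.parse import urlsplit

# A made-up URL that uses every optional part: user, password, host, port, path.
parts = urlsplit("http://fred:secret@www.apache.org:8000/some/where/foo.html")

print(parts.scheme)                      # the service wanted
print(parts.username, parts.password)    # the optional user and password pair
print(parts.hostname, parts.port)        # the host and optional port
print(parts.path)                        # the path on the server
```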
In real life, URLs look more like: http://www.apache.org/ — that is, there is no user and
password pair, and there is no port. What happens?
The browser observes that the URL starts with http: and deduces that it should be using
the HTTP protocol. The client then contacts a name server, which uses DNS to resolve
www.apache.org to an IP address. At the time of writing, this was 63.251.56.142. One
way to check the validity of a hostname is to go to the operating-system prompt[8] and
type:
ping www.apache.org

If that host is connected to the Internet, a response is returned:
Pinging www.apache.org [63.251.56.142] with 32 bytes of data:
Reply from 63.251.56.142: bytes=32 time=278ms TTL=49
Reply from 63.251.56.142: bytes=32 time=620ms TTL=49
Reply from 63.251.56.142: bytes=32 time=285ms TTL=49
Reply from 63.251.56.142: bytes=32 time=290ms TTL=49

Ping statistics for 63.251.56.142:

A URL can be given more precision by attaching a port number: the web address
http://www.apache.org doesn't include a port because it is port 80, the default, and the
browser takes it for granted. If some other port is wanted, it is included in the URL after a
colon — for example, http://www.apache.org:8000/. We will have more to do with ports
later.
The URL always includes a path, even if it is only /. If the path is left out by the careless
user, most browsers put it back in. If the path were /some/where/foo.html on port 8000,
the URL would be http://www.apache.org:8000/some/where/foo.html.
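The defaults a browser fills in can be sketched like this. The function connection_target is ours and only covers plain http URLs; it is not taken from any browser.

```python
from urllib.parse import urlsplit

def connection_target(url: str):
    """Return (host, port, path) with the defaults a browser fills in."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else 80   # HTTP default port
    path = parts.path if parts.path else "/"              # path is never empty
    return parts.hostname, port, path

print(connection_target("http://www.apache.org"))
print(connection_target("http://www.apache.org:8000/some/where/foo.html"))
```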
The client now makes a TCP connection to port number 8000 on IP 63.251.56.142 and
sends the following message down the connection (if it is using HTTP 1.0):
GET /some/where/foo.html HTTP/1.0<CR><LF><CR><LF>

These carriage returns and line feeds (CRLF) are very important because they separate
the HTTP header from its body. If the request were a POST, there would be data
following. The server sends the response back and closes the connection. To see it in
action, connect again to the Internet, get a command-line prompt, and type the following:
% telnet www.apache.org 80
> telnet www.apache.org 80
GET http://www.apache.org/foundation/contact.html HTTP/1.1
Host: www.apache.org

On Win98, telnet puts up a dialog box. Click connect → remote system, and change Port
from "telnet" to "80". In Terminal → preferences, check "local echo". Then type this,
followed by two Returns:
GET http://www.apache.org/foundation/contact.html HTTP/1.1
Host: www.apache.org
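
The bytes that telnet sends for this request can be assembled explicitly, which makes the role of the carriage returns and line feeds visible. This is a minimal sketch; the function build_request is ours, and it only builds the request without opening a connection.

```python
def build_request(method: str, target: str, version: str, headers: dict) -> bytes:
    """Assemble a request line and headers, each ended by CRLF."""
    lines = [f"{method} {target} HTTP/{version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # One CRLF ends the last line; a second, bare CRLF ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request(
    "GET",
    "http://www.apache.org/foundation/contact.html",
    "1.1",
    {"Host": "www.apache.org"},
)
print(request)
```

The two Returns typed at the keyboard correspond to the final pair of CRLFs in the byte string.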

You should see text similar to that which follows.
Some implementations of telnet rather unnervingly don't echo what you type to the
screen, so it seems that nothing is happening. Nevertheless, a whole mess of response
streams past:
Trying 64.125.133.20...
Connected to www.apache.org.
Escape character is '^]'.
HTTP/1.1 200 OK
Date: Mon, 25 Feb 2002 15:03:19 GMT
Server: Apache/2.0.32 (Unix)
Cache-Control: max-age=86400
Expires: Tue, 26 Feb 2002 15:03:19 GMT
Accept-Ranges: bytes
Content-Length: 4946
Content-Type: text/html




Contact Information--The Apache Software
Foundation


The
Apache Software Foundation

Apache Projects

  • HTTP Server
  • APR
  • Jakarta
  • Perl
  • PHP
  • TCL
  • XML
  • Conferences
  • Foundation
  • ...... and so on

1.5 What Happens at the Server End?
We assume that the server is well set up and running Apache. What does Apache do? In the
simplest terms, it gets a URL from the Internet, turns it into a filename, and sends the
file (or its output if it is a program)[9] back down the Internet. That's all it does, and
that's all this book is about!
Two main cases arise:
• The Unix server has a standalone Apache that listens to one or more ports (port 80 by
  default) on one or more IP addresses mapped onto the interfaces of its machine. In this
  mode (known as standalone mode), Apache actually runs several copies of itself to
  handle multiple connections simultaneously.
• On Windows, there is a single process with multiple threads. Each thread services a
  single connection. This currently limits Apache 1.3 to 64 simultaneous connections,
  because there's a system limit of 64 objects for which you can wait at once. This is
  something of a disadvantage because a busy site can have several hundred simultaneous
  connections. It has been improved in Apache 2.0. The default maximum is now 1920, but
  even that can be extended at compile time.
Both cases boil down to an Apache server with an incoming connection. Remember our first
statement in this section, namely, that the object of the whole exercise is to resolve the
incoming request either into a filename or the name of a script, which generates data
internally on the fly. Apache thus first determines which IP address and port number were
used by asking the operating system where the connection is connecting to. Apache then
uses the IP address, port number, and (in HTTP 1.1) the Host header to decide which
virtual host is the target of this request. The virtual host then looks at the path, which
was handed to it in the request, and reads that against its configuration to decide on the
appropriate response, which it then returns.
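The step of turning a URL path into a filename can be sketched in Python. This is a simplified illustration and not how Apache implements it: the function url_path_to_filename and the document root used in the example are assumptions of ours, and real servers apply many more rules.

```python
import posixpath
from urllib.parse import unquote

def url_path_to_filename(document_root: str, url_path: str):
    """Map a request path onto a file below document_root, or return None."""
    decoded = unquote(url_path.split("?", 1)[0])     # drop any query string
    joined = posixpath.join(document_root, decoded.lstrip("/"))
    candidate = posixpath.normpath(joined)
    root = posixpath.normpath(document_root)
    # Refuse anything that climbs out of the web space via "..".
    if candidate != root and not candidate.startswith(root + "/"):
        return None
    return candidate

# Hypothetical document root, used only for this example.
print(url_path_to_filename("/usr/www/example/htdocs", "/some/where/foo.html"))
print(url_path_to_filename("/usr/www/example/htdocs", "/../../etc/passwd"))
```

The second call returns None because the requested path would escape the web space, which is the kind of request a server must refuse.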
Most of this book is about the possible appropriate responses and how Apache decides
which one to use.

1.6 Planning the Apache Installation
Unless you're using a prepackaged installation, you'll want to do some planning before
setting up the software. You'll need to consider network integration, operating system
choices, Apache version choices, and the many modules available for Apache. Even if you're
just using Apache at an ISP, you may want to know which choices the ISP made in its
installation.
1.6.1 Fitting Apache into Your Network
Apache installations come in many flavors. If an installation is intended only for local
use on a developer's machine, it probably needs much less integration with network systems
than an installation meant as a public host supporting thousands of simultaneous hits.
Apache itself provides network and security functionality, but you'll need to set up
supporting services separately, like the DNS that identifies your server to the network or
the routing that connects it to the rest of the network. Some servers operate behind
firewalls, and firewall configuration may also be an issue. If these are concerns for you,
involve your network administrator early in the process.
1.6.2 Which Operating System?
Many webmasters have no choice of operating system (they have to use what's in the box on
their desks), but if they have a choice, the first decision to make is between Unix and
Windows. As the reader who persists with us will discover, much of the Apache Group and
your authors prefer Unix. It is, itself, essentially open source. Over the last 30 years
it has been the subject of intense scrutiny and improvement by many thousands of people.
On the other hand, Windows is widely available, and Apache support for Windows has
improved substantially in Apache 2.0.
1.6.3 Which Unix?
The choice is commonly between some sort of Linux and FreeBSD. Both are technically
acceptable.
If you already know someone who has one of these OSs and is willing to help you get used
to yours, then it would make sense to follow them. If you are an Apple user, OS X has a
Unix core and includes Apache. Failing that, the difference between the two paths is mainly
a legal one, turning on their different interpretations of open source licensing.
Linux lives at http://www.linux.org, and there are more than 160 different distributions
from which Linux can be obtained free or in prepackaged pay-for formats. It is rather
ominously described as a "Unix-type" operating system, which sometimes means that
long-established Unix standards have been "improved", not always in an upwards direction.
Linux supports Apache, and most of the standard distributions include it. However, the
default position of the Config files may vary from platform to platform, though usually on
Linux they are to be found in /etc. Under Red Hat Linux they will be in /etc/httpd/conf by
default.
FreeBSD ("BSD" means "Berkeley Software Distribution", as in the University of California,
Berkeley, where the version of Unix FreeBSD is derived from) lives at
http://www.freebsd.org. We have been using FreeBSD for a long time and think it is the
best environment.
If you look at http://www.netcraft.com and go to What's that site running?, you can
examine any web site you like. If you choose, let's say, http://www.microsoft.com, you
will discover that the site's uptime (length of time between rebooting the server) is
about 12 days, on average. One assumes that Microsoft's servers are running under their
own operating systems. The page Longest uptimes, also at Netcraft, shows that many Apache
servers running Unix have uptimes of more than 1380 days (which is probably as long as
Netcraft had been running the survey when we looked at it). One of the authors (BL) has a
server running FreeBSD that has been rebooted once in 15 years, and that was when he
moved house.
The whole of FreeBSD is freely available from http://www.freebsd.org/. But we would
suggest that it's well worth spending a few dollars to get the software on CD-ROM or DVD
plus a manual that takes you through the installation process.
If you plan to run Apache 2.0 on FreeBSD, you need to install FreeBSD 4.x to take
advantage of Apache's support for threads: earlier versions of FreeBSD do not support
them, at least not well enough to run Apache.
If you use FreeBSD, you will find (we hope) that it installs from the CD-ROM easily
enough, but that it initially lacks several things you will need later. Among these are
Perl, Emacs, and some better shell than sh (we like bash and ksh), so it might be sensible
to install them straightaway from their lurking places on the CD-ROM.

1.7 Windows?
The main problem with the Win32 version of Apache lies in its security, which must depend,
in turn, on the security of the underlying operating system. Unfortunately, Windows 95,
Windows 98, and their successors have no effective security worth mentioning. Windows NT
and Windows 2000 have a large number of security features, but they are poorly documented,
hard to understand, and have not been subjected to the decades of public inspection,
discussion, testing, and hacking that have forged Unix security into a fortress that can
pretty well be relied upon.
It is a grave drawback to Windows that the source code is kept hidden in Microsoft's hands
so that it does not benefit from the scrutiny of the computing community. It is precisely
because the source code of free software is exposed to millions of critical eyes that it
works as well as it does.
In the view of the Apache development group, the Win32 version is useful for easy testing
of a proposed web site. But if money is involved, you would be wise to transfer the site
to Unix before exposure to the public and the Bad Guys.

1.8 Which Apache?
At the time this edition was prepared, Apache 1.3.26 was the stable release.
It has an improved build system (see the section that follows). Both the Unix and Windows
versions were thought to be in good shape. Apache 2.0 had made it through beta test into
full release. We suggest that if you are working under Unix and you don't need Apache
2.0's improved features (which are multitudinous but not fundamental for the ordinary
webmaster), you go for Version 1.3.26 or later.
1.8.1 Apache 2.0
Apache 2.0 is a major new version. The main new features are multithreading (on platforms
that support it), layered I/O (also known as filters), and a rationalized API. The
ordinary user will see very little difference, but the programmer writing new modules (see
the section that follows) will find a substantial change, which is reflected in our
rewritten Chapter 20 and Chapter 21. However, the improvements in Apache v2.0 look to the
future rather than trying to improve the present. The authors are not planning to transfer
their own web sites to v2.0 any time soon and do not expect many other sites to do so
either. In fact, many sites are still happily running Apache v1.2, which was nominally
superseded several years ago. There are good security reasons for them to upgrade to v1.3.
1.8.2 Apache 2.0 and Win32
Apache 2.0 is designed to run on Windows NT and 2000. The binary installer will only work
with x86 processors. In all cases, TCP/IP networking must be installed. If you are using
NT 4.0, install Service Pack 3 or 6, since Pack 4 had TCP/IP problems. It is not
recommended that Windows 95 or 98 ever be used for production servers and, when we went to
press, Apache 2.0 would not run under either at all. See
http://www.apache.org/docs-2.0/platform/windows.html.

1.9 Installing Apache
There are two ways of getting Apache running on your machine: by downloading an
appropriate executable or by getting the source code and compiling it. Which is better
depends on your operating system.
1.9.1 Apache Executables for Unix
The fairly painless business of compiling Apache, which is described later, can now be
circumvented by downloading a precompiled binary for the Unix of your choice. When we went
to press, the following operating systems (mostly versions of Unix) were supported, but
check before you decide. (See http://httpd.apache.org/dist/httpd/binaries.)

aix darwin aux dgux beos digitalunix bs2000-osd
freebsd bsdi hpux irix netware qnx sunos
linux openbsd reliantunix unixware macosx os2 rhapsody
win32 macosxserver os390 sinix netbsd osf1 solaris

Although this route is easier, you do forfeit the opportunity to configure the modules of
your Apache, and you lose the chance to carry out quite a complex Unix operation, which is
in itself interesting and confidence-inspiring if you are not very familiar with this
operating system.
1.9.2 Making Apache 1.3.X Under Unix
Download the most recent Apache source code from a suitable mirror site: a list can be
found at http://www.apache.org/[10]. You will get a compressed file, with the extension
.gz if it has been gzipped or .Z if it has been compressed. Most Unix software available
on the Web (including the Apache source code) is zipped using gzip, a GNU compression
tool.
When expanded, the Apache .tar file creates a tree of subdirectories. Each new release
does the same, so you need to create a directory on your FreeBSD machine where all this
can live sensibly. We put all our source directories in /usr/src/apache. Go there, copy
the .tar.gz or .tar.Z file, and uncompress the .Z version or gunzip (or gzip -d) the .gz
version:
uncompress .tar.Z

or:
gzip -d .tar.gz

Make sure that the resulting file is called .tar, or tar may turn up its nose. If not,
type:
mv .tar

Now unpack it:
% tar xvf .tar

Incidentally, modern versions of tar will unzip as well:
% tar xvfz .tar.gz

Keep the .tar file because you will need to start fresh to make the SSL version later on
(see Chapter 11).
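The one-step gunzip and unpack described above can also be reproduced with Python's standard tarfile module. The sketch below builds a tiny stand-in archive in a temporary directory so that it needs no download; the archive name example_1.0.tar.gz and its contents are hypothetical.

```python
import io
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "example_1.0.tar.gz")   # hypothetical name

# Build a tiny stand-in archive so the sketch does not need a download.
payload = b"placeholder\n"
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo("example_1.0/README")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# "r:gz" gunzips and unpacks in one step, much like tar xvfz.
with tarfile.open(archive, "r:gz") as tar:
    names = tar.getnames()
    tar.extractall(workdir)

unpacked = os.path.join(workdir, "example_1.0", "README")
print(names, os.path.isfile(unpacked))
```

As with the command-line version, the archive itself is left in place after unpacking, so it can be reused later.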
The file will make itself a subdirectory, such as apache_1.3.14.
Under Red Hat Linux you install the .rpm file and type:
rpm -i apache

Under Debian:
apt-get install apache

The next task is to turn the source files you have just downloaded into the executable
httpd. But before we can discuss that, we need to talk about Apache modules.
1.9.3 Modules Under Unix
Apache can do a wide range of things, not all of which are needed on every web site. Those
that are needed are often not all needed all the time. The more capability the executable,
httpd, has, the bigger it is. Even though RAM is cheap, it isn't so cheap that the size of
the executable has no effect. Apache handles user requests by starting up a new version of
itself for each one that comes in. All the versions share the same static executable code,
but each one has to have its own dynamic RAM. In most cases this is not much, but in some
(as in mod_perl; see Chapter 17) it can be huge.
The problem is handled by dividing Apache's functionality into modules and allowing the
webmaster to choose which modules to include in the executable. A sensible choice can
markedly reduce the size of the program.
There are two ways of doing this. One is to choose which modules you want and then to
compile them in permanently. The other is to load them when Apache is run, using the
Dynamic Shared Object (DSO) mechanism, which is somewhat like Dynamic Link Libraries (DLL)
under Windows. In the two previous editions of this book, we deprecated DSO because:
• It was experimental and not very reliable.
• The underlying mechanism varies strongly from Unix to Unix so it was, to begin with, not
  available on many platforms.
However, things have moved on, the list of supported platforms is much longer, and the
bugs have been ironed out.
When we went to press, the following operating systems were supported:

Linux, Darwin/Mac OS, OpenStep/Mach, SCO, HPUX, Digital Unix, SunOS, FreeBSD, OpenBSD,
DYNIX/ptx, ReliantUNIX, DGUX, UnixWare, AIX, IRIX, NetBSD, BSDI

Ultrix was entirely unsupported. If you use an operating system that is not mentioned
here, consult the notes in INSTALL.
More reasons for using DSOs are:
• Web sites are also getting more complicated so they often positively need DSOs.
• Some distributions of Apache, like Red Hat's, are supplied without any compiled-in
  modules at all.
• Some useful packages, such as Tomcat (see Chapter 17), are only available as shared
  objects.
Having said all this, it is also true that using DSOs makes the novice webmaster's life
more complicated than it need be. You need to create the DSOs at compile time and invoke
them at runtime. The list of them clogs up the Config file (which is tricky enough to get
right even when it is small), offers plenty of opportunity for typing mistakes, and, if
you are using Apache v1.3.X, must be in the correct order (under Apache v2.0 the DSO list
can be in any order).
Our advice on DSOs is not to use them unless:
• You have a precompiled version of Apache (e.g., from Red Hat) that only handles modules
  as DSOs.
• You need to invoke the DSO mechanism to use a package such as Tomcat (see Chapter 17).
• Your web site is so busy that executable size is really hurting performance. In
  practice, this is extremely unlikely, since the code is shared across all instances on
  every platform we know of.
If none of these apply, note that DSOs exist and leave them alone.
1.9.3.1 Compiled in modules
This method is simple. You select the modules you want, or take the default list in either
of the following methods, and compile away. We will discuss this in detail here.
1.9.3.2 DSO modules
To create an Apache that can use the DSO mechanism as a specific shared object, the
compile process has to create a detached chunk of executable code: the shared object.
This will be a file like (in our layout)
/usr/src/apache/apache_1.3.26/src/modules/standard/mod_alias.so.
If all the modules are defined to be DSOs, Apache ends up with only two compiled-in
modules: core and mod_so. The first is the real Apache; the second handles DSO loading and
running.
You can, of course, mix the two methods and have the standard modules compiled in with DSO
for things like Tomcat.
1.9.3.3 APXS
Once mod_so has been compiled in (see later), the necessary hooks for a shared object can
be inserted into the Apache executable, httpd, at any time by using the utility apxs:
apxs -i -a -c mod_foo.c

This would make it possible to link in mod_foo at runtime. For practical details see the
manual page by running man apxs or search http://www.apache.org for "apxs".
The apxs utility is only built if you use the configure method (see Section 1.10.1 later
in this chapter). Note that if you are running a version of Apache prior to 1.3.24, have
previously configured Apache and now reconfigure it, you'll need to remove
src/support/apxs to force a rebuild when you remake Apache. You will also need to
reinstall Apache. If you do not do all this, things that use apxs may mysteriously fail.

1.10 Building Apache 1.3.X Under Unix
There are two methods for building Apache: the "Semimanual Method" and "Out of the Box".
They each involve the user in about the same amount of keyboard work: if you are happy
with the defaults, you need do very little; if you want to do a custom build, you have to
do more typing to specify what you want.
Both methods rely on a shell script that, when run, creates a Makefile. When you run make,
this, in turn, builds the Apache executable with the side orders you asked for. Then you
copy the executable to its home (Semimanual Method) or run make install (Out of the Box)
and the various necessary files are moved to the appropriate places around the machine.
Between the two methods, there is not a tremendous amount to choose.
We prefer the Semimanual Method because it is older[11] and more reliable. It is also nearer to the reality of what is happening and generates its own record of what you did last time so you can do it again without having to perform feats of memory. Out of the Box is easier if you want a default build. If you want a custom build and you want to be able to repeat it later, you would do the build from a script that can get quite large. On the other hand, you can create several different scripts to trigger different builds if you need to. 1.10.1 Out of the Box Until Apache 1.3, there was no real out-of-the-box batch-capable build and installation procedure for the complete Apache package. This method is provided by a top-level configure script and a corresponding top-level Makefile.tmpl file. The goal is to provide a GNU Autoconf-style frontend that is capable of driving the old src/Configure stuff in batch. Once you have extracted the sources (see earlier), the build process can be done in a minimum of three command lines — which is how most Unix software is built nowadays. Change yourself to root before you run ./configure; otherwise, if you use the default build configuration (which we suggest you do not), the server will be looking at port 8080 and will, confusingly, refuse requests to the default port, 80. The result is, as you will be told during the process, probably not what you really want: ./configure make make install This will build Apache and install it, but we suggest you read on before deciding to do it this way. If you do this — and then decide to do something different, do: make clean afterwards, to tidy up. Don't forget to delete the files created with: rm -R /usr/local/apache Readers who have done some programming will recognize that configure is a shell script that creates a Makefile. The command make uses it to check a lot of stuff, sets compiler variables, and compiles Apache. 
The command make install puts the numerous components in their correct places around your machine, using, in this case, the default Apache layout, which we do not particularly like. So, we recommend a slightly more elaborate procedure, which uses the GNU layout. The GNU layout is probably the best for users who don't have any preconceived ideas. As Apache involves more and more third-party materials and this scheme tends to be used by more and more players, it also tends to simplify the business of bringing new packages into your installation. A useful installation, bearing in mind what we said about modules earlier and assuming you want to use the mod_proxy DSO, is produced by:
make clean
./configure --with-layout=GNU \
    --enable-module=proxy --enable-shared=proxy
make
make install
(the \ character lets the arguments carry over to a new line). You can repeat the --enable- options for as many shared objects as you like. If you want to compile in hooks for all the DSOs, use:
./configure --with-layout=GNU --enable-shared=max
make
make install
If you then repeat the ./configure... line with --show-layout > layout added on the end, you get a map of where everything is in the file layout. However, there is a nifty little gotcha here: if you use this line in the previous sequence, the --show-layout option turns off actual configuration. You don't notice because the output is going to the file, and when you do make and make install, you are using whichever previous ./configure actually rewrote the Makefile (or, if you haven't already done a ./configure, you are building the default, old Apache-style configuration). This can be a bit puzzling. So, be sure to run this command only after completing the installation, as it will reset the configuration file. If everything has gone well, you should look in /usr/local/sbin to find the new executables.
Use the command ls -l to see the timestamps to make sure they came from the build you have just done (it is surprisingly easy to do several different builds in a row and get the files mixed up):
total 1054
-rwxr-xr-x  1 root  wheel   22972 Dec 31 14:04 ab
-rwxr-xr-x  1 root  wheel    7061 Dec 31 14:04 apachectl
-rwxr-xr-x  1 root  wheel   20422 Dec 31 14:04 apxs
-rwxr-xr-x  1 root  wheel  409371 Dec 31 14:04 httpd
-rwxr-xr-x  1 root  wheel    7000 Dec 31 14:04 logresolve
-rw-r--r--  1 root  wheel       0 Dec 31 14:17 peter
-rwxr-xr-x  1 root  wheel    4360 Dec 31 14:04 rotatelogs
Here is the file layout (remember that this output means that no configuration was done):
Configuring for Apache, Version 1.3.26
 + using installation path layout: GNU (config.layout)
Installation paths:
           prefix: /usr/local
      exec_prefix: /usr/local
           bindir: /usr/local/bin
          sbindir: /usr/local/sbin
       libexecdir: /usr/local/libexec
           mandir: /usr/local/man
       sysconfdir: /usr/local/etc/httpd
          datadir: /usr/local/share/httpd
         iconsdir: /usr/local/share/httpd/icons
        htdocsdir: /usr/local/share/httpd/htdocs
           cgidir: /usr/local/share/httpd/cgi-bin
       includedir: /usr/local/include/httpd
    localstatedir: /usr/local/var/httpd
       runtimedir: /usr/local/var/httpd/run
       logfiledir: /usr/local/var/httpd/log
    proxycachedir: /usr/local/var/httpd/proxy
Compilation paths:
           HTTPD_ROOT: /usr/local
      SHARED_CORE_DIR: /usr/local/libexec
       DEFAULT_PIDLOG: var/httpd/run/httpd.pid
   DEFAULT_SCOREBOARD: var/httpd/run/httpd.scoreboard
     DEFAULT_LOCKFILE: var/httpd/run/httpd.lock
      DEFAULT_XFERLOG: var/httpd/log/access_log
     DEFAULT_ERRORLOG: var/httpd/log/error_log
    TYPES_CONFIG_FILE: etc/httpd/mime.types
   SERVER_CONFIG_FILE: etc/httpd/httpd.conf
   ACCESS_CONFIG_FILE: etc/httpd/access.conf
 RESOURCE_CONFIG_FILE: etc/httpd/srm.conf
Since httpd should now be on your path, you can use it to find out what happened by running it, followed by one of a number of flags. Enter httpd -h. You see the following:
httpd: illegal option -- ?
Usage: httpd [-D name] [-d directory] [-f file] [-C "directive"] [-c "directive"] [-v] [-V] [-h] [-l] [-L] [-S] [-t] [-T]
Options:
  -D name          : define a name for use in directives
  -d directory     : specify an alternate initial ServerRoot
  -f file          : specify an alternate ServerConfigFile
  -C "directive"   : process directive before reading config files
  -c "directive"   : process directive after reading config files
  -v               : show version number
  -V               : show compile settings
  -h               : list available command line options (this page)
  -l               : list compiled-in modules
  -L               : list available configuration directives
  -S               : show parsed settings (currently only vhost settings)
  -t               : run syntax check for config files (with docroot check)
  -T               : run syntax check for config files (without docroot check)
A useful flag is httpd -l, which gives a list of compiled-in modules:
Compiled-in modules:
  http_core.c
  mod_env.c
  mod_log_config.c
  mod_mime.c
  mod_negotiation.c
  mod_status.c
  mod_include.c
  mod_autoindex.c
  mod_dir.c
  mod_cgi.c
  mod_asis.c
  mod_imap.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_access.c
  mod_auth.c
  mod_so.c
  mod_setenvif.c
This list is the result of a build with only one DSO: mod_alias. All the other modules are compiled in, among which we find mod_so to handle the shared object. The compiled shared objects appear in /usr/local/libexec as .so files. You will notice that the file /usr/local/etc/httpd/httpd.conf.default has an amazing amount of information in it: an attempt, in fact, to explain the whole of Apache. Since the rest of this book is also an attempt to present the same information in an expanded and digestible form, we do not suggest that you try to read the file with any great attention. However, it has in it a useful list of the directives you will later need to invoke DSOs, if you want to use them. In the /usr/src/apache/apache_XX directory you ought to read INSTALL and README.configure for background.
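The custom build described in this section can be kept in a small script so that it is repeatable. What follows is a minimal sketch and not part of the Apache distribution: the variable names DRY_RUN and SHARED_MODULES and the helper functions are our own assumptions, and only the configure options already shown in this section are used. With DRY_RUN left at 1, it merely prints the commands it would run.

```shell
#!/bin/sh
# Sketch of a repeatable "Out of the Box" build using the GNU layout.
# DRY_RUN and SHARED_MODULES are hypothetical names chosen for this example.
DRY_RUN=${DRY_RUN:-1}                    # 1 = only print the commands
SHARED_MODULES=${SHARED_MODULES:-proxy}  # modules to build as DSOs

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the configure argument list, one option pair per shared module.
configure_args() {
    args="--with-layout=GNU"
    for m in $SHARED_MODULES; do
        args="$args --enable-module=$m --enable-shared=$m"
    done
    echo "$args"
}

build_apache() {
    run make clean    # may fail harmlessly on a freshly unpacked tree
    run ./configure $(configure_args) &&
    run make &&
    run make install
}

build_apache
```

To perform a real build, run it from the top of the unpacked source tree with DRY_RUN=0. Adding a module name to SHARED_MODULES repeats the --enable-module/--enable-shared pair for it.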
1.10.2 Semimanual Build Method
Go to the top directory of the unpacked download (we used /usr/src/apache/apache_1.3.26). Start off by reading README. This tells you how to compile Apache. The first thing it wants you to do is to go to the src subdirectory and read INSTALL. To go further, you must have an ANSI C-compliant compiler. Most Unices come with a suitable compiler; if not, GNU gcc works fine. If you have downloaded a beta test version, you first have to copy .../src/Configuration.tmpl to Configuration. We then have to edit Configuration to set things up properly. The whole file is in Appendix A of the installation kit. A script called Configure then uses Configuration and Makefile.tmpl to create your operational Makefile. (Don't attack Makefile directly; any editing you do will be lost as soon as you run Configure again.) It is usually only necessary to edit the Configuration file to select the permanent modules required (see the next section). Alternatively, you can specify them on the command line. The script will then automatically identify the version of Unix, the compiler to be used, the compiler flags, and so forth. It certainly all worked for us under FreeBSD without any trouble at all.
Configuration has five kinds of things in it:
• Comment lines starting with #
• Rules starting with the word Rule
• Commands to be inserted into Makefile, starting with nothing
• Module selection lines beginning with AddModule, which specify the modules you want compiled and enabled
• Optional module selection lines beginning with %Module, which specify modules that you want compiled but not enabled until you issue the appropriate directive
For the moment, we will only be reading the comments and occasionally turning a comment into a command by removing the leading #, or vice versa. Most comments are in front of optional module-inclusion lines to disable them.
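Before editing Configuration, it can help to see at a glance which modules are switched on and which are commented out. The following sketch assumes the line formats just described. The sample file is a made-up excerpt (the real file is far longer), and the function names are ours.

```shell
#!/bin/sh
# Helpers that pick out three of the line types found in Configuration.
# Function names are hypothetical; they are not part of Apache.
enabled_modules()  { sed -n 's/^AddModule[[:space:]]*//p' "$1"; }
disabled_modules() { sed -n 's/^#[[:space:]]*AddModule[[:space:]]*//p' "$1"; }
rules()            { sed -n 's/^Rule[[:space:]]*//p' "$1"; }

# Made-up excerpt in the style of Configuration, for demonstration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# A comment line
EXTRA_CFLAGS=
Rule STATUS=yes
AddModule modules/standard/mod_env.o
AddModule modules/standard/mod_so.o
# AddModule modules/example/mod_example.o
EOF

echo "Enabled:";  enabled_modules "$sample"
echo "Disabled:"; disabled_modules "$sample"
echo "Rules:";    rules "$sample"
rm -f "$sample"
```

Pointing the same functions at your own src/Configuration gives a quick record of what a given build included.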
1.10.3 Choosing Modules Inclusion of modules is done by uncommenting (removing the leading #) lines in Configuration. The only drawback to including more modules is an increase in the size of your binary and an imperceptible degradation in performance.[12] The default Configuration file includes the modules listed here, together with a lot of chat and comment that we have removed for clarity. Modules that are compiled into the Win32 core are marked with "W"; those that are supplied as a standard Win32 DLL are marked "WD." Our final list is as follows: AddModule modules/standard/mod_env.o Sets up environment variables to be passed to CGI scripts. AddModule modules/standard/mod_log_config.o Determines logging configuration. AddModule modules/standard/mod_mime_magic.o Determines the type of a file. AddModule modules/standard/mod_mime.o Maps file extensions to content types. AddModule modules/standard/mod_negotiation.o Allows content selection based on Accept headers. AddModule modules/standard/mod_status.o (WD) Gives access to server status information. AddModule modules/standard/mod_info.o Gives access to configuration information. AddModule modules/standard/mod_include.o Translates server-side include statements in CGI texts. AddModule modules/standard/mod_autoindex.o Indexes directories without an index file. AddModule modules/standard/mod_dir.o Handles requests on directories and directory index files. AddModule modules/standard/mod_cgi.o Executes CGI scripts. AddModule modules/standard/mod_asis.o Implements .asis file types. AddModule modules/standard/mod_imap.o Executes imagemaps. AddModule modules/standard/mod_actions.o Specifies CGI scripts to act as handlers for particular file types. AddModule modules/standard/mod_speling.o Corrects common spelling mistakes in requests. AddModule modules/standard/mod_userdir.o Selects resource directories by username and a common prefix. 
AddModule modules/proxy/libproxy.o Allows Apache to run as a proxy server; should be commented out if not needed. AddModule modules/standard/mod_alias.o Provides simple URL translation and redirection. AddModule modules/standard/mod_rewrite.o (WD) Rewrites requested URIs using specified rules. AddModule modules/standard/mod_access.o Provides access control. AddModule modules/standard/mod_auth.o Provides authorization control. AddModule modules/standard/mod_auth_anon.o (WD) Provides FTP-style anonymous username/password authentication. AddModule modules/standard/mod_auth_db.o Manages a database of passwords; alternative to mod_auth_dbm.o. AddModule modules/standard/mod_cern_meta.o (WD) Implements metainformation files compatible with the CERN web server. AddModule modules/standard/mod_digest.o (WD) Implements HTTP digest authentication; more secure than the others. AddModule modules/standard/mod_expires.o (WD) Applies Expires headers to resources. AddModule modules/standard/mod_headers.o (WD) Sets arbitrary HTTP response headers. AddModule modules/standard/mod_usertrack.o (WD) Tracks users by means of cookies. It is not necessary to use cookies. AddModule modules/standard/mod_unique_id.o Generates an ID for each hit. May not work on all systems. AddModule modules/standard/mod_so.o Loads modules at runtime. Experimental. AddModule modules/standard/mod_setenvif.o Sets environment variables based on header fields in the request. Here are the modules we commented out, and why: # AddModule modules/standard/mod_log_agent.o Not relevant here — CERN holdover. # AddModule modules/standard/mod_log_referer.o Not relevant here — CERN holdover. # AddModule modules/standard/mod_auth_dbm.o Can't have both this and mod_auth_db.o. Doesn't work with Win32. # AddModule modules/example/mod_example.o Only for testing APIs (see Chapter 20). These are the "standard" Apache modules, approved and supported by the Apache Group as a whole. 
There are a number of other modules available (see http://modules.apache.org). Although we mentioned mod_auth_db.o and mod_auth_dbm.o earlier, they provide equivalent functionality and shouldn't be compiled together. We have left out any modules described as experimental. Any disparity between the directives listed in this book and the list obtained by starting Apache with the -h flag is probably caused by the errant directive having moved out of experimental status since we went to press. Later on, when we are writing Apache configuration scripts, we can make them adapt to the modules we include or exclude with the IfModule directive. This allows you to give out predefined Config files that always work (in the sense of Apache loading), regardless of what mix of modules is actually compiled. Thus, for instance, we can adapt to the absence of configurable logging with the following:
...
<IfModule mod_log_config.c>
LogFormat "customers: host %h, logname %l, user %u, time %t, request %r, status %s, bytes %b"
</IfModule>
...
1.10.4 Shared Objects
If you want to enable shared objects in this method, see the notes in the Configuration file. Essentially, you do the following:
1. Enable mod_so by uncommenting its line.
2. Change an existing AddModule line so that its module path ends in .so rather than .o, making sure, of course, that the path is correct.
1.10.5 Configuration Settings and Rules
Most Apache users won't have to bother with this section at all. However, you can specify extra compiler flags (for instance, optimization commands), libraries, or includes by giving values to the following:
EXTRA_CFLAGS=
EXTRA_LDFLAGS=
EXTRA_LIBS=
EXTRA_INCLUDES=
Configure will try to guess your operating system and compiler; therefore, unless things go wrong, you won't need to uncomment and give values to these:
#CC=
#OPTIM=-O2
#RANLIB=
The rules in the Configuration file allow you to adapt for a few exotic configuration problems.
The syntax of a rule in Configuration is as follows:
Rule RULE=value
The possible values are as follows:
yes
Configure does what is required.
default
Configure makes a best guess.
Any other value is ignored. The Rules are as follows:
STATUS
If yes, and Configure decides that you are using the status module, then full status information is enabled. If the status module is not included, yes has no effect. This is set to yes by default.
SOCKS4
SOCKS is a firewall traversal protocol that requires client-end processing. See http://ftp.nec.com/pub/security/socks.cstc. If set to yes, be sure to add the SOCKS library location to EXTRA_LIBS; otherwise, Configure assumes -L/usr/local/lib -lsocks. This allows Apache to make outgoing SOCKS connections, which is not something it normally needs to do, unless it is configured as a proxy. Although the very latest version of SOCKS is SOCKS5, SOCKS4 clients work fine with it. This is set to no by default.
SOCKS5
If you want to use a SOCKS5 client library, you must use this rule rather than SOCKS4. This is set to no by default.
IRIXNIS
If Configure decides that you are running SGI IRIX, and you are using NIS, set this to yes. This is set to no by default.
IRIXN32
Make IRIX use the n32 libraries rather than the o32 ones. This is set to yes by default.
PARANOID
During Configure, modules can run shell commands. If PARANOID is set to yes, it will print out the code that the modules use. This is set to no by default.
There is a group of rules that Configure will try to set correctly, but that can be overridden. If you have to do this, please advise the Apache Group by filling out a problem report form at http://apache.org/bugdb.cgi or by sending an email to apache-bugs@apache.org. Currently, there is only one rule in this group:
WANTHSREGEX
Apache needs to interpret regular expressions using POSIX methods. A good regex package is included with Apache, but you can use your OS version by setting WANTHSREGEX=no or commenting out the rule.
The default action depends on your OS:
Rule WANTHSREGEX=default
1.10.6 Making Apache
The INSTALL file in the src subdirectory says that all we have to do now is run the configuration script. Change yourself to root before you run ./configure; otherwise the server will be configured on port 8080 and will, confusingly, refuse requests to the default port, 80. Then type:
% ./Configure
You should see something like this, bearing in mind that we're using FreeBSD and you may not be:
Using config file: Configuration
Creating Makefile
 + configured for FreeBSD platform
 + setting C compiler to gcc
 + Adding selected modules
    o status_module uses ConfigStart/End:
    o dbm_auth_module uses ConfigStart/End:
    o db_auth_module uses ConfigStart/End:
    o so_module uses ConfigStart/End:
 + doing sanity check on compiler and options
Creating Makefile in support
Creating Makefile in main
Creating Makefile in ap
Creating Makefile in regex
Creating Makefile in os/unix
Creating Makefile in modules/standard
Creating Makefile in modules/proxy
Then type:
% make
When you run make, the compiler is set in motion using the makefile built by Configure, and streams of reassuring messages appear on the screen. However, things may go wrong that you have to fix, although this situation can appear more alarming than it really is. For instance, in an earlier attempt to install Apache on an SCO machine, we received the following compile error:
Cannot open include file 'sys/socket.h'
Clearly (since sockets are very TCP/IP-intensive), this had to do with TCP/IP, which we had not installed: we did so. Not that this is a big deal, but it illustrates the sort of minor problem that arises. Not everything turns up where it ought to. If you find something that really is not working properly, it is sensible to make a bug report via the Bug Report link in the Apache Server Project main menu. But do read the notes there.
Make sure that it is a real bug, not a configuration problem, and look through the known bug list first so as not to waste everyone's time. The result of make was the executable httpd. If you run it with:
% ./httpd
it complains that it:
could not open document config file /usr/local/etc/httpd/conf/httpd.conf
This is not surprising because, at the moment, httpd.conf, which we call the Config file, doesn't exist. Before we are finished, we will become very familiar with this file. It is perhaps unfortunate that it has a name so similar to the Configuration file we have been dealing with here, because it is quite different. We hope that the difference will become apparent later on. The last step is to copy httpd to a suitable storage directory that is on your path. We use /usr/local/bin or /usr/local/sbin.
1.11 New Features in Apache v2
The procedure for configuring and compiling Apache has changed, as we will see later. High-level decisions about the way Apache works internally can now be made at compile time by including one of a series of Multi Processing Modules (MPMs). This is done by attaching a flag to configure, followed by the name of the chosen MPM:
./configure --with-mpm=
Although MPMs are rather like ordinary modules, only one can be used at a time. Some of them are designed to adapt Apache to different operating systems; others offer a range of different optimizations for Unix. The chosen MPM will be shown, along with the other compiled-in modules, by executing httpd -l. When we went to press, these were the possible MPMs under Unix:
prefork
Default. Most closely imitates behavior of v1.3. Currently the default for Unix and sites that require stability, though we hope that threading will become the default later on.
threaded
Suitable for sites that require the benefits brought by threading, particularly reduced memory footprint and improved interthread communications. But see "prefork" earlier in this list.
perchild
Allows different hosts to have different user IDs.
mpmt_pthread
Similar to prefork, but each child process has a specified number of threads. It is possible to specify a minimum and maximum number of idle threads.
Dexter
Multiprocess, multithreaded MPM that allows you to specify a static number of processes.
Perchild
Similar to Dexter, but you can define a separate user and group for each child process to increase server security.
Other operating systems have their own MPMs:
spmt_os2
For OS/2.
beos
For BeOS.
WinNT
Win32-specific version, taking advantage of completion ports and native function calls to give better network performance.
To begin with, accept the default MPM. More advanced users should refer to http://httpd.apache.org/docs-2.0/mpm.html and http://httpd.apache.org/docs-2.0/misc/perf-tuning.html. See the entry for the AcceptMutex directive in Chapter 3.
1.11.1 Config File Changes in v2
Version 2.0 makes the following changes to the Config file:
• CacheNegotiatedDocs now takes the argument on/off. Existing instances of CacheNegotiatedDocs should be given the argument on.
• ErrorDocument now needs quotes around the whole of a text message, not just at the start.
• The AccessConfig and ResourceConfig directives have been abolished. If you want to use these files, replace them by Include conf/srm.conf and Include conf/access.conf, in that order, and at the end of the Config file.
• The BindAddress directive has been abolished. Use Listen.
• The ExtendedStatus directive has been abolished.
• The ServerType directive has been abolished.
• The AgentLog, RefererLog, and RefererIgnore directives have been removed along with the mod_log_agent and mod_log_referer modules. Agent and referer logs are still available using the CustomLog directive.
• The AddModule and ClearModuleList directives have been abolished. A very useful point is that Apache v2 does not care about the order in which DSOs are loaded.
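When carrying a v1.3 Config file over to v2, a quick scan for abolished directives can save a few failed startups. This sketch checks only a subset of the directives listed above. The function name, the REMOVED variable, and the sample file (including its address) are hypothetical, and the scan does not follow files pulled in by Include.

```shell
#!/bin/sh
# Subset of the directives abolished in v2, taken from the list above.
REMOVED="AccessConfig ResourceConfig BindAddress ServerType AddModule"

# Print the name of each abolished directive that is in use in file $1.
# Commented-out lines are ignored; directive names are case-insensitive.
find_removed_directives() {
    for d in $REMOVED; do
        if grep -qi "^[[:space:]]*$d[[:space:]]" "$1"; then
            echo "$d"
        fi
    done
}

# Made-up v1.3-style fragment for demonstration only.
conf=$(mktemp)
cat > "$conf" <<'EOF'
ServerType standalone
BindAddress 192.168.123.2
Listen 80
ResourceConfig conf/srm.conf
AccessConfig conf/access.conf
EOF

find_removed_directives "$conf"
rm -f "$conf"
```

Each name printed marks a line to remove or replace as described in the list, for example BindAddress by Listen.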
1.11.2 httpd Command-Line Changes
Running the v2 httpd with the flag -h to show the possible command-line flags produces this:
Usage: ./httpd [-D name] [-d directory] [-f file] [-C "directive"] [-c "directive"] [-v] [-V] [-h] [-l] [-L] [-t] [-T]
Options:
  -D name            : define a name for use in directives
  -d directory       : specify an alternate initial ServerRoot
  -f file            : specify an alternate ServerConfigFile
  -C "directive"     : process directive before reading config files
  -c "directive"     : process directive after reading config files
  -v                 : show version number
  -V                 : show compile settings
  -h                 : list available command line options (this page)
  -l                 : list compiled in modules
  -L                 : list available configuration directives
  -t -D DUMP_VHOSTS  : show parsed settings (currently only vhost settings)
  -t                 : run syntax check for config files (with docroot check)
  -T                 : run syntax check for config files (without docroot check)
In particular, the -X flag has been removed. You can get the same effect (running a single copy of Apache without any children being generated) with this:
httpd -D ONE_PROCESS
or:
httpd -D NO_DETACH
depending on the MPM used. The available flags for each MPM will be visible on running httpd with -?.
1.11.3 Module Changes in v2
Version 2.0 makes the following changes to module handling:
• mod_auth_digest is now a standard module in v2.
• mod_mmap_static, which was experimental in v1.3, has been replaced by mod_file_cache.
• Third-party modules written for Apache v1.3 will not work with v2 since the API has been completely rewritten. See Chapter 20 and Chapter 21.
1.12 Making and Installing Apache v2 Under Unix
Disregard all the previous instructions for Apache compilation. There is no longer a .../src directory. Even the name of the Unix source file has changed. We downloaded httpd-2_0_40.tar.gz and unpacked it in /usr/src/apache as usual. You should read the file INSTALL.
The scheme for building Apache v2 is now much more in line with that for most other downloaded packages and utilities. Set up the configuration file with this:
./configure --prefix=/usr/local
or wherever it is you want to keep the Apache bits, which will appear in various subdirectories. The executable, for instance, will be in .../sbin. If you are compiling under FreeBSD, as we were, --with-mpm=prefork is automatically used internally, since threads do not currently work well under this operating system. To see all the configuration possibilities:
./configure --help | more
If you want to preserve your Apache 1.3.X executable, you might rename it to httpd.13, wherever it is, and then:
make
which takes a surprising amount of time to run. Then:
make install
The result is a nice new httpd in /usr/local/sbin.
1.13 Apache Under Windows
Apache 1.3 will work under Windows NT 4.0 and 2000. Its performance under Windows 95 and 98 is not guaranteed. If running on Windows 95, the "Winsock2" upgrade must be installed before Apache will run. "Winsock2" for Windows 95 is available at http://www.microsoft.com/windows95/downloads/contents/WUAdminTools/S_WUNetworkingTools/W95Sockets2. Be warned that the Dialup Networking 1.2 (MS DUN) updates include a Winsock2 that is entirely insufficient, and the Winsock2 update must be reinstalled after installing Windows 95 dialup networking. Windows 98, NT (Service Pack 3 or later), and 2000 users need to take no special action; those versions provide Winsock2 as distributed. Apache v2 will run under Windows 2000 and NT, but, when we went to press, it did not work under Win 95, 98, or Me. These different versions are the same as far as Apache is concerned, except that under NT, Apache can also be run as a service. From Apache v1.3.14, emulators are available to provide NT services under the other Windows platforms. Performance under Win32 may not be as good as under Unix, but this will probably improve over coming months.
Since Win32 is considerably more consistent than the sprawling family of Unices, and since it loads extra modules as DLLs at runtime rather than compiling them at make time, it is practical for the Apache Group to offer a precompiled binary executable as the standard distribution. Go to http://www.apache.org/dist, and click on the version you want, which will be in the form of a self-installing .exe file (the .exe extension is how you tell which one is the Win32 Apache). Download it into, say, c:\temp, and then run it from the Win32 Start menu's Run option. The executable will create an Apache directory, C:\Program Files\Apache, by default. Everything to do with Win32 Apache happens in an MS-DOS window, so get into a window and type:
> cd c:\
> dir
and you should see something like this:
Volume in drive C has no label
Volume Serial Number is 294C-14EE
Directory of C:\apache
.                          21/05/98   7:27 .
..                         21/05/98   7:27 ..
DEISL1   ISU       12,818  29/07/98  15:12 DeIsL1.isu
HTDOCS                     29/07/98  15:12 htdocs
MODULES                    29/07/98  15:12 modules
ICONS                      29/07/98  15:12 icons
LOGS                       29/07/98  15:12 logs
CONF                       29/07/98  15:12 conf
CGI-BIN                    29/07/98  15:12 cgi-bin
ABOUT_~1           12,921  15/07/98  13:31 ABOUT_APACHE
ANNOUN~1            3,090  18/07/98  23:50 Announcement
KEYS               22,763  15/07/98  13:31 KEYS
LICENSE             2,907  31/03/98  13:52 LICENSE
APACHE   EXE        3,072  19/07/98  11:47 Apache.exe
APACHE~1 DLL      247,808  19/07/98  12:11 ApacheCore.dll
MAKEFI~1 TMP       21,025  15/07/98  18:03 Makefile.tmpl
README              2,109  01/04/98  13:59 README
README~1 TXT        2,985  30/05/98  13:57 README-NT.TXT
INSTALL  DLL       54,784  19/07/98  11:44 install.dll
_DEISREG ISR          147  29/07/98  15:12 _DEISREG.ISR
_ISREG32 DLL       40,960  23/04/97   1:16 _ISREG32.DLL
        13 file(s)        427,389 bytes
         8 dir(s)     520,835,072 bytes free
Apache.exe is the executable, and ApacheCore.dll is the meat of the thing. The important subdirectories are as follows:
conf
Where the Config file lives.
logs
Where the logs are kept.
htdocs
Where you put the material your server is to give clients.
The Apache manual will be found in a subdirectory.
modules
Where the runtime loadable DLLs live.
After 1.3b6, the installation leaves alone your original versions of files in these subdirectories, while creating new ones with the added extension .default, which you should look at. We will see what to do with all of this in the next chapter. See the file README-NT.TXT for current problems.
1.13.1 Modules Under Windows
Under Windows, Apache is normally downloaded as a precompiled executable. The core modules are compiled in, and others are loaded as .so files at runtime (if needed), so control of the executable's size is less urgent. The DLLs supplied (they really are called .so and not .dll) in the .../apache/modules subdirectory are as follows:
mod_auth_anon.so mod_auth_dbm.so mod_auth_digest.so mod_cern_meta.so mod_dav.so
mod_dav_fs.so mod_expires.so mod_file_cache.so mod_headers.so mod_info.so
mod_mime_magic.so mod_proxy.so mod_rewrite.so mod_speling.so mod_status.so
mod_unique_id.so mod_usertrack.so mod_vhost_alias.so mod_proxy_connect.so mod_proxy_ftp.so
mod_proxy_http.so mod_access.so mod_actions.so mod_alias.so mod_asis.so
mod_auth.so mod_autoindex.so mod_cgi.so mod_dir.so mod_env.so
mod_imap.so mod_include.so mod_isapi.so mod_log_config.so mod_mime.so
mod_negotiation.so mod_setenvif.so mod_userdir.so
What these are and what they do will become more apparent as we proceed.
1.13.2 Compiling Apache Under Win32
The advanced user who wants to write her own modules (see Chapter 21) will need the source code. This can be installed with the Win32 version by choosing Custom installation. It can also be downloaded from the nearest mirror Apache site (start at http://apache.org/) as a .tar.gz file containing the normal Unix distribution. In addition, it can be unpacked into an appropriate source directory using, for instance, 32-bit WinZip, which deals with .tar and .gz format files, as well as .zip. You will also need Microsoft's Visual C++ Version 6.
Scripts are available for users of MSVC v5, since the changes are not backwards compatible. Once the sources and compiler are in place, open an MS-DOS window, and go to the Apache src directory. Build a debug version, and install it into \Apache by typing:
> nmake /f Makefile.nt _apached
> nmake /f Makefile.nt installd
or build a release version by typing:
> nmake /f Makefile.nt _apacher
> nmake /f Makefile.nt installr
This will build and install the following files in and below \Apache\:
Apache.exe                    The executable
ApacheCore.dll                The main shared library
Modules\ApacheModule*.dll     Seven optional modules
\conf                         Empty config directory
\logs                         Empty log directory
The directives described in the rest of the book are the same for both Unix and Win32, except that Win32 Apache can load module DLLs. They need to be activated in the Config file by the LoadModule directive. For example, if you want status information, you need the line:
LoadModule status_module modules/ApacheModuleStatus.dll
Apache for Win32 can also load Internet Server Applications (ISAPI extensions). Notice that wherever filenames are relevant in the Config file, the Win32 version uses forward slashes (/) as in Unix, rather than backslashes (\) as in MS-DOS or Windows. Since almost all the rest of the book applies to both Win32 and Unix without distinction between them, we will use forward slashes (/) in filenames wherever they occur.
[1] Note that since a URL has no predefined meaning, this really is just a tradition, though a pretty well entrenched one in this case.
[2] We generally follow the convention of calling these people the Bad Guys. This avoids debate about "hackers," which to many people simply refers to good programmers, but to some means Bad Guys. We discover from the French edition of this book that in France they are Sales Types -- dirty fellows.
[3] For more on the open source movement, see Open Sources: Voices from the Open Source Revolution (O'Reilly & Associates, 1999).
[4] Netcraft also surveys the uptime of various sites. At the time of writing, the longest running site was http://wwwprod1.telia.com, which had been up for 1,386 days.
[5] This double name is rather annoying, but it seems that life has progressed too far for anything to be done about it. We will, rather clumsily, refer to httpd/apache and hope that the reader can pick the right one.
[6] Windows NT TCP/IP Network Administration, by Craig Hunt and Robert Bruce Thompson (O'Reilly & Associates, 1998), and TCP/IP Network Administration, Third Edition, by Craig Hunt (O'Reilly & Associates, 2002).
[7] In the minimal case we could have two programs running on the same computer talking to each other via TCP/IP; the network is "virtual".
[8] The operating-system prompt is likely to be ">" (Win95) or "%" (Unix). When we say, for instance, "Type % ping," we mean, "When you see '%', type 'ping'."
[9] Usually. We'll see later that some URLs may refer to information generated completely within Apache.
[10] It is best to download it, so you get the latest version with all its bug fixes and security patches.
[11] New is a dirty four letter word in computing.
[12] Assuming the module has been carefully written, it does very little unless enabled in the httpd.conf files.

Chapter 2. Configuring Apache: The First Steps

• 2.1 What's Behind an Apache Web Site?
• 2.2 site.toddle
• 2.3 Setting Up a Unix Server
• 2.4 Setting Up a Win32 Server
• 2.5 Directives
• 2.6 Shared Objects

After the installation described in Chapter 1, you now have a shiny bright apache/httpd, and you're ready for anything. For our next step, we will be creating a number of demonstration web sites.

2.1 What's Behind an Apache Web Site?

It might be a good idea to get a firm idea of what, in the Apache business, a web site is: it is a directory somewhere on the server, say, /usr/www/APACHE3/site.for_instance. It usually contains at least four subdirectories.
The first three are essential:

conf
    Contains the Config file, usually httpd.conf, which tells Apache how to respond to different kinds of requests.
htdocs
    Contains the documents, images, data, and so forth that you want to serve up to your clients.
logs
    Contains the log files that record what happened. You should consult .../logs/error_log whenever anything fails to work as expected.
cgi-bin
    Contains any CGI scripts that are needed. If you don't use scripts, you don't need the directory.

In our standard installation, there will also be a file go in the site directory, which contains a script for starting Apache.

Nothing happens until you start Apache. In this example, you do it from the command line. If your computer experience so far has been entirely with Windows or other Graphical User Interfaces (GUIs), you may find the command line rather stark and intimidating to begin with. However, it offers a great deal of flexibility and something which is often impossible through a GUI: the ability to write scripts (Unix) or batch files (Win32) to automate the executables you want to run and the inputs they need, as we shall see later.

2.1.1 Running Apache from the Command Line

If the conf subdirectory is not in the default location (and it usually isn't), you need a flag that tells Apache where it is.

    httpd -d /usr/www/APACHE3/site.for_instance -f...
    apache -d c:/usr/www/APACHE3/site.for_instance

Notice that the executable names are different under Win32 and Unix. The Apache Group decided to make this change, despite the difficulties it causes for documentation, because "httpd" is not a particularly sensible name for a specific web server and, indeed, is used by other web servers. However, it was felt that the name change would cause too many backward-compatibility issues on Unix, and so the new name is implemented only on Win32. Also note that the Win32 version still uses forward slashes rather than backslashes.
This is because Apache internally uses forward slashes on all platforms; therefore, you should never use a backslash in an Apache Config file, regardless of the operating system.

Once you start the executable, Apache runs silently in the background, waiting for a client's request to arrive on a port to which it is listening. When a request arrives, Apache either does its thing or fouls up and makes a note in the log file. What we call "a site" here may appear to the outside world as hundreds of sites, because the Config file can invoke many virtual hosts. When you are tired of the whole Web business, you kill Apache (see Section 2.3, later in this chapter), and the computer reverts to being a doorstop.

Various issues arise in the course of implementing this simple scheme, and the rest of this book is an attempt to deal with some of them. As we pointed out in the preface, running a web site can involve many questions far outside the scope of this book. All we deal with here is how to make Apache do what you want. We often have to leave the questions of what you want to do and why you might want to do it to a higher tribunal.

httpd (or apache) takes the following flags.
(This is information you can evoke by running httpd -h):

    Usage: httpd.20 [-D name] [-d directory] [-f file]
                    [-C "directive"] [-c "directive"]
                    [-v] [-V] [-h] [-l] [-L] [-t] [-T]
    Options:
      -D name           : define a name for use in directives
      -d directory      : specify an alternate initial ServerRoot
      -f file           : specify an alternate ServerConfigFile
      -C "directive"    : process directive before reading config files
      -c "directive"    : process directive after reading config files
      -v                : show version number
      -V                : show compile settings
      -h                : list available command line options (this page)
      -l                : list compiled in modules
      -L                : list available configuration directives
      -t -D DUMP_VHOSTS : show parsed settings (currently only vhost settings)
      -t                : run syntax check for config files (with docroot check)
      -T                : run syntax check for config files (without docroot check)

-i
    Installs Apache as an NT service.
-u
    Uninstalls Apache as an NT service.
-s
    Under NT, prevents Apache registering itself as an NT service. If you are running under Win95 this flag does not seem essential, but it would be advisable to include it anyway. This flag should be used when starting Apache from the command line, but it is easy to forget because nothing goes wrong if you leave it out. The main advantage is a faster startup (omitting it causes a 30-second delay).
-k shutdown|restart
    Run in another console window, apache -k shutdown stops Apache gracefully, and apache -k restart stops it and restarts it gracefully.

The Apache Group seems to put in extra flags quite often, so it is worth experimenting with apache -? (or httpd -?) to see what you get.

2.2 site.toddle

You can't do much with Apache without a web site to play with. To embody our first shaky steps, we created site.toddle as a subdirectory, /usr/www/APACHE3/site.toddle, which you will find on the code download. Since you may want to keep your demonstration sites somewhere else, we normally refer to this path as ... /. So we will talk about ... /site.toddle.
(Windows users, please read this as ...\site.toddle). In ... /site.toddle, we created the three subdirectories that Apache expects: conf, logs, and htdocs. The README file in Apache's root directory states: The next step is to edit the configuration files for the server. In the subdirectory called conf you should find distribution versions of the three configuration files: srm.conf-dist, access.conf-dist, and httpd.conf-dist. As a legacy from the NCSA server, Apache will accept these three Config files. But we strongly advise you to put everything you need in httpd.conf and to delete the other two. It is much easier to manage the Config file if there is only one of them. From Apache v1.3.4-dev on, this has become Group doctrine. In earlier versions of Apache, it was necessary to disable these files explicitly once they were deleted, but in v1.3 it is enough that they do not exist. The README file continues with advice about editing these files, which we will disregard. In fact, we don't have to set about this job yet; we will learn more later. A simple expedient for now is to run Apache with no configuration and to let it prompt us for what it needs. The Configuration File Before we start running Apache with no configuration, we would like to say a few words about the philosophy of the Configuration File. Apache comes with a huge file that, as we observe elsewhere, tries to tell you every possible thing the user might need to know about Apache. If you are new to the software, a vast amount of this will be gibberish to you. However, many Apache users modify this file to adapt it to their needs. We feel that this is a VERY BAD IDEA INDEED. The file is so complicated to start with that it is very hard to see what to do. It is all too easy to make amendments and then to forget what you have done. The resulting mess then stays around, perhaps for years, being teamed with possibly incompatible Apache updates, until it finally stops working altogether. 
It is then very difficult to disentangle your input from the absolute original (which you probably have not kept and is now unobtainable). It is much better to start with a completely minimal file and add to it only what is absolutely necessary.

The set-up process for Unix and Windows systems is quite different, so they are described in two separate sections as follows. If you're using Unix, read on; if not, skip to Section 2.4 later in this chapter.

2.3 Setting Up a Unix Server

We can point httpd at our site with the -d flag (notice the full pathname to the site.toddle directory, which will probably be different on your machine):

    % httpd -d /usr/www/APACHE3/site.toddle

Since you will be typing this a lot, it's sensible to copy it into a script called go. This can go in /usr/local/bin or in each local site. We have done the latter since it is convenient to change it slightly from time to time. Create it by typing:

    % cat > /usr/local/bin/go
    test -d logs || mkdir logs
    httpd -f `pwd`/conf/httpd$1.conf -d `pwd`
    ^d

^d is shorthand for Ctrl-D, which ends the input and gets your prompt back. This go will work on every site. It creates a logs directory if one does not exist, and it explicitly specifies paths for the ServerRoot directory (-d) and the Config file (-f). The command `pwd` finds the current directory with the Unix command pwd. The back-ticks are essential: they substitute pwd's value into the script. In other words, we will run Apache with whatever configuration is in our current directory.

To accommodate sites where we have more than one Config file, we have used ...httpd$1... where you might expect to see ...httpd... The symbol $1 copies the first argument (if any) given to the command go. Thus ./go 2 will run the Config file called httpd2.conf, and ./go by itself will run httpd.conf. Remember that you have to be in the site directory.
If you try to run this script from somewhere else, pwd's return will be nonsense, and Apache will complain that it 'could not open document config file ...'.

Make go runnable, and run it by typing the following (note that you have to be in the directory .../site.toddle when you run go):

    % chmod +x go
    % go

If you get the error message:

    go: command not found

you need to type:

    % ./go

This launches Apache in the background. Check that it's running by typing something like this (arguments to ps vary from Unix to Unix):

    % ps -aux

This Unix utility lists all the processes running, among which you should find several httpds.[1]

Sooner or later, you have finished testing and want to stop Apache. To do this, you have to get the process identity (PID) of the program httpd using ps -aux:

    USER     PID %CPU %MEM   VSZ  RSS TT   STAT STARTED    TIME COMMAND
    root     701  0.0  0.8   396  240 v0   R+    2:49PM 0:00.00 ps -aux
    root       1  0.0  0.9   420  260 ??   Is    8:13AM 0:00.02 /sbin/init --
    root       2  0.0  0.0     0    0 ??   DL    8:13AM 0:00.04 (pagedaemon)
    root       3  0.0  0.0     0    0 ??   DL    8:13AM 0:00.00 (vmdaemon)
    root       4  0.0  0.0     0    0 ??   DL    8:13AM 0:02.24 (syncer)
    root      35  0.0  0.3   204   84 ??   Is    8:13AM 0:00.00 adjkerntz -i
    root      98  0.0  1.8   820  524 ??   Is    7:13AM 0:00.43 syslogd
    daemon   107  0.0  1.3   820  384 ??   Is    7:13AM 0:00.00 /usr/sbin/portma
    root     139  0.0  2.1   888  604 ??   Is    7:13AM 0:00.07 inetd
    root     142  0.0  2.0   980  592 ??   Ss    7:13AM 0:00.27 cron
    root     146  0.0  3.2  1304  936 ??   Is    7:13AM 0:00.25 sendmail: accept
    root     209  0.0  1.0   500  296 con- I     7:13AM 0:00.02 /bin/sh /usr/loc
    root     238  0.0  5.8 10996 1676 con- I     7:13AM 0:00.09 /usr/local/libex
    root     239  0.0  1.1   460  316 v0   Is    7:13AM 0:00.09 -csh (csh)
    root     240  0.0  1.2   460  336 v1   Is    7:13AM 0:00.07 -csh (csh)
    root     241  0.0  1.2   460  336 v2   Is    7:13AM 0:00.07 -csh (csh)
    root     251  0.0  1.7  1052  484 v0   S     7:14AM 0:00.32 bash
    root     576  0.0  1.8  1048  508 v1   I     2:18PM 0:00.07 bash
    root     618  0.0  1.7  1040  500 v2   I     2:22PM 0:00.04 bash
    root     627  0.0  2.2   992  632 v2   I+    2:22PM 0:00.02 mince demo_test
    root     630  0.0  2.2   992  636 v1   I+    2:23PM 0:00.06 mince home
    root     694  0.0  6.7  2548 1968 ??   Ss    2:47PM 0:00.03 httpd -d /u
    webuser  695  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
    webuser  696  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
    webuser  697  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
    webuser  698  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u
    webuser  699  0.0  7.0  2548 2044 ??   I     2:47PM 0:00.00 httpd -d /u

To kill Apache, you need to find the PID of the main copy of httpd and then do kill; the child processes will die with it. In the previous example the process to kill is 694, the copy of httpd that belongs to root. The command is this:

    % kill 694

If ps -aux produces more printout than will fit on a screen, you can tame it with ps aux | more (hit Return to see another line or Space to see another screen). It is important to make sure that the Apache process is properly killed because you can quite easily kill a child process by mistake and then start a new copy of the server with its children (and a different Config file or Perl scripts) and so get yourself into a royal muddle.

To get just the lines from ps that you want, you can use:

    ps awlx | grep httpd

On Linux:

    killall httpd

Alternatively and better, since it is less prone to finger trouble, Apache writes its PID in the file ... /logs/httpd.pid (by default; see the PidFile directive), and you can write yourself a little script, as follows:

    kill `cat /usr/www/APACHE3/site.toddle/logs/httpd.pid`

You may prefer to put more generalized versions of these scripts somewhere on your path. stop looks like this:

    path=`pwd`
    kill `cat $path/logs/httpd.pid`

Or, if you don't plan to mess with many different configurations, use .../src/support/apachectl to start and stop Apache in the default directory. You might want to copy it into /usr/local/bin to get it onto the path, or add $apacheinstalldir/bin to your path.
It uses the following flags:

    usage: ./apachectl (start|stop|restart|fullstatus|status|graceful|configtest|help)

    start       Start httpd.
    stop        Stop httpd.
    restart     Restart httpd if running by sending a SIGHUP or start if not running.
    fullstatus  Dump a full status screen; requires lynx and mod_status enabled.
    status      Dump a short status screen; requires lynx and mod_status enabled.
    graceful    Do a graceful restart by sending a SIGUSR1 or start if not running.
    configtest  Do a configuration syntax test.
    help        This screen.

When we typed ./go, nothing appeared to happen, but when we looked in the logs subdirectory, we found a file called error_log with the entry:

    []:'mod_unique_id: unable to get hostbyname ("myname.my.domain")

In our case, this problem was due to the odd way we were running Apache, and it will only affect you if you are running on a host with no DNS or on an operating system that has difficulty determining the local hostname. The solution was to edit the file /etc/hosts and add the line:

    10.0.0.2 myname.my.domain myname

where 10.0.0.2 is the IP number we were using for testing.

However, our troubles were not yet over. When we reran httpd, we received the following error message:

    []--couldn't determine user name from uid

This means more than might at first appear. We had logged in as root. Because of the security worries of letting outsiders log in with superuser powers, Apache, having been started with root permissions so that it can bind to port 80, has attempted to change its user ID to -1. On many Unix systems, this ID corresponds to the user nobody: a supposedly harmless user.
However, it seems that FreeBSD does not understand this notion, hence the error message.[2] In any case, it really isn't a great idea to allow Apache to run as nobody (or any other shared user): if you are running several different services (ftp, mail, etc.) on the same machine, you run the risk of an attacker exploiting the fact that those services share the same user.

2.3.1 webuser and webgroup

The remedy is to create a new user, called webuser, belonging to webgroup. The names are unimportant. The main thing is that this user should be in a group of its own and should not actually be used by anyone for anything else. On most Unix systems, create the group first by running adduser -group webgroup then the user by running adduser. You will be asked for passwords for both. If the system insists on a password, use some obscure non-English string like cQuycn75Vg. Ideally, you should make sure that the newly created user cannot actually log in; how this is achieved varies according to operating system: you may have to replace the encrypted password in /etc/passwd, or remove the home directory, or perhaps something else. Having told the operating system about this user, you now have to tell Apache. Edit the file httpd.conf to include the following lines:

    User webuser
    Group webgroup

The following are the interesting directives.

2.3.1.1 User

The User directive sets the user ID under which the server will run when answering requests.

    User unix-userid
    Default: User #-1
    Server config, virtual host

In order to use this directive, the standalone server must be run initially as root. unix-userid is one of the following:

username
    Refers to the given user by name
#usernumber
    Refers to a user by his number

The user should have no privileges that allow access to files not intended to be visible to the outside world; similarly, the user should not be able to execute code that is not meant for httpd requests.
However, the user must have access to certain things: the files it serves, for example, or mod_proxy's cache, when enabled (see the CacheRoot directive in Chapter 9).

If you start the server as a non-root user, it will fail to change to the lesser-privileged user and will instead continue to run as that original user. If you start the server as root, then it is normal for the parent process to remain running as root.

Don't set User (or Group) to root unless you know exactly what you are doing and what the dangers are.

2.3.1.2 Group

The Group directive sets the group under which the server will answer requests.

    Group unix-group
    Default: Group #-1
    Server config, virtual host

To use this directive, the standalone server must be run initially as root. unix-group is one of the following:

groupname
    Refers to the given group by name
#groupnumber
    Refers to a group by its number

It is recommended that you set up a new group specifically for running the server. Some administrators use group nobody, but this is not always possible or desirable, as noted earlier.

If you start the server as a non-root user, it will fail to change to the specified group and will instead continue to run as the group of the original user.

Now, when you run httpd and look for the PID, you will find that one copy belongs to root, and several others belong to webuser. Kill the root copy and the others will vanish.

2.3.2 "Out of the Box" Default Problems

We found that when we built Apache "out of the box" using a GNU layout, some file defaults were not set up properly. If when you run ./go you get the rather odd error message on the screen:

    fopen: No such file or directory
    httpd: could not open error log file site.toddle/var/httpd/log/error_log

you need to add the line:

    ErrorLog logs/error_log

to ...conf/httpd.conf. If, having done that, Apache fails to start and you get a message in .../logs/error_log:

    .... No such file or directory.: could not open mime types log file /site.toddle/etc/httpd/mime.types

you need to add the line:

    TypesConfig conf/mime.types

to ...conf/httpd.conf. And if, having done that, Apache fails to start and you get a message in .../logs/error_log:

    fopen: no such file or directory
    httpd: could not log pid to file /site.toddle/var/httpd/run/httpd.pid

you need to add the line:

    PIDFile logs/httpd.pid

to ...conf/httpd.conf.

2.3.3 Running Apache Under Unix

When you run Apache now, you may get the following error message:

    httpd: cannot determine local hostname
    Use ServerName to set it manually.

What Apache means is that you should put this line in the httpd.conf file:

    ServerName

Finally, before you can expect any action, you need to set up some documents to serve. Apache's default document directory is ... /httpd/htdocs, which you don't want to use because you are at /usr/www/APACHE3/site.toddle, so you have to set it explicitly. Create ... /site.toddle/htdocs, and then in it create a file called 1.txt containing the immortal words "hullo world." Then add this line to httpd.conf:

    DocumentRoot /usr/www/APACHE3/site.toddle/htdocs

The complete Config file, .../site.toddle/conf/httpd.conf, now looks like this:

    User webuser
    Group webgroup
    ServerName my586
    DocumentRoot /usr/www/APACHE3/APACHE3/site.toddle/htdocs/
    #fix 'Out of the Box' default problems--remove leading #s if necessary
    #ServerRoot /usr/www/APACHE3/APACHE3/site.toddle
    #ErrorLog logs/error_log
    #PIDFile logs/httpd.pid
    #TypesConfig conf/mime.types

When you fire up httpd, you should have a working web server. To prove it, start up a browser to access your new server, and point it at http:///.[3]

As we know, http means use the HTTP protocol to get documents, and / on the end means go to the DocumentRoot directory you set in httpd.conf.
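The set-up steps in this section can be gathered into one short shell sketch. It is only an illustration under stated assumptions: the function name make_site and the scratch location under /tmp are our own inventions, the ServerName value my586 is simply the one from the listing above, and the "out of the box" fixes are left commented out as they are there. It builds the directories, the test document, and the minimal Config file, but it does not start Apache.

```shell
#!/bin/sh
# Sketch only: lay out a minimal demonstration site like site.toddle.
# make_site is a hypothetical helper, not something shipped with Apache.
make_site() {
    site=$1
    # The three subdirectories Apache expects.
    mkdir -p "$site/conf" "$site/logs" "$site/htdocs" || return 1
    # The one document we serve.
    echo "hullo world" > "$site/htdocs/1.txt"
    # Minimal Config file: only the directives discussed in the text.
    cat > "$site/conf/httpd.conf" <<EOF
User webuser
Group webgroup
ServerName my586
DocumentRoot $site/htdocs
#ErrorLog logs/error_log
#PIDFile logs/httpd.pid
#TypesConfig conf/mime.types
EOF
}

# Build the site in a scratch directory (an assumption; use your own path).
SITE=${TMPDIR:-/tmp}/site.toddle.$$
make_site "$SITE"
ls "$SITE"
```

From there, httpd -d followed by the site path (run as root, with webuser and webgroup already created) would start the server as described earlier.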
Lynx is the text browser that comes with FreeBSD and other flavors of Unix; if it is available, type:

    % lynx http:///

You see:

    INDEX OF /
    * Parent Directory
    * 1.txt

If you move to 1.txt with the down arrow, you see:

    hullo world

If you don't have Lynx (or Netscape, or some other web browser) on your server, you can use telnet:[4]

    % telnet 80

You should see something like:

    Trying 192.168.123.2
    Connected to my586.my.domain
    Escape character is '^]'

Then type:

    GET / HTTP/1.0

You should see:

    HTTP/1.0 200 OK
    Date: Sat, 24 Aug 1996 23:49:02 GMT
    Server: Apache/1.3
    Connection: close
    Content-Type: text/html

    Index of /
    Index of
    Connection closed by foreign host.

This is a rare opportunity to see a complete HTTP message. The first lines are headers that are normally hidden by your browser. The stuff between the < and > is HTML, written by Apache, which, if viewed through a browser, produces the formatted message shown by Lynx earlier, and by Netscape or Microsoft Internet Explorer in the next chapter.

2.3.4 Several Copies of Apache

To get a display of all the processes running, run:

    % ps -aux

Among a lot of Unix stuff, you will see one copy of httpd belonging to root and a number that belong to webuser. They are similar copies, waiting to deal with incoming queries.

The root copy is still attached to port 80 (thus its children will be as well), but it is not listening. This is because it is root and has too many powers for this to be safe. It is necessary for this "master" copy to remain running as root because under the (slightly flawed) Unix security doctrine, only root can open ports below 1024. Its job is to monitor the scoreboard where the other copies post their status: busy or waiting. If there are too few waiting (default 5, set by the MinSpareServers directive in httpd.conf), the root copy starts new ones; if there are too many waiting (default 10, set by the MaxSpareServers directive), it kills some off. If you note the PID (shown by ps -ax, or ps -aux for a fuller listing; also to be found in ... /logs/httpd.pid) of the root copy and kill it with:

    % kill PID

you will find that the other copies disappear as well.

It is better, however, to use the stop script described in Section 2.3 earlier in this chapter, since it leaves less to chance and is easier to do.

2.3.5 Unix Permissions

If Apache is to work properly, it's important to correctly set the file-access permissions. In Unix systems, there are three kinds of permissions: read, write, and execute. They attach to each object in three levels: user, group, and other or "rest of the world."
If you have installed the demonstration sites, go to ... /site.cgi/htdocs, and type:

    % ls -l

You see:

    -rw-rw-r-- 5 root bin 1575 Aug 15 07:45 form_summer.html

The first - indicates that this is a regular file. It is followed by three permission fields, each of three characters. They mean, in this case:

User (root)
    Read yes, write yes, execute no
Group (bin)
    Read yes, write yes, execute no
Other
    Read yes, write no, execute no

When the permissions apply to a directory, the x execute permission means scan: the ability to see the contents and move down a level.

The permission that interests us is other, because the copy of Apache that tries to access this file belongs to user webuser and group webgroup. These were set up to have no affinities with root and bin, so that copy can gain access only under the other permissions, and the only one set is "read." Consequently, a Bad Guy who crawls under the cloak of Apache cannot alter or delete our precious form_summer.html; he can only read it.

We can now write a coherent doctrine on permissions. We have set things up so that everything in our web site, except the data vulnerable to attack, has owner root and group wheel. We did this partly because it is a valid approach, but also because it is the only portable one. The files on our CD-ROM with owner root and group wheel have owner and group numbers 0 that translate into similar superuser access on every machine. Of course, this only makes sense if the webmaster has root login permission, which we had. You may have to adapt the whole scheme if you do not have root login, and you should perhaps consult your site administrator.

In general, on a web site everything should be owned by a user who is not webuser and a group that is not webgroup (assuming you use these terms for Apache configurations).

There are four kinds of files to which we want to give webuser access: directories, data, programs, and shell scripts.
webuser must have scan permissions on all the directories, starting at root down to wherever the accessible files are. If Apache is to access a directory, that directory and all in the path must have x permission set for other. You do this by entering:

    % chmod o+x

To produce a directory listing (if this is required by, say, an index), the final directory must have read permission for other. You do this by typing:

    % chmod o+r

It probably should not have write permission set for other:

    % chmod o-w

To serve a file as data (and this includes files like .htaccess; see Chapter 3), the file must have read permission for other:

    % chmod o+r file

And, as before, deny write permission:

    % chmod o-w

To run a program, the file must have execute permission set for other:

    % chmod o+x

To execute a shell script, the file must have read and execute permission set for other:

    % chmod o+rx
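The rules above can be applied to a whole document tree in one pass. What follows is a minimal sketch rather than the authors' own script: the function name open_for_other and the scratch directory are made up for illustration, find is assumed to be available, and only directories and data files are handled (programs and shell scripts would still need o+x or o+rx set by hand, as just described). Directories above the tree must already be scannable by other.

```shell
#!/bin/sh
# Sketch: apply the "other" permission rules to everything under one directory.
# open_for_other is a hypothetical helper name.
open_for_other() {
    root=$1
    # Directories: x lets Apache descend, r lets it list, and w is denied.
    find "$root" -type d -exec chmod o+rx,o-w {} \;
    # Data files: readable but not writable by other.
    find "$root" -type f -exec chmod o+r,o-w {} \;
}

# Demonstrate on a scratch tree that starts out closed to other.
DEMO=${TMPDIR:-/tmp}/perm_demo.$$
mkdir -p "$DEMO/htdocs"
echo "hullo world" > "$DEMO/htdocs/1.txt"
chmod 700 "$DEMO/htdocs"
chmod 600 "$DEMO/htdocs/1.txt"

open_for_other "$DEMO"
ls -ld "$DEMO/htdocs" "$DEMO/htdocs/1.txt"
```

Listing directories with read permission for other is only wanted where an index is required; where it is not, o+x alone on the directories would be the tighter choice.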
