Apache Server Survival Guide asg13.htm
|
host ident authuser date request status bytes |
|
host | The host field contains the fully qualified domain name of the machine that made the request, or its IP address if the name was not available. |
ident | If IdentityCheck is enabled and the client machine is running an identity daemon, the ident field contains the name of the user who made the request. Never trust this information unless you know that the host is trusted; it can be spoofed. In most cases, don't bother enabling it. |
authuser | If the request required authentication, the authuser field will contain the login of the user who made the request. |
date | The date field contains the date and time of the request, including the offset from Greenwich Mean Time. The date format used is day/month/year:hour:minute:second timezone, for example 20/Jul/1996:09:56:03 -0500. |
request | The request field is set to the actual request received from the client. It is enclosed in double quotes ("). |
status | This field contains the three-digit HTTP status code returned to the client. Apache can return any of the following HTTP response codes: |
200: OK | |
302: Found | |
304: Not Modified | |
400: Bad Request | |
401: Unauthorized | |
403: Forbidden | |
404: Not Found | |
500: Server Error | |
501: Not Implemented | |
502: Bad Gateway | |
503: Out Of Resources (Service Unavailable) | |
The HTTP standard defines many other codes, so this list is likely to grow as new features are implemented in Apache. | |
bytes | The size of the transfer in bytes returned to the client, not counting any header information. |
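The seven fields described above can be pulled apart mechanically. What follows is a minimal sketch in Python, not something Apache ships; the names CLF_PATTERN and parse_clf are made up for illustration, and the pattern assumes that no field other than the date and the request contains spaces.

```python
import re

# One Common Log Format entry: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)

def parse_clf(line):
    """Return a dict of the seven CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A dash means nothing was transferred; treat it as zero bytes.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

if __name__ == "__main__":
    sample = ('sundmz1.bloomberg.com - - [20/Jul/1996:09:56:03 -0500] '
              '"GET /two.gif HTTP/1.0" 200 2563')
    print(parse_clf(sample))
```

Lines that do not fit the pattern come back as None, so a caller can count or skip them instead of failing on the first odd entry.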
To enable logging using the standard log format, use the TransferLog directive. This directive allows you to specify the filename to receive the logging information. Instead of a file, you can also specify a program to receive the information on its Standard Input stream (stdin).
The syntax of the TransferLog directive is as follows:
Syntax: | TransferLog [filename] | [|program] |
Default: | TransferLog logs/transfer_log |
filename is the name of a file relative to ServerRoot. If for some reason you don't want to log, specify /dev/null as the access log file.
|program is the pipe symbol (|) followed by a path to a program capable of receiving the log information on stdin.
As with any program started by the server, the program is run with the User ID (UID) and Group ID (GID) of the user that started the httpd daemon. If the user starting the program is root, be sure that the User directive demotes the server privileges to those of an unprivileged user such as nobody. Also, make sure the program is secure.
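To give an idea of what such a program might look like, here is a minimal sketch in Python that keeps only the entries with a status of 400 or higher. The function names and the host a.example are invented for this illustration, and the sketch assumes the standard log format, where the status is the second-to-last field on each line.

```python
import io
import sys

def is_error_entry(line):
    """True if the status field (second-to-last token) is 400 or higher."""
    fields = line.split()
    if len(fields) < 2 or not fields[-2].isdigit():
        return False
    return int(fields[-2]) >= 400

def copy_errors(source, sink):
    """Copy error entries from source to sink; return how many were copied."""
    copied = 0
    for line in source:
        if is_error_entry(line):
            sink.write(line)
            sink.flush()  # the pipe stays open for a long time, so flush each entry
            copied += 1
    return copied

if __name__ == "__main__":
    # Demo with an in-memory stream; as a piped program you would pass
    # sys.stdin and an open log file instead.
    demo = io.StringIO(
        'a.example - - [20/Jul/1996:10:10:00 -0500] "GET /one.gif HTTP/1.0" 404 -\n'
        'a.example - - [20/Jul/1996:10:10:01 -0500] "GET /two.gif HTTP/1.0" 200 2563\n'
    )
    copy_errors(demo, sys.stdout)
```

Installed as the piped program, the script would read from sys.stdin and append to a file that the user running it is allowed to write to.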
Here's a sample from an accesslog file generated by Apache for http://www.PlanetEarthInc.COM, a site hosted at accessLINK:
sundmz1.bloomberg.com - - [20/Jul/1996:09:56:03 -0500] "GET /two.gif HTTP/1.0" 200 2563
sundmz1.bloomberg.com - - [20/Jul/1996:09:56:03 -0500] "GET /three.gif HTTP/1.0" 200 4078
sundmz1.bloomberg.com - - [20/Jul/1996:09:56:03 -0500] "GET /four.gif HTTP/1.0" 200 4090
pn3-ppp-109.primary.net - - [20/Jul/1996:09:57:29 -0500] "GET / HTTP/1.0" 200 5441
pn3-ppp-109.primary.net - - [20/Jul/1996:09:57:36 -0500] "GET /images/ultimate.gif HTTP/1.0" 200 7897
pn3-ppp-109.primary.net - - [20/Jul/1996:09:57:38 -0500] "GET /sponsors/banner-bin/emusic2.gif HTTP/1.0" 200 8977
pn3-ppp-109.primary.net - - [20/Jul/1996:09:57:44 -0500] "GET /images/hero.gif HTTP/1.0" 200 16098
128.58.101.231 - - [20/Jul/1996:09:59:19 -0500] "GET / HTTP/1.0" 200 5441
128.58.101.231 - - [20/Jul/1996:09:59:23 -0500] "GET / HTTP/1.0" 200 5441
slip-2-28.slip.shore.net - - [20/Jul/1996:10:03:44 -0500] "GET / HTTP/1.0" 200 5439
slip-2-28.slip.shore.net - - [20/Jul/1996:10:04:07 -0500] "GET /sponsors/banner-bin/books.gif HTTP/1.0" 200 5726
slip-2-28.slip.shore.net - - [20/Jul/1996:10:04:09 -0500] "GET /images/ultimate.gif HTTP/1.0" 200 7897
slip-2-28.slip.shore.net - - [20/Jul/1996:10:04:16 -0500] "GET /images/hero.gif HTTP/1.0" 200 16098
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:09:38 -0500] "GET / HTTP/1.0" 200 5441
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:09:50 -0500] "GET /anim.class HTTP/1.0" 200 12744
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:00 -0500] "GET /one.gif HTTP/1.0" 404 -
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:01 -0500] "GET /two.gif HTTP/1.0" 200 2563
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:05 -0500] "GET /three.gif HTTP/1.0" 200 4078
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:09 -0500] "GET /four.gif HTTP/1.0" 200 4090
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:12 -0500] "GET /five.gif HTTP/1.0" 200 3343
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:15 -0500] "GET /six.gif HTTP/1.0" 200 2122
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:18 -0500] "GET /seven.gif HTTP/1.0" 200 2244
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:11:06 -0500] "GET /eight.gif HTTP/1.0" 200 2334
www-j8.proxy.aol.com - - [20/Jul/1996:10:31:50 -0500] "GET / HTTP/1.0" 200 5443
www-j8.proxy.aol.com - - [20/Jul/1996:10:31:57 -0500] "GET /images/ultimate.gif HTTP/1.0" 200 7897
www-j8.proxy.aol.com - - [20/Jul/1996:10:31:57 -0500] "GET /images/hero.gif HTTP/1.0" 200 16098
www-j8.proxy.aol.com - - [20/Jul/1996:10:31:57 -0500] "GET /sponsors/banner-bin/ktravel.gif HTTP/1.0" 200 1500
sage.wt.com.au - - [20/Jul/1996:10:43:05 -0500] "GET / HTTP/1.0" 200 5441
By simple inspection of this log excerpt, you can see that most requests are answered successfully. Only one entry is suspicious:
|
slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:00 -0500] "GET /one.gif HTTP/1.0" 404 - |
It has a response code 404 - "Not Found." The person maintaining this site should check to see if this error is repeated elsewhere because one of his pages could be referencing a broken link.
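Checking by eye stops working once the log grows past a screenful. As a rough sketch of how that check could be automated, the following Python tallies status codes and lists the paths that came back 404. The function names are invented for this example, and it assumes standard-format entries with the status as the second-to-last field.

```python
from collections import Counter

def status_of(line):
    """Return the status code of a log entry as an int, or None if malformed."""
    fields = line.split()
    if len(fields) < 2 or not fields[-2].isdigit():
        return None
    return int(fields[-2])

def tally_statuses(lines):
    """Count how many entries were logged for each status code."""
    counts = Counter()
    for line in lines:
        status = status_of(line)
        if status is not None:
            counts[status] += 1
    return counts

def not_found_paths(lines):
    """Count 404 responses per requested path."""
    paths = Counter()
    for line in lines:
        if status_of(line) != 404:
            continue
        parts = line.split('"')  # the request sits between the first pair of quotes
        request = parts[1].split() if len(parts) >= 3 else []
        if len(request) >= 2:
            paths[request[1]] += 1
    return paths

if __name__ == "__main__":
    sample = [
        'slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:00 -0500] "GET /one.gif HTTP/1.0" 404 -',
        'slip-12-16.ots.utexas.edu - - [20/Jul/1996:10:10:01 -0500] "GET /two.gif HTTP/1.0" 200 2563',
    ]
    print(tally_statuses(sample))
    print(not_found_paths(sample))
```

A path that shows up repeatedly in the second tally is a good candidate for a broken link on one of your pages.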
In addition to the standard mod_log_common logging module, Apache provides a log that is fully customizable. This log module is still considered experimental as of release 1.1, but according to some sources, it will be the preferred and default logging module for Apache 1.2. Even in its "experimental" state (actually it is just as reliable as the other one), its flexible log format may provide you with more useful logging capabilities and may give you the opportunity to reduce several logs into a single one.
|
%h | Remote host. |
%l | Remote logname (from identd, if supplied). |
%u | Remote user (from auth; may be bogus if return status (%s) is 401). |
%t | Time of the request using the time format used by the Common Log Format. |
%r | First line of request. |
%s | Status. For requests that got internally redirected, this is the status of the original request; %>s for the last. |
%b | Bytes sent. |
%{Header}i | The contents of Header: header line(s) in the request sent to the server. |
%{Header}o | The contents of Header: header line(s) in the reply. |
One of the better features this module provides is conditional logging: a field can be included or left out depending on the HTTP response code. You specify the condition by placing the HTTP status code between the % and the letter code for the field. You may specify more than one HTTP status code by separating them with commas (,). In addition, you can log any of the request headers received by the server, such as User-Agent or Referer, by specifying the header name between braces ({Header}). Here are a few examples:
%400,500{User-agent}i
The preceding example logs the User-agent header only on Bad Request (400) or Server Error (500) responses.
You can also specify that a field be logged only if certain HTTP codes are not returned, by adding an exclamation symbol (!) in front of the codes you want to check for:
%!200,304,302{Referer}i
This example logs the Referer header information on all requests not returning a normal return code. When a condition is not met, the field is null. As with the common log format, a null field is indicated by a dash (-) character.
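The rule for conditional fields is easy to restate in code. The following Python is only a sketch of the behavior described above, not Apache's implementation; conditional_field is a name invented for this illustration, and condition stands for whatever appears between the % and the field's letter code.

```python
def conditional_field(condition, status, value):
    """Return the text that would be logged for one conditional field.

    condition -- "400,500", "!200,304,302", or "" for an unconditional field
    status    -- the HTTP status code of the response
    value     -- the field's value (for example, a header's contents)
    """
    if not condition:
        return value if value else "-"
    negate = condition.startswith("!")
    codes = {int(code) for code in condition.lstrip("!").split(",")}
    matched = status in codes
    if negate:
        matched = not matched
    # An unmet condition, like an empty value, is logged as a dash.
    return value if (matched and value) else "-"

if __name__ == "__main__":
    print(conditional_field("400,500", 400, "Mozilla/2.0"))
    print(conditional_field("400,500", 200, "Mozilla/2.0"))
    print(conditional_field("!200,304,302", 404, "http://www.xxx.com/"))
```

The first and third calls print the value, because their conditions are met; the second prints a dash.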
Virtual hosts can have their own LogFormat and/or TransferLog. If no LogFormat is specified, it is inherited from the main server process. If the virtual hosts don't have their own TransferLog, entries are written to the main server's log. To differentiate between virtual hosts writing to a common log file, you can prepend a label to the log format string:
<VirtualHost xxx.com>
LogFormat "xxx formatstring"
...
</VirtualHost>
<VirtualHost yyy.com>
LogFormat "yyy formatstring"
...
</VirtualHost>
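Once a label leads each entry, a shared log can be separated again afterward. Here is a minimal sketch in Python, assuming the label is the first space-separated token on each line (as with the xxx and yyy labels above); split_by_label is a hypothetical helper, not part of Apache.

```python
from collections import defaultdict

def split_by_label(lines):
    """Group entries by their leading label and strip the label from each entry."""
    groups = defaultdict(list)
    for line in lines:
        label, _, rest = line.strip().partition(" ")
        if rest:  # skip blank lines and lines that hold nothing but a label
            groups[label].append(rest)
    return dict(groups)

if __name__ == "__main__":
    sample = [
        'xxx sage.wt.com.au - - [20/Jul/1996:10:43:05 -0500] "GET / HTTP/1.0" 200 5441',
        'yyy 128.58.101.231 - - [20/Jul/1996:09:59:19 -0500] "GET / HTTP/1.0" 200 5441',
    ]
    for label, entries in split_by_label(sample).items():
        print(label, len(entries))
```

Each list in the result holds ordinary entries with the label removed, so it can be handed to whatever tool you already use on a single-host log.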
"%h %l %u %t \"%r\" %s %b %{Cookie}i %{User-agent}i %400,401,403,404{Referer}i" |
In order to log the Cookie header, we compiled in mod_cookies. We also disabled the CookieLog by pointing it to /dev/null. There is no need to have a separate Cookie log when you can include this information in the main log.
To enable logging, you need to use the TransferLog directive:
filename is the name of a file relative to ServerRoot.
|program is the pipe symbol (|) followed by a path to a program capable of receiving the log information on stdin.
As with any program started by the server, the program is run with the UID and GID of the user that started the httpd daemon. If the user starting the program is root, be sure that the User directive demotes the server privileges to those of an unprivileged user such as nobody. Also, make sure the program is secure.
|
LogFormat "%h %l %u %t \"%r\" %s %b %{Cookie}i %{User-agent}i %400,401,403,404{Referer}i" |
This log format adds the Cookie header (a number) associated with each request. It also logs the browser the visitor was using and the Referer header information if the request was bad.
This format allows you to pack a lot of useful information into a single log file while still remaining compatible with most, if not all, of the standard logging tools available. (The order of the first seven fields is the same as the CLF.)
|
awk '{print $11}' logfile | sort | uniq | wc -l
The awk command prints the eleventh field of each line in the file. Fields in the logfile are separated by spaces, so each space creates a field. Output containing only the cookie numbers is piped to sort, which orders them so that identical values end up on adjacent lines. The sorted output is piped to uniq, which removes duplicate lines. Finally, the thinned-out list is sent to wc, which counts the number of lines in the result. This number matches the number of unique cookies, and so approximates the number of unique visitors that came to your site. For more information on these commands, please consult your UNIX documentation.
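If you would rather do the same count in a script, the pipeline translates almost directly. The Python below is a sketch under the same assumption as the awk command (the cookie is the eleventh space-separated field); count_unique_cookies is an invented name, the cookie values in the demo are made up, and unlike the pipeline it ignores entries whose cookie field is a dash.

```python
def count_unique_cookies(lines, field=11):
    """Count distinct values in the given 1-based, space-separated field."""
    seen = set()
    for line in lines:
        fields = line.split()
        # Skip short lines and null fields, which are logged as a dash.
        if len(fields) >= field and fields[field - 1] != "-":
            seen.add(fields[field - 1])
    return len(seen)

if __name__ == "__main__":
    # The cookie values (1001, 1002) are invented for this demo.
    sample = [
        'sage.wt.com.au - - [20/Jul/1996:10:43:05 -0500] "GET / HTTP/1.0" 200 5441 1001 Mozilla/2.0 -',
        'sage.wt.com.au - - [20/Jul/1996:10:43:09 -0500] "GET /one.gif HTTP/1.0" 404 - 1001 Mozilla/2.0 -',
        '128.58.101.231 - - [20/Jul/1996:10:44:00 -0500] "GET / HTTP/1.0" 200 5441 1002 Mozilla/2.0 -',
    ]
    print(count_unique_cookies(sample))
```

A set does the work of sort and uniq together, because adding a value that is already present changes nothing.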
|
grep -c "19/Jul/1996:15" access_log
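The grep above counts the entries logged during a single hour (3 p.m. on July 19). To see every hour at once, a short script can tally them all in one pass. This Python is a sketch that assumes the date field always opens with [day/month/year:hour, as in the sample log; hits_per_hour is an invented name.

```python
from collections import Counter

def hits_per_hour(lines):
    """Tally entries by 'day/month/year:hour', taken from the bracketed date field."""
    counts = Counter()
    for line in lines:
        start = line.find("[")
        if start == -1:
            continue
        stamp = line[start + 1:start + 15]  # for example "20/Jul/1996:09"
        if len(stamp) == 14 and stamp[11] == ":":
            counts[stamp] += 1
    return counts

if __name__ == "__main__":
    sample = [
        'sundmz1.bloomberg.com - - [20/Jul/1996:09:56:03 -0500] "GET /two.gif HTTP/1.0" 200 2563',
        'pn3-ppp-109.primary.net - - [20/Jul/1996:09:57:29 -0500] "GET / HTTP/1.0" 200 5441',
        'slip-2-28.slip.shore.net - - [20/Jul/1996:10:03:44 -0500] "GET / HTTP/1.0" 200 5439',
    ]
    for hour, hits in sorted(hits_per_hour(sample).items()):
        print(hour, hits)
```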
|
grep -c "GET / " access_log
or
|
grep -c "GET /index.html " access_log
or
|
grep -c "/~username" access_log
The sum of the first two searches is the total number of accesses to your home page, assuming that your home page is at the root directory and is named index.html. For private home pages, use the third option, replacing username with the login of the user.
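Adding up two grep counts by hand gets old quickly. The following Python sketch does both searches in one pass; count_home_page_hits is an invented name, and the default paths simply mirror the assumption above that the home page is / or /index.html. Unlike grep, it looks only at the quoted request, so a matching string elsewhere on the line is not counted.

```python
def count_home_page_hits(lines, home_paths=("/", "/index.html")):
    """Count GET requests whose path is one of home_paths."""
    hits = 0
    for line in lines:
        parts = line.split('"')  # the request sits between the first pair of quotes
        if len(parts) < 3:
            continue
        request = parts[1].split()
        if len(request) >= 2 and request[0] == "GET" and request[1] in home_paths:
            hits += 1
    return hits

if __name__ == "__main__":
    sample = [
        'sage.wt.com.au - - [20/Jul/1996:10:43:05 -0500] "GET / HTTP/1.0" 200 5441',
        'www-j8.proxy.aol.com - - [20/Jul/1996:10:31:57 -0500] "GET /images/hero.gif HTTP/1.0" 200 16098',
    ]
    print(count_home_page_hits(sample))
```

For a private home page, you could pass home_paths=("/~username/",), with username replaced by the user's login.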
|
The higher-end tools, such as net.Analysis from net.Genesis (http://www.netgen.com), cost anywhere from $295 to $2,995. They provide a number of features that may be interesting to very high-traffic sites.
In the inexpensive range (less than $100), there are many nice tools with tons of options available. My favorite tools are described in the following sections.
|