Apache Server Survival Guide




6


Server Side Includes (SSI)


Server Side Includes (SSI), also known as Server Parsed HTML (SPML), provides a convenient way of performing server-side processing on an HTML file before it is sent to the client. SSI provides a set of dynamic features, such as including the current time or the last modification date of the HTML file, without developing a CGI program that performs this function. SSI can be considered a server-side scripting language.

Server parsed HTML documents are parsed and processed by the server before they are sent to the client. Only documents with a MIME type of text/x-server-parsed-html or text/x-server-parsed-html3 are parsed and processed. The resulting HTML is given a MIME type of text/html and is returned to the client.

You can include information such as the current time, execute a program and include its output, or include a document just by adding some simple SPML commands to your HTML page. When the HTML page is properly identified to the server as containing SPML tokens, the server parses the file and sends the result to the client requesting it.



Enabling Apache to Run Server Side Includes



By default, SSIs are not enabled on a standard Apache configuration. Before you can incorporate SSI commands into your HTML documents, you will have to add a few directives to your configuration files.
The first step is to uncomment (remove the # character at the beginning of a line) the two lines in your srm.conf file that enable the following directives:
AddType text/html .shtml
AddHandler server-parsed .shtml
If these directives are missing, simply add them to the configuration file. The AddType directive maps the extension .shtml to the MIME type text/html. The AddHandler directive maps the .shtml extension to a handler. Handlers allow the server to perform some action based on a file type. They handle the processing of the file before it is returned to the client. The server-parsed handler referenced in the AddHandler directive is predefined in the base Apache distribution.
The AddType and AddHandler directives enable the server to recognize and process files that contain SPML tokens. However, that alone is not enough to enable their use in Apache. Because SSI can execute programs and include other documents in your filesystem, processing of the SPML tokens is not allowed by default. To enable the processing of SPML commands, you’ll need to override the default set of options. The Options directive allows you to control which features are available in which server directory. This allows you to provide different security settings to different areas of your server document tree. If the server attempts to execute an SSI document in a directory that doesn’t enable this functionality, the request fails.
By default the htdocs document tree only allows Indexes and FollowSymLinks, which enable the server to generate automatic directory listings and to follow symbolic links, respectively. Apache provides two different options that enable SSI execution:
Includes
IncludesNOEXEC
Includes activates all commands available to server side includes. IncludesNOEXEC is a more restrictive option; it disables the exec command and the inclusion of CGI program output through the include command. As their names suggest, the exec command executes programs and the include command inserts other documents into the requested HTML file.
Because of the obvious security implications associated with executing programs and including other documents, you should not enable SSI execution system wide unless you are able to control what those SSI documents do.
The Options directive can be found inside a <Directory> section in the global access configuration file, access.conf, or in a per-directory access control file (.htaccess). For more information, see Chapter 9, "Apache Server Core Directives," which explains the <Directory> and Options directives in great detail.


An SPML document is parsed as an HTML document, with SPML commands embedded as Standard Generalized Markup Language (SGML) comments. The commands follow this syntax:


<!--#command option=value option=value ...-->

Each command has a different set of options that you can specify. Usually options have a value portion (a parameter). Currently the available commands include facilities to do the following:

  • Execute programs

  • Obtain file size and modification information

  • Include text from other documents or from a program

  • Configure the format used to display results from the various commands
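As a minimal sketch of how these commands sit inside an ordinary page, consider the fragment below. The filename hello.shtml and its contents are illustrative assumptions, not part of any real site. Everything outside the SSI comments is plain HTML that the server sends as written.

```html
<!-- hello.shtml: hypothetical server-parsed page, for illustration only -->
<HTML>
<HEAD><TITLE>Hello from SSI</TITLE></HEAD>
<BODY>
<P>This paragraph is ordinary HTML.</P>
<!-- Each comment that starts with # is an SSI command; the server
     replaces it with the command's output before sending the page -->
<P>Page name: <!--#echo var="DOCUMENT_NAME" --></P>
<P>Last changed: <!--#flastmod file="hello.shtml" --></P>
</BODY>
</HTML>
```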


SPML Commands


The available SPML commands are

  • config

  • echo

  • exec (with its cgi and cmd options)

  • fsize

  • flastmod

  • include


config

The config command controls various aspects of the parsing and establishes various formatting options for displaying errors, date-time strings, and file sizes. The valid options you can use are (note that quotes surround values):
errmsg="message" message is the message returned to the client if an error occurs during the parsing of the document.
sizefmt="bytes" | "abbrev" This option sets the format used to display a file size. Valid values are

bytes for a size returned in bytes.

abbrev for a size returned in KB or MB as appropriate.
timefmt="format" format is a format string that specifies the format used to print the date. It is compatible with the strftime library available under most UNIX environments. The various strftime format options are

%% identical to %; use it if you need a percent sign in the output.

%a abbreviated weekday name

%A full weekday name

%b abbreviated month name

%B full month name

%c time and date using the time and date representation for the locale (the same as using the %X %x options together).

%d day of the month as a decimal number (01–31)

%H hour based on a 24-hour clock as a decimal number (00–23)

%I hour based on a 12-hour clock as a decimal number (01–12)

%j day of the year as a decimal number (001–366)

%m month as a decimal number (01–12)

%M minute as a decimal number (00–59)

%p AM/PM designation associated with a 12-hour clock

%S second as a decimal number (00–61)

%w weekday as a decimal number (0–6), where Sunday is 0

%x date using the date representation for the locale

%X time using the time representation for the locale

%y year without century (00–99)

%Y year with century (for example, 1990)

%Z time zone name or abbreviation, or no characters if no time zone is determinable.
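The following sketch shows how the three config options might be combined near the top of a page. The error text and the chosen formats are arbitrary examples; the settings are meant to apply to the commands that follow them in the same document.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!-- Message shown in place of any command that fails -->
<!--#config errmsg="[this part of the page is unavailable]" -->
<!-- Report file sizes in KB or MB rather than in bytes -->
<!--#config sizefmt="abbrev" -->
<!-- Full weekday name, day, full month name, four-digit year, 24-hour time -->
<!--#config timefmt="%A, %d %B %Y at %H:%M" -->
<P>Page generated on <!--#echo var="DATE_LOCAL" -->.</P>
```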

echo

The echo command prints any of the include environment variables. If the specified variable doesn't have a value, it prints as (none). Printing of dates is subject to the current configuration of timefmt. Valid options include the following:
var="variable" variable is the name of the variable to print.

Beginning with Apache 1.1, the SSIs and CGIs you call from your SSI also have access to the CGI environment variables. For a complete listing, refer to Chapter 5, "CGI (Common Gateway Interface) Programming."

Beginning with Apache 1.2, Apache adopted the XSSI module as its standard SSI processor. XSSI is discussed in detail in the section of this chapter entitled "Extended Server Side Includes (XSSI)."
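A short sketch of echo in use follows. The variable name NO_SUCH_VARIABLE is made up for illustration; according to the description above, an unset variable should print as (none).

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!-- Dates printed by echo follow the most recent timefmt setting -->
<!--#config timefmt="%d %B %Y" -->
<P>Today is <!--#echo var="DATE_LOCAL" -->.</P>
<P>You requested <!--#echo var="DOCUMENT_NAME" -->.</P>
<!-- A variable with no value, used here only to show the fallback text -->
<P>Unset variable: <!--#echo var="NO_SUCH_VARIABLE" --></P>
```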

exec

The exec command executes the specified shell command or CGI program. This option can be disabled by the IncludesNOEXEC option. The valid options are
cgi="path" path specifies the program to be run. If path is not an absolute path (one that begins with a /), then path is taken to be relative to the current document.

The directory containing the program must be approved as a CGI directory, either by a ScriptAlias directive or by setting the ExecCGI option in the global access configuration file or in a per-directory access file.

The program's environment includes the PATH_INFO and QUERY_STRING variables set to the values sent in the original request. The include variables are available to the script in addition to the standard CGI environment.

Programs that return a Location: header have their output translated into an HTML anchor.

Use of the include virtual element is preferred to exec cgi.
cmd="string" The server will execute string using /bin/sh. The environment includes the include variables and, beginning with Apache 1.1, the complete set of CGI environment variables.
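The two forms of exec might be used as sketched below. The program /cgi-bin/counter.cgi is hypothetical, and /bin/date is assumed to exist at that path on your system. Both lines require the Includes option; under IncludesNOEXEC they are disabled.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!-- cgi: runs a CGI program; its directory must permit CGI execution -->
<P>Visitors so far: <!--#exec cgi="/cgi-bin/counter.cgi" --></P>
<!-- cmd: hands the string to /bin/sh and inserts whatever it prints -->
<P>Server time: <!--#exec cmd="/bin/date" --></P>
```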

fsize

This command inserts the size of the specified file, subject to the sizefmt format specification. The options to this command are
file="path" The value is a path relative to the directory containing the current document being parsed.
virtual="path" If path is not an absolute path (one that begins with a /), path is taken to be relative to the current document.
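As a sketch, fsize could annotate a download link with the size of the file. The file guide.ps and the /docs path are assumptions made for the example.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!-- file: path relative to the directory of the current document -->
<!--#config sizefmt="abbrev" -->
<P><A HREF="guide.ps">guide.ps</A> (<!--#fsize file="guide.ps" -->)</P>
<!-- virtual: URL path, here absolute from the document root -->
<!--#config sizefmt="bytes" -->
<P>Exact size: <!--#fsize virtual="/docs/guide.ps" --></P>
```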

flastmod

This command inserts the last modification date of the specified file, subject to the timefmt format specification. The options for this command are
file="path" The value is a path relative to the directory containing the current document being parsed.
virtual="path" If path is not an absolute path (one that begins with a /), it is taken to be relative to the current document.
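A sketch of flastmod follows, with a timefmt setting placed before it to control the date format. The file prices.html is a hypothetical document in the same directory as the page.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!--#config timefmt="%d %B %Y" -->
<!-- Modification date of another file in the same directory -->
<P>Price list updated: <!--#flastmod file="prices.html" --></P>
<!-- Modification date of the page being parsed -->
<P>This page updated: <!--#echo var="LAST_MODIFIED" --></P>
```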

include

This command inserts another document into the parsed file. Included files are subject to any access-control settings governing their access, including any restrictions to CGI program execution. If there is a permissions restriction, the file or program output won't be included.

An option specifies the document's location; an inclusion is done for each option given to the include command. Valid options are as follows:
file="filename" filename is a filename relative to the directory containing the current document being parsed. It cannot contain ../, nor can it be an absolute path. The virtual option should always be used in preference to this one.
virtual="urlpath" urlpath is a URL path that identifies the document to include. The URL cannot contain a scheme or hostname, only a path and an optional query string. If urlpath is not an absolute path (one that begins with a /), it is taken to be relative to the current document.

A URL is constructed from the option, and the output the server would return if the URL were accessed by the client is included in the parsed output; included files can be nested.
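A common use of include is a shared header and footer. The sketch below assumes a hypothetical /includes/header.html under the document root and a footer.html in the same directory as the page.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!-- virtual: URL path resolved by the server, as if a client had requested it -->
<!--#include virtual="/includes/header.html" -->
<P>Page body goes here.</P>
<!-- file: must live in or below the current directory; no ../ and no absolute path -->
<!--#include file="footer.html" -->
```

Because the virtual form goes through the server's normal URL handling, the included document remains subject to the same access controls as a direct request.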

Include Variables


These variables are provided for includes and to any program invoked by the document:
DATE_GMT The current date in Greenwich Mean Time (GMT).
DATE_LOCAL The current date in the local time zone.
DOCUMENT_NAME The filename of the document requested by the user. DOCUMENT_NAME does not include any path information.
DOCUMENT_URI The path of the document requested by the user.
LAST_MODIFIED The document's last modification date.
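The five include variables might be combined into a page footer along these lines. The layout is only an illustration.

```html
<!-- Hypothetical footer for a server-parsed page -->
<HR>
<P>
<!--#echo var="DOCUMENT_NAME" --> (<!--#echo var="DOCUMENT_URI" -->)<BR>
Last modified: <!--#echo var="LAST_MODIFIED" --><BR>
Served at: <!--#echo var="DATE_LOCAL" --> local time,
<!--#echo var="DATE_GMT" --> GMT
</P>
```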

In Apache 1.1, you can access the CGI environment variables (for a complete list, refer to Chapter 5) in addition to the include variables. The standard CGI 1.1 specification defines several variables. Some are set for all requests (SERVER_SOFTWARE, SERVER_NAME, and GATEWAY_INTERFACE). Others are request specific and may not be defined. A third category carries headers sent by the client program; these variables start with HTTP_.

With this information you can create some pretty useful SPML pages that start to act more like a CGI program than an SPML document. Two omissions from the Apache SSI module are conditional execution and user-defined variables. Howard Fear has developed a full replacement for the Apache SSI module that adds this missing functionality. This module is called XSSI. It is available on the CD-ROM included with this book, and it is part of the standard Apache distribution for 1.2, which was not yet available at the time this book was written.

SSI Example


Here's an example, shown in Listing 6.1 and then in Figure 6.1, that puts it all together. You may want to implement something like this on your own site. The page returned is generated at random, based on a small database of quotes (in my programs the database is referred to as quotes.conf). Each entry in the database is a single line of text. Each record in the quote database is separated by a newline.

Listing 6.1. quote.shtml.


<HTML>
<HEAD>
<TITLE> Random Quote </TITLE>
</HEAD>
<BODY>
<P ALIGN=CENTER>
<FONT SIZE=7><EM>Random Quote</EM></FONT>
</P>
<HR>
<BLOCKQUOTE>
<FONT SIZE=5><EM>
<!--#exec cmd="cgi/quoteoftheday.cgi"-->
</EM>
</FONT>
</BLOCKQUOTE>
<HR>
<!--#exec cmd="cgi/envvar.cgi"-->
<P>
<!--#config timefmt="%A, %B %d %Y"-->
<!--#echo var="DATE_LOCAL"-->
</BODY>
</HTML>

Figure 6.1. Quote of the day.

The quoteoftheday.cgi program returns a random line from the quote database (see Listing 6.2). The quote database is just a simple text file with one quote per line. The CGI returns one line, which the server inserts into the HTML stream returned to the client.

Listing 6.2. quoteoftheday.cgi.


#!/usr/local/bin/perl -w
#
# quoteoftheday.cgi - prints a quote at random
#
# $conf_file: the absolute path to your quote database
#
# Call this script from a server-parsed HTML document (.shtml, for example)
# and make sure that server side includes are enabled.
#
# Use the following example code:
#   <!--#exec cmd="/yourpath/quoteoftheday.cgi"-->
#
# Of course, substitute your actual path to quoteoftheday.cgi for /yourpath.
# Again, this won't work unless server side includes are activated for Apache.

use strict;   # Declare all our variables before using them
$| = 1;       # Flush the output buffer

# Variables
my( $conf_file, $quote );
my( @Quotes );
my( $num_quotes, $rand_quote );

$conf_file = "/NextLibrary/WebServer/htdocs/AccessLink.htmld/cgi/quotes.conf";

srand;        # Seed the random number generator

open( IN, $conf_file ) || die "Cannot open $conf_file: $!";
@Quotes = <IN>;   # One quote per line
close( IN );

$num_quotes = @Quotes;                     # How many quotes are there?
$rand_quote = int( rand( $num_quotes ) );  # Index from 0 to $num_quotes - 1
$quote = $Quotes[$rand_quote];

print $quote;
exit( 0 );

Finally, envvar.cgi is a trivial Perl program that prints out two of the CGI environment variables. (See Listing 6.3.) If you are using Apache 1.1.1's SSI module or XSSI, CGI variables are available in the same way as the standard SSI variables, so there's no need for this program.

Listing 6.3. envvar.cgi.


#!/usr/local/bin/perl -w
#
use strict; # Declare all our variables before using them
$| = 1; # Flush the output buffer

print "HTTP_USER_AGENT = $ENV{HTTP_USER_AGENT}<BR>\n";
print "REMOTE_ADDR = $ENV{REMOTE_ADDR}<BR>\n";
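On a server where the CGI variables are exposed to SSI (Apache 1.1 and later, per the text above), the same two values could presumably be printed without a helper program. This is a sketch of that replacement, not a listing from the original site.

```html
<!-- Hypothetical replacement for the envvar.cgi call in quote.shtml -->
<!-- echo reads the CGI variables directly, so no external program runs -->
HTTP_USER_AGENT = <!--#echo var="HTTP_USER_AGENT" --><BR>
REMOTE_ADDR = <!--#echo var="REMOTE_ADDR" --><BR>
```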

Extended Server Side Includes (XSSI)


To circumvent some of the limitations of the standard Apache SSI module, Howard Fear developed an extended SSI module that provides additional functionality and directives.

XSSI is a plug-in replacement module for the built-in mod_include module. Extended Server Side Includes (XSSI) enhances the server side include syntax and provides several additional features, including:

  • Conditional inclusion with if-then-else constructs

  • User-defined variables

  • Additional output directives

  • Regular expression support

The XSSI module is available on the CD-ROM included with this book. The latest version can be found at http://pageplus.com/~hsf/xssi.

Variables


XSSI supports user-defined variables. You can substitute a variable into the value of any option on any directive, with the exception of var= options, which take a variable name rather than a value.

To define a variable, you use the set directive:


<!--#set var="variableName" value="variable_value"-->

For example, to define a variable program that holds the value "/cgi-bin/printenv", you would set the following:


<!--#set var="program" value="/cgi-bin/printenv" -->

You can then use this variable on any value tag, except on var= tags:


<!--#exec cgi="$program" -->

Alternatively, you can specify additional arguments by enclosing the variable in braces like the following:


<!--#exec cgi="${program} additional_text"-->

In addition to the variables defined for SSI, you can use any of the CGI environment variables (for a complete listing of these variables, refer to Chapter 5), which help create very simple, yet powerful, SSI programs without the need to write a CGI program.
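The sketch below shows one case where the braces matter. The variable name section and the paths are hypothetical. Without braces, a name that is followed directly by a letter, digit, or underscore would be read as part of a longer variable name.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!--#set var="section" value="products" -->
<!-- $section ends at the slash, so no braces are needed here -->
<!--#include virtual="/$section/menu.html" -->
<!-- Braces keep "_archive" from being read as part of the variable name -->
<!--#include virtual="/${section}_archive/menu.html" -->
```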

It's worth noting that the $ syntax to denote variable names may cause problems in exec cmd constructs where the $ may be used by another program to denote an argument. For example:


<!--#exec cmd="/usr/local/etc/httpd/cgi-bin/finger @${HOST_NAME} | /bin/awk '{print $1}'" -->

This SSI won't work as expected. The $1 is set to a null value, which causes awk to print the complete line instead of the first word as intended. The solution to this minor inconvenience is to have an sh script handle all the output properly and return the data to you.

Output Commands


In addition to the echo, fsize, and flastmod commands, which work as I've described, XSSI provides printenv, which prints all variables currently set (this includes CGI environment variables):


<!--#printenv -->
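While debugging a page, it can help to wrap the output in a preformatted block so that the listing keeps its line breaks in the browser. This is only a suggested sketch; the exact layout of the output depends on the module.

```html
<!-- Hypothetical debugging aid; remove before publishing the page -->
<PRE>
<!--#printenv -->
</PRE>
```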

Flow Control


XSSI provides you with basic flow control with the following directives:


<!--#if expr="condition" -->

<!--#elif expr="condition" -->

<!--#else -->

<!--#endif -->

The if constructs work as they do in any programming language. If condition evaluates to true, all text specified until the next elif, else, or endif is included in the output. If condition evaluates to false, text specified after the else statement is included in the output.

The elif condition is evaluated if the preceding if statement evaluated to false. Statements specified after an elif statement are included in the output if the elif expression evaluates to true.

All conditional constructs must be terminated by an endif construct.

Any token that is not recognized as a variable or an operator is treated as a string. Strings can be quoted; variable substitution is done within quoted strings (if you need a $ within a string, you can escape it by preceding it with a backslash (\) character). Unquoted strings cannot contain whitespace because spaces are used to separate other tokens. condition can be any of the following:
string Evaluates to true if the string or variable is not empty
stringA = stringB Evaluates to true if stringA is equal to stringB
stringA != stringB Evaluates to true if stringA is not equal to stringB
(condition) True if the condition evaluates to true
!condition True if the condition is false
conditionA && conditionB True if both conditions are true
conditionA || conditionB True if either conditionA or conditionB is true
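The sketch below combines several of these condition forms. It assumes the page sits in an area protected by authentication, so that REMOTE_USER may be set, and it treats guest as a hypothetical account name.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!-- A bare variable is true when it is not empty; && requires both tests -->
<!--#if expr="$REMOTE_USER && $REMOTE_USER != guest" -->
<P>Welcome back, <!--#echo var="REMOTE_USER" -->.</P>
<!--#elif expr="$REMOTE_USER = guest" -->
<P>You are browsing as a guest.</P>
<!--#else -->
<P>You are not logged in.</P>
<!--#endif -->
```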

More interesting than a straight comparison is that XSSI allows you to test using regular expressions that follow the UNIX egrep syntax. In Table 6.1, character excludes newline.

Table 6.1. egrep syntax.

Syntax Effect
\character Matches that character.
^ Matches the beginning of a line.
$ Matches the end of a line.
. Matches any character.
[string] Matches any single character in the string. You can abbreviate ranges of ASCII character codes, as in a-z0-9. A ] may occur only as the first character of the string. A literal - (hyphen) must be placed where it can't be mistaken for a range indicator.
character Matches that character, as long as that character doesn't have another meaning in a regular expression.
expression* Matches a sequence of 0 or more matches of the regular expression.
expression+ Matches a sequence of 1 or more matches of the regular expression.
expression? Matches a sequence of 0 or 1 matches of the regular expression.
expression expression Two regular expressions concatenated match a match of the first followed by a match of the second.
expression|expression Two regular expressions separated by | or a newline match either the first or the second.
(expression) A regular expression enclosed in parentheses matches the regular expression definition.

The order of precedence of operators at the same parenthesis level is [], then *+?, then concatenation, and then | and newline.

The following example shows you how to output HTML based on the browser detected:


<!--#if expr="${HTTP_USER_AGENT} = /Mozilla/" -->
Output NETSCAPE specific HTML
<!--#else -->
Output HTML 2.0 compliant formatting
<!--#endif -->
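Two further patterns from Table 6.1, an anchor and an alternation, are sketched here. The page names and browser strings are chosen only for illustration, and the patterns are kept simple on the assumption that the expression parser may give special meaning to characters such as parentheses and $.

```html
<!-- Hypothetical fragment of a server-parsed page -->
<!-- ^ anchors the match to the start of the document name -->
<!--#if expr="$DOCUMENT_NAME = /^index/" -->
<P>You are on a start page.</P>
<!-- | matches either alternative anywhere in the string -->
<!--#elif expr="$HTTP_USER_AGENT = /Lynx|MSIE/" -->
<P>A plain version of this page is provided for your browser.</P>
<!--#endif -->
```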

Summary


The benefits of Server Parsed HTML don't come gratis. The parsing process is costly in terms of server performance, and it also raises some security issues. If you are concerned about security, you may want to enable the IncludesNOEXEC option instead of Includes; it allows SPML but disables the #exec command and the inclusion of CGI program output. Obviously, this severely hampers the usefulness of SSI. For more information on the security issues involved, see Chapter 16, "Web Server Security Issues."

In terms of programming, SPML gives you the capability to embed some "programming" into your pages. However, too many features are lacking for it to become a serious alternative to CGI programs.

If you are interested in pursuing HTML scripting type languages, of which SSI is a member, you may want to explore PHP/FI, which is another HTML-embedded scripting language. For more information on PHP/FI, check its home page at http://www.vex.net/php.
