Chapter 23
HTML with Perl Modules
This chapter covers how to use Perl with HTML FORMs to get user input through a Web server and respond to it. One of the examples in this chapter covers getting statistics from a Web site. The topics in this chapter include extending the way user input collected in an HTML FORM is handled, processing that input, and displaying the results in tabular form. The data used in this chapter comes from the well-known, free utility getstats. I also introduce a way to produce clickable images and show you how to connect them to scripts that handle the input for you. At the end of this chapter, you'll continue to work with public-domain extension modules to Perl, CGI.pm and HTML.pm, which remove much of the burden of writing HTML pages and separate the application from the HTML standard, making the application more portable and less susceptible to changes in the standard.
Presenting Data on the Web Using CGI Scripts
The World Wide Web provides a lot of flexibility for presenting and publishing data. You can present data as graphical images, as text arranged neatly in tables, or as plain columnar text. In graphical form, data can be shown as figures and charts, or even as images of tables. For tabular presentation, you can use the built-in table features of HTML.
This chapter covers the basics of representing data on the Web. I do not assume that you have an existing, multi-layered, whiz-bang database; if you do, you probably also have the tools to get its data out in just about any format you need. Instead, I concentrate on the basic comma-delimited format that most spreadsheets can generate, since comma-delimited data is easy to produce from commonly available software. This chapter cannot possibly cover the database engines and display options of every software package out there.
Finally, given the examples in this chapter, you should be able to apply the methods shown here to your own databases. You already learned the basic principles of collecting responses from an HTML form in Chapter 22, "Using HTML FORMs with Perl CGI Scripts." Here are the basic steps involved:
Collecting User Input Using Perl Modules
In this section I cover how to use existing Perl module extensions to collect user input in a CGI script. The best way to show this is by example. Refer to the Perl script in Listing 22.7 (in Chapter 22), which processes a very simplified credit card application. There, we had to go through several steps to extract the data from the environment variables and set the internal variables in the handler script. What if this were not necessary? That's where the CGI modules come in. The CGI modules covered in this section, CGI.pm and its related files, are used in conjunction with the HTML modules described later in this chapter.
The file you need to install this package is called CGI-modules.2.75.tar.gz. You can get it from your nearest CPAN site. The author of this package is Lincoln Stein. Please convey your comments directly to him at lstein@genome.wi.mit.edu. Unzip and untar the package file; you'll be left with a directory called CGI-modules.2.75 in the same directory. Move all the files in the /usr/lib/perl5/CGI-modules-2.75/CGI directory to the /usr/lib/perl5/CGI directory. Now you're set to use the CGI modules. Refer to the ./doc directory for more information.
Listing 23.1 illustrates how the same application could be rewritten using the CGI module extension for Perl.
Listing 23.1. A sample application rewritten with CGI modules.
1 #!/usr/bin/perl
Let's examine some of the lines that show how the CGI module is being used. Lines 7 and 8 declare that you intend to use the Base and Request classes of the CGI module. Both modules reside in the /usr/lib/perl5/CGI directory by default, as the files Base.pm and Request.pm, respectively. The CGI::Base class is required for all the functions you intend to use in the CGI module. The CGI::Request class is required to parse incoming user input from QUERY_STRING in your CGI script.
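To appreciate what CGI::Request saves you, it helps to see the chore it replaces. The following is a hand-rolled sketch, not the CGI::Request implementation or its API: it reads the raw form data (from QUERY_STRING for a GET request, or from STDIN for a POST) and decodes the name=value pairs. The subroutine names read_query, url_decode, and parse_form are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-rolled illustration of the work CGI::Request does for you.
# NOT the CGI::Request code or API; the sub names are hypothetical.

# GET requests carry the data in QUERY_STRING; POST requests send
# CONTENT_LENGTH bytes on STDIN.
sub read_query {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        my $len  = $ENV{CONTENT_LENGTH} || 0;
        my $body = '';
        read(STDIN, $body, $len) if $len > 0;
        return $body;
    }
    return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
}

# Undo URL encoding: '+' stands for a space, %XX for a hex-coded byte.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Split "name=value&name=value" into a hash. Repeated names (for example,
# several checked boxes) are joined with "\0" in this sketch.
sub parse_form {
    my ($query) = @_;
    my %form;
    for my $pair (split /[&;]/, $query) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $name  = url_decode($name);
        $value = url_decode(defined $value ? $value : '');
        $form{$name} = exists $form{$name} ? "$form{$name}\0$value" : $value;
    }
    return %form;
}
```

In a CGI script you would call my %form = parse_form(read_query()); and then read individual fields such as $form{name}. Every script that does this by hand repeats these details, which is exactly the repetition the CGI modules are meant to remove.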
The CGI::Base class transparently handles all POST and PUT requests, reading the request body from STDIN into QUERY_STRING. You then have to parse the value of the QUERY_STRING environment variable yourself or use the CGI::Request class. CGI::Base also automatically sets Perl variables with the same names as the CGI environment variables, holding their values. CGI::Request does require a CGI::Base object for its initialization and subsequent use, even though it does not inherit anything from CGI::Base.
Lines 9 through 16 declare a package for the fields parsed from QUERY_STRING. The assignments are of the form variable = name, where name is the field name in the query string. The local package CREDIT declares the Perl variables this script requires as well as the strings used to extract their values. Line 17 prints the response header for the HTML request back to the client. Line 18 takes the input from the client (in QUERY_STRING) and parses it into the members of the CREDIT package. Lines 23 through 69 handle errors for the CGI script as before. Lines 70 through 79 echo the members of the CREDIT package. Line 80 is used for debugging and echoes all the environment variables set by the CGI::Base object. Line 81 terminates the output from the CGI script; terminating the script destroys the CGI::Base and CGI::Request objects automatically.
This debugging output is very similar to that of Listing 20.9 in Chapter 20, "Introduction to Web Pages and CGI." In fact, given that Listing 20.9 is about 45 lines long, you can write a similar application that is much shorter by using the CGI classes, as shown in Listing 23.2.
Listing 23.2. Echoing CGI environment variables.
1 #!/usr/bin/perl
It's complicated enough to write CGI scripts; it's worse to write lengthy Perl scripts that generate HTML for you. Using the CGI classes takes some (not all) of the drudgery away. In the next section, you'll use some more features of HTML to show data.
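For comparison with Listing 23.2, here is a sketch of an environment echo written without any CGI modules at all. It is not the book's listing; env_report is a hypothetical name, and the report is sent as text/plain so that values containing < or & need no HTML escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Echo the CGI environment without any CGI modules (hypothetical helper).
# The report is plain text, so no HTML escaping is needed.
sub env_report {
    my (%env) = @_;
    my $out = "Content-type: text/plain\n\n";    # header, then a blank line
    for my $name (sort keys %env) {
        $out .= "$name = $env{$name}\n";
    }
    return $out;
}

print env_report(%ENV);
```

Building the report as a string, rather than printing line by line, keeps the formatting testable apart from the CGI environment it is run in.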
Using Tables in HTML
The HTML 3.2 specification allows data to be displayed in a clean tabular form using HTML's table elements. The possibilities for showing data in a tabular format are tremendous. Tables in HTML pages take the following form:
<TABLE BORDER>
The <TABLE> and </TABLE> tags delimit the table. The BORDER attribute instructs the browser to draw lines around each cell; if you do not want borders around the cells, omit the BORDER attribute. The table will be as wide as all of its columns combined, and browsers adjust the width of each column to accommodate its text as best they can. The ALIGN attribute can take one of three values to align the text in a table cell: left, right, or center. By default, data cells are left-aligned and header cells are centered.
Each row starts with a row tag, <TR>. The data between a pair of table data tags, <TD> and </TD>, fills the next cell in the current row. The table header tags, <TH> and </TH>, specify the column titles in the first row. Table input is finished with the </TABLE> tag. If fewer headers than columns are specified, the remaining header cells are left empty. Note that the first row has four columns, but only three headers; therefore, the fourth column will not have a heading.
The table element provides other nifty features, such as column spanning, where the colspan attribute determines the number of columns a heading or item will span. The rowspan attribute specifies the number of rows an item will span. Some sample code is shown in Listing 23.3.
Listing 23.3. Using row and column spanning.
1 <html><head>
Now that you know how to put data in a table in an HTML page, let's see what we can display using these tables. For this example, you will display statistics on which pages on the server get hit the most. This way you can gauge which items on the Web site are most popular. The statistics for the number of hits per file, including the date and the IP address of the requesting host, are kept in a file called access_log in your Web server's logs directory.
This is the file to look at if you want to know which file has been hit the most. Its location is set when your Web server is configured; if you are not sure where to look, you can find it with the find command (find / -name access_log -print).
Rather than write a whole statistics utility from scratch, you can use existing tools to get the information from access_log. In this example, you use getstats. The getstats program was written by Kevin Hughes at Enterprise Integration Technologies (www.eit.com). Get it via FTP from ftp.eit.com/web.software/getstats or from http://www.eit.com/goodies/software/getstats/src/statform.html. (It's available in source form only, and you have to use GNU's gcc to compile it. The source might be called getstats.XX.c, where XX is the version number. The version I work with in this section is 12.)
One word of caution before you build the program: determine the type of server you are running and the format your access_log is in. The most common server is the NCSA server. Chances are that your access_log looks like this:
crow.lib.uh.edu - - [04/Mar/1995:16:28:39 -0600] "GET /iistv.html
The first item is the host name (or IP address) of the client making the request, followed by the remote identity (usually a hyphen), and then either the authenticated user name or a hyphen. Within the square brackets is the time of access, followed by the request line: the method of access, the file accessed, and the HTTP protocol version. Then the server's status code is shown, followed by the number of bytes sent back. This is known as the COMMON format. If your log file is different from this one, determine the type of server you have and get the correct tool for it. In almost all cases, getstats will work for you, so try it anyway. You may be pleasantly surprised.
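To see what getstats has to work with, the following sketch splits one COMMON-format line into the fields just described and tallies requests per file. It is not part of getstats; parse_common and count_hits are hypothetical names, and the pattern assumes well-formed lines exactly in the layout described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one COMMON-format access_log line into its fields:
#   host ident authuser [time] "method file protocol" status bytes
# Returns a hash reference, or nothing if the line does not match.
sub parse_common {
    my ($line) = @_;
    my ($host, $ident, $user, $time, $request, $status, $bytes) =
        $line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)/
        or return;
    my ($method, $file, $protocol) = split ' ', $request;
    return {
        host     => $host,   ident  => $ident,  user => $user,
        time     => $time,   method => $method, file => $file,
        protocol => $protocol,
        status   => $status,
        bytes    => ($bytes eq '-' ? 0 : $bytes),   # '-' means no body was sent
    };
}

# Count requests per file (a hypothetical helper, not getstats itself).
sub count_hits {
    my %hits;
    for my $line (@_) {
        my $r = parse_common($line) or next;        # skip unparsable lines
        $hits{ $r->{file} }++ if defined $r->{file};
    }
    return \%hits;
}
```

Because the sample line above is cut short, a complete line for trying this out has to be made up, for example crow.lib.uh.edu - - [04/Mar/1995:16:28:39 -0600] "GET /iistv.html HTTP/1.0" 200 3467, where the status code and byte count are invented values. Parsing it yields a hash reference whose file field is /iistv.html.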
Let's now work with the getstats program to see how to use it. Before you can use it, you have to set some values in the getstats sources and then recompile. First, in the getstats.12.c source file you have to set #define COMMON to 1, not 0, which is how it's delivered. Also, be sure to define the locations of the files in your WWW directory, especially the name of your server and the location of the access_log file. Building the program is as easy as typing the following command:
gcc getstats.12.c -o getstats
Ignore any warnings you get from gcc; any warnings are harmless and concern comparing an integer with a pointer. To get all the statistics, you just type getstats at the prompt. If you do not get any output or the program appears to hang, make sure you have defined COMMON as 1 and then recompile.
The output from the getstats program is long and verbose, depending on what files you have on your system. I am particularly interested in the following section of output:
HTTP Server Request Statistics
The Perl script to extract this section of output is shown in Listing 23.4. The variable $a is assigned the string returned from the getstats command.
Listing 23.4. The Perl script to get the needed information.
1 #!/usr/bin/perl
Here's the output from the script in Listing 23.4:
6:08/25/95:/bosnia1.htm
The strange construct in the if clause, ($x =~ /(.*) : (.*) : (.*)/ ), looks for three fields separated by colons with spaces around them. The =~ operator binds $x to the pattern match, and the three parenthesized groups capture the fields into $1, $2, and $3; printing those captures joined by bare colons is what drops the extra white space around each colon.
Displaying these statistics in tabular form in HTML means taking the output, now safely stored in colon-delimited strings, and modifying the script to print the correct table tags along with the data returned by the getstats program. The modified script is shown in Listing 23.5.
Listing 23.5.
The modified Perl listing to include table information.
1 #!/usr/bin/perl
Setting the $| variable at line 6 forces output back to the calling browser as soon as each print statement is executed. This keeps the browser from timing out at the other end if the getstats command takes too long to run. The header and footer for the HTML document are generated by the statements print <<"HTMLHEAD" and print <<"HTML", respectively; see line 53 for an example. Everything between the print line and its terminating word (for example, HTMLHEAD) is printed as one double-quoted string, with any variables in it interpolated. This saves me from typing many print commands. Also, note that I use a counter called $ctr in this script (line 42) to limit the output to 10 rows. The machine I work on does not get hit very often, nor does it have that many files to offer. Your site may have many more hits per file, so you might want to keep this limit at a reasonable value.
Now you know how to display data in a table. There is much more to displaying data in tabular form than can be covered in one chapter. Please refer to the online documentation on writing HTML pages.
Using the HTML Module
HTML::Base is an extension module for Perl 5 that provides an object-oriented way to build HTML pages. Its purpose is to create HTML 2.0 tags, plus a few tags from the HTML 3.0 standard, including the table tags. The package comes with documentation in the html_base.pod file.
Using this package shields you from many of the nuances of HTML syntax, so you should be able to use it without worrying about the nitty-gritty details of HTML. For example, special characters such as the ampersand (&) are output as the corresponding HTML character entity (&amp;); all you do is type the ampersand in the text you want displayed. The module also lets you use the flexibility and language abstraction of Perl. You really do not need to learn HTML syntax to use the module described here; however, such knowledge is invaluable when debugging the output from a script that uses it.
To install the HTML package, simply copy the file Base.pm to a subdirectory called HTML in whatever directory you use to store Perl 5 modules. For example, if your Perl 5 modules are in /usr/local/lib/perl5, you should create a subdirectory there called HTML and copy Base.pm into it, like this:
mkdir /usr/local/lib/perl5/HTML
Each object in the HTML::Base class represents a single instance of an HTML tag. An object whose class is defined by HTML::Base could be called an "HTML object." The primary function of the HTML::Base module is to provide definitions and methods for classes of HTML objects. A base class, known as HtmlObject, is defined, from which all other HTML objects are derived. All objects know where they are situated in the hierarchy of HTML objects that make up a page (or pages) of HTML. They also know how to realize (display) themselves. Here are the steps involved in creating a document with this package:
A key point to keep in mind when working with any HTML object in the package is that you are always working with a "current" object. As you create more objects, each in turn becomes the current object. You can make any object current by calling the make_current function. To go back up the hierarchy, call the end_object function on each object you are finished with; its parent then becomes the current object.
HTML objects are created using the new function. Each newly created object becomes the current object and is then the parent of the next object created. This chain of parenthood continues until an object is ended or another object is made the current object. When an entire hierarchy of HTML objects has been created, it must be realized (displayed). Realization is when the objects are told to output the appropriate HTML for their object classes. The output is sent either to standard output or to a file. Listing 23.6 presents a modified version of an example that comes with the documentation.
Listing 23.6. A sample HTML package usage.
1 #!/usr/bin/perl
Here is the output from the HTML module:
<BODY>
To use HTML::Base in your Perl 5 program, include the following use command at the beginning of your program:
use HTML::Base;
HTML::Base exports no subroutine names into your program's name space. All objects that can output an HTML tag are derived from the HtmlObject base class. Each HTML object knows how to display itself and how to use fields called attributes in the display process. Each HTML object knows which attributes to recognize and ignores all others. It is okay to give your own attributes to HTML objects during their construction as long as their names do not conflict with any of the standard HTML attributes.
Constructing HTML Objects
HTML objects are constructed using the new function.
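The "current object" rules above are easier to follow with a tiny working model. The following is a hypothetical miniature, not HTML::Base and not its source: the class MiniHTML and its new, text, make_current, end_object, and realize methods only imitate the behavior the text describes, namely that a new object attaches to the current one and then becomes current, and that ending an object hands "current" back to its parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical miniature of the "current object" model described above.
# It is NOT HTML::Base; the class and method names only mimic the idea.
package MiniHTML;
use Scalar::Util qw(weaken);

my $current;    # the object that newly created objects attach to

# new(): attach to the current object, then become the current object.
sub new {
    my ($class, $tag) = @_;
    my $self = bless { tag => $tag, kids => [], parent => $current }, $class;
    weaken($self->{parent}) if defined $self->{parent};   # avoid a reference cycle
    push @{ $current->{kids} }, $self if defined $current;
    $current = $self;
    return $self;
}

# text(): a tag-less text node; it never becomes the current object.
sub text {
    my ($class, $str) = @_;
    my $node = bless { text => $str, kids => [] }, $class;
    push @{ $current->{kids} }, $node if defined $current;
    return $node;
}

sub make_current { my ($self) = @_; $current = $self; return $self }

# end_object(): hand "current" back to this object's parent.
sub end_object { my ($self) = @_; $current = $self->{parent}; return $self }

# realize(): output this object and, recursively, its children.
sub realize {
    my ($self) = @_;
    return $self->{text} unless defined $self->{tag};
    my $html = "<$self->{tag}>";
    $html .= $_->realize for @{ $self->{kids} };
    return $html . "</$self->{tag}>";
}

package main;
```

Creating a BODY, then an H1 holding some text, ending the H1, and then creating a P yields <BODY><H1>...</H1><P>...</P></BODY>. Leaving out the call to end_object on the H1 makes the P a child of the H1 instead, which is why each object should be ended once you are finished with it.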
The simplest case is an HTML object that needs no attributes:
$line = new HTML::Base::HorizontalRule;
This creates a horizontal rule tag, making it the child of the current HTML object. After construction, the new object becomes the current object; therefore, the next HTML object to be constructed will be a child of this HorizontalRule object. You can prevent this from happening by calling
$line->HTML::Base::end_object()
This call to the end_object() function makes the line object's parent the current object again.
Some HTML objects require a mandatory first parameter. For example, HTML headings come in six levels (numbered 1-6); therefore, to create a Header object you can use either of the following two lines:
$h2 = new HTML::Base::Header 2;
This creates a level-2 heading as the child of the current HTML object.
All HTML objects accept attributes. The attributes (if any) are expected to follow any required parameters in the new call and take the form of simple key-value pairs, like this:
new HTML::Base::Anchor ('HREF','http://www.ikra.com/',
HREF is recognized as a valid attribute. Name is not used by the object because it's not in all uppercase letters. HTML objects that recognize attributes expect them to be set in the constructor. Consider the following line, which creates an HTML image reference:
new HTML::Base::Image
An image tag is created with the given SRC, ALT, and ALIGN attributes. Note that all attributes are in capitals; lowercase or mixed-case attribute names cause the attribute to be ignored.
Specifying the Body of the Text in HTML Documents
Use HTML::Base::Text objects for regular text in HTML. There are three attributes for this object: Text, Eval, and Verb. The output from all three is in the form of a paragraph. Text is a special-purpose HTML object that has no HTML tag associated with it.
Instead, it is meant to contain the text that makes up the actual content of the HTML document. A Text object that is a child of an HTML object outputs its text within the scope of its owner's HTML tags. When passed to the HTML::Base::Text constructor, the text to be displayed must be the first parameter, preceding any attributes to be set. The text may also be passed as the attribute 'Text', but if specified like this, it must be the first attribute given. All three of the following lines are equivalent:
new HTML::Base::Text "This is my text";
By default, the text is sanitized when an object is realized: special HTML characters (such as &) are translated into their HTML escape equivalents so that the browser displays them as ordinary text. Two other attributes are defined for the Text object. If Verb is defined in the constructor, the text is not processed in any way before being output. This allows you to pump out raw text "as is" to the HTML document, which is useful for sending code samples as part of the output; you are responsible for the sanity of such text. Similarly, if Eval is defined, the text is first passed to the Perl eval() function, and the output of that call is sent, unfiltered, to the output stream. Set Eval to 1 for evaluation to take place.
Controlling the Output Destination File
By default, all HTML output by the objects is directed to STDOUT. This can be changed using the OUTPUTFILE attribute of the Page object. The Page object creates the <HTML> and </HTML> tags and takes the attributes OUTPUTFILE and OUTPUTMODE. The OUTPUTMODE attribute can be set to APPEND or OVERWRITE. Thus, HTML::Base::Page not only outputs the <HTML> and </HTML> tags, but also controls the file handle to which output for a particular page of HTML is sent. Each Page object tracks its own output file handle, which allows you to nest Page objects in a hierarchy (if you want to).
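The default sanitizing of Text, and the Verb option that skips it, can be pictured with a few lines of plain Perl. This is a hand-rolled sketch rather than HTML::Base's own code: it assumes the four usual characters (&, <, >, and ") are the ones that need replacing, and render_text is a hypothetical helper whose $verbatim flag stands in for Verb.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace characters that HTML treats as markup with entity references.
# '&' must be replaced first, or the '&' in "&lt;" would be escaped again.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper mirroring the Verb idea: pass the text through
# untouched when $verbatim is true, otherwise sanitize it.
sub render_text {
    my ($text, $verbatim) = @_;
    return $verbatim ? $text : html_escape($text);
}
```

The order of the substitutions is the one design point that matters: escaping & after < would turn &lt; into &amp;lt; and show the entity text on the page.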
Here is the segment of code that tracks the page it's writing to and its respective output:
$page = new HTML::Base::Page ('OUTPUTFILE','first.html');
Here is the way to look at the output in two different HTML files:
$ more first.html
Using the Tables Feature in HTML::Base
The 0.6 release of HTML::Base includes support for generating tables for HTML 3.0 and later. See Listing 23.7 for an example. Note in Listing 23.7 how each table row object is created and then ended before a new one is created. The end step is not necessary when creating data items, because the object is smart enough to figure out which parent to use. Note where the ending </TR> tags fall in this output. This does not affect how Netscape displays the table, although it is not the "right" way to generate the table row end tags. I cover the correct way to end these <TR> objects in Listing 23.8.
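For comparison with the table objects in HTML::Base, here is a hand-rolled Perl sketch that emits the same kind of table directly and closes every row with its own </TR>. It assumes the colon-delimited fields printed by Listing 23.4 are the hit count, a date, and the file name; stats_table is a hypothetical name, and the colspan title row is there only to show column spanning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Build an HTML table from "hits:date:file" strings such as the
# 6:08/25/95:/bosnia1.htm line printed by Listing 23.4, keeping at most
# $max data rows (the same job $ctr does in Listing 23.5). Every <TR> is
# closed with its own </TR> as soon as the row is complete.
sub stats_table {
    my ($max, @lines) = @_;
    my $html = "<TABLE BORDER>\n"
             . "<TR><TH COLSPAN=3>Most requested files</TH></TR>\n"
             . "<TR><TH>Hits</TH><TH>Date</TH><TH>File</TH></TR>\n";
    my $rows = 0;
    for my $line (@lines) {
        (my $clean = $line) =~ s/\s+\z//;        # drop a trailing newline
        my @cells = split /:/, $clean, 3;
        next unless @cells == 3;                 # skip malformed lines
        last if $rows++ >= $max;
        $html .= '<TR>'
               . join('', map { my $c = html_escape($_); "<TD>$c</TD>" } @cells)
               . "</TR>\n";
    }
    return $html . "</TABLE>\n";
}

print stats_table(10, "6:08/25/95:/bosnia1.htm\n");
```

Writing each row's end tag in the same statement as its start tag makes the mismatched-</TR> problem impossible by construction; the module-based approach gets the same result only if every row object is ended.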
Listing 23.7. Using tables in HTML::Base.
1 #!/usr/bin/perl
Here is the output for the HTML tables in the output file:
<BODY>
Sure, the output does not look pretty as far as HTML pages go. However, the code generating this HTML output is abstracted from the HTML implementation below it. If the HTML specification is upgraded, the package optimized, or the module otherwise enhanced, your Perl scripts will not be affected as long as the interface is kept consistent.
Let's rewrite the getstats script with the CGI and HTML modules (see Listing 23.8). The placement of the <TR> and </TR> tags is now correct because each row object is ended correctly. Contrast the output of this listing with the output from Listing 23.7: you'll see how the </TR> tags are matched when objects are ended, and how they come out as one long list when objects are not ended correctly. As a rule, if you create a row, you must end it.
Listing 23.8. A rewrite of the getstats script using Perl modules.
1 #!/usr/bin/perl
Here's the output of the getstats rewrite:
Content-type: text/html
Making Clickable Images in HTML
Pictures often convey more information than gobs of text do; sometimes a picture or graph describes data better than a table. HTML documents allow you to display GIF or JPEG images. With CGI, you can even have "hot" portions of a GIF image, so that clicking the hot area of the image sends input from the client to the server. Currently, the images have to be in the GIF format.
To make an image "clickable," you have to define regions on the image in the form of rectangles, circles, or other closed polygons. Each defined region is then associated with a URL to follow if the click happens to fall in that region. The mapping of an image click to a URL is done by a program called imagemap. If you do not have the imagemap executable on your machine, you have to install it yourself. This installation is simpler than it sounds.
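The lookup an image map program performs, turning a click's x,y position into a URL, can be sketched in a few lines of Perl. This is not the NCSA imagemap source. It assumes NCSA-style map entries (rect with upper-left and lower-right corners, circle with a center and a point on its edge, poly with its vertices, point, and default), returns the first region that contains the click, and otherwise falls back to the nearest point and then to the default URL. The names lookup and in_poly are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of an imagemap-style lookup (NOT the NCSA source). Assumed map
# syntax, one entry per line:
#   rect URL x1,y1 x2,y2        (upper-left and lower-right corners)
#   circle URL cx,cy ex,ey      (center and a point on the edge)
#   poly URL x1,y1 x2,y2 ...    (vertices of a closed polygon)
#   point URL x,y               (used if no region is hit: nearest wins)
#   default URL                 (used if nothing else applies)

# Ray casting: toggle "inside" each time a ray to the right crosses an edge.
sub in_poly {
    my ($x, $y, $pts) = @_;
    my $inside = 0;
    for my $i (0 .. $#$pts) {
        my ($x1, $y1) = @{ $pts->[$i] };
        my ($x2, $y2) = @{ $pts->[$i - 1] };   # index -1 is the last vertex
        if ((($y1 > $y) != ($y2 > $y))
            && $x < ($x2 - $x1) * ($y - $y1) / ($y2 - $y1) + $x1) {
            $inside = !$inside;
        }
    }
    return $inside;
}

sub lookup {
    my ($x, $y, @map) = @_;
    my ($default, $near_url, $near_d2);
    for my $line (@map) {
        next if $line =~ /^\s*(?:#|\z)/;          # skip comments and blank lines
        my ($method, $url, @coords) = split ' ', $line;
        my @p = map { [ split /,/, $_ ] } @coords;
        if ($method eq 'default') { $default = $url; next }
        if ($method eq 'point') {
            my $d2 = ($x - $p[0][0])**2 + ($y - $p[0][1])**2;
            ($near_url, $near_d2) = ($url, $d2)
                if !defined $near_d2 || $d2 < $near_d2;
            next;
        }
        my $hit =
              $method eq 'rect'   ? ($x >= $p[0][0] && $x <= $p[1][0]
                                     && $y >= $p[0][1] && $y <= $p[1][1])
            : $method eq 'circle' ? (($x - $p[0][0])**2 + ($y - $p[0][1])**2
                                     <= ($p[1][0] - $p[0][0])**2 + ($p[1][1] - $p[0][1])**2)
            : $method eq 'poly'   ? in_poly($x, $y, \@p)
            :                       0;
        return $url if $hit;                      # the first region hit wins
    }
    return defined $near_url ? $near_url : $default;
}
```

Checking regions in file order and returning on the first hit is what makes overlapping regions resolve to whichever one appears first in the map file.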
For UNIX systems, your httpd daemon software should untar itself with a cgi-src directory containing the source for imagemap. For a CERN server, the program is called htimage, but building and installing it is very similar. Edit the source file to point to the location of your server's root tree, and build the executable using the makefile provided with the server software.
The imagemap program on almost all UNIX systems requires a file called imagemap.conf. The location of this file is compiled into the imagemap executable. If you are building imagemap yourself, edit the CONF_FILE constant in the source file to specify the location of the imagemap.conf file. The default line is shown here:
#define CONF_FILE "/usr/local/etc/httpd/conf/imagemap.conf"
The imagemap.conf file is a text file with one mapping per line. Each line has two parts. The first part is the name of the GIF file with its extension replaced by a colon. The second part is the absolute path to the map file for the image. Therefore, notepad.gif would have this entry on my system:
notepad: /usr/local/etc/httpd/htdocs/notepad.map
By convention, the image and the map file share the same base name: the .gif extension is for the image, and the .map extension is for the map file. The map file is also a text file, containing the methods to use when mapping mouse clicks on the image, one method per line. Each method in the map file defines a hot spot for the image and takes the form
method URL coordinate1 coordinate2 ... coordinateN
where coordinates take the form x,y. The number of coordinates depends on the type of hot spot. If regions overlap in a map file, the first region hit is returned. Here are the types of methods:
A sample mapping for a notepad.gif image is shown here:
point http://www.ikra.com/cgi-bin/pointer.pl 5,5
In the HTML page itself, you have to add the URL as shown in Listing 23.9. Note that the HREF URL for this image ends in imagemap/image-name.
Listing 23.9. The URL for the mapping.
1 <html>
Summary
This chapter has been an introduction to some of the techniques available to you for presenting data with CGI Perl modules. I covered ways of abstracting your CGI scripts from server implementations by using Perl modules. I also covered how to show data in a table and how to collect user input via images. The two modules covered in this chapter, the CGI and HTML modules, are available from the CPAN archives at http://www.perl.com. Both modules provide a clean interface for your CGI scripts and can also be used to generate your own HTML documents.