Chapter 8 -- Documenting Perl Scripts

Embedding man Pages
The POD Format
Translating POD into Other Formats
Summary

This chapter introduces you to the types of documentation available for Perl scripts. I cover two documentation methods: embedding nroff man pages and using the plain old documentation (POD) format. Documenting your Perl scripts is easy and standardized enough in Perl 5 to allow for generating LaTeX, HTML, and man pages from the source files.

Embedding `man` Pages

Generally, you have to keep the software documentation for your program in a file separate from the source code. This separate storage forces you to remember two different files for every change you make: one for the source file and the other for the documentation. The consequence is that the source file is almost never in sync with the documentation.

Since Perl 4, you can keep your documentation and code in the same source file. The way to do this is to use the nroff tags in the man package and embed these codes in a Perl source file. This trick works only if you are using the -man package with nroff; therefore, if your system is not UNIX-like or if you abhor nroff, you should skip ahead to the next section, which is on POD formats.

The way to do this embedding was documented initially in the book Programming Perl by Larry Wall and Randal Schwartz, published by O'Reilly & Associates. Wall also wrote a shell script for that book, called wrapman, that performs the embedding from a template. However, it will be instructive to see how the method works.

The trick in embedding man pages is to use the .di and .ig commands in nroff. The .di command "diverts" text into an nroff macro. It is used in pairs: .di X goes at the top of the text to be diverted, and .di with no arguments goes at the bottom. The first .di X asks nroff to divert all the text into macro X until it sees .di at the start of a line. The .ig command works in a similar way, but it forces nroff to ignore all text between .ig X and the next .X tag. Now comes the important part: The leading period (.) in both the .ig and .di commands can be replaced with a single quote to get 'ig and 'di commands that do the same thing as the .ig and .di commands, except that they do not cause a line break in the output. Note also that the single quote is, interestingly enough, also the defining character for a string in Perl.

Another point to remember is that Perl stops interpreting your script when it sees an __END__ token. This stopping feature can be used to your advantage because you can put all of your text after the __END__ token.

So if you were to add the following two statements to the start of a Perl script, your script would still run:

    'di';
    'ig00 ';

As far as Perl is concerned, these two lines are simply strings. For nroff, the two lines are requests. The first line, 'di';, starts a diversion that lasts until nroff sees 'di on a line by itself. The next line, 'ig00 ';, makes nroff ignore text until it sees a line that begins with .00.

Now, at the end of the source file, place the following lines, which are valid in Perl and in nroff:

    .00 ;          # Terminates the .ig processing

    'di            # Terminates the 'di X processing.
    .nr nl 0-1     # Sets the page to the start of the document
    .nr % 0        # Sets the page count back to zero
    '; __END__     # Terminates the 'di macro and all Perl interpreting

'di and '; really do define a Perl string between single quotes. __END__ stops Perl from processing further, and .00 is harmless in Perl, which reads it as a numeric constant.
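One way to convince yourself that Perl tolerates these lines is to run a stripped-down script that contains little more than the wrapper. The following is a minimal sketch and not part of the original listing: the helper name run_script and the DEMO man page text are made up for illustration, and the 2>&1 redirection assumes a UNIX-like shell. It writes the wrapper to a temporary file, runs it with the current Perl interpreter, and returns whatever the script printed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A bare-bones script that uses the nroff wrapper lines from this section.
# The single-quoted heredoc keeps every character exactly as written.
my $script = <<'SCRIPT';
'di';
'ig00 ';
print "still running\n";
.00 ;

'di
.nr nl 0-1
.nr % 0
'; __END__
.TH DEMO 1
Perl never compiles anything down here.
SCRIPT

# Hypothetical helper: save the source to a temporary file, run it with
# the same perl binary that is running this program, and return its output
# (standard output and standard error combined).
sub run_script {
    my ($source) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.pl', UNLINK => 1);
    print {$fh} $source;
    close $fh or die "Cannot close $path: $!";
    return scalar qx{"$^X" "$path" 2>&1};
}

print run_script($script);
```

If the wrapper is well formed, the only output is the line printed by the embedded script. A syntax error in the wrapper would show up in the returned text instead, because standard error is captured as well.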

Now you can place the man page contents after the line containing the __END__ statement. Look at the sample listing shown in Listing 8.1. The output from this listing is shown in Figure 8.1.

Figure 8.1 : man page output

Listing 8.1. Embedding man pages in Perl.

     1 #!/usr/bin/perl
     2 'di ';
     3 'ig 00 ';
     4
     5 print "$#ARGV \n" ;
     6 if ( $#ARGV ) {
     7     print "\n Usage: $0 file \n";
     8     exit 0;
     9 }
    10 $name = $ARGV[0];
    11
    12 print "\nTesting flags for $name \n";
    13 print "\n========== Effective User ID tests ";
    14 print "\n is readable" if ( -r $name);
    15 print "\n is writable" if ( -w $name);
    16 print "\n is executable" if ( -x $name);
    17 print "\n is owned " if ( -o $name);
    18 print "\n========== Real User ID tests ";
    19 print "\n is readable" if ( -R $name);
    20 print "\n is writable" if ( -W $name);
    21 print "\n is executable" if ( -X $name);
    22 print "\n is owned by you" if ( -O $name);
    23
    24 print "\n========== Reality Checks ";
    25 print "\n exists " if ( -e $name);
    26 print "\n has zero size " if ( -z $name);
    27 print "\n has some bytes in it " if ( -s $name);
    28
    29 print "\n is a file " if (-f $name);
    30 print "\n is a directory " if (-d $name);
    31 print "\n is a link " if (-l $name);
    32 print "\n is a socket " if (-S $name);
    33 print "\n is a pipe " if (-p $name);
    34
    35 print "\n is a block device " if (-b $name);
    36 print "\n is a character device " if (-c $name);
    37
    38
    39 print "\n has setuid bit set " if (-u $name);
    40 print "\n has sticky bit set " if (-k $name);
    41 print "\n has gid bit set " if (-g $name);
    42
    43 print "\n is open to terminal " if (-t $name);
    44 print "\n is a Binary file " if (-B $name);
    45 print "\n is a Text file " if (-T $name);
    46
    47 print "\n is Binary to terminal " if (-t $name);
    48 print "\n is open to terminal " if (-t $name);
    49
    50
    51 .00 ;
    52
    53 'di \" finish diversion
    54 .nr nl 0-1 \" Start new page with -1
    55 .nr % 0 \" start at page 1
    56 '; __END__ #### Start Man Page ####
    57
    58 .TH Test 1 "Apr 15, 1996"
    59 .AT 3
    60 .SH NAME
    61 tf - Test file attributes
    62 .SH SYNOPSIS
    63 .B tf file
    64 .P
    65 .B tf directory
    66 .SH DESCRIPTION
    67 .I tf
    68 Prints out the file attributes for a file.
    69 .SH FILES
    70 Just add perl.
    71 .SH AUTHOR
    72 Kamran Husain.
    73 .SH BUGS
    74 We don't believe in bugs, we introduce features.

Warning

You might have to adjust the spacing on lines 2, 3, and 51 if you are using a different version of nroff. On two machines running GNU groff, these lines did not work, although they worked fine on a Sun:

    'di';
    'ig00';

To get the two lines to work properly, I had to introduce a space in the calls to the macros:

    'di ';
    'ig 00 ';

You have been warned. The limitation of this method should be obvious by now: It's useful for generating one man page for one source file. In addition, it's too heavily tied to the nroff package. The man page will not be generated on NT machines that do not have the nroff packages installed by default. Obviously, something more generic is needed. This is where the POD format comes in.

The POD Format

The Perl plain old documentation (POD) format is designed to be an easier way to get your Perl files documented. Once you have documented your files in the POD format, you can use a translator program to convert your documents into HTML, LaTeX, or man pages. Nothing really prevents you from writing your own translator program; however, once you convert your documents into HTML, you can use off-the-shelf products to convert them into other word processing formats. For example, the Internet Assistant for Microsoft Word lets you read and convert HTML into a variety of formats.

The POD format lets you introduce some formatting directives into your source files. Note that the formatting terms in Table 8.1 all begin with an equal sign (=).

Table 8.1. Formatting terms for POD files.

Term	Description
`=pod`	Begins formatting. The Perl interpreter ignores all text until it sees the `=cut` directive.
`=cut`	Stops formatting. Only POD-related text is found between the `=pod` and `=cut` directives.
`=head1`	Header level 1.
`=head2`	Header level 2.
`=over N`	Starts indentation by moving the text to the right by `N` columns. By convention, the value of `N` is `4` to accommodate the translation programs; however, it does not have to be `4`.
`=back`	Nullifies a previous `=over` directive. An `=over/=back` pair is used to print lists of items.
`=item C`	Specifies an item to be used between `=over/=back` pairs. `C` is a character or number to use as the bulleted item. There must be at least one `=item` in an `=over/=back` list.

An example here will help. In Listing 8.2, a file called tf.pod is constructed to document the man page in POD format.

Listing 8.2. A sample POD file.

     0 #!/usr/bin/perl
     1 =pod
     2
     3 =head1 NAME
     4
     5 tf - Test file attributes
     6
     7 =head1 SYNOPSIS
     8
     9 Usage:
    10
    11 tf F<file>
    12
    13 tf F<directory>
    14
    15 =head1 DESCRIPTION
    16
    17 The first thing to remember is that text is not formatted in a pod
    18 file but rather in the formatter. Paragraphs are left as they are.
    19
    20 The B<tf> program (notice how tf is bold) works on these items:
    21
    22 =over 4
    23
    24 =item * Files
    25
    26 Just file names in your directory tree. The file name could be a
    27 regular file, socket, device or a link.
    28
    29 =item * Directories
    30
    31 Yes, it'll work on directories too.
    32
    33 =back
    34
    35 Ship it!
    36
    37 =head1 BUGS
    38
    39 Remember the note about features?
    40
    41 =head1 Header 1
    42
    43 This is a header 1
    44
    45 =head2 Header 2
    46
    47 This is header 2 in I<Italics>.
    48
    49 =head2 Another Header 2
    50
    51 This is header 2 in B<BOLD>.
    52
    53 Another list with non-bulleted items.
    54
    55 =over 5
    56
    57 =item First
    58
    59 This is the First item.
    60
    61 =item Second
    62
    63 This is the Second item.
    64
    65 =item Third
    66
    67 This is the Third item.
    68
    69 =back
    70
    71 =cut
    72 ... the rest of the script will be here ...

Line 1 begins the POD portion, and line 71 is where POD processing is cut. Line 72 is where the executable code would start; that is, right after the line that contains the "=cut" tag. Line 0 is present if this is an executable script and absent if this is only a Perl file. Every tag is separated from its neighbors by a blank line. POD translators treat a blank line as the end of a paragraph, so a tag that runs directly into the next line can be read as part of the same paragraph. The blank lines also make the POD documentation more readable.

Now, look at line 20 in Listing 8.2. The B<text> tag is used here to place text in bold typeface. Several tags exist for formatting text. Table 8.2 lists these tags.

Table 8.2. Tags for formatting text.

Tag	Description
`B<text>`	The `text` is placed in bold.
`I<text>`	The `text` is placed in italics.
`S<text>`	The `text` contains non-breaking spaces.
`C<code>`	A literal code for the formatter.
`L<name>`	A link to a `man` page referred to by `name`.
`L<name/sec>`	A link to a section `sec` in a `man` page referred to by `name`.
`L<name/"sec">`	A link to a section in this `man` page.
`L<"sec">`	A link to a section in this `man` page.
`F<file>`	A file name.
`X<index>`	An indexed entry.
`Z<>`	A zero width character.

In most cases you only wind up using the B<> and I<> tags, as you'll see in the documentation that comes with Perl. Refer to Listing 8.2 to see how some of the formatting codes are used in POD files.

The POD information in a file can be included just about anywhere in a source file, although it's best to place this information either at the top or bottom of the source file. As long as you keep your =pod/=cut and =over/=back pairs matched, you shouldn't run into any problems.
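To see that POD really can sit between pieces of code, here is a minimal sketch of a script with documentation placed before and after a subroutine. The script name area_demo and the subroutine rectangle_area are hypothetical and exist only for this illustration. Perl skips everything from a POD tag up to the matching =cut, so only the code runs.

```perl
use strict;
use warnings;

=pod

=head1 NAME

area_demo - hypothetical example of POD placed between subroutines

=cut

# Ordinary code resumes immediately after the =cut line.
sub rectangle_area {
    my ($width, $height) = @_;
    return $width * $height;
}

=head2 rectangle_area

Returns the width multiplied by the height.

=cut

# This statement runs even though it is surrounded by documentation.
print rectangle_area(3, 4), "\n";    # prints 12
```

Each documentation block here is closed with its own =cut, which is the matching that the paragraph above asks for.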

Translating POD into Other Formats

Three filters exist that convert POD formatted documents into three different formats. Here's a list of these filters:

Filter	Description
`pod2html`	Used to convert POD files to HTML files
`pod2man`	Used to convert POD files to `man` pages
`pod2latex`	Used to convert POD files to LaTeX files

To run these programs, simply type the command and the filename. For example, to generate HTML files from the POD file shown in Listing 8.2, run this command:

    pod2html tf.pod

You'll find that running the pod2html program on tf.pod created a file called tf.html in your directory. The output for Listing 8.2 is shown in Figure 8.2.

Figure 8.2 : HTML output from pod2html
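As noted earlier in the chapter, nothing prevents you from writing your own translator. The following is a minimal sketch of that idea and not how pod2html or the other filters are implemented: the function name pod_to_text and the plain-text layout it produces are assumptions made for this example. It understands only =head1, =head2, =item, and =cut, and it strips the B<>, I<>, F<>, and C<> tags instead of rendering them.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a small subset of POD into indented plain text.
sub pod_to_text {
    my ($source) = @_;
    my @output;
    my $in_pod = 0;    # true between the first POD tag and the next =cut

    # POD is organized in paragraphs separated by blank lines.
    for my $paragraph (split /\n\s*\n/, $source) {
        if ($paragraph =~ /^=(\w+)\s*(.*)/s) {
            my ($command, $text) = ($1, $2);
            if ($command eq 'cut') {
                $in_pod = 0;
                next;
            }
            $in_pod = 1;

            # Remove simple formatting tags, innermost first.
            1 while $text =~ s/[BIFC]<([^<>]*)>/$1/g;

            if    ($command eq 'head1') { push @output, $text }
            elsif ($command eq 'head2') { push @output, "  $text" }
            elsif ($command eq 'item')  { push @output, "    $text" }
            # =pod, =over, and =back produce no output of their own here.
        }
        elsif ($in_pod) {
            my $text = $paragraph;
            1 while $text =~ s/[BIFC]<([^<>]*)>/$1/g;
            push @output, "    $text";
        }
        # Paragraphs outside POD are program code and are skipped.
    }
    return join("\n", @output) . "\n";
}

my $sample = <<'END_POD';
print "code before the POD is skipped\n";

=pod

=head1 NAME

tf - Test file attributes

=head1 DESCRIPTION

The B<tf> program works on F<file> names.

=cut

print "code after =cut is skipped too\n";
END_POD

print pod_to_text($sample);
```

The sketch relies on the blank lines around each tag to split the input into paragraphs, which is one more reason to keep them in your own POD. A real translator also has to handle the remaining tags from Table 8.2, nested lists, and verbatim paragraphs.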

Summary

This chapter covered two ways of documenting Perl files: one using man pages and the other using POD documentation. man pages can be embedded in the source file, but they require the use of nroff with the man package. POD files are more generic in that you can use translators to convert from POD to one of three known formats: HTML, man, or LaTeX. In extreme cases, you can even write your own Perl script to decode the POD format and write files in your own format. If you really need to do something elaborate, you might want to consider taking the formatted HTML output from a pod2html program and placing the output in a word processor, such as Microsoft Word, to edit the HTML file directly.
