Online Documentation Server
 ПОИСК
ods.com.ua Web
 КАТЕГОРИИ
Home
Programming
Net technology
Unixes
Security
RFC, HOWTO
Web technology
Data bases
Other docs

 


 ПОДПИСКА

 О КОПИРАЙТАХ
Вся предоставленная на этом сервере информация собрана нами из разных источников. Если Вам кажется, что публикация каких-то документов нарушает чьи-либо авторские права, сообщите нам об этом.




Chapter 31

Generating Code


CONTENTS


This chapter introduces you to using Perl to solve real-world problems. The ideas you'll gain from this chapter pertain to applying Perl to solve coding problems. The sample problem chosen is complicated enough to be encountered during your programming endeavors. Remember to concentrate on how the problem is solved, not on for what the final solution is being used.

Introduction

The main concepts introduced in this chapter have nothing to do with navigation and seismic fields. However, the problem that I address in this chapter is related to writing a seismic navigation record parser. This parser was crucial in getting a delayed project up and running in the field. Rather than spend days writing a parser in C, we were up in a few hours of coding effort. Plus the concepts gained during this experience helped me write parsers for other data formats for the same project with just as little effort.

Basically, I'll be covering ways to use Perl to generate C code for a parser. Instead of writing another phone book manager or a database for music or home inventory records, it's probably better that we deal with a real-world example. Perhaps after reading this chapter and seeing how this problem was tackled, you can draw parallel solutions for your current problems.

The sample project involves writing a parser to read FORTRAN-based records for a seismic survey. Most seismic navigation data, believe it or not, is based on an archaic standard based on the use of FORTRAN programs to read it. The standard is known as the UKOOA P2/86 standard. Sometimes this format is simply referred to as P286. This format was developed for old FORTRAN programs, and a huge amount of data still exists in this format. Now with the wide acceptance of C for most of the code being developed for graphical interfaces and numeric processing, it's only natural to look for ways to read these files without requiring the FORTRAN executable.

Of course, the decode and encode functions for P286 had to be done yesterday. Given the options of getting another job or writing 100+ functions and structure declarations, I tried tackling this problem with the lazy programmer's approach: Let Perl do the grunt work.

Here is what we were dealing with. Each record of data in a P286 data file is exactly 80 characters long with a trailing newline or null character. Fields within a record are based on character column positions and lengths in the record. Columns are numbered from 1 up. Because white spaces and commas can be part of the data, there are no "field separators" as such.

The first few characters of a record identify the type of data in the record. For example, records begin as H0001, H0002, and so on, with the rest of the characters as fields within the record. There are 52 such records in all. Records in a file are not in sequential order. The first identifying characters are between three and five characters long. The only guarantees are that there will always be only one record per line and that there will be no blank lines.

By reading the specification I discovered these things that would make the coding process programmable:

  • The only variables that could be defined in all records were either integers, doubles, or character strings. This implied a closed set of variables to check for.
  • The specification of all the headers was explicit about where each data item began in a line and where it ended. The lack of delimiters between fields was no longer a problem since a line could be chopped into substrings quite easily.
  • These were text files instead of packed binary files. This made debugging a little easier because no special routines were required to display the data.

Voilà! The two important criteria (consistency and working with a closed set) made it possible to do the encoding and decoding functions manually. By using a Perl program to do the coding for me, I reduced the possibility of errors. Any one error would be propagated to all functions and would be easy to catch and fix.

The most obvious question was, Why not use Perl to do all the decoding and encoding? The encoding and decoding routines were to be incorporated into a C program running on different platforms running DOS or another lower operating system. Embedding Perl within the C program would involve installing and maintaining Perl on these platforms. Most of the platforms the final code would run on did not support Perl.

Choosing the Input File

After reading the specification, I extracted all the header declarations into one file called P286hdrs. The specification listed the contents of headers in plain text in between two keywords, RECORD and END. The specifications were not consistent enough to be extracted using a program. Actually, I extracted most of the lines defining the format using an awk script:

/RECORD/,/^ENDREC$/ { print ; }

This script removes all the lines between the lines containing RECORD and END. I still had to do some editing after extracting all these records to get the correct input format.

The case presented in this example had a text file with the following format:

RECORD TypeOfRecord StringToUse
    variableName    varType    startCol endCol [format]
    variableName    varType    startCol endCol [format]
    variableName    varType    startCol endCol [format]
        [REPEAT count st1 st2 ... ]
    variableName    varType    startCol endCol [format]
    variableName    varType    startCol endCol [format]
    variableName    varType    startCol endCol [format]
ENDREC

The variableName would be the name of a variable in a structure; the VarType would be int, double, or char. The startCol and endCol values defined the locations in the string where the data could be picked up. The first column was still numbered 1 instead of 0. It's easier to increment by 1 in a program than to change so many declarations.

Some blocks of variables in some records were repeated. These were defined after the option REPEAT keyword. The syntax for the REPEAT keyword was this:

REPEAT count st1 st2 ... stN

The st1 to stN are the starting offsets for all the fields that follow the REPEAT word. The count specified the number of times to repeat these blocks.

For example, the following record is interpreted as "Record H0001, with one variable starting at column 29 up to column 80."

RECORD H0001
    SurveyType char 29 80
ENDREC

Another example of a record using more than one field is shown here:

RECORD H011
    datumId    int 5 5
    spheroidName     char 6 23
    datumName     char 24 41
    semimajorAxis    double 42 53     12.3
    conversionFactor double 66 77    12.8
    inverseFlattening double 66 77     12.7
ENDREC

Note how only the first four characters are relevant in identifying the record. Also, the integers and characters don't have a floating point string, whereas the numbers defining the double type do have a floating point specification of the form: length.decimals. The length is the total number of columns in the number including the decimal point, the decimals portion is the number of digits to the right of the decimal point. For example, 12.8 will occupy 12 character spaces in one column, 3 digits to the left of the decimal point and 8 digits to the right of the decimal point. The format for the floating point number is the same as that for a printf() statement in C.

Another example is a record using REPEAT fields, as shown here:

RECORD E3100
    velprop  double 6 12 7.2
    REPEAT   5  13 26 39 52 65
    srcNdx   int 13 15
    dstNdx   int 16 18
    slant    double 19 25 7.2
ENDREC

In this record type, the block {srcNdx,dstNdx,slant} is repeated at columns 13, 26, 39, 52, and 65. This implies that each of these variables can be interpreted as arrays of five elements each.

The entire file for parsing these records is about 520 lines long. A shorter sample file is shown in Listing 31.1. Note how comment lines in that listing are inserted in this input file with the use of the # character. Actually, any lines could be used for comments as long as the comments are outside the confines of RECORD and ENDREC statements. The reason to use the hash is to maintain some consistency with Perl.


Listing 31.1. The input file.
 1 #
 2 # Comment lines are permitted in the usual Perl style.
 3 #
 4 RECORD H0001
 5     SurveyType  char 29 80
 6 ENDREC
 7
 8 RECORD H0010
 9     numPatterns int 6 7
10     sblInUse    int 8 8
11     sattInUse   int 9 9
12     numVessels  int 10 10
13     numDatum    int 11 11
14     offsetMode  int 12 12
15 ENDREC
16
17 RECORD H011
18     datumId     int 5 5
19     spheroidName     char 6 23
20     datumName     char 24 41
21     semimajorAxis    double 42 53     12.3
22     conversionFactor double 66 77    12.8
23     inverseFlattening double 66 77     12.7
24 ENDREC
25
26 RECORD E3100
27     velprop  double 6 12 7.2
28     REPEAT      5  13 26 39 52 65
29     srcNdx     int 13 15
30     dstNdx      int 16 18
31     slant    double 19 25 7.2
32 ENDREC

Parsing Records

Now that we have an input file, let's tackle parsing the records within this file. The most likely way to tackle this problem is to generate three files from the input file: one header, one file with all source code for the decoder, and one source file for the encoder.

The pseudocode looks something like this:

open file for input
open files for output
while  (more records)
      
 if recognized start of record
    start structure definitions
    start encoder function preamble
    start decoder function preamble
      
 if recognized end of record
    terminate structure definitions
    terminate encoder function preamble
    terminate decoder function preamble
    if within record
    generate structure variable definitions
    generate encoder function parsing for variable
    generate decoder function parsing for variable
close all files

Perl has the capability to have more than one file open at once. By running the input file through one parser, you can generate three files simultaneously. All you do is send the output to its respective file. In this case, three files are opened for output: HDRS for the header declarations, EncD for the encoding output, and DECD for the decoding output. The SAFE handle is used to read in the input records from the P286hdrs file. The lines to open the file are

open (SAFE, "P286hdrs") || die "Cannot open Input file  $!\n";
open (HDRS, ">P286.h")  || die "Cannot open  Header $!\n";
open (EncD, ">P286enc.c")  || die "Cannot open  Encoder $!\n";
open (DECD, ">P286dec.c")  || die "Cannot open  Decoder $!\n";

After the files are opened, some preamble stuff is required for each source and header file. The calls to the these functions provide the initialization. The contents of each file are destroyed when the files are opened, so you have to initialize each file:

&startHeaderFile();
&startEncoderFile();
&startDecoderFile();

Then, a while loop simply reads in all the input, one record at a time.

After chopping off the terminating newline, the incoming line is examined to see whether there are any comments or if it's a blank line. If either case is true, the line is discarded. Look at the following code:

while (<SAFE>) {
    chop($_);
    if (/^#/) { next; }
    if (/^\s*$/) { next; }

If the line appears to be non-empty, it's split into the @names array and examined for the tokens RECORD, ENDREC, and REPEAT. The default case is to process variable types and generate either structure variable declarations or code for encoding and decoding their values.

If it's a RECORD token, a new declaration is started for a structure. The name of the record is in $rname, with the $rtype as the type of record. Note how the $_ is used twice when splitting the record. The value of $_ is not modified with a call to the split() function nor is any function called that will modify the value of $_. The three functions, &startHeaderRecord($rtype), &startEncoderFunction($rtype), and &startDecoderFunction($rtype) take the P286 record type and generate a header declaration, an encoder function preamble, and a decoder function preamble. We also mark the fact the we are starting a new record by setting a flag $inRecord to 1. Further processing of the incoming line is halted with a call to the next() function. The $repeat flag is set to 0 to start a new record and to stop any previous declarations for any previous records. The fragment of code to start each type of data is shown here:

if (/^RECORD/) {
    ($rname,$rtype,@rest) = split(' ',$_);
    &startHeaderRecord($rtype);
    &startEncoderFunction($rtype);
    &startDecoderFunction($rtype);
    $inRecord = 1;
    $repeat = 0;
    next;
   }

The code in while also has to check when the record has ended with the receipt of an ENDREC token. When ENDREC is seen, three functions are called to close up the structure and function declarations started in the RECORD structure. Because we are no longer within a record, the value of $inRecord is set to 0 and the next function is called to skip further processing of this record. The fragment of code to do this cleanup is shown as this:

if (/^ENDREC/) {
    &stopHeaderRecord($rtype);
    &closeEncoderFunction();
    &closeDecoderFunction();
    $inRecord = 0;
    next;
    }

The REPEAT block is hit if the word REPEAT is the first word on a new line. Note that in RECORD and ENDREC token recognition, we looked at the start of a new line, whereas with the REPEAT keyword, we look for the REPEAT token after some white spaces from the start of a new line. The offsets are derived in two stages. The first stage gets the number of offsets to work within $count with the split call. The first stage puts the REPEAT line's variables into an array called @allOffsets. The next stage calls the splice() function to extract the subset of items starting from item number 2 in @allOffsets. The @offsets array then has the offsets in a record where the rest of the variables will be repeated in blocks. The next function is called to proceed to the next line of the input file.

The fragment of code to do this is shown here:

if (/^[\s]*REPEAT/) {
    $repeat = 1;
    @allOffsets = split(' ',$_);
    $index = $allOffsets[0];
    $count = $allOffsets[1];
    print "INDEX = $index, COUNT= $count";
    @offsets = splice(@allOffsets,2);
    next;
    }

Finally, the default processing begins for a line. The first thing to do before attempting to parse a line is to see whether we are in the middle of a record. Because the input file may contain free-form text, too, in the future, this is a bit of insurance to help prevent any variables from being accidentally declared.

The incoming line is parsed to extract five values into an array. The input string $_ is split on white spaces. The call for this is

($vname,$vtype,$from,$to,$fmt) = split(' ',$_);

There are two ways to process a variable in our case. One is when a variable is by itself and another is when the variable is in a block being repeated. If a block is being repeated, it's easier to simply declare an array and parse into it. Here's the code to handle this part:

if ($inRecord)
    {
    ($vname,$vtype,$from,$to,$fmt) = split(' ',$_);
    if ($repeat == 0)
        {
        &makeHeaderItem($vname,$vtype,$from,$to);
        &encodeVariable($vname,$vtype,$from,$to,$fmt);
        &decodeVariable($vname,$vtype,$from,$to,$fmt);
        }
    else
        {
         &makeArrayedItem($vname,$vtype,$count,$from,$to);
        $offsetLen   = $to - $from + 1;
        $offsetCount = 0;
        foreach $x (@offsets) {
            $offsetFrom  = $x;
            $offsetTo    = $offsetLen + $x;
            $offsetName = sprintf "%s[%d]", $vname,$offsetCount;
            print "Name = $offsetName, COUNT= $count\n";
            &encodeVariable($offsetName,$vtype,
                            $offsetFrom,$offsetTo,$fmt);
            &decodeVariable($offsetName,$vtype,
                            $offsetFrom,$offsetTo,$fmt);
            $offsetCount++;
            }

        } ## end of else clause.
    }
} # of while loop.

The while loop continues to process each line in the input file until all the record definitions have been completed. After the while loop ends, any terminal processing that be must done is completed and all open files are closed:

close (SAFE);
&closeHeaderFile();
close (HDRS);
close (EncD);
close (DECD);

When the program terminates, you should have three files in the directory: P286.h, P286enc.c, and P286dec.c. The headers are declared in the P286.h file, encoding functions are declared in the P286enc.c file, and the decoding functions are declared in P286dec.c. The acid test really is to see if the code compiles. Try these commands-you should see no errors:

gcc -c P286enc.c
gcc -c P286dec.c

Alas, we still have to write the code to use these functions. But that's really beyond the scope of this book. What's important to see is that in a few hours or so, we have created the boring part of the application and are now ready to proceed with using these tools.

During this discussion I have glazed over the details of how the header and source files are created in the subroutine calls we made. Let's take a look at the details of how these functions work.

Writing the C Header Files

The first task for generating the header file is to create the preamble to the include file being created in the file that is pointed to by the HDRS file handle. The code to do this is as follows:

sub startHeaderFile() {
    print ( HDRS "#ifndef P286_HDRS\n",
 "#define P286_HDRS 1\n",
 "#define STRPTR   char *\n");
    }
When the file is closed, you'll want to put in an #endif statement to allow multiple inclusions of the header file. This is done with a call to the subroutine sHeaderFile():
sub closeHeaderFile() {
    printf HDRS "\n#endif\n";
    }

The empty pair of parentheses in the function declaration is used in Perl 5.002 or later to define a function prototype that allows for no input parameters. When a RECORD header is received, it generates two items: a #define token for a header number and the preamble for the structure to use. The token is helpful if you want to create a parser that does a switch() statement on a type of structure. To make sure that each token has a unique value, a counter is kept in $recordCounter for use in assigning a record type a unique value. The code to do this is shown here:

sub startHeaderRecord {
    my ($name) = @_;
    printf HDRS "\n#define P286_%s_RECTYPE %d ", $name, $recordCounter++;
    printf HDRS "\n\ntypedef struct P286_%s_type {", $name;
}

By assigning the @_ array to my($name) we are actually permitting more than one argument into the function even though only the first argument is used. The contents of the @_ array are not altered in this case. Using a command like my $name = shift @_ would achieve the same purpose but would also alter the contents of the @_ array.

Each structure definition being created has to be stopped. Two variable types are constructed: P286_HEADERNAME_TYPE and a pointer type to the structure *P286_HEADERNAME_PTR. The code to do this is shown here:

sub stopHeaderRecord($name) {
    my $name = shift;
    my $ntype = "P286_" . $name . "_TYPE";
    my $nptr  = "*P286_" . $name . "_PTR;";
    print HDRS "\n}" .  uc($ntype) . "," . uc($nptr) ." \n";
}

Note the syntax used for the function prototype shown previously. The $name variable declaration in the argument list is only valid in Perl 5.002 or later. Formal parameter lists to subroutines in Perl are not completely supported in Perl as we go to print. It does not hurt to be prepared for the future by including formal parameter lists if they do not affect the underlying code in the subroutine itself. If you want to force Perl to take only one parameter into this subroutine, you can also declare this as

sub stopHeaderRecord($) { … }

The dollar sign in the parentheses will be used by the Perl interpreter as an indicator that only one parameter is allowed into the subroutine.

For a non-arrayed item, three variable types are created: a char string of fixed length, an int, and a double. The type of variable to generate a declaration for is passed in as a parameter to the makeHeaderItem function. Comments are also generated in the header file pointed to by the HDRS file handle to indicate what columns the data points to. These comments serve as a cross- reference for when you are debugging the generated code:

sub makeHeaderItem {
    my($vname,$vtype,$from,$to) = @_;
    my $len;
    if ($vtype eq 'char') {
    $len = $to - $from + 2 ;
    printf HDRS "\n %s %s[%d]\; \/* %d %d *\/ ",
            $vtype, $vname, $len, $from, $to;
    }
    else
    {
    printf HDRS "\n %s %s\; \/* %d %d *\/", $vtype, $vname, $from,  $to;
    }
}

If the variable being generated is in a REPEAT block, it's declared as an array with a call to the MakeArrayedItem() function. The call to generate this arrayed item is

sub makeArrayedItem {
    my ($vname,$vtype,$count,$from,$to) = @_;
    printf HDRS "\n    %s %s[%d]; \/* from %d %d *\/ ",
            $vtype, $vname, $count, $from,  $to;
}

This code generates the P286.h headers file. Now let's look at the files that will contain the encode and decode functions.

Writing the Encoder Source File

To create the C source file to create encoder for the data file, use the EncD file handle. The subroutine startEncoderFile() starts the preamble for the file, which includes two items. The first is the call to include the header file, which is also being generated. The second is to write out the code to a function that pads spaces to the right of an incoming string to make the length equal to 80 characters plus a null character. The code to perform this preamble is shown here:

sub startEncoderFile {
print (EncD "\/*** C source file to encode records.",
    "\nDon't edit this file\n ",
    "*/\n",
    '#include "p286.h" ',
    "\n",
    "\n/* The incoming buffer must be 81 chars! */",
    "\nvoid padTo80(STRPTR buffer)\n{\n",
    "int i,ln;\n",
    "ln = strlen(buffer);\n",
    "for(i=ln;i<80;i++) buffer[i] = ' '; \n ",
    "buffer[81] = 0; /* NULL terminate the string*/",
    "\n} /* end of padding function */\n",
    "\n");
}

As each new RECORD type is encountered in the input file, its corresponding encoding function header is created in the output file. A function header is created that takes a string to put an encoded record in and a pointer to a structure to unpack. Because the name of the record being parsed is passed into the code generation function, it's easy to derive the pointer name for it: P286_HEADERNAME_PTR. The code to accomplish this is shown here:

sub startEncoderFunction {
    my($vname) = @_;                 # Pick up name of record.
    print (EncD "\n/*: ",
        "\n** Generated by Perl script -- Avoid editing\n*/\n");
    printf EncD
        "void encode_%s_type(STRPTR buffer,P286_%s_PTR sp)\n{",
        $vname, $vname;
    print (EncD "\nSTRPTR ncp; \n",
        "\nSTRPTR cp; \n",
        "register int i; \n",
        "char tempbuffer[80];\n");
}

When an ENDREC line is read, the function has to be closed. Unmatched RECORD and ENDREC lines in the input file will cause bad, uncompilable code to be generated. The cleanup at the end of the encode function is done by adding a call the padding function, padTo80, and printing out the ending curly brace:

sub closeEncoderFunction {
    print (EncD "\npadTo80(buffer);",
           "\n} /* End of encoding function */ \n");
}

As with structure declarations, two types of variables have to be parsed. One is a non-arrayed element and the other is an arrayed element. However, within the original loop the $vname being passed into this function is already set up as a variable or a member of an array so that no further processing is necessary.

The three types of variables used by the parsing encoder are used to generate the code. If the variable is a string, it's simply cut and pasted into its columns in the outgoing buffer. If it's an integer, the value of the integer is printed in the columns in the output buffer for the integer. For a double, the $fmt string contains the format string to explicitly place the decimal point at the right location in the columns for the output buffer. The code to perform this parsing is shown here:

sub encodeVariable {
    my($vname,$vtype,$from,$to,$fmt) = @_;
    my $len = $to - $from + 1;
    printf EncD "\n\n";
    printf EncD "/* Encode:$vtype,$vname,$from,$to,$len,$fmt */";
    if ($vtype =~ /char/) {
        printf EncD "\ncp = (STRPTR )&(buffer[%d]); ",$from-1;
        printf EncD "\nfor (i=0; i< %d;i++)",$len;
        printf EncD "\n     buffer[%d + i] = sp->%s[i]; ",$from-1,$vname;
        }
    if ($vtype =~ /double/) {
        if (length($fmt) > 0)
        {
    printf EncD "\nsprintf(tempbuffer,\"%%%sf\",(sp->%s));",
            $fmt,$vname;
        }
        else
        {
    printf EncD "\nsprintf(tempbuffer,\"%%%sf\",(sp->%s));",
                $len,$vname;
        }
    printf EncD "\ntempbuffer[%d]= 0; ",$len;
    printf EncD "\nstrncat((STRPTR )(&buffer[%d]),tempbuffer,%d);",
            $from-1,$len;
        }
    if ($vtype =~ /int/) {
        {
        if ($len == 1)
            {
            printf EncD "\nsp->%s %%= 10;", $vname;
            printf EncD "\ntempbuffer[%d] = (char)('0' + sp->%s);",
                $from-1,$vname;
            printf EncD "\ntempbuffer[%d] = 0; ",$from;
            }
        else
            {
        printf EncD "\nsprintf(tempbuffer,\"%%d%d\",(sp->%s));",$len,$vname;
        printf EncD "\ntempbuffer[%d]= 0; ",$len;
            }
        printf EncD "\nstrncat((STRPTR )(&buffer[%d]),tempbuffer,%d);",
                $from-1,$len;
        }
    }
} # end of subroutine.

Writing the Decoder

The decoder source file is created in almost exactly the same way that the encoder file is created. The preamble for the decoder file includes the #include statement for the P286.h header file, which is also being created. It also makes a reference to the padTo80 function in the encoder file (should this extra function be required).

The decoder file also generates code to extract a substring from another string given the offset and length to extract. The function may be used on systems that have a broken strncpy (just like my old, in-house version of the C compiler for a Motorola 6809E microprocessor).

Here's the code to generate the decoder source file preamble:

sub startDecoderFile {
print (DECD "\/**\n* C source file to decode records.",
    " \nDon't edit this file\n ",
    "*/\n",
    '#include "p286.h" ',
    "\n",
    "\n/* The outgoing buffer must be 81 chars! */",
    "\nextern void padTo80(STRPTR buffer); \n",
    "\n",
    "\n/* The outgoing buffer must also be 81 chars! */",
    "\nvoid substr(STRPTR buffer,STRPTR cut,int offset, int len)\n{\n",
    "int i,j; \nj = offset;\n",
    "for(i=len;i<80;i++) cut[i] = buffer[j]; \n ",
    "cut[len] = 0; /* NULL terminate the string*/",
    "\n} /* end of padding function */\n",
    "\n");
}

Each RECORD and ENDREC pair in the input file causes a call to the Perl subroutines startDecoderFunction() and closeDecoderFunction(), respectively. The first subroutine creates the preamble to decode a string into a structure. The second subroutine prints the closing brace for a function. The code for the two functions is shown here:

sub startDecoderFunction {
    my($vname) = @_;                 # Pick up name of record.
    print DECD "\n/*: ";
    print DECD "\n** Generated by Perl script -- Avoid editing ";
    print DECD "\n** The outgoing buffer must also be 81 chars!";
    print DECD "\n*/\n";
    printf DECD  "void decode_%s_type(P286_%s_PTR sp,STRPTR buffer)\n{",
        $vname, $vname;
    print DECD "\nSTRPTR cp; \n";
    print DECD "register int i; \n";
    print DECD "char tempbuffer[80];\n";
}

sub closeDecoderFunction {
    print DECD "\n} /* End of decoding function */ \n";
}

Finally, the decodeVariable function is called when a variable is encountered. The type of variable dictates how to read the value from a substring in the incoming buffer. The name of the variable is set to be a scalar or the member of an array before the call is made. Therefore, the Perl subroutine simply uses the value of $vname verbatim:

sub decodeVariable {
    my ($vname,$vtype,$from,$to,$fmt) = @_;
    my  $len = $to - $from + 1;
    printf DECD "\n";
    if ($vtype =~ /char/)
        {
        printf DECD "\nstrncpy(sp->%s,(STRPTR )(&buffer[%d]),%d);",
            $vname,$from-1,$len;
        printf DECD "\nsp->%s[%d]= 0; ",$vname,$len;
        }
    if ($vtype =~ /double/)
        {
        printf DECD "\nstrncpy(tempbuffer,(STRPTR )(&buffer[%d]),%d);",
                    $from-1,$len;
        printf DECD "\ntempbuffer[%d]= 0; ",$len;
        printf DECD "\nsscanf(tempbuffer,\"%%%sf\",&(sp->%s));", $fmt,$vname;
        }
    if ($vtype =~ /int/)
        {
        printf DECD "\nstrncpy(tempbuffer,(STRPTR )(&buffer[%d]),%d);",
                                $from-1,$len;
        printf DECD "\ntempbuffer[%d]= 0; ",$len;
        printf DECD "\nsscanf(tempbuffer,\"%%%dd\",&(sp->%s));",$len,$vname;
        }
}

Putting It Together

Here are the input and output files generated from the code fragments shown previously. The input file shown here has only five records in it. The actual file had 52 record types, and I applied the same methodology to three types of similar formats. Here is a breakdown of the listings:

  • Listing 31.2. The header file generated from Perl.
  • Listing 31.3. The encoder file P286enc.c.
  • Listing 31.4. The encoder file P286dec.c.
  • Listing 31.5. The full listing for the code generator.


Listing 31.2. The P286.h header file.
 1 #ifndef P286_HDRS
 2 #define P286_HDRS 1
 3 #define STRPTR   char *
 4
 5 #define P286_H0001_RECTYPE 0
 6
 7 typedef struct p286_H0001_type {
 8  char SurveyType[53]; /* 29 80 */
 9 }P286_H0001_TYPE,*P286_H0001_PTR;
10
11 #define P286_H0010_RECTYPE 1
12
13 typedef struct p286_H0010_type {
14  int numPatterns; /* 6 7 */
15  int sblInUse; /* 8 8 */
16  int sattInUse; /* 9 9 */
17  int numVessels; /* 10 10 */
18  int numDatum; /* 11 11 */
19  int offsetMode; /* 12 12 */
20 }P286_H0010_TYPE,*P286_H0010_PTR;
21
22 #define P286_H011_RECTYPE 2
23
24 typedef struct p286_H011_type {
25  int datumId; /* 5 5 */
26  char spheroidName[19]; /* 6 23 */
27  char datumName[19]; /* 24 41 */
28  double semimajorAxis; /* 42 53 */
29  double conversionFactor; /* 66 77 */
30  double inverseFlattening; /* 66 77 */
31 }P286_H011_TYPE,*P286_H011_PTR;
32
33 #define P286_E3100_RECTYPE 3
34
35 typedef struct p286_E3100_type {
36  double velprop; /* 6 12 */
37     int srcNdx[5]; /* from 13 15 */
38     int dstNdx[5]; /* from 16 18 */
39     double slant[5]; /* from 19 25 */
40 }P286_E3100_TYPE,*P286_E3100_PTR;
41
42 #endif


Listing 31.3. The encoder file P286enc.c.
  1 /*** C source file to encode records.
  2 Don't edit this file
  3  */
  4 #include "p286.h"
  5
  6 /* The incoming buffer must be 81 chars! */
  7 void padTo80(STRPTR buffer)
  8 {
  9 int i,ln;
 10 ln = strlen(buffer);
 11 for(i=ln;i<80;i++) buffer[i] = ' ';
 12  buffer[81] = 0; /* NULL terminate the string*/
 13 } /* end of padding function */
 14
 15
 16 /*:
 17 ** Generated by Perl script -- Avoid editing
 18 */
 19 void encode_H0001_type(STRPTR buffer,P286_H0001_PTR sp)
 20 {
 21 STRPTR ncp;
 22
 23 STRPTR cp;
 24 register int i;
 25 char tempbuffer[80];
 26
 27
 28 /* Encode:char,SurveyType,29,80,52, */
 29 cp = (STRPTR )&(buffer[28]);
 30 for (i=0; i< 52;i++)
 31      buffer[28 + i] = sp->SurveyType[i];
 32 padTo80(buffer);
 33 } /* End of encoding function */
 34
 35 /*:
 36 ** Generated by Perl script -- Avoid editing
 37 */
 38 void encode_H0010_type(STRPTR buffer,P286_H0010_PTR sp)
 39 {
 40 STRPTR ncp;
 41
 42 STRPTR cp;
 43 register int i;
 44 char tempbuffer[80];
 45
 46
 47 /* Encode:int,numPatterns,6,7,2, */
 48 sprintf(tempbuffer,"%d2",(sp->numPatterns));
 49 tempbuffer[2]= 0;
 50 strncat((STRPTR )(&buffer[5]),tempbuffer,2);
 51
 52 /* Encode:int,sblInUse,8,8,1, */
 53 sp->sblInUse %= 10;
 54 tempbuffer[7] = (char)('0' + sp->sblInUse);
 55 tempbuffer[8] = 0;
 56 strncat((STRPTR )(&buffer[7]),tempbuffer,1);
 57
 58 /* Encode:int,sattInUse,9,9,1, */
 59 sp->sattInUse %= 10;
 60 tempbuffer[8] = (char)('0' + sp->sattInUse);
 61 tempbuffer[9] = 0;
 62 strncat((STRPTR )(&buffer[8]),tempbuffer,1);
 63
 64 /* Encode:int,numVessels,10,10,1, */
 65 sp->numVessels %= 10;
 66 tempbuffer[9] = (char)('0' + sp->numVessels);
 67 tempbuffer[10] = 0;
 68 strncat((STRPTR )(&buffer[9]),tempbuffer,1);
 69
 70 /* Encode:int,numDatum,11,11,1, */
 71 sp->numDatum %= 10;
 72 tempbuffer[10] = (char)('0' + sp->numDatum);
 73 tempbuffer[11] = 0;
 74 strncat((STRPTR )(&buffer[10]),tempbuffer,1);
 75
 76 /* Encode:int,offsetMode,12,12,1, */
 77 sp->offsetMode %= 10;
 78 tempbuffer[11] = (char)('0' + sp->offsetMode);
 79 tempbuffer[12] = 0;
 80 strncat((STRPTR )(&buffer[11]),tempbuffer,1);
 81 padTo80(buffer);
 82 } /* End of encoding function */
 83
 84 /*:
 85 ** Generated by Perl script -- Avoid editing
 86 */
 87 void encode_H011_type(STRPTR buffer,P286_H011_PTR sp)
 88 {
 89 STRPTR ncp;
 90
 91 STRPTR cp;
 92 register int i;
 93 char tempbuffer[80];
 94
 95
 96 /* Encode:int,datumId,5,5,1, */
 97 sp->datumId %= 10;
 98 tempbuffer[4] = (char)('0' + sp->datumId);
 99 tempbuffer[5] = 0;
100 strncat((STRPTR )(&buffer[4]),tempbuffer,1);
101
102 /* Encode:char,spheroidName,6,23,18, */
103 cp = (STRPTR )&(buffer[5]);
104 for (i=0; i< 18;i++)
105      buffer[5 + i] = sp->spheroidName[i];
106
107 /* Encode:char,datumName,24,41,18, */
108 cp = (STRPTR )&(buffer[23]);
109 for (i=0; i< 18;i++)
110      buffer[23 + i] = sp->datumName[i];
111
112 /* Encode:double,semimajorAxis,42,53,12,12.3 */
113 sprintf(tempbuffer,"%12.3f",(sp->semimajorAxis));
114 tempbuffer[12]= 0;
115 strncat((STRPTR )(&buffer[41]),tempbuffer,12);
116
117 /* Encode:double,conversionFactor,66,77,12,12.8 */
118 sprintf(tempbuffer,"%12.8f",(sp->conversionFactor));
119 tempbuffer[12]= 0;
120 strncat((STRPTR )(&buffer[65]),tempbuffer,12);
121
122 /* Encode:double,inverseFlattening,66,77,12,12.7 */
123 sprintf(tempbuffer,"%12.7f",(sp->inverseFlattening));
124 tempbuffer[12]= 0;
125 strncat((STRPTR )(&buffer[65]),tempbuffer,12);
126 padTo80(buffer);
127 } /* End of encoding function */
128
129 /*:
130 ** Generated by Perl script -- Avoid editing
131 */
132 void encode_E3100_type(STRPTR buffer,P286_E3100_PTR sp)
133 {
134 STRPTR ncp;
135
136 STRPTR cp;
137 register int i;
138 char tempbuffer[80];
139
140
141 /* Encode:double,velprop,6,12,7,7.2 */
142 sprintf(tempbuffer,"%7.2f",(sp->velprop));
143 tempbuffer[7]= 0;
144 strncat((STRPTR )(&buffer[5]),tempbuffer,7);
145
146 /* Encode:int,srcNdx[0],13,16,4, */
147 sprintf(tempbuffer,"%d4",(sp->srcNdx[0]));
148 tempbuffer[4]= 0;
149 strncat((STRPTR )(&buffer[12]),tempbuffer,4);
150
151 /* Encode:int,srcNdx[1],26,29,4, */
152 sprintf(tempbuffer,"%d4",(sp->srcNdx[1]));
153 tempbuffer[4]= 0;
154 strncat((STRPTR )(&buffer[25]),tempbuffer,4);
155
156 /* Encode:int,srcNdx[2],39,42,4, */
157 sprintf(tempbuffer,"%d4",(sp->srcNdx[2]));
158 tempbuffer[4]= 0;
159 strncat((STRPTR )(&buffer[38]),tempbuffer,4);
160
161 /* Encode:int,srcNdx[3],52,55,4, */
162 sprintf(tempbuffer,"%d4",(sp->srcNdx[3]));
163 tempbuffer[4]= 0;
164 strncat((STRPTR )(&buffer[51]),tempbuffer,4);
165
166 /* Encode:int,srcNdx[4],65,68,4, */
167 sprintf(tempbuffer,"%d4",(sp->srcNdx[4]));
168 tempbuffer[4]= 0;
169 strncat((STRPTR )(&buffer[64]),tempbuffer,4);
170
171 /* Encode:int,dstNdx[0],13,16,4, */
172 sprintf(tempbuffer,"%d4",(sp->dstNdx[0]));
173 tempbuffer[4]= 0;
174 strncat((STRPTR )(&buffer[12]),tempbuffer,4);
175
176 /* Encode:int,dstNdx[1],26,29,4, */
177 sprintf(tempbuffer,"%d4",(sp->dstNdx[1]));
178 tempbuffer[4]= 0;
179 strncat((STRPTR )(&buffer[25]),tempbuffer,4);
180
181 /* Encode:int,dstNdx[2],39,42,4, */
182 sprintf(tempbuffer,"%d4",(sp->dstNdx[2]));
183 tempbuffer[4]= 0;
184 strncat((STRPTR )(&buffer[38]),tempbuffer,4);
185
186 /* Encode:int,dstNdx[3],52,55,4, */
187 sprintf(tempbuffer,"%d4",(sp->dstNdx[3]));
188 tempbuffer[4]= 0;
189 strncat((STRPTR )(&buffer[51]),tempbuffer,4);
190
191 /* Encode:int,dstNdx[4],65,68,4, */
192 sprintf(tempbuffer,"%d4",(sp->dstNdx[4]));
193 tempbuffer[4]= 0;
194 strncat((STRPTR )(&buffer[64]),tempbuffer,4);
195
196 /* Encode:double,slant[0],13,20,8,7.2 */
197 sprintf(tempbuffer,"%7.2f",(sp->slant[0]));
198 tempbuffer[8]= 0;
199 strncat((STRPTR )(&buffer[12]),tempbuffer,8);
200
201 /* Encode:double,slant[1],26,33,8,7.2 */
202 sprintf(tempbuffer,"%7.2f",(sp->slant[1]));
203 tempbuffer[8]= 0;
204 strncat((STRPTR )(&buffer[25]),tempbuffer,8);
205
206 /* Encode:double,slant[2],39,46,8,7.2 */
207 sprintf(tempbuffer,"%7.2f",(sp->slant[2]));
208 tempbuffer[8]= 0;
209 strncat((STRPTR )(&buffer[38]),tempbuffer,8);
210
211 /* Encode:double,slant[3],52,59,8,7.2 */
212 sprintf(tempbuffer,"%7.2f",(sp->slant[3]));
213 tempbuffer[8]= 0;
214 strncat((STRPTR )(&buffer[51]),tempbuffer,8);
215
216 /* Encode:double,slant[4],65,72,8,7.2 */
217 sprintf(tempbuffer,"%7.2f",(sp->slant[4]));
218 tempbuffer[8]= 0;
219 strncat((STRPTR )(&buffer[64]),tempbuffer,8);
220 padTo80(buffer);
221 } /* End of encoding function */


Listing 31.4. The decoder file P286dec.c.
  1 /**
  2 * C source file to decode records.
  3 Don't edit this file
  4  */
  5 #include "p286.h"
  6
  7 /* The outgoing buffer must be 81 chars! */
  8 extern void padTo80(STRPTR buffer);
  9
 10
 11 /* The outgoing buffer must also be 81 chars! */
 12 void substr(STRPTR buffer,STRPTR cut,int offset, int len)
 13 {
 14 int i,j;
 15 j = offset;
 16 for(i=len;i<80;i++) cut[i] = buffer[j];
 17  cut[len] = 0; /* NULL terminate the string*/
 18 } /* end of padding function */
 19
 20
 21 /*:
 22 ** Generated by Perl script -- Avoid editing
 23 ** The outgoing buffer must also be 81 chars!
 24 */
 25 void decode_H0001_type(P286_H0001_PTR sp,STRPTR buffer)
 26 {
 27 STRPTR cp;
 28 register int i;
 29 char tempbuffer[80];
 30
 31
 32 strncpy(sp->SurveyType,(STRPTR )(&buffer[28]),52);
 33 sp->SurveyType[52]= 0;
 34 } /* End of decoding function */
 35
 36 /*:
 37 ** Generated by Perl script -- Avoid editing
 38 ** The outgoing buffer must also be 81 chars!
 39 */
 40 void decode_H0010_type(P286_H0010_PTR sp,STRPTR buffer)
 41 {
 42 STRPTR cp;
 43 register int i;
 44 char tempbuffer[80];
 45
 46
 47 strncpy(tempbuffer,(STRPTR )(&buffer[5]),2);
 48 tempbuffer[2]= 0;
 49 sscanf(tempbuffer,"%2d",&(sp->numPatterns));
 50
 51 strncpy(tempbuffer,(STRPTR )(&buffer[7]),1);
 52 tempbuffer[1]= 0;
 53 sscanf(tempbuffer,"%1d",&(sp->sblInUse));
 54
 55 strncpy(tempbuffer,(STRPTR )(&buffer[8]),1);
 56 tempbuffer[1]= 0;
 57 sscanf(tempbuffer,"%1d",&(sp->sattInUse));
 58
 59 strncpy(tempbuffer,(STRPTR )(&buffer[9]),1);
 60 tempbuffer[1]= 0;
 61 sscanf(tempbuffer,"%1d",&(sp->numVessels));
 62
 63 strncpy(tempbuffer,(STRPTR )(&buffer[10]),1);
 64 tempbuffer[1]= 0;
 65 sscanf(tempbuffer,"%1d",&(sp->numDatum));
 66
 67 strncpy(tempbuffer,(STRPTR )(&buffer[11]),1);
 68 tempbuffer[1]= 0;
 69 sscanf(tempbuffer,"%1d",&(sp->offsetMode));
 70 } /* End of decoding function */
 71
 72 /*:
 73 ** Generated by Perl script -- Avoid editing
 74 ** The outgoing buffer must also be 81 chars!
 75 */
 76 void decode_H011_type(P286_H011_PTR sp,STRPTR buffer)
 77 {
 78 STRPTR cp;
 79 register int i;
 80 char tempbuffer[80];
 81
 82
 83 strncpy(tempbuffer,(STRPTR )(&buffer[4]),1);
 84 tempbuffer[1]= 0;
 85 sscanf(tempbuffer,"%1d",&(sp->datumId));
 86
 87 strncpy(sp->spheroidName,(STRPTR )(&buffer[5]),18);
 88 sp->spheroidName[18]= 0;
 89
 90 strncpy(sp->datumName,(STRPTR )(&buffer[23]),18);
 91 sp->datumName[18]= 0;
 92
 93 strncpy(tempbuffer,(STRPTR )(&buffer[41]),12);
 94 tempbuffer[12]= 0;
 95 sscanf(tempbuffer,"%12.3f",&(sp->semimajorAxis));
 96
 97 strncpy(tempbuffer,(STRPTR )(&buffer[65]),12);
 98 tempbuffer[12]= 0;
 99 sscanf(tempbuffer,"%12.8f",&(sp->conversionFactor));
100
101 strncpy(tempbuffer,(STRPTR )(&buffer[65]),12);
102 tempbuffer[12]= 0;
103 sscanf(tempbuffer,"%12.7f",&(sp->inverseFlattening));
104 } /* End of decoding function */
105
106 /*:
107 ** Generated by Perl script -- Avoid editing
108 ** The outgoing buffer must also be 81 chars!
109 */
110 void decode_E3100_type(P286_E3100_PTR sp,STRPTR buffer)
111 {
112 STRPTR cp;
113 register int i;
114 char tempbuffer[80];
115
116
117 strncpy(tempbuffer,(STRPTR )(&buffer[5]),7);
118 tempbuffer[7]= 0;
119 sscanf(tempbuffer,"%7.2f",&(sp->velprop));
120
121 strncpy(tempbuffer,(STRPTR )(&buffer[12]),4);
122 tempbuffer[4]= 0;
123 sscanf(tempbuffer,"%4d",&(sp->srcNdx[0]));
124
125 strncpy(tempbuffer,(STRPTR )(&buffer[25]),4);
126 tempbuffer[4]= 0;
127 sscanf(tempbuffer,"%4d",&(sp->srcNdx[1]));
128
129 strncpy(tempbuffer,(STRPTR )(&buffer[38]),4);
130 tempbuffer[4]= 0;
131 sscanf(tempbuffer,"%4d",&(sp->srcNdx[2]));
132
133 strncpy(tempbuffer,(STRPTR )(&buffer[51]),4);
134 tempbuffer[4]= 0;
135 sscanf(tempbuffer,"%4d",&(sp->srcNdx[3]));
136
137 strncpy(tempbuffer,(STRPTR )(&buffer[64]),4);
138 tempbuffer[4]= 0;
139 sscanf(tempbuffer,"%4d",&(sp->srcNdx[4]));
140
141 strncpy(tempbuffer,(STRPTR )(&buffer[12]),4);
142 tempbuffer[4]= 0;
143 sscanf(tempbuffer,"%4d",&(sp->dstNdx[0]));
144
145 strncpy(tempbuffer,(STRPTR )(&buffer[25]),4);
146 tempbuffer[4]= 0;
147 sscanf(tempbuffer,"%4d",&(sp->dstNdx[1]));
148
149 strncpy(tempbuffer,(STRPTR )(&buffer[38]),4);
150 tempbuffer[4]= 0;
151 sscanf(tempbuffer,"%4d",&(sp->dstNdx[2]));
152
153 strncpy(tempbuffer,(STRPTR )(&buffer[51]),4);
154 tempbuffer[4]= 0;
155 sscanf(tempbuffer,"%4d",&(sp->dstNdx[3]));
156
157 strncpy(tempbuffer,(STRPTR )(&buffer[64]),4);
158 tempbuffer[4]= 0;
159 sscanf(tempbuffer,"%4d",&(sp->dstNdx[4]));
160
161 strncpy(tempbuffer,(STRPTR )(&buffer[12]),8);
162 tempbuffer[8]= 0;
163 sscanf(tempbuffer,"%7.2f",&(sp->slant[0]));
164
165 strncpy(tempbuffer,(STRPTR )(&buffer[25]),8);
166 tempbuffer[8]= 0;
167 sscanf(tempbuffer,"%7.2f",&(sp->slant[1]));
168
169 strncpy(tempbuffer,(STRPTR )(&buffer[38]),8);
170 tempbuffer[8]= 0;
171 sscanf(tempbuffer,"%7.2f",&(sp->slant[2]));
172
173 strncpy(tempbuffer,(STRPTR )(&buffer[51]),8);
174 tempbuffer[8]= 0;
175 sscanf(tempbuffer,"%7.2f",&(sp->slant[3]));
176
177 strncpy(tempbuffer,(STRPTR )(&buffer[64]),8);
178 tempbuffer[8]= 0;
179 sscanf(tempbuffer,"%7.2f",&(sp->slant[4]));
180 } /* End of decoding function */


Listing 31.5. The complete source file for generating parser code.
  1 #!/usr/bin/perl
  2 #
  3 # Copyright (c) Kamran Husain.
  4 # The following code is hereby placed in the public domain.
  5 # NO WARRANTIES OF ANY TYPE ARE IMPLIED.
  6 # For entertainment use only. Contents of package may shift
  7 # during shipment.
  8 # You may copy this code freely as long as you name me as
  9 # the original author.
  10 #
  11 # This program parses input records of this form:
  12 #
  13 # RECORD TypeOfRecord StringToUse
  14 #     variableName     VarType    startCol EndCol
  15 #     variableName     VarType    startCol EndCol
  16 #     variableName     VarType    startCol EndCol
  17 #     [REPEAT count st1 st2 ... ]
  18 #     variableName     VarType    startCol EndCol
  19 #     variableName     VarType    startCol EndCol
  20 #     variableName     VarType    startCol EndCol
  21 # ENDREC
  22 #
  23 require 5.002;
  24 open (SAFE, "safehdrs") || die "Cannot open Input file  $!\n";
  25 open (HDRS, ">p286.h")  || die "Cannot open  Header $!\n";
  26 open (EncD, ">p286enc.c")  || die "Cannot open  Encoder $!\n";
  27 open (DECD, ">p286dec.c")  || die "Cannot open  Decoder $!\n";
  28
  29 $inRecord = 0;
  30 $repeat = 0;
  31 $recordCounter = 0;
  32
  33 &startHeaderFile();
  34 &startEncoderFile();
  35 &startDecoderFile();
  36
  37 while (<SAFE>) {
  38     chop($_);
  39     if (/^#/) { next; }
  40     if (/^\s*$/) { next; }
  41     @names = split(' ',$_);
  42     #
  43     # If not inside a record, start one
  44     #
  45     if (/^RECORD/) {
  46         ($rname,$rtype,@rest) = split(' ',$_);
  47         &startHeaderRecord($rtype);
  48         &startEncoderFunction($rtype);
  49         &startDecoderFunction($rtype);
  50         $inRecord = 1;
  51         $repeat = 0;
  52         next;
  53         }
  54     if (/^ENDREC/) {
  55         &stopHeaderRecord($rtype);
  56         &closeEncoderFunction();
  57         &closeDecoderFunction();
  58         $inRecord = 0;
  59         }
  60     if (/^[\s]*REPEAT/) {
  61         $repeat = 1;
  62         @allOffsets = split(' ',$_);
  63         $index = $allOffsets[0];
  64         $count = $allOffsets[1];
  65         print "INDEX = $index, COUNT= $count";
  66         @offsets = splice(@allOffsets,2);
  67         # foreach $x (@offsets) { print "offset = $x\n"; }
  68         next;
  69         }
  70     if ($inRecord)
  71         {
  72         ($vname,$vtype,$from,$to,$fmt) = split(' ',$_);
  73         if ($repeat == 0)
  74             {
  75             &makeHeaderItem($vname,$vtype,$from,$to);
  76             &encodeVariable($vname,$vtype,$from,$to,$fmt);
  77             &decodeVariable($vname,$vtype,$from,$to,$fmt);
  78             }
  79         else
  80         {
  81          &makeArrayedItem($vname,$vtype,$count,$from,$to);
  82         $offsetLen   = $to - $from + 1;
  83         $offsetCount = 0;
  84         foreach $x (@offsets) {
  85             $offsetFrom  = $x;
  86             $offsetTo    = $offsetLen + $x;
  87             $offsetName = sprintf "%s[%d]", $vname,$offsetCount;
  88             print "Name = $offsetName, COUNT= $count\n";
  89         &encodeVariable($offsetName,$vtype,$offsetFrom,$offsetTo,$fmt);
  90         &decodeVariable($offsetName,$vtype,$offsetFrom,$offsetTo,$fmt);
  91             $offsetCount++;
  92             }
  93         } ## end of else clause.
  94     } ## end of outer if ($inRecord) clause.
  95     }
  96 close (SAFE);
  97
  98 &closeHeaderFile();
  99 close (HDRS);
100 close (EncD);
101 close (DECD);
102
103 exit(0);
104
105
106 # --------------------------------------------------------------------
107 #  To create the C headers file from the data file.
108 # --------------------------------------------------------------------
109 sub startHeaderRecord {
110     my ($name) = @_;
111     printf HDRS "\n#define P286_%s_RECTYPE %d ", $name, $recordCounter++;
112     printf HDRS "\n\ntypedef struct p286_%s_type {", $name;
113 }
114 # --------------------------------------------------------------------
115 # Note the formal input parameter $name in this function.
116 # --------------------------------------------------------------------
117 sub stopHeaderRecord($name) {
118     my $name = shift;
119     my $ntype = "P286_" . $name . "_TYPE";
120     my $nptr  = "*P286_" . $name . "_PTR;";
121     print HDRS "\n}" .  uc($ntype) . "," . uc($nptr) ." \n";
122 }
123 # --------------------------------------------------------------------
124 #     Create the definitions for the header file.
125 # --------------------------------------------------------------------
126 sub makeHeaderItem {
127     my($vname,$vtype,$from,$to) = @_;
128     my $len;
129     if ($vtype =~ /char/) {
130         $len = $to - $from + 2 ;
131         printf HDRS "\n %s %s[%d]\; \/* %d %d *\/ ",
132             $vtype, $vname, $len, $from, $to;
133         }
134     else
135         {
136         printf HDRS "\n %s %s\; \/* %d %d *\/",
137             $vtype, $vname, $from,  $to;
138         }
139 }
140 # --------------------------------------------------------------------
141 #     Create the preamble for the header file.
142 # --------------------------------------------------------------------
143 sub startHeaderFile() {
144     printf (HDRS "#ifndef P286_HDRS\n",
145         "#define P286_HDRS1\n",
146         "#define STRPTR   char *\n");
147 }
148
149 # --------------------------------------------------------------------
150 #     Close the preprocessor definitions for the header file.
151 # --------------------------------------------------------------------
152 sub closeHeaderFile() {
153     printf HDRS "\n#endif\n";
154 }
155 # --------------------------------------------------------------------
156 #    Create an array item for a structure
157 # --------------------------------------------------------------------
158 sub makeArrayedItem {
159     my ($vname,$vtype,$count,$from,$to) = @_;
160     printf HDRS "\n    %s %s[%d]; \/* from %d %d *\/ ",
161             $vtype, $vname, $count, $from,  $to;
162 }
163
164 # --------------------------------------------------------------------
165 #    Create the C source file to create encoder for the data file.
166 # --------------------------------------------------------------------
167 sub startEncoderFile {
168 print (EncD "\/*** C source file to encode records.",
169     "\nDon't edit this file\n ",
170     "*/\n",
171     '#include "p286.h" ',
172     "\n",
173     "\n/* The incoming buffer must be 81 chars! */",
174     "\nvoid padTo80(STRPTR buffer)\n{\n",
175     "int i,ln;\n",
176     "ln = strlen(buffer);\n",
177     "for(i=ln;i<80;i++) buffer[i] = ' '; \n ",
178     "buffer[81] = 0; /* NULL terminate the string*/",
179     "\n} /* end of padding function */\n",
180     "\n");
181 }
182
183 # --------------------------------------------------------------------
184 #    Create the preamble for an encoder function.
185 # --------------------------------------------------------------------
186 sub startEncoderFunction {
187     my($vname) = @_;                 # Pick up name of record.
188     print (EncD "\n/*: ",
189         "\n** Generated by Perl script -- Avoid editing\n*/\n");
190     printf EncD
191         "void encode_%s_type(STRPTR buffer,P286_%s_PTR sp)\n{",
192         $vname, $vname;
193     print (EncD "\nSTRPTR ncp; \n",
194         "\nSTRPTR cp; \n",
195         "register int i; \n",
196         "char tempbuffer[80];\n");
197 }
198
199 # --------------------------------------------------------------------
200 #    Create the closing brackets for an encoder function
201 # --------------------------------------------------------------------
202 sub closeEncoderFunction {
203     print EncD "\npadTo80(buffer);";
204     print EncD "\n} /* End of encoding function */ \n";
205 }
206
207 # --------------------------------------------------------------------
208 #    Add code for encoding a record.
209 # --------------------------------------------------------------------
210 sub encodeVariable {
211     my($vname,$vtype,$from,$to,$fmt) = @_;
212     my $len = $to - $from + 1;
213     printf EncD "\n\n";
214     printf EncD "/* Encode:$vtype,$vname,$from,$to,$len,$fmt */";
215     if ($vtype eq 'char') {
216         printf EncD "\ncp = (STRPTR )&(buffer[%d]); ",$from-1;
217         printf EncD "\nfor (i=0; i< %d;i++)",$len;
218         printf EncD "\n     buffer[%d + i] = sp->%s[i]; ",$from-1,$vname;
219         }
220     if ($vtype eq 'double') {
221         if (length($fmt) > 0)
222         {
223         printf EncD "\nsprintf(tempbuffer,\"%%%sf\",(sp->%s));", $fmt,$vname;
224         }
225         else
226         {
227         printf EncD "\nsprintf(tempbuffer,\"%%%sf\",(sp->%s));",
228                 $len,$vname;
229         }
230         printf EncD "\ntempbuffer[%d]= 0; ",$len;
231         printf EncD "\nstrncat((STRPTR )(&buffer[%d]),tempbuffer,%d);",
232             $from-1,$len;
233         }
234     if ($vtype eq 'int') {
235         {
236         if ($len == 1)
237             {
238             printf EncD "\nsp->%s %%= 10;", $vname;
239             printf EncD "\ntempbuffer[%d] = (char)('0' + sp->%s);",
240                 $from-1,$vname;
241             printf EncD "\ntempbuffer[%d] = 0; ",$from;
242             }
243         else
244             {
245         printf EncD "\nsprintf(tempbuffer,\"%%d%d\",(sp->%s));",$len,$vname;
246         printf EncD "\ntempbuffer[%d]= 0; ",$len;
247             }
248         printf EncD "\nstrncat((STRPTR )(&buffer[%d]),tempbuffer,%d);",
249                 $from-1,$len;
250         }
251     }
252 } # end of subroutine.
253
254
255 # --------------------------------------------------------------------
256 #  To create the C source file to create decoder for the data file.
257 # --------------------------------------------------------------------
258 sub startDecoderFile {
259 print (DECD "\/**\n* C source file to decode records.",
260     " \nDon't edit this file\n ",
261     "*/\n",
262     '#include "p286.h" ',
263     "\n",
264     "\n/* The outgoing buffer must be 81 chars! */",
265     "\nextern void padTo80(STRPTR buffer); \n",
266     "\n",
267     "\n/* The outgoing buffer must also be 81 chars! */",
268     "\nvoid substr(STRPTR buffer,STRPTR cut,int offset, int len)\n{\n",
269     "int i,j; \nj = offset;\n",
270     "for(i=len;i<80;i++) cut[i] = buffer[j]; \n ",
271     "cut[len] = 0; /* NULL terminate the string*/",
272     "\n} /* end of padding function */\n",
273     "\n");
274 }
275
276 # --------------------------------------------------------------------
277 #   Create the preamble for a decoder function
278 # --------------------------------------------------------------------
279 sub startDecoderFunction {
280     my($vname) = @_;                 # Pick up name of record.
281     print (DECD "\n/*: ",
282     "\n** Generated by Perl script -- Avoid editing ",
283     "\n** The outgoing buffer must also be 81 chars!",
284     "\n*/\n";
285     printf (DECD
286     "void decode_%s_type(P286_%s_PTR sp,STRPTR buffer)\n{",
287         $vname, $vname);
288     print (DECD "\nSTRPTR cp; \n",
289     "register int i; \n",
290     "char tempbuffer[80];\n");
291 }
292
293 # --------------------------------------------------------------------
294 #   Create the closing braces for a decoder function
295 # --------------------------------------------------------------------
296 sub closeDecoderFunction {
297     print DECD "\n} /* End of decoding function */ \n";
298 }
299
300 # --------------------------------------------------------------------
301 #   Create the source code for decoding a variable
302 # --------------------------------------------------------------------
303 sub decodeVariable {
304     my ($vname,$vtype,$from,$to,$fmt) = @_;
305     my  $len = $to - $from + 1;
306     printf DECD "\n";
307     if ($vtype =~ /char/)
308         {
309         printf DECD "\nstrncpy(sp->%s,(STRPTR )(&buffer[%d]),%d);",
310             $vname,$from-1,$len;
311         printf DECD "\nsp->%s[%d]= 0; ",$vname,$len;
312         }
313     if ($vtype =~ /double/)
314         {
315         printf DECD "\nstrncpy(tempbuffer,(STRPTR )(&buffer[%d]),%d);",
316                     $from-1,$len;
317         printf DECD "\ntempbuffer[%d]= 0; ",$len;
318         printf DECD "\nsscanf(tempbuffer,\"%%%sf\",&(sp->%s));", $fmt,$vname;
319         }
320     if ($vtype =~ /int/)
321         {
322         printf DECD "\nstrncpy(tempbuffer,(STRPTR )(&buffer[%d]),%d);",
323                                 $from-1,$len;
324         printf DECD "\ntempbuffer[%d]= 0; ",$len;
325         printf DECD "\nsscanf(tempbuffer,\"%%%dd\",&(sp->%s));",$len,$vname;
326         }
327 }

Summary

Although the example given in this chapter is for a very specialized parse code generator, it provides examples of several ways to apply Perl to solving problems. Here are the important points to remember:

  • The code being generated by the Perl program was being used to parse records with a similar format. The format of each record was general enough to have similar functions decode it. There were no hidden surprises in terms of special cases that would make code generation anything less than general.
  • Using Perl would have been a good choice to parse the data but the underlying platforms did not support Perl.
  • The number of variables to look for was fixed. Only a few types of variables (char, int, and double) were used. Introducing user-defined types or several other variables would make the code generation more complicated.
  • The code being generated out of a Perl program was used with C code in other programs. Once generated, it could be optimized manually for speed.
  • Because you are in control of the code being generated, do not forget to generate comments in the native language of the code being created. Such comments will serve as a great cross-reference in the future or the generated code is being debugged.
  • By taking the grunt work out of the programming cycle, a lot of typing errors were removed from the final code. If there was one error, it would be propagated to all the generated code dependent on that erroneous code.
  • Similarly, any changes in the format, though not very likely given the age of the format, could be accommodated by simply modifying the Perl script and regenerating the source code. It sure beats going through pages of code that looks the same, doesn't it?
  • The original specification was inconsistent in defining the structures for use with the format. Had the original specification been more consistent, Perl could have been used to actually extract the data from the specification itself and generated the P286hdrs file! Unfortunately, this was not possible; however, your situation might be quite different. Keep your eyes open for such developing patterns.
  • Finally, the input files in the real case were a lot larger, with at least four files-each more than 500 lines long. Had this only been a few structures, it would have been more practical to just code the final version without resorting to Perl.


Previous chapter Chapter contents Contents




With any suggestions or questions please feel free to contact us