Programming Perl [Chapter 3] 3.2 Perl Functions in Alphabetical Order

3.2 Perl Functions in Alphabetical Order

/PATTERN/

/PATTERN/
m/PATTERN/

The match operator. See "Regular Expressions" in Chapter 2, The Gory Details.

?PATTERN?

?PATTERN?

This is just like the /PATTERN/ search, except that it matches only once between calls to reset, so it finds only the first occurrence of something rather than all occurrences. (In other words, the operator works repeatedly until it actually matches something, then it turns itself off until you explicitly turn it back on with reset.) This may be useful (and efficient) if you want to see only the first occurrence of the pattern in each file of a set of files. Note that m?? is equivalent to ??.

The reset operator will only reset instances of ?? that were compiled in the same package that it was.
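
The once-only behavior can be sketched as follows. This is an illustration rather than anything from the text above: the sample data and the subroutine name first_with_a are made up, and the explicit m?? spelling is used because recent versions of Perl no longer accept the bare ?? form.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of a file.
my @lines = ("apple pie", "banana split", "avocado toast");

# Collect every line on which the once-only match succeeds.
sub first_with_a {
    my @found;
    for my $line (@_) {
        push @found, $line if $line =~ m?a?;
    }
    return @found;
}

my @first  = first_with_a(@lines);   # only "apple pie" is found
my @second = first_with_a(@lines);   # nothing: the operator has turned itself off
reset;                               # turn m?? back on for the current package
my @third  = first_with_a(@lines);   # "apple pie" is found again
```

Every line contains an "a", so an ordinary /a/ would have collected all three lines on each call.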

abs

abs VALUE

This function returns the absolute value of its argument (or $_ if omitted).

accept

accept NEWSOCKET, GENERICSOCKET

This function does the same thing as the accept system call--see accept (2). It is used by server processes that wish to accept socket connections from clients. Execution is suspended until a connection is made, at which time the NEWSOCKET filehandle is opened and attached to the newly made connection. The function returns the connected address if the call succeeded, false otherwise (and puts the error code into $!). GENERICSOCKET must be a filehandle already opened via the socket operator and bound to one of the server's network addresses. For example:

unless ($peer = accept NS, S) {
    die "Can't accept a connection: $!\n";
}

See also the example in the section "Sockets" in Chapter 6, Social Engineering.

alarm

alarm EXPR

This function sends a SIGALRM signal to the executing Perl program after EXPR seconds. On some older systems, alarms go off at the "top of the second," so, for instance, an alarm 1 may go off anywhere between 0 and 1 second from now, depending on when in the current second it is. An alarm 2 may go off anywhere from 1 to 2 seconds from now. And so on. For better resolution, you may be able to use syscall to call the itimer routines that some UNIX systems support. Or you can use the timeout feature of the select function.

Each call disables the previous timer, and an argument of 0 may be supplied to cancel the previous timer without starting a new one. The return value is the number of seconds remaining on the previous timer.
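
A common use of alarm is to put a time limit on an operation by combining it with eval and a SIGALRM handler. The following is a minimal sketch under that assumption; the subroutine name with_timeout and its return convention are invented for the example, and the four-argument select stands in for a slow operation.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds.
# Returns (1, value) on success and (0, undef) on timeout.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $value;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;              # start the timer
        $value = $code->();
        alarm 0;                     # cancel it if we finished in time
        1;
    };
    alarm 0;                         # make sure no timer is left pending
    return (1, $value) if $ok;
    die $@ unless $@ eq "timeout\n"; # re-raise unrelated errors
    return (0, undef);
}

my ($fast_ok, $fast) = with_timeout(5, sub { 6 * 7 });
my ($slow_ok)        = with_timeout(1, sub { select(undef, undef, undef, 10); 1 });
```

The first call finishes immediately, so its timer is cancelled by alarm 0; the second is interrupted after about a second.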

atan2

atan2 Y, X

This function returns the arctangent of Y/X in the range -pi to pi. A quick way to get an approximate value of pi is to say:

$pi = atan2(1,1) * 4;

For the tangent operation, you may use the POSIX::tan() function, or use the familiar relation:

sub tan { sin($_[0]) / cos($_[0]) }
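
Because atan2 takes Y and X separately, it can tell the quadrants apart, which a plain arctangent of Y/X cannot. Here is a small sketch; the helper name angle_deg is made up for the example.

```perl
use strict;
use warnings;

my $pi = atan2(1, 1) * 4;

# Angle of the point (x, y) in degrees, measured counterclockwise
# from the positive X axis.
sub angle_deg {
    my ($x, $y) = @_;
    return atan2($y, $x) * 180 / $pi;
}

my $q1 = angle_deg( 1,  1);   # first quadrant, about 45
my $q2 = angle_deg(-1,  1);   # second quadrant, about 135
my $q3 = angle_deg(-1, -1);   # third quadrant, about -135
```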

bind

bind SOCKET, NAME

This function does the same thing as the bind system call--see bind (2). It attaches an address (a name) to an already opened socket specified by the SOCKET filehandle. The function returns true if it succeeded, false otherwise (and puts the error code into $!). NAME should be a packed address of the proper type for the socket.

bind S, $sockaddr or die "Can't bind address: $!\n";

See also the example in the section "Sockets" in Chapter 6, Social Engineering.

binmode

binmode FILEHANDLE

This function arranges for the file to be treated in binary mode on operating systems that distinguish between binary and text files. It should be called after the open but before any I/O is done on the filehandle. The only way to reset binary mode on a filehandle is to reopen the file.

On systems that distinguish binary mode from text mode, files that are read in text mode have \r\n sequences translated to \n on input and \n translated to \r\n on output. binmode has no effect under UNIX or Plan9. If FILEHANDLE is an expression, the value is taken as the name of the filehandle. The following example shows how a Perl script might prepare to read a word processor file with embedded control codes:

open WP, "$file.wp" or die "Can't open $file.wp: $!\n";
binmode WP;
while (read WP, $buf, 1024) {...}
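
As a self-contained sketch of the same idea, the following writes some bytes to a temporary file and reads them back with binmode set on both handles. It assumes a Perl recent enough to have lexical filehandles, three-argument open, and the File::Temp module; the byte string is arbitrary sample data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Arbitrary sample data containing bytes that text mode might alter.
my $bytes = "line one\r\nline two\n\x00\x1a\xff";

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                      # after the open, before any output
print {$out} $bytes;
close $out or die "Can't close $path: $!\n";

open my $in, '<', $path or die "Can't open $path: $!\n";
binmode $in;                       # after the open, before any input
my $copy = '';
while (read $in, my $buf, 1024) {
    $copy .= $buf;
}
close $in;
```

With binmode on both sides, $copy should be byte-for-byte identical to $bytes on any system.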

bless

bless REF, CLASSNAME
bless REF

This function looks up the item pointed to by reference REF and tells the item that it is now an object in the CLASSNAME package--or the current package if no CLASSNAME is specified, which is often the case. It returns the reference for convenience, since a bless is often the last thing in a constructor function. (Always use the two-argument version if the constructor doing the blessing might be inherited by a derived class. In such cases, the class you want to bless your object into will normally be found as the first argument to the constructor in question.) See "Objects" in Chapter 5, Packages, Modules, and Object Classes for more about the blessing (and blessings) of objects.
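
The advice about the two-argument form can be illustrated with a small sketch. The Animal and Dog classes here are hypothetical; the point is only that the inherited constructor blesses into whatever class it was invoked on.

```perl
use strict;
use warnings;

package Animal;

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name} };
    return bless $self, $class;    # two-argument form honors the caller's class
}

sub speak {
    my $self = shift;
    return "$self->{name} says hello";
}

package Dog;
our @ISA = ('Animal');             # Dog inherits new and speak from Animal

package main;

my $pet   = Dog->new(name => 'Rex');
my $class = ref $pet;              # "Dog", not "Animal"
```

Had the constructor used the one-argument form, the object would have been blessed into Animal, since that is the package in which new was compiled.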

caller

caller EXPR
caller

This function returns information about the stack of current subroutine calls. Without an argument it returns the package name, filename, and line number that the currently executing subroutine was called from:

($package, $filename, $line) = caller;

With an argument it evaluates EXPR as the number of stack frames to go back before the current one. It also reports some additional information.

$i = 0;
while (($pack, $file, $line, $subname, $hasargs, $wantarray) = caller($i++)) {
    ...
}

Furthermore, when called from within the DB package, caller returns more detailed information: it sets the list variable @DB::args to be the arguments passed in the given stack frame.
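
For instance, a subroutine can find out which subroutine called it by looking one frame up. This is a minimal sketch; the names calling_sub and outer are invented for the example.

```perl
use strict;
use warnings;

# Return the fully qualified name of the subroutine that called us.
sub calling_sub {
    my @frame = caller(1);         # one frame above the current sub
    return @frame ? $frame[3] : 'main program';
}

sub outer { return calling_sub() }

my $from_sub = outer();            # "main::outer"
```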

chdir

chdir EXPR

This function changes the working directory to EXPR, if possible. If EXPR is omitted, it changes to the home directory. The function returns 1 upon success, 0 otherwise (and puts the error code into $!).

chdir "$prefix/lib" or die "Can't cd to $prefix/lib: $!\n";

The following code can be used to move to the user's home directory, one way or another:

$ok = chdir($ENV{"HOME"} || $ENV{"LOGDIR"} || (getpwuid($<))[7]);

Alternately, taking advantage of the default, you could say this:

$ok = chdir() || chdir((getpwuid($<))[7]);

See also the Cwd module, described in Chapter 7, The Standard Perl Library, which lets you keep track of your current directory.

chmod

chmod LIST

This function changes the permissions of a list of files. The first element of the list must be the numerical mode, as in chmod (2). (When using nonliteral mode data, you may need to convert an octal string to a decimal number using the oct function.) The function returns the number of files successfully changed. For example:

$cnt = chmod 0755, 'file1', 'file2';

will set $cnt to 0, 1, or 2, depending on how many files got changed (in the sense that the operation succeeded, not in the sense that the bits were different afterward). Here's a more typical usage:

chmod 0755, @executables;

If you need to know which files didn't allow the change, use something like this:

@cannot = grep {not chmod 0755, $_} 'file1', 'file2', 'file3';
die "$0: could not chmod @cannot\n" if @cannot;

This idiom makes use of the grep function to select only those elements of the list for which the chmod function failed.
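
The parenthetical remark about oct matters whenever the mode arrives as a string, because the string "0640" used as a number is decimal 640 rather than octal 0640. A short sketch, assuming a UNIX-like filesystem and the File::Temp module; the variable names are illustrative:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
close $fh;

my $mode_string = "0640";                     # e.g. read from a config file
my $changed = chmod oct($mode_string), $path; # oct converts the string to a number

my $actual = (stat $path)[2] & 07777;         # keep only the permission bits
printf "%d file changed, mode is now %04o\n", $changed, $actual;
```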

chomp

chomp VARIABLE
chomp LIST
chomp

This is a slightly safer version of chop (see below) in that it removes only any line ending corresponding to the current value of $/, and not just any last character. Unlike chop, chomp returns the number of characters deleted. If $/ is empty (in paragraph mode), chomp removes all trailing newlines from the selected string (or strings, if chomping a LIST).
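
The following sketch shows the return value and the effect of different settings of $/; the strings are made-up sample data.

```perl
use strict;
use warnings;

my $line = "hello\n";
my $removed = chomp $line;         # $line is "hello", $removed is 1

my $plain = "hello";
my $none = chomp $plain;           # nothing to remove, so $none is 0

my $record = "field1|field2||";
{
    local $/ = "||";               # custom record separator
    chomp $record;                 # $record is "field1|field2"
}

my $para = "text\n\n\n";
{
    local $/ = "";                 # paragraph mode
    chomp $para;                   # all trailing newlines go, leaving "text"
}
```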

chop

chop VARIABLE
chop LIST
chop

This function chops off the last character of a string and returns the character chopped. The chop operator is used primarily to remove the newline from the end of an input record, but is more efficient than s/\n$//. If VARIABLE is omitted, the function chops the $_ variable. For example:

while (<PASSWD>) {
    chop;   # avoid \n on last field
    @array = split /:/;
    ...
}

If you chop a LIST, each string in the list is chopped:

@lines = `cat myfile`;
chop @lines;

You can actually chop anything that is an lvalue, including an assignment:

chop($cwd = `pwd`);
chop($answer = <STDIN>);

Note that this is different from:

$answer = chop($tmp = <STDIN>);  # WRONG

which puts a newline into $answer, because chop returns the character chopped, not the remaining string (which is in $tmp). One way to get the result intended here is with substr:

$answer = substr <STDIN>, 0, -1;

But this is more commonly written as:

chop($answer = <STDIN>);

To chop more than one character, use substr as an lvalue, assigning a null string. The following removes the last five characters of $caravan:

substr($caravan, -5) = "";

The negative subscript causes substr to count from the end of the string instead of the beginning.

chown

chown LIST

This function changes the owner (and group) of a list of files. The first two elements of the list must be the numerical uid and gid, in that order. The function returns the number of files successfully changed. For example:

$cnt = chown $uid, $gid, 'file1', 'file2';

will set $cnt to 0, 1, or 2, depending on how many files got changed (in the sense that the operation succeeded, not in the sense that the owner was different afterward). Here's a more typical usage:

chown $uid, $gid, @filenames;

Here's a subroutine that looks everything up for you, and then does the chown:

sub chown_by_name {
    local($user, $pattern) = @_;
    chown((getpwnam($user))[2,3], glob($pattern));
}
&chown_by_name("fred", "*.c");

Notice that this forces the group of each file to be the gid fetched from the passwd file. An alternative is to pass a -1 for the gid, which leaves the group of the file unchanged.

On most systems, you are not allowed to change the ownership of the file unless you're the superuser, although you should be able to change the group to any of your secondary groups. On insecure systems, these restrictions may be relaxed, but this is not a portable assumption.

chr

chr NUMBER

This function returns the character represented by that NUMBER in the character set. For example, chr(65) is "A" in ASCII. To convert multiple characters, use pack("C*", LIST) instead.
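
A brief sketch of both directions, assuming an ASCII-compatible character set. The ord and unpack functions used for the reverse conversion are standard Perl built-ins described elsewhere in this chapter.

```perl
use strict;
use warnings;

my $letter = chr(65);                         # "A" on ASCII systems
my $code   = ord("A");                        # 65, the inverse operation
my $word   = pack("C*", 80, 101, 114, 108);   # "Perl"
my @codes  = unpack("C*", "Perl");            # (80, 101, 114, 108)
```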

chroot

chroot FILENAME

This function does the same operation as the chroot system call--see chroot (2). If successful, FILENAME becomes the new root directory for the current process--the starting point for pathnames beginning with "/". This directory is inherited across exec calls and by all subprocesses. There is no way to undo a chroot. Only the superuser can use this function. Here's some code that approximates what many FTP servers do:

chroot +(getpwnam('ftp'))[7]
    or die "Can't do anonymous ftp: $!\n";

close

close FILEHANDLE

This function closes the file, socket, or pipe associated with the filehandle. You don't have to close FILEHANDLE if you are immediately going to do another open on it, since the next open will close it for you. (See open.) However, an explicit close on an input file resets the line counter ($.), while the implicit close done by open does not. Also, closing a pipe will wait for the process executing on the pipe to complete (in case you want to look at the output of the pipe afterward), and it prevents the script from exiting before the pipeline is finished.[1] Closing a pipe explicitly also puts the status value of the command executing on the pipe into $?. For example:

open OUTPUT, '|sort >foo';     # pipe to sort
...                            # print stuff to output
close OUTPUT;                  # wait for sort to finish
die "sort failed" if $?;       # check for sordid sort
open INPUT, 'foo';             # get sort's results

[1] Note, however, that a dup'ed pipe is treated as an ordinary filehandle, and close will not wait for the child on that filehandle. You have to wait for the child by closing the filehandle on which it was originally opened.

FILEHANDLE may be an expression whose value gives the real filehandle name. It may also be a reference to a filehandle object returned by some of the newer object-oriented I/O packages.

closedir

closedir DIRHANDLE

This function closes a directory opened by opendir. See the examples under opendir.

connect

connect SOCKET, NAME

This function does the same thing as the connect system call--see connect (2). The function initiates a connection with another process that is waiting at an accept (2). The function returns true if it succeeded, false otherwise (and puts the error code into $!). NAME should be a packed network address of the proper type for the socket. For example:

connect S, $destadd
    or die "Can't connect to $hostname: $!\n";

To disconnect a socket, either close or shutdown. See also the example in the section "Sockets" in Chapter 6, Social Engineering.

cos

cos EXPR

This function returns the cosine of EXPR (expressed in radians). For example, the following script will print a cosine table of angles measured in degrees:

# Here's the lazy way of getting degrees-to-radians.
$pi = atan2(1,1) * 4;
$piover180 = $pi/180;
# Print table.
for ($_ = 0; $_ <= 90; $_++) {
    printf "%3d %7.5f\n", $_, cos($_ * $piover180);
}

For the inverse cosine operation, you may use the POSIX::acos() function, or use this relation:

sub acos { atan2( sqrt(1 - $_[0] * $_[0]), $_[0] ) }

crypt

crypt PLAINTEXT, SALT

This function encrypts a string exactly in the manner of crypt (3). This is useful for checking the password file for lousy passwords.[2] Only the guys wearing white hats are allowed to do this.

[2] What you really want to do is prevent people from adding the bad passwords in the first place.

To see whether a typed-in password $guess matches the password $pass obtained from a file (such as /etc/passwd), try something like the following:

if (crypt($guess, $pass) eq $pass) {
    # guess is correct
}

Note that there is no easy way to decrypt an encrypted password apart from guessing. Also, truncating the salt to two characters is a waste of CPU time, although the manpage for crypt (3) would have you believe otherwise.

Here's an example that makes sure that whoever runs this program knows their own password:

$pwd = (getpwuid $<)[1];
$salt = substr $pwd, 0, 2;
system "stty -echo";
print "Password: ";
chop($word = <STDIN>);
print "\n";
system "stty echo";
if (crypt($word, $salt) ne $pwd) {
    die "Sorry...\n";
} else {
    print "ok\n";
}

Of course, typing in your own password to whoever asks for it is unwise.

The crypt function is unsuitable for encrypting large quantities of data. Find a library module for PGP (or something like that) for something like that.

dbmclose

dbmclose HASH

This function breaks the binding between a DBM file and a hash.

This function is actually just a call to untie with the proper arguments, but is provided for backward compatibility with older versions of Perl.

dbmopen

dbmopen HASH, DBNAME, MODE

This binds a DBM file to a hash (that is, an associative array). (DBM stands for Data Base Management, and consists of a set of C library routines that allow random access to records via a hashing algorithm.) HASH is the name of the hash (with a %). DBNAME is the name of the database (without the .dir or .pag extension). If the database does not exist, and a valid MODE is specified, the database is created with the protection specified by MODE (as modified by the umask). To prevent creation of the database if it doesn't exist, you may specify a MODE of undef, and the function will return a false value if it can't find an existing database. If your system supports only the older DBM functions, you may have only one dbmopen in your program.

Values assigned to the hash prior to the dbmopen are not accessible.

If you don't have write access to the DBM file, you can only read the hash variables, not set them. If you want to test whether you can write, either use file tests or try setting a dummy hash entry inside an eval, which will trap the error.

Note that functions such as keys and values may return huge list values when used on large DBM files. You may prefer to use the each function to iterate over large DBM files. This example prints out the mail aliases on a system using sendmail:

dbmopen %ALIASES, "/etc/aliases", 0666
    or die "Can't open aliases: $!\n";
while (($key,$val) = each %ALIASES) {
    print $key, ' = ', $val, "\n";
}
dbmclose %ALIASES;

Hashes bound to DBM files have the same limitations as DBM files, in particular the restrictions on how much you can put into a bucket. If you stick to short keys and values, it's rarely a problem. Another thing you should bear in mind is that many existing DBM databases contain null-terminated keys and values because they were set up with C programs in mind. The B News history file and the old sendmail aliases file are examples. Just use "$key\0" instead of $key.

There is currently no built-in way to lock generic DBM files. Some would consider this a bug. The DB_File module does provide locking at the granularity of the entire file, however. See the documentation on that module in Chapter 7, The Standard Perl Library for details.

This function is actually just a call to tie with the proper arguments, but is provided for backward compatibility with older versions of Perl.

defined

defined EXPR

This function returns a Boolean value saying whether EXPR has a real value or not. A scalar that contains no valid string, numeric, or reference value is known as the undefined value, or undef for short. Many operations return the undefined value under exceptional conditions, such as end of file, uninitialized variable, system error, and such. This function allows you to distinguish between an undefined null string and a defined null string when you're using operators that might return a real null string.

You may also check to see whether arrays, hashes, or subroutines have been allocated any memory yet. Arrays and hashes are allocated when you first put something into them, whereas subroutines are allocated when a definition has been successfully parsed. Using defined on the predefined special variables is not guaranteed to produce intuitive results.

Here is a fragment that tests a scalar value from a hash:

print if defined $switch{'D'};

When used on a hash element like this, defined only tells you whether the value is defined, not whether the key has an entry in the hash table. It's possible to have an undefined scalar value for an existing hash key. Use exists to determine whether the hash key exists.

In the next example we use the fact that some operations return the undefined value when you run out of data:

print "$val\n" while defined($val = pop(@ary));

The same thing goes for error returns from system calls:

die "Can't readlink $sym: $!"
    unless defined($value = readlink $sym);

Since symbol tables for packages are stored as hashes (associative arrays), it's possible to check for the existence of a package like this:

die "No XYZ package defined" unless defined %XYZ::;

Finally, it's possible to avoid blowing up on nonexistent subroutines:

sub saymaybe {
   if (defined &say) {
       say(@_);
   }
   else {
       warn "Can't say";
   }
}

delete

delete EXPR

This function deletes the specified key and associated value from the specified hash. (It doesn't delete a file. See unlink for that.) Deleting from $ENV{} modifies the environment. Deleting from a hash that is bound to a (writable) DBM file deletes the entry from the DBM file.

The following naïve example inefficiently deletes all the values of a hash:

foreach $key (keys %HASH) {
    delete $HASH{$key};
}

(It would be faster to use the undef command.) EXPR can be arbitrarily complicated as long as the final operation is a hash key lookup:

delete $ref->[$x][$y]{$key};

For normal hashes, the delete function happens to return the value (not the key) that was deleted, but this behavior is not guaranteed for tied hashes, such as those bound to DBM files.

To test whether a hash element has been deleted, use exists.
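
The points above can be put together in a small sketch using an ordinary (untied) hash; the hash contents are sample data.

```perl
use strict;
use warnings;

my %color = (apple => 'red', banana => 'yellow', plum => undef);

my $gone        = delete $color{apple};            # returns 'red'
my $still_there = exists $color{apple} ? 1 : 0;    # 0: the key is gone

# plum has an undefined value but still exists until it is deleted.
my $before = exists $color{plum} ? 1 : 0;          # 1
delete $color{plum};
my $after  = exists $color{plum} ? 1 : 0;          # 0

undef %color;                                      # the fast way to empty the whole hash
```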

die

die LIST

Outside of an eval, this function prints the concatenated value of LIST to STDERR and exits with the current value of $! (errno). If $! is 0, it exits with the value of ($? >> 8) (which is the status of the last reaped child from a system, wait, close on a pipe, or `command`). If ($? >> 8) is 0, it exits with 255. If LIST is unspecified, the current value of the $@ variable is propagated, if any. Otherwise the string "Died" is used as the default.

Equivalent examples:

die "Can't cd to spool: $!\n" unless chdir '/usr/spool/news';
chdir '/usr/spool/news' or die "Can't cd to spool: $!\n";

(The second form is generally preferred, since the important part is the chdir.)

Within an eval, the function sets the $@ variable equal to the error message that would have been produced otherwise, and aborts the eval, which then returns the undefined value. The die function can thus be used to raise named exceptions that can be caught at a higher level in the program. See the section on the eval function later in this chapter.

If the final value of LIST does not end in a newline, the current script filename, line number, and input line number (if any) are appended to the message, as well as a newline. Hint: sometimes appending ", stopped" to your message will cause it to make better sense when the string "at scriptname line 123" is appended. Suppose you are running script canasta:

die "/etc/games is no good";
die "/etc/games is no good, stopped";

which produces, respectively:

/etc/games is no good at canasta line 123.
/etc/games is no good, stopped at canasta line 123.

If you want your own error messages reporting the filename and line number, use the __FILE__ and __LINE__ special tokens:

die '"', __FILE__, '", line ', __LINE__, ", phooey on you!\n";

This produces output like:

"canasta", line 38, phooey on you!
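
Using die inside an eval to raise a named exception, and catching it at a higher level, might look like the following sketch. The subroutine names and the exception string NEGATIVE are made up for the example.

```perl
use strict;
use warnings;

sub risky {
    my $n = shift;
    die "NEGATIVE\n" if $n < 0;    # trailing newline: message is left as is
    die "too big"    if $n > 100;  # no newline: file and line are appended
    return sqrt $n;
}

sub try_risky {
    my $n = shift;
    my $result = eval { risky($n) };
    return $result unless $@;                        # no error occurred
    return "negative input" if $@ eq "NEGATIVE\n";   # the named exception
    return "other error: $@";                        # anything else
}

my $ok    = try_risky(16);     # 4
my $named = try_risky(-1);     # "negative input"
my $other = try_risky(500);    # "other error: too big at ... line ..."
```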

do

do BLOCK
do SUBROUTINE(LIST)
do EXPR

The do BLOCK form executes the sequence of commands in the BLOCK, and returns the value of the last expression evaluated in the block. When modified by a loop modifier, Perl executes the BLOCK once before testing the loop condition. (On other statements the loop modifiers test the conditional first.)
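
A short sketch of both properties, with illustrative variable names: the loop modifier on a do BLOCK tests its condition after the first pass, and the block yields the value of its last expression.

```perl
use strict;
use warnings;

my $n = 10;

my $runs_do = 0;
do { $runs_do++ } while ($n < 5);      # body runs once before the test

my $runs_while = 0;
$runs_while++ while ($n < 5);          # condition tested first, body never runs

# The block's value is that of its last expression.
my $label = do {
    my $half = $n / 2;
    "half of $n is $half";
};
```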

The do SUBROUTINE(LIST) is a deprecated form of a subroutine call. See "Subroutines" in Chapter 2, The Gory Details.

The do EXPR form uses the value of EXPR as a filename and executes the contents of the file as a Perl script. Its primary use is (or rather was) to include subroutines from a Perl subroutine library, so that:

do 'stat.pl';

is rather like:

eval `cat stat.pl`;

except that it's more efficient, more concise, keeps track of the current filename for error messages, and searches all the directories listed in the @INC array. (See the section on "Special Variables" in Chapter 2, The Gory Details.) It's the same, however, in that it does reparse the file every time you call it, so you probably don't want to do this inside a loop.

Note that inclusion of library modules is better done with the use and require operators, which also do error checking and raise an exception if there's a problem.

dump

dump LABEL
dump

This function causes an immediate core dump. Primarily this is so that you can use undump (1) to turn your core dump into an executable binary after having initialized all your variables at the beginning of the program. (The undump program is not supplied with the Perl distribution, and is not even possible on some architectures. There are hooks in the code for using the GNU unexec() routine as an alternative. Other methods may be supported in the future.) When the new binary is executed it will begin by executing a goto LABEL (with all the restrictions that goto suffers). Think of the operation as a goto with an intervening core dump and reincarnation. If LABEL is omitted, the function arranges for the program to restart from the top. Please note that any files opened at the time of the dump will not be open any more when the program is reincarnated, with possible confusion resulting on the part of Perl. See also the -u command-line switch. For example:

#!/usr/bin/perl
use Getopt::Std;
use MyHorridModule;
%days = (
    Sun => 1,
    Mon => 2,
    Tue => 3,
    Wed => 4,
    Thu => 5,
    Fri => 6,
    Sat => 7,
);
dump QUICKSTART if $ARGV[0] eq '-d';
QUICKSTART:
getopts('f:');
...

This startup code runs some slow initialization, and then calls the dump function to take a snapshot of the program's state. When the dumped version of the program is run, it bypasses all the startup code and goes directly to the QUICKSTART label. If the original script is invoked without the -d switch, it just falls through and runs normally.

If you're looking to use dump to speed up your program, check out the discussion of efficiency matters in Chapter 8, Other Oddments, as well as the Perl native-code compiler in Chapter 6, Social Engineering. You might also consider autoloading, which at least makes it appear to run faster.

each

each HASH

This function returns a two-element list consisting of the key and value for the next element of a hash. With successive calls to each you can iterate over the entire hash. Entries are returned in an apparently random order. When the hash is entirely read, a null list is returned (which, when used in a list assignment, produces a false value). The next call to each after that will start a new iteration. The iterator can be reset either by reading all the elements from the hash, or by calling the keys function in scalar context. You must not add elements to the hash while iterating over it, although you are permitted to use delete. In a scalar context, each returns just the key, but watch out for false keys.

There is a single iterator for each hash, shared by all each, keys, and values function calls in the program. This means that after a keys or values call, the next each call will start again from the beginning. The following example prints out your environment like the printenv (1) program, only in a different order:

while (($key,$value) = each %ENV) {
    print "$key=$value\n";
}
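
The shared iterator is easy to trip over, so here is a minimal sketch of the reset behavior; the hash contents are sample data.

```perl
use strict;
use warnings;

my %age = (tom => 30, ann => 25, bob => 41);

my ($first_key) = each %age;       # advances the hash's iterator by one entry

my $count = keys %age;             # scalar context: counts keys and resets the iterator

my $seen = 0;
while (my ($name, $years) = each %age) {
    $seen++;                       # visits all three entries, thanks to the reset
}
```

Without the call to keys, the loop would have picked up where the first each left off and visited only two entries.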

eof

eof FILEHANDLE
eof()
eof

This function returns true if the next read on FILEHANDLE will return end of file, or if FILEHANDLE is not open. FILEHANDLE may be an expression whose value gives the real filehandle name. An eof without an argument returns the end-of-file status for the last file read. Empty parentheses () may be used in connection with the combined files listed on the command line. That is, inside a while (<>) loop eof() will detect the end of only the last of a group of files. Use eof(ARGV) or eof (without the parentheses) to test each file in a while (<>) loop. For example, the following code inserts dashes just before the last line of the last file:

while (<>) {
    if (eof()) {
        print "-" x 30, "\n";
    }
    print;
}

On the other hand, this script resets line numbering on each input file:

while (<>) {
    print "$.\t$_";
    if (eof) {       # Not eof().
        close ARGV;  # reset $.
    }
}

Like "$" in a sed program, eof tends to show up in line number ranges. Here's a script that prints lines from /pattern/ to end of each input file:

while (<>) {
    print if /pattern/ .. eof;
}

Here, the flip-flop operator (..) evaluates the regular expression match for each line. Until the pattern matches, the operator returns false. When it finally matches, the operator starts returning true, causing the lines to be printed. When the eof operator finally returns true (at the end of the file being examined), the flip-flop operator resets, and starts returning false again.
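
The same flip-flop can be tried without any files on disk. This sketch assumes a Perl that supports in-memory filehandles (opening a reference to a scalar) and uses made-up sample text; it passes the handle to eof explicitly because it is not reading through <>.

```perl
use strict;
use warnings;

# Sample text standing in for an input file.
my $text = "header\nBEGIN\nbody 1\nbody 2\n";
open my $fh, '<', \$text or die "Can't open in-memory file: $!\n";

my @kept;
while (my $line = <$fh>) {
    # False until /BEGIN/ matches, then true through the last line.
    push @kept, $line if $line =~ /BEGIN/ .. eof($fh);
}
close $fh;
```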

Note that the eof function actually reads a byte and then pushes it back on the input stream with ungetc (3), so it is not very useful in an interactive context. In fact, experienced Perl programmers rarely use eof, since the various input operators already behave quite nicely in while-loop conditionals. See the example in the description of foreach in Chapter 2, The Gory Details.

eval

eval EXPR
eval BLOCK

The value expressed by EXPR is parsed and executed as though it were a little Perl program. It is executed in the context of the current Perl program, so that any variable settings remain afterward, as do any subroutine or format definitions. The code of the eval is treated as a block, so any locally scoped variables declared within the eval last only until the eval is done. (See local and my.) As with any code in a block, a final semicolon is not required. If EXPR is omitted, the operator evaluates $_.

The value returned from an eval is the value of the last expression evaluated, just as with subroutines. Similarly, you may use the return operator to return a value from the middle of the eval. If there is a syntax error or run-time error (including any produced by the die operator), eval returns the undefined value and puts the error message in $@. If there is no error, $@ is guaranteed to be set to the null string, so you can test it reliably afterward for errors.

Here's a statement that assigns an element to a hash chosen at run-time:

eval "\$$arrayname{\$key} = 1";

(You can accomplish that more simply with soft references--see "Symbolic References" in Chapter 4, References and Nested Data Structures.) And here is a simple Perl shell:

while (<>) { eval; print $@; }

Since eval traps otherwise-fatal errors, it is useful for determining whether a particular feature (such as socket or symlink) is implemented. In fact, eval is the way to do all exception handling in Perl. If the code to be executed doesn't vary, you should use the eval BLOCK form to trap run-time errors; the code in the block is compiled only once rather than on each execution, yielding greater efficiency. The error, if any, is still returned in $@. Examples:

# make divide-by-zero non-fatal
eval { $answer = $a / $b; }; warn $@ if $@;
# same thing, but less efficient
eval '$answer = $a / $b'; warn $@ if $@;
# a compile-time error (not trapped)
eval { $answer = };
# a run-time error
eval '$answer =';  # sets $@

Here, the code in the BLOCK has to be valid Perl code to make it past the compilation phase. The code in the string doesn't get examined until run-time, and so doesn't cause an error until run-time.
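
As a small sketch of exception handling with the BLOCK form, the following wraps a division so that the caller gets the error message instead of a fatal error. The subroutine name safe_divide and its return convention are invented for the example; the last line shows the usual way to probe for an optional feature such as symlink.

```perl
use strict;
use warnings;

# Returns (quotient, '') on success or (undef, message) on failure.
sub safe_divide {
    my ($x, $y) = @_;
    my $answer = eval { $x / $y };
    return ($answer, $@);
}

my ($q, $err)   = safe_divide(10, 4);   # 2.5 and the null string
my ($bad, $why) = safe_divide(1, 0);    # undef and "Illegal division by zero at ..."

# Probe for an optional feature without risking a fatal error.
my $has_symlink = eval { symlink("", ""); 1 } ? 1 : 0;
```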

With an eval you should be careful to remember what's being looked at when:

eval $x;          # CASE 1
eval "$x";        # CASE 2
eval '$x';        # CASE 3
eval { $x };      # CASE 4
eval "\$$x++";    # CASE 5
$$x++;            # CASE 6

Cases 1 and 2 above behave identically: they run the code contained in the variable $x. (Case 2 has misleading double quotes, making the reader wonder what else might be happening, when nothing is. The contents of $x would in any event have to be converted to a string for parsing.) Cases 3 and 4 likewise behave in the same way: they run the code $x, which does nothing at all except return the value of $x. (Case 4 is preferred since the expression doesn't need to be recompiled each time.) Case 5 is a place where normally you would like to use double quotes to let you interpolate the variable name, except that in this particular situation you can just use symbolic references instead, as in case 6.

A frequently asked question is how to set up an exit routine. One common way is to use an END block. But you can also do it with an eval, like this:

#!/usr/bin/perl
eval <<'EndOfEval';  $start = __LINE__;
   .
   .           # your ad here
   .
EndOfEval
# Cleanup
unlink "/tmp/myfile$$";
$@ && ($@ =~ s/\(eval \d+\) at line (\d+)/$0 .
    " line " . ($1+$start)/e, die $@);
exit 0;

Note that the code supplied for an eval might not be recompiled if the text hasn't changed. On the rare occasions when you want to force a recompilation (because you want to reset a .. operator, for instance), you could say something like this:

eval $prog . '#' . ++$seq;

exec

exec LIST

This function terminates the currently running Perl script by executing another program in place of itself. If there is more than one argument in LIST (or if LIST is an array with more than one value) the function calls C's execvp (3) routine with the arguments in LIST. This bypasses any shell processing of the command. If there is only one scalar argument, the argument is checked for shell metacharacters. If metacharacters are found, the entire argument is passed to "/bin/sh -c" for parsing.[3] If there are no metacharacters, the argument is split into words and passed directly to execvp (3) in the interests of efficiency, since this bypasses all the overhead of shell processing. Ordinarily exec never returns--if it does return, it always returns false, and you should check $! to find out what went wrong. Note that exec (and system) do not flush your output buffer, so you may need to enable command buffering by setting $| on one or more filehandles to avoid lost output. This statement runs the echo program to print the current argument list:

[3] Under UNIX, that is. Other operating systems may use other command interpreters.

exec 'echo', 'Your arguments are: ', @ARGV;

This example shows that you can exec a pipeline:

exec "sort $outfile | uniq"
  or die "Can't do sort/uniq: $!\n";

The UNIX execv (3) call provides the ability to tell a program the name it was invoked as. This name might have nothing to do with the name of the program you actually gave the operating system to run. By default, Perl simply replicates the first element of LIST and uses it for both purposes. If, however, you don't really want to execute the first argument of LIST, but you want to lie to the program you are executing about its own name, you can do so. Put the real name of the program you want to run into a variable and then put that variable out in front of the LIST without a comma, kind of like a filehandle for a print statement. (This always forces interpretation of the LIST as a multi-valued list, even if there is only a single scalar in the list.) Then the first element of LIST will be used only to mislead the executing program as to its name. For example:

$shell = '/bin/csh';
exec $shell '-sh', @args;      # pretend it's a login shell
die "Couldn't execute csh: $!\n";

You can also replace the simple scalar holding the program name with a block containing arbitrary code, which simplifies the above example to:

exec {'/bin/csh'} '-sh', @args; # pretend it's a login shell
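To see that the list form of exec really bypasses the shell, here is a small sketch that runs echo in a child process and reads back what it printed. It assumes a UNIX-like system with an echo program on the path; the forking form of open is used only to capture the child's output.

```perl
use strict;
use warnings;

# Forking open: the parent gets the child's pid, the child gets 0
# and has its STDOUT connected to the parent's $from_child handle.
my $pid = open(my $from_child, '-|');
die "Can't fork: $!\n" unless defined $pid;

if ($pid == 0) {
    # List form: the '*' and the doubled space reach echo untouched,
    # because no shell ever sees them.
    exec 'echo', 'a  b', '*' or die "Can't exec echo: $!\n";
}

my $output = <$from_child>;
close $from_child;
print "echo printed: $output";
```

Had the same words been passed as one string, the * would have been treated as a shell metacharacter and expanded by /bin/sh.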

exists

exists EXPR

This function returns true if the specified hash key exists in its hash, even if the corresponding value is undefined.

print "Exists\n" if exists $hash{$key};
print "Defined\n" if defined $hash{$key};
print "True\n" if $hash{$key};

A hash element can only be true if it's defined, and can only be defined if it exists, but the reverse doesn't necessarily hold true in either case.

EXPR can be arbitrarily complicated as long as the final operation is a hash key lookup:

if (exists $ref->[$x][$y]{$key}) { ... }
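The three tests above can be lined up against a small hash to see where they differ. The keys and values here are invented for illustration.

```perl
use strict;
use warnings;

my %hash = (
    one   => 1,        # exists, defined, and true
    zero  => 0,        # exists and defined, but false
    undef => undef,    # exists, but not defined
);

my %report;
for my $key (qw(one zero undef missing)) {
    $report{$key} = join ',',
        (exists  $hash{$key} ? 'exists'  : ()),
        (defined $hash{$key} ? 'defined' : ()),
        ($hash{$key}         ? 'true'    : ());
    print "$key: ", ($report{$key} || 'nothing'), "\n";
}
```

Merely testing a key with exists, defined, or a Boolean check does not create it, so the "missing" key is still absent after the loop.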

exit

exit EXPR

This function evaluates EXPR and exits immediately with that value. Here's a fragment that lets a user exit the program by typing x or X:

$ans = <STDIN>;
exit 0 if $ans =~ /^[Xx]/;

If EXPR is omitted, the function exits with 0 status. You shouldn't use exit to abort a subroutine if there's any chance that someone might want to trap whatever error happened. Use die instead, which can be trapped by an eval.
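As a sketch of why die is friendlier than exit inside a subroutine, the caller below traps the error and carries on. The routine name parse_port and the fallback value are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical routine: it reports failure with die, so callers may recover.
sub parse_port {
    my ($text) = @_;
    die "Not a port number: $text\n" unless $text =~ /^\d+$/;
    return $text + 0;
}

my $port  = eval { parse_port("http") };
my $error = $@;
$port = 80 unless defined $port;     # fall back after the trapped error
print "Using port $port (error was: $error)";
```

Had parse_port called exit instead, the program would have ended before the fallback could run.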

exp

exp EXPR

This function returns e to the power of EXPR. If EXPR is omitted, it gives exp($_). To do general exponentiation, use the ** operator.
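A quick sketch contrasting exp with the ** operator (the variable names are arbitrary):

```perl
use strict;
use warnings;

my $e         = exp(1);          # e itself, roughly 2.71828
my $roundtrip = log(exp(2));     # log undoes exp, so this is very nearly 2
my $power     = 2 ** 10;         # general exponentiation
my $root      = 16 ** 0.5;       # fractional exponents work too

printf "e=%.5f roundtrip=%.5f power=%d root=%d\n",
    $e, $roundtrip, $power, $root;
```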

fcntl

fcntl FILEHANDLE, FUNCTION, SCALAR

This function calls UNIX's fcntl (2) function. (fcntl stands for "file control".) You'll probably have to say:

use Fcntl;

first to get the correct function definitions. SCALAR will be read and/or written depending on the FUNCTION--a pointer to the string value of SCALAR will be passed as the third argument of the actual fcntl call. (If SCALAR has no string value but does have a numeric value, that value will be passed directly rather than a pointer to the string value.)

The return value of fcntl (and ioctl) is as follows:

System call returns    Perl returns
-1                     undefined value
0                      string "0 but true"
anything else          that number

Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system:

$retval = fcntl(...) or $retval = -1;
printf "System returned %d\n", $retval;

Here, even the string "0 but true" prints as 0, thanks to the %d format.
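The special string can be examined on its own, without making any system call. This sketch simply assigns it to a variable to show how it behaves in Boolean, numeric, and %d contexts.

```perl
use strict;
use warnings;

my $retval = "0 but true";   # what fcntl hands back when the system call returns 0

my $is_true = $retval ? 1 : 0;           # a string other than "" or "0", so true
my $number  = $retval + 0;               # numifies to 0; Perl exempts this string
                                         # from the usual "isn't numeric" warning
my $shown   = sprintf "%d", $retval;     # formats as "0"

print "true=$is_true number=$number shown=$shown\n";
```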

For example, since Perl always sets the close-on-exec flag for file descriptors above 2, if you wanted to pass file descriptor 3 to a subprocess, you might want to clear the flag like this:

use Fcntl;
open TTY,"+>/dev/tty" or die "Can't open /dev/tty: $!\n";
fileno TTY == 3 or die "Internal error: fd mixup";
fcntl TTY, &F_SETFD, 0
    or die "Can't clear the close-on-exec flag: $!\n";

fcntl will produce a fatal error if used on a machine that doesn't implement fcntl (2). On machines that do implement it, you can do such things as modify the close-on-exec flags, modify the non-blocking I/O flags, emulate the lockf (3) function, and arrange to receive the SIGIO signal when I/O is pending. You might even have record-locking facilities.
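When other descriptor flags might be set, a read-modify-write sequence is a little safer than storing 0 outright. The following is a sketch under the assumption of a UNIX-like system where /dev/null exists; F_GETFD, F_SETFD, and FD_CLOEXEC come from the Fcntl module.

```perl
use strict;
use warnings;
use Fcntl;   # exports F_GETFD, F_SETFD, FD_CLOEXEC

open(my $fh, '<', '/dev/null') or die "Can't open /dev/null: $!\n";

# Fetch the current descriptor flags ("0 but true" if none are set).
my $flags = fcntl($fh, F_GETFD, 0)
    or die "Can't get descriptor flags: $!\n";

# Clear only the close-on-exec bit, leaving any other flags alone.
fcntl($fh, F_SETFD, $flags & ~FD_CLOEXEC)
    or die "Can't clear the close-on-exec flag: $!\n";

my $after = fcntl($fh, F_GETFD, 0);
print "close-on-exec is ", (($after & FD_CLOEXEC) ? "on" : "off"), "\n";
```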

fileno

fileno FILEHANDLE

This function returns the file descriptor for a filehandle. (A file descriptor is a small integer, unlike the filehandle, which is a symbol.) It returns undef if the handle is not open. It's useful for constructing bitmaps for select, and for passing to certain obscure system calls if syscall (2) is implemented. It's also useful for double-checking that the open function gave you the file descriptor you wanted--see the example under fcntl.

If FILEHANDLE is an expression, its value is taken to represent a filehandle, either indirectly by name, or directly as a reference to a filehandle object.

A caution: don't count on the association of a Perl filehandle and a numeric file descriptor throughout the life of the program. If a file has been closed and reopened, the file descriptor may change. Filehandles STDIN, STDOUT, and STDERR start with file descriptors of 0, 1, and 2 (the UNIX standard convention), but even they can change if you start closing and opening them with wild abandon. But you can't get into trouble with 0, 1, and 2 as long as you always reopen immediately after closing, since the basic rule on UNIX systems is to pick the lowest available descriptor, and that'll be the one you just closed.

flock

flock FILEHANDLE, OPERATION

This function calls flock (2) on FILEHANDLE. See the manual page for flock (2) for the definition of OPERATION. Invoking flock will produce a fatal error if used on a machine that doesn't implement flock (2) or emulate it through some other locking mechanism. Here's a mailbox appender for some BSD-based systems:

$LOCK_SH = 1;
$LOCK_EX = 2;
$LOCK_NB = 4;
$LOCK_UN = 8;
sub lock {
    flock MBOX, $LOCK_EX;
    # and, in case someone appended
    # while we were waiting...
    seek MBOX, 0, 2;
}
sub unlock {
    flock MBOX, $LOCK_UN;
}
open MBOX, ">>/usr/spool/mail/$ENV{'USER'}"
    or die "Can't open mailbox: $!";
lock();
print MBOX $msg, "\n\n";
unlock();

Note that flock is unlikely to work on a file being accessed through a network file system.
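Rather than hard-coding the operation numbers, you can usually take them from the Fcntl module. The sketch below appends to a scratch file that stands in for the mailbox (an assumption for the demonstration) and uses the LOCK_EX and SEEK_END constants.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);     # LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN, SEEK_END
use File::Temp qw(tempfile);

# A scratch file plays the part of the mailbox here.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

open(my $mbox, '>>', $path) or die "Can't open $path: $!";
flock($mbox, LOCK_EX)       or die "Can't lock $path: $!";
seek($mbox, 0, SEEK_END)    or die "Can't seek: $!";   # in case someone appended
print $mbox "a message\n\n";
close($mbox)                or die "Can't close $path: $!";   # flushes, then drops the lock

my $size = -s $path;
print "mailbox is now $size bytes\n";
```

Closing the handle instead of unlocking explicitly is a design choice: it makes sure buffered output reaches the file before the lock goes away.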

fork

fork

This function does a fork (2) call. If it succeeds, the function returns the child pid to the parent process and 0 to the child process. (If it fails, it returns the undefined value to the parent process. There is no child process.) Note that unflushed buffers remain unflushed in both processes, which means you may need to set $| on one or more filehandles earlier in the program to avoid duplicate output.

A nearly bulletproof way to launch a child process while checking for "cannot fork" errors would be:

FORK: {
    if ($pid = fork) {
        # parent here
        # child process pid is available in $pid
    } elsif (defined $pid) { # $pid is zero here if defined
        # child here
        # parent process pid is available with getppid
    } elsif ($! =~ /No more process/) {     
        # EAGAIN, supposedly recoverable fork error
        sleep 5;
        redo FORK;
    } else {
        # weird fork error
        die "Can't fork: $!\n";
    }
}

These precautions are not necessary on operations which do an implicit fork (2), such as system, backquotes, or opening a process as a filehandle, because Perl automatically retries a fork on a temporary failure in these cases. Be very careful to end the child code with an exit, or your child may inadvertently leave the conditional and start executing code intended only for the parent process.

If you fork your child processes, you'll have to wait on their zombies when they die. See the wait function for examples of doing this.
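The two warnings above (end the child with exit, and wait for it afterward) can be seen together in a stripped-down sketch. The exit value 7 is arbitrary.

```perl
use strict;
use warnings;

my $pid = fork;
die "Can't fork: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child: do the work, then exit so we never fall into the parent's code.
    exit 7;
}

# Parent: reap the child so it doesn't linger as a zombie.
my $reaped = waitpid($pid, 0);
my $status = $? >> 8;            # the child's exit value
print "child $reaped exited with status $status\n";
```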

The fork function is unlikely to be implemented on any operating system not resembling UNIX, unless it purports POSIX compliance.

format

format NAME =
    picture line
    value list
    ...
.

Declares a named sequence of picture lines (with associated values) for use by the write function. If NAME is omitted, the name defaults to STDOUT, which happens to be the default format name for the STDOUT filehandle. Since, like a sub declaration, this is a global declaration that happens at compile time, any variables used in the value list need to be visible at the point of the format's declaration. That is, lexically scoped variables must be declared earlier in the file, while dynamically scoped variables merely need to be set in the routine that calls write. Here's an example (which assumes we've already calculated $cost and $quantity):

my $str = "widget";               # A lexically scoped variable.
format Nice_Output =
Test: @<<<<<<<< @||||| @>>>>>
      $str,     $%,    '$' . int($num)
.
$~ = "Nice_Output";               # Select our format.
local $num = $cost * $quantity;   # Dynamically scoped variable.
write;

Like filehandles, format names are identifiers that exist in a symbol table (package) and may be fully qualified by package name. Within the typeglobs of a symbol table's entries, formats reside in their own namespace, which is distinct from filehandles, directory handles, scalars, arrays, hashes, or subroutines. Like those other six types, however, a format named Whatever would also be affected by a local on the *Whatever typeglob. In other words, a format is just another gadget contained in a typeglob, independent of the other gadgets.

The "Formats" section in Chapter 2, The Gory Details contains numerous details and examples of their use. The "Per Filehandle Special Variables" and "Global Special Variables" sections in Chapter 2, The Gory Details describe the internal format-specific variables, and the English and FileHandle modules in Chapter 7, The Standard Perl Library provide easier access to them.

formline

formline PICTURE, LIST

This is an internal function used by formats, although you may also call it. It formats a list of values according to the contents of PICTURE, placing the output into the format output accumulator, $^A. Eventually, when a write is done, the contents of $^A are written to some filehandle, but you could also read $^A yourself and then set $^A back to "". Note that a format typically does one formline per line of form, but the formline function itself doesn't care how many newlines are embedded in the PICTURE. This means that the ~ and ~~ tokens will treat the entire PICTURE as a single line. You may therefore need to use multiple formlines to implement a single record-format, just like the format compiler.

Be careful if you put double quotes around the picture, since an @ character may be taken to mean the beginning of an array name. formline always returns true. See "Formats" in Chapter 2, The Gory Details for other examples.
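Here is a minimal sketch of calling formline directly and collecting the result from $^A. The picture and values are made up; note the single quotes around the part of the picture that contains @ fields.

```perl
use strict;
use warnings;

# Single quotes keep the @ fields from being interpolated as arrays.
formline('@<<< @>>>' . "\n", "ab", 12);

my $line = $^A;    # read the accumulator ourselves...
$^A = "";          # ...and reset it for the next use
print $line;
```

The first field is left-justified in four columns and the second is right-justified in four columns, with the literal space between them preserved.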

getc

getc FILEHANDLE
getc

This function returns the next byte from the input file attached to FILEHANDLE. At end-of-file, it returns a null string. If FILEHANDLE is omitted, the function reads from STDIN. This operator is very slow, but is occasionally useful for single-character, buffered input from the keyboard. By itself it does not enable unbuffered single-character input, since the terminal still hands over input a line at a time. For unbuffered input, you have to be slightly more clever, in an operating-system-dependent fashion. Under UNIX you might say this:

if ($BSD_STYLE) {
  system "stty cbreak </dev/tty >/dev/tty 2>&1";
} else {
  system "stty", "-icanon", "eol", "\001";
}
$key = getc;
if ($BSD_STYLE) {
  system "stty -cbreak </dev/tty >/dev/tty 2>&1";
} else {
  system "stty", "icanon", "eol", "^@"; # ASCII NUL
}
print "\n";

This code puts the next character typed on the terminal in the string $key. If your stty program has options like cbreak, you'll need to use the code where $BSD_STYLE is true; otherwise, you'll need to use the code where it is false. Determining the options for stty is left as an exercise to the reader.

The POSIX module in Chapter 7, The Standard Perl Library provides a more portable version of this using the getattr method of its POSIX::Termios objects. See also the Term::ReadKey module from your nearest CPAN site.

getgrent

getgrent
setgrent
endgrent

These functions do the same thing as their like-named system library routines--see getgrent (3). These routines iterate through your /etc/group file (or its moral equivalent coming from some server somewhere). The return value from getgrent in list context is:

($name, $passwd, $gid, $members)

where $members contains a space-separated list of the login names of the members of the group. To set up a hash for translating group names to gids, say this:

while (($name, $passwd, $gid) = getgrent) {
    $gid{$name} = $gid;
}

In scalar context, getgrent returns only the group name.

getgrgid

getgrgid GID

This function does the same thing as getgrgid (3): it looks up a group file entry by group number. The return value in list context is:

($name, $passwd, $gid, $members)

where $members contains a space-separated list of the login names of the members of the group. If you want to do this repeatedly, consider caching the data in a hash (associative array) using getgrent.

In scalar context, getgrgid returns only the group name.

getgrnam

getgrnam NAME

This function does the same thing as getgrnam (3): it looks up a group file entry by group name. The return value in list context is:

($name, $passwd, $gid, $members)

where $members contains a space-separated list of the login names of the members of the group. If you want to do this repeatedly, consider slurping the data into a hash (associative array) using getgrent.

In scalar context, getgrnam returns only the numeric group ID.

gethostbyaddr

gethostbyaddr ADDR, ADDRTYPE

This function does the same thing as gethostbyaddr (3): it translates a packed binary network address to its corresponding names (and alternate addresses). The return value in list context is:

($name, $aliases, $addrtype, $length, @addrs)

where @addrs is a list of packed binary addresses. In the Internet domain, each address is four bytes long, and can be unpacked by saying something like:

($a, $b, $c, $d) = unpack('C4', $addrs[0]);

In scalar context, gethostbyaddr returns only the host name. See the section on "Sockets" in Chapter 6, Social Engineering for another approach.
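To experiment with packed addresses without touching the network, you can build one yourself. This sketch uses inet_aton and inet_ntoa from the Socket module on a dotted-quad literal, which needs no name lookup; the variable names are arbitrary.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

my $packed = inet_aton("127.0.0.1");           # four raw bytes
my @octets = unpack('C4', $packed);            # (127, 0, 0, 1)

my $by_hand   = join '.', @octets;             # reassemble the dotted quad
my $by_socket = inet_ntoa($packed);            # or let Socket do it

print length($packed), " bytes: $by_hand / $by_socket\n";
```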

gethostbyname

gethostbyname NAME

This function does the same thing as gethostbyname (3): it translates a network hostname to its corresponding addresses (and other names). The return value in list context is:

($name, $aliases, $addrtype, $length, @addrs)

where @addrs is a list of raw addresses. In the Internet domain, each address is four bytes long, and can be unpacked by saying something like:

($a, $b, $c, $d) = unpack('C4', $addrs[0]);

In scalar context, gethostbyname returns only the host address. See the section on "Sockets" in Chapter 6, Social Engineering for another approach.

gethostent

gethostent
sethostent STAYOPEN
endhostent

These functions do the same thing as their like-named system library routines--see gethostent (3).

They iterate through your /etc/hosts file and return each entry one at a time. The return value from gethostent is:

($name, $aliases, $addrtype, $length, @addrs)

where @addrs is a list of raw addresses. In the Internet domain, each address is four bytes long, and can be unpacked by saying something like:

($a, $b, $c, $d) = unpack('C4', $addrs[0]);

Scripts that use these routines should not be considered portable. If a machine uses a nameserver, it would interrogate most of the Internet to try to satisfy a request for all the addresses of every machine on the planet. So these routines are unimplemented on such machines.

getlogin

getlogin

This function returns the current login from /etc/utmp, if any. If null, use getpwuid. For example:

$login = getlogin || (getpwuid($<))[0] || "Intruder!!";

getnetbyaddr

getnetbyaddr ADDR, ADDRTYPE

This function does the same thing as getnetbyaddr (3): it translates a network address to the corresponding network name or names. The return value in list context is:

($name, $aliases, $addrtype, $net)

In scalar context, getnetbyaddr returns only the network name.

getnetbyname

getnetbyname NAME

This function does the same thing as getnetbyname (3): it translates a network name to its corresponding network address. The return value in list context is:

($name, $aliases, $addrtype, $net)

In scalar context, getnetbyname returns only the network address.

getnetent

getnetent
setnetent STAYOPEN
endnetent

These functions do the same thing as their like-named system library routines--see getnetent (3). They iterate through your /etc/networks file, or moral equivalent. The return value in list context is:

($name, $aliases, $addrtype, $net)

In scalar context, getnetent returns only the network name.

getpeername

getpeername SOCKET

This function returns the packed socket address of the other end of the SOCKET connection. For example:

use Socket;
$hersockaddr = getpeername SOCK;
($port, $heraddr) = unpack_sockaddr_in($hersockaddr);
$herhostname = gethostbyaddr($heraddr, AF_INET);
$herstraddr = inet_ntoa($heraddr);

getpgrp

getpgrp PID

This function returns the current process group for the specified PID (use a PID of 0 for the current process). Invoking getpgrp will produce a fatal error if used on a machine that doesn't implement getpgrp (2). If PID is omitted, the function returns the process group of the current process (the same as using a PID of 0). On systems implementing this operator with the POSIX getpgrp (2) system call, PID must be omitted or, if supplied, must be 0.

getppid

getppid

This function returns the process ID of the parent process. On the typical UNIX system, if your parent process ID changes to 1, your parent process has died and you've been adopted by the init program.

getpriority

getpriority WHICH, WHO

This function returns the current priority for a process, a process group, or a user. See getpriority (2). Invoking getpriority will produce a fatal error if used on a machine that doesn't implement getpriority (2). For example, to get the priority of the current process, use:

$curprio = getpriority(0, 0);

getprotobyname

getprotobyname NAME

This function does the same thing as getprotobyname (3): it translates a protocol name to its corresponding number. The return value in list context is:

($name, $aliases, $protocol_number)

In scalar context, getprotobyname returns only the protocol number.

getprotobynumber

getprotobynumber NUMBER

This function does the same thing as getprotobynumber (3): it translates a protocol number to its corresponding name. The return value in list context is:

($name, $aliases, $protocol_number)

In scalar context, getprotobynumber returns only the protocol name.

getprotoent

getprotoent
setprotoent STAYOPEN
endprotoent

These functions do the same thing as their like-named system library routines--see getprotoent (3). The return value from getprotoent is:

($name, $aliases, $protocol_number)

In scalar context, getprotoent returns only the protocol name.

getpwent

getpwent
setpwent
endpwent

These functions do the same thing as their like-named system library routines--see getpwent (3). They iterate through your /etc/passwd file (or its moral equivalent coming from some server somewhere). The return value in list context is:

($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell)

Some machines may use the quota and comment fields for other purposes, but the remaining fields will always be the same. To set up a hash for translating login names to uids, say this:

while (($name, $passwd, $uid) = getpwent) {
    $uid{$name} = $uid;
}

In scalar context, getpwent returns only the username.

getpwnam

getpwnam NAME

This function does the same thing as getpwnam (3): it translates a username to the corresponding passwd file entry. The return value in list context is:

($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell)

If you want to do this repeatedly, consider caching the data in a hash (associative array) using getpwent.

In scalar context, getpwnam returns only the numeric user ID.

getpwuid

getpwuid UID

This function does the same thing as getpwuid (3): it translates a numeric user id to the corresponding passwd file entry. The return value in list context is:

($name,$passwd,$uid,$gid,$quota,$comment,$gcos,$dir,$shell)

If you want to do this repeatedly, consider slurping the data into a hash using getpwent.

In scalar context, getpwuid returns the username.

getservbyname

getservbyname NAME, PROTO

This function does the same thing as getservbyname (3): it translates a service (port) name to its corresponding port number. PROTO is a protocol name such as "tcp". The return value in list context is:

($name, $aliases, $port_number, $protocol_name)

In scalar context, getservbyname returns only the service port number.

getservbyport

getservbyport PORT, PROTO

This function does the same thing as getservbyport (3): it translates a service (port) number to its corresponding names. PROTO is a protocol name such as "tcp". The return value in list context is:

($name, $aliases, $port_number, $protocol_name)

In scalar context, getservbyport returns only the service port name.

getservent

getservent
setservent STAYOPEN
endservent

These functions do the same thing as their like-named system library routines--see getservent (3). They iterate through the /etc/services file or its equivalent. The return value in list context is:

($name, $aliases, $port_number, $protocol_name)

In scalar context, getservent returns only the service port name.

getsockname

getsockname SOCKET

This function returns the packed sockaddr address of this end of the SOCKET connection. (And why wouldn't you know your own address already? Because you might have bound an address containing wildcards to the generic socket before doing an accept. Or because you might have been passed a socket by your parent process--for example, inetd.)

use Socket;
$mysockaddr = getsockname(SOCK);
($port, $myaddr) = unpack_sockaddr_in($mysockaddr);

getsockopt

getsockopt SOCKET, LEVEL, OPTNAME

This function returns the socket option requested, or the undefined value if there is an error. See setsockopt for more.

glob

glob EXPR

This function returns the value of EXPR with filename expansions such as a shell would do. (If EXPR is omitted, $_ is globbed instead.) This is the internal function implementing the <*> operator, except that it may be easier to type this way. For example, compare these two:

@result = map { glob($_) } "*.c", "*.c,v";
@result = map <${_}>, "*.c", "*.c,v";

The glob function is not related to the Perl notion of typeglobs, other than that they both use a * to represent multiple items.
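A self-contained way to try glob is to build a scratch directory first. The file names below are hypothetical, and the sketch assumes the temporary directory's path contains no whitespace or wildcard characters.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Build a scratch directory holding a few empty files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(main.c util.c notes.txt)) {
    open(my $fh, '>', "$dir/$name") or die "Can't create $dir/$name: $!";
    close $fh;
}

# Expand the wildcard, then strip the directory part for display.
my @c_files = map { basename($_) } glob("$dir/*.c");
print "@c_files\n";
```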

gmtime

gmtime EXPR

This function converts a time as returned by the time function to a 9-element list with the time correct for the Greenwich time zone (aka GMT, or UTC, or even Zulu in certain cultures, not including the Zulu culture, oddly enough). Typically used as follows:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
        gmtime(time);

All list elements are numeric, and come straight out of a struct tm (that's a C programming structure--don't sweat it). In particular this means that $mon has the range 0..11, $wday has the range 0..6, and the year has had 1,900 subtracted from it. (You can remember which ones are 0-based because those are the ones you're always using as subscripts into 0-based arrays containing month and day names.) If EXPR is omitted, it does gmtime(time). For example, to print the current month in London:

$london_month = (qw(Jan Feb Mar Apr May Jun
        Jul Aug Sep Oct Nov Dec))[(gmtime)[4]];

The Perl library module Time::Local contains a subroutine, timegm( ), that can convert in the opposite direction.

In scalar context, gmtime returns a ctime (3)-like string based on the GMT time value.
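The adjustments to $mon and $year are easy to get wrong, so here is a sketch that formats a known time value (0, the start of the epoch) and converts it back with timegm from Time::Local.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @t = gmtime(0);                       # the epoch itself
my $stamp = sprintf "%04d-%02d-%02d %02d:%02d:%02d",
    $t[5] + 1900,                        # undo the 1,900 subtraction
    $t[4] + 1,                           # months are 0-based
    @t[3, 2, 1, 0];                      # mday, hour, min, sec

my $string = gmtime(0);                  # scalar context: ctime-like string
my $back   = timegm(0, 0, 0, 1, 0, 1970);   # sec, min, hour, mday, mon, year

print "$stamp\n$string\n$back\n";
```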

goto

goto LABEL
goto EXPR
goto &NAME

goto LABEL finds the statement labeled with LABEL and resumes execution there. It may not be used to go into any construct that requires initialization, such as a subroutine or a foreach loop. It also can't be used to go into a construct that is optimized away. It can be used to go almost anywhere else within the dynamic scope,[4] including out of subroutines, but for that purpose it's usually better to use some other construct such as last or die. The author of Perl has never felt the need to use this form of goto (in Perl, that is--C is another matter).

[4] This means that if it doesn't find the label in the current routine, it looks back through the routines that called the current routine for the label, thus making it nearly impossible to maintain your program.

Going to even greater heights of orthogonality (and depths of idiocy), Perl allows goto EXPR, which expects EXPR to evaluate to a label name, whose scope is guaranteed to be unresolvable until run-time since the label is unknown when the statement is compiled. This allows for computed gotos per FORTRAN, but isn't necessarily recommended[5] if you're optimizing for maintainability:

[5] Understatement is reputed to be funny, so we thought we'd try one here.

goto +("FOO", "BAR", "GLARCH")[$i];

goto &NAME is highly magical, substituting a call to the named subroutine for the currently running subroutine. This is used by AUTOLOAD subroutines that wish to load another subroutine and then pretend that this subroutine--and not the original one--had been called in the first place (except that any modifications to @_ in the original subroutine are propagated to the replacement subroutine). After the goto, not even caller will be able to tell that the original routine was called first.
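The AUTOLOAD use of goto &NAME might look something like the following sketch. The package name Greeter and the generated subroutine body are invented for illustration; a real AUTOLOAD would typically load or compile something more useful.

```perl
use strict;
use warnings;

package Greeter;                 # hypothetical package
our $AUTOLOAD;

sub AUTOLOAD {
    my $name = $AUTOLOAD;        # fully qualified, e.g. "Greeter::hello"
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    no strict 'refs';
    *{$AUTOLOAD} = sub { "Hello from $name(@_)" };   # install the real sub
    goto &$AUTOLOAD;             # jump to it with the current @_
}

package main;

my $first  = Greeter::hello("world");   # goes through AUTOLOAD once
my $second = Greeter::hello("again");   # now calls the installed sub directly
print "$first\n$second\n";
```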

grep

grep EXPR, LIST
grep BLOCK LIST

This function evaluates EXPR or BLOCK in a Boolean context for each element of LIST, temporarily setting $_ to each element in turn. In list context, it returns a list of those elements for which the expression is true. (The operator is named after a beloved UNIX program that extracts lines out of a file that match a particular pattern. In Perl the expression is often a pattern, but doesn't have to be.) In scalar context, grep returns the number of times the expression was true.

Presuming @all_lines contains lines of code, this example weeds out comment lines:

@code_lines = grep !/^#/, @all_lines;

Since $_ is a reference into the list value, altering $_ will modify the elements of the original list. While this is useful and supported, it can occasionally cause bizarre results if you aren't expecting it. For example:

@list = qw(barney fred dino wilma);
@greplist = grep { s/^[bfd]// } @list;

@greplist is now "arney", "red", "ino", but @list is now "arney", "red", "ino", "wilma"! Caveat Programmor.

See also map. The following two statements are functionally equivalent:

@out = grep { EXPR } @in;
@out = map { EXPR ? $_ : () } @in;
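If you want the substituted strings but need the original list left alone, one simple approach (a sketch, not the only way) is to run grep over a copy. The same example also shows grep counting matches in scalar context.

```perl
use strict;
use warnings;

my @list = qw(barney fred dino wilma);

# Scalar context: how many elements start with b, f, or d?
my $count = grep { /^[bfd]/ } @list;

# Work on a copy so the substitution can't reach back into @list.
my @copy     = @list;
my @greplist = grep { s/^[bfd]// } @copy;

print "$count matched: @greplist (original: @list)\n";
```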

hex

hex EXPR

This function interprets EXPR as a hexadecimal string and returns the equivalent decimal value. (To interpret strings that might start with 0 or 0x see oct.) If EXPR is omitted, it interprets $_. The following code sets $number to 4,294,906,560:

$number = hex("ffff12c0");

To do the inverse function, use:

sprintf "%lx", $number;         # (That's an ell, not a one.)
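A few conversions side by side, as a quick sketch of how hex, oct, and sprintf relate:

```perl
use strict;
use warnings;

my $from_hex = hex("ff");            # plain hex digits
my $with_0x  = oct("0x1f");          # oct understands a leading 0x
my $octal    = oct("755");           # and otherwise reads octal
my $back     = sprintf "%x", 255;    # number back to hex digits

print "$from_hex $with_0x $octal $back\n";
```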

import

import CLASSNAME LIST
import CLASSNAME

There is no built-in import function. It is merely an ordinary class method defined (or inherited) by modules that wish to export names to another module through the use operator. See use for details.

index

index STR, SUBSTR, POSITION
index STR, SUBSTR

This function returns the position of the first occurrence of SUBSTR in STR. The POSITION, if specified, says where to start looking. Positions are based at 0 (or whatever you've set the $[ variable to--but don't do that). If the substring is not found, the function returns one less than the base, ordinarily -1. To work your way through a string, you might say:

$pos = -1;
while (($pos = index($string, $lookfor, $pos)) > -1) {
    print "Found at $pos\n";
    $pos++;
}
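The same loop can collect the positions instead of printing them. The sample string here is made up for illustration.

```perl
use strict;
use warnings;

my $string  = "abracadabra";
my $lookfor = "abra";

my @found;
my $pos = 0;
while (($pos = index($string, $lookfor, $pos)) > -1) {
    push @found, $pos;
    $pos++;                  # move past this match so we can find the next
}
print "Found at: @found\n";
```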

int

int EXPR

This function returns the integer portion of EXPR. If EXPR is omitted, it uses $_. If you're a C programmer, you'll often forget to use int in conjunction with division, which is a floating-point operation in Perl:

$average_age = 939/16;      # yields 58.6875 (58 in C)
$average_age = int 939/16;  # yields 58
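Note that int truncates toward zero, which is not the same as rounding down for negative numbers. This sketch compares it with floor from the POSIX module and with rounding via sprintf.

```perl
use strict;
use warnings;
use POSIX qw(floor);

my $trunc_pos = int(7.9);              # truncates toward zero
my $trunc_neg = int(-7.9);             # still toward zero
my $floored   = floor(-7.9);           # always rounds down
my $rounded   = sprintf "%.0f", 7.9;   # rounds to the nearest integer

print "$trunc_pos $trunc_neg $floored $rounded\n";
```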

ioctl

ioctl FILEHANDLE, FUNCTION, SCALAR

This function implements the ioctl (2) system call. You'll probably have to say:

require "ioctl.ph";
    # probably /usr/local/lib/perl/ioctl.ph

first to get the correct function definitions. If ioctl.ph doesn't exist or doesn't have the correct definitions you'll have to roll your own, based on your C header files such as <sys/ioctl.h>. (The Perl distribution includes a script called h2ph to help you do this, but it's non-trivial.) SCALAR will be read and/or written depending on the FUNCTION--a pointer to the string value of SCALAR will be passed as the third argument of the actual ioctl (2) call. (If SCALAR has no string value but does have a numeric value, that value will be passed directly rather than a pointer to the string value.) The pack and unpack functions are useful for manipulating the values of structures used by ioctl. The following example sets the erase character to DEL on many UNIX systems (see the POSIX module in Chapter 7, The Standard Perl Library for a slightly more portable interface):

require 'ioctl.ph';
$getp = &TIOCGETP or die "NO TIOCGETP";
$sgttyb_t = "ccccs";            # 4 chars and a short
if (ioctl STDIN, $getp, $sgttyb) {
    @ary = unpack $sgttyb_t, $sgttyb;
    $ary[2] = 127;
    $sgttyb = pack $sgttyb_t, @ary;
    ioctl STDIN, &TIOCSETP, $sgttyb
        or die "Can't ioctl TIOCSETP: $!";
}

The return value of ioctl (and fcntl) is as follows:

System call returns    Perl returns
-1                     undefined value
0                      string "0 but true"
anything else          that number

Thus Perl returns true on success and false on failure, yet you can still easily determine the actual value returned by the operating system:

$retval = ioctl(...) or $retval = -1;
printf "System returned %d\n", $retval;

Calls to ioctl should not be considered portable. If, say, you're merely turning off echo once for the whole script, it's much more portable (and not much slower) to say:

system "stty -echo";   # Works on most UNIX boxen.

Just because you can do something in Perl doesn't mean you ought to. To quote the Apostle Paul, "Everything is permissible--but not everything is beneficial."

join

join EXPR, LIST

This function joins the separate strings of LIST into a single string with fields separated by the value of EXPR, and returns the string. For example:

$_ = join ':', $login,$passwd,$uid,$gid,$gcos,$home,$shell;

To do the opposite, see split. To join things together into fixed-position fields, see pack.

The most efficient way to concatenate many strings together is to join them with a null string.
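A short sketch of join, its inverse split, and joining with a null string. The passwd-style field values are invented.

```perl
use strict;
use warnings;

# Hypothetical passwd-style fields.
my @fields = ('larry', 'x', 100, 10, 'Larry Wall', '/home/larry', '/bin/sh');

my $line  = join ':', @fields;        # one colon-separated record
my @back  = split /:/, $line;         # and back to the separate fields
my $glued = join '', 'a' .. 'e';      # null separator: plain concatenation

print "$line\n$glued\n";
```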

keys

keys HASH

This function returns a list consisting of all the keys of the named hash. The keys are returned in an apparently random order, but it is the same order as either the values or each function produces (assuming that the hash has not been modified between calls). Here is yet another way to print your environment:

@keys = keys %ENV;
@values = values %ENV;
while (@keys) {
    print pop(@keys), '=', pop(@values), "\n";
}

or how about sorted by key:

foreach $key (sort keys %ENV) {
    print $key, '=', $ENV{$key}, "\n";
}

To sort a hash by value, you'll need to provide a comparison function. Here's a descending numeric sort of a hash by its values:

foreach $key (sort { $hash{$b} <=> $hash{$a} } keys %hash) {
    printf "%4d %s\n", $hash{$key}, $key;
}

Note that using keys on a hash bound to a largish DBM file will produce a largish list, causing you to have a largish process. You might prefer to use the each function in this case, which will iterate over the hash entries one-by-one without slurping them all into a single gargantuan list.

In scalar context, keys returns the number of elements of the hash (and resets the each iterator). However, to get this information for tied hashes, including DBM files, Perl must still walk the entire hash, so it's not very efficient in that case.
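
Here is a small sketch of the each alternative mentioned above, along with keys in scalar context. The hash is an ordinary in-memory one standing in for a DBM-bound hash, and its contents are made up:

```perl
use strict;
use warnings;

# A small stand-in hash; a real DBM-bound hash would be tied instead.
my %count = (apple => 3, pear => 1, fig => 7);

# each hands back one key/value pair per call, so no big list is built.
my $total = 0;
while (my ($key, $value) = each %count) {
    $total += $value;
}

# keys in scalar context gives the number of entries.
my $entries = keys %count;

print "$entries entries, total $total\n";
```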

kill

kill LIST

This function sends a signal to a list of processes. The first element of the list must be the signal to send. You may use a signal name in quotes (without a SIG on the front). The function returns the number of processes successfully signaled. If the signal is negative, the function kills process groups instead of processes. (On System V, a negative process number will also kill process groups, but that's not portable.) Examples:

$cnt = kill 1, $child1, $child2;
kill 9, @goners;
kill 'STOP', getppid;  # Can *so* suspend my login shell...

last

last LABEL
last

The last command is like the break statement in C (as used in loops); it immediately exits the loop in question. If the LABEL is omitted, the command refers to the innermost enclosing loop. The continue block, if any, is not executed.

LINE: while (<STDIN>) {
    last LINE if /^$/; # exit when done with header
    # rest of loop here
}
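
The LABEL form matters most in nested loops. In this sketch (the data and the ROW label are arbitrary), last ROW leaves both loops at once, where a bare last would leave only the inner one:

```perl
use strict;
use warnings;

# Hypothetical data: find the first negative number in a table of rows.
my @rows = ([1, 2, 3], [4, -5, 6], [7, 8, -9]);
my $found;

ROW: foreach my $row (@rows) {
    foreach my $n (@$row) {
        if ($n < 0) {
            $found = $n;
            last ROW;       # leaves both loops, not just the inner one
        }
    }
}

print "first negative: $found\n";
```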

lc

lc EXPR

This function returns a lowercased version of EXPR (or $_ if omitted). This is the internal function implementing the \L escape in double-quoted strings. POSIX setlocale (3) settings are respected.

lcfirst

lcfirst EXPR

This function returns a version of EXPR (or $_ if omitted) with the first character lowercased. This is the internal function implementing the \l escape in double-quoted strings. POSIX setlocale (3) settings are respected.

length

length EXPR

This function returns the length in bytes of the scalar value EXPR. If EXPR is omitted, the function returns the length of $_, but be careful that the next thing doesn't look like the start of an EXPR, or the tokener will get confused. When in doubt, always put in parentheses.

Do not try to use length to find the size of an array or hash. Use scalar @array for the size of an array, and scalar keys %hash for the size of a hash. (The scalar is typically dropped when redundant, which is typical.)
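
A brief sketch of the three idioms side by side, using made-up data:

```perl
use strict;
use warnings;

my $string = "camel";
my @array  = (10, 20, 30, 40);
my %hash   = (one => 1, two => 2);

my $len     = length($string);      # size of the scalar value
my $items   = scalar @array;        # number of elements in the array
my $entries = scalar keys %hash;    # number of entries in the hash

print "$len $items $entries\n";
```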

link

link OLDFILE, NEWFILE

This function creates a new filename linked to the old filename. The function returns 1 for success, 0 otherwise (and puts the error code into $!). See also symlink later in this chapter. This function is unlikely to be implemented on non-UNIX systems.

listen

listen SOCKET, QUEUESIZE

This function does the same thing as the listen (2) system call. It tells the system that you're going to be accepting connections on this socket and that the system can queue the number of waiting connections specified by QUEUESIZE. Imagine having call-waiting on your phone, with up to five callers queued. (Gives me the willies!) The function returns true if it succeeded, false otherwise (and puts the error code into $!). See the section "Sockets" in Chapter 6, Social Engineering.

local

local EXPR

This operator declares one or more global variables to have locally scoped values within the innermost enclosing block, subroutine, eval, or file. If more than one variable is listed, the list must be placed in parentheses, because the operator binds more tightly than comma. All the listed variables must be legal lvalues, that is, something you could assign to. This operator works by saving the current values of those variables on a hidden stack and restoring them upon exiting the block, subroutine, eval, or file. After the local is executed, but before the scope is exited, any called subroutines will see the local, inner value, not the previous, outer value, because the variable is still a global variable, despite having a localized value. The technical term for this is "dynamic scoping".

The EXPR may be assigned to if desired, which allows you to initialize your local variables. (If no initializer is given, all scalars are initialized to the undefined value and all arrays and hashes to empty.) Commonly, this is used to name the formal arguments to a subroutine. As with ordinary assignment, if you use parentheses around the variables on the left (or if the variable is an array or hash), the expression on the right is evaluated in list context. Otherwise the expression on the right is evaluated in scalar context.

Here is a routine that executes some random piece of code that depends on $i running through a range of numbers. Note that the scope of $i propagates into the eval code.

&RANGEVAL(20, 30, '$foo[$i] = $i');
sub RANGEVAL {
    local($min, $max, $thunk) = @_;
    local $result = "";
    local $i;
    # Presumably $thunk makes reference to $i
    for ($i = $min; $i < $max; $i++) {
        $result .= eval $thunk;
    }
    $result;
}

This code demonstrates how to make a temporary modification to a global array:

if ($sw eq '-v') {
    # init local array with global array
    local @ARGV = @ARGV;
    unshift @ARGV, 'echo';
    system @ARGV;
}
# @ARGV restored

You can also temporarily modify hashes:

# temporarily add a couple of entries to the %digits hash
if ($base12) {
    # (NOTE: not claiming this is efficient!)
    local(%digits) = (%digits, T => 10, E => 11);
    parse_num();
}

But you probably want to be using my instead, because local isn't really what most people think of as local. See the section on my later.
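
The difference is easiest to see when a called subroutine reads the variable. This is only a sketch; the variable and subroutine names are invented for the demonstration:

```perl
use strict;
use warnings;

our $level = 'outer';            # a package (global) variable

sub show { return $level }       # reads whatever value is current

sub with_local {
    local $level = 'inner';      # temporary value, visible to callees
    return show();
}

sub with_my {
    my $level = 'inner';         # private variable, invisible to show()
    return show();
}

my @seen = (with_local(), with_my(), show());
print "@seen\n";                 # inner outer outer
```

Only the local version changes what show() sees, and the global gets its old value back once with_local returns.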

localtime

localtime EXPR

This function converts the value returned by time to a nine-element list with the time corrected for the local time zone. It's typically used as follows:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
        localtime(time);

All list elements are numeric, and come straight out of a struct tm. (That's a bit of C programming lingo--don't worry about it.) In particular this means that $mon has the range 0..11, $wday has the range 0..6, and the year has had 1,900 subtracted from it. (You can remember which ones are 0-based because those are the ones you're always using as subscripts into 0-based arrays containing month and day names.) If EXPR is omitted, it does localtime(time). For example, to get the name of the current day of the week:

$thisday = (qw(Sun Mon Tue Wed Thu Fri Sat))[(localtime)[6]];

The Perl library module Time::Local contains a subroutine, timelocal(), that can convert in the opposite direction.

In scalar context, localtime returns a ctime (3)-like string based on the localtime value. For example, the date command can be emulated with:

perl -e 'print scalar localtime'

See also POSIX::strftime() in Chapter 7, The Standard Perl Library for a more fine-grained approach to formatting times.
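
As a sketch of the adjustments described above, this formats a date by hand and then converts back with timelocal(). The epoch value is an arbitrary fixed number chosen so the run is repeatable; the printed date depends on your time zone:

```perl
use strict;
use warnings;
use Time::Local;    # provides timelocal()

# A fixed epoch value chosen arbitrarily so the example is repeatable.
my $when = 1_000_000_000;

my ($sec, $min, $hour, $mday, $mon, $year) = localtime($when);

# $mon is 0-based and $year counts from 1900, so adjust both for display.
my $stamp = sprintf "%04d-%02d-%02d", $year + 1900, $mon + 1, $mday;

# timelocal() converts the broken-down local time back to an epoch value.
my $back = timelocal($sec, $min, $hour, $mday, $mon, $year);

print "$stamp\n";
print "round trip ok\n" if $back == $when;
```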

log

log EXPR

This function returns the natural logarithm (base e) of EXPR. If EXPR is omitted, the function returns the logarithm of $_.

lstat

lstat EXPR

This function does the same thing as the stat function, but if the last component of the filename is a symbolic link, it stats the symbolic link itself instead of the file the symbolic link points to. (If symbolic links are unimplemented on your system, a normal stat is done instead.)

map

map BLOCK LIST
map EXPR, LIST

This function evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation. It evaluates BLOCK or EXPR in a list context, so each element of LIST may produce zero, one, or more elements in the returned value. These are all flattened into one list. For instance:

@words = map { split ' ' } @lines;

splits a list of lines into a list of words. Often, though, there is a one-to-one mapping between input values and output values:

@chars = map chr, @nums;

translates a list of numbers to the corresponding characters. And here's an example of a one-to-two mapping:

%hash = map { genkey($_), $_ } @array;

which is just a funny functional way to write this:

%hash = ();
foreach $_ (@array) {
    $hash{genkey($_)} = $_;
}

See also grep. map differs from grep in that map returns a list consisting of the results of each successive evaluation of EXPR, whereas grep returns a list consisting of each value of LIST for which EXPR evaluates to true.
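
To make the contrast concrete, here is a short sketch over a made-up list of numbers:

```perl
use strict;
use warnings;

my @nums = (1, 2, 3, 4);

my @squares = map  { $_ * $_ }     @nums;   # results of the block
my @evens   = grep { $_ % 2 == 0 } @nums;   # original elements that pass

# A block may return more than one value per element.
my @pairs   = map { ($_, $_ * 10) } @nums;

print "@squares\n";    # 1 4 9 16
print "@evens\n";      # 2 4
print "@pairs\n";      # 1 10 2 20 3 30 4 40
```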

mkdir

mkdir FILENAME, MODE

This function creates the directory specified by FILENAME, with permissions specified by the numeric MODE (as modified by the current umask). If it succeeds it returns 1, otherwise it returns 0 and sets $! (from the value of errno).

If mkdir (2) is not built in to your C library, Perl emulates it by calling the mkdir (1) program. If you are creating a long list of directories on such a system it will be more efficient to call the mkdir program yourself with the list of directories to avoid starting zillions of subprocesses.

msgctl

msgctl ID, CMD, ARG

This function calls the msgctl (2) system call. See msgctl (2) for details. If CMD is &IPC_STAT, then ARG must be a variable that will hold the returned msqid_ds structure. The return value works like ioctl's: the undefined value for error, "0 but true" for zero, or the actual return value otherwise. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC, which turns out to be far fewer than those supporting sockets.

msgget

msgget KEY, FLAGS

This function calls the System V IPC msgget (2) system call. See msgget (2) for details. The function returns the message queue ID, or the undefined value if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC.

msgrcv

msgrcv ID, VAR, SIZE, TYPE, FLAGS

This function calls the msgrcv (2) system call to receive a message from message queue ID into variable VAR with a maximum message size of SIZE. See msgrcv (2) for details. When a message is received, the message type will be the first thing in VAR, and the maximum length of VAR is SIZE plus the size of the message type. The function returns true if successful, or false if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC.

msgsnd

msgsnd ID, MSG, FLAGS

This function calls the msgsnd (2) system call to send the message MSG to the message queue ID. See msgsnd (2) for details. MSG must begin with the long integer message type. You can create a message like this:

$msg = pack "L a*", $type, $text_of_message;

The function returns true if successful, or false if there is an error. On error, it puts the error code into $!. Before calling, you should say:

require "ipc.ph";
require "msg.ph";

This function is available only on machines supporting System V IPC.

my

my EXPR

This operator declares one or more private variables to exist only within the innermost enclosing block, subroutine, eval, or file. If more than one variable is listed, the list must be placed in parentheses, because the operator binds more tightly than comma. Only simple scalars or complete arrays and hashes may be declared this way. The variable name may not be package qualified, because package variables are all global, and private variables are not related to any package. Unlike local, this operator has nothing to do with global variables, other than hiding any other variable of the same name from view within its scope. (A global variable can always be accessed through its package-qualified form, however.) A private variable is not visible until the statement after its declaration. Subroutines called from within the scope of such a private variable cannot see the private variable unless the subroutine is also textually declared within the scope of the variable. The technical term for this is "lexical scoping", so we often call these "lexical variables". In C culture they're called "auto" variables, since they're automatically allocated and deallocated at scope entry and exit.

The EXPR may be assigned to if desired, which allows you to initialize your lexical variables. (If no initializer is given, all scalars are initialized to the undefined value and all arrays and hashes to empty.) As with ordinary assignment, if you use parentheses around the variables on the left (or if the variable is an array or hash), the expression on the right is evaluated in list context. Otherwise the expression on the right is evaluated in scalar context. You can name your formal subroutine parameters with a list assignment, like this:

my ($friends, $romans, $countrymen) = @_;

Be careful not to omit the parentheses indicating list assignment, like this:

my $country = @_;  # right or wrong?

This assigns the length of the array (that is, the number of the subroutine's arguments) to the variable, since the array is being evaluated in scalar context. You can profitably use scalar assignment for a formal parameter though, as long as you use the shift operator. In fact, since object methods are passed the object as the first argument, many such method subroutines start off like this:

sub simple_as {
    my $self = shift;   # scalar assignment
    my ($a,$b,$c) = @_; # list assignment
    ...
}
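
The three forms of assignment discussed above can be compared directly. The subroutine names in this sketch are invented for the purpose:

```perl
use strict;
use warnings;

sub count_args { my $n       = @_;    return $n }      # scalar context: how many
sub first_arg  { my ($first) = @_;    return $first }  # list context: first value
sub via_shift  { my $first   = shift; return $first }  # shift removes and returns it

my @results = (count_args('x', 'y', 'z'),
               first_arg('x', 'y', 'z'),
               via_shift('x', 'y', 'z'));
print "@results\n";    # 3 x x
```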

new

new CLASSNAME LIST
new CLASSNAME

There is no built-in new function. It is merely an ordinary constructor method (subroutine) defined (or inherited) by the CLASSNAME module to let you construct objects of type CLASSNAME. Most constructors are named "new", but only by convention, just to delude C++ programmers into thinking they know what's going on.

next

next LABEL
next

The next command is like the continue statement in C: it starts the next iteration of the loop designated by LABEL:

LINE: while (<STDIN>) {
    next LINE if /^#/;     # discard comments
    ...
}

Note that if there were a continue block in this example, it would execute immediately following the invocation of next. When LABEL is omitted, the command refers to the innermost enclosing loop.
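
The following sketch shows the continue block running even when next is taken. The input lines are a made-up array standing in for STDIN:

```perl
use strict;
use warnings;

# Hypothetical input lines; in a real script these might come from <STDIN>.
my @lines = ("# comment", "alpha", "# another", "beta");
my ($seen, @kept) = (0);

LINE: foreach my $line (@lines) {
    next LINE if $line =~ /^#/;   # skip comments
    push @kept, $line;
}
continue {
    $seen++;                      # runs even when next is taken
}

print "kept @kept out of $seen lines\n";
```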

no

no Module LIST

See the use operator, which no is the opposite of, kind of.

oct

oct EXPR

This function interprets EXPR as an octal string and returns the equivalent decimal value. (If EXPR happens to start off with 0x, it is interpreted as a hex string instead.) The following will handle decimal, octal, and hex in the standard notation:

$val = oct $val if $val =~ /^0/;

If EXPR is omitted, the function interprets $_. To perform the inverse function on octal numbers, use:

$oct_string = sprintf "%lo", $number;
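
Putting the two directions together, here is a sketch that wraps the test shown above in a helper (to_number is a name invented for this example) and then converts one result back to octal digits:

```perl
use strict;
use warnings;

# Convert a numeric string written in decimal, octal (leading 0),
# or hex (leading 0x) notation, using the test shown above.
sub to_number {
    my $val = shift;
    $val = oct $val if $val =~ /^0/;
    return $val;
}

my @values = map { to_number($_) } ('755', '0755', '0x1ff');
print "@values\n";                       # 755 493 511

my $oct_string = sprintf "%lo", 493;     # back to octal digits
print "$oct_string\n";                   # 755
```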

open

open FILEHANDLE, EXPR
open FILEHANDLE

This function opens the file whose filename is given by EXPR, and associates it with FILEHANDLE. If EXPR is omitted, the scalar variable of the same name as the FILEHANDLE must contain the filename. (And you must also be careful to use "or die" after the statement rather than "|| die", because the precedence of || is higher than list operators like open.) FILEHANDLE may be a directly specified filehandle name, or an expression whose value will be used for the filehandle. The latter is called an indirect filehandle. If you supply an undefined variable for the indirect filehandle, Perl will not automatically fill it in for you--you have to make sure the expression returns something unique, either a string specifying the actual filehandle name, or a filehandle object from one of the object-oriented I/O packages. (A filehandle object is unique because you call a constructor to generate the object. See the example later in this section.)

After the filehandle is determined, the filename string is processed. First, any leading and trailing whitespace is removed from the string. Then the string is examined on both ends for characters specifying how the file is to be opened. (By an amazing coincidence, these characters look just like the characters you'd use to indicate I/O redirection to the Bourne shell.) If the filename begins with < or nothing, the file is opened for input. If the filename begins with >, the file is truncated and opened for output. If the filename begins with >>, the file is opened for appending. (You can also put a + in front of the > or < to indicate that you want both read and write access to the file.) If the filename begins with |, the filename is interpreted as a command to which output is to be piped, and if the filename ends with a |, the filename is interpreted as a command which pipes input to us. You may not have an open command that pipes both in and out, although the IPC::Open2 and IPC::Open3 library routines give you a close equivalent. See the section "Bidirectional Communication" in Chapter 6, Social Engineering.

Any pipe command containing shell metacharacters is passed to /bin/sh for execution; otherwise it is executed directly by Perl. The filename "-" refers to STDIN, and ">-" refers to STDOUT. open returns non-zero upon success, the undefined value otherwise. If the open involved a pipe, the return value happens to be the process ID of the subprocess.

If you're unfortunate enough to be running Perl on a system that distinguishes between text files and binary files (modern operating systems don't care), then you should check out binmode for tips for dealing with this. The key distinction between systems that need binmode and those that don't is their text file formats. Systems like UNIX and Plan9 that delimit lines with a single character, and that encode that character in C as '\n', do not need binmode. The rest need it.

Here is some code that shows the relatedness of a filehandle and a variable of the same name:

$ARTICLE = "/usr/spool/news/comp/lang/perl/misc/38245";
open ARTICLE or die "Can't find article $ARTICLE: $!\n";
while (<ARTICLE>) {...

Append to a file like this:

open LOG, '>>/usr/spool/news/twitlog'; # (`log' is reserved)

Pipe your data from a process:

open ARTICLE, "caesar <$article |";   # decrypt article with rot13

Here < does not indicate that Perl should open the file for input, because < is not the first character of EXPR. Rather, the concluding | indicates that input is to be piped from caesar <$article (from the program caesar, which takes $article as its standard input). The < is interpreted by the subshell that Perl uses to start the pipe, because < is a shell metacharacter.

Or pipe your data to a process:

open EXTRACT, "|sort >/tmp/Tmp$$";    # $$ is our process number

In this next example we show one way to do recursive opens, via indirect filehandles. The files will be opened on filehandles fh01, fh02, fh03, and so on. Because $input is a local variable, it is preserved through recursion, allowing us to close the correct file before we return.

# Process argument list of files along with any includes.
foreach $file (@ARGV) {
    process($file, 'fh00');
}
sub process {
    local($filename, $input) = @_;
    $input++;               # this is a string increment
    unless (open $input, $filename) {
        print STDERR "Can't open $filename: $!\n";
        return;
    }
    while (<$input>) {      # note the use of indirection
        if (/^#include "(.*)"/) {
            process($1, $input);
            next;
        }
        ...               # whatever
    }
    close $input;
}

You may also, in the Bourne shell tradition, specify an EXPR beginning with >&, in which case the rest of the string is interpreted as the name of a filehandle (or file descriptor, if numeric) which is to be duped and opened.[6] You may use & after >, >>, <, +>, +>>, and +<. The mode you specify should match the mode of the original filehandle. Here is a script that saves, redirects, and restores STDOUT and STDERR:

[6] The word "dup" is UNIX-speak for "duplicate". We're not really trying to dupe you. Trust us.

#!/usr/bin/perl
open SAVEOUT, ">&STDOUT";
open SAVEERR, ">&STDERR";
open STDOUT, ">foo.out" or die "Can't redirect stdout";
open STDERR, ">&STDOUT" or die "Can't dup stdout";
select STDERR; $| = 1;         # make unbuffered
select STDOUT; $| = 1;         # make unbuffered
print STDOUT "stdout 1\n";     # this propagates to
print STDERR "stderr 1\n";     # subprocesses too
close STDOUT;
close STDERR;
open STDOUT, ">&SAVEOUT";
open STDERR, ">&SAVEERR";
print STDOUT "stdout 2\n";
print STDERR "stderr 2\n";

If you specify <&=N, where N is a number, then Perl will do an equivalent of C's fdopen (3) of that file descriptor; this is more parsimonious with file descriptors than the dup form described earlier. (On the other hand, it's more dangerous, since two filehandles may now be sharing the same file descriptor, and a close on one filehandle may prematurely close the other.) For example:

open FILEHANDLE, "<&=$fd";

If you open a pipe to or from the command "-" (that is, either |- or -|), then an implicit fork is done, and the return value of open is the pid of the child within the parent process, and 0 within the child process. (Use defined($pid) in either the parent or child to determine whether the open was successful.) The filehandle behaves normally for the parent, but input and output to that filehandle is piped from or to the STDOUT or STDIN of the child process. In the child process the filehandle isn't opened--I/O happens from or to the new STDIN or STDOUT. Typically this is used like the normal piped open when you want to exercise more control over just how the pipe command gets executed, such as when you are running setuid, and don't want to have to scan shell commands for metacharacters. The following pairs are equivalent:

open FOO, "|tr '[a-z]' '[A-Z]'";
open FOO, "|-" or exec 'tr', '[a-z]', '[A-Z]';
open FOO, "cat -n file|";
open FOO, "-|" or exec 'cat', '-n', 'file';

Explicitly closing any piped filehandle causes the parent process to wait for the child to finish, and returns the status value in $?. On any operation which may do a fork, unflushed buffers remain unflushed in both processes, which means you may need to set $| on one or more filehandles to avoid duplicate output (and then do output to flush them).
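
Here is a sketch of the implicit-fork form in which the child does its own work instead of calling exec. It assumes a system where fork is available; the KID handle name and the message are arbitrary:

```perl
use strict;
use warnings;

my $pid = open(KID, "-|");           # implicit fork; parent reads child's STDOUT
die "Can't fork: $!" unless defined $pid;

if ($pid == 0) {                     # child: STDOUT is the pipe
    print "hello from the child\n";
    exit 0;
}

my $line = <KID>;                    # parent: read what the child printed
close KID;                           # waits for the child, sets $?
my $status = $?;

print "parent got: $line";
```

Checking defined($pid) before testing for zero is what separates a failed fork from the child's side of a successful one.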

Filehandles STDIN, STDOUT, and STDERR remain open following an exec. Other filehandles do not. (However, on systems supporting the fcntl function, you may modify the close-on-exec flag for a filehandle. See fcntl earlier in this chapter. See also the special $^F variable.)

Using the constructor from the FileHandle module, described in Chapter 7, The Standard Perl Library, you can generate anonymous filehandles which have the scope of whatever variables hold references to them, and automatically close whenever and however you leave that scope:

use FileHandle;
...
sub read_myfile_munged {
    my $ALL = shift;
    my $handle = new FileHandle;
    open $handle, "myfile" or die "myfile: $!";
    $first = <$handle> or return ();      # Automatically closed here.
    mung $first or die "mung failed";     # Or here.
    return $first, <$handle> if $ALL;     # Or here.
    $first;                               # Or here.
}

In order to open a file with arbitrary weird characters in it, it's necessary to protect any leading and trailing whitespace, like this:

$file =~ s#^\s#./$&#;
open FOO, "< $file\0";

But we've never actually seen anyone use that in a script . . .

If you want a real C open (2), then you should use the sysopen function. This is another way to protect your filenames from interpretation. For example:

use FileHandle;
sysopen HANDLE, $path, O_RDWR|O_CREAT|O_EXCL, 0700
    or die "sysopen $path: $!";
HANDLE->autoflush(1);
HANDLE->print("stuff $$\n");
seek HANDLE, 0, 0;
print "File contains: ", <HANDLE>;

See seek for some details about mixing reading and writing.

opendir

opendir DIRHANDLE, EXPR

This function opens a directory named EXPR for processing by readdir, telldir, seekdir, rewinddir, and closedir. The function returns true if successful. Directory handles have their own namespace separate from filehandles.

ord

ord EXPR

This function returns the numeric ASCII value of the first character of EXPR. If EXPR is omitted, it uses $_. The return value is always unsigned. If you want a signed value, use unpack('c', EXPR). If you want all the characters of the string converted to a list of numbers, use unpack('C*', EXPR) instead.
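
A short sketch of the three variants just described, applied to sample strings:

```perl
use strict;
use warnings;

my $first  = ord "Perl";              # code of the first character only
my @codes  = unpack 'C*', "Perl";     # codes of every character
my $byte   = ord "\xff";              # unsigned value of the byte
my $signed = unpack 'c', "\xff";      # the same byte read as a signed char

print "$first\n";             # 80
print "@codes\n";             # 80 101 114 108
print "$byte $signed\n";      # 255 -1
```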

pack

pack TEMPLATE, LIST

This function takes a list of values and packs it into a binary structure, returning the string containing the structure. The TEMPLATE is a sequence of characters that give the order and type of values, as follows:

Character   Meaning
a           An ASCII string, will be null padded
A           An ASCII string, will be space padded
b           A bit string, low-to-high order (like vec())
B           A bit string, high-to-low order
c           A signed char value
C           An unsigned char value
d           A double-precision float in the native format
f           A single-precision float in the native format
h           A hexadecimal string, low nybble first
H           A hexadecimal string, high nybble first
i           A signed integer value
I           An unsigned integer value
l           A signed long value
L           An unsigned long value
n           A short in "network" (big-endian) order
N           A long in "network" (big-endian) order
p           A pointer to a string
P           A pointer to a structure (fixed-length string)
s           A signed short value
S           An unsigned short value
v           A short in "VAX" (little-endian) order
V           A long in "VAX" (little-endian) order
u           A uuencoded string
x           A null byte
X           Back up a byte
@           Null-fill to absolute position

Each character may optionally be followed by a number which gives a repeat count. Together the character and the repeat count make a field specifier. Field specifiers may be separated by whitespace, which will be ignored. With all types except "a" and "A", the pack function will gobble up that many values from the LIST. Saying "*" for the repeat count means to use however many items are left. The "a" and "A" types gobble just one value, but pack it as a string of length count, padding with nulls or spaces as necessary. (When unpacking, "A" strips trailing spaces and nulls, but "a" does not.) Real numbers (floats and doubles) are in the native machine format only; due to the multiplicity of floating formats around, and the lack of a standard network representation, no facility for interchange has been made. This means that packed floating-point data written on one machine may not be readable on another--even if both use IEEE floating-point arithmetic (as the endian-ness of the memory representation is not part of the IEEE spec). Also, Perl uses doubles internally for all numeric calculation, and converting from double to float to double will lose precision; that is, unpack("f", pack("f",$num)) will not in general equal $num.

This first pair of examples packs numeric values into bytes:

$out = pack "cccc", 65, 66, 67, 68;      # $out eq "ABCD"
$out = pack "c4", 65, 66, 67, 68;        # same thing

This does a similar thing, with a couple of nulls thrown in:

$out = pack "ccxxcc", 65, 66, 67, 68;    # $out eq "AB\0\0CD"

Packing your shorts doesn't imply that you're portable:

$out = pack "s2", 1, 2;    # "\1\0\2\0" on little-endian
                           # "\0\1\0\2" on big-endian

On binary and hex packs, the count refers to the number of bits or nybbles, not the number of bytes produced:

$out = pack "B32", "01010000011001010111001001101100";
$out = pack "H8", "5065726c";    # both produce "Perl"

The length on an "a" field applies only to one string:

$out = pack "a4", "abcd", "x", "y", "z";      # "abcd"

To get around that limitation, use multiple specifiers:

$out = pack "aaaa",  "abcd", "x", "y", "z";   # "axyz"
$out = pack "a" x 4, "abcd", "x", "y", "z";   # "axyz"

The "a" format does null filling:

$out = pack "a14", "abcdefg";   # "abcdefg\0\0\0\0\0\0\0"

This template packs a C struct tm record (at least on some systems):

$out = pack "i9pl", gmtime, $tz, $toff;

The same template may generally also be used in the unpack function. If you want to join variable length fields with a delimiter, use the join function.
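
As a sketch of that symmetry, the following packs a small fixed-layout record and unpacks it with the same template. The record layout and values are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical fixed-layout record: 8-byte space-padded name,
# a 16-bit big-endian port, and one unsigned byte of flags.
my $template = "A8 n C";

my $record = pack $template, "perl", 8080, 3;
my ($name, $port, $flags) = unpack $template, $record;

printf "%d bytes: name=%s port=%d flags=%d\n",
    length($record), $name, $port, $flags;
```

Because "A" strips trailing spaces when unpacking, $name comes back as the original four characters, not the padded eight.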

Note that, although all of our examples use literal strings as templates, there is no reason you couldn't pull in your templates from a disk file. You could, in fact, build an entire relational database system around this function.

package

package NAMESPACE

This is not really a function, but a declaration that says that the rest of the innermost enclosing block, subroutine, eval or file belongs to the indicated namespace. (The scope of a package declaration is thus the same as the scope of a local or my declaration.) All subsequent references to unqualified global identifiers will be resolved by looking them up in the declared package's symbol table. A package declaration affects only global variables--including those you've used local on--but not lexical variables created with my.

Typically you would put a package declaration as the first thing in a file that is to be included by the require or use operator, but you can put one anywhere that a statement would be legal. When defining a class or a module file, it is customary to name the package the same name as the file, to avoid confusion. (It's also customary to name such packages beginning with a capital letter, because lowercase modules are by convention interpreted as pragmas.)

You can switch into a given package in more than one place; it merely influences which symbol table is used by the compiler for the rest of that block. (If it sees another package declaration at the same level, the new one overrides the previous one.) Your main program is assumed to start with a package main declaration.

You can refer to variables and filehandles in other packages by qualifying the identifier with the package name and a double colon: $Package::Variable. If the package name is null, the main package is assumed. That is, $::sail is equivalent to $main::sail.

The symbol table for a package is stored in a hash with a name ending in a double colon. The main package's symbol table is named %main:: for example. So the package symbol *main::sail can also be accessed as $main::{"sail"}.

See "Packages" in Chapter 5, Packages, Modules, and Object Classes, for more information about packages, modules, and classes. See my in Chapter 3, Functions, for other scoping issues.
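
The qualification rules and the symbol-table hash can be sketched as follows. Boat is a package name invented for the example:

```perl
use strict;
use warnings;

{
    package Boat;                # hypothetical package, scoped to this block
    our $sail = 'jib';
}

our $sail = 'mainsail';          # back in package main

# Fully qualified names reach into either symbol table.
my @names = ($Boat::sail, $main::sail, $::sail);
print "@names\n";                # jib mainsail mainsail

# The symbol table itself is a hash named after the package.
my $has_sail = exists $main::{"sail"} ? "yes" : "no";
print "main has a sail entry: $has_sail\n";
```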

pipe

pipe READHANDLE, WRITEHANDLE

Like the corresponding system call, this function opens a pair of connected pipes--see pipe (2). This call is almost always used right before a fork, after which the pipe's reader should close WRITEHANDLE, and the writer close READHANDLE. (Otherwise the pipe won't indicate EOF to the reader when the writer closes it.) Note that if you set up a loop of piped processes, deadlock can occur unless you are very careful. In addition, note that Perl's pipes use standard I/O buffering, so you may need to set $| on your WRITEHANDLE to flush after each output command, depending on the application--see select (output filehandle).

See also the section on "Pipes" in Chapter 6, Social Engineering.
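
Here is a minimal sketch of the pipe-then-fork pattern described above, assuming a system that supports fork. The handle names and the message are arbitrary. Each side closes the end it does not use:

```perl
use strict;
use warnings;

pipe(READER, WRITER) or die "Can't create pipe: $!";

my $pid = fork;
die "Can't fork: $!" unless defined $pid;

if ($pid == 0) {                 # child: the writer
    close READER;                # the child only writes
    print WRITER "hello through the pipe\n";
    close WRITER;                # flushes and signals EOF to the reader
    exit 0;
}

close WRITER;                    # parent: the reader; close the unused end
my $line = <READER>;
close READER;
waitpid $pid, 0;                 # reap the child
print "parent read: $line";
```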

pop

pop ARRAY
pop

This function treats an array like a stack--it pops and returns the last value of the array, shortening the array by 1. If ARRAY is omitted, the function pops @ARGV (in the main program), or @_ (in subroutines). It has the same effect as:

$tmp = $ARRAY[$#ARRAY--];

or:

$tmp = splice @ARRAY, -1;

If there are no elements in the array, pop returns the undefined value. See also push and shift. If you want to pop more than one element, use splice.

Note that pop requires its first argument to be an array, not a list. If you just want the last element of a list, use this:

(something_returning_a_list)[-1]
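
The cases above can be sketched together with made-up data:

```perl
use strict;
use warnings;

my @stack = (1, 2, 3, 4, 5);

my $top      = pop @stack;           # removes and returns the last element
my @last_two = splice @stack, -2;    # removes and returns the last two

# A list slice fetches the last element of a list without an array.
my $last = (sort { $a <=> $b } 9, 7, 8)[-1];

my @empty;
my $nothing = pop @empty;            # undefined: nothing to pop

print "$top | @last_two | @stack | $last\n";   # 5 | 3 4 | 1 2 | 9
```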

pos

pos SCALAR

Returns the location in SCALAR where the last m//g search over SCALAR left off. It returns the offset of the character after the last one matched. (That is, it's equivalent to length($`) + length($&).) This is the offset where the next m//g search on that string will start. Remember that the offset of the beginning of the string is 0. For example:

$grafitto = "fee fie foe foo";
while ($grafitto =~ m/e/g) {
    print pos $grafitto, "\n";
}

prints 2, 3, 7, and 11, the offsets of each of the characters following an "e". The pos function may be assigned a value to tell the next m//g where to start:

$grafitto = "fee fie foe foo";
pos $grafitto = 4;  # Skip the fee, start at fie
while ($grafitto =~ m/e/g) {
    print pos $grafitto, "\n";
}

This prints only 7 and 11. (Thank heaven.) The regular expression assertion, \G, matches only at the location currently specified by pos for the string being searched.
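One place \G earns its keep is in a simple scanner, where each pattern must pick up exactly where the previous one stopped. The sketch below is illustrative only: the input string and token labels are invented, and it relies on the /c modifier (not described in this entry), which keeps pos from being reset when a match fails.

```perl
use strict;
use warnings;

my $input = "12+345*6";
my @tokens;

pos($input) = 0;                  # start scanning at the beginning
while (pos($input) < length $input) {
    if    ($input =~ /\G(\d+)/gc)     { push @tokens, "NUM($1)" }
    elsif ($input =~ /\G([-+*\/])/gc) { push @tokens, "OP($1)"  }
    else  { die "Unexpected character at offset ", pos($input), "\n" }
}

print "@tokens\n";
```

Because every pattern is anchored with \G, nothing in the string can be silently skipped: either a token matches at the current offset or the scanner complains.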

print

print FILEHANDLE LIST
print LIST
print

This function prints a string or a comma-separated list of strings. The function returns 1 if successful, 0 otherwise. FILEHANDLE may be a scalar variable name (unsubscripted), in which case the variable contains either the name of the actual filehandle or a reference to a filehandle object from one of the object-oriented filehandle packages. FILEHANDLE may also be a block that returns either kind of value:

print { $OK ? "STDOUT" : "STDERR" } "stuff\n";
print { $iohandle[$i] } "stuff\n";

Note that if FILEHANDLE is a variable and the next token is a term, it may be misinterpreted as an operator unless you interpose a + or put parentheses around the arguments. For example:

print $a - 2;   # prints $a - 2 to default filehandle (usually STDOUT)
print $a (- 2); # prints -2 to filehandle specified in $a
print $a -2;    # ditto (weird parsing rules :-)

If FILEHANDLE is omitted, the function prints to the currently selected output filehandle, initially STDOUT. To set the default output filehandle to something other than STDOUT use the select(FILEHANDLE) operation.[7] If LIST is also omitted, prints $_.

[7] Thus, STDOUT isn't really the default filehandle for print. It's merely the default default filehandle.

Note that, because print takes a LIST, anything in the LIST is evaluated in list context, and any subroutine that you call will likely have one or more of its own internal expressions evaluated in list context. Thus, when you say:

print OUT <STDIN>;

it is not going to print out the next line from standard input, but all the rest of the lines from standard input up to end-of-file, since that's what <STDIN> returns in list context. Also, remembering the if-it-looks-like-a-function-it-is-a-function rule, be careful not to follow the print keyword with a left parenthesis unless you want the corresponding right parenthesis to terminate the arguments to the print--interpose a + or put parens around all the arguments:

print (1+2)*3, "\n";            # WRONG
print +(1+2)*3, "\n";           # ok
print ((1+2)*3, "\n");          # ok
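To see the list-context behavior without typing at a terminal, the following sketch reads from and prints to in-memory filehandles. The variable names are hypothetical, and opening a filehandle on a scalar reference is an assumption about your Perl (it needs a reasonably modern version). The block form of FILEHANDLE is used wherever the parse might otherwise be ambiguous.

```perl
use strict;
use warnings;

my $source = "line 1\nline 2\nline 3\n";
open my $in, '<', \$source or die "Can't open in-memory input: $!\n";

my $captured = '';
open my $out, '>', \$captured or die "Can't open in-memory output: $!\n";

print {$out} scalar(<$in>);   # scalar context: just "line 1\n"
print $out "--\n";            # scalar-variable form of FILEHANDLE

my @rest = <$in>;             # list context: every remaining line
print {$out} @rest;
close $out;

print $captured;
```

Wrapping the read in scalar() is the usual way to get a single line when the surrounding operator would otherwise supply list context.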

printf

printf FILEHANDLE LIST
printf LIST

This function prints a formatted string to FILEHANDLE or, if omitted, the currently selected output filehandle, initially STDOUT. The first item in the LIST must be a string that says how to format the rest of the items. This is similar to the C library's printf (3) and fprintf (3) functions, except that the * field width specifier is not supported. The function is equivalent to:

print FILEHANDLE sprintf LIST

See print and sprintf. The description of sprintf includes the list of acceptable specifications for the format string.

Don't fall into the trap of using a printf when a simple print would do. The print is more efficient, and less error-prone.
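A brief sketch of the equivalence, using an invented row of data and format string:

```perl
use strict;
use warnings;

my @row    = ('widget', 3, 4.5);
my $format = "%-10s %3d %8.2f\n";

# These two statements send identical text to STDOUT.
printf STDOUT $format, @row;
my $formatted = sprintf($format, @row);
print  STDOUT $formatted;

# When no formatting is needed, plain print is simpler.
print "done\n";
```

Note that the format is passed as its own argument rather than as the first element of an array, since sprintf evaluates its first argument in scalar context.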

push

push ARRAY, LIST

This function treats ARRAY as a stack, and pushes the values of LIST onto the end of ARRAY. The length of ARRAY increases by the length of LIST. The function returns this new length. The push function has the same effect as:

foreach $value (LIST) {
    $ARRAY[++$#ARRAY] = $value;
}

or:

splice @ARRAY, @ARRAY, 0, LIST;

but is more efficient (for both you and your computer). You can use push in combination with shift to make a fairly time-efficient shift register or queue:

for (;;) {
    push @ARRAY, shift @ARRAY;
    ...
}
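Here is a bounded version of that rotation that you can actually run, with a made-up queue and an arbitrary count of three turns. It also shows the return value of push:

```perl
use strict;
use warnings;

my @queue = qw(a b c);

my $length = push @queue, 'd', 'e';   # returns the new length, 5

# Rotate the queue: each pass moves the front element to the back.
for my $turn (1 .. 3) {
    push @queue, shift @queue;
}

print "length=$length queue=@queue\n";   # length=5 queue=d e a b c
```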