Just as there are many levels on which languages can compete, so too
there are many levels on which languages can cooperate. Here we'll talk
primarily about generation, translation and embedding (via linking).
Almost from the time people first figured out that they could write programs,
they started writing programs that write other programs. These are called
program generators. (If you're a history buff, you might know that
RPG stood for Report Program Generator long before it stood for Role
Playing Game.) Now, anyone who has written a program generator knows that it
can make your eyes go crossed even when you're wide awake. The problem
is simply that much of your program's data looks like real code, but
isn't (at least not yet). The same text file contains both stuff that does
something and similar looking stuff that doesn't. Perl has various
features that make it easier to mix it together with other languages,
textually speaking.
Of course, these features also make it easier to write Perl in Perl,
but it's rather expected that Perl would cooperate with itself.
Perl is, of course, a text-processing language, and most computer
languages are textual. Beyond that, the lack of arbitrary limits together
with the various quoting and interpolation mechanisms make it pretty easy to
visually isolate the code of the other language you're spitting out.
For example, here is a small chunk of s2p, the
sed-to-perl translator:
print &q(<<"EOT");
: #!$bin/perl
: eval 'exec $bin/perl -S \$0 \${1+"\$@"}'
: if \$running_under_some_shell;
:
EOT
Here the enclosed text happens to be legal in two languages, both Perl
and shell. We've used the trick of putting a colon and a tab on the
front of every line, which visually isolates the enclosed code. One
variable, $bin, is interpolated in the multi-line quote in two
places, and then the string is passed through a function to strip the
colon and tab.
Of course, you aren't required to use multi-line quotes. One often sees
CGI scripts containing millions of print statements, one per line.
It seems a bit like driving to church in an F-16, but hey, if it gets
you there. . . .
When you are embedding a large, multi-line quote containing some other
language (such as HTML), it's sometimes helpful to pretend you're
enclosing Perl into the other language instead:
print <<"END";
stuff
blah blah blah ${ \( EXPR ) } blah blah blah
blah blah blah @{[ LIST ]} blah blah blah
nonsense
END
You can use either of those two tricks to interpolate the value of any
scalar EXPR or LIST into a longer string.
Perl can easily be generated in other languages because it's both concise
and malleable. You can pick your quotes not to interfere with the other
language's quoting mechanisms. You don't have to worry about
indentation, or where you put your line breaks, or whether to backslash
your backslashes yet again. You aren't forced to define a package as a
single string in advance, since you can slide into your package's namespace
repeatedly, whenever you want to evaluate more code in that package.
One of the very first Perl applications was the sed-to-perl translator,
s2p. In fact, Larry delayed the initial release of Perl in order to
complete s2p and awk-to-perl (a2p), because he thought they'd improve the
acceptance of Perl. Hmm, maybe they did.
The s2p program takes a sed script specified on the command line
(or from standard input) and produces a comparable Perl script on the
standard output.
Options include:
-Dnumber
    Sets debugging flags.

-n
    Specifies that this sed script was always invoked as sed -n.
    Otherwise a switch parser is prepended to the front of the script.

-p
    Specifies that this sed script was never invoked as sed -n.
    Otherwise a switch parser is prepended to the front of the script.
The Perl script produced looks very sed-like, and there may very well
be better ways to express what you want to do in Perl. For instance,
s2p does not make any use of the split operator, but you might
want to.
The Perl script you end up with may be either faster or slower than the
original sed script. If you're only interested in speed you'll just
have to try it both ways. Of course, if you want to do something sed
doesn't do, you have no choice. It's often possible to speed up the
Perl script by various methods, such as deleting all references to $\
and chop.
The a2p program takes an awk script specified on the command line
(or from standard input) and produces a comparable Perl script on the
standard output.
Options include:
-Dnumber
    Sets debugging flags.

-Fcharacter
    Tells a2p that this awk script is always invoked with a -F
    switch specifying character.

-nfieldlist
    Specifies the names of the input fields if input does not have to be
    split into an array for some programmatic reason. If you were
    translating an awk script that processes the password file, you might
    say:

        a2p -7 -nlogin.password.uid.gid.gcos.shell.home

    Any delimiter may be used to separate the field names.

-number
    Causes a2p to assume that input will always have that many fields.
a2p cannot do as good a job translating as a human would, but it
usually does pretty well. There are some areas where you may want to
examine the Perl script produced and tweak it some. Here are some of
them, in no particular order.
There is an awk idiom of putting int(...) around a string expression to
force numeric interpretation, even though the argument is always an integer
anyway. This is generally unneeded in Perl, but a2p can't tell if
the argument is always going to be an integer, so it leaves it in. You may
wish to remove it.
Perl differentiates numeric comparison from string comparison. awk has
one operator for both that decides at run-time which comparison to do.
a2p does not try to do a complete job of awk emulation at this
point. Instead it guesses which one you want. It's almost always
right, but it can be spoofed. All such guesses are marked with the
comment #???. You should go through and check them. You might want
to run at least once with Perl's -w switch, which warns you if
you use == where you should have used eq.
It would be possible to emulate awk's behavior in selecting string
versus numeric operations at run-time by inspection of the operands, but
it would be gross and inefficient. Besides, a2p almost always
guesses right.
Perl does not attempt to emulate the behavior of awk in which
nonexistent array elements spring into existence simply by being
referenced. If somehow you are relying on this mechanism to create null
entries for a subsequent for . . . in, they won't be there in Perl.
If a2p makes a split command that assigns to a list of variables
that looks like ($Fld1, $Fld2, $Fld3...) you may want to rerun a2p
using the -n option mentioned above. This will let you name the
fields throughout the script. If it splits to an array instead, the
script is probably referring to the number of fields somewhere.
The "exit" statement in awk doesn't necessarily exit; it
goes to the END block if there is one. awk scripts that
do contortions within the END block to bypass the block
under such circumstances can be simplified by removing the
conditional in the END block and just exiting directly
from the Perl script.
Perl has two kinds of arrays, numerically indexed and associative.
awk arrays are usually translated to associative arrays, but if you
happen to know that the index is always going to be numeric, you could
change the { . . . } to [ . . . ]. Remember that iteration over an
associative array is done using the keys function, but iteration over
a numeric array isn't. You might need to modify any loop that is
iterating over the array in question.
awk starts by assuming OFMT has the value %.6g. Perl starts by
assuming its equivalent, $#, to have the value %.20g. You'll want to
set $# explicitly if you use the default value of OFMT. (Actually,
you probably don't want to set $#, but rather put in printf formats
everywhere it matters.)
Near the top of the line loop will be the split operator that is
implicit in the awk script. There are times when you can move this operator
down past some conditionals that test the entire record, so that the
split is not done as often.
For aesthetic reasons you may wish to change the array base $[ from 1
back to Perl's default of 0, but remember to change all array subscripts
and all substr and index operations to match.
Cute comments that say:
# Here's a workaround because awk is so dumb.
are, of course, passed through unmodified.
awk scripts are often embedded in a shell script that pipes stuff
into and out of awk. Often the shell script wrapper can be
incorporated into the Perl script, since Perl can start up pipes into
and out of itself, and can do other things that awk can't do by
itself.
Scripts that refer to the special variables RSTART and RLENGTH can often
be simplified by referring to the variables $`, $&, and $', as
long as they are within the scope of the pattern match that sets them.
The produced Perl script may have subroutines defined to
deal with awk's semantics regarding "getline" and "print".
Since a2p usually picks correctness over efficiency, it
is almost always possible to rewrite such code to be more
efficient by discarding the semantic sugar.
ARGV[0] translates to $0, but ARGV[n] translates to
$ARGV[$n]. A loop that tries to iterate over ARGV[0] won't find it.
NOTE:
Storage for the awk syntax tree is currently static, and can run out.
You'll need to recompile a2p if that happens.
The find2perl program is really easy to understand if you already
understand the UNIX find(1) program. Just type find2perl instead
of find, and give it the same arguments you would give to find. It
will spit out an equivalent Perl script.
There are a couple of options you can use that your ordinary find(1)
command probably doesn't support:
-tar tarfile
    Outputs a tar file much like the -cpio switch of some versions of find.

-eval string
    Evaluates the string as a Perl expression, and continues if true.
The notion of a source filter started with the idea that a script or
module should be able to decrypt itself on the fly, like this:
#!/usr/bin/perl
use MyDecryptFilter;
@*x$]`0uN&k^Zx02jZ^X{.?s!(f;9Q/^A^@~~8H]|,%@^P:q-=
...
But the idea grew from there, and now a source filter can be defined to
do any transformation on the input text you like. One can now even do
things like this:
#!/usr/bin/perl
use Filter::exec "a2p";
1,30{print $1}
Put that together with the notion of the -x switch mentioned at the
beginning of this chapter, and you have a general mechanism for pulling
any chunk of program out of an article and executing it, regardless of
whether it's written in Perl or not. Now that's cooperation.
The Filter module is available from CPAN.
Historically, the Perl interpreter has been rather self-contained. When
Perl was redesigned for Version 5, however, one of the requirements was
that it be possible to write extension modules that could traverse the
parsed syntax tree and emit code in other languages, either low-level
or high-level. This has now come to pass.
More precisely, this is now coming to pass. Malcolm Beattie has been
developing a "real compiler" for Perl. As of this writing, it's in
Alpha 2 state, which means it mostly works, except for the really hard
bits. The compiler consists of an ordinary Perl parser and
interpreter (since you need to be able to execute BEGIN blocks to
compile Perl), plus a set of modules under the name of B, which is short
for both "Backend" and "Beattie". You don't actually invoke the B
module directly though. Instead you invoke a particular backend via the
O module, which pulls in the B module for you. Typically you invoke the
O module right on the command line with the -M switch, so a
compilation command might look like this:
perl -MO=C foo.pl >foo.c
There are three backends at the moment. The C backend rather woodenly
spits out C calls into the ordinary Perl interpreter, but it can
translate almost anything except the most egregious abuses of the
dynamic capabilities of the interpreter. The Bytecode module is also
fairly complete, and spits out an external Perl bytecode representation,
which can then be read back in and executed by a suitably clued version
of Perl. Finally, the CC backend attempts to translate into more
idiomatic C with a lot of optimization. Obviously, that's a bit harder
to do than the other thing. Nevertheless, it already works on a majority of
the Perl regression tests. It's possible with some care to get C code
that runs considerably faster than Perl 5's interpreter, which is no
slouch to begin with. And Malcolm hasn't put in all the optimizations
he wants to yet.
This is an ongoing topic of research, but you'll want to keep track of
it. You are quite likely to be using this someday soon, if you aren't
already. Look for it on CPAN of course, if it's not already a part
of the standard Perl distribution by the time you read this.
Another part of the design of Perl 5 was that it be possible to embed a
Perl interpreter in a C or C++ program. And in fact, the ordinary
perl executable pretends to have an embedded interpreter in it; the
main() function essentially does this:
PerlInterpreter *my_perl;

int main(int argc, char **argv)
{
    int exitstatus;

    my_perl = perl_alloc();
    perl_construct(my_perl);
    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **) NULL);
    if (exitstatus)
        exit(exitstatus);
    exitstatus = perl_run(my_perl);
    perl_destruct(my_perl);
    perl_free(my_perl);
    exit(exitstatus);
}
The important parts are the calls to perl_parse() and perl_run(), which
respectively compile and run the program. If you were embedding Perl in
your own program, you might replace the call to perl_run() with a call
to the perl_call_sv() function, which calls individual subroutines rather than
the program as a whole. Or you can do both, if the main script contains
initialization code as well as subroutine definitions.
There are many other useful entry points into the interpreter, such as
perl_eval_sv(), which evaluates a string, but this chapter is already
getting pretty long, and the fact of the matter is that there is
extensive online documentation for the internals of Perl. To include it
here would make this book even more unwieldy than it is, and most people
who would be embedding Perl aren't scared of online documentation. See
the perlembed(3) manpage for more on embedding Perl interpreters in your
program.
A number of programs in the real world already have Perl embedded in
them--the authors know of several proprietary products shipping with
embedded Perl interpreters. There are also a couple of modules for the
Apache HTTP server that use an embedded Perl interpreter to avoid
process startup costs on CGI-like scripting. And then there's the version
of Berkeley's nvi editor with a Perl engine in it. Watch out,
emacs, you've got company. :-)
If a respectable number of programs embed a Perl interpreter, then a
veritable flood of extension modules embed C and C++ into Perl. Again,
the Perl distribution itself does this with many of its standard
extension modules, including DB_File, DynaLoader, Fcntl, FileHandle,
GDBM_File, NDBM_File, ODBM_File, POSIX, Safe, SDBM_File, and Socket.
And many of the modules on CPAN do this. So if you decide to do it
yourself, you won't feel like you're researching a Ph.D. dissertation.
And again, we only have space to give you teasers for the online
documentation, which is exhaustively extensive. We recommend you start
with the perlxstut(3) manpage, which is a tutorial on the XS
language, a preprocessor that spits out the glue routines you need to do
the "impedance matching" between Perl and C or C++. You'll also be
interested in perlxs(3), perlguts(3), and perlcall(3).
And once again, let us reiterate that your best resource is the Perl
community itself. They invented a lot of this stuff, and are emotionally
committed to making you like it, whether you like it or not. You'd better
cooperate.