© 1997 The McGraw-Hill Companies, Inc. All rights reserved.
Any use of this Beta Book is subject to the rules stated in the Terms of Use.

Chapter 11: Perl 5 Odds and Ends

Perl has a lot of success behind it. It has become the most popular language on the Web, used for system administration, database administration, and a myriad of other tasks. Perl has managed to become incredibly useful without the one thing that other languages have had: a formal design committee. Quite the contrary, Perl's design is about as anarchic as you could possibly imagine. In all likelihood, a given feature started with a 'why don't we try this?' post on the perl5-porters mailing list. Likewise, much of the look and feel of Perl is due to historical accident (Larry Wall's own personal historical accidents, so to speak) and the desire for 'just one more feature'.

Chapter Overview

The upshot of this 'just one more feature' ideal is that there are a few concepts in Perl that don't fit very well with any of the other Perl features, yet are too powerful to overlook. This chapter is devoted to these features. The features that we shall cover here are:

1) formats

2) coderefs

3) globbing

4) BEGIN/END

5) eval

Formats are Perl's 'what you see is what you get' way of generating ASCII reports. Formats are extremely helpful in generating quick and dirty analysis output for others to read (so you can get back to programming more edifying items in Perl). This chapter could be considered a primer on formats. You can get much fancier with them than what is shown here. You might want to check out the perlform man page for more detail.

Coderefs, or callbacks, let you pass functions around as you would data, into other functions. In doing so, coderefs permit the construction of much more powerful modules and objects, and allow you to think about things in a much more 'object oriented' way. If you want more information, check out the perlref on-line man page for more detail.

Globbing is the ability to refer to more than one thing by the use of a wildcard. This chapter covers one form of globbing, called type globbing. The other form of globbing, called file globbing, was covered in the last chapter.

BEGIN/END functions are Perl's method of performing certain tasks before the logical execution of a program, and after the end of the program. In other words, these functions allow you to do such things as global cleanup or global setup. They also are important for understanding how packages work. Execution Control is my term for controlling the sequence of events followed by Perl when an executable runs. This includes when variables are assigned, and when functions are defined. Understanding execution control can explain several strange errors that you might get while running Perl.

Perl's function for executing strings as if they were small Perl programs is eval. In Perl 4 it was necessary to use eval far more often than it is in Perl 5. In fact, it is best to forget about most of the Perl 4 uses of eval. For example, where you once used eval to create complex data structures, you will now use references instead.

Each one of these subjects is covered briefly here. If you need to get more in-depth on these topics you can consult perlform for formats, and perlfunc for eval and BEGIN/END. Type Globbing is covered in the Perl documentation, but you will need to dig for it in several places.

Those of you just starting out with the language might want to read and understand formats, since they can make you immediately productive. Be sure to read about eval if you need to port your Perl scripts to many platforms. If you need to make your scripts run such that they are being fired off automatically (whether by cron or by Web server), read about BEGIN/END. If you are going to be doing more than just a little object oriented programming, be sure to read about callbacks.

The odds and ends covered in this chapter fill in the last remaining gaps in the basics of Perl. They are rather "messy" because they don't fit into the neat, tidy categories of what has come before. However, they are great at solving the particular problems that they were designed to solve. Pragmatism is the key here. When you get familiar with Perl, you will want to add these to your repertoire.

Formats

Have you ever needed a quick analysis of reams and reams of data? Do you feel that you are "drowning in data and starving for information?" Odds are that you have. Whether the report is on the performance of a certain piece of code, the bugs in a piece of software, or a cost/benefit analysis, all reports share the same three steps:

1) data collection

2) data manipulation

3) data output.

Formats are Perl's way of handling the data output step in this chain. Formats allow you to turn data into meaningful information. They do this in a very quick, 'pretty enough but not too pretty' way. Formats are not very scalable, because complicated reports are difficult to make. When formats were devised, Perl was a very young language and there was no such thing as 'my' variables. Hence, you need to use globals to write out reports.

In other words, formats are useful for simple reports because they let you easily extract data into an output file a lot faster than is possible with C or C++.

Format Syntax

Using formats involves a two step process:

1) define a top (header) and a body to the report

2) write data to the report

Let's look at both of these steps in turn, in an actual example. Let's consider a program to summarize a Web log.

Example: Summarizing a Web Log

Perl is used heavily for CGI creation and Web access. One of the main tasks you will encounter with CGI is summarizing tons of information. Here, we will use formats to define and execute such a summary.

Defining the format

The first thing we need to do in order to get a useful summary of what happens with our Web server is to define a report format that makes a good summary. Below is the sort of output we will want:

                               Web Hits Report
Server Hits  Domain    Avg Connect Time    Total Xfer  Comments
-----------------------------------------------------------------------------
1323         umn.edu   04 h 15 m 11 sec    55331 KB    Way too much
                                                       net lag
44           str.com   00 h 16 m 04 sec    432 KB      from T1 --
                                                       Fast transfer

You could define this in Perl with the following formats:

Listing 11.1 Webreport.p (format header)

1 format REPORT_TOP =
2                                Web Hits Report
3 Server Hits  Domain    Avg Connect Time      Kbytes    Comments
4 ------------------------------------------------------------------------
.
5 format REPORT =
6 @<<<<<<      @<<<<<<<  @# hr @# min @# sec   @<<<<<<<  ^<<<<<<<<~~
7 $hits,       $domain,  $hr, $min, $sec,      $kbytes,  $comment
.

In these two formats, REPORT_TOP corresponds to what the report prints at the beginning of each page, and REPORT corresponds to the body of the report. The idea here is that you can define the report to look like the desired output without actually having to go to the trouble of drawing a GUI picture of the report, using a specialty application like Excel, or any of a hundred other hurdles.

You do need to know quite a few special characters in order to accomplish this, though. The special characters involved with formats are given in Table 11.1:

Table 11.1: Special Characters in Formats

'@' indicates the beginning of a fixed (single-line) variable field
'<' indicates that the variable is to be left justified
'>' indicates that the variable is to be right justified
'|' indicates that the variable is to be center justified
'^' indicates the beginning of a multi-line variable field
'#' indicates that the variable is to be a number
'~~' indicates that a multi-line variable is to be continued on as many lines as needed
'.' indicates that the end of the format has been reached. This character has to be by itself, at the beginning of a line.

Any other character in a format will be treated as a regular character in true Perl style.

Let's look at some more examples. When you see something like:

format STDOUT =
Read: @<<<<<<<
$variable
.

this indicates that the scalar $variable is to be left-justified, and that anything beyond eight characters (the width of the '@<<<<<<<' field) will be chopped off. Since the format is named STDOUT, it also means that the report is printed to the screen rather than to a file. Whereas if you see something like:

format STDOUT =
Center the following variable
@||||||||||||||||||||||||||||||||||||||||||||||||||||
$variable
.

this indicates that the scalar $variable is to be centered, and that any extra characters that don't fit will fall off either edge of the line.

Now, something like:

format REPORT =
Comment: ^<<<<<<<<<<<<<
$variable
^<<<<<<<<<<<<<
$variable
.

indicates that the variable is to span more than one line: as much as fits into the first 14-character field goes on the first line, and the next chunk goes on the second (Perl breaks multi-line '^' fields on whitespace where it can).

Let's then go back and see the report that we are going to be using:

Listing 11.1 Webreport.p (format header)

1 format REPORT_TOP =
2                                Web Hits Report
3 Server Hits  Domain    Avg Connect Time      Kbytes    Comments
4 ------------------------------------------------------------------------
.
5 format REPORT =
6 @<<<<<<      @<<<<<<<  @# hr @# min @# sec   @<<<<<<<  ^<<<<<<<<~~
7 $hits,       $domain,  $hr, $min, $sec,      $kbytes,  $comment
.

Now, how do we actually use this format to create something useful?

The write function

First, we defined the report structure. Now we go on to the second step of the formatting process: actually using the format by calling the function write. To demonstrate this, we need some data to write out, so let's write a stub (in other words, some test data to test the application; the actual data would come from your http server's logs):

Listing 11.1 Webreport.p continued

8  my $data = [
9      [ 12331, 'umn.edu', 4, 5, 36, 44232, 'too much lag time' ],
10     [ 44, 'str.com', 0, 4, 6, 432, 'from t1 -- fast transfer' ]
11 ];

This is the data to print out. To then create the report we can say:

Listing 11.1 Webreport.p continued.

12 open (REPORT, "> report_file");  # this 'binds' the
13                                  # REPORT file handle to the place we want to
14                                  # print out the report.
15 foreach $element (@$data)
16 {
17     ($hits, $domain, $hr, $min, $sec, $kbytes, $comment) = @{$element};
18     write REPORT;
19 }

See how easy this is? We simply populate the variables that are used in the format, and they are automatically put in the right place in the report. Because we have defined the format REPORT_TOP, we are assured of having the same header on each page of the formatted report.

When we actually take the data from an httpd server instead of a stub, we shall simply replace the stub with a call that returns a data structure mirroring the one shown.

The first thing to do in learning formats is to concentrate on knowing the special characters that make them work. Realize that '@' indicates a single-line variable, whereas '^' indicates that the variable may go on for several lines. The special characters are really quite straightforward, and the ones shown in the simple examples above should suffice for about 90% of the reports that you write.
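Pulling the two steps together, here is a minimal, self-contained sketch you can run as-is (the format names, the output file summary.txt, and the data are our own invention, not part of the Web log example):

```perl
#!/usr/bin/perl
# Minimal format demo: a header format, a body format, and a write loop.
# SUMMARY_TOP prints once at the top of each page; SUMMARY prints per record.

format SUMMARY_TOP =
Item          Qty
-----------------
.
format SUMMARY =
@<<<<<<<<<<<  @##
$name,        $qty
.

open(SUMMARY, "> summary.txt") or die "can't open summary.txt: $!";
foreach $record ([ 'widgets', 42 ], [ 'sprockets', 7 ])
{
    ($name, $qty) = @$record;   # formats read these globals
    write SUMMARY;              # emit one formatted record
}
close(SUMMARY);
```

Note that $name and $qty are deliberately left as globals; as discussed under the format caveats, write resolves the variables in a format globally.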

How Formats Work (advanced formats)

There is some heavy magic going on here. First, there is a simple 'write FILEHANDLE' statement, with no need to explicitly pass in the variables that are used in the format. It also appears that Perl "auto-magically" right justifies, centers, and left justifies for you, and even splits variables across many lines! Finally, Perl handles the length of the page for you.

How does Perl do all of this so easily?

There are quite a few variables associated with formats. As with much of Perl, these variables are of the 'one special character' variety. Look at Table 11.2:

Table 11.2 Special variables and formats

'$%' the page number that the corresponding format is on
'$=' the number of lines per page (default 60)
'$-' the number of lines left on the page
'$~' the name of the format (default: the name of the filehandle, e.g. STDOUT)
'$^' the name of the 'header' format (default: the filehandle name plus _TOP, e.g. STDOUT_TOP)

It is easy to tweak all of these variables to get certain effects. For example, suppose you are doing inventory reporting, and want to output only one inventory request per page. You define the report's look via:

Listing 11.2 inventory.p (format header)

1 format STDOUT_TOP =
2 Inventory Request for Date: @<<<<<<<<<<
3 $date
4 Request Number: @<<<<<<<<<
5 $%
6 ----------------------------------------
.
7 format STDOUT =
8 Part Number: @############ Name of parts: @<<<<<<<<<<<<<
9 $part_number, $part_name
10 Cost: @######.##
11 $cost_of_part
12 Comments on request: ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<~~
13 $comment
.

Notice the '$%' used here. It prints the page number on the report. Now look at what happens when we do the write again, with test data:

Listing 11.2 inventory.p continued

14 $reportStuff = [
15     [ 'Jun 15, 1996', 66423412, 'tongue depressors', 199.99,
16       'These things are expensive!' ],
17     [ 'May 1, 1995', 123122, 'squeegies', 1.19,
18       'need replacements.. do not squeeg effectively.'
19     ]
20 ];
21 foreach $report (@$reportStuff)
22 {
23     ($date, $part_number, $part_name, $cost_of_part, $comment) = @{$report};
24     write STDOUT;
25     $- = 0;
26 }

Note the use of '$-' here. By setting it to zero, you tell the internal Perl formatter that there are no lines left, which forces a new page on the next write. This produces the following output:

Inventory Request for Date: Jun 15, 1996
Request Number: 1
----------------------------------------
Part Number:      66423412 Name of parts: tongue depressors
Cost:     199.99
Comments on request: These things are expensive!
^L (page break)
Inventory Request for Date: May 1, 1995
Request Number: 2
----------------------------------------
Part Number:        123122 Name of parts: squeegies
Cost:       1.19
Comments on request: need replacements.. do not
                     squeeg effectively.

Since this reset happens every time we go through the loop, we get the effect of having only one request per page. A similar effect could be had by setting $= to 10, since there would then be only 10 lines per page.

What if you wanted to add an optional comment to the end of a given record, something that applies only to certain items in the list? In other words, you want to intermix print statements with write. This is possible, but you are going to have to manage '$-' (the lines-left-on-page counter) yourself.

Format Caveats

The fact that all it takes to output a record to the report is the simple statement 'write FILEHANDLE;' is a direct result of the fact that Perl uses global variables with the function write.

This means that if you do something like:

format LOOSE =
This won't work: @>>>>>>>>>>>>>>
$left_justify_me
.

my $left_justify_me = "variable_text";
write LOOSE;

this simply won't work, because '$left_justify_me' is a my variable declared after the format was defined. The format was compiled to use the package (global) variable $left_justify_me; the my variable is a separate, lexical variable that write never sees.

Also, realize that formats are one of the older, squeakier features of the language. As such, they tend to show the stress (via bugs) when used in too complicated a manner, and they are not being actively maintained (my variables came along later and made formats a lot more confusing). One of the 'to do' items is a 'format package' which would take the functionality of formats out of the language core and place it in a module.

Therefore, formats should probably not be used in huge projects, only in small, throwaway scripts.

Coderefs

Coderefs are a bit troublesome from the perspective of this book. Coderef stands for code reference, and you could argue that they belong in the section on references. However, coderefs aren't really pointers to data. They are pointers to functionality. Hence, you could argue that they belong in the chapter on functions!

Anyway, my solution was to punt and put them in this odds-and-ends chapter, which is really sort of a pity, since they are so powerful. In fact, they are a necessity when doing anything more than simple object oriented programming. We shall see lots of code references in the sections on object-oriented programming.

Format of Coderefs

When you say something like:

my $functionReference = \&subr;

you are defining a code reference. $functionReference is a scalar which gets set to 'point to the function &subr'. If you now print out $functionReference, it will look like:

CODE(0xa47ec)

indicating that it is a code reference that resides at the address 0xa47ec in memory.

Now you can use the character '&' to dereference $functionReference, much the same way as you used '@' to dereference array references, and '%' to dereference hash references. If subr looks like:

sub subr { my ($arg1, $arg2) = @_; print "$arg1 $arg2\n"; }

then the code

my $functionReference = \&subr;

&$functionReference(1,2);

then prints out '1 2'.

Internally, Perl is going through gyrations that look something like what is in Figure 11.1:


Figure 11.1

How code references work.

Hence, '&$functionReference(1,2)' becomes '&{CODE(0xa47ec)}(1,2)', which becomes '&subr(1,2)'. Of course, the important thing is to remember what is pointing to what! If you try to dereference, with a '&', something that is not a code reference, you will end up with a fatal error.

Anonymous Subroutines

Just as you can have anonymous data structures you can have anonymous subroutines. Anonymous subroutines do not have a name associated with them, only pointers. If you say:

my $coderef = sub { print "@_\n"; };

then you are defining a code reference. Note the semicolon at the end of the line! This is a statement, not a block. (If you understand the difference between these two concepts, you will learn Perl very fast.)

This statement is equivalent to:

my $coderef = \&subr;

sub subr { print "@_\n"; }

only it is much cleaner and more concise. To call the code reference you then say:

&$coderef('args', 'here','now');

which then prints out 'args here now'.
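One common use of anonymous subs worth sketching here (the %ops table and compute() are our own hypothetical example, not from the text) is a dispatch table: a hash whose values are code references, selected by name at run time:

```perl
# A dispatch table: anonymous subs stored as hash values.
my %ops = (
    'add' => sub { $_[0] + $_[1] },
    'mul' => sub { $_[0] * $_[1] },
);

sub compute
{
    my ($op, @args) = @_;
    my $coderef = $ops{$op};
    die "unknown op: $op" unless (ref($coderef) eq "CODE");
    return &$coderef(@args);    # dereference and call
}

print compute('add', 2, 3), "\n";   # prints 5
print compute('mul', 2, 3), "\n";   # prints 6
```

This buys you the ability to add new operations by adding hash entries, rather than by growing an if/elsif chain.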

Callbacks

The primary use of code references is to program callbacks. Callbacks are functions that fill in a bit of functionality that another function or object lacks. We have already seen some forms of callbacks, although we didn't point them out. Remember grep and map? Well, they are implemented in terms of callbacks. When you say:

my @definedElements = grep { defined($_) } @elements;

grep has no way of directly knowing which elements you want to keep. '{ defined($_) }' here is a callback: grep runs this code snippet on each element, and uses the result to decide what to do with it. As a result, grep returns a list of the defined elements, assigning them to @definedElements.

Anyway, for fun, let's reimplement the built-in grep function in Perl just to get an idea of how you might do this in your own projects:

grep Redone in Perl

As we said in the last chapter, grep is a very useful built-in. It allows you to filter out elements that don't belong in an array, or simply see whether a pattern is in an array. The usage looks something like:

@elements = grep( condition, @array );

where condition is a user-defined subroutine that tells grep what to look for, and @array is a user-defined array. After all is done, @elements contains a list of the elements which satisfy the condition. Hence, let's define the bare bones of mygrep as:

1 sub mygrep
2 {
3     my ($coderef, @array) = @_;
4     (print ("Need coderef in first argument!\n"), return())
5         if (ref($coderef) ne "CODE");
6 }

Here, we build into the guts of the code a check that $coderef really is a code reference. If we did not, then as soon as we tried to dereference it with &$coderef, Perl would croak in a most unfriendly fashion.

To fill this in, remember what grep() does: it goes through each element in an array, and returns a list of the ones that match the condition specified. mygrep will look like:

Listing 11.3 - mygrep.p

1 sub mygrep
2 {
3     my ($coderef, @array) = @_;
4     my ($return, $element) = ([], '');
5     (print ("Need coderef in first argument!\n"), return())
6         if (ref($coderef) ne "CODE");
7     foreach $element (@array)
8     {
9         if ( &$coderef($element) )
10        {
11            push(@$return, $element);
12        }
13    }
14    return(@$return);
15 }

The heart of the algorithm is in lines 7 through 13. Processing goes through each element, and calls the code reference $coderef in line 9. We don't care what that code reference does; we simply hope that the user of the function knows what he is doing, since mygrep does not.

All we care about is that $element gets pushed onto the array @$return if the callback does indeed evaluate to 'true'. If we called this now with:

@array = (1,12,11,10);

my @return = mygrep( sub { $_[0] > 10 }, @array);

then Perl essentially goes through the following logic:

Step #1: substitute.

if (&{ sub { $_[0] > 10 } }(1))  { push (@$return, 1); }
if (&{ sub { $_[0] > 10 } }(12)) { push (@$return, 12); }
if (&{ sub { $_[0] > 10 } }(11)) { push (@$return, 11); }
if (&{ sub { $_[0] > 10 } }(10)) { push (@$return, 10); }

Step #2: evaluate.

if (1 > 10)  { push (@$return, 1); }
if (12 > 10) { push (@$return, 12); }
if (11 > 10) { push (@$return, 11); }
if (10 > 10) { push (@$return, 10); }

Step #3: assign.

push (@$return, 12);
push (@$return, 11);

Step #4: return.

return(12, 11);

And then,

@array = (1,12,11,10);

my @return = mygrep( sub { $_[0] > 10 }, @array);

evaluates to '@return = (12,11);'.
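You can sanity-check this walk-through against the real built-in grep (which uses $_ inside its block, rather than the @_ of our mygrep):

```perl
my @array  = (1, 12, 11, 10);
my @return = grep { $_ > 10 } @array;   # built-in equivalent of the mygrep call
print "@return\n";                      # prints "12 11"
```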

If you understand this logic and the steps that Perl is going through to make the magic happen, you will be able to debug Perl code like lightning. It is a simple matter of 'thinking the way the Perl interpreter thinks' in order to get coding done fast.

Closures

And finally, we shall give a nod to closures, which some people state are just as powerful as the modules and objects that we will cover in the upcoming chapters.

Making closures is simply the 'use of callbacks to generate a family of functions, based on another function'.

That may not sound simple, but once you get some practice doing it, making closures is fairly easy.

Suppose that you had the following function, which added two values together:

$val1 = 4; $val2 = 4;
sub add
{
    print "Adding $val1 and $val2 to get ", $val1 + $val2;
}
my $addref = \&add;
&$addref();

This is not such a good idea, because $val1 and $val2 are global. However, we can get rid of the globals by saying:

my ($val1, $val2) = (4, 4);
sub add
{
    print "Adding $val1 and $val2 to get ", $val1 + $val2;
}
my $addref = \&add;
&$addref();

Simple, right? We just used my variables to get rid of the global values, and Perl keeps track of what $val1 and $val2 are inside the code. Now the trick behind closures is to take this idea of encapsulating values one step further, and say:

sub addgen
{
    my ($val1, $val2) = @_;
    return sub
    {
        print "Adding $val1 and $val2 to get ", $val1 + $val2;
    };
}

So what have we done here? We now have the anonymous function inside addgen, so the variables $val1 and $val2 are no longer static; they are bound anew on each call to addgen. Furthermore, we have made addgen return a code reference each time it is called. Now watch this:

my $ref1 = addgen(5, 5);
my $ref2 = addgen(6, 6);
&$ref1(); print "\n";
&$ref2(); print "\n";

This will print out 'Adding 5 and 5 to get 10' when you say &$ref1(), and 'Adding 6 and 6 to get 12' when you say &$ref2().

In other words, addgen is a subroutine generator. Each time you call it, it binds the anonymous sub it returns to the values that were passed in. This is useful, if slightly dangerous.

The main thing to see about closures is that they are just as powerful as objects in their own right. You can use them if you want to store massive data structures inside a given sub, and then return that sub to the 'world' with data that it 'remembers', just like objects.
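For one more example of that 'remembering' (our own sketch, not from the text), here is a generator whose closures carry state that changes between calls:

```perl
# Each closure returned by make_counter() keeps -- and updates --
# its own private $count.
sub make_counter
{
    my ($count) = @_;
    return sub { return $count++; };
}

my $c1 = make_counter(10);
my $c2 = make_counter(100);

print &$c1(), "\n";   # prints 10
print &$c1(), "\n";   # prints 11  -- $c1 advanced its own count
print &$c2(), "\n";   # prints 100 -- $c2 is unaffected by $c1
```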

Those of you new to object oriented programming will probably want to skip ahead to Chapter 13 for more detail on objects and the benefits of OO programming; we cover it at length there.

However, I am not fond of closures. I mention them here in an effort to be complete; the functionality they provide can be had from the more standard object. There is 'more than one way to do it', though, and you can experiment with them if you like.

Summary of Code References

Code references are simply Perl's way of 'pointing' to functions. Code references, and especially callbacks, are concepts that you are going to want to learn fairly well. The whole of Perl/Tk (a method of making GUI Perl scripts) is built on this concept, as are a few techniques that we will discuss in the latter half of the book.

To make a code reference, you simply 'point' a scalar variable either at an anonymous sub:

my $a = sub { print "HERE @_\n"; };

or at an already existing sub:

sub subr { print "HERE @_\n"; }
my $a = \&subr;

and then $a will contain a code reference. To actually call the function in that code reference you say:

&$a(1,2,3);

where the '&' dereferences the code reference to get the actual function. Doing so will print 'HERE 1 2 3'.

From code references we build callbacks (where a code reference is used to 'fill in the gaps' of another function) and closures (functions that are code reference generators).

Globbing (typeglobs)

Globbing is the process of referring to more than one thing by a special symbol. We have already seen globbing as it applies to filenames in some detail. For example, this statement:

my @files = glob("*.c");

tells Perl to reach out into the operating system and grab all the files that end with '.c'. Filename globbing is a quick workaround for interfacing with the operating system (although you might want to use opendir, readdir, and closedir instead, as suggested in the last chapter).

However, this statement:

*line = *variable;

demonstrates what is called typeglobbing. This tells Perl to expand the * into each type of variable symbol that there is; hence, the above statement could be translated to mean:

$line = $variable;

@line = @variable;

%line = %variable;

&line = &variable;

all of which is done at the same time. Except that the above examples would actually copy the different variables, whereas with globbing no copy is made. Everything is done with references, and the symbols line and variable become truly aliased, which means that they point to the same underlying values. It is more like:

\$line = \$variable;

\@line = \@variable;

\%line = \%variable;

\&line = \&variable;

where each pair of references is tied together. Globbing is used because this syntax is illegal in Perl (but it may not be for long). After you say *line = *variable, if you change '$line' you also change '$variable', and so on.
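Here is a small, runnable sketch of that aliasing (our own example). Note that it works only on package variables; my variables do not live in the symbol table and so cannot be typeglobbed:

```perl
$variable = "hello";
@variable = (1, 2, 3);

*line = *variable;       # alias every type of 'variable' at once

$line = "changed";       # write through the scalar alias...
print "$variable\n";     # prints "changed" -- same scalar
print "@line\n";         # prints "1 2 3"   -- same array, other direction
```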

Globbing Tricks

You can do some interesting things with globbing. Here are three:

Alias individual variables

Create read only values

Create subroutine aliases

Alias Individual Variables

Globbing not only works on all the symbols together; you can alias individual symbols as well. This:

*alias = \@line;

would make @alias exactly the same variable as @line (until @alias is typeglobbed to something else!).

Create Read Only Values

For a read-only value (array, scalar, or hash) you can say something like:

*PI = \'3.14';
*E = \'2.718';

and get a true, read-only value. With such a value, the statement $E = 3; greets you with a:

'Modification of a read-only value attempted at line xxxx'

error.
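Since the modification is a fatal error, you can demonstrate it safely by trapping it with eval (a sketch of our own):

```perl
*PI = \3.14159;             # $PI is now read-only

print $PI, "\n";            # prints 3.14159
eval { $PI = 3; };          # fatal outside of eval; trapped here
print "Caught: $@" if $@;   # 'Modification of a read-only value attempted...'
```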

Create subroutine aliases

Since aliases work on any type of reference, including subroutines, you can make aliases to other subroutines if you say:

*other = \&this;

Now you have aliased the subroutine other to the subroutine this. Hence, anybody that calls the function other, as in:

other('my', 'arguments');

would actually be calling the subroutine this instead.
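A minimal sketch of our own:

```perl
sub this { return "this(@_)"; }

*other = \&this;    # alias other() to this()

print other('my', 'arguments'), "\n";   # prints "this(my arguments)"
```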

Globbing and Exporter

Aside from allowing these relatively useful tricks, a good reason to know about globbing is that it explains so much about how Perl functions.

I warn you, though: the following discussion gets rather intricate. If you have not done so already, go to the chapter on namespaces (the syntax of libraries and modules, Chapter 14) and brush up on how Perl segments functions into different compartments called 'packages' or 'namespaces'.

Anyway, consider yourself warned. One of the key modules that we will see when we come to modular and object oriented programming, Exporter, has at its heart the use of globbing and globbing tricks. (For more information on the use and benefits of Exporter, turn to Chapter 14.) As we shall see, when you say something like:

1 package MyPackage;
2 use Exporter;
3 @ISA = ('Exporter');
4 @EXPORT = ('my_sub', 'my_sub2');

and then use this package inside another package/script, as in:

use MyPackage;

somehow the functions "magically" appear inside your own package's namespace, so you don't need to say 'MyPackage::my_sub2()' to call the function 'my_sub2'. You simply need to say:

my_sub2();

and away you go. Globbing is the reason for this magic. The essence of exporter can be summarized by the following line:

*{"${callpkg}::$sym"} =
    $type eq ''  ? \&{"${pkg}::$sym"} :
    $type eq '&' ? \&{"${pkg}::$sym"} :
    $type eq '$' ? \${"${pkg}::$sym"} :
    $type eq '@' ? \@{"${pkg}::$sym"} :
    $type eq '%' ? \%{"${pkg}::$sym"} :
    $type eq '*' ? *{"${pkg}::$sym"} :
    Carp::croak("Can't export symbol: $type$sym");

which appears in Exporter.pm, lines 158 to 164. This rather elegant statement does a lot of magic, and if you understand it, you understand Perl quite well. '${callpkg}' is the calling package, '$sym' is the symbol to be exported, and '${pkg}' is the package where Exporter is used. Let's go over this statement in a little more detail:

First, define a package to export names:

1 package MyPackage;
2 use Exporter;
3 @ISA = ('Exporter');
4 @EXPORT = ('my_sub', 'my_sub2');

then use:

use MyPackage;

in a script. Now your calling package is main, the package where Exporter is used is MyPackage, and the $type of the symbol is ''. The upshot is that this expression evaluates as:

*{"main::my_sub"} = \&{"MyPackage::my_sub"};

Now, if you want to call MyPackage's version of my_sub, you simply have to say:

my_sub();

and since the glob statement inside Exporter has aliased it for you, Perl knows which function you mean. If you understand how this works, you understand a lot about the malleability (and usefulness) of Perl. If it isn't quite clear, go through it a couple of times, or even look at the Exporter.pm module in the distribution. Your Perl will improve immeasurably if you understand what this module is doing.
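To see the trick stripped of the module entirely, here is a hand-rolled sketch (our own, heavily simplified) of what Exporter effectively does for one symbol:

```perl
# The exporting package defines a sub...
package MyPackage;
sub my_sub2 { return "my_sub2 lives in " . __PACKAGE__; }

# ...and the glob assignment below is the heart of what Exporter
# does on behalf of the calling package:
package main;
*{"main::my_sub2"} = \&{"MyPackage::my_sub2"};

print my_sub2(), "\n";   # prints "my_sub2 lives in MyPackage"
```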

How Perl 'Runs' Your Program, and a Discussion of BEGIN/END

This section is devoted to folks who need a more complete picture of the sequence of events that occurs when Perl actually runs your programs; in other words, the steps that occur between the time you type 'script.p' and the time Perl executes your program. This information can be extremely helpful in finding solutions to otherwise inexplicable problems, especially for people doing CGI programming, since CGI programs usually are not run by people typing their names; instead, a server runs them, which can cause lots of confusion.

Perl Compilation Steps

Exactly what happens when a Perl script is executed? Perl has two distinct steps in parsing a given script: compile and run.

The compile time step is when Perl turns scripts or programs into what is called a parse tree. A parse tree is the internal representation of the code, in a form the Perl interpreter can understand, and it is optimized as it is built. The run time step then takes this internal representation and actually executes whatever parse tree the interpreter came up with.

This simple picture is complicated quite a bit by the fact that Perl provides two special routines:

BEGIN { }

END { }

These routines subvert the usual order of events, so that some blocks of code execute before anything else (BEGIN) and some execute last (END). In other words, if you put what is termed a BEGIN block anywhere in your code, Perl will compile and execute that block first, before any of your other code is compiled. For example:

1 print "This will print second!\n";

2 BEGIN

3 {

4 print "This will print first!\n";

5 }

prints out:

This will print first!

This will print second!

even though the 'This will print first!' line comes after the 'This will print second!' line in the code fragment.

Likewise if you say:

1 END

2 {

3 print "This will print second!\n";

4 }

5 print "This will print first!\n";

Then you will get the same results ('print second' after 'print first') even though the END block comes before the print.

What is happening here is not simply semantics, i.e. Perl is not rearranging the code so that all the BEGIN blocks happen first, and all the END blocks happen last. Instead, Perl is actually treating all of the BEGIN and END blocks as separate programs, much like eval does later on. In other words, if you say

1 BEGIN

2 {

3 print "Program #1\n";

4 }

5 print "Program #2\n";

6 END

7 {

8 print "Program #3\n";

9 }

these are actually three separately compiled programs, such that the BEGIN block is checked for syntax, and executed first, then the main body of the code (Program #2) is checked for syntax and executed, and finally the END block is checked for syntax and executed. The difference between this and simply writing 3 separate Perl scripts is that the BEGIN blocks and END blocks can communicate with the main process through variables or subroutines.

If you say:

1 BEGIN

2 {

3 $ENV{'PERL5LIB'} = "/my/path/to/perl";

4 }

5 print $ENV{'PERL5LIB'};

then this will first set the 'PERL5LIB' key in the environment, and the print statement will output:

/my/path/to/perl

because the variables, etc. are shared.

Table 11.3 gives the exact order of what is going on in these situations. You will find it helpful for tracking down problems.

Table 11.3

For each BEGIN block, in the order defined:

Step #1: Compile the given BEGIN block (a use statement counts as a BEGIN block as well).

Step #2: Run the given BEGIN block.

For each END block, in the order defined:

Step #1: Compile the given END block.

For code outside of any BEGIN or END block:

Step #1: Compile the main code.

Step #2: Run the main code.

For each END block, in the reverse of the order defined:

Step #1: Run the given END block. (This also happens if the program terminates abnormally, due to a die.)

Do global clean-up.

Exit.
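The ordering in Table 11.3 can be traced directly. This sketch records each phase in an array as it happens; note that the two END blocks fire in the reverse of the order in which they appear:

```perl
our @trace;

BEGIN { push @trace, "first BEGIN";  }     # compiled and run first
BEGIN { push @trace, "second BEGIN"; }     # compiled and run second

push @trace, "main body";                  # the main program

END {
    push @trace, "first END";              # runs LAST of all
    print join(" -> ", @trace), "\n";
}
END { push @trace, "second END"; }         # runs before the first END
```

When run, this prints first BEGIN -> second BEGIN -> main body -> second END -> first END, matching the table step for step.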

 

Hence, if your program is:

1 BEGIN

2 {

3 print "Hmmm.. this works";

4 }

5 print "This does too....\n";

6 END

7 {

8 print "Syntax ERROR!!!!\n

9 }

when run, this program outputs:

Hmmm.. this works

and then dies with a syntax error, since END blocks are compiled (although not yet run) before the main program runs.

More on Perl Parsing

Let's take a look at two common places in which you should be aware of the way Perl is parsing programs:

1) when using modules

2) when running Perl non-interactively, as CGI programmers commonly do (the computer executes the scripts, not you).

Modules

Later on, in chapter 14, we discuss how modules work, but for now, let's look at how modules relate to BEGIN {} and END {}. It is important to understand Perl's parsing order when programming with modules. For example, if you say something like:

use MyModule;

then, in reality you are saying:

1 BEGIN

2 {

3 require MyModule;

4 eval("MyModule->import()");

5 }

where import is a user-defined function that may or may not exist in a given package. That is why it is wrapped in an eval. (See the next section for more on this.)
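You can watch this expansion happen with a package defined inline (MyModule here lives in the same file purely for illustration, which is why the require step is skipped):

```perl
package MyModule;
sub import { $main::imported = "MyModule::import was called"; }

package main;

BEGIN
{
    # roughly what 'use MyModule;' expands to; the require is
    # omitted because the package is defined in this very file:
    eval("MyModule->import()");
}

print "$main::imported\n";
```

Because the BEGIN block runs at compile time, $main::imported is already set by the time the main print statement executes.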

Therefore, modules are actually BEGIN blocks in disguise. This code:

1 use Config;

2 if ($Config{'osname'} =~ m"Win")

3 {

4 use WindowsModule;

5 }

6 else

7 {

8 use UnixModule;

9 }

is not going to work. Why? Since 'use' statements are inherently inside BEGIN blocks, the 'if ($Config...' test is ignored and there is an attempt to import both modules. This code turns into something like:

1 use WindowsModule;

2 use UnixModule;

3 if ($Config{'osname'} =~ m"Win")

4 {

5 }

6 else

7 {

8 }

This means that the program dies on the module that does not exist on the other platform.

Instead, you are going to have to say:

1 BEGIN

2 {

3 use Config; # the Config module tells you which

4 # system is being used, and hence what is

5 # or is not available on your system.

6 if ($Config{'osname'} =~ m"Win") # look at OS type.

7 {

8 require 'NTModule.pm'; NTModule->import();

9 }

10 else

11 {

12 require 'UNIXModule.pm'; UNIXModule->import();

13 }

14 }

In other words, avoid the 'use' statement and put everything in a BEGIN block. This is kind of messy, since you would have to put this block of code inside every script that needs it. An even better approach is:

package GenericModule;

use Config;

 

sub import

{

# code up above...

}

1;

Now when people use the package GenericModule in their code, all they have to say is:

use GenericModule;

and the details of what is UNIX, and what is NT will be hidden from them.
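For completeness, here is a sketch of what the whole GenericModule.pm might look like; the module names NTModule and UNIXModule and the exact import calls are assumptions, following the BEGIN block shown earlier:

```perl
package GenericModule;

use Config;    # gives us %Config, including the OS name

sub import
{
    if ($Config{'osname'} =~ m"Win")    # look at OS type
    {
        require 'NTModule.pm';          # hypothetical Windows module
        NTModule->import();
    }
    else
    {
        require 'UNIXModule.pm';        # hypothetical UNIX module
        UNIXModule->import();
    }
}

1;
```

Since 'use GenericModule;' calls GenericModule->import() at compile time, the OS test runs before any of the caller's main code is compiled, which is exactly the timing we needed.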

Other uses of BEGIN/END

You will find quite a few uses for BEGIN and END once you get used to them. For example, suppose you have a Perl script that creates temporary files while checking on the status of various systems (pseudocode: you need to define system_status() and network_down() yourself):

1 my @systems = ('sampson', 'zulu', 'bravo','charlie');

2

3 my $system;

4

5 foreach $system (@systems)

6 {

7 my $fh = new FileHandle("> /tmp/system.$system");

8 print $fh "Status of System: ".system_status($system);# prints status.

9 die "Fatal Error!!!\n" if (network_down()); # dies if network down.

10 }

11

12 &email_status_and_cleanup();

As it stands, this program has a flaw. What happens if the network goes down at line 9? Then email_status_and_cleanup() will never get called, because die causes the script to terminate immediately.

There are two potential solutions to this problem. One is to pair the call to email_status_and_cleanup with the error as in:

1 if (network_down())

2 {

3 email_status_and_cleanup();

4 die "Fatal Error!\n";

5 }

But this is no good if you have several separate calls to die(), because you would need to copy the call to email_status_and_cleanup() into many separate places, which of course raises the chance for error. What happens if you forget to call this function someplace?

It may also be feasible to say:

1 die_and_cleanup("Fatal Error!\n");

2

3 sub die_and_cleanup

4 {

5 my (@messages) = @_;

6 email_status_and_cleanup();

7 die "@messages\n";

8 }

But this is a little unclean as well. Wrappers like this are helpful, but suppose we want to catch several different types of errors? Then we would need several wrapper functions.

(Side note: we shall see when we get to debugging Perl that Perl has a variety of ways of dying. We even see one of them later in this chapter: carpout and fatalsToBrowser, for example, make a script report its death in the form of an HTML page.)

END is the perfect solution here. It provides a convenient way of doing something right before the program exits, whether the program dies or exits normally. (Note that END blocks will NOT be executed if a signal kills the program.) Short of a kill signal or a syntax error in your main program, Perl guarantees that:

1 END

2 {

3 &email_status_and_cleanup();

4 }

will always call email_status_and_cleanup(), since the END block is not part of the main execution cycle: it runs between the end of the main program and the exit of the process.
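A quick way to convince yourself that END fires even when a program dies is to run a doomed one-liner in a child process. This sketch assumes a UNIX-style shell for the 2>/dev/null redirection; $^X is the path of the currently running perl:

```perl
# The child program dies, yet its END block still prints to stdout:
our $out = `$^X -e 'END { print "cleanup ran" } die "fatal error"' 2>/dev/null`;

print "child said: $out\n";
```

The die message goes to the child's STDERR (discarded here), while "cleanup ran" arrives on STDOUT, proving the END block executed on the way down.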

Likewise, suppose you wish to make sure that two programs do not run at the same time. You could do something like this:

1 BEGIN

2 {

3 exit if (-e "marker");

4 my $fh = new FileHandle("> marker"); $fh->close();

5 }

6

7 # ..... do your stuff

8

9 END

10 {

11 my $status = unlink("marker") ||

12 die "Couldn't delete marker!\n";

13 }

At the very beginning of the program a marker file is created, and the 'exit if (-e "marker");' statement assures that no second copy will be able to start. Then, after the program exits, the marker is deleted so the program is runnable again.

This scheme is fairly good for small processes.

However, this could fail if multiple processes start at exactly the same time, since both could create the file 'marker' simultaneously. See the perlfunc man page under flock for a more bulletproof, if less portable, way of doing this (UNIX only). We will also revisit this when making a mutex (or mutually exclusive resource) object in chapter 16.
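The flock-based approach mentioned above might be sketched as follows. This is UNIX-flavored; the lock file name /tmp/myprog.lock is arbitrary, and the numeric constants are the traditional values of LOCK_EX and LOCK_UN:

```perl
use FileHandle;

my $lock = new FileHandle(">> /tmp/myprog.lock")
    or die "Couldn't open lock file: $!\n";

# 2 == LOCK_EX: block here until no other copy holds the lock.
flock($lock, 2) or die "Couldn't lock: $!\n";

# ..... do your stuff; only one copy reaches this point at a time.

# The lock evaporates when the handle is closed or the process
# exits, even if it dies, so no END-block cleanup is needed.
flock($lock, 8);    # 8 == LOCK_UN: release explicitly anyway
```

Unlike the marker-file scheme, there is no race: the operating system serializes the flock calls, so two copies started at the same instant cannot both get past the lock.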

BEGIN and END are also good for timing programs, notifying people upon the start and completion of a process, and diagnosing the output of a process, among other things. They are also quite useful in a type of Perl programming that is very popular today, which we shall discuss presently.

Non-Interactive Perl

The previous section demonstrated that knowing Perl's order of execution can be helpful: once you learn it, you can use it to your advantage with BEGIN/END. Another place this knowledge comes in handy is when a machine executes your code rather than a human (this includes CGI). Typing the following at the keyboard:

prompt% perl script.p

executes Perl with a hidden dependency: the executing program depends on the environment in which it is running. For example, if script.p consists of:

#!/usr/local/bin/perl5

 

system("run_program");

then you are dependent on your PATH variable to find 'run_program' (on both Windows NT and UNIX). If your PATH variable is not set correctly, this program will not run correctly; it will say 'run_program not found'.

This all seems self-explanatory. However, things get more complicated when these same scripts do not run from the command line.

Trust me, you will find out exactly how dependent your scripts are on the environment in which they are running when you run this from a Web-server or a cron job! And it can be quite a struggle to remove these dependencies.

Let's take a look at a common place where environmental variables come into play. Here is a small, convenient CGI script that prints out the environment in which http servers run their scripts:

Listing 11.4: env_printer.p:

1 #!/usr/local/bin/perl5

2

3 use CGI; # Uses CGI module

4

5 my $page = new CGI; # makes an instance of the CGI module

6 print $page->header; # prints out header for the page

7

8 print $page->start_html( # what will show up at the top of the browser

9 -title=>'Environment Printer',

10 -BGCOLOR=>'white');

11

12

13 foreach $key (keys %ENV) # print out the environment

14 {

15 print "$key= $ENV{$key}\n<br>\n";

16 }

17 print $page->end_html; # end the page.

All env_printer.p does is create a Web page showing the environment in which the script was run. Let's look at the output when we run it from the command line:

prompt% env_printer.p

pdg1102.gif

Figure 11.2

output of env_printer.p at command line.

And now look what happens when we run it through the Web server:

pdg1103.gif

Figure 11.3

output of env_printer.p when run as a CGI script.

As you can see, the environments are totally different. In particular, the paths when env_printer.p runs under the server reflect the environment at the time the server was started. Therefore, any Perl script that depends on environmental variables not in this list is not going to work.

This behavior brings up the following two problems for CGI developers.

Problem #1: Extensions for Third Party Software

Consider the situation in which you want to add an extension for a database, or some other third-party software, onto a site, as in the program nifty_dbase.p:

1 use CGI;

2 use Sybase::DBlib;

3

4 my $page = new CGI;

5 my $sybLogon = new Sybase::DBlib($user, $password, $server);

nifty_dbase.p is NOT going to work. In order to get a database connection, the Sybase::DBlib object needs the environmental variable SYBASE, which tells it where the database is located on the machine. Also, depending on how Sybase was installed, Sybase::DBlib may need the variable LD_LIBRARY_PATH set in order to link with Perl correctly.

The problem, then, is how to add LD_LIBRARY_PATH and SYBASE so that the script knows about them.

Problem #2: Adding include paths for libraries

Suppose you want to put together your own libraries in a certain location, and use them in a CGI script:

1 use CGI;

2 use lib "$ENV{MY_LIBRARIES}";

3

4 use MyObject;

5 $myobject = new MyObject();

6

7 $page = new CGI;

This is not going to work either. The environmental variable MY_LIBRARIES will not be found by the CGI script (since it is not part of the server's restricted environment), and MyObject will not be loaded. Both examples will blow up when you try to access them, giving the result:

Internal Server Error:

 

The server encountered an internal error or misconfiguration.

which is the server's way of dealing with the error messages that the Perl script is generating (instead of sending HTML data). What is really happening is something like:

Can't locate MyObject.pm in @INC at myscript line 3.

BEGIN failed--compilation aborted at myscript.pl line 3.

where the Perl script prints this out to STDERR, which the server then reports as an internal error rather than a page. How do you add the environmental variable MY_LIBRARIES to your CGI script?

Of course, the reason for the above behavior is security. If you let a cron job inherit the environment of the user, or let the http server inherit the environment of the user who runs the httpd, then theoretically a malevolent user could slip in a command of his own choosing and run it against your computer with the authority of the http server. By restricting the environment of the server, you reduce the loopholes, and the threat of attack is minimized.

Paranoid? Yes. Justifiable? Probably. Anyway, it is a fact of life, and we need to deal with it while programming. The solution may or may not be as simple as tacking the correct environmental variables onto a script that is not working.

The above two examples are part of a class of problems a Perl script faces, namely questions of TIMING: when, exactly, do things happen in a Perl script? In the above two examples, we needed to add three different environmental variables to make the scripts work. The obvious solution (simply setting the environmental variable in the body of the script itself) may or may not work.

Solutions through BEGIN/END

Surprisingly, adding these three environmental variables to a Perl script poses three separate problems, which therefore require different solutions. They can be classed as:

1) Environmental variables that can be set inside the Perl script itself. Example: the environmental variable SYBASE, which is necessary for Sybase::DBlib to find the location of the databases.

2) Environmental variables that need to be set before the compilation step. Example: the environmental variable MY_LIBRARIES, which is necessary to find the package 'MyObject' before running the program.

3) Environmental variables that need to be set before the Perl script runs. Example: the environmental variable LD_LIBRARY_PATH, which is necessary for Perl to find the C library associated with Sybase::DBlib. Another example: PERL5LIB, which is only looked at once by Perl, when the Perl program starts.

Let's look at solving each of these in turn.

First Solution: Environmental Variables Needed at Run Time

$ENV{SYBASE} is needed by the program to tell the Sybase module where its database lives. Here the solution is easy: since the problem is a run time problem, and Perl executes its statements sequentially, all we have to do is set the variable before it is needed. For example, the following:

1 $ENV{'SYBASE'} = '/usr/lib/sybase';

2 $dbconnection = new Sybase::DBlib('user','password', 'SERVER');

3

4 $ENV{'SYBASE'} = '/usr/other/lib/sybase';

5 $dbconnection = new Sybase::DBlib('user','password', 'SERVER2');

could set up two simultaneous connections to databases, if so desired.

Solution 2: Environmental Variables Needed at Compile Time

To solve this problem, we again take advantage of the BEGIN block. Since we know that BEGIN blocks (including use statements) are processed in order, before the rest of the main program, all we have to do is insert:

1 BEGIN

2 {

3 $ENV{'MY_LIBRARIES'} = '/use/my/libraries/now';

4 }

before the:

1 use lib "$ENV{'MY_LIBRARIES'}";

This forces the use lib statement to see the updated environment.
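You can verify that the BEGIN block wins the race by checking @INC afterward. The path here is a made-up example; it need not exist for use lib to record it:

```perl
BEGIN
{
    $ENV{'MY_LIBRARIES'} = '/use/my/libraries/now';
}

use lib "$ENV{'MY_LIBRARIES'}";

# By the time the main code runs, the path is at the front of @INC:
print "$INC[0]\n";    # prints /use/my/libraries/now
```

If the BEGIN block were removed, use lib would run before the assignment and would be handed an empty string, which is precisely the bug we are avoiding.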

Solution 3: Environmental Variables Needed Before the Perl Script is Run

Solving the problem of adding environmental variables which are needed before the script is run (LD_LIBRARY_PATH and PERL5LIB come to mind) is the most counter-intuitive of the bunch, and takes a bit of understanding of what is going on in the underlying environment.

After all, how do you add to a Perl script a definition that is needed before the script executes? It seems like a catch-22, and it sort of is. The tricky part is that you cannot use the following code to get around the problem:

1 BEGIN

2 {

3 $ENV{LD_LIBRARY_PATH} = '/correct/path/to/lib';

4 }

5 use Sybase::DBlib;

To see why, consider the environment that Perl runs in. It is a very basic fact of shell architecture that a script, program, or binary cannot change the environment of the parent process it runs under. This is a basic principle, but one that many people do not know, or only vaguely comprehend. If your environment looks like:

prompt% env;

VARB1=1

VARB2=1

VARB3=1

and you write a script that looks like:

undef %ENV; # not necessarily the best thing to do!

which wipes out the environment (call the script env_killer.p), running it will NOT affect your current workspace. When you say:

prompt% env_killer.p # wipes out current environment

prompt% env;

VARB1=1

VARB2=1

VARB3=1

you get the same environment as before.

In other words, programs run in their own protected, safe environment. This is true cross platform, UNIX, DOS, or Windows NT. Pictorially, it looks like this:

/tmp/fig114.fig

Figure 11.4

conceptual drawing of the shell environment

The child is fenced in; it inherits all the environmental variables from its parent, but it is NOT allowed to change those variables.

In our case with LD_LIBRARY_PATH,

1 BEGIN

2 {

3 $ENV{LD_LIBRARY_PATH} = '1'

4 }

this looks like Figure 11.5

/tmp/fig115.fig

Figure 11.5

Caveats of running in the shell

In other words, this will not work, because by the time the program starts, the thing that glues Perl together with outside libraries (the dynamic linker) has already read LD_LIBRARY_PATH. Hence, setting it inside our own process doesn't do a thing.

So, what's a programmer to do? Well, in this case we have two choices. The first choice is to compile the libraries statically rather than dynamically. This involves taking all the necessary libraries, and remaking Perl so that they are internal to Perl.

This solves our problem, since we have eliminated the dependency on finding the shared libraries at run time, but it also makes a large executable and is sometimes technically tough. And often it is not an option at all, especially if the variable has nothing to do with LD_LIBRARY_PATH (or libraries), or if you are using a precompiled Perl such as ActiveWare's.

Another choice is to set the necessary variables and then somehow 'trick' the program into thinking these variables are its own. To see how, let's look again at our model of the environment Perl runs in:

/tmp/fig114.fig

Figure 11.4

conceptual drawing of the shell environment

This model has some limitations, but nothing in it says we can't muck around in the CHILD's own variable namespace. In particular, suppose we say something such as:

Listing 11.5 parent.p

1 #!/usr/local/bin/perl

2

3 undef %ENV;

4 system("printenv.p");

Listing 11.6 printenv.p

1 #!/usr/local/bin/perl

2

3 foreach $key (keys %ENV)

4 {

5 print "$key => $ENV{$key}\n";

6 }

When we run 'parent.p', this prints out nothing: printenv.p inherits its environment from its parent process (parent.p), NOT from the shell proper. Therefore, we can use a pretty cool trick to force our original program to use the correct linker path:

Listing 11.7 getlink.p

1 BEGIN

2 {

3 if ($ENV{'LD_LIBRARY_PATH'} eq '')

4 {

5 $ENV{LD_LIBRARY_PATH} = 'my_path';

6 exec($0);

7 }

8 }

Pictorially, this looks something like Figure 11.6

/tmp/fig116.fig

Figure 11.6

The Re-Exec Trick

See what this is doing? The first time around, it checks for a LD_LIBRARY_PATH variable. Since this variable is undefined, Perl goes ahead and defines it for us ($ENV{LD_LIBRARY_PATH} = 'my_path';) and then re-executes our program, keeping the modified environment intact. The new copy inherits $ENV{'LD_LIBRARY_PATH'} from its parent: the script itself!

The exec re-executes the script, starting it over with the new environment. And we don't go into an infinite loop: the re-executed script finds LD_LIBRARY_PATH set (inherited from its parent), so Perl skips the body of this BEGIN block and goes on to execute the rest of the code.

This is extremely useful, since it makes a Perl script totally self-contained. We can internalize the environment inside the scripts themselves, without resorting to wrappers, batch files, or other such inconveniences. When you can move your source code from one place to another seamlessly and silently, you will thank Perl for this power.

Using BEGIN for Debugging CGI

Finally, let's use BEGIN to get rid of that very annoying 'Internal Server Error':

An internal error occurred, please check the server configuration and try again.

The reason this occurs is that an error message such as:

Can't locate MyModule.pm in @INC at program_name.p

is not a viable HTML page, hence the internal-error complaint from the server. By taking advantage of the fact that BEGIN blocks run before the rest of the program, we can turn our errors into HTML pages!

We could do this by hand, but fortunately the CGI::Carp module provides a couple of really wonderful functions: carpout() and fatalsToBrowser(). Both will save you hours of tracking down bugs. Again, we do this with a BEGIN block. Insert the following code into any CGI program:

Listing 11.8 cgiDebug.p

1 BEGIN

2 {

3 use CGI::Carp qw (carpout fatalsToBrowser);

4 use FileHandle;

5 my $LOG = new FileHandle ( ">> /usr/local/lib/cgi/log/logname");

6 carpout($LOG);

7 }

and voila! Any time an error occurs in your program, from any source, internal or external, it will be logged to the file pointed to by the FileHandle object $LOG. Importing the function fatalsToBrowser redefines die, confess, carp, and so on. With these two statements you will trap both fatal errors (programming errors) and your own debugging output, getting displays like this for fatals:

pdg1107.gif

Figure 11.7

Fatals output via fatalsToBrowser

and, if you so desire, your own debugging output for debugging output:

pdg1108.gif

Figure 11.8

Debugging output via carpout

This is extremely powerful, and can cut your debugging time drastically when dealing with environmental problems such as the ones described above. You can test in the same environment you are going to release your programs to (i.e., the cgi-bin server), which, as any systems engineer can tell you, is about 90% of the battle in developing systems. Read the CGI::Carp documentation thoroughly; it details quite a few more useful tricks.

If you use carpout and fatalsToBrowser, your life will be made ten times easier, because the environmental errors you get when running scripts non-interactively will be displayed clearly for you to see. You can then fix them using the three solutions given above, classified by when Perl needs the environmental variable.

Summary of BEGIN/END and Flow Control

The BEGIN and END special blocks are provided by Perl to give the programmer more flexibility in deciding when things get done. BEGIN blocks are always compiled and run first, whereas END blocks wait until everything else in your main program has finished running before they kick in.

Flow control, the process of understanding each individual step Perl takes (so as to avoid coding mistakes), is very important for Web programmers and folks who write cron jobs (and it doesn't hurt to learn even if you do neither). Basically, Perl program execution has two steps: compile time and run time. Each step does several different things, but the main thing to remember is that the compile time step will not catch errors such as calls to undefined subroutines.

Eval

eval is a function of considerable significance, since it is used in Perl in such a myriad of ways. eval takes a string, turns it into a Perl program, and executes that program inside your running script.

Hence, if you say:

$line = 'print "Eval par-excellence!\n";';

eval ($line);

This will go ahead and execute the code in $line, printing out 'Eval par-excellence!'. eval is one of the really cool things that Perl inherited from the shell. Try doing that in a compiled language!

Principles of Using eval

The three main things to remember when you are using eval are:

1) eval goes through exactly the same rigmarole as Perl itself does when starting up: it checks the syntax of the code, compiles it, and runs it, everything short of starting a new process. Hence, there can be a considerable amount of overhead in using eval.

2) Although eval catches syntax errors and fatal errors (such as die), it does not terminate the main program when they occur. Instead, it traps the error text in a special variable called $@. Likewise, eval returns undef if the code did not execute correctly, or the value of its last expression if it did (1 in the example below, since that is what print returns). If you say:

my $status = eval('print "This Works because it is correct syntax\n";');

print $@;

Then $status is 1 because the eval worked, and $@ is ''. This policy lets you check whether an eval ran cleanly.
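Both outcomes are easy to inspect side by side:

```perl
our $worked = eval('print "this code is fine\n"; 1;');
our $clean_err = $@;                      # '' -- no error trapped

our $failed = eval('die "oops\n";');      # returns undef
our $fail_err = $@;                       # "oops\n" -- the trapped error

print "worked=$worked, failed=", defined($failed) ? $failed : "undef", "\n";
print "trapped: $fail_err";
```

The main program sails on past the die; only $@ and the return value record that anything went wrong.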

3) eval can (and does) use all of the variables it inherits from the main program. Furthermore, any variables you create inside the eval transfer to the 'outside', unless they are 'my' variables. For example, if you say:

my $a = 1;

eval('$b = $a;');

print "$b\n";

This will print '1'.

However,

my $a = 1;

eval ('my $b = $a;');

print "$b\n";

will print '', because $b is a 'my' variable whose scope ends with the eval.

Usage of eval

OK, so what can you do with eval? Several things, as it turns out. Let's go over some of them below.

Checking For Features on a System.

One of the really cool things eval gives you is the ability to check what system you are on, or what resources that system has.

Since eval can check whether code passes or fails, and then return a status to the main script, you can wrap calls in an eval and make decisions based on what the eval returns. For example:

eval ("require OLE") or $system = 'Unix';

is a pretty safe way to determine whether or not you are on an 'ActiveWare' (Windows) variant of Perl. After all, UNIX Perls don't have a built-in OLE module, so on UNIX the require fails, the eval returns false, and $system gets set to 'Unix'. (Note the low-precedence or: writing || here would be a syntax error, since || binds more tightly than assignment.) Likewise,

eval("getpriority(0,0); 1;") and $system = 'Unix';

will give you a pretty good idea of whether or not you are on a UNIX system, since getpriority is implemented there and the eval succeeds.
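A runnable version of the getpriority check follows; note that the eval must succeed, not fail, on a UNIX machine, and that on systems without getpriority the eval simply traps the error and the default stands:

```perl
our $system = 'probably not Unix';

# getpriority is a UNIX system call; if it runs, the eval returns 1
# and the low-precedence 'and' sets $system.
eval("getpriority(0,0); 1;") and $system = 'Unix';

print "This looks like a $system machine\n";
```

The same pattern works for any function or module whose presence varies from port to port.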

Using eval in this way, knowing that certain Perl functions are not universal, and then providing an alternative to those functions, is an extremely good way of bulletproofing your programs. We have devoted an entire chapter (Portability and Perl) to more ways of bulletproofing your processes.

Using Perl Syntax to Enhance User Interfaces:

One of the smallest yet most powerful Perl scripts out there is the 'rename' script written by Larry Wall. It looks something like this (I've rewritten it to be a little more verbose):

Listing 11.9 rename.p

1 my $operation = shift (@ARGV);

2 foreach $argument (@ARGV)

3 {

4 $_ = $argument;

5 eval ("$operation"); die "You must give a legal regular expression ($@)" if ($@);

6 $new_argument = $_;

7 if ($argument ne $new_argument)

8 {

9 rename($argument, $new_argument) || print "Couldn't rename $argument to $new_argument!\n";

10 }

11 }

What this script is doing is basically giving you the power to manipulate file names as if they were strings in Perl. If you say:

prompt% rename 's"\.bak$""' *.bak;

Then this script will take all the '.bak' files and get rid of the extension. Likewise,

prompt% rename 'tr"[A-Z]"[a-z]"' *

will rename all of your uppercase files to lowercase ones.

How does this work? The key to this script's power is line 5:

eval ("$operation"); # was $ARGV[0]

The foreach loop (2) goes through each file name and 'filters' that name through whatever code was passed as the first argument. Hence rename 's"\.bak$""' *.bak makes the eval statement:

eval('s"\.bak$"";');

which has the effect of chopping the '.bak' off the end of the input. The result of the filter (home.bak becomes home) is then compared with the original name. If the two differ, a rename is required, and the rename function is called (9).
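The filtering idea can be tried on plain strings, without touching any files; here the same s"\.bak$"" operation runs over a hard-coded list:

```perl
my $operation = 's"\.bak$"";';    # what the user would pass on the command line
our @filtered;

foreach my $argument ('home.bak', 'notes.txt', 'data.bak')
{
    $_ = $argument;
    eval($operation);             # filter the name through the user's code
    die "You must give a legal regular expression ($@)" if ($@);
    push @filtered, $_;
}

print "@filtered\n";    # home notes.txt data
```

Names matching the pattern lose their extension, and names that don't match pass through untouched, exactly as in the real rename script.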

Kind of cool, right? You can use similar tricks to give database access programs the power of Perl syntax. You could enhance grep to take a similar sort of syntax, such that:

grep 'm"[a-z]"' filelist

would find all files with lowercase letters in them.

You could make Perlish calculators or Perlish shells in which Perl input becomes the interface to the world, and people 'type' in Perl expressions as if they were commands. You could make intelligent 'config' files, which have Perl in them to drive other Perl scripts.

More information on what you can do with eval is in the perlfunc manpage.

Increasing Program Performance

Since eval code can be generated before it is executed, you can use Perl to write tons of code for you. This is done for speed rather than readability.

Consider the following problem. Suppose you have files of the form:

123123|Ron Cassidy|2552 Sycamore Street

144123|Thea Thompson|15 Orchard

155152|Helen Gaskell|544 Kentucky Court

This is a fairly common file format when dealing with databases. The file above is a flat file, and it consists of data that is going to be loaded into a database, and which is loaded by a 'bulk copy' command.

Now, for various reasons, sometimes doing a 'bulk copy' in this way is not the most advantageous. Instead, you may want to interact directly with the SQL engine and say something like:

insert into table message_text (123123, 'Ron Cassidy', '2552 Sycamore Street');

insert into table message_text (144123, 'Thea Thompson','15 Orchard');

insert into table message_text (155152, 'Helen Gaskell','544 Kentucky Court')

This also inserts the data into the database, but does it in a safer way. Those of you who are Database Administrators will see that this fires off triggers, hence preserving data integrity, whereas the bulk copy doesn't.

This example shows a pretty good use of eval. In the above, we want to treat characters and integers differently (and floats, and dates, etc.): we want to put single quotes (') around character fields, and no quotes around integers. Assume that $line holds a line of input from the file, and that @types holds the type of each field (integer or character); for the file above:

$types[0] equals 'integer'

$types[1] equals 'character'

$types[2] equals 'character'

Now, in order to generate these insert statements, we could say something like:

Listing 11.10 genInsertInefficient.p

1 my $line;

2 while ($line = <FD>)

3 {

4 my @rows = split(/\|/, $line);

5 my $xx;

6 for ($xx = 0; $xx < @rows; $xx++)

7 {

8 if ($types[$xx] eq 'character')

9 {

10 $rows[$xx] = "'$rows[$xx]'";

11 }

12 }

13 local($") = ",";

14 my $insertStatement = "insert into message_text values (@rows)\n";

15 print $insertStatement;

16 }

This is a little complicated, but the heart of the code is lines 6 through 12, where we check the type of each field and, if it is a character field, put quotes around it. The main thing to notice, though, is how inefficient this code is. For each row in the file that we are 'transmuting', we perform:

1) a complete loop over every field in that row,

2) a potential data copy, and

3) an 'if-then' test for each field.

This can translate into a lot of lost time. Going through each field, one at a time, to find out what 'type' it is, is redundant and unnecessary. We would be much better off knowing ahead of time what types the different fields are; that is, saying something like:

1 local($") = ",";

2 while ($line = <FD>)

3 {

4 my @rows = split(/\|/, $line);

5 $rows[1] = "'$rows[1]'";

6 $rows[2] = "'$rows[2]'";

7 print "insert into message_text values (@rows)\n";

8 }

where we know ahead of time that fields 1 and 2 are character fields. This way, we don't have to loop through the fields each time, finding out what we already know. The code as it stands, though, is low-level and repetitive: if we had a hundred different tables, we would have to write a hundred different versions of it by hand. Not a fun thing to do. Hence, we generate it on the fly.

This is a job for eval:

Listing 11.11 genInsert.p

1 #!/usr/local/bin/perl5

2 my $code =

3 'local($") = ",";

4 while ($line = <FD>)

5 {

6 my (@rows) = split(/\|/, $line);

7 ';

This generates the header; no matter what the table or the circumstances, this part will always be the same. Note that this is in fact one big variable assignment: the single quotes around lines 3 through 7 assign everything between them to the variable $code. As for what follows:

Listing 11.12 genInsert.p

8 my $xx;

9 for ($xx = 0; $xx < @types; $xx++)

10 {

11 if ($types[$xx] eq 'character')

12 {

13 $code .= '$rows[' . $xx . '] = "\'$rows[' . $xx . ']\'";' . "\n";

14 }

15 }

16 $code .= 'print "insert into message_text values (@rows)\n";' . "\n";

17 $code .= '}';

18 eval $code;

This example adds one line of code (line 13) for each character field found. By the time we reach the eval, Perl has created the specialized program for us: instead of checking the field types on every row of data, we make the Perl program itself the programmer.

Done right, this eliminates the inner loop over the fields, and the 'if then' tests that go with it, from the per-row work. When the files you traverse are megabytes upon megabytes in size, that can make the difference between a process taking a day and taking an hour.

This 'unrolling of the loop' type of programming is pretty common in Perl, especially when you are dealing with splitting files and manipulating them. If you think of places where Perl can write your code for you, you are in good shape and your productivity at certain tasks will skyrocket.
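Putting the pieces together, here is a hedged, self-contained sketch of the whole technique. It assumes the @types layout from the running example, collects the statements into @sql rather than printing as it goes, and reads from an in-memory filehandle instead of an already-open FD, purely so the example can run on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field types for the running example (an assumption for illustration).
my @types = ('integer', 'character', 'character');

# Sample input held in a string; a real script would open a file as FD.
my $data = "123123|Ron Cassidy|2552 Sycamore Street\n"
         . "144123|Thea Thompson|15 Orchard\n";
open(my $fd, '<', \$data) or die "cannot open in-memory file: $!";

my @sql;    # the generated code pushes finished statements here

# Fixed header: read a line and split it into fields.
my $code = 'local($") = ",";
while (my $line = <$fd>) {
    chomp $line;
    my @rows = split(/\|/, $line);
';

# Emit one quoting statement per character field; this loop and its
# if-test run once, at generation time, not once per data row.
for my $xx (0 .. $#types) {
    if ($types[$xx] eq 'character') {
        $code .= '    $rows[' . $xx . '] = "\'$rows[' . $xx . ']\'";' . "\n";
    }
}
$code .= '    push @sql, "insert into message_text values (@rows)";' . "\n}\n";

eval $code;
die $@ if $@;    # a mistake in the generated code surfaces here

print "$_\n" for @sql;
```

The generated $code is exactly the hand-specialized loop from the earlier snippet, with the quoting statements for fields 1 and 2 baked in.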

Summary of Eval

eval is Perl's way of turning strings into executable programs. Object-oriented types could think of it as 'the method that turns a string into an executable program and runs it'. Simply say

eval($code);

and voila! whatever is in the string $code will be treated like a Perl program.
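A minimal sketch of that in action, including the customary check of $@ afterward to see whether the string compiled and ran cleanly:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A string of Perl code, built or read from anywhere, handed to eval.
my $code   = 'my $sum = 0; $sum += $_ for (1 .. 10); $sum;';
my $result = eval $code;         # the string is compiled and run here
die "eval failed: $@" if $@;     # $@ holds any compile or runtime error
print "$result\n";               # prints 55 (eval returns the last expression)
```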

eval has quite a few uses and we took a look at three major categories:

1) trapping errors in Perl code and doing cross-platform development

2) sprucing up a user interface with the power of Perl's syntax

3) generating code optimized for speed, not readability.

'Any problem can be solved by another layer of indirection', or so the saying goes, and eval shows this quite nicely. You can get quite complicated with eval, and usually that complication is a small price compared with the power you gain. But you can overdo it. The worst I have done with eval is to write programs that generated programs, which in turn generated programs of their own. I don't suggest going that far (for your own sanity's sake), but the power to do so is there.

Summary of Chapter 11

This chapter contained Perl concepts not easily tucked away in a box. However, these concepts are not trivial. In fact, knowledge and usage of these "odds and ends" will enable you to be a powerful, graceful Perl programmer.
