
Chapter 15: Abstraction and Modular Programming in Perl

The previous chapter discussed the syntax of modules and libraries, along with examples of how to create a Perl project. In this chapter, we expand on these concepts, giving more examples of the thought patterns that go into making modules in Perl.

Making modules in Perl is the first step to scaling up code. Probably the best way to think about it is that modules are collections of functions, whereas objects are collections of functions and associated data.

This means that modules are easier to make than objects, because there are fewer technical details to worry about at the same time. Or, to put it another way, programming with modules is like juggling three balls at once; programming with objects is like juggling four or five, perhaps six balls at the same time. It is the purpose of this chapter to flesh out these concepts with examples.

If you are new to Perl, or are used to programming only "quick" scripts in Perl, then you are going to want to read this chapter.

We hate to beat it into the ground, but if you are going to:

build anything in Perl larger than approximately 1000 lines

build a set of related programs

actually have to maintain your code

not rely on 'job security through code obscurity'

build a foundation for object-oriented programming in Perl

then you will want to have an intimate understanding of modular programming and how to effectively use it in Perl. Even if you aren't going to be a large-scale OO programmer, learning modular programming will go a long way toward making you feel more comfortable with the language, as well as making you more efficient in it.

Modular Programming Concepts

Modular programming is based on a rather simple concept that, if followed regularly, gives you incredible power over your programs. There are challenges in becoming a good modular programmer, and they are not all technical.

In fact, the technical side of modular programming is rather simple. The 'simple concept' mentioned above involves creating subroutines and pigeonholing them into a common place, the package. Instead of having functions sitting at the bottom of scripts, they are abstracted out by putting them in a place where they can be found again. That's about it! Simple, isn't it?

You can think about this in terms of a 'client/server' relationship. We will use these terms a lot, and it is a good idea to know what they mean in an object oriented context:

A client is a program that uses bits and pieces of other programs.

A server is not a program in itself, but bits of functionality that a client would find very useful.

The client gets its requests filled by the server, whether those requests are to format a date, obtain a file, or what not. You can get more complicated than this, too; clients can be servers of other clients, something like:

client => server/client => server;

is a three-tiered client/server system. The first client hands its requests off to the first server, which in turn gets other requests filled by the second server.
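For instance, here is a minimal sketch of a three-tiered arrangement (the Report and Log module names are hypothetical). The script is a client of Report, and Report is in turn a client of Log:

# client.p -- the first client
use Report;
Report::summarize("sales.txt");      # the client hands its request to Report

# Report.pm -- a server to client.p, and itself a client of Log
package Report;
use Log;
sub summarize
{
    my ($file) = @_;
    Log::write("summarizing $file"); # Report gets its request filled by Log
}
1;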

Of course, the trick of actually programming in a modular way is to recognize which functions are reusable, or in other words, what can and cannot be abstracted. That is what the bulk of this chapter is about. But we would be remiss if, before we got started, we didn't bring up the costs of programming in a modular way.

The Costs of Modular Programming

As we introduced in chapter 13, an unfortunate number of programmers cling to the lower ends of the 'abstraction scale'. In other words, they never get above the point at which they are putting subroutines in their individual scripts. This is good for throwaway scripts, but horrid when trying to do something really big in Perl.

However, there are some reasons for clinging to the 'lower level' of abstraction, and never attempting to make anything larger than a subroutine. Some of these reasons are valid and some are misplaced. Let's look at some of them up front.

1) larger up-front time commitment

Modular programming is synonymous with design. In order to make a set of modules in which you can pigeonhole effective functions, there must be a design, and design takes time.

2) modular programming involves a level of 'indirection'

Indirection is best defined by example: it is simply a matter of "I'm calling this function here, but the function actually lives in a module, over there!". In a way, this is harder to debug, and you have to think a little bit more on how to debug it. See Chapter 21 for more information on debugging indirect code.

3) There is a psychological cost to modular programming

Face it, people generally like to make simple scripts in which everything is there, in front of them, and they don't need to look anywhere else to debug. There are many reasons for this: habit, wanting to get something working fast, the sheer joy of hackery. But realize that when you get something 'done quickly' by using shortcuts, there is a hidden cost to what you do.

4) Modular programming can limit flexibility

This is a difficult concept for non-programmers (especially non-technical managers!) to understand. When making modules, you are making, in essence, building blocks. Building blocks are great for creating things quickly, but if the particular block does not fit exactly, it can be difficult to make it fit. What often looks like a difficult problem may actually be very trivial, and what looks like an easy problem can be quite non-trivial. Good modules limit the number of problems that fall in the second category.

5) There are a lot of outside pressures that can prevent you from programming correctly

Since modular programming takes a larger up-front time commitment, let's not forget the pressures that come from programming for profit rather than programming for fun.

Each one of these points affects people to a different extent. As an example, I didn't get snagged by points 1, 2, and 3, but point 5 was a real kicker, since I am primarily programming for profit.

In fact, we have a name for this last point at my work. It is affectionately known as the "Scotty syndrome", named after the Star Trek character of the same name. This is the situation in which a Kirk-like manager comes in and says to the unfortunate Perl programmer: 'Please, if you don't get this done in the next hour... WE'RE ALL GONNA DIE!'

Since Perl sometimes lets you do very complicated things in a short space of time, you will be surprised how in demand Perl programmers become in the workplace.*

Alas, I digress. The main point is that if you aren't used to modular programming, and want to learn it, it is going to take resolve, determination, and drive, as well as working 'smarter rather than harder'. Believe me, if you aren't thinking in a 'modular state' right now, it is well worth it when you start doing ten times the work in half the time.

Do yourself a favor, and don't get stuck in the rut in which you are always making quick fixes to a problem. Consider the Scotty syndrome if you are in the position in which you are constantly on the critical path. If you always give way to the desire/orders to do that quick fix, then as sure as rain is wet, there will come a day when there will be a TRUE problem where SERIOUS fallout will occur. Then you won't be able to get it done, because it will not warrant a quick fix, but an overhaul to the entire process.

If you are in a situation like this, do not hesitate to tell the manager something such as: "Yes. I can do this (insert quick fix here). But this quick fix will probably cost us a lot more time in the future than if I (modularized/objectified/designed) the problem correctly now." In other words, be up front about such demands, and above all, actually learn how to fix the problems correctly.

I wish I had a dollar for every time those quick fixes ended up in production for years. And, by definition, since quick fixes are usually devoid of comments, everyone is afraid to get rid of, or fix, the program. Repeat after me: nothing is an emergency unless there is blood on the floor!

Converting a Procedural Script to a Modular One

I do code reviews at my work, with a red pen, a long, empty conference room, and lots of caffeine to tide me over until the code review is done. Code reviews are held for different purposes, but let's just assume that we are taking a bit of code that someone has written from scratch. We want to fit it into an already existing project by abstracting out what we can and putting it into modules. (We only consider modules here; in chapter 17, we shall see that some of these modules are better off as objects.)

By amazing coincidence, the script that we shall consider is the one that we discussed in chapter 13, the one that started this whole discussion on code reuse:

Listing 15.1 procedural_code.p

1 #!/usr/local/bin/perl -w
2
3 use strict;                   # makes it so we adhere to strict rules
4 open (FD, "> input_log");     # opens up the input log for
5 my (@output);                 # the input to follow. (@output is declared
6 if (grep (/-input/, @ARGV))   # out here so that line 27 can see it; this
7 {                             # grep looks for the input switch in @ARGV.)
8     my ($answer, $amount);
9     do
10     {
11         undef (@output);
12         print "What do you want for Christmas?\n";
13         chop($answer = <STDIN>);
14         push(@output, $answer);
15
16         print "How much will it cost?\n";
17         chop($amount = <STDIN>);
18         if ($amount > 100.00)
19         {
20             print "That's way too expensive\n";
21         }
22         push(@output, $amount);
23
24     } while ($amount > 100.00);
25 }
26
27 print FD "@output\n";
28 close(FD);

What can we make of this?

Well, any good program design consists of two steps. First, figure out what makes sense to abstract. Second, figure out how to implement the abstractions. If I had my red pen, I'd mark up this script to resemble Figure 15.1.

Figure 15.1: Marked up procedural code

As you can see from Figure 15.1, three things come to mind from the example:

1) log files being accessed directly

2) standard input being accessed directly

3) options from the command line being accessed directly.

Each of these represents a class of items that Perl programmers consistently do not abstract (which, as we have said earlier, is a good or bad thing depending on your intentions). Hence, we will look at them in isolation and in detail. Let's take the first step, and consider how we wish to abstract each one of these items.

Point #1: Abstracting the Log File into a Module

There were two lines in the code that dealt with logging:

4 open (FD, "> input_log"); # opens up the input log for

27 print FD "@output\n";

Now, this might not be a problem in itself - the script might be a one-liner - but we have a problem if this were part of a bunch of Perl scripts.

This is a problem if we want to do something a little more sophisticated than simply print out the input as we do in line #27. Say we want to print out a timestamp, or to log which program is using the log file, or to keep a running tally of gifts:

Oct 24 1996 13:45:02 Stretch Armstrong 14.95

Nov 14 1996 14:05:02 Hunchback of Notre Dame 15.95

Or perhaps we want to load this data into a database and make the elements in it pipe delimited:

Oct 24 1996 13:45:02|Stretch Armstrong|14.95

Nov 14 1996 14:05:02|Hunchback of Notre Dame|15.95

In each case, if we don't abstract it, and have more than one program that does similar 'logging', then we will have to change every single occurrence in every single program that uses this logging. Therefore, it does make sense to abstract it out.

As we shall see in Chapter 17, this is a good candidate for an object. (See that chapter for the rationale.) But for now, let's make this a module. Modules, after all, are easier to do, both technically and conceptually. Finally, (this is important!) making it a module does not preclude making it an object later. Later, we will convert this module into an object to show how easy (if tedious) it is to do.

Let's start abstracting the above code by making three subroutines at the beginning of the program. We'll call them logOpen, logClose, and logWrite. Let's also make the usage of our 'log' functions as simple as possible. In other words, the file handle and the name of the file are now associated with the functions that access them; calls to our logOpen(), logClose(), and logWrite() functions need not mention the file handle at all. The code will look something like this:

Listing 15.2 procedural_code_mod.p

1 #!/usr/local/bin/perl -w
2 use strict;
3 use FileHandle;                 # needed for 'new FileHandle'
4 my $fh;
5 sub logOpen
6 {
7     my ($log) = @_;
8     $fh = new FileHandle("> $log");
9 }
10 sub logClose
11 {
12     close($fh);
13 }
14 sub logWrite
15 {
16     my (@output) = @_;
17     print $fh "@output\n";
18 }
19
20 logOpen('log_file');            # open(FD, ... ) for the input to follow.
21 my ($answer, $amount, @output); # declared here so that line 40 can see @output
22 if (grep (/-input/, @ARGV))     # looks for the input switch in the
23 {                               # arguments on the command line.
24     do
25     {
26         undef @output;          # clear any mistakes.
27         print "What do you want for Christmas?\n";
28         chop($answer = <STDIN>);
29         push(@output, $answer);
30
31         print "How much will it cost?\n";
32         chop($amount = <STDIN>);
33         if ($amount > 100.00)
34         {
35             print "That's way too expensive\n";
36         }
37         push(@output, $amount);
38     } while ($amount > 100.00);
39 }
40 logWrite(@output);
41 logClose();

This program is exactly equivalent to the previous one. The logging code at the top (lines 3 through 18: the $fh variable and the three log subroutines) is what we are going to move to a package, but it makes sense to keep it here for ease of testing.

After we are done testing, we then proceed to abstract the three functions into their own package. As before, we do this in the same file for ease of testing:

Listing 15.3 procedural_code_mod2.p

1 #!/usr/local/bin/perl -w
2 package Log;
3 use FileHandle;            # needed for 'new FileHandle'
4 my $fh;
5 sub open
6 {
7     my ($log) = @_;
8     $fh = new FileHandle("> $log");
9 }
10 sub close
11 {
12     close($fh);
13 }
14 sub write
15 {
16     my (@output) = @_;
17     print $fh "@output\n";
18 }
19 1;
20 package main;

21 Log::open("log_file");
# ..... all the other stuff.
# .....
40 Log::write(@output);
41 Log::close();

So, logOpen becomes Log::open, logClose becomes Log::close, and logWrite becomes Log::write. The Log:: prefix shows that they are related, in the same module.

In this case, our client is the program that actually does the asking about Christmas gifts. The server is our log module, which will be used by the Christmas gift program.

Cut the package out of the client; that is, remove everything from 'package Log;' through '1;'. Put this code in its own place, say a directory denoted by an environment variable, MYLIB. (We saw how to create MYLIB in the last chapter.) The client becomes:

Listing 15.4 procedural_code_mod3.p

1 #!/usr/local/bin/perl
2 use lib "$ENV{MYLIB}";
3 use Log;
4
5 Log::open("log_file");
6 # ..... all the other stuff.
7 # .....
8 Log::write(@output);
9 Log::close();

Then make a file named Log.pm:

Listing 15.5 Log.pm

1 package Log;
2 use FileHandle;            # needed for 'new FileHandle'
3 my $fh;
4 sub open
5 {
6     my ($log) = @_;
7     $fh = new FileHandle(">> $log");
8 }
9 sub close
10 {
11     close($fh);
12 }
13 sub write
14 {
15     my (@output) = @_;
16     print $fh "@output\n";
17 }
18 1;

Of course, there are several ways to stick Log.pm in its own central place. Using the directive:

use lib "/directory/path"

is the easiest. It simply says to Perl 'OK, the first place you are going to look for a module is inside "/directory/path", before you look in any of the centralized places.'
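If you are curious what 'use lib' actually does, it is roughly equivalent to prepending the directory to @INC, Perl's built-in array of directories to search for modules. A sketch of the same effect, done by hand:

BEGIN
{
    unshift(@INC, "/directory/path");  # look here first, before the
}                                      # centralized places already in @INC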

Summary of Abstracting Out Input Into a Log

This is a simple example of modularizing a program. We recognized that a few of the functions - open(FD,..), print FD, and close(FD) - were doing work that was too low-level for the problems at hand. If we wanted to change our interface, for example, we would have had problems. We may also wish to reuse the concept of a log file, and would hate to throw away all that good code.

We also realized that it would probably be better if this logfile was in fact an object rather than a module and that in the future we would probably be better off converting it to an object.

So, we took five steps to modularize the routines:

1) realization of the problem, and that the three functions were better off in a module/object

2) design of the new subroutines

3) prototyping, making each of the functions into a subroutine that abstracted out the low-level data (filehandles, etc.)

4) making the subroutines into a package, which existed in the same program where the package was used

5) and finally, making a module in a centralized place ($ENV{MYLIB}) which will contain all of the modules that we are going to use. We then pointed the client to that module via use lib.

Now that we have this flexible code that can be reused in other programs, we might add onto this with 'ideas for the future': thinking of how to expand our log module, by adding features like the timestamp that we talked about.

This is one of the fun things about programming in modules and objects. One almost never comes up with a perfect module or object. There are always ways to improve/add to your design, and since you are putting things in a centralized place, you are in essence solving a thousand scenarios at once!
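For example, a timestamping Log::write might look something like this sketch (a drop-in replacement for the write subroutine in the Log.pm above):

sub write
{
    my (@output) = @_;
    my $time = localtime();       # e.g., 'Thu Oct 24 13:45:02 1996'
    print $fh "$time @output\n";  # every entry now carries a timestamp
}

Every client that calls Log::write then gets timestamped logs for free, without a single change to the clients themselves.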

Point #2: Abstracting out Calls to Standard Input

Let's use this same line of thinking for the second problem on our list, standard input. Again, we have two questions being asked of the user inside the loop:

1 do
2 {
3     undef @output;        # clear any mistakes.
4     print "What do you want for Christmas?\n";
5     chop($answer = <STDIN>);
6     push(@output, $answer);
7
8     print "How much will it cost?\n";
9     chop($amount = <STDIN>);
10     if ($amount > 100.00)
11     {
12         print "That's way too expensive\n";
13     }
14 } while ($amount > 100.00);

How can this code be abstracted out of the main code?

Any given interaction between the user and the program consists of three things:

1) a question being asked

2) a response being given for that question

3) a validation for that question that tells the computer if the answer is appropriate or not

Therefore, this looks like the perfect case for a 'killer' subroutine (i.e.: one that kills a thousand different problems at once), or a couple of killer subroutines that abstract out all of the input problems above. Let's call this package InputHandler.

Now, unlike our log module, I believe that the InputHandler package should ultimately stay a module, not become an object. In this case, we are doing a straightforward action: namely, getting input from the user. Since modules consist of only functions, it makes sense not to introduce the extra complexity that an object would bring.

This is also the perfect place to create a low-level subroutine, one which is useful in a wide variety of situations, i.e. the 'killer'.

Let's call our low-level subroutine getInput and place it inside the module InputHandler, so there is no name conflict with Perl. There will be only one subroutine in this module. In accordance with the requirements of the program, the subroutine will take two arguments:

1) the question that is going to be asked

2) an optional function pointer that will be used to validate whether or not the function returned an appropriate value

The usage of the getInput function looks something like this:

1 push (@output, InputHandler::getInput('What do you want for Christmas?'));

which will do the exact equivalent of:

1 print "What do you want for Christmas?\n";
2 chop($answer = <STDIN>);
3 push(@output, $answer);

and

1 $answer = InputHandler::getInput('How much does it cost?', \&validation);

which will do the same thing as:

1 chop($amount = <STDIN>);
2 if ($amount > 100.00)
3 {
4     print "That's way too expensive\n";
5 }

except that the 'cost' validation is abstracted out into a subroutine, called validation, which the user controls.
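For concreteness, such a validation routine might look like the following sketch (the final client later in this chapter uses an equivalent routine named tooMuch):

sub validation
{
    my ($amount) = @_;
    if ($amount > 100.00)
    {
        print "That's way too expensive\n";
        return("NO_GOOD");   # flag the answer as unacceptable
    }
    return($amount);         # otherwise, pass the answer through unchanged
}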

Here is the getInput subroutine in its entirety:

Listing 15.6 InputHandler.pm

1 package InputHandler;
2 sub getInput
3 {
4     my ($question, $validFunction) = @_;
5
6     die "improper usage!\n"
7         if (defined($validFunction) && (ref($validFunction) ne "CODE"));
8                                        # simple type checking. The callback is
9                                        # optional; if given, it must be a
10                                       # function pointer.
11     my $answer;
12
13     print "$question\n";               # print out the question
14     chop($answer = <STDIN>);           # get the answer via standard input.
15
16     $answer = &$validFunction($answer) # try to validate the input and assign
17         if (defined($validFunction));  # the result to the return value.
18                                        # Example of a function pointer.
19     return($answer);                   # return the validated input.
20 }
21 1;

We then abstract it, put it in its own module, and our client becomes:

Listing 15.7 procedural_code_mod4.p

1 #!/usr/local/bin/perl5
2 use lib $ENV{MYLIB};
3 use Getopt::Long;
4 use InputHandler;
5 use Log;
6
7 my $options = {};
8 GetOptions($options, '--input');
9
10 Log::open('log_file');          # open(FD, ... ) for the input to follow.
11 my ($answer, $amount);          # declared out here so that lines 21 and 24
12 if ($options->{input})          # can see them; this looks for the input
13 {                               # switch from the command line.
14     do
15     {
16         ($answer, $amount) =
17         (
18             InputHandler::getInput("What do you want for Christmas?"),
19             InputHandler::getInput("How much will it cost?", \&tooMuch)
20         );
21     } while ($amount eq 'NO_GOOD');
22 }
23
24 Log::write($answer, $amount);
25 Log::close();
26
27 sub tooMuch
28 {
29     my ($amount) = @_;
30     if ($amount > 100.00)
31     {
32         print "That's way too expensive\n";
33         return("NO_GOOD");
34     }
35     return($amount);
36 }

See how this is working? We are molding the internal loop, abstracting it so it contains only the main points of the action. Before the abstraction, we had bare filehandles and direct access to standard input.

We have also gotten ahead of ourselves and added command line processing via GetOptions. We did this because it is a trivial usage of GetOptions; if you want more information on option processing, see the next section.


Notice that you do not need to abstract everything out of a given operation in order for a module to be useful. Take the case of the function pointer as used by getInput. From the point of view of InputHandler::getInput, this function is a black box. InputHandler::getInput calls it, not caring what this function is, and then uses the results of that call to filter the input that the user has given at the keyboard.

This function pointer is also called a callback. Callbacks are used to supply a missing piece of functionality to the module, a piece of information that the module cannot POSSIBLY know. (For more information on callbacks, see Chapter 11.)

For example, the InputHandler module cannot know whether a value passed to it is good or bad. InputHandler::getInput doesn't know whether the string we pass it is asking about Christmas gifts or steel girders, and it shouldn't know or care what function we use to validate the data. These details should be abstracted away from the module.

We simply say that the user (not the Module itself) is going to define a function - a function that InputHandler::getInput simply calls without knowing what it is. Let's look at that code again:

Listing 15.6 InputHandler.pm (repeated)

1 package InputHandler;
2 sub getInput
3 {
4     my ($question, $validFunction) = @_;
5
6     die "improper usage!\n"
7         if (defined($validFunction) && (ref($validFunction) ne "CODE"));
8                                        # simple type checking. The callback is
9                                        # optional; if given, it must be a
10                                       # function pointer.
11     my $answer;
12
13     print "$question\n";               # print out the question
14     chop($answer = <STDIN>);           # get the answer via standard input.
15
16     $answer = &$validFunction($answer) # try to validate the input and assign
17         if (defined($validFunction));  # the result to the return value.
18                                        # Example of a function pointer.
19     return($answer);                   # return the validated input.
20 }
21 1;

The line:

16 $answer = &$validFunction($answer) if (defined($validFunction));

calls the function pointed to by $validFunction, passing the value $answer which we get from standard input:

chop($answer = <STDIN>);

Here are some examples of how useful this is in practice:

$integer = InputHandler::getInput('give me a float..', \&roundUp);

sub roundUp { my ($float) = @_; $float += .999999999; return(int($float)); }

This gets a float from the user, and rounds it up to the nearest greater integer. And:

do
{
} while (
    $number = InputHandler::getInput
    (
        'give me a number from 1 thru 10..',
        \&rejectdigit
    )
);

sub rejectdigit
{
    my ($input) = @_;
    return('') if (($input > 0) && ($input < 11));
    return($input);
}

keeps trying to get a number from the user for as long as the user types something that is not a number from 1 through 10 (a string, for instance). \&rejectdigit is called to validate the input, returning '' (false) if the input is between 1 and 10, which ends the loop. Otherwise, it returns the value that the user typed, and the loop continues.

This is enough for now about callbacks. Perceptive readers may recall that we first covered them in Chapter 11, and we shall see them again. Just be aware that they are there, and remember that they are used in different ways.

So, now let's think to the future about how we could expand InputHandler::getInput. One of the most obvious ways is that we could add an option for 'shadowing' (i.e.: not displaying the text to the screen) for when the user is entering a password. Another might be to make it so that getInput's questions and answers could automatically be mirrored into a log file, recording what the user typed.
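Here is a sketch of the 'shadowing' idea, assuming a UNIX-like system where the stty command is available to turn keyboard echoing off and on (a hypothetical addition to InputHandler, not part of the module above):

sub getPassword
{
    my ($question) = @_;
    my $answer;
    print "$question\n";
    system("stty -echo");        # stop echoing keystrokes to the screen
    chop($answer = <STDIN>);
    system("stty echo");         # restore echoing
    print "\n";
    return($answer);
}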

Summary of Abstracting Out Standard Input

The process of creating this Module was very similar to the previous one. We recognized the problem and figured out what we were going to abstract. We then abstracted the standard input problem, this time into a common module called InputHandler.

This time, however, we added a wrinkle: modularizing a client while STILL relying on the script to do some of the processing. In this example, we did this by using a function pointer termed a callback.

Callbacks are functions that let one abstract a problem out, but not abstract it out completely. In this case we used callbacks to make a powerful 'getInput' routine, which could be used to get any standard input we want.

Abstracting Out Argument Processing: an Example of Getopts in Detail

There is one more point that we need to address in this example: argument processing. The array to manipulate is @ARGV, which holds the script's command line arguments. Argument processing is the teething ring upon which every Perl programmer learns Perl libraries. There are HUNDREDS of option processing libraries out there for Perl.

Note that there is a very good option processing library which is shipped with the standard distribution. It is called Getopt, and it has two flavors:

Getopt::Long;

Getopt::Std;

You can therefore consider this example to be one of successful reuse. As we shall see, this is probably one wheel that you don't want to reinvent.

Getopt::Long Example

Consider a common situation in which there is a script that takes five different command line directives:

1) '-name' takes a non-optional string argument

2) '-age' takes a non-optional integer argument

3) '-occupation' takes an optional string argument

4) '-married' is a switch (i.e.: it is either there or not).

5) '-police_record' takes an array of strings as an argument

The following example uses Getopt to stuff all of the options from the command line into the hash reference called $options, and takes them out of the @ARGV array in the process:

use Getopt::Long;

my $options = {};

GetOptions($options, "--name=s", "--age=i", "--occupation:s",
           "--married", '--police_record:s@');

$options here is an example of passing a reference to a subroutine, to be filled by that subroutine. Using this code with the following input:

prompt% script.p -name 'Ed Peschko' -age '28' -occupation 'Data Migrant' -police_record 'jaywalking' -police_record 'snorkeling without a license' other_arg1 other_arg2

ends up with a hash reference $options which contains:

{ name          => 'Ed Peschko',
  age           => 28,
  occupation    => 'Data Migrant',
  police_record => ['jaywalking', 'snorkeling without a license']
}

And @ARGV becomes:

( 'other_arg1', 'other_arg2')

because these arguments cannot be processed by Getopt::Long. Hence, they are left in the surrounding array.

Or you could say:

prompt% script.p -name 'Ed Peschko' -occupation -age '28'

and get

{ name => 'Ed Peschko', occupation => '', age => 28 }

Getopt::Long Advantages

The Getopt module is pretty complicated, but very useful. Be sure to read the documentation that comes along with it to get a flavor of its complexity. I pretty much always use the option for stuffing everything into a hash reference because:

1) all of the command processing is on one line

2) all of the options go into one hash reference, which can be passed to subroutines

Of the hundreds of command line option modules out there for your use, Getopt::Long is, in our opinion, the best. It has three inherent advantages:

1) It has lots of functionality, and is very resilient. Getopt::Long presents a consistent interface, one that is forgiving of users: switching the order of the options on the command line has no effect on how they are processed.

2) It is distributed with Perl itself. Hence, you are guaranteed that other people will be able to use your module.

3) It is the standard that has been tested for years. Pretty much every program out there uses Getopt::Long.

Keep in mind that nothing is perfect, and Getopt::Long is no exception. We discuss its warts next.

Getopt::Long Defaults, Good and Bad.

In order to use Getopt::Long effectively, we feel it necessary to make you aware that there are two package variables that need to be set in order to make things work correctly. These package variables are already set in the package, but I dislike the defaults. Here is one:

$Getopt::Long::ignorecase.

ignorecase is a package variable that controls the case-sensitivity of the command line options. Do yourself a favor and set this to 0. By default it is 1, which means that -U and -u are the same thing, which to many programmers is just plain wrong.

Here is the other package variable that needs to be changed:

$Getopt::Long::passthrough.

This package variable controls whether or not options are passed through or ignored if they cannot be understood by Getopt::Long. "Passed through" means that they will remain in @ARGV to be processed later. Without the passthrough option, they will be 'eaten' by Getopt::Long.

This 'passthrough' is a new option, and it is essential for making a class of Perl programs called 'wrappers' around other commands, which we talk about next.

Getopt::Long and Perl Wrappers

The statement '$Getopt::Long::passthrough = 1' facilitates the creation of wrappers around already existing commands. Wrappers inherit the functionality of a given command without having to rewrite that functionality. If you set passthrough to true, and call the script 'script.p' like so:

prompt% script.p -option_not_defined 'a' -defined_option 'b'

in which the command line is passed to

use Getopt::Long;
GetOptions($options, '--defined_option:s');

then this will leave the undefined option alone. @ARGV will become ('-option_not_defined', 'a') and $options will contain "{ 'defined_option' => 'b' }".

Example of Passthrough: Database Interface

Here is an example of where passthrough is so useful. All major databases come with a command line interface in which you can type commands directly to the underlying database engine. For example, if you have used Sybase before, then you are familiar with the infamous 'isql' command, which is Sybase's command line interface to its database.

In Sybase, you can say something like:

isql -U user -P password -S server -i SQL_file.

This will take the SQL in 'SQL_file' and apply it (-i) to the server given by '-S'.

However, this is USUALLY no good from the command line, since we want to apply our SQL file to a particular database, not a server. Hence, we could add a '-D' argument which would take the SQL_file and directly apply it to a database. Therefore, we could do something like:

Listing 15.8 Command wrapper example: isql.p

1 #!/usr/bin/perl
2 use Getopt::Long;
3 use FileHandle;
4 $Getopt::Long::passthrough = 1;
5 $Getopt::Long::ignorecase = 0;
6 my $options = {};
7 GetOptions($options, '--D:s', '--i:s');
8 my $tempfile = $options->{i};         # default: run the SQL file as given
9 if ($options->{D})
10 {
11     $tempfile = _addDB($options);    # temp copy with a 'use database' header
12 }
13 system("isql @ARGV -i $tempfile");    # we pass through the arguments.
14 unlink($tempfile) if ($options->{D}); # remove only the temporary copy

The meat of the statement occurs in line 13.

If a user says

isql -U user -P password -S server -D database -i SQL_file.

then the arguments '-D database' and '-i SQL_file' are picked up by Getopt::Long. '-U user', '-P password' and '-S server' pass through, i.e. they stay in the argument list. Hence

system("isql @ARGV -i $tempfile");

becomes

system("isql -U user -P password -S server -i <temp_file>,

This is a vanilla, every day isql command, with a difference. What's the trick? Well, we shall make 'temp_file' a direct copy of the input file given on the command line, only we shall add a header telling where the SQL is going to be executed. If our SQL looked something like:

select * from account
go

then, inside temp_file, the code will look something like:

use <database_name>
go
select * from account
go

Usually this editing would be done by hand, i.e. we would insert 'use <database_name> go' into the file before processing. In this case, the editing is done by Perl!

This is especially helpful if you have a large, sprawling database model with tables inside different servers, and different databases.

Here is the implementation. Again, note the technique of undefining $/ and slurping an entire file into a string via '$line = <$fh>;':

Listing 15.8 isql.p continued

1 sub _addDB
2 {
3     my ($options) = @_;
4     my ($database, $sqlfile) = ($options->{D}, $options->{i});
5     local($/) = undef;   # lets us suck all the data down in one fell swoop
6     my $fh = new FileHandle("$sqlfile") ||
7         die "Couldn't open file $sqlfile\n";
8     my $fh2 = new FileHandle("> $$.SQL");
9     my $line = <$fh>;    # sucks in the whole file
10     print $fh2 "use $database\ngo\n";
11     print $fh2 $line;
12     return("$$.SQL");    # hand the temp file's name back to the caller
13 }

Note how easily this translates into Perl code. We have a program, isql, that does not exactly fit our needs. We want isql to take a database parameter, and it doesn't. Hence we ADD that functionality 'on top', as it were.

Your cup of tea may not be database administration, but you can use this technique quite successfully no matter what you do. For example, suppose you want to log any 'del' commands that you type:

Listing 15.9 dellog.p

1 #!/usr/local/bin/perl5
2 use lib "$ENV{MYLIB}";     # so that Perl can find our Log module
3 use Log;
4 use Cwd;                   # for getcwd()
5 Log::open("del_log");
6 my $path = getcwd();
7 Log::write("$path; del @ARGV");
8 system("del @ARGV");

Now, if you convert this into a batch file (dellog.bat) using pl2bat.bat (see chapter 3), and you put it in your path, each time you type

C:\> dellog

you will store all of the 'del' commands that you type. (Not much use in Windows, one might argue, since they wrap everything up into the file Explorer, but for larger projects, such a 'trail of breadcrumbs' is essential to figure out what actually happened. You must weigh the convenience that the 'Recycle Bin' gives you against the lack of traceability.)

Advanced Argument Processing

You can do a lot with simple wrappers like the above example. But there are some cases in which you are going to want to get more fancy than GetOptions permits. Say, for example, you have a bunch of scripts that all use the same options:

prompt% record.p -name 'Ed Peschko' -age '28' -occupation 'Data Migrant' -police_record 'jaywalking' -police_record 'snorkeling without a license' other_arg1 other_arg2

prompt% get.p -name 'Ed Peschko' -age '28' -occupation 'Data Migrant' -police_record 'jaywalking' -police_record 'snorkeling without a license' other_arg1 other_arg2

where record.p records the information given at the command line into a file or database, and get.p simply gets the information back out of that file or database.

In this case, it makes sense to reuse the functionality of the command line in both scripts. Say that we want to have a common usage for both scripts. If you want to do this, then simply reuse that functionality by placing it in a module! Define a module CommonOptions, which then uses a specific set of behaviors from the Getopt module. Do something like this:

Listing 15.10 CommonOptions.pm

1 package CommonOptions;          # in file CommonOptions.pm
2 use Getopt::Long;               # using the module Getopt::Long
3 use strict;                     # we want strict scoping rules
4
5 my $options = {};
6
7 sub myopts
8 {
9     my (@extraOptions) = @_;
10
11     GetOptions($options, "--name=s", "--age=i", "--occupation:s",
12                "--married", "--police_record:s@", @extraOptions);
13
14     return($options);
15 }
16 1;                              # end of package CommonOptions.pm

This puts the information in a centralized place. It says that we will have the options '-name','-age','-occupation','-married', and '-police_record' as standard options.

Any time we want a client to get these standard options, all we have to say is:

Listing 15.11 comm_opt_client.p

1 #!/usr/local/bin/perl
2
3 use lib $ENV{MYLIB};
4 use CommonOptions;
5
6 my $options = CommonOptions::myopts();
7 # ......
8 # rest of script comm_opt_client.p

Line 6 then populates the hash reference $options with our standard options ('-age', '-name', etc.) such that if we said:

prompt% comm_opt_client.p -age 15 -name Ed

then $options will become:

$options = { 'age' => 15, 'name' => 'Ed' };

and

print $options->{'name'};

would print 'Ed'.

Now suppose that we want to have all of the standard options for our script, plus an extra option, one that could take an array of elements. Then, all we do is pass the CommonOptions::myopts() function a parameter ('--extraComments:s@'):

Listing 15.12 comm_opt_client2.p

#!/usr/local/bin/perl
use lib $ENV{MYLIB};
use CommonOptions;

my $options = CommonOptions::myopts('--extraComments:s@');
# ......
# rest of script comm_opt_client2.p

Both of these commands will inherit, by default, the name, age, occupation, married, and police record options. In the second case, we added '-extraComments:s@' which will let us turn the command line:

prompt% comm_opt_client2.p -name Paul -age 31 -extraComments comment1 -extraComments comment2

into the data structure:

$options = { 'name' => 'Paul', 'age' => 31,
             'extraComments' => ['comment1', 'comment2'] };

This second script is a simple example of what is termed delegation. Here is a small definition:

Definition: Delegation is a programming technique in which a module or an object gets an argument or function, and if it cannot handle this argument/function, it passes it on to a module which can.

The function CommonOptions::myopts does not itself know how to handle the options in @extraOptions, so it passes them on to a module that can handle them, namely Getopt::Long. This merges both the common options ('-name', '-age', '-occupation', '-married', '-police_record') and the non-common options ('-extraComments') together.

We won't talk much about delegation in this book, but the postscript to this book contains a couple of good resources on where to find more information about it.

Now having made a 'Common Options' module, we can do lots of cool things with it. If we say:

use CommonOptions;

my $options = CommonOptions::myopts();

at the beginning of every script we write, then we have a way to make each script do a 'common action'. Here we overload it to keep a log of what commands have been typed, and at what time:

Listing 15.13 CommonOptions.pm, revised

1 package CommonOptions;
2 use Config;
3 use Getopt::Long;
4 use Log;
5 use strict;
6
7 my $logDir = "$ENV{LOG_DIRECTORY}";  # we make one log directory where we will
8                                      # keep all of our logs. Could actually be
9                                      # put in the Log module.
10
11 my $command = ($Config{'osname'} =~ m"Win32") ?
12                   substr($0, rindex($0, "\\") + 1) :
13                   substr($0, rindex($0, "/") + 1);
14                  # another hack that really should be abstracted into a
15                  # module, if only to insure portability. It simply gets
16                  # the name of the script being run:
17                  # '/export/home/epeschko/a.p' becomes 'a.p'.
18
19 my $options = {};
20
21 sub myopts
22 {
23     my (@extraOpts) = @_;
24
25     _logOptions();               # 'private' function, written below
26
27     GetOptions($options, "--name=s", "--age=i", "--occupation:s",
28                "--married", "--police_record:s@", @extraOpts);
29     return($options);
30 }
31
32 sub _logOptions
33 {
34     Log::open("$logDir/$command.log");
35     my $time = localtime();      # gets 'localtime', i.e. the current time
36     Log::write("$time: $0 @ARGV");
37                                  # writes all of the options out to the
38                                  # logfile, prefaced by the time. This could
39                                  # be made a default behaviour of the Log module.
40     Log::close();
41 }
42 1;

Now if you get in the habit of saying:

use CommonOptions;

at the beginning of each script, you will not only get the common options (that you don't have to retype) but you will also get a log, per command, of every single time someone typed that command, which version of the command they typed, and which arguments they used. The center of the log here is line 36, where we write to the file '$command.log'.

You also get this interface project-wide, which means you can do wondrous things with your interfaces. Here are a few examples of what you can do by using the concept of a common environment.

Q: Need all of your scripts to have a common, GUI interface through CGI scripting?

A: Simply overload the myopts function to recognize that it is being called through CGI, and then construct a GUI screen to handle this.

Q: Need all of your scripts to have a mode where they can read from a configuration file, rather than from the command line?

A: Simply overload the myopts function to take a special argument, config, which then recognizes that all of the arguments are going to be coming from a config file instead of from the command line.

Q: Need all of your scripts to have a default set of options? (Say we want to assume everybody has an age of 30 instead of being told otherwise.)

A: Simply overload the myopts function to default to the age of 30, and then use that default, unless given the option on the command line.
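As a sketch of that last answer, myopts could fill in the default after calling GetOptions (assuming the CommonOptions module given above):

sub myopts
{
    my (@extraOptions) = @_;
    GetOptions($options, "--name=s", "--age=i", "--occupation:s",
               "--married", "--police_record:s@", @extraOptions);
    $options->{age} = 30 if (!defined($options->{age}));  # the default age
    return($options);
}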

All of this is really cool, and comes from the paradigm shift of looking at changes to programs as if they occurred in 'bins'. Once you get to looking at things in a modular and/or object-oriented way, changes to your programs will become easy to manage. Change will become a process of deciding which bin you are going to throw the changes into. In this example, it so happens that a very convenient bin to throw stuff into is the configuration bin, since at that point every single program will inherit the stuff that you gave to it.

Summary of Abstracting Command Line Processing

In this section, we introduced the reuse of code. There are hundreds of command line interfaces that Perl programmers have created. We chose the standard, Getopt, to discuss.

The Module Getopt::Long does a wonderful job of abstracting out command line interfaces. It consists of a generic interface which is extremely flexible. We went through a simple example of its use, and also some caveats to consider.

Finally, we thought about how we could further abstract out the command module in order to put our scripts onto the web, do common setup routines, and accomplish a lot of other wide-reaching things.

Example Conclusion

This has been a rather long and arduous example, but let's see where it has taken us. Here is the script again, after our 'code review':

Listing 15.14 procedural_code_mod4.p, final version

1 #!/usr/local/bin/perl5
2 use lib $ENV{MYLIB};
3 use CommonOptions;
4 use InputHandler;
5 use Log;
6
7 my ($answer, $amount);          # declared out here so lines 21 and 24 see them
8 my $options = CommonOptions::myopts('--input');
9
10 Log::open('log_file');          # open(FD, ... ) for the input to follow.
11
12 if ($options->{input})          # looks for the input switch in the
13 {                               # arguments on the command line.
14     do
15     {
16         ($answer, $amount) =
17         (
18             InputHandler::getInput("What do you want for Christmas?"),
19             InputHandler::getInput("How much will it cost?", \&tooMuch)
20         );
21     } while ($amount eq 'NO_GOOD');
22 }
23
24 Log::write($answer, $amount);
25 Log::close();
26
27 sub tooMuch
28 {
29     my ($amount) = @_;
30     if ($amount > 100.00)
31     {
32         print "That's way too expensive\n";
33         return("NO_GOOD");
34     }
35     return($amount);
36 }

What have we gotten out of this?

Well, the first thing you will notice is that the code actually increased in length. Before our modifications there were 28 lines of code, afterwards 36. So while abstraction usually results in a code reduction, it does not always.

However, if we gained in size, we also gained in stability, and flexibility. Here are some of the things we won by abstracting stuff out, by line number:

Line #8 (CommonOptions::myopts()) ensures that this script works well with other scripts of its type.

Lines #10, #24, #25 (Log::open(), write(), close()) ensure that the log that this script opens will have a consistent interface.

Lines #18, #19 ensure that our input from the user will be in the same format everywhere, giving our code a consistent look and feel.

In addition, we now have the power to change our interface, the look and feel of our scripts, by changing code in one place.

With the old, procedural method (where subroutines sat at the bottom of each script), making a consistent interface is a question of 'herding cats': your code will diverge rather than converge on a central goal.

Ideas on other things to abstract:

Below are some quick ideas on other things to put into modules. They have counterparts on CPAN (and the CD associated with this book) so you might want to take a look there as well.

dates and times

Instead of relying directly on localtime, try making a generic module to handle dates and times. You might consider having a Date module in which you can say:

use Date;                  # includes the module 'Date'.

$daysApart = Date::dateDiff
             (
                 '1996-12-12',
                 'Mon Dec 15 02:52:13 1996',
                 'day'
             );

and have the module return the number of days that two dates are apart. Another idea is:

$goodDate = Date::isReal('Mon Dec 15 02:52:13 1996');

which returns true if the given date is a real date, or false if it is not. To this end, check out the modules Date::Parse and Date::DateCalc, both available via CPAN.
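As a minimal sketch of the isReal idea, here is a version that assumes dates in 'YYYY-MM-DD' form and leans on the standard Time::Local module (the CPAN modules mentioned above handle far more formats):

package Date;
use Time::Local;
sub isReal
{
    my ($date) = @_;
    my ($year, $month, $day) = ($date =~ m"^(\d{4})-(\d{2})-(\d{2})$");
    return(0) if (!defined($day));
    # timelocal() dies on impossible dates, so we trap the error with eval
    eval { timelocal(0, 0, 0, $day, $month - 1, $year - 1900); };
    return(($@ eq "") ? 1 : 0);
}
1;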

command wrapping, for portability's sake

Another useful abstraction into modules is system calls to other programs. A system call to a program such as ls or dir, or more exotic calls such as source control (sccs) or database control (isql, bcp), can easily be abstracted into modules. Saying something like:

system("ls");

is just begging your program to become unportable, since ls does not exist on NT. Instead, say:

use Commands;

Commands::dir();

This is much safer. Here, the particular command 'ls' is abstracted into the generic 'Commands::dir()'.

If you need to make it portable, you just hide the details inside the module Commands, and you need not worry about it in the interface.
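A sketch of such a Commands module might look like this, using the standard Config module to decide which external command to run:

package Commands;
use Config;
sub dir
{
    # pick the native directory-listing command for the platform we are on
    my $cmd = ($Config{'osname'} eq 'MSWin32') ? "dir" : "ls";
    system($cmd);
}
1;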

variable wrapping, for portability's sake

A third good opportunity for abstraction into modules is paths and variables. This really corresponds to two types of abuse. If you say:

$line = "/export/home/epeschko/a.p";

$line2 = "/export/home/epeschko/b.p";

then you are not abstracting your variables properly. It is better written as:

$path = "/export/home/epeschko";

$line = "$path/a.p";

$line2 = "$path/b.p";

For commonly used paths, this is better still written as:

use CommonPaths;
$line = CommonPaths::get('homepath') . "a.p";
$line2 = CommonPaths::get('homepath') . "b.p";

since 'homepath' tells you the function of what you are getting, and not the particular instance.
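A sketch of such a CommonPaths module (the path names here are hypothetical); note that each stored path ends in a separator, so callers can simply append a file name:

package CommonPaths;
my %paths =
(
    'homepath' => '/export/home/epeschko/',
    'logpath'  => '/export/home/epeschko/logs/',
);
sub get { my ($name) = @_; return($paths{$name}); }
1;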

Second, people have the habit of saying:

my $command = substr($0, rindex($0,"/")+1);

# gets the command name

# as above:

# /export/home/epeschko/a.p

# becomes 'a.p'.

which is probably better off as:

use Path;
my $command = Path::getName($0);

since the details of the path, namely the delimiter, are abstracted out. Also, moving between UNIX and NT will not be nearly the hassle it otherwise would be when using this syntax.
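A sketch of that Path module; getName simply knows about both kinds of delimiter:

package Path;
sub getName
{
    my ($path) = @_;
    $path =~ s"^.*[/\\]"";    # strip everything up to the last / or \
    return($path);
}
1;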

debugging and data structures

We shall talk about this later in the chapter 'Programming for Debugging', but a fourth area which lends itself to abstraction is debugging and saving data structures.

Perl programmers waste a lot of time printing out and storing data structures. All of this time could be saved by using the Data::Dumper module as it exists on CPAN. If you find yourself doing something like:

my $a = [{'complex' => 'data', 'structure' => 'reference'}];

foreach $element (@$a)
{
    foreach $key (keys(%$element))
    {
        print "$element->{$key}\n";
    }
}

throw it all away! Instead, do:

use Data::Dumper;
print Dumper($a);

which will produce a much nicer printout.

installation and configuration management

Installing newer versions of Perl modules is yet another area ripe for abstraction. Perl is the master of configuration management, and there are modules which handle this quite well.

We don't have a chance to talk about it here, but you can look on the CD. We have made an 'install.p' that helps keep you up to date on the most current modules.

common subroutines

Finally, by all means, don't be a purist when it comes to modularizing your programs! It is perfectly OK to have a module like CommonSubs, in which the only thing in common for the functions in CommonSubs is that they are all common! Building the module CommonSubs simply consists of:

1) recognizing that a subroutine is common

2) prefixing that subroutine's calls with CommonSubs::, so it is invoked as CommonSubs::<sub_name>, where sub_name is the name of the subroutine

3) dumping that subroutine into the CommonSubs bucket.

For example, let's suppose you find that you are using the subroutine typeOf a lot. Remember ref? It is a function that lets you tell whether something is a reference or not.

Well, typeOf could be considered an extension of ref, such that it doesn't care whether what you pass to it is a reference or not. So:

$scalar = "scalar variable";

typeOf($scalar);

will return 'SCALAR',

@array = ('1','2');

typeOf(@array);

returns 'ARRAY', and:

$ref = ['1','2','3'];

returns ARRAYREF.

One way to code this is:

Listing 15.15 typeOf.p

1 sub typeOf
2 {
3     my (@variable) = @_;
4     my $reference;
5     if (@variable > 1) { return("ARRAY"); }
6
7     if ($reference = ref($variable[0]))
8     {
9         return($reference . "REF");
10     }
11     else
12     {
13         return("SCALAR");
14     }
15 }

Take these fifteen lines out of your code, and dump them into CommonSubs. Now, you can do this:

$type = CommonSubs::typeOf(@reference);

This is a really easy form of code reuse. We find a helpful subroutine, and instead of keeping it at the bottom of some script, we pull it out and stick it in a bin with a bunch of other subroutines. Then, when this 'bin' becomes full, we simply look at it and decide how to better sort these functions out.

Summary of common things to abstract

You can think of the above as templates to get started on the thought process to start modularizing your code. Of course, there are thousands of different functions you can possibly extract. Picking the best places to cut code up and put it into bundles is a bit like cutting a diamond; it takes a little practice, and some precision is involved.

Hence, lots of practice here could be considered a good thing. Here are the six things that you might consider abstracting/modularizing in your own Perl code:

1) dates and times

2) command wrapping (for portability's sake)

3) variable wrapping (for portability's sake)

4) debugging and data structures

5) installation and configuration management

6) common subroutines

Looking at the Perl distribution is also good for showing the technique of modularizing your code. Finally, CPAN gives you some good examples as well.

Examples of Modular Programming

Let's take this a little bit further, and actually implement some more modules. You will ultimately want to start digging into the Perl distribution and/or the Comprehensive Perl Archive Network (CPAN) to get pre-built functionality (and there is a lot out there, believe me.)

However, seeing how modules are implemented by example should ease the learning curve quite a bit, especially if you are unfamiliar with Perl syntax and/or modular programming. They aren't that complicated (well some of them aren't complicated), but they perform some pretty cool tasks that could help you out quite a bit, especially when you get to throwing scripts onto the web. So, here we go. Let's start with a crucial issue that everybody seems to be concerned about these days, portability.

Example #1: UNIX/NT portability via CommonVarbs

If you have used both UNIX and NT, then you know that they have similar concepts, yet do things just a 'little bit differently'. The most common example of this is that NT makes its paths look like:

\TEMP

with backslashes, whereas UNIX makes the equivalent path look like:

/tmp

with forward slashes.

The backslash is just one of many such examples - very annoying, especially for cross-platform programmers.*

* Actually, this is just a "straw man" argument - by the time you read this, Perl should handle the /, \ issue internally with the setting of an internal variable. Nonetheless, it is a good example of a common type of problem.

 

One way of dealing with these low-level differences is, you guessed it, to put them into a module. If we could say something like:

use CommonVarbs qw($DSEP);
$path = "${DSEP}tmp${DSEP}tmp2";

and let Perl get the path separator for us, then our portability goes up, even if our readability goes down. Or, even more readable:

pathify("/tmp/tmp2");

which would then turn "/tmp/tmp2" into a format that the underlying platform can handle. (In case you are wondering how to do this, a subroutine like:

sub pathify { my ($return) = @_; $return =~ s"[/\\]"$DSEP"g; $return }

should do it.)

There are a lot of these little quibbles between operating systems, and it makes sense to put them into a centralized 'CommonVarbs' module. Something like this is a first try:

Listing 15.16 CommonVarbs.pm

1 package CommonVarbs;
2
3 use strict;
4 use Exporter;
5 use Config;
6 use Cwd;
7 use Sys::Hostname;
8
9 BEGIN
10 {
11     use vars qw(@varlist);
12     @varlist =
13     qw(
14         $HaveSend $HaveUtil $DEFAULT_EDITOR $USERNAME $DOMAIN
15         $MAILING_ADDRESS $TEMPDIR $PATHSEP @PATHDIRS @PAGERS $DSEP
16         $BINMODE $CWD $IGNORECASE $FORCE_WRITEABLE $ABS_PATH $HOSTNAME
17         $CONSOLE $RM $RMDIR $DISPLAYER @ISA @EXPORT_OK
18     );
19 }
20
21 use vars (@varlist);
22 @ISA = qw(Exporter);
23 @EXPORT_OK = (@varlist);

Here, we simply define all of the variables before they are used. This is a quick, cheap way of getting the benefits of 'use strict' without having to say something like '$CommonVarbs::DEFAULT_EDITOR' the first time we use this variable.

Finally, it makes a good way of self-documenting what this module is supposed to do. This module will determine the environment in which the script is running, and set the values of the following:

1) $DEFAULT_EDITOR: the default editor on the platform

2) $USERNAME: the username of the person executing the code

3) $DOMAIN: where the person is running the code (e.g., netscape.com)

4) $MAILING_ADDRESS: the email address of the person using this script

5) $TEMPDIR: where this platform keeps temporary files

6) $PATHSEP: a ':' or a ';' - how the platform separates its paths

7) @PAGERS: commands to output a file to the screen

8) $DSEP: the directory separator (/ or \)

9) $BINMODE: whether or not there is a distinction between 'binary' and 'ascii' mode

10) $CWD: a function pointer to get the current working directory

11) $IGNORECASE: whether the OS is case-insensitive or not

12) $ABS_PATH: a pointer to the function that gets the absolute path of a file

13) $HOSTNAME: the name of the machine

14) $CONSOLE: the name of the device where you get data from the keyboard

15) $RM: the name of the remove command

16) $RMDIR: the name of the remove directory command

 

Each one of these varies between UNIX and NT, and this module will define them all so you don't need to worry about doing a check each time you run into a platform dependency.

Line 23 is a simple way of exporting the variables back to whatever script uses the module. This is what lets us say:

use CommonVarbs qw($DEFAULT_EDITOR);

to get the '$DEFAULT_EDITOR' as seen by the module. But I digress. The module continues:

Listing 15.16 CommonVarbs.pm continued:

24 BEGIN
25 {
26     eval "use Mail::Send;";      # stuff to determine domain more efficiently
27     $HaveSend = ($@ eq "");
28     eval "use Mail::Util;";
29     $HaveUtil = ($@ eq "");
30 }
31
32 sub import
33 {
34     my ($type, @args) = @_;
35     if ($Config{osname} eq 'MSWin32')
36     {
37         _getWin32();
38     }
39     else
40     {
41         _getUnix();
42     }
43     # since we define our own import(), Exporter's import() never runs.
44     # Hand the actual exporting of the requested variables off to Exporter:
45     CommonVarbs->export_to_level(1, $type, @args);
46 }

Here, we define how we are going to split up the module. import() first looks at the operating system it is running on (courtesy of 'use Config'). If, as time goes on, we want to support more operating systems (OS2, VMS, and MacOS come to mind), we can simply tack on an extra subroutine. One caveat, though: defining our own import() shadows the import() we inherit from Exporter, so as written, the symbols requested on the 'use' line would never actually be exported.
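A minimal fix (our addition; it is not in the original listing) is to delegate to Exporter explicitly, via the export_to_level() method that the Exporter documentation provides for exactly this situation:

sub import
{
    my ($type, @symbols) = @_;
    if ($Config{osname} eq 'MSWin32') { _getWin32(); }
    else                              { _getUnix(); }
    $type->export_to_level(1, $type, @symbols);   # hand the requested symbols back to Exporter
}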

For now, all we have to do is fill in the two pieces we have defined. Below are the defaults for Win32:

Listing 15.14 CommonVarbs::_getWin32()

43 sub _getWin32
44 {
45     $DEFAULT_EDITOR = "notepad";
46     $USERNAME = $ENV{'USERNAME'};
47
48     if ($HaveUtil) { $DOMAIN = Mail::Util::maildomain(); }
49     else { $DOMAIN = $ENV{'USERDOMAIN'}; }
50
51     $MAILING_ADDRESS = "$USERNAME\@$DOMAIN";
52     $TEMPDIR = (defined ($ENV{'TEMP'}))? $ENV{'TEMP'} : '/tmp/';
53
54     $DISPLAYER = 'pod2text';
55     $PATHSEP = $Config{'path_sep'};
56     @PATHDIRS = grep( -d, split(m"$PATHSEP", $ENV{'PATH'}));
57
58     @PAGERS = qw (more< less notepad);
59     unshift (@PAGERS, $ENV{PAGER}) if (defined ($ENV{PAGER}));
60
61     $DSEP = "\\";
62     $BINMODE = 1;
63     $CWD = \&cwd;
64     $IGNORECASE = 1;
65     $FORCE_WRITEABLE = 1;
66     $ABS_PATH = \&Cwd::abs_path;   # fully qualified; Cwd does not export abs_path by default
67     $HOSTNAME = hostname();
68     $CONSOLE = "con";
69     $RM = "del";
70     $RMDIR = "deltree";
71 }

After we are done with WinNT and Windows 95, we fill in the UNIX portion:

Listing 15.15 CommonVarbs::_getUnix()

72 sub _getUnix
73 {
74
75     $DEFAULT_EDITOR = "vi";
76     $USERNAME = getpwuid($<);
77
78     if ($HaveUtil) { $DOMAIN = Mail::Util::maildomain(); }
79     else { $DOMAIN = `hostname` . "." . `domainname`; $DOMAIN =~ s"\s""g; }  # strip the newlines the backticks leave in
80     $MAILING_ADDRESS = "$USERNAME\@$DOMAIN";
81     $TEMPDIR = '/tmp/';
82
83     $DISPLAYER = 'pod2man';
84     $PATHSEP = $Config{'path_sep'};
85     @PATHDIRS = grep ( -d, split(m"$PATHSEP", $ENV{'PATH'}));
86
87     @PAGERS = qw (more less pg view cat);
88     unshift (@PAGERS, $ENV{PAGER}) if (defined ($ENV{PAGER}));
89
90     $DSEP = "/";
91
92     $BINMODE = 0;
93     $CWD = \&cwd;
94     $IGNORECASE = 0;
95     $FORCE_WRITEABLE = 0;
96     $RM = "rm -f"; $RMDIR = "rmdir";   # ShellScript.pm below relies on $RM
97     $ABS_PATH = \&Cwd::abs_path;       # fully qualified; Cwd does not export abs_path by default
98     $HOSTNAME = hostname();
99     $CONSOLE = "/dev/tty";
100 }
101 1;

Whew! The point of this module is that about 90% of all the portability issues you will face are related to 'small' issues like the ones above. You want a list of the directories where executables can be found? Well, you can either say:

if ($Config{'osname'} eq 'MSWin32')
{
    @PATHDIRS = grep(-d, split(m"$Config{'path_sep'}", $ENV{'PATH'}));
}

each time you want to get this particular item, or you can say:

use CommonVarbs qw(@PATHDIRS);

print "@PATHDIRS\n";

to let tested code do the work for you, and not worry about the internal details.

One more point: you might wonder how I went about writing this particular module. Well, part of it was memory, but remember that the code in the Perl distribution has ten years of dealing with these sorts of portability issues behind it. So, yes, I looked to those thousands of lines of experienced, world-weary Perl code to glean parts of the above.

Example #2: Using 'switches' inside a script, rather than at the command line

Example #2 will come in pretty handy for those of you working on Windows 95 or Windows NT, even VMS or OS2. As we have seen before, there are certain 'flags' (-w, -d, -c) that you can set at the command line which let you tell Perl what to do, 'above and beyond the call of duty'. The following checks the syntax of script.p:

prompt% perl -c script.p

Now, what happens if you are running in a place where you don't want to use the command line, but still want the command switches? For example, Windows 95 and Windows NT aren't really set up very well for this whole switch thing, and it is a drag to have to open up a command shell each time you run a Perl script.

The following module lets you do stuff like the following:

use Switch qw(-c);

# my script here...

Now, when you run your script, with something like:

C:\> perl script.p

Perl will do a syntax check on your script for you (instead of running it!). This one is short and kinda sour:

Listing 15.16 Switch.pm:

1 package Switch;
2
3 sub import
4 {
5     my ($type, @switches) = @_;
6
7     my $perl = $^X;      # perl interpreter name
8     my $script = $0;     # perl script name
9
10     my @run = ($perl, @switches, $script, @ARGV);
11     $script =~ s"\W""g;               # boil the name down to an env-var-safe tag
12     $script = substr($script, 0, 6);
13
14     if (!$ENV{"SCRIPT_$script"})
15     {
16         $ENV{"SCRIPT_$script"} = 1;   # bread crumb: don't re-exec next time around
17         exec(@run);
18     }
19 }
20 1;

Here, line #5 takes whatever switches the user passes in, and lines #7 and #8 proceed to get the name of the Perl executable (line #7) and the name of the script being run (line #8). Line #10 puts all this information together into a run list, such that if you said:

use Switch qw(-w);

# ... the rest of your script ...

in the script calling 'Switch.pm', and you then said:

C:\> perl script.p 1 2 3

the array @run would become:

('perl', '-w', 'script.p', 1, 2, 3)

as shown in Figure 15.2.

Figure 15.2: Command line maker.

Now, in lines 14 through 18, we proceed to 'exec'ute this code. We set an environment variable, $ENV{'SCRIPT_scriptp'}, to leave a bread crumb for the re-executed script, and to prevent an infinite loop of the script executing itself over and over again. Then, in line 17, we say:

exec ('perl', '-w', 'script.p', 1, 2, 3);

which immediately 'takes the place' of the original process. We've seen this before (in Chapter 11).

The upshot is that we have a portable way of setting command line switches from inside the script, one which does not cost much time (the exec trick is fairly efficient). This particular trick will work pretty much anywhere, not just on UNIX and NT.

Note, again: always use the list form of exec,

exec (@run);

rather than the string form

exec("@run");

To see why, suppose you say something like:

prompt% a.p \"1 2 3

where you have an embedded, escaped quote. If you use the string form, exec("@run") hands a single string to the shell, and the shell munges the quote for you; your argument list becomes:

@ARGV = (1,2,3);

and the quote silently goes away! This is a very subtle bug; people don't get bitten by it often, but when it does bite, it can cost hours of pain. Remember this, especially when dealing with, and writing programs for, the shell (whether it be a UNIX shell, an NT shell, or any other type of shell).
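The same class of problem is easier to demonstrate safely with embedded whitespace. Here is a sketch, assuming a UNIX shell and a hypothetical helper script showargs.p that does nothing but print join('|', @ARGV), "\n";:

my @run = ('perl', 'showargs.p', 'one arg', 'two');

system(@run);      # list form: prints 'one arg|two' - two arguments, space intact
system("@run");    # string form: the shell re-splits the string; prints 'one|arg|two'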

That was kind of fun! Let's look at another example of this type of module in action. This time, we shall consider scripts that are run non-interactively (via cron or a Web server).

Example 3: Getting the By-Product of Shell Scripts inside of Perl

Both NT and UNIX share the convenience, and the exasperation, of having startup scripts which set up a person's environment. On UNIX, these are the 'resource' files (.kshrc, .bashrc, .tcshrc, and so on); on NT, the main one is autoexec.bat.

Usually, when you run scripts from the command line, they inherit the variables you have set up. But a cron job, or a script run by a CGI server, does not inherit them, which can cause developers lots of pain.

Here is a module that takes what we learned in Chapter 11 to its logical conclusion. We already talked a little about how to set such variables as LD_LIBRARY_PATH, PERL5LIB, and other 'hard to set' variables via the 'exec' trick. This module will take the environment as produced by a shell script and import it into your Perl script. Hence, if you say:

use ShellScript "ksh -c '. shellfile.ksh'";

# my script

print "$ENV{'PERL5LIB'}\n";

and 'shellfile.ksh' sets PERL5LIB, then you will inherit PERL5LIB and be able to use it inside your script.

Again, it is important to realize why:

system("ksh -c '. shellfile.ksh'");

won't work! A lot of people get tripped up by this one. The environment for the system call is copied from the parent's environment, and lives only as long as the system call is in progress. It dies when the system call is done, and it does not affect the parent. Hence, PERL5LIB will never change inside the script that calls system.
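You can watch this happen with a three-liner (a sketch; a UNIX shell is assumed):

$ENV{'PERL5LIB'} = 'unchanged';
system("PERL5LIB=/new/lib; export PERL5LIB");   # the child shell sets it...
print "$ENV{'PERL5LIB'}\n";                     # ...but this still prints 'unchanged'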

Here is the workaround, again using that exec trick:

Listing 15.17 ShellScript.pm

1 package ShellScript;
2
3 use strict;
4 use FileHandle;
5 use CommonVarbs qw($TEMPDIR $DSEP $RM);
6

Here we set up which modules we are going to use. Notice that we are already using CommonVarbs to help with our portability! We use $TEMPDIR to tell us where to put our temporary file, $DSEP to tell us which delimiter to use for directories (/ or \), and $RM for the platform's remove command. Now, we need to define an import function which actually implements the statement use ShellScript "ksh -c '. shellfile.ksh'":

Listing 15.18 ShellScript.pm continued

7 sub import
8 {
9     my ($type, $statement) = @_;
10
11     my $script = $0; my $perl = $^X;
12     my @run = ($perl, $script, @ARGV);
13     $script =~ s"\W""g;               # boil the name down to an env-var-safe tag
14     $script = substr($script, 0, 6);
15
16     if (!$ENV{"SHELL_$script"})
17     {
18         $ENV{"SHELL_$script"} = 1;
19
20         my $envfile = "$TEMPDIR${DSEP}env.$$.p";
21         my $fh = new FileHandle("> $envfile")
22             || die "Couldn't open environment file!\n";
23
24         print $fh <<"END_OF_CODE";
25 foreach \$key (keys \%ENV)
26 {
27     print "\\\$ENV{'\$key'}='\$ENV{\$key}';\\n";
28 }
29 END_OF_CODE
30         close($fh);
31

A bit of a note here on what's going on. Again, we are using that old environment trick ($ENV{"SHELL_$script"}) to avoid an infinite loop in which exec calls exec, which calls exec. However, we have thrown in an additional trick: a code generator.

Lines #25 through #28 generate a small Perl script, one that prints out the environment for us. Remember this script; we shall put it to work in a moment. To make the code generator concrete, here is roughly what the generated file looks like (the name is made up; the real name carries the process ID):
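# /tmp/env.1333.p - written on the fly by ShellScript::import
foreach $key (keys %ENV)
{
    print "\$ENV{'$key'}='$ENV{$key}';\n";
}

When run, it prints one line of legal Perl per environment variable. The code continues: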

Listing 15.18 ShellScript.pm continued

32     my $code;
33
34     # run the shell statement, then our generated script, and
35     # finally clean up after ourselves; the backticks collect
36     # whatever the generated script prints.
37     $code =
38         `$statement; $perl $envfile; $RM $envfile`;
39     eval $code;                       # the output is Perl code - run it
40
41     exec(@run);
42     }
43 }

Now, this code may look a little noisy and, well, it is! But if you understand how this logic works, you will be able to program a lot of cool things.

So what is going on here? Well, remember the file we created in lines #25 through #28? Line #38 is where we use it, and it is the heart of the program. Translated, it looks something like this:

`ksh -c '. shellfile.ksh';/usr/local/bin/perl5 /tmp/env.1333;rm -f /tmp/env.1333`

In other words, we execute the shell statement, thereby getting a modified environment; we then run our generated Perl script (/tmp/env.1333), which prints out the environment it sees; and finally we remove the traces of the file we created (rm -f /tmp/env.1333). (One subtlety: the settings survive only if $statement leaves them in the shell that goes on to run $perl; a statement that sets them in a sub-shell of its own and then exits will not do.)

What we are left with, then, is a string holding the environment we would have gotten had we typed ksh -c '. shellfile.ksh' on the command line ourselves.

There is another twist here. The output of our generated script is in the form:

$ENV{'VARIABLE'}='value';

$ENV{'VARIABLE2'}='value2';

In other words, the output is a legal Perl script. When we get to line 39,

eval $code;

we then use the eval statement to run this code, producing the environment we want (the one from the shell script). Line 41 cements this environment into our original Perl script: exec(@run) replaces the current process with a fresh run of the script, and that fresh run inherits the corrected environment.

In other words, we have used a Perl script to generate a second Perl script, whose output is in turn more Perl code. That code is eval'ed to get the right environment, and an exec() then re-runs the original script so that it continues with that environment in place. This module's one mercy is that it is brief, and therefore somewhat maintainable.

I do admit that I had fun programming it, and it does have a helpful effect, but one more level of indirection and any sane programmer would go mad! Use the techniques in this example with discretion, and wear your safety helmet.

Example #4: Diff Module

OK, so I do feel a little bit guilty for dragging you through that last example. It was a little bit cruel to subject you to so many levels of indirection. As soon as you get good at it, you will be subjecting your colleagues to code like this!

Anyway, let's look at one more example, this one a little more straightforward. As we have said, one of the big advantages of Perl is that you can represent any data structure in terms of arrays, hashes, and scalars. The module Data::Dumper shows off that strength by dumping any data structure to the screen.

However, say you have some huge data structures that you need to compare, or you have just made a change to a data structure which caused a bug in your code. Then it would be helpful if you could go through the data structure and look only at the points that changed.

In other words, given the data structures:

$arrayRef1 = [1,2,3,4,5];

$arrayRef2 = [1,2,3,4,6];

we somehow want to direct our attention to the fact that the only elements that differ are the last ones (5 and 6). This way, we can track down what caused our error a lot more easily. It also helps when changing your software: if we make several small, incremental changes, and test each one carefully in this way, we are a lot less likely to introduce new bugs.

So how to do this? Well, note that if we compare each of the elements:

$arrayRef1->[0] eq $arrayRef2->[0];

$arrayRef1->[1] eq $arrayRef2->[1];

we could basically 'cancel out' all of the elements that are equal. If we do so, we are left with:

$diffRef = [undef, undef, undef, undef, XXXXX];

where XXXXX marks a difference in our data structure. The question then is: what do we fill in for XXXXX?

Well, it makes sense to fill it in with something meaningful to our search for bugs; namely, the two values that differ. Unfortunately, as it stands, we have only one position in which to put two elements ('5' and '6'). So what do we do? We use Perl's malleability with data structures to cheat, and put both of the elements there!

We define a diff as follows: if the elements at a given position of the data structure are the same, we register an 'undef'. If the elements are different, we put there an array reference holding the two elements that differ. For our arrays, this turns into:

$diffRef = [undef, undef, undef, undef, [5,6]];

That way, we can see differences at a glance, and we don't lose any information. We also don't reinvent any of the functionality that is already in Data::Dumper; we reuse Data::Dumper by letting it be the function which prints out this 'diff' structure. It would be wasteful to write an elaborate print function when Data::Dumper already does it for you.
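So, once the module below is written, using it should look something like this (a sketch of the intended interface):

use Data::Dumper;
use Diff;

my $diff = Diff::checkData([1,2,3,4,5], [1,2,3,4,6]);
print Dumper($diff);    # dumps [undef, undef, undef, undef, [5, 6]]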

Now, what functions make sense to be in our module? Well, three come to mind:

Diff::checkData($ref1,$ref2);

Diff::checkEq($ref1,$ref2);

Diff::patch($ref1, $ref2);

'Diff::checkData($ref1, $ref2)' will create the 'diff' data structure we have been talking about. 'Diff::checkEq($ref1, $ref2)' will simply check whether $ref1 is equal to $ref2; after all, sometimes we want only that minimum of information. 'Diff::patch($ref1, $ref2)' would return a list of Perl commands to change $ref1 into $ref2.

We will implement the first two functions. Patch isn't that difficult, but my fingers are running out of ink.

So how do we go about doing it? Well, recursion is obviously going to play a part, since an arbitrary data structure in Perl is recursive. Also, notice that we have to worry about hash references, array references, and scalars. Once we hit a scalar, we hit the edge of the data structure, and we don't have to recurse any more.

Hence, the pseudo-code is going to look something like

sub checkData

{

if arg1 is scalar { compare arg1 to arg2; return difference }

if arg1 is array { go through each element in arg1 and arg2 recursively }

if arg1 is hash { go through each key in arg1 and arg2 recursively }

}

and each time we recurse, we keep going until we hit bedrock: the scalars. Of course, it gets a little more complicated than this; we need to deal with cases in which the elements are not of the same type, and with associated details. The code starts below:

Listing 15.19 Diff.pm

1 package Diff;
2
3 use strict;
4 use Carp;
5 use Data::Dumper;
6
7 ##########################################################################
8 # Gives a data structure that shows the differences between two data structures
9 #
10 #     $data = checkData([1,2,3], [1,2,4]);   # $data = [undef, undef, [3,4]];
11 #
12 ##########################################################################
13
14 sub checkData
15 {
16     my ($dst1, $dst2) = @_;
17     confess "You need to pass two arguments/data structures\n" if (@_ != 2);
18     return(_checkData($dst1, $dst2));
19 }

Here's the start. Just to be clear: we define the function checkData() in terms of an internal function _checkData(). _checkData() is what is called a private function; it isn't meant to be called by any external script that uses Diff.pm. (You can call it, and Perl won't prevent you, but it is still a good idea not to.) We shall see a lot more private functions later on. The leading '_' is a clue that whatever you are looking at is private.

We now define the _checkData() function, which is going to be heavily recursive (the recursive calls are on lines #34 and #46):

Listing 15.20 Diff.pm continued

20 sub _checkData
21 {
22     my ( $dst1, $dst2 ) = @_;
23     my $return;
24
25     if ((!ref($dst1)) || (ref($dst1) ne ref($dst2)))
26     {
27         $return = _scalarEqData($dst1, $dst2);
28     }
29     elsif (ref($dst1) eq 'ARRAY')
30     {
31         my $xx;
32         for ($xx = 0; $xx < @$dst1; $xx++)
33         {
34             $return->[$xx] = _checkData( $dst1->[$xx], $dst2->[$xx] );
35         }
36         for ($xx = @$dst1; $xx < @$dst2; $xx++)
37         {
38             $return->[$xx] = _scalarEqData( undef, $dst2->[$xx] );
39         }
40     }
41     elsif (ref($dst1) eq 'HASH')
42     {
43         my $key;
44         foreach $key (keys(%$dst1))
45         {
46             $return->{$key} = _checkData( $dst1->{$key}, $dst2->{$key} );
47         }
48         foreach $key (keys(%$dst2))
49         {
50             next if (exists($dst1->{$key}));
51             $return->{$key} = _scalarEqData( undef, $dst2->{$key} );
52         }
53     }
54     return($return);
55 }

56 sub _scalarEqData
57 {
58     my ( $dst1, $dst2 ) = @_;
59     ($dst1 ne $dst2)? [ $dst1, $dst2 ] : undef;
60 }

There are three things to notice about this code. First, notice that the $return data structure mimics the structure it finds inside $dst1. If $dst1 and $dst2 happen to be array references, line #34 ($return->[$xx]) creates an array element for each array element it finds in @$dst1. If they happen to be hash references, line #46 creates a hash element for each hash element it finds in %$dst1.

Second, notice that it isn't quite as simple as the pseudo-code we gave. We need to compare the elements pairwise, true, but there is also the case where $dst1 and $dst2 have different numbers of elements. The data structures:

$arrayRef = [1,2,3]; $arrayRef2 = [1,2,3,4];

are different, too. We would want our final data structure to look like '$diffRef = [undef, undef, undef, [undef, 4]]'. It is the job of lines 36-39 and 48-52 to deal with unequal numbers of array elements and hash elements.

Finally, notice we 'hit bedrock' if:

$dst1 is a scalar

$dst1 is not the same data type as $dst2

In these cases, _scalarEqData() is called, a simple subroutine which compares the two elements. If they are equal, it returns 'undef'; if not, it returns an array reference containing the two differing elements.
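A few calls show the behavior:

Diff::_scalarEqData(5, 5);       # returns undef: no difference
Diff::_scalarEqData(5, 6);       # returns [5, 6]: the differing pair
Diff::_scalarEqData(undef, 4);   # returns [undef, 4]: present on one side only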

Once we have this structure created, the subroutine to check whether two data structures are equal is a lot easier.

To program it, we simply reuse the checkData() function that we have already coded. This kind of internal code reuse can help you out quite a bit: we don't need to redo all of the work that we did to build the data structure in the first place. This means that our code is a bit shorter:

61 sub checkEq
62 {
63     my ($dst1, $dst2) = @_;
64     confess "You need to pass two data structures\n" if (@_ != 2);
65     my $diff = checkData($dst1, $dst2);
66
67     my $status = 1;
68     _diffData($diff, \$status);
69     return($status);
70 }
71
72 sub _diffData
73 {
74     my ($return, $status) = @_;
75
76     return(0) if ($$status == 0);
77
78     if (!ref($return))
79     {
80         $$status = 0 if (defined($return));
81     }
82     elsif (ref($return) eq 'ARRAY')
83     {
84         foreach (@$return) { _diffData($_, $status); }
85     }
86     elsif (ref($return) eq 'HASH')
87     {
88         foreach (keys(%$return)) { _diffData($return->{$_}, $status); }
89     }
90 }

Again, the recursive calls (lines #84 and #88) do the walking. _diffData() is pretty simple: all it does is go through the data structure we created via checkData($dat1, $dat2) and look for defined elements (line 80). If it finds any, our 'status' is set to zero, meaning we have found a difference, and that short-circuits the subroutine at line 76 (so any later recursive calls return immediately).

Note one more thing: $status is an example of a way to share a variable between recursive subroutine calls. Since we pass around a reference to a single scalar, every recursive call sees, and can change, the same $status. Had we passed a plain scalar rather than a reference to one, we would have been passing copies around instead.
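If the reference trick is new to you, here it is in miniature (a made-up example):

sub bump { my ($ref) = @_; $$ref++; }   # increments the caller's own variable

my $count = 0;
bump(\$count);
bump(\$count);
print "$count\n";                       # prints 2: both calls changed the same scalar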

We could modify this to handle code references and object references here but we are really running out of space.

Summary of chapter

Code abstraction into modules is the first step in a thought process which leads to object-oriented programming. The goal is code reusability across programs and platforms. This process allows you to develop code once, test it for validity, and then use it endlessly in other programs.

The best way of proving this assertion is through examples. Since Perl syntax is so expressive, the scope and breadth of possible Perl modules is incredible. And to get the most out of Perl, you should use your imagination in coming up with the broadest, simplest modules you can think of! In the above code, we have:

1) manipulated the way you can call Perl (switches)

2) gotten effects from the execution of another language and used these effects in a Perl script (ShellScript.pm)

3) made an all-purpose module to do data structure testing (Diff)

4) centralized the interface of all of our Perl scripts (CommonOptions)

In most other languages, these goals are the exclusive territory of language design committees! In other words, you would need to redefine the language itself to get the required effects that we accomplished in 400 or so lines of code.

Anyway, in the next chapter we take the next big step up in the quest to scale our code: we talk about the syntax of the object. The rest of the book will be about object-oriented programming and examples, so make sure that you understand much of what has been covered here.
