© 1997 The McGraw-Hill Companies, Inc. All rights reserved. Any use of this Beta Book is subject to the rules stated in the Terms of Use. |
The purpose of this chapter is to add some substance to the theoretical material on object oriented programming that we outlined in the last chapter. This chapter gives you the syntax behind the larger building blocks of Perl: Libraries and Modules.
This chapter is a good reference, but most of these elements will be covered in detail in the chapters to come. If you have simply used Perl as a one-liner language, you might want to read this chapter in its entirety; otherwise, skim ahead and refer back when you have questions about syntax.
The one thing to remember, though, is that the syntax for libraries, objects, and modules is simple, but as we said last chapter, it takes a mind-shift to get into the frame of actually using them. Always remember that while cutting and pasting code is one of the strongest temptations, it is also the most harmful type of code reuse. Use code modularity instead.
For more information, see the perlmod (Perl modules) and perltoot (object bag of tricks) man pages that come along with the documentation; they elaborate on this chapter quite well.
In this chapter, we will introduce some of the major concepts that you will need to know before getting into modular programming, and into object oriented programming.
The main concepts are:
Namespaces are places to segregate code so that one piece of code does not conflict with another.
Modules are defined code that can be included into scripts (via 'use Module').
Libraries are similar to modules, but are included in the application at run time (via 'require Library'), rather than at compile time as modules are.
@INC and %INC are variables in Perl that define which library or module is going to be loaded into code with 'use Module' or 'require Library'.
Before we get to talking about libraries and modules, we really should bring up a basic concept which will help you scale up your programs quite well: the namespace.
Namespaces are tags that let you segregate variables and functions into different compartments. A tag is a way to find something that has been filed away. Tags define or point to where a variable or function may be found.
How do namespaces work? Every function, and all global variables, have a given namespace that they are associated with. This namespace is merely a tag that is associated with the variable or function. For example:
&A::function();
calls the namespace A's version of function(), and:
B::function()
calls the namespace B's version of function().
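As a minimal, runnable sketch of this (the package and function names here are just for illustration):

```perl
package A;
sub function { return "A's version\n"; }

package B;
sub function { return "B's version\n"; }

package main;
print A::function();    # calls namespace A's version
print B::function();    # calls namespace B's version
```

Each namespace keeps its own function table, so the two definitions of function() never collide.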
The namespace mechanism in Perl is an elegant one. In fact, it is modeled on C++, filesystem design, and several other sources. It is, at its heart, a hierarchical way of storing information. A filesystem's power comes from storing files in a directory tree. This makes it possible to access thousands and thousands of files by going only a couple of steps down in the tree. For example, a standard Windows hierarchy might look something like this:
C:\MSOFFICE
C:\MSOFFICE\EXCEL
C:\MSOFFICE\WINWORD
Although only three directories are listed, thousands of files could be safely stored in each of these directories, lending order to chaos.
Perl stores functions and variables inside a hierarchy. The Windows hierarchy example above might be represented in Perl as:
MSOFFICE
MSOFFICE::EXCEL
MSOFFICE::WINWORD
in which each of the above namespaces are homes for functions and variables. Here, the '::' is the separator, just as '\' is the separator in the Win32 world, and '/' is the separator in the UNIX world.
In this case, 'MSOFFICE::EXCEL' might refer to functions and variables which implement the EXCEL program, 'MSOFFICE::WINWORD' might refer to the functions and variables that implement WINWORD, and 'MSOFFICE' might refer to the functions and variables that tie the functionality of the whole package together.
And the important point is that the variables and functions in MSOFFICE::EXCEL are different from the variables in MSOFFICE::WINWORD. You can use the scalar $MSOFFICE::EXCEL::A and the variable $MSOFFICE::WINWORD::A without fear of stepping all over yourself.
All of this is quite hypothetical. As a more practical example, Figure 14.1 shows the actual hierarchy that comes with the centralized distribution of Perl.
fig141.fig
Figure 14.1
Central distribution of Perl object tree.
Each node in this figure is a module that has its own namespace, functions, and variables, and may be an object. (The documentation that comes with Perl has over 200 pages of information on the central distribution.)
You can look at this hierarchy as a functionality map. Each entry has its own niche of functionality which it covers. There is no need for you to program for this functionality on your own. Instead, you should be re-using the modules listed above by looking in the documentation to see what is available to you.
In creating your own projects, you should be thinking in exactly the same way. You should be thinking about creating your own functionality map; creating your own hierarchy, and generally thinking 'have I implemented this before?' before actually starting to code. This mind shift is essential to scaling up your projects, and is essential to start looking at the world in an object oriented way.
Now, let's get a little more pragmatic and look at the ways that Perl provides for accessing, and working with, namespaces.
There are two basic ways to work with a namespace. You can either directly access it, or you can use the package keyword to access it.
One way of getting at variables and functions inside a namespace is to simply append the namespace to the beginning, plus '::'. For example,
S::execute();
refers to namespace S's version of the execute function, and
$Mail::hello;
refers to namespace Mail's version of the scalar hello. As said, packages are hierarchical, so something like:
print keys(%level1::level2::hash);
prints out the keys of level1::level2's hash.
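A quick self-contained sketch of this (the names level1::level2 and the hash contents are purely illustrative):

```perl
# A hash living two levels down in the namespace hierarchy.
%level1::level2::hash = (apples => 3, pears => 5);

# Access it from anywhere by its fully qualified name.
print join(" ", sort keys %level1::level2::hash), "\n";   # apples pears
```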
Directly accessing a namespace as we do above, however, can be quite painful. This is especially true when there are several variables and functions which are related together. Suppose for example that you were writing a simple sort function:
1 for ($sort::xx = 0; $sort::xx < @sort::elements; $sort::xx++)
2 {
3 for ($sort::yy = $sort::xx+1; $sort::yy < @sort::elements; $sort::yy++ )
4 {
5 @sort::elements[$sort::xx, $sort::yy] =
@sort::elements[$sort::yy, $sort::xx]
if ($sort::elements[$sort::yy] < $sort::elements[$sort::xx]);
6 }
7 }
This code contains way too many colons! We shouldn't have to go into gory detail about where each variable is, when everything is in the same namespace. Well, as a short-cut, Perl provides the package keyword which simplifies this code considerably:
1 package sort;
2 for ($xx = 0; $xx < @elements; $xx++)
3 {
4 for ($yy = $xx+1; $yy < @elements; $yy++)
5 {
6 @elements[$xx,$yy] = @elements[$yy, $xx]
if ($elements[$yy] < $elements[$xx]);
7 }
8 }
This does exactly the same thing as the first piece of code: $xx, $yy, and @elements are aliased to $sort::xx, $sort::yy, @sort::elements. package therefore acts as a default namespace, and cuts down on the amount of syntax traffic (or code noise, take your pick) in the code.
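Here is the same sort as a complete, runnable sketch, with some sample data thrown in so you can watch it work:

```perl
package sort;
@elements = (5, 3, 9, 1);   # sample data, for illustration

for ($xx = 0; $xx < @elements; $xx++)
{
    for ($yy = $xx + 1; $yy < @elements; $yy++)
    {
        # the slice assignment swaps the two elements, but only
        # when they are out of order
        @elements[$xx, $yy] = @elements[$yy, $xx]
            if ($elements[$yy] < $elements[$xx]);
    }
}
print "@elements\n";   # 1 3 5 9
```

Every unqualified variable here ($xx, $yy, @elements) silently lives in the sort namespace, courtesy of the package statement.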
Package Declaration Rules
To make all the variables and functions that follow part of a given namespace, use a package declaration, which has the syntax:
package Packagename;
This simply tags whatever follows as belonging to the package 'Packagename'. You need to remember the following three things about package declarations.
Rule #1: Packages Supersede Each Other
A package declaration supersedes any previous package declaration. In this code fragment:
package One;
package Two;
_execute();
'_execute()' refers to 'Two::_execute()' rather than to 'One::_execute()' since package Two was the last package to be mentioned.
Rule #2: Packages Have Scope
Second, if you declare a package statement, the package declaration is only good up to the end of the scope it is included in. (See 'scoping rules' in chapter 5 for more detail on scope.) The following statement:
package Two;
{ package One; }
print $b;
prints out "$Two::b" since even though package One is the last one declared and mentioned, it was put inside brackets and therefore, has gone out of scope. However, if you had said:
package Two;
package One;
print $b;
then 'print $b' refers to $One::b instead.
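A runnable sketch of this scoping rule (the variable values are made up for the example):

```perl
$Two::b = "from Two";
$One::b = "from One";

package Two;
{
    package One;    # in force only until the closing brace
}
print "$b\n";       # prints "from Two": we are back in package Two
```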
Rule #3: Packages are not Synonymous with Scope
There is a reason why we separated scope into chapter 5 and are only starting packages now. That is because the package and the scope are totally unrelated concepts. This may not bite you often, but when it does, it can cause problems. Take the following code:
$a::variable = 3;
package a;
my $variable = 1;
package b;
my $variable = 2;
print $a::variable;
What does Perl print when it sees this code? Well, the intuitive answer would be '1', since 'my $variable = 1;' was put under the 'package a;' banner. However, the correct answer is '3'.
In my opinion, this result is counter-intuitive and should be changed, but it vividly shows how Perl has been programmed. What is going on here?
The answer is that namespaces and my variables have absolutely nothing in common with each other. '$a::variable = 3' sets a global variable inside the a namespace, while 'package a; my $variable = 1;' sets a variable without a namespace.
It sets what is called a lexical variable, which is associated with the scope that it is in. Since there can be multiple namespaces in a given scope, the following code:
package a;
my $variable = 1;
package b;
my $variable = 2;
actually sets the same variable twice: first to 1, and then to 2, even though it looks like it is setting $a::variable and $b::variable!
Perl, thankfully, warns you about this type of error, but I still think it is a design flaw of the language that should be fixed. Unfortunately that is easier said than done (since it would cause a lot of code that is already out there to change).
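You can watch this happen in a few lines (run it with -w and Perl will even warn that the second my masks the first):

```perl
$a::variable = 3;            # a package global in namespace a

package a;
my $variable = 1;            # a lexical: lives in the file scope, not in package a

package b;
my $variable = 2;            # the SAME lexical again, despite the new package

print $a::variable, "\n";    # prints 3: the global was never touched
print $variable, "\n";       # prints 2: the one lexical, last set to 2
```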
As we shall see, namespaces are the mechanism by which you will create modules, libraries, objects, classes, and other such fun stuff. If you are getting into OO, you will be seeing a lot of them. In fact, namespaces will probably be the center around which you program.
Since namespaces will be so central, here are three quick tips which can save you a lot of time and hassle when working with them.
As a general rule, there should be one namespace per file, and one file per namespace. If you follow this rule, you will overcome any weirdness with the difference between scoping and namespaces such as described above, and be able to instantly track down a given subroutine in your code. If you say something like:
Module::function();
when calling your functions, then you will instantly say 'Aha, I need to go to Module.pm to find that' (I think of this in terms of 'where the code lives').
The keyword use provides a convenient way to follow this rule.
You can use the variable "%<namespace>::" to get a list of all the global symbols in a given namespace. For example:
foreach $key (keys %Time::)
{
print "\$$key => ${$Time::{$key}}\n";
print "\@$key => @{$Time::{$key}}\n";
print "\%$key => @{[ %{$Time::{$key}} ]}\n";
}
prints out all of the variable names, and their values, that have been globally declared in the namespace Time. This trick does not work for my variables, since lexicals do not live in the symbol table. Again, this is a major stumbling block for people when dealing with Perl.
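A small self-contained demonstration of the same trick (the namespace Demo and its variables are made up for the example; each symbol table value is a glob, so we dereference it once per slot):

```perl
$Demo::count = 42;
@Demo::list  = (1, 2, 3);

# %Demo:: is the symbol table for namespace Demo.
foreach my $key (sort keys %Demo::)
{
    # print only the slots that actually hold something
    print "\$$key => ${$Demo::{$key}}\n" if defined ${$Demo::{$key}};
    print "\@$key => @{$Demo::{$key}}\n" if @{$Demo::{$key}};
}
```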
The central task of any project that you are on should be to develop a code hierarchy of your own. The Perl libraries that come with the standard distribution give a good example of how to make a large Perl project, essentially giving you 30,000 lines of code for free. Building your project as a hierarchy is one of the basic tasks of Object Oriented programming.
Namespaces are where functions and variables live. The simple way of accessing any variables or functions in a namespace is to say 'Namespace::function()' or '@Namespace::variable' to directly access it, or to say 'package Namespace;' in front of any code that you want to be interpreted to have its own namespace.
Namespaces are used so that you do not have to worry about name collision, which occurs when two variables have the same name. Namespaces allow:
$my::variable = 1;
and
$your::variable = 2;
in the same program without having the second variable over-write the first.
Before Perl version 4, the only tools for making a large-scale Perl project, or scaling up your code, were functions and the ability to segregate variables into scopes. You could get pretty far with just these, but it limited Perl to 'one shot scripts': scripts that were useful, but didn't have reusable components.
Perl 4 came along and added libraries, or collections of related functions, which we shall address here. To use a library, you require it, much as you use a module.
The model for creating and using libraries in Perl is fairly simple:
1) To make a library, create a Perl program that you are going to reuse in several places. Usually, this file has the suffix ".pl" (for Perl library).
2) To use that library, say "require 'library.pl'" in the program in which you wish to use it.
This is pictured in Figure 14.2:
fig142.fig
Figure 14.2
Perl's model for requiring libraries
For example, say that you have created the following library, which does certain translations to words. It is stored in the file wordtrans.pl:
1 sub anagram { return(scalar reverse($_[0])); }
2
3 sub rot13
4 {
5 my ($word) = @_;
6 $word =~ tr"A-Za-z"N-ZA-Mn-za-m";
7 return($word);
8 }
9 sub piglatin
10 {
11 my($word) = @_;
12 my (@letters) = split('', $word);
13 my ($first, $rest) = ($letters[0], join('', @letters[1..$#letters]));
14 return($rest . $first . "ay");
15 }
To actually use this library, put:
require "wordtrans.pl";
at the beginning of any script that wants to call piglatin, rot13, or anagram. And:
1 require "wordtrans.pl";
2 my $word = rot13('theseus');
3 print $word;
will do a simple encryption of 'theseus' by shifting each letter 13 'places' to the right ('A' becomes 'N', 'B' becomes 'O', etc). Hence, this will print out 'gurfrhf'.
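To see all three routines in action without setting up a separate library file, here they are inlined into one self-contained script (note the scalar in anagram, which forces reverse to reverse the string rather than the one-element list):

```perl
# Inlined versions of the wordtrans.pl routines, for illustration.
sub anagram  { return(scalar reverse($_[0])); }

sub rot13
{
    my ($word) = @_;
    $word =~ tr"A-Za-z"N-ZA-Mn-za-m";
    return($word);
}

sub piglatin
{
    my ($word) = @_;
    my (@letters) = split('', $word);
    my ($first, $rest) = ($letters[0], join('', @letters[1..$#letters]));
    return($rest . $first . "ay");
}

print rot13('theseus'), "\n";     # gurfrhf
print anagram('theseus'), "\n";   # sueseht
print piglatin('theseus'), "\n";  # heseustay
```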
This may sound simple enough. However, there is one issue that we haven't discussed yet: exactly how this mechanism works. Of course, I should also point out some of the small gotchas that can trouble new Perl programmers.
So, what happens when you say 'require "perlprog.pl"'? There are two things to remember here.
The main principle to remember is that when you write a library, and include it in a program, Perl simply executes the code in library.pl at the point which you say 'require library.pl'. For example, if you put the following statement inside the file library.pl:
1 print "Testing of require!n";
and then inside a program:
1 require "library.pl";
2 print "Done with require!\n";
Starting execution at line 1, Perl opens up, parses, and then executes the code inside library.pl. This prints out:
Testing of require!
Done with require!
The library code is not inserted in the main code until run time. When you say "require 'program.pl'", Perl executes that statement in exactly the place where it is called. Hence, the following:
1 use Config;
2 if ($Config{'osname'} =~ m"nix"i)
3 {
4 require "UnixLibrary.pl";
5 }
6 else
7 {
8 require "NTLibrary.pl";
9 }
does exactly what you might expect, namely checking the operating system and seeing if it is a variant of UNIX. If so, it requires the library UnixLibrary.pl. Otherwise, it assumes that you are using NT (we could make an explicit test here, too: $Config{'osname'} would return 'MSWin32').
This principle can be abused quite readily to make very illegible code. (In fact, the above could be considered an abuse if used irresponsibly.) For example, you could say something like:
1 $library = "mylib";
2 require $library;
Since require is run-time, 'require $library' becomes 'require mylib'. Hence, here we dynamically decide, based on a variable, which library to include.
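The trick can be demonstrated in one self-contained script; here we write a tiny throwaway library to disk first, purely so the example has something to require (the "./" prefix makes require treat it as an explicit path rather than searching @INC):

```perl
# Write a throwaway library, just so the example is self-contained.
my $library = "./mylib.pl";
open(LIB, "> $library") || die "can't write $library: $!";
print LIB "sub hello { return 'hello from mylib'; }\n1;\n";
close(LIB);

require $library;        # the filename is not known until run time
print hello(), "\n";     # prints "hello from mylib"

unlink($library);        # clean up the throwaway file
```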
If you do this a lot, it will become almost impossible to sort through the way you structured your program. The less said about this the better, since the poor soul who has to maintain your code will be cursing your name as he searches through the layers of logic.
require is the older way to include code in Perl, therefore, there are a couple of caveats of which you should be aware. (use overcomes these caveats, so you are advised to learn it.) These caveats are listed below.
Since require does not enforce any type of convention for functions, there is always the chance that the names of functions will collide, or overlap, each other. For example:
lib1.pl:
1 sub get { print "lib1's version of get\n"; }
2 sub put { print "lib1's version of put\n"; }
lib2.pl:
1 sub get { print "lib2's version of get\n"; }
2 sub put { print "lib2's version of put\n"; }
Now what happens when you say something like:
1 require "lib1.pl";
2 require "lib2.pl";
3 get();
Well, the first thing Perl does is open up lib1.pl, and include the functions get and put. But then Perl goes on to open lib2.pl and include lib2's versions of the same functions. This means that when you say 'get()' in line 3, Perl prints out:
lib2's version of get
This is a name collision, which has caused Perl to ignore lib1 and run lib2's version instead. This error can cause a lot of wasted time, and the amount of wasted time increases exponentially as the size of the project grows.
What is the best way to avoid name collisions? Well, the best way is to use namespaces, as we shall see below, so that your functions aren't in 'one big happy family' (i.e.: the main namespace), but in lots of smaller families. The '-w' flag will also warn you when a subroutine is redefined.
However, even then, on large, large projects you will get some name collision. You are best off with the concept of modules, and the keyword use instead.
This is a simple one, but you'd be surprised how many times this can catch you. Look at the following code:
1 if ($var1 !=1 ) { require "lib2.pl"; } else { require "lib1.pl"; }
in which lib1.pl and lib2.pl look like:
lib1.pl:
1 print "This has a Syntax Error!\n
lib2.pl:
1 print "This doesn't have a syntax error!\n";
Now, if $var1 is not equal to '1', then the program will
a) hit the statement 'require "lib2.pl"'
b) parse the code in lib2.pl
c) run correctly (and print out 'This doesn't have a syntax error').
In other words, it will do exactly what you expect, since lib2.pl contains valid Perl code.
However, if $var1 equals 1, then you will trigger the other part of the if clause, and things aren't so simple. What happens? Well, the same three things:
a) Perl will hit the statement 'require "lib1.pl"'
b) parse the code in lib1.pl
c) die with a syntax error. (since lib1.pl has a syntax error in it!)
This can be rather nasty, since the process that you run could go for hours and then, and only then, tell you that there is a problem in your code!
What to do about this? Well, the best thing to do is minimize your use of this form of require. In fact, I would go as far as to say that you should only use require this way when doing portable coding:
if ($UNIX) { require "UNIXLibrary.pl"; } else { require "NTLibrary.pl"; }
In fact, I would take the further precaution of wrapping this up in a use statement, since use statements include code before any of the code is actually run, that is, in the compile step. We shall talk about this quite a bit later on, when we get to use.
In short, require lets you include code just as if the code was typed on the spot: 'require program.pl' opens up the Perl file 'program.pl', inserts it into the spot where require was called, checks it for syntax and then runs it. Hence, it can be useful in cases where you are not sure which library to call. However, in general, use is, well, more 'useful', as we shall see below.
Libraries are a good start for Perl, but in many ways they are too flexible. In particular, since they are loaded at run time, their syntax is checked in the middle of a run. As we saw above, your script could be merrily going along, hit a require statement containing a syntax error in the middle, and stop dead in its tracks. Also, it is too easy to get into bad programming practice with libraries, for reasons stated above.
Hence, Perl version 5 introduced the use statement, and the concept of modules to Perl. Modules are bits of related code which sit together in a specialized compartment, and which follow a semi-strict naming convention which helps the programmer organize his or her code.
This has many advantages over require, as we shall see below. In fact, you should be programming with use about 90% of the time (there will be a couple of places where you will have to resort to require, which we shall mention).
The model for creating modules with use is almost as simple as the one with require. There are just a couple of layers thrown on top:
1) To make a module, write a Perl program that has the suffix ".pm", such as "Module.pm".
2) To use that module, say "use Module" in the program that wants to include it.
So far, so good. In fact, this is exactly the same thing as is the case with libraries up above. However, as said, there are the following quirks which may seem like a pain starting off, but will make your life really easy when you get used to them:
3) The actual program, "Module.pm", must have the header 'package Module;' in front of any code that exists in the module.
4) You may include a special function, called import, in the module, which will be executed at the point of the module's inclusion in a program.
The general form of programming and using modules is shown in Figure 14.3:
fig143.fig
Figure 14.3
Module form in Perl.
Therefore, there are several equivalencies here. The filename of the module (without the ".pm") is equal to the package name of the module, which is equal to the namespace of the module, which is equivalent to the prefix put in front of every function in the module. If you don't have these relationships in your code, it will fail (with an 'empty package' error). One of the simplest modules (with usage) in Perl is:
module Simple.pm:
1 package Simple;
2 sub True { return (1); }
3 sub False { return (0); }
client (client.p):
1 use Simple;
2 print Simple::True(); # (prints 1);
Or, as in the more complicated example:
module SubClass/Class1.pm:
1 package SubClass::Class1;
2 sub True { return(1); }
3 sub False { return(0); }
client (client.p):
1 use SubClass::Class1;
2 print SubClass::Class1::True();
Aside from accessing 'True()' from the namespace Simple (or SubClass::Class1), this is very much like programming a Perl library. And to the casual user, it behaves very much the same as require.
However, there are a few things that set apart use from require, which we discuss below.
As said, require can bite you quite a bit due to the fact that it is done 'too late', after Perl has compiled the script. (This was actually the motivation for a separate 'compile phase' in which you could alter the way the script was run; there was no way to do this in Perl 4.) use makes things much cleaner by following two principles.
As said above, each instance of use is run at compile time. This means a very helpful syntax check of all modules that are included in scripts, before they actually get executed. However, this means one, very misleading thing. Consider the following code:
1 if ($time eq 'now')
2 {
3 use Module1;
4 }
5 else
6 {
7 use Module2;
8 }
This does not mean 'include Module1 if time equals 'now' otherwise include Module2'. Instead, at compile time, Perl rips out each use statement, and then parses the code on the fly, leading to something that executes like:
1 use Module1;
2 use Module2;
3
4 if ($time eq 'now')
5 {
6 }
7 else
8 {
9 }
This is very misleading for the new Perl programmer who is still thinking in terms of require, and is effectively using use as if it were require. Both keywords have their place, but this isn't one of them. Programmers are strongly encouraged to encapsulate all of their require statements inside a use. We shall show how to do this a little later on: it makes the interface for client programs of the module a lot easier.
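If you genuinely want the decision made at run time, require plus an explicit call to import gives the behavior the if-statement above suggests. A sketch, using the standard File::Basename module purely as a stand-in for any module you might load conditionally:

```perl
my $want_it = 1;    # pretend this condition is computed at run time

if ($want_it) {
    require File::Basename;       # loaded only if this branch actually runs
    File::Basename->import();     # do by hand what 'use' does at compile time
}

print File::Basename::basename("/usr/local/bin/perl"), "\n";   # prints "perl"
```

The import() call copies File::Basename's default exports (basename, dirname, and friends) into the current package, just as 'use File::Basename;' would have done at compile time.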
There is another twist here too: the import function. import lets you pass parameters to the use statement, exactly as if it were a function.
import is Perl's mechanism for making what are called directives. A directive is a line of code which modifies the environment that Perl is working in, so that programs which look at the directive are modified to run in a certain way at compile time.
Remember 'use strict'? Well, this uses import internally. When you say 'use strict', Perl calls the function 'strict::import()' at compile time, which then modifies the way that the Perl interpreter parses code. And 'use Config'? This also uses import internally to gather the information about the environment Perl was compiled in, and store it in the hash %Config. Both are directives that you can use to your benefit.
As a simpler example that shows us what is going on, let us implement addition in a kind of an offshoot way, as a directive. Again, the client is the one with the use statement, and the module is the one that implements the package:
client:
1 use Add 2,2;
Module (in Add.pm):
1 package Add;
2 sub import
3 {
4 my ($type, @parameters) = @_;
5 my ($param, $answer) = (0, 0);
6 foreach $param (@parameters) { $answer += $param; }
7 print "The sum is: $answer\n"; # import's return value is discarded, so print
8 }
9 1;
Again, this is a rather strange way of doing addition, but it is a good example of import. When Perl sees 'use Add 2,2' it does two things:
a) parses Add.pm (checks it for syntax errors, imports functions into the current 'code space' etc.)
b) calls the function: Add::import('Add',2,2) which passes the parameters that the use statement called. (Note that the first function parameter is the name of the package being used.)
When you consider that you can put anything into import, you get a very powerful mechanism. Indeed, different import functions, such as the one in the optional module Filter, have been used to modify the syntax of Perl itself on the fly! (We used filter at my work to make the syntax of Perl look more like C++, for example.)
Let's now take a look at actual use samples. You can get more information about these from the documentation itself, but they are so common that you will be constantly using them in the programs you write.
We just mention it here, but we have been using it throughout the book, and no summary of directives would be complete without it. See the chapter on Debugging Perl for more information.
Again, we have used this off and on. It populates a hash named %Config, which is essentially an interface to the file 'config.sh' created when you compile and install Perl. By dumping the keys and values in this hash, you can get a complete list of the elements that Config is aware of.
Here's a sample program which prints out some of the more common attributes:
1 use Config;
2 print $Config{'osname'};  # prints your operating system name.
3 print $Config{'cc'};      # prints the compiler which compiled perl
4 print $Config{'ccflags'}; # prints the options which were used to compile perl
As said, these are the most common values. There are about 447 others you can print out! See the on-line documentation for more detail.
'use Env' is a simple workaround that allows you to pretend that environment variables are actually variables inside your program. It takes the elements of the hash %ENV and makes a scalar for each one of the elements it finds. For example, if you say something like:
use Env;
print $PATH;
then Perl will print out the current path for you, rather than you having to say:
print $ENV{'PATH'};
'use vars' is used when you want to have a global variable and also want to have the strict directive in force in the program. For example, say that you had the following code:
use strict;
$file = 1;
This will fail because $file is a global, and strict doesn't allow globals that are neither declared nor fully qualified with a package name (like $main::file). However, if you say:
1 use strict;
2 use vars qw($file);
3 $file = 1;
then line 2 'pre-declares' $file for you, and this can work. It tells Perl that 'yes, I know that a variable named $file exists, it is not just a typo'.
This is the big kahuna, so to speak, of directives, and people use it quite a bit (too much, actually). It lets you 'export' functions and variables into the namespace of the code that uses your module.
What does this mean? Remember the code which we programmed before (the 'Simple' module)?
module Simple.pm:
1 package Simple;
2 sub True { return (1); }
3 sub False { return (0); }
client (client.p):
1 use Simple;
2 print Simple::True(); # (prints 1);
Well, in line 2 of the client, we said 'print Simple::True()'. We had to fully qualify the namespace to the function that we were using. If you exported this function instead, you could say:
1 use Simple;
2 print True();
without knowing that 'True()' was inside 'Simple'.
How to do this? Well, Perl provides Exporter to do this very thing. You say:
module Simple.pm
1 package Simple;
2 use Exporter;
3 @ISA = qw(Exporter);
4 @EXPORT = qw(True);
Don't worry about the mechanics of how this works for now. All lines 2-4 are saying is 'Export the function True into any place that says "use Simple".' This is called 'exporting by default'. When you say:
use Simple;
in your script, what Perl does is load up Simple, and then magic happens, copying the True function into the namespace where 'use Simple' is called. Don't worry, we will get to the magic later when we talk about inheritance.
Anyway, 'exporting by default' is not necessarily the best thing to do. There are two things that are pretty nasty about it.
1) You can forget about where your functions are coming from. There is something supremely satisfying about seeing 'Simple::True()' instead of 'True()', because you know exactly where to look in case of problems.
2) Since there is a lot of stuff going on here (inheritance, globbing, using the import function, etc.), you are best off knowing how it works before actually using it. We treat Exporter as a good example of inheritance in the chapter on that subject, so you might want to read it before using Exporter.
The documentation has a lot more in the way of examples on 'use Exporter'. You don't need to necessarily 'export by default', you can export by choice, give a list of patterns to export (or not export) export variables, and so on. Go to the documentation for more information.
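Exporting by choice can be sketched in one file. @EXPORT_OK lists the names a client may ask for, and a client would normally write 'use Simple qw(True);'; here we call import directly instead, which is exactly what use does behind the scenes, so the example is self-contained:

```perl
package Simple;
use Exporter;
@ISA       = qw(Exporter);
@EXPORT_OK = qw(True False);    # exported only if the client asks by name

sub True  { return(1); }
sub False { return(0); }

package main;
# In a real client this line would read:  use Simple qw(True);
Simple->import('True');
print True(), "\n";             # prints 1: True was copied into main
```

False stays behind in the Simple namespace, since the client never asked for it; it is still reachable as Simple::False().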
OK, now let's supplement the use examples from the distribution with some of our own. Let's take a look at three more simple examples applying this to writing modules and using them in code.
First of all, lets take the 'word translation' library we wrote up above, and convert it into a module. We call this file 'WordTrans.pm':
1 package WordTrans;
2
3 sub anagram { return(scalar reverse($_[0])); }
4
5 sub rot13
6 {
7 my ($word) = @_;
8 $word =~ tr"A-Za-z"N-ZA-Mn-za-m";
9 return($word);
10 }
11 sub piglatin
12 {
13 my($word) = @_;
14 my (@letters) = split('', $word);
15 my ($first, $rest) = ($letters[0], join('', @letters[1..$#letters]));
16 return($rest . $first . "ay");
17 }
18 1;
Note two things here. One, the code is exactly the same. We have simply taken our three subroutines (anagram, rot13, piglatin) and cut-and-pasted them into WordTrans.pm, our module file. Second, note the 'package WordTrans;' statement at the beginning and the '1;' at the end of the file. The package statement separates your functions into their own namespace, and the '1;' is required because use and require insist that a module return a true value to show that it loaded successfully.
When we actually get around to including our module into a program, we will call these functions differently. The client, or program that uses the module, will need to put 'WordTrans' on the front of any functions that it uses:
use WordTrans;
my $word = WordTrans::rot13('theseus');
print $word;
Instead of 'require "WordTrans.pl"', we have 'use WordTrans'. We are assuming that the module is called WordTrans.pm and that the functions we include are all in the WordTrans namespace.
Second, let's take a look at an example of import: the function that Perl calls when a module is first use'd, at compile time. Here, we will implement a simple version checker in Perl. The problem it solves will probably already be familiar to you if you have ever worked on a software project bigger than two people.
The problem is keeping things in sync. If you are using a module that has function A, there is no guarantee that previous versions of that module also have function A.
In fact, if a script of yours is using version 1.005 of the module, and it happens to ooze its way into a place which only has version 1.004 (which doesn't have function A), then the users of this module will get the rather rude message:
Undefined subroutine A at line ...
and the users of your module will have to go through your code to see what is going on (all the while thinking unpleasant thoughts about you!). Not good. We will substitute this message with:
Version 1.005 required! You only have version 1.004! Please get the latest version from (....) (source name)
How to do this? Well, with the import function it is straightforward. Assume our module is called Widget. We want to block any code that is older than 1.005, so we say so inside our client:
client.p
use Widget 1.005;
Now, the '1.005' is passed to the import function as an argument, and import is executed when the Widget module is first use'd. To get the behavior that we want, we program Widget like so:
package Widget;
my $_actualVersion = 1.004;
sub import
{
    my ($type, $neededVersion) = @_;
    die "The module $type requires a version!\n" if (!defined($neededVersion));
    if ($neededVersion > $_actualVersion)
    {
        die "Version $neededVersion required! You only have version " .
            "$_actualVersion! Please get the latest version from (....) (source name)\n";
    }
}
1;
This mechanism can be added to any package to get this useful behavior: it is pretty much a universally applicable function. In fact, when we get to the Exporter, we shall see a standard way of handling revisioning.
This is a very generic example (in fact we will cover it more deeply when we get to polymorphism), but it is a good example of how you can get the bulletproof nature of use, and still maintain the flexibility of require.
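As a preview of that standard mechanism: it boils down to giving your package a $VERSION variable. When a client says 'use Widget 1.005;' and leaves the version handling to Perl, Perl calls Widget->VERSION(1.005), and the built-in VERSION method dies on its own if $VERSION is too old. A sketch (Widget and its version number are invented, as above):

```perl
package Widget;
$VERSION = 1.004;      # the one variable the built-in check examines

package main;
# 'use Widget 1.005;' calls Widget->VERSION(1.005) at compile time.
# We call it through eval here so we can show the message it dies with:
eval { Widget->VERSION(1.005) };
print "caught: $@" if $@;    # e.g. "Widget version 1.005 required..."

eval { Widget->VERSION(1.003) };
print "version 1.003 is satisfied\n" unless $@;
```

So in most cases you do not need to write a version-checking import at all; setting $VERSION is enough.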
Consider the simple example that we had above with require:
1 use Config;
2 if ($Config{'osname'} =~ m"nix") { require "UNIXMailLib.pl"; }
3 else { require "NTMailLib.pl"; }
Supposedly UNIXLib.pl and NTLib.pl are pieces of code which do the same thing on different platforms, and the above is a workaround to get a piece of code to run on both.
However, there are a couple of drawbacks to the above code. One, it does not shield the user from the details of how the code is working. We see the 'gunk', the internal glue, which makes the code work.
Second of all, it suffers from the drawbacks of require that we discussed before: namespace pollution, and the fact that it is executed at runtime. The second drawback is particularly irksome since, as we said, we may not be warned of any errors in our programs until the offending code is actually hit.
Hence, it would be natural for us to want to stuff all of this messy code into a module, such that one would say:
use Mail;
and get the behavior that we want. With use, and the import function in use, this is easy:
package Mail;
use Config;
sub import
{
    if ($Config{'osname'} =~ m"nix") { require "UNIXMailLib.pl"; }
    else                             { require "NTMailLib.pl"; }
}
1;
In fact, we may want to make this error checking more stringent. Suppose that, in the same example, our script depends on a couple of executables being installed. Since we are working with pseudo-code that implements a Mail module, let's use this as an example. We can take the above code and augment it with a couple of checks for code that we need:
1 package Mail;
2 use Config; # module to get configuration details
3 use Carp; # module to get better debugging output
4 sub import
5 {
6 if ($Config{'osname'} =~ m"nix")
7 {
8 if (`which elm` =~ m"not found") { confess "Couldn't find elm\n"; }
9 require "UNIXMailLib.pl";
10 }
11 else
12 {
13 require "NTMailLib.pl";
14 }
15 }
16 1;
Here, line 8 checks to see whether the executable we use on UNIX (elm) is installed, and 'confesses' (the process spilling its guts, telling exactly when and where it died) if it is not there; line 13 could be guarded with a similar check for whatever mailer we use on NT.
As you can see from the examples above, use can be extremely powerful, providing a lot of extra functionality that require does not even approach. However, a lot of longtime Perl programmers take a while to get familiar with use, out of habit or stubbornness. Whatever the reason, there is one simple way to remember how use is different from require: they are almost exact opposites.
require is executed at runtime, and use is executed at compile time. This makes it so that code forms such as 'if ($a) { use A; } else { use B; }' don't work.
require has no naming conventions to follow to make it work. use needs the module name in use <ModuleName> to be equal to the file name (plus ".pm"), which is equal to the package name.
require simply imports the code, 'no questions asked'. use allows you to write an import function which can do anything Perl allows, interrogating the environment the code is running in before actually importing anything.
In other words, use makes you go through more hoops to get things done, but in doing so it makes your job a lot easier, and makes your code more scalable. Hence you should be using require only in specialized circumstances.
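The compile-time half of this contrast is easy to see for yourself. Each 'use' amounts to 'BEGIN { require Module; Module->import; }', so a bare BEGIN block can stand in for it; it runs during compilation, before the first ordinary statement, no matter where it appears in the file:

```perl
# the runtime statement appears first in the file...
push @order, 'runtime statement';

# ...but the BEGIN block (where every 'use' effectively runs) fires first:
BEGIN { push @order, 'BEGIN (use happens here)'; }

print join(" -> ", @order), "\n";
# BEGIN (use happens here) -> runtime statement
```

This is also exactly why 'if ($a) { use A; }' cannot work: the use fires while the file is still being compiled, long before the if is ever evaluated.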
The above concepts (require, use, and the namespace) are the logical elements that make Perl able to process libraries and modules. You write a module, and then require it into your program, or use it. By doing so, you populate the variables and functions which make your program go.
However, there is another side of the coin: your modules and libraries need to come from somewhere. They are stored on disk, then need to be loaded into memory, and parsed before they can be used by the Perl interpreter.
All of these are physical concerns, and they account for a lot of the overhead in actual projects.
Where you actually get the code to be loaded into memory can mean all the difference in the world: was it a test version that you loaded? Or a pre-production version? Has the code that you have loaded gone through quality assurance? And so forth.
I have actually spent the better part of some days tracing down an elusive bug in Perl code, only to find that, hey, I was using the wrong version of the module, was running from the wrong machine, or even was using the wrong version of Perl.
These errors can trip you up time and time again, and Perl provides several facilities to help you avoid them. This section goes over some of those facilities. Read them over carefully, and we believe that you will come to less grief as time goes on.
The best way to avoid problems with loading the wrong libraries or modules is to understand how Perl actually finds the libraries that it is to load into memory. We cover this in detail below.
The central variable here is called @INC (short for include) and if you remember anything out of this section, remember this.
@INC is an array provided by Perl which tells Perl (and you) the directories to search to find libraries and modules. 90% of your problems with version mis-management can be solved by this simple statement put at the beginning of your program:
print "@INC\n";
This gives you a list of places that Perl is looking to find your libraries. Here is the central rule about @INC that you should remember: Perl goes through each of the directories in order looking for the module or library that you are including. If it cannot find it in one directory, it goes to the next.
Printing out @INC shows exactly which order Perl is taking to get the libraries.
This seems to be confusing for some, so let's go further into an example on the way Perl actually does this lookup, and a session in trouble-shooting.
Suppose you had the simple client that looks something like:
client1.p
use Devel::Peek;
and you find out that for some reason, Perl is saying something like:
Can't locate Devel/Peek.pm in @INC at line ...
Well, the first thing to do is comment out the offending line (use Devel::Peek) to get a 'working' program, and add 'print "@INC\n";' to see the include path as the script sees it, getting something like:
/usr/local/lib/perl5 /usr/local/lib/perl5/sun4-solaris . /home/ed/WORK/Devel
Good enough. The second thing to do is print out what is in each of those directories:
/usr/local/lib/perl5: <Bunch of files, no Devel directory>
/usr/local/lib/perl5/sun4-solaris: <Bunch of files, no Devel directory>
.: <Bunch of files, no Devel directory>
/home/ed/WORK/Devel: Peek.pm, SelfStubber.pm
Now, in tracing this down, what we do is 'follow in Perl's footsteps'. Perl goes through the following machinations:
1) Perl looks in /usr/local/lib/perl5, and tries to find /usr/local/lib/perl5/Devel/Peek.pm.
2) It fails, so it goes to the next directory (/usr/local/lib/perl5/sun4-solaris) and tries to find /usr/local/lib/perl5/sun4-solaris/Devel/Peek.pm.
3) It fails here, too, so it goes to the next directory (., or the current working directory) and tries to find ./Devel/Peek.pm.
4) This fails, so it finally goes to /home/ed/WORK/Devel to find /home/ed/WORK/Devel/Devel/Peek.pm.
5) It finally gives up and registers an error.
Note one important thing here. Perl does not find 'Peek.pm' inside '/home/ed/WORK/Devel'. Why? Because the directory '/home/ed/WORK/Devel' is the root directory that Perl tries to match. It sticks Devel/Peek.pm on the end of the root, to get:
/home/ed/WORK/Devel/Devel/Peek.pm
So, in solving the problem, we notice that there is an extra 'Devel' on the end of our @INC entry which is causing mischief. The simple solution, therefore, is to change @INC to include /home/ed/WORK instead of /home/ed/WORK/Devel.
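The lookup itself is simple enough to sketch in a few lines of Perl. This hypothetical helper (find_module is our own invented name, not a built-in) does the same thing Perl does: translate the '::' separators into directory separators, tack '.pm' on the end, and try each root in order:

```perl
sub find_module
{
    my ($module, @roots) = @_;
    (my $relative = $module) =~ s{::}{/}g;   # Devel::Peek -> Devel/Peek
    $relative .= ".pm";
    foreach my $root (@roots)
    {
        my $candidate = "$root/$relative";   # e.g. /home/ed/WORK/Devel/Devel/Peek.pm
        return $candidate if -f $candidate;  # found it; stop looking
    }
    return undef;                            # "Can't locate ... in @INC"
}

# for example, search the real @INC for the strict pragma:
print find_module('strict', @INC), "\n";
```

Run against the broken @INC above, find_module('Devel::Peek', '/home/ed/WORK/Devel') would look for /home/ed/WORK/Devel/Devel/Peek.pm and return undef, which is exactly the failure we just traced.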
For those of you who like logic diagrams, Figure 14.4 shows the process in pictorial form.
fig144.fig
Figure 14.4
Perl's Process of Including Libraries
This is fairly straightforward if you understand "Perl-think". Learning include paths like Perl's will pay off tenfold (in other compilers, tools, etc.) It is quite a common design tactic in computer science.
@INC is fairly well known by people who program Perl regularly, but %INC is not. This is unfortunate, because %INC can be used to track down problems that would take a lot longer to track down with @INC alone.
If @INC contains the list of directories that Perl searches for modules, then %INC contains the list of modules that Perl has actually loaded, mapping each module file to the place it was loaded from. For example, the following code:
use Benchmark;
foreach $key (sort keys(%INC))
{
print "$key => $INC{$key}\n";
}
will print out:
Benchmark.pm => /usr/local/lib/perl5/Benchmark.pm
assuming that '/usr/local/lib/perl5' is the first directory in which Perl finds Benchmark.pm. (Note that the key is the file name Perl loaded, 'Benchmark.pm', rather than the module name.)
As you can see, this can be extremely helpful. For one thing, it can save you the trouble of searching through the include path (@INC) to find a library that is out of sync; more importantly, it shows exactly what Perl sees, so there is no chance of human error in tracing down these problems.
It also has one more benefit, one which you get without any effort on your part, but for which you will be grateful (trust me!). Take the following code:
for ($xx = 0; $xx < 10; $xx++)
{
require "lib1.pl";
}
On the face of it, this code would cause the library lib1.pl to be included several times. In most languages, this would cause severe problems. C++, for instance, has 'compile guards' which are put around all header files. Something like this:
#ifndef HEADERA
#define HEADERA 1
#include "headera.h"
#endif
This is essentially a hack to prevent a header from being included several times (and getting 'xxx redefined' errors!).
Perl does not need this. Instead, each time a Perl library is included into your script, Perl records the fact inside %INC. The next time that library is encountered in require or use, Perl recognizes that fact, and stops right there! Hence, no problems, and a vast saving of time and resources.
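You can watch this happen. The sketch below writes a tiny throwaway library to disk ('counter.pl' is an invented name) whose code increments a counter each time it actually runs, then requires it three times; thanks to %INC, the counter only ever reaches one:

```perl
# create a library whose code bumps a counter when it actually runs
open my $fh, '>', 'counter.pl' or die "can't write counter.pl: $!";
print $fh '$main::loaded++; 1;';
close $fh;

for my $attempt (1 .. 3)
{
    require './counter.pl';    # only the first require actually loads it;
}                              # after that, %INC short-circuits the rest

print "library code ran $main::loaded time(s)\n";   # ran 1 time(s)

unlink 'counter.pl';           # clean up the throwaway file
```

After the first require, you would find the key './counter.pl' in %INC, which is exactly what Perl checks before loading.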
Anyway, let's use %INC to solve another resource problem. Suppose that you have written a script, and get several weird errors, such as:
Undefined subroutine &File::Path::seek called at ... line ...
Now, File::Path is a module that you have written to do basic things like opening a bunch of files (say, in a 'tree' format), and when you open up your file 'File/Path.pm', you do see the subroutine named seek. Furthermore, the directory containing your 'File/Path.pm' is in @INC when you start Perl.
After a couple of minutes of double-checking yourself, your first reaction should be that this is a path problem. That somehow, you are including the wrong module into your program (Perl doesn't lie, after all!).
Therefore, you should insert the following line into your program:
foreach $key (sort keys(%INC)) { print "$key => $INC{$key}\n"; }
This is exactly like we did before, except now that we run the program, we see (in the midst of the output):
File/Path.pm => /usr/local/lib/perl5/File/Path.pm
So, at a glance, you see that somehow '/usr/local/lib/perl5' has a module 'File/Path.pm', and it is being found before the one in '/home/ed/WORK'. What has happened? You have gotten unlucky and accidentally given your module the same name as one included in the central distribution! The solution? Rename your module File/MyPath.pm (and change the package statement inside it), and everything compiles successfully.
In short, learning to use %INC (and @INC) can mean hours of difference in effort. Problems like the one described above are fairly common, especially in large projects, and tracking them down can be trivial or difficult, depending on your knowledge.
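One habit that heads this problem off entirely: before settling on a new module name, ask Perl whether the name is already taken. An eval'd require plus a peek at %INC answers in a few lines (File::Path really is in the standard distribution, which is exactly how the collision above happened):

```perl
my $candidate = 'File::Path';
(my $file = $candidate) =~ s{::}{/}g;    # File::Path -> File/Path
$file .= '.pm';

eval "require $candidate";               # try to load it from @INC
if (!$@ && $INC{$file})
{
    print "$candidate is already taken: $INC{$file}\n";
}
else
{
    print "$candidate appears to be free\n";
}
```

Running this before you create 'File/Path.pm' of your own would have told you immediately that the name was spoken for.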
The variable @INC is such an important beastie to Perl that there are several different ways of setting its value. Each way has its advantages and disadvantages, and you should be aware of them all (especially when debugging projects!).
Unlike most variables in Perl, the default value of @INC is not blank. Instead, it is a value set at the moment Perl itself is compiled. If you look inside the config.sh file that comes with your Perl distribution, you will see this value. (Or, probably easier, the one-liner perl -e 'print "@INC\n"' will show the same thing.)
This path points to the place where the Perl installation has put all of the libraries that came with the standard distribution (see the namespace diagram up above).
The first way you should know about setting @INC is via the environmental variable PERL5LIB. This is a good way to set the environment for the purposes of a cron job, or to set the environment temporarily so you can test out new code.
If you say something like:
prompt> set PERL5LIB=my_path
on Win32, or:
prompt> export PERL5LIB="my_path";
on ksh, then you prepend 'my_path' to the @INC variable. Hence, if @INC was:
"/usr/local/lib/perl5", "/usr/local/lib/perl5/sun-solaris"
it becomes:
"my_path", "/usr/local/lib/perl5", "/usr/local/lib/perl5/sun-solaris"
with 'my_path' being the first place that Perl will search for libraries.
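You can verify the prepending without leaving Perl: set PERL5LIB and ask a child perl what its @INC looks like. (The path here is made up, and the backticks assume a UNIX-style shell; $^X is the path of the perl binary running the script.)

```perl
local $ENV{PERL5LIB} = '/tmp/my_path';       # an invented path

# spawn a child perl and ask for the first entry of its @INC
my $first = `$^X -e 'print \$INC[0]'`;
print "first entry in the child's \@INC: $first\n";
```

The child reports '/tmp/my_path' first, followed by the compiled-in defaults.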
Setting PERL5LIB does have some drawbacks however. Since you are setting the environment rather than putting your instructions into the code itself, you have to be very careful when moving between environments. Forgetting to set PERL5LIB is an easy thing to do, hence caution is advised.
'use lib "my_path"' is the second way to set @INC, and is probably the best, most stable way of doing it. It has the following benefits:
1) it is done inside code, hence you see exactly what is going on.
2) it works well with some Perl internals (in particular, MakeMaker, which you can read about in the Perl on-line documentation.)
3) it uses a module, so your code will automatically get the benefit of any further enhancements to the @INC mechanism.
When you say:
use lib 'my_path';
at the beginning of any code, Perl does exactly the same thing as with PERL5LIB: it takes 'my_path' and prepends it to @INC.
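A quick sketch of the effect, again with a made-up path:

```perl
use lib '/tmp/my_path';    # prepended at compile time, before any 'use Module' below

print "first entry: $INC[0]\n";    # /tmp/my_path
```

Because use lib runs at compile time, any 'use MyModule;' lines later in the same file already see the new path.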
Finally, for completeness, we mention that you can set @INC directly. But to tell the truth, it is not such a hot idea. You can say
BEGIN { unshift(@INC, "my_path"); }
which does the same thing as both PERL5LIB and:
use lib "my_path"
but it is unclear, ugly, and non-encapsulated. (i.e.: it shows the guts of the logic rather than hiding the details).
So do yourself a favor and stick to the two other methods!
@INC is the one major variable that Perl uses to piece your script together out of the several modules and libraries that are require'd and use'd by your program.
Perl looks at @INC a directory at a time, trying to find modules. When it finds them, it notes the fact that it has found them in the hash %INC.
As @INC is so important, there are several methods for setting @INC:
1) by default, Perl looks in the locations that were configured when Perl itself was built.
2) by PERL5LIB, an environmental variable. This prepends a list of directories to @INC.
3) by saying 'use lib "path"' which changes @INC at compile time.
Now is the time to take a look at how you might set up an environment in which you can have different stages of Perl code development. There are some issues here which are outside the scope of this book (source control being the big one), but right now we are interested in the issues that have to do with code: how we can program for scalability.
We have four such areas at our work right now:
DEVELOPMENT: this is where the majority of the brainstorming, new ideas, blunders, and other associated mishaps happen with new code development.
TEST: this is where the first swag of testing comes in, where the ideas that were first expounded in DEVELOPMENT get their trial by fire.
PRE-PROD: this is where the code gets released to a larger audience, and it gets a chance to 'cool down' with a large scale (and usually lengthy) test, a staging area before the next environment (PRODUCTION). This stage is also sometimes called the integrated test area.
PRODUCTION: a final, clamped down version of code which is supposed to be bulletproof.
The general plan of attack we are going to have in this example is to set up a different directory tree for each of the environments that we are working in. Figure 14.5 shows how it will all look:
fig145.fig
Figure 14.5
Different Perl Environments
Each, different directory tree represents a complete picture, in which scripts can run using the modules that are associated with it. In this simplistic model we have four different directories:
1) modules/ This will be where all of the 'in house' modules will reside, the ones that we will program. This is where we will set up our code tree.
2) perl_extensions/ This will be where the Perl extensions which come from the net will reside.
3) scripts/ This is where the scripts that we will program will reside, the ones which use the Perl extensions and modules.
4) perl_source_code/ This is where the Perl source code (the actual source for Perl) will reside.
It is a good idea to keep track of modules and scripts with these four directories, although you may want to have a system administrator keep track of perl_extensions and perl_source_code.
However, if you keep track of your Perl extensions and Perl source code, you can have complete confidence and control that your scripts will always work. (Well OK, not 'complete confidence'; the operating system or any supporting tools that you use might change.)*
***********begin footnote********
Anyway, if you choose to keep track of your Perl source code, what you can do is install your own Perl binary inside the tree by choosing the correct installation command for Perl. (We don't have room to talk about it here, but the CD that comes along with this book has some sample scripts showing how to set this sort of thing up, as well as a sample setup. If you have any problems with this, drop me an email and I'll try to help. Or post to a Perl newsgroup.)
***********end footnote**********
The whole point of this is that you want to minimize the number of variables which can impact the scripts, and you definitely don't want variables like these to cause failures when people in a production environment are counting on your scripts.
Setting up different environments is not free, however. There are three issues when we set something like this up:
1) change control, and populating the different environments. This is the source code control issue I was talking about. You want a way to test your DEVELOPMENT code before releasing it to TEST, a way to release it into PREPROD, and so on. There are several good source control tools out there, CVS and Perforce being two of them. Or, you can write a Perl wrapper to handle your source control!
2) code changes to make the environments work. The scripts that you write need to have some way to find the modules that they are going to use. For example, if you say 'use MyModule;' you have to make sure that 'MyModule' is in a directory that is in @INC.
3) synchronization issues. Making sure that production scripts don't use TEST, or DEVELOPMENT modules. This can be a horror, as alluded to before. If a production script uses a development module, then it may not be compatible with the production world, and things will break all over the place.
Now, as we said, issue one is outside the scope of this book. But issues two and three are different. In fact, there are very helpful directives in Perl that can ensure stability in scripts and minimize these two types of problems.
Recognizing these problems, here is one possible solution. Let's call the module that we write Sync, for synchronization, and list it below:
Listing 14.1 - Sync.pm
1 package Sync;
2 use Carp;
3
4 sub import
5 {
6 my ($type, $place) = @_;
7 # no BEGIN block is needed here: import itself already runs
8 # at compile time of the script that says 'use Sync ...'
9 my $ROOT = ($place eq 'TEST')? "/code/tree/TEST":
10 ($place eq 'DEVELOPMENT')? "/code/tree/DEVELOPMENT" :
11 ($place eq 'PRODUCTION')? "/code/tree/PRODUCTION" :
12 ($place eq 'PREPROD')? "/code/tree/PREPROD" :
13 ($place eq '')? confess "You need to provide a type of environment!\n" :
14 confess "Unknown place $place!\n";
15 my $script = $0;
16 if ($script !~ m"$ROOT")
17 {
18 confess "The script $script needs to be running under the tree $ROOT since it is a $place type of script!\n";
19 }
20
21 unshift(@INC, "$ROOT/modules"); # 'use lib' would run too early, at Sync.pm's own compile time
22 }
23 1;
Pretty simple logic, but following a strategy like this can help you out a great deal. Lines 4-22 are the import function, which will be called at compile time, before any of your script proper is run. In lines 9-14, we figure out, based on what was passed to this module, which library tree we are to use: TEST, DEVELOPMENT, PRODUCTION, or PREPROD. If import gets passed TEST, then the $ROOT under which we locate our modules becomes:
/code/tree/TEST
Continuing, line 15 gets the full path to the script (i.e.: the name of the script that actually includes this module) and lines 16-19 check to make sure that the name of the script is under the same tree as the modules that it includes. In other words, if the script you are running is:
/code/tree/PRODUCTION/scripts/my_script.p
and since $ROOT has become
/code/tree/TEST
the test
'/code/tree/PRODUCTION/scripts/my_script.p' !~ m"/code/tree/TEST"
is true, and flags the mismatch (the script being run is not inside the directory /code/tree/TEST), and the script dies (confesses) because of it.
Therefore the code consists of two parts: one that figures out what the root is going to be, and one that flags potential problems with the code that one might encounter. And the usage?
Well, each script that you put into this system should have something like the following lines:
use Sync "$ENV{USER_TYPE}";
# .... code comes here
In this case, we tell from the environment what type of modules to use. This comes from the simple observation that most of the time, people are developers, testers, or users of the code that you write. Hence, when people log on, you could set the environmental variable USER_TYPE to say which type of user they are, and have Sync import the right modules automatically.
Why go through all of this trouble? Well, figure the alternative. If you say something like:
use Sync 'TEST';
# .... code comes here
then you pretty much condemn your code to TEST hell for the rest of eternity. On the other hand, you could edit the code each time you move it from TEST to PREPROD, and PREPROD to PRODUCTION (or even have a Perl script do it), but you really want to avoid this type of magic. Auto-changing code is, most likely, a maintenance nightmare.
If you don't like the environmental-variable idea, there are others you could try. People in test usually work from a certain type of machine, or are in a certain group. You could auto-magically figure out which type of code to load from these sources, or perhaps others.
In fact, some people take something like this a step further. Just food for thought, but we have seen something (done in Perl) like Figure 14.6 before.
fig146.fig
Figure 14.6
Personalized environments
Yes, that's right, there are a lot of arrows, and it can pretty much be done only in a UNIX environment (last time I checked, NT didn't have links). The trick is that each person has their own personal environment, which has links to the centralized environments. That way, when they check something out, everybody sees that it is checked out. And when someone puts something back, everybody sees it put back, so everyone is instantly affected.
This scheme has, I admit, some maintenance involved, but if you get it working, it can do wonders. Each person has their own 'playspace' in which they can do anything without affecting any other people. And if you do it right, the speed of code change gets a lot faster and goes a lot smoother.
We have covered a lot of territory here. Your basic weapons in the fight for scaling up code are the namespace (and the keyword package to implement namespaces), the library (with require), and the module (with use). use should be highly preferred over require, since use gives you some freebies in debugging your code and is generally more scalable. There are times when you will need require (such as for portability issues), but these should be rather rare.
In the next chapter, we shall go through a lot more examples of modules and directives, and in a lot of detail. Perl has a lot of simple, useful modules you can write to manage projects amongst other things, and we shall go through how to spot them and program them.