© 1997 The McGraw-Hill Companies, Inc. All rights reserved.
Any use of this Beta Book is subject to the rules stated in the Terms of Use.

Chapter 23: Perl Debugging Tips

Perl is free-form compared to other languages, and the programming process in Perl tends to be free-form as well. When a Perl script is running, it is also being compiled.

Likewise, when you are designing your application you are inevitably doing some of the programming of it at the same time. The usual Software Development Life Cycle (SDLC) stages tend to blur and/or become compressed with Perl, and you tend to do more than one part of the cycle at once. Iterations of the "gather requirements, plan, design, program, test and implement" stages tend to come more closely together than in most other languages.

The same is true for debugging Perl programs. We covered the more 'ordinary' tools in the Perl programmer's toolbox in the last chapter: debugger, profiler, coverage tester, compiler. But even here, the 'ordinary' tools weren't ordinary at all, for they were built out of Perl itself.

This chapter goes one step further and covers some very Perl-specific debugging techniques that will make your programming life much easier. The purpose of this chapter is to go over those techniques in detail so you can get the most out of your programming time.

You will definitely want to read this chapter if you are unfamiliar with large-scale Perl projects. This chapter covers the gamut of debugging tricks that are not in the on-line documentation, along with examples of their use in real life.

Chapter Overview

Each programming language has its own tips and tricks to get the maximum efficiency in debugging the language, and Perl is no exception. Perl is just fairly unique in that many of the tips and tricks are integrated into the language itself. Since Perl makes it possible to program your own warnings and errors in modules and objects, the line between programming and debugging gets even thinner.

Hence, this chapter is going to be a bit of a grab bag. We discuss lots of the 'hooks' or 'tricks' that have surfaced, some of which are very powerful. The chapter consists of the following sections:

1) Tips for error free programming. Including information such as how to interpret Perl error messages, learning to program Perl with style, and so on.

2) Perl safety guards - including 'use strict', 'use diagnostics', '-w', and the new Lint module (that comes with the compiler)

3) Stack Tracing Code - including 'use Carp' and all of its associated functions

4) Debugging on the Fly - Exception handling, and how it is covered in Perl

5) Data Debugging - using Data::Dumper and Tie::Watch effectively to find data problems.

6) The '-D' option to Perl - what '-D' covers, how to use it, and using the '-D' option to debug regular expressions

7) Programming your own debugging modules - making a module to capture and retain warnings and fatal errors, and how to catch programming errors via a module which makes a class more strict.

As you can see, this is a rather scattershot group of topics. So let's take a moment to get oriented on how all of this information actually fits together.

Orienting Yourself to Debug-Programming in Perl

Debugging in Perl is really a craft. With a little effort, you can get a lot more useful results than with languages such as C or C++, and hence, can make more powerful programs, faster than usual.

I have found when programming that the more information available about the environment that you are working in, the quicker you can track down problems. In many ways, this is a self-evident principle.

Anybody who has programmed in assembly language knows that it is a major pain when your program says something like 'Stack Fault in 0x1233fd'. This is a bare-bones error which tells you absolutely nothing about the surrounding conditions. Did you misalign the stack? Or perhaps there was an overflow in one of the variables?

This error could signal a million different problems, and you have nothing but your own noggin' to figure out which of these possible causes it could have been. Same thing goes with the mysterious 'General Protection Fault' error that every single Windows user has experienced. What exactly does 'General Protection Fault' mean? One of a trillion different things that could be wrong with:

Windows (the OS itself)

the Application layer

the memory stack

Furthermore, since you probably don't have the source code to Windows, there is no way of telling where the error occurred. Good luck on figuring this out!

The key to effective debugging then, is information. When something goes wrong, you want to have information about:

The exact place in the program which has the error

The surrounding conditions that made the program have an error

The state of the data when the program had an error

Since Perl's specialty is manipulating text, you can get tons of information about any error that might occur. It is a simple question of organizing the information that you receive.

Now, of course, the first thing you have to do is make your program parse in Perl. That is the point which we shall turn to next.

Tips for error-free programming

Let's assume that you have gotten a program written and need to make it run. What points should you be aware of to make this job easier?

Perl's Error Messages

Point #1: Perl is generally correct when it tells you where your programs have errors, but not always.

The accuracy of Perl in pinpointing where your errors are is incredible, given how complicated the language is. Nonetheless, Perl does make mistakes. Take the following program:

print FD "Statement1

print FD "Statement #2"

Here, Perl correctly diagnoses the problem, a missing closing quote on the first string, and prints:

Bare word found where operator expected at a.p line 4, near "print FD "Statement2"

(Might be a runaway multi-line "" string starting on line 3)

(Do you need to predeclare print?)

syntax error at a.p line 4, near "print FD "Statement2"

Can't find string terminator '"' anywhere before EOF at a.p line 4.

However, in this admittedly contrived program:

1 #!/usr/local/bin/perl5

2 $line = s' ';

3 $print = 1;

4 $summation = 2;

5 $line = ';1';

we get the error:

String found where operator expected at a.p line 6, at end of line

(Missing operator before ';

?)

Can't find string terminator "'" anywhere before EOF at a.p line 6.

even though the error is truly in line 2. Perl has been fooled into thinking that you intended everything from the s' on line 2 through the first single quote on line 5 to be a single substitution:

1 #!/usr/local/bin/perl5

2 $line = s' ';

3 $print = 1;

4 $summation = 2;

5 $line = ';1';

To Perl, it therefore looks like there needs to be another single quote at the end of the whole thing. Likewise, when there are missing brackets:

1 #!/usr/local/bin/perl5

2 sub a

3 {

4 print "Subroutine missing bracket!\n";

5

6 sub b

7 {

8 }

Perl gives the same type of error, diagnosing that you need a 'closing bracket at the end of line 8', even though the error really starts at line 3.

There are two ways/tricks of finding exactly where these errors are:

First, if you have an editor like emacs or vi, and are getting an 'unmatched parenthesis/bracket' error, you can use the editor to 'bounce' the parentheses against each other (in vi this function is the % key, shift-5; emacs has an 'auto bounce' mode). Start at the top and bounce each bracket against its partner, looking for mismatches.

Second, you can stick __END__ in the appropriate places, until the error disappears. Take the second problem above, with the mismatched brackets:

1 #!/usr/local/bin/perl5

2 sub a

3 {

4 print "Subroutine missing bracket!\n";

5

6 sub b

7 {

8 }

Now, in order to figure out where the error is, we can insert __END__ at specific points:

1 #!/usr/local/bin/perl5

2 sub a

3 {

4 print "Subroutine missing bracket!\n";

5

6 __END__

7 sub b

8 {

9 }

When we compile this changed program, it says the error is at line 6, instead of line 9. This means that the error must be above line 6. When we move the __END__ up past the next bracket that could possibly mismatch, to just after line 1, we get:

1 #!/usr/local/bin/perl5

2 __END__

3 sub a

4 {

5 print "Subroutine missing bracket!\n";

6

7 sub b

8 {

9 }

And voila, the error disappears! Hence, the missing bracket must be between lines 1 and 6. We can use this information to narrow down our area of search.

Point #2: If you are getting a weird error, the first thing you should do is go to perldiag.

Let me give you a little advice, although nobody ever really follows it until they get burnt. Perl has a wonderfully useful man-page called perldiag. It gives all the errors that you would ever hope to come across with the Perl interpreter. It even includes errors that occur when you run a Perl script but are not the fault of Perl itself.

perldiag is an absolute treasure trove of debugging tips that you should consult often. To show how useful it has been to me, here is an example in which I had a problem that could have been easily solved with perldiag:

Someone asked me why a certain Perl script, wrapped up in a shell script, wasn't working. It was giving the error message:

Can't execute perlscript.p

when 'execScript' was typed on the command line (execScript was the shell script wrapper). The first thing I did was to check to see if 'execScript' was executable. It was. The second thing I did was to check that the right version of Perl was running, by saying:

perl -v

which gave me 5.003 with EMBED. No problem there. The third thing that I did was to check that perlscript.p was in the correct place. It was.

So, after scratching my head for a bit, I opened up the shell script to see how it was actually executing the Perl script: It said:

exec perl5 -S perlscript.p $0 $1

Hmm... People who read chapters 1 and 2 might recognize this statement as a version of the 'universal' Perl header, the one that can be used to port Perl scripts to almost anywhere without changing the path to Perl in a #! statement. So what was unusual about this?

It turned out to be the '-S'. Remove the '-S', and the script ran fine. Why? Well, it turned out that the

Can't execute perlscript.p

was due to perlscript.p, not the wrapper, having no execute permissions.

Usually, when you say 'perl perlscript.p', Perl ignores what permissions perlscript.p has. 'perl -S perlscript.p' overrides that fact.

Yuck. A very subtle bug, and a total waste of debugging time. If I had just swallowed my pride and gone to the error list in perldiag, I would have seen the following message:

Can't execute %s:

(F) You used the -S switch but the script to execute could not be found in the PATH, or at least not with the correct permissions.

This message told me exactly what I needed, without needing to go through the two hours of debugging that was necessary.

Hence, perldiag is your friend. Every time you hit an error that you don't recognize, your knee-jerk reaction should be to go to this manpage.

Point #3: Get to instinctively know the Perl Parser

Once you get to know the syntax, and above all the foibles, behind Perl's syntax, creating and executing a Perl program will be a lot easier. To this end, Perl comes with a 'syntax highlighter' for emacs called cperl-mode.el. When you load a Perl file into cperl-mode.el, it automatically highlights, in different colors or fonts, the functions, matched parentheses, matched brackets, and what not for you and makes it infinitely easier for you to learn Perl syntax by heart. Refer to the distribution for more detail; cperl-mode.el comes with embedded documentation.

Besides the information in the previous chapter, Appendix One contains a lot of the common mistakes people make, and more importantly, the reasons that they are mistakes. If you run across a syntax problem in your programs, there is a good 80% chance that the mistake is in this set of common errors.

Likewise, the documentation contains a lot of common errors people make, as well as a complete description of what the errors mean. The perldiag manpage is the key document here.

Since Perl inherited its functionality from pretty much every computer language on the planet (except COBOL), the perltrap manpage contains common problems that people have when coming from certain backgrounds. (For example, 'If you are a 'Cerebral C' programmer, you should be aware that the 'else if' C construct is 'elsif' in Perl'.)

Style Tips

Understanding the previous section can help you compile programs, but this does not make up for how much time you can waste if you don't follow conventions. This section looks at coding conventions that should save you a lot of pain.

Point #4: Program with style: perlstyle

Look at, nay memorize, the perlstyle documentation page. It gives some very good guidelines for coding, each of which will make your programs be less buggy and easier to debug when you have problems.

I personally do not agree with some of the conventions there, but it always pays to have some sort of method to your madness. For example, I like to spread out my code a little bit more than perlstyle prefers:

sub aaa

{

}

but I have a reason: I like to see which brackets belong with each other at a glance, and like to see lots of whitespace for logic's sake. However, on some occasions I do the opposite:

sub aaa { }

especially when there is only one line to a given function. This way, I see all the logic which goes with that subroutine on the same line.

The main point is that you do have conventions and that they seem reasonable to you.

Perl 'safety' guards

Having assiduously studied the perlstyle man page, you have resolved to code in style, and to avoid the spaghetti-like code that comes from not thinking issues through properly. Furthermore, you have gotten lots of practice in dealing with Perl code efficiently. There are, in general, three types of scripts that you will program out there:

1) Throwaway scripts. Scripts that are meant to be used once, and then discarded.

2) Transition scripts - those meant to fulfill a role for a limited amount of time, and then be discarded.

3) Keepers - scripts which are both meant to fulfill a role, and are programmed correctly.

Part of the process in figuring out how much attention you want to pay to style, and how much protection you want for your code, is to figure out what type of script you are dealing with.

If the script is a throwaway, you really don't care about its style. If it is a transition script, you should take a little more care. If your script is a keeper, you avoid bad styles like the plague.

Perl has a few tools to keep you in good style. I call them 'safety guards', and we shall go over them here. As we have said above, you will want to use these safety guards in differing amounts depending on whether your script is throwaway, transition, or a keeper.

use strict

use strict is a biggie amongst safety guards, and we have pretty much peppered this book with 'use strict' for good reason. Simply put, 'use strict' should be in about 90% of your code, if not more*. Why? Well, it protects you against yourself.

##### BEGIN NOTE #####

The only place that you shouldn't have 'use strict' is where you dynamically export variables. Code such as 'use Exporter' can't be programmed without soft references. See Chapter 11 (Perl 5 Odds and Ends) for more detail.

##### END NOTE #####

Since Perl compiles and executes on the fly without checking the correct use of variables, without the aid of 'use strict' you are vulnerable to a whole bunch of simple errors. Consider the following pieces of code:

Probable Error #1;

$variable = 2;

print $varbiable**2

Probable Error #2;

if ($condition) { my $scalar = 1; }

else { my $scalar = 2; }

print "$scalar";

Probable Error #3;

my $arrayRef = 'scalar';

print @$arrayRef;

Probable Error #4;

my $scalarRef = 'scalar';

print $$scalarRef;

Probable error #5;

$value = function; # (function meant to be a

# function returning a value, but not defined)

Probable error #6;

$value = function;

sub function { &do_stuff; }

Each of these probable errors represents a possible runtime 'tear my hair out by the roots' scenario that you can go through, and probably will go through if you don't use 'use strict'. Let's look at them more closely.

strict variables:

Consider the first two errors:

Probable Error #1;

$variable = 2;

print $varbiable**2

Probable Error #2;

if ($condition) { my $scalar = 1; }

else { my $scalar = 2; }

print "$scalar";

Error #1 illustrates the common mistake of misspelling a variable ($varbiable), which makes the 'print' statement print zero. Error #2 shows a scoping problem. The $scalar variable is created within the 'if { }', and therefore the $scalar variable outside that scope is blank.

Both of these errors are due to the non-strictness of the variables. 'use strict' prevents these errors by forcing the following convention:

You must either fully declare a regular variable, namespace included, or declare it as a my variable that is correctly scoped.

Consider how this prevents the two errors described above. If we say:

my $variable = 2;

print $varbiable ** 2;

then Perl registers the variable $variable as being a my variable, but then it comes to $varbiable and sees no such reference. Hence, it prints out:

Global symbol "varbiable" requires explicit package name at as.p line 5.

In other words, you would have to say '$package::varbiable' instead of $varbiable in order to make this mistake. Which isn't likely!

As for the scope error, if we say:

if ($condition) { my $scalar = 1; }

else { my $scalar = 2; }

print "$scalar\n";

then Perl recognizes the fact that the variable $scalar inside the brackets is not the same as the $scalar inside the print. When you try to compile this, Perl again says:

Global symbol "condition" requires explicit package name at a.p line 4.

Global symbol "scalar" requires explicit package name at a.p line 6.

Perl with 'use strict' warns us of a potential error that we didn't even notice, the fact that 'condition' hasn't been declared with a my, which could bite us later.
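As a minimal sketch of how the scoping error is usually repaired (the subroutine name pick is hypothetical and used only for illustration), the variable can be declared once in the enclosing scope and merely assigned inside the branches:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 or 2 depending on the condition.
sub pick {
    my ($condition) = @_;

    my $scalar;                        # declared once, visible to the whole sub
    if ($condition) { $scalar = 1; }   # assignment only, no second 'my'
    else            { $scalar = 2; }

    return $scalar;
}

print pick(1), " ", pick(0), "\n";
```

Because $scalar now lives in the scope that also contains the return statement, 'use strict' has nothing to complain about, and the value assigned in either branch survives the closing bracket.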

Strict References

Now let's talk about errors three and four:

Probable Error #3;

my $arrayRef = 'scalar';

print @$arrayRef;

Probable Error #4;

my $scalarRef = 'scalar';

print $$scalarRef;

Both of these 'errors' are perfectly legal statements in Perl without 'use strict'. Even though you obviously meant to make $arrayRef an array reference, it ended up holding a plain string. The same is true of $scalarRef.

Since Perl has the ominous soft references, which are extremely useful 10% of the time, this has to be legal. But 90% of the time, you don't want to use soft references.

Strict references will help you out here. They tag the code above with the errors:

Can't use string ("scalar") as an ARRAY ref while "strict refs" in use at a.p line 5.

and

Can't use string ("scalar") as a SCALAR ref while "strict refs" in use at a.p line 5.

This helps a lot, even though this check is made while the program is running rather than at compile time, as the strict vars errors were. It sends you straight to the source of the error, instead of letting the error slip by unnoticed until two subroutines later, when you find that the variable you are tracking down is empty.
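Since the strict refs check happens at run time, it can be trapped like any other fatal error. The following is a sketch only: the block form of eval catches the error, and the helper elements_of is a hypothetical name for a defensive alternative that tests the reference type first.

```perl
use strict;
use warnings;

my $arrayRef = 'scalar';                 # a plain string, not a reference

# The block form of eval traps the fatal run-time error.
my @elements = eval { @$arrayRef };
my $error    = $@;

print "caught: $error" if $error;

# Hypothetical defensive helper: only dereference real array references.
sub elements_of {
    my ($maybe_ref) = @_;
    return ref($maybe_ref) eq 'ARRAY' ? @$maybe_ref : ();
}
```

Checking with ref before dereferencing is a design choice: it trades the loud failure for a quiet empty list, so it suits code that must tolerate bad input rather than code where bad input signals a bug.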

strict subs

The third type of strict that Perl provides is strict subs. This is demonstrated in probable errors 5 and 6:

Probable error #5;

my $value = function; # (function meant to be a

# function returning a value, but not defined)

Probable error #6;

my $value = function;

sub function { &do_stuff; }

Both of these are pretty poor programming style in themselves, but with strict subs they become syntax errors. The first one, '$value = function;' is particularly bad, because it could mean two things to Perl:

$value = 'function';

or

$value = function();

depending on whether the function function is defined before or after the statement '$value = function;'. This has to do with the way Perl parses the program in the first place, and it is a long story why. ("Just say it is due to forward references", one person from the net says.*)

In a nutshell, Perl doesn't know that 'function' is in fact a function until it sees the 'sub function { }' statement. Enough said.

This is why '$value = function;' is a no-no. There is no clarity of intent; this statement says two things at once, and that is a bad thing. Likewise, function could become a future keyword, so you are setting yourself up for errors in the future. Compounding this, you can do it accidentally. Hence, use strict blocks this:

Bareword "function" not allowed while "strict subs" in use at a.p line 4.

This allows you to disambiguate, and say what you actually mean, before you do it.
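A short sketch of the two unambiguous spellings, assuming a trivial function body invented for this example:

```perl
use strict;
use warnings;

my $called = function();   # parentheses: always a subroutine call
my $string = 'function';   # quotes: always a string

# Defined after its use on purpose; the parentheses above make that safe.
sub function { return 42; }

print "$called $string\n";
```

With the parentheses in place, it no longer matters whether the sub definition appears before or after the call.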

Partial Use of strict

As I said, sometimes you want to have non-strict variables, non-strict references, or (although I never have had the need) to have non-strict functions! You'll know the time, when you come upon it. In fact, a couple of our examples in the Object Oriented part of this book needed to turn off strict for the time being.

Should that need ever arise, this is how you do it. If you say

use strict "vars";

this directive will make only the variables strict. This means that you need to fully declare variables or say 'my $varbname'. Likewise:

use strict "refs";

use strict "subs";

will only do strict references and strict functions, respectively. You can use the prefix no to turn off strictness, and can do so at the level of the block. For example:

1 use strict "vars";

2 my $x = 1;

3 if (defined $x)

4 {

5 no strict "vars";

6 $y = $x;

7 }

This compiles, even though in line 6 we have not declared $y: no strict "vars" turns off strictness within the brackets, from lines 4-7. (We avoid $a and $b here because Perl always exempts them from strict vars for the benefit of sort.) If we said in line 8:

8 $y = 1;

then the code wouldn't have compiled.
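The same block-level technique is commonly applied to strict refs. Here is a hedged sketch, with made-up subroutine names red and green, that relaxes strictness only around the one statement that needs a symbolic reference:

```perl
use strict;
use warnings;

# Build two trivial subroutines by name (the names are hypothetical).
for my $name (qw(red green)) {
    no strict 'refs';                              # relaxed only inside this loop body
    *{"main::$name"} = sub { return "<$name>" };   # symbolic reference to a glob
}

my $output = red() . green();
print "$output\n";
```

Keeping the no strict inside the smallest possible block means the rest of the file still gets the full protection of use strict.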

use strict caveat

There is only one caveat that you should be aware of when you use use strict, and that is the caveat that goes along with many of the directives:

use strict is lexically scoped, not globally scoped.

In other words if you say, in the file a.pm (containing the class a);

a.pm

package a;

use strict;

# package a's code.

the use strict only covers package a. Likewise, if you then say in the file b.pm (containing the class b):

b.pm

package b;

# package b's code..

neglecting to say use strict in package b's code, then class a's use strict will not affect class b. Hence, you should make it a rule that the first statement after every package statement, or after the opening #! line of a script, is use strict:

#!/usr/local/bin/perl5

use strict;

or

package a;

use strict;

And if you are foolish enough to say:

{

use strict;

}

well, you should expect trouble. After all, since use strict is lexically scoped, the strictness goes out of scope after the closing bracket!

use strict summary

'use strict' is your first line of defense against the constant assault of typos and logical mistakes that every programmer makes. Yes, everybody makes these types of mistakes; but the mark of a good Perl programmer over one that is struggling is in the way they handle finding these types of errors, and how quick they are to fix them.

Create yourself a good Perl programming habit and use use strict pretty much all the time. Only in exceptional circumstances should you neglect it.

The '-w' flag and 'use diagnostics'

The first line of defense that you have is use strict, as said in the previous section, but a close second is '-w'. '-w' is the warning flag: it warns you about code that may be wrong but that Perl cannot be sure about. You will quickly develop a love/hate relationship with '-w' until you learn instinctively how to write code that avoids '-w' errors.

Why? Well, many beginning Perl programmers write code and get really discouraged when they either say:

perl -w filename;

or

#!/usr/local/bin/perl -w

for the first time, and their code outputs thousands of warnings to the screen. That is what happened to me the first time I used '-w'. These warnings are notoriously difficult to get rid of. If you are the verbose type, then you might prefer

use diagnostics;

instead, since use diagnostics will do the same thing as '-w' except it is more verbose. For example, '-w' will give you a warning that looks like:

print (...) interpreted as function in line 4.

whereas 'use diagnostics' will provide:

print (...) interpreted as function in line 4.

(W) You've run afoul of the rule that says that any list

operator followed by parentheses turns into a function,

with all the list operator's arguments found inside the parentheses.

See perlop/Terms and List Operators (Leftward)

Some people like the longer warnings, some people think they are redundant. I personally favor the longer warnings when working with people new to Perl - after all, look at exactly how much information this is telling you about Perl! It gives you a long explanation of what is going on and a place to find more information about why Perl is put together in a certain way.

On the other hand, if you are running a program that generates fifty thousand error messages, you are probably going to get sick of this rule.

Functionality provided by '-w'

What code problems does '-w' catch? Here are some of them:

1) writes and reads on closed and undefined filehandles:

print STDERRR "Aha!\n"; # mean STDERR?

$line = <STDNI>; # mean STDIN?

open (FD, "non_existant_file");# open fails, and you neglect to check status

$line = <FD>;

2) writes and reads on closed and undefined sockets:

accept(NOTHING, NOBODY);

connect(NOTHING, $inetadr);

3) ambiguous usages and their resolution:

print ${map}; # is this print ${map()} or print $map?

4) variables that are just used once:

print $sefl; # sure its not '$self?'

5) undefined variables, and array/hash elements.

$scalar = undef; print $scalar

$condition = undef; if ($condition) { do_something();}

$hash{'val'} = 1; print $hash{'vall'};

6) system and execute calls that fail because of permission problems and/or nonexistent executables:

system("dirr");

system("wrong_permission_file");

7) places in which you try to write to a read-only filehandle, or read from a writable file-handle:

open(FD, "> writeFile"); # writing to a file

$line = <FD>; # reading from that file

8) probable syntax errors, but ones that Perl can't be sure about:

open FD, "filename" || die; # should be open(FD, "filename") || die;

# or open FD, "filename" or die;

@array['aaa']; # Can't use string as array element!

@key = (1,2,3); $hash{@key}; # meant $hash{$key}?

@array[1]; # not an error, but $array[1] is cleaner

%hash{$key} # meant $hash{$key}?

my $a, $b = @_; # means my ($a), ($b = @_);

if ($a = $b) { do_something;} # do you mean ==? I *wish* this warning was in C!

$array[0,0,0]; # should be $array[0][0][0]

9) 'void' contexts (where a variable is dropped off into nothingness):

if ((1,2,3) < (4,5,6)) # means 3 < 6

$aa = ('this','drops','words'); # sets $aa to 'words'

10) redefined subroutines:

sub subname { print "HERE!\n"; }

sub subname { print "HERE2!\n"; }

subname();

11) conflicts between strings and numbers:

if ('a' < 'b') { }

$a = 'aaa'; $b = 'bbb'; $a <=> $b;

12) user-defined warnings:

use Carp;

warn "oh dear. it happened again.\n";

These are just some of the warnings '-w' gives you, albeit the main ones. As you can see, they are quite the treasure trove, especially to the new Perl programmer! In fact, I would go as far as to say that, if you are programming correctly, you have debugged about 90% of your program if the program runs successfully without warnings.
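To see one of these warnings without it scrolling past on the screen, it can be collected in a variable. This is only a sketch: it assumes Perl 5.6 or later, where the use warnings pragma is the lexically scoped counterpart of '-w', and it uses the standard $SIG{__WARN__} hook; the variable names are invented for the example.

```perl
use strict;
use warnings;                      # lexical counterpart of -w (Perl 5.6 and later)

my @captured;
local $SIG{__WARN__} = sub { push @captured, $_[0] };   # collect instead of printing

my %hash = (val => 1);
my $text = "value: " . $hash{vall};   # typo in the key: reads an undefined element

print "captured ", scalar(@captured), " warning(s)\n";
print $captured[0];
```

The handler receives the full warning text, including file and line number, so nothing is lost by capturing it.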

Overcoming the 'not use -w' (or use diagnostics) Barrier.

This is a fair warning. If you are not used to it, the '-w' flag will be a big wet blanket at first by always slowing you down, and making your programming go twice as slow as it did 'pre -w'.

Here is the pattern that I went through in my encounter with the warning flag.

1) being unaware of '-w': First, I was unaware of '-w' flag, and I bloodied my nose by making lots of the same mistakes that -w could catch over and over again.

2) trying and rejecting '-w': I then heard of '-w', and testing the waters, put it in my script. I immediately got disgusted with exactly how many 'potential errors' there were in my script.

3) acceptance of '-w': After a long, hard struggle with my habits, I finally steeled myself to use '-w' in my programs.

This is the 'ignorance, denial, acceptance' cycle, and I doubt that I'm the only person who has gone through it. The simplest thing to do is skip stages 1 and 2 and go directly into 3. I daresay you will be a much stronger programmer if you learn to program under the constraints that 'use strict' and '-w' provide!

However, this is easier said than done, since it is potentially a big discomfort when you are starting out to get rid of warning messages, especially if you think the code is already correct. Below, however, we have given some quick ways to get rid of some of the more impertinent warning messages.

Common Warnings and how to Avoid Them

Below are some of the warnings that you will encounter in your Perl programs, and some advice on how to avoid them. But if you get a persistent '-w' warning that you can't get rid of, the best way to get rid of it is to understand the logic behind how Perl is parsing the annoying expression.

use of uninitialized value

This is an extremely common error, and it occurs when the code:

accidentally passes a null value to a subroutine

accidentally accesses a null hash element

otherwise tries to read a null value

Most of the time you will be plagued by these errors when you have a subroutine, or method call, and you hand it to other people to use. In calling your subroutine, they will try your code in all ways that you didn't think possible, and report to you exactly what errors they get. It is up to you to use things like '-w' to be able to debug what errors they bring up.

In all cases, the fix for the 'use of uninitialized value' error is the same. Suppose you have a subroutine that is being passed the variable $variable.

The solution is to either put guards, using the keyword defined, that prevent null values from being passed to a subroutine:

die "variable needs to be defined!\n" if (!defined $variable);

Or, put the guard by the variable access itself:

$variable =~ m"string" if (defined $variable);

Or, more kindly, provide a default in case a null value is passed in:

$value = (defined $value)? $value : 0;

In each case, the trick is to make the guard as unobtrusive as possible. If you need more than one line to make a default, or do a check, then chances are that you will want to put the guard in a subroutine:

$value = (defined $value)? $value : make_default('my_sub', 'value',0);

sub make_default

{

print

"

The subroutine $_[0] took the parameter $_[1]

in which you passed a null value. Assigning a default of $_[2]!

";

$_[2];

}

If you do stuff like this, then you not only prevent warnings in your code, you also warn that the other programmers should pass in a value for a given subroutine.
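Putting the guards together, a subroutine might look like the following sketch. The subroutine area and its square-by-default behavior are assumptions made up for illustration, not part of the examples above.

```perl
use strict;
use warnings;

# Hypothetical subroutine: area of a rectangle, or of a square if only
# one side is given.
sub area {
    my ($width, $height) = @_;

    # Guard: refuse to continue with an undefined required argument.
    die "area: width needs to be defined!\n" unless defined $width;

    # Default: fall back to a square when the second argument is missing.
    $height = defined $height ? $height : $width;

    return $width * $height;
}

print area(2, 3), " ", area(4), "\n";
```

The required argument fails loudly, while the optional one gets a quiet default; which of the two treatments an argument deserves is a judgment call for each subroutine.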

"Ambiguous use of ..." Error

This is a bit of a nasty error, since sometimes you can't get rid of it cleanly, without making 'use strict' complain! For example, suppose that you have the following code:

print "${map}_show_me_map\n";

Well, the reason Perl is complaining is that it doesn't know whether ${map} means ${map()} (i.e., that you are calling the function map and dereferencing its return value as a variable) or whether ${map} means $map.

It assumes the second, that you meant ${map} to mean $map (99% likely) but it is ugly to have it report this.

However, saying ${"map"} to avoid the error doesn't work! Because now, you are making ${"map"} a symbolic reference, and use strict complains! And if you say

print "$map_show_me_map\n";

Then you are saying something different from what you originally intended: printing the variable $map_show_me_map. If you want to keep the interpolation, the most direct answer to this problem is to rename the variable:

print "${mapp}_show_me_map\n";

Ugly (*sigh*) but you can't have everything.
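If renaming is not an option, one alternative is to keep the variable name out of the string altogether. The sketch below assumes a lexical variable $map holding an arbitrary sample value; it is an illustration, not a technique the warning itself suggests.

```perl
use strict;
use warnings;

my $map = 'treasure';

# Concatenation keeps the variable name outside the string,
# so there is no ${map} for Perl to misread.
my $by_concat = $map . "_show_me_map";

# sprintf is another way to keep the name and the literal text apart.
my $by_sprintf = sprintf "%s_show_me_map", $map;

print "$by_concat\n";
print "$by_sprintf\n";
```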

"%s (...) interpreted as function" Error

This is another one of those weird errors that comes up from time to time. It usually happens when you say:

(print ("HERE!!!\n"), exit());

You have used the (statement1, statement2); form to print out stuff and then exit, but Perl is getting confused because of the extra space between print and its opening parenthesis. Say:

(print("HERE!!!\n"), exit());

instead.

Other Warnings

The other warnings that Perl's -w provides (useless use of variable in void context, strings interpreted as numbers, etc.) are fairly straightforward. Getting them usually stems from carelessness or from a misunderstanding of Perlish logic. For example,

if ('x' < 'y') { print "HERE!!\n"; }

will generate the warning

Argument "x" isn't numeric in lt at ... line ...

Here, the -w flag points out to you that '<' is supposed to work on numbers and numbers alone. The correct statement should be

if ('x' lt 'y') { print "HERE!!\n"; }

If you pay attention to these types of warnings, and correct them constantly, you will learn Perl's logical rules a lot faster than 'going it alone'.
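To see the difference between the two operators without scrolling through terminal output, the warnings can be collected in an array. This is a small sketch; the @warnings array and the local handler are illustrative choices, not part of '-w' itself.

```perl
use strict;
use warnings;

my @warnings;
{
    # Collect warnings in an array instead of printing them.
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my ($x, $y) = ('x', 'y');
    my $numeric = ($x < $y)  ? 1 : 0;   # both strings become 0, so this is false
    my $string  = ($x lt $y) ? 1 : 0;   # string comparison, so this is true

    print "numeric: $numeric, string: $string\n";
}
print "warnings seen: ", scalar(@warnings), "\n";
print $warnings[0] if @warnings;
```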

Warnings '-w' Does Not Cover

Once a Perl programmer finds and starts using '-w', there is a tendency to overcompensate, especially if that user was reluctant to start using '-w' in the first place. In other words, the tendency is to think that a combination of 'use strict' and '-w' will solve all the ills of Perl-kind. However, there are a few errors that '-w' does not cover, and you should be aware of them:

1) mismatching numbers of arguments

'-w' does not warn you of errors that look like:

$scalar = (1,2);

 

subroutine($a,$b,$c);

sub subroutine { my ($a, $b) = @_; }

where the numbers of arguments differ. In this case, the extra elements (1 and $c) are silently dropped.

2) undefined variables inside an if clause

'-w' does not warn you that:

$self->{'key'} = 1;

if ($self->{'ky'}) { print "HERE!\n"; }

has a blank element ($self->{'ky'}). This is a real problem with objects, and we will address it later.

3) misusing hashes as arrays, and arrays as hashes

'-w' does not point out that:

%hash = (1,2,3,4);

function(%hash);

sub function { my (@array) = @_; print "@array\n"; }

will flatten %hash into a list whose elements come out in an indeterminate order.

Finally, '-w' does not point out several logical errors when the programmer gets really fancy! As we said, you can have your code approach the level of 'code noise' (with more special characters than there are letters). If this happens, all bets are off. Don't do this unless you feel really fancy.
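Since '-w' stays silent in these cases, the checks have to be written by hand. The following is a minimal sketch under that assumption; two_args and get_key are hypothetical names, and the checks cover only the first two cases listed above.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical subroutine that insists on exactly two arguments.
sub two_args {
    croak "two_args() expects 2 arguments, got " . scalar(@_)
        unless @_ == 2;
    my ($first, $second) = @_;
    return "$first,$second";
}

# Hypothetical accessor that refuses keys that were never set.
sub get_key {
    my ($self, $key) = @_;
    croak "no such key '$key'" unless exists $self->{$key};
    return $self->{$key};
}

print two_args(1, 2), "\n";

eval { two_args(1, 2, 3) };
my $count_error = $@;
print "caught: $count_error";

my $self = { key => 1 };
eval { get_key($self, 'ky') };
my $key_error = $@;
print "caught: $key_error";
```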

Finding the Source of Warnings

Finally, I might note that sometimes it is actually more difficult to find the source of the warning than it is to fix it. This is especially true of warnings that are found in subroutines, since it is usually the case that the bug is found inside the calling code rather than inside the subroutine itself.

We saw this above. When you say:

1 $a = undef;

2 subroutine($a);

3 sub subroutine { my ($a) = @_; print "$a\n"; }

and use '-w', this will point the error out in line 3 (i.e.: in the subroutine) rather than the true source of the error which is line 2 (the code that called the subroutine).

In order to get a more descriptive error which points to line 2, you will have to redefine the '__WARN__' signal handler. We shall talk about this later, after we cover stack traces. Read stack traces first, and then see the section 'Debugging on the Fly' for more detail.
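Until then, a subroutine can report its caller by hand with the built-in caller() function. This is only a sketch of the idea; the subroutine and the wording of the message are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical subroutine that reports where a bad argument came from.
sub subroutine {
    my ($value) = @_;
    if (!defined $value) {
        # caller() returns the package, file, and line of the calling code.
        my ($package, $file, $line) = caller();
        warn "undefined value passed to subroutine() at $file line $line\n";
        return '';
    }
    return "$value\n";
}

my $input = undef;
print subroutine($input);    # the warning names this line, not the sub's
print subroutine('fine');
```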

The Lint Module

Let's go over the Lint module, which has the potential to make some serious advances in the process of actually finding errors before they bite you in Perl. It is relatively new, and already has spawned some extremely useful features.

What does Lint actually mean? Lint is named after the UNIX C-program of the same name, which in turn is named after the pieces of fluff that embed themselves inside clothing.*

According to the on-line computer dictionary at 'http://wagner.princeton.edu/foldoc', that is. Kinda boring, in comparison to some of the etymologies you can find there.

The job of the Lint module, then, is to vacuum up the bugs that dwell inside your Perl code. These bugs form two cases, items that are:

1) not obviously bugs: bugs that can't be caught with 'use strict' since they might have genuine uses.

2) items that you want to catch at compile time rather than runtime: This excludes using '-w', because '-w' is a run-time check.

To fill in this gap - warnings that aren't necessarily errors but need to be found at compile time - Malcolm Beattie developed his Lint module in combination with the compiler. It basically is a programmable module with an interface on the way that the perl interpreter works.*

This is done through a module called 'B.pm' - 'B' stands for 'backend'. To understand B.pm, you need to understand how the components of Perl are built. See perlguts, and the book Advanced Perl Programming for more detail.

Usage of the Lint module

To use the Lint module, you run it much like you ran the debugger last chapter. You simply say:

perl -MO=Lint perl_script.p

to check for Lint warnings in your program. Perl will then list the warnings that it finds, and then return you back to the prompt. Usually, Lint checks only your main program for warnings. To turn on debug checks for all modules included in your program, you can say:

perl -MO=Lint,-u perl_script.p # VAPORWARE?

This is my preferred method of checking my scripts, since it is a compile-time check, and it doesn't hurt to see all of the warnings that are generated in modules. If you want to be more selective, you can say something like:

perl -MO=Lint,-uMyModule,MyModule2 perl_script.p

to selectively cover the script perl_script.p, and the modules MyModule and MyModule2.

Lint provides quite a few warnings - if you want to selectively turn them on and off, you can say:

perl -MO=Lint,context perl_script.p

to turn only the 'context' check on, and ignore the rest of the checks.

The only thing left then, is to know what options the Lint module has available. This is a rapidly growing topic, but here are the current ones, listed below.

Lint Debugging checks.

Lint has only existed a couple of months as of this writing, but the following tests have already been incorporated into its interface. I list them below, each one with a small explanation of how it can be so helpful in real life:

the undefined-subs test

This is perhaps the most useful of the Lint checks, and it fills in a great hole that existed prior to the Lint module. This hole can be demonstrated quite vividly. Suppose you had the following code:

Listing 23.X - scaletest.p

1 scale1();

2 scale22();

3

4 sub scale1

5 {

6 print "do re mi fa sol la ti..";

7 sleep(36000);

8 }

9 sub scale2

10 {

11 print "do\n";

12 }

When you run this code, you will get a run that looks like:

prompt% perl scaletest.p

do re mi fa sol la ti..

Undefined subroutine &main::scale22 called at scaletest.p line 2.

Since you hit line 7, you waited 10 hours just to see that your program bombed before completing its task. All because there was an extra digit on the end of 'scale2'! This was one of my biggest pet peeves with Perl (before Lint, that is), since it meant that you needed to be very careful not to mistype a subroutine or method name. Now, when you say:

prompt% perl -MO=Lint,undefined-subs scaletest.p

or

prompt% perl -MO=Lint scaletest.p

you get a warning:

Warning: Undefined subroutine &main::scale22 to be called at scaletest.p line 2!

Immediately, before you actually start the script! Much better (and 10 hours saved).*
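Where Lint is not at hand, a rough run-time approximation is to check the names you plan to call before starting the long-running work. This is not what Lint does internally; it is a hand-rolled sketch, and the @planned list is an assumption that has to be maintained by the programmer.

```perl
use strict;
use warnings;

sub scale1 { return "do re mi fa sol la ti.." }
sub scale2 { return "do" }

# Names the script intends to call; 'scale22' is the deliberate typo.
my @planned = qw(scale1 scale22);

# can() returns a code reference if the sub exists, undef otherwise.
my @missing = grep { !main->can($_) } @planned;

print "missing: @missing\n";
```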

More on this option to Lint. You might notice that you can have similar problems occur with methods (either class methods or object methods). If you say:

Class->method();

and method doesn't exist either in Class or in Class' parents through @ISA, you have problems. However, this option finds these errors quite nicely. When Perl gets strong typing (may be already true by the time this book is out) you will be able to say:

1 my Dog $spot = new Dog('chihuahua');

2 $spot->wooff();

and have line 2 flagged as an error as well since you probably meant 'woof'. (line #1 designates that $spot is always going to be a dog so you can make certain assumptions about him.)

the context test

The context test simply flags down warnings that look something like:

$length = length(@array);

$length = @bar;

in which you are converting to a scalar from an array, say, to find the length of it. I'm not too fond of this test - since I am always saying statements like:

if (@a > @b) { }

but it is helpful for new Perl folks who constantly run into the:

sub my_sub { my $arg = @_; }

trap (which makes $arg equal to the length of @_ not the first element in @_) when programming subroutines. If you want to get rid of the warning, you can say:

my $length = scalar(@array);
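The trap is easiest to see with the two forms next to each other. In this short sketch, count_args and first_arg are hypothetical names.

```perl
use strict;
use warnings;

sub count_args { my $arg   = @_; return $arg; }   # scalar context: number of arguments
sub first_arg  { my ($arg) = @_; return $arg; }   # list context: first argument

print count_args('a', 'b', 'c'), "\n";   # prints 3
print first_arg('a', 'b', 'c'), "\n";    # prints a

my @array  = (10, 20, 30);
my $length = scalar(@array);             # explicit about wanting the count
print "$length\n";                       # prints 3
```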

implicit-read, implicit-write

Again, these two tests get rid of common, new Perl programmer mistakes: dealing with $_ unintentionally. For example:

my $value = m"value";

does not look for the pattern 'value' inside the variable $value . Instead, it looks inside $_ and sets $value equal to '' or 1 depending on if this match found anything. This is much to the dismay of new Perl programmers. Likewise,

$value = s"value"other_value"g;

does a substitution of 'other_value' for 'value' on $_ rather than on $value, and again trashes the value of $value (setting it to the number of times the string 'value' was replaced by 'other_value' inside $_). Saying

prompt% perl -MO=Lint script.p

flags these mistakes.
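A small sketch makes the difference between the implicit and the explicit target visible. The sample strings are arbitrary.

```perl
use strict;
use warnings;

my $value = 'some value here';
$_ = 'unrelated text';

# Matches against $_, not $value, and returns a true/false result.
my $implicit = m"value" ? 1 : 0;

# The binding operator =~ makes the target explicit.
my $explicit = ($value =~ m"value") ? 1 : 0;

print "implicit match on \$_: $implicit\n";       # 0
print "explicit match on \$value: $explicit\n";   # 1

# Substitution works the same way: bind it to the variable you mean.
(my $changed = $value) =~ s"value"other_value"g;
print "$changed\n";
```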

dollar-underscore

The dollar-underscore option looks for places where $_ is used explicitly, such as:

$_ = 1;

or implicitly inside print, like:

print, "Hi there!\n";

    private-names

    Finally, there is one more check that we shall cover: private-names. It gives you a gentle reminder when you are using method names outside the scope in which they were intended.

    For example, if you define a package as:

    package A;

    sub _function1 {}

    sub _function2 {}

    1;

    and then someone making a script goes ahead and uses the private functions in a script, as in:

    script.p

    use A;

     

    my $object = new A();

    $object->_function1();

    then that person is breaking your implicit rule that the function was in fact private. To warn of such potentially dangerous behavior you can use the private-names check. When you say:

    prompt% perl -MO=Lint,private-names script.p

    Then perl will emit a warning, something like:

    Warning! The function _function1() is a private function, used in a non private context!

    This especially helps on large projects, where you need to ensure that internal privacy rules are followed.

    Future directions for Lint.

    Those are the only options available for Lint today, because of how young the Lint module is. However, the situation may be vastly different by the time you read this sentence, so Lint.pm is a module to watch.

    Since Lint is programmable - much like the debugger facilities in Perl (see last chapter) - one may soon be able to enforce a coding standard much stricter than one can do today, simply by programming a lint-like module to warn on certain types of syntax. Here are a couple directions I would like to see lint take in future versions:

    1) simple usage. The lint module could use a simplified front-end, much like when you say:

    prompt% perl -d script.p

    you run the debugger. Perhaps a '-L' flag could be created, which links the lint module in, so you could say:

    prompt% perl -L script.p

    to get the desired Lint-ing effect.

    2) customizability. The Perl debugger has a .perldb file that lets you set debugger commands. I'd like to see a .perllint file which lets you pick and choose which Lint warnings you wish suppressed, as well as modules which you either always want to check, or always don't want to check.

    3) programmability. Right now, to program your own lint module, you need to specify it to the compiler 'backend', something like:

    prompt% perl -MO=MyLint

    and know quite a bit about the internal workings of the compiler. I'd like to see the same usage for the Lint module as for the debugger, where you would say something like:

    prompt% perl -L:MyLint file.p

    which would then run the 'Lint::MyLint' module on file file.p

    4) making -u standard, and having a separate (non default) option for 'just linting the script without the modules'. At least in my case, I am always worried about the whole picture when it comes to debugging. Hence, I would like to see the check which looks at all the modules for lint errors be the default, and having people specify a command line argument if they don't want everything checked.

    5) adding an 'ignore' flag. No matter how much one tries, one is not going to get rid of all the warnings that Lint generates (or one does not want to). Therefore, I would like to see an option for putting an:

    # IGNORE

    comment around items that I know are warnings. This would save lots of time in tracing down false warnings.

    In short, as I said Lint is new software - and Lint can be expanded quite a bit. Probably by the time you read this, it will have been. Lint is a module to watch; and there are going to be a lot of cool things that come out of it.

    Summary of 'use strict', '-w' and 'use diagnostics', and 'lint'

    The safety nets '-w', 'use strict', 'use diagnostics', and the Lint module, are essential tools that should be in the tool box of every Perl programmer. These tools point out simple potential problems with code, such as:

    1) mistyped variables

    2) undefined variables

    3) code logic errors

    4) assorted, other, errors.

    All serious Perl programs should be "warning" proof and 'use strict'. In addition, you should run all serious Perl programs through

    prompt% perl -MO=Lint,-u

    to point out dangling subroutine references, misused contexts, and so forth. These four tools provide about as much support as you can reasonably expect to get from automatic tools. The next level of error-checking comes from items you add to the body of your code: items like stack-tracing (next) or exception handling. We will first talk about Carp below.

    Pinpointing Errors: Stack Traces with 'use Carp'

    Now suppose that you have some complicated code, one that has a pretty deep 'calling tree', something that looks like Figure 21.1:

    [Figure 21.1: Usage tree for complicated Perl code]

    In other words, module 'Application' uses module 'Screen', which uses module 'Field', which uses the module 'Submit'. Now suppose that a bug happens in 'Application', but the source of the bug is in 'Submit'. Let's say that the code responsible looks something like:

    Submit.pm

    1 sub print

    2 {

    3 ### code skipped.

    4 if ($wrong_argument)

    5 {

    6 die "You can't pass the SEND parameter to the function 'print'!";

    7 }

    8 }

    Now what happens when line 6 gets hit? Well, Perl dutifully exits the code, saying something like:

    You can't pass the SEND parameter to the function 'print'! at Submit.pm line 6.

    But the source of the error is not in 'Submit.pm'. It is in 'Application.pm'! Hence, this message is next to useless. It points out that, yes, there was an error, but you will spend the next 5 hours traipsing through the code, trying to find where the real error is located.

    What to do about this? Well, since the true error lies somewhere up the call chain, we need to somehow see the path which Perl used to actually get to the module 'Submit.pm'. In other words, we need what is called a stack trace.

    How to get this stack trace? Well, remember the function caller() from Chapter 10, Built-in Functions and Variables in Perl? caller's purpose is to give varying types of stack traces, and we can use that.

    Perl has four very helpful wrappers around caller() which you may want to use instead of directly calling caller() itself. They are carp(), cluck(), confess() and croak().

    All of these routines are included in source code by saying:

    use Carp;

    at the beginning of any module or script that uses them. Let's take a look at each of them, and see what they will do for you:

    carp()

    carp() is the least intrusive function of them all. It prints out the stack trace from the tip of the trace (where the carp call actually resides) to the point at which Perl enters the package itself. For example, suppose the stack trace of the complicated code above looked something like:

    Application::get() calls:

    Application::find() calls:

    Screen::draw() calls:

    Field::draw() calls:

    Field::type() calls:

    Submit::get() calls

    Submit::print()

    in which each function calls the function on the following line. If Submit::print() had in it:

    if ($wrong_argument)
    {
        carp "You can't pass the SEND parameter to the function 'print'!";
    }

    and the user indeed had a '$wrong_argument' so that the code was executed, then carp would display something like:

    Submit::print called at Submit.pm line 54

    Submit::get called at Submit.pm line 30

    In other words, carp stops the trace at the point where the module 'Submit.pm' is entered, printing the trace from the bottom up so to speak. The code continues executing, exactly as if you did a print.

    carp() thus acts as a signpost function: it does not give an overwhelming amount of information, and just prints it out to the screen as a warning.

    cluck()

    cluck() is exactly like carp(), except that cluck goes through the entire stack trace. Given the same calling tree:

    Application::get() calls:

    Application::find() calls:

    Screen::draw() calls:

    Field::draw() calls:

    Field::type() calls:

    Submit::get() calls

    Submit::print()

    cluck() prints out

    Submit::print called at Submit.pm line 54

    Submit::get called at Submit.pm line 30

    Field::type called at Field.pm line 65

    Field::draw called at Field.pm line 144

    Screen::draw called at Screen.pm line 11

    Application::find called at Application.pm line 65

    Application::get called at Application.pm line 42

    Hence, it prints the entire stack up to the root, and then continues with the program. This is very useful for tracing the logic behind convoluted programs: simply put a cluck() every once in a while to keep the program honest, and so that you don't lose track of the logic.*

    One more thing: you don't get cluck() by default when you say 'use Carp'. You need to import cluck() explicitly. You need to say:

    use Carp qw(cluck);

    because of concerns in the Perl development community of exporting extra functions.
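The two non-fatal functions can be tried out in a few lines. The sketch below reuses the package name Submit from the example above; the subroutine names and the message are made up, and the warnings are collected in an array so that they can be printed side by side.

```perl
use strict;
use warnings;

package Submit;
use Carp qw(carp cluck);

sub short_note { carp  "careful" }   # warning from the caller's point of view
sub long_note  { cluck "careful" }   # warning plus a full backtrace

package main;

my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

sub outer { Submit::long_note() }

Submit::short_note();
outer();

print "carp said:\n$seen[0]\n";
print "cluck said:\n$seen[1]\n";
print "still running\n";
```

Both calls return normally, which is why the last line is reached.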

    croak()

    croak() is the fatal version of carp(). In other words, given the calling tree:

    Application::get() calls:

    Application::find() calls:

    Screen::draw() calls:

    Field::draw() calls:

    Field::type() calls:

    Submit::get() calls

    Submit::print()

    croak prints out:

    Submit::print called at Submit.pm line 54

    Submit::get called at Submit.pm line 30

    and it dies. This is good for places in which you are debugging an object or module, and you are sure that any errors you get will fall inside the module itself.

    confess()

    Finally, we come to my favorite: confess(), which acts as a catch-all procedure, and which will save you hours, especially when you pass the code that you make for other people to use.

    confess() is the fatal version of cluck(). Given the calling tree:

    Application::get() calls:

    Application::find() calls:

    Screen::draw() calls:

    Field::draw() calls:

    Field::type() calls:

    Submit::get() calls

    Submit::print()

    confess() prints out the entire tree down to the point at which confess is hit:

    Submit::print called at Submit.pm line 54

    Submit::get called at Submit.pm line 30

    Field::type called at Field.pm line 65

    Field::draw called at Field.pm line 144

    Screen::draw called at Screen.pm line 11

    Application::find called at Application.pm line 65

    Application::get called at Application.pm line 42

    and then it kills the program.

    I cannot stress how much confess() contributes to making Perl a scalable language. Without confess(), making a program with more than one level of depth is treacherous, and the level of maintenance that you will expend in order to maintain your applications will mushroom.

    In fact, I would go as far as to say that you always want to use confess in your modules, instead of die or croak. You cannot have too much information when debugging a module, and it is better to have more than less.

    Summary of 'use Carp'

    Again, Perl has given you (as a free gift that comes with the standard distribution) a great tool for scaling up your programs, and finding where elusive bugs lie. This is the module Carp, and it provides the following four functions:

    carp() - which acts like print() but prints out a mini-stack trace.

    cluck() - which acts like print() but prints out a full stack trace.

    croak() - which acts like die() but prints out a mini-stack trace.

    confess() - which acts like die() but prints out a full stack trace.

    Of these four, cluck() and confess() are the most useful, since they give you the most information. They will save you a great deal of time. If you are a Perl programmer that hasn't heard of or used them, you will wonder how you ever coded without them.
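The fatal pair can be compared in the same way by trapping each death with eval. As before, this is a sketch: the package name Submit comes from the running example, while short_report, full_report, and outer are hypothetical.

```perl
use strict;
use warnings;

package Submit;
use Carp qw(croak confess);

sub short_report { croak   "bad argument" }   # dies, blaming the caller
sub full_report  { confess "bad argument" }   # dies with a full backtrace

package main;

sub outer { Submit::full_report() }

eval { Submit::short_report() };
my $croak_msg = $@;
print "croak said:\n$croak_msg\n";

eval { outer() };
my $confess_msg = $@;
print "confess said:\n$confess_msg\n";
```

The croak() message points at the line in the calling package, while the confess() message lists every frame on the way down.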

    Debugging on the Fly: Finding Problems Dynamically

    You will want to read this section if you are involved in large Perl projects. In any given large project, 10% of the errors that you will encounter come out of the blue. An error hits because of an unforeseen test case. An error hits because of a logic error, or a physical error.

    If you do not catch these errors while they occur, they will never get corrected, which is totally unacceptable. The designers and programmers of Perl should be congratulated for finding a solution that is elegant and saves a lot of time.

    We talked about functions which give a stack trace, and warning messages that come out of '-w'. Wouldn't it be great if these two concepts were combined? After all, since most of the warnings come out of functions, and the culprit is really found in the code that calls these functions, there seems to be good reason for making something that can catch a stack trace for unplanned errors.

    So it was done. Perl lets you redefine the logic that happens when a fatal error occurs or a warning occurs. In doing so, Perl gives a very simple way of tracking down a bevy of problems. We go over how to do this below.

    %SIG, $SIG{'__WARN__'}, and $SIG{'__DIE__'}

    We have seen %SIG before. %SIG is Perl's special hash which is used to trap signals from the Operating System. The classic example is:

    $SIG{INT} = sub { die "Ouch!!!!\n" };

    which means that if Perl gets a break signal (someone hits Control-C), then Perl will die with the message 'Ouch!!!!', instead of the regular, more jerky method of simply returning you to the prompt.

    Usually signals are reserved for places where the Operating System sends you a message. In Perl, there is an additional twist:

    You can have signals that come from the program itself: either warnings or errors.

    These special signals are called $SIG{__WARN__} and $SIG{__DIE__}, and we talk about them below.

    $SIG{'__DIE__'}

    Let's first look at how we might handle trapping errors that are fatal, but are unplanned. Consider the code:

    use Carp;

    $SIG{'__DIE__'} = sub { confess "@_"; };

    and see how this can help in your fight against bugs that come up in runtime.

    A good example of a runtime bug is when you accidentally use a hash reference as an array reference. The following code, for example, will die unexpectedly:

    1 $hashref = {};

    2 print @$hashref;

    3 print "Will never get Here!!\n";

    This will never get to line 3. Instead, at line 2, you will unceremoniously get the error:

    Not an ARRAY reference at a.p line 2.

    What happens in line 2 is called an exception, and they will make your life interesting. An exception is one of those errors that I was talking about earlier which came out of the blue. Either the user pressed control-C, or, as above, we came across a piece of data that should have been an array reference, but turned out to be a hash reference.

    Consider, again, what happens if this is buried deep inside your code:

    Application::get() calls:

    Application::find() calls:

    Screen::draw() calls:

    Field::draw() calls:

    Field::type() calls:

    Submit::get() calls

    Submit::print()

    Suppose the offending code that causes the error is in Application::get() and the actual error occurs in Submit::print(). Then this error message won't be very helpful! What we need is a way to catch the error, and print out something other than:

    Not an ARRAY reference at Submit.pm line 54.

    In languages such as C++ this is a rather complicated affair. You define a 'try { } catch { }' function per class which knows how to handle the errors that it 'catches'. (Perl will have this feature in future releases.)

    Again, in Perl, something like:

    use Carp;

    $SIG{'__DIE__'} = sub { confess "@_"; };

    will suffice. Now, instead of calling the default signal handler, as soon as you hit the statements:

    Submit.pm

    53 $hashref = {};

    54 print @$hashref;

    55 print "Will never get Here!!\n";

    You will get the statements

    Submit::print called at Submit.pm line 54

    Submit::get called at Submit.pm line 30

    Field::type called at Field.pm line 65

    Field::draw called at Field.pm line 144

    Screen::draw called at Screen.pm line 11

    Application::find called at Application.pm line 65

    Application::get called at Application.pm line 42

    This whole process is called exception handling. Try it in your own applications! This type of information is invaluable in tracing down exactly where errors lie in your program. 99% of the time you aren't going to need something more complicated than:

    use Carp;

    $SIG{'__DIE__'} = sub { confess "@_"; };

    If you do need something more complicated ( a try, catch mechanism for example) trust me... it will soon be there.
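Here is a self-contained sketch of the handler at work. The subroutine names deep_call and middle_call are hypothetical stand-ins for the call chain above, and eval is used only so that the trace can be captured in a variable instead of ending the program.

```perl
use strict;
use warnings;
use Carp;

sub deep_call {
    my $hashref = {};
    my @list = @$hashref;     # fatal: "Not an ARRAY reference"
    return scalar @list;
}
sub middle_call { return deep_call() }

my $trace;
{
    # Turn every fatal error in this block into a full stack trace.
    local $SIG{__DIE__} = sub { confess @_ };
    eval { middle_call() };
    $trace = $@;
}
print $trace;
```

Using local keeps the handler confined to one block, which is a conservative choice when other code in the program relies on eval to trap errors.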

    $SIG{'__WARN__'}

    So far, so good. $SIG{'__DIE__'} was used to catch fatal errors. Now, let's use $SIG{'__WARN__'} to catch pesky warnings.

    Sick of having thousands of warnings thrown out to the screen, where they whiz past, doing absolutely no good? Well, simply add the following signal handler to your scripts:

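    use Carp;
    use FileHandle;

    # Route every warning through our own handler.
    $SIG{'__WARN__'} = \&warnHandler;

    sub warnHandler
    {
        # Append to a log file named after the running script.
        my $fd = new FileHandle(">> $0.log") || die "Couldn't open $0.log!\n";
        # longmess() returns the warning text plus a full stack trace.
        my $text = Carp::longmess(@_);
        print $fd $text;
        close($fd);
    }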

    What exactly does this do? Well, it assigns the '__WARN__' handler to be the function &warnHandler. Hence when you get the warning:

    Use of uninitialized value at Submit.pm line 60

    Perl calls the warnHandler function, sets '@_' to be the warning message's value ('Use of..'), opens up a file (the name of your process plus the appendage 'log') and then prints out in gory detail the whole stack trace of the warning:

    Submit::print called at Submit.pm line 59

    Submit::get called at Submit.pm line 30

    Field::type called at Field.pm line 65

    Field::draw called at Field.pm line 144

    Screen::draw called at Screen.pm line 11

    Application::find called at Application.pm line 65

    Application::get called at Application.pm line 42

    This means that you have a trail of breadcrumbs that you can mull over to find the source of your problems.
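The same idea can be tried without touching the file system by collecting the traces in memory. In this sketch the @log array stands in for the log file, and show and caller_of_show are hypothetical subroutines that provoke an 'uninitialized value' warning.

```perl
use strict;
use warnings;
use Carp;

my @log;    # stands in for the log file

# Record each warning together with the call chain that led to it.
$SIG{__WARN__} = sub { push @log, Carp::longmess(@_) };

sub show {
    my ($value) = @_;
    return "value is $value";    # warns when $value is undefined
}
sub caller_of_show { return show(undef) }

caller_of_show();
print "logged ", scalar(@log), " warning(s)\n";
print $log[0];
```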

    You may want to go further than this. You might want to sort the errors, and make a log for each package that you encounter. You may want to email the trace to yourself so that you need not depend on the people who use your code to tell you of the problems.

    And so on. There are hundreds of things that you can do, all to ensure that your final product is solid, reliable, and can be more sophisticated. The better the foundation for your code, and the more hooks like the above you have, the more cool things you can do.

    Successful Data Debugging: Data::Dumper() and Tie::Watch()

    As said above, the more data you have about a bug, the quicker you are going to be able to kill it. To that end, there is an essential Perl module, included on the CD, called Data::Dumper() which will make your Perl programming life much easier.

    Its purpose in life is printing out data structures. No matter how complicated your data structures are, no matter how dense, how intertwined, what type they are, whatever - as long as they are legal in Perl, you can print them out by saying:

    use Data::Dumper;

    print Dumper($varb);

    use Data::Dumper gives you the function Dumper() by exporting it into your current namespace. Here, $varb is any legal Perl variable. If you said something like:

    $a = {1 => [1,2,3,4]};
    $a->{2} = $a->{1};

    print Dumper($a);

    Then Perl will dutifully print out your data structure for you:

    $VAR1 = {

    1 => [

    1,

    2,

    3,

    4

    ],

    2 => $VAR1->{1}

    };

    Note two things about this example. One, the output that comes out of Dumper() is perfectly legal Perl code. Hence if you printed this into a file and then ran that file it would recreate $a as it was when it was 'Dumped'.

    Hence, Dumper() is a boon for regression testing, where you are trying to come up with tests which decide whether or not your code is behaving correctly. If you say:

    if (Dumper($var1) ne Dumper($var2)) { print "Test failed!\n"; }

    then this compares the data structure $var1 with the data structure $var2 point for point. If they are any different, the strings from Dumper() will be different, and the test will fail.

    Second, note that Dumper() correctly reproduces a data structure in which one part refers to another part of the same structure. If you said

    $a = {1 => [1,2,3,4]};

    $a->{2} = $a->{1};

    then this is subtly different than:

    $a = {1 => [1,2,3,4], 2=> [1,2,3,4]};

    since, in the first case, if you change $a->{1} you also change $a->{2}, whereas in the second example, $a->{1} is separate from $a->{2}.
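Both points can be checked with a short script. One assumption is added here that the text above does not mention: $Data::Dumper::Sortkeys is switched on so that hash keys are always dumped in the same order, which string comparisons of Dumper() output otherwise cannot rely on. The variable names $shared and $copied are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sort hash keys so that equal structures always produce equal text.
$Data::Dumper::Sortkeys = 1;

my $shared = { 1 => [1, 2, 3, 4] };
$shared->{2} = $shared->{1};                              # two names, one array

my $copied = { 1 => [1, 2, 3, 4], 2 => [1, 2, 3, 4] };    # two separate arrays

print Dumper($shared);
print Dumper($copied);

if (Dumper($shared) ne Dumper($copied)) {
    print "The structures differ.\n";
}

push @{ $shared->{1} }, 5;
print scalar(@{ $shared->{2} }), "\n";   # 5: the change shows through both keys
```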

    Dumper() and debugging objects

    Dumper() is also very astute at debugging objects. Since objects are simple references in Perl, you can say something like:

    my $object = new ComplicatedObject('arguments');

    print Dumper($object);

    and Dumper() will slice through the object like hot butter, displaying all the internal members of that object! For example, remember our first object, the clock? Well, if you said:

    my $object = new Clock();

    print Dumper($object);

    it would print out

    $VAR1 = bless( {

    time => 868147042

    }, 'Clock' );

    showing you not only the data in the class, but also the fact that it is a clock! A side note: C++ programmers, you may be cringing at this thought, since it 'breaks encapsulation' of the object, but think about it for a minute:

    1) when you say Dumper($object) to get data like this, you are not affecting the object in any way.

    2) you need not write a bunch of routines for debugging. Dumper() will pretty much do all you need.

    3) this approach goes hand in hand with rapid development.

    In all, it is perfectly reasonable to use Dumper in this way, to reach in and see the guts of objects. If anything, it makes the debugging and improvement of these objects easier.
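For readers without the earlier Clock class at hand, the following sketch defines a minimal stand-in. The constructor shown is an assumption for illustration and is not the class from the earlier chapter.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the Clock class from the earlier chapter.
package Clock;
sub new { my ($class) = @_; return bless { time => time() }, $class; }

package main;

my $object = Clock->new();
my $dump   = Dumper($object);   # the object itself is left untouched
print $dump;
```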

    Tie::Watch # SEMI-VAPORWARE

    Finally (for this section) we will consider the module Tie::Watch, which also makes tracking down data problems very easy. We already created an object called WarnHash in the chapter on inheritance. This module warned us when a hash changed one of its values. Tie::Watch takes this logic to its natural conclusion, letting you watch the way that any of your data changes.

    In other words, let Perl do the hard work, and sit back, watching when things change, and deciding on when the change is a bug or not. For example, last chapter we used the debugger to track down a problem in our heap sort algorithm. We could have used Tie::Watch instead. We could have programmed a test harness:

    Listing 23.X - tietestharness1.p

    1  use Tie::Watch;
    2
    3  require "wrongheap.p";
    4  my @array = (14, 12, 144, 55, 1, 910);
    5  my $watch = new Tie::Watch(-variable => \@array, -store => \&store);
    6  heapsort(\@array);    # run the sort under test (from wrongheap.p)
    7  print "@array\n";
    8
    9  sub store
    10 {
    11     my ($tie, $key, $value) = @_;
    12     $tie->Store($key, $value);    # does the store for us...
    13     print "Storing value :$value: into key :$key:\n";
    14 }

    In line 5, you set a subroutine to 'watch' whenever a value is stored into the array. Each time somebody does something such as:

    @array = (1);

    $array[1] = 2;

    the callback subroutine store in line 9 is called. Since heapsort does so much in the way of switching elements around, we should see a lot of activity when we run this. Below is the output I got when running it:

    Storing value :14: into key :0:

    Storing value :12: into key :1:

    Storing value :144: into key :2:

    Storing value :55: into key :3:

    Storing value :1: into key :4:

    Storing value :910: into key :5:

    Storing value :144: into key :5:

    Storing value :910: into key :2:

    Storing value :12: into key :3:

    Storing value :55: into key :1:

    Storing value :14: into key :2:

    Storing value :910: into key :0:

    Storing value :14: into key :5:

    Storing value :144: into key :2:

    Storing value :: into key :0:

    Storing value :910: into key :6:

    The second to the last line is where something is going wrong: an empty value is being stored into element 0. Also notice that we are storing a '910' into element 6, whereas the array that we are sorting has only six elements (indices 0 through 5). So we need to track down where the first 'fetch' of element 6 happens. To do so, we add a fetch function to our test harness:

    Listing 23.X - tietestharness2.p

    use Tie::Watch;
    use Carp qw(cluck);    # cluck() is not exported by default

    require "wrongheap.p";
    my @array = (14, 12, 144, 55, 1, 910);
    my $watch = new Tie::Watch(-variable => \@array,
                               -store => \&store, -fetch => \&fetch);
    heapsort(\@array);     # run the sort under test (from wrongheap.p)
    print "@array\n";

    sub store
    {
        my ($tie, $key, $value) = @_;
        $tie->Store($key, $value);    # does the store for us...
        (print("Storing value :$value: into index :$key:\n"), cluck(@_))
            if ($value eq '');
    }

    sub fetch
    {
        my ($tie, $key) = @_;
        my $value = $tie->Fetch($key);
        (print("--------------------------------------------------\n",
               "Fetching the value :$value: out of index:$key:\n"), cluck(@_))
            if ($value eq '');
        return ($value);
    }

    The fetch function now prints out exactly which values are fetched from the array, but only if the value of the element happens to be '' (an error). We have also added a stack trace via cluck() (so we see exactly what has called what), and restricted store to printing only in case of error.

    If we run this now, we get something like:

    --------------------------------------------------
    Fetching the value :: out of index:6:
    Tie::Watch::Array=ARRAY(0xb991c) 6 at /home/ed/perl5.005/install/lib/site_perl/Tie/Watch.pm line 329
        Tie::Watch::callback('Tie::Watch::Array=ARRAY(0xb991c)', '-fetch', 6) called at /home/ed/perl5.005/install/lib/site_perl/Tie/Watch.pm line 505
        Tie::Watch::Array::FETCH('Tie::Watch::Array=ARRAY(0xb991c)', 6) called at wrongheap.p line 48
        main::heapify('ARRAY(0x103fbc)',2,'SCALAR(0xb1df4)') called at wrongheap.p line 33
        main::build_heap('ARRAY(0x103fbc)', 'SCALAR(0xb1df4)') called at wrongheap.p line 12
        main::heapsort('ARRAY(0x103fbc)') called at tietestharness2.p
    --------------------------------------------------

    This is the first time that we have fetched the value '' out of index 6, so this list of lines should contain our error someplace. We just need to track backward. Is the error in Tie::Watch (lines 329 or 505)? No. Is the error in the FETCH? Well, the corresponding line (wrongheap.p, line 48) is:

    $largest = $right if (($right <= $$heapsize) &&
                          ($array->[$right] > $array->[$largest]));

    So in this case, either $right or $largest is 6, since these are the two accesses that are made. Look back up, and see that main::heapify was called with the arguments:

    main::heapify('ARRAY(0x103fbc)',2,'SCALAR(0xb1df4)') called at

    wrongheap.p line 33

    so it must be one of these arguments that is causing problems.

    At this point, we have narrowed our problem down to one statement, and it shouldn't be that hard to track down from here. Tie::Watch seldom finds problems on its own (although it does that, too). Instead, it gives you a ballpark in which to look for bugs. You need not search through all your code; given the context and the area where the bugs are happening, you can narrow your search and then concentrate your efforts using the debugger.
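If Tie::Watch is not installed on your system, the idea behind the store callback can be sketched with the Tie::StdArray class that comes with Perl (it lives in the Tie::Array module). This is not how Tie::Watch itself is written; LoggedArray and its @log are names invented for this example.

```perl
use strict;
use warnings;

package LoggedArray;             # hypothetical name, not part of Tie::Watch
use Tie::Array;                  # provides Tie::StdArray
our @ISA = ('Tie::StdArray');
our @log;                        # every store is recorded here

sub STORE {
    my ($self, $index, $value) = @_;
    push @log, "Storing value :$value: into key :$index:";
    $self->SUPER::STORE($index, $value);   # let Tie::StdArray do the real store
}

package main;

tie my @array, 'LoggedArray';
@array = (14, 12, 144);
@array[0, 2] = @array[2, 0];     # swap two elements, as a sort routine would

print "$_\n" for @LoggedArray::log;
print "@array\n";
```

Every assignment to an element of @array, whether from the list assignment or from the swap, passes through STORE and leaves a line in the log, which is the same kind of trail that the test harness above prints.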

    Summary of Successful Data Debugging

    We covered two of the most useful tools for tracking down data problems: Tie::Watch and Data::Dumper. Perl programmers are infinitely innovative when it comes to making new modules to track down problems, so you might want to check CPAN for new developments, or talk on comp.lang.perl.misc to discuss new ideas for data debugging.

    The Debugging Flag.

    The debugging flag is a bit of an odd duck, but it really is helpful if you get stuck debugging unusual problems. This is especially true of difficult problems that sometimes are not your fault. '-D' is a flag that you supply to the perl executable when you run a script; it only works if that executable was built with debugging turned on.

    As we said in Chapter 1, to get '-D', you need to make a 'debug' version of the Perl executable. If you have not done so already, go to that chapter, and do this. You don't want to have the debug version of Perl as a default. It makes too much of a demand on computer resources.

    Anyway, we give a short overview of how -D is used below.

    Values for '-D'

    Once you have the debugging executable, realize what it gives you. If you have compiled the debugging executable correctly, you get the options listed in Table 21.1:

    Table 21.1

    Number  Letter  What the Option Does
    ------  ------  --------------------
    1       p       Shows exactly how your Perl program is parsed (good if you are a fan of lex and yacc).
    2       s       Shows exactly what is on the 'stack' at a given time.
    4       l       Shows processing of the label (loop context) stack.
    8       t       Low level trace of execution.
    16      o       Shows how your object methods are traced (good for debugging inheritance).
    32      c       Shows conversions from strings to numbers (not that useful, use '-w' instead).
    64      P       Shows preprocessor command. Anachronism. (Nobody uses #ifdef, #endif, #define, etc. anymore in Perl!)
    128     m       Memory trace. Shows memory usage in gory detail (good as a last refuge if you have a memory leak).
    256     f       Shows how formats are processed. Good if you have a format that just doesn't quite work.
    512     r       Regular expression parse (this section).
    1024    x       Syntax tree generator (shows the results of the parse in option -Dp).
    2048    u       Shows tainting checks (tainting is an option which happens in Unix when you are running with 'setuid' or 'setgid' bits set on. See perlsec for more detail).
    4096    L       Formerly used to show memory leaks. Compile with 'make pureperl', 'make quantperl' or 'make purecovperl' instead.
    8192    H       Shows exactly how your hashes are stored. For Perl debuggers only.
    16384   X       Shows a summary of memory allocation (the 'scratchpad'). Much like '-Dm' but in summary detail.
    32768   D       'Cleaning Up'. Shows what steps Perl takes between the end of executing your script and when it returns to the prompt.

     

    If you want to get into actually changing the Perl source code itself, these flags are invaluable, since they give in great, gory detail everything that the Perl binary is doing.

    You can also use these flags to debug your programs. To actually use these flags you have two choices. First, you can call your program with the flag '-D' plus the options you want, for example:

    prompt% perl -DrX script.p

    or, equivalently,

    prompt% perl -D16896 script.p # (16384(X) + 512(r) = 16896)

    which would then run script.p for you with the regular expression checker and the memory allocation checker turned on for the whole script. This is the first, less useful way. The other way is to set the $^D (dollar Control-D) variable yourself, which does the same thing but only from the point in the script where you set it.

    Setting $^D selectively prevents the torrent of output that comes with the flag '-D'. Remember that the debugging flag was made for debugging the Perl executable first, not your Perl script. Hence, it makes sense to look only at the areas you are interested in. Below, we use $^D to debug some regular expressions.
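The arithmetic behind the numeric form is just a matter of adding up (or OR-ing together) the values from the table. As a small sketch, with debug_mask() being a helper name invented here and the letter values copied from the table above:

```perl
use strict;
use warnings;

# Letter => number pairs copied from the table of -D values above.
my %debug_bit = (
    p => 1,    s => 2,    l => 4,     t => 8,
    o => 16,   c => 32,   P => 64,    m => 128,
    f => 256,  r => 512,  x => 1024,  u => 2048,
    L => 4096, H => 8192, X => 16384, D => 32768,
);

# Hypothetical helper: turn a string of letters such as 'rX' into the number.
sub debug_mask {
    my ($letters) = @_;
    my $mask = 0;
    for my $letter (split //, $letters) {
        die "Unknown -D letter '$letter'\n" unless exists $debug_bit{$letter};
        $mask |= $debug_bit{$letter};     # each letter owns exactly one bit
    }
    return $mask;
}

printf "-D%s is the same as -D%d\n", 'rX', debug_mask('rX');
```

The same number is what you would assign to $^D inside the script to switch on those two kinds of output.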

    Debugging Regular Expressions

    The thing that I use the most, with Perl's debugging option, is debugging my regular expressions. As with most debugging flags, '-Dr' gives a torrent of output, so I tend to turn it on and off when I need it.

    Just to get a feel for what is going on, let's take a look at debugging the following expression:

    $expr = 'abacabac';

    $expr =~ m"a(.*)c(.*)b";

    Now, this expression is a good example, because it is simple yet non-trivial (it will go through some backtracking), so we can get a feel for the output. The first thing to do is put the correct $^D values before and after our example:

    open (STDERR, "> log");

    $^D = 512; # Turn on regular expression debugging

    $expr = 'abacabac';

    $expr =~ m"a(.*)c(.*)b";

    $^D = 0; # Turns off debugging.

    Now, we have bounded our problem. (open(STDERR, "> log"); redirects STDERR to a file named log.) I usually then run my script by saying something like:

    prompt% perl script.p

    and then I open the file 'log' to look at what it has captured. In this case:

    1 1:BRANCH <abacabac>

    2 5:EXACT <abacabac>

    3 11:OPEN1 <bacabac>

    4 17:BRANCH <bacabac>

    5 21:STAR <bacabac>

    6 29:CLOSE1 <>

    7 35:EXACT <>

    8 29:CLOSE1 <c>

    9 35:EXACT <c>

    10 41:OPEN2 <>

    11 47:BRANCH <>

    12 51:STAR <>

    13 59:CLOSE2 <>

    14 65:EXACT <>

    15 29:CLOSE1 <ac>

    16 35:EXACT <ac>

    17 29:CLOSE1 <bac>

    18 35:EXACT <bac>

    19 29:CLOSE1 <abac>

    20 35:EXACT <abac>

    21 29:CLOSE1 <cabac>

    22 35:EXACT <cabac>

    23 41:OPEN2 <abac>

    24 47:BRANCH <abac>

    25 51:STAR <abac>

    26 59:CLOSE2 <>

    27 65:EXACT <>

    28 59:CLOSE2 <c>

    29 65:EXACT <c>

    30 59:CLOSE2 <ac>

    31 65:EXACT <ac>

    32 59:CLOSE2 <bac>

    33 65:EXACT <bac>

    34 71:END <ac>

    Now, what to make of this? It isn't the easiest thing to read, but it contains some useful information about what is going on. The main points to realize about this output are that:

    a) The text between < and > is the part of the string that is still left to be matched. Hence, at the end, <ac> is all that remains of the string.

    b) Each 'in' indentation indicates an extra place where the regular expression can backtrack; each 'out' indentation indicates where the backtrack failed. Hence, in line 14, there is a backtrack, since a(.*)c tried to match all of abacabac, was too greedy, and left nothing for the rest of the pattern.

    c) You can see greediness in action. Starting at line 15, a(.*)c gives back one character at a time, trying the c against <ac>, <bac>, <abac> and finally <cabac>, where it succeeds. The second group then starts at line 23 and backs off in the same way from line 26 on.
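You can check the end result of all this backtracking without a debugging executable. The sketch below simply runs the same match and prints what each group captured; the variable names are ours.

```perl
use strict;
use warnings;

my $expr = 'abacabac';
my ($first, $second, $rest);

if ($expr =~ m"a(.*)c(.*)b") {
    ($first, $second) = ($1, $2);
    $rest = substr($expr, $+[0]);    # whatever follows the whole match
    print "first group  : <$first>\n";
    print "second group : <$second>\n";
    print "left over    : <$rest>\n";
}
```

The left over text is the same <ac> that the END line of the trace reports. On Perls that ship the re pragma, saying use re 'debug'; is another way to get a trace without a debugging build, although its output is laid out differently from the listing above.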

    Now, try and do your own matches! For example, you can learn quite a bit by saying:

    $a = "'a\\''";

    $a =~ m/'((?:[^'\\]|\\.)*)'/;

    and watch Perl go through its twists and turns as it matches your regular expression. This, if you remember, is the regular expression which matches any single-quoted string, as seen in the book Mastering Regular Expressions by Jeffrey Friedl, and it contains quite a bit of logic that you can exploit.
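Before wading through the trace for this one, it helps to know what the match is supposed to produce. A minimal sketch (the variable names are ours):

```perl
use strict;
use warnings;

my $text = "'a\\''";      # the five characters  ' a \ ' '
my $inside;

if ($text =~ m/'((?:[^'\\]|\\.)*)'/) {
    $inside = $1;         # everything between the outer quotes
    print "inside the quotes: $inside\n";
}
```

The backslash and the quote that follows it are swallowed together by the \\. alternative, which is why the escaped quote does not end the string early.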

    Summary of '-D' and Debugging Regular Expressions.

    The debugging version of Perl and '-D' are your 'brute force last resort' if you have a bug that you cannot find (well, that and reporting bugs via perlbug). By using different flags, you can see various aspects of how Perl is actually running your program.

    The debugging version of Perl is also the road to successfully fiddling around with the Perl internal code, if you so desire, as well as a refresher on linking C code with Perl code, and a whole bunch of other exotic things. If you are so inclined...

    Programming Auxiliary Tools for Debugging

    As you can see, debugging Perl code isn't nearly like debugging something like 'C' or 'C++'. The debugging process and the coding process are so intertwined that sometimes you feel like you are coding to debug your program, and debugging to code your program!

    The two are therefore almost indistinguishable from each other, and it is a tribute to how closely they are tied that we can write an entire 40-page chapter on debugging without talking about the Perl debugger itself!

    Anyway, this chapter ends with two examples of how you can make packages which help extend debugging and add extra layers of protection against the stray errors you might encounter:

    Example 1: Warning Directive or Pragma

    The first thing we will do is get rid of the necessity to redefine the $SIG{'__WARN__'} and $SIG{'__DIE__'} handlers every single time you write a script. We will write a directive, or pragma, which will do it for us.

    The code in question is the code to redefine $SIG{'__DIE__'}:

    use Carp;

    $SIG{'__DIE__'} = sub { confess "@_"; };

    and $SIG{'__WARN__'}:

    use Carp;
    use FileHandle;

    $SIG{'__WARN__'} = \&warnHandler;

    sub warnHandler
    {
        # append the warning, plus a stack trace, to a log named after the script
        my $fd = FileHandle->new(">> $0.log") or die "Couldn't open $0.log!\n";
        my $text = Carp::longmess(@_);
        print $fd $text;
        close($fd);
    }

    Now there are two problems with this code. First, it needs to be put in front of every script. Second, it hardcodes the log that we create as $0.log, which we may want to change, or not overwrite. Hence, this is the perfect place for a pragma, or directive. We want to say:

    use FullWarn "file";

    and have the signal handlers defined for us, and all the output from warnings and errors stuffed into the file "file".

    Hence, we shall make a package ("FullWarn.pm") and stuff the code into it. We will make it a 'wrapper' much in the same way we have transformed scripts into modules:

    FullWarn.pm:

    package FullWarn;

    use Carp;
    use FileHandle;

    sub import
    {
        $FullWarn::log = $_[1];    # use second argument as log name.
        $SIG{'__DIE__'} = \&dieHandler;
        $SIG{'__WARN__'} = \&warnHandler;
    }

    sub warnHandler
    {
        my $fd = FileHandle->new(">> $FullWarn::log")
            or die "Couldn't open $FullWarn::log!\n";
        my $text = Carp::longmess(@_);
        print $fd $text;
        close($fd);
    }

    sub dieHandler
    {
        my $fd = FileHandle->new(">> $FullWarn::log")
            or die "Couldn't open $FullWarn::log!\n";
        my $text = Carp::longmess(@_);
        print $fd $text;
        close($fd);
        confess(@_);
    }

    1;

    Note two things here. First, since we are in a directive, we can be more explicit than we were before, defining both the warnHandler, and dieHandler to display output.

    Second, the signal handlers need a global variable, $FullWarn::log, because Perl calls the $SIG{'__WARN__'} and $SIG{'__DIE__'} handlers with only the message text, so there is no way to pass them the log name as an argument.

    Things to try out next: You could make this method more fancy, separating errors into different packages or even dumping these errors to a centralized log. You even may want to make it so that you have a rigorous, 'warnings check' before you hand off your code to anybody else (or put it into source control!)
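To watch the import mechanism work without creating a separate file, you can try a cut-down version in a single script. This is only a sketch: the package name FullWarnDemo, the log name fullwarn_demo.log, and the default of "$0.log" are all made up for the example, and only the warning handler is installed.

```perl
use strict;
use warnings;

BEGIN {
    package FullWarnDemo;    # hypothetical name; stands in for FullWarn.pm
    use Carp ();

    our $log;

    sub import {
        my ($class, $file) = @_;
        $log = defined $file ? $file : "$0.log";   # default is an assumption
        $SIG{__WARN__} = \&warnHandler;
    }

    sub warnHandler {
        open(my $fh, '>>', $log) or return;        # give up quietly if no log
        print {$fh} Carp::longmess(@_);
        close($fh);
    }

    $INC{'FullWarnDemo.pm'} = __FILE__;   # tell 'use' the module is already loaded
}

use FullWarnDemo 'fullwarn_demo.log';     # calls FullWarnDemo->import(...)

warn "something looks odd\n";             # lands in the log, not on the screen

open(my $in, '<', 'fullwarn_demo.log') or die "No log was written: $!\n";
my $logged = do { local $/; <$in> };
close($in);
unlink 'fullwarn_demo.log';

print $logged;
```

The point to notice is that the use line does all the work: by the time the first warn is reached, the handler and the log name are already in place.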

    Example #2: Using tie to make a 'safe object'

    Let's get a little bit more fancy, and look at how to make a 'safe object'.*

    Actually, this sample code is a simpler version of a module which is already in the standard distribution called Class::Fields. Use that one for real, and only take a look at this as an example.

    In C++, you have the convention where you can say:

    class ClassName

    {

    private:

    int a;

    int b;

    public:

    int c;

    int d;

    };

    where, in other words, you declare up front exactly which data members ('int a', etc.) the class has.

    Let's implement a small 'tied' class that does the same thing for Perl. We will inherit from this class, so you can say:

    package MyClass;
    use SmartObject;
    @ISA = qw(SmartObject);

    sub new
    {
        my ($type, @args) = @_;
        my $self = bless {}, $type;
        $self->declare('a','b','c','d');
        $self;
    }

    In other words, the call to declare() will make it so anyone trying to access:

    $self->{'e'};

    will provoke the message

    Element 'e' is not a 'blessed' data part of the object MyClass!

    So how do we go about doing this? Well, we will tie the hash behind $self to a class, so that any time we access an element of $self, we get a runtime check that this element is OK. As we did before, let's split this up into all the functions we are to program:

    The inherited function - declare and TIEHASH

    Here is the code for our 'declaration function':

    SmartObject.pm

    package SmartObject;

    sub declare
    {
        my ($self, @elements) = @_;
        tie (%$self, 'SmartObject', ref($self), @elements);
    }

    Here, we declare that our object ($self) is also a 'SmartObject', and hence will go through the functions below when its elements are accessed, changed, and what not. The tie call invokes the constructor below:

    sub TIEHASH
    {
        my ($type, $beginType, @elements) = @_;
        my $self = { hashval => {} };
        $self->{'element_list'} = {};
        $self->{'type'} = $beginType;
        my $list = $self->{'element_list'};
        foreach (@elements) { $list->{$_} = 1; }
        bless $self, $type;
    }

    In the foreach loop, then, we register all of the elements that the user passed in ('a','b','c','d'). We will remember this declaration later, when we decide whether an element is 'appropriate' or not. We now make a small shorthand function that checks to see if a data member is registered:

    sub isRegistered
    {
        my ($self, $key) = @_;
        my $list = $self->{'element_list'};
        if (!defined $list->{$key})
        {
            print "Element '$key' is not a 'blessed' data part of the object ",
                  $self->{type}, "!\n";
            return(undef);
        }
        return(1);    # the key was declared
    }

    Now, we are ready to make our 'tied' methods.

    The FETCH method

    When someone tries to access $self->{'e'}, the FETCH method is called:

    sub FETCH
    {
        my ($self, $key) = @_;
        if ($self->isRegistered($key)) { return($self->{hashval}->{$key}); }
        return(undef);
    }

    The call to isRegistered() acts as a 'filter': it is our check on whether or not the element $key has been registered. If it has not, we are given a warning, and the final return hands back an undefined value.

    The STORE method

    STORE is the opposite of FETCH. Here we check to make sure that our class is not storing an element which we do not want:

    sub STORE
    {
        my ($self, $key, $value) = @_;
        if ($self->isRegistered($key)) { $self->{'hashval'}->{$key} = $value }
        return(undef);
    }

    When we say $self->{'e'} = 1, the isRegistered() check again stops us. The key is not registered, nothing is stored, and undef is returned.

    Other tie Methods in SmartObject

    The other methods are simple consequences of the above. DELETE, CLEAR, EXISTS, FIRSTKEY and NEXTKEY all are either regular hash calls, or have the filter:

    sub DELETE
    {
        my ($self, $key) = @_;
        if ($self->isRegistered($key)) { delete $self->{'hashval'}->{$key}; }
        return(undef);
    }

    sub CLEAR
    {
        my ($self) = @_;
        my $key;

        my $hash = $self->{'hashval'};
        foreach $key ( keys %$hash ) { $self->DELETE($key); }
    }

    sub EXISTS
    {
        my ($self, $key) = @_;
        if ($self->isRegistered($key))
        {
            return(1) if (exists $self->{'hashval'}->{$key});
        }
        return (0);
    }

    sub FIRSTKEY
    {
        my ($self) = @_;
        my $reset = keys %{$self->{'hashval'}};    # restart the iterator
        my ($key, $value) = each(%{$self->{'hashval'}});
        return($key);
    }

    sub NEXTKEY
    {
        my ($self, $lastkey) = @_;
        my ($key, $value) = each(%{$self->{'hashval'}});
        return($key);
    }

    1;

    Through the magic of inheritance then (see section inheritance for more detail), the method

    $self->declare('a','b','c','d');

    calls the method

    SmartObject::declare($self, 'a','b','c','d')

    which then constructs a 'guard' through 'tie'ing a SmartObject 'around' the hash reference. Whew!
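To see the guard in action, here is a condensed, single-file sketch of the same idea. It keeps the names SmartObject, MyClass, declare and isRegistered from the listings above, but the tie methods are squeezed onto single lines, the complaint goes through warn rather than print so that it can be collected, and the internal key 'allowed' is our own choice.

```perl
use strict;
use warnings;

package SmartObject;

# Tie the object's own hash so every access goes through the checks below.
sub declare {
    my ($self, @elements) = @_;
    tie %$self, 'SmartObject', ref($self), @elements;
    return $self;
}

sub TIEHASH {
    my ($class, $owner, @elements) = @_;
    my %allowed = map { $_ => 1 } @elements;
    return bless { hashval => {}, allowed => \%allowed, type => $owner }, $class;
}

sub isRegistered {
    my ($self, $key) = @_;
    return 1 if $self->{allowed}{$key};
    warn "Element '$key' is not a 'blessed' data part of the object $self->{type}!\n";
    return 0;
}

sub FETCH    { my ($s, $k) = @_; return $s->isRegistered($k) ? $s->{hashval}{$k} : undef }
sub STORE    { my ($s, $k, $v) = @_; $s->{hashval}{$k} = $v if $s->isRegistered($k); return }
sub EXISTS   { my ($s, $k) = @_; return exists $s->{hashval}{$k} }
sub DELETE   { my ($s, $k) = @_; return delete $s->{hashval}{$k} }
sub CLEAR    { my ($s) = @_; %{ $s->{hashval} } = (); return }
sub FIRSTKEY { my ($s) = @_; my $reset = keys %{ $s->{hashval} }; return scalar each %{ $s->{hashval} } }
sub NEXTKEY  { my ($s) = @_; return scalar each %{ $s->{hashval} } }

package MyClass;
our @ISA = ('SmartObject');

sub new {
    my ($type) = @_;
    my $self = bless {}, $type;
    $self->declare('a', 'b', 'c', 'd');
    return $self;
}

package main;

my @complaints;
local $SIG{__WARN__} = sub { push @complaints, @_ };   # collect the warnings

my $object = MyClass->new();
$object->{a} = 1;               # declared, so the store goes through
$object->{e} = 2;               # not declared: complains, stores nothing
my $missing = $object->{e};     # complains again and hands back undef

print "a = $object->{a}\n";
print scalar(@complaints), " complaint(s):\n", @complaints;
```

Note that nothing in the calling code mentions tie at all. The object still looks like a plain hash reference; it has simply become pickier about which keys it will accept.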

    I guess that there are two points to this example:

    First, you can probably see why it is necessary to keep close track of your warnings and errors in complicated code! We have three levels of depth here:

    $self->declare() (in package MyClass) references

    SmartObject::declare() which calls

    SmartObject::TIEHASH();

    In addition, if there are any bugs, they will happen in MyClass, or the application that calls MyClass. Hence, carp(), confess(), and crew, plus redefining the signal handlers is essential to keep our sanity and to debug our classes.

    Second, you can see exactly how malleable Perl is! Any feature of other object oriented languages (privacy, etc) can be implemented in Perl, albeit at runtime. It is up to you to decide exactly how strict you want the language.

    This is the last of the chapters which emphasizes concepts in programming. Next, we turn to Perl projects, where we are concerned about making an application, rather than individual objects.



    HTML conversions by Mega Space.

    This page updated on October 14, 1997 by Webmaster.

    Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.
