Perl does not leave you blind when it comes to tracking down bugs, distributing your programs, and figuring out performance bottlenecks in your code. Far from it. Perl goes quite overboard with its developer's support. There are debuggers, performance profilers, coverage testers, and other testing tools, all of which are essential to a rapid software development cycle.
There is also a compiler, which is just as important. By compiling your Perl code into a standalone executable, you can hide its contents if you intend to sell your Perl programs.* By distributing compiled executables, with or without source, you also limit the number of dependencies that people using your software need to install, as well as reducing your program's start-up and run time.
Some folks do not like this type of logic. Somehow, the open nature of Perl and selling Perl programs do not mesh well. I guess my approach is that for every program I sell, I place a module in the public domain, just to give back to the programming community. Like all grass-roots technologies, there has to be a balance found between cooperation and competition. If every Perl programmer acted as an island, then Perl would never have started. At a very fundamental level, Perl is a language that facilitates sharing. Nonetheless, people have to be able to make a living with Perl; hence Perl's support for code privacy.
The goal of this chapter is to go over some of the tools that can help you with your programming effort. Make no mistake about it: these are 'cookie cutter' tools which take care of quite a bit of your day-to-day programming chores. How Perl implements these tools is also important; it is a fascinating study in modular design in itself. Hence, there will be four parts to this chapter.
First, we will give an overview of how Perl is structured to handle programming tools. The programming tools in Perl are self-referential; in other words, they are themselves programmed in Perl! This means that if you don't like the default Perl debugger, you can easily create your own.
Second, we will go over the default debugger that comes with the Perl distribution itself. The debugger is full-featured, option-rich, and of course has a Perl-ish syntax which makes it familiar and easy to use.
Third, we will go over some of the alternative debugging extensions that people have written for Perl, such as the profiler DProf, the line-by-line profiler SmallProf, and the coverage tester Devel::Coverage. All three of these modules are included on the CD at the back of the book.
Finally, we will consider the Compiler module. We shall look at its usage, and see how its judicious use can save you many headaches in actually managing your code base.
Perl has quite an elegant way to handle programs such as debuggers, profilers, and so forth. These programs help the programming process itself. There are two basic types of extensions which Perl defines:
extensions for debugging: profilers (for performance testing), debuggers, coverage checkers
extensions to manipulate the internals of Perl code: Lint and the Compiler
Instead of making one built-in program to do the work of a profiler and another to do the work of a debugger, Perl provides hooks which allow programmers to look inside the code as it is running or look at how it is compiled. Let's look at each of the above extensions in turn.
Perl supports programmable debugging extensions, as well as a default debugger. For example, if you say:
C:\> perl -d program.p
this tells Perl to turn on a special debugging flag internally. This flag tells Perl to tuck aside additional bits of information about what is going on inside the process, storing them for later use. It then fires up the default debugger, which is a Perl file, and which we shall describe later.
Simple enough, but if you say
C:\> perl -d:MyDebugger program.p
then instead of the default debugger, the debugger stored in Devel::MyDebugger runs. As we shall see, creating your own debugger is not that difficult. How would you actually program Devel::MyDebugger? The online documentation perldebug has more detail, but in short, when you say -d, the hooks Perl provides for a debugger programmer are made available through the package called DB.
Each time a new statement in a program run with the '-d' flag is executed, the function DB::DB is called behind the scenes. Here is a simple debugger that counts the number of statements run in any given program.
Devel/MyDebugger.pm
1 package Devel::MyDebugger;
2 my $_noOfStatements = 0;
3 sub DB::DB { $_noOfStatements++; }
4 END { print "This program had $_noOfStatements statements.\n"; }
5 1;
Line #3 keeps a running tally: each time a statement is executed, the counter is incremented. At the end of the program, the END block prints the total. If you run this debugger on the following example:
C:\> perl -d:MyDebugger -e 'for ($xx = 0; $xx < 5; $xx++) { print "HERE!\n"; }'
Then you get the following output:
This program had 8 statements.
Figure 22.1 shows this relationship:
Figure 22.1
Relationship between Debuggers and Perl Core
Each time a statement is run, then, the DB namespace is populated so you can take a look at its contents and manipulate them. One such piece of magic is DB::DB being run after each statement. Another is the array @{$main::{'_<' . $filename}} being set to the lines in the file.
By using caller() inside DB you can get the package, filename, and line number of the code being executed. From there, you can get the text of the line being executed. For example, if you say:
Devel/MyDebugger2.pm
package Devel::MyDebugger2;

sub DB::DB
{
    my ($pkg, $filename, $line) = (caller(0))[0..2];
    my $text = ${"main::_<$filename"}[$line];   # the magical line array for this file; newline included
    print "$pkg: $filename: $text";
}
1;
then when you say:
C:\> perl -d:MyDebugger2 -e 'for ($xx = 0; $xx < 5; $xx++) { print "HERE\n"; }'
you will get the following echoed back:
main: -e: for ($xx = 0; $xx < 5; $xx++) { print "HERE\n"; }
HERE
main: -e: for ($xx = 0; $xx < 5; $xx++) { print "HERE\n"; }
HERE
main: -e: for ($xx = 0; $xx < 5; $xx++) { print "HERE\n"; }
HERE
main: -e: for ($xx = 0; $xx < 5; $xx++) { print "HERE\n"; }
HERE
main: -e: for ($xx = 0; $xx < 5; $xx++) { print "HERE\n"; }
HERE
main: -e: for ($xx = 0; $xx < 5; $xx++) { print "HERE\n"; }
Perl alternates here between executing the debugger function DB::DB and executing the actual code, which prints 'HERE' to the screen. What we have basically implemented is a trace function, one that we will describe below.
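In fact, it takes only a few more lines to turn this into a crude, line-by-line statement counter - a distant cousin of the Devel::SmallProf module we will address below. The following is only a sketch: the module name Devel::MyCounter is made up for illustration. Save it somewhere in your @INC as Devel/MyCounter.pm and run it with 'perl -d:MyCounter script.p'.

package Devel::MyCounter;      # hypothetical name, for illustration only

my %count;                     # how many times each file:line has been executed

sub DB::DB
{
    my ($pkg, $filename, $line) = (caller(0))[0..2];
    $count{"$filename:$line"}++;
}

END
{
    # print the most frequently executed lines first
    foreach my $place (sort { $count{$b} <=> $count{$a} } keys %count)
    {
        print "$count{$place}\t$place\n";
    }
}

1;

Devel::SmallProf takes the same idea further by recording how much time each line takes, not just how often it runs.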
For more information on this topic the best place to go is inside the standard distribution, in:
lib/perl5db.pl
This is the standard debugger, the one that is automatically called when you say 'perl -d' on the command line. It is heavily documented, so you can see what is going on fairly easily.
You can also look at
Devel::SmallProf - the small, line-by-line profiler we will address below
Devel::Coverage - a good example of a coverage testing tool
Devel::DProf - the standard profiler everybody uses, although it is of limited help as a learning example since it is written in C and linked into Perl
We will cover each of these items below (including the standard debugger, of course). If you want to see the magic behind the scenes, start digging into the source code along with the perldebug page that comes with the central distribution.
Some people are surprised that the Perl debugger is not a built-in part of Perl or a C add-on, but is written in Perl itself. Since Perl's forte is manipulating and parsing text, all the writers of Perl needed to do was hang code on the hooks that we talked about up above, and voila: what would be a 50,000-line debugger in C becomes a 2,000-line Perl debugger. More powerful, too, since we can use all the power of Perl to talk to the applications we are debugging. You might not be surprised, then, that the Perl compiler itself is also written in Perl. Substitute a 50,000-line compiler for the 2,000-line debugger, and that pretty much sums up how the compiler was written.
Again, the logic is much more complicated than was the case with the debugger, but essentially what is going on is something that looks like Figure 22.2:
Figure 22.2
Relationship between Compiler and Perl Core
Here, instead of Perl populating a run-time data structure, the compiler reuses parts of the interpreter. This lets Perl do the difficult part of actually figuring out what a particular symbol means. For example:
${"main::$variable"};
is a SCALAR, but I'd like to see the regular expression that can parse this correctly, along with the other thousand or so ways to express scalars! The compiler piggybacks on the shoulders of the interpreter to turn this into the correct internal address, from which it can generate C code or bytecode.
Note that this is a very high-level description of what is occurring to make the Compiler and Lint modules work. It is not nearly as easy to program your own warnings as it is to program your own debuggers. In order to do so, you need to know the outline of the way a Perl program actually splits up into 'op codes', or operation codes. This is not an easy task.
For a good start on this subject, I suggest 'Advanced Perl Programming' by Sriram Srinivasan, or the online documentation called perlguts. But notice that this is a formidable task, and it probably will take you a while to get to the point where you can do this successfully.
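If your Perl installation includes the experimental compiler suite (the O front end and its B:: back ends, shipped with newer Perls and available from CPAN), you can at least get a feel for these op codes without touching any C. The invocations below are a sketch; the exact back-end names and their output format vary from version to version.

# Dump the internal op tree that the interpreter built for a one-liner
# (B::Terse back end):
C:\> perl -MO=Terse -e '$xx = $yy + 1'

# Run the Lint back end, which walks that same op tree looking for
# dubious constructs instead of printing it:
C:\> perl -MO=Lint,all program.p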
Perl differentiates itself from pretty much every other language out there by making it easy to program debugging tools and actually fiddle around with the language itself, thanks to its dynamic, adaptable syntax. We distinguished two types of auxiliary tools above: debuggers, which use a special package called DB that Perl itself provides via the '-d' flag, and compilers, which use the Perl interpreter itself to figure out how to parse, compile, and otherwise gauge your code.
Now is the time to look at examples of these principles in action. Some of you might be thinking that I am going about this backwards. After all, we showed you how to program a debugger, albeit a simple one, and only now am I actually getting into the default Perl debugger. Surely the latter is the easier topic!
Nonetheless, I think that the sequence is right. You will be a much more powerful Perl programmer when you realize exactly how malleable, bendable, and otherwise shape-able Perl syntax can be. Perl's debugger is only one of a multitude that could possibly exist, since, as you can see above, seldom is anything hard coded into the Perl core. Perl is in many ways the duct tape of languages. Larry Wall calls it a "glue language" and well, Perl itself is used to glue different parts of Perl together!
The program that most people think of when they consider tools to help them out in a Perl development environment is the Perl debugger. As we noted above, when you say something like:
prompt% perl -d program.p
Perl magically comes back with a prompt, looking something like:
Stack dump during die enabled outside of evals.
Loading DB routines from perl5db.pl patch level 0.94
Emacs support available.
Enter h or `h h' for help.
A::(a.p:16): 1;
DB<1>
The prompt (DB<1>) indicates that you are interacting with the Perl debugger. What is going on behind the scenes is that you have silently loaded the file:
lib/perl5db.pl
which loads a debugger programmed in Perl into memory. You interact with the debugger with a series of one-word commands which do such tasks as setting breakpoints, tracing, and so forth.
However, since the debugger is itself in Perl, you have the power to actually interact with the program that you are tracing. Suppose, for example, that you were writing a program that had an extremely large for loop in it. Suppose the relevant code looked something like:
script.p
#
#
55 for ($xx = 0; $xx < 10000; $xx++)
56 {
57 print "HERE!\n";
58 # do something interminably long based on $xx
59 }
In order to test this, you are obviously going to have to cut short the for loop. You could do this either by changing the code itself (not a very good idea) or by causing the loop to end prematurely. If we opt for the second, the debugger is the perfect way to handle it. All we need to do is load the debugger into memory on the script:
prompt% perl -d script.p
Stack dump during die enabled outside of evals.
Loading DB routines from perl5db.pl patch level 0.94
Emacs support available.
Enter h or `h h' for help.
A::(a.p:16): 1;
DB<1>
Now set a simple breakpoint in the script:
DB<2> b 57
This will stop the script as soon as it hits line 57. Then we proceed to run it:
DB<3> r
main::(script.p:57): print "HERE!!!\n";
We have now broken the script at line 57. The printing of the line indicates that the line is about to be executed; it has not yet been executed.
Since we are now inside the loop itself, we can modify it to our heart's content. We can execute any legal Perl command here. If we say:
DB<4> $xx = 9998;
DB<5> D
Deleting All Breakpoints.
we set the iterator to 9998 so there are only two more iterations of the loop to be executed. The 'D' then deletes all of the breakpoints that we have assigned, so that we will not be stopped the next time we hit line 57. Hence, when we say:
DB<6> c
to continue, the script only iterates twice through the loop rather than the full complement of ten thousand iterations.
This seems pretty simple, correct? The fact that the Perl debugger is so simple and so powerful has made it very popular. To get the full picture on the debugger, the best source is the online documentation (perldebug) as well as looking at the source code itself. Below, we give a quick overview of the most popular commands:
Common Debugger Functionality
There are thirty-two separate commands in the Perl debugger. Below are the twelve most common, along with some examples of their usage.
Online Help with 'h'
There are two forms of online help that you should be aware of. Type:
prompt% perl -d script.p
Stack dump during die enabled outside of evals.
Loading DB routines from perl5db.pl patch level 0.94
Emacs support available.
Enter h or `h h' for help.
DB<1> h
to get a list of all the commands in gory detail. Type:
DB<1> h h
to get a nice, handy cheat sheet to all the operations listed below, plus many others. This cheat sheet is so handy that I reproduce it below:
Figure 22.3
Cheat Sheet For Debugger
If you then want more information on each of these commands, you can say:
DB<1> h =
= [alias value] Define a command alias, or list current aliases.
to get a more detailed version of the command's text.
Interacting with the Debugged Program - running Perl expressions, and 'x' for expanding data structures
As we said earlier, the Perl debugger is a Perl debugger in more than one sense of the word. It debugs your programs, but it is also written in Perl so you can modify your program's attributes when you debug it. The following debugging session uses the LWP::UserAgent module introduced in chapter 12. It is used to give a 'command line' browser. In other words, you can interact with the net without the GUI attached, and without even writing Perl programs. First, load up a dummy program:
c:\> perl -d a.p
Loading DB routines from perl5db.pl version 1
Emacs support available.
Enter h or `h h' for help.
and then load functionality into it:
DB<1> use LWP::UserAgent;
Assuming that you have the user agent module of Perl installed, this gives you all the power necessary to get files off target servers and put them into memory:
DB<2> $ua = new LWP::UserAgent();
DB<3> $request = new HTTP::Request('GET', "ftp://ftp.cs.colorado.edu/ls-lR.gz");
DB<4> $result = $ua->request($request, "target_file.gz");
DB<5> system("gunzip target_file.gz");
DB<6> open(FD, "target_file");
DB<7> @lines = <FD>;
DB<8> @lines = grep(m"perl"i, @lines);
DB<9> print @lines;
# output deleted...
Note that all of these commands are being done in the debugger. Perl is essentially acting as a shell here, taking each of your commands one by one and immediately executing it. This interactive process gets the file ls-lR.gz from ftp.cs.colorado.edu, unzips the file, and reads the file's lines into memory. Perl then looks for the string 'perl' in these lines (case-insensitively, because of the m""i) and prints the matching lines. The debugger acts as a mixture of shell and Perl combined.*
*
Note, however, that we are not using my variables, because the debugger uses eval internally. If we used my variables, they would go out of scope immediately.
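A quick illustration of that gotcha (a sketch; the variable name is arbitrary):

DB<1> my $url = "ftp://ftp.cs.colorado.edu/ls-lR.gz";
DB<2> print $url;
DB<3> $url = "ftp://ftp.cs.colorado.edu/ls-lR.gz";
DB<4> print $url;
ftp://ftp.cs.colorado.edu/ls-lR.gz

The print at DB<2> shows nothing: the lexical $url vanished as soon as the eval for DB<1> finished. The global version assigned at DB<3> survives from command to command.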
printing data structures with 'x'
Remember Data::Dumper, that handy module for printing out data structures? Well, the debugger has its own form of 'Data::Dumper', which you can access via 'x'. In the following debugger session, you could print out the contents of @array1 by saying:
prompt% perl -d script.p
Loading DB routines from perl5db.pl patch level 0.94
Emacs support available.
Enter h or `h h' for help.
DB<1> @array1 = ( 0, \
cont: [1,2,3,4], \
cont: [5, [6,7,8,9], \
cont: 5,6, 7], \
cont: 1 \
cont: );
DB<2> x @array1
0  0
1  ARRAY(0x23234ab)
   0  1
   1  2
   2  3
   3  4
2  ARRAY(0x23424ac)
   0  5
   1  ARRAY(0x23242ab)
      0  6
      1  7
      2  8
      3  9
   2  5
   3  6
   4  7
3  1
As you can see, this output is designed more with visual debugging in mind than Data::Dumper's is. At a glance, you can see which array elements go with which subscript, although this clarity has a cost: the output is not nearly as good for building Perl tests, because Data::Dumper outputs legal Perl syntax and 'x' does not.
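For comparison, you can call Data::Dumper from the same debugger session, since @array1 is a global and is still in scope. The output below is roughly what the default settings produce; the exact spacing depends on the Indent option.

DB<3> use Data::Dumper
DB<4> print Dumper(\@array1);
$VAR1 = [
          0,
          [
            1,
            2,
            3,
            4
          ],
          [
            5,
            [
              6,
              7,
              8,
              9
            ],
            5,
            6,
            7
          ],
          1
        ];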
Also notice that we used a '\' on the end of the line to make a multi-line Perl script in the debugger. This is just one trick that makes complicated debugging easier here.
When you start up the Perl5 debugger, Perl just sits there and waits for you to start running the program that you have loaded. Now, as always, there are quite a few ways to do this.
Running Programs with 'r'
The first way to run programs is with 'r'. When you get a debugger prompt, simply type 'r':
DB<2> r
then your program will run either until it finishes executing your code (and returns to the debugger prompt) or until the program hits a fatal error. In the case of a fatal error, the debugger dumps out the sequence of calling routines (i.e., the stack trace) showing where the program ended its life.
tracing with 't'
The next most common command that you will use in conjunction with the debugger is 't'. The 't' command is used for showing verbose detail about the expressions that are being executed by Perl at any given time. When you say something like:
DB<1> t
Trace=on
then you are putting Perl in trace mode. Trace mode tells the Perl debugger to print out all the statements that pass through. In fact, the combination of 't' and 'r', as in:
DB<1> t
Trace=on
DB<2> r
will often be all that you need in order to trace through code errors. Suppose your program looks something like this:
Listing 22.1: Insertion Sort:
1 #!/usr/local/bin/perl5
2
3 my @array = (1, 66,2);
4 insertion(\@array);
5
6 sub insertion
7 {
8 my ($array) = @_;
9 for ($xx = 0; $xx < @$array; $xx++)
10 {
11 for ($yy = $xx+1; $yy < @$array; $yy++)
12 {
13 swap($array->[$xx], $array->[$yy])
14 if ($array->[$xx] > $array->[$yy]);
15 }
16 }
17 }
18
19 sub swap
20 {
21 my ($tmp) = $_[0];
22 $_[0] = $_[1];
23 $_[1] = $tmp;
24 }
This is just a simple insertion sort. Let's look at the algorithm in action:
Listing 22.2: The Insertion Sort Under Trace
1 DB<2> r
2 context return from : undef
3 main::(as.p:5): for ($xx = 0; $xx < @array; $xx++)
4 main::(as.p:6): {
5 main::(as.p:11): }
6 main::(as.p:7): for ($yy = $xx+1; $yy < @array; $yy++)
7 main::(as.p:8): {
8 main::(as.p:10): }
9 main::(as.p:9): swap($array[$xx], $array[$yy])
10 if ($array[$xx] > $array[$yy]);
11 main::(as.p:9): swap($array[$xx], $array[$yy])
12 if ($array[$xx] > $array[$yy]);
13 main::(as.p:7): for ($yy = $xx+1; $yy < @array; $yy++)
14 main::(as.p:8): {
15 main::(as.p:10): }
16 main::(as.p:9): swap($array[$xx], $array[$yy])
17 if ($array[$xx] > $array[$yy]);
18 main::swap(as.p:17): my ($tmp) = $_[0];
19 main::swap(as.p:18): $_[0] = $_[1];
20 main::swap(as.p:19): $_[1] = $tmp;
21 main::(as.p:7): for ($yy = $xx+1; $yy < @array; $yy++)
22 main::(as.p:8): {
23 main::(as.p:10): }
24 main::(as.p:13): print "@array\n";
Although this output does take some time to get used to - the brace positions in the for loops are counterintuitive - it is fairly easy to see what is going on. The test array is (1, 66, 2). We try to swap in lines 9-12 but without success, since $array[0] is already the lowest and does not need to be swapped.
In line 16, we swap $array[1] and $array[2], as indicated by the fact that we actually go into the subroutine swap (main::swap in lines 18-20).
Continuing with 'c'
The commands 'r' and 't' are what you might call the blunt instruments of bug finding. Fifty percent of my debugging sessions consist of:
DB<1> t
Trace=on
DB<2> r
and then lazily watching all of the debugging text fly by, as we have just seen. Sometimes I save the output to files for later perusal (by using 'script' on UNIX; I'm not sure of the equivalent on NT).
Sometimes you need a bit more of a delicate approach. Suppose, for example, there was a lot of overhead in starting your code. Say that you were testing only a small part of it, such as a new merge-sort algorithm to replace the insertion-sort one.
This is a job for 'c'. Instead of typing 'r', which runs through the whole program (until it finishes, dies, or hits a breakpoint), 'c' lets you run up to any particular place and stop there. Say your code looked like this:
Listing 22.3: Merge Sort (algorithm code listed only).
# other stuff at beginning.
150 my @array;
151 @array = (11112,12,442,1231);
152
153 mergesort (\@array);
154 print "@array\n";
155
156 sub mergesort
157 {
158 my ($array) = @_;
159 _mergesort($array, 0, scalar(@$array) - 1);
160 }
161
162 sub _mergesort
163 {
164 my ($array, $low, $high) = @_;
165
166 my $middle;
167
168 if ($low < $high )
169 {
170 $middle = int(($low + $high)/2);
171 _mergesort($array, $low, $middle);
172 _mergesort($array, $middle+1, $high);
173 _merge($array, $low, $middle, $high);
174 }
175 }
176
177 sub _merge
178 {
179 my ($array, $low, $middle, $high) = @_;
180
181 my $sorted_array = [];
182
183 my ($lowcounter, $highcounter) = ($low, $middle+1);
184
185
186 while (($lowcounter != $middle + 1) || ($highcounter != $high + 1))
187 {
188 if ($lowcounter == $middle +1)
189 {
190 push(@$sorted_array, $array->[$highcounter]);
191 $highcounter++;
192 next;
193 }
194 elsif ($highcounter == $high + 1)
195 {
196 push(@$sorted_array, $array->[$lowcounter]);
197 $lowcounter++;
198 next;
199 }
200 elsif ($array->[$lowcounter] < $array->[$highcounter])
201 {
202 push(@$sorted_array, $array->[$lowcounter]);
203 $lowcounter++;
204 }
205 else
206 {
207 push (@$sorted_array, $array->[$highcounter]);
208 $highcounter++;
209 }
210 }
211 splice(@$array, $low, scalar(@$sorted_array), @$sorted_array);
212 }
We probably do not want to have to wade through the first 150 or so lines of Perl that we have written. We certainly do not want to do this if we have thousands of lines of Perl code included from other modules and objects, such as Data::Dumper. (Yes, the debugger will happily step through included module code as well.) Instead, let's tell the interpreter to 'run up to a certain point and then stop':
DB<1> c _merge
This does us the favor of running all of the code that we are not interested in silently. We stop right at line 179, where we can now either trace the rest of the code, or step through it.
Suppose that we were having difficulty with the _merge portion of the merge-sort algorithm. Merge sort is a divide-and-conquer strategy: it splits up an array to be sorted into sub-arrays, which are themselves split up, until single elements remain; these are then re-united in a process that looks something like Figure 22.1:
Figure 22.1
Merge sort algorithm
The _merge algorithm is a key component of merge sort, and has to do something like the following. The two separately sorted arrays:
Array #1: 1 2 5 6 Array #2: 1 3 4 6
need to be merged into one sorted array:
1 1 2 3 4 5 6 6
This array can now be passed back to its parent to be merged into a larger array. However, there is a catch: in order for merge sort to work efficiently, we cannot be sloppy in our merging. We need to go through Array #1 and Array #2 only once.
So we come up with a plan. We will directly compare the leftmost remaining elements of Array #1 and Array #2, take the smaller of the two, and stuff it into a temporary array. We will repeat this process until there are no elements left in either of the child arrays (#1 and #2).
The best way to see how this works is by looking at it in action. Fire up the debugger and type:
DB<1> c _merge
main::_merge(merge.p:180): my ($array, $low, $middle, $high) = @_;
which brings us up to the point where the merge happens. We proceed to step through the code, first with 's':
DB<2> s
main::_merge(merge.p:182): my $sorted_array = [];
and then <CR> for return.
DB<3>
main::_merge(merge.p:184): my ($lowcounter, $highcounter) =
($low, $middle+1);
If we are eager to take a look at any of these values, say:
DB<3> x @_
0  ARRAY(0x173e78)
   0  11112
   1  12
   2  442
   3  1231
1  0
2  0
3  1
which gives us what has been passed to the subroutine. $low is zero, $middle is zero, and $high is one, so it looks like we are at the tail end of Figure 22.1. We continue stepping:
Listing 22.4 - the _merge function in Action
1 DB<4>
2 main::_merge(merge.p:186): while (($lowcounter != $middle + 1) ||
3 ($highcounter != $high + 1))
4 main::_merge(merge.p:187): {
5 DB<4>
6 main::_merge(merge.p:188): if ($lowcounter == $middle +1)
7 main::_merge(merge.p:189): {
8 DB<4>
9 main::_merge(merge.p:194): elsif ($highcounter == $high+1 )
10 main::_merge(merge.p:195): {
11 DB<4>
12 main::_merge(merge.p:200): elsif ($array->[$lowcounter]
13 < $array->[$highcounter])
14 main::_merge(merge.p:201): {
15 DB<4>
16 main::_merge(merge.p:207): push (@$sorted_array,
17 $array->[$highcounter]);
18 DB<4>
19 main::_merge(merge.p:208): $highcounter++;
20 DB<4>
21 main::_merge(merge.p:188): if ($lowcounter == $middle +1)
22 main::_merge(merge.p:189): {
23 DB<4>
24 main::_merge(merge.p:194): elsif ($highcounter == $high + 1)
25 main::_merge(merge.p:195): {
26 DB<4>
27 main::_merge(merge.p:196): push(@$sorted_array,
28 $array->[$lowcounter]);
29 DB<4>
30 main::_merge(merge.p:197): $lowcounter++;
31 DB<4>
32 main::_merge(merge.p:198): next;
33 DB<4>
34 main::_merge(merge.p:211): splice(@$array, $low,
35 scalar(@$sorted_array), @$sorted_array);
At each step in this process, we can stop, peek, investigate, and generally see what is going on through any Perl expression that we like. Line 34 shows the major step here. We have built up @$sorted_array to hold the part of the array which has been sorted by _merge(). We then replace the non-sorted part of our array with a sorted one. Since our sample array looks like:
@array = (11112,12,442,1231);
and the first two members of the array (11112 and 12) are the ones being merged here, this array should end up as:
@array = (12, 11112, 442, 1231);
In other words, an operation like Figure 22.2 should have occurred:
Figure 22.2
Operation to do mergesort
We can indeed test this by checking what 'x' has to say:
DB<5> x $sorted_array
0 ARRAY(0x2e8630)
0 12
1 11112
DB<6> x $array # Before
0 ARRAY(0x1985ec)
0 11112
1 12
2 442
3 1231
DB<7> x $array # After
0 ARRAY(0x1985ec)
0 12
1 11112
2 442
3 1231
As you can see, the sorted sub-array has been spliced into the full array that we are collecting.
'n' for stepping through code (over subroutines)
For a higher-level view of what is occurring, we can use 'n' instead of 's'. Say we wanted to see what is happening in merge sort while going through only one level of recursion, instead of hitting rock bottom as we did when _merge() was handling single elements.
We can step over the subroutines. We can say something like:
prompt% perl -d merge.p
Stack dump during die enabled outside of evals.
Loading DB routines from perl5db.pl patch level 0.94
Emacs support available.
Enter h or `h h' for help.
main::(merge.p:150): my @array;
DB<1> c _mergesort
We now step as we did before, but this time we use 'n', which does not descend into the recursion when we hit _mergesort():
DB<2> c _mergesort
main::_mergesort(merge.p:164): my ($array, $low, $high) = @_;
DB<3> n
main::_mergesort(merge.p:166): my $middle;
DB<3>
main::_mergesort(merge.p:168): if ($low < $high )
main::_mergesort(merge.p:169): {
DB<3>
main::_mergesort(merge.p:170): $middle = int(($low + $high)/2);
DB<3>
main::_mergesort(merge.p:171): _mergesort($array, $low, $middle);
DB<4> x @{$array}[$low..$middle]
0 11112
1 12
2 442
3 1231
DB<5> x @{$array}[$middle+1..$high]
0 3232
1 23421
2 5453
3 222
Just doing a sanity check here: the arrays should be unsorted before we call _mergesort() recursively. But after we step again:
DB<6>
main::_mergesort(merge.p:172): _mergesort($array, $middle+1, $high);
DB<7> !4 # does 4th command - DB<4>
x @{$array}[$low..$middle]
0 12
1 442
2 1231
3 11112
We miss the internals of the merge sort completely. Notice that the sub-array is completely sorted. Continuing:
DB<7>
main::_mergesort(merge.p:173): _merge($array, $low, $middle, $high);
DB<8> !5 # does 5th command - DB<5>
x @{$array}[$middle+1..$high]
0 222
1 3232
2 5453
3 23421
DB<9>
main::(merge.p:154): print "@array\n";
DB<10>
12 222 442 1231 3232 5453 11112 23421
We have conclusive proof that _mergesort() is working, at least at a higher level. After all, both sub arrays are sorted, and in line 154, we show that they are combined to become a sorted array. Of course, there is more than one way to do this tracing. Two other ways which we discuss below are through breakpoints and action points.
The 'c' command, continue, gives you a certain amount of control over where your program stops. The next step in control is the breakpoint - a place you set where code execution stops, so you can take a look at what is going on. The four main breakpoint-related commands are 'b' (set a breakpoint), 'd' (delete one), 'L' (list them), and 'D' (delete them all). We cover each below, starting with:
setting breakpoints with 'b'
The key 'b' sets a breakpoint, a place inside your code where the execution of the program will stop. When you then hit 'r' for run, the code continues up until the next breakpoint that you have set. You are then at a command prompt next to the statement on which you set the breakpoint. There are several ways that you can set the breakpoints in your code. For a complete bestiary, go to the online documentation in perldebug. Following are the most important.
Setting breakpoints on line numbers:
This is the least exciting of the ways to set breakpoints. All you have to do is say:
DB<3> b 30
and your program will then dutifully break every single time line 30 comes around. Generally, this is slow and inefficient since people think of code more in terms of subroutines than in line numbers. But see the second point below.
Setting break points on subroutines.
If you say:
DB<3> b sync
then Perl looks for the function sync, and sets the breakpoint at the first statement inside it. For example, in the following listing the breakpoint falls 'right here':
sync('variable1','variable2');
sub sync
{
#### breaks right here.
my ($name, $args) = @_;
}
You are, therefore, inside the subroutine; if you want to examine the arguments passed to that subroutine you can say:
DB<4> x @_;
'x' is the printout function - so we are getting a little bit out of order here - but this is a perfect example of non-intrusive debugging techniques. An internal variable (@_) holds all the information about what has been passed to the subroutine. Since the debugger is written in Perl, we can access @_ and print it out. The 'x @_' statement then results in:
DB<4> x @_;
0 'variable1'
1 'variable2'
where variable1 is the zero-th element of the array and variable2 is the first.
Setting breakpoints with conditionals:
There is one more way to set breakpoints for a given program: rather than having them fire unconditionally, you can attach a condition to them. In other words, you can say something like:
DB<5> b 20 ($variable > 5);
which will then break if and only if $variable is greater than five. Notice the lack of an if here. The reason for this is, again, Perl's crowded syntax: the condition must be a bare expression, and if we put an if in front of it, the debugger would not evaluate it correctly.*
*
This is one of the drawbacks of having a program that tries to define its own little command language and yet uses Perl's functionality. Perl has such a large syntactic footprint (i.e., almost any syntax, good or bad, is legal Perl) that there is almost nothing left over for the designer.
Setting conditional breakpoints is my favorite way to handle breakpoints. In fact, you can do things with conditional breakpoints that you could not possibly do any other way, as far as debugging goes.
Consider our merge sort code up above. We wanted to trace the functionality of _merge(), and we partially did so. However, since the recursion 'broke up' @$array into one-element chunks for handling by _merge(), we didn't get to test all the conditions that could possibly arise. We have tested the part of the _merge() algorithm that merges single elements together into pairs:
Figure 22.2
Merge sort algorithm - tested cases
We now need to test the cases of _merge() in which there are more than two elements in each sub-array being merged. With conditional breakpoints this is easy:
DB<2> c _merge # gets us up to _merge function.
main::_merge(merge.p:179): my ($array, $low, $middle, $high) = @_;
DB<3> b _merge $_[3] - $_[1] > 3;
DB<4> c
main::_merge(merge.p:179): my ($array, $low, $middle, $high) = @_;
DB<5> s
The command 'DB<3> b _merge ($_[3] - $_[1] > 3)' puts a breakpoint that fires only when $high is more than three elements beyond $low - in other words, only when we are no longer merging the tiny one- and two-element pieces.
One might ask: why is direct access of the @_ variable necessary? Well, when you say "b _merge", the breakpoint (and its condition) is evaluated before the first statement of the subroutine has run, so $low and $high have not been assigned yet. If we tried to use them in the condition, we would get undef.
After DB<5>, we are ready to step through our algorithm. See the section on 's' for more detail.
Deleting a breakpoint with 'd'
The keystroke 'd' does almost the exact opposite of 'b': it gets rid of one breakpoint instead of making one breakpoint. It is, however, of limited use. You can only say something like:
DB<6> d 20
where the argument to 'd' is a line number. The format 'd <subroutine_name>' doesn't work, nor does a conditional deletion such as 'd 20 ($varb > 5);'.
Listing all breakpoints with 'L'
The 'L' command simply lists the breakpoints that you have set inside your program, along with any conditions attached to them. If, for example, you have set three breakpoints at three different spots in your program:
DB<7> b qsort
DB<8> b qsort
DB<9> b partition $pivot > 5
the following command will then print them all out:
DB<10> L
12: my ($array) = @_;
break if (1)
18: my ($array, $pivot, $right) = @_;
break if (1)
31: my ($array, $pivot, $right) = @_;
break if ($pivot > 5)
If you want to go through and selectively delete them with 'd' now, you can.
Deleting all Breakpoints with 'D'
The 'D' command goes through and deletes all break points set in your program. This is quite handy when you have set 15 different breakpoints in 10 different module files, and just want to 'start over from scratch'.
action points with 'a'
Action points allow you to insert "temporary code" into your script so that it is executed each time the statement is hit. For example, if you wanted to count how many actual times a certain loop was performed in the following code:
1 my $xx;
2 for ($xx = 0; $xx < int(rand(600)); $xx++)
3 {
4 if (_isPrime())
5 {
6 calc_code();
7 }
8 }
9 1;
We could put a temporary variable in that does this counting, but that requires a change to the code - and sometimes you don't want to or can't change the code (as in debugging production systems). Hence, you can use the debugger instead:
C:\> perl -d randscript.p
Loading DB routines from perl5db.pl version 1
Emacs support available
Enter h or 'h h' for help.
main::(randscript.p:2): for ($xx = 0; $xx < int(rand(600)); $xx++)
DB<1> a 6 $count++;
DB<2> a 9 print "$count\n";
DB<3> r
142
This then gives a count of how many times the subroutine is run. Note that again we make $count a global since the debugger is working via an eval statement.
deleting action points with 'A'
'A' does the opposite of 'a'; it deletes all the action points. Unfortunately, there is no easy way to delete just one, but a request has been filed to add that ability.
Debugger Shell Commands
Perl's default debugger is quite flexible; in fact one could say that it is almost like a shell. It is possible to alias commands, interact with a command history, cycle back and forward through your commands like doskey or most Unix shells. The debugger is also hooked into the code that you are executing, hence you can search through the code for patterns, and list out pertinent sections.
Cycling through commands with up arrow and down arrow
Each time you type a command in the debugger interface, Perl stores that command internally in a buffer which you can then traverse by hitting the up arrow and down arrow. You need Term::ReadLine with a full-featured back end, such as Term::ReadLine::Perl (which in turn likes to have Term::ReadKey installed), in order to get this magic to work. You can get this code on either CPAN or the CD accompanying this book. Suppose that you typed the following commands:
DB<1> t
Trace=on
DB<2> b mysub
DB<3> s
DB<4> x @my_value
The idea behind this is that you want to manually track what @my_value is, and stop when that value changes. So now, instead of having to type 'x @my_value' every time, you can type:
DB<4> <CR>
to step, and then:
DB<4> <up arrow> <CR>
to get back the text 'x @my_value'. Do this several times, and you get something like:
DB<4> x @my_value
# print out
DB<5>
# print next line
DB<5> x @my_value
# print out value
DB<6>
In other words, you get an extremely rapid turnaround, seeing each line of code and then immediately seeing how that line of code affected your variable, all by typing three keystrokes in sequence (<CR> <up arrow> <CR>).
history of commands with 'H'
There is another way of accessing this buffer in addition to the step-by-step method we listed above. If, instead, you want to get a list of all the commands you typed, you can say:
DB<6> H
which will then output everything you have typed so far this session, prefixed by a number telling you what you had done. You can get a partial history by saying:
DB<6> H -5
which shows the last five commands. In the example with fetching a file via LWP::UserAgent, the output to this command would look something like:
10: H -5
9: print @lines;
8: @lines = grep(m"perl"i, @lines);
7: @lines = <FD>;
6: open(FD, "target_file");
You can now redo these commands with our next command: '!'.
redoing commands with '!'
Suppose that you want to redo the seventh command in the previous stack of commands. By saying:
DB<11> !7
Perl accesses the internal stack, and redoes the seventh command in its memory. If you say:
DB<11> !!
then in this case Perl will redo the tenth command, the last one typed ('H -5' - therefore you will get a history.) And if you say:
DB<11> !@
then Perl will reach back and get the last command that starts with '@'. In this case, it will redo the grep: '@lines = grep(m"perl"i, @lines);'
aliases with '='
When you find something cool that you want to do over and over again, you can save it. You simply say:
DB<1> = <alias_name> <command_name>
Hence, if you said:
DB<1> = x1 x @variable1
then any time you type 'x1' you will get a dump of @variable1. If you want to save your aliases, you will have to put them into a file that Perl evaluates before it starts the debugger, called $ENV{'HOME'}/.perldb. The above alias would become an entry in that file:
$DB::alias{'x1'} = 's/^x1/x \@variable1/';
For more information about the $ENV{'HOME'}/.perldb file, see the on-line documentation perldebug.
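As a sketch of a second entry, patterned after the examples in that documentation, the following makes 'len EXPR' print the length of EXPR:

$DB::alias{'len'} = 's/^len(.*)/p length($1)/';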
Looking Through Code with 'V' and 'S'
Let's take a look at some more pretty cool ways of searching through code. Perl has many other commands that we won't consider here, such as /pattern/ to search forward, ?pattern? to search backward, 'l' to print out lines, and 'w' to get a screen's worth of code. However, 'V' and 'S' are the best of these commands, since they let you do things that a straight editor, like vi or emacs, couldn't.
Looking at variables with 'V'
Perl has a way of doing a total variable dump with capital 'V'. Simply say something like:
DB<1> V main
and you will get a couple of screenfuls of your environment, variables included via use and require, any arguments that were passed to the script, and whatnot. All the global variables are available.
If you said:
DB<1> V main ~ENV
then this would show you the environment as seen by main.
Since my variables do not show up in the symbol table, they are not included in this list, however.
Looking at Subroutines with 'S'
The 'V' command has limited uses because of Perl's policy with variables (lexical variables don't show up in the symbol tables). However, the 'S' command to list subroutines is quite cool since it shows you exactly the API that you can use in modules.
For example, suppose you wanted to see the internals of the CPAN module. You could say something like:
prompt% /home/ed/perl5.004_50/install/bin/perl -d -MCPAN -e shell
which goes into the debugger:
Loading DB routines from perl5db.pl version 1
Emacs support available.
Enter h or `h h' for help.
main::(-e:1): shell
DB<1>
We then ask what subroutines it knows about:
DB<1> S
AutoLoader::AUTOLOAD
AutoLoader::BEGIN
AutoLoader::import
CPAN::AUTOLOAD
CPAN::Author::as_glimpse
CPAN::Author::email
CPAN::Author::fullname
CPAN::BEGIN
CPAN::Bundle::as_string
CPAN::Bundle::clean
CPAN::Bundle::contains
CPAN::Bundle::find_bundle_file
CPAN::Bundle::force
CPAN::Bundle::get
CPAN::Bundle::inst_file
CPAN::Bundle::install
CPAN::Bundle::make
CPAN::Bundle::readme
CPAN::Bundle::rematein
CPAN::Bundle::test
CPAN::Bundle::xs_file
CPAN::CacheMgr::BEGIN
CPAN::CacheMgr::as_string
CPAN::Distribution::MD5_check_file
CPAN::Distribution::called_for
CPAN::Distribution::clean
CPAN::Distribution::dir
CPAN::Distribution::eq_MD5
CPAN::Distribution::force
CPAN::Distribution::get
CPAN::Distribution::install
CPAN::Distribution::look
CPAN::Distribution::make
CPAN::Distribution::new
CPAN::Distribution::perl
CPAN::Distribution::pm2dir_me
CPAN::Distribution::readme
CPAN::Distribution::test
CPAN::Distribution::untar_me
CPAN::Distribution::unzip_me
CPAN::Distribution::verifyMD5
CPAN::END
###### MUCH DELETED ######
readline::rl_getc
readline::savestate
readline::substr_with_props
strict::bits
strict::import
strict::unimport
subs::import
vars::BEGIN
vars::import
All in all, when you give 'S' to the command line, you get 394 subroutines back, just for the command "perl -MCPAN -e shell"! Although this might seem intimidating, it boxes the problem, confines it. You can now see any subroutine that happens to run when you are doing different commands through CPAN in action. For example, we now set a breakpoint on CPAN::Distribution::perl:
DB<2> b CPAN::Distribution::perl
and we then run the program:
DB<3> r
Cannot create second readline interface, falling back to dumb.
Term::ReadLine::Perl::new('Term::ReadLine', 'CPAN Monitor') called at /usr/local/lib/perl5/CPAN.pm line 97
CPAN::shell() called at -e line 1
cpan shell -- CPAN exploration and modules installation (v1.30)
ReadLine support available (try ``install Bundle::CPAN'')
cpan> i Bundle::CPAN
## MUCH DELETED #####
Trying with /usr/local/bin/ncftp to get
ftp://ftp.digital.com/pub/plan/perl/CPAN/authors/id/HAYASHI/CHECKSUMS
Issuing "/usr/bin/ftp -n"
Local directory now /home/ed/.cpan/sources/authors/id/HAYASHI
GOT /home/ed/.cpan/sources/authors/id/HAYASHI/CHECKSUMS
Checksum for /home/ed/.cpan/sources/authors/id/HAYASHI/Term-ReadLine-Gnu-0
.09.tar.gz ok
x Term-ReadLine-Gnu-0.09, 0 bytes, 0 tape blocks
x Term-ReadLine-Gnu-0.09/typemap, 166 bytes, 1 tape blocks
x Term-ReadLine-Gnu-0.09/eg, 0 bytes, 0 tape blocks
x Term-ReadLine-Gnu-0.09/eg/ptksh+, 2649 bytes, 6 tape blocks
CPAN::Distribution::perl(/home/ed/perl5.004_50/install/lib/CPAN.pm:2652):
2652: my($self) = @_;
DB<2>
The debugger puts you exactly where you would want to be in order to trace that given function. You can then simply step through to see how CPAN actually figures out where your Perl executable is located, so that it can install modules correctly:
CPAN::Distribution::perl(/home/ed/perl5.004_50/install/lib/CPAN.pm:2653):
2653: my($perl) = MM->file_name_is_absolute($^X) ? $^X : "";
DB<2>
CPAN::Distribution::perl(/home/ed/perl5.004_50/install/lib/CPAN.pm:2654):
2654: my $getcwd = $CPAN::Config->{'getcwd'} || 'cwd';
DB<2> x $perl
0 '/home/ed/perl5.004_50/install/bin/perl'
DB<3>
We could of course go into much more detail, but for now, we will limit ourselves. It looks like this subroutine determines what 'Perl' you are looking at by seeing if $^X (the variable that holds the Perl executable name) is absolute or relative.
Summary of Debugger Commands
This section has shown only a subset of the debugger commands available for your use in a debugging session. You can also:
fork off commands to the shell
set 'actions' with 'a' so that a certain Perl command is executed before a piece of code is executed
print sessions and save them in a file
In addition, we have not even scratched the surface on customization. Do an 'h O' to see what options can be set (how many array elements are printed with 'x', what history limit can be set, and so on).
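For instance, two of the options documented there control how deeply 'x' and 'V' dig into arrays and hashes. The option names below are the ones listed in perldebug for this vintage of the debugger, and @huge_array is a made-up name for illustration:

DB<1> O arrayDepth=5 hashDepth=5
DB<2> x @huge_array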
If you are a fan of the editor emacs, the debugger can run inside emacs: install cperl-mode.el inside emacs and type
M-x perldb <CR>
script <CR>
cperl-mode.el does a lot of other things for you. Its main forte is syntax highlighting: highlighting variables, keywords, and subroutines so you do not make as many syntax mistakes. You can find it under the emacs/ directory in the standard distribution.
The commands covered are in table 22.1:
Table 22.1 - Common Perl debugger Expressions
key meaning
N/A running Perl expressions through the debugger
'h' getting online help
'x' printing data structures
'r' running programs
't' setting tracing on/off
'c' continuing a run
's' or <CR> stepping through a program line by line (<CR> works after you press 's' once)
'a' setting an action
'A' deleting all actions
'b' setting a breakpoint
'd' deleting a breakpoint
'D' deleting all breakpoints
'L' listing all breakpoints
<up arrow> Cycling up through commands in stack
<down arrow> Cycling down through commands in the stack
'H' history of all commands
'!' redoing commands
'=' making aliases
'V' printing global variables
'S' printing subroutines
These commands should do more than get you started. They should be about 95% of what you need from the debugger. Anything else you should be able to get with a combination of 'C:\> perldoc perldebug', 'DB<1> h h', and 'DB<1> h <key>'. Below, we shall try to fill in some of the gaps of keys that we did not cover.
An Example of Debugger Usage
Let's go through a couple of more cases where we would use the debugger. In my experience, I have noticed that the best use for the debugger is on the small to medium sized projects. In other words, you will want to use the Perl debugger with local, precise problems. For bigger issues and specialized debugging, see the next chapter, Debugging Techniques.
In short, the best time to use the debugger is when you have algorithm bugs, and want to step through in detail where the algorithm is going wrong. Since we have pretty much stuck to a sorting algorithm motif in this chapter so far, we might as well continue.
Example: Finding a Logic Bug in Heap Sort
Heap sort is another sorting technique, developed to sort 'in place' so you do not need to copy elements around. It performs much like merge sort, running in about the same (i.e., the same order of magnitude) amount of time. However, heap sort does not use heavy recursion: there is never a deep tree of calls to the same subroutine running at once, which makes it more predictable than merge sort. It is, however, a bit more complicated. It rests on the idea that an array has parent elements and child elements; that is, array element i has children 2i + 1 and 2i + 2. Each one of these children can have children of its own. The heap property states that no child has a value greater than its parent's. Figure 22.3 shows a heap.
Figure 22.3
A heap in action
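To make the parent/child index arithmetic concrete, here is a tiny sketch (not part of the heap sort program; the array is the heap that build_heap produces later in this section):

my @heap = (33, 6, 6, 2, 2, 4, 5, 1, 1);   # already satisfies the heap property

for my $i (0 .. $#heap)
{
    my ($left, $right) = (2 * $i + 1, 2 * $i + 2);
    next if ($left > $#heap);               # element $i has no children
    print "parent $i ($heap[$i]) -> child $left ($heap[$left])";
    print ", child $right ($heap[$right])" if ($right <= $#heap);
    print "\n";
}

Every line printed shows a parent whose value is at least as large as its children's values, which is exactly the heap property.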
The trick here is that once you have made something a heap, you know that the root of the heap (element 0, on the left-hand side) is the largest element. If we make the array into a heap, save that largest element at the end, and then continue to make a heap out of the remaining elements in the array, we can, ultimately, sort the whole array an element at a time. Figure 22.4 shows this process:
Figure 22.4
heap sort in action
Given the code for heap sort, there are two steps in our algorithm:
make heapify, a subroutine that turns an array into a heap
call heapify several times to sort the numbers in the array.
Listing 22.4 shows the actual heap sort code in Perl. That is, the actual heap sort code in Perl before a couple of bugs were removed by debugging it.
Listing 22.4 - heapsort.p
1 use strict;
2
3 my (@array) = (4,2,5,1,6,6,33,1,2);
4 heapsort(\@array);
5 print "@array\n";
6
7
8 sub heapsort
9 {
10 my ($array) = @_;
11
12 my $heapsize;
13 build_heap($array, \$heapsize);
14
15 my $xx;
16
17 for ($xx = $heapsize; $xx >= 0; $xx--)
18 {
19 swap($array->[0], $array->[$xx]);
20 $heapsize--;
21 heapify($array, 0, \$heapsize);
22 }
23 }
24
25 sub build_heap
26 {
27 my ($array, $heapsize) = @_;
28
29 my $xx;
30 $$heapsize = @$array;
31
32 for ($xx = int($$heapsize/2); $xx >= 0; $xx--)
33 {
34 heapify($array, $xx, $heapsize);
35 }
36 }
37
38 sub heapify
39 {
40 my ($array, $index, $heapsize) = @_;
41 my $left = leftify($index);
42 my $right = rightify($index);
43
44 my $largest = $index;
45 $largest = $left if (($left <= $$heapsize)
46 && ($array->[$left] > $array->[$index]));
47
48 $largest = $right if (($right <= $$heapsize) &&
49 ($array->[$right] > $array->[$largest]));
50
51 if ($largest != $index)
52 {
53 swap( $array->[$largest], $array->[$index] );
54 heapify($array, $largest, $heapsize);
55 }
56 }
57
58 sub leftify { return (2 * $_[0] + 1); }
59 sub rightify { return (2 * $_[0] + 2); }
60
61 sub swap { my $tmp; $tmp = $_[0]; $_[0] = $_[1]; $_[1] = $tmp; }
The trick is to find the bugs in the fewest steps possible. To do this, we need to know what is going on in the code.
We first build a heap in line 13, passing a reference around. Then, we continue to make heaps in the loop from lines 17 to 22. Each time through, though, we save an element, so the next heap we make is one shorter than the last.
This is perhaps where we want to start debugging. At each stage of the algorithm, we should see the sorted array growing on the right, and see heaps on the left. Let's start up the debugger:
prompt% perl -d wrongheap.p
Stack dump during die enabled outside of evals.
Loading DB routines from perl5db.pl patch level 0.94
Emacs support available.
Enter h or `h h' for help.
main::(wrongheap.p:3): my (@array) = (4,2,5,1,6,6,33,1,2);
DB<1>
This seems to be a job for 'a'. We shall set an action - a line of code to be executed - to print out our array each time the main loop's call to heapify (which re-makes the heap) is done. As we can see, this is at line 20, but if we were not sure where it was in the code, we could use the debugger to find out:
DB<1> /heapify/
20: heapify($array, 0, \$heapsize);
This looks for the first occurrence of the string "heapify" inside the code. After we have found it, we then can list the context around line 20:
DB<2> l 15+10
15:
16: for ($xx = $heapsize; $xx >= 0; $xx--)
17: {
18: swap($array->[0], $array->[$xx]);
19: $heapsize--;
20: heapify($array, 0, \$heapsize);
21: }
22: }
23:
24: sub build_heap
25: {
which says 'list line 15 and the ten lines after it'.*
I do not usually use the debugger to look at the code. Instead, I usually have a power editor (vim or emacs) open on the side and then reference the code directly that way. The exception to this is the 'S' function. I usually look through the project I'm working on with 'S' just to see an overarching view of the code. This can really help your understanding.
We can now insert our action into the for loop, and watch heap sort in action:
DB<3>
DB<4> a 21 print "@{$array}[0..$xx] >@{$array}[$xx+1..$#array]<\n";
DB<5> r
33 6 6 2 2 4 5 1 1
6 6 2 2 4 5 1 1 33 ><
1 2 6 1 2 4 5 6 >33<
2 5 1 2 4 1 6 >6 33<
1 2 4 1 2 5 >6 6 33<
2 1 1 2 4 >5 6 6 33<
2 1 1 2 >4 5 6 6 33<
1 1 2 >2 4 5 6 6 33<
1 1 >2 2 4 5 6 6 33<
1 >1 2 2 4 5 6 6 33<
>1 1 2 2 4 5 6 6 33<
1 1 2 2 4 5 6 6 33
As you can see, something weird is going on here. In the second line, a 'ghost' element seems to have arisen. The array is sorting correctly, but there is a '><' element, which is a dead giveaway that a blank element has crept in. We can see this more plainly by doing our own test call to heapsort, and then printing the results (unfortunately, the my variables have gone out of scope, so we define a new global @array):
DB::fake::(/usr/local/lib/perl5/perl5db.pl:2043):
2043: "Debugged program terminated. Use `q' to quit or `R' to restart.";
DB<6> @array = (1,2,3,4,0);
DB<7> main::heapsort(\@array);
4 2 3 1 0
2 3 1 0 4 ><
0 2 1 3 ><
0 1 2 ><
0 1 ><
0 ><
0 ><
DB<8> x @array
empty array
DB<9>
Ugh... Even worse. Here it looks like our bug is actually eating the elements up. So how do we track it down?
We track it down by noticing that somewhere between the top of the loop at line 16 and our action at line 21, the array size changed. Let's re-run the script through the debugger. For length's sake, I've eliminated our old actions below:
DB<1> c 16
main::heapsort(wrongheap.p:16): for ($xx = $heapsize; $xx >= 0; $xx--)
main::heapsort(wrongheap.p:17): {
DB<2> x $array
0 ARRAY(0x1cacec)
0 33
1 6
2 6
3 2
4 2
5 4
6 5
7 1
8 1
DB<3> n
main::heapsort(wrongheap.p:18): swap($array->[0], $array->[$xx]);
DB<3> n
main::heapsort(wrongheap.p:19): $heapsize--;
DB<3> x $array
0 ARRAY(0x1cacec)
0 undef
1 6
2 6
3 2
4 2
5 4
6 5
7 1
8 1
9
Aha! When we swap elements in line 18, we bring in an unwanted element. Remember that Perl automatically creates elements for you when you access an element outside array boundaries? This is what is happening here. $heapsize is too big by one, so $xx, which starts at $heapsize, is too big by one. We look up to where $heapsize is being set, and we see:
12 build_heap($array, \$heapsize);
This looks suspicious to me: $heapsize is being passed by reference to build_heap so that build_heap can change its value. We can trace this pretty much by hand:
29 $$heapsize = @$array;
$$heapsize gets set to @$array in scalar context - the length of the array. But since Perl arrays are subscripted from zero, $array->[$$heapsize] refers to an element one past the bounds of the array.
Let's go into our editor, and change the offending line to:
29 $$heapsize = @$array - 1;
(I wish there were a way to modify it temporarily in the debugger, but there isn't, for deep-seated reasons in the way eval works.) Run it again. This time, when we run it, it looks like this:
33 6 6 2 2 4 5 1 1
1 2 6 1 2 4 5 6 >33<
1 2 5 1 2 4 6 >6 33<
1 2 4 1 2 5 >6 6 33<
2 2 1 1 4 >5 6 6 33<
1 2 1 2 >4 5 6 6 33<
1 1 2 >2 4 5 6 6 33<
1 1 >2 2 4 5 6 6 33<
1 >1 2 2 4 5 6 6 33<
1 1 2 2 4 5 6 6 33
Much better. Each time the loop happens, the sorted part of the array grows by one, until finally the whole array is sorted.
That was a pretty easy bug in retrospect, but believe me, Perl's ability to blur the line between programming and debugging will really help you create bug-free programs fast. When coding at a low level, the debugger is your eyes into how the program works, whether that debugger is the one described above or simple print statements put into the code to figure out what is going on.
However, this is not the only way we can trace down logical bugs. The other major way of tracing down bugs is by doing coverage testing, in which you know which parts of your code have been executed and which ones have not.
Coverage testing is concerned with whether or how often a piece of code has been executed. To that end, there exists a CPAN module named Devel::Coverage, which is now in alpha state and should be out in beta before this book hits the shelves. (However, I have not had any problems using it so far.)
What is the need for coverage testing? If you have any conditional statements in your code at all, then you may have holes in your testing logic. Say you had the following code, in script.p:
1 if ($ARGV[0] > $ARGV[1]) { _doSomethingNice(); }
2 if ($ARGV[0] < $ARGV[1]) { _dieAHorribleDeathAndTakeDownTheOSWithMe(); }
If you just tested this code with a simple:
prompt% perl script.p 22 11
Then Perl would execute _doSomethingNice(). Believing that this code was ready for production, you could release it with the bug in line 2.*
The _dieAHorribleDeathAndTakeDownTheOSWithMe() bug might actually bite if you are doing something particularly dangerous and running the script as a user with special authority. In such cases coverage testing is a necessity, and you will get bitten quite badly if you don't do it assiduously. Needless to say, you are going to want to do a lot of testing for scripts that run as 'root' or 'super-user', since they could actually do what line 2 threatens if you underestimate what you are doing. |
This shows the need for coverage testing, under which you would have exercised the code in line 2 as well (preferably as a user with less authority, who cannot actually do the damage the bug entails). Devel::Coverage deals with this issue by keeping track of the lines of code that you have exercised, saving each file's lines, plus the number of times they were executed, into a '.cvp' file. You then use a tool called coverperl (which is installed into the Perl binary directory and should be in your path) to read the contents of this file and display them on the screen.
For example, let's see how well our heapsort code fares under the coverage test. We would say:
prompt% perl -d:Coverage rightheap.p
prompt% coverperl rightheap.cvp
Total of 1 instrumentation runs.
/home/ed/rightheap.p
1 main::build_heap
1 line 26
1 line 28
1 line 29
1 line 31
5 line 33
27 main::heapify
27 line 39
27 line 40
27 line 41
27 line 43
27 line 44
27 line 47
27 line 50
13 line 52
13 line 53
1 main::heapsort
1 line 9
1 line 11
1 line 12
##.... stuff deleted
22 main::swap
88 line 60
3 line 1
1 line 3
1 line 4
1 line 5
Lines like '1 main::heapsort' show how many times each subroutine has been executed. Lines inside subroutines are indented relative to their parent subroutine, and lines from the main script are not indented at all. This is pretty straightforward. The only slight complication comes with lines such as:
22 main::swap
88 line 60
This is saying that main::swap was called 22 times, but line #60 (part of the subroutine swap) has been executed 88 times! What's going on here?
Well, if we look at line 60, we see:
60 sub swap { my $tmp; $tmp = $_[0]; $_[0] = $_[1]; $_[1] = $tmp; }
which has four different statements in it. The Coverage tool, therefore, is sensitive to each statement in a line, not just the line itself. Unfortunately, right now the counts are jumbled together so we cannot see which statement has been run and which ones have been skipped (did I mention this was alpha software?), but that problem should be addressed by the time this book hits the shelves.
A couple more things about the Coverage tool. First, it resets its counts each time you modify the script in any way; the Coverage tool has no way of knowing whether the change you made was serious or not, so if you so much as add a space to the end of a line, all your previous coverage results are gone. It therefore makes sense to save the '.cvp' files often if you want a history of what you have tested. Either that, or save the coverage testing for last.
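If you do want a history, one low-tech approach (sketched below; the script name and the timestamp convention are just made up for illustration) is to copy the '.cvp' file somewhere safe after each run:
# savecvp.p - squirrel away a coverage file under a timestamped name (sketch)
use File::Copy;
my $file  = shift || 'rightheap.cvp';
my $stamp = time();                              # seconds since the epoch
copy($file, "$file.$stamp") or die "can't copy $file: $!\n";
print "saved $file as $file.$stamp\n";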
Second, the Coverage tool does not see the 'import()' calls, or anything else that happens at compile time. This is just a simple fact of how the debugger hooks work: they need compiled bytecode before they can start.
It is very important to realize that just because the Coverage tester says there has been more than one execution of a line, does not mean that the code is bug free. It is obvious that if a line says:
0 main::mysub
that this is a hole, a bug, since the code has never been tested before. However, it is not obvious that when you see:
1 main::mysub
that this is bug free. A lot of people get a false sense of security from Coverage tools. They say, 'Oh, I've already covered that branch, it must be OK.' Here is a just a simple example that shows otherwise.
If you wanted to match the integer part of any number, you might try this code:
$data = '234222';
my ($number) = ($data =~ m"(\d+)\.");
If we run this under the coverage testing tool, we will get the comforting statement:
Total number of instrumentations: 1
1 line 1
1 line 2
However, there is of course a bug in this code. Requiring a '.' in the regular expression causes the match to fail, and $number remains undefined. We will have to go on to other methods, which we will talk about next chapter (-w, use strict), to supplement our debugging arsenal.
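Incidentally, once the bug is spotted the fix is trivial; one possible version (a sketch) simply drops the mandatory '.' so the leading digits match whether or not a fractional part follows:
$data = '234222';
my ($number) = ($data =~ m"(\d+)");    # no trailing '.' required any more
print "$number\n";                     # prints 234222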
Up to this point, we have covered logical bugs, in which the bugs were dependent on outright coding mistakes. However, as anyone can attest, there are other bugs that can occur. For example, you know there is something wrong if you get a transfer rate of four characters per second while downloading a webpage, even though the functionality is correct.
For this purpose, Perl has two profilers which concentrate on two different scopes of measuring performance. First, there is Devel::DProf, the most popular of the Perl profilers. It is designed with the large-scale application in mind.
Second, there is the more recently introduced Devel::SmallProf, which is just as helpful for dealing with algorithm speed debugging. We shall go over an example of each.
But first, we shall look at a very helpful module called Benchmark, which will help immensely when you are debugging for speed.
Benchmark.pm is a gem of a module that comes with the standard distribution. It is invaluable for doing speed comparisons of code, especially on those small pieces of code which add up to make a large impact on your system, but which cannot be meaningfully benchmarked unless they are run thousands of times.
Suppose, for example, that you wanted to find out how much the copy in:
sub add { my ($a, $b) = @_; $a + $b }
slows down the adding function, as compared to:
sub add { $_[0] + $_[1]; }
which does no such copy because you are directly accessing the elements here. Well, Benchmark gives you a quick way of testing this:
addtest.p
use Benchmark;
sub addcopy { my ($a, $b) = @_; $a + $b }
sub add { $_[0] + $_[1]; }
timethese(1_000_000, { 'addcopy' => 'addcopy(543,111);',
'add' => 'add(543,111);' } );
timethese() is a quick and dirty test suite runner. Its job (in this instance) is to run addcopy() and add() one million times each, and display the elapsed time for each of these pieces of code. In this case:
prompt% perl addtest.p
Benchmark: timing 1000000 iterations of add, addcopy . . .
add: 13 secs (12.333 usr 0.00 sys = 12.33 cpu)
addcopy: 18 secs (18.33 usr 0.00 sys = 18.33 cpu)
Each copy of an integer (at least, one smaller than 32 bits) took approximately 6 seconds / 1 million trials / 2 integers per trial = 3 millionths of a second. If we wanted to test the effect of larger integers on how long addition takes, we could say:
sub addcopy { my ($a, $b) = @_; $a + $b }
sub add { $_[0] + $_[1]; }
timethese(1_000_000, { 'addsmall' => 'add(543,111);',
'addlarge' => 'add(543234234234232422,111);' } );
and get the following:
Benchmark: timing 1000000 iterations of addlarge, addsmall . . .
addlarge: 14 secs (13.56 usr 0.00 sys = 13.56 cpu )
addsmall: 14 secs (13.50 usr 0.00 sys = 13.50 cpu )
which shows that there is no appreciable difference between the two.
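One more Benchmark trick worth knowing: timethese() also accepts code references instead of strings (at least in the versions of Benchmark I have used), which avoids the quoting headaches of eval'ed code. A quick sketch of the first comparison redone that way:
use Benchmark;

sub addcopy { my ($a, $b) = @_; $a + $b }
sub add     { $_[0] + $_[1]; }

# code references are called directly instead of being eval'ed as strings
timethese(1_000_000, { 'addcopy' => sub { addcopy(543, 111) },
                       'add'     => sub { add(543, 111) } });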
Benchmark is infinitely useful. If you are just beginning with Perl, you can use it to get a feel about how fast or slow Perl is at certain things. Later, you can use it to test possible performance replacements for subroutines, to make sure that they actually make a performance improvement. This second use is what we will exploit below with the profilers Devel::DProf, and Devel::SmallProf.
Devel::DProf is the older of the two profilers. It is written in C, with the purpose of injecting as little overhead as possible into the timing of Perl code. For some reason it is not in the standard distribution, so you are going to need to pick it up from CPAN.
The use of Devel::DProf is analogous to that of Devel::Coverage. In fact, all of the debugging extensions outside the default debugger borrowed their usage from Devel::DProf:
C:\> perl -d:DProf insertion.p
This runs our insertion sort under the Profiler (with 100 numbers being sorted), saving the information inside tmon.out. You then can extract this information by saying:
C:\> dprofpp -F
The '-F' is necessary because the underlying C-code is persnickety about the way that Perl handles exiting subroutines. This produces an output looking something like:
Total Elapsed Time = 2.33 Seconds
User+System Time = 2.18 Seconds
Exclusive Times
%Time Seconds #Calls sec/call Name
52.2 1.140 1 1.1400 main::insertion
46.7 1.020 5050 0.0002 main::swap
which, of course, is not that meaningful to us. We already know that insertion sort spends most of its time swapping. We also know that main::swap is called by main::insertion, so the second figure is really just a component of the first. If we did not know the calling order, we could find out with:
C:\> dprofpp -T -F
This is not really good in this case since insertion sort is calling swap 5050 times in a row, leading to an output that looks something like:
main::insertion
main::swap
### repeated 4999 times.
DProf is not as good at small projects as it is at finding the bottlenecks in larger projects, especially larger text-processing applications. Unfortunately, we do not really have anything that shows off DProf's talents in this book. The best candidate we have is the calendar project from the last chapter.
So let's use DProf to see bottlenecks in the calendar. From there, we can tell whether or not we want to improve any subroutines.
Start by firing up the profiler, along with a client which uses the Tk::Calendar object:
C:\> perl -d:DProf calendar.p
where calendar.p looks like
use Tk;
use Tk::Calendar;
use Data::Dumper;
my $screen = new Tk::Calendar();
$screen->draw();
MainLoop();
Simple enough; make a calendar as the main window, draw it, and then go into the event loop. We need to exercise this code as much as we possibly can in order to get an idea of which functions are the bottlenecks. Cycle through the different operations that the calendar allows, such as:
going to the next and previous months
entering text into the associated calendar screens
bringing up the different types of notification boxes (flashing and stable)
If we were doing this formally, we would want to either:
1. split the non-GUI components from the GUI components, and set up a scheme to test the non-GUI components with a file.
2. Use WinPerl++ (aka GuiDo) to make a regression test for the perltk script. Unfortunately, right now GuiDo is only available on Win32 platforms, but that may change in the future.
For now, we will just play around. Start up the calendar with:
prompt% perl -d:DProf calendar.p
and wait for the screen to load. When I was doing this particular test, I:
Hit the forward month button four times
Clicked on four different dates in the same month
Entered in 'Coding Perl' as the plan for each of these days
Closed one of these windows
Tried to go back a month, opening up the file dialog box
Said 'yes' to the save question, closing all three dates
Went back three more months, opened the current day
Entered in 'book done', as the plan
Closed this window
Opened another date
Hit 'Done' in menu, bringing up 'Open Screens' window
Hit 'OK' to exit program.
Well, you get the idea. Real-life tests can be much more complicated than this, but they don't necessarily have to be. If you can put your code through this sort of test with a significant amount of interaction, you will get approximately 90% of the data about its performance.
Here are the results of the performance analysis. The profiler only shows the top fifteen subroutines by default. Let's go with this default for now.
Type:
prompt% dprofpp -F
The -F is almost a necessity, but there are a couple of other interesting options we could have used here:
'-O 1000' to show all of the subroutines profiled, not just the top fifteen
'-v' to sort by the average amount of time spent in each subroutine, rather than the total time involved
Here is a sampling of the output, with the subroutines we can control in bold:
Faking 118 exit timestamp(s).
Total Elapsed Time = 69.39 Seconds
User+System Time = 5.24 Seconds
Exclusive Times
%Time Seconds #Calls sec/call Name
7.63 0.400 1125 0.0004 Date::Format::Generic::time2str
6.30 0.330 369 0.0009 Tk::Widget::new
3.24 0.170 1 0.1700 Tk::MainLoop
3.05 0.160 299 0.0005 Tk::button
2.86 0.150 284 0.0005 Time::Local::timelocal
2.67 0.140 1125 0.0001 Date::Format::Generic::_subs
2.48 0.130 672 0.0002 Tk::configure
2.48 0.130 253 0.0005 Tk::destroy
2.10 0.110 275 0.0004 Tk::Date::new
2.10 0.110 1 0.1100 main::CODE(0xb1588)
2.10 0.110 1107 0.0001 Date::Format::Generic::format_b
2.10 0.110 284 0.0004 Date::Parse::CODE(0x30257c)
2.10 0.110 127 0.0009 Tk::Derived::configure
1.91 0.100 549 0.0002 Tk::Month::_diffMonth
1.91 0.100 235 0.0004 Tk::Derived::Subconfigure
So even though we interacted with the calendar program for about 70 seconds, we only actually exercised the Perl code for about 5 seconds or less.
What can we make out of the subroutines that take the most time? Well, time2str is used in more than one place, which can be confirmed either by looking at the code (the easiest way), or (if you are desperate) analyzing the output of:
prompt% dprofpp -T
which will give you the calling tree, with reams of output (the exercise is left to the reader). Since this subroutine is called from many places, optimizing it would give us the most bang for the buck. However, this particular subroutine is not a good candidate for optimization because we do not own it. Either that, or we optimize it and tell the author, Graham Barr, about it so that he can fold the fix into the release that he gives to the world.
In fact, out of all these 15 subroutines, there are only two that we have directly programmed and can therefore optimize: Tk::Date::new and Tk::Month::_diffMonth. Let's look at the code of Tk::Date::new:
13 sub new
14 {
15 my ($type, $widget, $monthstring, $day, $date, $row, $xcoord,
16 $ycoord) = @_;
17 my $self = bless {}, $type;
18
19 if (!defined ($_dbm))
20 {
21 $_dbm = {};
22 tie(%$_dbm, 'AnyDBM_File', $_dbmfile, O_RDWR|O_CREAT, 0640);
23 }
24
25 (
26 $self->{'monthstring'}, $self->{'day'},
27 $self->{'date'}, $self->{'row'},
28 $self->{'xcoord'}, $self->{'ycoord'}, $self->{'widget'}
29
30 ) = ($monthstring, $day, $date, $row, $xcoord, $ycoord,$widget);
31
32 my $string = $self->{'monthstring'};
33 $string =~ s"Calendar Month:""g;
34 $string = $self->{'date'} . " $string";
35
36 $self->{'secs'} = _getSecs($string);
37 $self->{'dbmstring'} = $string;
38
39 return ($self);
40 }
There does not seem to be a whole lot we can optimize here. The only call that is moderately labor intensive is the tie to the DBM file, and we have already optimized that by tying it only once and caching the handle in the class! The only reason Tk::Date::new() takes up so much total time is that each button is a date, and each time we go to the next or previous month, we have to recreate and draw thirty or so buttons.
By making the Tk::Month a canvas, and putting dates directly on that canvas, we may make a performance improvement. But since this particular item takes up only 2.1% of the time, the redesign is probably not worth it. We might want to verify that Tk::Date::new() is efficient by looking at Devel::SmallProf below, but otherwise I think that the extra work would not be worth it.
The second call that we can optimize is _diffMonth(). _diffMonth() is used to decide whether stepping a day forward or backward crosses into a different month, and looks like this:
sub _diffMonth
{
my ($secs, $type) = @_;
my $month = time2str("%b", $secs);
$secs = ($type eq '+')? $secs + 86400 : $secs - 86400;
my $newmonth = time2str("%b", $secs);
return(1) if ($month ne $newmonth);
}
Now, considering that time2str() had the highest profile listed above (taking 7.63% of the time), we might want to consider substituting a call to localtime() for the time2str() calls, since localtime() can do the same job.
This subroutine is listed below:
sub _diffMonth
{
my ($secs, $type) = @_;
my $month = (localtime($secs))[4];    # element 4 of localtime() is the month (0-11)
$secs = ($type eq '+')? $secs + 86400: $secs - 86400;
my $new_month = (localtime($secs))[4];
return(1) if ($month != $new_month);
}
How are we to be sure that the new _diffMonth() actually does what it says it does, and that it increases the performance of the application? Well, we could place this code in a second version of our calendar program, and then use DProf to compare the results. If I had an automated test suite, that might be just what we would choose to do; after all, the best way to test something is sometimes to see how it performs in real life.
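As for the correctness half of the question, a quick throwaway check (a sketch only - the date range is arbitrary, and it assumes the _diffMonth1() and _diffMonth2() subroutines from the benchmark listing below) is to walk both versions over a stretch of times and complain if they ever disagree:
# correctness check (sketch): the two versions should always agree
my $start = time();
for (my $secs = $start; $secs < $start + 90 * 86400; $secs += 3600)
{
    foreach my $type ('+', '-')
    {
        my $old = _diffMonth1($secs, $type) ? 1 : 0;
        my $new = _diffMonth2($secs, $type) ? 1 : 0;
        print "versions disagree at $secs ($type)\n" if ($old != $new);
    }
}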
As for the performance half of the question, the Benchmark module will do very nicely in this case. Define a benchmark test case:
Listing 22.X diffmonthtest.p
1 #!/usr/local/bin/perl5
2
3 use Benchmark;
4 use Date::Format;
5
6 my $t = timeit( 10000, '_diffMonth1(100000)');
7 my $v = timeit( 10000, '_diffMonth2(100000)');
8
9 print timestr($t), "\n";
10 print timestr($v), "\n";
11
12 sub _diffMonth1
13 {
14 my ($secs, $type) = @_;
15
16 my $month = time2str("%b", $secs);
17 $secs = ($type eq '+')? $secs + 86400: $secs - 86400;
18 my $new_month = time2str("%b", $secs);
19
20 return(1) if ($month ne $new_month);
21 }
22
23 sub _diffMonth2
24 {
25 my ($secs, $type) = @_;
26
27 my $month = (localtime($secs))[4];
28 $secs = ($type eq '+')? $secs + 86400: $secs - 86400;
29 my $new_month = (localtime($secs))[4];
30
31 return(1) if ($month != $new_month);
32 }
which uses the explicit interface of the Benchmark module. timeit() and timestr() give you a way of carrying around the results of a benchmarking run: timeit() actually runs the benchmark, and timestr() turns the result into a string. The results are:
time2str: 8 secs (7.87 usr 0.00 sys = 7.87 cpu)
localtime: 1 secs (1.77 usr 0.00 sys = 1.77 cpu)
So, yes, the results are quite a bit different. If we replaced all of our time2str() calls with localtime() calls, we could save:
.400 - .400 * 1.77/7.87 = .309 secs
Or a whopping .309 seconds. Not much, perhaps, but it does take the subroutines that we can control out of the top 15 CPU hogs. To make a long story short, here is the profile as it exists after we make the change:
prompt% dprofpp -F
Faking 104 exit timestamp(s).
Total Elapsed Time = 70.21 Seconds
User+System Time = 4.89 Seconds
Exclusive Times
%Time Seconds #Calls sec/call Name
6.95 0.340 379 0.0009 Tk::Widget::new
4.09 0.200 1 0.2000 Tk::MainLoop
3.48 0.170 284 0.0006 Time::Local::timelocal
3.07 0.150 1 0.1500 main::CODE(0xb1588)
3.07 0.150 302 0.0005 Tk::button
3.07 0.150 1 0.1500 Tk::Text::CODE(0xccd3c)
2.86 0.140 254 0.0006 Tk::destroy
2.66 0.130 284 0.0005 Date::Parse::CODE(0x30257c)
2.66 0.130 284 0.0005 Date::Parse::str2time
2.66 0.130 142 0.0009 Tk::Derived::configure
2.45 0.120 720 0.0002 Tk::configure
2.25 0.110 386 0.0003 Tk::Widget::CreateArgs
1.84 0.090 282 0.0003 Tk::Widget::place
1.84 0.090 369 0.0002 Tk::Widget::SetBindtags
We have gotten rid of the two items under our control that even showed up on DProf's radar. We could do the same for str2time(), but my hunch is that it is probably worthwhile to keep str2time() around for flexibility's sake.
The question now is: does all of this tinkering make the calendar object a better object? In this case, I would say yes. Even though .309 seconds doesn't sound like a great deal, when you are actually maneuvering around in the Calendar application, response time is very critical.
Many people find even half a second between mouse clicks on the Internet intolerable, let alone twenty or thirty seconds. Indeed, I did notice the speed difference in our example. Multiply this by the sort of differences you will find doing this in real life, and you've made your applications twice as usable.
DProf is a good tool to do debugging 'in the large'. To profile something, simply say:
C:\> perl -d:DProf file.p
resulting in a tmon.out file, and then say:
C:\> dprofpp -F
to get a printout of the functions that take up the most time. There is lots of functionality that we have not listed above. In their own way, dprofpp and DProf are just as complicated as the debugger, so here are the flags you can pass to dprofpp, listed below:
Table 22.X - dprofpp flags
Flag        Meaning
-O          maximum number of subroutines to display
-a          sort subroutines in alphabetical order
-l          sort by the number of calls to subroutines (useful when deciding what to inline)
-v          sort by the average amount of time spent in each subroutine
-T          show the call tree (shown above)
-t          show the compressed call tree (not very compressed, though!)
-q          don't print headers
-u          use user time rather than user plus system time
-s          use system time rather than user plus system time
-r          use real elapsed time rather than user plus system time
-U          do not sort subroutines
-E          subroutine times are reported exclusive of child times
-I          subroutine times are reported inclusive of child times
-V          print dprofpp's version
-p script   profile the given script and generate a report
-Q          used with '-p' to profile the script and then quit before generating the report
This list comes straight from the dprofpp script itself (installed with the central distribution). dprofpp elaborates on each of these flags in the pod documentation embedded in the script, so you may want to check it out.
The second profiler we will look at is called Devel::SmallProf. It is a relatively new (beta) piece of software which allows you to trace the performance of Perl code line by line. For example, suppose you wanted to see exactly how much time each line in the following code took:
Listing 22.X multvsadd.p
1 $DB::print_lines = 1;
2 my $yy = 1;
3 my $zz = 1;
4 for ($xx = 0; $xx < 100000; $xx++)
5 {
6 $yy = ($yy * 2) % 100001;
7 $zz = ($zz + $zz) % 100001;
8 }
Two things to note here. First, we set $DB::print_lines = 1, because that gives verbose output with the actual text of the lines (rather than just their numbers). Second, we take the results modulo 100001 just so the numbers don't get unreasonably big. We then use SmallProf, and say:
prompt% perl -d:SmallProf multvsadd.p
and look at the file smallprof.out. It looks like:
==================================================
Profile of multvsadd.p
==================================================
0 0.00000000 #!/usr/local/bin/perl5
0 0.00000000
0 0.00000000
1 0.00003397 $DB::print_lines = 1;
1 0.00003409 my $yy = 1;
1 0.00003493 my $zz = 1;
1 0.00004101 for ($xx = 0; $xx < 100000; $xx++)
0 0.00000000 {
0 0.00000000
100000 3.98235953 $yy = ($yy*2) % 100001;
100000 4.77289319 $zz = ($zz + $zz) % 100001;
0 0.00000000 }
==================================================
Hence, we can see that multiplying by two is faster than adding a number to itself.
By default, SmallProf profiles everything that is Perl. It does not profile C-code linked into Perl, because those lines of code are outside the range of the '-d' flag. If you are dealing with a large program which includes lots of modules, you are not going to want to profile everything. In cases like these, you are going to want to skip certain modules in order to pinpoint performance bugs. SmallProf has two ways of doing this, which are listed below.
$DB::smallprof_on
Devel::SmallProf has two ways of skipping modules. First, there is the flag $DB::smallprof_on which can be used to start and stop the profiling of data. For example, if you said something like:
$DB::smallprof_on = 0;
# do lots of other stuff
$DB::smallprof_on = 1;
for ($xx = 0; $xx < 10; $xx++)
{
print "Profiling!\n";
}
$DB::smallprof_on = 0;
then SmallProf would only profile the for loop and leave the rest of your code alone. Note that the default for $DB::smallprof_on is 1, so unless you want the beginning of your program profiled, you have to switch it off manually yourself.
@DB::packageprof_on
Of course, it would be a real pain to have to turn off profiling by hand each time you entered a different package, or called a method from another package. Therefore, SmallProf provides a simple way of turning off tracing for all but a subset of packages. If you say something like:
use Data::Dumper;
@DB::packageprof_on = ('Data::Dumper', 'main');
use Date::Format;
my $time = time();
time2str("%b", $time );
then SmallProf will only trace lines in the packages listed, Data::Dumper and main. Hence you will find out how much time the calls to time2str() and time() take from main's point of view, but the insides of Date::Format itself will not be profiled.
That's about it. Since this is beta software at the moment, there isn't too much functionality in it. Let's now take a look at an example of Devel::SmallProf, to compare three different sorting algorithms (that basically do the same thing) in a statistically sensible manner.
We have outlined three different types of sorting algorithms: insertion sort, merge sort, and heap sort. There are two more sorting algorithms that are just as popular: quick sort and radix sort.
In addition, Perl sports its own, internal version of quick sort which we discussed earlier. Just for kicks then, let's compare a:
radix sort coded in Perl
quick sort coded in Perl
the built-in quick sort.
This will probably be kind of embarrassing, since Perl does so much better on math-like tasks with built-in functions and C extensions than it does with native Perl code, but doing such a comparison really helps you understand what is quick in Perl, and what is not so quick.
Plan of Attack
The first step in this comparison is to code radix sort and quick sort. Second, we need to make what is called a test harness. One simple test is not going to be adequate, because a sorting algorithm might behave one way with a small set of input and another way with a totally different set of input. Third, we need to put the two together, run SmallProf, and analyze the results.
Coding Radix Sort and Quick Sort
I'm not going to dwell too much on how Radix Sort or Quick Sort work. Here is a small, very high-level explanation.*
For more details, go to the book 'Introduction to Algorithms' by Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. This is the canonical book on tons of algorithms, including quick sort and heap sort. |
Radix Sort works on the principle of collating its input: you look at the least significant digit, then the next most significant digit, and so on, up to the top digit. On each pass you re-arrange the elements according to the digit you are looking at. Sort of like Figure 22.X:
Figure 22.X: Radix Sort in action
By the time you have gotten to the last digit, the output is sorted.
Quick Sort is like Merge Sort in that it is a recursive divide-and-conquer algorithm. Instead of dividing exactly in half as merge sort does, quick sort divides according to an algorithm called partition, which sorts an array in place. Partition works like Figure 22.X:
Figure 22.X: The partition algorithm in quick sort
The key here is that the partition algorithm does a partial sort, such that one side of the array is always guaranteed to hold the smaller elements and the other side the larger ones (although nothing can be said about the ordering within either of the two sides). Partition also returns the location where this split between small half and large half occurs, so that quick sort knows how to divvy up the array next. By the time quick sort has gotten to the bottom (has divvied up the array into segments of one), the whole array is sorted.
Again, these are only high-level explanations of how the two algorithms work; in any case, I just lifted the pseudo-code out of 'Introduction to Algorithms' and fleshed out the details. So here are working versions of the code.
Implementation of Radix Sort.
Let's implement Radix Sort first.
Listing 22.X radix.p
1 use strict;
2 sub radix
3 {
4 my ($array) = @_;
5 my ($elem, $max, $xx);
6 my (@arrayelem);
7
8 foreach (@$array) { $max = (length($_) > $max)? length($_) : $max; }
9
10 for ($xx = 0; $xx <= $max; $xx++)
11 {
12 @arrayelem = ();
13 foreach $elem (@$array)
14 {
15 my ($key) = substr($elem, $xx, 1);
16 push(@{$arrayelem[$key]}, $elem);
17 }
18 straighten(\@arrayelem, $array);
19 }
20 }
21
22 sub straighten
23 {
24 my ($arrayelem, $return) = @_;
25 my ($array, $elem, $xx) = ('','',0);
26
27 foreach $array (@$arrayelem)
28 {
29 next if (!defined $array);
30 foreach $elem (@$array)
31 {
32 $return->[$xx++] = $elem;
33 }
34 }
35 }
Lines 15-16 are the key. We take one character from each element and use it to choose a bucket: if we see a zero, the element goes into bucket 0; if we see a one, bucket 1; and so forth. In line 18 we then re-assemble the array with the subroutine straighten(), taking the contents of these buckets and stringing them back together (lines 30-32). By the time we have arranged the last elements, the sort is done.
Quick Sort Implementation.
Let us now implement quick sort. Again, the key algorithm is the partition subroutine, which tells quick sort how to split up the problem and which does the sorting in place.
Listing 22.X - qsort.p
1 #!/usr/local/bin/perl
2
3 use strict;
4
5 sub qsort
6 {
7 my ($array) = @_;
8 _qsort($array, 0, scalar(@$array) - 1);
9 }
10
11 sub _qsort
12 {
13 my ($array, $pivot, $right) = @_;
14 my $new_pivot;
15
16 if ($pivot < $right)
17 {
18 $new_pivot = _partition($array, $pivot, $right);
19
20 _qsort($array, $pivot, $new_pivot);
21 _qsort($array, $new_pivot+1, $right);
22 }
23 }
24
25 sub _partition
26 {
27 my ($array, $low, $high) = @_;
28
29 my $element = $array->[$low];
30 my $left = $low;
31 my $right = $high;
32 my $action_taken = 0;
33
34 while ($left != $right)
35 {
36 $action_taken = 0;
37 if ($array->[$left] < $element) { $left++; $action_taken = 1; }
38 if ($array->[$right] > $element) { $right--; $action_taken = 1; }
39
40 if ($array->[$left] > $element)
41 {
42 swap ($array->[$right], $array->[$left]); $action_taken = 1;
43 }
44 if ($array->[$right] < $element)
45 {
46 swap ($array->[$right], $array->[$left]); $action_taken = 1;
47 }
48
49 if (($array->[$right] == $element)&&($action_taken == 0)) { $right--; }
50 }
51 return($left);
52 }
53
54 sub swap { my ($tmp) = $_[0]; $_[0] = $_[1]; $_[1] = $tmp; }
55 1;
As we said above, _partition() is the important algorithm here. The loop in lines 35 to 50 looks at elements on the left side of the array, and swaps them over to the right side if they are bigger than the central 'pivot' element (line 42). _partition() likewise looks at elements on the right side of the array, and swaps them over to the left if they are smaller than the pivot (line 46).
In addition, we skip over elements that are already in the 'right place': lines 37 and 38 advance the $left and $right markers past such elements, so the two markers constantly move toward each other as the algorithm proceeds. When they meet ($left == $right), we return the meeting point as the new place for quick sort to split up the array.
That's about it for our contenders. The quick sort that comes along with Perl is the easiest of all to use; all we have to do is say:
@elements = sort (@elements);
and this will do the job.
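One caveat worth noting: by default, sort() compares its arguments as strings, so for numeric data you would normally hand it a numeric comparison block. The usual idiom is:
# numeric comparison; without the block, (10, 9, 2) would sort as (10, 2, 9)
@elements = sort { $a <=> $b } @elements;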
Now that we are done coding the contending algorithms, we are going to need a series of arrays to sort. In this case, the Benchmark module isn't very useful, for we want to test the properties of sorting on differing arrays, not the same array a thousand times.
We need to come up with a test harness: something that generates tests and actually executes them. Fortunately, Perl is very good at that. The natural structure for this problem is an array of arrays, or rather three arrays of arrays, one for each sorting algorithm that we are going to test.
The test harness is in Listing 22.X:
Listing 22.X - testharness.p Test Harness for Sorting Comparison
1 use Data::Dumper;
2 use strict;
3
4 require "qsort.p";
5 require "radix.p";
6
7 my $xx;
8 my ($radix_sort, $perl_qsort, $internal_qsort) = ([],[],[]);
9
10 for ($xx = 0; $xx < 100; $xx++)
11 {
12 my ($no_of_elements) = int(rand()*100) + 2;
13 my $sort_array = _makeRand($no_of_elements);
14
15 $radix_sort->[$xx] = [@$sort_array];
16 $perl_qsort->[$xx] = [@$sort_array];
17 $internal_qsort->[$xx] = [@$sort_array];
18 }
19
20 for ($xx = 0; $xx < 100; $xx ++)
21 {
22 qsort($perl_qsort->[$xx]);
23 radix($radix_sort->[$xx]);
24 @{$internal_qsort->[$xx]} = sort (@{$internal_qsort->[$xx]});
25 }
26
27 sub _makeRand
28 {
29 my ($no_elements) = @_;
30 my $return = [];
31 my $xx;
32 my $range = int(rand() * 100) + 2;
33
34 for ($xx = 0; $xx < $no_elements; $xx++)
35 {
36 $return->[$xx] = int(rand()*$range);
37 }
38 return($return);
39 }
This is what you might call the scattershot approach to testing. First, we require the qsort.p and radix.p programs, even though we coded them as standalone files (this is a quick way to reuse functionality without having to cut and paste code). We then create a bunch of arrays of random length, random order, and random elements, copy each one three times, and stuff the copies into the three arrays of arrays. This ensures:
1. an even distribution of array sizes
2. good variability in their order
3. some atypical arrays. qsort may work well on totally random arrays, for example, but how does it work on sorting (1 2 1 3 1 2 3) where the elements are not too far apart?
In real life, we would want to do a lot more of this variability testing. We might want to test already-sorted arrays (1 2 3 4 5), or arrays that have patterns in them (1 2 3 1 2 3 1 2 3), to see how they affect the time.
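A hedged sketch of what those extra generators might look like, in the spirit of _makeRand() above (the subroutine names are made up for illustration):
# already sorted: (0, 1, 2, ..., n-1)
sub _makeSorted
{
    my ($no_elements) = @_;
    return([0 .. $no_elements - 1]);
}

# a repeating pattern such as (1 2 3 1 2 3 1 2 3 ...)
sub _makePattern
{
    my ($no_elements, $period) = @_;
    my $return = [];
    my $xx;

    for ($xx = 0; $xx < $no_elements; $xx++)
    {
        $return->[$xx] = ($xx % $period) + 1;
    }
    return($return);
}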
For a good example of real life testing, look at the 't/' directory (for tests) inside the Perl standard distribution. This gives an idea of how much variability a test suite can comprise, although Perl itself might be a worst case scenario.
Testing with SmallProf
We now have our test harness and we have our code. All we have to do is fire up SmallProf, and see what happens:
C:\> perl -d:SmallProf testharness.p
and print out the output:
==================================================
Profile of qsort.p
==================================================
0 0.00000000 #!/usr/local/bin/perl
0 0.00000000
3 0.00011611 use strict;
0 0.00000000
0 0.00000000 sub qsort
0 0.00000000 {
100 0.00416720 my ($array) = @_;
100 0.00485253 _qsort($array, 0, scalar(@$array) - 1);
0 0.00000000 }
0 0.00000000
0 0.00000000 sub _qsort
0 0.00000000 {
112114 4.70423114 my ($array, $pivot, $right) = @_;
112114 3.67239106 my $new_pivot;
0 0.00000000
112114 4.88533628 if ($pivot < $right)
0 0.00000000 {
56007 1.99951482 $new_pivot = _partition($array, $pivot, $right);
0 0.00000000
56007 2.12438154 _qsort($array, $pivot, $new_pivot);
56007 2.35733151 _qsort($array, $new_pivot+1, $right);
0 0.00000000 }
0 0.00000000 }
0 0.00000000
0 0.00000000 sub _partition
0 0.00000000 {
56007 2.34353590 my ($array, $low, $high) = @_;
0 0.00000000
56007 2.31031060 my $element = $array->[$low];
56007 1.96292341 my $left = $low;
56007 1.96854722 my $right = $high;
56007 1.94039702 my $action_taken = 0;
0 0.00000000
56007 1.87235343 while ($left != $right)
0 0.00000000 {
696038 24.10507333 $action_taken = 0;
1193510 43.92413342 if ($array->[$left] < $element)
{ $left++; $action_taken = 1; }
1330440 49.74650681 if ($array->[$right] > $element)
{ $right--; $action_taken = 1; }
0 0.00000000
842032 32.56550169 if ($array->[$left] > $element)
{
swap ($array->[$right], $array->[$left]);
$action_taken = 1;
}
888644 34.93133080 if ($array->[$right] < $element)
{
swap ($array->[$right], $array->[$left]);
$action_taken = 1;
}
0 0.00000000
814161 35.19384336 if (($array->[$right] == $element )
&& ($action_taken == 0)) { $right--; }
0 0.00000000 }
56007 2.81549573 return($left);
0 0.00000000 }
0 0.00000000
507900 18.91453564 sub swap { my ($tmp) = $_[0]; $_[0] = $_[1]; $_[1] = $tmp; }
0 0.00000000 1;
==================================================
==================================================
Profile of radix.p
==================================================
0 0.00000000 #!/home/epeschko/perl5.004_50/install/bin/perl
0 0.00000000
3 0.00011599 use strict;
0 0.00000000
0 0.00000000 sub radix
0 0.00000000 {
100 0.00426257 my ($array) = @_;
100 0.00358534 my ($elem, $max, $xx);
100 0.00347233 my (@arrayelem);
0 0.00000000
56173 3.00032341 foreach (@$array) { $max = (length($_) > $max)?
length($_) : $max; }
0 0.00000000
100 0.00402462 for ($xx = 0; $xx <= $max; $xx++)
0 0.00000000 {
391 0.18335211 @arrayelem = ();
391 0.01458871 foreach $elem (@$array)
0 0.00000000 {
218686 8.74936187 my ($key) = substr($elem, $xx, 1);
437372 19.37564433 push(@{$arrayelem[ord($key)]}, $elem);
0 0.00000000 }
391 0.01633441 straighten(\@arrayelem, $array);
0 0.00000000 }
0 0.00000000 }
0 0.00000000
0 0.00000000 sub straighten
0 0.00000000 {
391 0.01596546 my ($arrayelem, $return) = @_;
391 0.01618814 my ($array, $elem, $xx) = ('','',0);
0 0.00000000
391 0.01507688 foreach $array (@$arrayelem)
0 0.00000000 {
16963 0.69085145 next if (!defined $array);
3136 0.11467004 foreach $elem (@$array)
0 0.00000000 {
218686 10.01768053 $return->[$xx++] = $elem;
0 0.00000000 }
0 0.00000000 }
0 0.00000000 }
0 0.00000000 1;
==================================================
==================================================
Profile of testharness.p
==================================================
1 0.00003302 $DB::print_lines = 1;
1 0.00004196 for ($xx = 0; $xx < 100; $xx ++)
0 0.00000000 {
100 0.00549781 qsort($perl_qsort->[$xx]);
100 0.00527000 radix($radix_sort->[$xx]);
100 4.26977360 @{$internal_qsort->[$xx]} =
sort (@{$internal_qsort->[$xx]});
0 0.00000000 }
I've cut out all but the most essential lines, but I hope you get the idea. The internal sort took a total of approximately 4 seconds, radix took about 25, and the qsort programmed in Perl took about 140. More importantly, we can see exactly how the two algorithms work, and which statements are taking up the most time. qsort's partition function looks horrendously inefficient, for example: out of the roughly 140 seconds spent in quick sort, about 120 go to shuffling index variables around, and only 18 go to the actual swapping!
I must confess I perpetrated this particular crime of an algorithm: 'Introduction to Algorithms' has a typo somewhere in their version of partition, and I didn't want to
Anyway, through this sort of analysis you can see - down to the statement - where your bottlenecks are. With a little bit of effort, we could probably bring quick sort's time down to about 25 seconds, which is still 8 times slower than the built-in sort. Ah well; if we really wanted an algorithm different from the built-in one, we would probably have to resort to writing a C extension anyway, a topic discussed at length in the perlxs online manual.
SmallProf is pretty new software, but it has already changed the way I debug for speed. By using SmallProf often, you get an intuitive feel for what Perl does fast, and what Perl does slowly. There is a really cool section in the book 'Programming Perl'*
By Larry Wall - author of Perl himself, Tom Christiansen, and Randal Schwartz. The one with the Camel on the cover. |
that gives a bunch of efficiency tips. Here are some tips, summarized from that section (there are many more there):
Array access is faster than hash access
Hash access is faster than searching through an array (linear search)
Short-circuited alternation is faster than regular expression alternation (most of the time): /a/ || /b/ is faster than /a|b/ (see the Benchmark sketch below)
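For example, here is a quick Benchmark sketch of that last tip (the test string and iteration count are arbitrary, and Benchmark is handed code references here):
use Benchmark;

my $string = "the quick brown fox jumps over the lazy dog" x 10;

# two ways of asking the same question: is 'fox' or 'dog' in the string?
timethese(100_000, {
    'short_circuit' => sub { $string =~ /fox/ || $string =~ /dog/ },
    'alternation'   => sub { $string =~ /fox|dog/ },
});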
You can see all of these things in action by using SmallProf. We have just seen that things which look fairly innocuous (comparisons, array accesses) can make a big difference if you lazily throw them around (as I did) in a loop that is called often.
Let's discuss the Perl compiler. On the face of it, compilers are not the most exciting of programs. They don't actually let you 'do more'. Compilers are concerned with how scripts are packaged, rather than any new functions or features in the language.
However, if you think about it a little bit, having a compiler and using it well, gives you a lot of associated benefits. With a compiler you can:
1) Eliminate parsing time. Since Perl no longer needs to parse all those lines of code when you hit return, scripts that are longer than 1000 lines suddenly start up much faster than they would have otherwise.
2) Speed up execution time over interpreted code. This is not much of a deal in the Perl world, since code is already pretty optimized with the current bytecode interpreter. However, for some things such as object access, it is a major improvement.
3) Eliminate dependencies. The compiler can create a standalone executable for you. It is 'standalone' in the sense that you do not need to have the supporting libraries set up in the correct places, as you usually would with the interpreter.
4) Hide code which can still be executed. This is a big deal with some folks. If you can 'hide' your code, you can sell it. (I think that #3 is more important for the 'selling factor'. The dependencies in Perl can be a real wet-blanket on portability.)
Hence, when Perl got its own compiler (circa perl5.004), it was a real plus for the language. The way that the compiler was programmed - as an extension to the language - was also a big boon. Malcolm Beattie made it so that you could see how Perl parses its source code into an operation code tree (or opcode tree), which has led to the development of tools that can spot 'good style' or 'bad style' code without actually executing the code itself.
Also, the way that the compiler is programmed allows for many different types of compilation:
1) Compilation into a standalone executable. This is the main type of compilation, in which a script is turned into a binary executable which can be run without the need for supporting libraries.
2) Compilation of modules into linkable binary shared objects. Lots of available extensions include '.xs' code - which is really C-code that is compiled into a '.so' (shared object) file. And when you say:
use Object;
the binary libraries get loaded into your Perl script in no time flat. The compiler supports making these '.so' types of libraries out of your Perl modules, which cuts down on the loading time of the module that you have compiled.
3) Compilation of modules and scripts into bytecode. The interface for this is still being worked out, but eventually the Perl compiler will allow you to compile code into machine-independent bytecode. This is much like Java's bytecode, except that Perl's should be portable anywhere, since it does not have to worry about competing and incompatible virtual machines.
Although the interface for this is not nailed down (we won't cover it here), this is going to be something to watch. Bytecode gives you most of the advantages of compiled code (the startup time is minimal) yet it is just as portable as regular Perl is. Again, stay tuned.
Since Perl supports such a wide variety of platforms (each of which has its own rules about C code and how to compile it), the Perl distribution provides a front end to the complicated rules that are necessary to do compilation. This front end is named 'perlcc' (after the Unix C compiler) and it is quite easy to use. First, if you want to see all of the options for perlcc, you can say:
prompt% perlcc -h
or
prompt% perlcc
which lists all of the options that are available for compiling programs. Not that you need many of them: suppose that we wanted to compile the calendar program that we developed last chapter. We would simply say:
prompt% perlcc -ocalendar calendar.p
Then Perl will go into your calendar.p program, figure out what it needs in order to do the compilation, and actually compile calendar.p into the executable 'calendar'. That way, you can run:
prompt% calendar
from anywhere the calendar program is installed, without having to worry about having perlTk installed. This means, of course, that you can distribute the binary by itself, without needing Perl attached.
The second way of compiling with perlcc is to turn libraries into shared objects. This has the benefit that you can still see what is going on with your 'application' code (the central scripts which use the compiled libraries stay the same), but the resulting shared objects can still have dependencies on other shared objects, depending on how they were compiled.
If we wanted to compile the Tk::Calendar Perl source code into a shared object, we could have said:
prompt% perlcc -mTk/Calendar Tk/Calendar.pm
Again, this makes an '.so' file, but does not lessen the dependencies of Tk::Calendar on the objects that it uses, Tk::Month and Tk::Date.
There are two things that you should be aware of when using Perl's compiler. First, the compiler does not support every nuance of Perl syntax. You can write Perl code in which things happen at runtime that the compiler can't possibly know about ahead of time. For example:
my $xx = 2;
LABEL1: print "HERE2!\n";
LABEL2: print "HERE3!\n";
goto "LABEL". $xx;
This code figures out where it is going to jump only at the moment it actually does the jumping, which means that the compiler cannot know the destination ahead of time. Perl will warn you about such problems, so you have a chance to change your syntax. In 99.9% of cases, if you are fooling the Perl compiler, your code probably should be changed to something simpler.
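If you get such a warning, the usual cure is to recast the computed goto as an ordinary control structure that the compiler can see through; here is a hedged sketch of one way to rewrite the example above with a dispatch table:
my $xx = 2;

# a dispatch table of code references replaces the run-time goto
my %labels = ( 1 => sub { print "HERE2!\n" },
               2 => sub { print "HERE3!\n" } );

$labels{$xx}->();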
Second, note that the compiler is a new piece of software, and hence is somewhat experimental on some systems. The compiler front-end, perlcc, will warn you if you try to do something that the compiler does not support on your given platform. Email to perlhelp@apxtech.com if you have an issue that you want resolved with the compiler, or want to get some help.
The compiler is going to be your primary way of making your scripts portable from machine to machine. Working with an interpreter inevitably means dealing with lots of dependencies: a module has to be stored in a certain place, which means you have to support a directory structure, which means your scripts are tied down to a certain location. The compiler frees you from those dependencies to a large extent. Type 'perlcc -oscript script.p', and see your script run by itself as simply 'script'.
We covered quite a bit in this chapter:
1) making your own debuggers
2) the Perl internal debugger
3) coverage testing scripts
4) Perl profilers that allow you to do performance testing
5) the Perl compiler, which turns your scripts into executables
As time goes by, the number of these tools is going to expand quite a bit. Perl's success at developing tools to manipulate Perl programs is due to the fact that its debugging, profiling, coverage testing, and compiling machinery is programmable: all of these tools were programmed in Perl itself. This means that each of the tools listed is extremely versatile, since Perl itself is extremely versatile.
In the following chapter, we shall continue in this vein, talking about how to debug and manipulate your Perl programs. All of the above tools were fairly generic (you could find them in other languages). There are quite a few techniques which are Perl-specific, though, which help you debug in ways that are quite alien to other languages, and we turn to these tools next.