© 1997 The McGraw-Hill Companies, Inc. All rights reserved.
Any use of this Beta Book is subject to the rules stated in the Terms of Use.

Chapter 19: A Class for Maintaining Code Documentation

This chapter synthesizes the information from the last three chapters. Now that you understand Perl's object syntax and common object methods, can turn Perl code into objects, and realize the benefits and drawbacks of object oriented methodology, the next step is to put all of this knowledge to use in creating new classes to integrate into existing projects.

This chapter creates a class that keeps code documentation up to date, not normally a trivial task. This code documentation is used internally for program maintenance, and externally for help at the command line. In addition, we provide methods for manipulating the documentation inside your own code.

Everybody who has been in contact with documentation (even those of us who write books!) knows that there are several hurdles to overcome in order to make effective documentation. The class in this chapter - DocChecker (or Pod::Checker) - is intended to lower some of those hurdles for the documentation that comes in code.

Chapter Overview

This chapter goes over the steps from beginning to end in implementing objects. The process is:

1) recognize that a problem exists and define that problem

2) search through available resources for a solution

3) evaluate these resources to see if they are satisfactory at solving the problem, and if not, what we can do to integrate them into a solution of our own

4) make some proposals, and an informal design for a solution

5) move the informal design into a formal design.

6) implement the solution

7) document the new class.

8) regression test the new class.

The intent here is twofold. First, we want to show you a non-trivial Perl object and its syntax. Second, for those of you who have not done much object oriented programming, we want to show exactly how much work is involved in creating a solid object from scratch! Let's take a look at the problem: effective management of code documentation.

The Problem: Solving the Code Documentation Nightmare

Anyone who has been on a big project knows that documentation can become quite a bone of contention between the initial programmers and the maintenance programmers. Documentation can be:

in the wrong format

lost or never created

out of date

just plain ignored

Our documentation problem is really four problems in one, albeit interrelated. If the code documentation is the wrong format, nobody will ever read it, and it will be ignored. If the code documentation is out of date, it will be ignored. If the code documentation is in a hard-to-make format, nobody will ever do it, and it will never be created.

On the other hand, most people who work with or use code like to have programs documented. If you are a code maintainer or troubleshooter, good code documentation should make the job easier. If you are a code user, good documentation should steer you in the right direction.

So let's start on the path of making a module to handle this type of problem. We begin with our plan and take a look at the available resources.

Step #1: Looking at Available Resources

Looking for existing resources when starting a big project will ultimately be a big win for you. (It's a good habit to acquire.) When you look before you leap you:

save yourself time: you can often slip code from other people directly into your own code

learn more: you can learn from how other people do things

Now, you may not want to go along with the code solutions that you find. Sometimes the dependency on someone else's code is not acceptable, or you simply want to see if you can solve the problem better than what you find. It is your choice. But even if you do not accept what you find, you can still look through the code that you get for ideas. The starting point for this solution is called POD. POD, along with the ubiquitous CPAN and its wealth of pre-packaged functionality, is what we shall look at next.

POD - Plain Old Documentation

Realizing that documentation is often lost, the perl5 porters group decided that the best place to actually keep documentation for code is where it can't be lost. This place is in the code itself. There would be no way to delete documentation about the code without deleting the code itself. Beautiful!

Another problem that POD was meant to solve is the obsolescence of documentation. Say that the current hot thing is MSWord 7.0, and all of a sudden your company acquires a bunch of Sun boxes. Suddenly the documentation is out of date!

So a second design decision was made: that POD would be such a ubiquitous format that it can be translated into several other formats without difficulty, and that these formats are so common that there would be no chance of them getting out of date. POD can be translated into:

ASCII text

HTML (even does links for you)

latex

UNIX man pages

Postscript

Furthermore, all of the translations from POD to any given format were to be written in Perl itself, so there would be no problem of having any dependency on another program to do the conversion. This makes POD the perfect format for documentation for Perl scripts and modules. Below is a small user's manual. The online documentation in perlpod is more comprehensive.

Usage of POD

POD has two concepts as its roots that make it a simple yet effective documentation source for Perl scripts. First, POD uses tags. A tag, in POD terms, is an '=' followed by a string that occurs at the beginning of a line. The existing tags are =head1, =head2, =item, =over, =back, =cut, and =pod, although to use POD effectively you can confine yourself to '=head', '=item', '=over' and '=back'.

Second, POD works off the idea of paragraphs. A paragraph is defined as any string that is separated by at least two newlines. As POD works right now, if you make a tag, the text that gets associated with that tag will be up to the next two newlines.

=head1 This
=head2 That

This does not declare two separate tags (one with the value 'This' and one with the value 'That'). Instead, the one tag defined is:

This
=head2 That

because of the missing blank line between the two tags.
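The paragraph rule can be sketched in a few lines of Perl. This is only an illustration, not how the real translators are written; the helper name pod_paragraphs is made up for the example, and it simply splits text on runs of blank lines.

```perl
use strict;
use warnings;

# Hypothetical helper: split POD text into paragraphs, where a paragraph
# boundary is one or more blank (or whitespace-only) lines.
sub pod_paragraphs {
    my ($pod) = @_;
    return grep { /\S/ } split /\n(?:[ \t]*\n)+/, $pod;
}

# Only ONE newline between the tags: they land in a single paragraph.
my @joined = pod_paragraphs("=head1 This\n=head2 That\n");

# A blank line between the tags: two paragraphs, hence two separate tags.
my @separate = pod_paragraphs("=head1 This\n\n=head2 That\n");

print scalar(@joined),   "\n";   # 1
print scalar(@separate), "\n";   # 2
```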

Note that the Perl executable is integrated with POD, so you can embed documentation directly into code. If you say:

for ($xx = 0; $xx < 10; $xx++)
{

=head1 This does nothing

This really does nothing, I swear!

=cut

}

This is perfectly legal syntax in Perl. However, note the blank lines around each POD paragraph and the closing =cut tag. Without the =cut, Perl would complain about the missing bracket, because the closing bracket would be seen as documentation rather than code. The Perl executable skips everything from a POD tag through the next =cut, so the documentation is embedded in the code itself.

You now have embedded your documentation into the code. To retrieve the documentation, use one of the translators which come along with the Perl executable. There are four translators which strip out all of the code, leaving nothing but the documentation intact. (Think of the translators as kind of like a Perl executable in reverse.) These format translators are:

pod2html, pod2man, pod2text, pod2latex

Hence, if you say:

C:\> pod2html --infile=script.p --outfile=script.html

Perl then creates a file named script.html which can be used with your favorite browser. In addition, there is a pod2ps translator available on CPAN, which turns the documentation into PostScript.

We now need to look at the actual tags that you can use with POD. These are just a subset; take a look at the perlpod manpage online for more information.

=head

The first thing that POD looks for is a =head tag which signifies the beginning of a header. Anything after a string '=head' that starts at the very beginning of a new line will be ignored by the Perl bytecode interpreter and compiler, and tossed into the bit bucket. For example, this small program:

$a = 1;

=head1 Sets $a to 1

This is a test of setting $a to 1

=head2 doesn't do much

As it says, it doesn't do much.

will compile, because everything after =head1 is ignored. The '1' and '2' in '=head1' and '=head2' are heading levels: a =head2 is a subheading of the =head1 above it. If we ran it through the pod2html translator and loaded the result into the browser, we'd get something like Figure 19.1:

Figure 19.1: Translation of head tags to HTML

Note the internal links here. pod2html is nice enough to put in cross-referencing links for you, so you get a mini-outline at the beginning of the HTML page.

=item and =over

=item and =over are used to make lists of items. For example, you could write:

Listing 19.1 itemtest.p

$a = 1;

=head1 This is the first list of items

=over 4

=item *

This is item 1

=item *

This is item 2

=back

as documentation to your script. It will shift the output over 4 characters. Type:

C:\> pod2html --infile=itemtest.p --outfile=itemtext.html

to create the file named itemtext.html, which then can be pulled up in a browser. This looks something like Figure 19.2:

Figure 19.2: Item and over tags

As you can see, items don't qualify as being part of an outline. Instead, pod2html makes them bullets under the head tag immediately above.

Summary of POD usage

This is pretty much all you need to know about POD in order to become productive with it:

=head defines a heading

=item defines an item inside the code

The details are plentiful in the perlpod manpage, and there are plenty of examples in the standard distribution because everything is "POD"- ified!

CPAN

POD is a strong ally in our fight against the code documentation nightmare, but it isn't the entire solution. Remember our four original problems, that documentation can be:

(a) in the wrong format

(b) lost or never created

(c) out of date

(d) just plain ignored

Problems (a) and (b) are solved by the judicious use of POD. Problems (c) and (d) are not. After all, just because documentation is in a certain place does not mean that it is going to stay up to date or be paid attention to.

The 'out of date' problem is, of course, a matter of "round tuits" for most programmers. Most programmers simply don't take the time to properly document code, and by the time they notice that documentation is a problem in their code, the task is so monumental that they don't want to do it.

The 'ignore' problem is another big deal, which usually has something to do with ease of use. People want documentation to be available, and be easy to find. They do not want to have to search through directories, open executables, or to navigate through thousands of pages of online documentation in order to find what they want. They just want it correct and they want it now!

Looking for Documentation information on CPAN

So, being prudent programmers, and not wanting to redo anything that has already been done, the next step we take is to look online for available documentation packages. The easiest way to do this is by using the CPAN module. This assumes you have the CPAN module installed. (Please do so! It is so cool...) We can use the 'info' feature in the CPAN shell to look for modules having to do with documentation. The information feature, again, is designated by the letter 'i':

C:\> perl -MCPAN -eshell
cpan> i /Doc/
Going to read C:\TEMP\CPAN\01mailrc.gz
Going to read C:\TEMP\CPAN\02packag.gz
Going to read C:\TEMP\CPAN\03mlist.gz
No objects found of any type for argument /Doc/

This searches for all of the modules that have the substring 'Doc' in them. Since we did not find any items with the substring 'Doc' in them, let's try 'Pod' (you can just as easily say 'pod', 'POD' or 'pOd'; the CPAN search is case insensitive):

C:\> perl -MCPAN -eshell
cpan> i /Pod/
Going to read C:\TEMP\CPAN\01mailrc.gz
Going to read C:\TEMP\CPAN\02packag.gz
Going to read C:\TEMP\CPAN\03mlist.gz
Distribution BRADAPP/PodParser-1.00.tar.gz
Distribution KJALB/PodSimplify-0.04.tar.gz
Module Pod::Functions (TIMB/perl5.004.tar.gz)
Module Pod::HTML (KJALB/PodSimplify-0.04.tar.gz)
Module Pod::Html (TIMB/perl5.004.tar.gz)
Module Pod::Index (Contact Author KJALB=Kenneth Albanowski)
Module Pod::Latex (Contact Author KJALB=Kenneth Albanowski)
Module Pod::MIF (Contact Author JNH=Joseph N. Hall)
Module Pod::Man (Contact Author KJALB=Kenneth Albanowski)
Module Pod::Parser (BRADAPP/PodParser-1.00.tar.gz)
Module Pod::Pod (Contact Author KJALB=Kenneth Albanowski)
Module Pod::Select (BRADAPP/PodParser-1.00.tar.gz)
Module Pod::Texinfo (Contact Author KJALB=Kenneth Albanowski)
Module Pod::Text (TIMB/perl5.004.tar.gz)
Module Pod::Usage (BRADAPP/PodParser-1.00.tar.gz)
Module Tk::Pod (NI-S/Tk402.002.tar.gz)

There is quite a lot of information here! With the exception of Tk::Pod, each one of these modules has 'Pod::' in front of it.

There do not seem to be any modules that have to do with verification of documentation, however. All of the modules seem to be interested in slicing and dicing POD into desirable formats (HTML, Man, etc). If we want to get more information, we can say:

C:\> perl -MCPAN -eshell
cpan> a /KJALB/
Author id = KJALB
EMAIL kjahds@kjahds.com
FULLNAME Kenneth Albanowski

to get more information on the modules, and to contact the authors for more detail.

Since we haven't found anything that really addresses our two remaining problems (ignoring code documentation and keeping code documentation 'in date') it looks like we have some coding to do. So let's turn to actually proposing a solution, and see how we might implement it.

Summary of Evaluating the Available Resources

We have gone through two major steps in evaluating whether or not a solution is correct for us. We have:

1) looked at the standard distribution, and found that a major chunk of our work is done in the form of 'Pod' or plain old documentation.

2) looked at CPAN for available modules, and found that there are not very many cookie cutter solutions for two of our remaining problems - to keep documentation in date, and by extension to keep people interested in the code.

So let's turn to making proposals on how to deal with these problems next.

Step #3: Proposals for a Solution

The solution I propose is a class. The name of the class I had in mind was 'DocChecker', but let's name it Pod::Checker instead, to be more in sync with CPAN. The class will not work everywhere - 90% of helpful solutions won't - but with a little effort you can modify it to fit inside your code.*

In fact, that is probably why there is not a true solution available on CPAN. CPAN tends to favor true-blue 100% workable solutions and it is incredibly difficult to come up with a bulletproof solution for this problem. A true, 100% solution would require that documentation was integrated a lot tighter into the Perl executable itself - maybe, for example if there was a 'doc checker' flag which you could run, something like:

c:\> perl -DOCCHECK file.p

which would check 'file.p' for consistency. This flag would have to know about what subroutines are in all the packages, enforce a standard on documentation, and generally be a jack-of-all-trades to combine these elements together. That would probably require a lot of wizardry. Perl 6, anyone?

However, notice that there are a couple of modules which look like they could help a lot to simplify our programming job. Pod::Parser looks like it could make our job considerably easier. You will see, the module we are going to tackle isn't the easiest to program!

For now, we will assume a canned solution does not exist, both for the experience of programming it, and because sometimes you really do not want the dependency.


Remember, the problems for this example are keeping documentation up to date and getting easy access to the documentation that we do have from inside the code. Let's take the two problems in hand, and see what we are going to have to do to solve them.

Proposal #1: Fixing the 'Out of Date Documentation' Problem

The solution for this is simple, although some of you may find it a bit drastic. It is to make out of date documentation a compile-time warning or error. (Depending how draconian you are, that is. Forcing programmers to always have in-sync documentation will definitely not win you any popularity contests.)

After all, the best way to force people to keep up their documentation is to make non-documented scripts annoy them:

1) Warn if a script is not documented (in development).

2) Die if a script is not documented (in production).

By doing this, you gently coerce them to document their code while they are working on it, and flatly refuse to accept their code if they don't accept your warnings.
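The warn-or-die policy can be sketched as below. This is a minimal illustration rather than the class's actual code: the helper name report_problems and the 'development' and 'production' mode strings are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: report documentation problems according to mode.
# In 'development' we warn and carry on; in 'production' we refuse to run.
# Returns 1 when there is nothing to report, 0 after warning.
sub report_problems {
    my ($mode, @problems) = @_;
    return 1 unless @problems;                 # nothing to complain about

    my $message = join '', map { "Pod::Checker: $_\n" } @problems;
    die $message if $mode eq 'production';     # fatal in production
    warn $message;                             # merely annoying in development
    return 0;
}

# In development, this prints a warning to STDERR and execution continues.
report_problems('development', "no documentation for sub 'myMethod'");
```

Keeping the policy in one small routine means the decision between annoying and refusing can be changed without touching the checks themselves.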

The Approach

Now that we have an idea for the solution to the out of date documentation problem, the next thing we have to do is decide how we are going to implement the Pod::Checker class. This may seem fanciful, but with Perl it is easy. All you have to do is:

1) Define a simple POD format to which all documentation on the project should conform.

2) Define a coding standard for how the code looks, so that Perl can recognize which subroutines are in a given module. This helps Perl decide if the given subroutine doesn't have any documentation.

3) Define a coding standard for arguments from the command shell or command line

Our simple strategy will be to compare the documentation to the subroutines that are actually in the code, and then to compare the documentation about the options with the options that are in the code.

Perhaps an illustration of these three points is in order. Suppose that the module has subroutines 'a' and 'b' defined in the code. Then the documentation had better have an item that describes each of 'a' and 'b'. Likewise, if the module has options 'arg1' and 'arg2', it must have documentation on 'arg1' and 'arg2'.

Let's take the following pseudo-script as an example, and go through each one of these three points:

Listing 19.2 pseudo_script.p

1 use MyModule;
2 use Getopt::Long;
3
4 GetOptions(\%varb, '--arg1:s','--arg2:i');
5
6 my $a = new MyModule($myarg);
7 my $text = $a->gettext();
8 myMethod();
9 sub myMethod
10 {
11 # do junk
12 }
13 sub myMethodToo
14 {
15 # do other junk
16 }

There's a lot going on in this pseudo-script. It has two possible arguments (arg1, arg2). A new 'MyModule' is created in line 6. There is a call to both a subroutine internal to the script (line 8) and a method for $a (line 7). Finally, two subroutines are defined ('myMethod', 'myMethodToo'). All of this needs to be documented - so what can we do to make this code more maintainable, given that we are using POD?

1) define a simple POD format to which all documentation on the project should conform

Standardization is a good thing in this case; we want to define a documentation format that is so simple and quick that everybody is going to use it, pretty much without thinking. It is easier to follow the documentation requirement if there is a template to simply 'cut and paste' into a given document. Listing 19.3 gives a sample format, one that we can put to good use for this particular problem:

Listing 19.3 - template tagged onto pseudo-script.p

17 __END__
18
19 =head1 NAME
20
21 pseudo_script.p - shows sample documentation format
22
23 =head1 SYNOPSIS
24
25 pseudo_script.p --arg1 <string> --arg2 <string>
26
27 =head1 DESCRIPTION
28
29 pseudo_script.p is a simple sample of a template being used in a script,
30 and how you could implement it in your code.
31
32 =over 4
33
34 =item method B<myMethod>
35
36 Sample method showing how methods are documented
37
38 =item method B<myMethodToo>
39
40 Another method that we document
41
42 =back
43
44 =head1 BUGS
45
46 The code doesn't do anything at all; this should probably be fixed.
47
48 =cut

Simple enough. Notice that we tagged this template onto the end of the script (for convenience's sake), and notice that the documentation section is longer than the code itself. Anyway, the meaning of each '=head1' and '=item' section is pretty straightforward:

The NAME tag keeps track of the name of the module or script

The SYNOPSIS gives a quick idea how to use the program. (In scripts this will be a list of arguments plus associated descriptions, and in modules this will be a list of common method uses.)

The '=item method' flags go over each method in the script or module in detail.

The BUGS section shows you things that need to be fixed.

We therefore have two parts of the script here, code and accompanying documentation. We can then have Pod::Checker do some very specific comparisons between these two parts to make sure that they are in sync. First, we compare the subroutines. Note lines #9 and #13:

9 sub myMethod # defines myMethod.
13 sub myMethodToo # defines myMethodToo.

We then compare these with lines

34 =item method B<myMethod>
35
36 Sample method showing how methods are documented
37
38 =item method B<myMethodToo>
39
40 Another method that we document
41

Pod::Checker will then confirm that documentation and code are consistent.
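The comparison can be sketched as follows. This is a simplified illustration under stated assumptions, not the finished class: the three helper names are hypothetical, subroutines are assumed to be declared at the start of a line, and methods are assumed to be documented with the '=item method B<name>' convention from the template.

```perl
use strict;
use warnings;

# Hypothetical helper: names of the subs declared in the code part of a script.
sub declared_subs {
    my ($source) = @_;
    my ($code) = split /^__END__$/m, $source, 2;   # ignore the POD after __END__
    return $code =~ /^\s*sub\s+(\w+)/mg;
}

# Hypothetical helper: names documented as '=item method B<name>'.
sub documented_subs {
    my ($source) = @_;
    return $source =~ /^=item\s+method\s+B<(\w+)>/mg;
}

# The subs that have no matching '=item method' entry.
sub undocumented_subs {
    my ($source) = @_;
    my %documented = map { $_ => 1 } documented_subs($source);
    return grep { !$documented{$_} } declared_subs($source);
}

my $script = <<'EOF';
sub myMethod
{
}
sub myMethodToo
{
}
__END__

=item method B<myMethod>

Sample method showing how methods are documented

=cut
EOF

print join(', ', undocumented_subs($script)), "\n";   # myMethodToo
```

Because Perl names are case sensitive, 'MyMethod' in the code and 'myMethod' in the documentation would be reported as a mismatch.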

As for the arguments to GetOptions, well, let's note line #4:

4 GetOptions(\%varb, '--arg1:s','--arg2:i');

We can now compare this to:

23 =head1 SYNOPSIS
24
25 pseudo_script.p --arg1 <string> --arg2 <string>
26

to make sure that each of the arguments has a small documentation item. We could go further, and enforce something like:

41 =item argument <arg1>
42
43 arg1 does the thing that arg1 does.
44

in which we give a short, written description of how the argument works; we will leave this to you as an enhancement...
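The option check described above can be sketched the same way. The helper names here are hypothetical, and the sketch assumes a single literal GetOptions call and options written as '--name' in the SYNOPSIS section.

```perl
use strict;
use warnings;

# Hypothetical helper: option names from a literal GetOptions(...) call.
sub declared_options {
    my ($source) = @_;
    my ($args) = $source =~ /GetOptions\s*\((.*?)\)\s*;/s or return;
    return $args =~ /['"]-{0,2}(\w+)/g;      # '--arg1:s' yields 'arg1'
}

# Hypothetical helper: option names mentioned as --name in the SYNOPSIS.
sub documented_options {
    my ($source) = @_;
    my ($synopsis) = $source =~ /^=head1\s+SYNOPSIS\b(.*?)(?=^=\w|\z)/ms
        or return;
    return $synopsis =~ /--(\w+)/g;
}

# The options that the code accepts but the SYNOPSIS never mentions.
sub undocumented_options {
    my ($source) = @_;
    my %documented = map { $_ => 1 } documented_options($source);
    return grep { !$documented{$_} } declared_options($source);
}

my $script = <<'EOF';
GetOptions(\%varb, '--arg1:s','--arg2:i');
__END__

=head1 SYNOPSIS

pseudo_script.p --arg1 <string>

=head1 DESCRIPTION

=cut
EOF

print join(', ', undocumented_options($script)), "\n";   # arg2
```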

Note that this is the bare minimum which we will enforce here. We will not enforce that documentation is a certain length, or if it is correct in any way shape or form. Instead, we will program our Pod::Checker module such that if you want to make your documentation more complicated than this, you can.

But it's been my experience that even this bare minimum isn't enforced in many places. If you had descriptions defined for every module and script, and enforced the convention to keep them up to date, then the job of documentation is about 80% done. As for this example, simply pass it through the 'pod2html' html-ifier:

c:\> pod2html --infile=pseudo_script.p --outfile=C:\temp\pseudo.htm

and then you will see this when you use a browser:

Figure 19.3: Documentation from pod2html in browser

This shows that we have our document formatted correctly.

2) define a coding standard for how we are to represent our subroutines

One of the main reasons that this is only a 90% solution is because Perl has such a complicated syntax. Remember how free form Perl can be? Well, this plays havoc with any script that parses through Perl unless we control ourselves. Consider something like:

sub # this is a comment
subroutine_name
{
}

This is a perfectly legal script. It so happens to have a comment embedded between the sub keyword and the subroutine name. Likewise:

sub

=head1 subroutine name

=cut

{
}

is legal, with an embedded pod statement. Try making a regular expression to match any of these! In order for this mixing and matching to work, using Perl to parse through the code, we are going to have to constrain ourselves just a little bit in our coding style.

We must settle on good, uniform formats for subroutines in order for our Pod::Checker to work. Subroutine declarations that a module will be able to parse through easily, taking one of the three following formats:

sub format { }

sub format($$) # prototypes... see perlfunc for more info
{ # tis a rapidly changing field
}

sub format # indented spacing and comments
{
}

should be enough. The format is a newline, followed by zero or more spaces, followed by the sub keyword and then a word of one or more characters. We say this for ease of parsing, but it isn't perfect. For example,

$line =<<'EOF';
sub generatedSub
{
}
EOF

can fool it because generatedSub looks like a subroutine, but is really part of a string. Situations such as this can come up if you are generating code. We can get around this by saying something like:

$line =<<'EOF';
sub generatedSub # IGNORE
{
}
EOF

where IGNORE is a special keyword, which tells the parser to ignore the subroutine as a fake. But, if you implement this workaround, make sure to still warn about it when code goes through into production. For convenience's sake, people might use the IGNORE keyword to actually put off writing documentation, which is something we'd rather avoid.
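A line-by-line scan that honors the IGNORE keyword might look like the sketch below. The helper name find_subs is hypothetical. The ignored names are returned separately so that they can still be reported when code moves into production.

```perl
use strict;
use warnings;

# Hypothetical helper: find sub declarations written in one of the agreed
# formats. Lines carrying the '# IGNORE' marker are collected separately
# rather than silently dropped.
sub find_subs {
    my ($source) = @_;
    my (@real, @ignored);
    for my $line (split /\n/, $source) {
        next unless $line =~ /^\s*sub\s+(\w+)/;
        my $name = $1;
        if ($line =~ /#\s*IGNORE\b/) { push @ignored, $name }
        else                         { push @real,    $name }
    }
    return (\@real, \@ignored);
}

my ($real, $ignored) = find_subs(<<'EOF');
sub format { }
sub withProto($$)   # prototypes
{
}
    sub indented    # indented spacing and comments
{
}
sub generatedSub # IGNORE
{
}
EOF

print "real: @$real\n";         # real: format withProto indented
print "ignored: @$ignored\n";   # ignored: generatedSub
```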

Now, for the next point: non-defined arguments. They are just as bad (or worse) as far as documentation goes, since this is usually the place where the end users of the code will actually make mistakes. End users are generally less forgiving than other developers.

3) define a coding standard for arguments from the command shell or command line

This is a bit more tricky, since there is no strict standard format for command line arguments. Some programmers like to use the Getopt::Std module that comes with the standard Perl distribution. It lets you do things like:

prompt% ps.p -acf

where '-a', '-c', and '-f' are all options in the standard sense. This is called clustering, and many UNIX commands seem to be pretty fond of this configuration. I dislike clustering, mainly because I'm more fond of explicit things like:

prompt% ps.p -common -schd -full

which shows exactly what these options are doing, and lessens the learning curve for the command.

And to complicate matters further, remember how many different command line libraries there are out there! We did the math when we looked at command line solutions earlier:

cpan> i /Getopt/
Module Getopt::EvaP (LUSOL/Getopt-EvaP-2.3.1.tar.gz)
Module Getopt::Gnu (Contact Author WSCOT=Wayne Scott)
Module Getopt::Help (Contact Author IANPX=Ian Phillipps)
Module Getopt::Long (JV/GetoptLong-2.9.tar.gz)
Module Getopt::Mixed (CJM/Getopt-Mixed-1.008.tar.gz)
Module Getopt::Regex (JARW/Getopt-Regex-0.02.tar.gz)
Module Getopt::Std (TIMB/perl5.004_02.tar.gz)
Module Getopt::Tabular (GWARD/Getopt-Tabular-0.2.tar.gz)
Module Tk::Getopt (SREZIC/Tk-Getopt-0.30.tar.gz)

Nine solutions, each probably having its own syntax, and its own strengths and weaknesses. It's enough to drive a sane programmer mad; and, if you so happen to be in the position to make lots of different people's code work together, a sane development manager beyond mad.

Hence, we standardize. We could standardize on any one of these packages, or even make our Pod::Checker program be able to recognize more than one options package. Anything is possible: after all it is your own project! Just remember, the more complex you make things, the more (probable) maintenance you are going to have to do in the Pod::Checker module.

For the purpose of this example, we are going to standardize on Getopt::Long. Why? Well, as we said in Chapter 15, Getopt::Long has the following benefits:

1) it comes with the standard distribution

2) it has a lot of power

3) it has a point of focus: the Getopt::Long module implements only one major function, the GetOptions function, which does all of the work.

All of these are important points. Points one and two ensure that 90% of the people will be using Getopt::Long. Point three ensures that the code we eventually write to parse Getopt::Long will be the simplest possible.

In short, we will assume that, if we have any options statement, it will look like:

GetOptions(\%options, '--option1:s', '--option2:s');

where option1, and option2 are the options that come from the command line. (The hash '%options' is not necessary. Here, it merely is used as a receptacle to hold all of the options. You don't need to use it.)

Note again, Perl's syntax works against a 100% solution. Syntax such as:

my (@actual_options) = ('--option1:s', '--option2:s');
GetOptions(\%options, @actual_options);

can fool us. Pod::Checker can't be a mind reader; we can't know how 'option1' and 'option2' get put into GetOptions at run-time. We are checking at compile-time and need to give Pod::Checker a break!

When we see GetOptions being passed an array like this - or passed a hash - we will issue a warning in development, and an error in production so as not to make the module too complicated.*

Either that, or you may have to have some sort of tie in, which makes GetOptions feed back the options it receives to the Pod::Checker module for run-time checking. Although, as said above, this makes it pretty complicated.
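Spotting a GetOptions call that cannot be checked at compile time can be sketched as below. The helper name has_dynamic_options is hypothetical, and the check is deliberately crude: after removing the leading hash reference and every quoted option string, any remaining variable sigil means the options are only known at run-time.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a GetOptions call is passed anything other than
# quoted option strings (after an optional leading \%hash receptacle).
sub has_dynamic_options {
    my ($source) = @_;
    while ($source =~ /GetOptions\s*\((.*?)\)\s*;/sg) {
        my $args = $1;
        $args =~ s/^\s*\\%\w+\s*,?//;          # drop the leading \%options
        $args =~ s/'[^']*'|"[^"]*"//g;         # drop the quoted option specs
        return 1 if $args =~ /[\@%\$]/;        # a variable is left over
    }
    return 0;
}

print has_dynamic_options(q{GetOptions(\%options, '--option1:s', '--option2:s');}), "\n";   # 0
print has_dynamic_options(q{GetOptions(\%options, @actual_options);}), "\n";                # 1
```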

Summary of 'Solving the Up to Date Documentation' Proposal

Our proposal for how to solve the 'up to date documentation' problem has to do with verification and validation (for all you software engineering fans out there).

We validate that something is correct by verifying that two things that are supposed to correspond with each other actually do. In this case, the two things that are supposed to correspond to each other are:

1) code

and

2) documentation

Verification is accomplished by comparing them together. If there is text in the code that says there is a subroutine named createNirvana, there had better be text in the documentation that describes that subroutine.

By doing this comparison, then, we make both the code and documentation stronger. They have been validated to be correct.

Proposal #2: Solution for 'Ease of Access' problem

Now, some may say that this problem is already solved. In fact, if you do something like:

c:\> perldoc Module

you will see all the documentation for that module on the screen, assuming that Module is installed into the Perl work tree. And you can say:

c:\> pod2html --infile=Module.pm --outfile=Module.html

to put the module in an html format.

However, these always assume that you are coming from a command line, rather than looking at POD from inside a program. Sometimes, it would be awfully helpful to have a mechanism which lists out all of the methods that a programmer is allowed to use, something like:

$check = new Pod::Checker('Module');
print $check->getmethods();

where 'getmethods' simply tells you what you can do with a given module. It also would be nice to somehow have an automatic mechanism on the command line such that if someone types:

prompt% script.p -help

then Perl prints out the actual usage of that script automatically, by relying on the documentation.
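One existing way to get this behavior is the Pod::Usage module, which prints the SYNOPSIS section of a file's POD. The sketch below is an illustration of that approach rather than the Pod::Checker interface; the wrapper name print_usage is hypothetical, and it assumes the script carries a '=head1 SYNOPSIS' section.

```perl
use strict;
use warnings;
use Getopt::Long;
use Pod::Usage;

# Hypothetical wrapper: print the SYNOPSIS section of $file's POD to the
# filehandle $fh, without exiting.
sub print_usage {
    my ($file, $fh) = @_;
    pod2usage(
        -input   => $file,
        -output  => $fh,
        -verbose => 0,           # 0 means the SYNOPSIS section only
        -exitval => 'NOEXIT',    # let the caller decide when to exit
    );
}

my %options;
GetOptions(\%options, 'help');

# $0 is the running script, so the usage text comes from its own POD.
print_usage($0, \*STDOUT) if $options{help};
```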

Other ideas include: an automatic documentation installer, a document searcher that looks and retrieves documents by a command line, an html program that acts as this document searcher, and so forth. All these ideas require a programmer's interface, and if we do it right, creating them should be trivial. After all, ease of access is a feeling just as much as it is a concrete technical solution. To give people that feeling, you need to give them a hundred different ways of doing the same thing, so that they adapt to the way that suits them.

Either that or you have the power to enforce that they use a particular solution, which often leads to suboptimal ways of doing things. Perl has never been a fan of enforcing things like this - people generally know what suits them - they don't need to go to a central authority to tell them what to do. Of course, you can heavily advertise what would be a good way of handling documentation, mentioning that you've got all this cool stuff, and that your life would be a lot easier if you used it. Generally, the best standards come out of consensus.

Anyway, the plan is this. We will provide a bunch of methods to our Pod::Checker class, which then let us easily access the documentation inside our programs. Once we have a way to actually deal with documentation inside the program itself, we will be set. We can add methods as we go. These are some ideas that come to mind (ones that we will implement, by the way!):

sprintf()

sprintf makes a string of the document in a specific fashion. For example,

my $doc = new Pod::Checker('script.p');

my $usage = $doc->sprintf('usage');

makes a usage string out of the sprintf statement. Other useful statements would be:

my $html = $doc->sprintf('html');

my $text = $doc->sprintf('text');

which would make $html a legal html page, and $text a text version of the documentation.

getmethods()

getmethods gets the legal methods that one can do, from the point of view of the module, in an array reference. For example:

my $doc = new Pod::Checker('Module');

my $allmethods = $doc->getmethods();

my $accessors = $doc->getmethods('/get/');

If given no arguments, getmethods() will return all of the methods that 'Module' is able to do. If given a regular expression as an argument, it will return all the methods that fit that description: '/get/' will return all of the methods that have the string 'get' in them.

gettags()

gettags gets information about individual tags from inside the code. If we say:

my $doc = new Pod::Checker('Module');

my $name = $doc->gettags('/NAME/');

then this will retrieve into an array reference the text of all of the tags that have the string 'NAME' in them (like '=head1 NAME'). And:

my $doc = new Pod::Checker('Module');

my $bugs = $doc->gettags('/M.*BUGS/');

gets the text of all of the tags that have the regular expression '/M.*BUGS/' in them ('=head1 MODULE BUGS', '=head1 METHOD BUGS', etc.)
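
To give a feel for the data behind gettags(), here is a rough sketch that maps each '=head' or '=item' line to the text underneath it, and then searches the tag names with a regular expression. The parse_tags function is hypothetical and far simpler than a real POD parser (for one thing, a repeated tag overwrites the earlier one).

```perl
use strict;
use warnings;

# Hypothetical parser: map each '=headN'/'=item' line to the text under it.
# A repeated tag line overwrites the earlier entry in this simple version.
sub parse_tags {
    my ($pod) = @_;
    my (%tags, $current);
    foreach my $line (split /\n/, $pod) {
        if ($line =~ /^=(?:head\d|item)\b/) {
            $current = $line;                 # start a new tag
            $tags{$current} = "$line\n";
        }
        elsif ($line =~ /^=cut\b/) {
            undef $current;                   # back in code; stop collecting
        }
        elsif (defined $current) {
            $tags{$current} .= "$line\n";     # text belongs to the open tag
        }
    }
    return \%tags;
}

# Made-up module text, for illustration only.
my $pod = "=head1 NAME\n\nModule - demo\n\n"
        . "=head1 MODULE BUGS\n\nNone.\n\n"
        . "=head1 METHOD BUGS\n\nMany.\n\n"
        . "=cut\n\nsub foo {}\n";

my $tags = parse_tags($pod);
my @bugs = map { $tags->{$_} } grep { /M.*BUGS/ } sort keys %$tags;
print @bugs;
```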

Just these three methods can do a lot. Consider the two problems that we mentioned:

1) documentation installer: the documentation installer is a simple wrapper around sprintf. You print to a string, and then dump that string to an appropriate file.

2) document searcher: this is a simple wrapper around getmethods() and gettags(); getmethods would be used to get all of the methods that one can use, and gettags() would be then used to get the documentation for those methods.

In addition, there are several times when you are simply too lazy to go outside the editor. You want to just find out what you can do in a module, easily, quickly, and with no hassle. You could then say:

1 my ($module) = new Pod::Checker('Module');

2 my $methods = $module->getmethods();

3 my $method;

4 foreach $method (@$methods)

5 {

6 print "$method: ", @{ $module->gettags('/\b' . $method . '\b/') }, "\n";

7 }

to print out all of the methods and their usage to the screen. Although this seems too much of a hassle here, we could make a shortcut method showwhatIcanDo, which looks like:

1 print $module->showwhatIcanDo();

and which does exactly the same thing, only simpler.

Summary of Proposed 'Ease of Use' solution

The next step, after ensuring that the documentation of something is correct, is to make sure that users can get to it easily. This is ease of use, and it will make all the difference in the world on whether or not people will actually care about your documentation solution.

In this case, our solution is to punt; we simply do not know what people find the most attractive, so we give them a myriad of ways to actually get at the documentation. There are the ways that Perl itself lets you get at the documentation:

1) the command 'perldoc <Module>' shows centrally installed modules

2) the commands 'pod2...' translate POD to so many different formats (pod2html - to html, pod2man - to man, pod2text - to text, pod2latex - to latex, etc)

Also, provided by this module, are methods to:

1) search in the documentation for the definitions by regular expression of functions

2) make a scalar that holds the different forms of documentation

3) show the possible functions in our namespace

In short, we provide a hundred different ways of finding the documentation, broadcast these different ways to users, and then see what sticks.

This is the scattershot method, and Perl is especially good at this, because the language itself is probably the best example of this particular principle! Let 'survival of the fittest' be your guide; let your ideas compete, see what works, and then concentrate on the working ones.

Step #4: Formal Design and Pseudo-Code

Now that we have an idea of the functionality that we want to get out of the Pod::Checker module, we can think a little bit about how to implement it: first by drawing up a formal design, then by writing some pseudo-code. Those steps are described below:

Formal Design

The first step to make a formal design is to think of how people are going to use the application. One possible idea is to say:

1 use Pod::Checker;

2

3 my $script = $0;

4 my $doc = new Pod::Checker($script);

5 print @{$doc->errors()} if (@{$doc->errors()});

to check the script that has Pod::Checker in it. It would therefore give warnings and/or errors that the script has in its documentation. But, as it stands, this doesn't do very much good. It has the fault that it requires the user to make the check herself. If the user forgets about putting these three lines in the module, then - oh well.

Suppose that somebody says something like :

1 use MyModule1;

2 use MyModule2;

3 use MyModule3;

4

5 use Pod::Checker;

As it stands, the code to do checking for this script would look like:

6 my $doc1 = new Pod::Checker('MyModule1');

7 my $doc2 = new Pod::Checker('MyModule2');

8 my $doc3 = new Pod::Checker('MyModule3');

9 print @{$doc1->errors()} if (@{$doc1->errors()});

10 print @{$doc2->errors()} if (@{$doc2->errors()});

11 print @{$doc3->errors()} if (@{$doc3->errors()});

In other words, the user is going to have to check 'MyModule1', 'MyModule2', and 'MyModule3' for errors by hand, or else our documentation check will be no good.

Hence, we want to have an automatic component to Pod::Checker. We want to be able to say:

use Pod::Checker;

and have it check every document that is required by the user's script, without the user having to do anything at all. The easier it is to use, after all, the quicker it will be accepted.

This is the first component of our design: the part which warns us of errors, done invisibly to the user.
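
The hook that makes this invisibility possible is Perl's import mechanism: 'use Module LIST' loads the module at compile time and then calls Module->import(LIST). The sketch below uses a made-up package, Sketch::AutoCheck, as a stand-in for Pod::Checker, and only records that import() was called.

```perl
use strict;
use warnings;

package Sketch::AutoCheck;   # hypothetical stand-in for Pod::Checker

our @calls;                  # records every import() call, for the demo

sub import {
    my ($class, @args) = @_;
    my $caller = caller();   # package that said 'use'
    push @calls, "$class imported by $caller with (@args)";
}

package main;

# In a separate file, 'use Sketch::AutoCheck qw(strict);' would amount to:
#   BEGIN { require Sketch::AutoCheck; Sketch::AutoCheck->import(qw(strict)); }
# The package is defined inline here, so only the import() call is needed.
BEGIN { Sketch::AutoCheck->import(qw(strict)); }

print "$_\n" for @Sketch::AutoCheck::calls;
```

Anything we put inside import() therefore runs before the user's own code does, without the user writing a single extra line.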

The second part is the manipulation of the POD format, which has the methods sprintf() and gettags() as described. So the beginnings of our formal design look something like what is in Figure 19.4:

194.fig

Figure 19.4

Pod::Checker structure, initial.

The idea behind this is to simply list the algorithms, and the likely methods, which will be located in 'what place'. The auto-checker goes into our class, the methods sprintf(), gettags(), etc. go into the object part of the class.

Second iteration of the design

However, the question remains: exactly how will the 'auto-checker', as it stands, do its 'autochecking'? First, we need some way of going through everything that the script using Pod::Checker has loaded. Second, the checker is going to need to do some parsing of the POD format to do anything useful.

We could envision making a module whose only point in life is to actually return the documentation errors in modules passed into it. Either that, or to hand code the autochecking for the sake of the class. Our design would then look something like this:

195.fig

Figure 19.5

Pod::Checker structure, round 2.

This does not look very cool. It seems too complicated, too redundant, just plain too much work.

So what do we do? We simplify. The module Pod::Checker has committed to the methods gettags, getmethods, and sprintf right now, and all of them are going to have to do parsing of the pod format. Why not simply add an errors method as well, which returns all of the errors that it finds while parsing?

Beautiful! That way, the design that was in two parts becomes a problem that has two parts with a common core. Our revised document becomes:

195.fig

Figure 19.5

Pod::Checker structure, revised.

In other words, the automatic checker, which is run once at the beginning of our program, is defined in terms of our other problem (making information about the documentation available to an end user). We now go on to the second step, which is to start filling in parts of this structure into a code skeleton.

The Pseudo-Code Stage

When you are just starting out with this whole OO thing, it is difficult to set yourself down to write pseudo-code like what we are going to do right now. However if you are a new OO programmer, I suggest that you try to aim for doing this.

Writing pseudo-code forces you to think further before you program; but I can understand the temptation to 'just dig in and do it'. In fact that's how I learned. But by eventually pulling yourself up a level, you should be able to see things a lot clearer, and ultimately program better. Patiently restrain yourself from jumping in.

Anyway, enough advice; let's look at how we put our diagram into pseudo-code. I think of pseudo-code as a two step process: first writing a skeleton of all the public subroutines, and then filling the primary subroutines (import, new, etc) with the private subroutines we are going to use and the data members that the private subroutines use. So, here goes...

Making a class skeleton

The first thing to do is generate which public subroutines are going to be in the module. This is the code skeleton. Easy enough, but it marks the transition between thinking about the code, and actually doing it:

Listing 19.4 Pod/CheckerSkel1.pm skeleton

1 package Pod::Checker;

2

3 sub import # run at beginning

4 {

5 }

6

7 sub new # takes module or script as argument

8 {

9 }

10

11 sub gettags # returns a list of tags

12 { # in an array reference.

13 }

14

15 sub getmethods # takes a tag or a regular expression

16 { # returns array reference.

17 }

18

19 sub sprintf # shows either 'usage','html', or 'text'

20 {

21 }

22

23 sub errors # returns the errors between

24 { # documentation and code

25 }

This code skeleton is an important step. We won't necessarily keep to the exact framework laid out here. We will make revisions, sometimes pretty massive revisions, to the code skeleton as the situation warrants. The code skeleton does not show important details.

However, it is a point in which we can take a step back and see if the code skeleton actually fills our needs. To reinforce this high level, I sometimes make a calling diagram, which simply lists the subroutines in relationship to each other, and which I subscript with helpful text of what is going on:

21X.fig

Figure 21.X

Calling diagram for Pod::Checker - high level.

As it is, we can sort of see the demarcation between the 'checker' part and the 'parser' part.

1) the import function is going to handle the 'checker' part, since that gets run when we say 'use Pod::Checker;'.

2) the new, gettags, getmethods, etc, functions are going to be the 'parser' part, implemented as an object.

In addition, the import function is going to call the parser; that links the two parts together. We also make comments to ourselves on what the functions are going to do.

We have reached the point where we have an idea about what is going to happen. We now make a skeleton 'calling tree', which further commits us as to how to program a solution.

Making a calling tree.

Now for the second part. We fill in the constructor - and the import method - with the functions that they are going to use:

Listing 19.5 Pod/CheckerSkel2.pm filled in skeleton

1 package Pod::Checker;

2 use strict;

3 use Data::Dumper;

4

5 my $_script = $0;

6

7 sub import # run at beginning

8 {

9 my ($module, $config) = @_;

10 my $errcnt = 0;

11

12 my $doc = new Pod::Checker($_script);

13 if (@{$doc->errors()}) { print @{$doc->errors()}; $errcnt++; }

14

15 foreach $module (keys(%INC))

16 {

17 my $file = $INC{$module};

18 my $doc = new Pod::Checker($file);

19 if (@{$doc->errors()}) { print @{$doc->errors()}; $errcnt++; }

20 }

21

22 }

23

This will do for now for the import function, and it is a good start. It is not complete code and we will add to it later, but it does show some important design decisions.

The most important thing to notice here is how we define the 'checker' problem that we have in terms of the other subroutines in Pod::Checker that we are going to write. We abstract the design, so later on we can implement it. In English, the pseudo-code is saying:

1 Create a new POD Checker object out of the script that is using Pod::Checker (line #12)

2 Check for errors in the Pod::Checker object of the script that is using Pod::Checker, and if so, print them out and say we have found an error (line #13).

3 For each module included (line #15), go through that module and create a new POD Checker object for each module (line #18). Check and print out the errors. (line #19).
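
If you have not looked at %INC before, the following small sketch shows what the loop in line #15 walks over. File::Basename is loaded here only so that there is something to list; any module would do.

```perl
use strict;
use warnings;
use File::Basename ();   # loaded only so that %INC has something to show

# %INC maps each loaded file (e.g. 'File/Basename.pm') to the path
# that it was actually loaded from.
foreach my $module (sort keys %INC) {
    print "$module => $INC{$module}\n";
}
```

Note that %INC only lists the files loaded so far. That is presumably why the earlier example places 'use Pod::Checker' after the other use statements.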

We now do the same thing for the new statement. We define simple, private subroutines that will make the constructor:

Listing 19.6 Pod/CheckerSkel2.pm filled in skeleton continued

24 sub new # takes module or script as argument

25 {

26 my ( $type, $modfile) = @_;

27

28 my $self = bless {}, $type;

29

30 $self->{'filename'} = $modfile;

31 $self->_getUsage(); # parsing that module for

32 # usage arguments

33 # sets $self->{'usgdocs'}

34 # $self->{'usgtxt'}

35 # $self->{'fulltext'}

36

37 $self->_getTags(); # parsing that module for 'Tags'

38 # sets $self->{'doctags'}

39 # $self->{'subnames'}

40 # $self->{'methodlist'}

41 # ie: '=item', '=head', etc.

42

43 $self->_calcErrors(); # sets $self->{'usgerrors'};

44 # augments $self->{'suberrors'}

45 $self;

46 }

With this pseudo-code, our calling diagram becomes:

21X.fig

Figure 21.X

Calling diagram for Pod::Checker - next level.

We define the important private functions we are going to use here:

_getUsage() which looks through the module given by $self->{'filename'} for command line arguments, and their documentation.

_getTags() which looks through the module given by $self->{'filename'} for the various documentation tags and subroutines inside the module.

_calcErrors() which takes the information gleaned from _getUsage() and _getTags() and then actually fits it together to see which methods and arguments are undocumented, and so forth.

In the process, we also recognize some elements that the Pod::Checker object is going to have. They are as follows:

filename - the object filename to check

usgdocs -the documentation for arguments to the command line

usgtxt - the actual GetOptions call or calls that set the arguments

fulltext - the full text of the module, code and all.

doctags - the documentation tags ('=head1 NAME',etc).

subnames - the subroutine names, gleaned from the code.

methodlist - the subroutines, gleaned from documentation

usgerrors - the usage errors from cross-checking usgdocs and usgtxt. Command line errors.

suberrors - the subroutine errors from cross-checking subnames and methodlist together.
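
To see how subnames and methodlist might be fitted together into suberrors, here is a hedged sketch. It uses plain hash lookups rather than the Diff module that this chapter relies on, the function name cross_check is made up, and the rule that names with a leading underscore need no documentation is an assumption.

```perl
use strict;
use warnings;

# Hypothetical cross-check: compare subroutine names found in the code
# against method names found in the documentation.
sub cross_check {
    my ($subnames, $methodlist) = @_;
    my %in_code = map { $_ => 1 } @$subnames;
    my %in_docs = map { $_ => 1 } @$methodlist;

    my @errors;
    foreach my $sub (sort keys %in_code) {
        next if $sub =~ /^_/;    # assumption: private subs need no docs
        push @errors, "$sub: in code but not documented\n"
            unless $in_docs{$sub};
    }
    foreach my $method (sort keys %in_docs) {
        push @errors, "$method: documented but not in code\n"
            unless $in_code{$method};
    }
    return \@errors;
}

my $suberrors = cross_check(
    [qw(new gettags getmethods _getTags)],   # made-up 'subnames'
    [qw(new gettags sprintf)],               # made-up 'methodlist'
);
print @$suberrors;
```

With the made-up lists above, this reports getmethods as undocumented and sprintf as documented but missing from the code.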

By putting them down (with comments next to the private functions that actually create them), and before we go into full gear, we prevent a lot of needless variables. You know that your design is getting off track if there are a lot of excess functions and data that seem to exist in a couple of places, but all seem to be doing the same thing.

Just like the private functions that we created above, you can change this list, add things and subtract things, but you should think real hard about these changes when you make them. Try to keep the list of private functions as up to date as possible.

In defining these functions and data, we have done exactly the same thing as in the case with our import function. We have left them as abstractions. We do not care about how the holes are filled in. For now, having the big picture is the important thing.

Summary of Pseudo-Code Stage

At this point we are pretty much ready to go full steam into the development of this particular module; but realize that you will probably have several iterations like the ones that we had before, not just the single iteration shown here. The key concept to realize here is that we are developing this module top down. In other words, we are keeping the code at a high level of abstraction, in which we create each function in terms of methods that we haven't yet composed, except in our mind. The development process is then filling in these holes.

This is not the only way to develop modules. Other people like to pretty much get it all in their head, and then code the module straight from start to finish. But if you are new to OO, you probably want to start out this way. Plus, this works better in larger projects. And, as time goes by, you will get an instinctive feeling on how to code with less overhead in the planning stage.

Step #5: Full Scale Development

As we said, we are ready to go full steam into the development process. Now is the time for thinking about all of the nasty little details:

1) performance: is it important for this application, or can we live with sluggishness?

2) function arguments: what format should they take?

3) algorithms: how exactly are we going to parse through a POD file?

And so on. Usually I pretty much stumble across them as I am programming; it's pretty much unavoidable because there are so many variables in software development.

Here we show you the full module, a step at a time, starting with the headers, and then going to the last, private function. And, in the process we document what stumbling blocks we encountered, as well as difficult technical obstacles we overcame. That way, the next time we won't stumble over those blocks quite as hard, and those technical obstacles won't be quite as difficult.

Headers and the Import function

Below, in listing 19.7, are the headers and import function. Notice that they pretty much grow out of the pseudo-code from the previous section, but there are a few small details and stumbling blocks that we came across during development:

Listing 19.7 Pod::Checker - headers

1 package Pod::Checker;

2

3 use Config;

4 use Carp;

5 use Diff;

6 use FileHandle;

7 use strict;

8 use Data::Dumper;

9

10 my $_bin = $Config{'bin'}; # where the perl binaries are stored.

11

12 my $_lib = $Config{'installprivlib'}; # this is where all the central

13 # modules are stored. We can't

14 # assume they follow the same

15 # convention

16 my $_script = $0;

17

18 my $_docs = {}; # we will cache the documents so

19 # we don't have to go through the

20 # work of making it again

21

A couple of quick notes on this code. We will use the Diff::array function which we developed in the last chapter, so we include the Diff module in line 5. We also use Config to get some information about our Perl configuration: namely, the path to the binaries that we have installed (line #10), and where the central modules (out of our control) are stored (line #12).

We then make a few more private variables that hold information we will want to keep around each time we make a Pod::Checker object. $_script holds the name of the script being run (taken from $0), and $_docs is a 'holding place': if we make a Pod::Checker object out of MyModule, for performance's sake we will want to reuse it if somebody types 'new Pod::Checker(..)' again.
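
If you are curious what these values look like on your own installation, a three line sketch will show you. The keys 'bin' and 'installprivlib' are the same ones used in the headers above; the output naturally differs from machine to machine.

```perl
use strict;
use warnings;
use Config;   # exports the read-only %Config hash

print "perl binaries:  $Config{'bin'}\n";
print "core library:   $Config{'installprivlib'}\n";
print "running script: $0\n";
```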

How we make use of this information is shown below.

Listing 19.8 Pod::Checker - import()

22

23 sub import # run at beginning

24 {

25 my ($module, $config) = @_;

26 my $errcnt = 0;

27

28 return() if ($ENV{'DOCOFF'});

29

30 my $doc = new Pod::Checker($_script);

31

32 if (@{$doc->errors()})

33 {

34 print "$_script:\n";

35 print @{$doc->errors()}; $errcnt++;

36 }

37

38 foreach $module (keys(%INC))

39 {

40 my $file = $INC{$module};

41 my $doc = new Pod::Checker($file);

42

43 next if ($file =~ m"^$_lib");

44 if (@{$doc->errors()})

45 {

46 print "$file:\n";

47 print @{$doc->errors()}; $errcnt++;

48 }

49 }

50

51 die if ($ENV{'DOCDIE'} && $errcnt);

52 }

Here we've hit another couple of code 'speed bumps' so to speak (but to be fair, we hit these speed bumps when we actually used this module for a while.) First, we've added a feature such that we can control our Pod::Checker by environment variable; the two environmental variables - DOCOFF and DOCDIE - are used as a 'turn on and turn off' switch.

Why do this? Since parsing through all of the modules for documentation is quite a performance hit, we don't want to have this feature turned on in production scripts. And we also don't want to change our code to make a script production ready. Hence, DOCOFF can be used to turn off the check. Likewise, DOCDIE can be used to make the lack of documentation a fatal error, rather than just a warning.

The idea here is that we have a turnkey person: someone responsible for loading scripts into a production environment. This person first turns on DOCDIE and rejects any scripts that have errors in them. When the scripts pass the DOCDIE check, then DOCOFF is turned on, and the Pod::Checker check is bypassed.
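
The interplay of the two switches can be summed up in a small decision function. This is only a sketch: check_policy is a made-up name, and in the real import() the DOCOFF test happens before any parsing is done at all, which is where the performance saving comes from.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what the checker should do, given the
# environment and the number of files with documentation errors.
sub check_policy {
    my ($env, $errcnt) = @_;
    return 'skip' if $env->{'DOCOFF'};              # production: no check
    return 'die'  if $env->{'DOCDIE'} && $errcnt;   # gatekeeping: fatal
    return 'warn' if $errcnt;                       # development: report
    return 'ok';
}

# What would happen in the current environment if two files had errors?
print check_policy(\%ENV, 2), "\n";
```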

The only other thing we ran up against is in line #43. We say 'next if ($file =~ m"^$_lib")'. Why? Well, as much as we like our document checker, not all modules on the Web use the same standard! Hence, we will not print out the errors that we find in modules that are installed in the standard distribution. We can parse them, we can even doc-check them, but the errors that they generate we have to bypass. If we don't, we will spend 90% of our time chasing down errors that don't exist.
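
One caution about matching a file name against a library path: a path interpolated into a pattern is treated as a regular expression, so a character such as '.' in the path matches more than itself. The sketch below shows a more defensive version of the test using \Q...\E. The function name under_dir is made up, and the sketch assumes forward slashes as the path separator.

```perl
use strict;
use warnings;

# Hypothetical helper: is $file inside the directory $lib?
# \Q...\E quotes any regex metacharacters that appear in the path.
sub under_dir {
    my ($file, $lib) = @_;
    return $file =~ m{\A\Q$lib\E(?:/|\z)} ? 1 : 0;
}

print under_dir('/opt/perl5.004/lib/Carp.pm', '/opt/perl5.004/lib')
    ? "standard module, errors suppressed\n"
    : "local module, errors reported\n";
```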

The constructor New

We are ready now to make the new function, where we take the next step in implementing what (for now) is pseudo-code in import(). Listing 19.9 shows new() in its entirety:

Listing 19.9 Pod::Checker - new()

52

53 sub new # takes module or script as argument

54 {

55 my ( $type, $modfile) = @_;

56

57 my $self = bless {}, $type;

58

59

60 $self->{'filename'} = _getFile($modfile); # foolproof way of getting

61 # a module name

62

63 my $filename = $self->{'filename'};

64 return($_docs->{$filename}) if (defined $_docs->{$filename});

65

66

67 $self->_getUsage(); # parsing that module for

68 # usage arguments

69 # sets $self->{'usgdocs'}

70 # $self->{'usgtxt'}

71 # $self->{'fulltext'}

72

73 $self->_getTags(); # parsing that module for 'Tags'

74 # sets $self->{'doctags'}

75 # $self->{'suberrors'}

76 # $self->{'subnames'}

77 # $self->{'methodlist'}

78

79 # ie: '=item', '=head', etc.

80 $self->_calcErrors(); # sets $self->{'usgerrors'};

81 # augments $self->{'suberrors'}

82 $_docs->{$filename} = $self;

83 $self;

84 }

This is almost exactly the same as our pseudo-code, with a few more speed bumps that we have found along the way. We decided that, for ease of use, users should be able to say:

my $doc = new Pod::Checker('Module');

my $doc = new Pod::Checker('Module.pm');

my $doc = new Pod::Checker('/module/not/in/INC/Module.pm');

where any one of these three uses will be valid. The first one will look for the module inside @INC, and use the first module that it finds (kind of like saying 'use Module' without actually using the module). The second will do the same, and the third accepts a full path for finding a module.
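
To make the three spellings concrete, here is a sketch of the kind of lookup involved. It is a guess at the sort of work _getFile() has to do, not this chapter's implementation; the name find_module_file is made up, and File::Spec is used for portable path handling.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical resolver for 'Module', 'Module.pm' and '/full/path/Module.pm'.
sub find_module_file {
    my ($name, @dirs) = @_;
    @dirs = @INC unless @dirs;

    if (File::Spec->file_name_is_absolute($name)) {   # third form: full path
        return $name if -f $name;
        return;
    }

    (my $relative = $name) =~ s{::}{/}g;    # Pod::Checker -> Pod/Checker
    $relative .= '.pm' unless $relative =~ /\.pm\z/;

    foreach my $dir (@dirs) {
        next if ref $dir;                   # skip code-ref hooks in @INC
        my $candidate = File::Spec->catfile($dir, $relative);
        return $candidate if -f $candidate; # first hit wins, as with 'use'
    }
    return;                                 # not found anywhere
}

my $found = find_module_file('File::Spec');
print defined $found ? "$found\n" : "not found\n";
```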

In all cases, the function that does this is called in line #60:

60 $self->{'filename'} = _getFile($modfile); # foolproof way of getting

61 # a module name

Function _getFile() does all the work of munging the document that the user provided. And for now, we just note that this is how we are going to add this bulletproof-ness, and go on.

Another road rash: Line #64 shows how we are going to handle performance matters. We have defined $_docs in the headers; this is where we check to see if the Pod::Checker has seen $filename before:

64 return($_docs->{$filename}) if (defined $_docs->{$filename});

As in the headers, we make it so we do not parse the same file twice. Line #82:

82 $_docs->{$filename} = $self;

stores the fact that we have parsed a certain document inside $_docs. This works with line #64 to ensure that we are not doing any extra work.
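
Stripped of everything else, the caching idea looks like the sketch below. Sketch::Cached is a made-up class, and the counter $parses merely stands in for the expensive parsing step so that we can see how often it runs.

```perl
use strict;
use warnings;

package Sketch::Cached;      # hypothetical class, for illustration only

my %cache;                   # filename => object already built
our $parses = 0;             # counts how often the expensive work runs

sub new {
    my ($class, $filename) = @_;
    return $cache{$filename} if exists $cache{$filename};

    my $self = bless { filename => $filename }, $class;
    $parses++;               # stands in for the expensive parsing step
    return $cache{$filename} = $self;
}

package main;

my $first  = Sketch::Cached->new('Module.pm');
my $second = Sketch::Cached->new('Module.pm');
print "same object\n" if $first == $second;
print "parsed $Sketch::Cached::parses time(s)\n";
```

One design consequence is worth keeping in mind: two callers that ask for the same file share a single object, so a change made through one is visible through the other.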

Public functions for Pod::Checker

Notice that now we have five private functions on our plate that we have to fill in: _getFile(), _getUsage(), _getTags(), _getSubs() and _calcErrors(). Let's blissfully ignore them for the time being. Assuming that they have been implemented, we can fill out the rest of our public functions a lot quicker.

The public functions we have on our plate are: gettags(), getmethods(), sprintf(), errors(). These should be fairly easy to implement, given that the private functions are doing all of the hard work.

Public function gettags()

The first public function in our skeleton is gettags(), the interface for returning an array of text, such that:

my $doc = new Pod::Checker('Module');

my $tag = $doc->gettags('/NAME/');

returns something like:

[

'=head1 NAME

The name of this module is Module. Please use it carefully - or beware.'

]

or, an array of text, where the header tag, '=head1 NAME', contains the string 'NAME'. Listing 19.10 shows 'gettags()':

Listing 19.10 Pod/Checker.pm - gettags()

87 sub gettags # returns a list of tags.

88 { # in an array reference

89 my ($self, $pattern) = @_;

90

91 my $return = [];

92 my $regexp = 0;

93 if ($pattern =~ m"^/" && $pattern =~ m"/$")

94 {

95 $regexp = 1;

96 $pattern =~ s"^/""g;

97 $pattern =~ s"/$""g;

98 }

99 my $tag;

100 my $tags = $self->{'doctags'};

101 if ($regexp == 1)

102 {

103 foreach $tag (sort keys (%$tags))

104 {

105 if ($tag =~ m"$pattern") { push(@$return, $tags->{$tag}); }

106 }

107 }

108 else

109 {

110 foreach $tag (sort keys (%$tags))

111 {

112 if ($tag eq "$pattern") { push(@$return, $tags->{$tag}); }

113 }

114 }

115 return($return);

116 }

This looks fairly straightforward. We look at the pattern that the user gives us in line #93, and then decide whether or not they intended that pattern as a regular expression. If so, we treat it as such and set $regexp to 1.

If the user has given us a regular expression, we use lines 101-107 to search through the tags for us. If it isn't a regular expression, we assume an exact match and go through the logic in lines 108-114.

In any case the code:

105 if ($tag =~ m"$pattern") { push(@$return, $tags->{$tag}); }

112 if ($tag eq "$pattern") { push(@$return, $tags->{$tag}); }

will push the text of all of the matching tags onto a stack, which we then return in line 115. The hard part isn't this code, but actually getting $self->{'doctags'} in the first place.

Ah, but again, we ignore this. That is a topic for when we get to implementing our private functions. For now, we simply note that line #115 returns the array reference containing the text of all the tags that match $pattern.

Public function getmethods

The next public function in our queue is getmethods. It is a simple way to get the functions that we can use for a given module, again assuming the existence of $self->{'methodlist'}:

Listing 19.11 Pod/Checker.pm - getmethods()

117 sub getmethods # takes a tag or a regular expression

118 { # returns array reference

119 my ($self, $pattern) = @_;

120 my $return = [];

121 my $regexp = 0;

122 my $method = '';

123 my $methodlist = $self->{'methodlist'};

124

125 if ($pattern =~ m"^/" && $pattern =~ m"/$")

126 {

127 $regexp = 1;

128 $pattern =~ s"^/""g;

129 $pattern =~ s"/$""g;

130 }

131 if ($regexp)

132 {

133 foreach $method (@$methodlist)

134 {

135 if ($method =~ m"$pattern") { push (@$return, $method); }

136 }

137 }

138 elsif (defined($pattern))

139 {

140 foreach $method (@$methodlist)

141 {

142 if ($method eq "$pattern") { push(@$return, $method); }

143 }

144 }

145 else

146 {

147 @$return = @$methodlist;

148 }

149 $return;

150 }

Simple enough. We follow the same format as gettags(), only where gettags() used $self->{'doctags'}, we use $self->{'methodlist'} instead. We could probably even make a private function such that gettags was a one liner:

sub gettags { my ($self, $pattern) = @_; return $self->_getlist($pattern, 'doctags'); }

and getmethods was also a one liner that runs the same private function (_getlist()):

sub getmethods { my ($self, $pattern) = @_; return $self->_getlist($pattern, 'methodlist'); }
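
A shared helper along these lines might look like the following sketch. It is written as a plain function called getlist (a made-up name) that takes the data directly, so that it can be tried out on its own. It has to cope with the fact that 'doctags' is a hash and 'methodlist' is an array, and, unlike gettags() as listed, it returns everything when no pattern is given.

```perl
use strict;
use warnings;

# Hypothetical shared helper. $store is either a hash reference
# (tag => text, like 'doctags') or an array reference (like 'methodlist').
sub getlist {
    my ($pattern, $store) = @_;
    my $is_hash = ref($store) eq 'HASH';
    my @names   = $is_hash ? (sort keys %$store) : @$store;

    if (defined $pattern) {
        if ($pattern =~ m{\A/(.*)/\z}s) {
            my $regexp = qr/$1/;             # '/get/' means "matches get"
            @names = grep { $_ =~ $regexp } @names;
        }
        else {
            @names = grep { $_ eq $pattern } @names;   # exact match
        }
    }

    return [ map { $store->{$_} } @names ] if $is_hash;   # tags: the text
    return [ @names ];                                     # methods: names
}

# Made-up data, for illustration only.
my $doctags = { '=head1 NAME' => "Module - demo\n", '=head1 BUGS' => "None.\n" };
my $methods = [qw(new gettags getmethods sprintf)];

print "@{ getlist('/^get/', $methods) }\n";
print @{ getlist('=head1 NAME', $doctags) };
```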

Well, you can certainly do this, but for now the code works, and we will note this down as a future improvement. We move on to the next function, sprintf.

Public Function sprintf

Next we have the sprintf function, which will act sort of as a 'bag of all tricks' function. You want to see the whole POD document in html format? Say:

my $doc = new Pod::Checker($0);

print $doc->sprintf('html');

which will then print out a html formatted document.

You want to see it in text format? Say:

my $doc = new Pod::Checker($0);

print $doc->sprintf('text');

You want to see a summary of the usage? Say:

my $doc = new Pod::Checker($0);

print $doc->sprintf('usage');

And finally, if you want to see the usage of your command, given all of the modules you have included, you can say:

if ($opt->{'help'}) { print Pod::Checker->sprintf('command'); }

which is a class method that goes through each of the modules, looking for arguments to the command line, and puts them in a format that is printable. The idea is that people can say:

prompt% script.p -help

and get:

Usage:

script.p <files>

-show (show them files!)

-suppress (suppress them for the time being)

-delete (delete the files (note: very dangerous))

-munge (combine files into one)

-help ( get this screen)

and that you can put this usage inside the section called SYNOPSIS, rather than having to state it separately. Since it does so much, we write sprintf in terms of a bunch of arguments:

Listing 19.12 Pod/Checker.pm - sprintf()

151 sub sprintf # takes either 'usage' or 'html' or 'text', or 'command'

152 {

153 my ($self, $type) = @_;

154 my $text;

155 if (!ref($self) && $type ne 'command')

156 {

157 die "Usage $type is not supported!\n";

158 }

159 elsif (!ref($self))

160 {

161 $text = Pod::Checker->_sprintfclass($type);

162 }

163 else

164 {

165 $text = $self->_sprintfobject($type);

166 }

167 return ($text);

168 }

169

So what are we doing here? We are making sprintf() both a class method and an object method. If the user has said

Pod::Checker->sprintf('command');

then $self becomes

Pod::Checker

and the test 'ref($self)' returns back nothing (in line #159). As such, the command

Pod::Checker->_sprintfclass('command');

is run. If the call was

my $doc = new Pod::Checker('Module');

$doc->sprintf('command');

then $self is a reference, so the test '!ref($self)' fails in both line #155 and line #159. In this case, $self looks something like:

Pod::Checker=HASH(0x2664bc)

and then line 165:

165 $text = $self->_sprintfobject($type);

is called.
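
If the ref() trick is new to you, the following stand-alone sketch shows it in isolation. Sketch::Dual is a made-up class; the only point is that the same method can tell whether it was called on the class or on an object.

```perl
use strict;
use warnings;

package Sketch::Dual;        # hypothetical class, for illustration only

sub new { my ($class) = @_; return bless {}, $class; }

# Works both as Sketch::Dual->describe() and as $object->describe().
sub describe {
    my ($invocant) = @_;
    if (ref($invocant)) {    # a blessed reference: called on an object
        return 'object method on a ' . ref($invocant);
    }
    return "class method on $invocant";   # a plain string: the class name
}

package main;

my $as_class  = Sketch::Dual->describe();
my $as_object = Sketch::Dual->new->describe();
print "$as_class\n$as_object\n";
```

This prints 'class method on Sketch::Dual' followed by 'object method on a Sketch::Dual'.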

Why all this hubbub to make two separate calls, one object, one class? Well, sometimes scripts have part of their usage come from the script, and other parts come from the modules.

For example, the script may provide the following arguments:

--argument1 --argument2

But the modules that use Pod::Checker may also provide some arguments.

Hence, ModuleA may provide

--argument3 --argument4

and ModuleB may provide

--argument5 --argument6

Therefore, to get a true command line that reflects all the options, we make Pod::Checker go through every module looking for options.

The _sprintfclass function and Circular Dependencies

I know that we are in the section of the code that deals with public functions, and _sprintfclass() is private (note the leading underscore), but realize that sometimes it helps to associate private functions with the public functions that call them. In this case, we have a specific task - to go through each module looking for GetOptions calls - and it is a class method. Since it is a specific method that is only going to be called by one, public function (sprintf), I put it here.

It represents one more of those 'speed bumps' that I keep talking about. There was a need for being able to print out all of the GetOptions() arguments so that we could say:

Pod::Checker->sprintf('command');

to return all of the command line arguments that the class knows about. Hence, we associate it with sprintf() and it is listed below.

Listing 19.13 Pod/Checker.pm - _sprintfclass()

171

172 sub _sprintfclass

173 {

174 my ($class, $type) = @_;

175

176 my $scriptdocs = new Pod::Checker($_script);

177 my $usage = $scriptdocs->sprintf($type);

178 my $module;

179 foreach $module (keys (%INC))

180 {

181 my $docobject = new Pod::Checker($INC{$module});

182 $usage .= $docobject->_sprintfobject($type);

183 }

184 return($usage);

185 }

We basically do the same work that the import function is doing; however, we are careful not to make this function dependent on import. If we did, we would have a dependency diagram that looks something like Figure 19.7:

Figure 19.7 Bad Pod::Checker structure

In other words, import is dependent on the module Pod::Checker, and Pod::Checker is directly dependent on import! This is a circular dependency and you really want to avoid these in your code.

If your code has a lot of circular dependencies, it becomes much more difficult to change any given part. Instead of the loop in lines 179-183, we could have said:

Pod::Checker->import();

foreach $doc (keys (%$_docs))

{

my $obj = $_docs->{$doc};

$usage .= $obj->_sprintfobject($type);

}

to do our job, but now, if we change import() we need to change _sprintfclass(), and vice versa. In fact, as it is coded above, calling Pod::Checker->import() doesn't always work, because we have given users the option to say:

$ENV{'DOCOFF'} = 1;

which turns off the documentation importing. If the DOCOFF flag were turned on, import would do nothing and we would get no output at all!

We could hack the code to turn off the DOCOFF flag, and then call Pod::Checker->import(), and then turn it back on, but this type of hack will always get you into trouble. One hack leads to the next, and so on.
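Walking %INC directly, as the loop in _sprintfclass() does, needs nothing from import(). The following is a minimal sketch of that idea on its own; the helper loaded_modules() is a hypothetical name, and File::Basename is loaded only so that %INC has a known entry.

```perl
use strict;
use warnings;
use File::Basename ();    # loaded only so %INC has a predictable entry

# Return [key, file] pairs for every module loaded so far.
# The keys of %INC are relative names such as 'File/Basename.pm';
# the values are the files that were actually read.
sub loaded_modules {
    return map  { [ $_, $INC{$_} ] }
           grep { defined $INC{$_} }
           sort keys %INC;
}

foreach my $pair (loaded_modules()) {
    printf "%-30s %s\n", @$pair;
}
```

Because the helper reads %INC itself, it keeps working no matter what import() does, or whether import() was ever called.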

The _sprintfobject function

Again, this is a private function listed in the public part of the code for the same reason as _sprintfclass(). You may have noticed that _sprintfclass() was directly dependent on _sprintfobject() (in line #182). _sprintfobject() does the actual work, however. It's listed below:

Listing 19.14 Pod/Checker.pm - _sprintfobject()

186

187 sub _sprintfobject

188 {

189 my ($self,$type) = @_;

190

191 if ($type eq 'html')

192 {

193 my $text = `$_bin/pod2html --outfile '' $self->{'filename'}`;

194 return($text);

195 }

196 elsif ($type eq 'text')

197 {

198 my $text = `$_bin/pod2text $self->{'filename'}`;

199 return($text);

200 }

201 elsif ($type eq 'usage')

202 {

203 my $text = $self->{'usgdocs'};

204 return($text);

205 }

206 elsif ($type eq 'command')

207 {

208 my $text = $self->{'usgtxt'};

209 if ($text) { my $text = $self->{'usgdocs'}; return($text); }

210 else { return(''); }

211 }

212 }

213

Here, we have coded _sprintfobject() as a series of calls to external scripts. pod2html, pod2text, etc., exist in the standard distribution; to make sure that they are called correctly, we take the name of the directory where they are installed, which we stored at the beginning ($_bin), and call these scripts fully qualified. However, it is a bit of a hack, and we resolve to check on how the Pod::Html and Pod::Text modules are coming along, in the hope that they soon become flexible enough to do this work, so we don't need to call a shell! We move on to the errors function.
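As an aside on the shell calls above: in current Perl distributions, Pod::Text can be driven in-process. The following is a sketch under that assumption, not the book's code; pod_to_text() is a hypothetical helper, and the output_string() and parse_string_document() methods come from Pod::Simple, which newer versions of Pod::Text are built on.

```perl
use strict;
use warnings;
use Pod::Text;

# Format a string of POD as plain text without starting a shell.
sub pod_to_text {
    my ($pod) = @_;
    my $parser = Pod::Text->new(width => 72);
    my $out    = '';
    $parser->output_string(\$out);          # collect output in $out
    $parser->parse_string_document($pod);   # parse and format the POD
    return $out;
}

print pod_to_text("=head1 NAME\n\ndemo - a tiny example\n\n=cut\n");
```

The same pattern would apply to a file by calling parse_file() with a filename instead of parse_string_document().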

Public Function errors

This is another function that wraps what has already been determined by a private function, but it is an important one. import() uses this function to show users which documentation errors need to be fixed, so the output had better be correct:

Listing 19.15 Pod/Checker.pm - errors()

214 sub errors # returns the errors between

215 { # documentation and code.

216 my ($self, $type) = @_;

217 my $errors = [];

218 if (!defined($type) || $type eq 'usgerrors')

219 {

220 push(@$errors, @{$self->{'usgerrors'}});

221 }

222 if (!defined($type) || $type eq 'suberrors')

223 {

224 push(@$errors, @{$self->{'suberrors'}});

225 }

226 return $errors;

227 }

An important function, and fortunately, a simple one. By the time this function is called, the errors have already been found, because the constructor calls _calcErrors(). There is not much to say, although note that we are being consistent in having all of our return values be array references.

Summary of Public Functions for Pod::Checker

Our public functions actually went pretty smoothly, but note that we delegate a lot of work to the private functions, which come next. The functions new(), gettags(), getmethods(), sprintf(), and errors() all use attributes which are basically vaporware - for now. How do we turn:

'Method';

into

'/usr/local/lib/perl5/Method.pm'

for instance? And how do we get the tags (such as '=head1 NAME') for each document?

These questions are ones that we address next, when we talk about Pod::Checker's private functions, which do all of the dirty work for us.

Pod::Checker's Private Functions

We are still left with five private functions that we have not yet implemented: _getFile(), _getUsage(), _getTags(), _getSubs(), and _calcErrors().

These private functions are the meat of our module. Now, how do we go about actually dealing with the file formats, and parsing Plain Old Documentation (POD)?

The first route we could take is to reuse existing code. As you may recall, when we entered into the CPAN shell via:

prompt% perl -MCPAN -eshell

cpan> i /Pod/

Going to read /tmp/.cpan/01mailrc.gz

Going to read /tmp/.cpan/02packag.gz

Going to read /tmp/.cpan/03mlist.gz

Distribution BRADAPP/PodParser-1.00.tar.gz

....

....

Module Pod::Parser (BRADAPP/PodParser-1.00.tar.gz)

Module Pod::Pod (Contact Author KJALB=Kenneth Albanowski)

Module Pod::Select (BRADAPP/PodParser-1.00.tar.gz)

Module Pod::Texinfo (Contact Author KJALB=Kenneth Albanowski)

....

we saw that Pod::Parser and Pod::Select seem to be pretty close fits. In fact, we could probably use them; doing so would save us time and effort, at the cost of adding a dependency.

For now, we will go the second route, which is to implement the private functions on their own, without help from CPAN. This example shows some good uses and subtleties of regular expressions that probably would never come up in anything but a real-life problem. You will probably find uses for them in your real-life problems.

So let's go. The first thing we need to consider is the translation problem: going from Module to '/usr/local/lib/perl5/Module.pm'

Private Function _getFile()

Fortunately, Perl provides some hooks to make the translation from Module to '/usr/local/lib/perl5/Module.pm' fairly easy. The first form (what I am going to call relative) is not nearly as useful from the computer's point of view as the second (absolute). We have touched on the relative filename to absolute filename problem before (in our chapter on Modules).

The absolute form (/usr/local/lib/perl5/Module.pm) is a lot clearer about which file is meant than the relative form (Module); hence _getFile() will take as an argument any relative or absolute form, and turn it into an absolute form, so that the constructor can store it.

The function to go from a relative module form (use Module) to an absolute form (require "/usr/local/lib/perl5/Module.pm") should probably be in the standard distribution, but we code it here for now inside _getFile():

Listing 19.16 Pod/Checker.pm - _getFile()

228 sub _getFile

229 {

230 my ($modfile) = @_;

231

232 $modfile .= ".pm" if ($modfile !~ m"\.");

233 $modfile =~ s"::"/"g;

234 my $file = (-e $INC{$modfile}) ? $INC{$modfile} : $modfile;

235

236 ($file) = map ("$_/$modfile", grep (-r "$_/$modfile", @INC)) if (!-e $file);

237 $file = $file || undef; # get rid of -w noise.

238 confess "Unable to open module for $modfile!\n" if (!-e $file);

239

240 return($file);

241 }

This function is short, but it is doing a lot. Here are the steps:

1) line #232 takes forms such as Pod::Checker and turns them into 'Pod::Checker.pm'.

2) line #233 takes any '::' in $modfile and turns it into '/', so we have a legal key for %INC. Step #1 and #2 turn Pod::Checker into 'Pod/Checker.pm'

3) line #234 looks through %INC to turn 'Pod/Checker.pm' into '/home/ed/DEV/modules/Pod/Checker.pm'. i.e., it turns the tag into the actual module that was loaded by the program.

4) If %INC does not know about the relative file Pod/Checker.pm, in line #236, we make a last-ditch attempt and go through @INC looking for the file.

If we can't find the file 'Pod/Checker.pm' there, we give up! This function is pretty bulletproof. We could maybe make it more bulletproof by adding a check so that it does a 'find' if _getFile() can't locate the module we want (Pod::Checker in this example) in @INC.

However, this seems like overkill. We could easily give users the wrong information by pointing them to an old version of Pod::Checker by doing this. They are just going to have to fix it themselves if _getFile() cannot find a module that they want to look at.
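The four steps above can be sketched as a standalone pair of helpers. This is a minimal illustration rather than the book's _getFile(): the names module_to_relpath() and find_module_file() are hypothetical, and the sketch returns nothing on failure instead of calling confess.

```perl
use strict;
use warnings;
use File::Basename ();    # loaded only to give the example something to find

# 'Pod::Checker' -> 'Pod/Checker.pm' (a legal key for %INC).
sub module_to_relpath {
    my ($name) = @_;
    (my $rel = $name) =~ s{::}{/}g;
    $rel .= '.pm' unless $rel =~ /\./;
    return $rel;
}

# Resolve a module name to the file perl loaded, or would load.
sub find_module_file {
    my ($name) = @_;
    my $rel = module_to_relpath($name);

    # Already loaded: %INC records exactly which file was used.
    return $INC{$rel} if defined $INC{$rel} && -e $INC{$rel};

    # Not loaded yet: search the @INC directories in order.
    foreach my $dir (@INC) {
        next if ref $dir;                  # skip code-reference hooks
        my $candidate = "$dir/$rel";
        return $candidate if -r $candidate;
    }
    return;                                # not found
}

print module_to_relpath('Pod::Checker'), "\n";
print find_module_file('File::Basename'), "\n";
```

Note that the @INC search returns the directory joined with the relative name, not the directory alone.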

_getUsage(), _getTags(), _getSubs() - Parsing POD

Now for the parsing part of the module. Parsing POD exactly is no simple matter, but Perl's regular expressions are up to the job, and with the extended form of regular expressions (the x modifier), the result can even look relatively clean.

So let's go through each of the three parsing problems that we have, and give a simple diagram of each. Hopefully, you can come back to these problems for review when writing your own regular expressions.

_getUsage()

The purpose of _getUsage() is twofold: we need to get the SYNOPSIS tag of our POD text, and then also get the text inside of any GetOptions() calls. Our intent will be, again, to compare the two together. Our job is to isolate the text regarding usage. We do this by making two regular expressions, which work something like Figure 19.X:

Figure 19.X Regular expression to match usage text.

The first regular expression works by finding all of the non-semicolon ([^;]) text right after the string 'GetOptions(' and right before the closing ');'.

Why non-semicolon? Well, if we use '.*?', it will match the text in:

GetOptions(\%varb, function(), function2());

and function() will short-circuit the regular expression. Without actually writing a parser, excluding the semicolon is the safest thing to do. We could make it even safer by substituting placeholders for all single- and double-quoted strings, getting the text, and then substituting back. Something like:

GetOptions(\%varb, '--variable();', '--variable2();');

which would break the way that we've got it set up (the match would stop at the first ');', which is inside a quoted string) could become:

GetOptions(\%varb, AAAAAAA, AAAAAAB);

We could then match the text using our regular expression (it would now match the whole argument list), and then 're-expand' to get back:

\%varb, '--variable();', '--variable2();'

We don't have time to do this here, but the CD shows how; look at the script 'reexpand.p'.
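The placeholder idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not the 'reexpand.p' script from the CD: getoptions_args() is a hypothetical name, the placeholders are NUL-delimited numbers rather than strings of letters, and quoted strings containing escaped quotes are not handled.

```perl
use strict;
use warnings;

# Return the argument text of each GetOptions(...); call in $code.
sub getoptions_args {
    my ($code) = @_;
    my @saved;

    # Step 1: replace every quoted string with a numbered placeholder.
    my $masked = $code;
    $masked =~ s/('[^']*'|"[^"]*")/push @saved, $1; sprintf("\0%d\0", $#saved)/ge;

    # Step 2: match on the masked text, where no ';' can hide in a string.
    my @args;
    while ($masked =~ m/GetOptions\s*\(([^;]*?)\);/g) {
        my $arg = $1;

        # Step 3: re-expand the placeholders to the original strings.
        $arg =~ s/\0(\d+)\0/$saved[$1]/g;
        push @args, $arg;
    }
    return @args;
}

my $code = q{GetOptions(\%varb, '--variable();', '--variable2();');};
print "$_\n" foreach getoptions_args($code);
```

Run against the unmasked text, the same pattern would stop inside the first quoted string, which is the failure the placeholders are there to prevent.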

The second regular expression is more straightforward. We match the string '=head' after a newline or at the very beginning of the text, and then match everything after the SYNOPSIS line, up to, but not including, the next line that begins with '='.

The text for _getUsage() is shown below:

Listing 19.17 Pod/Checker.pm - _getUsage()

242 sub _getUsage

243 {

244 my ($self) = @_;

245 my $file = $self->{'filename'};

246

247 local($/) = undef;

248 my $fh = new FileHandle("$file");

249 my $text = $self->{'fulltext'} = <$fh>;

250

251 while (

252 $self->{'fulltext'} =~ m{

253 (?:\n|^) # nail down regexp next to newline

254 =head[^\n]*SYNOPSIS.*?\n # SYNOPSIS line

255 (.*?) # SYNOPSIS text

256 (?=\n\s*=) # get everything to next paragraph

257 }sgxo

258 )

259 {

260 $self->{'usgdocs'} .= $1;

261 }

262

263 $text =~ s"(^|\n)=.*?=cut""sg; # get rid of all POD

264 $text =~ s"#.*\n""mg; # get rid of all comments

265

266 while ( $text =~ m{

267 GetOptions # assumes GetOptions Call

268 \s*\( # junk between GetOptions and paren

269 ([^;]*?) # arguments Note -- not .*?

270 \); # end of argument.

271 }sgxo

272 )

273 {

274 $self->{'usgtxt'} .= $1;

275 }

276 }

A few more things to notice here. In finding GetOptions, we have had to get rid of the POD that we find (line #263). Why? Well, some of the POD may document GetOptions calls, and these would trigger 'false positives'. We also get rid of comments, to be on the safe side (line #264).
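The two halves of this job, pulling out the SYNOPSIS text and stripping POD and comments before looking for code, can be tried on a small string. The following is a sketch, not the listing above: synopsis_text() and strip_pod_and_comments() are hypothetical helpers, and the comment stripping is deliberately naive (it would also clobber a '#' inside a string).

```perl
use strict;
use warnings;

# Body of the SYNOPSIS section, up to the next POD command.
sub synopsis_text {
    my ($source) = @_;
    my ($text) = $source =~ m{
        ^=head\d[^\n]*SYNOPSIS[^\n]*\n   # the SYNOPSIS heading line
        (.*?)                            # its body, as little as possible
        (?=^=|\z)                        # stop before the next POD command
    }msx;
    return $text;
}

# Source code with POD blocks and comments removed.
sub strip_pod_and_comments {
    my ($source) = @_;
    $source =~ s/^=[a-zA-Z].*?(?:^=cut\b[^\n]*\n|\z)//msg;   # POD blocks
    $source =~ s/#[^\n]*//g;                                 # comments
    return $source;
}

my $source = join "\n",
    q{GetOptions(\%o, '--a');   # real call},
    '',
    '=head1 SYNOPSIS',
    '',
    '  script.p --a',
    '',
    '=head1 DESCRIPTION',
    '',
    q{Calls GetOptions('--fake'); internally.},
    '',
    '=cut',
    '';

print synopsis_text($source);
print strip_pod_and_comments($source);
```

After stripping, only the real GetOptions call is left for the usage regular expression to find.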

The next thing we need to implement is finding the tags, which is a bit more complicated.

_getTags()

The tags are a little bit more difficult. One might think that the easiest way to match these would be to match something like

17 __END__

18

19 =head1 NAME

20

21 pseudo_script.p - shows sample documentation format

22

23 =head1 SYNOPSIS

24

that is, everything between the first '\n=' and the second '\n=' - and to iterate through the whole file this way. Come to think of it, this is the simplest way of doing things. But hey, we were suboptimally intelligent in this case and did it the hard way. Ah well.
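The simpler approach described above can be sketched as follows. This is an illustration, not what the chapter's code does: pod_sections() is a hypothetical helper that keeps only '=head' and '=item' commands, so '=cut', '=over' and '=back' never become keys.

```perl
use strict;
use warnings;

# Map each '=headN ...' or '=item ...' line to the text that follows it.
sub pod_sections {
    my ($pod) = @_;
    my %section;
    while ($pod =~ m/^(=(?:head\d|item)[^\n]*)\n(.*?)(?=^=|\z)/msg) {
        my ($tag, $body) = ($1, $2);
        $tag  =~ s/\s+$//;           # trim the tag line
        $body =~ s/^\s+|\s+$//g;     # trim blank lines around the text
        $section{$tag} = $body;
    }
    return \%section;
}

my $pod = "=cut\n\n=head1 NAME\n\nfirst_script.p - does things first\n\n"
        . "=head1 SYNOPSIS\n\nit really does things first!\n\n=cut\n";

my $sections = pod_sections($pod);
print "$_ => $sections->{$_}\n" foreach sort keys %$sections;
```

Because the lookahead stops at any line beginning with '=', a '=cut' ends the preceding section without ever picking up text of its own.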

The point that we got caught up in is matching '=over', or '=back' or '=cut'. Basically, out of this function we are going to want to get a hash that looks something like:

$self->{'doctags'} =

{ '=head1 NAME' => 'first_script.p - does things first',

'=head1 SYNOPSIS' => 'it really does things first!'

};

Now, we were getting stuck with '=cut', because '=cut' doesn't have any text associated with it, and we were getting stuff like:

$self->{'doctags'} =

{ '=cut' => '=head1 NAME first_script.p - does things first'};

where the cut was mistakenly picking up text from the '=head' later because the associated POD looked like:

=cut

=head1 NAME

first_script.p - does things first

Ah well. An expression like

$self->{'fulltext'} =~ m"(\n|^)=(.*?)(?=\n=)";

would probably work fine. You would take the results from this expression and then plug them into another regular expression to glean information from the substring. Figure 19.6 shows how the regular expression that we actually used works:

Figure 19.X Regular expression to match complicated tags.

There are two major linchpins for this regular expression. First, we have alternation. We check first to see if the tag we are dealing with is an '=over', '=back' or '=cut' tag; if so, we match it first. If it isn't one of these tags (that is, it is a '=head' or '=item' tag), we then pass through to try to match the second expression.

The second point to notice is the '\G' in the expression. We didn't have a chance to cover '\G' in the regular expressions chapter; it usually comes in handy in very difficult matching problems. '\G' makes things simple when you are matching consecutive blocks. In situations where:

17 __END__

18

19 =head1 NAME

20

21 pseudo_script.p - shows sample documentation format

22

23 =head1 SYNOPSIS

24

25 just use it!

26 =cut

the first block (lines 19 through 22) is matched first. The \G then anchors the next match, so that in a regular expression loop, the regular expression does not merely match a character (like =) somewhere in the text, but matches it exactly at the position where the previous match left off.
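The effect of '\G' in a loop can be seen in a small standalone sketch. This is an illustration only, with a simpler pattern than the chapter's: pod_blocks() is a hypothetical helper, and it relies on the g modifier remembering where each match ended.

```perl
use strict;
use warnings;

# Split POD text into consecutive blocks, each starting at a '=' command.
sub pod_blocks {
    my ($pod) = @_;
    my @blocks;

    # \G pins every attempt to the spot where the last match ended, so
    # the blocks come out back to back with nothing skipped in between.
    while ($pod =~ m/\G(=\w+[^\n]*\n.*?)(?=^=|\z)/msg) {
        push @blocks, $1;
    }
    return @blocks;
}

my $sample = "=head1 NAME\n\npseudo_script.p - shows sample documentation format\n\n"
           . "=head1 SYNOPSIS\n\njust use it!\n=cut\n";

my @found = pod_blocks($sample);
print scalar(@found), " blocks\n";
print "---\n$_" foreach @found;
```

As written, the loop finds nothing if the text does not begin with a POD command, since the first attempt is pinned to position 0; allowing a newline or the start of the text as alternatives to '\G' relaxes that.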

The (?=\=) clause ensures that the regular expression pointer is sitting at the beginning of the next '=', and hence the pattern '\G=' will always match. Hence, the very next thing our regular expression will match is:

23 =head1 SYNOPSIS

24

25 just use it!

and position us for the next match (at the '=cut'). Anyway, enough talk. Here's the actual code:

Listing 19.18 Pod/Checker.pm - _getTags()

277 sub _getTags

278 {

279 my ($self) = @_;

280

281 my $subs = $self->_getSubs();

282 my $docs = $self->{'doctags'} = {};

283

284 my ($key, $value);

285 local($") = "|";

286 my $text = $self->{'fulltext'};

287

288 my $overRegexp = q$

289 ( # header (beginning/newline)

290 (=(?:over|back|cut).*?\n)

291 (?=\=)) # cut junk till next '='

292 $;

293

294

295 my $tagRegexp = q$

296 ((=.*?\n\s*\n) # tag '=item', '=head',etc

297 (.*?) # tag text

298 \n(?=\=)) # to next tag line

299 $;

300

301 while ( $self->{'fulltext'} =~ m{

302 (?:\G|\n|^)

303 ($overRegexp|$tagRegexp)

304 }sgxo

305 )

306 {

307 my $text = $1;

308 my ($key, $value) = ($5, $4);

309 next if ($text =~ m"(^\n*)=back"); # ignore the '=back' header

310 next if ($text =~ m"(^\n*)=over"); # ignore the '=over' header

311 next if ($text =~ m"(^\n*)=cut"); # ignore the '=cut' header

312

313 if ($key =~ m"^=item\s*method")

314 {

315 my ($methodKey) = ($key =~ m"method.*?\b(@$subs)\b");

316

317 if (defined($methodKey))

318 {

319 $docs->{$methodKey} = "$value";

320 push(@{$self->{'methodlist'}}, $methodKey);

321 next;

322 }

323 push(@{$self->{'methodlist'}}, $key);

324 push(@{$self->{'suberrors'}},

325 "Error!! Tag :$key: doesn't have any associated code!\n");

326 }

327 $docs->{$key} = "$value";

328 }

329 }

330

Quite nasty, eh? The main thing to notice is that it took quite a while to work out line #308 (my ($key, $value) = ($5, $4);). Doing so actually took some trial and error: I set up a test statement like:

print ":$1: :$2: :$3: :$4: :$5: :$6: :$7: :$8: :$9: :$10:\n";

to let Perl figure it out for me. The second thing to notice is that line #313 picks out the methods for us. If we see the special tag:

'=item method'

then we know the thing to follow is a method. In line #315 we do another sneaky thing. In a subroutine that is yet to be defined, _getSubs(), we get all of the subroutines in that package, from the code itself. From there, we jam them together with a '|' (which is why line #285 sets $" to '|') to get:

sub1|sub2|sub3|sub4

Hence the expression

315 my ($methodKey) = ($key =~ m"method.*?\b(@$subs)\b");

gets which function the documentation is referring to! If this expression comes up blank, we have a problem, and note it in the statements:

324 push(@{$self->{'suberrors'}},

325 "Error!! Tag :$key: doesn't have any associated code!\n");

which then will be printed out, if we are calling import().
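The trick in line #315 can be tried on its own. The following is a minimal sketch: method_for_tag() is a hypothetical helper, and the list of subroutine names is made up for the example.

```perl
use strict;
use warnings;

# Find which known subroutine an '=item method ...' tag refers to.
sub method_for_tag {
    my ($tag, @subs) = @_;

    # $" is the separator used when an array is interpolated, so
    # @subs turns into 'new|gettags|errors' inside the pattern.
    local $" = '|';
    my ($name) = $tag =~ m/method.*?\b(@subs)\b/;
    return $name;    # undef when no known subroutine is mentioned
}

my @subs = qw(new gettags errors);
foreach my $tag ('=item method B<gettags>', '=item method B<nosuch>') {
    my $name = method_for_tag($tag, @subs);
    print "$tag => ", (defined $name ? $name : '(no matching code)'), "\n";
}
```

Using local keeps the change to $" confined to the subroutine, so array interpolation elsewhere in the program is unaffected.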

_getSubs()

OK, so we have one more major parsing task. This time, however, we will wing it a bit, not counting on a diagram for processing. The main thing that we want to do here is parse subroutines, but ignore the statements where we say:

$code .=<<'EOF';

sub false # IGNORE

{

}

EOF

;

Right now, we make a placeholder for these statements, called $self->{'falsehits'}, but we may want to take a more active stance in the future (by printing the 'falsehits' when they get to production) if people start abusing this feature:

Listing 19.19 Pod/Checker.pm - _getSubs()

331 sub _getSubs

332 {

333 my ($self) = @_;

334

335

336 my @subnames;

337 my @falseHits;

338

339 local($") = undef;

340 while (

341 $self->{'fulltext'} =~ m{

342 (?:\n|^) # return or begin

343 \s*sub\s+ # sub keyword

344 (\w+) # name of subroutine

345 (?:[\s\(\{]*)((\#.*?\n){0,1})

346 }sgxo # all the fixings. })

347 )

348 {

349 my ($sub,$warn) = ($1,$2);

350 (push(@falseHits, $warn), next) if ($warn =~ m"IGNORE");

351 push (@subnames, $sub);

352 }

353 $self->{'falsehits'} = \@falseHits;

354 $self->{'subnames'} = \@subnames;

355 }

356

The key lines are 341 through 345; line #345 in particular. Here, we take the task of matching:

sub marine # this is a comment.

where the spaces after 'marine' are matched by the (?:..) clause, and the optional comment is matched by ((\#.*?\n){0,1}).
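The same idea can be exercised on a short string. This is a sketch rather than the listing above: find_subs() is a hypothetical helper, it returns the names of the ignored subroutines instead of their comments, and it only looks for a comment on the same line as the sub keyword.

```perl
use strict;
use warnings;

# Return two array references: the subroutine names found, and the
# names whose 'sub' line carries an IGNORE comment.
sub find_subs {
    my ($source) = @_;
    my (@names, @ignored);

    while ($source =~ m/^[ \t]*sub\s+(\w+)[ \t({]*(\#[^\n]*)?/mg) {
        my ($name, $comment) = ($1, defined $2 ? $2 : '');
        if ($comment =~ /IGNORE/) {
            push @ignored, $name;
            next;
        }
        push @names, $name;
    }
    return (\@names, \@ignored);
}

my $code = "sub marine # this is a comment.\n{\n}\n\n"
         . "sub false # IGNORE\n{\n}\n";

my ($names, $ignored) = find_subs($code);
print "subs:    @$names\n";
print "ignored: @$ignored\n";
```

Anonymous subroutines are skipped automatically, because the pattern requires a name after the sub keyword.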

The function _calcErrors()

We finally have the function _calcErrors(). _calcErrors() is what we use to reconcile code and documentation. As it is, we have a lot of text that we have stored: the elements usgdocs, usgtxt, subnames, and doctags all store different elements of the file that we have parsed.

We therefore pit usgdocs against usgtxt, and subnames against doctags to see what is documented and what is not:

Listing 19.20 Pod/Checker.pm - _calcErrors()

357

358 sub _calcErrors

359 {

360 my ($self) = @_;

361

362 # calculate Usage Errors First

363

364 my $usgdocs = $self->{'usgdocs'};

365 my $usgtxt = $self->{'usgtxt'};

366 $self->{'usgerrors'} = [];

367

368 my (@usageflags) = ( $usgdocs =~ m"-[\-]*(\w+)"g);

369 my (@codeflags) = ( $usgtxt =~ m"-[\-]*(\w+)"g);

370

371

372 my ($onlyinusage, $onlyincode) = Diff::array

373 (

374 \@usageflags,

375 \@codeflags,

376 'separate'

377 );

378 my $usg;

379 foreach $usg (@$onlyinusage)

380 {

381 push( @{$self->{'usgerrors'}},

382 "Error!! The flag '$usg' is defined only in SYNOPSIS!\n" );

383 }

384

385 foreach $usg (@$onlyincode)

386 {

387 push(@{$self->{'usgerrors'}},

388 "Error!! You have no docs for the flag '$usg'\n");

389 }

390

391 # Now, we calculate subroutine errors.

392

393 $self->{'suberrors'} = $self->{'suberrors'} || [];

394 my $subs = $self->{'subnames'};

395 my $docs = $self->{'doctags'};

396 my $sub;

397

398 foreach $sub (@$subs)

399 {

400 if ((!defined ($docs->{$sub})) && $sub !~ m"^_")

401 {

402 push (@{$self->{'suberrors'}},

403 "Error!! Subroutine :$sub: doesn't have documentation!\n");

404 }

405 }

406 }

407

408 __END__

Again, this is simply more shenanigans. usgtxt contains something like:

\%varb, '--opt1','--opt2'

and we break this apart with

369 my (@codeflags) = ( $usgtxt =~ m"-[\-]*(\w+)"g);

to get an array

@codeflags = ('opt1','opt2');

Likewise, we have defined our documentation to be tagged by the name of the subroutine:

=item method B<my_sub>

 

this is my_sub and it does X, Y, Z and THETA

becomes the hash entry:

$self->{'doctags'} =

{ my_sub => 'this is my_sub and it does X, Y, Z and THETA' }

We then go through all the subroutines (line #398) and look for a doctag associated with each one. If a doctag doesn't exist for a particular subroutine, it is an error. Notice that we ignore subroutines that have an '_' in front of them - this too could be abused by someone who wanted to get around the system.
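The flag comparison can be sketched without the Diff module. The following is an illustration under stated assumptions: flags_in() and only_in_first() are hypothetical helpers, and a hash lookup stands in for whatever Diff::array does internally.

```perl
use strict;
use warnings;

# Pull flag names out of text: '--opt1' and '-opt1' both yield 'opt1'.
sub flags_in {
    my ($text) = @_;
    return $text =~ m/-+(\w+)/g;
}

# Items of @$first that do not appear in @$second, in original order.
sub only_in_first {
    my ($first, $second) = @_;
    my %seen = map { $_ => 1 } @$second;
    return grep { !$seen{$_} } @$first;
}

my @docflags  = flags_in("script.p [--opt1] [--opt3]");
my @codeflags = flags_in(q{\%varb, '--opt1', '--opt2'});

print "only in SYNOPSIS: @{[ only_in_first(\@docflags, \@codeflags) ]}\n";
print "only in code:     @{[ only_in_first(\@codeflags, \@docflags) ]}\n";
```

Each name that shows up on only one side corresponds to one of the two kinds of usage error that _calcErrors() reports.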

Summary of Private Functions

The private functions, _getTags(), _getUsage(), and the like, are the workhorses of our module. They are also pretty low level. They directly work with the way that POD is formatted, which is a negative. If we wanted to avoid this, we could have used the Pod::Parser module that was available on CPAN, as it might make things a little easier. However, I do not think that this necessarily means that you should always reuse code. Not many people openly acknowledge this, but it can also be a negative to reuse code. You have complete control over your own code, which you don't have with somebody else's.

You have the freedom to change the interface of your own code, but you don't have this freedom with someone else's. You need to make the judgment call. In cases such as CGI, WWW, and so forth, it is a no-brainer. Those modules have so much support that it is pretty much silly to rebuild them. In cases where you see 'version .1' attached, well, you probably don't want to use the code (at least for serious scripts). In this case, it was pretty much 50/50. I would maybe tend to lean towards implementing it myself; but then again, I haven't seen the Pod::Parser module and how it was implemented.

c) Step #6: Documenting Pod::Checker

I admit that the strong temptation is to wrap Pod::Checker up and call it a day, but there are a couple more things that we could do here.

1) We could MakeMaker-ify it.

This makes it easy to distribute to the world. I hate to do this, but we leave it to the documentation to show you how to do this. Remember how you could say:

perl Makefile.PL

make

make test

make install

to actually install a given module? Or how you could use CPAN to install a module? MakeMaker provides the hooks inside Perl that lend this functionality. One of the benefits of distributing your modules to the world (besides fame and fortune) is that you get a good, solid test of your code by other people who find your modules interesting.

See the documentation for ExtUtils::MakeMaker, along with the perlmod man pages, on how it's done.

2) We could document it.

This is the step that we will concentrate on here, and it is apt for a module named Pod::Checker. After all, it is a good thing to lead by example, and showing that you can go through the pain of documenting something makes others more willing to do it, too. There are two forms of documentation that you can maintain, a calling diagram form (i.e.: a picture) and text (i.e.: pod).

Documentation inside the Pod::Checker itself (text).

For the sake of consistency, we follow the same format as defined by the Pod::Checker module for the text part of our documentation:

Listing 19.21 Pod/Checker.pm - documentation

409

410 ######################### START DOCUMENTATION ##############################

411

412 =head1 NAME

413

414 Pod::Checker - pragma and object to manage updated documentation

415

416 =head1 SYNOPSIS

417

418 use Pod::Checker;

419

420 Class methods:

421 Pod::Checker->sprintf();

422

423 Object methods:

424 my $docobject = new Pod::Checker('Module') || new Pod::Checker($0);

425

426 $docobject->gettags('/.*subname.*/');

427 $docobject->getmethods('/get/');

428 $docobject->sprintf(); # 'usage', 'html', 'text'

429

430 =head1 DESCRIPTION

431

432 Pod::Checker is a method for managing and controlling documentation,

433 and manipulating it inside your code. It assumes that you are using POD.

434

435 When you say 'use Pod::Checker' in your code, you make perl go through

436 some hoops to check that your documentation is in fact up to date.

437

438 In addition, Pod::Checker provides you, by an OO interface, methods

439 for printing out and searching through documentation (for modules inside

440 @INC).

441

442 =over 4

443

444 =item method B<import>

445

446 This method does the checking of the documentation, to make sure

447 that it is kosher.

448

449 If perl cannot find:

450

451 a: a short usage statement about options you place in a 'GetOptions'

452 call, in a special header called 'SYNOPSIS'.

453

454 b: a documented 'item' or 'header' for each public subroutine call

455 (not beginning with a '_')

456

457 then perl warns you by default, dies if you have $ENV{'DOCDIE'} set.

458 If you have $ENV{'DOCOFF'} set, this test will not be made.

459

460 =item method B<new>

461

462 Constructor. Nothing interesting - takes modulename or filename as an

463 argument. May alternately take a string in a future version if enough

464 interest.

465

466 =item method B<gettags>

467

468 This method gets information out of the documentation Pod::Checker knows

469 about in text format, given a regular expression.

470

471 $doc->gettags('/.*sub.*/') gets docs for all subs named with the regular

472 expression .*sub.*

473

474 $doc->gettags('NAME') gets the text for NAME.

475

476 =item method B<sprintf> - 'usage', 'html', 'text'

477

478 This method gets information out of the module about what argument

479 processing it provides to the module. Ex:

480

481 $docs->sprintf('usage');

482

483 in a module containing the statement:

484

485 GetOptions(\%args, '--varb:s', '--whatever:s');

486

487 will look for a short usage document about the command line options

488 'varb' and 'whatever' in a special section called SYNOPSIS.

489

490 And the class method

491

492 Pod::Checker->sprintf('usage');

493

494 will print out the usage, combining all the usage statements that the

495 Pod::Checker knows about.

496

497 And

498

499 $docs->sprintf('html');

500

501 returns a legal html document of the whole doc.

502

503 =item method B<getmethods>

504

505 The method 'getmethods' returns an arrayref of what methods can be

506 used, given the documentation for those methods. Hence, if you did a

507

508 my $doc = new Pod::Checker('Pod::Checker');

509 my $methods = $doc->getmethods();

510 print @{$methods};

511

512 This would write out:

513

514 get sprintf getmethods

515

516 =item method B<errors>

517

518 Cache of any errors encountered in the process of parsing. Separated into

519 two types: errors in parsing subroutines ('suberrors') and errors in

520 parsing command line arguments ('usgerrors'); hence

521

522 $doc->errors('usgerrors');

523

524 returns usage errors for the module.

525

526 =back

527

528 =head1 BUGS

529

530 Pod::Checker uses wrappers around existing POD commands (pod2man,

531 pod2html, pod2text) and hence is kind of kludgy.

532

533 If you use this module, you have to keep your perl modules separate from

534 the centrally installed ones, lest you get tons of errors.

535

536 The Pod::Checker has to come last in the include hierarchy. Otherwise,

537 Pod::Checker will not check the modules that come after (this should

538 be fixed by perl5

539

540 =cut

541

542 ########################## END DOCUMENTATION ###############################

A hundred and thirty five lines of documentation. Whew! Very little to say here, except:

One can see why people put off the documentation part: it is a lot of work.

One can see why it is important to make documentation as painless as possible.*

* Please note, we will have to document Diff.pm in the same way, since we use it inside Pod::Checker and if we don't document it, Pod::Checker will register that as an error!

Finally, it might be helpful to see how our documentation for Pod::Checker actually looks inside a web browser. Again, simply call pod2html on our Pod/Checker.pm file:

prompt% pod2html Pod/Checker.pm --outfile Pod/Checker.html

For the sake of actually maintaining the Pod::Checker code, it is helpful to have a calling tree form of documentation as well, i.e.: the diagrams that we used to analyze our design can be kept up to date so we see exactly which subroutines are calling which others.

This itself is a coding challenge. One could, without too much difficulty, come up with a package that automatically generates calling diagrams, but for now, let's document the calling diagram manually. It is shown in Figure 19.X:

Figure 19.X Calling Diagram for Finished Pod::Checker module.

Notice the lack of loops here. Every single loop that you have (as we discussed earlier) will make your design more unwieldy. With loops where subroutine 1 calls subroutine 2 calls subroutine 1, the dependencies will choke you every time.*

* We discuss this in the next chapter, and in the chapter about layering.

OK, one more comment, of a technical nature. We listed as one of the bugs that Pod::Checker has to come 'last in the include hierarchy' to work. What does this mean?

If you say:

use Pod::Checker;

use OtherModule;

then at the time Pod::Checker runs, OtherModule has not been loaded yet, so it doesn't show up in %INC and Pod::Checker will not know about it.

This problem should be rectified by the time this book comes to press; we expect to have an INIT statement such that:

INIT

{

foreach $mod (keys (%INC))

####

} ####

will run stuff after everything else is included. This means that %INC will be complete, and the checker will be delayed until runtime.

Step #7: Regression Testing

Are we there yet? Not quite. We simply need to note that we have one more step to go; and that is of testing. Untested code is unreliable, and will bite you when you least want to be bitten. You can formally or informally test it. Informal testing is simply banging on it and watching it break, which is surprisingly effective if you are devious enough.

However, just for giggles, let's make a small test script to make sure that this application is working:

Listing 19.22 tPod_Checker.p - test script.

1 use Pod::Checker;

2 my $doc = new Pod::Checker($0);

3 my $poddoc = new Pod::Checker('Pod::Checker');

8 GetOptions(\%varb, '--arg1', '--arg2');

9 print "TEST1:\n===================\n\t", $doc->sprintf('command');

10 print "TEST2:\n===================\n\t", $doc->sprintf('usage');

11 print "TEST3:\n===================\n\t", Pod::Checker->sprintf('usage');

12 print "TEST4:\n===================\n\t",$doc->gettags('/SYNOPSIS/');

13 sub aha

14 {

15 print "You have a problem!\n";

16 }

For a big module like this, though, a real test script will be a lot bigger. Look at the CD's version of Pod_Checker.p, and our SpreadSheet program of Chapter 25, to see how big.

Summary of Chapter

Although we didn't directly state it, our conception, and implementation of the Pod::Checker module is really a fleshed out example of the software development cycle. We started with an idea, moved it into a formal design, moved that into pseudo-code, and moved the pseudo-code into actual code. And then we gave a small example of testing the code.

If this were to be used in real-life, we would then move on to the software aging cycle. Software starts out young and flexible, then new developments and changes make it grow, and finally those same changes can cause it to get unwieldy and obsolete.

Anyway, one of the biggest requests I've gotten from talking to people on the Internet is to show exactly what this chapter attempted to show: building a real Perl object, totally from scratch. If you are new to OO, try making some of your own objects from scratch (like we did this chapter) or building on your own code (as we did last chapter with Expect) and see where it takes you.

If you need ideas, use the CPAN shell to see exactly what is out there (the command in the CPAN shell 'cpan> i' will give you info on everything out there, authors, distributions, the whole lot.) Then just sit down and experiment.

This is the last chapter where we will deal with objects by themselves. We now move on to groups of objects and some techniques on how to deal with them. The next two chapters will cover inheritance, and layering. These will help you in actually making Perl projects, which is what we will do in the last chapters.



HTML conversions by Mega Space.

This page updated on October 14, 1997 by Webmaster.

Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.
