This chapter provides solutions to some of the most common bugs
you'll run across when debugging CGI applications. It is the result
of many frustrating hours spent figuring out why an apparently
simple CGI script would not work. Hopefully, it will solve some of
the problems you might face while debugging your own CGI scripts.
The material builds on topics covered in Chapter 20, "Introduction
to Web Pages and CGI."
This chapter covers the points to keep in mind when you write CGI
scripts that have to open files or pipes or keep temporary
information in files, as well as issues about how to respond to
client-side requests. We will also cover some security issues
about writing and placing Perl scripts in directory trees. The
summary at the end of the chapter will serve as a good checklist
for you to use when evaluating potential problems with your Perl
CGI scripts.
The CGI script on the server side must be an executable file with
its execute permissions set. Incorrect permissions can result in
"file not found" errors being sent back to the client; missing
execute permissions in particular produce other error messages.
All paths of execution in the CGI script must return a value that
can be interpreted by the client. Test your CGI to make sure that
the application does not exit with an unpredictable return code.
If the CGI script relies on files on the server side, are they
accessible? The CGI script runs with the same permissions as the
Web server, which on many systems runs as the user called "nobody."
Make sure that the data files accessed by the CGI script have the
correct read/write permissions for "nobody."
Opening the file usually entails checking for errors before proceeding.
Here's a usual call in a Perl script to open a file and read from
it:
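A minimal sketch of such a call follows. The helper name read_lines
is hypothetical and the path is supplied by the caller; only the
handle name MFILE comes from the text.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $path, or an empty list
# (after a warning) if the file cannot be opened.
sub read_lines {
    my ($path) = @_;

    # Check the result of open() before touching the handle.
    unless (open(MFILE, '<', $path)) {
        warn "Cannot open $path: $!\n";
        return;
    }

    my @lines = <MFILE>;
    close(MFILE);
    return @lines;
}
```

A caller would write something like my @lines = read_lines($file);
and treat an empty list as "nothing to show."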
If the Perl script skips the error-checking portion and just goes
on attempting to read from the unopened file handle MFILE, it will
simply read no data. (Perl does not crash in this instance.)
For example, the following lines are expected to return a GIF
image back to the client:
This code would be called in a CGI script to return the contents
of an image back to the browser whose request caused the script
to be executed. If the GIF image being opened does not exist,
then no data is sent back to the browser. Here are the ways to
avoid such errors:
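One such safeguard, sketched under assumptions: the helper name
send_gif and the wording of the fallback message are hypothetical.
The idea is simply to check the open and send a valid response
either way, so the browser is never left with no data.

```perl
use strict;
use warnings;

# Hypothetical helper: writes the GIF at $path to the handle $out, or a
# plain-text error page if the image cannot be opened.
sub send_gif {
    my ($path, $out) = @_;

    my $img;
    unless (open($img, '<', $path)) {
        print $out "Content-type: text/plain\n\n";
        print $out "Image not available.\n";
        return 0;
    }

    binmode($img);           # image data is binary
    binmode($out);
    print $out "Content-type: image/gif\n\n";

    local $/;                # read the whole file in one go
    my $data = <$img>;
    print $out $data if defined $data;
    close($img);
    return 1;
}
```

In a CGI script the call would be send_gif($file, \*STDOUT);.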
One important thing to look for is a possible infinite loop. Make
sure your CGI script always returns. Remember that CGI scripts
are run as the result of a client request coming to your server.
It's quite possible for several clients to make the same request
and start off several copies of an infinitely looping CGI script
on your server. Too many of these will cause the machine running
the server to lock up!
Another common mistake when writing CGI scripts is to rely directly
on the value returned when a pipe is opened. That value is based
only on whether or not the pipe itself was opened. It has nothing
to do with whether the command on the other side of the pipe
symbol (|) succeeded.
For example, the following code will almost never execute the
die statement because a pipe
almost always can be opened on a given system:
Instead, break this statement into two statements and then close
the file handle. The relevant error code will be in the variable
($?) in case of any errors.
Use the error code returned from the close
call before proceeding with the second portion of the code. If
there are no errors, reopen the file and continue. This procedure
sounds really hokey, but it works as long as you keep two things
in mind: that starting up myprogram
is not inefficient and that each invocation of myprogram
is totally unrelated to any previous invocations of myprogram.
If these conditions are met, then you can go ahead and simply
structure the code like this:
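A minimal sketch of that two-step structure follows. The helper
name read_from_command is hypothetical, and the command to run
(myprogram in the text) is passed in as a list. The list form of
the pipe open is an assumption of this sketch; it avoids the shell
and is assumed here to be running on a UNIX system.

```perl
use strict;
use warnings;

# Hypothetical helper: runs @cmd once as a trial and, only if it exits
# cleanly, runs it again and returns its output lines.
sub read_from_command {
    my (@cmd) = @_;

    # First invocation: open the pipe, drain it, and close it.
    open(my $probe, '-|', @cmd) or return;
    my @discard = <$probe>;
    close($probe);
    return if $? != 0;    # after close, $? holds the command's status

    # Second invocation: the command is known to work, so use its output.
    open(my $pipe, '-|', @cmd) or return;
    my @lines = <$pipe>;
    close($pipe);
    return @lines;
}
```

An empty list from read_from_command means either the pipe could
not be opened or the command reported an error.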
Sending Mail to Valid Recipients
One way to use CGI scripts is to send mail to a recipient as part
of a response to a FORM handler.
For example, the following statements send a mail message using
mailx:
$sendTo = 'badguy@bad.code.edu';
#### NO, NO, NO, NOT this way....
open (MAILME,"| mailx -s 'A report' $sendTo");
print MAILME $mailMessage;
#
# some other code here.
#
close MAILME;
When using a pipe to send mail, make sure you specify the absolute
pathname to the mailer program. Also, close the pipe to the mail
handler as soon as possible; otherwise you might forget to do
so later in the code. Therefore, it's generally better to structure
the previous lines like this:
$sendTo = 'myname@ikra.com';
open (MAILME,"| /usr/bin/mailx -s 'A report' $sendTo");
print MAILME $mailMessage;
close MAILME;
#
# some other code here.
#
Of course, you should substitute your system's own mailer, such as
mh or elm, for mailx in the previous example. Obviously, the
underlying mail system has to be up and running for this to work.
Most CGI applications run on UNIX systems that require the first
line of the CGI script to be a "bang" line. For example,
if your CGI script is a Perl script and your Perl interpreter
is in /usr/local/bin, then the first line of the file should be
#!/usr/local/bin/perl. If you don't use this bang line, the system
hands your CGI script to the default shell instead of to Perl, and
the script will not run as intended.
On NT systems, the Perl CGI script does not need the bang line as
the first line in the script, because the bang line is ignored
there. The CGI script should run on the NT machine without any
problems. However, when porting CGI scripts from an NT system
to a UNIX system, make sure you add the #! line as the first line
of every Perl CGI script.
Some Perl scripts use the Autoload.pm
module to dynamically load in extensions at execution time. A
Perl script will not run if the module cannot dynamically load
the extensions. When porting such scripts to different systems,
ensure that the extensions you have to load are available on each
system you port your Perl script to. Some modules may not be available
on the base system you port your CGI script to.
To avoid such problems at load time, you can either port the modules
yourself, not use the module extensions, or statically link the
extensions in. For example, most modules are statically linked
into the NT version of Perl because the autoload
module is not supported under NT. If you are certain that the
modules you are loading are not dynamically linked executables
and that all the functionality you need is in the .pm
file, then you can simply use the use
statement to load the .pm
module file.
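One way to find out ahead of time whether a module is present on a
new system is to try loading it inside an eval. The sketch below
assumes a hypothetical helper called module_available; it is not
part of Perl itself.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded on
# this system and 0 otherwise.
sub module_available {
    my ($module) = @_;

    # Accept only plain package names, because the name goes into eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;

    # require dies if the module is missing; eval traps that failure.
    return eval "require $module; 1" ? 1 : 0;
}
```

A script could call module_available('Socket') early on and return
a readable error page instead of failing at load time.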
Avoid system calls as much as possible when writing CGI scripts
that have to run on different systems. As its name suggests, a
system call is very dependent on the type of system on which it's
being called. Most versions of UNIX support the same system calls
fairly uniformly, so moving between them rarely causes problems.
Other operating systems, however, support different sets of system
calls. A system call that works on a VMS system might not work on
a UNIX system, and vice versa.
A common problem with CGI scripts is that malformed headers are
sent back when a request for data arrives from a browser.
Normally, a MIME header is sent from a server back to a browser.
For example, to send an HTML document back, the header will be of
the form Content-type: text/html\n\n, for a GIF image
Content-type: image/gif\n\n, and so on. A script that has errors
in it or simply does not run will not return this header to the
browser. Also, don't forget to send two newlines at the end of
every header. The server expects a blank line following the MIME
header, so make the header call like this:
print "Content-type: image/gif\n\n";
The \n\n construct may not
work under all conditions, especially those that require an explicit
carriage-return/line-feed pair. In this case you should use the
construct \r\n\r\n instead
of \n\n.
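Both forms can be produced by one small routine. This is only a
sketch; the helper name content_type_header is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a Content-type header followed by the
# blank line that ends the header block. Pass "\r\n" as $eol where an
# explicit carriage-return/line-feed pair is required.
sub content_type_header {
    my ($type, $eol) = @_;
    $eol = "\n" unless defined $eol;
    return "Content-type: $type$eol$eol";
}
```

A script would then start its output with
print content_type_header('text/html'); or, where needed,
print content_type_header('text/html', "\r\n");.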
It's important to flush the output buffers used by CGI scripts
immediately. The underlying operating system may keep output written
to a file handle such as STDOUT
for some time. This time may be longer than a browser expects
to spend while waiting for a response. The simplest way to do
this is to select the output file handle and then set the $|
variable to 1.
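A minimal sketch of that idiom, which also puts back whichever
handle was selected before:

```perl
use strict;
use warnings;

# Make STDOUT unbuffered, then restore the previously selected handle.
my $previous = select(STDOUT);
$| = 1;
select($previous);
```

The $| setting belongs to the handle that is selected at the time,
which is why STDOUT is selected first.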
A CGI script is the child process of the Web server running on
a system. Being a child process, it cannot set its environment
variables for a period longer than its own execution time. That
is, any environment variables set using statements like the following
will only set the value of the environment variable GEEPERS
for the script while it's executing:
$ENV{'GEEPERS'} = "creepers";
The value of GEEPERS is not available to the parent (server)
process, which invoked this script in the first place. In fact,
the next time the same CGI script is run, the value of the
environment variable GEEPERS will be the value set in the server,
not one set previously by a client.
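The effect is easy to demonstrate outside a Web server. In the
sketch below, which is only an illustration, the parent process
stands in for the server and a second Perl process stands in for
the CGI script; the value 'server value' is made up.

```perl
use strict;
use warnings;

# The parent stands in for the Web server and sets its own value.
$ENV{'GEEPERS'} = 'server value';

# The child stands in for the CGI script: it changes GEEPERS and exits.
# $^X is the path of the Perl interpreter running this script.
system($^X, '-e', '$ENV{"GEEPERS"} = "creepers"; exit 0') == 0
    or die "child failed: $?";

# Back in the parent, the child's change is gone.
print "GEEPERS in parent: $ENV{'GEEPERS'}\n";
```

The parent still reports its own value, because the child changed
only its private copy of the environment.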
A possible way to track information between successive runs of
a CGI script is to use an HTML FORM object to store variables.
HTML FORM handling is covered in detail in Chapter 20.
Basically, what you can do is store intermediate values in an
invisible input field (an INPUT field of type HIDDEN rather than a
visible TEXT box). Successive calls to the CGI script update the
value stored in the field. Alternatively, you can save
intermediate results to disk, at the cost of chewing up disk space.
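A sketch of the form-based approach follows. The helper names
escape_html and state_form, and the field and script names passed
to them, are hypothetical. The value is escaped so that quotes or
angle brackets in it cannot break the generated HTML.

```perl
use strict;
use warnings;

# Escape characters that are special inside an HTML attribute value.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: returns a form that carries $value to the next
# run of the script at $action in a hidden field called $name.
sub state_form {
    my ($action, $name, $value) = @_;
    return join "\n",
        '<FORM METHOD="POST" ACTION="' . escape_html($action) . '">',
        '<INPUT TYPE="HIDDEN" NAME="' . escape_html($name)
            . '" VALUE="' . escape_html($value) . '">',
        '<INPUT TYPE="SUBMIT" VALUE="Continue">',
        '</FORM>', '';
}
```

Because the value travels through the browser, the user can see
and change it, so the script should validate it on every run.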
There are occasions when CGI scripts use temporary files to store
information. Don't forget to delete these files after your script
is done. After some time, such temporary files can accumulate
and use up valuable disk space. It's a good idea to exit from
one point in the code by calling a subroutine and to remove all
temporary files in that subroutine before exiting.
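The single-exit-point idea can be sketched as follows. All three
helper names are hypothetical, and the "cgi" plus process ID naming
scheme is just one possible convention.

```perl
use strict;
use warnings;
use File::Spec;

my @temp_files;    # every temporary file this process has created

# Hypothetical helper: creates a temporary file named after the process
# ID and remembers it for later removal.
sub make_temp_file {
    my ($suffix) = @_;
    my $name = sprintf("cgi%d.%s", $$, $suffix);
    my $path = File::Spec->catfile(File::Spec->tmpdir, $name);
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    push @temp_files, $path;
    return ($fh, $path);
}

# Remove everything this process created, and nothing else.
sub remove_temp_files {
    unlink @temp_files;
    @temp_files = ();
}

# The single exit point: clean up, then exit with the given status.
sub clean_exit {
    my ($status) = @_;
    remove_temp_files();
    exit($status);
}
```

Every path through the script then ends in clean_exit(0) or
clean_exit(1) rather than a bare exit. Where it is available, the
File::Temp module is a sturdier way to choose the file names.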
Keeping temporary files on a server also poses the problem of
synchronizing the temporary file with the process that created
it. Normally, the name of the temporary file is derived from the
process ID of the creating application. This, in turn, means that
only the process that created the file knows the filename and
when to delete the file. Even if a common prefix, such as CGI,
is used for all temporary filenames, processes within the same
process group should not arbitrarily delete all temporary files
beginning with CGI. For one
thing, other CGI applications might be using the temporary files
when another process deletes them. Also, there might be other
unrelated processes using CGI
as the prefix for their filenames.
Another common problem with CGI scripts is that beginning Webmasters
forget to make the path to these scripts visible to the Web server.
Most servers look in the cgi-bin
subdirectory as the top of the path for a CGI script to execute.
If the named file in the path does not follow the rules for the
server you happen to be running, the server will pick up the script
and ship it back to the browser as a text file. In most cases,
this is simply an annoyance. In some cases, looking at your CGI
script may give away valuable directory information to the end
user at the browser.
To avoid such problems, you should edit the configuration files
for the server you are running. For the NCSA server, this entails
editing the srm.conf file in the conf subdirectory of the
directory where you installed the server distribution. The
ScriptAlias directive in the srm.conf file controls which
directories contain server scripts. The format for the ScriptAlias
directive in the srm.conf file is
ScriptAlias fakename realname
For example, the following setting will make the /home/webserver/httpd/cgi-bin/
directory look like the /cgi-bin
directory to the Web server:
ScriptAlias /cgi-bin/ /home/webserver/httpd/cgi-bin/
Also, if you want to execute files at locations other than those
specified in the ScriptAlias path, you can specify which file
extensions are allowed with the AddType directive. For example,
the following directive allows executable scripts ending in .cgi
to be executed (a similar line would be needed for .pl):
AddType application/x-httpd-cgi .cgi
In general, use absolute pathnames for all the files your CGI
script accesses. Specifying a relative pathname causes all
searches using that pathname to start from the directory named by
DocumentRoot. The DocumentRoot directive in the srm.conf file sets
the base directory from which the server searches for files.
The benefit of using an external base directory is that an entire
directory tree can be moved by simply moving the root of that
tree. This spares you the agony of resetting every pathname when
the location of the scripts changes. The downside is that it makes
your movable directory susceptible to hackers, who can exploit the
relative pathnames to substitute their own directory tree and make
your documents point to their versions of your files.
Finally, the configuration file access.conf has a FollowSymLinks
option. If this option is enabled, the server follows symbolic
links when it's resolving pathnames to find a document. If your
CGI script is accessing a file via a symbolic link, the script
will not work unless this option is set to allow links to be
followed. Unfortunately, enabling it opens up a security hole big
enough to drive a virtual bus through.
If someone symbolically links a document to the /bin
or /sbin directory on your
system, he or she has free run of the system.
Warning: Never put perl.exe in the httpd directories in the heat of debugging. It's a major mistake that will let users at the browser run anything on your system! Don't even symbolically link to an executable program such as perl, sh, or something similar that a user could run off the command line.
Almost all servers append /index.html
or index.html to a given URL that references a directory. Therefore,
the following URLs both become http://www.ikra.com/index.html:
http://www.ikra.com/
http://www.ikra.com
Guess what happens when there is no index.html file in the
directory being referenced? The server returns an FTP-style
listing of all files in the directory! This type of exposure of
your directory subtree to the world might not be what you want.
CGI scripts can often return HTML pages as responses. One of the
first things you should do is to check all URLs generated in these
scripts that refer to your server. Make sure that there is an
index.html file in all the
directories that a URL generated at your server can refer to.
It's a good idea for all URLs that you generate to be absolute
pathnames instead of relative pathnames.
One very important directory to place an index.html
file in is the logs subdirectory in the httpd
tree. Not placing an index.html
file in the logs subdirectory will expose all your Web server
logs.
This chapter is a synopsis of some of the problems you can run
into when coding CGI scripts using Perl. I cannot possibly enumerate
all the problems you might run into when debugging CGI applications;
however, this checklist will help you in debugging some common
problems:
- Make sure your CGI script has execute permissions.
- Make sure that any data files used by the CGI script are
  readable by user "nobody."
- The CGI script should compile and run correctly. Do use the -w
  switch on your CGI scripts, especially when testing, to make sure
  that your Perl script does not have any embarrassing bugs.
- On UNIX systems, make sure you have the Perl interpreter line
  (#!/usr/local/bin/perl, #!/usr/bin/perl, whatever), and on NT
  systems, take this line out. If you forget to place this line at
  the front of your Perl script in UNIX, a browser will get a 500
  Server Error.
- Libraries for dynamically loaded modules must exist in the @INC
  path of the Perl script.
- Any system calls in the CGI script must be supported in the
  underlying system.
- Return valid MIME headers in all cases from a CGI script.
- Flush output buffers immediately by setting $| to 1.
- Don't rely on environment variables being set on successive runs
  of the same CGI script.
- Clean up any temporary files created by CGI scripts.
- Make the CGI directory visible by configuring ScriptAlias in the
  configuration file.



