© 1997 The McGraw-Hill Companies, Inc. All rights reserved. Any use of this Beta Book is subject to the rules stated in the Terms of Use. |
Perl is not a minimalist language when it comes to built-in functions. It is difficult to get far in Perl without using built-in functions and special variables.
Just in talking about Perl, we have already covered several of these variables and functions in detail. There are many, many more; so many, in fact, that people get lost deciding which ones to concentrate on.
This chapter is designed as a solution to that problem: it points out the most important variables and functions and demonstrates their most common usage. For a complete reference, see the perlrun, perlfunc, and perlvar reference pages.
Since we don't want to duplicate the effort of the online manual, we shall confine ourselves to going over the most common Perl functions and variables that we did not touch upon in chapter 3.
We shall divide the functions available in the Perl core into groups, staying away (if at all possible) from functions that are specific to one operating system, namely Unix; these groups form the basic structure of the chapter.
The basic 'functional groups' we shall discuss are:
specialized quoting and formatting
functions for IO on formatted data
functions for variable manipulation
time manipulation functions
debugging functions
functions to get information about files
process forking and management
functions emulating operating system commands
In each one of these sections, the emphasis will be on giving examples, and filling in the holes that I believe the on-line documentation has. (However note that the perl5 development group is a busy bunch; these holes may already be filled by the time you read this book.)
After going through these functions, we turn to the internal variables that perl provides. These internal variables make for lots of shortcuts in your actual programs; and we discuss the most common ones:
internal tokens for use inside Perl code
internal filehandles inside Perl
named internal variables
one-character variables that affect the way an internal Perl function is executed
one-character variables that are set as a side effect of other internal Perl functions
Even though this looks fairly extensive, please don't use this chapter as a reference! You will want to know the functions and variables described in this chapter pretty well, but you are doing yourself a disservice if you don't print out the relevant documentation.
If there were a phrase to describe Perl, I would say that Perl is "simple in the design, complicated in the details".
You have seen that this is true in previous chapters, especially with regular expressions. Eight simple principles form the core of regular expressions. But since Perl evolved to handle real-world problems rather than being designed from scratch, the engine grew to be very intricate, with a lot of functionality behind each symbol.
The same is true with the story of how Perl has evolved its built-in functions and special variables. The built-in functions each fill a pragmatic gap. And, just to be on the safe side, some gaps have been filled more than once. We have already seen this before, in the form of shift and splice. The following two statements are equivalent:
my $return = shift(@arrayName);
my $return = splice(@arrayName,0,1);
Both take the first element off the beginning of @arrayName and stuff it into the variable $return. Or, if you want to count the number of characters in a string:
my $nochars = length($string);
my $nochars = @{[(split(//, $string))]};
my $nochars = $string =~ tr[\000-\377][\000-\377];
(although I dare say that the first usage is a little bit less noisy and a lot easier to read than the others.) The official Perl motto may be 'There's more than one way to do it', but I would be sorely tempted to modify this to be 'There's more than one way to do it, but some are better than others!'.
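If you want to convince yourself that all three character-counting statements agree, here is a minimal sketch (the sample string is mine, invented for illustration):

```perl
use strict;
use warnings;

my $string = "hello world";

my $count1 = length($string);                        # the idiomatic way
my $count2 = @{[ split(//, $string) ]};              # count the list of characters
my $count3 = ($string =~ tr[\000-\377][\000-\377]);  # tr returns a count

print "$count1 $count2 $count3\n";   # 11 11 11
```

All three produce the same number; only the first says so plainly.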
Anyway, this chapter is designed to exhibit some of the more powerful built-in functions and special variables that are built in to Perl (where a function is UNIX specific, we give its NT counterpart). These functions are so much a part of the Perl programming experience that you should learn them and know them quite well.
There are a few simple rules that you need to know about the default behavior of internal functions and operators. Then, if you run up against a real life problem that does not exactly match this default behavior, chances are that there is either a function, or a special variable that will help you resolve it.
For example:
@arrayName = ("This", "is", "an", "array");
$stringName = "@arrayName\n"; # $stringName becomes
print $stringName; # 'This is an array'
prints out "This is an array" by default, as you might remember from the section on interpolation. But what happens if you want "This|is|an|array" instead, perhaps to use in a regular expression, or to load into a database? Perl provides the variable $" (or $LIST_SEPARATOR) which determines WHAT to put between the elements of the array (in string context). Here, setting:
$" = "|"; # or: use English; $LIST_SEPARATOR = "|";
# see below
print "@arrayName\n"; # prints 'This|is|an|array'.
does what you want, which is return a character string with "|" delimiters.
There are thousands of these little tricks in Perl. Likewise, there are classes of tricks:
tricks with functions
tricks with special variables
tricks with the order of execution.
These tricks flow from the fact that Perl is such an expressive language. It is full of computer metaphors, similes, and other such constructions. Learning these tricks is one of the biggest steps you will take in becoming a strong Perl programmer; in fact, it is the biggest step, aside from learning object-oriented Perl.
It is pretty much out of the scope of this book to give all of these small time savers - the first appendix of this book is a start - but the best way to learn them is to:
learn the principles we discussed these last few chapters (contexts, variables and references)
discover them on your own by combining these principles with what we will cover in this chapter
look at the Perl source code for ideas. Thousands of lines are there at your beck and call.
talk to other Perl programmers ('comp.lang.perl.misc' comes to mind)
just plain experiment
Built-in Perl functions are routines that are already recognized by the Perl interpreter. You get them for free, that is, you need no libraries in order to be able to access them. They are "built-in" to the language.
As you might expect, the main principle behind built-in functions is pragmatism. Who wants to have to remember two or more functions which basically do the same thing? Since ease of use was the supreme goal behind Perl's design, Perl's functions have evolved to be as painless as possible. Perl functions are smart: they know what type of variable has been passed to them, and they know how to treat it.
For example:
$line = "Please chop the letter T";
chop($line); # $line becomes "Please chop the letter "
works to chop the last character off of $line, whereas
@listOfValues = ("Chop", "Every", "Letter", "Off");
chop(@listOfValues); # @listOfValues becomes ("Cho", "Ever", "Lette","Of")
works to chop each character off of @listOfValues.
In other words, chop works the way you would expect it to work, chopping one character off each scalar, and one character off each element in an array. chop performs a simple function, but does it 'blind to datatype'.
What happens if you say:
%hash = ('key' => 'value', 'key2' => 'value2');
chop(%hash);
Well, pretty much the same thing. After this is done, the hash becomes:
%hash = ('key' => 'valu', 'key2' => 'value');
by chopping off each letter in the values, but leaving the keys alone! This is a somewhat startling, but entirely logical, way to program. If you chop a character off the key as well, you may end up with a hash that has empty keys, or a key with more than one value. The hash would no longer be viable. For example, suppose that you had the following hash:
%hash = ('a' => 'bb', 'c' => 'dd');
and when you called
chop(%hash);
it went ahead and chopped off both the value and the key. Then you would end up with something that looked like:
%hash = ('' => 'b', '' => 'd');
This is not viable. What happens if you tried to look up the following hash entry:
$hash{''};
What would Perl return? 'b' or 'd'? Therefore it was crucial that chop affect only the values, where it cannot possibly destroy the integrity of the hash.
This whole evolution of usefulness is due to a lot of thought (and trial and error). Sweat and tears went into programming these built-in functions. You would be smart to learn them not only for their power, but for their ability to teach good programming style. Let's now turn to look at some of the more important ones. (A comprehensive list, along with portability concerns, exists in appendix 1; you can also go to the perlfunc manpage for more detail.)
Unless you are doing specialty programming, (sockets, UNIX system administration, etc.) the functions below (combined with the functions you have already learned) are the ones you will be using 95% of the time.
Perl's built-in functions can be grouped into different functional groups, and below are the most useful functions, which are covered in this chapter. I supply a Win32 function where no internal Perl function works correctly on Win32 (this is especially the case for process management):
Functions for specialized quoting and formatting: quotemeta(), qq(), qw(), q{}, qx{}, HERE documents
Functions for IO to or on formatted data: read(), printf(), sprintf(). (For information on open, close, and print, turn to chapter 3 where we go over filehandles in 'Perl variables'.)
Functions for variable manipulation: sort(), split(), grep(), and map(). (For many others, turn to the section 'Perl variables'.)
Time manipulation functions: localtime(), timelocal(), time(), times(), sleep()
Debugging functions: caller(), warn(), die(). (We shall touch on these again in the chapter 'programming for debugging'.)
Functions to get information about files: file tests, glob()
Process forking and management: system(), ``, fork(), exec(), wait(), Win32::Process()
Functions for emulating operating system commands: rmdir(), mkdir(), chdir()
As a quick note, Perl has a couple of cool 'pseudo' functions which will help you keep your sanity, especially if you are writing code to generate more code, and in dealing with regular expressions.
These functions are quotemeta() and the specialized quote operators (qw(), qx(), qq(), q{}). We briefly look at them below.
Suppose that you had the following text that you wanted to search for inside a lot of files:
( aleph + beta ) ** 2;
and suppose you knew that this text could be matched verbatim. If you now did the straightforward thing:
1 $text = '( aleph + beta ) ** 2';
2 foreach $file (@filelist)
3 {
4 open (FD, $file);
5 $line = <FD>;
6 print "Match!\n" if ($line =~ m"$text");
7 }
then you would come to grief. Why? Because when you put the text '( aleph + beta ) ** 2' into the regular expression (line 6), the regular expression engine treats the '(', '+', ')', and '*' characters as special characters. Hence, the text $text will not match the string $line in line 6.
quotemeta() changes all that. When you say:
$text = quotemeta('( aleph + beta ) ** 2');
it makes the resulting variable $text regular-expression friendly, backslashing every character that is not a letter, digit, or underscore (spaces included):
\(\ aleph\ \+\ beta\ \)\ \*\*\ 2
which is a much easier way to build this type of regular expression.
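To see the difference quotemeta() makes, here is a small self-contained sketch (the $line string is an invented example; the raw pattern is wrapped in eval because its '**' is actually a regular expression syntax error):

```perl
use strict;
use warnings;

my $raw  = '( aleph + beta ) ** 2';
my $line = 'answer = ( aleph + beta ) ** 2;';

# Raw text: '(', '+', ')' and '*' are regexp metacharacters; here the
# '**' even makes the pattern die at compile time, so we trap it.
my $matched_raw = eval { $line =~ m"$raw" } ? 1 : 0;

# quotemeta backslashes every non-word character, making it literal.
my $safe = quotemeta($raw);
my $matched_safe = ($line =~ m"$safe") ? 1 : 0;

print "raw: $matched_raw safe: $matched_safe\n";   # raw: 0 safe: 1
```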
These four 'functions' are actually more readable forms of quotes, which are especially handy when writing Perl code to generate Perl code.
For example, suppose you wanted to generate the following code:
\$line = "\$text2$text";
in which you wanted to keep $line and $text2 as variables, and interpolate $text into whatever it is on the command line.
Well, if you did the usual:
$code = "\$line = \"\$text2$text\"";
you will, before long, get a bad case of "backslashitis". With qq(), you can say:
$code = qq(\$line = "\$text2$text");
which does exactly the same thing. 'qq()' here is just a synonym for "". Here's a table of all their meanings, plus an example.
Table 10.1 Specialized quotes
Symbol Meaning Example
qq() double quotes $text = 1; $text2 = qq($text) => $text2 = 1
q{} single quotes $text = 1; $text2 = q{$text} => $text2 = '$text'
qw() word list @array = qw(this is) => @array = ('this', 'is');
qx() execute (``) $text = qx(dir) => $text = `dir`;
Each of these has the benefit that you don't need to backslash the respective quotes.
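Each row of the table can be checked directly; a quick sketch (qx() is left out here, since its output depends on your shell):

```perl
use strict;
use warnings;

my $text = 1;

my $double = qq($text plus interpolation);  # behaves like ""
my $single = q{$text stays literal};        # behaves like ''
my @array  = qw(this is a word list);       # a list of quoted words

print "$double\n";  # 1 plus interpolation
print "$single\n";  # $text stays literal
print "@array\n";   # this is a word list
```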
Perl goes farther than this in helping you print out things such as code or data. Suppose you have several lines (tens to hundreds) that you want to print out pretty much verbatim, but with one change: you want to interpolate any variables that you find. Let's suppose that there are lots of double quotes in the text (say, you are generating C code). If you said:
$line =
"
char $variable\[$length\] = \"$value\";
...
";
then you've got two strikes against you for readability. First, you have extra newlines, since the returns are interpreted as returns. Second, you need to backslash any double quotes you see.
Enter the HERE document. HERE documents are ways for you to make literal text in Perl, and assign it to a variable or function. The above becomes:
$line =<<"EOL";
char $variable\[$length\] = "$value";
EOL
EOL is an arbitrary tag which tells Perl what to look for to end interpreting the text as text, and start looking at the rest as Perl code.
This is basically saying 'take all the text between "EOL" and the exact characters EOL and turn it into a Perl string.' Since we said "EOL" (with double quotes), interpolation happens. Perl also ignores the beginning and ending returns, so you get one line of text. Note that the terminating tag is the bare characters EOL on a line by themselves; writing 'EOL;' there would be a syntax error.
Perl allows three types of HERE documents 'EOL' (single quotes/no interpolation), "EOL" (double quotes plus interpolation) and `EOL` (backticks for shell calls plus interpolation). The third type is quite intriguing. It basically lets you embed a shell script inside Perl, returning the text to a Perl variable:
$line =<<`ENDOFSCRIPT`;
cd $dir1
dir /s
type filename
ENDOFSCRIPT
Or, more helpfully:
my $fd = new FileHandle("scriptname"); my @script = <$fd>; close($fd);
$line =<<`ENDOFSCRIPT`;
@script
ENDOFSCRIPT
which will execute any script (UNIX or Windows NT) and then pipe the results to the variable $line.
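The first two HERE document flavors are easy to compare side by side (the backtick flavor is left out here because it runs a shell):

```perl
use strict;
use warnings;

my $value = 42;

# Double-quoted tag: interpolation happens.
my $interpolated = <<"EOL";
value is $value
EOL

# Single-quoted tag: the text is taken literally.
my $literal = <<'EOL';
value is $value
EOL

print $interpolated;   # value is 42
print $literal;        # value is $value
```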
Reading and Writing Operations to files and variables
You actually have seen the three functions that are the most important for filehandle manipulation: open, print, and close. You can get a long way knowing these three functions. The functions that follow are really the icing on the cake. These functions, read, printf, and sprintf, aren't nearly as important as open, print, and close, but can be extremely helpful in dealing with special cases.
read, printf, and sprintf are good to know, especially when you hit data which can't be parsed easily by slurping it into a variable with '$line = <FD>' (that is where read comes in). They also help when you want selective formatting while writing to a variable or filehandle (printf, sprintf). Read on for more information on these built-in functions.
read
read is Perl's major way to read in information, in bulk, from a filehandle. Its usage is as follows:
$bytes = read(FILEHANDLE, $scalar, $length, $offset);
where $scalar is the variable you want to populate with the data from FILEHANDLE, $length is the amount of data you want to read in, and $offset (optional) is at what point you want to write the data into the scalar. $bytes is the number of characters read. If there are no bytes to be read (i.e.: the file is at an EOF) this returns 0. This makes it nice to use in a while loop (as in 'while (read(...))'). For example:
open(FILEHANDLE, "file");
$bytes = read(FILEHANDLE, $input, 40, 20);
would read up to 40 characters from the open filehandle FILEHANDLE into the scalar $input, starting at the position 20. Something like:
$input = "<twenty nulls><read from filehandle>";
in which the string $input is padded with 20 nulls, and then (and only then) are the forty characters inserted. Here, for example, is a simple program to make 'block text' (text with no newlines) a little more user friendly. It chops the data up into 80-character segments, putting a newline after each. (Use \r\n instead if you are using DOS or NT.)
while (read(FILEHANDLE, $input, 80))
{
print "$input\n";
}
although I wouldn't suggest using this script on text that is more conversational in nature (i.e.: lots of words with newlines on the end.) In that case, the read statement has no way of knowing whether one of the characters it has read is a newline or not. Hence, you could get lines with one word and only one word on them!
read really should only be used when you know exactly what size blocks that you need to slurp in from a filehandle, and there is no real delimiter to the file. If there is a delimiter, such as a newline, or you want to read the entire file at once, use $line = <FILEHANDLE> instead. This will save you tons of trouble. Too many times have I thought that I knew something was of 'fixed length' and used read, only to find out that the file in question was not always fixed length, but off by a character or two. This caused strange errors when reading it.
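As a sketch of the fixed-length case where read() earns its keep, here is a tiny file of 16-byte records: a 10-character name plus a 6-character amount. (The record layout and file name are invented for illustration.)

```perl
use strict;
use warnings;

my $file = "records.tmp";

# Write three 16-byte records, with no delimiter between them.
open(FD, "> $file") or die "can't write $file: $!";
print FD sprintf("%-10.10s%6.6s", "alpha", 100),
         sprintf("%-10.10s%6.6s", "beta",  200),
         sprintf("%-10.10s%6.6s", "gamma", 300);
close(FD);

# Each read() pulls back exactly one record.
open(FD, $file) or die "can't read $file: $!";
my @names;
while (read(FD, my $record, 16))
{
    my $name = substr($record, 0, 10);
    $name =~ s" +$"";                    # trim the space padding
    push(@names, $name);
    print "$name => ", substr($record, 10, 6), "\n";
}
close(FD);
unlink($file);

print "@names\n";   # alpha beta gamma
```

Because every record is exactly 16 bytes, read() never needs a delimiter; this is precisely the situation the warning above is about.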
printf
printf is 'sort of' the opposite of read. Instead of reading in fixed amounts of data, printf gives you the option to output fixed amounts of data. Its usage is:
printf [FILEHANDLE] $format_string, @arrayOfValues;
For example:
use FileHandle;
my $FH = new FileHandle("> my_file");
$longvarb = "This has 23 chars in it";
printf $FH "%10.10s\n", $longvarb;
prints out 'This has 2' to the file handle $FH. printf has chopped the scalar to fit into 10 characters. The printf statement should be read as "print 10 and only 10 characters of the variable $longvarb to the file handle $FH." C programmers should be fairly familiar with this. printf does something similar to what we see in Figure 10.1:
figure 10.1 (line art)
Figure 10.1
reading printf
printf uses the percent sign (%) followed by a combination of numbers or letters which give it instructions on how to print out whatever follows in the argument list. To use printf correctly, you need to have a 'secret decoder ring' that tells you what each one of these % signs means. Fortunately for C programmers, this is the same as the one in C; the more important elements are given in Table 10.2, which shows some popular combinations for printf.
Table 10.2 Common printf formats
c -- one character. %7c will pad out one character with six spaces.
d -- number (integer) in decimal. %7d will pad out the decimal to 7 places (right justified).
f -- float. %7.2f will be of precision 2, padded out to 7 places (right justified).
g -- double. %-7.2g will be of precision 2, padded out to 7 places (left justified).
ld -- long decimal. %7ld will do the same as %7d, except work with larger decimals.
o -- octal. %3o will right justify octals of less than 3 digits with spaces.
s -- string. %16.11s will take the first 11 chars of a string, padded to 16 with spaces.
u -- unsigned number. %-7u will pad out, left justified, any unsigned number.
x -- a hex number. %6x will pad out, right justified, hexadecimal numbers.
Each one of these formats takes two optional numbers (like %10.10s). If you said:
printf "%s\n", "This is a string";
This is equivalent to the statement:
print "This is a string\n";
However, if you say:
printf ":%4.3s:", "This is a string";
this will print out
': Thi:'
because the '%4.3s' says to Perl "print out space for 4 characters, yet only print 3 characters in the string itself." It therefore truncates your string to the first 3 characters, and at the same time prints four characters (space padded.)
printf("%5.2f", 1.333333333);
This would print " 1.33" to standard output by default, because the '2' in 5.2 indicates how much precision is going to be printed in that number (and the 5 pads the result out to five characters). You can use anything to match up with the percents, even user functions. Hence:
printf("hey %s\n", userFunction());
will stuff the results from userFunction() into the argument to printf, %s.
Perhaps the best way of showing the use of printf is to give examples. Kernighan and Ritchie's book on C also has some pretty cool ones:
printf(":%-15.10s:%7d:", "String incarnate", 222);
:String inc     :    222:
Again, the '-' means to be left justified, and 15.10 means 'take at most 10 characters, and stuff them into a space for 15.'.
printf(":%15s:%7d:\n", "String incarnate", 222);
:String incarnate:    222:
Since a second number wasn't specified here, the string is not chopped off, and the full string's length is given, even though it is more than 15 characters long.
printf(":%15s:%-7u:\n", "String", -222);
:         String:4294967074:
Here, you need to be careful of conversions. Since %u indicates an unsigned number, a negative number behaves unexpectedly.
printf(":%8x:%8o:\n", 123123,123123);
:   1e0f3:  360363:
This converts 123123 into hexadecimal(%x) and into octal (%o).
printf(":%d:\n", 13131313111);
:-1:
Here is another case in which you need to be careful of conversions. '13131313111' is too big an integer for %d (or %ld) to handle (use %s instead).
$this = 13131313111;
printf("$this :%ld:\n", 13131313111);
13131313111 :2147483647:
Interpolation works, too. The argument to %ld shows that it too can overflow (and the largest integer possible for %ld is 2147483647).
printf is the most flexible way to print in Perl. We shall see another cool way to do reports next chapter, by using formats. But simply remember that you can do anything with printf that you can do with formats, plus more.
sprintf
sprintf is simply the analogue to printf which prints to variables rather than to files. The usage of sprintf is:
$variable = sprintf($string, @arrayOfValues);
Hence, $variable gets whatever the format string $string and @arrayOfValues evaluate to. For example, the following:
my $hex = sprintf("%x", $decimal);
converts a decimal value to hex. To see more examples on this usage, see printf above.
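A few of the printf conversions above, replayed through sprintf so the results land in variables (the sample values are arbitrary):

```perl
use strict;
use warnings;

my $hex   = sprintf("%x", 255);                      # 'ff'
my $oct   = sprintf("%o", 8);                        # '10'
my $fixed = sprintf("%5.2f", 1.333333333);           # ' 1.33'
my $trunc = sprintf(":%4.3s:", "This is a string");  # ': Thi:'

print "$hex $oct $fixed $trunc\n";
```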
Summary of Reading and Writing Operations to File Handles
The main three functions for File Handle reads are open, print, and close. Covered in this section are some more specialized read and write functions:
1) read, which reads fixed-size chunks of data into memory.
2) printf, which prints chunks of data to a file handle under the control of a special formatting string.
3) sprintf, which acts like printf, but fills a string rather than writing to a file handle.
Operations on variables.
Perl provides a myriad of ways of operating with variables. We have already seen some of these functions in detail, as in delete, keys, each, push, pop, shift, unshift, and splice. Others which will greatly empower you when programming with Perl are sort, split, grep, and map. These functions can take common, 20-line subroutines, and turn them into one liners. Therefore, we shall cover them in great detail.
As with any Perl component, these functions can be either used or abused. For example, you probably don't want to do something such as:
grep(s"pattern1"pattern2", @list);
because it is slower than:
foreach (@list) { s"pattern1"pattern2"; }
Both will replace pattern1 with pattern2 inside the array @list, but the grep also builds a return list that is then thrown away, which takes time and memory. Hence they are functionally equivalent, but not equivalent in efficiency.
sort
The function sort allows you to sort an array using any sort function you feel is appropriate. sort's syntax can get quite involved, but below are some of the more common usages:
sort (@arrayName);
sort { CODEBLOCK } @arrayName;
sort sort_function @arrayName;
The default usage of sort is to sort in alphanumeric order. Start with an array @arrayName as follows:
@arrayName = ('Apples','Bananas','Carrots');
To sort alphanumerically:
@arrayName = sort (@arrayName);
This syntax sorts the values in @arrayName in alphanumeric order. Hence, something such as:
@arrayName = (1,10,9);
@arrayName = sort(@arrayName);
is probably not going to work in the way you want, unless you want the order 1,10,9. Note that the above is not an in-place sort. The sorted values get put into their own list, to be returned to the main program as necessary.
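Here is that pitfall made concrete, together with a peek at the numeric comparison that fixes it:

```perl
use strict;
use warnings;

my @arrayName = (1, 10, 9);

my @alpha   = sort(@arrayName);            # compares as strings: '10' lt '9'
my @numeric = sort { $a <=> $b } @arrayName;

print "@alpha\n";      # 1 10 9
print "@numeric\n";    # 1 9 10
print "@arrayName\n";  # 1 10 9 -- the original array is untouched
```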
Another usage of sort is with the passing of a function to the sort. Since the previous example does not return the numeric order (1, 9, 10), and since people have a myriad of ways in which they may want to sort their data, Perl provides the ability to hand a subroutine to the sort.
If you define a subroutine that goes to a sort, do not use the regular:
my ($variable) = @_;
method. Instead, sort uses the special variables $a and $b (denoting the first and second element of each comparison) to pass values to the sorting subroutine. The result of the comparison between these two values determines the order.
Consider the following blocks of code:
@arrayName = (1,10,9);
@arrayName = sort numerically @arrayName;
print @arrayName;
sub numerically
{
$a <=> $b; # '$a' here means the first element
} # '$b' here means the second element
# does a numeric comparison, will come out
# in numeric order.
In this example, Perl takes all of the elements in the array, and plugs them in to the comparison function numerically in turn (as opposed to alphanumerically). Internally, the comparisons might look like:
($a = 1) <=> ($b = 10) # a becomes 1, b becomes 10. Do not switch
($a = 1) <=> ($b = 9) # a becomes 1, b becomes 9... Do not switch.
($a = 10) <=> ($b = 9) # a becomes 10, b becomes 9. DO switch.
and so on. (If you want to watch this order yourself, you can put a print statement in the sort routine.)
A shorthand method is to give a code block right after the sort keyword, as in:
@arrayName = sort { $a <=> $b } @arrayName;
which does the same thing, but avoids writing a separate subroutine.
******Begin Side note******
You can also get really fancy with sorts. See the perlfunc manpage for more details. In essence, you can do something such as:
@arrayName =
@arrayName
[
sort
{
$arrayName[$a] <=> $arrayName[$b];
}
(0..$#arrayName)
];
This is a little wacky. It does the same thing as:
@arrayName = sort {$a <=> $b } @arrayName;
but it is more efficient since it sorts the INDEX of the array, rather than the array itself. So, if the comparison function is pretty nasty, or the elements of the array are huge, you can avoid a lot of extra comparison time or copying time. Again, see sort in the perlfunc manpage for more details.
*********end side note*******
split
split is the function which takes scalars and breaks them apart into arrays. split works by looking for a delimiter, given as a string or a regular expression, and dividing the scalar at each place the delimiter matches.
Usages of split:
@arrayName = split($text, $scalarName);
@arrayName = split(m"regexp", $scalarName);
@arrayName = split(m"regexp", $scalarName, $NUM_ELEMENTS);
The following example splits $scalarName into the array ('Comma', 'separated', 'elements'). In other words, split separates $scalarName at the commas:
$scalarName = "Comma,separated,elements";
@splitArray = split(",", $scalarName); # @splitArray becomes
# ('Comma', 'separated', 'elements')
If you had:
$scalarName = ',has,,,Extra Commas Here';
@splitArray = split(",", $scalarName); # @splitArray =
# ('', 'has', '', '', 'Extra Commas Here');
then notice that you get an empty element for the gap between the beginning of the string and the first ','. You also get extra empty elements for the null fields between two adjoining commas (,,). To avoid the second behavior, you can use a regular expression in the first argument of split:
@splitArray = split(m",+", $scalarName);
Since the regular expression is now matching on several commas in a row, they all get combined into one comma as far as split is concerned. This produces:
('', 'has', 'Extra Commas Here')
If you want to get rid of the empty first element, you are going to have to do it with a shift function:
shift(@splitArray) if ($splitArray[0] eq '');
(The leading field is empty but defined, so test against the empty string.) A trailing ',' is less of a worry: split discards trailing empty fields by default, so you will only see one if you pass a negative LIMIT as the third argument. If you do, get rid of it with a pop:
pop(@splitArray) if ($splitArray[-1] eq '');
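Putting the comma-splitting cleanup together as one runnable sketch (the sample string mirrors the one above):

```perl
use strict;
use warnings;

my $scalarName = ',has,,,Extra Commas Here';

# Collapse runs of commas into a single delimiter...
my @splitArray = split(m",+", $scalarName);
# @splitArray is now ('', 'has', 'Extra Commas Here')

# ...then drop the leading empty field, if any.
shift(@splitArray) if (@splitArray && $splitArray[0] eq '');

print "@splitArray\n";   # has Extra Commas Here
```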
There are three behaviors of split that you should be aware of. First, if you use any parentheses inside the regular expression (for backreferences), the text those parentheses capture will turn up as fields in your split, as in:
$line = 'PIPE|DELIMITED|FIELD';
@arrayName = split(m"( \| )"x, $line); # Note... modifiers to regular
# expressions (x) work here.
In this example, @arrayName becomes:
('PIPE','|','DELIMITED','|', 'FIELD')
Second, the construction:
$line = " This is a special case ";
@arrayName = split(' ', $line);
makes @arrayName ('This', 'is', 'a', 'special','case'). In other words, split gets rid of all the trailing and leading spaces for you and just gives the text in separated array elements.
Third, if you put a number in the third argument of split, it makes split create an array only as big as that number. In:
$line = "MULTI:LINE:COLON:DELIMITED:FIELD";
@line = split(m":", $line, 3);
@line becomes ('MULTI','LINE','COLON:DELIMITED:FIELD'). This behavior makes your code more efficient, since Perl does not have to split as many fields for you.
split is especially useful for dealing with tabular data, in which there are nice, stately rows of data all delimited by a certain character. This is a very common idiom:
($accountId, $firstName, $lastName) = split(m"\|", $accountField);
in which the fields being split out are assigned to named variables in a single statement. Here is an example (a companion to the block-text example above) that takes conversational text, which so often has the bad habit of running past 80 characters per line and wrapping, and turns it into neat lines that fit on an editor screen. This example shows the power of regular expressions at its finest:
undef $/;
$line = <FD>;
$line =~ s"\n" "g; # temporarily replace newlines with spaces.
@eighty_char_lines = split(m"(.{1,79}\b)", $line);
$" = "\n";
print "@eighty_char_lines\n";
This works by using one of the properties of regular expressions. Remember that regular expressions are by nature greedy. In the line 'split(m"(.{1,79}\b)", $line);' you are telling the regular expression engine to look for from one to 79 characters, the more the better, and to terminate the match at a word break (\b).*
OK, not exactly. There is a subtle bug here, which, if you have spotted it, shows you are catching on quite well. By saying m"(.{1,79}\b)", we are telling split to take this regular expression as our delimiter. Therefore, since the delimiter matches everything, the 'fields' between consecutive delimiters are empty strings, and they end up interleaved with the captured chunks in @eighty_char_lines. To get rid of them you can say:
@eighty_char_lines = grep(length($_), @eighty_char_lines);
where the grep weeds out the strings of length 0. See grep below.
The result of all of this is that the split greedily chops up all of the lines into chunks as big as possible, but not exceeding 79 characters. It is equivalent to something like:
undef $/;
$text = <FD>; # read in the entire file
@words = split(' ', $text); # split the text into words
do
{
$line = '';
while ((defined $words[$xx]) && (length($line . ' ' . $words[$xx]) < 80))
{
$line .= ' ' . $words[$xx];
$xx++;
}
print "$line\n";
} while (defined $words[$xx]);
except a lot faster, and a lot more succinctly stated.
grep
grep is the all-purpose array filter in Perl. It is very similar to the UNIX utility of the same name, but this grep works on Perl arrays with Perl expressions. However, Perl's grep comes with traps for the unwary. Its usage is:
@arrayName = grep(EXPRESSION, @arrayToMatch);
@arrayName = grep { FUNCTION } @arrayToMatch;
This is how Perl's grep works: the array given as the second argument has all of its values taken, one by one, and aliased to the special variable '$_' (see the section 'special variables' below), something like Figure 10.2.
figure 10.2 (line art)
Figure 10.2
Simple grep usage
This variable can then be used in the expression or block specified in the first argument. If that expression evaluates to anything true (non-blank and non-zero), then that element matched the condition of the grep, and it is put on the list that grep returns.
grep is legendary for being obscure. The best way to explain it is to demonstrate its usage. A simple example may make grep clear:
my @integers = (0,1,2,3,4,5,6,7,8,9,10);
my @matches = grep($_ > 5, @integers); # '@matches' now equals ( 6,7,8,9,10 )
grep took each value in @integers, stuck it in $_, and compared it to 5. If the comparison was true (i.e., $_ > 5 evaluated to true), grep then pushed the matching element onto @matches. If not, grep went on to the next element. The one-line grep statement is pretty much equivalent to:
foreach $integer (@integers)
{
if ($integer > 5)
{
push(@matches, $integer);
}
}
It is also possible to use a function or a regular expression with grep:
@matches = grep { $_ > 5 } @integers; # same as above, only using a block
                                      # (as with sort).
@strings = (1, 'integers', 2, 'and', 3, 'strings');
@integers = grep(m"^\d+$", @strings); # matches (1,2,3). Doesn't match non-'integers', 'and', 'strings'
And if the value of $_ changes while you are grepping through the function, it actually changes the array itself, as in:
@integers = (1,2,3);
grep ($_ = $_*2, @integers); # makes @integers the array (2,4,6).
# you may want to use map or foreach instead.
although this usage is slightly less efficient than:
foreach (@integers) { $_ *= 2 }
because it builds a list for returning to a variable, and you aren't using that return value!
Now there are several practical uses for grep. You can, for example, use grep to turn an array into a hash whose keys are the elements in the array, and whose values are the number of times that element occurs in the array:
@words = ('be','very','very','very','afraid');
grep($concordance{$_}++, @words);
Now %concordance will look like:
%concordance = ('be' => 1, 'very' => 3, 'afraid' => 1);
Therefore, this is a true concordance. By saying "$concordance{'very'}" we find out that 'very' occurred three times in our input array.
In scalar context, grep returns the number of times a given pattern matched. If you use good coding style and always start your subroutine definitions at the beginning of a line, as in:
sub mySub
{
}
Then you can find out, say, if you have more than five subroutines in 'Module.pm' by saying:
my $FH = new FileHandle("Module.pm");
@code = <$FH>;
if (grep(m"^sub ", @code) > 5)
then the condition will evaluate to true if there are more than 5 occurrences of ^sub in the code (sub at the beginning of a line). And:
if ((@sub_lines = grep (m"^sub\s+(\w+)", @code)) > 5) # saves subroutine names.
both evaluates to true on the same condition, and saves the matches in @sub_lines (which will happen to be your subroutine names).
Another interesting thing you can do with grep is weed out duplicates in arrays:
@unique = grep { $marked{$_}++; $marked{$_} == 1; } @original;
This takes an array like '(3,1,2,4,3,2,1,5)' and turns it into '(3,1,2,4,5)' by marking each array element, and only returning true if that element was seen for the first time ($marked{$_} == 1).
Perhaps you need an array that consists only of values that are in ascending order, and throw out the middle values:
@array = (1,4,6,2,8,1,11,32,16);
my $max;
@ascending = grep
{
    if (!defined($max) || $_ > $max) { $max = $_; 1; }
    else { 0; }
} @array;
returns (1,4,6,8,11,32). This works by keeping a placeholder called $max, and only yielding true if the value (again passed through the special variable $_) is greater than $max. (Note that you cannot use 'return' inside a grep block; the block's last expression is its truth value.)
Another idea is to get all the error lines out of a log file:
use FileHandle;
my $LOG_FILE = new FileHandle("log");
@error_lines = grep(m"ERROR", <$LOG_FILE>);
Here, the filehandle $LOG_FILE gets slurped into a big list, and only the lines that have 'ERROR' in them get passed back to the array @error_lines.
We could also use this trick to get a concordance of words on an entire file:*
grep($concordance{$_}++, map { split(' ', $_) } <FILEHANDLE>);
Attentive readers will notice that this is the same as an example given in chapter 5. Twenty lines in the script are shrunk down to one, somewhat obscure line. There has to be a happy medium. |
This borders a little too heavy on the code hack side of things. If you can read this one line program and understand it, you are learning Perl pretty well. The 'map split' takes all of the lines in FileHandle, splits them into words, and then passes the result of that split through the grep engine, which then dutifully marks each time it has seen a given pattern in the hash %concordance. This is illustrated in Figure 10.3:
Figure 10.3 (line art)
Figure 10.3
Nasty grep example.
The major concept to remember about grep is not to accidentally trash your array whilst doing the compare. So, if you say something like:
@backup_files = grep (s"\.bak"", @files);
you will not only get a list of backup files, you will also strip the '.bak' extension inside @files itself, so the array @files ('file1','file2.bak','file3.bak') becomes ('file1','file2','file3') whilst @backup_files becomes ('file2','file3'). This is probably not the original intention of your code!
map begins with an input array and transforms it into an output array, given a function that you specify. It is sort of like a mathematical map, in the sense that there is a relationship between the input values and the output values: you map one value to another through a transform function.
The usage of map is:
@arrayName = map(EXPRESSION, @source_array);
@arrayName = map { BLOCK } @source_array;
map is like grep in that it iterates through every element in an array, but it is different because map has no implicit filter: every element's result passes through to the output, whether true or false. This is very important: map applies no default filtering to the values that come out of the transform function. Therefore:
@array = grep( 1 == 2, @newArray);
will always return an empty list (), whereas:
@array = map (1 == 2, @newArray);
will return, instead, a list with exactly as many elements as @newArray. Each element will be the empty string, since the condition '1==2' always evaluates to false.
In other words, the following statement using map:
@targetArray = map(function($_), @sourceArray);
is the logical equivalent to:
foreach $element (@sourceArray)
{
$targetElement = function($element); # do the target
push(@targetArray, $targetElement);
}
The following two examples have the same result:
@integers = (1,2,3);
@twiceAsBig = @integers; # makes @twiceAsBig (1,2,3) temporarily.
grep($_ = $_*2, @twiceAsBig); # makes @twiceAsBig (2,4,6).
@twiceAsBig = map($_*2, @integers); # makes @twiceAsBig (2,4,6) in one step.
Except that the example using map is much cleaner, more explicit, and performs the task in one step. (As stated many times, functionality in one step should not be the only goal, unless it can be achieved with clarity.)
Notice that map does not do a check on the array elements as grep does. In the statement:
@files = map (-f $_, @fileList);
map does not eliminate records that evaluate to false or empty, hence the resulting array is:
(1,'',1,'','','','',1)
meaning that the first record matched was a file, the second wasn't, etc., etc., etc. Whereas if you said:
@files = grep(-f $_, @fileList);
then you get a list of files instead of an array of ones and blanks.
Another important note about map is that it takes whatever is returned by the mapping function, and stuffs it into the return array. This means that you need not have the same number of elements in both. If you say something like:
@multiples = map { $_, $_*2 } @multiples;
then you have doubled your array size, turning (1,2,3,4,5) into (1,2,2,4,3,6,4,8,5,10). Likewise:
@greater_than_five = map { $_ > 5 ? ($_) : () } @multiples;
returns an array with fewer elements than you started with. You are probably better off doing this particular job with a grep.
Like grep, map is immensely useful. Suppose you have a data structure which you want to flatten. Say for example, you have an array of hashes that look something like:
$AoH = [
{
'name' => 'Carmen',
....
},
{
'name' => 'Peter'
...
}
];
and you wish to extract all of the names out of it into an array. With map, it is fairly straightforward. This is how to do it:
@names = map { $_->{'name'}} @$AoH;
Likewise, suppose you want to make some skeleton code given a list of subroutine names, to conform to coding standards, give a common interface to packages, etc. A good start would be something like:
@subNames = ('subroutine1','subroutine2','subroutine3');
@codeSubs = map { "sub $_\n{\n}" } @subNames;
This would make @codeSubs look like:
@codeSubs = (
'sub subroutine1
{
}',
'sub subroutine2
{
}',
'sub subroutine3
{
}'
);
which does the subroutine bit, but ignores any issues about package names, package variables, etc. Now all you have to do is figure out how to actually fill these subroutines up with code!
Summary of Operations on Variables.
As a Perl programmer, you really want to know sort, grep, map, and split. These functions are truly workhorses in Perl, being used for thousands of different applications.
1) sort, in its simplest application, sorts an array in alphabetical order, or by a user-defined function.
2) split takes a scalar and separates it based on the criteria that the user provides.
3) grep, in its simplest application, finds which elements in an array satisfy a condition.
4) map, in its simplest application, makes a new array based on a transform from the old array.
Time functions
Perl has quite a few functions that help you time code, get the time of day, pause running code, and otherwise deal with time. These applications are shown below.
localtime
localtime is Perl's major interface for getting information about what time the system thinks it is. It has two usages, depending on its context. The usage of localtime is:
($seconds, $minutes, $hours, $monthdays, $months, $years, $weekdays, $yeardays, $daylightSavings) = localtime($secs);
$timestring = localtime($secs);
The first usage, in array context, gives an array of time elements ($seconds, $minutes, $hours, etc.) based on the one argument to localtime, the number of seconds since 1970. In this example:
($secs, $min, $hr, $mday, $mnth, $yr, $wd, $yd, $ds) = localtime();
Perl assumes that you mean the current time, something like:
$secs = 22;
$min = 24;
$hr = 22;
$mday = 7;
$mnth = 2;
$yr = 97;
$wd = 5;
$yd = 65;
$ds = 0;
You should know that the year is the number of years since 1900 and that months start with January being equal to month 0. Likewise, weekdays start with 0 being Sunday and 6 being Saturday. The scalar form:
$timeString = localtime();
gives the current time in the format:
Fri Mar 7 22:22:50 1997
whereas something like:
$timeString = localtime(0);
results in:
Wed Dec 31 17:00:00 1969
because localtime is geared to the number of seconds since Jan 1, 1970, Greenwich Mean Time.
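Two classic stumbling blocks here are the year offset and the zero-based month. The following is a minimal sketch (the printed date is illustrative) of building a human-readable date from the list form, using the field order shown in the usage above:

```perl
my ($sec, $min, $hr, $mday, $mon, $yr) = localtime();
# $yr is years since 1900; $mon runs from 0 (January) to 11 (December).
my $date = sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $yr + 1900, $mon + 1, $mday, $hr, $min, $sec);
print "$date\n";   # e.g., "1997-03-07 22:22:50"
```

Forgetting the "+ 1900" and "+ 1" corrections is the source of a great many date bugs.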
timelocal
timelocal is not strictly a built-in function. Instead, it is a function provided by the Perl package called 'Time::Local', a module which you need to include in your programs if you wish to use it. It is the opposite of localtime in that it calculates the number of seconds since January 1, 1970. timelocal takes an array in the same form as the one returned by localtime to do this calculation.
The usage of timelocal is:
use Time::Local;
my $secs = timelocal(@arrayFromLocalTime);
The following example will always print out the number of seconds from 1/1/1970:
my $seconds = timelocal(localtime());
time
time simply gives the number of seconds since 1/1/1970. Since localtime takes the number of seconds since 1970, here is a way to find the time an hour ago:
my $time = localtime(time()-3600);
Expanding on this, here is a small subroutine to find out times one second, minute, hour, etc. from the present:
my %secs =
(
'second' => 1,
'minute' => 60,
'hour' => 3600,
'day' => 86400,
'year' => '31557600'
);
foreach $key (keys %secs)
{
my $time = localtime(time()-$secs{$key});
print "Time one $key ago: $time\n";
}
times
times is Perl's main way to benchmark code, to see how CPU intensive it is. Unfortunately, it is not yet available on the ActiveWare port, only on UNIX.
The usage of times is:
($user, $system, $child_user, $child_system) = times();
The $user and $child_user times are the time that your process (and its children) spent running your own code on the CPU. The $system and $child_system times are the time the operating system spent working on behalf of the process.
Now, and this is important, remember that times does not give you any idea of elapsed time. You can get the 'wallclock' time by the time function given above. Instead, times is good for figuring out how much of a CPU hog the process is. If you say:
($user1, $sys1, $child1, $childsys1) = times();
sleep(20); # sleeps for 20 seconds (see below)
($user2, $sys2, $child2, $childsys2) = times();
then $user2 - $user1 will not be 20 seconds. Instead, it will be an extremely small number (like .01), since that was the amount of time you actually engaged the CPU. If you want wallclock time, see time instead.
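To see the difference in practice, here is a sketch that benchmarks a busy loop rather than a sleep; the loop body is arbitrary, chosen only to burn CPU, and the printed figure will vary by machine:

```perl
my ($user1, $sys1) = times();

my $total = 0;
for (my $i = 0; $i < 1_000_000; $i++)   # burn some CPU time
{
    $total += $i;
}

my ($user2, $sys2) = times();
printf("CPU seconds used: %.2f\n", $user2 - $user1);
```

Here, unlike the sleep example, the user-time difference is real CPU consumption, and times reports it.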
sleep
sleep pauses a process for X number of seconds. The usage of sleep is:
my $secondsSlept = sleep($seconds);
This lets you do things such as wait for another process to complete, or poll for a condition without monopolizing the CPU. If you said:
while (!-f "$filename")
{
}
to wait for a file to appear (perhaps from another process), this chews up the CPU, because the loop spins through the file test ('-f': see below) as fast as it possibly can, and the chances of the file being there .001 seconds after you checked the first time are slim. Instead, say:
while (!-f "$filename")
{
sleep(20);
}
This is significantly better, since you are only checking for the file every twenty seconds.
Summary of Time functions
Manipulating time is one of the biggest inconveniences ever faced by programmers (and because of this, one of the most expensive issues for businesses today, as the year 2000 approaches). Another big inconvenience is optimizing code for speed, and being able to tell whether wallclock elapsed time has been improved. This is why Perl provides low-level built-in time functions:
1) localtime, which returns a time (either string or elements) based on the argument being the number of seconds since 1970.
2) timelocal, which does the opposite of localtime. It returns the number of seconds since 1/1/70. (Don't forget to specify 'use Time::Local' in the code.)
3) sleep, which puts a process to sleep for a given amount of time.
4) time, which gives the number of seconds since 1/1/1970.
5) times, which gives CPU and system time for processes.
There is really quite a lot more to it than this, but this is enough to get you started.
Debugging Functions
The following debugging functions are extremely helpful in locating problems in Perl programs. We shall go over these in detail in 'Debugging and Perl5', but you should be aware of them for now.
caller()
The built-in function caller lets you take a quick peek into how a program is running. caller is invaluable, especially in its incarnations confess and carp (in the package Carp), in tracing down why a certain error has occurred. If you do something like:
a();
sub a { b(); }
sub b { c(); }
sub c { d(); }
sub d { print "HERE!!!\n";}
Then you are making a calling stack, 'a()' calls 'b()' which calls 'c()', etc. This can be very difficult to debug, since finding the bug becomes a problem of navigation, or unwinding this stack. caller unwinds the stack for you at any given point. The usage of caller is:
($package, $filename, $line) = caller();
or
($package, $filename, $line, $subroutineName, $arguments, $context) = caller($frame);
Let's take another look at the example, and use caller instead:
a();
sub a { b(); }
sub b { c(); }
sub c { d(); }
sub d { print "@{[ caller() ]}\n"; }
This prints out:
main script.p 4
because d was called from subroutine c, which was in the main package, in the file script.p, on the fourth line.
If we said:
a();
sub a { b(); }
sub b { c(); }
sub c { d(); }
sub d { print "@{[ caller(1) ]}\n"; }
instead, (note the one here as the argument to caller), we've unwound the stack one frame, so it shows the function that called the function that called caller. This prints out:
main script.p 3 main::c 1 0
because main::c was the function that called main::d which called caller.
caller is not usually used directly, although we shall see places where it is. Instead, a package called Carp is usually used, which contains the functions carp (lower case 'c') and confess. If you use the function confess, say:
use Carp;
a();
sub a { b(); }
sub b { c(); }
sub c { d(); }
sub d { confess "Dying here!\n"; }
Then Perl will print out the entire stack:
main::d called at line 6
main::c called at line 5
main::b called at line 4
main::a called at line 3
and terminate the process. You can now debug at will. The function carp does the exact same thing, but doesn't terminate. It just shows you where you are at a given time.
die()
die basically kills a process at a given point, printing out the arguments that it is given. Its usage is:
die @arguments;
If, for example, you say something like:
die "This program is not very correct, is it!";
then you are telling Perl to exit the program, giving an indication of where it is in the program, and to put that message on the standard error stream, or STDERR. The above statement will say something like:
This program is not very correct, is it! at line 32.
die is mainly used with short-circuiting, to bail out when a condition necessary for the program to continue is not met. Something like:
use FileHandle;
my $fh = new FileHandle("> file") || die "Couldn't open file!";
will exit the program on the case in which a file is not openable.
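One refinement worth knowing: the special variable $! holds the operating system's description of the last error (see 'special variables' below), so including it in the message tells you why the open failed, not just that it failed:

```perl
use FileHandle;

# $! carries the OS error ("No such file or directory",
# "Permission denied", etc.) for the failed open.
my $fh = new FileHandle("> file") || die "Couldn't open file: $!";
```

This costs nothing to type and saves a great deal of guessing later.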
warn()
warn is like die, except that it only gives a warning on STDERR, and does not stop the execution of the program. If you say something like:
use FileHandle;
my $fh = new FileHandle("> file") || warn "Couldn't open file!";
this prints out the same message as die, i.e.:
Couldn't open file! at line 32
but does not die or terminate the program; warn simply states the message instead.
Summary of Debugging Functions
Debugging functions are what make it possible to scale Perl code, and they are the key to debugging and understanding object-oriented calls. You should also know these built-in functions quite well:
caller prints out parts of an execution trace given a certain point.
die exits the code in a given spot, printing out a message where it does die.
warn prints the same thing as die (a message to STDERR) but does not exit.
Perl Interfaces to Operating Systems: System Calls, Operating System Emulations, File Operators
Perl started its useful life as a scripting language for UNIX. As such, it was sort of a cross between a language meant for controlling system level activities and for extracting other useful data in the form of reports. In its latest incarnation, obviously, it is so much more.
One powerful advantage to the programmer is that Perl has retained its system level roots and allows the programmer to easily, from within a Perl program, perform system level activities and do system level checking.
These functions come in two flavors: those which manage the running of other processes and those which allow Perl to perform common system calls cross-platform. The function calls system, exec, fork, and `` (the backtick operator) allow Perl to manage the running of other processes. Function calls such as chdir and mkdir allow Perl to make common system calls portably between platforms.
Or so we wish! Perl, like almost everything else in the computing world, has been caught in the crossfire over the battle between differing operating systems. The current battle getting the most attention is, of course, the one between UNIX and NT, and the dust has not settled yet. Yes, you can have portable Perl software. You should use these Perl internal calls (like chdir, etc.) whenever possible, but this is not 100% bulletproof. You just have to be careful for now on going about doing it.
You can also hedge your bets with this book, especially if you work for a large company and want to make sure that the software you developed on UNIX will work on NT.
system and Win32::Spawn
system takes the supplied argument and executes it as if it were a command in an operating system shell, whether UNIX, NT, DOS, or anything else. Win32::Spawn does the same thing, but is NT specific, and lets you make stand-alone processes on an NT box.
Since system can execute platform specific commands, by its very nature system creates platform dependent code. Since Win32::Spawn is only on NT, calls to this make code even more platform dependent.
In general, UNIX users are much more inclined to use system calls than NT users (due to the fact that NT is more 'interactive' by nature). Many of the examples below could be emulated on NT by specific calls to OLE, for example.
Usage:
system($command);
and
Win32::Spawn($executable, $arguments, $pid);
If your code executes something such as:
system("ls");
then you cannot hope to run your script on any system that does not have the executable ls. This is the same for dir:
system("dir");
will make an NT script totally bound to the NT world. One of the ways of getting around this is by using Perl's internal functions to EMULATE ls or dir. For example:
opendir(DIRHANDLE, ".");
while ($file = readdir(DIRHANDLE))
{
print "$file\n";
}
approximates an ls or dir, in an extremely portable way. On UNIX, the system command:
system("elm -s 'done with work' recipient < filename");
mails the file 'filename' to the user recipient. (On NT you would need to do this in OLE. We shall give examples of this a little later, in chapter 14.) The following executes a source control statement, and does so in the background:
system("rcs -u filename &"); # Unlocks the file filename, and does so in the background.
and the following directs output to a file:
system ("ls > file 2>&1");
Again, these functions are highly UNIX specific. In fact they depend on a UNIX shell being available in order to work correctly. In particular, the '&' (the background character) is very OS specific. This is where Win32::Spawn comes in:
use Win32;
Win32::Spawn("notepad.exe", "myNote.txt", $processid);
which runs notepad.exe in the background. This, too, is highly specific, and doesn't work on Windows 95, where there is no good workaround.
This example runs an interactive program for the user, and when finished returns.
system("vi filename"); # starts up vi on the file filename
But of course, these commands do not work if the particular system does not have vi, elm, rcs, ls, etc. Therefore, it is a good idea to reduce your use of this command to a minimum, and in highly marked spots.
If you don't minimize your use of system, when the time comes to port your scripts to a new system, you will have to completely rewrite your system calls to deal with that system's inconsistencies. Also, consider using the Perlish versions of common system calls instead. In fact, in some places you simply can't get around using the Perlish equivalents. If you say something like:
chdir($dirname) || die "couldn't change directory to $dirname!";
this changes the directory for the duration of your Perl script, whereas
system("cd $dirname") && die "Couldn't change directory to $dirname!";
simply will not work, since the system call doesn't 'keep its side effects'. This is a very important point to remember about system. If you use system to change environment variables or some other aspect of your environment, and then try to use these side effects later on, you will simply be shooting yourself in the foot. If you say something like:
system("VARIABLE=value");
print $ENV{'VARIABLE'};
then you are on the wrong track. system opens up its very own small process, which has its own 'mini' environment which it inherits from the Perl script. Something like figure 10.4:
line art 10.4
Figure 10.4
the impermanence of system calls.
However, there is one thing that the system call leaves behind for the Perl executable, and you should be aware of it. The system call sets $? (see 'special variables') which is nonzero if it fails.* Hence if you say:
system("bad_command");
if ($? != 0)
{
print "Error in command!\n";
}
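On UNIX, $? actually packs the command's exit status into its high byte, so a common refinement (a sketch; 'bad_command' is a stand-in for whatever you ran) is to shift it down to recover the value the command passed to exit:

```perl
system("bad_command");
if ($? != 0)
{
    my $exit_value = $? >> 8;   # the command's actual exit code
    print "Error in command! (exit value $exit_value)\n";
}
```

The low bits of $? carry signal information; the shift discards them and leaves just the exit code.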
Notice, again this is not portable between UNIX and other OS's! $? is a shell variable, and is not emulated in the other versions of Perl (like Activeware's for NT). |
`` (backticks)
The backticks operator takes the string specified by COMMAND, interpolates it, runs it, and then returns the TEXT of that command to either a scalar or an array. If it is in scalar context, the results of the command are slurped entire into the scalar. If the backticks are in array context, Perl uses the variable $/ ($INPUT_RECORD_SEPARATOR) to determine how to split the text. (See the section 'special variables' for more detail on this.)
Using backticks shares the same, negative property as system: if you use it you are almost guaranteed to anchor your program to a particular operating system.
Usage:
$scalarName = `COMMAND`;
@arrayName = `COMMAND`;
This example uses the backticks operator to perform a system level command, to get a listing of directories in UNIX:
chop(@files = `ls -1`); # take the files from the ls command, and
# stuff them into @files.
# if 'ls -1' returns:
# "file1\nfile2\nfile3" (three files with returns)
# then @files becomes ('file1','file2','file3').
Note that there are two major differences between backticks and system. The first difference is that the backticks command buffers its output, which means that the output only displays on the screen, or is assigned to a variable, after the call completes. It also means that if the call you just made in backticks creates 50 megabytes of output, then your program will consume 50 megabytes of memory as well. For example, the following example on UNIX will take a very, very long time to perform, without any note to the user as to what is going on, and will probably cause the machine to run out of memory:
print `find / -print`; # do a find on the whole darn system.
# returns every single file, in one long string..
UNIX isn't the only system which could have this occur:
print `cd \; dir /s`;
which basically does the same thing, and dumps the entire contents of your DOS system into memory.
The second difference between system and backticks is that the backticks operator will not work on items which require human input, as in:
$badIdea = `interactive_program`;
since any text prompting the user for input is going to get buffered in the computer, and hence NOT displayed to the screen. Therefore, the user will not see what is expected from him, and the program will seem to hang.
The program has not hung. Instead, the computer has probably printed out the message requiring human input, but it is invisible since the backticks are preventing it from being displayed until the command is done!
So, unless you can sense what the computer wants and act accordingly, this is a bad idea. In lots of cases, though, it does make a great deal of sense to retrieve the output from a command in the form of a string. Here are more examples, which again, will only be good on a UNIX system:
$wordcount = `wc $file`; # gets the word count of 'file'
@matching_lines = `grep pattern @files`; # gets all the occurrences of pattern in @files.
@sorted_lines = `sort file_name`; # sorts file_name
However, one of your goals should be to limit your use of backticks, for the above-mentioned portability issue. In fact, all of the above backtick commands could be rewritten in Perl. The wordcount becomes:
use FileHandle;
undef $/;                       # make it so <$FD> slurps the whole file
                                # into a scalar (see special variables).
my $FD = new FileHandle($file);
$stuff = <$FD>;
$chars = length($stuff);
$words = @{[ split(' ', $stuff) ]};   # splits 'stuff' into words, then counts
                                      # them (scalar context yields the count).
$lines = @{[ split(m"\n", $stuff) ]}; # splits 'stuff' into lines, then counts them.
$wordcount = "$lines $words $chars";  # wc prints lines, words, characters.
This shows the power of split (yet again) and the power of the @{[ ]} syntax we mentioned earlier to prevent temporary variables and put everything into one line. If you aren't comfortable with this syntax, split up the above into two lines, and say:
@word_array = split(' ', $stuff);
$words = @word_array;
instead.
The grep command becomes:
use FileHandle;
foreach $file (@files)
{
my $fh = new FileHandle("$file");
push(@matching_lines, grep(m"pattern", <$fh>));
}
where the Perlish grep returns only those lines that match a given pattern, and then stuffs them into '@matching_lines'. The sort command becomes:
my $fh = new FileHandle("file_name");
@lines = <$fh>;
@sorted_lines = sort(@lines);
where we simply use the Perlish sort instead of the sort command. These not only work exactly the same as their backtick counterparts, they also will work cross platform (i.e.: on NT as well as UNIX, as well as other OS's). Hence, if you are doing cross-platform development, you will want to stay as far away as you can from using `` and system.
We should mention one last command here, or rather, command set: fork, along with exec and wait. fork is an extremely powerful command. It is similar to system, but with better process control, for controlling whatever processes your program spawns. Unfortunately, NT does not natively support fork. To get fork-like functionality on NT (note NT, not Windows 95), you use the 'Win32::Process' module. (And there are software packages that can make NT support fork: if, for example, you use Perl with the gnu-win32 package on Windows NT, you will get fork as well, since gnu-win32 emulates UNIX on NT.)
So without further ado, here they are: fork, exec, and wait:
Usage:
$processId = fork();
exec($commandString);
wait();
The function fork makes a copy of your process, and then proceeds to run both processes until they hit 'exit' statements. In the parent it returns the process ID of the child process that it generates (the unique identifier for that process on UNIX systems); in the child it returns 0.
The standard metaphor is:
if (!($pid = fork()))
{ # if child -- fork returns 0 (false)
  # in the child process.
exec("script_name");
}
else
{
wait();
}
This essentially splits the one process into two processes, each with its own namespace, etc. These two processes are called 'parent' and 'child'. The parent goes into the wait call, which waits for the child process to finish. The child executes the 'if' branch, in this case replacing itself with 'script_name' via exec.
This is very helpful because it bypasses the overhead of a system call, and because it provides process control in the form of a pid. Unfortunately, NT does not have fork, and many existing Perl scripts have 'fork()' calls in them. Hence this is a big issue for porting Perl to NT; 'Win32::Process' is NT's closest equivalent.*
In fact, you may want to have a 'wrapper' around fork, which checks to see if your OS is UNIX or NT, and then acts accordingly. In the ActiveWare version of Perl, there is a group of functions that come with Perl itself, in the module 'Win32::Process', that do the same sort of stuff as fork. Notice that you need to be familiar with Perl's object-oriented syntax in order to use these functions, and that the syntax is quite nasty.
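As a sketch of how the Win32::Process interface fits together (the executable path and script name here are purely illustrative, and the flag constants are taken from the Win32::Process documentation shipped with the port):

```perl
use Win32::Process;

# Create() launches a new process and fills in $object.
# "C:\\perl\\bin\\perl.exe" and "script.p" are hypothetical paths.
Win32::Process::Create(
    $object,                    # receives the process object
    "C:\\perl\\bin\\perl.exe",  # full path to the executable
    "perl script.p",            # the command line to run
    0,                          # don't inherit handles
    NORMAL_PRIORITY_CLASS,      # process creation flags
    "."                         # working directory
) || die "Couldn't create process!\n";

$object->Wait(INFINITE);          # wait for the process to finish
$object->GetExitCode($exitcode);  # retrieve its exit status
```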
People interested in this usage may want to read ahead, especially in the sections on Perl portability, and the object oriented section. '$object' is basically your process, 'Create' runs your process, and 'Wait', and 'GetExitCode' wait for the process to finish, and return the Exit code, respectively. |
Perl provides several functions to help you avoid the non-portability issue. We give some of the most important ones below.
Functions chdir, unlink, link, mkdir, readdir, and opendir provide a Perlish alternative to their UNIX and DOS cousins. The following table shows the Perl command and the similar UNIX/DOS command:
Table 10.3
Perl command: UNIX Command Dos Command
chdir cd cd
unlink rm del
readdir, opendir, closedir ls dir
mkdir mkdir mkdir
rmdir rmdir rmdir
There is also a hostname package (Sys::Hostname) and a find package (File::Find) which do things very much like hostname and find. Again, the main reason behind these modules is portability. If you get used to using them, then you won't be bitten when you move to new systems. Anyway, enough of a rant; here are some examples. If you find yourself using 'system("cd dirname")' you haven't been paying attention (a cd in a subshell doesn't even change your script's own working directory). Use these instead:
chdir("C:/windows"); # changes to the windows directory. Sets
# cwd inside Perl to be "C:/windows".
opendir(DIR, "C:/windows") || die; # opens up the directory "C:/windows" for reading.
@list = readdir(DIR) or die; # reads the list of files from the directory
# handle DIR. Equivalent to 'dir' or 'ls'.
foreach $file (@list)
{
if (-f $file) # if a file.
{
$status = unlink($file); # delete it, and return the status.
}
}
$status = unlink(@list); # deletes all the files in the list at once. Better
# off doing it one at a time, so you can check
# each status. This 'status' is the number of
# files that unlink actually deleted. unlink
# does not touch directories!
rmdir("/tmp") || die; # tries to remove the directory. Succeeds
# if empty.
mkdir("/tmp", 0755) || die; # tries to make the directory "/tmp" with the
# permissions 0755.
unlink <file.*>; # tries to delete all the files in the cwd that
# match the glob 'file.*'.
All of these operators return a status, which you ignore at your own peril. For example, suppose that mkdir("/tmp", 0755) returned a false status. This means that something went wrong while making that directory. If you do these commands without checking their status, and part of your program depends on having, say, a directory created, and the directory isn't there, well, then all gloves are off as to the effects on your system.
It takes a while to get used to doing all of these operations in Perl, but once you manage to do it, you lessen your dependence on the operating system a great deal. This, of course, makes your code more portable.
These functions are just a sampling of all the operating system functions that are available. You will see more of these throughout this book. In fact, it is this book's policy to use Perl if it can, rather than system calls or pipes. Once you decide to uphold this policy in your own code, you will find that the next time you need to port your scripts it will be orders of magnitude easier.
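As an example of the portable alternatives mentioned above, here is a sketch that uses File::Find (from the standard library) in place of the UNIX find command. The directory and file names are made up for the demonstration:

```perl
#!/usr/bin/perl
use File::Find;

# build a tiny directory tree to search
mkdir("find_demo", 0755);
mkdir("find_demo/sub", 0755);
open(FD, "> find_demo/sub/example.pl") || die "can't create file: $!";
close(FD);

# collect every file ending in .pl -- the portable equivalent
# of 'find find_demo -name "*.pl"'
my @found;
find(sub { push(@found, $File::Find::name) if (m"\.pl$"); }, "find_demo");

print "@found\n";    # find_demo/sub/example.pl
```

The anonymous sub is called once per file, with $_ set to the file name and $File::Find::name set to the full path, so the same script works unmodified on UNIX and NT.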
Perl has quite a few file test operators that it uses to check on files. They all have a similar usage, inherited from the UNIX shell:
Usage
(-OP SCALAR)
(-OP FILEHANDLE)
where -OP is any one of a number of one-character switches that operate on a FILEHANDLE or a file name.
Table 10.4 shows some of the more important operators that Perl knows about:
Table 10.4
-f filehandle or $filename ( test whether or not the filehandle is a file)
-d filehandle or $filename ( test if the filehandle points to a directory)
*-l filehandle or $filename ( test if the filehandle points to a link)
*-r filehandle or $filename ( test if the filehandle is readable by owner of script)
*-w filehandle or $filename ( test if the file is writable by owner of script)
*-x filehandle or $filename ( test if the file is executable by owner of script)
-z filehandle or $filename ( test if the file has zero size)
-s filehandle or $filename ( gives size of file )
-e filehandle or $filename ( test if file exists )
-T filehandle or $filename ( test if the file is a text file )
-B filehandle or $filename ( test if the file is a binary file )
Note that this is not a complete list; see the perlfunc manpage for more. Also note that not all of these operators work on a Windows platform. The ones which do not are prefixed by a '*', and fail because the underlying functionality does not exist there. Each of these operators returns 1 if true and '' (the empty string) if false, except for the '-s' operator, which returns the size of the file in bytes. Following are some examples of their usage:
$fh = new FileHandle("FILE") || die;
print "file has data\n" if (-s $fh); # file tests work on filehandles, too.
print "fileName is a file\n" if (-f "fileName"); # if fileName is a file, print.
print "fileName is readable\n" if (-r _); # if fileName is readable, print.
The last example shows the use of the '_' variable, which is quite good at preventing more than one system call on the same file. This is because the file operators are all manifestations of the low level call stat. stat is Perl's way to get information about files in the operating system. If, for example you did the following:
if (-r "file" && -e "file" && -s "file") # looking for a readable, existing, non-zero
# size file. Unfortunately, this means three
# stat calls, which provide you all this info anyway!
This is wasteful. Rather, you should use:
if (-r "file" && -e _ && -s _) # much better -- ONE stat call only.
This is a big save in processing resources, since stat calls are expensive. Some common, cross-platform uses of file operators:
if (!(-e $directory))
{
mkdir($directory, 0755) || die "Couldn't make directory!\n";
}
if (!(-e $file))
{
$fd = new FileHandle("> $file");
}
which make a directory, or touch a file, respectively, if it does not yet exist.
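Putting the operators and the '_' trick together, here is a small classifier subroutine (the name describe_path and the demo file name are made up) that spends exactly one stat call per path:

```perl
#!/usr/bin/perl

sub describe_path
{
    my ($path) = @_;
    return "missing"   unless (-e $path);   # the one and only stat call
    return "directory" if (-d _);           # reuses the cached stat buffer
    return "empty"     if (-z _);
    return (-s _) . " bytes";               # size, also from the cache
}

# a quick demonstration
open(FD, "> describe_demo.txt") || die;
print FD "ten bytes\n";
close(FD);

print describe_path("describe_demo.txt"), "\n";   # 10 bytes
print describe_path("."), "\n";                   # directory
```

Only the -e test actually hits the operating system; every later test reads the buffer that stat left behind.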
glob is Perl's function for getting file names and directory names from the operating system, in a manner similar to how the OS itself does it. Usage:
my @arrayName = glob($glob_pattern);
For example, if you want to get all the '.c' files in a given directory:
my @files = glob("*.c");
expands the * in the same manner that the shell would, substituting *.c with all the file names that end in '.c'.
glob is portable between the win32 world and the UNIX world, since it isn't really using the underlying platform to expand the "*.c", but is using an internal function to do it.
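A short, self-contained demonstration (the scratch directory name glob_demo is made up):

```perl
#!/usr/bin/perl

# create two .c files and one .h file to glob over
mkdir("glob_demo", 0755);
foreach my $name ("glob_demo/a.c", "glob_demo/b.c", "glob_demo/c.h")
{
    open(FD, "> $name") || die "can't create $name: $!";
    close(FD);
}

my @c_files = sort(glob("glob_demo/*.c")); # expands like the shell would
print "@c_files\n";                        # glob_demo/a.c glob_demo/b.c
```

Note that the .h file is correctly left out of the list, just as 'ls glob_demo/*.c' would leave it out.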
The functions described above are more portable between operating systems, and they perform some fairly low-level operations. The main ones to know are:
1) File Operators, which get information on what is in a given file
2) glob, which expands a pattern into its corresponding files in a system independent way (like "rm *")
Perl's built-in functions provide a wide variety of functionality to the Perl programmer. This is one of the reasons that Perl is so useful for solving everyday problems; however you want to manipulate your data, or whatever system calls you wish to make, there is probably a function built into the Perl core which does it for you!
We outlined the most common Perl internal functions, covering the problems people encounter every day, but you are going to want to check out perlfunc for a far more complete listing. I haven't counted them recently, but at last count there were over 300 internal Perl functions built in (if you count different uses as different functions)!
Perl functions also have the property that they are multiplexed or, as we shall call it in chapter 21, polymorphic. We gave the count of 300 as the number of internal functions, but the number of distinct names for internal functions is more like 80.
They simply work differently depending on the parameters that you pass them. This is what makes perl seem so simple to use even though the number of actual separate commands is gigantic.
Anyway, onwards and upwards. The next section deals with another aspect that makes perl so useful: the 'special perl variables', which give a standard location for common data that you will use in your perl applications.
Perl has special variables which are provided, by default, with the Perl executable. Note that a few of these special variables (such as $& and its regular-expression relatives) are read-only, but most of them can be set by your program.
Throughout this chapter, you will see two forms of special variables: a 'short' form and a 'long' form (i.e.: $" and $LIST_SEPARATOR). The short form is all we had in Perl 4. Those of us who are familiar with these short forms would not want to give them up, since they save a lot of typing. With Perl 5, however, there is a 'long-hand' provided for these variables. For example:
use English;
used at the beginning of a program will make synonyms of the long form for all short variables. For example, Perl aliases $` to $PREMATCH, $| to $OUTPUT_AUTOFLUSH, etc. Sometimes there are two such long synonyms:
It doesn't matter whether you set $,, $OFS, or $OUTPUT_FIELD_SEPARATOR. These are true synonyms: once one of the values changes, the others change as well. (We shall see in the next chapter how this is accomplished.) Whether or not you want to use these long forms is another matter. If you like being explicit, or find the longhand easier to remember, they are fine; but it is just as easy to look up the short form in a reference like perlvar as it is to look up the long form. The practice of this book is to use the 'short form' and then show a translation into the long form in a comment or two. |
We have already seen examples of special variables: the $" ($LIST_SEPARATOR) as seen above, the $` ($PREMATCH), $' ($POSTMATCH), and $& ($MATCH) as seen in regular expressions.
Why use special variables? Perl puts them to use in several ways:
to hold very common data, that is used frequently in Perl programs
to modify the way that a Perl function or operator works as in $" ($LIST_SEPARATOR) above, which modifies interpolation
to keep 'scratch' data to help a function or operator. as in $MATCH, which holds the information about the last match, for the regular expression engine.
In general, they are used to make a complicated interface simple. Consider, for example, what would have to happen if you didn't have the $" variable. Then you would need an interpolation function, and you would pass the separator as an argument, as in:
$line = interpolate( ' ', @arrayName); # ' ' equals $" ($LIST_SEPARATOR)
# unintuitive
This 'pseudo-syntax' is extremely cumbersome, and somewhat ugly. Compare it to the Perlish way:
$line = "@arrayName"; # does what you expect.
This form is shorter, more concise, avoids a function call, and (if you get used to the idea that $" is set to the default of " ") does what you would expect.
Internal variables are part of the reason that Perl is easy to write, and one of the reasons why things can be done so fast in Perl. For example, the following prints out the arguments supplied to a Perl program:
print "@ARGV\n";
The following prints out the directories for which Perl searches for libraries:
print "@INC\n";
The following sections demonstrate some of the more important variables that Perl supports, with examples of their usage, sorted by category. Also, keep in mind that all of the variables are global.
Perl provides quite a few special filehandles that you can use to send output to various locations:
STDIN provides a simple method for you to get data from the keyboard, the shell in which the user is typing. If you say:
chop( $value = <STDIN> );
then $value will contain what the user has typed on the command line, minus the trailing newline (which chop removes). You can also use <STDIN> directly inside an expression, like:
if (<STDIN> =~ m"Y") { print "user input contained a 'Y' in it!\n" }
Or perhaps:
if (($value = <STDIN>) =~ m"Y") { print "User input contained a Y - in $value!\n" }
Both use STDIN to capture input; the second one saves that input into a variable before matching it against a regular expression. This is handy at points where you want the user to type a certain value, and die if the input doesn't contain it.
STDOUT and STDERR are the two main ways to send text to the screen where the user typed the perl command. STDOUT, in particular, is also the filehandle that perl uses by default in the print command. It stands for 'Standard Output'. When you say:
print "THIS GOES TO THE SCREEN!\n";
then perl is silently translating this to:
print STDOUT "THIS GOES TO THE SCREEN!\n";
for you. STDERR also directs the text to the screen, although in a slightly different way. When you say something like:
print STDERR "filename!\n";
then this will go to the screen, but in particular if you say something like:
prompt% perl -e "print STDERR 'filename!'" > file
this will not redirect the output to the file 'file', even though you have put in the '>'. (This works exactly the same way on NT, but not Win95.) This allows you to split the two output streams on the command line. And at least on Unix, the following command for the korn shell works to split the two streams into two separate files:
prompt% perl -e "print STDERR 'filename!'" > stdout 2> stderr
One last note. Sometimes it is really annoying to have both the STDOUT and STDERR streams going at the same time. Say you were capturing the output of a program that wrote to both. The output text in this case tends to get garbled, since STDERR and STDOUT are two different streams that happen to be going to the same place.
In these cases it is helpful to redirect STDERR to STDOUT, either explicitly in the shell, or by using a pipe when opening the relevant command. (This trick works on both NT and Unix, but not Win95. I'm not even sure if Win95 has separate STDERR and STDOUT handles like this.) For example:
open(STDERR, ">&STDOUT"); # re-opens STDERR onto STDOUT's file descriptor.
print STDERR 'HERE';
this will send all subsequent STDERR text to wherever STDOUT is going. And:
open ( FD, "dir 2>&1 |");
will redirect STDERR into STDOUT inside the shell (again NT/Unix), so that you can see everything the 'dir' command prints, errors included, by saying:
while ($line = <FD>) { print $line; }
Again, I'm not sure about Windows 95...
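Here is a self-contained sketch of the pipe-plus-redirection trick, using $^X (the path of the running perl) as the command so the example doesn't depend on 'dir' or any particular program being installed; the child's two output lines are made up:

```perl
#!/usr/bin/perl

# a child command that writes one line to each stream
my $cmd = qq{$^X -e "print qq(to stdout\\n); print STDERR qq(to stderr\\n)"};

# '2>&1' merges the child's STDERR into its STDOUT, so the pipe sees both
open(FD, "$cmd 2>&1 |") || die "can't run command: $!";
my @lines = <FD>;
close(FD);

print @lines;   # both lines, in whatever order the buffers flushed
```

Without the '2>&1', the "to stderr" line would bypass the pipe entirely and land on your screen.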
The ARGV file handle is used in special cases where you want to actually treat all of the arguments on the command line as files. ARGV will then act as a default filehandle for each and every entry in @ARGV (see below). Hence when you say:
while ($line = <ARGV>) { print "$line!\n"; }
you are actually saying in disguise:
foreach $arg (@ARGV)
{
open (FD, "$arg");
while ($line = <FD>)
{
print "$line!\n";
}
}
Hence, ARGV is just a cleaner way of stating things in a small but important class of scripts.
The DATA filehandle is another, simple mechanism to help write simple and quick scripts to parse data. It allows you to attach code to a piece of data, and thus process that piece of data 'in place'. If you say, for example, something like:
while ($line = <DATA>)
{
print $line;
}
__DATA__
NUMBER 1|IS| A | PIPE | DELIMITED | LINE
NUMBER 2| IS | ANOTHER | PIPE | DELIMITED | LINE
This prints out both of the pipe-delimited lines. Perl is reaching in and grabbing the data after the special symbol __DATA__. Hence, on the first iteration of the while loop, $line becomes the whole 'NUMBER 1' line (pipes and all), and on the second it becomes the whole 'NUMBER 2' line.
The following special tokens, while not strictly variables, are often used in conjunction with the special filehandles like DATA. They are listed below.
__FILE__ and __LINE__ are used as quick ways of getting:
the file that the current statement is executing in,
the line that the current statement is executing on.
As such, they give information that is similar to caller(); however, caller only reports the frames in the stack above the place where the statement is executing, not the actual place itself. Here is a statement which supplements caller by showing the file and line of the actual calling statement:
print __FILE__, " ", __LINE__, "@{[caller()]}\n";
Note that since they are not variables per se, you cannot interpolate them inside double quotes.
__DATA__ and __END__ are used with the DATA special filehandle as shown above. The only difference between the two is that __DATA__ can be used more than once, so that each package can have its own DATA handle. For example:
A.pm:
package A;
while ($line = <DATA>)
{
print "$line in package A\n";
}
1;
__DATA__
package A line 1
package A line 2
package A line 3
B.pm:
package B;
while ($line = <DATA>)
{
print "$line in package B\n";
}
1;
__DATA__
package B line 1
package B line 2
package B line 3
This works by setting the filehandle A::DATA to point at the lines after __DATA__ in package A, and the filehandle B::DATA to point at the lines after __DATA__ in package B. See the chapter 'Syntax of Perl Modules and Libraries' for more detail.
Like named filehandles, perl provides quite a few special, named variables which help you interface with certain aspects of the shell. These will probably be some of the most widely used variables you will have in your programs.
@ARGV (no synonym) holds the arguments as passed into Perl via the command line.
print "@ARGV\n"
prints out the arguments passed to the command. The following is a simplistic option processor which creates a hash out of all items that come after a '-':
my (%options, @newARGV, $xx);
for ($xx = 0; $xx < @ARGV; $xx++)
{
my $arg = $ARGV[$xx];
my $val = $ARGV[$xx+1];
if ($arg =~ m"^-") # matches a '-' at the beginning of the argument
{
$options{$arg} = $val;
$xx++; # we skip the 'value' part, since it has
# already matched.
}
else
{
push(@newARGV, $arg); # we save the argument that
# is *not* a directive on a stack.
}
}
@ARGV = @newARGV; # we then take @newARGV (the stuff we
# just matched) and then copy it over
# the old @ARGV stack. (we are done with it!)
If someone types:
command.p -option1 option_value -option2 value2 table1 table2 table3
you then can access whatever the user typed on the command line by:
$options{'-option1'} ; # holds value 'option_value'.
For more robust option processing, see the standard Getopt::Long and Getopt::Std modules. Perl provides these to do the above transparently, and we shall see quite a few examples of this in the next section.
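For comparison, here is roughly the same job done with the standard Getopt::Long module; the option names are the made-up ones from the example above:

```perl
#!/usr/bin/perl
use Getopt::Long;

# simulate: command.p -option1 option_value -option2 value2 table1 table2 table3
@ARGV = ('-option1', 'option_value', '-option2', 'value2',
         'table1', 'table2', 'table3');

my %options;
GetOptions(\%options, 'option1=s', 'option2=s') || die "bad options\n";

print "$options{option1}\n";   # option_value
print "@ARGV\n";               # table1 table2 table3
```

GetOptions fills the hash and strips the options out of @ARGV for you, leaving only the non-option arguments behind.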
Perl also provides an ARGV filehandle in which each argument on the command line is treated as a file, and opened. This example:
while ($line = <ARGV>) { $count++;}
print $count;
counts all the lines that were given from files on the command line.
@INC shows the directory order in which Perl searches for libraries (either 'required' or 'used'), much like the '-I' flag for UNIX C compilers. Perl looks for libraries in first-to-last order, which means if the @INC looks like:
('/usr/local/lib/Perl', 'mylib/Perl').
and there are the libraries:
/usr/local/lib/Perl/standardPerlLib.pm, mylib/Perl/standardPerlLib.pm
YOUR library is going to get ignored when you do your use or require.
We will talk a lot more about @INC in Chapter 14, 'The Syntax of Libraries and Modules.'
%INC is the hash equivalent of @INC. Instead of giving information about where the libraries are coming from, %INC provides information about which libraries are actually in your workspace.
For example, suppose you say something like:
use Data::Dumper;
use strict;
at the beginning of your code. If you then say:
print Dumper(\%INC);
it will contain the following two entries:
'Data/Dumper.pm' => '/usr/local/lib/perl5/site_perl/Data/Dumper.pm',
'strict.pm' => '/usr/local/lib/perl5/strict.pm'
which indicate which libraries are in use at the time.
%ENV is a hash that holds all of the environment variables that Perl is aware of. These come directly from the shell. For example, if you say something in korn shell such as:
export ENVIRONMENTAL_VARIABLE=/home/install/bin
Then the Perl statement:
print $ENV{'ENVIRONMENTAL_VARIABLE'}; # This prints out '/home/install/bin'.
print keys %ENV; # This prints out your
# entire environment
Notice that, by the nature of the shell, if you modify an environment variable in Perl, you do not modify it in the original shell. All the changes that you made to the environment will be wiped clean as soon as the Perl script exits. This is sometimes annoying behavior, but is unavoidable due to the way that shells are put together.
However, if you modify the environment, you do modify it for any processes that the Perl script spawns off, as in:
$ENV{PERL5LIB} = "/home/edward/Perlwork";
system("Perlcall.p"); # this is a simple 'system' command
# fires off 'Perlcall.p' as a subprocess,
# and waits for it to return.
In this case, "Perlcall.p" will see the changes that you made to the environment.
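A self-contained demonstration; the variable name MY_SETTING is made up, the child process is another perl one-liner rather than a real script, and the quoting assumes a Bourne-style shell:

```perl
#!/usr/bin/perl

$ENV{MY_SETTING} = "hello from the parent";

# the child process inherits the modified environment;
# the \$ keeps Perl from interpolating $ENV before the child runs
my $output = `$^X -e 'print \$ENV{MY_SETTING}'`;

print "$output\n";   # hello from the parent
```

The child sees the new value even though the shell that launched the parent never will.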
Perl also provides a module, Env, that lets you see your environment as a bunch of scalars: $ENV{'SHELL'} becomes $SHELL, and so on. In other words:
use Env; # uses the Environmental module .
print $PERL5LIB; # prints out the environmental
# variable PERL5LIB.
prints out the environmental variable PERL5LIB, and:
$PERL5LIB = "/full/path/for/perl5lib";
sets the environmental PERL5LIB variable.
This sometimes makes code much cleaner, especially in modules that manipulate the environment a lot. These are true synonyms that Env sets up: if you undefine $ENV{PATH} you also undefine $PATH, and so on. You can do things such as:
use Env "HOME", "PATH";
which will only make a $HOME and a $PATH variable.
The %SIG hash holds all the signal handlers that the Perl script knows about. By default it is an empty hash. If you want to get a list of the signals that your system supports, you can simply use the 'Config' module (again, see the section 'Configuring Perl' for more information on Config).
This is done this way:
use Config;
print $Config{sig_name}; # prints out a list of signals that your system supports.
If you set any of the signals in that config list via:
$SIG{SIGNAL_NAME} = \&signal_function; # code reference.
Then, when the signal SIGNAL_NAME gets 'caught' by the process, the function 'signal_function' is called instead of the signal's default action. The usual use of signal handlers is to keep the process from dying when an interrupt is typed.
For example, if your machine supports an INT signal (both Windows and UNIX do this):
$SIG{INT} = 'IGNORE'; # special keyword, IGNORE, which means 'just skip it'.
If someone types a 'Ctrl-C' (or whatever sends an interrupt to the underlying process) now, the process will ignore that Ctrl-C. Likewise, if you set the handler to:
$SIG{INT} = \&hit_control_c;
sub hit_control_c
{
print "Ouch! Somebody hit a signal $_[0]\n"; # Note the $_[0] here.
} # if you print out this variable,
# it shows which signal
# was actually caught.
Now when someone types the interrupt sequence (Ctrl-C) it will print 'Ouch! Somebody hit a signal INT' and continue.
Perl also provides a special handler slot, __DIE__, for running a subroutine when the program is about to exit abnormally, and __WARN__, for when a warning message is printed. These are extremely helpful in debugging.
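For instance, you can route fatal errors through Carp's confess to get a full stack trace; a minimal sketch (the subroutine names inner and outer are made up):

```perl
#!/usr/bin/perl
use Carp;

# any die() will now produce a stack trace instead of a one-line message
$SIG{__DIE__} = sub { Carp::confess(@_) };

sub inner { die "something broke\n"; }
sub outer { inner(); }

eval { outer(); };   # trap the (now verbose) error
print $@;            # the message plus the call stack
```

The trace shows that the die happened in inner, called from outer, which is exactly the information a plain die message leaves out.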
Setting the __DIE__ handler to Carp's confess, for example, will give you a stack trace whenever your program exits with an error condition. For more on Carp, confess, and warn, see the section 'Debugging Perl'. |
Signals and underlying processes. Signal handlers do not automatically carry over to underlying processes. For example, if you set $SIG{INT} to 'IGNORE' and then run an external command with system(), pressing the interrupt character (Ctrl-C) while Perl is in the middle of that command will still interrupt the command, even though you told your own process to ignore the signal. |
Now we get to the somewhat hairy part of the chapter: the one-character variables which make perl so lovable to people who like to write terse code, and which turn off so many others.
If you belong to the second category, or are new to perl and think longer names would help you learn the language better, the module 'English' will come in very handy to give you an alternative. Just say:
use English;
and you will get the long variable names below (instead of the terse ones).
We have seen only one of these so far: $" ($LIST_SEPARATOR). This section discusses more variables like it: variables which change the internal state of some special perl function or operator.
I hesitated to include the variable $_ ($ARG), since it often times gets abused. However, its use is so prevalent that we have to say something about it.
$_ acts as a default variable: a variable, used by several functions and in several contexts, which you never see. It is being used when a function that should take an argument is called with none, as in the following statements:
print;
die if (-f);
or, when you see what looks like a regular expression, and it doesn't have a '=~' in it, as in:
if (m"whatever") { &do_something; }
This checks $_ to see if it has 'whatever' in it, and if it does, calls the routine 'do_something'. $_ also works with a foreach loop such as:
foreach (@args) { } # instead of foreach $arg (@args) { }
which goes through each element in @args, and sets $_ to it temporarily for the duration of the loop. Or as in:
while (<ARGV>) { print; }
where this is actually a small program, the equivalent of:
while (defined($line = <ARGV>))
{
print $line;
}
It is also used with grep and map:
@dirs = grep(-d, @fileList); # goes through @fileList, tests each element to
# see if it is a directory, and returns the
# list of directories into @dirs.
@value = map(ord, @charlist); # turns the characters in @charlist
# into their ascii values in @value.
where $_ is being used by grep and map as a temporary value to search through.
From this section you can see why $_ is attractive. With it there is no need to set an extra variable; syntax becomes shorter and quicker to write. But it is also extremely easy for $_ to become too cryptic. By explicitly setting $line, you force yourself to remember that '$line' means something; namely, a line in the file described by the filehandle ARGV.
If you get into the constant habit of using $_, you are going to trip up somewhere, as in the following:
use FileHandle;
foreach (@ARGV)
{
my $found = munge_files();
print; # print the current argument ($_)
}
sub munge_files
{
my $FH = new FileHandle("$_");
while (<$FH>) # this also assigns to $_!
{
return(1) if (m"pattern");
}
return(0);
}
This trips up since the 'while' line in the subroutine also reads into $_ for you, and this collides with the 'foreach' loop. The big question is: where do you draw the line? We say that you should use $_ in three cases. One is when you have a quick and dirty script to do, and you need to do it pronto. Two, in functions that require it, like grep and map. Three, inside a small to medium sized subroutine, where you always localize it, as in:
sub read_file
{
my ($fileName) = @_;
local($_);
my $return = [ ];
open(FD, "$fileName") || die;
while (<FD>)
{
push(@$return, $_);
}
close(FD);
return($return);
}
Even here you aren't gaining much, just one less variable. You also have to be aware that the operator push() doesn't default to using $_ as its argument.
The internal variable $" ($LIST_SEPARATOR) we have seen already. It provides some magic for arrays being interpolated inside quoted context. If an array is interpolated in a double-quoted string context:
@arrayName = ("Example", "Again");
$line = "@arrayName"; # $line becomes "Example Again" if $" is
print $line; # a space (the default)
then $" is used to determine what to put between the elements of the list, the default of which is a space. If we set this value to "\n\t":
local($") = "\n\t"; # $LIST_SEPARATOR = "\n\t"
@arrayName = ("Example", "Again"); # w/ 'use English';
print "@arrayName\n"; # prints "Example\n\tAgain"
Then notice we have both put the words on separate lines and indented them. This comes in extremely handy in two cases. One: if you have multiple patterns that you want combined into a regular expression:
local($") = "|"; # $LIST_SEPARATOR = "|";
@patternlist = ('pattern1','pattern2','pattern3','pattern4');
$string = "we want to match pattern4";
$string =~ m"(@patternlist)"; # this matches against
# (pattern1|pattern2|pattern3|pattern4)
# remember: regular expressions
# interpolate like double quotes!
The second case where it comes in really handy is code generation. Suppose you had a list of fields that corresponded to a table definition, as in:
my (@fieldList) = ('char a(50)', 'int b', 'text c');
By setting $" to equal, say, ",\n\t", you can say something such as:
local($") = ",\n\t"; # $LIST_SEPARATOR = ",\n\t"
my $tableName = "example";
my $table = "create table $tableName (\n\t@fieldList )";
print $table;
# This prints out
# create table example (
# char a(50),
# int b,
# text c )
Note that this is completely, syntactically correct SQL, and it is even prettified!
The same trick can be done for C and Perl:
$" = ";";
@declarationList = ('char a[50]', 'int intName');
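Completing that sketch, with ";\n" as the separator so each declaration lands on its own line (the variable names are the made-up ones above):

```perl
#!/usr/bin/perl

local($") = ";\n"; # $LIST_SEPARATOR = ";\n" w/ 'use English'
my @declarationList = ('char a[50]', 'int intName');
my $declarations = "@declarationList;\n";

print $declarations;
# char a[50];
# int intName;
```

The interpolation supplies the ';' between declarations; only the final one needs to be appended by hand.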
The other two variables, $, ($OUTPUT_FIELD_SEPARATOR or $OFS) and $\ ($OUTPUT_RECORD_SEPARATOR or $ORS), aren't nearly as useful. We list them here only to understand print a little better. The function print is a pretty odd duck. Since it is so commonly used, Perl provides several modifiers to how its arguments are actually printed. These three print statements will print different things, even though they are all printing the same array:
print "@arrayName";
print @arrayName;
print scalar(@arrayName);
The first example we have seen already (and interpolation isn't specific to print), and the third prints out the number of elements in the array (since 'scalar' forces the argument to be a scalar, print sees a list with one element).
The second, since print takes a list, does something different. Since it is printing @arrayName in list context, and not inside interpolated double quotes, print joins the elements with the '$,' variable instead. So if you want to use this construct, you have to remember to switch between $, and $".
$\ is a little more helpful. It indicates what string print will print after it finishes printing everything else. $\ is the empty string by default, but if you set it, you'll get something like:
local($") = "|";
local($\) = "$";
@arrayName = ('regexp1', 'regexp2');
print "@arrayName"; # prints regexp1|regexp2$
# Note the '$' at the end here...
But still, you can always print the '$' explicitly at the end, so this is only really useful in heavy duty code generation.
This variable, $/ ($INPUT_RECORD_SEPARATOR), is a lot more useful than its cousin $\ ($OUTPUT_RECORD_SEPARATOR). It indicates how the file descriptor reads its information. $/ is set to "\n" by default. For example:
local($/) = "\n"; # actually, the default.
open(FD, "$file"); # opens the file descriptor FD for information
$line = <FD>; # Now -- line contains the 'line' from FD
# UPTO and INCLUDING
# the $INPUT_RECORD_SEPARATOR
# "\n";
For example, if you have a file with the format:
THIS|IS|A|LINE|OF|DATA
THIS|IS|A|LINE|OF|DATA2
in which both of the lines of data are terminated by a "\n". After the statement above, you have read in
'THIS|IS|A|LINE|OF|DATA
'
with the record separator '$/' intact. If you want to get rid of that trailing return, you can do:
chomp($line);
chomp($line =<FD>); # reads stuff in, and automatically chomps
# off the '$/'.
chomp works on the '$/', hence it is safe. If, for some reason, you read in a line without a '$/' at the end, as in:
'NO INPUT_RECORD_SEPARATOR' # lacking a '\n';
then chomp does NOTHING. It ONLY removes a trailing '$/'. Its unsafe cousin chop, on the other hand, does chop off the 'R', giving:
'NO INPUT_RECORD_SEPARATO'.
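A quick side-by-side of the two:

```perl
#!/usr/bin/perl

my $with    = "ends in a newline\n";
my $without = "NO INPUT_RECORD_SEPARATOR";

chomp($with);         # removes the trailing '$/' ("\n" by default)
print "$with\n";      # ends in a newline

chomp($without);      # no trailing '$/', so this does NOTHING
print "$without\n";   # NO INPUT_RECORD_SEPARATOR

chop($without);       # chop is unconditional: off comes the 'R'
print "$without\n";   # NO INPUT_RECORD_SEPARATO
```

This is why chomp is almost always what you want when cleaning up lines you have just read.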
Suppose now you have a 'record' file, in which a different row delimiter is present:
'THIS|LINE
OF DATA|
HAS CONTROL NS IN IT| AND IS DELIMITED BY A CARET-RETURN ^
'
Then, you can set $/ to be
local($/) = "^\n";
and the construct:
while (defined $line = <FD>)
{
chomp($line); # gets rid of "^\n";
}
reads each logical line; everything up to and including the ^\n -- and for each line, gets rid of the row separator "^\n".
A final use of $/ is to suck in all the data from a file into one long string. We have already seen this in a couple of places. It is very powerful when combined with the regular expressions from the last chapter, when there isn't a special delimiter (like the one above). To do this, you undef $/.
Say, for example, we had a file of the format:
BLOCK1
this is a block of text
containing multiple lines.
BLOCK2
this is another block.
Then, we might be able to iterate through this using something such as:
undef $/;
open(FD, "$file");
$line = <FD>; # now '$line' equals ALL of file $file!
while ($line =~ m"BLOCK\d+\n(.*?)(?=BLOCK\d+|$)"sg) # now we loop through the
{ # file with the delimiter 'BLOCK'
do_something_with_line($1); # plus several digits.
}
This sets '$1' to:
'this is a block of text
containing multiple lines.'
first, and:
'this is another block.'
second, using the power of regular expressions.
The $/ construct can be an arbitrarily complicated string. It can be used to parse code, manipulate documents, manipulate database definitions, whatever. As long as the text is semi-readable, you can do whatever you want with it. We will have lots of examples of this later.
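Here is the same idea in a self-contained form, with the file replaced by an in-memory string so the whole thing can be run as-is:

```perl
#!/usr/bin/perl

my $text = "BLOCK1\nthis is a block of text\ncontaining multiple lines.\n" .
           "BLOCK2\nthis is another block.\n";

my @blocks;
while ($text =~ m"BLOCK\d+\n(.*?)(?=BLOCK\d+|\z)"sg)
{
    push(@blocks, $1);   # everything between one BLOCK marker and the next
}

print $blocks[0];   # the two 'BLOCK1' lines
print $blocks[1];   # the 'BLOCK2' line
```

The /s modifier lets '.' cross line boundaries, and the /g modifier makes the while loop walk through the string one block at a time.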
Variables that are set by function calls, and/or support various functions and operators ($`, $', $&, $$, $0, and $?)
The role of these variables is to support underlying perl functions so that the amount of syntax necessary in those functions is minimal.
Whereas the variables in the section above actually changed the way certain functions executed, the variables here go the other way round: they are set by various functions, and basically make the operations of those functions smoother. In the chapter on regular expressions (chapter 9) we already encountered a few: $` ($PREMATCH), $' ($POSTMATCH), and $& ($MATCH). Here is a review of each, along with a few new ones. As always, the 'English' names of the variables are given as well.
$` ($PREMATCH), $' ($POSTMATCH), $& ($MATCH)
$` ($PREMATCH), $& ($MATCH), and $' ($POSTMATCH) are set as 'side effects' of a regular expression match. They equal, respectively, the text preceding the match, the text that was actually matched, and the text after the match.
If you have the regular expression:
$line = "fee fie foe";
$line =~ m"fie";
Then $` becomes the string "fee ", $& becomes the string "fie" and $' becomes the string " foe".
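Putting that together as a runnable fragment:

```perl
#!/usr/bin/perl
# The three match variables after a successful match.
my $line = "fee fie foe";
if ($line =~ m"fie")
{
    print "prematch:  '$`'\n";   # 'fee '
    print "match:     '$&'\n";   # 'fie'
    print "postmatch: '$''\n";   # ' foe'
}
```

One caution worth knowing: once a program mentions $`, $&, or $' anywhere, Perl must save this information for every pattern match in the program, which slows down all of its regular expressions.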
$$ ($PROCESS_ID, or $PID)
$$ ($PROCESS_ID, or $PID) is the current process ID of the process running on your machine. This is incredibly useful for making temporary file names:
 $$");">
open(FD, "> /tmp/$$"); # open a temporary file, named after the PID,
# for writing.
The reason this is incredibly useful is that each running process on a machine has a unique process ID, so there is no chance of collision when several processes use $$ this way. In other words, if 'script.p' is the name of the program above that is opening a file, you can run:
prompt% perl script.p (PID 1123) (temporary file name /tmp/1123)
prompt% perl script.p (PID 1124) (temporary file name /tmp/1124)
prompt% perl script.p (PID 1125) (temporary file name /tmp/1125)
prompt% perl script.p (PID 1126) (temporary file name /tmp/1126)
and the operating system keeps the file names distinct for you. Again, this is only good on Windows NT and UNIX machines; Windows 95 does not have a concept of process IDs, and hence this will not work there.
$0 ($PROGRAM_NAME) (read as $ <zero>)
$0 (read as $<zero>) holds the name of the script as you ran it. For example, if you name your script:
script_name.p
in the directory
/home/utils
and run it by its full path, then the variable $0 becomes '/home/utils/script_name.p'.
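If you only need one part of that path, the standard File::Basename module (which ships with Perl 5) can split $0 apart. A quick sketch:

```perl
#!/usr/bin/perl
use File::Basename;    # standard module for splitting path names

# Recover the directory and the bare file name of the running script.
my $dir  = dirname($0);
my $name = basename($0);
print "running '$name' out of directory '$dir'\n";
```

For instance, dirname("/home/utils/script_name.p") is "/home/utils", and basename() of the same string is "script_name.p".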
This is extremely useful for modules that need to know where they were called from. We will use this logic when we build our automatic documentation handler later on. For example, suppose you want to make a module that keeps track of what was run, and when. You could simply do something such as:
package ModuleTrak;
use FileHandle; # provides 'new FileHandle'
sub log
{
my $fh = new FileHandle(">> process_log") || die "Couldn't append to process_log\n";
my $date = localtime;
print $fh "$0 $date\n";
}
1;
Then, on any call to:
ModuleTrak::log();
Perl will automatically keep track of the program that was run, without the package having to be 'passed' that information.
$? ($CHILD_ERROR)
$? holds the exit status of the last system() call, backtick command, or pipe close. Usually it is used to check whether a given child process failed:
system("cd $ROOT");
if ($?)
{
print "Error in changing to $ROOT!\n";
}
@files = `ls -1 $DIR`;
if ($?)
{
print "Error in getting files from $DIR!\n";
}
which flags that an error indeed happened; the child's exit status is saved in $? so that you can examine it.
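$? is actually a packed value: the child's own exit code lives in the high byte, and the low bits record any signal that killed it. This sketch (mine, not from the text) pulls the pieces apart; it spawns a copy of perl itself, via the $^X variable, purely so the example is self-contained:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a child that deliberately exits with status 3, then decode $?.
system($^X, '-e', 'exit(3)');

my $exit_value = $? >> 8;      # the value passed to exit() in the child
my $signal_num = $? & 127;     # 0 unless the child was killed by a signal
print "exit value $exit_value, signal $signal_num\n";
```

So `if ($?)` catches any failure, while `$? >> 8` tells you the specific exit code the child reported.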
Summary of Internal Variables
Again, as was the case with the internal functions we mentioned above, we have only given the most common internal variables. There are many others, some with exotic names simply because Perl ran out of common keys on the keyboard ($^X holds the name of the perl executable, for instance).
A quick scan of the perlvar manpage shows that there are about 60 in all. So again, do not take this chapter as gospel; go to the online documentation for the full list.
Summary of Chapter
The main purpose of this chapter was to supplement the documentation and gather the most important special variables and built-in functions in one place, explaining them in more detail, especially with regard to portability. Once you are used to the amount of detail here, you can go on to the perlvar and perlfunc man-pages for more information, as well as the Perl for Win32 documentation.
If I were to summarize what I think are the most important functions to know (in a chapter of important functions), I would say that a good plan of attack for learning Perl functionality is to:
a) learn about variables and contexts first (see chapter on Perl variables)
b) learn split, map, grep, and sort
c) learn caller and Carp
d) learn the Perl interfaces to the OS (mkdir, rmdir, etc.) for portability.
e) use the rest of the variables/functions as you see fit.
This approach emphasizes learning Perl in a 'Perlish' way. split, map, grep, and sort are the functions most distinctive to Perl; if you learn them, you will easily pick up the other functions in their wake.
HTML conversions by Mega Space.
This page updated on October 14, 1997 by Webmaster.
Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.