© 1997 The McGraw-Hill Companies, Inc. All rights reserved. Any use of this Beta Book is subject to the rules stated in the Terms of Use.
This chapter concerns a very basic strength of Perl, and one reason it has grown so popular as a data manipulation language: its policy towards variables. This chapter is not meant to be a complete reference; for that, consult the perldata and perlvar manpages.
Perl looks at variables just a little bit differently than other languages do. Languages such as LISP give one or two basic types of variables, and a few functions for their manipulation. This is the minimalist approach, in which the programmer is expected to have the knowledge to create from only a few building blocks.
Other languages, such as C, take the opposite tack. The number of variable types in these languages is large (int, float, double, long, short, char), so the process of manipulating these variables can get quite complex.
Perl tries to strike a balance between minimalist and complex. Perl's overall philosophy is pragmatic: it hides the complexity of having a thousand data types, but realizes that giving one datatype to the user may be a little restrictive. Therefore, Perl has settled on four basic datatypes, each of which is admittedly a compromise between elegance and functionality. These data types have been hammered out over ten years of hard experience. The amount of functionality that these four datatypes cover is quite impressive. The types are:
Scalars - a single chunk of data
Arrays - a bunch of scalars, indexed by number
Hashes - a bunch of scalars, indexed by a scalar called a key
Handles - a name through which a program reaches resources provided by the operating system (files, directories, sockets, etc.)
This chapter is designed to teach you 'successful wrangling' of these datatypes. Once you get used to thinking in scalars, arrays, hashes, and handles and start using them correctly, you can do some pretty cool things.
As there are four different types of variables in Perl, it may come as no surprise that there are four major parts to this chapter. However, some issues arise from the fact that there are only four datatypes available, so each section on variables is divided into subsections.
First, we shall talk about the scalar, and its role as the 'universal Perl datatype'. We shall see how it differs from the variables provided by languages such as C, and talk about a technique called interpolation, which allows the scalar to become the 'universal Perl datatype'. We then consider some of the more common operations used to manipulate scalars.
Second, we talk about the array, which is Perl's way of representing an ordered grouping of scalars, indexed by number. We go over its philosophy, and how you can manipulate arrays via built-in functions.
Third, we go over the hash, which lets you keep data in Perl as a dictionary does: in a set of key-value pairs. We go over functions to manipulate individual key-value pairs, or to manipulate the hash as a whole.
And finally, we go over handles, which let Perl interact with data external to the Perl program, such as files and directories.
That, and a section of examples, make up the bulk of this chapter. But first, let's take a look at the 'big picture' we outlined above in a little more detail.
So what makes Perl so special? What sets its variables so far apart from those of other languages? And why do data manipulation tasks such as generating HTML pages come so naturally to Perl? A large part of the answer is the inherent ease of Perl's variables. You need not declare variables as you would in C or FORTRAN. The following is a no-no:
Scalar $scalarName;
Array @arrayName;
Hash %hashName;
Handle HandleName
which is a common beginner's mistake for C programmers coming to Perl. Perl variables carry their datatype alongside them (so to speak): scalars are always prefixed with '$', arrays with '@', and hashes with '%'. If you have something that you know is a variable, and it doesn't have a special character in front of it, then you know it is a filehandle. Or, in a concise table:
Table 3.1
Special Character  What it Denotes  Example
'$'                Scalar           $number = 123.44;
                                    $string = 'aaaa';
'@'                Array            @numberArray = (1,2,3);
                                    @stringArray = ('elmt1', 'elmt2', 3);
$<var>[ ]          Array Element    print $stringArray[2];          # prints 3
                                    $stringArray[4] = 'newstring';  # sets 5th element
'%'                Hash             %hashName = ('key' => 'value', 'key2' => 'value2');
$<var>{ }          Hash Lookup      print $hashName{'key'};         # prints 'value'
                                    $hashName{'key3'} = 'value3';   # sets 'key3'
There are other things that you need not worry about with Perl variables. You don't need to worry about memory-management with Perl. Perl takes care of that for you. Variables grow on demand. Perl variables also hold any type of data, so you need not worry about data being truncated or mutilated in any way.
Finally, Perl provides a robust set of internal functions that help you manipulate your data. Programming in Perl sometimes feels like you are cooking (yes, in a kitchen). You 'chop' variables, 'split' them and 'splice' them. You 'shift' them and 'map' them. You 'join' them, and 'chomp' them. *
* At my work, we have even sort of blurred this line, because once you have the analogy in your head, you inevitably go too far and start calling your data directories 'medium_to_well'.
So let's take a closer look at each of these datatypes, and go over them in detail.
Although there are four built-in datatypes in Perl, there is only one basic datatype in Perl: the scalar. Scalars are simply blocks of information. These blocks of information may represent integers, strings, floats, binary information or whatever else the programmer dreams up. Scalars are denoted by a '$' (dollar sign).
Scalars are one of the reasons why Perl is as powerful as it is. As stated above, you needn't worry about how big to make your scalar. Perl will make variables as big as necessary for you, even if your scalar holds a file several megabytes in size. Pictorially, you can think of a scalar as a whole entity that hides the details of how it is stored, as in Figure 3.1.
[Figure 3.1]
In contrast, a traditional variable (something like C's char, and how it is used to make strings) might look something like:
[Figure 3.2: A string in C]
Therefore, when programming in C, the programmer needs to worry about such low-level details, whereas in Perl the details are hidden, so you need not worry. To assign a value to a scalar, or one scalar to another, you simply say:
$string1 = "This is a scalar";
or
$string1 = $string2;
Scalars allow programs to be much shorter and less complex for several reasons. One reason is that there are far fewer issues with boundary conditions. A C program for copying one string into another might look something like:*
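#include <stdlib.h>
#include <string.h>

/* Copy string2 into freshly allocated memory, one character at a time. */
char *copy_string(const char *string2)
{
    char *string1 = malloc(strlen(string2) + 1);  /* +1 for the '\0' */
    int xx = 0;
    while (string2[xx] != '\0')
    {
        string1[xx] = string2[xx];
        xx++;
    }
    string1[xx] = '\0';
    return string1;
}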
because of the explicit need in C to worry about each character. Note also that the loop stops at the first null (\0) character, so a string containing an embedded null is silently truncated.
* C and its brethren (C++, etc.) are really easy to pick on since they are so low-level, and this isn't totally fair. These languages evolved to handle different concerns, and C is concerned with making things as fast as possible; the above piece of code is fast. So if you are a C/C++ programmer, take such comparisons with a grain of salt.
This points to the second thing that you need not worry about in Perl: type inconsistency. C has floats, doubles, unsigned chars, chars, longs, shorts, etc. What's the difference between a float and a double? How the variable is stored on the computer, and therefore how much precision it has.
Perl deems this sort of information too low-level for the programmer to worry about. Therefore, scalars are floats, doubles, chars, longs and shorts, all at the same time. In C, when you say something like:
int yy = 32767;
yy++;
what do you get? Well, depending on the compiler you are working with, you get one of:
1) -32768: the integer wrapped around because it was only 16 bits wide.
2) 32768: the integer is 32 or 64 bits wide, so you have plenty of room to grow.
3) a program crash: your program aborts with some sort of cryptic message.
If you hit number 3, what is probably going on is the same as case 1, but the compiler is nice enough to die for you instead of silently corrupting your data. (Strictly speaking, signed integer overflow is undefined behavior in C, so the compiler is allowed to do any of these.)
You needn't worry about this sort of thing in Perl. After $xx = 32767; $xx++; the value of $xx is always 32768. If your integers grow too large for the machine's native integer type, Perl quietly switches to floating-point numbers instead of wrapping around.
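If you want to see this on your own machine, here is a minimal sketch. It is written for a modern Perl 5 with strict and warnings turned on (which the examples in this chapter otherwise don't assume), and $big and $bigger are just illustrative names. Exactly where Perl leaves native integers behind depends on how your perl was built, but the product below is too large for any native integer type.

```perl
use strict;
use warnings;

# The C example above, redone in Perl: no 16-bit wraparound.
my $xx = 32767;
$xx++;
print "$xx\n";                     # prints 32768

# Far beyond the integer range, Perl carries on in floating point.
my $big    = 4_000_000_000;
my $bigger = $big * $big * $big;   # 6.4e28 does not fit in any native integer
print "$bigger\n";                 # prints 6.4e+28
```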
There are other benefits to the scalar as the universal type. There are no type keywords to memorize, and no need to worry about variable names conflicting with keywords. Scalars allow long variable names and data of practically any length. There is no need to do anything special to store binary data or special characters in a scalar, as there is in so many other languages.
Finally, there is nothing special to be done with embedded null characters. In other words, scalars allow almost total freedom to program effectively and efficiently. Of course, with this freedom must come some responsibility. If you aren't careful, you could load that 100 MB file into one gigantic scalar, and cause your machine a seriously bad day. There are also a few caveats you should be aware of, because Perl has made the ubiquitous scalar what it is through a compromise between elegance and pragmatism.
Overall, the loose typing of variables allows a lot of leeway in design. For example, if you think something is going to be an integer, and it turns out to be a float, usually no changes to the code are needed. There are a few exceptions to this rule, which we will discuss in the chapter on Conditionals.
The syntax for creating a scalar is:
$variableName = value;
where value may be:
1) numeric (integer or float)
2) a string
3) a reference (scalar, array, hash, code, or package)
4) a true or false value (Perl has no separate Boolean type or True/False keywords; truth is discussed below)
The variable name can be any legal word string (a-z, A-Z, 0-9 or _) that starts with a letter or an underscore. Perl is case sensitive, so be sure to settle on a naming convention for the programming project.
Perl is similar to shell scripting when assigning values to a variable. Single quotes (') indicate that the text is to be used verbatim, and double quotes (") indicate that the text is to be interpolated, meaning that variables and backslash escapes inside it are expanded.
Following are some simple examples of scalars (note the pound sign # used for comments):
$scalar1 = 'this is a scalar';  # simple scalar assigned 'this is a scalar'
$print = 10.0;                  # the number 10.0, which prints as 10
$print = '10.0';                # number that is also a string, assigned '10.0'
$scalar2 = "this is $print"; # interpolation, assigned 'this is 10.0'
$scalar3 = 'this is $print'; # non-interpolation, assigned 'this is $print'
$emptyString = ''; # empty string.
We may be getting ahead of ourselves here, but the above example shows a couple of things about Perl. One is that quotes are important. If you say something like:
$line = '$';
then you are using single quotes, and $line gets the value '$' (a dollar sign). If, on the other hand, you use double quotes, as in:
$print = 10;
$line = "this is $print";
then $print is evaluated as a variable, and $line gets the value 'this is 10'. This is what is called interpolation, and we shall have a lot more to say about it in the chapter on Contexts. Interpolation allows for a very natural way to make scalars out of other scalars by blocking them together. Here are some more complicated examples of assigning values to variables in Perl:
$octalData = "\07\04\00\01";              # octal escapes (\01 is the character with code 1)
$binaryData = "\x0B\x0F\xAC\xBC";         # hexadecimal escapes (\xAC is the byte 172)
$LooksLikeBinaryDataButIsnt = '\x0b\x0f'; # value is \x0b\x0f (single quotes)
$ReadFromFile = <FD>;                       # Filehandle read (one line from the file).
$AssignSingleQuote = '\'';                  # backslashing (\') provides the
                                            # ability to assign a single quote
$AssignDoubleQuote = "\""; # assigning a double quote by \"
$assignTabsNewlinesBackslashes ="\t\n\\"; # \t = tab, \n = newline, \\ = backslash
$MultiLineString = ' Example: this shows
how Perl can be assigned a multi-line string.
'; # Example.. multi line string.
$4illegalVariableName1; # Illegal variable starts with number
$Illegal%VariableName2; # Illegal -- has a % in it.
$return = "return value"; # 'return' is keyword nonetheless
# $return is a legal variable.
These examples show some of the ways that Perl handles such dilemmas as making a variable equal a quote, handling binary and octal data, making 'return characters', and so forth. Perl has one general rule for making special characters:
If you affix a backslash to the beginning of something, you can make it a special character.
Hence, inside double quotes, "\n" becomes a newline, "\t" becomes a tab, and "\r" becomes a carriage return. If you want a single quote inside single quotes, you say '\''.
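The difference between the two kinds of quotes is easy to check with length. This is a small sketch assuming a modern Perl 5; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $double = "a\tb\n";    # double quotes: \t and \n become a tab and a newline
my $single = 'a\tb\n';    # single quotes: the backslashes stay as-is
my $quote  = 'It\'s';     # \' still works inside single quotes
my $cr     = "\r";        # carriage return

print length($double), "\n";   # prints 4
print length($single), "\n";   # prints 6
print "$quote\n";              # prints It's
```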
These examples also show another useful thing that Perl inherited from the shell: the ability to refer to variables completely unambiguously, without fear of conflicting with keywords. Since Perl attaches the type of the variable to the variable itself, $return is a variable, distinct from the keyword return. Hence, if you really want to, you could say something like:
if ($if)
{
print $print;
} elsif ($elsif)
{
chop($chop);
}
and Perl will understand what you are doing, even if you don't. After all, $if, $elsif, $chop, and $print are just variables.
This mutability, this ability to make variables that have the same name as keywords, can confuse programmers who are used to programming in low-level languages such as C. Indeed, there are other caveats to remember about scalars, which we will now look at.
Scalars as the 'Generic Datatype'
A question that may be going through your mind is: 'Exactly how does Perl make scalars work overtime, that is, make them work as floats, ints, doubles, and strings?' Well, Perl simply decides on the right interpretation based on the context that the variable is in. If a variable is in a place where it looks like it should be interpreted as a string, it is interpreted as a string. If a variable is in a place where it looks like a number, it is interpreted as a number, and so forth.
This idea of context runs all the way through the language. In fact, we discuss it in detail in chapter 6 on contexts.
Now, the following three rules approximate the behavior that Perl follows when interpreting a scalar, but these rules are not mutually exclusive. In other words, you can have a STRING that is also a NUMBER which is also a BOOLEAN. See the perldata manpage if you want the complete reference.
STRING Interpolation:
A scalar is interpreted as a string if it is an operand of a string comparison operator, an argument to a built-in function requiring a string, or part of the "" or '' quoting syntax. A list of places where scalars are interpreted as strings is given in Table 3.2:
Table 3.2
Where                   Symbols               Example
quotes                  '', ""                $string = 'a'; $string = "b";
comparison operators    gt, lt, eq, cmp, ne   '9' gt '10', '1.1' ne '1'
dot operator            .                     $a = 1; $b = 2; print $a . $b;       # prints 12
hash keys               { }                   $hash{10.0} = 1; print $hash{10.0};  # prints 1
Let's take a look at a few of these elements more closely, for they tell a lot about how Perl can get away with having only one basic datatype.
quotes ('' and "")
Quotes go a long way toward telling how a variable is being interpreted. When you see quotes, think of strings. One of the things that newcomers to Perl get confused about is that:
print "equal\n" if ('9.0' eq '9');
is not the same as:
print "equal\n" if (9.0 == 9);
This is because the quotes are a dead giveaway that '9.0' is to be interpreted as the string 9.0, rather than the number 9. The string '9.0' does not equal the string '9' (they aren't the same characters), but the number 9.0 does numerically equal 9, so only the second line prints 'equal'.
the string comparison operators (gt, lt, eq, cmp, ne)
The operators gt, lt, eq, cmp and ne do string comparisons. gt means 'greater than', lt is 'less than', eq is 'equal', and ne is 'not equal'. ('cmp' is a special operator used for sorting; don't worry about it right now.) These operators compare their operands character by character, in dictionary-like (strictly, character-code) order, and any scalars attached to them are interpreted as strings. If you see some code like:
print "yep.. aaa is greater than bbb\n" if ('aaa' gt 'bbb');
then Perl is interpreting both 'aaa' and 'bbb' as strings, then checking to see if 'aaa' is greater than 'bbb' (gt). It is not, so the condition is false and nothing is printed. If you say:
print "9.0 does not equal 9" if ('9.0' ne '9');
then you might not get what you expect, for the string '9.0' doesn't equal the string '9', and this message will always get printed. Likewise:
print "10 isn't always bigger than 9\n" if ('10' lt '9');
evaluates to true, since '10' is smaller than '9' when compared as strings: the first characters, '1' and '9', are compared first, and '1' comes before '9'.
There is a caveat here, however. If you don't have quotes around a number, it will be interpreted as a number first, and converted to a string second. For example:
if (9.0 eq 9) { print "HERE!\n"; }
is true because the numeric literal 9.0 is simply the number 9, which becomes the string '9' before the string comparison. Again, I refer you to the perldata manpage for more detail.
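To gather the comparisons from this section in one place, the sketch below turns each test into a 1 or a 0 and prints them side by side. It assumes a modern Perl 5 with strict and warnings; @results is just an illustrative name.

```perl
use strict;
use warnings;

# Each test yields 1 (true) or 0 (false) so the results print cleanly.
my @results = (
    ('9.0' eq '9')   ? 1 : 0,   # 0: different characters
    (9.0 == 9)       ? 1 : 0,   # 1: the same number
    ('aaa' gt 'bbb') ? 1 : 0,   # 0: 'aaa' sorts first
    ('10' lt '9')    ? 1 : 0,   # 1: '1' sorts before '9'
    (9.0 eq 9)       ? 1 : 0,   # 1: the number 9.0 stringifies as "9"
);
print "@results\n";             # prints 0 1 0 1 1
```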
The '.' operator.
The '.' symbol denotes the concatenation operator in Perl. The operator takes two scalars, and combines them together in one scalar. As such, both of the scalars to the left and right are interpreted as strings in this scenario. Hence:
$line = $line . 'end of the';
appends the string 'end of the' to the end of $line.
$nineHundredTen = '9' . '10';
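is the string '910', and not 9.10 as the unwary might expect, since the nine and ten are being concatenated. However, note that this is not the same as:
$ninePointTen = 9.10;
where, with no spaces, the dot is a decimal point rather than the concatenation operator, so the value is the number 9.1. (With spaces, as in 9 . 10, the dot is the concatenation operator again, and the result is '910'.)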
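Here is a small sketch contrasting the three cases; $spaced is a name made up for this example, and the code assumes a modern Perl 5.

```perl
use strict;
use warnings;

my $nineHundredTen = '9' . '10';   # concatenation: the string '910'
my $ninePointTen   = 9.10;         # no spaces: a decimal point, the number 9.1
my $spaced         = 9 . 10;       # spaces around the dot: concatenation again
print "$nineHundredTen $ninePointTen $spaced\n";   # prints 910 9.1 910
```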
NUMBER interpolation:
A scalar is interpreted as a number if it is an array subscript, an operand of a numeric operator or numeric comparison operator, or an argument to a built-in function requiring a number.
In other words, if a scalar looks like it should be interpreted as a number, it generally is interpreted as a number. Table 3.3 has a list of places in which a scalar is interpreted as a number:
Table 3.3
Where                   Symbols                     Example
numeric operators       +, -, /, *, **, % (mod)     "45" + 23; 5 % 4; "error" * 2
numeric comparisons     >, <, <=, >=, ==            '1e+23' > '5e+22'; 2344.32 == "2344.32"
array subscripts        [ ]                         $array[15]; $array["aa"]
built-in functions      int(), localtime()          int("1111.5"); localtime(34234234)
Now, the main thing to recognize here is that in expressions like $array["aa"] and "error" * 2, Perl converts the non-numeric strings to the number 0 for you. A string such as '2344.32', on the other hand, is in a numeric format, so Perl converts it to the number 2344.32. (Beware of something like "error"/"error": it becomes 0/0, which dies with an 'Illegal division by zero' error.)
Perl has quite a few numerical formats that it recognizes. The only rule that you need to remember here is not to use quotes, or you will end up with strings rather than numbers, and they may or may not be converted into numbers when necessary for you! Table 3.4 shows the formats:
Table 3.4
Number        Type                        Auto-converted from a string?
114123        integer                     converts: "114123" == 114123
1123141.111   float                       converts: "1123141.111" == 1123141.111
1_344_454     integer (with underscores)  does not convert: "1_344_454" != 1344454
0x3FFFFF      hexadecimal                 does not convert: "0x3FFFFF" != 0x3FFFFF
0x3ffff       hexadecimal                 does not convert: "0x3ffff" != 0x3ffff
03777         octal                       does not convert: "03777" != 03777
5.432e+300    exponential                 converts: "5.432e+300" == 5.432e+300
5.224E300     exponential                 converts: "5.224E300" == 5.224E300
Remember, '1_000_000' is not the same as 1_000_000. The quotes make all the difference in the world. (The underscores make the number easier to read. Don't use commas, as they have special significance in Perl as the list separator.) Note the quotes again; 1_000_000 denotes the number one million, and '1_000_000' denotes the string 1, followed by an underscore, three zeros, another underscore, and three other zeros. Hence:
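print "equal\n" if (1000000 == 1_000_000);
always prints, since the underscores are simply there to make the number easier to read. Whereas:
print "equal\n" if (1000000 == "1_000_000");
never prints, since "1_000_000" is a string because of the quotes, and when it is forced into a number, the conversion stops at the first underscore and yields 1.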
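If you want to check Table 3.4 against your own perl, the sketch below compares each quoted string with the matching literal and shows what the string turns into as a number. It assumes a modern Perl 5; the numeric warnings are switched off on purpose, since several of these strings are deliberately not numeric, and @pairs is just an illustrative name.

```perl
use strict;
use warnings;
no warnings 'numeric';   # these strings trigger "isn't numeric" warnings on purpose

# Each pair is a quoted string and the matching unquoted literal.
my @pairs = (
    [ "114123",     114123     ],
    [ "1_344_454",  1_344_454  ],
    [ "0x3FFFFF",   0x3FFFFF   ],
    [ "03777",      03777      ],
    [ "5.432e+300", 5.432e+300 ],
);

for my $pair (@pairs) {
    my ($string, $number) = @$pair;
    my $verdict = ($string == $number) ? "converts" : "does not convert";
    # $string + 0 shows what the quoted string becomes as a number
    printf "%-12s %-16s (string becomes %s)\n", $string, $verdict, $string + 0;
}
```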
Following are more examples of variables interpreted as numeric values:
$scalar = $scalar +5; # Easy, add 5 to the variable
if ($biggerScalar > $lesserScalar) { print "$biggerScalar is bigger than $lesserScalar\n"; }
$onehundred = 10**2; # ten to the second power is 100.
$five = $two + $two; # math in '1984'
$subscript = "1";
print $array[$subscript];
The last code example is a bit tricky. The quotes should tip you off that the thing you are looking at is a string, but since $subscript is used in a context which takes a number, the $subscript is forced to become a number. This evaluates as
print $array[1];
The same thing happens with expressions such as:
$sum = "111.00" + 12;
The '+' turns the string 111.00 into a number, so $sum becomes 123. Likewise with:
$value = "non_number" + 1231;
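you are forcing a string that is normally not a number to become a number. A string that doesn't begin with something numeric becomes zero when so forced, so $value ends up as 1231.*
* This is a caveat that you can catch by using the '-w' (warning) flag. If you run the script as in perl -w yourscript (where yourscript stands for the name of your script), then Perl warns you that the argument "non_number" isn't numeric. Refer to the section 'debugging Perl' for more information.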
Confusing? Well, yes. The best way to get over any awkwardness and avoid mistakes is to always use the '-w' flag in your scripts. '-w' will notify you about many potential problems with your scripts.
After you get used to this automatic conversion, it becomes quite natural. The best practice for people new to the language is to get used to using '-w' and looking at the output it makes.
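As a sketch of what '-w' buys you, the code below uses 'use warnings' (the in-file equivalent of -w) and collects warnings through the $SIG{__WARN__} hook instead of letting them go to the screen. The variable names are hypothetical, and the exact wording of the warning may vary between versions of Perl.

```perl
use strict;
use warnings;                      # the in-file equivalent of running perl -w

# Collect warnings instead of printing them, so we can inspect them.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $price = "111.00";
my $label = "non_number";
my $sum   = $price + 12;           # a numeric-looking string: 123, no warning
my $value = $label + 1231;         # not numeric: treated as 0, so 1231, with a warning

print "sum=$sum value=$value\n";
print "warning: $_" for @warnings;
```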
BOOLEAN interpolation:
Please consider in detail what happens when you say something such as 'if ($a gt $b)'. A scalar is interpreted as a BOOLEAN if it is part of a conditional clause. Conditional clauses in Perl include constructs such as if, unless, while, until, and do-while. If the scalar is the empty string, the string "0", the number 0, or the special value 'undefined' (see below), it corresponds to FALSE.* Any other value, including any reference, corresponds to TRUE.
* This is a gross simplification, but one that will do until we give it more treatment in the section on 'Operators'. For an example of what can lie in wait for Perl programmers, consider the value "0.000". Is this false? It certainly looks false, because "0.000" looks like it is numerically 0. However, remember that strings are denoted by the quotes ("). Hence, "0.000" is simply a string that looks like zero, but it is neither the empty string nor the string "0" as far as Perl is concerned. By our definition, therefore, "0.000" is true, and if you say something such as:
if ("0.000") { print "This is true\n"; }
then this will always print 'This is true'. Something to think about.
Following are examples of variables being interpreted as Boolean values:
my $nameOfScalar = "something";
if ($nameOfScalar)
{
print "True";
}
This example always evaluates to true, since $nameOfScalar has a length greater than 0.
$scalar = "";
if ($scalar)
{
print "This never gets here";
}
This will never perform the code inside the block, since $scalar is the empty string and therefore false.
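A quick way to internalize these rules is to run a handful of values through a conditional. This sketch assumes a modern Perl 5; the candidate list is arbitrary and includes the "0.000" trap.

```perl
use strict;
use warnings;

# Candidate values; only '', '0', 0 and undef should come out FALSE.
my @candidates = ('', '0', 0, '0.0', '0.000', '00', ' ', 'something', undef);

for my $value (@candidates) {
    my $label = defined $value ? "'$value'" : 'undef';
    printf "%-12s is %s\n", $label, ($value ? 'TRUE' : 'FALSE');
}
```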
Scalars can be interpreted as strings, numbers, or Booleans (true or false). Generally, if something looks like a number, it is a number; if it has quotes around it, it is interpreted as a string; and if it is in a comparison (<, ==, >, eq, ne, lt, gt), the value that comes out of it is a Boolean.
The main thing to watch out for with loosely typed variables is STRINGS being mistaken for NUMBERS. If you say something such as:
$varb = "hello" + "goodbye";
then you don't get "hello goodbye", you get 0, because '+' forces "hello" and "goodbye" to be numbers, and this expression evaluates as 0 + 0, which equals 0. (To join strings, use the '.' operator.) The best way to catch such beasts is with the warning flag ('-w').
Again, all this complication, rules, and confusing mass of steps that Perl goes through can be mostly avoided if you use '-w' in your scripts. This way, you learn by easy experience rather than hard knocks.
Perl invests a lot of its functionality in scalars, and provides quite a few functions and operators to manipulate them. Here are some of the most widely used. This is by no means an exhaustive set, but surprisingly, these functions should suffice for 90% of the operations that you would ever want to perform on scalars.
length is a built-in function which gives the length of a scalar.
$scalar = "Sample scalar";
$length = length($scalar); # $length becomes 13.
In this example, the function length counts the characters in $scalar, including the space.
The following shows the built-in function chop, which chops a character off the end of a scalar:
$scalar = "HO\n";
chop($scalar);      # $scalar is now 'HO'
chop($scalar);      # $scalar is now 'H'
$scalar = 12323;
chop($scalar);      # $scalar is now 1232 (chop works on numbers, too!)
In each case, you end up with a scalar that is one character less than what went in. Note that you do not say something like:
$scalarName = chop($scalarName);
This simply doesn't work, because it returns the character that was chopped rather than the abbreviated string.
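Because chop returns the removed character rather than the shortened string, it helps to see both at once. A minimal sketch, assuming a modern Perl 5; $removed and $number are illustrative names.

```perl
use strict;
use warnings;

my $scalar  = "HO\n";
my $removed = chop($scalar);   # chop returns the character it removed (here "\n")
print "[$scalar]\n";           # prints [HO]
chop($scalar);                 # $scalar is now 'H'

my $number = 12323;
chop($number);                 # works on numbers too: $number is now 1232
```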
chomp is another built-in function which chops characters off of scalars. It is an intelligent chop, usually used for getting rid of newlines at the ends of lines of input. It only removes a trailing 'special character' string, which by default is "\n" (a newline). So a statement such as:
$scalar = "This has a newline\n";
chomp($scalar);
leaves $scalar equal to "This has a newline". In other words, chomp gets rid of the newline. However, if you say:
$scalar = "This doesn't have a newline";
chomp($scalar);
then the original contents of $scalar remain the same, because there is no special character to chomp from the end.
See the difference? chomp is safer, because chop would end up with "This doesn't have a newlin" instead. It is all controlled by a special variable, $/, which contains the characters that chomp removes. This can be set to any value you want, as in:
$/ = "AAAAA";
$scalar = "ChoppingAAAAA";
chomp($scalar);
print ($scalar); # prints 'Chopping'
The variable '$/' does quite a bit more than this. It is called the input record separator, and we will have more to say about it in the section 'special variables and operators'.
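The following sketch (modern Perl 5 assumed) shows chomp's return value, the do-nothing case, and a custom $/. Note the use of local, which restores $/ automatically at the end of the block; that is a precaution added here rather than something the examples above use, but changing $/ globally affects every later read in your program.

```perl
use strict;
use warnings;

my $with    = "This has a newline\n";
my $removed = chomp($with);          # chomp returns how many characters it removed
my $without = "This doesn't have a newline";
chomp($without);                     # nothing matches $/, so nothing changes

my $custom = "ChoppingAAAAA";
{
    local $/ = "AAAAA";              # change the separator only inside this block
    chomp($custom);                  # $custom is now 'Chopping'
}
print "$with|$without|$custom\n";
```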
A scalar is considered to be a 'whole', or a unit. You can't simply reach in and grab parts of a scalar. For example, the following will not work:
$scalarName = "dog";
print $scalarName[0]; # will NOT print 'd'.
because $scalarName[0] represents an element of the array @scalarName, not the first character of the scalar $scalarName.
Perl provides the substr function (substring) to extract parts of a scalar. If you wanted to get the first character of a scalar, you would say something like:
$scalarName = "dog";
$letterD = substr($scalarName,0,1);
which is read as 'starting at the zeroth (first) letter in $scalarName, take one character, and stuff it into scalar $letterD'.
If you want, you can also use the 'short' form of substr, leaving off the length, and take, say, the sixth and following characters (everything from offset 5 on):
$vowels = "AEIOUANDSOMETIMESY";
$ANDSOMETIMESY = substr($vowels, 5);
print $ANDSOMETIMESY; # prints 'ANDSOMETIMESY'.
This prints 'ANDSOMETIMESY'.
substr is infinitely useful. Let's say you want to chop the last two characters off the scalar $chop:
substr($chop, -2) = '';
which is to say that negative offsets are OK: they count backwards from the end of the string. (Here substr is used on the left side of an assignment, which replaces that part of the string.)
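Here are the substr forms from this section gathered into one runnable sketch, assuming a modern Perl 5. The $tail variable is an extra, hypothetical example of a negative offset used for reading rather than for assignment.

```perl
use strict;
use warnings;

my $scalarName = "dog";
my $letterD    = substr($scalarName, 0, 1);   # 'd': offset 0, length 1

my $vowels        = "AEIOUANDSOMETIMESY";
my $ANDSOMETIMESY = substr($vowels, 5);       # offset 5 to the end

my $tail = substr("abcdef", -2);              # negative offset counts from the end: 'ef'

my $chop = "abcdef";
substr($chop, -2) = '';                       # substr as an lvalue: $chop is now 'abcd'
```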
uc() returns an uppercase version of the string that you give it. For example, if you say something like:
$name = uc("Huey");
print $name;
this prints "HUEY"
ucfirst() returns a capitalized version:
$name = ucfirst("huey");
print $name;
prints "Huey";
Likewise lc() and lcfirst() return lowercased versions of strings. lc returns all lowercase. lcfirst() makes the first character uncapitalized, although why anybody would want to do that is beyond me.
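A tiny sketch of all four functions, assuming a modern Perl 5. The ucfirst(lc(...)) combination at the end is a common idiom, shown as an illustration rather than something the text above covers.

```perl
use strict;
use warnings;

my @forms = (uc("Huey"), ucfirst("huey"), lc("HUEY"), lcfirst("HUEY"));
print "@forms\n";                 # prints HUEY Huey huey hUEY

# A common combination: normalize a name to one leading capital.
my $name = ucfirst(lc("hUeY"));   # 'Huey'
```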
Scalars can pop into being any time necessary without a formal definition statement. You can say:
print $scalarName;
without ever initializing $scalarName first. Until a variable is explicitly given a value, it holds a special value called undefined, and this value plays quite a large role in Perl.
If you need to test whether a scalar has a defined value, simply say something like:
if (defined $scalarName)
{
print "The scalar has been defined\n";
}
Likewise, if you want to set a value to be undefined, or simply want to pass the undefined value around, you can simply say:
return(undef); # returns the undefined value from a
# function.
$string = undef; # sets string to the undefined value.
undef($string);                # undefines the string (same as above)
This doesn't do undef and defined nearly enough justice. There are several cases where defined/undef is used in Perl to a programmer's great advantage:
1) When a filehandle or a directory handle reaches its end, the last value returned is 'undef'. See the section on handles for more information.
2) Some special functions return undef as their last value. See below, and the perlfunc manpage, for more details.
3) Arrays and hashes can also be 'undef'd. See below.
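The sketch below exercises defined and undef, and also shows case 1 from the list: a read loop that stops when the handle returns undef. It assumes Perl 5.8 or later, because it opens an in-memory filehandle on a string (open with a reference to a scalar) so that it needs no real file; with a real file you would open a file name instead. The variable names are illustrative.

```perl
use strict;
use warnings;

my $scalarName;                                   # declared but never assigned: undef
my $before = defined $scalarName ? 'defined' : 'undefined';

$scalarName = 0;                                  # false, but still defined
my $after = defined $scalarName ? 'defined' : 'undefined';

undef($scalarName);                               # back to the undefined value

# Reading past the end of a handle returns undef, which ends the loop.
my $text = "line one\nline two\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
my $count = 0;
while (defined(my $line = <$fh>)) {
    $count++;
}
close($fh);
print "$before $after $count\n";                  # prints undefined defined 2
```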
Scalars are Perl's heavily overloaded jack of all trades datatype. They take a little bit of patience to get used to, but if you take the time to learn them well, they will do you yeoman's service.
Scalars are so ubiquitous, and so close to the way that people naturally think about how variables should operate, that you will find yourself using them in ever-more complicated ways. Remember:
1) scalars operate as numbers, strings, and Booleans
2) they are abstractions in that you can't simply reach in and manipulate them internally. You need functions to do so.
3) you need not know how these variables are stored. They manage themselves.
Wondering why we spent so much time on scalars? If you know scalars inside and out, you know arrays (and hashes) inside and out as well. Arrays are simply groups of scalars which are indexed by a number. Arrays in Perl grow and shrink automatically, unlike some programming languages which require the size of an array to be preset. Arrays are represented by an '@' (at sign). Arrays in Perl are like arrays in C in that they start at the zeroth element and progress upwards.
A simple, one-dimensional array is assigned its values in much the same way as a scalar, with the following construct:
@ArrayName = (value1, value2, value3); #this assigns values starting at index 0
When dealing with strings that are in order, or with numbers, you can also simply say:
@arrayName = (1..10);
where '..' is a special symbol (the range operator) which says the same thing as:
@arrayName = (1,2,3,4,5,6,7,8,9,10);
Or you can say:
@arrayName = ('a'..'e');
which creates an array with the following contents:
@arrayName = ('a','b','c','d','e');
where the '..' expands into a series of elements.
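A small sketch of the range operator, assuming a modern Perl 5. The 'aa'..'ad' line relies on Perl's 'magic' string increment and is an extra illustration beyond the examples above.

```perl
use strict;
use warnings;

my @numbers = (1..10);
my @letters = ('a'..'e');
my @pairs   = ('aa'..'ad');     # ranges also work on letter sequences

print scalar(@numbers), "\n";   # prints 10 (an array in scalar context gives its length)
print "@letters\n";             # prints a b c d e
print "@pairs\n";               # prints aa ab ac ad
```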
Each array element is a scalar. Therefore, the element may be assigned a value individually by index number with the following construct:
$ArrayName[index] = value;
Again, C programmers especially can get confused by this. This is not getting the <index>th character of the scalar $ArrayName. Instead, it is the <index>th element of @ArrayName, something like Figure 3.3:
[Figure 3.3: Reading Array Syntax in Perl]
Here are some simple examples of arrays and array elements:
@array1 = ('This', 'is', 'an', 'array'); # four element array
@NumberArray = (1, 2, 3, 4, 5); # array of numbers
@EmptyArray = (); # empty array
$Array2[0] = 'elem0';                         # element 0 of @Array2 equals 'elem0'
The following example prints out all the elements in the array in order:
print "@array1\n";                            # Example: printing out an array.
                                              # Prints 'This is an array'
                                              # (elements separated by spaces)
And the following copies @array2 into @array1
@array1 = @array2; # Example of copy
# constructor for arrays
# copies each element so
# array1 equals array2
The next two examples demonstrate building arrays out of other arrays directly (i.e.: combining two arrays into one, or appending an element). However, realize that Perl provides a function for appending to an existing array (push), so for the second case you should probably use it instead.
@array1 = (@array2, @array3); # Makes @array1 equal
# to @array2 concatenated with @array3.
@array1 = (@array1, 'element1'); # appends the string 'element1' onto @array1.
Following are more interesting things you can do with arrays:
@array = ($scalar1, $scalar2, $scalar3); # Example of scalar assignment to arrays
@ArraySlice[3,4,5] = ('elem3','elem4','elem5'); # Example of slicing an array
# which assigns string "elem3" to
# $ArraySlice[3], and so on
The following demonstrates moving the value of elements to other elements in the same array. This moves the value of element three to element one, the original value of element one to element two, and the original value of element two to element three:
@swap[1,2,3] = @swap[3,1,2]; # Example of swapping elements.
This is called slicing, which makes transposition of variables extremely easy because you need no temporary variables or loops.
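To try the rotation yourself, here is a small sketch. The contents of @swap and the two scalars are made-up sample data. The sketch also shows the list-assignment swap of two scalars, which relies on the same idea: the right-hand side is built in full before anything is assigned.

```perl
use strict;
use warnings;

# Made-up sample data: element 0 stays put, elements 1..3 rotate.
my @swap = ('zero', 'one', 'two', 'three');

# The right-hand slice is built in full before anything is stored,
# so no temporary variable is needed.
@swap[1, 2, 3] = @swap[3, 1, 2];
print "@swap\n";   # prints: zero three one two

# The same idea swaps two scalars.
my ($x, $y) = (1, 2);
($x, $y) = ($y, $x);   # $x is now 2, $y is now 1
```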
Here's one that you should be aware of:
$TooBig[100000000] = '1'; # Too big an array, will probably make your system thrash
# and perhaps die.
Since arrays grow automatically, this tries to allocate space for 100,000,001 elements (indexes 0 through 100,000,000), which will probably exhaust your memory. Be careful. Other examples of arrays:
@array = <FD>; # Will assign @array the entire file via a file handle.
@array1 = qw(This is an array); # Example of qw (quoted word) syntax.
# Results in ('This', 'is', 'an','array')
@array1 = (This, is, an, array); # Example of quoteless (bareword) assignment; not allowed under 'use strict'
$array[0] = "@array1"; # Example of scalar interpolation context
# see section contexts for more info.
# makes the first element of the array
# 'This is an array'
The first example shows how an array can be filled from a filehandle (which we shall talk about shortly): each line of the file becomes one element of the array.
The last example shows another ubiquitous thing about arrays. If you put an array inside double quotes ("), you are using what is called array interpolation. The double quotes join all of the elements, separated by spaces, into one big scalar.
This section discusses the functions that manipulate arrays. This is not by any means a full list, but again, here are the more important ones.
There is a function that returns the number of elements in an array. It just isn't the function that you would expect. The function is called scalar, and it forces the array into a scalar context, much as you force strings into a number context.
This example:
@array = (1,2,3,4);
$number_of_elements = scalar(@array);
returns the number '4'.
In practice the scalar function is seldom used. Once you understand how contexts work (see section 'contexts'), you can safely drop the scalar word and say:
$number_of_elements = @array;
instead, because it does the same thing.
Another form you may see is:
$number_of_elements = $#array;
but be careful if you use it. $#array is the index of the last element, so it is one less than the number of elements returned by the statement above (i.e.: if @array = ('1','2'), then $#array equals 1, NOT 2).
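Here is a quick sketch comparing the three forms on the two-element array from the text.

```perl
use strict;
use warnings;

my @array = ('1', '2');

my $count      = scalar(@array);   # 2: the number of elements
my $also_count = @array;           # 2: same thing, via scalar context
my $last_index = $#array;          # 1: the index of the last element

print "$count $also_count $last_index\n";   # prints: 2 2 1
```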
scalar(@arrayName) gives you a simple way to iterate through all of the elements of an array. You simply say:
for ($xx = 0; $xx < scalar(@arrayName); $xx++)
{
print $arrayName[$xx];
}
and the scalar function supplies you the size of the array to iterate through. In practice, the scalar is dropped so this becomes:
for ($xx = 0; $xx < @arrayName; $xx++)
{
print $arrayName[$xx];
}
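The index loop can usually be replaced by foreach, which hands you each element in turn. The following minimal sketch compares the two, using a made-up three-element array.

```perl
use strict;
use warnings;

my @arrayName = ('a', 'b', 'c');

# C-style loop: $xx < @arrayName compares against the element count.
my $out1 = '';
for (my $xx = 0; $xx < @arrayName; $xx++) {
    $out1 .= $arrayName[$xx];
}

# foreach visits each element in order, with no index bookkeeping.
my $out2 = '';
foreach my $element (@arrayName) {
    $out2 .= $element;
}

print "$out1 $out2\n";   # prints: abc abc
```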
The push function pushes an element or elements onto the end of an array. Remember, the size of arrays is not predefined so it is OK to keep adding elements to the end of the array.
@array = ('another', 'array', 'of', 'scalars');
push(@array, 'pushed element'); # pushes an element onto the array.
# Afterwards, the array contains (another,
# array, of, scalars, pushed element)
@array = ('one');
push(@array, @array); # doubles the array
# Afterwards, the array equals ('one','one')
push(@array, ('list','of','elements')); # appends ('list','of','elements') onto
# array which now contains
# "one, one, list, of, elements".
push is a very handy function to put in your toolbox for manipulating the contents of arrays. You do not need to read through the whole array to add elements to the end, nor do you need to know the number of elements in the array in order to create the next one. Finally, it is a safe way to combine the contents of arrays.
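Here is a small sketch of push in action. One detail the text above doesn't mention is that push also returns the new number of elements. The sketch captures it in $new_size, a name made up for this example.

```perl
use strict;
use warnings;

my @array = ('another', 'array', 'of', 'scalars');

# push returns the number of elements after the push.
my $new_size = push(@array, 'pushed element');   # 5

my @more = ('list', 'of', 'elements');
push(@array, @more);   # @more is flattened: three more elements

print scalar(@array), "\n";   # prints: 8
```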
Taking Elements Off Arrays: pop Function
pop is the opposite of push. It takes an element off the end of an array. pop then returns the element's value as a scalar. Once again, you do not have to know the number of elements in an array in order to operate on the last element.
Following is an example of using pop:
@array = ('another', 'array', 'of', 'scalars');
$scalar = pop(@array); # @array now contains ( 'another', 'array', 'of' ).
# $scalar becomes string 'scalars'.
push(@array, 'aha'); # does a push.
$scalar = pop(@array); # undoes that push.
# $scalar becomes string 'aha'.
# array now contains ('another', 'array', 'of'), and
# 'scalars' is off the stack
Note that pop removes only one scalar at a time. Hence the pop below does not undo the push before it:
push(@array, @array);
$scalar = pop(@array);
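The following sketch shows why. After a two-element array is pushed onto itself, a single pop takes off only the last element and leaves three behind.

```perl
use strict;
use warnings;

my @array = ('one', 'two');
push(@array, @array);        # ('one', 'two', 'one', 'two')
my $scalar = pop(@array);    # removes only the last element: 'two'

print "@array\n";            # prints: one two one
```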
unshift and shift Functions
unshift does exactly what push does, but it adds at the beginning of an array rather than at the end. Following is an example of unshift:
unshift (@array, 'element'); # adds element to beginning of array.
unshift (@array, @array); # 'doubles' the size of the array
Likewise, shift does exactly the same thing as pop, but removes the element at the beginning of an array, not the end.
unshift( @array,'element'); # adds element to beginning of an array. ('add')
# becomes ('element','add'). See above example.
$scalar = shift (@array); # $scalar becomes 'element' which undoes the 'unshift'
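Combining push with shift gives you a first-in, first-out queue, while push with pop gives you a stack. Here is a minimal sketch of the queue case, with made-up element names.

```perl
use strict;
use warnings;

my @queue = ('add');
unshift(@queue, 'element');    # ('element', 'add')
my $first = shift(@queue);     # 'element'; @queue is ('add') again

# push on one end and shift off the other: first in, first out.
push(@queue, 'later');         # ('add', 'later')
my $next = shift(@queue);      # 'add'

print "$first $next @queue\n";   # prints: element add later
```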
Complicated Array Management: splice Function
Since there are functions for manipulating elements at the beginning and end of arrays, you might expect that there is a function which manipulates the middle elements. There is, and it is called splice. splice is quite powerful. In fact, every function above can be rewritten in terms of splice.
splice does just what it sounds like it does: it takes an array and removes elements from it, adds elements to it, or both, at any position you specify. In the usages below, @array is the array to be affected, $position is the index of the first element to be affected, $length is the number of elements to be affected, and @list is the list of values to be added.
splice usage is:
Usage 1: splice(@array, $position, $length, @list);
This can probably be better explained in pictorial form (I always confuse the four functions). Figure 3.4 shows what's going on with splice:
Figure 3.4: Splice arguments
Usage 1 takes out $length elements of @array starting at $position (i.e.: elements $position through $position+$length-1) and adds the elements of @list in their place.
Usage 2: splice(@array, $position, $length);
Usage 2 removes $length elements starting at $position, and does no adding of elements.
Usage 3: splice(@array, $position);
Usage 3 removes elements starting at $position and continuing to the end of the array.
All usages return the removed elements as a LIST to the left-hand side (in scalar context, splice returns the last element removed).
Following are examples of usage of splice:
@array = ('el1','el2','el3','el4');
$scalar = splice(@array, $#array); # Equivalent to 'pop(@array);'
# @array becomes ('el1', 'el2', 'el3')
# $scalar becomes 'el4'
@array = ('el1','el2','el3','el4'); # start again from four elements
@ReturnArray = splice(@array, 1,2); # @ReturnArray becomes ('el2','el3');
# @array becomes ('el1','el4');
# Read as: 'take out 2 elements from
# @array, starting at position 1.'
@array = ('el1','el2','el3','el4'); # start again from four elements
@AddArray = ('el2a','el2b');
splice(@array, 2, 0, @AddArray); # @array becomes (el1,el2, el2a,el2b, el3, el4)
# Read as 'take out zero elements, starting
# at position 2, and add @AddArray.'
@array = ('el1','el2','el3','el4'); # start again from four elements
splice(@array, 2, 0, ('el2a','el2b'));
# exactly the same as
# the example above, except that list is explicit now.
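The text claims that every function above can be rewritten in terms of splice. The following sketch shows the four equivalents. Each comment shows the array after that line, and the element names are sample data.

```perl
use strict;
use warnings;

my @array = ('el1', 'el2', 'el3');

# push(@array, 'el4'): insert at the end, removing nothing
splice(@array, scalar(@array), 0, 'el4');   # ('el1','el2','el3','el4')

# pop(@array): a negative position counts from the end
my $popped = splice(@array, -1);            # 'el4'; ('el1','el2','el3')

# unshift(@array, 'el0'): insert at position 0, removing nothing
splice(@array, 0, 0, 'el0');                # ('el0','el1','el2','el3')

# shift(@array): remove one element at position 0
my $shifted = splice(@array, 0, 1);         # 'el0'; ('el1','el2','el3')

print "@array\n";   # prints: el1 el2 el3
```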
reverse takes an array and returns its elements in the opposite order. Hence if you say something like:
@array = (1..10);
and then say
print "Counting down...\n";
foreach $element (reverse(@array))
{
print "$element ";
}
This prints
"10 9 8 7 6 5 4 3 2 1"
Note that reverse returns a new, reversed list; @array itself is left unchanged.
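If you do want the array itself reversed, assign the result back to it. Here is a short sketch.

```perl
use strict;
use warnings;

my @array = (1..10);

my @backward = reverse(@array);   # a new list; @array is untouched
print "$array[0]\n";              # prints: 1

@array = reverse(@array);         # assign back to reverse the array itself
print "$array[0]\n";              # prints: 10
```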
split provides the method to take a scalar and split it up into an array. split's most common forms are with two or three arguments. split breaks the scalar apart wherever the given regular expression or pattern matches. If you supply $limit, split returns at most $limit pieces, and the rest of the string is jammed into the last element. If $limit is omitted, the scalar is split up into as many pieces as the pattern allows (trailing empty fields are dropped).
Syntax of split:
@array = split(REGEXP, $scalar, $limit)
where $limit is an optional integer, and REGEXP is a regular expression (see section 'regular expressions' for more detail).
Following are examples of split:
@arrayOfChars = split( '', $scalar); # splits up scalar into its
# associated chars. eg 'here' turns into
# ('h','e','r','e')
@ArrayOfWords = split(' ', $scalar); # splits up scalar into its
# associated words based on spaces.
@ArrayOfWords = split(m#\s+#, $scalar);
# splits up scalar into its associated
# words by matching spaces.
These are relatively simple; the last two split on whitespace. With the optional limit argument, you can ensure that the resulting array has at most a fixed number of elements.
$scalar = "Last word will be 'many words after split'";
@array_of_words = split(m"\s+", $scalar, 5);
# split jams "'many words after split'"
# into $array_of_words[4]
To get the most use out of split, you really should know how regular expressions work. The above example splits on whitespace (m"\s+" is to be read as 'match one or more whitespace characters').
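The following sketch compares split with and without the limit, using the sentence from the text. It also shows a special case worth knowing. A pattern given as the single-space string ' ' skips leading whitespace, while m/\s+/ produces an empty first field when the string starts with whitespace.

```perl
use strict;
use warnings;

my $scalar = "Last word will be 'many words after split'";

my @all_words      = split(m"\s+", $scalar);     # no limit: 8 pieces
my @array_of_words = split(m"\s+", $scalar, 5);  # at most 5 pieces

print scalar(@all_words), "\n";   # prints: 8
print "$array_of_words[4]\n";     # prints: 'many words after split'

# ' ' is special: it skips leading whitespace; m/\s+/ does not.
my @by_space = split(' ', "  two words");      # ('two', 'words')
my @by_regex = split(m/\s+/, "  two words");   # ('', 'two', 'words')
```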
For more depth on regular expressions and regular expression functions like split, and how to use them, see section Using Regular Expressions.
join does the very opposite of split. It takes an array, and then turns it into a scalar. It has the following syntax:
$scalarName = join("chars_to_join", @arrayName);
For example with:
@arrayName = (1,2,3,4,5,6,7,8,9,10);
you can say
$scalarName = join(' ',@arrayName);
to get the scalar
"1 2 3 4 5 6 7 8 9 10";
Or, to join a list directly by colons, you could say:
$scalarName = join(':', ('my','list'));
to produce the scalar:
"my:list";
Arrays have their own versions of the three scalar functions undef, chop, and chomp. That is, you can simply do something like:
undef(@arrayName);
and then the array will become blank (i.e.: contain no elements).
Likewise 'chop(@arrayName)' and 'chomp(@arrayName)' take every element in @arrayName and get rid of the end characters in the same manner as chop and chomp above. They simply save you some typing.
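Here is a short sketch of chomp and undef applied to a whole array. The lines are hypothetical input, shaped like what <FD> returns.

```perl
use strict;
use warnings;

# Hypothetical input lines, each ending in a newline as <FD> returns them.
my @lines = ("line1\n", "line2\n", "line3\n");

chomp(@lines);                   # strips the newline from every element
my $joined = join(':', @lines);  # "line1:line2:line3"

my @copy = @lines;
undef(@copy);                    # @copy now contains no elements

print "$joined ", scalar(@copy), "\n";   # prints: line1:line2:line3 0
```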
Hashes are groups of scalars which are indexed by another scalar rather than a number. Like arrays, hashes grow and shrink automatically when you add or remove elements. They do not need to be pre-sized.
Hash structure is similar to the traditional structure of arrays, but the indexes (called keys) are arbitrary scalars, such as strings, rather than numbers. In this way, hashes are like dictionaries. The following:
$dictionary{'dog'} = 'Domesticated mammal';
defines 'dog' as 'Domesticated mammal'. As in dictionaries, you can look up 'dog' and get 'Domesticated mammal' as a definition:
print $dictionary{'dog'}; # prints 'Domesticated mammal'.
Pictorially, you read hashes as in Figure 3.5
Figure 3.5: Interpreting the hash syntax
Hashes can be much more flexible than arrays since you can use pretty much any scalar as a key element. However, and this is very important, hashes lose all concept of order when they are put together. In other words, you would not want to locate an element by the key and then march through X number of records. Results would be unpredictable.
A hash as a whole is represented by a '%' (percent sign), as in the assignments below; a single element is written with a '$' and curly braces, as in $dictionary{'dog'}.
The construct for creating a hash is:
%HashName = (key1, value1, key2, value2);
Or
%HashName = (key1 => value1, key2 => value2);
****Side note***
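The second syntax example uses the symbol '=>', which is known as syntactic sugar. This makes the code much more appealing to look at. The '=>' is really just a ',' in disguise, except that it also quotes a simple word on its left, so you can write key1 => 'value1' without quoting key1.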
*****end side note*********
There is no separate syntax for the initial creation of a hash as there is in Java and other programming languages. The hash is created in memory the first time it is referenced. The syntax for assigning a hash element is:
$hashNAME{key1}=value1;
Following are simple examples of hashes:
%hash1 = ('key1' , 'value1'); # simple hash construct
%hash1 = ('key1' => 'value1'); # does the same thing as above, but is more readable
The following example shows assignment of multiple values in one statement:
%hash2 =
( # Example: hash with two values
'key1' => 'value1',
'key2' => 'value2'
);
Finally, the next example shows explicit assignment of a value to a hashkey (assigning a word a definition):
$hash2{'key1'} = 'value1'; # Example of setting a hashkey explicitly.
This code fragment prints the value of a hash element using the above example's hash:
print "$hash2{'key2'}\n"; # prints 'value2'
The following example shows incorrect usage of assigning values to a hash. key2 has no value assigned to it.
%hash1 = ( # Example of INCORRECT
'key1' => 'value1', # USAGE. Need hashkey pair
'key2'
);
%hashNAME = (); # Empty hash. To remove a single element, use 'delete'
# instead (see section on Hash Operators).
As you can see, hash value assignment is almost exactly equivalent to array value assignment. Both use the ( ) syntax. The main difference is that in a hash assignment, keys (the subscripts) and values alternate throughout the list. Aside from the fact that the array has a concept of order, the two following examples are equivalent in value:
@arrayNAME = ('val0','val1','val2','val3','val4');
%hashNAME = (
0 => 'val0', 1 => 'val1',
2 => 'val2', 3 => 'val3',
4=> 'val4'
);
In other words, $hashNAME{0} and $arrayNAME[0] both equal 'val0'. However printing out all the values of %hashNAME and @arrayNAME leads to something different:
print @arrayNAME;
prints:
val0val1val2val3val4
On the other hand,
print values(%hashNAME);
does not print a fixed sequence at all because, again, hashes have no concept of order.
Following are more complicated examples:
@arrayNAME = ('key', 'value', 'key2', 'value2');
%hashNAME = @arrayNAME; # An example
# of list (array) context (see chapter on contexts)
$hashNAME{key} = 'value';
$hashNAME{key2} = 'value2';
%hashNAME = ( # Example of hash assignment via scalars
$key1 => $value1,
$key2 => $value2
);
%hash1 = %hash; # Example of copying hashes by value.
%hash1 = ( # Example of copying hashes by value,
%hash0, # and adding an additional key.
'additional_key' => 'additional_value'
);
A lot of these examples are a bit of semantic trickery. You probably want to use the functions below instead, but then again, it's your choice.
Hashes are basically black boxes in that you put something into them and it is difficult to predict where the data goes in the hash structure. There is no easy way to print out an entire hash with a statement similar to print "%hashNAME"; (hashes do not interpolate in double quotes, so that prints the literal text %hashNAME), and even print %hashNAME; gives you keys and values run together in no particular order. Hashes need special retrieval functions.
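One common way around the ordering problem is to sort the keys before printing. This sketch sorts them as plain strings, which is an assumption about the order you want; other orders need a different sort block.

```perl
use strict;
use warnings;

my %hashNAME = (key2 => 'value2', key1 => 'value1', key3 => 'value3');

# sort the keys first so the output order is predictable.
my $text = '';
foreach my $key (sort keys %hashNAME) {
    $text .= "$key => $hashNAME{$key}\n";
}

print $text;
# prints:
# key1 => value1
# key2 => value2
# key3 => value3
```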
With 'tie'ing, (see section 'Tie'ing Variables in chapter 16) there also comes the problem that hashes can grow extremely large. I have worked with one hash that was hundreds of megabytes in size. Hence there is a need for a function that will manipulate hashes one element at a time.
These functions are described below.
The keys function returns a list of the keys in a given hash. The syntax for the keys function is:
@arrayName = keys (%hashName);
which assigns to @arrayName the keys of %hashName in indeterminate order. Following are examples of assigning values to hash elements and then retrieving those values:
%hashName =
(
key1 => 'value1',
key2 => 'value2',
key3 => $scalar
);
@arrayName = keys(%hashName); # @arrayName equals ('key1','key2','key3')..or is it
# ('key2','key3','key1') ?;
The keys function is also used to iteratively move through a hash. For example, the following code fragment prints out all the key value pairs in the hash %hashName:
foreach $keyName (keys %hashName)
{
print "$keyName => $hashName{$keyName}\n";
}
The foreach control structure is described in the next chapter.
The values function returns a list of the values in a given hash. The syntax is:
@arrayName = values(%hashName);
The following example assigns a value to a scalar, $scalarName, then assigns values to keys in hash %hashName. Finally, the values in the hash are returned in array format in array @arrayName.
$scalarName = 'value3';
%hashName = (
key1 => 'value1',
key2 => 'value2',
key3=>$scalarName
);
@arrayName = values(%hashName); # @arrayName now equals (value1,value2,value3);
# or is it ('value2','value1','value3');
values and keys are the primary windows into the hash data structure. Through them you can iterate through each element in the hash, or see what you have stored in the hash. However, realize the overhead that occurs when you say:
@keys = keys(%hashName);
This creates an array element for every key in the hash, and if the hash is a huge one, you will end up with a huge array. If this is the case, you probably want to use each, described below.
The each function is the way to handle very large hashes without overflowing memory space. It returns a ($key, $value) list pair for each one of the elements in the hash, one pair per call. After it has returned every element, each returns an empty list (undef in scalar context), which ends a while loop like the ones below.
Following is an example of the usage of each:
%hashName= ('very_large_hash'); # pseudocode!
while (($key, $value) = each %hashName)
{
print "$key, $value\n";
}
while (($key, $value) = each %hashName) # you can nest them.
{
while (($key2, $value2) = each %hash2)
{
}
}
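By using each here, you make sure that you don't overwhelm the local internal memory. You can nest each loops over different hashes, since every individual hash has its own iterator attached to it, so they will not conflict. (Two nested each loops over the same hash would conflict, because they share that one iterator.)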
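Here is a minimal, runnable version of the each loop above. A small made-up hash stands in for the very large one. The loop adds up the values without ever building a list of keys.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a very large hash.
my %hashName = (apple => 3, pear => 5, plum => 1);

my $total = 0;
while (my ($key, $value) = each %hashName) {
    $total += $value;   # pairs arrive in no particular order
}

print "$total\n";   # prints: 9
```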
*********side note**********
Do not add to the hash while iterating through it. When you first do an each on a hash, an iterator is created for that hash. This iterator is shared by values, keys, and each (calling keys or values resets it), and it persists until each returns its FALSE value. Adding hash elements in the middle of an iteration confuses the iterator, and this problem can be very hard to track down. In other words, results are unpredictable.
while (($key, $value) = each (%hash))
{
$hash{'newkey'} = 1; # EXTREMELY BAD IDEA: INDETERMINATE BEHAVIOUR.
delete $hash{$key}; # The documented exception: deleting the
# element that each just returned is safe.
}
************
The delete function is the way to remove one element (a key and its scalar value) from a hash. It is safe to use on any hash. Unlike the undef function described below, it can be used on tied hashes as well as non-tied hashes (see section Tying for more information on tied hashes).
Following are examples with delete:
delete $ENV{PATH}; # deletes PATH environment variable
foreach $key (keys %hashName) # accesses hash elements iteratively
{
delete $hashName{$key}; # deletes every hash element
} # in hash %hashNAME.
Hashes also have their own version of the undef function. This example:
undef(%hashName);
has the effect of deleting the entire hash. This may or may not be what you intended to do.
The above functions show how you get values out of a hash, but how do you test to see if a value is in a hash? This is the job of the exists function. If you say something such as:
if (exists($dictionary{'dog'}))
{
print $dictionary{'dog'};
}
then this goes into the hash %dictionary and sees if %dictionary has the key 'dog'. If it does, it returns true. If not, it returns false.
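The following sketch shows the difference between exists and defined, using a made-up 'unicorn' key whose value is undef. It also shows that delete hands back the value it removed.

```perl
use strict;
use warnings;

my %dictionary = (dog => 'Domesticated mammal', unicorn => undef);

my $has_unicorn     = exists($dictionary{'unicorn'})  ? 1 : 0;  # 1: the key is there
my $unicorn_defined = defined($dictionary{'unicorn'}) ? 1 : 0;  # 0: its value is undef
my $has_cat         = exists($dictionary{'cat'})      ? 1 : 0;  # 0: no such key

# delete removes the key and returns the value it held.
my $removed = delete $dictionary{'dog'};
my $has_dog = exists($dictionary{'dog'}) ? 1 : 0;               # 0 after the delete

print "$has_unicorn $unicorn_defined $has_cat $has_dog\n";      # prints: 1 0 0 0
```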
Hashes, also called associative arrays, are not usually built into other languages (such as C). They take a little bit of getting used to. You basically put values into the hash by defining it as in:
%hash = ('this'=>'is', 'a'=>'hash');
in key/value pairs. Then, if you wish to access a value, you can say:
print $hash{'this'};
to print 'is', or
print $hash{'a'};
to print 'hash'. You can also say:
$hash{'newkey'} = 'new definition';
to directly set the definition of newkey in the hash %hash.
Hashes, Scalars, and Arrays make up all of the strict datatypes of Perl. They hold data, and you will use Hashes, Arrays, and Scalars to manipulate any data that you might conceive.
However, there is, of course, the question of where that data will come from. To do this, Perl provides something called a handle. If Scalars, Hashes, and Arrays are Perl's data, then handles are Perl's method of getting that data. Perl provides two types of handles:
File Handles
Directory Handles
which give Perl the ability to interface with files, processes, sockets, and, in the case of directory handles, directories. We shall be concerned here with file handles. Directory handles will get treatment in section 'Perl odds and ends'.
Rather than being distinguished by a special character, handles are by convention written in all upper case. Hence,
print FD "file name\n";
should be read as 'print the value "file name\n" to the file handle FD'.
Filehandles are a bit of a misnomer. You can use them to have Perl read and write to files, but you can also use them to read/write to pipes, and read/write to sockets. They are Perl's primary window to the outside world.
Using a filehandle consists of three steps. First you open that filehandle, then you read or write to that filehandle, and then you finally close the filehandle.
To open a filehandle, you say:
open(FD, "fileName");
which ties the filehandle FD to the file "fileName". This is to be read as 'open file fileName (read only) and tie it to the file descriptor FD'.
This open statement forms a connection to the file fileName. If you then say something like:
$line = <FD>;
then $line will contain the next line in the file descriptor FD. This is the read. Alternatively, you can say:
print FD $line;
to write to the filehandle. (To do this you would have to open it for writing, like: open(FD, "> fileName") or open(FD, ">> fileName").)
Finally, you close the fileHandle when done. You say:
close(FD);
to tell the operating system to close the connection for you, writing out any output that is still buffered.
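Here is the open, write, read, close cycle as a runnable sketch, with two assumptions worth flagging. It writes to a temporary file from the core File::Temp module, so no real file is touched. It also uses the three-argument form of open, which keeps the mode ('>' or '<') separate from the file name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file name; the file is removed when the program exits.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close($tmp);

# Open for writing ('>' truncates), print, then close to flush the output.
open(my $out, '>', $filename) or die "Couldn't open $filename for writing: $!\n";
print $out "first line\n";
print $out "second line\n";
close($out) or die "Couldn't close $filename: $!\n";

# The same three steps for reading.
open(my $in, '<', $filename) or die "Couldn't open $filename for reading: $!\n";
my $line = <$in>;      # reads "first line\n"
close($in);

print $line;   # prints: first line
```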
This is extremely useful for getting data from the operating system, and this only scratches the surface of what file handles can do.
Below are the more common functions for dealing with filehandles:
open opens a file handle, and prepares that handle for reading or writing
<> reads from a file handle (as in <FD>)
print prints to a file handle
close closes a file handle
We also introduce below the package FileHandle, which comes with the standard Perl distribution. It can be used to create filehandles that are cleaner than the bareword filehandles Perl provides.
open is the primary way that you create a file handle. If you say something like:
open(FD, "> output_file") || die "Couldn't open output file!\n";
you are making the file descriptor FD synonymous with the file output_file (and you happen to erase the contents of any existing output_file at the same time). You are opening the file for writing, and the die says that if you don't correctly open the connection, the program dies.
Other things you can do with filehandles:
open(FD, ">> output_file") || die; # append to file or die.
This will append to a file, or die if it cannot write to it.
open(FD, "input_file") || die; # get data from a file or die
This will open a file for reading (or die if it can't)
open(FD, "process | ") || die; # get data from a process or die
This will open a process for reading or die, if it can't
open(FD, " | to process") || die; # pipe data to a process or die
This will pipe data to a process, or die if it can't (Although typically in the Windows world you don't pipe output to processes. This tends to be a strictly UNIX thing.)
If you prefer, you can also say something like:
my $fh = new FileHandle(">> output_file") || die;
which does the same thing, but ties the filehandle to the scalar $fh instead of FD. For the bulk of this book we shall use the FileHandle syntax rather than open. It is cleaner, and if you use it consistently, will help you avoid many problems.*
Specifically, file descriptors created via open are pretty odd items in the Perl world. They are bare words that stick out from the more elegant hashes, scalars, and arrays, which can be localized fairly effectively. If one open statement on FD occurred in the heart of a program, and another open on FD was in a subroutine, then the subroutine's file descriptor would override the one in main (i.e.: just like a global). It is highly recommended that you use the FileHandle package instead, with the second of the constructs above, i.e.: my $fileHandleName = new FileHandle(">> output_file");
As we shall see, the 'my' makes it so the variable $fileHandleName is localized to the enclosing block, so that if you have another '$fileHandleName' variable in the code, there is no conflict.
open is useless by itself. After all, you want to do reading and writing as well! Hence, open goes in tandem with the other functions below.
Now let's suppose you have opened a file or a process for reading via:
open(FD, "input_file") || die;
or
open(FD, "dir |") || die;
Either of these statements ties the file descriptor FD to a source of input (the file input_file, or the output of the dir command), and you can then do something such as:
$line = <FD>; # or $line = <$FD>; to use the FileHandle object.
The <> syntax means that we are reading from the file descriptor FD and putting the results into the variable $line. The <FD> operator knows which type of variable it is reading into (scalar or list context), and hence you can say something like:
@lines = <FD>;
to slurp the entire file into the variable @lines. Now, to read through a file sequentially, we can simply use some form of the construct:
while (defined($line = <FD>)) { } # or while (defined($line = <$FD>)) with FileHandle
Take this construct and use the data file:
----- DATA FILE -----
line1
line2
line3
On the first iteration through, 'while (defined($line = <FD>))' will set:
$line = "line1\n";
the second iteration will set:
$line = "line2\n";
and the third call will set:
$line = "line3\n";
And voila! your file is read. Note that the '<FD>' operator reads the line with the ending character ("a newline") intact. If you want this character to be removed, simply do a chop() or a chomp() on the resulting scalar.
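Here is a sketch of the whole read loop, with chomp. So that it runs without a real file, it uses an in-memory filehandle holding the three-line data file above (opening a reference to a scalar, supported since Perl 5.8).

```perl
use strict;
use warnings;

# The three-line DATA FILE from the text, held in a string.
my $data = "line1\nline2\nline3\n";
open(my $fh, '<', \$data) or die "Couldn't open in-memory file: $!\n";

my @seen;
# defined() makes the end-of-file test explicit: a line such as "0"
# is false, but it is still defined.
while (defined(my $line = <$fh>)) {
    chomp($line);            # remove the trailing newline
    push(@seen, $line);
}
close($fh);

print "@seen\n";   # prints: line1 line2 line3
```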
The technique for reading from a process is quite similar:
open(FD, "dir |") || die "Couldn't open pipe!\n"; # or $FD = new FileHandle("dir |") ||
# die "Couldn't open pipe!\n";
which opens up a file descriptor for the process dir. In English, this is read as 'take the output from the Win32 command dir and stick it into the file descriptor FD.
This technique is called piping. Piping occurs when the output of dir is routed to the file descriptor FD.
This Perl functionality is very powerful. However, this is a place where you need to be extremely careful about portability, since the command you pipe from may not exist when you try to port over to another operating system. In particular, 'dir' does not exist on UNIX, and 'ls' does not exist on a standard Windows system. The following construct will print out all the files in a directory on a UNIX system:
open(FD, "ls |") || die "Couldn't open pipe!\n"; # $FD = new FileHandle("ls |") ||
# die "Couldn't open pipe!\n";
while ($line = <FD>) # while ($line = <$FD>)
{
print $line;
}
Just as before, the FD argument keeps track of the last line that you have read in your filehandle, and hence you can iterate through them.
Usage:
print (@arrayName)
Or
print FILEHANDLENAME (@arrayName)
print is a commonly used function. We have seen several examples of print already:
print "@ARGV\n";
# prints the array using interpolation, see both the section
# 'special variables' and $" ($LIST_SEPARATOR) and
# $\ ($OUTPUT_RECORD_SEPARATOR)
# and the section on interpolation in previous chapters.
print "\n\tHello\n"; # prints a newline, tab, Hello then a newline
print functionName(); # prints the output from the function functionName.
All of these examples show forms of the usage print (@arrayName). Although you don't see it, Perl prints to a special file handle called 'STDOUT' if you fail to give print a file handle of its own.
If instead you want to print to STDERR, the error stream (another standard filehandle that Perl provides), use:
print STDERR "@ARGV\n";
or to a defined (writeable) filehandle, use:
my $fileHandleName = new FileHandle("> output_file");
print $fileHandleName "@ARGV\n";
This, again, will erase the contents of the file that you open. If you want to append to this file instead, say:
my $fileHandleName = new FileHandle(">> output_file");
print $fileHandleName "@ARGV\n";
Remember chomp, which had a special variable associated with it? There is another Perl special variable that you should be aware of ($|), which is intimately tied with print. It is the flushing mechanism in Perl. If you want your output to IMMEDIATELY go to a file, rather than be buffered, set $| to one. Note that $| applies to the currently selected output handle, which is STDOUT unless you select another one; for a FileHandle object you can call $fh->autoflush(1) instead. Then you won't be waiting for the system to write a block of output at a time: immediately after a print, the text goes to the specified file.
For example, you may do something such as:
my $fh = new FileHandle("> file");
foreach $line (@lots_of_lines)
{
print $fh "$line\n";
}
If you do this without turning on flushing for $fh, output (i.e.: the 'lines' specified) will accumulate in a buffer, which may be lost if the process dies, for example via an interrupt. Flushing makes things a little bit slower, but a lot safer: the output in the variable $line is put into the file IMMEDIATELY after the print command executes.
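Here is a sketch of per-handle flushing with a FileHandle object. It writes to a temporary file from the core File::Temp module, an assumption made to avoid touching a real file. It then checks the file size before close; without autoflush, the size would typically still be zero at that point.

```perl
use strict;
use warnings;
use FileHandle;
use File::Temp qw(tempfile);

# A temporary file, deleted automatically when the program exits.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close($tmp);

my $fh = FileHandle->new($filename, 'w') or die "Couldn't open $filename: $!\n";
$fh->autoflush(1);   # turn on flushing for this handle only

print $fh "LOG ENTRY\n";

# The handle is still open, but the text has already reached the file.
my $size_before_close = -s $filename;
$fh->close();

print "$size_before_close\n";
```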
Shutting Filehandles: close
The usage of close is:
close (FILEHANDLENAME)
close is the logical opposite of open, or of 'new FileHandle'. When you say:
close ($FileHandle);
or
$fileHandle->close(); # this IS object oriented, after all!
This breaks the pipe between the filehandle and the file it is associated with, which causes several things:
1) If the corresponding open was writing to the file - i.e.:
$filehandle = new FileHandle("> file");
Then any buffered output from 'print $filehandle' is then put into the file right before the close.
2) The line counter for $filehandle ($.) is reset; if you open the file again, reading starts from the beginning.
3) If the corresponding open was a PIPE, then close waits for the command on the other end of the pipe to finish.
Both open and the phrase 'new FileHandle' do an implicit close when the filehandle goes out of scope, i.e.:
sub openFile
{
my $fh = new FileHandle("> log") || die "Couldn't open log!\n";
print $fh "LOG ENTRY";
}
This will both close the file handle $fh, and save the buffered output 'LOG ENTRY' into a file, when the function openFile is finished.
Likewise:
open(FD, "> log");
print FD "LOG ENTRY\n";
will implicitly close the file handle FD right before the Perl script exits.
Summary of FileHandles.
FileHandles are used in Perl to connect to the outside world. First tie a handle to its corresponding file, via:
use FileHandle; # Uses the file handle object.
my $fh = new FileHandle("file");
Then either read or write from that filehandle:
my $line = <$fh>; # reads a line from the file
print $fh $line; # prints to the file
and then you close the filehandle:
close($fh);
This is the simplest form of filehandle usage. You can also read from processes, write to processes, and slurp entire files into either arrays or scalars.
Examples:
Perl's variables have a great deal of synergy. If you look at the datatypes Perl provides, you see that there are certain hooks to translate different types of variables into each other. For example, if you say something like:
@words = split(' ', $paragraph);
you are taking a scalar ($paragraph) and turning it into an array (@words). If you then decide that you want to reverse the words, then you can say:
$paragraph = join(' ', reverse(@words));
which reverses the words and then sticks them together back into a backward paragraph.
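Here is a runnable sketch of the round trip, with a made-up four-word paragraph.

```perl
use strict;
use warnings;

my $paragraph = 'one two three four';

my @words = split(' ', $paragraph);        # scalar -> array
$paragraph = join(' ', reverse(@words));   # array -> reversed list -> scalar

print "$paragraph\n";   # prints: four three two one
```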
Here are some admittedly arbitrary things that you can do to manipulate text. Don't worry about doing anything productive for now; simply look at the flow of how Perl's special variables can make the manipulation of text so easy.
a) One task made very easy by Perl is to load an entire file into an array. To do this, simply say:
use FileHandle;
my $fh = new FileHandle("file_name");
my @arrayName = <$fh>;
After this, @arrayName contains the lines of the file, one per element, each still ending in its newline. If we wanted to load the entire file into a scalar, we could say:
use FileHandle;
my $fh = new FileHandle("file_name");
my $scalarName;
{
local($/) = undef;    # no line separator, so <$fh> returns the whole file
$scalarName = <$fh>;
}
and then $scalarName contains the whole file.
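Both forms of slurping can be tried on a small sample file. This sketch assumes a three-line file created in a temporary directory; the file's contents are illustrative only. The block around local means $/ is restored once the whole-file read is done.

```perl
use strict;
use warnings;
use FileHandle;
use File::Temp qw(tempdir);

# Create a three-line sample file (illustrative data only).
my $path = tempdir(CLEANUP => 1) . "/file_name";
my $out = new FileHandle("> $path") or die "Can't write $path: $!\n";
print $out "one\ntwo\nthree\n";
close($out);

# List context: one array element per line, newlines included.
my $fh = new FileHandle($path) or die "Can't read $path: $!\n";
my @arrayName = <$fh>;
close($fh);

# Scalar context with $/ undefined: the whole file in one string.
# local restores $/ when the block ends, so later reads are unaffected.
my $scalarName;
{
    local $/ = undef;
    my $fh2 = new FileHandle($path) or die "Can't read $path: $!\n";
    $scalarName = <$fh2>;
    close($fh2);
}

print scalar(@arrayName), " lines, ", length($scalarName), " characters\n";
```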
b) What if you wanted to go through a file line by line, backward, take out the first field of each line, and print it? To illustrate, suppose the file looks like:
a:b
c:d
d:e
The code to perform this task looks like:
1 use FileHandle;
2 my $fh = new FileHandle("my_file");
3 my @lines = <$fh>;
4 my $line;
5 foreach $line (reverse(@lines))
6 {
7 my @words_in_line = split(/:/, $line);
8 print "$words_in_line[0]\n";
9 }
The filehandle $fh is combined with the array @lines to slurp the entire file into memory (3). We then traverse @lines one line at a time, in reverse order (5), split each line on colons (7), and print the first word on the line (8).
Given the file above, this prints:
d
c
a
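The logic of example b) can be tested without a file by passing it the sample lines directly. This is a sketch: first_fields_backward is a hypothetical helper name. chomp is added so that a line without a colon does not keep its newline.

```perl
use strict;
use warnings;

# Hypothetical helper: given lines like "a:b\n", return the first
# colon-separated field of each line, last line first.
sub first_fields_backward {
    my @lines = @_;
    my @fields;
    foreach my $line (reverse(@lines)) {
        chomp(my $copy = $line);              # copy, then drop the newline
        my @words_in_line = split(/:/, $copy);
        push(@fields, $words_in_line[0]);
    }
    return @fields;
}

my @result = first_fields_backward("a:b\n", "c:d\n", "d:e\n");
print join("\n", @result), "\n";
```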
c) Another task well suited for Perl is manipulating the text within a file. As an example, let's 'auto abbreviate' a file: split its text into words, truncate each word to five characters, join the words back into a paragraph, and finally write the result to another file:
1 use FileHandle;
2 my $fh = new FileHandle("in_file");
3 my $fh2 = new FileHandle("> out_file");
4
5 while ($line = <$fh>) { $paragraph .= $line; }
6
7 @words = split(' ', $paragraph);
8
9 foreach $word (@words) { $word = substr($word, 0,5); }
10
11 $paragraph = join(' ', @words);
12
13 print $fh2 $paragraph;
14 close($fh2);
15 close($fh);
We open the files (2, 3) and read the input file line by line into one big string, $paragraph (5). We then split it into words (7), truncate each word to its first five characters (9), and join the words back together (11). Finally, we print the paragraph out (13) and close both files (14, 15).
Hence, a paragraph like:
"Courtesy itself must convert to disdain if you come in her presence"
becomes:
"Court itsel must conve to disda if you come in her prese"
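The core of example c) can be pulled into a function and checked against the sample sentence. This is a sketch: abbreviate and its $max parameter are hypothetical names. substr simply returns the whole word when the word is shorter than $max.

```perl
use strict;
use warnings;

# Hypothetical helper: truncate every word to at most $max characters.
sub abbreviate {
    my ($paragraph, $max) = @_;
    my @words = split(' ', $paragraph);
    # The foreach variable aliases each element, so assigning to it
    # changes @words in place.
    foreach my $word (@words) { $word = substr($word, 0, $max); }
    return join(' ', @words);
}

my $text = "Courtesy itself must convert to disdain if you come in her presence";
print abbreviate($text, 5), "\n";
```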
d) Let's do the opposite: instead of shortening words, take a file and make an array of the words that are longer than nine characters. To do this, simply say:
use FileHandle;
my $fh = new FileHandle("in_file");
my ($line, $paragraph);
while ($line = <$fh>) { $paragraph .= $line; }
my @words = split(' ', $paragraph);
my ($word, @longWords);
foreach $word (@words) { if (length($word) > 9) { push(@longWords, $word); } }
print "@longWords\n";
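The foreach/push filter above can also be written with Perl's built-in grep, which keeps only the elements for which its block returns true. The sample sentence in this sketch is made up.

```perl
use strict;
use warnings;

my $paragraph = "Perl's extraordinarily flexible variables make comprehensive text manipulation straightforward";
my @words = split(' ', $paragraph);

# grep evaluates the block with $_ set to each element in turn.
my @longWords = grep { length($_) > 9 } @words;

print "@longWords\n";
```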
e) Continuing this line of thought, another useful manipulation is to make a concordance of a file; here, that means a count of how often each word occurs. We want the following interface. If we say:
print $number{'the'}
this prints the number of occurrences of the word 'the' in the input file. The code looks like:
1 use FileHandle;
2 my $fh = new FileHandle("in_file");
3 my ($paragraph, @words);
4 my ($line, %number, $word);
5 while ($line = <$fh>) { $paragraph .= $line; }
6
7 @words = split(' ', $paragraph);
8 foreach $word (@words) { $number{$word}++; }
9 print $number{'the'};
This works as follows. The logic for reading the file and turning it into words is the same as before (1-7). Now, however, we go through each word (8) and record that we have found an occurrence of it by incrementing its entry in the hash, $number{$word}++. A word seen for the first time starts out undefined, which ++ treats as 0. Hence, with the input:
"In the beginning, there was the word, and the word was God"
the hash built by line 8 becomes:
%number =
(
'In' => 1, 'the' => 3,
'beginning,' => 1, 'there' => 1,
'was' => 2, 'word,' => 1,
'word' => 1, 'and' => 1,
'God' => 1
);
This works because the hash remembers how many times it has seen each word, so line (9) prints '3'. Note that split(' ') splits only on whitespace, so punctuation stays attached to words: 'word,' and 'word' are counted as different keys.
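Because split(' ') splits only on whitespace, punctuation stays attached to the words it follows. If you want punctuation ignored, one option (a variation, not the book's code) is to split on runs of non-word characters with /\W+/. This sketch also lowercases each word with lc, which is an added assumption, and uses the sample sentence above.

```perl
use strict;
use warnings;

my $paragraph = "In the beginning, there was the word, and the word was God";

# Split on runs of non-word characters so punctuation is dropped,
# and lowercase each word so "The" and "the" would share one count.
my %number;
foreach my $word (split(/\W+/, $paragraph)) {
    next if $word eq '';        # a leading separator yields an empty field
    $number{lc($word)}++;
}

print "$number{'the'} $number{'word'}\n";
```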
f) Let's get a little more practical. Suppose you want a listing of files that lets you say something like:
print $size{'my_file'}
and get the size of 'my_file'. On a Win32 platform, you can build such a hash from the output of the dir command:
1 use FileHandle;
2 my $fh = new FileHandle("dir |");
3 while ($line = <$fh>)
4 {
5 my @stats = split(' ',$line);
6 $size{$stats[-1]} = $stats[-4];
7 }
Here, we open a filehandle to the process "dir |" (2) and read its output one line at a time (3), splitting each line on whitespace (5).
In each line, the last field (subscript -1) holds the name of the file, and the fourth field from the end (subscript -4) holds its size. These positions depend on the layout of dir's output, which differs between versions of Windows, so check them against your own system. Hence, the hash assignment:
$size{$stats[-1]} = $stats[-4];
expands to something like:
$size{'file'} = '20,000';
so we can look up the size from the name of the file.
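A portable alternative that does not depend on the format of dir's output reads the directory with opendir/readdir and asks Perl for each file's size with the -s file test, which returns the size in bytes. This sketch assumes a scratch directory containing one 5-byte file named my_file. Note that -s returns a false value for an empty file.

```perl
use strict;
use warnings;
use FileHandle;
use File::Temp qw(tempdir);

# Illustrative setup: a scratch directory containing one 5-byte file.
my $dir = tempdir(CLEANUP => 1);
my $out = new FileHandle("> $dir/my_file") or die "Can't write: $!\n";
print $out "12345";
close($out);

# Build %size from the directory listing; -s gives the size in bytes.
my %size;
opendir(my $dh, $dir) or die "Can't open $dir: $!\n";
foreach my $name (readdir($dh)) {
    my $path = "$dir/$name";
    $size{$name} = -s $path if -f $path;    # regular files only
}
closedir($dh);

print "$size{'my_file'}\n";
```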
Summary of Perl Variables
Perl's variables are tailor-made for fast manipulation of data. Instead of getting low-level access to data that is directly stored in the computer (as you would in such languages as Basic or C), you get access to a rich variety of functions that let you manipulate them to your heart's desire.
Scalars, denoted by '$', are Perl's 'jack of all trades' variable. You can store any type of information in them and split them to form arrays. You can chop them to remove the last character or chomp them to remove a trailing newline, check their length, and manipulate them either as strings or as numbers. The one other main thing that you can do (which we have not discussed here) is to look for patterns in them. This is the domain of 'regular expressions', to which we have dedicated an entire chapter.
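Here is a minimal sketch of the scalar operations just listed, using made-up values. It shows chomp versus chop, length, and the same scalar used both as a number and as a string.

```perl
use strict;
use warnings;

my $line = "42 apples\n";

chomp($line);                      # removes the trailing newline only
my $len = length($line);           # "42 apples" has 9 characters
my ($count) = split(' ', $line);   # first field: "42"
my $total = $count + 8;            # "42" used as a number
my $label = $count . "!";          # "42" used as a string

my $word = "apples";
chop($word);                       # removes the last character, whatever it is

print "$len $total $label $word\n";   # 9 50 42! apple
```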
Arrays, denoted by '@', are groups of scalars indexed by a number. They grow and shrink for you, and can be manipulated as easily as scalars. To access an element, you say $variable[$index], where 'variable' is a valid array variable and $index is the element's position, counting from 0. You push (onto the end of the array) and unshift (onto the beginning) to make arrays longer, and pop and shift to make them shorter. You splice to remove or replace elements in the middle, and join an array's elements to make a scalar. You can also chop or chomp an array, which applies the operation to every element.
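The array operations above can be traced step by step in a short sketch with made-up elements. The comments show the contents of @queue after each call.

```perl
use strict;
use warnings;

my @queue = ("b", "c");

push(@queue, "d");            # onto the end:       b c d
unshift(@queue, "a");         # onto the beginning: a b c d
my $last  = pop(@queue);      # off the end:        returns "d", leaves a b c
my $first = shift(@queue);    # off the beginning:  returns "a", leaves b c

# splice(ARRAY, OFFSET, LENGTH) removes LENGTH elements starting at OFFSET.
my @removed = splice(@queue, 0, 1);    # removes "b", leaves c

push(@queue, "e", "f");                # c e f
print "$queue[0] ", scalar(@queue), " ", join('-', @queue), "\n";   # c 3 c-e-f
```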
Hashes, denoted by '%', are groups of scalars indexed by a scalar. They also grow and shrink for you. To access an element, you say $variable{$key}, where 'variable' is a valid hash variable and $key is a valid hash key. To add an element, you say $variable{$key} = $value; to delete one, you say delete($variable{$key}). To get a list of the keys in a hash, you say keys(%hashName), and to get a list of its values, you say values(%hashName). To go through the keys and values one pair at a time, you say ($key, $value) = each(%hashName), usually as the condition of a while loop.
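Here is a minimal sketch of the hash operations just described, using a hypothetical %age hash. keys and values return elements in no particular order, so the sketch sorts where order matters.

```perl
use strict;
use warnings;

my %age = (alice => 30, bob => 25);

$age{carol} = 35;             # add an element
delete($age{bob});            # remove an element

my @names = sort(keys(%age)); # alice carol
my $sum = 0;
$sum += $_ foreach values(%age);

# each returns one (key, value) pair per call until the hash is exhausted.
my @pairs;
while (my ($key, $value) = each(%age)) {
    push(@pairs, "$key=$value");
}

print "@names $sum ", join(',', sort(@pairs)), "\n";
```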
Finally, handles are Perl's way of reaching out to the operating system. To get a filehandle, you open it on a file or process, and when you are done with it, you close it. To write to the file, you use print; to read from it, you say '$varb = <FH>' to get one line or '@varb = <FH>' to get all remaining lines, where FH is the name of the filehandle. You may also want to consider saying 'my $fh = new FileHandle(OPEN_EXPR);' instead. The handle then lives in an ordinary scalar and is closed automatically when that scalar goes out of scope.
So, that's about all it takes to introduce the basic data structures of Perl. Perl's variables let you do a lot of data-wrangling. If you learn how to use these data structures properly, you will be able to solve, with ease, problems that would be very difficult in other languages.
HTML conversions by Mega Space.
This page updated on October 14, 1997 by Webmaster.
Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.