© 1997 The McGraw-Hill Companies, Inc. All rights reserved.
Any use of this Beta Book is subject to the rules stated in the Terms of Use.

Chapter 7: References

References are Perl 5's way of representing pointers to data, rather than the data itself.

References are a big improvement over anything that existed in Perl 4. They allow you to easily make complicated data structures such as trees, multi-dimensional arrays, priority queues - pretty much any sort of data structure you desire. Old guard Perl programmers should pay special attention to this section because, once you learn how easy it is to program with references, you'll never want to go back to the old way of manipulating complex data.

The purpose of this chapter is to get you used to the elasticity of references: that is, to make you understand how the syntax behind them works and then, by extension, how to use them efficiently. If you are fond of documentation (as I am), the appropriate Perl reference page is perlref, which covers the following material in a lot of detail and with many examples.

It used to be a royal pain, in Perl4, to program data structures more complicated than the traditional arrays and hashes. Just to give you an example of how bad it was, this horrid construct:

@array = ("1$;2$;3$;4", "5$;6$;7$;8$;");

was used to implement a two dimensional array. Each element was accessed by:

print ((split(m"$;", $array[1]))[2]);

which would print out '7'. (Contextually speaking, this says 'take the element $array[1] as a scalar, split it into an array of elements using the special variable $;, and then dereference this array using the subscript '[2]', and print it out'.)

Yuck. Just try to extend this to do three dimensional arrays. Now, in Perl 5, the same construct looks like:

@array = ([1,2,3,4],[5,6,7,8]);

and element 7 is accessed by:

print $array[1][2];

Once again, if you are not familiar with references, you are going to want to read this chapter. Unless, of course, you want to go back to the Perl 4 syntax listed above, or syntaxes like it.

Chapter Overview

Perl's references are just like its contexts; there is a strict set of language rules, which when combined make for very powerful techniques in programming. This chapter deals with these language rules in detail.

First, we shall cover the definitions of the two types of references that perl has - hard and soft, and go over the reasons why hard references should be preferred over their soft counterparts.

We then go over the syntax that is used to construct complex data structures - the various operators (backslash, ref, and anonymous arrays and hashes).

Once we have put the data into the references that we have built, we go over the process of extracting that data. We go over the rules necessary to look inside the array, hash, and scalar references that we have constructed. In addition, we look at the process of drawing/listing what a given data structure will look like in 'raw form' - how it looks if we constructed it manually via perl's syntax.

And finally, we again have examples: storing a HTTP access log into a data structure, recursive references to a directory/file hierarchy and turning it into a data structure, and quite a few others.

Introduction

For 8 years, all that Perl ever had in the way of data structures were these three datatypes: hashes, arrays, and scalars. Some pretty incredible things were done with them in spite of their limitations. As we saw above, people overcame these limitations, by using specific Perl techniques which are totally unnecessary today.

Other such syntax monstrosities included using eval to create variable names, manipulating the symbol table with typeglobs, making one-dimensional hashes and arrays 'look' like they had two or more dimensions, and various other techniques which made the code obscure, hard to debug, and inflexible.*

****Begin FootNote****

If this sounds ugly, well it is. eval and typeglobs have their uses, but not for data structure creation. We will briefly touch on these concepts in chapter 11, so don't worry about their definitions for now.

****End FootNote****

Not any more!

With object-oriented Perl comes a new concept: references. References are simply pointers to items, whether they be hashes, arrays, or scalars. There are even pointers to functions, called code references, which we will go into later. References are used to create trees, recursive data structures, lists of lists, double arrayed hashes, or any other data structure that you can think of, quickly and painlessly.

Here, for example, is a double (or two-dimensional) array which uses references:

$doubleArray = [[1,2,3],[4,5,6],[7,8,9]];

Each individual element is accessed by:

$doubleArray->[1][1];

The first statement is read as '$doubleArray is a reference to an array which holds three references to arrays: [1,2,3], [4,5,6] and [7,8,9]'. The second statement is read as 'access the second element (subscripts start at zero, again) of the second array in $doubleArray', which is 5.
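As a quick check of that reading, here is a minimal sketch that builds the same structure and pulls out the element. The variable $element is introduced here only for illustration.

```perl
use strict;
use warnings;

# The two-dimensional array from the text: a reference to an array
# that holds three array references.
my $doubleArray = [[1,2,3],[4,5,6],[7,8,9]];

# Subscripts start at zero, so [1][1] selects the second row
# and then the second element of that row.
my $element = $doubleArray->[1][1];

print "$element\n";    # prints 5
```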

References greatly expand the usability and maintainability of Perl. As you can see, references also make Perl a much more readable language. A lot of the line noise in Perl 4 scripts was due to the lack of references ('line noise', again, being code with more special characters than normal characters). The job of this section is to teach you how to make references work, and how to decode them.

As with the rest of Perl, there are a few simple rules that determine 95% of what you need to know about references and how to use them. These rules are also discussed in this section.

Perl 5 References: Hard and Soft

References are tags which point to other variables, or to pieces of data on the system. Think of them like a postal address. A postal address refers to the place you live, but is not actually the place you live.

Perl has two types of references: hard and soft.

Hard references are references that point, physically, to a piece of data on a system. They are not a separate datatype in Perl, but instead are stored in scalars, just like every other bit of data in Perl.

If you are a C programmer, you can think of hard references as very much like pointers in the sense that they 'point' to an object. Likewise, you access the object pointed to through a process called dereferencing. Hard references differ from C pointers in that you can't do pointer arithmetic with them. For example:

'$a = [1,2,3]; $a++'

does not make $a point to [2,3].
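To see what the increment does instead, here is a small sketch. The names $ref and $kind are chosen for illustration; the point is only that arithmetic turns the reference into an ordinary number.

```perl
use strict;
use warnings;

my $ref = [1, 2, 3];
print ref($ref), "\n";      # prints ARRAY

# Incrementing treats the reference as a number (its address).
# The result is a plain number, not a reference to [2,3].
$ref++;

my $kind = ref($ref);       # empty string: $ref is no longer a reference
print "still a reference? ", ($kind ? "yes" : "no"), "\n";   # prints: still a reference? no
```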

Perl references are also 'smart'. Perl keeps track of how many references point to each piece of data, and the data only goes away (its memory being automatically freed) when the last reference to it goes away, in a process called garbage collection. If you say:

{

my $a = [1,2,3];

$b = $a;

}

print "@$b\n";

this prints out '1 2 3' since, even though the variable $a has gone away at the end of the block, $b stays on in its stead. This means that the array itself stays around.

Now, one way of making a hard reference to a piece of data is simply to affix a '\' onto the front of a variable, and then assign the result to a scalar. For example:

@array = (1,2,3,4,5);

$hardRef = \@array;

@array is an actual array, (1,2,3,4,5), which takes up a chunk of memory. By virtue of the '\', $hardRef now points to the same chunk of memory that @array does. Now you can manipulate @array through $hardRef. In other words, you have internally made a picture very much like what is in Figure 7.1:

Figure 7.1: References and internal memory management.

You can now treat $hardRef as a handle on @array itself (not on a copy of it), by putting a '@' in front of it to indicate that you are reaching into the reference to get the array stored there. If you now say:

@$hardRef = ('0');

print "@array\n";

This prints out 0, since with '@$hardRef = ('0')' you have changed the value of @array by changing the underlying array.
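The same point can be sketched with a few more operations. This is only an illustration: @copy is a name introduced here to contrast a real copy with access through the reference.

```perl
use strict;
use warnings;

my @array   = (1, 2, 3, 4, 5);
my $hardRef = \@array;          # points at @array itself, not at a copy

push @$hardRef, 6;              # modify the array through the reference
$hardRef->[0] = 'one';          # change a single element the same way

print "@array\n";               # prints: one 2 3 4 5 6

my @copy = @$hardRef;           # by contrast, this really is a copy
$copy[1] = 'changed';
print "$array[1]\n";            # prints: 2 (the original is untouched)
```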

Soft references, or 'symbolic references', are references which work simply because one variable contains the name of another variable. Look closely at the following example:

$variable = 'scalar1'; $scalar1 = 'PRINT ME';

print $$variable;

Perl evaluates $variable to be 'scalar1'. The above is equivalent to:

print $scalar1;

Hence:

print $$variable;

results in printing the string 'PRINT ME'.*

***Begin Note***

It's just personal preference with me, but I strongly suggest NOT using soft references at all. You can't use them with my (lexically scoped) variables. Hence

my $varb = '1';

$name = 'varb';

print $$name;

will not print out '1'. Instead, it will print either nothing at all, or if you so happened to define a 'local' or 'global' version of $varb, it will print out whatever happens to be inside this variable.

Hence, symbolic references promote the use of global and local variables, which you should be actively avoiding.

In fact, with the directive use strict in your scripts (see the section on Debugging), you cannot use soft references at all. If you program something like:

use strict;

my $var = 'scalar1'; # 'my' keeps the 'strict vars' check quiet

$$var = 1;

print $main::scalar1; # fully qualified for the same reason

Then you will get an error generated like:

Can't use string ("scalar1") as a SCALAR ref while "strict refs" in use at line 6.

This itself is good enough reason not to use them. Since use strict is going to be your first line of defense against typos, you really want to avoid them.

***End Note***

Because of the inherent problems of using soft references, this book is only going to be (99%) concerned with hard references. The second type, soft references, existed in Perl4, but has rapidly become only of specialized use due to hard references, and their power. Hard references are by far the most scalable way in Perl to make complicated data structures.

They are hard in the sense that, if you link two variables by a reference, and change one of those two variables, you change the other. They are also hard in the sense that references can exist 'alone', without any name to support them. The statement 'function([1,2,3])' is perfectly legal, and it passes in an array reference without a name.

Finally, if one of the variables gets eliminated, or goes out of scope the other will stay around in its stead.

Recognizing references when you see them, the ref function

The first thing you should learn about references is how to spot them. Perl provides a function, ref, whose reason for being is to tell you whether or not a certain variable is a reference and, if so, what kind.

If you say something like:

print ref(\@arrayReference);

This dutifully prints out 'ARRAY' since the '\@' indicates that you have an array reference. Likewise:

print ref(\%hashReference);

prints out 'HASH' since the '\%' indicates a hash reference. ref returns an empty string, so nothing at all is printed, if its argument isn't a reference.

Keep this in mind when you are going through the following material. Perl always has the 'answers at hand', and if you have a question about what type of data you are handling, you can always use ref to figure out the type.
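Here is a short sketch of what ref reports for the common cases. The variable names and the @kinds list are made up for the example; the strings in the comments are what ref returns.

```perl
use strict;
use warnings;

my @array  = (1, 2, 3);
my %hash   = (key => 'value');
my $scalar = 'plain text';

my @kinds = (
    ref(\@array),         # ARRAY
    ref(\%hash),          # HASH
    ref(\$scalar),        # SCALAR
    ref(sub { 1 }),       # CODE
    ref(\\$scalar),       # REF (a reference to a reference)
    ref($scalar),         # empty string: not a reference at all
);

print join(',', map { $_ eq '' ? '(none)' : $_ } @kinds), "\n";
# prints: ARRAY,HASH,SCALAR,CODE,REF,(none)
```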

Manually Constructing Data Structures

There are two ways of constructing the data structures that we are talking about:

1) by using a '\' to signify the reference.

2) by creating anonymous references

Let's be concerned with constructing these data structures right now. Later on, we shall learn how to get our data out of them.

Using the Backslash operator

We have already mentioned the backslash operator in the definition for hard references. The '\' operator is a great way of creating hard references. Simply put a '\' in front of any variable (array, scalar, or hash) and voila! you've got a reference to it. You could, for example, create a two dimensional array in the following way:

@array = (1,2,3,4);

@array2 = (5,6,7,8);

@array3 = (9,10,11,12);

@twodimarray = (\@array, \@array2, \@array3);

Each of the arrays is manually created and then stuffed into its appropriate 'box' inside the 'master' array. You've made a picture like Figure 7.2:

Figure 7.2: Two dimensional array.

If you wanted to make a hash instead, whose values were themselves hashes, you could write:

%hash1 = (1 => 2,3 => 4);

%hash2 = (5 => 6,7 => 8);

%hash3 = (9 => 10,11 => 12);

%hash =

(

'key1' => \%hash1,

'key2' => \%hash2,

'key3' => \%hash3

);

Here is how to make a reference to a scalar:

$scalarRef = \$scalar;

Note that $scalar may itself hold a reference, since references are stored in scalars. Hence, here is how to make a reference to a reference to a reference to a scalar:

$scalarRef = \\\'Text Here';

Note in the last example that an important thing about the backslash is that you can stack your references, or make a reference to a reference. In this case, it's pretty silly. To actually print out the text inside $scalarRef you would have to do something like:

print $$$$scalarRef;

which is a real waste of '$' signs.

However, there are some cases where you don't want to take the trouble of creating the temporary arrays (@array, @array2, @array3), and simply want to 'build them from scratch'. This is a good case for Anonymous arrays and hashes (created by, what else, Anonymous array and hash constructors), which we touch on below.

Anonymous Arrays and Hashes (and their Constructors):

When you say something like:

$arrayRef = \@array;

you are creating an array reference that points to the same value as what is in @array. However, @array is still the 'actual' array here. What if you want to make something that looks like the following, where everything is a reference to a piece of data that is sort of floating by itself in computer space? In Perl, the way to do this is by creating what are called anonymous references. If you say something such as:

$arrayRef = [1,2,3,4];

the '[]' is an anonymous array constructor and denotes that you are creating an array reference, rather than an actual array. What is created is something that looks like Figure 7.3:

Figure 7.3: 'Anonymous' references.

See the difference here? If you say:

@array = (1,2,3,4);

@array actually is the array pointed to. $arrayRef simply points to an anonymous piece of data that Perl manages, or keeps track of, and you don't have to worry about. For example, if you say something like:

sub array

{

my $arrayRef = [1,2,3,4];

}

where $arrayRef is a temporary bit of data, then after the reference goes out of scope, Perl cleans up after you. This cleanup happens by deleting the reference and reclaiming the memory automatically. This is automatic garbage collection, and we discuss it at the end of the chapter.

You can then add other pointers to the same piece of data, i.e.:

$arrayRef2 = $arrayRef;

There are two different types of anonymous references you can create: anonymous array references, and anonymous hash references. Let's see how we can use these datatypes to create multi-dimensional, complex data structures:

Anonymous array references are denoted by a '[]'. Instead of saying something like:

@array1 = (1,2,3,4);

@array2 = (5,6,7,8);

@array3 = (9,10,11,12);

@twodimarray = (\@array1,\@array2,\@array3);

to create a two dimensional array, you can say:

@twodimarray = ([1,2,3,4],[5,6,7,8],[9,10,11,12]);

instead. See how this is working? The above two statements are not only similar, they are computationally equal. The only difference is that, instead of having @array1 hold the first row, @array2 the second, and so forth, Perl keeps track of the anonymous arrays for you.

In fact, if you wanted to, you could drop the need for having @twodimarray point to the actual, two dimensional array. You could say something like:

$twodimarrayRef = [[1,2,3,4],[5,6,7,8],[9,10,11,12]];

so that $twodimarrayRef is a reference itself, and you don't actually have any variable pointing to the data.

Anonymous hash references are denoted by '{ }'. To make an anonymous hash, you say:

$hashRef = { 'this' => 'is', 'a' => 'hashref'};

as opposed to:

%hash = ('this' => 'is', 'a' => 'hash' );

which has parentheses and a '%' to indicate that you are assigning to a hash rather than a hash reference. You could then go to make a two dimensional hash reference by saying something like:

$hashRef = {

'key1' => {1 => 2, 3 => 4 },

'key2' => {5 => 6, 7 => 8 }

};

key1 has a value which itself is a hash reference, which points to the hash (1=>2, 3=>4). Likewise, key2 has a value (5=>6, 7=>8).

Like the '\' operator, you can stack the anonymous array and hash constructors ([] and {}). To make a reference to an array that holds a reference to an array, and so on four levels deep, you could write:

$nestedArrayRef = [[[[1,2,3,4]]]];

which is again a waste of brackets, and which you shouldn't need to do too often. (It can be done, however. That's what Perl is about!)

Getting More Complicated

Using the backslash operator, and the {}, [] constructs, you can make as complicated structures as you want. Need a three dimensional array? You could write:

$twoDarrayref1 = [[1,2],[3,4],[5,6]];

$twoDarrayref2 = [[7,8,9],['bob','cat']];

$threeDarrayref = [$twoDarrayref1, $twoDarrayref2];

in which the references are placed inside a variable, which is itself a reference. You've generated something that looks like Figure 7.4:

Figure 7.4: A three dimensional array reference.

Need an array, which itself has hashes?

%hash = ('key5' => 'value5', 'key6' => 'value6');

$arrayOfHashes = [

{ 'key' => 'value' , 'key2' => 'value2'},

{ 'key3' => 'value3', 'key4' => 'value4'},

\%hash

];

As you can see, you can mix and match these constructs, intermixing '{}', '[]', and '\' to your heart's content. Although, for consistency's sake, you are probably better off sticking to either anonymous references or explicit references through a backslash.

Here are some more examples of using references, just to get comfortable with the syntax:

$arrayref = [1 ,2 ,3, 4, [1,2,3,4]];

# $arrayref is an array reference

# which points to 5 elements one of which itself is an array

# ref which contains 4 data points.

This shows a hybrid data structure. The first four elements in the array reference are scalars. The fifth one, however, is itself an array reference.

Following is an example of taking two arrays, and putting them into their very own array reference:

my $arrayref = [@array1, @array2];

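# Example of an array reference that points to

# a new array holding the elements of @array1 and @array2.

Here the elements of @array1 and @array2 are copied, concatenated together, into a new anonymous array, and $arrayref points to that copy rather than to either original array.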
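The difference between copying arrays into an anonymous array and taking references to them can be sketched as follows. The names $flat and $nested are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

my @array1 = (1, 2);
my @array2 = (3, 4);

my $flat   = [@array1, @array2];     # one new array with four copied elements
my $nested = [\@array1, \@array2];   # two references to the original arrays

$array1[0] = 'changed';

print scalar(@$flat), "\n";          # prints 4
print "$flat->[0]\n";                # prints 1 (the copy is unaffected)
print scalar(@$nested), "\n";        # prints 2
print "$nested->[0][0]\n";           # prints changed
```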

Finally, try to figure out what is going on in the below examples yourself. Just remember that each time you use a \@ or a [, you are dealing with an array reference, and when you use an { or a \%, you are dealing with a hash reference.

$hashref =

{

key1 => \@array1,

key2 => \@array2,

key3 => [ 'anonymous', 'array', 'reference']

};

# hash has values that themselves are array references

$hashref =

{

key1 =>

{

subkey1 => ['array', 'of', 'elements']

}

};

# hash that has keys which have hashes as values.

$hashref_of_ref = \$hashref; # $hashref_of_ref is now a

# reference to the reference of the hash listed above!

The above show the limits of how far you would ever want to go with data structures. Chances are if you are doing something more complicated, you should do it in a class, with methods, etc., for retrieving data. You probably have gone too far once you have a reference structure four or five levels deep. (I'm sure that you can see that this would not be very fun to maintain, either.)

Using Complicated Data Structures

Complicated data structures are used in Perl when you need more power than a simple hash or array can provide. If you wanted to, you could make an object out of the data that you wish to model. However, making an object takes time and design knowledge, and you may not want to go to the trouble when a data structure will do. There are tradeoffs to either one. These tradeoffs between complex data structures and objects are discussed at length in the next major section in the book, on Objects.

Complex data structures are also very handy for translating data from a database, and turning it into a structure through which you can iterate. Here's an example of parsing information from the command line. The statement:

script.p -option1 'value1' -option2 'array1 array2'

translates very cleanly into:

$commandOptions =

{

'option1' => 'value1',

'option2' => ['array1', 'array2']

};

See how this works? option1 takes one argument, and becomes a hashkey in commandOptions. It has one value which is a scalar, value1. Since option2 takes an ARRAY of values, we translate this into a hashkey in commandOptions with an anonymous array ['array1', 'array2'] as values.
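One way such a translation might be coded is sketched below. This is an assumption-laden illustration, not the parser used elsewhere in this book: the @args list stands in for @ARGV, and the rule "a value with several words becomes an anonymous array" is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical argument list standing in for @ARGV.
my @args = ('-option1', 'value1', '-option2', 'array1 array2');

my $commandOptions = {};
while (@args) {
    my $name  = shift @args;
    my $value = shift @args;
    $name =~ s/^-//;                       # strip the leading dash

    my @words = split ' ', $value;
    # Assumed rule: several words become an array ref, one word stays a scalar.
    $commandOptions->{$name} = @words > 1 ? [@words] : $value;
}

print "$commandOptions->{'option1'}\n";          # prints value1
print "@{ $commandOptions->{'option2'} }\n";     # prints array1 array2
```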

Alternatively, you could do something like this:

$database_table =

[

{'name' => 'table_id', 'type' => 'int', 'size'=> 4 },

{'name' => 'first_column', 'type' => 'char', 'size'=> 10 },

{'name' => 'second_column', 'type' => 'char', 'size'=> 15 }

];

to denote a database table, with its name, type, and size of database fields. These are only some simple examples. There are many other times that you will want the power of more complicated data structures without taking the time and trouble to make a class. We shall see lots of these examples in the chapters to come. We have devoted the next chapter to the more common complex data structures.
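As a small sketch of how such a structure can be walked, the loop below visits each field description in turn. The @lines array and the output format are choices made for this example only.

```perl
use strict;
use warnings;

my $database_table = [
    { 'name' => 'table_id',      'type' => 'int',  'size' => 4  },
    { 'name' => 'first_column',  'type' => 'char', 'size' => 10 },
    { 'name' => 'second_column', 'type' => 'char', 'size' => 15 },
];

my @lines;
foreach my $column (@$database_table) {
    # Each $column is a hash reference describing one field.
    push @lines, "$column->{'name'} $column->{'type'}($column->{'size'})";
}

print "$_\n" foreach @lines;
# prints:
# table_id int(4)
# first_column char(10)
# second_column char(15)
```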

Summary of Manually Referencing and Constructing Data Structures

References point to a place in memory that contains data. The data may be a scalar, array or hash. The simple way to remember what type of data structure that you are referencing and constructing is:

  • \$ means scalars

  • \@ and [ ] mean arrays

  • \% and { } mean hashes

References may be stacked, moved, copied or swapped. They are a handy way to access data. An anonymous reference is one which is created by the [] or {} syntax and which allocates a chunk of memory without associating that memory directly with any particular variable.

Dereferencing Complex Data Structures

    The previous section showed you how to create and populate data structures. Of course it is always nice to be able to get that data back out of the data structure. Dereference means to turn a reference back into its component values.

    So how do you go about getting the data out of the data structure? For one thing, you cannot simply print out the references to get the data that is in them. That would require implicit dereferencing (i.e., dereferencing done behind the scenes). Implicit dereferencing would be an OK policy if Perl always did it. (Python, for example, always dereferences. It is a 'pointer semantic language'.)

    But as it stands, Perl can either access 'actual items', or 'references to actual items', so it needs to have a different syntax for doing each one (Perl, after all, is good but not a mindreader). If a reference is directly printed, Perl shows text indicating the place in memory where the reference is, which is seldom what you want. This is shown in the statements:

    $scalarRef = \'the reference to a scalar'; # $scalarRef is a scalar reference.

    print $scalarRef; # prints out "SCALAR(0x....)".

    Printing out the memory location instead of the data at that memory location also happens with arrays and hashes:

    $arrayRef = [ 1, 2, 3, 4 ]; # $arrayRef is a reference to an array.

    print $arrayRef; # prints out ARRAY(0x....).

    $hashref = { 'key' => 'value' }; # $hashref is a reference to a hash.

    print $hashref; # prints out HASH(0x.....)

    These print out something like 'ARRAY(0x12323)' and 'HASH(0x12123)' respectively. Note that these are just strings and not usable as references; they are useful only for debugging, and for indicating what type of reference each item is. Since hashes only hash on strings, you can't usefully use references as keys to hashes. This means you shouldn't say:

    $hash{$arrayRef} = $arrayRef2;

    or

    $hash{"$arrayRef"} = $arrayRef2;

    since each of these references is turned into a string when it goes in as a hash key, and that string cannot be turned back into a reference.
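The loss of the reference can be seen in a short sketch. The names $key and $isRef are introduced here for illustration; the exact address printed will differ from run to run.

```perl
use strict;
use warnings;

my $arrayRef = [1, 2, 3, 4];
my %hash;

$hash{$arrayRef} = 'some value';     # the key is the string form of the reference

my ($key) = keys %hash;
print "$key\n";                      # prints something like ARRAY(0x...)

# The key that comes back out is only a string, so ref() reports nothing.
my $isRef = ref($key) ? 'yes' : 'no';
print "key is a reference? $isRef\n";    # prints: key is a reference? no
```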

    We need to get a little more complicated in coding to get our data back out of these data structures. Following are rules that govern the dereferencing of a data structure. These rules go from most readable and maintainable to least.

    Rule1: In simple cases, you can just take whatever the variable is and dereference it by the appropriate symbol for the reference (i.e.: '$', '@', or '%').

    A simple example of this is:

    $line = \'this is a scalar ref';

    print $$line; # prints 'this is a scalar ref.'

    Notice the two dollar signs. In effect, this says "take the scalar variable reference and dereference that data". An equivalent can be done for arrays and hashes:

    $arrayRef = [1,2,3,4];

    print "@$arrayRef\n";

    which prints out '1 2 3 4'.

    The following prints out 'key key2':

    $hashRef = {'key' => 'value', 'key2' => 'value2' };

    @keys = sort keys (%$hashRef);

    print "@keys\n";

    The following code gives an error:

    $hashRef = {'key' => 'value', 'key2' => 'value2' };

    print @$hashRef;

    since you are trying to dereference it as an array reference, when $hashRef is actually a hash reference. (Perl reports 'Not an ARRAY reference'.)

    Rule2: For array and hash references, elements can be accessed by the symbol '->'. This is called direct access.

    If you say:

    $hashRef = {'key' => 'value'};

    then

    print $hashRef->{'key'};

    prints out the string 'value', since the -> has reached inside the hash reference and retrieved the value for the key 'key', much the same way that $hash{'key'} grabs data inside a real hash.

    Rule3: For array and hash references, you can alternatively access each element by doing $$varb[] for arrays or $$varb{} for hashes. This is called indirect access. $varb is the name of the variable that you are using.

    Rule 2 is by far the more readable. Rule2 can also be stacked, which means you can use it on deeply nested constructs.

    Simple example of rule2 and rule3:

    $line = [0,1,2,3];

    print $line->[0]; # prints 0 (rule2);

    print $$line[0]; # prints 0 (rule3);

    This is a more complicated example of rule2 (showing stacking):

    $line = [[0,1]];

    print $line->[0][0]; # prints '0' (rule2)

    Both of these examples show how you would reach inside the data structure, using the -> syntax.

    Rule4: In cases of ambiguity, curly braces can be used to disambiguate. If you are using this rule a lot, your code is probably not very readable.

    Simple example:

    $arrayRef = [0,1,2,3];

    print ${$arrayRef}[0]; # prints 0;

    As we said, you are probably not using references correctly if you resort to Rule4 a lot. By judicious use of assignment, you can take Rule4 and break it up into many instances of Rules 1 and 2. If you have something such as this:

    $complicatedRef = [1,2,3,[4,5,6]];

    you COULD say something such as:

    print "@{$complicatedRef->[3]}";

    to print out "4 5 6". What this does is directly reach in to the complicated data structure, return an anonymous array that is associated to $complicatedRef, and then dereference it. Look at Figure 7.5:

    Figure 7.5: How complicated referencing works.

    But why bother with this complicated syntax? You are better off using a temporary placeholder:

    my $ref = $complicatedRef->[3];

    print @$ref;

    This can make your code far more readable and far more maintainable. It shows your thinking process in all its steps instead of a fait accompli.

    Again, let's take some examples of complicated data structures, and see how you would want to dereference them.

    $arrayref = [ 1, 2, 3, 4 ]; # a simple array ref.

    print $arrayref->[1]; # use rule2, -> syntax ( prints 2 )

    print $$arrayref[1]; # use rule3, $$ syntax ( prints 2 )

    print ${$arrayref}[1]; # use rule4 to disambiguate ( prints 2 )

    print "@$arrayref"; # use rule1 ( prints 1 2 3 4 )

    The preceding example shows several different, simple ways of accessing an array reference. It is largely a matter of taste, but rule2 (for me) is clearest.

    $arrayref = [' ', ' ',' ', ' ', [1,2]];

    print $arrayref->[4]->[1]; # use rule2 ( prints 2);

    print $arrayref->[4][1]; # use rule2. Note

    # lack of '->' in second bracket. This is

    # equivalent to the above statement because

    # you don't need to write a second '->'

    $complicatedRef = [{hmm => {hmm2 => ['key']}}];

    print $complicatedRef->[0]{'hmm'}{'hmm2'}[0]; # uses rule 2. Note that we

    # can mix and match array and

    # hash constructors as we please!

    Here we show the use of syntactic sugar. Syntactic sugar denotes a syntax construct which is equivalent to some more 'verbose' syntax, but is easier to write and therefore understand.

    In this case, the -> can be omitted between subscripts when you go more than one level deep in a reference. This is because once you get past the first level of an array or hash access, you are dealing strictly with references. The '->' is only needed to distinguish between a data structure which is a reference, and a data structure which is the real thing. Here are some more examples:

    $ArrayRefOfArrayRef = [0,1,2,3,['a1','a2','a3','a4']];

    print $$ArrayRefOfArrayRef[4][2]; # use rule3 ( prints a3 ).

    print ${$ArrayRefOfArrayRef}[4][2]; # use rule4 ( prints a3 ).

    print "@{$ArrayRefOfArrayRef->[4]}"; # use a combination of rule2 and rule4.

    # getting kind of ugly, yet this prints out 'a1 a2 a3 a4'

    print "@$ArrayRefOfArrayRef->[4]\n"; # Note: NOT the same. Prints out ARRAY(0x....)

    The last line in the example does NOT print out 'a1 a2 a3 a4'. '@$ArrayRefOfArrayRef->[4]' turns into '@{$ArrayRefOfArrayRef}->[4]', which is interpreted as 'show me the element with subscript 4 of the array behind $ArrayRefOfArrayRef'. That element is itself an array reference, so this prints out "ARRAY(0x....)". (Newer versions of Perl reject this construct outright.) Again, this is getting way too complicated, and we are better off creating a placeholder for all of this data:

    my $placeholder = $ArrayRefOfArrayRef->[4]; # better way to handle ambiguity.

    print "@$placeholder"; # now $placeholder holds ['a1','a2','a3','a4'].

    "@$placeholder" dereferences to print out "a1 a2 a3 a4".

    This shows the confusion that you can get into when you do not split up the dereferencing into separate parts. It is an easy place for people to get confused. '@$ArrayRefOfArrayRef->[4]' seems a very natural way to dereference a data chunk. Unfortunately, it is ambiguous, and if you are doing stuff like this, make sure to do it in two separate steps since that forces you to be clear in your thoughts.

    Here's a subtle issue that you should be aware of when using the -> operator:

    @ArrayOfArrayRef = (1,2,3,4,['a1','a2','a3','a4']);

    print $ArrayOfArrayRef[4][2];

    Note the difference here between this and $ArrayRefOfArrayRef. Since we are assigning to @ArrayOfArrayRef directly, it is an array first (note the '@' here). We need to assign to @ArrayOfArrayRef in LIST context, which calls for (). Note how we access it. There is no -> between ArrayOfArrayRef and the [4] because @ArrayOfArrayRef is an actual array, NOT an array reference. Hence, the -> is only good in the case of array references, NOT arrays. In fact, this is a good maxim to remember:

    Only use the -> syntax when you are dealing with references, not actual pieces of data. Only use a -> at the 'top level' of a data structure. You don't need to do it at lower levels.
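A side-by-side sketch of the maxim, using made-up names (@array, $arrayRef, $fromArray, $fromRef) for illustration:

```perl
use strict;
use warnings;

my @array    = (1, 2, ['a1', 'a2', 'a3']);   # a real array
my $arrayRef = [1, 2, ['a1', 'a2', 'a3']];   # a reference to an array

my $fromArray = $array[2][1];        # no '->' at the top: @array is not a reference
my $fromRef   = $arrayRef->[2][1];   # '->' at the top, omitted between subscripts

print "$fromArray $fromRef\n";       # prints: a2 a2
```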

    You can use these four rules in combination with the methods for access we gave last chapter to access and print out some pretty complicated data structures. As a final example, let's suppose we want to print out the following data structure:

    $hashref =

    {

    key1 =>

    {

    key1a => 'value1a',

    key1b => 'value1b',

    },

    key2 =>

    {

    key2a => 'value2a',

    key2b => 'value2b'

    }

    };

    This is a hash of hashes. The idea is to traverse down the data structure, and at each level print out all of the data structures below. Hence, we can use %$hashref to dereference the first layer, giving access to key1 and key2.

    foreach $key (sort keys (%$hashref))

    {

    # accessing the first level of the

    # data structure by rule1 (% in front of $)

    # gets keys (key1, key2). We do 'sort' to retain order;

    The expression 'sort keys (%$hashref)' now yields a sorted list of the keys in the hash. The 'foreach $key' loop lets us iterate through them. We now want to make a placeholder (rule2a), so that we can access the values of the first level, which are themselves hashes. This looks like:

    my $ref = $hashref->{$key};

    # sets the reference equal to the values of key1, key2 (ie: other hashes.)

    This points to the hashes that are the actual values of the keys in hashref. If one level is:

    'key1' => { 'key1a'=>'value1a','key1b' => 'value1b' }

    then $hashref->{'key1'} will point to the hash {'key1a'=>'value1a', 'key1b' => 'value1b'}. We now proceed to go through the second level hash in the same way we did the first:

    foreach $key2 (sort keys %$ref)

    { # loops over the second level hash (key1a, key1b) first,

    my $value = $ref->{$key2}; # then (key2a, key2b) second.

    print "$key: $key2: $value\n";

    }

    We then close the first level loop with a right curly brace. When this loop is done, it prints out:

    key1: key1a: value1a

    key1: key1b: value1b

    key2: key2a: value2a

    key2: key2b: value2b

    Make sure you understand how this is working, as it is a common idiom in Perl. We have seen it once, and we shall see it again.
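
    For readers who want to run the whole loop in one piece, here is a sketch of the placeholder technique assembled from the fragments above. The values are quoted so that it runs under 'use strict', and the @lines array is my addition so the result can be checked as well as printed.

```perl
use strict;
use warnings;

my $hashref = {
    key1 => { key1a => 'value1a', key1b => 'value1b' },
    key2 => { key2a => 'value2a', key2b => 'value2b' },
};

my @lines;
foreach my $key (sort keys %$hashref) {
    my $ref = $hashref->{$key};            # placeholder for the inner hash
    foreach my $key2 (sort keys %$ref) {
        my $value = $ref->{$key2};         # value stored in the inner hash
        push @lines, "$key: $key2: $value";
    }
}
print "$_\n" for @lines;
```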

    As with everything else in Perl, you do have a choice. This functionality could be written by using the shorthand of Rule3 (using brackets to disambiguate):

    foreach $key (sort keys (%$hashref))

    {

    foreach $key2 (sort keys %{$hashref->{$key}})

    {

    foreach $key3 (sort keys %{$hashref->{$key}{$key2}})

    {

    ....

    I personally prefer the first structure rather than the second, because I'm forced to think what I am doing. The second example blithely goes through the levels of the data structure without being explicit on how the levels interact with each other, although the code is much shorter. It lacks clarity of thought; unless you know exactly what the syntax means, and how this is working, I would stick to using the placeholder technique.

    Summary of Dereferencing Data References

    This section discussed the various methods for retrieving data from memory once it had been put there in scalar, array or hash form. Dereferencing means interpretation of a reference to see where it points, going there and retrieving the data. This occurs all in one step.

    The most important rule with dereferencing and deconstructing is, as with all of Perl, keep it simple. It is better to use multiple lines of code to show your intentions than to try to show off your brilliance through complexity (which usually has the opposite effect!). In particular there are four rules you should remember:

    1) Use '@', '$', or '%' to dereference references of the appropriate type: if you say $arrayRef = ['my', 'array'], then @$arrayRef is the list ('my', 'array'); if you say $hashRef = {'my' => 'hash'}, then %$hashRef is the hash ('my' => 'hash').

    2) Use ->, as in $hashRef->{'key'} or $arrayRef->[1], to access elements of an hash reference or array reference.

    3) Use $$arrayRef[0], or $$hashRef{'key'} as an alternate method of accessing elements. Use in moderation, as rule 2 is much cleaner.

    4) Use braces to disambiguate the two ways of referring to these structures. Hence, ${$arrayRef}[0] is the same as $$arrayRef[0], only more 'clean'.

    Finally, remember that you can use these things in conjunction with each other: ${$arrayRef->[0]}[0] might refer to an element in a two dimensional array (although the cleaner $arrayRef->[0][0] might be preferable.)
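
    The rules are easier to compare side by side. The following is a small sketch with made-up data; the variable names ($viaArrow, $grid and so on) are assumptions of mine, not anything Perl requires.

```perl
use strict;
use warnings;

my $arrayRef = ['zero', 'one', 'two'];
my $hashRef  = { key => 'value' };

my @wholeArray = @$arrayRef;          # rule 1: sigil in front of the reference
my %wholeHash  = %$hashRef;
my $viaArrow   = $arrayRef->[1];      # rule 2: arrow
my $viaSigil   = $$arrayRef[1];       # rule 3: double sigil
my $viaBraces  = ${$arrayRef}[1];     # rule 4: braces make the grouping explicit

# Mixing the forms on a two dimensional structure.
my $grid     = [[1, 2], [3, 4]];
my $cell     = $grid->[1][0];
my $sameCell = ${ $grid->[1] }[0];

print "$viaArrow $viaSigil $viaBraces $cell $sameCell\n";   # prints "one one one 3 3"
```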

    The next section shows another way to create data and data structures.

    Creating Data Structures By Assignment

    Perl provides the programmer a powerful way to create data structures implicitly. This is through assignment of a value. In other words, there is no need to take two steps to create a structure and then assign values to it.

    If you want a three dimensional array, you can write:

    $array3D[4][2][1] = 1;

    and voila! The array is created for you. As always, Perl is nice enough to handle memory management transparently for you. Perl creates just enough structure to hold the element: an outer array of five elements whose last element refers to an array of three elements, whose last element in turn refers to an array of two elements (remember '0' subscripts.)

    Before you go onto this section, you should understand the above two sections on constructing and deconstructing references thoroughly. The reason is that Perl delivers a loaded weapon into the developer's hand. This is the ability to make complicated data structures without thinking of the consequences of the complication. In other words, it is easy to write:

    $a->{'sam\'s'}{'account'}[1]{'January'} = ['list', 'of', 'purchases'];

    without thinking of how to access the structures underlying what you are doing. There is a general rule for creating complex data structures by assignment, and this is more a programmer's style rule than anything else.

    If you are going to create a complex data structure by simple assignment, know what the anonymous reference of what you just created looks like, from the point of view of where you created it. The above example (with list of purchases) looks like:

    $a = {'sam\'s' => {'account' => [undef,{'January' => ['list', 'of', 'purchases']}]}};

    Ugly, right?

    When code looks that ugly, consider breaking up the data structure creation into several separate steps, and think about capturing the sub-steps into functions. In other words think about what the process is going to be for recovering your data, and if that data is better served in a class.

    ***Begin Note***

    As a personal rule, I never directly access data structures more than two levels deep. This forces clarity and, hopefully, makes life easier for my fellow programmers.

    ***End Note***

    The Process to Create Data Structures by Assignment

    The process for creating a data structure by assignment is extremely easy. Simply write down the data structure you want, being careful of contexts, and Perl will do all the variable management for you.

    For example, the following creates a three dimensional array with at least one element in each column. It also sets the 0,0,0 element to 1:

    $threeDarray[0][0][0] = 1; # Creates a three dimensional array, ie:

    # [[[1]]]

    The next example creates a three dimensional array in which the outer array has 2 elements, the array in $threeDarray[1] has 3 elements, and the array in $threeDarray[1][2] has 1001 elements. Since the 0..999 elements are not there, Perl creates them and leaves them undefined.

    $threeDarray[1][2][1000] = 1;
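
    If you want to see what Perl actually built, you can measure each level. This is a sketch only; the names $outer, $middle, $inner, $untouched and $filler are mine.

```perl
use strict;
use warnings;

my @threeDarray;
$threeDarray[1][2][1000] = 1;

my $outer  = scalar @threeDarray;              # elements in the outer array
my $middle = scalar @{ $threeDarray[1] };      # elements in the array at [1]
my $inner  = scalar @{ $threeDarray[1][2] };   # elements in the array at [1][2]

# Slots that were never assigned stay undefined; nothing is built under them.
my $untouched = defined($threeDarray[0])         ? 'defined' : 'undef';
my $filler    = defined($threeDarray[1][2][999]) ? 'defined' : 'undef';

print "$outer x $middle x $inner, [0] is $untouched, [1][2][999] is $filler\n";
```

    Note that only the path to the assigned element is filled in. $threeDarray[0] does not hold an array of its own.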

    It is possible to create hashes with arrays for the hash values. The next example is a 2 dimensional hash with an anonymous array (a copy of @array) stored under hashkey1 and then hashkey2:

    $twoDhash{hashkey1}{hashkey2} = [@array];

    The following statements create the reference equivalents of the previous three examples. Note that there is an extra '->' between the Ref and the first [ ]. This indicates that we are dealing with a reference here, NOT a real variable. It also makes the variable much easier to pass to functions:

    $threeDarrayRef->[0][0][0] = 1;

    $threeDarrayRef->[1][2][1000] = 1;

    $twoDhashRef->{hashkey1}{hashkey2} = [@array];

    complicatedFunction($threeDarrayRef, $twoDhashRef);

    See how clean and easy this is? This will be a major point in our chapters to come. Passing a 'real' variable to a function looks like:

    complicatedFunction(\@threeDarray, \%twoDhash);

    which is a little bit cluttered.

    We have now seen the creation of a Perl data structure by way of an assignment to a variable. Just remember, this is not a very explicit way to create variables and data structures. Code maintainers will think mean thoughts about you if you scatter newly created variables throughout your code without appropriate comments. We are not calling for a return to the days of declaring all variables at the top of a program, but do show some kindness to those who follow you.

    Translating Direct Assignments into Equivalent Perl 5 Statements

    Translating an assignment into anonymous references means taking the assignment and writing the Perl statement that would build the same thing directly, via a list or the anonymous constructors [ ] and { } (forgetting the rest of the data structure for the time being).

    This is a good exercise to go through, and helps you thoroughly in understanding references. For example, the assignment:

    $array1[4] = 1;

    turns into the Perl statement

    @array1 = (undef,undef,undef,undef,1);

    See how this works? If you set the fifth element of array1 to 1, then this creates, by default, 4 undefined elements in the array, before the '1' in element #4 (since Perl is indexed by 0). This is what actually happens when you assign to an array element larger than the largest one: the array extends, filling the extra 'spaces' with undefined elements.

    Again, concentrate on the one statement itself. Forget about the rest of the data structure when doing these exercises.

    There are lots of reasons why you want to be able to translate complicated data structures into references:

    1. clarity, so you know exactly what you are doing

    2. for regression testing

    3. for debugging.

    Reason #1, clarity, is probably the most important. If you are going to be making complicated data structures, you need to know exactly what you are doing. Turning an assignment into an anonymous reference (so if you wanted to, you could build the data structure by hand) is an easy exercise to see how clearly you are thinking.

    Fortunately, there is an easy process for translating a data structure into an anonymous reference: simply use the rules for dereferencing to take the assignment apart, and write each piece in reference form. In this way, each { } subscript becomes an anonymous hash {...}, each [ ] subscript becomes an anonymous array [...], and so on.

    This is a subject that is best explained by example. If you have the assignment:

    $array[4] = 1;

    it turns into the following data structure/Perl statement:

    # @array = (undef,undef,undef,undef,1); # shows the data structure @array after assignment.

    # note... only affects the place with the 1, doesn't

    # affect the entire data structure.

    Likewise the assignment:

    $array2D[4][0] = 1;

    Turns into:

    @array2D = # shows the data structure @array2D after assignment.

    (undef,undef,undef,undef,

    [1] # ArrayRef in fifth (indexed by 0) element.

    );

    And if we add the assignment

    $array2D[4][1] = 2;

    to the data structure @array2D, we get:

    @array2D == (undef,undef,undef,undef,

    [1,2] # ArrayRef in fifth (indexed by 0) element,

    ); # with the new 2 added to it.

    What has happened here is that Perl has automatically grown the array for you. The '2' here gets added after the '1' to get the structure above.

    Now if we do an assignment with a ->, a reference to an array or hash is created instead:

    # $arrayRef->[4] = 1;

    In the corresponding anonymous form, there are square brackets instead of the parentheses:

    # $arrayRef == [undef,undef,undef,undef,1]; # Note... brackets here instead of ()

    Here's a couple of three dimensional examples. Notice that each bracket causes an extra level of nesting:

    $arrayRef3D->[4][1][1] = 'ADDED ELEMENT1';

    # Equivalent to:

    #$arrayRef3D ==

    # [undef,undef,undef,undef,

    # [undef, # array ref in the fifth (indexed by 0) element.

    # [undef,'ADDED ELEMENT1'] # array ref in the second element of

    # ] # the fifth element

    # ];

    If we add something again:

    $arrayRef3D->[4][0][2] = 'ADDED ELEMENT2';

    #Equivalent to

    # $arrayRef3D == [undef,undef,undef,undef,

    # [ [undef,undef,'ADDED ELEMENT2'],[undef,'ADDED ELEMENT1'] ]];

    The added element gets put in a newly grown array at position [4][0], which was undefined before.

    Finally, a hash example:

    $twoDhashRef->{hashkey1}{hashkey2} = [@array];

    # Equivalent to

    # $twoDhashRef == {hashkey1 => {hashkey2 =>[@array]}};

    See how each level of nesting in the hash here causes an extra '{'? Since [@array] builds an anonymous array holding a copy of the elements of @array, the value of hashkey2 becomes a reference to that copy.

    The above examples show why it is so important to know exactly what you are doing before you do that long assignment. Think of what happens if you do something like:

    $arrayRef3D->[4][0][1] = [@array];

    and then

    $arrayRef3D->[4][0] = { %hash };

    Let's turn this into the anonymous constructs as above. The first assignment '$arrayRef3D->[4][0][1] = [@array];' becomes:

    $arrayRef3D = [undef,undef,undef,undef,[[undef,[@array]]]];

    with each level of brackets in $arrayRef3D becoming an extra dimension.

    Now, when we add the hash, we get

    $arrayRef3D = [undef,undef,undef,undef,[{%hash}]];

    See what happened? We replaced the [4][0] element with a hash. [@array] occupied the [4][0][1] position. Since [@array] was part of that [4][0] element, it got clobbered, along with any other subscripts hanging off [4][0].

    Since Perl does all of its own memory management implicitly, the array reference that you are expecting disappears, and is replaced by a hash. This is probably not what was intended. Be careful with this construct and use translation to debug.
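
    You can watch the clobbering happen with the built-in ref function. This is a sketch with stand-in data for @array and %hash; the variables $before, $after and $error are mine.

```perl
use strict;
use warnings;

my @array = ('a', 'b');        # stand-in data for the example
my %hash  = (k => 'v');

my $arrayRef3D;
$arrayRef3D->[4][0][1] = [@array];
my $before = ref($arrayRef3D->[4][0]);    # what [4][0] holds now

$arrayRef3D->[4][0] = { %hash };          # overwrite the whole [4][0] element
my $after = ref($arrayRef3D->[4][0]);

# The old array subscript no longer works: [4][0] is not an array any more.
my $lost  = eval { $arrayRef3D->[4][0][1] };
my $error = $@;

print "before: $before, after: $after\n";     # prints "before: ARRAY, after: HASH"
print "old subscript fails: $error" if $error;
```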

    ***Begin Note***

    Fortunately, there is lots of help for the programmer who is dealing with structures and debugging them. The built-in debugger has facilities for printing out data structures as complicated as you would like. There is also a gem of a module, Data::Dumper (imported from CPAN; it is discussed in section 'Modules: Data::Dumper' and is included on the CD that comes along with this book), that can take any reference that Perl uses and turn it into the structure format listed. We will take a look at Data::Dumper later on.

    Simple example of Dumper:

    use Data::Dumper;

     

    $arrayref = [1,2,3,[1,2,3,4,{key1 => value1, key2 => value2}]];

     

    print Dumper($arrayref);

     

    $VAR1 = [1, 2, 3,

    [1, 2, 3, 4,

    {

    key1 => value1,

    key2 => value2

    }

    ]

    ];

    We discuss Dumper further in the chapter on 'programming for debugging' - it really helps quite a bit.

    ***End Note***
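
    Dumper also gives you a way to check your hand translations. The sketch below builds a structure by assignment, writes out what I believe the equivalent anonymous form to be, and compares the two dumps as strings. The variable names $built, $written and $same are mine, and the two Data::Dumper settings are there only to keep the output stable and compact.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order, so dumps can be compared
$Data::Dumper::Indent   = 1;   # more compact layout

# Built by assignment, one element at a time.
my $built;
$built->[4][1][1] = 'ADDED ELEMENT1';
$built->[4][0][2] = 'ADDED ELEMENT2';

# The same structure written out by hand.
my $written = [undef, undef, undef, undef,
               [ [undef, undef, 'ADDED ELEMENT2'], [undef, 'ADDED ELEMENT1'] ] ];

my $same = (Dumper($built) eq Dumper($written)) ? 'yes' : 'no';

print Dumper($built);
print "hand translation matches: $same\n";
```

    If the two dumps differ, the hand translation is wrong, which is exactly the kind of mistake this exercise is meant to catch.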

    So why learn how to turn a Perl assignment into a data structure that is equal to that assignment if Perl can do it for you? Because you can intuitively debug any Perl program if you know this method. If you are an experimenter, then read the above twice or thrice, look at the more complicated examples below, and then go ahead and turn to the section called Dumper and experiment with it.

    The better you know the actual Perl translation of code, and the better you can understand how these data structures work, the cleaner your code will be. And those who maintain after you will sing your praises.

    References and Scoping: Garbage Collection

    There is one special property that you should be aware of when dealing with references, and that is Perl's policy towards re-collecting the memory of references that go 'out of scope'.

    This policy is called garbage collection by reference count and it works like this.

    Remember when we said that the following statement:

    if ($true)

    {

    my $varb = [1,2,3,4];

    }

    actually created a new variable called $varb, assigned it the value [1,2,3,4] and then instantly destroyed $varb after the bracket? Well, there were some complex machinations going on behind the scenes here.

    Every time you say 'my ($varb) = [1,2,3,4]' or somesuch, Perl marks down that the particular piece of memory [1,2,3,4] is being referred to by one variable, namely $varb. When the closing bracket is reached, the reference count for the piece of memory goes down by one ($varb is destroyed). And now, when the reference count is zero for [1,2,3,4], that piece of memory is reclaimed.

    The important thing to remember here is that Perl does not claim memory back when a particular instance of a variable is destroyed, but only when the reference count for a piece of memory goes down to zero. It is very important to understand how Perl behaves in this situation, and is one of the primary ways that Perl differs from C, C++, or any other lower level language. If you now say:

    1 my ($saveVarb);

    2 if ($true)

    3 {

    4 my ($varb) = [1,2,3,4];

    5 $saveVarb = $varb;

    6 }

    7 print "@$saveVarb\n";

    the reference count for the piece of memory in [1,2,3,4] goes up by one (for my ($varb) = [1,2,3,4]) and then goes up to two (from $saveVarb = $varb). When the closing bracket is reached, the reference count goes down to one. $varb is destroyed as per scoping rules, but the piece of memory referred to by $varb does not get destroyed because the reference count is still one. Therefore, line 7 prints out '1 2 3 4'.

    Another example:

    1 sub subroutine

    2 {

    3 my (@array) = (@_, @_);

    4 return(\@array);

    5 }

    This simply returns the argument list, doubled, to the caller in the form of an array reference. At line 3, the reference count of the data held in @array is one. At line 4, taking the reference \@array and handing it to the caller (as in my $arrayRef = subroutine(@args)) temporarily raises the count to two.

    When the subroutine exits, the name @array goes out of scope and the count drops back to one. But the memory that was in @array persists, because the caller's reference still points at it.
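
    A short sketch shows the practical consequence: every call hands back its own array, and that array lives for as long as the caller keeps the reference. The subroutine name doubled and the variables $first and $second are mine.

```perl
use strict;
use warnings;

sub doubled {
    my @array = (@_, @_);   # a fresh @array on every call
    return \@array;         # the data survives because the caller holds this
}

my $first  = doubled(1, 2);
my $second = doubled('x');

push @$first, 'extra';      # changes only the first array

print "@$first\n";          # prints "1 2 1 2 extra"
print "@$second\n";         # prints "x x"
```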

    This policy allows you to not worry about memory management, 99% of the time that is. If you say something like:

    1 if ($true)

    2 {

    3 my $a = 'HERE';

    4 $a = \$a;

    5 }

    Then Perl will never reclaim this memory. Instead, it persists. In line 3, the reference count for the scalar holding 'HERE' becomes 1 (the name $a refers to it). In line 4, $a is overwritten with a reference to itself, which bumps the count up to two. When line 5 comes, the reference count goes down from two to one as the name $a is 'destroyed', but the scalar still refers to itself.

    Hence, the memory count here never reaches zero. You are going to have to explicitly do something like:

    if ($true)

    {

    my $a = 'HERE';

    $a = \$a;

    undef $$a;

    }

    to break this cycle (which may be common in cases where you make tree structures).
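
    To see a cycle leak and then see it fixed, the sketch below borrows one feature from the object-oriented chapters: a DESTROY subroutine, which Perl calls when an object's reference count reaches zero. The package name Tracker and its counter $freed are hypothetical helpers of mine.

```perl
use strict;
use warnings;

# Hypothetical helper class: counts how many of its objects have been freed.
package Tracker;
our $freed = 0;
sub new     { return bless {}, shift }
sub DESTROY { $freed++ }

package main;

{
    my $node = Tracker->new;
    $node->{self} = $node;          # cycle: the object refers to itself
}
my $afterLeak = $Tracker::freed;    # count never reached zero inside the block

{
    my $node = Tracker->new;
    $node->{self} = $node;
    delete $node->{self};           # break the cycle by hand before leaving scope
}
my $afterFix = $Tracker::freed;     # the second object was reclaimed

print "freed after leak: $afterLeak, freed after fix: $afterFix\n";
```

    The first object is only cleaned up when the program exits, which is why the first count is still zero at the point where it is read.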

    The point of this section is to trust Perl to do the right memory management most of the time, but to suspect a circular reference like this one if a long-running program's memory use keeps growing.

    Examples

    Let's take a look at some examples of references closer, with the intent of understanding the syntax behind them. Let's start with an example of the flexibility behind Perl's access.

    Example 1: Direct Anonymous Access

    The first example deals with something that is near and dear to the Perl coder's heart: saving typing space. Consider, for example, the situation in which you want to temporarily treat an array as a hash, and dereference it. For example:

    @array = ('key1', 'value1', 'key2' => 'value2', 'key3' => 'value3');

    Suppose that you want to treat this as a 'hash', such that 'key1's value is 'value1', and so on. The traditional approach (and not a bad one, I might add) is:

    %hash = @array; # sets the hash %hash equal to the array @array;

    print $hash{'key1'};# prints out 'value1'.

    Can we get shorter than this? Of course! It's Perl! Hence, we could say something like:

    print( +{@array}->{'key1'} ); # the leading + marks { as an anonymous hash, not a block

    See how this is working? @array, by virtue of being enclosed by '{}' is being interpreted as a hash reference. '->' dereferences that hash reference, and {'key1'} tells us that we are looking for the value that corresponds to 'key1'. This prints out:

    value1

    This might seem a bit on the tacky side (it is), but you can actually use it to make complicated switch statements:

    my $response = { 'option1' => 'value1', 'option2' => 'value2', 'option3' => 'value3'}->{$input};

    which is equivalent to the more conventional (and probably better):

    $response = ($input eq 'option1')? 'value1' :

    ($input eq 'option2')? 'value2' :

    ($input eq 'option3')? 'value3' : undef;

    which compares $input against each option in turn. It is your choice, however, to decide which is best. The same trick works on a function that returns a hash:

    $days = { _getDays() }->{'Mon'};

    Where _getDays() returns a hash:

    sub _getDays

    {

    my %return = ('Mon' => 0, 'Tue' => 1, 'Wed' => 2, 'Thu'=>3, 'Fri' => 4, 'Sat' => 5, 'Sun' => 6);

    %return;

    }

    (this would make '$days' 0, by the way)
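
    Here is a runnable sketch of the three uses in this example. The subroutine name dayNumbers and the variables $direct, $day and $response are mine, and the 'no such option' fallback is an addition, since a lookup on a missing key otherwise quietly gives undef.

```perl
use strict;
use warnings;

my @array = ('key1', 'value1', 'key2', 'value2', 'key3', 'value3');

# Treat the list as a hash for the length of one expression.
my $direct = {@array}->{'key2'};

# Hypothetical stand-in for a subroutine that returns a hash as a list.
sub dayNumbers {
    my %days = (Mon => 0, Tue => 1, Wed => 2);
    return %days;
}
my $day = { dayNumbers() }->{'Wed'};

# The 'switch' form, with an explicit fallback for unknown input.
my $input     = 'option4';
my %responses = (option1 => 'value1', option2 => 'value2', option3 => 'value3');
my $response  = exists $responses{$input} ? $responses{$input} : 'no such option';

print "$direct $day $response\n";   # prints "value2 2 no such option"
```

    Each anonymous hash is built, used once, and thrown away, so for lookups inside a loop a named hash is the better choice.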

    Example #2: Reading a Flat File and bundling it into a data structure

    OK, so let's take a slightly more conventional example. Suppose you have a list of data that corresponds to time connected to a network. It looks something like this, which corresponds to an HTTP access log:

    alpha.umn.edu - - [24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP/

    alpha.umn.edu - - [24/Feb/1997:09:04:15 -0700] "POST /cgi-bin/script1.cgi HTTP/

    mcgraw.com - - [24/Feb/1997:09:04:22 -0700] "POST /cgi-bin/script2.cgi HTTP/

    rohcs.ats.com - - [24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script2.cgi HTTP/

    rohcs.ats.com - - [24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script1.cgi HTTP/

    Now, your job is see which sites are accessing each particular file. In short, you might want to see the output be a data structure like:

    $hash = {'/cgi-bin/script1.cgi' => {'alpha.umn.edu' => 2, 'rohcs.ats.com' => 1 }};

    which tells me, at first glance, that the script 'script1.cgi' has been accessed 2 times from 'alpha.umn.edu', and 1 time from 'rohcs.ats.com'. To do this, you might want to do something like:

    Listing 7.1 logParse.p

    1 use FileHandle;

     

    2 my $FH = new FileHandle("access_log");

    3 my ($line, $accessHash) = ('',{});

    4

    5 while (defined($line = <$FH>))

    6 {

    7 $line =~ m"(.*?) .*POST (.*?) HTTP";

    8 my $address = $1;

    9 my $script = $2;

    10 $accessHash->{$script}{$address}++;

    11 }

    The main thing here is lines 7 and 10. Note line #7 here -- here are regular expressions again! This particular regular expression goes through the line, matches the address and script name, and saves them to the variables $1 and $2 respectively, as in Figure 7.6:

    fig706.fig

    Figure 7.06

    regular expression to match line from access log.

    Anyway, the line here in question is #10. When you say:

    $accessHash->{'/cgi-bin/script1.cgi'}{'alpha.umn.edu'}++;

    this creates an entry in the hash referenced by $accessHash, according to the rules that we described above (substituting an anonymous hash for the '->' and for each '{}'). This is like:

    $accessHash = {'/cgi-bin/script1.cgi' => {'alpha.umn.edu' => 1 }}

    After the while loop is done, the data structure looks like:

    $accessHash =

    {

    '/cgi-bin/script1.cgi' => {

    'alpha.umn.edu' => 2,

    'rohcs.ats.com' => 1

    },

    '/cgi-bin/script2.cgi' => {

    'mcgraw.com' => 1,

    'rohcs.ats.com' => 1

    }

    };

    Using the rules of dereferencing, we can untangle this (it is a 'hash of hashes' as we shall see in the next chapter). For instance, if we want to get the particular sites that access '/cgi-bin/script1.cgi', we can say:

    my $script1Hash = $accessHash->{'/cgi-bin/script1.cgi'};

    my @accessSites = keys(%$script1Hash);

    print "Access sites for script1: @accessSites\n";

    which works to dereference this, and print out 'Access sites for script1: alpha.umn.edu rohcs.ats.com' (the order of the keys is not guaranteed), as in Figure 7.7:

    fig707

    Figure 7.07

    Dereferencing a hash of hashes.

    In other words, the code '$accessHash->{'/cgi-bin/script1.cgi'}' goes down one level into the hash, and saves an 'intermediate' hash reference into $script1Hash. 'keys(%$script1Hash)' gets the keys for the hash referenced by $script1Hash.
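
    If you do not have an access log handy, the same counting can be tried on lines held in memory. This is a sketch, not a replacement for Listing 7.1: the @log array holds the five sample lines from above (truncated the same way), and the regular expression is my own slightly stricter variant that skips lines it cannot match.

```perl
use strict;
use warnings;

# Sample lines copied from the log excerpt in this example.
my @log = (
    'alpha.umn.edu - - [24/Feb/1997:09:03:50 -0700] "POST /cgi-bin/script1.cgi HTTP/',
    'alpha.umn.edu - - [24/Feb/1997:09:04:15 -0700] "POST /cgi-bin/script1.cgi HTTP/',
    'mcgraw.com - - [24/Feb/1997:09:04:22 -0700] "POST /cgi-bin/script2.cgi HTTP/',
    'rohcs.ats.com - - [24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script2.cgi HTTP/',
    'rohcs.ats.com - - [24/Feb/1997:09:04:34 -0700] "POST /cgi-bin/script1.cgi HTTP/',
);

my $accessHash = {};
foreach my $line (@log) {
    # Skip anything that does not look like a POST entry.
    next unless $line =~ m{^(\S+) .*"POST (\S+) HTTP};
    my ($address, $script) = ($1, $2);
    $accessHash->{$script}{$address}++;     # both levels spring into existence
}

# Walk the hash of hashes with a placeholder for the inner level.
foreach my $script (sort keys %$accessHash) {
    my $sites = $accessHash->{$script};
    foreach my $site (sort keys %$sites) {
        print "$script: $site: $sites->{$site}\n";
    }
}
```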

    Example #3: A Recursive Data Structure for files

    OK, lets take one more example, one that is complicated enough to tie in all the concepts that we have seen this chapter. If you say something like:

    dir file;

    and get

    Volume in drive C has no label

    Volume Serial Number is 4631-1803

    Directory of C:\WINDOWS

     

    . <DIR> 11-16-96 5:10p .

    .. <DIR> 11-16-96 5:10p ..

    COMMAND <DIR> 11-16-96 5:14p COMMAND

    SYSTEM <DIR> 11-16-96 5:14p SYSTEM

    HELP <DIR> 11-16-96 5:14p HELP

    NETDET INI 7,885 07-11-95 9:50a NETDET.INI

    FORMS <DIR> 11-16-96 5:38p FORMS

    SMARTDRV EXE 45,145 11-16-96 5:01p SMARTDRV.EXE

    REGEDIT EXE 120,320 07-11-95 9:50a REGEDIT.EXE

    ...SKIP LOTS HERE.

    242 file(s) 12,379,223 bytes

    21 dir(s) 438,534,144 bytes free

    this can be looked on as a recursive data structure, or a tree. Each of the directories here corresponds to a branch that can be expanded, and each file corresponds to a leaf. Now, running a directory command over the entire structure on disk can be costly in time. Instead, we'd like to do it (if possible) only once, and then cache the result in memory. So, we could do something like:

    my $tree = findTree("/");

    where 'findTree' takes a directory ("/") as an argument, and returns a data structure representing a file tree, and

    my @files = getFiles($tree, "export/home/epeschko");

    retrieves the information about files underneath the directory 'export/home/epeschko', relative to the data structure itself, without actually having to do the disk read again. This turns into a data structure like:

    $tree =

    { "COMMAND" => { 'file_list_in_command' },

    "SYSTEM" => { 'file_list_in_system' },

    "NETDET.INI" => "FILE",

    "FORMS" => { 'file_list_in_forms' },

    "SMARTDRV.EXE" => "FILE",

    "REGEDIT.EXE" => "FILE",

    "HELP" => { "ACCESS.TXT" => "FILE",

    "HELP.TXT" => "FILE",

    "WHATVR" => { "TEST1" => "FILE",

    "TEST2" => "FILE"

    }

    }

    };

    Here, each hash represents a directory encountered, and each entry in it is either a file or a sub-hash for a directory under that directory. Here, 'HELP' is expanded for you (to see the files underneath it).

    Now this is a job for recursion, and we have already seen a recursive solution from chapter #5 which did something very similar:

    Listing 7.2: simpleFind.p

    1 sub simple_find

    2 {

    3 my ($input) = @_;

    4 my $file;

    5 print "$input\n" if (-f $input);

    6 if (-d $input)

    7 {

    8 opendir(FD, $input);

    9 my @files = readdir(FD);

    10 closedir(FD);

    11 foreach $file (@files)

    12 {

    13 next if ($file eq '.' || $file eq '..'); # don't want to self-recurse

    14 simple_find("$input/$file");

    15 }

    16 }

    17 }

    What do we need to do to modify this to get what we want? Well, the first thing we do is realize that we are simply printing out the input here, rather than putting it into a data structure.

    Secondly, 'simple_find("$input/$file")' passes the whole path down, so 'print "$input\n"' prints out the whole path to each file. This is not what we want. Again, we are shooting for:

    $tree = { 'dir' => { 'dir1' => { 'file1' => 'FILE',

    'file2' => 'FILE'

    }

    }

    };

    not

    $tree = { 'dir' => { 'dir/dir1' => { 'dir/dir1/file1' => 'FILE' } } };

    Hence, we have two problems. One, the code as stated does not put its results into a data structure. Two, it prints out too much information (we want to have only one level of the directory structure per 'branch'). These two problems translate into a modified subroutine like:

    Listing 7.3 findTree.p

    1 sub findTree

    2 {

    3 my ($input) = @_;

    4 my ($file, $dirhash) = ('', {});

    5 return ("FILE") if (-f $input);

    6 if (-d $input)

    7 {

    8 opendir(FD, $input);

    9 my @files = readdir(FD);

    10 closedir(FD);

    11

    12 chdir ("$input");

    13 foreach $file (@files)

    14 {

    15 next if ($file eq "." || $file eq '..');

    16 $dirhash->{$file} = findTree("$file");

    17 }

    18 chdir("..");

    19 }

    20 return($dirhash);

    21 }

    The first thing we notice is that the print is gone. Instead we have in line 5 'return("FILE") if (-f $input)'. This checks to see if '$input' is a file, if so, it returns a 'leaf' to the findTree that called it, saying that the argument that got passed to it was a file.

    The second thing is that the call returns a hash reference, '$dirhash', whenever $input is not a plain file: filled in if $input is a directory, and empty otherwise. And whether $input is a file or a directory, notice that the heart of the subroutine is in line 16:

    $dirhash->{$file} = findTree("$file");

    This innocent looking statement is what builds our data structure for us. At the top level, $dirhash is the entire structure which gets returned to the main subroutine. In recursive calls to findTree, $dirhash->{$file} is the portion of the tree that is below the file (or directory '$file'). Say we call this routine on the file structure:

    $tree = findTree('.');

    'dir'

    'dir\dir1'

    'dir\dir1\dir2'

    'dir\dir1\file1'

    'dir\file1'

    Then this corresponds to the data structure:

    $tree =

    {

    dir => {

    dir1 => {

    dir2 => { },

    file1 => 'FILE',

    },

    file1 => 'FILE'

    }

    };

    Now, if you say

    $dirRef = $tree->{'dir'}->{'dir1'};

    you get the data structure:

    {

    'dir2' => { },

    'file1' => 'FILE'

    }

    which is all of the files and directories under 'dir/dir1'. Further, this points to a solution to our second problem, namely, a way of retrieving information from this structure. Lets look at how we wanted to call it:

    my @files = getFiles($tree, "export/home/epeschko");

    A sample way of doing this is shown below.

    Listing 7.4 getFiles.p

    1 sub getFiles

    2 {

    3 my ($tree, $path) = @_;

    4 my ($pathPart, $filelist) = ('',[]);

    5 my $subtree = $tree;

    6 foreach $pathPart (split(m[\/], $path))

    7 {

    8 $subtree = $subtree->{$pathPart};

    9 }

    10 listFiles($subtree, $path, $filelist);

    11 return(@$filelist); # hand back a list, since the caller assigns to @files

    12 }

    This is our 'stub' routine, which lets us have the usage that we want. Passing in 'export/home/epeschko' with the data structure is 'user friendly', but it is not 'computer friendly'. It requires the computer to parse out the structure, so the job of lines 6-9 is to take the path apart, turning it into ('export','home','epeschko'), and then in line 8, to go down to the appropriate level in the hash ($subtree = $subtree->{$pathPart}). At that point, and only at that point, do we call 'listFiles', which is our recursive subroutine that does most of the work. This is listed below.

    Listing 7.5 getFiles.p part 2

    1

    2 sub listFiles

    3 {

    4 my ($subtree, $path, $filelist) = @_;

    5 my $key;

    6 my $return = $filelist || []; # share the caller's list if one was passed in

    7 foreach $key (sort keys %$subtree)

    8 {

    9 my $sublist;

    10 if (ref ($subtree->{$key}))

    11 {

    12 $sublist = listFiles($subtree->{$key}, "$path/$key", $return);

    13 }

    14 else

    15 {

    16 push(@$return, "$path/$key");

    17 }

    18 }

    19 return($return);

    20 }

    Here we use ref, and traverse the data structure going down. 'foreach $key (sort keys %$subtree)' in line 7 gets all of the keys in the data structure at our current level, and line 10 checks to see if the value under '$key' is a hash reference or a regular element.

    If $subtree->{$key} is a hash reference, then 'listFiles' is called recursively (lines 10-13). If $subtree->{$key} is a scalar (i.e.: not a reference) we know that it is a file, and we push it onto the return list (line 16). Look at Figure 7.8:

    fig78

    Figure 7.8

    Getting the data out of the findtree structure.

    This example shows even more. Look at the way we are calling our subroutine. To save time, when we do our recursive call, we are putting the values into an array reference. Hence:

    push(@$return, "$path/$key");

    is the only place where any data is copied. Likewise, in the subroutine call:

    listFiles($tree, "export/home/epeschko", $return);

    $return is actually changed by the subroutine. The push statement above dereferences $return and modifies the array it points to, so the caller sees every change. The $return variable is therefore just 'along for the ride'.
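The difference between copying an array and sharing it through a reference can be seen in isolation. In this small sketch the subroutine names add_by_copy and add_by_ref are made up for illustration; they are not part of getFiles.p.

```perl
use strict;
use warnings;

# Receives a copy of the caller's list; the push is lost on return.
sub add_by_copy {
    my (@list) = @_;
    push @list, 'new';
    return scalar @list;
}

# Receives a reference; the push changes the caller's array.
sub add_by_ref {
    my ($list) = @_;
    push @$list, 'new';
    return scalar @$list;
}

my @files = ('a', 'b');

add_by_copy(@files);
my $count_after_copy = @files;    # still 2

add_by_ref(\@files);
my $count_after_ref = @files;     # now 3

print "after copy: $count_after_copy, after ref: $count_after_ref\n";
# prints: after copy: 2, after ref: 3
```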

    This has been a long and rather arduous example, but it shows a lot about how Perl's subroutine structure works. It is available on the CD associated with this book, and if you have questions about it, one of the best ways to figure it out is by taking it apart. In fact, you will learn more about Perl by taking this one group of subroutines apart than most people learn about Perl in months.

    Summary of References

    References are Perl's way of creating extremely complicated data structures. If you are going to remember anything out of this section, remember how to:

    1. create an arbitrary data structure via references. That is, know what $anonHash = {key => {key => {key => [@array]}}} means.

    2. get values out of a data structure. That is, know that $anonHash->{key}{key}{key}[1] gets the value that $array[1] held when the structure was built.

    3. turn an assignment into a Perl construction. That is, know that $array->[0][2][1] = 'ADDED', taken by itself, turns into the data structure $array = [[undef, undef, [undef, 'ADDED']]];
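The three points can be checked in a few lines. This is a minimal sketch; the contents of @array are made up for illustration. Note that @array and $array are two unrelated variables here, as they are in the summary.

```perl
use strict;
use warnings;

my @array = ('zero', 'one', 'two');

# 1. Build a nested structure. [@array] makes an anonymous copy of @array.
my $anonHash = { key => { key => { key => [@array] } } };

# 2. Pull a value back out.
my $value = $anonHash->{key}{key}{key}[1];
print "$value\n";                          # prints: one

# Because [@array] is a copy, later changes to @array do not show up.
$array[1] = 'changed';
print "$anonHash->{key}{key}{key}[1]\n";   # prints: one

# 3. An assignment by itself builds the structure it needs.
#    Slots that were never assigned are left undefined.
my $array;
$array->[0][2][1] = 'ADDED';
print scalar(@{ $array->[0] }), "\n";      # prints: 3
print defined $array->[0][0] ? "defined\n" : "undef\n";   # prints: undef
```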

    In other words, if you can manipulate hard references and go backwards and forwards with them in your mind, then you are ready to use Data::Dumper, which prints a data structure back out as Perl code. This will help you immensely to understand references, and it will also help you to debug your programs. We will see much more use of Dumper throughout this book.
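As a quick taste of Dumper, the sketch below dumps a small, hypothetical tree. The two configuration variables are optional: $Data::Dumper::Indent selects a more compact layout, and $Data::Dumper::Sortkeys (available in the Data::Dumper shipped with more recent Perls) makes the key order repeatable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent   = 1;    # compact, fixed indentation
$Data::Dumper::Sortkeys = 1;    # print hash keys in sorted order

# Hypothetical data in the shape used by the getFiles example.
my $tree = {
    export => {
        home => {
            epeschko => { 'notes.txt' => 1, bin => { 'run.p' => 1 } },
        },
    },
};

# Dumper returns a string of Perl code that rebuilds the structure.
my $dump = Dumper($tree);
print $dump;
```

Reading the dump next to the code that built the structure is a fast way to confirm that each level is the hash or array you expected.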

    Remember also the policy on how Perl reclaims the memory it has allocated, a process called garbage collection. Perl counts the references to each value and frees the value when that count drops to zero. In short, Perl does what you want 95% of the time, and only when you start making statements like:

    my $a = 1;

    $a = \$a;

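    are you going to get memory leaks, because $a now holds a reference to itself and its reference count can never drop to zero. Even then, you can explicitly tell Perl not to leak by breaking the cycle yourself, with something like undef $$a;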
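A leak of this kind can be made visible with an object that counts its own destruction. The sketch below is an illustration under assumptions: the Tracked package and its counter are invented for the example. A structure that points back to itself is not freed when its scope ends, while one whose cycle has been broken is.

```perl
use strict;
use warnings;

# Tracked is a hypothetical class that counts how many of its
# objects have been freed.
package Tracked;
our $destroyed = 0;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class; }
sub DESTROY { $destroyed++ }

package main;

# Case 1: the object refers to itself and the cycle is never broken.
{
    my $node = Tracked->new('leaky');
    $node->{self} = $node;          # reference count is now 2
}
# The scope is gone, but the count only dropped to 1: nothing was freed.
my $after_leak = $Tracked::destroyed;

# Case 2: the cycle is broken by hand before the scope ends.
{
    my $node = Tracked->new('clean');
    $node->{self} = $node;
    delete $node->{self};           # back to a single reference
}
# This time the object was freed on leaving the scope.
my $after_clean = $Tracked::destroyed;

print "after leak: $after_leak, after clean: $after_clean\n";
# prints: after leak: 0, after clean: 1
```

The leaked object is only reclaimed when the program exits, which is why such cycles matter most in long-running programs.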

    Also, please look over the three examples in this chapter. The next chapter will give lots of examples too (in fact, it is a chapter devoted to pragmatic examples of references), but these three really help with the concepts behind them. Master them, and the next chapter's examples will be almost too easy.

    Finally, don't get too complicated with references. If you need a complicated data structure, you might as well use a class instead. Classes have other inherent pluses. One value of classes is that they are not generic: you can tell at a glance what type of data you are dealing with, and you can have as little or as much type checking as you wish. You can also simplify your interface. Therefore, we will show some of the more common data structures below, and then move on to package and class syntax.

     



    HTML conversions by Mega Space.

    This page updated on October 14, 1997 by Webmaster.

    Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.
