© 1997 The McGraw-Hill Companies, Inc. All rights reserved.
Any use of this Beta Book is subject to the rules stated in the Terms of Use.

Chapter 8: More on Perl 5 References and Common Data Structures

Up to this point we have been talking about the very lowest level of Perl 5 data structures: scalars, hashes and arrays. We have also discussed references, which are ways to point to and access data from within a given structure. Now we are going to put these pieces together into a more complex format.

There are data structures that come up again and again in Perl, and 80% of the time these are all you need to get along even without modules or objects. Indeed, these structures are often used in conjunction with modules and classes: they handily return data to the programmer that is too complex for a simple array or hash reference to provide.

Chapter Overview.

In this chapter, we shall go over four of these 'commonly used data structures'. These data structures are:

  • Array of Arrays

  • Array of Hashes

  • Hash of Hashes

  • Hash of Arrays

In each of these cases, there are three basic things that you need to know, in addition to examples, in order to actually use these structures:

    1) how to recognize them

    2) how to construct them

    3) how to get values out of them

    This chapter is therefore split into four parts, each part dedicated to a different data structure, looking at these three issues, and seeing actual examples of their usage. We shall also consider hybrid data structures, ones that combine elements of the four data structures into structures that are either more complicated (like Array of Hash of Hashes) or more irregular.

    Finally, we shall talk a little bit about passing complicated data structures around inside programs and functions.

    There is a slight simplification in talking about these data structures that you should know about. When we talk about 'Array of Arrays' or 'Array of Hashes', etc., we are really talking about what you might call a reference to an Array of Arrays, or a reference to an Array of Hashes.

    I mention them without the 'reference' part since I have found that it is wise to always use references when dealing with data structures that are more complicated than your standard hash or array. If you always use references, then this will simplify your life quite a bit.

    This is an 'applied' chapter. The last chapter talked quite a bit about the theory behind Perl references, and gave some examples, but didn't go into the nuts and bolts behind the most common Perl data structures. Perl 4 programmers may also need a little bit of orientation to actually get used to them.

    If you are not quite sure of how references work, or want more examples of their usage, this chapter is recommended. People who are looking for even more examples of the way Perl references work are advised to turn to the reference pages (perllol). As always, this page goes up and down, inside and out, giving many examples on the subject.

    So, here we go. Let's first take a look at the data structure that most people get to know first when they learn Perl references: an Array of Arrays.

    Array Of Arrays (AoA)

    An array of arrays is a simple, two dimensional array, sort of like a checkerboard with the elements being the reference points to the squares on a board.

    The Array of Arrays data structure isn't used as often as one might think, since as we shall see, getting individual values out of them requires that one knows the numeric subscript for that value, which can make them difficult to debug (often you find yourself asking questions like 'is the element I want in $value->[7][6], or $value->[7][5]?')

    However, they are the first reference data structure that people learn, since they look a lot like two dimensional arrays in other languages. They are also important in cases where you have data in a table-like form, where you are assured that the data is in a regular structure, or in cases where you need to keep the order of the data and that data is too complicated to keep in a simple array.

    How To Recognize an Array Of Arrays

    If you were to construct an Array of Arrays, or create it out of a direct assignment, it would look something like this:

    $AoA =

    [

    ['element1a', 'element2a', 'element3a','element4a'],

    ['element1b', 'element2b','element3b','element4b']

    ];

    Here, the outermost set of brackets (the [] encompassing everything) indicates that $AoA is a reference to an array. Each inner set of brackets equals a sub-array, or the second dimension of the array.

    This is also the output that you would get (apart from the variable name, which Data::Dumper prints as $VAR1) if you used the module Data::Dumper, and did something like:

    use Data::Dumper;

    print Dumper($AoA);

    given, of course, that $AoA is an array of arrays, and contains the values given above.
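If you would rather ask Perl than eyeball the brackets, the built-in ref function can do the recognizing for you. The following is just a small sketch using the sample values above; the variable names $outer, $inner and $leaf are our own and are only there for illustration.

```perl
use strict;
use warnings;

# Sample data reused from the text.
my $AoA = [
    [ 'element1a', 'element2a', 'element3a', 'element4a' ],
    [ 'element1b', 'element2b', 'element3b', 'element4b' ],
];

# ref() reports what kind of thing a reference points to.
my $outer = ref($AoA);          # 'ARRAY': the outermost brackets
my $inner = ref($AoA->[0]);     # 'ARRAY': each row is an array too
my $leaf  = ref($AoA->[0][0]);  # '': a plain string, not a reference

print "outer: $outer, inner: $inner, leaf: '$leaf'\n";
```

An array reference whose elements are themselves array references is, for our purposes, an Array of Arrays.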

    Array of Arrays Direct Access

    As we said, Array of Arrays work well in cases where the data layout is well known. Such a data set might be seen in a database table (say customer), where each row in that database consists of four columns: id, name, address, type. Suppose that the data looked like:

    1111|George Hammond|12 Elk Pkwy | GOOD

    1112|Susie Wayland|15 Sachs Road | CANCELED

    1113|Michael Thurmond|1115 Cherry St |GOOD

    This could, fairly easily, be turned into a Perlish data structure that looked like the following:

    $ArrayOfCustomers =

    [

    [ 1111, 'George Hammond','12 Elk Pkwy','GOOD'],

    [ 1112, 'Susie Wayland','15 Sachs Road','CANCELED'],

    [ 1113, 'Michael Thurmond','1115 Cherry St','GOOD']

    ];

    This is an Array of Arrays. Each row of the database table corresponds to an array inside the Array of Arrays. Let's look at five ways that you may want to access the data inside it.

    First Way:

    Let's say that you want to get the first customer's name. Each element in an array of arrays can be directly accessed by using the following syntax:

    $AoA->[$dim1][$dim2];

    in which $dim1 and $dim2 are the subscripts of each dimension in the array that you want to access. Hence, something like:

    print $ArrayOfCustomers->[0][1];

    prints out the first customer's name, 'George Hammond', from the structure:

    $ArrayOfCustomers =

    [

    [ 1111, 'George Hammond','12 Elk Pkwy','GOOD'],

    [ 1112, 'Susie Wayland','15 Sachs Road','CANCELED'],

    [ 1113, 'Michael Thurmond','1115 Cherry St','GOOD']

    ];

    Second Way:

    Going further, suppose you want to print out the data of a customer directly. In other words, you want to print out an entire row consisting of id, name, address, and status. Well, the syntax for getting at a sub-array is just as easy:

    my $arrayRef = $AoA->[$dim1]; @$arrayRef;

    in which $dim1 is the first dimension of the array, the row you wish to access. Given the above data, the expression:

    (@{$ArrayOfCustomers->[0]});

    points to the first customer (the 1111 row) in the following:

    $ArrayOfCustomers =

    [

    [ 1111, 'George Hammond','12 Elk Pkwy','GOOD'],

    [ 1112, 'Susie Wayland','15 Sachs Road','CANCELED'],

    [ 1113, 'Michael Thurmond','1115 Cherry St','GOOD']

    ];

    and the statement:

    print "@{$ArrayOfCustomers->[1]}\n";

    prints out "1112 Susie Wayland 15 Sachs Road CANCELED".
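To see the first two access methods side by side, here is a short sketch that you can run as is. It assumes nothing beyond the customer data above; the names $first_name, @second_row and $row_text are ours.

```perl
use strict;
use warnings;

my $ArrayOfCustomers = [
    [ 1111, 'George Hammond',   '12 Elk Pkwy',    'GOOD' ],
    [ 1112, 'Susie Wayland',    '15 Sachs Road',  'CANCELED' ],
    [ 1113, 'Michael Thurmond', '1115 Cherry St', 'GOOD' ],
];

my $first_name = $ArrayOfCustomers->[0][1];    # first way: a single element
my @second_row = @{ $ArrayOfCustomers->[1] };  # second way: a whole row, copied
my $row_text   = "@second_row";                # interpolation joins with spaces

print "$first_name\n$row_text\n";
```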

    Third Way:

    Suppose you want to extract only the names from your data structure. This is a job for map (as we shall see in the chapter on built-in functions and special variables):

    @names = map($_->[1], @$ArrayOfCustomers);

    which works by going through each array in the array @$ArrayOfCustomers and pulling out the second element in each (the name column below), and then passing this back to @names:

    $ArrayOfCustomers =

    [

    [ 1111, 'George Hammond','12 Elk Pkwy','GOOD'],

    [ 1112, 'Susie Wayland','15 Sachs Road','CANCELED'],

    [ 1113, 'Michael Thurmond','1115 Cherry St','GOOD']

    ];

    See the chapter on built-in functions for more information on map. It could be rewritten as:

    foreach $customer (@$ArrayOfCustomers)

    {

    push(@names, $customer->[1]);

    }
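A quick way to convince yourself that the map and the loop really are equivalent is to run both and compare the results. This is only a sketch; @names_map and @names_loop are names we made up for the comparison.

```perl
use strict;
use warnings;

my $ArrayOfCustomers = [
    [ 1111, 'George Hammond',   '12 Elk Pkwy',    'GOOD' ],
    [ 1112, 'Susie Wayland',    '15 Sachs Road',  'CANCELED' ],
    [ 1113, 'Michael Thurmond', '1115 Cherry St', 'GOOD' ],
];

# map version: one expression, evaluated once per row
my @names_map = map { $_->[1] } @$ArrayOfCustomers;

# loop version: the same thing spelled out
my @names_loop;
foreach my $customer (@$ArrayOfCustomers)
{
    push(@names_loop, $customer->[1]);
}

print join(', ', @names_map), "\n";
```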

    Fourth Way:

    Suppose you want to get the Name and Address for the third customer in the list:

    ($name, $address) = @{$ArrayOfCustomers->[2]}[1,2];

    which works by going into the third array element of $ArrayOfCustomers and then grabbing its second and third members, accessing 'Michael Thurmond' and '1115 Cherry St' in the following:

    $ArrayOfCustomers =

    [

    [ 1111, 'George Hammond','12 Elk Pkwy','GOOD'],

    [ 1112, 'Susie Wayland','15 Sachs Road','CANCELED'],

    [ 1113, 'Michael Thurmond','1115 Cherry St','GOOD']

    ];
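The slice syntax is easy to get wrong, so here is a runnable sketch of it. The second statement is our own extension, not something the text requires: it applies the same slice to every row to build a smaller Array of Arrays holding just names and addresses.

```perl
use strict;
use warnings;

my $ArrayOfCustomers = [
    [ 1111, 'George Hammond',   '12 Elk Pkwy',    'GOOD' ],
    [ 1112, 'Susie Wayland',    '15 Sachs Road',  'CANCELED' ],
    [ 1113, 'Michael Thurmond', '1115 Cherry St', 'GOOD' ],
];

# Slice of one row: elements 1 and 2 of the third customer.
my ($name, $address) = @{ $ArrayOfCustomers->[2] }[1, 2];

# The same slice applied to every row (illustrative extension).
my @name_and_address = map { [ @{$_}[1, 2] ] } @$ArrayOfCustomers;

print "$name lives at $address\n";
```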

    Final Way

    Suppose that you want to get the customers that only have ids greater than 1111. This is a job for grep:

    my $greatIDS = [];

    @$greatIDS = grep($_->[0] > 1111, @$ArrayOfCustomers);

    Here, grep simply weeds out all the customers that have a first element that is less than or equal to 1111, and keeps all elements that satisfy the condition. Hence, $greatIDS will contain the second and third rows (customers 1112 and 1113) of:

    $ArrayOfCustomers =

    [

    [ 1111, 'George Hammond','12 Elk Pkwy','GOOD'],

    [ 1112, 'Susie Wayland','15 Sachs Road','CANCELED'],

    [ 1113, 'Michael Thurmond','1115 Cherry St','GOOD']

    ];

    Notice, that we make $greatIDS a reference, and then assign to that reference rather than to the array itself. This allows the data structure to mirror the one that it was derived from. This makes it look like:

    $greatIDS =

    [

    [1112,'Susie Wayland','15 Sachs Road','CANCELED'],

    [1113,'Michael Thurmond','1115 Cherry St','GOOD' ]

    ];

    Again, grep will be discussed in the chapter on built-in functions. It is basically equivalent to the following:

    foreach $array (@$ArrayOfCustomers)

    {

    if ($array->[0] > 1111) { push(@$greatIDS, $array); }

    }

    i.e., going through each array in the Array of Arrays and comparing its first element to 1111.
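One detail worth knowing, sketched below, is that grep hands back the very same row references that live in the original structure; nothing is copied. The variable $same_row is ours, and comparing two references with == (which compares their addresses) is simply one way to demonstrate the sharing.

```perl
use strict;
use warnings;

my $ArrayOfCustomers = [
    [ 1111, 'George Hammond',   '12 Elk Pkwy',    'GOOD' ],
    [ 1112, 'Susie Wayland',    '15 Sachs Road',  'CANCELED' ],
    [ 1113, 'Michael Thurmond', '1115 Cherry St', 'GOOD' ],
];

# Keep only the rows whose id (element 0) is greater than 1111.
my $greatIDS = [ grep { $_->[0] > 1111 } @$ArrayOfCustomers ];

# The kept rows are shared with the original, not copied.
my $same_row = ($greatIDS->[0] == $ArrayOfCustomers->[1]) ? 1 : 0;

print scalar(@$greatIDS), " rows kept; shared with original: $same_row\n";
```

So if you later change $greatIDS->[0][3], the change shows up in $ArrayOfCustomers as well.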

    A Common Misconception In Array of Arrays:

    Lots of Perl programmers make the following mistake in accessing elements inside an Array of Arrays reference. If we had said:

    @greatIDS = grep($_->[0] > 1111, @$ArrayOfCustomers );

    then @greatIDS is no longer a reference, and the data structure would look like:

    @greatIDS =

    (

    [1112,'Susie Wayland','15 Sachs Road','CANCELED'],

    [1113,'Michael Thurmond','1115 Cherry St','GOOD' ]

    );

    Note the parentheses here. You would then access the first name ('Susie Wayland') with:

    print $greatIDS[0][1];

    instead of

    print $greatIDS->[0][1];

    With this usage, though, you would again have to be careful about passing @greatIDS to functions. If you said:

    function(@greatIDS, @greatIDS2);

    this would be wrong if you want to pass these variables distinctly to a function call. You need something like:

    function(\@greatIDS, \@greatIDS2);

    instead, since, again, Perl will take anything that looks like (@greatIDS, @greatIDS2) and munge it into one gigantic list.
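The flattening is easy to see if the function simply counts what it was given. In this sketch, count_args is a hypothetical helper and the row in @greatIDS2 is made-up data; neither comes from the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments arrived in @_.
sub count_args { return scalar(@_); }

my @greatIDS  = ( [ 1112, 'Susie Wayland' ], [ 1113, 'Michael Thurmond' ] );
my @greatIDS2 = ( [ 1114, 'Made-up Customer' ] );   # illustrative data only

my $flattened = count_args(@greatIDS, @greatIDS2);    # one merged list of rows
my $distinct  = count_args(\@greatIDS, \@greatIDS2);  # two array references

print "flattened: $flattened, distinct: $distinct\n";
```

In the first call the function has no way of telling where @greatIDS ended and @greatIDS2 began; in the second it receives exactly two things.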

    Figure 8.1 summarizes all of the basic methods to access an Array of Arrays, albeit on a simpler example:

    Figure 8.1: Access Methods on Array of Arrays (line art)

    You need to be careful not to mismatch references and actual variables. If you say something like

    @array = @{$AoA->[1]};

    then @array is an actual array, a non reference. Whereas if you say:

    @$array = @{$AoA->[1]};

    then $array is now a reference to an array (a copy of that row), and therefore a lot more convenient to pass to functions.

    If you truly understand how each one of these access methods works, then you will have a good idea of how everything in Perl is working.
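The difference between copying a row and pointing at it is worth a small experiment. The sketch below deliberately uses both @array and $array (two unrelated variables, as far as Perl is concerned) and adds a third variable, $alias, which is our own addition to show what happens when no copy is made at all.

```perl
use strict;
use warnings;

my $AoA = [ [ 1, 2, 3 ], [ 4, 5, 6 ] ];

my @array = @{ $AoA->[1] };    # an actual array holding a copy of row 1
my $array;                     # starts out undefined
@$array   = @{ $AoA->[1] };    # $array becomes a reference to a new copy
my $alias = $AoA->[1];         # no copy: the very same row as in $AoA

$array[0]   = 'copy';          # touches only @array
$array->[0] = 'copy via ref';  # touches only the copy behind $array
$alias->[0] = 'original';      # changes $AoA itself

print "$AoA->[1][0]\n";        # prints "original"
```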

    Creating an Array of Arrays

    One thing that we have kind of ignored was actually how to turn:

    1111|George Hammond|12 Elk Pkwy | GOOD

    1112|Susie Wayland|15 Sachs Road | CANCELED

    1113|Michael Thurmond|1115 Cherry St |GOOD

    into:

    $ArrayOfCustomers =

    [

    [ 1111, 'George Hammond','12 Elk Pkwy','GOOD'],

    [ 1112, 'Susie Wayland','15 Sachs Road','CANCELED'],

    [ 1113, 'Michael Thurmond','1115 Cherry St','GOOD']

    ];

    In other words, we need to make a constructor. This problem extends to situations in which there is not a simple flat file, where we are accessing records from a database, from parsing data from commands, or other sources of information. For the simple purposes of this flat file example, the constructor will look like the following:

    my $AoA = makeArrayOfArrays('filename');

    where makeArrayOfArrays takes a filename, opens it up, parses the data in the format given above, creates an Array of Arrays, and passes back the reference to $AoA on the left hand side. Let's consider two possible approaches, one bad, and one good.

    Direct Constructor 1 of Array of Arrays (probably bad)

    First, let's look at a suboptimal way of creating an Array of Arrays out of a flat file.

    This is the direct method: to take each individual element, one at a time, and then put that element into its appropriate slot. Something like this:

    1111|George Hammond|12 Elk Pkwy |GOOD

    1112|Susie Wayland|15 Sachs Road |CANCELED

    1113|Michael Thurmond|1115 Cherry St |GOOD

    set 1111 first (so the data structure looks like [['1111']])

    set 'George Hammond' second (so the data structure looks like [['1111','George Hammond']])

    etc., etc., etc.

    Codewise this becomes:

    Listing 8.1: makeArrayOfArrays.p

    0 use FileHandle; # we need filehandle module for 'new FileHandle'

    1 sub makeArrayOfArrays

    2 {

    3 my ($file) = @_; # Assume that we are passed in a file name

    4 my $return = []; # return value;

    5 my (@lines); # lines for file.

    6

    7 my $fd = new FileHandle("$file") || die "Couldn't open $file\n";

    8 chomp(@lines = <$fd>); # get all lines, chop the return off of each line

    9

    10 my ($xx, $yy);

    11 for ($xx = 0; $xx < @lines; $xx++)

    12 {

    13 my (@elements) = split(m"\|", $lines[$xx]);

    14 for ($yy = 0; $yy < @elements; $yy++)

    15 {

    16 $return->[$xx][$yy] = $elements[$yy];

    17 }

    18 }

    19 return($return);

    20 }

    This is a pretty common idiom for C programmers just beginning with Perl. The heart of the Array of Arrays construction is in lines 11 through 18, which is basically two nested for loops, and it is, well, 'suboptimal' for three reasons:

    1) it requires two variables ($xx, $yy) to manually iterate through each element in the two dimensional array

    2) it is way too verbose. As we shall see, we can compress this function down to 3 lines

    3) it has no sense of abstraction. In creating this data structure, we have to worry about each and every element in it.

    Point #3 is perhaps the most important point, one which we shall spend a great deal of time on in the second part of this book. What happens if we decide that we want to enhance the function so that it can ignore certain elements in the file? Something like:

    $AoA = makeArrayOfArrays('filename', {IGNORE => [1,2]});

    where the '{IGNORE => [1,2]}' parameter simply tells makeArrayOfArrays to ignore fields 1 and 2 in this particular file name. If we wanted to do this, we would have to add code to the function itself:

    Listing 8.2: makeArrayOfArrays2.p

    0 use FileHandle; # we need filehandle module for 'new FileHandle'

    1 sub makeArrayOfArrays

    2 {

    3 my ($file, $config) = @_; # Assume that we are given file name

    4 my ($return, @lines) = ([],()); # return value, lines for file;

    5 my $ignoreElements =

    6 $config->{IGNORE} || []; # we get the items we want to ignore

    7

    8 my $fd = new FileHandle("$file") || die "Couldn't open $file\n";

    9 chomp(@lines = <$fd>); # get all lines,chop return off lines

    10

    11 my ($xx, $yy);

    12 for ($xx = 0; $xx < @lines; $xx++)

    13 {

    14 my (@elements) = split(m"\|", $lines[$xx]);

    15 for ($yy = 0; $yy < @elements; $yy++)

    16 {

    17 if (!grep ($_ == $yy, @$ignoreElements)) #

    18 {

    19 $return->[$xx][$yy] = $elements[$yy];

    20 }

    21 }

    22 }

    23 return($return);

    24 }

    Here, the added code is in lines 5-6 and 17-20 (plus the extra $config parameter on line 3). The grep statement basically looks through the @$ignoreElements array, checking to see if any of the elements happens to be $yy. English-wise, it translates as 'Ignore if $yy happens to be in the list @$ignoreElements'. (grep is infinitely useful. Again, see 'built-in functions' for more examples.) But note two things here:

    First of all, the subroutine is getting rather long (24 lines).

    Second of all, the subroutine is subtly incorrect. Notice that line 19 will give you a structure that looks like:

    $AoA =

    [

    ['1111', undef, undef, 'GOOD'],

    ['1112', undef, undef, 'CANCELED'],

    ['1113', undef, undef, 'GOOD']

    ];

    where each of the ignored elements leaves behind a 'ghost' element (an undef, which prints as an empty string, in its place). This may be desired, but most likely it isn't. It is a consequence of the way Perl arrays grow: assigning to subscript 3 of a row creates every slot below it, whether or not anything was ever stored there.
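You can reproduce the ghost elements in a few lines, without any file at all. This sketch just assigns to subscripts 0 and 3 of an empty row, which is what the direct constructor ends up doing when it skips fields 1 and 2; the names $row, $length and @is_defined are ours.

```perl
use strict;
use warnings;

my $row = [];
$row->[0] = '1111';   # field 0 is kept
$row->[3] = 'GOOD';   # fields 1 and 2 were skipped, field 3 is kept

my $length     = scalar(@$row);                       # the row still has 4 slots
my @is_defined = map { defined($_) ? 1 : 0 } @$row;   # which slots hold a value

print "length: $length, defined: @is_defined\n";
```

The two skipped slots exist but are undefined, which is why they show up as empty values when the row is printed.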

    In any case, the upshot is that by doing this direct assignment, you limit your flexibility in coding. If you need to add functionality, then more likely than not, you will have to go through a lot of discomfort to make your functions work.

    So let's take a look at a better constructor for this particular data structure.

    Indirect Constructor of Array of Arrays. (better)

    As said, the above is bad because it does not allow for flexibility in coding. By relying on directly creating the array, you are painting yourself into a corner, not allowing your code to expand.

    So, how to overcome these limitations? The solution is to abstract your data structure creation. In this case, think of your lowest unit of measurement as a 'line' in a file, rather than each individual element.

    Below is a constructor that does basically the same thing, but does it in a way that will allow more room to grow:

    Listing 8.3: makeArrayOfArrays3.p

    0 use FileHandle; # use FileHandle package to get below functionality.

    1 sub makeArrayOfArrays

    2 {

    3 my ($file) = @_; # Assume that we are passed the file name.

    4 my $return = []; # return value;

    5 my (@lines); # lines in file

    6

    7 my $fd = new FileHandle("$file") || die "Couldn't open $file\n";

    8 chomp(@lines = <$fd>); # get all lines, chop the return off of each line

    9 foreach my $line (@lines)

    10 {

    11 push(@$return, [ split(m"\|", $line) ]);

    12 }

    13 return($return);

    14 }

    As you can see, this is a lot cleaner. First, the code is shorter: one line (line 11) takes the place of four in the direct constructor, and overall the subroutine is shorter by about six lines. Second, there are no iterator variables ($xx, $yy). We could compress it further, too, if pressed. In particular, lines 9 through 12 could become:

    @$return = map { [ split(m"\|", $_) ] } @lines;

    which basically is a synonym for lines 9-12, building the return array out of a changed version of each element of the input (@lines).

    The main benefit of this approach is flexibility, however. Now suppose we want to add the same functionality that we tried (and failed) to add before, namely:

    $AoA = makeArrayOfArrays('filename', {IGNORE => [1,2]});

    where IGNORE signals the makeArrayOfArrays function to ignore the second and third elements. Well, the key to adding this is to recognize that line 11 is a perfect line to replace with a function call:

    Listing 8.4: makeArrayOfArrays4.p

    0 use FileHandle; # use FileHandle package to get below functionality.

    1 sub makeArrayOfArrays

    2 {

    3 my ($file,$config) = @_; # Assume that we are passed the file name.

    4 my $return = []; # return value;

    5 my (@lines); # lines in file

    6

    7 my $fd = new FileHandle("$file") || die "Couldn't open $file\n";

    8 chomp(@lines = <$fd>); # get all lines, chop the return off of each line

    9 foreach my $line (@lines)

    10 {

    11 push(@$return, getArrayRef($line, $config));

    12 }

    13 return($return);

    14 }

    Here, getArrayRef returns an array reference, given the passed-in line and the configuration hash. In this case, getArrayRef might look like this:

    1 sub getArrayRef

    2 {

    3 my ($line, $config) = @_;

    4 my @return;

    5 my $ignoreElements = $config->{IGNORE} || [];

    6 my @elements = split(m"\|", $line);

    7 foreach my $yy (0 .. $#elements)

    8 { push(@return, $elements[$yy]) if (!grep($_ == $yy, @$ignoreElements)); }

    9 return([ @return ] );

    10 }

    Hence, we delegate the task of deciding which elements we want to a sub-function. Lines 7 and 8 are responsible for weeding out the elements which are in our 'Ignore List', and we return an array reference to the ones that are left over (line 9). We end up, therefore, with a data structure that looks like:

    $AoA =

    [

    ['1111', 'GOOD'],

    ['1112', 'CANCELED'],

    ['1113', 'GOOD']

    ];
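If you want to try the indirect constructor without creating a file first, the sketch below reads from an in-memory filehandle instead. It is a variation on the listings above rather than a copy of them: makeArrayOfArraysFromHandle is a hypothetical name, it takes an open filehandle instead of a file name, and getArrayRef is rewritten here with a lookup hash in place of the grep.

```perl
use strict;
use warnings;

# Turn one pipe-delimited line into an array reference, dropping the
# field positions listed in $config->{IGNORE}.
sub getArrayRef
{
    my ($line, $config) = @_;
    my %ignore   = map { $_ => 1 } @{ $config->{IGNORE} || [] };
    my @elements = split(/\|/, $line);
    return [ map { $elements[$_] } grep { !$ignore{$_} } 0 .. $#elements ];
}

# Hypothetical variant of the constructor: reads from an open filehandle.
sub makeArrayOfArraysFromHandle
{
    my ($fh, $config) = @_;
    my @rows;
    while (defined(my $line = <$fh>))
    {
        chomp($line);
        push(@rows, getArrayRef($line, $config));
    }
    return \@rows;
}

my $data = "1111|George Hammond|12 Elk Pkwy|GOOD\n"
         . "1112|Susie Wayland|15 Sachs Road|CANCELED\n";

open(my $fh, '<', \$data) or die "Couldn't open in-memory file: $!";
my $AoA = makeArrayOfArraysFromHandle($fh, { IGNORE => [1, 2] });
close($fh);

print "@$_\n" foreach @$AoA;
```

Because the per-line work lives in getArrayRef, the ignored fields are simply never pushed, so no ghost elements appear.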

    Since we have abstracted out what we do with lines, we can split our function into several manageable chunks rather than a monolithic whole.

    Anyway, now that we have created our function to make this data structure, let's take a look at a generic function to access this data structure.

    Array of Arrays Access Function:

    We are now in a position to create a function that will print out any array of arrays. Let's keep going a little further with the access methods we talked about. Remember that:

    $AoA->[2][1]

    directly accessed the element in a two dimensional array, and:

    $arrayRef = $AoA->[2];

    print @$arrayRef;

    prints out the row stored at that subscript.

    Now say that we want to print out the entire data structure. Given:

    $account =

    [

    ['name','type','address','amount'],

    ['name2','type2','address2','amount2']

    ];

    we want to print out name type address amount\nname2 type2 address2 amount2

    What we need is a generic function which does this for us, which traverses down our data structure and prints out what we need in the way that we need it.

    Below is such a function. We assume that we are passed an Array of Arrays reference:

    Listing 8.5: printAoA.p

    1 sub printAoA

    2 {

    3 my ($AoA) = @_; # we are passed the AoA reference.

    4 my ($return);

    5

    6 foreach $arrayref (@$AoA) # gives an array ref.

    7 {

    8 my $scalar = "@$arrayref\n"; # interpolate the row into a scalar.

    9 $return .= $scalar;

    10 }

    11 print $return; # print the accumulated text

    12 }

    Here, loop 6-10 does most of the work. We go through each reference in the Array of Arrays reference passed in, and simply make a scalar out of it (line 8). By the rules of interpolation, this comes out to be a space separated scalar. We might want to add a parameter for a configuration hash, like we did with the constructor, so that you could say something like:

    printAoA($AoA, {DELIM => '|'});

    to print out a pipe delimited field. This would look like:

    Listing 8.6: printAoA2.p

    1 sub printAoA

    2 {

    3 my ($AoA, $config) = @_; # we are passed the AoA reference.

    4 my ($return);

    5 local($") = $config->{DELIM} if ($config->{DELIM});

    6 foreach $arrayref (@$AoA) # gives an array ref.

    7 {

    8 my $scalar = "@$arrayref\n"; # interpolate the row into a scalar.

    9 $return .= $scalar;

    10 }

    11 print $return; # print the accumulated text

    12 }

    Here, the special variable $" = "|" makes the expression "@$arrayref\n" equal to something like "1|2|3|4", rather than "1 2 3 4". In other words, this would give you the magic required to print out an array with any delimiter that you wanted (you could do a similar trick with $\ to print out any ending character that you want.)

    The upshot of all this is that the above function can be made very generic. Since we are not concerned anywhere in this function about the size of the arrays involved (since Perl is nice enough to keep track of this for us), this can be used on any array of arrays. Therefore, foreach $arrayref (@$AoA) traverses the first dimension of AoA and

    print "@$arrayref";

    by default traverses the second dimension.
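If changing the special variable $" feels a bit too magical, the same traversal can be written with join. The sketch below is an alternative we are suggesting, not the approach of the listings: formatAoA is a hypothetical name, and it returns the text rather than printing it, which makes it easy to test or to send to any filehandle.

```perl
use strict;
use warnings;

# Hypothetical alternative to the listings: build the text with join().
sub formatAoA
{
    my ($AoA, $config) = @_;
    $config ||= {};                                   # config hash is optional
    my $delim = defined($config->{DELIM}) ? $config->{DELIM} : ' ';

    # The outer map walks the first dimension, the inner join the second.
    return join('', map { join($delim, @$_) . "\n" } @$AoA);
}

my $account = [
    [ 'name',  'type',  'address',  'amount'  ],
    [ 'name2', 'type2', 'address2', 'amount2' ],
];

my $spaced = formatAoA($account);
my $piped  = formatAoA($account, { DELIM => '|' });

print $spaced, $piped;
```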

    Array of Arrays Example: Filtering Columns from Excel

    Now let's take a look at an example of the use of Array of Arrays, to try to show where they are helpful. Keep in mind that we are talking about flat files, although with extensions like SybPerl, or OraPerl, or OLE you could do the same thing with a database or Microsoft programs (you could do the following inside Excel, for example). We provide these packages on the CD associated with the book. Suppose that you have a table (in a file) that looks something like this:

    "Raymond Burton",31,"Architect","Sampson Architecture"

    "Sam Ireland",34,"Administrative Assistant","USV"

    "Jim Hampton",35,"Senior VP","Tyco LTD"

    This is your standard, vanilla 'Comma Separated Values' file, which a spreadsheet program such as Excel can spit out. Now suppose that you want to get a summary that simply lists the name and title of each of the people in the file, and print it out to a file, in alphabetical order.

    Plan of Attack

    There are four steps here:

    #1: parse the file, and put it into a data structure.

    #2: sift through the data structure, and take out the elements that we want.

    #3: sort these elements into an alphabetically ordered reference.

    #4: print out these elements to a file.

    Now steps 1, 2, and 4 look very similar to the above stuff we were doing with 'makeArrayOfArrays', 'printAoA'. Hence, let's modify these functions to do these steps of the task that we require in two statements:

    my $modifiedAoA = makeArrayOfArrays("CSVfile", {IGNORE => [1,3], DELIMITER => "," });

    printAoA($modifiedAoA, {OUTPUT_FILE => "newfile", DELIMITER => ","});

    where the first statement (makeArrayOfArrays):

    1) creates the array of arrays, separating fields by ",".

    2) ignores fields 1 and 3 (the age and the company), leaving the name and title.

    and the second (printAoA):

    1) prints out the modified elements to an output file.

    2) uses the delimiter "," to do so.

    The sorting then would be done by a different routine which we have yet to write.

    Now this is probably the best way to do it, since you are reusing code, and making a powerful subroutine in the process. However, it probably isn't the best way to learn about Array of Array syntax, so I leave it as an exercise to the reader.

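    Let's make a complete program that does this: one which takes as its first argument an input CSV file, and as its second argument an output file, and keeps only the first and third elements of each line (the name and title). Here is a first crack:

    0 use FileHandle; # we need the FileHandle module for 'new FileHandle'

    1 my $ifh = new FileHandle("$ARGV[0]") || die "Couldn't open $ARGV[0]\n";

    2 my $ofh = new FileHandle("> $ARGV[1]") || die "Couldn't open $ARGV[1]\n";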

    3 my $line;

    4 my $AoA = [];

    5 while (defined ($line = <$ifh>))

    6 {

    7 @elements = split(m",", $line);

    8 push(@$AoA, [@elements[0,2]]);

    9 }

    This is subtly wrong, and if you look at comma separated value files, you can see why. Suppose you have something like the following in $line:

    "Struthers, Aaron", 54, "Bouncer","Sam's Grill"

    Then when you say split(m",", $line), you will get five elements:

    ('"Struthers','Aaron"',54,'"Bouncer"','"Sam\'s Grill"');

    rather than four, since there is a comma in "Struthers, Aaron". Fortunately, Perl provides a package called 'Text::ParseWords' which lets you do something like:

    use Text::ParseWords;

    @elements = quotewords(",", 1, $line);

    which parses it correctly, since the comma is in the middle of double quotes.
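Here is a small sketch that puts split and quotewords side by side. The input is a simplified version of the line above (no stray spaces, and a made-up last field), so treat the data as illustrative. The second argument to quotewords says whether to keep the quote characters in the results.

```perl
use strict;
use warnings;
use Text::ParseWords;   # exports quotewords by default

# Simplified sample line; the last field is illustrative only.
my $line = '"Struthers, Aaron",54,"Bouncer","Grill"';

my @naive    = split(/,/, $line);           # breaks inside the quoted name
my @kept     = quotewords(',', 1, $line);   # quotes kept in each field
my @stripped = quotewords(',', 0, $line);   # quotes removed from each field

print scalar(@naive), " fields from split, ",
      scalar(@kept),  " fields from quotewords\n";
print "name: $stripped[0]\n";
```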

    This routine does your data structure creation. Line 7 (with caveats) does the splitting of your input lines (into elements) and line 8 actually populates the array of arrays with the two elements that we want (name and occupation). Our structure now looks like this:

    $AoA =

    [

    ['"Raymond Burton"','"Architect"'],

    ['"Sam Ireland"','"Administrative Assistant"'],

    ['"Jim Hampton"','"Senior VP"']

    ];

    Now, we need to sort them:

    @$AoA = sort {$a->[0] cmp $b->[0]} @$AoA;

    Here, sort is a built-in function in Perl which lets you sort elements in any way, shape or form that you would desire. (Here, we happen to be sorting the names in the file in alphabetical order, since we know that name is element 0.) Since each of the elements in @$AoA is itself an array, this function reaches in, gets the first element out of two of the elements in @$AoA, and then compares them. Because cmp compares the whole name string, the rows come out ordered by first name. After this, we have a structure like:

    $AoA =

    [

    ['"Jim Hampton"','"Senior VP"'],

    ['"Raymond Burton"','"Architect"'],

    ['"Sam Ireland"','"Administrative Assistant"']

    ];

    Now that we have sorted it, print it out:

    my $element;

    local($") = ",";

    foreach $element (@$AoA)

    {

    print $ofh "@$element\n";

    }

    Again, we use the $" variable to make interpolation work a certain way, and then print using that interpolation. This trick works quite a bit, and you should get used to it.
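Putting the pieces together, the whole job (parse, keep name and title, sort, format) fits in a handful of statements. The sketch below works on an in-memory list of lines instead of files so that it can be run as is; @csv_lines and $report are our own names, and join is used for the output instead of $".

```perl
use strict;
use warnings;
use Text::ParseWords;

# The three sample records from the text, held in memory.
my @csv_lines = (
    '"Raymond Burton",31,"Architect","Sampson Architecture"',
    '"Sam Ireland",34,"Administrative Assistant","USV"',
    '"Jim Hampton",35,"Senior VP","Tyco LTD"',
);

# Steps 1 and 2: parse each line and keep elements 0 and 2 (name, title).
my $AoA = [ map { [ (quotewords(',', 1, $_))[0, 2] ] } @csv_lines ];

# Step 3: sort the rows by name (element 0).
@$AoA = sort { $a->[0] cmp $b->[0] } @$AoA;

# Step 4: format each row as a comma separated line.
my $report = join('', map { join(',', @$_) . "\n" } @$AoA);

print $report;
```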

    Summary of Array of Arrays:

    The data structure Array of Arrays is good for cases in which you have tabular data and care about its order. Such cases include data printouts, formatted reports, matrices, and other structures. To access a certain slice of data (say the second row) you say something like:

    my @array = @{$AoA->[1]};

    which would access the second row, [5,6,8,9,10], in the following data structure:

    $AoA = [ [1,2,3,4], [5,6,8,9,10] ];

    To access an individual element you say:

    my $element = $AoA->[1][2];

    instead.

    Array of Hashes (AoH):

    An array of hashes is an array of elements, each of which is a hash. This is much more useful at making data structures 'name friendly': after all, we tend to associate data with names, rather than numbers. Let us suppose that we want to print out all the names in a data structure that looks like the following:

    $accounts =

    [

    ['name1','type1','address1','amount1'],

    ['name2','type2','address2','amount2']

    ];

    In other words, an Array of Arrays. You could say something like:

    foreach $arrayRef (@$accounts)

    {

    print $arrayRef->[0];

    }

    However, this relies on you knowing that element zero of each row equals 'name'. Of course you could always define a hash to translate:

    %account = ('name' => 0, 'type' => 1, 'address' => 2, 'amount' => 3 );

    and then say something such as:

    foreach $arrayRef (@$accounts)

    {

    print $arrayRef->[$account{'name'}];

    }

    where $account{'name'} translates into '0', which is then the index that points to 'named data'. This is ugly, slow, and unnecessary. It also does not insulate your code against change (we'll talk about this later). It's much better to build the translation IN to the data structure itself. This is the Array Of Hashes, and it looks like:

    Anonymous Reference Structure

    $AoH =

    [

    {'key1a' => 'value1a', 'key2a' => 'value2a', 'key3a'=>'value3a'},

    {'key1b' => 'value1b','key2b' => 'value2b', 'key3b'=>'value3b'}

    ];

    Note the '[' indicates the first level of the data structure, and the comma separated '{' indicates the second. These, again, indicate the primary structure is going to be an array, and that the secondary structure is going to be a hash.

    In the example with the accounts above, then, the data structure will look like:

    $accounts =

    [

    {'name' => 'fred', 'type' => 'book binding',

    'account'=> 'stable', 'amount' => 'owes $250'},

    {'name' => 'john doe', 'type' => 'insurance',

    'account' => 'risky', 'amount' => 'owes $60K' }

    ];
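If you already have an Array of Arrays and a list of column names, a hash slice will turn one into the other. This is only a sketch of one way to do it; @fields, $rows and the use of a hash slice are our choices, not a construction method taken from the text.

```perl
use strict;
use warnings;

my @fields = qw(name type address amount);   # column names, in row order

my $rows = [
    [ 'name1', 'type1', 'address1', 'amount1' ],
    [ 'name2', 'type2', 'address2', 'amount2' ],
];

# For each row, pair every field name with the value in the same position.
my $AoH = [
    map { my %row; @row{@fields} = @$_; \%row } @$rows
];

print "$AoH->[1]{address}\n";
```

After this, nothing downstream needs to remember that the address lives in position 2.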

    To use this data structure, then, we just need to know two things, like we did with the array of arrays: how to access it, and how to build it.

    Array of Hashes Direct Access

    Here are four common ways of accessing an Array of Hashes:

    1) How to access a given element. Given this structure, then, each element in the array of hashes can be accessed by the following syntax:

    $AoH->[$dim1]{$key1};

    where $dim1 is the dimension which the array of hashes $AoH is based on, and $key1 is the key to the hash.

    Then, in the example above, $accounts->[0]{'name'} accesses the 'name' element of the first hash, which equals 'fred'. Here it is in the structure:

    $accounts =

    [

    {'name' => 'fred', 'type' => 'book binding',

    'account'=> 'stable', 'amount' => 'owes $250'},

    {'name' => 'john doe', 'type' => 'insurance',

    'account' => 'risky', 'amount' => 'owes $60K' }

    ];

    2) Accessing the Entire Row: Now suppose that we want to access the entire row for fred. Well, hashes inside the array can be accessed by this syntax:

    my $hashref = $AoH->[$dim1]; %hash = %$hashref;

    where $dim1 is the first dimension of the array.

    The following syntax could be used to print the values in fred's row from the example above:

    my $hashref = $accounts->[0]; my @values = values %$hashref; print "@values\n";

    Here is the relevant data in the structure itself:

    $accounts =

    [

    {'name' => 'fred', 'type' => 'book binding',

    'account'=> 'stable', 'amount' => 'owes $250'},

    {'name' => 'john doe', 'type' => 'insurance',

    'account' => 'risky', 'amount' => 'owes $60K' }

    ];

    Again, only the values are referenced (since we used the function values). If you said something like 'print "@{[%$hashref]}\n"' (again using the trick to interpolate any array) you would reference the whole row, keys and values alike.

    Now the output of 'values %$hashref' could be 'fred book binding stable owes $250'. Or it could be 'book binding stable fred owes $250', or something like it.

    Remember, hash elements do not come out of the hash in a pre-determined order. This is one of the things that Array of Arrays have in advantage over Array of Hashes, since arrays have a concept of order.
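
    If a repeatable order matters, there are two common workarounds, sketched below under the assumption that the row looks like fred's row above. The names @by_key, @columns and @in_order are made up for this illustration.

```perl
use strict;
use warnings;

# One row of the accounts structure from the text.
my $hashref = { name => 'fred', type => 'book binding',
                account => 'stable', amount => 'owes $250' };

# Option 1: sort the keys, then look up each value in that order.
my @by_key = map { $hashref->{$_} } sort keys %$hashref;

# Option 2: keep an explicit column order and take a hash slice.
my @columns  = ('name', 'type', 'account', 'amount');
my @in_order = @{$hashref}{@columns};

print join(' | ', @by_key), "\n";     # stable | owes $250 | fred | book binding
print join(' | ', @in_order), "\n";   # fred | book binding | stable | owes $250
```

    Sorting the keys gives a stable order, but not necessarily a meaningful one. An explicit column list costs one extra array and puts you in control.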

    3) Getting all the names of a certain element: Let's get all the names out of our array of hashes:

    my @names = map ( $_->{name}, @$accounts);

    which sets @names to ('fred', 'john doe'), the 'name' values in:

    $accounts =

    [

    {'name' => 'fred', 'type' => 'book binding',

    'account'=> 'stable', 'amount' => 'owes $250'},

    {'name' => 'john doe', 'type' => 'insurance',

    'account' => 'risky', 'amount' => 'owes $60K' }

    ];

    Here, you are guaranteed to get things out in the order 'fred', 'john doe', since the outer structure is an array and 'map' visits its elements sequentially.

    4) Getting a slice of elements out of the hash: Let's get the name and account status of the second account in the array:

    my ($name, $account) = @{$accounts->[1]}{('name','account')};

    This uses slicing to directly access the 'name' and 'account'. However, this is getting a little convoluted, so let's split it up.

    my $hashRef = $accounts->[1];

    my ($name, $account) = ($hashRef->{name}, $hashRef->{account});

    where we make a placeholder ($hashRef) and then access directly into this, to get the name and account elements inside it. Again, here are the relevant structures:

    $accounts =

    [

    {'name' => 'fred', 'type' => 'book binding',

    'account'=> 'stable', 'amount' => 'owes $250'},

    {'name' => 'john doe', 'type' => 'insurance',

    'account' => 'risky', 'amount' => 'owes $60K' }

    ];

    As you can see, this is exactly the same as referencing an Array of Arrays. The only difference is the {} instead of the [], and the fact that slicing (getting a portion of a hash or array) is a little more complicated.
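
    The four access forms can be tried side by side. The following is a minimal sketch using the sample data from this section. The variable names $first_name, %row and $status are chosen here purely for illustration.

```perl
use strict;
use warnings;

# Sample data copied from the text.
my $accounts = [
    { name => 'fred',     type => 'book binding',
      account => 'stable', amount => 'owes $250' },
    { name => 'john doe', type => 'insurance',
      account => 'risky',  amount => 'owes $60K' },
];

# 1) One element: row 0, key 'name'.
my $first_name = $accounts->[0]{name};

# 2) An entire row: copy the hash behind the reference.
my %row = %{ $accounts->[0] };

# 3) One field across all rows, in array order.
my @names = map { $_->{name} } @$accounts;

# 4) A hash slice of the second row.
my ($name, $status) = @{ $accounts->[1] }{ 'name', 'account' };

print "$first_name\n";                          # fred
print join(', ', @names), "\n";                 # fred, john doe
print "$name: $status\n";                       # john doe: risky
print scalar(keys %row), " fields in row 0\n";  # 4 fields in row 0
```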

    Summary of Array of Hashes Access.

    Accessing an Array of Hashes is quite similar to accessing an Array of Arrays. Figure 8.2 gives the most common access forms, and what they refer to:

    Figure 8.2: Accessing an Array of Hashes

    Get these forms right, and understand the principles behind them, and you will understand Arrays of Hashes quite well.

    Array of Hashes Sample Constructor:

    So let's see how we can construct an Array of Hashes out of a flat file that consists of a list of delimited rows. In order to do so, you need to have a way to construct the hash part of the Array of Hashes. Something like this is appropriate:

    name |type |account |amount

    fred |book binding |stable |owes $250

    john doe |insurance |risky |owes $60K

    in which the first row indicates what you are going to be hashing on, and the second and later rows indicate the data to be put in the hash.

    Hence, the code will be quite similar to the Array of Arrays constructor. Here is the central function:

    Listing 8.7: makeArrayOfHashes.p

    1 use FileHandle; # we need FileHandle for 'new FileHandle' call

    2 sub makeArrayOfHashes

    3 {

    4 my ($filename) = @_; # we are given filename

    5 my $fh = new FileHandle("$filename") || die; # we open up the file.

    6 my ($line, $return, @lines) = ('', []);

    7 # $line = '', $return = [], @lines starts out empty

    8 chomp($line = <$fh>); # we chomp newlines off $line.

    9 # we'll hash against $line.

    10 my $hashRef = getHashKeys($line); # we make keys out of $line

    11 chomp(@lines = <$fh>); # we get rest of the lines

    12 foreach $line (@lines)

    13 {

    14 push (@$return, getRef($line, $hashRef));

    15 }

    16 return($return);

    17 }

    Or we might say, the 'outline' of our ArrayOfHashes constructor. As is appropriate with functions, we stuff the 'hard bits' (getting the keys that we need to hash on (line 10), and the actual 'turning a line into a hash' (line 14)) into functions. We could then re-use them. The function getHashKeys then becomes:

    Listing 8.7b: makeArrayOfHashes.p continued

    1 sub getHashKeys

    2 {

    3 my ($line) = @_; # we pass in a line

    4 my (@hashElements) = split(m"\|", $line); # we split it by '|'

    5 my ($element, $return, $xx) = ('', {},0 ); # another multiple set

    6 foreach $element (@hashElements) # we go through each element.

    7 {

    8 $element =~ s"^\s+""g; # get rid of leading spaces

    9 $element =~ s"\s+$""g; # and trailing spaces.

    10 $return->{$xx++} = $element; # we make'translation' hash.

    11 }

    12 $return;

    13 }

    This function does something very useful for us. It makes a translation table for us, so we know that the first element of the table happens to be the name, the second the type, etc. The heart of the algorithm is in line 10, which builds something like:

    $return == { 0 => 'name', 1=> 'type', 2 => 'account', 3 => 'amount' };

    working sort of like what you see in Figure 8.3:

    Figure 8.3: Creating a translation table in Perl.

    What do we use this for? Well, here is the plan of attack. Since we want to make a structure like:

    $accounts =

    [

    {'name' => 'fred', 'type' => 'book binding',

    'account'=> 'stable', 'amount' => 'owes $250'},

    {'name' => 'john doe', 'type' => 'insurance',

    'account' => 'risky', 'amount' => 'owes $60K' }

    ];

    we need some way to turn the array we get from splitting up a line into a hash reference. With a translation table, we can say:

    my (@elements) = split(m"\|", $line);

    $hash{$return->{0}} = $elements[0];

    to actually say

    $hash{'name'} = 'fred';

    since $return->{0} resolves to 'name', and $elements[0] resolves to 'fred'. More formally, we put it in a function called getRef:

    Listing 8.7c: makeArrayOfHashes.p continued

    1 sub getRef

    2 {

    3 my ($line, $hashKeys) = @_;

    4 my ($element, @elements);

    5 my ($return) = {};

    6 my ($xx);

    7 @elements = split(m"\|",$line); # split $line on "\|".

    8 for ($xx = 0; $xx < @elements; $xx++)

    9 {

    10 $element = $elements[$xx];

    11 $element =~ s"^\s+""g; # strip leading spaces.

    12 $element =~ s"\s+$""g; # strip trailing spaces

    13 $return->{$hashKeys->{$xx}} = $element; # magic turning array

    14 } # into a hash.

    15 $return;

    16 }

    Again, line 13 is the magic one, where we take the lookup table that we have built (column 0 equals the name field, for example) and use it to build the hash that we return in line 15 to the main routine, to be stuffed into the data structure by:

    push (@$return, getRef($line, $hashRef));

    We then iterate through the file, (for each line) adding a hash onto the array as we go.
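
    For comparison, here is a more compact sketch of the same idea. It is not the listing above: it swaps the numeric translation table for a hash slice, and it takes its lines from a list rather than a file so that it can be run as is. The names splitFields and linesToAoH are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a '|'-delimited line and trim each field.
sub splitFields {
    my ($line) = @_;
    my @fields = split /\|/, $line;
    foreach my $field (@fields) {    # $field aliases the array element
        $field =~ s/^\s+//;          # strip leading spaces
        $field =~ s/\s+$//;          # strip trailing spaces
    }
    return @fields;
}

# Build an Array of Hashes from a header line plus data lines.
sub linesToAoH {
    my ($header, @lines) = @_;
    my @keys = splitFields($header);
    my $AoH  = [];
    foreach my $line (@lines) {
        my %row;
        @row{@keys} = splitFields($line);   # hash slice pairs keys with fields
        push @$AoH, \%row;
    }
    return $AoH;
}

my $accounts = linesToAoH(
    'name |type |account |amount',
    'fred |book binding |stable |owes $250',
    'john doe |insurance |risky |owes $60K',
);

print "$accounts->[1]{name} is $accounts->[1]{account}\n";   # john doe is risky
```

    The hash slice '@row{@keys} = ...' does in one statement what the loop over $xx does in getRef. The explicit loop is easier to follow the first time through.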

    Array of Hashes Access Function:

    By creating the data structure in this way, (AoH) we are granted a lot more freedom in how we access it. Now, instead of doing

    foreach $arrayRef (@$accounts)

    {

    print $arrayRef->[0]; # prints out the names of the accounts.

    }

    we can do:

    foreach $hashRef (@$accounts)

    {

    print $hashRef->{'name'};

    }

    which is far more readable for accessing data elements, and also quite a bit more useful if something in your code changes.

    Let's suppose that the field 'name' gets moved to the third column of account (if you are dealing with databases), so you have:

    type, account, name, amount. # $arrayRef->[0] now points to type

    Any code which says '$arrayRef->[0]' is going to break since it now points to 'type'. Or if a field gets prepended:

    id, name, type, account, amount # $arrayRef->[0] now points to id

    Now, any code that says '$arrayRef->[0]' is going to point to the id. If you use Array of Hashes to access these elements, then your code is insulated against this change. If you were dealing with Array of Arrays, then suddenly your code is using 'type' where it should be using 'name'!

    Note in the above example, there still are some changes that you have to be wary of. If someone changes the field from 'name' to 'Name', you will still break. This is another good reason for '-w': it will warn you when you are using a hash value that is undefined, so:

    #!/usr/local/bin/perl -w

     

    my $AoH = makeArrayOfHashes('filename');

    my $hashRef = $AoH->[0];

    print $hashRef->{'name'};

    will issue a 'Use of uninitialized value' warning if the 'name' key so happens to be missing or undefined.
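
    If a warning is too easy to overlook, you can check for the key yourself with 'exists'. The sketch below assumes a row whose field was renamed to 'Name'. The helper getField is hypothetical and not part of the chapter's listings.

```perl
use strict;
use warnings;

my $hashRef = { Name => 'fred', type => 'book binding' };   # note the capital 'N'

# Hypothetical helper: fetch a field, but complain clearly if the key is absent.
sub getField {
    my ($row, $field) = @_;
    die "No such field '$field' (have: @{[ sort keys %$row ]})\n"
        unless exists $row->{$field};
    return $row->{$field};
}

my $type  = getField($hashRef, 'type');
my $error = '';
eval { getField($hashRef, 'name'); 1 } or $error = $@;   # trap the die

print "type: $type\n";      # type: book binding
print "error: $error";      # error: No such field 'name' (have: Name type)
```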

    Let's now 'print out' an Array of Hashes in exactly the same way that we printed out an Array of Arrays. It's just that it may not be as useful as the Array of Arrays example listed above. Again, this is because of the orderless nature of hashes. You aren't guaranteed the order in which the text prints out.

    Let's call the function printAoH. It will look something like this:

    Listing 8.8: makeArrayOfHashes.p continued

    1 sub printAoH

    2 {

    3 my ($AoH) = @_;

    4 my ($hashRef);

    5 foreach $hashRef (@$AoH) # go through each hash in array.

    6 {

    7 my $key;

    8 foreach $key (keys (%$hashRef)) # go through each key in the hash.

    9 {

    10 print "$key $hashRef->{$key} ";

    11 }

    12 print "\n";

    13 }

    14 }

    The above function is generic, and prints out the entire AoH structure, just like the Array of Arrays function did. It does it explicitly, by going through each key in the hash and printing that. We could make it shorter, via some trickery:

    Listing 8.9: makeArrayOfHashes.p continued

    1 sub printAoH

    2 {

    3 my ($AoH) = @_;

    4 my ($hashRef);

    5 foreach $hashRef (@$AoH) # go through each hash in the array.

    6 {

    7 print "@{[ %$hashRef ]} \n"; # print out using interpolation trick

    8 }

    9 }

    Line number 7 takes on the job of 7-12 in the other example. Again, it takes $hashRef, dereferences it, turns it into an array, and then 'interpolates' that array so it prints out elements space delimited. (what a lot of work for one line!)

    But of course, this isn't very useful because of that order problem again (unless you just want to see the AoH). Most of the time, you will be doing reporting using specialized functions.

    Here's an example of a reporting function:

    1 sub reportAoH

    2 {

    3 my ($AoH) = @_; # passed in from above.

    4 foreach $hashRef (@$AoH)

    5 {

    6 print "$hashRef->{'name'}\t$hashRef->{'account'}\n";

    7 }

    8 }

    Again, if you substitute reportAoH for the function printAoH this will print out the name and account fields in the table account, rather than the whole thing.
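
    A reporting function can also take the field list as an argument, which keeps the order fixed without hard-coding the fields. This is a sketch under that assumption. The name reportFields is made up for illustration, and it returns the lines instead of printing them so that the caller decides where they go.

```perl
use strict;
use warnings;

# Return one tab-separated line per row, with fields in the order requested.
sub reportFields {
    my ($AoH, @fields) = @_;
    my @report;
    foreach my $hashRef (@$AoH) {
        push @report, join("\t", map { $hashRef->{$_} } @fields);
    }
    return @report;
}

my $accounts = [
    { name => 'fred',     type => 'book binding',
      account => 'stable', amount => 'owes $250' },
    { name => 'john doe', type => 'insurance',
      account => 'risky',  amount => 'owes $60K' },
];

my @report = reportFields($accounts, 'name', 'amount');
print "$_\n" foreach @report;
```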

    Array of Hashes example: Dealing with Incomplete Data

    Array of Hashes are used pretty interchangeably with Array of Arrays, hence the examples that work for AoA's work for AoH's as well. (You could easily rewrite the Excel example above to work for AoH's).

    However, there are a couple of situations in which you can do things with Array of Hashes, which you cannot do with an Array of Arrays. Suppose the file that you were accessing had data that looked like:

    Record #1:

    name: 'George Simpson'

    age: 25

    occupation: 'Plumber'

    salary: $45000

    Record #2:

    name: 'Sam Plinkton'

    age: 56

    occupation: 'Accountant'

    Record #3:

    name: 'Heather Sanford'

    occupation: 'botanist'

    In other words, your data is incomplete. You have a lot of data with associated tags, but they contain only certain information about a given item. Let's take a file like this, and prepare a report of occupations and their associated salaries (again, in alphabetical order), and print it to an output file.

    Plan of Attack

    There are four steps here:

    1) Turning the file into a data structure

    2) getting the data out of that data structure that we need

    3) sorting those elements into a sorted reference.

    4) printing out those elements into a file.

    However, since the format of the file is different, and because we are asking for a different task (reporting salaries by occupation), the code itself will be quite different. Here's a possible solution, having the user say 'script.p <input_file> <output_file>':

    Listing 8.10: incompleteRead.p

    1 use FileHandle;

    2 use strict;

    3 my $ifh = new FileHandle("$ARGV[0]");

    4 my $ofh = new FileHandle("> $ARGV[1]");

    5

    6 my ($hashRef, $AoH) = ({}, []);

    7 while ($hashRef = getHashRecord($ifh))

    8 {

    9 $hashRef->{'occupation'} = 'UNKNOWN' if (!defined $hashRef->{'occupation'});

    10 $hashRef->{'salary'} = 'UNKNOWN' if (!defined $hashRef->{'salary'});

    11 push(@$AoH, $hashRef);

    12 }

    13 @$AoH = sort { $a->{'occupation'} cmp $b->{'occupation'} } @$AoH;

    14 foreach $hashRef (@$AoH)

    15 {

    16 print $ofh "$hashRef->{'occupation'}: $hashRef->{'salary'}\n";

    17 }

    The only out of the ordinary thing here is that we are parsing the file differently in line 7. Instead of the usual

    while (defined ($line = <$ifh>))

    We have:

    while ($hashRef = getHashRecord($ifh))

    Why? Because our input file is, again, of the format:

    Record #1:

    name: 'George Simpson'

    age: 25

    occupation: 'Plumber'

    salary: $45000

    Record #2:

    name: 'Sam Plinkton'

    age: 56

    occupation: 'Accountant'

    Record #3:

    name: 'Heather Sanford'

    occupation: 'botanist'

    with multiple lines per record. Furthermore, each record corresponds to a hash. Hence, we need a function to make a hash reference out of this, returning:

    $hashRef = {'name' => 'George Simpson', 'age' => 25,

    'occupation' => 'Plumber', 'salary' => '$45000' }

    the first time,

    $hashRef = { 'name' => 'Sam Plinkton', 'age' => 56,

    'occupation' => 'Accountant' }

    the second, and so on. Below is such a function. Let's use the module 'Text::ParseWords' to handle splitting up each of the lines:

    Listing 8.10b: incompleteRead.p continued

    1 use Text::ParseWords;

    2 use strict;

    3 sub getHashRecord

    4 {

    5 my ($fh) = @_;

    6 my ($return, $line) = (undef, undef);

    7 while (defined ($line = <$fh>))

    8 {

    9 return($return) if (($line =~ m"Record #")&&(keys (%$return) != 0));

    10 next if ($line =~ m"Record #"); # otherwise skip the 'Record #' line;

    11 # line 9 returns once a hash exists

    12 # and 'Record #' marks the next record

    13 my ($key, $value) = quotewords(":", 0, $line); # splits to name,value

    14 $key =~ s"^\s+""g; $key =~ s"\s+$""g;

    15 $value =~ s"^\s+""g; $value =~ s"\s+$""g;

    16 $return->{$key} = $value;

    17 }

    18 return($return);

    19 }

    Here, we use a small trick to handle the format of the record coming in. The heart of the subroutine is in lines 7 through 17, where each line is read in turn, and then put into the data structure, until either the file ends (and we return on line 18) or the line contains the text 'Record #' and we have already created an element in our hash (keys (%$return) != 0). In that case, the hash is returned on line 9. The running of it looks something like Figure 8.4:

    Figure 8.4: Population of a hash record based on an input file

    After each line of input, a key-value pair is made and stuffed into the hash. When the string 'Record #' is detected, the function returns, leaving the filehandle pointing at the right place to pick up the next record. Hence,

    while ($hashRef = getHashRecord($ifh))

    does what we want: it picks up record 1, then record 2, and so on, each time returning a hash reference, in much the same way that 'while ($line = <$ifh>)' returns a line of text. Let's turn back to our original function:

    Listing 8.10: incompleteRead.p

    1 use FileHandle;

    2 use strict;

    3 my $ifh = new FileHandle("$ARGV[0]");

    4 my $ofh = new FileHandle("> $ARGV[1]");

    5

    6 my ($hashRef, $AoH) = ({}, []);

    7 while ($hashRef = getHashRecord($ifh))

    8 {

    9 $hashRef->{'occupation'} = 'UNKNOWN' if (!defined $hashRef->{'occupation'});

    10 $hashRef->{'salary'} = 'UNKNOWN' if (!defined $hashRef->{'salary'});

    11 push(@$AoH, $hashRef);

    12 }

    13 @$AoH = sort { $a->{'occupation'} cmp $b->{'occupation'} } @$AoH;

    14 foreach $hashRef (@$AoH)

    15 {

    16 print $ofh "$hashRef->{'occupation'}: $hashRef->{'salary'}\n";

    17 }

    Now we can see where this fits in. Line 7 provides the bridge between our input source (a flat file) and a Perl data structure. And the rest of the function is rather ordinary. We massage the data in line 9-10, adding a couple of elements (an occupation or salary of 'UNKNOWN' if it wasn't found in the data), and then in line 11 build our data structure. Then in line 13, we sort it (by occupation) and in 14-17, print it out to the output file.
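
    To experiment with the parsing step without creating a file, the same records can be built from a list of lines. The sketch below is an alternative to getHashRecord, not a copy of it: it uses a plain regular expression instead of Text::ParseWords, and it assumes every data line has the form 'key: value' with optional single quotes around the value.

```perl
use strict;
use warnings;

# The sample input from the text, one array element per line.
my @lines = (
    "Record #1:", "name: 'George Simpson'", "age: 25",
    "occupation: 'Plumber'", 'salary: $45000',
    "Record #2:", "name: 'Sam Plinkton'", "age: 56",
    "occupation: 'Accountant'",
    "Record #3:", "name: 'Heather Sanford'", "occupation: 'botanist'",
);

my $AoH = [];
foreach my $line (@lines) {
    if ($line =~ /^Record #/) {      # a new record starts: add an empty hash
        push @$AoH, {};
        next;
    }
    next unless @$AoH;               # skip anything before the first record
    next unless $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*$/;
    my ($key, $value) = ($1, $2);
    $value =~ s/^'(.*)'$/$1/;        # drop surrounding single quotes
    $AoH->[-1]{$key} = $value;       # store in the most recent record
}

print scalar(@$AoH), " records\n";                      # 3 records
print "$AoH->[0]{name} earns $AoH->[0]{salary}\n";      # George Simpson earns $45000
```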

    The upshot?

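    We get a file that looks like this (note that 'cmp' compares character codes, so the capitalized 'Plumber' sorts ahead of the lowercase 'botanist'):

    Accountant: UNKNOWN

    Plumber: $45000

    botanist: UNKNOWN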

    Of course, this is just a simple sample of what we could do. We could average the salaries, get a subset of the data (sorted by name) print it out in different format, lowercase the data, or whatever. The important thing to remember here is that once you get the data into a format that Perl can recognize, you can do anything with it. We shall see this even closer when we look at the next data structure: Hash of Hashes.
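
    As one illustration of such follow-on processing, here is a sketch that sorts the occupations without regard to case and averages the salaries that are present. It starts from the hashes that getHashRecord would return for the sample file. Stripping everything but digits and dots from the salary is an assumption about the data format.

```perl
use strict;
use warnings;

# Records as getHashRecord would return them for the sample file.
my $AoH = [
    { name => 'George Simpson',  age => 25, occupation => 'Plumber',
      salary => '$45000' },
    { name => 'Sam Plinkton',    age => 56, occupation => 'Accountant' },
    { name => 'Heather Sanford', occupation => 'botanist' },
];

# Case-insensitive sort, so 'botanist' lands between 'Accountant' and 'Plumber'.
my @sorted = sort { lc($a->{occupation}) cmp lc($b->{occupation}) } @$AoH;
my @occupations = map { $_->{occupation} } @sorted;

# Average only the salaries that are actually present.
my ($total, $count) = (0, 0);
foreach my $hashRef (@$AoH) {
    next unless defined $hashRef->{salary};
    (my $number = $hashRef->{salary}) =~ s/[^0-9.]//g;   # '$45000' becomes '45000'
    $total += $number;
    $count++;
}
my $average = $count ? $total / $count : 0;

print "@occupations\n";               # Accountant botanist Plumber
print "average salary: $average\n";   # average salary: 45000
```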

    Summary of Array of Hashes

    Arrays of Hashes are what they sound like: an array (in our case an array reference) in which each element is itself a hash reference.

    They are more flexible than array of arrays, since you can make your data structure more resistant to change. Take a database example. If you have a table with the following elements:

    id, name, account, address

    and turn it into an array of arrays, your code will refer to

    $column->[0]

    to get an id.

    Similar code in an array of hashes will refer to

    $column->{id};

    instead, and hence won't break if someone switches around elements:

    name,id,account,address

    Now, $column->[0] will break (since it refers to the name) but if you program it right, $column->{id} will not (since you have named the element as such).

    Hash Of Hashes

    A hash of hashes is a simple, two dimensional hash. As you might expect, each hash key has, as a value, a hash. The anonymous form for a Hash of Hashes looks like this:

    $HoH =

    {

    level1key1 => { level2key1 => value1, level2key2 => value2 },

    level1key2 => { level2key1 => value1, level2key2 => value2 }

    };

    Hash of hashes are great for an extremely common problem in the world of business, and a common database problem: unique keys.

    Suppose that we have state info that is indexed by a unique identifier, an id. Then, we could use the two dimensional hash to 'tag' all the unique states on the system, producing a data structure which looks like:

    $stateKeys =

    {

    ID421 => { state => 'Minnesota', capital => 'St Paul',

    state_weed => 'dandelion' },

    ID221 => { state => 'South Dakota', capital => 'Pierre',

    state_weed => 'crabgrass' }

    };

    Notice what we are doing here. For each row of the state info table, we pick out and 'tag' a given key as part of a two dimensional hash. Let's look at how we can now access it:

    Hash of Hashes Access Methods:

    In a way, stuffing the data of a database into a Hash of Hashes makes that data less easy to access. Since a hash is a 'black box', in that you need a function such as keys() to see what is inside it, the syntax for accessing anything that you don't know is there becomes more difficult.

    Hence, given a data structure like:

    $stateKeys =

    {

    ID421 => { state => 'Minnesota', capital => 'St Paul',

    state_weed => 'dandelion' },

    ID221 => { state => 'South Dakota', capital => 'Pierre',

    state_weed => 'crabgrass' }

    };

    Let's access different parts of these data elements in the same way that we did with array of hashes and array of arrays.

    1) You can access the state associated with the ID 'ID421' by saying:

    print $stateKeys->{'ID421'}{'state'};

    to print out 'Minnesota'

    2) To get the entire hash for one ID (a 'row' of the same data structure), you could say:

    my $hashRef = $stateKeys->{'ID421'};

    to get the hash referenced by 'ID421', i.e.:

    $stateKeys =

    {

    ID421 => { state => 'Minnesota', capital => 'St Paul',

    state_weed => 'dandelion' },

    ID221 => { state => 'South Dakota', capital => 'Pierre',

    state_weed => 'crabgrass' }

    };

    but only if you know the ID421 is there inside the hash. You can get this information, of course, by saying:

    my (@keys) = keys(%$stateKeys);

    which returns the top-level keys (ID421 and ID221) of the HoH:

    $stateKeys =

    {

    ID421 => { state => 'Minnesota', capital => 'St Paul',

    state_weed => 'dandelion' },

    ID221 => { state => 'South Dakota', capital => 'Pierre',

    state_weed => 'crabgrass' }

    };

    3) However, the syntax for getting, say, all of the states out of the hash becomes rather convoluted:

    1 foreach $key (keys(%$stateKeys))

    2 {

    3 my $hash = $stateKeys->{$key};

    4 push(@states, $hash->{state});

    5 }

    which references:

    $stateKeys =

    {

    ID421 => { state => 'Minnesota', capital => 'St Paul',

    state_weed => 'dandelion' },

    ID221 => { state => 'South Dakota', capital => 'Pierre',

    state_weed => 'crabgrass' }

    };

    It works by taking the keys of $stateKeys (ID421, ID221) in line 1, then accessing the hash associated with each key (line 3) and putting a reference to it into a temporary variable ($hash). Then, we access the second level of the hash, and push the state onto an array called @states (line 4). But still, it's not syntax that you'd want to write home about. The same goes for getting the records for all states whose names sort after 'N':

    foreach $key (keys(%$stateKeys))

    {

    my $hash= $stateKeys->{$key};

    if ($hash->{'state'} gt 'N')

    {

    $states_gt_N->{$key} = $hash;

    }

    }

    This code does the trick, but again, it isn't simple, and it isn't very clean. You would end up with a Hash of Hashes in $states_gt_N, which weeded out 'Minnesota' and ended up with just the 'ID221' entry of:

    $stateKeys =

    {

    ID421 => { state => 'Minnesota', capital => 'St Paul',

    state_weed => 'dandelion' },

    ID221 => { state => 'South Dakota', capital => 'Pierre',

    state_weed => 'crabgrass' }

    };

    But again, this isn't the easiest syntax. If you wanted to skip getting the key, and just get an array of hashes for the states that sort after 'N', you could say:

    my $hashList = [];

    @$hashList = grep ($_->{state} gt 'N', values (%$stateKeys));

    which would simply return an Array of Hashes, each element of which is a hash with states greater than N, something like:

    $hashList =

    [

    { state => 'South Dakota', capital => 'Pierre',

    state_weed => 'crabgrass' }

    ];

    But still, we are not guaranteed that the states we are getting are in any given order, because of the 'values' statement.
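
    The access forms above can be collected into one runnable sketch. It uses the sample data from this section and sorts the IDs wherever a repeatable order is wanted. The use of 'sort' is an addition for illustration, not something the forms above require.

```perl
use strict;
use warnings;

my $stateKeys = {
    ID421 => { state => 'Minnesota',    capital => 'St Paul',
               state_weed => 'dandelion' },
    ID221 => { state => 'South Dakota', capital => 'Pierre',
               state_weed => 'crabgrass' },
};

# 1) Direct access through both keys.
my $state = $stateKeys->{ID421}{state};

# 2) All states, visiting the IDs in sorted order so the result is repeatable.
my @states = map { $stateKeys->{$_}{state} } sort keys %$stateKeys;

# 3) Records whose state sorts after 'N', kept as a Hash of Hashes.
my $states_gt_N = {};
foreach my $key (keys %$stateKeys) {
    $states_gt_N->{$key} = $stateKeys->{$key}
        if $stateKeys->{$key}{state} gt 'N';
}

print "$state\n";                                   # Minnesota
print join(', ', @states), "\n";                    # South Dakota, Minnesota
print join(', ', sort keys %$states_gt_N), "\n";    # ID221
```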

    Summary of Hash of Hashes Access Methods

    Hash of Hashes access methods are a little bit convoluted. If you don't know what is in your data structure, you need to use the 'keys()' or 'values()' functions a lot. A summary is given in Figure 8.5:

    Figure 8.5: Hash of Hashes access methods, summarized.

    What we give up in accessibility, we gain in speed. The best thing about Hash of Hashes is that if you know the elements you are looking for, they are only a couple of hash accesses away.

    Hash of Hashes Sample Constructor:

    We are now in a position to make a generalized constructor for a Hash of Hashes. Again, we shall assume that we have a file that looks something like:

    id |state |capital |state_weed

    ID421 |Minnesota |St. Paul |dandelion

    ID221 |South Dakota |Pierre |crabgrass

    Now, the benefit of reusing the format that we had before is that we can reuse the code for getting that translation table that we talked about before, which says something like:

    $column = { 0 => 'id', 1 => 'state', 2 => 'capital', 3 => 'state_weed' };

    and lets us turn an array into a hash, by saying something like:

    $return->{$column->{0}} = $elements[0];

    where $column->{0} becomes 'id', so $return->{$column->{0}} becomes $return->{'id'}, which is set to 'ID421'. Here is that code from the section on Array of Hashes again:

    Listing 8.11: getHashKeys.p

    1 sub getHashKeys

    2 {

    3 my ($line) = @_; # we pass in a line

    4 my (@hashElements) = split(m"\|", $line); # we split it by '|'

    5 my ($element, $return, $xx) = ('', {},0 ); # multiple assignment

    6 foreach $element (@hashElements) # iterate through elements

    7 {

    8 $element =~ s"^\s+""g; # strip off leading spaces

    9 $element =~ s"\s+$""g; # strip off trailing spaces

    10 $return->{$xx++} = $element; # we make 'translation' hash.

    11 }

    12 $return;

    13 }

    While we are at it, we might as well use the code to actually translate an array into a hash as well, from our last example in the section on array of hashes:

    Listing 8.12: getRef.p

    1 sub getRef

    2 {

    3 my ($line, $hashKeys) = @_;

    4 my ($element, @elements);

    5 my ($return) = {};

    6 my ($xx);

    7 @elements = split(m"\|",$line); # split $line on "\|".

    8 for ($xx = 0; $xx < @elements; $xx++)

    9 {

    10 $element = $elements[$xx];

    11 $element =~ s"^\s+""g; # strip off leading spaces.

    12 $element =~ s"\s+$""g; # strip off trailing spaces

    13 $return->{$hashKeys->{$xx}} = $element; # array to hash magic

    14 }

    15 $return;

    16 }

    Where line 13 does all our work for us. If we use this code again, then our job becomes exceedingly simple in making a constructor, given a file name:

    Listing 8.13: makeHoH.p

    1 use FileHandle;

    2 sub makeHoH

    3 {

    4 my ($fileName) = @_;

    5 my $fh = new FileHandle($fileName) || die "Couldn't open $fileName";

    6 my ($line, $return) = ('',{});

    7 my $defLine = <$fh>; chomp($defLine);

    8 my $transHash = getHashKeys($defLine);

    9 my @lines = <$fh>; chomp(@lines);

    10 foreach $line (@lines)

    11 {

    12 my $hash = getRef($line, $transHash);

    13 $return->{$hash->{id}} = $hash;

    14 }

    15 $return;

    16 }

    Here, the key lines are 10-14. The '$return->{$hash->{id}}' takes the id out of the hash that we just created in line 12, and creates the following entry in $return:

    'IDXXX' => { state => 'XXXXX', capital => 'XXXXX', state_weed => 'XXXX' };

    Iterating, then, through every line in the file makes our data structure which is returned on line 15.
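
    Here is a self-contained sketch of the same construction that can be run without a data file. It takes the header and data lines as arguments, lets the caller choose the key field, and dies on a repeated key. The names trimmedFields and linesToHoH, and the duplicate check, are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split on '|' and trim spaces from each field.
sub trimmedFields {
    my ($line) = @_;
    my @fields;
    foreach my $field (split /\|/, $line) {
        (my $copy = $field) =~ s/^\s+|\s+$//g;
        push @fields, $copy;
    }
    return @fields;
}

# Build a Hash of Hashes keyed on the column named $keyField.
sub linesToHoH {
    my ($keyField, $header, @lines) = @_;
    my @columns = trimmedFields($header);
    my $HoH = {};
    foreach my $line (@lines) {
        my %row;
        @row{@columns} = trimmedFields($line);   # pair column names with fields
        my $key = $row{$keyField};
        die "Duplicate key '$key'\n" if exists $HoH->{$key};
        $HoH->{$key} = \%row;
    }
    return $HoH;
}

my $stateKeys = linesToHoH(
    'id',
    'id |state |capital |state_weed',
    'ID421 |Minnesota |St. Paul |dandelion',
    'ID221 |South Dakota |Pierre |crabgrass',
);

print "$stateKeys->{ID221}{capital}\n";   # Pierre
```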

    Now, what happens if you have more than one field that you wish to hash on? Well, simple: replace line 13 with whatever fields you wish to hash on. Say you wanted to hash on the id and the state. Then you would say:

    $return->{$hash->{id}}{$hash->{state}} = $hash;

    instead. But this is a triple hash, or a HoHoH, and we won't go into it in more detail here.

    Hash of Hashes Access Function:

    Likewise, here is the general access function for extracting information from a Hash of Hashes. It is almost exactly the same as the AoA and AoH versions, except that it uses the 'keys' function in both dimensions:

    Listing 8.14: printHoH.p

    1 sub printHoH

    2 {

    3 my ($HoH) = @_;

    4 my ($key, $hash, $secondKey);

    5 foreach $key (keys %$HoH)

    6 {

    7 my $hash = $HoH->{$key};

    8 print "$key: ";

    9 foreach $secondKey (keys (%$hash))

    10 {

    11 print "$secondKey, $hash->{$secondKey} -- ";

    12 }

    13 print "\n";

    14 }

    15 }

    This will print out all the keys, and values, of the second level hash on the same line as its associated 'first level' key. Line 7 gets the hash for the first level key, and lines 9-12 print out the keys and values associated with it. In our example, we would get an output with something like:

    ID421: id, ID421 -- state, Minnesota -- state_weed, dandelion -- capital, St. Paul --

    ID221: id, ID221 -- state, South Dakota -- state_weed, crabgrass -- capital, Pierre --

    Although again, the use of this is somewhat problematic because of the order with hashes.
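
    Sorting the keys at both levels removes the ordering problem, at the cost of alphabetical rather than 'natural' column order. The following is a sketch of that variation. The name formatHoH is hypothetical, and it returns the lines rather than printing them.

```perl
use strict;
use warnings;

# Return one line per first-level key, with both levels of keys sorted.
sub formatHoH {
    my ($HoH) = @_;
    my @lines;
    foreach my $key (sort keys %$HoH) {
        my $hash = $HoH->{$key};
        my @pairs;
        foreach my $secondKey (sort keys %$hash) {
            push @pairs, "$secondKey, $hash->{$secondKey}";
        }
        push @lines, "$key: " . join(' -- ', @pairs);
    }
    return @lines;
}

my $stateKeys = {
    ID421 => { state => 'Minnesota',    capital => 'St. Paul',
               state_weed => 'dandelion' },
    ID221 => { state => 'South Dakota', capital => 'Pierre',
               state_weed => 'crabgrass' },
};

my @lines = formatHoH($stateKeys);
print "$_\n" foreach @lines;
```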

    Example of Hash of Hashes: Poor Person's database

    Let's use a hash of hashes to implement what we might call a 'poor man's database', which uses flat files. It will be quite simple, with the ability to

    1) add an entry

    2) get an entry based on a key

    3) check for duplicates (and disallow them).

    4) write out a file on closing.

    Let's say then that there are four commands to this database. We will assume that each entry has a certain order (say ID, name, address, phone, and comment, delimited by colons). The commands are:

    a -- add

    g -- get (based on key)

    e -- exit and store

    l -- load from file.

    Hence:

    'a 16:Ed:1 trampoline blvd:444-3234:No telemarketing calls, please!'

    adds a record, and 'g 16:name' prints out 'Ed'. 'e file' stores the session to 'file', and 'l file' would load the session from 'file'.

    Let's write the interface first. It will look something like:

    Listing 8.15: poorMansDB.p

    1 use FileHandle;

    2 use strict;

    3 my ($HoH, $line) = ({}, '');

    4 my $config = { 0 => 'id', 1 => 'name', 2 => 'address', 3 => 'phone', 4 => 'comment' };

    5 while (defined ($line = <STDIN>))

    6 { chomp($line);

    7 processAdd($HoH, $line) if ($line =~ m"^a");

    8 processGet($HoH, $line) if ($line =~ m"^g");

    9 processStore($HoH, $line) if ($line =~ m"^e");

    10 processLoad($HoH) if ($line =~ m"^l");

    11 print "Unknown Command\n" if ($line !~ m"^[agel]"); # if $line doesn't begin with a,g,e,l

    12 }

    This is about it. Line 4 shows the structure of our table (this is the translation hash that comes up again and again, translating an array to a hash). Each time the user enters a line, lines 7-11 check to see which command the user has entered. Now all we need to do is write each one of these functions. We take them in turn.

    Listing 8.15b: poorMansDB.p continued

    13 sub processAdd

    14 {

    15 my ($HoH, $line) = @_;

    16 my ($hash, $xx) = ({},0);

    17 $line =~ s"^a\s+""; # strip only the leading command letter

    18 my @elements = split(m":", $line);

    19

    20 ( print ("Error in input! wrong no. of fields!\n"), return())

    21 if (@elements != keys(%$config));

    22 foreach my $element (@elements) { $hash->{$config->{$xx++}} = $element; }

    23

    24 ( print ("Duplicate key!\n"), return()) if (defined $HoH->{$hash->{id}});

    25 $HoH->{$hash->{id}} = $hash;

    26 }

    Here, we make a quick check to see that the number of elements the user entered matches the number of fields in the database (lines 20-21), make the hash out of the array (line 22), and then check for duplicate keys (line 24). If there is an error in input, or it is a duplicate key, we return. Otherwise we make the assignment into the data structure (line 25). Now that it is added, we can get various fields:

    Listing 8.15c: poorMansDB.p continued

    26 sub processGet

    27 {

    28 my ($HoH, $line) = @_;

    29 $line =~ s"^g\s*""g;

    30 my ($id, @fieldsToPrint) = split(m":", $line);

    31

    32 if (!defined $HoH->{$id})

    33 {

    34 print "ID $id not defined!\n";

    35 return();

    36 }

    37 my $hash = $HoH->{$id};

    38 my $field;

    39 if (!@fieldsToPrint)

    40 {

    41 my (@fields) = sort keys(%$config);

    42 foreach $field (@fields) { print "$hash->{$config->{$field}} "; }

    43 print "\n";

    44 }

    45 else

    46 {

    47 foreach $field (@fieldsToPrint) { print "$hash->{$field} "; }

    48 print "\n";

    49 }

    50 }

    There are only a couple of things to note about this (rather long) subroutine. One, the bulk of its length is due to 'user friendliness'. In lines 32-36, we check to see if the id is defined. And lines 40-44 show a small trick that we can use to pretend that a hash has order.

    If a user enters a command like 'g 16', we assume that they want to have all the information about the id 16. Hence, if we simply printed out:

    print "@{[%$hash]}\n";

    We would get the information in no particular order. In line 41 we instead say, in effect:

    @fields = sort { $a cmp $b } keys(%$config);

    'keys(%$config)' returns the keys of $config, that is, the numbers 0 through 4 in:

    my $config = { 0 => 'id', 1 => 'name', 2 => 'address',

    3 => 'phone number', 4 => 'comment' };

    Hence '@fields' refers to '(0,1,2,3,4)': 'sort keys(%$config)' compares the keys as strings, which for single digits gives the same result as numeric order (see chapter on special functions for more detail). When we say:

    foreach $field (@fields) { print "$hash->{$config->{$field}} "; }

    then $config->{$field} refers first to $config->{0} (id), second to $config->{1} (name), third to $config->{2} (address), etc. The upshot of this is that it is equivalent to:

    foreach $field ('id','name','address','phone number','comment')

    {

    print "$hash->{$field} ";

    }

    and hence prints out the fields in order, as if they were an array.
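
    The trick relies on the positions sorting correctly. A plain 'sort' compares strings, so with more than ten fields '10' would land before '2'; the numeric comparison '<=>' avoids that. Below is a minimal sketch of the same idea, assuming a helper name (fieldsInOrder) and a sample record that are ours, not the book's:

```perl
use strict;
use warnings;

my $config = { 0 => 'id', 1 => 'name', 2 => 'address',
               3 => 'phone number', 4 => 'comment' };

# Hypothetical helper: return a record's values in the order given by $config.
sub fieldsInOrder
{
    my ($config, $hash) = @_;
    my @positions = sort { $a <=> $b } keys(%$config);   # numeric, not string, order
    return map { $hash->{ $config->{$_} } } @positions;
}

# Sample record, made up for illustration.
my $hash = { id => 16, name => 'Ed', address => '1 trampoline blvd',
             'phone number' => '444-3234', comment => 'none' };

print join(' ', fieldsInOrder($config, $hash)), "\n";
```

    The positions stay in one place ($config), so adding a field means adding one entry to the translation hash and nothing else.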

    We don't have room for the other two functions, processLoad and processStore, right here. See if you can come up with a good solution, and then compare your solution with the one on the CD associated with this book. Since you are both loading and storing the file, you are free to pick any format that you would like.
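
    As a starting point, here is one possible shape for the two functions. It is a sketch under our own assumptions, not the solution from the CD: each record is written as one colon-delimited line in the field order given by $config, and the functions take a file name directly rather than the raw command line. The names storeHoH, loadHoH and fieldNames are hypothetical.

```perl
use strict;
use warnings;
use FileHandle;

my $config = { 0 => 'id', 1 => 'name', 2 => 'address',
               3 => 'phone number', 4 => 'comment' };

# Field names in column order: id, name, address, ...
sub fieldNames
{
    my ($config) = @_;
    return map { $config->{$_} } sort { $a <=> $b } keys(%$config);
}

# Write every record as one colon-delimited line.
sub storeHoH
{
    my ($HoH, $config, $filename) = @_;
    my $fh = FileHandle->new("> $filename") or die "Can't write $filename: $!\n";
    foreach my $id (sort keys(%$HoH))
    {
        my $hash = $HoH->{$id};
        print $fh join(':', map { $hash->{$_} } fieldNames($config)), "\n";
    }
    $fh->close();
}

# Read the file back into a fresh hash of hashes, keyed on id.
sub loadHoH
{
    my ($config, $filename) = @_;
    my $fh = FileHandle->new($filename) or die "Can't read $filename: $!\n";
    my @names = fieldNames($config);
    my $HoH = {};
    while (defined(my $line = <$fh>))
    {
        chomp($line);
        my %record;
        # The limit keeps any extra colons inside the last field (the comment).
        @record{@names} = split(m":", $line, scalar(@names));
        $HoH->{ $record{id} } = \%record;
    }
    $fh->close();
    return $HoH;
}
```

    With these in place, the 'e file' and 'l file' commands only need to strip the command letter from $line and pass what is left as the file name.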

    Summary of Hash of Hashes

    Hash of Hashes is probably the most difficult of the data structures to get used to.

    A Hash of Hashes is best used when you have to model a 'unique key', where that unique key is either a key from a database, or a key from a flat file. They are more difficult to model, but if you know the information that you are going to get out of them, they are worth it.

    They are also useful when you need to access data fast. Anything in a hash of hashes can be reached with two hash lookups (one for the record, one for the field). In an array of hashes, the equivalent lookup would need to search through the array until it found the matching element.
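
    To make the difference concrete, the sketch below stores the same two records both ways and looks one of them up. The records and the helper names (nameFromHoH, nameFromAoH) are made up for illustration.

```perl
use strict;
use warnings;

# The same two records, stored two ways.
my $HoH = {
    16 => { id => 16, name => 'Ed'  },
    17 => { id => 17, name => 'Flo' },
};
my $AoH = [
    { id => 16, name => 'Ed'  },
    { id => 17, name => 'Flo' },
];

# Hash of hashes: go straight to the record through its key.
sub nameFromHoH
{
    my ($HoH, $id) = @_;
    return exists $HoH->{$id} ? $HoH->{$id}{name} : undef;
}

# Array of hashes: walk the array until a record with that id turns up.
sub nameFromAoH
{
    my ($AoH, $id) = @_;
    foreach my $hash (@$AoH)
    {
        return $hash->{name} if ($hash->{id} == $id);
    }
    return undef;
}

print nameFromHoH($HoH, 17), "\n";
print nameFromAoH($AoH, 17), "\n";
```

    Both calls find the same record; the second one simply has to look at every element in front of it first.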

    Hash of Arrays

    A hash of arrays is a hash in which each of the values in the hash is an array. Below is the anonymous structure for a Hash of Arrays:

    $HoA = {

    key1 => [ $scalar1a, $scalar1b, $scalar1c ],

    key2 => [ $scalar2a, $scalar2b, $scalar2c ]

    };

    Making a hash of arrays is good for cases where you want to have direct access to a given data set, based on a key, but don't care about the order of the keys themselves (the elements stored under each key do keep their order).

    For example, suppose you want to model a simple directory structure. You could say something like:

    my $files =

    {

    directory1 => ['file1','file2','file3','file4'],

    directory2 => ['file1','file2','file3','file4']

    };

    and access the files in directory1 by saying:

    my $directory1 = $files->{'directory1'};

    print "@$directory1\n";

    to print out the files in that directory. Below are the general forms of access for a hash of arrays.

    Hash of Arrays Direct Access Method

    How do you access parts of a hash of arrays? Well, given the above it should be easy by now:

    1) A particular element in a hash of arrays can be accessed by:

    $HoA->{$key1}[$dim1];

    where $dim1 is the index into the array, and $key1 is the key to the hash. Hence,

    print $files->{'directory1'}[0];

    accesses the first element of directory1 ('file1') in:

    my $files =

    {

    directory1 => ['file1','file2','file3','file4'],

    directory2 => ['file1','file2','file3','file4']

    };

    2) Arrays inside the hash can be accessed by

    my $arrayref = $files->{directory2};

    print "@$arrayref\n";

    This accesses the whole array stored under directory2 in:

    my $files =

    {

    directory1 => ['file1','file2','file3','file4'],

    directory2 => ['file1','file2','file3','file4']

    };

    and therefore prints out 'file1 file2 file3 file4'.

    3) Slices can be done by the following syntax:

    my @firstfiles = map($_->[0], values(%$files));

    which accesses the first element of each array (the 'file1' of both directories) in:

    my $files =

    {

    directory1 => ['file1','file2','file3','file4'],

    directory2 => ['file1','file2','file3','file4']

    };

    or alternatively:

    foreach my $value (values (%$files))

    {

    push(@firstfiles, $value->[0]);

    }

    The rest of the forms of access are left as an exercise; they are directly determinable from the hash of hashes and array of arrays examples above.
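
    For reference, the access forms can be collected into one small script. This is a sketch for illustration; the variable names other than $files are ours.

```perl
use strict;
use warnings;

my $files = {
    directory1 => ['file1', 'file2', 'file3', 'file4'],
    directory2 => ['file1', 'file2', 'file3', 'file4'],
};

# 1) one element: the third file in directory1
my $third = $files->{directory1}[2];

# 2) one whole array, and how many elements it holds
my $arrayRef = $files->{directory2};
my $count    = scalar(@$arrayRef);

# 3) the first file of every directory (keys sorted for a stable order)
my @firstFiles = map { $files->{$_}[0] } sort keys(%$files);

# 4) an array slice: the last two files of directory1
my @lastTwo = @{ $files->{directory1} }[-2, -1];

print "$third $count @firstFiles @lastTwo\n";
```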

    Hash of Arrays Sample Constructor:

    Now let's look at how we would construct a data structure like the one above. Again, it describes a directory and its contents. Suppose the directory looks like:

    file1

    directory1/

        file1

        file2

        file3

        file4

    directory2/

        file1

        file2

        file3

        file4

    Let's read this in Perl, assuming that we do the reads directly using readdir. Let's add the twist that we are reading it in alphabetical order, and ignore recursion:

    Listing 8.16: readDir

    1 use DirHandle; # module implementing directory handles, like FileHandle does for files

    2 use strict;

    3 sub readDirectory # makes a hash of arrays.

    4 {

    5 my ($directory) = @_;

    6 my $DH = new DirHandle ("$directory") or die "Can't open $directory: $!\n";

    7 my ($entry, $return) = ('',{});

    8 my @entries = sort grep { !m"^\.\.?$" } readdir($DH); # skip '.' and '..'

    9 foreach $entry (@entries)

    10 {

    11 if (-d "$directory/$entry")

    12 {

    13 my $arrayDH = new DirHandle ("$directory/$entry");

    14 my @files = grep { !m"^\.\.?$" } readdir($arrayDH);

    15 $return->{$entry} = [ sort @files ];

    16 }

    17 else

    18 {

    19 $return->{$entry} = []; # plain file: an empty 'dummy' array

    20 }

    21 }

    22 return($return);

    23 }

    Here, we get the directories and files from the operating system in line 8 (and sort them on the fly), and for each of these entries check to see if it is a directory (line 11).

    If it is, we open it up (line 13) and get out its files, and then stick them, sorted, into the array associated with that entry (lines 14-15). If it is a file instead (lines 17-20) we simply make a 'dummy' hash entry, an empty array, to show the file is there (line 19). Line 22 hands the finished structure back to the caller.

    As a result, the directory structure:

    file1

    directory1/

        file1

        file2

        file3

        file4

    directory2/

        file1

        file2

        file3

        file4

    becomes

    $return =

    {

    'file1' => [],

    'directory1' => ['file1','file2','file3','file4'],

    'directory2' => ['file1','file2','file3','file4']

    };

    which we then can manipulate to our heart's content.

    Hash of Arrays Access Function

    And, going with the flow, below is the access function to get the values out of a HoA. In the sample above (with the directories), this would print out:

    directory1: file1 file2 file3 file4

    directory2: file1 file2 file3 file4

    file1:

    Here's the relevant code:

    Listing 8.17: printHoA.p

    1 sub printHoA

    2 {

    3 my ($HoA) = @_; # we are passed the HoA reference

    4 my ($arrayRef, $key);

    5 foreach $key (sort keys(%$HoA))

    6 {

    7 my $arrayRef = $HoA->{$key};

    8 print "$key: @$arrayRef\n";

    9 }

    10 }

    An easy one this time -- we simply take each key, and then access the Array associated with that key (line 5-7) and then print out the key along with its associated value (line 8).

    Example of Hash of Arrays: Processing Lists of Data/Key

    Now let's take a look at another short example of a Hash of Arrays. (this chapter is too long already!) Let's suppose that you have a grocery list (where you always keep them, in flat files) that has the following items:

    Cereal: 1.49,1.59,1.69, 1.49

    Pizza: 1.99,1.89,2.05

    Pop Tarts:2.55, 2.75, 2.67

    Tofu: 2.39,2.39

    Cereal: 1.59

    (you can tell I'm a bachelor, can't you!) And you want a summary of the cost for each type of food that you purchase. Well, this is the natural place for a Hash of Arrays. We:

    1) Read in the file to a data structure

    2) manipulate the data structure

    3) print out the summary.

    Here's the code below:

    Listing 8.18: readGroceries.p

    1 use FileHandle;

    2 use strict;

    3

    4 my $groceryHash = readGroceries("$ARGV[0]"); # take from first argument

    5 sumAndPrint($groceryHash);

    We just fill in the blocks now:

    6 sub readGroceries

    7 {

    8 my ($filename) = @_; my $hash = {}; # the hash of arrays we are building

    9 my $fh = new FileHandle("$filename") or die "Can't open $filename: $!\n";

    10 my @lines = <$fh>; chomp(@lines);

    11 foreach my $line (@lines)

    12 {

    13 my ($hashElement, $list) = split(m"\s*:\s*", $line);# cheap way of making trailing spaces

    14 push(@{$hash->{$hashElement}}, split(m"\s*,\s*", $list)); # disappear with split.

    15 }

    16 return($hash);

    17 }

    Here, the heart of the algorithm is in lines 13, and 14, which take the line from the file, and first split it on the ':' to get the name of the element, and then split it on the ',' to get the prices associated. Line 14 then builds the data structure directly out of the list. (in a sort of sneaky way, see below)

    We then proceed to sum up, and print out the grocery list with:

    Listing 8.18b: readGroceries.p continued

    18 sub sumAndPrint

    19 {

    20 my ($groceryHash) = @_;

    21 my $item;

    22 foreach $item (keys %$groceryHash)

    23 {

    24 my ($sum, $price);

    25 my $arrayOfPrices = $groceryHash->{$item};

    26 foreach $price (@$arrayOfPrices)

    27 {

    28 $sum += $price;

    29 }

    30 print "$item: $sum\n";

    31 }

    32 }

    And this prints out something like:

    Cereal: 7.85

    Pop Tarts: 7.97

    Pizza: 5.93

    Tofu: 4.78
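
    To try the two steps without a file on disk, the sketch below runs them over an in-memory copy of the grocery list. The helper name groceryTotals is an assumption of ours; it also sorts the item names and rounds with sprintf so that the printout is stable from run to run.

```perl
use strict;
use warnings;

my @lines = (
    'Cereal: 1.49,1.59,1.69, 1.49',
    'Pizza: 1.99,1.89,2.05',
    'Pop Tarts:2.55, 2.75, 2.67',
    'Tofu: 2.39,2.39',
    'Cereal: 1.59',
);

# Build the hash of arrays: item name => list of prices.
my $groceryHash = {};
foreach my $line (@lines)
{
    my ($item, $list) = split(m"\s*:\s*", $line);
    push(@{ $groceryHash->{$item} }, split(m"\s*,\s*", $list));
}

# Hypothetical helper: item name => total, formatted to two decimals.
sub groceryTotals
{
    my ($groceryHash) = @_;
    my $totals = {};
    foreach my $item (keys %$groceryHash)
    {
        my $sum = 0;
        $sum += $_ foreach (@{ $groceryHash->{$item} });
        $totals->{$item} = sprintf('%.2f', $sum);
    }
    return $totals;
}

my $totals = groceryTotals($groceryHash);
foreach my $item (sort keys %$totals)
{
    print "$item: $totals->{$item}\n";
}
```

    Note that the two 'Cereal' lines end up in one array, because the second push simply appends to the array the first one created.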

    Now, in the whole program, the only thing unusual about this syntax lies in line 14:

    push(@{$hash->{$hashElement}}, split(m"\s*,\s*", $list));

    which so happens to be the center of our algorithm. $hashElement is 'Cereal' or 'Pop Tarts'; i.e. it is what we are hashing on, and $hash->{$hashElement} is the array reference stored under that key (Perl creates the array for us the first time we push onto it). Hence

    push(@{$hash->{$hashElement}},

    refers to the array in the hash of arrays, i.e.:

    $HoA =

    {

    'Cereal' => [ 1.49,1.59,1.69,1.49,1.59 ],

    'Pop Tarts' => [2.55, 2.75,2.67 ],

    'Pizza' => [ 1.99,1.89,2.05 ],

    'Tofu' => [2.39,2.39]

    };

    This means that 'split(m"\s*,\s*", $list)' must fill in the values here. It does this by splitting the list wherever it finds zero or more spaces, followed by a ',', followed by zero or more spaces*. Hence,

    1001, 1002, 1003

    becomes ('1001','1002','1003') as the spaces are munged by the split. Whew! But we are getting ahead of ourselves.

    * split (m"\s*,\s*", $list) doesn't strip the leading spaces of the first chunk to be split, or the trailing spaces of the last chunk to be split. The most bulletproof way of doing this is saying:

    @elements = split(m",", $list);

    foreach (@elements) { s"^\s+""g; s"\s+$""g; }
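
    The difference between the two approaches shows up as soon as the list has spaces at its outer edges. A small sketch, with a made-up input string and a hypothetical helper name (splitAndTrim):

```perl
use strict;
use warnings;

# Hypothetical helper: split a comma-separated list and trim every element.
sub splitAndTrim
{
    my ($list) = @_;
    my @elements = split(m",", $list);
    foreach (@elements) { s/^\s+//; s/\s+$//; }
    return @elements;
}

my $list = '  1001, 1002 ,1003  ';

my @quick  = split(m"\s*,\s*", $list);   # spaces at the outer edges survive
my @robust = splitAndTrim($list);        # every element comes out clean

print "quick:  [", join('|', @quick),  "]\n";
print "robust: [", join('|', @robust), "]\n";
```

    The quick form leaves '  1001' and '1003  ' untouched at the two ends; the two-step form trims all three elements.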

    Summary of Common Data Structures

    From the previous sections on arrays of arrays, arrays of hashes, hashes of hashes, and hashes of arrays, you can see that there are common elements which tend to flow from an application of two rules, which we discussed last chapter. These rules are:

    1) if you want to traverse through an array reference, dereference with the construct

    foreach $elmt (@$aref)

    {

    # use $elmt here

    }

    where $aref is an array reference. This will go through each element of the array.

    2) if you want to traverse through a hash reference, dereference with the construct

    foreach $key (keys %$href) {

    my $value = $href->{$key};

    }

    where $href is a hash reference. The 'keys %$href' will give you a list of the hash's keys, so this will go through each element in the hash, one by one.
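
    The two rules nest without any extra machinery. As a minimal sketch, here they are applied to an array of hashes; the sample data and the helper name describeAoH are assumptions made for illustration.

```perl
use strict;
use warnings;

my $AoH = [
    { name => 'Ed',  phone => '444-3234' },
    { name => 'Flo', phone => '555-0100' },
];

# Hypothetical helper: turn each hash in the array into a "key=value ..." string.
sub describeAoH
{
    my ($AoH) = @_;
    my @out;
    foreach my $href (@$AoH)                  # rule 1: array reference
    {
        my @pairs;
        foreach my $key (sort keys %$href)    # rule 2: hash reference
        {
            my $value = $href->{$key};
            push(@pairs, "$key=$value");
        }
        push(@out, join(' ', @pairs));
    }
    return @out;
}

print "$_\n" foreach describeAoH($AoH);
```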

    Arrays of Arrays were good at printing out nice, orderly tables of data. Arrays of Hashes were good at accessing tables via the names of their elements. Hashes of Hashes were good at quick access to individual pieces of data by key, much like a database, and Hashes of Arrays were good at processing lists that happen to be associated with a string (like grocery lists).

    Finally, as you can see from all the code of this chapter, making a Perl data structure is quite closely tied into reading a source of data. If you read from a certain type of file, then knowing Perl's functions for reading it in is extremely important. For if you get the data into Perl in the first place, half your battle is won.

    Final Note

    So where can you go from here?

    Well, there are two ways to extend the above data structures.

    1) creating hybrid structures. Sometimes, you will need a structure that doesn't exactly fit the mold of always having an array of elements which have arrays of their own, or so on. Something like this:

    my $configHash =

    {

    config1 => 'file1',

    config2 => ['-w','-s','-h'],

    config3 => {key1 => 'value1',key2 => 'value2'}

    };

    is extremely common, but is not really an HoA or an HoH. Learning how to use these structures is a simple matter of understanding the rules from the last chapter, and applying them.
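
    One way to cope with a structure like this is to ask each value what it is before dereferencing it. The sketch below uses Perl's built-in ref function for that; the helper name describeValue is hypothetical, and this is only one of several ways to handle hybrids.

```perl
use strict;
use warnings;

my $configHash = {
    config1 => 'file1',
    config2 => ['-w', '-s', '-h'],
    config3 => { key1 => 'value1', key2 => 'value2' },
};

# Hypothetical helper: render one value according to what kind of thing it is.
sub describeValue
{
    my ($value) = @_;
    my $type = ref($value);          # '' for a plain scalar
    if ($type eq 'ARRAY')
    {
        return "array: @$value";
    }
    elsif ($type eq 'HASH')
    {
        return 'hash: ' . join(' ', map { "$_=$value->{$_}" } sort keys %$value);
    }
    return "scalar: $value";
}

foreach my $key (sort keys %$configHash)
{
    print "$key => ", describeValue($configHash->{$key}), "\n";
}
```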

    2) Making structures with more dimensions. From the above, you can pretty much extend the mechanism to populate ridiculously complicated data structures. I mean, really, if you want that AoHoAoHoHoAoA, go for it. All you need to do is peel one letter off the name of the data structure for each level you go deeper into it, and use the above rules (keys %$hashRef and @$arrayRef) to determine which 'foreach' construct you use. For example:

    foreach $key (keys %$HoHoA)

    {

        my $HoA = $HoHoA->{$key};

        foreach $key2 (keys %$HoA)

        {

            my $arrayRef = $HoA->{$key2};

            foreach $element (@$arrayRef)

            {

                print "$element";

            }

        }

    }

    If you are really curious, the AoHoAoHoHoAoA listed above becomes:

    foreach $HoAoHoHoAoA (@$AoHoAoHoHoAoA)

    { # A

        foreach $key (keys %$HoAoHoHoAoA)

        { # H

            my $AoHoHoAoA = $HoAoHoHoAoA->{$key};

            foreach $HoHoAoA (@$AoHoHoAoA)

            { # A

                foreach $key (keys %$HoHoAoA)

                { # H

                    my $HoAoA = $HoHoAoA->{$key};

                    foreach $key (keys %$HoAoA)

                    { # H

                        my $AoA=$HoAoA->{$key};

                        foreach $arrayRef (@$AoA)

                        { # A

                            foreach $scalar (@$arrayRef)

                            { # A

                                push (@scalars, $scalar);

                            }

                        }

                    }

                }

            }

        }

    }

    A piece of advice, though: if you are ever forced to do this in a real program, please rethink your design.
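
    If the goal is only to collect every scalar, as in the loop above, a recursive routine can stand in for the whole tower of foreach loops. This is a different technique from the nested loops, sketched here with a hypothetical helper name (collectScalars) and made-up sample data; it relies on the built-in ref function to tell arrays from hashes.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every plain scalar found anywhere inside a
# nested structure of array and hash references, whatever its shape.
sub collectScalars
{
    my ($data) = @_;
    my $type = ref($data);
    if ($type eq 'ARRAY')
    {
        return map { collectScalars($_) } @$data;
    }
    elsif ($type eq 'HASH')
    {
        return map { collectScalars($data->{$_}) } sort keys %$data;
    }
    return ($data);                  # a leaf: an ordinary scalar
}

my $HoHoA = {
    home => { docs => ['a.txt', 'b.txt'], pics => ['c.gif'] },
    tmp  => { junk => ['d.tmp'] },
};

my @scalars = collectScalars($HoHoA);
print "@scalars\n";
```

    The same routine works unchanged on an AoA, an HoH, or the seven-level monster above, since it never needs to know the depth in advance.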

    Anyway, that's about it. If you skipped the last chapter, or think that this chapter showed some chinks in your Perl armor, then I would strongly suggest going back to the last chapter for review. It contains more information on the guts of making Perl data structures, so it will open up the ways to creating tailor made data structures.



    HTML conversions by Mega Space.

    This page updated on October 14, 1997 by Webmaster.

    Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.
