![]() ![]() |
![]() |
![]() ![]() | |
© 1997 The McGraw-Hill Companies, Inc. All rights reserved. Any use of this Beta Book is subject to the rules stated in the Terms of Use. |
Coding by Eric Waldheim, Design/Documentation by Eric Waldheim and Ed Peschko.
One of the unfortunate perceptions that some people hold about perl is that it is not scaleable. It seems to be hard for them to believe that a language that is so useful in the small, which is the 'king of the one-liners', which has been called a 'duct tape' language, a language that 'gets the job done' can actually be used to form large applications and projects.
And to tell you the truth, I find it difficult to believe. Tools usually have a range of use which is finite. Nobody would use a calculator to do jet-wing design, or a microwave oven to prepare a gourmet meal.
Nonetheless, seeing is believing - anybody who has worked with perl for more than two months knows that it is extremely scaleable, scaleable in a way that has not been fully tested yet. Perl’s 'range' as it will seems to be close to infinite. You can write a one-line script that can save you hours of work, but you can also:
use perl to maintain million-line code base
manage large web sites, sometimes with hundreds if not thousands of concurrent users
convert data between systems for mission critical projects.
But its easy to talk big like this; it is harder to actually justify these words with actions. This chapter's idea is to show you the design, and execution difficulties of a large perl project in its entirety. The code in question is a fully-functional spreadsheet (a la Excel), written in perl. And hopefully, you will have some ideas on how it can be improved.
This chapter was designed to give you an idea of what it took to design the perl spreadsheet, why it was done, and to give you a high level overview of how it worked.*
Note that this chapter was not designed as a PerlTk tutorial. We have given other Tk apps in this book (the calendar (chapter 21) and a couple of standalone tk programs in chapter 12. However, if you are interested in learning tk the best thing you could do is go back to your computer, install tk from CPAN or the CD and type 'C:\> widget', and look at the source code. Trust me. |
First of all, we will talk about the motivation to build this spreadsheet, as well as the 'design requirements' behind the spreadsheet's creation.
Second, we will translate these requirements into a high-level Object Oriented design, with an emphasis on showing the rationale behind why each of the Objects in the spreadsheet has the relationship it does to the whole of the spreadsheet.
Third, we make a 'spreadsheet layout'; showing what each of the menus will hold, and showing all the associated windows. This gives an idea of the look and feel behind the spreadsheet itself.
Fourth, we will look a little closer at this design, into the area of the algorithms that needed to be written in order to support the spreadsheet. Here the design gets its first road test (so to speak), and items that were hazy in the OO design crystallize as we think of how to implement them.
Finally, we play around with our new spreadsheet. Developing new software isn't that exciting if you can't use it. And knowing how the software actually works lets us bang on it all the harder so that other people don't need to find the bugs.
If we went through every nuance of the design here, it would take ages (we did this on the chapter on documentation, and it came out to 60 pages for a 400 line class!). Hence, we can only give you a feel for the design process.
Also note that we did not talk about the implementation phase of our design. The documentation that comes along with the perl spreadsheet is fairly explicit on the interface to the spreadsheet.
Appendix A gives you all the code for the spreadsheet, as well as the documentation in POD format and attached to the code (what else!). Hopefully, you can take this code as a starting point and run with it. The code itself comes to 3052 lines, but we include it as one of my big pet-peeves of books is that they show you only the 'important' code and force you to fend for yourself. We hope that this chapter gives you the gumption to actually dive into the code itself and explore.
But for now, sit back and enjoy. Writing a perl spreadsheet like this gave us quite a few ideas on where perl can be taken for large projects, and taught us a lot about how far we could 'push the envelope' with perl to make larger and larger applications.
When you are about to embark on a big project (either in code, or in writing), you had better ask yourself some pretty damn good questions before you start.
In this case, there were two parallel tasks - we were writing a book, and the last example, the 'grand finale' so to speak was to be a spreadsheet in Tk. The book was a given, but the spreadsheet wasn't - there are other 'grand finale' projects that we could have written.
The first question we had to ask is - is it really worth it to do a spreadsheet in perltk? After all, the 'spreadsheet' is in the realm of the office applications, which have the reputation of:
1) having huge development staffs
2) being extraordinarily complex
3) having a high overhead to develop.
We didn't have a huge development staff (in fact, a development staff of one, part time), and the complexity of the spreadsheet was a given. And if the code turned out to be 30,000 lines long, well, that wouldn't be too much of a selling point, even if we had the time to develop it.
That gave us pause. So we needed to consider the risk behind the spreadsheet, and balance that with the reward. We couldn't escape one fact; that a spreadsheet in perl hadn't been done to our knowledge before. This is what made us consider the spreadsheet idea further.
A spreadsheet in Tk would be an exceedingly cool thing to do. If done, such a spreadsheet would be completely portable between operating systems. As soon as perltk was ported to the Mac, the spreadsheet would work on the Mac as if by magic. Hell, the way that things are going for Perl and Tk, we might end up using our spreadsheet on Windows 3.1 for all we knew!
No other spreadsheet that we knew of was both useful and portable like this. And, knowing perl's flexibility and versatility, we had a hunch that the code for making a useful spreadsheet probably would be a lot less than 30,000 lines. Hence, we had one large plus in our favor - flexibility - and we trusted perl as a development platform. After that, the positives pretty much kept rolling in:
1) we would get a lot of free debugging, and development from the user community. We had always known that we were going to give away the source code for free (well, or as part of this book) so we knew that we could count on having bug-patches, fixes, and so forth come in from the net, from people who tried our spreadsheet application and found it useful.
2) Our spreadsheet would have the source available. Since our end-users could see the code, they could modify the spreadsheet - make it better - and tell us of the modifications that they made. They could also use parts of it to form other applications, and tell us of applications that they formed. These 'exchanges of ideas' would be vital if we decided to make a bulletproof version of the spreadsheet.
3) We wanted the power of perl in a spreadsheet. We said, up top, that we only had a development team of one. This isn't strictly true. When you develop a complex application like this, you are inevitably using perl to expand the functionality of what your application can do. We saw this with eval - you can write a simple calculator by saying:
while (defined($line = <STDIN>))
{
chomp ($line);
eval ($line);
last if ($line eq '.');
}
Without explicitly stating it, this gives you the entire syntax of perl at your disposal, and hence ten years worth of development time in three lines of code. We wanted the same types of 'perl usage' in our code.
4) We wanted to be able to load perl modules into our spreadsheet. This is really an extension of point #3, but by loading modules into our spreadsheet, we can not only use the power of standard perl in our spreadsheets, but the power of extended perl (web interfaces, database interfaces) as well.
5) We wanted our spreadsheet to be portable. The big spreadsheet out there is Excel, of course. Unfortunately, Excel doesn't run on multiple platforms, and we wanted to have a way to process Excel spreadsheets on different platforms.
6) We wanted to have our spreadsheet be able to work inside other applications. Since the code is available through a public API, we wanted to see if we couldn't link up the spreadsheet with other modules in the 'free' world of CPAN. Such things as GD, which allows for plots of diagrams, and of course CGI (to put spreadsheets on the web) would enhance both our program and the public domain programs on CPAN.
7) We wanted a spreadsheet that was just a spreadsheet. A spreadsheet in our minds, was something that kept track of running totals based on formulas, typed in cells. It is not supposed to do graphs, nor is it supposed to process documents. Those could be done by other modules inside the perl CPAN hierarchy - doing them in the spreadsheet itself seemed like a waste of energy, as well as being more complicated to use.
8) We wanted our spreadsheet to be customizable, by keybindings. Some people in the Unix community like working with emacs keybindings, some like working with vi keybindings. Some people on NT like working with Excel bindings. We wanted to be able to make the spreadsheet customizable enough to support all three.
9) We wanted to make our spreadsheet safe. Too many programs out there give people power to do things without thinking of the consequences of that power. Word, for example, has macros that can be loaded into memory without the user knowing it - this is a common recipe for viruses (and it lost me 60 pages of this book with one fell swoop). And with MSWord being granted all the power of Visual Basic in version 8.0, watch out! Hence, we wanted any macros, any outside source code that is loaded into our spreadsheet to be done so with the explicit knowledge of the user, and the explicit OK of the user.*
* To be fair, Microsoft is taking steps to combat this type of virus by having the automatic macros need an explicit OK from the user. Unfortunately they still allow macros to be stored inside of documents. They shouldn't - they should store pointers to macros, and have the user load each one explicitly. |
10) We wanted to see if perl was up to the challenge of making a large useful app like this. Spreadsheets like Excel are pretty complicated, and we wanted to see if we could write one in perl! Would the performance be satisfactory? What problems would we run into in programming it? We would learn a lot by 'pushing the edges of the envelope' so to speak.
So there we had it - ten good reasons to see if we could make a spreadsheet, and ten incentives for doing it well. We didn't deceive ourselves; there is no way that we could emulate excel if we even tried. It is true that excel does have a huge development team - and for good reason It has tons and tons of graphical features, and 'feature rich' is a mild adjective when coming to describe it.
But just because we couldn't match up to Excel's features in one way (if we actually wanted to, which is doubtful) didn't mean that we could contribute something value in other ways (as in the ten ways we mentioned above).
Hence, we decided to go for it. The next step, then was to come up with a high level design that addressed some of the concerns that we had, and to lead to a clean implementation.
So how to design this program? GUI programs like spreadsheets are heavily user-centric. In other words, you need to think of a lot of test cases to justify your design.
We already have a bit of a start; If you look above, the 10 'reasons' that we came up with for doing it are also requirements. We required it to be portable, we required it to be safe, and so on.
But these aren't going to be enough to make a full-fledged Spreadsheet. So we come up with other criteria, sample features that we believe a spreadsheet should be able to do:
1) A spreadsheet should support functions and formulas that can manipulate the value of its cells.
2) A spreadsheet should support different formats of data. For example - dates should have different functions for manipulating them than integers. The cell should be able to know which is which, as well.
3) A spreadsheet should be able to keep the value of its cells current. If another cell changes its values - and a whole bunch of other cells are dependent on that cell - the spreadsheet should be able to update the cells dependent on the changed cell.
Now, if you take a look at these three design issues, we have developed two basic objects that we can discuss further; spreadsheets and cells. We say above that a 'spreadsheet' should be able to do this, and a 'cell' should do that - so these seem two natural places start in our quest for a high level design. We therefore start with a very simple diagram, something that looks like Figure 25.1:
251.fig
Figure 25.1
Simple design - two object approach.
In our first iteration, we take the three issues that we have discussed up above about what a 'spreadsheet should be able to do' and see 'how they fit' into our current model.
Cells:
Cells should: support what values they have, and their formatted value. Cells should also change their own values - the spreadsheet should not change it for them. Likewise, if the cell is displaying something in the 'date' format, it had better know what a 'date format' actually is.
Spreadsheet:
Likewise, the Spreadsheet should be in charge of figuring out the relationship between cells. If a cell changes, the spreadsheet should know which values are dependent on that cell - and be able to update them.
Suppose that a cell 'c1' references two other cells, as in:
=a1+b1
Now, suppose a user comes along and changes the value of b1. If we want our spreadsheet to work properly, we have
1) have the spreadsheet remember that c1 is in fact dependent on b1
2) tell the cell c1 that this dependency exists, and to therefore update its value
So far, so good. Our two object design works pretty well at figuring out how to actually change values of cells, and it seems to support the concept of 'different formats of data'. The spreadsheet acts as a railway junction to its cells, telling them what to do - when the cells themselves are responsible for keeping their values up to date.
However, we are omitting one big thing here; the GUI! How does the GUI actually enter into this design of two simple objects? There are two possible ways that we can continue here:
1) the GUI could actually be incorporated into the Spreadsheet object and Cell objects themselves
2) we could make the GUI separate from the Spreadsheet and Cell objects.
Design approach #1 is what we did with our Calendar application in Chapter 21. We did not have a separate object for the 'CalendarGUI' and 'CalendarData.
Instead, we simplified the design by lumping everything that had to do with the 'Calendar' into a Tk::Calendar object. This made things easier, but we lost:
1) the ability to use the calendar in a way separate from the GUI
2) the ability to do effective regression testing.
If we wanted, say, to automatically assign appointments into our calendar from a text file, we would need to change the calendar interface to do so. Everything went through the Calendar GUI, so we would be forced to make a GUI/file interface.
Likewise, if we wanted to do automatic testing on the data portion of the calendar object, we were just as stuck. Since the GUI was an integral part of the object, this prevented automatic regression testing; we would probably have to wait for WinPerl++ to be ported to Unix in order to make an effective Regression Test.
With our the Calendar, I think arguments could be made that such a move to simplify the design was indeed justified; we probably made the code about 300 lines shorter by doing so.
However, in the case of our spreadsheet I think we shall need a more complicated design, and to split GUI from functionality. We have stated in our requirements that 'the spreadsheet object should be able to be used in other applications.
Say we wanted to have an interface which automatically took spreadsheets, formatted them a certain way, and then printed them. Having a GUI integrated into the SpreadSheet object would be a major pain - we could implement methods that made the GUI invisible, but it still would be a performance drain, and would also be some complex coding.
No - our requirements almost dictate that we separate the GUI from the Spreadsheet Engine. Our object design becomes something like Figure 25.2:
252.fig
Figure 25.2
Next Round of Design - Extra GUI Object Tk::SpreadSheetWindow
So we have the Object diagram for our Spreadsheet.
The question is what the Tk::SpreadSheetWindow should like? If we take Excel as an example, we see how every feature under the sun can be put into a spreadsheet. Looking at an Excel spreadsheet is an exercise in humility. It looks like Figure 25.3:
ExcelSnapshot.PNG
Figure 25.3
Sample Excel Spreadsheet
There are probably thousands of man-years put into this interface, if not tens of thousands.
Excel is an exceedingly complicated environment - just move around inside Excel and you will see. You can resize all of the menu bars, you can change sheets, you can make the text in your spreadsheet have different fonts.
If you click on certain parts of the menu, things start to resize. Different drop down listboxes have different values (you can resize your spreadsheet with one), and there are several 'chartwizard' windows which appear to graph your data in certain preset ways.
And so on, and so on, and so on. One can spend a lifetime learning to use Excel. So we classified features that are related to a spreadsheet, and which ones are not:
1) Resizing fonts, document size, and so forth, are not part of a spreadsheet. They are part of a word processor - so they go.
2) being able to resize the sliding bars at the bottom and side of the document, plus all of the graphical wizardry, is pretty inconsequential to actually using the spreadsheet. So we put it on the list of 'nice to haves' and they go.
3) Charting, graphing, and so forth are not parts of a spreadsheet. They are part of a graphics package. So they go.
4) Half of all the small buttons up top (print, file, summation, fourier series, and so forth) are due to features that we have already crossed off our list. The other half pretty much clutter up the GUI design. So they go.
When we actually pared down all of the 'features' that we deemed unnecessary or 'Nice To Have' in Microsoft Excel, we got a spreadsheet design that looked like Figure 25.4:
shot1.tif
Figure 25.4
Design for Perl SpreadSheet.
This seemed doable. We don't have the nice graphics, but we do have a functional interface that can be enhanced at a later date into something more 'snappy'.
We therefore went on to the next level of design: actually taking the picture we had in our head, revising the OO design, and describing the elements that were to be in our spreadsheet.
As you can see from our original screenshot up above, we have definitely cut a lot out of our original inspiration - Excel. However, notice that it was not all subtracting. In particular, where excel had one column to indicate what the value of a given cell was, we now have two: 'utxt' and 'fmt'. (we shall get to their definitions in a moment. )
The other main change we have is in the menus. Figure 25.5 shows a shot of the menus as we designed them:*
You might be wondering how we actually got these screenshots. Tk provides a feature called 'detachable menus' which are really useful. If you click on the dotted line, which we shall see shortly, the menu disassociates itself from the underlying window. This really helps your work flow because selecting a certain menu option becomes a one step process rather than two - you just point and click on the menus. |
menusonly.tif
Figure 25.5
Shot of the menus, by themselves.
These menus, made through the Tk::Menu object are pretty generic, but need to be supported. Table 25.1 shows you what each option does:
Table 25.1 Menu Options
'Open' Opens a file
'New' Makes a new file
'Save' Saves a PSS file to disk under current name.
'Save As' Saves a PSS file to disk under a different name
'Setup' Sets up a file for printout ( currently unimplemented)
'Print' Prints a file out
'Quit' Quits the application
'Insert Column' Adds an extra column to the spreadsheet where the current cell is located
'Delete Column' Deletes the column underneath the current cell
'Insert Row' Inserts a row underneath the current cell
'Delete Row' Deletes the row under the current cell
'Copy' Copies both the user text (utxt) and format (fmt) of the cell into all
other cells below the topmost cell in the range specified by the mouse.
Works only downwards, not across.
'Copy utxt' Copies just the user text of the topmost element into the other cells.
Works the same as 'Copy'.
'Copy format' Copies just the format of the topmost element into other cells. Works
the same as 'Copy'
'Add require' Import a perl module into the spreadsheet so that one can use its
functions inside the spreadsheet itself. The spreadsheet already has
Date::DateCalcLib internally, to do date modifications.
All of this functionality then needs to be filtered into our current design. Where do we put the 'Open' functionality, and the New functionality of the file menu item? And what do we do with the different parts of the GUI?
Well as you can see in Figure 25.6, We have beefed up our object diagram a bit to accommodate all of this information. We still have kept the Tk::SpreadSheetWindow object, but we now know what objects the Tk::SpreadsheetWindow will consist of:
256.fig
Figure 25.6
New, revised Object Diagram
We haven't added any objects to our 'functional' part of the spreadsheet (the PSS object still stores the actual 'numbers and strings' of the spreadsheet. Now it gets the role of saving and loading the PSS file as well), but we have added quite a few objects to our Tk::SpreadSheetWindow. So - going in order of visibility:
1) The Canvas will be the main place where the spreadsheet information is displayed.
2) The two Entry windows up above (labeled utxt and fmt ) show the contents of the cell that so happens to be highlighted inside the canvas.
3) And the menu bar is a separate entity entirely, which is used to access the operating system, print spreadsheets and so forth.
These three objects' places on the spreadsheet are shown in Figure 25.7:
257.fig
Figure 25.7
SpreadSheet Layout: Objects to Figures
The idea of the Tk::SpreadSheetWindow on top is simply another issue of control. Just as we had the PSS object control all the actions of its underlying cells, we have the Tk::SpreadSheetWindow control the actions of all the underlying windows. It acts as a railway junction for messages that come from user events (mouse clicks and so forth) These events are then parceled out by the Tk::SpreadSheetWindow to their correct destination.
Our design is beginning to solidify. When we get to this stage where it looks like we have a skeleton of a program, we can start thinking of the elements that go inside the classes that we have planned. The ones we tentatively come up with are in table 25.2 (in order of their hierarchy level - Tk::SpreadSheetWindow HASA PSS, which HASA Cell):
Table 25.2 Objects and Tentative Data stored inside.
Object Elements
Tk::SpreadSheetWindow PSS, Canvas, Menu, utxt and fmt Buffers, FileSelect
column_widths (pixels)
PSS Cells, Dirty Cell list, Modules to Create Spreadsheet
column_widths (characters)
Cell utxt (user text), fmt (format to use) fmt_val (formatted value)
Here's a small definition for each of these elements we have not seen yet:
utxt and fmt buffers: These two buffers are the 'Entry' windows at the top of the screen. Utxt stands for 'user text', and contains the input that the spreadsheet user gives to the PSS module to be processed into cells. The fmt buffer lets the user decide on how the text is going to be displayed; on the left right or center, as dollars, money or as a date.
FileSelect box: The file select box is an interactive menu which allows us to save and load spreadsheets from disk.
Column_widths (in characters): Our spreadsheet is going to support variable length columns (so we can display large amounts of data) hence we need to store them somewhere. The 'column_widths (in characters)' shows how long each column is (in characters) and is stored in the PSS object itself
column_widths (in pixels): The column_widths are no use to the spreadsheet in the unit 'characters'. They are translated to 'pixels' and saved inside the window itself so we can display our spreadsheet correctly.
Cells: The cells are objects that actually are the 'lowest level' item of our spreadsheet. They contain the methods for giving data to be displayed on the canvas screen.
Dirty Cells: Dirty cells are cells that are 'wrong'; they are dependent on another cell, and that cell has changed value, thus throwing them out of sync. The PSS object keeps a dirty_cell list so that we can put the cells back in sync when they change like this.
Modules to Create Spreadsheet: These are modules that have been loaded into our spreadsheet to provide perl functionality (part of our spec that we listed above - to provide the power of perl in our spreadsheet). Once loaded into memory, one can have as a value any function inside one of these modules which returns a value.
Fmt_val: A scalar which is calculated by taking the format (in fmt ) and applying it against an underlying utxt value.
These are just concepts right now, but you can hopefully see some of the logic of the application forming.
Once we have gotten to this level of detail, this is the time to actually start thinking of scenarios to test our design. We want to go from beginning to end; from the moment the GUI gets an event, through to the effect of that action. Doing this gives our design a good road test before we commit to anything.
So here are three scenarios we imagine, that may come up during a given session with our spreadsheet application:
1) user clicks on pixel 300x450 within the Canvas. When the user clicks on a given pixel with button 1, a callback is made, to figure out which pixel was actually selected.
These row and column values are then passed back to the Tk::SpreadSheetWindow, which then figures out which row, and column the pixel actually is in. This is then passed to the PSS object, which queries the appropriate cell for values.
The Cell, being queried, then sends back its utxt, fmt, and fmt_val values to the PSS object, which then forwards these values back to the Tk::SpreadSheetWindow, which then forwards these values to the two buffers, which then displays them to the screen. The net result is that we get the values in the current window in sync with the values on the buffers. Whew! We have something like what is happening in Figure 25.8:
258.fig
Figure 25.8
Message passing diagram for pixel Click.
As you can see, we always move in an up-down fashion, adhering to the 'parent-child' object relationship that we described in our Object Diagram. We need to. In applications this size, if you have circularities where one function sidesteps the message chain that you set up, your code will choke itself up, since you will form interdependancies that make changing one part of the code break another.
For example - it looks tempting for the utxt and fmt windows to directly alter a cell when it changes the cell's value. It seems like we cut out several steps in the messaging chain that we set up. However, doing this creates a dependency of the Cell on the Edit window - and we form a Object chain like Figure 25.9:
259.fig
Figure 25.9
Loop in the messaging chain.
This is harmful in two ways. First, the program becomes hard to debug. Before we could always be sure that the Cell's requests were changed via the PSS object. Now we can't make that assumption, so we need to track the source of change.
Second, we are stuck making sure that the code for the Cell and the code for the Edit window can pass messages to each other successfully.
Since we already have to make sure that the Cell objects and PSS object can talk to each other, this means that we have doubled our maintenance cost for this code.
2) User edits utxts, and hits return. In this scenario, we go in the opposite direction as we did when we clicked on the pixel 300x450. The user wants to change values inside a certain cell. He/she edits this cell, and presses return. When return is hit, a callback is thrown by the utxt Edit buffer, which tells the Tk::SpreadSheetWindow that utxt has changed in the current Cell.
The Tk::SpreadSheetWindow then tells the PSS object that this has occurred, and which row and column it has occurred in. The PSS object then tells the appropriate cell to update its values - which it then does, passing back its new value.
PSS then looks at how the change of the Cell affected all of the other Cells, and passes back a list of values that are to be redisplayed.
Tk::SpreadSheetWindow then tells the canvas to redisplay these cell's values. This buck-passing can be captured in looking at Figure 25.10:
2510.fig
Figure 25.10
Message passing diagram for spread sheet save.
Again, we go through the messaging chain that we set up, and again we have avoided passing elements 'sideways'.
From these trials, it is beginning to look like we have a plausible design. If we need to add new functions, we will be able to 'pigeon hole' the addition so:
1) the lowest level functionality, the level that affects the cell is added to the Cell class
2) functionality to tie two or more cells together is added to the spreadsheet
3) the functionality for adding to the GUI is added to the Tk::SpreadSheetWindow.
The only thing that really could derail us is performance. However, and this is important, we do the best design first and put performance as a concern on the back burner. Only if performance becomes a bit pressing do we rethink our design; for it will be easier to go from a 'correct' design to a 'performance based' design than it will the other way round.
We test the flexibility of our design one more time just to be stubborn:
3) User hits 'save as' from Menu. This time, when we test, we will come from a totally different angle from what we did before. We will look at the menu, and see how a function from the menu works through our 'chain of command' design we have set up.
First, in this scenario, the user hits 'save as'. This then causes the Tk::SpreadSheetWindow to open a File Select box. The user enters in a filename, which the Tk::SpreadSheetWindow then verifies can be written by our program.
The Tk::SpreadSheetWindow then tells the SpreadSheet to save the file. The PSS object saves:
a) the list of modules that we require to actually run this particular spreadsheet.
b) The column widths of each column the user defined
c) the column/row values, saving utxt and fmt
These three elements - stored inside the SpreadSheet module itself - are enough to recreate a '.pss' file, which is in turn enough to actually drive the spreadsheet program.
Our design seems to be holding firm. The three scenarios that we tried all pass muster with the design that we have developed. Therefore, we take this as a 'first pass' and move to the next stage, where we tackle some of the thorny algorithms that we will encounter in the actual writing of our spreadsheet.
If we can tackle these algorithms, and they seem to hold firm with the design that we have created, we can then move to coding.
The idea of a spreadsheet is to be able to manipulate numbers and/or data easily. Our spreadsheet's effectiveness therefore relies heavily upon the user can react with two main elements of the spreadsheet in a useful manner:
1) the 'utxt' Edit window where we input 'user text'
2) the 'fmt' Edit window.
After all, the utxt forms the basis of what we see on the spreadsheet window, and the fmt forms the basis of how we see it. As we want to make our spreadsheet as versatile as possible, we support four types of operations:
1) basic data. Utxt values that have no '=' in front indicate basic data - strings numbers, or dates. For example, the 'utxt' values of '12331' or 'The Wild One' are both literal values to be printed out on the spreadsheet 'as is'.
Since perl has a 'typeless' format (the scalar), supporting the 'basic data' type in our spreadsheet is trivial. In a strongly typed language like C++, it would be a lot more difficult since we would have to make our own version of the scalar. To do so in a way that is as flexible as perl's built in would be a challenge indeed
2) evaluatable expressions. Utxt values prefixed by one '=' indicate that the utxt is to be interpreted as an expression, and not a literal value.
As such, they are executable by perl, and use perl syntax to be executed. All of the following are legal utxt strings - they are shown with what they get evaluated to:
a) = 10 + 20 => evaluated to 30.
c) = reverse ("fmtyewtk") => evaluated to "ktweytmf"
Example 1 shows what you might expect in an excel like application. However, examples 2 and 3 are perl syntax specific: the first concatenates 'dog' and ' catcher', the second reverses the string (to get 'ktweytmf')
However, if you notice, these examples show just 'strings'. We still need variables in our spreadsheet; we still need the ability to reference the data inside our spreadsheet rows and columns.
To do this, we borrow the idea from excel of cell specifiers. Cell specifiers are strings like 'a1' which tells excel to grab the value inside the spreadsheets column labeled 'a', its row '1'.
Thus, we add the syntax to support two sort of cell specifiers in our spreadsheet. They are:
In addition, we need to be able to use these specifiers inside the perl expressions that we talked about. Hence the following expressions are just as legal inside pss:
which are to be read as 'the value inside the cell a1 squared', 'the length - in characters - of the value inside a1', and 'the string made by joining the values in a2, a3 and a4 together by ,'.
3) built-in spreadsheet functions. We want to give the spreadsheet its own specialized functions. For example, we want to be able to say something like: =date_add(a1, 1, 'month'), to add one month to the date contained in the cell 'a1'.
4) indirect referencing of other cells. When a cell 'indirectly references' another cell, it does not simply grab the value behind that cell and use it as part of its calculations. Instead, it copies the user text of that cell, and modifies it so that the references inside that user text are useful to the cell itself. This concept is a little hard to envision, so we go over it in detail below.
In effect, the 'built-in spreadsheet functions' (point #3) could be combined with the evaluatable expressions point in #2 - since we shall build these 'built in' functions as - what else - perl functions which we then link to the spreadsheet! Therefore, there are only three distinct types of usage for utxt that need describing on how they are going to be implemented.
The first use of the utxt we will discuss is probably the most frequent: where we are setting the cell equal to a number or string, or other piece of data, or a reference to another cell's number or string or other piece of data.
If you see a utxt string that looks something like:
10000
then you know that it is a number, or a value. If you see a string in a utxt window that looks something like:
= a1 + b1
then the cell is referencing the values of a1 and b1, and summing them to form a total. Figure 25.11 shows this type of usage in action:
algorithm_realsimple.png
Figure 25.11
algorithm 'snap' in action
Of course in this example we somehow need to translate the string b1 into the value behind the cell b1. So how should we go about translating this into returning the right value?
Well, first of all, realize that the a utxt that is an expression might itself point to a cell that contains another expression. Hence if a cell a0 has:
= a1 + b1
as text, we may open up b1 and find that it has:
= c1 + d1
inside, which itself could contain '=e1', and so on.
Second of all, realize that if any one of the cells pointed to up above changes, in any way, it forces us to re-evaluate a0. This could happen anywhere along the chain that a0 depends on.
Hence, in order to make this work we are going to need to do two things:
1) come up with an algorithm that is capable of evaluating expressions which may have more than one 'level' of nesting and
2) come up with an algorithm which keeps track of dependencies of the cell on other cells.
Furthermore, whatever algorithm we design is going to have to work given our message passing paradigm. The cell, if it computes its value, is going to have to pass that value back to the PSS, which is then going to have to pass that value back to the Tk::SpreadSheetWindow for processing.
If we developed a large language to do this processing, where we parsed all the syntax that a spreadsheet could have, this could easily take months to get right.
So what do we do? We wimp out. We already have a large language that is suited to do any function that a spreadsheet would ever want to do - and moreover we are programming in it. Instead of trying to parse the expressions above and come up with a language, we simply turn:
=a1 + b1
Into a perl function like:
$pss->value(0, 1) + $pss->value(1,1);
through a regular expression. 'value' is a method for the spreadsheet that gets the final text that is to be displayed on the spreadsheet window, and it works recursively. If the value for '0,1' contained:
= c1 + d1
then 'value' would call itself with the two function calls:
$pss->value(2,1) + $pss->value(3,1);
and it would continue to do so until it hit the numbers, strings, and so forth that 'value' contained.
The nice thing about this approach is that it works so well for multiple values. For example, suppose that you have a tag that looks something like
= sum (a0..a4)
Then this easily translates into a different method, one that looks like:
sum($pss->values_down(0,0,4));
to be read as 'sum the values - going down - from zero to four in column 0 (a)'.
The beautiful thing about the algorithm above is that if we implement it right, we pretty much get - for free - the use of the whole perl language in making our cells. Figure 25.12 shows a simple case of this, where we square the value of a column to get another column:
algorithm_simple.PNG
Figure 25.12
algorithm used in conjunction with perl syntax
Simple, really, but you are really getting a lot of power for free. Since we can use any perl syntax inside our spreadsheet, we can do quite complicated transforms that Excel can't even come close to matching. Figure 25.13 shows one of these transforms, having the last column show, dynamically, all the words in our document including the letter 'e' joined into one long string:
silliness.PNG
Figure 25.13
utxt to show 'e' words joined together by commas
The other two columns show the first word in sorted alphabetical order, and the last word sorted in alphabetical order. If we changed cow to bovine, then the 'cat' word would automatically change as would the string: it would have bovine appended to the front.
Finally we will consider a piece of syntax that isn't in any spreadsheet that we are aware of. As we saw above, an '=' sign in a spreadsheet is pretty much like a reference in perl or a pointer in C - you are saying that the item after the equals sign refers to a previous item.
However, consider the problem of making two columns connected to each other. Say you wanted to make all the numbers in one column the same as all of the numbers in another column minus 10. You could say something like in Table 25.3:
Table 25.3 utxt for making two columns connected
=B1 - 10 14
=B2 - 10 33
=B3 - 10 12
=B4 - 10 15
=B5 - 10 16
Here, we make A1 equal to B1 minus ten, A2 equal to B2 minus ten, and so on down the line. This is the method that Excel uses. You highlight a column of the spreadsheet and Excel links things graphically for you. If you actually then look at the cells of the spreadsheet you will see something like the above.
Microsoft Excel relies on a smart algorithm to determine what you are meaning when you 'Cut' and 'Paste' a reference. When you say something like:
1
=A1+1
and then 'copy' the second format into a buffer, and then 'paste' the format into cells A3 through A10, for example, you get some text values that look like:
1
=A1+1
=A2+1
=A3+1
=A4+1
=A5+1
=A6+1
=A7+1
=A8+1
=A9+1
In other words, you are forming a 'chain of responsibility'. A9 looks at A8 which looks at A7 which looks at A6 and so on and so forth. This 'chain of responsibility' that you form has the effect of making the values above into:
1
2
3
4
5
6
7
8
9
10
which is usually what you want.
The Perl SpreadSheet does the same sort of thing by explicitly stating that a cell is going to have a dependency on another cell - but a dependency with respect to position. Lets look at the same example, in the perl spreadsheet's lingo:
1 1
2 =a1+1
3 ==a2
4 ==a2
5 ==a2
6 ==a2
7 ==a2
8 ==a2
9 ==a2
10 ==a2
What happens here is that the PSS object silently translates the third line '==a2' into '=a2+1', line 4 into '=a3+1', and so on. And again, this translates into the same, numbered list:
1
2
3
4
5
6
7
8
9
10
Say we were looking at cell a9 up above. a9, like every other late cell has '==a2' as a user text element. And since we want a9 to print out that it is on line number 9, we would want this to transform into '=a8+1', which in fact it does. '==a2' transforms into '=a8+1' by the following process
1) A9 recognizes that A2 is seven steps away in the row direction, and 0 steps away in the column direction.
2) It then looks into A2, seeing that it is dependent on the Cells A1.
3) Since A2 is seven steps away, and is dependent on the previous cell A1, A9 assumes that it is dependent on A1 + 7 or cell A8.
4) Hence, the formula for A9 becomes A8 + 1.
If we look at the actual application then, we can see a four step process for making links between columns. Figure 25.14 shows a snapshot of this process:
algorithm_complex.png
Figure 25.14
'==' function in action.
We have four steps here. First we set the utxt of cell a1 to be equal to '=b1 . b1' - we are linking the two rows 'a' and 'b', such that row 'a' is a doubled version of row 'b'.
Then, we go to the next window down, and set the value of cell a2 to be equal to '==a1': this links a2 to a1 such that a2 copies a1's function in the way described above - making a2 related to b2 and not b1.
And finally we copy the utxt and fmt of A2 to the rest of the cells below. The utxts of all of them become '==a1', and the values of column 'a' all become doubles of their cousin in column 'b'. Here's another example:
algorithm_date.png
Figure 25.15
'==' function in action for dates
This shows '==' being used to make a list of dates, each one month apart from each other. The key is the dateadd function in cell a2; but the '==a2' in all the other cells is what makes the addition possible. We could just as easily made them hours, days, or years apart, all with the power of dateadd.
We will consider one more hurdle that we face, before going on to look at the pseudocode for our spreadsheet.
Note that when we link two cells together, we might have the problem of 'infinite dependencies' to deal with. Suppose you define cell A1 to be
=B1
and cell B1 to be
=A1
This program could end up 'chasing its own tail': A1 would evaluate B1 to see its value, B1 would then evaluate A1 to see its value, and so on for all eternity.
Excel, for one, does not deal with infinite dependencies very well. Lets take a look at how infinite dependencies can be a pain. Do the following keystrokes:
enter in =B1 in cell A1, = A1+1 in cell A2.
Copy this format down 5 times, to A6.
Make B6 be dependent on A6 + 1, ie: =A6+1.
Copy the format down to B1.
Excel now goes into conniptions since it cannot find the start of the reference chain:
ExcelConnip.PNG
Figure 25.16
Circular references.
The #REF! Statements indicate that you have hit a point at which excel can no longer iterate. It hits an infinite loop, and 'gives up', thus dumping the text that you see on the screen.
We don't want to have this happen in our perl spreadsheet, so when we actually evaluate our code as in :
=a1 + b1
we silently keep track of the dependencies that we encounter. We also, silently, keep track of the old values that were in the cell before the evaluation occurs.
If we hit a dependency that so happens to cause an infinite loop, we will have marked it as already been hit. If a1 hits c1 from the source b1, a1 better not ever hit c1 from the source b1 again. Note that we are tracking source/destination pairs - it is perfectly legal to hit the same square more than once. If the square that we want to evaluate, d1, contains:
= a1 + c1
and square a1 contains '= b1' and square c1 contains '= b1', we have hit '=b1' twice, and the chain stops, making d1 equal to 2 times b1.
This is just a small sampling of how the algorithms work inside PSS - we haven't even considered such issues as copying between buffers, our notorious require statement which lets us run any module we'd like from our spreadsheet, or how we resize/delete/ and insert columns. Nor have we considered formats - which let us see information in different ways.
In actually implementing the real thing, at least having those issues in the back of your mind is really helpful. I never know, before I start a major project, all the algorithms that I will use, but I have an idea of some.
I hope that the above text helped to give you 'hooks' to be able to dig into the code itself, which is the 'source of record'. For this is where you will learn the most about how the spreadsheet is working.
For example, I must confess there was one simplification above, in the algorithm about keeping the canvas of the spreadsheet up-to-date. This plays itself out beautifully in the code, but we can only describe it briefly here.
The simplification is: in what order do you update your cells? If a1 changes, and a10 is dependent on a9, which is dependent on a8, and so on, down the line to a1 - then in order to update the code correctly, you need to change a2 first then a3 and then a4, and so on. If you do it in any other way, you get wrong results. See the section on 'dirty_cells' in PSS.pm. |
This is the point where we say 'and then a miracle occurred'. Well, not exactly. We took the information above, the design that we built and the algorithms that we guessed, and actually churn out a spreadsheet. As we have discussed over the course of this entire book, this process was a rather vigorous process of:
In particular, the spreadsheet had a lot of levels of iteration in development. As you can gather from above, spreadsheets are rather intricate pieces of software; a lot of different parts have to work together very closely, in order to provide the 'seamless' experience that you get when working with one.
We designed the basic GUI interface first, with the canvas, utxt and fmt buffers, and menu - and played around with it until we were satisfied with how the interface felt.
Then we made the canvas work, so that it showed text values. Since we made the utxt/fmt values go through the Tk::SpreadsheetWindow, and disassociated them from the actual spreadsheet itself, we could test the connection between the two GUI elements and actually 'see' a pseudo spreadsheet on the screen.
Then, we added the basic elements of the PSS - the spreadsheet itself. Evaluated expressions did not exist, and the only thing you could have in your spreadsheet were actual perl strings, but by doing so we slowly integrated the two separate objects (PSS and Tk::SpreadSheetWindow) together.
After that came the more complicated process of the Expressions. At first, '=' was the only thing that existed, so you could show evaluated perl expressions on the screen. Then, we added the references to other cells, with the two types of syntax (a1 standing for cell a1 and a1..a3 standing for range a1 to a3.)
And so on. The last things we added were:
As you can see, implementing pss was quite the iterative process (just as iterative as designing it), and the process has not stopped yet. 'pss' is alpha software. Just play around with it and you will see; there are a number of bugs in pss. We anticipate that finding all the bugs in the UI will take a while, and we are looking for suggestions on improvement.
However, the actual spreadsheet software (implemented in the PSS object) is quite a bit more stable, and this is due to the regression tests that we have developed. We stated that one of the major benefits of actually separating GUI from functionality was for regression testing, and appendix 1 shows why. We have over 400 lines of code in our testing suite, and they go over a lot of the common data requirements for the PSS object itself.
When GuiDo gets ported to the Unix platform, we fully intend on making a regression test suite for the GUI part of the application. This will be essential for having a 'baseline' of what is working in our project, and what is not.
The final product so far is listed in Appendix 1, and we strongly suggest that you either go over it, or look at the code on the CD, for quite a few more details on the implementation process.
The original intention of this chapter was to go over the further design decisions, and so forth in rather excruciating detail - but then we realized that we were just duplicating the documentation that is attached to the code. Above is a quick summary of how it was done. Appendix #1 has a lot more detail (in fact all the detail we have so far via code.)
However, we also had a less practical motive... We wanted to play around with the spreadsheet! When you design and implement a cool application like this, the temptation to see how far we can stress that application is pretty strong, and you spend a great deal of time and energy seeing what the application can do.
Lets just recap - these are the strengths that we are going to exploit when testing the perl spreadsheet:
1) flexibility in entering data into the application
2) flexibility of displaying data in the canvas
3) ability to load perl modules to interface with the spreadsheet itself.
This covers quite a bit of territory, but it will do as a starting point. It lets us actually do something useful, rather than imaginary when we test our application. That 'something useful' is something that people have on their minds quite a bit nowadays. We shall use our spreadsheet to track the Dow Jones industrial average in real-time.
If you have done any investing online, you will know that the internet has become a speculator's dream. There are hundreds, if not thousands of brokers all scrambling to get on the internet now. They are lured by the picture of doubling their client list overnight, and in dread fear that their current client list is going to shrink to nothing - as everybody rushes to do trading online.
All of these brokerage houses have provided websites - and there are even quite a few non-trading houses which have put up information services which let you see the state of the different markets, sometimes up to the minute. Some brokerages have even gotten to the point where you can indeed trade online - sometimes for $10 per transaction (as opposed to $50 per transaction to do it traditionally via a face-to-face broker, and ,well, thousands if you want to get a brokers license and do it yourself.)
So this looks like a promising area to write an application. Making a stock monitor is not good at testing out our Spreadsheet application, but provides information that we would like to know in a format that is useful to us. As you remember from above, we can write modules to interface with our spreadsheet, which means we can use any of the modules that are available on CPAN.
The ones we will use are:
LWP::UserAgent;
HTML::Parser;
Both come with the distribution 'libwww' which is on the CD and available via CPAN. We have at least seen LWP::UserAgent in Chapter 12. LWP::UserAgent will be our method for getting the stock data in the form of an HTML page from one of the brokers mentioned above, and HTML::Parser will be our main window on parsing these online files for information.
Now the only thing we need then to put this whole 'Stock monitor' module together is a webpage which gives HTML quotes of stock prices. There are thousands - so we pick one pretty much at random: www.pcquote.com. We do a little bit of searching, and it provides a page that looks like Figure 25.17 at 'http://www.pcquote.com/cgi-bin/client.exe':
pcquotepage.png
Figure 25.17
PC quote page; provided information.
For the purposes of this application, we don't care about the graphics of course. We are interested in the text it provides. It provides
1) the symbol
2) the net change,
3) the last price of the stock at the time of closing.
In short, it provides the bare bones of information that we need to figure out our portfolio's price. We take a look a little bit more at the service the site provides, we find that it shows information accurate up to 20 minutes.
Perfect! We decide to make a small interface to parse the information out of the website that PCQuote provides. We will then use this module inside our spreadsheet application and see what happens.
Writing an module which automatically interacts with a website is a fairly common task. In fact, we've already done it in this book - in chapter 12, when we built the metacrawler to get web information from several search engines that already exist on the web.
In fact, we wrote a script to automate the process of writing interface files in chapter 12: we called it 'formbuilder.p'. We might as well use it right now, to figure out what to input into client.exe to get the stock information listed above:
C:\> perl formbuilder.p "http://www.pcquote.com/cgi-bin/client.exe"
Web Site: http://www.pcquote.com/cgi-bin/client.exe
----------------
Script Name:
"/cgi-bin/client.exe"
Actions:
Name: "quote1" Values:
Name: "quote2" Values:
Name: "quote3" Values:
Name: "quote4" Values:
Name: "quote5" Values:
Name: Values: "Clear"
So we have the ability to get five quotes at a time. Cool. This will definitely help when we get to actually interfacing our module with the spreadsheet - we won't have to go to the web each time we want to get a new stock. To get five stocks at the same time, we say:
http://www.pcquote.com/cgi- bin/client.exe?quote1=MSW"e2=SUNE"e3=IDE"e4=CSG"e5=WHOK
and the client will return us the stock information we require. All we need to do now is write the module around the query we constructed. This is listed in Listing 25.1:
Listing 25.1: StockQuote
1 package StockQuote;
2
3 use strict;
4 use LWP::UserAgent;
5 use HTML::Parser;
6 @StockQuote::ISA = (qw(HTML::Parser));
7
8 my $static_self = bless {}, 'StockQuote';
9
10 #--------------------
11 # Public
12 #--------------------
13 sub get
14 {
15 my $self = ref($_[0]) ? shift(@_) : $static_self;
16
17 my @symbols = @_;
18
19 $self->get_sym_info(@symbols);
20
21 wantarray ? map($self->{quote}{uc($_)}, @symbols)
22 : $self->{quote}{uc($symbols[0])};
23 }
24
'Get' is the only function that we need to worry about in actually downloading data from www.pcquote.com. In our spreadsheet, we will say:
StockQuote::get('MSFT','SUNW');
to download the stock prices for Microsoft and Sun Microsystems. Get is merely a wrapper around 'get_sym_info'; a private function, which itself is merely a wrapper around get_via_http. See below.
37 #--------------------
38 # Private
39 #--------------------
40
41 sub get_sym_info
42 {
43 my ($self, @symbols) = @_;
44
45 my @unknowns = grep(! defined $self->{quote}{uc($_)}, @symbols);
46
47 if (@unknowns)
48 {
49 $self->get_via_http(@unknowns);
50 }
51 else
52 {
53 print "IN CACHE\n";
54 }
55 }
56
Now, it would be a real drag if each time we wanted to download data about a stock symbol, we had to wait for www.pcquote.com to return the value. Therefore we use the get_sym_info function as a filter. We maintain a cache of quotes inside of $self->{quote} which we then access if we possibly can. Otherwise, we call get_via_http.
57 sub get_via_http
58 {
59 my ($self, @symbols) = @_;
60
61 while (@symbols)
62 {
63 my @get_list;
64 while (@symbols && @get_list < 5)
65 {
66 push(@get_list, shift @symbols);
67 }
68 $self->get_five_via_http(@get_list);
69 }
70 }
71
As we saw up above, the program www.pcquote.com/cgi-bin/client.exe can take up to five stock symbols. Hence, the loop in 65 to 67 pushes stock symbols onto a queue. When we hit five stock symbols, we actually call the http daemon.
72 sub get_five_via_http
73 {
74 my ($self, @symbols) = @_;
75
76 my $http = 'http://www.pcquote.com/cgi-bin/client.exe';
77 my $i = 1;
78 my $parms = join("&", map('quote' . $i++ . "=$_", @symbols));
79 my $request = "$http?$parms";
80 print "REQUEST $request\n";
81
82 my $req = new HTTP::Request("GET" => "$request");
83 $req->header("Accept" => "text/html");
84
85 my $ua = new LWP::UserAgent();
86 $ua->agent("Mozilla/3.0");
87
88 my $res = $ua->request($req);
89
90 if (!$res->is_success)
91 {
92 print "Error: " . $res->code() . " " . $res->message();
93 return '<error>';
94 }
95
96 $self->{re} = join("|", @symbols);
97 print "RE: $self->{re}\n";
98
99 my $doc = $res->content();
100
101 $self->parse($doc);
102 }
Finally, we get to get_five_via_http, which is the program that actually goes out and does the work to get our stock quotes. Lines 76-80 actually create our request. Line 101 uses the fact that we have subclassed off of HTML::Parser to get functionality from that module, parse the document, and return stock prices for our symbols.
To do this magic, we need to override one of the internal functions that HTML::Parser provides - we need to customize the text function, in order to fit it to our problem.
The rest of our module is specific to www.pcquote.com.
104 # override HTML::Parser::text
105 sub text
106 {
107 my ($self, $text) = @_;
108 chomp $text;
109 if ($self->{current_sym} && $text)
110 {
111 if ($self->{current_get} eq 'quote')
112 {
113 $self->set_quote($self->{current_sym}, $text);
114 $self->{current_get} = 'delta';
115 }
116 elsif ($self->{current_get} eq 'delta')
117 {
118 $self->set_delta($self->{current_sym}, $text);
119 #$self->{current_get} = 'delta';
120 $self->{current_sym} = 0;
121 }
122 }
123
124 if ($text =~ /\(($self->{re})\)/i)
125 {
126 $self->{current_sym} = $1;
127 $self->{current_get} = 'quote';
128 }
129 }
130
131 sub set_quote
132 {
133 my ($self, $sym, $quote) = @_;
134 print "$sym -> $quote\n";
135 $self->{quote}{uc($sym)} = to_float($quote);
136 }
137
138 sub set_delta
139 {
140 my ($self, $sym, $delta) = @_;
141 print "$sym -> $delta\n";
142 $self->{'delta'}{uc($sym)} = to_float($delta);
143 }
144
145 sub to_float
146 {
147 my ($num) = @_;
148 my $val;
149 $num =~ s/^\s*([+-])//;
150 my $neg = $1 && $1 eq '-';
151
152 if ($num =~ /(\d+)\s+(\d+)\s*\/\s*(\d+)/)
153 {
154 $val = $1 + $2 / $3;
155 }
156 elsif ($num =~ /([-]?\s*\d+)\s*\/\s*(\d+)/)
157 {
158 $val = $1 / $2;
159 }
160 else
161 {
162 $val = $num + 0;
163 }
164
165 $val = - $val if $neg;
166 $val;
167 }
168 1;
Finally, we standardize our quotes so that they come out to be floats. Stock quotes are based in 1/16 intervals - we want to show them in decimal value for the spreadsheets sake.
Now, we can test this on the command line. We make a simple script:
Listing 25.2 - test.p
use StockQuote;my (@quotes) = StockQuote::get('MSFT', 'SUNW');
print "Your prices are ", "@quotes\n";
This prints out something like:
Your prices are: 45.6875 133.1875
And voila! You have a direct connection to an online stock agency.
Now that we have a reliable way of getting stock prices directly off the internet, we can integrate the module with our spreadsheet. We could probably do other things with this 'stock grabber' as well:
1) make an automatic stock tracker which warns you if your stock goes down past a certain level. This could warn you via email or via cron job.
2) expand it with a 'buy' and 'sell' functions, so we could say 'StockQuote::buy('MSFT',100)'. Perhaps we could even integrate this into our Spreadsheet, to make it a stock trading app!
For now though, we are content to interface it with out SpreadSheet. We open up the Spreadsheet, and the first thing we do is add the stock symbols that we want to trace, plus how many shares of stock we have in each:
SharesNStocks.PNG
Figure 25.18
Stock Symbols inside Our Spreadsheet
So far so good. We have the information that we need to now go off and get the stock prices. Now, we need to actually 'require' the StockQuote.pm module that we wrote above. We click on 'Requires', 'Add Requires', as in figure 25.19:
RequireAdd.PNG
Figure 25.19
Adding a require tag inside our spreadsheet
and then pick 'StockQuotes.pm' from the menu that is listed there. This imports the 'StockQuotes.pm' functionality that we developed before, and puts the function 'StockQuotes::get' into reach so that our spreadsheet can actually use it. Now, a tricky step.
Remember when we made it so that the 'StockQuotes' module can go off and get five stocks simultaneously from www.pcquote.com? Well, if we now make a column which simply goes off and says:
StockQuotes::get(a2)
StockQuotes::get(a3)
StockQuotes::get(a4)
and so on, we will end up going to the www.pcquote.com stock page ten times in a row, and it will take a long time to get our quotes, indeed! So what we do now, is manually go off and get our stock prices 'in bulk'. We simply add an invisible column in a1, something that looks like Figure 25.20:
RequireAdd.PNG
Figure 25.20
Getting the StockQuotes from www.pcquote.com
which actually goes off and gets all the stock prices at once, without 'showing them'. We cache the prices for cells a2 to a10, so that when we start to add the actual column to show the values of our stocks in Figure 25.21:
StockQuoteFill1.PNG
Figure 25.21
Getting the value of our Sun Microsystems stock
This value (column b2 times column a2) is coming from a cache of values which is located internal to the StockQuote module.*
If you look carefully at the shell actually running our perl spreadsheet, you will see the 'IN CACHE' printout letting you know that this is occurring. Right now, unfortunately, there is no way of flushing the cache so you will need to reload the spreadsheet to get new stock price values. |
We now need to mirror this result for Sun Microsystems, and apply it 'downwards'; we need to take the formula to get stock quote prices, and apply it for cells c3 through c10. This is done by first making a 'evaluated cell' (ie: '==c2' in cell C3:
StockQuoteFill2.PNG
Figure 25.22
Mimicking the format in C2, for Microsoft stock
Now, we need to 'copy' this formula downwards, so that it is applied for all of our stock. We do this by highlighting cells c3 through c10, and hitting 'Copy' inside the Edit menu:
StockQuoteFill3.PNG
Figure 25.23
Copying the formula to C4-C10
Now that we have copied our formats, we can see the amount of money we actually have in both our Mutual Funds and stocks. We probably would like to see it as money though, and not out to millionths of a penny. This is the job of the 'format' command, and we can tell our spreadsheet to display our 'wealth' in dollars/cents:
StockQuoteFill4.PNG
Figure 25.24
Making the format 'money'
Of course, this 'money' format needs to be copied down. We do this by looking at the menu, and hitting 'copy format' instead:
StockQuoteFill5a.PNG
Figure 25.25
Copying the money format downwards
Lets then add a total - which we do by using the 'internal' function '=sum(c2..c10):
StockQuoteFill5b.PNG
Figure 25.26
Making a sum for our stocks
Looks pretty good, but we have forgotten one thing.... the price of the stocks themselves! Usually in figures like this, the price of the stocks come between the number of shares we have, and the total wealth. So we need to add a column to handle this. We do so by going under edit, and pressing 'Insert Column':
StockQuoteFill6.PNG
Figure 25.27
Inserting a column for our Stock Prices
Notice that when we inserted the column, even though everything that used to be in column C went to column D, the values stayed exactly the same because we defined everything in terms of column A which did not change. If something had depended on column C, we would have needed to updated the things that had changed.
This may be a feature that we will want to add to the spreadsheet in the future - having the cells auto-update themselves when something that they depend on moves.
Anyway, after inserting, we can now can add a list of stock prices by again using a combination of cell C2 having =StockQuote::get(a2)', 'C3 having '==c2' and a copy of formats. The end result is in Figure 25.28:
StockQuoteFill7.PNG
Figure 25.28
Inserting a column for our Stock Prices
And voila, we have our Stock tracer spreadsheet. When we reload our spreadsheet, the stock prices will then be updated, and we can keep track of how our portfolio is doing.
So, with a little effort, we can make our spreadsheet do a useful thing - track our portfolio for us. With some more effort we could probably make this spreadsheet buy and sell our stocks for us - an 'auto trading board'.
For now, note that our spread sheet is inactive. It doesn't actually update the stock prices for us until we reload the spreadsheet, and it sort of has a clunky interface due to the way that we programmed the StockQuote module.
You may want to experiment around with it to see if you can come up with
We started this chapter by saying that programming a perl spreadsheet was an experiment, and indeed it was. By putting perl syntax through the ringer like this, we could test how scaleable perl was in an place where perl was not well known for: the area of end-user, complicated business applications like the spreadsheet.
Given the limited scope of our test (one developer on a part time basis, limited time frame, well-established nature of the application) we believe that our experiment was an astounding success.
Not only is the Spreadsheet usable in the sense that you can successfully add data to it, and get meaningful results from the data, but it shows some novel aspects that other spreadsheets on the market don't even have. Aspects like:
In addition, we need not think of the perl spreadsheet in competition with any of the spreadsheets on the market. Far from it! With a few changes, we could use the spreadsheet to write out CSV (comma separated values) files, which could directly be plugged into an Excel and hence printed out on any type of printer, or graphed in a multitude of different forms.
Or we could take CSV files, and therefore extend the place where we could view spreadsheets from commercial programs. The high portability of this spreadsheet lets us use a program like Excel by extension. We need no longer worry that Excel isn't ported into a certain place; we could make spreadsheets in Excel, and then view them on other platforms via our 'portable' spreadsheet.
The fact that all of this can be done with so little effort is a very strong testament to the power of perl and the associated tools that come along with perl (like libnet, and perltk). I truly doubt that we could have accomplished as much in so little time with any other language. Play around with it some more, improve it, expand it, and let us know about your improvements. We hereby release it to the public domain and we will continue to improve it ourselves; who knows where this particular road will lead.
![]() ![]() |
![]() ![]() |
![]() ![]() |
COMPUTING MCGRAW-HILL | Beta Books | Contact Us | Order Information | Online Catalog
HTML conversions by Mega Space.
This page updated on October 14, 1997 by Webmaster.
Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.
Copyright ©1997 The McGraw-Hill Companies, Inc. All Rights Reserved.
Any use is subject to the rules stated in the
Terms of Use.