Programmers' Corner: Database Subsets and SSL

To slice and dice your databases and serve them up your way, use the built-in subset function and add a dash of SSL.
By Ed Keefe
How do you speed up the speed locate feature in a large database?

 How can you sort a database on a date or time field and still use the speed locate feature?

 How do you display all the items in a database that belong to two categories but not to a third category?

 How do you display items that have empty date or number fields?

 How do you find the most recently entered items in a database file?

 How do you check for invalid data in a database?

 Got Questions?

 Database developers started asking the above questions almost as soon as they started using the Database, Phonebook, NoteTaker and WorldTime databases in the HP 100/200LX Palmtops. They noticed that as databases grew larger they also grew more lethargic. They realized that the more data they had to key into a database the more likely it was that they'd miss filling in a field or two and the more chances they'd have of entering invalid data.

 Database developers discovered another problem. They could sort their databases by dates or numbers but then found that the speed locate feature failed to work properly.

 We've Got Answers

 These problems begot questions and, over time, the questions begot answers. Most of the answers involved a better understanding of the subset function in the Palmtop's database engine.

 In this article we'll start at the bottom of the "subset learning curve" and describe what subsets are and what they're good for. Along the way we'll point out some subset features that should work but don't and conversely we'll mention some features that should not work but perversely do. Finally we'll look under the hood at the Subset Specification Language (SSL) and explore its powerful documented features and some of its quirky undocumented features.

 Throughout the article, we'll use the GENERAL.GDB file, developed for the "Data-Basics" article as an Exemplary Database.

 By the way, here are the quick and dirty answers to the questions asked above.

 Use subsets and sort first on the leftmost field.

 Sort on a blank field; then sort on a number field.

 Use the 'or', 'not and' SSL operators.

 Use the !*Date or !*Number SSL commands.

 Sort the primary field with None as the 1st sort item.

 Use the SSL range operator "->".

 If you deem these answers to be sufficient, you're way ahead of the rest of us. On the other hand, if the answers mean little or nothing to you, then it's time to go...

 Up the Learning Curve

 To start at the bottom of the subset learning curve, we'll describe what a subset is.

 A subset is just what the name implies: a small part of all the items that make up a database.

 The HP database engine extends the idea of a subset. In the Database, Phonebook, NoteTaker and WorldTime applications subsets may include not only a part of the database but all the items in the database. So the question arises, if you can have subsets that include all the items in a database, why bother? In other words what are subsets good for?

 To get a better appreciation of the value of subsets consider the following.

 Think of a database as if it were a spreadsheet. The data items would be the rows in a spreadsheet and the data fields would be the columns. In a spreadsheet, if you wanted to view the information in a different way you could move the columns around and delete or hide some of them. Similarly you could select a few rows from the spreadsheet and display them all by themselves on the screen.

 However, with a spreadsheet, like Lotus 1-2-3, once you rearrange the data, there's no easy way to get back to the original layout. To do so you'd have to retrieve the original spreadsheet and thereby wipe out all your hard work.

 With the database engine you can rearrange the columns or reduce the number of rows, or both, and save the layout as a "subset". To get back to the original view you simply press the F6 (subset) key, highlight the "All Database Items" subset and press Enter. There's no need to reload the whole database itself.

 Once you've created and saved a subset, you can retrieve it just as quickly.

 The important thing to realize is that subsets do not change the original information in a database. No matter how many columns you apparently delete, no matter how many rows you wind up with, you won't wipe out any data from the database.

 In the same vein, if you press the Enter key on a selected row in a subset, you'll see that the Item View is the same as it is for the All Database Items subset. In other words, subsets do not alter the underlying structure of an item.

 In short, subsets are a non-destructive way to rearrange the information in a database for your viewing pleasure. Since you can save subsets and call them up whenever you want, they can act as "information filters" to let you examine parts of a database that are relevant to your needs. They do this by hiding the columns and/or rows that you don't want to see.

 A Simple Example

 Screen 1 shows the original List View of the GENERAL.GDB file.

 Our goal is to show all the items (rows) in the database but we want to rearrange the columns so the Date Field column is on the left edge of the screen. Here's how to do this.

 Press the F6 key to start the subset function and then press the F2 (Define) key. The Define subset screen will appear.

 Immediately press F10 (OK) and you'll see the "Name" entry box. You could give the subset a unique name but you don't have to. Just press F10 to keep the "Unnamed subset" label and press F10 (OK) to return to the List View.

 To rearrange the columns in the List View press the F8 (Column) key. Move the cursor to the Date column and use the F7 key to move it to the left side of the screen. Press F10 (Done) and you really are "Done". You've got a new way to view the data.

 To sort the items by date press F7 and then press the D' key to set the Date field as the "1st sort field." Press Tab and the Spacebar to check the Ascending box and press F10 to return to the List View.

 You can return to the original view by pressing F6 (subset) and selecting the All Database Items subset. When you want to have the data sorted by date, just press F6 and pick the Unnamed subset from the list and press Enter or F10.

 Suppose you want to review the last ten items you keyed into a database. Here's a subset that will let you do this.

 Press F6 (Subset), press F2 (Define) and press F10 (Done). Name the subset "NaturalOrder" and press F10 twice Now press F7 (Sort) and make sure that all three sort fields have the word None in them. If necessary clear the Ascending check boxes. Press F10 (OK). The last items entered should be at the end of the list.

 An Undocumented Non-Feature

 Suppose you want to speed locate an item dated 08-JAN-99 using the Unnamed subset.

 If you press the 0 key, the cursor will not move. However, when you press 8 the cursor will hop to the last item in the list.

 It may come as a surprise that the speed locate feature does not work with date fields. Apparently it only works with text fields, note fields and categories. It doesn't work with dates, times and numbers.

 This isn't a bug: it's a feature that should work but doesn't and the User's Manual doesn't say anything about it. I suppose we could call this an undocumented non-feature. The sidebar describes a clever way to make the speed locate feature work with date, number and time fields.

 Rearranging columns may seem like a trivial use of the subset function but it turns out to be quite useful, especially if you have a Phonebook file with thousands of names and addresses.

 In a Phonebook file all the fields are text-type fields so speed locate will always work. Simply move a chosen field to the first column in the List View and you're almost ready to use speed locate to find a Name, a Zip code or a City.

 To make sure that speed locate really is speedy, sort the database in ascending order using the first column as the "1st sort field."

 Speed locate will work even if the data is sorted on some other field but you may have to wait for a second or two between keystrokes. On large databases the wait can grow to three or more agonizing seconds between keystrokes.

 More and Different subsets

 The previous example showed how to use a subset to view all the items in a database file. The only thing that changed was the arrangement of the columns in the List View.

 Another use for the subset function is to limit the number of rows in a List View.

 The purpose of this type of subset is to gather together items that have something in common so that you can look at just those items and filter out all the others.

 Since there are fewer rows in this reduced item subset the speed locate and the search operations will work more quickly.

 The simplest way to generate a reduced item type of subset is to use a category field in your database.

 Screen 2 shows a partial list of the subsets for the GENERAL database. Most of these subsets were created using the different category options in the database.

 Building subsets with categories makes sense. After all, when you "categorize" something you're limiting it in some way. So why not show the limited set of items that belong to the category?

 The only thing wrong with this thinking is that, even though you can have up to 32 different category options in a database, you're limited to 16 subsets. The second sidebar for this article describes a way to get around this limitation.

 It's easy to generate subsets based on category options. Here's how:

 In the List View, press F6 (subset) and then press F2 to define a new subset. Use the tab key to move the cursor to the Category field in the Define Subset screen and press the Down Arrow key and select a category option with the spacebar. Press F10 and give the subset the same name as the Category. Press F10 twice to return to the List View and see what appears.

 If you want a subset that combines the items in two or more categories just pick the second category option with the spacebar before you press F10. The database engine will add a comma between the category options. If you want to have a subset whose items belong both to one category and another, just edit the line in the Category field and substitute an ampersand for a comma.

 (The User's Manual says that a "semicolon separates multiple selections" but on the HP 200LX it is a comma rather than a semicolon.)

 As an example of a multiple category subset for the GENERAL database you could define a subset that would look like "Add-on,Subset&Tip" which would generate a list of all items that belong to either the Add-on OR Subset AND Tip categories.

 You can also create subsets using fields other than the Category field. For some examples of this variety of subsets, turn to the User's Manual. The section "Using Check Boxes and Options Buttons in a subset Definition" is quite good.

 To use Text and Note fields to create subsets go to the Define subset screen and fill in a Text or Note field with the word or phrase for which you want a subset. For example, in the GENERAL database, you could type the words "tip,=Trap" in the Note field. Omit the quotes but keep the comma: it signifies that you want items with tip OR Trap. The "=" sign signifies that you want the word "Trap" exactly as shown, i.e., case sensitive. Since "tip" doesn't have an equal sign the database engine will do a case-insensitive search for the word. If you want only those items in which the Note field contains both tip AND Trap use the phrase "tip&=Trap". The ampersand stands for the word AND.

 If you do create a subset by putting "tip,=Trap" in the Note field you can expect to wait a looong time for the subset to appear after you press F10. The database engine has to search through all the Notes looking for "tip" or "Trap".

 You can stop the subset creation process by pressing Ctrl+MENU (Break).

 If you absolutely must have this type of subset here's a way to speed up the subset.

 Suppose, for example, you want a subset that shows only those items in which the Note fields contain "tip" or "Trap".

 Start by selecting the All Database Items subset and, in the List View, use the MENU File Modify command and add a new field to the database. Name the field "tiporTrap" and select "Check box" from the list of Field types.

 Now create a temporary subset by putting "tip,=Trap" in the Note field and pressing F10. When the subset finally appears open each item and check the "tiporTrap" box for that item.

 Finally press F6, highlight the temporary subset and press F4 to edit the definition. Delete the words "tip or Trap" from the Note field and put a check mark in the new tiporTrap box. Press F10 twice and watch how quickly the subset appears. Of course this is cheating because you searched the database beforehand and checked the relevant items, but this corner-cutting method will make your database a lot faster.

 Here's another way to create a subset using a field type other than a category field.

 Suppose, in the GENERAL database, you want a subset of all items whose Number field is greater than or equal to 10000.0.

 If you open the Define subset window and type ">=10000.0" (without quotes) in the Number field and press F10, you'll get an error message telling you that this is an "Invalid subset Definition". Apparently you have to use the statement ">10000,10000" to make this work. (This is another example of something that should work, but doesn't.) A better way to build this kind of subset is to use SSL, as we'll see.

 Validating Data

 Most database programs have a method to keep you from entering invalid data. For example, they can stop you from entering alphabetical characters in a ZIP code field.

 The database engine in the Palmtop has a small amount of data trapping built in. For example, you can't put alphabet characters in a Number field. You can put alphabet characters in a Date field but the field will be cleared and the computer will beep if the entry is not something like 26-Jan-99.

 Something the database won't do is prevent you from putting values in a number field that are "out of bounds" for your particular application. Likewise, you could accidentally put alphabetic characters in a ZIP code field and the Palmtop will let you get away with it.

 Similarly, you can omit entering data in a required field and the database will not stop you. Other database programs would force you to fill in a required field before you could continue.

 Although the database engine doesn't stop you from making mistakes, the subset function can help you find and correct many data entry errors.

 Here are a few examples.

 Out of Range Values

 Suppose, in the GENERAL database, we wanted to limit the values in the Number field to those greater than or equal to 10,000 and less than or equal to 700,000.

 To clean up the data we want to create a list of all items that have a Number field value less than 10000 or greater than 700000. Here's how.

 Press F6 (subset) and F2 to define a new subset. In the Number field of the Define subset screen type

 <10000,>700000

 and press F10 (OK) two or more times and you'll have a subset of items whose Number fields are outside the "valid" range. The values will be either less than 10000 OR greater than 700000.

 Suppose we want to check for date values that are before (less than) 1/6/99 or after (greater than) 1/9/99.

 If you enter <1/6/99,>1/9/99 in the Date field of the Define Subset screen and press F10 two or three times, you may or may not get the results you want. This subset definition will only work if the dates you enter in the Define Subset screen match the date format selected in the Setup program. I'm not ready to call this a bug in the database engine, but it is a surprise.

 The SSL Language

 The Subset Specification Language (SSL) of the HP database engine is what drives the subset function.

 To see what a typical SSL statement looks like we'll use the subset definition in which the Number field has <10000,>700000 in it. If you press F6(subset) and then press F4(Edit) you'll see the "Edit General subset Definition" window. In the SSL Statement field you should see (NumberField<10000 | NumberField >700000) as in Screen 3.



 This is the SSL code that is generated automatically when you enter the expression in the Number field. The "|" (bar or pipe symbol) stands for the OR operator.

 If you're not familiar with the Logical Operators (&, | and !, i.e. AND, OR, and NOT) think of them as being like the arithmetic operators "*", "+" and "-" (multiply, add, and negate).

 (C/C++ programmers take note. Since these are logical operators, you'll probably want to use && and || instead of & and | which are the symbols for the bitwise AND and OR operators. Resist the temptation and go with the flow of SSL.)

 The User's Manual refers to these operators as Boolean Operators since they only work with values of 0 and 1 (True and False). "Boolean" comes from the name of the founder of two-valued arithmetic, George Boole.

 SSL also has a standard set of simple relational operators, namely >, <, = (greater than, less than, and equal to). It also supports the compound relational operators >=, <=, <> (greater than OR equal to, less than OR equal to, not equal to). However, these compound operators are not allowed in the Define subset screen. They only work in SSL statements.

 Both kinds of relational operators compare two values and give the result of the operation as 0 or 1, i.e., false or true.

 To see how relational operators work, take a look at the SSL statement above and imagine that the database engine is examining a Number field that contains 707070.70. The engine evaluates the expression NumberField <10000 as false (0). It evaluates the second expression, NumberField >700000 as true (1). Then the OR operator takes over and evaluates 0 OR 1 = 1, i.e. false OR true is true. Since the result is true, the item does indeed belong to this subset.

 The relational operators in SSL have been extended (C++ folks use the term "overloaded") to work with text, date and time values as well as numbers. You can compare a field to a value or you can compare two fields of the same type to each other. For example, if you're building a database to track projects you could use an SSL statement such as {Begin Date}>{EndDate} to create a subset of all items that may have their date fields in the wrong order.

 To complete the picture, SSL throws in a couple of extra operators: "->", "#", and "*". The first is a range or set operator; the second is a substring operator and the third is called a wildcard operator.

 To see how the range operator works, consider the SSL statement above. It looks for all items whose Number fields are outside the range 10000..700000. We could use the range operator to write an equivalent SSL statement:

 !(NumberField->10000..700000)

 Note the "!" (NOT operator) before the parenthesis. The expression would read "give me all items whose values are in this range: NOT" That is, if the expression inside the parentheses is true, the whole expression will become false. Likewise if the inner expression is false the outer expression will become true.

 I don't recommend using the range operator for creating subsets. It seems to be slower than the equivalent expression using relational and logical operators.

 You could key in SSL statements, but why bother! Use the fill-in-the-blank method. It's faster and there's less chance to making typing mistakes.

 Screen 4 shows a complicated subset definition screen and the equivalent SSL statement. Take your pick. Both types of subset definitions get the job done.

 Here's another example. It assumes that you want to see only those items in which the Text field contains numbers. Screen 5 shows the text field filled with the ten digits separated by commas. It also shows the SSL statement generated by the database engine. The # sign stands for "contains. The code says, in effect "If the Textfield contains a "0" or if the Textfield contains a "1", or if.... the Textfield contains a "9" then the item belongs in the subset.

 If it does nothing else, this example shows the advantage of letting the database engine do the coding. Moreover, the code can be quite useful in trapping a common error.

 Suppose you've entered 500 or more items to your Phonebook and you want to check that the phone numbers and Zip code contain only numbers and no other characters. Here's how to expedite this.

 Highlight the SSL code above; copy it to the clipboard and paste it into Memo. In Memo use the F5 (Replace) key to replace all instances of "Textfield" with "Zip". Put ! (an exclamation mark, i.e. the NOT operator) before the opening parenthesis and add &!(ZIP="") at the end to eliminate any blank ZIP fields from the subset. Copy the new version of the SSL code to the clipboard and it into the General subset Entry box in your Phonebook and press F10 (OK) a couple of times to activate the subset. See Screen 6 for a picture of this code. Hopefully you'll wind up with an empty subset which shows that your Zip codes include only the digits 0 through 9.

 The "*" operator is called a wildcard operator. The User's Manual and the Help screens for the subset function give a couple of examples of how to use it.

 Detecting Empty Fields

 When you're filling in items in a database it's only human to leave some fields empty and promise yourself that you'll come back later and fill them in. However, when the database has grown to hundreds or thousands of items, it's problematic that you'll catch all the records with empty fields.

 The subset function can save a lot of time tracking down those items.

 Here's a recipe for creating a subset of the items that have empty fields in your Phonebook.

 Open your Phonebook and press F6 and F2 to define a new subset. In the Define subset press the Spacebar and the Enter key to put a space in each field. Then press the F2 key to look at the SSL code. It will contain a lot of statements like (Name#"") & (Busi ness#"") etc. Highlight and copy all this code and paste it in Memo. Use the Replace key in Memo to change "&" to "|" and replace "#" with "=". Remove any fields, along with their OR operator, that you don't care about such as (Address2="")|. Copy the modified code and paste it back in the General subset screen. Press F10 (OK) two or more times and you'll see how many items are missing important data.

 Note the difference between the "#" operator and the "=" operator. In this case we wanted to find those fields that were "exactly" blank rather than those that might contain a blank somewhere in the text.

 This recipe will work for the Phonebook since all the fields are text fields and the way to test for an empty field is with the SSL statement Fieldname="".

 However the technique for finding empty number, date, and time fields is not documented in the User's Manual.

 Without further ado, here is the SSL code that gets the job done.

 To find items with empty date fields use the SSL code !*Date Field. For empty number fields use the code !*Number field (i.e., NOT wildcard field name). Note you can also use NumberField=0 but this will give you fields that contain a 0 as well as empty fields. For empty time fields use TimeField=":"

 The use of the wildcard operator with a field name is not supposed to work, but it does, at least for the date and number types of fields. The same trick fails to work for time fields so we resort to looking for time fields that do not contain a colon as in 12:34am. If you're using a time format such as HH,MM,SS you'll have to substitute a comma in place of the colon.

 Speeding Up subsets with a System Macro

 Once you've started using subsets to slice and dice your Database or Phonebook you'll wind up with a number of subsets that you use frequently.

 You may also begin to notice that switching from one subset to another takes a long time. This delay happens the first time you switch subsets. Subsequent switches happen almost immediately.

 The delay is caused by the database rebuilding the subset each time you open your Database or Phonebook.

 To get over the hurdle of waiting for the subset to be rebuilt use a system macro that you run each time you open the Database or Phonebook. The macro would consist of a series of expressions like:

 {F6}{F10}{F6}{Down}{F10}{F6}

 {Down}{F10}{F6}{Down}{F10}

 and so on for as many subsets as there are in the database. Rather than wait around, run the macro and leave your Palmtop alone until the task is done. From then on you should be able to use the subsets quickly.

 Two Final Speed Up Strategies

 There are two "must have" programs that will speed up databases: FASTDB.TSR and DBVIEW.EXE. Both of these programs were mentioned in the Data-Basics article on page 14.

 Fastdb.tsr will make editing your databases much faster. Once you've completed a database, the DBView program will let you view it without having to wait for subsets to be rebuilt each time you want to use the database.

 In this article we've covered a lot of the tips, traps and techniques of using the Palmtop's Subset function and its SSL language. In the process we hope that we've answered most of the questions that you may have as you continue to use the database applications in your Palmtop.
 
 

Overcoming the Subset Limit

Getting Speed Locate to Work with Dates and Numbers

A Database About the HP Palmtop's Database