INDEX
*  General Documentation
B
B.7  Understanding and Coding Index Records
B.7.1  How Indexing Works
B.7.2  Understanding Simple Indexes
B.7.3  Understanding Qualifiers
B.7.4  Understanding Sub-Indexes
B.7.5  Understanding Combined Indexes
B.7.6  The Impact of Global FOR and ALSO on Indexing
B.7.7  Index Definition
B.7.8  Coding Simple Indexes
B.7.9  Coding Simple Indexes with Qualifiers
B.7.10  Coding Combined Indexes
B.7.11  Coding Sub-Indexes
B.7.12  Index Record and Goal Record Elements
B.7.13  Index Records as Goal Records
B.7.14  Index Records for Non-Removed Record Types
B.7.15  Ensuring the Validity of Index Records
B.7.16  Personal Name Algorithm Details
B.8  Understanding and Coding the Linkage Section
B.8.1  Functions of the Linkage Section
B.8.2  The Global Parameters Section
B.8.3  Individual Index Linkages
B.8.4  Simple Indexes
B.8.5  Sub-Indexes
B.8.6  Global Qualifiers
B.8.7  Local Qualifiers
B.8.8  Combined Indexes
B.8.9  Coding SRCPROC Rules
B.8.10  The NOPASS Statement
B.8.11  Coding PASSPROC Rules
B.8.12  Choosing the "Fetcher" PASSPROC
B.8.13  Other Actions in a PASSPROC Rule String
B.8.14  How Passing Works
1  Triples
1.1  Triple Functions
1.1.1  $MAKE
1.1.2  $MADE
1.1.3  $UNMAKE
1.1.4  $UNMAKETRIPLE
1.1.5  $LOOKUP
1.2  Decomposing Triples
1.2.1  $ATTRIBUTE
1.2.2  $OBJECT
1.2.3  $VALUE
1.3  Groups
1.3.1  $GROUP
1.3.2  $GROUPSIZE
1.3.3  $GROUPELEM
1.3.4  $GROUPSORT
1.4  SHOW & CLEAR TRIPLES
2  Add Function Documentation
2.1  SPIRES Functions -- Background
2.2  SPIRES Functions -- Implementation
2.3  SPIRES Functions -- Installation
2.4  SPIRES Functions -- Documentation
2.5  SPIRES Functions -- Distribution
3  Add Variable Documentation
3.1  SPIRES Variables -- Background
3.2  SPIRES Variables -- Implementation
3.3  SPIRES Variables -- Installation
3.4  SPIRES Variables -- Documentation
3.5  SPIRES Variables -- Distribution
8  The Host Language Interface (HLI)
9  UPDCLOSE Processing: INCLOSE
9.1  Intermediate Form
9.2  Structure Processing Dsect
9.2.1  FATL
9.2.2  NXTL
9.2.3  LSTL
9.2.4  NXTH
9.2.5  LSTH
9.2.6  SLOC
12  Partial Record Processing: Partial FOR
12.1  Introduction
12.2  Current Capabilities
12.3  Concept of Partial Processing
12.4  Record Level Commands
12.5  Record Navigation
12.6  Partial Processing Commands
12.7  Partial Processing UPDATE and MERGE Capabilities
12.8  General Information
12.9  The FOR * command
12.10  Partial Processing to the rescue

*  General Documentation

B

B.7  Understanding and Coding Index Records

B.7.1  How Indexing Works

Let's consider what happens when we want to build an index. Suppose we had a subfile called "TABLE OF CONTENTS"; each record in the subfile is a chapter number (the key of the record) and a chapter title. If an appropriate format were written, the table of contents for the first seven chapters of Part B of this manual might look like this:

Chapter   1: Goal Record Concepts and Definition
Chapter   2: Goal Record Keys, Slot and Removed Records
Chapter   3: Structures
Chapter   4: Processing Rules: INPROC, INCLOSE, OUTPROC
Chapter   5: FILEDEF Subfile and SPICOMP Compiler
Chapter   6: File Structure: Tree and Slot, Goal and Index Records
Chapter   7: Understanding and Coding Index Records

An index based on the words appearing in those chapter titles might look like this:

In fact, an index at the end of a book is a good example of the structure of a simple SPIRES index. The record definition for this index would look like this:

The element "TITLE-WORD" contains one of the words in each of the titles. Since it is the key of the record, it can only occur once in each record. So, each word ("AND", "CODING", etc.) in the above index is the key of a separate record in the index record-type. But notice that for each "TITLE-WORD" there may be several occurrences of "GOAL-RECORD-KEY", each occurrence pointing to the goal record in which the title word occurs.

The record in this index record-type for the TITLE-WORD "GOAL" would look something like this:

This is an index record whose key is a word (here, "GOAL") and whose remaining content is a set of pointers to the goal records that contain that word in the title. Every record stored in this index has the same shape: its key is a word from a chapter title, and it holds one or more pointers, each to a goal record whose TITLE element contains that word.
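The structure just described can be sketched outside SPIRES as a mapping from each title word to a list of goal record keys. The Python fragment below is only an illustration. The names `chapters` and `build_title_word_index` are hypothetical, and the word-breaking rule (runs of letters, folded to upper case) is an assumption made for the sketch, not the rule SPIRES itself applies.

```python
import re
from collections import defaultdict

# Chapter number (the goal record key) -> chapter title, from the table above.
chapters = {
    1: "Goal Record Concepts and Definition",
    2: "Goal Record Keys, Slot and Removed Records",
    3: "Structures",
    4: "Processing Rules: INPROC, INCLOSE, OUTPROC",
    5: "FILEDEF Subfile and SPICOMP Compiler",
    6: "File Structure: Tree and Slot, Goal and Index Records",
    7: "Understanding and Coding Index Records",
}

def build_title_word_index(goal_records):
    """Return {TITLE-WORD: [GOAL-RECORD-KEY, ...]}, one entry per distinct word."""
    index = defaultdict(list)
    for key in sorted(goal_records):
        # Assumed word-breaking rule: runs of letters, folded to upper case.
        for word in re.findall(r"[A-Za-z]+", goal_records[key]):
            index[word.upper()].append(key)   # one pointer per occurrence
    return dict(index)

index = build_title_word_index(chapters)
print(index["GOAL"])   # the pointers held by the index record whose key is "GOAL"
```

Each dictionary entry plays the part of one index record: the dictionary key is the TITLE-WORD, and the list holds the occurrences of GOAL-RECORD-KEY.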

A SPIRES search of this index could look like this:

While a DISPLAY <key> command searches the goal record tree for the key named, an index record is searched by the FIND <searchterm> <key> command. For example: SPIRES will locate the record that has the key-value "goal" in the index named in the FIND command, and count up the number of pointers to the goal record data set to indicate how many records are likely to be in the search result. (The number reported may not be entirely accurate, since a single particular record may have several pointers to it in the same index record; if this is the case, SPIRES will report a corrected number in the search result after it has examined the records via the TYPE or OUTPUT commands.)
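The difference between the count reported immediately and the corrected count can be illustrated with a small Python sketch. The index record below is hypothetical: it assumes the word "AND" was passed twice from chapter 6, whose title contains the word twice. The function names are invented for the illustration and are not SPIRES commands.

```python
# Hypothetical index record for the key "AND": goal record 6 is pointed to
# twice because the word occurs twice in that title.
index_record = {"key": "AND", "pointers": [1, 2, 5, 6, 6, 7]}

def reported_count(record):
    """Count available right after the index search: one per pointer."""
    return len(record["pointers"])

def corrected_count(record):
    """Count after the goal records are examined: one per distinct record."""
    return len(set(record["pointers"]))

print(reported_count(index_record), corrected_count(index_record))
```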

An index record thus looks very much like a goal record as far as SPIRES is concerned. It has a key of fixed or varying length, depending upon the nature of the data being passed to the index; it has a multiply occurring element called a pointer (that may be the key of a structure). Since the FIND command attempts to locate records by key, the most efficient structure for storing and locating index records will be a tree structure. Typically, the records in an index are not REMOVED, since they are usually quite small, allowing a large number of them to fit into a single tree block. Index records are thus structured and stored in a manner identical to that for goal records, except that we can take advantage of their small size.

The simple definition for the TITLE-WORD index shown above is probably not what most file definers would specify, especially if the goal record (the chapter titles) were a REMOVED record type. If the goal record is REMOVED, then its indexes do not usually store goal record keys, but addresses of goal records in the residual data set. The index definition would probably look more like this:

Index records exhibit a new type of element, the locator, denoted by "TYPE=LCTR." This element refers to ("locates") a goal record, not by its key, but by its address (location) in the residual data set. To see why this is done, consider the sequence of events for a FIND and TYPE command. SPIRES searches an index and accumulates a list of pointers to the goal records, and reports on the number of pointers found. If each of the pointers were in the form of a goal record key, then the TYPE command would cause SPIRES to read blocks of the goal record tree until it found the location of the referenced record in the residual; then SPIRES would access the residual. In almost all cases, the middle step of searching the goal record tree for the record's location in the residual can be eliminated by storing that location itself as the pointer, rather than the key of the record; this optimization can only be done when the goal records are REMOVED.
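The saving described above can be made concrete with a minimal Python sketch. Everything in it is a stand-in chosen for illustration: the residual data set is modeled as a list indexed by address, the goal record tree as a mapping from key to address, and the "steps" value simply counts the lookups performed. It is not a model of actual SPIRES block access.

```python
# Hypothetical stand-ins: the residual data set as a list (address -> record)
# and the goal record tree as a mapping (goal record key -> residual address).
residual = ["record for chapter 1", "record for chapter 2", "record for chapter 6"]
goal_tree = {1: 0, 2: 1, 6: 2}

def fetch_by_key(key):
    """Pointer holds a goal record key: search the tree, then read the residual."""
    address = goal_tree[key]         # step 1: find the location via the tree
    return residual[address], 2      # step 2: access the residual

def fetch_by_locator(address):
    """Pointer holds a locator: the residual is accessed directly."""
    return residual[address], 1

print(fetch_by_key(6))
print(fetch_by_locator(2))
```

Both calls reach the same record; the locator form simply omits the middle step.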

To be precise, a pointer to a non-REMOVED record type is not declared TYPE=LCTR, since it contains a goal record key rather than a pointer to a location in the residual data set. Only the pointer to a REMOVED record type can be TYPE=LCTR.

Since SPIRES creates and maintains indexes automatically, the file definer must tell SPIRES how and what information is to go from the goal record to a certain index. The file definer specifies how this is done in the "Linkage Section" that follows all of the index record definitions and precedes the "Subfile Section" that specifies subfile name and privileges.

As its name implies, the Linkage Section links the goal and index records; it defines how information is passed from the goal to the index records when file updating is done, and it defines how indexes are to be searched. The details of coding the Linkage Section are covered in the next chapter "Understanding and Coding the Linkage Section." [See B.8.] With this brief look at the structure of a very simple index record, we can now consider the different methods of indexing available to the file definer. Each method has a search and retrieval situation for which it is particularly well suited. For any file that will be searched often, or will contain more than one thousand records, indexing plans should be discussed with the SPIRES consultant. For each indexing strategy described below, guidelines for its use are also presented.

B.7.2  Understanding Simple Indexes

Simple indexes may be defined for files of any size. Their structure and use by the system is "simple" and efficient. Here is a picture of two records in a simple index:

Also, simple indexes are the only type of index for which a thesaurus and/or synonym can be maintained. [See C.2, C.3.]

If an element to be indexed has only a few possible values, it may be best to "search" this element using Global FOR, or perhaps index it as a "qualifier" (see below). Any time a search request would retrieve a large percentage of the records in a file (seventy percent or so), simple indexes may not be the best search mechanism. For example, an index built on the sex (male or female) of people in a personnel file may or may not be necessary, depending upon the search situation. If that index will not be searched frequently, it may be cheaper to search the goal records sequentially (using FOR or ALSO) than to pay the cost of building, updating, and storing the very large index record entries. If a search result will frequently be narrowed by a "sex" criterion, then "sex" might be added to another index or indexes as a qualifier.

B.7.3  Understanding Qualifiers

Qualifiers provide search flexibility for large files, allowing search requests to be narrowed by the specification of criteria that would be inefficient to search and index otherwise (such as the language in which a program is written in the MASTERLIST subfile--only four or so possibilities exist). Qualifiers should be used sparingly: they must be stored redundantly in each index to which they apply, generating high storage costs. Let's look at the structure of a simple index with one qualifier.

If we were to qualify the title-word index shown in the beginning of this chapter with a STATUS element, allowing only the values "Preliminary," "Current," and "Out of Date," the structure of the index and a sample record in it would be something like this:

As you can see, the qualifier is stored with each pointer. Thus, a qualifier takes up quite a bit of space, relative to a simple index on an element with only a few values (such as STATUS). But, the time required to search on the basis of a qualifier is less than that for searching two indexes, especially if one of them has only a few large records (entries) in it. This is because a qualifier search request narrows a search by operating off an existing search result stack; a search involving two indexes requires SPIRES to build two search results, and AND them together.
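The trade-off can be sketched in Python under some loudly stated assumptions: the index contents below are invented, each pointer is modeled as a (pointer, STATUS) pair, and a separate STATUS index is included only for comparison. The function names are hypothetical.

```python
# Hypothetical TITLE-WORD index record with a STATUS qualifier stored on
# every pointer, as described above.
title_index = {
    "GOAL": [(1, "Current"), (2, "Preliminary"), (6, "Current")],
}
# Hypothetical separate simple index on STATUS, for comparison only.
status_index = {"Current": [1, 3, 6, 7], "Preliminary": [2, 4], "Out of Date": [5]}

def find_with_qualifier(word, status):
    """Narrow one search result using the qualifier carried by each pointer."""
    return [ptr for ptr, qual in title_index[word] if qual == status]

def find_with_two_indexes(word, status):
    """Build two search results and AND them together."""
    first = {ptr for ptr, _ in title_index[word]}
    second = set(status_index[status])     # a second, possibly large, result
    return sorted(first & second)

print(find_with_qualifier("GOAL", "Current"))
print(find_with_two_indexes("GOAL", "Current"))
```

Both routes give the same answer; the qualifier route never has to read the large STATUS record, at the price of storing STATUS beside every pointer.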

So, if search time is more important than storage cost, and you will frequently want to qualify a search request by a certain criterion or criteria (there can be more than one qualifier for an index), a qualifier may be appropriate.

Several other facts about qualifiers will influence a decision on their use: 1) they may only be used with the AND and AND NOT logical operators; 2) they allow the full range of relational operators, such as ">" and "<"; 3) they can only be used after a search request involving the index to which they are attached. For example, assume DATE is a qualifier to a TITLE index:

One additional requirement is that the qualifier must occur in any index(es) to which it applies; note that a global qualifier usually is a REQUIRED element in the OPTIONAL pointer structure--if the POINTER occurs, then the qualifier must occur also. For both global and local qualifiers, this means that the element being passed from the goal record as the qualifier may not be optional, or a default value must be supplied by pass processing rules if the element does not occur in the goal record.

Qualifiers may also be "local" or "global." If local, then it may only be used after the index to which it applies has been named in a search request. If global, then it may be used any time after the first FIND command referencing any index. Global qualifiers are stored redundantly on every pointer of every index; they are thus quite expensive from a standpoint of storage costs.

B.7.4  Understanding Sub-Indexes

Sub-indexes are almost exclusively used with personal name indexes. The personal name search processing rule (SRCPROC A38) breaks a search value into two portions: last name, and first names. After searching the index record on the last name, the first names are used to determine which sub-index structures define the pointer groups. If no first names were given in the search request, all pointer groups in the index record are logically OR'd together.
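The two-stage lookup can be pictured with the following Python sketch. It is a simplification under assumptions, not the A38 algorithm: the search value is assumed to arrive as "last, first", first names are matched exactly, and the index contents and the name `find_name` are invented for the illustration.

```python
# Hypothetical personal-name index: last name -> {first names -> pointer group}.
name_index = {
    "SMITH": {"JOHN": [11, 14], "MARY ANN": [12]},
}

def find_name(search_value):
    """Split the value into last and first names, then pick pointer groups."""
    last, _, first = search_value.upper().partition(",")
    sub_indexes = name_index.get(last.strip(), {})
    first = first.strip()
    if not first:
        # No first names given: all pointer groups are OR'd together.
        return sorted({ptr for group in sub_indexes.values() for ptr in group})
    # Assumed rule for the sketch: exact match on the first names.
    return sorted(sub_indexes.get(first, []))

print(find_name("Smith"))
print(find_name("Smith, John"))
```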

Sub-indexes can be used in other ways besides personal names, and are searched by specifying the commercial at ("@") character. For example, we might make CITY a sub-index of STATE and search as follows:

or make SEAT and ROW sub-indexes of SECTION:

Sub-indexes can be useful when things logically fit inside other things, as cities do in states, or seats do in sections. They allow you to choose a subset of the index as a result.

B.7.5  Understanding Combined Indexes

Several elements can be passed to a single combined index (only one combined index may be defined per goal record), requiring somewhat less storage space than several simple indexes. Passing several elements to a single combined index is necessary when the number of indexes defined for a goal record would require that more record-types be defined in a file definition than are allowed (64 is the maximum). Combined index organization is most efficient when the data elements are numerics (often requiring relational operators for effective searching) and short alphanumerics (such as codes).

However, there are significant disadvantages to combined indexing strategies when a file is large (more than eight thousand records) and is searched or updated frequently. As the file gets large, the cost of updating a combined index with many entries gets progressively greater; combined indexes are also somewhat more time consuming and expensive to search than simple indexes, usually requiring several disk accesses to retrieve the large records they contain from the residual data set. Also, many of the elaborate search and pass processing rules (SRCPROC and PASSPROC) available for simple indexes are not available for combined indexes. In addition, the BROWSE command cannot be used to inspect the contents of a combined index.

All of the advantages and disadvantages of combined indexes arise either from the search and update techniques they require, or from their structure, which is similar to indexes with local qualifiers. In a simple index, there is one index record for each unique value passed from the goal records; if several goal records had the same value, then the one index record for that value would have multiple occurrences of pointers to the goal records. For combined indexes, however, there is one index record for each element-mnemonic in the goal record that passes to the index, and every unique value that that mnemonic has forms an occurrence of a pointer structure containing the pointer and the value of the element in the goal record that is being pointed to. For example: if a goal record passes TEMPERATURE, AGE and DATE to a combined index, the goal and index records would look like this:

Note that the records shown in the right column are created and maintained by passing. The key of each record is a combination of the structure and element number of the element that is being passed from the goal record; these keys are computed by SPIRES.

When a search request is made against a combined index,

the single index record containing all of the AGE values in the goal record is read, then all the values (ELEM-VALUE, above) in the index record read in are scanned, and pointer groups not meeting the criteria are weeded out of the search result. Because a single record containing all AGE values exists in a combined index on AGE, a command such as

is possible; the result will be all records in which the AGE element passed a value to the index.
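A minimal Python sketch of this organization follows. It makes several assumptions for the sake of illustration: the goal records are invented, the element name itself stands in for the encoded key that SPIRES computes, and the functions `build_combined_index` and `find` are hypothetical names, not SPIRES commands.

```python
import operator

# Hypothetical goal records passing TEMPERATURE, AGE and DATE.
goal_records = {
    101: {"TEMPERATURE": 98, "AGE": 34, "DATE": 1975},
    102: {"AGE": 61, "DATE": 1976},
    103: {"TEMPERATURE": 101, "AGE": 34},
}

def build_combined_index(records):
    """One index record per element; each holds (ELEM-VALUE, pointer) groups."""
    combined = {}
    for pointer, record in records.items():
        for element, value in record.items():
            # The element name stands in for the key SPIRES would compute.
            combined.setdefault(element, []).append((value, pointer))
    return combined

def find(combined, element, relation=None, value=None):
    """Read the single record for `element`, then scan every value in it."""
    groups = combined.get(element, [])
    if relation is None:
        # No criterion: every record in which the element passed a value.
        return sorted({ptr for _, ptr in groups})
    return sorted({ptr for val, ptr in groups if relation(val, value)})

combined = build_combined_index(goal_records)
print(find(combined, "AGE", operator.gt, 40))   # relational search on AGE
print(find(combined, "TEMPERATURE"))            # all records having the element
```

Note that every search on AGE walks the whole list of AGE values, which is why the size of the record matters so much to search time.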

A combined index record may grow quite large if 1) it contains many values because the number of goal records passing to it is large, or 2) the values passing to it are long, such as lengthy character strings. Since a large record must be read and then scanned, searching a combined index, particularly in medium and large sized files, may give a noticeably slower response than searching a simple index in the same subfile. Also, updating such a large record is more time consuming and thus expensive than updating a simple index; because the large records in a combined index may often overflow the 2048-byte limit for an ORVYL file block, multiple disk accesses may be necessary to search for or update a single record.

If the file is not large, or if the elements being indexed do not occur in a majority of the goal records, or if updating is not done nightly, then combined indexes are quite suitable for numerics and short alphanumerics, such as codes.

B.7.6  The Impact of Global FOR and ALSO on Indexing

The Global FOR and ALSO commands provide substitutes for a combined index in a large file. Of course, these methods involve sequential rather than indexed searching, and will be noticeably slower (more elapsed and CPU time required) than a combined index search unless the existing search result is small.

The ALSO command always examines all the goal records pointed to in an existing search result; this capability is also available using the Global FOR commands. In contrast to the FOR and ALSO sequential search, an index search request preceded by another search request operates as any compound search request: two or more subsets of pointers are built, one for each of the search criteria, then put together into a single search result. For this reason, if a search request requiring relational operators were always preceded by search requests yielding a relatively small search result, such a request might be performed most efficiently using a Global FOR or ALSO command.

Another consideration: if searching is done sequentially by the Global FOR or ALSO commands, then no expenses are incurred for building, updating and storing combined or simple indexes. If search requests against the values in some elements will be quite infrequent, it may be advisable to use sequential search techniques rather than indexed search techniques. Retrieval may be slower, but costly indexes of little use will not be maintained.

There are some cautions to the use of sequential searching techniques, however. Unlike the FIND command, the ALSO command cannot initiate a search; it must always operate on a preceding search result--in this respect it is like a Qualifier. Unlike the ALSO command, the Global FOR commands need not operate via a search result, but can.

The search criteria for Global FOR commands are specified in the WHERE clause. Two additional operators are available in the Global FOR WHERE clause: OCCURS and LENGTH; these are not available to the FIND or ALSO commands. OCCURS allows a user to specify search criteria based on the number of occurrences of an element, and LENGTH allows criteria based on the length of any single occurrence. For example, suppose you wished to print mailing labels from your subfile's records, but first wanted to print all addresses that would not fit on standard labels. This might be done as follows:

This would place in the active file the subset of all goal records that had more than four lines of ADDRESS or had any occurrence of the element ADDRESS that was longer than the width of a label, 35 characters.
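The selection logic of such a WHERE clause can be paraphrased in Python. This is only a sketch of the two tests, not of Global FOR itself: the records are invented, ADDRESS is modeled as a list with one entry per occurrence, and `does_not_fit` is a hypothetical name.

```python
LABEL_LINES = 4    # occurrences of ADDRESS that fit on a standard label
LABEL_WIDTH = 35   # characters that fit across a label

def does_not_fit(record):
    """True if ADDRESS has too many occurrences or any occurrence is too long."""
    address = record.get("ADDRESS", [])
    too_many = len(address) > LABEL_LINES                         # OCCURS-style test
    too_long = any(len(line) > LABEL_WIDTH for line in address)   # LENGTH-style test
    return too_many or too_long

# Hypothetical goal records.
records = [
    {"NAME": "A", "ADDRESS": ["12 Elm St", "Palo Alto, CA"]},
    {"NAME": "B", "ADDRESS": ["Dept. X", "Room 1", "Bldg 2", "Campus", "Stanford, CA"]},
    {"NAME": "C", "ADDRESS": ["X" * 36]},
]

oversize = [rec["NAME"] for rec in records if does_not_fit(rec)]
print(oversize)
```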

Note from the above example that the Global FOR commands do not automatically provide you with a count of the number of records meeting the criteria specified in the (optional) WHERE clause. This is because the FOR command itself does not initiate a search of the file; the file is not searched until another command is issued that specifies what is to be done with the records--remove them, display them, dequeue them, etc. A count can be obtained, however, and a new set of WHERE criteria specified if the number is too small. The following example shows this process, which involves several examinations of the goal records in the search result, and is therefore rather time-consuming and expensive:

The system's response to the SHOW LEVEL command gives two numbers: The second indicates the number of records examined--here it is 22, the same as the number of records in the search result. The first number indicates how many of the records examined met the criteria specified in the WHERE clause--4 for the first WHERE clause and 11 for the second.

The same results are more directly obtained by the use of the ALSO command, which gives an indication of the number of records meeting the criteria immediately, just as a FIND or other index search command does. For example:

If further index searching commands (e.g. AND, OR) are necessary, then the ALSO command must be used, since the "result" of a Global FOR command is not a set of pointers in a search result. The pointers in a search result can be combined logically with the pointers meeting the criteria specified in subsequent search commands. If, however, the records meeting the WHERE criteria are to be displayed at the terminal or placed in the active file, regardless of their number, then Global FOR is a far more efficient way to do this than the ALSO command.

Compare the following two search scenarios:

The second series of search commands is almost twice as efficient as the first. With the ALSO command, the system must read the goal records to examine the EYES element, then read the records meeting the criteria a second time when a TYPE command is given. With the FOR RESULT command, the record is read to examine the EYES element, then, while the record is still in main memory, it is displayed on the terminal. The net effect is that a record is accessed only once when FOR RESULT is used.
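The difference in record accesses can be tallied with a short Python sketch. The search result below is invented, and the two functions are hypothetical models that only count reads in the way the paragraph above describes; they do not reproduce what SPIRES does internally.

```python
def also_then_type(result, test):
    """ALSO reads every record to test it; TYPE then re-reads the ones kept."""
    reads = len(result)                      # first pass: examine each record
    kept = [rec for rec in result if test(rec)]
    reads += len(kept)                       # second pass: display each match
    return kept, reads

def for_result_then_type(result, test):
    """FOR RESULT tests and displays each record during a single read."""
    kept, reads = [], 0
    for rec in result:
        reads += 1                           # the record is read once...
        if test(rec):
            kept.append(rec)                 # ...and shown while still in memory
    return kept, reads

# Hypothetical search result of four goal records.
result = [
    {"KEY": 1, "EYES": "BLUE"},
    {"KEY": 2, "EYES": "BROWN"},
    {"KEY": 3, "EYES": "BLUE"},
    {"KEY": 4, "EYES": "BLUE"},
]
blue = lambda rec: rec["EYES"] == "BLUE"

print(also_then_type(result, blue)[1], for_result_then_type(result, blue)[1])
```

The more records that meet the criteria, the closer the ALSO route comes to reading every record twice.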

In addition, Global FOR commands can be used for many record management functions other than what has been described. Here we have just exhibited its capabilities with respect to those of the ALSO command. The Global FOR commands facilitate a full range of data base and record management functions unavailable otherwise. All file owners and managers should be familiar with the capabilities of Global FOR for sequential subfile search and subsetting. Consult "SPIRES/370 Searching and Updating" for an introduction to Global FOR searching.

B.7.7  Index Definition

Having considered the different indexing options available to the file definer, and having described the functional differences among them, we can now attack the practical problem of coding the record definitions for the different indexes a file will have.

Subfiles may have one, several or no index records defined. There usually is one index record definition for each simple index in a file. One index record definition could be for a combined index in the subfile (remember that only one such index can be defined per subfile). Through a process called "passing", a combined index typically receives values from more than one element in the goal record, while a simple index typically receives values from only one element in the goal record. However, it is entirely possible for a combined index to have its values passed from a single goal record element. And it is also possible for more than one element in the goal record to pass to a simple index; this situation is known as "multiple passers." There may not be more than one combined index per subfile, but there may be more than one combined index defined in a file that has more than one subfile. There may be a large number of simple indexes, provided that the total number of records defined for a file (goal and index records) does not exceed sixty-four.

The different kinds of indexes a subfile has influence the kinds of records defined for all indexes in a subfile. This is due to one of the primary rules of coding index record definitions: all pointer groups to the same goal record must "look alike" in terms of their structure. This means that if there is a combined index or a simple index with a qualifier for a goal record, the pointer groups in all indexes to that goal record must exhibit the structure of a combined index or simple index with a qualifier.

It is fairly easy to reduce the definition of most index records to a "recipe," and "A Guide to Coding Index Record Definitions" [See D.5.] gives recipes for the indexes encountered in most SPIRES file definitions. The following sections describe how to code index record definitions in such a way that you can see the reason for their structure.

B.7.8  Coding Simple Indexes

If all indexes to a single goal record are to be simple indexes, then the structure of each index record might look something like the following:

The only essential difference between the two is that one declares the key to be fixed in length, the other declares it to be varying in length. What is the key? The key of a simple index record is, in almost all cases, the value of an element passed from the goal record. Thus, if you were passing a fixed binary number, such as a price or date, you would want to specify that the key is fixed. The length is the same as the length of the stored value in the goal record.

It is wise to choose names carefully for the elements whose values are shown in lower case. "Record-name" can be up to six characters long; its value is used by SPIRES to sort record definitions for both goal and index records into alphabetical sequence. By tradition, the name REC01 has often been used for the goal record, and REC02, REC03, etc., have been chosen for the index records.

The "element-name" may be anything up to sixteen characters long. For simplicity, it is usually best to give this element the same name as the name of the goal record element that passes its value to this index record.

The "pointer-name" again may be anything, but the name you choose must be coded in the linkage section. One additional requirement falls on the "pointer-name": it must be given the same name in all indexes for a particular goal record. This is the second primary rule for coding indexes. The pointer element is often given the mnemonic name "POINTER." In the example shown above, the pointer element is an optional multiply occurring simple element. If possible, it is best for the pointer element to be fixed length. That's because SPIRES can do logical operations more efficiently when the pointer element is fixed length, and fixed length elements take less overhead in the index records.

Only one other comment need be made about these two index record definitions. The pointer-name is said to be "TYPE=LCTR;" (fixed length of 4 bytes). The pointer-element is the "thing" that SPIRES tallies when it reports the number of records in a search result. It is also the element in which SPIRES stores the reference back to the goal record. If the file definer has coded "REMOVED;" for the goal record definition, then this "reference back" or pointer is usually in the form of the four-byte location (a "locator") of the goal record in the residual dataset. If the goal-record has not been removed, then TYPE=LCTR may not be coded. (However, even if the goal-record has been removed, in certain circumstances you may not want TYPE=LCTR.)

Let's look now at a very simple bibliographic file definition which contains two indexes, one for titles and one for dates. The date element will pass its value to the FIXED key of an index. Note that the following definition is incomplete in that it doesn't say how this "passing" is to occur. This process is defined in the next chapter.

B.7.9  Coding Simple Indexes with Qualifiers

A qualifier adds another level of "depth" to a simple index record definition: it introduces a structure, the same kind of structure that was defined by "TYPE=STR" in the goal record.

The pointer element, which was only a simple data element containing a reference to a goal record, is now a structure. The structure is always a keyed structure, and the key is always the pointer element. The structure itself is optional, but its key is fixed if the key is TYPE=LCTR. This introduces the third rule of index definition: if the pointer element is in a structure, then it must be the key of the structure.

What of the other elements in the structure? Typically there is only one, the qualifier itself; if there is more than one qualifier, then there will be more than one qualifier element in the structure. The qualifier elements should always be defined in the index record definition with their OCC=1. If the goal record element that is passed to the qualifier does not occur in the goal record, then special pass-processing rules (PASSPROC rules, covered in the next chapter) should be coded to provide a default value.

Let's examine the skeleton of a simple index with one qualifier. Next to it is shown a TITLE index in which there is a SUBJECT qualifier.

We can now expand the example file definition which had two simple indexes on DATE and TITLE to include a qualifier on the TITLE index. The new definition will illustrate another primary rule of coding indexes: if a pointer structure is used in one index, it must be coded in all indexes to that goal record, whether it is necessary to the structure of the specific index record in which it occurs or not. To see this, notice how the definition of the DATE index record has changed from its appearance in the previous example. Before, it was only a simple index; now, it looks like a simple index with a qualifier--even though no qualifier is passed to the DATE index.

In general, pointer groups for indexes that apply to the same goal record must have identical structure; this is so SPIRES can AND and OR pointer groups when manipulating search results. If you need to violate this general rule, then the first index record-type defined in the linkage section for the goal-record is taken as the model for the other index record-types. If this record-type has the appropriate structure defined for it (as described above), then more specific rules can be used for other index record-types.

The specific rules for pointer group structures are as follows:

If there is no REQUIRED section, then all pointer groups must be identical through the length of the FIXED section. If one pointer group structure is declared with LEN attribute, then all must be declared that same way, and only FIXED elements are allowed. This is the most efficient form of pointer group structure. If the pointer group structure is not declared with LEN attribute, then any (or all) may have OPTIONAL elements.

If there is a REQUIRED section, then all pointer groups must be identical through the end of that section. If the only REQUIRED element is the KEY of the pointer group, then any (or all) may have OPTIONAL elements. If there are non-key REQUIRED elements, then if one pointer group has OPTIONAL elements declared, all must declare OPTIONAL elements.

Note that if an index record-type is to have multiple qualifiers passed to it, then the following definition is appropriate:

B.7.10  Coding Combined Indexes

A combined index record cosmetically looks very similar to a simple index with one qualifier. Two important differences must be noted. First, the KEY of the combined index record is always fixed with a length of two bytes. This is because the key of such a record is an encoded form of the element name that is being passed to this index. [See B.7.5.] If the elements DATE, AGE, and TEMPERATURE all pass to a combined index, there will be three keys, and hence three records, in the index. (The keys, with which the file definer and searcher need never be concerned, tell SPIRES the structure and element number of the element in the goal record definition.) Second, the element that previously named the qualifier is now given a generic name, usually something mnemonically significant, like "VALUE", since each occurrence of it contains one value passed from the goal record.

Below, a skeleton record definition is presented. Next to it is shown a combined index that contains a DATE occurrence. Notice that the word DATE never appears in the index definition. The linkage section specifies which element(s) will pass to the combined index record.

Notice how these record definitions follow one of the indexing rules: if the pointer (here, TYPE=LCTR) is in a structure, it must be the key of the structure.

As the following example shows, all of the other rules are followed: 1) if one index has a pointer structure (because it is a simple index with a qualifier or because it is a combined index) then all indexes must have pointer structures; 2) all pointer elements must have the same name in each index.

The following example is similar to the previous two in some ways. The goal record contains TITLE, SUBJECT and DATE elements; COST will now be introduced and placed in the combined index. Note how the definition is quite different from the first example, which showed only simple indexes; but its only difference from the previous example, which showed simple index qualifiers, is the addition of a combined index.

Note that the occurrence of the VALUE element is always 1. The structure containing this element occurs once for each value of an element passed from the goal record. A record passing two COST values would cause two POINTER-STRs to occur, each with a single POINTER and VALUE. When SPIRES retrieves such records, it reports a result, which is the number of POINTER-STRs that met the criteria specified; this count may be high, since a single record could be represented in the POINTER-STR list more than once. SPIRES will correct any erroneous result count after it has been asked to TYPE the records in the search result.
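
The overcount described above can be illustrated with a short Python sketch. The data layout (a list of pointer/value pairs) and the function names are assumptions made for illustration only; they do not describe how SPIRES stores or counts pointer groups internally.

```python
# Hypothetical model of one combined index record: one (pointer, value)
# pair per POINTER-STR occurrence. The layout is illustrative only.
pointer_strs = [
    (1001, 12.50),  # goal record 1001 passed two COST values,
    (1001, 15.00),  # so it is represented by two POINTER-STRs
    (1002, 14.00),
]

def reported_count(pointer_strs, lo, hi):
    """Count the POINTER-STRs whose VALUE meets the criterion (may overcount)."""
    return sum(1 for _, value in pointer_strs if lo <= value <= hi)

def distinct_records(pointer_strs, lo, hi):
    """Count the distinct goal records among the matching POINTER-STRs."""
    return len({ptr for ptr, value in pointer_strs if lo <= value <= hi})
```

In this model a search for COST between 10 and 20 matches three POINTER-STRs but only two goal records, which is the kind of discrepancy that is corrected once the records are typed.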

Notice that the POINTER-STR in REC02 (TITLE) has the same "form" as the POINTER-STR in REC03 (DATE) and REC04. SUBJECT, DUMMY, and VALUE all occupy the same position.

B.7.11  Coding Sub-Indexes

Sub-indexes, usually used only for personal name indexing, provide a variation on the theme of simple indexes and simple indexes with qualifiers. Sub-indexes cannot be defined as part of a combined index, but may be defined for simple indexes in subfiles that have combined indexes.

Sub-indexes provide a way of searching data that has a hierarchical organization. Two simple hierarchies might have the following structures:

It would not be useful to find all people with a first name of "John" in a subfile unless you had first established that you were interested only in people whose last name was "Smith." It would also not be helpful for an airline reservation system to be able to find all seats with the number 13 unless a particular flight had been established to restrict the domain of the search.

Two types of sub-indexes can be defined, one for subfiles that contain only simple indexes and no qualifiers (no pointer structures would be involved in this case) and one for subfiles that contain either qualifiers or combined indexes. The following example shows the two types of record definitions; each is defined for a personal name index. The key of such an index is the person's last name, and the key of the sub-index (which is a structure) is the rest of the person's name (first, middle, etc.). Note that the first name structure is not a pointer structure: the pointer or pointer structure is an optional element in the first name structure.

RECORD-NAME = REC02;                   RECORD-NAME = INDEX5;
  REQUIRED;                              REQUIRED;
    KEY = LAST-NAME;                       KEY = LAST-NAME;
  OPTIONAL;                              OPTIONAL;
    ELEM = FIRSTNAME-STRUCT;               ELEM = FIRSTNAME-STR;
      TYPE = STR;                            TYPE = STR;
  STRUCTURE = FIRSTNAME-STRUCT;        STRUCTURE = FIRSTNAME-STR;
    REQUIRED;                            REQUIRED;
      KEY = FIRST-NAME;                    KEY = FIRST-NAME;
      ELEM = POINTER;                    OPTIONAL;
        TYPE = LCTR;                       ELEM = POINTER-STR;
                                             TYPE = STR;
                                       STRUCTURE = POINTER-STR;
                                         FIXED;
                                           KEY = POINTER;
                                             TYPE = LCTR;
                                         OPTIONAL;
                                           ELEM = VALUE;
                                             OCC = 1;

B.7.12  Index Record and Goal Record Elements

Up to this point, the definition of an index record looks very similar to the definition of a goal record: there are keys, elements of fixed or varying length and optional elements, and there are structures. Several file definition elements have not appeared: SLOT, SLOTCHECK, REMOVED, INPROC, OUTPROC, and ALIASES.

SLOT, SLOTCHECK and REMOVED are rarely coded for index records. However, one element is often coded for index records that is not usually coded for the first record-type (usually the goal record) in the file definition; this is COMBINE. As explained in "Tree and Slot, Goal and Index Records," [See B.6.5.] COMBINE specifies that the data sets created by the compiler for each record definition specifying COMBINE are to be merged into a single data set or file. Any tree structured data set can be combined with any other tree structured data set; slot record-types cannot be combined with each other, or with any other record-type. Except in the largest files (over 100,000 records) with several subfiles, or files in which there are large table-lookup files, COMBINE should be used whenever possible. When there are table-lookup record-types, it is often a good idea to COMBINE them with each other, and to COMBINE the goal and index record-types together. This allows flexibility in erasing and recreating the table files with the ZAP DATA SET command. [See B.10.17.]

The COMBINE element is coded just after the RECORD-NAME element as follows:

Note that COMBINE is not coded for the goal record, REC01, in the above example, since it is the record with which other record-types are combined. The record-type named in the COMBINE statement must have been defined earlier in the file definition; it may not be defined further down. All of the file definitions in "Annotated File Definition Examples" [See D.7.] use the COMBINE feature wherever possible.

B.7.13  Index Records as Goal Records

What about coding INPROC, OUTPROC and ALIASES for index record definitions? These file definition elements may be coded for index records, but often are not. Since SPIBILD maintains the indexes, there is no need for ALIASES, and any INPROC, INCLOSE or OUTPROC rules that are coded are ignored when SPIBILD is updating the indexes as part of its processing. If the SPIBILD process will create new records in a record-type (as is usually the case with index records), it is important that no FIXED or REQUIRED elements be defined that will not be created by SPIBILD. If a required element is not present when SPIBILD attempts to create a new record in a record-type, a PASS ERROR with a code of S419 will occur. This is a serious error.

Generally file owners are encouraged to include INPROCs and OUTPROCs for the key of each index record, because these affect the results displayed by the BROWSE command. When index values are displayed with the BROWSE command, the values are processed through the OUTPROCs for the key. Also, if a value is given in the BROWSE command (such as BROWSE DATE-ADDED 7/1/80), that value is processed through the key's INPROCs as well. Without such INPROCs and OUTPROCs on the key of the index record, browsing the index can be a pointless exercise for the subfile user. (Actions A32 and A65 will not be executed during "BROWSE processing", however.) If only an INPROC is defined for an index record key and the INPROC sets the type for the key, e.g., an A31 identifies the stored key as a hexadecimal one, then SPIRES will convert the values displayed to string values when the BROWSE command is issued. This is not usually as valuable, or as straightforward, as putting the appropriate INPROCs and OUTPROCs in the index record definition.

It is also important to code INPROCs and OUTPROCs (and perhaps even ALIASES) if the index is to be used as a goal record that can be selected (using the SELECT command) or attached (using the ATTACH command).

In the following example, suppose the first index record, the SUBJECT index, can be selected as a goal record. An element called CROSS-REFERENCE has been added, so that the file owner can add cross-reference records to the SUBJECT index. Also, an OUTPROC action 32 has been added on the pointer, and it will convert the pointer on output by referring to the goal record it locates and looking up the TITLE element. The details of coding action 32 are covered in "Indirect Record-Access: Action 32 and SUBGOAL Processing." [See C.5.]

B.7.14  Index Records for Non-Removed Record Types

If the goal record does not have REMOVED specified, the file definer may not use the TYPE=LCTR specification in defining index records. Whenever the pointer to the goal record is the goal record key, rather than a location of the goal record in the residual data set, then TYPE=LCTR may not be specified.

This is always the situation when the goal record is not REMOVED. It may also occur for REMOVED goal records if the file definer has chosen to pass the goal record's key rather than the goal record's residual location.

All of the sample index definitions shown so far have assumed that a location in the residual data set is being stored rather than a key. However, it is fairly simple to see the implications of storing a key for index record definition:

 - "TYPE=LCTR;" cannot be used.
 - If the goal record key is of fixed length, then code "LEN=n;" in place of
 "TYPE=LCTR;" where "n" is the fixed length of the goal record key.
 - If the goal record key is varying in length, then the pointer element cannot be in the
 FIXED section of the index record or structure in which it is defined.

Usually you will pass the locator rather than the goal record key as the pointer. However, it can be very useful to pass the key at times: if the key itself is stored in the index record, the key can be examined directly when the index is used as a goal record. Also, if the techniques of "goal-to-goal passing" or "self-indexing goal records" are being used, the key usually must be passed. [See C.12.] Passing the goal record key to indexes may be a good idea if any of the following is true:

 - the key is fixed-length and short (fewer than five characters)
 - the goal record is SLOT
 - the indexes will be used to produce lists of goal record keys.

B.7.15  Ensuring the Validity of Index Records

Index records are normally maintained entirely by SPIBILD, in accordance with the rules the file definer specifies in the linkage section of the file definition. [See B.8.] The file definer does not need to take any explicit action to ensure that information in the indexes is valid, but must ensure that a null value isn't passed as the key of an index record.

When index records can be transferred and updated as goal records, [See B.7.13.] the file owner must ensure that a user cannot incorrectly alter the linkages between goal and index records built by SPIBILD.

The most important ingredient or rule of this linkage is that all keys along the structural path from the index record's key to the pointer group (or pointer element) are in descending sort order. This is the way they are automatically created by SPIBILD. The best way to ensure this is to make these elements non-updateable. This is usually done with a PRIV-TAG specification. [See B.9.4.]

All structures along the pointer group path up to and including the pointer element itself must be in descending sort order. This can be ensured by coding an A138:0 as the INPROC for all structures along this path.

For example, in a personal name index [See B.7.11.] the sub-index structure must be sorted in descending order by its key (a person's first name), and:

 - if the pointer-group is a structure, it must be sorted in descending order by its key,
 which is the pointer
 - if the pointer-group is just a simple element, then it must be sorted in descending
 order.
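
The ordering requirement for a personal name index can be pictured with the following Python sketch. It is only a model: the list-of-tuples layout and the function names are hypothetical, pointers are shown as plain integers, and Python's own string ordering stands in for whatever collating sequence SPIRES actually applies.

```python
def is_descending(keys):
    """Return True if no key is smaller than the key that follows it."""
    return all(a >= b for a, b in zip(keys, keys[1:]))

def valid_name_record(sub_index):
    """Check one index record modelled as [(first_name_key, [pointers]), ...].

    Both the sub-index keys and the pointers within each sub-index
    occurrence must appear in descending order.
    """
    first_names = [name for name, _ in sub_index]
    if not is_descending(first_names):
        return False
    return all(is_descending(pointers) for _, pointers in sub_index)
```

A record modelled as [("MICHAEL", [30, 20]), ("JOHN", [25])] passes this check, while one whose first names or pointers appear in ascending order does not.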

B.7.16  Personal Name Algorithm Details

The following describes in detail the way the "personal name algorithm" matches (or does not match) values in search commands against values in index records.

Assume that a NAME index has been built using the personal name algorithm and that it is being searched by a corresponding search term. The algorithm first places the search value into the standard form: surname, non-surnames. Thus:

becomes: NAME Vincent, John Michael

Note that the non-surname portion of the search value may be null as in: NAME Smith

The algorithm then establishes a null "master" pointer group set into which matching pointer groups will be OR'd (if found).

The algorithm then proceeds to retrieve records based on the surname portion of the search value (including truncated search retrieval). Each record retrieved contains sub-index portions keyed on non-surnames. For each record retrieved, these sub-index portions are processed as follows:

The first non-surname in the search value is used to scan the names in a sub-index key. The scan matches through the length of the shorter of any two names. Thus, "David" in the sub-index key and "D" in the search value match (D=D). Likewise, "ANN" in the sub-index key and "ANNE" in the search value match (ANN=ANN). The scan of the sub-index key names continues until a match is made. Then the next non-surname in the search value is used and the scan continues in a similar manner.

If all non-surnames in the search value succeed in matching, then the associated pointer groups in this sub-index portion are OR'd into the "master" set. Note that for a search value with a null non-surname, all sub-index keys are assumed to match.

If all the non-surname fields in the sub-index key are scanned before all the fields in the search value have been used, then this sub-index portion is ignored (does not match).

For example:

If the index contains "Smith, Jane Anne Marie" then:

 - if the search is for "Smith, J M", it succeeds.
 - if the search is for "Smith, Ann", it succeeds.
 - if the search is for "Smith, Ma", it succeeds.
 - if the search is for "Smith, Jane M Ann", it fails.
 - if the search is for "Smith, Anabel Marie", it fails.
 - if the search is for "Smith, Mary", it fails.

If the index had contained "Smith, J A M" then the last two searches above would have succeeded.
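
The matching rules above can be expressed as a short Python sketch. This is an illustrative model rather than the SPIRES implementation: the function names are hypothetical, surnames are compared for simple equality (truncated surname search is not modelled), and the scan is assumed to resume with the name after the one just matched.

```python
def names_match(a, b):
    """Compare two names through the length of the shorter one."""
    n = min(len(a), len(b))
    return n > 0 and a[:n].upper() == b[:n].upper()

def non_surnames_match(index_names, search_names):
    """Scan the sub-index key names in order for each search non-surname."""
    pos = 0
    for wanted in search_names:
        # Skip sub-index names until one matches the current search name.
        while pos < len(index_names) and not names_match(index_names[pos], wanted):
            pos += 1
        if pos == len(index_names):
            return False  # key names exhausted before the search names
        pos += 1  # assumption: resume after the matched name
    return True  # a null non-surname list matches every sub-index key

def split_name(value):
    """Split 'surname, non-surnames' into (surname, [non-surnames])."""
    surname, _, rest = value.partition(",")
    return surname.strip(), rest.split()

def name_matches(index_value, search_value):
    index_surname, index_rest = split_name(index_value)
    search_surname, search_rest = split_name(search_value)
    return (index_surname.upper() == search_surname.upper()
            and non_surnames_match(index_rest, search_rest))
```

Under these assumptions, name_matches("Smith, Jane Anne Marie", "Smith, J M") is true and name_matches("Smith, Jane Anne Marie", "Smith, Mary") is false, in agreement with the list above.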

There is an option on the personal name algorithm used in building indexes that allows multiple index entries to be built from a single name. This is particularly useful with married women's names. For example:

would be indexed as both of the following:

This allows retrieval by maiden name to still succeed.

Finally, it should be noted that the personal name algorithm can be used in searching "funny names". All that is required is that there be a sub-index keyed by terms that look like non-surnames. This can be accomplished by using A38 in the Goal record as INPROC/OUTPROC rules, and then passing that element to the sub-index (without forcing to upper case). For example, assume a structure contains the following items in the Goal records:

FAMILY is just a surname; PARENT and CHILD are non-surnames. Thus,

Upon passing, FAMILY is passed to the key of an index record just like a standard index (A169). The PARENT and CHILD elements are then passed to the sub-index key (A167:5,PARENT,CHILD). The SRCPROC for FAMILY does something like: SRCPROC = A38/A14,#; specifying both "personal name algorithm" and truncated search. A search request could then be formed like: FIND FAMILY JONES, JOHN

The index is still a "personal name" index, but it is built from separate elements within the Goal records. This same kind of process could be applied to any single-word element and multi-word "qualifiers" as long as these multi-word elements can be upper case in the goal record AND can be searched by non-surname matching rules.

B.8  Understanding and Coding the Linkage Section

B.8.1  Functions of the Linkage Section

The previous chapters of this manual have covered the definition of the record-types that will make up a SPIRES file. Two kinds of record-types have been examined in detail: goal records and index records.

The linkage section, as its name implies, links the goal and index records for two purposes: searching and passing. The linkage section controls the search process by specifying in the "SEARCHTERMS" statement the names of the components of an index to be searched. The linkage section also specifies, in the "SRCPROC" statement, the processing rules to be applied to values in a search request. Passing, which is the process of using information in a goal record to build an index record, is controlled by specifying the goal record information to be passed. The source of this information is specified by the "GOALREC-ELEM" statement or by processing rules coded in the "PASSPROC" statement.

Thus we can see at least two different parts of a file definition. The first part, defining the goal and index records, is a description of data structures. The second part, defining the linkage between goal and index records, is devoted to procedural rather than descriptive statements. These procedural statements provide for passing and searching. A third part of the file definition, defining the privileges of any user or group of users with respect to a subfile, is described in the next chapter "Defining Subfile Privileges." [See B.9.]

The linkage section itself can be subdivided into small sections: 1) a single group of statements defining certain global relationships between a goal record and all its index record(s), and 2) groups of statements describing the specific processing of the linkage between the goal record and a single index record. (1) is discussed in "The Global Parameters Section" [See B.8.2.] and (2) is described in "Individual Index Linkages" [See B.8.3.] There usually is one individual index linkage for each index record you have defined. The structure of these parts is fairly simple; the definition of the linkage is in terms of SEARCHTERMS, SRCPROC, GOALREC-ELEM and PASSPROC statements. If a combined index, qualifiers, or sub-indexes are used, then one or two additional elements must be specified in the linkage definition for the index record in which they occur. The only difficulty usually encountered in defining linkage sections is in coding the various PASSPROC rules, and occasionally in coding the SRCPROC rules; we will not consider the definition of these processing rule strings in detail until the end of this chapter.

B.8.2  The Global Parameters Section

The linkage section for any particular goal record begins with some "global" information that is common to all indexes belonging to that goal record. This information always includes the name of the goal record to which the entire linkage section applies, the name given to a search result for the goal record, and the name of the pointer element in all of the indexes. Any global qualifiers (qualifiers that are passed to all indexes) are specified here also.

Linkage sections are coded following the record definitions of the goal and index records. The linkage section begins with the global parameters portion:

Because of the rarity of global qualifiers, the additions they require to the global section will not be covered until later in this chapter. [See B.8.6.] Let's begin coding the linkage section by defining the global parameters section for a very simple bibliographic file. In this file, we have one goal record, BOOK, and two index records, REC02 and REC03. Let's say that we want the search result to be called "CITATION".

The file definition, up through the global portion of the linkage section, looks like this:

The GOALREC-NAME statement names the goal record by specifying its RECORD-NAME. The EXTERNAL-NAME statement declares what a search result will be called when SPIRES reports the result count after a search command such as FIND. The PTR-ELEM statement names the element in each index record that is to receive the pointer back to the goal record. You may choose any element name you wish, but it must be the same in all index records. In our example, it happens to be POINT-BACK. The PASSPROC specifies A170 because the pointer element is TYPE=LCTR and the goal records are REMOVED. A170 specifies that the information passed to the pointer element in each index will be the address of the goal record in the residual data set. If the pointer element is not TYPE=LCTR, then the information passed to the pointer element in each index should be the key of the goal record, which is usually specified by a GOALREC-KEY statement. If the goal records are not REMOVED, then the pointer element cannot be TYPE=LCTR and A170 cannot be used.

B.8.3  Individual Index Linkages

After any global parameters section, a linkage between the goal record and each index record must be defined. The definition of these linkages is, in structure, fairly straightforward, and looks like the following:

The structure of this skeleton can be slightly complicated by the inclusion of linkage information for a sub-index, local qualifiers, or a combined index. These cases will be covered later in this chapter. [See B.8.5, B.8.7, B.8.8.]

A "recipe" for coding the global and individual parameters of the linkage section is given in "A Guide to Coding the Linkage Section Definition." This guide covers all types of linkage definition. [See D.6.] The different kinds of index records coded in the preceding chapter will serve as examples of simple, personal name, qualified and combined index linkage sections. We will take each possibility in turn, leaving the detailed consideration of PASSPROC rule strings to the end of this chapter.

The PTR-GROUP statement, for any particular index, names a multiply occurring element in the index that is either a STRUCTURE element whose KEY is the PTR-ELEM, or a simple ELEM which is the PTR-ELEM. If the subfile needs only simple pointers back to goal records (no combined index or qualifiers), then PTR-ELEM and PTR-GROUP refer to the same simple pointer ELEM in all indexes. But if there is a need for combined index or qualifier terms, then PTR-GROUP for each index refers to a multiply occurring STRUCTURE which will contain those terms. The KEY of each STRUCTURE is the PTR-ELEM (pointer element).

B.8.4  Simple Indexes

Here are the two individual linkages to the index records REC02 and REC03. In the complete file definition we are building towards, these would be added right after the global parameters section with which the previous example ended.

The INDEX-NAME statements name the particular index records that will be linked to the goal record; here they are REC02 and REC03. The SEARCHTERMS statement is similar to the ALIASES statement in the goal record definition. Here SEARCHTERMS specifies the name or names that can be used to access an index in a search command such as FIND. [See B.9.4 to see how PRIV-TAG can restrict the use of SEARCHTERMS.]

The SRCPROC statement specifies processing that is to be performed on search values given in search commands. This processing is usually equivalent to a combination of both INPROC and PASSPROC rules used to determine the form in which goal record values are to be placed in the index record. That is, SRCPROC rules are usually coded to "translate" incoming search values into values that might be found in the index records. The SRCPROC for REC02:

breaks a search value up into individual words ("A45,", which breaks on blanks), then excludes any words of two or fewer characters (A47,2), and allows special truncated search if a word of more than three characters contains a "#" (A11:3,#).

Notice that the PASSPROC for this index contains similar rules:

A166 specifies that the goal record element value (or values) named in GOALREC-ELEM is to be fetched, and is later to be processed by A45 or A38 (both are actions that "split" a value into parts). The rules "A45,/ A47,2" make sure that only individual words are passed to the key of the index records, and that no words of two or fewer characters are passed. This part of the rule string is identical to a portion of the SRCPROC rule string.
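
The word-breaking and length-exclusion steps shared by the SRCPROC and the PASSPROC can be pictured with a small Python sketch. It models only the behavior described in the text (break on blanks, then drop words of two or fewer characters); the function name and its parameter are hypothetical, and truncated search and uppercase conversion are not modelled.

```python
def split_and_filter(value, max_excluded_len=2):
    """Break a value into words on blanks, then drop the short words.

    Words of max_excluded_len or fewer characters are excluded,
    following the description of the exclusion rule in the text.
    """
    return [word for word in value.split() if len(word) > max_excluded_len]
```

Applying the same steps to both the search value and the passed element value is what lets a word typed in a search command line up with a key stored in the index.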

The SRCPROC and PASSPROC rules coded for REC03 are as follows:

The SRCPROC rules coded will convert a date in a search value to the internal form of a date, just as was done by an INPROC=AS31 statement in the goal record definition. The PASSPROC rule specifies only that the element whose name is coded in the GOALREC-ELEM statement be fetched and stored in the index record without the standard conversion to uppercase on passing. Values that are stored in character form should always be forced to uppercase on passing. Any other form of a value (e.g., binary, floating-point, packed decimal) should not be forced to uppercase. No translation by a matching AS31 is necessary in the PASSPROC, since the date is stored in the appropriate format in the goal record via the INPROC=AS31. Part of the power of SPIRES indexing methods is that values can appear in the goal record in one form, and can be passed and searched in a more usable (for the searcher) form.

The PTR-GROUP statement names the same element as the PTR-ELEM statement because our indexes have been defined to use only simple pointer elements (no qualifiers or combined index).

Although each INDEX-NAME refers to a different RECORD-NAME in our example, it is possible for any RECORD-NAME to be referenced by more than one INDEX-NAME. Such a case usually occurs when different elements within the goal records are to be passed to the same index, but those elements require different PASSPROC or SRCPROC rules.

B.8.5  Sub-Indexes

The general form of SUB-INDEX linkage is similar to INDEX-NAME:

When SUB-INDEX terms are added to a simple index, the effect is to introduce additional structural levels to the hierarchy leading from the KEY of the INDEX-NAME record to the PTR-GROUP element. SUB-INDEX names a keyed structure in the index record. The KEY of that STRUCTURE receives the goal record's value being passed for the sub-index term. A personal name index is a good example of a simple index with a sub-index structure. Let's modify our sample file definition to include a PERSON element in the BOOK records, and another index record: REC04

No GOALREC-ELEM was needed for the SUB-INDEX term in this example because the PASSPROC A165 indicates the value to be passed to FIRST-NAME had already been created by A38 in the PASSPROC associated with INDEX-NAME. This is usually the case with personal name sub-indexes, but not for other sub-index structures. The SEARCHTERMS of the SUB-INDEX for personal name are not usually used in a search request because A38 in the SRCPROC of the INDEX-NAME provides the necessary search values for the SUB-INDEX. [See B.9.4 to see how PRIV-TAG can restrict the use of SEARCHTERMS.]

Let's examine the index record definition and linkage definition for a sub-index that is not for a personal name. Suppose the following hierarchy were needed for an airline reservation system:

So, SEAT is inside SECTION which is inside FLIGHT. The index record definition for this structure would look like this:

The linkage definition for this index record would look like this:

Note the use of A171 to pass a default value of SECTION and SEAT if no value is found in the goal record. This will ensure that the index record is created, even if it is incomplete. A171 is also used this way in passing qualifier elements. [See B.8.6, B.8.7.]

The SEARCHTERMS of a SUB-INDEX are specified with a leading @-sign in a search request along with the SEARCHTERMS of the INDEX-NAME. For example, FIND FLIGHT 27 @SECTION B @SEAT 9 requests a specific hierarchy within the REC04 index.
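
As a rough picture of how such a request walks down the hierarchy, consider the Python sketch below. The nested dictionary standing in for the REC04 index, the pointer value 501, and the function names are all hypothetical; the sketch ignores SRCPROC processing and takes the request text without the FIND command word.

```python
def parse_request(request):
    """Split 'FLIGHT 27 @SECTION B @SEAT 9' into (term, value) pairs."""
    pairs = []
    for part in request.split("@"):
        term, _, value = part.strip().partition(" ")
        pairs.append((term, value.strip()))
    return pairs

# Hypothetical index contents: flight -> section -> seat -> pointers.
index = {"27": {"B": {"9": [501]}}}

def lookup(index, request):
    """Follow each value one level deeper; return [] if a level is missing."""
    node = index
    for _, value in parse_request(request):
        if not isinstance(node, dict) or value not in node:
            return []
        node = node[value]
    return node
```

Here lookup(index, "FLIGHT 27 @SECTION B @SEAT 9") returns the pointer list for that seat, and a request naming a section that does not exist under flight 27 returns an empty list.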

B.8.6  Global Qualifiers

In order to have qualifiers in an index record, PTR-GROUP should specify a structure element in all index records. PTR-ELEM specifies the KEY of the structure, and the other elements within the structure receive qualifier values. The "form" of the structure must be the same across all index records associated with a particular GOALREC-NAME. That is, the number of FIXED, REQUIRED, and OPTIONAL elements must be the same in each definition of the structure, and the LENgth and OCCurrence attributes associated with corresponding elements must be the same within each structure. The KEY of the structure receives the pointer back to the goal.
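
The notion of identical "form" can be made concrete with a small Python sketch. The dictionary representation of a structure definition is hypothetical, as is the use of None for an unspecified LEN; the check compares only what the paragraph above names (the number of elements in each section and their LEN and OCC attributes) and deliberately ignores element names.

```python
SECTIONS = ("FIXED", "REQUIRED", "OPTIONAL")

def form(structure):
    """Reduce a structure to its form: (LEN, OCC) per element, per section.

    structure maps a section name to a list of (name, length, occ) tuples.
    """
    return {
        section: [(length, occ) for _, length, occ in structure.get(section, [])]
        for section in SECTIONS
    }

def same_form(a, b):
    """Two pointer group structures have the same form if their forms are equal."""
    return form(a) == form(b)

# Hypothetical pointer group structures from two index records.
with_qualifier = {"FIXED": [("POINTER", None, 1)], "OPTIONAL": [("DATE-QUALIFIER", 4, 1)]}
with_dummy = {"FIXED": [("POINTER", None, 1)], "OPTIONAL": [("DUMMY", 4, 1)]}
pointer_only = {"FIXED": [("POINTER", None, 1)]}
```

In this model a structure holding a real qualifier and one holding a placeholder element of the same length have the same form, whereas a structure with the pointer alone does not.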

Global qualifiers are specified in the global parameters section of a linkage description just prior to the first INDEX-NAME section. The statements of the QUAL-ELEM section are:

The SEARCHTERMS of any QUAL-ELEM are specified in a search request following the AND or AND NOT logical operators. [See B.9.4 to see how PRIV-TAG can restrict the use of SEARCHTERMS.]

Let's alter our sample file definition and linkage section to pass DATE as a global qualifier instead of building a separate DATE index (REC03). We will make DATE a global qualifier of both TITLE (REC02) and PERSON (REC04) indexes. Since PTR-ELEM must become the KEY of a PTR-GROUP structure, we will have to alter the index record definitions. The revised definition might look like:

Notice that we defined the POINTER-STR as consisting of entirely FIXED information, and included LEN=8 with TYPE=STR. The "form" of the pointer group structure is the same in all indexes.

If the DATE element within the BOOK records occurred multiple times, only the first occurrence would be passed to the global qualifier. And if DATE hadn't occurred at all, either A171 would need to be specified in the PASSPROC to supply a default value, or else a null value would be passed to the global qualifier.
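
The selection rule just described can be summarized in a few lines of Python. This is a sketch of the described behavior only: the function name is hypothetical, None stands in for a null value, and the optional default plays the role of the value that A171 would supply.

```python
def global_qualifier_value(occurrences, default=None):
    """Pick the value passed to a global qualifier.

    Only the first occurrence is used. With no occurrences, the default
    (if one was supplied) is passed; otherwise the result is null (None).
    """
    if occurrences:
        return occurrences[0]
    return default
```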

There is a special case of global qualifier worth mentioning. If the keys of goal records are passed to PTR-ELEM, then the pointer element in the indexes referred to by the PTR-ELEM can also be referred to by a global QUAL-ELEM. The QUAL-ELEM would not specify a GOALREC-ELEM since the key of the goal records had already been passed to PTR-ELEM. The SRCPROC would correspond to the INPROC of the goal record's keys, and the PASSPROC must be A165. The SEARCHTERMS statement provides you with search names that allow you to use the PTR-ELEM as a qualifier, which means you can qualify your search requests by goal record key criteria.

If this special QUAL-ELEM is the only qualifier defined for the GOALREC-NAME, and there is no combined index, then PTR-GROUP, PTR-ELEM, and this QUAL-ELEM can all refer to the same simple element in the indexes. This is the only exception to the rule about PTR-GROUP structures being required when qualifiers are defined.

B.8.7  Local Qualifiers

All the rules for PTR-GROUP structures and PTR-ELEM keys apply for local qualifiers just as they do for global qualifiers. [See B.8.6.] Local qualifiers are specified in the linkage section for any particular index by adding QUAL-ELEM sections just after the PTR-GROUP statement.

Let's alter our sample file definition and linkage section again to pass DATE as a local qualifier of the TITLE index (REC02) instead of making it a global qualifier in all indexes. We will keep the personal name index (REC04) introduced in the SUB-INDEX section, but it will not have a qualifier.

Notice the DUMMY element in the pointer group structure of REC04. It is there to make the structure "form" identical to the structure defined in REC02, which has a DATE-QUALIFIER element. Also notice that both the DUMMY element and DATE-QUALIFIER element were declared OPTIONAL. That's because the DUMMY element will not occur within REC04 occurrences of the POINTER-STR.

In this sample definition, the DATE element in the goal records always occurred since it is a FIXED element. However, if it had been an OPTIONAL element that did not occur, then A171 should be coded in the PASSPROC to pass some default value to the local qualifier; otherwise no index entries would be created for TITLE. All local qualifier and sub-index sections must define values for an index entry to be created. If the goal record elements which supply values for local qualifiers or sub-index terms are multiply occurring, or a PASSPROC rule specifies multiple passer elements, then multiple index entries can be created. [See B.8.14.]

Problems may be encountered if a variable length qualifier is passed to a fixed-length qualifier element in the index record. If this is being done, the following PASSPROC should be included with any other qualifier PASSPROCs:

where "n" is the value of the LEN statement on the qualifier element (i.e., the fixed-length field size).

B.8.8  Combined Indexes

CINDEX-VALUE is just a special case of local qualifier. PTR-GROUP must be a structure with PTR-ELEM as its KEY.

Let's alter our file definition to make DATE a combined index. REC03 will now be used to define a combined index record-type. Remember, the "form" of the PTR-GROUP structure must be the same in all index definitions. Here is the revised file definition, including the linkage sections for both the simple index on TITLE and the combined index on DATE (the PERSON index has been dropped).

Compare the linkage definitions for REC02, a simple index, and REC03, a combined index. Notice that the PTR-GROUP statements refer to different structure names in each index, but the "form" of those structures is the same, and they have the same KEY name. The PTR-GROUP structure names are usually the same, but that is not a requirement, as this example demonstrates.

Also notice that the combined index linkage has a "dummy" SEARCHTERMS statement coded, no SRCPROC or GOALREC-ELEM statements, two PASSPROC statements, and a new kind of statement, "CINDEX-VALUE".

When searching a combined index, the searcher may use the element name or alias of any of the goal record elements passing to the combined index; the SEARCHTERMS statement must be coded, but its value is meaningless. The index names that are reported when a user issues the SHOW SEARCH TERMS command are picked up from the P+ values of PASSPROC=A167. Note from the description of this action [See D.1.7.] that the order of the P+ parameters is not important unless some of the elements being passed are inside structures; in this case, the order must be the order in which the elements would be displayed if a record from the file were displayed in the standard output format.

It is this first PASSPROC, A167, that specifies the goal record elements that are passed to the combined index; this is why no GOALREC-ELEM statement is needed. Instead of a SRCPROC rule string, SPIRES passes all search values through the INPROC rules for the particular goal record element being searched. (Only one SRCPROC rule can be coded in a combined index definition: SRCPROC = A6; in the INDEX-NAME section.)

The CINDEX-VALUE statement is only coded in the linkage to a combined index, immediately following the PTR-GROUP statement. It names an element in the index record's PTR-GROUP structure that will receive data values being passed from the goal record elements (see A167 in the PASSPROC of INDEX-NAME). In the sample file, this element has the name "VALUE", hence the statement CINDEX-VALUE=VALUE in the linkage definition.

The final statement, a second PASSPROC, is always coded in combined index linkages. If the elements being passed are in binary form (as is often the case in combined indexes), such as the DATE element, then A169:1 is the only rule coded for this statement. If the elements are values that must be converted to uppercase, then A169:0 (or simply A169) is coded. If some elements being passed must not be converted to uppercase and others must be, then A162 is also coded, as explained later. [See B.8.11.]

B.8.9  Coding SRCPROC Rules

If the values of an element have been altered by an INPROC or PASSPROC, then the same processing rules are generally coded in the SRCPROC rule string so that a similar transformation is applied to the values a user specifies in search commands.

The following SRCPROC rules are available to modify the search process. Other actions may also be used in SRCPROC rule strings.

If no SRCPROC statement is coded, and thus no SRCPROC rules are specified, the default SRCPROC will be used: "A45,". This will automatically cause search values to be broken on blanks.

B.8.10  The NOPASS Statement

The NOPASS statement may be specified in the linkage section of a file definition. If it is coded, SPIBILD will not attempt to update any of a subfile's indexes when it is processing records. The indexes can still be searched, however.

The statement, "NOPASS;", is placed after the last statement in the linkage section of the goal record whose indexing is to be "turned off." The file definition must then be recompiled. Subsequent SPIBILD processing will not attempt to pass any information to the subfile's indexes.

In order to re-start index updating, the NOPASS statement must be removed from the file definition, and the definition must then be recompiled. Note that NOPASS stops passing to all indexes in a single linkage section; it cannot be used to disable one of several indexes selectively.

B.8.11  Coding PASSPROC Rules

The rules for coding strings of PASSPROCs are more rigid and difficult than the rules for coding INPROCS, OUTPROCS or SRCPROCS. The descriptions of the PASSPROC rules in the last part of this manual [See D.1.7, D.2.6, D.3.] and in the ACTIONS subfile are very concise; a problem with their brevity is that the choices a file definer has in coding PASSPROC rule strings are not easily distinguished. This section will focus on the central choices that must be made in coding PASSPROC strings.

The first PASSPROC encountered in most file definitions is in the global parameters of the linkage section. For example:

As has been mentioned, PASSPROC=A170 is used if the goal record is REMOVED (as most sample definitions in this manual are) and the PTR-ELEM is declared TYPE=LCTR. This rule says that the PTR-ELEM in each index record will receive the locator of the goal record in the residual data set.

It is also possible to pass the key of the goal record to the PTR-ELEM, instead of the locator of the goal record in the residual data set. If the goal records are not REMOVED then you must pass the key. To do this, one of the following PASSPROC rules must be coded instead of A170:

Neither of these rules forces the key to uppercase. When no PASSPROC is coded, the default action is to force the key to uppercase, so the default should be relied on only when the goal record's key is already an uppercase value.

The first rule is used when only the GOALREC-KEY is to be passed. That includes passing the slot-number of SLOT records. The second rule is usually used with multiple passer elements, but can be used with just the goal record's key specified.

Although it is not frequently done, it is possible to pass the key of the goal record as the pointer even if the goal record is a REMOVED record-type. This should be avoided if the key is varying in length or if the key is more than four bytes long.

There are two other places in the linkage section where the choice of a PASSPROC is simple. The first of these is the second PASSPROC statement coded in a combined index linkage definition:

A169:1 is used when the value or values being passed are stored as numbers and hence must not be forced to uppercase. A169:0 would be used if character strings were being passed to this index. If both character and binary data were being passed to this index, then A162:1 would also be coded to exclude certain elements' values from uppercase conversion. For example:

would force all values to uppercase upon passing, except those from the DATE element.

The second case in which the choice of a single PASSPROC is simple is the second PASSPROC string coded in a personal name index:

Here, only PASSPROC=A165 can be coded because A38 in the PASSPROC of INDEX-NAME supplies both the KEY of the index record and the KEY of the sub-index structure.

B.8.12  Choosing the "Fetcher" PASSPROC

Choosing the PASSPROC rule that fetches the element value or values from the goal record is a matter of selecting one rule from among sixteen that are shown in a table below. However, the table itself requires some explanation of the terminology that is often found in SPIRES processing rule descriptions.

The terms "single passer" and "multiple passer" need definition. A single passer situation occurs when only one goal record element is passing its value or values to an index. It does not matter whether this element is itself singly or multiply occurring, or whether A45 is used to break a single occurrence into multiple occurrences. A multiple passer situation occurs when more than one element in the goal record passes to a single index. For example, if HOME-PHONE and BUSINESS-PHONE elements in a goal record were both passed to a PHONE index, this would be a multiple passer situation.

Combined indexes present a special problem, since they contain two PASSPROC rule strings. However, the choice of PASSPROC rules is fairly straightforward. [See B.8.8 for a complete discussion.] The first PASSPROC is always "A167:0"; this defines the elements from the goal record to be passed to the index. The second PASSPROC may be either A169:0 or A169:1, depending on whether or not the elements being passed are to be forced to uppercase in passing; see the entry for a single passer without A38 or A45 in the following table.

One other factor affects the choice of PASSPROC rules from the table. If values "fetched" (obtained) from the goal record are to be broken apart in passing by action 45 or by action 38 (the latter for personal name indexes), then a different PASSPROC rule must be selected than if A45 or A38 were not coded in the same rule string.

The P1 parameter on the PASSPROC that fetches the value from the goal record is determined by whether or not the value is to be forced to uppercase in passing. Values that are converted to an internal form, such as fixed binary or date values, must not be converted to "uppercase" in passing, since this would change their value. The P1 parameter is also determined by whether or not the value should be processed through the OUTPROC rules associated with that element, thus passing the external form of the value to the index.
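Why forcing uppercase damages an internal-form value can be seen in a small Python sketch. It is only an illustration: the 4-byte big-endian layout and Python's ASCII `bytes.upper()` are stand-ins chosen for the example, not a description of how SPIRES stores or translates values.

```python
def force_upper(raw: bytes) -> bytes:
    """Stand-in for an uppercase conversion applied blindly to a stored value."""
    return raw.upper()


# A character value is normalized as intended.
assert force_upper(b"apples") == b"APPLES"

# The integer 97, stored in binary, happens to contain the byte for "a".
stored = (97).to_bytes(4, "big")      # b'\x00\x00\x00a'
damaged = force_upper(stored)         # b'\x00\x00\x00A'

# The "uppercased" bytes now decode to a different number.
print(int.from_bytes(stored, "big"), "->", int.from_bytes(damaged, "big"))  # 97 -> 65
```

The same reasoning is why a "Don't Force" P1 value is chosen for fixed binary or date values.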

                 Without                With
               A38 or A45            A38 or A45
         |--------------------|---------------------|
         |                    |                     |
 Single  |  A169:0            |  A166:0             |--Force Upper
 Passer  |                    |                     |
         |            A169:1  |            A166:1   |--Don't Force
         |                    |                     |
         |  A169:8            |  A166:8             |--Force Upper, but
         |                    |                     |  Pass External Form
         |            A169:9  |            A166:9   |--Don't Force, but
         |                    |                     |  Pass External Form
         |--------------------|---------------------|
         |                    |                     |
Multiple |  A167:1            |  A167:2             |--Force Upper
 Passers |                    |                     |
         |            A167:5  |            A167:6   |--Don't Force
         |                    |                     |
         |  A167:9            |  A167:10            |--Force Upper, but
         |                    |                     |  Pass External Form
         |            A167:11 |            A167:12  |--Don't Force, but
         |                    |                     |  Pass External Form
         |--------------------|---------------------|
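
The table can also be read as a lookup keyed on four yes/no questions. The Python sketch below simply transcribes the sixteen cells; the function name `fetcher` and its parameter names are invented for this illustration and are not part of SPIRES.

```python
# Key: (multiple_passers, with_a38_or_a45, force_upper, external_form)
FETCHERS = {
    (False, False, True,  False): "A169:0",
    (False, False, False, False): "A169:1",
    (False, False, True,  True):  "A169:8",
    (False, False, False, True):  "A169:9",
    (False, True,  True,  False): "A166:0",
    (False, True,  False, False): "A166:1",
    (False, True,  True,  True):  "A166:8",
    (False, True,  False, True):  "A166:9",
    (True,  False, True,  False): "A167:1",
    (True,  False, False, False): "A167:5",
    (True,  False, True,  True):  "A167:9",
    (True,  False, False, True):  "A167:11",
    (True,  True,  True,  False): "A167:2",
    (True,  True,  False, False): "A167:6",
    (True,  True,  True,  True):  "A167:10",
    (True,  True,  False, True):  "A167:12",
}


def fetcher(multiple_passers, with_a38_or_a45, force_upper, external_form=False):
    """Return the fetcher PASSPROC rule from the table for the given situation."""
    return FETCHERS[(multiple_passers, with_a38_or_a45, force_upper, external_form)]


# HOME-PHONE and BUSINESS-PHONE both passing to one PHONE index, no A45 or A38,
# values forced to uppercase:
print(fetcher(multiple_passers=True, with_a38_or_a45=False, force_upper=True))  # A167:1
```

For a binary DATE element passed by itself without A45 or A38, `fetcher(False, False, False)` gives A169:1, which agrees with the rule used for the sample combined index.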

B.8.13  Other Actions in a PASSPROC Rule String

The following describes the syntax for any PASSPROC rule string. You enter the table at the word "PASSPROC" and follow the paths defined. The symbol "::" is read "is defined as." Terms on the left side of a "::" are defined by the term(s) that appear on the right side of the "::". Terms on the right side of the "::" that are listed directly under another term (or terms) on that side are an alternative definition for the term on the left side of the "::". The symbol "|" means "or," and also separates alternative definitions.

 <Term>    indicates a term that must occur once, i.e., is required
 (Term)    indicates a term that may occur once, i.e., is optional
 (0,Term)  indicates a term that may occur zero or more times,
           i.e., it may be omitted, occur once, or occur repeatedly
 A-number  indicates a required processing rule.  If no P1
           parameter is specified, then all P1 parameters are
           included.  If a P1 parameter is specified, then only
           that P1 parameter is allowed.
PASSPROC         ::  <MULTIPLE-PASSER>
                  |  <SINGLE-PASSER>
                  |  <SIMPLE-PASSER>
MULTIPLE-PASSER  ::  (DEFAULT) <MULTIPLE-FETCHER> (0,MIDDLE) <BREAK>
SINGLE-PASSER    ::  (DEFAULT)  <SINGLE-FETCHER>  (0,END)
SIMPLE-PASSER    ::  <DEFAULT> | A165 | A167:0 | A170
DEFAULT          ::  A171
MULTIPLE-FETCHER ::  A166 | A167:2 | A167:6
SINGLE-FETCHER   ::  A167:1 | A167:5 | A169
MIDDLE           ::  A22  | A32  | A36  | A40  | A43  | A44  | A46  | A47
                  |  A48  | A55  | A62  | A161 | A162 | A163 | A168
BREAK            ::  A45  (0,END)
                  |  A38
END              ::  <MIDDLE> | A52 | A164

For example, this syntax shows that the following is illegal syntax:

because A52 (as an END rule) must follow A45 (which is a BREAK rule). This syntax also shows that A38 must be the last rule in any PASSPROC in which it is coded.
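
The syntax table can be checked mechanically. The following Python sketch is one way to do so and is not a SPIRES facility: it maps each rule to a one-letter class and matches the resulting string against a regular expression that transcribes the table as printed. It assumes that an omitted P1 means P1 = 0, it handles only the "A-number" and "A-number:P1" forms (no further parameters), and the names `classify` and `is_valid_passproc` are invented here.

```python
import re

MIDDLE = {22, 32, 36, 40, 43, 44, 46, 47, 48, 55, 62, 161, 162, 163, 168}


def classify(rule: str) -> str:
    """Map one rule such as "A167:2" to a one-letter grammar class."""
    m = re.fullmatch(r"A(\d+)(?::(\d+))?", rule.strip().upper())
    if m is None:
        return "?"
    number = int(m.group(1))
    p1 = int(m.group(2) or 0)  # assumption: an omitted P1 means P1 = 0
    if number == 171:
        return "D"  # DEFAULT
    if number == 166 or (number == 167 and p1 in (2, 6)):
        return "F"  # MULTIPLE-FETCHER
    if number == 169 or (number == 167 and p1 in (1, 5)):
        return "S"  # SINGLE-FETCHER
    if number in (165, 170) or (number == 167 and p1 == 0):
        return "X"  # remaining SIMPLE-PASSER alternatives
    if number in MIDDLE:
        return "M"  # MIDDLE
    if number == 45:
        return "B"  # BREAK by A45
    if number == 38:
        return "N"  # BREAK by A38 (personal names)
    if number in (52, 164):
        return "E"  # rules allowed only as END
    return "?"


PASSPROC = re.compile(
    r"D?FM*(B[ME]*|N)"  # MULTIPLE-PASSER
    r"|D?S[ME]*"         # SINGLE-PASSER
    r"|D|X"              # SIMPLE-PASSER
)


def is_valid_passproc(rules: list[str]) -> bool:
    """True if the sequence of rules fits the syntax table above."""
    classes = "".join(classify(rule) for rule in rules)
    return PASSPROC.fullmatch(classes) is not None


print(is_valid_passproc(["A166:0", "A45", "A52"]))  # True
print(is_valid_passproc(["A166:0", "A52", "A45"]))  # False: A52 ahead of A45
print(is_valid_passproc(["A166:0", "A38", "A45"]))  # False: A38 must be last
```

Because the sketch follows the table literally, fetchers that appear in the table of B.8.12 but not in this syntax (for example A167:9) are rejected.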

B.8.14  How Passing Works

The SPIBILD and FASTBILD processors make use of the Linkage Sections of a File Definition (GOALREC-NAME sections) in the following ways:

Each record type of a file is processed in ascending order by record-type number (all REC1's, then all REC2's, etc.). Goal records associated with each record-type are processed either in the order in which they are input under BATCH command processing, or in ascending key order if they come through the DEFQ. Note that UPDATE and REMOVE commands under BATCH processing normally cause DEFQ entries to be created and later processed in key order. ADD commands under BATCH processing are handled in the order they are received.

Limiting the discussion to just the PROCESS command in SPIBILD will simplify the description, and make it possible for you to determine what happens in a "recovery" situation. For simplicity, assume all records come from the DEFQ. (Note that the process of passing index information is optimized during BATCH MERGE requests in SPIBILD. Only indexed information that is changed between the old version of a record and a new version is passed. This minimizes CPU and I/O activity.)

Phase 1

If the record-type does NOT have an associated Linkage Section (or sections), then skip to Phase 2.

For ADD-type records, a copy of the record is first added to the TREE if the record type is declared as REMOVED. If the record fails to add because a copy already exists in the TREE, then it is assumed that "recovery" is taking place, and the record is updated in the TREE. The original DEFQ record is now sent to the passing process marked for pointer INSERT. For UPDATE-type or REMOVE-type records, the original TREE copy of the record is read up and sent to the passing process marked for pointer DELETE. Then, the original UPDATE-type record from the DEFQ is sent to the passing process marked for pointer INSERT.

The passing process is controlled by the Linkage Section. For each record received (marked for pointer INSERT or DELETE), the PTR-ELEM and Global QUAL-ELEM portion of the Linkage is processed first. One and only one item is placed in the "pass stack" (called PS) for each item in this Common portion. PASSPROC A165 may be used to cause no entry, or something like A167:1 (multiple passer) can be used to cause the first occurrence of the first existing element in the list to be placed in PS. If no value can be found and A165 or A171 was not used, then a "null" value is placed in PS.

Once the Common portion of the Linkage Section is finished, processing begins on each INDEX-NAME portion. The INDEX-NAME, SUB-INDEX, and Local QUAL-ELEM terms constitute one INDEX-NAME portion. These terms are "traversed" in both a forward and backward manner beginning with a forward scan starting with INDEX-NAME and running through the last term. Each term causes one value to be retrieved from the goal record. If no value can be found during a forward scan, and A165 or A171 was not used, then NO INDEX ENTRY is defined, and backward scan commences. If the forward scan can be completed, then an INDEX ENTRY is defined, and backward scan commences. During backward scan, each term attempts to retrieve a value from the goal record from where it left off during the previous forward scan. If another value can be retrieved, then forward scan resumes from this point, and if it continues forward through the last term, then another INDEX ENTRY is defined. However, if during any forward scan, structural boundaries have to be crossed to satisfy a term as compared to a previous term, then the backward scan will NOT be able to pick up inside the same occurrence of the structure again. The common root of the two paths closest to the record level determines the next forward path.
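
For the simplest case only, where every term draws its values from an element at the record level so that no structural boundaries are involved, the forward and backward scans behave like an odometer. The Python sketch below is one reading of the description above, not SPIBILD's actual code; the function name and the values "W1", "D1", and so on are invented for the illustration, and A165 and A171 defaults are not modeled.

```python
def index_entries(terms):
    """terms: one list of values per linkage term, all at the record level."""
    entries = []
    if any(len(values) == 0 for values in terms):
        return entries  # a forward scan can never complete: no index entry
    position = [0] * len(terms)
    while True:
        # A completed forward scan defines one index entry.
        entries.append(tuple(values[i] for values, i in zip(terms, position)))
        # Backward scan: find the last term that still has another value.
        term = len(terms) - 1
        while term >= 0 and position[term] + 1 >= len(terms[term]):
            term -= 1
        if term < 0:
            return entries  # the first term has no more values: finished
        position[term] += 1
        # Forward scan resumes; later terms start again at their first value.
        for later in range(term + 1, len(terms)):
            position[later] = 0


for entry in index_entries([["W1", "W2"], ["D1"], ["T1", "T2"]]):
    print(entry)
```

With two values for the first term, one for the second, and two for the third, this yields four entries, the last term varying fastest. Layouts that involve structures restrict which values a backward scan may pick up and are outside what this sketch covers.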

Here are some examples using a single INDEX-NAME section with different layouts of elements in goal records. Note: only important items are shown.

The layout of WORD, DATE, and TIME in the goal records can vary in many ways. Structural boundaries can make a big difference in what happens. Consider the following:

 Record-level  WORD(1)  WORD(2)  DATE(1)  DATE(2)  TIME(1)  TIME(2)
 --------------|--------|--------|--------|--------|--------|

All elements are doubly occurring, and all are at the Record-level. The following INDEX ENTRIES will be constructed:

Now consider the following layout:

                              TIME(1)             TIME(2)
                     DATE(1)  |          DATE(2)  |
   WORD(1)  WORD(2)  |--------|          |--------|
  -|--------|--------|-------------------|

Here, TIME is the key of a structure which has been defined as an element of another structure whose key is DATE, and that structure is defined at the record level along with WORD.

The following INDEX ENTRIES will be created:

After completing the first forward pass, an attempt to pick up another value of TIME on a backward scan would have required leaving the structural bounds defined by the first DATE. On a backward scan, structural bounds defined by a previous element can't be crossed, so the backward scan continues by trying to get another DATE value. The boundary now is the record-level defined by WORD. Since the DATE structure has its root at that level, SPIBILD can proceed forward again beginning with DATE(2), and then TIME(2). Scanning backward again takes us all the way back to WORD(1), from which SPIBILD now picks up WORD(2) and scans forward again. If there had been other occurrences of the TIME structure associated with DATE(1), then SPIBILD would have picked all of them up along with DATE(1), but none of them with DATE(2). The same thing would have happened if TIME had been simply a multiply occurring element inside the DATE structure instead of being the key of a multiply occurring structure defined inside the DATE structure. That is,

                     DATE(1)  TIME(1)    DATE(2)  TIME(2)
   WORD(1)  WORD(2)  |--------|          |--------|
  -|--------|--------|-------------------|

Now, consider the following layout:

                                         WORD(1)  WORD(2)
                     DATE(1)  DATE(2)    |--------|
   TIME(1)  TIME(2)  |--------|----------|
  -|--------|--------|

Here, WORD is a multiply occurring element of a structure defined as an element of another structure containing a multiply occurring DATE element, and that structure is defined at the record-level along with a multiply occurring TIME element.

The following INDEX ENTRIES will be created:

Why? Because on the forward scan, SPIBILD has to abandon the structure containing WORD to retrieve DATE, and then SPIBILD has to abandon that structure to retrieve TIME. We can't go back into the same occurrences of those structures on the backward scan. The common root would be the next occurrence of a structure at the record-level. Interestingly enough however, the following example does give us more INDEX ENTRIES (Note the similarity to the first structural layout).

                              WORD(1)             WORD(2)
                     DATE(1)  |          DATE(2)  |
   TIME(1)  TIME(2)  |--------|          |--------|
  -|--------|--------|-------------------|

Here, WORD is the key of a structure which has been defined as an element of another structure whose key is DATE, and that structure is defined at the record level along with TIME.

The following INDEX ENTRIES will be created:

Once all INDEX ENTRIES for an INDEX-NAME section have been placed in PS, the next INDEX-NAME section is processed. An INDEX-NAME section is considered completed when a backward scan for the INDEX-NAME term fails to retrieve a value. When all INDEX-NAME sections have been completed, passing for the record is finished, but the information concerning index updates only exists in PS. When PS fills up or no more records are retrieved from the DEFQ, then PS is processed by sorting it down to unique INDEX ENTRIES, cancelling duplicate entries and any INSERT/DELETE pairs. The result is a list of INSERT and/or DELETE entries to be applied to the indexes. The appropriate index records are added, removed, or updated using the PS information. Because of the sort, an index record need only be read once in order to apply all PS information associated with that record.
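
The reduction of PS can be pictured with a short Python sketch. It reflects one reading of the paragraph above (duplicates collapse to a single entry, and an entry marked both INSERT and DELETE drops out entirely); the tuple layout (index name, value, pointer) and the function name are invented for the illustration and say nothing about the real format of PS.

```python
def reduce_pass_stack(pass_stack):
    """pass_stack: iterable of (operation, entry) pairs; operation is
    "INSERT" or "DELETE", entry is a hashable, sortable index entry."""
    inserts, deletes = set(), set()
    for operation, entry in pass_stack:
        (inserts if operation == "INSERT" else deletes).add(entry)  # drops duplicates
    cancelled = inserts & deletes  # INSERT/DELETE pairs cancel each other
    result = [(entry, "INSERT") for entry in inserts - cancelled]
    result += [(entry, "DELETE") for entry in deletes - cancelled]
    return sorted(result)  # entries for the same index record end up adjacent


# An UPDATE in which the title is unchanged but the date changes:
# the old copy is passed for DELETE, the new copy for INSERT.
ps = [
    ("DELETE", ("TITLE", "HAMLET", 17)),
    ("DELETE", ("DATE", "1603", 17)),
    ("INSERT", ("TITLE", "HAMLET", 17)),
    ("INSERT", ("DATE", "1604", 17)),
]
for entry, operation in reduce_pass_stack(ps):
    print(operation, entry)
```

Only the two DATE entries survive; the unchanged TITLE entry cancels, so its index record never has to be read.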

When all DEFQ records for one record-type have been passed, processing continues with Phase 2 below.

Phase 2

All the records in the DEFQ associated with one record-type are read sequentially. For each ADD-type record, the record is added to the TREE if no Linkage Section was defined, or the record type is not removed; otherwise, these ADD-type records are ignored. For each REMOVE-type record, the TREE copy of the record is removed. For each UPDATE-type record, the TREE copy is replaced by the DEFQ copy. When all the records for a particular record-type have been processed, the next logical record-type is processed starting back at Phase 1. When all record-types have been processed, the DEFQ is cleared and the PROCESS command is finished.

1  Triples

There is a "type" associated with all variables in SPIRES. The normal types are as follows:

CHAR is a "fixed length" form of STRING. LINE is the type associated with line-number variables like $WDST or $WDSR. The others represent the most commonly used types.

A 'VALUE' is an association of the value of a field with the name of the field. A Triple (or 3-Tuple) is used to associate more than 2 entities. A triple might be used if it was also necessary to know who owned the field. For example: BIRTHDATE (the attribute) and 476322 (the employee number whose birthdate this is) and 19540611 (the value). Thus a triple is an association of three pieces of information. Such an association is itself an entity with TYPE=TRIPLE. A collection of triples forms another entity with TYPE=GROUP. Thus, there are additional types that variables can assume. They are:

1.1  Triple Functions

Triples are constructed and manipulated by $-functions. The functions are:

Within the parens in the list above, "att", "obj", and "val" represent variables and/or values of any type.

1.1.1  $MAKE

$MAKE guarantees that the triple exists and returns a type TRIPLE result. There are two common ways to make a triple:

EVAL makes the triple but doesn't assign it to a variable. This is the most common way of creating triples. The PSEUDO variable $NEW can be used in any field to make a unique triple when fewer than 3 values are available. Each use of $NEW causes a different PSEUDO value to be generated.

1.1.2  $MADE

$MADE returns an integer count of the number of triples that satisfy the criteria. The PSEUDO variable $ANY can be used to indicate a non-specific criterion. For example:

would set N to 2 since there are two triples that have "A" as the att-field, "B" as the obj-field, and anything in the val-field. $MADE($ANY,$ANY,$ANY) returns the count of all triples that currently exist.

1.1.3  $UNMAKE

$UNMAKE unmakes the triple specified. If $ANY is coded as a parameter, $UNMAKE unmakes all triples that qualify.

unmakes all triples unconditionally. No error occurs on an attempt to unmake a triple that already is unmade. As in the case of $MAKE, $UNMAKE guarantees that the triple doesn't exist. However, if the triple has been assigned to a variable, then the variable must be eliminated before space for the triple is released. Thus:

1.1.4  $UNMAKETRIPLE

$UNMAKETRIPLE unmakes a specified triple. For example:

then

both unmake the same triple. $UNMAKETRIPLE returns an integer which is the use-count of the triple unmade. Note that the triple-type variable must still be eliminated before space for the triple is released.

1.1.5  $LOOKUP

$LOOKUP returns the first triple which satisfies the criteria.

would return the first triple encountered that had "A" in the att-field, "B" in the obj-field, and anything in the val-field.
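
The behavior of these functions can be modeled outside SPIRES. The Python sketch below is an analogy only, not how SPIRES implements triples: `ANY` stands in for $ANY, "first" is taken to mean the earliest triple made, and returning `None` when nothing matches is an assumption of the sketch. $NEW, use-counts, and $UNMAKETRIPLE are not modeled.

```python
ANY = object()  # stands in for the $ANY pseudo variable


class TripleStore:
    def __init__(self):
        self._triples = {}  # dict used as an insertion-ordered set

    def _matches(self, att, obj, val):
        pattern = (att, obj, val)
        return [t for t in self._triples
                if all(p is ANY or p == field for p, field in zip(pattern, t))]

    def make(self, att, obj, val):
        """Like $MAKE: guarantee the triple exists and return it."""
        triple = (att, obj, val)
        self._triples.setdefault(triple, True)
        return triple

    def made(self, att, obj, val):
        """Like $MADE: count the triples that satisfy the criteria."""
        return len(self._matches(att, obj, val))

    def unmake(self, att, obj, val):
        """Like $UNMAKE: remove every qualifying triple; no error if none."""
        for triple in self._matches(att, obj, val):
            del self._triples[triple]

    def lookup(self, att, obj, val):
        """Like $LOOKUP: the first qualifying triple (None here if absent)."""
        found = self._matches(att, obj, val)
        return found[0] if found else None


store = TripleStore()
store.make("A", "B", 1)
store.make("A", "B", 2)
store.make("A", "B", 2)           # making an existing triple changes nothing
store.make("A", "C", 3)
print(store.made("A", "B", ANY))  # 2
print(store.lookup("A", "B", ANY))
store.unmake(ANY, ANY, ANY)       # unmakes all triples unconditionally
print(store.made(ANY, ANY, ANY))  # 0
```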

1.2  Decomposing Triples

The $ATTRIBUTE, $OBJECT, and $VALUE functions are used to isolate the three components of a triple, any of which may be another triple. A triple-type variable cannot be displayed by the /* command because triples do not convert to strings.

1.2.1  $ATTRIBUTE

$ATTRIBUTE returns the att-field of a specified triple. Thus:

sets R to the value that T had when the triple was made. The "type" of R would be the same as the type associated with T (which might be triple or group).

1.2.2  $OBJECT

$OBJECT returns the obj-field of a specified triple.

sets O to the string-value "APPLES".

1.2.3  $VALUE

$VALUE returns the val-field of a specified triple.

sets V to the integer-value of 9.

1.3  Groups

Triples are 3 values, given that a value may itself be a triple. Groups are N values, all of which must be triples. Thus groups are N-Tuples which may be referenced like a list.

1.3.1  $GROUP

$GROUP makes a group out of all triples that qualify. Thus:

makes a group out of all triples that have "A" in the att-field, "B" in the obj-field, and anything in the val-field.

eliminates the G variable and the associated group.

1.3.2  $GROUPSIZE

$GROUPSIZE returns an integer representing the size (i.e., number of members) of the specified group. For example: