INDEX
*  SPIRES File Definition
A  Introduction to SPIRES File Definition
A.1  Preface
A.2  System Overview
A.2.1  SPIRES Command and Definition Languages
A.2.2  SPIRES Processors and Programs
A.2.3  The User Interface
A.2.4  File Definition Concepts and Facilities
A.3  The Process of Defining a File
A.3.1  Design Analysis
A.3.2  File Definition
A.3.2.1  The File Definer: A SPIRES Subsystem to Simplify File Definition
A.4  Glossary of Important File Definition Terms
A.4.1  Element
A.4.2  Record
A.4.3  Structure
A.4.4  Key
A.4.5  Goal Record
A.4.6  Index Record
A.4.7  Record-type
A.4.8  Index
A.4.9  Simple Index
A.4.10  Compound Index
A.4.11  Combined Record-type
A.4.12  Removed Record-type
A.4.13  Subfile
A.4.14  File
A.4.15  Hierarchy of File Definition Components
B  Defining a SPIRES File
B.1  Goal Record Concepts and Definition
B.1.1  Element Names, Occurrences, and Lengths
B.1.2  File and Record Name Statements
B.1.3  Element Categories
B.1.4  Record Keys
B.1.5  Element Name, Occurrence and Length Statements
B.1.6  Element Aliases
B.1.7  Dummy Elements; Comment Statements
B.1.8  Optional Statements: AUTHOR, MAXVAL, NOAUTOGEN and BIN
B.1.8.1  (The AUTHOR Statement)
B.1.8.2  (The MAXVAL Statement)
B.1.8.3  (The NOAUTOGEN Statement)
B.1.8.4  (The BIN Statement)
B.1.9  Statements in the Subfile Section
B.1.10  A Complete File Definition
B.2  Goal Record Keys, Slot and Removed Records
B.2.1  Record Keys
B.2.2  Slot Keys
B.2.2a  SLOT-START Statement
B.2.3  Removed Record-Types
B.2.4  Monotonic Record-Types
B.3  Structures
B.3.1  Data Structuring
B.3.2  Coding Structures
B.3.3  Structured-Data Input
B.3.4  Keyed Structures
B.3.5  Keyed-Structure Data Input
B.3.6  Floating Structures
B.4  Processing Rules: INPROC, INCLOSE, OUTPROC
B.4.1  Functions of Processing Rules
B.4.2  INPROC and OUTPROC Rule Functions
B.4.3  Processing Rule Strings
B.4.4  Processing Rule Syntax
B.4.5  Processing Rule Restrictions
B.4.6  Understanding Processing Rule Descriptions
B.4.7  Custom Error Messages
B.4.8  INCLOSE Rules
B.4.9  Processing Rules For Numeric Data
B.4.10  Processing Rules for Dollar-and-Cents Data
B.4.11  Processing Rules for Free Form Character Strings
B.4.12  Processing Rules for Strings of Codified Data
B.4.13  Processing Rule for Personal Names to Canonical Form
B.4.14  Processing Rules for Validating Length and Occurrence
B.4.15  Processing Rules and Element Types
B.4.16  Processing Rule Tracing: SET PTRACE
B.4.17  Processing Rule Tracing for Passprocs: SET PASSTRACE
B.5  The FILEDEF Subfile and File Compilation
B.5.1  The FILEDEF Subfile
B.5.2  Adding Records to FILEDEF
B.5.3  Compiling File Definitions
B.5.4  Altering a File Definition in FILEDEF
B.5.5  ORVYL Files Created by Compilation
B.5.6  The ATTACH and SELECT Commands
B.5.7  The PROCESS Command in SPIBILD
B.5.8  Making Major Changes to a File: The ZAP FILE Command
B.5.9  Making Minor Changes to a File: The RECOMPILE Command
B.5.10  Destroying a SPIRES File
B.5.11  Summary
B.6  File Structure: Tree & Slot, Goal & Index, Removed Records
B.6.1  Introduction
B.6.2  Element Storage
B.6.3  Record Storage
B.6.4  Removed Record-Types
B.6.4.1  Very Large Databases
B.6.4.2  RES-LARGE Statement
B.6.5  Combined Record-Types
B.6.5a  Extended Tree Data Sets for Large Databases
B.6.5b  The OVERFLOW-TO and OVERFLOW-KEY statements
B.6.5c  The EXTERNAL-TYPE statement
B.6.6  Figure: Function of Goal and Index Records
B.6.7  Figure: Storage of Element Length and Occurrence Information
B.6.8  Figure: A Tree-Structured Data Set
B.6.9  Figure: Detail of the Structure of a Single File Block
B.6.10  Figure: Sample Tree After Intense Local Growth
B.6.11  Figure: Sample Tree With Well-Distributed Growth
B.6.12  Figure: Tree Showing High Number of Access Per Record
B.6.13  Figure: Previous Tree After Rebalancing
B.7  Understanding and Coding Index Records
B.7.1  How Indexing Works
B.7.2  Understanding Simple Indexes
B.7.3  Understanding Qualifiers
B.7.4  Understanding Sub-Indexes
B.7.5  Understanding Compound Indexes
B.7.6  The Impact of Global FOR and ALSO on Indexing
B.7.7  Index Definition
B.7.8  Coding Simple Indexes
B.7.9  Coding Simple Indexes with Qualifiers
B.7.10  Coding Compound Indexes
B.7.11  Coding Sub-Indexes
B.7.12  Index Record and Goal Record Elements
B.7.13  Index Records as Goal Records
B.7.14  Indexes for Non-Removed Record Types; Keys vs. Locators
B.7.15  Ensuring the Validity of Index Records
B.8  Understanding and Coding the Linkage Section
B.8.1  Functions of the Linkage Section
B.8.2  The Global Parameters Section
B.8.2.1  The SEARCHPROC Statement in the Global Parameters section
B.8.2.2  The EXTERNAL-NAME Statement
B.8.3  Individual Index Linkages
B.8.4  Simple Indexes
B.8.5  Sub-Indexes
B.8.6  Global Qualifiers
B.8.7  Local Qualifiers
B.8.8  Compound Indexes
B.8.9  Coding Searchproc Rules
B.8.10  The NOPASS Statement
B.8.11  Coding PASSPROC Rules
B.8.12  Choosing the "Fetcher" Passproc
B.8.13  Other Actions in a PASSPROC Rule String
B.9  Defining Subfile Privileges
B.9.1  The Function of the Subfile Section
B.9.2  Basic Statements in the Subfile Section
B.9.2a  Subfile Selection for "Access Lists" of Accounts
B.9.3  The SECURE-SWITCHES Statement
B.9.3.1  Secure-Switches 1 and 2
B.9.3.2  Secure-Switch 2
B.9.3.3  Secure-Switch 3
B.9.3.4  Secure-Switch 4
B.9.3.5  Secure-Switch 5
B.9.3.6  Secure-Switch 6
B.9.3.7  Secure-Switch 7
B.9.3.8  Secure-Switch 8
B.9.3.9  Secure-Switch 9
B.9.3.10  Secure-Switch 10
B.9.3.11  Secure-Switch 11
B.9.3.12  Secure-Switch 12
B.9.3.13  Secure-Switch 13
B.9.3.14  Secure-Switch 14
B.9.3.15  Secure-Switch 15
B.9.3.16  Secure-Switch 16
B.9.3.17  Secure-Switch 17
B.9.3.18  Secure-Switch 18
B.9.3.19  Secure-Switch 19
B.9.3.20  Secure-Switch 20
B.9.3.21  Secure-Switch 21
B.9.3a  The SHOW SSW and SET SSW Commands
B.9.4  Security for Individual Elements and Indexes
B.9.4.1  Views: Element Security Defined in Packets
B.9.4.2  Advanced Features of the View Facility
B.9.4.3  Specific Effects of the View Facility
B.9.4.4  Priv-Tags and the CONSTRAINT and NOUPDATE Statements
B.9.4.5  The INPROC-REQ and OUTPROC-REQ Statements
B.9.4.6  Index Security: Priv-Tags and the NOSEARCH Statement
B.9.5  The SUBGOAL Statement
B.9.6  The SELECT-COMMAND Statement
B.9.7  The PROGRAM Statement
B.9.8  The SUBCODE Statement
B.9a  Defining File Access Privileges
B.10  SPIRES File Management
B.11  Logging Database Use in SPIRES
B.12  Immediate Indexing
B.12.1  Coding Immediate Indexes
B.12.2  Efficiency Considerations for Immediate Indexes
B.12.3  Immediate Indexing and Goal-to-Goal Passing
C  Additional Facilities for the SPIRES File Definer
C.1  Recompile of an Existing File's Definition
C.1.1  The Function of the RECOMPILE Command
C.1.2  Statements You Can Change, Add or Delete Anytime
C.1.3  Statements You Can Sometimes Change, Add or Delete
C.1.4  Statements You Can Never Change, Add or Delete
C.2  [Currently not used]
C.3  Synonyms
C.3.1  The Function of a Synonym
C.3.2  Defining a SYNONYM Index
C.3.3  Adding Synonyms to Index Records
C.4  Executable Elements: Protocols and TYPE=XEQ
C.5  Indirect Record-Access: Action 32 and SUBGOAL Processing
C.5.1  The Function of Action 32
C.5.2  Action 32: Problem One
C.5.3  First Solution to Problem One
C.5.4  Second Solution to Problem One
C.5.5  Third Solution to Problem One
C.5.6  Action 32: Problem Two
C.5.7  Solution to Problem Two
C.5.8  SUBGOAL Processing
C.6  Practical Techniques for File Definers and Managers
C.6.1  Introduction
C.6.2  Proximity Searching: Information for File Developers
C.6.2.1  Proximity Searching: File Definition Requirements
C.6.2.2  Proximity Searching: How it Works
C.6.5  Examination Of Index Entries
C.6.10  Automatic Accumulation of Record Modification Dates
C.6.11  Free Global Qualifiers
C.6.12  Record Protection By Account Number
C.6.13  Same-Structure Retrieval Through Indexes
C.6.14  Phonetic Search of Personal Names
C.6.15  Non-Unique INDEX-NAME Statements
C.6.17  The QUELEVEL and RES-LEVEL Statements
C.6.18  Record Size Limits and Split Records
C.6.19  Indexing Negative Integer or Real Values
C.6.20  Encrypting Data Values
C.6.21  DEFQ-Only Record-Types
C.6.22  File Security: ORVYL Data Sets
C.6.23  Creating a Deferred Queue on Another Account
C.6.23a  Creating a Duplicate Deferred Queue
C.6.24  Checkpoint Data Sets
C.7  Compiling File Definition Code from Several Sources
C.7.1  The RECDEF Subfile
C.7.2  The EXT-REC and EXT-LINK Statements
C.7.3  Comparing the DEFINED-BY and EXT-REC / EXT-LINK Statements
C.8  FASTBILD
C.9  File Definition Compilation Diagnostics
C.9.1  Determining the Location of Errors
C.9.2  ATCHFILE, INITFILE and Other Error Messages
C.9.3  Compile Diagnostics and Errors
C.9.3.1  * ACCOUNT MISMATCH
C.9.3.2  * AET TABLE OVERFLOW
C.9.3.3  * COMBINE RECORD INVALID
C.9.3.3.0  * NAME PCT TABLE FULL
C.9.3.3.0a  * PACKED CHAR TABLE FULL
C.9.3.3.0b  * ELEMENT TABLE FULL
C.9.3.3.0c  * USERPROC TABLE FULL
C.9.3.3.1  * LABL PCT TABLE FULL
C.9.3.3.1a  * IMT TABLE FULL
C.9.3.3.1b  * AET TABLE FULL
C.9.3.3.1c  * LABEL TABLE FULL
C.9.3.3.2  * TOO MANY RECORD TYPES
C.9.3.3.2a  * TOO MANY REAL RECORD TYPES
C.9.3.3.3  * TOO MANY SLOT TYPES
C.9.3.3.4  * XEQ TYPE MUST BE VARIABLE
C.9.3.4  * DEFINED-BY RECORD TABLES NOT FOUND
C.9.3.5  * EXTRANEOUS string
C.9.3.6  * FILE EXISTS
C.9.3.7  * FILE NOT AVAILABLE
C.9.3.8  * ILLEGAL 2ND PROPERTY
C.9.3.9  * INVALID ACTION CODE
C.9.3.9a  * ELEMENT MUST BE TYPE STRUCTURE
C.9.3.10  * INVALID ACTION SYNTAX
C.9.3.10a  * ILLEGAL ACTION SEQUENCE
C.9.3.10b  * INVALID ACTION GROUP
C.9.3.11  * INVALID CINDEX
C.9.3.12  * INVALID MNEMONIC
C.9.3.13  * INVALID NAME LEN
C.9.3.13a  * INVALID LOCATOR LENGTH
C.9.3.14  * INVALID REC NAME
C.9.3.15  * INVALID SEQUENCE
C.9.3.16  * KEY ELEMENT ERROR
C.9.3.17  * LENGTH VALUE > 255
C.9.3.18  * LENGTH NOT GIVEN
C.9.3.19  * DUPLICATE ELEMENT NAMES IN STRUCTURE
C.9.3.19a  * DUPLICATE VARIABLE NAME
C.9.3.20  * NO ELEMENTS ALLOWED WITH DEFINED-BY VALUE
C.9.3.21  * NO ELEMENTS IN RECORD
C.9.3.22  * NO ELEMENTS IN STRUCTURE
C.9.3.23  * NO RECORD TO COMPILE
C.9.3.24  * NON FIXED ELEMS FOR SLOT
C.9.3.25  * ONE UNIQUE ID PER RECORD
C.9.3.26  * PROCESSING RULE TABLES FULL
C.9.3.27  * PROCESSING RULE TABLE OVERFLOW
C.9.3.28  * RECORD KEY ELEMENT MISSING
C.9.3.28a  * IMMEDIATE INDEX IS ALSO IMMEDIATE GOAL
C.9.3.28b  * TREE-DATA VALUE INVALID
C.9.3.29  * INVALID SLOTCHECK VALUE
C.9.3.29a  * INVALID ELEM MNEMONIC
C.9.3.29b  * ACTION 32 RECORD recname NOT VALID
C.9.3.29c  * USEMPROC error
C.10  Processing Rule-String Procedures
C.10.1  The PROC and RULE Statements
C.10.2  The PARM and DEFAULT Statements
C.10.3  The SYMBOL and VALUE Statements
C.10.4  Processing Rule Procedure Examples
C.10.5  The EXTDEF Subfile
C.11  User-Defined Processing Rules: Userprocs
C.11.1  Coding Userprocs
C.11.2  Uprocs (Commands) Available in Userprocs
C.11.2.1  SET Uprocs
C.11.2.2  Block-Construct Uprocs for Execution Flow
C.11.2.3  Other Uprocs for Execution Flow
C.11.2.4  Uprocs for Setting User Variables
C.11.2.5  Uprocs for Terminal Input/Output
C.11.2.6  Miscellaneous Userproc Uprocs
C.11.3  Using Variables in Userprocs
C.11.3.1  System Variables Available Only in Userprocs
C.11.3.2  User Variables in Userprocs
C.11.4  Some Interesting and Unusual Uses for Userprocs
C.11.4.1  Retrieving Other Element Values with the $GETxVAL Functions
C.11.4.2  (*) Outproc Userprocs for Record Keys and Secure-Switch 13
C.11a  Virtual Elements
C.11a.1  Examining Virtual Elements
C.11a.2  Retrieving Other Elements for Virtual Elements
C.11a.3  The REDEFINE Statement
C.11a.3.1  (A Comparison of Rule A79, $GETxVAL, and Redefining Elements)
C.11a.4  Variably-Occurring Virtual Elements
C.12  Record-Types that Serve as Goals and Indexes; Goal-to-Goal Passing
C.12.1  Index Record-Types Used as Goal Records
C.12.2  Record-Types with Goal and Index Data
C.12.3  Linking Goal-Record Types: Goal-to-Goal Passing
C.12.3.1  Rules and Suggestions for Goal-to-Goal Passing
C.12.3.2  A Solution to the Example in C.12.3
C.12.3.3  Using Qualifiers in Passing to Create Goal Records
C.12.4  Double-Headed Files
C.12.5  Chain Passing
C.12.6  Goal-to-Same-Goal Passing
C.12a  Indirect Searching
C.12a.1  Coding Details for Indirect Indexes
C.12a.2  Uses for Indirect Searching
C.12a.3  Some Technical Details on how SPIRES Handles Indirect Indexes
C.12a.4  Dynamic Indexes
C.13  File Definition Information Packets
C.13.1  Element Information Packets
C.13.1.1  The ELEMINFO (INFO) Info-element
C.13.1.2  The NOTE Info-element
C.13.1.3  The DESCRIPTION (DESC) Info-Element
C.13.1.4  The HEADING (HEAD) Info-element
C.13.1.5  The COL-HEADING (COLHEAD) Info-element
C.13.1.6  The WIDTH (WID) Info-element
C.13.1.7  The ADJUST (ADJ) Info-element
C.13.1.8  The INDENT Info-element
C.13.1.9  The MAXROWS (MAXROW) Info-element
C.13.1.10  The EDIT Info-element
C.13.1.11  The VALUE-TYPE (VTYPE) Info-element
C.13.1.12  The INDEX Info-element
C.13.1.13  The USERINFO Info-element
C.13.1.14  The INPUT-OCC (INOCC) Info-element
C.13.1.15  The DEFAULT Info-element
C.13.1.16  The SUPPLIED Info-element
C.13.1.17  The RDBMS_COLUMN Info-element
C.13.1.18  The RDBMS_DATATYPE Info-element
C.13.1.19  The RDBMS_DATALENGTH Info-element
C.13.2  System Commands and Utilities Using Element Information
C.13.3  Index Information Packets
C.13.3.1  The INDEXINFO (INFO) Info-element
C.13.3.2  The NOTE Info-element
C.13.3.3  The DESCRIPTION (DESC) Info-Element
C.13.3.4  The SOURCE (SOU) Info-element
C.13.3.5  The VALUE-TYPE (VTYPE) Info-element
C.13.3.6  The TRUNCATE (TRUNC) Info-element
C.13.3.7  The EXCLUDE (EXC) Info-element
C.13.3.8  The USERINFO Info-element
C.13.4  System Commands and Utilities Using Index Information
C.13.5  Alternate Locations for Information Packet Definitions
D  Appendices
D.1  Actions: Complete Listing By Number
D.1.1  About this Chapter
D.1.2  Actions Used Only As SEARCHPROC Rules (A6 -- A16)
D.1.2.0.0.6  * A6
D.1.2.0.0.7  * A7
D.1.2.0.0.8  * A8
D.1.2.0.0.9  * A9
D.1.2.0.1.0  * A10
D.1.2.0.1.1  * A11
D.1.2.0.1.2  * A12
D.1.2.0.1.3  * A13
D.1.2.0.1.4  * A14
D.1.2.0.1.5  * A15
D.1.2.0.1.6  * A16
D.1.2.0.1.7  * A17
D.1.3  Actions Used as INPROC, OUTPROC, SEARCHPROC or PASSPROC Rules (A21 -- A66)
D.1.3.0.2.1  * A21
D.1.3.0.2.2  * A22
D.1.3.0.2.3  * A23
D.1.3.0.2.4  * A24
D.1.3.0.2.5  * A25
D.1.3.0.2.6  * A26
D.1.3.0.2.7  * A27
D.1.3.0.2.8  * A28
D.1.3.0.2.8a  * A28 for time
D.1.3.0.2.8b  * A28 for datetime
D.1.3.0.2.9  * A29
D.1.3.0.3.0  * A30
D.1.3.0.3.1  * A31
D.1.3.0.3.2  * A32
D.1.3.0.3.3  * A33
D.1.3.0.3.4  * A34
D.1.3.0.3.5  * A35
D.1.3.0.3.6  * A36
D.1.3.0.3.7  * A37
D.1.4.0.3.8  * A38
D.1.4.0.3.9  * A39
D.1.4.0.4.0  * A40
D.1.4.0.4.1  * A41
D.1.4.0.4.2  * A42
D.1.4.0.4.3  * A43
D.1.4.0.4.4  * A44
D.1.4.0.4.5  * A45
D.1.4.0.4.6  * A46
D.1.4.0.4.7  * A47
D.1.4.0.4.8  * A48
D.1.4.0.4.9  * A49
D.1.4.0.5.0  * A50
D.1.4.0.5.1  * A51
D.1.4.0.5.2  * A52
D.1.4.0.5.3  * A53
D.1.4.0.5.4  * A54
D.1.4.0.5.5  * A55
D.1.4.0.5.6  * A56
D.1.4.0.5.7  * A57
D.1.4.0.5.8  * A58
D.1.4.0.5.9  * A59
D.1.4.0.6.0  * A60
D.1.4.0.6.1  * A61
D.1.4.0.6.2  * A62
D.1.4.0.6.3  * A63
D.1.4.0.6.4  * A64
D.1.4.0.6.5  * A65
D.1.4.0.6.6  * A66
D.1.4.0.6.7  * A67
D.1.4.0.6.8  * A68
D.1.4.0.6.9  * A69
D.1.5.0.7.0  * A70
D.1.5.0.7.1  * A71
D.1.5.0.7.2  * A72
D.1.5.0.7.3  * A73
D.1.5.0.7.4  * A74
D.1.5.0.7.5  * A75
D.1.5.0.7.6  * A76
D.1.5.0.7.6a  * A76 for date
D.1.5.0.7.6b  * A76 for datetime
D.1.5.0.7.7  * A77
D.1.5.0.7.8  * A78
D.1.5.0.7.9  * A79
D.1.5.0.8.0  * A80
D.1.5.0.8.1  * A81
D.1.5.0.8.2  * A82
D.1.5.0.8.3  * A83
D.1.5.0.8.4  * A84
D.1.5.0.8.5  * A85
D.1.5.0.8.6  * A86
D.1.6  Actions Used Only as INCLOSE Rules (A122 -- A148)
D.1.6.1.2.2  * A122
D.1.6.1.2.3  * A123
D.1.6.1.2.4  * A124
D.1.6.1.2.5  * A125 assigned element
D.1.6.1.2.6  * A126
D.1.6.1.2.7  * A127
D.1.6.1.2.8  * A128
D.1.6.1.2.9  * A129
D.1.6.1.3.0  * A130
D.1.6.1.3.1  * A131
D.1.6.1.3.2  * A132
D.1.6.1.3.3  * A133
D.1.6.1.3.4  * A134
D.1.6.1.3.7  * A137
D.1.6.1.3.8  * A138
D.1.6.1.3.9  * A139
D.1.6.1.4.0  * A140
D.1.6.1.4.6  * A146
D.1.6.1.4.7  * A147
D.1.6.1.4.8  * A148
D.1.7  Actions Used Only as PASSPROC Rules (A161 -- A171)
D.1.7.1.6.1  * A161
D.1.7.1.6.2  * A162
D.1.7.1.6.3  * A163
D.1.7.1.6.4  * A164
D.1.7.1.6.5  * A165
D.1.7.1.6.6  * A166
D.1.7.1.6.7  * A167
D.1.7.1.6.8  * A168
D.1.7.1.6.9  * A169
D.1.7.1.7.0  * A170
D.1.7.1.7.1  * A171
D.1.7.1.7.2  * A172
D.2  Quick Reference to Processing Rule Functions by Number
D.2.1  Actions Used Only As SEARCHPROC Rules (A6 -- A16)
D.2.2  Actions Used as INPROC, OUTPROC or SEARCHPROC Rules (A21 -- A37)
D.2.3  Actions Used as INPROC, OUTPROC, SEARCHPROC, PASSPROC Rules (A38-A62)
D.2.4  Actions Used Only as OUTPROC Rules (A71 -- A85)
D.2.5  Actions Used Only as INCLOSE Rules (A122 -- A148)
D.2.6  Actions Used Only as PASSPROC Rules (A161 -- A171)
D.3  Quick Reference to Processing Rules by Function-Keyword
D.3.1  Binary Conversion
D.3.2  Character Test
D.3.3  Comparison and Generation of Elements
D.3.4  Date Generation and Conversion
D.3.5  Default Value Generation
D.3.6  Dollar-and-Cents Conversion
D.3.7  Floating-Point Conversion
D.3.8  General
D.3.8a  Hexadecimal Conversion
D.3.9  Insertion of String
D.3.10  Length Test
D.3.11  Multiple Occurrence Conversion
D.3.11a  Packed Decimal Conversion
D.3.12  Personal Name Algorithm
D.3.13  Range Test
D.3.14  String Replacement, Inclusion, Exclusion, Code Translation
D.3.15  Time Generation and Conversion
D.4  FILEDEF Subfile Syntax and Semantics
D.4.1  Understanding FILEDEF Subfile Structure
D.4.2  Record Level Elements
D.4.3  Record Definition Elements
D.4.4  Key Element Definition
D.4.5  Element Definition
D.4.6  Structure Definition Elements
D.4.7  Linkage Section Definition Elements
D.4.8  Individual Index Linkage Elements
D.4.9  Qualifier Linkage Definition Elements
D.4.10  Sub-Index Linkage Definition Elements
D.4.11  Compound Index Linkage Definition Elements
D.4.12  Subfile Section Definition Elements
D.4.13  Processing Rule Procedure Definition Elements
D.4.14  SLOT Section Definition Elements
D.4.15  User Defined Processing Section
D.4.16  Subfile Section Selection Elements
D.4.17  FILE-PERMITS Definition Elements
D.4.18  Element-Information-Packet Elements
D.4.19  Index-Information-Packet Elements
D.4.20  View Definition Elements
D.4.21  Version Information Elements
D.5  A Guide to Coding Index Record Definitions
D.6  A Guide to Coding the Linkage Section Definition
D.7  Annotated File Definition Examples
D.7.1  GA.SPI.BIBLIOGRAPHY
D.7.2  GA.SPI.BIBLIOGRAPHY2
D.7.3  GA.SPI.PEOPLE
D.7.4  GA.SPI.RESTAURANT
D.7.5  XA.B14.AV
D.7.6  GA.SPI.MAIL
D.8  Passing Keys or Passing Locators to Indexes
:29  SPIRES Documentation

*  SPIRES File Definition

******************************************************************
*                                                                *
*                     Stanford Data Center                       *
*                     Stanford University                        *
*                     Stanford, Ca.   94305                      *
*                                                                *
*       (c)Copyright 1994 by the Board of Trustees of the        *
*               Leland Stanford Junior University                *
*                      All rights reserved                       *
*            Printed in the United States of America             *
*                                                                *
******************************************************************

        SPIRES (TM) is a trademark of Stanford University.

A  Introduction to SPIRES File Definition

A.1  Preface

The intent of this manual is to enable anyone with a good knowledge of SPIRES searching and updating to define a functional SPIRES file and manage its contents.

The file definition process for most files is fairly straightforward: analyze the structure of the records you want to have in your data base, define the characteristics of those records in the file definition language, then test your definition with sample data records. Next, by analyzing the requirements for retrieving those records, define the indexes required and the way in which information from the data records will be passed to the indexes. After testing the searching capabilities, you may want to define various levels of access and protection for your file.

Before tackling a production file requiring complex techniques, experiment with a file of modest requirements. This manual is meant to accompany you through your first file definition; several appendixes give details of some powerful definition techniques, which rely on a previous mastery of the basics of file definition.

Since a file definition is itself a record in a SPIRES system-owned subfile, a knowledge of SPIRES searching and updating commands is essential for entering a file definition. This knowledge is also essential for assessing the searching requirements for a file: you must formulate your searching needs in terms of the SPIRES command facilities you understand. For example, you might overlook compound indexing techniques as a possibility for a file if you had never used the extended capabilities of compound indexes in searching a SPIRES file. An informed file definer is first an experienced SPIRES user; you are encouraged to investigate the SPIRES facilities you will need with the SPIRES consultant.

Many experienced users are not aware of the internal features of SPIRES that provide the command language. Therefore, Part A of this document is devoted to linking conceptually the internal file definition options to the external command language. A cursory view is presented of those facilities of the file definition language which are invisible to the general user.

The section following the overview gives a timetable for the file definition process, outlining the topics to be considered at each stage of the definition. It is tempting to define a file all at once, that is, to define indexes and access privileges while still trying to decide the length of certain elements. This approach is haphazard at best and confusing at worst. Try to approach the tasks of file definition in the order in which they are presented in this manual.

If you encounter problems in learning file definition, you may contact the SPIRES consultant in SCIP User Services at Polya Hall for assistance. Extensive assistance in defining files is available for a fee through Contract Programming services.

Intimately tied to a file may be various input and output formats, as well as protocols. Formats and protocols, whose command languages are described in other SPIRES documents, can be used to present an attractive, concise and helpful interface between your file and its users.

The original file definition manual was written by J. R. Schroeder; the present document was written by J. R. Sack.

A.2  System Overview

SPIRES, the Stanford Public Information Retrieval System, is a generalized, online data base management system developed at Stanford University in the early 1970's.

The task of the original SPIRES development was to provide a file and file management system for Stanford's library automation project (BALLOTS). The versatility of the present SPIRES design can be gauged by the diversity of applications SPIRES now supports. Since 1972 well over 300 data bases have been implemented, including such applications as BALLOTS, the library automation project; bibliographic citations files, such as PLANTBIO; student record management; document inquiry and preparation systems, using SPIDOC; program library maintenance, MASTERLIST; survey data, both geo-physical and astronomical; inventory and materials tracking systems; directories, catalogs, mailing lists, and others. Present files range in size from a few dozen records to over three-quarters of a million. In sum, SPIRES serves the database needs of a large and diverse computing community.

SPIRES users design and maintain their own data bases; there is no centralized data base administrator. A number of the data base applications noted above were defined by individual users, non-professionals in data base systems, largely without individual programming aid from the data base professional staff.

Files presently in the system vary widely in complexity from unindexed files with two elements (a protocols subfile, for example), through files with ten indexes for as many elements (a personnel file, perhaps), up to files with over a hundred elements and nested data structures (the MARC or FILEDEF subfile, for example). Many of the present files are made up of more than one subfile, with the file definition describing the interrelationships between subfiles in a single file.

The command language available to SPIRES users for manipulation and management of these data bases is described in the SPIRES/370 Searching and Updating Manual; only commands that relate specifically to file definition and management will be discussed in this manual.

A.2.1  SPIRES Command and Definition Languages

The "languages" that together support the development, management and use of data bases are:

Interactive Command Language

Most SPIRES users are familiar with this language, which is used for input and editing of data (TRANSFER, UPDATE, ADD), record retrieval (FIND, AND, OR, ALSO, FOR), and record display (SET FORMAT, SET REPORT, TYPE, OUTPUT, DISPLAY). The prospective file definer and manager should know the capabilities of the index and sequential searching commands.

Protocols Language

This facility is an extension of the command language; a protocol is a set of SPIRES, WYLBUR, MILTEN and ORVYL commands that make up a program, which can be executed by users needing guidance in manipulating a particular data base. Using protocols you can extend the normal SPIRES command language, tailoring the interface of specific users to specific files. This tailoring is particularly valuable when the end user of a production file has no special training in SPIRES commands.

Protocols are developed and tested interactively, and feature string manipulation and arithmetic functions as well as condition testing and branching capabilities for sophisticated interactive dialog and data base manipulation.

Format Definition Language

Any file user can provide formats for input and/or output of any data base's contents by defining a set of data transformations that map information from a source (a data base or terminal, for example) to a destination (a terminal or data base).

By means of input formats, a user can be prompted for the records' element values, and be given helpful diagnostic messages and reprompts should any error conditions be raised. Also, input formats provide a tool for converting pre-existing machine readable data to SPIRES-suitable input form. With commands for arithmetic and string operations, condition testing and branching, input formats can perform complex data validation algorithms that are unavailable using even elaborate file-definition input editing rule sequences.

By means of output formats, products such as reports, directories and catalogs can be produced by mapping file element values, and any other computed values, onto a two-dimensional array that can be output onto a terminal, line-printer, or full-face CRT. Output formats can make SPIRES data base contents acceptable for use by batch programs, or can arrange the same data so it can be easily understood by an untrained user.

The formats facility should be considered an integral part of the file/user interface. Database contents can be graphically organized so that information structures and hierarchies are easily recognized by any user. For example, a bibliographic entry in the CATALOG file can be output in the form of a library catalog card so that any user would easily recognize which elements are the author, title, and call number by their place on the output screen. Using another format, information in a card catalog file can be selected for printing in the form of a call number on a book's spine label. A data base of computer charges can be used to produce a billing letter for an individual user and reports of system-wide charges for an accountant. Just as in an outline, formats can make use of indentation to highlight hierarchical relationships of elements in a record. Where data is logically understood as a table, table formats can be devised. Text such as catalogs can be output in a form suitable for photocomposition.

A.2.2  SPIRES Processors and Programs

There are several SPIRES processors that are important to the file definer and manager.

Online SPIRES

This is the SPIRES that most users know: the program with which most file searching and updating is done.

Batch SPIRES

This is an offline version of SPIRES that allows users to indicate a series of SPIRES commands that are to be executed during non-prime time blocks. Large searches and reports are also aided by this facility, since they are not hampered by the active file and user core size restrictions that are applied to interactive SPIRES users. Use of this facility is made easier by the OFFLINE file: user commands are added as a record to this file, then executed after file updating is done by JOBGEN (see below).

SPICOMP

This is an online program that compiles file definitions, format definitions, variable group definitions and protocols, returning error messages if the compilation process is not successful. Users no longer need SPICOMP, since all the compilation functions mentioned are incorporated into SPIRES.

JOBGEN

This is a batch program run every night against any file that has records in the deferred queue and does not have the NOAUTOGEN option set. JOBGEN submits a batch SPIBILD job (see below) that causes the passing of deferred queue records to goal and index records overnight, during non-prime time blocks; the same passing can be done online by SPIBILD. The file owner can issue online commands that cause JOBGEN to pass over the file on certain nights, or can indicate in the file definition that JOBGEN is not to be run on the file.
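
The selection rule described above can be pictured with a short Python sketch. It is a conceptual model only, not SPIRES code: the FileState class, its field names, and the skip_tonight flag are hypothetical names invented for this illustration, and the sample files are made up.

```python
from dataclasses import dataclass


@dataclass
class FileState:
    """Hypothetical summary of what a nightly job would need to know about a file."""
    name: str
    deferred_queue_count: int   # records waiting in the deferred queue
    noautogen: bool = False     # NOAUTOGEN option set in the file definition
    skip_tonight: bool = False  # owner asked for the file to be passed over tonight


def files_to_process(files):
    """Return the names of the files that would get an overnight SPIBILD job."""
    return [
        f.name
        for f in files
        if f.deferred_queue_count > 0 and not f.noautogen and not f.skip_tonight
    ]


# Made-up sample data: only file "A" qualifies.
files = [
    FileState("A", 12),
    FileState("B", 0),                     # nothing in the deferred queue
    FileState("C", 5, noautogen=True),     # NOAUTOGEN set
    FileState("D", 3, skip_tonight=True),  # owner asked to skip tonight
]
print(files_to_process(files))
```

A file is skipped if any one of the three conditions applies, which is why the test is a single conjunction rather than a chain of special cases.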

SPIBILD

This program can be called in either an online or batch form to pass records in the deferred queue of your file to the goal record and index record data sets.

FASTBILD

This is a batch program that greatly reduces the CPU time and I/O's necessary for adding a large number of initial records to a new (empty) file. A protocol is available to generate the necessary JCL.

FASTADD

This is a batch program similar to FASTBILD: it provides a facility for adding a large number of records to a file using less CPU time and fewer I/O's than SPIBILD. Unlike FASTBILD, it can be used on files that already contain data. A protocol is available to generate the necessary JCL.

Host Language Interface (HLI)

This facility allows access to SPIRES data bases through batch programs in PL/I, COBOL, and FORTRAN. The batch programs can provide input and/or process output from SPIRES files.

A.2.3  The User Interface

A SPIRES user can access any data base permitted to him or her through a powerful set of English-like commands. The same command language applies to all SPIRES files, in keeping with the general-purpose intent of the SPIRES system. Though the number of commands is large, a searcher develops a feel for the grammar implicit in SPIRES commands. These commands allow you to

At Stanford these command facilities are supplemented by the text-editor, WYLBUR, which provides commands for data entry, offline listings, and remote job entry. Terminal communication is the function of another service program called MILTEN. The timesharing monitor, called ORVYL, supports the file system in which SPIRES data bases are stored, and controls virtual memory and resource scheduling for interactive SPIRES users. These three subsystems, WYLBUR, ORVYL and MILTEN, are distinct from SPIRES, and SPIRES does not duplicate their functions. At other installations these companion services are provided by different text editors, communications controllers and timesharing systems.

The SPIRES commands used for searching and updating any data base are covered in a user's manual of less than 170 pages ("SPIRES/370 Searching and Updating"), and can be learned in a four-session course.

A.2.4  File Definition Concepts and Facilities

The primary data base unit familiar to SPIRES users is the "subfile." The subfile is a set of goal records optionally linked to index records. Speaking generally, we can say that a goal record is what you retrieve from a search request--a book in the CATALOG subfile, a restaurant in the RESTAURANT subfile, a file definition in the FILEDEF subfile.

The subfile name is specified in a section of the file definition, appropriately called the "subfile section," in conjunction with a list of the computer account numbers of users or groups of users for whom you specify various levels of access to your file. There are many levels possible. You may make the entire subfile available to the public for searching and updating (the RESTAURANT subfile is an example of this privilege level). You may make all records searchable and only some updatable (such as the PEOPLE subfile), or make only certain records available for search and update (such as the FILEDEF and FORMATS subfiles). You could even prevent a user from seeing, searching and/or updating certain elements in all of the goal records of a subfile. Of course, a subfile's use can be restricted to only one account (the file owner's) if you wish.

The file definer must be aware of the difference between a file and a subfile. The difference is best shown through examples: the DOCUMENTS, MASTERLIST, and MASTERLIST SHARE CODES subfiles are all part of a SCIP-maintained file of software resources; the MARC and CATALOG subfiles belong to a single file maintained by the BALLOTS Center.

It is often useful to put subfiles that share some common information together in a single file, allowing SPIRES to look up or cross-reference information in one subfile when it is needed for input or output operations in another. Information is not stored redundantly in each subfile, because look-ups between subfiles in the same file can be performed; such a method of linkage also makes file updating more convenient, since common information need only be modified once per file, not once per subfile. It might be useful to link together subfiles of student, teacher, and course goal records in a single file, looking up a student's identification number or name when printing out a teacher's course list, and looking up a course number or title when printing out a student's transcript.

Search and retrieval of a reasonable number of records (fewer than five million) through an index is accomplished in seconds, with a maximum of five disk accesses to retrieve records in a file of a million records. For most applications SPIRES uses a "B-tree" method of record access. The chapter on SPIRES File Structure contains more information on this. [See B.6.] Rapid retrieval is possible if the file definer has specified that selected information in the goal records be passed to index records. If an index was not built when records were first added to the file, it can usually be added easily to the file at a later date, by using the original goal records to pass the requisite information to that index. In cases where indexes have not been built, perhaps because the frequency of searching for a certain kind of information would not have warranted the cost of building or maintaining an index, the file can be searched sequentially for this information. It is also possible to obtain a subset of the file via an index search, then examine or process that subset sequentially (using global FOR). However, a sequential search is usually slower and more expensive than an index search.
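The difference between the two kinds of search, and the combined approach of narrowing by index and then examining the subset sequentially, can be sketched in Python. This is an illustration only, not SPIRES code: a dictionary stands in for an index, and the names `goal_records`, `name_index`, `index_search` and `sequential_search` are hypothetical.

```python
# Illustrative sketch only; real SPIRES indexes are B-tree structured files.
goal_records = {
    1: {"NAME": "Smith", "CITY": "Palo Alto"},
    2: {"NAME": "Jones", "CITY": "Stanford"},
    3: {"NAME": "Smith", "CITY": "Stanford"},
}

# Assumed shape of an index: indexed value -> keys of matching goal records.
name_index = {"Smith": [1, 3], "Jones": [2]}


def index_search(index, value):
    """Look the value up directly in the index; no goal record is scanned."""
    return list(index.get(value, []))


def sequential_search(records, element, value, keys=None):
    """Examine goal records one by one, optionally only a given subset."""
    candidates = records.keys() if keys is None else keys
    return [k for k in candidates if records[k].get(element) == value]


# Narrow by index first, then examine only that subset sequentially.
smiths = index_search(name_index, "Smith")
smiths_in_stanford = sequential_search(goal_records, "CITY", "Stanford", smiths)
```

The unindexed CITY element can still be searched, but every candidate record has to be examined; narrowing by the NAME index first reduces the number of records examined.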

When records have been retrieved, they can be manipulated in several ways: they can be displayed at the terminal using any predefined output format; they can be put into a user's work-area or "scratch pad" for manipulation by the text-editor; they can be made available to a batch program; they can become a source for a new series of SPIRES commands, such as a sequencing command; or they can be used to generate reports.

Keeping information in the data base current is done by removing records from, adding records to, or updating records in the file. For adding and updating, records can be moved between a subfile and the text editor's work-area using simple commands. Data is collected or modified using the text-editor's commands. SPIRES also supports use of a CRT in full face mode.

The integrity of the data is constantly verified and protected by SPIRES. Redundant information stored in each file block ensures the validity of the data inside. Modifications made by users to the data base do not take place immediately (though they are immediately visible online). Changes are held until the deferred queue records are processed by JOBGEN overnight; this reduces the chance of accidental loss of data base contents as a result of a system crash or user error.
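The idea of holding changes in a deferred queue while still showing them online can be sketched as follows. This is a simplified Python model, not a description of SPIRES internals: the class `DeferredQueueFile` and its methods are hypothetical, and the real queue is processed by JOBGEN or SPIBILD rather than by a method call.

```python
class DeferredQueueFile:
    """Toy model: updates wait in a deferred queue until it is processed."""

    def __init__(self):
        self.goal_records = {}  # stands in for the goal record data set
        self.deferred = {}      # pending changes; None marks a removal

    def update(self, key, record):
        self.deferred[key] = record

    def remove(self, key):
        self.deferred[key] = None

    def lookup(self, key):
        # The online view consults the deferred queue first, so a change
        # is visible at once even though the data set is untouched.
        if key in self.deferred:
            return self.deferred[key]
        return self.goal_records.get(key)

    def process_queue(self):
        # Stands in for the overnight pass of queued records.
        for key, record in self.deferred.items():
            if record is None:
                self.goal_records.pop(key, None)
            else:
                self.goal_records[key] = record
        self.deferred.clear()
```

Until `process_queue` runs, the goal record data set is unchanged, so a failure while changes are being entered does not touch it.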

Data can also be validated on entry by specifying certain tests that the input values must pass. These tests can be specified in the file definition, or in an input format or protocol used to add records to the file. They may be as simple as tests based on the number of occurrences of an element, an element's length, a required range for element values, or inclusion or exclusion of elements that contain certain characters. Editing of input data can also be specified: values can be converted to binary; personal names can be changed to a canonical form; the date or time of day can be supplied, and many other kinds of editing can take place. Similar to these input processing rules are index passing rules that specify how or what kind of information is passed from an element in the goal record to an element in an index. You can specify that groups of words, individual words or phrases, all words, or all words over a certain length be passed; you can also include or exclude a certain set of words, delete or nullify punctuation, or force capitalization, when an element or elements are passed to an index.
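As a rough illustration of an index passing rule, the Python sketch below breaks a value into individual words, deletes punctuation, forces capitalization, drops short words and excludes a given set of words. It is not SPIRES processing-rule syntax; the function `pass_words` and its parameters are hypothetical.

```python
import string

# Translation table that deletes all ASCII punctuation characters.
_DELETE_PUNCTUATION = str.maketrans("", "", string.punctuation)


def pass_words(value, min_length=1, exclude=frozenset()):
    """Return the words of `value` that would be passed to a word index."""
    cleaned = value.translate(_DELETE_PUNCTUATION)
    words = [word.upper() for word in cleaned.split()]
    return [w for w in words if len(w) >= min_length and w not in exclude]


passed = pass_words("The cat, the hat.", min_length=3, exclude={"THE"})
```

Here `passed` holds the two words CAT and HAT; both occurrences of "the" are excluded, and the punctuation never reaches the index.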

A.3  The Process of Defining a File

A complete file definition is never easy. But if you approach the process in the order outlined below, trying not to define ahead of your comprehension, you will avoid general confusion. If you test your first file definition at the different stages or "test points" indicated, you will be able to locate problems more easily and review the materials in a particular section of this manual or ask the SPIRES consultant in User Services for help.

Experienced file definers go through all of the following steps, and usually in the order presented.

A.3.1  Design Analysis

Analyze your data base with respect to the following:

a) What are the elementary parts of each "entry" in your file?

An entry such as a restaurant in the RESTAURANT subfile is usually called a "record." The logically related entries or records that are the goal of search requests in your file are called "goal records." For example, an entry or record for an individual restaurant in the RESTAURANT subfile is a goal record in that subfile.

The elementary parts of a record in the phone book are a name, an address, and a phone number. In a SPIRES file, each of these becomes an "element" and is assigned an "element name" in the file definition.

b) How many times does each element occur?

In most entries in a phone book, for example, each name has a single address and phone number. But if we were making a file of students and the courses they take, the "courses" element in the record might occur four, five, or six times, or perhaps not at all.

So, in your file perhaps, some elements must occur once or twice, some must occur at least once but perhaps many times, and some elements may be entirely optional. A file can have elements with any combination of these possibilities.

c) Do you know the length of some of the elements?

Elements like a telephone number can be fixed in length, just as a social security number can be. Elements like dates and most numbers are of fixed length if you tell SPIRES to change them to binary (a fixed internal form) before storing them. If some elements only have a limited number of values, you may want to have SPIRES turn the value into a fixed-length code.
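The effect of converting a number to a fixed internal form can be illustrated in Python. The four-byte width used here is an assumption made for the sketch; it is not taken from SPIRES, and `to_fixed_binary` is a hypothetical name.

```python
import struct


def to_fixed_binary(value):
    """Convert a numeric string to a 4-byte big-endian signed integer."""
    return struct.pack(">i", int(value))


short_form = to_fixed_binary("7")
long_form = to_fixed_binary("1234567")
```

As character strings the two inputs are one and seven bytes long; after conversion both occupy the same four bytes, so the element can be treated as fixed in length.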

The "length" of an element is always the length in bytes (or characters) as the value will be stored on disk. It is cheapest to process and store elements that are either fixed in occurrence (see "b" above) or fixed in length, or fixed in both occurrence and length. Elements that vary in either or both length and occurrence are more expensive to store and process. Optionally-occurring elements that may vary in length and occurrence are the most expensive to process and store.

d) Do some elements have only certain allowable values or forms?

SPIRES can be told to check the validity of input data if you can specify the criteria for validation. Elements like a phone number and a social security number have "-" in certain required places and are of a known length. Zip codes are of a certain length and contain only numerals. Course numbers at certain institutions might always be three letters, a blank, then three numbers. You can tell SPIRES to reject input to an "age" element when a value is greater than 100 or less than 0, or to reject input to a "number of children" element when a value is negative or greater than fifteen. You may want to supply automatically a default value if an element is not input, or override an input value if one is supplied: the date and time a record is added to your file can be supplied as input data by SPIRES.
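The validation criteria just described can be written out as simple tests. The Python sketch below only illustrates the criteria; in SPIRES they would be expressed as processing rules, and the function names here are hypothetical.

```python
import re

# Three letters, a blank, then three numerals, e.g. "CSC 101".
COURSE_NUMBER = re.compile(r"[A-Za-z]{3} [0-9]{3}")


def in_range(value, low, high):
    """True if `value` is an integer string with low <= value <= high."""
    try:
        number = int(value)
    except ValueError:
        return False
    return low <= number <= high


def age_is_valid(value):
    return in_range(value, 0, 100)


def children_is_valid(value):
    return in_range(value, 0, 15)


def course_number_is_valid(value):
    return COURSE_NUMBER.fullmatch(value) is not None
```

An input value that fails its test would be rejected rather than stored.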

e) Do some elements occur with other elements in a "structure"?

Elements that are grouped together form a "structure." Taken together, a street address, city, state and zip-code might be called an "address"; in a file in which several addresses occur in each record (perhaps a home address and business address in a mailing list file) there must be a way to associate or bind the first city input with the first state and zip-code input, and the second city with the second state and zip-code. This logical binding of different elements is a structure. The university affiliation of a professor may always be paired with his or her name in a subfile of conference participants, for example. Or, a job-code might always occur with a salary figure in a record of a person's employment history.

Structures can be nested within structures: the record of a student's grades for a single term, making up a course-number/grade structure, could itself be a structure that occurs several times in a goal record that is a student's transcript for several terms of work.
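The transcript example can be pictured with nested Python data classes. This is only a model of the grouping, not file definition language; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CourseGrade:
    """Inner structure: binds one course number to one grade."""
    course_number: str
    grade: str


@dataclass
class Term:
    """Outer structure: one term, holding several course/grade pairs."""
    term_name: str
    courses: List[CourseGrade] = field(default_factory=list)


@dataclass
class Transcript:
    """Goal record: the Term structure occurs once per term of work."""
    student_id: str
    terms: List[Term] = field(default_factory=list)


transcript = Transcript(
    student_id="12345",
    terms=[
        Term("AUTUMN", [CourseGrade("MAT 101", "A"), CourseGrade("ENG 110", "B")]),
        Term("WINTER", [CourseGrade("MAT 102", "B")]),
    ],
)
```

Because each grade lives inside the same structure occurrence as its course number, the second grade of the first term can never be confused with a course from another term.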

f) What single element is best suited to be the key of the record?

The key of the record is a unique identifier by which one record can be distinguished from all other records. The key must be chosen carefully, since it has many consequences: the element designated as the key must only occur once in each record and the value for that key must be unique among all the records in the subfile.

In a file with a goal record of employee data, the name of a single employee is not likely to be unique, but his or her social security number is unique. So the social security number may be the best choice for the key. In a file whose goal record is comprised of items ordered for a store, you might be tempted to use a purchase order number as the key, but if more than one item were listed on a purchase order then the goal record, which is the result retrieved from a SPIRES search (see "a" above), could not be individual items but would have to be purchase orders.

If you don't have an element that can be the key, that is, an element whose uniqueness could be guaranteed by the nature of your data, you can have SPIRES assign an integer or "slot" key for each added record; this technique is used by the RESTAURANT subfile, a "slot" subfile. SPIRES simply assigns the first record the key "1", the second record the key "2" and so on.

An "augmented key" can also be coded if you must use a non-unique key. SPIRES will simply place a suffix on any non-unique key you enter; this suffix will make the key unique. This technique is useful when personal names are used as keys, or when accession numbers are being assigned.
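Both techniques can be sketched in Python. The sketch is illustrative only: the names are hypothetical, and in particular the "-2", "-3" suffix format is an assumption chosen for the example, not the form SPIRES actually uses.

```python
import itertools


class SlotKeyAssigner:
    """Hands out consecutive integer keys: 1, 2, 3, and so on."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)

    def next_key(self):
        return next(self._counter)


def augment_key(key, existing_keys):
    """Return `key` unchanged if unique, else add a distinguishing suffix.

    The suffix format is hypothetical.
    """
    if key not in existing_keys:
        return key
    for n in itertools.count(2):
        candidate = f"{key}-{n}"
        if candidate not in existing_keys:
            return candidate


slots = SlotKeyAssigner()
first_three = [slots.next_key() for _ in range(3)]
```

With slot keys the user never supplies a key at all; with augmented keys the user supplies a value and the system makes it unique only when necessary.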

A.3.2  File Definition

Step One

Define your goal record elements using the file definition language to describe the data characteristics you determined in the above steps. [See A.3.1.] The language for goal record definition is described in the first three chapters of Part B, "Goal Record Concepts and Definition," "Goal Record Keys, Slot and Removed Records," and "Structures." [See B.1, B.2, B.3.]

Step Two

Add processing rules to the description of each element. These processing rules or "actions" are called INPROCS or OUTPROCS depending upon whether they affect the input or output of an element. Study "Processing Rules: INPROC, INCLOSE, OUTPROC" [See B.4.] and become familiar with the use of the appendices "Processing Rules: Complete Listing by Number," "Quick Reference to Processing Rule Functions by Number," and "Quick Reference to Processing Rules by Function-Keyword." [See D.1, D.2, D.3.]

Step Three

Test your basic goal record description. To do this you must first study "The FILEDEF Subfile and File Compilation." [See B.5.] Then:

Two additional chapters may be of help at this point: "File Definition Syntax and Semantics" and "Recompile of an Existing File's Definition." [See D.4, C.1.]

Step Four

Study "File Structure: Tree and Slot, Goal and Index Records" [See B.6.] in order to understand the structure of the ORVYL files that SPIRES has created according to your file definition. That study prepares you for defining your file's index records.

Step Five

Study "Understanding and Coding Index Records" [See B.7.] for an understanding of the various indexing techniques and when to use each: Simple Indexes, Qualifiers, Sub-Indexes and Compound Indexes. In that chapter you will learn how to describe the structure of the "index records" which are associated with the "goal record" you defined and tested previously.

Step Six

Study "Understanding and Coding the Linkage Section" [See B.8.] so you can use the file definition language to describe:

Step Seven

Make use of the appendices "A Guide for Coding Index Record Definitions" and "A Guide for Coding the Linkage Section" [See D.5, D.6.] to code the index and linkage sections of your file. These appendices provide (almost) guaranteed recipes for these two difficult-to-code parts of a file definition. You determine the indexing techniques you need, and use the recipes for index and linkage sections that are indicated.

Step Eight

Transfer your goal record file definition and add the index and linkage sections to it, update the definition, then erase and compile your file. (Before this, you may want to save any data that you have entered.) Use the online SPIBILD processor to pass information from the deferred queue and goal records to the index records you just defined, and build the indexes and goal records into a searchable file. Use the SPIRES searching and browsing commands to check your SRCPROCS and PASSPROCS, verifying that the indexes contain the information you intended.

Step Nine

Make use of the file definition language described in "Defining Subfile Privileges" [See B.9.] to specify accounts or groups of accounts that can: use the file, search it, update it, see only some elements, and update only some elements. Modify your file definition, adding this new code, and recompile it. You will be able to verify that you specified the correct privilege codes after the system FILEDEF file has been updated--that is, the day after you make any changes.

Step Ten

Make use of the file-manager commands described in "SPIRES File Management" [See B.10.] to monitor and control the status, activity and processing of your file.

A.3.2.1  The File Definer: A SPIRES Subsystem to Simplify File Definition

A subsystem of SPIRES, named the File Definer, can simplify the process of file definition. By using a concentrated language, based on a subset of the standard file definition language discussed in this manual, you can specify basic information about the file design, such as element names, which elements should be indexed, etc., and the File Definer will generate a complete file definition for you, saving you the trouble of writing and coding goal and index records and the linkage section.

The File Definer subsystem is available only when you are in SPIRES; to use it, you issue the SPIRES command ENTER FILE DEFINER. Below is a sample File Definer session:

The five lines of input shown would be used by File Definer to generate a file definition of about 30 lines, including a goal record definition with the elements NAME, PHONE and COMMENT, an index record definition for the NAME element, a linkage section and a subfile section. That is much simpler than coding the record definitions and linkage sections yourself. [See B.7, B.8.] The file definition generated may then be added to the FILEDEF subfile and compiled.

Most people writing file definitions will want to use the File Definer at some point because it relieves them from the tedious task of coding index record definitions and linkage sections. Even if you cannot code the entire definition using File Definer (it has some limitations, e.g., you cannot directly code SEARCHTERMS, SRCPROC and PASSPROC statements), you can use it to create a file definition ranging from skeletal to almost complete for any file.

Naturally, there is a great deal of educational value in writing an entire file definition (goal records, index records, linkage section and all) yourself, especially if you want to learn and understand SPIRES file structure. However, letting the File Definer do the tedious work and studying the file definition it generates can be educationally rewarding as well, especially if you do so as you read this manual.

The File Definer has its own reference manual, entitled "File Definer", which is written for people already familiar with the concepts and language of file definition as taught in this manual. A primer to the File Definer, aimed at people who primarily want to create a SPIRES file quickly but who are unfamiliar with file definition concepts and language, may be found in the SPIRES primer "A Guide to Data Base Development".

A.4  Glossary of Important File Definition Terms

A.4.1  Element

A data "element" is the smallest unit of named data known to SPIRES. Data elements (or "fields" as they are called in other systems) may consist of characters, numbers or bits; they may be fixed in length or varying in length. They may also be required to occur more than once or be completely optional. Elements are things such as a person's name, a social security number, a salary, or an abstract of an article.

A.4.2  Record

A "record" consists of a series of data elements and their values. Usually, the record is a collection of all the data elements that pertain to a single entity in the entire collection of data. Thus, a record could be made up of one person's name, address, social security number, and salary. Another record in the same collection of data would have the same elements, but for a different person.

A.4.3  Structure

Within a record, elements may be grouped together in "structures," which are referenced in the same manner as elements. For example, if a person has several offices and phone numbers, the office and phone number elements might be grouped or paired together in a structure to keep the proper phone number associated with an office.

Elements that are not in structures are called "record level" elements. Elements inside a structure are "lower level" elements with respect to the record level elements. Structures that are not inside of other structures are record level structures. Structures inside of another structure are lower level structures with respect to the containing structure.

A.4.4  Key

Each record in a collection of data maintained by SPIRES has a required singularly occurring data element known as the "key." Each key within a collection of goal records must be unique to the goal record--no two records in the same goal record collection can have the same key. The key element in a personnel goal record would probably be the social security number, since it will be unique for each person. If the data records themselves do not contain unique elements that are useable as keys, SPIRES can supply unique consecutive numbers as the values for the keys in a set of goal records.

A.4.5  Goal Record

"Goal Record" is the SPIRES term for a data record that could be found as a result of a SPIRES search operation. In a collection of data about restaurants, the goal records would probably be restaurants; in a collection of data about library holdings, a goal record would probably be a book. A single record retrieved from a SPIRES search is a goal record. All of the records that have the same structure as the retrieved record or records, and hence "could" have been retrieved by a search, are referred to collectively as "the goal record" or "the goal record data set."

A.4.6  Index Record

An "index record" consists of a series of data elements and their values, just as a goal record does. However, in index records one of the data elements contains as its value an internal pointer (or pointers) to a goal record (or records). An index built out of names in the goal record would contain one index record for each name that occurs in the goal record as well as a pointer to the goal record (or records) in which the name occurs. The user has no direct interaction with indexes, though they are used by searching commands.

A.4.7  Record-type

A "record-type" must be distinguished clearly from a "record." A record-type refers to a collection of records, and may refer to either goal records or index records. There are "goal record record-types" and "index record record-types." The record-type is a collection of records that all have the same structure. In a personnel file, the goal record of social security numbers, names and salaries makes up one record-type, while a name index and a salary index are two other record-types.

A.4.8  Index

An "index" is a collection of index records created and maintained by SPIRES; one usually does not manipulate them directly. Indexes act as a "go between" between a searching command and the goal records. The values in a search request are looked up in the index, and the values in the index point to particular goal records. This is similar to the index one might find at the end of a book: such an index contains words or concepts, and each word or concept has a list of the pages on which it occurs.

There are two types of indexes available: simple and compound.

A.4.9  Simple Index

A "simple index" (or more specifically, a simple index record-type) contains one record for each entry in the index. For each unique name in a personnel file, there is one record in a NAME index that points to each goal record containing that name. The key of a simple index is the thing being indexed, a person's name, for example. Simple indexes are cheaper to search than compound indexes.

A.4.10  Compound Index

A "compound index" may index several elements, and is usually used to index short numeric values or coded elements such as salaries and dates. In a compound index, there is one record for each element being indexed. If a personnel file has the elements salary, job-class, and date-hired all in a compound index, then the compound index will have three records (one for salary, one for job-class, and one for date-hired). Each record in the compound index contains all the values that exist in the goal records for a particular element; this is why compound indexes are not recommended for large files -- the compound index records become too large to be searched quickly. Compound indexes may be searched with all of the relational operators, but are more expensive to search than simple indexes.
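The difference in shape between the two index types can be sketched in Python. The sketch is a model only: dictionaries stand in for index records, the function names are hypothetical, and the assumption that each value inside a compound index record carries its own pointers is made for illustration.

```python
records = {
    1: {"NAME": "Smith", "SALARY": 30000, "JOBCLASS": "A"},
    2: {"NAME": "Jones", "SALARY": 45000, "JOBCLASS": "B"},
    3: {"NAME": "Smith", "SALARY": 45000, "JOBCLASS": "A"},
}


def build_simple_index(goal_records, element):
    """One index record per unique value; each points to goal records."""
    index = {}
    for key, record in goal_records.items():
        index.setdefault(record[element], []).append(key)
    return index


def build_compound_index(goal_records, elements):
    """One index record per element, holding all of that element's values."""
    index = {element: {} for element in elements}
    for key, record in goal_records.items():
        for element in elements:
            index[element].setdefault(record[element], []).append(key)
    return index


def compound_search(index, element, predicate):
    """Relational search: test every value stored in one compound record."""
    return sorted(
        key
        for value, keys in index[element].items()
        if predicate(value)
        for key in keys
    )


name_index = build_simple_index(records, "NAME")
compound = build_compound_index(records, ["SALARY", "JOBCLASS"])
well_paid = compound_search(compound, "SALARY", lambda salary: salary > 40000)
```

The simple index has as many records as there are unique names, while the compound index has exactly one record per indexed element; a search of the compound index must walk through all the values in that one record, which is why it grows costly for large files.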

A.4.11  Combined Record-type

Combined record-types are record-types that are stored in the same ORVYL file. The file owner specifies in the file definition which record-types are to be combined together; if no combinations are defined, then each record-type occupies its own ORVYL file. There is a system limit of 13 physical record-types, so if a goal record is to have more than eight indexes, some of them must be combined into the same ORVYL file. SLOT record-types may not be combined with any other record-type. Record-types that are physically combined are kept conceptually separate by the logic built into SPIRES. Record-types that are physically combined with each other are different "logical record-types," even though they may occupy the same "physical record-type" (the same ORVYL file). There is a system limit of 64 logical record-types.

A.4.12  Removed Record-type

A "removed record-type" has nothing to do with records that have been deleted from the file with the REMOVE command. Removed record-types may provide increased access efficiency for some data. SPIRES access efficiency depends on a large number of records being packed in a single file block. SPIRES provides the file definer with an option of keeping only the key of a record in the file block, plus a pointer to the remainder of the record's data. The remainder of the data is kept in the "residual data set." This is called "record removal" and allows many more (partial) records to be kept in a file block than if whole records were kept intact.
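Record removal can be pictured with a small Python model. It is a sketch under simplifying assumptions: the class `RemovedRecordType` is hypothetical, a dictionary stands in for the file block, and a list stands in for the residual data set.

```python
class RemovedRecordType:
    """Toy model: the block keeps key plus pointer; data lives elsewhere."""

    def __init__(self):
        self.file_block = {}  # key -> pointer into the residual data set
        self.residual = []    # stands in for the residual data set

    def add(self, key, data):
        self.residual.append(data)
        self.file_block[key] = len(self.residual) - 1

    def fetch(self, key):
        pointer = self.file_block[key]
        return self.residual[pointer]


phone_book = RemovedRecordType()
phone_book.add("SMITH", {"ADDRESS": "1 Main St", "PHONE": "497-4420"})
phone_book.add("JONES", {"ADDRESS": "2 Elm St", "PHONE": "497-0000"})
```

Each entry in `file_block` is small regardless of how much data the record holds, which is what allows many more keys to fit in one block.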

A.4.13  Subfile

A "subfile" is defined as one set of goal records, the indexes to those goal records, and the access and update restrictions that apply to the data elements. Among the record-types that are brought into association in a subfile, a clear distinction is made between goal records and index records, since the user can only manipulate goal records, not index records. If a goal record has no indexes built for it, then the subfile consists of only the goal record and the access restrictions to it.

A.4.14  File

Several subfiles may relate to the same data and may be placed in one "file" or data base. A file thus contains all the subfiles that relate to the same data. A user can only work on a single subfile at a time, even though there may be several subfiles defined in one file. It is also possible (and is frequently the case) that only a single subfile is contained in a file.

A.4.15  Hierarchy of File Definition Components

The following chart shows the relationship of the parts of a file. The chart depicts a single file, with two subfiles. The first subfile has three record-types: one goal record and two index records. The second subfile has only a single record-type, the goal record.

The goal record of the first subfile is composed of a record with a key and several elements. One of the elements is a structure, and is thus itself composed of elements. The first index record of the first subfile is a record with only a key and a pointer.

              ------  ----------   -------------    -----
                                   | goal record-> | KEY
              |       |            |               | ELEMENT
              |       | "goal      |               | ELEMENT     ---
              |       |  record    |               |    :        |
              |       |  record-   |               | STRUCTURE---|ELEMENT
              |       |  type"     |               | ELEMENT     |ELEMENT
              |       |   or    -> |               |    :        |--:
              |       | "goal      | goal record   |
              |       |  record    | goal record   |
              |       |  data set" |      :        ---
              |       |            |      :          :
              |       |            |-----------      :
              |   S   |            ------------
              |   U   |            | index record  |---
              |   B   | "index     |             ->| KEY
              |   F ->|  record    |               | POINTER
              |   I   |  record- ->|               |----
      FILE -->|   L   |  type"     | index record  [
              |   E   |   or       | index record  [
              |       | "index"    | index record  [
              |       |            |      :
              |       |            |-------------
              |       |
              |       |             -------------
              |       | "index     |
              |       |  record    |
              |       |  record-   |
              |       |  type"   ->|   as above
              |       |   or       |   for
              |       | "index"    |   index-record
              |       |            |   record-type
              |       |            |
              |       -----------  -------------
              |
              |       -----------   -------------
              |       |             |
              |       | "goal       |
              |       |  record     |
              |   S   |  record-    |
              |   U   |  type"    ->|  as above
              |   B ->|    or       |  for
              |   F   | "goal       |  goal
              |   I   |  record     |  record
              |   L   |  data set"  |  record-type
              |   E   |             |
              |       |             |
              ------- -----------   ------------

B  Defining a SPIRES File

B.1  Goal Record Concepts and Definition

To begin our consideration of goal record definition, let's take a telephone directory as our example; the structure of a directory is something with which we are all familiar. Certain assumptions we are going to make for our directory file will simplify its structure.

What information is stored in a telephone directory? Usually, and most simply, a name, address, and telephone number make up each entry.

B.1.1  Element Names, Occurrences, and Lengths

If we have a single "record" or entry that consists of the three elements, name, address, and phone number, how many times will each element occur in a single record? Let's look at the question this way:

Most likely (in the simplest case), the name and address elements will occur once and only once: for each name there is one address. But it would not be unusual for a person to have several phone numbers, so we don't know how many phone numbers to expect or allow for.

Now, what can we say about the length of each of these elements? Here is a review of what we have so far:

We really don't know the length of the longest possible name and address. We could probably specify a length that couldn't be exceeded, but SPIRES does not require us to. If you do not specify a length, SPIRES stores only the length of the value input, plus two bytes of information about the length. Let's not specify a length for the NAME and ADDRESS elements.
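The storage cost of a value whose length is not specified follows directly from that rule. The short Python sketch below assumes one byte per character; the names `LENGTH_OVERHEAD_BYTES` and `stored_size` are hypothetical.

```python
LENGTH_OVERHEAD_BYTES = 2  # the two bytes of length information per value


def stored_size(value):
    """Bytes needed for a varying-length value, at one byte per character."""
    return len(value) + LENGTH_OVERHEAD_BYTES


sizes = {name: stored_size(name) for name in ["Lee", "Smith", "Fotheringay"]}
```

A short name costs only its own length plus two bytes, so nothing is wasted on padding every name out to the length of the longest possible one.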

The question of the length of the phone number requires some decision; let's agree that a phone number is an eight character (or eight "byte") value, such as "497-4420". (If we wanted to include the area code with each number, then the value is thirteen bytes long: "(415)497-4420" for example.) Our "file definition" now looks like this:

B.1.2  File and Record Name Statements

Let's see what this file will look like in the SPIRES file definition language. The first thing we must "code" or specify is the name of the file. This name is always an alphanumeric string preceded by the file definer's account number in the form GG.UUU. This account becomes the only account that by default can modify or compile the file definition. The file name is coded first in the definition, and looks like this:

Here, GG.UUU is the account, and "DIRECTORY" is the name chosen for this particular file. The file name (including the account) may be up to 23 characters long (longer names will be truncated by SPIRES). No one but the file owner need ever see this name. This is not the name used to select the subfile.

A file consists of sets of records; each set is called a "record-type." Most often there is a goal record record-type and several index record record-types per subfile. To simplify our discussion at this point, we will call a record-type a "record." (Though this is not true, strictly speaking. Many goal-records make up a single goal-record record-type. [See A.4.] Each of these records has a unique name.) The goal record is often called "REC01", and its name is coded
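
A sketch of that statement, using the form quoted in B.1.9:

 RECORD-NAME = REC01;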

We will see later why this name is most common, and some circumstances in which you might want to choose a different name. [See B.2.2.]

B.1.3  Element Categories

Within each record we define that record's elements. All of the elements must be in one of three categories: FIXED, REQUIRED, or OPTIONAL. Elements are segregated into these categories by their occurrence and length attributes as follows:

Notice that "Fixed" and "Varying" for FIXED and REQUIRED refer to the length attribute of the element being defined, not its occurrence attribute.

The length of an element in the FIXED section of the record definition must be specified. If the occurrence is not specified for an element in the FIXED section, then the number of occurrences is assumed to be one. On the other hand, an element defined in the REQUIRED section of the record definition need not have either length or occurrence attributes specified; the element must occur--but its occurrence and length may vary from entry to entry. Elements in the OPTIONAL section need not have either length or occurrence specified, because that element may or may not occur in a given record.

If you do specify an occurrence for an element in the OPTIONAL section or the REQUIRED section, it has a special meaning, which is different from an occurrence specification for a FIXED element. If the number of occurrences is one, then the element is "singly occurring": for a REQUIRED element, this means that it must occur once and only once in each record; for an OPTIONAL element, this means that if it occurs at all, it can occur only once. If the number of occurrences is more than one, then the element is multiply occurring: for a multiply-occurring REQUIRED or OPTIONAL element, the number of occurrences is not checked. You may have SPIRES do a minimum and/or maximum occurrence check by specifying certain processing rules, usually A123 and A146. [See B.4.14.]

A small amount of storage space is saved when REQUIRED or OPTIONAL elements are specified as singly occurring.

Though not required for a valid file definition, an OPTIONAL section with a dummy element should be coded in every file or record definition for which an OPTIONAL section would not otherwise occur. By coding this section you have the flexibility to add elements to the record definition even after data has been stored in the file. Such elements are always added to the OPTIONAL section; the dummy element is never coded with length and occurrence attributes. [See B.1.7.] No more than 254 elements may be coded in an OPTIONAL section.

Remember that we decided "PHONE" is fixed in length, but is not fixed in the number of times it can occur, though it must occur at least once. Elements that must occur, but for which a firm occurrence count can't be specified, are placed in the REQUIRED section, even if they can be fixed in length.

We code the category name at the head of the list of elements it describes:

Note the order in which the categories must appear: FIXED, then REQUIRED, then OPTIONAL.

If you do not code any categories, all the elements will be OPTIONAL, with the exception of the key, which will be REQUIRED. [See B.1.4.]

B.1.4  Record Keys

In addition to placing each element in an appropriate category, we must choose an element that will be the "key" of the record. A key is required for every record (whether goal or index) defined in the file; it must be unique in value and occur only once in each record.

Now, in our phone directory, we would most likely pick the name as the key. What consequences does this have? The key of a certain record is a unique value for that record; no two records or entries in the file can have the same value for the key element. Thus, no two records in our telephone book could have the same name. (Here is where we allow ourselves to simplify with the assumption that no two people in our phone directory will have the same name. This would not be a realistic assumption for a real phone directory. The solution to this problem is found in the next chapter, "Goal Record Keys, Slot and Removed Records.")

In addition to being unique in value among the goal records, the key must always be singly occurring; that is, the occurrence attribute of the key must be one. For this reason, an occurrence number need not be specified for the key element. A key may be varying in length, such as the name in our phone directory. But a length attribute may be specified if it is known. In our phone directory, NAME would be coded as the key element as follows:
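
A sketch of that coding, assuming the KEY statement is written as in the phone-number record definitions of B.2.1. No LEN or OCC is given, since NAME varies in length and a key always occurs once:

   REQUIRED;
     KEY = NAME;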

The key element is always coded as the first element in the category in which it is defined. Since the key must be singly occurring, but may be fixed or varying in length, it is coded as the first element in either the FIXED or REQUIRED categories.

Let's review our definition:

Since we don't have any elements in the FIXED section, we don't code it.

B.1.5  Element Name, Occurrence and Length Statements

The next element to code is ADDRESS. For this element we can specify that it must occur once and only once, but we can't specify a length attribute. The name of an element is specified in the ELEM statement. (You are allowed to use ELEMENT instead of ELEM; however, SPIRES will change it to ELEM when you add it to the FILEDEF subfile later.) The occurrence attribute of an element is specified in the OCC statement. (Similarly, OCCURS, OCCURRENCE and OCCURRENCES may be used instead, though they will be changed to OCC.) The ADDRESS element would be coded in the REQUIRED section as follows:
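
A sketch of that coding, using the same statement as it appears in the record definitions of B.2.1:

   REQUIRED;
     ELEM = ADDRESS; OCC = 1;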

If we had not specified "OCC = 1", then ADDRESS could occur one or more times. (Since it is coded in the REQUIRED section, it must occur at least once if the occurrence attribute is not coded.)

We now must code the phone number element. For this element we can only say, "it must occur." We don't know how many times. In such a case, "OCC = 1" is not coded since this would limit the element to one and only one occurrence. We have decided that the phone number, as it will be stored on disk, is eight bytes (characters) long. The length attribute of an element is specified in the LEN statement. (LENGTH is also allowed but will be changed to LEN.) Since it may vary in occurrence, the phone number element is coded in the REQUIRED section thus:
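
A sketch of that coding, assuming the LEN statement is written as in the record definitions of B.2.1. No OCC statement appears, so the number of occurrences is left open:

   REQUIRED;
     ELEM = PHONE-NUMBER; LEN = 8;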

Remember that the length attribute, coded by "LEN =", is the length as the value will be stored on disk, which is not necessarily the length of the value as input when the record is added; processing rules, called "actions", can manipulate the input values.

If we specify "LEN = 8" for the PHONE-NUMBER element, then all element values stored on disk will be eight bytes long. If a value is input that is longer than eight characters, the record will be rejected for input, and an error message will be issued. If a value is input that is shorter than eight characters, SPIRES will pad it with blanks to a length of eight bytes. [To allow null values for an element's input, omit the LEN statement; otherwise, SPIRES will fill the entire length with blanks, which is not the same as a null value.] Manipulation of input values can be effected more intelligently when "actions" are coded. [See B.4.]

Embedded blanks are not permitted in element names. The special characters ".", "_", "-", and "$" are allowed; "-" may be embedded within a name but is not allowed as an element name by itself. [See "SPIRES Searching and Updating", section D.1.3.1, for more information about the "-" or "throw-away" element.] The length of an element's name is limited to sixteen characters.

B.1.6  Element Aliases

Long element names are often advisable for clarity, since the value coded in the ELEM statement is the name of the element used when records are displayed.

However, it is not convenient or sensible to enter a twelve-character element name for an eight-character value: "PHONE-NUMBER = 497-4420;". SPIRES allows you to give an element a long, descriptive name such as "PHONE-NUMBER" and refer to it by several other names, such as "P" or "PN". The file definer must indicate what these other names can be by coding "aliases" for the element names in the file definition. Here is how aliases are coded:
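
A sketch of one plausible coding, assuming the ALIASES statement named in B.2.2 takes a list of names; the comma separator shown here is an assumption:

     ELEM = PHONE-NUMBER; LEN = 8;
       ALIASES = P, PN, NUM;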

A phone number can now be entered by "P = 497-4420;" or, more simply, by "P 497-4420;" (since the "=" is optional). We have also allowed the aliases "PN" and "NUM", which are mnemonically more significant than the terse "P". No two elements can have the same alias at the record level or in a structure.

B.1.7  Dummy Elements; Comment Statements

Since we have no OPTIONAL section, we should code an "empty" OPTIONAL section with a single dummy element. (This will allow us to add elements to the record definition at a later date without invalidating data already stored.) This section is coded as follows:
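
A sketch of such a section, following the form used in the record definitions of B.2.1:

   OPTIONAL;
     ELEM = DUMMY;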

An item called "COMMENTS" may be coded for any element you define; no single comment can be longer than 1,024 characters.

Let's look at the record definition we have coded, adding aliases where they are useful:

B.1.8  Optional Statements: AUTHOR, MAXVAL, NOAUTOGEN and BIN

Other statements can be coded that will make the file definition more complete: AUTHOR, MAXVAL, NOAUTOGEN and BIN. They are coded after the FILE statement, which is the first statement in our definition.

B.1.8.1  (The AUTHOR Statement)

It is important to specify the AUTHOR statement in your file definition. In case it is necessary for the data base systems staff to contact you, the AUTHOR statement should supply the necessary information. This element is usually coded after the FILE element, and is a free-form text string:

B.1.8.2  (The MAXVAL Statement)

Another file-level element is necessary for some applications, particularly those involving long text strings such as bibliographic and abstract files. If any element values in your file will be longer than 4,096 bytes, you must code the following in your file definition:
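
A sketch of the statement, assuming the usual "STATEMENT = value;" form. The figure 10000 is an arbitrary illustration, chosen only because it lies above the 4,096-byte threshold and below the 32,760 maximum:

 MAXVAL = 10000;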

The value specifies the maximum data length for any single occurrence of an element in the file. MAXVAL cannot exceed 32,760. Also, no single record in a SPIRES file can be more than 120,000 characters long.

The MAXVAL limit also applies to values processed by actions A44 and A48 and by the SET VALUE Uproc in Userprocs. [See C.11.1.1.]

B.1.8.3  (The NOAUTOGEN Statement)

An optional element "NOAUTOGEN;" may be coded in your file definition if you do not want SPIBILD automatically (i.e., nightly) to pass records from the deferred queue to the goal and index records. Normally, every night that there are records in the deferred queue, JOBGEN will generate a job to build or process them into the goal and index record data sets. Every time this job runs, a certain amount of overhead for job scheduling and initiation is incurred. With only a small number of records to be processed (say, fewer than 5), this overhead is a significant percentage of the job cost.

However, if NOAUTOGEN is coded, you must explicitly cause this job to be submitted by issuing the online SET AUTOGEN command, perhaps after allowing several records to accumulate in the deferred queue. JOBGEN will generate a SPIBILD job that night, and then reset the file to the NOAUTOGEN condition. If NOAUTOGEN is not coded, then you must take specific action to prevent overnight processing; SET NOAUTOGEN can be issued to prevent the generation of this job until you explicitly SET AUTOGEN in SPIRES or PROCESS the file in SPIBILD.

B.1.8.4  (The BIN Statement)

You may code the bin number to which you wish output from SPIRES-generated jobs to be sent. Output from compilations and automatic file building (JOBGEN) will go to the bin specified; if no bin is coded, then such output will be directed to the default bin of the file owner.

If you code PURGE for the bin, then the output will be purged if there were no batch requests processed by SPIBILD and if no errors occurred during SPIBILD processing. Otherwise, the output will be sent to the file owner's default bin. Coding PURGE is recommended because it generates output only in the event of a SPIBILD problem or a batch request, thus saving you printing charges.

If you code HOLD for the bin, the output will be directed to the default bin of the file owner but the output will be held. The file owner can fetch the output and then either purge it or release it for printing.

The bin is coded in your file definition like this:
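
A sketch of the statement, assuming the usual "STATEMENT = value;" form:

 BIN = nnn;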

where "nnn" is the number of the bin or HOLD or PURGE as described above.

B.1.9  Statements in the Subfile Section

Though our goal record definition is now complete, there are several other things that must be coded to complete the definition of the file itself. (Remember that a file definition usually, but not necessarily, contains several record definitions.)

As noted earlier, the file name is almost never seen by the user; what the user sees is the subfile name, which is coded as the first statement in the "subfile section" of the file definition. The subfile section (or sections) follows at the end of the last record description.

Embedded blanks are allowed in the subfile name. Since this name is typed in a SELECT command, it should not be very long or otherwise difficult to type. The maximum length for a subfile name is thirty-two characters, including blanks.

The second statement in the subfile section identifies the record that will be the goal record when the subfile is selected. Because we only have one record name for our single record definition, this may seem redundant. But since most subfiles have multiple record descriptions--usually one goal record and several index records--SPIRES must be told explicitly which record is the goal record. This statement is coded as follows:

Remember that "REC01" was the name of the record we described and named by the statement "RECORD-NAME = REC01;".

Now we must specify what accounts are permitted to select the subfile whose name is given by the "SUBFILE-NAME" value immediately preceding.
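
A sketch of an ACCOUNTS statement naming a single account, assuming the usual "STATEMENT = value;" form; GG.UUU stands for the file owner's account:

   ACCOUNTS = GG.UUU;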

This permits access to the subfile only to the account specified. At a minimum, the file-owner's account should always be specified; if it is not, then the file owner must issue the ATTACH command to use the subfile.

You can permit more than one account by coding other account values:

To permit all group "GG" accounts (but not "GA" accounts), you would include "GG...." in the ACCOUNTS value. To make a subfile public, you specify "PUBLIC" as the ACCOUNTS value. The matter of controlling access to SPIRES subfiles is detailed in "Defining Subfile Privileges." [See B.9.] A complete subfile section can be coded like this:

B.1.10  A Complete File Definition

Here is what our complete phone directory looks like when coded in the file definition language:

The indentation shown is for the sake of clarity; you can use any indentation that is helpful to you. Also, an element's name, occurrence, length and aliases need not be defined on a single line; in fact, when SPIRES displays your file definition, each of these will be on a separate line, with indentation used to structure the definition for easy reading.

B.2  Goal Record Keys, Slot and Removed Records

B.2.1  Record Keys

Let's consider another way of defining a telephone directory file. Suppose we made the telephone number the key of the record. What would be the impact on the file? Here is a record definition in which the key is the phone number; we have also allowed name and address to occur more than once by not specifying any OCC limits.
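
A sketch of such a record definition, assembled from the statements used in the two-column example later in this section, with no OCC statements on NAME or ADDRESS:

 RECORD-NAME = REC01;
   FIXED;
     KEY = PHONE-NUMBER;
       LEN = 8;
   REQUIRED;
     ELEM = NAME;
     ELEM = ADDRESS;
   OPTIONAL;
     ELEM = DUMMY;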

Such a directory would give you access to all the users of a particular phone number; if one person had two different phones, the name would be in two different records, each record's key being one of the phone numbers. A directory keyed on the phone number might not be useful to someone looking for John Jones' phone number, but it would be useful to someone looking for the owner of phone number 497-4420, which has been reported out of order, perhaps.

Notice that this dramatically changes how we look at or use the file. Now, all the people sharing a single office extension can be found, but one person's phone number can't be found as directly as it was in the directory keyed on name. The goal of the search--either names, as in the previous case, or phone numbers as in the present example--determines the choice of key.

Since it is unlikely that one phone number could be at more than one address (though some businesses have "extensions" in several buildings), we will code "OCC = 1" for the occurrence attribute of the ADDRESS element. But it is very likely that more than one person could be listed for each phone. For this reason, we will not code any occurrence attribute for the NAME element: we simply don't know how many times this element will occur. The occurrence of a record key must be one; the length of a record key must never be greater than 240 bytes, whether the length is fixed or not, whether it is the key of a goal record or index record.

The present definition, keyed on phone number, is different in another way from the definition keyed on name: the phone number, which was in the REQUIRED section (varying in length, required to occur) in our record keyed on name, is now in the FIXED section. The phone number was multiply occurring before, though it was fixed in length; now, since it is the key of the record, it is required to occur exactly once. Elements whose occurrence and length attributes both can be fixed are usually coded in the FIXED section.

There may be reasons why you would choose not to put a fixed length and occurrence element in the FIXED section of a record definition. Let's look at two record definitions for a phone directory keyed on phone number; we will add an element for zip code, which seems to belong in the FIXED section, being fixed in both length and occurrence.

 RECORD-NAME = REC01;              RECORD-NAME = REC01;
   FIXED;                            FIXED;
     KEY = PHONE-NUMBER;               KEY = PHONE-NUMBER;
       LEN = 8;                          LEN = 8;
     ELEM = ZIP-CODE; OCC=1;
       LEN=5;
   REQUIRED;                         REQUIRED;
     ELEM = NAME;                      ELEM = NAME;
     ELEM = ADDRESS; OCC = 1;          ELEM = ADDRESS; OCC = 1;
                                       ELEM = ZIP-CODE; OCC = 1;
                                         LEN = 5;
   OPTIONAL;                         OPTIONAL;
     ELEM = DUMMY;                     ELEM = DUMMY;

In the standard SPIRES output format, "element mnemonic = value", the elements in a record are output in the order in which they are defined: FIXED, REQUIRED, then OPTIONAL elements. (If an element occurs more than once, its occurrences are output in the order in which they were input.) Standard record output formats for each of the above definitions might be as follows:

  PHONE-NUMBER = 497-4400;         PHONE-NUMBER = 497-4400;
  ZIP-CODE = 94305;                NAME = USER SERVICES;
  NAME = USER SERVICES;            ADDRESS = POLYA HALL 117;
  ADDRESS = POLYA HALL 117;        ZIP-CODE = 94305;

So, for readability, you may want to put ZIP-CODE in the REQUIRED section of the record definition. But if you are certain to define an output format, there is no need to consider this problem.

B.2.2  Slot Keys

We have just discussed the importance of choosing the best element for the key of the record. Let's look at situations in which the choice of unique key may be difficult or impossible.

Suppose our SPIRES file were going to be a collection of abstracts from scientific journals. Our record definition might be as follows (note that the key is not specified):

Now, if we wanted a search to retrieve the list of journal abstracts in which the words specified in the search request appeared, the goal record would be "articles." A search request for such a file would look like this:

How would we go about choosing a key for an "article" goal record? None of the elements defined above is very likely to be entirely unique. We could contrive a unique key by concatenating portions of the JOURNAL, YEAR and PAGE elements: NG.76.202, for example, could signify page 202 of a 1976 issue of National Geographic. However, such a key would not be convenient to enter or use. (See the "Structured Key" processing rule, A33, for one solution.)

SPIRES has a more elegant solution to the problem of a lack of a natural key. If you specify that a record type is "SLOT", SPIRES will assign a unique integer key to each record added; these keys start at one and will be incremented by one as each record enters the file. SPIRES always stores a slot key as a four-byte binary number.

This simple solution could lead to problems: suppose you typed a command such as "remove 197" when it was actually record 187 that was to be removed. The file definer can protect against this kind of error in a slot file. SPIRES allows you to specify that a "check digit" be appended to each integer slot number as the record is added to the file. A check digit is a single digit that is appended to the right end of a number; it is computed by performing multiplication and addition operations on each digit of the original number, and then adding and subtracting the resulting sum to yield a single digit. Since this digit is computed from the other digits in the number, the original number's digits can be verified by seeing that the final digit is correct for a number you type at the terminal.

For example, a record in your file may have the key "2757", of which the final digit, "7", is the check digit. The value "2657" would not be a valid key, however, since the first three digits, "265", require (or compute to) a different check digit than "7". Thus, each digit becomes significant in computing the check digit, and most typographical errors in specifying a record's key (such as typing "2657" instead of "2757") will be caught when the system attempts to verify the check digit. Note that the record whose key is "2757" is not the two thousand seven hundred fifty-seventh record in the file, but the two hundred seventy-fifth; the final digit is the check digit. The system does not store the check digit with the key, but computes it each time you display the key--the digit is always shown when the key is displayed. (If you "look up" the key of a record using action 32 [See C.5.] and intend to display it, you must explicitly code a processing rule to have this digit displayed.)

A check digit is requested by coding "SLOTCHECK" on a SLOT record. On all commands requiring a key, the check digit, first computed by SPIRES, is appended to the record key by the user and recomputed and validated by the system. This digit functions similarly to a parity bit in tapes, verifying that the data is valid. The method SPIRES uses in computing the check digit is described in detail in the description of action 27. [See D.1.3.0.2.7.]

The default check-digit formula is called the Mod-11 rule; it can be explicitly requested by giving the value "0" to the SLOTCHECK statement ("SLOTCHECK = 0;"). Other formulas, described in the description of action 27, can be requested by coding different integers on the SLOTCHECK statement:

No KEY statement is coded for a slot record, since the slot number (with a check digit if one is requested) is the key of the record. SLOT and SLOTCHECK are coded as part of the record definition as follows. (REMOVED will be explained in the next section of this chapter.)
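
A sketch of such a record definition, assuming SLOT, SLOTCHECK and REMOVED are each coded as a bare statement. Their relative order here is an assumption, and the elements of the record are omitted:

 RECORD-NAME = ENTRY;
   SLOT;
   SLOTCHECK;
   REMOVED;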

You will notice that "RECORD-NAME = ENTRY" was coded, instead of "RECORD-NAME = REC01" as before. In a slot record the name of the goal record key is determined by the value of the "RECORD-NAME" statement. One caution should be observed: the value of this element should be lower in alphabetical sequence than any other "RECORD-NAME" statements you code since the record definitions are displayed in alphabetical sequence by RECORD-NAME. In general, it is not good to code "RECORD-NAME = GOAL". If you wish to have the goal record key named something other than the RECORD-NAME, then following the "SLOT" statement, code the "SLOT-NAME = name" statement. You may also code an ALIASES statement for the slot key. [See B.1.6.]

The SLOT statement may also have a numeric value, representing its priv-tag number. [See B.9.4.4.]

A file may not have more than eight slot-type record definitions. You may, under certain circumstances, want to define a goal record as "slot" even when a natural key exists. The advantages and disadvantages of such a scheme must be weighed carefully, and are described below.

SPIRES treats slot type goal records in a special way; it keeps them in a data set that is organized sequentially rather than tree structured. If all of the elements in a slot record are fixed required (coded in the FIXED section of the record definition), then the amount of space each record requires is known exactly; when SPIRES goes to retrieve such a record, it "calculates" the record's position and goes directly to that location--it does not have to account for the varying size of each goal record stored. Thus, the major advantage of slot organization of fixed required elements is that record retrieval goes much faster (retrieval is not to be confused with searching, the process that usually precedes retrieval). A second significant advantage is that, since the goal records are structured sequentially, sequential searching by global FOR commands is faster.

The disadvantages of forcing your data base structure into a fixed slot-type record format are actually inconveniences; the file definer must decide if these inconveniences are acceptable.

One inconvenience is that records must be referred to by their slot number in TRANSFER, REMOVE, UPDATE and DISPLAY commands. In a personnel file keyed on social security number, you could remove or update an employee's record by simply giving the person's social security number. If this file were defined as a slot file, you would first retrieve the record by using a FIND command against a social security index, then TRANSFER the record retrieved using a global FOR command. As you can see, the extra expense of building a social security number index would have to be incurred.

A second inconvenience is the loss of verification that the social security number of each person was unique; if social security number were the key of the record, SPIRES would verify that no records had the same value for the key. Generally, when a natural key exists, SLOT organization is not used.

If you expect to do a lot of sequential searching of the data base using global FOR commands, then consider making the record elements FIXED and the record SLOT. How is this done? If most of the elements in a record are of fixed length and fixed occurrence, you can consider having SPIRES store a variable length element such as NAME as a fixed length element; you would choose the largest possible length for the length attribute of each variable length element. But if a record has an element that can vary greatly in length, such as ABSTRACT in our article goal record above, we would not want to waste storage space by fixing the length of this element at its longest possible value.

If not all of a SLOT record's elements are coded in the FIXED section, then the record must be "removed." In the following section we will see what record removal means, and how it is coded.

B.2.2a  SLOT-START Statement

SLOT-START is a new field in the SLOT structure of record definitions, both in FILEDEF and RECDEF. SLOT-START serves a single purpose: enabling a simple way to generate keys that begin at a particular value.

Suppose you would like to generate Slot records whose first key value is something other than 1 -- say 9000000. You want the first record to have a key of 9000000, the second 9000001, then 9000002, etc. The ORVYL system on the Stanford mainframe stored the key of 9000000 in block 35573 of the record-type data set (assuming the block size is 2048 and the record-type is REMOVED). This situation poses no real disadvantages in mainframe SPIRES because ORVYL only writes a single block 35573 resulting in a data set that has two blocks -- block 0 and block 35573.

But for Unix SPIRES this would be a different matter. If you added the first record of 9000000 to a slot record-type in that system, then SPIRES must fill in the gap between 0 and 35573. Not only does this represent "wasted" space, but it also means 35572 extra block read requests should you attempt a sequential scan (e.g., FOR SUBFILE / DISPLAY ALL) of the subfile.

If you wish to take advantage of this option then you should code the "SLOT-START = number" statement immediately following the "SLOT" statement in your record definition. If the subfile is a NEW subfile then your work is done and the first record added to the subfile will have a key of "number".
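
A sketch of that placement, using the starting value from the example above; the record name ENTRY is borrowed from the slot example earlier in this section, and the other statements of the record are omitted:

 RECORD-NAME = ENTRY;
   SLOT;
   SLOT-START = 9000000;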

If the subfile already exists and has SLOT keys that begin from a different value and you wish to take advantage of this new option, then you should RECOMPILE using the REBALANCE option, following the same recipe that you use to rebalance a tree data set using the CONVERT option. [EXPLAIN RECOMPILE COMMAND, WITH REBALANCE OPTION.]

Here are some answers to other questions you might ask:

B.2.3  Removed Record-Types

In our first example of a file, a telephone directory keyed on name, the length of a single record is approximately the sum of the lengths of the individual elements. It would probably not exceed fifty bytes or characters: the NAME and ADDRESS elements may take twenty bytes each, and the PHONE-NUMBER element takes an additional eight bytes.

This means that in one block of ORVYL storage, which is 2048 bytes, approximately forty records could be stored (2048 divided by 50 is just under 41). SPIRES access efficiency depends to a great extent upon the number of records each block contains. If the average number of records per block is eighty, then one record in 512,000 can be retrieved by accessing three blocks or fewer, if the data is structured in tree fashion, since 80 x 80 x 80 = 512,000. If the average number of records in a block is only twenty (each record averaging one hundred bytes in length), then five accesses may be necessary, since four levels of twenty reach only 160,000 records. If the number of records per block drops to sixteen or fewer, then efficiency of record access seriously degenerates, since many file I/O's may be necessary to locate a particular record.

In order to keep access efficiency high, SPIRES provides the file definer with the option of removing large record types (remember that a "record type" is our "REC01," a "record" is an entry in the file), say of 60 bytes or more, from the goal record data set to a "residual data set." This removal is done for all records in a record type, and may be specified for any record-type, whether large or small in size. Only a key and a pointer to the record's location in the residual data set remain in the goal record data set when you specify record removal.

Tree (or "non-slot") record types, such as our telephone directory, should usually be removed, since the size of an entry is often over forty bytes. You can specify that a record type's contents are to be removed to a residual data set when you code the record type's name:
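
A sketch of that coding, assuming REMOVED is a bare statement placed directly after the RECORD-NAME statement; the rest of the record definition is omitted:

 RECORD-NAME = REC01;
   REMOVED;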

Slot record types, which use a different access technique from tree structured record types, are always removed unless all elements are fixed in length and occurrence. Slot record types, such as our articles file, can specify removal to a residual data set as follows:

Slot and tree structured data sets may be mixed in a file or data base.

Some rules-of-thumb can be stated for record removal. Remove record types if any of the following are true:

B.2.4  Monotonic Record-Types

A "monotonic" record-type is a record type for which keys will be added in monotonic order. This simply means that the key of every new record will be higher or lower (either alphabetically or numerically) than the key of the previous record added.

For example, suppose a business receives materials and assigns a key to a record in a SPIRES file. The key could be a concatenation of the year and an integer number like a SLOT number:

In this example, it is apparent that the key of a new record will always be greater than the key of all previous records. Because of this, the record type should be declared as MONOTONIC. The MONOTONIC statement is coded at the record level as follows:

MONOTONIC cannot be declared for SLOT record types.

The MONOTONIC statement provides for more balanced growth [See B.6.3.] of a file when keys are added in monotonic order; it causes SPIRES to pack each file tree block full before beginning a new block. Normally, SPIRES leaves some room for growth in each block; this is because a new key is often added that falls between two existing keys. In a MONOTONIC situation, however, this is not the case; new keys always fall at the end of previous keys.
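To see why full packing matters, here is a rough Python sketch of the block counts involved. The amount of room SPIRES normally leaves in a block is not stated in this manual, so the reserve percentage below is a purely hypothetical parameter, as are the function name and the sample figures.

```python
def blocks_needed(key_count: int, keys_per_block: int, reserve_percent: int) -> int:
    """Blocks required when each block keeps reserve_percent of its room free.

    reserve_percent is a hypothetical stand-in for the growth room that
    is normally left in a block; 0 models MONOTONIC packing.
    """
    usable = keys_per_block * (100 - reserve_percent) // 100
    if usable < 1:
        raise ValueError("reserve leaves no usable room in a block")
    return -(-key_count // usable)  # ceiling division


# 1200 keys, 40 keys per block: packed full versus 25 percent held back.
print(blocks_needed(1200, 40, 0))
print(blocks_needed(1200, 40, 25))
```

With keys that always arrive at the end, the room held back is never used, so the packed layout needs fewer blocks for the same keys.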

Since MONOTONIC only specifies the way tree blocks are packed, and not an access method (SLOT determines both a packing method and an access method, for example), it is possible to recompile a file definition after records have been added and either add or delete the MONOTONIC statement. [See B.5.9.] It is also possible to add new keys in between previous keys, even if MONOTONIC has been specified. If this is frequently necessary, then the MONOTONIC statement should be deleted and the file recompiled.

B.3  Structures

B.3.1  Data Structuring

Let's look at a phone directory file for a doctors' answering service. The file should allow the answering service operator to find one of several phone numbers a doctor has left for emergency calls, then choose the one that matches the time of day. We will make this a slot file to avoid the problem of duplicate names. Since all of the elements don't appear in the FIXED section, we must specify that the slot goal record be removed to a residual data set.

A record as entered and retrieved from this file would look like the following:

    ENTERED:                         RETRIEVED:

    NAME = Taylor, Paul;             NAME = Taylor, Paul;
    STREET = Ash Lane;               OFFICE = Medical Center;
    CITY = Palo Alto;                STREET = Ash Lane;
    STATE = Ca;                      STREET = Stanford Ave;
    ZIP = 94305;                     CITY = Palo Alto;
    HOURS = 9-12 Daily;              CITY = Stanford;
    HOURS = 5-8 Thursday;            STATE = Ca;
    PHONE = 555-1212;                STATE = Ca;
    OFFICE = Medical Center;         ZIP = 94305;
    STREET = Stanford Ave;           ZIP = 94305;
    CITY = Stanford;                 HOURS = 9-12 Daily;
    STATE = Ca;                      HOURS = 5-8 Thursday;
    ZIP = 94305;                     HOURS = 2-5 Daily;
    HOURS = 2-5 Daily;               HOURS = 12-2 Daily;
    PHONE = 497-4420;                PHONE = 555-1212;
    HOURS = 12-2 Daily;              PHONE = 497-4420;
    PHONE = 548-7737;                PHONE = 548-7737;

This kind of data organization is clearly not acceptable. On retrieval, each occurrence of the address elements is no longer grouped with the occurrences of HOURS and PHONE that belong to it, and because the number of occurrences of each element varies, it is impossible to sort the elements back into related groups by eye.

Now let's look at the entered record in terms of groups or structures of the elements:

                                         NAME
  LOCATION:        --ADDRESS:         |--STREET
                   |                  |  CITY
                   |                  |  STATE
                   |                  |--ZIP
                   |
                   | HOURS-PHONE:     |--HOURS
                   --                 |--PHONE

  LOCATION:        --ADDRESS:         |--OFFICE
                   |                  |  STREET
                   |                  |  CITY
                   |                  |  STATE
                   |                  |--ZIP
                   |
                   | HOURS-PHONE:     |--HOURS
                   |                  |--PHONE
                   |
                   | HOURS-PHONE:     |--HOURS
                   --                 |--PHONE

(The lines drawn around groups of elements represent a hierarchical structuring of the data that relates elements to each other.) SPIRES allows you to group together separate occurrences of elements into a hierarchy or "structure." The structure itself has a name, and an occurrence of the structure is stored logically as a unit; upon retrieval, the elements in a single occurrence of the structure are displayed together. The elements within a given occurrence of the structure are said to be "structurally bound" -- structural binding is an important aspect of searching techniques that needs to be considered when designing a file. [See C.6.13.]

Structures may contain other structures, nested up to ten levels, as LOCATION contains ADDRESS and HOURS-PHONE. The NAME element is called a "record level" element, since it is not contained in any structure. The LOCATION structure is called a record level structure, since it is not contained in any structure, and is defined by an element, LOCATION, occurring at the record level. ADDRESS and HOURS-PHONE are not record level structures, since they are contained in a structure; similarly, STREET, CITY, and other elements are not record level elements.
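For readers who think in terms of nested data, the hierarchy above can be modeled as follows. This is a Python sketch, not SPIRES syntax; the grouping of HOURS and PHONE values into HOURS-PHONE occurrences follows our reading of the diagram, and the helper function is ours.

```python
# The entered record, regrouped as LOCATION > ADDRESS / HOURS-PHONE.
record = {
    "NAME": "Taylor, Paul",          # record level element
    "LOCATION": [                    # record level structure, multiply occurring
        {
            "ADDRESS": {"STREET": "Ash Lane", "CITY": "Palo Alto",
                        "STATE": "Ca", "ZIP": "94305"},
            "HOURS-PHONE": [
                {"HOURS": ["9-12 Daily", "5-8 Thursday"], "PHONE": ["555-1212"]},
            ],
        },
        {
            "ADDRESS": {"OFFICE": "Medical Center", "STREET": "Stanford Ave",
                        "CITY": "Stanford", "STATE": "Ca", "ZIP": "94305"},
            "HOURS-PHONE": [
                {"HOURS": ["2-5 Daily"], "PHONE": ["497-4420"]},
                {"HOURS": ["12-2 Daily"], "PHONE": ["548-7737"]},
            ],
        },
    ],
}


def phones_by_hours(rec):
    """Yield (street, hours, phone) for values bound in the same occurrence."""
    for location in rec["LOCATION"]:
        street = location["ADDRESS"]["STREET"]
        for hours_phone in location["HOURS-PHONE"]:
            for hours in hours_phone["HOURS"]:
                for phone in hours_phone["PHONE"]:
                    yield street, hours, phone


for row in phones_by_hours(record):
    print(row)
```

Because each HOURS value is only ever paired with a PHONE from the same occurrence, the operator can no longer mismatch an hour with a number from another location.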

B.3.2  Coding Structures

Structures, like elements, can have occurrence and length attributes. The structures in the doctors' phone directory are fairly complex; "Location" is multiply occurring, and has in it a singly occurring "Address" structure and a multiply occurring "Hours-Phone" structure.

Until now, all the elements we have defined were "simple" elements, such as NAME and PHONE-NUMBER. SPIRES allows another kind of element, "STR", for structured elements. If we want to code the address structure for the above record, we would begin this way:

If the structure has fixed length or occurrence attributes, or aliases, they must also be coded:

The length of a structure is the sum of the lengths of all elements in the structure. The length attribute can only be specified for structures containing only FIXED elements; otherwise the length attribute must be omitted. The occurrence attribute is the number of occurrences of the structure as a whole.

Structures, like elements, must be placed in the FIXED, REQUIRED, or OPTIONAL section of the record. If the structure contains only FIXED elements, it may be placed in the FIXED section of the record. If the structure need not occur at all (e.g., if none of the elements in the structure will be given values), it should be placed in the OPTIONAL section of the record. Note that a REQUIRED structure may contain OPTIONAL elements, and an OPTIONAL structure may contain FIXED and REQUIRED elements.

Where are the elements in a structure specified? Where do you define the occurrence and length attributes of the STREET, CITY, STATE and ZIP elements of the ADDRESS structure? The structure definition, containing element definitions, is entered at the end of the record definition. Its organization is similar to that of a record; it has FIXED, REQUIRED and OPTIONAL elements and sections in it, coded as follows:

If none of the categories, FIXED, REQUIRED, etc., are coded, all the elements of the structure are OPTIONAL, with the exception of the structure key, if there is one, which is REQUIRED.

The "name" of a structure is given in the structure declaration in the goal record description:

Here is a somewhat better phone directory of doctors, using structures to group related occurrences of elements:

B.3.3  Structured-Data Input

The information we input into the earlier file [See B.3.1.] will be input to this new file with only slight modifications. The inclusion of the structure name ("LOCATION;", "ADDRESS;", and "HOURS-PHONE;") signals the start of an occurrence of the structure. An input record for this file is shown below; an output record is identical.

B.3.4  Keyed Structures

The structures used in the above file definition and data record are called "non-keyed" structures.

A keyed structure differs from a non-keyed structure in that one element of the structure is defined to be the structure's key. Like a record key, a structure key must be singly occurring, and must be either the first element in the FIXED section or the first element of the REQUIRED section of the structure. Unlike the record key, multiple occurrences of the same structure in the same record may have the same value for their key. Also unlike a record's key, a structure's key must be the first element defined in the structure. This means that if a structure has both FIXED elements and a REQUIRED key, the structure is treated as a "non-keyed" structure. But if any FIXED element is declared the KEY, SPIRES forces it to be the first element of a keyed structure.

Keyed structures also differ from non-keyed structures in the following ways:

Keyed structures are defined in the same way as non-keyed structures, except that one element is designated as the structure key. For example, the LOCATION structure could be defined as a keyed-structure as follows; note that ADDRESS is a keyed structure within LOCATION.

B.3.5  Keyed-Structure Data Input

A record for input to this file will be organized differently than input for a non-keyed structure. The most important difference is that the occurrence of the key element of the structure must be the first element entered for input to the structure. Here is an example:

Use of a keyed rather than a non-keyed structure has a significant effect on data entry. The structure name need not be entered; but the key element can only occur once for each occurrence of the structure. Compare data entry and organization of a non-keyed and keyed structure (on PHONE) when the same HOURS element applies to two different PHONE elements.

Two occurrences of a keyed structure are necessary for two occurrences of the structure's key. Thus, the choice of a key for a structure is almost as important as the choice of the proper key for a record since it determines how information is entered, stored and displayed. In our current LOCATION structure keyed on PHONE, different HOURS at one PHONE are entered as a single occurrence of the structure; if this structure were keyed on HOURS, every occurrence of a different HOURS element would require a new occurrence of the structure to be keyed.

Unlike record keys, structure keys need not be unique. The same phone number key element could occur twice in two different occurrences of the structure, perhaps each having different hours.

B.3.6  Floating Structures

As we have seen, structure definitions are quite flexible and can thus be used to represent many types of data organization. Nested structures offer one kind of flexibility. Floating structures offer an additional flexibility.

Because a structure is defined independently of its position in the record, it is simple to define a structure which occurs at multiple places in the record. Such a structure is called a floating structure. The elements in a floating structure can be indexed by specifying them by their structural element path. [EXPLAIN STRUCTURAL ELEMENT PATH.]

In the following definition, the doctors' answering service phone directory has been altered to allow two ADDRESS structures, but the structure is only defined once, since it floats. ADDRESS is now a structure that occurs at the record level, where it contains either a doctor's home or business address. The home and business addresses are distinguished by the occurrence or non-occurrence of the BUSINESS element; if the address is a business address, then the element occurs but has no value (LEN = 0). Conversely, if the element doesn't occur, then it is not a business address. ADDRESS is also a structure that occurs nested in the HOURS-PHONE structure.

Sample input to this file is shown on the left; sample output is shown on the right:

B.4  Processing Rules: INPROC, INCLOSE, OUTPROC

B.4.1  Functions of Processing Rules

SPIRES provides a facility, called "processing rules", for examining, validating and modifying input and output values.

In all of the files we have defined so far, element values such as "497-4420" for a telephone number and "94305" for a zip code have been input and stored as character strings.

Suppose we could tell SPIRES to take the eight byte (character) telephone number, verify that it is in fact eight bytes, delete the "-", then convert the remaining seven digits to a four byte binary number, giving an error message if the value contains anything but digits. If we do this, eight bytes of input can be verified for correctness and stored in four bytes of space; in a telephone directory, this would mean a substantial reduction in storage costs and data entry errors.

Of course, having stored the data in a binary fashion, we want to restore it to its original form on output by converting it to a string then inserting a "-" after the third character.
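Before looking at how SPIRES expresses this, it may help to see the two conversions spelled out. The following is a Python sketch of the idea only, not of any SPIRES rule; the function names are ours, and the check that the "-" sits in the fourth position is our assumption.

```python
import struct


def phone_inproc(value: str) -> bytes:
    """Validate an eight-character phone number and pack it into four bytes."""
    if len(value) != 8:
        raise ValueError("phone number must be exactly eight characters")
    if value[3] != "-":
        raise ValueError("expected '-' after the third digit")  # assumed position
    digits = value[:3] + value[4:]  # delete the "-"
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("phone number may contain only digits and '-'")
    return struct.pack(">I", int(digits))  # four-byte binary number


def phone_outproc(stored: bytes) -> str:
    """Restore the original form: seven digits with '-' after the third."""
    (number,) = struct.unpack(">I", stored)
    digits = f"{number:07d}"  # keep any leading zeros
    return digits[:3] + "-" + digits[3:]


stored = phone_inproc("497-4420")
print(len(stored), phone_outproc(stored))
```

Eight bytes of input become four bytes on disk, and the output step rebuilds exactly what was typed.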

Processing rules perform this checking and translation. Rules that process input to a record are called INPROCS; rules that modify stored data for output are called OUTPROCS. A third category of processing rules, called INCLOSE actions, is associated with INPROCS and performs a special kind of input processing. The two remaining categories of actions, SRCPROCS and PASSPROCS, deal with searching and indexing, respectively; these are discussed in later chapters. [See B.8.9, B.8.11.] The general syntax rules discussed in this chapter will apply to them, however.

The basic form of processing rule is called an "action". Another type of processing rule, called a "system proc" (pronounced "prock"), is also available; system procs are an alternate way to specify an action or combination of actions using a more descriptive language and simpler (in most cases) syntax. System procs are based on "processing rule-string procedures", which are described later in this manual. [See C.10.] This chapter and the rest of this manual discuss actions as the main type of processing rule, but keep in mind that system procs may be used in place of actions, and are in fact used by the File Definer when it creates file definitions. [See A.3.2.1.] They are described fully in the SPIRES reference manual "System Procs".

B.4.2  INPROC and OUTPROC Rule Functions

In general, INPROCS can validate data on input and convert element values from one form of representation to another, perhaps more compact, form. In addition, if any of the validation rules discover an error, you can ignore it, or simply put out a warning message, or reject the value, or reject the entire record.

Let's briefly consider a few examples. Suppose we have an AGE element; we can verify that the input is between 1 and 100, then convert the number to a one byte binary value. This allows us to specify the LEN attribute of the AGE element as fixed at one. Consider another case: if we had an element that contains the number of children a couple had in the last decade, we could validate that the value was between 0 and 15, supply a default value of 0 if no value was input, and store the value as a one byte binary number; now the element could be coded in the FIXED-REQ section of the goal record definition. Similarly, a date element can be verified for one of several forms, converted to a canonical form, then stored as a four byte binary value.

String input value processing rules can compress lengthy values; a day of the week element can be converted to a one byte binary number representing numbers between one and seven after validating that the input value was a correct day of the week. On output these codes could be converted back to the full name of the appropriate day of the week. String processing rules thus allow you to input an abbreviation, store the value in its most compact form, then translate the compact form into a more verbose form for visually pleasing output.

B.4.3  Processing Rule Strings

The conversions and manipulations described are not usually performed by a single processing rule, but by a series of rules, each separated from the preceding one by a "/". Such a series of rules is often called a "processing rule string" or simply a "rule string." The result or output of one action becomes the input to the next; thus, the order in which you specify rules may be important. Processing rules usually apply to a specific element in a record, and are defined along with the ELEM, OCC, LEN and ALIASES attributes. Processing rules can be coded for a structure as a whole; in this case they are coded along with the "TYPE=STR;" attribute. An element description is thus extended as follows:

A single processing rule or action can be coded as simply as this:

This rule would convert data element input in upper-lower case to uppercase only before storing it on disk.

Processing rules usually occur in strings, each rule effecting some transformation on the element value input or, if a rule is preceded by another rule, effecting some transformation on the output of the previous rule. A simple rule string would be:

This sequence of actions would convert an upper-lowercase input value to uppercase (A30) then compress multiple blanks to a single blank (A40) before storing the data on disk.
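The chaining of one rule's output into the next can be pictured with a small Python sketch. The two functions stand in for A30 and A40 as described above; their names, and the helper that applies them in order, are ours and are not part of SPIRES.

```python
import re


def a30_uppercase(value: str) -> str:
    """Stand-in for A30: convert upper-lowercase input to uppercase."""
    return value.upper()


def a40_compress_blanks(value: str) -> str:
    """Stand-in for A40: compress multiple blanks to a single blank."""
    return re.sub(r" {2,}", " ", value)


def apply_rule_string(rules, value: str) -> str:
    """Feed the output of each rule to the next, in the order coded."""
    for rule in rules:
        value = rule(value)
    return value


print(apply_rule_string([a30_uppercase, a40_compress_blanks], "Palo   Alto"))
```

For these two rules the order happens not to matter, but for rules that test or replace specific strings it usually does.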

Up to 31 actions (15 if an "INCLOSE" action is in the string) can be used together, each separated by a "/". A long rule string can be continued over several lines terminated by a semicolon. As with any input value to SPIRES, the value can only be continued to another line if it can be broken at a blank. For this reason it is important to use "/ " (a slash blank) as a separator to ensure that SPIRES can break a processing-rule string into multiple lines at blanks.

A30 and A40 both exhibit the simplest form of an action: an "A" followed by a number. Processing rules are referenced by number, to allow compact expression. They typically have more complex syntax; for example:

B.4.4  Processing Rule Syntax

A general syntax for processing rules follows. Optional portions are enclosed in square brackets. Choices are stacked one above another vertically. The defaults are D and :0. The symbol "#" represents a number.

         -   -           -          -
         | D |   -     - | ,P2[,P3] |
         | W |   | :P1 | |          |
        A| E | # | :0  | | ,P+...   |
         | S |   -     - -          -
         -   -

Examine the following examples:

"A30" contains the only required parts of the syntax, "A" and a number, which is 30. "AE146:3,10" contains an "A", an "E", a number, P1=3, and P2=10.

Let's examine what each unit of the syntax indicates to SPIRES, and any restrictions on the units.

A

This is simply the letter "A", with which every instance of a rule begins.

D, W, E, or S

Only one of these parameters can be coded; if none is coded, then D is the default. This parameter controls the level of the response if an error flag is set on by the processing rule. For example, an action that translates integers to binary will set on the error flag if the input value contains characters other than a sign and digits. The possible error responses are:

#

The rule number. Processing rules are referenced by number, not by name.

:P1

The P-one parameter specifies the "way" in which an action works: whether the other parameters form an inclusion or exclusion list, or whether the result is stored in 1, 2 or 4 bytes, for example. If P1 is not explicitly coded, it defaults to 0. When coded, the P1 parameter is always preceded by a colon, ":", to distinguish it from other parameters. Some processing rule descriptions say that you should add "4" or "8", for example, to a given value of P1 in order to specify a slightly different way in which the action should work. Thus, if you were intending to code "3" but the description you desired said to add "8" to it, you would code "11". Alternatively, you may code "3+8", which may help you read the coded action more easily later if you need to reinterpret its meaning using the action description. (Processing rule-string procedures [See C.10.] can also take good advantage of this feature.)

,P2

The P-two parameter. For many rules, the P2 parameter controls the function of the processing rule subroutine. The P2 parameter is preceded by a comma.

,P3

The P-three parameter. For some rules, the P3 parameter controls the function of the processing rule subroutine. The P3 parameter is preceded by a comma. The P3 parameter always follows the P2 parameter; no rule uses a P3 parameter without a P2 parameter. If the P3 parameter is null, the preceding comma must still be coded in rules which require a P3 parameter.

,P+

The P-plus parameter(s). Some rules require a set of parameters. There may be as many as 255 P+ parameters in any one processing rule; each is preceded by a comma.
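To make the syntax concrete, here is a small Python sketch that splits a coded action into the units just described. It is an illustration only, not how SPIRES parses rules: the names are ours, and it assumes that P1 is a non-negative number (or a sum such as "3+8") and that parameter values contain no commas or slashes.

```python
import re
from typing import NamedTuple


class Action(NamedTuple):
    error_level: str   # D, W, E or S; D when not coded
    number: int        # the rule number
    p1: int            # 0 when not coded
    params: tuple      # P2, P3 or P+ values, as coded


_ACTION = re.compile(r"A([DWES])?(\d+)(?::(\d+(?:\+\d+)*))?((?:,[^,]*)*)")


def parse_action(text: str) -> Action:
    """Split one action such as 'AE146:3,10' into its syntactic units."""
    match = _ACTION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable action: {text!r}")
    level, number, p1, rest = match.groups()
    p1_value = sum(int(part) for part in p1.split("+")) if p1 else 0
    params = tuple(rest[1:].split(",")) if rest else ()
    return Action(level or "D", int(number), p1_value, params)


def parse_rule_string(text: str) -> list:
    """Split a rule string at '/' and parse each action in turn."""
    body = text.strip().rstrip(";")
    return [parse_action(part) for part in body.split("/") if part.strip()]


print(parse_action("A30"))
print(parse_action("AE146:3,10"))
```

Note that a P1 coded as "3+8" is reported as 11, and a trailing comma yields an empty final parameter, which corresponds to a null P3.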

B.4.5  Processing Rule Restrictions

Restrictions which you should keep in mind when coding any processing rules:

Length and Occurrence:

Note: INCLOSE rules are a special variety of INPROC rules. They are discussed later in this chapter. [See B.4.8.]

Special Characters:

Processing rules may transform data prior to storage on disk. The RECOMPILE command may be used after adding or changing processing rules in a file definition only if data already stored on disk is not thereby invalidated. In all other cases, the ZAP and COMPILE commands must be used, and the file rebuilt. [See B.5.8.]

B.4.6  Understanding Processing Rule Descriptions

"Processing Rules: Complete Listing by Number" is a list of processing rules current at the time this manual was produced. [See D.1.] This appendix also describes the use of the actions file; [See D.1.1.] a protocol is available to produce an up-to-date actions listing. [See C.6.9.]

Understanding the explanations for each processing rule in the actions listing requires a familiarity with the syntax of a processing rule, and its requirements for P1, P2, P3 or P+ parameters. Since this familiarity is best obtained through practice, let's look at a few element definitions in which we code processing rules.

As our first example, let's interpret the following element definition:

Beginning with the first rule of the INPROC rule string we find the following explanation for A30 in the listing of rules:

This rule, most frequently used before an input value is to be tested for codes, takes input such as "Female" and converts it to "FEMALE". The output from this rule becomes the input to the next rule, A44.

The line of information in the description "Processing Rules: INPROC, OUTPROC, SRCPROC" tells you that the rule may appear in INPROC, OUTPROC and SRCPROC rule strings. It may not appear in a PASSPROC rule string.

The description of action 44, the next rule in the INPROC string, is more difficult to understand than that of action 30. Unfortunately, action 44, having several parameters, is more typical of action descriptions than is A30. Here is the description we find:

Compare the first two lines, which describe the syntax of A44, with the actual rules coded:

In the rule coded, every occurrence of the string "MALE" in the output of A30 is changed to "M". Thus, not only is "MALE" changed to "M", but "FEMALE" is changed to "FEM". The second A44, "A44,FEM,F", changes any occurrence of "FEM" to "F".

We have not coded D,W,E or S in either of the A44s (it is meaningless in A30), so D is the default. When is the error flag turned on by A44, and what is the system's response when D is the error response parameter? According to the description the error flag is set on if the P2 string, "MALE" in the first A44 and "FEM" in the second, is not found. Why wasn't an error message requested when the strings weren't found? In the case of an input value "FEMALE", both A44's will find a string to convert from P2 to P3. In the case of an input value "MALE", only the first A44 will do any conversion. For this reason, it is best to wait until the next action, A46, to determine whether an error condition exists.

The last action in the INPROC rule string is A46; its description reads as follows:

   A46 :<NUM> ,VALUE (,...., VALUE)
   A46 :P1 ,P+
      Purpose: INCLUSION, EXCLUSION, EXCLUSION LIST, VALUE
        REPLACEMENT
      Processing Rules: SRCPROC, INPROC
        The alphanumeric value is  matched against  the string  of
        codes  that make up the multiple occurrences of P+.  There
        may not be more than 255 occurrences.  If P1 is 2 then the
        value is  replaced  by  a  short  integer  containing  the
        relative  position of the code that matches the value.  If
        P1 is 1, then the value is replaced by a  byte  containing
        the  relative position of the code that matches the value.
        The short integer or byte value  is  suitable  for  output
        using  OUTPROC  action  A75  having  a duplicate string of
        codes.  If P1 is > 2, the value is unaffected if  a  match
        is  found.   For all of these P1 conditions the error flag
        is turned on if the value does not match  a  code  in  the
        list.   For  P1  =2  or  1, the highest code number + 1 is
        returned as a default value.  For P1  =  0  the  value  is
        abandoned  when a match is found and the error flag is set
        on.  P1 = 0 is used for exclusion lists.
      Processing Rules: OUTPROC
        This  action  allows  exclusion   on   output.    Elements
        containing  values matching the supplied list of values in
        this action are excluded from output.  Note  that  a  null
        valued element is always excluded.
      Processing Rules: PASSPROC
        The alphanumeric value is  matched against the  string  of
        values that make up the multiple occurrences of P+.  There
        may  not  be  more  than 255 occurrences.  If P1 = 1 and a
        match is not made, or P1 ~= 1 and a match  is  made,  then
        the  value  is not passed.  For P1 = 1, P+ is an inclusion
        list and only values contained in P+ are passed. For P1~=1
        (P1  =  0)  P+ is an exclusion list and only values  which
        are not included in P+ are passed.

Since A46 in the SEX element INPROC has a P1 parameter of 1, the value "M" will be stored as a binary 1, and the value "F" will be stored as a binary 2; any other value will be stored as a binary 3. Since "W" was coded as the error response parameter, any input to this action other than "M" or "F" will cause an error condition, a warning message will be printed, and the default action is taken, giving the "bad" value a binary value one higher than any legal code.
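The whole walk-through can be traced in a few lines of Python. This sketch follows only the behavior described in the text above (uppercase, two string replacements, then a lookup with P1 = 1 and a warning response); the function name and message text are ours, and the error flags set by the two A44 rules are simply ignored here.

```python
def sex_inproc(value: str):
    """Trace the SEX rule string; return (stored byte, list of warnings)."""
    warnings = []
    value = value.upper()               # A30: "Female" -> "FEMALE"
    value = value.replace("MALE", "M")  # first A44: "FEMALE" -> "FEM"
    value = value.replace("FEM", "F")   # second A44: "FEM" -> "F"

    codes = ["M", "F"]                  # the code list given to A46
    if value in codes:
        position = codes.index(value) + 1   # relative position, from 1
    else:
        position = len(codes) + 1           # highest code number + 1
        warnings.append(f"warning: {value!r} is not a legal code")
    return bytes([position]), warnings      # P1 = 1: a single byte


for text in ("Female", "male", "MAIL"):
    print(text, sex_inproc(text))
```

"Female" is stored as 2, "male" as 1, and "MAIL" as 3 together with a warning.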

B.4.7  Custom Error Messages

Suppose the following INPROC is coded for an element called SEX:

If the input for this element was:

then the error message put out looks like this:

A user who issued the command "EXPLAIN E46" would get the following explanation:

The generality, and, in fact, ambiguity of this diagnostic is necessary because A46 wears several faces depending upon the value of the P1 parameter coded by the file definer. To reduce the ambiguity of many multi-purpose diagnostics, you may want to code one or several A56's in a processing rule string. A56 facilitates customized error messages as follows:

Whenever any INPROC or OUTPROC rule in an action sequence puts out an error or warning message, the string in the nearest preceding A56 is also output. A56 might be coded in our rule string as follows:

Now, if "SEX=MAIL;" were input, the error message would be:

Note that an action 56 coded before an INCLOSE rule [See B.4.8.] will have no effect on error messages.
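The "nearest preceding A56" rule can be pictured with a short Python sketch. The representation of a rule string as a list of (number, message) pairs, the function name, and the message texts are all hypothetical; only the lookup rule itself comes from the description above.

```python
def custom_message_for(rules, failing_index):
    """Return the text of the nearest A56 coded before the failing rule."""
    for number, text in reversed(rules[:failing_index]):
        if number == 56:
            return text
    return None  # no A56 precedes the failing rule


# Hypothetical rule string: each entry is (rule number, A56 text or None).
rules = [
    (56, "first custom message"),
    (30, None),
    (56, "second custom message"),
    (46, None),
]

print(custom_message_for(rules, 1))  # the rule at index 1 (A30) fails
print(custom_message_for(rules, 3))  # the rule at index 3 (A46) fails
```

Coding several A56's in one string therefore lets each group of rules carry its own message.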

B.4.8  INCLOSE Rules

In the INPROC rule strings we have examined so far, the output from one rule becomes input to the next rule in the string. INCLOSE rules, which provide the only form of looping available in processing rule strings, are the exception to this chain. The term "INCLOSE" can be considered a short form of "INPROC CLOSE-OUT." This expanded term tells us two things about INCLOSE rules: 1) they are coded in an INPROC rule string, and 2) they "close out" the rule string and hence must be the last rule or rules coded in a string of rules. INCLOSE rules, A122 through A148, can supply values such as date, time or account to a data base element; they can supply a value for one element that is the number of occurrences of another element; they can supply additional values or delete values if an element doesn't occur a specified number of times. An INCLOSE rule can be used to sort multiple occurrences of an element or of a structure by its key value.

Below is an example of the use of INCLOSE rules. Two elements are defined as FIXED, neither of which need ever be supplied by the user. These elements will contain the date on which a record was first added to a subfile and the date on which it was last updated.

In the above example, note that A126 produces an eight byte character string of the form MM/DD/YY, yet we are going to store this value in a four byte binary number. A31 performs this conversion, yet we have coded it before A126; this is because INCLOSE rules perform a type of looping. The value generated is passed back through the entire INPROC rule string where the new value is converted or validated as specified. (The generated value is usually not passed through the INCLOSE action a second time, unless the rule's description states that it is; A123 for example.) Look at the following example:

If there are fewer than three occurrences of the element (P2 of A123=3), then an additional occurrence is generated and passed through the rule string, where A46 converts it to a binary code. Note that when an INCLOSE is coded, any previous INPROC rule must find the output from the INCLOSE rules acceptable, because the value generated is passed through the entire INPROC rule string.

It is important to note that INPROC and INCLOSE rules are executed at completely different times. INPROC rules are executed when an element is recognized on input. INCLOSE rules occur when "closing out" the structure in which they occur. INCLOSE rules are executed when all elements of a structure, including all lower level structures, have been input.

INCLOSE rules for elements embedded in a structure will not be executed unless the structure (i.e., any element in the structure) occurs. INCLOSE rules that refer to elements in addition to the one for which they are coded (for example, A122, A131, A132) can refer only to elements in the same structure. If such a rule is coded on a record level element, then it can only refer to other record level elements. INPROC and INCLOSE rules may be coded for a structured element (an element of TYPE = STR); see A33 for an example.

Some additional rules governing INCLOSE rules are:

Now that we have looked at the syntax and semantics of processing rules, let's look at some of the different functions they may perform. The suggestions that follow are by no means exhaustive; in fact, only the more common rule strings are sampled.

B.4.9  Processing Rules For Numeric Data

Numeric data may be more compactly stored if it is converted to a binary or packed decimal representation. For example, a one-byte binary integer can represent a number in the range 0 through 255; a two-byte binary integer any number in the range -32768 through 32767; and a four-byte binary integer any number in the range -2,147,483,648 through 2,147,483,647. The savings in disk storage charges by converting integers from their character representations may be significant. Note that three-byte binary numbers are not supported by IBM.
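The ranges quoted above follow directly from the number of bits available, as this Python sketch shows. The function name is ours; the one-byte case is treated as unsigned and the two-byte and four-byte cases as signed, to match the ranges given in the text.

```python
def binary_range(byte_count: int, signed: bool):
    """Return (lowest, highest) value that fits in byte_count bytes."""
    bits = 8 * byte_count
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


print(binary_range(1, signed=False))
print(binary_range(2, signed=True))
print(binary_range(4, signed=True))

# Storage comparison for the largest four-byte value:
# characters in its printed form versus bytes in binary form.
largest = binary_range(4, signed=True)[1]
print(len(str(largest)), "characters versus 4 bytes")
```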

The same processing rules that convert numeric data may also be used to validate that the character string values being entered are numbers. Once numbers have been converted, it is possible to validate, by processing rules, that they are within a specific numeric range.

Action 21 will convert a value from a character string to a fixed binary integer. If the P1 parameter is 1, the result will be a one-byte binary number in the range 0 through 255. This rule may be used as follows:

Note that since the value is being converted to one-byte, a length attribute of 1 can be specified. (This element may now be placed in the FIXED-REQ section of the record or structure.)

If the value as entered is not an integer number, or not in the range 0 through 255, the error flag will be set. The letter "S" in the processing rule specifies that if an error is detected a serious error message is to be typed and the entire record rejected.

Since the value as stored is now a binary integer, it should be converted on output to a character string format. The appropriate element definition thus becomes:

To validate that the AGE element is in the range 1 through 99, the INPROC and OUTPROC could be written as:

If the value is greater than 99, action 24 will report a warning message, and will set the value to 99. If the value is less than 1 (e.g., if it is equal to 0 or negative) a warning message will be issued and the value will be set to 1. Note that a P2 parameter of 2 is sufficient for action 71 to output a number no greater than 99.
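The range-forcing behavior described for action 24 can be modeled in a few lines. The following is only a Python sketch of the logic, not the SPIRES rule itself; the function name clamp_with_warning and the wording of the messages are assumptions made for illustration.

```python
def clamp_with_warning(value, low=1, high=99):
    """Force value into [low, high], collecting a warning if it was changed."""
    warnings = []
    if value > high:
        warnings.append(f"{value} is greater than {high}; {high} stored instead")
        value = high
    elif value < low:
        warnings.append(f"{value} is less than {low}; {low} stored instead")
        value = low
    return value, warnings


# An out-of-range value is replaced by the nearest limit; the record is kept.
for age in (0, 42, 150):
    stored, notes = clamp_with_warning(age)
    print(age, "->", stored, notes)
```

Because the stored value never exceeds 99, a two-character output width is always enough, which is the point made above about the P2 parameter of action 71.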

INPROC action 25 is used for floating point numbers which may be represented on input as signed or unsigned real constants or in scientific notation (e.g., 1.705E-3 is the same as .001705). SPIRES will set the error flag and type a message if the value is not in the correct format, depending upon whether the rule is written as "AS25", "AE25", "AW25" or "AD25" (or "A25"). If the P1 parameter is 1, a single precision internal floating point number is generated; if P1 is 2, a double precision floating point number is generated; if P1 is 0 the value is only checked for floating point format and not converted. Action 26 is the floating point equivalent of INPROC action 24, and action 72 the floating point equivalent of OUTPROC action 71. A72 will suppress leading zeros, but all places to the right of the decimal point (the number of places is specified in the P2 parameter) will be printed (e.g., -1.75000 is possible with A72,8,5). A P1 parameter of 2 or 3 provides for stripping trailing zeros in the decimal portion (e.g., -1.75 is possible with A72:2,8,5). For example:

The packed decimal conversion rules are INPROC action 39 and OUTPROC action 80. Packed decimal conversion is very useful when a number is to be stored, but it is too large to be stored in a four-byte binary number. Though social security numbers can be stored in four-byte binary form (after "-"s have been removed), a telephone number with an area code cannot. Such a number can be stored in packed decimal format.
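To see why a telephone number does not fit where a social security number does, consider the following Python sketch. It uses the conventional IBM packed decimal layout (two digits per byte, with the sign in the final half-byte); whether action 39 produces exactly this layout is an assumption, and the function name and the sample telephone number are made up for illustration.

```python
FOUR_BYTE_MAX = 2**31 - 1  # 2,147,483,647


def pack_decimal(number):
    """Encode an integer as packed decimal: two digits per byte, sign last."""
    sign = 0xC if number >= 0 else 0xD  # conventional IBM sign half-bytes
    digits = str(abs(number))
    if len(digits) % 2 == 0:  # digits plus the sign must fill whole bytes
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [sign]
    return bytes(
        (nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2)
    )


largest_ssn = 999999999  # nine digits, hyphens already removed
phone = 4155551234       # hypothetical ten-digit number with area code

print(largest_ssn <= FOUR_BYTE_MAX)  # fits in a four-byte binary integer
print(phone <= FOUR_BYTE_MAX)        # does not fit
packed = pack_decimal(phone)
print(len(packed), "bytes:", packed.hex())
```

The ten-digit number occupies six bytes in packed form, compared with ten bytes as a character string.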

Check digits can be appended to fixed binary values by INPROC action 27 and validated by action 34. See also OUTPROC action 77.

B.4.10  Processing Rules for Dollar-and-Cents Data

The processing rules for dollars-and-cents are similar to the processing rules for floating point numbers described in the last section; in fact dollars-and-cents values are converted to single precision binary floating point numbers. For example:

An error message is produced and the value rejected if the value is not in the correct format. The OUTPROC specifies a maximum field width of 15 characters, which is more than sufficient for a single-precision floating point number and a dollar sign.

Action 85 can be used to "edit" a dollar and cents field, inserting check protection or appending a "CR" if the value is negative, for example. The use of this action is patterned after the COBOL and PL/I picture edit masks; see COBOL or PL/I documentation on the "Picture Clause" for complete documentation of the options available.

B.4.11  Processing Rules for Free Form Character Strings

Character string values, such as book titles, disease diagnoses, journal abstracts, etc., can seldom be validated on input. It is possible, however, to perform limited conversion, in most cases only to conserve storage space.

INPROC action 30 can be used to convert the value into upper case. (Ordinarily, text strings should be stored in upper and lower case for ease of reading.) Action 30 will be more useful when processing coded values. [See B.4.12.]

Actions 40 and 51 will compress extra blanks from character string values. These extra blanks are often unnecessary, and increase storage cost. A51 removes trailing blanks; A40 "squeezes" multiple blanks to a single blank and removes trailing blanks. An example of the use of A40 follows:

Blanks which have been removed by actions 40 or 51 cannot be reinserted with an OUTPROC. Note that if you use A40 you do not need to use A51.
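The difference between the two actions, and the reason A51 adds nothing after A40, can be illustrated with a small Python sketch. This models only the behavior described above; the function names are hypothetical, and the treatment of leading blanks (left untouched here) is an assumption.

```python
import re


def strip_trailing_blanks(value):
    """Model of the described A51 behavior: remove trailing blanks only."""
    return value.rstrip(" ")


def squeeze_blanks(value):
    """Model of the described A40 behavior: collapse runs, drop trailing."""
    return re.sub(r" {2,}", " ", value).rstrip(" ")


title = "WAR  AND   PEACE   "
print(repr(strip_trailing_blanks(title)))
print(repr(squeeze_blanks(title)))

# Applying the trailing-blank rule after the squeeze changes nothing.
print(strip_trailing_blanks(squeeze_blanks(title)) == squeeze_blanks(title))
```

Note that neither function can be inverted: once the blanks are gone, the original spacing cannot be recovered on output.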

Using Action 57, a character string is translated on input into a packed form for compact storage, then retranslated using action 58 on output. Restrictions apply on which characters will be translated and retranslated, and which will be converted to another character.

B.4.12  Processing Rules for Strings of Codified Data

The examples of processing rules used in the introduction to this chapter highlight the facility for processing rules to examine an input value for predetermined string values. Actions are available to convert one string to another (for example, A44 or A48), or a string to a binary code (for example, A46). An example shown earlier in the text was:

Instead of the two A44's, we might have coded a single A48 instead, observing the note in the action description:

Using A48, we would have coded:

Since the value we have stored on disk is a binary number, we must code an OUTPROC with a set of codes corresponding to those in the A46 to translate the binary value stored back into the appropriate string. It is extremely important that any value output (by an OUTPROC or not) be able to go back in through the INPROCS, since it is likely that transfer and update commands will be used in any file. For this reason, the output "Androgynous" will trigger a warning message on an ADD or UPDATE command, when it is sent through the INPROC; but the record is accepted, and the value "ANDROGYNOUS" is converted to a binary 3.

Processing rules are available to convert character string values from one character set to another, or to validate that only legal characters are entered in an element value. Refer to the descriptions of actions 43 through 49 in the actions appendix for complete descriptions. [See D.1.4.] Action 46 can be used to test whether specified strings either occur or don't occur, that is, to form an inclusion or exclusion list. Character set translation, say "$" to "-", is performed by action 43.

B.4.13  Processing Rule for Personal Names to Canonical Form

Action 41 is available to give great flexibility to personal name entry and retrieval. Any recognizable form for a personal name can be input, and SPIRES will convert all forms to a canonical form before storage. The file owner determines the canonical form from among six forms. If a user will later want to sequence a search result by last name, then the name must be stored in a last-name first form.

B.4.14  Processing Rules for Validating Length and Occurrence

The length and occurrence attributes of an element describe the disk storage that SPIRES will reserve for an element. If an element value exceeds the length attribute, or does not occur the specified number of times, the record is rejected. More extensive validation of element length and occurrence can be made with processing rules.

Actions 22 and 23 may be used to validate the length of an element. The following element definition ensures that the value is at least 2 and no more than 15 characters long. Warning messages are issued if an error is detected; no fewer than 2 and no more than 15 characters will be stored:

If a VARYING-REQ element has no LEN specified for it, it is possible for a user to input a value of 0 length. If this element were then to be passed to an index, a PASS ERROR would occur because a zero length element was being passed. [See B.10.10.] In order to prevent this, action 23 should be used to force the value to have a minimum length, perhaps padding with blanks. For example:

would allow "POSTAL-CODE;" (a zero length value) to be input. But

would provide a warning message if "POSTAL-CODE;" were input, and would enter a blank value rather than a null value.
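The effect of forcing a minimum and maximum length can be sketched as follows. This is a Python model of the behavior described in this section, not SPIRES code; the function name, the message wording, and the choice to truncate over-long values are assumptions.

```python
def enforce_length(value, min_len, max_len):
    """Pad with blanks or truncate so that min_len <= len(value) <= max_len."""
    warnings = []
    if len(value) < min_len:
        warnings.append(f"value shorter than {min_len}; padded with blanks")
        value = value.ljust(min_len)
    elif len(value) > max_len:
        warnings.append(f"value longer than {max_len}; truncated")
        value = value[:max_len]
    return value, warnings


# A zero-length value becomes a single blank, so a later pass to an index
# never sees a null value.
stored, notes = enforce_length("", 1, 15)
print(repr(stored), notes)
```

With a minimum length of 1, the null input is stored as one blank and a warning is reported, matching the POSTAL-CODE case above.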

Action 146 may be used to ensure that an element occurs fewer than a specified number of times. The following element definition will produce a warning message if a person has more than 10 children:

A123, which tests for a minimum number of occurrences, and A146, which tests for a maximum number of occurrences, are especially useful when a VARYING-REQ or OPTIONAL element is multiply occurring with a fixed number of occurrences. As noted before, [See B.1.3.] SPIRES does not validate the occurrence count of VARYING-REQ or OPTIONAL elements that have more than one occurrence specified; these elements are simply "multiply occurring" to SPIRES; A123 or A146 must be used to ensure fixed occurrence. These actions can also be useful to force a fixed number of occurrences for FIXED-REQ elements that are multiply occurring. Otherwise, SPIRES will store blanks or zeroes for occurrences that are required but do not occur in the input data.

B.4.15  Processing Rules and Element Types

Every element in a record-type can be categorized by "type". To most users, the element type is not important -- the user gives the element a string value for input and it comes out as a string value on output, so its internal type is only significant to SPIRES. But there are programming situations where the internal type is relevant, and even situations where the programmer needs the element to be treated as a different type.

For most elements, the element type is determined by INPROC processing rules. For example, an element whose INPROC string ends with the $INT proc (A21) will be type integer. Processing rules also determine that an element is type character [Elements are called type string or type character interchangeably, and there is no difference, as there is between variables of type string and character. Elements would be exclusively called type string except that the abbreviation STR is used for elements that are type structure.] (the default, if no rules are specified), type real, type packed, and type hex. [Elements may be declared type "bits" with the TYPE statement, where "bits" means SPIRES will treat it as type hex. That is sometimes done on elements created internally (e.g., during passing) that do not have INPROC statements but which you do not want SPIRES to treat as type character. It is also done in situations where you want WHERE clauses and ALSO commands to be case-sensitive for a given element. [See B.9.3.16.]]

But the types of some elements, i.e., structures, locators and executable elements, are determined by the TYPE statement. [See B.3.2, B.7.1, B.4.] The TYPE statement has the following syntax:

where "type" is one of the following:

     STR   -  structure
     LCTR  -  locator
     XEQ   -  executable
     CHAR  -  string, character
     HEX   -  hex     (BITS is equivalent)
     REAL  -  real
     PACK  -  packed decimal
     INT   -  integer

If you want to know what type an element is, you can issue the command SHOW ELEMENT CHARACTERISTICS. Note that it will list "CHAR" elements as "String" and "BITS" elements as "Hex".

Knowing the element type is also useful when you use the system variables $CVAL and $UVAL in formats and the function $GETUVAL. The type of value returned by them is determined by the type of the element involved. For example, in an output format, the type of $UVAL changes from label group to label group, based on the type of element retrieved by the GETELEM statement.

Using the TYPE Statement to Solve Programming Problems

In most cases, you should only put the TYPE statement in an element definition when you need to. Specifically, you must use it when you are defining a structure, a locator or an executable element. But if a processing rule defines an element as type integer, for example, you do not need to include "TYPE = INT" in its definition.

However, in certain circumstances, you might want to declare an element as a different type than the processing rules (or lack of processing rules) would create. For instance, an element might be defined as such:

For some reason, the file owner has decided not to code an INPROC; the input is perhaps coming from some external source already in integer form. So it is stored as a four-byte integer. The problem is that SPIRES considers it an element of type string (since it has no INPROC rules), which could create problems for format definers who think it is type integer. Simply adding the statement "TYPE = INT" would solve those problems. [This is a somewhat unlikely example, however. The file owner is more likely to include an INPROC, such as $INT, which would set the type to integer properly, and then write an input format to handle the input from the external source. That format could then override the file definition's $INT rule for the element. A more likely example would probably be more complicated, however.]

The TYPE statement may not be used indiscriminately to change an element to any type at all. The following is allowed: an element whose INPROC rules would define it as type string or hex may be changed by the TYPE statement to INT, REAL or PACK. Also, an element of any type may be "redefined" as a hex element, using "TYPE = BITS".
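The restriction just stated can be summarized as a small decision function. The Python sketch below is one reading of the paragraph above, not a description of how SPIRES implements the check; the function and set names are hypothetical, and treating a redundant declaration (same type as the rules produce) as allowed is an assumption.

```python
RULE_DERIVED_SOURCES = {"CHAR", "HEX"}
NUMERIC_TARGETS = {"INT", "REAL", "PACK"}


def normalize(type_name):
    """Treat BITS as a synonym for HEX, as the TYPE statement does."""
    name = type_name.upper()
    return "HEX" if name == "BITS" else name


def type_override_allowed(derived_type, declared_type):
    """Can an element whose rules yield derived_type be declared otherwise?"""
    derived = normalize(derived_type)
    declared = normalize(declared_type)
    if declared == derived:
        return True  # redundant, but harmless (assumption)
    if declared == "HEX":
        return True  # any type may be redefined as hex via TYPE = BITS
    return derived in RULE_DERIVED_SOURCES and declared in NUMERIC_TARGETS


print(type_override_allowed("CHAR", "INT"))   # string to integer
print(type_override_allowed("INT", "BITS"))   # anything to hex
print(type_override_allowed("INT", "REAL"))   # not covered by the rule
```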

B.4.16  Processing Rule Tracing: SET PTRACE

SET PTRACE is the basic command for activating processing-rule tracing, where SPIRES displays environmental information about the processing rules being executed as element values are processed by them. Record-processing commands that can display Ptrace information include DISPLAY, TYPE, ADD, UPDATE, MERGE, ADDUPDATE, ADDMERGE, the INPUT commands, and TRANSFER, where Inproc, Outproc and Inclose rules will be traced. Ptrace can also be invoked for tracing the execution of Searchproc rules in searches, i.e., in FIND, AND, AND NOT and OR commands. However, Ptrace does not trace Passproc rules; a separate facility, invoked by the command SET PASSTRACE, handles that. [See B.4.17.]

It is also available in SPIBILD for batch processing commands such as INPUT BATCH, for Inproc and Inclose processing.

Depending on the Ptrace options selected, SPIRES can show you:

Additionally, you can ask for tracing only for specific elements.

There are eight forms of the SET PTRACE command, depending on which options you want to use. You may issue multiple SET PTRACE commands to achieve particular combinations.

Here is the full list of commands, each of which is described below:

Each has a parallel CLEAR PTRACE command to clear its particular effect. Also, "+ names" and "- names" lists can be used on the USERPROCS, VARIABLES, ELEMENTS or TYPES flavors to alter the effect of a previous command.

SET PTRACE

The basic command is:

Like all SET PTRACE commands, it should be issued after the subfile you want is selected. The command will succeed only if you have SEE access to the SPIRES file to which the subfile belongs. By default, Ptracing will then be in effect for all elements of the goal record-type except phantom elements and dynamic elements. [Ptracing of phantom elements may be a future extension of this feature.]

With the basic SET PTRACE in effect, SPIRES will display a "-Ptrace elem: <name>" and "-End ptrace: <name>" line for each element as execution of its processing rules begins and ends. Additionally, between those two will be a line displaying the value of the element at the end of its processing ($Val) as well as its length ($Vlen).

For instance, a basic SET PTRACE command might show this for a date element on output:

SET PTRACE SNAPSHOT

For more explicit information, add the SNAPSHOT or SNAP option to the command, which will show you a "snapshot" of information about each action as it is executed:

The information shown includes the action number, often with its P1 parameter, the type of the value if it is not string, and the value and its length BEFORE the action is executed.

Continuing the example of the BIRTHDATE element above, here is the Ptrace information on output under SET PTRACE SNAPSHOT:

The processing rule A76:3 works with the input value 17231222 of type Hex; A76, the action for converting date values for output, returns the value "WED DEC 22, 1723", which is processed by the second rule, A30, and the final output value is shown in the fourth line.

The information shown by Ptrace is derived from compiled code during its execution. Unfortunately, because system procs (such as $INT or $DATE) used in processing rules are converted during compilation to their action components (such as A21 or A30), any system proc that invoked the action cannot be displayed by Ptrace, which means interpreting Ptrace information may be challenging. For instance, in the example shown above, the one system proc coded in the Outproc rule string for the BIRTHDATE element was a $DATE.OUT rule, which turned into actions A76 and A30 during compilation.

[For assistance in translating system procs into actions, see the System Proc Expansion section at the end of each system proc description in the SPIRES manual "System Procs". Also see the Action descriptions at the end of this manual. [See D.1.1.]]

SET PTRACE USERPROCS

When the processing rules invoke Userprocs, neither the basic form nor the SNAPSHOT form will name the Userproc called. To get that information, use the USERPROCS form:

Besides the basic information shown for SET PTRACE, entry to and exit from each Userproc called by action A62 or A124 ($CALL proc) will be displayed as it occurs, as shown in the example below. Userprocs invoked from within other Userprocs with the XEQ USERPROC Uproc will not be shown; use SET PTRACE JUMP for that.

In the absence of a list of Userproc names on SET PTRACE USERPROCS, SPIRES will show trace information for all Userprocs called by action A62 or A124. If you add a list to the command, then only the Userprocs that match the names given will be traced. You can alter that list further with subsequent SET PTRACE USERPROCS commands, using the "+" or "-" characters in front of the list. You can also begin by adding an "exclusion list", listing the Userprocs you don't want to trace by preceding them with a minus sign; in other words, all Userprocs except those named in the exclusion list will be included in the trace.

Here is an example of Userproc tracing for an element:

SET PTRACE VARIABLES and SET PTRACE JUMP

The USERPROCS version of SET PTRACE is often accompanied by SET PTRACE VARIABLES, which shows the values of user variables in the Userprocs as they are changed, and SET PTRACE JUMP, which shows all JUMP (GOTO), XEQ USERPROC, and RETURN Uprocs within Userprocs as they execute. Neither command has any effect unless SET PTRACE USERPROCS is in effect.

When SET PTRACE VARIABLES is in effect, anytime the value of a user variable in a Userproc is changed, the new value is shown in the tracing information.

In the absence of a list of variable names on SET PTRACE VARIABLES, SPIRES will show trace information for all user variables in executing Userprocs. If you add a list to the command, then only the variables that match the names given will be traced. You can alter that list further with subsequent SET PTRACE VARIABLES commands, using the "+" or "-" characters in front of the list. Note that you omit the preceding pound sign when specifying them here. You can also begin by adding an "exclusion list", listing the variables you don't want to see by preceding them with a minus sign; in other words, all user variables except those named in the exclusion list will be included in the trace list.

With SET PTRACE JUMP, you will see information about JUMP or GOTO Uprocs as well as XEQ USERPROC Uprocs and any RETURN Uprocs that return execution control back to the calling Userproc. The XEQ USERPROC trace information names the Userproc that has been invoked. The JUMP trace information, again dependent on the compiled code, does not have the label name to which execution has been directed, but does have the (admittedly cryptic) displacement value used internally by SPIRES.

The example below, expanding on the previous one, shows the influence of SET PTRACE VARIABLES and SET PTRACE JUMP, with SET PTRACE SNAPSHOT as well:

SET PTRACE ELEMENTS and SET PTRACE TYPES

The last two forms of SET PTRACE control the scope of Ptracing, as established by the other commands.

You can request Ptracing for only specific elements by naming them in a SET PTRACE ELEMENTS command:

The first time you use the ELEMENTS option in a given PTRACE session, you specify either an inclusion list ("elements"), the elements for which you want tracing, or an exclusion list ("- elements"), the elements for which you don't want tracing. Later, you can change the list of elements being traced by adding new ones ("+ elements") or removing others ("- elements").
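The way inclusion lists, exclusion lists, and later "+" and "-" adjustments combine can be modeled with a set. The Python sketch below illustrates the semantics described above; it is not SPIRES code. The class name, the comma-separated list format, and the element names are assumptions made for the example.

```python
class TraceScope:
    """Model of which elements are traced after a series of list changes."""

    def __init__(self, all_elements):
        # By default, tracing covers every element.
        self.traced = set(all_elements)

    def apply(self, spec):
        """Apply a list such as 'NAME, AGE', '- AGE' or '+ PHONE'."""
        spec = spec.strip()
        sign = ""
        if spec and spec[0] in "+-":
            sign, spec = spec[0], spec[1:]
        names = {name.strip() for name in spec.split(",") if name.strip()}
        if sign == "-":
            self.traced -= names   # exclusion, or later removal
        elif sign == "+":
            self.traced |= names   # later addition
        else:
            self.traced = names    # inclusion list


scope = TraceScope(["NAME", "AGE", "PHONE"])
scope.apply("- AGE")       # exclusion list: everything except AGE
print(sorted(scope.traced))
scope.apply("NAME")        # inclusion list: only NAME
scope.apply("+ PHONE")     # add PHONE to the traced elements
print(sorted(scope.traced))
```

Because the default scope is "all elements", an initial exclusion list and a later removal can be handled by the same set subtraction.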

You can request Ptracing for only specific types of processing rules (for instance, you are interested only in Inproc rules as they are executed) with the SET PTRACE TYPES command:

where "types" can be one or more of the following: INPROC, INCLOSE, OUTPROC, SEARCHPROC. If this command is used, only the types of processing rules named will be traced. Note that INPROC includes INCLOSE.

SET PTRACE ALL

Issuing the SET PTRACE ALL command is equivalent to issuing the following commands all in a row, making it a convenient shortcut when you want full tracing:

CLEAR PTRACE

Turning off processing-rule tracing is as simple as:

CLEAR PTRACE and CLEAR PTRACE ALL are equivalent, turning off all the tracing. However, you can turn off the individual options as well:

SET PTRACE is often used in combination with SET TLOG, which directs the tracing data to a temporary log file, which can be examined with the SHOW TLOG DATA command. For more information, [EXPLAIN SET TLOG COMMAND.]

You can see what SET PTRACE commands are in effect with the SHOW PTRACE command:

Since SHOW PTRACE lists the actual commands, you can use the IN ACTIVE command to put them into your active file, where you can modify them and then re-execute them with the XEQ command; clearing the current Ptrace settings with a CLEAR PTRACE command first is recommended.

The $PTRACE flag variable can also be checked to determine if Ptrace is in effect.

B.4.17  Processing Rule Tracing for Passprocs: SET PASSTRACE

SET PASSTRACE is the basic command for activating processing-rule tracing for Passprocs, where SPIRES displays environmental information about the processing rules being executed as element values are being passed to indexes.

The Passtrace facility is available only in SPIRES, not in SPIBILD or FASTBILD. Thus, it is available when a file is processed in SPIRES (using the PROCESS command) or when a goal record-type has immediate indexing and a record transaction occurs. It displays information only about the Passproc rules that are executed; other processing-rule tracing is handled by the command SET PTRACE. [See B.4.16.] A limitation: If you are passing virtual elements whose values are generated by executing Outprocs and/or Inprocs, or if you are passing the external form of an element, then SET PASSTRACE does not display trace information for the Outprocs and Inprocs that are executed. They would not be shown if you SET PTRACE either, however.

Depending on the Passtrace options selected, SPIRES can show you:

There are six forms of the SET PASSTRACE command, depending on which options you want to use. You may issue multiple SET PASSTRACE commands to achieve particular combinations.

Here is the full list of commands:

  SET PASSTRACE                   - basic information
  SET PASSTRACE SNAPSHOT          - detail information
  SET PASSTRACE USERPROCS [names] - information about Userprocs
  SET PASSTRACE VARIABLES [names] - information about variables
                                      used in Userprocs
  SET PASSTRACE JUMP              - information about execution
                                      flow commands in Userprocs
  SET PASSTRACE ALL               - same as issuing all of the above

Each has a parallel CLEAR PASSTRACE command to clear its particular effect. The "names" option on the USERPROCS and VARIABLES forms limits the tracing to the specific Userprocs or variables named. Also, "+ names" and "- names" lists can be used on the USERPROCS and VARIABLES flavors to alter the effect of a previous command.

Aside from some basic differences in the output between Passtrace and Ptrace, the Passtrace commands work the same as their Ptrace counterparts, with similar function. Additionally, like Ptrace, Passtrace has a $PASSTRACE flag variable that you can check (e.g., in a Userproc: "UPROC = If $Passtrace Then * 'Current value is ' $Procvalue;") to determine if Passtracing is set.

Here is a sample session showing part of the output from Passtracing; this subfile, to which a record is being added, has immediate indexing. Note the use of TLOG to hold the tracing output; Passtrace commonly generates hundreds of lines of output per record, particularly for updates. (Updates generate pass entries from both the replacement copy of the record and the tree copy being replaced.)

-> select authors
-> set passtrace all
-> set tlog
-> add
-> sho tlog
-Begin passing (Add) Goal-type BMRC  , Key = 342201
 Passproc A170  $Vlen: 0
-Pass Index value:  $Vlen: 4  $Val: '<00001E01>'
 Passproc A169:1  $Vlen: 0
-Fetch Goal element FD  $Vlen: 4  $Val: <19750803>
-Pass Index value:  $Vlen: 4  $Val: <19750803>
 Passproc A169:1  $Vlen: 0
 Passproc A167:2  $Vlen: 0
-Fetch Goal element MEPN  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A168  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A44  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A44  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A43  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A38:1  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902 '
-Pass Index value:  $Vlen: 7  $Val: 'FREEMAN'
-Pass Index value:  $Vlen: 13  $Val: '<05>HENRY<06>GEORGE'
 Passproc A165  $Vlen: 0
 Passproc A165  $Vlen: 0
 Passproc A167:2  $Vlen: 0
 ...
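Since Passtrace output can run to hundreds of lines per record, it may help to summarize a saved trace with a small script outside SPIRES. The Python sketch below parses "Passproc" lines of the shape shown in the sample session and counts how often each action ran. The line format is inferred from that sample alone, so the regular expression is an assumption; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as: " Passproc A38:1  $Vlen: 28  $Val: '...'"
PASSPROC_LINE = re.compile(
    r"^\s*Passproc\s+(A\d+)(?::(\d+))?\s+\$Vlen:\s*(\d+)(?:\s+\$Val:\s*(.*))?$"
)


def parse_passproc(line):
    """Return the fields of a Passproc trace line, or None for other lines."""
    match = PASSPROC_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    action, p1, vlen, val = match.groups()
    return {"action": action, "p1": p1, "vlen": int(vlen), "val": val}


SAMPLE = """\
 Passproc A170  $Vlen: 0
-Pass Index value:  $Vlen: 4  $Val: '<00001E01>'
 Passproc A44  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A44  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A43  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902-'
 Passproc A38:1  $Vlen: 28  $Val: 'FREEMAN, HENRY GEORGE, 1902 '
"""

entries = [e for e in map(parse_passproc, SAMPLE.splitlines()) if e]
action_counts = Counter(entry["action"] for entry in entries)
print(action_counts.most_common())
```

Lines that do not begin with "Passproc", such as the "-Pass Index value" lines, are skipped rather than treated as errors.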

CLEAR PASSTRACE

Turning off Passproc tracing is as simple as:

CLEAR PASSTRACE and CLEAR PASSTRACE ALL are equivalent, turning off all the tracing. However, you can turn off the individual options as well:

SET PASSTRACE is often used in combination with SET TLOG, which directs the tracing data to a temporary log file, which can be examined with the SHOW TLOG DATA command. For more information, [EXPLAIN SET TLOG COMMAND.]

You can see what SET PASSTRACE commands are in effect with the SHOW PASSTRACE command:

Since SHOW PASSTRACE lists the actual commands, you can use the IN ACTIVE command to put them into your active file, where you can modify them and then re-execute them with the XEQ command; clearing the current Passtrace settings with a CLEAR PASSTRACE command first is recommended.

B.5  The FILEDEF Subfile and File Compilation

B.5.1  The FILEDEF Subfile

All of the file definitions we have written so far have at least one thing in common: they look like records in a SPIRES file. In fact, every file definition is stored and compiled from a record which you add to a highly-structured SPIRES subfile called FILEDEF. This record is your file definition itself.

The fact that all file definitions are records in the FILEDEF subfile indicates several things about the form your file definition must take in order for it to be accepted as a record in FILEDEF. Lines of our definitions, such as "FILE = GG.UUU.DIRECTORY;" are in the form "element name = value;". Lines such as "REQUIRED;" are in the form "element name;". Several lines of a file definition can actually be coded on one line as long as the ";" is present to act as a delimiter. This compression is not recommended, however, since errors while attempting to add a record to FILEDEF result in a diagnostic message which refers to a line number in the active file. (The definitions in this manual are compressed only to increase readability and save space.)

Since a single statement in a file definition is an element in a SPIRES file, you may extend it over several lines: break the value at any blank and continue it on subsequent lines. One length restriction exists, however: no single element in your file definition may have more than 4,096 characters in it. (In other words, the MAXVAL of the file containing the FILEDEF subfile is 4096.)
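The "element name = value;" convention, including continuation lines and the 4,096-character limit, can be illustrated with a deliberately naive Python sketch. It is not the FILEDEF input processor: it assumes that ";" never appears inside a value, and the function names and the sample EXP text are made up for the example.

```python
MAX_VALUE_CHARS = 4096  # limit on a single element value, per the text


def split_statements(text):
    """Join continuation lines, then split into statements on ';'."""
    joined = " ".join(line.strip() for line in text.splitlines())
    return [part.strip() for part in joined.split(";") if part.strip()]


def parse_statement(statement):
    """Return (element name, value); value is None for 'element name;'."""
    name, separator, value = statement.partition("=")
    return name.strip(), (value.strip() if separator else None)


def value_fits(value):
    """Check a value against the 4,096-character limit."""
    return value is None or len(value) <= MAX_VALUE_CHARS


definition = """FILE = GG.UUU.DIRECTORY;
EXP = A directory of
   staff members;
REQUIRED;"""

parsed = [parse_statement(s) for s in split_statements(definition)]
for name, value in parsed:
    print(name, "=", value, "(ok)" if value_fits(value) else "(too long)")
```

The second statement spans two input lines but is treated as one value, broken at a blank, as described above.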

Just as with the records we have defined so far, records in the FILEDEF subfile have a unique key. The record is a file definition itself, and the key is the file name. You indicate the key of a FILEDEF record by coding the "FILE =" element of your definition. Since the key of each goal record in a subfile must be unique, you cannot define two files with the same file name. The key, or file name, is the value by which you reference your file definition when you issue a TRANSFER, DISPLAY, UPDATE or REMOVE command while FILEDEF is selected.

The FILEDEF key includes the "GG.UUU" (account) portion of the file name. When you issue a TRANSFER, DISPLAY, UPDATE or REMOVE command, a check is made to ensure that the key of the record involved begins with the account number of the logged-on user. You cannot see, modify, or remove a file definition from any account but the one that added the record, even if you have privileges to use that account's subfiles.

In addition to the FILE element, the FILEDEF subfile has an element called "RECORD-NAME" that is the key of a data element hierarchy defined as a structure. The values of the multiply occurring RECORD element are the names of the records in your file definition. There is a record description that follows each RECORD element. For each record description there is a FIXED section, a REQUIRED section, and an OPTIONAL section. Within each section are element descriptions headed by an ELEM element and followed by an OCC and LEN attribute.

If the ELEM is actually the KEY, it may or may not have a LEN specified. The KEY cannot be in the OPTIONAL section of the RECORD; it must occur in either the FIXED or REQUIRED section, but only in one of these. An additional requirement is that the KEY must be the first element defined in the one section in which it occurs.

Non-KEY elements may occur in any of the three sections. Elements defined in the FIXED section must have both an OCC and LEN specified. If an element occurs in the REQUIRED section, it need not have an OCC or LEN, but its OCC is one or more. Elements defined in the OPTIONAL section also need not have an OCC or LEN, but their OCC is zero or more. As mentioned earlier, if an OCC of greater than one is specified for a REQUIRED or OPTIONAL element, then SPIRES makes that element multiply occurring. The occurrence count of such a multiply occurring element is not checked unless appropriate INPROCs are coded.
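The placement rules for KEY and non-KEY elements can be collected into a single check. The Python sketch below is a reading of the two paragraphs above, not the validation that FILEDEF actually performs; the dictionary layout, the function name, and the sample element names are all hypothetical.

```python
def validate_record(sections):
    """Return a list of rule violations for one record description.

    sections maps 'FIXED', 'REQUIRED' and 'OPTIONAL' to lists of element
    dictionaries with the keys 'name', 'occ', 'len' and 'key'.
    """
    errors = []
    key_places = [
        (section, index)
        for section, elements in sections.items()
        for index, element in enumerate(elements)
        if element.get("key")
    ]
    if len(key_places) != 1:
        errors.append("the KEY must occur in exactly one section")
    for section, index in key_places:
        if section == "OPTIONAL":
            errors.append("the KEY cannot be in the OPTIONAL section")
        elif index != 0:
            errors.append(f"the KEY must be first in the {section} section")
    for element in sections.get("FIXED", []):
        if element.get("key"):
            continue  # the KEY may or may not have a LEN
        if element.get("occ") is None or element.get("len") is None:
            errors.append(f"{element['name']}: FIXED needs both OCC and LEN")
    return errors


record = {
    "FIXED": [
        {"name": "ID", "occ": 1, "len": 4, "key": True},
        {"name": "AGE", "occ": 1, "len": None, "key": False},
    ],
    "REQUIRED": [{"name": "NAME", "occ": None, "len": None, "key": False}],
    "OPTIONAL": [],
}
problems = validate_record(record)
print(problems)
```

Here the only violation reported is the missing LEN on the FIXED element AGE; elements in the REQUIRED and OPTIONAL sections pass without OCC or LEN.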

Following the record definitions, another portion of the FILEDEF record defines the subfile by specifying the SUBFILE-NAME, the GOAL-RECORD and what ACCOUNTS have access to the goal record. This section, called the "subfile section," also allows an optional element, EXP; this element contains explanatory information about the subfile. The user requests this information by issuing the EXPLAIN command.

You can have several subfile sections, though there need not be more than one. Multiple subfile sections allow you to specify different subfile names, goal records, explanations and access levels to various account groups.

B.5.2  Adding Records to FILEDEF

Once you have collected a record in your active file that meets all the requirements of the FILEDEF subfile structure, you add the record to the file definition subfile. A record is added to FILEDEF just as it is to any subfile, except that in this case the ADD command invokes some complicated validation procedures. Among the validation tests is a check that the first portion of the file name matches the account of the logged-on user.

To add a file definition, do the following in SPIRES:

If your definition contains certain kinds of errors, they will be reported to you at this time and the record will not be added; some other kinds of errors will not be caught until you try to compile the definition. Errors are usually reported for certain lines of the file definition code in your active file; the error number tells you what processing rule you violated in your file definition. These errors can be explained using the EXPLAIN command. The record was successfully added to the FILEDEF subfile if SPIRES returns only a command prompt to you.

Typical errors that are detected when you try to add a record to FILEDEF are:

For example, if you had specified "RECORD-NAME=ARTICLE" in the record description and "GOAL-RECORD=REC01" in the subfile section, no error would be reported until you tried to compile the definition.

If errors are reported, correct your record using the text editor and reissue the ADD command. Once the record has been added, you may then compile it.

B.5.3  Compiling File Definitions

To compile your file definition, select the FILEDEF subfile and issue the COMPILE command:

[IN ACTIVE [CLEAR|CONTINUE]] COMPILE filename [BIG] [STATISTICS] [NOWARN]

All these options, also available on the RECOMPILE command, are described below:

An Example of Compiling a File Definition

Despite the elaborate syntax of the COMPILE command, compiling most file definitions is quite simple; most of the options are for the benefit of large or complex files. Here's how a typical compilation session looks:

If SPIRES cannot compile the file definition, it will report errors to you. These errors will look quite different from those you may have gotten when you tried to add the record to the FILEDEF subfile. In particular, the errors caught at the time of compilation are not keyed to line numbers, but to structures and elements in the FILEDEF record. All the diagnostics you might receive from the compiler are described in "File Definition Compilation Diagnostics"; explanations and suggested solutions are provided for most error messages. [See C.9.] You may also issue the EXPLAIN command to find out more about error messages you may receive.

If errors were reported, you should modify your definition in the FILEDEF subfile, and then try compiling it again.

The STATISTICS Option on the COMPILE Command

The STATISTICS option on the COMPILE command is quite useful when you are compiling a very large file definition, because it shows you how close internal tables are getting to their size limitations. Here is a sample display:

The first number in each line is the number of bytes in the table; the second number is the maximum size for that table; the two numbers are compared in the percentage that follows.

Asterisks at the start of a line indicate a table that exceeds 90% of its allowed size. If you don't include the STATISTICS option on the COMPILE command, those lines are displayed as warning messages, which can be suppressed by using the NOWARN option.

Two important tables to watch are the "Overall Limit" and the "USERPROC Tables & Header" in the Record Statistics section. Each is a combination of most of the tables listed above it, with a limit that is less than the total of all the limits of the tables within it. So you could exceed its limit without exceeding the limits of any of the tables within it.
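
The point about combined limits is easy to see with numbers. The following Python sketch uses made-up table names, sizes and limits, and its output format is not the actual STATISTICS display; only the 90% threshold and the idea of an overall limit smaller than the sum of the individual limits come from the text above.

```python
def table_report(tables, overall_limit, warn_at=0.90):
    """tables: {name: (bytes_used, limit)}, all values hypothetical.

    Returns (lines, overall_exceeded). A line is flagged with asterisks
    when the table exceeds warn_at of its own limit.
    """
    lines = []
    for name, (used, limit) in tables.items():
        pct = 100.0 * used / limit
        flag = "**" if used > warn_at * limit else "  "
        lines.append(f"{flag} {name}: {used} of {limit} ({pct:.0f}%)")
    total_used = sum(used for used, _ in tables.values())
    return lines, total_used > overall_limit


# Neither table is near its own limit, yet the combined limit is exceeded.
lines, overall_exceeded = table_report(
    {"TABLE-A": (600, 1000), "TABLE-B": (700, 1000)}, overall_limit=1200
)
```

Here each table is at 60% or 70% of its own limit, so no line is flagged, but the combined 1300 bytes exceed the overall limit of 1200.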

B.5.4  Altering a File Definition in FILEDEF

To alter a file definition in the FILEDEF subfile, use the following commands:

The SPIRES command does not clear the active file, so if your added definition is still in the active file, you may make modifications to it using the text editor, and issue the command:

with no preceding transfer command. You may also want to use the MERGE command when it will suit your purposes. To use any of these commands, you must be in SPIRES and have FILEDEF selected. The "account" portion of the file name is required when you are naming the key of a record in the FILEDEF subfile (e.g., on a TRANSFER, DISPLAY, UPDATE or REMOVE command).

If you do transfer your definition you will notice that it is listed in a form that may be different from the form in which you added it: every element is on a separate line, and indentation has been added to show a hierarchical relationship among elements of the definition. You may want to obtain an offline listing of the definition; you can use the PERFORM PRINT command to get one. For help, [EXPLAIN PERFORM PRINT.]

After you have issued the UPDATE command and received no error messages, you must attempt to compile the file definition record in the FILEDEF subfile again by issuing the COMPILE command. If you again receive error messages, you must modify the definition in the FILEDEF subfile before trying the COMPILE command again.

B.5.5  ORVYL Files Created by Compilation

You may find it informative to issue the ORVYL command SHOW FILES both before and after you compile your definition, so you can see exactly what disk files are created on your account by SPIRES. The disk data sets created by the COMPILE command (and erased by the ZAP command) are the following:

The goal-record and index-record data sets are created one per record-type, by default. So, the first record-type defined in the file will be stored in the data set "filename.REC1", the second in "filename.REC2", etc. Only 15 such data sets are allowed, "numbered" from REC1 through REC9, followed by RECA, RECB, RECC, RECD, RECE and RECF.
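
The default naming scheme can be expressed in a few lines. This Python sketch is only an illustration of the rule just stated; the function name record_data_set is hypothetical.

```python
SUFFIXES = "123456789ABCDEF"  # REC1..REC9, then RECA..RECF: 15 in all


def record_data_set(filename, ordinal):
    """Default data set name for the ordinal-th record-type (1-based)."""
    if not 1 <= ordinal <= len(SUFFIXES):
        raise ValueError("only 15 uncombined record-type data sets are allowed")
    return f"{filename}.REC{SUFFIXES[ordinal - 1]}"
```

For a file named MYFILE, the first record-type maps to "MYFILE.REC1" and the tenth to "MYFILE.RECA"; asking for a sixteenth raises an error, which is the situation the COMBINE statement addresses.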

For various reasons, you may want or need to "combine" several record-types into a single data set. That would allow you to have more than 15 record-types, or to have fewer data sets in use at once. You'll use the COMBINE statement to do this. [See B.6.5.]

B.5.6  The ATTACH and SELECT Commands

As soon as you have added the file definition to the FILEDEF subfile, you (and other users to whom you've given access) can issue the SELECT command to select your subfile. (If the file definition has not been compiled, SPIRES will tell you that there is no file to access.) At that point, you may begin adding and displaying the records, and processing the file in SPIBILD in order to try the indexes.

The ATTACH command is necessary when you wish to examine some record-type other than one that is the goal record-type of a subfile. The syntax of the command is:

"Filename" should use the "ORV." form of the name (ORV.gg.uuu.filename), unless the file is your own, in which case you can omit the "ORV.gg.uuu." portion. Without any of the options (i.e., "ATTACH filename"), the first record-type of the file will be attached.

You can also specify the record-type you want to attach by either its name or number, where "record-number" is the ordinal number (1 through 64) of the record-type you wish to ATTACH. The first record-type defined in the file is number 1; the second is number 2, etc.

This command simulates the SELECT command, although no subfile is involved. However, the attached record-type may be treated as if it were a goal record-type for a selected subfile: the DISPLAY and BROWSE GOAL commands are particularly useful. The ATTACH command is very useful in allowing a file owner to examine the contents of index records, since these can be attached as if they were goal records. [See C.6.5.]

Subfile Selection When Multiple Subfiles Share a Name

Anytime SPIRES, SPIBILD or FASTBILD must attach a record-type based on a subfile name (e.g., when the SELECT command is issued, or a subfile lookup proc such as $SUBF.LOOKUP or A65 is executed, or subfile subgoal processing occurs in formats, or subfile phantom structures are used), a procedure for determining which file and which record-type to attach is followed:

The effect of this procedure is to establish an order of precedence for subfile selection, allowing your own files and their subfiles to take precedence over the system's, which in turn take precedence over other people's when you select a subfile whose name is shared by multiple files.
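
The resulting order of precedence can be modeled as a simple ranking. The Python sketch below covers only that ordering (your own files, then the system's, then other people's), not the individual steps SPIRES follows; the function name, the account names and the idea of passing in a set of "system" accounts are all assumptions made for illustration.

```python
def pick_file(candidates, my_account, system_accounts):
    """candidates: list of (owner_account, file_name) sharing one subfile name.

    Returns the candidate that takes precedence, or None if there are none.
    Ties within a rank keep their original order (an assumption).
    """
    def rank(candidate):
        owner = candidate[0]
        if owner == my_account:
            return 0  # your own files come first
        if owner in system_accounts:
            return 1  # then the system's
        return 2      # then other people's

    return min(candidates, key=rank) if candidates else None
```

Given candidates owned by another user, by a system account and by yourself, the sketch returns your own file regardless of the order in which the candidates are listed.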

B.5.7  The PROCESS Command in SPIBILD

After you have added a few records to the subfile, using the ADD or INPUT BATCH commands in SPIRES, you may want to pass those records from the deferred queue to the goal record data set. Normally this process is done overnight by an automatically-generated batch SPIBILD unless you have coded NOAUTOGEN in the file definition or issued the SET NOAUTOGEN command. If you do not want to wait overnight, you may pass the records online by using the SPIBILD processor. SPIBILD may be called from SPIRES or WYLBUR by issuing the SPIBILD command. The processor is used as follows:

If you want to process a file belonging to another account to which you have processing privileges [See B.9a.] you must type the entire file name, including the account number, preceded by "ORV.", as in:

For more information about SPIBILD and file-processing, see the manual "SPIRES File Management"; online, EXPLAIN PROCESS COMMAND.

B.5.8  Making Major Changes to a File: The ZAP FILE Command

[Editor's note: A fuller explanation of this process appears in the SPIRES manual "File Management", chapters 2.6 through 2.9.]

Once you have a compiled file definition you may want to change it -- put new elements in the file, add aliases or delete some elements. After you modify the definition that exists in the FILEDEF subfile, you will have to compile the definition again. However, the COMPILE command will return an error message if the disk data sets created by the COMPILE command already exist. So in order to modify a definition, you often must "zap" the file, which means discarding any and all records already in the file. (There are some changes that can be made to the file definition that will not require you to destroy the records already there because the data actually stored on disk is not invalidated. [See C.1.1.] Changes in the subfile section of the file definition do not force you to erase the records you have entered, for example.)

Let's assume that your changes will require you to erase the file's contents and ORVYL files and start again. Before you get rid of the file that already exists, you may want to salvage data that has been built into the goal records from adds or updates in the deferred queue, in order to minimize the amount of data that must be entered again after the file has been redefined.

To retain records that have been passed from the deferred queue to the goal record data set (either by JOBGEN overnight or by the online SPIBILD PROCESS command), issue the following commands:

After you destroy the current file and compile the file definition again, you can add these records into the new file, as described below.

You now can erase the disk data sets associated with your SPIRES file (those created by the COMPILE command) by issuing the ZAP FILE command in SPIRES:

For example,

The "permission" that SPIRES asks for lets you confirm that you have requested the proper file to be "zapped". (Remember, this is the most permanent change you can make to a file!) If you include the NOWARN option, SPIRES will not ask you if you are sure you want to zap that file.

If you issue the ORVYL command SHOW FILES after the ZAP FILE command, you will see that the disk files created under your account by the COMPILE command have been erased. If you enter SPIRES and select the FILEDEF subfile, you will see that the file definition for the file you zapped, which is maintained in the system FILEDEF subfile, has not been destroyed or altered. You may transfer and update this definition to reflect the changes you wish to make, and then compile the new definition with the COMPILE command.

After you have "zapped" the current file and compiled the file definition anew, you can get the old records back into your active file and then, in SPIRES, select the subfile and issue the INPUT BATCH command to put the records back into the file (that is, into the new file):

Alternatively, use the INPUT BATCH command in SPIBILD as follows:

B.5.9  Making Minor Changes to a File: The RECOMPILE Command

There are some changes you can make to your file definition that do not invalidate data already stored on disk. For example: you can add an element or elements to the OPTIONAL section; you can add aliases to any element; you can add more record types (such as the index records we will define later); you can change some processing rule sequences. A detailed list specifying the changes that can and can't be made to a previously compiled definition is in the section "Recompile of an Existing File's Definition." [See C.1.]

If you make changes like these, you need only TRANSFER and UPDATE the file definition in the FILEDEF subfile, and then issue the RECOMPILE command:

With the exception of SHARE, described below, these options are the same as those on the COMPILE command. [See B.5.3.]

The RECOMPILE command changes only the "master" ORVYL data set for your SPIRES file (filename.MSTR); data records are not altered. Take care that you do not use the RECOMPILE command when data already stored on disk will be altered; this error could cause catastrophic loss of your data, and at best is very difficult to recover from. It is a good precaution to save a copy of your old file definition before you update the FILEDEF record and RECOMPILE; if an error is made, you can recover by going back to the previous definition and recompiling it.

You can make some changes that do not require a COMPILE or RECOMPILE command. Changes in the subfile section (which begins with the SUBFILE-NAME element) only require that you update the file definition in the FILEDEF subfile; the changes will take effect immediately. However, changes in BIN, NOAUTOGEN, or MAXVAL do require a RECOMPILE.

The SHARE option allows you to recompile a file definition even when other users have subfiles of the file selected. (If you don't include the SHARE option, SPIRES won't allow you to recompile in that situation.) The only possible impact to those users might be a delay of a few seconds in processing commands they issue while the recompilation takes place. They won't see any of the changes wrought by the recompilation (e.g., new index or element names, new elements, etc.) until they completely reselect the subfile (that means they must CLEAR SELECT or select another subfile before reselecting the subfile in question).

Using the SHARE option is not recommended if you are changing the number of record-types in the file.

B.5.10  Destroying a SPIRES File

When you want to completely destroy a SPIRES file because you no longer need it, you usually follow two steps. First, you destroy the ORVYL data sets stored on your account that hold the data, by issuing the ZAP FILE command discussed earlier. Second, you remove your file definition from the FILEDEF subfile.

The first step, issuing the ZAP FILE command, is quite radical, since it immediately destroys your entire file, releasing the storage blocks back to ORVYL. If you think there is a possibility that you will need to use the data in the file again someday, you should consider saving the data somewhere else before zapping the file. The procedure for doing that was discussed earlier. [See B.5.8.]

Then you issue the ZAP FILE command:

where "account.filename" is the name of the file as given in the FILE statement of the file definition. (The "account" portion may be omitted, if desired, or replaced by an asterisk or a period, as in "ZAP FILE .MYFILE".)

For example,

If you specify the NOWARN option, SPIRES will destroy the file without asking you to confirm your request. Remember that this change is permanent; once the command starts executing, your file is gone.

The second step is to remove your file definition from the FILEDEF subfile:

If you think you might want to use the file definition again someday, you are welcome to move it from the FILEDEF subfile to the BACKFILE subfile, an archive of file definitions. The above procedure would be modified as follows:

You should remove the file definition from the FILEDEF subfile so that the names of its subfiles will no longer appear in response to the SHOW SUBFILES command. The list of subfiles an account can select is derived from the FILEDEF subfile, not from the compiled files themselves.

Other Uses of the BACKFILE Subfile

There are other reasons why a file owner might want to put a file definition into the BACKFILE subfile. These reasons depend on the fact that a file definition in BACKFILE can be compiled, creating a regular SPIRES file. However, the compiled file is different from other files in two respects:

To compile a file definition in the BACKFILE subfile, follow this procedure:

The COMPILE command has precisely the same syntax in this context as it does when you compile a file definition stored in the FILEDEF subfile. [See B.5.3.]

Record-types may be defined with DEFINED-BY statements, indicating that the record definition is in a BACKRECS subfile record and has been compiled separately. That is, if you move a file definition containing DEFINED-BY statements referring to RECDEF subfile records, you should move the record definitions from RECDEF into BACKRECS -- otherwise, the file definition cannot be compiled in BACKFILE. You can combine the procedure described above for moving a file definition from FILEDEF to BACKFILE, altering it for RECDEF and BACKREC, with the procedure for compiling record definitions, replacing RECDEF with BACKREC, which is described later in this manual. [See C.7.1.]

Similarly, if the file definition contains EXT-REC or EXT-LINK statements, the records referred to in the FILEDEF subfile should be moved (or copied, if they are used by other files) from FILEDEF to the BACKFILE subfile. These are not compiled separately, so no further work needs to be done. [See C.7.2.]

EXTDEF subfile records are handled differently, however. If you compile a file definition or a record definition from BACKFILE or BACKREC respectively, and it contains an EXTDEF-ID statement, the record referred to must be in the EXTDEF subfile, not the BACKDEFS subfile; otherwise, SPIRES will not find it. The BACKDEFS subfile is strictly for archival purposes, and is not examined by SPIRES during any compilation process. [See C.10.5.]

B.5.11  Summary

In sum, the SPIRES commands TRANSFER, UPDATE, DISPLAY and REMOVE are used to maintain your file definition record in the FILEDEF subfile. The COMPILE, RECOMPILE and ZAP commands are used to create, modify and destroy the ORVYL files associated with your SPIRES file that are stored under your account. The BATCH and PROCESS commands are used in SPIBILD to add records to the deferred queue and pass records to the record types (goal record and index records) of your file.

B.6  File Structure: Tree & Slot, Goal & Index, Removed Records

B.6.1  Introduction

In the file definitions coded up to this point, each has had only one RECORD-NAME statement; the record named has been called the "goal record." Though it is possible for a SPIRES file to have only this single record definition, typically a file definition will have several RECORD-NAME statements. Each RECORD-NAME statement after the goal record's RECORD-NAME statement signals the definition of another "record-type," whose contents may be derived from or independent of the contents of the goal record. Additional records whose contents are derived from the elements in the goal records are called "index records"; a file has as many index record-types as it has indexes. Indexes in SPIRES can serve the same purpose as indexes in books; they relate a term (or search value in SPIRES) to its location (a goal record). At this point, it may be helpful to review the terms defined in the glossary. [See A.4.]

Figure B.6.6 is an outline of the relationship between commands issued by a user and the goal and index records. Simply stated, information is passed from the goal records to the appropriate index record or records when the file is processed; this process is called "passing," appropriately enough. When a FIND command is issued, SPIRES examines the index record named in the command. For example:

would examine the author index and report on the number of SMITH's found. Each SMITH located is associated with a pointer to a particular goal record that contains the name SMITH. When a TYPE command is issued, SPIRES uses the pointers to access and retrieve the goal records themselves.

SPIRES updates the index records during overnight processing. The definition of index records is fairly straightforward, and can be reduced almost to "recipes" for the majority of files defined. The linkage between goal and index records is defined by the file owner, and can also largely be reduced to recipe. The recipes require the file definer to determine how the file will be used; the types of indexes needed often follow directly from user requirements.

How does SPIRES interpret, handle, and store data that you input to your file? This question addresses at least two levels of data organization: 1) the element level, dealing with the organization and demarcation of elements in a single record; 2) the record level, dealing with the organization of records in a single record type, i.e., goal or index. We will begin with the element level.

B.6.2  Element Storage

When coding the goal record definition, three storage categories were used: FIXED for elements fixed in both length and occurrence, REQUIRED for elements that must occur, and OPTIONAL for elements that need not occur. (Note: these storage categories are distinct from element structure categories, of which we have encountered only simple elements and TYPE = STR elements; there are also LCTR and XEQ elements.)

"FIXED", "REQUIRED" and "OPTIONAL" signify both an element type and an element storage category. Each type requires a slightly different storage schema. In an earlier chapter of this manual, it was noted that FIXED elements, those for which both LEN and OCC are determined, are the least expensive to store, while OPTIONAL elements, those for which neither LEN nor OCC need be specified, are the most expensive.

Fixed elements are the least expensive to store because they are of predefined and unvarying length and occurrence. SPIRES groups these elements together in the order in which they appear in the file definition, so that individual elements can be separated out of a packed data schema by using element locators stored in the compiled file characteristics. These characteristics are stored in the MSTR (master) file created by the COMPILE command for the following reason: Fixed elements are always stored at the beginning of a record in a SPIRES subfile. The locations of these elements never change from record to record, so the location information for all records can be stored just once, in the file characteristics (MSTR) table.

Element location information for Required elements cannot be stored entirely in the file characteristics table. Because these elements can vary in length and occurrence from record to record, "header" information must be prefixed to each element to indicate how many values occur and how long each value is. Figure B.6.7 shows schematically how the headers are arranged as required for each storage category.

If a Required element has fixed length, then the header contains the total length for all of the element's occurrences. If the element is singly occurring but of variable length, then the header still contains the total length of all occurrences, but here "all occurrences" means "one occurrence." If a Required element varies in both length and occurrence, then three kinds of information are stored in the record in front of the elements' occurrences: 1) again, the total length of all occurrences is stored at the head of the group of occurrences of an element; 2) this single header is followed by an occurrence count for the element; 3) each element occurrence is preceded by the length of that occurrence of the element. Each of these three pieces of information is stored in a two-byte field.
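
The three header arrangements can be illustrated with a short encoder. This Python sketch is not the SPIRES storage code: the byte order (big-endian) and the convention that the total length counts every byte following the total-length field are assumptions, as is the function name encode_required. Only the two-byte fields and their order come from the description above.

```python
import struct


def encode_required(values, fixed_len=False, single=False):
    """Encode one Required element's occurrences with the headers described.

    values is a list of byte strings. Assumed, not stated in the text:
    big-endian fields, and a total length that counts the bytes after it.
    """
    if single and len(values) != 1:
        raise ValueError("a singly occurring element has exactly one value")
    if fixed_len or single:
        # Fixed length, or one varying-length occurrence: total length only.
        body = b"".join(values)
    else:
        # Varying length and occurrence: an occurrence count, then each
        # occurrence preceded by its own two-byte length.
        body = struct.pack(">H", len(values))
        for v in values:
            body += struct.pack(">H", len(v)) + v
    return struct.pack(">H", len(body)) + body
```

Under these assumptions, two occurrences "AB" and "CDE" of a fully varying element take 13 bytes (a two-byte total, a two-byte count, and two length-prefixed values), while a single occurrence "HELLO" takes 7.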

Note that Required (and Optional) elements really have only two categories of occurrence counts: single and multiple. An occurrence count other than one (e.g., OCC=3;) does not cause SPIRES to verify that the element occurs the specified number of times; but if OCC=1 is specified, no occurrence count bytes will be maintained. INCLOSE rules can be used to verify that an element occurs a certain number of times.

Each data element in the Required section of the record may be located by skipping over any preceding elements through the use of the length information prefixed to each element. Thus, the time and expense required to locate a Required element increases the farther into the Required section of a record the element is stored; for this reason it is advantageous to define often-accessed elements early in the Required section and put seldom used elements toward the end.
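
Why position matters can be seen by walking the section. In the Python sketch below, each element is assumed to begin with a big-endian two-byte total length that counts the bytes following it; that layout detail and the function name locate are assumptions for illustration only.

```python
import struct


def locate(required_section, index):
    """Return the byte offset of the index-th element (0-based).

    One length field is read, and one element skipped, for every element
    that precedes the one wanted, so the work grows with the index.
    """
    offset = 0
    for _ in range(index):
        (total,) = struct.unpack_from(">H", required_section, offset)
        offset += 2 + total
    return offset
```

Reaching the first element takes no skips at all, while reaching the tenth takes nine, which is why often-accessed elements belong early in the Required section.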

Optional data elements occur in the same way as required elements, except that they need not occur at all in some records. The same sort of information that is stored with required elements is sufficient to locate optional elements, but only if there is some means of detecting the presence or absence of each optional element before the value-skipping process begins. This means of detection is a string of bits (binary digits) that is called the "optional element bit mask." If optional elements were defined in a record definition, the optional element bit mask is stored as a data element in the required section. The presence of this element is transparent to the user: it cannot be displayed. Each optional element in the record is assigned a number, starting with one. The corresponding bit in the optional element bit mask will be "1" if the element occurs, and "0" if it does not. (For instance, if the fourth bit in the mask for a record is "0", the fourth possible optional element does not occur in that particular record.) In order to locate an optional element, the system checks the corresponding bit in the mask and if it is "1", then counts the number of "1" bits that precede it. The counting determines how many preceding elements in the optional section must be skipped over to reach the desired element.
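
The lookup just described amounts to a bit test followed by a count. The Python sketch below represents the mask as a string of "0" and "1" characters to avoid guessing how the bits are packed into bytes; the function name optional_position is hypothetical.

```python
def optional_position(mask_bits, element_number):
    """mask_bits: one '0' or '1' per optional element, numbered from 1.

    Returns None if the element does not occur in this record; otherwise
    the number of preceding optional elements that must be skipped over.
    """
    if mask_bits[element_number - 1] != "1":
        return None
    return mask_bits[: element_number - 1].count("1")
```

With a mask of "1101", the third optional element does not occur, and the fourth is reached after skipping two elements (the first and second).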

The optional element bit mask is itself an element in the file. Just as SPIRES creates a fixed key element for you if your record is slot (and creates an implicit fixed section if you had not coded one before), SPIRES implicitly creates a Required element of fixed occurrence (one) and varying length if you define any optional elements. If no Required section exists, one is created for you, though only SPIRES knows it exists. However, a slot key is explicitly requested by the file definer, while the optional element bit mask is part of the overhead involved in storing optional elements.

Once a record-type has records in it, elements can only be added to the optional section. If no optional elements were defined, then no new elements can ever be defined for the record-type. But if any optional elements are coded, the optional element bit mask will occur, and other optional elements can be added to the record definition, with SPIRES simply making the bit mask (which is a varying-length element) longer. You can see now why it is advisable to code a "dummy" element in the optional section if you would not otherwise have an optional section: it allows you to add elements to your file at a later date. Elements added must go in the optional section after any previously defined elements. You may want to specify length and occurrence if they are known, but you cannot add elements to the Fixed or Required sections.

Adding elements to structure definitions has the same rules as adding elements to record definitions: new elements must go at the end of the OPTIONAL section of the structure. If no OPTIONAL elements were defined for the structure, then no elements can be added. Adding new structures to a record definition follows the same rules as adding elements: new structures must be declared in the OPTIONAL section of the record definition.

The chapter "Recompiling An Existing File's Definition" contains detailed information on recompiling a file definition. [See C.1.]

As shown in the chapter "Structures," [See B.3.] a data element may be simple and incapable of any further breakdown, or it may be a structure that consists in turn of other simple or structure elements. Data elements in any of the three storage categories may be structures. Furthermore, the elements within a structure are categorized as fixed, required, and optional. Unlike full records, structures may or may not have key data elements, but if they do, these keys must be fixed or required. There are several corollaries to all this. One is that if a structure is a fixed element, then all its elements must be fixed elements. Probably, if all the simple data elements in a structure are optional, then the structure itself should be declared optional--because if none of the structure's optional elements occur, the structure will not have occurred. (This is why INCLOSE rules for elements in a structure are not executed unless at least one element in the structure occurs.) However, a structure containing required elements may be an optional structure.

B.6.3  Record Storage

We can now look at SPIRES file structure at the record level; that is, how does SPIRES store the goal and index records?

Recall that in the definition of goal records, two types of goal records were distinguished: "tree" and "slot." For a slot goal record we coded "SLOT;" probably "REMOVED;" and possibly "SLOTCHECK;". The way SPIRES handles tree and slot goal records is quite different; since most goal records and all index records are tree structured, we will begin the discussion of record storage with a tree structured data set, such as the one shown in figure B.6.8.

At Stanford, SPIRES files are maintained in the ORVYL file system, in which files are made up of fixed length blocks. SPIRES takes the goal and index records in your file and packs them into a block. A block is a unit of space on disk measuring 2048 bytes or characters. It is the amount of contiguous data that can be retrieved from disk into memory in one read operation by the system. A block is a physical record; it can be packed with a number of logical records such as the "records" you add to your file. Just as a block can contain several logical records, one logical record may extend over several blocks if the record is larger than 2048 bytes.

Given that one block is 2048 bytes, then 1) if you have goal records of about 200 bytes, only 10 of them will fit into a block; 2) if your goal records are only 10 bytes long, then about 200 of them will fit into a block.
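
The arithmetic behind these figures is plain integer division. In this Python sketch the overhead parameter is a hypothetical allowance for the control information and trailers that also occupy part of each block; with the default of zero the results are upper bounds.

```python
BLOCK_SIZE = 2048  # bytes per block, as stated above


def records_per_block(record_len, overhead=0):
    """Upper bound on the whole records of this length that fit in a block."""
    return (BLOCK_SIZE - overhead) // record_len


def blocks_for_record(record_len):
    """Fewest blocks one record of this length can span (ceiling division)."""
    return -(-record_len // BLOCK_SIZE)
```

A 200-byte record gives 10 per block and a 10-byte record gives 204, in line with the rounded figures above; a 5000-byte record spans at least three blocks.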

Record data sets (RECORD-NAME = REC01 is the start of one record-type data set in many of the sample definitions in this manual) must be organized so that a particular record can be located by the value of its key data element (with a command such as DISPLAY <key>) in the fewest possible number of disk reads. The number of disk reads is the largest factor in the speed of searching a file. Keeping in mind that the number of blocks the system must examine to locate a particular key is identical to the number of reads, you can see why the more highly packed structure, 200 records of 10 bytes each per block, is preferable to an organization in which only 10 records are contained in a block. But this does not mean that SPIRES efficiency deteriorates as the size of the records increases.

In the introduction to this manual, the claim was made that SPIRES can locate one record in 500,000 in only four disk accesses or reads. What kind of file structure allows such efficiency? SPIRES files are structured in what is called a "B-tree." Figure B.6.8 depicts a very simple B-tree structure; we will use this figure in the discussion of file structure that follows. Many of the details of a file block's contents have been removed from Figure B.6.8; these details are concentrated in Figure B.6.9, showing only one block. The tree structured data set shown in Figure B.6.8 might show the goal records in a dictionary subfile keyed on "word," where "word" is the item you would look up to find its definition. The figure shows only the records' keys, instead of all the elements in each record. Let's assume that each record contains all the elements that are normally associated with a dictionary entry.

Some data base terminology is necessary before we describe how SPIRES would locate a record in this dictionary tree. (You will need to refer to Figures B.6.8 and B.6.9 in this short definition of terms.) The "depth" of a tree is the maximum number of disk accesses to reach a record, or the number of accesses necessary to reach the deepest record. In Figure B.6.8 the depth of the tree is three. The "average disk accesses" is the sum of the number of disk accesses to reach each record divided by the number of records; this is 2.24 in Figure B.6.8. A "node" is any record in a block; there are twenty-one nodes in the present case. A "terminal node" is any record that does not point to a block deeper in the tree; "jungle" and "zero" are terminal nodes, while "join" is a non-terminal node. The highest level block, not pointed to by any other blocks, is called the base or "root" block (also called block 0). Each block begins with an area of "block control information" (labeled "header" in Figure B.6.9) that SPIRES uses to validate the integrity of data stored in the block. Each block ends with an area of "trailers." Trailers begin at the end of the block and move toward the center of the block. They are fixed length and are maintained in key sequence, which is either alphabetically or numerically increasing, depending on the type of the key. Trailers are used to give the location ("displacement") of each record in the block. Thus, the fixed length trailers can be used to access the varying length records. Because the trailers are kept in key sequence, the records need not be. The trailers are shown only in Figure B.6.9, and Figure B.6.8 shows the records themselves in key sequence for simplicity.
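The role of the trailers can be illustrated with a toy model in Python. The layout here is an assumption made for the sketch: each trailer is a (key, displacement, length) tuple, and keys are compared in Python's own string order. The actual trailer format is not described in this chapter.

```python
from bisect import bisect_left


class Block:
    """Toy block: a data area of varying-length records plus sorted trailers."""

    def __init__(self):
        self.data = bytearray()  # records stored in arrival order
        self.trailers = []       # (key, displacement, length), kept in key order

    def add(self, key, record):
        entry = (key, len(self.data), len(record))
        self.data += record
        # Only the small fixed-size trailer is inserted in key sequence;
        # the record itself stays where it was written.
        self.trailers.insert(bisect_left(self.trailers, entry), entry)

    def find(self, key):
        keys = [trailer[0] for trailer in self.trailers]
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            _, displacement, length = self.trailers[i]
            return bytes(self.data[displacement:displacement + length])
        return None


block = Block()
block.add("ZERO", b"the digit 0")
block.add("JUNGLE", b"a dense tropical forest")
block.add("HERB", b"a seed plant without a woody stem")
print([trailer[0] for trailer in block.trailers])  # ['HERB', 'JUNGLE', 'ZERO']
print(block.find("JUNGLE"))                        # b'a dense tropical forest'
```

Note that the records were added out of key order, yet a lookup needs only the trailers to locate any one of them.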

After the block control information the actual records are stored, each preceded by a "branch pointer" to a block deeper in the tree. No two branch pointers point to the same block; thus, there is one and only one path from the base block to any record in the data set.

Except in unusual cases, SPIRES will ensure that each block contains at least sixteen records, so that accessing efficiency will not be impaired by a few large records. The mechanism for this is described in the SPIRES design documents, and will not be discussed here. (Figure B.6.8 is simplified by putting fewer than sixteen records in each block.)

Let's outline what happens when the command "DISPLAY HERB" is issued, where HERB is the key of a record in the tree-structured goal record shown in Figure B.6.8. SPIRES will read the base block into memory and scan the keys for "HERB". Because the key is found there, the search terminates after only one disk access. The record is then displayed.

Suppose the command "DISPLAY DATA" is issued. Starting at the base block again, the block is read into memory and scanned. "DATA" is not found before a key greater than "DATA" in sort sequence, "HERB," is found. Right before HERB there is a branch pointer to block 2. Block 2 is read into memory and scanned until a key greater than "DATA," "ELEMENT," is found. Preceding "ELEMENT" is a branch pointer to block 6, which contains the key "DATA." Blocks 0, 2 and 6 were read to find the key "DATA". Thus, three disk accesses were required.
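The two searches just described can be traced with a small Python sketch. The block contents below are a hypothetical fragment shaped to match the walk-through (HERB in block 0, ELEMENT in block 2, DATA in block 6); the other keys of Figure B.6.8 are omitted, and the function name is invented. How SPIRES handles a key greater than every key in a block is not covered above, so the sketch simply reports such a key as not found.

```python
# Each block is a list of (branch_pointer, key) pairs in key order. The
# branch pointer precedes its key; None means no deeper block exists.
BLOCKS = {
    0: [(2, "HERB")],
    2: [(6, "ELEMENT")],
    6: [(None, "DATA")],
}


def find(key, blocks=BLOCKS, root=0):
    """Return (found, blocks_read) for a key lookup starting at the base block."""
    blocks_read = []
    block_no = root
    while block_no is not None:
        blocks_read.append(block_no)      # one disk access per block visited
        next_block = None
        for branch, node_key in blocks[block_no]:
            if node_key == key:
                return True, blocks_read
            if node_key > key:
                next_block = branch       # follow the pointer preceding this key
                break
        block_no = next_block
    return False, blocks_read


print(find("HERB"))  # (True, [0])
print(find("DATA"))  # (True, [0, 2, 6])
```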

As noted before, the depth of the tree in Figure B.6.8 is three: three disk accesses are required to read the record whose key is "JUNGLE". The depth of any tree is a function of the number of records that can fit into a block: the more records per block, the shallower the tree. If eight records were put into each block in Figure B.6.8, only three blocks would be needed for the entire file, and it would never take more than two disk accesses to locate any record. If 100 records could be put into each block, only a single block would be used, making retrieval extremely efficient. Later in this chapter, we will see how this many records can be packed into a single block.
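The relationship between records per block and tree depth can be estimated with the following Python sketch. It assumes an idealized, fully packed tree in which every record carries exactly one branch pointer to a full block below it; real trees are less regular, and the function names are invented for illustration.

```python
from math import ceil


def blocks_needed(total_records, records_per_block):
    """Fewest blocks that can hold the given number of records."""
    return ceil(total_records / records_per_block)


def tree_capacity(records_per_block, depth):
    """Most records reachable within `depth` disk accesses (idealized tree)."""
    return sum(records_per_block ** level for level in range(1, depth + 1))


def min_depth(total_records, records_per_block):
    """Smallest depth whose idealized capacity covers all the records."""
    if records_per_block < 1:
        raise ValueError("records_per_block must be at least 1")
    depth = 1
    while tree_capacity(records_per_block, depth) < total_records:
        depth += 1
    return depth


print(blocks_needed(21, 8))    # 3
print(min_depth(21, 8))        # 2
print(min_depth(21, 100))      # 1
print(min_depth(500_000, 27))  # 4
```

Under this model, roughly 27 records per block would be enough to reach any one of 500,000 records in four accesses; that figure is a product of the sketch's assumptions, not a documented SPIRES parameter.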

If you add a number of records whose keys are close together (with respect to alphabetical sequence), one particular path of the tree will become longer and more heavily "travelled" during record searching operations than others. This is because more records are stored on this long path. Figure B.6.10 depicts such a situation. The unbalanced tree shown would be very unlikely; SPIRES would automatically invoke a rebalancing process to correct the situation, trying to establish a uniform depth for the tree, such as that shown in Figure B.6.11. Uniform depth is not the only quality that SPIRES optimizes for in the rebalancing process, however; SPIRES tries to optimize for a minimum average accesses per record. Figure B.6.12 shows a tree with a uniform depth of three, in which the average number of accesses per record is 2.74. The same tree can be rebalanced as shown in Figure B.6.13. The depth is still three, but the average number of accesses per record is now 2.19.
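The "average accesses per record" measure is easy to compute once you know how many records sit at each level of the tree. In the Python sketch below, the level counts are invented for illustration and are not taken from Figures B.6.12 or B.6.13; both hypothetical trees hold 21 records and have a depth of three.

```python
def average_accesses(records_at_level):
    """records_at_level maps a tree level (1 = base block) to a record count.

    A record at level n takes n disk accesses to reach, so the average is
    the access-weighted sum divided by the total number of records.
    """
    total = sum(records_at_level.values())
    weighted = sum(level * count for level, count in records_at_level.items())
    return weighted / total


before = {1: 2, 2: 4, 3: 15}  # most records at the deepest level
after = {1: 3, 2: 9, 3: 9}    # same depth, more records near the base block
print(round(average_accesses(before), 2))  # 2.62
print(round(average_accesses(after), 2))   # 2.29
```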

The dynamic rebalancing that SPIRES does is localized; it does not extend across the tree. If a general rebalancing of the tree ever becomes necessary, a utility is available. [See B.10.16.] A file manager can determine the shape of any tree by using the file status utilities described in "SPIRES File Management." [See B.10.]

B.6.4  Removed Record-Types

If SPIRES insists on sixteen or more records per block, it would seem that only records of less than 128 bytes would be allowed in the tree, since sixteen 128 byte records would exhaust the 2048 byte limit. But SPIRES records have no such length limitation; strings of textual data alone are commonly longer than 128 bytes. In the chapter describing goal records, the only limitations noted were that 1) if any single occurrence of an element (or structure) was longer than 1024 bytes, a statement such as "MAXVAL=2048" must be coded after the file name [See B.1.8.2.] and 2) no record or element can be longer than 80,000 characters.

How, then, can sixteen or more records, each longer than 128 bytes, be packed into the 2048 bytes in a block? A complete answer to this question must take us into an examination of the ORVYL files created under your account when you issue the COMPILE command.

In the previous chapter, "FILEDEF Subfile and File Compilation," [See B.5.5.] we named the following files and functions:

   MSTR  an encoded characteristics table created from your file
         definition.
   REC1  the first and following record-types you define, usually
   REC2  the goal and index records.
    :
    :
   RECn
   DEFQ  the deferred queue, containing records added or updated
         during the day.
   RES   the "RESidual" data set, whose function is described in
         the remainder of this chapter.

Having these names in mind, let's outline what happens when a record passed from the deferred queue to the tree of REC1 would, because of the record's size, cause fewer than sixteen records to exist in one of the blocks of the REC1 tree. Also, what would happen if the record itself were larger than a block (2048 bytes)?

At this point, SPIRES would not put the record directly in the tree, where it would promote accessing inefficiency, but would "remove" the record to the "residual" ORVYL file. When a record is removed to the residual, the entire record is placed in the residual, and the tree contains only the key of the removed record, and an absolute pointer to its address in the residual. Thus, for the record whose key is "CYST" in figure B.6.8, only the key and a four byte pointer are stored in the tree, while the entire record, including its key, is stored in the residual. (Note: in record-types not coded as REMOVED (see below), the Fixed elements are also placed in the key tree with the key if the key is Required; however, if the key is Fixed, the other Fixed elements are placed in the residual with the remainder of the record. In other words, for non-REMOVED record-types, the key and all Fixed data that precede it are stored in the tree, while the rest of the record may be moved to the residual -- see below.)

What are the consequences of this technique? Several small disadvantages may be obvious: 1) we have duplicate storage of the key, once in the tree and once in the residual, and 2) in order to display a record, SPIRES must do an extra disk access to the residual after having read the necessary blocks of the tree to find the pointer to a record's address in the residual. First, duplicate storage of the key is of small importance, since the key is typically only six to ten bytes. With respect to the second disadvantage, the extra disk access necessary to the residual dataset is usually only a single access: the residual is not tree-structured, but organized sequentially; this means that the pointers into it are not keys, as in tree datasets, but physical addresses of information. SPIRES does not have to hunt for a record in the residual, but can go to its block and address immediately from the pointer information in the tree.
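A toy model in Python may make the two-step lookup concrete. The data structures, the (block, displacement) form of the address, and the sample values are all assumptions of the sketch; the walk through the tree itself is not modeled, so the caller supplies the number of tree accesses.

```python
tree = {}      # key -> address in the residual (key plus pointer only)
residual = {}  # address -> the whole record, key included


def add_removed(key, record, address):
    """Store the full record in the residual and only key + pointer in the tree."""
    residual[address] = {"key": key, **record}  # the key is stored a second time
    tree[key] = address


def display(key, tree_accesses):
    """Return (record, total disk accesses) for a removed record."""
    address = tree[key]                          # found by searching the tree
    return residual[address], tree_accesses + 1  # one direct read of the residual


add_removed("CYST", {"definition": "a closed sac of tissue"}, address=(17, 384))
record, accesses = display("CYST", tree_accesses=2)
print(record["key"], accesses)  # CYST 3
```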

The advantages of removing records to the residual are significant. Since the number of records that can fit in a block is now dependent upon the size of the record key and not the size of the record, many more records can fit in a block, and, since each record in a block points to a block below it, the depth or number of levels of the tree is significantly reduced. Also, if all goal records are removed to the residual, a search request of an index record will locate address pointers to the residual, rather than keys of records in the goal tree. This results in a significant savings in retrieval time and disk accesses. But, the advantages of record removal for indexed search and retrieval are only available if all records are removed.
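To see how much removal improves packing, compare the sixteen 128-byte records that would fill a block with the number of key-plus-pointer entries that fit in the same space. The Python sketch below is an estimate only: it ignores header, trailer and branch-pointer overhead, and the function name is invented.

```python
BLOCK_SIZE = 2048   # bytes per block
POINTER_BYTES = 4   # four-byte pointer into the residual, as described above


def tree_entries_per_block(key_length, block_size=BLOCK_SIZE):
    """Approximate entries per block when the tree keeps only key + pointer."""
    if key_length <= 0:
        raise ValueError("key_length must be positive")
    return block_size // (key_length + POINTER_BYTES)


for key_length in (6, 10):  # "typically only six to ten bytes"
    print(key_length, tree_entries_per_block(key_length))
# 6 204
# 10 146
```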

If a block would contain fewer than sixteen records, SPIRES will automatically remove some records to the residual. It is possible, and, in fact, the norm, for the file definer to request that this process be done unconditionally. This is done by coding "REMOVED;" after the "RECORD-NAME = " statement. Record removal can be specified for both slot and tree records, and is required for slot record-types if all elements are not fixed length.

There are few instances in which you would not want to code REMOVED for goal records (it is very unusual to code REMOVED for index records since they are usually small). If your goal records are small (less than sixty bytes) and there are few of them, then a tree of non-removed records would never become very deep ("deep" is three levels). In this case, if most of your record retrieval were to be done by key values using the DISPLAY command rather than index searching using the FIND command, you would probably not remove records. If your records are large, or you have a large number of them, or they pass to index records and are not themselves index records, then code REMOVED for the goal records. Record-types that serve only as index records should not be REMOVED, and those that serve as both goal and index records usually should not be REMOVED. [See B.2.3.]

B.6.4.1  Very Large Databases

In most SPIRES files with more than a hundred records, the residual (RES) file is the largest of all the files associated with a SPIRES file. There is a system limit to how large a single residual file can grow: 425,984 blocks. If your file will, or is about to, grow beyond this limit, you should specify the RESIDUAL statement in your file definition.

The RESIDUAL statement is specified after the FILE name statement, and indicates the number of residual files to be allowed for your SPIRES file. The allowed values are:

Any changes in the RESIDUAL statement of a file definition require that the definition be recompiled.

Only 45 residuals may reside on the file owner's account. Alternatively, you may specify multiple RES-ACCOUNTS, one for each residual. If you use RES-ACCOUNTS, there must be as many of them as the RESIDUAL statement specifies.

where "gg.uuN" is an account that will hold the N-th residual data set. You must have WRITE access to each file for each RES-ACCOUNTS account. Therefore, RES-ACCOUNTS works much like DEFQ-ACCOUNT. [See C.6.23 for details on subjects such as default data set permits for the new residuals.]

[See B.6.4.2 for information about RES-LARGE and its relationship to the RES-ACCOUNTS statement.]

B.6.4.2  RES-LARGE Statement

The RES-LARGE statement indicates that each RESIDUAL data set can grow to 13,631,488 blocks, and that locators will be 26 bits in size. RES-LARGE may be coded without RES-ACCOUNTS, in which case RESIDUALS will be large (as above), and will be stored on the FILE's account. However, RES-LARGE is usually coded with a matching set of RES-ACCOUNTS. [See B.6.4.1.]

Currently, only the Research Libraries Group files use RES-LARGE. You can't switch from a file without RES-LARGE to a file with it without rebuilding the file, because the locators change to 26 bits.

B.6.5  Combined Record-Types

It is possible to reduce the number of disk accesses in another way. When a FIND command is issued, SPIRES attaches all ORVYL files associated with the indexes being searched in the selected SPIRES file. For a file with fourteen index datasets and a goal record dataset, this could require attaching fifteen ORVYL files: REC1, REC2,...REC9, RECA, RECB, RECC, RECD, RECE and RECF. Such a file could not have more than fourteen indexes, since a "RECG" is not allowed.
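The naming scheme in the list above can be generated with a short Python sketch. It assumes the suffix is a single hexadecimal digit from 1 to F, which is inferred from the list in the text; the function name is invented.

```python
def record_data_set_names(count):
    """Names REC1..REC9, RECA..RECF for `count` separate record data sets."""
    if not 1 <= count <= 15:
        raise ValueError("a single-digit suffix allows 1 to 15 data sets")
    return ["REC" + format(n, "X") for n in range(1, count + 1)]


names = record_data_set_names(15)
print(len(names), names[0], names[9], names[-1])  # 15 REC1 RECA RECF
```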

To reduce the number of attaches, it is advisable to increase the number of index records put into the same physical ORVYL file, while keeping them logically distinct. This is done by coding "COMBINE = <record-name>;" after the "RECORD-NAME = " statement.

Often all index records can be combined into the goal record dataset as follows:

By combining data sets, you may have up to 64 record types defined (i.e. RECORD-NAME statements) in a file; you are limited to thirteen otherwise. (By the way, if there are more than 32 record types in a file, issuing the SHOW SUBFILE SIZE command may not give you the number of records in a particular subfile of the file; instead you will receive the message "UNKNOWN SIZE". This won't happen if the file's available-space tables have been reformatted with the AVSPREC command, using the REFORMAT option; see the manual "SPIRES File Management" for details.)

No more than eight of the record types defined in a file may be slot, however.

The "record-name" of "COMBINE = record-name" may never be a slot record and a SLOT record type may never have a COMBINE statement coded in it; that is, each SLOT record type must be physically separate from any other record types defined. [See B.7.12.]

B.6.5a  Extended Tree Data Sets for Large Databases

As of January 2003, all new SPIRES non SLOT tree data sets (RECn data sets) will be defined as Extended-Tree data sets by default. This means that the "EXTENDED-TREE;" statement described below is no longer needed to secure the benefits of extended tree data sets.

The RECOMPILE command will still recognize that non extended tree data sets are to remain non extended. If you are RECOMPILEing an existing file definition and adding record-types which will be in a new RECn data set, then that data set will be an extended tree data set. If you wish to convert an older data set to extended-tree you still must follow the steps that enable you to perform this conversion [EXPLAIN EXTENDED TREES, CREATING.]

You may have valid reasons to ensure that a tree data set is not extended (certain system data sets must remain this way). An option on the EXTENDED-TREE statement allows this possibility. If you code "EXTENDED-TREE = NO;" you will accomplish this goal.

Typically, in a large file, the residual dataset is the first physical dataset to grow to the ORVYL file system limit of 425,984 blocks. Getting around that limit requires the use of multiple residuals, as described earlier in this chapter. [See B.6.4.1.]

But eventually, a very large file may run into the limit on the size of record trees, which is 64K blocks per tree. By coding the EXTENDED-TREE statement with no value as part of the record definition, you can request that the data set containing the current record-type is to be an "extended-tree" data set, one which can grow to 434,280 blocks:

Once this has been added to the file definition, and the file definition has been recompiled, you need to follow a special rebalancing procedure to convert the existing trees that are affected by this change (internally speaking, that means they will have 3-byte instead of 2-byte branch block pointers). The full procedure to follow is explained in the manual "SPIRES File Management", chapter 5.6; online, [EXPLAIN EXTENDED TREE DATA SETS.]

Restriction: Slot record-types may not be in an extended tree data set; that means a slot record-type cannot be combined with a record-type that is in an extended tree data set.

Using the TREE-DATA Structure to Control Tree Depth

Some files are prone to exceptionally large amounts of updating activity in a particular key area. For example, suppose a goal record-type of transactional data has keys that begin with some particular prefix (the current year, for example) and there is a huge amount of growth in one area of the tree. That could lead to tree depth problems in that area.

The TREE-DATA structure lets you request a tree specifically for a key area of the record-type. It occurs multiple times in a record-type definition, so multiple trees can (and should) be defined. If a record definition contains TREE-DATA structures, then it must be in an extended tree data set or COMBINEd with a record-type that is in an extended tree data set.

The START-KEY statement holds the value of the lowest key that will be stored in the tree being defined by this structure. All records whose keys are greater than or equal to this starting key but are less than the start-key of the next tree will be stored in this tree. The START-KEY value will be embedded in the block header of all blocks associated with this tree.

It can be variable in length, and have one of two forms:

The optional TREE-PREFIX statement is for special cases of tree-data structures where you know every single value that will go in the tree being defined will have the identical prefix. You would know that's true if your Inproc rules guaranteed it, or if the next TREE-DATA structure's START-KEY was the very next possible prefix (see below). The TREE-PREFIX statement provides an integer specifying the number of characters in the START-KEY that will be on all keys of the tree. So, for instance, if the START-KEY of a tree is 92, and the START-KEY for the next tree is 93, then you would know for sure that all keys between 92 and 93 would go into this tree, and they would all begin with "92". So you could specify TREE-PREFIX = 2.

The TREE-PREFIX statement causes SPIRES to strip off the prefix before storage in the tree. If you have thousands of records with the same prefix, which has several bytes, this can be a significant storage savings over time.

As mentioned above, your record-type may have Inproc rules to guarantee what the possibilities for a key are. However, it is recommended that you also include TREE-DATA structures to cover the entire range of possible key values, particularly at the beginning. For example, here is a series of TREE-DATA structures for a file:

As you can see, there is no key value that would be left without a tree to call its home. The records stored in the middle three trees listed above would all be stored without the first two characters of their keys, which would be either 90, 91 or 92 as appropriate. Notice that a null value was given for the first one to handle any records with keys previous to "90"; a null value needs to be written as shown, using apostrophes. The standard null value "START-KEY;" will not compile.
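The routing of a key to its tree, and the stripping of the prefix, can be sketched in Python. The five trees below are hypothetical, built from the description above (a null start key, then 90, 91 and 92 with a two-character prefix, then 93). Comparing keys in EBCDIC order, using Python's cp037 codec, is also an assumption, as are all the names in the sketch.

```python
from bisect import bisect_right

# Hypothetical (START-KEY, TREE-PREFIX) pairs in ascending key order.
TREES = [("", 0), ("90", 2), ("91", 2), ("92", 2), ("93", 0)]


def ebcdic(text):
    """Encode a key so that byte comparison follows EBCDIC collating order."""
    return text.encode("cp037")


START_KEYS = [ebcdic(start) for start, _ in TREES]


def route(key):
    """Return (tree_index, key_as_stored) for a record key."""
    # The last tree whose start key is less than or equal to the key wins.
    index = bisect_right(START_KEYS, ebcdic(key)) - 1
    start, prefix_length = TREES[index]
    if prefix_length and not key.startswith(start[:prefix_length]):
        raise ValueError("key does not carry the declared prefix")
    return index, key[prefix_length:]


print(route("9101234"))  # (2, '01234')
print(route("8912345"))  # (0, '8912345')
print(route("9300001"))  # (4, '9300001')
```

Because the null start key sorts before every other value, no key is left without a tree.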

The TREE-MONOTONIC statement tells SPIRES that the keys of the records stored in that particular tree will be entered monotonically, i.e., in order. That helps SPIRES store the records more efficiently. It should not be coded unless at least the vast majority (say 98 percent or more) of the records for that tree will be entered into the database in sequential order. If the entire record-type (as opposed to individual trees) would be monotonic, you should code the MONOTONIC statement. [See B.2.4.]

There is no fixed limit on the number of trees you can define with TREE-DATA. The limit is determined by an internal table that keeps track of numerous file characteristics. The length of the start keys will also affect the limit. But generally speaking, there is probably room for dozens, perhaps a hundred or more, TREE-DATA structures.

When you add TREE-DATA structures to an existing file, or change those already there, you need to recompile the file definition with the REBALANCE option. Again, the full procedure to follow is explained in the manual "SPIRES File Management", chapter 5.6; online, EXPLAIN EXTENDED TREE DATA SETS.

B.6.5b  The OVERFLOW-TO and OVERFLOW-KEY statements

In extremely large files, a record tree can possibly become too large for SPIRES to handle properly. Specifically, if the tree would contain more than 65K blocks, then it must be split over two identical record-types, using the OVERFLOW-TO and OVERFLOW-KEY statements.

To use "overflow processing", your file definition must have two almost identical record definitions. The difference between them is that one includes the OVERFLOW-TO and OVERFLOW-KEY statements. When records being added to that record-type have keys equal to or greater than the value given in the OVERFLOW-KEY statement, they are added to the other record-type. In accessing records by key, SPIRES checks the requested key against the "overflow key" to determine which record-type contains the desired record. No other special statements or values need to be coded (e.g., you do not specify both record-types as GOAL-RECORDs). [See B.9.2.] Once you have coded the OVERFLOW statements, SPIRES handles the rest.

The syntax of the two statements is:

These are coded at the end of the definition of the "overflowing" record-type. The "record-name" is the name of another record-type defined in the file that has the same design as the current one. The "hex-value" is a four-byte hexadecimal value that specifies the value at which overflowing is to occur. For example, "OVERFLOW-KEY = C4000000;" specifies that all records with keys whose values are "D" or greater ("D" is "C4" in hex) are to overflow to the specified record-type. (Note: keys shorter than 4 bytes are padded on the right with zeroes (i.e., hex 00) to a length of four bytes before being compared to the overflow key value.)
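The comparison against the overflow key can be sketched in Python as follows. Two points are assumptions of the sketch rather than statements from this manual: keys are treated as character strings in EBCDIC (Python's cp037 codec), and only the first four bytes of a longer key take part in the comparison. The function name is invented.

```python
def overflows(key, overflow_key_hex):
    """True if a record with this key belongs in the overflow record-type."""
    threshold = bytes.fromhex(overflow_key_hex)
    if len(threshold) != 4:
        raise ValueError("OVERFLOW-KEY is a four-byte hexadecimal value")
    # Pad short keys on the right with hex 00 to four bytes, as noted above.
    padded = key.encode("cp037")[:4].ljust(4, b"\x00")
    return padded >= threshold


print(overflows("D", "C4000000"))     # True  ("D" is C4 in EBCDIC)
print(overflows("CZZZ", "C4000000"))  # False
print(overflows("DATA", "C4000000"))  # True
```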

Even if overflow processing is not required, it may be desirable to specify it to help keep record accessing efficient. In any event, the decision to use it should be made with your SPIRES consultant.

B.6.5c  The EXTERNAL-TYPE statement

A SPIRES file can be defined in such a way that one of its record-types is an External record-type. This is done by coding the EXTERNAL-TYPE statement in the record-type definition. A subfile which has an EXTERNAL goal record-type has some properties that are quite different from other SPIRES subfiles. This subfile will have a look similar to normal subfiles in that its Deferred Queue can be used for retrieval and update but the "tree" portion of the subfile is "external" to SPIRES. That is, the "tree" is not in an ORVYL RECn data set. Rather it is located either on a SPIRES Device or on some medium foreign to the ORVYL environment.

This "foreign" information source refers to any source of information that can be moved in some fashion into a SPIRES Device area. This data could come from a WYLBUR data set, a remote database accessed through the NIO Device Area, a WYLBUR QUERY or OS data sets accessed through batch jobs or through the SUSAN Path.

The EXTERNAL-TYPE statement provides the linkage that SPIRES needs to understand the access control to this remote (external) data.

[EXPLAIN EXTERNAL FILES, INTRODUCTION.] for more information about this facility.

B.6.6  Figure: Function of Goal and Index Records

B.6.7  Figure: Storage of Element Length and Occurrence Information

B.6.8  Figure: A Tree-Structured Data Set

B.6.9  Figure: Detail of the Structure of a Single File Block

B.6.10  Figure: Sample Tree After Intense Local Growth

B.6.11  Figure: Sample Tree With Well-Distributed Growth

B.6.12  Figure: Tree Showing High Number of Access Per Record

B.6.13  Figure: Previous Tree After Rebalancing

B.7  Understanding and Coding Index Records

B.7.1  How Indexing Works

Let's consider what happens when we want to build an index. Suppose we had a subfile called "TABLE OF CONTENTS"; each record in the subfile is a chapter number (the key of the record) and a chapter title. If an appropriate format were written, the table of contents for the first seven chapters of Part B of this manual might look like this:

Chapter   1: Goal Record Concepts and Definition
Chapter   2: Goal Record Keys, Slot and Removed Records
Chapter   3: Structures
Chapter   4: Processing Rules: INPROC, INCLOSE, OUTPROC
Chapter   5: FILEDEF Subfile and File Compilation
Chapter   6: File Structure: Tree and Slot, Goal and Index Records
Chapter   7: Understanding and Coding Index Records

An index based on the words appearing in those chapter titles might look like this:

In fact, an index at the end of a book is a good example of the structure of a simple SPIRES index. The record definition for this index would look like this:

The element "TITLE-WORD" contains one of the words in each of the titles. Since it is the key of the record, it can only occur once in each record. So, each word ("AND", "CODING", etc.) in the above index is the key of a separate record in the index record-type. But notice that for each "TITLE-WORD" there may be several occurrences of "GOAL-RECORD-KEY", each occurrence pointing to the goal record in which the title word occurs.

The record in this index record-type for the TITLE-WORD "GOAL" would look something like this:

This is an index record whose key is some "word" (here, "GOAL") and contains a set of pointers to those goal records that contain a certain word in the title. Each record that is stored in this index contains as its key a word from a chapter title, and one or more pointers, each to a goal record whose TITLE element contains the word that is the key of the record.
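The construction of such an index can be sketched in Python using the seven chapter titles listed earlier. The word-splitting rule (runs of letters, converted to upper case) and the choice to record each chapter at most once per word are assumptions of the sketch; in SPIRES the passing of values to an index is governed by the file definition, not by code like this.

```python
import re
from collections import defaultdict

TITLES = {
    1: "Goal Record Concepts and Definition",
    2: "Goal Record Keys, Slot and Removed Records",
    3: "Structures",
    4: "Processing Rules: INPROC, INCLOSE, OUTPROC",
    5: "FILEDEF Subfile and File Compilation",
    6: "File Structure: Tree and Slot, Goal and Index Records",
    7: "Understanding and Coding Index Records",
}


def build_title_word_index(titles):
    """Map each title word (the index key) to the chapter numbers containing it."""
    index = defaultdict(list)
    for chapter in sorted(titles):
        words = set(re.findall(r"[A-Za-z]+", titles[chapter].upper()))
        for word in sorted(words):
            index[word].append(chapter)  # one pointer per goal record key
    return dict(index)


index = build_title_word_index(TITLES)
print(index["GOAL"])    # [1, 2, 6]
print(index["INDEX"])   # [6, 7]
print(index["CODING"])  # [7]
```

Each dictionary entry plays the part of one index record: the word is the key, and the list holds the pointers to goal records.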

A SPIRES search of this index could look like this:

While a DISPLAY <key> command searches the goal record tree for the key named, an index record is searched by the FIND <searchterm> <key> command. For example: SPIRES will locate the record that has the key-value "goal" in the index named in the FIND command, and count up the number of pointers to the goal record data set to indicate how many records are likely to be in the search result. (The number reported may not be entirely accurate, since a single particular record may have several pointers to it in the same index record; if this is the case, SPIRES will report a corrected number in the search result after it has examined the records via the TYPE or OUTPUT commands.)

An index record thus looks very much like a goal record as far as SPIRES is concerned. It has a key of fixed or varying length, depending upon the nature of the data being passed to the index; it has a multiply occurring element called a pointer (that may be the key of a structure). Since the FIND command attempts to locate records by key, the most efficient structure for storing and locating index records will be a tree structure. Typically, the records in an index are not REMOVED, since they are usually quite small, allowing a large number of them to fit into a single tree block. Index records are thus structured and stored in a manner identical to that for goal records, except that we can take advantage of their small size.

The simple definition for the TITLE-WORD index shown above is probably not what most file definers would specify, especially if the goal record (the chapter titles) were a REMOVED record type. If the goal record is REMOVED, then its indexes do not usually store goal record keys, but addresses of goal records in the residual data set. The index definition would probably look more like this:

Index records exhibit a new type of element, the locator, denoted by "TYPE=LCTR." This element refers to ("locates") a goal record, not by its key, but by its address (location) in the residual data set. To see why this is done, consider the sequence of events for a FIND and TYPE command. SPIRES searches an index and accumulates a list of pointers to the goal records, and reports on the number of pointers found. If each of the pointers were in the form of a goal record key, then the TYPE command would cause SPIRES to read blocks of the goal record tree until it found the location of the referenced record in the residual; then SPIRES would access the residual. In almost all cases, the middle step of searching the goal record tree for the record's location in the residual can be eliminated by storing that location itself as the pointer, rather than the key of the record; this optimization can only be done when the goal records are REMOVED.

To be precise, a pointer to a non-REMOVED record type is not declared TYPE=LCTR, since it contains a goal record key rather than a pointer to a location in the residual data set. Only the pointer to a REMOVED record type can be TYPE=LCTR.

Since SPIRES creates and maintains indexes automatically, the file definer must tell SPIRES how and what information is to go from the goal record to a certain index. The file definer specifies how this is done in the "Linkage Section" that follows all of the index record definitions and precedes the "Subfile Section" that specifies subfile name and privileges.

As its name implies, the Linkage Section links the goal and index records; it defines how information is passed from the goal to the index records when file updating is done, and it defines how indexes are to be searched. The details of coding the Linkage Section are covered in the next chapter "Understanding and Coding the Linkage Section." [See B.8.] With this brief look at the structure of a very simple index record, we can now consider the different methods of indexing available to the file definer. Each method has a search and retrieval situation for which it is particularly well suited. For any file that will be searched often, or will contain more than one thousand records, indexing plans should be discussed with the SPIRES consultant. For each indexing strategy described below, guidelines for its use are also presented.

B.7.2  Understanding Simple Indexes

Simple indexes may be defined for files of any size. Their structure and use by the system is "simple" and efficient. Here is a picture of two records in a simple index:

Also, simple indexes are the only type of index for which a synonym can be maintained. [See C.3.]

If an element to be indexed has only a few possible values, it may be best to "search" this element using Global FOR, or perhaps index it as a "qualifier" (see below). Any time a search request would retrieve a large percentage of the records in a file (seventy percent or so), simple indexes may not be the best search mechanism. For example, an index built on the sex (male or female) of people in a personnel file may or may not be necessary, depending upon the search situation. If that index will not be searched frequently, it may be cheaper to search the goal records sequentially (using FOR or ALSO) than to pay the cost of building, updating, and storing the very large index record entries. If a search result will frequently be narrowed by a "sex" criterion, then "sex" might be added to another index or indexes as a qualifier.

B.7.3  Understanding Qualifiers

Qualifiers provide search flexibility for large files, allowing search requests to be narrowed by the specification of criteria that would be inefficient to search and index otherwise (such as the language in which a program is written in the MASTERLIST subfile--only four or so possibilities exist). Qualifiers should be used sparingly: they must be stored redundantly in each index to which they apply, generating high storage costs. Let's look at the structure of a simple index with one qualifier.

If we were to qualify the title-word index shown in the beginning of this chapter with a STATUS element, allowing only the values "Preliminary," "Current," and "Out of Date," the structure of the index and a sample record in it would be something like this:

As you can see, the qualifier is stored with each pointer. Thus, a qualifier takes up quite a bit of space, relative to a simple index on an element with only a few values (such as STATUS). However, the time required to search on the basis of a qualifier is less than that for searching two indexes, especially if one of them has only a few large records (entries) in it. This is because a qualifier search request narrows a search by operating on an existing search result stack; a search involving two indexes requires SPIRES to build two search results and AND them together.

So, if search time is more important than storage cost, and you will frequently want to qualify a search request by a certain criterion or criteria (there can be more than one qualifier for an index), a qualifier may be appropriate.

Several other facts about qualifiers will influence a decision on their use: 1) they may only be used with the AND and AND NOT logical operators; 2) they allow the full range of relational operators, such as ">" and "<"; 3) they can only be used after a search request involving the index to which they are attached. For example, assume DATE is a qualifier to a TITLE index:

One additional requirement is that the qualifier must occur in any index(es) to which it applies; note that a global qualifier usually is a Required element in the Optional pointer structure--if the POINTER occurs, then the qualifier must occur also. For both global and local qualifiers, this means that either:

Qualifiers may also be "local" or "global." If local, then it may only be used after the index to which it applies has been named in a search request. If global, then it may be used any time after the first FIND command referencing any index. Global qualifiers are stored redundantly on every pointer of every index; they are thus quite expensive from a standpoint of storage costs.

B.7.4  Understanding Sub-Indexes

Sub-indexes are almost exclusively used with personal name indexes. The personal name search processing rule (SRCPROC A38) breaks a search value into two portions: last name, and first names. After searching the index record on the last name, the first names are used to determine which sub-index structures define the pointer groups. If no first names were given in the search request, all pointer groups in the index record are logically OR'd together.

Sub-indexes can be used for purposes other than personal names, and are searched by specifying the commercial "at" ("@") character. For example, we might make CITY a sub-index of STATE and search as follows:

or make SEAT and ROW sub-indexes of SECTION:

Sub-indexes can be useful when things logically fit inside other things, as cities do in states, or seats do in sections. They allow you to choose a subset of the index as a result.

Only the equality operator ("=" or a blank) may be used with sub-indexes.

B.7.5  Understanding Compound Indexes

Several elements can be passed to a single compound index (only one compound index may be defined per goal record), requiring somewhat less storage space than several simple indexes. Passing several elements to a single compound index is necessary when the number of indexes defined for a goal record would require that more record-types be defined in a file definition than are allowed (64 is the maximum). Compound index organization is most efficient when the data elements are numerics (often requiring relational operators for effective searching) and short alphanumerics (such as codes).

However, there are significant disadvantages to compound indexing strategies when a file is large (more than eight thousand records) and is searched or updated frequently. As the file gets large, the cost of updating a compound index with many entries gets progressively greater; compound indexes are also somewhat more time consuming and expensive to search than simple indexes, usually requiring several disk accesses to retrieve the large records they contain from the residual data set. Also, many of the elaborate search and pass processing rules (SRCPROC and PASSPROC) available for simple indexes are not available for compound indexes. In addition, the BROWSE command cannot be used to inspect the contents of a compound index.

All of the advantages and disadvantages of compound indexes arise either from the search and update techniques they require, or from their structure, which is similar to indexes with local qualifiers. In a simple index, there is one index record for each unique value passed from the goal records; if several goal records had the same value, then the one index record for that value would have multiple occurrences of pointers to the goal records. For compound indexes, however, there is one index record for each element-mnemonic in the goal record that passes to the index, and every unique value that that mnemonic has forms an occurrence of a pointer structure containing the pointer and the value of the element in the goal record that is being pointed to. For example: if a goal record passes TEMPERATURE, AGE and DATE to a compound index, the goal and index records would look like this:

Note that the records shown in the right column are created and maintained by passing. The key of the record is a combination of the structure and element number of the element that is being passed from the goal record; these keys are computed by SPIRES.

When a search request is made against a compound index,

the single index record containing all of the AGE values in the goal record is read, then all the values (ELEM-VALUE, above) in the index record read in are scanned, and pointer groups not meeting the criteria are weeded out of the search result. Because a single record containing all AGE values exists in a compound index on AGE, a command such as

is possible; the result will be all records in which the AGE element passed a value to the index.

A compound index record may grow quite large if 1) it contains many values because the number of goal records passing to it is large, or 2) the values passing to it are long, such as lengthy character strings. Since a large record must be read and then scanned, searching a compound index, particularly in medium and large sized files, may give a noticeably slower response than searching a simple index in the same subfile. Also, updating such a large record is more time consuming, and thus more expensive, than updating a simple index; because the large records in a compound index may often overflow the 2048-byte limit for an ORVYL file block, multiple disk accesses may be necessary to search for or update a single record.

If the file is not large, or if the elements being indexed do not occur in a majority of the goal records, or if updating is not done nightly, then compound indexes are quite suitable for numerics and short alphanumerics, such as codes.

B.7.6  The Impact of Global FOR and ALSO on Indexing

The ALSO and Global FOR commands provide substitutes for a compound index in a large file. Of course, these methods involve sequential rather than indexed searching, and will be noticeably slower (more elapsed and CPU time required) than a compound index search unless the existing search result is small.

The ALSO command always examines all the goal records pointed to in an existing search result; this capability is also available using the Global FOR commands. In contrast to the FOR and ALSO sequential search, an index search request preceded by another search request operates as any compound search request: two or more subsets of pointers are built, one for each of the search criteria, then put together into a single search result. For this reason, if a search request requiring relational operators were always preceded by search requests yielding a relatively small search result, such a request might be performed most efficiently using a Global FOR or ALSO command.

Another consideration: if searching is done sequentially by the Global FOR or ALSO commands, then no expenses are incurred for building, updating and storing compound or simple indexes. If search requests against the values in some elements will be quite infrequent, it may be advisable to use sequential search techniques rather than indexed search techniques. Retrieval may be slower, but costly indexes of little use will not be maintained.

There are some cautions to the use of sequential searching techniques, however. Unlike the FIND command, the ALSO command cannot initiate a search; it must always operate on a preceding search result--in this respect it is like a Qualifier. Unlike the ALSO command, the Global FOR commands need not operate via a search result, but can.

The search criteria for Global FOR commands are specified in the WHERE clause. Two additional operators are available in the Global FOR WHERE clause: OCCURS and LENGTH; these are not available to the FIND or ALSO commands. OCCURS allows a user to specify search criteria based on the number of occurrences of an element, and LENGTH allows criteria based on the length of any single occurrence. For example, suppose you wished to print mailing labels from your subfile's records, but first wanted to print all addresses that would not fit on standard labels. This might be done as follows:

This would place in the active file the subset of all goal records that had more than four lines of ADDRESS or had any occurrence of the element ADDRESS that was longer than the width of a label, 35 characters.

Note from the above example that the Global FOR commands do not automatically provide you with a count of the number of records meeting the criteria specified in the (optional) WHERE clause. This is because the FOR command itself does not initiate a search of the file; the file is not searched until another command is issued that specifies what is to be done with the records--remove them, display them, dequeue them, etc. A count can be obtained, however, and a new set of WHERE criteria specified if the number is too small. The following example shows this process, which involves several examinations of the goal records in the search result, and is therefore rather time-consuming and expensive:

The system's response to the SHOW LEVEL command gives two numbers: The second indicates the number of records examined--here it is 22, the same as the number of records in the search result. The first number indicates how many of the records examined met the criteria specified in the WHERE clause--4 for the first WHERE clause and 11 for the second.

The same results are more directly obtained by the use of the ALSO command, which gives an indication of the number of records meeting the criteria immediately, just as a FIND or other index search command does. For example:

If further index searching commands (e.g. AND, OR) are necessary, then the ALSO command must be used, since the "result" of a Global FOR command is not a set of pointers in a search result. The pointers in a search result can be combined logically with the pointers meeting the criteria specified in subsequent search commands. If, however, the records meeting the WHERE criteria are to be displayed at the terminal or placed in the active file, regardless of their number, then Global FOR is a far more efficient way to do this than the ALSO command.

Compare the two search scenarios following:

The second series of search commands is almost twice as efficient as the first. With the ALSO command, the system must read the goal records to examine the EYES element, then read the records meeting the criteria a second time when a TYPE command is given. With the FOR RESULT command, the record is read to examine the EYES element, then, while the record is still in main memory, it is displayed on the terminal. The net effect is that a record is accessed only once when FOR RESULT is used.

In addition, Global FOR commands can be used for many record management functions other than those described here; we have shown only their capabilities relative to those of the ALSO command. The Global FOR commands facilitate a full range of data base and record management functions unavailable otherwise. All file owners and managers should be familiar with the capabilities of Global FOR for sequential subfile search and subsetting. Consult "SPIRES Searching and Updating" for an introduction to Global FOR searching.

B.7.7  Index Definition

Having considered the different indexing options available to the file definer, and having described the functional differences among them, we can now attack the practical problem of coding the record definitions for the different indexes a file will have.

Subfiles may have one, several or no index records defined. There usually is one index record definition for each simple index in a file. One index record definition could be for a compound index in the subfile (remember that only one such index can be defined per subfile). Through a process called "passing", a compound index typically receives values from more than one element in the goal record, while a simple index typically receives values from only one element in the goal record. However, it is entirely possible for a compound index to have its values passed from a single goal record element.

It is also possible for more than one element in the goal record to pass to a simple index; this situation is known as "multiple passers." There may not be more than one compound index per subfile, but there may be more than one compound index defined in a file that has more than one subfile. There may be a large number of simple indexes, provided that the total number of records defined for a file (goal and index records) does not exceed sixty-four.

The different kinds of indexes a subfile has influence the kinds of records defined for all indexes in a subfile. This is due to one of the primary rules of coding index record definitions: all pointer groups to the same goal record must "look alike" in terms of their structure. This means that if there is a compound index or a simple index with a qualifier for a goal record, the pointer groups in all indexes to that goal record must exhibit the structure of a compound index or simple index with a qualifier.

It is fairly easy to reduce the definition of most index records to a "recipe," and "A Guide to Coding Index Record Definitions" [See D.5.] gives recipes for the indexes encountered in most SPIRES file definitions. The following sections describe how to code index record definitions in such a way that you can see the reason for their structure.

B.7.8  Coding Simple Indexes

If all indexes to a single goal record are to be simple indexes, then the structure of each index record might look something like the following:
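As a minimal sketch of the two forms, using only statements that appear in the sub-index example in B.7.11 plus the LEN attribute: the left column assumes a fixed-length key (LEN = 4 is an assumed length for a four-byte binary value), the right column a varying-length key. The lower-case names are placeholders, and the exact layout is an illustration rather than a required form.

RECORD-NAME = record-name;             RECORD-NAME = record-name;
  FIXED;                                 REQUIRED;
    KEY = element-name;                    KEY = element-name;
      LEN = 4;                           OPTIONAL;
  OPTIONAL;                                ELEM = pointer-name;
    ELEM = pointer-name;                     TYPE = LCTR;
      TYPE = LCTR;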

The only essential difference between the two is that one declares the key to be fixed in length, the other declares it to be varying in length. What is the key? The key of a simple index record is, in almost all cases, the value of an element passed from the goal record. Thus, if you were passing a fixed binary number, such as a price or date, you would want to specify that the key is fixed. The length is the same as the length of the stored value in the goal record.

It is wise to choose names carefully for the elements whose values are shown in lower case. "Record-name" can be up to six characters long; its value is used by SPIRES to sort record definitions for both goal and index records into alphabetical sequence. By tradition, the name REC01 has often been used for the goal record, and REC02, REC03, etc., have been chosen for the index records.

The "element-name" may be anything up to sixteen characters long. For simplicity, it is usually best to give this element the same name as the name of the goal record element that passes its value to this index record.

The "pointer-name" again may be anything, but the name you choose must be coded in the linkage section. One additional requirement falls on the "pointer-name": it must be given the same name in all indexes for a particular goal record. This is the second primary rule for coding indexes. The pointer element is often given the mnemonic name "POINTER." In the example shown above, the pointer element is an optional multiply occurring simple element. If possible, it is best for the pointer element to be fixed length. That's because SPIRES can do logical operations more efficiently when the pointer element is fixed length, and fixed length elements take less overhead in the index records.

Only one other comment need be made about these two index record definitions. The pointer-name is said to be "TYPE=LCTR;" (fixed length of 4 bytes). The pointer-element is what SPIRES tallies when it reports the number of records in a search result. It is also the element in which SPIRES stores the reference back to the goal record.

If the file definer has coded "REMOVED;" for the goal record definition, then the "reference back" or "pointer" is usually in the form of the four-byte location (a "locator") of the goal record in the residual dataset. If the goal-record has not been removed, then TYPE=LCTR may not be coded; instead the goal-record key will serve as the pointer. But note: even if the goal-record has been removed, it may be advantageous to pass the key rather than a locator in some circumstances. [See B.7.14.]

Let's look now at a very simple bibliographic file definition which contains two indexes, one for titles and one for dates. The date element will pass its value to the FIXED key of an index. Note that the following definition is incomplete in that it doesn't say how this "passing" is to occur. This process is defined in the next chapter.

B.7.9  Coding Simple Indexes with Qualifiers

A qualifier adds another level of "depth" to a simple index record definition: it introduces a structure, the same kind of structure that was defined by "TYPE=STR" in the goal record.

The pointer element, which was only a simple data element containing a reference to a goal record, is now a structure. The structure is always a keyed structure, and the key is always the pointer element. The structure itself is optional, but its key is fixed if the key is TYPE=LCTR. This introduces the third rule of index definition: if the pointer element is in a structure, then it must be the key of the structure. Moreover, the pointer element should always be the first element in the structure; do not begin it with Fixed qualifier elements, for instance, if the pointer element itself is in the Required section.

What of the other elements in the structure? Usually there is only one, the qualifier itself; if there is more than one qualifier, then there will be more than one qualifier element in the structure. The qualifier elements should always be defined in the index record definition with their OCC=1. If the goal record element that is passed to the qualifier does not occur in the goal record, then special pass-processing rules (PASSPROC rules, covered in the next chapter) should be coded to provide a default value. Note that the maximum length limit for a value passed as a qualifier element is 2047 bytes.

Let's examine the skeleton of a simple index with one qualifier. Next to it is shown a TITLE index in which there is a SUBJECT qualifier.
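A sketch of such a pair, modeled on the pointer structure in the sub-index example in B.7.11, might look like the following. The lower-case names on the left are placeholders; REC02 and POINTER-STR on the right are assumed names chosen for illustration.

RECORD-NAME = record-name;             RECORD-NAME = REC02;
  REQUIRED;                              REQUIRED;
    KEY = element-name;                    KEY = TITLE;
  OPTIONAL;                              OPTIONAL;
    ELEM = pointer-structure;              ELEM = POINTER-STR;
      TYPE = STR;                            TYPE = STR;
  STRUCTURE = pointer-structure;         STRUCTURE = POINTER-STR;
    FIXED;                                 FIXED;
      KEY = pointer-name;                    KEY = POINTER;
        TYPE = LCTR;                           TYPE = LCTR;
    OPTIONAL;                              OPTIONAL;
      ELEM = qualifier-name;                 ELEM = SUBJECT;
        OCC = 1;                               OCC = 1;

In both columns the pointer element is the key of the structure and is the first element in it, as the third rule requires.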

We can now expand the example file definition which had two simple indexes on DATE and TITLE to include a qualifier on the TITLE index. The new definition will illustrate another primary rule of coding indexes: if a pointer structure is used in one index, it must be coded in all indexes to that goal record, whether it is necessary to the structure of the specific index record in which it occurs or not. To see this, notice how the definition of the DATE index record has changed from its appearance in the previous example. Before, it was only a simple index; now, it looks like a simple index with a qualifier--even though no qualifier is passed to the DATE index.
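To illustrate the rule for the DATE index alone, here is a hedged sketch of how its record definition might look once a pointer structure is required. The placeholder element DUMMY is the name used in the comparison at the end of B.7.10; how it is coded here (an optional element with OCC = 1, and a fixed key with LEN = 4) is an assumption made for illustration.

RECORD-NAME = REC03;
  FIXED;
    KEY = DATE;
      LEN = 4;
  OPTIONAL;
    ELEM = POINTER-STR;
      TYPE = STR;
  STRUCTURE = POINTER-STR;
    FIXED;
      KEY = POINTER;
        TYPE = LCTR;
    OPTIONAL;
      ELEM = DUMMY;
        OCC = 1;

No value is ever passed to DUMMY; it exists only so that the pointer group has the same form as the pointer group of an index that carries a qualifier.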

In general, pointer groups for indexes that apply to the same goal record must have identical structure; this is so SPIRES can AND and OR pointer groups when manipulating search results. If you need to violate this general rule, then the first index record-type defined in the linkage section for the goal-record is taken as the model for the other index record-types. If this record-type has the appropriate structure defined for it (as described above), then more specific rules can be used for other index record-types.

The specific rules for pointer group structures are as follows:

The pointer element must be the key of the structure, and it must be the first element in the structure.

If there is no REQUIRED section, then all pointer groups must be identical through the length of the FIXED section. If one pointer group structure is declared with LEN attribute, then all must be declared that same way, and only FIXED elements are allowed. This is the most efficient form of pointer group structure. If the pointer group structure is not declared with LEN attribute, then any (or all) may have OPTIONAL elements.

If there is a REQUIRED section, then all pointer groups must be identical through the end of that section. If the only REQUIRED element is the KEY of the pointer group, then any (or all) may have OPTIONAL elements. If there are non-key REQUIRED elements, then if one pointer group has OPTIONAL elements declared, all must declare OPTIONAL elements.

Note that if an index record-type is to have multiple qualifiers passed to it, then the following definition is appropriate:
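The following is a sketch only, under the assumption that each qualifier is simply an additional optional element (OCC = 1) in the pointer structure, as described in B.7.9. The names REC02, POINTER-STR, SUBJECT and STATUS are chosen for illustration.

RECORD-NAME = REC02;
  REQUIRED;
    KEY = TITLE;
  OPTIONAL;
    ELEM = POINTER-STR;
      TYPE = STR;
  STRUCTURE = POINTER-STR;
    FIXED;
      KEY = POINTER;
        TYPE = LCTR;
    OPTIONAL;
      ELEM = SUBJECT;
        OCC = 1;
      ELEM = STATUS;
        OCC = 1;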

B.7.10  Coding Compound Indexes

A compound index record cosmetically looks very similar to a simple index with one qualifier. Two important differences must be noted. First, the KEY of the compound index record is always fixed with a length of two bytes. This is because the key of such a record is an encoded form of the element name that is being passed to this index. [See B.7.5.] If the elements DATE, AGE, and TEMPERATURE all pass to a compound index, there will be three keys, and hence three records, in the index. (The keys, with which the file definer and searcher need never be concerned, tell SPIRES the structure and element number of the element in the goal record definition.) Second, the element that previously named the qualifier is now given a generic name, usually something mnemonically significant, like "VALUE", since each occurrence of it contains one value passed from the goal record. Note that the value passed cannot exceed 2047 bytes in length.

Below, a skeleton record definition is presented. Next to it is shown a compound index that contains a DATE occurrence. Notice that the word DATE never appears in the index definition. The linkage section specifies which element(s) will pass to the compound index record.
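A sketch of such a pair, consistent with the description above, might look like this. The two-byte fixed key and the VALUE element follow the text; the key name ELEM-ID and the structure name POINTER-STR are hypothetical names, and the LEN = 2 coding assumes the same attribute form used elsewhere for fixed lengths.

RECORD-NAME = record-name;             RECORD-NAME = REC04;
  FIXED;                                 FIXED;
    KEY = key-name;                        KEY = ELEM-ID;
      LEN = 2;                               LEN = 2;
  OPTIONAL;                              OPTIONAL;
    ELEM = pointer-structure;              ELEM = POINTER-STR;
      TYPE = STR;                            TYPE = STR;
  STRUCTURE = pointer-structure;         STRUCTURE = POINTER-STR;
    FIXED;                                 FIXED;
      KEY = pointer-name;                    KEY = POINTER;
        TYPE = LCTR;                           TYPE = LCTR;
    OPTIONAL;                              OPTIONAL;
      ELEM = value-name;                     ELEM = VALUE;
        OCC = 1;                               OCC = 1;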

Notice how these record definitions follow one of the indexing rules: if the pointer (here, TYPE=LCTR) is in a structure, it must be the key of the structure.

As the following example shows, all of the other rules are followed: 1) if one index has a pointer structure (because it is a simple index with a qualifier or because it is a compound index) then all indexes must have pointer structures; 2) all pointer elements must have the same name in each index.

The following example is similar to the previous two in some ways. The goal record contains TITLE, SUBJECT and DATE elements; COST will now be introduced and placed in the compound index. Note how the definition is quite different from the first example, which showed only simple indexes; but its only difference from the previous example, which showed simple index qualifiers, is the addition of a compound index.

Note that the occurrence of the VALUE element is always 1. The structure containing this element occurs once for each value of an element passed from the goal record. A record passing two COST values would cause two POINTER-STRs to occur, each with a single POINTER and VALUE. When SPIRES retrieves such records, it reports a result, which is the number of POINTER-STRs that met the criteria specified; this count may be high, since a single record could be represented in the POINTER-STR list more than once. SPIRES will correct any erroneous result count after it has been asked to TYPE the records in the search result.

Notice that the POINTER-STR in REC02 (TITLE) has the same "form" as the POINTER-STR in REC03 (DATE) and REC04. SUBJECT, DUMMY, and VALUE all occupy the same position.

B.7.11  Coding Sub-Indexes

Sub-indexes, usually used only for personal name indexing, provide a variation on the theme of simple indexes and simple indexes with qualifiers. Sub-indexes cannot be defined as part of a compound index, but may be defined for simple indexes in subfiles that have compound indexes.

Sub-indexes provide a way of searching data that has a hierarchical organization. Two simple hierarchies might have the following structures:

It would not be useful to find all people with a first name of "John" in a subfile unless you had first established that you were interested only in people whose last name was "Smith." It would also not be helpful for an airline reservation system to be able to find all seats with the number 13 unless a particular flight had been established to restrict the domain of the search.

Two types of sub-indexes can be defined, one for subfiles that contain only simple indexes and no qualifiers (no pointer structures would be involved in this case) and one for subfiles that contain either qualifiers or compound indexes. The following example shows the two types of record definitions; each is defined for a personal name index. The key of such an index is the person's last name, and the key of the sub-index (which is a structure) is the rest of the person's name (first, middle, etc.). Note that the first name structure is not a pointer structure: the pointer or pointer structure is an optional element in the first name structure.

RECORD-NAME = REC02;                   RECORD-NAME = INDEX5;
  REQUIRED;                              REQUIRED;
    KEY = LAST-NAME;                       KEY = LAST-NAME;
  OPTIONAL;                              OPTIONAL;
    ELEM = FIRSTNAME-STRUCT;               ELEM = FIRSTNAME-STR;
      TYPE = STR;                            TYPE = STR;
  STRUCTURE = FIRSTNAME-STRUCT;        STRUCTURE = FIRSTNAME-STR;
    REQUIRED;                            REQUIRED;
      KEY = FIRST-NAME;                    KEY = FIRST-NAME;
      ELEM = POINTER;                    OPTIONAL;
        TYPE = LCTR;                       ELEM = POINTER-STR;
                                             TYPE = STR;
                                       STRUCTURE = POINTER-STR;
                                         FIXED;
                                           KEY = POINTER;
                                             TYPE = LCTR;
                                         OPTIONAL;
                                           ELEM = VALUE;
                                             OCC = 1;

The value passed as a SUB-INDEX value cannot exceed 2047 bytes.

B.7.12  Index Record and Goal Record Elements

Up to this point, the definition of an index record looks very similar to the definition of a goal record: there are keys, elements of fixed or varying length and optional elements, and there are structures. Several file definition elements have not appeared: SLOT, SLOTCHECK, REMOVED, INPROC, OUTPROC, and ALIASES.

SLOT, SLOTCHECK and REMOVED are rarely coded for index records. However, one element is often coded for index records that is not usually coded for the first record-type (usually the goal record) in the file definition; this is COMBINE. As explained in "Tree and Slot, Goal and Index Records," [See B.6.5.] COMBINE specifies that the data sets created by the compiler for each record definition specifying COMBINE are to be merged into a single data set or file. Any tree structured data set can be combined with any other tree structured data set; slot record-types cannot be combined with each other, or with any other record-type. Except in the largest files (over 100,000 records) with several subfiles, or files in which there are large table-lookup files, COMBINE should be used whenever possible. When there are table-lookup record-types, it is often a good idea to COMBINE them with each other, and to COMBINE the goal and index record-types together. This allows flexibility in erasing and recreating the table files with the ZAP DATA SET command. [See B.10.17.]

The COMBINE element is coded just after the RECORD-NAME element as follows:
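A minimal sketch, assuming the statement takes the form COMBINE = record-name; and names the record-type with which the data set is to be merged. The goal record key ID is a hypothetical element. The goal record here is not REMOVED, so the pointers are not declared TYPE = LCTR.

RECORD-NAME = REC01;
  REQUIRED;
    KEY = ID;
  OPTIONAL;
    ELEM = TITLE;
    ELEM = DATE;
RECORD-NAME = REC02;
  COMBINE = REC01;
  REQUIRED;
    KEY = TITLE;
  OPTIONAL;
    ELEM = POINTER;
RECORD-NAME = REC03;
  COMBINE = REC01;
  REQUIRED;
    KEY = DATE;
  OPTIONAL;
    ELEM = POINTER;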

Note that COMBINE is not coded for the goal record, REC01, in the above example, since it is the record with which other record-types are combined. The record-type named in the COMBINE statement must have been defined earlier in the file definition; it may not be defined further down. All of the file definitions in "Annotated File Definition Examples" use the COMBINE feature wherever possible. [See D.7.]

Warning: when many terminals will be using the same file simultaneously, the COMBINE statement is not recommended. In such situations you generally want as many ORVYL data sets to be used as possible, rather than as few as possible. Note, however, that there is a limit of nine ORVYL data sets (filename.REC1 through filename.REC9) that can exist for a single file, so if you have more than nine record-types, the COMBINE statement will be necessary for some of them.

B.7.13  Index Records as Goal Records

What about coding INPROC, OUTPROC and ALIASES for index record definitions? These file definition elements may be coded for index records, but often are not. Since SPIBILD maintains the indexes, there is no need for ALIASES, and any INPROC, INCLOSE or OUTPROC rules that are coded are ignored when SPIBILD is updating the indexes as part of its processing. If the SPIBILD process will create new records in a record-type (as is usually the case with index records), it is important that no FIXED or REQUIRED elements be defined that will not be created by SPIBILD. If a required element is not present when SPIBILD attempts to create a new record in a record-type, a PASS ERROR with a code of S419 will occur. This is a serious error.

Generally file owners are encouraged to include INPROCs and OUTPROCs for the key of each index record, because these affect the results displayed by the BROWSE command. When index values are displayed with the BROWSE command, the values are processed through the OUTPROCs for the key. Also, if a value is given in the BROWSE command (such as BROWSE DATE-ADDED 7/1/80), that value is processed through the key's INPROCs as well. Without such INPROCs and OUTPROCs on the key of the index record, browsing the index can be a pointless exercise for the subfile user. If only an INPROC is defined for an index record key and the INPROC sets the type for the key, e.g., an A31 identifies the stored key as a hexadecimal one, then SPIRES will convert the values displayed to string values when the BROWSE command is issued. This is not usually as valuable, or as straightforward, as putting the appropriate INPROCs and OUTPROCs in the index record definition.

It is also important to code INPROCs and OUTPROCs (and perhaps even ALIASES) if the index is to be used as a goal record that can be selected (using the SELECT command) or attached (using the ATTACH command).

In the following example, suppose the first index record, the SUBJECT index, can be selected as a goal record. An element called CROSS-REFERENCE has been added, so that the file owner can add cross-reference records to the SUBJECT index. Also, an OUTPROC action 32 has been added on the pointer, and it will convert the pointer on output by referring to the goal record it locates and looking up the TITLE element. The details of coding action 32 are covered in "Indirect Record-Access: Action 32 and SUBGOAL Processing." [See C.5.]

B.7.14  Indexes for Non-Removed Record Types; Keys vs. Locators

If the goal record does not have REMOVED specified, the file definer may not use the TYPE=LCTR specification in defining index records. Whenever the pointer to the goal record is the goal record key, rather than a location of the goal record in the residual data set, then TYPE=LCTR may not be specified.

This is always the situation when the goal record is not REMOVED. It may also occur for REMOVED goal records if the file definer has chosen to pass the goal record's key rather than the goal record's residual location.

All of the sample index definitions shown so far have assumed that a location in the residual data set is being stored rather than a key. However, it is fairly simple to see the implications of storing a key for index record definition:

Traditionally, SPIRES file owners have more frequently passed locators rather than goal record keys as pointers. By default, the File Definer subsystem creates index records that contain locators rather than keys. However, it can be very useful to pass the key at times: if the key itself is stored in the index record, the key can be examined directly when the index is used as a goal record. Also, if the techniques of "goal-to-goal passing" or "self-indexing goal records" are being used, the key usually must be passed. [See C.12.]

Below are some guidelines to follow in choosing whether to pass locators or to pass keys. Note that it would not be a fatal error to pass keys instead of locators or vice versa in contradiction to the second recommendation; however, it might mean that SPIRES would not handle certain search procedures as efficiently as it could.

These guidelines can be affected by other circumstances. For example, if most searches will retrieve more than half of the records in the result from the deferred queue, locator access could be more costly than key access. A detailed study discussing the key-locator decision appears in the back of this manual. Remember, for most files, the choice will not make very much difference; for most of the rest, the guidelines above will be satisfactory. [See D.8.]

B.7.15  Ensuring the Validity of Index Records

Index records are normally maintained entirely by SPIBILD, in accordance with the rules the file definer specifies in the linkage section of the file definition. [See B.8.] The file definer does not need to take any explicit action to ensure that information in the indexes is valid, but must ensure that a null value isn't passed as the key of an index record.

When index records can be transferred and updated as goal records, [See B.7.13.] the file owner must ensure that a user cannot incorrectly alter the linkages between goal and index records built by SPIBILD.

The most important ingredient or rule of this linkage is that all keys along the structural path from the index record's key to the pointer group (or pointer element) are in descending sort order. This is the way they are automatically created by SPIBILD. The best way to ensure this is to make these elements non-updateable, either in a view or with a PRIV-TAG specification. [See B.9.4.]

All structures along the pointer group path up to and including the pointer element itself must be in descending sort order. This can be ensured by coding an A138:0 as the INPROC for all structures along this path.

For example, in a personal name index [See B.7.11.] the sub-index structure must be sorted in descending order by its key (a person's first name), and:

B.8  Understanding and Coding the Linkage Section

B.8.1  Functions of the Linkage Section

The previous chapters of this manual have covered the definition of the record-types that will make up a SPIRES file. Two kinds of record-types have been examined in detail: goal records and index records.

The linkage section, as its name implies, links the goal and index records for two purposes: searching and passing. The linkage section controls the search process by specifying in the "SEARCHTERMS" statement the names of the components of an index to be searched. The linkage section also specifies, in the "SRCPROC" statement, the processing rules to be applied to values in a search request. Passing, which is the process of using information in a goal record to build an index record, is controlled by specifying the goal record information to be passed. The source of this information is specified by the "GOALREC-ELEM" statement or by processing rules coded in the "PASSPROC" statement.

Thus we can see at least two different parts of a file definition. The first part, defining the goal and index records, is a description of data structures. The second part, defining the linkage between goal and index records, is devoted to procedural rather than descriptive statements. These procedural statements provide for passing and searching. A third part of the file definition, defining the privileges of any user or group of users with respect to a subfile, is described in the next chapter "Defining Subfile Privileges." [See B.9.]

The linkage section itself can be subdivided into small sections: 1) a single group of statements defining certain global relationships between a goal record and all its index record(s), and 2) groups of statements describing the specific processing of the linkage between the goal record and a single index record. (1) is discussed in "The Global Parameters Section" [See B.8.2.] and (2) is described in "Individual Index Linkages" [See B.8.3.] There usually is one individual index linkage for each index record you have defined. The structure of these parts is fairly simple; the definition of the linkage is in terms of SEARCHTERMS, SRCPROC, GOALREC-ELEM and PASSPROC statements. If a compound index, qualifiers, or sub-indexes are used, then one or two additional elements must be specified in the linkage definition for the index record in which they occur. The only difficulty usually encountered in defining linkage sections is in coding the various PASSPROC rules, and occasionally in coding the SRCPROC rules; we will not consider the definition of these processing rule strings in detail until the end of this chapter.

B.8.2  The Global Parameters Section

The linkage section for any particular goal record begins with some "global" information that is common to all indexes belonging to that goal record. This information always includes the name of the goal record to which the entire linkage section applies, the name given to a search result for the goal record, and the name of the pointer element in all of the indexes. Any global qualifiers (qualifiers that are passed to all indexes) are specified here also.

Linkage sections are coded following the record definitions of the goal and index records. The linkage section begins with the global parameters portion:

Because of the rarity of global qualifiers, the additions they require to the global section will not be covered until later in this chapter. [See B.8.6.] Let's begin coding the linkage section by defining the global parameters section for a very simple bibliographic file. In this file, we have one goal record, BOOK, and two index records, REC02 and REC03. Let's say that we want the search result to be called "CITATION".

The file definition, up through the global portion of the linkage section, looks like this:

The GOALREC-NAME statement names the goal record by specifying its RECORD-NAME. The EXTERNAL-NAME statement declares what a search result will be called when SPIRES reports the result count after a search command such as FIND. The PTR-ELEM statement names the element in each index record that is to receive the pointer back to the goal record. You may choose any element name you wish, but it must be the same in all index records. In our example, it happens to be POINT-BACK.

The PASSPROC specifies A170 because the pointer element is TYPE=LCTR and the goal records are REMOVED. A170 specifies that the information passed to the pointer element in each index will be the address of the goal record in the residual data set. If the pointer element is not TYPE=LCTR, then the information passed to the pointer element in each index should be the key of the goal record, which is usually specified by a GOALREC-KEY statement. If the goal records are not REMOVED, then the pointer element cannot be TYPE=LCTR and A170 cannot be used.

B.8.2.1  The SEARCHPROC Statement in the Global Parameters section

As noted earlier, you can allow users to search for records using record-key criteria in a FIND command by adding a SEARCHPROC statement to the global parameters section, e.g.:

In this example, the keys of record-type REC01 are presumably stored as 4-byte integers. When a user issues the search command FIND PERSON 295, the value "295" will be run through the $INT(4) processing rule to retrieve that record.

No extra index record-type needs to be defined; the goal record-type, which is arranged in key order, of course, is used as the index. The feature is indicated to the user by the term "Goal-Index:" in front of the goal record name, instead of "Goal Record:" in the SHOW INDEXES display.

There are some restrictions on the use of this feature, however:

The SEARCHPROC string, like any other SEARCHPROC, should translate the incoming search values into values that might be found in the index records. [See B.8.4, B.8.9.]

To use this facility in searching, you type any of the values for EXTERNAL-NAME as the search term, e.g., using the example above:

All relational operators, except those restricted to compound indexes, may be used in such searches.

In a sense, added records in the deferred queue are "immediately indexed" in the goal-index -- they can be found immediately with the FIND command:

Trouble can arise if a search result like the one above containing added records is stored, using the STORE RESULT command -- if the result is restored after the file has been processed, those particular records may not be found. [They would have been represented in the stored result with a temporary deferred queue locator, not their permanent locator.]

The logical operators TAND, TNOT and TOR will not work properly in most searches involving the goal-index. Similarly, when secure-switch 7 is set, pointer groups may not be compared properly. [Technical note: when processing "goal-as-index" searches, SPIRES must create pointer groups (either locators or the actual keys) for the "goal-index" records retrieved, so that they can be compared with other pointer groups in iterative searching. If the pointer groups for the regular indexes are structures (meaning they contain qualifiers), SPIRES must build dummy structures for the "goal-index" records. These dummy structures are built with binary-zero Fixed values and null-length Required values. Thus, if qualifiers are compared in iterative searching (e.g., because of TAND), SPIRES will be comparing dummy qualifiers to real ones, creating inaccurate results.]

B.8.2.2  The EXTERNAL-NAME Statement

The EXTERNAL-NAME statement can thus have two purposes:

No name can exceed 15 characters in length nor contain any blanks.

If you are not creating a goal-index, then the value for the EXTERNAL-NAME statement will be one or possibly two values that will appear in search result messages. If only one value is given, e.g.,

it will be used as the singular form in messages, with an "S" added to create the plural form when needed:

If multiple values are given, SPIRES will use the first for the singular form, and the second for the plural. For example:

would lead to:

and

would lead to:

If you are using the EXTERNAL-NAME statement both to supply the singular and plural forms and to supply additional search terms for the goal-index, be sure to place the singular and plural terms as the first two entries in the list.

On the other hand, if you are adding terms to the EXTERNAL-NAME statement only for the purpose of supplying additional search terms, remember that you will be affecting the singular and plural terms used by SPIRES in reporting search results. SPIRES will always use the first one for the singular form and the second one for the plural.
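The naming rules described in this section can be modeled with a short Python sketch. This is only an illustration of the rules as stated here, not SPIRES code: the function name `result_names` is hypothetical, and the ENTRY/ENTRIES pair is an invented sample value (CITATION is the name used in the sample file).

```python
def result_names(external_names):
    """Model how the singular and plural result names are chosen.

    external_names: the values coded in EXTERNAL-NAME, in order.
    Returns a (singular, plural) tuple.
    """
    if not external_names:
        raise ValueError("at least one EXTERNAL-NAME value is required")
    for name in external_names:
        # No name can exceed 15 characters in length nor contain blanks.
        if len(name) > 15 or " " in name:
            raise ValueError(f"invalid EXTERNAL-NAME value: {name!r}")
    singular = external_names[0]
    if len(external_names) > 1:
        # With multiple values, the second is always taken as the plural.
        plural = external_names[1]
    else:
        # With a single value, an "S" is added to form the plural.
        plural = singular + "S"
    return singular, plural


print(result_names(["CITATION"]))
print(result_names(["ENTRY", "ENTRIES"]))
```

Note that the second value is taken as the plural even when it was added only as an extra search term, which is why the order of the list matters.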

B.8.3  Individual Index Linkages

After any global parameters section, a linkage between the goal record and each index record must be defined. The definition of these linkages is, in structure, fairly straightforward, and looks like the following:

The structure of this skeleton can be slightly complicated by the inclusion of linkage information for a sub-index, local qualifiers, or a compound index. These cases will be covered later in this chapter. [See B.8.5, B.8.7, B.8.8.]

A "recipe" for coding the global and individual parameters of the linkage section is given in "A Guide to Coding the Linkage Section Definition." This guide covers all types of linkage definition. [See D.6.] The different kinds of index records coded in the preceding chapter will serve as examples of simple, personal name, qualified and compound index linkage sections. We will take each possibility in turn, leaving the detailed consideration of PASSPROC rule strings to the end of this chapter.

The PTR-GROUP statement, for any particular index, names a multiply occurring element in the index that is either a STRUCTURE element whose KEY is the PTR-ELEM, or a simple ELEM which is the PTR-ELEM. If the subfile needs only simple pointers back to goal records (no compound index or qualifiers), then PTR-ELEM and PTR-GROUP refer to the same simple pointer ELEM in all indexes. But if there is a need for compound index or qualifier terms, then PTR-GROUP for each index refers to a multiply occurring STRUCTURE which will contain those terms. The KEY of each STRUCTURE is the PTR-ELEM (pointer element).

You are limited to 88 INDEX-NAME sections per linkage section that actually pass data from the goal record-type to an index record-type. (In some situations, you can code an INDEX-NAME section that does not cause any data to be passed. [See D.1.7.1.6.5 for an explanation of action A165 and the $PASS.OCC proc.] A section like that does not count as one of the 88.)

B.8.4  Simple Indexes

Here are the two individual linkages to the index records REC02 and REC03. In the complete file definition we are building towards, these would be added right after the global parameters section with which the previous example ended.

The INDEX-NAME statements name the particular index records that will be linked to the goal record; here they are REC02 and REC03. The SEARCHTERMS statement is similar to the ALIASES statement in the goal record definition. Here SEARCHTERMS specifies the name or names that can be used to access an index in a search command such as FIND. [See B.9.4.5 to see how PRIV-TAG can restrict the use of SEARCHTERMS.]

The SEARCHPROC statement (also seen as SRCPROC) specifies processing that is to be performed on search values given in search commands. This processing is usually equivalent to a combination of both INPROC and PASSPROC rules used to determine the form in which goal record values are to be placed in the index record. That is, SEARCHPROC rules are usually coded to "translate" incoming search values into values that might be found in the index records. The SEARCHPROC for REC02:

breaks a search value up into individual words ("A45,", which breaks on blanks), then excludes any words of two or fewer characters (A47,2), and allows special truncated search if a word of more than three characters contains a "#" (A11:3,#).
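As a rough illustration, the first two steps of that rule string (breaking on blanks, then dropping short words) behave much like the Python sketch below. This is an informal model, not SPIRES code: the function name and the `max_excluded_len` parameter are hypothetical, the cutoff follows the "two or fewer characters" description above, and the truncated-search rule (A11) is deliberately not modeled.

```python
def split_search_value(value, max_excluded_len=2):
    """Break a search value into words and drop the short ones.

    Models the word-breaking step (split on blanks) followed by the
    exclusion step (discard words of max_excluded_len characters or fewer).
    """
    words = value.split()  # split on runs of blanks
    return [word for word in words if len(word) > max_excluded_len]


print(split_search_value("THE ART OF WAR"))
```

Under these assumptions, the word "OF" is dropped and the remaining words are each looked up separately in the index.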

Notice that the PASSPROC for this index contains similar rules:

A166 specifies that the goal record element value (or values) named in GOALREC-ELEM is to be fetched, and is later to be processed by A45 or A38 (both are actions that "split" a value into parts). The rules "A45,/ A47,2" make sure that only individual words are passed to the key of the index records, and that no words less than two characters are passed. This part of the rule string is identical to a portion of the SEARCHPROC rule string.

The SEARCHPROC and PASSPROC rules coded for REC03 are as follows:

The SEARCHPROC rules coded will convert a date in a search value to the internal form of a date, just as was done by an INPROC=AS31 statement in the goal record definition. The PASSPROC rule specifies only that the element whose name is coded in the GOALREC-ELEM statement be fetched and stored in the index record without the standard conversion to uppercase on passing. Values that are stored in character form should always be forced to uppercase on passing. Any other form of a value (e.g., binary, floating-point, packed decimal) should not be forced to uppercase. No translation by a matching AS31 is necessary in the PASSPROC, since the date is stored in the appropriate format in the goal record via the INPROC=AS31. Part of the power of SPIRES indexing methods is that values can appear in the goal record in one form, and can be passed and searched in a more usable (for the searcher) form.

The PTR-GROUP statement names the same element as the PTR-ELEM statement because our indexes have been defined to use only simple pointer elements (no qualifiers or compound index).

Although each INDEX-NAME refers to a different RECORD-NAME in our example, it is possible for any RECORD-NAME to be referenced by more than one INDEX-NAME. Such a case usually occurs when different elements within the goal records are to be passed to the same index, but those elements require different PASSPROC or SEARCHPROC rules. For passing efficiency, it is better to use multiple-passer rules ($PASS.ELEM, A167), putting the elements all together into a single individual linkage section, when you can. However, if you do need to code separate linkage sections because of different PASSPROC or SEARCHPROC needs, and if any single goal record could be so large that it could not fit in the pass stack, then try to put the individual linkages as early as possible in the linkage section to ensure that they will pass at the same time. (This last point is noteworthy only for files with very, very large records that each pass lots of data to indexes.)

B.8.5  Sub-Indexes

The general form of SUB-INDEX linkage is similar to INDEX-NAME:

When SUB-INDEX terms are added to a simple index, the effect is to introduce additional structural levels to the hierarchy leading from the KEY of the INDEX-NAME record to the PTR-GROUP element. SUB-INDEX names a keyed structure in the index record. The KEY of that STRUCTURE receives the goal record's value being passed for the sub-index term. A personal name index is a good example of a simple index with a sub-index structure. Let's modify our sample file definition to include a PERSON element in the BOOK records, and another index record: REC04

No GOALREC-ELEM was needed for the SUB-INDEX term in this example because the PASSPROC A165 indicates the value to be passed to FIRST-NAME had already been created by A38 in the PASSPROC associated with INDEX-NAME. This is usually the case with personal name sub-indexes, but not for other sub-index structures. The SEARCHTERMS of the SUB-INDEX for personal name are not usually used in a search request because A38 in the SEARCHPROC of the INDEX-NAME provides the necessary search values for the SUB-INDEX. [See B.9.4.5 to see how PRIV-TAG can restrict the use of SEARCHTERMS.]

Let's examine the index record definition and linkage definition for a sub-index that is not for a personal name. Suppose the following hierarchy were needed for an airline reservation system:

So, SEAT is inside SECTION which is inside FLIGHT. The index record definition for this structure would look like this:

The linkage definition for this index record would look like this:

Note the use of A171 to pass a default value of SECTION and SEAT if no value is found in the goal record. This will ensure that the index record is created, even if it is incomplete. A171 is also used this way in passing qualifier elements. [See B.8.6, B.8.7.]

The SEARCHTERMS of a SUB-INDEX are specified with a leading @-sign in a search request along with the SEARCHTERMS of the INDEX-NAME. For example, FIND FLIGHT 27 @SECTION B @SEAT 9 requests a specific hierarchy within the REC04 index.

B.8.6  Global Qualifiers

In order to have qualifiers in an index record, PTR-GROUP should specify a structure element in all index records. PTR-ELEM specifies the KEY of the structure, and the other elements within the structure receive qualifier values. The "form" of the structure must be the same across all index records associated with a particular GOALREC-NAME. That is, the number of FIXED, REQUIRED, and OPTIONAL elements must be the same in each definition of the structure, and the LENgth and OCCurrence attributes associated with corresponding elements must be the same within each structure. The KEY of the structure receives the pointer back to the goal.

Global qualifiers are specified in the global parameters section of a linkage description just prior to the first INDEX-NAME section. The statements of the QUAL-ELEM section are:

The SEARCHTERMS of any QUAL-ELEM are specified in a search request following the AND or AND NOT logical operators. [See B.9.4.5 to see how PRIV-TAG can restrict the use of SEARCHTERMS.]

Let's alter our sample file definition and linkage section to pass DATE as a global qualifier instead of building a separate DATE index (REC03). We will make DATE a global qualifier of both TITLE (REC02) and PERSON (REC04) indexes. Since PTR-ELEM must become the KEY of a PTR-GROUP structure, we will have to alter the index record definitions. The revised definition might look like:

Notice that we defined the POINTER-STR as consisting of entirely FIXED information, and included LEN=8 with TYPE=STR. The "form" of the pointer group structure is the same in all indexes.

If the DATE element within the BOOK records occurred multiple times, only the first occurrence would be passed to the global qualifier. And if DATE hadn't occurred at all, either A171 would need to be specified in the PASSPROC to supply a default value, or else a null value would be passed to the global qualifier.
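The selection behavior just described can be summarized in a small Python sketch. It is an informal model only: the function name is hypothetical, the date strings are invented sample values, and Python's `None` stands in for the null value that would be passed when no default is supplied.

```python
def global_qualifier_value(occurrences, default=None):
    """Choose the value passed to a global qualifier.

    occurrences: the values of the goal record element, in order.
    default: the value a default-supplying rule would provide, if coded;
             None models the case where no default is coded.
    """
    if occurrences:
        # Only the first occurrence is passed, however many there are.
        return occurrences[0]
    # The element did not occur: pass the default, or a null value.
    return default


print(global_qualifier_value(["1979", "1980"]))
print(global_qualifier_value([], default="0000"))
print(global_qualifier_value([]))
```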

There is a special case of global qualifier worth mentioning. If the keys of goal records are passed to PTR-ELEM, then the pointer element in the indexes referred to by the PTR-ELEM can also be referred to by a global QUAL-ELEM. The QUAL-ELEM would not specify a GOALREC-ELEM since the key of the goal records had already been passed to PTR-ELEM. The SEARCHPROC would correspond to the INPROC of the goal record's keys, and the PASSPROC must be A165. The SEARCHTERMS statement provides you with search names that allow you to use the PTR-ELEM as a qualifier, which means you can qualify your search requests by goal record key criteria.

If this special QUAL-ELEM is the only qualifier defined for the GOALREC-NAME, and there is no compound index, then PTR-GROUP, PTR-ELEM, and this QUAL-ELEM can all refer to the same simple element in the indexes. This is the only exception to the rule about PTR-GROUP structures being required when qualifiers are defined. [See C.6.11 for more information about this technique, B.8.2 for another technique that provides key searching.]

B.8.7  Local Qualifiers

All the rules for PTR-GROUP structures and PTR-ELEM keys apply for local qualifiers just as they do for global qualifiers. [See B.8.6.] Local qualifiers are specified in the linkage section for any particular index by adding QUAL-ELEM sections just after the PTR-GROUP statement.

Let's alter our sample file definition and linkage section again to pass DATE as a local qualifier of the TITLE index (REC02) instead of making it a global qualifier in all indexes. We will keep the personal name index (REC04) introduced in the SUB-INDEX section, but it will not have a qualifier.

Notice the DUMMY element in the pointer group structure of REC04. It is there to make the structure "form" identical to the structure defined in REC02, which has a DATE-QUALIFIER element. Also notice that both the DUMMY element and DATE-QUALIFIER element were declared OPTIONAL. That's because the DUMMY element will not occur within REC04 occurrences of the POINTER-STR.

In this sample definition, the DATE element in the goal records always occurred since it is a FIXED element. However, if it had been an OPTIONAL element which did not occur, then A171 should be coded in the PASSPROC to pass some default value to the local qualifier, otherwise no index entries would be created for TITLE. All local qualifier and sub-index sections must define values for an index entry to be created. If the goal record elements which supply values for local qualifiers or sub-index terms are multiply occurring, or a PASSPROC rule specifies multiple passer elements, then multiple index entries can be created. [See B.8.14.]

Problems may be encountered if a variable length qualifier is passed to a fixed-length qualifier element in the index record. If this is being done, the following PASSPROC should be included with any other qualifier PASSPROCs:

where "n" is the value of the LEN statement on the qualifier element (i.e., the fixed-length field size).

B.8.8  Compound Indexes

CINDEX-VALUE is just a special case of local qualifier. PTR-GROUP must be a structure with PTR-ELEM as its KEY.

Let's alter our file definition to make DATE a compound index. REC03 will now be used to define a compound index record-type. Remember, the "form" of the PTR-GROUP structure must be the same in all index definitions. Here is the revised file definition, including the linkage sections for both the simple index on TITLE and the compound index on DATE (the PERSON index has been dropped).

Compare the linkage definitions for REC02, a simple index, and REC03, a compound index. Notice that the PTR-GROUP statements refer to different structure names in each index, but the "form" of those structures is the same, and they have the same KEY name. The PTR-GROUP structure names are usually the same, but that is not a requirement, as this example demonstrates.

Also notice that the compound index linkage has a "dummy" SEARCHTERMS statement coded, no SEARCHPROC or GOALREC-ELEM statements, two PASSPROC statements, and a new kind of statement, "CINDEX-VALUE".

When searching a compound index, the searcher may use the element name or alias of any of the goal record elements passing to the compound index; the SEARCHTERMS statement must be coded, but its value is meaningless. The index names that are reported when a user issues the SHOW SEARCH TERMS command are picked up from the P+ values of PASSPROC=A167. Note from the description of this action [See D.1.7.] that the order of the P+ parameters is not important unless some of the elements being passed are inside structures; in this case, the order must be the order in which the elements would be displayed if a record from the file were displayed in the standard output format.

It is this first PASSPROC, A167, that specifies the goal record elements that are passed to the compound index; this is why no GOALREC-ELEM statement is needed. Instead of a SEARCHPROC rule string, SPIRES passes all search values through the INPROC rules for the particular goal record element being searched. (Only one SEARCHPROC rule can be coded in a compound index definition: SEARCHPROC = A6; in the INDEX-NAME section.)

The CINDEX-VALUE statement is only coded in the linkage to a compound index, immediately following the PTR-GROUP statement. It names an element in the index record's PTR-GROUP structure that will receive data values being passed from the goal record elements (See A167 in the PASSPROC of INDEX-NAME). In the sample file, this element has the name "VALUE", hence the statement CINDEX-VALUE=VALUE in the linkage definition.

The final statement, a second PASSPROC, is always coded in compound index linkages. If the elements being passed are in binary form (as is often the case in compound indexes), such as the DATE element, then A169:1 is the only rule coded for this statement. If the elements are values that must be converted to uppercase, then A169:0 (or simply A169) is coded. If some elements being passed must not be converted to uppercase and others must be, then A162 is also coded, as explained later. [See B.8.11.]

B.8.9  Coding Searchproc Rules

If the values of an element have been altered by an Inproc or Passproc then the same processing rules are generally coded in the Searchproc rule string to apply a similar transformation on the values a user might specify in search commands.

Numerous Searchproc rules are available to modify the search process, as listed in section D.1.2 of this manual and in the "SPIRES System Procs" manual. Other actions may also be used in Searchproc rule strings, as indicated in their descriptions.

If no Searchproc statement is coded, and thus no Searchproc rules are specified, the default Searchproc will be used: "A45,". This will automatically cause search values to be broken on blanks.

Case-Sensitive Searching

By default, SPIRES assumes all index entries are in uppercase, and thus converts search values into uppercase automatically. Hence, no rule such as A30 or $CAP is needed in the Searchproc to convert the search value to uppercase.

However, in rare circumstances, file owners want case-sensitive searching, e.g., FIND ID = 32s14 should provide a different result from FIND ID = 32S14. In those cases, the file owner needs to turn off the automatic uppercase-conversion by invoking Secure-Switch 16 in the Subfile section of the file definition. Then, any other indexes of the subfile in which uppercase search values are required should probably include $CAP or A30 in their Searchproc statements. [See B.9.3.16.]

Note that case-sensitive indexes are created in passing by choosing a fetcher rule that doesn't automatically convert the passed value to uppercase. [See B.8.12.]
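The effect of the automatic conversion on the ID example above can be modeled in a few lines of Python. This is a sketch of the behavior only, not SPIRES code; the function name and the `case_sensitive` flag (standing in for the secure-switch setting) are hypothetical.

```python
def normalize_search_value(value, case_sensitive=False):
    """Model the automatic uppercase conversion of search values.

    case_sensitive=True models a subfile in which the automatic
    conversion has been turned off.
    """
    return value if case_sensitive else value.upper()


# Default behavior: both spellings reach the index as the same value.
print(normalize_search_value("32s14") == normalize_search_value("32S14"))
# Conversion turned off: the two spellings stay distinct.
print(normalize_search_value("32s14", case_sensitive=True))
```

With the conversion turned off, every other index that expects uppercase values must do its own conversion, which is why the text recommends adding an uppercase rule to those Searchproc statements.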

B.8.10  The NOPASS Statement

The NOPASS statement may be specified in the linkage section of a file definition. If it is coded, SPIBILD will not attempt to update ANY of a subfile's indexes when it is processing records. The indexes can still be searched, however.

The statement, "NOPASS;", is placed after the last statement in the linkage section of the goal record whose indexing is to be "turned off." The file definition must then be recompiled. Subsequent SPIBILD processing will not attempt to pass any information to the subfile's indexes. In order to re-start index updating, the NOPASS statement must be removed from the file definition, and the definition must then be recompiled.

Note that NOPASS stops passing to all indexes in a single linkage section; it cannot be used to disable one of several indexes selectively. To do that, consider using the PASSPROC = $NOPASS statement. [EXPLAIN $NOPASS PROC.]

You can disable passing on a case-by-case basis (as opposed to globally by changing the file definition) by using the SET PASS or SET NOPASS commands in SPIBILD. These commands also allow you to turn off passing from individual record-types. [See B.10.12.2.]

B.8.11  Coding PASSPROC Rules

The rules for coding strings of PASSPROCs are more rigid and difficult than the rules for coding INPROCS, OUTPROCS or SEARCHPROCS. The descriptions of the PASSPROC rules in the last part of this manual [See D.1.7, D.2.6, D.3.] and in the ACTIONS subfile are very concise; a problem with their brevity is that the choices a file definer has in coding PASSPROC rule strings are not easily distinguished. This section will focus on the central choices that must be made in coding PASSPROC strings.

The first PASSPROC encountered in most file definitions is in the global parameters of the linkage section. For example:

As has been mentioned, PASSPROC=A170 is used if the goal record is REMOVED (as most sample definitions in this manual are) and the PTR-ELEM is declared TYPE=LCTR. This rule says that the PTR-ELEM in each index record will receive the locator of the goal record in the residual data set.

It is also possible to pass the key of the goal record to the PTR-ELEM, instead of the locator of the goal record in the residual data set. If the goal records are not REMOVED then you must pass the key. To do this, one of the following PASSPROC rules must be coded instead of A170:

Neither of these rules forces the key to uppercase. When no PASSPROC is coded, the default action is to force the key to uppercase, which should only be done when the goal record's key is already an uppercase value.

The first rule is used when only the GOALREC-KEY is to be passed. That includes passing the slot-number of SLOT records. The second rule is usually used with multiple passer elements, but can be used with just the goal record's key specified.

Although it is not frequently done, it is possible to pass the key of the goal record as the pointer even if the goal record is a REMOVED record-type. This should be avoided if the key is varying in length or if the key is more than four bytes long.

There are two other places in the linkage section where the choice of a PASSPROC is simple. The first of these is the second PASSPROC statement coded in a compound index linkage definition:

A169:1 is used when the value or values being passed are stored as numbers and hence must not be forced to uppercase. A169:0 would be used if character strings were being passed to this index. If both character and binary data were being passed to this index, then A162:1 would also be coded to exclude certain elements' values from uppercase conversion. For example:

would force all values to uppercase upon passing, except those from the DATE element.

The second case in which the choice of a single PASSPROC is simple is the second PASSPROC string coded in a personal name index:

Here, only PASSPROC=A165 can be coded because A38 in the PASSPROC of INDEX-NAME supplies both the KEY of the index record and the KEY of the sub-index structure.

B.8.12  Choosing the "Fetcher" Passproc

Choosing the Passproc rule that fetches the element value or values from the goal record is a matter of selecting one rule from among sixteen that are shown in a table below. However, the table itself requires some explanation of the terminology that is often found in SPIRES processing rule descriptions.

The terms "single passer" and "multiple passer" need definition. A single passer situation occurs when only one goal record element is passing its value or values to an index and the GOALREC-ELEM statement is coded. It does not matter whether this element is itself singly or multiply occurring, or whether A45 is used to break a single occurrence into multiple occurrences. A multiple passer situation occurs when more than one element in the goal record passes to a single index, and so the elements to be passed are coded in the Passproc itself, not in any GOALREC-ELEM statement. For example, if HOME-PHONE and BUSINESS-PHONE elements in a goal record were both passed to a PHONE index, this would be a multiple passer situation.

Compound indexes present a special problem, since they contain two Passproc rule strings. However, the choice of Passproc rules is fairly straightforward. [See B.8.5 for a complete discussion.] The first Passproc is always "A167:0"; this defines the elements from the goal record to be passed to the index. The second Passproc may be either A169:0 or A169:1, depending on whether or not the elements being passed are to be forced to uppercase in passing; see the entry for a single passer without A38 or A45 in the following table.

One other factor affects the choice of Passproc rules from the table. If values "fetched" (obtained) from the goal record are to be broken apart in passing by action 45 or by action 38 (the latter for personal name indexes), then the fetcher must be chosen from the "With A38 or A45" column of the table; if neither A45 nor A38 is coded in the same rule string, the fetcher must be chosen from the "Without A38 or A45" column.

The P1 parameter on the Passproc that fetches the value from the goal record is determined by whether or not the value is to be forced to uppercase in passing. Values that are converted to an internal form, such as fixed binary or date values, must not be converted to "uppercase" in passing, since this would change their value. Otherwise you would choose a rule that would convert the value to uppercase, since SPIRES will automatically convert the user's search value to uppercase (except in the rare situation of case-sensitive indexes). [See B.8.9.]

The P1 parameter is also determined by whether or not the value should be processed through the OUTPROC rules associated with that element, thus passing the external form of the value to the index.

Both actions and equivalent system procs are shown. For details about system procs, see the SPIRES manual "System Procs".

                 Without                With
               A38 or A45            A38 or A45
         |--------------------|---------------------|
         |                    |                     |
 Single  | A169:0             | A166:0              |--Force Upper
 Passer  | $PASS              | $PASS(,BREAK)       |
         |                    |                     |
         | A169:1             | A166:1              |--Don't Force
         | $PASS(UPLOW)       | $PASS(UPLOW, BREAK) |
         |                    |                     |
         | A169:8             | A166:8              |--Force Upper,
         | $PASS(,,OUT)       | $PASS(,BREAK, OUT)  |  but Pass
         |                    |                     |  External Form
         |                    |                     |
         | A169:9             | A166:9              |--Don't Force,
         | $PASS(UPLOW,,OUT)  | $PASS(UPLOW, BREAK, |  but Pass
         |                    |   OUT)              |  External Form
         |--------------------|---------------------|
         |                    |                     |
Multiple | A167:1             | A167:2              |--Force Upper
 Passers | $PASS.ELEM(elems)  | $PASS.ELEM(elems,   |
         |                    |  BREAK)             |
         |                    |                     |
         | A167:5             | A167:6              |--Don't Force
         | $PASS.ELEM(elems,  | $PASS.ELEM(elems,   |
         |  UPLOW)            |  BREAK.NUM)         |
         |                    |                     |
         | A167:9             | A167:10             |--Force Upper, but
         | $PASS.ELEM(elems,, | $PASS.ELEM(elems,   |  Pass External Form
         |  OUT)              |  BREAK, OUT)        |
         |                    |                     |
         | A167:13            | A167:14             |--Don't Force, but
         | $PASS.ELEM(elems,  | $PASS.ELEM(elems,   |  Pass External Form
         |  UPLOW, OUT)       |  BREAK.NUM, OUT)    |
         |--------------------|---------------------|
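
The table can also be read as a small lookup. The Python sketch below is only an illustration of that reading, not part of SPIRES: the function name and its four boolean arguments are hypothetical, and the arithmetic on P1 is simply a pattern observed in the sixteen table entries, not a documented meaning of the parameter.

```python
def fetcher_action(multiple_passer: bool, breaks_values: bool,
                   force_upper: bool, external_form: bool) -> str:
    """Return the fetcher action and P1 value listed in the table.

    multiple_passer -- elements are named in the Passproc itself
    breaks_values   -- A45 or A38 is coded in the same rule string
    force_upper     -- value is forced to uppercase in passing
    external_form   -- value is passed through the element's OUTPROC rules
    """
    if multiple_passer:
        # Multiple passers always use A167; P1 is 1 without A45/A38, 2 with.
        p1 = 2 if breaks_values else 1
        if not force_upper:
            p1 += 4          # the "Don't Force" rows: 5, 6, 13, 14
        if external_form:
            p1 += 8          # the "Pass External Form" rows: 9, 10, 13, 14
        return f"A167:{p1}"
    # Single passers use A169 without A45/A38 and A166 with them.
    action = "A166" if breaks_values else "A169"
    p1 = (0 if force_upper else 1) + (8 if external_form else 0)
    return f"{action}:{p1}"


if __name__ == "__main__":
    # A numeric element passed alone, with no A45/A38, must not be uppercased.
    print(fetcher_action(False, False, False, False))
```

For instance, the HOME-PHONE and BUSINESS-PHONE case described earlier is a multiple passer situation, so the sketch would select one of the A167 entries for it.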

B.8.13  Other Actions in a PASSPROC Rule String

The following describes the syntax for any PASSPROC rule string. You enter the table at the word "PASSPROC" and follow the paths defined. The symbol "::" is read "is defined as." Terms on the left side of a "::" are defined by the term(s) that appear on the right side of the "::". Terms on the right side of the "::" that are listed directly under another term (or terms) on that side are an alternative definition for the term on the left side of the "::". The symbol "|" means "or," and also separates alternative definitions.

 <Term>    indicates a term that must occur once, i.e., is required
 (Term)    indicates a term that may occur once, i.e., is optional
 (0,Term)  indicates a term that may occur any number of times,
           i.e., it may be omitted, or may occur once or more
 A-number  indicates a required processing rule.  If no P1
           parameter is specified, then all P1 parameters are
           included.  If a P1 parameter is specified, then only
           that P1 parameter is allowed.

PASSPROC         ::  <MULTIPLE-PASSER>
                  |  <SINGLE-PASSER>
                  |  <SIMPLE-PASSER>
MULTIPLE-PASSER  ::  (DEFAULT) <MULTIPLE-FETCHER> (0,MIDDLE) <BREAK>
SINGLE-PASSER    ::  (DEFAULT)  <SINGLE-FETCHER>  (0,END)
SIMPLE-PASSER    ::  <DEFAULT> | A165 | A167:0 | A170
DEFAULT          ::  A171
MULTIPLE-FETCHER ::  A166 | A167:2 | A167:6
SINGLE-FETCHER   ::  A167:1 | A167:5 | A169
MIDDLE           ::  A22  | A32  | A36  | A40  | A43  | A44  | A46  | A47
                  |  A48  | A55  | A62  | A161 | A162 | A163 | A168
BREAK            ::  A45  (0,END)
                  |  A38
END              ::  <MIDDLE> | A52 | A164

For example, this syntax shows that the following is illegal syntax:

because A52 (as an END rule) must follow A45 (which is a BREAK rule). This syntax also shows that A38 must be the last rule in any PASSPROC in which it is coded.
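
As an aid to reading the syntax table, the following Python sketch checks a rule string against it. It is an illustrative model only: the function names, the one-letter class codes and the list-of-strings input are all hypothetical, each rule is given as just its action number and optional P1 (no other parameters), and a missing P1 is read as 0, which is an assumption of the sketch. It follows the syntax table exactly as written, so P1 values that appear only in the fetcher table of the previous section (such as A167:9) are rejected.

```python
import re

# Actions listed under MIDDLE in the syntax table.
MIDDLE = {22, 32, 36, 40, 43, 44, 46, 47, 48, 55, 62, 161, 162, 163, 168}
RULE = re.compile(r"^A(\d+)(?::(\d+))?$")


def classify(rule: str) -> str:
    """Map one rule such as 'A167:2' to a one-letter class code."""
    m = RULE.match(rule.strip().upper())
    if not m:
        return "?"
    action = int(m.group(1))
    p1 = int(m.group(2)) if m.group(2) else 0  # assumption: missing P1 is 0
    if action == 171:
        return "D"                              # DEFAULT
    if action == 166 or (action == 167 and p1 in (2, 6)):
        return "M"                              # MULTIPLE-FETCHER
    if action == 169 or (action == 167 and p1 in (1, 5)):
        return "S"                              # SINGLE-FETCHER
    if action in (165, 170) or (action == 167 and p1 == 0):
        return "X"                              # other SIMPLE-PASSER rules
    if action in MIDDLE:
        return "m"
    if action == 45:
        return "b"                              # BREAK that may be followed by END
    if action == 38:
        return "n"                              # BREAK that must come last
    if action in (52, 164):
        return "e"                              # END-only rules
    return "?"


# MULTIPLE-PASSER | SINGLE-PASSER | SIMPLE-PASSER, written over class codes.
PASSPROC = re.compile(r"^(?:D?Mm*(?:b[me]*|n)|D?S[me]*|D|X)$")


def is_valid_passproc(rules) -> bool:
    """True if the sequence of rules fits the syntax table."""
    return bool(PASSPROC.match("".join(classify(r) for r in rules)))


if __name__ == "__main__":
    print(is_valid_passproc(["A166:0", "A45", "A52"]))  # A52 after A45
    print(is_valid_passproc(["A166:0", "A52", "A45"]))  # A52 before A45
```

The two calls at the end are made-up inputs chosen to exercise the ordering constraint between A52 and A45; they are not examples taken from this manual.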

B.9  Defining Subfile Privileges

B.9.1  The Function of the Subfile Section

Preceding chapters of this manual have considered the definition of goal and index records, and most recently, the definition of the linkage section. The present chapter will describe the last major section of a file definition, the subfile section.

The subfile section determines a particular user's view of the file and its data. Different users may have different views defined. The subfile section defines what are called "privilege groups," that is, groups or individual users who have certain "rights" of access (privileges) when a particular subfile is selected. Generally, these privileges determine 1) whether a user can update a subfile or merely search it; 2) whether a user can issue commands such as SET FORMAT, SET ELEMENT, and BROWSE; 3) whether certain indexes cannot be searched; 4) whether certain structures or elements cannot be updated; 5) whether certain structures or elements cannot be seen; 6) what format, search modifier, or subfile explanation is in effect when a user has selected a subfile. Together, all these privileges make up a profile or view of the data in a file that particular users see.

Privilege groups in the subfile section vary in number and complexity according to the number and variety of restrictions placed on elements and indexes. Most file definers have found the definition of the subfile section to be the easiest and most straightforward part of the file definition task. In fact, you may choose to omit the section from your file definition entirely. But since this section defines, among other things, the name that will be used to access the goal record in a SELECT command, omitting it means that only the file owner can have access to the goal record, and then only via the ATTACH command. Quite often during the design and testing of a file definition this is acceptable, and so the subfile section is frequently not coded until the file definition is complete.

The information in this section of the file definition is quite unlike statements in other parts of the file definition in one important respect. Information added or changed in this section does not require a RECOMPILE of the definition to take effect. The changes made here take effect as soon as the record is added to, updated in or removed from the FILEDEF subfile.

The information in the subfile section does not affect the structure of the file, just the manner or conditions under which data can be accessed.

B.9.2  Basic Statements in the Subfile Section

Look carefully at the new group of statements appearing at the end of the following skeleton file definition.

Beginning with the SUBFILE-NAME statement, the last lines of the definition make up a simple subfile section. For many files defined by users, these are all the statements ever needed in the subfile section. Here we have defined a subfile called "PROFESSORS" that can be selected by anyone in group GG or by accounts GA.SPI and GA.LHB. In addition, if any of these users issues the command EXPLAIN PROFESSORS, the six lines of explanation (EXP) provided in the file definition will be output.

Let's look at a descriptive skeleton of this simple subfile section. Note the indentation used, as it indicates which elements are structures.

ACCOUNTS is a singly occurring element in a multiply occurring structure keyed on GOAL-RECORD. GOAL-RECORD in turn is the key of a multiply occurring structure in another multiply occurring structure keyed on SUBFILE-NAME. We will see later in this chapter how this nesting of structures gives a good deal of flexibility in setting up different groups of users with different access privileges to information in the file.

The value coded in the SUBFILE-NAME statement should not be longer than 32 characters, including blanks, and should be descriptive of the contents of the subfile. The name chosen should be unique among subfile names that the users of the subfile can select. If the name is not unique in this way, a user attempting to select the subfile may be asked to specify which file he or she wants, after being given the file names of the "competing" subfiles. The user will not be asked if one of the files belongs to him or her, in which case that subfile will be used. [See B.5.6.]

The value(s) coded for the EXP statement(s) each make up a single line of terminal output when the user types the EXPLAIN command and names the subfile. Notice from the example that blank lines can be output by coding only "EXP;", and that lines with leading blanks must have their values enclosed in double quotes. The subfile explanation is not required, but is strongly recommended, especially when a subfile is made public. The explanation might include a description of the contents, uses and scope of the subfile; information about updating frequency, data elements, and any special aspects of subfile use are also helpful. At a minimum, a public subfile should give the name of a person or document that can be consulted for further information.

The GOAL-RECORD statement must have the name of the goal record that was coded in the RECORD-NAME statement in the record definition.

The value of the ACCOUNTS statement is a list of the accounts that are permitted to select the subfile; the accounts are separated from each other by commas. It is possible to privilege an entire group to select a subfile by coding the group name followed by four periods: "XA...." for example, will permit any member of group "XA" to select the subfile. (More specific or more general subsetting of permitted users is also allowed. The following forms are valid: g....., gg...., gg.u.., gg.uu., and gg.uuu. The form "g....." would determine community subfiles, and the form "gg...." would determine group subfiles, as listed by the SHOW SUBFILES command. Note that the number of periods plus the number of characters of the account number must equal six. Other forms, such as "...uuu", are not allowed.) You may permit the public (any account) to select a subfile by specifying "PUBLIC" as an account.

If a user appears in several privilege groups for a single subfile, he or she will be given the privileges associated with the most specific listing of the account number. That is, if account XX.YYY selects a subfile with a set of privileges for the PUBLIC, another set for community X, another set for group XX, and another set for account XX.YYY, the set for account XX.YYY (the most specific listing) will be in effect. This situation often arises when a particular account or accounts are allowed to update a subfile that is selectable by the public for searching only.
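
The "most specific listing" rule can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than a description of how SPIRES implements the check: the function names and the list-of-tuples layout for privilege groups are hypothetical, account characters are assumed to be letters or digits, and a pattern's specificity is taken to be the number of characters left once its trailing periods are dropped.

```python
import re
from typing import List, Optional, Tuple

# The five account forms listed above: g....., gg...., gg.u.., gg.uu., gg.uuu
VALID_FORMS = re.compile(
    r"^(?:[A-Z0-9]\.{5}|[A-Z0-9]{2}\.{4}|[A-Z0-9]{2}\.[A-Z0-9]\.{2}"
    r"|[A-Z0-9]{2}\.[A-Z0-9]{2}\.|[A-Z0-9]{2}\.[A-Z0-9]{3})$"
)


def specificity(pattern: str, account: str) -> Optional[int]:
    """Return a score if `pattern` covers `account`, else None.

    PUBLIC scores 0; otherwise the score is the length of the pattern
    with its trailing (wildcard) periods removed.
    """
    pattern, account = pattern.upper(), account.upper()
    if pattern == "PUBLIC":
        return 0
    if not VALID_FORMS.match(pattern):
        raise ValueError(f"not a valid account form: {pattern}")
    prefix = pattern.rstrip(".")
    return len(prefix) if account.startswith(prefix) else None


def select_privilege_group(account: str,
                           groups: List[Tuple[List[str], str]]) -> Optional[str]:
    """Pick the privileges attached to the most specific matching listing."""
    best = None
    for accounts, privileges in groups:
        for pattern in accounts:
            score = specificity(pattern, account)
            if score is not None and (best is None or score > best[0]):
                best = (score, privileges)
    return None if best is None else best[1]


if __name__ == "__main__":
    groups = [
        (["PUBLIC"], "search only"),
        (["X....."], "community privileges"),
        (["XX...."], "group privileges"),
        (["XX.YYY"], "update"),
    ]
    print(select_privilege_group("XX.YYY", groups))
```

With the four privilege groups shown, account XX.YYY falls through PUBLIC, community X and group XX to its own listing, matching the example in the preceding paragraph.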

In the following subfile section, two simple statements have been added to the previous subfile section:

The FORMAT statement specifies the name of a format that is to be set automatically whenever the subfile is selected in SPIRES (not in SPIBILD), just as if a SET FORMAT command had been issued. The SEARCH-MOD statement specifies a string that is to be appended to every search command issued, just as with the SET SEARCH MODIFIER command. Both of these statements take effect only once, when the subfile is selected. The user is free to issue other SET FORMAT or SET SEARCH MODIFIER commands to change these settings.

A WHERE-MOD statement, similar to the SEARCH-MOD statement, can be specified. This statement specifies a criteria clause (the clause that follows WHERE in a FOR command) that is appended to all FOR commands (except FOR REMOVES) issued against the subfile. The syntax of the statement is:

If no logical operator is included, AND is assumed. AND NOT and NOT produce the same result. Any FOR command will have this "WHERE-MOD" appended to it; unlike SEARCH-MOD where a SET SEARCH MODIFIER command can cancel it (unless there is some SECURE-SWITCH involved), the WHERE-MOD cannot be removed by the subfile user.

The WHERE-MOD statement has no effect on Global FOR requests in SPIBILD.

B.9.2a  Subfile Selection for "Access Lists" of Accounts

File owners often own multiple files that can be selected by the same sets of accounts. For example, you might own ten files that everyone in your department must be able to select. But suppose your department, like most others, has a great deal of turnover, and every time someone comes or goes, you have to change the accounts lists in ten file definitions.

Instead of trying to keep the same list of accounts in sync between ten file definitions, you could maintain the list in a single place -- a record in the EXTDEF subfile -- and refer to the list by name in the Subfile section of each file definition. That way, you could make changes to the single EXTDEF record, and they would apply immediately to each of the file definitions that referred to it.

The relevant statements in the EXTDEF record are:

You put the list of accounts in the ACCESS-ACCOUNTS statement. The ACCESS-ACCOUNTS list has the same format as the ACCOUNTS statement in the file definition. Any of the following may be specified, each separated from the next by a comma:

The ID of the EXTDEF record is then named in ACCESS-LISTS, a statement in the Subfile section of the appropriate file definitions:

As the form suggests, you can name several access lists in the ACCESS-LISTS statement. You can also name individual accounts in the ACCOUNTS statement; both the ACCESS-LISTS and ACCOUNTS statements may appear in the same Subfile section. [See B.9.2.]

If it simplifies maintenance of access lists for you, you may also want to "nest" access lists within each other by using the ACCESS-SUBLIST element in the EXTDEF record:

Anytime the current EXTDEF record is referenced as an access list during the subfile-selection process, SPIRES will first look through the accounts listed in the ACCESS-ACCOUNTS statement in the current record; then, it will look in the "secondary" access lists named in the ACCESS-SUBLIST statement in the current record.

It is possible for other file owners to use the "access lists" you have put in the EXTDEF subfile as well. That means that you can share access lists with other file owners. If you want to prevent others from using your access lists, or wish to limit the accounts that can, you can add the ACCOUNTS statement to the EXTDEF record:

If specified, only the accounts listed (as well as your own account) can use the EXTDEF record. To give access to your account only, add an ACCOUNTS statement with only your account listed. Do not confuse the ACCOUNTS statement in EXTDEF with the ACCOUNTS statement in the Subfile section of a FILEDEF.

Note that other file owners have "use-only" access to your EXTDEF records; they cannot see or change them, meaning that any changes to the ACCESS-ACCOUNTS (or ACCOUNTS) statement must be made by you. (However, you may give other accounts "update" access to EXTDEF records using the "metacct" facility; EXPLAIN METACCT SUBFILE for details.)

(Stanford-only) The ACCESS-UNIVIDS Statement

You can also specify the users who can select your subfile by placing their 8-digit Stanford University ID into the ACCESS-UNIVIDS statement, e.g.,

When the $UNIVID variable is set to one of the specified University IDs, then subfiles that give access to those IDs on this access list can be selected.

Subfiles that give access via ACCESS-UNIVIDS will not show up in the SHOW SUBFILES display.

The ACCESS-MATCH Element

A virtual element called ACCESS-MATCH provides a handy way for comparing the current user's account or Stanford University ID to the lists of accounts or University IDs specified in the ACCESS-ACCOUNTS, ACCESS-UNIVIDS and ACCESS-SUBLISTS elements in the EXTDEF record. If SPIRES matches the current user to an entry in one of those elements, then it returns the matching value in the ACCESS-MATCH element. If there is no match, the element's value is null.

In other words, if a program needs to check whether the current user is in one of the access lists of the EXTDEF record, it can simply check the value of the ACCESS-MATCH element.
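
The lookup order described earlier (the record's own ACCESS-ACCOUNTS first, then its secondary access lists) and the value returned through ACCESS-MATCH can be pictured with the following Python sketch. Everything about its shape is an assumption made for illustration: EXTDEF records are modeled as a plain dictionary, the function name is hypothetical, University IDs are left out, and the guard against access lists that refer to each other in a circle is an addition of the sketch, since this manual does not say how SPIRES treats that case.

```python
from typing import Dict, List, Optional, Set


def access_match(list_id: str, account: str,
                 extdef: Dict[str, Dict[str, List[str]]],
                 _seen: Optional[Set[str]] = None) -> Optional[str]:
    """Return the access-list entry that matches `account`, or None."""
    seen = set() if _seen is None else _seen
    if list_id in seen or list_id not in extdef:
        return None            # unknown list, or already visited
    seen.add(list_id)
    record = extdef[list_id]
    # First look through the accounts named in this record ...
    for entry in record.get("ACCESS-ACCOUNTS", []):
        # Trailing periods in an entry act as wildcards (e.g. "GG....").
        if account.upper().startswith(entry.upper().rstrip(".")):
            return entry
    # ... then through the secondary access lists it names.
    for sub_id in record.get("ACCESS-SUBLIST", []):
        found = access_match(sub_id, account, extdef, seen)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    # Hypothetical records; the IDs DEPT and STAFF are made up.
    extdef = {
        "DEPT": {"ACCESS-ACCOUNTS": ["GA.SPI"], "ACCESS-SUBLIST": ["STAFF"]},
        "STAFF": {"ACCESS-ACCOUNTS": ["GG...."], "ACCESS-SUBLIST": ["DEPT"]},
    }
    print(access_match("DEPT", "GG.ABC", extdef))
```

A program would then treat a non-null result as "the current user is on the list", in the same way that it would test the ACCESS-MATCH element.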

B.9.3  The SECURE-SWITCHES Statement

Often it is not desirable to allow the user to set another format, or to clear the format set by the file definer, just as it is sometimes not desirable to allow a particular user or group to update a file that can be selected and searched. Preventing certain kinds of data manipulation is the function of the SECURE-SWITCHES statement. This statement, usually used to prevent users from doing particular things, is also coded in the subfile section.

Secure-switches are specified as numbers that activate certain functions:

For example:

There are 19 secure-switches that can be used independently or in any combination. A summary list, which describes how each switch affects the targeted users, appears below; details on each switch follow this section. Online, issue the command "EXPLAIN SSW n", replacing "n" with the number of the switch you're interested in.

Two other commands are handy when working with secure-switches:

which shows the secure-switches in effect for the selected subfile; and

which allows the file owner, and users with MASTER access to the file, to set a secure-switch dynamically, usually in order to test its effect on the subfile. [See B.9.3a.]

B.9.3.1  Secure-Switches 1 and 2

Secure-switches 1 and 2 provide for variations in the action taken when an error is detected by a processing rule. For example, suppose a personnel office has a file of salary data. In order to verify that salaries entered are within a particular range, the following might be the definition of the salary elements.

The Inproc coded would abort a record being added or updated if the salary was not from 9000 to 15000. However, it might be reasonable to allow certain groups of accounts to enter salaries outside of this range, perhaps providing a warning message if such data were entered. If the Inproc were recoded with a "V" for variable error conditions, and the appropriate secure-switch or switches were specified, this would be possible. For example:

Then, if no secure-switch were coded for a particular account group, no error message would be output and the value would be accepted. If SECURE-SWITCHES=2 were coded for the privilege group, then VD would be treated as a W error, and a warning would be provided. If SECURE-SWITCHES=1,2 were coded for the privilege group, then VD would be treated as an S error, and a message would be output, and the record aborted. If no SECURE-SWITCHES statement is coded for a privilege group, then all error conditions are treated as if the "V" specification were absent; "VD" is treated as "D", for example.

If SECURE-SWITCHES=1 is coded for a privilege group, then

If SECURE-SWITCHES=2 is coded for a privilege group, then

If SECURE-SWITCHES=1,2 is coded for a privilege group, then

As shown, if both 1 and 2 are coded, all V errors are treated as S errors. So, if VW were coded for actions for which S would have been coded, it would be possible to have one account group receive only warning messages while another account group, having SECURE-SWITCHES 1 and 2 specified, would receive serious error messages.

Variable error modifiers can also be used to advantage in Searchproc rules, to force one group of accounts to search only certain values in an index, while others could search the entire index.

By the way, X is allowed as an equivalent for VD; for example, AX53 and AVD53 are the same.

B.9.3.2  Secure-Switch 2

This section is merely a place-holder for Secure-Switch 2, which is described in the previous section. [See B.9.3.1.]

B.9.3.3  Secure-Switch 3

Secure-switch 3 prevents users from adding, updating or removing goal records. It blocks ADD, UPDATE, ADDUPDATE, REMOVE, MERGE, ADDMERGE, DEQUEUE, UNQUEUE, BATCH and all the INPUT commands in SPIRES.

Less obviously, it also affects SPIBILD. In SPIBILD, input access is limited to the file owner and users with PROCESS access to the file. [See B.9a.] However, even those users cannot do any input processing (e.g., with the INPUT BATCH command) into a subfile if secure-switch 3 is set for their accounts.

B.9.3.4  Secure-Switch 4

Under secure-switch 4, account checking by processing rules ($TEST.ACCT proc or action A53) is ignored for the account using the subfile. This switch is most often used to allow a file owner, or specially privileged accounts, to have access to all records in a subfile when normal access to the records is controlled by account number.

B.9.3.5  Secure-Switch 5

Under secure-switch 5, the BROWSE GOAL command is blocked. Global FOR commands are not affected.

B.9.3.6  Secure-Switch 6

Secure-switch 6 is meant to restrict access to records to normal search processing only; "TRANSFER key", "DISPLAY key", EXTRACT, SCAN, SET and SHOW SEARCH MODIFIER, STACK, echo of protocol commands and SHOW XEQ STACK are blocked. Most Global FOR commands are also blocked, with the exception of processing under FOR RESULT, FOR SET or FOR STACK, if the stack was created from a result or set.

B.9.3.7  Secure-Switch 7

Secure-switch 7 ensures that when OR and AND NOT operations are performed during index searching, the entire pointer structure is compared, not just the pointer; therefore qualifiers are compared with each other. In other words, OR and AND NOT will always be treated as TOR and TNOT respectively in index searching.

Secure-switch 17 has a similar effect on AND, converting it to TAND. [See B.9.3.17.]

B.9.3.8  Secure-Switch 8

Secure-switch 8 blocks any non-formatted processing or display of records, including REFERENCE, SET ELEMENT, SHOW ELEMENT, GENERATE SET, GENERATE LOAD and CLEAR FORMAT commands, as well as TYPE and OUTPUT commands followed by an element list. TRANSFER is also blocked unless a format with USAGE=TRANSFER or ALL is in effect.

B.9.3.9  Secure-Switch 9

Secure-switch 9 blocks SET FORMAT, SHOW FORMAT, TRANSFER and REFERENCE commands. Because the user cannot change the format setting, all processing is done either with the default format or with no format at all, depending on whether a default format is set for the subfile using the FORMAT statement. [See B.9.2.]
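
The unconditional command blocks described for secure-switches 3, 5 and 9 can be summarized as a lookup, sketched here in Python. The sketch is a simplified model, not SPIRES behavior in full: the names are hypothetical, commands are compared by their leading words only, command abbreviations are not recognized, and switches whose blocks depend on context (such as 6, 8 and 10) are deliberately left out.

```python
from typing import Iterable, Optional

# Leading command words blocked by each switch, as listed in the text above.
BLOCKED = {
    3: ["ADD", "UPDATE", "ADDUPDATE", "REMOVE", "MERGE", "ADDMERGE",
        "DEQUEUE", "UNQUEUE", "BATCH", "INPUT"],
    5: ["BROWSE GOAL"],
    9: ["SET FORMAT", "SHOW FORMAT", "TRANSFER", "REFERENCE"],
}


def blocking_switch(command: str, switches: Iterable[int]) -> Optional[int]:
    """Return the lowest-numbered switch that blocks `command`, or None."""
    words = command.upper().split()
    for ssw in sorted(switches):
        for blocked in BLOCKED.get(ssw, []):
            head = blocked.split()
            # Compare whole words so that ADD does not match ADDUPDATE.
            if words[:len(head)] == head:
                return ssw
    return None


if __name__ == "__main__":
    print(blocking_switch("INPUT BATCH", {3}))
    print(blocking_switch("FIND ID = 32S14", {3, 5, 9}))
```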

B.9.3.10  Secure-Switch 10

Secure-switch 10 prevents two users from having the same record transferred or referenced (unless the NOUPDATE option appears on the REFERENCE command) at the same time.

In addition, it blocks the "UPDATE key" command and the ADDUPDATE command. UPDATE alone, following a TRANSFER or REFERENCE command, is allowed under SSW 10.

The MERGE command is blocked under SSW 10 if the record being updated is currently transferred or referenced by another user.

Lastly, SSW 10 blocks the UPDATE option on INPUT LOAD commands in SPIRES, but allows it in SPIBILD.

B.9.3.11  Secure-Switch 11

Secure-switch 11 affects how SPIRES treats relational operators during searches involving simple indexes. When secure-switch 11 is set, only the equality operator is recognized; other operators, if they appear in the search command, are treated as part of the search value.

The session below shows how secure-switch 11 can affect a search; the user shown is the file owner, who is allowed to change the secure-switches with the SET SSW command. [See B.9.3a.] At the start, the switch is not set.

In the first FIND command, SPIRES treats the word AFTER as the relational operator. In the second, SPIRES treats it as part of the search value, since it is enclosed with the rest of the value within quotation marks.

But once secure-switch 11 is set, the word AFTER is treated as part of the value, not as a relational operator. So, in essence, the second and third FIND commands have the same effect: they both find a record with the title AFTER THE FOX.

The error message that SPIRES displays when it spots a relational operator other than the equality operator under secure-switch 11 processing is a warning error, which can be suppressed in protocols by issuing the SET WARNING MESSAGES = 0 command, if desired.

B.9.3.12  Secure-Switch 12

Secure-switch 12 has been withdrawn. It formerly provided a special type of SPIBILD processing for batch requests where index information was passed immediately for every update or merge request processed.

B.9.3.13  Secure-Switch 13

Secure-switch 13 is used to force a record into memory for key processing when the record is accessed. If the key of the goal record has a Userproc in its Outproc statement (a Userproc in the SLOT-PROC statement [See C.11.4.2.] if the goal record is a slot), then secure-switch 13 will make sure that the record key is driven through its Outproc (or Slot-proc) when the record is accessed. Thus, for example, you could prevent a particular user from updating records by coding a Userproc that checks the account number of the user. Whenever the record was accessed by its key (using a command such as "REMOVE 13" for example) the record would first be brought into core and the Userproc on the key's Outproc would be executed.

Note that SECURE-SWITCH 13 does not guarantee that the record will always be brought into core whenever it is accessed. This disclaimer is needed because it is possible that a record is already in core. For example, suppose that a Userproc on the key forbids some user from updating a record, i.e., issuing the UPDATE command against a record. If the user has issued a Global FOR command, such as FOR TREE, and DISPLAYed the record, the record has already been driven through its key's Outproc when the record is displayed. A subsequent TRANSFER * command, followed by an UPDATE, would be permitted; the record is already in core, and thus would not be checked again.

The ADDUPDATE command cannot be used with subfiles that have SECURE-SWITCH 13 set.

B.9.3.14  Secure-Switch 14

Secure-switch 14 must be set for users to take advantage of the WITH UPDATE feature, which allows:

See the manual "SPIRES Formats" for further information on WITH UPDATE.

B.9.3.15  Secure-Switch 15

Secure-switch 15 provides a second way for SPIRES to handle record input containing see-only elements. [See B.9.4.3.]

If secure-switch 15 is set, record transactions that attempt to add, update or remove see-only elements will fail with an S281 error. If secure-switch 15 is not set, the input for the see-only elements is completely ignored, i.e., SPIRES discards it, and it has no effect on the success or failure of the transaction.
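
The two behaviors can be contrasted in a short Python sketch. It is only a model of the rule stated above: the function and exception names are hypothetical, a record's input is represented as a dictionary of element names to values, and only the error number S281 is taken from this manual.

```python
from typing import Dict, Set


class S281Error(Exception):
    """Stands in for the S281 error named above."""


def apply_see_only(input_elements: Dict[str, str], see_only: Set[str],
                   ssw15: bool) -> Dict[str, str]:
    """Return the input that is actually applied to the record."""
    touched = sorted(name for name in input_elements if name in see_only)
    if touched and ssw15:
        # With the switch set, the whole transaction fails.
        raise S281Error("see-only element(s) in input: " + ", ".join(touched))
    # Without the switch, see-only input is silently discarded.
    return {k: v for k, v in input_elements.items() if k not in see_only}


if __name__ == "__main__":
    record = {"NAME": "SMITH", "SALARY": "12000"}
    print(apply_see_only(record, {"SALARY"}, ssw15=False))
```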

B.9.3.16  Secure-Switch 16

Secure-switch 16 allows indexes containing upper- and lowercase values to be searched successfully. Unless this secure-switch is specified, the values processed by FIND, AND, OR, AND NOT, ALSO and SYNONYM commands are always converted to uppercase by SPIRES before any Searchproc rules are applied.

If this secure-switch is specified, then no automatic conversion to uppercase is done. Thus, upper-lowercase indexes can be searched. Any uppercase indexes should have a $CAP or $UPPER proc (A30) in the Searchproc (or the user would have to remember to enter values for uppercase indexes in uppercase only). (If an A30 is specified in a Searchproc, it should precede any A14 or A11 truncated search rules.)
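
A minimal Python sketch of the order of events may help. It assumes only what is stated above: the automatic conversion happens before any Searchproc rules, secure-switch 16 turns it off, and an A30 ($CAP) rule in the Searchproc uppercases the value itself. The function and parameter names are hypothetical.

```python
def prepare_search_value(value: str, ssw16_set: bool,
                         searchproc_has_a30: bool = False) -> str:
    """Return the search value as the index lookup would see it."""
    if not ssw16_set:
        # Default: automatic uppercase conversion, before Searchproc rules.
        value = value.upper()
    if searchproc_has_a30:
        # An A30 ($CAP) rule coded in this index's Searchproc.
        value = value.upper()
    return value


if __name__ == "__main__":
    print(prepare_search_value("32s14", ssw16_set=False))
    print(prepare_search_value("32s14", ssw16_set=True))
    print(prepare_search_value("32s14", ssw16_set=True, searchproc_has_a30=True))
```

The third call corresponds to an uppercase index in a subfile that has secure-switch 16 set: the Searchproc restores the uppercase conversion for that index alone.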

For compound indexes, if this secure-switch is specified, then the elements being indexed should have a $CAP or $UPPER proc (A30) as an Inproc rule, should be numeric elements, or should be passed with A169:1 (which prevents forcing to uppercase on passing).

Note that specifying this secure-switch has no effect on processing of element values against criteria in an ALSO command or WHERE clause. If it is necessary for such criteria to be "case sensitive" then the data element (or a corresponding redefined virtual element) should have TYPE=HEX specified; WHERE and ALSO criteria are not forced to uppercase for such element-types. Declaring the element TYPE=HEX has the effect of defining the element as non-character, even if it still is character. (Note: HEX is equivalent to BITS as an element type.)

B.9.3.17  Secure-Switch 17

Secure-switch 17, which affects ANDs in index searches, is similar to secure-switch 7 for AND NOT and OR. It forces SPIRES to treat ANDs as if they were TANDs; in other words, SPIRES will compare both the pointers and qualifiers when ANDing pointer groups together.

The switch is meant for use only when all indexes of the subfile are guaranteed to have qualifiers.

B.9.3.18  Secure-Switch 18

Secure-switch 18 cuts off general SELECT access to the subfile, making it available only via subgoal access (e.g., $LOOKSUBF procs, subgoal processing, phantom elements). The subfile cannot be selected, nor can records be batched into it in SPIBILD.

B.9.3.19  Secure-Switch 19

Secure-switch 19 cuts off all subgoal access (e.g., $LOOKSUBF procs, subgoal processing, phantom elements) to the subfile, making it available only through SELECT commands. This is a complementary switch to SSW 18. [See B.9.3.18.]

B.9.3.20  Secure-Switch 20

Secure-switch 20 (SSW 20) tells SPIRES to treat all OR operations as TOR, which means all unique combinations of pointer groups occur in the result. AND NOT operates as before: it eliminates all pointer groups from the 1st set that have pointers matching at least one pointer group in the 2nd set (1st AND NOT 2nd). AND is handled in a special manner: the result set contains all combinations of pointer groups from both sets whose pointer occurs at least once in both sets; all other pointer groups are eliminated. Here is an example of this special AND, called XAND:

      1st set      2nd set       result set
        PTR = 24;    PTR = 30;     PTR = 24;
         QUAL = X;    QUAL = A;     QUAL = X;
        PTR = 24;    PTR = 24;     PTR = 24;
         QUAL = A;    QUAL = K;     QUAL = K;
        PTR = 10;    PTR = 15;     PTR = 24;
         QUAL = F;    QUAL = G;     QUAL = A;
        PTR = 7;     PTR = 12;     PTR = 7;
         QUAL = R;    QUAL = J;     QUAL = Y;
        PTR = 7;     PTR = 7;      PTR = 7;
         QUAL = H;    QUAL = Y;     QUAL = R;
        PTR = 3;     PTR = 7;      PTR = 7;
         QUAL = S;    QUAL = R;     QUAL = H;

Although AND normally reduces a result set, this new kind of AND can increase it, since it logically does a TOR of all pointer groups whose pointer occurs at least once in both source sets. Therefore, $RESULT may be larger, but $RESCNT (the count of unique pointers) will stay the same or decrease. In the example above, $RESCNT for the 1st set was 4, while $RESCNT for the result is just 2. But all combinations of pointer groups for PTR=24 and PTR=7 are in the result (1st AND 2nd).

Note that this new AND (XAND with SSW 20) will take longer to perform since pointers need to be compared, and complete pointer groups also need to be compared.
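The XAND example above can be reproduced with a small Python model. It is a sketch under stated assumptions: pointer groups are represented as (pointer, qualifier) tuples, the names `xand` and `rescnt` are invented for the illustration, and the order of groups within the result is not meant to match what SPIRES produces.

```python
def xand(first, second):
    """Keep every unique pointer group, from either set, whose pointer
    occurs at least once in both sets."""
    shared = {ptr for ptr, _ in first} & {ptr for ptr, _ in second}
    result = []
    for group in first + second:
        if group[0] in shared and group not in result:
            result.append(group)
    return result


def rescnt(groups):
    """Count unique pointers, as $RESCNT is described in the text."""
    return len({ptr for ptr, _ in groups})


# The two source sets from the example, as (PTR, QUAL) pairs.
first = [(24, "X"), (24, "A"), (10, "F"), (7, "R"), (7, "H"), (3, "S")]
second = [(30, "A"), (24, "K"), (15, "G"), (12, "J"), (7, "Y"), (7, "R")]

result = xand(first, second)
```

For these inputs `result` holds the six pointer groups shown in the result-set column, `rescnt(first)` is 4, and `rescnt(result)` is 2. The group PTR = 7, QUAL = R appears in both source sets but only once in the result.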

Here are the major implementation features of AutoTAND, the logic SPIRES applies to searches under SSW 20.

If Secure-Switch 20 is set, and A6:16 through A6:30 are used to identify "classes" in searching, then SPIRES does the following with search requests:

a. Each search value defines a "mnemonic/class" code. For search values that don't use A6 classes, the code is 0. For search values that do use A6 classes, the code is a combination of "mnemonic" and "class number".

b. An internal processing stack maintains either "mnemonic/class" or "class/bit" codes. A "class/bit" code has just a single bit set for a single "class number", and no "mnemonic/class" ever matches any "class/bit" or combination of "class/bits". Entries are added to the processing stack by search values.

c. Each logical operator "processes" the last two entries in the internal processing stack. The last entry of the two is defined by the term to the right of the logical operator, and the first entry is defined by the term to the left of the logical operator. The result of the logical operation is to drop the last entry and replace the first entry by some logical combination of the original two entries. If the two entries match exactly, the result is the first entry, and the logical operator is not altered (AND remains AND which implies it will be processed as XAND with Secure-switch 20 set).

If the two entries don't match, then "mnemonic/class" entries are converted to "class/bit" entries (if needed), and the result entry depends upon the logical operator and bit patterns of the two "class/bit" entries as follows:

If the logical operator is OR or TOR, then the resulting entry is 0 if either "class/bit" entry is zero, otherwise it is the or'd bit pattern of the two "class/bit" entries.

If the logical operator isn't OR or TOR, then if bits match in both "class/bit" entries, the result is the and'd bits of the two "class/bit" entries (matching bits only) and the logical operator is changed to a T-operator (TAND or TANDNOT). But, if there are no matching bits, the result is 0 if the logical operator is already a T-operator, otherwise the result is the or'd bit pattern and an AND operator is treated as XAND. Finally, if the operator is some form of NOT, the result is set back to the original first entry (before conversion).

d. When a search command has been fully analyzed, the internal processing stack contains only one entry: the one corresponding to the final result. This information is remembered and is used as the initial entry for an iterative search.

Under AutoTAND logic (SSW 20), note that OR'ing only defines a combination class when both sides contribute classification bits. If either side is zero, the or'd class is assumed to be zero. This has an effect on any subsequent AND'ing. AND's which have class bits are TAND'd when there are matching bits from both sides. OR's can only contribute bits when both contribute, so TAND'ing only occurs when everything contributes bits. Otherwise, XAND occurs, and the bit pattern becomes the or'd bit pattern of both sides, yielding non-zero bits if either side contributes bits.

The following example may help shed some light on the situation. Assume four search terms:

  1.  NAME       (0, does NOT contribute bits).
  2.  AUTH       (bit 1)
  3.  AUTH.CLASS (bit 1)
  4.  MJR        (bit 4)

  -> find name garcia or auth.class 7
  -> find auth.class 7 or name garcia

  ; The result bit-pattern is zero because NAME
  ; on either side of the OR contributes 0.

  -> find auth.class 7 or mjr D10
  -> find mjr D10 or auth.class 7

  ; The result bit-pattern is 5 (1 or 4, 4 or 1).
  ; Both sides contribute bits, thus the or'd pattern.

  -> and auth gra

  ; If done with either result involving NAME, this
  ; is done with an XAND and has a bit-pattern of 1.

  ; If done with either result involving MJR, this
  ; is done with TAND and has a bit-pattern of 1.

  -> find name smith and name wesson

  ; Done with an XAND, but the result bit-pattern is
  ; zero because NAME contributes 0 from both sides.
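The combination rules in steps (a) through (c) can be checked against this example with a short Python sketch. It is a model written for illustration, not the SPIRES implementation. The representation is an assumption: a "mnemonic/class" code is a (mnemonic, bit-pattern) tuple, a "class/bit" code is a plain integer, and a search value with no A6 class is the integer 0. The name `combine` is hypothetical.

```python
# Operators and their T-forms, using the names given in the text.
T_FORM = {"AND": "TAND", "AND NOT": "TANDNOT"}


def to_bits(entry):
    """Convert a "mnemonic/class" tuple to its "class/bit" pattern."""
    return entry[1] if isinstance(entry, tuple) else entry


def combine(left, op, right):
    """Return (resulting stack entry, operator actually performed)."""
    if left == right:
        # Exact match: keep the first entry; AND is processed as XAND.
        return left, ("XAND" if op == "AND" else op)
    a, b = to_bits(left), to_bits(right)
    if op in ("OR", "TOR"):
        # OR yields class bits only when both sides contribute bits.
        return (0 if a == 0 or b == 0 else a | b), op
    if a & b:
        # Matching bits: keep them and switch to the T-operator.
        entry, performed = a & b, T_FORM.get(op, op)
    elif op in T_FORM.values():
        entry, performed = 0, op
    else:
        entry, performed = a | b, ("XAND" if op == "AND" else op)
    if "NOT" in op:
        entry = left  # any form of NOT restores the original first entry
    return entry, performed


NAME = 0                        # contributes no bits
AUTH = ("AUTH", 1)
AUTH_CLASS = ("AUTH.CLASS", 1)
MJR = ("MJR", 4)

with_name, _ = combine(NAME, "OR", AUTH_CLASS)   # find name ... or auth.class ...
with_mjr, _ = combine(AUTH_CLASS, "OR", MJR)     # find auth.class ... or mjr ...
after_name = combine(with_name, "AND", AUTH)     # and auth gra
after_mjr = combine(with_mjr, "AND", AUTH)       # and auth gra
```

With these definitions `with_name` is 0 and `with_mjr` is 5, `after_name` is `(1, "XAND")`, and `after_mjr` is `(1, "TAND")`, which agrees with the comments in the example.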

B.9.3.21  Secure-Switch 21

Secure-switch 21 alters the way SPIRES constructs the "-Result: <n> Records" message users see after issuing a successful FIND command. Instead of "n" coming from the value in the $RESULT variable, it will come from the $RESCNT variable when secure-switch 21 is set. Similarly, the displays from the SHOW RESULT and SHOW RESULT HISTORY commands are also affected.

Note, however, that a TYPE or DISPLAY ALL command, which would normally eliminate any pointers in the result that point to records that have been removed or that otherwise result in some error, will not eliminate those pointers when SSW 21 is set; neither $RESULT nor $RESCNT will change.

This switch is useful only when a subfile has indexes with qualifiers, which can produce misleadingly high result counts, as reflected in $RESULT. In such situations, $RESCNT is more accurate because it actually scans the search result and counts unique records. Because that takes more work, SSW 21 can make searches more expensive.

B.9.3a  The SHOW SSW and SET SSW Commands

You can see what secure-switches are in effect for a selected subfile by issuing the SHOW SSW command:

Those shown are the secure-switches in effect for the account which has the subfile selected. Unlike most SHOW commands, the SHOW SSW command may not be prefixed by IN ACTIVE in order to place the information in the active file.

The $SSW function may also be used to see whether a particular switch is in effect. See the manual "SPIRES Protocols" for more information about it; online, EXPLAIN $SSW FUNCTION.

The file owner and those with master privileges to the file can change the secure-switches in effect for the duration of the session with the SET SSW command:

where "ssw-number" is a valid secure-switch. If ON is used, the secure-switch is turned on; if OFF is used, the named switch is turned off; if neither is used, ON is assumed. This command can be useful for testing the effects of various secure-switches without having to change the file definition first.

Any changes made will be in effect only for the account under which the commands were issued and only while the subfile remains selected. To make permanent changes to the secure-switches, you must change the file definition.

B.9.4  Security for Individual Elements and Indexes

Previous sections of this chapter have shown how the Subfile section of the file definition can be used to provide subfile-level security. For example, you use the ACCOUNTS statement to specify exactly which accounts can select a subfile. And, in combination with secure-switch 3, for example, you can use the ACCOUNTS statement to specify which accounts can update records in a subfile. [See B.9.2, B.9.3.3.]

But you can also use the Subfile section to control access to individual elements and indexes. For example, you might have a subfile of personnel records accessible to a group of accounts that contains salary information that only a few of those accounts are allowed to see, and even fewer are allowed to update. This section, B.9.4 and its subsections, will discuss methods you can use to provide security for individual elements and indexes. [The Subfile section can also be used to provide security for specific records. This might be done using a combination of Userprocs and the SUBCODE and ACCOUNTS statements. [See C.11.3.]]

The contents of this section are outlined below:

B.9.4.1  Views: Element Security Defined in Packets

If you are responsible for a subfile, you may want one group of users to have no restrictions on the elements in each record; they can see and make changes to all the data. On the other hand, if the subfile contained sensitive data, e.g., grade elements in a school's registration subfile, you might want to let some accounts see the grade elements but not update them. And a third group might need access to the other elements in records of the subfile but have no need to see or update the grade data.

Each group of users needs to have a separate "view" of the goal records; the first group has an unrestricted view of all the elements, while another group's view lets its users see all but update only some of the elements. The third group's view lets its users see and update only some of the elements.

SPIRES lets you define views, such as the three described above, in the record definition. The two views that limit users might look like this:

Though each of these two views specifies only a single element, they actually define a view of all the elements in the record-type. That is, in both of these views, all elements except GRADE can be seen and updated, since only GRADE has any restrictions placed on it.

You connect these views to particular user accounts in the Subfile section of the file definition, like this:

Most users (PUBLIC) will see the records through the HIDE.GRADE view -- that is, the GRADE element is hidden from them for both seeing and updating. Account TR.OUT, with the SEE.GRADE.ONLY view, can see the value of the GRADE element but not update it. Account SH.ARK has the least restrictive view of all; it has no view restrictions since no view is assigned to that account. [See B.9.4.3 for details on how specific commands are affected by various view restrictions.]

The VIEW Statements in the Record Definition

A view definition appears under the RECORD-NAME statement for the appropriate record in the file definition. Though it is compiled as part of the file's characteristics, it has no effect at all unless a VIEW-NAME statement naming it appears in the Subfile section of the file definition.

Specifically, a view definition can have the following statements, all of which are optional:

The MODIFY-VIEW statement is described in the next section. [See B.9.4.2.] The others are described below, though not quite in the same order shown above:

The "element.list" consists of a list of one or more element names, separated by commas or blanks, to a maximum length of 3000 characters. Each of the statements whose value is an element list can be multiply occurring, so the 3000-character limit would not pose a problem if you had to list hundreds of elements.

You can specify structure names too, if you want to assign a particular view category to all the elements in the structure (or those elements in the structure not named in another category) at once. Some other special forms for element specification are available too. Structures in view definitions and other advanced topics are discussed in the following section. [See B.9.4.2.]

The VIEW-NAME Statement in the Subfile Section

The VIEW-NAME statement has the same syntax as the VIEW statement:

Its value is chosen from the list of views defined in the definition of the subfile's goal record. It is placed in the Subfile section under the ACCOUNTS statement, as shown in the example above. Only one view may be specified per account list. The VIEW-NAME statement is optional; if it is omitted for a group of accounts, no view restrictions will exist for those accounts. [See B.9.4.4 to see how they might instead be affected by CONSTRAINT and NOUPDATE statements.]

You cannot specify both a view and any of the statements that refer to priv-tags (except for NOSEARCH) in the Subfile section. The same file, and even the same record-type, may have both priv-tags and views defined in them, but only one method of element security may be declared in the Subfile section.

In the view definition itself, if no DEFAULT statement is coded, the default security level for elements not covered by the HIDDEN, SEE-ONLY and UPDATE statements depends on which of these three statements were coded. The default value is:

Remember, though, that the defaults discussed here apply when you do not use the DEFAULT statement -- you can use the DEFAULT statement to override (or confirm) the default. Coding a DEFAULT statement is recommended, to ensure the proper default for unassigned elements.

Another recommendation is that you keep the element lists in the HIDDEN, SEE-ONLY and UPDATE statements as short as possible, for efficiency's sake. For example, if all but three elements should be hidden, it is best to name the three exceptions in an UPDATE statement, and code the DEFAULT as HIDDEN, rather than name all the hidden elements in a HIDDEN statement.

For many applications, which just need to protect a couple of elements from particular groups of users, the above material will be more than sufficient when defining views. The next section covers some advanced features of the view facility.

B.9.4.2  Advanced Features of the View Facility

This section is divided into several discussions:

     1. Allowing records to be updated but not added or removed
     2. Assigning security levels to structural elements
     3. Changing one view into a similar one: MODIFY-VIEW
     4. View definitions in an EXTDEF record
     5. Alternate ways to list elements in view statements

1) Allowing Records to be Updated but not Added or Removed

If you use an element view to make a non-slot record key a "see-only" element, you prevent users from being able to add new records or remove existing ones. If it is the only see-only element, then the users can make changes to the rest of the record as desired with MERGE and UPDATE commands, but they cannot remove, add, dequeue or unqueue any records. [See B.9.4.3.]

A slot record-key, if declared "see-only", will prevent users from removing, dequeuing, or unqueuing any records, but adds are still allowed, because the slot key is assigned as part of the Inclose processing.

On the other hand, if you make any required element hidden, with no Inclose processing to supply default values, then users will not be able to add or remove records just as described above.

2) Assigning Security Levels to Structural Elements

If you name a structure in a view element-list, then all the elements within that structure are assigned that security level. For example, given the following elements in a record definition:

If a view is defined with COURSE as hidden, then COURSE.NUMBER and GRADE are hidden. If you want only GRADE to be hidden, just code "HIDDEN = GRADE".

On the other hand, suppose you want all the elements within QUARTER to be "see-only" except for those in COURSE, which should be hidden:

In essence, naming a structure in a view element-list is a shortcut way of naming all the elements within it; to say that a structure is hidden is to talk about its contents, not the container. This concept is important to understand in some situations.

For example, using the same record-type as above, suppose you want the elements YEAR and QUARTER.NUMBER to be hidden, while COURSE.NUMBER and GRADE should be see-only. Here are two different view definitions that produce the same view:

Both views have the following effect:

The SHORT view definition shows that when all but a few elements in a structure are to be assigned a particular level, it may be easier to assign that level to the structure as a whole and list the exceptions in another statement, rather than list all the elements for that level, as was done in the LONG view.

In sum, listing a structure in (say) a HIDDEN statement does not mean that the structure itself is "hidden"; instead, it means that all elements inside it will be hidden, unless they are changed by other statements. In the SHORT example above, the elements in QUARTER were first described as hidden, but then the SEE-ONLY statement changed the elements in COURSE to see-only.
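The equivalence of the SHORT and LONG views can be illustrated with a Python sketch. Several things here are assumptions rather than facts from this manual: the record layout is inferred from the discussion above, the dictionary form of a view is invented for the example, and the rule that a more deeply nested name overrides an enclosing structure is one reading of "unless they are changed by other statements". Elements not covered by any statement get no entry, since the DEFAULT statement is not modeled, and element names are assumed to be unique within the record.

```python
# Record layout inferred from the text: QUARTER contains YEAR,
# QUARTER.NUMBER and the COURSE structure (COURSE.NUMBER, GRADE).
RECORD = {"QUARTER": {"YEAR": None, "QUARTER.NUMBER": None,
                      "COURSE": {"COURSE.NUMBER": None, "GRADE": None}}}


def walk(tree, depth=0):
    """Yield (name, depth, subtree) for every element and structure."""
    for name, sub in tree.items():
        yield name, depth, sub
        if sub:
            yield from walk(sub, depth + 1)


def leaves(name, sub):
    """Simple elements inside a structure, or the element itself."""
    if not sub:
        return [name]
    return [n for n, _, s in walk(sub) if not s]


def resolve(view, record=RECORD):
    """Map each covered simple element to its security level."""
    index = {name: (depth, sub) for name, depth, sub in walk(record)}
    named = [(index[n][0], n, level)
             for level, names in view.items() for n in names]
    levels = {}
    for _, name, level in sorted(named):   # outermost names first
        for leaf in leaves(name, index[name][1]):
            levels[leaf] = level           # inner names override outer ones
    return levels


SHORT = {"HIDDEN": ["QUARTER"], "SEE-ONLY": ["COURSE"]}
LONG = {"HIDDEN": ["YEAR", "QUARTER.NUMBER"],
        "SEE-ONLY": ["COURSE.NUMBER", "GRADE"]}

short_levels = resolve(SHORT)
long_levels = resolve(LONG)
```

Both calls return the same mapping: YEAR and QUARTER.NUMBER are hidden, while COURSE.NUMBER and GRADE are see-only.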

The Overall Security Level for a Structure

There are uses for terms such as "hidden structures" and "see-only structures", but it is important to realize that these labels are not determined absolutely by a statement such as HIDDEN, but by the final status of the structure after all the statements of the view definition have been established. That is, in the example above, we would not call QUARTER a hidden structure, since there are elements within it that are not hidden at all.

So here is a list of the types of structures that a view can establish, followed by a discussion of how the types affect users differently:

Here is another sample view, shown with its effects on the sample record-type:

The COURSE structure is see-only, because it contains a combination of see-only and hidden elements. On the other hand, QUARTER is a partial-update structure, because it contains an update element (e.g., YEAR) and a "non-update" structure (COURSE).

These distinctions can be important when you are trying to determine the effects of a view on individual users. [See B.9.4.3.]

3) Changing One View to a Similar One: MODIFY-VIEW

The MODIFY-VIEW statement lets you make minor changes to another view without having to respecify the entire view. For example, here are two views, where one modifies the other:

In this particular case, the two views are identical, except for the COURSE.NUMBER element, which is HIDDEN in PRIMARY but SEE-ONLY in SECONDARY.

The MODIFY-VIEW statement is part of the view definition in the record definition. Its syntax is:

where "view.name" is the name of another view defined for the same record-type.

When compiling a view that includes a MODIFY-VIEW statement, SPIRES first sets the levels for the HIDDEN, SEE-ONLY and UPDATE statements listed in the view named in the MODIFY-VIEW statement. Then it sets the views for the HIDDEN, SEE-ONLY and UPDATE statements listed in the rest of the current view definition. Then it sets the default, using the DEFAULT statement in the current view definition, if there is one; otherwise, it uses the DEFAULT statement in the view named in the MODIFY-VIEW statement.
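The compile order just described can be sketched in Python. This is an illustration, not SPIRES code: the dictionary form of a view, the function name `compile_view`, and the contents of the two sample views are hypothetical (only the treatment of COURSE.NUMBER is taken from the text). The order in which HIDDEN, SEE-ONLY and UPDATE are applied within one view is also an assumption, and a chain of MODIFY-VIEW statements is not modeled.

```python
LEVEL_STATEMENTS = ("HIDDEN", "SEE-ONLY", "UPDATE")


def compile_view(name, views, elements):
    """Resolve the security level of each element for the named view."""
    view = views[name]
    # The view named in MODIFY-VIEW, or an empty view if there is none.
    base = views.get(view.get("MODIFY-VIEW"), {})
    levels = {}
    # First the statements of the modified view, then those of this view.
    for source in (base, view):
        for statement in LEVEL_STATEMENTS:
            for element in source.get(statement, []):
                levels[element] = statement
    # DEFAULT comes from this view if coded, otherwise from the base view.
    # If neither codes one, the result is None: the built-in defaults
    # are not modeled in this sketch.
    default = view.get("DEFAULT", base.get("DEFAULT"))
    return {e: levels.get(e, default) for e in elements}


# Hypothetical view definitions for illustration.
VIEWS = {
    "PRIMARY": {"HIDDEN": ["COURSE.NUMBER"], "SEE-ONLY": ["GRADE"],
                "DEFAULT": "UPDATE"},
    "SECONDARY": {"MODIFY-VIEW": "PRIMARY", "SEE-ONLY": ["COURSE.NUMBER"]},
}
ELEMENTS = ["YEAR", "QUARTER.NUMBER", "COURSE.NUMBER", "GRADE"]

primary = compile_view("PRIMARY", VIEWS, ELEMENTS)
secondary = compile_view("SECONDARY", VIEWS, ELEMENTS)
```

The two results differ only for COURSE.NUMBER, which is HIDDEN in `primary` and SEE-ONLY in `secondary`; SECONDARY inherits both GRADE's level and the DEFAULT from PRIMARY.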

For INPROC-REQ and OUTPROC-REQ statements, all the elements named in all occurrences of these statements, either in the current view or in the view named in the MODIFY-VIEW statement, will be restricted.

4) View Definitions in an EXTDEF Record

You can place a record's view definitions in a record of the EXTDEF subfile if you want. Doing so could be advantageous if record-types in several different files belonging to you (or even in the same file) have the same record definition (or at least the same element names). Rather than maintaining the same view definitions in several different file definitions, you can keep them in a single EXTDEF record.

You can place the view definitions into an EXTDEF subfile record belonging to your account. Then, you tell SPIRES where the view definitions are by adding the EXT-VIEW statement to the appropriate record-definition section of the file definition:

where "gg.uuu.extdef.record" is the ID of the EXTDEF record. Remember, the EXTDEF record must be your own; you cannot name EXTDEF records belonging to other accounts. More than one may be named. In the record definition, the EXT-VIEW statement comes just before any view definitions you want to code there.

All the views in an EXTDEF record must be for the same record-type. SPIRES will issue error messages if it finds names of elements not in the current record-type it is compiling. So if you want several record-types in the same file to have view definitions that are stored in the EXTDEF subfile, you should put the definitions into separate EXTDEF records by record-type. [See C.10.5.]

5) Alternate Ways to List Elements in View Statements

The HIDDEN, SEE-ONLY, UPDATE, INPROC-REQ and OUTPROC-REQ statements all specify elements to be affected by their control. Some special ways of listing elements can simplify the coding of these elements.

If several different elements in different structures all have the same element name (e.g., PHONE.NUMBER), you can identify each one individually by using the "structural path" form of the name:

On the other hand, if you want to specify that all elements of that name should be protected similarly, precede the element name with the "@" symbol, e.g.:

All elements named PHONE.NUMBER, regardless of what structures they are in, will be hidden in that view. That technique also works for floating structures.

If you need to identify several individual elements in one structure and you would need to use the "structural path" forms of their names to identify them, a special form using parentheses is available to you. For example, suppose PHONE, AGE and WEIGHT are elements in the CHILD structure while other structures also have the same elements. To hide these three elements in the CHILD structure only, you can code either of the following:

Other structures can be nested inside; be sure to get the parentheses matched up properly though!

B.9.4.3  Specific Effects of the View Facility

The effects of element views are in most cases quite obvious, but are occasionally subtle too. Naturally, if you declare an element to be hidden in one view, you know the element's values cannot be changed through that view. But if the UPDATE command completely replaces the old version of a record with a new one, wouldn't an update of such a record cause any hidden elements to be discarded? Or do hidden elements mean an UPDATE command is not allowed?

This section will answer those kinds of questions as it discusses the effects that element views have on various SPIRES commands. In particular:

- for output commands, such as TYPE and DISPLAY:

Occurrences of any hidden elements will not appear in the output, but update and see-only elements will.

- for ADD commands:

- for TRANSFER commands:

- for UPDATE commands:

- for MERGE commands:

SPIRES treats hidden and see-only elements the same for MERGE commands as it does for UPDATE commands (see above). MERGE still differs from UPDATE in its handling of the "update" elements: UPDATE discards all the update elements from the old record and adds the new ones from the input data. MERGE, on the other hand, merges the new data into the old data of the "update" elements.

- for ADDUPDATE commands:

SPIRES will treat the record as described above for ADD commands or for UPDATE commands, depending on whether the record turns out to be an add or an update.

- for REMOVE commands:

You can remove records in a subfile that defines hidden or see-only elements if and only if those records contain no occurrences of the see-only and hidden elements.

- for UNQUEUE and DEQUEUE commands:

You cannot issue either UNQUEUE or DEQUEUE commands if the goal record has either hidden or see-only elements in your view of the subfile.

- for SHOW ELEMENT commands:

For commands that list the elements in a record-type, such as SHOW ELEMENTS and SHOW ELEMENT NAMES, hidden elements and hidden structures will not be shown to the user at all. See-only and update elements and structures, and the non-hidden portions of partial-update structures will be shown.

- for other commands and functions:

Any command in which you name an element (e.g., "TYPE element-list") will fail if you name a hidden element; you will be told that you have named an "invalid mnemonic". Functions in which you name an element, such as $ELEMTEST, may return only a null value if you name a hidden element.

B.9.4.4  Priv-Tags and the CONSTRAINT and NOUPDATE Statements

Another way to handle element security, mutually exclusive with element views [See B.9.4.1.], is with "priv-tags" assigned to elements, combined with CONSTRAINT and NOUPDATE statements coded in the Subfile section of the file definition.

The CONSTRAINT and NOUPDATE statements, associated with a given group of accounts that can select the subfile, specify "priv-tag numbers" of those elements to be affected. Any element with the matching priv-tag number is assigned the particular degree of security identified by the statement.

The syntax of each statement is shown below:

What are "priv-tags"? These are numbers that the file definer assigns to elements in the record or linkage definitions by coding a PRIV-TAG statement. These numbers may then be referenced by CONSTRAINT, NOUPDATE, INPROC-REQ or OUTPROC-REQ statements in privilege group specifications. [See B.9.4.5 for information about INPROC-REQ and OUTPROC-REQ.]

Each element in a record-type may be assigned a single priv-tag number. The PRIV-TAG statement appears in the element definition:

where "n" is a single integer from 1 to 63. If the element is a slot key, you must code the priv-tag number on the SLOT statement:

where "n" is a priv-tag integer from 1 to 9 (the slot key cannot have a priv-tag of two digits).

Only one number can be assigned to each element. Any positive integers from 1 to 63 can be chosen (except for the slot key, as noted above), and they do not have to be in sequence. Several elements may share the same priv-tag number. This is usually done to reduce the number of tags that must be specified in the CONSTRAINT and NOUPDATE statements, whenever the elements have identical restrictions placed on them.

Since CONSTRAINT and NOUPDATE apply only to priv-tags in the selected goal record, priv-tag numbers in one goal record can duplicate priv-tag numbers in other goal records in the same file.

To use priv-tags to make an element invisible to a user, the element must have both CONSTRAINT and NOUPDATE priv-tags applied. If a user can see an element but not update it, then only NOUPDATE need be specified. A simple (i.e., non-structured) data element never has CONSTRAINT specified without NOUPDATE.
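The way the two statements combine can be summarized in a short Python sketch. It is a model for illustration only: the function name `element_access`, the use of sets for the CONSTRAINT and NOUPDATE priv-tag lists, and the sample tag value 5 are assumptions, not values from this manual.

```python
def element_access(priv_tag, constraint, noupdate):
    """Return the effective access to an element for one privilege group.

    priv_tag:   the element's PRIV-TAG number, or None if it has none
    constraint: set of priv-tag numbers named in CONSTRAINT
    noupdate:   set of priv-tag numbers named in NOUPDATE
    """
    if priv_tag is not None and not 1 <= priv_tag <= 63:
        raise ValueError("priv-tag must be an integer from 1 to 63")
    constrained = priv_tag in constraint
    no_update = priv_tag in noupdate
    if constrained and no_update:
        return "hidden"            # invisible to the user
    if no_update:
        return "see-only"          # visible but not updatable
    if constrained:
        return "constraint-only"   # per the text, never used for simple elements
    return "update"                # unrestricted


# Hypothetical tag number for illustration.
hidden = element_access(5, constraint={5}, noupdate={5})
see_only = element_access(5, constraint=set(), noupdate={5})
unrestricted = element_access(None, constraint={5}, noupdate={5})
```

The three calls give "hidden", "see-only" and "update" respectively. An element without a priv-tag is never affected by either statement.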

Here are two elements, SALARY and NAME, that have PRIV-TAG values specified:

Let's now define two privilege groups for a subfile PROFESSORS with a goal record called PROF, providing varying levels of access to those two elements. Note how two privilege groups are defined for a single subfile simply by repeating the privilege-group structure key, GOAL-RECORD. A third privilege group must select the subfile under a different name, and has access to a different explanation.

Note that only one method of element security may be used in a subfile: either the priv-tag method, using the CONSTRAINT, NOUPDATE, INPROC-REQ or OUTPROC-REQ statements, or the view method discussed in the previous sections. Both views and priv-tags may be defined in the same record-type, but the Subfile section of the file definition may only refer to one of the methods.

Priv-tag values are somewhat more difficult to assign when elements being controlled by CONSTRAINT and NOUPDATE are elements in a structure. In such cases, the structure itself must always have a priv-tag value. If any element in a structure has CONSTRAINT and NOUPDATE applied to it then the restriction on the structure itself is always the same as the least stringent level of restriction on the elements.

The following table indicates what restriction should be assigned to an element whose TYPE=STR, depending upon the minimum and maximum restrictions defined for elements that are part of the structure. In this table,

        - 0 indicates that the element is not restricted; that is,
          neither CONSTRAINT nor NOUPDATE is specified.
        - 1 indicates that the element has only CONSTRAINT specified.
        - 2 indicates that the element has only NOUPDATE specified.
        - 3 indicates that the element has both CONSTRAINT and
          NOUPDATE specified.

    Maximum Restriction in Structure

       0      1       2      3
     ---------------------------
    |  0  |   1   |   1   |  1  |   0
     ---------------------------
          |   1   |   1   |  1  |   1   Minimum Restriction
           ---------------------           in Structure
                  |   2   |  2  |   2
                   -------------
                          |  3  |   3
                           -----

Thus, if one element in a structure has no restrictions applied to it (a minimum restriction of 0), and another element has CONSTRAINT and NOUPDATE (a maximum restriction of 3), then the entire structure must have CONSTRAINT applied to it (a restriction of 1); this is the least stringent restriction in the structure. (This is the only situation in which CONSTRAINT is specified without NOUPDATE.) If all elements in a structure have both CONSTRAINT and NOUPDATE applied to them (a minimum and maximum restriction of 3), then the structure itself must have a priv-tag of CONSTRAINT and NOUPDATE. If structures exist within structures, then begin assigning restrictions to the elements deepest within structures and work toward the record level.
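The table can also be expressed as a small Python function, which may help when working outward through nested structures. This is a sketch that encodes the table above; the function names are invented for the illustration.

```python
def restriction_code(constraint, noupdate):
    """Encode the restriction as in the table: 0 none, 1 CONSTRAINT only,
    2 NOUPDATE only, 3 both."""
    return (1 if constraint else 0) + (2 if noupdate else 0)


def structure_restriction(member_codes):
    """Return the code to assign to a structure, given the codes of the
    elements (and nested structures) it contains."""
    lowest, highest = min(member_codes), max(member_codes)
    if lowest == 0:
        # Top row of the table: 0 only if nothing inside is restricted,
        # otherwise CONSTRAINT alone.
        return 0 if highest == 0 else 1
    # Remaining rows: the structure takes the minimum restriction.
    return lowest


# One unrestricted element plus one with CONSTRAINT and NOUPDATE.
mixed = structure_restriction([0, restriction_code(True, True)])

# Nested structures: resolve the innermost structure first, then pass
# its code outward as one of the enclosing structure's members.
inner = structure_restriction([3, 3])
outer = structure_restriction([0, inner])
```

Here `mixed` is 1 (CONSTRAINT only), `inner` is 3, and `outer` is 1, matching the cases discussed in the paragraph above.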

It is often desirable to prevent members of a privilege group from adding or removing records, but to allow updating access to other elements in a record. This can be done simply by putting a restriction of NOUPDATE on the key of the record.

Note that if a goal record has any NOUPDATE restrictions for the user:

More information about how SPIRES treats elements that are hidden because of CONSTRAINT and NOUPDATE statements can be found in the earlier section on the effects of hidden and see-only elements; the effects are identical. [See B.9.4.3.]

B.9.4.5  The INPROC-REQ and OUTPROC-REQ Statements

SPIRES by default allows a subfile's users to retrieve both the internal and external form of an element. For example, a dynamic element can be defined to provide the internal form of another element:

Sometimes, though, a file owner needs to prevent users from retrieving the internal form of an element, often for security reasons. In other words, the file owner wants to prevent users from bypassing processing rules.

The statements INPROC-REQ and OUTPROC-REQ, in conjunction with either element views or element priv-tags, can be used to require that element values be processed through the INPROC or OUTPROC rules, respectively, coded in the file definition. If used with element views, the statements are coded in the view definition, and name the elements to be affected. [See B.9.4.1.] If used with priv-tags, they are coded in the Subfile section of the file definition, like CONSTRAINT and NOUPDATE. [See B.9.4.4.] Either way, you can specify them for different elements and for different accounts.

Whether in view definitions or in the Subfile section (for the priv-tag method), the INPROC-REQ and OUTPROC-REQ statements share the same syntax:

where the list of element names, separated by commas or blanks, can be up to 3000 characters long, and each statement may occur multiple times in the view definition.

where each "priv-tag" is an integer from 1 to 63 corresponding to the priv-tag value assigned to elements to be affected by these statements. [See B.9.4.4.]

The impact of the INPROC-REQ statement is on input formats. Label groups in an input format may include INPROC statements, which are executed instead of the file definition's INPROC rules for the element being processed. But if the element is under the control of the INPROC-REQ statement, the INPROC statement in the format label group will not be executed. Instead, SPIRES sets the $SKIPEL flag, which means that the remainder of the label group will be executed, but any PUTELEM or REMELEM statement in the label group will be ignored. In other words, a format cannot override the file definition's INPROC rules for an element under the control of INPROC-REQ.

The OUTPROC-REQ statement affects output formats similarly. When an element is under the control of the OUTPROC-REQ statement, any OUTPROC statement in a format label group processing that element will be ignored. Instead, the rest of the label group is skipped, unless the DEFAULT statement is coded in the label group to specify default processing. [See the manual "SPIRES Formats" for more information on the effects of INPROC-REQ and OUTPROC-REQ on formats.]

The OUTPROC-REQ statement blocks access to the internal form of an element by other techniques as well. Specifically, if an element is controlled by OUTPROC-REQ:

Please refer to the documentation of these features for more specific information.

B.9.4.6  Index Security: Priv-Tags and the NOSEARCH Statement

Occasionally you may want or need to hide an index from users of your subfiles. This happens more frequently than you might think; you will probably want to hide the sub-index part of a name index, for example. [See B.7.11, B.8.5.] Or, if you have an element that is hidden from a group of users but is also indexed, you will probably want to hide the index from the users as well, so that they cannot use the index or see values in it (with the BROWSE command). [See B.9.4.1.]

To hide one or more indexes, you must use priv-tags in combination with the NOSEARCH statement coded in the linkage sections of the indexes you want to hide. The NOSEARCH statement's syntax is:

where each "n" is an integer from 1 to 63 matching "priv-tag numbers" of indexes, sub-indexes or qualifiers that are not to be searched by accounts in the privilege group.

Unless you are hiding different indexes in several combinations from several different user groups, you will probably use only one or two priv-tag numbers, which you assign in the linkage section of each index, sub-index or qualifier you want to hide:

where "n" is an integer from 1 to 63. Each index can have only one priv-tag number, but the same number may be (and most commonly is) shared between several indexes.

As suggested earlier, the most common situation involving NOSEARCH is when a file owner wants to hide the sub-index portion of a name index from users. The NOSEARCH statement will be used to prevent users from seeing the sub-index that contains the first names of the indexed people; searchers don't need to see the sub-index because the Searchproc action A38 ($PNAME proc) will search it automatically.

In other words, the Searchproc lets the user type a command such as "FIND NAME JOHN SMITH" and converts it (more or less) into the command "FIND NAME SMITH @ FIRST.NAME JOHN", so the user does not have to type the more complicated command. When the sub-index FIRST.NAME is hidden, the user does not see it listed in the SHOW INDEXES display, even though he or she actually does use it when the name index is used.

Here is the way the linkage section for the personal name index might look:

Since you probably want to hide the sub-index from all accounts, the Subfile section of the file definition might look like this:

If you had several sub-indexes (or any other indexes) you wanted to hide from the view of these accounts, you could assign them all the priv-tag value of 1, or give them different integer values and add those values to the NOSEARCH statement.
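The relationship between priv-tag numbers and the NOSEARCH list can be sketched as follows. This is only an illustrative Python model, not SPIRES syntax; the function name, the dictionary layout and the index name TITLE are assumptions for the example.

```python
def visible_indexes(index_priv_tags, nosearch_tags):
    """Return the index names a privilege group can still see.

    index_priv_tags: dict mapping an index (or sub-index) name to its
        priv-tag number, or None if no priv-tag was assigned.
    nosearch_tags: the integers listed in the group's NOSEARCH statement.
    """
    hidden = set(nosearch_tags)
    for tag in hidden:
        if not 1 <= tag <= 63:
            raise ValueError("priv-tag numbers run from 1 to 63")
    # An index is hidden only if its priv-tag appears in the NOSEARCH list.
    return [name for name, tag in index_priv_tags.items()
            if tag is None or tag not in hidden]
```

With the name index described above, visible_indexes({"NAME": None, "FIRST.NAME": 1, "TITLE": None}, [1]) leaves NAME and TITLE visible and hides FIRST.NAME.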

B.9.5  The SUBGOAL Statement

The SUBGOAL statement allows members of a privilege group to use SPIRES formats to access data in record-types of the file other than the goal record-type. Such access may be necessary when indirect record-access operations such as table-lookup become too complex for action 32 ($LOOKUP proc) to handle. [See C.5.7, C.5.8.]

The SUBGOAL statement is also necessary when a user other than the file owner writes a format for a file and that format contains an action 32 ($LOOKUP) -- except for the file owner, only users of the format who have been given access to the accessed record-type by the SUBGOAL statement in the file definition can successfully use the format. (Accounts given SEE file access [See B.9a.] can also use the format.) The SUBGOAL statement is also needed for the $LOOKSUBG function to work from any accounts other than the file owner's and those given SEE access.

Phantom structure access, another way of allowing access to other record-types in a file, does not require that the SUBGOAL statement be specified. A file owner establishes phantom structures in the file definition, so the file owner has other controls (CONSTRAINT, etc.) for blocking their use.

SPIRES formats that use the SUBGOAL processing feature cannot be used except by accounts that have been given SUBGOAL privileges. This privilege is granted by coding a SUBGOAL statement in the subfile-section for the account or group of accounts to be privileged. The SUBGOAL statement specifies the RECORD-NAME values of the record-types that may be accessed. The SUBGOAL statement can specify a maximum of ten record names that can be used by a privilege-group.

For example, to give groups W7 and W8 the ability to access INDX1 and INDX2 using SUBGOAL processing, the following subfile-section would be coded:

Whenever a member of either group W7 or W8 selects the PROFESSORS subfile, he or she may invoke a format to access records in INDX1 or INDX2; the PUBLIC would not be able to invoke such a format.

Note that if you are using the subfile's goal record for subgoal processing (that is, the same record-type is both the goal record and the subgoal), you do not need to code the record name of the goal record in the SUBGOAL statement -- it is automatically available for subgoal processing. Also, it does not count as one of the ten record-types allowed, as described above.

B.9.6  The SELECT-COMMAND Statement

SELECT-COMMAND is a multiply occurring element that specifies commands to be executed whenever the subfile is selected. Hence, it can be used to set a COMPXEQ or XEQ subfile for use with the subfile, for example, or to issue informational messages, such as "subfile news", to the user selecting the subfile.

Here is an example of some SELECT-COMMAND statements at the end of a SUBFILE section:

When the subfile is selected,

Then, if you issue the SHOW XEQ command for confirmation:

All commands (of any system) are valid, including XEQ FROM to invoke a protocol. Moreover, because the SELECT-COMMANDs are treated like a protocol, you can use labels and JUMP commands as SELECT-COMMANDs. [See "SPIRES Protocols" for more information about protocols.]

Note that there is a limit of 10 occurrences of SELECT-COMMAND in each ACCOUNTS section of the SUBFILE section. Note also that even if ECHO is set, the "select commands" are not echoed at the user's terminal; only the file owner knows for sure what commands have been issued.

The select commands are executed at the very end of the selection process, after the default format is set (if a FORMAT statement is coded).

B.9.7  The PROGRAM Statement

The file owner can limit the programs in which a user can select the subfile by adding the PROGRAM statement to the subfile section of the file definition. The most common application of this statement is to make a subfile available for selection only in Prism.

The syntax of the statement is:

where "program" is a program chosen from the following list:

 SPIRES   PRISM   SPIBILD   FASTBILD   FOLIO

If the user attempts to select the subfile but is not in one of the programs specified in the PROGRAM statement, an error will occur and the select will fail. (The currently executing program is named in the system variable $PROGRAM.) If desired, the user can issue the EXPLAIN command to get an explanation of the error message.

Of course, if the PROGRAM statement is omitted, the subfile can be selected from any of the programs listed here that are available to the user.

B.9.8  The SUBCODE Statement

The SUBCODE statement in the Subfile section of a file definition allows you to establish a "name" for the particular set of users who have access to the subfile with the specific constraints, secure-switches, etc. defined there. This name is placed in the system variable $SUBCODE, which can be used by Userprocs, protocols and formats.

Here is the syntax of the SUBCODE statement:

where "value" is a string up to 32 characters in length. SPIRES will convert the value to uppercase, so case is not significant. The variable $SUBCODE is set to "value" when the user selects the subfile; the user cannot change the value, though he or she can examine it.

The statement is best considered as a label for the accounts given a particular view of a subfile as defined in the rest of the Subfile section. For example, suppose a file definition has this Subfile section:

If a format, protocol or Userproc needs to determine whether the current user is in the first group or the second, it can check the value of $SUBCODE:

If the user is one of the second set, the above command would cause the subfile log to be displayed. That is considerably cleaner than this:

B.9a  Defining File Access Privileges

By default, certain commands relating to file management can be used only by the file owner. Some of the commands provide information only (SHOW FILE STATUS or SHOW SUBFILE TRANSACTIONS, for example); others can affect the status of the file (SET AUTOGEN, or PROCESS in SPIBILD); still others can completely destroy the file (ZAP FILE). The FILE-PERMITS section, a record-level structure in the file definition, gives the file owner the ability to allow other accounts to issue various sets of these commands. For instance, other accounts may be able to process the file [See B.10.12.] or copy the entire file to their own account, and so forth.

The FILE-PERMITS structure appears at the end of the file definition, following the SUBFILE section. It contains two elements. The first is called FILE-ACCESS and is the key of the structure. The second is a multiply-occurring ACCOUNTS element.

FILE-ACCESS can have one or more of these values: SEE, UPDATE, PROCESS, MASTER and COPY. (If more than one value is used, they must be separated by commas.) The five values are described below:

            SHOW FILE ACTIVITY
            SHOW FILE BLOCK
            SHOW FILE STATUS
            SHOW RECORD OVERHEAD
            SHOW RECORD RECOVERY
            SHOW SUBFILE TRANSACTIONS
            SHOW SUBFILE LOG
            STATUS
            ATTACH
            GENERATE LOAD

The ACCOUNTS statement can specify one or more accounts which are to be given the access specified in the previous FILE-ACCESS statement. The same forms of the account that were valid for the ACCOUNTS statement in the SUBFILE section are valid here (e.g., GA.JNK, GG...., or PUBLIC). [See B.9.2.]

Here is a sample of a FILE-PERMITS structure which would appear at the end of a file definition:

All accounts have SEE access to the file. All accounts in group GG, as well as account GA.SPI, have MASTER access to the file. Two accounts, GG.WCK and GG.RLG, have COPY access to the file.

There are, of course, still some privileges that only the file owner has. Only the file owner can see and update the file definition record in the FILEDEF subfile for the file. Also, only the file owner can issue the COMPILE and RECOMPILE commands. Destroying the file, using the command ZAP FILE, can only be done by the file owner as well.

Note that, unlike subfile access privileges [See B.9.] which take effect as soon as the file definition is added or updated in the FILEDEF subfile, the privileges defined in the FILE-PERMITS structure are compiled in the file definition characteristics, and take effect immediately upon compilation.

B.10  SPIRES File Management

The contents of this chapter have been incorporated into the manual "SPIRES File Management".

"File management" is about how to take care of a SPIRES file that already exists. It is often the next subject after "file definition" in that it covers, for example, how to load data into a subfile (i.e., add multiple records to a subfile at one time), how to add new indexes or "remove" indexes no longer needed, how to ensure that the file is using its storage space most efficiently, etc.

B.11  Logging Database Use in SPIRES

The owner of a SPIRES file has the option of having SPIRES log certain information about the access and use of a database. The file definer can specify the kind of information that is to be logged and the manner in which it is to be stored. This is done with the LOG and STATISTICS statements.

The file definer can also inform SPIRES that use of a database is to be charged for (as distinct from merely logged), supplying the rates which apply for particular uses of the database. In this case, SPIRES will inform a user who attempts to select a chargeable database of the rates that apply. SPIRES will then log those charges incurred during individual users' sessions.

Details about creating and using a log appear in the manual "SPIRES File Management", chapter 4.3. Online, issue the command EXPLAIN FILE LOG.

B.12  Immediate Indexing

By default, updating a subfile does not immediately affect its indexes. Added and updated records, as well as removal requests, are placed in the deferred queue; the indexes are not updated to reflect the deferred queue data until the file is processed in SPIBILD. However, in some applications, immediate indexing of the deferred queue data may be desirable. A file owner may request that one or more indexes of a subfile be updated immediately when a subfile transaction would affect them, meaning that those indexes will always be synchronized with the goal records.

Only two extra steps in the file definition procedure are needed to request immediate indexing. First, the IMMEDIATE statement must be added to the linkage section of each index for which the feature is desired. Second, appropriate ORVYL data set security permits must be set to allow accounts updating the subfile to change the data sets containing the indexes. These steps will be discussed in the next section. [See B.12.1.]

Considered by themselves, the costs of "immediate" indexes are higher than those of "non-immediate" indexes. For example, processing that would occur overnight when rates are cheaper will occur whenever a subfile transaction involving immediate indexing occurs. Also, immediate indexing handles records individually, whereas overnight processing handles multiple records all at one time, which is more efficient. On the other hand, if users must search the deferred queue fairly regularly because the data they need may not be indexed, the benefits of immediate indexing (in terms of both searching costs and user convenience) may outweigh any efficiency disadvantages. Efficiency considerations are discussed in more detail later in this chapter. [See B.12.2.]

It is important to realize that the file owner declares indexes to be "immediate" on an individual basis -- in any given subfile with, say, a dozen indexes, any number from 0 to 12 of them could be immediate indexes, and the rest would not be. Thus, immediate indexing does not mean that a file's deferred queue is unnecessary. Quite to the contrary, SPIRES will still add the subfile transaction to the file's deferred queue, awaiting tree processing by SPIBILD. However, in addition, SPIRES will update the tree copies of any immediate indexes at the time of the subfile transaction. All Global FOR operations against the deferred queue (FOR UPDATES, DISPLAY ALL, for instance) will still work as they did before, since the transactions are still placed in it.
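The division of labor described above can be pictured with a toy Python model. It is a sketch for illustration only: the class and method names are invented, and process() is merely a stand-in for SPIBILD processing. How SPIBILD actually treats indexes that were already updated immediately is not described here; the model simply skips them.

```python
class SubfileModel:
    """Toy model: every transaction is queued; immediate indexes are
    also updated at once, the others only when the file is processed."""

    def __init__(self, index_names, immediate):
        self.deferred_queue = []
        self.tree = {name: [] for name in index_names}  # tree copies
        self.immediate = set(immediate)

    def transact(self, record):
        # The transaction always goes to the deferred queue ...
        self.deferred_queue.append(record)
        # ... and immediate indexes are brought up to date right away.
        for name in self.immediate:
            self.tree[name].append(record)

    def process(self):
        # Stand-in for SPIBILD: update the non-immediate indexes,
        # then empty the deferred queue.
        for record in self.deferred_queue:
            for name, entries in self.tree.items():
                if name not in self.immediate:
                    entries.append(record)
        self.deferred_queue.clear()
```

After a call to transact(), the record is in the deferred queue and in every immediate index, but the remaining indexes do not reflect it until process() runs.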

B.12.1  Coding Immediate Indexes

To request immediate indexing of one or more indexes, the file owner must follow two steps. First, the IMMEDIATE statement must be added to the linkage section of the particular index. Second, appropriate ORVYL data set permits may need to be set.

The IMMEDIATE Statement

The IMMEDIATE statement is placed at the end of the linkage section for a given index. Its syntax is simply

Below is a sample individual linkage section:

The IMMEDIATE statement here applies only to the index record-type ZIN04. An IMMEDIATE statement must be added to the linkage section of each index for which immediate indexing is desired.

Immediate indexing may be coded for any index in any SPIRES file, with the following exception in goal-to-goal passing: if a goal record-type (say, REC01) passes data to an index record-type (REC02) via immediate indexing, that index record-type may not pass the same data to another index record-type (REC03) via immediate indexing. In other words, if REC01 passes data immediately to REC02, REC02 cannot itself pass the same data immediately to any other record-type. The SPIRES compile process may not detect this error, but it will warn you if a record-type is both an immediate index and a record-type that passes data immediately. [See B.12.3 for other information on immediate indexing with goal-to-goal passing.]
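The condition that draws the compiler warning can be checked by hand with a short script. The following Python sketch is an illustration only, not part of SPIRES; the function name and the pair-list representation of the linkage sections are assumptions.

```python
def immediate_chain_warnings(immediate_links):
    """Return record-types that both receive and pass data immediately.

    immediate_links: (source, target) record-type pairs, one for each
        linkage section that carries the IMMEDIATE statement.
    """
    links = list(immediate_links)
    sources = {source for source, _ in links}
    targets = {target for _, target in links}
    # A record-type on both sides is a candidate for the forbidden chain.
    return sorted(sources & targets)
```

For the situation described above, immediate_chain_warnings([("REC01", "REC02"), ("REC02", "REC03")]) reports REC02. Like the compiler warning, the check flags only the candidate; whether the same data is actually being passed on must still be verified by the file owner.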

If you are adding immediate indexing to a file that already exists, you must recompile the file definition after adding the IMMEDIATE statement. The deferred queue of the file must be empty.

In addition to the other ORVYL data sets, a "filename.CKPT" data set will be created when the file definition is compiled (or recompiled, after IMMEDIATE is added). [See B.5.5.] That data set will be used instead of the SYSTEM.CHKPT data set for file processing. [See C.6.24.] (Be aware of the difference in the abbreviations of "checkpoint" in the data set names.)

ORVYL Data Set Permits

When you compile a file definition, SPIRES creates ORVYL data sets in which the file data will be stored. Appropriate permits are set to allow or forbid users to change particular data sets. By default, for example, the DEFQ data set of a file (i.e., the ORVYL data set "ORV.gg.uuu.filename.DEFQ") is set for "public WRITE" access, which lets SPIRES place subfile transactions in the deferred queue on behalf of users with subfile access. All other data sets for the file, such as RES or MSTR, are set for public READ, allowing the data to be read but not changed by other accounts, since all updating activity goes to the deferred queue.

If, however, you define an index as immediate, SPIRES will set public WRITE on all newly created data sets for the file that might be modified during immediate-index activity. Usually that means these data sets:

SPIRES will not change the permits for data sets that already exist. Therefore, if you are recompiling a file definition, or changing an index that already exists into an immediate index, you will need to set the permits properly (i.e., to WRITE access) yourself.

Giving the data sets public WRITE access does not necessarily mean the public can update your SPIRES file, particularly if you have used the SET SPIFILE command in ORVYL. [See C.6.22.] With the SET SPIFILE command, you can completely control who has access to your file by means of the file definition.

However, if limitations of the SET SPIFILE command get in your way, the ORVYL commands SET CLP, SET PERMIT and SET NOCLP may be used to change the permits for those data sets if you want. The procedure to follow for each data set is:

where "data.set.name" is the name of the ORVYL data set whose permits are to be changed. Various forms of "account" are allowed, including "gg.uuu" for a specific account, "GROUP gg" for all members of a group, and PUBLIC.

For example, to allow users in group FF to add records to be immediately indexed to the subfile STAFF in the file FF.JNK.EMPLOYEES, the file owner account would issue the following commands:

And that process would be repeated for other data sets, as described above.

B.12.2  Efficiency Considerations for Immediate Indexes

There are two types of data processing to consider when determining the impact of immediate indexing in regard to efficiency and costs: the maintenance of the data base (in terms of index building from user updates) and the general use of the data base (e.g., for searching). In both regards, immediate indexing incurs additional overhead not incurred by non-immediate indexing. In neither regard, however, is the overhead likely to be overwhelming to the point of discouraging you from using immediate indexes as long as you have good reasons to.

(This discussion is not meant to discourage you from using immediate indexing, but to tell you a