LCLS Controls

SLC-Aware IOC General Functional Requirements
This is a DRAFT!!! Work is in-progress!!!
Quick links:

References

System Description and External Interfaces

General Requirements

SLC Message Service (MSG)

SLC Database Service (DBS)

Cluster Status and Test Service (CSTR)

Timing System PNET Diagnostics (TIME)

Magnet-Like Device Monitor and Control (MGNT)

Beam Synchronous Acquisition and Control (BSAC)

A. Introduction
LCLS Specification #1.2-201, written by the LCLS Controls Group in May 2004, provides background. The first two paragraphs of this specification are reproduced here:
The SLC control system at SLAC is currently used on most of the LINAC. It is the only control system in sectors 20-30, which will be used by the LCLS mostly intact. LCLS will replace all of the BPM electronics in these sectors to provide higher resolution. The Injector for LCLS will use all new control, except for the high power RF components, which are existing SLC klystrons and modulators. The corrector magnets in the LINAC that will be used for LCLS will all have new EPICS based controllers. From the undulator to the experimental stations, all new controls will be done in EPICS. Note that all SLC data from the existing LINAC will be available to the EPICS environment, but the time stamp information that allows data correlation to beam events is not available at the present.

The motivation to implement an SLC aware, EPICS IOC, is to allow the new elements of the LCLS control system to use EPICS, while still taking advantage of the high level applications on the SLC control systems. These high level applications include correlation plots, energy management, beam steering, beam based alignment, emittance measurements, and slow feedback.

B. System Description and External Interfaces
An IOC (Input/Output Controller) consists of a set of tasks on a single board computer (SBC) or a single process with a set of threads on a Unix workstation. In this document, "task" means either an SBC task or a process thread. A "normal" IOC, like those in the PEPII project, has just EPICS and CMLOG tasks. An "SLC-aware" IOC (aka "SLC IOC" or "IOC micro") adds groups of tasks to the normal IOC to interface with the SLC control system. Figure B-1 is a block diagram showing the SLC task groups (aka "jobs" or "services"), drawn within the dashed line, and the interfaces with external tasks and resources.

Figure B-1: System Block Diagram and External Interfaces
B.1 SLC IOC Description
A brief description of each SLC task group is provided here. Specific requirements are listed in later sections:

SLC Executive
The SLC executive is started after both CMLOG initialization and EPICS iocInit during IOC startup. It exists through the lifetime of the IOC and is responsible for starting and stopping all other SLC tasks in the proper order.

SLC Message Service (MSG)
The IOC SLC message service accepts a subset of Alpha SLC requests that are structurally the same as those sent to SLC micros. These requests are queued for other tasks to process. If necessary, those other tasks then queue replies to the requests, each normally consisting of status and data (if applicable), which the SLC message service sends back to the requesting Alpha process.

SLC Database Service (DBS)
On startup, the SLC database service downloads the IOC's part of the SLC database from the Alpha Database Executor (DBEX) process and creates it locally on the IOC exactly like the SLC micro. This service then accepts new set points (supertype 2 data) from DBEX and updates the local SLC database, sending acknowledgments to DBEX when required. Other SLC tasks update readbacks (supertype 3 data) and set points changed by EPICS in the local SLC database and queue update requests. The SLC database service acts on these update requests by sending the data on to DBEX and waiting on acknowledgment.

Cluster Status and Test Service (CSTR)
This service periodically updates CSTR health and status secondaries (supertype 3 data) in the SLC database based on statistics updated by other SLC tasks on the IOC. It updates memory and CPU usage in the SLC database with values read from associated EPICS PVs. It also processes TEST requests from the Alpha, most importantly, the existence test request from the Alpha PARANOIA process.

Timing System Diagnostics (TIME)
This task responds to an Alpha request for timing system PNET history.

B.2 SLC IOC External Interfaces
External interfaces include the following:

Alpha SLC Message Service used by SCPs and standalones. Requests and replies are transferred using TCP/IP. Messages are transferred through a proxy task running on Linux which funnels and thus minimizes connections in both directions.

Alpha DBEX process for downloading the database and keeping both sides in sync. TCP/IP messages are transferred through the proxy.

NFS server which contains the PRIMARY.MAP file created during Alpha database generation (DBGEN) and copied to production by Alpha database installation (DBINSTALL). The SLC database service reads this file during startup for database definition.

CMLOG client consisting of an IOC message handler task, which receives cmlog message requests queued by the EPICS errlog task and all SLC IOC tasks, and a daemon which receives messages from the handler and transfers them over the network to the cmlogServer.

IOC shell, which accesses the SLC shared resources and database, allowing console users to view diagnostics and perform a limited set of commands.

EPICS resources and EPICS database accessed by the device control, gated ADC acquisition, timing diagnostics, and cluster status SLC tasks to update new setpoint values, activate EPICS commands as a result of SLC requests, and periodically get readback values and setpoint values changed by EPICS.

B.3 Unsupported SLC Functions
Currently, there are no plans to support the following SLC interfaces. These functions are either not needed or replaced by pure EPICS:

Fast Feedback

Timing System (except PNET diagnostics)

MPS Module Configuration

MPS Algorithm Processing and Rate Control

BitBus Power Supply Control

KISnet Communication

Micro-to-Micro Communication via Alpha

Crate Monitoring

Analog Signal Monitoring

Digital Input/Output

Klystron

SLC-Style Error Logging

Direct Hardware Access (i.e., camcom)

Debugging from VMS

Video Interface (i.e., catcart)

Other specialized SLC-style diagnostics

Multibus Exerciser

C. General Requirements

C.1 SLC and EPICS

SLC tasks do NO hardware access, control sequences, or calculations. EPICS does this work.

There is no requirement to transfer high-level application data (supertype 4), used by VMS applications and not currently downloaded to SLC micros, to and from the EPICS database.

There is no requirement to keep static, rarely changed data (supertype 1) in sync between the EPICS and SLC databases. Supertype 1 data may be changed on-the-fly in the IOC SLC database by the IOC shell. Users may want to manually change supertype 1 data after an Alpha DBEDIT to avoid an IOC restart.

Requirements for the creation of EPICS and SLC database files will be described elsewhere.

All SLC tasks are lower in priority than the highest priority EPICS tasks. Some SLC tasks are higher in priority than the channel access server. TODO: A table with EPICS and SLC functions and their relative priorities is needed here.

SLC IOCs can be used for all EPICS projects (not just LCLS).

The number of SLC IOCs to be added to the existing system is not yet known. If the number is too great, then the Alpha "micro active mask" and the maximum number of micros will need to be increased. This will require significant changes to the existing Alpha software.

C.2 Operating System

SLC IOCs run on RTEMS, VxWorks, Linux, Solaris, and possibly Macs (for LCLS only).

RTEMS or VxWorks SLC IOCs run on any supported 32-bit microprocessor.

64-bit microprocessors are NOT supported.

Only IEEE floating point formats are supported.

The CMLOG client is currently available for VxWorks, Linux, and Solaris. It must be ported to RTEMS and Macs, if necessary.

Messages and data from the VMS control system are little-endian and in VMS formats. SLC tasks convert to or from little-endian/VMS where necessary (words, floating point, and time).
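
The word conversion above can be sketched as follows. This is a minimal illustration, not the SLC IOC implementation: the function names are hypothetical, and only integer words are shown (floating point and time need their own conversions). Assembling the value byte by byte gives the same result on big-endian and little-endian hosts, so no platform-specific code is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: VMS data is little-endian, so byte 0 is the
 * least significant byte regardless of the host byte order. */
uint16_t vms_get_u16(const unsigned char *buf)
{
    assert(buf != 0);
    return (uint16_t)(buf[0] | (buf[1] << 8));
}

uint32_t vms_get_u32(const unsigned char *buf)
{
    assert(buf != 0);
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Write a native 32-bit word into a buffer in VMS (little-endian) order. */
void vms_put_u32(unsigned char *buf, uint32_t value)
{
    assert(buf != 0);
    buf[0] = (unsigned char)(value & 0xFFu);
    buf[1] = (unsigned char)((value >> 8) & 0xFFu);
    buf[2] = (unsigned char)((value >> 16) & 0xFFu);
    buf[3] = (unsigned char)((value >> 24) & 0xFFu);
}
```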

There is no requirement that the timestamp on the IOC be synchronized with the timestamp on the Alpha. Any timestamp set by the Alpha in a request or in the database is not used in any calculation (i.e., delta time) with an IOC timestamp.

Conversion from a GMT EPICS timestamp to a local VMS timestamp is required. Timestamp conversion from VMS to EPICS is not required.
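
A sketch of the EPICS-to-VMS conversion, under stated assumptions. EPICS timestamps count seconds and nanoseconds from 1-Jan-1990 00:00 UTC, and VMS time is a 64-bit count of 100 ns ticks from 17-Nov-1858, which is 47892 days earlier. The function name is hypothetical, and the local zone offset is passed in by the caller rather than looked up; how the IOC obtains that offset (including daylight saving time) is not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define EPICS_EPOCH_MJD   47892       /* days from 17-Nov-1858 to 1-Jan-1990 */
#define SECONDS_PER_DAY   86400
#define VMS_TICKS_PER_SEC 10000000    /* VMS time counts 100 ns ticks */

/* Hypothetical helper. sec_past_epoch and nsec mirror the two fields of an
 * EPICS timestamp; utc_offset_sec is the local zone offset in seconds
 * (for example -8*3600 for Pacific Standard Time). */
int64_t epics_to_vms_local(uint32_t sec_past_epoch, uint32_t nsec,
                           int32_t utc_offset_sec)
{
    int64_t secs;

    assert(nsec < 1000000000u);
    secs = (int64_t)EPICS_EPOCH_MJD * SECONDS_PER_DAY
         + (int64_t)sec_past_epoch
         + (int64_t)utc_offset_sec;
    return secs * VMS_TICKS_PER_SEC + (int64_t)(nsec / 100u);
}
```

Nanoseconds are truncated to the 100 ns VMS resolution. The reverse conversion is omitted because it is not required.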

Unlike the Alpha, daylight saving time on the IOC is automatic. Any local VMS timestamp set by the IOC in a reply or in the database is an hour different from the Alpha until the Alpha is manually adjusted later on Sunday. It is expected that functionality will not be impacted.

C.3 Proxy

Any SLC task maintaining a connection with the proxy automatically reconnects to the proxy when the proxy is restarted. The reconnection is attempted every 10 seconds. There is no need for SLC task restart ("IPL") unless the Alpha PARANOIA process sets the IOC "offline" due to either the proxy being down too long or PARANOIA taking too long to reconnect to the proxy.

When the proxy is restarted, it takes minutes for Alpha processes, particularly DBEX and PARANOIA, to reconnect to the new proxy. If the IOC is restarted before DBEX and PARANOIA reconnect to the proxy, the restart fails and another manual restart is required later. There is no requirement for the IOC to wait for Alpha processes to reconnect to the proxy before resuming a restart.

Currently, the proxy allows only one SLC IOC per machine. There is no requirement to run multiple IOCs on one Linux or Solaris machine. If this requirement changes, only the proxy will need to be changed.

Development SLC IOCs connect to the development proxy and production SLC IOCs connect to the production proxy.

Currently, there is one proxy task to handle ALL production SLC micros and IOCs. If that proxy cannot handle both the PEPII and LCLS traffic or if there is a need to do a "realm-split" to provide independence of projects or machine regions, multiple proxies will be allowed. This will require significant changes to the existing Alpha message service.

C.4 Startup and Shutdown

A failure of the SLC executive to start does not prevent EPICS and CMLOG from running normally.

Environment variables are used to determine whether an SLC IOC runs on the development or production system. There is no requirement that the production or development mode be changeable on-the-fly.

Environment variables are set during IOC startup to define:

Micro name

Location of the PRIMARY.MAP file

Proxy host

CMLOG server host and port

CDEV tag table

Allow dbedit of supertype 1 data

It is a design goal but not a requirement that the SLC executive be able to stop and restart the SLC tasks on user request without requiring an entire IOC restart or halt. Starting and stopping are achieved by IPL and reset from the SCP, IOC shell request, and, optionally, channel access. Channel access security delivered with EPICS base is used to limit who can stop and restart using channel access. There is no requirement to restart the SLC tasks when requested by the broadcast from the PNET module which is generated by the MPG.

Before stopping SLC tasks, the SLC executive waits up to forever for any SLC-related IOC shell command to complete. It then commands the SLC tasks to exit and waits up to forever until all tasks are gone. Any failure in the SLC executive is considered fatal and an IOC restart is required to recover.

All SLC tasks respond to a command from the SLC executive to perform a normal exit in a timely manner. Right before exiting, all tasks set a stop flag for the SLC executive.

While SLC is stopped, all SLC-related IOC shell commands, except restart, output an error message to the console and do nothing more.

C.4 Resource Management

Resources allocated by SLC tasks are used only by SLC tasks with the exception of the EPICS IOC shell. Shared resources are properly protected.

On shutdown, each SLC task frees any EPICS or SLC resource it has taken and deallocates any resource it allocated.

If SLC restarts are implemented without requiring an IOC reboot, then to prevent memory fragmentation on single board computers, memory management using memory pools is required.

Any failure to allocate a resource results in a fatal error message and task exit or preferably suspension. An IOC restart is required to recover.

C.5 Message Logging

All messages have the following tags:

formatted message text

formatted condition code string

integer condition code with optional log-only bit

condition severity string

micro name

task name

destination SCP ID

Optional tags include the file name and line number where the error occurred. All messages that are logged by EPICS utilities called by SLC tasks have less informative values for the condition code and severity tags and do not include the destination SCP ID tag.

All messages that are logged as a result of a request from a SCP are displayed on that SCP only. Any task handling an SLC request copies the source SCP ID from the SLC request header to the destination SCP ID used by logging and restores it back to "V017" after SLC request processing is complete.

All messages that are not a result of a request from a SCP have a destination of "V017" and thus are displayed on all SCPs.
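
The destination handling described above can be sketched as a small per-task logging context. This is an illustration only: the type and function names are hypothetical, and SCP IDs are assumed to be four characters long, as "V017" is.

```c
#include <assert.h>
#include <string.h>

#define SCP_ID_LEN   4
#define SCP_ALL_SCPS "V017"   /* destination displayed on all SCPs */

/* Hypothetical per-task logging context. */
typedef struct {
    char dest_scp[SCP_ID_LEN + 1];
} log_ctx_t;

/* While a request is being handled, route messages to the requesting SCP. */
void log_dest_from_request(log_ctx_t *ctx, const char *source_scp)
{
    assert(ctx != NULL && source_scp != NULL);
    memcpy(ctx->dest_scp, source_scp, SCP_ID_LEN);
    ctx->dest_scp[SCP_ID_LEN] = '\0';
}

/* After request processing, restore the default destination. */
void log_dest_reset(log_ctx_t *ctx)
{
    assert(ctx != NULL);
    memcpy(ctx->dest_scp, SCP_ALL_SCPS, SCP_ID_LEN + 1);
}
```

Keeping the destination in a per-task context, rather than in one global, keeps a request handled by one task from redirecting messages logged by another.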

Any message logged by an EPICS base utility called by an SLC task does not appear on any SCP since the destination tag is not provided.

Noisy messages are throttled using SLAC-style throttling. Each task performs throttling for its own messages. A throttling table is used by all tasks to access throttling parameters. New conditions are added or removed from the throttling table by either the tasks themselves or on-the-fly by the IOC shell or, optionally, channel access.

An existing SLC message code is used whenever the message is conceptually the same as the one used by the SLC micro or if it is expected in a reply message back to the Alpha. The text of the message may be altered to use IOC/OSI instead of micro/RMX language.

When no existing SLC message code is obviously available, new message codes are added to the existing VMS message definition files. The number of new codes is minimized.

Development SLC IOCs send messages only to the development cmlogServer and production SLC IOCs send messages only to the production cmlogServer.

C.6 Diagnostics

Each task keeps diagnostics and allows access to the diagnostics from the IOC shell.

It is a design goal but not a requirement that a subset of the diagnostics is made available to channel access via EPICS records.

EPICS EDM displays allow display and control of IOC health including:

IOC Reboot

SLC Restart

SLC Stop

Diagnostic data reset for some tasks (design goal)

Subset of diagnostic data display per task (design goal)

EPICS IOC diagnostics including CPU, file descriptor, and memory usage

C.7 Software Development

SLC IOCs mimic the SLC micros (in action and timing) to minimize changes in the VMS control system software.

SLC IOC source is maintained like an EPICS package and uses EPICS-style Makefiles.

An EPICS IOC application that needs the SLC interface includes the SLC libraries in its Makefile and adds the SLC environment variables and startup command to its startup file.

To make it easier to port the existing SLC micro code where necessary, all source is C instead of C++.

All source follows the LCLS C Coding Standards.

SLC tasks use EPICS operating-system-independent (OSI) functions to minimize platform-dependent source.

In addition to OSI functions, SLC tasks may use well-designed EPICS utilities. Any utility that allocates a resource must provide for resource deallocation.

To access the EPICS database, SLC tasks use runtime database access instead of channel access for efficiency.

SLC source is kept in AFS. The master copy is kept in the EPICS CVS repository.

The number of include files that are FTPed from VMS to AFS is minimized.

When both EPICS and SLC provide similar typedefs, defines, or macros, the EPICS ones are used.

C.8 LCLS-Specific Requirements

Full integration testing of the SLC IOCs on VxWorks and Solaris can be done later since these platforms are not used by LCLS.

Full integration testing of the SLC IOCs on RTEMS non-PPC platforms can be done later since LCLS uses only PPCs.

LCLS will import the master copy of SLC code into the LCLS CVS repository and keep it up-to-date when allowed by the maintenance schedule.

The LCLS NFS server will probably be different from the MCC NFS server. Some mechanism will be needed to copy the PRIMARY.MAP file from the MCC NFS server to the LCLS NFS server when it changes.

It is possible that for LCLS production, a separate LCLS-dedicated cmlogServer and cmlog browser that forwards messages to the Alpha (fwdBro) are required. These processes will probably run on Linux instead of Solaris. This will require changes to the existing production CMLOG setup and possibly some changes to the Alpha and fwdBro.

The GNU suite of tools is used for all builds.

D. SLC Message Service (MSG)

D.1 Accept and Queue Request Messages

Accept all SLC message service requests coming from the Alpha via the proxy.

Like the SLC micros, no data integrity (CRC) check is done on the incoming requests.

Requests are no larger than 2K.

In general, requests are expected to arrive at 1 Hz or less, with a possibility of 10 Hz during short periods of high activity.

Convert request header from VMS to native format. The request data is left in VMS format to be converted by the task that processes the request.

Copy the source SCP ID from the request header to the destination SCP ID used by message logging. Reset after the request is queued.

Override the timestamp in the request header that was set by the Alpha with the current local VMS time to be used by the task that processes the request for diagnostic purposes.

Queue each SLC request in the appropriate job queue based on service code (TEST, TIME, MGNT, BSAC) which is a part of the function code. Drop any request with an unsupported service code. Queue each request as it arrives (first come, first queued).
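
A minimal sketch of the per-service queuing, assuming the service code has already been extracted from the function code. The enum values, queue depth, and function names are hypothetical, and the locking a multi-task IOC would need around the queue is left out.

```c
#include <assert.h>
#include <stddef.h>

#define REQ_QUEUE_DEPTH 16   /* hypothetical depth */

typedef enum { SVC_TEST, SVC_TIME, SVC_MGNT, SVC_BSAC, SVC_COUNT } service_t;

typedef enum {
    DISPATCH_OK,
    DISPATCH_BAD_SERVICE,    /* request dropped: unsupported service code */
    DISPATCH_QUEUE_FULL      /* request dropped: job queue is full */
} dispatch_status_t;

typedef struct {
    const void *items[REQ_QUEUE_DEPTH];
    size_t head;             /* index of the oldest queued request */
    size_t count;            /* number of queued requests */
} req_queue_t;

/* Append a request to the job queue for its service (first come, first queued). */
dispatch_status_t msg_dispatch(req_queue_t queues[SVC_COUNT], int service,
                               const void *request)
{
    req_queue_t *q;

    assert(queues != NULL && request != NULL);
    if (service < 0 || service >= SVC_COUNT)
        return DISPATCH_BAD_SERVICE;
    q = &queues[service];
    if (q->count == REQ_QUEUE_DEPTH)
        return DISPATCH_QUEUE_FULL;
    q->items[(q->head + q->count) % REQ_QUEUE_DEPTH] = request;
    q->count++;
    return DISPATCH_OK;
}

/* Remove and return the oldest request, or NULL if the queue is empty. */
const void *req_queue_take(req_queue_t *q)
{
    const void *request;

    assert(q != NULL);
    if (q->count == 0)
        return NULL;
    request = q->items[q->head];
    q->head = (q->head + 1) % REQ_QUEUE_DEPTH;
    q->count--;
    return request;
}
```

The two failure statuses correspond to the "Invalid service code" and "Queue full" conditions that this service logs.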

Accepting and queuing requests is a lower priority function, compared to handling and replying to requests already in the queue.

Log the following conditions:

Invalid request header

Invalid service code

Unable to queue a request

Queue full

Update the following diagnostics, reset some on-demand:

Total number of startups

Total number of shutdowns

Total number of requests received

Total number of requests dropped due to error

Maximum and average request size

Time of last request

Total number of times the proxy connection is established

Current proxy connection status

Time of last request for each service (TBD)

Maximum and average request size per service (TBD)

D.2 Send Reply Messages

Read a reply from a queue (first-come, first-served) and send it to the Alpha via the proxy. This queue is filled by other SLC tasks that formulate a reply after processing a request.

Normal replies are no larger than 8K. Large replies (i.e., from the gated ADC acquisition) are no larger than 128K.

For replies greater than 8K, the reply is sent 8K at a time using the same mechanism as the SLC micros, where each chunk of the reply is sent with a special header that is used by the Alpha message service to put the reply back together.
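
The chunk arithmetic can be sketched as follows. The descriptor type and function names are hypothetical and do not reproduce the special header used by the SLC micros. The sketch also assumes the 8K limit applies to reply data only; whether the header counts toward it is not stated here.

```c
#include <assert.h>
#include <stddef.h>

#define REPLY_CHUNK_MAX (8u * 1024u)     /* largest piece sent at once */
#define REPLY_MAX       (128u * 1024u)   /* largest reply supported */

/* Hypothetical descriptor for one piece of a large reply. */
typedef struct {
    unsigned index;    /* 0-based position of this chunk */
    unsigned total;    /* number of chunks in the reply */
    size_t   offset;   /* byte offset of the chunk within the reply data */
    size_t   length;   /* number of data bytes in the chunk */
} reply_chunk_t;

/* Number of messages needed; a reply with no data still takes one message. */
unsigned reply_chunk_count(size_t reply_len)
{
    assert(reply_len <= REPLY_MAX);
    if (reply_len == 0)
        return 1;
    return (unsigned)((reply_len + REPLY_CHUNK_MAX - 1) / REPLY_CHUNK_MAX);
}

/* Fill in the descriptor for chunk 'index'; returns -1 if out of range. */
int reply_chunk_get(size_t reply_len, unsigned index, reply_chunk_t *out)
{
    unsigned total = reply_chunk_count(reply_len);

    assert(out != NULL);
    if (index >= total)
        return -1;
    out->index = index;
    out->total = total;
    out->offset = (size_t)index * REPLY_CHUNK_MAX;
    out->length = reply_len - out->offset;
    if (out->length > REPLY_CHUNK_MAX)
        out->length = REPLY_CHUNK_MAX;
    return 0;
}
```

With these limits, a maximum-size 128K reply is sent as 16 chunks of 8K each.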

If the connection to the proxy is down, the replies wait in the queue for the connection to be reestablished.

Copy the destination SCP ID from the reply header to the destination SCP ID used by message logging. Reset after reply is sent.

Set the current local VMS timestamp in the reply header.

Set reply bit in the function code in the reply header.

Convert reply header from native format to VMS format first.

Replying to requests is a higher priority function, compared to handling and accepting new requests.

Log the following conditions:

Invalid reply header

Unable to send a reply

Update the following diagnostics, reset some on-demand:

Total number of replies sent

Total number of replies that could not be sent

Maximum and average reply size

Total number of times the proxy connection is established

Current proxy connection status

Maximum and average delta time between the time of the request and the time of the reply

Time of last reply

D.3 Process Requests with MSG Function Codes

Take a request with the MSG function code out of the MSG job queue, first come, first served.

Copy the source SCP ID from the request header to the destination SCP ID used by message logging. Reset after reply is queued.

Handle the following function codes:

Function Code | Action

MSG_IOC_SLCNOTIFY | Reply with status first. Send request to SLC executive to either restart or stop all SLC tasks depending on the request data.

IOC_STOP | No reply. Exit gracefully.

Copy the source and destination from the request header to destination and source, respectively, in the reply header. Copy the VMS timestamp and function code from the request to the reply header. Set the proper data length in the reply header.
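
A sketch of the reply header setup, using a hypothetical header layout; the real SLC message header has its own field names, sizes, and order. The reply bit in the function code is not set here because that is done by the send step (section D.2).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSG_NAME_LEN 4

/* Hypothetical header layout, in native format. */
typedef struct {
    char     source[MSG_NAME_LEN];
    char     dest[MSG_NAME_LEN];
    int64_t  vms_time;    /* VMS timestamp */
    uint16_t func;        /* function code */
    uint32_t data_len;    /* number of data bytes that follow the header */
} msg_header_t;

/* Build a reply header from the request header it answers. */
void reply_header_init(msg_header_t *reply, const msg_header_t *request,
                       uint32_t data_len)
{
    assert(reply != NULL && request != NULL && reply != request);
    memcpy(reply->source, request->dest, MSG_NAME_LEN);   /* swap ends */
    memcpy(reply->dest, request->source, MSG_NAME_LEN);
    reply->vms_time = request->vms_time;
    reply->func = request->func;
    reply->data_len = data_len;
}
```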

Queue reply in the message service reply queue.

Log the following conditions:

Invalid function code

Unable to queue a reply

Queue full

Update the following diagnostics, reset on-demand:

Total number of requests processed for each function code

Total number of replies

Total number of replies dropped due to error

D.4 Message Utilities
Functions used by more than one task include:

Take an SLC request out of a thread receive queue.

Queue an SLC reply into the message service reply queue.

Log a message with all the appropriate CMLOG tags.

Convert each primitive data type between VMS and native formats.

Convert GMT EPICS time to VMS local time.

Get current GMT EPICS time and convert to VMS local time.

Find difference between two local VMS timestamps.

Proxy connection handling and sending and receiving messages (shared with the SLC database service).

E. SLC Database Service (DBS)

E.1 Download and Create Database at Startup

At startup, transfer messages with DBEX via the proxy to download the IOC's part of the SLC database from the Alpha to the IOC. The message structures are identical to those transferred between DBEX and the SLC micros. All downloaded "pieces" must arrive in a known sequence. The database is put into memory accessible by all tasks.

All messages transferred between the SLC database service and DBEX are no greater than 8K.

Convert offsets into the database (supertype 0 data) from VMS to native format. Leave the rest of the database in VMS format.

Read the PRIMARY.MAP ASCII file and create a character-based hash table to hold information about each primary and about each secondary of each primary. Information includes primary name and number, secondary name and number, data type, data width, and array size ("V" if variable).

If the database and file are loaded without error, set a database-exists flag used by other tasks. Reset the flag if any error. Set event that the database download is complete to release tasks waiting on this event.

Create and initialize the async function tables as described in section F under Async Utilities.

Log the following conditions:

Any database download or timeout error

Problem with offset conversion

PRIMARY.MAP read or format error

Error creating and populating hash table

Database service successful startup

Database service shutdown

E.2 Accept and Process Database Changes and Up/Down Requests from DBEX

Receive setpoint (supertype 2) database updates or up/down requests from DBEX via the proxy. The supertype 2 update request is structurally identical to that sent by DBEX to the SLC micros.

Updates from DBEX are expected to arrive at much less than 1 Hz, with a possibility of up to 10 Hz during short periods of high activity.

When a data update request is received, update the local SLC database. Other tasks that need to do database access wait up to forever while the update occurs.

If an up/down request is received, set or reset a flag that indicates DBEX is available. If DBEX is now available, save the database major/minor ID from the request and use it in checking for valid database access requests.

If DBEX is now available, add a request per job to the SLC database send queue to send any outstanding data updates for that job to DBEX.

Send acknowledgment back to DBEX via the proxy if requested.

Processing database changes is a higher priority function, compared to handling and accepting new SLC message requests.

Log the following conditions:

DBEX state change.

Invalid request from DBEX

Unable to send acknowledgment

Update the following diagnostics, reset some on-demand:

Total number of data updates received

Total number of data updates dropped due to error

Time of last data update

Total number of times that DBEX was flagged as down

Current DBEX state

Database major/minor ID

Total number of times the proxy connection is established

Current proxy connection status

E.3 Send IOC-Generated Database Changes to DBEX

The SLC database service reads a database update request from a queue (first-come, first-served). Note that when an SLC task updates the SLC database using dblput (see Database Utilities section below), a per-job update table which keeps track of what needs to be sent to DBEX is updated. The mechanism used to provide an efficient way to send updates to DBEX is exactly the same as that used on the SLC micro and involves "high/low water marks". After the task is done with all its updates, it adds an update request with job ID to the SLC database send queue and waits for an acknowledgement.
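
The "high/low water marks" are not defined in this document. One common reading, assumed here purely for illustration, is that each entry of the job update table records the lowest and highest modified offsets, so that a single contiguous range is sent to DBEX instead of every individual change. All names below are hypothetical, and the SLC micro mechanism may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical update table entry for one region of a job's data. */
typedef struct {
    size_t low;     /* lowest modified offset (inclusive) */
    size_t high;    /* one past the highest modified byte */
    int    dirty;   /* nonzero when there is something to send */
} update_marks_t;

/* Record that 'length' bytes starting at 'offset' were written. */
void marks_note_write(update_marks_t *m, size_t offset, size_t length)
{
    assert(m != NULL && length > 0);
    if (!m->dirty) {
        m->low = offset;
        m->high = offset + length;
        m->dirty = 1;
        return;
    }
    if (offset < m->low)
        m->low = offset;
    if (offset + length > m->high)
        m->high = offset + length;
}

/* Clear the marks once DBEX has acknowledged the range. */
void marks_clear(update_marks_t *m)
{
    assert(m != NULL);
    m->low = 0;
    m->high = 0;
    m->dirty = 0;
}
```

Under this reading, unchanged bytes between two modified values are sent along with them, in exchange for constant-size bookkeeping per table entry.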

Update requests of readbacks (supertype 3 data) are done about once every 5 to 60 seconds. Update requests of setpoints are done at much less than 1 Hz, with a possibility of up to 10 Hz during short periods of high activity.

An update request is created for the job based on the contents of the job update table and the local SLC database.

The request is sent to DBEX via the proxy and includes a request for acknowledgement using a unique code that DBEX must provide in its acknowledgment. This code is incremented with every update request that is sent and rolls over at 256.
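
A sketch of the acknowledgment code, assuming it fits in one byte so that it takes the values 0 through 255 and returns to 0 on the 256th increment. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-service sequence state. */
typedef struct {
    uint8_t next_code;
} ack_seq_t;

/* Return the code for the next update request and advance the counter. */
uint8_t ack_seq_next(ack_seq_t *seq)
{
    uint8_t code;

    assert(seq != 0);
    code = seq->next_code;
    seq->next_code = (uint8_t)(seq->next_code + 1u);   /* 255 + 1 wraps to 0 */
    return code;
}

/* An acknowledgment is valid only if it echoes the code that was sent. */
int ack_code_valid(uint8_t expected, uint8_t received)
{
    return expected == received;
}
```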

Until the update request is either acknowledged or an acknowledge timeout occurs, no further updates are processed and newer update requests remain in the queue. No other task can update the database for the associated job after the request is created and before acknowledgment is received. Double-buffering of the job update table relaxes this restriction.

If the connection to the proxy is down, all database update requests are dropped (the queue does NOT fill up).

When the connection to the proxy comes back, updates for all jobs are done automatically and immediately.

Sending updates to DBEX is a higher priority function, compared to handling and accepting new SLC message requests and replies.

Log the following conditions:

Unable to send a request to DBEX

Update the following diagnostics, reset on-demand:

Total number of updates sent

Total number of updates dropped due to error

Time of last update (per job?)

Total number of times the proxy connection is established

Current proxy connection status

E.4 Wait for and Process Acknowledgments of Database Updates from DBEX

A wait for valid acknowledgments from DBEX via the proxy is done with a timeout of 15 seconds or less. If DBEX is flagged as down, a shorter timeout is used. Acknowledgments are received for each piece of the job update table.

Any invalid acknowledgment received during this time is dropped. The wait then continues with whatever time is left. To be valid, the code in the acknowledgment must match the expected code in the update request that was sent.
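
The key point above is that an invalid acknowledgment does not restart the timeout. The sketch below shows that bookkeeping in isolation: instead of reading from the proxy connection, it replays a list of arrivals that is assumed to be sorted by time. The types and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACK_TIMEOUT_MS 15000u   /* upper bound on the wait */

/* Hypothetical record of one acknowledgment received from DBEX. */
typedef struct {
    unsigned arrival_ms;   /* time since the update request was sent */
    uint8_t  code;         /* code echoed by DBEX */
} ack_event_t;

/* Returns 1 if a valid acknowledgment arrives before the deadline and
 * 0 on timeout. *n_invalid counts acknowledgments dropped on the way. */
int ack_wait(const ack_event_t *events, size_t n_events, uint8_t expected,
             unsigned timeout_ms, unsigned *n_invalid)
{
    size_t i;

    assert(n_invalid != NULL);
    assert(events != NULL || n_events == 0);
    *n_invalid = 0;
    for (i = 0; i < n_events; i++) {
        if (events[i].arrival_ms >= timeout_ms)
            break;               /* the deadline passed first */
        if (events[i].code == expected)
            return 1;            /* valid acknowledgment */
        (*n_invalid)++;          /* drop it; the deadline is unchanged */
    }
    return 0;
}
```

In a real task the loop would block on the connection with a timeout equal to the time remaining until the fixed deadline, recomputed after each dropped acknowledgment.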

If DBEX is flagged as down and ANY acknowledgment (valid or invalid) is received, flag DBEX as available.

If a valid acknowledgment is received, the corresponding piece of the job update table is cleared. If there is any error, the pieces of the job update table that have not been acknowledged are not cleared and the next time an update request is done, data from the previously unsuccessful updates are included. If a database service shutdown is requested, the wait is aborted. In all cases, a flag is set so that the task that asked for the update and is waiting on the flag is free to continue.

Any task that needs to update the database, and thus the job update table, while the send and wait cycle is active for that job waits up to forever until the cycle finishes.

Processing acknowledgments from DBEX is the highest priority function of all SLC tasks.

Log the following conditions:

DBEX state change.

Timeout waiting for acknowledgment

Invalid acknowledgment received

Update the following diagnostics, reset on-demand:

Total number of acknowledgments received

Total number of invalid acknowledgments received

Total number of timeouts

Time of last acknowledgment

Current DBEX state

Total number of times the proxy connection is established

Current proxy connection status

Maximum and average delta time between the time of the send and the time of the receipt of a valid acknowledgment

E.5 Database Utilities

Create database list. Update the input pointer list for input ASCII primary, unit, and secondary, allocating memory as needed. Unit may be "ALL*". If the database does not exist and is being created, first wait up to forever for the creation to finish. Update the database version in the list header. If the database version is already there, check that it is valid.

Get units. Update the input data list with ASCII unit names for the input primary, allocating memory as needed. Same wait requirement as for dblist.

Get data. If the database exists, get values from the database using the input pointer list, convert from VMS to native format, and update the input data list, allocating memory as needed. If another task is writing to the database, wait up to forever until it is finished. Check for a valid database version in the list header.

Put data. If the database exists, convert the values in the input data list from native to VMS format and write them into the database using the input pointer list. If another task is reading or writing to the database, wait up to forever until it is done. If the data is supertype 2 or 3, update the job update table that keeps track of what needs to be sent to DBEX. If a wait for a DBEX acknowledgment of a previous update for the job is in process, wait up to forever until the previous update is done. Check for a valid database version in the list header.

Update the Alpha database. If the database exists, add an update request to the SLC database service send queue for the input job to update DBEX now. Wait up to forever for acknowledgment. The input job name may be "ALL*".

Allocate and free pointer and data lists.

Convert integer unit to ASCII name.

Check if the database is available.

Return the native data type for an input primary and secondary name as a single character (R,I,Z,S,T).

Return the native data width for an input primary and secondary name in number of bytes.

Return the number of elements for an input primary, unit, and secondary.

Get the format, width, and count for an input primary, unit, and secondary.

Return the database version.

E.6 IOC Shell Interface

Writes value(s) for the input primary, optional unit, and optional secondary to the console.

Writes units to the console for an input primary name.

Update the local SLC database for input value(s) for an input primary, unit, and secondary. Log the change. Allow supertype 1 data to be edited only if an environment variable is set.

Write hash table created from PRIMARY.MAP to the console.

Update the Alpha database for one or all jobs.

F. Cluster Status and Test Service

F.1 Process Requests with TEST Function Codes

During initialization, send an "I'm alive" unsolicited message to the Alpha PARANOIA process (V016) so that PARANOIA turns the micro online in the CSTR STAT secondary and begins existence check polling. Log a "Started and at your service" message.

During exit, log a "Stopped and out of service" message.

Take a request with the TEST function code out of the TEST job queue, first come, first served.

Copy the source SCP ID from the request header to the destination SCP ID used by message logging. Reset after reply is queued.

Handle the following function codes:

Function Code	Action
FUNC_TEST	Existence check from PARANOIA. Reply with data exactly as sent.
TEST_ECHO	Reply with data exactly as sent.
TEST_ECHO_MWORD	Reply with blocks of repetitions for given word.
TEST_ERR_METER_RESET	Reset cmlog throttling (TBD). Reply with the throttling reset status.
IOC_STOP	No reply. Exit gracefully.

Copy the source and destination from the request header to destination and source, respectively, in the reply header. Copy the VMS timestamp and function code from the request to the reply header. Set the proper data length in the reply header.

Perform a CHK1 function as detailed in the next section.

Update statistics for the CHK1 function in the async function tables. More detail on this step is provided in the Async Utilities section.

Queue reply in the message service reply queue.

Send a request to update the Alpha database. More detail on this step is provided in the Async Utilities section.
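
The request handling and reply header rules above can be sketched as follows. This is illustrative only: the `Message` fields and the use of strings for function codes are assumptions, and TEST_ECHO_MWORD and TEST_ERR_METER_RESET are left out because their payloads are not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for an SLC message header plus data."""
    source: str
    destination: str
    timestamp: int      # VMS timestamp, carried through unchanged
    function: str
    data: bytes
    length: int = 0


ECHO_FUNCTIONS = {"FUNC_TEST", "TEST_ECHO"}


def handle_test_request(req: Message) -> Optional[Message]:
    """Return the reply for a TEST request, or None when no reply is sent."""
    if req.function == "IOC_STOP":
        return None  # no reply; the caller exits gracefully
    if req.function not in ECHO_FUNCTIONS:
        # Codes not covered by this sketch end up here; a real handler
        # would log "invalid function code" only for unknown codes.
        raise ValueError(f"unhandled function code: {req.function}")
    payload = req.data  # reply with data exactly as sent
    return Message(
        source=req.destination,      # swap source and destination
        destination=req.source,
        timestamp=req.timestamp,     # copy VMS timestamp from the request
        function=req.function,       # copy function code from the request
        data=payload,
        length=len(payload),         # proper data length in the reply header
    )
```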

Log the following conditions:

Service availability change

Invalid function code

Unable to queue a reply

Queue full

Update the following diagnostics, reset on-demand:

Total number of requests processed for each function code

Total number of replies

Total number of replies dropped due to error

Time of last existence check

Maximum and average delta time between the time of receipt and the time of the Alpha database update for existence checks
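
One way to keep the maximum and average delta time diagnostic, with reset on demand, is a small running accumulator. The class below is a sketch with hypothetical names, and it assumes times are plain seconds.

```python
class DeltaTimeStats:
    """Running maximum and average of receipt-to-update delta times."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Reset on demand: forget everything accumulated so far.
        self.count = 0
        self.total = 0.0
        self.maximum = 0.0

    def record(self, received: float, updated: float) -> None:
        # Delta between time of receipt and time of the database update.
        delta = updated - received
        self.count += 1
        self.total += delta
        self.maximum = max(self.maximum, delta)

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0
```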

F.2 Periodic Update for Async Functions CHK1, CHK2, CHK3, and CPUM
The "cluster status" (CSTR) primary contains supertype 3 secondaries in the SLC database for micro health and status data. These secondaries are periodically updated using data read from shared async function tables and values read from associated EPICS PVs. The following functions are performed at different update rates:

Name	Function Description
CHK1	Update timestamps, elapsed times, job availability in the SLC database
CHK2	Update and reset statistics (counters, percentages) in the SLC database
CHK3	Update and reset statistics in the EPICS database
CPUM	Update CPU and memory usage

All CSTR secondaries involved in periodic update are listed in the following tables. Secondaries noted with "*" after the description have one value per function:

Supertype 1

Secn	Description	Use
JMSK	Expected job bitmask	Read at init. Used by CHK1 to set job status (MSTA).
CNAM	Cycling job/function name pair*	Read only at init and used in async function table creation.
CYCL	Fixed cycle period (>= 1 seconds)*	Read at init and during CHK1. Used to determine cycle period.
MTRC	Max # DB updates allowed during MTRL (>= 1)*	Read at init and during CHK1. Used for DB update metering.
MTRL	DB update period for MTRC (seconds, default is 60)*	Read at init and during CHK1. Used for DB update metering.
MAXT	Max time between DB updates (>= 0)*	Read at init and during CHK1. Forces a DB update regardless of metering.

Supertype 2

Secn	Description	Use
VTIM	Time of SLC restart request	Read at init time only, used to set MTIM.
HSTA	Job enable/disable bitmask	Read whenever a function is ready. Prevents cycling if bit not set for the function's job.
CMSK	Function enable/disable bitmask	Read whenever a function is ready. Prevents cycling if bit not set for the function.
MMSK	CA monitor enable/disable bitmask	Read during cycle time determination. Used to determine cycle period.
FMSK	Important CA monitor enable/disable bitmask	Read before a function updates the DB. Forces a DB update regardless of metering.
SCAN	Cycle period (> 0 seconds)*	Read during cycle time determination. Used if less than CYCL and the function's MMSK bit is set.

Supertype 3

Secn	Description	Use
MTIM	Time of SLC restart	Set to VTIM at init time only.
AMSK	Active job bitmask	Bit set if all tasks for the job are active and JMSK bit set. Set during CHK1.
MSTA	Job status flag	Set based on AMSK/JMSK comparison. Set during CHK1.
CPU	CPU idle time (percent)	Value of IOC:micr:1:CPU EPICS PV. Set during CPUM processing.
RMX	Available memory (bytes)	Value of IOC:micr:1:MEM EPICS PV. Set during CPUM processing.
CAM	Available CAMAC memory pool	Initialized to zero, no further updates.
UTIM	Time of last Alpha database update attempt*	Set when the update is attempted.
CTIM	Time of last function cycle*	Set when the function cycle completes.
ELPS	Time needed for function execution (seconds)*	Set when the function cycle completes.
NRUN	# times a function cycles*	Incremented when the function completes. Cleared during CHK2.
FAIL	# times an Alpha DB update fails*	Incremented when the update fails. Cleared during CHK2.
PUPD	% of NRUN resulting in a successful DB update*	Calculated during CHK2.
PVAX	% of NRUN triggered by Alpha message requests*	Calculated during CHK2.
CRTS	CAMAC crate status bitmask	Initialized to zero, no further updates.
CRTT	CAMAC crate temperatures	Initialized to zero, no further updates.
CRV1-8	CAMAC crate voltages	Initialized to zero, no further updates.
NTIM	Time of last TSTA update	Initialized to zero, no further updates.
TSTA	Timing job interrupt status	Initialized to zero, no further updates.
MAGF	Last magnet job function code	Set by magnet job (section H).
BTIM	Time of magnet job initialization	Set by magnet job (section H).

All EPICS PVs involved in periodic update are listed in the following table:

Name	PV Description	Use
IOC:micr:1:CPU	CPU idle time (percent)	Copied to CSTR CPU during CPUM processing
IOC:micr:1:RMX	Available memory (bytes)	Copied to CSTR RMX during CPUM processing
IOC:micr:1:TBD	TBD	Copied from TBD during CHK3 processing

During periodic update initialization, the following actions are performed:

All CSTR secondaries that do not apply to the SLC IOC are zeroed. These secondaries include CAM, CRTS, CRTT, CRV*, NTIM, and TSTA.

AMSK and MSTA bitmasks are set and updated in the SLC database. Each bit in AMSK corresponding to a particular job is set if all the tasks associated with the job are active and the corresponding JMSK bit is set. MSTA is set to 1 if AMSK matches JMSK and 0 otherwise.

The value of VTIM is copied (or echoed) to MTIM as an indication that the micro's database has been downloaded. CSTR VTIM will not be checked to see if it is 25 minutes older than current micro time as it is on the SLC micro. On the SLC micro, that test is done to see if no SCP was involved in the boot request (in which case fast feedback loops are not to be restarted).

A non-metered request to update the Alpha database is sent.
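
The AMSK and MSTA rules above reduce to a few bit operations. The sketch below assumes each job owns one bit position and that task activity is available as a list of booleans per job; both the data layout and the function names are hypothetical.

```python
from typing import Dict, List


def compute_amsk(jmsk: int, tasks_active: Dict[int, List[bool]]) -> int:
    """Set a job's AMSK bit if all its tasks are active and its JMSK bit is set."""
    amsk = 0
    for bit, tasks in tasks_active.items():
        expected = jmsk & (1 << bit)
        if expected and tasks and all(tasks):
            amsk |= 1 << bit
    return amsk


def compute_msta(amsk: int, jmsk: int) -> int:
    """MSTA is 1 if AMSK matches JMSK and 0 otherwise."""
    return 1 if amsk == jmsk else 0
```

For example, with JMSK = 0b011 and one inactive task in the job on bit 1, AMSK becomes 0b001 and MSTA is 0. A job on bit 2 that is running but not expected in JMSK does not contribute to AMSK.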

The periodic update loop includes the following actions:

Wait up to forever until one of up to four cycling functions is ready. More detail on this step is provided in the next section (Async Utilities).

If it is time to process the CHK1 function, the following steps are taken:

Set AMSK and MSTA using the same logic as in initialization above. Update the local database on change only.

Log a message for each task that is inactive as a reminder that repair and restart are required.

Update UTIM, CTIM, and ELPS in the local database. More detail on this step is provided in the next section (Async Utilities).

If it is time to process the CHK2 function, then PUPD, PVAX, NRUN, and FAIL are updated in the local database and counters are zeroed. More detail on this step is provided in the next section (Async Utilities).

If it is time to process the CPUM function, the current values of CPU and memory usage EPICS PVs are fetched and copied to CPU and RMX in the local database.

If it is time to process the CHK3 function, the current values of TBD are fetched from TBD and copied to TBD EPICS PVs. No SLC database update is done.

Update statistics for the function in the async function tables. More detail on this step is provided in the next section (Async Utilities).

A metered request to update the Alpha database is done. More detail on this step is provided in the next section (Async Utilities).

F.3 Async Utilities

Create and initialize async function tables shared between tasks. Get CSTR CNAM job/function pairs and use them to create a table of a variable number of functions for every job. Get CSTR CYCL, MTRC, MTRL, MAXT from the database and call async utility below to check values and update the table for each job/function. Initialize all counters to zero and all timestamps to current time. Set up database lists for supertype 2 and 3 data for later use by cycling functions and CSTR periodic update. Populate each table with the following items:

Function name from CSTR CNAM

Cycling time from CSTR CYCL

Alpha database update metering parameters from CSTR MTRC, MTRL, MAXT

Timestamp of the last database update attempt

Timestamp when Alpha database update metering started

Number of database updates since metering started

Database-update-failed counter

Database-update-success counter

Last database-update status (initialized to failed)

Timestamp of the last execution

Execution counter

Execution elapsed time

Execution-due-to-Alpha-request counter

Set event that the creation is complete to release tasks waiting on this event. All other async utilities will wait on this event, if it is not set, before proceeding.
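
A possible shape for the async function tables and the creation-complete event is sketched below. The field names, the nested dictionary keyed by job and function, and the `params` argument standing in for the database reads are all assumptions made for illustration.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class AsyncFunctionEntry:
    name: str                          # function name from CSTR CNAM
    cycl: float                        # cycling time from CSTR CYCL
    mtrc: int                          # metering parameters from CSTR
    mtrl: float
    maxt: float
    last_update_attempt: float         # timestamps start at current time
    metering_started: float
    last_execution: float
    updates_since_metering: int = 0    # counters start at zero
    update_failed: int = 0
    update_success: int = 0
    last_update_ok: bool = False       # last status initialized to failed
    executions: int = 0
    elapsed: float = 0.0
    alpha_request_executions: int = 0


tables_ready = threading.Event()       # other utilities wait on this event


def create_tables(cnam_pairs, params, now=None):
    """Build {job: {function: entry}} from CNAM job/function pairs."""
    now = time.time() if now is None else now
    tables = {}
    for job, func in cnam_pairs:
        cycl, mtrc, mtrl, maxt = params[(job, func)]
        tables.setdefault(job, {})[func] = AsyncFunctionEntry(
            name=func, cycl=cycl, mtrc=mtrc, mtrl=mtrl, maxt=maxt,
            last_update_attempt=now, metering_started=now, last_execution=now,
        )
    tables_ready.set()                 # release tasks waiting for creation
    return tables
```

Other utilities would call `tables_ready.wait()` before touching the tables. A real implementation shared between tasks would also need a lock around each entry.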

Destroy the async function tables created above.

Check input CYCL, MTRC, MTRL, MAXT values for an input job and function against reasonable limits. Any value that is outside its reasonable limits is reset to an appropriate default (see supertype 1 table above for defaults). Set values into the async function table for the input function.
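
The limit check might look like the sketch below. The supertype 1 table gives lower bounds for CYCL, MTRC, and MAXT and a default of 60 for MTRL only, so falling back to the lower bound for the other three, and treating a non-positive MTRL as out of range, are assumptions of this example.

```python
def check_limits(cycl, mtrc, mtrl, maxt):
    """Return (CYCL, MTRC, MTRL, MAXT) with out-of-range values replaced."""
    if cycl < 1:
        cycl = 1      # assumed default: the documented lower bound
    if mtrc < 1:
        mtrc = 1      # assumed default: the documented lower bound
    if mtrl <= 0:
        mtrl = 60     # documented default for MTRL
    if maxt < 0:
        maxt = 0      # assumed default: the documented lower bound
    return cycl, mtrc, mtrl, maxt
```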

Wait up to forever for a cycling function of the input job to be ready. Return when either a cycling function is ready or SLC exec has asked for the task to exit (polled every second). Return with the function that is most past-due. A cycling function is ready when all of the following conditions are true. Note that current values of supertype 2 CSTR secondaries (MMSK, CMSK, and HSTA) must be retrieved from the database first:

The current time exceeds the time of the previous action (CTIM) by the cycling period. The cycling period is SCAN if the bit for the function is set in MMSK and SCAN is less than CYCL; otherwise it is CYCL. Get SCAN from the database only if the MMSK bit is set (that is, only if needed).

The function must be enabled as determined by checking the appropriate bit in CMSK.

The job must be enabled as determined by checking the appropriate bit in HSTA.
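
The readiness test and the "most past-due" selection can be sketched as below. The `CyclingFunction` record and its bit assignments are hypothetical, and SCAN is held as a plain field here, whereas the requirement is to fetch it from the database only when the MMSK bit is set.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CyclingFunction:
    name: str
    bit: int        # assumed bit position of the function in CMSK and MMSK
    job_bit: int    # assumed bit position of the function's job in HSTA
    cycl: float
    scan: float
    ctim: float     # time of the previous cycle


def cycle_period(f: CyclingFunction, mmsk: int) -> float:
    # SCAN applies only if the MMSK bit is set and SCAN is less than CYCL.
    if mmsk & (1 << f.bit) and f.scan < f.cycl:
        return f.scan
    return f.cycl


def most_past_due(functions: Iterable[CyclingFunction], now: float,
                  mmsk: int, cmsk: int, hsta: int) -> Optional[CyclingFunction]:
    """Return the ready function that is most overdue, or None."""
    best, best_overdue = None, 0.0
    for f in functions:
        if not cmsk & (1 << f.bit):        # function disabled
            continue
        if not hsta & (1 << f.job_bit):    # job disabled
            continue
        overdue = now - (f.ctim + cycle_period(f, mmsk))
        if overdue >= 0 and (best is None or overdue > best_overdue):
            best, best_overdue = f, overdue
    return best
```

The waiting utility would call `most_past_due` once per second, and return as soon as it yields a function or the exit request is seen.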

Update CSTR UTIM, CTIM, and ELPS in the local database. Copy all values from the async function tables to the database. The timestamps are first converted to VMS format. Values for UTIM and CTIM will always be different so there is no need to check for changes.

Update CSTR PUPD, PVAX, NRUN, and FAIL in the local database. Calculate percent of NRUN resulting in a successful DB update (PUPD) and percent of NRUN triggered by Alpha message requests (PVAX). Update the database on change only. Zero applicable counters after update.
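
The two CHK2 percentages follow directly from the counters kept in the async function table. A minimal sketch, assuming both are reported as 0 when NRUN is zero (this document does not say what to do in that case):

```python
def chk2_percentages(nrun: int, update_success: int, alpha_requests: int):
    """Return (PUPD, PVAX) as percentages of NRUN."""
    if nrun == 0:
        return 0.0, 0.0  # assumed convention for an idle function
    pupd = 100.0 * update_success / nrun   # % of runs with a successful update
    pvax = 100.0 * alpha_requests / nrun   # % of runs triggered by Alpha
    return pupd, pvax
```

For instance, 8 runs with 6 successful updates and 2 Alpha-triggered executions give PUPD = 75 and PVAX = 25.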

Update the Alpha database with metering. A request to update the Alpha database is done for the input job if one of the following conditions is true:

The database write was a result of processing an Alpha message request as indicated by an input flag.

The last Alpha update request failed.

No database update has happened in the last MAXT seconds.

The number of successful database updates is less than the allowed number (MTRC) in the last MTRL seconds.

The bit for the function is set in FMSK. The value of FMSK must be retrieved from the database first.

Update the following items in the async function table for the input job and function if an Alpha update was done:

Metering-started timestamp set to current time if metering has just started.

Increment the metering counter if metering is in effect.

Increment either the database-update-failed counter or database-update-success counter depending on status from the update request.

Set the last-database-update timestamp to current time.

Set the last-database-update status.
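
The metering decision and the bookkeeping that follows an update can be sketched as below. Two points are interpretations rather than stated requirements: metering is modeled as a window that restarts once MTRL seconds have passed since it started, and every update attempt counts against MTRC. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Meter:
    mtrc: int
    mtrl: float
    maxt: float
    metering_started: float = 0.0
    updates_in_window: int = 0
    last_update: float = 0.0
    last_update_ok: bool = False   # initialized to failed
    success: int = 0
    failed: int = 0


def should_update(m: Meter, now: float, from_alpha_request: bool,
                  fmsk_bit_set: bool) -> bool:
    """True if any one of the listed update conditions holds."""
    window_expired = now - m.metering_started >= m.mtrl
    count = 0 if window_expired else m.updates_in_window
    return (from_alpha_request
            or not m.last_update_ok
            or now - m.last_update >= m.maxt
            or count < m.mtrc
            or fmsk_bit_set)


def record_update(m: Meter, now: float, ok: bool) -> None:
    """Bookkeeping after an Alpha update request was made."""
    if now - m.metering_started >= m.mtrl:
        m.metering_started = now       # metering has just (re)started
        m.updates_in_window = 0
    m.updates_in_window += 1
    if ok:
        m.success += 1
    else:
        m.failed += 1
    m.last_update = now
    m.last_update_ok = ok
```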

Update statistics for the input function in the async function tables including:

Set the time of the last execution to the current time.

Increment the execution counter.

Set the execution elapsed time to the current time minus the input time taken right before the function began.

If the function was executed as part of an Alpha message request, as indicated by an input flag, increment the execution-due-to-Alpha-request counter.

Write async function tables for an input job name (can be ALL*) to the console.

F.4 IOC Shell Interface

Write the async function tables for an input (and optional) job name by calling an async utility from above.

Allow the user to override CYCL, MTRC, MTRL, MAXT for an input job and function in the async functions table by calling an async utility from above. Someone may have done a "dbedit" on the Alpha and this will put the change into effect without needing to restart the SLC IOC. It is also convenient for testing.

G. Timing System PNET Diagnostics (TIME)

G.1 Process Requests with TIMING Function Codes

Take a request with the TIMING function code out of the TIMING job queue, first come, first served.

Copy the source SCP ID from the request header to the destination SCP ID used by message logging. Reset after reply is queued.

Handle the following function codes:

Function Code	Action
TIMING_PNET_GETCIRC	Return a full second of PNET data.

For TIMING_PNET_GETCIRC, if the request is to get PNET data which is synchronized with the MPG, wait on the PNETDIAG ready EPICS record with a 2 second timeout. Once the record is set to "READY", copy the last second of the PNETDIAG circular buffer EPICS record to the reply and update the reply status. Reset the PNETDIAG ready record to "NOTREADY" after the copy is finished.

For TIMING_PNET_GETCIRC, if the request is to get non-synchronized PNET data or if a timeout occurs for a synchronized PNET data request, set the PNETDIAG ready EPICS record to "READY", copy the last FULL second's worth of PNET data from the PNETDIAG circular buffer EPICS record to the reply and update the reply status. Reset the PNETDIAG ready record to "NOTREADY" after the copy is finished. To get a FULL second, the circular buffer must be searched until the previous two edges, as defined in the PNET data and corresponding to the beginning and end of the last second, are found and then the copy done appropriately. If the two edges cannot be found, a bad status is set in the reply.

Copy the source and destination from the request header to destination and source, respectively, in the reply header. Copy the VMS timestamp and function code from the request to the reply header. Set the proper data length in the reply header.

If a reply is requested, queue reply in the message service reply queue.
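
The search for a FULL second in the non-synchronized case can be sketched as below. How an "edge" is encoded in the PNET data is not defined here, so the caller supplies a hypothetical `is_edge` predicate. The sketch also assumes that an edge marks the first value of a second and that `next_write` is the index of the oldest value in the circular buffer.

```python
from typing import Callable, List, Optional, Sequence


def last_full_second(buf: Sequence[int], next_write: int,
                     is_edge: Callable[[int], bool]) -> Optional[List[int]]:
    """Return the values between the two most recent edges, or None."""
    n = len(buf)
    # Unroll the circular buffer into oldest-to-newest order.
    ordered = [buf[(next_write + i) % n] for i in range(n)]
    edges = [i for i, value in enumerate(ordered) if is_edge(value)]
    if len(edges) < 2:
        return None  # caller sets a bad reply status and logs the condition
    start, end = edges[-2], edges[-1]
    # From the earlier edge up to, but not including, the later edge.
    return ordered[start:end]
```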

Log the following conditions:

Invalid function code

Unable to queue a reply

Queue full

Two edges, corresponding to the beginning and end of a second, cannot be found in the PNET data buffer

Update the following diagnostics, reset on-demand:

Total number of requests processed.

Time of last request

Maximum and average delta time between the time of receipt and the time of reply

EPICS ASSUMPTIONS:

When the PNETDIAG ready record is "NOTREADY", the PNET data is copied into the PNETDIAG circular buffer, which is sized at 720 values to hold 2 seconds' worth of data. When the YY_PNETDIAG YY value occurs in the PNET data, the copy continues until the next "edge" of a second, as defined in the PNET data, and then stops. At that point, EPICS sets the PNETDIAG ready record to "READY". If another YY_PNETDIAG occurs before the "edge" is found for the previous YY_PNETDIAG, or while the PNETDIAG ready record is "READY", it is ignored.
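
The assumed EPICS-side behavior amounts to a small state machine. The sketch below models it with hypothetical names. The caller decides whether a value is a YY_PNETDIAG trigger or an edge, since neither encoding is given here. The sketch stores the edge value itself before stopping and does not let one value act as both trigger and edge; both choices are assumptions.

```python
class PnetDiag:
    """Model of the PNETDIAG circular buffer and ready record."""

    def __init__(self, size: int = 720) -> None:
        self.buffer = [0] * size   # 720 values, 2 seconds' worth of data
        self.next_write = 0
        self.ready = False         # False is "NOTREADY", True is "READY"
        self.triggered = False     # a YY_PNETDIAG was seen, awaiting an edge

    def on_pnet_value(self, value: int, is_trigger: bool, is_edge: bool) -> None:
        if self.ready:
            return                 # buffer frozen; triggers are ignored
        self.buffer[self.next_write] = value
        self.next_write = (self.next_write + 1) % len(self.buffer)
        if is_trigger and not self.triggered:
            self.triggered = True
        elif self.triggered and is_edge:
            self.ready = True      # stop copying at the edge of the second
            self.triggered = False

    def release(self) -> None:
        """Reader is done copying; set the record back to NOTREADY."""
        self.ready = False
```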

I. References

I.1 General

LCLS Specification #1.2-201 by the LCLS Controls group, May 2004

LCLS Controls C Coding Standards

LCLS Controls EPICS Database Standards

LCLS Record Naming Convention

LCLS Distributed Control System Requirements by Patrick Krejcik, Nov 2003

List of all LCLS Requirement Documents

EPICS: Input/Output Controller Application Developers Guide, Release 3.14.6

SLC Development Help for SLC IOC Testing

SLC-Aware IOC Architecture Thoughts by Tony Gromme, Mar 2004

Overview of the SLAC Control System by Rusty Humphrey, 2002

LCLS Controls Timing System Documents

Basic Users Guide: SLC Control System Introduction

Basic Users Guide: SLC Micro Structure

Adding a Micro to the SLC Control System by Ken Underwood, 2001

I.2 SLC Message Service

Network Upgrade for the SLC: Control System Modifications by Mark Crane, 1997

SLC EnetMicro How-To

Basic Users Guide: Message Communications in SLC

I.3 SLC Database

SLC-Aware IOC Database Notes by Tony Gromme, Mar 2004

PRIMARY.DBS: List of SLC Database Primaries and Secondaries

MICRONAME.DAT: List of SLC Database Micro Names

Index Panel for SLC Control System Channel Access Clients

Introduction to the SLC Database by Ken Underwood, 1988

Principles of Operation: SLC Database Internals

Basic Users Guide: SLC Database

Programmers Guide: SLC Database

I.4 Async Utilities

SLC Asynch Database Update Design Spec by T Lahey, N Spencer, 1989

Improving Control of Auto-Checking Functions by T Lahey, N Spencer, R Hall, 1990

I.5 Message Logging

CMLOG Attachment to the SLC Control System by Ron MacKenzie

LCLS Controls CMLOG Page

ESD CMLOG Page

I.6 Controlled Devices: Power Supplies, Stepper Motors

LCLS Spec: Orbit Feedback Corrector Requirements

Principles of Operation: SLC Large Power Supply Control

Principles of Operation: SLC Stepping Motor Control

I.7 Gated ADCs: BPMs, Toroids, Wire-Scanners, Profile Monitors

LCLS Spec: BPM System Requirements

LCLS BPM System Requirements by Linda Hendrickson, 2004

SLC BPM Acq Description by Tony Gromme, Sep 2004

SLC-Aware IOC BPM Notes by Tony Gromme, Mar 2004

SLC Control System Beam-synchronized Data Acquisition Software by Tony Gromme, 1999

Principles of Operation: SLC BPM Software Internals

Wire Scanners for Emittance and Beam Size Measurement at the SLC

Contact: Stephanie Allison
Last Modified: June 28, 2005
