EPICS Save/Restore (sr) Upgrade Project
(Channel Watcher)
Problem: The current version of Save/Restore (sr), also known as Bumpless Reboot, may be interfering with normal IOC operations due to the large number of file writes.
Solution: Move the file writes (i.e. Save functions) to a client computer (i.e. UNIX box). We may even eventually do away with the file altogether and use a database such as Oracle. Leave the restore functions on the IOC.
Notes from the EPICS Workshop, Dec. 3&4, 2001, Fairmont Hotel, San Jose: Jeff Hill suggests that we design the software with Object Oriented Plug-ins so users can choose which features to include in their local Channel Watcher builds. Click here for software design notes.
The software upgrade will be developed in the following stages. Each stage represents a point at which the software will be tested and released.
Stage 1: Simple move to UNIX with enhancements
Stage 1 Enhancements:
- Toggle channel logging on the fly
- Add and Remove channels on the fly
- /nowrite option for logging only, which can also be toggled on the fly. Channels with the /nowrite option do not cause the Restore Repository to be generated.
- Don't write unknown values to the Repository when Channel Watcher can't connect to an IOC, for example when the IOC is down
- Allow other Channel Groups inside the Channel Group (i.e. Masterfile).
- Web documentation
- Put Channel Groups and the Master Files in AFS and manage using CVS. Manually FTP or SCP files to the gateway /usr/local area after a change.
- Add a channel alias name which will be supplied as a comment next to the channel name in some Channel Group files.
- Put Repositories in the /u1 area (different from the area for the Channel Groups) where they will be backed up every night. Each Repository may be in a different directory so if a Master File is used, it must specify the Channel Group file name, with path, and the path and name of the Repository created for it.
- Write the Repository on any channel value change, but no more often than once every configurable number of seconds. See the -t option on startup.
- Repository files need to have owner and group write privilege
- Write Repositories in s/r version V1.91 format. This format is ASCII so that it can be easily edited by somebody making a software release who needs the IOC to reboot with different values.
- Remove check for string of nulls.
- Create a health summary EPICS Record and additional CWlog messages when this record goes into alarm state. Record names are CS02:CWPEPII:SUMY:STAT and CS02:CWNLCTA:SUMY:STAT (not yet available). mbbo enum values are:
- 0 = OK
- 1 = DOWN
- 2 = HUNG
- 3 = CONNERR
- 4 = DISKERR
- probably others up to 15
- Don't tie Channel Watcher to a specific file system such as afs or nfs.
- If, for some reason, the Channel Watcher is prone to crashing, make a cron job that executes every 5 minutes and restarts any missing Channel Watcher process (we may have one cron job per Channel Watcher process if that's better).
- Test Plan: In order to ensure the Channel Watcher is properly running we plan on running it in parallel with the current Save/Restore software on the IOC, using different Repositories, of course, and comparing the Repositories generated by both. When we're satisfied Channel Watcher works we'll need to add Channel Watcher to the UNIX startup files and remove the create_monitor_sets from IOC startup.
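The Repository write rule in the list above (write on any channel value change, but no more often than the -t interval) can be sketched as follows. This is a minimal Python illustration, not Channel Watcher code; the class name RepositoryThrottle and its methods are hypothetical.

```python
class RepositoryThrottle:
    """Hypothetical helper: decide when a Repository may be rewritten."""

    def __init__(self, min_interval=10.0):
        # Mirrors the -t option: minimum seconds between Repository writes.
        self.min_interval = min_interval
        self.last_write = None  # time of the last write, None if never written
        self.dirty = False      # True when a value changed since the last write

    def value_changed(self):
        """Record that at least one channel value changed."""
        self.dirty = True

    def should_write(self, now):
        """Return True if a Repository write is due at time `now` (seconds)."""
        if not self.dirty:
            return False
        if self.last_write is not None and now - self.last_write < self.min_interval:
            return False
        self.last_write = now
        self.dirty = False
        return True
```

A caller would invoke value_changed() from its channel monitor callback and poll should_write() with the current time; changes that arrive inside the interval are held until the interval expires.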
The following tags may be used in CWlog messages:
Tag | Value |
text | text of message like what epicsPrintf provides |
facility | Channel Watcher |
host | IOC associated with the channel |
device | Channel Name |
message | Channel alias |
severity | EPICS severity of the channel |
status | EPICS status of the channel |
value | Current value of the channel as a string |
code | See Channel Watcher file CWlogmsgABC.hh for a list of codes |
Stage 3: Statistics
Web interface to list channels by
- Active Channel Group
- Clickable Inactive Channel Group files just like windows does
- IOC
- Alphabetically
- How often channels are changing and flag channels changing faster than some max rate that we can change on the fly. Even have a different max update rate per channel and a list of channels by max update rate.
- Time of last change
- Which channels are not connected (just like the Channel Archiver)
Stage 4: Wildcarding using regular expressions
Add wildcarding of channel names in the Channel Groups and to the Statistics tool listed above using the same Oracle Database we now know and love. Steph says that there are EPICS library tools that already know how to resolve regular expressions.
As long as we're doing that, Kristi wants the software designed in such a way that it is easy to make a tool that lists channels given a regular expression. Make it and call it ChannelLister.
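A ChannelLister along these lines might look like the sketch below. It uses Python's re module rather than the EPICS library tools mentioned above, and it assumes the expression must match the whole channel name; the function name list_channels is hypothetical.

```python
import re


def list_channels(pattern, channel_names):
    """Return the channel names that fully match the regular expression, sorted."""
    regex = re.compile(pattern)
    return sorted(name for name in channel_names if regex.fullmatch(name))
```

For example, the expression CS02:CW.*:SUMY:STAT would select both health summary records named in Stage 1.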
Stage 5: Channel Groups and Repositories in Oracle Database
Stage 5 Enhancements:
- Real on the fly toggling of /log, /nowrite, add/remove channel, and maximum update rate on a per channel basis with messages to CWlog. Make sure to allow wildcarding with regular expressions.
- GUI replaces Channel Groups in files
- Channel values written to Oracle Database, but file Repositories still generated.
- Channels with /nowrite should not update the Oracle Database.
- On demand save via a button push that allows different file names for the Repository, but the Bumpless Reboot functionality still generates the Repository on value change
- CGI scripts in Perl for web interface to data stored in Oracle
Stage 6: Restore from Oracle Database
Instead of reading the file Repositories the IOC could get the channel values directly from the Oracle Database possibly with Aida. Steph also points out that at this point the Channel Watcher could run on an IOC so make sure the code is generic enough so that it can run on multiple platforms.
Master File Format:
Channel-Group-file-name <white space> Repository_file_name
Master File Notes:
- Optional items are underlined.
- Repository file name defaults to Channel-Group-file-name.sav
- Allow macros in the above file names and use the EPICS macro library.
- Make sure Channel Group file names don't recurse.
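The Master File rules above can be sketched in Python as follows. This is only an illustration: macro expansion is left out, comment handling in Master Files is not specified here and so is not attempted, and the function names and the `includes` mapping are hypothetical.

```python
def parse_master_line(line):
    """Split one Master File line into (channel_group_file, repository_file).

    Returns None for a blank line. The Repository file name is optional
    and defaults to the Channel Group file name plus ".sav".
    """
    fields = line.split()
    if not fields:
        return None
    group_file = fields[0]
    repository = fields[1] if len(fields) > 1 else group_file + ".sav"
    return group_file, repository


def recurses(includes, start):
    """Return True if following `includes` from `start` revisits a file.

    `includes` maps a file name to the file names it refers to
    (a hypothetical structure built while reading the files).
    """
    def visit(name, path):
        if name in path:
            return True
        return any(visit(child, path | {name}) for child in includes.get(name, ()))

    return visit(start, frozenset())
```

Only a file that reappears on its own inclusion path counts as recursion, so two files that both refer to a third file are still accepted.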
Channel Group File Format: optional items in italics
# comment
channel:name <white space> /log <white space> /nowrite <white space> # <white space> channel:alias
channel:name <white space> /LOG <white space> /NOWRITE <white space> # <white space> channel:alias
<blank lines>
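A parser for one line of this format might look like the sketch below. It assumes that "#" never appears inside a channel name and that the /log and /nowrite options are case-insensitive, as the two example lines suggest. It does not handle Channel Groups nested inside a Channel Group, since the syntax for that is not given here. The function name parse_channel_line is hypothetical.

```python
def parse_channel_line(line):
    """Parse one Channel Group line; return None for blank or comment-only lines."""
    body, _, comment = line.partition("#")
    fields = body.split()
    if not fields:
        return None
    # Options may be written in either case (/log or /LOG).
    options = {field.lower() for field in fields[1:]}
    unknown = options - {"/log", "/nowrite"}
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(sorted(unknown)))
    return {
        "name": fields[0],
        "log": "/log" in options,
        "nowrite": "/nowrite" in options,
        "alias": comment.strip() or None,  # text after "#" is the channel alias
    }
```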
How to start the Channel Watcher:
Options for starting channel watcher:
-f | specifies the Channel Group file name or Master File name | required |
-s | specifies the Repository name | optional, defaults to the Channel Group file name plus .sav |
-c | specifies the top of the Channel Group location. Overrides CW_CHANNEL_GROUP_ROOT | optional |
-r | specifies the top of the Repository location. Overrides CW_REPOSITORY_ROOT | optional, but -r requires -c |
-t | specifies the minimum time required between Repository generations | optional, defaults to 10 seconds. Must be >= 10 seconds. |
-l | specifies the location of any non-Repository file written by Channel Watcher (like a log file), used in place of stderr | optional, defaults to stderr |
-n | specifies the maximum number of cmlog messages to issue in -m seconds before throttling for a channel | optional, defaults to 4 messages |
-m | specifies the minimum time required between cmlog messages for a channel before throttling | optional, defaults to 10 seconds |
-e | specifies ca_pend_event time in seconds | optional, defaults to 1, must be less than or equal to -t and -m |
-i | specifies an identifying tag for log messages. Useful when running more than one Channel Watcher. | optional, defaults to NULL |
-h | help | optional |
-p | Health Summary PV name | optional |
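The defaults and constraints in the table above can be checked with a sketch like the following. It uses Python's argparse for illustration only and is not the Channel Watcher's own option handling; the program name cw_example and the function names are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the options in the table (-h is supplied by argparse)."""
    p = argparse.ArgumentParser(prog="cw_example")  # hypothetical program name
    p.add_argument("-f", required=True, help="Channel Group or Master File name")
    p.add_argument("-s", help="Repository name")
    p.add_argument("-c", help="top of the Channel Group location")
    p.add_argument("-r", help="top of the Repository location")
    p.add_argument("-t", type=float, default=10.0, help="seconds between Repository writes")
    p.add_argument("-l", help="location of non-Repository output")
    p.add_argument("-n", type=int, default=4, help="messages allowed before throttling")
    p.add_argument("-m", type=float, default=10.0, help="seconds between messages")
    p.add_argument("-e", type=float, default=1.0, help="ca_pend_event time in seconds")
    p.add_argument("-i", help="identifying tag for log messages")
    p.add_argument("-p", help="Health Summary PV name")
    return p


def check_options(opts):
    """Return a list of constraint violations; an empty list means the options are valid."""
    problems = []
    if opts.r is not None and opts.c is None:
        problems.append("-r requires -c")
    if opts.t < 10:
        problems.append("-t must be >= 10 seconds")
    if opts.e > opts.t or opts.e > opts.m:
        problems.append("-e must be <= -t and -m")
    return problems


def repository_name(opts):
    """Apply the -s default: the Channel Group file name plus .sav."""
    return opts.s if opts.s is not None else opts.f + ".sav"
```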
Environment variables used by Channel Watcher:
- CW_CHANNEL_GROUP_ROOT - optional. Specifies the top of the Channel Group location.
- CW_REPOSITORY_ROOT - optional. Specifies the top of the Repository location.
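The precedence between the -c and -r options and these environment variables can be sketched as below. The document states only that the options take precedence over the variables; joining the root onto a file name is an assumption made for illustration, and the function names are hypothetical.

```python
import os


def resolve_root(option_value, env_name, environ=None):
    """Return the command-line value if given, else the environment value, else None."""
    environ = os.environ if environ is None else environ
    if option_value is not None:
        return option_value
    return environ.get(env_name)


def locate(root, file_name):
    """Assumption: a root, when set, is simply prepended to the file name."""
    return os.path.join(root, file_name) if root else file_name
```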
Notes:
Definitions:
Last Update 10-Oct-2002 by Mike Zelazny zelazny@slac.stanford.edu