Minutes from the 03/23/05 LCLS SLC IOC meetings:

(1) Database Service and Utilities:

Debbie has dblput working for scalars, all data types. No byte-swapping is
needed for S (>4 char strings) and A (4 char strings). When Ron tests async,
he'll test dblput of arrays.

Debbie has dbupdate working too including all the hi/lo logic that she mostly
ported straight from the SLC micro. Currently, she will update either
supertype 2 or supertype 3 data (per second argument in dbupdate). So to
update both supertype 2 and 3, two messages to DBEX are required. It's
possible that Kristi may want to update both ST2 and ST3 data in one message
(BDES, BACT, STAT in one message). If so, Debbie says that it won't be too
hard to combine both in one message and allow one dbupdate call for both
supertypes.

Debbie found a problem in the alpha DBEX when updating supertype 2 data. Tony
quickly diagnosed and fixed DBEX to allow ST2 for SLC IOCs.

Currently, no wait for acknowledgement from DBEX is done in dbupdate. Debbie
will work on this after cleaning up and modularizing the dbSend code. We tried
to think of a use-case where someone will not want to wait for
acknowledgement. One is iocsh and Debbie has already coded that separately
using utilities under dbupdate but she could use dbupdate instead if it had a
no-wait argument. The only iocsh left to test is slcdbupdate with the ALL*
argument.

Tasks left for Debbie in this order:
* finish iocsh routines and testing
* dbupdate to wait for acknowledgement, dbSend cleanup
* other cleanup
* add any missing diagnostics (collision count, for one)

(2) Message Service and General Utilities:

Diane has supported Debbie as needed with utilities. She has also tested
large buffer transfer using TEST_ECHO_MWORD by using the handy custom message
function for the SCP. She found no problems with the transfer but was unable
to test that the data wasn't corrupted using TEST_ECHO_MWORD. That will need
to be done when the BPM job is ready.
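
A possible shape for that corruption check, once a job exists that hands the
echoed data back for comparison: fill the buffer with a deterministic pattern
and report the first byte that differs. This is only a Python sketch under
assumptions. make_pattern, first_mismatch and loopback_echo are hypothetical
names, and loopback_echo merely stands in for the real round trip; none of
this reflects the actual TEST_ECHO_MWORD interface.

```python
def make_pattern(nbytes, seed=0):
    """Return nbytes of a deterministic, position-dependent byte pattern."""
    return bytes((seed + 7 * i) % 256 for i in range(nbytes))


def first_mismatch(sent, received):
    """Return the index of the first differing byte, or None if identical.

    A length difference with an otherwise matching prefix is reported as a
    mismatch at the length of the shorter buffer.
    """
    for i, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return i
    if len(sent) != len(received):
        return min(len(sent), len(received))
    return None


def loopback_echo(buf):
    """Hypothetical stand-in for sending buf out and receiving the echo."""
    return bytes(buf)


if __name__ == "__main__":
    sent = make_pattern(64 * 1024)
    bad = first_mismatch(sent, loopback_echo(sent))
    print("echo OK" if bad is None else f"first corrupted byte at offset {bad}")
```

A position-dependent pattern is used rather than a constant fill so that
dropped, duplicated or reordered blocks show up as a mismatch too.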

Diane did some proxy up/down testing. She confirmed that after the proxy is
restarted, it takes some time for the Alpha processes to reconnect to the
proxy, even though the SLC IOC side reconnects right away. Bottom line - after
a proxy restart, the SLC IOC must be restarted, same as for the ethernet SLC
micros. We have no plans to fix this at this time.

Diane is thinking about developing device support for the interface to SLC
IOC diagnostics. We think that there are certain diagnostics common for all
threads that Diane may want to add to the common thread structure like:
- looping counter (increment whenever thread loops or processes a message)
- busy flag (like epics PACT - set when thread is doing something vs waiting
  on a queue or sleeping)
- timestamp of last action
- the last function code or job code or some integer associated with the
  last action of the thread
- status of the last action

Tasks left for Diane:
* support others as they start using her utilities, plus consultation
* diagnostics
* correct thread priorities (waiting on requirements)
* slcExec change (see later for more discussion)

(3) Cluster Status Service and Async Utilities:

Ron has had many comments on his design in the last week. Enough to warrant
another revision of the specs. He is also starting to code those pieces which
are not affected by the changes or have a clear design. The major change is
to the cstrAsync thread which now has a different purpose - to send messages
to the Handlers when there is a cycling function to perform. cstrHdlr is now
doing the functions that cstrAsync did in the previous design.

We have decided for now to log a message and exit on fatal error if CNAM is
not set up properly. People will have to correct before SLC will start
properly - note CNAM rarely changes once it is set up. Also, Ron will add
proper comments to ref_dbsfile:defaults.dbs. This simplifies the code quite
a bit.

Ron presented us a diagram (thank you!!)
of the new way async fits in the overall system and we all agreed it was
logical and a good improvement.

Tasks left for Ron:
* update specs and send out for review
* code cstrAsync and async utilities used only by cstrAsync
* finish cstrHdlr and async utilities used only by cstrHdlr
* code async utilities used by all Hdlr's

(4) Magnet Service:

Kristi is updating her functional requirements based on comments from the
Mar 16 review and ideas/emails since then. She is splitting the bitmasks into
their individual bits and describing what each is used for. Same with cmlog
messages (about 50 of these and most will have to be done on the EPICS side
instead of the SLC side). She is adding pictures of the SCP displays and
panels in order to describe functionality better (good idea!).

Once she has the second version of the requirements ready, she wants to have
another discussion with a small group - Nancy, Ken, Tony, Stephanie. Then
another discussion with the larger slc ioc group. We'd like to try
"use-cases" in this larger discussion and see how this method works for us.

She is also coming up with a list of proposed changes to the SCP displays for
magnets that we'll need to review. The idea of another secondary to store
IOC-unique bits of information to be used by the SCP was discussed but
discouraged.

(5) Proposal for changes to slcExec:

Ron wondered if we could get rid of event flags (one for the database and the
other for the function tables) that everybody has to wait on since all
threads are started at the same time and some depend on the others to
initialize. Diane proposed changing slcExec and we all agreed it sounded
worthwhile and logical.

What slcExec does to start the SLC threads:
  (1) Sets dbExists, asyncExists, and all thread active flags to false.
  (2) Starts msgSend, dbSend, dbHdlr, and dbRecv.
  (3) Waits for all threads to become active and dbExists to be true.
  (4) Starts cstrAsync.
  (5) Waits for active and asyncExists to be true.
  (6) Starts all Hdlrs (msgHdlr, mgntHdlr, cstrHdlr, timeHdlr, bpmHdlr).
  (7) Waits for all to become active.
  (8) Starts msgRecv. msgRecv now sends PARANOIA the I'm-alive message.
  (9) Logs the "started and at your service" cmlog message.

What slcExec does to stop SLC threads:
  (1) Logs the "stopped and out of service" cmlog message.
  (2) Sets the stop flag for all threads.
  (3) Waits for all threads to be inactive before allowing a restart.

About the thread active flag:
Each handler and cstrAsync thread will set its active flag after successful
initialization and before entering its main loop. The *Recv and *Send threads
have different criteria for setting their active flag.

Hints about deallocating a resource used by multiple threads:
When deallocating a mutex or queue that may be used by another thread before
that thread sees that it needs to exit, always null the pointer before
actually deallocating the resource. Then if the other thread tries to use it,
it'll just get an error. Diane and Debbie have discovered race conditions and
solved them this way.

(6) CMLOG Message Logging:

Stephanie to change slcCmlog so that any message being logged by a thread
that is stopping will be dropped unless the IOC is a development IOC or the
debug flag is on. We ran out of time to discuss error metering.

(7) PV Naming:

Stephanie updated based on comments from the last few weeks:
http://www.slac.stanford.edu/grp/lcls/controls/global/standards/software/RecNameConv.html

(8) IOC Shell Help:

In previous meeting minutes:
http://www.slac.stanford.edu/grp/lcls/controls/slc_ioc/meeting/041124.txt
it was stated that Debbie and Diane will produce an "SLC-Aware IOC Shell
Command User's Guide". We've decided to simply document the iocsh commands in
our design specs and not produce this additional document. Most of the time,
people will use the regular iocsh help facility to remind them of the
commands and command arguments.
For example:

epics> help
Type `help command_name' to get more information about a particular command.
# < asInit asSetFilename asSetSubstitutions ascar asdbdump asphag aspmem
asprules aspuag astac callbackSetQueueSize casr cd coreRelease
dbDumpBreaktable dbDumpDevice dbDumpDriver dbDumpField dbDumpFunction
dbDumpMenu dbDumpPath dbDumpRecord dbDumpRecordType dbDumpRegistrar
dbDumpVariable dbLoadDatabase dbLoadRecords dbLoadTemplate dbLockShowLocked
dbPvdDump dbPvdTableSize dbReportDeviceConfig dba dbap dbb dbc dbcar dbd dbel
dbgf dbgrep dbhcr dbior dbl dblsr dbnr dbp dbpf dbpr dbs dbstat dbtgf dbtpf
dbtpn dbtr eltc epicsEnvSet epicsEnvShow epicsMutexShowAll epicsParamShow
epicsPrtEnvParams epicsThreadShowAll epicsThreadSleep errlogInit exit gft
help iocCmlogDStart iocCmlogHdlrQSize iocCmlogMsg iocCmlogStart iocCmlogStop
iocInit iocLogInit pft pwd registryDeviceSupportFind
registryDriverSupportFind registryFunctionFind registryRecordTypeFind
saIoc_registerRecordDeviceDriver scanOnceSetQueueSize scanpel scanpiol
scanppl setIocLogDisable slcFreeLists slcRestart slcStart slcStop slcdbDump
slcdbDumpHash slcdbDumpHashFile slcdbEdit slcdbGetExists slcdbGetMeta
slcdbGetUnits slcdbGetVersion slcdbSetHashTableSize slcdbUpdate thread tpn
var

.....
Note that all SLC IOC iocsh commands start with "slc"....

epics> help slcdbEdit
slcdbEdit primary unit secondary value

(9) Use-Cases:

Debbie proposed that we try using "use-cases" to nail down requirements.
We'd like to try it with the magnet requirements and Debbie and Diane would
like to give it a try with their next project. She found this primer:
http://www.bredemeyer.com/use_cases.htm

Extracted from Debbie's proposal:

We are all trying to a) come to an understanding of what the current
(sub)system does, and then b) how to best implement the functionality
(requirements) in our current ioc design. My thought concerns the process for
(a). I have found "use cases" to work well.
I am no expert, nor have I formally schooled myself in use cases, but by
researching all the "user" cases, (where a user is not literal - can be a
person operating a SCP, or another automated subsystem "using" this
subsystem), they help decipher the requirements, relative "use" priorities,
timing, implementation, and answer why something is implemented a certain
way. This is documented at a high level (top down) which also serves to bring
newcomers up to speed.

For example, ask the question what are all the user interface (UI) functions
concerning magnet and magnet-like devices, that are performed? This requires
interviewing "the experts" who actually use the system and the programmers
who previously built-in the automation. What functions have priority over the
others...What are the timing constraints, diagnostic opportunities... What
other subsystems "use" (interface to) this subsystem, etc, etc. Ask how,
what, where, when..

Perhaps if we take a more formal approach and document use cases prior to
anything else, we would all have a better understanding of what we are trying
to accomplish top down and have the team members be on the same page. As a
newcomer, I have found the bulk of our documentation to jump in at the detail
level without having the "big picture" of why this subsystem exists and who
(person, place, thing...) uses it.
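
Illustration for item (5): the staged start and stop that slcExec performs
could be sketched roughly as below. This is a Python model only, not the IOC
code. The thread names and the dbExists/asyncExists flags come from the
minutes; everything else (SlcThread, start_stage, slc_start, slc_stop, the
timeout value, and the choice that dbHdlr sets dbExists and cstrAsync sets
asyncExists) is an assumption made for the example.

```python
import threading

TIMEOUT = 5.0  # seconds to wait per stage; an arbitrary value for the sketch


class SlcThread(threading.Thread):
    """Hypothetical stand-in for one SLC thread with an 'active' flag."""

    def __init__(self, name, stop_flag, log, on_init=None):
        super().__init__(name=name, daemon=True)
        self.active = threading.Event()
        self._stop_flag = stop_flag
        self._log = log
        self._on_init = on_init

    def run(self):
        if self._on_init is not None:
            self._on_init()        # e.g. mark the database as existing
        self._log.append(self.name)
        self.active.set()          # after successful init, before the main loop
        self._stop_flag.wait()     # placeholder for the real main loop
        self.active.clear()        # inactive again once the loop has exited


def start_stage(names, stop_flag, log, hooks=None):
    """Start the named threads, then wait until every one of them is active."""
    hooks = hooks or {}
    threads = [SlcThread(n, stop_flag, log, hooks.get(n)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        if not t.active.wait(TIMEOUT):
            raise RuntimeError(f"{t.name} did not become active")
    return threads


def slc_start():
    """Run the staged start; return (stages, threads, stop_flag)."""
    stop_flag = threading.Event()
    db_exists = threading.Event()      # all flags start out false
    async_exists = threading.Event()
    log, stages, threads = [], [], []

    def stage(names, hooks=None, flag=None):
        first = len(log)
        threads.extend(start_stage(names, stop_flag, log, hooks))
        if flag is not None and not flag.wait(TIMEOUT):
            raise RuntimeError("required flag was never set")
        stages.append(set(log[first:]))  # names that came up in this stage

    # Assumption: dbHdlr sets dbExists and cstrAsync sets asyncExists.
    stage(["msgSend", "dbSend", "dbHdlr", "dbRecv"],
          {"dbHdlr": db_exists.set}, db_exists)
    stage(["cstrAsync"], {"cstrAsync": async_exists.set}, async_exists)
    stage(["msgHdlr", "mgntHdlr", "cstrHdlr", "timeHdlr", "bpmHdlr"])
    stage(["msgRecv"])
    print("started and at your service")
    return stages, threads, stop_flag


def slc_stop(threads, stop_flag):
    """Log, raise the stop flag, then wait for every thread to exit."""
    print("stopped and out of service")
    stop_flag.set()
    for t in threads:
        t.join(TIMEOUT)


if __name__ == "__main__":
    stages, threads, stop_flag = slc_start()
    slc_stop(threads, stop_flag)
```

Because each stage blocks until its threads report active, a later thread
never has to wait on an event flag for something an earlier stage provides,
which is the point of the proposed change.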