Current LCLS archiver data is stored by many archive engine processes in subdirectories of /arch/lcls, a disk buffer directory on the lcls-archeng local disk. This disk buffer must be large enough to hold at least two weeks of LCLS archiver data, but it is not designed for long-term storage. Each workday the LCLS archiver administrator causes every archive engine to be restarted. The restart closes the data/index files in the engine's previous current data directory in the lcls-archeng disk buffer, and the engine then stores new archiver data in a new current data directory. The administrator then invokes the scripts/update_server.pl script from the /nfs/slac/g/archiver/arch_lcls directory on the lcls-archsrv machine, which first copies the closed data/index files from the previous engine data directories to the NFS LCLS archiver long-term regular density data storage area (subdirectories of /nfs/slac/g/archiver/arch_lcls/lcls).
The update_server.pl script also updates the NFS LCLS archiver long-term regular density disk storage area engine indexes and the higher-level /nfs/slac/g/archiver/arch_lcls/lcls/master_index index after the copy operation is complete. This index and the current engine data directory indexes are used to update the current top-level LCLS archiver index (/nfs/slac/g/archiver/arch_lcls/current_and_all_index), which is continually updated by the lcls-archsrv /nfs/slac/g/archiver/arch_lcls/scripts/update_indicies_not_server.csh process.
This current top-level LCLS archiver index is used by an Archive Data Server process and by the Aida Channel Archiver data server processes to retrieve recent archiver data. Older regular density data is retrieved using saved older versions of the /nfs/slac/g/archiver/arch_lcls/lcls master index files (master_index_*).
Sparse density data is created to dramatically improve retrieval performance for older archiver data. By default, sparse density data is retrieved for archiver data older than the past two weeks. (Archive Viewer users, who obtain archiver data through an Archive Data Server process started to handle each request, and Aida Channel Archiver clients may override this default and retrieve only regular density data if desired.) Early each morning a cron job running on lcls-prod01 invokes /nfs/slac/g/archiver/arch_lcls/scripts/perform_sparcify_activities.pl. This script first uses the regular density archiver data copied since its previous invocation to create sparsified archiver data in subdirectories of /nfs/slac/g/archiver/sparce_arch_lcls/lcls. Next, it updates the sparse density disk storage area engine indexes and the higher-level /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index index. Finally, it updates the /nfs/slac/g/archiver/aida_indexes/lcls_sparce_indexes_info.txt file, one of two retrieval information files that determine which index files are used to retrieve data for a requested time interval.
The lcls_sparce_indexes_info.txt retrieval information file is used by both the Archive Data Server and the Aida Channel Archiver server processes to determine which indexes to use for a requested time interval when the default retrieval is made (regular density data for the past two weeks and sparsified density data for older data). The other retrieval information file in the /nfs/slac/g/archiver/aida_indexes directory, lcls_indexes_info.txt, is used when the user requests only regular density data for any specified time interval. These two ASCII text files share the same format, with three lines for each index that may be used for retrieval: the first line gives the location of the index, the second gives the start retrieval date/time for this index, and the third gives the end retrieval date/time for this index.
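To make the three-lines-per-index layout concrete, here is a minimal Python sketch of how a retrieval program might read one of these files and pick the indexes whose retrieval window overlaps a requested interval. The timestamp layout (TIME_FORMAT), the sample paths, and the helper names are assumptions for illustration only; the real Archive Data Server and Aida servers may parse the file and treat interval boundaries differently.

```python
from datetime import datetime
from typing import List, NamedTuple

# Assumption: layout of the start/end lines; adjust to match the real files.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


class IndexEntry(NamedTuple):
    path: str
    start: datetime
    end: datetime


def parse_index_info(text: str) -> List[IndexEntry]:
    """Parse groups of three non-blank lines: index location, start, end."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected a multiple of three non-blank lines")
    entries = []
    for i in range(0, len(lines), 3):
        path, start, end = lines[i:i + 3]
        entries.append(IndexEntry(path,
                                  datetime.strptime(start, TIME_FORMAT),
                                  datetime.strptime(end, TIME_FORMAT)))
    return entries


def indexes_for_interval(entries: List[IndexEntry],
                         start: datetime, end: datetime) -> List[str]:
    """Return the index paths whose retrieval window overlaps [start, end]."""
    return [e.path for e in entries if e.start <= end and e.end >= start]
```

A request spanning two retrieval windows returns both index paths, which is why a server would have to merge data from more than one index for such a request.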
There are frequent requests to modify the archiver configuration files. Usually the requests are for new PVs to be added to the LCLS archiver or for changes in the archiving mode or sampling rate for existing archiver PVs. There are also occasional requests to remove PVs from archiver configuration files.
There is an archiver configuration file for each of the LCLS archiver engines. These files are located on the lcls-archeng local disk and may be accessed from the lcls-archeng and lcls-archsrv machines. To access a configuration file from lcls-archeng, first log in to this machine using the laci account:
Refer to the "LCLS Channel Archiver PV Change Scripts Guide" document for information regarding how to modify archiver engine configuration files using scripts, which is the usual recommended archiver engine configuration file modification method. These configuration files are not modified by directly editing them in the /arch/lcls/lcls_nn directories but instead are modified in the /nfs/slac/g/archiver/lcls_pv_changes directory and then copied to the appropriate /arch/lcls/lcls_nn directories after backups are made (see the "Need for Backup Configuration Files and Recovery" section below).
After a change to an archiver engine configuration file, the corresponding archiver engine must be restarted for the change to take effect. This is done by stopping the archiver engine associated with the modified configuration file and allowing the LCLS Archive Daemon running on lcls-archeng to detect that the engine has stopped and restart it. The preferred way to stop an archiver engine is to send it a stop message by entering a URL in a browser.
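When several engines must be stopped, the browser request can also be scripted. The Python sketch below assumes that engine N listens on port 4900 + N (inferred from LCLS_1 using port 4901 in the examples in this document) and that the engine's web interface accepts a "stop" path; both are assumptions to verify against the actual URLs before use. The helper names are hypothetical.

```python
import urllib.request

ENGINE_HOST = "lcls-archeng"
BASE_PORT = 4900  # assumption: engine N listens on port 4900 + N (LCLS_1 -> 4901)


def engine_url(engine_number: int, path: str = "") -> str:
    """Build an engine web URL; 'path' is e.g. 'groups' or 'stop' (assumed)."""
    return f"http://{ENGINE_HOST}:{BASE_PORT + engine_number}/{path}"


def stop_engine(engine_number: int, timeout: float = 10.0) -> int:
    """Send the stop request to one engine and return the HTTP status code."""
    with urllib.request.urlopen(engine_url(engine_number, "stop"),
                                timeout=timeout) as resp:
        return resp.status
```

After calling stop_engine(1), the Archive Daemon web page should be checked as usual to confirm that the engine was restarted.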
For example, to stop engine 1, enter the following URL in a browser:
To monitor the status of the LCLS archiver engines and check whether an archiver has been restarted, enter the URL of the main LCLS Archive Daemon web page in a browser:
Occasionally, stopping an archive engine by entering its stop URL in a browser will not work. In that case, stop the engine with the UNIX kill command. First, find the process ID of the archive engine to be stopped by running the following command on the lcls-archeng machine under the laci account:
Also, after an engine has been stopped with the kill command, it is frequently necessary to remove the engine's lock file so that the LCLS Archive Daemon can restart the engine.
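The find-then-clean-up steps above can be sketched in Python as follows. find_pids scans "ps -ef" output (columns UID PID PPID C STIME TTY TIME CMD) for commands containing every given string, and remove_lock_files lists or deletes lock files in an engine directory. The "*.lck" lock file pattern is an assumption, as is the "-port" argument in the sample command line used for testing; check the real process listing and lock file name before deleting anything.

```python
import subprocess
from pathlib import Path
from typing import List


def find_pids(ps_output: str, *needles: str) -> List[int]:
    """PIDs from 'ps -ef' output whose command line contains every needle.

    Plain substring matching: a short needle such as a port number could
    also match a longer number, so inspect the result before killing.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # 'ps -ef' columns: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header or unexpected line
        command = " ".join(fields[7:])
        if all(needle in command for needle in needles):
            pids.append(int(fields[1]))
    return pids


def ps_ef() -> str:
    """Return the output of 'ps -ef' on the local machine."""
    return subprocess.run(["ps", "-ef"], capture_output=True, text=True,
                          check=True).stdout


def remove_lock_files(engine_dir: Path, pattern: str = "*.lck",
                      dry_run: bool = True) -> List[Path]:
    """List (dry_run=True) or delete lock files in an engine directory."""
    locks = sorted(engine_dir.glob(pattern))
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```

The same find_pids helper can locate the update archive index process described later, for example find_pids(ps_ef(), "indicies"). Defaulting remove_lock_files to a dry run is a deliberate safety choice.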
For example, to remove the lock file for archive engine 1 after it has been stopped using the kill command:
Frequently, people who request that PVs be added to the archiver do not specify the archiver mode to be used, the sample interval for sampled PVs, or the estimated change interval for monitored PVs. The requestor must specify the archiver mode for each PV: sampled or monitored. For each PV archived in sampled mode, the requestor must specify the sample interval in seconds. For each PV archived in monitored mode, an estimated change interval in seconds must be specified in the archiver configuration file. The requestor often does not know how often a monitored PV's value will change; in these cases, use either an estimated interval of 1 second or the same interval as similar PVs already in the archiver configuration file.
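The mode and interval supplied by the requestor end up in a channel entry of the engine's XML configuration file. The Python sketch below builds such an entry with xml.etree.ElementTree. The element names (channel, name, period, and an empty scan or monitor element) follow the EPICS Channel Archiver engine configuration format as commonly documented, but they are an assumption here; compare the output with existing entries in an lcls_nn-group.xml file before relying on it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def channel_element(pv: str, period_s: float, monitored: bool) -> ET.Element:
    """Build a <channel> entry.

    period_s is the sample interval for sampled (scan) PVs, or the
    estimated change interval for monitored PVs.
    """
    if period_s <= 0:
        raise ValueError("period must be positive")
    chan = ET.Element("channel")
    ET.SubElement(chan, "name").text = pv
    ET.SubElement(chan, "period").text = f"{period_s:g}"
    ET.SubElement(chan, "monitor" if monitored else "scan")
    return chan


# Usage: a monitored PV with the default 1 second estimated change interval.
print(ET.tostring(channel_element("VPIO:IN20:111:VRAW", 1, True),
                  encoding="unicode"))
```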
A message such as the following may be sent via email to these requestors explaining the differences between the sampled and monitored archiver modes and the information they need to supply:
It is important to notify the requestor via email after the request has been processed. If new archiver PVs were added, it is important to determine whether all of the new archiver PVs connected successfully and let the requestor know if this is the case. If one or more PVs did not connect successfully, the requestor should be sent a list of the names of these PVs.
To determine this information for new archiver PVs to be added, the easiest method is to determine how many channels were connected in the archiver groups to which more PVs are to be added before the change and after the change.
The number of channels (PVs) for an archive engine group can be found by selecting the link for the desired engine on the main Archive Daemon web page and then the "Groups" link on the subsequent web page. Alternatively, the URL for the groups of a specified engine may be entered directly with the following form:
For example, if 10 new archiver PVs are to be added to the LCLS_1 archive engine group "lcls_1_water", bring up the "http://lcls-archeng:4901/groups" web page before the change is made to determine the number of channels in the group and how many are currently connected. Suppose that 346 channels were in the group and 340 were connected before the 10 new archiver PVs were added. After they are added, this web page should indicate that 356 channels are in the group and 350 are connected. If the page does not show these expected totals, determine which channels are not connected by selecting the link on the page containing the group name (e.g., "lcls_1_water"); the names of channels (PVs) that are not connected are shown in red. One possible cause of a discrepancy between the actual and expected number of connected channels is one or more typos in channel names. Another possibility is that the signals associated with the new archiver PVs are not currently operational but will be in the future.
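The before/after bookkeeping in this example is simple arithmetic, shown here as a small Python sketch (hypothetical helper names). unconnected_new_pvs assumes that the connection state of the PVs already in the group did not change between the two readings of the web page.

```python
from typing import NamedTuple


class GroupCounts(NamedTuple):
    total: int
    connected: int


def expected_after_add(before: GroupCounts, n_new: int) -> GroupCounts:
    """Counts expected if all n_new PVs are added and every one connects."""
    return GroupCounts(before.total + n_new, before.connected + n_new)


def unconnected_new_pvs(before: GroupCounts, after: GroupCounts, n_new: int) -> int:
    """Estimate how many new PVs failed to connect.

    Assumes the PVs already in the group kept their connection state.
    """
    if after.total != before.total + n_new:
        raise ValueError("group total changed by an unexpected amount")
    return (before.connected + n_new) - after.connected
```

With the numbers above, expected_after_add(GroupCounts(346, 340), 10) gives 356 channels with 350 connected; a later reading of 348 connected would point to 2 new PVs to look up in red on the group page.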
Obviously, one should be very careful when modifying an archiver configuration file. A mistake made in editing one of these XML files will prevent the associated archiver engine from restarting. Even when the automated tools described in the "LCLS Channel Archiver PV Change Scripts Guide" are used to modify the archiver configuration files, it is occasionally necessary to back out a configuration file change. Therefore, a disciplined approach to maintaining backup configuration files must be observed. The convention used is to retain the two previous versions of each archiver configuration file, with the file name suffixes ".prev" and ".prev2".
For example, the archiver configuration files and their backups for the LCLS_1 archiver are located in the /arch/lcls/lcls_1 directory:
Before copying a modified archiver engine configuration file from the /nfs/slac/g/archiver/lcls_pv_changes directory to an engine's production configuration file directory: (1) copy the first backup to the second backup, and (2) copy the current configuration file to the first backup. Then copy the modified archiver configuration file from the /nfs/slac/g/archiver/lcls_pv_changes directory to the engine's production configuration file directory. It is also useful to examine the differences between the previous version and the new version of the configuration file to verify that the expected changes were made. For example,
After the configuration file has been modified, the first backup may be used to quickly restore operation of the associated archive engine if a problem occurs using the new configuration file.
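The backup rotation and difference check described above can be sketched in Python as follows (hypothetical function name; the .prev/.prev2 suffixes are the convention described in this section). The returned unified diff plays the role of running diff between the first backup and the newly installed file.

```python
import difflib
import shutil
from pathlib import Path
from typing import List


def install_config(new_file: Path, prod_file: Path) -> List[str]:
    """Rotate backups, install new_file, and return a unified diff.

    Order: <name>.prev -> <name>.prev2, current -> <name>.prev,
    new_file -> current.
    """
    prev = prod_file.with_name(prod_file.name + ".prev")
    prev2 = prod_file.with_name(prod_file.name + ".prev2")
    if prev.exists():
        shutil.copy2(prev, prev2)
    shutil.copy2(prod_file, prev)
    shutil.copy2(new_file, prod_file)
    return list(difflib.unified_diff(
        prev.read_text().splitlines(keepends=True),
        prod_file.read_text().splitlines(keepends=True),
        fromfile=str(prev), tofile=str(prod_file)))
```

Printing "".join(install_config(...)) shows only the lines that changed, which makes it easy to confirm that nothing beyond the requested PV changes was introduced.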
For example, if a mistake is made in modifying the /arch/lcls/lcls_1/lcls_1-group.xml archiver configuration file, the LCLS Archive Daemon will be unable to restart the LCLS_1 archive engine after it has been stopped. This can be seen by entering the following URL in a browser:
For example, under the "Messages" section on this web page messages such as the following will be seen approximately every 30 seconds if there is a problem restarting the LCLS_1 archive engine due to a bad /arch/lcls/lcls_1/lcls_1-group.xml archiver configuration file: "Starting Engine 'LCLS_1': lcls-archeng:4901". If a message such as this appears more than once after stopping an archive engine, do the following immediately to (1) save the bad configuration file under a different name, and (2) copy the first backup to the current configuration file, which should allow the archive engine to be restarted successfully:
Then one can determine the location of at least the first error in the bad configuration file by finding a log file created when the LCLS Archive Daemon was unable to start the associated archive engine using this file:
In the above example, line 13 of lcls_1-group.xml.bad should be examined to determine the error in the XML configuration file. The error may then be corrected in the lcls_1-group.xml.bad file and the command "cp lcls_1-group.xml.bad lcls_1-group.xml" issued. Then stop the LCLS_1 archive engine again, which causes the LCLS Archive Daemon to attempt to restart the LCLS_1 archive engine using the corrected configuration file.
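The recovery steps and the search for the first error can be sketched in Python. restore_previous saves the bad file as <name>.bad and reinstates the first backup; first_xml_error reports the line and column of the first XML well-formedness error, which can be run on a candidate file before it is installed. This is a hedged sketch with hypothetical function names: it catches only well-formedness errors (for example, a mismatched tag), not a misspelled element name that is well-formed but unacceptable to the archive engine.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional, Tuple


def restore_previous(prod_file: Path) -> Path:
    """Save the bad configuration as <name>.bad and reinstate <name>.prev."""
    bad = prod_file.with_name(prod_file.name + ".bad")
    prev = prod_file.with_name(prod_file.name + ".prev")
    if not prev.exists():
        raise FileNotFoundError(prev)
    shutil.copy2(prod_file, bad)
    shutil.copy2(prev, prod_file)
    return bad


def first_xml_error(path: Path) -> Optional[Tuple[int, int]]:
    """(line, column) of the first well-formedness error, or None if it parses."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        return err.position
    return None
```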
Users may request the display of a plot of archiver data for a PV referenced on a LCLS EDM screen (accessed through the "lclshome" main EDM screen). When the retrieval of archiver data fails as the result of such a request, a line containing the timestamp of the request and the name of the PV is appended to the end of the /u1/lcls/tools/ArchiveBrowser/toBeArchivedList file (which may be accessed from a machine such as lcls-builder). The archiver administrator should process this file periodically (e.g., once per week) to add PVs from this list to LCLS archiver configuration files when appropriate.
Note that there is a similar file for FACET, /u1/lcls/facet/ArchiveBrowser/toBeArchivedList, that should also be processed periodically while FACET is running. This file does not grow as quickly as the corresponding LCLS file.
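Because the same PV is often requested many times, it can help to condense a toBeArchivedList file into a list of distinct PVs, most-requested first, skipping PVs already known to be archived. The Python sketch below assumes that each line ends with the PV name, preceded by the request timestamp and separated by whitespace; the exact line format is not specified here, so check a few lines of the real file first. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def pending_pvs(lines: Iterable[str], already_archived: Set[str]) -> List[Tuple[str, int]]:
    """Count requests per PV, skipping archived PVs; most-requested first.

    Assumes the PV name is the last whitespace-separated field of each line.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        pv = fields[-1]
        if pv not in already_archived:
            counts[pv] += 1
    return counts.most_common()
```

For example, pending_pvs(Path("/u1/lcls/tools/ArchiveBrowser/toBeArchivedList").read_text().splitlines(), set()) would summarize the LCLS file (Path from pathlib).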
A PV in a toBeArchivedList file does not necessarily need to be archived. In general, the archiver administrator should judge whether a PV in this file can be excluded from consideration for archiving (based on past experience with similar naming patterns) or needs to be forwarded to the responsible IOC engineer for a decision. If the responsible IOC engineer decides that a PV forwarded to him/her needs to be archived, he/she needs to indicate the archiving method. However, some PVs do not have a responsible Controls Software IOC engineer (e.g., SLC system PVs and photon [including FEE] PVs). These PVs are usually archived by the archiver administrator using the same archiving method as other similar PVs, or a method based on the administrator's judgement (formed, for example, by running "camonitor" for the PV on lcls-archeng).
The following procedure may be followed to process a toBeArchivedList file:
It is good practice to reply to the requestor as soon as possible after modifying one or more configuration files, informing him/her that the configuration file change request has been processed. When new archiver PVs are added, it is also good practice to tell the requestor about any new archiver PVs that did not connect successfully if the requestor might expect all of them to connect.
If the list of new archiver PVs is large, it may be useful to save the connection status of the group to which PVs are added to a file and process that file with a script that determines which PVs are not connected. After one selects the link for the engine's group name of interest on the Archive Daemon web interface, one may save the group PV connection statuses to a file using the Firefox browser "File => Save Page As..." command. The created file may then be processed by the /nfs/slac/g/archiver/lcls_pv_changes/create_not_connected_list2.pl Perl script to create a text file of the PVs in the group that are not connected.
The LCLS Archive Daemon usually restarts each engine every workday with restart times controlled by the /arch/archiveconfig.xml file. The Archive Daemon startup script, /arch/scripts/ArchiveDaemon.pl, has been modified to not restart each engine on weekends when the active archiveconfig.xml file specifies daily restarts. This was done to avoid the restart of engines when there is no archiver administrator available to respond to any problem that might occur. For this reason it is also desirable not to restart each engine on holidays. The ArchiveDaemon.pl script has also been modified to read the /arch/holidays.txt file during a restart of the Archive Daemon process in order to determine other days (besides weekends) it is desired not to restart the engines.
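The restart rule (no daily restarts on weekends or listed holidays) can be illustrated with a short Python sketch. ArchiveDaemon.pl itself is a Perl script, and the format of /arch/holidays.txt is not described here, so this sketch assumes one ISO date (YYYY-MM-DD) per line with optional "#" comments; the function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Set


def load_holidays(lines: Iterable[str]) -> Set[date]:
    """Parse one ISO date (YYYY-MM-DD) per line; '#' starts a comment (assumed format)."""
    days = set()
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if text:
            days.add(date.fromisoformat(text))
    return days


def restart_allowed(day: date, holidays: Set[date]) -> bool:
    """Daily restarts are skipped on Saturdays, Sundays, and listed holidays."""
    return day.weekday() < 5 and day not in holidays
```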
After the /arch/holidays.txt file is modified, the Archive Daemon process must be restarted. The /arch/scripts/archive_engine_monitor.pl process should also be restarted, since it also reads the holidays.txt file during initialization.
Every workday the LCLS engines on the lcls-archeng machine are normally restarted starting at 10:00 AM, at one-minute intervals. The /arch/scripts/archive_engine_monitor.pl process running on this machine monitors the engine restarts and, upon successful completion, writes a file of the form date_ready_for_copy_and_index.txt (where "date" is today's date in yyyy_mm_dd format) in the /arch/log area. This normally occurs at approximately 10:19 AM, and email is then sent to the archiver administrator(s) indicating that all engines have been successfully restarted.
The lcls-archsrv /nfs/slac/g/archiver/arch_lcls/scripts/auto_update_server.pl process checks the /arch/log area (mounted lcls-archeng local disk space) every minute, looking for a file in the above format with the current day's date. When it detects such a file, it begins copying the recently closed engine directory data and rebuilding the NFS regular density indexes (in the /nfs/slac/g/archiver/arch_lcls subdirectories). After this task has completed, email is sent to the archiver administrator(s) indicating that the LCLS archiver copy and index rebuild finished.
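The handshake between the two machines relies only on the marker file name. The Python sketch below reproduces that name for a given day and polls for it once per minute, mirroring the behavior described above. It is an illustration with hypothetical function names, not the code of auto_update_server.pl; the max_polls parameter is added so the sketch can give up instead of waiting forever.

```python
import time
from datetime import date
from pathlib import Path
from typing import Optional

LOG_DIR = Path("/arch/log")  # lcls-archeng log area, as mounted on lcls-archsrv


def ready_file_name(day: date) -> str:
    """Marker written by archive_engine_monitor.pl after all engines restart."""
    return f"{day:%Y_%m_%d}_ready_for_copy_and_index.txt"


def wait_for_ready(log_dir: Path = LOG_DIR, poll_s: float = 60.0,
                   max_polls: Optional[int] = None) -> Path:
    """Poll log_dir every poll_s seconds until today's marker file appears."""
    polls = 0
    while True:
        marker = log_dir / ready_file_name(date.today())
        if marker.exists():
            return marker
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"{marker} not found after {polls} polls")
        time.sleep(poll_s)
```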
During the process of changing the LCLS top-level archiver index (as described in the section below), it may be desirable to override this default automation in order to restart the engines before the usual 10:00 AM start time and initiate the data copy/index rebuild task sooner than the usual 10:19 AM. The following procedure describes this process:
After changing the LCLS Top-Level Archiver Index, the default automation processing can be restored:
Archiver indexes must not be allowed to exceed 2 GB. Within a matter of weeks after the LCLS top-level archiver index was last changed, it must be changed again before it reaches the 2 GB size limit. Several hours are required to change this index.
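Since the index must be changed before it reaches the limit, it helps to check its size periodically. The Python sketch below reports how much of the limit an index file uses. It assumes that "2 GB" means 2**31 bytes, and the 90% warning threshold is an arbitrary choice; both are labeled assumptions, and the function names are hypothetical.

```python
import os

# Assumption: "2 GB" is taken to mean 2**31 bytes.
INDEX_LIMIT_BYTES = 2 * 1024 ** 3


def index_fraction_used(path: str, limit: int = INDEX_LIMIT_BYTES) -> float:
    """Fraction of the size limit already used by an index file."""
    return os.path.getsize(path) / limit


def index_needs_change(path: str, warn_fraction: float = 0.9,
                       limit: int = INDEX_LIMIT_BYTES) -> bool:
    """True once the index has reached warn_fraction of the limit (threshold assumed)."""
    return index_fraction_used(path, limit) >= warn_fraction
```

For example, index_fraction_used("/nfs/slac/g/archiver/arch_lcls/current_and_all_index") gives the current fraction for the top-level index; leave enough margin for the several hours the change takes.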
http://lcls-archeng:4900 and confirm that the engines are indeed restarting the next day.
and then, to remove the data for April 2013, we'd do:
ssh lcls-archeng -l laci
cd /arch/lcls
followed, for each engine (lcls_xx), by:
pushd /arch/lcls/lcls_xx/2013
pwd
ls -d 04_*
rm -rf 04_*
ls -d 04_*
popd
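Repeating those commands for every engine is error-prone, so the same steps can be sketched in Python across all engine directories at once. The layout /arch/lcls/lcls_NN/YYYY/MM_* is inferred from the commands above and should be confirmed with the "ls -d" step first. The sketch defaults to a dry run that only lists the matching directories; the function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

ARCH_ROOT = Path("/arch/lcls")


def month_dirs(root: Path, year: int, month: int) -> List[Path]:
    """List <root>/lcls_*/<year>/<MM>_* directories (the 'ls -d 04_*' step)."""
    return sorted(p for p in root.glob(f"lcls_*/{year}/{month:02d}_*") if p.is_dir())


def remove_month(root: Path, year: int, month: int, dry_run: bool = True) -> List[Path]:
    """Delete the matching directories (the 'rm -rf 04_*' step) unless dry_run."""
    targets = month_dirs(root, year, month)
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

Running remove_month(ARCH_ROOT, 2013, 4) first and reviewing the returned list before repeating the call with dry_run=False corresponds to the "ls -d" check before "rm -rf".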
The majority of users obtain data from the LCLS archiver system using the LCLS Archive Viewer. This may be started from the lcls-builder machine as follows:
An update archive index process running on the lcls-archsrv machine should be constantly updating the main LCLS archive index, /nfs/slac/g/archiver/arch_lcls/current_and_all_index.
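A quick way to judge whether the index is being updated is to compare its modification time with the current time. The Python sketch below does this; the 10-minute staleness threshold is an assumption, not a documented update interval, and the function names are hypothetical.

```python
import os
import time
from typing import Optional

MAIN_INDEX = "/nfs/slac/g/archiver/arch_lcls/current_and_all_index"


def index_age_seconds(path: str = MAIN_INDEX, now: Optional[float] = None) -> float:
    """Seconds since the index file was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def index_is_stale(path: str = MAIN_INDEX, max_age_s: float = 600.0) -> bool:
    """True if the index has not been modified within max_age_s (threshold assumed)."""
    return index_age_seconds(path) > max_age_s
```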
To check whether this index has been updated lately, perform the following commands on lcls-archsrv or any flora machine, for example:
Finally, if the above tests were successful but the main LCLS archive index is not being updated, check to make sure the update archive index process is running on the lcls-archsrv machine:
You should see information from the above command indicating that a process is running which is executing a command containing the string "indicies" ("/nfs/slac/g/archiver/arch_lcls/scripts/update_indicies_not_server.csh", for example). If such a command is not running, start it on the lcls-archsrv machine running under the laci account:
All of the archiver engine processes should be running at almost all times on the lcls-archeng machine. The easiest method to check whether these archive engine processes are running is to use the main LCLS archiver web page:
This web page also indicates the status of each engine: the number of channels currently connected and the total number of channels belonging to the engine (e.g., "6376/6426 channels connected", displayed in red).
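When the status text is copied from the page (or saved with the browser), the connected and total counts can be extracted with a regular expression, as in this Python sketch (hypothetical function name). The difference between the two numbers is the number of channels not connected.

```python
import re
from typing import Tuple

_STATUS = re.compile(r"(\d+)\s*/\s*(\d+)\s+channels\s+connected")


def parse_connection_status(text: str) -> Tuple[int, int]:
    """Extract (connected, total) from text such as '6376/6426 channels connected'."""
    match = _STATUS.search(text)
    if match is None:
        raise ValueError(f"no connection status in {text!r}")
    return int(match.group(1)), int(match.group(2))
```

For the example above, parse_connection_status("6376/6426 channels connected") returns (6376, 6426), so 50 channels are not connected.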
If this web page cannot be accessed, it is likely that the LCLS Archive Daemon process is not running on the lcls-archeng machine. To determine whether this is the case:
This last command should show information such as the following if the LCLS Archive Daemon process is running:
If the main LCLS archiver web page cannot be accessed, the following procedure should enable you to access it whether or not the LCLS Archive Daemon process was running previously. It will also cause all of the LCLS archive engines to be stopped and then restarted by the new LCLS Archive Daemon process. The procedure assumes that you are on the lcls-archeng machine logged in using the laci account:
After restarting the LCLS Archive Daemon process, one should check whether this process has successfully restarted all of the LCLS archive engines. This is done through the following web page:
It may take a minute or two after the LCLS Archive Daemon is restarted for this web page to indicate that each LCLS archive engine is running. For each LCLS archive engine, a red "Not running" message usually means that the system has not finished detecting whether the engine is running. The indication that an LCLS archive engine is running is also shown in red and gives the number of channels connected and the total number of channels in the archiver configuration file for that engine (e.g., "6376/6426 channels connected").
If a message appears indicating that a lock file may be present, remove all lock files for the associated archive engine. For example, for the LCLS_1 archive engine:
Each time an LCLS archive engine is restarted, new ".log" and ".out" files are created in the main directory for the archiver (e.g., /arch/lcls/lcls_1 for the LCLS_1 archive engine). The name of each ".log" and ".out" file reflects the timestamp when the engine was started (e.g., 2008_05_20-16_00_16.log and 2008_05_20-16_00_16.out). The log file contains useful diagnostic information, including information about any errors that occurred during archiving. The ".out" file is usually empty.
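Because the start timestamp is encoded in each file name, the most recent log for an engine can be found from the names alone. The Python sketch below parses the yyyy_mm_dd-hh_mm_ss pattern shown above and ignores ".log" files that do not follow it; the function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

STAMP_FORMAT = "%Y_%m_%d-%H_%M_%S"  # e.g. 2008_05_20-16_00_16


def log_start_time(log_file: Path) -> datetime:
    """Engine start time encoded in a '<timestamp>.log' file name."""
    return datetime.strptime(log_file.stem, STAMP_FORMAT)


def latest_log(engine_dir: Path) -> Optional[Path]:
    """Most recent engine .log file, judged by the timestamp in its name."""
    logs = []
    for path in engine_dir.glob("*.log"):
        try:
            logs.append((log_start_time(path), path))
        except ValueError:
            continue  # skip .log files that do not follow the naming convention
    return max(logs)[1] if logs else None
```

For example, latest_log(Path("/arch/lcls/lcls_1")) returns the log written by the most recent LCLS_1 restart.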
The Archive Export utility may be used to view archive data in tabular form. This is occasionally useful as a diagnostic tool when it is desired to see individual data sample timestamps and values rather than the graphical archive data available through the Archive Viewer.
The Archive Export utility may be run from the lcls-archsrv machine. Type "./ArchiveExport" to obtain help information for running the Archive Export utility. An example of its use is obtaining archive data for a PV ("VPIO:IN20:111:VRAW") from a specified start time to a specified end time using the main LCLS archive index:
On rare occasions, the main LCLS archiver index has become corrupted. This may be a suspected cause if the Archive Viewer returns an error when attempting to retrieve data. This problem can be confirmed by a failure to retrieve data using the Archive Export utility (see above for directions on its use).
The first step in rebuilding the main LCLS archive index is to stop the process that is updating the index. First, find the process id of the update archive index process:
Once the process id of the update archive index process has been determined using this command, stop the process corresponding to this process id using the UNIX kill command:
Next, copy the corrupted archive index file to another name just in case it needs to be examined later:
Then rebuild the main archive index but do not cause it to be rebuilt continually:
At this point, test the newly built main archive index (/nfs/slac/g/archiver/arch_lcls/current_and_all_index) by plotting data for one or more PVs using the Archive Viewer.
If this is successful, restart the process that rebuilds the main LCLS archive index continually:
The Apache web server used to support the Archive Data Server operation has now been installed locally on lcls-archsrv. The Apache web server initialization script is:
The apachectl script is located at:
The Apache configuration file is located at:
The Apache access and error log files are in:
If archiver data may be retrieved using the Archive Export utility but not with the Archive Viewer, there may be a problem activating the Archive Data Server used by the Archive Viewer to retrieve data.
This will occur if the Apache web server is not running on lcls-archsrv. To determine if it is running:
The result of this "ps" command should show several web server processes. If they are not running, a system administrator should be able to restart the Apache web server on lcls-archsrv using the startup script, /etc/init.d/httpd.
The Archive Data Server is started as a CGI script by the lcls-archsrv Apache web server when the Archive Viewer makes a data request (e.g., during plotting). The following script is invoked on lcls-archsrv:
Author: Bob Hall 20-May-2008
Rev: Bob Hall 16-Sep-2009 Added information for new local installation of lcls-archsrv Apache web server.
Rev: Bob Hall 17-Jul-2010 Extensively revised.