LCLS Channel Archiver Operation

Archiver System Overview

Current LCLS archiver data is written by many archive engine processes to subdirectories of /arch/lcls, the local disk buffer on lcls-archeng. This disk buffer must be large enough to hold at least two weeks of LCLS archiver data, but it is not designed for long-term storage of LCLS archiver data. Each workday, the LCLS archiver administrator has every archive engine restarted. This closes the data/index files in the previous current engine data directories in the lcls-archeng local disk buffer and starts storing new archiver data in new current engine data directories. The administrator then invokes the scripts/update_server.pl script from the /nfs/slac/g/archiver/arch_lcls directory on lcls-archsrv. This script first copies the closed data/index files from the previous current engine data directories to the NFS LCLS archiver long-term regular density data storage area (subdirectories of /nfs/slac/g/archiver/arch_lcls/lcls).

After the copy operation is complete, the update_server.pl script also updates the engine indexes in the NFS LCLS archiver long-term regular density storage area and the higher-level /nfs/slac/g/archiver/arch_lcls/lcls/master_index index. This index and the current engine data directory indexes are used to update the current top-level LCLS archiver index (/nfs/slac/g/archiver/arch_lcls/current_and_all_index), which is continually updated by the /nfs/slac/g/archiver/arch_lcls/scripts/update_indicies_not_server.csh process running on lcls-archsrv.

This current top-level LCLS archiver index is used by an Archive Data Server process and the Aida Channel Archiver data server processes to retrieve recent archiver data. Older regular density data is retrieved using saved older versions of the /nfs/slac/g/archiver/arch_lcls/lcls master index files (master_index_*).

Sparse density data is created to dramatically improve retrieval performance for older archiver data. By default, sparse density data is retrieved for archiver data older than the past two weeks. Archive Viewer users (who obtain archiver data through an Archive Data Server process started to handle each request) and Aida Channel Archiver clients may override this default and retrieve only regular density data. Early each morning a cron job on lcls-prod01 invokes /nfs/slac/g/archiver/arch_lcls/scripts/perform_sparcify_activities.pl, which first uses the regular density archiver data copied since its previous invocation to create sparsified archiver data in subdirectories of /nfs/slac/g/archiver/sparce_arch_lcls/lcls. Next, the script updates the sparse density disk storage area engine indexes and the higher-level /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index index. Finally, it updates the /nfs/slac/g/archiver/aida_indexes/lcls_sparce_indexes_info.txt file, one of two retrieval information files that determine which index files are used to retrieve data for a requested time interval.

The lcls_sparce_indexes_info.txt retrieval information file is used by both the Archive Data Server and the Aida Channel Archiver server processes to determine which indexes to use for the default retrieval, which returns regular density archiver data for the past two weeks and sparsified density data for older archiver data. The other retrieval information file in the /nfs/slac/g/archiver/aida_indexes directory, lcls_indexes_info.txt, is used when the user requests only regular density data for any specified time interval. These two ASCII text files have the same format: three lines for each index that may be used for retrieval, where the first line gives the location of the index, the second line gives the start retrieval date/time for this index, and the third line gives the end retrieval date/time for this index.
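To make the three-lines-per-index format concrete, here is a minimal sketch of how such a file could be read and how the indexes covering a requested interval could be picked out. It is not the Archive Data Server or Aida code. The date layout ("MM/DD/YYYY HH:MM:SS") and the open-ended end time "now" are taken from the example entries later in this document; skipping blank lines and treating each range as half-open are assumptions of this sketch, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Date layout as in this document's examples, e.g. "04/10/2013 00:00:00".
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"

# (index path, start, end); end is None when the file says "now".
IndexEntry = Tuple[str, datetime, Optional[datetime]]


def parse_time(text: str) -> Optional[datetime]:
    """Return None for the open-ended value 'now', else the parsed time."""
    text = text.strip()
    if text.lower() == "now":
        return None
    return datetime.strptime(text, DATE_FORMAT)


def parse_indexes_info(text: str) -> List[IndexEntry]:
    """Group non-blank lines into (location, start, end) triples."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per index entry")
    entries: List[IndexEntry] = []
    for i in range(0, len(lines), 3):
        start = parse_time(lines[i + 1])
        if start is None:
            raise ValueError("a start time may not be 'now'")
        entries.append((lines[i], start, parse_time(lines[i + 2])))
    return entries


def indexes_for_interval(entries: List[IndexEntry], begin: datetime,
                         end: datetime) -> List[str]:
    """Return locations whose [start, end) range overlaps [begin, end)."""
    return [path for path, start, stop in entries
            if start < end and (stop is None or begin < stop)]
```

A request spanning the boundary between two entries returns both index locations, in file order, which is why each entry's start/end times must line up with its neighbours.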

Modifying LCLS Channel Archiver Configuration Files

There are frequent requests to modify the archiver configuration files. Usually the requests are for new PVs to be added to the LCLS archiver or for changes in the archiving mode or sampling rate for existing archiver PVs. There are also occasional requests to remove PVs from archiver configuration files.

Basic Operations

There is an archiver configuration file for each of the LCLS archiver engines. These files are located on the local disk of lcls-archeng and may be accessed from the lcls-archeng and lcls-archsrv machines. To access a configuration file on lcls-archeng, log in to this machine (via lcls-archsrv) using the laci account:

  1. ssh lcls-archsrv -l laci
  2. ssh lcls-archeng -l laci
  3. cd /arch/lcls/lcls_nn
    where "nn" is a single or double digit engine number.

Refer to the "LCLS Channel Archiver PV Change Scripts Guide" document for information on modifying archiver engine configuration files using scripts, which is the recommended method. These configuration files are not edited directly in the /arch/lcls/lcls_nn directories. Instead, they are modified in the /nfs/slac/g/archiver/lcls_pv_changes directory and then copied to the appropriate /arch/lcls/lcls_nn directories after backups are made (see the "Need for Backup Configuration Files and Recovery" section below).

After a change to an archiver engine configuration file, the corresponding archiver engine must be restarted for the change to take effect. To do this, stop the archiver engine associated with the modified configuration file and allow the LCLS Archive Daemon running on lcls-archeng to detect that the engine has stopped and restart it. The preferred way to stop an archiver engine is to send it a stop message by entering a URL in a browser.

For example, to stop engine 1, enter the following URL in a browser:

To monitor the status of the LCLS archiver engines and check whether an archiver engine has been restarted, enter the URL of the main LCLS Archive Daemon web page in a browser: http://lcls-archeng:4900

If an archiver has been stopped, this web page should indicate that the LCLS Archive Daemon has detected that it has stopped and restarted the engine. A message such as "Starting Engine 'LCLS_1': lcls-archeng:4901" should appear on this web page several seconds after the archiver has been stopped and the following should appear in red on this web page on the line corresponding to the stopped archive engine: "Not Running". After several more seconds, this should be replaced by an indication in red of how many channels (PVs) are connected out of the total number of channels in the archiver list (e.g., "6376/6426 channels connected").
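If engine status is checked from a script rather than by eye, the channel count shown after a restart can be pulled out of the page text. This is a minimal sketch that assumes only the wording quoted above ("6376/6426 channels connected" and "Not Running"); the surrounding page markup is not assumed, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches status text such as "6376/6426 channels connected".
STATUS_PATTERN = re.compile(r"(\d+)\s*/\s*(\d+)\s+channels connected")


def parse_engine_status(text: str) -> Optional[Tuple[int, int]]:
    """Return (connected, total), or None when no channel count is shown
    (for example while the engine line still reads "Not Running")."""
    match = STATUS_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For the example above, `total - connected` gives 50 channels not yet connected.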

Occasionally, stopping an archive engine by entering its stop URL in a browser will not work. In that case, stop the engine with the UNIX kill command. First, find the process ID of the archive engine to be stopped using the following command on the lcls-archeng machine under the laci account:

 

Once the process id of the archive engine to be stopped has been determined using this command, stop the process corresponding to this process id using the UNIX kill command:

After an engine has been stopped with the kill command, it is frequently necessary to remove the engine's lock file so that the LCLS Archive Daemon can restart the engine.

For example, to remove the lock file for archive engine 1 after it has been stopped using the kill command:

  1. cd /arch/lcls/lcls_1
  2. rm *.lck
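The two commands above can also be expressed as a small helper, for example when several engine directories need checking after a cleanup. This is a minimal sketch assuming the /arch/lcls/lcls_N directory layout described in this section; the function names are hypothetical. It defaults to a dry run so the matching lock files can be reviewed before anything is deleted, and it should only be used once the engine is confirmed stopped.

```python
from pathlib import Path
from typing import List

# Directory layout from this section: engine N lives in /arch/lcls/lcls_N.
ENGINE_ROOT = Path("/arch/lcls")


def engine_dir(engine_number: int, root: Path = ENGINE_ROOT) -> Path:
    """Return the local-disk directory of one archive engine."""
    return root / f"lcls_{engine_number}"


def remove_lock_files(directory: Path, dry_run: bool = True) -> List[Path]:
    """Equivalent of 'rm *.lck' in one engine directory.

    With dry_run=True (the default) nothing is deleted; the matching
    files are only returned so they can be reviewed first.
    """
    locks = sorted(directory.glob("*.lck"))
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```

Usage: `remove_lock_files(engine_dir(1))` lists the lock files for LCLS_1, and `remove_lock_files(engine_dir(1), dry_run=False)` removes them.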

Communicating with Requestors

Frequently, people who request that PVs be added to the archiver do not specify the archiving mode, the sample interval for sampled PVs, or the estimated change interval for monitored PVs. The requestor must specify the archiving mode for each PV: sampled or monitored. For each PV archived in sampled mode, the requestor must specify the sample interval in seconds. For each PV archived in monitored mode, an estimated change interval in seconds must be specified in the archiver configuration file. Requestors often do not know how often a monitored PV's value will change; in these cases, use either an estimated interval of 1 second or the same interval as similar PVs already in the archiver configuration file.
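The information a requestor must supply maps onto one channel entry per PV. The sketch below builds such an entry with Python's standard `xml.etree.ElementTree`. The `<channel>`, `<name>`, `<period>`, `<scan/>` and `<monitor/>` element names are an assumption modeled on the generic EPICS ChannelArchiver engine configuration, not taken from this document; compare against an existing lcls_nn-group.xml file before relying on them. The PV names in the usage line are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# From this section: if a monitored PV's change rate is unknown,
# use 1 second (or match similar PVs already being archived).
DEFAULT_MONITOR_PERIOD = 1.0


def channel_element(pv_name: str, mode: str,
                    period: Optional[float] = None) -> ET.Element:
    """Build one channel entry (ASSUMED ChannelArchiver-style layout).

    mode is "scan" (sampled; period = sample interval in seconds) or
    "monitor" (period = estimated change interval in seconds).
    """
    if mode not in ("scan", "monitor"):
        raise ValueError("mode must be 'scan' or 'monitor'")
    if period is None:
        if mode == "scan":
            raise ValueError("sampled PVs need an explicit sample interval")
        period = DEFAULT_MONITOR_PERIOD
    if period <= 0:
        raise ValueError("period must be positive")
    channel = ET.Element("channel")
    ET.SubElement(channel, "name").text = pv_name
    ET.SubElement(channel, "period").text = f"{period:g}"
    ET.SubElement(channel, mode)  # empty <scan/> or <monitor/> marker
    return channel
```

For example, `ET.tostring(channel_element("EXAMPLE:PV:1", "monitor"), encoding="unicode")` produces a monitored entry with the default 1 second period. Refusing a sampled PV without an interval mirrors the rule that the requestor must supply it.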

A message such as the following may be sent via email to these requestors explaining the differences between the sampled and monitored archiver modes and the information they need to supply:

It is important to notify the requestor via email after the request has been processed. If new archiver PVs were added, it is important to determine whether all of the new archiver PVs connected successfully and let the requestor know if this is the case. If one or more PVs did not connect successfully, the requestor should be sent a list of the names of these PVs.

The easiest way to determine this is to compare the number of connected channels in each archiver group receiving new PVs before and after the change.

The number of channels (PVs) for an archive engine group can be found by selecting the link for the desired engine on the main Archive Daemon web page and then the "Groups" link on the subsequent web page. Alternatively, the URL for the groups of a specified engine may be entered directly with the following form:

For example, if 10 new archiver PVs are to be added to the LCLS_1 archive engine group "lcls_1_water", bring up the "http://lcls-archeng:4901/groups" web page before the change is made to determine the number of channels in the group and how many are currently connected. Suppose that 346 channels were in the group and 340 were connected before adding the 10 new archiver PVs. After adding the new PVs, this web page should indicate that 356 channels are now in the group and 350 are connected. If the page does not reflect these expected totals, you can determine which channels are not connected by selecting the link on this web page containing the group name (e.g., "lcls_1_water"). The names of channels (PVs) that are not connected are shown in red. One possible cause of a discrepancy between the actual and expected number of connected channels is one or more typos in channel names. Another possibility is that the signals associated with the new archiver PVs are not currently operational but will be in the future.
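The before/after comparison in this example is simple arithmetic; writing it down makes the expectation explicit. This is a minimal sketch with hypothetical helper names. It assumes that no channels already in the group change their connection state between the two readings, so any shortfall is attributed to the new PVs.

```python
from typing import Tuple


def expected_after_adding(total_before: int, connected_before: int,
                          added: int) -> Tuple[int, int]:
    """Group totals expected if every newly added PV connects."""
    return total_before + added, connected_before + added


def unexpected_disconnects(expected_connected: int, connected_now: int) -> int:
    """Number of new PVs that appear not to have connected (0 if none)."""
    return max(expected_connected - connected_now, 0)
```

With the numbers above, `expected_after_adding(346, 340, 10)` gives `(356, 350)`; if the page then shows 347 connected, three of the new PVs need to be checked for typos or inactive signals.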

Need for Backup Configuration Files and Recovery

Obviously, one should be very careful when modifying an archiver configuration file. A mistake made in editing one of these XML files will prevent the associated archive engine from restarting. Even if the automated tools for modifying archiver configuration files described in the "LCLS Channel Archiver PV Change Scripts Guide" are used, it is occasionally necessary to back out an archiver configuration file change. Therefore, a disciplined approach to maintaining backup configuration files must be observed. The convention is to retain two previous versions of each archiver configuration file, with the file name suffixes ".prev" and ".prev2".

For example, the archiver configuration files and their backups for the LCLS_1 archiver are located in the /arch/lcls/lcls_1 directory:

Before copying a modified archiver engine configuration file from the /nfs/slac/g/archiver/lcls_pv_changes directory to an engine's production configuration file directory: (1) copy the first backup to the second backup, and (2) copy the current configuration file to the first backup. Then copy the modified archiver configuration file from the /nfs/slac/g/archiver/lcls_pv_changes directory to the engine's production configuration file directory. It is also useful to compare the previous and new versions of the configuration file to verify that the expected changes were made. For example,

  1. cp lcls_1-group.xml.prev lcls_1-group.xml.prev2
  2. cp lcls_1-group.xml lcls_1-group.xml.prev
  3. cp /nfs/slac/g/archiver/lcls_pv_changes/lcls_1-group.xml .
  4. diff lcls_1-group.xml.prev lcls_1-group.xml
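The rotate, install and compare sequence above can be sketched as one function so the order of the copies cannot be mixed up. This is a minimal sketch, not an existing LCLS tool; the function name is hypothetical. It assumes the staged file in /nfs/slac/g/archiver/lcls_pv_changes has the same name as the production file, and it uses Python's `difflib` in place of `diff`.

```python
import difflib
import shutil
from pathlib import Path
from typing import List


def install_config(engine_dir: Path, staged_file: Path) -> List[str]:
    """Rotate backups, install a modified configuration file, and diff.

    Mirrors the cp commands above: .prev -> .prev2, current -> .prev,
    then copy the staged file in. Returns a unified diff of .prev vs new.
    """
    current = engine_dir / staged_file.name
    prev = engine_dir / (staged_file.name + ".prev")
    prev2 = engine_dir / (staged_file.name + ".prev2")
    if prev.exists():
        shutil.copy2(prev, prev2)      # step 1: first backup -> second
    shutil.copy2(current, prev)        # step 2: current -> first backup
    shutil.copy2(staged_file, current) # step 3: install modified file
    old = prev.read_text().splitlines(keepends=True)
    new = current.read_text().splitlines(keepends=True)
    return list(difflib.unified_diff(old, new, str(prev), str(current)))
```

Usage would look like `install_config(Path("/arch/lcls/lcls_1"), Path("/nfs/slac/g/archiver/lcls_pv_changes/lcls_1-group.xml"))`, then reviewing the returned diff lines. If the current file is missing, `shutil.copy2` raises before anything is overwritten.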

After the configuration file has been modified, the first backup may be used to quickly restore operation of the associated archive engine if a problem occurs using the new configuration file.

For example, if a mistake is made in modifying the /arch/lcls/lcls_1/lcls_1-group.xml archiver configuration file, the LCLS Archive Daemon will be unable to restart the LCLS_1 archive engine after it has been stopped. This can be seen by entering the following URL in a browser:

Under the "Messages" section on this web page, a message such as "Starting Engine 'LCLS_1': lcls-archeng:4901" will appear approximately every 30 seconds if the LCLS_1 archive engine cannot be restarted because of a bad /arch/lcls/lcls_1/lcls_1-group.xml archiver configuration file. If such a message appears more than once after stopping an archive engine, immediately (1) save the bad configuration file under a different name, and (2) copy the first backup to the current configuration file, which should allow the archive engine to restart successfully:

  1. cd /arch/lcls/lcls_1
  2. cp lcls_1-group.xml lcls_1-group.xml.bad
  3. cp lcls_1-group.xml.prev lcls_1-group.xml
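The recovery above is a fixed two-copy sequence. This is a minimal sketch of the same steps with a hypothetical function name; it checks that the first backup exists before touching the current file, so a missing backup cannot leave the engine without any configuration file.

```python
import shutil
from pathlib import Path


def restore_previous(engine_dir: Path, config_name: str) -> Path:
    """Save the bad config as <name>.bad, then copy <name>.prev over it.

    Returns the path of the saved .bad file for later inspection.
    """
    current = engine_dir / config_name
    bad = engine_dir / (config_name + ".bad")
    prev = engine_dir / (config_name + ".prev")
    if not prev.exists():
        raise FileNotFoundError(f"no first backup: {prev}")
    shutil.copy2(current, bad)    # keep the broken file for debugging
    shutil.copy2(prev, current)   # put the last known-good file back
    return bad
```

Usage: `restore_previous(Path("/arch/lcls/lcls_1"), "lcls_1-group.xml")`.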

Then determine the location of at least the first error in the bad configuration file by finding the log file created when the LCLS Archive Daemon was unable to start the associated archive engine with this file:

  • ls -alt *.log | more
  • Open the latest ".log" file and examine its contents. If it shows no errors, examine the next-to-latest ".log" file. An error in this file caused by a bad configuration file may look something like the following:
  • In the above example, line 13 of lcls_1-group.xml.bad should be examined to find the error in the XML configuration file. The error can then be corrected in the "lcls_1-group.xml.bad" file, after which the command "cp lcls_1-group.xml.bad lcls_1-group.xml" may be issued. Then stop the LCLS_1 archive engine again, which will cause the LCLS Archive Daemon to attempt to restart the LCLS_1 archive engine using the corrected configuration file.
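Finding the newest log files is what `ls -alt *.log | more` is used for above. This is a minimal sketch of the same lookup in an engine directory; the function name is hypothetical and the default of two files simply matches the "latest, then next-to-latest" advice.

```python
from pathlib import Path
from typing import List


def newest_logs(directory: Path, count: int = 2) -> List[Path]:
    """Return the most recently modified *.log files, newest first,
    like the first entries of 'ls -alt *.log'."""
    logs = sorted(directory.glob("*.log"),
                  key=lambda p: p.stat().st_mtime, reverse=True)
    return logs[:count]
```

Usage: `newest_logs(Path("/arch/lcls/lcls_1"))` returns the latest and next-to-latest log files to examine.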

     

    Processing the File Generated by Failed EDM Screen Archiver Retrieval Requests

    Users may request a plot of archiver data for a PV referenced on an LCLS EDM screen (accessed through the "lclshome" main EDM screen). When such a retrieval of archiver data fails, a line containing the timestamp of the request and the name of the PV is appended to the end of the /u1/lcls/tools/ArchiveBrowser/toBeArchivedList file (which may be accessed from a machine such as lcls-builder). The archiver administrator should process this file periodically (e.g., once per week) and add PVs from this list to LCLS archiver configuration files when appropriate.

    Note that there is a similar file for FACET, /u1/lcls/facet/ArchiveBrowser/toBeArchivedList, that should also be processed periodically while FACET is running. This file does not grow as quickly as the corresponding LCLS file.

    A PV in a toBeArchivedList file does not necessarily need to be archived. In general, the archiver administrator should judge whether a PV in this file can be excluded from consideration for archiving (based on past experience with PV naming patterns) or needs to be forwarded to the responsible IOC engineer for a decision. If the responsible IOC engineer determines that a forwarded PV needs to be archived, he/she needs to specify the archiving method. However, some PVs do not have a responsible Controls Software IOC engineer (e.g., SLC system PVs and photon [including FEE] PVs). The archiver administrator usually archives these PVs using the same archiving method as similar PVs, or chooses a method based on his/her own judgement, for example after running "camonitor" for the PV on lcls-archeng.

    The following procedure may be followed to process a toBeArchivedList file:

    1. Log on to the lcls-builder machine using the "softegr" account.
    2. Change your current directory to a private working directory (in this procedure it is referred to as the "to_be_archived_list" directory).
    3. Move the toBeArchivedList file to your "to_be_archived_list" working directory. You may want to rename your private toBeArchivedList file so you can keep a history of these files (e.g., a name such as toBeArchivedList_n, where "n" is one larger than the number in the previous renamed toBeArchivedList file name).
    4. Copy your private toBeArchivedList file to the LCLS PV changes NFS working directory. For example, if your user name is "rdh" and your private toBeArchivedList file name is "toBeArchivedList_100":
    5. In another window (referred to as the "NFS access window" in this procedure), log on to the lcls-archsrv machine using the "laci" account.
    6. In the "NFS access window":
      1. cd /nfs/slac/g/archiver/lcls_pv_changes
      2. Edit the copied "toBeArchivedList" file. Evaluate each PV name to determine whether it definitely does not need to be archived (and therefore its line is removed from the list), definitely does need to be archived (and therefore an archiver configuration file is edited to add this PV name for archiving, after which its line is removed from the "toBeArchivedList"), or the decision whether this PV is to be archived should be made by the responsible IOC engineer. At the end of this process, the copied "toBeArchivedList" file has been edited so this file only contains the names of PVs whose determination of whether they need to be archived should be made by the responsible IOC engineers (one PV name per line, with no other information). If there is at least one PV name in this file after editing, continue to the next step. The remaining steps involve determining the responsible IOC engineers for the remaining PV names in the copied "toBeArchivedList" file.
      3. export TWO_TASK=MCCO
      4. ./generate_ioc_engineer_contact_script.pl
        (Note: this script contains the irmisdb Oracle account name and password. It needs to be edited each time after this password is changed.)
        Enter the name of the copied "toBeArchivedList" file in response to the "Please enter name of archiver PV list request file" prompt. The output file from invoking this Perl script is "ioc_engineer_contacts.bash", which contains a "caget" of the "CONTACT" PV for each PV name remaining in the copied "toBeArchivedList" file.
    7. In your lcls-builder window, copy the "ioc_engineer_contacts.bash" file generated in NFS space to your private working directory. For example:
    8. In your lcls-builder window, invoke the copied file: The output of this script indicates the responsible IOC engineer to contact (e.g., through email) for each PV to determine whether the PV needs to be archived and, if so, the archiving method.
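Two small parts of this procedure lend themselves to a script: reducing a toBeArchivedList file to distinct PV names (the same PV is often requested many times) and choosing the next toBeArchivedList_n history name. This is a minimal sketch with hypothetical helper names. It ASSUMES that each line holds a timestamp followed by the PV name, with the PV name as the last whitespace-separated field; check this against an actual toBeArchivedList file.

```python
import re
from pathlib import Path
from typing import Iterable, List


def unique_pv_names(lines: Iterable[str]) -> List[str]:
    """Collect distinct PV names in first-seen order.

    ASSUMPTION: the PV name is the last whitespace-separated field.
    """
    seen = set()
    names: List[str] = []
    for line in lines:
        fields = line.split()
        if fields and fields[-1] not in seen:
            seen.add(fields[-1])
            names.append(fields[-1])
    return names


def next_history_name(directory: Path, base: str = "toBeArchivedList") -> str:
    """Return base_n, where n is one larger than the highest existing n."""
    pattern = re.compile(re.escape(base) + r"_(\d+)$")
    numbers = []
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match:
            numbers.append(int(match.group(1)))
    return f"{base}_{max(numbers, default=0) + 1}"
```

Writing `"\n".join(unique_pv_names(...))` to a file gives the one-name-per-line layout that step 6 asks for before running generate_ioc_engineer_contact_script.pl.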

    Reporting Not Connected PVs to Requestors

    It is good practice to reply to the requestor as soon as possible after modifying one or more configuration files, informing him/her that the configuration file change request has been processed. When new archiver PVs are added, it is also good practice to tell the requestor about any new archiver PVs that did not connect successfully if the requestor might expect all of them to connect.

    If the list of new archiver PVs is large, it may be useful to save the connection status of the group to which PVs are added to a file and process that file with a script that determines which PVs are not connected. After one selects the link for the engine's group name of interest on the Archive Daemon web interface, one may save the group PV connection statuses to a file using the Firefox browser "File => Save Page As..." command. The created file may then be processed by the /nfs/slac/g/archiver/lcls_pv_changes/create_not_connected_list2.pl Perl script to create a text file of the PVs in the group that are not connected.
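create_not_connected_list2.pl is the tool to use here; its internals are not described in this document. As an illustration of the idea only, this minimal sketch collects red text from a saved group page using Python's standard `html.parser`. It ASSUMES the not-connected channel names are marked with `<font color="red">` or an inline `color: red` style; inspect a saved page first, and note that any other red text on the page would be collected too. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List

# Elements that never have a closing tag and so must not change depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class RedTextCollector(HTMLParser):
    """Collect text found inside elements marked red (ASSUMED markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0               # > 0 while inside a red element
        self.names: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attributes = dict(attrs)
        color = (attributes.get("color") or "").lower()
        style = (attributes.get("style") or "").replace(" ", "").lower()
        if self.depth or color == "red" or "color:red" in style:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.names.append(data.strip())


def not_connected(html_text: str) -> List[str]:
    """Return the red (not connected) channel names from a saved page."""
    collector = RedTextCollector()
    collector.feed(html_text)
    collector.close()
    return collector.names
```

Usage: `not_connected(Path("saved_group_page.html").read_text())`, where the file name is whatever was chosen in "Save Page As...".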

    Engine Restart Days Control

    The LCLS Archive Daemon usually restarts each engine every workday with restart times controlled by the /arch/archiveconfig.xml file. The Archive Daemon startup script, /arch/scripts/ArchiveDaemon.pl, has been modified to not restart each engine on weekends when the active archiveconfig.xml file specifies daily restarts. This was done to avoid the restart of engines when there is no archiver administrator available to respond to any problem that might occur. For this reason it is also desirable not to restart each engine on holidays. The ArchiveDaemon.pl script has also been modified to read the /arch/holidays.txt file during a restart of the Archive Daemon process in order to determine other days (besides weekends) it is desired not to restart the engines.

    After the /arch/holidays.txt file is modified, the Archive Daemon process must be restarted. The /arch/scripts/archive_engine_monitor.pl process should also be restarted, since it also reads the holidays.txt file during initialization.
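The rule described above (skip daily restarts on weekends and on listed holidays) can be sketched as follows. The actual check lives in ArchiveDaemon.pl and archive_engine_monitor.pl. The holidays.txt format is not given in this document, so this minimal sketch ASSUMES one ISO date (YYYY-MM-DD) per line with "#" comments; the function names are hypothetical.

```python
from datetime import date
from typing import Set


def load_holidays(text: str) -> Set[date]:
    """ASSUMED format: one YYYY-MM-DD date per line, '#' starts a comment."""
    holidays = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            holidays.add(date.fromisoformat(line))
    return holidays


def engines_restart_on(day: date, holidays: Set[date]) -> bool:
    """Daily restarts are skipped on Saturdays, Sundays and listed holidays."""
    return day.weekday() < 5 and day not in holidays
```

For example, with 2012-07-04 listed, Wednesday July 4, 2012 and Saturday July 7, 2012 are skipped, while Thursday July 5 is a restart day.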

    Overriding the Default Automation Processing

    Every workday the LCLS engines are normally restarted on the lcls-archeng machine, starting at 10:00 AM at one-minute intervals. The /arch/scripts/archive_engine_monitor.pl process running on this machine monitors the engine restarts and, upon successful completion, writes a file of the form date_ready_for_copy_and_index.txt (where "date" is today's date in yyyy_mm_dd format) in the /arch/log area. This normally occurs at approximately 10:19 AM, and email is sent to the archiver administrator(s) indicating that all engines have been successfully restarted.

    The lcls-archsrv /nfs/slac/g/archiver/arch_lcls/scripts/auto_update_server.pl process checks the /arch/log area (mounted lcls-archeng local disk space) every minute, looking for a file in the above format with the current day's date. When it detects such a file, it begins copying the recently closed engine directory data and rebuilding the NFS regular density indexes (in the /nfs/slac/g/archiver/arch_lcls subdirectories). After this task has completed, email is sent to the archiver administrator(s) indicating that the LCLS archiver copy and index rebuild finished.
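The hand-off between the two machines is just a marker file name built from today's date. This minimal sketch shows that name and the per-minute existence check described above; it is not the code of auto_update_server.pl, and the function names are hypothetical.

```python
from datetime import date
from pathlib import Path

# lcls-archeng local disk, mounted on lcls-archsrv
LOG_DIR = Path("/arch/log")


def ready_file_name(day: date) -> str:
    """Marker written once all engines have restarted, in yyyy_mm_dd form."""
    return f"{day:%Y_%m_%d}_ready_for_copy_and_index.txt"


def copy_and_index_ready(day: date, log_dir: Path = LOG_DIR) -> bool:
    """True once today's marker exists, i.e. the copy/index task may start."""
    return (log_dir / ready_file_name(day)).exists()
```

For March 13, 2012 the marker name is `2012_03_13_ready_for_copy_and_index.txt`. Because only the current day's name is checked, a marker left over from an earlier day never triggers a second copy.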

    While changing the LCLS Top-Level Archiver Index (as described in the section below), you may want to override this default automation processing so that the engines restart before the usual 10:00 AM automation start time and the data copy/index rebuild task starts sooner than the usual 10:19 AM. The following procedure describes this process:

    1. ssh lcls-archeng -l laci
    2. cd /arch
    3. Stop the archive_engine_monitor.pl process:
      1. ps -ef | grep archive_engine_monitor
      2. Determine the process number for this process.
      3. Stop this process:
        kill -9 process_number
    4. Copy a different version of the archiveconfig.xml file to the current file. For instance, if it is desired to start all of the engines at 10:00 AM each Friday, rather than daily:
      cp archiveconfig_fri.xml archiveconfig.xml
    5. To make this change effective, run the update_archive_tree.pl script:
      scripts/update_archive_tree.pl
      Ignore all generated messages.
    6. Stop the Archive Daemon process and all of the engines it controls:
      scripts/stop_daemons.pl -p
    7. Immediately restart the Archive Daemon process, which will restart all archive engines:
      scripts/start_daemons.pl
    8. Monitor the LCLS Archive Daemon main web page (http://lcls-archeng:4900) to confirm all engines have been restarted. On this same web page confirm that each engine will be restarted at the new desired time (e.g., at 10:00 AM each Friday).
    9. Manually signal to the lcls-archsrv /nfs/slac/g/archiver/arch_lcls/scripts/auto_update_server.pl process that it should begin the data copy/index rebuild task:
      1. cd /arch/log
      2. Copy an existing file of the form date_ready_for_copy_and_index.txt (where "date" is a date in yyyy_mm_dd format) to a file where the "date" indicates today's date. This is needed since the archive_engine_monitor.pl process that is used in the normal automation has been stopped (killed) previously. For instance, if today's date is March 13, 2012 and there is a previous March 12 "ready for copy and index" file:
        cp 2012_03_12_ready_for_copy_and_index.txt 2012_03_13_ready_for_copy_and_index.txt
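Step 9 can be sketched as a helper that copies the newest existing marker file to today's name, which is what the manual `cp` does. This is a minimal sketch with a hypothetical function name; it relies on the yyyy_mm_dd prefix sorting chronologically and does nothing if today's marker already exists.

```python
import shutil
from datetime import date
from pathlib import Path

SUFFIX = "_ready_for_copy_and_index.txt"


def signal_copy_and_index(log_dir: Path, today: date) -> Path:
    """Copy the newest existing marker file to today's name (step 9)."""
    target = log_dir / f"{today:%Y_%m_%d}{SUFFIX}"
    if target.exists():
        return target
    existing = sorted(log_dir.glob("*" + SUFFIX))  # yyyy_mm_dd sorts by date
    if not existing:
        raise FileNotFoundError(f"no existing *{SUFFIX} file in {log_dir}")
    shutil.copy2(existing[-1], target)
    return target
```

Usage on lcls-archeng: `signal_copy_and_index(Path("/arch/log"), date.today())`.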

    After changing the LCLS Top-Level Archiver Index, the default automation processing can be restored:

    1. ssh lcls-archeng -l laci
    2. cd /arch
    3. Restore the version of the archiveconfig.xml file that is used in the default automation process (restarting each engine daily starting at 10:00 AM):
      cp archiveconfig.xml.daily archiveconfig.xml
    4. To make this change effective, run the update_archive_tree.pl script:
      scripts/update_archive_tree.pl
      Ignore all generated messages.
    5. Stop the Archive Daemon process and all of the engines it controls:
      scripts/stop_daemons.pl -p
    6. Immediately restart the Archive Daemon process, which will restart all archive engines:
      scripts/start_daemons.pl
    7. Monitor the LCLS Archive Daemon main web page (http://lcls-archeng:4900) to confirm all engines have been restarted. On this same web page confirm that each engine will be restarted daily at 10:00 AM (one minute apart), which is the default automation setting.
    8. cd /arch/scripts
    9. Restart the archive_engine_monitor.pl process:
      ./st.archive_engine_monitor

    Changing the LCLS Top-Level Archiver Index Before it Approaches 2 GB

    Archiver indexes must not exceed 2 GB. Within a matter of weeks after the LCLS top-level archiver index was last changed, it must be changed again before it reaches the 2 GB size limit. Changing this index takes several hours.
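Because the change takes several hours, it helps to know in advance when the index is getting close to the limit. This is a minimal sketch of such a check. Treating "2 GB" as 2**31 bytes and warning at 90% are assumptions of this sketch, not LCLS settings, and the function names are hypothetical.

```python
import os

# ASSUMPTIONS: "2 GB" taken as 2**31 bytes; 90% is an illustrative
# warning level, not an LCLS setting.
INDEX_LIMIT_BYTES = 2 ** 31
WARN_FRACTION = 0.9


def index_usage(size_bytes: int, limit: int = INDEX_LIMIT_BYTES) -> float:
    """Fraction of the index size limit already used."""
    return size_bytes / limit


def needs_change(path: str, warn_fraction: float = WARN_FRACTION) -> bool:
    """True once the index file has reached the warning fraction."""
    return index_usage(os.path.getsize(path)) >= warn_fraction
```

Usage: `needs_change("/nfs/slac/g/archiver/arch_lcls/current_and_all_index")` could be run daily so the index change is scheduled before the limit is reached.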

    1. First, choose the name for the new LCLS top-level archiver index as it will be shown in the Archive Viewer. The standard convention for this name is "LCLS01-yyyy_mm", where "yyyy" is the current year and "mm" is the current month number ("01" is January, "12" is December). For example, the index may be "LCLS01-2010_07".
    2. Create a tar file of the production LCLS Archive Viewer configuration files:
      1. ssh lcls-builder -l softegr
        Choose the number corresponding to your login name (e.g., rdh).
      2. cd /u1/lcls/tools/ArchiveBrowser
      3. tar -cvf config.tar config
    3. Expand the tar file in your area (e.g., /home/softegr/rdh):
      1. cd /home/softegr/rdh
      2. cp /u1/lcls/tools/ArchiveBrowser/config.tar .
      3. tar -xvf config.tar
    4. Edit each Archive Viewer configuration file in your area (NOT the production area). Each configuration file contains a reference to a top-level index file for each PV name in the file. For files containing relative start/end times, replace the current LCLS top-level archiver index file name with the new name (e.g., replace "LCLS01_2010_05" with "LCLS01_2010_07"). For those configuration files with an absolute start and/or end time, substitute the next higher "LCLSxx" number (e.g., replace "LCLS01_2010_05" with "LCLS02_2010_05").
    5. Create a tar file of your area Archive Viewer configuration files. For example:
      1. cd /home/softegr/rdh
      2. rm config.tar
      3. tar -cvf config.tar config
    6. Wait for the current engine directory data to be copied and the NFS regular density indexes to be rebuilt. This activity automatically starts every workday at approximately 10:19 AM, after the /nfs/slac/g/archiver/arch_lcls/scripts/auto_update_server.pl process running on lcls-archsrv detects that a file of the form date_ready_for_copy_and_index.txt (where "date" is today's date in yyyy_mm_dd format) exists in the /arch/log directory (mounted lcls-archeng local disk space). This file is written by the /arch/scripts/archive_engine_monitor.pl process (running on lcls-archeng) after each LCLS archive engine has been restarted, starting at 10:00 AM. The auto_update_server.pl process sends email to the archiver administrator(s) after the copy/index rebuild task has completed. See the "Overriding the Default Automation Processing" section above if you want to restart the LCLS archive engines before the usual 10:00 AM automation start time (and keep them from restarting then) and initiate the copy/index rebuild before the usual 10:19 AM start time.
    7. After the processing for the preceding step has completed, perform the sparsify activities:
      1. ssh lcls-prod01 -l laci
      2. cd /nfs/slac/g/archiver/arch_lcls
      3. scripts/perform_sparcify_activities.pl
    8. Make copies of the regular density and sparsified top-level master_index directories:
      1. ssh lcls-archsrv -l laci
      2. cd /nfs/slac/g/archiver/arch_lcls/lcls
      3. cp master_index master_index.save
      4. Use the current (NOT new) index name suffix to make another copy of master_index. For example, if the current name suffix is 2010_05:
        cp master_index master_index_2010_05
      5. cd /nfs/slac/g/archiver/sparce_arch_lcls/lcls
      6. cp master_index master_index.save
      7. Use the current (NOT new) index name suffix to make another copy of master_index. For example, if the current name suffix is 2010_05:
        cp master_index master_index_2010_05
    9. Build the backup (/nfs/slac/g/archiver/arch_lcls_2) LCLS top-level index referencing the regular density index master_index.save file instead of the regular density index master_index file.
      1. ssh lcls-archsrv -l laci
      2. cd /nfs/slac/g/archiver/arch_lcls_2
      3. cp current_and_all_indexconfig.xml current_and_all_indexconfig.xml.prev
      4. Edit current_and_all_indexconfig.xml to change the "base/master_index" reference to "base/master_index.save".
      5. mv current_and_all_index current_and_all_index.prev
      6. scripts/update_indicies_not_server5.csh
      7. After previously invoked script completes:
        scripts/st.update_backup_indicies
    10. Save the current versions of the lcls_sparce_indexes_info.txt and lcls_indexes_info.txt retrieval information files. Then edit them to reference the arch_lcls_2 LCLS top-level index file and the saved sparsify master_index.save file:
      1. cd /nfs/slac/g/archiver/aida_indexes
      2. cp lcls_indexes_info.txt lcls_indexes_info.txt.save
      3. cp lcls_sparce_indexes_info.txt lcls_sparce_indexes_info.txt.save
      4. cp lcls_indexes_info.txt lcls_indexes_info.txt.temp
      5. cp lcls_sparce_indexes_info.txt lcls_sparce_indexes_info.txt.temp
      6. Edit lcls_indexes_info.txt.temp to change "arch_lcls/current_and_all_index" to "arch_lcls_2/current_and_all_index".
      7. Edit lcls_sparce_indexes_info.txt.temp to change "arch_lcls/current_and_all_index" to "arch_lcls_2/current_and_all_index". Also change "sparce_arch_lcls/lcls/master_index" to "sparce_arch_lcls/lcls/master_index.save".
      8. cp lcls_indexes_info.txt.temp lcls_indexes_info.txt
      9. cp lcls_sparce_indexes_info.txt.temp lcls_sparce_indexes_info.txt
      10. Test retrieval of archiver data using the Archive Viewer to verify that there are no problems retrieving data for either the "Yes" or "No" settings of "Include Sparsified Data".
    11. Create ".next" versions of the lcls_indexes_info.txt and lcls_sparce_indexes_info.txt retrieval information files in preparation for the switch over to the changed LCLS top-level archiver index:
      1. cp lcls_indexes_info.txt.save lcls_indexes_info.txt.next
      2. cp lcls_sparce_indexes_info.txt.save lcls_sparce_indexes_info.txt.next
      3. Edit lcls_indexes_info.txt.next so that the start date/time of the first entry is the current date at 00:00:00. Below that entry, add three lines for the regular density master_index file under its current name suffix (e.g., "/nfs/slac/g/archiver/arch_lcls/lcls/master_index_2010_05"): the index path followed by the start and end times for this index.
      4. Edit lcls_sparce_indexes_info.txt.next so that the start date/time of the first entry is the current date at 00:00:00. Below it, add three lines for the regular density master_index file under its current name suffix (e.g., "/nfs/slac/g/archiver/arch_lcls/lcls/master_index_2010_05"), with start/end times covering the last two weeks and ending at the current date at 00:00:00. Finally, add three lines below that for the sparse master_index file under its current name suffix (e.g., "/nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index_2010_05"); in the example below, this entry takes the place of the unsuffixed sparce_arch_lcls/lcls/master_index entry.
      The following example shows the lcls_sparce_indexes_info.txt.next file before and after editing:

        Before editing:

        /nfs/slac/g/archiver/arch_lcls/current_and_all_index
        04/10/2013 00:00:00
        now
        /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index
        03/26/2013 00:00:00
        04/10/2013 00:00:00
        /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index_2013_02
        02/26/2013 00:00:00
        03/26/2013 00:00:00
        /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index_2013_01
        01/29/2013 00:00:00
        02/26/2013 00:00:00

        After editing:

        /nfs/slac/g/archiver/arch_lcls/current_and_all_index
        04/24/2013 00:00:00
        now
        /nfs/slac/g/archiver/arch_lcls/lcls/master_index_2013_03
        04/10/2013 00:00:00
        04/24/2013 00:00:00
        /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index_2013_03
        03/26/2013 00:00:00
        04/10/2013 00:00:00
        /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index_2013_02
        02/26/2013 00:00:00
        03/26/2013 00:00:00
        /nfs/slac/g/archiver/sparce_arch_lcls/lcls/master_index_2013_01
        01/29/2013 00:00:00
        02/26/2013 00:00:00


    12. Save the current version of the Archive Data Server configuration file and then create a ".next" version of this file in preparation for the switch over to the changed LCLS top-level archiver index:
      1. ssh lcls-archsrv -l laci
      2. cd /www/cgi-bin/xmlrpc
      3. cp serverconfig.xml serverconfig.xml.save
      4. cp serverconfig.xml serverconfig.xml.next
      5. Edit serverconfig.xml.next as follows. First, change the name in the first "archive" tag entry from the current index name to the new index name (e.g., LCLS01-2010_07). Next, add a new "archive" tag entry for the current index name, prefixed with "LCLS02" instead (e.g., LCLS02-2010_05), and set its key to "2". Finally, for each remaining "archive" tag entry, add 1 to its key value and, if it is an LCLS entry, add 1 to its "LCLSxx" prefix.
    13. Stop the primary top-level index process:
      1. ssh lcls-archsrv -l laci
      2. ps -ef | grep -i update
      3. Determine the process number for the "update_indicies_not_server.csh" process.
      4. Stop this process:
        kill -9 process_number
    14. Delete the regular density and sparse top-level master_index files:
      1. ssh lcls-archsrv -l laci
      2. cd /nfs/slac/g/archiver/arch_lcls/lcls
      3. rm master_index
      4. cd /nfs/slac/g/archiver/sparce_arch_lcls/lcls
      5. rm master_index
    15. Rebuild the regular density engine master_index files. Perform the following for each lcls_n subdirectory of /nfs/slac/g/archiver/arch_lcls/lcls on lcls-archsrv, logged in as laci:
      1. cd /nfs/slac/g/archiver/arch_lcls/lcls/lcls_n
      2. mv master_index master_index.save
      3. ArchiveIndexTool -v 1 indexconfig.xml master_index > ArchiveIndexTool.log
    16. Rebuild the regular density top-level master_index file:
      1. cd /nfs/slac/g/archiver/arch_lcls/lcls
      2. ArchiveIndexTool -v 1 indexconfig.xml master_index > ArchiveIndexTool.log
    17. Rebuild the sparse density engine master_index files. Perform the following for each lcls_n subdirectory of /nfs/slac/g/archiver/sparce_arch_lcls/lcls on lcls-archsrv, logged in as laci:
      1. cd /nfs/slac/g/archiver/sparce_arch_lcls/lcls/lcls_n
      2. mv master_index master_index.save
      3. ArchiveIndexTool -v 1 indexconfig.xml master_index > ArchiveIndexTool.log
    18. Rebuild the sparse density top-level master_index file:
      1. cd /nfs/slac/g/archiver/sparce_arch_lcls/lcls
      2. ArchiveIndexTool -v 1 indexconfig.xml master_index > ArchiveIndexTool.log
    19. Build the new primary LCLS top-level index using the new regular density top-level master_index file:
      1. ssh lcls-archsrv -l laci
      2. cd /nfs/slac/g/archiver/arch_lcls
      3. mv current_and_all_index current_and_all_index.prev
      4. scripts/update_indicies_not_server5.csh
      5. After the previously invoked script completes:
        scripts/st.update_indicies_not_server
    20. Release the change of the LCLS top-level archiver index. This consists of three steps: (1) replace the Archive Viewer configuration files with the previously edited new versions, (2) replace the Archive Data Server configuration file with the next version, and (3) replace the two retrieval information files with their next versions:
      1. ssh lcls-builder -l softegr
        Choose the number corresponding to your login name (e.g., rdh).
      2. cd /u1/lcls/tools/ArchiveBrowser
      3. mv config.tar config.tar.prev
      4. cp /home/softegr/rdh/config.tar .
      5. rm -rf config
      6. tar -xvf config.tar
      7. ssh lcls-archsrv -l laci
      8. cd /www/cgi-bin/xmlrpc
      9. cp serverconfig.xml.next serverconfig.xml
      10. ssh lcls-archsrv -l laci
      11. cd /nfs/slac/g/archiver/aida_indexes
      12. cp lcls_indexes_info.txt.next lcls_indexes_info.txt
      13. cp lcls_sparce_indexes_info.txt.next lcls_sparce_indexes_info.txt
    21. Using the Archive Viewer, test the retrieval of archiver data for time intervals that include yesterday's and today's data as well as data from more than 2 weeks ago. Test with both the "Yes" and "No" settings of the Archive Viewer "Include Sparsified Data" option. Also test retrieval of yesterday's and today's data using an Aida client (e.g., an Aida Java test case). Verify that no problems are encountered.
    22. Stop the backup index building process and restore its configuration file:
      1. ssh lcls-archsrv -l laci
      2. ps -ef | grep -i update
      3. Determine the process number for the "update_backup_indicies.csh" process.
      4. Stop this process:
        kill -9 process_number
      5. cd /nfs/slac/g/archiver/arch_lcls_2
      6. cp current_and_all_indexconfig.xml.prev current_and_all_indexconfig.xml
    23. If you modified the LCLS engine start times as outlined in the "Overriding the Default Automation Processing" section above, restore the engines to restart daily. Then check the next start times at http://lcls-archeng:4900 to confirm that the engines will restart the next day.
    24. The auto_update_server.pl process, which copies the data from lcls-archeng to NFS, does not remove the data from lcls-archeng, so switching the index is a good time to clean up old data. Typically there will be two or more months' worth of data; remove the oldest month's data for each lcls-archeng engine. For example:
      ssh lcls-archeng -l laci
      cd /arch/lcls
      
      Then, for each engine (lcls_xx below), remove the data for April 2013 as follows:
      pushd /arch/lcls/lcls_xx/2013
      pwd
      ls -d 04_*
      rm -rf 04_*
      ls -d 04_*
      popd
      
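    The renumbering in step 12 is easy to get wrong by hand. The Python sketch below is hypothetical and assumes the Channel Archiver ArchiveDataServer serverconfig.xml layout, in which a serverconfig root element holds archive elements, each with key, name, and path children; check this against the actual file first. The path for the new LCLS02 entry is passed in because step 12 does not specify it, and placing the new entry directly after the first one is also an assumption.

```python
"""Hypothetical helper for the serverconfig.xml.next edit in step 12.

Assumes each <archive> element has <key>, <name> and <path> children.
"""
import copy
import re
import xml.etree.ElementTree as ET

LCLS_PREFIX = re.compile(r"^LCLS(\d+)-")


def bump_lcls_prefix(name):
    """Turn e.g. "LCLS02-2010_03" into "LCLS03-2010_03"; leave other names alone."""
    match = LCLS_PREFIX.match(name)
    if not match:
        return name
    width = len(match.group(1))
    return f"LCLS{int(match.group(1)) + 1:0{width}d}-" + name[match.end():]


def insert_new_index(root, new_suffix, previous_index_path):
    """Rename the first archive, add the previous index as key 2, shift the rest."""
    archives = root.findall("archive")
    first = archives[0]
    old_suffix = first.findtext("name").split("-", 1)[1]   # e.g. "2010_05"
    first.find("name").text = f"LCLS01-{new_suffix}"         # e.g. "LCLS01-2010_07"

    # Remaining entries: key + 1 and, for LCLS entries, "LCLSxx" prefix + 1.
    for archive in archives[1:]:
        key = archive.find("key")
        key.text = str(int(key.text) + 1)
        name = archive.find("name")
        name.text = bump_lcls_prefix(name.text)

    # New entry for the previous index name, e.g. "LCLS02-2010_05", with key 2.
    new_entry = copy.deepcopy(first)
    new_entry.find("key").text = "2"
    new_entry.find("name").text = f"LCLS02-{old_suffix}"
    new_entry.find("path").text = previous_index_path
    root.insert(list(root).index(first) + 1, new_entry)
    return root
```

    A possible use is tree = ET.parse("serverconfig.xml.next"), then insert_new_index(tree.getroot(), "2010_07", previous_path), then tree.write("serverconfig.xml.next"), where previous_path is a placeholder for the index path the new entry should reference. ElementTree drops XML comments by default, so compare the result with serverconfig.xml.save before it is released in step 20.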

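    Because step 24 uses rm -rf on every engine, listing exactly what would be removed first is a reasonable precaution. The Python sketch below is a hypothetical dry-run helper that assumes the /arch/lcls/lcls_xx/YYYY/MM_DD directory layout implied by step 24 and by current_index links such as 2010/07_17/index; it only lists directories and deletes nothing.

```python
"""Hypothetical dry-run helper for step 24: list one month's engine data
directories under /arch/lcls before removing them by hand.
"""
from pathlib import Path


def month_dirs_to_remove(base, year, month):
    """Return sorted <engine>/<year>/<MM>_* directories for every lcls_* engine."""
    pattern = f"{month:02d}_*"               # e.g. "04_*" for April
    found = []
    for engine_dir in sorted(Path(base).glob("lcls_*")):
        year_dir = engine_dir / str(year)
        if year_dir.is_dir():
            found.extend(p for p in sorted(year_dir.glob(pattern)) if p.is_dir())
    return found
```

    For example, month_dirs_to_remove("/arch/lcls", 2013, 4) would list the same directories that "ls -d 04_*" shows in each engine's 2013 directory.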
    Troubleshooting Problems

    Use of LCLS Archive Viewer as a Diagnostic Tool

    The majority of users obtain data from the LCLS archiver system using the LCLS Archive Viewer. This may be started from the lcls-builder machine as follows:

    1. ssh lcls-prod02
    2. ssh lcls-builder -l softegr
    3. Enter the number corresponding to your account name or 0.
    4. lclsarch
    To test the retrieval of data for a typical PV, VPIO:IN20:111:VRAW, one may perform the following actions through the Archive Viewer:
    1. Using the top pull-down menu bar: File => Open...
    2. Select the rdh_test3.xml file name in the dialog box and select the "Open" button.
    3. Select the "Plot" button on the main Archive Viewer menu.
    There should be continuous data for the last day on the displayed plot. If there is not (an error message appears in a popup window, the plot fails to complete, or little or no data appears), the problem must be diagnosed. The following subsections provide guidance for this effort.

    Check Whether Main LCLS Archive Index is Being Updated

    An update archive index process running on the lcls-archsrv machine should be constantly updating the main LCLS archive index, /nfs/slac/g/archiver/arch_lcls/current_and_all_index.

    To check whether this index has been updated recently, run the following commands on lcls-archsrv or any flora machine:

    1. cd /nfs/slac/g/archiver/arch_lcls
    2. ls -alt | more
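    The same check can be scripted. The Python sketch below compares the index file's modification time with the current time; the 30-minute threshold and the function names are assumptions for illustration, not values taken from the LCLS configuration.

```python
"""Hypothetical check that the main LCLS archive index is still being updated."""
import os
import time

INDEX = "/nfs/slac/g/archiver/arch_lcls/current_and_all_index"


def index_age_seconds(path=INDEX, now=None):
    """Seconds since the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def index_is_stale(path=INDEX, max_age_minutes=30, now=None):
    """True if the index was not modified within max_age_minutes (assumed threshold)."""
    return index_age_seconds(path, now) > max_age_minutes * 60
```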
    The last update timestamp displayed for current_and_all_index should be close to the current date and time. If it is not, first check whether you can log in to the lcls-archeng machine. In past instances, the main LCLS archive index stopped being updated because of a system problem on lcls-archeng, where the most recent data referenced by the main LCLS archive index is stored. To check whether you can log in to the lcls-archeng machine:
    1. ssh lcls-archsrv -l laci
    2. ssh lcls-archeng -l laci
    If this is successful, next check from the lcls-archsrv machine (where the update archive index process runs) whether you can access the lcls-archeng local disk directories where the most recent data is stored. For example:
    1. ssh lcls-archsrv -l laci
    2. ls -l /arch/lcls/lcls_1/current_index
    3. The current_index symbolic link points to the index where the most recent data for the LCLS_1 archive engine is stored. If the above command shows "/arch/lcls/lcls_1/current_index -> 2010/07_17/index", for example, check that you can access this directory and that it is being updated by the LCLS_1 archive engine. (Note: the timestamps of the latest files in this directory may differ somewhat from the lcls-archsrv system time, because the lcls-archeng system time may differ; it can be checked with the "date" command on lcls-archeng.)
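    The link check and the "is this directory being updated" check can be sketched in Python as below. This is a hypothetical helper, not an LCLS tool; it assumes, as in the example above, that current_index is a relative link such as 2010/07_17/index, and it reports the newest file modification time so it can be compared with the lcls-archeng clock.

```python
"""Hypothetical check of an engine's current_index link and data freshness."""
import os
from pathlib import Path


def current_data_dir(engine_dir):
    """Resolve <engine_dir>/current_index (e.g. -> 2010/07_17/index) to its directory."""
    link = Path(engine_dir) / "current_index"
    target = Path(os.readlink(link))          # e.g. 2010/07_17/index
    if not target.is_absolute():
        target = link.parent / target
    return target.parent


def newest_file_mtime(directory):
    """Latest modification time (epoch seconds) of regular files in directory, or None."""
    times = [p.stat().st_mtime for p in Path(directory).iterdir() if p.is_file()]
    return max(times) if times else None
```

    For example, newest_file_mtime(current_data_dir("/arch/lcls/lcls_1")) gives the time of the most recent write by the LCLS_1 engine.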

    Finally, if the above tests were successful but the main LCLS archive index is not being updated, check to make sure the update archive index process is running on the lcls-archsrv machine:

    1. ssh lcls-archsrv -l laci
    2. ps -ef | grep -i update

    The output of this command should show a running process whose command contains the string "indicies" (for example, "/nfs/slac/g/archiver/arch_lcls/scripts/update_indicies_not_server.csh"). If no such process is running, start it on the lcls-archsrv machine under the laci account:

    1. cd /nfs/slac/g/archiver/arch_lcls
    2. scripts/st.update_indicies_not_server
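    The process check described above ("ps -ef | grep -i update") can be sketched in Python as follows. The helper names and the default pattern are illustrative assumptions; unlike the grep pipeline, this does not match its own command line, and the PID it extracts is the one a "kill" command would need.

```python
"""Hypothetical equivalent of "ps -ef | grep -i update" for the index update process."""
import subprocess


def matching_processes(ps_output, pattern):
    """Return ps -ef lines (header skipped) containing pattern, case-insensitively."""
    lines = ps_output.splitlines()[1:]        # drop the UID PID PPID ... header
    return [line for line in lines if pattern.lower() in line.lower()]


def pid_of(line):
    """The PID is the second column of ps -ef output."""
    return int(line.split()[1])


def running_update_processes(pattern="update_indicies_not_server"):
    """Run ps -ef and return matching lines (empty list if none are running)."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return matching_processes(result.stdout, pattern)
```

    An empty result from running_update_processes() corresponds to the case above where the update process must be restarted; the same helper with the pattern "httpd" covers the web server check described later.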

    Checking the Archive Engines

    All of the archiver engine processes should be running at almost all times on the lcls-archeng machine. The easiest method to check whether these archive engine processes are running is to use the main LCLS archiver web page:

    This web page also indicates the status of each engine: the number of channels currently connected and the number of channels belonging to each engine (e.g., "6376/6426 channels connected", displayed in red).
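    When comparing engines, the connected/total figures can be turned into a count of disconnected channels. The web page's HTML layout is not described here, so this hypothetical Python sketch works on status text already copied from the page rather than fetching it.

```python
"""Hypothetical parser for engine status strings such as "6376/6426 channels connected"."""
import re

STATUS = re.compile(r"(\d+)\s*/\s*(\d+)\s+channels connected")


def channel_counts(status_text):
    """Return (connected, total) from a status string, or None if it does not match."""
    match = STATUS.search(status_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def disconnected(status_text):
    """Number of channels not connected, or None for text such as "Not running"."""
    counts = channel_counts(status_text)
    return None if counts is None else counts[1] - counts[0]
```

    For the example above, disconnected("6376/6426 channels connected") returns 50.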

    If this web page cannot be accessed, it is likely that the LCLS Archive Daemon process is not running on the lcls-archeng machine. To determine whether this is the case:

    1. ssh lcls-archsrv -l laci
    2. ssh lcls-archeng -l laci
    3. ps -ef | grep -i daemon

    If the LCLS Archive Daemon process is running, this last command should show information such as the following:

    If the main LCLS archiver web page cannot be accessed, the following procedure should enable you to access it whether or not the LCLS Archive Daemon process was running previously. It will also cause all of the LCLS archive engines to be stopped and then restarted by the new LCLS Archive Daemon process. The procedure assumes that you are on the lcls-archeng machine logged in using the laci account:

    1. cd /arch
    2. scripts/start_daemons.pl -p

    After restarting the LCLS Archive Daemon process, one should check whether this process has successfully restarted all of the LCLS archive engines. This is done through the following web page:

    It may take a minute or two after the LCLS Archive Daemon is restarted for this web page to show that each LCLS archive engine is running. A red "Not running" message for an engine usually means that the system has not finished detecting whether the engine is running. A running engine is also indicated in red, showing the number of channels connected and the total number of channels in that engine's archiver configuration file (e.g., "6376/6426 channels connected").

    If a message appears indicating that a lock file may be present, remove all lock files for the associated archive engine. For example, for the LCLS_1 archive engine:

    1. cd /arch/lcls/lcls_1
    2. rm *.lck
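    The lock file removal above can be sketched in Python with a dry-run default, so the files are listed before anything is deleted. The helper names are hypothetical; the only assumption about the files is the "*.lck" pattern used in the rm command above.

```python
"""Hypothetical helper listing (and optionally removing) an engine's lock files."""
from pathlib import Path


def engine_lock_files(engine_dir):
    """Return the *.lck files directly in engine_dir, e.g. /arch/lcls/lcls_1."""
    return sorted(Path(engine_dir).glob("*.lck"))


def remove_lock_files(engine_dir, dry_run=True):
    """Remove *.lck files unless dry_run is True; return the affected paths."""
    locks = engine_lock_files(engine_dir)
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```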

    Less Frequently Used Diagnostic Information and Tools

    Archive Engine Log Files

    Each time an LCLS archive engine is restarted, new ".log" and ".out" files are created in the engine's main directory (e.g., /arch/lcls/lcls_1 for the LCLS_1 archive engine). Each file name reflects the timestamp when the engine was started (e.g., 2008_05_20-16_00_16.log and 2008_05_20-16_00_16.out). The ".log" file contains diagnostic information, including information about errors that occurred during archiving. The ".out" file is usually empty.
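    Because each file name encodes the engine start time, the log from the most recent restart can be found by parsing the names rather than relying on file modification times. The Python sketch below is hypothetical and assumes names follow the YYYY_MM_DD-HH_MM_SS pattern of the example above; other ".log" files are skipped.

```python
"""Hypothetical helper to find the log written by the most recent engine start."""
from datetime import datetime
from pathlib import Path

NAME_FORMAT = "%Y_%m_%d-%H_%M_%S"      # e.g. 2008_05_20-16_00_16.log


def start_time(log_path):
    """Engine start time encoded in a log file name."""
    return datetime.strptime(Path(log_path).stem, NAME_FORMAT)


def latest_log(engine_dir):
    """Return the newest *.log in engine_dir by the timestamp in its name, or None."""
    logs = []
    for path in Path(engine_dir).glob("*.log"):
        try:
            logs.append((start_time(path), path))
        except ValueError:
            continue                    # skip logs not named by start time
    return max(logs)[1] if logs else None
```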

    Archive Export Utility to View Archive Data

    The Archive Export utility may be used to view archive data in tabular form. This is occasionally useful as a diagnostic tool when it is desired to see individual data sample timestamps and values rather than the graphical archive data available through the Archive Viewer.

    The Archive Export utility may be run from either the lcls-archsrv or lcls-archsrv machines. You may type "./ArchiveExport" to obtain help information for running the Archive Export utility. An example of its use is obtaining archive data for a PV ("VPIO:IN20:111:VRAW") from a specified start time to a specified end time using the main LCLS archive index:

    Rare Problems

    Corrupted Main LCLS Archiver Index

    On rare occasions, the main LCLS archiver index has become corrupted. Corruption may be suspected if the Archive Viewer returns an error when attempting to retrieve data, and it can be confirmed by a failure to retrieve data using the Archive Export utility (see above for directions on its use).

    The first step in rebuilding the main LCLS archive index is to stop the process that is updating the index. First, find the process id of the update archive index process:

    1. ssh lcls-archsrv -l laci
    2. ps -ef | grep -i update

    Once the process id of the update archive index process has been determined using this command, stop the process corresponding to this process id using the UNIX kill command:

    Next, rename the corrupted archive index file so that it can be examined later if needed:

    1. cd /nfs/slac/g/archiver/arch_lcls
    2. mv current_and_all_index current_and_all_index.bad

    Then rebuild the main archive index once, without starting the process that rebuilds it continually:

    At this point, test the newly built main archive index (/nfs/slac/g/archiver/arch_lcls/current_and_all_index) by plotting data for one or more PVs using the Archive Viewer.

    If this is successful, restart the process that rebuilds the main LCLS archive index continually:

    Archive Data Server Apache Web Server

    The Apache web server used to support the Archive Data Server operation has now been installed locally on lcls-archsrv. The Apache web server initialization script is:

    The apachectl script is located at:

    The Apache configuration file is located at:

    The Apache access and error log files are in:

    Archive Data Server Problem

    If archiver data may be retrieved using the Archive Export utility but not with the Archive Viewer, there may be a problem activating the Archive Data Server used by the Archive Viewer to retrieve data.

    This will occur if the Apache web server is not running on lcls-archsrv. To determine if it is running:

    1. ssh lcls-archsrv -l laci
    2. ps -ef | grep -i http

    The output of this "ps" command should show several web server processes. If they are not running, a system administrator should be able to restart the Apache web server on lcls-archsrv using its startup script, /etc/init.d/httpd.

    The Archive Data Server is started as a CGI script by the lcls-archsrv Apache web server when the Archive Viewer makes a data request (e.g., during plotting). The following script is invoked on lcls-archsrv:

    Author:  Bob Hall 20-May-2008

    Rev:  Bob Hall 16-Sep-2009 Added information for new local installation of lcls-archsrv Apache web server.

    Rev:  Bob Hall 17-Jul-2010 Extensively revised.