UNIX File System Backups at SLAC

UNIX File Systems

For UNIX, there are two primary network file systems in use at SLAC today: NFS (Network File System) and AFS (Andrew File System). They are backed up in different ways, and have different schedules and different capabilities. However, there are a few underlying policies that were followed in setting up both backup systems.

Backups are performed automatically on a daily basis and should be viewed primarily as a disaster recovery mechanism, not as an archival system. This means that backups are not retained forever: retention is generally at most a year and can be as short as two weeks. See below for the backup retention policies for NFS and AFS files.

NFS Backup and Recovering Your Own Files

By default, we do not back up NFS file systems, since they are quite large and often used only as temporary work space. Special requests to back up NFS file systems should be sent via email to unix-admin.

NFS file systems that are backed up are handled by IBM Tivoli Storage Manager (ITSM, or commonly just TSM) software. The TSM server is currently a Sun Solaris host with a disk cache and multiple attached tape drives. TSM supports clients running on various platforms; we currently support Solaris SPARC, Solaris x86 and Red Hat Enterprise Linux x64/x86 clients.

Files are recovered from TSM by using the dsmj (GUI) or dsmc (command line) programs. If you backed up files from a flora public machine, then you can also restore them from a flora machine. Users may recover any file owned by their account using either interface, though the graphical interface may be easier to understand. See the TSM Restore Web page for instructions on restoring your files. To request the restoration of files that you do not own or that were backed up directly from an NFS server, send email (with an explanation) to unix-admin.

TSM is an incremental backup system. It backs up only the files that changed since the last backup, and maintains information on the state of the client file system. It is possible to restore the file system to the last backup state, and to restore some older versions of deleted files. TSM is not configured to restore the file system to the state it had at a specific point in time; that is, it may not be possible to restore a directory to the way it looked at noon four weeks ago, or at any other particular day or time. Such a policy would use significantly more tape space, since the backup server would be forced to keep a copy of every file version going back to that date.
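The incremental idea can be illustrated with a minimal Python sketch. This is not how TSM is implemented; the function name and the use of (modification time, size) pairs as the recorded file state are assumptions made for the example.

```python
def incremental_backup(previous_state, current_files):
    """Compare the current file listing with the state saved at the last backup.

    Both arguments map a path to a (mtime, size) tuple (an assumed,
    simplified notion of "file state"). Returns the paths that are new or
    changed, the paths that have been deleted, and the state to save for
    the next run.
    """
    # A file is sent again only if it is new or its recorded state differs.
    changed = sorted(
        path for path, stamp in current_files.items()
        if previous_state.get(path) != stamp
    )
    # Files known from the last run but missing now were deleted on the client.
    deleted = sorted(path for path in previous_state if path not in current_files)
    return changed, deleted, dict(current_files)
```

Unchanged files are never copied again, which is why only a limited set of older versions is available for restore.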

TSM Schedule and Retention Policy

The TSM backup runs each night on TSM client machines, usually starting sometime between 12:00 AM and 5:00 AM.

TSM maintains backup data for both active and inactive file versions. An active version of a file is the most recent backup copy stored in TSM for a file that currently exists on a file server or workstation. An active version remains active and exempt from deletion until 1) it is replaced by a new backup version or 2) TSM detects, during an incremental backup, that the user has deleted the original file from a file server or workstation. An inactive version of a file is a backup copy in TSM that either is not the most recent version or belongs to an original file that has been deleted from the client file system.

Unless otherwise stated, the STANDARD retention policy is as follows:

  • Up to 31 days of versions of a particular file are kept on tape as long as the file exists on the client's file system.
  • Only the most recently backed up version on tape is active. All other versions on tape are inactive.
  • Once a file on tape goes inactive, it expires after 31 days and gets deleted off tape.
  • If a file is deleted from a client's disk, several things happen during the next full incremental backup: 1) the active version of that file on tape will be marked inactive, 2) all inactive copies start expiring off tape as they reach 31 days of age, and 3) the last remaining inactive copy (which is also the most current backup copy) will be kept for 366 days, after which it expires too.
  • Note that as long as a file remains on a client's disk, its latest backup copy will remain active on tape and not expire.

What does all this mean? Basically, if a file still exists on disk, the last 31 days' worth of versions are also on tape. But once a file is deleted from disk, the last backup copy is kept for 366 days while all older copies expire as they reach 31 days of age. So if the file is changing every day and then gets deleted, only the last 31 days' worth will remain on tape. If the file is changing weekly and then gets deleted, then only about 4-5 inactive versions will remain on tape, spanning up to 31 days.
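The STANDARD policy can be modeled in a few lines. The following Python sketch is only an illustration and is not how TSM is implemented; the function name is hypothetical. It assumes that a version's 31-day clock starts when the version becomes inactive (the date of the next backup of that file), that the 366-day clock for the last copy starts when the deletion is detected, and that a version is still present on the last day of its retention period.

```python
from datetime import date, timedelta

INACTIVE_RETENTION_DAYS = 31    # inactive versions, from the STANDARD policy
LAST_COPY_RETENTION_DAYS = 366  # last copy of a deleted file, from the STANDARD policy


def surviving_versions(backup_dates, today, deleted_on=None):
    """Return the backup dates of the versions still on tape on `today`.

    `backup_dates` are the dates on which a new version was backed up.
    `deleted_on` is the date TSM detected that the file was deleted,
    or None if the file still exists on the client.
    """
    versions = sorted(backup_dates)
    kept = []
    for i, backed_up in enumerate(versions):
        if i == len(versions) - 1:
            if deleted_on is None:
                kept.append(backed_up)  # the active version never expires
                continue
            inactive_since = deleted_on
            limit = LAST_COPY_RETENTION_DAYS
        else:
            inactive_since = versions[i + 1]  # replaced by the next version
            limit = INACTIVE_RETENTION_DAYS
        if (today - inactive_since).days <= limit:
            kept.append(backed_up)
    return kept


# A file that changed daily from 1 Jan to 1 Mar 2013 and was then deleted.
daily = [date(2013, 1, 1) + timedelta(days=n) for n in range(60)]
kept = surviving_versions(daily, today=date(2013, 3, 20), deleted_on=date(2013, 3, 2))
print(len(kept), kept[0], kept[-1])
```

Under these assumptions, a few months after the deletion only the last backup copy remains, and after 366 days nothing is left.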

We will notify file owners in advance if their backups are not using the STANDARD retention policy.

AFS Backup and Recovering Your Own Files

AFS backup is provided by the native AFS backup system. The unit of AFS file storage and backup is the volume. Typically, each user's home directory is a single volume. For the first level of backup, AFS creates a copy of each volume at midnight each night. This copy is called a "backup volume". You can find this backup volume from the .backup link in each home directory. If you have just deleted or damaged a file that existed at midnight, type "cd ~/.backup" to find a version of it from the previous day and copy it back into your home directory.

Note that backup volumes are automatically "mounted" for home directory volumes only. This means that you must manually mount backup volumes for group volumes or user sub-volumes if you need to recover files from the midnight copy. To do this you will need the volume name. The easiest way to get this is to execute the "fs listquota" command on the directory in question. For example, if you had accidentally removed a file from the directory /afs/slac/g/babar/data/data01, you would type

    > fs listquota /afs/slac/g/babar/data/data01
    Volume Name                   Quota      Used %Used   Partition
    g.babar.data.01              500000    223503   45%         32%

The first column lists the name of this volume as "g.babar.data.01". To get the name of the backup volume, append ".backup", then mount it in your home directory with the command:

    > fs mkmount ~/bdata01 g.babar.data.01.backup

and reference it at ~/bdata01. (You may pick any name in place of bdata01 as long as the directory doesn't already exist.) The only AFS permissions you need for fs mkmount are insert and administer on the directory you are mounting in (such as your home directory).
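The naming step can be sketched in Python. The helper names below are hypothetical, and the code only builds the command as a string from "fs listquota" output in the layout shown in the example; it does not run any AFS command.

```python
def backup_volume_name(listquota_output):
    """Take the volume name from `fs listquota` output and append ".backup"."""
    rows = [line.split() for line in listquota_output.splitlines() if line.strip()]
    # rows[0] is the header line; rows[1] holds the values for the volume,
    # with the volume name in the first column.
    return rows[1][0] + ".backup"


def mount_command(listquota_output, mount_point):
    """Build the `fs mkmount` command line for the backup volume."""
    return "fs mkmount {} {}".format(mount_point, backup_volume_name(listquota_output))


SAMPLE = """Volume Name                   Quota      Used %Used   Partition
g.babar.data.01              500000    223503   45%         32%
"""

print(mount_command(SAMPLE, "~/bdata01"))
```

For the sample output this prints the same "fs mkmount" command as in the example.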

We recommend doing such mounts in your home directory to avoid creating directory "loops". For example, it is tempting to mount the .backup volume in the volume you're dealing with, because that is frequently your current directory. However, if you mount a volume's .backup volume within itself, and you leave the mount there, then tomorrow and thereafter, .backup and .backup/.backup and .backup/.backup/.backup etc. will exist. This causes real problems for recursive commands like "ls -lR", "find", and "du". We also recommend that you remove the mount when you are done with it, because you won't really like seeing it under your home directory. You can remove it with

    > fs rmmount ~/bdata01

AFS Backup Schedule and Retention Policy

The AFS backup is a series of full and incremental backups, designed to provide complete coverage of recent changes, and sparser coverage going back in time. A level 0 backup is a full backup of the AFS file system. A level 1 backup is an incremental backup of all changes since the previous level 0 backup. A level 2 backup is an incremental backup of all changes since the previous level 1 backup. The schedule of AFS backups is as follows:

Level 0: A full backup is performed starting at midnight on the first Sunday of each month. This backup is retained for six months. After six months, only the quarterly (January, April, July, October) backups are kept. The quarterly backups are retained for one year.

Level 1: An incremental backup is performed starting at midnight every Sunday morning (except for the first Sunday of each month). These backups are retained for two months.

Level 2: An incremental backup is performed starting at midnight Monday through Saturday. These backups are retained for two weeks.

The result of that schedule is that a volume can be retrieved from the daily backups for the first two weeks, then from the weeklies for the first two months, then from the monthlies for the first six months, and then from the quarterlies for one year.
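As a rough illustration of the schedule above, the following Python sketch classifies a calendar date by the level of the backup that would start at midnight on that day. The function name and the RETENTION table are hypothetical conveniences for the example and are not part of the AFS backup system.

```python
from datetime import date

# Retention periods as stated in the schedule above.
RETENTION = {
    0: "six months (one year for the January, April, July and October backups)",
    1: "two months",
    2: "two weeks",
}


def backup_level(day):
    """Return the backup level (0, 1 or 2) that starts at midnight on `day`."""
    if day.weekday() == 6:  # Sunday (date.weekday() counts Monday as 0)
        # The first Sunday of a month always falls on day 1 through 7.
        return 0 if day.day <= 7 else 1
    return 2  # Monday through Saturday


for day in (date(2013, 5, 5), date(2013, 5, 12), date(2013, 5, 9)):
    level = backup_level(day)
    print(day, "level", level, "kept for", RETENTION[level])
```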

AFS backups are not yet retrievable by users, with the exception of those files that are located in the user's .backup subdirectory created each midnight. See the AFS Restore Web page or send email to unix-admin to request the retrieval of a file from backup.

For corrections or comments, please send email to unix-admin. Please include this URL so we know to which page you're referring.

Last modified: 09 May 2013