This is not a use case list, but rather proposals or input for discussion.

"Data Sweeps" (production data)
-------------------------------

Since the data files are independent of each other, we should be able to "sweep" individual files. In practice, sweeping means making data available to users, in most cases via hpss, which means a data file must be backed up first.

Sweep => making data available => having data in hpss.

A dataset is made publicly available to users when it is registered (or published) with a bookkeeping system. We want to bind data backup and dataset registration together, i.e. the dataset is published, and therefore made publicly available, only when its file is backed up.

===> With objy, especially SP, we always had complaints from users who found some runs in the prod database and wondered why they were not available. In the new system, if data is not available to users (i.e. not in hpss), it should not appear in the bookkeeping.

It should be possible to create datasets without backing up data, but in this case troubleshooting becomes an issue for a local db admin ("Hi, I ran some jobs but they don't find some data" - "Oh, right, we had a disk crash last night and some data wasn't backed up, and now I have to find and mark all those datasets bad...").

PR, SP sweeps.
- - - - - - -

PR and SP produce many small files. When a merged file is created, it is not registered until it is backed up. Ideally, backup and registration are done by one application, or the two are coupled. This is because when a dataset is available to users via skim data, users need to be able to stage it into the read-only data pool, and therefore the file must be in hpss at that time.

There are two scenarios here, depending on implementation details.

If the merged file is recorded in PR tables, then backing up the file is the responsibility of the PR system. A file is backed up and a record is made in PR tables. DQM gets prod info from PR tables.
If the file is not backed up, DQM cannot obtain the info about the corresponding runs, as they are considered to be still in production.

If PR registers collections directly in skim data, then a new dataset is not created until the merged file is backed up. If a dataset has to be created earlier, it should have status "open" or "active" (by analogy with an objy database file), and so not be available for users to run on. Only when the dataset is about to "close" is the merged file backed up; the dataset is then declared closed and becomes available to users.

An application will be provided for backing up individual files, with confirmation.

Foreign imports.
- - - - - - - -

Runs are merged at the production site. The production site is, by definition, the one where run files are merged into big files. Only already merged files are imported to Tier A. They are imported together with the description of the dataset, and the dataset is entered into skimdata at the time of import. The current objy import protocol can be reused, if nothing else. (Some mods to the import app code are needed, though.)

User data
---------

Individual users' data (as opposed to production-like working group activity and skimming) is by default not registered in the central bookkeeping. Users who wish to run on this data have to make special arrangements. The reason is that data for analysis jobs should be in the read-only pool under load balancing, and therefore must be in hpss. Backing up small user run files is not practical. It should of course be possible to run read-only jobs on user data, but "in-situ", i.e. where the data was originally produced and is being kept.

User data can be made publicly available by merging run files into composite files, backing them up, and creating a dataset in the central bookkeeping. This should be done automatically upon the user's request.
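
The backup-then-publish rule from the sweep sections above can be sketched as follows. This is only an illustration of the ordering under discussion, not a proposed implementation: the names `Dataset`, `Bookkeeping`, `sweep` and `BackupError` are all hypothetical, the bookkeeping is an in-memory stand-in, and `backup` is any callable assumed to return True once the archive (e.g. hpss) confirms the copy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"      # still in production, not visible to users
    CLOSED = "closed"  # merged file backed up, dataset published


class BackupError(Exception):
    """Raised when the archive does not confirm the backup."""


@dataclass
class Dataset:
    name: str
    merged_file: str
    status: Status = Status.OPEN


@dataclass
class Bookkeeping:
    # Hypothetical in-memory stand-in for the central bookkeeping system.
    datasets: dict = field(default_factory=dict)

    def visible_to_users(self):
        # Only closed (backed up) datasets are offered to users.
        return sorted(
            name for name, ds in self.datasets.items()
            if ds.status is Status.CLOSED
        )


def sweep(dataset, backup, bookkeeping):
    """Back up the merged file first, then close and publish the dataset."""
    # The dataset may exist in bookkeeping while still open.
    bookkeeping.datasets[dataset.name] = dataset
    if not backup(dataset.merged_file):
        # No confirmation: the dataset stays open and hidden from users.
        raise BackupError("backup not confirmed for " + dataset.merged_file)
    dataset.status = Status.CLOSED
    return dataset
```

Because the status only changes after the backup is confirmed, a failed backup leaves the dataset open, so users never see data that cannot be staged. The same function would serve the user-data case: merge the run files, then call `sweep` on the composite file.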