RFC: collection naming conventions in the new Kanga/ROOT eventstore

Here is a proposal for collection naming of production data. Physically,
collections are stored in files according to Pete's proposal

  http://www.slac.stanford.edu/BFROOT/www/Computing/Distributed/RFC/FileNaming.txt

It should be emphasised that the collection name is not intended to encode
all the information about the collection or how it was produced. We don't
expect users to have to decode the different parts of the path. That
information is available from the bookkeeping. We do need to ensure that
the name will always be unique, won't be over-written after creation, and
that no directory level can accumulate more than a few thousand
sub-directories over the life-time of the experiment.

This proposal had in mind the following use-cases:-

1) A useful subset of the data (e.g. all runs from one skim) can be copied
   to a small site (or user's laptop) by copying a single directory tree.
   Sites that don't want to install the data distribution tools can do
   this.

2) Similarly, it may be convenient to dump these files into a single
   directory. This means that the file name (hence the final part of the
   collection name) should be unique within a small dataset.

3) It should be possible to specify simple data management operations
   using wildcards, e.g. pin hot skims on disk.

All three of these cases actually put constraints on the physical file
paths, but since these will be the same as the collection name with a
prefix (PFN) and suffix (LFN), the constraints apply also to the
collection name.

This scheme was optimised for data management. Of course we need to be
sure that it won't cause any problems with production. Pete had some ideas
that address some concerns that Teela raised (...).

Anyway, here is our proposal for the five types of production data, as
well as user and AWG prefixes. Comments follow.
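As a rough illustration of use-cases 2) and 3), here is a minimal Python
sketch. It is not part of any BaBar tool: the helper names
select_collections and flat_name_clashes are hypothetical, the collection
names are simply examples taken from this proposal, and the standard
fnmatch module is only a stand-in for whatever wildcard matching the data
management tools really use. Note that fnmatch's "*" also matches "/",
which a real tool might not do.

```python
import fnmatch
import posixpath
from collections import Counter

# Example collection names copied from this proposal.
COLLECTIONS = [
    "/store/PR/R14/AllEvents/0001/23/14.0.0a/AllEvents_00012345_14.0.0aV01",
    "/store/PRskims/R14/14.2.3c/Btoll/0001/23/14.0.0a/Btoll_00012345_V01_V02",
    "/store/PRskims/R14/14.2.3c/Btoll/00/Btoll_0012",
    "/store/SPskims/R14/12.5.3/Btoll/00/1071/200301/Btoll_001071_3456",
]


def select_collections(names, pattern):
    """Use-case 3: return the names matching a shell-style wildcard.

    Hypothetical helper; fnmatchcase is case-sensitive and its '*'
    matches across '/' separators.
    """
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def flat_name_clashes(names):
    """Use-case 2: final path components that would collide if all
    collections were dumped into a single directory (hypothetical helper).
    """
    counts = Counter(posixpath.basename(n) for n in names)
    return sorted(base for base, count in counts.items() if count > 1)


# All Btoll PR skims, merged or unmerged, from any skim release.
hot = select_collections(COLLECTIONS, "/store/PRskims/R14/*/Btoll/*")

# An empty list means the final name components are unique in this set.
clashes = flat_name_clashes(COLLECTIONS)
```

The second helper only checks a given list of names; it says nothing
about uniqueness across the whole eventstore, which the naming scheme
itself has to guarantee.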

            /store/PR/R14///// __V
            /store/SPruns/R14////SP_
  Generic:  /store/SP/R14////SP__
  Signal:   /store/SP/R14////SP__
  Unmerged: /store/PRskims/R14////// __V_V
  Merged:   /store/PRskims/R14////_
  Unmerged: /store/SPskims/R14/// //|// __V
  Merged:   /store/SPskims/R14/// //|/ __
            /store/users/R14//...
            /store/AWG/R14//...

E.g.

            /store/PR/R14/AllEvents/0001/23/14.0.0a/AllEvents_00012345_14.0.0aV01
            /store/SPruns/R14/0250/43/A12.5.2bV01x23F/SP_02504384
  Generic:  /store/SP/R14/001071/200301/12.3.4a/SP_001071_0035
  Signal:   /store/SP/R14/001071/run2/12.3.4a/SP_001071_0035
  Unmerged: /store/PRskims/R14/14.2.3c/Btoll/0001/23/14.0.0a/Btoll_00012345_V01_V02
  Merged:   /store/PRskims/R14/14.2.3c/Btoll/00/Btoll_0012
  Unmerged: /store/SPskims/R14/12.5.2b/Btoll/00/1071/200301/12.3.4a/Btoll_001071_V04
  Merged:   /store/SPskims/R14/12.5.3/Btoll/00/1071/200301/Btoll_001071_3456
            /store/users/R14/adye/mydir1/MySkimCollection23
            /store/AWG/R14/002/ourdir1/OurSkimCollection42

Comments:-

1) All published data (that will be backed up and available via XRootd)
   is under /store.

2) SP data is initially generated in the SPruns tree and merged into the
   SP tree. The SPruns files are then deleted.

3) Not valid now, kept below for reference.

4) The condalias format has been changed from Jan2003 to 200301 to allow
   for clearer ordering in directories.

5) For data converted from Objy to Kan there will be a trailer on the
   collection name of _CV (the release used to do the conversion and the
   attempt number of the conversion).

Tim.

Changes 20th March 2004:
  Split the /store/SP/ into Generic & Signal.

Changes 24th February 2004:
  Update /store/SP/ and /store/SPruns/ to reflect actual usage.

Changes 23rd February 2004:
  Clear up some names for SP/PR merged collections.
  Removed the following comment:
    3) The run number (PR, PRskims, and SPruns) is an 8-digit number.
       In a change to the current naming scheme, the 100-run directory
       for runs 00012300-00012399 is stored in 0001/00012300 (rather
       than 0001/2300).
       This allows the run directory to be located more easily with
       find. Similarly, the 6-digit SP mode 001071 is stored in
       directory 001/001071.

Changes 11th December 2003:
  Remove the P from P names.

Changes 7th November 2003:
  Added Comment 5).

Changes 31st October 2003:
  Remove unneeded 00s (0001/3400 -> 0001/34)
  Add release to P filenames.

Changes 1st October 2003:
  reordered release and type (R14/PR -> PR/R14)
  Removed duplicate part of run/mode number (0001/00013400 -> 0001/3400)

Changes 22 July 2003:
  added to SPskims filename.
  Changed some names to distinguish releases and merge numbers:
  -> or ; -> or .

============================== cut here ==============================
Tim Adye, BaBar Group, Particle Physics Dept.,
Rutherford Appleton Laboratory, UK.
e-mail: T.J.Adye@rl.ac.uk
WWW: http://hepwww.rl.ac.uk/Delphi/Adye/homepage.html