Minutes/notes from discussion of eventstore design at RAL workshop (Tuesday afternoon, 14 Jan, 2003)
---------------------------------------------------------------------

[These are my notes made after the fact, so I may have missed, forgotten or misunderstood some things. Let me know if that is the case.]

The discussion started with a quick review of the OCP requirements:

http://www.slac.stanford.edu/BFROOT/www/Computing/Distributed/workshops/Jan2003/OCPrequirements.pdf

(also linked from the top of the agenda page) and then proceeded to a simple list of discussion points (also linked from this session's portion of the agenda page):

http://www.slac.stanford.edu/BFROOT/www/Computing/Distributed/workshops/Jan2003/evs_discussion.pdf

o AllEvents Micro discussion / PR/SP/Skim production output

The discussion here revolved around whether the AllEvents Micro should (eventually) be produced in PR/SP or as part of each (or just the first) skim production.

Conclusion: The AllEvents micro should be produced at the same time as the mini in PR/SP, and subsequent skim production should produce only the individual, deep-copy skims. [In the time period where we are producing "new micro" skims from Objy mini, we will have to produce the AllEvents Kanga micro in a dedicated conversion step as we do now.]

There was also the question of how physics deep-copy skims would be produced for "new" PR/SP data (as opposed to the every-three-month production reskim). The general consensus was that the skims should also be produced immediately after the PR/SP production, but in a separate step from the one using Elf/Moose (presumably after any merge of output from PR/SP). This could be done with the very latest production skim executable (not the one from the Moose/Elf release itself) so that it doesn't significantly increase the average resources needed for (re)skimming.

o Tag from production, from skims

We discussed briefly the tag written by PR/SP if PhysProdSequence is disabled/off (as in the case where we do the skimming in a dedicated, separate step). The PR/SP output should include the minimal tag information from the trigger and the Digi/BG filters, and this tag output should be read in and augmented when writing the full physics tag in the skimming step.

We didn't really discuss the possibility of moving some things from the tag to the deep-copy skims, although it was mentioned (quickly) in passing. We also didn't discuss explicitly the contents of the tag written to the deep-copy skims, but the working assumption is that it will be the _full_ tag, the same for every deep-copy skim. (Thus, for example, one can read one skim and see if a particular event is in any other skim.)

o Logical/physical, collections/files

We discussed briefly the very simple model used in the "classic" Kanga, where a single collection maps to a single file, and acknowledged the (obvious) need for something more sophisticated to navigate to event components in other files (or borrowed from other collections).

I think it was generally accepted that it would be best to have one uniform structure both for the standard event data (tag/aod, esd) and for any user-added data ("usr", say). [Should this be encapsulated in an additional requirement?]

There was a short discussion about what should be in the tcl file. The possibility of putting (for example) all of the information about the files directly into the tcl file was mentioned. This would not satisfy one of the OCP requirements (that users deal with event collections, not files) and doesn't really extrapolate well to dealing with sparse (pointer) collections generically in the same way as we deal with regular collections.

A proposal would be to add an explicit "event header" to each collection in the ROOT file itself, with one implementation of this being simply a generalization of the existing "pointer" collections to point event-by-event to the location of every _component_ (usr, aod, tag, ...) of the event, "owned" or "borrowed". This would have the advantage of simply generalizing the "pointer" collections to cover all possible types of collections in the implementation (as opposed to having two different "types" of implementations).

There was discussion of this for a while. (Unfortunately, I'm probably not going to be able to summarize all of the discussion, though, as it wandered around.)

The possibility of implementing a real logical/physical mapping for file location (instead of this pseudo-mapping handled via soft links) was brought up. Both for this purpose and for anything to do with the event header, there seemed to be a general distaste for any implementation that requires the executable running on the batch cpu to talk to a central server (e.g. a relational database) to get such information. [Should this be encapsulated in a requirement somehow?]

Eric described briefly his implementation of this in the interactive prototype; it was very similar, basically allowing one to take a tree in another file as a Friend of the original tree (thus "borrowing" that data).

An additional question (IIRC) arose as to whether the micro would be completely "orthogonal" to the mini if the micro is a reduced mini, i.e. whether the micro would duplicate the information from the mini or not. I'm not sure we reached a conclusion on this. (Correct me if I misunderstood here.)

o File size

We discussed the issue of file size. There are three reasons why it would be useful to aim for a larger file size:

    o staging the file from a MSS (if the file is saved individually there, as opposed to saved as part of a tar archive). A minimum size was given as 100MB, with larger preferred. (See OCP requirements.)
    o keep number of file descriptors down on server
    o reduce file opening cost when running over many small files

It was noted that the problem will be particularly bad for the skims (which can take between 0.5% and 10% of the events). Thus even if the original collections from PR/SP (when we are doing production with Kanga/ROOT output) are sized well, the skims will be too small. The need to merge over skims was acknowledged, perhaps merging different skims over different numbers of runs given the variation of skim fractions.

It was pointed out that merging over runs would make it difficult for people who really want to see "Skim X from Run Y". While in many cases users will not need to deal with or run over particular runs, there were two examples where this may be useful:

    o When the "good runs" list is changing (e.g. for new data) and a particular run (already merged in with others) is excluded from the "data set".
    o A subsystem may want to look at a particular calibration skim (e.g. mu-pairs) for a particular run.

There was a long discussion about this. Multiple ideas were tossed out as to how to handle it:

a) Store also the small files which were merged in HPSS so that they are available even if the merged collection/file is the one usually used. (This was mentioned in passing and wasn't pursued.)

b) Add a filter to select by run from the larger, merged collection. There was a discussion about how "run number" isn't a concept our data knows about other than in collection names (Eric claimed it was actually in the Kanga data somehow, though). Various possibilities were given for how to implement this nonetheless. The primary difficulty here is that one still needs to read all of the events, or at least the tag information if selecting on something there. This could be fast (6kHz), but is a bit ugly if it does require touching all of the events.

c) Add some "collection metadata" that knows about the runs and event offsets, and allow the user to ask somehow for a particular run subset from a given collection.

d) Add a layer to the eventstore that allows for real hierarchical collections (collections composed of other collections). This would probably complicate the model significantly, making it less like the existing Kanga model, and wasn't pursued for now. (This is, from my point of view, a generalization of possibility (c).)

e) Let SkimData handle it. It should know the number of events for each skim for each of the runs which were merged. It could easily generate the appropriate tcl for this (e.g. adding the collection and doing the skip over the appropriate number of events).

My personal take was that option (e) was the best, as it avoided teaching the executables about anything other than "collections" and sequential event numbering within the collection (e.g. run over the first N events). If a particular "subcollection" is needed (and exists due to merging), the user will have to talk to SkimData in any case. Failing that as a way of dealing with this, my preference would be for (d) as a generalization of (c), but that would probably add a significant complication to the structure of the eventstore and take it further away from the simplicity of the "classic" Kanga eventstore.

In some discussions after the sessions, a slight variation on (e) came out which avoids some of the ugliness (and complexity) of SkimData generating "ev cont -nev N" in the middle of a list of collections. It would simply be to extend the semantics of collections to allow one to specify (for example):

    /groups/someskim/whatever/events@N-M

to mean events N through M (inclusive) from collection /groups/..../events.

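As a rough sketch of how option (e) and the range notation might fit together: the Python below is purely illustrative and was not part of the discussion. It assumes 1-based, inclusive event numbering within a collection and that SkimData keeps a list of per-run event counts in merge order; the function names, run numbers and counts are all hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class EventRange(NamedTuple):
    collection: str
    first: Optional[int]  # None means "the whole collection"
    last: Optional[int]


# "<collection>" or "<collection>@N-M"; the collection name may not contain '@'.
_SPEC = re.compile(r"^(?P<name>[^@]+)(?:@(?P<first>\d+)-(?P<last>\d+))?$")


def parse_spec(spec: str) -> EventRange:
    """Split a collection spec into its name and an optional inclusive range."""
    m = _SPEC.match(spec)
    if m is None:
        raise ValueError(f"malformed collection spec: {spec!r}")
    if m.group("first") is None:
        return EventRange(m.group("name"), None, None)
    first, last = int(m.group("first")), int(m.group("last"))
    # Assumption: events are numbered from 1 and the range is inclusive.
    if first < 1 or last < first:
        raise ValueError(f"bad event range in {spec!r}")
    return EventRange(m.group("name"), first, last)


def run_spec(collection: str, run_counts: List[Tuple[int, int]], run: int) -> str:
    """Build the spec selecting one run's events from a merged collection.

    run_counts holds (run_number, n_events) pairs in the order the runs were
    merged, i.e. the bookkeeping SkimData is assumed to have.
    """
    offset = 0
    for run_number, n_events in run_counts:
        if run_number == run:
            if n_events == 0:
                raise LookupError(f"run {run} has no events in {collection}")
            return f"{collection}@{offset + 1}-{offset + n_events}"
        offset += n_events
    raise LookupError(f"run {run} was not merged into {collection}")


# Hypothetical merged skim built from three runs.
counts = [(1001, 40), (1002, 25), (1003, 10)]
spec = run_spec("/groups/someskim/whatever/events", counts, 1002)
print(spec)
print(parse_spec(spec))
```

Under these assumptions the bookkeeping side does all of the run-to-offset arithmetic, and the executable only ever sees a collection name plus a sequential event range.
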
The same notation would also allow one to specify a particular event within a collection as (say):

    /groups/someskim/whatever/events@N

This would also be a way of specifying the same thing to the event display (this possibility came out in discussions related to the same need in the eventstore some months ago).

This should probably be discussed again to make sure we are converging on the general ideas (perhaps encapsulated in some new requirements if need be).

o collection metadata/stateID, etc.

The subject of collection metadata was brought up, as well as the question of how we store the stateID. We agreed that it would be useful to discuss the Bdb implementation for these things with Jacek (who was in the other parallel session about bookkeeping), hence discussion of these topics was deferred.

These notes are based on my understanding of the discussion; please let me know if I missed something. We will probably have to encapsulate the above into a presentation and go through it another time in one of the upcoming (Kanga?) meetings. I will probably try to prepare such a presentation myself, but am happy if someone else volunteers.

Pete