-> Things that are part of the "validation" we need to do this fall:

1) Validate the 13.x.x/14.x.x simulation and reconstruction. (This problem is independent of the others, but it is not truly separable, so we need to deal with it at the same time.)
2) Verify that the skims give sensible _selections_ in 13.x.x/14.x.x. (Again, this is independent of the new CM2 work, but not separable, hence we need to deal with it.)
3) Check the correctness of the Kanga version of the mini, raw (and eventually sim).
4) Ensure that the output is correct in that all parts are correctly written: no events or parts of events are lost or unreadable.
5) Verify that the portion of the "mini" we will call the "micro" is functionally equivalent to the "old micro".
6) Verify that the output is correctly readable and that non-trivial tasks performed on it give sensible results (more of a "usability" and/or scaling test).
7) Verify that the composite and/or user-added data _content_ is complete and correct.

==============================================================================

-> We don't work in a vacuum, so we do have things to which we can compare the output from tests in 13.x.x/14.x.x. Some of these comparisons are easier than others. (The numbering of the points in this section matches that of the previous section.)

1) We can compare the 13.x.x/14.x.x monitoring plots from the reconstruction and simulation with those from 12.x.x. The subsystems routinely do this. It is a bit subjective in that both code and constants are likely to have changed.
2) We can compare the selection rates skim-by-skim with 12.x.x. This is also a bit subjective in that code and constants will have changed. The delegated expert from each AWG should sign off on their skim selection rates (comparing event by event as necessary).
3) The Kanga mini (raw, sim) should be byte-by-byte identical to the Objy mini (raw, sim), so this is an identity test.
   We plan to use the event-mixing technology to read the two collections and do this comparison in an automated way.
4) For some basic cases (for example, the full data/MC conversion and the merge) we can use the identity test of 3) to ensure that all pieces are written correctly and are readable. In almost all cases we can check that the number of entries on all trees and branches is what is expected (and independently reported) by the Framework code itself. The most elegant way to do this would be with the "Framework report" mechanism already being discussed for the bookkeeping and task management system, but it could also be done (as it is now) by taking the Framework expectation from the log files.
   The most complicated case here is PR, where we have to ensure that no events (or parts of events) are lost when an Elf process crashes during processing. There is a simple checkpointing mechanism that seems to work in standalone tests, but we have no experience with it in bulk. Until we have that experience (and have dissected enough cases in detail to know how to check the data integrity automatically), we can simply refuse to accept runs where any Elf process crashed during the ER pass. This is in practice what we do for SP, and (putting aside Objy-related crashes) our past experience has been that the reco code in Elf is in fact very reliable (or can be made so).
5) This one has some subjective parts. In practice, given the check of 3) above, we are building in part on the work done to demonstrate the "micro" mode of running on the Objy version of the mini. This part of the validation should be discussed more widely.
6) This one isn't validation per se, but more "building confidence" that the new Kanga _in bulk_ can be used for things like calibration tasks, mini- (or micro-) based QA applications, etc. While points 1-5 should technically ensure that the output is good, the bottom line is that the detector subsystems, RQM, etc. should be confident that they can use this data reliably.
   (Analysis users have a similar need, but they should probably focus more on the skims.)
7) This one is somewhat difficult. We can probably do some basic tests to verify that things are mechanically behaving as they should (data read back is what it should be, given what was written), but verifying the validity of the contents needs the involvement of the contact person from that skim's AWG.

==============================================================================

-> Plan for validating PR for the fall. Main points:

(a) At least one PR ER farm should be operating continuously from now until the fall. More than one should be tried occasionally (i.e. at a minimum we should be capable of doing O(1ifb)/week).
(b) A byte-for-byte comparison of the Kanga mini with the Objy mini will be done on smaller samples in standalone mode. This may go in parallel with other initial tests.
(c) The first people to sign off on things should be the detector subsystems. As we have done in the past, we should aim to produce a series of coherent samples of O(1ifb) and introduce their fixes (code and calibrations) in a "rolling" fashion.
(d) All of the PR monitoring and QA stripcharts should be made available, as they are for normal running, to allow the detector subsystems to evaluate the data in the standard way.
(e) The samples should be swept from Padova to SLAC as they are produced and made available on disk at SLAC. The latency for this _should_ be shorter than for steady-state processing in the fall, given that the conditions/xtc files should already be present in Padova.
(f) The last point requires appropriate disk space (as the samples should initially be disk resident). 1ifb of AllEvents+TriggerStream is XXX GB. Samples should be flushed from disk when all of the subsystems agree that they are no longer useful.
(g) The detector subsystems should sign off on the samples as they are produced, based on the monitoring plots and QA stripcharts.
   This is to validate the reconstruction itself (i.e. point 1 above).
(h) The detector subsystems should sign off on _reading_ the mini in the Kanga/ROOT format using whatever QA application they use. Even better would be for them to perform whatever "task" they might normally perform on such a sample (e.g. calibration). This addresses point 6 above.
(i) The physics output of these samples should be validated. (See the skims discussion below.)
(j) Since we plan to "turn off PhysProdSequence" in Elf for the fall, we should explicitly process some number of runs (say one day's worth, 150ifb) both before _and_ after making this change, with all other code unchanged, to allow the subsystems to sign off on what should be an identity check. This should be done _after_ the subsystems have signed off on the "normal" reco, but well before we start taking data, say in early August. (It should also be done only after a standalone test has already shown no differences.)
(k) The PC pass should be done as needed, but not more often than once/week, as part of this "rolling" process. A farm at SLAC should be available for this.

==============================================================================

-> Plan for validating SP for the fall. Main points:

(a) A dedicated queue of CPUs should be available for this, sufficient to produce 2M(?) generic events per week plus (for each new release) the "standard" signal samples used for sign-off. These resources should be made available at SLAC.
(b) New code and constants will be deployed in the same "rolling" fashion as for PR.
(c) The resources should be used to produce samples which are periodically passed to the detector subsystems for sign-off.
(d) The monitoring plots from the production should be made available to the detector subsystems.
(e) The eventstore output itself should be made available to the subsystems. They should test the usability of this data by running their (mini-based) QA applications on the output.
(f) The last point requires disk space so that the samples can be made disk resident. A figure of merit for this is: 1M events * 25 kB/event = 25 GB.
(g) The physics output of these samples should be validated. (See the skims discussion below.)
(h) Since we plan to "turn off PhysProdSequence" in Moose for the fall, we should explicitly process some number of runs both before _and_ after making this change, with all other code unchanged, to allow the subsystems to sign off on what should be an identity check. This should be done _after_ the subsystems have signed off on the "normal" reco, but well before we start SP6. Assuming that the similar PR/Elf test has been done, this is more or less a pro-forma test. (In any case, it should be done only after a standalone test has already shown no differences.)

==============================================================================

-> Plan for validating the skims for the fall. Main points:

(a) We want to start on 1 Oct.
(b) All code which changes _selections_ should be available by 1 Sept.
(c) All code which changes the stored composite lists or adds user data should be available by 15 Sept.
(d) From 1 Sept. on, we will run the skim application continuously to produce small samples for validation checks.
(e) The first thing we need to do is validate that the _selection_ rates are correct, i.e. consistent with (or different from, as expected) the rates of the latest skims in 12.x.x. There should be at least one check of this before 1 Sept. (probably in August) to look for other problems in the 13.x.x/14.x.x releases.
(f) Tools to automate the collection and presentation of the skim rates (as well as monitoring from tests of the SkimMiniApp?) should be built.
(g) The AWG expert should also sign off on the _content_ of the composite lists and user data. This is a process we will have to iterate on in order to automate it. Automation is necessary to make it easier to change "experts" over time.
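Point 3 of the comparison section and item (b) of the PR plan describe an identity test between the Kanga and Objy minis. The plan is to read the two collections with the event-mixing technology; the sketch below only illustrates the comparison logic, in Python, assuming each collection can already be iterated as a sequence of per-event byte strings in the same event order. The function names `first_difference` and `compare_collections` are hypothetical. Reporting the first mismatching event and byte offset, rather than a bare pass/fail, gives a starting point for dissecting any failure.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the first byte offset at which a and b differ, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One record is a prefix of the other: they differ where the shorter ends.
        return min(len(a), len(b))
    return None


def compare_collections(kanga_events: Iterable[bytes],
                        objy_events: Iterable[bytes]) -> Optional[Tuple[int, Optional[int]]]:
    """Walk both collections in lockstep (hypothetical per-event byte records).

    Returns None if every event is byte-for-byte identical; otherwise returns
    (event index, byte offset) of the first mismatch. The offset is None when
    one collection has more events than the other.
    """
    missing = object()  # sentinel marking the end of the shorter collection
    pairs = zip_longest(kanga_events, objy_events, fillvalue=missing)
    for index, (k, o) in enumerate(pairs):
        if k is missing or o is missing:
            return index, None
        offset = first_difference(k, o)
        if offset is not None:
            return index, offset
    return None
```

For example, `compare_collections([b"abc", b"dxf"], [b"abc", b"def"])` returns `(1, 1)`: the second event differs at its second byte.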
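Point 4 of the comparison section proposes checking that the number of entries on all trees and branches matches what the Framework itself reports, taking the expectation from the log files for now, and rejecting any run in which an Elf process crashed during the ER pass. A minimal Python sketch of that bookkeeping follows. The log-line format matched by `EXPECTED_RE`, the branch names, and the function names are all hypothetical, and the sketch assumes that every checked branch holds exactly one entry per processed event; branches that do not follow that rule would need their own expected counts.

```python
import re
from typing import Dict, List

# Hypothetical log-line format; the real Framework log format may differ.
EXPECTED_RE = re.compile(r"^EvtCounter: (?P<events>\d+) events processed")


def expected_events_from_log(log_lines: List[str]) -> int:
    """Sum the event counts reported by the (hypothetical) counter lines in a job log."""
    total = 0
    for line in log_lines:
        match = EXPECTED_RE.match(line)
        if match:
            total += int(match.group("events"))
    return total


def check_run(log_lines: List[str], branch_entries: Dict[str, int], crashed: bool) -> List[str]:
    """Return a list of problems for one run; an empty list means the run is accepted.

    branch_entries maps a "tree/branch" name to the number of entries found in
    the output. Runs where a process crashed are rejected outright, following
    the interim policy of not accepting such runs.
    """
    if crashed:
        return ["rejected: a process crashed during the ER pass"]
    expected = expected_events_from_log(log_lines)
    problems = []
    for name, entries in sorted(branch_entries.items()):
        if entries != expected:
            problems.append(f"{name}: {entries} entries, expected {expected}")
    return problems
```

If the "Framework report" mechanism becomes available, only `expected_events_from_log` would need to change; the comparison itself stays the same.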
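Items (e) and (f) of the skims plan (and point 2 of the comparison section) call for comparing selection rates skim-by-skim with 12.x.x and for tools to automate this. A minimal Python sketch of such a comparison, assuming each release's skim output can be reduced to a count of selected events per skim out of a known number of input events. The function name `compare_skim_rates`, the input dictionaries, the skim names in the example, and the 3-sigma threshold are illustrative assumptions, not an existing tool. Each skim's rate difference is divided by the combined binomial uncertainty, so the AWG expert can focus on flagged skims (and on any skim whose change is expected but does not appear).

```python
import math
from dataclasses import dataclass


@dataclass
class RateComparison:
    skim: str
    old_rate: float
    new_rate: float
    pull: float     # (new - old) / combined binomial error
    flagged: bool


def binomial_error(selected: int, total: int) -> float:
    """Binomial uncertainty on the selection fraction selected/total."""
    p = selected / total
    return math.sqrt(p * (1.0 - p) / total)


def compare_skim_rates(old_counts, old_total, new_counts, new_total, threshold=3.0):
    """Compare per-skim selection rates between two releases.

    old_counts/new_counts: dicts mapping skim name -> number of selected events.
    old_total/new_total: number of input events processed in each release.
    Skims present in only one release are flagged with an infinite pull.
    """
    results = []
    for skim in sorted(set(old_counts) | set(new_counts)):
        if skim not in old_counts or skim not in new_counts:
            results.append(RateComparison(skim, math.nan, math.nan, math.inf, True))
            continue
        old_rate = old_counts[skim] / old_total
        new_rate = new_counts[skim] / new_total
        sigma = math.hypot(binomial_error(old_counts[skim], old_total),
                           binomial_error(new_counts[skim], new_total))
        if sigma > 0:
            pull = (new_rate - old_rate) / sigma
        else:
            # Both rates are 0 or 1: any difference at all is significant.
            pull = 0.0 if new_rate == old_rate else math.inf
        results.append(RateComparison(skim, old_rate, new_rate, pull, abs(pull) > threshold))
    return results
```

With 100000 input events in each release, a hypothetical skim going from 1200 to 1190 selected events gives a pull of about -0.2 and is not flagged, while one going from 50 to 90 gives a pull of about +3.4 and is flagged for the expert to look at.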