Dear All,

The phone conference to discuss HLT database access is booked for Wednesday 14th December at 17:30 CERN time. Call +41 22 767 7000 and ask for the ATLAS online database meeting organised by Richard Hawkings.

For some introduction to the ATLAS model for conditions data, have a look at section 4.4 in the ATLAS computing TDR (in particular online and Athena access, performance and scalability):
http://atlas-proj-computing-tdr.web.cern.ch/atlas-proj-computing-tdr/PDF/Computing-TDR-final-July04.pdf

I enclose below some discussion of the current status of HLT database replication possibilities, which I hope is understandable. I should apologise - I had hoped to write something more coherent as input to the meeting, but have run out of time for today.

regards,
  Richard.

--------------------------

Here is some more input to the A-Team HLT database access discussion from the last A-Team meeting (consider it a response to my 'actions' from that meeting).

Actions

>>> 1. Ask Richard to explain the mechanisms to propagate database changes from the primary to the secondary cache (subfarm) and the local cache (local disk of a farm node).

-> At the moment, this is largely undefined. There will be a 'master' online database server (Oracle), which will actually reside in the computer centre. For database scalability and network bandwidth reasons, some sort of local replication or caching will be needed to support access by the 1000s of HLT nodes. For relational databases accessed by COOL or RAL (all Athena database access should go this way) we have two possibilities:

- Make physical replicas of the data which will be needed in the next HLT run. RAL allows us to prepare these replicas as local MySQL databases or SQLite files, which can then be distributed to intermediate servers (e.g. one per rack) or even (in the case of SQLite) to each worker node. The issues here are:
    - identify the data needed for the next run
    - extract the necessary data from Oracle and copy it to the other technology
    - distribute the data (MySQL database dump, or SQLite file) to the nodes that need it
  This will clearly take some time; we need to understand whether this should be done 'from scratch' at each new 'prepare for run' step, or whether we should foresee some kind of incremental update. COOL already provides some selective replication tools which can do the 'identify + extract' step here, for data which is stored directly in COOL. For data which is stored as references from COOL to other database info, we need to provide some intelligence in the replication tool so that it also follows the links from COOL to the payload data tables. (A rough sketch of this extract-and-copy step is given below.)

- The alternative possibility is to use the RAL Frontier approach, where database SQL lookups are encoded as http requests, which can then be cached at an intermediate proxy server. Since each HLT node will make EXACTLY the same database queries at the start of run, this is potentially a very attractive approach: we would have one (or maybe more) layer(s) of proxy servers talking to the database, and the worker nodes would just talk to the proxies. Since this is done at the RAL level, it copes with both COOL and RAL database lookups, so it is quite attractive. The easiest way to update at a new run would simply be to flush all the proxies and have them re-retrieve from the database (it would probably be worth having one 'pilot' worker node go first to seed the proxy caches with the necessary data). (A second sketch of this caching idea is also given below.)
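
To make the 'identify + extract + distribute' step of the physical-replica option a bit more concrete, here is a rough sketch in Python. It is purely illustrative: the real tools would go through COOL/RAL against Oracle, whereas here sqlite3 stands in for both ends, and the table and column names (a single 'conditions' table with folder, channel, IOV and payload columns) are invented for the example.

    import sqlite3

    def extract_for_run(master_db, replica_file, run_start, run_end):
        """Copy the conditions rows whose IOV overlaps [run_start, run_end)
        from the master database into a fresh SQLite replica file."""
        src = sqlite3.connect(master_db)      # stand-in for the Oracle master
        dst = sqlite3.connect(replica_file)   # the file to ship to rack servers / nodes
        dst.execute("""CREATE TABLE IF NOT EXISTS conditions
                       (folder TEXT, channel INTEGER,
                        iov_since INTEGER, iov_until INTEGER, payload BLOB)""")
        # Select only objects whose interval of validity overlaps the run.
        rows = src.execute("""SELECT folder, channel, iov_since, iov_until, payload
                              FROM conditions
                              WHERE iov_since < ? AND iov_until > ?""",
                           (run_end, run_start))
        dst.executemany("INSERT INTO conditions VALUES (?,?,?,?,?)", rows)
        dst.commit()
        src.close()
        dst.close()

    # e.g. extract_for_run("master.db", "replica_run12345.db", t_start, t_end);
    # the replica file would then be distributed to the subfarm servers or worker nodes.

Following references from COOL folders into separate payload tables would add a further query per folder, which is exactly the extra 'intelligence' the replication tool still needs.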
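
And an equally rough sketch of the Frontier-style caching idea, again not the real Frontier/RAL code: the point is only that once a query is encoded in the request URL, identical queries from every HLT node after the first ('pilot') node are served from the proxy cache without touching Oracle. The query_master_database function and the port number are invented placeholders.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    cache = {}  # request path (encoded query) -> cached result

    def query_master_database(encoded_query):
        # Placeholder for the lookup the proxy layer would do against Oracle.
        return ("result for %s" % encoded_query).encode()

    class CachingProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path not in cache:               # first (pilot) request goes to the database
                cache[self.path] = query_master_database(self.path)
            self.send_response(200)                  # every later node gets the cached copy
            self.end_headers()
            self.wfile.write(cache[self.path])

    # 'Flushing the proxies' at a new run is just cache.clear();
    # HTTPServer(("", 8000), CachingProxy).serve_forever() would run the proxy.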
The choice between these approaches depends largely on testing, which requires people, which at the moment we don't have (one of the issues for the current online database taskforce).

We also have to deal with the POOL conditions data files required by some subdetectors - here some sort of set of buffer disks is clearly required. I'm hoping we can leverage some of the DDM tools to manage these data, but again, at the moment this is completely unproven.

All of the above discussion assumes that new conditions are loaded into the HLT only at the start of a run, and this will clearly take some time (minutes?). So this is the sort of thing you can do at the start of a fill, but not at the start of a new run within a fill coming from a checkpoint transition. Different people have ideas about how one might perform 'incremental' conditions updates during a fill, but to my mind that looks very difficult without incurring significant deadtime; the architecture of the conditions database is not really designed for this (when you ask for an object, you get back its interval of validity, and you don't expect that to change 'under your feet' during a run - a small sketch at the end of this note illustrates the point).

>>> 5. Ask Richard to describe the mechanism by which new conditions data are produced and the relationship between this and the TDAQ end-of-run transition (if any). Ask about how & when they are propagated to databases. (Relates to 4.)

We can expect new conditions data to be produced during a run, and to make their way into the online conditions database. This is largely asynchronous with respect to the TDAQ end-of-run transition - data will be produced during a fill, and possibly afterwards as a result of monitoring processes. If we start the next run with some sort of cache flush, these data will be picked up automatically when Athena makes a fresh request at the start of the next run. One problem I see is synchronising the start of run: how do we decide when all the required updates are in and registered in the online conditions database? Only then can we start setting up the replicas for the HLT nodes to use (whether physical replicas or virtual Frontier ones). This process could take some time... (The last sketch below illustrates this waiting step.)
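
On the remark above about intervals of validity not changing 'under your feet': the few lines below show how an Athena-like client caches a conditions object together with the IOV returned by the database and only goes back when that interval expires, which is why an update registered inside a cached interval is simply not seen without some explicit flush. The class and the lookup callback are invented for illustration, not taken from the real IOV service.

    class ConditionsCache:
        """Caches each conditions object with the IOV the database returned."""
        def __init__(self, lookup):
            self.lookup = lookup    # function(folder, t) -> (payload, since, until)
            self.cached = {}        # folder -> (payload, since, until)

        def get(self, folder, t):
            if folder in self.cached:
                payload, since, until = self.cached[folder]
                if since <= t < until:
                    return payload  # still valid: no database access, no new data seen
            payload, since, until = self.lookup(folder, t)
            self.cached[folder] = (payload, since, until)
            return payload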
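
Finally, a sketch of the start-of-run synchronisation step mentioned under action 5: before the replicas (physical or Frontier caches) can be prepared, every folder the HLT needs must have an object registered that covers the new run. The folder list, table layout and polling interval are all invented; the point is only that this check-and-wait loop is where the start-of-run delay would come from.

    import sqlite3, time

    REQUIRED_FOLDERS = ["/LAR/Calib", "/MDT/T0", "/TRT/Align"]   # example names only

    def updates_complete(db_file, run_start):
        """True once every required folder has an object covering run_start."""
        db = sqlite3.connect(db_file)
        try:
            for folder in REQUIRED_FOLDERS:
                (n,) = db.execute("""SELECT COUNT(*) FROM conditions
                                     WHERE folder=? AND iov_since<=? AND iov_until>?""",
                                  (folder, run_start, run_start)).fetchone()
                if n == 0:
                    return False      # this folder is not yet up to date
            return True
        finally:
            db.close()

    def prepare_replicas_when_ready(db_file, run_start):
        while not updates_complete(db_file, run_start):
            time.sleep(10)            # polling: this is where the time goes
        # ... now trigger the extraction or proxy-seeding step ...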