The TrainSetup framework allows users to easily set up an analysis train which can be executed in all environments supported by ALICE.
The train definition takes the form of a class deriving from the base class TrainSetup.
Specific hooks in the base class allow users to customize the various aspects of a train. The base class also provides facilities to easily define parameters of the train, which can be set by parsing simple command line options or strings. Furthermore, the basic setup ensures that the analysis becomes a self-contained, self-documenting unit, by storing all relevant files together with the various kinds of output generated during the analysis job.
The execution environment (local, PROOF, Grid) is specified as a simple URL-like string, with room for environment-specific options. This scheme allows a user to run the same analysis in various environments by simply replacing the execution environment URL with another. Dedicated helpers for each type of environment ensure that all needed steps are taken for successful execution of the analysis, regardless of the underlying execution environment.
Trains defined using this framework can either be executed in an interactive AliROOT session or using a stand-alone program.
Users should define a class that derives from TrainSetup. The class should implement the member function TrainSetup::CreateTasks to add needed tasks to the train. The derived class must also override the member function TrainSetup::ClassName to return the name of the derived class as a C-string.
(Please note that TrainSetup does not inherit from TObject, so one should not add a call to the ClassDef macro.)
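A minimal set-up could look like the following sketch (the add-task macro AddTaskMyAnalysis.C is a hypothetical placeholder, and the exact virtual interface should be checked against TrainSetup itself):

class MyTrain : public TrainSetup
{
public:
  // Give the train a default name
  MyTrain(const char* name="myTrain") : TrainSetup(name) {}
protected:
  // Add the needed tasks to the train
  void CreateTasks(AliAnalysisManager* mgr)
  {
    gROOT->Macro("AddTaskMyAnalysis.C"); // hypothetical add-task macro
  }
  // Return the name of this class as a C string (no ClassDef, since
  // TrainSetup does not inherit from TObject)
  const char* ClassName() const { return "MyTrain"; }
};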
Parameters of the user-defined class deriving from TrainSetup are best handled by adding options to the internal member fOptions in the constructor, e.g.,
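(a sketch; the exact Add overloads are defined by the option list class and should be checked there)

MyTrain(const char* name="myTrain")
  : TrainSetup(name)
{
  // Four forms that define parameters with values
  fOptions.Add("type",  "VALUE",  "Which mode to use");            // no default
  fOptions.Add("trig",  "NAME",   "Trigger to analyse", "INEL");   // string default
  fOptions.Add("n",     "NUMBER", "Number of bins",      42);      // integer default
  fOptions.Add("cut",   "NUMBER", "Some cut value",      3.14);    // floating point default
  // Two forms that define flags (toggles)
  fOptions.Add("verbose", "Be verbose");                           // flag, off by default
  fOptions.Add("mc",      "Treat input as simulation", false);     // flag with explicit default
}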
The first four forms define a parameter that has a value, while the last two forms define a flag (or toggle). The values or flags can be retrieved later by doing
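(the accessor names below are assumptions; check the option list class for the exact API)

// E.g., later in CreateTasks of the derived class
TString  type    = fOptions.Get("type");     // value of --type=...
Int_t    n       = fOptions.AsInt("n");      // integer value of --n=...
Double_t cut     = fOptions.AsDouble("cut"); // real value of --cut=...
Bool_t   verbose = fOptions.Has("verbose");  // true if --verbose was given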
Parameters defined this way are directly accessible as options to pass to either runTrain or RunTrain.C.
A user-defined TrainSetup class can then be run like
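(the argument order of the RunTrain.C macro is an assumption here and should be checked against the macro itself)

Root> .x RunTrain.C("<class>", "<name>", "<uri>", "<options>")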
or using the program runTrain
> runTrain --class=<class> --name=<name> --url=<uri> [<options>]
Here,
<class>
The name of the user-defined class deriving from TrainSetup.
<name>
The name of the train (also used for the output sub-directory).
<uri>
The execution environment URL, described below.
<options>
Options for the train.
For runTrain, the options are of the traditional Unix long type: --<option>=<value> and --<option>. The exact list of options for a given train can be listed by passing the option help. In both cases, a new sub-directory, named by the escaped name of the train, is created, and various files are copied there, depending on the mode of execution.
For local analysis, no additional files are copied there, but the output will be put there.
For PROOF analysis, the needed PAR files are copied there and expanded. The output of the job may end up in this directory if so instructed.
For Grid analysis, various JDL and steering scripts are copied to this directory. Scripts to run merge/terminate stages and to download the results are also generated for the user's convenience. The special generated script Watch.C will monitor the progress of the jobs and automatically execute the needed merging and terminate stages. Various files needed by the train are copied to the Grid working directory as a form of documentation.
In all cases, a file named ReRun.C (and, for runTrain, rerun.sh) is generated in this sub-directory. It contains the settings used for the train, and can easily be used to run jobs again, as well as serve as a form of documentation.
This URI has the form
<protocol>://[[<user>@]<host>]/<input>[?<options>][#<treename>]
and specifies several things.
<protocol>
The execution environment to use; one of
local: analysis on locally accessible data files
lite: analysis in a PROOF-lite session on the local machine
proof: analysis on a PROOF cluster
alien: analysis on the Grid
[[<user>@]<host>]
Optional user name and host of the PROOF cluster (relevant for proof only).
<input>
The input data: a directory for local and lite, a data set name for proof, and a Grid data directory for alien.
<options>
Options for the selected execution environment helper.
<treename>
The name of the tree to analyse, e.g., esdTree or aodTree.
Local and Grid jobs are in a sense very similar. That is, the individual Grid jobs are very much like local jobs, in that they always produce output files (albeit not after Terminate, though parameter container files are (re)made).
PROOF jobs are very different. In a PROOF analysis, each slave only produces output in memory, which is then sent via network connections (sockets) to the master. One therefore needs to be very careful about output object ownership and the like.
Another major difference is that output files are generated within the PROOF cluster and are generally not accessible from the outside. For plain PROOF clusters in a local area network, or so-called Lite sessions, this is generally not a problem, since the files are accessible on the LAN, or on the local machine for Lite sessions. However, for large scale analysis farms (AAFs), the workers and masters are generally on an inaccessible sub-net, and there is no direct access to the produced files. For normal output files, like histogram files, there are provisions for this, which means the final merged output is sent back to the client. Special output, such as AODs, is however neither merged nor sent back to the user by default. There are two ways to deal with this:
The first mode is specified by passing the option dsname[=<name>] in the cluster URI. The created data set will normally be made in /default/<user>/<name>. If the =<name> part is left out, the escaped name of the job will be used.
The second mode is triggered by passing the option storage=<uri> to the train set-up. The URI should be of the form
rootd://<host>[:<port>]/<path>
where <host> is the name of a machine accessible by the cluster, <port> is an optional port number (e.g., if different from 1093), and <path> is an absolute path on <host>.
The XRootd process should be started (optionally by the user) on <host> as
xrootd -p <port> <path>
When running jobs on AAFs, one can use the Grid handler to set up aspects of the job. To enable the Grid handler, pass the option plugin in the execution URI.
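For example (user name and data set name are hypothetical):

proof://myuser@alice-caf.cern.ch/myDataSet?plugin#esdTree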
For both ESD and AOD input for local jobs, one must specify the root of the sub-tree that holds the data. That is, if, for example, the data reside in a directory structure like
/some/directory/<run>/<seq>/AliESDs.root
then one should specify the input location like
local:///some/directory[?pattern=AliESDs.root][#esdTree]
lite:///some/directory[?pattern=AliESDs.root][#esdTree]
/some/directory is then searched recursively for input files that match the pattern given by the analysis type (ESD: AliESDs.root, AOD: AliAOD.root). The found files are then chained together. If MC input is specified, then the companion files galice.root, Kinematics.root, and TrackRefs.root must be found in the same directories as the AliESDs.root files.
The input data for a PROOF based analysis is specified as data set names,
proof://[<user>@]<host>/<data-set-name>[?options][#<treename>]
Suppose the ESD files are stored on the Grid as
/alice/data/<year>/<period>/<run>/ESDs/pass<no>/<year><run><chunk>.<part>/AliESDs.root
where <run> is typically zero-padded with three '0's. One should specify the input location like
alien:///alice/data/<year>/<period>?pattern=ESDs/pass<no>/*&run=<run>[#<treename>]
If a particular kind of pass is needed, say pass<no>_MUON, one should modify the pattern option accordingly, so that the matched files are

/alice/data/<year>/<period>/<run>/ESDs/pass<no>_MUON/*/AliESDs.root
For simulation output, the files are generally stored like
/alice/sim/<year>/<prod>/<run>/<seq>/AliESDs.root
where <run> is generally not zero-padded. One should specify the input location like
alien:///alice/sim/<year>/<prod>?pattern=*&mc&run=<run>[#<treename>]
Suppose your AOD files are placed in directories like
/some/directory/<run>/<seq>/AliAOD.root
where <run> is typically zero-padded with three '0's. One should then specify the input as

alien:///some/directory?pattern=*&run=<run>[#<treename>]
The AliEn analysis plug-in is then instructed to look for data files under
/some/directory/<run>/*/AliAOD.root
for each added run.
Suppose the AODs are in
/alice/data/<year>/<period>/<run>/ESDs/pass<no>/AOD<vers>/<seq>/AliAOD.root
Then the url should be
alien:///alice/data/<year>/<period>?pattern=ESDs/pass<no>/AOD<vers>/*&run=<run>[#<treename>]
Auxiliary libraries should be loaded using the set-up's dedicated member function for loading libraries, where the argument is the name of the library. If the train needs additional files, say a script for setting up the tasks, or some data file, they can be passed on to the PROOF/Grid workers using dedicated member functions as well.
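A sketch of both calls, assuming member functions named LoadLibrary and LoadAux (the exact names and signatures should be checked against the TrainSetup and Helper headers):

void CreateTasks(AliAnalysisManager* mgr)
{
  // Load an auxiliary library on the workers (library name is an example)
  LoadLibrary("PWGLFforward2");
  // Ship an extra file, e.g., a task set-up script, to the workers
  LoadAux("SetupTasks.C");
}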
The base class TrainSetup tries to implement a sensible set-up for a given type of analysis, but sometimes a particular train needs a bit of tweaking. One can therefore overload a number of virtual member functions to customise the set-up.
A task can even be defined in a script.
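A minimal sketch of such a script-defined task (class name and histogram content are hypothetical):

// MyScriptTask.C
#include <AliAnalysisTaskSE.h>
#include <TH1D.h>
#include <TList.h>

class MyScriptTask : public AliAnalysisTaskSE
{
public:
  MyScriptTask() : AliAnalysisTaskSE(), fList(0), fHist(0) {}
  MyScriptTask(const char* name)
    : AliAnalysisTaskSE(name), fList(0), fHist(0)
  {
    DefineOutput(1, TList::Class());
  }
  void UserCreateOutputObjects()
  {
    fList = new TList;
    fList->SetOwner(); // important for PROOF: the list owns its contents
    fHist = new TH1D("nTracks", "Number of tracks", 100, 0, 100);
    fList->Add(fHist);
    PostData(1, fList);
  }
  void UserExec(Option_t*)
  {
    fHist->Fill(InputEvent()->GetNumberOfTracks());
    PostData(1, fList);
  }
private:
  TList* fList; // output list
  TH1D*  fHist; // example histogram
  ClassDef(MyScriptTask,1);
};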
Our train set-up can then use the member function ParUtilities::MakeScriptPAR to make a PAR file from the script, use that to build a library which is loaded on the workers, and then create an object of the task defined in the script.
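Schematically (the argument list of MakeScriptPAR is an assumption and should be checked against ParUtilities):

// In MyTrain::CreateTasks - make a PAR from the script, build and
// load the resulting library on the workers (sketch only)
ParUtilities::MakeScriptPAR(false, "MyScriptTask.C", "", fHelper);
// The task class is now available and can be instantiated; it must
// still be added to the analysis manager and its containers connected
gROOT->ProcessLine("new MyScriptTask(\"myScriptTask\")");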
This allows for fast development and testing of analysis tasks, without having to wait for official tasks and builds of all of AliROOT.
If you want to run an ESD analysis with a set of tender supplies, all you have to do is pass the option --tender=<list> to runTrain. Here, <list> is a list of tender supply names.
If you need to specify a non-standard OCDB location, you can do so using the option --ocdb=<location>, where <location> can be an OCDB snapshot or a valid OCDB URL.
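For example, to point the train at a local OCDB copy (the path is hypothetical):

> runTrain --class=MyTrain --name=myTest --url=<uri> --ocdb=local://$HOME/OCDB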
If you pass the option --ocdb, possibly with an argument, then an instance of the class AliTaskConnectCDB will be added to the train. This task automatically connects to the OCDB for the run being analysed.
The option --ps=<option> defines how to set up the physics selection. Here, <option> can be one of
none
In this case the physics selection is completely disabled.
custom[=Script]
A custom physics selection is read from the script Script. If no Script is specified, then Script=CustomPS.C is assumed. The script must define a function with the same name as the script, and that function must accept a single pointer to an AliPhysicsSelection object; an example is sketched below.
bare
In this case a physics selection is installed on the input handler, but there is no accompanying task.
all
Disable filtering on background triggers.
To enable friends in the analysis, pass the option --friends.
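A minimal sketch of such a custom physics selection script (file name and trigger class are illustrative):

// MyPS.C - the function must have the same name as the script and
// accept a pointer to the physics selection object to modify
void MyPS(AliPhysicsSelection* ps)
{
  // Example tweak: accept an additional collision trigger class
  ps->AddCollisionTriggerClass("+CINT1B-ABCE-NOPF-ALL");
}

It would then be enabled by passing --ps=custom=MyPS.C to runTrain.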
The specifics of each possible execution environment and input are handled by sub-classes of the base class Helper. Each of these helpers defines the steps needed in its particular environment.
Currently defined helpers are