AliPhysics  95775ff (95775ff)
Using the TrainSetup facility

# Overview

The TrainSetup framework allows users to easily set up an analysis train which can be executed in all environments supported by ALICE.

The train definition takes the form of a class deriving from the base class TrainSetup.

Specific hooks in the base class allow users to customize the various aspects of a train. The base class also provides facilities to easily define parameters of the train, which can be set by parsing simple command-line options or strings. Furthermore, the basic setup ensures that the analysis becomes a self-contained, self-documenting unit by storing all relevant files together with the various kinds of output generated during the analysis job.

The execution environment (local, PROOF, Grid) is specified as a simple URL-like string, with room for environment-specific options. This scheme allows a user to run the same analysis in various environments by simply replacing the execution environment URL with another. Helpers for each type of environment ensure that all steps needed for successful execution of the analysis are taken, regardless of the underlying execution environment.

Trains defined using this framework can either be executed in an interactive AliROOT session or using a stand-alone program.

# Usage

Users should define a class that derives from TrainSetup. The class should implement the member function TrainSetup::CreateTasks to add needed tasks to the train. The derived class must also override the member function TrainSetup::ClassName to return the name of the derived class as a C-string.

    // MyTrain.C
    class MyTrain : public TrainSetup
    {
    public:
      MyTrain(const char* name="MyTrain")
        : TrainSetup(name)
      {
        // fOptions.Set("type", "AOD"); // AOD input
        // fOptions.Set("type", "ESD"); // ESD input
        fOptions.Add("parameter", "VALUE", "Help on parameter", "value");
      }
    protected:
      // Add the needed tasks to the train
      void CreateTasks(AliAnalysisManager* mgr)
      {
        AliAnalysisManager::SetCommonFileName("my_analysis.root");
        Bool_t mc = mgr->GetMCtruthEventHandler() != 0;  // MC input?
        Double_t param = fOptions.AsDouble("parameter"); // User option
        // Create the tasks here, configured according to mc and param
      }
      // Must return the name of this class
      const char* ClassName() const { return "MyTrain"; }
    };

(Please note that TrainSetup does not inherit from TObject, so one should not put in a call to the ClassDef macro.)

# Parameters of the setup

Parameters of the user-defined class deriving from TrainSetup are best handled by adding options to the internal member fOptions in the constructor, e.g.,

The first 4 forms define a parameter that has a value, while the last 2 forms define a flag (or toggle). The values or flags can be retrieved later by doing

Double_t value = fOptions.AsDouble("<name>",<value if not set>);
Int_t value = fOptions.AsInt("<name>",<value if not set>);
Long64_t value = fOptions.AsLong("<name>",<value if not set>);
Bool_t value = fOptions.AsBool("<name>",<value if not set>);
TString value = fOptions.Get("<name>");
Bool_t value = fOptions.Has("<name>");

Parameters defined this way are directly accessible as options to pass to either runTrain or RunTrain.C.

# Execution of the train

A user defined TrainSetup class can then be run like

Root> .x RunTrain.C("<class>", "<name>", "<uri>", "<options>")

or using the program runTrain

  > runTrain --class=<class> --name=<name> --url=<uri> [<options>]

Here,

<class>
is the name of the user defined class deriving from TrainSetup.
<name>
is an arbitrary name to give to the train. Note that an escaped name is generated from it: spaces and the like are replaced with '_', and the date and time are optionally appended.
<uri>
is the job execution URI, which specifies both the execution environment and the input data, as well as some options. See more below.
<options>
is a list of options. For RunTrain.C this is a comma-separated list of options of the form <option>=<value> for value options and <option> for flags (booleans). For runTrain, the options are of the traditional Unix long type: --<option>=<value> and --<option>. The exact list of options for a given train can be listed by passing the option help.
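The two option styles carry the same information, so one can be derived from the other. The following is a minimal sketch of that mapping in plain C++; the helper toLongOptions and the flag name mc are hypothetical and not part of TrainSetup, RunTrain.C, or runTrain.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of TrainSetup: turn a RunTrain.C-style
// option string into runTrain-style long options.
std::vector<std::string> toLongOptions(const std::string& commaSeparated)
{
    std::vector<std::string> result;
    std::istringstream in(commaSeparated);
    std::string token;
    while (std::getline(in, token, ',')) {
        if (token.empty()) continue;      // ignore empty entries ("a,,b")
        result.push_back("--" + token);   // "<option>=<value>" or "<option>"
    }
    return result;
}

// Usage: one value option and one flag (both names are illustrative).
void exampleToLongOptions()
{
    const std::vector<std::string> args = toLongOptions("parameter=3.5,mc");
    assert(args.size() == 2);
    assert(args[0] == "--parameter=3.5");
    assert(args[1] == "--mc");
}
```

The sketch does no quoting or escaping, so values that themselves contain commas are out of its scope.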

In both cases, a new sub-directory named after the escaped name of the train is created, and various files are copied there, depending on the mode of execution.
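As an illustration of how such an escaped name could be derived, here is a small sketch in plain C++. The helper escapeName is hypothetical; the exact set of replaced characters and the format of the date and time suffix are assumptions, and the rule actually applied by TrainSetup may differ.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: derive a directory-safe name from a train name.
// Assumption: every character that is not a letter, a digit, or '_'
// is replaced with '_'. An optional, already formatted date/time
// stamp is appended after another '_'.
std::string escapeName(const std::string& name, const std::string& stamp = "")
{
    std::string out;
    out.reserve(name.size() + stamp.size() + 1);
    for (unsigned char c : name)
        out += (std::isalnum(c) || c == '_') ? static_cast<char>(c) : '_';
    if (!stamp.empty()) out += "_" + stamp;
    return out;
}

// Usage: spaces and a slash are replaced; the stamp format is made up.
void exampleEscapeName()
{
    assert(escapeName("My first train") == "My_first_train");
    assert(escapeName("a/b c", "20240101_1200") == "a_b_c_20240101_1200");
}
```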

For local analysis, no additional files are copied there, but the output will be put there.

For PROOF analysis, the needed PAR files are copied there and expanded. The output of the job may end up in this directory if so instructed.

For Grid analysis, various JDL and steering scripts are copied to this directory. Scripts to run merge/terminate stages and to download the results are also generated for the user's convenience. The special generated script Watch.C will monitor the progress of the jobs and automatically execute the needed merging and terminate stages. Various files needed by the train are copied to the Grid working directory as a form of documentation.

In all cases, a file named ReRun.C (and for runTrain: rerun.sh) is generated in this sub-directory. It contains the settings used for the train and can easily be used to run jobs again, as well as serve as a form of documentation.

# Execution URI

This URI has the form

  <protocol>://[[<user>@]<host>]/<input>[?<options>][#<treename>]

and specifies several things.

<protocol>
One of
local
Local analysis on local data executed sequentially on the local machine
lite
Proof-Lite analysis on local data executed in parallel on the local machine
proof
Proof analysis on cluster data executed in parallel on a PROOF cluster
alien
Grid analysis on grid data executed on the Grid
[[<user>@]<host>]
Sets the master host for Proof analysis
<input>
Input data specification. The exact form depends on the protocol used, e.g., for local analysis it can be a single directory, while for other environments it could be a data set name, and so on.
<options>
Protocol specific options
<treename>
If specified, gives what data to analyse
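To make the structure of the execution URI concrete, the following sketch splits such a string into its components using only the C++ standard library. It is an illustration under stated assumptions, not the parser used by TrainSetup: the names ExecUri and parseExecUri are hypothetical, the input part is kept with its leading '/', and the options are left as one unsplit string.

```cpp
#include <cassert>
#include <string>

// Components of <protocol>://[[<user>@]<host>]/<input>[?<options>][#<treename>]
struct ExecUri {
    std::string protocol;  // "local", "lite", "proof", or "alien"
    std::string user;      // optional
    std::string host;      // optional
    std::string input;     // kept with its leading '/'
    std::string options;   // text after '?', not split further
    std::string treename;  // text after '#'
};

// Hypothetical parser for illustration; returns false on malformed input.
bool parseExecUri(const std::string& uri, ExecUri& out)
{
    out = ExecUri();
    const std::string::size_type sep = uri.find("://");
    if (sep == std::string::npos || sep == 0) return false;
    out.protocol = uri.substr(0, sep);
    std::string rest = uri.substr(sep + 3);

    // Strip "#<treename>" first, then "?<options>".
    std::string::size_type pos = rest.find('#');
    if (pos != std::string::npos) {
        out.treename = rest.substr(pos + 1);
        rest.erase(pos);
    }
    pos = rest.find('?');
    if (pos != std::string::npos) {
        out.options = rest.substr(pos + 1);
        rest.erase(pos);
    }

    // The remainder is "[[<user>@]<host>]/<input>".
    pos = rest.find('/');
    if (pos == std::string::npos) return false;
    const std::string authority = rest.substr(0, pos);
    out.input = rest.substr(pos);

    pos = authority.find('@');
    if (pos == std::string::npos) {
        out.host = authority;
    } else {
        out.user = authority.substr(0, pos);
        out.host = authority.substr(pos + 1);
    }
    return true;
}

// Usage: a local URI with a pattern option and a tree name.
void exampleParseExecUri()
{
    ExecUri u;
    assert(parseExecUri("local:///some/directory?pattern=AliESDs.root#esdTree", u));
    assert(u.protocol == "local" && u.host.empty());
    assert(u.input == "/some/directory");
    assert(u.options == "pattern=AliESDs.root");
    assert(u.treename == "esdTree");
}
```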

# PROOF specifics

Local and Grid jobs are in a sense very similar. That is, the individual Grid jobs are very much like local jobs, in that they always produce output files (albeit not after Terminate, though parameter container files are (re)made).

PROOF jobs are very different. In a PROOF analysis, each slave only produces in-memory output, which is then sent via network connections (sockets) to the master. One therefore needs to be very aware of output object ownership and the like.

Another major difference is that output files are generated within the PROOF cluster, and are generally not accessible from the outside. For plain PROOF clusters in a local area network, or for so-called Lite sessions, this is generally not a problem, since the files are accessible on the LAN or, for Lite sessions, on the local machine. However, for large-scale analysis farms (AAFs), the workers and masters are generally on an inaccessible sub-net, and there is no direct access to the produced files. For normal output files, such as histogram files, there are provisions for this, which means the final merged output is sent back to the client. Special output, such as AODs, is however neither merged nor sent back to the user by default. There are two ways to deal with this:

1. Register the output tree as a data set on the cluster. This is useful if you need to process the results again on the cluster.
2. Send the output to a (possibly custom) XRootd server. This is useful if you need to process the output outside of the cluster.

The first mode is specified by passing the option dsname=<name> in the cluster URI. The created dataset will normally be made in /default/<user>/<name>. If the =<name> part is left out, the escaped name of the job will be used.

The second mode is triggered by passing the option storage=URI to the train setup. The URI should be of the form

  rootd://<host>[:<port>]/<path>

where <host> is the name of a machine accessible by the cluster, <port> is an optional port number (e.g., if different from 1093), and <path> is an absolute path on <host>.

The XRootd process should be started (optionally by the user) on <host> as

  xrootd -p <port> <path>

When running jobs on AAFs, one can use the Grid handler to set up aspects of the job. To enable the Grid handler, pass the option plugin in the execution URI.

# Specifying the input

## Local and Lite data input

For both ESD and AOD input for local jobs, one must specify the root of the sub-tree that holds the data. That is, if - for example - the data resides in a directory structure like

  /some/directory/<run>/<seq>/AliESDs.root

then one should specify the input location like

  local:///some/directory[?pattern=AliESDs.root][#esdTree]
  lite:///some/directory[?pattern=AliESDs.root][#esdTree]

/some/directory is then searched recursively for input files that match the pattern given by the analysis type (ESD: AliESDs.root, AOD: AliAOD.root). The files found are then chained together. If MC input is specified, then the companion files galice.root, Kinematics.root, and TrackRefs.root must be found in the same directories as the AliESDs.root files.

## PROOF input

The input data for a PROOF based analysis is specified as data set names,

  proof://[<user>@]<host>/<data-set-name>[?options][#<treename>]

## Grid ESD input

Suppose the ESD files are stored on the Grid as

  /alice/data/<year>/<period>/<run>/ESDs/pass<no>/<year><run><chunk>.<part>/AliESDs.root

where <run> is zero-padded by typically 3 '0's. One should specify the input location like

  alien:///alice/data/<year>/<period>?pattern=ESDs/pass<no>/*&run=<run>[#<treename>]

If a particular kind of pass is needed, say pass<no>_MUON, one should modify the pattern option accordingly

  /alice/data/<year>/<period>/<run>/ESDs/pass<no>_MUON/*/AliESDs.root
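The Grid ESD input URI above can be assembled from its parts as in the following sketch. The helper gridEsdUri is hypothetical, the year, period, pass, and run values in the usage example are placeholders chosen only for illustration, and the run number is written as given, without any zero-padding (an assumption).

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: compose the execution URI for Grid ESD input,
//   alien:///alice/data/<year>/<period>?pattern=ESDs/<pass>/*&run=<run>
// 'pass' is the full pass directory name, e.g. "pass2" or "pass2_MUON".
std::string gridEsdUri(int year, const std::string& period,
                       const std::string& pass, int run,
                       const std::string& treename = "")
{
    std::ostringstream s;
    s << "alien:///alice/data/" << year << '/' << period
      << "?pattern=ESDs/" << pass << "/*&run=" << run;
    if (!treename.empty()) s << '#' << treename;   // optional tree name
    return s.str();
}

// Usage with placeholder values: a plain pass and a _MUON pass.
void exampleGridEsdUri()
{
    assert(gridEsdUri(2010, "LHC10h", "pass2", 137161) ==
           "alien:///alice/data/2010/LHC10h?pattern=ESDs/pass2/*&run=137161");
    assert(gridEsdUri(2010, "LHC10h", "pass2_MUON", 137161, "esdTree") ==
           "alien:///alice/data/2010/LHC10h?pattern=ESDs/pass2_MUON/*&run=137161#esdTree");
}
```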

For simulation output, the files are generally stored like

  /alice/sim/<year>/<prod>/<run>/<seq>/AliESDs.root

where <run> is generally not zero-padded. One should specify the input location like

  alien:///alice/sim/<year>/<prod>?pattern=*&mc&run=<run>[#<treename>]

## Grid AOD input

Suppose your AOD files are placed in directories like

  /some/directory/<run>/<seq>/AliAOD.root

where <run> is zero-padded by typically 3 '0's. One should then specify the input as

  alien:///some/directory?pattern=*&run=<run>[#<treename>]

The AliEn analysis plug-in is then instructed to look for data files under

  /some/directory/<run>/*/AliAOD.root

Suppose the AODs are in

  /alice/data/<year>/<period>/<run>/ESDs/pass<no>/AOD<vers>/<seq>/AliAOD.root

Then the URL should be

  alien:///alice/data/<year>/<period>?pattern=ESDs/pass<no>/AOD<vers>/*&run=<run>[#<treename>]

# Other features

## Auxiliary libraries, sources, and files

Auxiliary libraries should be loaded using

where the argument is the name of the library.

If the train needs additional files, say a script for setting up the tasks, or some data file, they can be passed on to the PROOF/Grid workers using the member functions

The base class TrainSetup tries to implement a sensible setup for a given type of analysis, but sometimes a particular train needs a bit of tweaking. One can therefore overload the following functions

A task can even be defined in a script, for example:

    // MyAnalysis.C
    #ifndef __CINT__
    # include <AliAnalysisManager.h>
    # include <AliESDEvent.h>
    # include <AliMultiplicity.h>
    # include <AliVEventHandler.h>
    # include <AliESDVertex.h>
    # include <AliProdInfo.h>
    # include <TH1D.h>
    # include <TH2D.h>
    # include <TList.h>
    # include <TMath.h>
    #else
    class TH1D;
    class TH2D;
    class AliProdInfo;
    #endif
    #include <AliAnalysisTaskSE.h>
    class MyAnalysis : public AliAnalysisTaskSE
    {
    public:
      // Default constructor
      MyAnalysis()
        : AliAnalysisTaskSE(), fList(0), fMult(0), fVz(0), fProd(0)
      {}
      // Named constructor: defines the output slots and needed branches
      MyAnalysis(const char* name)
        : AliAnalysisTaskSE(name), fList(0), fMult(0), fVz(0), fProd(0)
      {
        DefineOutput(1, TList::Class());
        DefineOutput(2, TList::Class()); // For output from Terminate
        fBranchNames = "AliMultiplicity.,SPDVertex.,PrimaryVertex.";
      }
      // Copy constructor: does not copy the output objects
      MyAnalysis(const MyAnalysis& o)
        : AliAnalysisTaskSE(o), fList(0), fMult(0), fVz(0), fProd(0)
      {}
      virtual ~MyAnalysis() {}
      MyAnalysis& operator=(const MyAnalysis&) { return *this; }
      // Book the histograms and post the (still empty) output list
      virtual void UserCreateOutputObjects()
      {
        fList = new TList();
        fList->SetName("Sums");
        fList->SetOwner();
        fMult = new TH2D("mult", "SPD tracklets", 80, -2, 2, 10, -10, 10);
        fMult->SetXTitle("#eta");
        fMult->SetYTitle("v_{z} [cm]");
        fMult->Sumw2();
        fMult->SetDirectory(0); // Disassociate from file
        fVz = new TH1D("vz", "Interaction point", 10, -10, 10);
        fVz->SetXTitle("v_{z} [cm]");
        fVz->Sumw2();
        fVz->SetDirectory(0); // Disassociate from file
        fList->Add(fMult); // The list owns the histograms; Terminate
        fList->Add(fVz);   // looks them up by name
        PostData(1, fList);
      }
      // Process a single event
      virtual void UserExec(Option_t* )
      {
        // On the first event, read the production information
        if (!fProd) {
          AliAnalysisManager *mgr=AliAnalysisManager::GetAnalysisManager();
          AliVEventHandler *inputHandler=mgr->GetInputEventHandler();
          if (inputHandler) {
            Info("", "Got input handler");
            TList *uiList = inputHandler->GetUserInfo();
            if (uiList) {
              Info("", "Got user list:");
              uiList->ls();
              fProd = new AliProdInfo(uiList);
              Info("", "Listing production information");
              fProd->List();
            }
          }
        }
        AliESDEvent* event = dynamic_cast<AliESDEvent*>(InputEvent());
        if (!event) return;
        // Reject pile-up events and events without a good SPD vertex
        if (event->IsPileupFromSPD(3,0.8)) return;
        const AliESDVertex* vtx = event->GetPrimaryVertexSPD();
        if (!vtx || !vtx->GetStatus()) return;
        if (vtx->IsFromVertexerZ() &&
            (vtx->GetDispersion() > 0.2 || vtx->GetZRes() > 1.25 * 0.2))
          return;
        const AliMultiplicity* mult = event->GetMultiplicity();
        if (!mult) return;
        // Count the event and fill the tracklet pseudorapidities
        Double_t vz = vtx->GetZ();
        fVz->Fill(vz);
        Int_t nTracklets = mult->GetNumberOfTracklets();
        for (Int_t i = 0; i < nTracklets; i++)
          fMult->Fill(mult->GetEta(i), vz);
        PostData(1, fList);
      }
      // Normalise the sums to the number of events in each vertex bin
      virtual void Terminate(Option_t* )
      {
        TList* l = dynamic_cast<TList*>(GetOutputData(1));
        if (!l) {
          Warning("Terminate", "No out data # 1 found");
          return;
        }
        TH2D* mult = static_cast<TH2D*>(l->FindObject("mult"));
        TH1D* vz = static_cast<TH1D*>(l->FindObject("vz"));
        if (!mult || !vz) {
          Warning("Terminate", "Missing histogram: mult=%p, vz=%p",
                  mult, vz);
          return;
        }
        TList* output = new TList; // Needed for new output from Terminate
        output->SetName("Results"); // 1st output re-opened read-only
        output->SetOwner();
        TH2D* out = static_cast<TH2D*>(mult->Clone("dndeta"));
        out->SetTitle("dN_{ch}/d#eta from SPD tracklets per vertex bin");
        out->SetZTitle("#frac{1}{N}#frac{dN_{ch}}{d#eta}");
        out->SetDirectory(0); // Disassociate from file
        output->Add(out); // Store the result in the list that is posted
        Int_t nVz = mult->GetNbinsY();
        Int_t nEta = mult->GetNbinsX();
        for (Int_t iVz = 1; iVz <= nVz; iVz++) {
          Double_t nEv = vz->GetBinContent(iVz);
          Double_t e1 = vz->GetBinError(iVz);
          Double_t sca = (nEv == 0 ? 0 : 1. / nEv);
          for (Int_t iEta = 1; iEta <= nEta; iEta++) {
            Double_t c = mult->GetBinContent(iEta,iVz);
            Double_t e = mult->GetBinError(iEta,iVz);
            // Propagated uncertainty of the ratio c / nEv
            Double_t ee = TMath::Sqrt(c*c * e1*e1 + nEv*nEv * e*e) * sca*sca;
            out->SetBinContent(iEta, iVz, sca * c);
            out->SetBinError(iEta, iVz, ee);
          }
        }
        // Divide by the eta bin width
        Double_t etaMin = mult->GetXaxis()->GetXmin();
        Double_t etaMax = mult->GetXaxis()->GetXmax();
        out->Scale(Double_t(nEta) / (etaMax-etaMin));
        PostData(2, output);
      }
    protected:
      TList* fList;       // Output list ("Sums")
      TH2D* fMult;        // Tracklet eta versus vertex z
      TH1D* fVz;          // Vertex z distribution
      AliProdInfo* fProd; // Production information
    };
    //
    // EOF
    //
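The per-bin uncertainty set in Terminate above can be read as standard error propagation for a ratio. As a sketch, and assuming that the bin content $c$ (uncertainty $\sigma_c$, `e` in the code) and the event count $N$ (uncertainty $\sigma_N$, `e1` in the code) are uncorrelated, which is an assumption made here and not stated in the listing:

```latex
\begin{align}
  f &= \frac{c}{N}, \\
  \sigma_f^2 &= \left(\frac{\partial f}{\partial c}\right)^2 \sigma_c^2
              + \left(\frac{\partial f}{\partial N}\right)^2 \sigma_N^2
             = \frac{\sigma_c^2}{N^2} + \frac{c^2\,\sigma_N^2}{N^4}, \\
  \sigma_f &= \frac{\sqrt{c^2\,\sigma_N^2 + N^2\,\sigma_c^2}}{N^2}.
\end{align}
```

The last line corresponds to `TMath::Sqrt(c*c * e1*e1 + nEv*nEv * e*e) * sca*sca` with `sca` equal to $1/N$. The final call to `Scale` then divides by the bin width $\Delta\eta = (\eta_{\max}-\eta_{\min})/n_\eta$, which turns the per-event counts into a density in $\eta$.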

Our train set-up can then use the member function ParUtilities::MakeScriptPAR to make a PAR file of the script and use that to make a library loaded on the workers and then generate an object of our task defined in the script.

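    #ifndef __CINT__
    # include <AliAnalysisManager.h>
    # include <AliAnalysisDataContainer.h>
    # include <AliAnalysisTaskSE.h>
    # include <TList.h>
    # include <TROOT.h>
    #else
    class AliAnalysisManager;
    class AliAnalysisTaskSE;
    #endif
    #include "TrainSetup.C"
    #include "ParUtilities.C"
    class MyTrain : public TrainSetup
    {
    public:
      MyTrain(const char* name="myTest") : TrainSetup(name)
      {
        fOptions.Set("type", "ESD"); // ESD input
      }
      void CreateTasks(AliAnalysisManager* mgr)
      {
        // Make a PAR file of the script. Arguments other than the script
        // name and the start of the library list are elided (...)
        if (!ParUtilities::MakeScriptPAR(/* ... */
                                         "MyAnalysis.C",
                                         "STEERBase,ESD,AOD,ANALYSIS,"
                                         /* ... */))
          Fatal("CreateTasks", "Failed to create PAR file");
        // Create the task defined in the script; ProcessLine returns
        // the address of the new object
        Long_t r = gROOT->ProcessLine("new MyAnalysis(\"test\")");
        AliAnalysisTaskSE* t = reinterpret_cast<AliAnalysisTaskSE*>(r);
        if (!t) Fatal("CreateTasks", "Failed to create task");
        mgr->AddTask(t);
        AliAnalysisDataContainer* sums =
          mgr->CreateContainer("Sums", TList::Class(),
                               AliAnalysisManager::kOutputContainer,
                               AliAnalysisManager::GetCommonFileName());
        AliAnalysisDataContainer* results = // Needed for output from Terminate
          mgr->CreateContainer("Results", TList::Class(),
                               AliAnalysisManager::kParamContainer, // Important!
                               AliAnalysisManager::GetCommonFileName());
        mgr->ConnectOutput(t, 1, sums);
        mgr->ConnectOutput(t, 2, results);
        mgr->ConnectInput(t, 0, mgr->GetCommonInputContainer());
      }
      // No output (AOD) handler is needed for this train
      AliVEventHandler* CreateOutputHandler(UShort_t) { return 0; }
      const char* ClassName() const { return "MyTrain"; }
    };
    //
    // EOF
    //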

This allows fast development and testing of analysis tasks without having to wait for official tasks and builds of all of AliROOT.

## Enabling Tender Supplies

If you want to run an ESD analysis with a set of tender supplies, all you have to do is pass the option --tender=list to runTrain. Here, list is a list of tender supply names:

• VZERO
• TPC
• PTFix
• T0
• TOF
• TRD
• VTX
• EMCAL
• PID

If you need to specify a non-standard OCDB location, you can do so using the option --ocdb=location, where location can be an OCDB snapshot or a valid OCDB URL.

## Enable OCDB access

If you pass the option --ocdb, possibly with an argument, then an instance of the class AliTaskConnectCDB will be added to the train. This task automatically connects to OCDB for the run being analysed.

## Specifying the kind of Physics Selection

The option --ps=option defines how to set up the physics selection. Here, option can be

• none: The physics selection is completely disabled.
• custom[=Script]: A custom physics selection is read from the script Script. If no Script is specified, then Script=CustomPS.C is assumed. The script must define a function with the same name, and that function must accept a single pointer to an AliPhysicsSelection object.
• bare: A physics selection is installed on the input handler, but there is no accompanying task.
• all: Filtering on background triggers is disabled.

To enable friends in the analysis, pass the option --friends.

# Implementation details

## Helpers

The specifics of each possible execution environment and input are handled by sub-classes of the base class Helper. Each of these helpers defines

• URI options.
• Steps to be done before the tasks are added to the train