D⁰, D⁺, D_s⁺, D^*+-meson, and Λ_c⁺-baryon analysis code

Code for the measurement of D_s⁺, D⁺, D^*+-meson, and Λ_c⁺-baryon p_T-differential yields starting from the outputs of the AliPhysics tasks AliAnalysisTaskSEDs.cxx, AliAnalysisTaskSEDplus.cxx, and AliAnalysisTaskSENonPromptLc.cxx, using rectangular or ML selections

Run analysis tasks

Creation of files with selections to be applied on the tasks

The cutobjects folder contains all the macros needed to produce the cut-object files used by the tasks.

Run D⁺, D_s⁺, and Λ_c⁺ tasks with private jobs

The AliAnalysisTaskSEDplus.cxx, AliAnalysisTaskSEDs.cxx and AliAnalysisTaskSENonPromptLc.cxx tasks can be run with private jobs using the RunAnalysisDplusDsLcTask.cc script in the runanalysistask folder:

root -l RunAnalysisDplusDsLcTask.cc+(TString configfilename = configfile.yml, TString runMode = "full", bool mergeviajdl = true)

where configfile.yml is a configuration file (such as runAnalysis_config_LHC17p_cent.yml) with the information about the dataset, the AliPhysics version, and the task options to be used. The task options include the possibility to create a tree for the ML studies or to apply a ML model trained with XGBoost or scikit-learn. The ML model application is not supported by the Λ_c⁺ task.

Train output merge

The unmerged outputs of an ALICE analysis train or of private jobs can be merged by hand with the script in the merge folder:
```
python3 MergeTrainOutputs.py files_to_merge.yml
```
where files_to_merge.yml is the configuration file containing the information about the outputs that have to be merged, such as files_to_merge_LHC18q.yml.

Main analysis with THnSparses

Pre-filter THnSparses

The THnSparse in the task outputs can be pre-filtered to reduce the file size (useful if the train outputs are too large and cannot be merged) with the FilterSparse.py script in the filterdata folder:
```
python3 FilterSparse.py configfile.yml cutset.yml
```
where configfile.yml is the config file with the info of the input files and cutset.yml is the set of selections to be applied in the filtering. The script creates output files with the same structure as the input ones, with the THnSparses filtered. With the option --suffix suffixname, a suffix is added to the output file names; otherwise the input files are overwritten. With the option --plot, control plots are created and saved in .pdf files.

Projection of invariant-mass distributions from THnSparses

Project the THnSparse with the desired selections into invariant-mass distributions (TH1F):
```
python3 ProjectDplusDsSparse.py configfile.yml cutset.yml output.root
```
where configfile.yml is a configuration file with the info of the input files (such as config_Dplus_pp_data_tree.yml), while cutset.yml is the set of selections to be applied.

To apply p_T weights in the case of MC, pass the --ptweights argument followed by the name of the input file with the p_T weights and the name of the p_T-weights histogram. In this case, the p_T weights are applied to both the prompt and the FD distributions. If the --ptweightsB argument, followed by the name of the input file with the p_T^B weights and the name of the p_T^B-weights histogram, is also passed, the p_T weights for the FD are computed from the B-mother p_T.
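As a minimal sketch of what applying p_T weights amounts to, the snippet below fills a p_T distribution in which each candidate counts with the weight of the weight-histogram bin it falls in. The bin edges, the weight values, and the function names are hypothetical and are not taken from ProjectDplusDsSparse.py; for the FD case with B-mother weights, the same lookup would be done with the p_T of the B mother instead of the p_T of the candidate.

```python
from bisect import bisect_right


def lookup_weight(pt, edges, weights):
    """Return the weight of the bin containing pt (1.0 outside the histogram range)."""
    if pt < edges[0] or pt >= edges[-1]:
        return 1.0
    return weights[bisect_right(edges, pt) - 1]


def fill_weighted(pts, hist_edges, weight_edges, weights):
    """Fill a histogram of pts in which each entry counts with its pT weight."""
    counts = [0.0] * (len(hist_edges) - 1)
    for pt in pts:
        if pt < hist_edges[0] or pt >= hist_edges[-1]:
            continue  # outside the range of the output histogram
        ibin = bisect_right(hist_edges, pt) - 1
        counts[ibin] += lookup_weight(pt, weight_edges, weights)
    return counts


# hypothetical weight histogram and candidate pT values (GeV/c)
weight_edges = [0.0, 2.0, 4.0, 8.0]
weights = [1.2, 1.0, 0.8]
pts = [0.5, 1.5, 3.0, 5.0, 7.9]
hist = fill_weighted(pts, [0.0, 4.0, 8.0], weight_edges, weights)
```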

Main analysis with TTrees or dataframes

Filter trees to prepare data sets for ML studies

To filter trees produced with the D_s⁺ and D⁺ tasks and divide each category (data, MC prompt D, MC feed-down D, MC background) into a separate file (tree or dataframe) to prepare the datasets for the ML analyses, the FilterTrees4ML.cc and FilterTrees4ML.py scripts in the filterdata folder can be used:

root -l FilterTrees4ML.cc+(TString configfilename = configfile.yml)

python3 FilterTrees4ML.py configfile.yml

where configfile.yml is a configuration file (such as config_Dplus_data_skim_pp5TeV.yml) that contains the information about the decay channel, the input files, the preselections to apply, the features to keep, and the output files. By default, the output files are root files for the c++ script and parquet files for the python script. If the --root option is used with the python script, the output data are saved into root files instead of parquet files.

Machine Learning analysis for D-meson candidate selections

To be added

Projection of invariant-mass distributions from TTrees

Project the TTree or dataframe with the desired selections into invariant-mass distributions (TH1F):
```
python3 ProjectDplusDsTree.py configfile.yml cutset.yml output.root
```
where configfile.yml is a configuration file with the info of the input files, including the original task output (such as config_Dplus_pp_data_tree.yml), while cutset.yml is the set of selections to be applied. It autodetects whether the input files are root files containing TTrees or parquet files containing pandas dataframes.
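A minimal sketch of how such an autodetection could work, assuming it relies only on the file names. The function name is hypothetical and ProjectDplusDsTree.py may implement the check differently:

```python
import os


def detect_input_format(filenames):
    """Return 'root' or 'parquet' according to the names of the input files."""
    formats = set()
    for name in filenames:
        base = os.path.basename(name).lower()
        if base.endswith(".root"):
            formats.add("root")
        elif ".parquet" in base:  # also matches compressed names such as x.parquet.gzip
            formats.add("parquet")
        else:
            raise ValueError(f"cannot detect the format of {name}")
    if len(formats) != 1:
        raise ValueError("input files must all have the same format")
    return formats.pop()
```

Mixing the two formats in the same list is treated as an error here, which is a design choice of this sketch and not a documented behaviour of the script.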

Common analysis

The following steps can be performed after having projected THnSparse or TTree (dataframe) objects

Raw yield extraction

To perform raw-yield extraction either a ROOT or a python script can be used.

c++:

root -l GetRawYieldsDplusDs.C+(int cent, bool isMC = false, TString infilename = "distributions.root", TString cfgfilename = "config_Fit.yml", TString outFileName = "output.root")

python:
```
python3 GetRawYieldsDplusDs.py config_Fit.yml centName distributions.root output.root
```
where distributions.root is the file obtained by projecting the data or MC THnSparse, config_Fit.yml is a configuration file with the inputs needed to perform the invariant-mass fits (such as config_Ds_Fit.yml), and output.root is the name of the output .root file. In the case of the python script, the --isMC option can be used to specify that the input distributions are from MC simulations, and the --batch option can be used to execute the script in batch mode.

Efficiency-times-acceptance computation

The efficiency-times-acceptance computation is done in two steps:

Efficiency computation:
```
python3 ComputeEfficiencyDplusDs.py config_Fit.yml centName distributionsMC.root output.root
```
where distributionsMC.root is the file obtained by projecting the MC THnSparse and config_Fit.yml is the same config file used for the raw-yield extraction, needed to have the same p_T binning. The --batch option can be used to execute the script in batch mode.
Acceptance and efficiency combination:
```
python3 CombineAccTimesEff.py effFileName.root accFileName.root outFileName.root
```
where effFileName.root is the file with the efficiencies computed in the previous step and accFileName.root is the file with the acceptance computed using the ComputeAcceptance.C macro.

Both scripts can be run with the --batch argument to avoid opening the canvas window.
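The arithmetic behind the two steps can be sketched as follows. This is an illustration under stated assumptions and not the code of ComputeEfficiencyDplusDs.py or CombineAccTimesEff.py: the efficiency is taken as a ratio of unweighted counts with a binomial uncertainty, and acceptance and efficiency are assumed to be uncorrelated when they are multiplied. All function names and numbers are hypothetical.

```python
import math


def efficiency(n_reco, n_gen):
    """Efficiency and binomial uncertainty for n_reco selected out of n_gen generated."""
    eff = n_reco / n_gen
    unc = math.sqrt(eff * (1.0 - eff) / n_gen)
    return eff, unc


def combine_acc_eff(acc, acc_unc, eff, eff_unc):
    """Product Acc x Eff, with the relative uncertainties summed in quadrature."""
    prod = acc * eff
    rel_unc = math.sqrt((acc_unc / acc) ** 2 + (eff_unc / eff) ** 2)
    return prod, prod * rel_unc


# hypothetical counts and acceptance for one pT bin
eff, eff_unc = efficiency(n_reco=250, n_gen=1000)
acc_eff, acc_eff_unc = combine_acc_eff(0.8, 0.008, eff, eff_unc)
```

If p_T weights are applied in the MC, the counts become sums of weights and the simple binomial formula above no longer holds exactly.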

Standard analysis with theory-driven prompt fraction evaluation

Cross section

For the computation of the cross section, a modified version of HFPtSpectrum.C, present in this repository, is used.

Nuclear modification factor

For the computation of the nuclear modification factor, a modified version of HFPtSpectrumRaa.C, present in this repository, is used.

Corrected yield

For the computation of the p_T-differential corrected yields, a modified version of ComputeDmesonYield.C, present in this repository, is used.

Analysis with data-driven evaluation of prompt / feed-down fraction

Prompt / feed-down fraction

The evaluation of the prompt / feed-down fractions can be performed with the cut-variation method with the script:

python3 ComputeCutVarPromptFrac.py cfgFileName.yml outFileName.root

where cfgFileName.yml is a configuration file (such as config_Dplus_PromptFrac_pp5TeV.yml). The method requires several raw-yield and efficiency files obtained with different topological selections, applied to enrich or reduce the prompt or the feed-down contribution.
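The idea of the cut-variation method can be illustrated with a small sketch. For each selection set i, the raw yield is modelled as Y_i = eff_prompt_i * N_prompt + eff_FD_i * N_FD, and the two corrected yields are obtained by solving the resulting overdetermined linear system. The version below uses an unweighted least-squares fit for simplicity, which is an assumption: ComputeCutVarPromptFrac.py may weight the selection sets by their uncertainties and account for their correlations. All names and numbers are hypothetical.

```python
def solve_cut_variation(raw_yields, eff_prompt, eff_fd):
    """Unweighted least-squares solution of Y_i = effP_i * N_prompt + effFD_i * N_fd."""
    # sums entering the 2x2 normal equations (A^T A) x = A^T Y
    spp = sum(p * p for p in eff_prompt)
    sff = sum(f * f for f in eff_fd)
    spf = sum(p * f for p, f in zip(eff_prompt, eff_fd))
    spy = sum(p * y for p, y in zip(eff_prompt, raw_yields))
    sfy = sum(f * y for f, y in zip(eff_fd, raw_yields))
    det = spp * sff - spf * spf
    if det == 0.0:
        raise ValueError("selection sets do not separate prompt from feed-down")
    n_prompt = (sff * spy - spf * sfy) / det
    n_fd = (spp * sfy - spf * spy) / det
    return n_prompt, n_fd


def prompt_fraction(n_prompt, n_fd, eff_p, eff_f):
    """Fraction of prompt candidates in the raw yield of one selection set."""
    prompt_part = eff_p * n_prompt
    return prompt_part / (prompt_part + eff_f * n_fd)


# hypothetical efficiencies for three selection sets, from loose to FD-enriched
eff_prompt = [0.30, 0.20, 0.05]
eff_fd = [0.35, 0.30, 0.20]
raw_yields = [335.0, 230.0, 70.0]  # built from N_prompt = 1000, N_fd = 100
n_prompt, n_fd = solve_cut_variation(raw_yields, eff_prompt, eff_fd)
f_prompt = prompt_fraction(n_prompt, n_fd, eff_prompt[0], eff_fd[0])
```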

Cross section

The computation of the prompt / feed-down p_T-differential cross sections can be performed with the script:
```
python3 ComputeDataDrivenCrossSection.py rawYieldFile.root effAccFile.root fracFile.root outFile.root [--prompt] [--FD] [--Dplus] [--Ds] [--system] [--energy] [--batch]
```
where rawYieldFile.root, effAccFile.root, and fracFile.root are the root files containing the raw yields, the acceptance-times-efficiency factors, and the fraction of prompt (feed-down) D mesons estimated with the cut-variation method (previous paragraph), while outFile.root is the ROOT output file. The optional parameters define whether the prompt or the feed-down cross section should be computed, for the D_s⁺ or the D⁺ meson, as well as the system (pp or Pb-Pb) and the centre-of-mass energy.
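As a rough sketch of how the three inputs enter the result, the function below uses the form commonly adopted for p_T-differential D-meson cross sections: the raw yield (particles plus antiparticles, hence the factor 1/2) is multiplied by the prompt or feed-down fraction and divided by the p_T-bin width, the acceptance-times-efficiency, the branching ratio, and the integrated luminosity. That ComputeDataDrivenCrossSection.py uses exactly this normalisation is an assumption, and all names and numbers are hypothetical.

```python
def cross_section(raw_yield, frac, acc_eff, delta_pt, br, n_events, sigma_mb):
    """pT-differential cross section for one pT bin (assumed normalisation).

    raw_yield counts particles and antiparticles, hence the factor 0.5;
    the integrated luminosity is taken as n_events / sigma_mb.
    """
    lumi = n_events / sigma_mb
    return 0.5 * frac * raw_yield / (delta_pt * acc_eff * br * lumi)


# hypothetical inputs for one pT bin
xsec = cross_section(raw_yield=1000.0, frac=0.9, acc_eff=0.1,
                     delta_pt=1.0, br=0.02, n_events=1.0e9, sigma_mb=5.0e4)
```

The result has the units of sigma_mb divided by the units of delta_pt.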

Run full analysis

To run the full analysis (except for the ML part), from the raw-yield extraction to the nuclear modification factor and the corrected yields, the script
```
sh RunFullAnalysis.sh
```
can be used after setting the hard-coded parameters it contains.

Significance optimisation

Optimisation with TTrees

The script ScanSelectionsTree.py can be used to compute expected quantities (i.e. signal, background, significance, S/B, prompt fraction) for all combinations of different selection criteria:
```
python3 ScanSelectionsTree.py cfgFileName.yml outFileName.root
```
where cfgFileName.yml is a yaml config file containing all the information about the input data to be used and the selections to be tested, such as config_Dplus_pp5TeV_Optimisation.yml. If the number of tested variables is less than or equal to 2 (e.g. ML outputs), the script produces plots of the expected quantities as a function of the applied selections. In any case, an ntuple with all the expected quantities and the values of the applied selections is produced and stored in the output file.
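The scan over all combinations of selections can be sketched as below. The variable names, the thresholds, and the toy model for the expected signal and background are hypothetical; in the real script the expected quantities come from the inputs listed in the config file, and the significance definition S/sqrt(S+B) is assumed here.

```python
import itertools
import math


def scan_selections(cut_values, expected):
    """Evaluate S, B, S/B and significance for every combination of cut values.

    cut_values maps each variable name to the list of thresholds to test;
    expected is a callable returning (signal, background) for a dict of cuts.
    """
    names = list(cut_values)
    rows = []
    for combo in itertools.product(*(cut_values[name] for name in names)):
        cuts = dict(zip(names, combo))
        sig, bkg = expected(cuts)
        rows.append({**cuts, "S": sig, "B": bkg,
                     "S_over_B": sig / bkg,
                     "signif": sig / math.sqrt(sig + bkg)})
    return rows


def toy_expected(cuts):
    """Hypothetical expected signal and background for a given set of cuts."""
    sig = 1000.0 * (1.0 - cuts["ml_prompt_min"]) * (0.5 + cuts["ml_bkg_max"])
    bkg = 10000.0 * (1.0 - cuts["ml_prompt_min"]) ** 3 * cuts["ml_bkg_max"]
    return sig, bkg


rows = scan_selections({"ml_prompt_min": [0.2, 0.5, 0.8],
                        "ml_bkg_max": [0.1, 0.3]}, toy_expected)
best = max(rows, key=lambda row: row["signif"])
```

Each element of rows plays the role of one entry of the output ntuple: the tested selections together with the corresponding expected quantities.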

Systematic uncertainties

All the code for the evaluation of the systematic uncertainties is in the systematics directory.

Selection efficiency

For the cut-variation studies with TTrees or THnSparses, the configuration files for each set of selection criteria can be created using:
```
python3 MakeCutsFilesForSyst.py
```
The variables and their ranges must be hard-coded in the script. Once the configuration files are created, they can be used to repeat the main analysis with the different selection criteria.

Once the analysis has been repeated for all the sets of selections, the systematic uncertainty can be evaluated using the script in the systematics/seleff directory:

root -l PlotCutVariationsOnePtBin.cc+(TString cfgFileName = "cfgFile.yml")

where the config file cfgFile.yml includes all the information about the sets of selections to be used and the quality criteria that have to be applied, such as config_cutvar_DsFD_pp.yml.

Raw-yield extraction

For the raw-yield extraction uncertainty a multi-trial study (with the usage of AliHFInvMassMultiTrialFit.cxx) can be run with the script in the systematics/rawyields directory:
```
root -l RawYieldSystematics.cc+(TString cfgFileName = "cfgFile.yml")
```
where the config file cfgFile.yml includes all the information about the variations that have to be applied, such as config_multi_trial_DplusFD_pp.yml.
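One possible way to summarise the outcome of a multi-trial study is sketched below: the RMS of the raw yields obtained in the trials and the shift of their mean with respect to the central value are added in quadrature and quoted relative to the central raw yield. This recipe is an assumption made for illustration and is not necessarily what RawYieldSystematics.cc computes; the function name and the numbers are hypothetical.

```python
import math
import statistics


def summarise_trials(trial_yields, central_yield):
    """Mean, RMS, shift and relative uncertainty from a set of fit trials."""
    mean = statistics.mean(trial_yields)
    rms = statistics.pstdev(trial_yields)  # population standard deviation
    shift = mean - central_yield
    rel_syst = math.sqrt(rms ** 2 + shift ** 2) / central_yield
    return mean, rms, shift, rel_syst


# hypothetical raw yields from four fit configurations
mean, rms, shift, rel_syst = summarise_trials([980.0, 1000.0, 1020.0, 1040.0],
                                              central_yield=1000.0)
```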

Generated MC p_T shape

The systematic uncertainty arising from the shape of the p_T distributions in the MC simulation can be evaluated with the code in the systematics/genptshape directory.
- The first step is the computation of the p_T weights:
```
  python3 ComputePtGenShapeWeights.py inFileMC.root outFile.root [--Dspecie Dname] [--Bspecie Bname] [--PbPb] [--rebin] [--smooth]
```
where the root file inFileMC.root can be the output of the AliAnalysisTaskCheckHFMCProd.cxx task or of AliCFTaskVertexingHF.cxx, outFile.root is the output file name, --Dspecie and --Bspecie are the arguments to choose the D-meson and B-meson species to use, --PbPb is a flag to enable in case of Pb-Pb analysis, and --rebin and --smooth are two flags to apply a rebinning of the spectra and a smoothing of the weights.
- The second step is the computation of the efficiencies with and without p_T weights, as described in the dedicated section.
- The third step is the evaluation of the systematic uncertainty:
```
  python3 GetPtWeightSyst.py cfgFileName.yml
```
  where cfgFileName.yml is a config file such as config_ptshape_syst.yml
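The logic of the three steps above can be sketched as follows, assuming that the weights are defined as the ratio between a normalised target p_T shape and the normalised p_T shape generated in the MC, and that the uncertainty is taken as the relative variation of the efficiency when the weights are applied. Both definitions, as well as all names and numbers, are assumptions made for illustration and are not taken from ComputePtGenShapeWeights.py or GetPtWeightSyst.py.

```python
def normalise(spectrum):
    """Scale a binned spectrum so that its bin contents sum to one."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def compute_pt_weights(target, generated):
    """Per-bin weights: normalised target shape over normalised generated shape."""
    target_norm = normalise(target)
    gen_norm = normalise(generated)
    return [t / g if g > 0 else 1.0 for t, g in zip(target_norm, gen_norm)]


def pt_shape_syst(eff_unweighted, eff_weighted):
    """Relative variation of the efficiency when the pT weights are applied."""
    return abs(eff_weighted / eff_unweighted - 1.0)


# hypothetical binned spectra with the same pT binning
generated = [400.0, 300.0, 200.0, 100.0]
target = [500.0, 300.0, 150.0, 50.0]
pt_weights = compute_pt_weights(target, generated)
syst = pt_shape_syst(eff_unweighted=0.100, eff_weighted=0.098)
```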

Test and validation of alternative code for production of TTrees (AliAnalysisTaskSEHFTreeCreator.cxx)

The validation of the code for production of trees used in ML studies can be done using the scripts in the runanalysistask folder

To run the AliAnalysisTaskSEDs.cxx and AliAnalysisTaskSEHFTreeCreator.cxx on the same files:
```
root -l RunAnalysisTreeCreator.cc+(TString configfilename = configfile.yml, TString runMode = "full", bool mergeviajdl = true)
```
where configfile.yml is a configuration file with the information about the dataset and the AliPhysics version to be used.
To run the validation of the output:
```
python3 ValidateTreeCreator.py inputfile inputdir inputlist
```
where inputfile, inputdir, and inputlist are, respectively, the root file produced by RunAnalysisTreeCreator.cc, the name of the TDirectoryFile, and the name of the TList inside the root file.
