Converting a run to root
Analysis pipeline
Converting a .fast (or .fast.gz) file to a root file involves several operations. Some of them are carried out by the fasterac utilities and some by nptool v4. For nptool to read .fast files you need to install the faster nptool plugin, and the nebula-plus nptool plugin for the analysis of nebula-plus. The following sections detail what each step does. Scripts also exist that run the entire pipeline automatically (see the following sections).
Step 1: grouping of SAMURAI trigger and QDC data
The grouping is an offline software trigger applied to a run. A .fast file is read and, for every SAMURAI trigger registered in the faster DAQ (on channel 189), a new event is created. Since the SAMURAI trigger arrives around 800 ns after a QDC hit in Nebula-plus, a window of 1 us is opened before the trigger. All QDC hits inside this window are associated with that event.
This operation is carried out by the faster_file_group binary from the fasterac package.
faster_file_group -l -e"189" -b1000 -a0 -f"3000" -g"3001" <input file> <output file>
Here the arguments are:
- -l : lossless mode disabled (all followers outside the event are removed from the output file; singles are kept anyway)
- -e"<boolean expression>" : a boolean expression describing the trigger condition; here only a hit in channel 189 is required
- -b<time window in ns> : time window before the trigger
- -a<time window in ns> : time window after the trigger
- -f"<followers>" : list of labels to be included in the event. Here we use the label of group 3000, the online trigger.
- -g"<group number>" : group number (label) of the new trigger
- <input_file.fast|.fast.gz> and <output_file.fast> : the input file and the output file
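The windowing logic described above can be illustrated with a small, self-contained Python sketch. It is not the faster_file_group implementation: the hit representation as (label, timestamp in ns) tuples, the inclusive window boundaries and the function name group_hits are assumptions made for illustration, and the follower filter set by -f is ignored.

```python
from bisect import bisect_left, bisect_right

TRIGGER_LABEL = 189      # channel carrying the SAMURAI trigger (from the text)
WINDOW_BEFORE_NS = 1000  # corresponds to -b1000
WINDOW_AFTER_NS = 0      # corresponds to -a0


def group_hits(hits, trigger_label=TRIGGER_LABEL,
               before_ns=WINDOW_BEFORE_NS, after_ns=WINDOW_AFTER_NS):
    """Build one event per trigger hit (illustrative only).

    hits: iterable of (label, timestamp_ns) tuples (assumed representation).
    Returns a list of (trigger_timestamp, followers), where followers are the
    non-trigger hits with t_trigger - before_ns <= t <= t_trigger + after_ns.
    """
    hits = sorted(hits, key=lambda h: h[1])
    followers = [h for h in hits if h[0] != trigger_label]
    times = [t for _, t in followers]
    events = []
    for label, t in hits:
        if label != trigger_label:
            continue
        lo = bisect_left(times, t - before_ns)   # first follower inside the window
        hi = bisect_right(times, t + after_ns)   # one past the last follower inside
        events.append((t, followers[lo:hi]))
    return events


if __name__ == "__main__":
    toy = [(12, 200), (15, 950), (189, 1000), (12, 1500), (189, 5000)]
    for trigger_ts, grouped in group_hits(toy):
        print(trigger_ts, grouped)
```

In this toy input the first trigger collects the two hits that precede it by less than 1 us, while the second trigger produces an event with no follower.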
Step 2: Conversion from .fast to .root
This conversion is handled by the npconversion utility from nptool v4 using the faster nptool plugin for readout of the .fast file and the root nptool plugin for writeout of the .root file. The command is the following:
npconversion --detector <detector.yaml> --input faster,<pid file>,<.fast file> --output root,<TreeName>,<.root file>
Here the arguments are:
- --detector <detector.yaml> : this file describes the detector configuration of the experiment. It is used to load the plugin associated with each detector. In our case the presence of the nebula-plus block triggers the loading of the plugin and the subsequent calls to the necessary methods by the framework.
- --input faster,<pid file>,<.fast file> : the faster token triggers the loading of the faster plugin and its configuration. The <pid file> is produced by the faster DAQ and is used to associate a name with each channel ID. Based on this name, the correct detector class is called during conversion.
- --output root,<tree name>,<.root file> : the root token triggers the loading of the root plugin and its configuration. The <tree name> is used to name the tree inside the output root file.
The output root file contains a tree with one Data class per detector. These classes are designed to hold uncalibrated data inside std::vector containers. At this stage no analysis is performed.
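When the conversion is driven from a script, the command line shown above can be assembled programmatically. The sketch below only builds the argument list from the three options documented in this section; the helper name npconversion_command and the example file names are hypothetical.

```python
import subprocess


def npconversion_command(detector, pid_file, fast_file, root_file,
                         tree_name="DataTree"):
    """Return the npconversion argument list described in this section."""
    return [
        "npconversion",
        "--detector", detector,
        "--input", f"faster,{pid_file},{fast_file}",
        "--output", f"root,{tree_name},{root_file}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = npconversion_command("detector.yaml", "sample.pid",
                               "run_group.fast", "run.root")
    print(" ".join(cmd))
    # To actually run the conversion (requires the nptool environment):
    # subprocess.run(cmd, check=True)
```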
Step 3: Tree Trimming (Merge)
To perform the merge we want to produce an nptool root file that is entry-matched with the anaroot root file. Since an entry is produced in the nebula plus root file for every trigger received from the SAMURAI DAQ, the trees should already be entry-matched.
However, the faster trigger merger is limited to 127 channels within a single event. Some events have more than 127 channels (about 1 in every 30 million during cosmic runs). In that case the faster trigger merger puts the first 127 QDC hits of the event in a special group (group 0) and leaves the hit in the trigger channel (channel 189) on its own. During the merge the group 0 events are kept in the output tree, but they do not match any entry in the anaroot file because they contain no trigger TS. The corresponding trigger is left in a separate entry that matches the anaroot one.
When this happens, the time difference between entry n of the nptool tree and entry n of the anaroot tree is greater than 1 us. If this is the case, the entry is dropped from the Nebula Plus output root file.
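The trimming rule can be sketched in a few lines of Python. This is an illustration of the idea, not the nebp_anaroot_merger code: it assumes that both trees expose one timestamp per entry in ns on a common time base, and that a dropped nptool entry is simply skipped while the same anaroot entry is compared with the next nptool entry. The function name trim_entries is hypothetical.

```python
def trim_entries(nptool_ts, anaroot_ts, window_ns=1000):
    """Return the indices of the nptool entries kept after trimming.

    nptool_ts, anaroot_ts: per-entry timestamps in ns (assumed comparable).
    An nptool entry is kept when it lies within window_ns of the current
    anaroot entry; otherwise it is dropped and the same anaroot entry is
    compared with the next nptool entry.
    """
    kept = []
    j = 0  # index of the current anaroot entry
    for i, ts in enumerate(nptool_ts):
        if j >= len(anaroot_ts):
            break  # no anaroot entry left to match
        if abs(ts - anaroot_ts[j]) <= window_ns:
            kept.append(i)
            j += 1
    return kept


if __name__ == "__main__":
    anaroot = [100, 2100, 3000, 4000]
    nptool = [110, 2090, 0, 3010, 4020]  # entry 2 mimics an unmatched group 0 event
    print(trim_entries(nptool, anaroot))
```

With these toy timestamps the third nptool entry is dropped and the remaining four entries line up one-to-one with the anaroot entries.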
The trimmed tree is produced using the nebp_anaroot_merger utility from the nebula-plus nptool plugin.
nebp_anaroot_merger --window 1000 --nptool root,<treename>,<filename> --anaroot root,<treename>,<filename> --output root,<treename>,<filename>
Here the arguments are:
- --window <window length in ns> : acceptable TS difference between nebula plus and nebula
- --nptool : space-separated list of root files containing a nebula-plus branch
- --anaroot : a single root file from anaroot with a TS branch
- --nptool-tree-name <tree name> : optional flag specifying the tree name inside the nebula plus root file. Default is "DataTree".
- --anaroot-tree-name <tree name> : optional flag specifying the tree name inside the anaroot root file. Default is "tree".
The output root file should have as many entries as the corresponding anaroot file, or fewer, and can be associated with it using the Friend Tree mechanism.
On expandacq
On expandacq a Snakemake workflow has been implemented to run all of the steps mentioned above automatically. Like make, Snakemake uses a set of rules to produce the files needed at each step. If a file already exists it is not recreated, unless the rule producing it has changed.
The workflow is configured using a yaml file. Here is a sample yaml file:
# processing chain for fast file
# fast -> group -> data -> merged -> physics
run:
  start: 220
  stop: 240
# main data path
data_path: "data"
# experiment name
exp_name: "Test_SAMURAI"
# base name of the run
basename: "nebnptest"
# for remote nebula download:
anaroot_basename: "nebula0"
Here is the content of the Snakefile currently used. The first section of the file is written in plain Python and uses the information from analysis.yaml to produce a list of output files, i.e. one merged file per run. The following sections contain the rules to make each intermediate file.
import pathlib
import glob
import os
import yaml

with open("analysis.yaml", "r") as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

# list of runs to analyse
RUN=list(range(config["run"]["start"],config["run"]["stop"]+1))
basename = config["basename"]
data_path = config["data_path"]
exp_name = config["exp_name"]
anaroot_basename = config["anaroot_basename"]

# list of all outputs to be produced
nebulaplus_output=[]
nebula_output=[]
merge_output=[]
run_subrun={}# key: run, val: list of subrun

# building the dictionaries
for r in RUN:
    nebula_output.append(data_path+"/"+exp_name+"_analysis/anaroot/"+anaroot_basename+str(r)+".root")
    SUBRUN=[]
    base=data_path+"/"+exp_name+"/"+basename+"_"+str(r)+"_"
    path=glob.glob(base+"*.fast")
    if path:
        # build date
        DATE=path[0].split(base)
        DATE=DATE[1].split(".fast")
        # build subrun
        fast=glob.glob(base+"*.fast/*.fast")
        for f in fast:
            subrun=f[-9:-5]
            SUBRUN.append(subrun)
        nebulaplus_output.append(data_path+"/"+exp_name+"_analysis/merged/"+basename+"_"+str(r)+"_"+DATE[0]+".root")
        run_subrun[str(r)]=SUBRUN;

############################################################
####################### TRIGGER ############################
############################################################
# trigger the entire analysis chain
rule pipeline:
    threads: 10 # limit the number of concurrent conversions
    input:
        # list of all files to be produced
        nebulaplus_output

# trigger download of nebulaXXXX.root file from ribfana04
rule remote:
    input:
        nebula_output

############################################################
######################## RULE ##############################
###########################################################
# group trigger with QDC and produce a grouped fast file
rule group:
    input:
        "{data_path}/{exp_name}/{base}_{run}_{date}_{time}.fast/{base}_{run}_{date}_{time}_{subrun}.fast"
    output:
        "{data_path}/{exp_name}_analysis/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"
    shell:
        "./script/fastergroup_ts.sh {input} {output}"

###########################################################
# convert a grouped fast file to root
rule convert:
    input:
        # a .fast file to be converted
        "{data_path}/{exp_name}_analysis/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"
    output:
        # a .root file to be produced by npconversion
        "{data_path}/{exp_name}_analysis/root/{base}_{run}_{date}_{time}_{subrun}.root"
    shell:
        "npconversion --input faster,sample.pid,{input} --output root,DataTree,{output} --detector detector.yaml > .convert"

##########################################################
# trim a nebula root file based on match with anaroot
rule merge:
    input:
        # a nptool and anaroot file
        nebp=lambda wildcards: expand("{{data_path}}/{{exp_name}}_analysis/root/{{base}}_{{run}}_{{date}}_{{time}}_{subrun}.root",subrun=run_subrun[wildcards.run]),
        anaroot="{data_path}/{exp_name}_analysis/anaroot/nebula0{run}.root"
    output:
        # a .root file to be produced by trimming
        "{data_path}/{exp_name}_analysis/merged/{base}_{run}_{date}_{time}.root"
    shell:
        "nebp_anaroot_merger --window 1000 --nptool {input.nebp} --anaroot {input.anaroot} --outfile {output}>.merger"

############################################################
# download remotely converted nebula file
rule download_one_anaroot_root:
    output:
        "{output}"
    wildcard_constraints:
        output=".*anaroot.*.root" # any root file containing anaroot in its path
    # shell:
    #     "scp ribfana04:~/rootfiles/nebula/"+os.path.basename("{output}")+" {output}"
    run:
        command="scp ribfana04:~/rootfiles/nebula/"+os.path.basename(output[0])+" "+output[0] +"> .download"
        os.system(command)
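The file-name handling done in the plain Python part of the Snakefile above relies on string splitting and on the slice f[-9:-5]. The sketch below isolates these two operations so they can be checked without any data on disk. The helper names and the date/time format in the example are hypothetical; only the splitting and slicing come from the Snakefile.

```python
def run_stamp(run_directory, base):
    """Return the '<date>_<time>' part of a run directory name.

    Mirrors DATE=path[0].split(base) followed by DATE[1].split(".fast")
    in the Snakefile.
    """
    return run_directory.split(base)[1].split(".fast")[0]


def subrun_of(fast_file):
    """Return the 4-character sub-run number preceding '.fast' (f[-9:-5])."""
    return fast_file[-9:-5]


if __name__ == "__main__":
    # Hypothetical names: the real date/time format is set by the faster DAQ.
    base = "data/Test_SAMURAI/nebnptest_220_"
    run_dir = base + "20230601_101500.fast"
    fast_file = run_dir + "/nebnptest_220_20230601_101500_0001.fast"
    print(run_stamp(run_dir, base))
    print(subrun_of(fast_file))
```

The slice assumes that the sub-run number is always four characters long, directly followed by the .fast extension.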
Quick Summary:
npp commissioning
snakemake pipeline
On ribfana04
On ribfana04 a special environment needs to be loaded because nptool requires a recent compiler. In addition, all the faster libraries and binaries need to be added to the relevant paths. Prior to this analysis, the ridf file of the run must be processed in anaroot.
All nebula-plus related software is installed in the directory:
/home/s053/exp/exp2301_s053/faster
In this directory you will find several components of the conversion and analysis process:
- faster : contains the fasterac library necessary for reading faster files
- ctm2 : contains the C Trigger Merger of the faster acquisition, including the faster_file_group binary necessary to re-run the software trigger
- nptool : contains the nptool framework installation
- plugin : contains the nptool plugin installation
- s053 : the nptool project where all the work takes place
- sandbox : an nptool project used for development and debugging
- nebula_plus.sh : a bash script that prepares the environment to use all the above-mentioned tools
The first step is to source the nebula_plus.sh file.
cd /home/s053/exp/exp2301_s053/faster
source nebula_plus.sh
## load the correct version of gcc ##
. /opt/rh/devtoolset-9/enable
## load the matching version of root ##
. /opt/cpp17/root-6-24-08-install/bin//thisroot.sh
## for nptool ##
# what env should be used
export NPTOOL_ENV=plugin
# place where the env folders are located/created
export NPTOOL_HOME=/home/s053/exp/exp2301_s053/faster/
# load the nptool config (must happen after previous export of HOME and ENV)
source /home/s053/exp/exp2301_s053/faster/nptool/install/bin/nptool.sh
## for fasterac ##
source /home/s053/exp/exp2301_s053/faster/faster/fasterac-2.18/install/bin/fasterac_config.sh
# this variable allows nptool to find fasterac sources
export FASTERAC=/home/s053/exp/exp2301_s053/faster/faster/fasterac-2.18/install/
## C Trigger Merger from faster ##
export LD_LIBRARY_PATH=/home/s053/exp/exp2301_s053/faster/ctm2/lib:$LD_LIBRARY_PATH
export PATH=/home/s053/exp/exp2301_s053/faster/ctm2/bin:$PATH
## for snakemake ##
conda activate snakemake
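A quick way to confirm that the environment has been loaded is to check that the command-line tools used by the pipeline are found on the PATH. The sketch below is a minimal example using the Python standard library; the list of tools is taken from the commands quoted in this page, and the helper name missing_tools is hypothetical.

```python
import shutil

# Binaries invoked by the pipeline according to this page.
REQUIRED_TOOLS = ("faster_file_group", "npconversion", "snakemake")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all pipeline tools found")
```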
Now that the environment is correctly set, we can access the desired nptool project directory using (here for s053 project):
npp s053
The s053 folder is prepared with symbolic links to the raw faster data, the processed faster data and the anaroot data:
- raw_data : /home/s053/rawdata/nebulaplus/
- processed_data : /home/s053/rootfiles/nebulaplus/
- anaroot : /home/s053/rootfiles/nebula/
The first step of a conversion is to download the newly acquired .fast files from expandacq to ribfana04. This is achieved with the rsync script rsync_expand_test.sh:
rsync_expand_test.sh
Once rsync is done and the corresponding anaroot file for this run has been produced, we can run the snakemake workflow prepared in the Snakefile. It uses the analysis.yaml file as its configuration:
# processing chain for fast file
# fast -> group -> data -> merged -> physics
run:
  start: 220
  stop: 240
# raw data path
raw_data_path: "raw_data"
# processed data path
processed_data_path: "processed_data"
# base name of the run
basename: "nebnptest"
# base name of the anaroot file
anaroot_path: "anaroot"
anaroot_basename: "nebula"
The Snakefile is reproduced below. The first part contains plain Python creating the list of final files to be produced (i.e. after the merge), followed by the rules to create all intermediate files.
import pathlib
import glob
import os
import yaml

with open("analysis.yaml", "r") as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

# list of runs to analyse
RUN=list(range(config["run"]["start"],config["run"]["stop"]+1))
basename = config["basename"]
raw_data_path = config["raw_data_path"]
processed_data_path = config["processed_data_path"]
anaroot_basename = config["anaroot_basename"]
anaroot_path= config["anaroot_path"]

# list of all outputs to be produced
nebulaplus_output=[]
nebula_output=[]
merge_output=[]
run_subrun={}# key: run, val: list of subrun

# building the dictionaries
for r in RUN:
    nebula_output.append(anaroot_path+"/"+anaroot_basename+str(r).zfill(4)+".root")
    SUBRUN=[]
    base=raw_data_path+"/"+basename+"_"+str(r)+"_"
    path=glob.glob(base+"*.fast")
    if path:
        # build date
        DATE=path[0].split(base)
        DATE=DATE[1].split(".fast")
        # build subrun
        fast=glob.glob(base+"*.fast/*.fast")
        for f in fast:
            subrun=f[-9:-5]
            SUBRUN.append(subrun)
        nebulaplus_output.append(processed_data_path+"/merged/"+basename+"_"+str(r)+"_"+DATE[0]+"_merged.root")
        run_subrun[str(r)]=SUBRUN;

############################################################
####################### TRIGGER ############################
############################################################
# trigger the entire analysis chain
rule pipeline:
    threads: 10 # limit the number of concurrent conversions
    input:
        # list of all files to be produced
        nebulaplus_output

############################################################
######################## RULE ##############################
###########################################################
# group trigger with QDC and produce a grouped fast file
rule group:
    input:
        lambda wildcards: expand("{raw_data_path}/{{base}}_{{run}}_{{date}}_{{time}}.fast/{{base}}_{{run}}_{{date}}_{{time}}_{{subrun}}.fast",raw_data_path=raw_data_path)
    output:
        "{processed_data_path}/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"
    shell:
        "faster_file_group -l -e\"189\" -b1000 -a0 -f\"3000\" -g\"3001\" {input} {output}"

###########################################################
# convert a grouped fast file to root
rule convert:
    input:
        # a .fast file to be converted
        "{processed_data_path}/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"
    output:
        # a .root file to be produced by npconversion
        "{processed_data_path}/root/{base}_{run}_{date}_{time}_{subrun}_unmerged.root"
    shell:
        "npconversion --input faster,sample.pid,{input} --output root,DataTree,{output} --detector detector.yaml > .convert"

##########################################################
# trim a nebula root file based on match with anaroot
rule merge:
    input:
        # a nptool and anaroot file
        nebp=lambda wildcards: expand("{{processed_data_path}}/root/{{base}}_{{run}}_{{date}}_{{time}}_{subrun}_unmerged.root",subrun=run_subrun[wildcards.run]),
        anaroot=lambda wildcards: expand("{anaroot_path}/{anaroot_basename}{{run}}.root",anaroot_path=anaroot_path,anaroot_basename=anaroot_basename)
    output:
        # a .root file to be produced by trimming
        "{processed_data_path}/merged/{base}_{run}_{date}_{time}_merged.root"
    shell:
        "nebp_ridf_merger --window 50 --nebp {input.nebp} --ridf {input.anaroot} --outfile {output}>.merger"
To run the pipeline use the following command:
snakemake pipeline --cores 10
Be careful with the number of cores requested, as it defines the number of concurrent processes that will be spawned by the snakemake call.
The analysis pipeline produces files in the processed_data directory. Each step of the conversion and analysis has its own folder:
- group: .fast files produced after the grouping on channel 189, which registers the SAMURAI trigger.
- root: nptool conversion of the .fast files to root format.
- merged: a trimmed nptool tree that can be used as a Friend of the corresponding anaroot file.
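The folder layout above can be summarised by deriving, for one raw sub-run file, the three files that the pipeline is expected to produce. The sketch follows the naming patterns written in the ribfana04 Snakefile rules; the helper name pipeline_outputs and the example file name are hypothetical.

```python
from pathlib import PurePosixPath


def pipeline_outputs(raw_file, processed_data_path="processed_data"):
    """Map a raw sub-run .fast file to its group, root and merged outputs."""
    raw = PurePosixPath(raw_file)
    subrun_stem = raw.stem      # <base>_<run>_<date>_<time>_<subrun>
    run_stem = raw.parent.stem  # <base>_<run>_<date>_<time> (run directory)
    out = PurePosixPath(processed_data_path)
    return {
        "group": str(out / "group" / f"{subrun_stem}_group.fast"),
        "root": str(out / "root" / f"{subrun_stem}_unmerged.root"),
        "merged": str(out / "merged" / f"{run_stem}_merged.root"),
    }


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    raw = ("raw_data/nebnptest_220_20230601_101500.fast/"
           "nebnptest_220_20230601_101500_0001.fast")
    for step, path in pipeline_outputs(raw).items():
        print(step, path)
```

All sub-runs of a run share the same merged file, since the merge rule takes the list of sub-run root files as input.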
Quick Summary:
cd exp/exp2301_s053/faster/
source nebula_plus.sh
npp s053
rsync_expand_test.sh
snakemake pipeline --cores 10