Converting a run to root

Analysis pipeline

Converting a .fast (or .fast.gz) file to a ROOT file involves several operations. Some of them are carried out by fasterac utilities and some by nptool v4. For nptool to read .fast files, you need to install the faster nptool plugin, and the nebula-plus nptool plugin for the analysis of nebula-plus data. The following sections detail what each step does. Scripts also exist that run the entire pipeline automatically (see the sections on expandacq and ribfana04).

Step 1: grouping of SAMURAI trigger and QDC data

The grouping is an offline software trigger applied to a run. A .fast file is read and, for every SAMURAI trigger registered in the faster DAQ (on channel 189), a new event is created. Since the SAMURAI trigger arrives around 800 ns after a QDC hit in nebula-plus, a window of 1 us is opened before the trigger. All QDC hits inside this window are associated with that entry.

This operation is carried out by the faster_file_group binary from the fasterac package.

faster_file_group -l -e"189" -b1000 -a0 -f"3000" -g"3001" <input file> <output file>

Here the arguments are:

  • -l : disables lossless mode (all followers outside the event are removed from the output file; singles are kept anyway)
  • -e"<boolean expression>" : a boolean expression describing the trigger condition; here only a hit in channel 189 is required
  • -b<time window in ns> : time window before the trigger
  • -a<time window in ns> : time window after the trigger
  • -f"<followers>" : list of labels to be included in the event. Here we use the label of group 3000, the online trigger.
  • -g"<group number>" : group number (label) of the new trigger
  • <input_file.fast|.fast.gz> and <output_file.fast> : the input file to read and the grouped output file to write
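The windowing logic of Step 1 can be illustrated with a minimal Python sketch. It is not the faster_file_group implementation: the function name group_hits and the timestamps (in ns) are hypothetical, and only the window parameters (1000 ns before, 0 ns after) are taken from the command above.

```python
from bisect import bisect_left, bisect_right


def group_hits(trigger_ts, qdc_ts, before_ns=1000, after_ns=0):
    """Associate QDC hits with each trigger in [t - before_ns, t + after_ns].

    Illustrative only: mimics the role of -b and -a, not the real binary.
    """
    qdc_sorted = sorted(qdc_ts)
    events = []
    for t in sorted(trigger_ts):
        lo = bisect_left(qdc_sorted, t - before_ns)   # first hit inside the window
        hi = bisect_right(qdc_sorted, t + after_ns)   # one past the last hit inside
        events.append((t, qdc_sorted[lo:hi]))
    return events


# Hypothetical timestamps in ns
triggers = [5000, 12000]
hits = [3900, 4200, 4950, 9000, 11300]
events = group_hits(triggers, hits)
print(events)  # [(5000, [4200, 4950]), (12000, [11300])]
```

In this example the hits at 3900 ns and 9000 ns fall outside every window and are not attached to any trigger.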

Step 2: Conversion from .fast to .root

This conversion is handled by the npconversion utility from nptool v4, using the faster nptool plugin to read the .fast file and the root nptool plugin to write the .root file. The command is the following:

npconversion --detector <detector.yaml> --input faster,<pid file>,<.fast file> --output root,<TreeName>,<.root file>

Here the arguments are:

  • --detector <detector.yaml> : this file describes the detector configuration of the experiment. It is used to load the plugin associated with each detector. In our case, the presence of the nebula-plus block triggers the loading of the plugin and the subsequent calls to the necessary methods by the framework.
  • --input faster,<pid file>,<.fast file> : the faster token triggers the loading of the faster plugin and its configuration. The <pid file> is produced by the faster DAQ and is used to associate a name with each channel ID. Based on this name, the correct detector class is called during conversion.
  • --output root,<tree name>,<.root file> : the root token triggers the loading of the root plugin and its configuration. The <tree name> is used to name the tree inside the output root file.
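When the conversion is driven from a script, the three arguments can be assembled programmatically. The following is a minimal sketch: the helper name npconversion_cmd and the file names are hypothetical, and only the flags and the comma-separated token layout come from the command shown above.

```python
import shlex


def npconversion_cmd(detector, pid_file, fast_file, tree_name, root_file):
    """Build the npconversion argument list described above (illustrative helper)."""
    return [
        "npconversion",
        "--detector", detector,
        "--input", f"faster,{pid_file},{fast_file}",
        "--output", f"root,{tree_name},{root_file}",
    ]


# Hypothetical file names
cmd = npconversion_cmd("detector.yaml", "sample.pid",
                       "run_0001_group.fast", "DataTree", "run_0001.root")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) in an environment where npconversion is available on the PATH.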

The output root file contains a tree with one Data class per detector. These classes are designed to hold uncalibrated data inside std::vector containers. At this stage no analysis is performed.

Step 3: Tree Trimming (Merge)

To perform the merge, we want to produce an nptool root file that is entry-matched with the anaroot root file. Since an entry is produced in the nebula-plus root file for every trigger received from the SAMURAI DAQ, the trees should already be entry-matched.

However, the faster trigger merger is limited to 127 channels within a single event. Some events have more than 127 channels (about 1 in every 30 million during a cosmic run). In that case the faster trigger merger puts the first 127 QDC hits of the event in a special group (group 0) and leaves the hit in the trigger channel (channel 189) on its own. During the merge, the group 0 events are kept in the output tree, but they do not match any entry in the anaroot file, as they contain no trigger timestamp (TS). The corresponding trigger is left alone in a separate entry that matches the anaroot one.

When this happens, entry n of the nptool tree and entry n of the anaroot tree differ in time by more than 1 us. In that case, the entry is dropped from the nebula-plus output root file.
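The trimming rule can be sketched in a few lines of Python. This is an assumption about the matching logic, not the merger's actual code: the function name trim_entries is hypothetical, both timestamp lists are assumed to be in ns on a common clock, and a group 0 entry is represented here with a placeholder TS of 0.

```python
def trim_entries(nptool_ts, anaroot_ts, window_ns=1000):
    """Return the indices of the nptool entries kept after trimming.

    Each nptool entry is compared with the current anaroot entry. If the
    timestamps differ by more than window_ns, the nptool entry is dropped
    and the next nptool entry is compared with the same anaroot entry.
    """
    kept = []
    j = 0  # index of the current anaroot entry
    for i, ts in enumerate(nptool_ts):
        if j >= len(anaroot_ts):
            break  # no anaroot entry left to match
        if abs(ts - anaroot_ts[j]) <= window_ns:
            kept.append(i)
            j += 1
    return kept


# Hypothetical timestamps in ns; entry 1 stands for a group 0 event without trigger TS
nptool_ts = [1000, 0, 5000, 9000]
anaroot_ts = [1010, 5005, 9020]
kept = trim_entries(nptool_ts, anaroot_ts)
print(kept)  # [0, 2, 3]
```

After trimming, the kept nptool entries line up one-to-one with the anaroot entries in this example.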

The trimmed tree is produced using the nebp_anaroot_merger utility from the nebula-plus nptool plugin.

nebp_anaroot_merger --window 1000 --nptool root,<treename>,<filename> --anaroot root,<treename>,<filename> --output root,<treename>,<filename>

Here the arguments are:

  • --window <window length in ns> : acceptable TS difference between nebula-plus and nebula
  • --nptool : list of space-separated root files containing a nebula-plus branch
  • --anaroot : a single root file from anaroot with a TS branch
  • --nptool-tree-name <tree name> : optional flag specifying the tree name inside the nebula-plus root file. Default is "DataTree".
  • --anaroot-tree-name <tree name> : optional flag specifying the tree name inside the anaroot root file. Default is "tree".

The output root file should have as many entries as the corresponding anaroot file, or fewer, and can be associated with it using the Friend Tree mechanism.

On expandacq

On expandacq, a Snakemake workflow has been implemented to run all of the steps mentioned above automatically. Like make, Snakemake uses a set of rules to produce the files needed at each step. If a file already exists it is not recreated, unless the rule producing it has changed.

The workflow is configured using a yaml file. Here is a sample yaml file:

# processing chain for fast file
# fast -> group -> data -> merged -> physics
run:
  start: 220
  stop: 240
  
# main data path
data_path: "data"
# experiment name 
exp_name: "Test_SAMURAI"
# base name of the run
basename: "nebnptest"

# for remote nebula download:
anaroot_basename: "nebula0"

Here is the content of the Snakefile currently used. The first section of the file is written in plain Python and uses the information from analysis.yaml to produce the list of output files, i.e. one merged file per run. The following sections contain the rules that make each intermediate file.

import pathlib
import glob
import os
import yaml

with open("analysis.yaml", "r") as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

# list of run to analyse
RUN=list(range(config["run"]["start"],config["run"]["stop"]+1))

basename = config["basename"]
data_path = config["data_path"]
exp_name = config["exp_name"]
anaroot_basename = config["anaroot_basename"]


# list of all output to be produced
nebulaplus_output=[]
nebula_output=[]
merge_output=[]
run_subrun={}# key: run, val: list of subrun
# building the dictionaries
for r in RUN:
  nebula_output.append(data_path+"/"+exp_name+"_analysis/anaroot/"+anaroot_basename+str(r)+".root")
  SUBRUN=[]
  base=data_path+"/"+exp_name+"/"+basename+"_"+str(r)+"_"
  path=glob.glob(base+"*.fast")
  if path:
    # build date
    DATE=path[0].split(base)
    DATE=DATE[1].split(".fast")
    # build subrun
    fast=glob.glob(base+"*.fast/*.fast")
    for f in fast:
      subrun=f[-9:-5]
      SUBRUN.append(subrun)
      nebulaplus_output.append(data_path+"/"+exp_name+"_analysis/merged/"+basename+"_"+str(r)+"_"+DATE[0]+".root")

  run_subrun[str(r)]=SUBRUN

############################################################
####################### TRIGGER ############################
############################################################
# trigger the entire analysis chain
rule pipeline:
  threads: 10 # limit the number of concurrent conversions
  input: # list of all files to be produced
    nebulaplus_output

# trigger download of nebulaXXXX.root files from ribfana04
rule remote:
  input:
    nebula_output

############################################################
######################## RULE ##############################
###########################################################
# group trigger with QDC and produce a grouped fast file
rule group:
  input:
    "{data_path}/{exp_name}/{base}_{run}_{date}_{time}.fast/{base}_{run}_{date}_{time}_{subrun}.fast"  
  output:
    "{data_path}/{exp_name}_analysis/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"   
  shell:
    "./script/fastergroup_ts.sh {input} {output}"

###########################################################
# convert a grouped fast file to root
rule convert:
  input: # a .fast file to be converted
     "{data_path}/{exp_name}_analysis/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"
  output: # a .root file to be produced by npconversion
     "{data_path}/{exp_name}_analysis/root/{base}_{run}_{date}_{time}_{subrun}.root" 
  shell:
    "npconversion --input faster,sample.pid,{input} --output root,DataTree,{output} --detector detector.yaml > .convert"

##########################################################
# trim a nebula root file based on match with anaroot
rule merge:
  input: # a nptool and anaroot file
     nebp=lambda wildcards: expand("{{data_path}}/{{exp_name}}_analysis/root/{{base}}_{{run}}_{{date}}_{{time}}_{subrun}.root",subrun=run_subrun[wildcards.run]), 
     anaroot="{data_path}/{exp_name}_analysis/anaroot/nebula0{run}.root"
  output: # a .root file to be produced by trimming
     "{data_path}/{exp_name}_analysis/merged/{base}_{run}_{date}_{time}.root" 
  shell:
    "nebp_anaroot_merger --window 1000 --nptool {input.nebp} --anaroot {input.anaroot} --outfile {output}>.merger"

############################################################
# download remotely converted nebula file
rule download_one_anaroot_root:
  output:
    "{output}"
  wildcard_constraints:
    output=".*anaroot.*.root" # any root file containing anaroot in its path
#  shell:
#    "scp ribfana04:~/rootfiles/nebula/"+os.path.basename("{output}")+" {output}"
  run:
    command="scp ribfana04:~/rootfiles/nebula/"+os.path.basename(output[0])+" "+output[0] +"> .download"
    os.system(command)
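The string slicing used in the first part of the Snakefile to extract the date and the sub-run number may be easier to follow on a concrete path. The sketch below reproduces those two operations in isolation; the helper name parse_subrun_path and the date and time values in the example path are hypothetical.

```python
def parse_subrun_path(path, base):
    """Extract (date_time, subrun) from a sub-run .fast path.

    Mirrors the Snakefile logic: split on the run prefix and on ".fast"
    for the date, and take the four characters before ".fast" for the sub-run.
    """
    remainder = path.split(base)[1]          # "<date>_<time>.fast/<file>.fast"
    date_time = remainder.split(".fast")[0]  # "<date>_<time>"
    subrun = path[-9:-5]                     # four characters before ".fast"
    return date_time, subrun


# Hypothetical date and time stamp in the directory and file names
base = "data/Test_SAMURAI/nebnptest_220_"
path = base + "20230501_101500.fast/nebnptest_220_20230501_101500_0001.fast"
print(parse_subrun_path(path, base))  # ('20230501_101500', '0001')
```

Note that the slice [-9:-5] assumes the sub-run number is always written with exactly four characters.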

Quick Summary:

npp commissioning
snakemake pipeline

On ribfana04

On ribfana04, a special environment needs to be loaded because nptool requires a recent compiler. In addition, all the faster libraries and binaries need to be added to the relevant paths. Prior to this analysis, the ridf file for the run needs to be processed in anaroot.

All nebula-plus related software is installed in the directory:

/home/s053/exp/exp2301_s053/faster

In this directory you will find the components of the conversion and analysis process:

  • faster : contains the fasterac library needed to read faster files
  • ctm2 : contains the C Trigger Merger of the faster acquisition, which provides the faster_file_group binary needed to re-run the software trigger
  • nptool : contains the nptool framework installation
  • plugin : contains the nptool plugin installation
  • s053 : the nptool project where all the work takes place
  • sandbox : an nptool project used for development and debugging
  • nebula_plus.sh : a bash script that prepares the environment to use all the above-mentioned tools

The first step is to source the nebula_plus.sh file.

cd /home/s053/exp/exp2301_s053/faster 
source nebula_plus.sh
## load the correct version of gcc ##
. /opt/rh/devtoolset-9/enable

## load the matching version of root ##
. /opt/cpp17/root-6-24-08-install/bin/thisroot.sh

## for nptool ##
# what env should be used
export NPTOOL_ENV=plugin
# place where the env folders are located/created
export NPTOOL_HOME=/home/s053/exp/exp2301_s053/faster/

# load the nptool config (must happen after previous export of HOME and ENV)
source /home/s053/exp/exp2301_s053/faster/nptool/install/bin/nptool.sh

## for fasterac ##
source /home/s053/exp/exp2301_s053/faster/faster/fasterac-2.18/install/bin/fasterac_config.sh
# this variable allows nptool to find the fasterac sources
export FASTERAC=/home/s053/exp/exp2301_s053/faster/faster/fasterac-2.18/install/

## C Trigger Merger from faster ##
export LD_LIBRARY_PATH=/home/s053/exp/exp2301_s053/faster/ctm2/lib:$LD_LIBRARY_PATH
export PATH=/home/s053/exp/exp2301_s053/faster/ctm2/bin:$PATH

## for snakemake ##
conda activate snakemake

Now that the environment is correctly set, we can access the desired nptool project directory using (here for s053 project):

npp s053

The s053 folder is prepared with symbolic links to the raw faster data, the processed faster data and the anaroot data:

  • raw_data : /home/s053/rawdata/nebulaplus/
  • processed_data : /home/s053/rootfiles/nebulaplus/
  • anaroot : /home/s053/rootfiles/nebula/

The first step of a conversion is to download the newly acquired .fast files from expandacq to ribfana04. This is achieved via the rsync script rsync_expand_test.sh:

rsync_expand_test.sh

Once rsync is done and the corresponding anaroot file for this run has been produced, we can run the Snakemake workflow prepared in the Snakefile. It uses the analysis.yaml file as its configuration:

# processing chain for fast file
# fast -> group -> data -> merged -> physics
run:
  start: 220
  stop: 240

# raw data path
raw_data_path: "raw_data"

# processed data path
processed_data_path: "processed_data"

# base name of the run
basename: "nebnptest"

# base name of the anaroot file
anaroot_path: "anaroot"
anaroot_basename: "nebula"
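The run range and the anaroot settings of this configuration are expanded by the Snakefile into one anaroot file name per run, with the run number zero-padded to four digits. A minimal sketch of that expansion follows; the function name anaroot_files is hypothetical, and the default values are the ones from the sample configuration.

```python
def anaroot_files(start, stop, anaroot_path="anaroot", anaroot_basename="nebula"):
    """List the anaroot files expected for runs start..stop (stop included)."""
    return [
        f"{anaroot_path}/{anaroot_basename}{str(run).zfill(4)}.root"
        for run in range(start, stop + 1)
    ]


print(anaroot_files(220, 222))
# ['anaroot/nebula0220.root', 'anaroot/nebula0221.root', 'anaroot/nebula0222.root']
```

The stop value is inclusive, so start: 220 and stop: 240 cover 21 runs.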

The Snakefile is reproduced below. The first part contains plain Python creating the list of final files to be produced (i.e. the merged files), followed by the rules that create all intermediate files.

import pathlib
import glob
import os
import yaml

with open("analysis.yaml", "r") as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

# list of run to analyse
RUN=list(range(config["run"]["start"],config["run"]["stop"]+1))

basename = config["basename"]
raw_data_path = config["raw_data_path"]
processed_data_path = config["processed_data_path"]
anaroot_basename = config["anaroot_basename"]
anaroot_path= config["anaroot_path"]

# list of all output to be produced
nebulaplus_output=[]
nebula_output=[]
merge_output=[]
run_subrun={}# key: run, val: list of subrun
# building the dictionaries
for r in RUN:
  nebula_output.append(anaroot_path+"/"+anaroot_basename+str(r).zfill(4)+".root")
  SUBRUN=[]
  base=raw_data_path+"/"+basename+"_"+str(r)+"_"
  path=glob.glob(base+"*.fast")
  if path:
    # build date
    DATE=path[0].split(base)
    DATE=DATE[1].split(".fast")
    # build subrun
    fast=glob.glob(base+"*.fast/*.fast")
    for f in fast:
      subrun=f[-9:-5]
      SUBRUN.append(subrun)
      nebulaplus_output.append(processed_data_path+"/merged/"+basename+"_"+str(r)+"_"+DATE[0]+"_merged.root")

  run_subrun[str(r)]=SUBRUN

############################################################
####################### TRIGGER ############################
############################################################
# trigger the entire analysis chain
rule pipeline:
  threads: 10 # limit the number of concurrent conversions
  input: # list of all files to be produced
    nebulaplus_output

############################################################
######################## RULE ##############################
###########################################################
# group trigger with QDC and produce a grouped fast file
rule group:
  input:
    lambda wildcards: expand("{raw_data_path}/{{base}}_{{run}}_{{date}}_{{time}}.fast/{{base}}_{{run}}_{{date}}_{{time}}_{{subrun}}.fast",raw_data_path=raw_data_path)
  output:
    "{processed_data_path}/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"
  shell:
    "faster_file_group -l -e\"189\" -b1000 -a0 -f\"3000\" -g\"3001\" {input} {output}"

###########################################################
# convert a grouped fast file to root
rule convert:
  input: # a .fast file to be converted
     "{processed_data_path}/group/{base}_{run}_{date}_{time}_{subrun}_group.fast"
  output: # a .root file to be produced by npconversion
     "{processed_data_path}/root/{base}_{run}_{date}_{time}_{subrun}_unmerged.root"
  shell:
    "npconversion --input faster,sample.pid,{input} --output root,DataTree,{output} --detector detector.yaml > .convert"

##########################################################
# trim a nebula root file based on match with anaroot
rule merge:
  input: # a nptool and anaroot file
     nebp=lambda wildcards: expand("{{processed_data_path}}/root/{{base}}_{{run}}_{{date}}_{{time}}_{subrun}_unmerged.root",subrun=run_subrun[wildcards.run]),
     anaroot=lambda wildcards: expand("{anaroot_path}/{anaroot_basename}{{run}}.root",anaroot_path=anaroot_path,anaroot_basename=anaroot_basename)
  output: # a .root file to be produced by trimming
     "{processed_data_path}/merged/{base}_{run}_{date}_{time}_merged.root"
  shell:
    "nebp_ridf_merger --window 50 --nebp {input.nebp} --ridf {input.anaroot} --outfile {output}>.merger"

To run the pipeline use the following command:

snakemake pipeline --cores 10

Be careful with the number of cores requested, as it defines the number of concurrent processes spawned by the snakemake call.

The analysis pipeline produces files in the processed_data directory. Each step of the conversion and analysis has its own folder:

  • group: .fast files produced by the grouping on channel 189, which registers the SAMURAI trigger
  • root: nptool conversion of the grouped .fast files to root format
  • merged: a trimmed nptool tree that can be used as a Friend of the corresponding anaroot file
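The file names produced in each folder follow the patterns of the group, convert and merge rules of the Snakefile above. The sketch below rebuilds them for one sub-run; the helper name pipeline_paths and the date and time values are hypothetical.

```python
def pipeline_paths(processed, base, run, date, time, subrun):
    """Return the output path of each pipeline step for one sub-run."""
    stem = f"{base}_{run}_{date}_{time}"
    return {
        "group": f"{processed}/group/{stem}_{subrun}_group.fast",
        "root": f"{processed}/root/{stem}_{subrun}_unmerged.root",
        # one merged file per run: no sub-run number in the name
        "merged": f"{processed}/merged/{stem}_merged.root",
    }


# Hypothetical date and time stamp
paths = pipeline_paths("processed_data", "nebnptest", 220, "20230501", "101500", "0001")
for step, path in paths.items():
    print(step, path)
```

The group and root folders hold one file per sub-run, while the merged folder holds a single file per run.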

Quick Summary:

cd exp/exp2301_s053/faster/
source nebula_plus.sh
npp s053
rsync_expand_test.sh
snakemake pipeline --cores 10