Overview

BatFaker contains tools designed to produce fake raw data files. It was originally designed to create "salted" data, with fake events hidden alongside regular data. It may also be used to generate fake raw data files for evaluating reconstruction and analysis performance. At present, it only works with the Soudan raw data format, NOT with the MIDAS format. Therefore, for the purposes of SuperCDMS SNOLAB, BatFaker should be considered deprecated as of Aug 16th, 2021.

The BatFaker directory produces two executables: MapRawFile and BatFaker. BatFaker is the actual workhorse, while MapRawFile produces an auxiliary input file (the eventmap) that BatFaker needs in order to run.

HISTORY

20161205 BML: Initial creation

Author: Ben Loer

Running BatFaker

BatFaker requires several inputs:

  1. 1 or more raw data files
  2. eventmap files for each raw data file (produced by MapRawFile)
  3. Pulse template files and a detectorStatus file, if using templates
  4. An input 'control' file specifying how to generate the fake events

BatFaker operates by copying a raw event into memory, overwriting the area corresponding to the raw pulse array, and then writing the modified raw event to a new file. BatFaker CANNOT create fake events out of whole cloth; it requires an original event to modify for each output event. BatFaker also cannot copy only part of an event. E.g., if you want to fake only PAS2 of detector 10 in a given event, all of the other detectors and other event parts, like the veto and history buffer, will be copied as well.

Run 'BatFaker --help' to see the command-line argument order and available options.

Raw data files

BatFaker expects the raw data to be under a single directory, with individual dump files in per-series subdirectories, i.e.:

raw_data/<series>/<series>_F<dump>.tgz

At this time BatFaker does not have the ability to take in raw data from different top-level directories.

By default, BatFaker will create a similar directory hierarchy (with dump files inside series directories) in the output. If this is not desired, BatFaker accepts a "--nosubdirs" switch, which places all output files in the top-level output directory.

eventmap files

In order to read and copy selected bits of raw data files, BatFaker relies on an auxiliary file that I have labeled an eventmap. An eventmap lists every event recorded in a dump file and every channel in that event, together with its absolute location: the number of uncompressed bytes between the start of the file and the start of that event/channel. When BatFaker needs to access a particular event/channel, it can then seek directly to it.
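The addressing scheme can be illustrated with a minimal Python sketch, assuming a plain gzip-compressed dump file and a hypothetical in-memory eventmap (a dict keyed by (event, channel)). The real eventmap file format and BatFaker's own reader are not shown. Python's gzip module emulates seeking by decompressing up to the requested offset, so this shows how the offsets are used, not how fast the lookup is.

```python
import gzip
from typing import Dict, Tuple

# Hypothetical in-memory eventmap: (event, channel) -> number of uncompressed
# bytes between the start of the dump file and the start of that record.
EventMap = Dict[Tuple[int, int], int]


def read_record(path: str, eventmap: EventMap, event: int, channel: int, nbytes: int) -> bytes:
    """Return nbytes of uncompressed data starting at one event/channel record."""
    offset = eventmap[(event, channel)]
    with gzip.open(path, "rb") as f:
        # Offsets are in uncompressed bytes; gzip emulates the seek by decompressing.
        f.seek(offset)
        return f.read(nbytes)
```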

Eventmap files should already exist for Soudan data. If the raw data series directories are located under /path/to/raw_data, eventmaps are located under /path/to/raw_data/eventmaps .

By default, BatFaker will look for .eventmap files in the same directory as the raw data files, i.e. /path/to/raw_data/series/series_dump.gz.eventmap . Since this doesn't match the directory layout above, BatFaker takes an optional "--mapdir" command line switch that points to the eventmap directory. If BatFaker cannot find the eventmap for a given dump file, it will attempt to create one by invoking MapRawFile as a subprocess. If this fails (for example, if you do not have write permission to the target directory), BatFaker will exit with an error.

Therefore, it should not be necessary to manually call MapRawFile so long as the mapdir is pointing somewhere with write access.

pulse template files

BatFaker uses the built-in cdmsbats machinery to load the correct pulse template for a given channel in a given series. This includes loading templates for broken/shorted channels, etc. Therefore BatFaker requires both the directory containing PulseTemplate ROOT files and a detectorStatus file. These can both be provided explicitly at the command line via the "--templatedir" and "--detstatusfile" switches.

If these options are not provided, BatFaker will first look for pulse templates and the detector status file under

$CDMSBATSDIR/PulseTemplates and
$CDMSBATSDIR/UserSettings/BatRootSettings/detector_status/detectorStatus.SuperCDMS

respectively. If CDMSBATSDIR is not defined, it is assumed to be equal to PWD, i.e., BatFaker is assumed to be run from the top-level cdmsbats directory. Finally, if the detector status file cannot be found, BatFaker will generate a warning but continue to run. In this case, templates will be incorrect for any problematic channels.
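The lookup order described above can be summarized in a short Python sketch. The function name is hypothetical, its two parameters mirror the --templatedir and --detstatusfile switches, and os.getcwd() stands in for PWD; BatFaker itself does this through the cdmsbats machinery.

```python
import os
import warnings
from typing import Optional, Tuple


def resolve_template_inputs(templatedir: Optional[str] = None,
                            detstatusfile: Optional[str] = None) -> Tuple[str, str]:
    """Apply the documented defaults for the template directory and detector status file."""
    # If CDMSBATSDIR is not defined, fall back to the current directory (PWD).
    base = os.environ.get("CDMSBATSDIR", os.getcwd())
    if templatedir is None:
        templatedir = os.path.join(base, "PulseTemplates")
    if detstatusfile is None:
        detstatusfile = os.path.join(base, "UserSettings", "BatRootSettings",
                                     "detector_status", "detectorStatus.SuperCDMS")
    if not os.path.isfile(detstatusfile):
        # Missing status file: warn and keep going; problematic channels get wrong templates.
        warnings.warn(f"detector status file not found: {detstatusfile}")
    return templatedir, detstatusfile
```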

control file

This file specifies, line by line, which events to copy and simulate. Its syntax is detailed in the next section. A single BatFaker control file should be all that one needs to uniquely determine the contents of a fake dataset.

Writing a BatFaker control file

Overview

The control file is used to specify how to build the fake events. Lines beginning with '#' are treated as comments and ignored, and blank lines are allowed. Otherwise, each line instructs BatFaker to copy and possibly modify all or part of an event to output.

Each non-comment line MUST have one of the following forms:

Copy(series, event)
Replace(series, event, channelcode) => F1() [ + F2() [ + F3() ... ] ]

Copy and Replace identify "target" events, while the "F" functions (described below) specify the inputs used to construct a fake pulse. An arbitrary number of input F functions can be specified. Whitespace between arguments, commas, parentheses, etc. is ignored, but spaces are required around the '=>' and '+' tokens that separate function specifiers. Quotes around string-like arguments, including empty strings, are optional.
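These rules can be illustrated with a minimal Python sketch of a line parser. It is not BatFaker's actual parser; the names parse_spec and parse_line are hypothetical, and all arguments are kept as strings. Because spaces are required around '=>' and '+', the sketch splits on " => " and " + ", then strips whitespace and optional quotes from each argument.

```python
import re
from typing import List, Optional, Tuple

Spec = Tuple[str, List[str]]  # (function name, raw string arguments)

_SPEC_RE = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_spec(text: str) -> Spec:
    """Parse one 'Name(arg1, arg2, ...)' specifier into its name and arguments."""
    m = _SPEC_RE.match(text)
    if m is None:
        raise ValueError(f"malformed specifier: {text!r}")
    name, body = m.groups()
    # Whitespace is ignored and quotes are optional, so strip both from each argument.
    args = [a.strip().strip("'\"") for a in body.split(",")] if body.strip() else []
    return name, args


def parse_line(line: str) -> Optional[Tuple[Spec, List[Spec]]]:
    """Return (target, inputs) for a control-file line, or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    target_text, _, inputs_text = line.partition(" => ")
    target = parse_spec(target_text)
    inputs = [parse_spec(s) for s in inputs_text.split(" + ")] if inputs_text else []
    return target, inputs
```

A Copy line parses to a target with an empty input list, and an empty argument such as the default-template suffix in `TemplateByCode(01140611_1119, 0, 21014002, , 45.58, -1.6e-5)` comes back as an empty string.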

The behavior of BatFaker is significantly different depending on whether the "--skim" switch is provided or not (default off). In the default behavior, BatFaker will copy an entire dump file, replacing only the waveforms specified by each Replace line. In this mode, Copy lines have no effect. In skim mode, BatFaker will only copy events that appear in a Copy or Replace statement, and skip all other parts of a dump file.

NB: Lines in the control file MUST be in ascending event order. BatFaker behavior is undefined otherwise, and the resulting output files will not make much sense. BatFaker reads the control file line by line, and takes the following actions:

  • Check the series and event in the Copy or Replace statement. If an event is currently loaded and it is different from the new target:
    • write the currently loaded event to the currently opened output file
    • If the new target series,event is in a different dump file
      • if BatFaker is in normal (not skim) mode, copy all remaining events in the currently opened input file to output
      • close both current input and output files, and open the input and output files for the new target dump
    • If BatFaker is in normal (not skim) mode, copy all events between the currently loaded event (or first event if this is a new file) and the new target to the output
    • Load the new target into memory
  • If this is a Replace line, calculate the fake pulse from the input functions, and copy the resulting waveform over the target waveform in memory
  • On reaching the end of the control file, close the currently opened input and output files, copying any subsequent events if not in skim mode
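The steps above can be sketched as a toy Python model in which each dump file is just an ascending list of event numbers and each control line is reduced to (dump, event, action). The output records whether each event was copied unchanged ("orig") or had a waveform replaced ("fake"). File I/O, eventmaps, and the pulse arithmetic are left out, all names are hypothetical, and only dumps referenced by the control file are handled.

```python
from typing import Dict, List, Optional, Tuple

Control = List[Tuple[str, int, str]]     # (dump, event, "copy" or "replace")
Output = Dict[str, List[Tuple[int, str]]]


def run_control(control: Control, dumps: Dict[str, List[int]], skim: bool) -> Output:
    out: Output = {}
    cur_dump: Optional[str] = None
    cur_event: Optional[int] = None
    modified = False
    events: List[int] = []
    pos = 0  # index of the next event in the current dump not yet handled

    for dump, event, action in control:
        if (dump, event) != (cur_dump, cur_event):
            if cur_event is not None:
                # Write the currently loaded (possibly modified) event.
                out[cur_dump].append((cur_event, "fake" if modified else "orig"))
            if dump != cur_dump:
                if cur_dump is not None and not skim:
                    # Normal mode: copy the rest of the old dump unchanged.
                    out[cur_dump].extend((e, "orig") for e in events[pos:])
                # "Close" the old files and "open" the new dump.
                cur_dump, events, pos = dump, dumps[dump], 0
                out.setdefault(dump, [])
            idx = events.index(event)
            if not skim:
                # Normal mode: copy the events between the last position and the target.
                out[dump].extend((e, "orig") for e in events[pos:idx])
            pos = idx + 1
            cur_event, modified = event, False  # load the new target
        if action == "replace":
            modified = True  # stands in for overwriting the waveform in memory

    # End of control file: flush the loaded event and, in normal mode, the rest of the dump.
    if cur_event is not None:
        out[cur_dump].append((cur_event, "fake" if modified else "orig"))
        if not skim:
            out[cur_dump].extend((e, "orig") for e in events[pos:])
    return out
```

In normal mode the output contains every event of each referenced dump, while in skim mode it contains only the events named in Copy or Replace lines.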

The next section provides more details on the function syntax.

Function Syntax

This section describes the available target and input function specifiers that BatFaker knows how to interpret. Each specifier is a name followed by a list of arguments in parentheses, exactly like invoking a C++ function. Also like a C++ function, an argument that has a default value in the function signature is optional.

Many of the functions take a channelcode as one of their inputs. This follows the CDMS convention to globally and uniquely identify a single channel in a run. The code is

detectorType*1000000 + detectorNum*1000 + channelNum

11 is the detectorType for SuperCDMS iZIPs, and 21 for CDMSlite Soudan detectors. See the functions in BatCommon/datareader/ChannelMapHelper for more info.
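The channelcode arithmetic can be written as a pair of small Python helpers (hypothetical names, not the ChannelMapHelper API). They assume detectorNum and channelNum are each below 1000, as the multipliers imply. For instance, 21014002 from the Examples section decodes to CDMSlite detector 14, channel 2.

```python
from typing import Tuple


def encode_channelcode(det_type: int, det_num: int, chan_num: int) -> int:
    """Build a channelcode: detectorType*1000000 + detectorNum*1000 + channelNum."""
    return det_type * 1_000_000 + det_num * 1000 + chan_num


def decode_channelcode(code: int) -> Tuple[int, int, int]:
    """Split a channelcode back into (detectorType, detectorNum, channelNum)."""
    det_type, rest = divmod(code, 1_000_000)
    det_num, chan_num = divmod(rest, 1000)
    return det_type, det_num, chan_num
```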

Copy

Copy(series, event)

This function will copy a whole event from the input raw data file to output. This is only useful if running in skim mode, since in normal mode all events are copied whether they appear in the control file or not.

Replace

Replace(series, event, channelcode)

Select the channel within the target series and event to replace with the sum of the input functions in the remainder of the line. Only one channel can be replaced per line, so fully faking an event with 10 detectors of 12 channels each would require 120 lines in the control file!
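Because each channel needs its own line, large control files are easier to generate with a script than by hand. A minimal Python sketch (a hypothetical helper, not part of BatFaker) that emits one Replace line per channelcode, each starting from a RawPulse baseline taken from a random event; further "+ ..." terms can be appended to each string. All lines share one target event, so the ascending-order requirement is satisfied.

```python
from typing import Iterable, List


def replace_lines(target_series: str, target_event: int,
                  random_series: str, random_event: int,
                  channelcodes: Iterable[int]) -> List[str]:
    """One single-line Replace statement per channel, using a random-event baseline."""
    return [
        f"Replace({target_series}, {target_event}, {code}) => "
        f"RawPulse({random_series}, {random_event}, {code})"
        for code in channelcodes
    ]
```

For example, passing 10 hypothetical detectors with 12 channel numbers each yields the 120 lines mentioned above.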

RawPulse

RawPulse(series, event, channelCode, scale=1, delay=0)

Copy the specified pulse for channelcode in series, event into the target buffer. Optionally, linearly scale the pulse by scale and shift it by delay seconds. If delay is not an integer multiple of the sample spacing dt, the output pulse will be interpolated. Samples at the window edges wrap around: e.g., if the delay value is -0.001, the first ms of the pulse will be moved to the last ms.
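The scale-and-wrap behavior can be illustrated with a short Python sketch. The interpolation scheme BatFaker uses for fractional delays is not stated here, so linear interpolation is an assumption, and the function name is hypothetical. Consistent with the example above, a negative delay moves the pulse earlier and wraps its first samples to the end of the window.

```python
import math
from typing import List, Sequence


def shift_pulse_wrapped(samples: Sequence[float], dt: float,
                        scale: float = 1.0, delay: float = 0.0) -> List[float]:
    """Scale a pulse and delay it by `delay` seconds, wrapping samples around the window."""
    n = len(samples)
    shift = delay / dt  # delay in (possibly fractional) samples
    out = []
    for i in range(n):
        x = (i - shift) % n  # source position for output sample i, wrapped into the window
        lo = math.floor(x)
        frac = x - lo
        # Linear interpolation between neighboring samples (assumed scheme).
        value = (1.0 - frac) * samples[lo % n] + frac * samples[(lo + 1) % n]
        out.append(scale * value)
    return out
```

With dt = 1 and delay = -2, the pulse [1, 2, 3, 4, 5] becomes [3, 4, 5, 1, 2].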

This function is most useful to provide a "random" pulse to get a representative baseline, or to take a high energy pulse and scale to lower energy.

@TODO: add an optional 'offset' parameter.

Template

TemplateByCode(series, dump, channelcode, suffix, scale=1, delay=0)
TemplateByName(series, dump, detNum, templateName, scale=1, delay=0)

These two functions perform the same action, loading the appropriate template, with slightly different syntax for convenience. Series and dump are used to determine which template to load. If templates are being used to fake an existing event, use that event's series; if no exact series is known, a good rule of thumb is to use the target series (from the Replace statement). Dump turns out not to be used and can in all cases be set to 0, but unfortunately it must be provided at this time.

@TODO: Get rid of the dump parameter

Time-domain templates are stored in cdmsbats by channel name under detector-number subdirectories, and each channel may have multiple templates. E.g., for channel PAS1 there is a default OF template "PAS1", and there may be fast and slow two-template inputs ("PAS1fast" and "PAS1slow") as well as glitch and LF templates. TemplateByName takes a detector number and a full template name, and is the "natural" syntax based on the template file structure. TemplateByCode takes a full channelcode, like all other control file functions, plus a suffix, which may be '' (empty string) for the default template, or 'fast' or 'slow' in the previous example. Scale and delay behave as for RawPulse, except that for templates, samples at the window edges are set to 0 instead of being wrapped.
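The correspondence between the two forms can be sketched in Python, assuming the template name is simply the channel name followed by the suffix (as in the "PAS1fast"/"PAS1slow" example). The channel-name table is a hypothetical stand-in for the real cdmsbats channel map; it only holds channels 2 (PA) and 3 (PB), as used for the CDMSlite detector in the Examples section.

```python
from typing import Dict, Tuple

# Hypothetical channel-number -> channel-name table (real names come from cdmsbats).
CHANNEL_NAMES: Dict[int, str] = {2: "PA", 3: "PB"}


def by_code_to_by_name(channelcode: int, suffix: str) -> Tuple[int, str]:
    """Map TemplateByCode's (channelcode, suffix) to TemplateByName's (detNum, templateName)."""
    rest = channelcode % 1_000_000          # drop detectorType
    det_num, chan_num = divmod(rest, 1000)  # detectorNum, channelNum
    return det_num, CHANNEL_NAMES[chan_num] + suffix
```

So `TemplateByCode(series, 0, 21014002, , ...)` and `TemplateByName(series, 0, 14, PA, ...)` refer to the same template.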

What should the amplitude (scale) for a template pulse be? For a given channel '#', the 'raw' reconstructed amplitude (e.g. P#OFamps) will be equal to (scale/P#norm). In other words, to match a real pulse with a reconstructed amplitude of P#OFamps,

scale = P#OFamps * P#norm

Examples

This example is pulled from a recent salting trial dataset. We want to replace T5Z2 (CDMSlite Run 2, detector number 14) in series 01140201_0228, event 20001 with the barium event 01140611_1119, event 3430085, scaled to 10 keV (pt) and added to the random event 01140201_0228, event 280154. Because I'm lazy, we'll just do channels PA and PB. Here are the details for the Ba event in Prodv5-3-6:

  • ptNF: 128.949
  • PAnorm: 1.638e+9
  • PAOFamps: 3.580e-7
  • PAOFdelay: -1.6e-5
  • PAOF1X2Pamps: 3.201e-7
  • PAOF1X2Ramps: 4.998e-5
  • PAOF1X2delay: -9.65e-6
  • PBnorm: 1.638e9
  • PBOFamps: 2.499e-7
  • PBOFdelay: 4.48e-5
  • PBOF1X2Pamps: 2.964e-7
  • PBOF1X2Ramps: -8.95e-5
  • PBOF1X2delay: 1.861e-5

Because we have a target PT of 10 keV, we'll scale everything by

ratio = 10./128.949 = 0.07755

We'll try several options. In each, every line starts the same way: a Replace statement followed by a RawPulse statement for the randoms baseline. We're only doing channels PA and PB (channel numbers 2 and 3), but normally one would do all of the channels. For readability in gitblit's markdown renderer, I've split longer lines with a trailing backslash and indented the continuations, but in the real control file these must be single lines.

@TODO: implement multi-line parsing
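Until multi-line parsing is implemented, wiki-style entries like the ones below can be converted to the required single-line form with a small preprocessing step. A minimal Python sketch (a hypothetical helper, not part of BatFaker) that joins backslash-continued lines and drops the display indentation, keeping a single space around each '+':

```python
from typing import Iterable, List


def join_continuations(lines: Iterable[str]) -> List[str]:
    """Merge lines ending in a backslash with the following line(s)."""
    joined: List[str] = []
    buffer = ""
    for raw in lines:
        line = raw.rstrip()
        if buffer:
            line = line.lstrip()  # continuation lines are only indented for display
        if line.endswith("\\"):
            buffer += line[:-1].rstrip() + " "
            continue
        joined.append(buffer + line)
        buffer = ""
    if buffer:  # trailing backslash on the last line
        joined.append(buffer.rstrip())
    return joined
```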

Option 1: Raw Pulses

For option 1, let's scale the actual raw pulses. That gives:

Replace(01140201_0228, 20001, 21014002) => RawPulse(01140201_0228, 280154, 21014002) \
                                         + RawPulse(01140611_1119, 3430085, 21014002, 0.07755)
Replace(01140201_0228, 20001, 21014003) => RawPulse(01140201_0228, 280154, 21014003) \
                                         + RawPulse(01140611_1119, 3430085, 21014003, 0.07755)

Option 2: OF Templates

For option 2, let's use the regular OF templates:

Replace(01140201_0228, 20001, 21014002) => RawPulse(01140201_0228, 280154, 21014002) \
                                         + TemplateByCode(01140611_1119, 0, 21014002, , 45.58, -1.6e-5)
Replace(01140201_0228, 20001, 21014003) => RawPulse(01140201_0228, 280154, 21014003) \
                                         + TemplateByCode(01140611_1119, 0, 21014003, , 31.74, 4.48e-5)

where the scale values are given by

scale = P#OFamps * P#norm * ratio

We would get the same effect via

Replace(01140201_0228, 20001, 21014002) => RawPulse(01140201_0228, 280154, 21014002) \
                                         + TemplateByName(01140611_1119, 0, 14, PA, 45.58, -1.6e-5)
Replace(01140201_0228, 20001, 21014003) => RawPulse(01140201_0228, 280154, 21014003) \
                                         + TemplateByName(01140611_1119, 0, 14, PB, 31.74, 4.48e-5)
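The Option 2 scale values follow from scale = P#OFamps * P#norm * ratio. A quick Python check with hypothetical helper names, using the PB values listed earlier, reproduces the 31.74 used in the PB lines. Dividing a scale by P#norm recovers the reconstructed amplitude, as described in the Template section.

```python
def template_scale(amp: float, norm: float, ratio: float = 1.0) -> float:
    """Template scale that reproduces a reconstructed amplitude of amp * ratio."""
    return amp * norm * ratio


def reconstructed_amp(scale: float, norm: float) -> float:
    """'Raw' reconstructed amplitude of a template pulse with the given scale."""
    return scale / norm


ratio = 10.0 / 128.949  # target pt of 10 keV divided by the Ba event's ptNF
pb_scale = template_scale(2.499e-7, 1.638e9, ratio)  # PBOFamps, PBnorm
print(f"{pb_scale:.2f}")  # prints 31.74
```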

Option 3: Two-Template OF Templates

Finally for option 3, let's use the two-templates OF (1X2) templates:

Replace(01140201_0228, 20001, 21014002) => RawPulse(01140201_0228, 280154, 21014002) \
                                         + TemplateByCode(01140611_1119, 0, 21014002, slow, 40.66, -9.65e-6) \
                                         + TemplateByCode(01140611_1119, 0, 21014002, fast, 6342, -9.65e-6)
Replace(01140201_0228, 20001, 21014003) => RawPulse(01140201_0228, 280154, 21014003) \
                                         + TemplateByCode(01140611_1119, 0, 21014003, slow, 37.65, 1.86e-5) \
                                         + TemplateByCode(01140611_1119, 0, 21014003, fast, -11305, 1.86e-5)

Note that the fast amplitudes look incredibly large. This is due to a weird normalization of the fast amplitudes in Prodv5-3-6 that has been fixed in more recent versions of cdmsbats, so I probably should have picked a better example. However, the example is correct for the values given, i.e. the fast scale parameter should be

scale = P#OF1X2Ramps * P#norm * ratio