Likelihood Overview (Adapted from the FSSC's Cicerone)

To analyze LAT data, it is necessary to construct the likelihood applicable to the LAT data and then use this likelihood to find the best-fit model parameters, including the description of a source's spectrum, its position, and even whether it exists.

The likelihood L is the probability of obtaining the data given an input model. In this case, the input model is the distribution of gamma-ray sources on the sky, and includes their intensity and spectra.

Binned vs Unbinned Likelihood

Unbinned likelihood analysis is the preferred method for spectral fitting of the LAT data. However, a binned analysis is provided for cases where the unbinned analysis cannot be used. For example, the memory required for the unbinned likelihood calculation scales with the number of photons and the number of sources; this memory usage becomes excessive for long observations of complex regions, necessitating the use of binned analysis.
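
For orientation, here is a minimal sketch of how a binned analysis object is set up through the Science Tools Python interface (pyLikelihood); the file names and IRF selection are placeholders for products of the standard preparation steps. The unbinned counterpart is sketched later, under 'Perform the fit'.

```python
# Minimal sketch of a binned-analysis setup with pyLikelihood.
# File names and the IRF name are placeholders for products of the standard
# preparation steps (gtselect, gtbin, gtltcube, gtexpcube2, gtsrcmaps).
from BinnedAnalysis import BinnedObs, BinnedAnalysis

obs = BinnedObs(srcMaps='srcmaps.fits',            # source maps from gtsrcmaps
                binnedExpMap='binned_expmap.fits', # binned exposure from gtexpcube2
                expCube='ltcube.fits',             # livetime cube from gtltcube
                irfs='CALDB')
like = BinnedAnalysis(obs, 'model.xml', optimizer='NewMinuit')
```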

Model Fitting

Assuming that we know a source is present, we expect the best model to have the highest probability of resulting in the data, and we vary the spectral parameters until the likelihood is maximized. Note that χ2 is -2 times the logarithm of the likelihood in the limit of a large number of counts in each bin, and therefore where χ2 is a valid statistic, minimizing χ2 is equivalent to maximizing the likelihood.
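
For reference, a standard way to make this connection explicit is the binned Poisson log-likelihood, where n_i are the observed counts and m_i(θ) the model-predicted counts in bin i:

```latex
% Binned Poisson log-likelihood; n_i = observed counts, m_i(\theta) = model counts.
\ln L(\theta) = \sum_i \left[ n_i \ln m_i(\theta) - m_i(\theta) - \ln n_i! \right]

% In the limit n_i \gg 1, -2 \ln L differs from Pearson's chi-square only by
% terms that do not depend on the model parameters \theta:
-2 \ln L(\theta) \approx \sum_i \frac{\left( n_i - m_i(\theta) \right)^2}{m_i(\theta)} + \mathrm{const.}
```

In that limit, minimizing χ2 and maximizing L therefore select the same best-fit parameters.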

To fit a source's spectrum:

  1. Select the data.

Due to the overlapping of the point spread functions of nearby sources, data from a substantial spatial region around the source(s) being analyzed must be used.
(See: What data should be used for source analysis?)
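
As an illustrative sketch, the event selection can be made with the gtselect tool, here driven through the Science Tools Python wrapper GtApp; the file names, coordinates, and cut values below are placeholders.

```python
# Sketch of an event selection with gtselect, run through the Science Tools
# Python wrapper (GtApp). File names, coordinates, and cuts are placeholders.
from GtApp import GtApp

gtselect = GtApp('gtselect')
gtselect['infile'] = 'events_raw.fits'     # merged LAT event file
gtselect['outfile'] = 'events_filtered.fits'
gtselect['ra'] = 193.98                    # ROI center (deg)
gtselect['dec'] = -5.82
gtselect['rad'] = 20                       # ROI radius (deg); see the ROI discussion below
gtselect['emin'] = 100                     # MeV
gtselect['emax'] = 100000
gtselect['zmax'] = 90                      # zenith-angle cut to suppress Earth-limb emission
gtselect['tmin'] = 'INDEF'                 # use the full time range of the input file
gtselect['tmax'] = 'INDEF'
gtselect.run()
```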

  2. Select the model.

The model includes the position of the source(s) being analyzed, the position of nearby sources, a model of the diffuse emission, the functional form of the source spectra, and values of the spectral parameters.

In fitting the source(s) of interest, let the parameters for these sources vary.

Note: Because the region around these sources includes counts from nearby sources in which you are not interested, you might also let the parameters of these nearby sources vary.

Models used by the Science Tools are stored in XML files. For historical reasons the Science Tools use two XML formats: one used for parameter fitting (e.g., by the likelihood tool) and the other for source simulation. The likelihood XML format includes parameter uncertainties but does not allow time dependence, while the simulation format does; in addition, the simulation format includes source models that are not in the likelihood format.

You could master the syntax and create a model XML file by hand, and occasionally you may find it convenient to edit an existing XML file directly. However, the Science Tools include a GUI-driven tool, ModelEditor, that can read and write XML files of both formats. ModelEditor is invoked at the command line, and its use is fairly intuitive.

See Create a Source Model XML File.

Also see ModelEditor.
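
For illustration, a minimal likelihood-format model containing a single power-law point source might look like the following, written to disk from Python; the source name, coordinates, and parameter values are placeholders, and a real model would also include the Galactic and isotropic diffuse components.

```python
# Minimal sketch of a likelihood-format model XML file (source name, coordinates,
# and parameter values are placeholders). In practice, ModelEditor or a
# catalog-based script is usually used to build this file.
model_xml = """<?xml version="1.0" ?>
<source_library title="source library">
  <source name="ExampleSource" type="PointSource">
    <spectrum type="PowerLaw">
      <parameter free="1" name="Prefactor" scale="1e-09" value="1.0"   min="1e-3"  max="1e3"/>
      <parameter free="1" name="Index"     scale="1.0"   value="-2.1"  min="-5.0"  max="-1.0"/>
      <parameter free="0" name="Scale"     scale="1.0"   value="100.0" min="30.0"  max="2000.0"/>
    </spectrum>
    <spatialModel type="SkyDirFunction">
      <parameter free="0" name="RA"  scale="1.0" value="193.98" min="-360.0" max="360.0"/>
      <parameter free="0" name="DEC" scale="1.0" value="-5.82"  min="-90.0"  max="90.0"/>
    </spatialModel>
  </source>
</source_library>
"""
with open('model.xml', 'w') as f:
    f.write(model_xml)
```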

  3. Precompute a number of quantities that are part of the likelihood computation.

As parameter values are varied in searching for the best fit, the likelihood is calculated many times. While not strictly necessary, precomputing a number of computation-intensive quantities will greatly speed up the fitting process.
(See Precomputation of Likelihood Quantities.)

  4. Perform the fit.

The parameter space can be quite large, as the spectral parameters of a number of sources must be fit simultaneously; therefore, the likelihood tools provide a choice of 'optimizers' to maximize the likelihood efficiently (e.g., DRMNGB, DRMNFB, NEWMINUIT, MINUIT, and LBFGS).

Tip: Generally speaking, the fastest way to estimate the parameters is to use DRMNGB (or DRMNFB) to find initial values and then use MINUIT (or NEWMINUIT) to obtain more accurate results.

Fitting requires repeatedly calculating the likelihood for different trial parameter sets until a value sufficiently near the maximum is found; the optimizers guide the choice of new trial parameter sets to converge efficiently on the best set.

The variation of the likelihood in the vicinity of the maximum can be related to the uncertainties on the parameters, and therefore these optimizers estimate the parameter uncertainties.
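
A minimal sketch of an unbinned fit with pyLikelihood, following the tip above; the file names, IRF name, and source name are placeholders.

```python
# Sketch of an unbinned fit with pyLikelihood (file names, IRF name, and source
# name are placeholders). A coarse DRMNGB fit supplies starting values for a
# more careful NewMinuit fit, following the tip above.
from UnbinnedAnalysis import UnbinnedObs, UnbinnedAnalysis

obs = UnbinnedObs('events_filtered.fits', 'spacecraft.fits',
                  expMap='expmap.fits', expCube='ltcube.fits', irfs='CALDB')

like1 = UnbinnedAnalysis(obs, 'model.xml', optimizer='DRMNGB')
like1.fit(verbosity=0)
like1.logLike.writeXml('fit_drmngb.xml')   # save the coarse fit as a new model

like2 = UnbinnedAnalysis(obs, 'fit_drmngb.xml', optimizer='NewMinuit')
like2.fit(verbosity=0, covar=True)         # covar=True also estimates the uncertainties

print(like2.model['ExampleSource'])        # best-fit parameters and their errors
```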

Goodness-of-fit. Likelihood spectral fitting provides the best fit parameter values and their uncertainties, but is this a good fit?

  • When χ2 is a valid statistic, we know that the value of χ2 is drawn from a known distribution, and we can use the probability of obtaining the observed value as a goodness-of-fit measure. When there are many degrees of freedom (i.e., the number of energy channels minus the number of fitted parameters), we expect the χ2 per degree of freedom to be ~1 for a good fit.
  • However, when χ2 is not a valid statistic, we usually do not know the distribution from which the maximum likelihood value is drawn, and therefore we do not have a goodness-of-fit measure.

Note: While the optimizers find the best-fit spectral parameters, they do not fit the source position. However, a tool is provided that performs a grid search, mapping out the maximum likelihood value over a grid of locations. As explained below, it is convenient to use a quantity called the 'Test Statistic' (TS) that is maximized when the likelihood is maximized.

Test Statistic. The Test Statistic is defined as TS=-2ln(Lmax,0/Lmax,1), where Lmax,0 is the maximum likelihood value for a model without an additional source (the 'null hypothesis') and Lmax,1 is the maximum likelihood value for a model with the additional source at a specified location.

As can be seen, TS is a monotonically increasing function of Lmax,1, which is why maximizing TS on a grid is equivalent to maximizing the likelihood on a grid. In the limit of a large number of counts, Wilks' theorem states that the TS for the null hypothesis is asymptotically distributed as χ2x, where x is the number of parameters characterizing the additional source. Note: Here χ2x denotes the chi-square distribution with x degrees of freedom, not a value of the statistic.

This means that if no source is present, TS is drawn from this distribution, and an apparent source results from a fluctuation. Thus, a larger TS indicates that the null hypothesis is less likely to be correct (i.e., a source really is present), and this can be quantified as a probability.

Note: As a basic rule of thumb, the square root of the TS is approximately equal to the detection significance for a given source.
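
Continuing the pyLikelihood sketch above, the TS of a fitted source and the rule-of-thumb significance can be obtained as follows (the source name is a placeholder).

```python
# Continuing the sketch above: the Test Statistic for a fitted source and the
# rule-of-thumb significance (the source name is a placeholder).
import math

ts = like2.Ts('ExampleSource')    # TS = -2 ln(Lmax,0 / Lmax,1)
print('TS = %.1f, approximate significance = %.1f sigma' % (ts, math.sqrt(ts)))
```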

Select Data: What data should be used for source analysis?

Choosing the Data to Analyze — Regions of Interest and Source Regions

Assume that we are analyzing the spectrum of a single source. Because of the large point spread function at low energies (e.g., 68% of the counts will be within 3.5 degrees at 100 MeV), we want to use the counts within a region around the source. Nearby sources will contribute counts to this region, and we want to model them, i.e., to model a single source we are forced to model a handful of sources. Therefore, we need to include counts from an even larger region.

For the greatest accuracy possible in modeling a single source, we should model the entire sky(!), but this is not usually feasible and, in reality, the influence of sources very far from the source of interest will be greatly attenuated. Thus, we include sources from a large 'Source Region' and counts from a smaller 'Region of Interest' (ROI).

The positions and spectra of sources in the Source Region outside of the ROI were obtained previously, from a catalog, for example. These sources are included because of their contribution to the counts in the ROI. How we treat the sources in the ROI is under our control: we may wish to fix the parameters of sources other than the one we are studying at their catalog values, or we may want to fit the parameters of all of these sources.

To summarize, we will use all of the sources in the Source Region, and determine the size of the Source Region appropriate for our needs from experience and experimentation.

Recommended values. The appropriate ROI size is determined from experience and experimentation, but the recommended default radii are 20 and 15 degrees for sources dominated by ~100 MeV and ~1 GeV events, respectively, and all counts in the ROI are included. For the Source Region, default radii of ROI+10 and ROI+5 degrees are recommended for the same two cases.

Precomputation of Likelihood Quantities

The likelihood is usually computed many times: fits are done with various model parameters fixed or with different sources present or absent. Certain quantities need only be calculated once, speeding up the repeated computation of the likelihood.

Livetime Cube

The LAT instrument response functions depend on the inclination angle, the angle between the direction to a source and the LAT normal. The number of counts that a source should produce therefore depends on the amount of time that the source spent at a given inclination angle during an observation. This livetime quantity, the time that the LAT observed a given position on the sky at a given inclination angle, depends only on the history of the LAT's orientation during the observation and not on the source model. The array of these livetimes at all points on the sky is called the 'livetime cube.' As a practical matter, the livetime cubes are provided on a HEALPix grid on the sky and in inclination angle bins (see the Cicerone's Likelihood Livetime and Exposure: Livetime cubes).
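
As a sketch, the livetime cube is produced with gtltcube, here driven through the GtApp Python wrapper; the file names are placeholders, and the binning values shown are the commonly recommended defaults.

```python
# Sketch of a livetime-cube calculation with gtltcube via the GtApp wrapper.
# File names are placeholders; the binning values are commonly used defaults.
from GtApp import GtApp

gtltcube = GtApp('gtltcube')
gtltcube['evfile'] = 'events_filtered.fits'
gtltcube['scfile'] = 'spacecraft.fits'   # spacecraft (pointing/livetime history) file
gtltcube['outfile'] = 'ltcube.fits'
gtltcube['dcostheta'] = 0.025            # inclination-angle binning (in cos theta)
gtltcube['binsz'] = 1                    # spatial grid size in degrees
gtltcube.run()
```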

Exposure Maps

The likelihood consists of two factors: the first depends on the detected counts and differs between binned and unbinned likelihood calculations; the second is equal to the exponential of the negative of the expected total number of counts Nexp for the source model. The exposure map gives the total exposure (area multiplied by time) for a given position on the sky producing counts in the Region of Interest. Since the response function depends on the photon energy, the exposure map is also a function of this energy. Thus, the number of counts produced by a source at a given position on the sky is the integral over energy of the source flux multiplied by the exposure at that position. The exposure map is used for extended sources, such as the diffuse Galactic and extragalactic backgrounds, and not for individual point sources.

The exposure map should be computed over a Source Region that is larger than the Region of Interest by ~50%. This is necessary to ensure that all source photons are included, given the size of the LAT PSF at low energies.
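
As a sketch, the unbinned-analysis exposure map is produced with gtexpmap, again through the GtApp wrapper; the file names and IRF name are placeholders, and srcrad is chosen larger than the ROI, as discussed above.

```python
# Sketch of an unbinned-analysis exposure map with gtexpmap via the GtApp wrapper.
# File names and the IRF name are placeholders; srcrad exceeds the ROI radius,
# as discussed above.
from GtApp import GtApp

gtexpmap = GtApp('gtexpmap')
gtexpmap['evfile'] = 'events_filtered.fits'
gtexpmap['scfile'] = 'spacecraft.fits'
gtexpmap['expcube'] = 'ltcube.fits'      # livetime cube from gtltcube
gtexpmap['outfile'] = 'expmap.fits'
gtexpmap['irfs'] = 'CALDB'
gtexpmap['srcrad'] = 30                  # degrees; a 20 deg ROI plus margin
gtexpmap['nlong'] = 120                  # spatial grid: 0.5 deg pixels
gtexpmap['nlat'] = 120
gtexpmap['nenergies'] = 20               # number of energy planes
gtexpmap.run()
```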


Last updated by: Chuck Patterson 12/08/2010