# Introduction to MaPyDe

MaPyDe stands for MadGraph-Pythia-Delphes. It is a utility that lets you run the various HEP tools individually or chain them together, and then perform a quick analysis of the results, for example with [ATLAS SimpleAnalysis](https://simpleanalysis.docs.cern.ch) or [pyhf](https://scikit-hep.org/pyhf).

## Background

This tool should be considered a proof-of-concept toolchain that ties together an entire workflow from end to end:
* Event Generation (MadGraph + Pythia)
* Detector Simulation and/or Reconstruction (Delphes, [atlas/athena](https://gitlab.cern.ch/atlas/athena), or your favorite truth-smearing tool)
* Event Selection (RECAST, Rivet, SimpleAnalysis, etc.)
* Statistical Analysis (pyhf, etc.)

There are many reasons to want to execute a defined workflow like this, such as:
* Reproduction of results ("good experiment"),
* Reinterpretation of results ("good theory"),
* &lt;insert your favorite reason here&gt;

All of these pieces require some amount of input, either from experiment or from your friendly neighborhood phenomenologist, including (but not limited to!):
* model parameters (e.g. using [SLHA](https://skands.physics.monash.edu/slha/) files)
* detector acceptances and selection efficiencies (if you don't have access to the experiment's bread-and-butter reconstruction/simulation)
* (full&#8253;) probability models

Chaining all of this together is not necessarily obvious, especially for a newcomer to the field just trying to get a grasp on the past century of particle physics. *mapyde* should make this easier (but not trivial!).

# Understanding data

For this tutorial, we'll simply use `mapyde` as a command-line utility, as it is meant to be primarily user-facing. The Python package is usable for others who want to develop on top of it, but that won't be covered in this notebook.

In [1]:
import mapyde

print(mapyde.__version__)

0.4.4


In [2]:
!mapyde --help

Usage: mapyde [OPTIONS] COMMAND [ARGS]...

  Manage top-level options

Options:
  --version                       Print the current version.
  --prefix [data|cards|likelihoods|scripts|templates]
                                  Print the path prefix for data files.
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.

Commands:
  config
  run


The first thing to notice is that `mapyde` has a `--prefix` option, which prints the path where we ship a handful of useful files (including <abbr title="Configuration Files">**cards**</abbr>) for getting started. We don't intend this to be a complete or exhaustive list of configuration files, but it should get you started quickly.

In [3]:
!mapyde --prefix data

/Users/kratsg/mapyde-tutorial/venv/share/mapyde


In [4]:
!ls -lavh `mapyde --prefix cards`

total 0
drwxr-xr-x   9 kratsg  staff   288B Dec 13 18:54 .
drwxr-xr-x   6 kratsg  staff   192B Dec 13 18:54 ..
drwxr-xr-x  12 kratsg  staff   384B Dec 13 18:54 delphes
drwxr-xr-x  11 kratsg  staff   352B Dec 13 18:54 madspin
drwxr-xr-x  12 kratsg  staff   384B Dec 13 18:54 param
drwxr-xr-x  43 kratsg  staff   1.3K Dec 13 18:54 process
drwxr-xr-x   5 kratsg  staff   160B Dec 13 18:54 pythia
drwxr-xr-x   5 kratsg  staff   160B Dec 13 18:54 run
drwxr-xr-x   4 kratsg  staff   128B Dec 13 18:54 sherpa


In [5]:
!ls -lavh `mapyde --prefix likelihoods`

total 6864
drwxr-xr-x  5 kratsg  staff   160B Dec 13 18:54 .
drwxr-xr-x  6 kratsg  staff   192B Dec 13 18:54 ..
-rw-r--r--  1 kratsg  staff   1.1M Dec 13 18:54 Higgsino_2L_bkgonly.json
-rw-r--r--  1 kratsg  staff   1.1M Dec 13 18:54 Slepton_bkgonly.json
-rw-r--r--  1 kratsg  staff   1.1M Dec 13 18:54 WinoBino_noWeight_2L_bkgonly.json


## Templates

The first set of files we'll discuss is `templates`. Templates are important because, when you write a configuration to run a pipeline in `mapyde`, you often have some defaults you'd like to reuse. This is what we use templates for.

In [6]:
!ls -lavh `mapyde --prefix templates`

total 24
drwxr-xr-x  5 kratsg  staff   160B Dec 13 18:54 .
drwxr-xr-x  6 kratsg  staff   192B Dec 13 18:54 ..
-rw-r--r--  1 kratsg  staff   1.7K Dec 13 18:54 defaults.toml
-rw-r--r--  1 kratsg  staff   1.7K Dec 13 18:54 ewkinos.toml
-rw-r--r--  1 kratsg  staff   1.7K Dec 13 18:54 sleptons.toml


Here you can see the templates we currently ship, such as `defaults.toml` and `ewkinos.toml`. These templates are written in <abbr title="Tom's Obvious Minimal Language">[toml](https://toml.io/en/)</abbr>, which is a plaintext format for configuration files. It has some nice features that make it well suited to variable injection, and it can also look cleaner when you deal with deeply nested dictionaries as part of your configuration. Let's look at the `sleptons` template:

In [7]:
!cat `mapyde --prefix templates`/sleptons.toml

[base]
path = "{{PWD}}"
output = "output"
logs = "logs"
data_path = "{{MAPYDE_DATA}}"
cards_path = "{{MAPYDE_CARDS}}"
scripts_path = "{{MAPYDE_SCRIPTS}}"
process_path = "{{MAPYDE_CARDS}}/process/"
param_path = "{{MAPYDE_CARDS}}/param/"
run_path = "{{MAPYDE_CARDS}}/run/"
pythia_path = "{{MAPYDE_CARDS}}/pythia/"
delphes_path = "{{MAPYDE_CARDS}}/delphes/"
madspin_path = "{{MAPYDE_CARDS}}/madspin/"
likelihoods_path = "{{MAPYDE_LIKELIHOODS}}"

[madgraph]
skip = false
params = "SleptonBino"
ecms = 13000
cores = 1
nevents = 50000
seed = 0
version = "madgraph:2.9.3"
batch = false
paramcard = "{{madgraph['params']}}.slha"

[madgraph.generator]
output = "run.mg5"

[madgraph.masses]
MSLEP = 250
MN1 = 240

[madgraph.run]
card = "default_LO.dat"

[madgraph.run.options]
mmjj = 0
mmjjmax = -1
deltaeta = 0
ktdurham = -1
xqcut = -1
ptj = 20
ptj1min = 50

[madgraph.proc]
name = "isrslep"
card = "{{madgraph['proc']['name']}}"

[madspin]
skip = true
card 

There are a couple of top-level options available here, each corresponding roughly to a different tool (with the exception of `base`):
* `base`: controls all the global configurations such as location of inputs and outputs
* `madgraph`: configuration for madgraph
* `madspin`: configuration for madspin
* `pythia`: configuration for pythia
* `delphes`: configuration for delphes
* `analysis`: configuration for running analysis (such as `Delphes2SA.py`)
* `simpleanalysis`: configuration for running SA
* `sa2json`: a special tool we have that converts the output of SA into a HiFa JSON patch (for use with `pyhf`)
* `pyhf`: configuration for running pyhf

The big caveat here is that this configuration is in a beta-state, so it can change as we get more experience with how this works for us, and whether we want to make it easier or cleaner to configure. But for now, this isn't the worst. You should be able to glance over and get a rough idea of what most of the options do. There are specific options that you will (currently) need to look in the code for, or ask us on GitHub. [![open a discussion](https://camo.githubusercontent.com/8c6d18358e02e9e49a6dacefec3bb40cc4236c2bd8165bc74a997767c064d1ae/68747470733a2f2f696d672e736869656c64732e696f2f7374617469632f76313f6c6162656c3d44697363757373696f6e73266d6573736167653d41736b26636f6c6f723d626c7565266c6f676f3d676974687562)](https://github.com/scipp-atlas/mapyde/discussions)

[![file an issue](https://camo.githubusercontent.com/f621acd9b2de1bac2b320af8fb80c8673305de9c798aa2f1eea38709c7afef88/68747470733a2f2f696d672e736869656c64732e696f2f7374617469632f76313f6c6162656c3d497373756573266d6573736167653d46696c6526636f6c6f723d626c7565266c6f676f3d676974687562)](https://github.com/scipp-atlas/mapyde/issues)

There are some special variables available for interpolating into your configuration, which is done via `{{VARIABLE}}` double curly braces. You can see what's currently used [in the code](https://github.com/scipp-atlas/mapyde/blob/25e67e83bb0d9524f3ac51c9ec33325c718c591e/src/mapyde/utils.py#L47), but the following can typically be expected:
* `PWD`
* `USER`
* `MAPYDE_DATA`
* `MAPYDE_CARDS`
* `MAPYDE_LIKELIHOODS`
* `MAPYDE_SCRIPTS`
* `MAPYDE_TEMPLATES`

In addition, there is dynamic variable substitution: one configuration value can refer to another. That means a configuration block like

```toml
[simpleanalysis]
skip = false
additional_opts = ""
name = "EwkCompressed2018"
outputtag = ""

[sa2json]
inputs = "{{simpleanalysis['name']}}{{simpleanalysis['outputtag']}}.root"
```

will render as

```json
{"sa2json": {"inputs": "EwkCompressed2018.root"}}
```

if you didn't override anything in the `[simpleanalysis]` block.
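
To make the substitution rule concrete, here is a minimal sketch in Python. It is a simplified stand-in, not how `mapyde` does it: `mapyde` relies on Jinja, while the hypothetical `render` helper below only understands the `{{name}}` and `{{name['key']}}` forms used in this tutorial and uses nothing beyond the standard library.

```python
import re

# Matches {{ name }} or {{ name['key']['subkey'] }} (a tiny subset of Jinja syntax).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)((?:\['[^']*'\])*)\s*\}\}")
KEY = re.compile(r"\['([^']*)'\]")


def render(text, context):
    """Replace each placeholder in `text` with the value it points to in `context`.

    Hypothetical helper for illustration only; it is not part of mapyde.
    """

    def lookup(match):
        value = context[match.group(1)]  # top-level name, e.g. "simpleanalysis"
        for key in KEY.findall(match.group(2)):  # nested keys, e.g. "name"
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, text)


config = {
    "simpleanalysis": {"name": "EwkCompressed2018", "outputtag": ""},
    "sa2json": {
        "inputs": "{{simpleanalysis['name']}}{{simpleanalysis['outputtag']}}.root"
    },
}

rendered = render(config["sa2json"]["inputs"], config)
print(rendered)  # EwkCompressed2018.root

# Flat variables work the same way ("alice" is a made-up user name).
print(render("/data/users/{{USER}}/SUSY", {"USER": "alice"}))  # /data/users/alice/SUSY
```

Setting `outputtag` to a non-empty string in the context would change the rendered file name accordingly, which is the point of referring to other values instead of hard-coding them.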

## Using a Template

To make this template usable, you first create a configuration file that inherits from the template, using it as a base. Typically, it will look like this:

```toml
[base]
path = "/data/users/{{USER}}/SUSY"
output = "mytag"
template = "{{MAPYDE_TEMPLATES}}/defaults.toml"

[madgraph.proc]
name = "charginos"
card = "{{madgraph['proc']['name']}}"

[madgraph.masses]
MN2 = 500
```

Here, in this example `user.toml` config, we will use the `defaults.toml` template that is shipped with `mapyde` via `{{MAPYDE_TEMPLATES}}/defaults.toml` (however you could always make your own template and provide that instead!). `mapyde` will always parse your template out first, before parsing the rest of the file.

Additionally, `[madgraph.masses]` will be overridden relative to the default template to specify that `{{MN2}}` in the corresponding param card is substituted with the value `500`. The reason for using the `{{VAR}}` pattern here is that it allows us to use [jinja](https://jinja.palletsprojects.com/en/3.1.x/) as a templating engine. This not only makes it much easier to maintain the same style of substitution across multiple files, but also allows us to use the concept of templates in an "inheritance" pattern.

# Your First Config

## Making It

Let's go ahead and create our first config (`tutorial.toml`) and learn about configuration parsing.

In [8]:
from pathlib import Path

import toml

data = {
    "base": {
        "path": "{{PWD}}",
        "output": "tutorial",
        "template": "{{MAPYDE_TEMPLATES}}/sleptons.toml",
    },
    "madgraph": {
        "nevents": 1000,
        "proc": {"name": "charginos", "card": "{{madgraph['proc']['name']}}"},
        "masses": {"MN1": 200},
    },
}

with Path().joinpath("tutorial.toml").open("w") as fp:
    toml.dump(data, fp)

In [9]:
!cat tutorial.toml | pygmentize -l toml

[base]
path = "{{PWD}}"
output = "tutorial"
template = "{{MAPYDE_TEMPLATES}}/sleptons.toml"

[madgraph]
nevents = 1000

[madgraph.proc]
name = "charginos"
card = "{{madgraph['proc']['name']}}"

[madgraph.masses]
MN1 = 200


Now from this, you can see we are setting `MN1 = 200` instead of the default `MN1 = 240` from the template.

In addition, we set the `path` to the current directory via `{{PWD}}` and the output for this config will be stored under `{{base['path']}}/tutorial`.

What does `mapyde` parse it as? We can run `mapyde config parse`:

In [10]:
!mapyde config parse tutorial.toml

{
    "base": {
        "path": "/Users/kratsg/mapyde-tutorial/book",
        "output": "tutorial",
        "logs": "logs",
        "data_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde",
        "cards_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/cards",
        "scripts_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/scripts",
        "process_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/cards/process/",
        "param_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/cards/param/",
        "run_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/cards/run/",
        "pythia_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/cards/pythia/",
        "delphes_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/cards/delphes/",
        "madspin_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/cards/madspin/",
        "likelihoods_path": "/Users/kratsg/mapyde-tutorial/venv/share/mapyde/likelihoods",
        "t

Notice that, for example, in `madgraph.masses` we now have

```json
        "masses": {
            "MSLEP": 250,
            "MN1": 200
        },
```

which merged the template with our configuration, rather than completely overriding. This is the default behavior in `mapyde` for nested dictionaries (and might be revisited later if this is not helpful). One nice thing about this `parse` functionality is that it effectively provides a "frozen" configuration that can be hashed and passed around / uniquely identified.
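
The merge behavior described above can be sketched as a recursive dictionary update. This is only an illustration under assumptions: `deep_merge` and `fingerprint` are hypothetical helpers, not functions from `mapyde`, and hashing the sorted JSON dump is just one possible way to identify a frozen configuration.

```python
import copy
import hashlib
import json


def deep_merge(template, override):
    """Return a new dict: `template` updated recursively with `override`.

    Nested dicts are merged key by key; any other value is replaced.
    """
    merged = copy.deepcopy(template)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def fingerprint(config):
    """Hash a fully parsed config (assumption: sorted JSON dump + SHA-256)."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Values taken from the sleptons template and from tutorial.toml.
template = {"madgraph": {"nevents": 50000, "masses": {"MSLEP": 250, "MN1": 240}}}
user = {"madgraph": {"nevents": 1000, "masses": {"MN1": 200}}}

merged = deep_merge(template, user)
print(merged["madgraph"]["masses"])  # {'MSLEP': 250, 'MN1': 200}
print(fingerprint(merged))  # 64 hex characters, identical for identical configs
```

`MSLEP` survives because only the keys present in the user config are replaced; a plain `dict.update` on the top level would instead have dropped it along with the rest of the template's `madgraph` block.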

The `MN1` mass gets substituted into the param card, which in this example is `SleptonBino.slha` under `base['param_path']` (using the `mapyde` default paths):

In [11]:
!head -n78 `mapyde --prefix cards`/param/SleptonBino.slha | sed 1,37d

###################################
## INFORMATION FOR MASS
###################################
Block mass
    5 4.889917e+00 # MB
    6 1.750000e+02 # MT
   15 1.777000e+00 # Mta
   23 9.118760e+01 # MZ
   24 7.982901e+01 # MW
   25 1.108991e+02 # MH01
   35    4.5E9  # MH02
   36    4.5E9  # MA0
   37    4.5E9  # MH
   1000001    4.5E9  # set of param :1*Msd1, 1*Msd2
   1000002    4.5E9  # set of param :1*Msu1, 1*Msu2
   1000005    4.5E9  # Msd3
   1000006    4.5E9  # Msu3
   1000011    {{MSLEP}}  # Msl1
   1000012    4.5E9  # Msn1
   1000013    {{MSLEP}}  # Msl2
   1000014    4.5E9  # Msn2
   1000015    {{MSLEP}}  # Msl3
   1000016    4.5E9  # Msn3
   1000021    4.5E9  # Mgo
   1000022    {{MN1}} # Mneu1
   1000023    4.5E9  # Mneu2
   1000024    4.5E9  # Mch1
   1000025    4.5E9  # Mneu3
   1000035    4.5E9  # Mneu4
   1000037    4.5E9  # Mch2
   2000001    4.5E9  # set of param :1*Msd4, 1*Msd5
   2000002    4.5E9  # set of param :1*Msu4, 1*Msu5
   2

## Running It

Now, let's go ahead and demonstrate what can be done. As mentioned above, `mapyde` is a user-facing CLI so let's see what is available to run:

In [12]:
!mapyde run --help

Usage: mapyde run [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  all             Run madgraph, delphes, analysis, and pyhf.
  analysis        Run analysis.
  delphes         Run delphes.
  madgraph        Run madgraph.
  pyhf            Run pyhf.
  root2hdf5       Transform from .root to .hdf5 format.
  sa2json         Run sa2json.
  sherpa          Run Sherpa.
  simpleanalysis  Run simpleanalysis (ATLAS tool)


We can simply run `all`, which runs every step defined in the config that is not marked `skip = true`. We can also run individual steps. Let's try running madgraph and generating the 1000 events we configured.

This will pull the Docker image for MadGraph (which we build at https://github.com/scipp-atlas/mapyde), create a container from it, and pass both the input files and the commands into it.

In [13]:
!mapyde run madgraph tutorial.toml

2d13c8c920ccd7042956d7f0e67c001781b7f0cbbcc4b2165271ee1c48de8844
************************************************************
*                                                          *
*                     W E L C O M E to                     *
*              M A D G R A P H 5 _ a M C @ N L O           *
*                                                          *
*                                                          *
*                 *                       *                *
*                   *        * *        *                  *
*                     * * * * 5 * * * *                    *
*                   *        * *        *                  *
*                 *                       *                *
*                                                          *
*         VERSION 2.9.3                 2021-03-25         *
*                                                          *
*    The MadGraph5_aMC@NLO Development Team - Find us at   *
*    https://server0

INFO: Generating Feynman diagrams for Process: d d~ > x1- x1+ WEIGHTED<=4 @1 
INFO: Generating Feynman diagrams for Process: s s~ > x1- x1+ WEIGHTED<=4 @1 
INFO: Finding symmetric diagrams for subprocess group qq_x1mx1p 
Generated helas calls for 4 subprocesses (12 diagrams) in 0.022 s
Wrote files for 44 helas calls in 0.139 s
ALOHA: aloha creates FFS1 routines
ALOHA: aloha creates FFV2 routines
ALOHA: aloha creates FFV5 routines
ALOHA: aloha creates FFS2 routines
ALOHA: aloha creates FFV3 routines
ALOHA: aloha creates FFV1 set of routines with options: P0
save configuration file to /tmp/tmp.GqSC5y7MdF/PROC_madgraph/Cards/me5_configuration.txt
INFO: Use Fortran compiler gfortran 
INFO: Use c++ compiler g++ 
INFO: Generate web pages 
Output to directory /tmp/tmp.GqSC5y7MdF/PROC_madgraph done.
Type "launch" to generate events from this process, or see
/tmp/tmp.GqSC5y7MdF/PROC_madgraph/README
Run "open index.html" to see more information about this process.
set run_mode 2
This option will

INFO: Update the dependent parameter of the param_card.dat 
Generating 1000 events with run name run_01
survey  run_01 
INFO: compile directory
Not able to open file /tmp/tmp.GqSC5y7MdF/PROC_madgraph/crossx.html since no program configured.Please set one in ./input/mg5_configuration.txt
INFO: Using LHAPDF v6.3.0 interface for PDFs 
INFO: Trying to download NNPDF30_nlo_as_0118 
Unable to download /cvmfs/sft.cern.ch/lcg/external/lhapdfsets/current/NNPDF30_nlo_as_0118.tar.gz
NNPDF30_nlo_as_0118.tar.gz:    25.9 MB [100.0%]
INFO: NNPDF30_nlo_as_0118 successfully downloaded and stored in /usr/local/share/LHAPDF 
compile Source Directory
Using random number seed offset = 21
INFO: Running Survey 
Creating Jobs
Working on SubProcesses
INFO: Compiling for process 1/1. 
INFO:     P1_qq_x1mx1p  
INFO:     P1_qq_x1mx1p
Zero result detected:  No Phase Space. Please check particle masses.
INFO:
quit
INFO:
Generation failed (no resul

Of course, this doesn't actually give you any physics (which wasn't really the point of this portion of the tutorial). In the next section, we're going to walk through a real-world usage of `mapyde`.

To finish up this section, however, remember that `base['path']` and `base['output']` mean that outputs are stored in our current working directory under `tutorial/`, so let's see what got created:

In [14]:
!ls -lavh tutorial/

total 120
drwxr-xr-x   9 kratsg  staff   288B Dec 13 19:01 .
drwxr-xr-x  14 kratsg  staff   448B Dec 13 18:50 ..
-rw-r--r--   1 kratsg  staff    21K Dec 13 19:01 SleptonBino.slha
-rw-r--r--   1 kratsg  staff   1.3K Dec 13 19:01 charginos
-rw-r--r--   1 kratsg  staff    17K Dec 13 19:01 default_LO.dat
drwxr-xr-x   3 kratsg  staff    96B Dec 13 18:20 logs
drwxr-xr-x   3 kratsg  staff    96B Dec 13 18:20 madgraph
-rw-r--r--   1 kratsg  staff   4.1K Dec 13 19:01 pythia_card.dat
-rw-r--r--   1 kratsg  staff   1.4K Dec 13 19:01 run.mg5


We can confirm, for example, that our masses were substituted in correctly:

In [15]:
!head -n78 tutorial/SleptonBino.slha | sed 1,37d

###################################
## INFORMATION FOR MASS
###################################
Block mass
    5 4.889917e+00 # MB
    6 1.750000e+02 # MT
   15 1.777000e+00 # Mta
   23 9.118760e+01 # MZ
   24 7.982901e+01 # MW
   25 1.108991e+02 # MH01
   35    4.5E9  # MH02
   36    4.5E9  # MA0
   37    4.5E9  # MH
   1000001    4.5E9  # set of param :1*Msd1, 1*Msd2
   1000002    4.5E9  # set of param :1*Msu1, 1*Msu2
   1000005    4.5E9  # Msd3
   1000006    4.5E9  # Msu3
   1000011    250  # Msl1
   1000012    4.5E9  # Msn1
   1000013    250  # Msl2
   1000014    4.5E9  # Msn2
   1000015    250  # Msl3
   1000016    4.5E9  # Msn3
   1000021    4.5E9  # Mgo
   1000022    200 # Mneu1
   1000023    4.5E9  # Mneu2
   1000024    4.5E9  # Mch1
   1000025    4.5E9  # Mneu3
   1000035    4.5E9  # Mneu4
   1000037    4.5E9  # Mch2
   2000001    4.5E9  # set of param :1*Msd4, 1*Msd5
   2000002    4.5E9  # set of param :1*Msu4, 1*Msu5
   2000005    4.5E9  # Msd
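
If you would rather check the masses programmatically than by eye, a small parser for the `Block mass` section is enough. The sketch below is an assumption-laden illustration: `read_mass_block` is a hypothetical helper (not part of `mapyde`), it only handles the simple "PDG ID, value, comment" layout shown above, and the sample text is an abbreviated copy of that output. In practice you might pass it the contents of `tutorial/SleptonBino.slha` instead.

```python
def read_mass_block(slha_text):
    """Return {pdg_id: mass} for the entries of the 'Block mass' section.

    Expects a rendered card: a leftover {{VAR}} placeholder would raise ValueError.
    """
    masses = {}
    in_block = False
    for raw in slha_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0].lower() in ("block", "decay"):
            # A new block (or decay table) starts; only 'mass' is of interest.
            in_block = (
                fields[0].lower() == "block"
                and len(fields) > 1
                and fields[1].lower() == "mass"
            )
            continue
        if in_block and len(fields) >= 2:
            masses[int(fields[0])] = float(fields[1])
    return masses


# Abbreviated sample copied from the rendered card shown above.
sample = """
###################################
## INFORMATION FOR MASS
###################################
Block mass
    5 4.889917e+00 # MB
   1000011    250  # Msl1
   1000022    200 # Mneu1
   1000023    4.5E9  # Mneu2
"""

masses = read_mass_block(sample)
print(masses[1000011], masses[1000022])  # 250.0 200.0
```

Here `1000011` and `1000022` are the PDG IDs that the card labels `Msl1` and `Mneu1`, so the two printed values correspond to `MSLEP` and `MN1` from the configuration.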

We can also look at the generated MadGraph script (`run.mg5`):

In [16]:
!cat tutorial/run.mg5

set default_unset_couplings 99
set group_subprocesses Auto
set ignore_six_quark_processes False
set loop_optimized_output True
set loop_color_flows False
set gauge unitary
set complex_mass_scheme False
set max_npoint_for_channel 0
import model MSSM_SLHA2
define j = g u c d s u~ c~ d~ s~
define pb = g u c d s b u~ c~ d~ s~ b~
define jb = g u c d s b u~ c~ d~ s~ b~
define l+ = e+ mu+
define l- = e- mu-
define vl = ve vm vt
define vl~ = ve~ vm~ vt~
define fu = u c e+ mu+ ta+
define fu~ = u~ c~ e- mu- ta-
define fd = d s ve~ vm~ vt~
define fd~ = d~ s~ ve vm vt
define susystrong = go ul ur dl dr cl cr sl sr t1 t2 b1 b2 ul~ ur~ dl~ dr~ cl~ cr~ sl~ sr~ t1~ t2~ b1~ b2~
define susyweak = el- el+ er- er+ mul- mul+ mur- mur+ ta1- ta1+ ta2- ta2+ n1 n2 n3 n4 x1- x1+ x2- x2+ sve sve~ svm svm~ svt svt~
define susylq = ul ur dl dr cl cr sl sr
define susylq~ = ul~ ur~ dl~ dr~ cl~ cr~ sl~ sr~
define susysq = ul ur dl dr cl cr sl sr t1 t2 b1 b2
define susysq~ = ul~ ur~ dl~ dr~ cl