Working with Workspaces

This tutorial covers how to work with PyHS3 workspaces - loading, exploring, and understanding their structure.

What is a Workspace?

A Workspace is the main container in PyHS3 that holds all the components needed to define a statistical model:

  • Distributions: Probability distributions (Gaussian (normal) distribution, Poisson distribution, etc.)

  • Functions: Mathematical functions that compute parameter values (Sum, Product, Generic Function)

  • Domains: Parameter space constraints and bounds

  • Parameter Points: Named sets of parameter values

  • Data: Observed data specifications (point data, unbinned data, binned/histogram data)

  • Likelihoods: Mappings between distributions and data

  • Analyses: Complete analysis configurations

  • Metadata: Version information and documentation

Loading a Workspace

You can create a workspace from a Python dictionary or load one from a JSON file. The next section builds a simple workspace containing a Gaussian distribution directly from a dictionary; loading from a file works the same way once the JSON document has been parsed.
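
A minimal file-loading sketch, assuming workspace.json is a hypothetical file name for a serialized HS3 workspace document:

import json

import pyhs3

# Parse the HS3 JSON document and build a Workspace from the resulting dictionary
with open("workspace.json") as f:
    workspace_data = json.load(f)

ws = pyhs3.Workspace(**workspace_data)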

Exploring Workspace Contents

Once you have a workspace, you can explore its contents:

>>> import pyhs3
>>> workspace_data = {
...     "metadata": {"hs3_version": "0.2"},
...     "distributions": [
...         {
...             "name": "signal",
...             "type": "gaussian_dist",
...             "x": "obs",
...             "mean": "mu",
...             "sigma": "sigma",
...         }
...     ],
...     "parameter_points": [
...         {
...             "name": "nominal",
...             "parameters": [
...                 {"name": "obs", "value": 0.0},
...                 {"name": "mu", "value": 0.0},
...                 {"name": "sigma", "value": 1.0},
...             ],
...         }
...     ],
...     "domains": [
...         {
...             "name": "physics_region",
...             "type": "product_domain",
...             "axes": [
...                 {"name": "obs", "min": -5.0, "max": 5.0},
...                 {"name": "mu", "min": -2.0, "max": 2.0},
...                 {"name": "sigma", "min": 0.1, "max": 3.0},
...             ],
...         }
...     ],
... }
>>> ws = pyhs3.Workspace(**workspace_data)
>>> # Print workspace structure
>>> print(f"Workspace contains:")
Workspace contains:
>>> print(f"- {len(ws.distributions)} distributions")
- 1 distributions
>>> print(f"- {len(ws.functions)} functions")
- 0 functions
>>> print(f"- {len(ws.domains)} domains")
- 1 domains
>>> print(f"- {len(ws.parameter_points)} parameter sets")
- 1 parameter sets
>>> print(f"- {len(ws.data)} data components")
- 0 data components
>>> print(f"- {len(ws.likelihoods)} likelihoods")
- 0 likelihoods
>>> print(f"- {len(ws.analyses)} analyses")
- 0 analyses
>>> # Access distributions
>>> print("Distributions:")
Distributions:
>>> for dist in ws.distributions:
...     print(f"  {dist.name} ({dist.type})")
...     print(f"    Parameters: {sorted(dist.parameters)}")
...
  signal (gaussian_dist)
    Parameters: ['mu', 'obs', 'sigma']
>>> # Access parameter sets
>>> print("Parameter sets:")
Parameter sets:
>>> for param_set in ws.parameter_points:
...     print(f"  {param_set.name}:")
...     for param in param_set.parameters:
...         print(f"    {param.name} = {param.value}")
...
  nominal:
    obs = 0.0
    mu = 0.0
    sigma = 1.0
>>> # Access domains
>>> print("Domains:")
Domains:
>>> for domain in ws.domains:
...     print(f"  {domain.name}:")
...     for axis in domain.axes:
...         print(f"    {axis.name}: [{axis.min}, {axis.max}]")
...
  physics_region:
    obs: [-5.0, 5.0]
    mu: [-2.0, 2.0]
    sigma: [0.1, 3.0]

Understanding Workspace Structure

The workspace follows a hierarchical structure:

---
config:
  darkMode: 'true'
  theme: forest
---
%%{
  init: {
    'theme': 'forest',
    'themeVariables': {
      'primaryColor': '#fefefe',
      'lineColor': '#aaa'
    }
  }
}%%

classDiagram
    class Workspace {
        +metadata: Metadata
        +distributions: list[Distribution]
        +functions: list[Function]
        +domains: list[Domain]
        +parameter_points: list[ParameterSet]
        +data: list[Data]
        +likelihoods: Likelihoods
        +analyses: Analyses
    }

    class Metadata {
        +hs3_version: str
        +authors: optional[list]
        +description: optional[str]
    }

    class Distribution {
        +name: str
        +type: str
        +parameters: dict
    }

    class Function {
        +name: str
        +type: str
        +parameters: dict
    }

    class Domain {
        +name: str
        +type: str
        +axes: list[Axis]
    }

    class ParameterSet {
        +name: str
        +parameters: list[ParameterPoint]
    }

    class Likelihood {
        +name: str
        +distributions: list[str]
        +data: list[str|float|int]
        +aux_distributions: optional[list[str]]
    }

    class Analysis {
        +name: str
        +likelihood: str
        +domains: list[str]
        +parameters_of_interest: optional[list[str]]
        +init: optional[str]
        +prior: optional[str]
    }

    class Datum {
        +name: str
        +type: str
    }

    class PointData {
        +name: str
        +type: "point"
        +value: float
        +uncertainty: optional[float]
    }

    class UnbinnedData {
        +name: str
        +type: "unbinned"
        +entries: list[list[float]]
        +axes: list[Axis]
        +weights: optional[list[float]]
        +entries_uncertainties: optional[list[list[float]]]
    }

    class BinnedData {
        +name: str
        +type: "binned"
        +contents: list[float]
        +axes: list[Axis]
        +uncertainty: optional[GaussianUncertainty]
    }

    Workspace --> Metadata : contains
    Workspace --> Distribution : contains
    Workspace --> Function : contains
    Workspace --> Domain : contains
    Workspace --> ParameterSet : contains
    Workspace --> Datum : contains
    Workspace --> Likelihood : contains
    Workspace --> Analysis : contains
    Datum <|-- PointData : inherits
    Datum <|-- UnbinnedData : inherits
    Datum <|-- BinnedData : inherits
    

Creating Models from Workspaces

The main purpose of a workspace is to create models that you can evaluate:

>>> # Create a model using specific domain and parameter set
>>> model = ws.model(domain="physics_region", parameter_set="nominal")

>>> # Or use defaults (index 0)
>>> model = ws.model()

>>> # Evaluate the model
>>> import numpy as np
>>> result = model.pdf("signal", obs=np.array(0.5), mu=np.array(0.0), sigma=np.array(1.0))
>>> print(f"PDF value: {result}")
PDF value: ...

Example: Complete Physics Model

Here’s a more realistic example of a workspace for a physics analysis using both Gaussian distributions and generic expressions with a sum function:
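
The sketch below is one way such a workspace could look. The gaussian_dist distribution, product_domain, parameter_points layout, and binned data format follow the examples elsewhere in this tutorial; the generic_dist and sum type names (and their expression and summands fields) are assumptions, so consult the PyHS3 distribution and function reference for the exact identifiers your version supports. The resulting physics_ws object is reused in the data-access examples later in this tutorial.

import pyhs3

physics_model = {
    "metadata": {"hs3_version": "0.2"},
    "distributions": [
        {
            # Gaussian signal peak in the invariant mass spectrum
            "name": "signal",
            "type": "gaussian_dist",
            "x": "mass",
            "mean": "mH",
            "sigma": "width",
        },
        {
            # Falling background shape written as a generic expression;
            # the generic_dist type and expression field are assumptions
            "name": "background",
            "type": "generic_dist",
            "expression": "exp(-mass / tau)",
        },
    ],
    "functions": [
        {
            # Total expected yield as the sum of signal and background yields;
            # the sum type and summands field are likewise assumptions
            "name": "total_yield",
            "type": "sum",
            "summands": ["n_sig", "n_bkg"],
        }
    ],
    "domains": [
        {
            "name": "fit_region",
            "type": "product_domain",
            "axes": [
                {"name": "mass", "min": 110.0, "max": 150.0},
                {"name": "mH", "min": 120.0, "max": 130.0},
                {"name": "width", "min": 0.5, "max": 10.0},
                {"name": "tau", "min": 1.0, "max": 100.0},
                {"name": "n_sig", "min": 0.0, "max": 500.0},
                {"name": "n_bkg", "min": 0.0, "max": 5000.0},
            ],
        }
    ],
    "parameter_points": [
        {
            "name": "nominal",
            "parameters": [
                {"name": "mass", "value": 125.0},
                {"name": "mH", "value": 125.0},
                {"name": "width", "value": 2.0},
                {"name": "tau", "value": 30.0},
                {"name": "n_sig", "value": 50.0},
                {"name": "n_bkg", "value": 1000.0},
            ],
        }
    ],
    "data": [
        {
            # Binned data in the format described under "Working with Data Components"
            "name": "observed_mass_spectrum",
            "type": "binned",
            "contents": [45.0, 67.0, 52.0, 38.0],
            "axes": [
                {"name": "mass", "edges": [110.0, 120.0, 130.0, 140.0, 150.0]}
            ],
        }
    ],
}

physics_ws = pyhs3.Workspace(**physics_model)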

Working with Likelihoods and Analyses

Likelihoods and analyses are optional but important components for statistical inference:
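
The entries below follow the Likelihood and Analysis fields shown in the class diagram above; the concrete names are illustrative and refer to the physics example from the previous section.

likelihood_and_analysis = {
    "likelihoods": [
        {
            "name": "mass_fit_likelihood",
            # Each entry in "distributions" is evaluated against the
            # corresponding entry in "data"; entries in "data" reference a
            # data component by name or give a fixed auxiliary value.
            "distributions": ["signal"],
            "data": ["observed_mass_spectrum"],
        }
    ],
    "analyses": [
        {
            "name": "mass_measurement",
            "likelihood": "mass_fit_likelihood",
            "domains": ["fit_region"],
            "parameters_of_interest": ["mH"],  # optional
        }
    ],
}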

Working with Data Components

The data component in PyHS3 provides structured specifications for the observed data used in likelihood evaluations. Three types of data are supported:

Point Data: Single measurements with optional uncertainties (see HS3 data specification)

point_data_example = {
    "name": "higgs_mass_measurement",
    "type": "point",
    "value": 125.09,
    "uncertainty": 0.24,
}

Unbinned Data: Individual data points in multi-dimensional space

unbinned_data_example = {
    "name": "particle_tracks",
    "type": "unbinned",
    "entries": [
        [120.5, 0.8],  # [mass, momentum] for event 1
        [125.1, 1.2],  # [mass, momentum] for event 2
        [122.3, 0.9],  # [mass, momentum] for event 3
    ],
    "axes": [
        {"name": "mass", "min": 100.0, "max": 150.0},
        {"name": "momentum", "min": 0.0, "max": 5.0},
    ],
    "weights": [0.8, 1.0, 0.9],  # optional event weights
    "entries_uncertainties": [  # optional uncertainties for each coordinate
        [0.1, 0.05],
        [0.2, 0.08],
        [0.15, 0.06],
    ],
}

Binned Data: Histogram data with bin contents and optional uncertainties

binned_data_example = {
    "name": "mass_spectrum",
    "type": "binned",
    "contents": [45.0, 67.0, 52.0, 38.0],  # bin contents
    "axes": [
        {
            "name": "mass",
            "edges": [110.0, 120.0, 130.0, 140.0, 150.0],  # irregular binning
        }
    ],
    "uncertainty": {
        "type": "gaussian_uncertainty",
        "sigma": [6.7, 8.2, 7.2, 6.2],  # uncertainties for each bin
        "correlation": 0,  # or correlation matrix for correlated uncertainties
    },
}

# Regular binning alternative
regular_binned_example = {
    "name": "pt_spectrum",
    "type": "binned",
    "contents": [100.0, 80.0, 60.0, 40.0, 20.0],
    "axes": [
        {
            "name": "pt",
            "min": 0.0,
            "max": 100.0,
            "nbins": 5,  # regular binning: 5 bins from 0 to 100
        }
    ],
}

Accessing Data in Workspaces

# Access data components
print(f"\\nData components ({len(physics_ws.data)}):")
for datum in physics_ws.data:
    print(f"  {datum.name} ({datum.type})")
    if hasattr(datum, "value"):
        print(f"    Value: {datum.value}")
    elif hasattr(datum, "contents"):
        print(f"    Bins: {len(datum.contents)}")
    elif hasattr(datum, "entries"):
        print(f"    Events: {len(datum.entries)}")

# Get specific data by name
mass_data = physics_ws.data["observed_mass_spectrum"]
print(f"Data '{mass_data.name}' has {len(mass_data.contents)} bins")

# Check if data exists
if "observed_mass_spectrum" in physics_ws.data:
    print("Mass spectrum data is available")

Data components integrate with likelihoods to define the complete statistical model for parameter estimation and hypothesis testing.