Working with Workspaces¶
This tutorial covers how to work with PyHS3 workspaces - loading, exploring, and understanding their structure.
What is a Workspace?¶
A Workspace is the main container in PyHS3 that holds all the components needed to define a statistical model:
Distributions: Probability distributions (Gaussian/normal distribution, Poisson distribution, etc.)
Functions: Mathematical functions that compute parameter values (Sum, Product, Generic Function)
Domains: Parameter space constraints and bounds
Parameter Points: Named sets of parameter values
Data: Observed data specifications (point data, unbinned data, binned/histogram data)
Likelihoods: Mappings between distributions and data
Analyses: Complete analysis configurations
Metadata: Version information and documentation
Loading a Workspace¶
You can create a workspace from a dictionary or load it from a JSON file. The following example shows a simple workspace with a Gaussian distribution:
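For example, a workspace stored on disk can be parsed with the standard `json` module and the resulting dictionary unpacked into the `Workspace` constructor, exactly as with an inline dictionary. A minimal sketch (the file name is illustrative; the document is written out first purely so the snippet is self-contained):

```python
import json
from pathlib import Path

# Write a minimal HS3 document to disk, purely for illustration.
doc = {
    "metadata": {"hs3_version": "0.2"},
    "distributions": [
        {"name": "signal", "type": "gaussian_dist",
         "x": "obs", "mean": "mu", "sigma": "sigma"}
    ],
}
Path("workspace.json").write_text(json.dumps(doc))

# Load the JSON document back and build the workspace from it:
with open("workspace.json") as f:
    workspace_data = json.load(f)

# ws = pyhs3.Workspace(**workspace_data)  # construct as in the example below
```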
Exploring Workspace Contents¶
Once you have a workspace, you can explore its contents:
>>> import pyhs3
>>> workspace_data = {
... "metadata": {"hs3_version": "0.2"},
... "distributions": [
... {
... "name": "signal",
... "type": "gaussian_dist",
... "x": "obs",
... "mean": "mu",
... "sigma": "sigma",
... }
... ],
... "parameter_points": [
... {
... "name": "nominal",
... "parameters": [
... {"name": "obs", "value": 0.0},
... {"name": "mu", "value": 0.0},
... {"name": "sigma", "value": 1.0},
... ],
... }
... ],
... "domains": [
... {
... "name": "physics_region",
... "type": "product_domain",
... "axes": [
... {"name": "obs", "min": -5.0, "max": 5.0},
... {"name": "mu", "min": -2.0, "max": 2.0},
... {"name": "sigma", "min": 0.1, "max": 3.0},
... ],
... }
... ],
... }
>>> ws = pyhs3.Workspace(**workspace_data)
>>> # Print workspace structure
>>> print("Workspace contains:")
Workspace contains:
>>> print(f"- {len(ws.distributions)} distributions")
- 1 distributions
>>> print(f"- {len(ws.functions)} functions")
- 0 functions
>>> print(f"- {len(ws.domains)} domains")
- 1 domains
>>> print(f"- {len(ws.parameter_points)} parameter sets")
- 1 parameter sets
>>> print(f"- {len(ws.data)} data components")
- 0 data components
>>> print(f"- {len(ws.likelihoods)} likelihoods")
- 0 likelihoods
>>> print(f"- {len(ws.analyses)} analyses")
- 0 analyses
>>> # Access distributions
>>> print("Distributions:")
Distributions:
>>> for dist in ws.distributions:
... print(f" {dist.name} ({dist.type})")
... print(f" Parameters: {sorted(dist.parameters)}")
...
signal (gaussian_dist)
Parameters: ['mu', 'obs', 'sigma']
>>> # Access parameter sets
>>> print()
>>> print("Parameter sets:")
Parameter sets:
>>> for param_set in ws.parameter_points:
... print(f" {param_set.name}:")
... for param in param_set.parameters:
... print(f" {param.name} = {param.value}")
...
nominal:
obs = 0.0
mu = 0.0
sigma = 1.0
>>> # Access domains
>>> print()
>>> print("Domains:")
Domains:
>>> for domain in ws.domains:
... print(f" {domain.name}:")
... for axis in domain.axes:
... print(f" {axis.name}: [{axis.min}, {axis.max}]")
...
physics_region:
obs: [-5.0, 5.0]
mu: [-2.0, 2.0]
sigma: [0.1, 3.0]
Understanding Workspace Structure¶
The workspace follows a hierarchical structure:
classDiagram
class Workspace {
+metadata: Metadata
+distributions: list[Distribution]
+functions: list[Function]
+domains: list[Domain]
+parameter_points: list[ParameterSet]
+data: list[Data]
+likelihoods: Likelihoods
+analyses: Analyses
}
class Metadata {
+hs3_version: str
+authors: optional[list]
+description: optional[str]
}
class Distribution {
+name: str
+type: str
+parameters: dict
}
class Function {
+name: str
+type: str
+parameters: dict
}
class Domain {
+name: str
+type: str
+axes: list[Axis]
}
class ParameterSet {
+name: str
+parameters: list[ParameterPoint]
}
class Likelihood {
+name: str
+distributions: list[str]
+data: list[str|float|int]
+aux_distributions: optional[list[str]]
}
class Analysis {
+name: str
+likelihood: str
+domains: list[str]
+parameters_of_interest: optional[list[str]]
+init: optional[str]
+prior: optional[str]
}
class Datum {
+name: str
+type: str
}
class PointData {
+name: str
+type: "point"
+value: float
+uncertainty: optional[float]
}
class UnbinnedData {
+name: str
+type: "unbinned"
+entries: list[list[float]]
+axes: list[Axis]
+weights: optional[list[float]]
+entries_uncertainties: optional[list[list[float]]]
}
class BinnedData {
+name: str
+type: "binned"
+contents: list[float]
+axes: list[Axis]
+uncertainty: optional[GaussianUncertainty]
}
Workspace --> Metadata : contains
Workspace --> Distribution : contains
Workspace --> Function : contains
Workspace --> Domain : contains
Workspace --> ParameterSet : contains
Workspace --> Datum : contains
Workspace --> Likelihood : contains
Workspace --> Analysis : contains
Datum <|-- PointData : inherits
Datum <|-- UnbinnedData : inherits
Datum <|-- BinnedData : inherits
Creating Models from Workspaces¶
The main purpose of a workspace is to create models that you can evaluate:
>>> # Create a model using specific domain and parameter set
>>> model = ws.model(domain="physics_region", parameter_set="nominal")
>>> # Or use defaults (index 0)
>>> model = ws.model()
>>> # Evaluate the model
>>> import numpy as np
>>> result = model.pdf("signal", obs=np.array(0.5), mu=np.array(0.0), sigma=np.array(1.0))
>>> print(f"PDF value: {result}")
PDF value: ...
Example: Complete Physics Model¶
Here’s a more realistic example of a workspace for a physics analysis using both Gaussian distributions and generic expressions with a sum function:
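A sketch of what such a workspace could look like. The `gaussian_dist` and `product_domain` types match the earlier example; the `generic_dist` and `sum` type names, and their field names (`expression`, `summands`), follow HS3 conventions but are assumptions here rather than verified PyHS3 API:

```python
# Hypothetical signal-plus-background model: a Gaussian mass peak, an
# exponential background written as a generic expression, and a sum
# function combining the two yields.
physics_workspace = {
    "metadata": {"hs3_version": "0.2"},
    "distributions": [
        {
            "name": "signal",
            "type": "gaussian_dist",
            "x": "mass",
            "mean": "m_peak",
            "sigma": "width",
        },
        {
            "name": "background",
            "type": "generic_dist",  # assumed type name
            "x": ["mass"],
            "expression": "exp(-mass / tau)",  # assumed field name
        },
    ],
    "functions": [
        {
            "name": "total_yield",
            "type": "sum",  # assumed type name
            "summands": ["n_signal", "n_background"],  # assumed field name
        }
    ],
    "parameter_points": [
        {
            "name": "nominal",
            "parameters": [
                {"name": "mass", "value": 125.0},
                {"name": "m_peak", "value": 125.0},
                {"name": "width", "value": 2.0},
                {"name": "tau", "value": 30.0},
                {"name": "n_signal", "value": 50.0},
                {"name": "n_background", "value": 1000.0},
            ],
        }
    ],
    "domains": [
        {
            "name": "fit_region",
            "type": "product_domain",
            "axes": [
                {"name": "mass", "min": 100.0, "max": 150.0},
                {"name": "m_peak", "min": 120.0, "max": 130.0},
                {"name": "width", "min": 0.5, "max": 5.0},
                {"name": "tau", "min": 1.0, "max": 100.0},
            ],
        }
    ],
}
```

As with the first example, `pyhs3.Workspace(**physics_workspace)` would then build the workspace object.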
Working with Likelihoods and Analyses¶
Likelihoods and analyses are optional but important components for statistical inference:
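The field names below mirror the `Likelihood` and `Analysis` classes in the structure diagram above; the specific entry names are hypothetical:

```python
# Hypothetical likelihood: pairs model distributions with data, by name.
likelihood_example = {
    "name": "main_likelihood",
    "distributions": ["signal"],  # distribution names from the workspace
    "data": ["observed_data"],    # matching data component names
}

# Hypothetical analysis: selects a likelihood, the domains to use,
# and the parameters of interest.
analysis_example = {
    "name": "signal_measurement",
    "likelihood": "main_likelihood",
    "domains": ["physics_region"],
    "parameters_of_interest": ["mu"],
}
```

These entries would sit in the workspace dictionary under the `"likelihoods"` and `"analyses"` keys, respectively.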
Working with Data Components¶
The data component in PyHS3 provides structured specifications for observed data used in likelihood evaluations. There are three types of data supported:
Point Data: Single measurements with optional uncertainties (see HS3 data specification)
point_data_example = {
"name": "higgs_mass_measurement",
"type": "point",
"value": 125.09,
"uncertainty": 0.24,
}
Unbinned Data: Individual data points in multi-dimensional space
unbinned_data_example = {
"name": "particle_tracks",
"type": "unbinned",
"entries": [
[120.5, 0.8], # [mass, momentum] for event 1
[125.1, 1.2], # [mass, momentum] for event 2
[122.3, 0.9], # [mass, momentum] for event 3
],
"axes": [
{"name": "mass", "min": 100.0, "max": 150.0},
{"name": "momentum", "min": 0.0, "max": 5.0},
],
"weights": [0.8, 1.0, 0.9], # optional event weights
"entries_uncertainties": [ # optional uncertainties for each coordinate
[0.1, 0.05],
[0.2, 0.08],
[0.15, 0.06],
],
}
Binned Data: Histogram data with bin contents and optional uncertainties
binned_data_example = {
"name": "mass_spectrum",
"type": "binned",
"contents": [45.0, 67.0, 52.0, 38.0], # bin contents
"axes": [
{
"name": "mass",
"edges": [110.0, 120.0, 130.0, 140.0, 150.0], # irregular binning
}
],
"uncertainty": {
"type": "gaussian_uncertainty",
"sigma": [6.7, 8.2, 7.2, 6.2], # uncertainties for each bin
"correlation": 0, # or correlation matrix for correlated uncertainties
},
}
# Regular binning alternative
regular_binned_example = {
"name": "pt_spectrum",
"type": "binned",
"contents": [100.0, 80.0, 60.0, 40.0, 20.0],
"axes": [
{
"name": "pt",
"min": 0.0,
"max": 100.0,
"nbins": 5, # regular binning: 5 bins from 0 to 100
}
],
}
Accessing Data in Workspaces
# Access the data components of a workspace ("physics_ws") that defines them
print(f"\nData components ({len(physics_ws.data)}):")
for datum in physics_ws.data:
print(f" {datum.name} ({datum.type})")
if hasattr(datum, "value"):
print(f" Value: {datum.value}")
elif hasattr(datum, "contents"):
print(f" Bins: {len(datum.contents)}")
elif hasattr(datum, "entries"):
print(f" Events: {len(datum.entries)}")
# Get specific data by name
mass_data = physics_ws.data["observed_mass_spectrum"]
print(f"Data '{mass_data.name}' has {len(mass_data.contents)} bins")
# Check if data exists
if "observed_mass_spectrum" in physics_ws.data:
print("Mass spectrum data is available")
Data components integrate with likelihoods to define the complete statistical model for parameter estimation and hypothesis testing.
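For instance, the `mass_spectrum` binned data defined above could be paired with a model distribution in a likelihood entry (the distribution name here is hypothetical):

```python
# A likelihood entry pairing a model distribution with the binned data
# component "mass_spectrum"; distributions and data are matched by position.
mass_fit_likelihood = {
    "name": "mass_fit",
    "distributions": ["signal_plus_background"],  # hypothetical model name
    "data": ["mass_spectrum"],  # one data name per distribution
}
```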