Python package guide
policyengine.py for UK tax and benefit analysis
Reference examples covering household impact, policy reforms, microsimulation over calibrated microdata, and regional breakdowns.
Core concepts
The Python guide now follows the unified policyengine package. Four concepts show up throughout the workflow:
Household calculator
Call pe.uk.calculate_household(...) with plain Python dicts; the typed result exposes every variable in the model.
Datasets
Use pe.uk.ensure_datasets() to load representative microdata, then feed it into Simulation.
Reforms as dicts
A reform is a {"param.path": value} dict. Same shape for reform= (household) and policy= (microsim).
Outputs
Aggregate, ChangeAggregate, and pe.uk.economic_impact_analysis() turn simulations into analysis.
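Under the hood, a population aggregate is a weighted sum over microdata records, and a change aggregate is the reform total minus the baseline total. The sketch below shows only that arithmetic, in plain Python on made-up numbers. `Record`, `weighted_total`, and `change_in_total` are hypothetical names, and this is not how `Aggregate` or `ChangeAggregate` are implemented or called.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One microdata row: a survey weight and a variable value (made-up data)."""
    weight: float
    value: float


def weighted_total(records: list[Record]) -> float:
    """Weighted sum: each record stands in for `weight` similar households."""
    return sum(r.weight * r.value for r in records)


def change_in_total(baseline: list[Record], reform: list[Record]) -> float:
    """Reform total minus baseline total over the same records and weights."""
    if len(baseline) != len(reform):
        raise ValueError("baseline and reform must cover the same records")
    return weighted_total(reform) - weighted_total(baseline)


# Illustrative numbers only: three records, same weights under both scenarios.
baseline = [Record(800.0, 22_000.0), Record(150.0, 30_000.0), Record(450.0, 100_000.0)]
reform = [Record(800.0, 22_500.0), Record(150.0, 30_000.0), Record(450.0, 99_000.0)]

print(weighted_total(baseline))           # 67100000.0
print(change_in_total(baseline, reform))  # -50000.0
```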
UK entity hierarchy
Each output comes back at the entity level where its variable is defined. Viewing it at any other level, for example summing person-level income up to the household, is a mapping operation.
| Entity | Scope | Example variables |
|---|---|---|
| person | Individual | employment_income, age, is_disabled_for_benefits |
| benunit | Benefit unit (adult(s) + dependent children) | universal_credit, child_benefit |
| household | All people at one address | household_net_income, region, council_tax |
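Mapping between entities comes down to two moves. The first sums person values up to a group, and the second broadcasts a group value back down to its members. The sketch below illustrates both with plain lists. The function names and the `household_id` membership list are hypothetical, and policyengine.py performs these mappings internally rather than through these helpers.

```python
from collections import defaultdict


def sum_to_group(values: list[float], group_ids: list[int]) -> dict[int, float]:
    """Map person-level values up to their group (benunit or household) by summing members."""
    if len(values) != len(group_ids):
        raise ValueError("need one group id per person")
    totals: dict[int, float] = defaultdict(float)
    for value, group_id in zip(values, group_ids):
        totals[group_id] += value
    return dict(totals)


def broadcast_to_people(group_values: dict[int, float], group_ids: list[int]) -> list[float]:
    """Map group-level values down to people: every member sees its group's value."""
    return [group_values[group_id] for group_id in group_ids]


# Hypothetical example: four people living in two households.
employment_income = [30_000.0, 0.0, 18_000.0, 6_000.0]
household_id = [0, 0, 1, 1]

household_earnings = sum_to_group(employment_income, household_id)
print(household_earnings)  # {0: 30000.0, 1: 24000.0}
print(broadcast_to_people(household_earnings, household_id))  # [30000.0, 30000.0, 24000.0, 24000.0]
```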
Parameter types
Every reform target is a parameter. Knowing which shape a parameter has tells you how to reference it in a reform dict. The gov.irs paths below are US-model examples that illustrate the shapes; UK parameters sit under branches such as gov.hmrc and gov.dwp.
Scalar, e.g. gov.irs.credits.ctc.amount.base[0].amount: a single amount, rate, or threshold. Set a new value for a date range.
List, e.g. gov.irs.credits.refundable: a list of values, often names of variables that qualify for a rule.
Scale, e.g. gov.hmrc.income_tax.rates: graduated thresholds and rates. Access them via .thresholds, .rates, .amounts.
Breakdown, e.g. gov.irs.credits.ctc.phase_out.threshold.JOINT: a parameter broken down by an enum (filing status, age band, region).
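A scale parameter such as gov.hmrc.income_tax.rates is applied band by band: each rate covers the slice of income between its threshold and the next one. The sketch below is a plain-Python illustration of that marginal-rate logic. It assumes one rate per threshold and a first threshold of zero. The band values are illustrative and are not read from the model, and `apply_marginal_scale` is a hypothetical helper, not part of policyengine.py.

```python
def apply_marginal_scale(amount: float, thresholds: list[float], rates: list[float]) -> float:
    """Sum rate * (slice of amount inside each band); rates[i] applies from thresholds[i] to thresholds[i + 1]."""
    if len(thresholds) != len(rates):
        raise ValueError("a scale needs exactly one rate per threshold")
    total = 0.0
    for i, (lower, rate) in enumerate(zip(thresholds, rates)):
        if amount <= lower:
            break  # this band and every higher band are empty
        upper = thresholds[i + 1] if i + 1 < len(thresholds) else float("inf")
        total += (min(amount, upper) - lower) * rate
    return total


# Illustrative bands only (not read from the model): 20% / 40% / 45%.
thresholds = [0.0, 37_700.0, 125_140.0]
rates = [0.20, 0.40, 0.45]

print(round(apply_marginal_scale(50_000.0, thresholds, rates), 2))  # 12460.0
```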
Simulation
Household-level analysis
Per-household calculations with pe.uk.calculate_household: reforms, variation grids, programmatic builders, tracing, and charts.
Start with pe.uk.calculate_household()
For a single, explicitly specified household, call calculate_household with plain Python dicts. There is no wrapper class and no situation dictionary: pass keyword arguments for people and for each group entity, plus a year. The result is a typed object with one attribute per entity section.
pip install "policyengine[uk]"import policyengine as pe
result = pe.uk.calculate_household(
# One dict per person - keys are any person-level variable on the UK model.
people=[
{"age": 35, "employment_income": 30000}, # primary earner
{"age": 33}, # partner
{"age": 8}, # dependent
{"age": 5}, # dependent
],
# Benefit unit (adult(s) + dependent children). Empty dict uses defaults.
benunit={},
# Household inputs.
household={"rent": 12000, "region": "NORTH_WEST"},
# Year determines which parameter values apply.
year=2026,
)
# Attribute access on the typed result. Group entities (benunit, household)
# are single objects; person sections are lists (result.person[0]).
print(f"Net income: £{result.household.hbai_household_net_income:,.0f}")
print(f"Child benefit: £{result.benunit.child_benefit:,.0f}")
print(f"Universal credit: £{result.benunit.universal_credit:,.0f}")Net income: £43,338 Child benefit: £2,328 Universal credit: £15,639
Microsimulation
Population-level analysis
Aggregate estimates over calibrated microdata: weighted totals, baseline-vs-reform impacts, regional slices, and distributional charts.
Representative datasets replace the old Microsimulation entry point
For population analysis, move to dataset-backed Simulation objects. pe.uk.ensure_datasets() is the entry point: it loads cached HDF5 datasets when present and otherwise downloads and uprates them. Simulation.ensure() is the new canonical run method: it loads a cached result if one is available and otherwise runs the simulation and caches the result. pe.uk.model supplies the pinned TaxBenefitModelVersion.
```python
import policyengine as pe
from policyengine.core import Simulation

year = 2026

# ensure_datasets downloads from Hugging Face on first run, caches locally,
# and returns a {"<stem>_<year>": Dataset} dict.
datasets = pe.uk.ensure_datasets(
    datasets=["hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"],
    years=[year],
    data_folder="./data",
)
dataset = datasets[f"enhanced_frs_2023_24_{year}"]

# pe.uk.model is the country model version pinned by this policyengine.py release.
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.uk.model)

# ensure() loads a cached run if available, otherwise runs and caches.
simulation.ensure()

output = simulation.output_dataset.data
print(output.household[["household_net_income", "household_tax"]].head())
```

```
       weight  household_net_income  household_tax
0  808.091309          22852.031250    3373.469727
1  166.748154          29921.960938    4778.160645
2  467.949768         102740.921875   74814.242188
3  181.570221          33344.003906    5540.328613
4  515.411926          34987.277344    5446.388184
```
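The comment in the example above says the dict returned by ensure_datasets is keyed as "<stem>_<year>". A small helper can rebuild that key from the dataset URI instead of hard-coding the string. `dataset_key` is a hypothetical helper, and it assumes the stem is the file name without its .h5 suffix, as in the enhanced_frs_2023_24 example.

```python
from pathlib import PurePosixPath


def dataset_key(uri: str, year: int) -> str:
    """Rebuild the "<stem>_<year>" key from a dataset URI such as an hf:// path."""
    stem = PurePosixPath(uri).stem  # file name without its final suffix
    return f"{stem}_{year}"


uri = "hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"
print(dataset_key(uri, 2026))  # enhanced_frs_2023_24_2026

# Hypothetical usage with the dict from the example above:
# dataset = datasets[dataset_key(uri, year)]
```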
```python
# Old mental model:
# from policyengine_uk import Microsimulation
# sim = Microsimulation(dataset=...)
#
# New policyengine.py mental model:
# import policyengine as pe
# datasets = pe.uk.ensure_datasets(...)
# simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.uk.model)
```

```python
import policyengine as pe
from policyengine.core import Simulation

# Reuses `datasets` and `year` from the ensure_datasets() example above.
dataset = datasets[f"enhanced_frs_2023_24_{year}"]
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.uk.model)
simulation.ensure()
print(simulation.release_bundle["bundle_id"])
print(type(simulation.output_dataset.data.household).__name__)
```

```
uk-4.0.0
MicroDataFrame
```
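The load-or-run behaviour described for Simulation.ensure() follows a standard cache-or-compute pattern. The sketch below shows that pattern with a JSON file as the cache. `ensure_cached` and `expensive_run` are hypothetical stand-ins; this guide does not describe the real method's cache location, format, or invalidation rules.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def ensure_cached(cache_path: Path, compute: Callable[[], dict]) -> dict:
    """Return the cached result if present; otherwise compute it, write it to disk, and return it."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(result))
    return result


runs: list[int] = []


def expensive_run() -> dict:
    """Placeholder for a slow simulation; records each time it actually executes."""
    runs.append(1)
    return {"total": 1.0}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "results" / "run.json"
    first = ensure_cached(cache, expensive_run)   # runs and caches
    second = ensure_cached(cache, expensive_run)  # loads from the cache

print(first == second, len(runs))  # True 1
```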
Reproducibility
Pin, verify, export
A policyengine.py release pins a country model to an exact certified data artifact and refuses to mix a model with data it was not certified against. Pin the release in your requirements, verify the two manifest layers, and emit a TRACE TRO for citations.
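Refusing to mix a model with uncertified data amounts to comparing a fingerprint of the data artifact with the one recorded at certification time. The sketch below illustrates that idea with a SHA-256 digest of the file bytes. The hashing scheme is an assumption made for illustration: the release bundle names its rule (compatibility_basis: matching_data_build_fingerprint) without this guide saying how the fingerprint is computed. `file_fingerprint` and `require_certified` are hypothetical helpers, not policyengine.py functions.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """SHA-256 of a file's bytes, read in 1 MiB chunks so large HDF5 files stay cheap on memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_certified(path: Path, expected_fingerprint: str) -> None:
    """Refuse to proceed when the data file does not match the certified fingerprint."""
    actual = file_fingerprint(path)
    if actual != expected_fingerprint:
        raise RuntimeError(f"{path} is not the certified artifact (got {actual[:12]}...)")


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "example.h5"  # stand-in bytes, not a real dataset
    data_file.write_bytes(b"certified bytes")
    certified = file_fingerprint(data_file)
    require_certified(data_file, certified)  # matches, so no error
    data_file.write_bytes(b"tampered bytes")
    try:
        require_certified(data_file, certified)
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True

print(mismatch_detected)  # True
```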
Pin the bundle and save it next to every output
The user-facing reproducibility boundary in policyengine.py is the certified runtime bundle. It pins a policyengine.py version to an exact country-model version and an exact certified data artifact. Version 4 adds a hard certification check at import time: the installed country package must match the bundled manifest. In practice, pin policyengine in your requirements and write simulation.release_bundle to disk alongside every result you publish.
```python
# Step 1: pin the exact policyengine.py release in your environment.
# pip install "policyengine[uk]==4.3.0"

# Step 2: capture the certified runtime bundle next to every output you save.
import json
from pathlib import Path

import policyengine as pe
from policyengine.core import Simulation

datasets = pe.uk.ensure_datasets(years=[2026], data_folder="./data")
dataset = next(iter(datasets.values()))
simulation = Simulation(dataset=dataset, tax_benefit_model_version=pe.uk.model)
simulation.ensure()

bundle = simulation.release_bundle
Path("outputs").mkdir(exist_ok=True)
Path("outputs/release_bundle.json").write_text(json.dumps(bundle, indent=2, default=str))

print("bundle_id:", bundle["bundle_id"])
print("country:", bundle["country_id"])
print("model:", bundle["model_package"], bundle["model_version"])
print("data:", bundle["data_package"], bundle["data_version"])
print("dataset:", bundle["dataset_filepath"])
```

```
bundle_id: uk-4.0.0
country: uk
model: policyengine-uk 2.88.0
data: policyengine-uk-data 1.40.4
dataset: ./data/enhanced_frs_2023_24_year_2026.h5
```
References
Where to go after the walkthrough
The model explorer, the policyengine.py repo, and the release-bundle docs are the three sources of truth. Use the quick-reference block below to check the bundle attached to any simulation you have already run.
```python
# After running a simulation, inspect the certified runtime bundle.
print(simulation.release_bundle)
```

```
{'bundle_id': 'uk-4.0.0', 'country_id': 'uk', 'policyengine_version': '4.0.0', 'model_package': 'policyengine-uk', 'model_version': '2.88.0', 'data_package': 'policyengine-uk-data', 'data_version': '1.40.4', 'default_dataset': 'enhanced_frs_2023_24', 'certified_data_build_id': 'policyengine-uk-data-1.40.4', 'compatibility_basis': 'matching_data_build_fingerprint', ...}
```

Variables and parameters
Use the model explorer after the walkthrough when you need exact variable names or parameter paths.
Release bundles
The release-bundles doc describes the two manifest layers, the fingerprint compatibility rule, and artifact states.
Working scripts
The checked-in examples in policyengine.py are the best place to look when you need a longer end-to-end pattern or paper-style reproduction.