mlpaneldata — Hybrid ML/Econometric Panel Data Library

Key Features

📐

5 Linear Estimators

Pooled OLS, Fixed Effects, Random Effects (Swamy–Arora), Mundlak CRE, and First Differences — all with a unified scikit-learn-style API.

🧠

3 Neural Estimators

Deep NN Panel (Chronopoulos et al., 2023), Interpretable NN with Persistent Change Filter (Yang et al., 2020), and Hybrid Linear+NN.

🧪

24+ Statistical Tests

16+ pre-estimation tests (Hausman, Pesaran CD, IPS unit root, RESET, ...) and 8 post-estimation tests, each suite run with a single function call.

📊

11 Plot Types + Dashboard

Residual diagnostics, PDP/ICE, PCF trajectories, coefficient stability, heterogeneity analysis, and a 9-panel master dashboard.

📋

Publication Tables

Side-by-side regression tables with significance stars, model fit summaries, and diagnostic tables — ready for journals.

📄

Automated Reports

Generate complete Markdown reports bundling pre-tests, models, post-tests, and all visualisations with one fluent API call.

Quick Start

from mlpaneldata.data import simulate_panel
from mlpaneldata.models import FixedEffects, DNNPanel, HybridPanel
from mlpaneldata.tests import full_pretest_suite, full_posttest_suite
from mlpaneldata.plots import diagnostic_dashboard
from mlpaneldata.tables import regression_table

# Generate a simulated panel: 30 units x 40 periods, 5 features, nonlinear DGP
df = simulate_panel(n_units=30, n_periods=40, n_features=5, nonlinear=True)
features = ["x1", "x2", "x3", "x4", "x5"]

# Pre-estimation tests (16+ diagnostics)
pre = full_pretest_suite(df, y="y", X=features)
print(pre.summary())

# Fit a fixed effects baseline and the hybrid linear + NN model
fe = FixedEffects().fit(df, y="y", X=features)
hybrid = HybridPanel(
    linear_part="within",
    nn_part="dnn_panel",
    nn_kwargs=dict(hidden=(32, 32), epochs=150),
).fit(df, y="y", X=features)

# Regression table + dashboard
print(regression_table([fe, hybrid]))
diagnostic_dashboard(hybrid, save="dashboard.png")

Models

#	Model	Type	Class	Paper
1	Pooled OLS	Linear	`PooledOLS`	Classical
2	Fixed Effects (within)	Linear	`FixedEffects`	Classical
3	Random Effects (Swamy–Arora)	Linear	`RandomEffects`	Classical
4	Mundlak (CRE)	Linear	`Mundlak`	Mundlak (1978)
5	First Differences	Linear	`FirstDifferences`	Classical
6	Deep NN Panel	Neural	`DNNPanel`	Chronopoulos et al. (2023)
7	Interpretable NN (INN)	Neural	`INNPanel`	Yang, Zheng & E (2020)
8	Hybrid FE + DNN	Semi-parametric	`HybridPanel`	mlpaneldata
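
To make the linear rows of the table more concrete, here is a minimal NumPy sketch of the within (fixed effects) transformation and the Mundlak device on simulated data. It does not use mlpaneldata's classes: the helper names `unit_means`, `within_ols`, and `mundlak_ols`, and the simulated design, are assumptions made for illustration only.

```python
import numpy as np

def unit_means(a, ids):
    """Return, for every row, the mean of `a` within that row's unit."""
    a = np.asarray(a, dtype=float)
    uniq, inv = np.unique(ids, return_inverse=True)
    sums = np.zeros((len(uniq),) + a.shape[1:])
    np.add.at(sums, inv, a)                      # accumulate rows per unit
    counts = np.bincount(inv).astype(float)
    means = sums / counts.reshape((-1,) + (1,) * (a.ndim - 1))
    return means[inv]                            # broadcast back to row level

def within_ols(y, X, ids):
    """Fixed effects slopes: OLS on unit-demeaned y and X."""
    y_dm = y - unit_means(y, ids)
    X_dm = X - unit_means(X, ids)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

def mundlak_ols(y, X, ids):
    """Mundlak device: pooled OLS of y on [1, X, unit means of X]."""
    Z = np.column_stack([np.ones(len(y)), X, unit_means(X, ids)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef[1:1 + X.shape[1]]                # keep the slopes on X only

# Simulated balanced panel in which the regressors are correlated
# with the unit effects, so pooled OLS is biased.
rng = np.random.default_rng(0)
n_units, n_periods = 30, 40
ids = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(size=n_units)                 # unobserved unit effects
X = rng.normal(size=(n_units * n_periods, 2)) + alpha[ids, None]
beta_true = np.array([1.5, -0.7])
y = X @ beta_true + 2.0 * alpha[ids] + rng.normal(scale=0.5, size=len(ids))

beta_fe = within_ols(y, X, ids)
beta_cre = mundlak_ols(y, X, ids)
pooled_design = np.column_stack([np.ones(len(y)), X])
beta_pooled = np.linalg.lstsq(pooled_design, y, rcond=None)[0][1:]

print("true  :", beta_true)
print("FE    :", beta_fe)
print("CRE   :", beta_cre)
print("pooled:", beta_pooled)
```

In this design the within and Mundlak slopes coincide up to numerical precision, which is the usual motivation for treating the Mundlak regression as a fixed effects estimator that also reports the coefficients on the unit means.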

Real Data Results

Complete analysis of a 20-country × 20-year macro panel (GDP growth, investment, trade, government expenditure, population, inflation, FDI). All results come from the executed tutorial notebook.

Data Exploration

Correlation Matrix

GDP Growth by Country (2003–2022)

Variable Distributions

Bivariate Relationships

Pre-Estimation Tests (16+ Diagnostics)

Regression Table — All Linear Models

Model Fit Summary

Fixed Effects — Diagnostic Dashboard

Residual Diagnostics & Unit Fixed Effects

Residual Diagnostics — FE

Estimated Unit Fixed Effects

Q-Q Plot — FE Residuals

Coefficient Stability Across Countries

Deep NN Panel (Chronopoulos et al., 2023)

9-Panel Diagnostic Dashboard — DNN Panel

Training Loss Curve

Heterogeneity — Fitted vs Actual

Partial Dependence Plots (DNN)

PDP — Investment (x1)

PDP — Trade (x2)

PDP — Inflation (x5)

Feature Importance — DNN Panel

Permutation Importance

Gradient × Input Importance

Persistent Change Filter — Deep Dive

Figure 2 Reproduction (Yang et al., 2020)

k Sensitivity Analysis (7 values)

PCF on 5 Economic Signals — Structural Break, Trend, V-Shape, Business Cycle, Double Jump

p(t) / q(t) / D(t) Decomposition — Understanding the Filter Internals

Learning k via Gradient Descent

Multi-Feature Filtering

PCF Applied to Real GDP Growth — Detecting Persistent Growth Regimes

Interpretable NN (Yang, Zheng & E, 2020)

Feature Importance (|head weight|)

Learned k Values per Reduced Dimension

Hybrid FE + DNN Panel

Diagnostic Dashboard — Hybrid Model

Heterogeneity — Hybrid

Partial Derivatives — Hybrid

Model Comparison — All Estimators

Actual vs Fitted — 3 Models

R² and RMSE Comparison

Q-Q Comparison — FE vs DNN vs Hybrid

Feature Importance Comparison Across Model Types

Architecture

mlpaneldata/

data.py — Simulation, balancing, lagging, description
utils.py — Demeaning, R², AIC, BIC
diagnostics.py — Partial derivatives, marginal effects, importance
plots.py — 11 plot types + 9-panel dashboard
tables.py — Publication-quality regression tables
reports.py — Automated Markdown report builder
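
As an illustration of the fit statistics listed for utils.py, the sketch below computes R², AIC, and BIC under one common Gaussian-likelihood convention with additive constants dropped. The function name `fit_statistics` and this particular convention are assumptions; the library's own definitions may differ.

```python
import numpy as np

def fit_statistics(y, y_hat, n_params):
    """R², AIC and BIC from actual values, fitted values and parameter count."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = y.size
    rss = float(np.sum((y - y_hat) ** 2))        # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    r2 = 1.0 - rss / tss
    # Gaussian log-likelihood concentrated over the error variance,
    # with additive constants dropped: -2 logL = n * log(rss / n).
    neg2_loglik = n * np.log(rss / n)
    aic = neg2_loglik + 2 * n_params
    bic = neg2_loglik + n_params * np.log(n)
    return {"r2": r2, "aic": aic, "bic": bic}

stats = fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
print(stats)
```

Because both criteria share the same likelihood term, they differ only in the penalty: BIC charges log(n) per parameter where AIC charges 2, so BIC favours smaller models once n exceeds about 7.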

mlpaneldata/models/

linear.py — Pooled, FE, RE, Mundlak, FD
filters.py — PersistentChangeFilter (PyTorch)
inn.py — Interpretable NN (Paper 1)
dnn_panel.py — Deep NN Panel (Paper 2)
hybrid.py — Semi-parametric Linear + NN

mlpaneldata/tests/

pretests.py — 16+ pre-estimation tests
posttests.py — 8 post-estimation tests
VIF, Hausman, Pesaran CD, IPS, LLC, CIPS, RESET, ...
Diebold–Mariano, Clark–West forecast comparison
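
To show what one of the listed diagnostics computes, here is a minimal sketch of the Pesaran CD statistic for cross-sectional dependence on a balanced T × N residual matrix. The function name `pesaran_cd` and its return format are assumptions and are not taken from mlpaneldata; the sketch also assumes each unit's residuals have mean zero, so that the Pearson correlation used here matches the pairwise correlation in the test's definition.

```python
import math
import numpy as np

def pesaran_cd(resid):
    """Pesaran CD statistic and two-sided p-value for a T x N residual matrix."""
    resid = np.asarray(resid, dtype=float)
    T, N = resid.shape
    corr = np.corrcoef(resid, rowvar=False)      # N x N pairwise correlations
    upper = np.triu_indices(N, k=1)              # each pair (i < j) once
    cd = math.sqrt(2.0 * T / (N * (N - 1))) * corr[upper].sum()
    # Under the null of no cross-sectional dependence, CD is
    # asymptotically standard normal.
    p_value = math.erfc(abs(cd) / math.sqrt(2.0))
    return cd, p_value

rng = np.random.default_rng(1)
T, N = 40, 20
independent = rng.normal(size=(T, N))            # no cross-unit dependence
common_factor = rng.normal(size=(T, 1))          # shock shared by all units
dependent = independent + common_factor

cd_indep, p_indep = pesaran_cd(independent)
cd_common, p_common = pesaran_cd(dependent)
print(f"independent residuals: CD = {cd_indep:.2f}, p = {p_indep:.3f}")
print(f"common-factor residuals: CD = {cd_common:.2f}, p = {p_common:.3g}")
```

A large absolute CD value, as in the common-factor case, suggests that standard errors assuming independent units are unreliable.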

Papers Implemented

Paper 1

Interpretable Neural Networks for Panel Data Analysis in Economics

Yang, Y., Zheng, Z. & E, W. (2020)

Introduces the Persistent Change Filter (PCF), a differentiable module that captures the duration of persistent jumps in a sequence. The PCF is combined with splitting layers and sparse dimension reduction to form an interpretable architecture.

arXiv:2010.05311

Paper 2

Deep Neural Network Estimation in Panel Data Models

Chronopoulos, I., Chrysikou, K., Kapetanios, G., Mitchell, J. & Raftapostolos, A. (2023)

Decomposes the panel relationship into common h(x;θ) and idiosyncratic h_i(x;θ_i) components — the non-linear analogue of heterogeneous panel models. Enables semi-structural analysis via autograd partial derivatives.

arXiv:2305.19921
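
One way to write the decomposition described for Paper 2 is sketched below. It assumes that the common and idiosyncratic components enter additively and that the error term has zero conditional mean; both are assumptions made here for illustration, and the paper's exact specification may differ. The symbols ε and the feature index k are likewise introduced only for this sketch.

```latex
y_{it} = h(x_{it};\theta) + h_i(x_{it};\theta_i) + \varepsilon_{it},
\qquad \mathbb{E}[\varepsilon_{it} \mid x_{it}] = 0,
\\[6pt]
\frac{\partial\, \mathbb{E}[y_{it} \mid x_{it}]}{\partial x_{it,k}}
  = \frac{\partial h(x_{it};\theta)}{\partial x_{it,k}}
  + \frac{\partial h_i(x_{it};\theta_i)}{\partial x_{it,k}}
```

Under these assumptions, the partial derivative of the fitted response with respect to feature k splits into a part shared by all units and a unit-specific part, which is the kind of quantity that autograd can evaluate at each observation.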