Antonio Aguirre Data Modelling·Bayesian Statistics·Machine Learning

On Bayesian Methodology. Part 4/6

IV. Addressing Computational Problems

The Folk Theorem of Statistical Computing:
  • When computational problems arise, they often indicate an issue with the model itself.
  • Many cases of poor convergence are tied to regions of parameter space that are either irrelevant or nonsensical.
  • Instead of throwing more computational resources at the problem, the first step should be to identify and address potential model pathologies.
  • Starting at Simple and Complex Models and Meeting in the Middle

    Diagnosing computational problems often requires a two-pronged approach:

    • Simplify the problematic model step by step until it works reliably.
    • Start with a simple, well-understood model and gradually add features until the issue reappears.

    This process helps isolate the root cause of the problem.

    Getting a Handle on Models That Take a Long Time to Fit

    Slow computation is often symptomatic of deeper issues, such as poorly performing Hamiltonian Monte Carlo (HMC). However, debugging becomes harder as computation times increase.
    Key strategies include:

    • Viewing model choices as provisional.
    • Fitting multiple models to understand computational and inferential behavior in the applied problem.

    Monitoring Intermediate Quantities

    Saving and plotting intermediate quantities during computation can reveal hidden issues with the model or algorithm. These visualizations often provide valuable clues for debugging.

    Stacking to Reweight Poorly Mixing Chains

    In situations where multiple chains mix slowly but remain within reasonable ranges, stacking can be used:

    • Combine simulations by assigning weights to chains through cross-validation.
    • Particularly useful during model exploration when diagnostics suggest some progress but full convergence remains elusive.

    Posterior Distributions with Multimodality and Difficult Geometry

    Multimodality and complex posterior geometries pose significant challenges:

    • Disjoint Posterior Volumes:
      • Near-zero mass for all but one mode.
      • Symmetric volumes, such as label switching in mixture models.
      • Distinct volumes with significant probability mass.
    • Unstable Tails:
      A single posterior volume with arithmetically unstable regions.

    Each scenario requires tailored strategies for efficient computation.

    Reparameterization

    HMC-based samplers perform best when:

    • The mass matrix is well-tuned.
    • The posterior geometry is smooth, with no sharp corners or irregularities.

    For many classical models, results like the Bernstein-von Mises theorem simplify posterior geometry with sufficient data. When this is not the case, reparameterization can significantly improve computational performance by simplifying posterior geometry.

    Marginalization

    Challenging geometries in posterior distributions often stem from parameter interactions. Marginalizing over certain parameters can simplify computations:

    • Approximations like the Laplace method can be particularly effective for latent Gaussian models.
    • Exploiting the structure of the problem can lead to substantial improvements.

    Adding Prior Information

    Many computational issues can be mitigated by incorporating prior information:

    • Priors help address weakly informative data regions, improving model behavior without sacrificing inference quality.
    • While the primary purpose of priors is not to fix fitting problems, their inclusion often resolves computational challenges.
    Identifiability?
  • “Identification” is an asymptotic property in classical statistics
  • Bayesian inference prioritizes inference with finite data.
  • If data are insufficient for certain aspects of the model, priors can supply the necessary information.
  • Ladder of Abstraction:

    1. Poor mixing of MCMC.
    2. Difficult geometry as a mathematical explanation.
    3. Weakly informative data as a statistical explanation.
    4. Substantive prior information as a solution.

    Addressing computational issues can start at either end of this ladder, transitioning from troubleshooting to workflow optimization.

    Adding Data

    Similar to priors, additional data can constrain models and resolve computational problems:

    • Incorporate new data sources into the model.
    • Models that work well with larger datasets may struggle in small data regimes; expanding the dataset can improve performance.

    Addressing computational problems in Bayesian modeling involves a combination of simplifying models, leveraging prior information, and refining computational techniques. A systematic approach, starting with diagnostics and iterative improvements, ensures both model reliability and computational efficiency.


    V. Evaluating and Using a Fitted Model

    Evaluating a fitted model involves multiple checks, each tailored to the specific goals of the analysis. The aspects of the model that require evaluation depend on the application and the intended users of the statistical methods.

    Posterior Predictive Checking

    Posterior predictive checking involves simulations from the posterior distribution to evaluate model performance:

    • While there’s no universal guide for which checks to perform, conducting a few direct checks can safeguard against gross misspecification.
    • Similarly, there’s no definitive rule for deciding when a failed check necessitates adjustments to the model.
    • The choice of checks depends on the analysis goals and the costs and benefits of adjustments.
    Key Principle:
  • Seek “severe tests”—checks likely to fail if the model produces misleading answers to critical questions.
  • Cross-Validation and Influence of Individual Data Points

    Cross-validation (CV) enhances predictive diagnostics, especially for flexible models, by providing insights into model fit and data influence:

    1. Calibration Checks: Use the cross-validation predictive distribution to assess calibration.
    2. Difficult Observations: Identify observations or groups that are hard to predict.
    3. Influence Diagnostics: Examine the additional information provided by individual observations.
    • Leave-one-out cross-validation (LOO-CV) is a popular method, though it doesn’t always align with inferential goals for multilevel (hierarchical) models.
    • Calibration insights:
      • Posterior predictive checking compares marginal prediction distributions to data.
      • LOO-CV predictive checking evaluates the calibration of conditional predictive distributions.
      • Probability integral transformations (PIT) under good calibration are uniform.
    Practical Tip:
  • Cross-validation for multilevel models requires thoughtful implementation to ensure alignment with inferential goals.
  • Influence of Prior Information

    Understanding how prior information affects posterior inferences is essential for a robust evaluation:

    • A statistical model can be understood in two ways:
      1. Generatively: Explore how parameters map to data using prior predictive simulations.
      2. Inferentially: Examine the path from inputs (data and priors) to outputs (estimates and uncertainties).
    • Sensitivity Analysis:
      • Measure shrinkage between prior and posterior distributions, such as comparing posterior standard deviations or quantiles.
      • Use importance sampling to approximate and compare posteriors across models.
      • Conduct static sensitivity analysis to study posterior sensitivity to prior perturbations without re-fitting the model.
    Practical Insight:
  • Sensitivity analysis highlights the balance between prior information and data, offering valuable diagnostic insights without excessive computation.
  • Summarizing Inference and Propagating Uncertainty

    Traditional methods of summarizing Bayesian inference often fail to fully represent the complexity of variation and uncertainty:

    • Tables and graphs of parameter estimates and uncertainties only capture one-dimensional margins.
    • Marginal posterior distribution graphs become unwieldy for models with many parameters and fail to illustrate the interplay of hierarchical model uncertainties.

    Tools for Advanced Summaries:

    • Use visualization tools like the bayesplot R package to effectively summarize and explore Bayesian inference results.

    Evaluating a fitted model is a multifaceted process. It involves a combination of diagnostic checks, sensitivity analyses, and advanced visualization techniques. The goal is not just to identify potential misfits but to refine the model for better inference and predictive accuracy.


    VI. Modifying a Model

    Constructing a Model for the Data

    Model construction is a creative process where the modeler combines existing components to account for new data, enhance features of existing data, or establish links to underlying processes.

    Model Building as a Task:
  • Model construction is akin to a language-like task, where components are assembled to encompass new data, existing data features, and links to underlying processes.
  • Reasons for Model Expansion:
  • Response to new data.
  • Failures of models fit to existing data.
  • Computational challenges with current fitting procedures.
  • Incorporating Additional Data

    Expanding a model to include more data is a critical step in a Bayesian Workflow.
    It’s often said that the value of a statistical method lies not just in how it handles data but in the choice of what data to use.

    Working with Prior Distributions

    Traditionally, Bayesian statistics refers to noninformative or fully informative priors, but in practice, these rarely exist:

    • Uniform Prior: Depends on parameterization, carrying implicit information.
    • Reference Prior: Based on asymptotic regimes and fictional data assumptions.
    • Informative Prior: Rarely encompasses all available knowledge.

    Ladder of Priors:
    Think of prior distributions as existing on a continuum:

    1. Improper flat prior.
    2. Super-vague but proper prior.
    3. Very weakly informative prior.
    4. Generic weakly informative prior.
    5. Specific informative prior.

    Priors also act as constraints, shrinking estimates toward simpler models. However, the need for prior information varies based on:

    • The role of the parameter in the model.
    • The parameter’s position in the hierarchy.
    Critical Insight: Priors must be specified for each model in the workflow. Expanded models often require additional thought regarding parameterization and joint priors to avoid unintended effects like cancellations or overly stabilizing priors.

    When introducing new parameters:

    • Consider tightening priors on overarching metrics like means and standard deviations.
    • Be cautious of the concentration of measure in higher-dimensional spaces.

    A Topology of Models

    Models within a framework can be thought of as forming a topology or network structure. This structure reflects connections and partial orderings rather than probabilities assigned to individual models.

    Model Topology?:
  • A topology of models refers to their connections and relationships, not a probability space. The goal is not to average over models but to navigate among them effectively.
  • Examples of Model Navigation Tools:

    • Automatic Statistician: Explores models in specified but open-ended classes, like time series or regression models, using inference and model criticism.
    • Prophet: A time series forecasting tool allowing users to build models from predefined building blocks.

    Model Operations: Models, treated as probabilistic random variables, can be combined in multiple ways:

    • Additive, multiplicative, linear mixing, log-linear mixing, pointwise mixing, and more.
      Each model has its internal structure, with parameters estimated from data, and parameters across models can interact (e.g., shared parameters).
    Applications:
  • Forecasting: Using interconnected models to predict future outcomes.
  • Causal Inference: Exploring relationships between variables using networked model structures.