Conditional Causal Discovery Interface

Enhancing Convincingness, Variety, and Discoverability

G-RIPS Sendai 2024 | Fujitsu Group

Team: John Forde, Gaspar Mendez, Akane Okubo, Daniel Quigley, Renji Sakamoto

Mentors: Fabiana Ferracina, Jorge Gutierrez, Hiroyuki Higuchi

Team Fujitsu

Left to right: Renji Sakamoto, Akane Okubo, Fabiana Ferracina, Gaspar Mendez, Hiroyuki Higuchi, Jorge Gutierrez, Daniel Quigley, John Forde

Overview

  • Problem: Understanding complex causal relationships in data
  • Challenge: "Spaghetti graphs" are hard to interpret and compare
  • Solution: Interactive interface built on CVD (Convincingness, Variety, Discoverability) principles
  • Demo: Live walkthrough of the interface
  • Future: Extensions and research directions

The Causality Challenge

"A is the cause of B" - seemingly simple, but it has profound implications across domains: mathematics, computer science, philosophy, and the social sciences

  • Traditional approach: Single causal graph (DirectLiNGAM)
  • Wide Learning™: Multiple graphs under different conditions
  • Problem: Too many graphs to comprehend effectively

The "Spaghetti Graph" Problem

Messy Spaghetti Graph

Complex causal graphs are inherently difficult to interpret

CVD: Our Design Framework

Convincingness ↔ Variety ↔ Discoverability

CVD Triangle

CV/CD/VD

Assumptions: convincingness encompasses at least explainability; variety encompasses at least the Rashomon set; discoverability encompasses at least uncovering new or unexpected causal relationships.

We consider the pairwise interaction of these concepts:

  • CV - we explore by explaining elements of the Rashomon set that share similarities
  • CD - we explore by understanding how and why these causal relationships came to be, through independent or directed evaluation of evidence
  • VD - we explore by choosing elements of the Rashomon set that produce new or unexpected results
Model Library Approach
Model Builder Approach
Model Simplifier Approach

Understanding the Background and Necessary Tools

Directed Acyclic Graphs (DAGs)

Mathematical foundation for representing causal structures

  • Nodes: Variables in the system
  • Directed edges: Causal relationships
  • Acyclic: No directed path leads from a variable back to itself
  • Constraint: Prevents circular causality
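The acyclicity constraint can be checked mechanically. Below is a minimal Python sketch, not part of the interface, that uses Kahn's topological-sort algorithm. It assumes the graph is a plain dictionary mapping each node to its children. A graph is a DAG exactly when every node can be placed in a topological order.

```python
from collections import deque

def topological_order(children):
    """Return a topological order of the nodes, or None if the graph has a cycle.

    `children` maps each node to the list of nodes it points to (its direct effects).
    """
    nodes = set(children)
    for targets in children.values():
        nodes.update(targets)
    # Count incoming edges for every node.
    indegree = {node: 0 for node in nodes}
    for targets in children.values():
        for t in targets:
            indegree[t] += 1
    # Start from nodes with no parents (exogenous variables).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for t in children.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    # Any node left unprocessed lies on, or downstream of, a cycle.
    return order if len(order) == len(nodes) else None

def is_dag(children):
    return topological_order(children) is not None

print(topological_order({"Temperature": ["IceCream", "Drowning"], "IceCream": [], "Drowning": []}))
# ['Temperature', 'IceCream', 'Drowning']
print(is_dag({"A": ["B"], "B": ["C"], "C": ["A"]}))  # False
```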
DAGs

Causal Discovery

Recovering DAG representations from observational data

Key Challenge: Given only the observational distribution, one can in general identify just a Markov equivalence class: a set of DAGs that are statistically indistinguishable
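A standard result by Verma and Pearl makes the equivalence class concrete. Two DAGs are Markov equivalent exactly when they share the same skeleton (undirected edges) and the same v-structures (colliders a → c ← b with a and b not adjacent). The sketch below checks this for three-variable graphs. Edges are given as hypothetical (parent, child) tuples.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed (parent, child) edges."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Unshielded colliders a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    found = set()
    for c, pa in parents.items():
        for a, b in combinations(sorted(pa), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def markov_equivalent(edges1, edges2):
    """Verma-Pearl criterion: same skeleton and same v-structures."""
    return skeleton(edges1) == skeleton(edges2) and v_structures(edges1) == v_structures(edges2)

chain = {("X", "Y"), ("Y", "Z")}     # X -> Y -> Z
reverse = {("Z", "Y"), ("Y", "X")}   # X <- Y <- Z
fork = {("Y", "X"), ("Y", "Z")}      # X <- Y -> Z
collider = {("X", "Y"), ("Z", "Y")}  # X -> Y <- Z

print(markov_equivalent(chain, reverse), markov_equivalent(chain, fork))  # True True
print(markov_equivalent(chain, collider))                                # False
```

The chain, its reversal, and the fork imply the same conditional independencies, so observational data alone cannot separate them. Only the collider differs.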

Causal Discovery

Structural Equation Model (SEM)

Structural equation modeling (SEM) is a multivariate statistical framework for specifying and testing systems of relationships among observed and latent variables in a single, integrated model

\(M_{SEM} = (U, V, F, P)\)

  • U: Exogenous variables (external factors)
  • V: Endogenous variables (observed)
  • F: Functions defining relationships
  • P: Joint probability distribution over the exogenous variables U
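The sketch below is a hypothetical instance of \(M_{SEM}\). It assumes that F is linear, with a weight matrix B so that V = BV + U, and that P draws each exogenous term independently from a uniform (non-Gaussian) distribution. Because the graph is acyclic, the endogenous variables can be solved directly as V = (I - B)^{-1} U. The matrix B and its weights are made up for illustration.

```python
import numpy as np

def sample_linear_sem(B, n, rng):
    """Draw n samples from the linear SEM V = B V + U.

    B[i, j] is the direct effect of V_j on V_i (nonzero means edge j -> i); B must describe a DAG.
    U (exogenous noise) is uniform on [-1, 1], independent across variables.
    """
    p = B.shape[0]
    U = rng.uniform(-1.0, 1.0, size=(n, p))
    # Rows are samples, so each row v satisfies v^T = (I - B)^{-1} u^T, i.e. V = U (I - B)^{-T}.
    return U @ np.linalg.inv(np.eye(p) - B).T

# Hypothetical three-variable model: V0 -> V1 -> V2 with weights 0.8 and -0.5.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.5, 0.0]])
V = sample_linear_sem(B, n=5000, rng=np.random.default_rng(0))
print(V.shape)  # (5000, 3)
```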

Direct Linear Non-Gaussian Acyclic Model

Resolves ambiguity by exploiting non-Gaussianity

Result: Identifies a unique DAG rather than an equivalence class, provided that:

  • Variables are continuous
  • Relationships are linear
  • Noise terms are mutually independent
  • Noise terms are non-Gaussian

DirectLiNGAM: Algorithm

Iterative process to identify causal ordering

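  1. For each candidate variable, regress the others on it and measure how independent it is of the residuals
  2. Select the most independent candidate as the next exogenous variable (no parents among those remaining)
  3. Remove its effect from the remaining variables and repeat
  4. Use the resulting causal ordering to estimate edge weights, giving a unique causal structure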
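The four steps can be sketched in Python with NumPy. This is a simplified illustration, not the interface's implementation, and it makes three simplifications:

- It scores independence with the pairwise likelihood-ratio measure of Hyvärinen and Smith (2013), which the open-source `lingam` package also offers.
- It ignores prior knowledge.
- It estimates edge weights by ordinary least squares, without the pruning step (for example, adaptive Lasso) that a full implementation would use.

The data are simulated with skewed (exponential) noise, and the columns are shuffled so that the order has to be discovered.

```python
import numpy as np

def entropy_approx(u):
    """Maximum-entropy approximation of the differential entropy of a standardized variable."""
    k1, k2, gamma = 79.047, 7.4129, 0.37457
    return ((1 + np.log(2 * np.pi)) / 2
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-u ** 2 / 2)) ** 2)

def standardize(x):
    """Rescale to zero mean and unit variance."""
    return (x - x.mean()) / x.std()

def residual(xi, xj):
    """Residual of the least-squares regression of xi on xj (both zero-mean)."""
    return xi - (xi @ xj) / (xj @ xj) * xj

def causal_order(X):
    """Steps 1-3: repeatedly pick the most exogenous variable and regress it out."""
    X = X - X.mean(axis=0)  # centered working copy; the input array is not modified
    remaining, order = list(range(X.shape[1])), []
    while remaining:
        penalties = []
        for i in remaining:
            xi, penalty = standardize(X[:, i]), 0.0
            for j in remaining:
                if j == i:
                    continue
                xj = standardize(X[:, j])
                # Positive when "xi -> xj" explains the pair better than "xj -> xi".
                diff = (entropy_approx(xj) + entropy_approx(standardize(residual(xi, xj)))
                        - entropy_approx(xi) - entropy_approx(standardize(residual(xj, xi))))
                penalty += min(0.0, diff) ** 2
            penalties.append(penalty)
        k = remaining[int(np.argmin(penalties))]  # candidate with the least evidence of parents
        order.append(k)
        remaining.remove(k)
        for i in remaining:  # remove k's influence from the variables still in play
            X[:, i] = residual(X[:, i], X[:, k])
    return order

def estimate_weights(X, order):
    """Step 4: regress each variable on its predecessors in the causal order (no pruning)."""
    X = X - X.mean(axis=0)
    B = np.zeros((X.shape[1], X.shape[1]))  # B[i, j] = estimated effect of x_j on x_i
    for pos, i in enumerate(order):
        parents = order[:pos]
        if parents:
            B[i, parents] = np.linalg.lstsq(X[:, parents], X[:, i], rcond=None)[0]
    return B

# Hypothetical data: x0 -> x1 -> x2 with exponential noise, columns stored as [x2, x0, x1].
rng = np.random.default_rng(42)
e = rng.exponential(1.0, size=(10_000, 3))
x0 = e[:, 0]
x1 = x0 + e[:, 1]
x2 = x1 + e[:, 2]
X = np.column_stack([x2, x0, x1])
order = causal_order(X)
print(order)  # [1, 2, 0], i.e. x0, x1, x2
print(np.round(estimate_weights(X, order), 2))
```

A candidate is penalized only when some pair suggests it is the effect rather than the cause. A truly exogenous variable therefore scores close to zero.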
Direct LiNGAM

DirectLiNGAM: Example

Variables: Temperature, Ice Cream Sales, Drowning Deaths

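  • Step 1: Test all variables. Regressing the others on Temperature leaves residuals most independent of it → Temperature has no parents
  • Step 2: Remove Temperature's effect and test the remaining variables. The residuals of Ice Cream Sales and Drowning Deaths are independent of each other, so no edge links them
  • Step 3: Only Drowning Deaths is left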
  • Result: Temperature → Ice Cream Sales, Temperature → Drowning Deaths

Both are caused by temperature, but neither causes the other!
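The same conclusion can be illustrated numerically with simulated, hypothetical data. The sketch uses partial correlation, the correlation left after linearly regressing out Temperature. This is a simple stand-in for the non-Gaussian independence tests that DirectLiNGAM actually relies on.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Hypothetical data: temperature drives both outcomes; they do not affect each other.
temperature = rng.uniform(15, 35, n)
ice_cream = 2.0 * temperature + rng.uniform(-5, 5, n)
drowning = 0.5 * temperature + rng.uniform(-2, 2, n)

def residualize(y, x):
    """Remove the least-squares linear effect of x (with intercept) from y."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

raw = np.corrcoef(ice_cream, drowning)[0, 1]
partial = np.corrcoef(residualize(ice_cream, temperature),
                      residualize(drowning, temperature))[0, 1]
print(f"raw correlation: {raw:.2f}, after removing temperature: {partial:.2f}")
```

The raw correlation is strong, but it essentially vanishes once the common cause is accounted for.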

Wide Learning (WL)

Fujitsu Causal Discovery

Symbolic classification model for causal discovery

\(M_{WL} = (\text{features}, \text{combinator}, \text{evaluation})\)

Generates multiple causal graphs under different conditions

  • Finds all feature combinations (up to \(K \leq 4\))
  • Evaluates via mutual information or entropy
  • Identifies which combinations contribute to outcome
  • Creates condition-specific causal graphs
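The enumeration idea can be sketched in a few lines. This is not Fujitsu's Wide Learning implementation. The sketch makes its own assumptions:

- Features are binary.
- Each combination is treated as a conjunction, meaning all of its conditions hold.
- Each combination is scored by mutual information with a binary outcome.

Feature names and data are hypothetical.

```python
import math
from itertools import combinations, product

def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length 0/1 lists."""
    n = len(a)
    mi = 0.0
    for va, vb in product((0, 1), repeat=2):
        joint = sum(1 for x, y in zip(a, b) if x == va and y == vb) / n
        if joint > 0:
            pa = a.count(va) / n
            pb = b.count(vb) / n
            mi += joint * math.log(joint / (pa * pb))
    return mi

def rank_conditions(rows, outcome, features, max_k=4):
    """Score every conjunction of up to max_k binary features by MI with the outcome."""
    scores = {}
    for k in range(1, max_k + 1):
        for combo in combinations(features, k):
            indicator = [int(all(row[f] for f in combo)) for row in rows]
            scores[combo] = mutual_information(indicator, outcome)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

features = ["low_stress", "enough_sleep", "tutoring"]
rows = [dict(zip(features, bits)) for bits in product((0, 1), repeat=3)]
# Hypothetical outcome: good performance only under low stress AND enough sleep.
outcome = [int(r["low_stress"] and r["enough_sleep"]) for r in rows]
for combo, score in rank_conditions(rows, outcome, features)[:3]:
    print(combo, round(score, 3))
```

The conjunction that generated the outcome scores highest, because its indicator determines the outcome completely.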

Wide Learning: Example

Student Performance Study

DirectLiNGAM:

Study hours → Performance (one global graph)

Wide Learning discovers:

  • High stress, low sleep: Study hours have little effect
  • Low stress, adequate sleep: Study hours strongly affect performance
  • With tutoring: Study hours → understanding → performance
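The student example can be mimicked with simulated data to show why a single global graph can hide condition-specific effects. The sketch below is not Wide Learning. It fits one regression slope overall and one slope per hypothetical condition, using assumed effect sizes of 2.0 and 0.2.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
# Hypothetical data: the effect of study hours depends on a stress/sleep condition.
hours = rng.uniform(0, 10, n)
calm_and_rested = rng.integers(0, 2, n).astype(bool)  # low stress, adequate sleep
effect = np.where(calm_and_rested, 2.0, 0.2)         # assumed per-condition effect sizes
performance = 50 + effect * hours + rng.normal(0, 3, n)

# One global slope vs. one slope per condition.
global_slope = np.polyfit(hours, performance, 1)[0]
slopes = {label: np.polyfit(hours[mask], performance[mask], 1)[0]
          for label, mask in [("calm_and_rested", calm_and_rested),
                              ("stressed_or_tired", ~calm_and_rested)]}
print(round(float(global_slope), 2), {k: round(float(v), 2) for k, v in slopes.items()})
```

The global slope lands between the two conditional slopes. It describes neither group well.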

Rashomon Sets

Set of distinct models that explain the data (almost) equally well

Model Multiplicity: Many structurally different models achieve nearly identical performance

Named after Akira Kurosawa's 1950 film Rashomon, in which the same event receives multiple plausible but conflicting accounts
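In code, a Rashomon set is a filter on model scores. The sketch below assumes each candidate DAG already has a loss where lower is better (for example, AIC) and uses an additive tolerance `epsilon`. The model names and numbers are made up.

```python
def rashomon_set(losses, epsilon):
    """Return the models whose loss is within epsilon of the best (lowest) loss."""
    best = min(losses.values())
    return sorted(name for name, loss in losses.items() if loss <= best + epsilon)

# Hypothetical losses for four candidate DAGs (lower is better).
losses = {"dag_A": 102.3, "dag_B": 102.9, "dag_C": 103.8, "dag_D": 115.0}
print(rashomon_set(losses, epsilon=2.0))  # ['dag_A', 'dag_B', 'dag_C']
```

The choice of `epsilon` controls variety. A larger tolerance admits more structurally different graphs for the user to compare.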

Rashomon Set

Model Evaluation Metrics

Assess fit and select among competing DAG structures

  • AIC: Akaike Information Criterion
  • CFI: Comparative Fit Index
  • RMSEA: Root Mean Square Error of Approximation

AIC: Akaike Information Criterion

\(\text{AIC} = 2k - 2\ln(L)\)

  • k: Number of parameters
  • L: Maximized likelihood of the model
  • Balances fit against complexity
  • Lower AIC = better model
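For a linear model with Gaussian errors, \(\ln(L)\) at the maximum-likelihood estimates has a closed form. AIC can therefore be compared between a graph with an edge and one without it. The sketch assumes Gaussian errors and counts the error variance as a parameter. The interface's SEM fitting may compute the likelihood differently, and the data are simulated.

```python
import numpy as np

def gaussian_aic(residuals, k):
    """AIC = 2k - 2 ln(L) for a model with Gaussian errors at the MLE of the variance."""
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)  # MLE of the error variance
    log_l = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * k - 2 * log_l

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)  # hypothetical data where x -> y

# Model 1: y depends on x (intercept, slope, variance -> k = 3).
slope, intercept = np.polyfit(x, y, 1)
aic_edge = gaussian_aic(y - (slope * x + intercept), k=3)
# Model 2: no edge; y is noise around its mean (intercept, variance -> k = 2).
aic_no_edge = gaussian_aic(y - y.mean(), k=2)
print(aic_edge < aic_no_edge)  # True
```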

CFI & RMSEA

CFI

  • Compares model fit to a baseline (independence) model
  • Range: 0 to 1; \(>0.95\) indicates good fit

RMSEA

  • Estimates approximation error per degree of freedom in the population
  • \(<0.06\) indicates good fit
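Both indices are simple functions of the chi-square statistics of the model and the baseline. The sketch below uses the common textbook formulas. Some software uses N instead of N - 1 in RMSEA, and the input statistics here are hypothetical.

```python
import math

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index: 1 minus model misfit relative to baseline misfit."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

def rmsea(chi2_model, df_model, n):
    """Root Mean Square Error of Approximation (N - 1 convention)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))

# Hypothetical fit statistics: chi-square 30 on 20 df, baseline 500 on 28 df, N = 200.
print(round(cfi(30, 20, 500, 28), 3), round(rmsea(30, 20, 200), 3))  # 0.979 0.05
```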

Graph Hierarchies

Solution for navigating spaghetti graphs

  • Input: Complex graph structures
  • Output: Numerical grades for comparison
  • Metrics: Incoherence and Democracy
  • Enable discovery by properties
Graph hierarchy example

Interface Design Principles

Sample Solution
  1. Avoid overwhelming users - Present information in digestible subsets
  2. Storytelling approach - Build understanding progressively
  3. Guided & unguided discovery - Balance exploration and direction
  4. Accessibility features - Color, navigation, responsiveness
  5. "Don't think, feel!" - Intuitive, interactive design

Case Study: Wine Quality

Overall Case Study

A winemaker's apprentice exploring historical data to understand quality drivers

Feature 1: Data Visualization

Goal: Understand your dataset before exploring causality
  • Data cleaning suggestions
  • Descriptive statistics
  • Correlation analysis (heatmap & scatterplots)
  • Distribution histograms

Data Visualization Dashboard

Data Visualization 1

Raw data, descriptive statistics, and data quality checks

Correlation Analysis

Data Visualization 2

Pairwise correlations revealed through heatmap and scatterplots

Wine Dataset Insights

Data Visualization Insights
Key observations: free sulfur dioxide (FSD) correlates with about five other features; several distributions are skewed; density shows very low variance

Feature 2: Prior Knowledge Survey

Goal: Apply domain expertise to guide the model
  • -1 (Default): No prior knowledge
  • 0 (Add): x causes y (specify sinks/outcomes)
  • 1 (Add): y causes x (specify sources/actions)

Prior Knowledge Interface

Prior Knowledge Interface

Interactive matrix for specifying causal relationships

Feature 3: Hierarchical Metrics

Democratic Axis

Democracy Coefficient

  • Measures "balance" of influence
  • High: Many features equally important
  • Low: Few features dominate

Incoherence Score

  • Measures how consistently edges follow a single top-down ordering
  • High: Many contradictory paths
  • Low: Clear, coherent structure
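One published way to score incoherence is trophic incoherence (MacKay, Johnson and Sansom, 2020). Each node gets a level h, and F0 measures how far edges deviate from going exactly one level up. The sketch below assumes this definition. The interface's Incoherence score and its Democracy coefficient may be computed differently and are not reproduced here.

```python
import numpy as np

def trophic_incoherence(W):
    """Trophic levels h and incoherence F0 for a weighted directed graph.

    W[i, j] >= 0 is the weight of edge i -> j. F0 = 0 for a perfectly layered graph;
    larger values mean more edges skip levels or point against the hierarchy.
    Assumes the graph is weakly connected.
    """
    W = np.asarray(W, dtype=float)
    w_in, w_out = W.sum(axis=0), W.sum(axis=1)
    laplacian = np.diag(w_in + w_out) - W - W.T
    # Levels solve laplacian @ h = w_in - w_out; the solution is unique up to a constant.
    h = np.linalg.lstsq(laplacian, w_in - w_out, rcond=None)[0]
    h -= h.min()
    i, j = np.nonzero(W)
    F0 = np.sum(W[i, j] * (h[j] - h[i] - 1) ** 2) / W.sum()
    return h, F0

chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]     # 0 -> 1 -> 2
shortcut = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]  # adds 0 -> 2
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # 0 -> 1 -> 2 -> 0
for name, W in [("chain", chain), ("shortcut", shortcut), ("cycle", cycle)]:
    print(name, round(trophic_incoherence(W)[1], 3))  # chain 0.0, shortcut 0.111, cycle 1.0
```

The clean chain scores 0 and the cycle scores 1. This matches the slide's reading that contradictory paths push the score up.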

Hierarchical Visualization

Hierarchical Metrics

Democracy vs. Incoherence scatter plot reveals graph clusters

Feature 4: Model Builder for Guided Discovery

Philosophy: Humans learn best through interaction
  • Step-by-step exploration from parent to children nodes
  • Path tracking and undo functionality
  • Edge weight threshold filtering
  • Child node preview on hover
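The guided-discovery state can be modelled as a growing path with undo and a weight threshold. The class below is a hypothetical sketch, not the interface's code. The node names and edge weights are made up.

```python
class PathBuilder:
    """Hypothetical guided-discovery state: a path from a start node, with undo.

    `edges` maps (parent, child) to an edge weight; only children whose absolute
    weight meets the threshold are offered as next steps.
    """

    def __init__(self, edges, start, threshold=0.0):
        self.edges = edges
        self.threshold = threshold
        self.path = [start]

    def children(self):
        """Candidate next nodes from the end of the path, strongest edge first."""
        current = self.path[-1]
        options = [(child, w) for (parent, child), w in self.edges.items()
                   if parent == current and abs(w) >= self.threshold]
        return [child for child, _ in sorted(options, key=lambda cw: -abs(cw[1]))]

    def step(self, child):
        if child not in self.children():
            raise ValueError(f"{child!r} is not an available child of {self.path[-1]!r}")
        self.path.append(child)

    def undo(self):
        if len(self.path) > 1:  # never remove the starting node
            self.path.pop()

edges = {("A", "B"): -0.6, ("A", "C"): 0.45, ("B", "C"): -0.1, ("D", "C"): 0.3}
builder = PathBuilder(edges, start="A", threshold=0.2)
print(builder.children())  # ['B', 'C']
builder.step("B")
print(builder.children())  # [] because B -> C (|-0.1|) is below the threshold
builder.undo()
builder.step("C")
print(builder.path)  # ['A', 'C']
```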

Guided Discovery in Action

Causal Discovery Model Builder

Building causal paths interactively

Path Tracking

Dashed Path
Solid Path

Selected nodes form a clear causal path

Feature 5: Model Builder - Unguided Discovery

Dynamic View

  • Interactive graph manipulation
  • Physics-based layout
  • Node color = flux (in/out ratio)
  • Edge style = direction & magnitude

Static View

  • Same visual encoding
  • Exportable as image
  • Easy sharing & comparison

Dynamic Graph View

Causal Discovery Graph

Interactive exploration with full graph visibility

Static Graph View

Static Representation

Clean, exportable representation with color-coded nodes

Feature 6: Insights & Analysis

  • Summary Statistics: Feature ranges, averages, modes for each condition
  • Statistical Validity: CFI, AIC, RMSEA indices
  • Recipe for Success: Actionable recommendations based on models

Statistical Validity

Scatter and Table

Quantitative details add convincingness to discoveries

Decision-Making Support

Decision Making

Translating causal discoveries into strategic actions

🎥 Interactive Demo


Real-World Applications

Domain (Decision-Maker): Objective
  • Healthcare (Biopharmaceutical Firm): Determine causality between lung cancer resistance and genes for immunotherapy R&D
  • Manufacturing (Chemical Company): Understand causal relations among catalysts to develop new synthesis methods
  • Real Estate (Property Developer): Rank attributes influencing house value to guide development strategy
  • Food & Beverage (Manufacturer): Analyze determinants of product quality for QC protocols

Future Directions

  1. Bootstrapping: Report which causal relationships are most supported across samples
  2. Learning by playing: Touchscreen, tablet, VR integration
  3. Model supplementation: Integrate latent variable models, handle confounding
  4. Generative AI themes: Customize interface design based on dataset domain

Research Extensions

  • Prior Knowledge: Advanced identification techniques for chains, forks, and colliders
  • Accessibility: Full WCAG compliance, multi-modal interaction
  • Domain Expertise: Customized interfaces for different user types (scientists, managers, regulators)
  • Statistical Enhancements: Automated model comparison and selection

Thank You!

Questions?

G-RIPS Sendai 2024

Fujitsu Causal Discovery Project

"Kaze ga fukeba, okeya ga moukaru"
When the wind blows, the barrel-makers profit.

Kenichi Pekori