Data collection challenges
Modeling complexity
Need for efficient processing
Importance to society
arxiv.org/abs/2404.03701 submitted to Crop Science
AUTHORS
Fabiana Ferracina, Bala Krishnamoorthy, Mahantesh Halappanavar, Shengwei Hu, Vidyasagar Sathuvalli
Given data from trials performed in Oregon, can we use machine learning to predict which varieties should graduate to the next step in the process versus which varieties should be dropped?
Multi-layer Perceptron (MLP): an artificial neural network consisting of fully connected input, hidden, and output layers.
A nonlinear activation function (such as ReLU) in the hidden layers lets the model capture nonlinear relationships.
Histogram-based Gradient Boosting (HGB): an additive model in which many weak learners (typically decision trees) are combined into a strong predictor. Each new tree corrects the errors made by the previous ones, and binning feature values into histograms speeds up split finding.
Support Vector Machine (SVM): the RBF kernel implicitly maps the original feature space into a higher-dimensional space in which a hyperplane separating the classes can be found.
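As a rough illustration of the MLP and SVM descriptions above, the NumPy sketch below runs a forward pass through a one-hidden-layer perceptron with ReLU and evaluates the RBF kernel that an SVM relies on. The layer sizes, random weights, sigmoid output, and `gamma` value are arbitrary assumptions for illustration, not the trained models from the study.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    # Elementwise max(0, z): the nonlinearity applied after the hidden layer.
    return np.maximum(0.0, z)


def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer MLP: input -> fully connected + ReLU -> fully connected output.

    A sigmoid turns the output into a probability of the positive class
    (e.g. "advance" rather than "drop"); this output choice is an assumption.
    """
    hidden = relu(x @ w1 + b1)
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))


def rbf_kernel(x, y, gamma=0.1):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2).

    An RBF SVM only needs these pairwise similarities, so the
    higher-dimensional feature map is never built explicitly.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


rng = np.random.default_rng(0)
n_features, n_hidden = 40, 16           # 40 features as in the datasets; 16 hidden units is arbitrary
x = rng.normal(size=(5, n_features))    # 5 hypothetical varieties
w1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

probs = mlp_forward(x, w1, b1, w2, b2)  # shape (5, 1), values in (0, 1)
print(probs.ravel())
```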
| Model | ΔCV MCC | Test MCC | CI Test MCC |
|---|---|---|---|
| MLPC | 0.476 | 0.608 | (0.581, 0.633) |
| HGBC | 0.531 | 0.574 | (0.546, 0.602) |
| SVM | 0.539 | 0.502 | (0.471, 0.533) |
After postprocessing, the imputed dataset had 885 observations and 40 features.
| Model | ΔCV MCC | Test MCC | CI Test MCC |
|---|---|---|---|
| MLPC | 0.595 | 0.623 | (0.584, 0.659) |
| HGBC | 0.511 | 0.645 | (0.608, 0.680) |
| SVM | 0.731 | 0.623 | (0.584, 0.659) |
After postprocessing, the non-imputed dataset had 404 observations and 40 features.
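Both tables report Matthews correlation coefficients (MCC). A minimal, dependency-free sketch of MCC for binary advance/drop labels follows, together with one common way to obtain a confidence interval for a test MCC. The slides do not state how the CI column was computed, so the percentile bootstrap here is an assumption, and the labels in the usage example are made up.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = advance, 0 = drop)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for test MCC: resample test observations with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(mcc([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


y_true = [1, 1, 0, 0, 1, 0, 1, 0]  # made-up labels: 1 = advance, 0 = drop
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(mcc(y_true, y_pred))          # 0.5
print(bootstrap_ci(y_true, y_pred))
```

MCC is a reasonable headline metric here because it stays informative when the advance and drop classes are imbalanced.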
Will be submitted to the Aerosol Science and Technology Journal
AUTHORS
Fabiana Ferracina, Payton Beeler, Mahantesh Halappanavar, Bala Krishnamoorthy, Marco Minutoli, Laura Fierce
Understanding the chemical composition of aerosols is crucial due to their impact on atmospheric processes and human health.
The MOSAIC model, for instance, solves a first-order ODE system in which the rate of change of a gas species' concentration over time is proportional to that species' current concentration. The proportionality constant depends on several fixed values and correction factors. This approach can be efficient for reduced models, but it does not scale well to particle-based models such as PartMC.
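A minimal sketch of the first-order kinetics described above, assuming a single species with a constant, hypothetical effective rate constant `k` (in the real model the constant is assembled from fixed values and correction factors). Forward Euler is compared with the closed-form solution $C(t) = C_0 e^{kt}$. In a particle-resolved model an update of this kind has to be repeated for many particles and species at every step, which hints at why the cost grows with particle count.

```python
import math


def euler_first_order(c0: float, k: float, dt: float, n_steps: int) -> float:
    """Forward-Euler integration of dC/dt = k * C for a single species.

    k is a hypothetical effective rate constant (1/s); negative k means
    the concentration decays over time.
    """
    c = c0
    for _ in range(n_steps):
        c += dt * k * c
    return c


def exact_first_order(c0: float, k: float, t: float) -> float:
    # Closed-form solution C(t) = C0 * exp(k * t) when k is constant.
    return c0 * math.exp(k * t)


c0, k, dt, n_steps = 1.0, -0.5, 1e-3, 2000
approx = euler_first_order(c0, k, dt, n_steps)
exact = exact_first_order(c0, k, dt * n_steps)
print(f"Euler: {approx:.5f}  exact: {exact:.5f}")
```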
How can we study the chemical composition of aerosol particles at large scale, both quickly and accurately?
Multi-output mean squared error function to compute loss at training:
$$L^{\text{dim}}_{\text{MSE}} = \sum_{\text{part.}} \sum_{\text{chem.}} \frac{\left(\text{target dynamic} - \text{pred. dynamic}\right)^2}{\text{number of particles}}$$

The loss is minimized using the Adam (Adaptive Moment Estimation) algorithm.
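The sketch below computes $L^{\text{dim}}_{\text{MSE}}$ for arrays shaped (particles × chemical species) and minimizes it with a hand-written Adam update. To keep it short, a linear surrogate $Y \approx XW$ stands in for the actual network, and the data, learning rate, and step count are illustrative assumptions.

```python
import numpy as np


def dim_mse(target: np.ndarray, pred: np.ndarray) -> float:
    """L^dim_MSE: squared errors summed over particles and chemical species,
    divided by the number of particles. Arrays have shape (n_particles, n_species)."""
    return float(np.sum((target - pred) ** 2) / target.shape[0])


def adam_fit_linear(x, y, lr=1e-2, steps=2000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize dim_mse(y, x @ w) over w with Adam; the linear model is a stand-in."""
    w = np.zeros((x.shape[1], y.shape[1]))
    m = np.zeros_like(w)  # running mean of gradients (first moment)
    v = np.zeros_like(w)  # running mean of squared gradients (second moment)
    n = x.shape[0]
    for t in range(1, steps + 1):
        grad = -2.0 * x.T @ (y - x @ w) / n   # gradient of dim_mse with respect to w
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)            # bias-corrected moment estimates
        v_hat = v / (1 - beta2**t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))     # 200 hypothetical particles, 3 input features
w_true = rng.normal(size=(3, 4))  # 4 hypothetical chemical species
y = x @ w_true

w_fit = adam_fit_linear(x, y)
print(dim_mse(y, np.zeros_like(y)), "->", dim_mse(y, x @ w_fit))
```

Adam rescales each parameter's step by a running estimate of its gradient magnitude, which is one reason it is a common default for multi-output regression losses like this one.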
Prediction loss is measured both with an observation-wise MSE of the particles' mass and concentration differences, $L^{\text{flat}}_{\text{MSE}}$, and with a normalized mean absolute error (NMAE) computed for each chemical species:

$$\text{NMAE} = \sum_{\text{part.}} \frac{\sum\limits_{\text{tstep}} \left|\text{true mass} - \text{predicted mass}\right|}{\text{number of particles} \cdot \sum\limits_{\text{tstep}} \left| \text{true mass} \right|}$$

We computed the NMAE between the predictions and the ground truth (PartMC-MOSAIC simulation data) for all chemical species, and obtained histograms of the particles' total mass and dry diameter.
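A NumPy sketch of the per-species NMAE, assuming the true and predicted masses for one species are stored as arrays of shape (particles × time steps). Particles whose true mass is zero at every time step would make the denominator vanish and would need to be excluded or handled separately; the example values are made up.

```python
import numpy as np


def nmae(true_mass: np.ndarray, pred_mass: np.ndarray) -> float:
    """NMAE for one chemical species.

    Arrays have shape (n_particles, n_timesteps). For each particle, the absolute
    error summed over time steps is divided by that particle's summed absolute
    true mass; the ratios are summed and divided by the number of particles.
    """
    n_particles = true_mass.shape[0]
    abs_err = np.sum(np.abs(true_mass - pred_mass), axis=1)  # per particle, over time steps
    scale = np.sum(np.abs(true_mass), axis=1)                 # assumed nonzero for every particle
    return float(np.sum(abs_err / scale) / n_particles)


true = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 hypothetical particles, 2 time steps
pred = 1.1 * true                          # every prediction 10% too high
print(nmae(true, pred))                    # approximately 0.1
```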
$N = 1146$, NMAE $= 0.0126$
$N = 1146$, NMAE $= 0.0123$
$N=1146$, # of training steps $= 1500$, $L^{\text{flat}}_{\text{MSE}} = 1.03 \times 10^{-6}$
Will be submitted to the Journal of Cleaner Production
AUTHORS
Ayane Nakamura, Fabiana Ferracina, Naoki Sakata, Amanda E. Hampton, Takahiro Noguchi, Hiroyasu Ando
Developed by Ayane Nakamura
$$
\begin{aligned}
E_k = \sum_{i \,\in\, \text{vehicle types}} \Big[ & \text{number of vehicles of type } i \times \text{avg. distance traveled by type } i \\
& \times \sum_{j \,\in\, \text{road types}} \big( \text{proportion of distance on road } j \times \text{emission factor for pollutant } k \text{, vehicle } i \text{, road } j \big) \Big],
\end{aligned}
$$
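The emission formula for $E_k$ reduces to two nested sums, sketched below in NumPy. The vehicle counts, distances, road-type shares, and emission factors in the usage example are hypothetical placeholder values (emission factors assumed in grams per km), not the Tsukuba data.

```python
import numpy as np


def pollutant_emissions(n_vehicles, avg_distance_km, road_share, emission_factor):
    """E_k for a single pollutant k.

    n_vehicles:      shape (I,)   number of vehicles of each type i
    avg_distance_km: shape (I,)   average distance traveled by a vehicle of type i
    road_share:      shape (I, J) proportion of type i's distance on road type j (rows sum to 1)
    emission_factor: shape (I, J) grams of pollutant k per km for vehicle type i on road type j
    """
    # Inner sum over road types j: distance-weighted emission factor per km for each type i.
    per_km = np.sum(np.asarray(road_share) * np.asarray(emission_factor), axis=1)
    # Outer sum over vehicle types i.
    return float(np.sum(np.asarray(n_vehicles) * np.asarray(avg_distance_km) * per_km))


# Hypothetical example: vehicle types (car, bus) x road types (arterial, local).
e_k = pollutant_emissions(
    n_vehicles=[100, 5],
    avg_distance_km=[10.0, 40.0],
    road_share=[[0.7, 0.3], [0.5, 0.5]],
    emission_factor=[[150.0, 120.0], [900.0, 700.0]],
)
print(e_k)  # about 301,000 g
```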
Empirical expected total trip time and emissions are 0.3893 hours and 6,093,234 grams of pollutants, respectively.
We introduce SCETT, the Social Cost of Emissions and Trip Time:
$$
\begin{aligned}
\text{SCETT} ={} & \text{social cost of CO}_2 \times \text{CO}_2 \text{ emissions of vehicles during time interval } T \\
& + \text{social cost of trip time} \times |T| \times \text{average total trip time per hour}
\end{aligned}
$$

$$
\operatorname*{arg\,min}_{(b,C)} \left( \sum_{h=1}^{n} \sum_{p_{\text{car}} \in P} \text{SCETT}(b,C) \right)
$$

Here $b$ and $C$ represent the bus frequency interval and bus capacity, respectively; $h$ indexes the park-and-ride (PnR) stations $1$ through $n$; and $p_{\text{car}}$ is the proportion of customers using private cars. Note that our current PnR system accounts only for the emissions of buses and private cars, and the length of the time interval is $|T| = 4$ hours.
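To make the objective concrete, the sketch below evaluates SCETT and performs a brute-force search over candidate $(b, C)$ pairs, summing over hubs and car-usage proportions. The `toy_simulate` function and all numeric values (social costs, emissions, trip times) are made-up assumptions standing in for the actual traffic and emission model, and the per-capita normalization used in the figures is omitted.

```python
from itertools import product


def scett(co2_grams, avg_trip_time_per_hour, *, sc_co2_per_gram, sc_time_per_hour, interval_hours=4.0):
    """SCETT for one scenario: cost of CO2 emitted during T plus cost of trip time over |T|."""
    return (sc_co2_per_gram * co2_grams
            + sc_time_per_hour * interval_hours * avg_trip_time_per_hour)


def best_bus_plan(b_grid, c_grid, hubs, p_car_grid, simulate, **costs):
    """Brute-force arg min over (b, C) of SCETT summed over hubs h and car shares p_car.

    simulate(h, b, C, p_car) -> (co2_grams, avg_trip_time_per_hour) is a hypothetical
    placeholder for a traffic/emission model.
    """
    best_plan, best_total = None, float("inf")
    for b, cap in product(b_grid, c_grid):
        total = sum(scett(*simulate(h, b, cap, p), **costs)
                    for h in hubs for p in p_car_grid)
        if total < best_total:
            best_plan, best_total = (b, cap), total
    return best_plan, best_total


def toy_simulate(h, b, cap, p_car, interval_hours=4.0):
    # Made-up model: more frequent (small b) or larger buses emit more CO2,
    # waiting time grows with b, and small buses add a crowding delay.
    bus_co2 = (interval_hours / b) * (800 + 10 * cap)
    car_co2 = p_car * 50_000
    trip_time = 0.3 + (1 - p_car) * (b / 2 + 0.01 * max(0, 40 - cap))
    return bus_co2 + car_co2, trip_time


plan, total = best_bus_plan(
    b_grid=[0.25, 0.5, 1.0],  # hours between buses
    c_grid=[20, 40, 60],      # seats per bus
    hubs=range(1, 6),         # five PnR hubs
    p_car_grid=[0.2, 0.5, 0.8],
    simulate=toy_simulate,
    sc_co2_per_gram=2e-4,
    sc_time_per_hour=10.0,
)
print(plan, round(total, 2))
```

An exhaustive grid is only practical because the candidate sets for $b$ and $C$ are small; the trade-off it exposes (frequent buses cut waiting time but add emissions) is what the weighting between time and CO$_2$ costs decides.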
SCETT according to FUND, in international dollars/capita per percent of car usage, for Hub 3 of Tsukuba's PnR system (2018 Person Trip survey). Each point's color indicates the number of buses ($[1/b]$), and the annotation above each point gives the bus capacity. Time is 5.2 times more valuable than emissions.
SCETT according to RICE, in international dollars/capita per percent of car usage, for Hub 3 of Tsukuba's PnR system (2018 Person Trip survey). Each point's color indicates the number of buses ($[1/b]$), and the annotation above each point gives the bus capacity. Time is 1.25 times more valuable than emissions.
| Hub Number | Time+CO$_2$ Social Cost (FUND) |
|---|---|
| 1 | 1024.03 |
| 2 | 1053.50 |
| 3 | 1662.02 |
| 4 | 1011.24 |
| 5 | 553.23 |
Environmental and societal impacts of these fields are interconnected, with each influencing and being influenced by factors such as air quality, climate change, food security, and urban pollution.
Cross-disciplinary innovations can lead to more holistic approaches to environmental management and policy-making, and meaningful breakthroughs require data-driven approaches and systems thinking.