Preprint / Version 1

Improving Linear Regression Performance for Interpretable Control

A Partial Dependence Plot Approach

DOI:

https://doi.org/10.51094/jxiv.1516

Keywords:

Partial Dependence Plots (PDP), Linear regression, Feature engineering, Real-time control, Industrial process

Abstract

This paper addresses the challenge of improving the performance of interpretable linear models for real-time process control in industrial and chemical systems. Black-box models such as Random Forest and XGBoost achieve high predictive accuracy, but their lack of interpretability and limited suitability for real-time applications make them difficult to integrate into production environments. To overcome this limitation, we propose a feature engineering approach that uses Partial Dependence Plots (PDP) to capture the nonlinear relationships learned by a black-box model and convert them into features suitable for linear regression. We evaluate the methodology on the well-known, publicly available wine quality dataset, which serves as a benchmark because its multivariate and nonlinear characteristics are analogous to those observed in industrial processes. Our results show that PDP-based feature transformation substantially improves the predictive accuracy of linear models, achieving an R² score comparable to that of black-box models. The approach offers a practical way to build high-performance yet interpretable models, with strong potential for real-time deployment in process control and monitoring.
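The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data, the stand-in `black_box_predict` function (used here in place of a trained Random Forest or XGBoost model), the grid size, and all helper names are assumptions made for illustration. Each feature's one-dimensional partial dependence curve is computed by brute force, each raw feature value is mapped through its curve by interpolation, and an ordinary least-squares model is fitted on the transformed features.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Brute-force PDP: fix one column at each grid value and average the predictions."""
    pd_values = np.empty(len(grid))
    for i, value in enumerate(grid):
        X_fixed = X.copy()
        X_fixed[:, feature] = value
        pd_values[i] = predict(X_fixed).mean()
    return pd_values

def fit_pdp_curves(predict, X, n_grid=20):
    """Compute one (grid, partial dependence) curve per feature."""
    curves = []
    for j in range(X.shape[1]):
        grid = np.linspace(X[:, j].min(), X[:, j].max(), n_grid)
        curves.append((grid, partial_dependence_1d(predict, X, j, grid)))
    return curves

def pdp_transform(curves, X):
    """Map each raw feature value to its interpolated partial dependence value."""
    return np.column_stack(
        [np.interp(X[:, j], grid, pd) for j, (grid, pd) in enumerate(curves)]
    )

def fit_ols(Z, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, Z):
    return coef[0] + Z @ coef[1:]

def r2_score(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic nonlinear data (an assumption; the paper uses the wine quality dataset).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, size=500)

def black_box_predict(X):
    """Hypothetical stand-in for the predict method of a trained black-box model."""
    return np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

curves = fit_pdp_curves(black_box_predict, X)
Z = pdp_transform(curves, X)

# In-sample comparison only; a real evaluation would use held-out data.
r2_raw = r2_score(y, predict_ols(fit_ols(X, y), X))
r2_pdp = r2_score(y, predict_ols(fit_ols(Z, y), Z))
print(f"R2 on raw features: {r2_raw:.3f}")
print(f"R2 on PDP features: {r2_pdp:.3f}")
```

At prediction time such a model needs only a table lookup with interpolation per feature followed by a weighted sum, and each coefficient applies to a curve that can be plotted and inspected. One-dimensional curves of this kind do not capture interactions between features.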

Conflicts of Interest Disclosure

There are no conflicts of interest to declare.

References

L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5-32, 2001.

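T. Chen and C. Guestrin, "XGBoost: A Scalable Tree Boosting System," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), 2016.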

P. J. Werbos, "Backpropagation: Past and future," in Proceedings of the IEEE International Conference on Neural Networks, 1988.

C. Rudin et al., "Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges," 2021.

P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis, "Modeling wine preferences by data mining from physicochemical properties," Decision Support Systems, vol. 47, no. 4, pp. 547-553, 2009.

Posted
Submitted: 2025-09-05 08:39:24 UTC

Published: 2025-09-11 08:55:34 UTC

Section

Information Sciences