Safe EVs battery management using reinforcement learning

Authors

  • Maximiliano Trimboli CONICET-UNSL. Laboratorio de Sistemas Inteligentes. San Luis. Argentina https://orcid.org/0009-0009-8523-0406
  • Nicolás Antonelli CONICET-UNSL. Laboratorio de Sistemas Inteligentes. San Luis. Argentina
  • Luis Avila CONICET-UNSL. Laboratorio de Sistemas Inteligentes. San Luis. Argentina https://orcid.org/0000-0003-0321-068X
  • Mariano de Paula UNICEN-CICpBA-CONICET, INTELYMEC. Centro de Investigaciones en Física e Ingeniería del Centro. Olavarría. Argentina https://orcid.org/0000-0001-7582-9188

DOI:

https://doi.org/10.30973/progmat/2024.16.1/4

Keywords:

Safe-RL, SOC, battery aging, variability

Abstract

Lithium-ion batteries are the standard power source for electric vehicles (EVs), the alternative of choice for reducing CO2 emissions. Before becoming a fully reliable technology, however, lithium-ion batteries must face two major challenges: undesirable electrochemical reactions caused by excessive charging rates, and the considerable time it takes to charge an EV. It is therefore necessary to use balanced current profiles that avoid both the serious effects of battery degradation and inconvenience to end users. In this work, the authors propose a safe deep reinforcement learning (SDRL) approach to determine optimal charging profiles under varying operating conditions. One of the main advantages of RL techniques is that they can learn from interaction with the simulated or real system, incorporating the nonlinearity and uncertainty that arise from fluctuating environmental conditions. However, since RL techniques must explore undesirable states before obtaining an optimal policy, they offer no safety guarantees. The proposed approach aims to maintain zero constraint violations throughout the entire learning process by incorporating a safety layer that corrects the action whenever a constraint is likely to be violated. The proposed method is tested on the equivalent circuit model of a lithium-ion battery under variability conditions. First results show that SDRL is able to find optimized and safe charging policies that take into account the trade-off between charging speed and battery life.
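
To make the safety-layer idea in the abstract concrete, the following Python sketch shows one plausible realization. Everything in it is an illustrative assumption rather than the authors' implementation: the one-step surrogate model, the cell limits, and the multiplicative back-off rule are all hypothetical. The sketch corrects the charging current proposed by the RL policy until the predicted next state no longer violates the voltage and temperature constraints.

    # Minimal sketch of a safety layer for charging-current actions.
    # All models, limits, and names here are hypothetical placeholders.
    import numpy as np

    I_MAX = 4.0    # maximum charging current (A), assumed cell limit
    V_MAX = 4.2    # upper voltage cutoff (V), assumed
    T_MAX = 45.0   # temperature limit (deg C), assumed

    def predict_next_state(state, current):
        # One-step prediction with a crude equivalent-circuit surrogate:
        # ohmic voltage rise plus a Joule-heating proxy for temperature.
        soc, v, t = state
        r_internal = 0.05                  # internal resistance (ohm), assumed
        v_next = v + r_internal * current
        t_next = t + 0.01 * current ** 2
        return v_next, t_next

    def safety_layer(state, proposed_current):
        # Correct the agent's action before it reaches the battery:
        # shrink the current until the prediction satisfies every limit,
        # so no constraint is violated during exploration.
        current = float(np.clip(proposed_current, 0.0, I_MAX))
        v_next, t_next = predict_next_state(state, current)
        while (v_next > V_MAX or t_next > T_MAX) and current > 1e-3:
            current *= 0.9                 # geometric back-off, always terminates
            v_next, t_next = predict_next_state(state, current)
        return current

    # Usage: wrap the actor's output so training never applies an unsafe current.
    state = (0.8, 4.1, 35.0)               # (SOC, voltage V, temperature deg C)
    raw_action = 3.8                       # current proposed by the policy network
    safe_action = safety_layer(state, raw_action)
    print(f"proposed {raw_action:.2f} A -> applied {safe_action:.2f} A")

In the paper's setting the prediction would come from the equivalent-circuit battery model itself; a projection-style correction in the spirit of Dalal et al. (2018), cited in the references below, is another common way to implement the same layer.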

Author Biographies

Maximiliano Trimboli, CONICET-UNSL. Laboratorio de Sistemas Inteligentes. San Luis. Argentina

Maximiliano Trimboli is a mechatronics engineer who graduated from the Facultad de Ingeniería y Ciencias Agropecuarias of the Universidad Nacional de San Luis (FICA-UNSL), Argentina, where he also serves as a teaching assistant in the automation area. Supported by a fellowship from the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), he is pursuing a PhD in Computer Science. He carries out research at the Laboratorio de Sistemas Inteligentes (LSI) on machine learning methods applied to the field of renewable energy.

Nicolás Antonelli, CONICET-UNSL. Laboratorio de Sistemas Inteligentes. San Luis. Argentina

Nicolás Antonelli is an electromechanical engineer who graduated from the Universidad Nacional General Sarmiento (UNGS), Argentina. Supported by a fellowship from the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), he is pursuing a PhD in Computer Science at the Universidad Nacional de San Luis (UNSL). He carries out research at the Laboratorio de Sistemas Inteligentes (LSI) on machine learning methods applied to the field of renewable energy.

Luis Avila, CONICET-UNSL. Laboratorio de Sistemas Inteligentes. San Luis. Argentina

Luis Avila is an electronics engineer who graduated from the Universidad Nacional de San Luis (UNSL), Argentina. He received his PhD in Engineering from the Universidad Tecnológica Nacional (UTN-FRSF). He is a researcher with the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), working at the Laboratorio de Sistemas Inteligentes (LSI) of the UNSL, where he also holds a position as Professor.

Mariano de Paula, UNICEN-CICpBA-CONICET, INTELYMEC. Centro de Investigaciones en Física e Ingeniería del Centro. Olavarría. Argentina

Mariano de Paula is an industrial engineer who graduated from the Universidad Nacional del Centro de la Provincia de Buenos Aires (UNCPBA), Argentina. He received his PhD in Engineering from the Universidad Tecnológica Nacional (UTN-FRSF), Argentina. He is a researcher with the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) and carries out his activities at INTELYMEC-UNCPBA. He is also an Adjunct Professor at the Facultad de Ingeniería of the UNCPBA.

References

Campbell, I. D., Gopalakrishnan, K., Marinescu, M., Torchio, M., Offer, G. J., Raimondo, D. Optimising lithium-ion cell design for plug-in hybrid and battery electric vehicles. Journal of Energy Storage. 2019, 22, 228-238. https://doi.org/10.1016/j.est.2019.01.006.

Danilov, D., Notten, P. H. L. Adaptive battery management systems for the new generation of electrical vehicles. In 2009 IEEE Vehicle Power and Propulsion Conference, 2009, 317-320. https://doi.org/10.1109/VPPC.2009.5289835.

Xing, Y., Ma, E. W., Tsui, K. L., Pecht, M. Battery management systems in electric and hybrid vehicles. Energies. 2011, 4(11), 1840-1857. https://doi.org/10.3390/en4111840.

Yan, W., Zhang, B., Zhao, G., Weddington, J., Niu, G. Uncertainty management in Lebesgue-sampling-based diagnosis and prognosis for lithium-ion battery. IEEE Transactions on Industrial Electronics. 2017, 64(10), 8158-8166. https://doi.org/10.1109/TIE.2017.2701790.

Kim, M., Lim, J., Ham, K. S., Kim, T. Optimal charging method for effective Li-ion battery life extension based on reinforcement learning. In Proc. of the 38th ACM/SIGAPP Symposium on Applied Computing. 2023, 1659-1661. https://doi.org/10.1145/3555776.3577800.

Tunuguntla, S. T. Adaptive charging techniques for Li-ion battery using Reinforcement Learning (Doctoral dissertation), 2021.

Chang, F., Chen, T., Su, W., Alsafasfeh, Q. Control of battery charging based on reinforcement learning and long short-term memory networks. Computers & Electrical Engineering. 2020, 85, 106670. https://doi.org/10.1016/j.compeleceng.2020.106670.

Triki, M., Ammari, A. C., Wang, Y., Pedram, M. Reinforcement learning-based dynamic power management of a battery-powered system supplying multiple active modes. In 2013 European Modelling Symposium, 2013, 437-442. https://doi.org/10.1109/EMS.2013.74.

Park, S., Pozzi, A., Whitmeyer, M., Perez, H., Joe, W. T., Raimondo, D. M., Moura, S. Reinforcement learning-based fast charging control strategy for li-ion batteries. In 2020 IEEE Conference on Control Technology and Applications (CCTA), 2020, 100-107. https://doi.org/10.1109/CCTA41146.2020.9206314.

Chow, Y., Nachum, O., Faust, A., Duenez-Guzman, E., Ghavamzadeh, M. Lyapunov-based safe policy optimization for continuous control. 2019, arXiv preprint 1901.10031. https://doi.org/10.48550/arXiv.1901.10031.

Cheng, R., Orosz, G., Murray, R. M., Burdick, J. W. End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. In Proc. of the AAAI Conference on Artificial Intelligence. 2019, 33, 3387-3395. https://doi.org/10.1609/aaai.v33i01.33013387.

Grzes, M. Reward shaping in episodic reinforcement learning. In Proc. of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2017, 1, 565–573.

Dong, Y., Tang, X., Yuan, Y. Principled reward shaping for reinforcement learning via Lyapunov stability theory. Neurocomputing, 2020, 393, 83-90. https://doi.org/10.1016/j.neucom.2020.02.008.

Achiam, J., Held, D., Tamar, A., Abbeel, P. Constrained policy optimization. In International Conference on Machine Learning, 2017, 22-31.

Junges, S., Jansen, N., Dehnert, C., Topcu, U., Katoen, J. P. Safety-constrained reinforcement learning for MDPs. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2016, 130-146.

Abbeel, P., Ng, A. Y. Apprenticeship learning via inverse reinforcement learning. In Proc. of the Twenty-First International Conference on Machine Learning (ICML 2004), New York, USA, ACM Press, 2004, 1-8.

Zhang, X., Ma, H. Pretraining deep actor-critic reinforcement learning algorithms with expert demonstrations. 2018, arXiv preprint 1801.10459. https://doi.org/10.48550/arXiv.1801.10459.

Dalal, G., Dvijotham, K., Vecerik, M., Hester, T., Paduraru, C., Tassa, Y. Safe exploration in continuous action spaces. 2018, arXiv preprint 1801.08757. https://doi.org/10.48550/arXiv.1801.08757.

Perez, H. E., Hu, X., Dey, S., Moura, S. J. Optimal charging of Li-ion batteries with coupled electro-thermal-aging dynamics. IEEE Transactions on Vehicular Technology. 2017, 66(9), 7761-7770. https://doi.org/10.1109/TVT.2017.2676044.

Lin, X., Perez, H. E., Mohan, S., Siegel, J. B., Stefanopoulou, A. G., Ding, Y., Castanier, M. P. A lumped-parameter electro-thermal model for cylindrical batteries. Journal of Power Sources. 2014, 257, 1-11. https://doi.org/10.1016/j.jpowsour.2014.01.097.

Perez, H. E., Siegel, J. B., Lin, X., Stefanopoulou, A. G., Ding, Y., Castanier, M. P. Parameterization and validation of an integrated electro-thermal cylindrical LFP battery model. In Dynamic Systems and Control Conference. 2012, 45318, 41-50. https://doi.org/10.1115/DSCC2012-MOVIC2012-8782.

Altman, E. Constrained Markov decision processes. Routledge, 2021.

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Wierstra, D. Continuous control with deep reinforcement learning. 2015, arXiv preprint 1509.02971. https://doi.org/10.48550/arXiv.1509.02971.

Baxter, J., Bartlett, P. L. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research. 2001, 15, 319-350. https://doi.org/10.1613/jair.806.

Published

2024-02-01

How to Cite

Trimboli, M., Antonelli, N., Avila, L., & de Paula, M. (2024). Safe EVs battery management using reinforcement learning. Programación Matemática Y Software, 16(1), 35–46. https://doi.org/10.30973/progmat/2024.16.1/4