Patients with severe Coronavirus disease 19 (COVID-19) typically require supplemental oxygen as an essential treatment. We developed a machine learning algorithm, based on a deep Reinforcement Learning (RL), for continuous management of oxygen flow rate for critical ill patients under intensive care, which can identify the optimal personalized oxygen flow rate with strong potentials to reduce mortality rate relative to the current clinical practice. Basically, we modeled the oxygen flow trajectory of COVID-19 patients and their health outcomes as a Markov decision process. Based on individual patient characteristics and health status, a reinforcement learning based oxygen control policy is learned and real-time recommends the oxygen flow rate to reduce the mortality rate. We assessed the performance of proposed methods through cross validation by using a retrospective cohort of 1,372 critically ill patients with COVID-19 from New York University Langone Health ambulatory care with electronic health records from April 2020 to January 2021. The mean mortality rate under the RL algorithm is lower than standard of care by 2.57% (95% CI: 2.08- 3.06) reduction (P<0.001) from 7.94% under the standard of care to 5.37 % under our algorithm and the averaged recommended oxygen flow rate is 1.28 L/min (95% CI: 1.14-1.42) lower than the rate actually delivered to patients. Thus, the RL algorithm could potentially lead to better intensive care treatment that can reduce mortality rate, while saving the oxygen scarce resources. It can reduce the oxygen shortage issue and improve public health during the COVID-19 pandemic.