Fault Detection and Control Systems | Open access | Peer reviewed

State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards

Yuto Tanimoto, Kenji Fukumizu

Algorithms | May 21, 2026

Abstract

While many multi-armed bandit algorithms assume that the rewards of all arms are constant across rounds, this assumption does not hold in many real-world scenarios. This paper considers the setting of recovering bandits, where the reward of an arm depends on the number of rounds elapsed since that arm was last pulled. We propose a new reinforcement learning (RL) algorithm tailored to this setting, named State-Separated SARSA (SS-SARSA), which treats the elapsed rounds as states. SS-SARSA achieves efficient learning by reducing the number of state combinations required by Q-learning and SARSA, which often suffer from combinatorial explosion in large-scale RL problems. Additionally, it makes minimal assumptions about the reward structure and has lower computational complexity. Furthermore, we prove asymptotic convergence to an optimal policy under mild assumptions. Simulation studies demonstrate the superior performance of our algorithm across various settings.
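To make the setting concrete, the sketch below simulates a recovering bandit and runs plain tabular SARSA on the joint elapsed-round state. It is not SS-SARSA; it is the kind of baseline whose state space explodes combinatorially: with K arms and elapsed counts capped at z_max, a tabular Q-function over the joint state must cover z_max^K states. The recovery functions, the cap z_max, the reset convention, the Gaussian noise, and all names (`RecoveringBandit`, `joint_state_count`, `run_joint_state_sarsa`) are hypothetical choices made for illustration, not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]


class RecoveringBandit:
    """Hypothetical recovering-bandit environment (illustration only).

    Arm k's mean reward is recovery_fns[k](z), where z is the number of rounds
    since arm k was last pulled, capped at z_max.
    """

    def __init__(self, recovery_fns: List[Callable[[int], float]], z_max: int,
                 noise_sd: float = 0.0, seed: int = 0) -> None:
        self.recovery_fns = recovery_fns
        self.z_max = z_max
        self.noise_sd = noise_sd
        self.rng = random.Random(seed)
        # Assumption: every arm starts fully recovered (elapsed count at the cap).
        self.elapsed = [z_max] * len(recovery_fns)

    def state(self) -> State:
        return tuple(self.elapsed)

    def step(self, arm: int) -> float:
        z = self.elapsed[arm]
        reward = self.recovery_fns[arm](z) + self.rng.gauss(0.0, self.noise_sd)
        # Assumed convention: the pulled arm's count resets to 1, and every
        # other arm's count grows by one, capped at z_max.
        self.elapsed = [1 if k == arm else min(z_k + 1, self.z_max)
                        for k, z_k in enumerate(self.elapsed)]
        return reward


def joint_state_count(n_arms: int, z_max: int) -> int:
    """Size of the joint state space a plain tabular method must cover."""
    return z_max ** n_arms


def run_joint_state_sarsa(env: RecoveringBandit, n_rounds: int, alpha: float = 0.1,
                          gamma: float = 0.9, epsilon: float = 0.1,
                          seed: int = 0) -> Tuple[Dict[State, List[float]], float]:
    """Standard tabular SARSA on the joint state (a baseline, not SS-SARSA)."""
    rng = random.Random(seed)
    n_arms = len(env.recovery_fns)
    q: Dict[State, List[float]] = defaultdict(lambda: [0.0] * n_arms)

    def choose(s: State) -> int:
        # Epsilon-greedy action selection over the Q-values of state s.
        if rng.random() < epsilon:
            return rng.randrange(n_arms)
        values = q[s]
        return max(range(n_arms), key=values.__getitem__)

    s = env.state()
    a = choose(s)
    total = 0.0
    for _ in range(n_rounds):
        r = env.step(a)
        total += r
        s_next = env.state()
        a_next = choose(s_next)
        # On-policy SARSA update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a)).
        q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])
        s, a = s_next, a_next
    return q, total
```

For example, `joint_state_count(10, 5)` returns 9765625: ten arms with a cap of five rounds already yield almost ten million joint states for a tabular Q-function. Reducing this count is the motivation the abstract gives for SS-SARSA.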

Authors

Yuto Tanimoto

First author | The Institute of Statistical Mathematics

Kenji Fukumizu

Last author | The Institute of Statistical Mathematics | ORCID 0000-0002-3488-2625

Research areas

Fault Detection and Control Systems; Software System Performance and Reliability

Citation

BibTeX

@article{Tanimoto2026State,
  title = {State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards},
  author = {Yuto Tanimoto and Kenji Fukumizu},
  journal = {Algorithms},
  year = {2026},
  doi = {10.3390/a19050419},
  url = {https://doi.org/10.3390/a19050419}
}
