Using Temperature Sampling to Effectively Train Robot Learning Policies on Imbalanced Datasets

University of Michigan - Ann Arbor

Temperature-Based Sampling for Imbalanced Datasets

Abstract

Increasingly large datasets of robot actions and sensory observations are being collected to train ever-larger neural networks. These datasets are typically partitioned into tasks described using natural language. However, while these tasks may be distinct in their descriptions, many involve very similar physical action sequences (e.g., 'pick up an apple' versus 'pick up an orange'). As a result, many datasets of robotic tasks are substantially imbalanced in terms of the physical robotic actions they represent. In this work, we explore methods for sampling data during policy training to manage this imbalance. We first use the RoboCasa dataset in simulation, from which we subsample 3,000 examples of pick-and-place tasks and 50 examples of each of the other tasks. We use temperature sampling, where the temperature value modulates the likelihood of sampling from different action domains. We evaluate two temperature sampling schedules during training: cosine decay and cosine warming. We observe that cosine warming, in which low-resource tasks are sampled more frequently toward the end of training, improves training efficiency, increases sample efficiency across tasks, and boosts overall task success. We then construct an imbalanced dataset of eight real-world tasks on a Franka Panda robot arm. We find that training on this dataset with the same sampling schedules confirms our simulation results.
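As a rough illustration of the scheme described above, the sketch below assumes the common temperature-sampling formulation in which the probability of drawing a task with n_k examples is proportional to n_k^(1/T), with T swept along a cosine schedule between a starting and ending value. The function names, schedule endpoints, and task counts are illustrative assumptions, not specifics taken from the paper.

```python
import numpy as np

def temperature_probs(counts, T):
    """Per-task sampling probability p_k proportional to n_k^(1/T).
    T = 1 recovers size-proportional sampling; larger T flattens the
    distribution toward uniform, up-weighting low-resource tasks."""
    w = np.asarray(counts, dtype=np.float64) ** (1.0 / T)
    return w / w.sum()

def cosine_schedule(step, total_steps, T_start, T_end):
    """Cosine interpolation of the temperature from T_start to T_end.
    'Cosine decay' uses T_start > T_end; 'cosine warming' the reverse,
    so low-resource tasks are sampled more often late in training."""
    cos = 0.5 * (1.0 + np.cos(np.pi * step / total_steps))
    return T_end + (T_start - T_end) * cos

# Illustrative imbalance: many pick-and-place demos, few of everything else.
counts = [3000] + [50] * 23
rng = np.random.default_rng(0)
total_steps = 10_000
for step in range(total_steps):
    T = cosine_schedule(step, total_steps, T_start=1.0, T_end=5.0)  # warming
    task = rng.choice(len(counts), p=temperature_probs(counts, T))
    # ... draw a training example from `task` and take a gradient step
```

Swapping T_start and T_end in this sketch yields the cosine-decay variant, where low-resource tasks are emphasized early in training instead of late.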

Task Videos


BibTeX

@inproceedings{Patil2025UsingTS,
  title={Using Temperature Sampling to Effectively Train Robot Learning Policies on Imbalanced Datasets},
  author={Basavasagar Patil and Sydney Belt and Jayjun Lee and Nima Fazeli and Bernadette Bucher},
  year={2025},
}