learning based on experiment and reward