Uber’s AI beats troublesome games with new type of reinforcement learning

Ryan Daws is a senior editor at TechForge Media with over a decade of experience in crafting compelling narratives and making complex topics accessible. His articles and interviews with industry leaders have earned him recognition as a key influencer by organisations like Onalytica. Under his leadership, publications have been praised by analyst firms such as Forrester for their excellence and performance. Connect with him on X (@gadget_ry) or Mastodon (@gadgetry@techhub.social)


Video games have become a proving ground for AIs, and Uber has shown how its new type of reinforcement learning succeeds where others have failed.

Some of mankind’s most complex games, like Go, have failed to challenge AIs from the likes of DeepMind. Reinforcement learning trains algorithms by running scenarios repeatedly with a ‘reward’ given for successes, often a score increase.
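To make the reward loop concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning method. It is not the setup used by DeepMind or Uber. The five-state corridor, the function names, and every parameter value are assumptions chosen purely for illustration.

```python
import random

ACTIONS = (-1, 1)  # step left or step right


def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor with a reward of 1 at the right end."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # occasional random move
            else:
                # otherwise act greedily, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice(
                    [a for a in ACTIONS if q[(state, a)] == best])
            next_state = min(max(state + action, 0), goal)
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # the 'reward' nudges the value estimate of the move just taken
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q, n_states=5):
    """Best-known action for every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(n_states - 1)]
```

After enough repeated runs, `greedy_policy(train_q_learning())` should point right in every state, because the reward at the end of the corridor has propagated back through the value table.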

Two classic games from the 80s – Montezuma’s Revenge and Pitfall! – have thus far been immune to a traditional reinforcement learning approach. This is because they have little in the way of notable rewards until later in the games.

Applying traditional reinforcement learning typically results in a failure to progress out of the first room in Montezuma’s Revenge, while in Pitfall! it fails completely.

One way researchers have tried to provide the missing incentive is to reward the AI for exploration itself, an approach called ‘intrinsic motivation’. However, this approach has shortcomings.
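One common form of intrinsic motivation is a count-based bonus, where rarely visited states earn a little extra reward. The sketch below illustrates that general idea only. It is not necessarily the method Uber's researchers examined, and the class name, the `beta` parameter and the state labels are hypothetical.

```python
import math
from collections import Counter


class CountBonus:
    """Adds an exploration bonus that shrinks as a state is revisited."""

    def __init__(self, beta=0.1):
        self.beta = beta          # scale of the bonus (assumed value)
        self.visits = Counter()   # how often each state has been seen

    def shaped_reward(self, state, extrinsic_reward):
        self.visits[state] += 1
        bonus = self.beta / math.sqrt(self.visits[state])
        return extrinsic_reward + bonus
```

With `beta=1.0`, the first visit to a state earns a bonus of 1.0 and the fourth visit earns 0.5, so the agent is paid to keep seeking out unfamiliar ground even when the game itself awards nothing.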

“We hypothesize that a major weakness of current intrinsic motivation algorithms is detachment,” wrote Uber’s researchers, “wherein the algorithms forget about promising areas they have visited, meaning they do not return to them to see if they lead to new states.”

Uber’s AI research team in San Francisco developed a new type of reinforcement learning to overcome the challenge.

The researchers call their approach ‘Go-Explore’. With it, the AI returns to a previously visited task or area to assess whether it yields a better result. Supplementing the approach with human knowledge to guide the AI towards notable areas sped up its progress dramatically.
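The return-then-explore idea can be sketched in a few lines. What follows is a heavily simplified illustration, not Uber's implementation. The toy corridor, the uniform choice of which remembered area to revisit, and all function names and parameters are assumptions. In this sketch a 'better result' simply means a shorter route to the same spot.

```python
import random


def step(position, action, length):
    """Hypothetical toy corridor: deterministic moves, reward only at the far end."""
    new_position = min(max(position + action, 0), length)
    reward = 1.0 if new_position == length else 0.0
    return new_position, reward


def replay(actions, length):
    """Return to a remembered area by replaying the actions that reached it."""
    position, score = 0, 0.0
    for action in actions:
        position, reward = step(position, action, length)
        score += reward
    return position, score


def go_explore_sketch(length=10, iterations=2000, explore_steps=10, seed=0):
    rng = random.Random(seed)
    archive = {0: []}  # visited area -> shortest known action sequence to it
    for _ in range(iterations):
        if length in archive:
            break  # the rewarding end of the corridor has been found
        # Pick a remembered area (uniformly here, for simplicity).
        cell = rng.choice(list(archive))
        actions = list(archive[cell])
        # Go: return to that area before doing anything new.
        position, _ = replay(actions, length)
        # Explore: take random actions from there, remembering what is found.
        for _ in range(explore_steps):
            action = rng.choice((-1, 1))
            position, reward = step(position, action, length)
            actions.append(action)
            if position not in archive or len(actions) < len(archive[position]):
                archive[position] = list(actions)
            if reward > 0:
                break
    return archive
```

Because every discovered area stays in the archive, nothing promising is forgotten, and the agent can always pick up exploring from where an earlier attempt left off.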

If nothing else, the research provides some comfort that us feeble humans are not yet fully redundant, and that the best results will be attained by working hand-in-binary with our virtual overlords.

