The ultimate achievement for some in the artificial intelligence industry is the creation of a system with artificial general intelligence (AGI): the ability to understand and learn any task a human can perform. AGI has long been relegated to the realm of science fiction. Its proponents believe it would yield systems capable of reasoning, planning, learning, representing knowledge, and communicating in natural language.
Not every expert is convinced that AGI is a realistic goal, or even possible. But one could argue that DeepMind, the Alphabet-backed research lab, took a step in that direction this week with the release of an AI system called Gato.
DeepMind describes Gato as a “general-purpose” system, one that can be taught to perform many different kinds of tasks. Researchers at DeepMind trained Gato to complete 604 tasks, including captioning images, engaging in dialogue, stacking blocks with a real robot arm, and playing Atari games.
Jack Hessel, a research fellow at the Allen Institute for Artificial Intelligence, points out that a single AI system that can solve multiple problems is not new. For example, Google recently began using a system in Google Search called the Multitask Unified Model, or MUM, which can process text, images, and video to perform tasks ranging from finding cross-language variations in a word’s spelling to relating a search query to an image. What is potentially new here, according to Hessel, is the diversity of the tasks and the training method.
“We’ve seen evidence before that single models can handle surprisingly diverse sets of inputs,” Hessel told TechCrunch in an email. “In my opinion, the core question when it comes to multitask learning … is whether the tasks complement each other or not. You could imagine a more boring case where the model implicitly separates the tasks before solving them, for example: ‘If I detect task A as input, I will use subnetwork A. If I detect task B instead, I will use a different subnetwork B.’ Under that null hypothesis, similar performance could be achieved by training A and B separately, which is not impressive. If, on the other hand, training A and B together leads to improvement in one of them (or both!), then things become more exciting.”
Like all AI systems, Gato learned by example, ingesting billions of words, images from real-world and simulated environments, button presses, joint torques, and more in the form of tokens. These tokens represented the data in a form Gato could process, allowing the system to, for example, work out the mechanics of Breakout or determine which combinations of words in a sentence make grammatical sense.
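To make the idea of “everything as tokens” more concrete, here is a minimal sketch of how text and continuous robot signals could be flattened into one integer sequence. It loosely follows the scheme DeepMind’s Gato paper describes (mu-law companding of continuous values, discretization into bins placed after the text vocabulary), but the constants, the function names, and the byte-level `tokenize_text` stand-in are assumptions for illustration, not DeepMind’s code.

```python
import math
from typing import List

# Assumed constants for illustration (loosely modeled on the Gato paper's description)
TEXT_VOCAB_SIZE = 32_000  # text token ids occupy [0, 32000)
NUM_BINS = 1_024          # continuous values map to ids in [32000, 33024)
MU, M = 100.0, 256.0      # mu-law companding parameters (assumed)


def mu_law(x: float) -> float:
    """Compress a real value toward [-1, 1], giving small magnitudes finer resolution."""
    return math.copysign(math.log(abs(x) * MU + 1.0) / math.log(M * MU + 1.0), x)


def tokenize_continuous(values: List[float]) -> List[int]:
    """Map continuous values (e.g. joint torques) to integer ids after the text vocabulary."""
    tokens = []
    for v in values:
        y = min(max(mu_law(v), -1.0), 1.0)       # clip to [-1, 1]
        b = int((y + 1.0) / 2.0 * NUM_BINS)      # uniform bin index
        b = min(b, NUM_BINS - 1)                 # y == 1.0 belongs in the last bin
        tokens.append(TEXT_VOCAB_SIZE + b)       # shift past the text id range
    return tokens


def tokenize_text(text: str) -> List[int]:
    """Hypothetical stand-in for a real subword tokenizer: one id per UTF-8 byte."""
    return list(text.encode("utf-8"))


# One flat sequence mixing an instruction with a robot's joint torques
sequence = tokenize_text("stack") + tokenize_continuous([0.0, 0.5, -0.5])
print(sequence)
```

Because text ids and continuous-value ids occupy disjoint ranges, a single transformer can read both kinds of data from the same sequence without a separate input pathway for each.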
Gato does not necessarily perform these tasks well. For example, when chatting with a person, the system often responds with a superficial or factually incorrect answer (for example, “Marseille” in response to “What is the capital of France?”). When captioning photos, Gato misgenders people. And the system correctly stacks blocks with a real robot only 60% of the time.
But on 450 of the 604 tasks mentioned above, DeepMind claims that Gato performs better than an expert more than half the time.
“If you believe we need general [systems], which a lot of people in the field of artificial intelligence and machine learning do, then [Gato is] a big deal,” Matthew Guzdial, an assistant professor of computer science at the University of Alberta, told TechCrunch via email. “I think people who say this is a major step toward AGI are overhyping it somewhat, because we are still not at the level of human intelligence and likely won’t get there soon (in my opinion). I personally am more in the camp of many small models [and systems] being more useful, but these general models definitely have advantages in terms of their performance on tasks outside their training data.”
Curiously, from an architectural point of view, Gato does not differ much from many modern AI systems. It shares characteristics with OpenAI’s GPT-3 in that it is a “transformer.” Since 2017, the transformer has become the architecture of choice for complex reasoning tasks, demonstrating the ability to summarize documents, compose music, classify objects in images, and analyze protein sequences.
Perhaps even more remarkably, Gato is orders of magnitude smaller than single-task systems, including GPT-3, in terms of parameter count. Parameters are the parts of a system learned from training data, and they essentially determine the system’s skill at a problem such as generating text. Gato has just 1.2 billion, while GPT-3 has more than 170 billion.
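To put the size gap in concrete terms, here is a quick back-of-the-envelope calculation using the figures above. It is only a sketch that treats GPT-3’s “more than 170 billion” as exactly 170 billion, so the true ratio is somewhat larger.

```python
import math

GATO_PARAMS = 1.2e9   # Gato: 1.2 billion parameters
GPT3_PARAMS = 170e9   # GPT-3: "more than 170 billion", taken here as a lower bound

ratio = GPT3_PARAMS / GATO_PARAMS   # how many times larger GPT-3 is
orders = math.log10(ratio)          # the same gap expressed in powers of ten
print(f"{ratio:.0f}x, about {orders:.2f} orders of magnitude")
```

On these numbers GPT-3 is roughly 140 times larger, a little over two orders of magnitude.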
DeepMind researchers deliberately kept Gato small so that the system could control a robot arm in real time. But they hypothesize that, if scaled up, Gato could handle any “task, behavior, and embodiment of interest.”
Even if that holds true, several other hurdles would need to be overcome before Gato could outperform state-of-the-art single-task systems on specific tasks. One is Gato’s inability to learn continually. Like most transformer-based systems, Gato’s knowledge of the world is grounded in its training data and remains fixed. If you ask Gato a date-sensitive question, such as who the current President of the United States is, chances are it will answer incorrectly.
The transformer, and by extension Gato, has another limitation: its context window, or the amount of information the system can “remember” in the course of a given task. Even the best transformer-based language models cannot write a long essay, let alone a book, without forgetting key details and losing track of the plot. This forgetting happens in any task, whether writing or controlling a robot, which is why some experts have called it the Achilles’ heel of machine learning.
“It’s not that Gato makes new things possible,” Guzdial added, pointing to the system’s shortcomings. “[B]ut it makes it clear that we can do more than we thought with today’s machine learning models.”
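A toy model of a fixed context window shows why early details get lost. The sketch below assumes a hypothetical window of 8 word-level tokens (real models use far larger windows and subword tokens) and keeps only the most recent tokens as a stream is read, which is a simplification of how a transformer’s attention is limited to its input length.

```python
from collections import deque
from typing import Deque, List

CONTEXT_WINDOW = 8  # hypothetical window size, far smaller than any real model's


def visible_context(stream: List[str], window: int = CONTEXT_WINDOW) -> List[str]:
    """Return the tokens a fixed-window model can still 'see' after reading the stream."""
    buffer: Deque[str] = deque(maxlen=window)  # oldest tokens fall off the left end
    for token in stream:
        buffer.append(token)
    return list(buffer)


story = "the hero named Ada left home and after many long years she finally returned".split()
context = visible_context(story)
print(context)
print("Ada" in context)  # the protagonist's name has scrolled out of view
```

By the end of the 14-word story, the hero’s name has dropped out of the 8-token window, so nothing the model generates next can refer back to it: a miniature version of losing track of the plot.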
Credit: techcrunch.com