Research on machine learning and artificial intelligence, now a key technology in practically every industry and company, is far too voluminous for anyone to read in full. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers, particularly in (but not limited to) artificial intelligence, and explain why they matter.
In this batch of recent research, Meta unveiled a language model that it claims is the first capable of translating 200 different languages with "state of the art" results. Not to be outdone, Google detailed Minerva, a machine learning model that can solve quantitative reasoning problems, including mathematical and scientific questions. And Microsoft released GODEL, a language model for generating "realistic" conversations along the lines of Google's widely publicized LaMDA. We also have some new text-to-image generators with a twist.
Meta's new model, NLLB-200, is part of the company's No Language Left Behind initiative to develop machine translation capabilities for most of the world's languages. Trained to understand languages such as Kamba (a Bantu language spoken in Kenya) and Lao (the official language of Laos), as well as more than 55 African languages poorly supported or unsupported by previous translation systems, NLLB-200 will be used to translate content in Facebook's News Feed and on Instagram, in addition to the Wikimedia Foundation's Content Translation Tool, Meta recently announced.
AI translation can scale significantly, and it already has, in terms of the number of languages it can translate without human intervention. But as some researchers point out, AI-generated translations can contain errors of terminology, omission, and mistranslation, because the systems are trained mainly on data from the internet, not all of which is high-quality. For example, Google Translate once assumed that doctors were male and nurses were female, while Bing's translator rendered phrases like "the table is soft" with the feminine "die Tabelle" in German (which refers to a table of numbers rather than a piece of furniture).
For NLLB-200, Meta said it "completely redesigned" its data-cleaning pipeline with "major filtering steps" and toxicity-filtering lists for the full set of 200 languages. How well this works in practice remains to be seen, but, as the Meta researchers behind NLLB-200 acknowledge in an academic paper describing their methods, no system is ever completely free of bias.
GODEL, similarly, is a language model trained on a huge amount of text from the internet. Unlike NLLB-200, however, GODEL was designed for "open" dialogue: conversations on a wide variety of topics.
GODEL might answer a question about a restaurant or hold a back-and-forth conversation on a specific topic, such as a neighborhood's history or a recent sports match. Usefully, like Google's LaMDA, the system can draw on content from around the web that was not part of its training dataset, including restaurant reviews, Wikipedia articles, and other content on public websites.
But GODEL faces the same pitfalls as NLLB-200. In a paper, the team responsible for its creation notes that it "may generate harmful responses" owing to "forms of social bias and other toxicity" in the data used to train it. Eliminating, or even mitigating, these biases remains an unsolved problem in the field of AI, and one that may never be fully resolved.
Google's Minerva model is potentially less problematic. As the team behind it describes in a blog post, the system was trained on a 118GB dataset of scientific papers and web pages containing mathematical expressions in order to solve quantitative reasoning problems without external tools such as a calculator. Minerva can generate solutions involving numerical calculations and "symbolic manipulation," achieving top performance on popular STEM benchmarks.
Minerva is not the first model designed to tackle such problems. Alphabet's DeepMind has demonstrated several algorithms that can help mathematicians with complex and abstract problems, and OpenAI has experimented with a system trained to solve grade-school math problems. But according to the team, Minerva incorporates recent techniques for better solving mathematical questions, including an approach that involves "prompting" the model with several step-by-step solutions to existing questions before presenting it with a new one.
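That prompting technique is easy to illustrate: worked examples are concatenated ahead of the new question, and the model is left to complete the final "Solution:" slot. Below is a minimal sketch of the idea; the example questions, the prompt format, and the `build_few_shot_prompt` helper are all invented for illustration, not Minerva's actual pipeline.

```python
# Sketch of few-shot, step-by-step prompting: the model sees worked
# solutions before the new question. Examples are made up for illustration.

WORKED_EXAMPLES = [
    {
        "question": "What is 15% of 80?",
        "solution": "15% means 15/100 = 0.15. 0.15 * 80 = 12. The answer is 12.",
    },
    {
        "question": "A car travels 60 km in 1.5 hours. What is its speed?",
        "solution": "Speed = distance / time = 60 / 1.5 = 40. The answer is 40 km/h.",
    },
]

def build_few_shot_prompt(new_question: str) -> str:
    """Concatenate worked examples, then the new question, leaving the
    final 'Solution:' slot open for the model to complete step by step."""
    parts = []
    for ex in WORKED_EXAMPLES:
        parts.append(f"Question: {ex['question']}\nSolution: {ex['solution']}\n")
    parts.append(f"Question: {new_question}\nSolution:")
    return "\n".join(parts)

prompt = build_few_shot_prompt("What is 20% of 45?")
print(prompt)
```

Because the worked examples show their reasoning step by step, the model is nudged toward producing a derivation rather than guessing a final answer outright.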
Minerva still makes its fair share of mistakes, and sometimes it arrives at the correct final answer through flawed reasoning. Still, the team hopes it will serve as a basis for models that "help push the boundaries of science and education."
The question of what AI systems actually "know" is more philosophical than technical, but how they organize that knowledge is a fair and relevant one. For example, an object recognition system may show that it "understands" that house cats and tigers are similar in some ways by letting the concepts overlap deliberately in how it identifies them, or it may not really understand that at all, with the two types of creature completely unrelated as far as it is concerned.
Researchers at UCLA wanted to find out whether language models "understand" words in this sense, and developed a method called "semantic projection" that suggests the answer is yes. While you can't simply ask a model to explain how and why a whale differs from a fish, you can look at how closely it associates those words with other words such as mammal, large, scales, and so on. If whale associates strongly with mammal and large but not with scales, you know the model has a decent idea of what it's talking about.
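The intuition can be sketched with toy word vectors and cosine similarity. The 4-dimensional embeddings below are invented numbers for demonstration only; real models use learned embeddings with hundreds of dimensions, and the semantic projection method itself projects words onto feature axes rather than comparing raw similarities.

```python
# Toy illustration: how strongly does "whale" associate with "mammal"
# versus "scales"? The vectors and their dimensions are invented.
import math

EMBEDDINGS = {
    # invented dims: [mammal-ness, size, aquatic, scaly]
    "whale":  [0.9, 0.95, 0.9, 0.05],
    "fish":   [0.1, 0.3, 0.9, 0.9],
    "mammal": [1.0, 0.5, 0.2, 0.0],
    "scales": [0.0, 0.2, 0.5, 1.0],
}

def cosine(a, b):
    """Cosine similarity: dot product normalized by vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# With these toy vectors, whale lands closer to "mammal" than to "scales".
print(cosine(EMBEDDINGS["whale"], EMBEDDINGS["mammal"]) >
      cosine(EMBEDDINGS["whale"], EMBEDDINGS["scales"]))
```

The same comparison, run against a real model's embeddings, is the kind of probe the UCLA work builds on.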
As a quick example, they found that animals corresponded with the concepts of size, gender, danger, and wetness (the selection was a bit odd), while names corresponded with weather, wealth, and partisanship. Animals are nonpartisan and names are genderless, so it all tracks.
There is currently no more reliable way of telling whether a model understands certain words than asking it to draw them, and text-to-image models keep improving. Google's Pathways Autoregressive Text-to-Image model, or Parti, looks like one of the best yet, but it's hard to compare it with the competition (DALL-E and others) without the kind of access that few of these models offer. At any rate, you can read about the Parti approach here.
One interesting aspect of Google's write-up is how the model performs as the number of parameters grows. See how the image gradually improves as the count increases:
Does this mean that the best models will all have tens of billions of parameters, meaning they will take ages to train and run only on supercomputers? For now, sure; this is a somewhat brute-force approach to making things better, but the tick-tock of AI progress means the next step isn't simply to make models bigger and better, but to make them smaller and just as capable. We'll see who can pull that off.
Not to be left out, Meta also showed off a generative AI model this week, one it claims gives artists more control over the output. Having played with these generators a lot myself, I can say part of the fun is seeing what they come up with, but they often produce nonsensical compositions or fail to "understand" the prompt. Meta's Make-A-Scene aims to fix that.
It's not an entirely original idea: you draw a basic silhouette of what you're describing, and the model uses that as a foundation for generating an image on top of it. We saw something similar in 2020 with Google's nightmare generator. This is the same concept but scaled up, allowing it to create realistic images from text prompts using a sketch as a base, with plenty of room for interpretation. It could be useful for artists who have a general idea of what they're picturing but want to fold in the model's boundless, weird creativity.
Like most of these systems, Make-A-Scene isn't actually available for public use, because, like the others, it's quite computationally expensive. Don't worry, we'll get decent at-home versions of these things soon enough.
Credit: techcrunch.com