Among the announcements made for Google’s 20th anniversary last September, Danny Sullivan revealed that Google had been using “neural matching” in its algorithm for several months. After the recent March 12 update, some suggested that this technology played a role in the ranking upheavals observed (which, incidentally, Google has denied).
But what is “neural matching”? We will see that it is anything but a revolution (it is more of an evolution) in the techniques used by search engines, and we will point out the misinterpretations I have noticed here and there among SEOs.
Neural matching: an avatar of well-known techniques
“Neural matching” is a loose term that Google is probably using to blur the picture. But it clearly alludes to the use of neural networks for a task Danny Sullivan referred to as “ad hoc retrieval.”
“Ad hoc retrieval” is the scientific name for a particular task that search engines like Google must perform to build a results page: extracting from the index a list of web pages ordered by their relevance to a query.
The classic solution is to compute similarity between the query and the documents, based on the terms contained in the typed query and in the web pages. But this method, which works well enough to have been used in engines from the 1960s until today, has two drawbacks:
- since queries contain few terms, the results judged “similar” are not all highly relevant
- the method does not allow web pages that are correct answers to rise to the top of the rankings if they do not actually contain the terms of the query
A classic example: the two texts in the figure have the same similarity score (Salton’s cosine) on the query “Albuquerque”, yet the first is relevant and the second is not. Neural information retrieval methods make it possible to capture the latent (underlying, hidden) meaning, and to identify text A as the only relevant one.
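The limitation described by this example is easy to reproduce. Below is a minimal sketch of Salton’s cosine on bag-of-words vectors; the two sample texts are hypothetical stand-ins for those of the figure (the originals are not reproduced in this article), built so that both mention “Albuquerque” once in documents of equal length:

```python
from collections import Counter
from math import sqrt

def cosine(a: str, b: str) -> float:
    """Salton's cosine similarity between two bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = sqrt(sum(c * c for c in va.values()))
    nb = sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Both documents contain the query term exactly once, in nine unique words,
# so a purely term-based score cannot tell them apart.
text_a = "albuquerque is a city in new mexico united states"    # relevant
text_b = "she boarded in albuquerque a plane bound for london"  # irrelevant

query = "albuquerque"
print(cosine(query, text_a))  # 1/3
print(cosine(query, text_b))  # 1/3 as well: identical score
```

Both scores come out equal, even though only the first text is actually about Albuquerque, which is exactly the failure mode neural methods aim to fix.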
Information retrieval researchers have therefore been looking for more effective alternatives for many years. They thought they had a breakthrough in the early 2000s with the LDA (Latent Dirichlet Allocation) and LSI (Latent Semantic Indexing) approaches, but these did not yield solid applications for ad hoc retrieval. It took the revolution brought by applying neural networks to linguistic problems for the first integrations to take shape, in particular with the “word embeddings” method found in RankBrain, a component of Google’s algorithm since 2016.
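The core idea of word embeddings can be sketched with a few hand-made vectors. The values below are purely illustrative (real embeddings are trained on large corpora and have hundreds of dimensions), but they show the key property: words with related meanings get nearby vectors, so a query term can match a document term it never shares a single character with:

```python
import math

# Tiny hand-made "embeddings" (illustrative values, not trained):
# semantically close words are given nearby vectors.
emb = {
    "strange": [0.10, 0.90, 0.00],
    "weird":   [0.12, 0.85, 0.05],
    "banana":  [0.00, 0.10, 0.95],
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# "weird" never appears in a page that says "strange", yet the vectors match:
print(cos(emb["strange"], emb["weird"]))   # high (close to 1)
print(cos(emb["strange"], emb["banana"]))  # low
```

This is what lets an engine relate a query word to a document that expresses the same idea with different vocabulary, something a term-matching score structurally cannot do.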
What Google describes as neural matching is simply the result of the evolving state of the art: today we know better how to analyze the “meaning” of the terms contained in queries as well as in web pages. In practice, current methods analyze the two separately and then combine the results. We also know better how to merge the signals produced by neural-network-based methods with those produced by conventional methods, in order to obtain more relevant results pages than before.
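One simple way to combine the two families of signals is a weighted blend of a classical term-based score and a neural semantic score. This is an assumed, illustrative scheme, not Google’s actual formula (which is not public); the document names and scores below are hypothetical:

```python
# Hypothetical per-document scores: (classical term match, neural semantic match)
docs = {
    "page_about_soap_opera_effect": (0.10, 0.90),  # few query terms, right meaning
    "page_stuffed_with_keywords":   (0.80, 0.20),  # exact terms, wrong meaning
}

def combined(classical: float, neural: float, alpha: float = 0.4) -> float:
    # alpha weights the classical signal; an illustrative blend only.
    return alpha * classical + (1 - alpha) * neural

ranked = sorted(docs, key=lambda d: combined(*docs[d]), reverse=True)
print(ranked)  # the semantically relevant page ranks first
```

With this weighting, the page that matches the query’s meaning outranks the page that merely repeats its words, which mirrors the behavior the article describes.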
Danny Sullivan recently clarified the difference between RankBrain and neural matching: RankBrain helps Google better relate pages to concepts, while neural matching helps Google better relate words to searches.
Both approaches belong to a new discipline in the science behind search engine ranking algorithms, which researchers call “neural information retrieval.”
What does this change concretely?
Integrating neural IR in general, and neural matching in particular, into the algorithm has two important consequences for SEO.
- it is no longer mandatory for a page’s content to contain the query’s keywords in order to rank at the top of the results. Danny Sullivan cited a striking example: for the query “why does my tv look strange”, pages appear that discuss the “soap opera effect” (an effect produced by ultra-high-definition TVs that makes a film from the 70s look as if it were shot on video!)
- more generally, because neural-network-based methods rely heavily on context, extracting the precise meaning of words within that context, stuffing a page with poorly chosen synonyms works much less well than it did with classical methods