Multimodal Search Requires Entirely New Input From Users

This is the news coming out of Google's recent "Search On" event. One overriding theme was "context is king". Yes, you have heard something like it before - on the web, "content is king". It is interesting how these phrases metamorphose, with some tell-tale signs left over. Jokes aside, the company is now bringing its new Multitask Unified Model (MUM) machine learning technology into search. Google hopes to kick off a virtuous cycle: it will provide more detailed and context-rich answers, and in return it hopes users will ask more detailed and context-rich questions. The end result, the company hopes, will be a richer, deeper search experience. Note the phrase "virtuous cycle", not to be confused with "vicious circle". The former is an upward spiral of potential, where each success garners more resources, which in turn allow greater and greater successes. In other words, the more we are drawn into Google's ecosystem, the less room we have to maneuver or seek alternatives. A vicious circle, on the other hand, compounds the negative and drives you bonkers - just an observation, mine of course.

Google SVP Prabhakar Raghavan repeatedly stated that "search is not a solved problem." That may be true, but the problems he and his team are trying to solve now have less to do with wrangling the web and more to do with adding context to what they find there.

Google, for its part, is going to begin flexing its ability to recognize constellations of related topics using machine learning and present them to you in an organized way. A coming redesign of Google search will begin showing “Things to know” boxes that send you off to different subtopics. When there’s a section of a video that’s relevant to the general topic — even when the video as a whole is not — it will send you there. Shopping results will begin to show inventory available in nearby stores, and even clothing in different styles associated with your search.

For the consumer — and that is you — Google is offering, though perhaps “asking” is a better term, new ways to search that go beyond the text box. It’s making an aggressive push to get its image recognition software Google Lens into more places. It will be built into the Google app on iOS and also the Chrome web browser on desktops. And with MUM, Google is hoping to get users to do more than just identify flowers or landmarks, but instead use Lens directly to ask questions and shop.

“It’s a cycle that I think will keep escalating,” Raghavan says. “More technology leads to more user affordance, leads to better expressivity for the user, and will demand more of us, technically.”

Google's drive to capture and configure both sides of the search equation is meant to kick-start the next stage of Google search, one where its machine learning algorithms become more prominent in the process by organizing and presenting information directly. In this, Google's efforts will be helped hugely by recent advances in AI language processing. Thanks to systems known as large language models (MUM is one of these), machine learning has gotten much better at mapping the connections between words and topics. It’s these skills that the company is leveraging to make search not just more accurate, but more explorative and, it hopes, more helpful.
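The "mapping connections between words and topics" idea can be made concrete with a toy sketch. Real language models learn embeddings with thousands of dimensions from web-scale text; the hand-made three-dimensional vectors and the vocabulary below are purely hypothetical, but the cosine-similarity mechanics are the same:

```python
import math

# Toy "embeddings" (hand-made for illustration; real models learn
# thousands of dimensions from web-scale text).
embeddings = {
    "bicycle":    [0.9, 0.1, 0.0],
    "derailleur": [0.8, 0.2, 0.1],
    "flower":     [0.1, 0.9, 0.3],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 = more related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# "bicycle" sits far closer to "derailleur" than to "flower" in this space,
# which is the kind of signal a system can use to surface related subtopics.
bike_gear = cosine_similarity(embeddings["bicycle"], embeddings["derailleur"])
bike_flower = cosine_similarity(embeddings["bicycle"], embeddings["flower"])
print(bike_gear > bike_flower)  # True
```

In this sketch, grouping every word by its nearest neighbors in the embedding space is the bare-bones version of recognizing "constellations of related topics".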

One of Google’s examples is instructive. You may not have the first idea what the parts of your bicycle are called, but if something is broken you’ll need to figure that out. Google Lens can visually identify the derailleur (the gear-changing part hanging near the rear wheel), and rather than just giving you that discrete piece of information, it will let you ask questions about fixing that thing directly, taking you to the information (a YouTube video, of course).

The push to get more users to open up Google Lens more often is fascinating on its own merits, but the bigger picture (so to speak) is about Google’s attempt to gather more context about your queries. More complicated, multimodal searches combining text and images demand “an entirely different level of contextualization that we the provider have to have, and so it helps us tremendously to have as much context as we can,” Raghavan says.

We are already very far from the so-called “ten blue links” of search results that Google once provided. It has been showing information boxes, image results, and direct answers for a long time now. Today’s announcements are another step, one where the information Google provides is not just a ranking of relevant information but a distillation of what its machines understand by scraping the web.

In some cases, as with shopping, that distillation means you’ll likely be sending Google more page views. As with Lens, that trend is important to keep an eye on: Google searches increasingly push you to Google’s own products. But there’s a bigger danger here, too. The fact that Google is telling you more things directly increases a burden it’s always had: to speak with less bias.

By that, I mean bias in two different senses. The first is technical: the machine learning models that Google wants to use to improve search have well-documented problems with racial and gender biases. They’re trained by reading large swaths of the web and, as a result, tend to pick up nasty ways of talking. Google’s troubles with its AI ethics team are also well documented. As Google’s VP of search, Pandu Nayak, said at the event, Google knows that all language models have biases, but the company believes it can avoid “putting it out for people to consume directly.”