I needed to find an old email the other day. That’s it. Simple. But I dreaded the prospect, because I use Gmail. And the search function in Gmail, as millions of users know from bitter personal experience, makes it almost impossible to find what you’re looking for.
You hunt for a flight confirmation number; you get every newsletter from the frequent-flyer program. Search by sender’s name, and you get only the most recent few days of emails from them — if you get anything relevant at all. Search for an attachment, and you can’t tell which message actually has the attachment or which ones are just replies. I’d laugh out loud if I didn’t have a headache from banging my head against my desk. How is it possible that the company that makes the best, most robust technology for searching the internet also makes an email product in which the search function doesn’t actually function?
But the truth is, Gmail isn’t an outlier when it comes to search. Apple’s Spotlight search often coughs up no results for specific documents or files I need; search in the Finder finds too many. The screens of Google Maps and Apple Maps are so cluttered with features that it’s hard to see the map, especially on mobile. Amazon literally shows you things you did not ask for, followed by its own knockoffs, before taking a stab at locating what you typed into the search bar. Instagram doesn’t have image search. Searching for a specific tweet you remember, even by the handle of the tweeter? Good luck.
So my question is this: Why is search so bad? Solving how to search for things was the key to the web’s integration into mainstream life, the thing that moved the internet out of university basements and into our pockets. Now, it seems as if our ability to locate and retrieve information is getting worse instead of better, right at the moment when true facts are humanity’s most precious commodity. When we moved into the digital age, we made a collective decision to store almost everything we know — even our most personal and intimate memories — outside our brains. At this point, search is memory. And when we all use the same slightly broken tools for recall, we’re at risk of forgetting ourselves.
To understand how we got from there to here — from our neatly organized past to the hopelessly cluttered present — we need to understand that search comes in different flavors. Different kinds of information require different kinds of searches. But every form of search has one thing in common. To put it in technical terms, they all kind of suck.
Broadly speaking, there are two kinds of search. One is “known item,” where you have a specific fact or object or destination in mind, and you just need to know where it is. The other is “exploratory,” where you don’t know what you don’t know.
Email is a sort of special case. People often know a specific email exists, or who sent it, or when, but those criteria might also fit a bunch of other emails. People mostly want to find emails that are recent, from within the past month or so — except sometimes they don’t. People often remember a few key details, like who sent an email or what some of the words were, but sometimes they misremember. “The reason why search provided by individual providers, including Gmail, is often quite awful is that the underlying problem is quite hard,” says Sridhar Ramaswamy, a former Google executive who is now the CEO of Neeva, a search-engine startup.
When Google first arrived, it solved the search problem in several ingenious ways. The famous one, the one you’ve probably heard of, is called PageRank, which counted “inbound links,” i.e., the number of times other sites cited the same result as a source. PageRank gave you the answer that the rest of the internet thought was a good answer. But Google’s deeper power is in identifying what those pages are about and being able to associate the kinds of words you and I might search for with the stuff on those pages. And then there’s the index: Google’s regularly updated crawl of the entire internet, or a vast majority of it. Today the index surveys 100 million gigabytes of data, hundreds of billions of webpages. That scale gives Google a huge advantage: statistics on all the different things people search for, and the ways they do it.
Alas, most of that advantage evaporates in Gmail. Yes, there’s a lot of email in the world — according to a book on the problem by a bunch of Gmail engineers, people receive over 300 billion emails every day, if you count machine-generated stuff like receipts or notifications. That sounds like a big enough corpus of data to make Google-scale stats work. But email isn’t a collective thing like the web. Your email inbox is yours, and no matter how much email you have in there (read or unread, I don’t judge), it’s not enough to let a Google-type search engine function properly.
“The algorithms that Google uses to search news are not necessarily going to be effective,” Ian Ruthven, an information scientist at the University of Strathclyde, told me. “Even though it’s huge to you, it’s tiny, and the statistics don’t work as well.”
Plus, while Gmail is happy to help advertisers target you based on your email behavior, it doesn’t collect or share information about how people search their email. It’d be a privacy violation if it did. That means software engineers trying to build email search capabilities can’t easily draw on statistical commonalities. They can’t learn from the crowd. They have to rely on survey data, or anonymized usage data, or giant, stored repositories of email from dead companies. One of the biggest research archives turns out to be all the email sent inside Enron, the disgraced energy-arbitrage company.
“In web search, you have a collection of documents, the webpages, and that is shared for all users. If you search for something and click on a result, and then if I search for the same thing, Google can use your data, the clicks, to improve my search,” says Hamed Zamani, the associate director of the Center for Intelligent Information Retrieval at UMass Amherst. “Email, you have your collection of email, I have my own. The transfer of knowledge between clicks, or any feedback that Google gets from users, cannot be shared.”
Basically, email search is a massive coding problem spread out across millions of users. Trying to locate an email, ironically, may be the most solitary activity in the digital age — the only moment when we’re truly alone with our data.
Most websites, especially startups, don’t have the money or know-how to build their own search function. You can click the magnifying glass on a news site’s homepage, but it’s likely to cough up irrelevant articles, or ones that don’t reach back far enough in time. Same thing if you try to search social media: You’ll get pointed to lots of specific uses of your query words, but not necessarily from the user you actually want. And if the site has a more calibrated “advanced search” option, good luck finding it.
Google’s ubiquity has led us to assume that every horizontal box with a little magnifying glass on the side will function like a Google search. But most don’t. Internet giants like Amazon or Facebook spend lots of time and money on search functions, but smaller organizations can’t, or don’t. Many use off-the-shelf search software, products like Elasticsearch or Apache Lucene, and customize it a little. They’re solid products, but they don’t have the advantages of scale that Google does. And since most people will wind up using Google anyway, creating a custom search function just isn’t worth it for most companies. “It’s not the heart of the business,” says Doug Cutting, a retired search-engine builder who helped invent Lucene. “They tend not to invest.”
That also means that what Google has trained us to do — type keywords into the search bar, over and over, until we find what we’re looking for — won’t necessarily work on other sites. “When people develop these habits and then go somewhere else expecting the system to be just as effective, they’re often supremely disappointed,” says Chirag Shah, an information scientist at the University of Washington.
There’s a simple solution to the problem: Companies could give Google’s bots access to their websites. The algorithm would help customers find what they were looking for. But that would expose a company’s internal data — and the habits and behaviors of its users — to a Silicon Valley giant renowned for its ferocious competitive instincts. Letting Google handle your search means letting Google all up in your business — literally.
“Giving away the front door to your product puts you at incredible risk,” Ramaswamy says. “Facebook, Instagram, Twitter, Pinterest are exceptionally careful about what they will and will not let Google do. They’ve all learned there is zero incentive to just giving all their information to Google.”
Here’s where the problem is perhaps less technical than venal. Not every site wants to show you what you want to find. Say you want to buy something on Amazon, which once prided itself on using “users like you” recommendation filters and sophisticated ranking of results to display its wares. Today that website will literally show you other things first, followed by its own knockoff products, and then paid advertisements, before deigning to show you what you asked for.
After a couple of decades using Google, we’re all trained to assume that search results get ranked by relevance to our query. But the fact is, a website trying to sell something will always game its results to its benefit. The catch is, the options for searching a commerce site can’t completely suck, because then people won’t use it — you’d be forgiven for abandoning Home Depot for McMaster-Carr on this basis alone. A site trying to sell you something has to show just enough of what you want to buy, and just enough of what it wants to sell — that’s the green double-zero sweet spot that makes sure the house maintains its win margin. “I’ve actually worked at those places, so I know,” says Shah, the information scientist. “They have to balance what will increase their profit margin and what will give the user the sense they’re getting a good deal and relevant result.”
Streamers, run by big content creators like Disney, are like most websites: they don’t want Google to have access to data that could give away a competitive edge. So a simple Google query doesn’t always cough up particularly useful results from them. It’s also why the search function on Apple TV yields confusing results at best, because some streaming services won’t grant Apple access to their data. Why would a streaming service want its customers to get lost in a sea of results using Apple’s interface instead of directly searching its own?
As for why their own internal search functionality doesn’t work well — well, that goes back to gaming the results. Representatives for the streaming services I spoke with emphasized their focus on recommendation algorithms, which show you content based on what they can tell about your preferences from what you’ve already watched. That’s in part because a more straightforward search would show you that their libraries are finite. If you go searching for stuff that isn’t there, you might start thinking about subscribing to a different service. So they proactively show you “The Goonies,” which they have, before you start looking for “Gremlins,” which they don’t.
Recommendation algorithms based on past behavior are really just search engines where the queries are implicit. Show me other movies like this. That’s called a zero-query search, and if you’ve ever fallen down a YouTube or Instagram rabbit hole, you know the vibe. But in an oblique way, every search we do has a secret, implicit zero-query search embedded in it. We’re looking for something that scratches an intellectual or emotional itch — something that makes us feel better in some unarticulated way. That’s why algorithmic recommendations are so pernicious. They work! They give us what we want, confirm our suspicions, comfort us, and tell us we were right about what we already thought, even when that’s not what we need.
There’s a reason Google’s name is a verb. It’s not just the over-90% market share, or the unbeatably massive index, or even the speed with which it responds to queries. Maybe the days of the “10 blue links” (when the first page of results was reliably filled with the most relevant places to find the information you were looking for) are over. But it’s still the case that, for most searches, Google works.
Sure, there’s some tension between Google’s “editorial” product, the search results, and its advertisements. Every few months another article or report confirms that more and more Google results are pay-for-play, including plenty of straight-up spam, or grifts. After all, the whole point of SEO (search-engine optimization) is for sites to game their way to the top of Google. One way or another, most results you get on Google are the product of a concerted effort to win your attention.
Pandu Nayak, vice president of search at Google, says the ad-edit dynamic is a good-faith one: “If you talk to the ads team, they’re very focused on making sure that ads are actually helpful. Because they realize if they have unhelpful ads, that’s a recipe for people to learn to avoid them altogether.” Nevertheless, he adds, “search is by no means a solved problem.” Which, when you think about it, is quite a thing for a VP of search at Google to say.
One way to think of search — not just online or digital — is as an attempt to interact with any system of organized information. Whether that’s asking a librarian in Alexandria to fetch down a scroll, sending a clerk to thumb through files in a cabinet in the basement, heading into the stacks with a Dewey Decimal number, or typing keywords in the syntax of formal Boolean logic into a LexisNexis terminal at the reference desk, we’re forever trying to stare into an abyss filled with information and cajole it into telling us what it knows in a way that both it and we understand.
The people who first started thinking about how computers were going to work strongly implied that these new devices would solve both known-item and exploratory search. In 1945, Vannevar Bush, who headed up scientific research for the US government during World War II, said “associative indexes” — links, basically — would be the key to a desktop (well, desk-size) information-processing device he called a memex. The first guides to the web, in fact, were literal lists of websites. That’s what Yahoo originally was. “The idea was, you’d try to create a hierarchy of topics you navigate,” Nayak says. “It was a great way to organize the web when it was small. But it quickly became infeasible.” There was just too much internet.
Google figured out how to search through that vast bulk so quickly that users could simply do keyword queries over and over, until the right answer showed up. It didn’t matter whether you were looking for a known item (Is Michael Caine still alive?) or just exploring (What are Michael Caine’s best movies?). We use the same tool for both.
But over time, in Google’s drive to be big and comprehensive, it got worse and worse at finding stuff that was small or obscure. “They’re trying to serve most of the customers well most of the time,” says Ruthven, the information scientist. “If you’re part of the fatter part of the tail, you get better results. But if you’re doing an unusual search, or you have really unusual taste in music or something, you’ll get worse results.”
The vast majority of the internet is crap, or stuff hardly anyone cares about. Google mostly ignores all that, optimizing to a fraction of its indexed pages. “Right away, that’s a filter,” says Cutting, the search-engine builder. “As an optimization, they’ve just restricted what they’re searching.” And because most people don’t want what Cutting calls “esoteric shit,” Google winds up favoring the many over the few. “The feeling that search is dumbed down,” Cutting says, “is because search is a mainstream thing now.”
And that’s only going to become more of the norm. Google’s massive store of data has given it the ability to create software that can actually understand and produce what looks a lot like human speech. This kind of “large language model” means Google search interactions may come to look less like an exchange of keywords for links and more like an interaction with a librarian in the old days, a trade of questions and answers. But that will be illusory. Google’s algorithms will be able to answer queries in 75 languages, but those answers will still come from parts of the web in Google’s index that the company has determined to be “a high-quality subset.” The search bar will be easier to use, but the answers won’t be more right.
It’s tough to imagine a technical challenge to Google’s hegemony. So many of the brains and so much of the data that might fix search are swiped into the Googleplex. “If you look at who’s got the data,” Cutting says, “who’s got access to what people are actually searching for: if you’re an academic, you want to get an internship or job at a place like Google, and then get permission to publish, because they have all the resources you need. The cutting-edge work is going to be done at a place like Google or Microsoft or Yandex, and that’s unfortunate.”
Still, more than a half-dozen startups are hoping to come at the king. Some offer the ability to customize what and how you search, transparency that Google forgoes in favor of more and more direct answers to questions. Neeva, the Google competitor run by Ramaswamy, promises an ad-free experience that will search both the web and information on your own computer while protecting your privacy — for a subscription fee. Or you could enhance your Google search, as some search experts suggest, by using Google to search Reddit to find out what real humans say about your query. But you’d still be using Google.
A couple of months ago, a comic-book critic and historian I like tweeted a panel from an old Batman comic. It was a Silver Age meta thing showing Batman sitting in what looks like a library, paging through books and complaining that his editors at DC Comics are nagging him to pick his best stories for a compilation. “Editors are merciless men,” Batman says.
While I was working on this article, I thought it’d be funny to send that Batman panel to my own editor. But of course I couldn’t find the historian’s tweet. I used Twitter’s basic search function to combine the guy’s name with some language I remembered from the tweet. But the tweet still didn’t show up. I went to Twitter’s advanced search, did the same, added “editors are merciless men” and “Batman,” and still got nothing. Desperate (well, desperately procrastinating, since I was supposed to be writing my story), I opened up the writer’s Twitter feed and started scrolling. Zilch. When I got too far back in time, I stopped, befuddled.
On a hunch, I went to Google and typed in everything I remembered about the tweet except the writer’s name. And there it was, in the top 10 blue links. Turns out I’d misremembered who tweeted it. It was a different comic-book critic and historian, whom I also like. The problem was not search. It was me.
People expect Google or Bing or the magnifying glass on their computer to answer questions like the librarians of old — even complex, open-ended questions that generate complex, often-contradictory answers. “In some senses, the big search engines have trained us to behave in certain ways: short queries, mainly look superficially at the first page,” Ruthven says. “In an exploratory search scenario where you don’t know the vocabulary or domain, it’s not a good model for interacting with a search system.”
And when a search doesn’t produce the answer you’re looking for, what’s the most human thing to do? Keep asking endless variations of the same question, over and over, until you want to smash something. “We’ve seen in some of our studies that people will keep trying the same kinds of queries again and again, hoping it’ll yield the right results,” says Shah, the information scientist. “They’re not willing to change their behavior much.”
Search will always suffer from what we searchers know, or think we know — and what we don’t. Our own errant certainty, our mistaking unknown unknowns for known unknowns, puts limits on what we type into a search box. And because Google is pretty good at finding close to what we want from nothing more than a bag of misspelled keywords, we think we’re pretty good at searching. Any failures, we assume, must be on the other side of the screen. But search, by necessity, will always involve an interface between human and machine — a relationship, if you will.
So how do we fix our troubled interactions with search? Knowing that a healthy relationship is founded on open dialogue, I asked my search partner for suggestions.
“How do I fix our relationship?” I Googled.
“Face and embrace your differences,” Google replied.
Words to search by.
Adam Rogers is a senior correspondent at Insider.