Last week Google began including results from Twitter on their results page. The tweets are accessed through a timeline with a handle you can grab to scroll through results over time.

This is incredibly cool. At the same time, I can’t help noticing that while it presents a lot of information, it’s not immediately clear how to construct meaning from it.

Google talks about using the results to “’replay’ what people were saying publicly about a topic on Twitter.” That seems to describe the usage model pretty accurately: search, scroll through all results, and make of them what you will. It seems to lend itself to historical or anthropological purposes, rather than traditional search.

Here are some sample tweets returned by searching for “Obama”: This isn’t so great if you’re interested in policy, but it’s highly interesting if you’re investigating the Tea Party movement. Ditto for this result:

Up until now, if you were researching a group of people, you would search on the group’s name. With tweets, you really want to search on the topics the group publishes about. So this could change the average information consumer’s search strategies.

The Google Blog suggests this search to “relive” Shaun White’s Olympic glory. The idea of reliving it is interesting, because what’s being relived is not the actual moment, but the response of thousands of people to that moment.

(And, like everything else, it could really use semantic search to filter out stuff like this:)

To sum up: Twitter on Google is very cool. It will change the way we search, but right now not even Google knows a good way to use it. It dumps a huge amount of raw info on the searcher, and leaves it to the individual to navigate, sift, and construct meaning out of it.

But, it was only announced this week, and clever people are certainly already at work on innovative ways to build meaning out of the firehose that is the global tweetstream. A semantic search layer? Sentiment analysis? There’s a lot of possibility here.

By the time this posts, Google will probably have rolled this out worldwide. Have you tried it? What do you think?

Posted via email from Modelicious


A lot of applications claim to be “semantic”. In some cases it’s easy to understand why. For instance, Zigtag ties its tags back to a taxonomy, so it knows that the tags “New York”, “NYC” and “The Big Apple” refer to the same thing. And that’s kind of semantic-ish. True Knowledge is built around a sophisticated ontology that understands relationships as they change over time. That’s very semantic.

In other cases, it’s hard to understand where an app’s semantics are. As semantic search becomes more of a buzzword, the term “semantics” gets thrown around freely and, ironically for a word that means “meaning”, loses its meaning.

NetBase has generated a lot of excitement for what seems to be a truly semantic approach to search. They do part-of-speech analysis on the text of documents, then put the concepts they find into relationship with each other.
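To make that concrete, here’s a toy sketch of the general idea — parse a sentence and store the concepts as a relation triple. This is my own illustration under invented patterns and verbs, not NetBase’s actual pipeline:

```python
import re

# Toy relation extraction: split a sentence into subject / verb / object
# with a naive pattern, then return the triple. The verb list and the
# regex are hypothetical stand-ins for a real linguistic parser.
PATTERN = re.compile(
    r"^(?P<subj>[\w\s]+?)\s+(?P<verb>causes|treats|prevents)\s+(?P<obj>[\w\s]+?)\.?$"
)

def extract(sentence):
    """Return (subject, verb, object) if the sentence matches, else None."""
    m = PATTERN.match(sentence)
    return (m["subj"], m["verb"], m["obj"]) if m else None

print(extract("Smoking causes lung cancer."))
# ('Smoking', 'causes', 'lung cancer')
```

A real system would use a full grammatical parse rather than a regex, but the output shape — concepts in explicit relationships, not bags of keywords — is the part that makes an approach “semantic”.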

All good, right? But this week, NetBase launched HealthBase, a “health research showcase”. HealthBase was intended to show off their technology. Instead, it pointed up some really big holes in it that make me wonder if there’s anything semantic going on here at all.

TechCrunch has a good story about searches on HealthBase producing questionable results. The most glaring error: a query for “AIDS” returns “Jews” as one of the disease’s causes. The software then goes on to helpfully suggest salt and alcohol as ways to get rid of Jews.

Speaking as a Jew, this suggests all kinds of wildly inappropriate jokes. Ply me with alcohol and salt, and I’ll tell you a few. Leave me sober and not hypertensive, and I’ll point out that this is not actually a case of conspiracy theory run amok, but just some really bad algorithms.

NetBase’s take on the situation was interesting. This is from their response to TechCrunch:

This is an unfortunate example of homonymy, i.e. words that have different meanings.
The showcase was not configured to distinguish between the disease “AIDS” and the verb “aids” (as in aiding someone). If you click on the result “Jew” you see a sentence from a Wikipedia page about 7th Century history: “Hispano-Visigothic king Egica accuses the Jews of aiding the Muslims, and sentences all Jews to slavery. ” Although Wikipedia contains a lot of great health information it also contains non-health related information (like this one) that is hard to filter out.

This is a funny answer: this is the exact problem NetBase’s technology is supposed to solve. Pointing out that it’s hard to solve doesn’t win you any points — you’ve got to actually solve the problem for that.

I’ve got to question what’s going on under the hood here. Granted, natural language processing is far from perfect. But if you’re truly analyzing how words are used in a document, you should be able to tell the difference between the noun that refers to a disease and the verb that refers to helping someone. It’s just coincidence that these concepts are represented by two words with the same spelling. If that trips you up, you must be doing keyword matching. And keyword matching is good old Web 1.0 — we already have search built on it, so why would anyone fund or pay for this?

Reading between the lines, there are other disturbing implications about NetBase’s approach. They don’t seem to analyze the context of their sources: yes, Wikipedia contains a lot of non-health-related content, so don’t use it for your health knowledge base! They don’t seem to take into account how many times a statement was made: if Jews and AIDS appear together only once, consider it an outlier. And they don’t seem to take time into account: AIDS has only been around since the 1980s, so how could something that happened in the 7th century possibly be relevant?
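The middle of those three sanity checks — drop relations that are asserted only once — is a few lines of code. A sketch, with invented counts standing in for real extraction data:

```python
from collections import Counter

# Hypothetical extracted (cause, disease) pairs; the counts are invented
# for illustration, not real HealthBase data.
extracted = (
    [("HIV", "AIDS")] * 1200
    + [("unprotected sex", "AIDS")] * 300
    + [("Jews", "AIDS")]  # a single spurious parse of one Wikipedia sentence
)

def well_attested(pairs, min_count=5):
    # Keep only relations asserted at least min_count times across the
    # corpus; one-off extractions are likely parsing noise.
    counts = Counter(pairs)
    return {pair for pair, n in counts.items() if n >= min_count}

print(well_attested(extracted))
# The one-off "Jews" extraction is dropped; well-attested causes survive.
```

None of this requires deep semantics — just basic hygiene on the extracted relations before showing them to users.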

HealthBase has been “fixed” since the initial uproar, or at least fixed enough to not categorize Jews as an agent of disease (thanks, NetBase!). But given the general cluelessness about semantics in their response, you’ve got to wonder if the fix consisted of tuning their text analytics, or hacking a bunch of workarounds into their code.