Weirdly enough, this isn’t a rhetorical question. (That it’s not is one of the many things I love about my job.)

Lately I’ve been evaluating Proton, an upper ontology developed by SEKT, which is an EU initiative. Most of the ontologies I evaluate aren’t written by actual ontologists, which leads to a certain amount of ranting and despair on my part. Proton has been developed by a number of skilled ontologists and logicians, and it’s a pleasure to spend time with a well-thought-out model. Of course, modeling is an art, and I don’t agree with every modeling decision in Proton — but that’s okay, because my disagreements give me a lot of food for thought.

One idea that’s given me pause is whether or not notions of time and number are more abstract than other concepts in an ontology. In Proton’s model, classes generally descend from the Entity class, with the exception of a few system classes. Entity has three direct subclasses: Abstract, Happening, and Object. The full hierarchy looks like this:
[Figure: Proton top-level entities]

Thinking about it, I can’t come up with a single reason why the concept of time is more abstract than any other concept represented in an ontology. I understand that it’s modeling an abstraction — there’s no concrete thing in the world called Tuesday; it’s a concept in our calendar system. But there’s also not a concrete thing in the world represented by the class “Boston Marathon” or “US Currency” or “World Leader”. There are real instances of all those classes, but Tuesday March 9th is a real instance of the concept of Tuesday. So, is there a difference? 10 points to anyone who can explain it to me.

True Knowledge went into public beta last week. I’ve been playing around with the private beta for the better part of a year now, and there’s a lot I like about this system. So far, they haven’t received a fraction of the hype of some other knowledge bases (<cough> Wolfram|Alpha), but what they’re doing is more interesting, harder, and truly semantic.

One of the things I love about True Knowledge is that it exposes lineage. Run a query, click on the “How do we know this?” link at the bottom of the results, and you’ll see the facts and reasoning used to derive them. This visibility into the reasoning process should be standard operating procedure for any semantic application — without it, you have no way of assessing the quality of the information you’re getting. Lineage is noticeably absent from Wolfram|Alpha, which is one of my main complaints about it.
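To make the idea of lineage concrete, here is a minimal Python sketch. It is not True Knowledge's actual representation, which this post doesn't describe; the `Fact` class, the `explain` function, and the example facts are all hypothetical. It only shows the general shape: a derived fact keeps a pointer to the rule and the premises that produced it, so the reasoning can be replayed on request.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """A statement plus the lineage that produced it (hypothetical structure)."""
    statement: str
    rule: Optional[str] = None          # None means the fact was asserted directly
    premises: Tuple["Fact", ...] = ()   # facts the rule was applied to


def explain(fact: Fact, depth: int = 0) -> List[str]:
    """Return an indented "how do we know this?" trace for a fact."""
    label = f" (by {fact.rule})" if fact.rule else " (asserted)"
    lines = ["  " * depth + fact.statement + label]
    for premise in fact.premises:
        lines.extend(explain(premise, depth + 1))
    return lines


# Illustrative facts only, not taken from any real knowledge base.
in_london = Fact("Big Ben is in London")
in_england = Fact("London is in England")
derived = Fact(
    "Big Ben is in England",
    rule="transitivity of 'is in'",
    premises=(in_london, in_england),
)

trace = explain(derived)
print("\n".join(trace))
```

A system that stores only the final statement can't answer "how do we know this?" at all. One that stores the premises can, and it can also flag every derived fact that depends on a premise a user later disputes.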

Similarly, True Knowledge lets you agree or disagree with any fact in their knowledge base. You can edit existing facts, or contribute new ones. I like this because it means (a) there’s a model they’re computing over and (b) the model is extensible. And their UI is a great example of how to painlessly elicit complex information from end users.

True Knowledge is smartly done, model-driven, and really different from any other “semantic” system I’ve demoed to date (what with actually relying on semantics and all). It’s been interesting to watch True Knowledge evolve, and my hope is that they’ll not only succeed, but become the gold standard for semantic web apps to come.

Posted via email from Modelicious

Is the semantic web a memex?

December 31, 2009

I agree with mc schraefel that the semantic web needs a better metaphor, or really any metaphor, to help people understand and embrace it. I’m just not sure the memex is the right one. It’s not a concept that’s easily recognizable by most people. And I’m not convinced that it’s an accurate metaphor.

Central to Vannevar Bush’s original description of the memex are paths of association between items, the connection made between point a and point b. While ontologies and semantic web apps let us label the relationship between two things, I’ve yet to see an application that lets you capture the path that led you to make that connection.

So for instance Zotero lets me say Paper 1 is related to Paper 2, but not that I followed a link to a citation in paper 1, which led me to a Wikipedia page, which led me to Paper 2. Paper 2 and Paper 1 may have a generally meaningful relationship that any reader would recognize: a shared author, similar subject matter. Or their relationship may be meaningful only to me: there was some association I made along the path from Paper 1 to Paper 2 that may not matter to anyone else. However, that association — the dynamic path leading to the association, not the static association itself — may be a source of information or inspiration to me. Where is the system that lets me preserve it?
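The difference between a static association and the path behind it can be sketched as a data structure. This is a hypothetical Python illustration, not a description of Zotero or any existing tool; `Step`, `Trail`, and their methods are names invented for the example. The trail records each hop, and the static association is just what remains when the hops are thrown away.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One hop along a trail: where I was, where I went, and how I got there."""
    source: str
    target: str
    how: str


@dataclass
class Trail:
    """An ordered path of association (hypothetical structure)."""
    steps: List[Step] = field(default_factory=list)

    def follow(self, source: str, target: str, how: str) -> None:
        # Each hop must start where the previous one ended.
        if self.steps and self.steps[-1].target != source:
            raise ValueError("each step must start where the last one ended")
        self.steps.append(Step(source, target, how))

    def association(self) -> Tuple[str, str]:
        """Collapse the trail to the static (start, end) pair."""
        if not self.steps:
            raise ValueError("empty trail")
        return (self.steps[0].source, self.steps[-1].target)

    def path(self) -> List[str]:
        """Every item visited, in order."""
        if not self.steps:
            return []
        return [self.steps[0].source] + [s.target for s in self.steps]


trail = Trail()
trail.follow("Paper 1", "Wikipedia page", "followed a citation link")
trail.follow("Wikipedia page", "Paper 2", "followed a reference")

print(trail.association())  # the static association
print(trail.path())         # the path that produced it
```

Going from `path()` to `association()` loses information, and there is no way back. That is the argument for capturing the trail at the moment it is walked.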

To the best of my knowledge, that system doesn’t exist yet. Really, that’s not too surprising: we’re still working on representing the relationship between two things, let alone the evolution and lineage of that relationship. There are thorny semantic and user experience questions related to the larger project, especially when working across the boundaries of information systems, as the semantic web does (or will). But it’s a worthwhile goal, and we should make sure that we get there and aren’t satisfied with representing static associations. Why? Because doing so creates rich context that starts to approximate the kind of implicit context humans generate all the time. It grounds our machine representations in human notions of time. And it facilitates that mysterious capacity humans have of sparking new ideas by juxtaposing two apparently unconnected things.

So my answer to my own question at the top of this post — and to dr. schraefel — is: not yet. But maybe someday.

What I did at (info)camp

October 16, 2009

Last weekend I went to Seattle for InfoCamp, an unconference put together by a group of IA/UX/IxD/Library folks. I could only stay for the Saturday session, and due to my (ahem) directional challenges, I got there later than I would have liked. No matter: the day was absolutely inspiring. I spent it having great conversations with some very smart and creative people, and I’ll be chewing over the ideas they sparked for weeks to come.

So, it turns out Seattle is north of Portland these days. Weird. I missed the first half of Axel Roesler’s keynote, which was too bad. He has interesting things to say about design, creativity, and process. A designer is someone who predicts the future, he said, and illustrated with examples of some pretty radical rethinkings of airplane cockpit and wayfinding interactions.

I was pretty sure I had missed my calling until I followed up the keynote with a session on service design. Service design involves applying design skills to real-world user experience and information systems: for example, making the DMV experience work for the customer, or streamlining processes across agencies. Since I come from more of an information architecture than a user experience background, what appeals to me most is how it can be used to pull together a mishmash of accidental systems into a cohesive whole. But listening to a roomful of talented user experience practitioners made me realize how little I know about that world, and how much I should be learning about it and incorporating it into my work.

The service design conversation really took off. You could have spent your entire Saturday at various follow-up sessions, and the people who organized it have plans to find a service design project to do in Seattle. I was too tempted by the other offerings to specialize. I wound up at permaculture design for social media sites, and a brainstorming session on integrating taxonomies and social media into corporate intranets.

Both of these left me with a lot of ideas zinging around my brain, and I’ll post about some of them in the next few days. In the meantime, who wants to put together InfoCamp PDX? I don’t think I can wait a whole year to do this again.

A lot of applications claim to be “semantic”.  In some cases it’s easy to understand why. For instance, Zigtag ties its tags back to a taxonomy, so it knows that the tags “New York”, “NYC” and “The Big Apple” refer to the same thing. And that’s kind of semantic-ish.  True Knowledge is built around a sophisticated ontology that understands relationships as they change over time. That’s very semantic.
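The “semantic-ish” end of that spectrum is easy to sketch. The following Python is a minimal illustration of tying free-form tags back to a shared concept; it assumes nothing about how Zigtag actually works, and the `TAG_TO_CONCEPT` table and both function names are hypothetical.

```python
# Hypothetical synonym table: each known tag spelling maps to one canonical concept.
TAG_TO_CONCEPT = {
    "new york": "New York City",
    "nyc": "New York City",
    "the big apple": "New York City",
}


def normalize_tag(tag: str) -> str:
    """Return the canonical concept for a tag, or the trimmed tag if it is unknown."""
    key = " ".join(tag.lower().split())  # ignore case and extra whitespace
    return TAG_TO_CONCEPT.get(key, tag.strip())


def same_concept(a: str, b: str) -> bool:
    """True when two tags resolve to the same concept."""
    return normalize_tag(a) == normalize_tag(b)


print(same_concept("NYC", "The Big Apple"))
print(same_concept("NYC", "Boston"))
```

A lookup like this knows that three strings name one thing, and nothing more. It has no idea what kind of thing New York is or how it relates to anything else, which is roughly the gap between a taxonomy and an ontology.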

In other cases, it’s hard to understand where an app’s semantics are. As semantic search becomes more of a buzzword, the term “semantics” gets thrown around freely and, ironically for a word that means “meaning”, loses its meaning.

NetBase has generated a lot of excitement for what seems to be a truly semantic approach to search. They do part-of-speech analysis on the text of documents, then put the concepts they find into relationship with each other.

All good, right? But this week, NetBase launched HealthBase, a “health research showcase”. HealthBase was intended to show off their technology. Instead, it pointed up some really big holes in it that make me wonder if there’s anything semantic going on here at all.

TechCrunch has a good story about searches on HealthBase producing questionable results. The most glaring error: a query for “AIDS” returns “Jews” as one of the disease’s causes. The software then goes on to helpfully suggest salt and alcohol as ways to get rid of Jews.

Speaking as a Jew, this suggests all kinds of wildly inappropriate jokes. Ply me with alcohol and salt, and I’ll tell you a few. Leave me sober and not hypertensive, and I’ll point out that this is not actually a case of conspiracy theory run amok, but just some really bad algorithms.

NetBase’s take on the situation was interesting. This is from their response to TechCrunch:

This is an unfortunate example of homonymy, i.e. words that have different meanings.
The showcase was not configured to distinguish between the disease “AIDS” and the verb “aids” (as in aiding someone). If you click on the result “Jew” you see a sentence from a Wikipedia page about 7th Century history: “Hispano-Visigothic king Egica accuses the Jews of aiding the Muslims, and sentences all Jews to slavery. ” Although Wikipedia contains a lot of great health information it also contains non-health related information (like this one) that is hard to filter out.

This is a funny answer: this is the exact problem NetBase’s technology is supposed to solve. Pointing out that it’s hard to solve doesn’t win you any points — you’ve got to actually solve the problem for that.

I’ve got to question what’s going on under the hood here. Granted, natural language processing is far from perfect. But if you’re truly analyzing how words are used in a document, you should be able to tell the difference between the noun that refers to a disease and the verb that refers to helping someone. It’s just coincidence that these concepts are represented by two words with the same spelling. If that trips you up, you must be doing keyword matching. Good old Web 1.0 keyword matching: why would anyone fund this or pay for it, when we already have search based on it?

Reading between the lines, there are other disturbing implications about NetBase’s approach. They don’t seem to analyze the context of their sources: yes, Wikipedia contains a lot of non-health-related content. Don’t use it for your health knowledge base! They don’t seem to take into account how many times a statement was made: if Jews and AIDS appear together only one time, consider it an outlier. And they don’t seem to take time into account: AIDS has only been around since the 1980s, so how could something that happened in the 7th century possibly be relevant?
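Those three checks (source context, how often a statement is made, and when) are simple enough to sketch. The Python below is a hypothetical illustration, not NetBase's pipeline: the `Mention` record, the `plausible_claims` function, the thresholds, and the sample data are all assumptions made up for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Mention:
    """One extracted 'subject relates to object' statement (hypothetical record)."""
    subject: str
    obj: str
    source_topic: str  # what the source document is about, e.g. "health"
    event_year: int    # when the described event took place


def plausible_claims(
    mentions: Iterable[Mention],
    topic: str = "health",
    min_support: int = 2,
    earliest_year: int = 1980,
) -> Set[Tuple[str, str]]:
    """Keep pairs backed by enough on-topic mentions from a plausible period."""
    support = Counter()
    for m in mentions:
        if m.source_topic != topic:         # check 1: context of the source
            continue
        if m.event_year < earliest_year:    # check 3: time
            continue
        support[(m.subject, m.obj)] += 1
    # check 2: a pair seen fewer than min_support times is treated as an outlier
    return {pair for pair, count in support.items() if count >= min_support}


# Made-up sample data; the years are illustrative only.
mentions = [
    Mention("HIV", "AIDS", "health", 1995),
    Mention("HIV", "AIDS", "health", 2003),
    Mention("Jews", "AIDS", "history", 694),  # the 7th-century sentence quoted above
]

claims = plausible_claims(mentions)
print(claims)
```

Any one of the three filters would have rejected the offending result on its own. None of them is deep semantics, which is rather the point.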

HealthBase has been “fixed” since the initial uproar, or at least fixed enough to not categorize Jews as an agent of disease (thanks, NetBase!). But given the general cluelessness about semantics in their response, you’ve got to wonder if the fix consisted of tuning their text analytics, or hacking a bunch of workarounds into their code.

Modeling the apocalypse

August 18, 2009

How likely are you to survive a zombie attack? Not likely at all, according to these mathematicians. Someone posted their paper on our bulletin board at work, and the entire modeling department is busy trying to glean survival tips from it.

Of course, some people question their model.

Here’s a great New York Times article describing a working Phillips machine. A Phillips machine is a physical model of the flow of money through a national economy, circa 1949.  It’s pretty wacky looking, but to its credit it gives a visceral understanding of those stock and flow concepts that are so hard for humans to grasp about systems. And apparently it worked to predict the economy, although it never was put into production widely.
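For anyone who finds stocks and flows slippery, here is about the smallest possible version in Python: one tank, a constant inflow, and an outflow proportional to the current level. It is a toy under stated assumptions, not a model of the Phillips machine or of any economy; the function name, parameters, and numbers are all invented for the illustration.

```python
from typing import List


def simulate_stock(
    inflow: float,
    drain_rate: float,
    steps: int,
    dt: float = 0.1,
    initial: float = 0.0,
) -> List[float]:
    """Track one stock over time using simple fixed-step (Euler) updates.

    The stock gains `inflow` per unit time and loses `drain_rate * level`
    per unit time, so the outflow grows as the stock fills up.
    """
    levels = [initial]
    for _ in range(steps):
        level = levels[-1]
        outflow = drain_rate * level
        levels.append(level + (inflow - outflow) * dt)
    return levels


levels = simulate_stock(inflow=10.0, drain_rate=0.5, steps=500)
print(round(levels[-1], 3))
```

The level climbs quickly at first and then flattens out where inflow equals outflow, at `inflow / drain_rate`. That self-balancing behavior is obvious when you watch water find its level in a tank, and much less so when you stare at a spreadsheet.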

Page through the comments to find some good points about modeling and the history of analogue computing. Given my feelings on the limits of economic models, I especially appreciated this one on the limits of modeling. In a less negative vein, this commenter argues that systems arise from simple processes. Since Wolfram|Alpha is on everyone’s minds right now (and is garnering criticism for not being Google), I appreciated his point.