Human meaning in machine encoding? Thoughts on the semantic web

Tim Berners-Lee, the inventor of the world wide web, outlines his goals for the semantic web in the book he wrote about the development of the web.  I love his dream, that one day we would be able to ask “find out where a baseball game was played today and it was also 22C”.  I just don’t believe it is very likely to happen, for two reasons:

  • Effort
  • Natural language

The effort question is a really interesting one.  Somewhere along the line, someone has to expend the effort to make human semantic concepts in some way machine encoded, or, alternatively to answer their own questions.  For some, a certain level of machine encoding of the semantics they personally attach to an object (usually in the form of tags) is useful, either for some purpose of their own (information retrieval, for example), or for some social-capital reason (see a more detailed explanation of this here).  However, when a person has only a small amount of information to organise they are considerably less likely to add semantic information to it.

If there is no human being willing to expend the effort to add semantic information, there may be a human being willing to write computer programs to extract such information.  This will be more or less successful depending on the kind of information to be extracted and what it is to be extracted from, for example:

This takes less effort than tagging, because it can be done once and used multiple times, but it is still effort that someone has to expend.

One further approach, as in this paper (sorry, paywall), is to leverage human-created tags so that machines can do things that look as though they understand the semantic web.  In the paper, for example, the author wrote a program that used the way people had combined tags on flickr to understand which concrete things (for example tulips) were associated with abstract concepts (for example spring).
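To give a rough feel for the tag-combination idea, here is a minimal sketch in Python.  It is not the paper's method, just a simple co-occurrence count; the function name `cooccurring_tags` and the handful of tag sets are made up for illustration and do not come from flickr.

```python
from collections import Counter


def cooccurring_tags(photos, concept, top_n=3):
    """Return the tags that most often appear alongside `concept`.

    `photos` is a list of tag lists, one list per photo (hypothetical data).
    """
    counts = Counter()
    for tags in photos:
        tag_set = set(tags)  # ignore repeated tags on the same photo
        if concept in tag_set:
            # Count every other tag that shares a photo with the concept
            counts.update(tag_set - {concept})
    return counts.most_common(top_n)


# Invented tag sets standing in for tagged photos
photos = [
    ["spring", "tulips", "garden"],
    ["spring", "tulips", "netherlands"],
    ["spring", "lambs"],
    ["autumn", "leaves", "garden"],
]

print(cooccurring_tags(photos, "spring", top_n=1))  # [('tulips', 2)]
```

Even this toy version only works because someone, somewhere, typed in the tags first.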

In all three cases human effort is required to generate the information needed for machines to do the kind of processing Berners-Lee suggests the semantic web ought to be able to do for us.  To actually get people to expend this effort requires them to have a special interest in it, either at a personal level (as with tagging) or a research interest (as with automatic extraction programs).  I think this effort is a major impediment to more widespread “semantic web” applications and uses.

The natural language question is also a barrier, and a much more usability-centred one.  Even if we could get everything tagged up, either by human hands or automatically, how people would then ask this semantic web to answer their questions is an open question.  glenn, an acquaintance of mine who works in the field (and likes his name spelt with a lower case ‘g’), thinks that we need query languages, and I am inclined to agree.  If natural language searching on the free-text internet fails (paywall again, sorry), it will surely fail in any kind of structured environment.  Unfortunately, users are known to do poorly with Boolean search, and it is reasonable to expect that other query languages would produce similarly bad results, so even if the web were tagged up, it may still be fairly difficult for the average user to ask the question Berners-Lee posed in his book.
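To make the usability problem concrete, suppose the tagging had already been done and every game came with machine-readable fields.  The sketch below is purely illustrative: the records, the field names (`sport`, `city`, `date`, `temp_c`) and the `find_games` helper are all invented, and a real semantic web query would more likely be written in a dedicated query language.

```python
from datetime import date

# Invented, already-structured records; no real data source is implied
games = [
    {"sport": "baseball", "city": "Boston", "date": date(2008, 10, 24), "temp_c": 22},
    {"sport": "baseball", "city": "Chicago", "date": date(2008, 10, 24), "temp_c": 9},
    {"sport": "cricket", "city": "Sydney", "date": date(2008, 10, 24), "temp_c": 22},
]


def find_games(records, sport, on, temp_c):
    """Return the cities where `sport` was played on date `on` at `temp_c` degrees."""
    return [
        r["city"]
        for r in records
        if r["sport"] == sport and r["date"] == on and r["temp_c"] == temp_c
    ]


print(find_games(games, "baseball", date(2008, 10, 24), 22))  # ['Boston']
```

The question is a single sentence in English, but the structured version needs the user to know the field names, the exact values and how to combine them, which is much the same hurdle that Boolean search presents.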

I think tagging is great, because it imbues objects with personal meaning, and allows people to find things more easily.  I have yet to see evidence of a truly workable (and by implication usable) semantic web, though, and as such I don’t believe people will be able to answer questions about baseball games at 22C for some time to come. I also believe that even when it is possible to answer these sorts of questions, it will be not because of advanced tagging of web-pages, but more from advanced text processing by search engines–and that isn’t the semantic web, it’s search engine companies prioritising user experience.


1 Response to “Human meaning in machine encoding? Thoughts on the semantic web”

  1. CJ, Friday, October 24, 2008 at 3:43 am

    “human semantic concepts in some way machine encoded”….ontologies are often automated these days. Many ontologies are also available, like the FrameNet corpus for example. There are a lot of NLP tools for everything from concordance to automated pattern matching and machine learning.

    SPARQL is a good query language for ontologies if that’s what you’re using to build semantic nets.

    Semantics are used by question-answering systems, but the semantic web alone isn’t going to be good enough. Natural language generation and understanding is full of crazy tools and good ideas, not involving tagging up loads of stuff.

    Try Cognition search engine for a taste of something different, they’re working on well…a semantic search engine. It’s not the only one out there, Hakia is another, but I think Cognition works better.

    I wouldn’t write off natural language just yet.

    “I also believe that even when it is possible to answer these sorts of questions, it will be not because of advanced tagging of web-pages, but more from advanced text processing by search engines–and that isn’t the semantic web, it’s search engine companies prioritising user experience.”

    I completely agree – nice post 🙂




Some rights reserved.

