Monthly Archive for October, 2008

Interview for Journalism.co.uk… Journalists get to know the Semantic Web!

I was interviewed last week by Colin Meek from Journalism.co.uk on the topic of “Web 3.0” and what it means for journalists… You can read the full article in two parts (1, 2). My original answers are part of an interview on their Insite blog. I also had the chance to talk about various DERI offerings in the Semantic Web area including SIOC, SWSE, Sindice, Semantic Radar, etc.


Colin also asked me about other readable data that is being crawled by Semantic Web search engines like Sindice, SWSE or Swoogle. These search engines can usually match keywords in any data that has been crawled or integrated into a semantic store, not just data about people. It could be structured information about people, places, dates, library documents, blog items or topics, whatever. In fact, there is no limit to the types of things that can be indexed and searched, since RDF (an open data model that can be adapted to describe pretty much anything) is used as the data format. Anyone can reuse existing RDF vocabularies like SIOC to publish data, or they can publish data using their own custom vocabularies (e.g. to describe stamp collecting or Bollywood movie genres or whatever), or they can combine public and custom vocabularies (e.g. take FOAF and your own vocabulary about soccer to describe players and managers on a soccer team).

Geotemporal information is particularly useful across a range of domains, and provides nice semantic linkages between things. For example, having geographic information and time information is useful for describing where people have been and when, for detailing historical events or TV shows, for timetabling and scheduling of events, etc., and for connecting all of these things together (“I’m travelling to Edinburgh next week: show me all the TV shows of relevance and any upcoming events I should be aware of according to my interests…”).
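To make the idea of mixing a public vocabulary with a custom one a bit more concrete, here is a minimal sketch in plain Python (no RDF library) that describes a soccer player using FOAF terms alongside a made-up soccer vocabulary, and writes the result out as N-Triples. The FOAF namespace is the real one; the http://example.org/soccer# namespace, its playsFor and position terms, and the describe_player and to_ntriples helpers are all assumptions invented for illustration.

```python
# Sketch: combine a public vocabulary (FOAF) with a custom one (soccer).
FOAF = "http://xmlns.com/foaf/0.1/"                  # real FOAF namespace
SOCCER = "http://example.org/soccer#"                # hypothetical custom vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples expects."""
    return f"<{value}>"


def lit(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def describe_player(player_uri, name, team_uri, position):
    """Return (subject, predicate, object) triples for one player."""
    subject = uri(player_uri)
    return [
        (subject, uri(RDF_TYPE), uri(FOAF + "Person")),      # public term
        (subject, uri(FOAF + "name"), lit(name)),            # public term
        (subject, uri(SOCCER + "playsFor"), uri(team_uri)),  # custom term
        (subject, uri(SOCCER + "position"), lit(position)),  # custom term
    ]


def to_ntriples(triples):
    """Serialise triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(describe_player(
        "http://example.org/people/alice",
        "Alice Example",
        "http://example.org/teams/galway",
        "goalkeeper",
    )))
```

A crawler that already understands FOAF can pick up the person's name from this data without knowing anything about the soccer terms, which is the practical benefit of reusing a public vocabulary where one exists.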

The keyword searches in the Sindice search engine allow you to find more information on where resources of interest are (searching for “john breslin” will point to all public pages that contain semantic information about yours truly). Sindice also has an API that can provide results in a reusable (semantic) format that can be leveraged by other applications. Alternatively, SWSE (Semantic Web Search Engine) shows you semantic information about the object of interest (e.g. my phone number, my friends, etc.) which may be derived from multiple sources (this information on me comes from tens of sources consolidated together via unique identifiers for me or through what’s called “object consolidation”).
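As a rough illustration of what “object consolidation” means in practice, the sketch below merges descriptions from different sources whenever they share a value for an identifying property. This is not SWSE's actual implementation, just one simple way to do it (a union-find over shared identifier values). The choice of foaf:homepage and foaf:mbox_sha1sum as the identifying properties, the record format and the consolidate function are assumptions made for the example.

```python
# Sketch of object consolidation: descriptions that share an identifying
# value are assumed to be about the same thing and are merged.
from collections import defaultdict

# Assumption: these properties uniquely identify whatever they describe.
IDENTIFYING = {"foaf:homepage", "foaf:mbox_sha1sum"}


def consolidate(records):
    """Merge a list of {property: value} dicts into consolidated objects.

    Returns a list of {property: set_of_values} dicts, one per object.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every record to the first record seen with the same identifier.
    first_seen = {}
    for index, props in enumerate(records):
        for prop, value in props.items():
            if prop in IDENTIFYING:
                key = (prop, value)
                if key in first_seen:
                    union(first_seen[key], index)
                else:
                    first_seen[key] = index

    # Pool all property values for each group.
    merged = defaultdict(lambda: defaultdict(set))
    for index, props in enumerate(records):
        root = find(index)
        for prop, value in props.items():
            merged[root][prop].add(value)
    return [dict(group) for group in merged.values()]


if __name__ == "__main__":
    sources = [
        {"foaf:name": "John", "foaf:homepage": "http://example.org/john"},
        {"foaf:homepage": "http://example.org/john", "foaf:phone": "tel:+000"},
        {"foaf:name": "Someone Else", "foaf:homepage": "http://example.org/other"},
    ]
    for obj in consolidate(sources):
        print(obj)
```

Here the first two records end up as a single object with a name, a homepage and a phone number, even though no one source had all three, while the third record stays separate.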

For me, this article highlighted the fact that the Semantic Web community needs to be very aware that one of the key features of the Social Web, for journalists and for many others, is the ability to find a lot of personal and sensitive information on people. With the advent of “Web 3.0”, we need to realise (“with great power comes great responsibility”) that the availability of contextual and semantically-related information is going to become even more apparent, and people will talk about it in both positive and negative terms. Educating site owners about what semantic data they may be publishing (knowingly or unknowingly, even if it’s just RSS feeds) is needed, and developers should determine exactly what opt-in or opt-out mechanisms are required before implementing semantic solutions. Users should also be aware of the benefits and other potential uses of their semantic data.

I think now is the time to avert any scares, because in reality, the data that is on the Web or the Social Web can be used in new ways anyway, whether metadata is present or not (some facts can be derived). Google have recently implemented some discussion forum parsing algorithms to determine how many posts are on a thread, how many users posted on that thread and when the last post was made. You can see this in a search result I did for “irish pubs boards.ie” below. It’s not complete, and probably relies on identifying certain HTML structures for non-Google discussion sites, e.g. you can see two threads in the middle that don’t show details of the total posts or commenters. But it’s moving towards the SIOC vision of providing more metadata about discussions on the Web to help you in finding more relevant information - whether the site owners want to provide Semantic Web data or not!

Making data available semantically enables computers to help us do things we cannot easily do (or cannot do at all) right now, and this is what makes it so powerful. We also need to think more towards educating people about the benefits as well as how we can minimise any hazards. Is this a job for W3C SWEO? As my colleague James Cooley said: “I think scientists thought the benefits of GM food were so obvious that there was no case to make. Then you got Frankenstein Food and the game was up.”

For journalists interested in the Semantic Web, I’d recommend reading this paper entitled “SemWebbing the London Gazette” by Jeni Tennison and John Sheridan which describes how they have exposed information from their newspaper website using RDFa so that it becomes easy to re-use (slides here). You can also view some interesting slides by Colin Meek from a seminar he gave to journalists about the Social Web in Oslo a few days ago. It’s in three parts (1, 2, 3). I’ve embedded the third part (on the Semantic Web) below…

Tusavvy, the social search engine

I got a nice message from Jaesung Ro, founder of Tusavvy, the social search engine (who is a friend of my student Haklae and collaborator Honggee). Their byline is “searching community knowledge without navigating the entire web”.

He pointed me to their service, which I tried out today by searching for my own name (as one does: when you pick up a pen for the first time to try it out, you often write your own name, or is that just me and my ego?!). You can see the screenshot below… I was happy to see that my own page was prominently featured, of course.

Tusavvy reveals “not easily linked-to pages” that are often buried in conventional search results. It was built by aligning human factors with search: using socially-annotated web data, leveraging a lexicon built via tags, and utilising rankings selected through a user’s accumulated interests.

RDFa, SearchMonkey, Drupal (and SIOC)

It’s been an exciting week in terms of developments and announcements for the Semantic Web and search.

Firstly, Yahoo! SearchMonkey has published a list of recommended vocabularies for developers of SearchMonkey applications, including FOAF, SIOC, DC, vCard, vCalendar, hReview, GoodRelations, DBpedia and Freebase. SIOC is recommended for “blogs, discussion forums, Q&A sites”. See the video below for a nice overview of SearchMonkey.

Secondly, Drupal creator Dries Buytaert wrote a very interesting and encouraging post yesterday entitled “Drupal, the semantic web and search” in which he says:

“On a social networking site built with Drupal, [semantic technology] opens up the possibility to do all sorts of deep social searches - searching by types and levels of relationships while simultaneously filtering by other criteria. I was talking with David Peterson the other day about this, and if Drupal core supported FOAF and SIOC out of the box, you could search within your network of friends or colleagues. This would be a fundamentally new way to take advantage of your network or significantly increase the relevance of certain searches. I can has semweb in Drupal core?”

Thirdly, RDFa just became a W3C recommendation! Congratulations to all involved…

Edit: Fourthly, the Creative Commons Network has been launched. CC CTO Nathan Yergler said:

“The CC Network is where the semantic rubber meets the web road. With the CC Network we’re leveraging everything we’ve learned over the past five years about metadata on the web, including the new RDFa standard, along with the work of many other groups, including FOAF, POWDER, and SIOC.”

Oh, and finally, SIOC is now OWL-DL compliant. This change was motivated by the SWANSIOC initiative in the W3C HCLSIG (Semantic Web for Health Care and Life Sciences Interest Group), which uses the Science Collaboration Framework based on Drupal.

Two wins for boards.ie at the Irish Web Awards

boards.ie won the “Best Discussion Forum” award category and the “Best Website in Ireland” grand prix at the Moviestar.ie Irish Web Awards this evening…

I just wanted to express my sincere thanks to all of our readers, posters and moderators for supporting us!

Róisín Griffith Breslin

Rugadh ár gcailín Róisín ar an 3ú lá de Deireadh Fómhair… Is aingeal beag í freisin!

Our baby girl Róisín was born on the 3rd of October… She’s a little angel too!

Tales from the SIOC-o-sphere #8

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” when you add items.