Archive for the 'Blogs' Category

Interview for Journalism.co.uk… Journalists get to know the Semantic Web!

I was interviewed last week by Colin Meek from Journalism.co.uk on the topic of “Web 3.0” and what it means for journalists. You can read the full article in two parts (1, 2). My original answers are part of an interview on their Insite blog. I also had the chance to talk about various DERI offerings in the Semantic Web area including SIOC, SWSE, Sindice, Semantic Radar, etc.


Colin also asked me about other readable data that is being crawled by Semantic Web search engines like Sindice, SWSE or Swoogle. These search engines can usually match keywords in any data that has been crawled or integrated into a semantic store, not just data about people. It could be structured information about people, places, dates, library documents, blog items or topics, whatever. In fact, there is no limit to the types of things that can be indexed and searched, since RDF (an open data model that can be adapted to describe pretty much anything) is used as the data format. Anyone can reuse existing RDF vocabularies like SIOC to publish data, or they can publish data using their own custom vocabularies (e.g. to describe stamp collecting or Bollywood movie genres or whatever), or they can combine public and custom vocabularies (e.g. take FOAF and your own vocabulary about soccer to describe players and managers on a soccer team).

Geotemporal information is particularly useful across a range of domains, and provides nice semantic linkages between things. For example, having geographic information and time information is useful for describing where people have been and when, for detailing historical events or TV shows, for timetabling and scheduling of events, etc., and for connecting all of these things together (“I’m travelling to Edinburgh next week: show me all the TV shows of relevance and any upcoming events I should be aware of according to my interests…”).
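To make the idea of combining a public vocabulary with a custom one a little more concrete, here is a minimal sketch in Python. The FOAF namespace and the terms `foaf:Person` and `foaf:name` are real; the soccer namespace, its `playsFor` and `position` properties, and all of the example.org addresses are made up purely for illustration. It just builds a few triples and prints them as N-Triples:

```python
FOAF = "http://xmlns.com/foaf/0.1/"      # the public FOAF namespace
SOCCER = "http://example.org/soccer#"    # hypothetical custom vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def describe_player(player_iri, name, team_iri, position):
    """Return triples that mix FOAF terms with the made-up soccer terms."""
    return [
        (uri(player_iri), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(player_iri), uri(FOAF + "name"), literal(name)),
        (uri(player_iri), uri(SOCCER + "playsFor"), uri(team_iri)),
        (uri(player_iri), uri(SOCCER + "position"), literal(position)),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Example data: every IRI and name below is a placeholder.
triples = describe_player(
    "http://example.org/people/jane",
    "Jane Example",
    "http://example.org/teams/galway",
    "goalkeeper",
)
print(to_ntriples(triples))
```

A search engine that already understands FOAF can pick up the name from this data without knowing anything about the soccer terms, which is the appeal of mixing vocabularies.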

The keyword searches in the Sindice search engine allow you to find more information on where resources of interest are (searching for “john breslin” will point to all public pages that contain semantic information about yours truly). Sindice also has an API that can provide results in a reusable (semantic) format that can be leveraged by other applications. Alternatively, SWSE (Semantic Web Search Engine) shows you semantic information about the object of interest (e.g. my phone number, my friends, etc.) which may be derived from multiple sources (this information on me comes from tens of sources consolidated together via unique identifiers for me or through what’s called “object consolidation”).
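As a rough illustration of what consolidation via a unique identifier means, the sketch below merges descriptions from several sources whenever they share the same identifying value. This is not how SWSE itself is implemented (that isn't described here); the `consolidate` function, the record layout and the sample values are all assumptions for the sake of the example. The identifying property is named after FOAF’s `mbox_sha1sum`, which is often used for this purpose:

```python
def consolidate(records, id_property):
    """Merge records that share the same value for an identifying property.

    Each record maps a property name to a set of values, except for
    id_property, which maps to a single string identifier.
    """
    merged = {}
    for record in records:
        identifier = record[id_property]
        target = merged.setdefault(identifier, {})
        for prop, values in record.items():
            if prop == id_property:
                continue
            # Union the values so facts from every source are kept.
            target.setdefault(prop, set()).update(values)
    return merged


# Three descriptions from three imaginary sources (placeholder data).
sources = [
    {"mbox_sha1sum": "abc123", "name": {"Jane Example"}, "knows": {"Sam"}},
    {"mbox_sha1sum": "abc123", "phone": {"+353-00-000000"}, "knows": {"Alex"}},
    {"mbox_sha1sum": "def456", "name": {"Another Person"}},
]

people = consolidate(sources, "mbox_sha1sum")
print(sorted(people["abc123"]["knows"]))
```

The first two sources end up as a single description with both friends and the phone number attached, while the third stays separate.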

For me, this article highlighted the fact that the Semantic Web community needs to be very aware that one of the key features of the Social Web for journalists and for many others is the ability to find a lot of personal and sensitive information on people. With the advent of “Web 3.0”, we need to realise (“with great power comes great responsibility”) that the availability of contextual and semantically-related information is going to become even more apparent, and people will talk about it in both positive and negative terms. Educating site owners about what semantic data they may be publishing (knowingly or unknowingly, even if it’s just RSS feeds) is needed, and developers should determine exactly what opt-in or opt-out mechanisms are required before implementing semantic solutions. Users also should be aware of the benefits and other potential uses of their semantic data.

I think now is the time to avert any scares, because in reality, the data that is on the Web or the Social Web can be used in new ways anyway, whether metadata is present or not (some facts can be derived). Google have recently implemented some discussion forum parsing algorithms to determine how many posts are on a thread, how many users posted on that thread and when the last post was made. You can see this in a search result I did for “irish pubs boards.ie” below. It’s not complete, and probably relies on identifying certain HTML structures for non-Google discussion sites, e.g. you can see two threads in the middle that don’t show details of the total posts or commenters. But it’s moving towards the SIOC vision of providing more metadata about discussions on the Web to help you in finding more relevant information - whether the site owners want to provide Semantic Web data or not!
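The three facts mentioned above (number of posts, number of posters, time of the last post) are trivial to work out once the posts in a thread are available as structured data. Here is a small sketch, assuming each post has already been reduced to an author name and an ISO timestamp; the `thread_summary` function and the sample data are made up, and this says nothing about how Google actually does it:

```python
from datetime import datetime


def thread_summary(posts):
    """Summarise a thread given (author, ISO 8601 timestamp) pairs."""
    if not posts:
        return {"posts": 0, "authors": 0, "last_post": None}
    timestamps = [datetime.fromisoformat(stamp) for _, stamp in posts]
    return {
        "posts": len(posts),
        "authors": len({author for author, _ in posts}),
        "last_post": max(timestamps),
    }


# Placeholder thread: three posts by two users.
thread = [
    ("alice", "2008-05-01T10:00:00"),
    ("bob", "2008-05-01T11:30:00"),
    ("alice", "2008-05-02T09:15:00"),
]

summary = thread_summary(thread)
print(summary["posts"], summary["authors"], summary["last_post"].isoformat())
```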

Making data available semantically enables computers to help us do things we cannot easily do (or cannot do at all) right now, and this is what makes it so powerful. We also need to think more towards educating people about the benefits as well as how we can minimise any hazards. Is this a job for W3C SWEO? As my colleague James Cooley said: “I think scientists thought the benefits of GM food were so obvious that there was no case to make. Then you got Frankenstein Food and the game was up.”

For journalists interested in the Semantic Web, I’d recommend reading this paper entitled “SemWebbing the London Gazette” by Jeni Tennison and John Sheridan which describes how they have exposed information from their newspaper website using RDFa so that it becomes easy to re-use (slides here). You can also view some interesting slides by Colin Meek from a seminar he gave to journalists about the Social Web in Oslo a few days ago. It’s in three parts (1, 2, 3). I’ve embedded the third part (on the Semantic Web) below…

Tales from the SIOC-o-sphere #8

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” when you add items.

Prototype for distributed / decentralised microblogging using semantics

Download the paper and get the code.

Try out our anonymous client and server demos for SMOB.

Michael Arrington of TechCrunch wrote an interesting blog post on Monday about a “decentralised Twitter”, which was picked up by Dave Winer, Marc Canter and Chris Saad amongst others.

I’m happy to say that we have recently described and shown how this can work. Alex has been the driving force behind a paper that we (Alexandre Passant, Tuukka Hastrup, Uldis Bojars and I) have written for SFSW 2008, which demonstrates distributed / decentralised microblogging through a prototype called SMOB:

Microblogging: A Semantic Web and Distributed Approach

The prototype uses FOAF and SIOC to model microbloggers, their properties, account and service information, and the microblog updates that users create. A multitude of publishing services can ping one or a set of aggregating servers as selected by each user, and it is important to note that users retain control of their own data through self hosting.
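To give a feel for what this looks like in practice, here is a minimal sketch of a single microblog update described with SIOC and FOAF terms, along with the list of aggregating servers that a publishing client would notify. The terms `sioc:Post`, `sioc:content`, `foaf:maker` and `dcterms:created` are real, but whether SMOB uses exactly these is an assumption, as are the function names, the addresses and the shape of the ping. No network calls are made:

```python
def microblog_update_turtle(post_iri, author_iri, content, created):
    """Describe one microblog update as a small Turtle document."""
    escaped = content.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "@prefix sioc: <http://rdfs.org/sioc/ns#> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{post_iri}> a sioc:Post ;",
        f"    foaf:maker <{author_iri}> ;",
        f'    sioc:content "{escaped}" ;',
        f'    dcterms:created "{created}" .',
    ])


def plan_pings(post_iri, servers):
    """Pair the update with each aggregating server the user has chosen."""
    return [(server, post_iri) for server in servers]


# Placeholder addresses: the update lives on the user's own site.
post = "http://example.org/me/updates/1"
document = microblog_update_turtle(
    post,
    "http://example.org/me#id",
    "Trying out distributed microblogging",
    "2008-05-12T12:00:00Z",
)
pings = plan_pings(post, ["http://aggregator-a.example.org/ping",
                          "http://aggregator-b.example.org/ping"])

print(document)
print(len(pings), "servers to notify")
```

The point of the design is that the data stays at the user’s own address; the aggregating servers are only told where to find it.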

The aggregate view of microblogs uses ARC2 for storage / querying and Exhibit for the user interface. Security and privacy are open issues, but can be addressed in some part by requiring OpenID authentication.

The SMOB prototype code (both the semantic microblogging publishing client and server-based web service) is available here. You can install your own client and post to our demo server (set up today by Tuukka) here. There are some pictures below of it in use:

Figure: Latest updates rendered in Exhibit

Figure: Map view of latest updates with Exhibit

Figure: Global architecture of distributed semantic microblogging

CELT talk / WWW@15 on Morning Ireland / Ulrich Schnauss

A mixed-up blog post, but I haven’t the energy to write three separate posts, so here’s a three-in-one:

  • On Wednesday, I gave a talk at CELT, NUI Galway about “Learning via the Social Web”, which was a slightly-revised version of the one I gave in February. Again, there was an amazing turnout, and there will be a webcast made available via the CELT website at a later date. For now, you can access the PowerPoint slides here.
  • Yesterday, Damien Mulley and I were interviewed by Richard Downes on RTÉ R1 Morning Ireland about the 15th anniversary of CERN releasing the World Wide Web code for free (podcast available here; alternatively there’s an extracted clip here). I talked a little bit about the WWW versus UMn’s Gopher, and how the Web has expanded beyond the initial target audience of academics and researchers. I gave a slightly-tangential answer to a question I was asked about the importance of the Web to Ireland’s future and economy (FYI: CSO 2007 ICT stats), saying how dependent we are on the Web to do many tasks today, and describing how our work at DERI in NUI Galway will help us to deal with the current over-abundance of websites, by adding more structure to web pages so that computers can help us in finding the right information. “Are you telling me that the future of the Web [...] is being designed in Galway?”, Richard asked at one point. Yes!!! Finally, I mentioned how the problems with online video gridlock may have larger consequences as the Web is increasingly moving from the desktop to mobile devices where bandwidth is even more important, so smarter ways are needed to reduce exactly what will be sent to your phone (FYI: Opera Mini is a nice example, a tiny Java browser that works on most phones where the content is pre-filtered server-side before it gets to you).
  • Last night, I went along with friend Conrad to see Ulrich Schnauss at Stress in DeBurgo’s here in Galway. Although I missed the encore (it had been a long day, with a nine-hour session at work), I really enjoyed the night and the support acts: Beatpoet was great playing on his mono-something device, and Airiel were pretty good too :)

Slides from the SIOC tutorial at WWW2008

Here are the PowerPoint slides from our tutorial on “Interlinking Online Communities and Enriching Social Software with the Semantic Web” at the World Wide Web Conference in Beijing - you can also download them from here:

The tutorial went well. It was hot in the room and we were a bit jetlagged, but we had some good feedback afterwards and about 30 people attended in all.

I had a nice few days in Beijing, participating in the W3C advisory committee meeting on Sunday, Monday and Tuesday, giving our SIOC tutorial with Alex and Uldis on Monday afternoon, popping along to our paper at the Linked Data on the Web workshop on Tuesday, and attending some sessions on Wednesday (Kai-Fu Lee’s plenary keynote on Cloud Computing, the discussion panel with Lada Adamic et al. on the Future of Online Social Interactions, the W3C Open Your Data! track, and a packed session on Social Networks: Discovery and Evolution of Communities). On Thursday, I gave a talk about DERI at Tsinghua University to Cemon Yang and his team at the Digital Government / Web and Software Research Centre. Thursday evening we had the banquet in the Great Hall of the People, and I headed back to Ireland on Friday.

Unfortunately I saw little of Beijing outside of travelling between venues in taxis and buses, so I have a good reason to return and see / do more next time…

Tales from the SIOC-o-sphere #7

It’s been three months since my last round-up of all things SIOC-ed, so here is entry number seven in the series:

Previous SIOC-o-sphere articles:

#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79