Monthly Archive for February, 2007

To Wikis and Beyond

(Originally posted by John Breslin on the IIA Blog.)

Last time, I talked about semantic blogging and how the blogging experience can be augmented by adding structure and metadata about the things you’re blogging about. Today, I’m going to talk about wikis and how they too can benefit from such structure.

Firstly, some history. Many people are familiar with the Wikipedia, but fewer know exactly what a wiki is. In short, a wiki is an “information space” (web or desktop application) that allows users to easily add and edit content, and is especially suited for collaborative writing. Wikis rely on cooperation, on checks and balances among the wiki site members, and on a belief in the sharing of ideas. The name comes from a Hawaiian phrase, “wiki wiki”, which means to hasten or go quickly. Ward Cunningham, who now works for Microsoft, created the first wiki in 1995, and I had the pleasure of meeting both Ward and Jimmy Wales (who set up the Wikipedia in 2001) at the first Wikimedia conference. Apart from the Wikipedia, wikis are being used for free dictionaries, book repositories, event organisation, and software development. They are increasingly used in enterprise environments for collaborative purposes: research projects, papers and proposals, coordinating meetings, etc. Ross Mayfield’s SocialText produced the first commercial open source wiki solution, and many companies now use wikis as one of their main intranet collaboration tools.

There is a plethora (hundreds) of wiki software systems now available, ranging from MediaWiki, the software used on the Wikimedia family of sites, and Eugene Eric Kim’s PurpleWiki, where fine-grained elements on a wiki page are referenced by purple numbers, to Alex Schröder’s OddMuse, a single Perl script wiki install, and WikidPad, a desktop-based wiki for managing personal information. Many are open source and free, and will often run on multiple operating systems. The differences between wikis are usually quite small but can include the development language used (Java, PHP, Python, Perl, Ruby, etc.), the database required (MySQL, flat files, etc.), whether attachment file uploading is allowed or not, spam prevention mechanisms, page access controls, RSS feeds, etc.

The Wikipedia project consists of 250 different wikis, corresponding to a variety of languages. The English-language one is currently the biggest, with over 1.5 million pages, but there are wikis in languages ranging from Irish to Arabic to Chinese (and even in constructed languages such as Esperanto and Klingon!). A typical wiki page will have two buttons of interest: “Edit” and “History”. Normally, anyone can edit an existing wiki article, and if the article does not exist on a particular topic, you can create it. If someone messes up an article (either deliberately or erroneously), there is a revision history so that you can fix or revert the contents. There is a certain amount of ego-related motivation in contributing to a wiki - people like to show that they know things, to fix mistakes and fill in gaps in underdeveloped articles (stubs), and to have a permanent record of what they have contributed via their registered account. By providing a template structure to input facts about certain things (towns, people, etc.), wikis also facilitate this user drive to populate wikis with information.

For some time on the Wikipedia and in other wikis, templates have been used to provide a consistent look to the content placed within article texts. They can also be used to provide a structure for entering data, so that it is easy to extract metadata about the topic of an article (e.g. from a template field called “population” in an article about Galway). Semantic wikis bring this to the next level by allowing users to create semantic annotations anywhere within a wiki article text for the purposes of structured access and finer-grained searches, inline querying, and external information reuse. There are already about 20 semantic wikis in existence, and one of the largest ones is Semantic MediaWiki, based on the popular MediaWiki system.

Let’s take some examples of providing structured access to information in wikis. At the moment, there may be a page about John Grisham that has a link to the Pelican Brief (and to other books that he has written), to Mississippi because he lives there, and to Random House, his publisher (thanks to Eyal for this example). But you cannot perform fine-grained searches on the Wikipedia dataset such as “show me all the books written by John Grisham”, or “show me all authors that live in the US”, or “what authors are signed to Random House”, because the type of link (i.e. the relationship type) between wiki pages is not defined. In Semantic MediaWiki, you can do this by linking with [[author of::Pelican Brief]] rather than just [[Pelican Brief]]. There may also be some attribute such as [[birthdate:=1955-02-08]] which is defined in the John Grisham article. Such attributes could be used for answering questions like “show me authors over 50” or for sorting articles.
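
To make the typed-link idea concrete, here is a minimal Python sketch. It is not Semantic MediaWiki's own code: the regular expression, the helper names (`extract_triples`, `age_on`) and the sample page text are all assumptions made up for illustration. It pulls the two annotation forms quoted above out of a page and answers two of the example questions.

```python
import re
from datetime import date

# Matches [[property::value]] (a typed link) and [[property:=value]] (an
# attribute). Plain links such as [[Mississippi]] are deliberately skipped.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+?)(::|:=)([^\[\]|]+?)\]\]")


def extract_triples(page_title, wikitext):
    """Return (subject, property, value) triples for each annotation on a page."""
    return [
        (page_title, m.group(1).strip(), m.group(3).strip())
        for m in ANNOTATION.finditer(wikitext)
    ]


def age_on(birthdate, today):
    """Whole years between birthdate and today."""
    years = today.year - birthdate.year
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


# Hypothetical page text, invented for this sketch.
pages = {
    "John Grisham": (
        "John Grisham wrote [[author of::Pelican Brief]] and "
        "[[author of::The Firm]]. He was born on [[birthdate:=1955-02-08]] "
        "and lives in [[Mississippi]]."
    ),
}

triples = [t for title, text in pages.items() for t in extract_triples(title, text)]

# "Show me all the books written by John Grisham"
books = [v for s, p, v in triples if s == "John Grisham" and p == "author of"]

# "Show me authors over 50", evaluated on an assumed reference date
reference_date = date(2007, 2, 16)
authors_over_50 = [
    s for s, p, v in triples
    if p == "birthdate" and age_on(date.fromisoformat(v), reference_date) > 50
]

print(books)
print(authors_over_50)
```

The point of the sketch is only that once the link carries a property name, the question becomes a simple filter over (subject, property, value) triples rather than a keyword search.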

Some semantic wikis also provide what is called inline querying. The screenshot on the right (from another system called SemperWiki) gives an example of this. The text in red (which says find me all pages where the creator is Eyal Oren) is processed as a query when the page is viewed and the results are shown at the bottom. Other wikis will process the query and show the results as part of the article text itself. [The green text here defines some relationships and attributes, and for each of these, articles with matching properties are shown on the right-hand side.]
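
As a rough illustration of inline querying, the Python sketch below replaces a query embedded in page text with its results at render time. The `{{find: property = value}}` syntax, the `PAGES` store and the function names are invented for this example and are not SemperWiki's (or any other wiki's) actual syntax or API.

```python
import re

# Hypothetical metadata store: page title -> {property: value}.
PAGES = {
    "SemperWiki": {"creator": "Eyal Oren"},
    "Todo list": {"creator": "Eyal Oren"},
    "Another page": {"creator": "Someone Else"},
}

# Made-up inline query syntax for this sketch: {{find: property = value}}
QUERY = re.compile(r"\{\{find:\s*(\w+)\s*=\s*([^}]+?)\s*\}\}")


def run_query(prop, value):
    """Return the titles of all pages whose property equals the given value."""
    return sorted(title for title, meta in PAGES.items() if meta.get(prop) == value)


def render(text):
    """Replace each inline query in the text with its comma-separated results."""
    def replace(match):
        results = run_query(match.group(1), match.group(2))
        return ", ".join(results) if results else "(no matching pages)"
    return QUERY.sub(replace, text)


rendered = render("Pages by Eyal: {{find: creator = Eyal Oren}}")
print(rendered)
```

Because the query runs every time the page is viewed, the listing stays current as other pages gain or lose the matching annotation.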

Finally, just as in the semantic blogging scenario, wikis can enable the Web to be used as a clipboard, by allowing readers to drag structured information from wiki pages into other applications (for example, geographic data about locations on a wiki page could be used to annotate information on an event or a person in your calendar application or address book software respectively).

My next (and final) guest blog post will be on social network services and connecting them all together. See you then!

Found a proposal about semantic structures for scientific writing

I’ve just been reading this proposal paper by Anita de Waard from the University of Utrecht and ATG in Elsevier, entitled “Semantic Structures for Scientific Writing”. It’s over a year old, but gives some interesting ideas on “a new format for the scientific article [...] where a semantic structure is created by the author during writing”. In a way it’s similar to the idea of structured / semantic blogging, just for different content.

If the article was produced within Semantic Web standards, tools will be created that allow for browsing and integration of this content. Subject specific visualizations which are required for an effective sense making environment will also be developed, if the content is available. Hopefully such structured publications can enhance scientific communication as a whole, within and between specific subfields of science. Feeding back the results of the document model to the Semantic Web community, we would like to help the development of authoring and editing tools, e.g. in the ScholOnto and Semantically Interlinked Online Communities projects. We believe the communication between discourse studies and textual analysis on the one hand, and Semantic Web and artificial intelligence on the other can finally lead to the new way of publishing which Vannevar Bush thought so overdue in 1945.

boards.ie Birthday / boards.org.uk Launch

boards.ie Ltd. celebrated its seventh birthday recently (the company was formed in late January 2000). Yesterday was also the ninth anniversary of our first post on the old WWWBoard! To coincide with this (I like dates), we launched boards.org.uk, our new sister site for Great Britain. The first stab at a logo has two quotation marks, that look vaguely like a “g” and a “b”…

[Image: first draft of the boards.org.uk logo]

Testing Yahoo! Pipes

Yes, it’s very cool! RSS fans, prepare to be blown away. Via this Slashdot article and CaptSolo’s post on sioc-dev:

“Yahoo has introduced a new product called Pipes. It seems to be a GUI-based interface for building applications that aggregate RSS feeds and other services, creating Web-based apps from various sources, and publishing those apps. Sounds very cool. TechCrunch has a decent write-up, and Tim O’Reilly is all over it. The site was down for a few hours and is just back up. Has anybody tried this?”

and from the Pipes page:

Pipes is an interactive feed aggregator and manipulator. Using Pipes, you can create feeds that are more powerful, useful and relevant.

So I created a basic pipe to take three feeds from Planet Journals, IrishBlogs.ie and awards.ie about the forthcoming Irish Blog Awards using the “Fetch” module. I then used their “For Each: Annotate” module to add a sioc:topic annotation, using the first matching result from a Yahoo! search for the phrase “Irish Blog Awards”. The graphical interface is very easy to use, and a screenshot of the pipe construction is shown on the left. You can see the pipe output on the right below; unfortunately the RSS 2.0 dump loses the sioc:topic annotation I added, but the JSON dump still retains it so with a bit of manipulation this could provide the appropriate RDF.
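
For readers who have not tried Pipes, the same two steps can be approximated in a few lines of Python. This is only a sketch of the idea, not the Pipes API: the feed items, the `fetch` and `annotate_each` helpers and the topic URL are placeholders I have made up, and a real version would parse the RSS feeds and run the search rather than use in-memory data.

```python
import json

# Hypothetical stand-ins for items fetched from the three feeds.
FEEDS = {
    "Planet Journals": [{"title": "Awards shortlist", "link": "http://example.org/a"}],
    "IrishBlogs.ie": [{"title": "Who is nominated?", "link": "http://example.org/b"}],
    "awards.ie": [{"title": "Venue announced", "link": "http://example.org/c"}],
}

# Placeholder for the first result of a search for "Irish Blog Awards".
TOPIC_URL = "http://example.org/irish-blog-awards"


def fetch(feeds):
    """Merge the items of every feed into one list (the "Fetch" step)."""
    return [
        {**item, "source": name}
        for name, items in feeds.items()
        for item in items
    ]


def annotate_each(items, key, value):
    """Attach the same annotation to every item (the "For Each: Annotate" step)."""
    return [{**item, key: value} for item in items]


items = annotate_each(fetch(FEEDS), "sioc:topic", TOPIC_URL)

# A JSON dump keeps arbitrary keys such as sioc:topic.
as_json = json.dumps(items, indent=2)
print(as_json)
```

Since every item in the JSON output still carries its `sioc:topic` value, turning it into RDF is mostly a matter of mapping keys to properties.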

[Screenshots: pipe construction (left) and pipe output (right)]

SIOC appearance in Debian Unstable (SWAML, Buxon)


$ apt-cache search sioc
buxon - SIOC forums browser
swaml - Semantic Web Archive of Mailing Lists

Thanks to Wikier.

DM110 - Week 6 - Video Podcasting

My slides from today’s lecture are now online at Slideshare.

Times Online: Fake bloggers soon to be ‘named and shamed’

Fake bloggers soon to be ‘named and shamed’

Sam Coates, Political Correspondent

Hotels, restaurants and online shops that post glowing reviews about themselves under false identities could face criminal prosecution under new rules that come into force next year.

Businesses which write fake blog entries or create whole websites purporting to be from customers will fall foul of a European directive banning them from “falsely representing oneself as a consumer”.

From December 31, when the change becomes law in the UK, they can be named and shamed by trading standards or taken to court.

The Times has learnt that the new regulations also will apply to authors who praise their own books under a fake identity on websites such as Amazon.

Read more.

Thanks to Dan for the link.

IIA Blog: Semantic Blogging

I’ve just published my fourth guest post for the IIA Blog - it’s about Semantic Blogging. I think I only have a few days left in my guest slot so I hope to fit in one or two more posts about wikis and maybe social networks before the end…

Semantic Blogging

(Originally posted by John Breslin on the IIA Blog.)

We’ve already seen how Web 2.0 has brought about a paradigm of tagged and commented-upon content: photos, bookmarks, events, videos, and blog posts. Blog posts are usually only tagged on the blog itself by the post creator, using free-text keywords such as “scotland”, “movies”, etc. (unless they are bookmarked and tagged by others using social bookmarking services like del.icio.us or personal aggregators like Gregarius). Technorati, the blog search engine, aims to use these keywords to build a “tagged web”. Both tags and hierarchical categorisations of blog posts can be further enriched using the SKOS framework. However, there is often much more to say about a blog post than simply what category it belongs in…

So let’s move on to semantic blogging (some ideas here are from Knud Moeller who is working on semiBlog). Traditional blogging is aimed at what can be called the “eyeball Web” - i.e. text, images or video content that is targeted mainly at people. Semantic blogging aims to enrich traditional blogging with metadata about the structure (what relates to what and how) and the content (what is this post about - a person, event, book, etc.). In this way, metadata-enriched blogging can be better understood by computers as well as people.

Last time I talked about structured blogging, where microcontent such as microformats is positioned inline in the HTML (and subsequent syndication feeds) and can be rendered via CSS. Structured blogging and semantic blogging do not compete, but rather offer metadata in slightly different ways (using microcontent / microformats and RDF respectively). There are already mechanisms such as GRDDL which can be used to move from one to the other.

So why would one choose to enhance their blogs and posts with semantics? Current blogging offers poor query possibilities (except for searching by keyword or seeing all posts labelled with a particular tag). There is little or no reuse of data offered (apart from copying URLs or text from posts). Some linking of posts is possible via direct HTML links or trackbacks, but again, nothing can be said about the nature of those links (are you agreeing with someone, linking to an interesting post, or are you quoting someone whose blog post is directly in contradiction with your own opinions?). Semantic blogging aims to tackle some of these issues, by facilitating better (i.e. more precise) querying when compared with keyword matching, by providing more reuse possibilities, and by creating “richer” links between blog posts.

It is not simply a matter of adding semantics for the sake of creating extra metadata, but rather a case of being able to reuse what data a person already has in their desktop or web space and making the resulting metadata available to others. People are already (sometimes unknowingly) collecting and creating large amounts of structured data on their computers, but this data is often tied into specific applications and locked within a user’s desktop (e.g. contacts in a person’s addressbook, events in a calendaring application, author and title information in documents, audio metadata in MP3 files). Semantic blogging can be used to “lift” or release this data onto the Web.

Looking at the picture on the right, Aidan writes a blog post which he annotates using content from his desktop calendaring and addressbook applications. He publishes this post onto the Web, and John, reading this post, can reuse the embedded metadata in his own desktop applications.

The next picture is from a semantic blogging application called semiBlog. In this picture, a semantic blog post is being created by annotating a part of the post text about John with an address book entry that has extra metadata describing John. Once a blog has semantic metadata, it can be used to perform queries such as “which blog posts talk about papers by Stefan Decker?”; it can be used for browsing not only across blogs but also other kinds of discussion methods; or it can be used by blog readers for importing metadata into desktop applications (using the Web as a clipboard).

As well as semiBlog, other semantic blogging systems have been developed by HP, the National Institute of Informatics, Japan and MIT. But it’s not just blog posts that are being enhanced by structured metadata and semantics - it’s happening in many other Web 2.0 application areas. Wikis such as the Wikipedia have contained structured metadata in the form of templates for some time now, and at least twenty “semantic wikis” have also appeared to address a growing need for more structure in wikis. I’ll talk about semantic wikis next time, and in the meantime look forward to your comments…

IT panelists for MIT Technology and Entrepreneurship Forum

The panelists for the forthcoming MIT Technology and Entrepreneurship Forum (to be held on March 2nd at the MIT Stata Center) have been announced on the TEF site. On the IT panel, the panelists are moderator Randy Adams, founder and CEO of Searchme; David Diamond, COO of Tourtellotte Solutions; Douglas Wyatt, CTO of McAfee SiteAdvisor; and myself. (The TEF is MIT’s largest student-run conference and is focused on sharing knowledge about entrepreneurship and technological innovation.)