Archive for the 'Wikis' Category

Tales from the SIOC-o-sphere #8

It’s time for another installment from the world of SIOC!

Previous SIOC-o-sphere articles:

#7 http://sioc-project.org/node/328
#6 http://sioc-project.org/node/310
#5 http://sioc-project.org/node/294
#4 http://sioc-project.org/node/272
#3 http://sioc-project.org/node/271
#2 http://sioc-project.org/node/138
#1 http://sioc-project.org/node/79

If you wish to contribute to the next article, join the SIOC Twine and use the tag “siocosphere9” when you add items.

Bits and pieces: SMOB 2nd Prize in SFSW / Drupal Ireland on Saturday / Wikipedia on OLPC

Semantic Microblogging wins 2nd prize in the Scripting for the Semantic Web challenge!

Alex tweeted from the SFSW workshop at the European Semantic Web Conference that our SMOB (distributed microblogging with semantics) prototype has won 2nd prize in the SFSW challenge. Congratulations and thanks to Alex, Tuukka and Uldis for all their hard work… Well done to Alex on also winning the best paper award at the SemWiki workshop - wow, what a great day for you!

Drupal Ireland Meetup 2008 on this Saturday in DERI

Stephane is organising the second Drupal Ireland Meetup for this Saturday in the Digital Enterprise Research Institute at NUI Galway (map). There will be presentations on the CCK, Views and ImageCache modules, Joomla2Drupal, theming, security, your first “Hello, World!” Drupal module, easy RDFS vocabulary publishing using Neologism, and maybe vBDrupal if I can manage it! You can sign up and find more details here.

Wikipedia iPhone app being ported to OLPC

Patrick Collison reports that the iPhone application he wrote for offline browsing of Wikipedia is being ported to the One Laptop Per Child project… Excellent news!

CELT talk / WWW@15 on Morning Ireland / Ulrich Schnauss

A mixed-up blog post, but I haven’t the energy to write three separate posts, so here’s a three-in-one:

  • On Wednesday, I gave a talk at CELT, NUI Galway about “Learning via the Social Web”, which was a slightly-revised version of the one I gave in February. Again, there was an amazing turnout, and there will be a webcast made available via the CELT website at a later date. For now, you can access the PowerPoint slides here.
  • Yesterday, Damien Mulley and I were interviewed by Richard Downes on RTÉ R1 Morning Ireland about the 15th anniversary of CERN releasing the World Wide Web code for free (podcast available here; alternatively there’s an extracted clip here). I talked a little bit about the WWW versus UMn’s Gopher, and how the Web has expanded beyond the initial target audience of academics and researchers. I gave a slightly-tangential answer to a question I was asked about the importance of the Web to Ireland’s future and economy (FYI: CSO 2007 ICT stats), saying how dependent we are on the Web to do many tasks today, and describing how our work at DERI in NUI Galway will help us to deal with the current over-abundance of websites, by adding more structure to web pages so that computers can help us in finding the right information. “Are you telling me that the future of the Web [...] is being designed in Galway?”, Richard asked at one point. Yes!!! Finally, I mentioned how the problems with online video gridlock may have larger consequences as the Web is increasingly moving from the desktop to mobile devices where bandwidth is even more important, so smarter ways are needed to reduce exactly what will be sent to your phone (FYI: Opera Mini is a nice example, a tiny Java browser that works on most phones where the content is pre-filtered server-side before it gets to you).
  • Last night, I went along with my friend Conrad to see Ulrich Schnauss at Stress in DeBurgo’s here in Galway. Although I missed the encore (it had been a long day, with a nine-hour session at work), I really enjoyed the night and the support acts: Beatpoet was great playing on his mono-something device, and Airiel were pretty good too :)

Slides from the SIOC tutorial at WWW2008

Here are the PowerPoint slides from our tutorial on “Interlinking Online Communities and Enriching Social Software with the Semantic Web” at the World Wide Web Conference in Beijing - you can also download them from here:

The tutorial went well. It was hot in the room and we were a bit jetlagged, but we had some good feedback afterwards and about 30 people attended in all.

I had a nice few days in Beijing: participating in the W3C advisory committee meeting on Sunday, Monday and Tuesday, giving our SIOC tutorial with Alex and Uldis on Monday afternoon, popping along to our paper at the Linked Data on the Web workshop on Tuesday, and attending some sessions on Wednesday (Kai-Fu Lee’s plenary keynote on Cloud Computing, the discussion panel with Lada Adamic et al. on the Future of Online Social Interactions, the W3C Open Your Data! track, and a packed session on Social Networks: Discovery and Evolution of Communities). On Thursday, I gave a talk about DERI at Tsinghua University to Cemon Yang and his team at the Digital Government / Web and Software Research Centre. On Thursday evening we had the banquet in the Great Hall of the People, and I headed back to Ireland on Friday.

Unfortunately I saw little of Beijing outside of travelling between venues in taxis and buses, so I have a good reason to return and see / do more next time…

WebCamp SNP and BlogTalk 2008 approacheth…

I’m in Cork with a posse of eight from DERI, and it’s the night before two co-located events: the WebCamp workshop on social network portability (Sunday) and the BlogTalk conference on social software (Monday, Tuesday). Others that have arrived in Cork this evening include Niall Larkin, Ajit Jaokar, Aral Balkan, Ben Ward, Dan Brickley, Ross Duggan and Stephanie Booth.

I’m really looking forward to the talks, the discussions, the networking, the food, and some positive outcomes from the next three days. And with invited speakers of this quality, I know it’s going to be good.

Unfortunately, I’m missing the Irish Blog Awards for the second year running, but boards.ie’s Managing Director Gerry Shanahan is representing us as a sponsor. At least I hope to meet up with many of the bloggers at tomorrow night’s optional blogger’s dinner at Rossini’s here in Cork (43 people have signed up).

More blog posts about the events will be available via the tags webcampsnp and blogtalk2008.

Five days left to register online for BlogTalk 2008!

Please note that online registration for BlogTalk 2008 (and WebCamp Social Network Portability) will close next Wednesday, 26th February 2008.

You can register at Amiando.

There are a few discount codes out there.

(Don’t forget to sign up for the optional blogger’s dinner as well!)

XTech 2008, May 6th-9th 2008, Dublin, Ireland

Call for Participation for XTech 2008

Proposals for presentations and tutorials are invited for XTech 2008, Europe’s premier web technologies conference. The deadline for submitting proposals is January 25th, 2008.

XTech 2008 will be held from May 6-9th 2008, in Dublin, Ireland.

XTech’s theme this year is “The Web on the Move”, focusing on the emerging portability of data, applications and identity on the internet. We will explore the benefits, issues, practicalities and fun of a web built on open standards, open source and commodity technology.

XTech presentations should inspire, educate and challenge. Your audience will be people like you, responsible for steering the technological direction of their organizations and the web as a whole.

Last year’s schedule can be viewed on the XTech 2007 web site.

Please direct any questions to the conference chair, Edd Dumbill.

View the calls for participation and submit a proposal

Suggested topics include, but are not limited to:

  • Social platforms
    • Design patterns for social software
    • Social network interoperability
    • Internet application platforms (Facebook F8, OpenSocial, etc.)
  • Identity management
    • OpenID
    • Practical security
    • OAuth
  • Ajax
    • jQuery, YUI, other toolkits
    • Offline applications
    • Comet
    • Professional Javascript
    • Flex
  • The web of data
    • Collective intelligence
    • Semantic technologies
    • Search
    • Markup and meaning
    • Freebase, Twine, Google Base
    • The place of XML on the web
  • Data and databases
    • Client-side databases
    • REST-oriented databases (e.g. CouchDB)
    • XML and RDF
    • Messaging architectures
    • XQuery
  • Operations and programming
    • Web application frameworks
    • Virtualization and appliances
    • Application scaling
    • Multicore and concurrency oriented programming
  • Mobile devices
    • Commodity mobiles
    • Android, iPhone
    • Hardware hacking and personal prototyping
    • Geolocation
    • Getting the mobile mindset

(Note: DERI will be a co-host of this event.)

Brewster Kahle’s (Internet Archive) ISWC talk on worldwide distributed knowledge

Universal access to all knowledge can be one of our greatest achievements.

The keynote speech at ISWC 2007 was given this morning by Brewster Kahle, co-founder of the Internet Archive and also of Alexa Internet. Brewster’s talk discussed the challenges in putting various types of media online, from books to video:

  • He started by talking about digitising books (1 book = 1 MB; the Library of Congress = 26 million books = 26 TB; with images, somewhat larger). At present, it costs about $30 to scan a book in the US. For 10 cents a page, books or microfilm can now be scanned at various centres around the States and put online. 250,000 books have been scanned so far and are held in eight online collections. He also talked about making books available to people through the OLPC project. Still, most people like having printed books, so book mobiles for print-on-demand books are now coming. A book mobile charges just $1 to print and bind a short book.
  • Next up was audio, and Brewster discussed issues related to putting recorded sound works online. At best, there are two to three million discs that have been commercially distributed. The biggest issue with this is in relation to rights. Rock ‘n’ roll concerts are the most popular category of the Internet Archive audio files (with 40,000 concerts so far); for “unlimited storage, unlimited bandwidth, forever, for free”, the Internet Archive offers bands their hosting service if they waive any issues with rights. There are various cultural materials that do not work well in terms of record sales, but there are many people who are very interested in having these published online. Audio costs about $10 per disk (per hour) to digitise. The Internet Archive has 100,000 items in 100 collections.
  • Moving images or video was next. Most people think of Hollywood films in relation to video, but at most there are 150,000 to 200,000 video items that are designed for movie theatres, and half of these are Indian! Many are locked up in copyright, and are problematic. The Internet Archive has 1,000 of these (out of copyright or otherwise permitted). There are other types of materials that people want to see: thousands of archival films, advertisements, training films and government films, being downloaded in the millions. Brewster also put out a call to academics at the conference to put their lectures online in bulk at the Internet Archive. It costs $15 per video hour for digitisation services. Brewster estimates that there are 400 “original” television channels (ignoring duplicate rebroadcasts). If you record a television channel for one year, it requires 10 TB, with a cost of $20,000 for that year. The Television Archive people at the Internet Archive have been recording 20 channels from around the world since 2000 (it’s currently about 1 PB in size) - that’s 1 million hours of TV - but not much has been made available just yet (apart from video from the week of 9/11). The Internet Archive currently has 55,000 videos in 100 collections.
  • Software was next. For example, a good archival source is old software that can be reused / replayed via virtual machines or emulators. Brewster came out against the Digital Millennium Copyright Act, which is “horrible for libraries” and for the publishing industry.
  • The Internet Archive is best known for archiving web pages. It started in 1996, by taking a snapshot of every accessible page on a website. It is now about 2 PB in size, with over 100 billion pages. Most people use this service to find their old materials again, since most people “don’t keep their own materials very well”. (Incidentally, Yahoo! came to the Internet Archive to get a 10-year-old version of their own homepage.)
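As a rough sanity check on the storage figures quoted in the list above, here is a minimal back-of-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and roughly seven years of recording (2000 to late 2007); the function names are my own, and the numbers fed in are just the ones from the talk.

```python
# Back-of-envelope checks of figures quoted in the talk (decimal units assumed).
MB = 10**6
TB = 10**12
PB = 10**15


def library_size_tb(books: int, bytes_per_book: int = 1 * MB) -> float:
    """Text-only size of a book collection, in terabytes."""
    return books * bytes_per_book / TB


def tv_archive_size_pb(channels: int, years: int, tb_per_channel_year: int = 10) -> float:
    """Storage needed to record several channels continuously, in petabytes."""
    return channels * years * tb_per_channel_year * TB / PB


def tv_hours(channels: int, years: int) -> int:
    """Hours of footage from recording every channel around the clock."""
    return channels * years * 365 * 24


if __name__ == "__main__":
    print(library_size_tb(26_000_000))   # 26 million books at 1 MB each
    print(tv_archive_size_pb(20, 7))     # 20 channels, about 7 years (assumed)
    print(tv_hours(20, 7))
```

With these inputs the book estimate comes out at 26 TB, matching the talk, while the television estimates land at about 1.4 PB and 1.2 million hours, which is the same ballpark as the "about 1 PB" and "1 million hours" quoted.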

Brewster then talked about preservation issues, i.e., how to keep the materials available. He referenced the famous library at Alexandria, Egypt which unfortunately is best known for burning. Libraries also tend to be burned by governments due to changes in policies and interests, so the computer world solution to this is backups. The Internet Archive in San Francisco has four employees and 1 PB of storage (including the power bill, bandwidth and people costs, their total costs are about $3,000,000 per year; 6 GB bandwidth is used per second; their storage hardware costs $700,000 for 1 PB). They have a backup of their book and web materials in Alexandria, and also store audio material at the European Archive in Amsterdam. Also, their Open Content Alliance initiative allows various people and organisations to come together to create joint collections for all to use.

Access was the next topic of his presentation. Search is making in-roads in terms of time-based search. One can see how words and their usage change over time (e.g., “marine life”). Semantic Web applications for access can help people to deal with the onslaught of information. There is a huge need to take large related subsets of the Internet Archive collections and to help them make sense for people. Great work has been done recently on wikis and search, but there is a need to “add something more to the mix” to bring structure to this project. To do this, Brewster reckons we need the ease of access and authoring from the wiki world, but also ways to incorporate the structure that we all know is in there, so that it can be flexible enough for people to add structure one item at a time or to have computers help with this task.

In the recent initiative “OpenLibrary.org”, the idea is to build one webpage for every book ever published (not just ones still for sale) to include content, metadata, reviews, etc. The relevant concepts in this project include: creating Semantic Web concepts for authors, works and entities; having wiki-editable data and templates; using a tuple-based database with history; making it all open source (both the data and the code, in Python). OpenLibrary.org has 10 million book records, with 250k in full text.
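To give a feel for what a "tuple-based database with history" might look like, here is a tiny illustrative sketch. This is not OpenLibrary's actual design or code: the class, its method names and the sample record are all hypothetical, and the only idea taken from the talk is that edits are stored as tuples and old values are kept rather than overwritten.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TupleStore:
    """Append-only store of (revision, key, property, value) tuples."""

    rows: list = field(default_factory=list)

    def put(self, key: str, prop: str, value: str) -> int:
        """Record a new value and return its revision number."""
        revision = len(self.rows) + 1
        self.rows.append((revision, key, prop, value))
        return revision

    def get(self, key: str, prop: str, at_revision: Optional[int] = None) -> Optional[str]:
        """Return the latest value at or before at_revision (default: now)."""
        for revision, k, p, value in reversed(self.rows):
            if k == key and p == prop and (at_revision is None or revision <= at_revision):
                return value
        return None

    def history(self, key: str, prop: str) -> list:
        """Return every (revision, value) pair ever recorded for a property."""
        return [(r, v) for r, k, p, v in self.rows if k == key and p == prop]


if __name__ == "__main__":
    store = TupleStore()
    store.put("book/1", "title", "Exmaple Title")   # wiki-style edit with a typo
    store.put("book/1", "title", "Example Title")   # later correction
    print(store.get("book/1", "title"))
    print(store.history("book/1", "title"))
```

Because nothing is ever overwritten, a wiki-style edit is just another row, and any earlier state of a record can be read back by asking for an older revision.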

I really enjoyed this talk, and having been a fan of the Wayback Machine for many years, I think there could be an interesting link to the SIOC Project if we think in terms of archiving people’s conversations from the Web, mailing lists and discussion groups for reuse by us and the generations to come.

Multiple MediaWikis on Debian

Spent a few hours today trying to make a “wiki farm” on Debian using MediaWiki. I already had six wikis using separate code directories on the one server, so when I needed to update them all it was a real pain. Having to create a seventh standalone wiki today pushed me to doing this. I documented it here. Not sure if my notes will be helpful to others but I hope so… Took a little longer as I wanted to be able to lock down each wiki with htpasswd (I know you can lock down parts of MoinMoin, but MediaWiki isn’t so partitionable).

Ross Mayfield podcast interview on PodLeaders

Tom Raftery has posted his podcast interview with Ross Mayfield at PodLeaders, well worth a listen. Ross, whom I met briefly at Wikimania 2005, is CEO of SocialText.

To follow up on my question about Semantic Wikis, I think that sometimes there is the misapprehension that anything semantic has to involve some automatic AI-like deduction of metadata from the content by some agent or computer. A big part of the Semantic Web is enabling users to add structured content / annotations to pages (wikis being a good example here!) that can then be used to link things together (see the latter part of my IIA blog post on this). The Wikipedia page about Ross Mayfield links to about 25 pages - but it isn’t possible to get help with even a simple question such as “find me all the organisations that Ross has worked with or for”.

For example, the Semantic MediaWiki system allows people to add structured data into pages, such as typed links and attributes (or relationships and number / text properties). By allowing people to add such extra metadata, the systems can then show related pages (either through common relationships or properties or by embedding search queries in pages). These enhancements are powered by the metadata that the people enter (aided by computers of course, but not too much!)…
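To make the idea of typed links a bit more concrete, here is a minimal sketch of how annotations entered by people could answer a question like “which organisations has this person worked with or for?”. It assumes the `[[property::value]]` annotation style used by Semantic MediaWiki; the page text, property names and organisations are made up for illustration, and this is plain pattern matching rather than anything from the Semantic MediaWiki code itself.

```python
import re
from collections import defaultdict

# Matches typed links such as [[works for::Example Corp]] or
# [[works for::Example Corp|label]]; plain links like [[Wiki]] are ignored.
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext: str) -> dict:
    """Collect property -> list of values from the typed links in a page."""
    found = defaultdict(list)
    for prop, value in ANNOTATION.findall(wikitext):
        found[prop.strip().lower()].append(value.strip())
    return dict(found)


# Hypothetical page content; names and properties are invented.
pages = {
    "Example Person": (
        "Example Person is CEO of [[works for::Example Corp]] "
        "and previously advised [[worked with::Sample Org]]. "
        "See also [[Wiki]]."
    ),
}


def organisations_for(page: str) -> list:
    """Answer 'which organisations has this person worked with or for?'."""
    notes = extract_annotations(pages[page])
    return notes.get("works for", []) + notes.get("worked with", [])


if __name__ == "__main__":
    print(organisations_for("Example Person"))
```

The point is that no AI-like deduction is involved: the query only works because someone typed the relationship into the page, and an untyped link such as `[[Wiki]]` contributes nothing to the answer.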