Archive for the 'Web' Category

DERI Entrepreneurial Forum #2 last week

We had a very interesting event in DERI last week - the DERI Entrepreneurial Forum #2 - where six CEOs from the west of Ireland gave us their views on entrepreneurship. There was some frank sharing of professional and personal experiences on both starting and running a company in Ireland.

The speakers were Jan Blanchard, CEO of Tourist Republic; John Brosnan, CEO of Netfort Technologies; Greg Cawley, CEO of Traventec; Julian Ellison, CEO of Tablane; Alan Duggan, CEO of Nephin Games; and Karl Flannery, CEO of Storm.

I think it was very useful for budding entrepreneurs in DERI to engage with these CEOs and to exchange ideas about their “dos and do nots”. (We even got some book recommendations from Jan!)

(Aside: God, I hate it when Google do their link tracking stuff for searches. I just want to be able to right click and copy a link, not have to copy some text on a page or click through, CTRL+L and CTRL+C. Stop it Google, you have enough tracking information already!)

Brewster Kahle’s (Internet Archive) ISWC talk on worldwide distributed knowledge

Universal access to all knowledge can be one of our greatest achievements.

The keynote speech at ISWC 2007 was given this morning by Brewster Kahle, co-founder of the Internet Archive and also of Alexa Internet. Brewster’s talk discussed the challenges in putting various types of media online, from books to video:

  • He started by talking about digitising books (1 book = 1 MB; the Library of Congress = 26 million books = 26 TB; with images, somewhat larger). At present, it costs about $30 to scan a book in the US. For 10 cents a page, books or microfilm can now be scanned at various centres around the States and put online. 250,000 books have been scanned in so far and are held in eight online collections. He also talked about making books available to people through the OLPC project. Still, most people like having printed books, so bookmobiles for print-on-demand books are now coming. A bookmobile charges just $1 to print and bind a short book.
  • Next up was audio, and Brewster discussed issues related to putting recorded sound works online. At best, there are two to three million discs that have been commercially distributed. The biggest issue with this is in relation to rights. Rock ‘n’ roll concerts are the most popular category of the Internet Archive audio files (with 40,000 concerts so far); for “unlimited storage, unlimited bandwidth, forever, for free”, the Internet Archive offers bands their hosting service if they waive any issues with rights. There are various cultural materials that do not work well in terms of record sales, but there are many people who are very interested in having these published online. Audio costs about $10 per disk (per hour) to digitise. The Internet Archive has 100,000 items in 100 collections.
  • Moving images or video was next. Most people think of Hollywood films in relation to video, but at most there are 150,000 to 200,000 video items that are designed for movie theatres, and half of these are Indian! Many are locked up in copyright, and are problematic. The Internet Archive has 1,000 of these (out of copyright or otherwise permitted). There are other types of materials that people want to see: thousands of archival films, advertisements, training films and government films, being downloaded in the millions. Brewster also put out a call to academics at the conference to put their lectures online in bulk at the Internet Archive. It costs $15 per video hour for digitisation services. Brewster estimates that there are 400 “original” television channels (ignoring duplicate rebroadcasts). If you record a television channel for one year, it requires 10 TB, with a cost of $20,000 for that year. The Television Archive people at the Internet Archive have been recording 20 channels from around the world since 2000 (it’s currently about 1 PB in size) - that’s 1 million hours of TV - but not much has been made available just yet (apart from video from the week of 9/11). The Internet Archive currently has 55,000 videos in 100 collections.
  • Software was next. For example, a good archival source is old software that can be reused / replayed via virtual machines or emulators. Brewster came out against the Digital Millennium Copyright Act, which is “horrible for libraries” and for the publishing industry.
  • The Internet Archive is best known for archiving web pages. It started in 1996, by taking a snapshot of every accessible page on a website. It is now about 2 PB in size, with over 100 billion pages. Most people use this service to find their old materials again, since most people “don’t keep their own materials very well”. (Incidentally, Yahoo! came to the Internet Archive to get a 10-year-old version of their own homepage.)
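The storage and cost figures quoted in the list above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check: it assumes decimal units (1 TB = 1,000,000 MB), continuous recording, and a seven-year span (2000 to 2007) for the television archive, none of which were stated in the talk.

```python
# Back-of-the-envelope check of figures quoted in the talk.
# Assumptions (not from the talk): decimal units, 24-hour recording,
# and a recording span of 7 years (2000 to 2007).

MB_PER_BOOK = 1
BOOKS_IN_LIBRARY_OF_CONGRESS = 26_000_000
loc_size_tb = BOOKS_IN_LIBRARY_OF_CONGRESS * MB_PER_BOOK / 1_000_000

# Work in cents to avoid floating-point rounding on 0.10.
COST_PER_PAGE_CENTS = 10
COST_PER_BOOK_CENTS = 30 * 100
implied_pages_per_book = COST_PER_BOOK_CENTS // COST_PER_PAGE_CENTS

TB_PER_CHANNEL_YEAR = 10
CHANNELS = 20
YEARS = 7
HOURS_PER_YEAR = 24 * 365

tv_size_pb = CHANNELS * YEARS * TB_PER_CHANNEL_YEAR / 1_000
tv_hours = CHANNELS * YEARS * HOURS_PER_YEAR

print(f"Library of Congress as text: {loc_size_tb} TB")
print(f"Implied pages per $30 book:  {implied_pages_per_book}")
print(f"Television archive size:     {tv_size_pb} PB")
print(f"Television archive hours:    {tv_hours:,}")
```

Under these assumptions the television archive works out at 1.4 PB and roughly 1.2 million hours, which is in line with the "about 1 PB" and "1 million hours" quoted.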

Brewster then talked about preservation issues, i.e., how to keep the materials available. He referenced the famous library at Alexandria, Egypt, which unfortunately is best known for burning. Libraries also tend to be burned by governments due to changes in policies and interests, so the computer world solution to this is backups. The Internet Archive in San Francisco has four employees and 1 PB of storage (including the power bill, bandwidth and people costs, their total costs are about $3,000,000 per year; 6 GB bandwidth is used per second; their storage hardware costs $700,000 for 1 PB). They have a backup of their book and web materials in Alexandria, and also store audio material at the European Archive in Amsterdam. Also, their Open Content Alliance initiative allows various people and organisations to come together to create joint collections for all to use.

Access was the next topic of his presentation. Search is making in-roads in terms of time-based search. One can see how words and their usage change over time (e.g., “marine life”). Semantic Web applications for access can help people to deal with the onslaught of information. There is a huge need to take large related subsets of the Internet Archive collections and to help them make sense for people. Great work has been done recently on wikis and search, but there is a need to “add something more to the mix” to bring structure to this project. To do this, Brewster reckons we need the ease of access and authoring from the wiki world, but also ways to incorporate the structure that we all know is in there, so that it can be flexible enough for people to add structure one item at a time or to have computers help with this task.

In the recent initiative “OpenLibrary.org”, the idea is to build one webpage for every book ever published (not just ones still for sale) to include content, metadata, reviews, etc. The relevant concepts in this project include: creating Semantic Web concepts for authors, works and entities; having wiki-editable data and templates; using a tuple-based database with history; making it all open source (both the data and the code, in Python). OpenLibrary.org has 10 million book records, with 250k in full text.
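To give a rough feel for what a “tuple-based database with history” might mean, here is a minimal sketch. It is not Open Library's actual design: the class name `TupleStore`, its methods, and the sample record are all hypothetical, chosen only to illustrate the idea of never overwriting a fact and instead appending a new revision.

```python
from typing import List, Optional, Tuple


class TupleStore:
    """Hypothetical append-only store of (subject, predicate, value) facts."""

    def __init__(self) -> None:
        # Each entry is (revision, subject, predicate, value).
        self._log: List[Tuple[int, str, str, str]] = []

    def put(self, subject: str, predicate: str, value: str) -> int:
        """Record a new value and return its revision number."""
        revision = len(self._log) + 1
        self._log.append((revision, subject, predicate, value))
        return revision

    def get(self, subject: str, predicate: str,
            at_revision: Optional[int] = None) -> Optional[str]:
        """Return the latest value, or the value as of a given revision."""
        result = None
        for revision, s, p, value in self._log:
            if at_revision is not None and revision > at_revision:
                break
            if s == subject and p == predicate:
                result = value
        return result

    def history(self, subject: str, predicate: str) -> List[Tuple[int, str]]:
        """Return every (revision, value) ever recorded for this fact."""
        return [(r, v) for r, s, p, v in self._log
                if s == subject and p == predicate]


# Usage with a made-up record: a wiki-style edit keeps the old title.
store = TupleStore()
store.put("book/1", "title", "Moby Dick")
store.put("book/1", "title", "Moby-Dick; or, The Whale")
current_title = store.get("book/1", "title")
original_title = store.get("book/1", "title", at_revision=1)
```

Keeping every revision is what would make wiki-style editing safe, since any edit can be inspected or rolled back.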

I really enjoyed this talk, and having been a fan of the Wayback Machine for many years, I think there could be an interesting link to the SIOC Project if we think in terms of archiving people’s conversations from the Web, mailing lists and discussion groups for reuse by us and the generations to come.

Lally meetup on Saturday…

Had an interesting evening chatting about Web 2.0, the Semantic Web and Fortune 500 consultancy with Brendan Lally (a Galway-born IT and Web consultant currently based in Colorado) during a night out with a few other web heads including James Cooley and Ina O’Murchu from DERI, and Richard Garsthagen, Technical Marketing Manager EMEA for VMware. We started off in the Kashmir Indian restaurant and gradually made our way to Sheridan’s on the Dock for some organic colas and Erdingers. As Brendan mentioned, Richard helped me to get my Nokia 770 talking to my 6234 (*99# was news to me) so that he could show us Autostitch (a fully-automatic 2D image stitcher). It was good to meet you Brendan; I hope the rest of your round-Ireland trip goes well.

Multiple MediaWikis on Debian

Spent a few hours today trying to make a “wiki farm” on Debian using MediaWiki. I already had six wikis using separate code directories on the one server, so when I needed to update them all it was a real pain. Having to create a seventh standalone wiki today pushed me to doing this. I documented it here. Not sure if my notes will be helpful to others but I hope so… Took a little longer as I wanted to be able to lock down each wiki with htpasswd (I know you can lock down parts of MoinMoin, but MediaWiki isn’t so partitionable).
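My notes are linked rather than reproduced here, so the following is only one possible layout and not necessarily the one I used: a single shared MediaWiki code directory, with each wiki getting its own directory of symlinks plus its own LocalSettings.php and images folder. The function name, the paths and the choice of which entries stay per-wiki are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Entries assumed to be kept separate for each wiki rather than shared.
PER_WIKI_ENTRIES = {"LocalSettings.php", "images"}


def link_wiki(shared_code: Path, wiki_root: Path) -> List[str]:
    """Symlink shared MediaWiki code into wiki_root; return the names linked."""
    wiki_root.mkdir(parents=True, exist_ok=True)
    # Each wiki keeps its own uploads directory.
    (wiki_root / "images").mkdir(exist_ok=True)

    linked = []
    for entry in sorted(shared_code.iterdir()):
        if entry.name in PER_WIKI_ENTRIES:
            continue
        target = wiki_root / entry.name
        # Leave anything that already exists alone, so reruns are harmless.
        if not target.exists() and not target.is_symlink():
            target.symlink_to(entry.resolve())
            linked.append(entry.name)
    return linked
```

With a layout like this, upgrading the shared directory would update every wiki at once, which is the whole point of the exercise. Password protection with htpasswd would still be configured per wiki directory in the web server.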

Politics in Ireland revamped, relaunched

Got a message from Damien Mulley about the revamped and relaunched Politics in Ireland website. It looks really good now, and I wish Damien all the best with it. In the words of Damien himself, it’s:

[...] an aggregator like IrishBlogs.ie but for any blog posts that mention a TD name. Over time this will expand to take in more politicians. The site is party neutral and is in no way partisan.

[...] After a few months in redevelopment it’s now ready to be used by the public. It’s been totally rebuilt with new code to allow it to be administered easier on the backend (good for me) and the updates should be a bit more regular. (good for you)

New features are widgets which can be stuck on your blog so you can display the latest politics posts on your own blog: http://www.politicsinireland.com/widget/

Additionally people can now subscribe by email and get these posts to their email address and so don’t need to come back on a daily basis or subscribe using a feed reader. Email sub box is on front page.

Lastly there is now a Wordpress plugin that if you install it, will link to the relevant TD page on Politics in Ireland any time you mention a TD in a blog post. Anyone can install this, once they have a Wordpress blog: http://www.politicsinireland.com/wordpress-plugin/
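The name-matching idea behind both the aggregator and the plugin can be sketched in a few lines. This is not Damien's code (the real plugin is for Wordpress, so it would be PHP); the function name, the example TD name and the URL below are all made up for illustration.

```python
import re
from typing import Dict


def link_td_mentions(text: str, td_pages: Dict[str, str]) -> str:
    """Wrap each known TD name in text with a link to that TD's page."""
    if not td_pages:
        return text
    # Try longer names first so a longer name wins over a shorter overlap.
    names = sorted(td_pages, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        return f'<a href="{td_pages[name]}">{name}</a>'

    return pattern.sub(replace, text)


# Hypothetical name and URL, for illustration only.
pages = {"Jane Murphy": "https://example.org/td/jane-murphy"}
linked_post = link_td_mentions("Jane Murphy spoke in the Dáil today.", pages)
```

An aggregator could use the same pattern in reverse: instead of rewriting the text, it would simply check whether any name matches and file the post under that TD.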

Informatics Showcase in Dublin, 3rd October 2007

I had an e-mail from Gerard Butler at EI telling me about the forthcoming Informatics Showcase event in Dublin on Wednesday, October 3rd. The event features presentations by researchers on commercial opportunities from their informatics research: typically technology license or campus company investment opportunities.

I’d be interested in attending, as the Zimbie and Millifeed projects are both quite interesting to me, but I will be on leave that week. Here’s the blurb from the site:

This event is designed to bring people together from the venture capital, business and research community to create mutually-profitable partnerships.

Since the informatics team started bringing researchers and industry together in 2005, Enterprise Ireland has worked closely with the commercialisation specialists in the research community. This has resulted in 12 technologies licensed to Irish companies with an additional 8 deals on track in 2007.

We have also supported the start up of three companies in the last two years, with another four due in 2007. 160 research projects in the informatics area have been funded to date and this event provides the opportunity to hear from 8 researchers who are currently bringing their technologies to market.

Potential investors or licensors of the technologies on show can also schedule a one-to-one meeting on the day with the researchers in advance. To arrange this you need to register.

Top Irish websites? Any resources?

Does anyone have a good resource for listing the top Irish websites? All I have to go on at the moment are Alexa’s Ireland listings (which have boards.ie at #9) and the Top 100 Irish Sites (boards.ie is at #6).

Been a busy B…

…for the past few months, hence the lack of regular blog entries. Most of my summer has been taken up with proposal writing for research funding here at DERI: the first proposal finished up around the end of June and the second ran from then until the end of August, so unfortunately I haven’t had time for much else…

Anyway, here are some updates about future social media / social software activities I’m involved in:

ITAG BBQ last week

I attended the yearly IT Association of Galway Barbeque on behalf of DERI last week - it was an enjoyable night, and you can check out the ITAG website for other forthcoming events including Managing Teams Remotely, Borderless Ireland, and the ITAG Industry Awards 2007.

I’ve been Simpson-ized!


(Courtesy of Simpsonize Me, an old photo and some manual tweaking.)