TDWG 2008

I'm flying back home after a week in Fremantle, Western Australia. I was at the TDWG 2008 conference (pronounced "ted-wig"), a place where naturalists and computerists meet to talk about how we can share data.

I hadn't been to TDWG before, but it has been running for quite a few years and has produced some interesting domain-specific schemas and protocols. There's Darwin Core, a vocabulary for representing taxonomic, observational and curatorial information - a sort of FOAF for naturalists; there's also TAPIR, a network protocol on top of HTTP, for retrieving records from remote servers. The conference also featured several talks by people doing cool stuff with bioinformatics.

Rich Pyle: Taxonomic names

Rich gave a quick and witty introduction to the naming schemes used by botanists and zoologists. They are similar but different, and deeply complicated: a name can contain a combination of person, year, and other names; abbreviations are used, but conventions differ. It all makes sense once you wrap your head around it, I guess.

Matt Jones: DataNetONE

Matt works on DataNetONE - they are building a network for data sharing and distribution across several big organizations. They are interested in "earth observations", the kind of data that goes into mashed-up, multi-layer maps. Matt talked about the challenges in building such a network; interestingly, they take a long-term view: they want to produce useful results today, but also have a sustainable architecture and business model for the next 30 years.

Terry Catapano: Plazi

Terry is a librarian at Columbia University, working on digitizing books and articles. He talked about marking up documents: delimiting paragraphs and taxonomic treatments, marking titles, scientific names of species, etc. For existing works this has to be done manually and the cost is non-trivial; it would make a lot of sense for people to mark up their own papers and then publish them in a digital format. He also made the case for more free-form scientific papers: right now the text needs to fit in A4 (or Letter) page format, with pictures; what's keeping us from switching to rich multimedia objects (think web pages)?

Chris Freeland: BHL

BHL stands for Biodiversity Heritage Library – it's a distributed book scanning project, part of Encyclopedia of Life, that focuses on public-domain (pre-1923) works in the field of biodiversity. The scanned data is provided publicly for free; more than 22000 scanned volumes are available already, and they still have years of work ahead of them. Books from the Internet Archive are also available from their site.

Jim Croft: Biodiversity information standards

TDWG is not known for speed; one could repeatedly hear variations of the phrase "moving at TDWG time" at the conference. Its work is not driven by industry or big money, and many of the people donate their time to move things forward, so nobody was shocked by the subtitle of Jim's talk, "are we going wrong, or just not quite right?". He outlined some failings of the TDWG standardisation process – and proposed solutions – in a laugh-out-loud presentation. My favourite slide, named "where do we fit?", is based on the XKCD comic that arranges fields by purity.

Chris Freeland: JPEG2000

On the final day, Chris Freeland gave a presentation on the JPEG2000 image format (JP2 for short). Turns out, it's a really interesting and useful piece of technology that has little to no browser support but can still benefit people who are serious about imaging. At the Missouri Botanical Garden they have high-resolution photographs and scans stored in JP2, from which they crop tiles and serve them in a Google Maps fashion.
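To make the "Google Maps fashion" part concrete, here is a minimal sketch of the tile arithmetic such a viewer needs. It is not the Missouri Botanical Garden's code: the 256-pixel tile size and the function names are my assumptions, and the actual decoding of a JP2 region (which an imaging library would do) is left out.

```python
TILE_SIZE = 256  # assumed tile edge in pixels; not taken from the talk


def max_zoom(width, height, tile_size=TILE_SIZE):
    """Zoom level at which one tile pixel equals one image pixel.

    At zoom 0 the whole image fits in a single tile; every further
    level doubles the resolution.
    """
    zoom = 0
    while tile_size * 2 ** zoom < max(width, height):
        zoom += 1
    return zoom


def tile_box(width, height, zoom, col, row, tile_size=TILE_SIZE):
    """Return (left, top, right, bottom) in full-resolution pixels.

    This is the region of the source image that would be cropped and
    scaled down to produce tile (col, row) at the given zoom level.
    """
    top_level = max_zoom(width, height, tile_size)
    if not 0 <= zoom <= top_level:
        raise ValueError("zoom level out of range")
    # Each tile covers this many source pixels along one edge.
    span = tile_size * 2 ** (top_level - zoom)
    left, top = col * span, row * span
    if col < 0 or row < 0 or left >= width or top >= height:
        raise ValueError("tile lies outside the image")
    # Tiles on the right and bottom edges are clipped to the image.
    return (left, top, min(left + span, width), min(top + span, height))
```

JPEG2000 stores an image at several resolutions that differ by powers of two, which is why halving the scale per zoom level is a natural fit here.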

Greg Riccardi: OntoMorphBankSter

Greg talked about MorphBank, in his words a "superbly organized collection of images and metadata" related to biodiversity. He also described Morphster, a tool for working with ontologies and image annotation, and explained how the two could work together, combining strengths to create an "image-driven ontology and/or ontology-driven image annotation" system – a database with lots of high-quality images, properly annotated as species occurrences.

Judy Fisher: Users of invasive species data

Apparently there's a lot more to invasive species than mourning the loss of native biodiversity: by studying the strengths and weaknesses of species, some invasives can be controlled and possibly eliminated. Judy described a success story in a nature reserve near Perth, where an invasive grass was driven back using controlled burning – this favoured seeds of native plants, which are better at withstanding fire. She also made the case for data sharing: the studies made by her group gathered a lot of detailed data that could benefit other groups, but there's no good way of publishing/advertising it.

Denis Lepage: AKN

The Avian Knowledge Network is a success story in collecting and publishing biodiversity observation data. They have more than 50 million records, described using Darwin Core (a TDWG standard) with some bird-specific extensions. Given the particular nature of birdwatching (an observer will report how long he stayed in one place, what species he was looking for, what species he expected to see but did not, etc.) they can create interesting views, like an accurate map of how common a species is.
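Why does it matter that observers also report what they did not see? Here is a rough sketch of one way a "how common is it" map could be computed: the fraction of checklists in each grid cell that report the species. This is my own illustration, not AKN's method; the record fields (`lat`, `lon`, `species`) and the one-degree grid are assumptions.

```python
import math
from collections import defaultdict


def reporting_frequency(checklists, species, cell_size=1.0):
    """Fraction of checklists per grid cell that include `species`.

    `checklists` is an iterable of dicts with hypothetical keys
    'lat', 'lon' (decimal degrees) and 'species' (a set of names).
    Returns {(lat_index, lon_index): frequency between 0 and 1}.
    """
    visits = defaultdict(int)
    detections = defaultdict(int)
    for checklist in checklists:
        cell = (
            math.floor(checklist["lat"] / cell_size),
            math.floor(checklist["lon"] / cell_size),
        )
        # Every checklist counts as a visit, detection or not.
        visits[cell] += 1
        if species in checklist["species"]:
            detections[cell] += 1
    return {cell: detections[cell] / count for cell, count in visits.items()}
```

Without the checklists where the bird was absent, a cell with one sighting in a hundred visits would look the same as a cell with one sighting in one visit.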

Roger Hyam: SpeciesIndex

The final talk of the conference made me feel right at home. TDWG is all about helping naturalists to exchange data, and sometimes it tends to build complicated solutions, like the TAPIR protocol, whose extremely flexible query semantics (it reminds me of SPARQL) make it hard to implement and then optimize. Roger's proposal, on the other hand, is dead simple: people are already publishing web pages that describe species; why not create an index of such sites and their pages, keyed to species names? Use the well-established sitemaps protocol, maybe with a simple extension to provide biodiversity-related metadata, to register these pages in an index, and provide a way to find pages via a scientific name. Seems obvious, yet apparently nobody has done this yet.
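To show how little machinery this needs, here is a minimal sketch of such an index. The sitemap namespace is the real one from sitemaps.org; the extension namespace and its `scientificName` element are placeholders I made up, since the talk did not (as far as I noted) define an actual schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BIO_NS = "http://example.org/species-sitemap"  # hypothetical extension

EXAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:bio="http://example.org/species-sitemap">
  <url>
    <loc>http://example.org/birds/black-swan</loc>
    <bio:scientificName>Cygnus atratus</bio:scientificName>
  </url>
  <url>
    <loc>http://example.org/about</loc>
  </url>
</urlset>
"""


def normalise(name):
    """Collapse whitespace and case so lookups are forgiving."""
    return " ".join(name.split()).lower()


def index_sitemap(xml_text, index=None):
    """Add every (scientific name -> page URL) pair in a sitemap to index."""
    index = defaultdict(list) if index is None else index
    root = ET.fromstring(xml_text)
    for url in root.iter(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        if not loc:
            continue
        # Pages without the extension element are simply skipped.
        for name in url.iter(f"{{{BIO_NS}}}scientificName"):
            if name.text and name.text.strip():
                index[normalise(name.text)].append(loc.strip())
    return index


def find_pages(index, scientific_name):
    """Return the list of page URLs registered for a scientific name."""
    return index.get(normalise(scientific_name), [])
```

Calling `index_sitemap(EXAMPLE)` and then `find_pages(index, "Cygnus atratus")` returns the one page that declares that name. A real service would also have to deal with synonyms and misspellings, which this sketch ignores.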

All in all there were more than 70 talks; these were just the most interesting. Hallway conversations were valuable; the people attending TDWG are an interesting mix of computer people and naturalists. I wouldn't mind coming back next year.

Created:
27 Oct 2008, 14:56