Friday, November 13, 2009

MX multi-entry key: a self-review from a recent web publication



Recently we published an example of an MX-generated multi-entry key as part of the publication and review of "Revision of the Oriental genera of Agathidinae (Hymenoptera, Braconidae) with an emphasis on Thailand and interactive keys to genera published in three different formats", authored by Michael J. Sharkey, Dicky S. Yu, Simon van Noort, yours truly, and Lyubomir Penev. The revision was part of a series in ZooKeys reviewing multi-key creation practices, including some suggestions for publishers of open-access journals.

Here's the key online.

The publication made a nice real-world example. It highlighted some strong points of the system (multi-use matrices, the auto-linker to the Hymenoptera Anatomical Ontology, the ability to import/export Nexus files, links to Morphbank images using Morphbank web services) and how we can expose these views publicly (i.e. the matrix view can be public [no-edit] or private [edit]). More importantly, it showed me a few things that could be improved, particularly in archiving the key version for future use:



1. We expose the Nexus file for the key and a character list, but we need a single export format that contains all of the key annotations (including image and specimen data). We already have character lists with image and specimen information attached for download, which is enough information to recreate the key. But what is the best way to package it? SDD? NeXML? Something else?

2. Also wouldn't it be grand to add a public comment section to the key itself? So while a user is working through the key they could add comments that would be archived for the key author.

3. Put a greater focus on images as the primary language, and continue to move further away from words.

4. Encourage the use of tags to show the confidence in, and quality of, characters.

Wednesday, November 4, 2009

Endnote Gem

HOW from HAO

Working on an Endnote parser for MX as part of our interactions with the Biodiversity Heritage Library (BHL). We want to collect new terms parsed from Journal of Hymenoptera Research (JHR) articles OCRed on BHL.

But then we ran into the first issue: JHR literature citations are not available in a nice format we can import (we need them per article, but BHL has them in Endnote per volume). Ideally the parser will end up published (in part) as my first gem on Github (results of project #1). The logic for creating an Endnote parser, or perhaps better termed the justification for creating one, primarily has to do with Google. We at the Hymenoptera Anatomy Project are adding lots of references, and adding a reference in MX is form heavy (unavoidable)...that is, if you type it all in. Google Scholar exports references in Endnote. Thus the proposed workflow is something like this:
  1. I have a citation I want to enter
  2. I Google it
  3. Copy the Endnote export
  4. Paste and verify in MX
This may or may not be faster. But that's the next idea: run a short experiment comparing the amount of time it takes me to enter 25 references each way (results of project #2).
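To make the "paste and verify" step concrete, here is a minimal sketch of what the parsing half could look like. It assumes the pasted text is in the Endnote tagged format (lines such as `%T title` and `%A author`, with records separated by blank lines or by a new `%0` line). The method name `parse_endnote`, the field names, and the sample record are all hypothetical placeholders, not MX code and not the eventual gem.

```ruby
# Hypothetical sketch, not MX code: parse Endnote tagged text into an
# array of hashes, one hash per reference.

# Only a handful of tags are handled here; anything else is ignored.
ENDNOTE_TAGS = {
  '0' => :type,
  'T' => :title,
  'A' => :authors,
  'J' => :journal,
  'V' => :volume,
  'N' => :number,
  'P' => :pages,
  'D' => :year
}.freeze

# Fields that may occur several times in one record.
REPEATABLE_FIELDS = [:authors].freeze

def parse_endnote(text)
  records = []
  current = nil

  text.each_line do |raw|
    line = raw.strip

    # A blank line closes the record in progress.
    if line.empty?
      records << current if current
      current = nil
      next
    end

    match = line.match(/\A%(\S)\s+(.*)\z/)
    next unless match # skip anything that is not "%X value"

    tag = match[1]
    value = match[2]

    # A %0 line also starts a new record if one is still open.
    if tag == '0' && current
      records << current
      current = nil
    end

    field = ENDNOTE_TAGS[tag]
    next unless field

    current ||= {}
    if REPEATABLE_FIELDS.include?(field)
      (current[field] ||= []) << value
    else
      current[field] = value
    end
  end

  records << current if current
  records
end

# Placeholder record (made up for illustration, not a real citation).
sample = <<~ENW
  %0 Journal Article
  %T A placeholder title
  %A Doe, Jane
  %A Roe, Richard
  %J Journal of Placeholder Studies
  %V 1
  %P 1-10
  %D 2009
ENW

refs = parse_endnote(sample)
puts refs.first[:title]
```

Each hash could then pre-fill the MX reference form, leaving a human to do the verification.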

It will be most useful if we can then export all references in Endnote as well (or perhaps in some other library-friendly formats?). That way we can return the nicely formatted references to those who need them (including JHR and maybe BHL).
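The export direction could be sketched the same way. This assumes a reference is available as a plain hash with keys such as `:title` and `:authors`; that structure, and the method names `to_endnote` and `export_endnote`, are hypothetical stand-ins for whatever the MX reference model actually provides.

```ruby
# Hypothetical sketch, not MX code: write reference hashes back out as
# Endnote tagged text.

# Single-valued fields and their tags, in output order.
EXPORT_TAGS = {
  title: 'T',
  journal: 'J',
  volume: 'V',
  number: 'N',
  pages: 'P',
  year: 'D'
}.freeze

def to_endnote(ref)
  # Assumption: default to a journal article when no type is given.
  lines = ["%0 #{ref.fetch(:type, 'Journal Article')}"]

  # One %A line per author.
  Array(ref[:authors]).each { |author| lines << "%A #{author}" }

  EXPORT_TAGS.each do |field, tag|
    value = ref[field]
    next if value.nil? || value.to_s.empty?
    lines << "%#{tag} #{value}"
  end

  lines.join("\n") + "\n"
end

# Records are separated by a blank line.
def export_endnote(refs)
  refs.map { |ref| to_endnote(ref) }.join("\n")
end

# Placeholder reference (made up for illustration, not a real citation).
example = { title: 'A placeholder title', authors: ['Doe, Jane'], year: 2009 }
puts to_endnote(example)
```

Empty fields are skipped rather than written as bare tags, so a partially entered reference still produces a file that can be loaded elsewhere.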

What I would really like to see is BHL OCR returned to me based on pages. I know you can already get it by asking BHL to email you the pages, and rumor has it that a wrapper is being written to hack just this, but it would be lovely to access it directly without the hack.