Notes from September 10 meeting

From Metro Collaborate


Code4LibNYC Meeting | METRO Offices | September 10, 2008

Lightning talks – notes and contact information

URLs

Please note that the URLs discussed below are also listed on this wiki at Code4libNYC9-10.

Lee Mandell (NYU Digital Library Technology Services) – Archivists' Toolkit

Contact: lee@nyu.edu | http://www.archiviststoolkit.org/

Lee's Presentation: Media:Code4libNYC9-10-08.ppt

The Archivists' Toolkit is an open-source collection management system for archives.

History

Development began with phase 1 around 6 or 7 years ago. The second phase will end in June. What is the sustainability model? Spin off a not-for-profit, find a parent organization, etc.?

Technical info

Focus for this presentation: back-end tech info!

Basics:

  • Database: MySQL, but also works with Oracle and SQL Server
  • "If you give us a test environment, we can work with you!"
  • Java desktop, fat client – rich user interface wasn’t available on the web when we started. Now want to port it over to the web at some point.
  • Can be standalone, single-machine, local network, WAN. Gets slower the farther you get from back-end db.
  • Open source

Problem:

Java is object-oriented; SQL databases are relational. "Paradigm mismatch."

Solution:

Use Hibernate as the persistence layer between the back-end db and Java.

  • Works against plain old Java objects (POJOs) – so it’s all Java based. Freed us up from the world of SQL. Also, if we want to go to an object-oriented db, then we just work on the bottom part.
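
To make the "paradigm mismatch" concrete, here is a minimal sketch in Python (not the Toolkit's actual Java/Hibernate code). It hand-rolls a tiny persistence layer so that the rest of the program only sees plain objects. The `Accession` class, its fields, and the `AccessionStore` helper are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Accession:
    """A plain object that knows nothing about SQL (hypothetical fields)."""
    title: str
    year: int
    id: Optional[int] = None


class AccessionStore:
    """Hand-rolled persistence layer: the only place that speaks SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS accession ("
            "id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER NOT NULL)"
        )

    def save(self, acc: Accession) -> Accession:
        # Object -> row: flatten the object's attributes into column values.
        cur = self.conn.execute(
            "INSERT INTO accession (title, year) VALUES (?, ?)",
            (acc.title, acc.year),
        )
        acc.id = cur.lastrowid
        return acc

    def get(self, acc_id: int) -> Optional[Accession]:
        # Row -> object: rebuild a plain object from the selected columns.
        row = self.conn.execute(
            "SELECT title, year, id FROM accession WHERE id = ?", (acc_id,)
        ).fetchone()
        return Accession(*row) if row else None


store = AccessionStore(sqlite3.connect(":memory:"))
saved = store.save(Accession("Family papers", 2008))
loaded = store.get(saved.id)
```

A tool like Hibernate generates this mapping from configuration instead of hand-written SQL, which is why swapping the back-end only touches "the bottom part."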

On top of Java layer:

  • JGoodies – group of open source tools that go between plain Java objects and the Swing-based Java front end. Binding, validation, forms, etc.
  • Design UI w/Swing. Binding all in place. Reflected back to plain Java objects automatically.
  • Saved so much time b/c we weren’t writing the persistence layer ourselves.

Reporting:

JasperReports – needed an open source Crystal Reports, as it were! Not as powerful, but quite good. Import and export very good. EAD can be difficult to work with, mainly because it is document-centric, whereas AT is data-centric. Use of JAXB (https://jaxb.dev.java.net/) as layer between schema and Java, so that AT can bring in EAD.

Exporting METS – use METS Toolkit. On top of JAXB.

Next Focus:

Connectivity in digital libraries / digital repository. How do you connect these things? Method of cataloging first, then doing scanning, then deposit into repository, then bringing PID back into toolkit. You can put URI anywhere, but you want a persistent identifier!

EAD as a transport mechanism within a content model. Store data as a content model (e.g., JAX) and move EAD back and forth. Didn’t want an EAD authoring tool … but have spent a lot of time mapping.

PID travels w/object, returns tab-delimited file w/data and use statement.

2500 downloads of second release. At least 30 people posting on website “willing to admit” that they are adopters!

Basic modules:

  • Names
  • Subjects
  • Accessions
  • Resources

Big idea: must do authority work!

Features include:

  • Listing
  • Searching
  • Can search by linked records

Records are really complicated – if there is a hit many nodes down, it is not so useful to see the record itself. Trying to change this.

Full-text searching means that we have to do specific coding for specific databases…

The notes area within AT has more EAD-intensive support. EAD is document-centric, but we aren’t publishing to paper as much. Need to have a data-centric model. Can change more easily to sending data out (e.g., harvesting). Pros and cons in both directions.

New feature: RDE, "rapid data entry" screens. Users don’t want to enter data in a lot of screens – instead, "rapid data entry" screens allow customizable layouts and one-screen entry.

New thoughts / movements:

  • When you get to item level, you pick how to describe it (e.g., Dublin Core, VRA core, etc.)
  • Hopes for integration with some DL repositories: DSpace, the CDL repository, Fedora
  • We need partners!!
  • Looking to beef up API so toolkit can be extended in local environment.

Greg Kallenberg (NYPL) – NYPL initiatives with Drupal

Contact: http://labs.nypl.org/author/gkallenberg/ | gjkallenberg@gmail.com

Shifting to Drupal/MySQL (http://drupal.org)

  • Motivating question: How can we enhance catalog, bring value and new technologies to users?
  • Shifting from ColdFusion, MSSQL, Static XHTML (both extranet and intranet)

Initial impressions on workload

Configuration: doing a lot more work involving IT to get the new site up and running. Drupal needs a bit of massaging to get things going.

Where is NYPL starting?

First change: internal blogs. Took best of the internal blogs and put on www.nypl.org/blog, a site driven by Drupal.

Biggest change is in content types and workflow management – not just static content anymore! Dynamic, audio, video, etc. Drupal helps to "democratize" the content production so that users can do their own coding, more CMS-based than currently done.

Massive amount of events going on (over 90 locations!) – can we use social networking component to do this?

Other issues:

  • Copyright and use of digital objects
  • Hardware shifting, shifting platforms.
  • Still working on back end.
  • Omeka exhibition platform
  • Ruby on Rails
  • Shifting ILSes, too!

The Power of Drupal

  • Drupal's greatest feature is its taxonomy power, which allows for grouping of disparate content into one “page.”
  • SOPAC2 – based on Drupal. Layer b/t ILS and Drupal.

Why move to a Drupal blog from WordPress MU?

  • Disparate content, plus power of taxonomy, can spit out content that is not blog-like.
  • LDAP staff authentication had issues with WordPress MU. Drupal works much, much better.


Mark Matienzo (NYPL) – NYPL Labs and Individual Projects

Contact: mark@matienzo.org | http://matienzo.org/

Some projects at NYPL and in free time. Mark is interested in collaboration.

WorldCat Python Module:

  • Search API
  • xID APIs
  • http://matienzo.org/project/worldcat

WorldCat Search API requires an API key. Can get one at no charge if you are part of an OCLC institution. Not sure how OCLC will do this for non-OCLC institutions. The xID services are open, but there are some conditions.


Zgw: Z39.50/www gateway http://matienzo.org/project/zgw

  • Uses web.py and Z39.50
  • Proof of concept code to be built out

Visualization of archival metadata

http://archivesz.com

  • Document-centered data model. Want focus on metadata, focus on structured data, to do more – integrate with RDF, multiple schemas, etc.
  • Test site built using MARC data and Simile framework from MIT (http://simile.mit.edu). Mapping location information from 852 field. Takes institution name, does geocoding, and then maps it. Simile is a framework to allow use of semantic web technologies in an easy way. Using JSON file, which includes location information. Web service linked into JS library.
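
The pipeline described above (852 field, then geocoding, then a JSON file for the map) can be sketched roughly as follows. This is an illustration, not the ArchivesZ code: records are simplified dicts rather than real MARC objects, `GEOCODE_TABLE` is a hypothetical stand-in for a geocoding service with illustrative coordinates, and the property names other than `items` and `label` are assumptions about how the map view might be configured.

```python
import json

# Hypothetical stand-in for a geocoding service: institution name -> (lat, lng).
# Coordinates are illustrative only.
GEOCODE_TABLE = {
    "Brooklyn Historical Society": (40.6949, -73.9925),
    "New York Public Library": (40.7532, -73.9822),
}


def location_from_852(record):
    """Return subfield $a of the first 852 field that has one, else None.

    `record` is a simplified dict: {tag: [{subfield_code: value}, ...]}.
    """
    for field in record.get("852", []):
        if "a" in field:
            return field["a"]
    return None


def to_exhibit_items(records):
    """Build a dict with a top-level "items" list, one entry per mappable record."""
    items = []
    for rec in records:
        name = location_from_852(rec)
        coords = GEOCODE_TABLE.get(name)
        if name is None or coords is None:
            continue  # skip records we cannot place on the map
        title_fields = rec.get("245") or [{}]
        items.append({
            "label": title_fields[0].get("a", "Untitled"),
            "type": "Collection",
            "institution": name,
            "latlng": "%s,%s" % coords,
        })
    return {"items": items}


records = [
    {"245": [{"a": "Smith family papers"}],
     "852": [{"a": "Brooklyn Historical Society"}]},
    {"245": [{"a": "Unplaced records"}],
     "852": [{"a": "Unknown Archive"}]},
]
exhibit_json = json.dumps(to_exhibit_items(records), indent=2)
```

Records whose institution cannot be geocoded are dropped here; a real site would more likely queue them for manual review.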

ArchivesZ allows importing of EAD finding aids to give visualization. Ruby on Rails, parsing data into MySQL. Rails, in particular, has a great date parser. Archivists' Toolkit making things a bit easier as well.

Brooklyn Archival Resources Catalog

Designed as a centralized, informal project, to bring in metadata from many archives. Began by looking at historical inventory of documents done by NYS library. WorldCat’s location info is totally wrong, however! Records, right now, are pulling from NYS library via Z39.50. Motivation is also to update these records. NYS Lib catalog and WorldCat info has metadata, but not displayed well. Some relevant info is not there! People still requesting info via RLIN number.

No parsing of XML data for OCLC modules right now. MARCXML is wrapped in SRU XML. pymarc doesn’t do namespace evaluation, while MARCXML and SRU XML both have a record tag … a lot of heavy lifting needs to go on!
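
One possible way around the clashing record tags, sketched with Python's standard library rather than pymarc: qualify every element with its namespace so the SRU wrapper `record` and the MARCXML `record` are never confused. The sample response is hand-made for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for SRU responses and MARCXML.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "marc": "http://www.loc.gov/MARC21/slim",
}

# Hand-made sample: one MARCXML record wrapped in an SRU response.
SAMPLE = """<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <numberOfRecords>1</numberOfRecords>
  <records>
    <record>
      <recordData>
        <record xmlns="http://www.loc.gov/MARC21/slim">
          <datafield tag="245" ind1="1" ind2="0">
            <subfield code="a">Smith family papers</subfield>
          </datafield>
          <datafield tag="852" ind1=" " ind2=" ">
            <subfield code="a">New York State Library</subfield>
          </datafield>
        </record>
      </recordData>
    </record>
  </records>
</searchRetrieveResponse>"""


def marc_records(sru_xml):
    """Return the MARCXML <record> elements inside an SRU response."""
    root = ET.fromstring(sru_xml)
    # srw:record is the SRU wrapper; marc:record is the payload we want.
    return root.findall(
        "./srw:records/srw:record/srw:recordData/marc:record", NS
    )


def subfield(record, tag, code):
    """Return the text of the first matching subfield, or None."""
    path = "marc:datafield[@tag='%s']/marc:subfield[@code='%s']" % (tag, code)
    el = record.find(path, NS)
    return el.text if el is not None else None


recs = marc_records(SAMPLE)
title = subfield(recs[0], "245", "a")
```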

Sara Marcus (GSLIS, Queens College) – Use of Wikis in LIS Education

Contact: sara.marcus@qc.cuny.edu | http://gslis703.pbwiki.com

Purpose / exploration

Using wikis in a pedagogical way to instruct, share, and bring student involvement to GSLIS course information.

Benefits

  • Easy to use, tailored navigation, logistically easier to set up (by the time students get their passwords, other things might be down)
  • Students can edit wiki, tracking edits and attributing information / history trail.
  • Can upload audio, video, etc.
  • No barriers to information, no access issues from home, etc.
  • Stable area to look for things as well.
  • Can restrict access, can view but not edit, etc.
  • PBWiki is fun, useful and easy thus far. Very few barriers to edit.

Drawbacks

  • Students’ names aren’t there / can’t see user list in compliance with FERPA.

What do the students think?

Students find it useful – they have a say, they know they will be using this in the future. This year, Sara is letting them edit and post assignments, correct each other, etc.

How do you recreate the starting state?

  • PBWiki allows you to copy and recreate wiki, delete what isn't desired.
  • Keep desktop copies as needed.
  • Reupload as unit, hide unit.
  • Prior students have access, but no editing abilities.

Emily Molanphy (NYU Health Sciences Library) – Tables of Contents (TOC) in RSS

Contact: molane01@nyumc.org

Question:

NYU Health Sciences currently working through an issue on how to implement something. Other ideas? Best way to do this? Anyone else working on this?

Project Goal:

Want to offer TOC via RSS. Especially in health sciences, this is really popular.

Journals offer TOCs in RSS. Issues: not a push service, readers on the web or w/in browser, some info overload.

How to do this?

  • Add feed URLs to the electronic resources management db. Just a MySQL db.
  • Create bundles that users subscribe to (Ebling’s choice at UWisc). Pre-group by subject.
  • Incorporate feeds into Drupal

Option 1:

  • Get file of feed URLs from Ebling, keyed to ISSN, and upload
  • Feed URLs available in search results and detail pages
  • Pros: Easy to implement, fine control
  • Cons: Not highly visible and not a lot of added value (users could do this on their own)

Option 2:

  • Can subscribe to “top journals” or all journals. User downloads an OPML file and then uploads it into an RSS reader.
  • Pros: lots of flexibility
  • Cons: Users are stuck w/groupings, and could be overwhelmed. NYU needs to create and maintain process to bundle feeds into OPML. Need to subscribe to see results, so we’re sending users to Google instead of the library!
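
As a rough idea of what the "process to bundle feeds into OPML" could look like, here is a minimal sketch using Python's standard library. The `build_opml` function is hypothetical, and the journal titles and feed URLs are placeholders, not real feeds.

```python
import xml.etree.ElementTree as ET


def build_opml(bundle_title, feeds):
    """Return OPML text for one bundle.

    feeds: iterable of (journal_title, feed_url) pairs.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = bundle_title
    body = ET.SubElement(opml, "body")
    # One parent outline per bundle, so readers import it as a folder.
    group = ET.SubElement(body, "outline", text=bundle_title)
    for title, url in feeds:
        ET.SubElement(group, "outline", type="rss", text=title, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")


# Placeholder data, for illustration only.
top_journals = [
    ("Example Journal of Medicine", "http://example.org/ejm/current.rss"),
    ("Example Clinical Review", "http://example.org/ecr/current.rss"),
]
opml_text = build_opml("Top journals", top_journals)
```

In practice the (title, URL) pairs would come from the ERM database, so regenerating a bundle is a query plus this function.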

Option 3:

  • Drupal has an aggregator module. Can bundle all feeds via category. Display page w/all items, or can subscribe. Can do multiple “slices” – each “slice” creates new, aggregated feed, with clickable icon.
  • Pros: library branding, very flexible;
  • Cons: feeds stored in Drupal, not ERD, users stuck w/groupings, too much content?

Where to start?

  • Hyper-specialists
  • Split difference w/option 3 and option 1. Do three general feeds, “top 5 journals,” clinical, and top research. Then offer indiv feeds via ERM and teach them how to use it.

Challenge: Make it work through remote access.

Other possibilities? What about email? People aren’t using feed readers… No – volume of content is too high, and NYU is going to Exchange – users will get very small mailboxes.

SCOPUS also has an alert to RSS feed.

PubMed scheduled search mechanism; Mount Sinai Medical Library and other libraries have worked with users on this. Do specific search, brings back hits, then turn that into an email with results hits. Schedule to run at a certain interval. Help users then to subscribe to these search alerts, get a good search query going, etc.

Ranking from RSS feeds – keyword, how many Diggs, etc., techniques that will bring things to the top. There is a Django model somewhere… for specialists, this will work really well. For generalists, it might not be as effective.
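
A minimal sketch of the keyword side of this idea (not the Django code mentioned above): score each feed item by how often a user's keywords appear, and float the best matches to the top. Items are plain dicts with hypothetical `title` and `summary` keys, and the double weight for title hits is an arbitrary choice.

```python
import re


def score(item, keywords):
    """Count keyword occurrences; a title hit counts double (arbitrary weight)."""
    wanted = {k.lower() for k in keywords}
    title_words = re.findall(r"\w+", item["title"].lower())
    summary_words = re.findall(r"\w+", item.get("summary", "").lower())
    title_hits = sum(1 for w in title_words if w in wanted)
    summary_hits = sum(1 for w in summary_words if w in wanted)
    return 2 * title_hits + summary_hits


def rank(items, keywords):
    """Return items with at least one hit, best score first."""
    scored = [(score(it, keywords), it) for it in items]
    # sorted() is stable, so items with equal scores keep their feed order.
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [it for s, it in ordered if s > 0]


# Made-up items, for illustration only.
items = [
    {"title": "Hospital staffing trends", "summary": "A survey of nursing levels."},
    {"title": "Stroke outcomes after thrombolysis", "summary": "Stroke registry data."},
    {"title": "Imaging in acute stroke", "summary": "CT versus MRI."},
]
ranked = rank(items, ["stroke", "thrombolysis"])
```

This fits the specialist case noted above: a narrow keyword list gives sharp rankings, while a generalist's broad list would match nearly everything.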

Web aggregator module from Drupal will be the technique here. Allows for serendipity approach as well.

A good example from Code4Lib – Planet Code4Lib, the Code4Lib blog aggregator (http://planet.code4lib.org).

Some notes from the conveners ...

  • Email us if you have suggestions for improvement!
  • Let us know if there's anyone you would like to present to the group.