Wednesday, February 6, 2008

Institutional Repository Admin module complete

We're testing it out right now (read: bug hunting), but functionally it is complete.

If anyone would like to play around in it before the Feb 22 meeting, just drop me (Richard) a line. Just be aware that I will be wiping the database clean at that time.

Take care.

Monday, November 26, 2007

IR People

The ability to add, update, and delete people (Creators and Contributors) is functionally complete. Note that this does not include ETD advisor(s).

Tuesday, October 9, 2007

IR Update

It's been a while since I've posted, so I thought I would give a quick update.
  • Much of the generic interface work on the admin side is done: navigation, look and feel, etc.
  • I've created usernames and passwords for each participating school. Please contact me if you're interested in obtaining yours now.
  • We've gotten SQL Server 2005 installed on the machine. The client tools should be installed on my computer by the end of the week, so I should be starting the nitty-gritty database work sometime next week.
  • I'll be doing the adding, editing, and removal of creators/contributors first: basically, the author/creator profile pages. (A rough sketch of what that might involve is below.)
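Purely for illustration, here is a minimal sketch of that creator/contributor add/edit/remove work against a SQL Server 2005 database, written in Python with the pyodbc library. The connection string, table name, and column names are all hypothetical placeholders, not the real design:

import pyodbc  # third-party library for talking to SQL Server via ODBC

# Placeholder connection string; the real server, database, and credentials differ.
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=ir-server;DATABASE=ir;UID=username;PWD=password"
)
cur = conn.cursor()

# Hypothetical Creators table: one row per creator/contributor profile.
cur.execute(
    "INSERT INTO Creators (LastName, FirstName, Department) VALUES (?, ?, ?)",
    ("Scholar", "Jane", "University Libraries"),
)
cur.execute("UPDATE Creators SET Department = ? WHERE CreatorID = ?", ("History", 1))
cur.execute("DELETE FROM Creators WHERE CreatorID = ?", (1,))
conn.commit()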

Friday, September 14, 2007

IR Login page

Here's a screenshot of the login page for the nameless institutional repository:
More sneak peeks as I get to them.

Friday, September 7, 2007

ETD Standards

Cat McDowell (who knows ever so much more than I do about metadata standards) says:

Additional fields for ETDs have already been standardized and put into the DC template by NDLTD. The thesis.degree element has four qualifiers: name, level, discipline, and grantor. Anyway, the page below is awesome: it explains this additional element, how to use DC for ETDs (for dummies), has an XML example, and maps all of these to numerical MARC fields...

http://www.ndltd.org/standards/metadata/current.html#thesis.degree

That would probably be the best way to handle the extra ETD fields.
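For a concrete (though unofficial) picture of those extra fields, here is a small Python sketch that writes a thesis.degree block next to a Dublin Core title. The namespace URIs and sample values are illustrative and should be checked against the NDLTD page above:

import xml.etree.ElementTree as ET

# Namespace URIs shown for illustration; confirm them against the NDLTD documentation.
DC = "http://purl.org/dc/elements/1.1/"
THESIS = "http://www.ndltd.org/standards/metadata/etdms/1.0/"
ET.register_namespace("dc", DC)
ET.register_namespace("thesis", THESIS)

record = ET.Element("record")
ET.SubElement(record, f"{{{DC}}}title").text = "An Example Thesis Title"

# The four thesis.degree qualifiers: name, level, discipline, grantor.
degree = ET.SubElement(record, f"{{{THESIS}}}degree")
ET.SubElement(degree, f"{{{THESIS}}}name").text = "M.A."
ET.SubElement(degree, f"{{{THESIS}}}level").text = "Master's"
ET.SubElement(degree, f"{{{THESIS}}}discipline").text = "History"
ET.SubElement(degree, f"{{{THESIS}}}grantor").text = "University of North Carolina at Greensboro"

print(ET.tostring(record, encoding="unicode"))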

Mission Statement Draft

Joseph suggested this adaptation of ECU's IR mission statement:


The shared digital archive is an organization of people and systems dedicated to capturing, preserving and making more widely available to the international scholarly community the intellectual output of each university’s faculty, staff, and students. This archive serves as an indicator of participating universities' quality and impact, and demonstrates the scientific, social, and economic relevance of our research activities. It also serves as an archive of historical and other materials that broadly support the academic missions of each institution.


I think it's a great start and should work fine with perhaps a few minor edits. Given our discussion of what kind of content we'll accept, I do have a few questions:


  1. Do we want to use the word "archive"? We are promising permanence, so it is archival in that sense, but this is really an access project rather than an archival-quality image/document project.
  2. Given our policy on student work, maybe we should end the first sentence by saying "...of each university." and not mention students one way or the other.
  3. I'm not sure what the last sentence is for. Do we need it?

Meeting minutes, etc

UNC Pilot Group IR Meeting - Aug 28, 2007, and subsequent decisions
Present at meeting: Bucknall, Cox, McDowell, Wolf, Scherlin, Thomas, Riggins

CONTENT AND GOALS

The group agreed that the goal of the shared IR is to make UNC-System scholarship freely available to a global audience. Therefore, the content for the IR will be limited to materials that meet the following criteria:

  • freely available to a worldwide audience (no embargoed materials or materials available only on campus)
  • scholarly in nature
  • an intellectual product of that institution's faculty (no student work, with the exception of ETDs or equivalent)
  • completed work (no in-process works, or datasets)
In addition to these criteria, the following technical constraints may limit what is posted to the shared IR:
  • we do not own a streaming media server
  • there are theoretical limits on the amount of available storage on the servers at UNCG, but it seems highly unlikely that that will prove a limiting factor
The group agreed that we could make changes in any of these criteria at some future point, if needed.


Name of project/site yet to be determined.


INTERFACES


UNCG will create an ETD interface and a single-institution IR search interface for each school. There will also be global ETD and IR search interfaces. The ETDs will be included in IR searches, but can also be searched separately. The site will also be included in Google and OAI registries. The separate ETD interface will have the following ETD-specific fields (a rough sketch follows the list):
  • department (just use the dept name at the time it was submitted and do NOT try to do authority work on changing dept names!)
  • advisor
  • a yet-to-be-named field that would be a pull-down with Thesis/Diss/Honors Paper
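Just to make that field list concrete, here is a hypothetical sketch of the ETD-specific fields; the class names, field names, and spelled-out pull-down values are placeholders, not the actual schema:

from dataclasses import dataclass
from enum import Enum

class DocumentType(Enum):
    """Values for the yet-to-be-named pull-down field."""
    THESIS = "Thesis"
    DISSERTATION = "Dissertation"
    HONORS_PAPER = "Honors Paper"

@dataclass
class ETDFields:
    """ETD-specific fields layered on top of the common IR metadata."""
    department: str          # dept name as submitted; no authority control on renames
    advisor: str
    document_type: DocumentType

# Example record (made-up values):
example = ETDFields(department="Library and Information Studies",
                    advisor="Jane Scholar",
                    document_type=DocumentType.THESIS)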


Which fields will be searchable?
Single school ETD interface – we can search any metadata fields we want to search
Single school IR interface - we can search any metadata fields we want to search
Multischool ETD search - we can search any metadata fields we want to search but there will be some issues that we’ll need to work out later (e.g. different schools having different names for the same functional dept)
Multischool IR search – same as the multischool ETD
Full Text search – will use Google to index full text, so whatever limits and conditions apply to Google full-text indexing will also apply to our site. It will not be possible to combine metadata and full text searches.
OAI search – We will publish DC data as per the standard.
Google search – will index our full text and our metadata


STANDARDS


1. We will use NCDC.
2. We will use the hCard microformat for author profiles. Here are the fields from the hCard microformat (http://microformats.org/wiki/hcard) we’re planning on using (see the sketch after this list): Required:
fn (family-name, given-name, additional-name, honorific-prefix, honorific-suffix)
Optional:
url, email, tel
address (adr)
photo
title, org (organization-name, organization-unit) where org name = institution and org unit = department
note (could be used for a para of info about the contributor)
rev
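As a rough illustration of those hCard properties, here is a small Python sketch that builds an author-profile fragment. The function, markup layout, and sample values are all made up; only the class names come from the hCard spec:

def hcard(given, family, institution, department, url=None, email=None):
    """Return an hCard HTML fragment for an author profile page."""
    parts = [
        '<div class="vcard">',
        '  <span class="fn n">',
        f'    <span class="given-name">{given}</span>',
        f'    <span class="family-name">{family}</span>',
        '  </span>',
        '  <div class="org">',
        f'    <span class="organization-name">{institution}</span>,',
        f'    <span class="organization-unit">{department}</span>',
        '  </div>',
    ]
    if url:
        parts.append(f'  <a class="url" href="{url}">{url}</a>')
    if email:
        parts.append(f'  <a class="email" href="mailto:{email}">{email}</a>')
    parts.append('</div>')
    return "\n".join(parts)

print(hcard("Jane", "Scholar", "UNCG", "University Libraries"))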


OTHER ISSUES


1. ADDING NEW SCHEMA
From a technical point of view, it is easy to add new schema. What we are planning to do in the admin is to record which schema you're using (which you'll select from a pulldown) and then you'll enter the actual subject(s) in another field. So to add a new schema, all we have to do is add an item to the pulldown. It seems to me the issues with adding new schema have more to do with cross-searchability (what if one school uses a certain schema and the others don't?) and with retrospectively indexing to the new schema once it is added. And there are probably other issues, too, but I don't see it as a technical problem, and I think we should certainly be open to adding new schemas as needed. By the way, the reason it is easy to add new schema is that we aren't planning to import the actual controlled vocabularies themselves into the IR. You'd find the LCSH (or whatever) elsewhere and then enter it into the IR manually, through cut and paste, or possibly through some batch data load.
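Here is a purely illustrative sketch of that scheme-plus-term approach. The real IR runs on SQL Server; Python's sqlite3 module is used only to keep the example self-contained, and the table and column names are hypothetical:

import sqlite3

db = sqlite3.connect(":memory:")

# One row per supported scheme; these rows feed the admin pulldown.
db.execute("""
    CREATE TABLE subject_scheme (
        scheme_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    )""")

# Each subject entry records which scheme it came from plus the term itself,
# entered (or pasted) by the cataloger rather than pulled from an imported vocabulary.
db.execute("""
    CREATE TABLE item_subject (
        item_id   INTEGER NOT NULL,
        scheme_id INTEGER NOT NULL REFERENCES subject_scheme(scheme_id),
        term      TEXT NOT NULL
    )""")

# Adding a new schema later is just one more row behind the pulldown.
db.execute("INSERT INTO subject_scheme (name) VALUES ('LCSH')")
db.execute("INSERT INTO subject_scheme (name) VALUES ('MeSH')")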

2. DOES A COPY HAVE TO GO IN THE IR?
Technically, no. You could enter the metadata only and not upload the document. I believe we decided at our meeting that we would not use that as a technique to include things that did not meet our criteria (e.g. embargoed materials). But we didn't specifically talk about doing that for things that do meet our criteria. If we did that, I assume we'd want a link from the metadata to the item in your DSpace. But in that scenario, your item would be found only by people searching the IR metadata and not by people doing the full text IR search. I guess it raises the policy question, "Does the group want to require that all contributions be of both the item and metadata, or is it OK to have the metadata only?" My inclination would be that it is OK to have the metadata only, but only if it links to full text - which is exactly what you are asking about. So, I would lean towards "yes, that's OK," but I don't know how others in the group might feel about it.

3. MARC BATCH LOADING

It would take a while for ERIT to build a MARC import program, and the resulting data might or might not be clean enough to load into the IR without double-checking everything. And if folks are going to double-check, perhaps it is almost as easy to just cut and paste the pertinent fields from the catalog into the IR. Or perhaps not. It is a question of how much time we save per record, and how many records we expect to import. If the overall time savings is significant, then ERIT will write a program to load MARC records into the IR.
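To give a sense of what such a loader involves, here is a bare-bones Python sketch using the third-party pymarc library. The file name, field choices, and output are illustrative only, and the extracted data would still need the double-checking discussed above:

from pymarc import MARCReader  # third-party library: pip install pymarc

def first_subfield(record, tag, code):
    """Return the first matching subfield value from a MARC record, or None."""
    for field in record.get_fields(tag):
        values = field.get_subfields(code)
        if values:
            return values[0]
    return None

with open("etd_records.mrc", "rb") as fh:  # placeholder file name
    for record in MARCReader(fh):
        row = {
            "title":  first_subfield(record, "245", "a"),
            "author": first_subfield(record, "100", "a"),
            "date":   first_subfield(record, "260", "c"),
        }
        print(row)  # in practice, insert into the IR database instead of printing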

4. STATISTICS

We aren’t sure exactly which stats people will want, but they will probably fall into two types:

a. number of items that exist (by date, by author, by format, by dept, etc.)

b. number of uses (on campus, of each item, of a single author’s item, of a dept’s uploads, etc.)

UNCG envisions a very robust statistical tool that will allow you to export stats easily into Excel for further manipulation. We are willing to build additional statistical reports as the project progresses.
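As a very rough illustration of a type (a) count and the Excel export, here is a short Python sketch; the data, file name, and structure are made up, not the planned reporting tool:

import csv
from collections import Counter

# Hypothetical item list: (item_id, department, format, year)
items = [
    (1, "History", "PDF", 2007),
    (2, "History", "PDF", 2007),
    (3, "Biology", "PDF", 2006),
]

# (a) number of items that exist, by department
by_dept = Counter(dept for _, dept, _, _ in items)

# Write a CSV file that opens directly in Excel for further manipulation.
with open("items_by_department.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["Department", "Items"])
    for dept, count in sorted(by_dept.items()):
        writer.writerow([dept, count])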

Thursday, September 6, 2007

Institutional Repository Progress

Technical Specifications
The first draft of the IR input fields is online at http://library.uncg.edu/ir/specs.asp. Please keep in mind that this is without my having coded a line yet, so it is subject to some change.

Server
The power issues in the server room have been addressed, and the machine that will be hosting the IR is plugged in and being set up.