More sneak peeks as I get to them.
Friday, September 14, 2007
IR Login page
More sneak peeks as I get to them.
Friday, September 7, 2007
ETD Standards
Additional fields for ETDs have already been standardized and put into the DC template by NDLTD. The thesis.degree element has four qualifiers: name, level, discipline, and grantor. Anyway, the page below is awesome: it explains this additional element, shows how to use DC for ETDs (for dummies), has an XML example, and maps all of these to numeric MARC fields.
http://www.ndltd.org/standards/metadata/current.html#thesis.degree
That would probably be the best way to handle the extra ETD fields.
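As a rough sketch of how those four qualifiers might sit alongside ordinary DC fields in a record, here is a small Python example. The flat record layout, the sample values, and the `missing_degree_fields` helper are assumptions for illustration only; they are not the NDLTD schema itself or anything we have built.

```python
# Hypothetical sketch: an ETD record as a flat dict of dotted field names.
# Only the four thesis.degree qualifiers come from the NDLTD page; the
# sample values and the helper below are made up for illustration.

DEGREE_QUALIFIERS = ("name", "level", "discipline", "grantor")


def missing_degree_fields(record):
    """Return the thesis.degree.* field names that are absent or blank."""
    missing = []
    for qualifier in DEGREE_QUALIFIERS:
        key = "thesis.degree." + qualifier
        if not str(record.get(key, "")).strip():
            missing.append(key)
    return missing


example_record = {
    "title": "An Example Thesis Title",
    "creator": "Doe, Jane",
    "thesis.degree.name": "Master of Arts",
    "thesis.degree.level": "Masters",
    "thesis.degree.discipline": "History",
    # thesis.degree.grantor is deliberately left out
}

print(missing_degree_fields(example_record))
```

A check like this could run when a record is entered, so that an ETD is not saved without all four degree fields.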
Mission Statement Draft
The shared digital archive is an organization of people and systems dedicated to capturing, preserving and making more widely available to the international scholarly community the intellectual output of each university’s faculty, staff, and students. This archive serves as an indicator of participating universities' quality and impact, and demonstrates the scientific, social, and economic relevance of our research activities. It also serves as an archive of historical and other materials that broadly support the academic missions of each institution.
I think it's a great start and should work fine with perhaps a few minor edits. Given our discussion of what kind of content we'll accept, I do have a few questions:
- do we want to use the word "archive"? We are promising permanence, so it is archival in that sense, but this is really an access project rather than an archival quality image/document project.
- given our policy on student work, maybe we should end the first sentence by saying "...of each university." and not mentioning students one way or the other.
- not sure of what the last sentence is for. Do we need it?
Meeting minutes, etc
CONTENT AND GOALS
The group agreed that the goal of the shared IR is to make UNC-System scholarship freely available to a global audience. Therefore, the content for the IR will be limited to materials that meet the following criteria:
- freely available to a worldwide audience (no embargoed materials or materials available only on campus)
- scholarly in nature
- an intellectual product of that institution's faculty (no student work, with the exception of ETDs or equivalent)
- completed work (no in-process works, or datasets)
- we do not own a streaming media server
- there are theoretical limits on the amount of available storage on the servers at UNCG, but it seems highly unlikely that that will prove a limiting factor
Name of project/site yet to be determined.
INTERFACES
UNCG will create an ETD interface and a single-institution IR search interface for each school. There will also be global ETD and IR search interfaces. The ETDs will be included in IR searches, but can also be searched separately. The site will also be included in Google and OAI registries. The separate ETD interface will have ETD-specific fields:
- department (just use the dept name at the time it was submitted and do NOT try to do authority work on changing dept names!)
- advisor
- a field we're not sure what to call yet, but it would be a pulldown with Thesis/Diss/Honors Paper
Which fields will be searchable?
Single school ETD interface – we can search any metadata fields we want to search
Single school IR interface - we can search any metadata fields we want to search
Multischool ETD search - we can search any metadata fields we want to search but there will be some issues that we’ll need to work out later (e.g. different schools having different names for the same functional dept)
Multischool IR search – same as the multischool ETD
Full Text search – will use Google to index full text, so whatever limits and conditions apply to Google full text indexing will also apply to our site. It will not be possible to combine metadata and full text searches.
OAI search – We will publish DC data as per the standard.
Google search – will index our full text and our metadata
STANDARDS
1. We will use NCDC.
2. We will use the hCard microformat for author profiles. Here are the fields from the hCard microformat (http://microformats.org/wiki/hcard) we’re planning on using:
Required:
fn, n (family-name, given-name, additional-name, honorific-prefix, honorific-suffix)
Optional:
url, email, tel
address (adr)
photo
title, org (organization-name, organization-unit) where org name = institution and org unit = department
note (could be used for a paragraph of info about the contributor)
rev
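To make the field list a bit more concrete, here is a minimal sketch of how an author profile might be rendered as hCard markup. The class names (`vcard`, `fn`, `n`, `given-name`, `family-name`, `org`, `organization-name`, `organization-unit`, `url`, `email`) are from the hCard microformat; the `render_hcard` function, its parameters, and the sample person are hypothetical and only cover a few of the fields listed above.

```python
from html import escape


def render_hcard(given_name, family_name, institution=None,
                 department=None, url=None, email=None):
    """Return an hCard HTML fragment; fields passed as None are skipped."""
    lines = ['<div class="vcard">']
    # "fn" is the formatted name; "n" wraps the structured name parts.
    lines.append(
        '  <span class="fn n">'
        f'<span class="given-name">{escape(given_name)}</span> '
        f'<span class="family-name">{escape(family_name)}</span>'
        '</span>'
    )
    if institution:
        # org name = institution, org unit = department
        org = f'<span class="organization-name">{escape(institution)}</span>'
        if department:
            org += (', <span class="organization-unit">'
                    f'{escape(department)}</span>')
        lines.append(f'  <div class="org">{org}</div>')
    if url:
        lines.append(f'  <a class="url" href="{escape(url)}">{escape(url)}</a>')
    if email:
        lines.append(
            f'  <a class="email" href="mailto:{escape(email)}">'
            f'{escape(email)}</a>'
        )
    lines.append('</div>')
    return "\n".join(lines)


# Made-up sample contributor, for illustration only.
print(render_hcard("Jane", "Doe", institution="Example University",
                   department="Department of History",
                   email="jdoe@example.edu"))
```

Escaping every value before it goes into the markup keeps odd characters in names or departments from breaking the profile page.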
OTHER ISSUES
1. ADDING NEW SCHEMA
From a technical point of view, it is easy to add a new schema. What we are planning to do in the admin is to record which schema you're using (which you'll select from a pulldown), and then you'll enter the actual subject(s) in another field. So to add a new schema, all we have to do is add an item to the pulldown. It seems to me the issues with adding new schemas have more to do with cross-searchability (what if one school uses a certain schema and the others don't?) and with retrospectively indexing to the new schema once it is added. There are probably other issues, too, but I don't see it as a technical problem, and I think we should certainly be open to adding new schemas as needed. BTW, the reason it is easy to add a new schema is that we aren't planning to import the actual controlled vocabularies themselves into the IR. You'd find the LCSH (or whatever) elsewhere and then enter it into the IR manually, through cut and paste, or possibly through some batch data load.
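A minimal sketch of the pulldown-plus-term idea, assuming a simple in-memory list of schema names. The names `SUBJECT_SCHEMAS`, `add_schema`, and `make_subject` are hypothetical, and apart from LCSH the schema names are placeholders rather than a decided list.

```python
# Hypothetical sketch of "schema pulldown plus free-text subject term".
# The entries below are placeholders, not a decided list of schemas.

SUBJECT_SCHEMAS = ["LCSH", "Local keywords"]


def add_schema(name):
    """Adding a schema is just adding one more pulldown option."""
    if name not in SUBJECT_SCHEMAS:
        SUBJECT_SCHEMAS.append(name)


def make_subject(schema, term):
    """Pair a subject term with the schema it was taken from.

    The term itself is not validated, because no controlled vocabulary
    is loaded into the IR; only the schema name is checked.
    """
    if schema not in SUBJECT_SCHEMAS:
        raise ValueError(f"Unknown schema: {schema}")
    return {"schema": schema, "term": term.strip()}


add_schema("MeSH")  # placeholder example of a newly added schema
print(make_subject("MeSH", "Libraries, Digital"))
```

Because each subject carries its schema name, a cross-school search could at least tell which vocabulary a term came from, even if the schools use different ones.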
2. DOES A COPY HAVE TO GO IN THE IR?
Technically, no. You could enter the metadata only and not upload the document. I believe we decided at our meeting that we would not use that as a technique to include things that did not meet our criteria (e.g. embargoed materials). But we didn't specifically talk about doing that for things that do meet our criteria. If we did that, I assume we'd want a link from the metadata to the item in your DSpace. But in that scenario, your item would be found only by people searching the IR metadata and not by people doing the full text IR search. I guess it raises the policy question, "Does the group want to require that all contributions be of both the item and metadata, or is it OK to have the metadata only?" My inclination would be that it is OK to have the metadata only, but only if it links to full text, which is exactly what you are asking about. So I would lean towards "yes, that's OK," but I don't know how others in the group might feel about it.
3. MARC BATCH LOADING
It would take a while for ERIT to build a MARC import program, and the resulting data might or might not be clean enough to load into the IR without double-checking everything. And if folks are going to double check, perhaps it is almost as easy to just cut and paste the pertinent fields from the catalog into the IR. Or perhaps not. It is a question of how much time we save per record, and how many records we expect to import. If the overall time savings is significant, then ERIT will write a program to load MARC records into the IR.
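The trade-off above comes down to simple arithmetic, sketched here. The function name, its parameters, and every number in the example call are made up for illustration; nobody has measured these yet.

```python
def net_hours_saved(num_records, manual_minutes_per_record,
                    checked_minutes_per_record, build_hours):
    """Hours saved by a batch import after paying for building it.

    manual_minutes_per_record:  time to cut and paste one record by hand
    checked_minutes_per_record: time to double-check one imported record
    build_hours:                one-time effort to write the import program
    A negative result means the import program costs more than it saves.
    """
    saved_per_record = manual_minutes_per_record - checked_minutes_per_record
    total_saved_hours = num_records * saved_per_record / 60
    return total_saved_hours - build_hours


# Made-up numbers: 1000 records, 5 min by hand vs 2 min to check, 40 h to build.
print(net_hours_saved(1000, 5, 2, 40))  # 10.0
```

With only a few hundred records, or with imported data that takes nearly as long to check as to paste, the result goes negative and cut and paste wins.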
4. STATISTICS
We aren’t sure exactly which stats people will want, but they will probably fall into two types:
a. number of items that exist (by date, by author, by format, by dept, etc)
b. number of uses (on campus, of each item, of a single author’s items, of a dept’s uploads, etc)
UNCG envisions a very robust statistical tool that will allow you to export stats easily into Excel for further manipulation. We are willing to build additional statistical reports as the project progresses.
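As a small illustration of the first type of stat (counts of items by some field) and of an Excel-friendly export, here is a hedged sketch. The item layout, the field names, and the `count_by` and `counts_to_csv` helpers are assumptions, not the tool UNCG will build.

```python
import csv
import io
from collections import Counter


def count_by(items, field):
    """Count items by the value of one metadata field."""
    return Counter(item.get(field, "(none)") for item in items)


def counts_to_csv(counts, label):
    """Render counts as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([label, "items"])
    for value, total in sorted(counts.items()):
        writer.writerow([value, total])
    return buffer.getvalue()


# Made-up sample items, for illustration only.
sample_items = [
    {"dept": "History", "format": "PDF"},
    {"dept": "History", "format": "PDF"},
    {"dept": "Biology", "format": "PDF"},
]
print(counts_to_csv(count_by(sample_items, "dept"), "dept"))
```

The same two helpers would work for counts by author, by format, or by date, just by changing the field name.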
Thursday, September 6, 2007
Institutional Repository Progress
The first draft of the IR input fields is online at http://library.uncg.edu/ir/specs.asp. Please keep in mind that this is without my having coded a line yet, so it is subject to some change.
Server
The power issues in the server room have been addressed, and the machine that will be hosting the IR is plugged in and being set up.