[EIFLoa] How Do We Measure the Effectiveness of Institutional Repositories?
Iryna Kuchma
iryna.kuchma at eifl.net
Thu Feb 24 13:37:31 EET 2011
*How Do We Measure the Effectiveness of Institutional Repositories?*
Author: Brian Kelly, UK Web Focus at UKOLN <http://www.ukoln.ac.uk/>, a
national centre of expertise in digital information management based at the
University of Bath, UK
http://ukwebfocus.wordpress.com/2011/02/24/how-do-we-measure-the-effectiveness-of-institutional-repositories/
The Need for Metrics
How might one measure the effectiveness of an institutional repository? An
approach arising from various activities I am involved in, related to
evidence, value and impact, is to identify the underlying purpose(s) of
services and to gather evidence of how well those purposes are being
addressed.
Therefore there is a need to initially identify the purposes of an
institutional repository. Institutions may have a variety of different
purposes (which is why, although gathering evidence can be important,
drawing up league tables is often inappropriate). But let’s suggest that
two key purposes may be: (1) maximising access to research publications and
(2) ensuring long-term preservation of research publications. What measures
may be appropriate for ensuring such purposes are being achieved?
For maximising access to research publications two important measures will
be the number of items in the repository and the number of accesses to those
items. Since the numbers themselves will have little meaning in isolation
there will be a need to measure trends over time, with an expectation of
growth in the number of items deposited (which should slow down once legacy
items have been uploaded and only new items are being deposited) and a
continual increase in the overall traffic to the repository as the number of
items grows and access to the items via various resource discovery services
provides easier ways of finding such resources.
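As a minimal sketch of what measuring such a trend could look like, the following Python groups deposit dates by month and keeps a running total. The function name and the sample dates are hypothetical and are not taken from any repository system; real data would come from whatever export the repository software offers.

```python
from collections import Counter
from datetime import date


def monthly_growth(deposit_dates):
    """Return (month, new_items, cumulative_total) tuples in date order."""
    per_month = Counter(d.strftime("%Y-%m") for d in deposit_dates)
    total = 0
    rows = []
    for month in sorted(per_month):
        total += per_month[month]
        rows.append((month, per_month[month], total))
    return rows


# Hypothetical deposit dates, for illustration only.
sample = [
    date(2010, 11, 3), date(2010, 11, 20), date(2010, 12, 1),
    date(2011, 1, 15), date(2011, 1, 16), date(2011, 1, 30),
]

for month, new_items, total in monthly_growth(sample):
    print(month, new_items, total)
```

The same grouping could be applied to download timestamps, so that growth in items and growth in traffic can be read side by side.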
Access Statistics for Institutional Repositories
The relevance of such
statistics is well-understood with, here at the University of Bath, the IRStats
module for the ePrints repository
service <http://opus.bath.ac.uk/cgi/irstats.cgi> providing access to
information such as details
of all downloads<http://opus.bath.ac.uk/cgi/irstats.cgi?page=get_view2&divisionss=dummy&IRS_epchoice=research_centres&research_centress=dummy&subjectss=dummy&creators_names=dummy&eprint=&period=-3m&IRS_datechoice=range&start_day=1&start_month=1&start_year=2005&end_day=31&end_month=12&end_year=2011&view=AllItemsTable>,
the overall number of downloaded
items<http://opus.bath.ac.uk/cgi/irstats.cgi?page=get_view2&divisionss=dummy&IRS_epchoice=research_centres&research_centress=dummy&subjectss=dummy&creators_names=dummy&eprint=&period=-3m&IRS_datechoice=range&start_day=1&start_month=1&start_year=2005&end_day=31&end_month=12&end_year=2011&view=DownloadCountHTML>(100,003
at the time of writing), the trends over
time<http://opus.bath.ac.uk/cgi/irstats.cgi?page=get_view2&divisionss=dummy&IRS_epchoice=research_centres&research_centress=dummy&subjectss=dummy&creators_names=dummy&eprint=&period=-3m&IRS_datechoice=range&start_day=1&start_month=1&start_year=2005&end_day=31&end_month=12&end_year=2011&view=MonthlyDownloadsGraph>and
various other summaries, as illustrated.
However it is important to recognise that such measures only indirectly
provide an indication of how well a repository may be doing in maximising
access to research publications. In part traffic may be generated by users
following links to content of no interest to them through use of search
engines such as Google (which is responsible for providing 38% of traffic to
the University of Bath
repository<http://opus.bath.ac.uk/cgi/irstats.cgi?page=get_view2&divisionss=dummy&IRS_epchoice=research_centres&research_centress=dummy&subjectss=dummy&creators_names=dummy&eprint=&period=-3m&IRS_datechoice=range&start_day=1&start_month=1&start_year=2005&end_day=31&end_month=12&end_year=2011&view=SearchEngineGraph>,
with another 10.2% arriving via Google Scholar). In addition even if a
relevant paper is found and read, the ideas it contains may not be felt to
be of direct interest and may not be used to inform subsequent research
activities.
A citation to a resource will provide more tangible evidence of the direct
benefits of a repository in supporting research activities, and work such as
the MESUR metrics activity <http://www.mesur.org/Metrics.html> is looking to
“*investigate an array of possible impact metrics that includes not only
frequency-based metrics (citation and hit counts), but also network-based
metrics such as those employed in social network analysis and web search
engines*”. However in this post I will focus on evidence which can be easily
gleaned from repositories themselves.
Whilst it is possible to point out various limitations in using such metrics
the danger is that we lose sight of the fact that they can still have a role
to play in providing a proxy indicator of value. So although repository
items which are found and downloaded may not be of interest or may not be
used, other items will be relevant and inform, either directly or
indirectly, other research work. We might therefore assert that an increase
in traffic may also have a positive correlation with an increase in use.
The Numbers of Items in Repositories
Measuring the numbers and growth in numbers of items in a repository would
seem to be less problematic than access statistics. This measurement can
reflect the effectiveness of a repository in supporting the
preservation of research publications, as publications are migrated from
departmental Web sites or individuals’ personal home pages to a centrally
managed environment. The growth in the numbers of items should also, of
course, help in enhancing access to the papers.
Repositories may, however, only provide access to the metadata about a paper
and not access to the paper itself. This may be due to a number of factors
including copyright restrictions, (perceived) difficulties in uploading
documents or the unavailability of the documents.
There may also be a need to be able to differentiate between the total
number of distinct items in a repository and the numbers of formats which
may be made available. Storage of the original master format is often
recommended for preservation purposes and where ease of reuse of the content
may be required (e.g. merging together various papers and producing a table
of contents can be much easier if the original files are available, rather
than a series of PDFs which can be more difficult to manipulate).
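The distinctions drawn above (distinct items, metadata-only records, and the number of formats held) can be sketched as a simple profile. This is only an illustration under stated assumptions: the record structure, the identifiers and the `profile` function are hypothetical, and an empty file list is assumed to stand for a metadata-only record.

```python
from collections import Counter

# Hypothetical records: item identifier -> list of file formats held.
# An empty list is assumed to mean a metadata-only record.
items = {
    "item-1": ["pdf", "doc"],
    "item-2": ["pdf"],
    "item-3": [],
}


def profile(items):
    """Summarise distinct items, metadata-only records and file formats."""
    formats = Counter(fmt for files in items.values() for fmt in files)
    return {
        "distinct_items": len(items),
        "metadata_only": sum(1 for files in items.values() if not files),
        "files": sum(formats.values()),
        "by_format": dict(formats),
    }


summary = profile(items)
print(summary)
```

Reporting the item count and the file count separately avoids a repository appearing larger simply because it stores several formats of the same paper.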
Alternative formats for items may also help to enhance access for users of
mobile devices or users with disabilities who may require assistive
technologies to process repository items. This then leads to the question
of not only the formats provided but how those formats are being used: is a
PDF easily processed by assistive technology or is it simply a scanned image
which cannot be read by voice browsers? In addition, as suggested by
preliminary research carried out by my colleagues Emma Tonkin and Andy
Hewson described in a post on “Automated Accessibility Analysis of
PDFs in Repositories” <http://ukwebfocus.wordpress.com/2010/07/30/automated-accessibility-analysis-of-pdfs-in%C2%A0repositories/>,
might the cover pages automatically generated by repository systems create
additional barriers to access of such resources?
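One rough way to approach the scanned-image question is to check whether any text can be extracted from a PDF at all. The sketch below is a heuristic only and is not the method used in the research mentioned above. It assumes the per-page text has already been extracted by some PDF library (not shown), and the function names and the 20-character threshold are hypothetical choices. The second function reflects an assumption of mine: a generated cover page will itself contain text, so a check that includes it could miss a scanned body.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-only from its per-page extracted text.

    page_texts is a list of strings, one per page, produced by a PDF
    text extractor of your choice (assumed, not shown here).
    """
    if not page_texts:
        return True
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    return pages_with_text == 0


def body_looks_scanned(page_texts, has_cover_page=True):
    """Same check, ignoring an automatically generated first page."""
    body = page_texts[1:] if has_cover_page else page_texts
    return looks_scanned(body)


# Hypothetical extraction result: a text cover page, then two image pages.
pages = ["Cover page generated by the repository system", "", ""]
print(looks_scanned(pages), body_looks_scanned(pages))
```

A result of this kind says nothing about reading order, tagging or alternative text, so it can only flag the most obvious barrier.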
Trends Across the Community
This post has outlined areas in which evidence
should be gathered and used in order to be able to help demonstrate the
value of an institutional repository service and help to ensure that a
number of best practices are being addressed (and, if not, to be able to
develop plans for implementing such best practices).
Although such work should be done within the context of an individual
repository service there are also benefits to be gained from observing
trends across the community. My colleague Paul Walk recently mentioned on
the JISC-Repositories JISCMail list
<https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=JISC-REPOSITORIES;40e90853.1102>
UKOLN’s development of a prototype harvesting and aggregation system for
metadata from UK institutional repositories called ‘RepUK’. One aspect of
this work is aggregation of metadata records from institutional
repositories and visualisation of various aspects of the data quality. Mark
Dewey, lead developer for this work, has released an initial prototype
tool <http://kitt.bath.ac.uk/RepUK/>.
As can be seen this can provide a visualisation of the growth in the number
of records <http://kitt.bath.ac.uk/RepUK/graph.htm?grp=RepoDepositStats&k=json>
across the 133 repositories which have been harvested.
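To give a feel for how record counts might be derived from harvested metadata, here is a minimal sketch. It is not a description of how RepUK works: it assumes the metadata arrives as an OAI-PMH ListRecords response, and the sample XML, identifiers and function name are hypothetical. Note that an OAI-PMH datestamp records when a record was last changed, which is only an approximation of its deposit date.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, hypothetical response fragment for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2010-12-04</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2011-01-10</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:example.org:3</identifier>
      <datestamp>2011-01-22</datestamp>
    </header></record>
  </ListRecords>
</OAI-PMH>"""


def records_per_month(xml_text):
    """Count harvested records by the YYYY-MM prefix of their datestamp."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for stamp in root.findall(".//oai:header/oai:datestamp", OAI_NS):
        counts[stamp.text.strip()[:7]] += 1
    return dict(sorted(counts.items()))


print(records_per_month(SAMPLE))
```

Summing such counts across many repositories would give the kind of community-wide growth curve described above.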
Discussion
This post has suggested that metrics are needed in order to help to provide
answers, perhaps indirectly, to questions regarding the effectiveness of
institutional repositories as well as to support and inform the development
of the repositories and the adoption of best practices. Of course measuring
the effectiveness of institutional repositories will also require user
surveys, but this post only considers quantitative approaches, which are
summarised in the table below.
*Metric*: Total usage
*Purpose*: Provides an indication of repository’s effectiveness in
enhancing access to research papers.
*Comments*: Data may need to be carefully interpreted.

*Metric*: Number of items
*Purpose*: Provides an indication of repository’s effectiveness in both
enhancing access to research papers and in ensuring their preservation.
*Comments*: It might be expected that growth will decrease after a backlog
of papers has been uploaded.

*Metric*: Profiling Alternative Formats
*Purpose*: May provide an indication that papers can be accessed by users
with disabilities or by users using mobile devices.
*Comments*: Provision of multiple formats may enhance access and reuse.

*Metric*: Profiling Format Quality
*Purpose*: Provides an indication that the formats provided are fit for
purpose (e.g. PDFs are not just scanned images).
*Comments*: This may indicate problems with repository workflow, need for
education, etc.
But what additional tools may be needed (I would welcome a mobile app for my
iPod Touch along the lines of the stats app for WordPress
blogs <http://itunes.apple.com/us/app/wordpress-mobile-statistics/id410530771?mt=8>)?
What advice is needed in interpreting the findings (and avoiding
misinterpretations)? Your thoughts are welcomed.