An Open Source Alternative for Libraries

Gang Wan
Texas A&M University
United States
Wangang11@gmail.com

ABSTRACT: As Linux has become a major operating system and Apache the most popular Web server, the concept of Open Source Software (OSS) has become familiar to many people. Software developers from all over the world have been contributing their efforts to build a huge OSS community that will have a great impact on various disciplines. Librarians and information professionals have foreseen the magnitude of this impact and believe that this relatively new model will bring many benefits to libraries. This article discusses and compares some major OSS projects particularly useful to libraries, and proposes a real-life OSS solution for libraries based on the author’s first-hand experience.

I. OSS and LAMP

The term Open Source Software (OSS) was coined in 1998 by a group of leaders in the free software community[1]. Since then, the OSS movement has become a revolution in software development, and an international OSS community has formed around the Open Source Initiative (OSI). As described in the Open Source Definition (OSD), the idea of OSS promotes software reliability and quality by supporting peer review and rapid evolution of source code. To be certified as open source software, the license of a program must guarantee the right to access, redistribute, modify, and use it freely[2]. Currently, there are quite a few open source licenses, such as the Apache License, the BSD license, the GNU General Public License, and the MIT License. Among them, the GNU General Public License (GPL) is currently the most widely used. It articulates the rights granted to the users of a computer program as well as the restrictions entailed.

The LAMP model has probably had a more profound influence on the information industry than anything else in the open source community, particularly in this age of the World Wide Web. In this article, the acronym LAMP refers to a set of OSS programs used together to support dynamic websites or servers. The L in the acronym stands for the Linux operating system, which serves as the core of the LAMP model; the A represents the Apache Web server, which makes dynamic websites available from the Linux server so that users can access and interact with the Web pages through their Web browsers; the M refers to the database server MySQL, which is necessary for developing dynamic database-driven websites; the P stands for one or more of the following scripting and programming languages: PHP, Python, or Perl, which can be used to create interactions between the end-users and the databases[3]. With the rapid development of OSS, these components have become fairly mature and stable, and applications based on the LAMP model have been widely adopted by enterprises, institutions and governments. A low-cost LAMP Browser/Server (B/S) model for libraries is shown in the following diagram. The implementation of the OSS packages discussed in the following sections is based on this structure.


Fig. 1. The B/S structure based on the LAMP model

To install and configure the OSS packages for libraries described in this article, some basic knowledge of Linux, Apache, MySQL and PHP/Perl/Python is required. For the Linux operating system, the prerequisite knowledge includes how to choose a distribution and install it, some basic commands (e.g. ls, man, mkdir, rmdir, cd, cp, mount, tar, ifconfig, etc.), and X Window desktops (e.g. GNOME and KDE). Most Linux reference books cover these topics.

For the Apache Web server, it is necessary to know how to find the configuration files and how to modify them. The main configuration file is named “httpd.conf” and can normally be found under the /etc/httpd/conf directory on a Linux system. More detailed information on this topic can be found on Apache’s documentation site, http://httpd.apache.org/docs/2.2/.

In addition, it is also important to understand major administrative commands and SQL statements (e.g. select, create, insert, delete, update, etc) for the MySQL database system, which are covered by the online MySQL Reference Manual[4].

Most scripts and programs mentioned in the following sections are written in Perl and PHP. Therefore, good knowledge of these languages is very helpful for customizing these programs for a specific library. The manuals of these languages can be easily found from their websites, http://www.php.net/docs.php and http://www.perl.org/docs.html.

II. Implement the Koha Library Management System

Since a library management system (also called an integrated library system) plays a central role in supporting the business and technical functions of a library, it is a critical part of any software solution for libraries. A typical library management system includes several modules: acquisitions, cataloging, circulation, and administration. It also provides an online public access catalog (OPAC) that can be searched by patrons through a Web browser.

Some currently known open source library management systems are Avanti, Koha, OpenBiblio, Evergreen, and Emilda[5, 6]. Of these projects, the New Zealand-based Koha is probably the most active and complete. With modules for circulation, cataloging, acquisitions, serials, reserves, patron management, branch relationships, and more, Koha has been described as a true enterprise-class library management system, comparable to commercial ones[7]. Since the release of version 2.2, Koha has been a mature product, implemented by over 100 libraries of various sizes[8]. The latest distribution is Koha 2.2.5, which was released in January 2006 and is available on Savannah, a website for distributing free and open source software.

Although Koha has released an MS Windows distribution, the LAMP platform is highly recommended as its environment. Here the P stands for Perl and its modules. In addition, Internet access is needed during the installation, because some additional Perl modules will be downloaded from the CPAN (Comprehensive Perl Archive Network) servers. Moreover, the Linux root account and the MySQL administrator account are required for creating the Koha databases and users.

The first step of the installation is to download a distribution from Savannah (for version 2.2.5) or Sourceforge (for older versions). After extracting Koha installation files, you can go to the Koha folder and run the installation file from a terminal (or a Linux shell). The command to use is “perl installer.pl”. Running the installer.pl program will check if necessary Perl modules are available. In most cases, you will get an error message as follows.

You are missing some Perl modules required by Koha. Please run this again after installing them. They may be installed by finding packages from your operating system supplier, or running (as root) the following commands:

export LC_ALL=C
perl -MCPAN -e 'install "Date::Manip"'
perl -MCPAN -e 'install "HTML::Template"'
perl -MCPAN -e 'install "MARC::Record"'
perl -MCPAN -e 'install "Mail::Sendmail"' …

The names of missing modules may be different from the above example. To continue the installation, you need to run the commands in the error message first. These commands allow you to download the required Perl modules from a CPAN mirror site that you designate.
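
When the list of missing modules is long, it can help to pull the module names out of the installer's message before installing them. The following is a minimal sketch, assuming the message has been saved to a text file and that every suggestion follows the pattern shown above; the function name list_missing_modules is hypothetical and is not part of Koha.

```shell
#!/bin/sh
# Print the module names found in Koha installer output read from stdin.
# Assumes each suggestion looks like: perl -MCPAN -e 'install "Module::Name"'
list_missing_modules() {
    sed -n 's/.*install "\([^"]*\)".*/\1/p'
}

# Demonstration with two lines copied from the message above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
export LC_ALL=C
perl -MCPAN -e 'install "Date::Manip"'
perl -MCPAN -e 'install "HTML::Template"'
EOF
missing=$(list_missing_modules < "$sample")
printf '%s\n' "$missing"
```

The resulting names can then be looked up in the packages offered by your operating system supplier, or installed through CPAN as the message suggests.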

The CPAN (Comprehensive Perl Archive Network) is a large collection of Perl software and documentation, whose main purpose is to help programmers easily locate modules and scripts. The CPAN has a worldwide network of mirror sites. The CPAN master site (ftp.funet.fi), as an example, has over 250 public mirrors in 60 countries[9].

After installing these Perl modules through CPAN, you can resume the Koha installation by running the installer.pl again. The whole installation process includes copying Koha files, setting up the administrator account, and creating Koha databases. The information such as file locations, listening ports (used for virtual hosts), MySQL administrator account, etc. is collected through interactive dialogues.

At this point, all files and database tables of the Koha system have been set up. To make the system work, a critical step is to modify Koha’s configuration files. Two files store Koha’s configuration information, and both can be found in the /etc/ folder by default. One of them is Koha.conf, which contains administrative information such as the database name and the administrator account. The other is Koha-httpd.conf, which holds the Web-access configuration for the Koha system, such as listening ports and aliases of Koha folders. The main purpose of Koha-httpd.conf is to tell the Apache Web server how to run the scripts for the Koha OPAC and the intranet. Therefore, this file must be included in Apache’s configuration file, httpd.conf, for example with Apache’s Include directive.
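
One way to include the Koha virtual-host file is to append an Include line to httpd.conf. The sketch below assumes the default file location named above; the function name add_koha_include is hypothetical, and the demonstration works on a scratch file so that the live configuration is not touched.

```shell
#!/bin/sh
# Append an Include line for the Koha virtual-host file to an Apache
# configuration file, unless the same line is already present.
# Usage: add_koha_include /path/to/httpd.conf /path/to/Koha-httpd.conf
add_koha_include() {
    httpd_conf=$1
    koha_conf=$2
    include_line="Include $koha_conf"
    # -x matches whole lines, -F treats the pattern as a fixed string
    if ! grep -qxF "$include_line" "$httpd_conf"; then
        printf '%s\n' "$include_line" >> "$httpd_conf"
    fi
}

# Demonstration on a scratch file rather than the live configuration.
demo_conf=$(mktemp)
printf 'ServerName library.example.org\n' > "$demo_conf"
add_koha_include "$demo_conf" /etc/Koha-httpd.conf
add_koha_include "$demo_conf" /etc/Koha-httpd.conf   # second call adds nothing
cat "$demo_conf"
```

After changing the real httpd.conf, running apachectl configtest before restarting Apache is a reasonable way to catch syntax errors.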

Now the entire installation process has been completed. The Koha OPAC and the intranet can be accessed by visiting http://your_Web_URL:8000 and http://your_Web_URL:8080 through a Web browser. Here you need to replace your_Web_URL with the actual URL of your Web server, and 8000 or 8080 with the actual listening ports that you assigned for the virtual hosts during the installation.

Before launching the Koha system for your library, you may also want to customize its Web interfaces. They are all generated from HTML templates by Perl scripts. The default locations of these templates are /usr/local/koha/opac/htdocs/opac-tmpl and /usr/local/koha/intranet/htdocs/intranet-tmpl. Furthermore, the greatest advantage of an open source system is that all the source code is accessible. Therefore, you can modify the Perl scripts in /usr/local/koha/opac/cgi-bin and /usr/local/koha/intranet/cgi-bin to customize the Koha system’s functionality.

III. Create Digital Collections with Greenstone Digital Library System

Digital libraries are composed of collections of digital objects, including text, image, video, and audio, along with methods for access and retrieval, and for selection, organization and maintenance of the collection[10]. As the needs for managing digital collections have increased greatly in recent years, some library software vendors, such as Dynix and Endeavor, have released commercial digital library management systems. To avoid paying expensive license fees, a good option is to implement an open source alternative.

Some major OSS projects for building and managing digital collections are DSpace, Fedora, EPrints and Greenstone. Each of these projects has been implemented by a number of libraries. This article uses Greenstone as an example, since it is fairly easy to install, customize and use.

Greenstone has distributions for Windows, Unix/Linux and Mac. In accordance with the theme of OSS, this article discusses the implementation of Greenstone on the LAMP platform. The Greenstone digital library software includes a Web interface for searching and viewing digital collections, and a librarian’s interface for building and managing collections. Since the Greenstone Librarian Interface (GLI) is developed in Java, the Java Runtime Environment (JRE) is required to run the program. The JRE package can be downloaded free of charge from the Sun Developer Network (http://java.sun.com/j2se/1.5.0/download.jsp).

The installation of Greenstone Digital Library Software is very straightforward: just run the install.sh file from a terminal by typing “./install.sh” or “sh install.sh”. Then the necessary information such as the administrator account, file locations, and the location of cgi (or cgi-bin) folder will be collected through interactive dialogues. Here CGI stands for the Common Gateway Interface, a standard protocol for interfacing external applications with a Web server[11]. CGI scripts enable users to exchange information with the server via a Web browser. During the process of the installation, Greenstone will copy an executable CGI file named library to the CGI folder on the Apache server. When a user visits the URL – http://your_Web_server/cgi-bin/library (this URL is just an example, and the actual one depends on the location of the cgi-bin folder on the server), the Apache server will execute the library program and start the Greenstone Run-time System.

While the Greenstone digital library collections are Web-based and can be accessed anywhere via a browser, the Greenstone Librarian Interface (GLI) is a client program that can only be run from the computers that have installed Greenstone digital library software[12]. It is placed in the subdirectory gli of the top-level Greenstone directory (/usr/local/gsdl/gli in Linux by default). When the executable file gli.sh is run from the Linux shell, it will check if the Java Runtime Environment, Perl and Greenstone Digital Library Software are all installed on the machine, and then start the Greenstone Librarian Interface.
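
Before running gli.sh, you can check for the prerequisites yourself. The following is a minimal sketch that only tests whether the commands are on the current PATH; it is not the check that gli.sh performs, and the function name missing_commands is hypothetical.

```shell
#!/bin/sh
# Print each named command that cannot be found on the current PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# java and perl are the prerequisites named above.
missing=$(missing_commands java perl)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
else
    printf 'java and perl are both available\n'
fi
```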

As mentioned earlier, in most cases your Linux server already includes Perl, so you just need to download and install the JRE. A very important step after installing the JRE is to set the Java environment variables so that the system knows where to find it. Sample commands for setting these variables are listed below.

export JAVA_HOME=/usr/java/jdk1.5.0-06   # replace with your Java location
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

These commands can either be executed from the Linux shell each time you want to run the JRE, or added to a profile file (e.g. the system-wide /etc/profile). In the latter case, the variables are loaded automatically whenever a user logs in.
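
Adding the variables to a profile file can be sketched as follows. The function name java_env_lines is hypothetical, the Java path is the sample one used above, and the demonstration appends to a scratch file; in practice the target would be a profile file such as /etc/profile, edited as root.

```shell
#!/bin/sh
# Print the three export lines for a given Java installation directory.
# The variable references are emitted literally, so the shell that later
# reads the profile expands them.
java_env_lines() {
    printf 'export JAVA_HOME=%s\n' "$1"
    printf '%s\n' 'export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar'
    printf '%s\n' 'export PATH=$PATH:$JAVA_HOME/bin'
}

# Demonstration: append to a scratch file instead of a real profile.
demo_profile=$(mktemp)
java_env_lines /usr/java/jdk1.5.0-06 >> "$demo_profile"
cat "$demo_profile"
```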

Configuration of the Greenstone digital collections can be done through the GLI and Greenstone’s Web interface. Their appearance can also be customized by modifying the macro files in the macro folder under the gsdl directory. The detailed document about the configuration of Greenstone is Inside Greenstone Collections, available from the wiki of this project (http://greenstone.sourceforge.net/wiki/index.php/Manual).

IV. Index and Retrieve Your Documents with Swish-e

This section discusses the implementation of open source information retrieval software. Librarians have been working with information retrieval software for a long time. Its applications include online indexes and databases, OPACs, and Web search engines. As information retrieval has become a very important area of computer science and information science, numerous OSS projects in this area can be found online. Some widely known ones are Harvest, Ht://Dig, Isite/Isearch, MPS, Swish-e, WebGlimpse, and Yaz/Zebra. Among them, Swish-e has received many good reviews for its simplicity and strong functionality.

Swish-e can work not only as a traditional document indexer but also as a Web search engine, since it supports Web crawling. Therefore, using it as a search engine for your library’s website is a good way to help patrons locate information.

Currently, Swish-e is only distributed in source packages. A C compiler is needed to build Swish-e before installing it. Perl is also required for running the Web spider[13]. Both tools are included in most Linux distributions.

After downloading and extracting Swish-e package, you can go to the corresponding Swish-e folder and execute the following commands from the Linux shell to build and install Swish-e.

./configure
make
make check
make install

Now the Swish-e indexer has been installed. The following example shows how to implement Swish-e as a Web search engine rather than how to index files on local drives. Detailed instructions on its indexing and searching commands can be found in Swish-e’s online documentation.

First of all, you need to create a directory for the Swish-e configuration and index files (e.g. /web_index). The configuration file is named swish.conf and contains a list of Swish-e directives. When you run the command swish-e -S prog -c swish.conf from the shell, the Swish-e program will follow the directives in the configuration file to index your files or Web pages. A very simple example of the swish.conf file is given below.

# Use spider.pl for indexing
IndexDir spider.pl
# The starting point for the spider to crawl
SwishProgParameters default http://librarysite/index.html
# Extra searching by title and path
MetaNames swishtitle swishdocpath
# Store 10000 characters in the index file for txt documents
StoreDescription TXT* 10000
StoreDescription HTML* 10000

You may also add more directives to this file to customize the indexer. Some major ones are IgnoreWords (set up a stop word list), FuzzyIndexingMode (set up a stemming rule), UseStemming (turn on/off the stemming function), MinWordLimit (words with fewer letters than this limit will not be indexed) and Buzzwords (specify words that will be indexed regardless of other rules).

The index generated by Swish-e is named index.swish-e and saved in the /web_index directory in this example.
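
The setup described above can be scripted. The sketch below writes the same five directives into a directory of your choice; the function name write_swish_conf is hypothetical, and the demonstration uses a scratch directory in place of /web_index. Running the indexer itself is left as a comment because it requires Swish-e to be installed.

```shell
#!/bin/sh
# Write a minimal swish.conf into the given directory.
# Usage: write_swish_conf /web_index http://librarysite/index.html
write_swish_conf() {
    index_dir=$1
    start_url=$2
    mkdir -p "$index_dir"
    cat > "$index_dir/swish.conf" <<EOF
IndexDir spider.pl
SwishProgParameters default $start_url
MetaNames swishtitle swishdocpath
StoreDescription TXT* 10000
StoreDescription HTML* 10000
EOF
}

# Demonstration in a scratch directory; a real setup would use /web_index.
demo_dir=$(mktemp -d)
write_swish_conf "$demo_dir" http://librarysite/index.html
cat "$demo_dir/swish.conf"
# To build the index (requires Swish-e to be installed):
#   cd /web_index && swish-e -S prog -c swish.conf
```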

Similar to Greenstone, the Swish-e based Web search engine uses CGI scripts for interactions between users and the server. Therefore, a CGI file called swish.cgi needs to be moved to the cgi-bin directory of the web server. The default path of swish.cgi is /usr/local/lib/swish-e/, or you can run swish-e -h | grep libexecdir to find its path.
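
Placing the script in cgi-bin might look like the following sketch. It copies the file rather than moving it, so the original stays in place, and it sets the execute permission that Apache needs to run a CGI script. The function name install_cgi is hypothetical, and /var/www/cgi-bin in the usage comment is an assumed location that depends on your Apache configuration.

```shell
#!/bin/sh
# Copy a CGI script into a cgi-bin directory and make it executable.
# Usage: install_cgi /usr/local/lib/swish-e/swish.cgi /var/www/cgi-bin
install_cgi() {
    src=$1
    cgi_dir=$2
    cp "$src" "$cgi_dir/" && chmod 755 "$cgi_dir/$(basename "$src")"
}

# Demonstration with scratch files standing in for the real paths.
demo_src=$(mktemp -d)
demo_cgi_bin=$(mktemp -d)
printf '#!/usr/bin/perl\n' > "$demo_src/swish.cgi"
install_cgi "$demo_src/swish.cgi" "$demo_cgi_bin"
ls -l "$demo_cgi_bin"
```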

When a user visits swish.cgi via a Web browser (e.g. http://URL/cgi-bin/swish.cgi), it will look up a configuration file named .swishcgi.conf for information like the location of the index file created previously, the title of the search page, etc. Some sample content of the .swishcgi.conf file is given below, which can be created with any text editor.

return { title => 'Search this library site', swish_binary => '/usr/local/bin/swish-e', swish_index => '/web_index/index.swish-e', };

After saving this configuration file in the cgi-bin directory and including the swish.cgi file on the library’s Web pages, patrons can easily search and retrieve information within the library’s site. In addition, there are numerous ways to customize the Swish-e search engine. More details can be found from its online documentation (http://swish-e.org/docs/index.html).

V. Build Your Library’s Blog with WordPress

Blogging has been regarded as one of the most highly touted features of the Web 2.0 era[14]. A blog is a Web page composed of entries in reverse chronological order, to which both its owner and readers can add content via Web forms. Simplicity is probably the main reason for the success of the blog, since users do not need any technical knowledge to have their own Web pages. Many librarians have also realized that a blog can be a good medium for communicating with their patrons and promoting their services.

Although it is easy to create new blogs on commercial blog hosts, such as blogger.com and MSN Spaces, hosting the library’s blog on its own Web server is probably a better choice because it offers more flexibility. One of the most popular open source blogging tools is WordPress. Written in PHP, WordPress is a perfect example of a LAMP application. It uses the MySQL database system to manage user accounts and entries, so you need to create a new database for the blog as well as an administrator account for this database.
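
Creating the database and its account can be sketched as below. The database name, user name and password are placeholders, and the function name blog_db_sql is hypothetical. The GRANT ... IDENTIFIED BY form is the syntax of the MySQL 4.x and 5.x series and is assumed here; newer MySQL versions require a separate CREATE USER statement.

```shell
#!/bin/sh
# Print SQL that creates a database and a dedicated account for it.
# Usage: blog_db_sql DBNAME USER PASSWORD
blog_db_sql() {
    cat <<EOF
CREATE DATABASE $1;
GRANT ALL PRIVILEGES ON $1.* TO '$2'@'localhost' IDENTIFIED BY '$3';
FLUSH PRIVILEGES;
EOF
}

blog_db_sql libraryblog blogadmin 'change-me'
# To apply the statements (prompts for the MySQL root password):
#   blog_db_sql libraryblog blogadmin 'change-me' | mysql -u root -p
```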

The configuration script of WordPress is wp-config.php, and a sample file (wp-config-sample.php) is enclosed in the WordPress package. Before installing WordPress, you need to make sure that a wp-config.php file exists and that the information in it is correct. The easiest way to do that is to open the sample file with a text editor, modify the database configuration information, and save it as wp-config.php.
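
The same edit can be made from the shell instead of a text editor. The following is a minimal sketch, assuming the sample file defines the DB_NAME, DB_USER and DB_PASSWORD constants on lines of the form shown in the stand-in file; the function name make_wp_config and all values are hypothetical, and the exact layout of the real sample file may differ between WordPress versions.

```shell
#!/bin/sh
# Create wp-config.php from the sample file, replacing the database settings.
# Usage: make_wp_config DIR DBNAME USER PASSWORD
# The values must not contain "/", "&", backslashes or single quotes,
# which would confuse the sed expressions and the PHP strings.
make_wp_config() {
    dir=$1
    sed -e "s/'DB_NAME', *'[^']*'/'DB_NAME', '$2'/" \
        -e "s/'DB_USER', *'[^']*'/'DB_USER', '$3'/" \
        -e "s/'DB_PASSWORD', *'[^']*'/'DB_PASSWORD', '$4'/" \
        "$dir/wp-config-sample.php" > "$dir/wp-config.php"
}

# Demonstration on a stand-in sample file (the real one has more settings).
demo_blog=$(mktemp -d)
cat > "$demo_blog/wp-config-sample.php" <<'EOF'
<?php
define('DB_NAME', 'wordpress');
define('DB_USER', 'username');
define('DB_PASSWORD', 'password');
define('DB_HOST', 'localhost');
EOF
make_wp_config "$demo_blog" libraryblog blogadmin change-me
cat "$demo_blog/wp-config.php"
```

Settings that the function does not name, such as DB_HOST, are copied through unchanged.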

All the files in the WordPress package should be placed in a folder that can be accessed by the Apache HTTP server. The typical path for these files in Linux is /var/www/html/blog/, where /var/www/html/ is the home directory of the Web space hosted on this server, and /blog is created for WordPress. After moving files to this folder, you can visit PHP scripts via a Web browser. The installation process can be initiated by running the install.php script within the /wp-admin subdirectory. An example of its Web URL is http://library_url/blog/wp-admin/install.php. You may visit the URL from a browser and follow the instructions to complete the installation.

Now you are hosting a WordPress blog locally. You can go to the administration panel (sample URL: http://your-blog-url/wp-admin/) to manage and post blog entries. The WordPress blog is highly customizable. The WordPress website (http://codex.wordpress.org) provides comprehensive documentation on further configuration, design and layout of a blog.

VI. OSS for Other Library Functions

Many libraries also provide other services, such as interlibrary loan, document delivery, and virtual reference. Interlibrary Loan (ILL) services enable library patrons to access materials held by other participating libraries, which nowadays can normally be requested via a Web form. The ILL process requires an automated system that helps librarians manage the requests and forward them to other libraries. Some widely used ILL software packages are OCLC’s ILLiad, SirsiDynix’s URSA, FDI’s VDX, and RLG’s ILL Manager. There are also a few OSS projects for this library function, such as OpenILL, ILL Wizard and ILL ASAP. OpenILL is not a purely open source system, since participating libraries need to pay certain fees to use it, although these are relatively low compared to commercial software. Both ILL Wizard and ILL ASAP were developed for Windows and can be easily installed. Their implementation is not discussed in detail in this article, since they are not typical LAMP applications.

Document delivery services are often regarded as part of interlibrary loan services, through which interlibrary loan librarians provide a copy of the requested material to the patron. With the advance of digitization technology, many libraries can deliver a digitized copy to the patron via email. This is enabled by a software system behind the scenes. A well-known OSS electronic document delivery program is Prospero, developed at Ohio State University’s Prior Health Sciences Library. Originally designed as an add-on to Ariel (a document delivery tool developed by RLG) for delivering electronic documents to end users, Prospero later gained scanning capabilities and could then serve as an open source alternative to Ariel[15]. However, as it has not been updated since 2003, Prospero is mentioned less often today.

Virtual reference (VR) services enable library patrons to ask reference questions online through a Web form or an instant messenger. VR is considered a representative feature of the new generation of libraries, since it breaks the physical limits of traditional libraries. Some commercial software packages for VR offer many add-on functions besides instant chatting, such as a knowledge base and co-browsing. Compared to them, most OSS VR projects are not so sophisticated. Two representatives of these projects are RAKIM, developed at Miami University Libraries, and OpenAAQ. Additionally, some general chat programs, such as AIM and MSN, have been used by a few libraries for VR services. A free but not open source chat program, Jybe (http://jybe.com), is particularly useful to libraries, since it provides a co-browsing function. It is easy to install and can be added to the toolbar of Internet Explorer (IE) or Firefox. The user can initiate a session by clicking the Jybe icon on the toolbar.

VII. Summary

The above sections proposed an OSS solution for libraries, focused on their major functions, including circulation, cataloguing, acquisitions, providing digital content, information retrieval and promoting library resources. OSS programs for other library functions, such as ILL, EDD and VR, were also discussed. The benefits of implementing the OSS solution for a library are obvious. It is a cheap alternative to expensive commercial software packages. Many libraries pay tens of thousands of dollars a year just for their library management systems. This is a fairly heavy burden for many libraries. Adopting OSS can certainly help them reduce the burden to some degree.

In addition, since the source code of these programs is completely open to the public, they can be customized and enhanced by an individual library. System librarians can also develop add-on applications for these OSS programs. In contrast, commercial software packages from different vendors do not provide such flexibility.

However, some disadvantages of implementing the OSS solution should also be recognized. Maintenance and support of these software programs is probably the most significant concern. Staff members need to have enough technical expertise to do this job themselves, since there is no vendor providing customer services. Also, the development of a specific OSS project is not guaranteed to be continuous. After all, the programmers who participate in these projects are volunteers. Their input to the projects depends on many uncertain factors, such as their free time or their personal interests. Most small projects are not updated often, and quite a few stop development after a period of time.

Therefore, you need to be aware of both favorable and unfavorable factors before considering an OSS system. Applications that have a longer history or are better known are preferable, as their user communities are relatively larger and more helpful for troubleshooting and maintenance.

References

[1]. Bretthauer, D. (2002). Open Source Software: A History. Information Technology & Libraries, 21(1), 3.

[2]. Open Source Initiative. (2006). The Open Source Definition. [Cited 5/20/2006]. URL: http://www.opensource.org/docs/definition.php.

[3]. White, A., & Balsamo, J. (2005). Using LAMP to make our library shine. Computers in Libraries, 25(5), 6.

[4]. MySQL AB. (2006). MySQL Documentation. [Cited 5/28/2006]. URL: http://dev.mysql.com/doc/.

[5]. Blalock, L. (2006). Open-Source Software for Libraries. [Cited 5/28/2006]. URL: http://creativelibrarian.com/library-oss/.

[6]. EIFL.net. (2006). Open Source Software. [Cited 5/20/2006]. URL: http://www.eifl.net/opensoft/soft.html.

[7]. Koha.org. (2006). About Koha. [Cited 06/06/2006]. URL: http://www.koha.org/about-koha/.

[8]. Poulain, P. (2006). Koha 2.2.5 released notes. [Cited 06/07/2006]. URL: http://savannah.nongnu.org/forum/forum.php?forum_id=4244.

[9]. Ashton, E. (2006). CPAN Frequently Asked Questions. [Cited 6/10/2006]. URL: http://www.cpan.org/misc/cpan-faq.html#What_is_CPAN.

[10]. Akscyn, R.M., & Witten, I.H. (1998). Report of First Summit on International Cooperation on Digital Libraries. Workshop of ACM Digital Libraries. (Marriott City Center, Pittsburgh, PA; June 27-28, 1998).

[11]. W3C.org. (1999). CGI: Common Gateway Interface. [Cited 06/10/2006]. URL: http://www.w3.org/CGI/.

[12]. Witten, I.H., & Boddie, S. (2004). Greenstone digital library installer's guide. [Cited 06/10/2006]. URL: http://greenstone.sourceforge.net/wiki/index.php/Manual.

[13]. Swish-e.org. (2004). Swish-e installation instructions. [Cited 6/15/2006]. URL: http://swish-e.org/docs/install.html.

[14]. O'Reilly, T. (2005). What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. [Cited 6/15/2006]. URL: http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html.

[15]. Weible, C.L., & Robben, C. (2002). Calming the Tempest: The Benefits of Using Prospero for Electronic Document Delivery in a Large Academic Library. Journal of Interlibrary Loan, Document Delivery & Information Supply, 12(4), 79.


Submitted to CLIEJ on 19 September 2006.
Copyright © 2006 Gang Wan

Wan, Gang. (2007). An Open Source Alternative for Libraries. Chinese Librarianship: an International Electronic Journal, 23. URL: http://www.iclc.us/cliej/cl23wan.htm