Final Report on Pinyin Conversion

By

Pinyin Liaison Group
Council on East Asian Libraries
(CEAL)

March 2000

Susie Cheng, University of Hawaii
Yu-lan Chou, University of California at Berkeley
Guo-qing Li, Ohio State University
James Lin, Harvard University
Amy Tsiang, UCLA
Peter Zhou (Chair), University of Pittsburgh


Preamble

For nearly half a century, libraries in North America have been using Wade-Giles romanization for cataloging Chinese language materials.  In 1997, the Library of Congress (LC) announced a decision to switch to Pinyin for the romanization of Chinese in cataloging and authority records.  Pinyin is a romanization scheme widely used by governments, educational institutions, commercial publishers and news media in the Western world for transliterating Chinese scripts.  Pinyin conversion, now scheduled to occur in 2000, will bring about systematic changes in millions of records, both those in Chinese and those in other languages that have headings, notes, or title added entries in Wade-Giles.  This will be the single largest conversion of romanization systems in the history of American libraries to date.  This report discusses issues related to the change to Pinyin in North American libraries and recommends the steps libraries should take to adopt Pinyin and to convert their cataloging records, following LC's lead.  This document is in the public domain, free of copyright restrictions.  We therefore encourage CEAL members to adapt this report, or turn it into their own planning document, for local deliberations on Pinyin conversion.


Table of Contents

I.      History and current status of Pinyin conversion

1.      Romanization of Chinese scripts in Anglo-American cataloging

2.      Planning for Pinyin conversion

3.      A recommended timeline for CEAL libraries

 

II.    Options for CEAL libraries and analyses of local conversion needs

1.      Conversion options using national utilities

2.      Name authority records

3.      Pinyin marker

4.      Other prerequisites for Day 1

5.      Split files

6.      Conversion of non-Chinese records and non-standard Chinese romanization forms

7.      Implementation of Pinyin conversion by CEAL libraries

8.      Implications for Public Services

 

Appendix: Major Pinyin conversion tools and documentation


I.      History and current status of Pinyin conversion

1. Romanization of Chinese scripts in Anglo-American cataloging

 

Before 1957, there were no set rules for cataloging Chinese language materials in Anglo-American cataloging.  Different libraries varied widely with respect to the pattern and choice of language for bibliographic description and subject analysis.  The use of Wade-Giles as the standard romanization scheme in American cataloging practice was initiated in February 1958 when the Library of Congress began to catalog Chinese materials in Wade-Giles, following the publication in 1957 of Preliminary Rules and Manual for Cataloging Chinese, Japanese and Korean Materials[1].  While Wade-Giles became the standard romanization for Chinese scripts in North American cataloging, in 1958 China promulgated the Pinyin romanization scheme as its own standard for romanizing Chinese.  That same year, soon after the new scheme was publicized, the British Library started to use Pinyin for the bibliographic control of Chinese language materials.[2]  Ever since its introduction, Pinyin has been gradually adopted by the international community as the standard Chinese romanization system.

 

The Library of Congress (LC) first proposed conversion from the Wade-Giles system to Pinyin in 1979.  The conversion was to take place in 1980, to coincide with LC's introduction of computerized cataloging for Chinese materials[3].  That plan failed to garner sufficient support in the East Asian library community and was abandoned.  In 1990-91, LC again publicly explored this issue and sought feedback from the library community.  Though there was considerable support, strong concerns about varying standards for word division in Pinyin and the lack of computer programs for conversion of online records again defeated LC's proposal[4].  In 1996, the National Library of Australia developed a conversion program that automatically converted 500,000 Chinese records from Wade-Giles to Pinyin.  This influenced the Library of Congress and the East Asian library community to again explore the possibility of converting to the Chinese standard.  In November 1997, LC announced that in the year 2000 it would begin using Pinyin as the new standard romanization scheme for cataloging Chinese materials and would at the same time convert all its existing Wade-Giles machine-readable cataloging and related authority records to Pinyin.

 

The East Asian library community endorsed LC’s decision.  In May 1997, the Council on East Asian Libraries (CEAL) had already formed a taskforce to investigate the feasibility of adopting Pinyin as the standard Chinese romanization system for use in North American libraries.  In its final report, the CEAL Taskforce supported the Library of Congress’ plan to convert to Pinyin romanization, recommending that such a program “be carried out only after a careful look at the impact of such a change on present national and local databases, on future additions to information about individual libraries, and on user access to the information.”[5]  In May 1998, CEAL appointed a Pinyin Liaison Group to succeed the Taskforce and to represent CEAL in deliberations with LC, RLG and OCLC on matters related to the implementation of Pinyin conversion in North American libraries.

 

2.  Planning for Pinyin Conversion

 

Since its November 1997 announcement, LC has issued Pinyin romanization guidelines and new classification schedules and has developed a conversion timeline in cooperation with RLG and OCLC.  On June 29, 1999, RLG held a forum to discuss Pinyin conversion with representatives from LC, OCLC, CEAL and senior library administrators from eight research libraries with large Chinese collections.  The panelists discussed RLG's and OCLC’s plans for conversion, LC romanization guidelines, the sequence of conversion, and implications for local systems.  On October 7, 1999, Harvard University organized a meeting at the Library of Congress with representatives from LC, OCLC, RLG, CEAL and selected libraries with large Chinese collections.  During this meeting, representatives of these institutions reached an agreement on a conversion timeline for LC, RLG, OCLC, and individual libraries to follow.  They also discussed various conversion options, name authority records (NARs), and issues related to local systems.  On January 16, 2000, RLG held another forum on Pinyin conversion.  At this forum, participants discussed RLG and OCLC conversion services, the conversion of non-Chinese records, and miscellaneous issues related to the implementation of Pinyin conversion.  The latter included, in particular, how to mark bibliographic and authority records that have been processed through Pinyin conversion.


The following key assumptions have emerged from these planning discussions:

3. A Recommended Timeline for CEAL libraries

II.    Options for CEAL Libraries and Analyses of Local Conversion Needs

1.      Conversion options utilizing national bibliographic utilities

 

CEAL libraries have the following options for conversion by the national utilities:

 

OCLC Services

 

Option 1: Conversion Based on the Library's Local Database.  Under this option, an individual library sends a file of MARC records from its local system to OCLC for conversion.  Conversion may be limited to specific fields in the bibliographic records, but OCLC had not made a final decision on this as of this writing.  The library then replaces its existing records with the converted local records returned by OCLC.

 

Option 2: Conversion Based on the Library's Archive Records.  Under this option, OCLC creates a file of a library's archive records and converts them to Pinyin.  (Note that any editing done in a library's local system will not be reflected.)  The library then replaces its current records with the converted archive records supplied by OCLC.

 

Option 3: Delivery of New Copies of Converted Master Records.  Under this option, OCLC delivers copies of converted master records to which a library's holdings symbol is attached.  (Editing done during previous uses of the record or in a library's local system would not be included.)  A library then replaces its current records with OCLC's Pinyin-converted master records.

 

Authority Records:  At the time of conversion, OCLC can optionally provide a copy of converted National Authority File records associated with headings in these bibliographic records but will not convert authority records extracted from a library’s own database. 

 

Batch loaded Records:  Batch loading software will be modified so that incoming records are converted.

 

RLG Services

 

Conversion of Records in RLG's Union Catalog (RLIN):  The Chinese language records of all libraries in the RLG union catalog will be converted in the October 2000 to April 2001 timeframe.  RLG will first convert clusters that contain LC records, followed by clusters containing records of individual libraries, beginning with the libraries that have the largest Chinese collections.  A library can order a snapshot of its converted records as soon as conversion of all its records is completed.  RLG will add a "Current Pinyin Conversion Status" page to the RLG Web site indicating which libraries' records have been completely converted and which ones are in process.

 

Conversion of Batch loaded Records:  RLG will convert Chinese language records in Wade-Giles that are batch loaded after October 1, 2000, providing the source library identifies that a file contains Chinese language records requiring conversion.  The library can request a copy of these converted records after they are loaded.

 

Libraries will handle post-conversion catalog maintenance and clean-up by themselves.  Although the bibliographic utilities can provide some help, post-conversion maintenance will be wholly the responsibility of each individual library.  Retrospective conversion projects already underway may continue to be in Wade-Giles, as both OCLC and RLG can convert these records into Pinyin after Day 2 as part of their batch loading programs.  Therefore, libraries need not be concerned if they cannot complete retrospective conversion before Day 1.  OCLC will modify its batch load software to convert incoming bibliographic records and will modify other services, such as its "Authority Control Suite" and "Bibliographic Record Notification," to reflect the needs of Pinyin conversion.  RLG will provide files of all changed headings, sorted by the frequency of their appearance in bibliographic records, to guide updates for authority records.

 

2.      Name authority records (NARs)

 

OCLC will complete conversion of NARs before Day 1.  As part of the Pinyin conversion programs, LC will compile a data dictionary of headings that should not convert.  If there is sufficient interest, LC’s Cataloging Distribution Service will distribute a file of converted NARs.  Converted NARs will also be included in the daily NACO distribution.

 

CEAL libraries should feel free to use the Chinese conventional place names that have already been established in Pinyin in the National Authority File, even in Wade-Giles records, as these forms are specifically accounted for in conversion programs.  Among Chinese conventional place names established by the Library of Congress in Pinyin, two name headings have been identified as being susceptible to double conversion.  These are: Teng Xian (Shandong Sheng, China) and Pi Xian (Jiangsu Sheng, China) (Chinese: 滕县 and 邳县).  These headings should be double-checked after bibliographic records and NARs are converted.

Due to the conversion timeline, LC’s bibliographic records and NARs will be converted before Day 1.  Double conversion of NARs can best be avoided by taking great care in how one changes Wade-Giles records after LC’s converted headings begin to appear in the name authority file.  It is advisable that CEAL libraries not use LC-converted personal and corporate name headings in Wade-Giles records until such records are converted to Pinyin.  This way, the risk of double conversion can be reduced.  It will be perfectly safe to use LC’s converted headings in bibliographic records that include the Pinyin marker (field 987) coded to indicate that the record was either created in or converted to Pinyin.
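
The risk of double conversion described above can be illustrated with a minimal sketch.  It is not the conversion program used by LC, OCLC or RLG.  The syllable table is a tiny illustrative subset, and the names WG_TO_PINYIN, convert_heading, convert_record and has_pinyin_marker are hypothetical, introduced only for this example.  The flag has_pinyin_marker stands in for a properly coded Pinyin marker, whose actual coding is described in LC's instructions.

```python
# Illustrative subset of Wade-Giles -> Pinyin syllable mappings (not a full table).
WG_TO_PINYIN = {
    "t'eng": "teng",
    "teng": "deng",
    "p'i": "pi",
    "pi": "bi",
    "hsien": "xian",
}


def convert_syllable(syllable):
    """Convert one syllable, keeping its capitalization; unknown syllables pass through."""
    target = WG_TO_PINYIN.get(syllable.lower())
    if target is None:
        return syllable
    return target.capitalize() if syllable[0].isupper() else target


def convert_heading(heading):
    """Naively convert every space-separated syllable in a heading."""
    return " ".join(convert_syllable(s) for s in heading.split())


def convert_record(record):
    """Convert a record's heading only if it is not already marked as Pinyin."""
    if record["has_pinyin_marker"]:
        return record  # already Pinyin: converting again would corrupt the heading
    return {"heading": convert_heading(record["heading"]), "has_pinyin_marker": True}


once = convert_heading("T'eng Hsien")  # correct single conversion
twice = convert_heading(once)          # unguarded second pass on a Pinyin heading
guarded = convert_record({"heading": once, "has_pinyin_marker": True})
print(once, "|", twice, "|", guarded["heading"])
```

In this sketch the unguarded second pass turns the Pinyin heading "Teng Xian" into "Deng Xian", because "teng" is also a valid Wade-Giles syllable.  The guarded call leaves the marked record untouched, which is why the marker matters.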

 

Currently, LC is formulating plans for a moratorium on creating and changing authority records with Wade-Giles romanization. LC will issue guidelines for NACO/BIBCO libraries to help them minimize the risk of double conversion.

3.      Pinyin marker

 

As CEAL libraries begin to catalog in Pinyin, they will implement a Pinyin marker in the 987 field of bibliographic records and in the 008/07 field of name authority records.  Records converted by the bibliographic utilities will also include the Pinyin marker fields to indicate their conversion status.  (See the instructions on the Pinyin marker in item 6, Appendix.)

 

4.      Other prerequisites for Day 1

 

LC will complete changes in subject headings and classification schedules.  LC and the utilities will conduct thorough tests of the conversion specifications for accuracy and will notify libraries of the final specifications.

 

5.      Split files

 

While CEAL libraries will create all new records in Pinyin after Day 1, there will be a period (estimated to last no more than six months) during which the bibliographic utilities will contain a mixture of Wade-Giles and Pinyin records.  During this period, to ease the work of copy cataloging, libraries may choose to accept a mix of records in Pinyin and Wade-Giles.  In addition, libraries will have to keep their Wade-Giles records prior to total conversion.  After Day 1, libraries should immediately begin to prepare for Day 2, the date when they will do all cataloging in Pinyin.  In this period of split files, before conversion of a library's records is complete, libraries should make sure that a Pinyin marker is properly inserted whenever a Pinyin record is created or adopted.  It is noteworthy that there are already records with Pinyin headings in the national and local databases without a Pinyin marker attached.  Such occurrences should be minimized.  OCLC and RLG should complete conversion of their entire databases of Chinese records by April 1, 2001.
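
During the split-file period, a library may wish to look for records that appear to be in Pinyin but lack a marker.  The following is a rough sketch of one possible check, under stated assumptions.  Records are modeled as simple dictionaries with a hypothetical "tags" list of MARC field tags and a romanized Chinese "heading", and the function names are invented for this example.  The test relies on the fact that standard Wade-Giles has no syllables beginning with b, d, g, q, r, x or z, or with c not followed by h.  It is a heuristic only: headings whose syllables are spelled the same in both systems are not flagged, and it should not be applied to non-Chinese text.

```python
import re

# Initials that occur in Pinyin but not in standard Wade-Giles syllables.
PINYIN_ONLY_INITIAL = re.compile(r"^(?:[bdgqrxz]|c(?!h))", re.IGNORECASE)


def looks_like_pinyin(heading):
    """Return True if any syllable starts with an initial that Wade-Giles lacks."""
    syllables = re.split(r"[\s\-]+", heading.strip())
    return any(PINYIN_ONLY_INITIAL.match(s) is not None for s in syllables if s)


def unmarked_pinyin_candidates(records):
    """List ids of records with no 987 field whose heading looks like Pinyin."""
    return [
        r["id"]
        for r in records
        if "987" not in r["tags"] and looks_like_pinyin(r["heading"])
    ]


# Hypothetical sample records for illustration only.
sample = [
    {"id": 1, "tags": ["245"], "heading": "Chung-kuo li shih"},      # Wade-Giles
    {"id": 2, "tags": ["245"], "heading": "Zhongguo li shi"},        # Pinyin, no marker
    {"id": 3, "tags": ["245", "987"], "heading": "Beijing da xue"},  # Pinyin, marked
    {"id": 4, "tags": ["245"], "heading": "Shang-hai"},              # ambiguous form
]
print(unmarked_pinyin_candidates(sample))
```

Records flagged this way would still need review by a cataloger before a marker is added.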

 

6.      Conversion of non-Chinese records and non-standard Chinese romanization forms

 

OCLC will scan its entire database for Wade-Giles headings in non-Chinese records.  Wade-Giles name headings in non-Chinese language records will be converted to Pinyin by OCLC after Chinese bibliographic records are converted.  The changed records will be distributed to member libraries if they have chosen conversion option 3 described above.  Non-standard Chinese romanization forms will be ignored during this database scan and will be converted manually if necessary.

 

RLG may schedule the conversion of Wade-Giles strings in Japanese and Korean language records and in records with “Chinese” listed in the 041 field after completing the conversion of Chinese language records in the RLG union catalog, but this is not part of the 2000-2001 project currently underway.

 

In CEAL member libraries, the conversion of name headings from Wade-Giles to Pinyin in non-Chinese records will be mostly a local task.  Libraries using OCLC services should make certain that changed non-Chinese records distributed by OCLC properly replace equivalent local records in their own databases.  Libraries using RLG services are encouraged to contact RLG regarding how to convert their non-Chinese records, as this is not part of RLG's announced 2000-2001 conversion project.

 

7.      Implementation of Pinyin conversion by CEAL libraries

 

Libraries should start to plan the implementation of Pinyin conversion immediately.  This should include deliberations on budget implications, conversion options, systems implications, the use of the Pinyin markers and Name Authority Records, cataloging workflow, and procedures such as the use of new cataloging schedules.

 

Each CEAL library should ask its parent library to set up a taskforce with representatives from the Chinese collection and cataloging units, the library’s central cataloging department, information systems and other relevant personnel.  It is also imperative that CEAL libraries begin to communicate with the national utilities regarding their conversion services.

 

Staff training and user education are also critical to the success of Pinyin conversion.  Cataloging and acquisitions staff involved in the processing of Chinese language materials need to be trained in the new Pinyin romanization scheme and in LC’s new subject headings and classification schedules. 

 

8.      Implications for Public Services

 

Library users need to be informed of the conversion and to be provided with proper search guides.  Special attention should be given to word division in Pinyin, as the standard being implemented differs somewhat from what may be familiar to users.  User education is especially critical during the period when two romanization forms co-exist in catalogs and networks.  Efforts should be made to guide users in searching the split files between October 1, 2000 and October 1, 2001.  Libraries need to prepare users' guides and other relevant handouts to assist users in searching the converted local OPAC.  Pinyin conversion will also necessitate the re-labeling of current Chinese periodicals and their back files if such materials are shelved alphabetically by title.  Such serials re-labeling projects should be coordinated with the conversion of the associated records to Pinyin.


Appendix: Major Pinyin conversion tools and documentation

 

1.      LC’s Pinyin conversion timeline

http://lcweb.loc.gov/catdir/pinyin/timeline.html

 

2.      LC’s New Chinese Romanization guidelines

http://lcweb.loc.gov/catdir/pinyin/romcover.html

 

3.      Classification schedules: Chinese literary authors

http://lcweb.loc.gov/catdir/pinyin/authors1949.html

http://lcweb.loc.gov/catdir/pinyin/authors2001.html

 

4.      Classification schedule: changes to Chinese conventional place names

http://lcweb.loc.gov/catdir/pinyin/class6.html

 

5.      Conventional Chinese place names

http://lcweb.loc.gov/catdir/pinyin/placefaq.html

 

6.      Pinyin markers

http://lcweb.loc.gov/marc/pinyin.html

http://lcweb.loc.gov/catdir/pinyin/authorities.html

 

7.      LC’s announcement on the Pinyin conversion project

http://lcweb.loc.gov/catdir/pinyin/announce.html

 

Works Consulted


[1] Cataloging Service, bulletin no. 42.

 

[2] “Pinyin: possible approaches for cataloging and automation,” prepared by Collections Services, Library of Congress.  In Committee on East Asian Libraries Bulletin, no. 90 (June 1990), p. 56-62.

 

[3] LC Information Bulletin, Vol.  38, No.  26, June 29, 1979.

 

[4] “Library of Congress Position on the Use of Pinyin Romanization.”  In Committee on East Asian Libraries Bulletin, no. 92 (February 1991), p. 32.

 

[5] “Summary report of the CEAL task force to review a possible change from the Wade-Giles to the Pinyin romanization system.”  In Journal of East Asian Libraries, no. 115 (June 1998), p. 40-44.


Originally posted on eastlib on 18 April 2000.