A consistent collection of terms
chosen for specific purposes
with explicitly stated, logical constraints
on their intended meanings and relationships.
Terms represent concepts, but it is the concepts themselves and their relationships, not the terms, that constitute the thesaurus.
Terms are related to one another in three different ways:
We also support a USE-WITH relationship in which a non-preferred text is associated with two descriptors. An example is "soil pollution" USE "contamination and pollution" WITH "soil resources".
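As a rough illustration of how an indexing tool might resolve entry terms, the sketch below models a USE-WITH entry as non-preferred text that maps to two coordinated descriptors. The `ENTRY_TERMS` mapping, the `resolve` function, and the case-insensitive matching are hypothetical. Only the "soil pollution" example comes from the text above.

```python
from typing import Dict, Tuple

# Hypothetical mapping from non-preferred (entry) text to the preferred
# descriptor(s) an indexer should apply. A USE-WITH entry maps to two
# descriptors that are applied together.
ENTRY_TERMS: Dict[str, Tuple[str, ...]] = {
    # USE-WITH example taken from the text
    "soil pollution": ("contamination and pollution", "soil resources"),
}


def resolve(text: str) -> Tuple[str, ...]:
    """Return the descriptor(s) to index with for the given text.

    Matching is case-insensitive here (an assumption). Text that is not a
    known non-preferred entry is assumed to already be a preferred term.
    """
    key = text.strip().lower()
    return ENTRY_TERMS.get(key, (text.strip(),))


print(resolve("Soil pollution"))
# ('contamination and pollution', 'soil resources')
```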
Each concept represented by a term in the thesaurus has several components that identify, explain, and relate the term to other terms:
The USGS thesaurus and the other controlled vocabularies we use are stored in a relational database. Each thesaurus has three tables: one for preferred terms, including their hierarchical relationships; one for non-preferred terms; and one for the see-also links from one preferred term to another. A single table holds information about all of the thesauri in the database. Two associative tables link each thesaurus to its alternative names and to the category terms assigned to it. These tables are illustrated in the following simple entity-relationship diagram:
Table | Field | Description |
---|---|---|
term | | Table of preferred terms, with hierarchical relationships. Field parent indicates the broader term. |
term | code | Unique identifier of a term; may be integer or character. |
term | name | Text of the term, character. |
term | parent | Unique identifier of the parent (broader) term, a value of term:code, or NULL. |
term | scope | Scope note for the term, often serving as a general definition. May be NULL. |
nonpref | | Table of non-preferred terms. The field named also identifies the coordinated term in a USE-WITH relationship. |
nonpref | code | Unique identifier of a preferred term, a value of term:code. |
nonpref | name | Non-preferred text, character. Cannot match any value of term:name. |
nonpref | also | Unique identifier of the coordinated term (a value of term:code) if the non-preferred text describes a USE-WITH relationship, or NULL if it describes a USE-FOR relationship. |
nonpref | visible | Integer flag indicating whether the non-preferred term should be shown to end users: 1 if the term is informative, 0 if it is a misspelling. |
relterm | | Table of non-hierarchical "see-also" relationships between terms. |
relterm | a | Unique identifier of a term, a value of term:code. |
relterm | b | Unique identifier of the related term, a different value of term:code. |
thesaurus | | Information about each thesaurus present in the database. |
thesaurus | tag | Unique integer identifying the thesaurus. In web services this value is named thcode. |
thesaurus | name | Preferred name of the thesaurus. Alternative names are specified in the associative table thname. |
thesaurus | creator | Person or organization responsible for creating the thesaurus. |
thesaurus | rights | Legal statement of any limitations on use of the thesaurus. |
thesaurus | edition | Version number or other edition indicator. |
thesaurus | date | Revision date, in format YYYY-MM-DD. |
thesaurus | tblname | Name of the table containing preferred terms, for example term. |
thesaurus | codetype | Term indicating whether the unique identifiers of the terms are integer (value number) or alphanumeric (value alpha). |
thesaurus | contact | Name or email address of the primary contact for the thesaurus. |
thesaurus | nonpref | Name of the table containing non-preferred terms, for example nonpref. |
thesaurus | relterm | Name of the table containing non-hierarchical "see-also" relationships between terms, for example relterm. |
thesaurus | prefix | Suggested short prefix to use in XML namespace declarations. |
thesaurus | uri | Uniform Resource Identifier for the thesaurus, for use in XML namespace declarations. |
thesaurus | mdate | Last modified date of this record, in format YYYY-MM-DD. |
thesaurus | mtime | Last modified time of this record, in format HH:MM:SS. |
thesaurus | userid | User name or email address of the person who last modified this record. |
thesaurus | scope | General scope statement indicating the intended usage and purpose of the thesaurus. |
thname | | Table of alternative names for thesauri. |
thname | thcode | Unique identifier of a thesaurus, a value of thesaurus:tag. |
thname | name | Alternative name of the thesaurus. Must not match any value of thesaurus:name. |
thcategory | | Table of category terms assigned to thesauri. |
thcategory | thcode | Unique identifier of the thesaurus to which the term is assigned, a value of thesaurus:tag. |
thcategory | thesaurus | Unique identifier of the thesaurus from which the assigned term is drawn, a value of thesaurus:tag. |
thcategory | code | Unique identifier of a term from the thesaurus identified by field thesaurus. |
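The three per-thesaurus tables can be sketched in SQLite as shown below. The sketch includes two lookups: walking the parent chain to collect broader terms, and resolving non-preferred text through the also field. The column names follow the fields described above. The SQL types, the constraints, the helper names (`demo_db`, `broader_terms`, `resolve_nonpref`), and the sample rows (including "example broader term") are assumptions made for illustration. This is not the schema of the downloadable database. The rule that nonpref:name must not match any term:name is not enforced here, because SQLite CHECK constraints cannot contain subqueries.

```python
import sqlite3

# Sketch of the term, nonpref, and relterm tables. Types and constraints
# are assumptions; integer codes correspond to codetype 'number'.
SCHEMA = """
CREATE TABLE term (
    code   INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    parent INTEGER REFERENCES term(code),  -- NULL for a top term
    scope  TEXT
);
CREATE TABLE nonpref (
    code    INTEGER NOT NULL REFERENCES term(code),
    name    TEXT NOT NULL,
    also    INTEGER REFERENCES term(code),  -- NULL means USE-FOR
    visible INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE relterm (
    a INTEGER NOT NULL REFERENCES term(code),
    b INTEGER NOT NULL REFERENCES term(code),
    CHECK (a <> b)
);
"""


def demo_db():
    """Build an in-memory database holding a few hypothetical rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "INSERT INTO term (code, name, parent) VALUES (?, ?, ?)",
        [(1, "example broader term", None),  # hypothetical parent
         (2, "contamination and pollution", 1),
         (3, "soil resources", None)],
    )
    # USE-WITH row: the text points to code 2 and, via also, to code 3.
    db.execute(
        "INSERT INTO nonpref (code, name, also, visible) VALUES (?, ?, ?, ?)",
        (2, "soil pollution", 3, 1),
    )
    return db


def broader_terms(db, code):
    """Return names of all broader terms, nearest first (recursive CTE)."""
    rows = db.execute(
        """
        WITH RECURSIVE up(code, parent, depth) AS (
            SELECT code, parent, 0 FROM term WHERE code = ?
            UNION ALL
            SELECT t.code, t.parent, up.depth + 1
            FROM term t JOIN up ON t.code = up.parent
        )
        SELECT t.name FROM up JOIN term t ON t.code = up.code
        WHERE up.depth > 0 ORDER BY up.depth
        """,
        (code,),
    ).fetchall()
    return [name for (name,) in rows]


def resolve_nonpref(db, text):
    """Map non-preferred text to one name (USE-FOR) or two (USE-WITH)."""
    row = db.execute(
        """
        SELECT p.name, q.name
        FROM nonpref n
        JOIN term p ON p.code = n.code
        LEFT JOIN term q ON q.code = n.also
        WHERE n.name = ?
        """,
        (text,),
    ).fetchone()
    if row is None:
        return []
    return [name for name in row if name is not None]


db = demo_db()
print(resolve_nonpref(db, "soil pollution"))
# ['contamination and pollution', 'soil resources']
print(broader_terms(db, 2))
# ['example broader term']
```

Keeping the hierarchy as a single parent column makes each term's broader term a simple lookup. Full ancestor chains then need a recursive query like the one above.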
Download | Size | Content | Format |
---|---|---|---|
thesauri.zip | 8 MB | All thesauri | SQLite |
USGSThesaurus.rdf | 960 kB | USGS Thesaurus | SKOS RDF-XML |
MarinePlanningData.rdf | 100 kB | Data Categories for Marine Planning | SKOS RDF-XML |
CMECS.rdf | 575 kB | Coastal and Marine Ecosystem Classification System | SKOS RDF-XML |
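The SKOS files can be read with the Python standard library. The sketch below assumes that each concept is serialized as a `<skos:Concept rdf:about="...">` element with `skos:prefLabel` and `skos:broader` children. RDF/XML allows other equivalent forms (for example `rdf:Description` with an `rdf:type`), and those are not handled here. The URIs in `SAMPLE` and the function name `read_concepts` are hypothetical. For a downloaded file, `ET.parse(path).getroot()` would supply the root element instead of `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical two-concept sample in SKOS RDF/XML.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/th/1">
    <skos:prefLabel>example broader term</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/th/2">
    <skos:prefLabel>contamination and pollution</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/th/1"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Return {concept URI: (prefLabel, [broader concept URIs])}.

    Only typed skos:Concept nodes are read. If a concept has several
    prefLabels (e.g. one per language), only the first is kept.
    """
    root = ET.fromstring(xml_text)
    concepts = {}
    for node in root.iter(f"{{{SKOS}}}Concept"):
        uri = node.get(f"{{{RDF}}}about")
        label = node.findtext(f"{{{SKOS}}}prefLabel")
        broader = [b.get(f"{{{RDF}}}resource")
                   for b in node.findall(f"{{{SKOS}}}broader")]
        concepts[uri] = (label, broader)
    return concepts


print(read_concepts(SAMPLE)["http://example.org/th/2"])
# ('contamination and pollution', ['http://example.org/th/1'])
```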
Specialists recognize two strategies for building controlled vocabularies. In a top-down approach, terms and their relationships are defined intuitively before they are applied in indexing. In a bottom-up approach, terms and relationships are added to the vocabulary during indexing. The same specialists also recognize that most vocabularies are developed using a combination of these two approaches. We developed the USGS thesaurus using this combined strategy. We began by listing many important terms, grouped them using a card-sorting procedure, and then refined the hierarchy intuitively (that is, by relying on what we know). Subsequent revisions have been made through group deliberation.
Preliminary development of the thesaurus was carried out by a contractor using commercial software (MultiTES). Subsequent development and revision have taken place in a web-based database application that the group developed to meet the specific needs of this project.
We examined many similar controlled vocabularies of various types before and during this process. Examples are the GEOREF thesaurus produced by the American Geological Institute, the CERES thesaurus, the Geographic Names Information System (GNIS), the Integrated Taxonomic Information System (ITIS), the categorization scheme used in the Marine Realms Information Bank, and numerous smaller or more specialized vocabularies such as glossaries of scientific and technical terms presented on USGS web sites.
Name | Organization | Expertise |
---|---|---|
VeeAnn Cross | Woods Hole Coastal and Marine Science Center | Ocean sciences, data management |
Arnell Forde | St. Petersburg Coastal and Marine Science Center | Ocean sciences, data management |
David Govoni | Office of Enterprise Information (Retired) | Live birds, dead bugs |
Leslie Hsu | Community for Data Integration | Earth science, data integration |
Cassandra Ladino | Office of Enterprise Information | Administrative information systems, data management |
Amanda Liford | Science Analysis and Synthesis | Library Science, data management |
Lisa Zolly | Science Analysis and Synthesis | Library Science, software development |
Name | Organization | Expertise |
---|---|---|
USGS employees | | |
Alan Allwardt | Geology-Pacific Coastal and Marine Science Center | Earth Science, Library Science |
Karen Arcamonte | Biology | Library science |
Hylan Beydler | Geography-MCMC | Land characterization |
Nancy Blair | GIO-Library | Library coordination, cataloging & indexing |
Linda Broussard | Biology-Library | Life sciences, records management |
Pamela Callais | GIO-Library | Cataloging & indexing |
Brian Carpenter | GIO-Library | Library Science |
Liz Ciganovich | Water-CAPP | Publications |
Susan Cochran | Pacific Coastal and Marine Science Center | Ocean sciences, data management |
Wendy Danchuk | Hydrology | Cartography, publications |
Jeff Dietterle | GIO-EWeb | Geography, publication |
Carmelo Ferrigno | GIO-EWeb | Information architecture & design |
Karen Kaye | Biology | Information architecture |
Richard Huffine | GIO-SIEO | Library Science |
Irena Kavalek | GIO-Library | Cataloging & indexing |
Fran Lightsom | Woods Hole Coastal and Marine Science Center | Ocean science, data management |
Celso Puente | Water | Hydrology |
Peter Schweitzer | Geology, Energy, and Minerals Science Center (Retired) | Earth science, software development, data management |
Gary Waggoner | Biology-CBI | Life sciences |
Gail Wendt | Communications | Hydrology, communication, publications |
Consultants and outside reviewers | | |
Linda Hill | Alexandria Digital Library, UC Santa Barbara | |
Gail Hodge | Information International Associates, Inc. | |
Candy Schwartz | Graduate School of Library and Information Sciences, Simmons College | |
Jessica Milstead | The JELEM Company | |
Amy Warner | Lexonomy Information Architecture Consulting |