Data and Metadata Review: Water-related Data and Information Annotated Bibliography and URLs
Last Updated: Thursday, February 16, 2006
Appendix 1 - An Overview of the FGDC Geospatial Metadata Standard
Appendix 2 - A simple example of metadata (Dublin Core and IEEE Learning Object)
Appendix 3 - Examples of Mapping between Metadata formats
Appendix 4 - World Water Vision - Vision Library
Appendix 5 - Water Related Databases: Result of the inventory made by the Working Group on Information Management for the 14th session of the ACC Subcommittee on Water Resources and other identified relevant databases
Appendix 6 - World Water Day 2000 - UN and Freshwater Issues - A brief Survey of Facts and Links by Gunilla Bjorkland, GeWa Consulting and the Global Water Partnership
Appendix 7 - EarthWatch - Many United Nations organizations, programmes, specialized agencies and convention secretariats are partners in the UN system-wide Earthwatch, contributing in some way to environmental observation, assessment and reporting activities that provide information for decision-making.
1. Executive summary
A proposed working definition of metadata for these purposes is: metadata is data or information that is stored and available and that partially or completely describes the characteristics, properties, format, location, availability, provenance and other features of the original sets of data and information.
There are many current metadata schemes; of these, many are concerned with "document-like" objects and a few deal with data in databases. The proposed metadata database will encompass these and other types of data and information.
Some of the principles proposed for this project are:
1. The data and information should remain in the possession and control of the originators (or owners) of that data (if that is their stated objective).
2. Wherever possible, the originators (owners) of the data should be responsible for data entry, editing, maintenance and updating.
3. A true copy of the metadata should be kept separate from the original data. The metadata may also be kept with the original data and edited and forwarded by its originators/owners, but at least one copy should always be held separately.
4. The metadata should be kept in an efficient and rapid database, preferably one accessed through a standard, non-proprietary query language such as SQL, to ensure the widest possible audience and greatest ease of access.
5. The design of the metadata database will be cooperative, comprehensive and staged.
6. Whenever appropriate, we will use previous efforts at metadata descriptions and implementations. As examples, the FGDC/ISO metadata set for geospatial information could (should) be used as the basis for the geospatial components and the Dublin Core/Warwick Framework could (should) be used as the basis for document descriptions.
7. The issue of multi-language implementations of the metadata database should be addressed during the design process. As an outgrowth of the metadata database process, a multilingual "concordance" will be produced. It will relate the terms used to describe water data and information to their synonyms or functionally equivalent terms in the various languages.
8. The technical aspects of the metadata database implementation are complex and should be examined as the design process unfolds. As the various component parts of the database (such as the scope, data descriptions, metadata descriptors, classification "trees", ownership issues, accessibility and security issues, multilingual access, and concordances) are designed, the database structure and design should be concurrently modified and improved.
A possible schema for implementation is presented. It would preserve the security of the "master copy" of the metadata database while enabling regional deployment of replicated copies to other servers. It would also permit remote data entry and editing by the owners or managers of the information.
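The ownership rule behind this schema (owners enter and edit their own metadata remotely, while the master copy stays protected) can be sketched in a few lines of Python. This is a minimal illustration under assumed names: `MetadataRecord`, `MasterStore` and the owner identifiers are hypothetical, and a real system would authenticate editors rather than trust a supplied name.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """One metadata entry (hypothetical structure)."""
    record_id: str
    owner: str  # originator/owner responsible for entry and editing
    fields: dict = field(default_factory=dict)


class MasterStore:
    """Hypothetical master copy: only a record's owner may create or change it."""

    def __init__(self) -> None:
        self._records: dict[str, MetadataRecord] = {}

    def submit(self, editor: str, record: MetadataRecord) -> None:
        """Accept a new or edited record from a remote editor."""
        existing = self._records.get(record.record_id)
        # The editor must be the owner named in the submission...
        if record.owner != editor:
            raise PermissionError(f"{editor} cannot submit a record owned by {record.owner}")
        # ...and, for an update, the owner of the record already stored.
        if existing is not None and existing.owner != editor:
            raise PermissionError(f"{editor} may not edit {record.record_id}")
        self._records[record.record_id] = record

    def get(self, record_id: str) -> MetadataRecord:
        return self._records[record_id]


store = MasterStore()
store.submit("agency-a", MetadataRecord("r1", "agency-a", {"title": "River flow data"}))
try:
    store.submit("agency-b", MetadataRecord("r1", "agency-b", {"title": "Overwritten"}))
except PermissionError as err:
    print(err)  # agency-b may not edit r1
```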
The users would interact with the proposed metadata database in many different ways. Since it will be an open system, users will be free to use the mechanisms that will be provided (menu-driven searches, fill-in forms, etc.) to find data sources in the database. They will also have the choice of writing their own more complex queries, or using queries produced by others. The goal is to make the system as easy as possible to use for non-expert users, while retaining the option for experts to design and build whatever interfaces to the metadata they prefer.
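As a rough sketch of how a menu or fill-in form could drive searches without users writing SQL, the Python example below turns form selections into a parameterized query and returns the location of the data rather than the data itself. The table layout, column names and sample row are assumptions made for illustration, and SQLite stands in for whichever SQL server is eventually chosen.

```python
import sqlite3

# Hypothetical: the columns a search form is allowed to filter on.
SEARCHABLE_FIELDS = {"subject", "coverage", "language", "format"}


def build_query(choices: dict[str, str]) -> tuple[str, list[str]]:
    """Turn menu or form selections into a parameterized SQL query.

    Only whitelisted column names are accepted and values are passed as
    parameters, so users never write SQL themselves.
    """
    clauses, params = [], []
    for column, value in sorted(choices.items()):
        if column not in SEARCHABLE_FIELDS:
            raise ValueError(f"unknown search field: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    # The result is a pointer to the data (its location), not the data itself.
    return f"SELECT title, location FROM metadata WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata (title TEXT, subject TEXT, coverage TEXT,"
    " language TEXT, format TEXT, location TEXT)"
)
conn.execute(
    "INSERT INTO metadata VALUES (?, ?, ?, ?, ?, ?)",
    ("Example discharge series", "hydrology", "Example basin", "en", "CSV",
     "https://example.org/data/discharge.csv"),
)
sql, params = build_query({"subject": "hydrology", "language": "en"})
for title, location in conn.execute(sql, params):
    print(title, "->", location)
```

Whitelisting column names while passing values as parameters keeps the form flexible without exposing the database to arbitrary SQL.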
2. Background
Numerous working definitions of metadata exist.
1. The Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) defines metadata simply as "data about data" that describe the content, quality, condition, and other characteristics of data (http://www.fgdc.gov/). The FGDC standards for geospatial data are being harmonized with the ISO metadata standards (http://www.fgdc.gov/metadata/whatsnew/fgdciso.html).
An overview of the FGDC CSDGM metadata schema is given in Appendix 1.
2. The World Wide Web Consortium definition is that "Metadata means 'data about data' or 'information about information', but probably more importantly now it should be taken to mean 'machine understandable information, about information on the web'." This reflects the Consortium's emphasis on the Internet and the WWW. Berners-Lee (2000, http://www.w3.org/DesignIssues/Metadata.html) gave a very simple definition of metadata as it applies to the WWW: "machine understandable information about web resources or other things". Some axioms flow from this concept. The first is that "metadata is data" and can be treated as such. The second is that "metadata about one document can occur within the document, or within a separate document, or it may be transferred accompanying the document", and consequently "metadata can describe metadata".
In the Dublin Core Metadata Initiative (http://purl.org/dc) project, metadata is "a description of objects, documents or services which may contain data about their form and content. It may be part of the resources themselves or kept separately from them". Typical metadata would be "the catalog records for printed publications: e.g. CIP-records and tables of content when included in the document, catalog records when stored in separate OPACs or abstract and index databases."
In essence, the Dublin Core metadata element set (at http://purl.oclc.org/metadata/dublin_core) was intended to improve the recall and the precision of results when retrieving information, primarily from the World Wide Web. It is now being extended as a generic metadata standard for libraries, archives, government and other publishers of online information. The original Dublin Core standard was restricted in its scope to "document-like objects" and was deliberately limited to a small set of 15 elements that apply to a wide range of such information resources. A review of the Dublin Core initiative was presented by Weibel (1999) (The State of the Dublin Core Metadata Initiative, April 1999. D-Lib Magazine, April 1999, Volume 5, Number 4, at http://www.dlib.org/dlib/april99/04weibel.html).
The 15 core elements of the Dublin Core, and a comparison with the IEEE Learning Objects metadata set, are given in Appendix 2.
3. The Warwick Framework was developed in an attempt to make metadata more interoperable and less dependent upon specific schema (Lagoze, Lynch, and Daniel 1996, http://www.dlib.org/dlib/july96/lagoze/07lagoze.html). The framework proposes a "container-package" model. It is a conceptual framework; specific methods for managing the containers are provided by the particular application. The container is simply any mechanism for aggregating packages, which may be of three types. The first type is the "primitive" package, which contains one or more pieces of metadata and is labeled as a MARC package, a Dublin Core package, an FGDC package, etc. The second type is an "indirect" package, which refers to another information resource (a URL or other reference), and the third type is a "container" package. There is no limit to the degree of nesting involved (Cathro, 1997). The framework is designed to let designers of individual metadata sets focus on their specific requirements without having to be concerned with any broader (possibly unbounded) scope of generic or other metadata schema. It permits developers to vary the syntax of their metadata sets depending on their semantic requirements, domain of expertise or community-of-interest practices. Developers can also optimize functional (processing) requirements for the kind of metadata in question. It effectively allows each community of expertise to manage its own specific metadata sets. It promotes interoperability by allowing users and software to access individual packages separately, and it accommodates future metadata sets. A good literature review relating the Dublin Core and the Warwick Framework was performed by Thiele (1997) (The Dublin Core and Warwick Framework: A Review of the Literature, March 1995 - September 1997. D-Lib Magazine, January 1998, at http://www.dlib.org/dlib/january98/01thiele.html).
4. The US National Committee for Information Technology Standards (NCITS), Technical Committee L8 on Data Representation, simply defines metadata as "data about data". The focus of its work is "on establishing ways to describe data to facilitate human use and to enable intelligent computer processing. Data is described through use of metadata (data about data). Metadata issues covered by the committee include naming, identification, definitions, classification, and registration." (http://www.ncits.org/). The committee is also working closely with other organizations such as the relevant FGDC and ISO committees. Other initiatives to harmonize different metadata schema are under way (Baker and Lynch, 1998).
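To make the Dublin Core and Warwick Framework descriptions above more concrete, here is a minimal Python sketch of the container-package model holding a Dublin Core primitive package, an indirect package and a nested container. The class names, labels and URL are hypothetical; only the list of 15 element names comes from the Dublin Core element set itself.

```python
from dataclasses import dataclass, field

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


@dataclass
class PrimitivePackage:
    """Metadata of one labeled kind, e.g. 'DublinCore', 'MARC' or 'FGDC'."""
    label: str
    content: dict


@dataclass
class IndirectPackage:
    """A reference (such as a URL) to metadata held elsewhere."""
    label: str
    reference: str


@dataclass
class Container:
    """Aggregates packages; containers may themselves be nested without limit."""
    packages: list = field(default_factory=list)

    def find(self, label: str) -> list:
        """Return all packages with this label, searching nested containers too."""
        found = []
        for package in self.packages:
            if isinstance(package, Container):
                found.extend(package.find(label))
            elif package.label == label:
                found.append(package)
        return found


def dublin_core_package(**elements: str) -> PrimitivePackage:
    """Build a Dublin Core primitive package, rejecting non-DC element names."""
    unknown = set(elements) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return PrimitivePackage("DublinCore", dict(elements))


record = Container([
    dublin_core_package(title="Annual water quality report", language="en"),
    IndirectPackage("FGDC", "https://example.org/metadata/fgdc-record.xml"),
    Container([PrimitivePackage("LocalStationData", {"stations": "12"})]),
])
print(record.find("DublinCore")[0].content["title"])
```

Each community can define its own package label and content without the container needing to understand it, which is the interoperability point the framework makes.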
The concept of "metadata" is most useful in navigating through large quantities of data and information when specific information is needed. Cathro (1997) states that the key purpose of metadata "is to facilitate and improve the retrieval of information" and uses the standard metrics of information retrieval of recall (amount of relevant information retrieved) and precision (relevance of the retrieved information). In the case of environmental information, these concepts are particularly useful since environmental data and information can be extremely varied in format, location, accuracy, precision, validity, availability and quantity. If this data and information is "coded" in some fashion by descriptors that then become the metadata set, navigation, retrieval (recall and precision) and display can be substantially enhanced.
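Recall and precision, as used by Cathro, can be computed directly once the set of retrieved items and the set of truly relevant items are known. The sketch below assumes both are available as Python sets of identifiers, which in practice requires a human judgment of relevance.

```python
def recall_and_precision(retrieved: set, relevant: set) -> tuple[float, float]:
    """Recall: share of the relevant items that were retrieved.
    Precision: share of the retrieved items that are relevant."""
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search: 10 relevant documents exist; 8 are returned, 6 of them relevant.
relevant = {f"doc{i}" for i in range(10)}
retrieved = {f"doc{i}" for i in range(6)} | {"unrelated1", "unrelated2"}
print(recall_and_precision(retrieved, relevant))  # (0.6, 0.75)
```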
There are many other proposed metadata systems, including BibTeX, Categories for the Description of Works of Art (CDWA), CIMI (Computer Interchange of Museum Information), the EELS Metadata Format, the EEVL Metadata Format, Government Information Locator Service (GILS), IAFA/whois++ Templates, ICPSR SGML Codebook Initiative, LDAP Data Interchange Format (LDIF), MARC, USMARC, UKMARC, UNIMARC, PICA+, RFC 1807, Summary Object Interchange Format (SOIF), Text Encoding Initiative (TEI) Independent Headers, and the Uniform Resource Characteristics/Citations (URCs). For more details of these and other initiatives, see an annotated bibliography of metadata references at:
http://www.inweh.unu.edu//unuinweh/metadata/searches/AnnotatedBibliography1.htm
or a similar listing in a Java-based searchable catalog at
http://www.inweh.unu.edu/unuinweh/metadata/Javacatalog/metadatainformation.htm
Other summaries are available from Dempsey (1996) and Dempsey and Heery (1997).
Proposed Working Definition of Metadata
A proposed working definition is:
Metadata is data or information that is stored and available and that partially or completely describes the characteristics, properties, format, location, availability, provenance and other features of the original sets of data and information.
This definition implies that:
- It can be complete or incomplete.
- Individual "parts" of the metadata set can be required or optional in different metadata schemas.
- It should be machine-readable and not simply a paper catalog.
- It can be stored separately from or together with the original data and information.
- It can be used apart from the original data and information to search and locate data and information.
- It can be produced according to an agreed format or can be interconverted between different metadata formats.
- It can be extended as more information becomes available or requirements evolve and change (i.e. it can be edited and extended).
To have maximum utility, it should also have the following properties:
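The requirement that metadata be interconvertible between formats is usually met with a "crosswalk" table. The Python sketch below maps a few Dublin Core elements onto FGDC CSDGM element paths (using CSDGM short names) and reports any elements it could not map. The mapping is a simplified assumption for illustration, not an official crosswalk, and it handles only one value per element.

```python
import xml.etree.ElementTree as ET

# Simplified, partial crosswalk from Dublin Core elements to FGDC CSDGM element
# paths (CSDGM short names). Illustrative only, not an official mapping.
DC_TO_CSDGM = {
    "title": "idinfo/citation/citeinfo/title",
    "creator": "idinfo/citation/citeinfo/origin",
    "date": "idinfo/citation/citeinfo/pubdate",
    "description": "idinfo/descript/abstract",
    "subject": "idinfo/keywords/theme/themekey",
}


def _child(parent: ET.Element, tag: str) -> ET.Element:
    """Return the existing child with this tag, or create it."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    return node


def dc_to_csdgm(dc_record: dict[str, str]) -> tuple[ET.Element, list[str]]:
    """Convert a flat Dublin Core record (one value per element) to a partial CSDGM tree.

    Returns the XML root and the names of elements that have no mapping.
    """
    root = ET.Element("metadata")
    unmapped = []
    for element, value in dc_record.items():
        path = DC_TO_CSDGM.get(element)
        if path is None:
            unmapped.append(element)
            continue
        node = root
        for tag in path.split("/"):
            node = _child(node, tag)
        node.text = value
    return root, unmapped


root, unmapped = dc_to_csdgm({
    "title": "Groundwater levels",
    "creator": "Example water agency",
    "description": "Monthly observation-well readings.",
    "rights": "Open access",
})
print(ET.tostring(root, encoding="unicode"))
print("not mapped:", unmapped)
```

Reporting unmapped elements, rather than silently dropping them, makes it visible where two schemas do not correspond.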
- It should be searchable using different search mechanisms and software.
- It should be separate from the mechanisms used to organize and search.
- It should be searchable in different languages.
- It should be easy to produce and edit.
- It should be stored in an accessible format and location(s).
- It should be stored in a secure format to prevent alteration by unauthorized persons.
- It should not require the use of proprietary formats and software.
- It should be easy to produce an intelligible "data description" or "data catalog" from the metadata.
- It should account for all the different types of data and information that are currently used and be able to incorporate future changes and developments.
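Producing an intelligible data catalog from the metadata can be as simple as grouping and sorting records. The Python sketch below assumes each record is a dictionary with hypothetical `title`, `subject` and optional `location` keys and renders a plain-text catalog.

```python
from collections import defaultdict


def data_catalog(records: list[dict]) -> str:
    """Render metadata records as a plain-text catalog grouped by subject."""
    by_subject = defaultdict(list)
    for record in records:
        by_subject[record.get("subject", "unclassified")].append(record)
    lines = []
    for subject in sorted(by_subject):
        lines.append(subject.upper())
        for record in sorted(by_subject[subject], key=lambda r: r["title"]):
            # Point to where the data can be found, or say that it is not recorded.
            where = record.get("location", "location not recorded")
            lines.append(f"  {record['title']} ({where})")
    return "\n".join(lines)


records = [
    {"title": "Rainfall stations", "subject": "hydrology", "location": "https://example.org/rain"},
    {"title": "Irrigation survey", "subject": "agriculture"},
    {"title": "Discharge series", "subject": "hydrology", "location": "https://example.org/flow"},
]
print(data_catalog(records))
```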
The metadata concept is even more useful if disparate and disconnected data sets are the norm - as in the case of most environmental data. The metadata can act as a unifying mechanism to facilitate searching, retrieval and display.
The development of databases and informational materials about water issues, whether they be in databases on hydrology, social issues, water quality or economic issues or in text format in documents and reports, has been carried out by a number of agencies around the world for different reasons and using vastly different formats and storage mechanisms. A metadata structure surrounding that data would greatly aid in searching, retrieving and formatting materials from those databases and documents.
Typical of present efforts are those of the Federal Geographic Data Committee for geospatial metadata (currently being harmonized with a similar effort from the International Organization for Standardization, ISO) and those of the Dublin Core metadata proposal for library materials. Mechanisms chosen to document metadata have included databases, Extensible Markup Language (XML) labeling of documents and detailed schemas for data description (such as the FGDC standard).
As many sources of data became more widely available, it was clear that surrogates were needed to provide the users with more information about the contents of the data files and their context. This was first realized with geospatial data, but the same was true for traditional libraries that were initially document-centric but became more focused on digital resources. As more and more files and kinds of objects became available digitally, the need for more information about how, when and where the data were collected, the purpose of collection, formats, platforms for viewing, modeling, visualizing and manipulating the data, any copyright or other restrictions, and information such as author(s), organization, title, subject, description or abstracted information became obvious.
It also became clear in the early years that, if there was no agreement on what elements should be in a metadata schema and what the content of these elements should be, a great deal of the usefulness of metadata would not be realized. Metadata is most useful when it is standardized to a certain extent. At a minimum, different metadata schemas should be interchangeable where their elements are the same, similar or can be easily related to each other.
All of these requirements argue for an electronic version of metadata rather than a print catalog. The issues of storage, navigation through catalogs, and retrieving and displaying information in an efficient manner all argue for a database-driven system. The rise of the Internet has made this need even more obvious, especially when those digital resources (including the massive number of documents on the WWW) are considered. In fact, the three driving forces for metadata implementation have been the WWW and the Internet, the massive amount of geospatial and remote sensing data being produced, and the modernization of libraries and library science with the advent of inexpensive computing resources.
As an example, in February 1999 an estimated lower bound on the size of the indexable Web was 800 million pages, encompassing about 15 terabytes of information or about 6 terabytes of text after removing HTML tags, comments, and extra whitespace. In December 1997, the pages had numbered 320 million. In the 1999 study, no search engine indexed more than about 16% of the estimated size of the publicly indexable web (Lawrence and Giles, Accessibility of information on the web, Nature, Vol. 400, pp. 107-109, 1999 and http://wwwmetrics.com/). As these authors state, other problems are that "Indexing of new or modified pages by just one of the major search engines can take months" and "Search engines are typically more likely to index sites that have more links to them (more 'popular' sites). They are also typically more likely to index US sites than non-US sites (AltaVista is an exception), and more likely to index commercial sites than educational sites". Thus, complete indexing by the current search engines, even of web content alone, is not really a reasonable solution. Browsing the WWW to find relevant data and information is also doomed to failure; material is being added at such a rate (doubling every 7 - 9 months) and the number of servers is increasing so fast that it would be an overwhelming task.
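A quick calculation shows what the Lawrence and Giles figures imply, taking 1 TB as 10^12 bytes and assuming steady exponential growth between the two page counts (an assumption made here, not a finding of the study):

```latex
\begin{aligned}
\text{average size per page} &\approx \frac{15\times10^{12}\ \text{bytes}}{8\times10^{8}\ \text{pages}} \approx 1.9\times10^{4}\ \text{bytes} \approx 19\ \text{KB}\\
\text{average text per page} &\approx \frac{6\times10^{12}\ \text{bytes}}{8\times10^{8}\ \text{pages}} = 7.5\times10^{3}\ \text{bytes} = 7.5\ \text{KB}\\
\text{pages in the largest single index} &\lesssim 0.16\times 8\times10^{8} \approx 1.3\times10^{8}\\
\text{growth, Dec. 1997 to Feb. 1999} &= \frac{8\times10^{8}}{3.2\times10^{8}} = 2.5\ \text{in 14 months}\\
\text{implied doubling time} &= 14\ \text{months}\times\frac{\ln 2}{\ln 2.5} \approx 10.6\ \text{months}
\end{aligned}
```

The last line can be set against the 7 - 9 month doubling rate quoted above.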
Even if we consider only restricted subsets of information, such as those related to water quality, quantity and all related socioeconomic, economic, political and other issues, it is clear that simple browsing or using search engines will not suffice. There are really two main issues: how the available data and information can be located, and how they can be retrieved in a useful and manageable manner and a usable format. If every "piece" of data has to be retrieved and categorized individually, only very small fractions of the total will be used. If metadata were attached in some way to these subsets of data and information, their discovery, retrieval and application would be greatly enhanced.
To quote from the FGDC web site: "Metadata helps people who use data find the data they need and determine how best to use it. Metadata benefit the data producing organization as well. As personnel change in an organization, undocumented data may lose their value. Later workers may have little understanding of the contents and uses for a digital data base and may find they can't trust results generated from these data. Lack of knowledge about other organizations' data can lead to duplication of effort. It may seem burdensome to add the cost of generating metadata to the cost of data collection, but in the long run it's worth it."
Metadata is essential for any data-intensive function. If each reporting or monitoring activity is to have maximum utility, its results should be stored, retrieved and used as efficiently as possible. Metadata surrounding the data, information and reports will provide this level of efficiency. In addition, it will make the data available to a much wider audience, both within and outside the immediate group of participants in the reports and the monitoring programs. In previous studies, experts involved in the studies have often not had easy access to the data and results of other parts of the same study until after their reports were submitted. It is quite possible that many aspects of the reporting and monitoring functions would be important to other teams of experts in their analyses and reporting activities. There is also an element of serendipity in some combinations of data with other data sets; for instance, Digital Elevation Mapping of areas might provide clues to certain socioeconomic observations as well as serving as key input into hydrological studies. Similarly, information from land use analyses (or the basic data) might have applications that are not immediately obvious, for example in economic models. Ensuring easy access to codified data sets using metadata would certainly help such interactions. The occurrence of such unforeseen interactions could be coded into the metadata sets themselves to ensure that such knowledge is not lost.
In the progression from data to information to knowledge, metadata about each stage should be produced, captured and disseminated. Even if the metadata serves only to produce a complete data description and an effective data catalog that can be maintained past the completion of the various phases of the projects, it will have served an extremely valuable function. In fact, it should do much more than that - it should serve as the common access mechanism for all of the data, information and knowledge generated during the lifecycle of the project and after its completion. In too many cases, the data gathered and the knowledge generated from investigation or studies is not available in an organized format after the project's completion. This means that in many subsequent projects, it must be gathered all over again. Even without this function (a "complete memory and retrieval" function), the data gathered in such projects will be valuable as historical or baseline data in future projects.
Water Data and Information Sources and Availability
There is an enormous amount of water information of all types available electronically on the Internet and an unknown, but very large, quantity available from other sources such as books, electronic publications (CD-ROM and similar technologies), reports, publications, articles, databases, etc. Part of the effort required in the metadata database will be to devise mechanisms to find and quickly categorize these items. This process will have to rely on electronic searches, resource discovery by library procedures and the expertise, knowledge and publication catalogs of the partners. No one method will be able to produce a comprehensive list.
As an example of the scale involved, a series of searches on the Internet, dealing only with information related to the United Nations, elicited the following results from "Google" at http://www.google.com/ - one of the more comprehensive search engines:
An analysis of a subset of these pages (the first 1000 listed with the second set of search terms) showed that:
- Approximately 40% were United Nations sites
- Approximately 20% were WWW sites referring to the United Nations activities in general (including water)
- Approximately 30% were WWW sites referring to water issues in general and including a reference to the United Nations
- Approximately 10% were sites dedicated to water issues.
- About 15% of the sites could not be reached and were not included in the statistics.
- Approximately 25% of the results were for UNEP site pages, 75% were references from other sites (excluding the sites that did not respond).
Other search engines (Lycos, Northern Light, Excite, Copernicus 2000, HotBot, and 15 other search engines) produced fewer results than those listed above. No attempt was made to discover a set of "unique" results by combining all of these results from all of the engines, but it is inevitable (and quite normal) for search engines to catalog only part of the available information.
These search engines provide only rudimentary assessments of the "relevance" of the documents they list as "hits". Some search engines (Yahoo!, for example) have categorized WWW sites and pages in WWW sites and produce lists of related information. Typically, those sites have much less coverage of the available information than the search engines that attempt to index the entire Internet.
As another example, the World Water Vision (a program of the World Water Council) CD-ROM lists over 60 documents, including secretariat reports, regional reports, vision documents and brochures, in its Vision Library. See Appendix 4 for a copy of the Library Catalog from the CD-ROM. The CD-ROM also lists many other "links" to other information sources (Appendix 4 - Part 2). It includes a Vision Explorer through which information on water issues can be added to the Vision database.
Finally, the ACC Subcommittee sponsored a survey of water-related information in 1998. The result was an inventory made by the Working Group on Information Management for the 14th session of the ACC Subcommittee on Water Resources and other identified relevant databases. Some sites or databases indicated that they contain metadata. A summary is provided in Appendix 5. A survey by GeWa Consulting of the water-related activities in the United Nations (Appendix 6) is also appended. To paraphrase the introduction to the survey: "Some 20 UN agencies have been recognised as having freshwater on their agenda. The UN ACC Subcommittee on Water Resources, being the UN co-ordinating body for freshwater, was identified as the task manager for chapter 18, the freshwater chapter of Agenda 21. The UN bodies having freshwater on their agenda are listed and links are provided to their WWW homepage."
Earthwatch - Many United Nations organizations, programmes, specialized agencies and convention secretariats are partners in the UN system-wide Earthwatch, contributing in some way to environmental observation, assessment and reporting activities that provide information for decision-making. Appendix 7 gives an overview of the organizations cooperating in Earthwatch and their activities.
3. Principles
Principles:
1. The data and information should remain in the possession and control of the originators (or owners) of that data (if that is their stated objective).
2. Wherever possible, the originators (owners) of the data should be responsible for data entry, editing, maintenance and updating.
3. A true copy of the metadata should be kept separate from the original data. The metadata may also be kept with the original data and edited and forwarded by its originators/owners, but at least one copy should always be held separately.
4. To ensure the widest possible audience and greatest ease of access, the metadata should be kept in an efficient and rapid database, preferably one accessed through a standard, non-proprietary query language such as SQL. This "metadata database" is the key feature of the metadata proposal. It will have the following features:
a) The metadata will be in an SQL database in a published and available format.
b) The metadata database will have one "master" copy and as many "shadow" or replicate copies as required. All of these will be linked to the master copy, which will NOT be accessible to all (for security reasons). Replication will be automatic and intelligent (i.e. only changed portions of the database will be sent to the "shadow" copies).
c) A complete instruction set for the metadata should be available on-line.
d) A complete data catalog describing the available data should be produced from the metadata.
e) Searching the metadata database should be as flexible and simple as possible. This could include everything from "pre-programmed" menu-driven enquiries for non-specialists to custom-built queries for specific purposes. In as many cases as possible, users should be able to construct queries without knowledge of the underlying database structure and without knowledge of the SQL query language.
f) The result of a search will be a reference to the location of the actual data or information. This should be accessible on-line whenever feasible. In other words, the metadata should be "linked" to the actual data, which is stored elsewhere. When this is not the case (for example, materials available only as paper documents), the metadata should provide full instructions on how the user can obtain the information.
g) Various levels and types of security for the metadata database should be specified. These could range from password access to full encryption for all or part of the database. A "guiding principle" might be adopted to limit such measures as much as possible in order to promote wider access to the information, although some participants may still have legitimate concerns about releasing certain information to the public.
5. The design of the metadata database will be cooperative, comprehensive and staged:
a) Cooperative Design - The design process for the metadata database will depend heavily on the cooperative design of an accurate and comprehensive metadata description encompassing a wide range of data and information types and formats. It is envisioned that this design process will involve meetings of representatives from affected or interested agencies. An iterative design of the metadata descriptions will be used to define the final product. It is also preferable that the final product (the metadata description and its implementation as a database) be flexible enough to permit later modification, addition and improvement without excessive recoding or manual intervention.
b) Comprehensive Design - Although the metadata database will be based on the requirements of the participants in the process, attempts should be made to ensure that no design decision limits the potential scope and application of the metadata database to one or a few projects.
c) Staged Design - An iterative design process will involve the production of a series of "staged models" of the metadata database that will be tested against actual sets of data on various servers. This constant testing will reveal any issues on smaller data sets before the deployment of the actual metadata database. Each staged model will be supplied to participants for testing, suggested modifications and other comments.
6. Whenever appropriate, we will use previous efforts at metadata descriptions and implementations. As examples, the FGDC/ISO metadata set for geospatial information could (should) be used as the basis for the geospatial components and the Dublin Core could (should) be used as the basis for document descriptions.
7. The issue of multi-language implementations of the metadata database should be addressed during the design process. Since the vocabulary of the database is constrained by its nature, the possibility of machine translation should be examined. This might involve producing a "dictionary" of all terms used in the metadata database and coupling each term with specific translations into multiple languages. This will necessarily involve decisions by experts on the various topics and their correct translation. Previous efforts at multilingual "classifications" of environmental terms will be used in this process. The metadata database does not address the translation of full-text documents into multiple languages, but at least extracted key words or descriptor terms would be included in the database.
8. As an outgrowth of the metadata database process, a multilingual "concordance" will be produced. It will relate the terms used to describe water data and information to their synonyms or functionally equivalent terms in the various languages. This table could be used as the basis for multilingual enquiries to the database or for machine translation of the key descriptor terms. Note that this "concordance" is not the same as a simple translation; a good example would be the translation of the term "watershed" between different languages, countries and interest groups.
9. The technical aspects of the metadata database implementation are complex and should be examined as the design process unfolds. As the various component parts of the database (such as the scope, data descriptions, metadata descriptors, classification "trees", ownership issues, accessibility and security issues, multilingual access, and concordances) are designed, the database structure should be concurrently modified and improved. This ensures that the final design reflects the design goals and principles. If the database design is deferred until the rest of the design exercise is complete, some decisions may prove impossible to implement, or the database design may be compromised by earlier decisions. Again, a cooperative, comprehensive and staged database design process would be advantageous.
One possible implementation of the metadata database is diagrammed below:
Figure 1. Proposed Metadata Distribution System
Technical features might include the establishment and maintenance of a large, secure master database server running an enterprise SQL database such as Oracle, Sybase or Microsoft SQL Server. This server will have redundant, "hot-swappable" disk storage in a RAID configuration, redundant and replaceable power supplies, and Ethernet network connections to a fast backbone location on the Internet. Access to this master SQL server, which would be used only for editing and updating the database, could be controlled through rigorous security measures. All standard SQL tracking, auditing and "roll-back" features would be implemented on this server.
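The tracking, auditing and "roll-back" features described above can be sketched with Python's built-in `sqlite3` module, used here purely as a stand-in for an enterprise server such as Oracle or Sybase. The tables, triggers and the `apply_batch` helper are hypothetical: triggers write an audit row for every insert or update, and each batch of edits runs as one transaction, so a failing edit rolls back the whole batch, audit rows included.

```python
import sqlite3

# SQLite stands in for the enterprise SQL server; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    owner TEXT NOT NULL
);
CREATE TABLE audit_log (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,
    action     TEXT NOT NULL,
    record_id  INTEGER NOT NULL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Tracking: each insert or update writes an audit row automatically.
CREATE TRIGGER audit_insert AFTER INSERT ON metadata
BEGIN
    INSERT INTO audit_log (action, record_id) VALUES ('insert', NEW.id);
END;
CREATE TRIGGER audit_update AFTER UPDATE ON metadata
BEGIN
    INSERT INTO audit_log (action, record_id) VALUES ('update', NEW.id);
END;
""")

def apply_batch(conn, statements):
    """Run (sql, params) pairs as a single transaction.

    Returns True on commit. On any database error the whole batch, including
    the audit rows its triggers wrote, is rolled back and False is returned.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in statements:
                conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False
```

For example, a batch that updates an existing record and then tries to insert a record with no title fails on the second statement, and the update is undone together with its audit row.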
The master SQL server will replicate the metadata database intelligently to the replicate SQL servers. Initially there could be a single replicate server, but eventually it might be preferable to have many, located in different regions of the world to provide better web access through the linked web servers. These features will enable stable, secure and fast delivery of database information to the web servers, which query the database in response to user inquiries over the Internet.
This separation of the two functions (database servers and web servers) will allow each to be optimized for its particular task. The web server will also have multiple fast and reliable Internet connections, but the web pages it delivers will be constructed from results generated on the database server.
The Master and Replicated Database Servers
An important function of the system will be database replication between the master server and the replicated database(s). In this instance, replication will mean an intelligent exchange of only the information that has changed since the last exchange - the entire database does not have to be copied in either direction. In addition, a "master copy" of the database on the master server could be replicated to the replicate servers at predetermined times to take advantage of the various time zones around the world.
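Exchanging "only the information that has changed" is commonly done with an ordered change log: each replicate remembers the last change it applied and asks the master only for later ones. The sketch below assumes that approach. The `Master` and `Replica` classes are hypothetical, in-memory dictionaries stand in for the databases, and `full_refresh` models the scheduled copy of the complete master database.

```python
from dataclasses import dataclass, field

@dataclass
class Master:
    """Hypothetical master store with an ordered change log."""
    records: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (seq, key, value); None = delete

    def put(self, key, value):
        self.records[key] = value
        self.log.append((len(self.log) + 1, key, value))

    def delete(self, key):
        self.records.pop(key, None)
        self.log.append((len(self.log) + 1, key, None))

    def changes_since(self, seq):
        # Entries are numbered 1..n in order, so entry k sits at index k-1;
        # slicing from `seq` returns every change numbered above `seq`.
        return self.log[seq:]

@dataclass
class Replica:
    """Hypothetical replicate store that tracks the last change applied."""
    records: dict = field(default_factory=dict)
    last_seq: int = 0

    def sync(self, master):
        """Apply only the changes made since the previous sync."""
        changes = master.changes_since(self.last_seq)
        for seq, key, value in changes:
            if value is None:
                self.records.pop(key, None)
            else:
                self.records[key] = value
            self.last_seq = seq
        return len(changes)

    def full_refresh(self, master):
        """Scheduled full copy of the master database."""
        self.records = dict(master.records)
        self.last_seq = len(master.log)
```

In a real deployment the change log would itself be a table on the master SQL server, and each replicate would keep its `last_seq` in durable storage so that a restart does not force a full copy.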
Another advantage of the separation of functions is that the database need only be backed up to other media from the master server.
Finally, the two tasks of connecting to the Internet and accepting database inquiries, on the one hand, and running a highly efficient SQL database server, on the other, cannot both be optimized on one machine. Each requires different server capabilities, so the machines are best separated into SQL servers and web servers, each optimized for its own task. Depending on eventual usage patterns, it is also easier to upgrade whichever machine becomes the bottleneck.
The function of the replicate database servers will be simply to respond to requests to extract and sort materials from the metadata on that SQL server. This could potentially be a very large data set, and speed in satisfying these queries will be essential. The quantity of data actually delivered (the location of the particular set of information requested) may not be large, but a fast database server will be required to sort through and extract it from the large metadata database.
As the number of fields in this metadata database increases to take care of different types of queries (location, river basin, drainage area, choice of area from a map, type of sample, samples, times, years, graphs, water flows, rainfall, water quality parameters, data quality measures, accuracy estimates, agency holding the data, etc.) the requirement for a fast replicate database server computer becomes more obvious.
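The replicate server's task of finding the few data sets that match a multi-field query among many metadata records can be sketched as an indexed SQL query. SQLite again stands in for the production server, and the `dataset_metadata` schema, the example.org locations and the sample rows are all hypothetical. The query returns only where the raw data can be obtained, not the data itself.

```python
import sqlite3

# SQLite stands in for the replicate SQL server; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset_metadata (
    id            INTEGER PRIMARY KEY,
    river_basin   TEXT,
    parameter     TEXT,     -- e.g. flow, rainfall, a water-quality parameter
    first_year    INTEGER,
    last_year     INTEGER,
    agency        TEXT,
    data_location TEXT      -- where the raw data can be requested
);
-- Indexes on commonly searched fields keep lookups fast as the table grows.
CREATE INDEX idx_basin_param ON dataset_metadata (river_basin, parameter);
CREATE INDEX idx_years ON dataset_metadata (first_year, last_year);
""")
conn.executemany(
    "INSERT INTO dataset_metadata "
    "(river_basin, parameter, first_year, last_year, agency, data_location) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Basin A", "flow", 1950, 1998, "Agency 1", "http://data.example.org/a-flow"),
        ("Basin A", "nitrate", 1975, 1999, "Agency 2", "http://data.example.org/a-no3"),
        ("Basin B", "flow", 1960, 1990, "Agency 1", "http://data.example.org/b-flow"),
    ],
)
conn.commit()

def find_datasets(conn, basin, parameter, year):
    """Return (agency, data_location) for data sets covering the given year."""
    return conn.execute(
        "SELECT agency, data_location FROM dataset_metadata "
        "WHERE river_basin = ? AND parameter = ? "
        "AND first_year <= ? AND last_year >= ?",
        (basin, parameter, year, year),
    ).fetchall()
```

Each additional searchable field would typically get its own column and, where it is queried often, an index; this is where the demand for a fast replicate server comes from.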
Although the Web server computers need not be as "robust", as large or as fast as the database servers, they do need to have excellent and reliable Internet connectivity. They will be responsible for storing security and access permission information, navigation software to interrogate the database server, and delivery mechanisms to send out results to the users or to send requests for the data to the remote and distributed servers that actually store the raw data. Multiple concurrent users will be the normal situation.
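The security and access permission information kept on the web server could be as simple as a table of roles and permitted actions, checked before any query or raw-data request is forwarded. The roles, actions and the restricted-data-set rule below are hypothetical; the last rule reflects the principle that originators keep control of their own data.

```python
# Hypothetical roles and the actions each may perform.
PERMISSIONS = {
    "anonymous": {"search_metadata"},
    "registered": {"search_metadata", "request_raw_data"},
}

# Hypothetical: some originators restrict raw-data requests for a data set
# to named organisations.
RESTRICTED_DATASETS = {"ds-017": {"Agency 2"}}

def authorize(role, action, dataset_id=None, organisation=None):
    """Return True if the request may be forwarded; unknown roles get nothing."""
    if action not in PERMISSIONS.get(role, set()):
        return False
    allowed_orgs = RESTRICTED_DATASETS.get(dataset_id)
    if action == "request_raw_data" and allowed_orgs is not None:
        return organisation in allowed_orgs
    return True
```

Keeping these checks on the web server means the database servers never see requests that would be refused.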
The Web Server will have navigation tools that will be used to generate SQL queries that will be sent to the database server computer. These navigation tools will include:
- Simple or complex "prepackaged" query statements
- Stored, user-specified query statements
- Forms-based query tools
- Text-based query tools
- Map-based query tools
- General SQL query tools
- Configuration specifications for each server containing the raw data (although this item could be part of the metadata set)
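A forms-based query tool of the kind listed above might translate the filled-in fields into a parameterised SQL statement for the database server. In this sketch the form field names and their mapping to columns are hypothetical. Only whitelisted columns can appear in the SQL text, and user-supplied values are always passed as parameters, which guards against SQL injection.

```python
# Hypothetical mapping: form field -> (metadata column, comparison operator).
FORM_FIELDS = {
    "basin": ("river_basin", "="),
    "parameter": ("parameter", "="),
    "agency": ("agency", "="),
    "year_from": ("last_year", ">="),   # data set must end on/after this year
    "year_to": ("first_year", "<="),    # and start on/before this year
}

def build_query(form):
    """Turn filled-in form fields into (sql, params) for a SELECT statement."""
    clauses, params = [], []
    for name, value in form.items():
        if value in (None, ""):
            continue  # blank form fields impose no condition
        if name not in FORM_FIELDS:
            raise ValueError(f"unknown form field: {name}")
        column, op = FORM_FIELDS[name]
        clauses.append(f"{column} {op} ?")
        params.append(value)
    sql = "SELECT id, agency, data_location FROM dataset_metadata"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For instance, `build_query({"basin": "Basin A", "year_from": 1980})` yields a statement with `river_basin = ? AND last_year >= ?` and the parameters `["Basin A", 1980]`, ready to be sent to the database server.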
| 5. How would a user interact with the system? |
Having decided on their approach to locating information, users could search by many different mechanisms. For example, they could proceed by:
And by many other types of query.
| 6. Other factors to be considered: |
It is our experience that one of the major barriers to progress in water issues is that of data access, sharing and reliability. When this barrier is coupled with those of competing national and local interests, data hoarding, diverse data formats and standards, and large gaps in basic knowledge and scope of data sets, there is a pernicious type of "non-co-operation" or "grid-lock" in data sharing exercises. This has been the case in many areas of the world where we have had experience with Environmental Information Systems design and implementation.
One of the very attractive features of the metadata database component is that it could be a means to minimize these effects. If these protocols, processes and analyses were to form integral parts of capacity building programs in all types of water assessments around the world, many of the expressed concerns and problems would become more manageable, as the level of expertise and the databank size would increase. The steering effect of a large, well-organized and accessible global metadata database of water-related data and information on future projects would be very noticeable. If donor agencies were convinced of the merits of such efforts, they would include them in projects, thus increasing the scope and relevance of the entire exercise.
For assessment, different models can be linked, with proper assumptions, into decision support systems (DSS) with GIS capabilities. Such a DSS could be accessed locally (national level), regionally, or globally. Watershed models (e.g. hydrology, water quality) can be connected through technical user interfaces that aid in the transfer of key data files among the models. Additional models, particularly those used by regional agencies, could be linked to the system to provide relevant answers to water resources management questions.
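The transfer of key data files between linked models can be sketched as one model exporting its output as CSV and the next model reading it. Both models here are deliberately simplistic placeholders, not established watershed models: a runoff-coefficient hydrology model (runoff volume = coefficient x rainfall depth x catchment area) and a complete-mixing water-quality model (concentration = pollutant load / runoff volume). The function names and the default coefficient are hypothetical.

```python
import csv
import io

def hydrology_model(rain_mm, area_km2, runoff_coeff=0.3):
    """Placeholder model: daily runoff volume (m^3) from daily rainfall (mm)."""
    # mm -> m is /1000; km^2 -> m^2 is *1e6
    return [runoff_coeff * (r / 1000.0) * (area_km2 * 1e6) for r in rain_mm]

def write_flows(flows):
    """Export the hydrology output as CSV text: the data file passed on."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["day", "runoff_m3"])
    for day, q in enumerate(flows, start=1):
        writer.writerow([day, f"{q:.1f}"])
    return buf.getvalue()

def water_quality_model(flow_csv, load_kg_per_day):
    """Placeholder model: concentration (mg/L) = load / volume, fully mixed.

    Days with no runoff get None, since no concentration can be computed.
    """
    result = []
    for row in csv.DictReader(io.StringIO(flow_csv)):
        volume_m3 = float(row["runoff_m3"])
        # kg/m^3 -> mg/L is *1000
        result.append(load_kg_per_day / volume_m3 * 1000.0 if volume_m3 > 0 else None)
    return result
```

Because the two models share only the CSV file, either could be replaced by a different model that reads or writes the same columns.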
In the strictest sense, a GIS is a software system capable of storing, manipulating and displaying geographic information. A further step in water management and planning is to integrate different sources of information and knowledge into Spatial Decision Support Systems (SDSS).
A key feature of correctly designed SDSS tools for water assessment and management programs is that many different models can be tested and used. In most situations, there is no one correct model. The ability to easily incorporate different models, run them on the same data set, and then compare their outputs has led to agreement on some contentious issues. For instance, SDSS software was used in this manner during the negotiations between the US and Canada on the sources, effects and control of acid rain.
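Running several models on the same data set and comparing their outputs can be sketched as follows. The two rainfall-runoff models, the data, and the choice of root-mean-square error (RMSE) as the comparison measure are illustrative assumptions; they are not the models used in the negotiations mentioned above.

```python
from statistics import mean

def model_linear(rain):
    """Hypothetical model: runoff is a fixed fraction of rainfall."""
    return [0.3 * r for r in rain]

def model_threshold(rain, initial_loss=5.0):
    """Hypothetical model: runoff starts only after an initial loss."""
    return [0.4 * max(r - initial_loss, 0.0) for r in rain]

def rmse(simulated, observed):
    """Root-mean-square error between two equal-length series."""
    return mean((s - o) ** 2 for s, o in zip(simulated, observed)) ** 0.5

def compare_models(models, rain, observed):
    """Run every model on the same inputs; return (name, RMSE), best first."""
    scores = [(name, rmse(model(rain), observed)) for name, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1])
```

Since every model sees identical inputs and is scored the same way, differences in the ranking reflect the models themselves rather than differences in the data.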
| 9. Selected References (see also the Annotated Bibliography) |
Alexandria Digital Library. www.alexandria.ucsb.edu/public-documents/metadata/metadata_ws.html
Baker, T., 1998, Languages for Dublin Core: D-Lib Magazine, December 1998. http://www.dlib.org/dlib/december98/12baker.html
Berners-Lee, T., 2000. http://www.w3.org/DesignIssues/Metadata.html
Clark, Suzanne, Larsgaard, Mary, and Teague, Cynthia, 1992, Cartographic citations: A style guide: Chicago, American Library Association, Map and Geography Roundtable.
Bridge, Virginia 9-12 November 1998 www.ariadne.ac.uk/issue18/metadiversity/
Dempsey (1996)
Dempsey and Heery (1997)
Department of Commerce, 1992, Spatial Data Transfer Standard (SDTS) (Federal Information Processing Standard 173): Washington, Department of Commerce, National Institute of Standards and Technology.
Department of Defense, 1990, Military specification ARC Digitized Raster Graphics (ADRG) (MIL-A-89007): Philadelphia, Department of Defense, Defense Printing Service Detachment Office.
Department of Defense, 1992, Vector Product Format (MIL-STD-600006): Philadelphia, Department of Defense, Defense Printing Service Detachment Office.
Dodd, Susan, 1982, Cataloging machine-readable files: Chicago, American Library Association.
Dublin Core Metadata Initiative organizational website (April 1999). http://purl.org/dc
Federal Geographic Data Committee (FGDC), Content Standard for Digital Geospatial Metadata (CSDGM). http://www.fgdc.gov/
ISO 11179 Parts 1-6, Specification and Standardization of Data Elements. ftp://sdct-sunsrv1.ncsl.nist.gov/x3l8/11179/
Lagoze, Carl, Lynch, Clifford, and Daniel, Ron, Jr., 1996, The Warwick Framework: A container architecture for aggregating sets of metadata: Cornell Computer Science Technical Report TR96-1593 (June 1996). http://cs-tr.cs.cornell.edu/Dienst/UI/1.0/Display/ncstrl.cornell/TR96-1593
Lawrence and Giles, 1999, Accessibility of information on the web: Nature, v. 400, p. 107-109. http://wwwmetrics.com/
Li, Xia, and Crane, Nancy, 1993, Electronic style: A guide to citing electronic information: Westport, Connecticut, Meckler Publishing.
Metaweb - Australian metadata http://www.dstc.edu.au/Research/Projects/metaweb/dcsites.html
Network Development and MARC Standards Office, 1988, USMARC code list for relators, sources, and description conventions: Washington, Library of Congress, Cataloging Distribution Service.
Network Development and MARC Standards Office, 1988, USMARC format for bibliographic data: Washington, Library of Congress, Cataloging Distribution Service.
no author, 1994, The Government Information Locator Service (GILS): Report to the Information Infrastructure Task Force (May 2, 1994).
Steere, David C., Baptista, Antonio, McNamee, Dylan, Pu, Calton, and Walpole, Jonathan, 2000, Research challenges in environmental observation and forecasting systems: Proceedings of the Sixth Annual International Conference on Mobile Computing and Networking, August 6-11, 2000, Boston, MA, USA, p. 292-299. www.acm.org/pubs/citations/proceedings/comm/345910/p292-steere/
Thiele, 1997, The Dublin Core and Warwick Framework: A review of the literature, March 1995 - September 1997: D-Lib Magazine, January 1998. http://www.dlib.org/dlib/january98/01thiele.html
US National Committee on Information Technology Standards, http://www.ncits.org/
Weibel, Stuart, 1999, The state of the Dublin Core Metadata Initiative, April 1999: D-Lib Magazine, v. 5, no. 4 (April 1999). http://www.dlib.org/dlib/april99/04weibel.html
Weibel, Stuart, Kunze, John, Lagoze, Carl, and Wolf, Misha, 1998, Dublin Core metadata for resource discovery: IETF Informational RFC 2413 (September 1998). http://www.ietf.org/rfc/rfc2413.txt
Westbrook, J. H., and Grattidge, W., 1991, A glossary of terms relating to data, data capture, data manipulation, and data bases: CODATA Bulletin, v. 23, no. 1-2.