Training in Biodiversity Informatics through Fieldwork, Distributed
Databases, and Web-Accessible Resources
Anne Maglia, University of Missouri-Rolla
Jennifer Leopold, University of Missouri-Rolla
A large barrier to training the next generation of bioinformaticians is that few students are attracted to both biology and computer science. The problem is more obvious in biodiversity informatics, in which data are collected in the field and computational methods are derived in the computer laboratory. Herein we describe an ongoing project designed to give students practical training in both computer science and field biology. We developed exercises for three different learning groups (university, high school, and middle school), which include both hands-on hypothesis testing and data collection and the development and use of computational methods to store and analyze data. At all levels, students who participated in these exercises were exposed to the field of biodiversity informatics, biological field work, distributed databases, and the scientific method. These exercises, along with several secondary consequences of the project, fostered interest among the participating students in biology, computer science, and the integration of both fields.
1. Introduction
Recent advances in biology (such as DNA sequencing) have generated huge quantities of new and exciting data that promise to provide advances in disease treatment and prevention, bioremediation, and biodiversity management. But massive amounts of new data require novel and innovative methods of data storage, analysis, and mining. Unfortunately, most computer scientists are unaware of the biology problems that require computational solutions and/or do not understand biological data well enough to know what software tools and technologies could be applied. Furthermore, biologists are rarely trained in computer science, and thus have difficulty applying complex computational methodologies to biological problems. Consequently, much of the vital new biological information is underutilized, and many questions remain unanswered.
The impact of this impediment is even more pronounced when considering biodiversity questions. There is a tremendous urgency to identify those factors that result in major impacts on biodiversity, especially in light of rapid species declines and extinctions. Biologists are collecting vast amounts of data on the impact of global warming, the consequences of environmental contamination, the destruction of natural resources, and the cause of population declines. The development of new computational methods would allow these data to be used more efficiently to model global patterns of environmental and species change and understand biocomplexity and the environment, with the ultimate goal of protecting and preserving biodiversity.
Fortunately, bioinformatics, the field devoted to developing computational solutions to expand the use of biological data, is one of the fastest growing subfields in biology. In fact, both the National Science Foundation and the National Institutes of Health recently identified bioinformatics as “the essential underpinning of all biological fields in the 21st century” (NIH-NSF, 2002). Some have even argued that in the very near future, most competitive life scientists will possess expertise in both biological and computer science (Park, 2001).
Unfortunately, the largest barrier to training the next generation of bioinformaticians is that few students are attracted to both biology and computer science. The problem is even more obvious in biodiversity informatics, where the data are collected in the field and the computational methods are derived in the computer laboratory.
Here we describe an ongoing project designed to give students practical training in both computer science and biology, and to foster their interest in the integration of both fields. The project follows the new paradigm of science education in which: 1) students perform real science as they construct meaning and acquire understanding; 2) students develop thinking processes and are encouraged to seek answers that enhance their knowledge and acquire an understanding of the physical universe in which they live; and 3) students are presented with problem-solving activities that incorporate authentic, real-life questions and generalization to broader ideas and applications (Christensen, 1995).
2. Methods and Exercises
2.1 University Students
This exercise was designed to bring university-level computer science and biology majors together to work on a biological problem requiring computational solutions. The goal of this series of exercises was to open communication between the two groups and to foster interest in both fields. These exercises exemplify our approach to training students under the paradigm identified above. By approach, we refer to the development of a prototypical process in which: 1) students with backgrounds in biology and computer science work side by side on site to solve problems, 2) biologists lead the formation and testing of a hypothesis, with computer scientists observing and assisting, 3) computer scientists develop software tools with the input of the biologists, and 4) computer scientists and biologists learn about the access, design, and development of databases and Web sites targeted at users with little to no computer training. Herein we describe a single example problem, although any number of additional hypotheses could be addressed. The focus should be on the approach taken and the student training.
A volunteer team of three early graduate-level computer science students and three undergraduate biology students was given the following challenges: 1) develop a hypothesis related to biodiversity at a local conservation site, 2) collect data to support or refute the hypothesis, 3) develop a Web-accessible database with simple-to-use access and analysis functions, 4) determine the results of data collection, and 5) design a Website to present the results of the exercise.
2.1.1 Hypothesis Development
Together the team searched online museum collection databases (e.g., Field Museum of Natural History: http://fm1.fieldmuseum.org/collections/; California Academy of Sciences: http://www.calacademy.org/research/), online field guides (e.g., amphibians of Missouri: http://www.conservation.state.mo.us/nathis/herpetol/), and other biology-related sites (e.g., Center for North American Herpetology, http://www.naherpetology.org/; AmphibiaWeb, http://elib.cs.berkeley.edu/aw/) to determine the organisms that may be present at their study conservation site (Missouri Department of Conservation Bray Conservation Area; Rolla, MO). They also used the Web to assemble information about the climate and geography of the region, as well as general information about the organisms they might encounter (e.g., dietary habits, habitat requirements, seasonality).
Using the information they collected, the team developed a hypothesis about the amphibian populations in the area. They found that Missouri is one of several states reporting major declines in amphibian populations, and that it is home to several threatened and endangered species. The team chose to examine the status of the amphibian populations at the study site, and hypothesized that there would be no difference in the number of amphibian species (and individuals within species) at the site as compared to other similar sites in the region. After visiting the site, the team found that it included four habitat types (stream, pond, forest, and grassland; Fig. 1) and subsequently hypothesized that there would be no difference in the amphibian community composition among the habitat types.
2.1.2 Hypothesis Testing
Figure 1. Satellite photo (courtesy of the US Geological Survey) of Bray conservation area with various habitat types labeled.
Figure 2. Screen shot of the user interface for the Web-accessible database designed and developed by the students. View shows a simple data entry form.
An accessible version of the database, along with links to the exercises and a tutorial on using the database, is available at http://web.umr.edu/~bioinf/bray/tutorial/
2.1.3 Database Development and Use
Once the data were collected, the computer science and biology students discussed the data and the hierarchical relationships among them. The computer science students then worked together to develop a Web-accessible MySQL database with a simple user interface written as a Java applet (Fig. 2). Once finished, the biology students entered the data into the database while the computer science students observed.
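The hierarchical relationships mentioned above can be illustrated with a small relational sketch. The article does not describe the students' actual schema, so the table and column names below (habitat, visit, specimen, and their fields) are hypothetical, and Python's built-in SQLite module stands in for MySQL so that the example is self-contained. The sketch assumes a simple hierarchy in which each specimen belongs to a site visit and each visit belongs to one habitat type.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the students' actual design. SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE habitat (
    habitat_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE
);
CREATE TABLE visit (
    visit_id     INTEGER PRIMARY KEY,
    habitat_id   INTEGER NOT NULL REFERENCES habitat(habitat_id),
    visit_date   TEXT NOT NULL,
    air_temp_c   REAL,
    water_temp_c REAL,
    water_ph     REAL
);
CREATE TABLE specimen (
    specimen_id  INTEGER PRIMARY KEY,
    visit_id     INTEGER NOT NULL REFERENCES visit(visit_id),
    species      TEXT NOT NULL,
    latitude     REAL,
    longitude    REAL
);
"""


def build_database():
    """Create an empty in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def count_by_habitat(conn):
    """Return {habitat name: number of specimens}, including zero counts."""
    rows = conn.execute(
        """
        SELECT h.name, COUNT(s.specimen_id)
        FROM habitat AS h
        LEFT JOIN visit AS v ON v.habitat_id = h.habitat_id
        LEFT JOIN specimen AS s ON s.visit_id = v.visit_id
        GROUP BY h.habitat_id, h.name
        ORDER BY h.name
        """
    ).fetchall()
    return dict(rows)
```

Keeping habitats, visits, and specimens in separate tables means that environmental measurements are recorded once per visit rather than repeated for every animal, and a habitat in which nothing was caught still appears in the summary with a count of zero.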
The team worked together to analyze the data in the database. Amphibians were found in only two habitat types: streams and ponds. There was a significant difference in the number of frogs found in the different habitat types, with more found in the pond. They found only three amphibian species, and only one was present in consistently measurable densities. These counts were lower than expected (relative to other sites), indicating that the area may be under the influence of factors leading to reduced amphibian populations. However, because the data collection was conducted during one season, it is possible that the results reflect a seasonal effect, and that there are more amphibians present than the data indicate. Therefore, the team decided to develop a long-term biodiversity monitoring program at the conservation site. Subsequently, more students have become involved, and the team visits the site at least once per month to collect data. The team has also led exercises for several local high school and junior high groups (see sections below), and is currently working to involve scout troops in the monitoring efforts through a collaborative outreach program between the Missouri Department of Conservation and the University of Missouri-Rolla.
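The article does not state which statistical test the team used to compare habitats. One common choice for a "no difference" hypothesis about counts is a chi-square goodness-of-fit test against equal expected counts, sketched below. The function names are hypothetical, the sketch assumes equal search effort in each habitat, and the counts in the usage note are invented for illustration only; they are not the study's data.

```python
# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_equal_counts(counts):
    """Chi-square statistic for the null hypothesis of equal counts.

    `counts` maps a category (for example a habitat type) to the number of
    animals observed there. Assumes equal search effort per category.
    """
    total = sum(counts.values())
    if len(counts) < 2 or total == 0:
        raise ValueError("need at least two categories and one observation")
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


def differs_significantly(counts):
    """True if the null hypothesis of equal counts is rejected at alpha = 0.05."""
    df = len(counts) - 1
    if df not in CRITICAL_05:
        raise ValueError("critical value not tabulated for this many categories")
    return chi_square_equal_counts(counts) > CRITICAL_05[df]
```

For example, with invented counts of 30 frogs at the pond and 10 at the stream, the expected count is 20 per habitat, the statistic is 10.0, and the null hypothesis of no difference would be rejected at the 0.05 level. The same function accepts more than two categories, so it could equally be applied to frequencies of color patterns.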
Using Dreamweaver, the students worked together to develop a project Website: http://web.umr.edu/~bioinf/bray/. At this Website they included a description of the project, the methods they used, the current results of their study, and a link to their database. The database is currently being used in the long-term biodiversity monitoring efforts, and the Website should serve as a focal point for unifying further bio-monitoring efforts at the conservation site.
Recently, the team has begun a new exercise in which they are analyzing the color pattern distribution of the cricket frogs in the area. They have hypothesized that they will be able to identify individual frogs by examining the unique markings on each frog. If successful, they plan to use this information to track movement patterns and activity levels of the individual frogs in the area. They plan to develop a digital library and online identification key of the various color patterns and link these to their Website. They also plan to associate the GPS data in the database with digital maps of the Bray site so that the exact location of each specimen caught can be mapped. By enhancing their Website and developing the digital library of color patterns and graphical representations of specimen localities, the students will further their knowledge and skills in the areas of Web data management, multimedia computing, and the design of end-user data access and analysis tools.
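If individual frogs can be recognized from their markings, the GPS coordinates stored with each capture could be used to estimate how far an animal has moved between captures. The sketch below is one possible approach, not the team's implementation: it uses the standard haversine formula for great-circle distance, and the function names and the (latitude, longitude) tuple format are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def path_length_m(captures):
    """Total distance along a chronologically ordered list of (lat, lon) captures."""
    return sum(haversine_m(*a, *b) for a, b in zip(captures, captures[1:]))
```

Summing the distances between successive captures gives a lower bound on the distance an individual has travelled, since the animal's path between captures is unknown. At the scale of a single conservation area, consumer GPS error may be comparable to the movements being measured, so such estimates would need to be interpreted with care.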
2.2 High School Students
As a continuation of the exercises above, the university students developed simple exercises for several local high school groups. University students involved in the exercises above worked with high school students to help them develop a hypothesis about the amphibians at the Bray site. The high school students hypothesized that there would be no difference in the density of frog species in the stream and pond areas. As in the previous exercise, the students collected information about the different habitats, including water and air temperatures and water pH. The students collected all of the amphibians they encountered in the areas, and released them after identifying them. The students entered the data into the database developed in the exercise above. Analyzing the data, they found that cricket frogs were much more prevalent in the pond habitat than in the stream habitat, but that there were no differences in the density of other species between the habitats.
2.3 Middle School Students
As a further development of the exercises above, the university students participated in several field days for middle school students. During these outings, the university students took the middle school students to their field site and described the projects they were working on. They explained how they had developed the hypotheses they were testing and helped the middle school students develop their own hypothesis: that there would be no difference in the frequency of the different color patterns seen in the cricket frogs of the area. The university students showed the younger students how to collect GPS and environmental data, and then helped them collect several frogs. After identifying the different color patterns, they helped the students enter the data into the database.
3. Results and Summary
At all levels, students who participated in these exercises were exposed to the field of biodiversity informatics, biological field work, and distributed databases. Furthermore, each group gained experience developing and testing their own hypothesis.
At the university level, the exercises facilitated cross-training of and open communication between the biologists and computer scientists on the research team. They allowed computer science students to experience first-hand the processes used by biologists to test hypotheses and capture data, which made it easier for them to develop the database. They also gave the computer science students insights into how information access/analysis software should be designed for end-users who are untrained as computer professionals. In turn, the exercises demonstrated to the biologists the importance of collecting data such that they can be easily stored and queried electronically, showed them how a database was designed and developed, and gave them experience designing Web pages. Additionally, by conducting the exercises for the high school and middle school students, the university students further developed their hypothesis-testing abilities. They also had the opportunity to view others utilizing their database and user interface, and were able to identify possible usability enhancements. Subsequently, the computer science students modified the user interface of the database to make it easier for the biologists to enter data.
An additional outcome of this exercise was that university students were given the opportunity to teach science to younger students. It is hoped that these sorts of interactions will help to foster an interest in science education. And given the current dearth of science teachers, particularly at the secondary level, developing an interest in science education may be one of the most important outcomes of this project.
At the high school level, the students learned about scientific research projects and hypothesis development. They also learned how biology and computer science combine to answer questions about biodiversity. Finally, by gaining hands-on experience answering their own questions, they were given the opportunity to see science as something that is within their capabilities, and scientists as accessible.
At the middle school level, the students were given a first-hand look at science in action. They learned about amphibians and biodiversity and how to collect data to test a hypothesis. They were also given the opportunity to interact with college scientists in a fun learning environment.
4. Value-Added Benefits
This project resulted in several important secondary consequences, including a long-term biodiversity monitoring program at a Missouri Department of Conservation site. The bio-monitoring efforts are currently being extended to include local high school students, scout troops, and a University of Missouri-Rolla Ecology class. This project also fostered a collaborative relationship between the University of Missouri-Rolla and the Missouri Department of Conservation. This partnership is resulting in the development of summer camps for middle school girls, jointly sponsored school programs and day-long workshops, and the pursuit of additional projects, funds, and joint ventures. Furthermore, the development of an interactive Web site documenting the project activities (including video clips, photos, interactive images, and sounds) and a Web user interface to the bio-monitoring database has become a key factor in further motivating interest in this project among additional groups of students and conservationists located throughout the area.
We thank Phil Helfrich, Connie Schmiedeskamp, and the Missouri Department of Conservation for their cooperation and access to the Marguerite Bray conservation site. All scientific collections were made under MDC Wildlife Collector’s Permit #11734 to A. Maglia.
NIH-NSF. 2002. Program description NSF-02-109: National Institutes of Health and National Science Foundation Bioengineering and Bioinformatics Summer Institutes Program (BBSI). (http://www.nsf.gov/pubs/2002/nsf02109/nsf02109.htm)
Park, P. 2001. Training for the Bioinformatics Boon. The Scientist 15(20):31.
Christensen, M. 1995. Providing Hands-On, Minds-On, and Authentic Learning Experiences in Science. North Central Regional Educational Laboratory (NCREL). (http://www.ncrel.org/sdrs/areas/issues/content/cntareas/science/sc500.htm)
© 2003 Wake Forest University (from Volume 5, Number 2, of The Interactive Multimedia Electronic Journal of Computer-Enhanced Learning).