Genomics & Bioinformatics (Bio 290) Projects and Presentations

Fall 2002 (instructor: Fornari)

The goal of this project is an in-depth genomic and bioinformatic analysis of a human disease gene that is homologous to one or more genes in one or more model organisms (the bacterium E. coli, the single-celled yeast S. cerevisiae, the roundworm C. elegans, the arthropod D. melanogaster, and the mouse M. musculus). Start your disease gene search and selection by consulting figure 1.15, p. 37 of your aPoGS text (Gibson & Muse).

Keep in mind that your project should include a “global” analysis of your selected disease gene, in addition to the implied “local” analysis of a single, isolated gene. A global analysis reflects the central themes of the course: structural, functional (both transcriptomics and proteomics), and comparative genomics. A local analysis reflects the gene’s structure and function, and its primary role in the disease etiology. Both analysis types should reflect the intricate relationships among sequence, structure, function, genome organization and expression, and evolution.

Some specifics in your research should reflect the ways in which you have been studying genomics and bioinformatics. For example, any particular gene was once mapped and localized to a chromosome within the human genome; how was the gene discovered and mapped? You learned in class about the progression from karyotype to individual chromosome, to a region and band within the chromosome, to a detailed and definite location marked not only by genetic markers but also by neutral markers such as RFLPs, SSLPs, and SNPs. Also describe and analyze the physical map associated with, or substantiated by, the low- and high-resolution genetic maps with both their genetic and neutral markers. How was the gene sequenced? What is its sequence? What are the molecular features and properties of this gene (i.e., describe and analyze its gene “anatomy”)? Has the gene’s expression been characterized by microarray analysis? What is the sequence of the protein produced by this gene? What are the important and relevant structural domains and motifs in the protein that are related to the disease pathology? How are these domains and motifs related to similar domains and motifs in the model organisms? Is the disease gene in a cluster of syntenic genes? Is the disease gene a member of a conserved gene family? Is the protein’s domain a member of a super-family of protein domains? These questions cover a full range of analysis, from cytogenetics, to gene and protein molecular characterizations, to functional and evolutionary comparisons with other genomes in other organisms.
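As one illustration of the comparative questions above, the short Python sketch below computes percent identity between two protein fragments that have already been aligned by some alignment program. It is only a sketch under stated assumptions: the function name and the two toy fragments are invented for illustration and do not come from any real gene, and percent identity is just one of several ways to summarize how similar a domain is across organisms.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must be rows of the same pairwise alignment, with '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns with identical residues
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, made-up aligned fragments (hypothetical; not real sequences).
human_fragment = "MKT-LLVAGG"
model_fragment = "MKSALLV-GG"
print(percent_identity(human_fragment, model_fragment))
```

Note that different programs define percent identity differently (for example, whether gap columns count in the denominator), so state in your report which definition you used.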

You will of course be making extensive use of bioinformatics programs and databases found on the Internet (NCBI, etc.). My evaluation of your project will be based in part on the depth of your analyses, and on how well you use these available bioinformatics tools. But a tool is only a tool, and its use generates data that must be interpreted, analyzed, compared, and critiqued from a variety of perspectives. Finally, conclusions must be drawn, along with recommendations for future experiments or analyses. In your bioinformatics analyses, follow closely the recommendations in the “An Introduction to Sequence Similarity Searches” article (recall what was said in lecture about performing bioinformatics searches and analyses with more than one algorithm).
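One simple way to act on the "more than one algorithm" advice is to compare the hit lists that two different search programs return for the same query. The Python sketch below assumes you have already copied the hit identifiers from each program's output into two lists; the function name and the placeholder identifiers are hypothetical and stand in for whatever accessions your searches actually return.

```python
def compare_hit_lists(hits_first, hits_second):
    """Group hit identifiers by which of two searches reported them."""
    set_first = set(hits_first)
    set_second = set(hits_second)
    return {
        "both": sorted(set_first & set_second),         # found by both programs
        "only_first": sorted(set_first - set_second),   # found by the first only
        "only_second": sorted(set_second - set_first),  # found by the second only
    }


# Placeholder identifiers (hypothetical, not real accessions).
hits_program_one = ["hit1", "hit2", "hit3"]
hits_program_two = ["hit2", "hit3", "hit4"]

for group, ids in compare_hit_lists(hits_program_one, hits_program_two).items():
    print(group, ids)
```

Hits reported by both programs are usually the easiest to defend; hits reported by only one deserve a sentence of interpretation in your report rather than silent inclusion or omission.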

The format of your project should be that of a Web home page, so you will want to use Microsoft FrontPage to construct the final version with all text and images; you may also use PowerPoint to construct your final version. The length depends on the amount of information associated with a particular disease gene, and you should strive for brevity and clarity but not at the expense of completeness (e.g., in answering the questions posed in paragraph 3).

Bibliography and other documentation of your project and the sources that you used to create it:

Any texts, journal articles, general articles and publications.

Any web sites used in a significant way (i.e., web sites that you used as primary or major sources of information and/or data). List the actual URL by copying it from the browser's location bar and pasting it into your report.

Include a separate list of all programs (or algorithms) used, with appropriate web-site or other references; this list should be grouped into those programs for structural genomics, those for functional genomics, and those for comparative genomics. Part of the intent of this listing is to make certain that you know the distinctions among these various programs, in terms of their primary purposes. Also indicate on this list which programs have similar functions (e.g., analyzing promoter consensus sequences), but use different algorithms.

Within the body of the report (so not here in the bibliography), indicate the parameters used for any search or comparison program. This information is especially important if you adjust or change the default values on any program.
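If it helps you keep track, a small record per search run makes it easy to report parameters consistently. The Python sketch below is only one possible way to do this: the class name, field names, and parameter labels are invented for illustration (they are not flags of any real program), and the values shown are placeholders, not recommended settings.

```python
from dataclasses import dataclass, field


@dataclass
class SearchRun:
    """One search or comparison run, as it would be described in the report."""
    program: str    # name of the program used
    category: str   # "structural", "functional", or "comparative"
    database: str   # database that was searched
    parameters: dict = field(default_factory=dict)        # label -> value used
    defaults_changed: list = field(default_factory=list)  # labels you altered

    def summary(self) -> str:
        """Return a one-line description, flagging non-default parameters."""
        parts = []
        for name in sorted(self.parameters):
            flag = " (changed from default)" if name in self.defaults_changed else ""
            parts.append(f"{name}={self.parameters[name]}{flag}")
        return f"{self.program} [{self.category}] vs {self.database}: " + ", ".join(parts)


# Illustrative placeholder values only.
run = SearchRun(
    program="blastp",
    category="comparative",
    database="nr",
    parameters={"expect_threshold": 0.001, "matrix": "BLOSUM62"},
    defaults_changed=["expect_threshold"],
)
print(run.summary())
```

Whatever form you choose, the point is that a reader should be able to repeat your search from the report alone.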