Overview of Projects for Summer 2018
Computational Approaches to Study the Transmission
and Pathogenesis of Mycobacterium ulcerans
Dr. Jordan’s research areas include microbial ecology,
transmission and pathogenesis of the environmental pathogen,
Mycobacterium ulcerans. It is not yet known how the organism is
transmitted to humans, or which environmental circumstances lead
to the production of mycolactone, a lipid toxin and sole virulence
determinant of M. ulcerans. These gaps in the knowledge base are
important because M. ulcerans infection leads to a devastating skin
disease known as Buruli ulcer that impacts at least 33 countries with
highest incidence in rural West Africa.
Potential project(s) for the REU fellows include 1. Environmental
screening for the presence and abundance of M. ulcerans among aquatic
samples collected from Benin, West Africa. The objective of this work
is to determine presence and abundance among environmental samples
collected from Buruli ulcer endemic and non-endemic aquatic habitats in
order to test the hypothesis that M. ulcerans resides and replicates
within a specific niche within the aquatic habitat. In order to test
this hypothesis, DNA will be isolated from preserved samples that have
been collected from aquatic habitats in Benin, West Africa. The
isolated DNA will be subjected to semi-quantitative and quantitative
PCR targeting M. ulcerans. Positive samples will be strain typed using
Variable Number Tandem Repeat Profiling and verified by amplicon
sequencing and comparison against the BLAST database. We expect
specific aquatic matrices (such as water filtrand, soil, or
invertebrates) to be positive for M. ulcerans. We also expect
positivity and concentration to be higher among samples collected from
Buruli ulcer endemic habitats. 2. Impact of UV on mycolactone gene
expression. The objective of this work is to determine whether there is
modulation of mycolactone gene expression and production when subjected
to UV. This objective will test the hypothesis that expression of genes
responsible for mycolactone production is upregulated as a stress
response. In order to test this hypothesis, M. ulcerans replicates will
be grown to exponential phase then placed into petri plates and
subjected to UV for 0 seconds (control) and for 5 seconds to 5 minutes. The
bacteria will be collected and serially diluted for plating to
determine UV impact on M. ulcerans growth. Additionally, RNA will be
isolated from the bacteria and, following isolation and verification of
RNA integrity, converted to cDNA for RT-PCR targeting genes responsible
for mycolactone production. Modulation of gene expression will be
analyzed using computational software packages in R. We expect
mycolactone gene expression to be upregulated upon increased UV exposure. Data from
both projects will be valuable for assessing the environmental niche of
M. ulcerans, determining the mode of transmission of the pathogen to
people, and conditions for mycolactone production. Additionally,
methods described will allow the student to obtain or develop skills of
molecular biology, data management and data interpretation.
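As an illustration of how RT-PCR results from the UV project might be summarized, the sketch below computes relative expression with the widely used 2^-ddCt calculation. The project description does not state which analysis method will be used (it mentions R), so the method, the function name, and all Ct values here are assumptions for illustration only. The calculation also assumes roughly 100% primer efficiency and a stable reference gene.

```python
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt calculation.

    Assumes ~100% amplification efficiency for both primer pairs
    and a reference gene whose expression is unaffected by treatment.
    """
    # Normalize the target gene to the reference gene in each condition.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the UV-treated sample with the 0-second control.
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: a mycolactone biosynthesis gene versus a
# reference gene, UV-exposed culture versus unexposed control.
uv_vs_control = fold_change(22.0, 18.0, 24.0, 18.0)
print(uv_vs_control)
```

A value above 1 would indicate higher expression after UV exposure than in the control; a value below 1 would indicate lower expression.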
Changes in Hemoglobin Expression in Response to
Environmental Changes
The Hoffmann Lab is broadly interested in evolutionary genomics and
molecular evolution. An overriding theme is to better understand the
connection between the emergence of novel genes and the origins of
biological innovations. Relating to this theme, the Hoffmann Lab 1.
Explores the different mechanisms involved in the origin of new genes,
2. Assesses the forces underlying the retention and functional
variation of these genes, and 3. Works to gain insight into the
processes underlying intra- and inter-specific variation in the number
and nature of genes in animal genomes. Current projects include studies
of the evolution of animal gene families, the emergence of novel genes
via gene and genome duplication, functional variation among paralogous
members of a gene family, the evolution of small RNA repertoires, and
the interplay between transposable elements and small RNAs. Dr.
Hoffmann’s team pursues these questions using an integrative
approach that involves combining bioinformatics and evolutionary
genomics with perspectives from other disciplines such as molecular
population genetics, cellular and structural biology, protein
biochemistry and animal physiology that are brought by our
collaborators.
The dual challenge of respiration (oxygen extraction and delivery) and
ionoregulation is a poorly studied problem of particular physiological
significance for basal aquatic vertebrates. This is particularly true
of fish with both gills and an air-breathing organ (ABO) that tolerate
a vast range of oxygen concentrations and salinities, such as gars.
These species need to contend with extracting oxygen from media with
different oxygen concentrations under a wide range of environmental
conditions that probably change throughout an animal’s
development. As such, the alligator gar (Atractosteus spatula), the
most salinity-tolerant species among the basal bony fishes with an ABO,
which also stands at the crux of basal vertebrate and teleost
evolution, offers unique opportunities to better understand the dual
physiological regulation of these systems. At the cellular and
organismal level, vertebrate hemoglobins play a fundamental role in
mediating responses to changes in oxygen availability, as this protein
is in charge of delivering oxygen from respiratory organs to the cells
of tissues to enable aerobic metabolism. Hemoglobins are the products
of a gene family, and most fish synthesize different hemoglobin
subunits throughout development and also in response to environmental
changes. However, a clear understanding of the combined effects of
developmental and environmental changes is lacking. Because of its
ability to extract oxygen from water and air, its wide tolerance to
changes in salinity and the availability of a high quality genome for a
relatively closely related species, the alligator gar offers unique
opportunities to study gene expression plasticity. Our work seeks to
study this important model species through seeking answers to key
questions in physiology and functional and evolutionary genomics
related to a fundamental aspect of life on this planet: how organisms
maintain a stable delivery of oxygen under varying conditions. Thus, as
a first approximation to understanding how gars are able to deliver
oxygen under different environmental conditions, and to understanding how
exposure to physiological challenges early in development can influence
responses at later stages, we are analyzing alligator gar
transcriptomes to 1. Characterize the set of hemoglobins expressed at
different stages of development, 2. Characterize changes in hemoglobin
expression in response to changes in O2 availability and changes in
salinity, and 3. Characterize changes in blood chemistry relative to
oxygen binding affinity in response to changes in O2 availability and
changes in salinity.
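One common way to compare expression of individual hemoglobin genes across transcriptome samples is to normalize read counts to transcripts per million (TPM). The sketch below shows that calculation; the lab's actual pipeline is not described here, so the choice of TPM, the transcript names, the lengths, and the counts are all hypothetical.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase: normalize each count by transcript length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Rescale so that values across all transcripts sum to one million.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Hypothetical counts for two hemoglobin transcripts at two oxygen levels.
# A real analysis would normalize over every transcript in the sample,
# not just the two shown here.
lengths = {"hb_alpha_1": 600, "hb_beta_1": 650}
normoxia = tpm({"hb_alpha_1": 1200, "hb_beta_1": 1300}, lengths)
hypoxia = tpm({"hb_alpha_1": 600, "hb_beta_1": 1950}, lengths)
print(normoxia, hypoxia)
```

Because TPM values sum to the same total in every sample, a shift in one subunit's share between conditions can be read directly from the normalized values.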
Visualization of Genomic Data
Dr. Jankun-Kelly works in the area of information and scientific
visualization. He has developed novel methods for visualization
interfaces, interfaces for linked image browsing, models for visual
exploration, and visual analysis tools for bioinformatics.
REU projects for bioinformatics will challenge students to work
together with computer scientists and biology experts to solve complex
problems via interactive computer graphics. While two examples of such
projects are given, actual projects will be determined in collaboration
with the application scientist and the student. 1. MSAVis, a multiple
sequence alignment visualization system, has several feasible
extensions that can be tackled in parallel by dedicated students; two
are presented here. First, as it stands, MSAVis does not allow editing
of protein sequences to test different alignment hypotheses; this is a
feature of interest to its users. A student would add this
functionality which would involve modifying MSAVis’
interaction mechanisms and integrating it with sequence alignment
software. Second, there are additional protein features that could be
integrated such as binding sites or information about secondary
structure. Such a project would involve designing the visual metaphors
for the added information and designing the interface to query the
biological databases to extract them. 2. In this project, a web-based
tool named GeneAtlas will be refined. GeneAtlas allows the
efficient comparison of multiple gene expression samples (usually from
species at different times in their life cycle). Additional
interaction methods and visual metaphors could
be explored to make this a tool with genuine impact on biological
studies.
Genomics for Studying the Role of Polyamine
Metabolism in Pneumococcal Virulence
The Nanduri lab routinely uses 1D LC ESI MS/MS to conduct global
expression analysis which can be applied to study host (mouse) and the
pathogen (S. pneumoniae) response in an intranasal challenge model of
pneumonia. We also use single nucleotide resolution transcriptome
mapping approaches such as RNA-seq to study global gene expression
during infection. Both the mass spectrometry based proteomics and
RNA-seq approaches generate data that require bioinformatics analysis
utilizing available open source pipelines to identify a list of
genes/proteins that are differentially expressed in the host and
pathogen during infection. Mass spectrometry data and RNA-Seq data can
also be utilized for genome structural annotation i.e. defining the
expressed elements and their boundaries in a genome sequence.
Furthermore, the list of differentially expressed genes and proteins
is not useful unless the corresponding biological information is
retrieved and analyzed in the context of pathways and networks for
knowledge discovery. All these aspects of conducting polyamine research
in the pneumococcus are amenable to training in multiple aspects of
bioinformatics and computational biology at the undergraduate level.
Streptococcus pneumoniae (pneumococcus) is a human pathogen
associated with the etiology of meningitis, pneumonia, bacteremia,
bronchitis, sinusitis, and otitis media. Based on the structure of the
capsule, more than 90 different serotypes of S. pneumoniae are
described in literature. Genome plasticity, serotype variability and
increasing antibiotic resistance confound the efforts to control this
pathogen. The availability of genome sequences for
representative serotype strains and mouse models of disease allows the
identification of host-pathogen interactions that underlie disease,
which can support the development of therapeutic strategies. S.
pneumoniae is a commensal of the nasopharynx. When the host is
immunocompromised, this opportunistic pathogen invades sterile spaces
such as the lungs, causing pneumonia, and when it reaches the blood it
results in sepsis. As pneumococcus travels
through the nasopharynx to various anatomical locations in the human
body, it has to adapt its metabolism to the host niche and also circumvent
host defenses at each of these locations. Studying the intersection of
pneumococcal metabolism with virulence during infection is expected to
elucidate key pneumococcal genes/proteins involved in pathogenesis.
Polyamines are polycationic aliphatic hydrocarbon compounds that are
ubiquitous in living cells. Polyamines, such as putrescine,
spermidine and cadaverine, carry a net positive charge at physiological
pH. The positive charge of polyamines helps maintain the conformation
of negatively charged nucleic acids. Polyamines are involved in
pathogen adaptation to growth in vivo, response to physiological
stress, and modulation of host immune responses. Impaired
polyamine transport and biosynthesis in S. pneumoniae TIGR4 render the
bacterium incapable of surviving in mouse models of nasopharyngeal
colonization, pneumonia and sepsis.
High Performance Computing for Genome Sequencing
and Assembly
Dr. Peterson’s research is focused on exploring the structure
and evolution of eukaryotic and prokaryotic genomes using genomic,
cytogenetic, molecular biology, and computational techniques. By
elucidating and comparing the sequences of genes and repeat sequences
from a diverse group of organisms, his lab is illuminating trends in
molecular evolution and discovering sequences responsible for economic
and adaptive traits. Such research accelerates agricultural
plant/animal improvement through marker-aided selection strategies
and/or genetic engineering. Additionally, we are investigating
repetitive DNA sequences and their role in genome evolution. At
present, the research organisms we are studying include cotton, conifer
trees, nematode and arthropod pests, crocodilians, and bacteria with
anti-fungal properties. Bioinformatics and high performance computing
play a central role in this research, especially in genome assembly and
analysis of massive nucleic acid datasets. We have been involved in
large-scale genome sequencing/sequence analysis projects that have been
published in journals such as Nature, Nature Biotechnology, and Science.
Increasingly, Dr. Peterson’s research
has focused on the use of computational biology techniques to distill
biological information from large, complex datasets. REU students
working in the Peterson lab will be trained to use high performance
computing (HPC) instruments to assemble and annotate genomes sequenced
by his research team. Training will be tailored to each individual
trainee based upon his/her familiarity, if any, with UNIX, HPC, and the
genome/organism assigned. After becoming proficient in UNIX,
undergraduate trainees will be taught to use modern open-source genome
analysis scripts to explore test data sets. Once proficiency using the
scripts has been demonstrated, trainees will be assigned a previously
uncharacterized DNA sequence dataset to assemble and annotate. The
sheer number of genomes sequenced by the IGBB (ca. 30 per year) means
that there is no shortage of data for characterization/study.
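Once a dataset has been assembled, a common first check on contiguity is the N50 statistic. The sketch below computes N50 from a list of contig lengths. It is an illustrative example only: the description above does not say which assembly metrics or scripts trainees will use, and the contig lengths are made up.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical contig lengths (in bases) from a small test assembly.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

N50 rewards long contigs but says nothing about correctness, so it is usually read alongside other checks such as total assembly size and gene content.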
High Throughput Maize Genomics
In the Corn Host Plant Resistance Research Unit of the USDA Agricultural
Research Service, Dr. Warburton investigates the genetic basis of
aflatoxin and A. flavus resistance in corn using genetic and genomic
tools. We are currently working to identify and validate genetic
sequences associated with resistance to the toxic fungus Aspergillus
flavus. In the course of genetic analyses, the lab generates very large
amounts of sequencing data. This data ranges from high coverage but low
depth (Genotyping by Sequencing data, GBS) to low coverage and high depth
(one gene sequenced multiple times in multiple individuals). This data
must all be stored, retrieved and analyzed as efficiently as possible,
and re-analyzed as new information comes to light. Changes in DNA
sequences are correlated with changes in plant phenotypes via genome-wide
association studies (GWAS), candidate gene association analysis,
and linkage mapping. The data storage and retrieval is computationally
intensive, and help is often needed with programs to find the exact
sequence variation we need as reliably and quickly as possible. This is
typically used for association or linkage mapping studies.
In addition, we are beginning to work with RNA and gene expression
data. Beyond the simpler changes in genetic sequence that we
have dealt with using DNA sequences, there is an added component: the
number of copies of each RNA, or expression level, becomes very
important and must also be stored, retrieved and analyzed for each
unique sequence. The added data associated with each genetic sequence
requires computational skills to handle. Expertise in both databasing
and programming is very useful in this, and biological understanding
helps to ensure data retrieval and analyses are working correctly. There
are many currently available online resources for DNA and RNA
pipelines, covering data generation, storage, alignment, and analysis, but
for any given project and species, these pipelines almost always need
tweaking and manual curation.
Genomic Dynamics of Populations
Dr. Welch is an evolutionary geneticist with two distinct research foci
at present. The role of transcribed microsatellites as agents of
adaptive change is being studied using the annual sunflower, Helianthus
annuus, and RNAseq based methods. He is also investigating the
population dynamics of small populations using Caribbean rock iguanas
as a model system. Projects for undergraduates will be designed to both
generate usable data and serve as complete introductions to
hypothesis-driven research. Projects will be focused on understanding the role
microsatellites play in generating gene expression variance, and how
that phenotypic variance is influenced by selection. For example,
students involved in screening microsatellites for amplification and
variability in sunflowers will be testing the hypothesis that
transcribed microsatellites are under greater evolutionary constraint
than are anonymous microsatellites. The prediction that follows from
this hypothesis is that anonymous loci should harbor more variation
than those that are transcribed.
Some students will learn how to perform fragment
analysis, and basic computational biology associated with population
genetic studies. Students collecting data on seed set and seed mass
could test the hypothesis that variance in reproductive success varies
across multiple populations. In this way, students are involved in
meaningful research, and they are introduced to the entire process of
science from hypothesis development to reporting. Students with more
advanced computational skills will be afforded the opportunity to
develop bioinformatics projects using our RNAseq data. For example, one
REU student in the summer of 2015 studied variation in transcriptomes
by comparing sequence similarity across six individuals. She followed
up by demonstrating that transcribed microsatellites tend to be found
in genes that are consistently transcribed rather than in spurious
transcripts that are unique to individual sunflowers. She concluded
that functional genes are relatively enriched with microsatellites.
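A bioinformatics project of this kind typically begins by locating microsatellites in transcript sequences. The sketch below does this with a regular expression that looks for a short motif repeated in tandem. It is a minimal illustration, not the lab's method: the function name, the motif size range (2 to 6 bases), the minimum of four repeats, and the example sequences are all assumptions.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, motif, repeat_count) for tandem repeats in seq.

    Thresholds are illustrative defaults, not values from the lab.
    """
    # A motif of min_unit..max_unit bases (shortest tried first),
    # followed by at least min_repeats - 1 further copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        # Skip single-base runs such as AAAAAAAA, which are homopolymers
        # rather than di- to hexanucleotide repeats.
        if len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical transcript fragments.
print(find_microsatellites("TTACACACACACGG"))
print(find_microsatellites("GGATCATCATCATCTT"))
```

Running such a scan over transcribed and non-transcribed sequences gives the per-locus repeat counts needed to compare microsatellite content between the two groups.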