Katherine Pollard: “Massive Data Sheds Light on Your Microbiome” | Talks at Google

PAUL LI: Hi. Good morning, everyone. Thank you all for coming. My name is Paul Li. I’m a science administrator
at the Gladstone Institutes and a faculty member in
the cognitive sciences at UC Berkeley. Today we’re going to kick off
our new initiative coming out from the Gladstone Institutes
called Open Classroom Talks, where we send our
scientists outside to Bay Area communities, such
as Google, to share their interdisciplinary
research. So to start off our
Open Classroom Talks, I’d like to introduce
you to our guest speaker for today, Dr.
Katie Pollard, who is a senior investigator
at the Gladstone Institutes and a professor
in human genetics and biostatistics at UCSF. She received her Ph.D. from UC
Berkeley, where she developed computationally intensive
statistical methods for analysis of microarray
data with applications in cancer biology. For her post-doctoral
work, also at Berkeley, she developed Bioconductor
open-source software packages for clustering and multiple
hypothesis testing. Katie was also part of the
chimpanzee sequencing analysis consortium that published a
sequence of the chimp genome, and she used the
sequence to identify the fastest-evolving
regions in the human genome. In 2005, she joined the faculty
at the UC Davis Genome Center and the Department of
Statistics before moving to UCSF in fall 2008. So without further
ado, please join me in welcoming Dr. Katie
Pollard, everyone. [APPLAUSE]

KATHERINE POLLARD: Hi everybody. It’s an honor to be here today. I thought I would start–
you just heard who I am. I thought I’d start by
asking a little bit about you guys and what brought
you to the talk today. So how many people here are
programmers in the room? All right, a bunch
of programmers. How many of you work
with big data sets? How many have a biology
interest or background? A few? OK, great. And how many of
you are interested in your own microbiome,
like you came because you’re curious about yourself? OK, good. All right. So there’s a lot of reasons
to be here and learn something at this Gladstone
Open Classroom. I’m really pleased to be here. As Paul mentioned
from my background, I haven’t been working
on the human microbiome since undergrad or
a long time ago. I was working in human genetics,
and in particular asking, what’s unique about
the human genome? What makes us different from
chimpanzees and other animals? And I made some really exciting
discoveries in that realm, and I was working
really hard to use comparative and
computational approaches to decipher human DNA. And there’s a chunk of my
lab that still works on that, and it’s a great
interest of mine. But about a decade
ago, a colleague who’s a microbiologist said to
me, sort of prodding me, he said, you know, most of
the DNA in the human body isn’t encoded in
the human genome. And I was like, yeah, whatever. And he said, no,
really, it’s not. Most of the DNA
comes from microbes. And I eventually believed him. But then I thought,
well, they’re just kind of along for
the ride on our body. This isn’t really significant. And I’m studying
the human DNA, which is going to explain
who we are and why we’re different from each
other and why we get diseases. But slowly, the more
I looked into it, I realized that the
facts really support the idea that the human
microbiome not only is a large contribution to
our biological makeup, but that it’s not just
a passive contribution, that the microbial cells in our
bodies, which are bacteria, and there are microbial
eukaryotes like yeasts. There are archaea, which are a
sort of underappreciated part of the microbial tree of life. And there are also viruses
that don’t have cells at all, but are just
molecules in our body. All of those organisms
together are not just along for the ride. They’re interacting
with our cells and forming essentially
additional organs or parts of organs in our body. And the genes encoded in
their DNA are making proteins, and they’re making the majority
of the proteins in our body. And since proteins
are the things that are the basic building
blocks of life, it would be completely
irresponsible to not be looking at the
microbial DNA to try to understand human biology. Now, the first
bullet point there, about microbial cells
outnumbering human cells by 10 to 1, there was actually,
if anyone caught it, an article just
this week saying that that estimate might be
off by an order of magnitude. And so maybe about
half the cells are human and half
microbial, not 10 to 1. But either way,
there’s certainly a lot of microbial cells
in our body– trillions. So more than the dollars
in the national debt might be an example. So that’s a very big number. And because they’re not
all– our cells in our body, all the trillions of
cells that are human, have the same DNA in
them, modulo cancer and some occasional other
mutations that happen. But the microbial cells
are all different types. They’re different
organisms, different species from one another. So they have a lot
of different genomes. And that’s why, regardless
of whether the cells are approximately equal or 10 to
1, the genes, the pieces of DNA that encode proteins, really
are primarily microbial, regardless of the cell count. So we have these
microbes in our bodies. And so why was I so
oblivious to this when I was trotting along
looking at the human DNA, and why is this sort of a
hot topic all of a sudden that we’re seeing in the
“New Yorker” or other news that we might read? The microbes have been there. They’ve been evolving
with us as a species. They’re on plants. They’re on other animals. So why were we kind of
not so tuned into them? Well, it seems a bit surprising,
because people have been aware of microbes for
hundreds of years. But until about 10
to 15 years ago, it was impossible to study
most of the microbes, because the classic
approach was to extract them from their natural environment,
which in this case is our body, and try to grow them in
these dishes in the lab and try to figure out what
they are and what they do. But the vast
majority of microbes that live in the human body, and
in fact, in most environments, can’t survive in
the lab environment, because the conditions
aren’t right, and because they
can’t survive alone. This approach takes
one type of microbe and tries to grow
it in isolation, and these communities
are made up of different types of
microbes that are all interacting with each other. So trying to grow them in
this way just doesn’t work. So they’re not amenable
to this direct study by microbiological techniques. But the advent of
two technologies that are the absolute sort
of basis for all the work that we do in my
lab have enabled us to get over
this hurdle, which is that we can footprint
or measure what’s there by looking at their DNA. And so that is possible because
of next-generation or low-cost high-throughput sequencing
that you can extract DNA from the community
without the cells needing to be able to grow in the lab. You can just directly assay it. So if I wanted to study your
microbiome here in the room today, I could pass
around a test tube and have you spit
in the tube if I wanted to study the microbes in
your saliva or in your mouth. I could collect a stool sample
to study your gut microbiome, or I could actually go in
and take some of your skin or take a biopsy. Some of those are more
invasive, but as you can see, it’s pretty easy to just extract
some bodily fluid or tissue, and then I can extract
DNA, sequence it. And then the other
technology that’s essential for this work is
high-performance computing, which should be no surprise
to a Google audience. But the data sets from
a single saliva sample could be up to a few terabytes. And if I want to
analyze a bunch of these and compare them to each
other and do modeling, I need high-performance
computing. So because of low-cost
high-throughput sequencing and advances in
computing, it’s become possible now to study microbial
communities in the human body, and also in natural
environments. And the basic
questions that one wants to ask in any of these samples
are, who’s there? So what types of
microbes are there? And we call this a
taxonomic profile. And also, and
perhaps more importantly, what are they doing? So what genes, what proteins
are encoded in their genome, and what does that
say about how they might be interacting
with our human cells or affecting our health? So an example would be that
microbes in your microbiome might encode an enzyme
that would metabolize a drug that you take. So we are probably all aware
of the idea of personalized or precision medicine. This has been in the
news a lot, and the idea is that when you go into the
doctor’s office and a drug is prescribed, that the
type of drug or the dose might depend on your
own genetic makeup, that different people
respond differently or need different doses or have
different adverse reactions that other people don’t have. Well, your microbiome is
also involved intricately in metabolism of a
drug, and so knowing that you had a
microbe that was going to inactivate a
particular drug would be really helpful information. It would also be helpful to know
if your microbiome was going to extract more energy or
less energy from your food than somebody else’s, or
make your gut membrane more permeable or cause more
inflammation or less inflammation, or just be a
signature for where you grew up or where you came from,
or how you were raised, or how many times you’ve
ever taken antibiotics. Does it encode
antibiotic-resistance genes? So there are a lot of
really interesting questions we can ask if we know who’s
there and what they’re doing. So I should have said
at the beginning, also, I’m happy to take
questions during the talk. If anything’s not clear,
please interrupt me. I don’t want this to be formal. And I will also take
questions at the end. So how do we do this? Since
there are some data scientists and programmers in the room, I won’t go into a lot
of technical detail, but I wanted to give you
some of the detail of what this computation looks like. So the data set, as I said, can
be a huge file or a database– terabytes of data. So what is it? These are DNA or RNA or
proteins, different kinds of biomolecules. But let’s just take
DNA as an example. And we read sequences,
little fragments from the genomes of
all the organisms that are in the sample. And those fragments
would be typically on the order of 100 base
pairs or a couple hundred base pairs– base pairs being
A’s, C’s, T’s, and G’s that encode genetic information. So one sample, like
your saliva sample that I would collect
in a test tube, we might obtain on the
order of, say, 50 million of these 100-letter-long
sequences from it. So that’s the file or the
database that I would create, and that’s what I’m calling
the metagenome here. These little lines
represent the sequences. And there’s not 50
million lines there, but it’s just a representation. So if I want to
know what’s there and what functions are encoded
or what genes those fragments came from, the basic
thing that we do, and this is sort of– there are
many variants of it, but the basic idea
is we compare that to database of sequences
that we’ve seen before, either directly to the
sequences or to a model, which is a more sensitive
way of comparing the sequences– a model of
what certain gene families look like. So we can do a
probabilistic comparison to the model or a
direct comparison. And my lab didn’t invent
sequence alignment or what we call here
homology search– so seeing if a sequence looks
like it’s similar enough to say it’s the same as
one of these sequences– but we spend a lot of time
tuning those algorithms so they work with these really big data
sets of little fragments that come– unlike other
types of genomic data, they come from lots of different
organisms, many of which we’ve never seen the
genome of before. So there won’t be any perfect
matches in the database. So since a lot of people here
probably think about search, that’s kind of the challenge. Yeah.

AUDIENCE: Just a quick question on that. When you–

KATHERINE POLLARD: Yes.

AUDIENCE: [INAUDIBLE] in which ones, for example.

KATHERINE POLLARD: Yeah, so for
the protein sequence databases, there is at the NCBI a
non-redundant sequence database that has
everything that’s ever been publicly deposited,
and with direct exact copies deleted. So non-redundant means
one copy of each. There’s also genomes
that we can download, so we can align not to gene
sequences but to whole genomes. So we have all the genomes. There’s about
30,000 genomes that have been sequenced so far,
most of which are microbial. Those are available
through NCBI. A slightly curated
version of them is available through
a database called PATRIC that’s hosted in the UK. There’s data
resources in the US, mostly through
NCBI, and in Europe mostly through the European
Bioinformatics Institute or EBI. They have mostly the
same data in them. There are some smaller
sort of niche databases. And in terms of
annotation, there’s a bunch of functional
annotations that get laid on top
of these sequences, such as the pathways
that they’re in. KEGG is an example of a database,
the Kyoto Encyclopedia. We also have a
lot of annotations that would be layered
on top of the sequences. So after we find a
match, we can see what’s known about that sequence. Does that answer?

AUDIENCE: Yeah. Just one quick follow-up.

KATHERINE POLLARD: Yeah.

AUDIENCE: Is that a problem that there’s so many databases, and is there any effort to unify [INAUDIBLE]?

KATHERINE POLLARD: Yeah,
it can be a problem, because each one is quite large. And if you want to
host this locally, it can be kind of an
unbearable amount of data. There are differences
in results if you use one database versus another. We had a paper this
past year showing that the– we studied
some ocean data and showed that you
come to completely different conclusions
about photosynthesis, and it’s like at
different times of day, at different times
of year, based on which gene database you use. So it is a problem. There is a group called the
Genomic Standards Consortium that’s attempting to establish
some standards in this field. And actually the White House
is really interested in this, so I made several visits to the
White House in the past year to talk about data
standards for this field, because it’s a little
bit Wild West right now.

AUDIENCE: [INAUDIBLE] GA4GH?

KATHERINE POLLARD: The what?

AUDIENCE: The GA4GH, Global Alliance for Genomics and Health.

KATHERINE POLLARD: No, actually. So this is specific
to microbiome. It may be kind of
connected with that, but that would be a
broader initiative that would cover other types of
data, not just microbiome data. This was specifically
around the microbiome, and it was called the
Unified Microbiome Project. But a lot of them
have the same people, and there’s just overlapping
people in discussions. So yeah, the data curation
is a really important aspect. And we’ve spent
some time– I think we’re one of the
only labs that’s stopped to ask if it actually
matters which database you use, or exactly what
runtime parameters you use for these BLAST searches. So if you change your cutoffs or
your preprocessing of the data, we’ve seen that can
have a huge effect on the biological conclusions. And that’s been, I think,
underappreciated in the field. So we’ve been kind
of a bit curmudgeony, but spending a lot of
time sort of figuring out how these choices in
the analysis pipelines affect the conclusions
that someone makes. And we can do that with either
gold-standard data sets, where we’re pretty sure
we know the answer, or we also do a
lot of simulations. So we take all the genomes
that are out there, we create metagenomic
communities, we fragment them up and simulate
the sequencing experiment on the computer, and then
since we made the community, we know the answer and we
can see in which situations you make false conclusions
or overestimate things. So it’s not perfect, because it
doesn’t capture all the noise in real life, but
it’s nice because we can see where things
break down by trying a lot of different scenarios. So the basic output is a
count of the number of reads that hits a taxonomic group,
like a particular type of bacteria or a function,
a protein family. And part of this sort of
rigorous evaluation we’ve done has shown that you have to
normalize in a variety of ways. So a long gene, just
by chance you’ll get more sequences from it. But it doesn’t mean there’s
more copies of the gene. It just means it’s longer. That’s a really simple
example, but there is a lot of other biases
in these experiments that need to be accounted for,
for accurate quantification. Accurate quantification
is important if you want to compare
across samples. Maybe within a sample
it’s not so important, but if I want to
compare my microbiome to a bunch of other
people’s, or do a big meta-analysis of publicly
available data, I need these measurements
to be meaningful, biologically meaningful and
comparable across samples. And that’s a huge problem
in the field right now. And we have to point that out. So one of the more often
touted associations between the microbiome
and human biology is a reported association
with human body mass index. So an argument was made here
under where it says Ley, and you see the purple bars. That was an early study with
19 people in it, 12 of which were obese, seven
lean, and there was more of a particular
phylum of bacteria called Firmicutes, which
is the x-axis, proportion of the sequencing library
that was Firmicutes. There was more in
the obese people, so it was concluded that the
microbiome was associated with obesity, and
perhaps even playing a causal role in
obesity, or mediating effects on obesity. And then, in contrast, we did
a meta-analysis about two years ago now where we looked
across different data sets that either were designed
to look at obesity or weren’t, they were just
random samples of people, but they had lean and obese
individuals in the study. And what we found
was if you don’t do a careful job normalizing
the data, and even if you do, there’s some unaccounted
variability that actually results in two things. The first one is that you don’t
see a consistent association with obesity. In these data sets
there actually isn’t more Firmicutes in
the obese people in general, especially as the
study sizes get larger, like they are on the other side,
in the pink and green bars. But more upsetting to me
was that this biomarker for obesity, the
amount of Firmicutes, varied more between
the studies than it did between the lean
and obese in each study. And so something– and
this isn’t a very sort of fine resolution measurement. Firmicutes are one of the
two most prevalent phyla of bacteria, phylum being
the highest taxonomic level. So a lot of the
sequencing– there were millions and
millions of sequencing reads supporting these estimates
of the amount of Firmicutes, and yet they still got in some
studies like 20% Firmicutes on average across
lots of people, and in other studies 80%. So there are a number
of technical reasons why that might be. There could be also
biological reasons, like these are different
study populations, adults living in different
parts of the world. Some of them are college
students, maybe different diets or lifestyles. So some of this could be
real biological differences between the populations,
and some of it I’m sure is also technical differences
in how they did the experiment. So there is a bunch
of issues that I’m not going to go into
in detail, but just something that might be
interesting for you to know is when you see a headline, the
microbiome causes this or is associated with that, there is
I think in the field right now a lot of bias still,
and a lot of– I’m afraid that a lot
of those results aren’t going to replicate when
someone does another study, that people are
sort of overfitting data in a particular
study, or data that has a particular bias in how
the measurements were made. And this isn’t so shocking. This is a new technology,
there’s a lot of excitement, and there’s a lot of real
promise around it as well. But with any new technology
that comes online, I think it takes a little
while to figure out what all the hitches are. And I think we’re
still figuring that out with microbiome
experiments right now. But I’m not totally pessimistic. Our lab and a number of
other groups around the world are working hard to
solve these problems. So I think there is a lot of
promise for this approach. But I also think
there’s a lot of sort of noisy and contradictory
data out there right now. So luckily, yeah,
some of these problems that we observe in the data
can be modeled and corrected for in the analysis. So we generate a little
bit of this kind of data in my own lab,
but the main thing we created our research
program around is data mining. So a lot of people
deposit not only genome sequences
for microbes, but these metagenomic
experiments where they’ve sampled people
with different traits from different
parts of the world, and also non-human primates or
other organisms’ microbiomes. And most of this data
is publicly available. So the details don’t
matter, but this is just a list of a bunch of
the larger studies. And this isn’t all the
data that’s out there. We have at this point close
to 2 terabytes of data on our lab server
that we’ve downloaded from public resources. But for these
particular comparisons, we grab a certain
subset of the data sets. And this can be on
the order of sort of tens of terabytes of data. And you can see that the
people who have been studied are from a number of
different countries. And excitingly, and I’ll show
some of this later in the talk, like any new sort of
field of biomedicine, at first most of the
people who are studied are North Americans and
Europeans, and maybe someone from Japan or China. But there have been a few
studies in individuals living in nonindustrialized
countries that are revealing some really big
differences in the microbiome that I’ll touch on. So with all the caveats
of issues of data quality and normalization in mind,
I want to nonetheless share with you some of the
things that our field has discovered about the microbiome. Some of these are our work
in my lab and some of them are work of other colleagues. So what’s come to light,
and perhaps some of you know this already, that
one of the strongest drivers of differences
between microbiomes between different
people is what you eat. So what we put into our
bodies is the energy source also for our microbiome. And so depending on diet,
we see large differences. And these are pretty
massive differences, like on the scale
of what I showed you of those differences
between studies in the amount of Firmicutes. Someone could have
20% and someone else could have 80% of their
sequencing be from Firmicutes. And that can be flipped
within about 24 hours by changing what you ate. So it’s not what you ate 50
years ago, for the most part, but what you literally just ate. So a friend and colleague
at UCSF, Peter Turnbaugh, has done experiments
where he switches people between different diets
and sees these rapid shifts. Others have looked in
infants, and there’s some differences between
breastfeeding and formula– also between caesarean
and vaginal birth, because you pick up–
the womb is sterile and you get your initial
inoculum during birth. That can affect you
for several years, and perhaps have long-term
effects on your health. Switching between a
vegan and a meat diet has a huge effect on the
composition of your gut microbiome. And those effects are
seen quickly, within a day or so of shifting, and
can be shifted back by just changing the diet back. And despite the sort
of inconsistent results we see when we just
sample people and ask if their microbiome is
associated with their body mass index, so how obese they are,
there is some
relationship, because you can take the microbiome
of an obese person, or an obese mouse that
was created genetically, in the case of the
mouse, or by diet, you can transfer fecal
matter to a mouse that doesn’t have a microbiome,
that’s been raised germ-free, and you can induce
some of the aspects of obesity, like some weight gain and
some metabolic differences. So these shifts
in the microbiome that I’m describing
in response to diet can have physiological
effects and could in theory affect
something like obesity, even though we don’t see a
simple signature for that when we look at the whole data set. So there may still
be– I’m not proposing that there’s no
relationship with obesity, because in an experimental
system we can control it. But out in the
population, there’s too many other things
going on to see that kind of association. The microbiome isn’t
just responding to our diet for its
own sake, but it’s also producing a lot of
molecules from our diet to help us to harvest
energy, synthesize vitamins. So a bunch of things that
happen during digestion, or as I mentioned earlier
in metabolizing a drug, a bunch of those things cannot
happen without the microbiome. So we’re working on
a study right now where a very toxic metabolite
that causes liver damage can be eliminated
by just changing the microbiome, essentially. So it’s
not just their genes, but the byproducts
of those genes, of the pathways that
those genes encode– a lot of the
molecules in our body that cause health
or disease are being produced by the microbiome,
sometimes multiple microbes. So to make that end
product, say, an anti-inflammatory, might take
several different microbes contributing, and maybe
some human genes too. So it can be quite
complicated– not as easy as remove this
one microbe or add this other one, in many cases. And the gut microbiome,
which is the main source that– we have microbes
all over our body, on our skin, in our ears, in
our noses, and pretty much even parts of our body we thought
were sterile previously, like the heart. But the biggest chunk of
the microbes in our body is in our gut. By weight, most of us
have about six pounds of microbes that
are in our body, and like four pounds or so of
that is the gut microbiome. But it’s not only
important for digestion. The gut microbiome
is also communicating with your immune system,
hormones, and in some work I find really fascinating,
through the vagus nerve, which is how our brain controls
how our gut works, and communication between
our gut and our brain. Through the vagus
nerve, the microbiome can communicate with our brain. And that communication
is associated with things like
mood, depression, and certain behaviors,
and perhaps, there’s some new evidence, also autism. So the gut microbiome
isn’t just about the gut. It’s also affecting a lot of
other systems in our body. And so not surprisingly,
there is every week a paper coming out now
showing that the microbiome’s associated with a
different disease. As I mentioned, I’m
fearful that some of those may not replicate. But nonetheless,
there are a number of associations that have been
observed in large data sets and have replicated. Associating the
microbiome– so saying, OK, sick people tend to have
these types of microbes or these types of
microbial genes, is interesting, but it doesn’t
really get at cause and effect. So one of the big
pushes in my lab right now is to ask why
the microbiome’s different in disease. Was that a response
to the disease state? Or was it actually
playing a causal role? And to do that, we have
to do longitudinal studies and look at people over time,
or do studies in the lab, for example, with mice or
other models of disease, and ask whether the
microbiome changes first and then you start
seeing the disease, or if it’s the other way around. And can you cause
the disease just by changing the microbiome,
or alleviate it? And the same thing
with metabolism of a drug or any other trait
that you might want to look at. The promise here,
which I think is really interesting
and really amazing, is that because the microbiome
can change so easily– I mentioned going from
a meat to a vegan diet can change the microbiome
within 24 hours– because it’s so
easily manipulated, it could be a much
better therapeutic target than trying to target a human
genetic cause of disease. So gene therapy has
had some success, and perhaps will have
more in the coming years with genome
editing techniques that have recently
been developed, but it would still be pretty
challenging to go in and edit someone’s human genome to
fix a genetic predisposition to a disease. But if you could just take a
particular yogurt or probiotic or change your diet and have an
effect on the microbiome that would change your
disease risk or alleviate a bad response to a drug, that’s
something that can much more easily be manipulated. That’s not very invasive. So there’s a lot of promise. And also, just traditional
drug development, like taking a
protein and have it change how some protein
in the human body works, we can also do that to
target microbial proteins. So without even
changing the community, we could also drug the
microbiome, essentially. So I think there’s
a lot of potential. But what I want to talk about
for the next part of the talk is a different idea, which is
that the microbiome isn’t just causing disease, but is
actually protecting us from disease– that
there’s a lot of components of the microbiome
that are healthy, and that disease might be
caused just by losing microbes, not by gaining bad ones,
but losing good ones. So there’s a hypothesis that’s
been around for a long time. I didn’t come up with this,
but I’m very interested in it. And the idea is that we
evolved over millions of years with a lot more
microbes than most of us have in our bodies now, because
modern life has eliminated a huge component
of the microbiome. So it’s like removing an organ
from our bodies, essentially. And we’re all living
still, but there are a number of diseases
that have sort of rapidly and inexplicably
risen in abundance in industrialized countries. And the strongest example
are the autoimmune diseases, including things like
allergies and asthma, inflammatory bowel
diseases, MS. So all these autoinflammatory
diseases, it’s hypothesized, could be the result of
losing a big component of our microbiome. And I’m really interested
in this scientifically, but also personally, because I
have two autoimmune diseases. I’m a patient. And there aren’t really any
good explanations for my disease or any– there are sort of
things to help me get by, but there aren’t
really any cures. And so I’m really also
as a patient excited about this line of research. So we’ve started to
do some work looking at the microbiomes
of people who aren’t exposed to as much antibiotics
and such a clean lifestyle. So this is a picture of students
who went to South Africa, and we actually piggybacked
on a project that was there to study the human
genetics of KhoeSan hunter gatherers. So these are people living in
South Africa in a lifestyle like our ancestors had thousands
and thousands of years ago. They don’t have routine access
to medical or dental care. They eat a really
different diet than someone in an industrialized
country, they don’t brush their
teeth, et cetera. So this team from
Stanford, friends of mine, were– Carlos Bustamante’s
lab and others were going over to study the human
genetics of these people, and the student who was
sort of shared with my group came to lab meeting and
said, it really sucks, we have all this bacterial
contamination in our samples, and some of them it’s like
85% of our DNA isn’t human, it’s bacterial. And I said, don’t
throw that away. I want to look at it. So a postdoc from my lab got
busy with the contamination. And what we found was that
some of the most abundant– not surprisingly, but
we found that some of the most abundant organisms
in these KhoeSan people, and I’m sorry the data
isn’t really easy to see, but the result is
stated in words there. We found some of those
most abundant organisms in these hunter-gatherers
are rare or completely absent from people in the study I
label HMP, the Human Microbiome Project. That was hundreds of individuals
living in the United States. So this was our initial foray. And then as I noted
earlier, there have been some other
people, friends, who’ve studied people in
other parts of Africa. And there’s been a
great study in Peru of two different groups,
one of hunter-gatherers like these KhoeSan people, and
another of agriculturalists, neither living a very
modern lifestyle, but different lifestyles
from each other. And they’ve generated
a ton of data. And at this point
there’s over 3,000 of these shotgun
metagenomic experiments from people from different
parts of the world. The majority are from China,
North America and Europe, but we’re now getting some data
from other parts of the world. And what my lab has done
is a big meta analysis, so we downloaded or
mined all this data and then analyzed
it together, not the way it was
necessarily analyzed in the original
studies, but in bulk to deal with a lot of
the confounding issues that I talked about
earlier, and then asked how the microbes look. Are they the same
in different people, or do they look different
depending on where in the world you are from? And we did this for each
of every common microbiome constituent. So here’s one example. This is eubacterium rectale. It’s common in the
human microbiome. And there’s a map there
showing the sampling locations with the little
Google dots on it. And there’s a phylogenetic
tree that I just overlaid on the map. The exact position on
the map doesn’t matter. This is just a visual
representation. But the colors of
the dots represent where the person came from
whose microbiome was sampled, and the tree is showing
how related they are. So if things are close
together in the tree, then those two
people’s microbiomes were really similar, and not
overall, but specifically for this species of bacteria. So for this species, we
see a lot of structure. For example, the
yellow dots over there, which are all
Chinese individuals, form their own
clade in the tree. So everyone in China has
a Eubacterium rectale that’s more similar to
other people in China than to anybody
in another place. The individuals from
the southern hemisphere are pretty different
from everybody else. And the North American
and European clades are mixed together. So there’s a lot of exchange
of the different strains between Europeans
and North Americans. There’s sort of different
groups there, but if you look, they’re kind of a mix
of people from Europe and people from North America. So that’s what this
microbe looks like. But then we go to a
different species, this is Bacteroides
uniformis, and it wasn’t found at
all in the people in the southern hemisphere. So the Africans or
the South Americans, it wasn’t present as far
as we could tell in them. But if you were from North
America, Europe, or China, you just had kind
of a random strain. So these strains were all pretty
closely related– not that closely related, but they didn’t
form any geographic– there was no geographic pattern at all. This thing gets around,
this particular species, and when a new strain
arises in it, at least in the northern hemisphere,
it spreads all around to different peoples. So if you’re living
here in California, your strain could
look the most– the person in the world with
the most similar strain to you could be in China. But that isn’t the case
for this other microbe. That would be highly unlikely
that someone in China would have the same
rectale as you. So we thought this was
really interesting, that there was a geographic
signal for some microbes and not for others. And we’re also now looking
at non-human primates. So sort of back to my
original research program of humans versus
chimps, we’re looking in chimps and gorillas and
baboons to see how they fit in. And so far it looks
like they have pretty different
microbiomes than we do, and that the people in the
non-industrialized countries look more like the
non-human primates. So that’s probably like
our ancestral microbiome. And then those of us in the
more industrialized countries have a divergent
microbiome that’s changed from an ancestral state. So that’s our
hypothesis right now, and we’re continuing
to work on that. So just the last thing
I wanted to mention is that our main focus
is the human body, but microbes are the
major constituents of most ecosystems. And there is some
work in my lab that’s looking in natural
environments as well. And even less is known about
the microbes that are there. So actually, saying that 1% of
the microbes in the human body have ever been studied by
traditional microbiological techniques, it’s less than that
if you go out and sample out in the San Francisco
Bay or on the windshield wipers of your car, or out
in the desert somewhere. And yet the microbes are
playing really important roles in ecosystem services,
in global warming, and also they help to mediate
things like oil spills. Some microbes can
eat toxic things and totally help out with
these natural disasters. There’s microbes that
are still chewing away at all the mercury that’s
up in the Sierra Nevada from mining during the gold
rush and slowly degrading a lot of the mercury
and other heavy metal toxins that are up there. So it would be helpful from
an ecological perspective to also know who is there
and what they’re doing. So these same tools
are being applied to natural environments. And just really briefly,
I wanted to show you, I won’t talk too much
about the modeling, but the challenge–
the human body is complicated enough to sample. Something like the
world’s oceans is massive. Microbes are tiny. The oceans are massive. If there’s trillions of
microbes in each of our bodies, imagine how many are
in the world’s oceans. So we have sort of sparse
sampling of what’s out there. And one of the approaches
my lab has used is to try to build
models to predict what’s going on in places
where we haven’t sampled yet. So a couple years
ago we had a paper where we used this technique
to reconstruct an essentially extinct ecosystem called
tall grass prairie, which used to cover the Midwestern
United States before so many people were there, and
before agriculture, monoculture basically destroyed
this ecosystem. And we did that by sampling in
graveyards where the tall grass prairie hadn’t been disturbed,
and in some nature preserves, the only places where the
tall grass prairie still exists, built a model for
which kind of microbes you find in which
kinds of soil, and then used what we think that the
climate and other properties of the soil were
before agriculture, and predicted what microbes
you would find where. And this is just a heat map
that’s showing the diversity. So the warmer colors would be
like a more diverse community of microbes would be found in
certain parts of the Midwest compared to others. So we did this modeling. We’re also thinking forward. So if you can build
a model like this, you can predict for soils
or air or the oceans what the microbes will look like
in future climate scenarios. So we’ve been doing
a lot of that. And it looks like
as the Earth warms, the microbes are
going to like it and we’re going to get
more diversity across most of North America, Tibet, a
lot of the world’s oceans. So that’s something
to keep in mind. And one particular
health consequence is that particular microbes
that have effects on humans, such as fungal allergens in air,
may change as climate changes. So this is a study where
we sampled on the doorsills of people’s houses. It’s sort of a great
passive collector for what’s in the environment, because
most people don’t clean on the top of their doorsills. And it was called the
1,000 Homes Project. We had a bunch of collaborators. But anyway, it was a
citizen science project. People sampled
their own doorsill and then sent the samples
in and we sequenced them. And a lot of things were
discovered in that data, but this analysis focused
on fungal allergens. So you can see if
there is DNA from fungi that are known allergens. And so we mapped where
the allergens were found across the United States. And then in these pictures,
what we did was we plugged in values for
future climate scenarios. And what this is
showing is the change in the amount of microbes. So for this particular– I’m
just showing a few examples, but for this Epicoccum
nigrum, which is a pretty wicked
pathogen and can kill immunocompromised
people, we see in red– red would be an
increase in the amount of it. So California is going to get
a lot more nigrum in the future than it has now, as
well as the South. Other places might get
less, as shown in blue. And then this other
species, Alternaria, is mostly decreasing,
but it looks like Florida is going to
get pretty hit by this one. So we can make those kinds
of predictions as well. So to wrap up, getting
back to the human health angle and our own biology,
just a couple summary points about what the
field has learned so far. I feel like it’s
just very early days for this field of research. There are a lot of
technical problems we’re still trying to overcome. But we’ve already observed that
we can’t explain human biology completely by just looking
at our own human DNA and that our individual
microbial communities are certainly playing a
complex role in our health and our normal day-to-day life. And what makes us human
or makes us who we are isn’t just our own
DNA, but also the DNA of the particular microbes
we are carrying around at any given time. This is not fixed. We’re not stuck with
it for our whole life. It’s manipulatable. And so it’s likely that
health care in the future will actually be leveraging this
to diagnose you and potentially treat you as an alternative to
more invasive things like drugs or surgery. So that’s where we’re
looking towards the future. I think there’s
a lot of promise, but I also think there’s
a lot of challenges. So I’ll be really
excited for collaboration with anyone here who has ideas
about how to work on the field, and I’m very happy to
take any questions. So thanks. [APPLAUSE] AUDIENCE: Hi. Is there a certain aspect of
the data that is less confounded or has less technical
problems in current days? Like if I would go
read papers now, what would I believe
more than not? KATHERINE POLLARD:
That’s a great question. So let’s see if I can’t
explain what I think one of the biggest problems is. It’s pretty technical,
but let me give it a shot. So the common thing
that someone does, let’s say to say how
much of Bacteroides uniformis do you have,
to quantify that, is that they look
at the 50 million sequencing reads and see
how many of those align to uniformis, maybe you
normalize that count a little bit, and then divide
by the total sequencing library. So it’s the proportion
of your sample that came from that bacterium. And unfortunately, what
we really care about is something about
the community, like what proportion of the
cells were of that type. And it turns out the
proportion of the sequencing reads isn’t a very good estimate
of the proportion of the cells. And I’ll try to explain a
couple of the reasons why. One is that different
organisms will have genomes of different sizes. So something with a really
big genome, like a yeast, like a fungal
allergen, could take up a whole bunch of your
sequencing library, but there might only
be a few cells there. The human DNA is in there. So I mentioned that
the student thought the bacteria was a contaminant
of his human genetic study. We think of the human
DNA as a contaminant of the microbial study. And it can be up to
half of the sample. So if half your library
just went to human, that can kind of
skew the ratios. So as the genome sizes of the
organisms change, the amount of contamination from human
or from the lab process– unfortunately, one
of the kits that’s used for extracting
the DNA biases the ratios of different
types of organisms. So what can you trust? I would say if
you see in a paper that something measured the
actual cellular abundance or cellular relative
abundance of an organism, I would believe
that as something really quantitative
and comparable, but not the sort of proportion
of the sequencing reads. And so that’s not sort of
like a field of inquiry that’s reliable, but a
statistic that I think is really reliable. You would have to
dig down a little bit to find that, obviously. But I wish I could say that,
say, high level things were easier to estimate, like the
amount of a particular phylum– you might not get a
specific species right, but you might get the phylum. You know, a higher
taxonomic level. I actually don’t
think that’s the case. And I actually think
studies that drill down on one specific thing and
quantify it very carefully probably do better than the
ones that are sort of trying to get everything right. I also think that
right now, studies that do this experiment
of sequencing just all the random sequences
are– still a lot of them are problematic. But some people are going in
and targeting specific things and sequencing those. In general, that’s
probably more reliable, like more targeted studies. I think that the unbiased
approach has a lot of promise, but we’re not quite there yet on
a lot of the analytical tools. Does that answer your question? AUDIENCE: Oh, definitely. KATHERINE POLLARD:
Yeah, OK, great. Any other questions? AUDIENCE: You were just
talking about a lack of analytical tools. KATHERINE POLLARD: Yes. AUDIENCE: I’m curious to know
where are the biggest gaps, and what kinds of
technologies are you guys currently developing
or would like to develop? KATHERINE POLLARD:
Yeah, great question. So we’re developing a lot of
tools for these normalizations that I mentioned, adjusting
for the size of the genome, so trying to estimate
and then adjust for the size of the
genomes in the sample and the amount of contamination
and other sorts of biases. Those are kind of
under the hood. And they require someone to be
like a command line programmer to run them. But actually what I wanted to
show you was something that I thought might be interesting. We’re basically trying
to make this research accessible to other people,
because my lab can’t think of all the interesting
questions to ask, or have time to ask all
of them of the data. So we’ve downloaded
every microbiome study that’s ever been done. There’s over 3,000 now. And we’ve, we think as
accurately as one currently can, quantified the amount
of every protein family and every microbe in them. And then we’ve compared them
across these 3,000 samples, some of which have
diseases or are from different
parts of the world, or different aged individuals
or males versus females. We’ve computed all that
as like a pre-compute, and then we’ve built this tool
so people can search the data and just pull out the answers. And so, let’s see. It’s a search tool. You can search right
here by a gene, and we’re adding something that
would be more like a Google search, where you search by a
word, like obesity or enzyme, or ABC transporter. So we’re still working. That will be out in
another month or two, expanding the
search aspect of it. And then here, I think if
I’m online I could show you, these are just jobs
that have run recently. So I can pull up a
job and show you– this is in development right
now, so it’s not beautiful. But someone searched
a gene sequence, and then these are things
that we think it might be. So these are like possible
annotations of it. This is a quantification
of how common it is in the human microbiome. So people– some people
have no copies of this in their microbiome,
some of the samples, and the sample with
the most of this gene had 2.3 copies per organism
in their microbiome. So they had organisms that had
multiple copies– on average had more than two
copies of this gene. So people vary a lot. These are some of
the taxa that have the gene, some of the
bacteria that are carrying it. And then you can
look at associations with different traits. So let’s see if I can find one. This is statistically
significant– what country you’re from. So I can click on it. And it looks like
this gene is more common in the
microbiomes of people from North America
compared to Europe or Asia. So we’ve precomputed
all of these, and these can just be
queried by somebody. So that’s one of the tools
that we’re working on, as well as under-the-hood
tools for making sure the numbers that go
into these tests are accurate qualifications. Well, I’m happy to
stick around if anybody wants to chat afterwards. Thank you so much
for the opportunity. [APPLAUSE]
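
The Q&A point about the proportion of sequencing reads versus the proportion of cells can be made concrete with a small sketch. It assumes the simplest possible correction: divide each organism's read count by its genome length, drop the human (host) reads, and renormalize, treating one genome copy as one cell. The `Taxon` class, the two function names, and every number in the toy sample are hypothetical illustrations, not data or code from the talk or from the lab's tools, and the sketch ignores the extraction-kit and other lab biases mentioned in the answer.

```python
from dataclasses import dataclass


@dataclass
class Taxon:
    name: str
    mapped_reads: int      # reads that aligned to this genome
    genome_length_bp: int  # assumed genome size in base pairs
    is_host: bool = False  # True for human (host) DNA


def read_proportions(taxa):
    """Naive estimate: share of the whole sequencing library per taxon."""
    total = sum(t.mapped_reads for t in taxa)
    if total == 0:
        raise ValueError("no mapped reads in sample")
    return {t.name: t.mapped_reads / total for t in taxa}


def cell_proportions(taxa):
    """Genome-length-adjusted share among microbes only (host excluded).

    Reads per base pair is used as a stand-in for genome copies, which
    assumes one genome copy per cell and equal read lengths.
    """
    coverage = {
        t.name: t.mapped_reads / t.genome_length_bp
        for t in taxa
        if not t.is_host
    }
    total = sum(coverage.values())
    if total == 0:
        raise ValueError("no microbial reads in sample")
    return {name: c / total for name, c in coverage.items()}


# Toy sample with made-up numbers: a bacterium with a small genome,
# a fungus with a genome ten times larger, and human contamination
# taking half of the library.
sample = [
    Taxon("bacterium_A", mapped_reads=4_000_000, genome_length_bp=4_000_000),
    Taxon("fungus_B", mapped_reads=4_000_000, genome_length_bp=40_000_000),
    Taxon("human", mapped_reads=8_000_000,
          genome_length_bp=3_100_000_000, is_host=True),
]

print(read_proportions(sample))
print(cell_proportions(sample))
```

With these toy numbers the bacterium and the fungus each take a quarter of the reads, yet after the adjustment the bacterium accounts for roughly 91% of the microbial genome copies, which is the kind of skew described in the answer.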

Reader Comments

  1. Great talk, great presenter, fascinating subject – I am no scientist and no technology geek but I was riveted from the beginning to the end….
    Thank you!

  2. Hopefully this can be put to practicable use that can actually help people, rather than be the latest new thing with a lot of hype that ends up being a money train for industry but no help to the general population.

    And, notice how insulin resistance, which must surely be a factor here, is, once again, flying under the radar as everyone gets giddy with the (money making) possibilities of the microbiome.

  3. its not the fucking becteria, it's the wrong food that gets you fat and sick, also gives feeds wrong type of bacteria too much.

  4. I would like to know what the microbes do in the body – like which proteins/SCFAs etc they produce rather than their exact species.

  5. You're a Dr…grow up and lose the dreadlocks…you look ridiculous. Great talk, very informative and knowledgeable speaker…now look the part!
