Extreme environments are encoded in the genomes of the organisms that live there

An organism’s genome is a set of DNA instructions necessary for its development, function and reproduction. The genome of a modern-day organism contains information from its journey on an evolutionary path that begins with the “first universal common ancestor” of all life on Earth and culminates in that organism.

An organism’s genome is encoded in its DNA and contains information that can reveal connections to its ancestors and relatives.

Other dimensions of the genome

Our research examines the hypothesis that an organism’s genome might contain types of information other than genealogy or taxonomy. We wondered: Could an organism’s genome contain information that would allow us to determine what type of environment the organism lives in?

Pitch Lake in Trinidad and Tobago

Extremophiles have been found in environments such as Pitch Lake in Trinidad and Tobago, the largest asphalt deposit in the world.

Image credit: Anton_Ivanov/Shutterstock.com

As unlikely as it seems, our team of computer science and biology researchers at the University of Waterloo and Western University found that this is the case for extremophiles – organisms that live and thrive in extremely harsh environments. These environmental conditions range from extreme heat (over 100°C) to extreme cold (below -12°C), high radiation or extreme acidity or pressure.

DNA as language

We looked at genomic DNA as a text written in a ‘DNA language’. A DNA strand (or DNA sequence) consists of a sequence of basic units called nucleotides strung together by a sugar-phosphate backbone. There are four such different DNA units: adenine, cytosine, guanine and thymine (A,C,G,T).

Abstractly, a DNA sequence can be thought of as a line of text, written with ‘letters’ from the ‘DNA alphabet’. For example, ‘CAT’ would be the three-letter ‘DNA word’ corresponding to the three-unit DNA sequence cytosine-adenine-thymine.

In the 1990s it was discovered that by counting such DNA words in a short DNA sequence extracted from an organism’s genome, one could determine the organism’s species and its degree of relatedness to other organisms in the evolutionary ‘tree of life’.
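To make the idea of counting DNA words concrete, here is a minimal sketch in Python. It is an illustration only, not the code used in the research: the function name `count_dna_words`, the choice of three-letter words and the toy sequence are all assumptions made for this example.

```python
from collections import Counter

def count_dna_words(sequence: str, k: int = 3) -> Counter:
    """Count every overlapping k-letter DNA word in a sequence."""
    sequence = sequence.upper()
    counts = Counter()
    # Slide a window of width k along the sequence, one letter at a time.
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        # Skip windows containing anything other than A, C, G or T
        # (for example the ambiguity code N).
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts

# Toy sequence, made up for illustration (not real genomic data).
counts = count_dna_words("CATCATGCAT")
print(counts.most_common(3))
```

In this toy sequence the word ‘CAT’ occurs three times and every other three-letter word occurs once. Real analyses count words over thousands or millions of letters.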

The mechanism of this identification or classification of an organism based on the number of DNA words is similar to the process that allows us to distinguish an English book from a French book: by taking one page from each book, one notices that the three-letter word ‘the’ is common in the English text, while the three-letter word ‘les’ is common in the French text.

Note that the word frequency profile of each book does not depend on the specific page we read or on whether we considered multiple pages, a single page, or an entire chapter. Similarly, the frequency profile of DNA words in a genome does not depend on the location or the length of the DNA sequence selected to represent that genome.
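The sketch below illustrates this property on synthetic data. It is a toy under stated assumptions: the two ‘genomes’ are random strings generated with different letter biases, not real organisms, and the helper names (`word_profile`, `profile_distance`, `synthetic_genome`) and the use of Euclidean distance are choices made for this example rather than the method of the original study.

```python
import math
import random
from itertools import product

def word_profile(sequence, k=3):
    """Relative frequencies of all 4**k DNA words, in a fixed order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero
    return [counts[w] / total for w in words]

def profile_distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def synthetic_genome(weights, length, seed):
    """Random string with biased letter usage (hypothetical stand-in for a genome)."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", weights=weights, k=length))

genome_a = synthetic_genome([4, 1, 1, 4], 20000, seed=1)  # A/T-rich
genome_b = synthetic_genome([1, 4, 4, 1], 20000, seed=2)  # C/G-rich

# Two fragments of different length and location from the same genome...
same = profile_distance(word_profile(genome_a[:5000]),
                        word_profile(genome_a[12000:]))
# ...versus fragments taken from two different genomes.
different = profile_distance(word_profile(genome_a[:5000]),
                             word_profile(genome_b[:5000]))
print(f"same genome: {same:.3f}, different genomes: {different:.3f}")
```

Fragments drawn from the same synthetic genome yield profiles that lie much closer together than fragments drawn from the two different ones, which is the behaviour the book analogy describes.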

DNA strand illustration

A DNA strand consists of a sequence of basic units: adenine, cytosine, guanine and thymine (ACGT).

Image credit: ktsdesign/Shutterstock.com

That DNA word frequency profiles can act as a ‘genomic signature’ of an organism was an important discovery. Until now, it was believed that the DNA word frequency profile of a genome contained only evolutionary information relating to the species, genus, family, order, class, phylum, kingdom, or domain to which the organism belonged.

Our team wanted to ask whether the DNA word frequency profile of a genome could reveal other types of information – for example, information about the type of extreme environment in which a microbial extremophile thrives.

Environmental impressions in extremophile DNA

We used a dataset of 700 microbial extremophiles living in extreme temperatures (extreme heat or cold) or extreme pH conditions (strongly acidic or alkaline). We used both supervised and unsupervised machine learning approaches to test our hypothesis.

In both types of environmental conditions, we found that we could clearly detect an environmental signal that indicated the type of extreme environment in which a particular organism lived.

In the case of unsupervised machine learning, a ‘blind’ algorithm was given a dataset of extremophile DNA sequences (and no other information about their taxonomy or their habitat). The algorithm was then asked to group these DNA sequences into clusters, based on the similarities it could find between their DNA word frequency profiles.
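As a rough illustration of what such ‘blind’ grouping looks like, the sketch below clusters four short sequences using only their DNA word frequency profiles. Everything here is an assumption made for the example: the article does not name the algorithms used, so a bare-bones k-means stands in for them, the sequences are invented, and two-letter words are used only because the sequences are so short.

```python
import math
from itertools import product

def word_profile(sequence, k=2):
    """Relative frequencies of all 4**k DNA words, in a fixed order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in words]

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, n_clusters, n_iter=20):
    """Minimal k-means; the algorithm never sees any labels."""
    # Deterministic start: first point, then the point farthest from chosen centers.
    centers = [points[0]]
    while len(centers) < n_clusters:
        centers.append(max(points,
                           key=lambda p: min(distance(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        # Assign each profile to its nearest center.
        labels = [min(range(n_clusters), key=lambda j: distance(p, centers[j]))
                  for p in points]
        # Move each center to the mean of the profiles assigned to it.
        for j in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Invented toy sequences (not real organisms); names are for display only.
sequences = {
    "at_rich_1": "ATATTAAATTTATAATATTA",
    "at_rich_2": "TTAATATAAATTATTTAATA",
    "gc_rich_1": "GCGGCCGCGGGCCGCGCCGG",
    "gc_rich_2": "CCGCGGCGCCCGGGCGCGCC",
}
profiles = [word_profile(s) for s in sequences.values()]
labels = kmeans(profiles, n_clusters=2)
for name, label in zip(sequences, labels):
    print(name, "-> cluster", label)
```

The algorithm receives only the frequency profiles, yet the two A/T-rich sequences land in one cluster and the two C/G-rich sequences in the other. The interesting question in the research is which biological property the discovered clusters turn out to share.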

The expectation was that all clusters discovered in this way would lie along taxonomic lines: bacteria grouped with bacteria, and archaea grouped with archaea. To our great surprise, this was not always the case and some archaea and bacteria were consistently grouped together regardless of which algorithms we used.

The only obvious similarity that could explain why they were considered similar by the multiple machine learning algorithms was that they were heat-loving extremophiles.

A shocking discovery

The tree of life, a conceptual framework used in biology that represents genealogical relationships among species, has three major limbs called domains: bacteria, archaea, and eukarya.

Eukaryotes are organisms with a membrane-bound nucleus, and this domain includes animals, plants, fungi, and the single-celled microscopic protists. In contrast, bacteria and archaea are single-celled organisms that do not have a membrane-bound nucleus that contains the genome. What distinguishes bacteria from archaea is the composition of their cell walls.

Tree of life

A schematic tree of life with the primary domains, archaea and bacteria, shown in purple and blue respectively, and the secondary domain, Eukaryotes, in green.

The three domains of life differ dramatically from each other, and genetically a bacterium is as different from an archaeon as a polar bear (eukarya) is from an E. coli bacterium (bacteria).

The expectation was therefore that the genomes of a bacterium and an archaeon would be as far apart as possible in any clustering based on any measure of genomic similarity. Our finding of some bacteria and archaea clustered together, apparently just because they are both adapted to extreme heat, means that the extreme temperature environment they live in caused profound, genome-wide, systemic shifts in their genome language.

This discovery is akin to finding a completely new dimension of the genome, an ecological dimension, that exists alongside the known taxonomic dimension.

Genomic impact of other environments

This finding is not only unexpected, but could have implications for our understanding of the evolution of life on Earth, and guide our thinking about what it takes to live in space.

Pyrococcus furiosus

Pyrococcus furiosus, a thermophilic archaeon that was surprisingly grouped with thermophilic bacteria.

Indeed, our ongoing research investigates the existence of an environmental signal in the genomic signature of radiation-resistant extremophiles, such as Deinococcus radiodurans, which can survive exposure to radiation, cold, dehydration, vacuum conditions and acid, and has been shown to survive in space for up to three years.

Kathleen A. Hill, associate professor of biology, Western University and Lila Kari, professor of computer science, University of Waterloo

This article is republished from The Conversation under a Creative Commons license. Read the original article.
