Sequencing Eucalypt DNA: behind the scenes with Andrew Thornhill
Ever wondered what is involved in sequencing the DNA of more than 700 eucalypt species? We asked botanist Andrew Thornhill, lead author of a fascinating new study published in Australian Systematic Botany last month, to shed some light on the many years of work, people involved and challenges of tackling this mammoth task.
Why were you drawn to eucalypts as a research subject?
I have had an interest in native plants since my undergraduate days at Monash University. I remember walking around, looking at the eucalypts growing on campus and thinking that one day I’d like to be able to identify them all off the top of my head. I did my PhD at the Australian National University on the palynology, evolution, phylogenetics and biogeography of the Myrtaceae plant family, which is the family the eucalypts are in. When I started a postdoc in 2011 at the Centre for Australian National Biodiversity Research at CSIRO, I was employed to work on making a species-level phylogeny of Australian Acacia. I suggested to my supervisor, Joe Miller, that we could make a species-level phylogeny of the eucalypts, and that I thought it would probably be quite popular, with lots of researchers having an interest in it and a use for it.
Sequencing the DNA of over 700 eucalypt species sounds like a massive task. How long did it take you and what is involved?
It was a big team effort. We based our sampling on previous work from groups such as those of Pauline Ladiges and Mike Bayly from the University of Melbourne, and Dorothy Steane from the University of Tasmania. I used their studies to compile a list of species that we didn’t have any DNA sequences for.
I had been working with Leigh Nelson and Dave Yeates from the CSIRO Australian National Insect Collection (ANIC) and they were the first to join the team, along with my PhD supervisor, Mike Crisp from ANU, who had been collecting eucalypt leaves for years with this kind of project in mind, and Carsten Külheim, also from ANU at the time. Carsten had a collection of leaves from Dean Nicolle’s Currency Creek farm, an incredible private resource of eucalypts, with almost every species in the world growing there. We collected leaves from trees growing at a number of botanic gardens, and Mike and I did a big field trip to the Pilbara to collect more species. Finally, I sampled from the eucalypt specimens stored in the Australian National Herbarium in Canberra. In all, it took around three to four months to gather all of the leaves needed for this project.
Next, we plated our leaves to extract the DNA with the help of my main volunteer and eucalypt plater, Laura Johnson. We gave the plates to CSIRO DNA lab technician extraordinaire Kristy Lam, and in less than two months we had the DNA from over 700 eucalypt species, which totalled 2560 individual sequences. Then it was up to me to edit each sequence that had been returned and analyse them all together. It took another seven years for the work to finally be published.
Why is it important to sequence the DNA of plants?
Before the rise of DNA and phylogenetics we could only assume, based on morphology, how organisms were related to each other. That’s not to say that taxonomy didn’t do a really good job, but DNA gives us a non-biased and detailed approach to explaining how everything is related. By extracting the DNA from the eucalypts we hoped that we would be able to explain how all of the different taxonomic groups are related to each other, which would then help us better interpret the evolutionary history of the group. Our study used only four loci, which is a very small amount of the total DNA in the leaves of plants. We therefore didn’t get fully conclusive results, but there is currently a follow-up study using the same DNA extractions from our project and Next Generation Sequencing approaches that allow most of the DNA to be sampled.
Can the phylogeny you made be used in conservation?
Yes, the phylogeny can be used in conservation through a field of science that my colleague Brent Mishler and I named Spatial Phylogenetics, which has its origins at CSIRO. In fact, an original version of the eucalypt phylogeny was used in a paper led by my colleague Carlos Gonzalez-Orozco and published in Nature Climate Change. In that paper the phylogeny was used to calculate phylogenetic diversity, and we modelled where significant areas of phylogenetic diversity would occur in the future if the temperature in Australia keeps rising. The main goal of the research was to identify areas that will become significant so that we can set them aside now, rather than identify areas that are already significant.
Your research uncovered that while eucalypts are an old group, there seemed to be an explosion of diversity in the last 2 million years. Can you explain how you reached this conclusion?
Technically we didn’t uncover that the eucalypts are an old group. That information comes from fossil research. The oldest known eucalypt fossil comes from Patagonia in South America and is 52 million years old. The oldest known Australian eucalypt fossil is pollen from Bass Strait. We used the age information from these fossils to date the phylogeny and get age estimates for all of the divergences between the eucalypt species and the groups that they are classified into. We then ran analyses to indicate where in the phylogeny there had been significant shifts in the way that the eucalypts had evolved. This is how we arrived at the conclusion that five Eucalyptus clades have undergone extreme diversification in the last two million years. The differences between the DNA of the Eucalyptus species in these five clades are small enough, and the age difference amongst them small enough, for the diversification analysis program to infer that this could only have happened very recently.
Were there any interesting challenges you encountered during your research, and how did you deal with them?
One challenge was that our dataset was so big that it couldn’t be run on a normal computer. We used the CSIRO supercomputer to run our Bayesian analyses, and even then it took over four months for the analysis to run. Keeping our heads around the taxonomy was also slightly difficult. Some species names that were current when I started the project are no longer considered valid, and new species have been described since the project was started. This project was started when I was working as a postdoc at CSIRO in Canberra. But the life of a postdoctoral scientist meant that I then moved jobs to the Australian Tropical Herbarium in Cairns, and then I moved again to work at the University of California, Berkeley. This project followed me around the world and I kept working on it when I had time at all of the different places that I worked. It was finally published seven years after it was started, just as I started a new joint position between the State Herbarium of South Australia and the Environment Institute, University of Adelaide.
Andrew Thornhill is currently a research associate at the State Herbarium of South Australia and the University of Adelaide.
The research article, ‘A dated molecular perspective of eucalypt taxonomy, evolution and diversification’, is free to read throughout May in Australian Systematic Botany. If you’re keen to find out more about the outcomes and implications of his research, read the article Andrew wrote for The Conversation last month.