PLEIADE

From patterns to models in computational biodiversity and biotechnology

 

Team Leaders: Jean-Marc Frigerio & Franck Salin

Manager: Florence Le Pierres

Background and motivation

Understanding and preserving biodiversity is a crucial issue that must be kept in mind when designing and carrying out any human activity. Knowledge of diversity, particularly species diversity, is the basis for understanding community dynamics. Yet this diversity remains poorly known, despite several centuries of natural history. Several "revolutions" now make it possible to approach this knowledge with new concepts, tools and methods:

  • the convergence of evolution, molecular biology, systematics and genetics, which allows current diversity to be understood as the result of an evolutionary history (phylogenies, molecular systematics);
  • the NGS (next-generation sequencing) revolution, which gives access to the molecular diversity of entire communities, all organisms combined (metabarcoding);
  • the ongoing digital revolution, which combines massive data with access to high performance computing and the development of AI-based analysis methods.

Pleiade's challenge is to contribute to the development of numerical tools and methods for metabarcoding, with an investment in high performance computing, in order to better characterize diversity patterns. To this end, Pleiade is a component of a joint Inra/Inria team (see https://www.inria.fr/equipes/pleiade).

Objectives and strategies

The activity of Pleiade is organized around the following elements:

  • within the R-Syst network, to promote exchanges and discussions on the notion of species and on how it is applied across the major taxonomic groups of the network (from bacteria to insects); to analyze the quality of the correspondence between morphology-based taxonomy (the one on which systematics is based) and molecular-based taxonomy (barcoding), by bringing together teams that produce and maintain reference databases (bacteria, micro-algae, plants, fungi, nematodes, insects), in collaboration with the Rare infrastructure;
  • to study species associations within communities according to spatial, temporal or environmental determinants (community assembly);
  • upstream of this activity, to contribute to the evolution of tools and methods for community inventories, in collaboration with the high performance computing community to facilitate scaling up (coll. HiePACS, SED Inria, IDRIS, GRICAD, ...), and with the MIAT unit for support on statistical methods (computational statistics);
  • to transfer metabarcoding tools and methods to user research teams, in particular through collaborations with the PGTB platform; to (in)validate them by comparison with the tools accepted as the current state of the art (the tools behind Mothur, QIIME, DADA2, BLAST, SWARM, ...);
  • to accompany, or even anticipate, the changes in tools made necessary by the evolution of sequencing technologies, such as the current emergence of "long read" techniques, through close collaboration with the PGTB sequencing/genotyping platform.

A particular focus is placed on the processing of massive NGS data, which raises problems that are only partially resolved. Most methods were designed when data sets were of "accessible" size (Sanger data). Scaling up to NGS data currently relies on heuristics (greedy algorithms, etc.). Pleiade's strategy regarding methods and tools is twofold:

  • to go as far as possible (in terms of the size of the files to be analyzed) with exact and controlled methods, in order to ensure reproducible processing and to provide benchmarks against which the variety of available heuristics can be compared. This requires high performance computing (parallelization, distribution), which is one of Pleiade's development axes;
  • to link the characterization of diversity to machine learning tools and methods, in particular by relying on supervised learning to build inventories from reference databases, and on unsupervised learning to build OTUs.
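
As an illustration of what an "exact" approach to OTU construction can look like, here is a minimal Python sketch. It is not Pleiade's actual pipeline: the function names (`edit_distance`, `cluster_otus`), the toy reads and the distance threshold are all assumptions chosen for the example. It computes every pairwise edit distance exactly, then groups reads by single-linkage clustering (connected components at a given threshold), which is one simple form of unsupervised learning.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance by dynamic programming (no heuristic)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster_otus(reads: list[str], max_dist: int) -> list[list[str]]:
    """Single-linkage clustering: two reads share an OTU if a chain of
    reads connects them with each step at distance <= max_dist.
    (Hypothetical helper, for illustration only.)"""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs are compared: O(n^2) exact distance computations.
    for i, j in combinations(range(len(reads)), 2):
        if edit_distance(reads[i], reads[j]) <= max_dist:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

    groups: dict[int, list[str]] = {}
    for i, read in enumerate(reads):
        groups.setdefault(find(i), []).append(read)
    return sorted(groups.values())


# Toy reads (made up for the example), threshold of one edit.
reads = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTGGGG", "TTTTGGGC"]
otus = cluster_otus(reads, max_dist=1)
print(otus)
```

Because every pair is compared, the result does not depend on the order of the reads, unlike order-dependent greedy heuristics. The price is a quadratic number of distance computations, which is why parallel and distributed computing becomes necessary at NGS scale.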

The scientific strategy of Pleiade, which is a small team (two researchers and two engineers, each part-time), is to develop a strong network of ongoing collaborations, including:

  • the HPC community, thanks to the dual Inra/Inria affiliation, extended to the MIAT unit for computational statistics
  • the European metabarcoding community, thanks to the COST project DNAqua.net
  • the Inra R-Syst network, which associates tools (Pleiade's focus) with biological data
  • teams in French Guiana (IPG) on issues related to biodiversity

Key words

Metabarcoding - Community ecology - Molecular-based systematics - NGS - Massive data - Methods and algorithms - High performance computing - Supervised and unsupervised learning

Staff members

Permanents

  • Alain Franc (DR INRAE)
  • Jean-Marc Frigerio (IR INRAE)
  • Simon Labarthe (CR INRAE)
  • Franck Salin (IE INRAE)

Fixed-term employees/Post-doc

  • Guillaume Ravel

PhD students

  • Sourakhata Tirera (Institut Pasteur de Guyane, Univ. de Cayenne; co-supervised, supervisor: Anne Lavergne, IPG)
  • Bonnie Bailet (SLU, Uppsala; co-supervised, supervisor: Maria Kahlert, SLU)
  • Mohamed-Anwar Abouabdallah