I have been putting off the work on whole-genome-based phylogeny for almost two months, but there is no escape now and I have to work on it. In principle it is pretty simple, but the data mining and formatting are a challenge in themselves. I will try to warm up by reading this article published in Molecular Phylogenetics and Evolution. The work was done at the Sackler Institute last year.
The authors used 12 whole-genome sequences with 3130 genes. They show that at least 160 genes should be concatenated in order to produce reliable results.
"More recent phylogenetic studies (Christensen et al., 2004; Gioia et al., 2006; Redfield et al., 2006) have included the added power of considering multiple genes in phylogenetic analysis. With over 10 species of Pasteurellaceae with whole-genome sequences it is now possible to use whole-genome datasets to assess the evolutionary relationships in this family."
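To make the idea of concatenating genes concrete for myself, here is a minimal sketch of how per-gene alignments could be joined into one long sequence per species. This is not the authors' pipeline: the function name `concatenate_alignments`, the nested-dictionary input layout, and the choice to pad a missing gene with `?` are all my own assumptions for illustration.

```python
from typing import Dict, List


def concatenate_alignments(
    genes: Dict[str, Dict[str, str]],
    taxa: List[str],
    missing: str = "?",
) -> Dict[str, str]:
    """Join per-gene alignments into one concatenated sequence per taxon.

    `genes` maps a gene name to {taxon: aligned sequence}. This layout is
    an assumption made for this sketch, not a format from the article.
    """
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    # Sort gene names so the column order is the same on every run.
    for gene_name in sorted(genes):
        alignment = genes[gene_name]
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene_name}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            # A taxon that lacks this gene gets a block of missing-data symbols.
            parts[taxon].append(alignment.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data with made-up names, only to show the shape of the result.
example_genes = {
    "geneA": {"sp1": "ATG-", "sp2": "ATGC"},
    "geneB": {"sp1": "GG"},
}
example_matrix = concatenate_alignments(example_genes, ["sp1", "sp2"])
print(example_matrix)  # {'sp1': 'ATG-GG', 'sp2': 'ATGC??'}
```

Padding with `?` keeps every row the same length, which tree-building programs generally require; whether to keep or drop genes that are missing in some species is a separate decision this sketch does not address.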
Here is a list of the species used in the article:
Here comes the tough part, following the materials and methods:
1. Matrix construction