2020-04-03

When?

Friday, 3 April 2020

10.00 - 11.00 h

Where?

Zoom: we will send a Zoom invitation to the journal club email list and post it in the NBIS Slack channel #nonmodel_organisms.

Which paper are we discussing?

"Draft Genome Assembly of a Fouling Barnacle, Amphibalanus amphitrite (Darwin, 1854): The First Reference Genome for Thecostraca"
Kim et al., Front. Ecol. Evol., 06 December 2019.
https://www.frontiersin.org/articles/10.3389/fevo.2019.00465/full
Plus a small supplement (it includes a BUSCO plot, so do look at it).

Who is presenting?

Magnus Alm Rosenblad

Notes

This is planned as a discussion about how to evaluate an assembly for a non-model species for which there are no related references to compare with.

Genome size: For this and related species there are some experimental genome size estimates at http://genomesize.com/
(Use Search Criteria: phylum='Arthropoda' AND sub phylum='Crustacea' AND class='Cirripedia' AND order name='Thoracica')
The species is listed there as Balanus amphitrite, with two estimates: 740 Mbp and 1.4 Gbp. We believe the lower one is the correct haploid size, since B. improvisus (our species) is that size.
Is this value used in the paper (for calculating sequence coverage etc.)?

Heterozygosity: There is no published data on the heterozygosity level in this group of species, and the paper mentions it only briefly, in connection with the k-mer plot.

Sample prep: How many individuals were used, and with what consequences? 8 µg total. No molecule length plot shown. (As usual; we have not seen any paper with such a plot.)
Sequencing: Mainly PacBio Sequel (8 cells, 56 Gbp), short Illumina PE reads for correction, plus MP (mate-pair) reads for scaffolding.
Assembly: No info on how much of the PacBio data was in reads >8-10 kbp. The recommendation from UGC is 40x in reads >8-10 kbp.
Allelic contigs/haplotig removal: Done with Purge Haplotigs, but no info on parameters or on the number of contigs/Mbp removed. Could be very, very few.
Assembly size: Initially 610 Mbp, slightly more after scaffolding (613 Mbp). N50: 230 kbp. Contigs: 4350. (Looks pretty good at first glance!)
BUSCO eval: Arthropod set used. Good: 94% vs genome, 9% less vs annotated genes (makes sense, genes are hard to predict).
BUSCO comment: Suppl. Fig. 1 shows approx. 30% dual-copy (duplicated) BUSCO genes, not commented on by the authors. Is this really a haploid assembly?
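As a starting point for the discussion, the figures above can be put through some quick arithmetic. The sketch below is not from the paper; it only combines the numbers quoted in these notes (56 Gbp of PacBio data, a 613 Mbp scaffolded assembly, and the two genome size estimates of 740 Mbp and 1.4 Gbp). The function and variable names are our own, and the coverage is raw coverage over all reads, not the coverage in reads >8-10 kbp, which the paper does not report.

```python
# Figures quoted in the notes above, all in Mbp.
PACBIO_YIELD_MBP = 56_000   # 8 Sequel cells, 56 Gbp in total
ASSEMBLY_SIZE_MBP = 613     # assembly size after scaffolding
GENOME_SIZE_ESTIMATES_MBP = {
    "lower estimate": 740,
    "higher estimate": 1400,
}


def coverage(yield_mbp, genome_mbp):
    """Raw sequencing depth: total bases sequenced / haploid genome size."""
    return yield_mbp / genome_mbp


def assembled_fraction(assembly_mbp, genome_mbp):
    """Share of the estimated haploid genome represented in the assembly."""
    return assembly_mbp / genome_mbp


for label, genome_mbp in GENOME_SIZE_ESTIMATES_MBP.items():
    depth = coverage(PACBIO_YIELD_MBP, genome_mbp)
    fraction = assembled_fraction(ASSEMBLY_SIZE_MBP, genome_mbp)
    print(f"{label} ({genome_mbp} Mbp): "
          f"{depth:.0f}x raw PacBio coverage, "
          f"{fraction:.0%} of genome in assembly")
```

With the 740 Mbp estimate this gives roughly 76x raw coverage and about 83% of the genome in the assembly; with the 1.4 Gbp estimate, 40x and about 44%. Which genome size is assumed therefore changes the picture of the assembly considerably.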

So what is the quality? (We can disregard the annotation which is really tricky.) What can we say about it?
% of genome in assembly etc.
How should the required coverage/amount of sequencing be calculated before sequencing when the heterozygosity is this uncertain?
Etc.
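One possible way to frame the planning question is to work backwards from the UGC recommendation of 40x in reads >8-10 kbp. The sketch below is only a suggestion for discussion, not a method from the paper. The function name and all of its parameters are hypothetical: `long_read_fraction` (the share of sequenced bases that ends up in reads above the length cutoff) must come from your own library, and `haplotype_factor` is our assumption for how much extra data to budget when the two haplotypes are divergent enough to assemble separately.

```python
def required_yield_gbp(genome_size_gbp,
                       target_long_read_coverage=40.0,
                       long_read_fraction=0.5,
                       haplotype_factor=1.0):
    """Total yield (Gbp) needed to reach the target coverage in long reads.

    genome_size_gbp           -- estimated haploid genome size
    target_long_read_coverage -- e.g. 40x in reads >8-10 kbp (UGC recommendation)
    long_read_fraction        -- ASSUMED share of bases in reads above the cutoff
    haplotype_factor          -- ASSUMED multiplier; 1.0 for a collapsed haploid
                                 assembly, up to 2.0 if both haplotypes are
                                 expected to assemble separately
    """
    if not 0 < long_read_fraction <= 1:
        raise ValueError("long_read_fraction must be in (0, 1]")
    if haplotype_factor < 1:
        raise ValueError("haplotype_factor must be at least 1")
    long_read_bases = genome_size_gbp * target_long_read_coverage * haplotype_factor
    return long_read_bases / long_read_fraction


# Example with the lower genome size estimate (0.74 Gbp) and illustrative values.
for factor in (1.0, 2.0):
    total = required_yield_gbp(0.74, haplotype_factor=factor)
    print(f"haplotype_factor={factor}: about {total:.0f} Gbp in total")
```

The values 0.5 and 2.0 are placeholders chosen for illustration only. The point is that both the unknown read length distribution and the unknown heterozygosity feed directly into how much data is needed.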
