GUNC Output

Output files

Normal Output

GUNC.{DB version}maxCSS_level.tsv - output file with scores for the taxonomic level with the highest clade separation score (CSS), or the level closest to kingdom if several levels share the maximum.
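The selection rule described above can be sketched in Python as follows. This is an illustration only, not GUNC's own code: the function name `max_css_level` and the level names in `LEVELS` (standard ranks ordered from kingdom downwards) are assumptions.

```python
# Assumed rank order, from the level closest to kingdom downwards.
LEVELS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def max_css_level(css_by_level):
    """Return the level with the highest CSS; ties go to the level closest to kingdom."""
    best_level = None
    best_score = float("-inf")
    for level in LEVELS:
        score = css_by_level.get(level)
        # Strict '>' keeps the earlier (closer to kingdom) level when scores tie.
        if score is not None and score > best_score:
            best_level, best_score = level, score
    return best_level


chosen = max_css_level({"kingdom": 0.0, "phylum": 0.6, "class": 0.6, "genus": 0.3})
print(chosen)
```

In this made-up example phylum and class share the maximum, so the level closer to kingdom is chosen.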

Detailed Output

{PREFIX}.all_levels.tsv - output file with results for each taxonomic level.

Output Columns


genome - name of the input genome.


n_genes_called - number of genes called by prodigal or provided directly by the user.


n_genes_mapped - number of genes mapped by diamond to the GUNC refDB.


n_contigs - number of contigs containing mapped genes.


taxonomic_level - the taxonomic level whose clade labels were used to calculate the values in all following columns. For each genome, scores are calculated at six levels (the species level can be added using a command-line option).


proportion_genes_retained_in_major_clades - only major clades, i.e. those with >2% of all mapped genes assigned to them, are retained to calculate the other scores. The value of this column is n_genes_retained/n_genes_mapped.


genes_retained_index - n_genes_mapped/n_genes_called * proportion_genes_retained_in_major_clades, i.e. the portion of all called genes retained in major clades.
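As a worked illustration of the two columns above, the following Python sketch computes both values from per-clade gene counts. The function name `retained_indices`, its parameters, and the example counts are hypothetical. It also assumes that every mapped gene is assigned to exactly one clade at the level in question, which the text does not state.

```python
def retained_indices(clade_gene_counts, n_genes_called, min_fraction=0.02):
    """Return (proportion_genes_retained_in_major_clades, genes_retained_index)."""
    # Assumption: the mapped genes are exactly the genes counted per clade.
    n_genes_mapped = sum(clade_gene_counts.values())
    if n_genes_mapped == 0 or n_genes_called == 0:
        return 0.0, 0.0
    # Major clades hold more than 2% of all mapped genes.
    n_genes_retained = sum(
        n for n in clade_gene_counts.values() if n / n_genes_mapped > min_fraction
    )
    proportion = n_genes_retained / n_genes_mapped
    index = n_genes_mapped / n_genes_called * proportion
    return proportion, index


# Made-up counts: clade C holds 1% of mapped genes and is dropped.
proportion, index = retained_indices({"A": 900, "B": 90, "C": 10}, n_genes_called=1250)
print(round(proportion, 3), round(index, 3))
```

Here 990 of 1000 mapped genes are retained (0.99), and 1000 of 1250 called genes were mapped, giving an index of 0.8 * 0.99 = 0.792.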


clade_separation_score - the result of applying a formula explained in the GUNC paper to the taxonomy and contig labels of genes retained in major clades. Ranges from 0 to 1 and is set to 0 when genes_retained_index is <0.4, because too few genes are left.


contamination_portion - the portion of genes retained in major clades that are assigned to any clade other than the one with the highest proportion of genes.


n_effective_surplus_clades - the Inverse Simpson Index of the fractions of all clades, minus 1 (as 1 genome is expected). It describes the extent of chimerism, i.e. the effective number of surplus clades represented at a taxonomic level.
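The two chimerism summaries above can be illustrated with a short Python sketch. The function name `chimerism_summaries` and the example counts are hypothetical, and the Inverse Simpson Index is taken in its usual form, 1 / sum(p_i^2). The sketch uses whatever clade counts it is given; which set of clades GUNC itself uses for the fractions is not specified here beyond the text above.

```python
def chimerism_summaries(clade_gene_counts):
    """Return (contamination portion, effective number of surplus clades)."""
    total = sum(clade_gene_counts.values())
    if total == 0:
        raise ValueError("no genes assigned to any clade")
    fractions = [n / total for n in clade_gene_counts.values()]
    # Everything outside the single most abundant clade counts as contamination.
    contamination = 1.0 - max(fractions)
    # Inverse Simpson Index: effective number of clades; subtract the 1 expected genome.
    inverse_simpson = 1.0 / sum(f * f for f in fractions)
    return contamination, inverse_simpson - 1.0


# Made-up counts: 75% of genes in clade A, 25% in clade B.
contamination, surplus = chimerism_summaries({"A": 750, "B": 250})
print(round(contamination, 3), round(surplus, 3))
```

With fractions 0.75 and 0.25 the Inverse Simpson Index is 1 / 0.625 = 1.6, so about 0.6 surplus clades are effectively present. A genome with all genes in one clade gives 0 for both values.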


mean_hit_identity - the mean identity with which genes in abundant lineages (>2%) hit genes in the reference.


reference_representation_score - genes_retained_index * mean_hit_identity. Estimates how well a genome is represented in the GUNC DB.


pass.GUNC - a genome that passes GUNC analysis is likely not chimeric (or its chimerism cannot be detected, especially when its reference representation score (RRS) is low). A genome passes if clade_separation_score <= 0.45, a cutoff benchmarked using simulated genomes.


Please note that most genomes with reference_representation_score <0.5 (roughly) are labelled as passing GUNC filters not necessarily because they are non-chimeric, but because they are so poorly represented in the reference that chimerism is much more difficult to judge.
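One way to apply this caveat when reading an output table is sketched below in Python. It is an illustration, not part of GUNC: the sample rows are made up, the `genome` column header and the helper names are assumptions, and a reference_representation_score below roughly 0.5 is treated as "poorly represented".

```python
import csv
import io

CSS_CUTOFF = 0.45  # a genome passes if clade_separation_score <= 0.45
LOW_RRS = 0.5      # rough value below which a pass should be read with care

# Made-up rows; real files contain more columns.
SAMPLE_TSV = (
    "genome\tclade_separation_score\treference_representation_score\n"
    "genome_A\t0.10\t0.85\n"
    "genome_B\t0.70\t0.90\n"
    "genome_C\t0.05\t0.30\n"
)


def interpret(row):
    """Classify one row, flagging passes that rest on poor reference representation."""
    css = float(row["clade_separation_score"])
    rrs = float(row["reference_representation_score"])
    if css > CSS_CUTOFF:
        return "fail"
    if rrs < LOW_RRS:
        return "pass (poorly represented)"
    return "pass"


def interpret_table(handle):
    """Map each genome name in a tab-separated table to its interpretation."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["genome"]: interpret(row) for row in reader}


results = interpret_table(io.StringIO(SAMPLE_TSV))
for name, verdict in results.items():
    print(name, verdict)
```

To run this on a real file, pass an open file handle instead of the `io.StringIO` object.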


MIMAG_medium and MIMAG_high filters are incomplete; the MIMAG standard additionally requires data on rRNA and tRNA counts.