Example BoaG Programs

BoaG is a flexible language, capable of answering a wide variety of questions. Here we provide several example questions, each followed by a BoaG program that answers it.

  1. Which proteins in the NR database have the taxonomic name "Escherichia coli"?
  2. What is the list of conserved proteins?
  3. Which protein sequences have coronavirus taxonomic assignments, and how frequent is each assignment?
  4. What is the frequency distribution of protein lengths in the NR database?
  5. Which clusters belong to a specific protein function (SCN)?
  6. What is the number of proteins in each phylum in the tree of life?
  7. What is the list of taxonomic assignments for all proteins in NR?
  8. How many proteins in NR do not have a taxonomic assignment?

Which proteins in the NR database have the taxonomic name "Escherichia coli"?

    s: Sequence = input;
    count: output sum[string] of int;

    foreach (i: int; def(s.annotation[i]))
        if (strfind("Escherichia coli", s.annotation[i].tax_name) > -1)
            count[s.seqid] << 1;

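To make the aggregation step concrete, here is a minimal Python sketch of what the program above appears to compute. It is not BoaG and does not touch the NR database: the `sequences` records are hypothetical toy data, and the sketch assumes that `sum[string] of int` adds up every value emitted under the same key and that `strfind` returns -1 when the substring is absent.

```python
from collections import defaultdict

# Hypothetical toy records; the field names mirror those used in the BoaG program.
sequences = [
    {"seqid": "seq1", "annotation": [{"tax_name": "Escherichia coli"},
                                     {"tax_name": "Escherichia coli K-12"}]},
    {"seqid": "seq2", "annotation": [{"tax_name": "Homo sapiens"}]},
    {"seqid": "seq3", "annotation": [{"tax_name": "Escherichia coli O157:H7"},
                                     {"tax_name": "Shigella flexneri"}]},
]

def count_matching_annotations(seqs, needle):
    """Emulate `count[s.seqid] << 1` for every annotation whose tax_name contains needle."""
    count = defaultdict(int)  # plays the role of the `sum[string] of int` output
    for s in seqs:
        for a in s["annotation"]:
            # str.find returns -1 when the substring is absent, like the strfind test above
            if a["tax_name"].find(needle) > -1:
                count[s["seqid"]] += 1
    return dict(count)

result = count_matching_annotations(sequences, "Escherichia coli")
print(result)
```

Under these assumptions, the value stored for each sequence ID is the number of its annotations that match, not a simple yes/no flag, and sequences with no matching annotation do not appear in the output at all.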

What is the list of conserved proteins?

    s: Sequence = input;
    protOut: output sum[string][string] of int;

    distinctTax := function(seq: Sequence): int {
        taxSet: set of string;
        foreach (i: int; def(seq.annotation[i]))
            add(taxSet, seq.annotation[i].tax_id);
        return(len(taxSet));
    };

    # we define conserved proteins as those that have > 10 distinct taxonomic assignments
    if (distinctTax(s) > 10) {
        foreach (i: int; def(s.annotation[i])) {
            if (strfind("[", s.annotation[i].defline) > 0)
                protOut[trim(substring(s.annotation[i].defline, 0, strfind("[", s.annotation[i].defline)))][s.seqid] << 1;
            else
                protOut[s.annotation[i].defline][s.seqid] << 1;
        }
    }

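The two steps in the program above (counting distinct taxonomic IDs, then trimming each defline at the first "[") can be sketched locally in Python. This is an illustrative analogue only: the records are hypothetical toy data, `protein_name` and `conserved_counts` are names invented for this sketch, and it assumes `trim` strips surrounding whitespace.

```python
from collections import defaultdict

def protein_name(defline):
    """Return the text before the first '[' with whitespace trimmed, else the whole defline."""
    idx = defline.find("[")
    # Mirrors the `> 0` test above: a defline that starts with '[' is kept whole.
    if idx > 0:
        return defline[:idx].strip()
    return defline

def conserved_counts(seq, min_taxa=10):
    """Emit (protein name, seqid) counts only when seq has more than min_taxa distinct tax IDs."""
    tax_ids = {a["tax_id"] for a in seq["annotation"]}  # like `set of string`
    out = defaultdict(int)
    if len(tax_ids) > min_taxa:
        for a in seq["annotation"]:
            out[(protein_name(a["defline"]), seq["seqid"])] += 1
    return dict(out)

# Hypothetical toy data: one sequence seen in 11 taxa, one seen in a single taxon.
widespread = {
    "seqid": "seqA",
    "annotation": [
        {"tax_id": str(1000 + n), "defline": f"DNA gyrase subunit B [Species {n}]"}
        for n in range(11)
    ],
}
narrow = {
    "seqid": "seqB",
    "annotation": [{"tax_id": "562", "defline": "hypothetical protein [Escherichia coli]"}],
}

print(conserved_counts(widespread))
print(conserved_counts(narrow))
```

In this sketch the 11 deflines of `seqA` collapse to a single protein name once the bracketed species suffix is removed, while `seqB` produces no output because it falls below the threshold.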

Which protein sequences have coronavirus taxonomic assignments, and how frequent is each assignment?

    s: Sequence = input;
    count: output sum[string][string] of int;

    foreach (i: int; def(s.annotation[i]))
        if (strfind("coronavirus", s.annotation[i].tax_name) > -1)
            count[s.seqid][s.annotation[i].tax_name] << 1;

What is the frequency distribution of protein lengths in the NR database?

    s: Sequence = input;
    counts: output sum[int] of int;

    foreach (i: int; def(s.cluster[i]))
        if (s.cluster[i].similarity == 95)
            counts[s.cluster[i].length] << 1;

Which clusters belong to a specific protein function (SCN)?

    s: Sequence = input;
    counts_protein: output collection[string][string] of int;

    foreach (i: int; def(s.cluster[i]))
        if (s.cluster[i].similarity == 95)
            if (strfind("SCN", s.annotation[i].defline) > -1)
                counts_protein[s.seqid][s.cluster[i].cid] << 1;

What is the number of proteins in each phylum in the tree of life?

    # count sequences annotated with each of the listed phyla
    s: Sequence = input;
    phylCount: output sum[string] of int;
    taxs := {"Firmicutes", "Fusobacteria"};

    for (j := 0; j < len(taxs); j++)
        exists (i: int; match(taxs[j], s.annotation[i].tax_name))
            phylCount[taxs[j]] << 1;

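Unlike the `foreach` examples, the program above uses `exists`, so each sequence should contribute at most once per phylum, however many of its annotations match. The Python sketch below illustrates that difference on hypothetical toy records. It assumes that `exists` runs its body once when at least one index satisfies the condition and that `match` performs a regular-expression search; both are assumptions about BoaG semantics, not statements taken from this page.

```python
import re
from collections import defaultdict

# Hypothetical toy records; real NR annotations will look different.
sequences = [
    {"seqid": "seq1", "annotation": [{"tax_name": "Firmicutes"},
                                     {"tax_name": "Firmicutes bacterium"}]},
    {"seqid": "seq2", "annotation": [{"tax_name": "Fusobacteria"}]},
    {"seqid": "seq3", "annotation": [{"tax_name": "Proteobacteria"}]},
]

def phylum_counts(seqs, taxa):
    """Count each sequence at most once per taxon, mirroring the assumed `exists` behavior."""
    counts = defaultdict(int)
    for s in seqs:
        for t in taxa:
            # any() stops at the first matching annotation, so extra matches add nothing
            if any(re.search(t, a["tax_name"]) for a in s["annotation"]):
                counts[t] += 1
    return dict(counts)

result = phylum_counts(sequences, ["Firmicutes", "Fusobacteria"])
print(result)
```

Here `seq1` has two matching annotations but adds only 1 to the Firmicutes count, which is the behavior one would want when counting proteins rather than annotations.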

What is the list of taxonomic assignments for all proteins in NR?

    s: Sequence = input;
    clstrOut: output collection[string] of string;

    # build a space-separated list of tax IDs from the sequence passed in
    getTaxList := function(seq: Sequence): string {
        taxids := "";
        foreach (i: int; def(seq.annotation[i])) {
            if (seq.annotation[i].tax_name != "")
                taxids = taxids + seq.annotation[i].tax_id + " ";
        }
        return taxids;
    };

    foreach (i: int; def(s.cluster[i]))
        if (s.cluster[i].similarity == 95 && s.cluster[i].representative)
            clstrOut[s.seqid] << getTaxList(s);

How many proteins in NR do not have a taxonomic assignment?

    s: Sequence = input;
    countNull: output sum of int;

    foreach (i: int; def(s.annotation[i]))
        if (s.annotation[i].tax_name == "")
            countNull << 1;
