
BIOINFORMATICS
In the last few years, biological data generation has expanded rapidly, particularly in sequencing, with the emergence of Next Generation Sequencing (NGS) machines.

However, the analysis of these large datasets is becoming more complicated. In our bioinformatics workflows, we use state-of-the-art technologies and up-to-date, trustworthy databases to interpret our genomics data.
Pathogenicity prediction of genetic variants
De novo assembly
RNA quantification
Gene expression analysis
Quality control of NGS sequencing data
Reference mapping
Variant identification
Variant annotation
Standard bioinformatics pipeline
01
Sequencing
The Illumina next-generation sequencing (NGS) method is based on sequencing by synthesis with reversible dye terminators, which enable the identification of single bases as they are incorporated into DNA strands. Binary Base Call (BCL) files are the raw data files generated by Illumina sequencers.
02
FASTQ generation
The BCL files are converted to FASTQ, a text-based file format that stores each read's raw sequence together with its per-base quality scores (FASTQ).
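To make the format concrete, here is a minimal Python sketch of reading FASTQ records. It assumes the common four-lines-per-read layout and Phred+33 quality encoding; the names `FastqRecord` and `parse_fastq` are illustrative and not part of any tool named on this page.

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def parse_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ text stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input (assumes no blank lines between records)
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33.
        scores = [ord(char) - 33 for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


example = "@read1\nGATTACA\n+\nIIIII#5\n"
records = list(parse_fastq(io.StringIO(example)))
```

In the example record, the quality character `I` decodes to a score of 40 and `#` to a score of 2, which is how a low-confidence base call shows up in the file.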
03
Adapter trimming, quality filtering
Library adapter sequences can be present in the FASTQ files and should be removed from the reads because they interfere with downstream analyses, such as aligning reads to a reference. FastQC provides a simple way to run quality control checks on raw sequence data from high-throughput sequencing pipelines (trimmed FASTQ, FASTQC).
Software: FastQ Toolkit
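The idea behind adapter trimming and quality filtering can be sketched in a few lines of Python. This is a simplified illustration, not how FastQ Toolkit works internally: it only finds exact adapter matches (production trimmers also tolerate mismatches), and the adapter sequence, the function names and the thresholds are assumptions chosen for the example.

```python
from typing import List, Tuple

# Assumption: a commonly used Illumina adapter prefix, here for illustration only.
ADAPTER = "AGATCGGAAGAGC"


def adapter_start(sequence: str, adapter: str, min_overlap: int = 3) -> int:
    """Return the index where a 3' adapter begins, or len(sequence) if none."""
    full = sequence.find(adapter)
    if full != -1:
        return full
    # The read may end partway through the adapter: look for the longest
    # adapter prefix (at least min_overlap bases) at the end of the read.
    longest = min(len(adapter) - 1, len(sequence))
    for overlap in range(longest, min_overlap - 1, -1):
        if sequence.endswith(adapter[:overlap]):
            return len(sequence) - overlap
    return len(sequence)


def trim_read(sequence: str, qualities: List[int],
              adapter: str = ADAPTER) -> Tuple[str, List[int]]:
    """Cut the sequence and its quality scores at the adapter start."""
    cut = adapter_start(sequence, adapter)
    return sequence[:cut], qualities[:cut]


def passes_quality(qualities: List[int], min_mean: float = 20.0) -> bool:
    """Keep a read only if it is non-empty and its mean Phred score is high enough."""
    return bool(qualities) and sum(qualities) / len(qualities) >= min_mean
```

Trimming the sequence and the quality scores together keeps the two lists the same length, which the FASTQ format requires.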
04
Reference mapping
The graph-based alignment method uses alt-aware mapping: population haplotypes are stitched into the reference with known alignments, establishing alternate graph paths that reads can seed-map and align to. A BAM file is the compressed binary version of a SAM file, the format used to represent aligned sequences (BAM).
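Because BAM is binary, reading it directly requires a dedicated library or tool. The sketch below instead works on the equivalent SAM text line, to show what an alignment record contains. The field order and FLAG bit meanings follow the SAM specification; the function names and the example read are illustrative assumptions.

```python
# The 11 mandatory tab-separated SAM fields, in order.
SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]

# Meaning of each bit in the FLAG field.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def parse_sam_line(line: str) -> dict:
    """Parse the mandatory fields of one SAM alignment line."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 fields")
    record = dict(zip(SAM_FIELDS, parts))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        record[key] = int(record[key])
    record["flag_names"] = decode_flag(record["flag"])
    return record


# Hypothetical read used only to demonstrate the parser.
line = "read1\t99\tchr1\t100\t60\t7M\t=\t300\t207\tGATTACA\tIIIIIII"
alignment = parse_sam_line(line)
```

For the example read, FLAG 99 decodes to a properly paired first-in-pair read whose mate maps to the reverse strand.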
05
Variant calling
The Variant Caller takes mapped and aligned DNA reads as input and calls SNPs and indels through a combination of column-wise detection and local de novo assembly of haplotypes. VCF is a text file format that contains information about variants found at specific positions in a reference genome (VCF).
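As an illustration of what a VCF record holds, the following Python sketch parses the eight fixed columns of a data line and labels each alternate allele as a SNP or an indel by comparing allele lengths. It is a simplified reading of the format, not the Variant Caller's logic; the function names and the example variants are made up for the demonstration.

```python
def classify_allele(ref: str, alt: str) -> str:
    """Label an ALT allele as 'SNP', 'indel' or 'other' from allele lengths."""
    # Symbolic alleles, breakends and missing values are not classified here.
    if alt.startswith("<") or alt in {"*", "."} or "[" in alt or "]" in alt:
        return "other"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # for example a multi-nucleotide substitution


def parse_vcf_line(line: str) -> dict:
    """Parse the fixed columns of one VCF data line (not a '#' header line)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, var_id, ref, alt, qual, filt, info = parts[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "filter": filt,
        # ALT may list several comma-separated alleles.
        "alleles": [(a, classify_allele(ref, a)) for a in alt.split(",")],
    }


# Hypothetical two-variant file used only to demonstrate the parser.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30\n"
    "chr1\t200\t.\tAT\tA,ATT\t.\tPASS\t.\n"
)
variants = [parse_vcf_line(row) for row in vcf_text.splitlines()
            if row and not row.startswith("#")]
```

The first example line is a single-base substitution, while the second carries two alternate alleles at one position, a deletion and an insertion, both labelled as indels.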
06
Variant annotation
Nirvana provides clinical-grade annotation of genomic variants and is developed under a rigorous testing process to ensure accurate results and to allow embedding in other software with regulatory requirements. VarSome Premium is a CE IVD-certified and HIPAA-compliant platform for fast and accurate variant discovery, annotation, and interpretation of NGS data (final report).
Software: Nirvana, VarSome Premium