Genetic Variation
Introduction
Genetic variation analysis of Glycine max (soybean).
Dataset Description
This dataset contains 301 Canadian soybean lines that were analyzed by genotyping-by-sequencing (GBS) with ApeKI digestion.
Organism: Glycine max.
Instrument: Illumina HiSeq 2000.
Layout: Paired-end.
Publication
To speed up the analysis, only a subset of 23 samples was used.
Original Data
NCBI BioProject (Illumina reads): PRJNA287266.
NCBI Genome: Glycine max (soybean).
Bioinformatic Analysis
1 - DNA-Seq Alignment
Application
DNA-Seq Alignment (BWA).
Input
Illumina sequencing data in FASTQ format.
NCBI genome in FASTA format.
Parameters
Single-end Data.
Genome Sequences: GCF_000004515.6_Glycine_max_v4.0_genomic.fna
Minimum Seed Length: 19
Band Width: 100
Z-dropoff: 100
Trigger Re-seeding: 1.5
Seed Occurrence: 20
Skip Seeds: 500
Drop Chains: 0.5
Discard Chains: 0
Mate Rescue Rounds: 50
Skip Mate Rescue: false
Skip Pairing: false
Matching Score: 1
Mismatch Penalty: 4
Gap Open Penalty (DEL): 6
Gap Open Penalty (INS): 6
Gap Extension Penalty (DEL): 1
Gap Extension Penalty (INS): 1
5'-end Clipping Penalty: 5
3'-end Clipping Penalty: 5
Unpaired Read Penalty: 17
Minimum Score: 30
Split Alignments as Primary: false
MapQ of Supp. Alignments: false
Output All Alignments: false
Soft Clipping for Supp.: false
Shorter Split Hits as Secondary: false
Sort BAM File: By Coordinates
Add Read Group Information: false
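The parameter list above corresponds closely to the command-line options of BWA-MEM (`bwa mem`). As a minimal sketch, assuming BWA 0.7.17 and samtools 1.x (the exact command the platform runs is not given here), the settings can be translated into argument lists as follows. The thread count and the read-group argument are hypothetical additions, and sorting by coordinates is delegated to `samtools sort` because BWA itself does not sort.

```python
"""Sketch: map the DNA-Seq Alignment settings onto bwa mem / samtools sort arguments.

Assumes BWA 0.7.17 and samtools 1.x; thread count and read group are hypothetical.
"""
from typing import List, Optional

# Numeric settings, keyed by the bwa mem option each one maps to.
BWA_MEM_NUMERIC = {
    "-k": "19",    # Minimum Seed Length
    "-w": "100",   # Band Width
    "-d": "100",   # Z-dropoff
    "-r": "1.5",   # Trigger Re-seeding
    "-y": "20",    # Seed Occurrence (3rd-round seeding)
    "-c": "500",   # Skip Seeds
    "-D": "0.5",   # Drop Chains
    "-W": "0",     # Discard Chains
    "-m": "50",    # Mate Rescue Rounds
    "-A": "1",     # Matching Score
    "-B": "4",     # Mismatch Penalty
    "-O": "6,6",   # Gap Open Penalty (DEL, INS)
    "-E": "1,1",   # Gap Extension Penalty (DEL, INS)
    "-L": "5,5",   # 5'-end and 3'-end Clipping Penalty
    "-U": "17",    # Unpaired Read Penalty
    "-T": "30",    # Minimum Score
}

# Boolean settings: each flag is emitted only when its setting is true.
BWA_MEM_SWITCHES = {
    "-S": False,  # Skip Mate Rescue
    "-P": False,  # Skip Pairing
    "-5": False,  # Split Alignments as Primary
    "-q": False,  # MapQ of Supp. Alignments (leave mapQ unmodified)
    "-a": False,  # Output All Alignments
    "-Y": False,  # Soft Clipping for Supp.
    "-M": False,  # Shorter Split Hits as Secondary
}


def bwa_mem_args(reference: str, reads: List[str], threads: int = 4,
                 read_group: Optional[str] = None) -> List[str]:
    """Build the bwa mem argv; bwa writes SAM to standard output."""
    args = ["bwa", "mem", "-t", str(threads)]
    for flag, value in BWA_MEM_NUMERIC.items():
        args += [flag, value]
    args += [flag for flag, enabled in BWA_MEM_SWITCHES.items() if enabled]
    if read_group is not None:  # Add Read Group Information (false in this run)
        args += ["-R", read_group]
    return args + [reference] + reads


def samtools_sort_args(output_bam: str) -> List[str]:
    """Coordinate-sort the SAM stream read from standard input ("-")."""
    return ["samtools", "sort", "-o", output_bam, "-"]
```

For example, `bwa_mem_args("GCF_000004515.6_Glycine_max_v4.0_genomic.fna", ["sample.fastq.gz"])` (the FASTQ name is hypothetical) yields a command whose SAM output could be piped into `samtools_sort_args("sample.bam")` with `subprocess.Popen`. The reference must be indexed with `bwa index` beforehand.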
Execution Time
Around 30 minutes.
Output
alignments_per_category_bwa.box: Stacked bar plot of the DNA-Seq alignment results.
relative_alignments_per_category.box: Stacked bar plot of the DNA-Seq alignment results (percentages).
report_bwa.box: Report summarizing the DNA-Seq alignment results.
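The relative plot presumably rescales each sample's alignment counts so that its stacked bar sums to 100%. A minimal sketch of that arithmetic follows; the category names and counts are hypothetical, since the report's actual categories are not listed here.

```python
"""Sketch: convert per-sample alignment counts into percentages for a relative stacked bar plot.

Category names and counts are hypothetical placeholders.
"""
from typing import Dict


def to_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each category as a percentage of the sample's total reads."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


def relative_table(per_sample: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Apply to_percentages to every sample so each stacked bar sums to 100%."""
    return {sample: to_percentages(counts) for sample, counts in per_sample.items()}


if __name__ == "__main__":
    example = {"sample_A": {"mapped": 900, "unmapped": 100}}  # hypothetical numbers
    print(relative_table(example))  # {'sample_A': {'mapped': 90.0, 'unmapped': 10.0}}
```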
2 - Variant Calling
Application
Variant Calling (BCFtools).
Input
NCBI genome in FASTA format.
DNA-Seq Alignments in BAM format (from the 1 - DNA-Seq Alignment step).
Parameters
BAM files: bam.files folder
Reference Genome: GCF_000004515.6_Glycine_max_v4.0_genomic.fna.gz
Adjust Mapping Quality: 0
Max. Depth: 250
Min. Mapping Quality: 0
Min. Base Quality: 13
Ignore @RG Tags: False
BAQ option: No BAQ
Extension Error Probability: 20
Minimum Fraction of Gapped Reads: 0.002
Tandem Quality: 500
Skip Indel Calling: False
Gapped Reads for Indel: 1
Phred Open Sequencing Error: 40
Keep Alternate Alleles: True
Use Groups: False
VCF File: bcftools.vcf.gz
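Several of these settings match the option defaults of `bcftools mpileup` (for example, a maximum depth of 250 and a minimum base quality of 13), which suggests the step runs `bcftools mpileup` piped into `bcftools call`. The sketch below is built on that assumption for bcftools 1.x. The use of the multiallelic caller and of `--variants-only` are further assumptions, and "Use Groups" is left unmapped because its exact meaning is not documented here.

```python
"""Sketch: express the Variant Calling settings as bcftools mpileup / bcftools call arguments.

Assumes bcftools 1.x; caller model and --variants-only are assumptions.
"""
from typing import List

# (long option, value) pairs taken from the parameter list.
MPILEUP_OPTIONS = [
    ("--adjust-MQ", "0"),       # Adjust Mapping Quality
    ("--max-depth", "250"),     # Max. Depth
    ("--min-MQ", "0"),          # Min. Mapping Quality
    ("--min-BQ", "13"),         # Min. Base Quality
    ("--ext-prob", "20"),       # Extension Error Probability
    ("--gap-frac", "0.002"),    # Minimum Fraction of Gapped Reads
    ("--tandem-qual", "500"),   # Tandem Quality
    ("--min-ireads", "1"),      # Gapped Reads for Indel
    ("--open-prob", "40"),      # Phred Open Sequencing Error
]


def mpileup_args(reference: str, bams: List[str], ignore_rg: bool = False,
                 no_baq: bool = True, skip_indels: bool = False) -> List[str]:
    """Build the mpileup argv; likelihoods go to stdout as uncompressed BCF."""
    args = ["bcftools", "mpileup", "--fasta-ref", reference, "--output-type", "u"]
    for option, value in MPILEUP_OPTIONS:
        args += [option, value]
    if ignore_rg:
        args.append("--ignore-RG")    # Ignore @RG Tags
    if no_baq:
        args.append("--no-BAQ")       # BAQ option: No BAQ
    if skip_indels:
        args.append("--skip-indels")  # Skip Indel Calling
    return args + bams


def call_args(output_vcf: str, keep_alts: bool = True) -> List[str]:
    """Build the call argv; it reads the piped mpileup stream and writes bgzipped VCF."""
    args = ["bcftools", "call", "--multiallelic-caller", "--variants-only",
            "--output-type", "z", "--output", output_vcf]
    if keep_alts:
        args.append("--keep-alts")    # Keep Alternate Alleles
    return args
```

A gzip-compressed reference such as GCF_000004515.6_Glycine_max_v4.0_genomic.fna.gz must be compressed with bgzip and indexed with `samtools faidx` before mpileup can read it; the mpileup output would then be piped into the call command.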
Execution Time
16 minutes.
Output
bcftools.vcf.gz: VCF file with the called variants.
variant_calling_report.box: Summary report of the variants found in the alignments.
raw_read_depth.box: Distribution of raw read depths.
proportion_quality_depth.box: Distribution of the ratio of variant quality to raw read depth.
average_mapping_quality.box: Distribution of the MQ (average mapping quality) field.
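The three distributions above are computed from per-variant values stored in the VCF. The sketch below extracts them from plain VCF text, assuming that the INFO column carries `DP` and `MQ` and reading "proportion quality/raw depth" as QUAL divided by DP; both readings are assumptions.

```python
"""Sketch: extract per-variant raw depth, QUAL/DP and MQ from VCF text lines.

Assumes DP and MQ live in the INFO column and that the plotted ratio is QUAL / DP.
"""
from typing import Dict, Iterable, Iterator, Optional, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO column such as 'DP=10;MQ=60;INDEL' into a dict; flags map to ''."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def depth_metrics(lines: Iterable[str]) -> Iterator[Tuple[int, Optional[float], Optional[float]]]:
    """Yield (DP, QUAL/DP, MQ) per data line; a missing DP becomes 0, other gaps None."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        depth = int(info.get("DP") or 0)
        qual = None if cols[5] == "." else float(cols[5])
        ratio = qual / depth if qual is not None and depth > 0 else None
        mq = float(info["MQ"]) if info.get("MQ") else None
        yield depth, ratio, mq
```

A bgzip-compressed file such as bcftools.vcf.gz can be passed in via `gzip.open("bcftools.vcf.gz", "rt")`, since bgzip output is valid multi-member gzip.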
3 - Variant Filtering
Application
Variant Filtering.
Input
VCF file (from the 2 - Variant Calling step).
Parameters
Proportion ‘Quality/Counts’: 2
Raw Read Depth: 2
Phred Quality: 20
Average Mapping Quality: 59
Remove Multiple Alleles: True
Check Reads in Both Strands: False
Check if Reads are Balanced: False
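The filter can be sketched as a set of minimum thresholds applied to each VCF record. This is an illustration under stated assumptions, not the application's implementation: every threshold is treated as a minimum, "Proportion 'Quality/Counts'" is read as QUAL/DP, "Remove Multiple Alleles" drops records whose ALT column lists more than one allele, and records missing QUAL, DP or MQ are discarded. The strand and balance checks are disabled in this run, so they are omitted.

```python
"""Sketch: apply the Variant Filtering thresholds to VCF text lines.

Assumptions: thresholds are minimums, the proportion is QUAL/DP, and records lacking
QUAL, DP or MQ are dropped. Strand and balance checks are not implemented.
"""
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class FilterSettings:
    min_qual_per_depth: float = 2.0   # Proportion 'Quality/Counts' (read as QUAL/DP)
    min_depth: int = 2                # Raw Read Depth
    min_qual: float = 20.0            # Phred Quality
    min_mq: float = 59.0              # Average Mapping Quality
    remove_multiallelic: bool = True  # Remove Multiple Alleles


def info_dict(info: str) -> Dict[str, str]:
    """Map INFO keys to values; partition('=')[::2] keeps the (key, value) pair."""
    return dict(item.partition("=")[::2] for item in info.split(";"))


def passes(line: str, s: FilterSettings) -> bool:
    """Return True if one VCF data line meets every threshold."""
    cols = line.rstrip("\n").split("\t")
    alt, qual_text, info = cols[4], cols[5], info_dict(cols[7])
    if s.remove_multiallelic and "," in alt:
        return False
    if qual_text == "." or "DP" not in info or "MQ" not in info:
        return False  # assumption: records lacking QUAL, DP or MQ are dropped
    qual, depth, mq = float(qual_text), int(info["DP"]), float(info["MQ"])
    return (depth > 0
            and depth >= s.min_depth
            and qual >= s.min_qual
            and mq >= s.min_mq
            and qual / depth >= s.min_qual_per_depth)


def filter_vcf(lines: Iterable[str], s: Optional[FilterSettings] = None) -> List[str]:
    """Keep header lines plus the data lines that pass the filter."""
    settings = s if s is not None else FilterSettings()
    return [ln for ln in lines if ln.startswith("#") or passes(ln, settings)]
```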
Execution Time
2 seconds.
Output
filtered.vcf.gz: Filtered VCF file.
variant_filtering_report.box: Summary report of the filtering step.
raw_read_depth.box: Distribution of raw read depths in the variants that passed the filter.
phred_quality.box: Distribution of the QUAL field.
proportion_quality_depth.box: Distribution of the ratio of variant quality to raw read depth.
average_mapping_quality.box: Distribution of the MQ (average mapping quality) field.
4 - Variant Annotation
Application
Variant Annotation using VEP.
Input
Filtered VCF file (from the 3 - Variant Filtering step).
Execution Time
10 minutes.
Output
annotation.box: Table with information about each variant found.
report.box: Summary report describing the variant types, their predicted consequences, and some population genetics statistics.
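The variant-type counts and population statistics in the summary report can be approximated directly from a VCF. The sketch below classifies each record by the lengths of REF and ALT and computes an alternate-allele frequency from the GT field; both are illustrative choices rather than VEP's own method, and the code assumes biallelic records with GT as the first FORMAT field.

```python
"""Sketch: count variant types and estimate alternate-allele frequencies from VCF lines.

Illustrative only; assumes biallelic records and GT as the first FORMAT field.
"""
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def variant_type(ref: str, alt: str) -> str:
    """Classify a biallelic record by comparing REF and ALT lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # equal lengths, more than one base


def alt_frequency(sample_columns: List[str]) -> Optional[float]:
    """Fraction of called alleles equal to ALT ('1'); None if no allele was called."""
    alleles: List[str] = []
    for column in sample_columns:
        gt = column.split(":")[0].replace("|", "/")  # treat phased and unphased alike
        alleles += [a for a in gt.split("/") if a != "."]
    if not alleles:
        return None
    return sum(a == "1" for a in alleles) / len(alleles)


def summarize(lines: Iterable[str]) -> Tuple[Dict[str, int], List[Optional[float]]]:
    """Return variant-type counts and one ALT frequency per data line."""
    types: Counter = Counter()
    freqs: List[Optional[float]] = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        types[variant_type(cols[3], cols[4])] += 1
        freqs.append(alt_frequency(cols[9:]))  # sample columns start at index 9
    return dict(types), freqs
```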