Choosing Parameters

Command Line Parameters

A command line version of VikNGS is available for users who wish to run association tests without a graphical user interface. The command line tool requires a multi-sample VCF file and a corresponding sample information file. By default, it runs a common variant association test on a single thread.

Run ./VikNGS -h for the list of available parameters.

Parameter        Value/Default    Description
--vcf, -v        [DIRECTORY]      Directory of a multi-sample VCF file (required)
--sample, -i     [DIRECTORY]      Directory of a file containing sample information (required)
--bed, -b        [DIRECTORY]      Directory of a BED file for collapsing variants
--out, -o        [DIRECTORY]=.    Directory for output (defaults to current directory)
--help, -h                        Print a help message and exit
--common, -c                      Perform a common variant association test (default)
--rare, -r       [TEST NAME]      Perform a rare variant association test
--boot, -n       [INT]=1000       Number of bootstrap iterations to calculate
--stop, -s                        Stop bootstrapping if p-value looks to be > 0.05
--collapse, -k   [INT]=5          Collapse every k variants (rare only)
--gene                            Collapse variants by gene if BED file specified (default)
--exon                            Collapse variants by exon if BED file specified
--from           [INT]            Only include variants with POS larger than this value
--to             [INT]            Only include variants with POS smaller than this value
--chr            [CHR NAME]       Only include variants on this chromosome
--maf, -m        [FLOAT]=0.05     Minor allele frequency cut-off (common-rare threshold)
--depth, -d      [INT]=30         Read depth cut-off (low-high read depth threshold)
--missing, -x    [FLOAT]=0.1      Missing data cut-off (maximum tolerance for missing data)
--all, -a                         Include variants which do not have PASS in the FILTER column
--threads, -t    [INT]=1          Number of threads
--batch, -h      [INT]=1000       Number of variants to read from VCF before beginning tests

Example 1. Running a common test on 16 threads for variants on chromosome 7 with minor allele frequency > 10% and ignoring what is in the FILTER column of the VCF:

./VikNGS --vcf [...] --sample [...] --chr chr7 -m 0.1 --all -t 16

Example 2. Running a rare test (CAST) on 4 threads, collapsing variants along genes and using one million bootstrap iterations with early stopping:

./VikNGS --vcf [...] --sample [...] --bed [...] -r cast --gene -n 1000000 --stop -t 4
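
When VikNGS is driven from a script, the argument list can be assembled programmatically. The following is a minimal Python sketch and not part of VikNGS itself: the helper name build_vikngs_command and the file names are hypothetical, and only flags listed in the parameter table are used.

```python
import shlex


def build_vikngs_command(vcf, sample, executable="./VikNGS", rare_test=None,
                         bed=None, collapse_by=None, chrom=None, maf=None,
                         boot=None, early_stop=False, include_all=False,
                         threads=None):
    """Assemble a VikNGS argument list (hypothetical helper, not part of VikNGS)."""
    cmd = [executable, "--vcf", vcf, "--sample", sample]
    if bed is not None:
        cmd += ["--bed", bed]
    if rare_test is not None:
        cmd += ["-r", rare_test]          # e.g. "cast" or "skat"
    if collapse_by in ("gene", "exon"):
        cmd.append("--" + collapse_by)    # only meaningful with a BED file
    if chrom is not None:
        cmd += ["--chr", chrom]
    if maf is not None:
        cmd += ["-m", str(maf)]
    if boot is not None:
        cmd += ["-n", str(boot)]
    if early_stop:
        cmd.append("--stop")
    if include_all:
        cmd.append("--all")
    if threads is not None:
        cmd += ["-t", str(threads)]
    return cmd


# Equivalent of Example 2, with placeholder file names (assumptions).
example_2 = build_vikngs_command(
    "cohort.vcf", "samples.txt", bed="genes.bed", rare_test="cast",
    collapse_by="gene", boot=1000000, early_stop=True, threads=4)
print(shlex.join(example_2))
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting problems with file paths.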

Parameter Explanation

Minor Allele Frequency Cutoff

While reading the VCF file, VikNGS computes an allele frequency for each variant. The minor allele frequency (MAF) is estimated using only the samples included in the multi-sample VCF file. The MAF cutoff defines which variants are considered “rare” versus “common”. When running a common association test, variants with an estimated MAF less than the cutoff (i.e. rare variants) are excluded from testing. Likewise, when running a rare association test, variants with an estimated MAF greater than the cutoff (i.e. common variants) are excluded from testing.
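
The effect of the cutoff can be illustrated with a short Python sketch. This is a simplified illustration, not VikNGS's implementation: it assumes hard genotype calls coded as alternate allele counts (0, 1, 2, or None when missing), whereas VikNGS's own frequency estimate may be computed differently. The function names are hypothetical, and the handling of a variant exactly at the cutoff follows a literal reading of the text above.

```python
def minor_allele_frequency(genotypes):
    """Estimate MAF from alternate allele counts (0/1/2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no usable calls for this variant
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return min(alt_freq, 1 - alt_freq)


def keep_for_test(maf, cutoff=0.05, test="common"):
    """Apply the MAF cutoff for a 'common' or 'rare' test.

    Assumption: a variant exactly at the cutoff is kept by both tests,
    since the text only excludes strictly smaller / strictly greater values.
    """
    if test == "common":
        return maf >= cutoff
    return maf <= cutoff


maf = minor_allele_frequency([0, 1, 2, 0, None])
print(maf, keep_for_test(maf, test="common"), keep_for_test(maf, test="rare"))
```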

Missing Data Threshold

Variants may have ambiguous or missing genotype information (e.g. GT = ./.) for some of the individuals in the multi-sample VCF file. If too much data is missing, association tests may produce misleading results. Any variant that is missing more data than this threshold will be excluded from testing. The default value is 0.1, which means that if more than 10% of sample calls cannot be determined, the variant will be ignored.

Note

If running a quantitative association test, the proportion of missing data is calculated from all samples. In a case-control test, two proportions are calculated (one for all cases, one for all controls). If either cases or controls fail to satisfy the missing threshold, the variant is excluded.
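
The rule described in this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not VikNGS code: genotype calls are represented as a list with None for a missing call, is_case is a parallel list of booleans, and the function names are hypothetical.

```python
def missing_fraction(calls):
    """Fraction of calls that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)


def passes_missing_filter(calls, is_case=None, threshold=0.1):
    """Return True if the variant is kept.

    Quantitative test (is_case is None): one proportion over all samples.
    Case-control test: one proportion per group; both must satisfy the threshold.
    """
    if is_case is None:
        return missing_fraction(calls) <= threshold
    cases = [c for c, flag in zip(calls, is_case) if flag]
    controls = [c for c, flag in zip(calls, is_case) if not flag]
    # Empty groups are skipped here (an assumption of this sketch).
    return all(missing_fraction(g) <= threshold for g in (cases, controls) if g)


# 10 fully called cases and 5 controls with one missing call.
calls = [0] * 10 + [None, 0, 1, 0, 2]
is_case = [True] * 10 + [False] * 5
print(passes_missing_filter(calls))           # pooled: 1 of 15 missing
print(passes_missing_filter(calls, is_case))  # controls alone: 1 of 5 missing
```

In this example the pooled proportion is below 10%, so the variant would be kept in a quantitative test, but the control group alone exceeds the threshold, so the variant would be excluded in a case-control test.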

Filter By Genomic Coordinate

Enables filtering of variants based on the CHROM and POS values in the VCF file. Variants outside a specific chromosome or range of positions can be excluded.

Must PASS

Variants which do not contain “PASS” in the FILTER column of the VCF are filtered out. By default this filtering step is on; turning it off causes the contents of the FILTER column to be ignored.
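
The coordinate and FILTER checks described above can be sketched together in Python. This is a hedged illustration rather than VikNGS's parser: the function name keep_record is hypothetical, the strict position comparisons follow a literal reading of "larger than" and "smaller than" in the --from and --to descriptions, and the FILTER check assumes the column holds exactly PASS for passing variants.

```python
def keep_record(vcf_line, chr_name=None, pos_from=None, pos_to=None, must_pass=True):
    """Decide whether one tab-separated VCF data line survives the filters."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    chrom, pos, filt = fields[0], int(fields[1]), fields[6]
    if chr_name is not None and chrom != chr_name:
        return False
    if pos_from is not None and pos <= pos_from:  # keep only POS larger than pos_from
        return False
    if pos_to is not None and pos >= pos_to:      # keep only POS smaller than pos_to
        return False
    if must_pass and filt != "PASS":
        return False
    return True


pass_line = "chr7\t150\trs1\tA\tG\t50\tPASS\t."
fail_line = "chr7\t150\trs1\tA\tG\t50\tLowQual\t."
print(keep_record(pass_line, chr_name="chr7", pos_from=100, pos_to=200))
print(keep_record(fail_line), keep_record(fail_line, must_pass=False))
```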

Read Depth High/Low Cutoff

Samples with read depth above this threshold are considered high read depth samples (default=30). This cutoff is only used by the vRVS test, and only when read depth values are provided in the sample information file.
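
As a small illustration of how the cutoff partitions samples, consider the Python sketch below. It assumes read depths are already available as a mapping from sample name to depth; the sample names and the function name are hypothetical, and a sample exactly at the cutoff is treated as low read depth because the text says "above".

```python
def split_by_read_depth(sample_depths, cutoff=30):
    """Split sample names into (high, low) read depth groups."""
    high = [name for name, depth in sample_depths.items() if depth > cutoff]
    low = [name for name, depth in sample_depths.items() if depth <= cutoff]
    return high, low


# Hypothetical per-sample read depths.
depths = {"S1": 80, "S2": 30, "S3": 4, "S4": 31}
high, low = split_by_read_depth(depths)
print(high, low)
```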

Collapse Variants

See the BED file section on the Input page for details.

Testing Parameters

See the Tests page for details on the available tests.

Note

Use -r cast and -r skat for the CAST-like and SKAT-like tests, respectively.

Threads and Batch Size

Threads is the number of threads used for association testing. Batch size is the number of variants processed at one time on a given thread.

Warning

VikNGS will parse the VCF file line-by-line and store the data in memory. When using a large batch size, please keep in mind the memory limits of your device as these settings will determine how much memory is used.

Plot Results

Only available in the graphical user interface. If this setting is checked, a plotting interface is displayed in a new window once association testing has finished.

Explain Filter

If checked, writes a file that explains why each filtered variant was excluded. See the Output page for more details.

Retain Genotypes

This setting will store genotypes parsed from the VCF file in memory and will enable exploration of these values after p-values have been calculated.

Warning

Retaining all genotypes is extremely memory-intensive since a large amount of the data from the VCF file is being stored in memory simultaneously. Please only use this option for small datasets or on machines with very large amounts of memory.