Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required
type: string
pattern: ^\S+\.csv$

A design file with information about the samples in your experiment. Use this parameter to specify the location of the input files. It has to be a comma-separated file with a header row. See usage docs.

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required
type: string

Email address for completion summary.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
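The pattern above is stricter than general e-mail syntax, so it can help to test an address before launching a run. The following is a minimal sketch that only applies the regular expression shown above; the function name is hypothetical and is not part of the pipeline.

```python
import re

# Pattern copied verbatim from the parameter definition above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def matches_email_pattern(address: str) -> bool:
    """Return True if the address satisfies the schema pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ("jane.doe@example.org", "jane+runs@example.org"):
        print(candidate, matches_email_pattern(candidate))
```

Note that addresses containing characters outside the listed classes (such as `+`), or ending in a top-level domain longer than five letters, do not satisfy this pattern.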

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Save FastQ files after merging re-sequenced libraries in the results directory.

type: boolean

Reference genome related files and options required for the workflow.

Name of iGenomes reference.

type: string

If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.

See the nf-core website docs for more details.

Path to FASTA genome file.

type: string
pattern: ^\S+\.fn?a(sta)?(\.gz)?$

This parameter is mandatory if --genome is not specified. If you don't have a BWA index available this will be generated for you automatically. Combine with --save_reference to save BWA index for future runs.
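To see which file names the FASTA pattern above accepts, here is a small sketch that applies the same regular expression. The helper name and the example file names are hypothetical.

```python
import re

# Pattern copied verbatim from the parameter definition above.
FASTA_PATTERN = re.compile(r"^\S+\.fn?a(sta)?(\.gz)?$")


def is_accepted_fasta_name(path: str) -> bool:
    """Return True if the path satisfies the schema pattern for FASTA files."""
    return FASTA_PATTERN.fullmatch(path) is not None


if __name__ == "__main__":
    for name in ("genome.fa", "genome.fna.gz", "genome.fasta", "genome.fas"):
        print(name, is_accepted_fasta_name(name))
```

The pattern accepts the `.fa`, `.fna` and `.fasta` extensions, each optionally followed by `.gz`, and rejects paths that contain whitespace.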

Path to FASTA dictionary file.

type: string

NB If none provided, will be generated automatically from the FASTA reference.

Path to FASTA reference index.

type: string

NB If none provided, will be generated automatically from the FASTA reference.

Path to GTF annotation file.

type: string

This parameter is mandatory if --genome is not specified.

Path to GFF3 annotation file.

type: string

This parameter must be specified if neither --genome nor --gtf is specified.

Path to BED file containing exon intervals. This will be created from the GTF file if not specified.

type: string

Read length

type: number
default: 150

Specify the read length for the STAR aligner.

If generated by the pipeline, save the STAR index in the results directory.

type: boolean

If the STAR index is generated by the pipeline, use this parameter to save it to your results folder. The index can then be reused in future pipeline runs, reducing processing time.

Path to known indels VCF file

type: string

Path to known indels index file

type: string

Path to dbSNP VCF file

type: string

Path to dbSNP VCF index file

type: string

snpEff DB version.

type: string

If you use AWS iGenomes, this has already been set for you appropriately.
This is used to specify the database to annotate with.
Alternatively, database names can be listed with the snpEff databases command.

snpEff genome.

type: string

If you use AWS iGenomes, this has already been set for you appropriately.
This is used to specify the genome when using the container with pre-downloaded cache.

VEP genome.

type: string

If you use AWS iGenomes, this has already been set for you appropriately.
This is used to specify the genome when using the container with pre-downloaded cache.

VEP species.

type: string

If you use AWS iGenomes, this has already been set for you appropriately.
Alternatively species listed in Ensembl Genomes caches can be used.

VEP cache version.

type: string

If you use AWS iGenomes, this has already been set for you appropriately.
Alternatively, the cache version can be used to specify the correct Ensembl Genomes version number, as these differ from the concurrent Ensembl/VEP version numbers.

Type of feature to parse from annotation file

type: string

This parameter value can be exon, transcript or gene. Default: exon.

Download annotation cache.

type: boolean

Set this parameter if you wish to download the annotation cache.

Do not load the iGenomes reference config.

hidden
type: boolean

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

The base path to the igenomes reference files

hidden
type: string
default: s3://ngi-igenomes/igenomes/

Define parameters related to read alignment

Specifies the alignment algorithm to use. The only currently available option is 'star'.

type: string
default: star

This parameter defines which aligner is used to align the RNA reads to the reference genome. Currently only the STAR aligner is supported, so use 'star' as the value for this option.

Path to STAR index folder or compressed file (tar.gz)

type: string

This parameter can be used if a pre-built STAR index is available. You can either give the full path to the index directory or a compressed file in tar.gz format.

Enable STAR 2-pass mapping mode.

type: boolean

This parameter enables STAR to perform 2-pass mapping. Default true.

Do not use the GTF file during the STAR index building step.

type: boolean

Do not use parameter --sjdbGTFfile <GTF file> during the STAR genomeGenerate process.

Option to limit RAM when sorting BAM file. Value to be specified in bytes. If 0, will be set to the genome index size.

type: integer

This parameter specifies the maximum available RAM (bytes) for sorting BAM during STAR alignment.

Specifies the number of genome bins for coordinate-sorting

type: integer
default: 50

This parameter specifies the number of bins to be used for coordinate sorting during the STAR alignment step.

Specifies the maximum number of collapsed junctions

type: integer
default: 1000000

Sequencing center information to be added to read group of BAM files.

type: string

This parameter is required for creating a proper BAM header for downstream analysis with GATK.

Specify the sequencing platform used

type: string
default: illumina

This parameter is required for creating a proper BAM header for downstream analysis with GATK.

Where possible, save unaligned reads from aligner to the results directory.

type: boolean

This may either be in the form of FastQ or BAM files depending on the options available for that particular tool.

Save the intermediate BAM files from the alignment step.

type: boolean

By default, intermediate BAM files are not saved. To limit storage usage, only the final BAM files created after the appropriate filtering step are kept. Set this parameter to also save the intermediate BAM files.

Create a CSI index for BAM files instead of the traditional BAI index. This will be required for genomes with larger chromosome sizes.

type: boolean

Specify whether to remove duplicates from the BAM during Picard MarkDuplicates step.

type: boolean

Set this to true to remove duplicates from the BAM file during the Picard MarkDuplicates step.

The minimum phred-scaled confidence threshold at which variants should be called.

type: number
default: 20

Specify the minimum phred-scaled confidence threshold at which variants should be called.
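As a reminder of what a phred-scaled value means, the standard relationship is Q = -10 * log10(P), where P is the probability that the call is wrong. The sketch below only illustrates that conversion; the function names are hypothetical and nothing here is taken from the pipeline code.

```python
import math


def phred_to_error_probability(quality: float) -> float:
    """Convert a phred-scaled quality Q into an error probability P."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability: float) -> float:
    """Convert an error probability P (0 < P <= 1) into a phred-scaled quality."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10 * math.log10(probability)


if __name__ == "__main__":
    # The default threshold of 20 corresponds to a 1-in-100 error probability.
    print(phred_to_error_probability(20))
```

Raising the threshold by 10 reduces the tolerated error probability by a factor of ten.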

Enable generation of per-sample GVCFs in addition to the VCFs.

type: boolean

This parameter enables GATK HaplotypeCaller to generate GVCFs. Default false.

Specify which tools RNAvar should use for annotating variants. Values can be 'snpeff', 'vep' or 'merge'. If you specify 'merge', the pipeline runs both snpeff and VEP annotation.

hidden
type: string

List of tools to be used for variant annotation.

This parameter must be a combination of the following values: snpeff, vep, merge

Path to VEP cache.

type: string
default: s3://annotation-cache/vep_cache/

Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}
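The path template above can be expanded by hand to check that a cache directory has the expected layout. This sketch simply substitutes values into the template exactly as written above; the function name and the example values (homo_sapiens, GRCh38, 110) are assumptions chosen for illustration, and the pipeline resolves the real path internally.

```python
def vep_cache_location(
    base: str, vep_species: str, vep_genome: str, vep_cache_version: str
) -> str:
    """Expand ${vep_species}/${vep_genome}_${vep_cache_version} under a base path."""
    subdir = f"{vep_species}/{vep_genome}_{vep_cache_version}"
    return base.rstrip("/") + "/" + subdir


if __name__ == "__main__":
    # Hypothetical values, shown only to illustrate the template expansion.
    print(
        vep_cache_location(
            "s3://annotation-cache/vep_cache/", "homo_sapiens", "GRCh38", "110"
        )
    )
```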

Path to snpEff cache.

type: string
default: s3://annotation-cache/snpeff_cache/

Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}

Allow usage of fasta file for annotation with VEP

hidden
type: boolean

By pointing VEP to a FASTA file, it is possible to retrieve reference sequence locally. This enables VEP to retrieve HGVS notations (--hgvs), check the reference sequence given in input data, and construct transcript models from a GFF or GTF file without accessing a database.

For details, see here.

Enable the use of the VEP dbNSFP plugin.

hidden
type: boolean

For details, see here.

Path to dbNSFP processed file.

hidden
type: string

To be used with --vep_dbnsfp.
dbNSFP files and more information are available at https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp and https://sites.google.com/site/jpopgen/dbNSFP/

Path to dbNSFP tabix indexed file.

hidden
type: string

To be used with --vep_dbnsfp.

Consequence to annotate with

hidden
type: string

To be used with --vep_dbnsfp.
This parameter is used to limit the output to a specific effect of the variant.
The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found here: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html
To filter on several consequences, separate them with '&' (e.g. 'consequence=3_prime_UTR_variant&intron_variant').
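The value format from the example above can be assembled from a list of Sequence Ontology terms. This is a minimal sketch that reproduces the string shape shown in the example, including the `consequence=` prefix; the function name is hypothetical, and whether the prefix is required should be checked against the plugin documentation.

```python
def build_consequence_filter(terms: list[str]) -> str:
    """Join consequence terms with '&' in the form used in the example above."""
    cleaned = [term.strip() for term in terms if term.strip()]
    if not cleaned:
        raise ValueError("at least one consequence term is required")
    return "consequence=" + "&".join(cleaned)


if __name__ == "__main__":
    print(build_consequence_filter(["3_prime_UTR_variant", "intron_variant"]))
```

On a command line, a value containing '&' generally needs to be quoted so that the shell does not interpret it.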

Fields to annotate with

hidden
type: string
default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF

To be used with --vep_dbnsfp.
This parameter can be used to retrieve individual values from the dbNSFP file. The values correspond to the column names in the dbNSFP file and are separated by commas.
The column names might differ between dbNSFP versions. Please check the Readme.txt file, which is provided with the dbNSFP file, to obtain the correct column names. The Readme file also contains a short description of the provided values and the versions of the tools used to generate them.

The default values are explained below:

rs_dbSNP - rs number from dbSNP
HGVSc_VEP - HGVS coding variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_transcriptid
HGVSp_VEP - HGVS protein variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_proteinid
1000Gp3_EAS_AF - Alternative allele frequency in the 1000Gp3 East Asian descendent samples
1000Gp3_AMR_AF - Alternative allele frequency in the 1000Gp3 American descendent samples
LRT_score - Original LRT two-sided p-value (LRTori), ranges from 0 to 1
GERP++_RS - Conservation score. The larger the score, the more conserved the site, ranges from -12.3 to 6.17
gnomAD_exomes_AF - Alternative allele frequency in the whole gnomAD exome samples.
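Because column names can differ between dbNSFP versions, it can be useful to check a field list against the header of the file you actually have. The sketch below parses the comma-separated default shown above and reports any requested fields that are absent from a list of header columns. The function names are hypothetical, and the header used in the demonstration is made up for illustration.

```python
# Default value copied from the parameter definition above.
DEFAULT_FIELDS = (
    "rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,"
    "LRT_score,GERP++_RS,gnomAD_exomes_AF"
)


def parse_fields(value: str) -> list[str]:
    """Split a comma-separated field string into a list of column names."""
    return [field.strip() for field in value.split(",") if field.strip()]


def missing_fields(requested: list[str], header_columns: list[str]) -> list[str]:
    """Return the requested fields that do not appear in the file header."""
    available = set(header_columns)
    return [field for field in requested if field not in available]


if __name__ == "__main__":
    # Hypothetical header; read the real one from your dbNSFP file.
    header = ["rs_dbSNP", "LRT_score", "GERP++_RS"]
    print(missing_fields(parse_fields(DEFAULT_FIELDS), header))
```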

Enable the use of the VEP LOFTEE plugin.

hidden
type: boolean

For details, see here.

Enable the use of the VEP SpliceAI plugin.

hidden
type: boolean

For details, see here.

Path to spliceai raw scores snv file.

hidden
type: string

To be used with --vep_spliceai.

Path to spliceai raw scores snv tabix indexed file.

hidden
type: string

To be used with --vep_spliceai.

Path to spliceai raw scores indel file.

hidden
type: string

To be used with --vep_spliceai.

Path to spliceai raw scores indel tabix indexed file.

hidden
type: string

To be used with --vep_spliceai.

Enable the use of the VEP SpliceRegion plugin.

hidden
type: boolean

For details, see here and here.

Add an extra custom argument to VEP.

type: string
default: --everything --filter_common --per_gene --total_length --offline --format vcf

Use this parameter to pass custom arguments to VEP.

Use annotation cache keys for snpeff_cache and vep_cache.
Only when using annotation-cache or a similar structure.
See here for more information.

hidden
type: boolean

The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.

hidden
type: string

VEP output-file format.

hidden
type: string

Sets the format of the output-file from VEP. Available formats: json, tab and vcf.

Define parameters that control the stages in the pipeline

Skip the process of base recalibration steps i.e., GATK BaseRecalibrator and GATK ApplyBQSR.

type: boolean

This parameter disables the base recalibration step, so that an uncalibrated BAM file is used for variant calling.

Skip the process of preparing interval lists for the GATK variant calling step

type: boolean

This parameter disables the preparation of multiple interval lists for use with the HaplotypeCaller module of GATK. Disabling this step is not recommended, as it is required to run variant calling correctly.

Skip variant filtering of GATK

type: boolean

Set this parameter if you don't want to filter any variants.

Skip variant annotation

type: boolean

Set this parameter if you don't want to run variant annotation.

Skip MultiQC reports

type: boolean

This parameter disables all QC reports.

Define parameters of the tools used in the pipeline

Number of parts into which the gene interval list is split in order to run GATK HaplotypeCaller in parallel.

type: integer
default: 25

Set this parameter to decide the number of splits for the gene interval list file.
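To illustrate what splitting an interval list into a fixed number of parts means, here is a simplified sketch that divides a list into near-equal contiguous chunks. It is only an illustration of the idea under that assumption: the function name is hypothetical, and the actual tool used by the pipeline may balance the parts differently (for example by total base count rather than by number of intervals).

```python
def split_into_chunks(items: list, n_chunks: int) -> list[list]:
    """Split items into at most n_chunks contiguous, near-equal, non-empty parts."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not items:
        return []
    n_chunks = min(n_chunks, len(items))
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for index in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    intervals = [f"interval_{i}" for i in range(100)]
    parts = split_into_chunks(intervals, 25)
    print(len(parts), [len(part) for part in parts][:3])
```

Each part can then be processed by an independent variant-calling job, which is why a larger value increases parallelism at the cost of more, smaller jobs.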

Do not use gene interval file during variant calling

type: boolean

If set to true, the gene intervals are not used during the variant calling step, so variants are called from all regions, including non-genic ones. Default is false.

The window size (in bases) in which to evaluate clustered SNPs.

type: integer
default: 35

This parameter is used by the GATK variant filtration step. It defines the window size (in bases) in which to evaluate clustered SNPs. It has to be used together with the other option 'cluster'.

The number of SNPs which make up a cluster. Must be at least 2.

type: integer
default: 3

This parameter is used by the GATK variant filtration step. It defines the number of SNPs which make up a cluster within a window. Must be at least 2.
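The interaction between the window size and the cluster size can be pictured with a small sketch: a group of SNPs counts as a cluster when at least `cluster` of them fit inside a stretch of `window` bases. The code below is a simplified illustration of that idea on a single chromosome, not GATK's implementation; the function name is hypothetical, and the exact boundary handling (here, last position - first position + 1 <= window) is an assumption.

```python
def find_clustered_snps(positions: list[int], window: int = 35, cluster: int = 3) -> list[int]:
    """Return the sorted SNP positions that belong to at least one cluster."""
    if cluster < 2:
        raise ValueError("cluster must be at least 2")
    if window < 1:
        raise ValueError("window must be at least 1")
    ordered = sorted(positions)
    flagged = set()
    for start in range(len(ordered) - cluster + 1):
        end = start + cluster - 1
        # Assumed rule: `cluster` consecutive SNPs span at most `window` bases.
        if ordered[end] - ordered[start] + 1 <= window:
            flagged.update(ordered[start:end + 1])
    return sorted(flagged)


if __name__ == "__main__":
    snps = [100, 110, 120, 500, 900]
    print(find_clustered_snps(snps, window=35, cluster=3))
```

With the defaults shown above (window 35, cluster 3), three SNPs within 35 bases of each other would be flagged, while isolated SNPs are left untouched.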

Value to be used for the FisherStrand (FS) filter

type: number
default: 30

This parameter defines the value to use for the FisherStrand (FS) filter in the GATK variant-filtering step.
The value should be given as a floating-point number. Default is 30.0.

Value to be used for the QualByDepth (QD) filter

type: number
default: 2

This parameter defines the value to use for the QualByDepth (QD) filter in the GATK variant-filtering step.
The value should be given as a floating-point number. Default is 2.0.
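As an illustration of how the FS and QD thresholds might be applied together, the sketch below labels a variant with the names of the filters it fails. The comparison directions are an assumption based on commonly used GATK hard-filter expressions (FS above the threshold, QD below the threshold); they are not stated in the text above, and the function name is hypothetical.

```python
def failed_hard_filters(
    fs: float, qd: float, fs_threshold: float = 30.0, qd_threshold: float = 2.0
) -> list[str]:
    """Return the names of the hard filters a variant fails (empty list if it passes)."""
    failed = []
    # Assumption: high FisherStrand values indicate strand bias.
    if fs > fs_threshold:
        failed.append("FS")
    # Assumption: low QualByDepth values indicate weak support per unit depth.
    if qd < qd_threshold:
        failed.append("QD")
    return failed


if __name__ == "__main__":
    print(failed_hard_filters(fs=45.0, qd=5.0))
    print(failed_hard_filters(fs=10.0, qd=10.0))
```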

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Institutional config URL link.

hidden
type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Send plain-text email instead of HTML.

hidden
type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden
type: string
default: 25.MB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
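The size pattern above accepts several spellings of the same limit. This sketch applies the regular expression to a few candidate values to show which forms pass; the function name and the candidate values other than the default are hypothetical.

```python
import re

# Pattern copied verbatim from the parameter definition above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value satisfies the schema pattern for file sizes."""
    return SIZE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("25.MB", "25MB", "1.5 GB", "25M", "25 mb"):
        print(repr(candidate), is_valid_size(candidate))
```

The unit must be upper case and must end in `B`, so values such as `25M` or `25 mb` do not satisfy the pattern.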

Do not use coloured log outputs.

hidden
type: boolean

Incoming hook URL for messaging service

hidden
type: string

Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.

Custom config file to supply to MultiQC.

hidden
type: string

Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.

hidden
type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Boolean whether to validate parameters against the schema at runtime

hidden
type: boolean
default: true

Base URL or local path to location of pipeline test dataset files

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/modules/data

Base URL or local path to location of pipeline test dataset files

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/modules/data

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden
type: string