7 Quick Example
- Experiment: E-GEOD-48829
- Species: Escherichia coli
- Assumptions: iRAP is installed and configured.
- The whole analysis should take less than two hours to run.
- Create the reference and raw data directories for the species under the data folder

      mkdir -p $IRAP_DIR/data/reference/ecoli_k12
      mkdir -p $IRAP_DIR/data/raw_data/ecoli_k12
- Put the genome and annotation files in the species reference folder

      cd $IRAP_DIR/data/reference/ecoli_k12
      wget -c ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/fasta/bacteria_122_collection/escherichia_coli_k_12_gca_000981485/dna/Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.dna.chromosome.I.fa.gz
      wget -c ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/gtf/bacteria_122_collection/escherichia_coli_k_12_gca_000981485/Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf.gz
      gunzip -f Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf.gz
      # remove the header from the GTF
      grep -v "^#" Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf > tmp && mv tmp Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf
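The decompress and header-removal steps can also be combined into one pipeline. The following is only a sketch: `strip_gtf_header` is a hypothetical helper name, not part of iRAP, and unlike `gunzip -f` it leaves the compressed file in place.

```shell
# Hypothetical helper (not part of iRAP): decompress a gzipped GTF and
# drop the '#' header lines in a single pass.
#   $1: input .gtf.gz file
#   $2: output .gtf file
strip_gtf_header() {
    gunzip -c "$1" | grep -v '^#' > "$2"
}
```

For the annotation above this would be called as `strip_gtf_header Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf.gz Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf`.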
- Put the FASTQ files in the species raw_data folder

      cd $IRAP_DIR/data/raw_data/ecoli_k12
      wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933983/SRR933983.fastq.gz
      wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933984/SRR933984.fastq.gz
      wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933985/SRR933985.fastq.gz
      wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933989/SRR933989.fastq.gz
      wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933990/SRR933990.fastq.gz
      wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933991/SRR933991.fastq.gz
      # (only some of them to keep the example small)
      #wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933986/SRR933986.fastq.gz
      #wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR933/SRR933988/SRR933988.fastq.gz
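The six downloads follow one URL pattern, so they can be written as a loop with an integrity check after each file. This is a sketch under assumptions: `fastq_url` and `download_runs` are hypothetical helper names, and taking the first six characters of the accession as the directory prefix is only known to match the nine-character accessions used here.

```shell
# Run accessions taken from the wget commands above
runs="SRR933983 SRR933984 SRR933985 SRR933989 SRR933990 SRR933991"

# Build the FTP URL for one single-end run, following the pattern of the
# URLs above (assumption: valid for nine-character accessions only).
fastq_url() {
    run=$1
    prefix=$(printf '%s' "$run" | cut -c1-6)
    printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s/%s.fastq.gz\n' "$prefix" "$run" "$run"
}

# Download every run into the current directory and test each archive
# with gzip -t so that a truncated transfer is noticed early.
download_runs() {
    for run in $runs; do
        wget -c "$(fastq_url "$run")" || return 1
        gzip -t "$run.fastq.gz" || { echo "corrupt: $run.fastq.gz" >&2; return 1; }
    done
}
```

Nothing is downloaded until `download_runs` is called, for example after changing into `$IRAP_DIR/data/raw_data/ecoli_k12`.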
- Create iRAP’s experiment configuration/control file

      # experiment name
      name=ecoli_ex
      # species
      species=ecoli_k12
      # reference genome
      reference=Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.dna.chromosome.I.fa.gz
      # gtf file
      gtf_file=Escherichia_coli_k_12_gca_000981485.EcoliK12AG100.90.gtf
      # user_trans=auto
      # Enable filtering based on quality
      qual_filtering=on
      # Use a contamination data set to filter out reads
      cont_index=no
      # Toplevel directory with the data
      data_dir=$(IRAP_DIR)/data
      mapper=bowtie2

      # some contrasts...
      # GA=Group A
      contrasts=GAvsGB GBvsGA
      GAvsGB=GA GB
      GBvsGA=GB GA
      GA=FA FB FC
      GB=FD FE

      se=FA FB FC FD FE

      FA=SRR933983.fastq.gz
      FA_rs=50
      FA_qual=33

      FB=SRR933984.fastq.gz
      FB_rs=50
      FB_qual=33

      FC=SRR933985.fastq.gz
      FC_rs=50
      FC_qual=33

      FD=SRR933989.fastq.gz
      FD_rs=50
      FD_qual=33

      FE=SRR933990.fastq.gz
      FE_rs=50
      FE_qual=33

      FF=SRR933991.fastq.gz
      FF_rs=50
      FF_qual=33
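Before running iRAP, it can help to confirm that every library on the `se=` line has a file entry and that the file is present. The check below is an independent sketch, not an iRAP feature: `check_libs` is a hypothetical helper that reads only the `se=` and `NAME=file` lines in the layout shown above.

```shell
# Hypothetical helper (not part of iRAP).
#   $1: iRAP configuration file
#   $2: directory holding the FASTQ files
# Returns non-zero if a library listed in se= has no file entry or the
# file is missing from the directory.
check_libs() {
    conf=$1
    dir=$2
    rc=0
    # Library names listed on the se= line (single-end libraries)
    libs=$(sed -n 's/^se=//p' "$conf")
    for lib in $libs; do
        # Matches "FA=..." but not "FA_rs=..." or "FA_qual=..."
        file=$(sed -n "s/^$lib=//p" "$conf")
        if [ -z "$file" ]; then
            echo "$lib: no file entry in $conf" >&2
            rc=1
        elif [ ! -f "$dir/$file" ]; then
            echo "$lib: $file not found in $dir" >&2
            rc=1
        fi
    done
    return $rc
}
```

A possible call, assuming the configuration file is saved as `ecoli_example.conf`: `check_libs ecoli_example.conf $IRAP_DIR/data/raw_data/ecoli_k12`.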
- It is assumed that the configuration file is named ecoli_example.conf
- Dry run to validate the configuration file and list all the commands that would be executed (-n option)

      irap conf=ecoli_example.conf mapper=hisat2 de_method=deseq max_threads=8 -n
- Run iRAP to process the experiment

      irap conf=ecoli_example.conf mapper=hisat2 de_method=deseq max_threads=8
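The dry run and the real run can be chained so that the real run starts only when the dry run succeeds. This is a sketch, assuming that `irap` exits with a non-zero status when the dry run finds a problem; `run_irap` and the two log file names are hypothetical.

```shell
# Hypothetical wrapper: pass the same options you would give to irap.
run_irap() {
    # Dry run (-n) first; its output is kept for inspection.
    if ! irap "$@" -n > irap_dryrun.log 2>&1; then
        echo "dry run failed, see irap_dryrun.log" >&2
        return 1
    fi
    # Real run with the same options; output is kept in a second log.
    irap "$@" > irap_run.log 2>&1
}
```

For this example: `run_irap conf=ecoli_example.conf mapper=hisat2 de_method=deseq max_threads=8`.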
- Output files
  - Filtered FASTQ files

        ls ecoli_ex/irap_qc/*.f.fastq.gz

  - BAM files

        ls ecoli_ex/irap_qc/hisat2/*.bam

  - Gene level quantification

        ls ecoli_ex/irap_qc/hisat2/htseq2/genes.raw.htseq2.tsv

  - Transcript/isoform level quantification

        ls ecoli_ex/irap_qc/hisat2/htseq2/transcripts.raw.htseq2.tsv

  - Exon level quantification

        ls ecoli_ex/irap_qc/hisat2/htseq2/exons.raw.htseq2.tsv

  - Differential expression

        ls ecoli_ex/irap_qc/hisat2/htseq2/deseq/*.genes_de.tsv
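The individual `ls` checks can be gathered into one pass that reports whatever is missing. The helper below is a sketch: `check_outputs` is a hypothetical name, and the paths are copied from the listing above, so they hold only for the htseq2 and deseq settings of this example.

```shell
# Hypothetical helper (not part of iRAP).
#   $1: top-level output directory, e.g. ecoli_ex/irap_qc
#   $2: mapper name, e.g. hisat2
# Returns non-zero if any expected file is missing or empty.
check_outputs() {
    top=$1
    mapper=$2
    missing=0
    # An unmatched glob stays literal, fails the -s test and is reported.
    for f in "$top"/*.f.fastq.gz \
             "$top/$mapper"/*.bam \
             "$top/$mapper"/htseq2/genes.raw.htseq2.tsv \
             "$top/$mapper"/htseq2/transcripts.raw.htseq2.tsv \
             "$top/$mapper"/htseq2/exons.raw.htseq2.tsv \
             "$top/$mapper"/htseq2/deseq/*.genes_de.tsv; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return $missing
}
```

For this example: `check_outputs ecoli_ex/irap_qc hisat2`.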
- Run iRAP to process the experiment with a different set of methods

      irap conf=ecoli_example.conf mapper=hisat2 quant_method=htseq2 de_method=deseq2 max_threads=8
      irap conf=ecoli_example.conf mapper=none quant_method=kallisto de_method=deseq2 max_threads=8
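When several method combinations are compared, the runs can be driven from a small list. This sketch uses only the two mapper and quantification pairs shown above; `run_method_sets` is a hypothetical name, and it stops at the first run that fails.

```shell
# Hypothetical helper: run iRAP once per "mapper quant_method" pair.
run_method_sets() {
    for pair in "hisat2 htseq2" "none kallisto"; do
        mapper=${pair% *}    # text before the space
        quant=${pair#* }     # text after the space
        irap conf=ecoli_example.conf mapper="$mapper" quant_method="$quant" \
             de_method=deseq2 max_threads=8 || return 1
    done
}
```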