Premade STAR index download page is unavailable. #867

Closed
mranjan1 opened this issue Mar 25, 2020 · 10 comments
Labels
question, resolved (problem or issue that has been resolved)

Comments

@mranjan1

I'm constantly getting a "Gateway time out" error when I try to access http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/

Is there anyone else having the same problem?

Is there any other online repository where I can download pre-built STAR indices from?

@alexdobin
Owner

Hi Manish,

we have been having network problems at the lab for a few days. Our IT team is working to resolve them. For now, there is nowhere else to download the files from. I think it's best to generate the indexes yourself, as I have not updated the pre-generated genomes for a long time.

Cheers
Alex

@mranjan1
Author

mranjan1 commented Apr 5, 2020

Thank you Alex. I had a 'minimum hardware requirement' issue since I am unable to access my HPCC - but I built the index on AWS for now.

Best,
Manish

@alexdobin added the "resolved" label (problem or issue that has been resolved) on Apr 10, 2020
@jolespin

jolespin commented Jan 16, 2022

  1. Is this genome/index still the preferred pre-built human STAR index?

  2. If you were to build this from the most recent version on NCBI, GCA_000001405.28_GRCh38.p13,
    would you just use the following files:

With this command?

STAR --runThreadN 4 --runMode genomeGenerate --genomeSAindexNbases 12 --genomeDir ./ --genomeFastaFiles ${GENOME} --sjdbOverhang 99 --sjdbGTFfile ${GTF} --limitGenomeGenerateRAM 15000000000 --genomeSAsparseD 3 --limitIObufferSize 50000000 --limitSjdbInsertNsj 383200

  3. Is the no_alt_analysis_set preferred over the primary assembly?

This UCSC thread mentions:

The no_alt_analysis_set is the one most likely to be relevant for most aligners. It removes alternate alleles. Most aligners cannot yet use alternate alleles.

Edit: I got this error trying to reproduce the index command in [2]:

EXITING because of FATAL input ERROR: --limitIObufferSize requires 2 numbers since 2.7.9a.
SOLUTION: specify 2 numbers in --limitIObufferSize : size of input and output buffers in bytes.

Jan 16 01:59:57 ...... FATAL ERROR, exiting

I'm running this version:

STAR --version
2.7.10a
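
For readers who hit the same error: going by the message above, STAR 2.7.9a and later expect two values for --limitIObufferSize, the input buffer size followed by the output buffer size, both in bytes. The sketch below is only an illustration. The variable names are made up, the input value of 30000000 is an assumption, and the output value reuses the 50000000 from the original command. It prints the adjusted command rather than running it.

```shell
# Assumed values: STAR 2.7.9a and later expect two sizes in bytes,
# the input buffer first and the output buffer second.
IO_BUFFER_IN=30000000    # assumption, not taken from the thread
IO_BUFFER_OUT=50000000   # the single value used in the original command
LIMIT_IO_ARGS="--limitIObufferSize ${IO_BUFFER_IN} ${IO_BUFFER_OUT}"

# Print the adjusted command for inspection instead of running it.
# GENOME and GTF stay as the unexpanded placeholders from the original command.
echo "STAR --runThreadN 4 --runMode genomeGenerate --genomeSAindexNbases 12 --genomeDir ./ --genomeFastaFiles \${GENOME} --sjdbOverhang 99 --sjdbGTFfile \${GTF} --limitGenomeGenerateRAM 15000000000 --genomeSAsparseD 3 ${LIMIT_IO_ARGS} --limitSjdbInsertNsj 383200"
```

Check the STAR manual for your installed version before settling on buffer sizes, since suitable values depend on available memory and thread count.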

@alexdobin
Owner

Hi Josh,

the pre-built indexes are not supported at the moment.
It's best to build an index with the current STAR version and current annotations.

no_alt_analysis_set is indeed the right FASTA to use.
I recommend using "PRImary" FASTA and GTF from GENCODE:
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/GRCm39.primary_assembly.genome.fa.gz
https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M28/gencode.vM28.primary_assembly.annotation.gtf.gz

Cheers
Alex

@jolespin

jolespin commented Jan 17, 2022

Thank you for the links out. I'll find the human versions and get those running today:

Do you recommend any critical parameters to adjust besides --sjdbOverhang (read length minus 1)?

Edit: I'm using 151 bp long reads and this is the command I ended up using (current GENCODE version as of this post).

wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/latest_release/GRCh38.primary_assembly.genome.fa.gz
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/latest_release/gencode.v39.primary_assembly.annotation.gtf.gz

gzip -d *.gz

GENOME=GRCh38.primary_assembly.genome.fa
GTF=gencode.v39.primary_assembly.annotation.gtf

STAR --runThreadN 24 --runMode genomeGenerate --genomeSAindexNbases 12 --genomeDir . --genomeFastaFiles ${GENOME} --sjdbOverhang 150 --sjdbGTFfile ${GTF}
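
The --sjdbOverhang value in the command above follows the "read length minus 1" rule mentioned in the same comment. A minimal sketch of that arithmetic, with illustrative variable names that are not part of STAR:

```shell
READ_LENGTH=151                      # read length in bases, as stated in the comment above
SJDB_OVERHANG=$((READ_LENGTH - 1))   # read length minus 1
echo "--sjdbOverhang ${SJDB_OVERHANG}"
```

For 151 bp reads this gives 150, which matches the value passed to STAR above.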

@alexdobin
Owner

Hi Josh,

your command looks good.
There are no critical parameters, but here are some you may want to consider (from ENCODE):

--outFilterType                  BySJout    //reduces the number of "spurious" junctions
--outFilterMultimapNmax          20         //max number of multiple alignments allowed for a read: if exceeded, the read is considered unmapped
--alignSJoverhangMin             8          //min overhang for unannotated junctions
--alignSJDBoverhangMin           1          //min overhang for annotated junctions
--outFilterMismatchNmax          999        //max number of mismatches per pair (absolute)
--outFilterMismatchNoverLmax     0.06       //max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.06*200=12 for the paired read
--alignIntronMin                 20         //min intron
--alignIntronMax                 1000000    //max intron
--alignMatesGapMax               1000000    //max genomic distance between pairs

Cheers
Alex
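
The options listed above apply at the mapping step, not at index generation. As a rough sketch of how they might be passed to a paired-end alignment run: the index path, FASTQ file names and thread count below are hypothetical placeholders, and the command only runs if STAR and those files are actually present. Otherwise it just prints the command. The last lines reproduce the mismatch arithmetic from the comment above for 2x100 b reads.

```shell
GENOME_DIR=./star_index      # hypothetical path to the generated index directory
READ1=sample_R1.fastq.gz     # hypothetical paired-end FASTQ files
READ2=sample_R2.fastq.gz

# ENCODE-style options quoted in the comment above, joined into one line.
ENCODE_OPTS="--outFilterType BySJout \
--outFilterMultimapNmax 20 \
--alignSJoverhangMin 8 \
--alignSJDBoverhangMin 1 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.06 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignMatesGapMax 1000000"

if command -v STAR >/dev/null 2>&1 && [ -d "$GENOME_DIR" ] && [ -f "$READ1" ] && [ -f "$READ2" ]; then
  # ENCODE_OPTS is left unquoted on purpose so it splits into separate arguments.
  STAR --runThreadN 8 --genomeDir "$GENOME_DIR" \
       --readFilesIn "$READ1" "$READ2" --readFilesCommand zcat $ENCODE_OPTS
else
  echo "STAR --runThreadN 8 --genomeDir $GENOME_DIR --readFilesIn $READ1 $READ2 --readFilesCommand zcat $ENCODE_OPTS"
fi

# Mismatch limit from the comment above: 0.06 * 200 bases for a 2x100 b pair.
PAIR_LENGTH=200
MAX_MISMATCHES=$((6 * PAIR_LENGTH / 100))
echo "max mismatches per pair: ${MAX_MISMATCHES}"
```

--readFilesCommand zcat is only needed for gzip-compressed FASTQ input. Drop it for uncompressed files.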

@annamariabugaj

I would like to download the prebuilt human genome index, but I am not sure how to do this or what each of the files is. Could someone please explain how to download it from this website? https://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/Human/GRCh38_Ensembl99_sparseD3_sjdbOverhang99/

@jolespin

jolespin commented Sep 7, 2022

IIRC you need most (or all) of that directory. The index is a directory containing the genome files STAR needs at run time, so when you run STAR you provide the path to the directory you downloaded. That directory is the genome index you use as a reference.

@annamariabugaj

Thank you! I am a bit confused with the download - should I use wget and the whole path?

@alexdobin
Owner

Hi @BubuAalbu

presently I am not making premade indexes available. Please generate the index from the proper FASTA and GTF files.
