The Solanum lycopersicoides Genome Consortium

Solanum lycopersicoides is a wild tomato species related to the domesticated Solanum lycopersicum.
Disclaimer and data access agreement 

Solanum lycopersicoides Data Access Agreement


The Boyce Thompson Institute and RWTH Aachen University are pleased to make the pre-publication Solanum lycopersicoides LA2951 genome sequences publicly available as a resource for plant breeding and biology. By accessing these data, I hereby agree to respect the rights of the consortium to analyze and publish the first global analyses in a peer-reviewed publication according to the Toronto agreement. This includes: 1) whole chromosome or whole genome level analyses on genes, gene families, and repetitive sequences and 2) comparative analyses with other organisms. Studies that do not overlap with those planned by the consortium may be undertaken following an agreement. Please contact Dr. Bjorn Usadel or Dr. Susan Strickler to discuss such possibilities. The data may be freely downloaded and used by all who respect the restrictions described above. Any use of the S. lycopersicoides genome data prior to its publication must follow the agreement and must credit "The Solanum lycopersicoides Genome Consortium".
Assembly details 

The genome was sequenced at 90X coverage using PacBio technology and was assembled with Canu. Contigs were error-corrected with Quiver and Pilon and scaffolded with Hi-C. Gaps were filled with PacBio sequence using PBJelly.

contig total length: 1,254,763,366 bp
number of contigs: 17,507
longest contig: 3,446,189 bp
contig N50: 139,475 bp

final assembly total length: 1,269,715,057 bp
number of sequences: 3,096
final assembly N50: 93,894,821 bp
longest sequence: 133,820,975 bp
89.7% of the assembly is found in 12 scaffolds
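
For readers unfamiliar with the N50 values reported above, the following is a minimal sketch of how N50 is conventionally computed from a list of sequence lengths. The function name `n50` and the toy lengths are illustrative assumptions; this is not the consortium's own code, and the numbers are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence down, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy example with made-up lengths (hypothetical, not the real assembly).
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))
```

Applied to the contig lengths and to the final scaffold lengths separately, the same calculation would give the contig N50 and the final assembly N50, respectively.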

Annotation details 

Augustus and SNAP were trained and used as gene predictors in the MAKER pipeline. Sequence data used as input included PacBio IsoSeq; RNA-seq from leaves, fruit, and flowers; and proteins from S. lycopersicum, S. pennellii, and RefSeq. A total of 37,938 genes were predicted, of which 34,240 are located on pseudomolecules.

Available Data 
Genome browser
BLAST