NLGenomeSweeper (doi:10.15454/DS6VIK)

Document Description

Citation

Title:

NLGenomeSweeper

Identification Number:

doi:10.15454/DS6VIK

Distributor:

Portail Data INRAE

Date of Distribution:

2019-09-17

Version:

1

Bibliographic Citation:

Toda, Nicholas, 2019, "NLGenomeSweeper", https://doi.org/10.15454/DS6VIK, Portail Data INRAE, V1

Study Description

Citation

Title:

NLGenomeSweeper

Identification Number:

doi:10.15454/DS6VIK

Authoring Entity:

Toda, Nicholas (INRA - Institut National de la Recherche Agronomique)

Distributor:

Portail Data INRAE

Access Authority:

Canaguier, Aurélie

Access Authority:

Contenot, Sandrine

Access Authority:

Toda, Nicholas

Depositor:

Marquand, Elodie

Date of Deposit:

2019-09-17

Study Scope

Keywords:

Plant Breeding and Plant Products, Plant Health and Pathology, Vitis vinifera

Abstract:

NLGenomeSweeper is a command-line bash pipeline that searches a genome for NBS-LRR (NLR) disease resistance genes based on the presence of the NB-ARC domain, using the consensus sequence of the Pfam HMM profile (PF00931) and class-specific consensus sequences built from Vitis vinifera. For greater power, the pipeline can also be run with custom NB-ARC HMM consensus protein sequences built for the species of interest or a related species; these should be built separately for each type of NBS-LRR (TNLs, CNLs, NLs) and then combined into a single FASTA file. The pipeline shows high specificity for complete genes and structurally complete pseudogenes. However, it identifies candidate regions that may not represent functional genes, and it does not itself perform gene prediction. A domain identification step is also included, and the output in GFF3 format can be used for manual annotation of NLR genes. The pipeline is therefore intended primarily for identifying NLR genes in a genome where either no annotation exists or a large number of genes are expected to be missing from the annotation because of repeat masking and difficulties in annotation, which may be the case for many genomes. (2019-08-26)
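
The abstract mentions combining per-class consensus sequences into a single fasta file. A minimal Python sketch of that one step follows; it is not part of the deposited scripts. The function name, the file names in the usage note, and the choice to prefix each header with its class label are all assumptions made for illustration, and the README should be consulted for how the pipeline actually expects the file to be supplied.

```python
from pathlib import Path


def combine_fasta(inputs, output):
    """Concatenate several FASTA files into one.

    `inputs` maps a class label (e.g. "TNL") to a FASTA path.
    Each header is prefixed with its label so that sequences from
    different classes stay distinguishable (an illustrative choice,
    not a requirement stated by the pipeline).
    Returns the number of sequences written.
    """
    count = 0
    with open(output, "w") as out:
        for label, path in inputs.items():
            for line in Path(path).read_text().splitlines():
                line = line.strip()
                if not line:
                    continue  # drop blank lines between records
                if line.startswith(">"):
                    out.write(f">{label}_{line[1:]}\n")
                    count += 1
                else:
                    out.write(line + "\n")
    return count
```

For example, with hypothetical input files, `combine_fasta({"TNL": "tnl_consensus.fa", "CNL": "cnl_consensus.fa", "NL": "nl_consensus.fa"}, "custom_consensus.fa")` would write all three consensus sets to one file.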

Kind of Data:

Software

Methodology and Processing

Sources Statement

Data Access

Notes:

MIT License (https://spdx.org/licenses/MIT.html#licenseText)

Other Study Description Materials

Other Study-Related Materials

Label:

LICENSE

Notes:

text/plain; charset=US-ASCII

Other Study-Related Materials

Label:

README.md

Text:

README: how to use the script

Notes:

application/octet-stream

Other Study-Related Materials

Label:

source_code.tar.gz

Text:

Source scripts and example data

Notes:

application/gzip