Transposable Elements prediction and annotation in the C. elegans genome (ICPSR doi:10.15454/LQCIW0)

Document Description

Citation

Title:

Transposable Elements prediction and annotation in the C. elegans genome

Identification Number:

doi:10.15454/LQCIW0

Distributor:

Portail Data INRAE

Date of Distribution:

2020-07-03

Version:

1

Bibliographic Citation:

Kozlowski, Djampa, 2020, "Transposable Elements prediction and annotation in the C. elegans genome", https://doi.org/10.15454/LQCIW0, Portail Data INRAE, V1

Study Description

Citation

Title:

Transposable Elements prediction and annotation in the C. elegans genome

Identification Number:

doi:10.15454/LQCIW0

Authoring Entity:

Kozlowski, Djampa (INRAE/Université Côte d'Azur)

Distributor:

Portail Data INRAE

Access Authority:

Danchin, Etienne

Access Authority:

Kozlowski, Djampa

Depositor:

Kozlowski, Djampa

Date of Deposit:

2020-07-03

Study Scope

Keywords:

Biodiversity and Ecology, Omics, Plant Health and Pathology

Abstract:

Summary: contains all the essential files produced during TE prediction, annotation, and post-processing of the C. elegans genome (e.g., the TE consensus library, TE annotations, and associated statistics). It also contains the global workflow (the command lines used), the REPET configuration files (with parameters) used for this analysis, and the in-house Python script used to identify canonical TE annotations from the TE consensus library and the draft TE annotation (REPET output).

Kind of Data:

Dataset

Kind of Data:

Software

Kind of Data:

Workflow

Methodology and Processing

Sources Statement

Data Access

Notes:

CC0 Waiver

Other Study Description Materials

Other Study-Related Materials

Label:

celegA1_chr_allTEs_nr_join_path.annotStatsPerTE_FullLengthCopy.fa

Text:

Cleaned TE consensus sequence library (REPET analysis: TEannot plus automated cleaning with one loose round of TEannot). Used for TE annotation.

Notes:

application/octet-stream
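
The consensus library is a plain FASTA file, so it can be inspected without REPET. The minimal sketch below, in pure Python, reads the library and maps each consensus name (taken, as an assumption, to be the first word of its header) to its length in bp. The function names are hypothetical and not part of the dataset; Biopython's SeqIO.parse(path, "fasta"), already a dependency of REPETpostAnal, would do the same job.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def consensus_lengths(path):
    """Map each consensus name (first word of the header) to its length in bp."""
    with open(path) as handle:
        return {h.split()[0]: len(s) for h, s in read_fasta(handle)}
```

For example, consensus_lengths("celegA1_chr_allTEs_nr_join_path.annotStatsPerTE_FullLengthCopy.fa") returns a dictionary whose size is the number of consensus sequences in the library.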

Other Study-Related Materials

Label:

c_elegans.PRJNA13758.WS270.genomic.REPET.TElib.draftAnnot.10-01-20.sorted.gff3

Text:

Draft TE annotation (REPET output; GFF3 format).

Notes:

application/octet-stream
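
GFF3 is a nine-column, tab-separated format with 1-based, inclusive coordinates and ';'-separated key=value attributes. As a hedged sketch, assuming the draft annotation follows the standard GFF3 layout (the attribute keys REPET actually writes are not documented here), one feature line can be parsed as follows; the helper names are hypothetical.

```python
from urllib.parse import unquote

# The nine columns defined by the GFF3 specification
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict; return None for comments/blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None  # directives (##...) and comments carry no feature
    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    rec = dict(zip(GFF3_COLUMNS, fields))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
    attrs = {}
    for item in fields[8].split(";"):
        if item:
            key, _, value = item.partition("=")
            attrs[key] = unquote(value)  # undo percent-escaping such as %3B
    rec["attributes"] = attrs
    return rec


def feature_length(rec):
    """Length in bp; GFF3 coordinates are 1-based and inclusive."""
    return rec["end"] - rec["start"] + 1
```

A file ending with a ##FASTA section would need the reading loop to stop at that directive, since the lines after it are sequence, not features.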

Other Study-Related Materials

Label:

celegans.TEannotation.10-01-20.filtered.classiffiedOnly.minLen250.minId85.minConsCov33.blastVsCons.noOverlap.bed

Text:

C. elegans canonical TE annotation file (BED format). Extracted from the draft TE annotation using REPETpostAnal-V1.0.5.py.

Notes:

application/vnd.realvnc.bed
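
BED coordinates are 0-based and half-open, so an interval's length is simply end minus start. The sketch below sums annotated bp per chromosome from the canonical annotation file, reading only the three mandatory BED columns. It assumes, from the noOverlap tag in the filename, that intervals do not overlap; otherwise the sums would double-count shared bases, and merging intervals first (e.g. with bedtools merge) would be needed. The function name is hypothetical.

```python
from collections import defaultdict


def bed_bp_per_chrom(lines):
    """Sum interval lengths per chromosome from BED lines (0-based, half-open)."""
    totals = defaultdict(int)
    for line in lines:
        # Skip blank lines, comments and UCSC track/browser header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        totals[chrom] += int(end) - int(start)
    return dict(totals)
```

Used as `with open(path) as fh: bed_bp_per_chrom(fh)`, where path is the canonical .bed file listed above.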

Other Study-Related Materials

Label:

C.elegans_TEprediction_and_annotation_with_REPET.Rmd

Text:

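R Markdown document describing the global workflow (command lines used) for TE prediction and annotation in the C. elegans genome with REPET.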

Notes:

text/x-r-markdown

Other Study-Related Materials

Label:

finalAnnot.perConsensusStats.txt

Text:

Per-TE-consensus statistics, e.g., number of copies, TE class and order, and copy length/identity statistics. Tab-separated file.

Notes:

text/plain

Other Study-Related Materials

Label:

finalAnnot.perCopyStats.txt

Text:

Per-TE-copy statistics, e.g., linked TE consensus and copy length/identity statistics. Tab-separated file.

Notes:

text/plain
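
Both statistics tables are tab-separated, so Python's csv module (or pandas.read_csv(path, sep="\t")) can load them. The column headers are not documented in this codebook, so the consensus_col argument below is a hypothetical placeholder to be replaced by the actual header of the per-copy table; the sketch also assumes each file starts with a header row.

```python
import csv
from collections import Counter


def read_tsv(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def copies_per_consensus(rows, consensus_col):
    """Count copies per linked consensus; consensus_col is a hypothetical column name."""
    return Counter(row[consensus_col] for row in rows)
```

For example, `with open("finalAnnot.perCopyStats.txt", newline="") as fh: rows = read_tsv(fh)` loads the per-copy table, after which the counts can be cross-checked against the copy numbers reported in finalAnnot.perConsensusStats.txt.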

Other Study-Related Materials

Label:

logs_REPETpostAnal-V1.0.5_celegans.TEannotation.10-01-20.txt

Text:

Logs from the C. elegans TE annotation post-processing with the REPETpostAnal program. Summarizes annotation statistics for each post-processing step.

Notes:

text/plain

Other Study-Related Materials

Label:

REPETpostAnal-V1.0.5.py

Text:

In-house Python (>= 3) script used to parse the REPET draft annotation (repeatome) and isolate canonical TE annotations. Requires as input: i) the REPET draft TE annotation output (.gff3), ii) the TE consensus library used for the annotation, and iii) the genome FASTA file. Default parameters are set for stringent post-processing filtering, yielding canonical TE annotations. External dependencies: bedtools (>= 2), BLAST+, bash. Required Python libraries: subprocess, os, sys, re, pandas, argparse, numpy, Bio (SeqIO).

Notes:

text/x-python
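
The internals of REPETpostAnal are not described here beyond its inputs and dependencies. Purely as an illustration of threshold-based filtering, the sketch below keeps copies that are classified and meet minimum length, identity and consensus-coverage thresholds. The default values 250, 85 and 33 are an assumption read off the canonical BED filename (minLen250, minId85, minConsCov33), and the TECopy record is hypothetical. This is not the script's actual implementation, which also relies on BLAST+ and bedtools.

```python
from dataclasses import dataclass


@dataclass
class TECopy:
    # Hypothetical record: fields are illustrative, not the script's data model
    chrom: str
    start: int          # 0-based, half-open (BED-style)
    end: int
    consensus: str
    identity: float     # assumed: percent identity to the consensus
    cons_cov: float     # assumed: percent of the consensus length covered
    classified: bool    # assumed: True if the consensus has a TE class


def passes_filters(c, min_len=250, min_id=85.0, min_cons_cov=33.0):
    """Return True if a copy meets every threshold (defaults assumed from the filename)."""
    return (c.classified
            and (c.end - c.start) >= min_len
            and c.identity >= min_id
            and c.cons_cov >= min_cons_cov)


def filter_copies(copies, **thresholds):
    """Keep only the copies passing all filters; thresholds override the defaults."""
    return [c for c in copies if passes_filters(c, **thresholds)]
```

Exposing the thresholds as keyword arguments mirrors the idea, stated above, that the defaults are stringent and can be relaxed.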

Other Study-Related Materials

Label:

TEannot.cfg

Text:

TEannot pipeline configuration file with parameter values. Mandatory for TEannot pipeline execution. For more information, see: https://urgi.versailles.inra.fr/Tools/REPET and https://urgi.versailles.inra.fr/Tools/REPET/TEannottuto

Notes:

application/octet-stream

Other Study-Related Materials

Label:

TEdenovo.cfg

Text:

TEdenovo pipeline configuration file with parameter values. Mandatory for TEdenovo pipeline execution. For more information, see: https://urgi.versailles.inra.fr/Tools/REPET and https://urgi.versailles.inra.fr/Tools/REPET/TEdenovotuto

Notes:

application/octet-stream
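
The two .cfg files can be inspected programmatically. Assuming they use INI-style syntax ([section] headers followed by key: value lines), Python's configparser can read them. The sketch disables interpolation so that any '%' characters in values are kept literally, and preserves the case of keys. The function name is hypothetical, and the URGI documentation linked in the TEannot.cfg and TEdenovo.cfg entries remains the reference for what each parameter means.

```python
import configparser


def parse_repet_cfg(text):
    """Parse INI-style config text into {section: {key: value}} (syntax assumed)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep keys exactly as written instead of lowercasing
    cfg.read_string(text)
    return {section: dict(cfg[section]) for section in cfg.sections()}
```

For example, `parse_repet_cfg(open("TEannot.cfg").read())` returns one dictionary per section, which makes it easy to compare the parameters of the two pipelines.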