Genotypic data and genetic map for Syrah x Grenache bi-parental grapevine population (doi:10.15454/QEDX2V)


Document Description

Citation

Title:

Genotypic data and genetic map for Syrah x Grenache bi-parental grapevine population

Identification Number:

doi:10.15454/QEDX2V

Distributor:

Portail Data INRAE

Date of Distribution:

2021-05-25

Version:

1

Bibliographic Citation:

Brault, Charlotte; Flutre, Timothée; Launay, Amandine; Doligez, Agnès; Chouiki, Hajar; Sarah, Gautier; Mournet, Pierre, 2021, "Genotypic data and genetic map for Syrah x Grenache bi-parental grapevine population", https://doi.org/10.15454/QEDX2V, Portail Data INRAE, V1, UNF:6:L5hd7gugVIod43JkBypJvQ== [fileUNF]

Study Description

Citation

Title:

Genotypic data and genetic map for Syrah x Grenache bi-parental grapevine population

Identification Number:

doi:10.15454/QEDX2V

Authoring Entity:

Brault, Charlotte (INRAE - Institut national de recherche pour l'agriculture, l'alimentation et l'environnement)

Flutre, Timothée (INRAE - Institut national de recherche pour l'agriculture, l'alimentation et l'environnement)

Launay, Amandine (INRAE - Institut national de recherche pour l'agriculture, l'alimentation et l'environnement)

Doligez, Agnès (INRAE - Institut national de recherche pour l'agriculture, l'alimentation et l'environnement)

Chouiki, Hajar (INRAE - Institut national de recherche pour l'agriculture, l'alimentation et l'environnement)

Sarah, Gautier (INRAE - Institut national de recherche pour l'agriculture, l'alimentation et l'environnement)

Mournet, Pierre (CIRAD)

Software used in Production:

vcftools

Software used in Production:

R

Software used in Production:

Lep-MAP3

Grant Number:

SelVi

Grant Number:

FruitSelGen

Distributor:

Portail Data INRAE

Access Authority:

Doligez, Agnès

Depositor:

Brault, Charlotte

Date of Deposit:

2020-09-29

Study Scope

Keywords:

Genetic Resource, Omics, Plant Breeding and Plant Products, grapevine, Vitis vinifera, SNP, Genotyping-by-sequencing, genetic map

Abstract:

This dataset, intended for QTL detection and genomic prediction, includes: 1) the SNP genotyping data, in vcf format, obtained by GBS in a grapevine bi-parental population, and the script for formatting the data before building the genetic map; 2) the script and custom R functions used for building the map; 3) the script for checking map quality; 4) the consensus genetic map generated from these data, both in R/qtl format and recoded into additive gene doses. The population is a pseudo-F1 progeny of 188 Vitis vinifera L. offspring from a reciprocal cross between the cultivars Syrah and Grenache, made at Vassal (French National Grapevine Germplasm Collection, Domaine de Vassal, 34340 Marseillan-Plage, France) in 1995. More details about this Syrah x Grenache (SxG) progeny can be found in Adam-Blondon et al. (2004) and Fournier-Level et al. (2009). SNP genotyping data are provided as the vcf file obtained after SNP calling and filtering (see details below). All scripts are given as Rmd files including R and bash commands. Genetic maps were built with Carthagene and Lep-MAP3, and checked with custom R scripts and R/qtl.

Kind of Data:

Dataset

Methodology and Processing

Sources Statement

Data Access

Notes:

Licence Ouverte / Open Licence Version 2.0 (https://www.etalab.gouv.fr/licence-ouverte-open-licence), compatible CC BY

Other Study Description Materials

File Description--f102965

File: 1_SxG-SNP-clean-phased-markers.tab

  • Number of cases: 15434

  • No. of variables per record: 3

  • Type of File: text/tab-separated-values

Notes:

UNF:6:PGaG8BK40UWGAAWTgTFcAQ==

File Description--f102955

File: 2_conversion_table_SxG-geno-names.tab

  • Number of cases: 192

  • No. of variables per record: 2

  • Type of File: text/tab-separated-values

Notes:

UNF:6:dDXI+8UgkrMRIRi2dS7y9Q==

Variable Description

List of Variables:

Variables

marker

f102965 Location:

Variable Format: character

Notes: UNF:6:SXv1GYLL2CiDeWsAxewm7g==

seg

f102965 Location:

Variable Format: character

Notes: UNF:6:iGC8puQ1rcXndJ1waGLhTw==

phase

f102965 Location:

Variable Format: character

Notes: UNF:6:UvvteeEjJjiJyNnnhhzu4A==

geno

f102955 Location:

Variable Format: character

Notes: UNF:6:bEhwDjQrrTYib5xVkDueBQ==

clone.acc.code

f102955 Location:

Variable Format: character

Notes: UNF:6:N+LJ1eMBf6dk3QRMWkGPKw==

Other Study-Related Materials

Label:

0_ReadMe.html

Text:

This document, in HTML format, describes the input files, scripts and output files of the SNP genotypic data publication for the Syrah x Grenache bi-parental population.

Notes:

text/html

Other Study-Related Materials

Label:

0_ReadMe.md

Text:

This document, in Markdown format, describes the input files, scripts and output files of the SNP genotypic data publication for the Syrah x Grenache bi-parental population.

Notes:

text/markdown

Other Study-Related Materials

Label:

1_create_files_for_lepmap.Rmd

Text:

R script used to produce the cleaned vcf file and the pedigree file needed to run the "2_create_genetic_map_lepmap_SxG.Rmd" script. It takes as inputs the "1_FruitSelGen-grape-VITVI-PN40024-12x-v2-montpellier_SxG_raw-snps_gatk-filter_fimpute_filter-mendel_filter-segreg.vcf" and "1_SxG-SNP-clean-phased-markers.tab" files provided in this dataset.

Notes:

text/x-r-markdown

Other Study-Related Materials

Label:

1_FruitSelGen-grape-VITVI-PN40024-12x-v2-montpellier_SxG_raw-snps_gatk-filter_fimpute_filter-mendel_filter-segreg.vcf

Text:

Text file with genotypes in vcf format, for 188 offspring + 2 parents + 2 grand-parents and 17,298 SNP markers, obtained as follows. Genotyping was done by sequencing after genomic reduction, using RAD-sequencing technology with the ApeKI restriction enzyme (Elshire et al, 2011), as described in Flutre et al (2020). Keygene N.V. owns patents and patent applications protecting its Sequence Based Genotyping technologies. SNP and genotype calling were performed using GATK (DePristo et al, 2011) after sequence reads were mapped with BWA (Li, 2013) onto the PN40024 12X.v2 reference sequence (Canaguier et al, 2017). A first filtering step discarded indels, kept only bi-allelic SNPs without Mendelian violations, set to missing the SNP genotypes with fewer than 10 reads or quality below 20, and discarded SNPs with more than 30% missing data. This yielded a final number of 17,298 SNPs. Imputation was performed with FImpute (Sargolzaei et al, 2014).
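The read-depth, quality and missing-data rules described above can be illustrated with a minimal Python sketch on toy data. It is not the pipeline used to build this file: the marker names, the tuple layout and the reading of "quality" as a per-genotype score are all assumptions made for illustration.

```python
# Thresholds taken from the description above.
MIN_DEPTH = 10       # genotypes with fewer reads are set to missing
MIN_QUALITY = 20     # genotypes with lower quality are set to missing
MAX_MISSING = 0.30   # SNPs with a higher missing rate are discarded


def mask_low_confidence(calls):
    """Return genotypes, replacing low-depth or low-quality calls with None.

    `calls` is a list of (genotype, depth, quality) tuples, one per sample
    (hypothetical layout, not the vcf structure itself).
    """
    return [
        gt if dp >= MIN_DEPTH and gq >= MIN_QUALITY else None
        for gt, dp, gq in calls
    ]


def missing_rate(genotypes):
    """Fraction of samples with a missing genotype."""
    return sum(g is None for g in genotypes) / len(genotypes)


def filter_snps(snps):
    """Keep only SNPs whose missing rate does not exceed MAX_MISSING."""
    kept = {}
    for name, calls in snps.items():
        genotypes = mask_low_confidence(calls)
        if missing_rate(genotypes) <= MAX_MISSING:
            kept[name] = genotypes
    return kept


# Made-up example: the second SNP has two unreliable calls out of four.
toy = {
    "chr1_1000": [("0/1", 25, 60), ("0/0", 12, 40), ("1/1", 30, 99), ("0/1", 15, 35)],
    "chr1_2000": [("0/1", 5, 60), ("0/0", 12, 10), ("1/1", 30, 99), ("0/1", 15, 35)],
}
filtered = filter_snps(toy)
```

In this toy input, "chr1_2000" ends up with half of its genotypes missing and is therefore dropped, while "chr1_1000" is kept unchanged.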

Notes:

text/vcard

Other Study-Related Materials

Label:

2_create_genetic_map_lepmap_SxG.Rmd

Text:

R and bash script used to produce the Syrah x Grenache consensus genetic map with Lep-MAP3 (Rastas et al., 2017), from the SNP data filtered by grouping with Carthagene. It takes as inputs the cleaned vcf file and the pedigree file produced with "1_create_files_for_lepmap.Rmd". Highly distorted and non-informative markers were removed (parameters removeNonInformative=1 and dataTolerance=0.001). Markers were then separated into 19 linkage groups with the SeparateChromosomes2 function, with a LOD threshold of 25 and a minimum marker number of 100. Single markers were assigned to linkage groups with the JoinSingles2All function. Markers assigned to the wrong linkage group according to their physical position were removed (7 markers). We applied the ordering function OrderMarkers2 5 or 10 times for each linkage group and kept the replicate with the largest likelihood. We performed a local polynomial regression (loess) of the genetic distance on the physical distance (span=0.8) to detect and remove markers with outlying genetic positions, e.g., markers far from others and not in the sigmoid shape (+/- 40 SE). These ordering and outlier detection steps were run 3 times (bash scripts "map_step1.sh", "map_step2.sh" and "map_step3.sh" given below). Duplicated markers with the same segregation were removed and a last ordering step was done (see bash script "map_step4.sh" given below) in order to produce the "SxG_GBS_lepmap_raw.map" file (linkage group names then need to be added in order to generate "SxG_GBS_lepmap.map") and the "SxG_GBS_lepmap.loc" file, with 3,964 markers. A "SxG_GBS.qua" file (given below) with a single fake phenotype was also produced for use in "3_analyse_map_quality.Rmd".
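Two of the steps described above, removing markers assigned to the wrong linkage group and keeping the ordering replicate with the largest likelihood, can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of the Rmd script: marker names are taken to follow the chromosome_physical-position pattern, the expected chromosome of a linkage group is taken to be the most frequent one among its markers, and the dictionary layouts and likelihood values are hypothetical.

```python
from collections import Counter


def chromosome_of(marker):
    """Extract the chromosome part of a 'chromosome_position' marker name."""
    return marker.rsplit("_", 1)[0]


def drop_misassigned(linkage_groups):
    """Remove markers whose physical chromosome differs from the majority
    chromosome of their linkage group (assumed rule, for illustration)."""
    cleaned = {}
    for lg, markers in linkage_groups.items():
        counts = Counter(chromosome_of(m) for m in markers)
        majority, _ = counts.most_common(1)[0]
        cleaned[lg] = [m for m in markers if chromosome_of(m) == majority]
    return cleaned


def best_replicate(replicates):
    """Return the ordering replicate with the largest log-likelihood."""
    return max(replicates, key=lambda r: r["log_likelihood"])


# Made-up linkage group containing one marker from another chromosome.
groups = {"LG1": ["chr1_100", "chr1_200", "chr7_300", "chr1_400"]}
cleaned = drop_misassigned(groups)

# Made-up ordering replicates for that linkage group.
replicates = [
    {"run": 1, "log_likelihood": -1520.4},
    {"run": 2, "log_likelihood": -1498.7},
    {"run": 3, "log_likelihood": -1503.2},
]
best = best_replicate(replicates)
```

Here the marker "chr7_300" is dropped from LG1 and run 2 is retained, since its log-likelihood is the least negative of the three.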

Notes:

text/x-r-markdown

Other Study-Related Materials

Label:

2_genetic_mapping_functions.R

Text:

This R script contains three useful functions for genetic map building. The first one is "GenoClass2GeneDose", which converts genotypic classes in "abxcd" format into gene doses. The second one is "GeneMappingOutlierFiltering", which detects outlying markers according to their physical and genetic positions. The third one is "flip_chromosome", which returns the chromosome direction. These functions are used in "2_create_genetic_map_lepmap_SxG.Rmd".

Notes:

text/x-r-source

Other Study-Related Materials

Label:

3_analyse_map_quality.Rmd

Text:

R script that we used to produce different statistical tests and diagnostic plots to check map quality. It takes as input the 3 files created with "2_create_genetic_map_lepmap_SxG.Rmd" ("SxG_GBS_lepmap.loc", "SxG_GBS_lepmap.map", "SxG_GBS.qua"). Map quality was analyzed with R/qtl functions (Broman et al. 2003): droponemarker for dropping one marker at a time and rf for plotting two-point recombination fractions and LOD scores. Genotyping error rate was also estimated. We had to remove three outlier markers by hand because they were on the Unknown chromosome. The resulting map had 3,961 fully-informative markers (abxcd segregation) without missing data. This script produces 2 output files: "SxG_GBS.loc" and "SxG_GBS_lepmap_v2.map" (which also need to be modified by hand in order to generate "SxG_GBS.map").

Notes:

text/x-r-markdown

Other Study-Related Materials

Label:

4_SxG_GBS_genedose.tsv

Text:

Text file with genotypes coded in biallelic additive gene doses (0,1,2) for the 188 offspring and the 3,961 markers mapped on the consensus genetic map. The 3,961 fully-informative markers (abxcd segregation) without missing data from the "SxG_GBS.loc" file were numerically recoded into biallelic additive doses according to the initial biallelic segregation (in the vcf file) and to the phase (available in "1_SxG-SNP-clean-phased-markers.tab"), with the "3_analyse_map_quality.Rmd" script.
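The idea of recoding an "abxcd" genotypic class into an additive gene dose can be sketched in Python as below. This is only an illustration and not the "GenoClass2GeneDose" function shipped with the dataset: it assumes that, for each marker, one already knows which of the four parental alleles (a, b from one parent, c, d from the other) carry the allele being counted, and the `allele_is_alt` mapping is a hypothetical stand-in for the information derived from the vcf segregation and the phase.

```python
def genotype_to_dose(genotype, allele_is_alt):
    """Count how many of the two inherited parental alleles carry the
    counted (e.g. alternate) allele, giving a dose of 0, 1 or 2."""
    return sum(allele_is_alt[allele] for allele in genotype)


def recode_marker(genotypes, allele_is_alt):
    """Recode all offspring genotypes at one marker into gene doses."""
    return [genotype_to_dose(g, allele_is_alt) for g in genotypes]


# Hypothetical marker heterozygous in both parents:
# alleles b and d carry the counted allele.
both_het = {"a": 0, "b": 1, "c": 0, "d": 1}

# Hypothetical marker heterozygous in the first parent only:
# only allele b carries the counted allele.
one_het = {"a": 0, "b": 1, "c": 0, "d": 0}

offspring = ["ac", "ad", "bc", "bd"]
doses_both = recode_marker(offspring, both_het)
doses_one = recode_marker(offspring, one_het)
```

With these toy mappings, a marker heterozygous in both parents yields all three doses (0, 1, 2), whereas a marker heterozygous in a single parent yields only doses 0 and 1.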

Notes:

text/tsv

Other Study-Related Materials

Label:

4_SxG_GBS.loc

Text:

Text file with genotypes in Joinmap format, for the 188 offspring and the 3,961 markers mapped on the Syrah x Grenache consensus genetic map produced with "2_create_genetic_map_lepmap_SxG.Rmd" and checked with "3_analyse_map_quality.Rmd". Segregation type is "abxcd" for all markers. The first column is the marker name (formatted as chromosome_physical-position), the second one is the segregation type, the third one is the phase, and the remaining columns are the genotypic classes at this marker for each offspring. Offspring are ordered according to the "SxG_GBS.qua" file.
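The column layout described above can be read with a few lines of Python. The sketch below is a hedged illustration: it assumes whitespace-separated columns and that the segregation type and phase may be wrapped in "<>" and "{}" delimiters, and the example line is made up rather than taken from the file.

```python
def parse_loc_line(line):
    """Split one marker line into its name, segregation type, phase and
    per-offspring genotypic classes (assumed whitespace-separated)."""
    fields = line.split()
    marker, segregation, phase = fields[0], fields[1], fields[2]
    genotypes = fields[3:]
    # Marker names are formatted as chromosome_physical-position.
    chromosome, position = marker.rsplit("_", 1)
    return {
        "marker": marker,
        "chromosome": chromosome,
        "position": int(position),
        "segregation": segregation.strip("<>"),
        "phase": phase.strip("{}"),
        "genotypes": genotypes,
    }


# Made-up line with four offspring instead of 188.
record = parse_loc_line("chr1_12345 <abxcd> {00} ac ad bc bd")
```

Splitting the marker name on its last underscore keeps the chromosome label intact even if it contains underscores itself.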

Notes:

application/octet-stream

Other Study-Related Materials

Label:

4_SxG_GBS.map

Text:

Text file in Joinmap format with the Syrah x Grenache consensus genetic map comprised of 3,961 markers, produced with "2_create_genetic_map_lepmap_SxG.Rmd" and checked with "3_analyse_map_quality.Rmd".

Notes:

application/octet-stream

Other Study-Related Materials

Label:

4_SxG_GBS.qua

Text:

Text file with a single fake phenotype in Joinmap format, produced with "2_create_genetic_map_lepmap_SxG.Rmd" for running R/qtl.

Notes:

application/octet-stream

Other Study-Related Materials

Label:

map_step1.sh

Text:

Bash script used for submitting the job on the SGE cluster with the "OrderMarkers2" function from Lep-MAP3, during the "2_create_genetic_map_lepmap_SxG.Rmd" script. It performed the first ordering of markers, computed for each chromosome separately and repeated 5 times. Markers which were incorrectly assigned to a chromosome were removed. Then, within each chromosome, the marker order with the highest likelihood was kept.

Notes:

application/x-shellscript

Other Study-Related Materials

Label:

map_step2.sh

Text:

Bash script used for submitting the job on the SGE cluster with the "OrderMarkers2" function from Lep-MAP3, during the "2_create_genetic_map_lepmap_SxG.Rmd" script. It performed the second ordering of markers, computed for each chromosome separately and repeated 5 times. Markers which were incorrectly assigned to a chromosome and outliers from the first filtering step were removed. Then, within each chromosome, the marker order with the highest likelihood was kept.

Notes:

application/x-shellscript

Other Study-Related Materials

Label:

map_step3.sh

Text:

Bash script used for submitting the job on the SGE cluster with the "OrderMarkers2" function from Lep-MAP3, during the "2_create_genetic_map_lepmap_SxG.Rmd" script. It performed the third ordering of markers, computed for each chromosome separately and repeated 10 times. Markers which were incorrectly assigned to a chromosome and outliers from the first and second filtering steps were removed. Then, within each chromosome, the marker order with the highest likelihood was kept.

Notes:

application/x-shellscript

Other Study-Related Materials

Label:

map_step4.sh

Text:

Bash script used for submitting the job on the SGE cluster with the "OrderMarkers2" function from Lep-MAP3, during the "2_create_genetic_map_lepmap_SxG.Rmd" script. It performed the last (fourth) ordering of markers, computed for each chromosome separately and repeated 20 times. Markers which were incorrectly assigned to a chromosome, outliers from the first and second filtering steps and markers at the same genetic position were removed. Then, within each chromosome, the marker order with the highest likelihood was kept.

Notes:

application/x-shellscript