Title: | Identifying Unique Multilocus Genotypes where Genotyping Error and Missing Data may be Present |
---|---|
Description: | Tools for the identification of unique multilocus genotypes when both genotyping error and missing data may be present; targeted for use with large datasets and databases containing multiple samples of each individual (a common situation in conservation genetics, particularly in non-invasive wildlife sampling applications). Functions explicitly incorporate missing data and can tolerate allele mismatches created by genotyping error. If you use this package, please cite the original publication in Molecular Ecology Resources (Galpern et al., 2012), the details for which can be generated using citation('allelematch'). For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>. |
Authors: | Paul Galpern <[email protected]> |
Maintainer: | Todd Cross <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.5.4 |
Built: | 2024-10-25 02:43:50 UTC |
Source: | https://github.com/cran/allelematch |
Package: | allelematch |
Type: | Package |
Version: | 2.5.2 |
Date: | 2023-05-18 |
License: | GPL (>= 2) |
Requires: | dynamicTreeCut |
LazyLoad: | yes |
Supplementary documentation describing the operation of the software in detail and illustrating the
use of the software using tutorials is available as a vignette.
It is installed with the package and linked from the package index help page. An online version is
also available via the Data S1 Supplementary documentation and tutorials (PDF) located at
doi:10.1111/j.1755-0998.2012.03137.x.
Simulations examining the performance of these tools have also been performed, and results are
available in the publication associated with this package. Please refer to the publication:
Galpern P, Manseau M, Hettinga P, Smith K, and Wilson P. (2012) allelematch: an R package for
identifying unique multilocus genotypes where genotype error and missing data may be present.
Molecular Ecology Resources 12:771-778.
Use citation("allelematch") for the full citation. Please also use this publication when citing the package.
An important core element of the package is dynamic tree cutting, and this is made possible via the cutreeHybrid function within the dynamicTreeCut package for R (Langfelder et al., 2008).
Paul Galpern ([email protected])
Langfelder P, Zhang B, Horvath S. (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics, 24, 719
Determines the allele frequencies for each locus in a multilocus genetic dataset by first removing missing observations. It requires an amDataset object and a map vector relating each column of the dataset to a genetic locus.
amAlleleFreq(
  amDatasetFocal,
  multilocusMap = NULL
)

## S3 method for class 'amAlleleFreq'
print(x, ...)
amDatasetFocal |
An |
multilocusMap |
Optionally, a vector of integers or strings giving the mappings onto loci for all
genotype columns in |
x |
An |
... |
Additional arguments to summary |
This function is called by amUnique.
multilocusMap is often not required, as amDataset objects will typically consist of paired columns of genotypes, where each pair is a separate locus. Where this does not hold (e.g., gender is given in only one column), a map vector must be specified.
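A map vector of this kind can also be built programmatically rather than typed by hand. The base R sketch below constructs both the integer and the string form for one single-column marker followed by four diploid loci. The variable names (nLoci, mapInt, mapChr) are illustrative assumptions and are not part of allelematch.

```r
# One single-column marker (gender) followed by four diploid loci in paired columns
nLoci <- 4

# Integer form: locus 1 is gender; loci 2..5 each occupy two adjacent columns
mapInt <- c(1, rep(seq_len(nLoci) + 1, each = 2))

# String form: the same mapping expressed with locus names
mapChr <- c("GENDER", rep(paste0("LOC", seq_len(nLoci)), each = 2))

# Both vectors have one entry per genotype column (9 in this case)
print(mapInt)
print(mapChr)
```

Either vector could then be passed as multilocusMap; the only requirement stated in the text is that there is one entry per genotype column.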
Example: amDataset consists of gender followed by 4 diploid loci in paired columns
multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5)
or equally
multilocusMap = c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2", "LOC3", "LOC3", "LOC4", "LOC4")
An amAlleleFreq object.
Paul Galpern ([email protected])
For more information and for a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
## Not run:
data("amExample5")

## Produce amDataset object
myDataset1 <- amDataset(
  amExample5,
  missingCode = "-99",
  indexColumn = 1,
  metaDataColumn = 2
)

## Usage
myAlleleFreq <- amAlleleFreq(
  myDataset1,
  multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11)
)

## Produce amDataset object, but remove gender column
myDataset2 <- amDataset(
  amExample5,
  missingCode = "-99",
  indexColumn = 1,
  metaDataColumn = 2,
  ignoreColumn = "gender"
)

## Because all columns are paired, usage is simpler
myAlleleFreq <- amAlleleFreq(myDataset2)

## End(Not run)
Performs clustering of multilocus genotypes to identify unique consensus and singleton genotypes and generates analysis output in formatted text, HTML, or CSV. These functions are usually called by amUnique. This interface remains to enable a better understanding of how amUnique operates. For more information see example.
There are three steps to this analysis: (1) identify the dissimilarity between pairs of genotypes using a metric which takes missing data into account, (2) cluster this dissimilarity matrix using a standard hierarchical agglomerative clustering approach, and (3) use a dynamic tree cutting approach to identify clusters.
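The three steps can be sketched in base R on a toy dataset. This is only an illustration under stated assumptions: the dissim() helper and its 0.5 weight for a column where one genotype is missing are hypothetical and not the package's metric, and a fixed-height cutree() stands in for the dynamic tree cut that the package performs through dynamicTreeCut.

```r
# Toy genotypes: rows are samples, columns are allele columns; "-99" is missing
geno <- rbind(
  s1 = c("101", "103", "200", "204"),
  s2 = c("101", "103", "200", "204"),
  s3 = c("101", "103", "200", "-99"),
  s4 = c("109", "111", "210", "212")
)

# Step 1: pairwise dissimilarity. A column where exactly one genotype is
# missing scores an assumed 0.5; otherwise 0 for a match, 1 for a mismatch.
dissim <- function(a, b, missingCode = "-99") {
  score <- as.numeric(a != b)
  score[xor(a == missingCode, b == missingCode)] <- 0.5
  mean(score)
}
n <- nrow(geno)
d <- matrix(0, n, n, dimnames = list(rownames(geno), rownames(geno)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- dissim(geno[i, ], geno[j, ])
  }
}

# Step 2: standard hierarchical agglomerative clustering
tree <- hclust(as.dist(d), method = "complete")

# Step 3: cut the tree (static cut used here as a simplification)
clusters <- cutree(tree, h = 0.3)
print(clusters)
```

In this toy case s1, s2 and s3 fall into one cluster and s4 into another, because s3 differs from s1 and s2 only by a missing allele.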
amCluster(
  amDatasetFocal,
  runUntilSingletons = TRUE,
  cutHeight = 0.3,
  missingMethod = 2,
  consensusMethod = 1,
  clusterMethod = "complete"
)

amHTML.amCluster(x, htmlFile = NULL, htmlCSS = amCSSForHTML())

amCSV.amCluster(x, csvFile)

## S3 method for class 'amCluster'
summary(object, html = NULL, csv = NULL, ...)
amDatasetFocal |
An |
runUntilSingletons |
When |
cutHeight |
Sets the tree cutting height using the hybrid method in the |
missingMethod |
The method used to determine the similarity of multilocus genotypes when data is missing. |
consensusMethod |
The method (an integer) used to determine the consensus multilocus genotype from a cluster
of multilocus genotypes. |
clusterMethod |
The method used by |
object , x
|
An |
htmlFile |
HTML filepath to create. |
htmlCSS |
String containing a valid cascading style sheet. |
html |
If |
csvFile , csv
|
CSV filepath to create containing only the unique genotypes determined in the clustering. |
... |
Additional arguments to |
Selecting an appropriate cutHeight parameter (also known as the d-hat criterion) is essential. Typically this function is called from amUnique, and the conversion between alleleMismatch (m-hat) and cutHeight (d-hat) will be done automatically. Selecting an appropriate value for alleleMismatch (m-hat) can be done using amUniqueProfile. See the supplementary documentation for an explanation of how these parameters are related.
runUntilSingletons=TRUE provides an efficient and reliable way to determine the unique individuals in a dataset if the dataset meets certain criteria. To understand how the clustering is thinning the dataset, run this recursion manually using runUntilSingletons=FALSE. An example is provided below.
cutHeight in practice gives the amount of dissimilarity (using the metric described in amMatrix) required for two multilocus genotypes to be declared different (also known as d-hat). The default setting for consensusMethod performs well.
consensusMethod | |
1 | Genotype with max similarity to others in the cluster is consensus (DEFAULT) |
2 | Genotype with max similarity to others in the cluster is consensus then interpolate missing alleles using mode non-missing allele in each column |
3 | Genotype with min missing data is consensus |
4 | Genotype with min missing data is consensus then interpolate missing alleles using mode non-missing allele in each column |
amCluster object or side effects: analysis summary written to an HTML file or to the console, or written to a CSV file.
There is an additional side effect of html = TRUE (or of htmlFile = NULL). If required, there is a clean up of the operating system temporary directory where AlleleMatch temporary HTML files are stored. Files that match the pattern am*.html and are older than 24 hours are deleted from this temporary directory.
Paul Galpern ([email protected])
For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
amDataset, amMatrix, amPairwise, amUnique, amUniqueProfile
## Not run:
data("amExample5")

## Produce amDataset object
myDataset <- amDataset(
  amExample5,
  missingCode = "-99",
  indexColumn = 1,
  metaDataColumn = 2,
  ignoreColumn = "gender"
)

## Usage
myCluster <- amCluster(myDataset, cutHeight = 0.2)

## Display analysis as HTML in default browser
summary.amCluster(myCluster, html = TRUE)

## Save analysis to HTML file
summary.amCluster(myCluster, html = "myCluster.htm")

## Display analysis as formatted text on the console
summary.amCluster(myCluster)

## Save unique genotypes only to a CSV file
summary.amCluster(myCluster, csv = "myCluster.csv")

## Demonstration of how amCluster operates
## Manual control over the recursion in amCluster()
summary.amCluster(
  myCluster1 <- amCluster(myDataset, runUntilSingletons = FALSE, cutHeight = 0.2),
  html = TRUE
)
summary.amCluster(
  myCluster2 <- amCluster(myCluster1$unique, runUntilSingletons = FALSE, cutHeight = 0.2),
  html = TRUE
)
summary.amCluster(
  myCluster3 <- amCluster(myCluster2$unique, runUntilSingletons = FALSE, cutHeight = 0.2),
  html = TRUE
)
summary.amCluster(
  myCluster4 <- amCluster(myCluster3$unique, runUntilSingletons = FALSE, cutHeight = 0.2),
  html = TRUE
)
## No more clusters, therefore stop.

## End(Not run)
Returns a string containing the CSS code for embedding in HTML output by amHTML.amPairwise and amHTML.amUnique.
amCSSForHTML()
This function is used internally. It can also be used as a basis to tweak the CSS code if different output formatting and colour-coding are desired. See examples.
A string containing a complete cascading style sheet.
Paul Galpern ([email protected])
For more information and for a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
amHTML.amPairwise, amHTML.amUnique
## Not run:
data("amExample5")

## Produce CSS file for editing
cat(amCSSForHTML(), file = "myCSS.css")

## Edit myCSS.css manually using text editor, then reimport as follows
myCSS <- paste(readLines("myCSS.css"), collapse = "\n")

## Perform an allelematch unique analysis
myUnique <- amUnique(
  amDataset(
    amExample5,
    missingCode = "-99",
    indexColumn = 1,
    metaDataColumn = 2,
    ignoreColumn = "gender"
  ),
  alleleMismatch = 3
)

## Produce HTML output using tweaked CSS
## (htmlFile is the argument name given in the amHTML.amUnique usage)
amHTML.amUnique(myUnique, htmlFile = "myUnique.htm", htmlCSS = myCSS)

## End(Not run)
Given an input matrix or data.frame, produce an amDataset object suitable for use with other allelematch functions.
amDataset(
  multilocusDataset,
  missingCode = "-99",
  indexColumn = NULL,
  metaDataColumn = NULL,
  ignoreColumn = NULL
)

## S3 method for class 'amDataset'
print(x, ...)
multilocusDataset |
A |
missingCode |
A character string giving the code used for missing data. |
indexColumn |
Optional. |
metaDataColumn |
Optional. |
ignoreColumn |
Optional. |
x |
An amDataset object. |
... |
Additional arguments to summary. |
Examine amExampleData for an example of a typical input dataset in the diploid case. (Typically these files will be the CSV output from allele calling software). Sample index or ID information and sample meta-data may be specified in two additional columns. Columns can optionally be given names, and these are carried through analyses. If column names are not given, appropriate names are produced.
Each datum is treated as a character string in allelematch functions, enabling the mixing of numeric and alphanumeric data.
The multilocus dataset can contain any number of diploid or haploid markers, and these can be in any order. Thus in the diploid case there should be two columns for each locus (named, say, locus1a and locus1b). Please note that AlleleMatch functions pay no attention to genetics. In other words, each column is considered a comparable state. Thus matching and clustering of multilocus genotypes is done on the basis of superficial similarity of the data matrix rows, rather than on any appreciation of the allelic states at each locus. See amPairwise for more discussion.
For this reason it is important when working with diploid data to ensure that identical individuals will have identical alleles in each column. This can be achieved by sorting each locus so that in each case the lower length allele appears in, say, a column "locus1a" and the higher in column "locus1b". This pattern is likely the default in allele calling software and sorting will typically not be required unless data are derived from an unusual source.
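Where data do come from a source that does not order alleles, the sorting could be done along the following lines before the data are passed to amDataset. This is a sketch only: it assumes alleles are fragment lengths stored as numeric strings, that missing data are coded "-99", and that paired columns are named with "a" and "b" suffixes. The helper sortPair() is hypothetical and not part of allelematch.

```r
# Toy diploid data: two loci, alleles stored as character strings
geno <- data.frame(
  locus1a = c("104", "100", "102"),
  locus1b = c("100", "104", "102"),
  locus2a = c("210", "-99", "206"),
  locus2b = c("206", "208", "210"),
  stringsAsFactors = FALSE
)

# Swap the two alleles of a locus wherever the first is the longer one.
# Rows with missing data at the locus are left untouched.
sortPair <- function(a, b, missingCode = "-99") {
  swap <- a != missingCode & b != missingCode & as.numeric(a) > as.numeric(b)
  list(a = ifelse(swap, b, a), b = ifelse(swap, a, b))
}

for (locus in c("locus1", "locus2")) {
  colA <- paste0(locus, "a")
  colB <- paste0(locus, "b")
  sorted <- sortPair(geno[[colA]], geno[[colB]])
  geno[[colA]] <- sorted$a
  geno[[colB]] <- sorted$b
}

print(geno)
```

After sorting, the first two samples have identical entries at locus1 even though their alleles were originally recorded in opposite order.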
Only one meta-data column is possible with allelematch. If multiple columns must be associated with a given sample for downstream analyses, try pasting them together into one string with an appropriate separator, and separating them later when allelematch analyses are concluded.
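One way to paste several meta-data columns together and split them again afterwards is sketched below in base R. The data frame, its column names, and the "|" separator are illustrative assumptions; any separator that does not occur in the meta-data values would serve.

```r
# Toy sample table with two meta-data columns
samples <- data.frame(
  sampleId = c("A1", "A2"),
  site = c("north", "south"),
  year = c("2010", "2011"),
  stringsAsFactors = FALSE
)

# Before the analysis: combine the meta-data into a single string column
samples$meta <- paste(samples$site, samples$year, sep = "|")

# After the analysis: split the combined string back into its parts
parts <- do.call(rbind, strsplit(samples$meta, "|", fixed = TRUE))
colnames(parts) <- c("site", "year")

print(samples$meta)
print(parts)
```

Using fixed = TRUE in strsplit() matters here, because "|" would otherwise be interpreted as a regular expression operator.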
An amDataset object.
Paul Galpern ([email protected])
For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
amPairwise, amUnique, amExampleData
## Not run:
data("amExample5")

## Typical usage
myDataset <- amDataset(
  amExample5,
  missingCode = "-99",
  indexColumn = 1,
  metaDataColumn = 2,
  ignoreColumn = "gender"
)

## Access elements of amDataset object
myMetaData <- myDataset$metaData
mySamplingID <- myDataset$index
myAlleles <- myDataset$multilocus

## View the structure of amDataset object
unclass(myDataset)

## End(Not run)
amExample1 | High quality data set |
amExample2 | Good quality data set |
amExample3 | Marginal quality data set |
amExample4 | Low quality data set |
amExample5 | Wildlife data set |
Data sets 1 to 4 are simulated. Because the data are simulated, the individual from which the samples are derived is known. This is given in the column knownIndividual, and permits an assessment of how effective the software has been.
Data set 5 is a real dataset from a wildlife population that has been non-invasively sampled. The meta-data has been fabricated. It represents the output from allele calling software.
data(amExample1)
data(amExample2)
data(amExample3)
data(amExample4)
data(amExample5)
Data frames with differing numbers of samples in rows, and alleles in columns. Missing data is represented as "-99".
Note how in amExample5 a single marker "Gender" has been combined with diploid loci. Also note how in all data sets in diploid loci the lower length allele always comes first. This pattern is typical in allele calling software.
Given an amDataset object find the dissimilarities between pairs of multilocus genotypes, taking missing data into account.
amMatrix( amDatasetFocal, missingMethod = 2 )
amDatasetFocal |
An |
missingMethod |
The method used to determine the similarity of multilocus genotypes when data is missing. |
This function is the behind-the-scenes workhorse of AlleleMatch, and typically will not be
called by the user.
missingMethod = 2 is the recommended value, and the default, as it has performed better in simulations. In this method, missing data matches perfectly with missing data, while missing data matches partially with non-missing data.
missingMethod = 1 is retained for experimental purposes. Here, missing data matches partially with missing and non-missing data.
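The difference between the two methods can be illustrated with a small scoring function. This is a sketch under assumptions, not the package's implementation: pairSimilarity() is a hypothetical helper, and the value 0.5 used for a "partial" match is an assumed weight chosen only to make the contrast visible.

```r
# Hypothetical scoring function. Per-column similarity is 1 for a match,
# 0 for a mismatch, and an assumed 0.5 for a partial match.
pairSimilarity <- function(a, b, missingMethod = 2, missingCode = "-99", partial = 0.5) {
  aMiss <- a == missingCode
  bMiss <- b == missingCode
  score <- as.numeric(a == b)            # plain match or mismatch
  score[xor(aMiss, bMiss)] <- partial    # missing versus non-missing
  if (missingMethod == 1) {
    score[aMiss & bMiss] <- partial      # missing versus missing is also partial
  }
  mean(score)
}

g1 <- c("101", "103", "-99", "204")
g2 <- c("101", "-99", "-99", "206")

simMethod2 <- pairSimilarity(g1, g2, missingMethod = 2)  # (1 + 0.5 + 1 + 0) / 4
simMethod1 <- pairSimilarity(g1, g2, missingMethod = 1)  # (1 + 0.5 + 0.5 + 0) / 4
print(c(method2 = simMethod2, method1 = simMethod1))
```

Under these assumptions the pair scores higher with method 2, because the column where both genotypes are missing counts as a full match rather than a partial one.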
A distance/dissimilarity matrix of S3 class amMatrix.
Paul Galpern ([email protected])
For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
## Not run:
data("amExample1")

## Produce amDataset object
myDataset <- amDataset(
  amExample1,
  missingCode = "-99",
  indexColumn = 1,
  metaDataColumn = 2
)

## Produce dissimilarity matrix
dissimMatrix <- amMatrix(myDataset)

## End(Not run)
Functions to perform a pairwise matching analysis of a multilocus genotype dataset, and review the output in formatted text or HTML. For each genotype in the focal dataset, all genotypes in the comparison dataset that match at or above a threshold matching score are returned. The matching score is also known as the s-hat criterion (see the supplementary documentation). This is determined using amMatrix.
amPairwise(
  amDatasetFocal,
  amDatasetComparison = amDatasetFocal,
  alleleMismatch = NULL,
  matchThreshold = NULL,
  missingMethod = 2
)

amHTML.amPairwise(x, htmlFile = NULL, htmlCSS = amCSSForHTML())

amCSV.amPairwise(x, csvFile)

## S3 method for class 'amPairwise'
summary(object, html = NULL, csv = NULL, ...)
amDatasetFocal |
An |
amDatasetComparison |
Optional. |
alleleMismatch |
Maximum number of mismatching alleles which will be tolerated when identifying individuals;
also known as m-hat parameter. |
matchThreshold |
Return comparison genotypes that match with the focal genotype at or above this score or similarity; also known as s-hat parameter. |
missingMethod |
Method used to determine the similarity of multilocus genotypes when data is missing. |
object , x
|
An |
htmlFile |
HTML filepath to create. |
htmlCSS |
A string containing a valid cascading style sheet. |
html |
If |
csvFile , csv
|
CSV filepath to create containing a data frame representation of the pairwise matching results. |
... |
Additional arguments to |
Pairwise matching of genotypes is a useful means to assess data quality and inspect for
genotyping errors.
matchThreshold represents the similarity between two multilocus genotypes and can be thought of as a percentage similarity (or a Hamming distance between two vectors) that has been corrected where missing data is present, such that missing data represents neither a match nor a mismatch but a "partial" match. See amMatrix for more discussion of this metric.
amPairwise object or side effects: analysis summary written to an HTML file or to the console, or written to a CSV file.
As matchThreshold is lowered, the size of the output increases rapidly. Typically analyses will not be very useful or manageable with thresholds below 0.7.
There is an additional side effect of html = TRUE (or of htmlFile = NULL). If required, there is a clean up of the operating system temporary directory where AlleleMatch temporary HTML files are stored. Files that match the pattern am*.html and are older than 24 hours are deleted from this temporary directory.
Paul Galpern ([email protected])
For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
## Not run:
data("amExample5")

## Produce amDataset object
myDataset <- amDataset(
  amExample5,
  missingCode = "-99",
  indexColumn = 1,
  metaDataColumn = 2,
  ignoreColumn = "gender"
)

## Typical usage
myPairwise <- amPairwise(myDataset, alleleMismatch = 2)

## Display analysis as HTML in default browser
summary.amPairwise(myPairwise, html = TRUE)

## Save analysis to HTML file
summary.amPairwise(myPairwise, html = "myPairwise.htm")

## Save analysis to CSV file
summary.amPairwise(myPairwise, csv = "myPairwise.csv")

## Display analysis as formatted text on the console
summary.amPairwise(myPairwise)

## Compare one dataset against a second
## Both must have same number of allele columns
## Here we create two datasets artificially from one for illustration purposes
myDatasetA <- amDataset(
  amExample5[sample(nrow(amExample5))[1:25], ],
  missingCode = "-99",
  indexColumn = 1,
  ignoreColumn = 2
)
myDatasetB <- amDataset(
  amExample5[sample(nrow(amExample5))[1:100], ],
  missingCode = "-99",
  indexColumn = 1,
  ignoreColumn = 2
)
myPairwise2 <- amPairwise(myDatasetA, myDatasetB, alleleMismatch = 3)
summary.amPairwise(myPairwise2, html = TRUE)

## End(Not run)
Identifies unique genotypes and generates analysis output in formatted text, HTML, or CSV. Samples are clustered and matched based on their dissimilarity score (see amMatrix). Also calculated is the match probability, Psib, which is the probability that a sample is a sibling of a unique genotype (and therefore not a replicate sample) given the allele frequencies in a population consisting of only the unique genotypes (Wilberg & Dreher, 2004).
amUnique(
  amDatasetFocal,
  multilocusMap = NULL,
  alleleMismatch = NULL,
  matchThreshold = NULL,
  cutHeight = NULL,
  doPsib = "missing",
  consensusMethod = 1,
  verbose = TRUE
)

amHTML.amUnique(x, htmlFile = NULL, htmlCSS = amCSSForHTML())

amCSV.amUnique(x, csvFile, uniqueOnly = FALSE)

## S3 method for class 'amUnique'
summary(object, html = NULL, csv = NULL, ...)
amDatasetFocal |
An |
multilocusMap |
Optional. |
alleleMismatch |
Optional. |
matchThreshold |
Optional. |
cutHeight |
Optional. |
doPsib |
String specifying how match probability should be calculated. |
consensusMethod |
The method (an integer) used to determine the consensus multilocus genotype from a
cluster of multilocus genotypes. |
verbose |
If |
object , x
|
An |
htmlFile |
HTML filepath to create. |
htmlCSS |
A string containing a valid cascading style sheet. |
html |
If |
csvFile , csv
|
CSV filepath to create containing a representation of the |
uniqueOnly |
If |
... |
Additional arguments to |
Only one of alleleMismatch, cutHeight, and matchThreshold can be specified, as the three parameters are related.
alleleMismatch is the most intuitive way to understand how the identification of unique genotypes proceeds. For example, a setting of alleleMismatch = 4 implies that up to four alleles may be different for multiple samples to be representatives of the same individual. In practice, however, this value is only an approximation of the amount of mismatch that may be tolerated. This is because the clustering process used to identify unique genotypes, and the subsequent matching which identifies samples that match these unique genotypes, are based on a dissimilarity metric or score (see amMatrix) that incorporates both allele mismatches and missing data. alleleMismatch is not used directly in analyses; it is converted to this dissimilarity metric as follows. cutHeight, a parameter of amCluster (which is called from this function), is set to cutHeight = alleleMismatch/(number of allele columns). matchThreshold, a parameter of amPairwise (also called from this function), is set to matchThreshold = 1 - cutHeight.
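The conversion stated above amounts to two lines of arithmetic. The helper below, mismatchToThresholds(), is hypothetical and not part of allelematch; it simply applies the two relations given in the text, with the number of allele columns supplied by the caller.

```r
# Hypothetical helper: convert alleleMismatch (m-hat) to cutHeight (d-hat)
# and matchThreshold (s-hat) using the relations stated in the text
mismatchToThresholds <- function(alleleMismatch, nAlleleColumns) {
  cutHeight <- alleleMismatch / nAlleleColumns
  list(cutHeight = cutHeight, matchThreshold = 1 - cutHeight)
}

# Example: 10 diploid loci give 20 allele columns
converted <- mismatchToThresholds(alleleMismatch = 3, nAlleleColumns = 20)
print(converted)
```

With 20 allele columns, alleleMismatch = 3 corresponds to cutHeight = 0.15 and matchThreshold = 0.85, which shows why the same alleleMismatch value implies a different tolerance in datasets with different numbers of loci.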
Selecting the appropriate value for alleleMismatch, cutHeight, or matchThreshold is an important task. Use amUniqueProfile to assist in this process. See the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
doPsib = "missing" is the default and specifies that match probability Psib should be calculated for samples that match unique genotypes and have no allele mismatches, but may differ by having missing data. doPsib = "all" specifies that Psib should be calculated for all samples that match unique genotypes. In this case, if allele mismatches occur, alleles are assumed to be missing at the mismatching loci.
multilocusMap is often not required, as amDataset objects will typically consist of paired columns of genotypes, where each pair is a separate locus. Where this does not hold (e.g., gender is in only one column), a map vector must be specified.
Example: amDataset consists of gender followed by 4 diploid loci in paired columns
multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5)
or equally
multilocusMap = c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2", "LOC3", "LOC3", "LOC4", "LOC4")
For more information on selecting consensusMethod see amCluster. The default consensusMethod = 1 is typically adequate.
amUnique object or side effects: analysis summary written to an HTML file or to the console, or written to a CSV file.
There is an additional side effect of html = TRUE (or of htmlFile = NULL). If required, there is a clean up of the operating system temporary directory where AlleleMatch temporary HTML files are stored. Files that match the pattern am*.html and are older than 24 hours are deleted from this temporary directory.
Paul Galpern ([email protected])
For a complete vignette, please access via the Data S1 Supplementary documentation and
tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
Wilberg MJ, Dreher BP (2004) GENECAP: a program for analysis of multilocus genotype data for non-invasive sampling and capture-recapture population estimation. Molecular Ecology Notes, 4, 783-785.
amDataset, amMatrix, amPairwise, amCluster, amUniqueProfile
## Not run:
data("amExample2")

## Produce amDataset object
myDataset <- amDataset(
  amExample2,
  missingCode = "-99",
  indexColumn = 1,
  ignoreColumn = 2
)

## Usage
## Optimal alleleMismatch parameter previously found using amUniqueProfile()
myUnique <- amUnique(myDataset, alleleMismatch = 3)

## Display analysis as HTML in default browser
summary.amUnique(myUnique, html = TRUE)

## Save analysis to HTML file
summary.amUnique(myUnique, html = "myUnique.htm")

## Save analysis to a CSV file
summary.amUnique(myUnique, csv = "myUnique.csv")

## Save unique genotypes only to a CSV file
summary.amUnique(myUnique, csv = "myUnique.csv", uniqueOnly = TRUE)

## Data set with gender information
data("amExample5")

## Produce amDataset object
myDataset2 <- amDataset(
  amExample5,
  missingCode = "-99",
  indexColumn = 1,
  metaDataColumn = 2
)

## Usage
## Optimal alleleMismatch parameter previously found using amUniqueProfile()
myUniqueProfile <- amUnique(
  myDataset2,
  multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11),
  alleleMismatch = 3
)

## End(Not run)
Function to automatically run amUnique at a sequence of parameter values to
determine an optimal setting, and optionally plot the result.
amUniqueProfile( amDatasetFocal, multilocusMap = NULL, alleleMismatch = NULL, matchThreshold = NULL, cutHeight = NULL, guessOptimum = TRUE, doPlot = TRUE, consensusMethod = 1, verbose = TRUE )
amDatasetFocal | An |
multilocusMap | Optionally a vector of integers or strings giving the mappings onto loci for all genotype columns in amDatasetFocal. |
alleleMismatch | A vector giving a sequence, where elements give the maximum number of mismatching alleles which will be tolerated when identifying individuals; also known as the m-hat parameter. |
matchThreshold | A vector giving a sequence, where elements give the minimum dissimilarity score which constitutes a match when identifying individuals; also known as the s-hat parameter. |
cutHeight | A vector giving a sequence, where elements give the |
doPlot | If |
guessOptimum | If |
consensusMethod | The method (an integer) used to determine the consensus multilocus genotype from a cluster of multilocus genotypes. |
verbose | If |
Selecting the appropriate value for alleleMismatch, cutHeight, or matchThreshold is an
important task. Use this function to assist in this process. Typically the optimal value
of any of these parameters is found where the number of multiple matches is minimized
(the majority of samples are similar to only one unique genotype). Usually there is a
minimum when these parameters are set to be very sensitive to differences among samples
(i.e., alleleMismatch or cutHeight are 0, matchThreshold is 1). Simulations suggest that
the next most sensitive minimum in multiple matches is the optimal value. This minimum
will often be associated with a drop in multiple matches as sensitivity drops. For more
discussion of this important step, see the Data S1 Supplementary documentation and
tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
Using guessOptimum = TRUE will attempt to estimate the location of this minimum
and add it to the profile plot. Manual assessment of this estimate using the plot is
strongly recommended.
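As a rough illustration of what "the next most sensitive minimum" means, the sketch below scans a made-up profile for the first local minimum in multiple matches after the most sensitive setting. The data frame, its column names, and its counts are hypothetical, and the search is a simple stand-in for manual inspection of the plot; it is not the estimation procedure used by guessOptimum.

```r
# Hypothetical profile summary; column names and counts are invented for
# illustration and are not the documented output of amUniqueProfile().
profile <- data.frame(
  alleleMismatch = 0:8,
  multipleMatch  = c(0, 12, 9, 2, 5, 7, 10, 14, 20)
)

counts <- profile$multipleMatch

# Skip the first (most sensitive) setting, then flag interior points that are
# lower than the previous value and no higher than the next one.
interior <- 2:(length(counts) - 1)
is_local_min <- counts[interior] < counts[interior - 1] &
  counts[interior] <= counts[interior + 1]

# Take the first such point after the most sensitive setting as the candidate.
candidate <- profile$alleleMismatch[interior[which(is_local_min)[1]]]
print(candidate)
```

With these invented counts the candidate is the alleleMismatch value at which multiple matches drop to 2 before rising again. Any value found this way should still be checked against the profile plot.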
If none of alleleMismatch, cutHeight, or matchThreshold is given, the function runs a
sequence of values for alleleMismatch as follows:
seq(from = 0, to = floor(ncol(amDatasetFocal$multilocus) * 0.4), by = 1)
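To make this default concrete, the sketch below evaluates the same expression for a hypothetical dataset of 10 diploid loci stored in paired columns. The variable n_genotype_columns is an assumption standing in for ncol(amDatasetFocal$multilocus).

```r
# Hypothetical: 10 diploid loci in paired columns = 20 genotype columns.
# In real use this value would be ncol(amDatasetFocal$multilocus).
n_genotype_columns <- 20

# Same expression as the documented default: profile from 0 up to 40% of
# the number of genotype columns, in steps of 1.
default_sequence <- seq(from = 0, to = floor(n_genotype_columns * 0.4), by = 1)
print(default_sequence)
```

For 20 genotype columns the profile would therefore cover alleleMismatch values from 0 to 8.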
multilocusMap is often not required, as amDataset objects will typically consist of
paired columns of genotypes, where each pair is a separate locus. Where this is not the
case (e.g., gender is given in only one column), a map vector must be specified.
Example: amDataset consists of gender followed by 4 diploid loci in paired columns
multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5)
or equally
multilocusMap = c("GENDER", "LOC1", "LOC1", "LOC2", "LOC2", "LOC3", "LOC3", "LOC4", "LOC4")
For more information on selecting consensusMethod see amCluster.
The default consensusMethod = 1 is typically adequate.
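Returning to multilocusMap: for layouts like the gender-plus-four-loci example above, the map vector can be generated rather than typed by hand. The following is a minimal sketch in base R; the locus names and the variables map_by_name and map_by_index are illustrative assumptions, not names used by the package.

```r
# Hypothetical layout: one gender column followed by 4 diploid loci,
# each locus stored in a pair of adjacent columns.
locus_names <- paste0("LOC", 1:4)
map_by_name <- c("GENDER", rep(locus_names, each = 2))

# Equivalent integer form: number each locus by order of first appearance.
map_by_index <- match(map_by_name, unique(map_by_name))

print(map_by_name)
print(map_by_index)
```

Either vector has one entry per genotype column (nine here), which is an easy property to check before passing the map to amUniqueProfile.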
A data.frame containing summary data from multiple runs of amUnique.
Paul Galpern ([email protected])
For a complete vignette, please access via the Data S1 Supplementary documentation and tutorials (PDF) located at <doi:10.1111/j.1755-0998.2012.03137.x>.
    ## Not run:
    data("amExample2")

    ## Produce amDataset object
    myDataset <- amDataset(
      amExample2,
      missingCode = "-99",
      indexColumn = 1,
      metaDataColumn = 2
    )

    ## Usage (uncomment)
    myUniqueProfile <- amUniqueProfile(myDataset)

    ## Data set with gender information
    data("amExample5")

    ## Produce amDataset object
    myDataset2 <- amDataset(
      amExample5,
      missingCode = "-99",
      indexColumn = 1,
      metaDataColumn = 2
    )

    ## Usage
    myUniqueProfile <- amUniqueProfile(
      myDataset2,
      multilocusMap = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11)
    )
    ## End(Not run)