Pattern recognition performance metrics

Calculate the values of precision and recall for the model and determine which of the two is higher.

Precision and recall

In pattern recognition, information retrieval, object detection and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space.

Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Both precision and recall are therefore based on relevance.

Consider a computer program for recognizing dogs (the relevant element) in a digital photograph. Upon processing a picture which contains ten cats and twelve dogs, the program identifies eight dogs. Of the eight elements identified as dogs, only five actually are dogs (true positives), while the other three are cats (false positives). Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives). The program's precision is then 5/8 (true positives / selected elements) while its recall is 5/12 (true positives / relevant elements).
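As a minimal sketch in plain Python (not code from any particular library), the counts in this example translate directly into the two ratios:

```python
# Counts from the dog/cat example above.
true_positives = 5   # dogs correctly identified as dogs
false_positives = 3  # cats incorrectly identified as dogs
false_negatives = 7  # dogs the program missed

precision = true_positives / (true_positives + false_positives)  # 5 / 8
recall = true_positives / (true_positives + false_negatives)     # 5 / 12

print(f"precision = {precision:.3f}")  # 0.625
print(f"recall    = {recall:.3f}")     # 0.417
```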

When a search engine returns 30 pages, only 20 of which are relevant, while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3, which tells us how valid the results are, while its recall is 20/60 = 1/3, which tells us how complete the results are.

Adopting a hypothesis-testing approach from statistics, in which, in this case, the null hypothesis is that a given item is irrelevant, i.e., not a dog, absence of type I and type II errors (i.e. perfect specificity and sensitivity of 100% each) corresponds respectively to perfect precision (no false positive) and perfect recall (no false negative).

More generally, recall is simply the complement of the type II error rate, i.e. one minus the type II error rate. Precision is related to the type I error rate, but in a slightly more complicated way, as it also depends upon the prior distribution of seeing a relevant vs an irrelevant item.

The above cat and dog example contained 8 − 5 = 3 type I errors (false positives) out of 10 total cats (true negatives plus false positives), for a type I error rate of 3/10, and 12 − 5 = 7 type II errors (false negatives), for a type II error rate of 7/12. Precision can be seen as a measure of quality, and recall as a measure of quantity. Higher precision means that an algorithm returns more relevant results than irrelevant ones, and high recall means that an algorithm returns most of the relevant results (whether or not irrelevant ones are also returned).
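A short plain-Python check of the relationships just described, using the same example counts: recall is one minus the type II error rate, and the type I error rate is taken over the ten actual negatives (the cats).

```python
tp, fp, fn, tn = 5, 3, 7, 7  # dog/cat example

type_i_rate = fp / (fp + tn)   # 3 / 10: false positives over actual negatives
type_ii_rate = fn / (fn + tp)  # 7 / 12: false negatives over actual positives

recall = tp / (tp + fn)
assert abs(recall - (1 - type_ii_rate)) < 1e-12  # recall = 1 - type II error rate

print(type_i_rate, type_ii_rate, recall)  # 0.3  0.583...  0.416...
```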

Introduction

In information retrieval, the instances are documents and the task is to return a set of relevant documents given a search term. Recall is the number of relevant documents retrieved by a search divided by the total number of existing relevant documents, while precision is the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.

In a classification task, the precision for a class is the number of true positives (i.e. the number of items correctly labelled as belonging to the positive class) divided by the total number of elements labelled as belonging to the positive class (i.e. the sum of true positives and false positives, which are items incorrectly labelled as belonging to the class). Recall in this context is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, which are items which were not labelled as belonging to the positive class but should have been).
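The following is a minimal pure-Python sketch of these classification-context definitions; the label values and example lists are illustrative assumptions, not data from the text.

```python
def precision_recall(y_true, y_pred, positive="dog"):
    """Compute precision and recall for one class from paired label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical example: one correct detection, one false positive, one miss.
y_true = ["dog", "cat", "dog"]
y_pred = ["dog", "dog", "cat"]
print(precision_recall(y_true, y_pred))  # (0.5, 0.5)
```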

In information retrieval, a perfect precision score of 1.0 means that every result retrieved by a search was relevant (but says nothing about whether all relevant documents were retrieved) whereas a perfect recall score of 1.0 means that all relevant documents were retrieved by the search (but says nothing about how many irrelevant documents were also retrieved).

Precision and recall are not particularly useful metrics when used in isolation. For instance, it is possible to have perfect recall by simply retrieving every single item. Likewise, it is possible to have near-perfect precision by selecting only a very small number of extremely likely items.

In a classification task, a precision score of 1.0 for a class C means that every item labelled as belonging to class C does indeed belong to class C (but says nothing about the number of items from class C that were not labelled correctly) whereas a recall of 1.0 means that every item from class C was labelled as belonging to class C (but says nothing about how many items from other classes were incorrectly also labelled as belonging to class C).

Often, there is an inverse relationship between precision and recall, where it is possible to increase one at the cost of reducing the other. Brain surgery provides an illustrative example of the tradeoff. Consider a brain surgeon removing a cancerous tumor from a patient's brain. The surgeon needs to remove all of the tumor cells since any remaining cancer cells will regenerate the tumor. Conversely, the surgeon must not remove healthy brain cells since that would leave the patient with impaired brain function. The surgeon may be more liberal in the area of the brain he removes to ensure he has extracted all the cancer cells. This decision increases recall but reduces precision. On the other hand, the surgeon may be more conservative in the brain cells he removes to ensure he extracts only cancer cells. This decision increases precision but reduces recall. That is to say, greater recall increases the chances of removing healthy cells (negative outcome) and increases the chances of removing all cancer cells (positive outcome). Greater precision decreases the chances of removing healthy cells (positive outcome) but also decreases the chances of removing all cancer cells (negative outcome).

Usually, precision and recall scores are not discussed in isolation. Instead, either values for one measure are compared for a fixed level at the other measure (e.g. precision at a recall level of 0.75) or both are combined into a single measure. Examples of measures that are a combination of precision and recall are the F-measure (the weighted harmonic mean of precision and recall), or the Matthews correlation coefficient, which is a geometric mean of the chance-corrected variants: the regression coefficients Informedness (DeltaP') and Markedness (DeltaP).[1][2] Accuracy is a weighted arithmetic mean of Precision and Inverse Precision (weighted by Bias) as well as a weighted arithmetic mean of Recall and Inverse Recall (weighted by Prevalence).[1] Inverse Precision and Inverse Recall are simply the Precision and Recall of the inverse problem where positive and negative labels are exchanged (for both real classes and prediction labels). Recall and Inverse Recall, or equivalently true positive rate and false positive rate, are frequently plotted against each other as ROC curves and provide a principled mechanism to explore operating point tradeoffs. Outside of Information Retrieval, the application of Recall, Precision and F-measure are argued to be flawed as they ignore the true negative cell of the contingency table, and they are easily manipulated by biasing the predictions.[1] The first problem is 'solved' by using Accuracy and the second problem is 'solved' by discounting the chance component and renormalizing to Cohen's kappa, but this no longer affords the opportunity to explore tradeoffs graphically. However, Informedness and Markedness are Kappa-like renormalizations of Recall and Precision,[3] and their geometric mean Matthews correlation coefficient thus acts like a debiased F-measure.
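The statement that the Matthews correlation coefficient acts like a geometric mean of informedness and markedness can be checked numerically. The sketch below uses plain Python and the counts from the earlier dog/cat example; it is an illustration, not a derivation.

```python
import math

tp, fp, fn, tn = 5, 3, 7, 7  # dog/cat example

tpr, tnr = tp / (tp + fn), tn / (tn + fp)   # recall, specificity
ppv, npv = tp / (tp + fp), tn / (tn + fn)   # precision, inverse precision

informedness = tpr + tnr - 1                # DeltaP'
markedness = ppv + npv - 1                  # DeltaP

mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

# MCC^2 equals informedness * markedness (sign aside).
print(mcc, math.sqrt(informedness * markedness))  # both ~0.1208
assert abs(mcc**2 - informedness * markedness) < 1e-12
```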

Definition (information retrieval context)

In information retrieval contexts, precision and recall are defined in terms of a set of retrieved documents (e.g. the list of documents produced by a web search engine for a query) and a set of relevant documents (e.g. the list of all documents on the internet that are relevant for a certain topic), cf. relevance.[4]

Precision

In the field of information retrieval, precision is the fraction of retrieved documents that are relevant to the query:

\text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}

For example, for a text search on a set of documents, precision is the number of correct results divided by the number of all returned results.

Precision takes all retrieved documents into account, but it can also be evaluated at a given cut-off rank, considering only the topmost results returned by the system. This measure is called precision at n or P@n.
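A minimal sketch of precision at n, assuming the system's output is available as a ranked list of binary relevance judgments (1 = relevant, 0 = not relevant); the example list is hypothetical.

```python
def precision_at_n(relevance, n):
    """Fraction of the top-n ranked results that are relevant."""
    top = relevance[:n]
    return sum(top) / len(top) if top else 0.0

# Hypothetical ranked result list for one query.
ranked_relevance = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
print(precision_at_n(ranked_relevance, 5))   # 3 of the top 5 are relevant -> 0.6
print(precision_at_n(ranked_relevance, 10))  # 4 of 10 -> 0.4
```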

Precision is used with recall, the percentage of all relevant documents that is returned by the search. The two measures are sometimes used together in the F1 score (or F-measure) to provide a single measurement for a system.

Note that the meaning and usage of "precision" in the field of information retrieval differs from the definition of accuracy and precision within other branches of science and technology.

Recall

In information retrieval, recall is the fraction of the relevant documents that are successfully retrieved.

\text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}

For example, for a text search on a set of documents, recall is the number of correct results divided by the number of results that should have been returned.

In binary classification, recall is called sensitivity. It can be viewed as the probability that a relevant document is retrieved by the query.

It is trivial to achieve recall of 100% by returning all documents in response to any query. Recall alone is therefore not enough; one also needs to measure the number of non-relevant documents retrieved, for example by computing the precision.

Connection

Precision and recall can be interpreted as (estimated) conditional probabilities: precision is given by P(C = P | Ĉ = P), while recall is given by P(Ĉ = P | C = P),[5] where Ĉ is the predicted class and C is the actual class. Both quantities are, therefore, connected by Bayes' theorem.
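As a numeric illustration of this connection (a plain-Python sketch reusing the dog/cat counts), Bayes' theorem gives precision as recall scaled by the ratio of the class prior P(C = P) to the prediction rate P(Ĉ = P):

```python
tp, fp, fn, tn = 5, 3, 7, 7   # dog/cat example
total = tp + fp + fn + tn     # 22 animals in the photograph

recall = tp / (tp + fn)           # P(C_hat = P | C = P) = 5/12
prior = (tp + fn) / total         # P(C = P)     = 12/22
pred_rate = (tp + fp) / total     # P(C_hat = P) =  8/22

precision_via_bayes = recall * prior / pred_rate
print(precision_via_bayes)        # ~0.625, i.e. 5/8, matching TP / (TP + FP)
```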

Definition (classification context)

For classification tasks, the terms true positives, true negatives, false positives, and false negatives (see Type I and type II errors for definitions) compare the results of the classifier under test with trusted external judgments. The terms positive and negative refer to the classifier's prediction (sometimes known as the expectation), and the terms true and false refer to whether that prediction corresponds to the external judgment (sometimes known as the observation).

Let us define an experiment from P positive instances and N negative instances for some condition. The four outcomes can be formulated in a 2×2 contingency table or confusion matrix, as follows:

Sources: [6][7][8][9][10][11][12][13][14]

The four outcomes for a total population of P + N instances, organized by actual condition and predicted condition:

  • Actual positive (P), predicted positive (PP): true positive (TP), hit
  • Actual positive (P), predicted negative (PN): false negative (FN), type II error, miss, underestimation
  • Actual negative (N), predicted positive (PP): false positive (FP), type I error, false alarm, overestimation
  • Actual negative (N), predicted negative (PN): true negative (TN), correct rejection

The rates and composite scores derived from these four counts (TPR, FNR, FPR, TNR, prevalence, prevalence threshold, PPV, FDR, FOR, NPV, LR+, LR−, accuracy, balanced accuracy, F1 score, Fowlkes–Mallows index, Matthews correlation coefficient, informedness, markedness, diagnostic odds ratio, and threat score) are defined in the terminology list below.
Terminology and derivations from a confusion matrix

  • condition positive (P): the number of real positive cases in the data
  • condition negative (N): the number of real negative cases in the data
  • true positive (TP): a test result that correctly indicates the presence of a condition or characteristic
  • true negative (TN): a test result that correctly indicates the absence of a condition or characteristic
  • false positive (FP): a test result which wrongly indicates that a particular condition or attribute is present
  • false negative (FN): a test result which wrongly indicates that a particular condition or attribute is absent
  • sensitivity, recall, hit rate, or true positive rate (TPR): TPR = TP/P = TP/(TP + FN) = 1 − FNR
  • specificity, selectivity, or true negative rate (TNR): TNR = TN/N = TN/(TN + FP) = 1 − FPR
  • precision or positive predictive value (PPV): PPV = TP/(TP + FP) = 1 − FDR
  • negative predictive value (NPV): NPV = TN/(TN + FN) = 1 − FOR
  • miss rate or false negative rate (FNR): FNR = FN/P = FN/(FN + TP) = 1 − TPR
  • fall-out or false positive rate (FPR): FPR = FP/N = FP/(FP + TN) = 1 − TNR
  • false discovery rate (FDR): FDR = FP/(FP + TP) = 1 − PPV
  • false omission rate (FOR): FOR = FN/(FN + TN) = 1 − NPV
  • positive likelihood ratio (LR+): LR+ = TPR/FPR
  • negative likelihood ratio (LR−): LR− = FNR/TNR
  • prevalence threshold (PT): PT = √FPR/(√TPR + √FPR)
  • threat score (TS) or critical success index (CSI): TS = TP/(TP + FN + FP)
  • prevalence: P/(P + N)
  • accuracy (ACC): ACC = (TP + TN)/(P + N) = (TP + TN)/(TP + TN + FP + FN)
  • balanced accuracy (BA): BA = (TPR + TNR)/2
  • F1 score, the harmonic mean of precision and sensitivity: F1 = 2 × PPV × TPR/(PPV + TPR) = 2TP/(2TP + FP + FN)
  • phi coefficient (φ or rφ) or Matthews correlation coefficient (MCC): MCC = (TP × TN − FP × FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN))
  • Fowlkes–Mallows index (FM): FM = √(PPV × TPR)
  • informedness or bookmaker informedness (BM): BM = TPR + TNR − 1
  • markedness (MK) or deltaP (Δp): MK = PPV + NPV − 1
  • diagnostic odds ratio (DOR): DOR = LR+/LR−

Sources: Fawcett (2006),[15] Piryonesi and El-Diraby (2020),[16] Powers (2011),[17] Ting (2011),[18] CAWCR,[19] D. Chicco & G. Jurman (2020, 2021),[20][21] Tharwat (2018).[22] Balayla (2020)[23]
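The terminology above maps directly onto code. The sketch below, in plain Python with the dog/cat counts assumed as input, derives several of the listed rates from the four confusion-matrix cells:

```python
import math

tp, fp, fn, tn = 5, 3, 7, 7       # assumed counts (dog/cat example)
p, n = tp + fn, fp + tn           # actual positives and negatives

metrics = {
    "TPR (recall)":       tp / p,
    "TNR (specificity)":  tn / n,
    "PPV (precision)":    tp / (tp + fp),
    "NPV":                tn / (tn + fn),
    "FPR (fall-out)":     fp / n,
    "FNR (miss rate)":    fn / p,
    "accuracy":           (tp + tn) / (p + n),
    "balanced accuracy":  (tp / p + tn / n) / 2,
    "F1":                 2 * tp / (2 * tp + fp + fn),
    "Fowlkes-Mallows":    math.sqrt((tp / (tp + fp)) * (tp / p)),
    "threat score (CSI)": tp / (tp + fn + fp),
}
for name, value in metrics.items():
    print(f"{name:>20}: {value:.3f}")
```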


Precision and recall are then defined as:[24]

\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}

Recall in this context is also referred to as the true positive rate or sensitivity, and precision is also referred to as positive predictive value (PPV); other related measures used in classification include true negative rate and accuracy.[24] True negative rate is also called specificity.

\text{True negative rate} = \frac{TN}{TN + FP}
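Where scikit-learn is available (an assumption, not a requirement of the definitions above), the same quantities can be cross-checked against library implementations; the label vectors here are made up for illustration.

```python
# Assumes scikit-learn is installed; labels are 1 = positive, 0 = negative.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

precision = precision_score(y_true, y_pred)              # TP / (TP + FP)
recall = recall_score(y_true, y_pred)                     # TP / (TP + FN)
specificity = recall_score(y_true, y_pred, pos_label=0)   # TN / (TN + FP), recall of the negative class

print(precision, recall, specificity)  # 0.75 0.6 0.8
```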

Imbalanced data

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

Accuracy can be a misleading metric for imbalanced data sets. Consider a sample with 95 negative and 5 positive values. Classifying all values as negative in this case gives 0.95 accuracy score. There are many metrics that don't suffer from this problem. For example, balanced accuracy[25] (bACC) normalizes true positive and true negative predictions by the number of positive and negative samples, respectively, and divides their sum by two:

\text{Balanced accuracy} = \frac{TPR + TNR}{2}

For the previous example (95 negative and 5 positive samples), classifying all as negative gives 0.5 balanced accuracy score (the maximum bACC score is one), which is equivalent to the expected value of a random guess in a balanced data set. Balanced accuracy can serve as an overall performance metric for a model, whether or not the true labels are imbalanced in the data, assuming the cost of FN is the same as FP.
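A small sketch of the effect described above: on a 95/5 sample, a classifier that labels everything negative scores high accuracy but only chance-level balanced accuracy.

```python
p, n = 5, 95                 # 5 positive, 95 negative samples
tp, fn = 0, 5                # "always negative" classifier misses every positive
tn, fp = 95, 0               # ... and correctly rejects every negative

accuracy = (tp + tn) / (p + n)       # 0.95, looks good
tpr = tp / p                         # 0.0
tnr = tn / n                         # 1.0
balanced_accuracy = (tpr + tnr) / 2  # 0.5, i.e. no better than chance

print(accuracy, balanced_accuracy)
```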

Another metric is the predicted positive condition rate (PPCR), which identifies the percentage of the total population that is flagged. For example, for a search engine that returns 30 results (retrieved documents) out of 1,000,000 documents, the PPCR is 0.003%.

\text{Predicted positive condition rate} = \frac{TP + FP}{TP + FP + TN + FN}

According to Saito and Rehmsmeier, precision-recall plots are more informative than ROC plots when evaluating binary classifiers on imbalanced data. In such scenarios, ROC plots may be visually deceptive with respect to conclusions about the reliability of classification performance.[26]

Different from the above approaches, if an imbalance scaling is applied directly by weighting the confusion matrix elements, the standard metrics definitions still apply even in the case of imbalanced datasets.[27] The weighting procedure relates the confusion matrix elements to the support set of each considered class.

Probabilistic interpretation

One can also interpret precision and recall not as ratios but as estimations of probabilities:[28]

  • Precision is the estimated probability that a document randomly selected from the pool of retrieved documents is relevant.
  • Recall is the estimated probability that a document randomly selected from the pool of relevant documents is retrieved.

Another interpretation is that precision is the average probability of relevant retrieval and recall is the average probability of complete retrieval averaged over multiple retrieval queries.

F-measure

A measure that combines precision and recall is the harmonic mean of precision and recall, the traditional F-measure or balanced F-score:

F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}

This measure is approximately the average of the two when they are close, and is more generally the harmonic mean, which, for the case of two numbers, coincides with the square of the geometric mean divided by the arithmetic mean. There are several reasons that the F-score can be criticized in particular circumstances due to its bias as an evaluation metric.[1] This is also known as the F_1 measure, because recall and precision are evenly weighted.

It is a special case of the general F_\beta measure (for non-negative real values of \beta):

F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}
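A minimal sketch of the F_\beta family in plain Python; with the precision and recall from the dog/cat example, \beta > 1 shifts the score toward recall and \beta < 1 toward precision.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

precision, recall = 5 / 8, 5 / 12       # dog/cat example
print(f_beta(precision, recall, 1.0))   # F1   ~ 0.50
print(f_beta(precision, recall, 2.0))   # F2   weights recall more,    ~ 0.45
print(f_beta(precision, recall, 0.5))   # F0.5 weights precision more, ~ 0.57
```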

Two other commonly used F measures are the F_2 measure, which weights recall higher than precision, and the F_{0.5} measure, which puts more emphasis on precision than recall.

The F-measure was derived by van Rijsbergen (1979) so that F_\beta "measures the effectiveness of retrieval with respect to a user who attaches \beta times as much importance to recall as precision". It is based on van Rijsbergen's effectiveness measure

E_\alpha = 1 - \frac{1}{\frac{\alpha}{P} + \frac{1 - \alpha}{R}},

the second term being the weighted harmonic mean of precision and recall with weights (\alpha, 1 - \alpha). Their relationship is F_\beta = 1 - E_\alpha, where \alpha = \frac{1}{1 + \beta^2}.
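The relationship F_\beta = 1 - E_\alpha with \alpha = 1/(1 + \beta^2) can be verified numerically; the following plain-Python sketch uses arbitrary example values for precision and recall.

```python
def f_beta(p, r, beta):
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

def e_alpha(p, r, alpha):
    # van Rijsbergen's effectiveness measure
    return 1 - 1 / (alpha / p + (1 - alpha) / r)

p, r = 0.625, 5 / 12            # example precision and recall
for beta in (0.5, 1.0, 2.0):
    alpha = 1 / (1 + beta ** 2)
    assert abs(f_beta(p, r, beta) - (1 - e_alpha(p, r, alpha))) < 1e-12
print("F_beta = 1 - E_alpha holds for the tested values")
```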

Limitations as goals

There are other parameters and strategies for evaluating the performance of an information retrieval system, such as the area under the ROC curve (AUC).[29]

See also

  • Uncertainty coefficient, also called proficiency
  • Sensitivity and specificity
  • Confusion matrix

References

  1. ^ a b c d Powers, David M W (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation" (PDF). Journal of Machine Learning Technologies. 2 (1): 37–63. Archived from the original (PDF) on 2019-11-14.
  2. ^ Perruchet, P.; Peereman, R. (2004). "The exploitation of distributional information in syllable processing". J. Neurolinguistics. 17 (2–3): 97–119. doi:10.1016/s0911-6044(03)00059-9. S2CID 17104364.
  3. ^ Powers, David M. W. (2012). "The Problem with Kappa". Conference of the European Chapter of the Association for Computational Linguistics (EACL2012) Joint ROBUS-UNSUP Workshop.
  4. ^ * Kent, Allen; Berry, Madeline M.; Luehrs, Jr., Fred U.; Perry, J.W. (1955). "Machine literature searching VIII. Operational criteria for designing information retrieval systems". American Documentation. 6 (2): 93. doi:10.1002/asi.5090060209.
  5. ^ Information Retrieval Models, Thomas Roelleke, ISBN 9783031023286, page 76, https://www.google.de/books/edition/Information_Retrieval_Models/YX9yEAAAQBAJ?hl=de&gbpv=1&pg=PA76&printsec=frontcover
  6. ^ Balayla, Jacques (2020). "Prevalence threshold (ϕe) and the geometry of screening curves". PLoS One. 15 (10). doi:10.1371/journal.pone.0240215.
  7. ^ Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010.
  8. ^ Piryonesi S. Madeh; El-Diraby Tamer E. (2020-03-01). "Data Analytics in Asset Management: Cost-Effective Prediction of the Pavement Condition Index". Journal of Infrastructure Systems. 26 (1): 04019036. doi:10.1061/(ASCE)IS.1943-555X.0000512.
  9. ^ Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
  10. ^ Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
  11. ^ Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
  12. ^ Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
  13. ^ Chicco D, Toetsch N, Jurman G (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 1-22. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410.
  14. ^ Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. doi:10.1016/j.aci.2018.08.003.
  15. ^ Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010.
  16. ^ Piryonesi S. Madeh; El-Diraby Tamer E. (2020-03-01). "Data Analytics in Asset Management: Cost-Effective Prediction of the Pavement Condition Index". Journal of Infrastructure Systems. 26 (1): 04019036. doi:10.1061/(ASCE)IS.1943-555X.0000512.
  17. ^ Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
  18. ^ Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
  19. ^ Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
  20. ^ Chicco D.; Jurman G. (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
  21. ^ Chicco D.; Toetsch N.; Jurman G. (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 1-22. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410.
  22. ^ Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. doi:10.1016/j.aci.2018.08.003.
  23. ^ Balayla, Jacques (2020). "Prevalence threshold (ϕe) and the geometry of screening curves". PLoS One. 15 (10). doi:10.1371/journal.pone.0240215.
  24. ^ a b Olson, David L.; and Delen, Dursun (2008); Advanced Data Mining Techniques, Springer, 1st edition (February 1, 2008), page 138, ISBN 3-540-76916-1
  25. ^ Mower, Jeffrey P. (2005-04-12). "PREP-Mt: predictive RNA editor for plant mitochondrial genes". BMC Bioinformatics. 6: 96. doi:10.1186/1471-2105-6-96. ISSN 1471-2105. PMC 1087475. PMID 15826309.
  26. ^ Saito, Takaya; Rehmsmeier, Marc (2015-03-04). Brock, Guy (ed.). "The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets". PLOS ONE. 10 (3): e0118432. Bibcode:2015PLoSO..1018432S. doi:10.1371/journal.pone.0118432. ISSN 1932-6203. PMC 4349800. PMID 25738806.
    • Suzanne Ekelund (March 2017). "Precision-recall curves – what are they and how are they used?". Acute Care Testing.
  27. ^ Tripicchio, Paolo; Camacho-Gonzalez, Gerardo; D'Avella, Salvatore (2020). "Welding defect detection: coping with artifacts in the production line". The International Journal of Advanced Manufacturing Technology. 111 (5): 1659–1669. doi:10.1007/s00170-020-06146-4. S2CID 225136860.
  28. ^ Fatih Cakir, Kun He, Xide Xia, Brian Kulis, Stan Sclaroff, Deep Metric Learning to Rank, In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
  29. ^ Zygmunt Zając. What you wanted to know about AUC. http://fastml.com/what-you-wanted-to-know-about-auc/

  • Baeza-Yates, Ricardo; Ribeiro-Neto, Berthier (1999). Modern Information Retrieval. New York, NY: ACM Press, Addison-Wesley, Seiten 75 ff. ISBN 0-201-39829-X
  • Hjørland, Birger (2010); The foundation of the concept of relevance, Journal of the American Society for Information Science and Technology, 61(2), 217-237
  • Makhoul, John; Kubala, Francis; Schwartz, Richard; and Weischedel, Ralph (1999); Performance measures for information extraction, in Proceedings of DARPA Broadcast News Workshop, Herndon, VA, February 1999
  • van Rijsbergen, Cornelis Joost "Keith" (1979); Information Retrieval, London, GB; Boston, MA: Butterworth, 2nd Edition, ISBN 0-408-70929-4

