How precisely should you read the Aenorm? A mathematical approach

Michel Vellekoop

07 May 2018

All written texts contain errors: subtle spelling errors, incorrect sentence structure, typographical errors, typesetting errors and so on. Writing a text of substantial length (e.g. a book or a newspaper) without errors is almost impossible. The best method to reduce the number of errors is proofreading, although even that does not guarantee that all errors will be discovered and corrected.

A mathematician named Polya, who was responsible for the publication of several scientific magazines, got very annoyed by these errors and decided to analyze the concept of proofreading. Asking someone to read an article and count the number of errors they encounter is not very helpful on its own. If the reader discovers few errors, then either the text contained few errors or the reader was not concentrating while proofreading.

Polya proposed the following: ask two people to proofread the article and each keep a list of the errors they encounter. It could still be the case that both readers were not concentrating and missed several errors. Hence Polya also counted the errors that both readers discovered. Denote the two proofreaders by A and B and the number of errors they discovered by a and b respectively. Let the number of errors that A and B both discovered be denoted by c, where c is at most a and at most b. The number of errors that neither A nor B discovered can then be estimated under some basic assumptions. The reader of this article is advised to pause here and think about how such an estimate could be constructed.

Suppose the total number of errors in the article is equal to n, person A has probability p of discovering any given error and person B has probability q. Moreover, assume that the events of A and B discovering an error are independent. Now the observed count c can be related to n, p and q. The article contains n errors and the probability that both A and B discover a given error equals p times q (due to the independence assumption). Thus the expected number of errors found by both readers is npq, which we set equal to the observed count:

npq = c

The probability that A discovers a given error and B does not is p(1-q). Hence the expected number of errors found by A but not by B is:

np(1-q) = a-c

Similarly, the expected number of mistakes B found and A did not equals:

nq(1-p) = b-c

We now have three equations in three unknowns, so n, p and q can be determined and used to calculate the expected number of errors missed by both proofreaders:

n(1-q)(1-p)

After some basic manipulations the following equality can be derived:

n(1-q)(1-p) = (a-c)(b-c)/c
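
One possible route to this result is sketched here as an illustration, not necessarily the derivation Polya used. Adding the first equation to the second and to the third gives np = a and nq = b. Dividing then yields p, q and n in terms of the observed counts, and substitution gives the formula above:

```latex
\begin{align*}
np &= npq + np(1-q) = c + (a - c) = a, \\
nq &= npq + nq(1-p) = c + (b - c) = b, \\
q &= \frac{npq}{np} = \frac{c}{a}, \qquad
p = \frac{npq}{nq} = \frac{c}{b}, \qquad
n = \frac{np \cdot nq}{npq} = \frac{ab}{c}, \\
n(1-p)(1-q) &= \frac{ab}{c} \cdot \frac{b-c}{b} \cdot \frac{a-c}{a}
             = \frac{(a-c)(b-c)}{c}.
\end{align*}
```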

How convenient. If for example A and B discovered 30 and 23 errors, of which 15 were in common (a = 30, b = 23 and c = 15), and you are willing to assume independence, then you could expect the article to still contain (30-15)(23-15)/15 = 8 errors.
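
The estimate is easy to automate. The following is a minimal Python sketch under the independence assumption stated above. The function names estimate_missed_errors and estimate_total_errors are made up for this illustration and do not come from the original article.

```python
def estimate_missed_errors(a: int, b: int, c: int) -> float:
    """Estimate the number of errors missed by both proofreaders.

    a: errors found by proofreader A
    b: errors found by proofreader B
    c: errors found by both A and B
    Assumes A and B find errors independently of each other.
    """
    if c <= 0:
        # With no common errors the formula divides by zero.
        raise ValueError("c must be positive for the estimate to exist")
    if c > min(a, b):
        raise ValueError("c cannot exceed a or b")
    return (a - c) * (b - c) / c


def estimate_total_errors(a: int, b: int, c: int) -> float:
    """Estimate the total number of errors n = ab/c (same assumptions)."""
    if c <= 0:
        raise ValueError("c must be positive for the estimate to exist")
    return a * b / c


if __name__ == "__main__":
    # Counts from the example in the text: a = 30, b = 23, c = 15.
    print(estimate_missed_errors(30, 23, 15))  # 8.0
    print(estimate_total_errors(30, 23, 15))   # 46.0
```

As a sanity check, the two readers together found 30 + 23 - 15 = 38 distinct errors, and 46 - 38 = 8 matches the estimate of missed errors. Note that the estimate is undefined when the readers have no errors in common (c = 0), which is why the sketch rejects that case.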

The editorial staff of the Aenorm recommends that you re-read all articles in the Aenorm and ask a friend to do the same. Do not forget to keep a list of the errors you discover, so that the variable c can be determined afterwards. Perhaps you should start by re-reading this article, since it appears its writer did not have many friends.