next up previous contents
Next: Some other statistics Up: 02jar Previous: Duality theory   Contents


The length of formal texts, costs

Probably the most interesting statistics would be the lengths of formal texts created in this project compared to the original CCL-book. These figures can be used to compute the so called the de Bruijn factor, but we run into some difficulties in deciding which measure to use.


Table II: The de Bruijn factors
Measure The CCL-book Formalized The de Bruijn factor
Characters (bytes) 312,620 3,098,460 9.91
Compressed $\approx 90,000$ 566,720 6.29
Lines 4,466 82,749 18.52
Tokens (words) 66,990 808,020 12.06


The items formalized thus far occupy 111.65 pages in the CCL-book. This material is formalized not only in the WAYBEL series but also in some other MML articles that were revised for the purpose of the CCL project. We include these other articles in our comparisons so the numbers here differ slightly from the ones given in Section [*]. We believe that these numbers tell something but the reader should be aware that a lot of material in the WAYBEL series should have been located in the YELLOW series or other MML articles. Some clean up is definitely due.

For this comparison, we assumed that the CCL-book has 40 lines per page, 70 characters per line and 15 tokens per line. These assumptions are on the conservative side as we were too lazy to do the exact counting and we do not have the CCL-book available in electronic form. Table [*] presents the numbers. Note that the formalization occupies only 566,720 bytes when compressed so the text is quite redundant; unfortunately, we cannot offer the corresponding number for the CCL-book. However, it is believed that material like the CCL-book becomes somewhere between 3 and 4 times shorter when compressed. Thus, in this table we used a compression rate of 3.5.

F. Wiedijk [60] reported some comparisons of de Bruijn factors for different types of mathematical texts and he used WAYBEL14 [21], which formalizes pages 105-108 of the CCL-book, as one of the cases. He used only the byte count and obtained the results shown in Table [*].


Table: The de Bruijn factor(s) for WAYBEL14 from [60]
  informal formal de Bruijn factor  
uncompressed 11.7K 78.4K apparent 6.7
compressed 4.0K 16.3K intrinsic 4.1
         


We cannot offer any reliable or even substantiated claims about the cost of the CCL project measured in man-years. F. Wiedijk [59] came to the conclusion that writing a MIZAR article takes approximately 1 month of work, and 1.5 months if we take into account the maintenance of MML. Applying these numbers on 57 articles in the CCL project results in somewhere between 5 and 7.5 man-years. Since no records of this type have been kept for the project, it is hard to say how accurate these numbers are. We have the impression that they are an underestimate--many articles took substantially longer than a month to complete while several were written in about a week. Many hours of effort of a general nature, especially on the part of the project leader G. Bancerek, not associated with any particular article are hard to put in numbers. Also, for many participants of the project a substantial amount of time was needed to learn the material they were supposed to formalize, and again it is hard to do accounting of such time. Certainly, this time varied significantly among the authors and there was a generally shared impression that the material in the WAYBEL series needed much more work per line than in the YELLOW series (but some of the YELLOW articles posed essential challenges). This impression becomes more apparent with the later items of the CCL-book.


next up previous contents
Next: Some other statistics Up: 02jar Previous: Duality theory   Contents
Grzegorz Bancerek 2002-07-25