The TPTP as a Subject of Research
Stephan Schulz
Technische Universität München
schulz[@]eprover.org
Introduction
The TPTP is the largest and most systematic collection of first-order
theorem proving problems. As such, it is mostly used as a tool in the
evaluation of first-order deduction systems.
However, the TPTP has first been published in November 1993, more that
15 years ago, and the 25 releases up to version 3.5.0 show a
significant change in the composition of the TPTP and the type of
problems that are represented in the library.
I suggest that it is useful to look into these developments, i.e. to
treat the TPTP not only as a tool of research, but also as a subject
of it.
Some preliminary results
Note: Problem properties are taken from the latest released version of
the problem for all instances of TPTP. This introduces a small error
in the case that a problem has been bugfixed. It also abstracts from
the elimination of explicit equality axioms from TPTP. Individual
problem features are extracted from TPTP headers, not recomputed from
scratch.
- Version is the TPTP version number.
- Problems is the number of problems in the given TPTP version.
- FOF prop is the proportion of problems in full first order format
(as opposed to CNF format).
- Axioms is the average number of input clauses or
formulas in problems.
- Atoms is the average number of atom occurrences
in problems.
- Preds is the average number of distinct
predicate symbols in problems.
- Funcs is the average number of distinct
functions symbols in problems.
Version | Problems | FOF prop | Axioms | Atoms | Preds | Funcs |
v1.0.0 | 2295 | 0.0 | 98.19 | 98.19 | 6.71 | 25.8 |
v1.1.1 | 2652 | 0.0 | 113.94 | 113.94 | 9.54 | 23.13 |
v1.1.2 | 2652 | 0.0 | 113.94 | 113.94 | 9.55 | 23.13 |
v1.1.3 | 2652 | 0.0 | 113.94 | 113.94 | 9.55 | 23.13 |
v1.2.0 | 2755 | 0.0 | 82.39 | 82.39 | 9.49 | 22.78 |
v1.2.1 | 2752 | 0.0 | 82.46 | 82.46 | 9.51 | 22.81 |
v2.0.0 | 3277 | 0.07 | 72.48 | 72.48 | 8.57 | 20.08 |
v2.1.0 | 3622 | 0.1 | 75.04 | 75.04 | 10.57 | 22.29 |
v2.2.0 | 4004 | 0.17 | 69.75 | 69.75 | 10.12 | 21.16 |
v2.2.1 | 4003 | 0.17 | 69.72 | 69.72 | 10.12 | 21.15 |
v2.3.0 | 4229 | 0.16 | 66.47 | 66.47 | 9.72 | 20.35 |
v2.4.1 | 5882 | 0.25 | 75.8 | 75.8 | 13.78 | 20.14 |
v2.5.0 | 1132 | 0.37 | 120.59 | 120.59 | 15.89 | 25.66 |
v2.6.0 | 6973 | 0.22 | 82.78 | 82.78 | 21.23 | 18.97 |
v2.7.0 | 7267 | 0.24 | 86.05 | 86.05 | 20.58 | 20.74 |
v3.0.0 | 7266 | 0.24 | 86.06 | 86.06 | 20.58 | 20.74 |
v3.0.1 | 7266 | 0.24 | 86.06 | 86.06 | 20.58 | 20.74 |
v3.1.0 | 8013 | 0.29 | 85.81 | 85.81 | 19.45 | 22.89 |
v3.1.1 | 8013 | 0.29 | 85.81 | 85.81 | 19.45 | 22.89 |
v3.2.0 | 8984 | 0.3 | 144.04 | 144.04 | 20.46 | 27.22 |
v3.3.0 | 9894 | 0.37 | 143.37 | 143.37 | 21.0 | 26.28 |
v3.4.0 | 11279 | 0.45 | 19279.78 | 19279.78 | 1504.55 | 6167.97 |
v3.4.1 | 11279 | 0.45 | 19279.78 | 19279.78 | 1504.55 | 6167.97 |
v3.4.2 | 11279 | 0.45 | 19279.78 | 19279.78 | 1504.55 | 6167.97 |
v3.5.0 | 11395 | 0.44 | 19444.82 | 19444.82 | 1517.59 | 6225.15 |
Interesting observations:
- There are several TPTP releases which seem to mostly fix bugs and
have no significant impact on the problem set. I'm sure this is
known to the maintainer, but I was not aware of this.
- The official archive of TPTP-2.5.0 is defect.
- The proportion of FOF problems increases fairly steadily and
nearly monotonously.
- Also, all size measures increase over time, although not with the
same steadiness.
- TPTP-3.4.0 has a significant influx of very large problems - I
suspect the CYC, MIZAR, and SUMO problems. If this is typical for
future applications, system designers face new challenges.
Points for discussion
- What other syntactic properties warrant such an analysis?
- There is a social-science dimension to this - in how far is the
TPTP pushing developers to support certain features? How much is it
pushing users to expect certain features and tailor their problems
towards those?
- Is this more than navel-gazing? Is the TPTP representative for
actual applications? If not, is it becoming more
representative over time?
- Are there more points to discuss?