CASC-15 Entrants' System Descriptions

AI-SETHEO
ANDP
DISCOUNT 2.1
E 0.1
Bliksem 1.00
Gandalf c-1.1
HERBY 1.0 and PHERBY 1.0
Otter 3.0.5 and MACE 1.3.2
p-SETHEO
Functional Satchmo
SCOTT v3.1.0
SPASS V1.0.0a
TGTP and Octopus
Vampire
Waldmeister 798

AI-SETHEO

Joachim Draeger
Institut für Informatik, Technische Universität München
draeger@informatik.tu-muenchen.de

Architecture

AI-SETHEO is a hybrid system combining a connection tableaux theorem prover (SETHEO [GL+94, MI+97]) and a lemma generator (DELTA [Sch94]). This combines aspects of bottom-up reasoning with the top-down reasoning in connection tableaux theorem provers. For avoiding a performance breakdown due to the overwhelming number of brute-force generated lemmata a lemma selection procedure rejects both trivial and irrelevant lemmata.

Implementation

AI-SETHEO is implemented as a shell script which invokes the prover SETHEO, the lemma generator DELTA, and the lemma selection procedure. The selection procedure is realized in GAWK.

Expected Competition Performance

AI-SETHEO is still under construction; thus the competition performance can only be guessed. Early experiments show that it should be equal or better to the performance of SETHEO.

References

GL+94: Goller C., Letz R., Mayr K., Schumann J. (1994), SETHEO V3.2: Recent Developments, Bundy A., Proceedings of the 12th International Conference on Automated Deduction (Nancy, France), pp.778-782, Lecture Notes in Artificial Intelligence 814, Springer-Verlag.
MI+97: Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J., Mayr K. (1997), SETHEO and E-SETHEO: The CADE-13 Systems, Journal of Automated Reasoning 18(2), pp.237-246.
Sch94: Schumann J. (1994), DELTA - A Bottom-up Preprocessor for Top-Down Theorem Provers, Bundy A., Proceedings of the 12th International Conference on Automated Deduction (Nancy, France), pp.774-777, Lecture Notes in Artificial Intelligence 814, Springer-Verlag.

ANDP

Dafa Li
Department of Applied Mathematics, Tsinghua University
lidafa@tsinghua.edu.cn

Architecture

ANDP [Li97] is a natural deduction system for first-order logic without equality, adapted from Gentzen's system. There are five connectives for ANDP, negation, conjunction, disjunction, implication, and equivalence, whose precedence decreases in that order. The rules of inference for the connectives are MP (Modus Ponens), MT (Modus Tollens), LDS and RDS (Left and Right Disjunctive Syllogism), CP (Conditional Proof), ADD (Addition), CNJUN (Conjunction), SIMP (Simplication), De Morgan's rule, Double negation, IMP (Implication), and DILEMMA (cases). There are two quantifiers for ANDP, the universal quantifier (Ax) and the existential quantifier (Ex). The rules of inference for the quantifiers are UG (Universal Generalization), EG (Existential Generalization), US (Universal Specialization), and ES (Existential Specialization). There are two unification algorithms for treating quantifiers. One is for introducing quantifiers, the other is for eliminating quantifiers without Skolem functions.

Search strategies: First use the rules for connectives then for quantifiers. First reason from bottom to top, then from top to bottom.

Other strategies: Subsumption with quantifiers, first eliminate existential quantifiers then universal quantifiers, minimal scopes of quantifiers.

Implementation

ANDP is written in Common LISP (Lucid Common Lisp, KCL) and runs on a SUN workstation. In input formulas &, or, and -> are used to stand for conjunction, disjunction, and implication, respectively. Therefore input formulas, e.g., (Ax)[Px & Qy -> Py], look like those in text books. The first-order logic formulas are stored internally in Polish style.

Expected Competition Performance

ANDP is entered in the FOF division. ANDP can not treat formulas with equality. To treat equality it is neccessary to introduce axioms for equality. Therefore ANDP is expected to perform well on theorems without equality, but may fail to prove theorems with equality.

References

Li97: Li D. (1995), Unification Algorithms for Eliminating and Introducing Quantifiers in Natural Deduction Automated Theorem Proving, Journal of Automated Reasoning 18(1), pp.105-134.

E 0.1

S. Schulz
Institut für Informatik, Technische Universität München
schulz@informatik.tu-muenchen.de

Architecture

E 0.1 is a purely equational theorem prover. It is not primarily intended as an independent prover, but as an preprocessor implementing the bottom-up part of the METOP calculus [Mos96], with SETHEO [MI+97] to implement the top-down part in a future combined system. However, as the calculus employs inference rules very similar to those of superposition calculi [BG94], it is also possible to use E as a prover on its own. The implemented calculus combines ordered paramodulation with rewriting. For the special case that only unit clauses are present, this calculus degenerates into unfailing completion [BDP89] as implemented in DISCOUNT [DKS97]. As we expect SETHEO to deal with the non-equational part of the theory, no special rules for non-equational literals have been implemented, i.e. resolution is simulated via paramodulation and equality resolution. A particular feature of E is the term representation. Terms are maximally shared (within two classes of clauses) to minimize memory consumption, and we employ graph rewriting techniques to optimize simplification. As an additional feature, E has a very clean interface for adding and developing new search control heuristics.

Implementation

E is implemented in ANSI C, using the GNU C compiler. The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, and various versions of Linux. It uses it's own library, CLIB, for data types, I/O, and memory management.

Expected Competition Performance

E is in a very early phase of development, so it is hard to predict actual performance. We expect it to perform adequately in the UEQ category, while the performance in the MIX category is likely to be mediocre for the current version.

References

BDP89: Bachmair L., Dershowitz N., Plaisted D.A. (1989), Completion Without Failure, Ait-Kaci H., Nivat M., Resolution of Equations in Algebraic Structures, pp.1-30, Academic Press.
BG94: Bachmair L., Ganzinger H. (1994), Rewrite-Based Equational Theorem Proving with Selection and Simplification, Journal of Logic and Computation 4(3), pp.217-247.
DKS97: Denzinger J., Kronenburg M., Schulz S. (1997), DISCOUNT: A Distributed and Learning Equational Prover, Journal of Automated Reasoning 18(2), pp.189-198.
MI+97: Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J., Mayr K. (1997), SETHEO and E-SETHEO: The CADE-13 Systems, Journal of Automated Reasoning 18(2), pp.237-246.
Mos96: Moser M. (1996), Goal-Directed Resoning in Clausal Logic with Equality, PhD thesis, Institut für Informatik, Technische Universität München, Munich, Germany, and Dissertationen zur Informatik 2, CS Press.

DISCOUNT 2.1

S. Schulz, F. Brandt
Institut für Informatik, Technische Universität München
{schulz,brandtf}@informatik.tu-muenchen.de

Architecture

DISCOUNT/TSM is a sequential version of the well-known distributed equational theorem prover DISCOUNT [DKS97], using a new learning heuristic to control the selection of certain inference steps. DISCOUNT is based on unfailing completion, extended to deal with arbitrarily quantified goals. The basic algorithm repeatedly selects an unprocessed equation, rewrites it to normal form w.r.t. the already processed equations, and uses it to build critical pairs (new unprocessed equations). The equation is then used for interreduction and added to the set of processed equations. Interleaved with this completion process, critical goals are generated (by paramodulating facts into goals), and normalized. The process stops when either a goal has been proven or all facts and goals have been processed.

The selection of the next equation to be processed is controlled by a term space map (TSM) [Sch98], a generalized term evaluation tree [DS96]. A TSM is a recursive structure capturing characteristic distributions of term nodes in representative patterns of useful and useless equations collected from previous proofs. Suitable proofs for learning are selected using tree-based similarity criteria on TSMs and some simple features of the problem specification.

Implementation

Both DISCOUNT and the supplementary programs for proof analysis and learning are implemented in ANSI C. The programs have been successfully installed under SunOS 4.3.x, Solaris 2.x and Linux. DISCOUNT originally was designed in 1991, before the use of efficient indexing methods became widespread. Consequently, the basic inference engine is primitive by today's standards, and modern systems (e.g. Waldmeister [HBF96]) achieve a much higher throughput.

Expected Competition Performance

We expect this new DISCOUNT version to perform similarly to the previous versions. That is, it should be a serious contender, but will probably not be among the top three systems.

References

DKS97: Denzinger J., Kronenburg M., Schulz S. (1997), DISCOUNT: A Distributed and Learning Equational Prover, Journal of Automated Reasoning 18(2), pp.189-198.
DS96: Denzinger J., Schulz S. (1996), Learning Domain Knowledge to Improve Theorem Proving, McRobbie M., Slaney J.K., Proceedings of the 13th International Conference on Automated Deduction (New Brunswick, USA), pp.62-76, Lecture Notes in Artificial Intelligence 1104, Springer-Verlag.
HBF96: Hillenbrand T., Buch A., Fettig R. (1996), On Gaining Efficiency in Completion-Based Theorem Proving, Ganzinger H., Proceedings of the 7th International Conference on Rewriting Techniques and Applications (New Brunswick, USA), pp.432-435, Lecture Notes in Computer Science 1103, Springer-Verlag.
Sch98: Schulz S. (1998), Term Space Mapping for DISCOUNT, Proceedings of the CADE-15 Workshop on Using AI Methods in Deduction (Lindau, Germany).

Bliksem 1.00

H. de Nivelle
CWI Amsterdam
nivelle@cwi.nl

Architecture

Bliksem is a resolution based theorem prover. It supports ordered resolution, factoring, ordered paramodulation, unit simplification and demodulation. Bliksem supports many orders (KBO, RPO, LPO, simple symbol orders, non-lifaftble orders) and resolution decision procedures. At this moment it has decision procedures for the E⁺-class, for the prefix class exists^* forall forall exists^*, and for the guarded fragment of first order logic [FL+93, AvN96, DeN94]. Additionally Bliksem also has the equality strategies of [BG90]. There is no hyperresolution in Bliksem, but it can be simulated using an appropriate order.

Bliksem supports various transformations from first order logic to clausal normal form. In particular it supports various structural and non-structural transformations [ABFL94].

Implementation

Bliksem is implemented in C. The datastructure that is used for literals, is the prefix with ends representation. A literal is stored in two arrays t and e. The first contains the literal in prefix notaton. The second contains for each subterm a reference to the end of this subterm. So suppose that the term p(X,s(s(Y))) is stored on position 100. This is done as follows: t[100] = p e[100] = 105 t[101] = X e[101] = 102 t[102] = s e[102] = 105 t[103] = s e[103] = 105 t[104] = Y e[104] = 105 Clauses are stored in another array as a sequence of indices referring to the literals.

We use discrimination trees for forward subsumption, forward demodulation, and forward unit simplification. In the discrimination trees the terms are stored in a normalized manner. In addition we use bit filters as a filter for subsumption. There are no optimized techniques for the other derivation rules.

Expected Competition Performance

As it is the first time for us it is difficult to give expected performances.

References

AvN96: Andreka H., van Benthem J., Nemeti I. (1996), Modal Languages and Bounded Fragments of Predicate Logic, Research Report ML-96-03, Institute for Logic, Language and Computation, Amsterdam, The Netherlands.
BFM94: Baaz M., Fermüller C., Leitsch A. (1994), A Non-Elementary Speed Up in Proof Length by Structural Clause Form Transformation, Proceedings of the 9th Annual IEEE Symposium on Logic in Computer Science (Paris, France), pp.213-219, IEEE Computer Society Press.
BG90: Bachmair L., Ganzinger H. (1990), On Restrictions of Ordered Paramodulation with Simplification, Stickel M.E., Proceedings of the 10th International Conference on Automated Deduction (Kaiserslautern, Germany), pp.427-441, Lecture Notes in Artificial Intelligence 449, Springer-Verlag.
DeN94: DeNivelle H. (1994), Resolution Games and Non-Liftable Resolution Orderings, Pacholski L., Tiuryn J., Computer Science Logic, pp.279-293, Springer-Verlag.
FL+93: Fermüller C., Leitsch A., Tammet T., Zamov N. (1993), Resolution Methods for the Decision Problem, Lecture Notes in Computer Science 679, Springer-Verlag.

Gandalf c-1.1

T. Tammet
Department of Computer Science, University of Göteborg
tammet@cs.chalmers.se

Architecture

Gandalf implements a number of different basic strategies: binary ordered resolution for several orderings, versions of set-of-support resolution, binary unit resolution, hyperresolution. Enhancements are used for cutting off literals and combining different strategies, in particular forward and backward reasoning, into one run. Equality is handled by ordered paramodulation and demodulation.

The basic idea of the automatic mode in Gandalf - also used during the CASC-14 competition - is time-slicing: Gandalf selects a set of different search strategies, allocates time to these and finally runs the strategies one after another.

The selection of strategies and the percentage of time they receive is in a somewhat ad hoc way based on the characteristics of the problem: percentage of Horn and almost-Horn clauses, the number of clauses, percentage of clauses where literals can be ordered using term and variable depths.

There exist also versions of Gandalf for intuitionistic logic and type theory, see [Tam96] and [TS96]. More information about Gandalf can be found in [Tam98].

Implementation

Gandalf is implemented in Scheme. We are using the scm interpreter developed by A. Jaffer and the Scheme-to-C compiler Hobbit developed by T. Tammet for the scm system.

The source, binaries, manual and other materials are available at

    http://www.cs.chalmers.se/~tammet/gandalf/

Expected Competition Performance

Since Gandalf won the MIX class of CASC-14, we expect Gandalf to be one of the top performers in the MIX category during CASC-15. With some luck it may even win the MIX category again.

Gandalf is likely to do well in the UEQ, SAT and FOF categories, although it is unlikely it will win any of these.

References

Tam98: Tammet T. (1998), Towards Efficient Subsumption, Kirchner C., Kirchner H., Proceedings of the 15th International Conference on Automated Deduction (Lindau, Germany), Lecture Notes in Artificial Intelligence, Springer-Verlag.
Tam96: Tammet T. (1996), A Resolution Theorem Prover for Intuitionistic Logic, McRobbie M., Slaney J.K., Proceedings of the 13th International Conference on Automated Deduction (New Brunswick, USA), pp.2-16, Lecture Notes in Artificial Intelligence 1104, Springer-Verlag.
TS96: Tammet T., Smith J. (1996), Optimised Encodings of Fragments of Type Theory in First Order Logic, Berardi S., Coppo M., Types for Proofs and Programs: Proceedings of the International Workshop TYPES'95 (Torino, Italy), pp.265-287, Lecture Notes in Artificial Intelligence 1158, Springer-Verlag.

HERBY 1.0 and PHERBY 1.0

M. Almulla
Department of Mathematics and Computer Science, Kuwait University
almulla@sun470.sci.kuniv.edu.kw

M. Newborn
School of Computer Science, McGill University
newborn@opus.cs.mcgill.ca

Architecture

HERBY [Alm95, AN96] is a semantic-tree based theorem prover that implements a number of different heuristics [YAN96, YAN] for choosing ground atoms from the Herbrand base of the given problem. The version of HERBY that will participate at CADE-15 is expected to use 15 atom selection heuristics. The selection of heuristics and the percentage of time they receive is fixed and is independent of the characteristics of the problem. An earlier version of HERBY was tested on the Stickel test set.

PHERBY is a parallel version of HERBY. PHERBY uses a bigger set of heuristics than HERBY, as well as cooperative computing in determining the quality of atoms to be used. This is the first version of the program.

Implementation

HERBY and PHERBY are implemented in C. PHERBY runs on a PVM consisting of a network of approximately 20 120MHz and 150MHz pentium computers running the LINUX operating system.

Expected Competition Performance

72 theorems were used for testing ATP programs at the CADE-14 competition. HERBY solved 20 of them in less than 300 seconds.

PHERBY is expected to perform better than the serial version HERBY. However, it is expected to show much weaker performance compared to resolution-refutation theorem provers. Out of the 72 theorems that were used for testing ATP programs at the CADE-14 competition, PHERBY solved 24 in less than 300 seconds.

References

Alm95: Almulla M. (1995), Analysis of the Use of Semantic Trees for Proofs of Unsatisfiability, PhD thesis, McGill University, Montreal, Canada.
AN96: Almulla M., Newborn M. (1996), The Practicality of Generating Semantic Trees for Proofs of Unsatisfiability, International Journal of Computer Mathematics 62(1-2), pp.45-61.
YAN96: Yu Q., Almulla M., Newborn M. (1996), Heuristics for a Semantic Tree Theorem Prover, Kautz H., Selman B., Proceedings of the 4th International Symposium on Artificial Intelligence and Mathematics AI/MATH-96 (Fort Lauderdale, USA), pp.162-165.
YAN: Yu Q., Almulla M., Newborn M. (Submitted), Heuristics used by HERBY for Semantic Tree Theorem Proving, Annals of Mathematics and Artificial Intelligence.

Otter 3.0.5 and MACE 1.3.2

W.W. McCune
Mathematics and Computer Science Division, Argonne National Laboratory
mccune@mcs.anl.gov

Architecture

Otter [McC94a, McC96, MW97] searches for proofs, and MACE [McC94b, McC94c] searches for small finite models. Both programs apply to statements in first-order (unsorted) logic with equality. The two programs are independent and work by completely different methods. (In practice, the programs are complementary.) Version 3.0.5 of Otter has been used. Otter is based on resolution and paramodulation applied to clauses; an Otter search uses the "given clause algorithm" and typically involves a large database of clauses; subsumption and demodulation play an important role. Version 1.3.2 of MACE has been used. Roughly speaking, MACE works as follows. For a given domain size, an equivalent propositional problem is constructed and a Davis-Putnam-Loveland decision procedure is applied. MACE typically cannot handle clauses with many variables or deeply nested terms.

Implementation

Both programs are written in C. Otter uses shared data structures for clauses and terms, and it uses indexing for resolution, paramodulation, forward and backward subsumption, forward and backward demodulation, and unit conflict. MACE is much simpler than Otter; its Davis-Putnam-Loveland procedure uses a counting algorithm to do the unit resolutions, and forward subsumption is not applied.

Expected Competition Performance

These versions of Otter and MACE are nearly identical to the versions entered in last year's CASC-14. Both programs did well last year, but neither won any top prizes. We do not expect any winning performances in CASC-15.

References

McC94a: McCune W.W. (1994), MACE: Models and Counterexamples, http://www.mcs.anl.gov/home/mccune/ar/mace/.
McC94b: McCune W.W. (1994), Otter 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, USA.
McC94c: McCune W.W. (1994), A Davis-Putnam Program and its Application to Finite First-Order Model Search: Quasigroup Existence Problems, Technical Report ANL/MCS-TM-194, Argonne National Laboratory, Argonne, USA.
McC96: McCune W.W. (1996), Otter, http://www.mcs.anl.gov/home/mccune/ar/otter/.
MW97: McCune W.W., Wos L. (1997), Otter: The CADE-13 Competition Incarnations, Journal of Automated Reasoning 18(2), pp.211-220.

Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and Rusty Lusk contributed to the development of Otter. MACE was motivated by previous work of John Slaney, Mark Stickel, and Hantao Zhang.

p-SETHEO

A. Wolf, R. Letz, G. Stenz
Institut für Informatik, Technische Universität München
{wolfa,letz,stenzg}@in.tum.de

Architecture

p-SETHEO [Wol98] (single processor version) is an environment for strategy parallel competition of automated theorem provers. Actually, it is used for controlling strategies which consist of different parameterizations of the automated theorem prover SETHEO [MI+97]. p-SETHEO time-slices different strategies. The first successful sub-prover stops the system.

Implementation

The implementation uses Perl, C and shell tools in approximately 1000 lines of code (without SETHEO).

Expected Competition Performance

p-SETHEO is still under construction, so we cannot give any estimations.

References

MI+97: Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J., Mayr K. (1997), SETHEO and E-SETHEO: The CADE-13 Systems, Journal of Automated Reasoning 2(18), pp.237-246.
Wol98: Wolf A. (1998), p-SETHEO: : Strategy Parallelism in Automated Theorem Proving, de Swart H., Proceedings of International Conference TABLEAUX'98: Analytic Tableaux and Related Methods (Oisterwijk, The Netherlands), Lecture Notes in Artificial Intelligence, Springer-Verlag.

Functional Satchmo

T. Geisler and S. Panne
Insitut für Informatik, Universität München
{Tim.Geisler,Sven.Panne}@informatik.uni-muenchen.de

Architecture

Functional Satchmo [GPS97] implements the positive unit hyper-resolution (PUHR) tableau method [BY96,] a calculus for range-restricted clauses, which is actually a formalisation of Satchmo [MB88].

PUHR tableaux are trees in which every node is labeled with a ground atom (unit) or with a positive ground clause. PUHR tableaux for a given clause set are built starting from the tree consisting of a single unlabeled node, by repeatedly applying the following two rules to the leaf of a branch. The PUHR rule resolves the body literals of an input clause simultaneously with the units on the branch and extends the branch with a new leaf, labeled with the (hyper-)resolvent. The splitting rule extends the branch with n new leaf nodes B¹,...,Bⁿ, if the branch contains a positive ground clause B¹ v...v Bⁿ, but none of the units B¹ or ... or Bⁿ. If the empty clause is selected, the branch is marked as closed. If none of these rules is applicable, then the set of units of a branch represents a Herbrand model of the input clause set. If all branches are closed, then the input clause set is unsatisfiable.

Complement splitting, a variant of the above splitting rule, is used to enhance efficiency and to guarantee minimality of the leftmost generated model [BY96]. A newly appended unit is additionally labeled by the complements of the positive labels of all its right siblings. A branch is now marked as closed, too, if the set of the labels of its units contains two complementary literals.

Incremental evaluation avoids repeated applications of the PUHR rule with the same clauses and accelerates its applicability test: If a unit is added to the branch in a splitting step, then exactly those PUHR steps using this unit have to be applied as the next steps. This leads to significantly more efficient implementations of the PUHR tableau method [SG96].

The set of positive ground clauses for which splitting has not yet been applied, is partitioned into nesting and non-nesting clauses. The latter clauses are ordered with increasing length and are preferred to the former ones, which must be enqueued to ensure fairness.

Implementation

Functional Satchmo is implemented in the purely functional language Haskell, which performs lazy evaluation. A PUHR tableau is represented as a (lazy) list of its open branches. The sets of units in tableau branches are represented as standard discrimination trees.

Expected Competition Performance

We expect average performance in the SAT division.

References

BY96: Bry F., Yahya A. (1996), Minimal Model Generation with Positive Unit Hyper-resolution Tableaux, Proceedings of TABLEAUX'96: the 5th Workshop on Theorem Proving with Analytic Tableaux and Related Methods (Palermo, Italy), pp.143-159, Lecture Notes in Artificial Intelligence 1071, Springer-Verlag.
GPS97: Satchmo: The Compiling and Functional Variants, Journal of Automated Reasoning 18(2), pp.227-236.
MB88: Manthey R., Bry F. (1988), SATCHMO: A Theorem Prover Implemented in Prolog, Lusk E. Overbeek R., Proceedings of the 9th International Conference on Automated Deduction (Argonne, USA), pp.415-434, Lecture Notes in Computer Science 310, Springer-Verlag.
SG96: Schütz H., Geisler T. (1996), Efficient Model Generation through Compilation, McRobbie M., Slaney J.K., Proceedings of the 13th International Conference on Automated Deduction (New Brunswick, USA), pp.431-447, Lecture Notes in Artificial Intelligence 1104, Springer-Verlag.

SCOTT v3.1.0

K. Hodgson and J.K. Slaney
Automated Reasoning Project, Australian National University
{kahlil,jks}@arp.anu.edu.au

Architecture

SCOTT (Semantically Constrained OTTER) [SH98] is an extension of OTTER [McC94] and uses the same inference rules and overall proof search structure as OTTER. It adds to its parent prover a module which searches dynamically for small finite models of subsets of the clauses in order to extract semantic information which may help to guide the proof search. For the purposes of CASC-15, the models are used only to adjust the weights of clauses in order to make OTTER prefer clauses which are false in the guiding model or models. The difference between the present version of SCOTT and previous ones [SLM94, Sla93] is mainly that it now generates and uses many models rather than just one, resulting in richer semantic control and in greater stability under small perturbations of problem formulations and or settings. Although many more elaborate uses of the models are easy to envisage, for the present the extra weight assigned to clauses is a function simply of the number of models in which they are true.

Implementation

SCOTT is written in C. It uses the existing program OTTER as a sub-program to perform the proof search and the existing constraint satisfaction program FINDER [Sla94] to generate the models. The additional code implements the interface between the two, maintains the list of known models, labels clauses with the numbers of models in which they hold and controls calls to the model-generator.

Expected Competition Performance

In principle SCOTT should perform at least as well as OTTER across all parts of the Mix division, if only because it has OTTER as a subroutine. However, in practice it does not invariably do better than OTTER, for three reasons. Firstly it has the extra overheads of searching for models and testing clauses against them. Secondly, the settings of OTTER in autonomous mode are fine-tuned for searching without semantic guidance and may not be ideal when a semantic component is added. Thirdly, there are no magic bullets: any strategy may lose sometimes, and in order to make the comparison interesting we always make SCOTT try a model-guided search if possible. Overall, we expect good performance in comparison with cognate systems that employ relatively naïve guidance.

References

McC94: McCune W.W. (1994), Otter 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, USA.
SH98: Slaney J.K., Hodgson K. (1998), System Description: MSCOTT, TR-ARP-06-98, Automated Reasoning Project, Australian National University, Canberra, Australia.
SLM94: Slaney J.K., Lusk E., McCune W.W. (1994), SCOTT: Semantically Constrained Otter, System Description, Bundy A., Proceedings of the 12th International Conference on Automated Deduction (Nancy, France), pp.764-768, Lecture Notes in Artificial Intelligence 814, Springer-Verlag.
Sla93: Slaney J.K. (1993), SCOTT: A Model-Guided Theorem Prover, Bajcsy R., Proceedings of the 13th International Conference on Artificial Intelligence (Chambery, France), pp.109-114, Morgan-Kaufman.
Sla94: Slaney J.K. (1994), Finite Domain Enumerator, System Description, Bundy A., Proceedings of the 12th International Conference on Automated Deduction (Nancy, France), pp.764-768, Lecture Notes in Artificial Intelligence 814, Springer-Verlag.

SPASS V1.0.0a

C. Weidenbach
Max-Planck-Institut für Informatik
weidenb@mpi-sb.mpg.de

Architecture

SPASS [Wei97] is an automated theorem prover for full sorted first-order logic with equality. Its calculus is an extension of superposition by sorts [Wei98] and a splitting rule for case analysis. In contrast to many approaches to order-sorted clausal reasoning, the calculus enables sort predicates and equations to occur arbitrarily within clauses. Therefore, the sort theory is not separated from the problem clauses, but automatically and dynamically extracted [GMW97]. Nevertheless, it can be used to significantly cut the search space.

The splitting rule splits clauses, if possible, into variable disjoint parts that can be independently refuted. This rule is not necessary for completeness of the calculus but it makes SPASS an efficient decision procedure for several decidable subclasses of first-order logic [HS97].

We put a lot of effort in a clause normal form translation that produces small clause normal forms. The major techniques are renaming and improved Skolemization [NRW98].

Implementation

SPASS is written in ANSI-C and relies on a generic library that supports data structures and efficient functions for automated theorem proving.

The SPASS distribution containing the source code for SPASS is freely available from the SPASS homepage. The code should compile and run on any machine where the usual GNU tools are available. This has been tested on SUN Sparc workstations under Solaris and SunOS, on personal computers under Linux and Solaris and on Dec Alpha workstations under Digital Unix.

Expected Competition Performance

We hope to improve on the performance of SPASS at CASC-13 and CASC-14.

References

GMW97: Ganzinger H., Meyer C., Weidenbach C. (1997), Soft Typing for Ordered Resolution, McCune W.W., Proceedings of the 14th International Conference on Automated Deduction (Townsville, Australia), pp.321-335, Lecture Notes in Artificial Intelligence 1249, Springer-Verlag.
HS97: Hustadt U., Schmidt R.A. (1997), On Evaluating Decision Procedures for Modal Logics, Pollack M.E., Proceedings of the 15th International Joint Conference on Artificial Intelligence (Nagoya, Japan), pp.202-207.
NRW98: Nonnengart A., Rock G., Weidenbach C. (1998), On Generating Small Clause Normal Forms, Kirchner C., Kirchner H., Proceedings of the 15th International Conference on Automated Deduction (Lindau, Germany), Lecture Notes in Artificial Intelligence, Springer-Verlag.
Wei97: Weidenbach C. (1997), SPASS: Version 0.49, Journal of Automated Reasoning 2(18), pp.247-252.
Wei98: Weidenbach C. (1998), Sorted Unification and Tree Automata, Bibel W., Schmitt P.H., Automated Deduction, A Basis for Applications, Kluwer Academic Publishers.

TGTP and Octopus

M. Newborn
School of Computer Science, McGill University
newborn@cs.mcgill.ca

Architecture

TGTP [New97] is a resolution-refutation theorem prover. It carries out a sequence of iteratively-deepening searches, looking for a linear proof. While searching for a linear proof, TGTP also stores information on generated unit clauses. This information is stored in a large hash table with space for one million entries, permitting proofs that are not linear to be derived. Completeness is sacrificed only by limits on the number of literals in a clause and on the number of constants, functions, and variables in a literal. TGTP uses only two inference rules: binary resolution and binary factoring.

Octopus is a parallel verison of TGTP, with one master and as many slaves as available - 25 are expected for the CADE-15 competition. The master computer reads the given theorem from disk, spawns slaves and then sends the theorem to the slaves. Each slave carries out the same search process, but each palces a different limit on the number of literals in a clause and the number of constants, functions, and variables in a literal. In addition, some processors use the set-of-support strategy and others do not.

Implementation

TGTP is implemented in C, consisting of twelve -.c files, two -.h files, 150 functions, and 12286 lines of code.

Octopus modifies only slightly the code of TGTP for the parallel implementation. It is expected to run on a network of 20-25 pentium computers running at approximately 150Mhz. The processors communicate with eachother using Parallel Virtual Machine (pvm).

Expected Competition Performance

At CADE-14, TGTP solved 34 of the 72 theorems. The current version is slightly stronger, capable of solving a few more of the 72 theorems, but remaining weak on theorems involving the equality axioms.

Octopus is slightly stronger than TGTP, solving 40 theorems at CADE-14 and currently able to solve another three or four.

References

New97: Newborn M. (1997), The Great Theorem Prover, Version 3, Newborn Software, Montreal, Canada.

Vampire

A. Voronkov
Computer Science Department, Uppsala University
voronkov@csd.uu.se

Architecture

Vampire [Vor95] is a resolution-based theorem prover. It is not fully implemented yet, so it is unclear what features we shall be able to present at CASC.

In the near future we plan to implement binary/hyperresolution with clause splitting. Our main emphasis is on the efficient implementation as opposite to logical optimizations.

Implementation

Vampire is implemented in C++ with no tuning to a particular computer or system. Except for a couple of string library functions, ANSI C++ is used.

We use very efficient data structures for indexing. At the moment we have code trees for forward subsumption and a combination of discrimination trees and substitution trees for back subsumption and aggressive term structure sharing. We are working on code trees for binary resolution and hyperresolution.

Expected Competition Performance

We do not expect high performance since we shall not be able to implement equality before the competition. Our aim is to see how it works on nonequality problems and make a good version for the 1999 competition.

References

Vor95: Voronkov A. (1995), The Anatomy of Vampire, Journal of Automated Reasoning 15(2), pp.237-265.

Waldmeister 798

T. Hillenbrand, A. Jaeger, B. Löchner, and A. Buch
FB Informatik, Universität Kaiserslautern
{hillen,jaeger,loechner}@informatik.uni-kl.de,arnim.buch@sdm.de

Architecture

Waldmeister is a theorem prover for unit equational logic. Its proof procedure is unfailing Knuth-Bendix completion [BDP89]. We started the development from a three-level model of saturation-based ATP systems [BH96, HB+96]. Since the efforts of the last years have led to a satisfactory inference machine, we now focus on the question how to guide the proof search. The main completion loop is parameterized by a selection heuristic and a reduction ordering. Both are derived from the algebraic structure described by the axioms. Since last year, the derivation process has been lifted such that control knowledge is expressed declaratively, which eases the integration of human experience. For full flexibility, the search parameters may be re-instantiated during the proof search. Recently, we also realized a database-driven mechanism which allows simple extensions of the underlying inference calculus, e.g. for reasoning on Horn clauses as described in [BDP89:pp.24ff].

Implementation

Reaching efficiency both in terms of time and space has been a major goal in the development of our inference machine. As to the first, fast rewriting is achieved by sophisticated indexing techniques. As to the second, the set of critical pairs is represented in a compressed data structure. To incorporate goal orientation into the completion procedure, the hypotheses are blown up to sets of rewrite successors and sometimes predecessors. A step-by-step proof object is constructed at run-time. Our ATP system is implemented in ANSI-C and runs under Sun-OS, Solaris and Linux. The Waldmeister distribution contains an easy-to-use version along with documentation material and problem sets. If you are interested in tackling problems with Waldmeister, just send an e-mail to the authors.

Expected Competition Performance

We conjecture that Waldmeister will solve the majority of the competition problems in less time than one would need to study the corresponding problem specifications.

References

BDP89: Bachmair L., Dershowitz N., Plaisted D.A. (1989), Completion Without Failure, Ait-Kaci H., Nivat M., Resolution of Equations in Algebraic Structures, pp.1-30, Academic Press.
BH96: Buch A., Hillenbrand T. (1996), Waldmeister: Development of a High Performance Completion-Based Theorem Prover, SEKI-Report SR-96-01, AG Effiziente Algorithmen, Universitaet Kaiserslautern, Kaiserslautern, Germany.
HB+97: Hillenbrand T., Buch A., Vogt R., Löchner B. (1997), Waldmeister: High Performance Equational Deduction, Journal of Automated Reasoning 18(2), pp.265-270.

t[100] = p	e[100] = 105
t[101] = X	e[101] = 102
t[102] = s	e[102] = 105
t[103] = s	e[103] = 105
t[104] = Y	e[104] = 105