CASC-16 Entrants' System Descriptions


Bliksem 1.10

H. de Nivelle
Max-Planck-Institut für Informatik
nivelle@mpi-sb.mpg.de

Architecture

Bliksem implements the ordered resolution + superposition calculus [BG90,DeN94,Rob65]. It supports many different orders, including reduction orders, A-orders, and non-liftable orders. Special attention has been given to resolution strategies that provide decision procedures for certain subsets of first order logic [FL+93,DeN98]. Bliksem is able to transform first order formulae into clausal normal form, using different structural or non-structural clause transformations. Bliksem 1.10 improves over Bliksem 1.00 by having a posteriori orders (which will not be used during the competition), equality factoring with tautology elimination, optimized normal form transformations, non-unit demodulators, and equality subsumption (which takes into account the commutativity of =).

In the automode, Bliksem 1.10 is able to recognize decidable classes. However, these are unlikely to show up in the competition. Apart from that, the automode is still quite primitive: we do not know how to select the right strategy and order.

Bliksem 1.00 has had soundness problems. We believe that there is no way in which the correctness of a program of Bliksem's complexity can be guaranteed, and we do not give any such guarantees for Bliksem 1.10. Instead, Bliksem 1.10 is able to output its proofs at a level of detail that allows them to be easily checked by another computer program. In the future we plan to have the proofs checked by a type checker, namely COQ [BB+96].

Implementation

Bliksem is written in C. High priority has been given to portability. Bliksem has been compiled successfully using cc, gcc, and djpcc. The data structures have been carefully chosen after benchmark tests on the basic operations. We are still in the process of completing benchmark tests on the basic algorithms. Bliksem is available from:
    http://www.mpi-sb.mpg.de/~nivelle

Expected Competition Performance

Bliksem 1.10 improves over Bliksem 1.00. However, all systems have been improved, so we cannot make any predictions about the ranking of Bliksem 1.10. We do expect Bliksem to rank high in terms of the number of generated clauses per second.

References

BG90
Bachmair L., Ganzinger H. (1990), On Restrictions of Ordered Paramodulation with Simplification, Stickel M.E., Proceedings of the 10th International Conference on Automated Deduction (Kaiserslautern, Germany), pp.427-441, Lecture Notes in Artificial Intelligence 449, Springer-Verlag.
BB+96
Barras B. et al. (1996), The COQ Proof Assistant Reference Manual, Version 6.1, INRIA, Nancy, France.
DeN94
DeNivelle H. (1994), Resolution Games and Non-Liftable Resolution Orderings, Pacholski L., Tiuryn J., Computer Science Logic, pp.279-293, Springer-Verlag.
DeN98
DeNivelle H. (1998), A Resolution Decision Procedure for the Guarded Fragment, Kirchner C., Kirchner H., Proceedings of the 15th International Conference on Automated Deduction (Lindau, Germany), pp.191-204, Lecture Notes in Artificial Intelligence 1421, Springer-Verlag.
FL+93
Fermüller C., Leitsch A., Tammet T., Zamov N. (1993), Resolution Methods for the Decision Problem, Lecture Notes in Computer Science 679, Springer-Verlag.
Rob65
Robinson J.A. (1965), A Machine-Oriented Logic Based on the Resolution Principle, Journal of the ACM 12(1), pp.23-41.


E 0.5

S. Schulz
Institut für Informatik, Technische Universität München
schulz@informatik.tu-muenchen.de

Architecture

E [Sch99] is a purely equational theorem prover. It was not originally designed as an independent prover, but as a preprocessor implementing the bottom-up part of the METOP calculus, with SETHEO [MI+97] to implement the top-down part in a future combined system.

The calculus used by E combines superposition [BG94] (with selection of negative literals) and rewriting. For the special case that only unit clauses are present, this calculus degenerates into unfailing completion [BDP89]. For the Horn case, E can simulate variants of the positive unit strategy described in [Der91]. No special rules for non-equational literals have been implemented, i.e., resolution is simulated via paramodulation and equality resolution.

Proof search in E is controlled by a literal selection strategy and an evaluation heuristic. A variety of selection strategies (including, e.g., no selection or selection of all negative literals) is available. Evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions. We also aim to include some learning heuristics in the prover by the competition date. A simple automatic mode can select strategy and heuristic based on simple problem characteristics.
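
To illustrate this way of composing heuristics, here is a minimal Python sketch of our own (not E's code; the primitive functions, their parameters, and the weighted-sum combination are all assumptions made for illustration):

    # Minimal sketch of building a clause evaluation heuristic from
    # parameterized primitive evaluation functions (illustrative only).

    def symbol_count(fun_weight=2, var_weight=1):
        """Primitive evaluation: weighted symbol count of a clause.
        A clause is a list of literals; a literal is a nested tuple
        (symbol, arg, ...); plain strings starting with an upper-case
        letter are variables, other strings are constants."""
        def weigh(term):
            if isinstance(term, tuple):
                return fun_weight + sum(weigh(arg) for arg in term[1:])
            return var_weight if term[:1].isupper() else fun_weight
        return lambda clause: sum(weigh(lit) for lit in clause)

    def clause_depth():
        """Primitive evaluation: maximal term depth in the clause."""
        def depth(term):
            if isinstance(term, tuple):
                return 1 + max((depth(arg) for arg in term[1:]), default=0)
            return 1
        return lambda clause: max(depth(lit) for lit in clause)

    def combine(*weighted):
        """Combine primitive evaluations by a weighted sum (one possible
        scheme; the weights are parameters of the resulting heuristic)."""
        return lambda clause: sum(w * f(clause) for w, f in weighted)

    heuristic = combine((1.0, symbol_count(fun_weight=2, var_weight=1)),
                        (0.5, clause_depth()))
    clause = [("p", "X", ("f", "X")), ("q", "a")]   # p(X,f(X)) | q(a)
    print(heuristic(clause))                         # -> 11.5

E's actual heuristics may combine evaluations differently (for instance, by alternating between them rather than summing them); the weighted sum above is only one plausible reading.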

Implementation

E is implemented in ANSI C, using the GNU C compiler. The most distinctive feature of the implementation is the maximally shared term representation. This includes parallel rewriting for all instances of a particular subterm. A second important feature is the use of perfect discrimination trees with age and size constraints for rewriting and unit subsumption.

The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, and various versions of Linux. Sources are available freely from:

    http://wwwjessen.informatik.tu-muenchen.de/~schulz/WORK/eprover.html

Expected Competition Performance

We expect E to perform quite well in all categories it participates in, with a particularly strong showing on Horn problems. It is unlikely to take first place in any category, though.

References

BDP89
Bachmair L., Dershowitz N., Plaisted D.A. (1989), Completion Without Failure, Ait-Kaci H., Nivat M., Resolution of Equations in Algebraic Structures, pp.1-30, Academic Press.

BG94
Bachmair L., Ganzinger H. (1994), Rewrite-Based Equational Theorem Proving with Selection and Simplification, Journal of Logic and Computation 4(3), pp.217-247.

Der91
Dershowitz N. (1991), Ordering-Based Strategies for Horn Clauses, Mylopolous J., Reiter R., Proceedings of the 12th International Joint Conference on Artificial Intelligence (Sydney, Australia), pp.118-124, Morgan-Kaufmann.

MI+97
Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J., Mayr K. (1997), SETHEO and E-SETHEO: The CADE-13 Systems, Journal of Automated Reasoning 18(2), pp.237-246.

Sch99
Schulz S. (1999), System Abstract: E 0.3, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), Lecture Notes in Artificial Intelligence, Springer-Verlag.


E-SETHEO 99csp and E-SETHEO-FLOTTER 99csp

R. Letz, S. Schulz, G. Stenz, A. Wolf
Institut für Informatik, Technische Universität München
{letz,schulz,stenzg,wolfa}@informatik.tu-muenchen.de

Architecture

E-SETHEO is a compositional theorem prover for formulae in first order clause logic, combining the systems E [Sch99] and SETHEO [MI+97]. It incorporates different calculi and proof procedures like superposition, model elimination and semantic trees (DPLL procedure). Furthermore, the system contains transformation techniques which may split the formula into independent subparts or which may perform a ground instantiation. Finally, advanced methods for controlling and optimizing the combination of the subsystems are applied.

E-SETHEO-FLOTTER includes the program FLOTTER [WGR96] as a preprocessing module for transforming non-clausal formulae to clausal form.

On the top level, the system performs a parallelization of different strategies stemming from the system P-SETHEO [Wol98a]. In contrast to P-SETHEO, in version 99csp of E-SETHEO the different strategies are run sequentially, one after the other, according to an allocation of resources to the different strategies, a so-called schedule. A schedule is determined from rough syntactic characteristics of the input formula (e.g., whether it is a Horn formula without equality). The schedules have been computed from a large amount of experimental data using a combination of a genetic algorithm [SW99] and a gradient procedure [Wol98b].
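
The schedule mechanism can be pictured with the following Python sketch (purely illustrative: the prover commands, problem-class names, output marker and time fractions are invented; the real schedules are derived from experimental data as described above):

    # Illustrative sketch of schedule-based sequential strategy execution.
    import subprocess

    SCHEDULES = {
        # problem class -> list of (strategy command, fraction of the time limit)
        ("horn", "no-equality"):  [(["setheo", "-strategy1"], 0.5),
                                   (["eprover", "-strategy2"], 0.5)],
        ("non-horn", "equality"): [(["eprover", "-strategy3"], 0.7),
                                   (["setheo", "-strategy4"], 0.3)],
    }

    def classify(problem_file):
        """Rough syntactic classification of the input formula (stub)."""
        return ("horn", "no-equality")

    def run_schedule(problem_file, time_limit):
        for command, fraction in SCHEDULES[classify(problem_file)]:
            try:
                result = subprocess.run(command + [problem_file],
                                        timeout=fraction * time_limit,
                                        capture_output=True, text=True)
                if "proof found" in result.stdout.lower():
                    return result.stdout          # success: stop the schedule
            except subprocess.TimeoutExpired:
                continue                          # slice used up: next strategy
        return None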

Implementation

Reflecting the diversity of the component systems, the modules of E-SETHEO are implemented in different programming languages, including C, Prolog, Scheme, and shell tools.

The program runs under Solaris 2.6. Sources are available freely.

Expected Competition Performance

We expect E-SETHEO to perform well in all categories it participates in.

References

MI+97
Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J., Mayr K. (1997), SETHEO and E-SETHEO: The CADE-13 Systems, Journal of Automated Reasoning 18(2), pp.237-246.
Sch99
Schulz S. (1999), System Abstract: E 0.3, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), Lecture Notes in Artificial Intelligence, Springer-Verlag.
SW99
Stenz G., Wolf A. (1999), Strategy Selection by Genetic Programming, Kumar A., Russell I., Proceedings of the 12th Florida Artificial Intelligence Research Symposium (Orlando, USA), pp.346-350, AAAI Press.
WGR96
Weidenbach C., Gaede B., Rock G. (1996), SPASS and FLOTTER, McRobbie M., Slaney J.K., Proceedings of the 13th International Conference on Automated Deduction (New Brunswick, USA), pp.141-145, Lecture Notes in Artificial Intelligence 1104, Springer-Verlag.
Wol98a
Wolf A. (1998), p-SETHEO: Strategy Parallelism in Automated Theorem Proving, de Swart H., Proceedings of International Conference TABLEAUX'98: Analytic Tableaux and Related Methods (Oisterwijk, The Netherlands), pp.320-324, Lecture Notes in Artificial Intelligence 1397, Springer-Verlag.
Wol98b
Wolf A. (1998), Strategy Selection for Automated Theorem Proving, Giunchiglia F., Proceedings of the 8th International Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA) (Sozopol, Bulgaria), pp.452-465, Lecture Notes in Artificial Intelligence 1480, Springer-Verlag.


FDP 0.9

P. Baumgartner
Institute for Computer Science, University of Koblenz-Landau
peter@uni-koblenz.de

Architecture

FDP is a complete theorem prover for first order clausal logic. The underlying calculus is a generalisation of the well-known Davis-Putnam procedure. The generalisation is splitting with respect to complementary literals, which may contain variables.

The model construction technique is similar to that of the predecessor calculus "Hyper Tableaux - The Next Generation" described in [Bau98]. Since there is no paper on FDP yet, this is the only relevant publication so far.

Proof search in FDP can be semantically guided by supplying an "initial interpretation"; proof search then means modifying this initial interpretation until success. By default, a "trivial" initial interpretation is used that assigns true to all positive literals. This emphasizes the role of negative clauses.
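
To make the role of the trivial interpretation concrete, here is a small illustration of our own (not FDP code; the clause representation is an assumption): under an interpretation making every atom true, a clause is already satisfied exactly when it contains a positive literal, so the purely negative clauses are the ones the search has to work on.

    def satisfied_by_trivial_interpretation(clause):
        """clause: list of (sign, atom) pairs; sign is True for positive literals."""
        return any(sign for sign, _atom in clause)

    clauses = [
        [(True, "p(X)"), (False, "q(X)")],    # has a positive literal: satisfied
        [(False, "p(a)"), (False, "q(a)")],   # purely negative: not satisfied
    ]
    to_work_on = [c for c in clauses if not satisfied_by_trivial_interpretation(c)]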

There is no built-in equality handling (yet!), and so far FDP itself is restricted to clausal logic without equality. Currently, equality is treated by a preprocessing transformation similar to Brand's well-known STE-transformation.

Implementation

FDP is written as an interpreter in Eclipse Prolog. The implementation is highly experimental, preliminary and slow. Some, but not enough, effort has been spent on selecting the clause for the next inference; this step is costly but poorly implemented. The next implementation will be significantly faster by using indexing techniques.

The current implementation is available through the WWW at:

     http://www.uni-koblenz.de/~peter/FDP/

Comments etc. are highly welcome!

Expected Competition Performance

There is no chance of seriously competing with the well-known leading systems. I am optimistic that the situation will improve one day, with a better implementation and a better understanding of the calculus.

In the MIX division, I expect better performance for non-equality problems than for equality problems, and better performance for non-Horn problems than for Horn problems.

FDP decides function-free clause logic. Many SAT problems are of this kind. Therefore, FDP participates in the SAT division as well.

References

Bau98
Baumgartner P. (1998), Hyper Tableaux - The Next Generation, de Swart H., Automated Reasoning with Analytic Tableaux and Related Methods: Proceedings of TABLEAUX'98 (Oisterwijk, The Netherlands), pp.60-76, Lecture Notes in Computer Science 1397, Springer-Verlag.


Fiesta 2

Robert Nieuwenhuis
Technical University of Catalonia (UPC)
roberto@lsi.upc.es

Architecture

Fiesta is essentially an implementation of unfailing Knuth-Bendix completion with the basicness restriction. The marks indicating subterms blocked for inferences are removed in rules that are used in backward demodulation, but not in forward demodulation (this is a potential source of incompleteness).

The main characteristic of this system is that it does not adjust any system parameters according to the given axiom set or conjecture. This is of course not realistic with respect to real-world applications. But we want to make the point that we strongly believe the same holds for the other extreme, i.e., applying specific strategies (for very small problem classes) discovered by studying proofs obtained by running (sometimes several) provers on the competition problem set without any time limit. That approach gives no real information either about the behaviour of a prover in a five-minute run on a new problem.

The following are the fixed settings used at the competition (but of course this does not mean that the prover includes no other orderings or selection strategies; all system parameters can be set in a flexible way):

Fiesta always uses a lexicographic path ordering with a precedence where symbols are compared first by arity (a larger arity means larger in the precedence) and second by order of appearance in the input.

The selection of the next rule used for superposition consists of picking, every three cycles, once the oldest one and twice the smallest one, according to the size measure in which symbols are counted as their arity plus one and variables as one.

The system has a self-adjusting policy for deleting too large equations: initially the weight limit is small (27), but each time completion terminates without a proof, it restarts with the limit incremented by four (hence this policy is not a source of incompleteness).
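
The three fixed settings above can be summarised in a short Python sketch (ours, not Fiesta's code; the data representation, helper names and the direction of the input-order tie-break are assumptions):

    def precedence_key(arity, input_position):
        """Symbol precedence: compare first by arity (larger arity = larger
        precedence), then by order of appearance in the input (the direction
        of this tie-break is chosen arbitrarily here)."""
        return (arity, -input_position)

    def equation_size(symbol_arities, num_variable_occurrences):
        """Size measure: each symbol counts as its arity plus one, each
        variable occurrence counts as one."""
        return sum(a + 1 for a in symbol_arities) + num_variable_occurrences

    def select_next(unprocessed, cycle):
        """Every three cycles: once the oldest equation, twice the smallest."""
        if cycle % 3 == 0:
            return min(unprocessed, key=lambda e: e.age)
        return min(unprocessed, key=lambda e: e.size)

    def complete_with_restarts(axioms, goal, run_completion):
        """Self-adjusting weight limit: start at 27 and restart with the limit
        incremented by four whenever completion terminates without a proof,
        so discarding large equations never costs completeness."""
        weight_limit = 27
        while True:
            proof = run_completion(axioms, goal, weight_limit)
            if proof is not None:
                return proof
            weight_limit += 4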

If a proof is found, it is output in a format readable by our Prolog proof checker.

Implementation

The system is implemented on top of Dedam, our collection of data structures, which provides terms with all their operations (input/output, matching, unification, etc.), as well as substitution trees for efficient many-to-one matching and unification.

Terms sometimes carry additional marks used for indicating that they are in normal form, or as an index into an LPO cache that avoids repeated comparison of the same terms with the LPO. This is quite useful for rewriting with rules that are not orientable a priori, which is done eagerly.

Expected Competition Performance

We are perfectly aware of the fact that this system cannot compete with other participating systems that look up in a table the right strategy for a class of problems. In our opinion, such a table may be crucial for winning the competition, but it is probably far less useful for real-world problems and, even if we had the manpower for doing so, we would not want to spend the impressive amount of time needed to build it.

We believe, however, that the performance of Fiesta at the competition will give some evidence that, with such a table, we could prove most if not all of the problems.

References


Gandalf c-1.8c

T. Tammet
Department of Computer Science, University of Göteborg
tammet@cs.chalmers.se

Architecture

Gandalf implements a number of different basic strategies: binary ordered resolution for several orderings, versions of set-of-support resolution, binary unit resolution, and hyperresolution. Enhancements are used for cutting off literals and for combining different strategies, in particular forward and backward reasoning, into one run. Equality is handled by ordered paramodulation and demodulation.

The basic idea of the automatic mode in Gandalf - also used during the CASC-14 competition - is time-slicing: Gandalf selects a set of different search strategies, allocates time to these and finally runs the strategies one after another.

The selection of strategies and the percentage of time they receive is based, in a somewhat ad hoc way, on the characteristics of the problem: the percentage of Horn and almost-Horn clauses, the number of clauses, and the percentage of clauses in which literals can be ordered using term and variable depths.
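
The following Python fragment sketches this kind of time-slicing (an illustration only; the strategy names, thresholds and fractions are invented, not Gandalf's actual allocation):

    def choose_slices(time_limit, horn_fraction, has_equality):
        """Pick strategies and split the time limit between them, based on
        rough problem characteristics."""
        slices = []
        if horn_fraction > 0.9:
            slices.append(("hyperresolution", 0.4))
        slices.append(("binary-ordered-resolution", 0.3))
        if has_equality:
            slices.append(("ordered-paramodulation-and-demodulation", 0.3))
        total = sum(fraction for _, fraction in slices)
        # normalise so that the slices exactly fill the time limit
        return [(name, time_limit * fraction / total) for name, fraction in slices]

    def run_time_sliced(problem, time_limit, run_strategy, **characteristics):
        """run_strategy(problem, name, timeout) -> proof or None."""
        for name, seconds in choose_slices(time_limit, **characteristics):
            proof = run_strategy(problem, name, timeout=seconds)
            if proof is not None:
                return proof
        return None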

There exist also versions of Gandalf for intuitionistic logic and type theory, see [Tam96] and [TS96]. More information about Gandalf can be found in [Tam98].

Implementation

Gandalf is implemented in Scheme. We are using the scm interpreter developed by A. Jaffer and the Scheme-to-C compiler Hobbit developed by T. Tammet for the scm system.

The source, binaries, manual and other materials are available at:

    http://www.cs.chalmers.se/~tammet/gandalf/ 

Expected Competition Performance

Since Gandalf won the MIX category of CASC-14, we expect Gandalf to be one of the top performers in the MIX category during CASC-16. With some luck it may even win the MIX category again.

Gandalf is likely to do well in the UEQ, SAT and FOF categories, although it is unlikely to win any of these.

References

Tam98
Tammet T. (1998), Towards Efficient Subsumption, Kirchner C., Kirchner H., Proceedings of the 15th International Conference on Automated Deduction (Lindau, Germany), Lecture Notes in Artificial Intelligence, Springer-Verlag.
Tam96
Tammet T. (1996), A Resolution Theorem Prover for Intuitionistic Logic, McRobbie M., Slaney J.K., Proceedings of the 13th International Conference on Automated Deduction (New Brunswick, USA), pp.2-16, Lecture Notes in Artificial Intelligence 1104, Springer-Verlag.
TS96
Tammet T., Smith J. (1996), Optimised Encodings of Fragments of Type Theory in First Order Logic, Berardi S., Coppo M., Types for Proofs and Programs: Proceedings of the International Workshop TYPES'95 (Torino, Italy), pp.265-287, Lecture Notes in Artificial Intelligence 1158, Springer-Verlag.


HOTTER 0.0

H. Zhang
Department of Computer Science, University of Iowa
hzhang@cs.uiowa.edu

Architecture

HOTTER stands for Humble OTTER, an extension of OTTER 3.0.5 [McC94,McC98a,MW97] for experimenting with ordered resolution and model generation. HOTTER inherits most inference rules and the overall proof search structure of OTTER, except that ordered binary resolution is used where hyperresolution is usually used in OTTER. Like SCOTT [HS99], a method for building models of a partial set of clauses is built into HOTTER in order to determine consistent subsets of the clause set. These models are used to guide the proof search, in the hope of finding a proof, or showing its nonexistence, more quickly.

The model generation method is a new one [Zha99] which represents models by rewrite rules. SATO [Zha97], an implementation of the Davis-Putnam method, is also built into HOTTER as a decision procedure for propositional logic.

For the SAT problems, McCune's MACE [McC98b] is used to test whether the input clauses have small finite models. If they do not, HOTTER takes over.

Implementation

Like OTTER and SATO, HOTTER is written in C and uses the same input format as OTTER. We plan to use the built-in SATO to search for small finite models in a future release of HOTTER.

Expected Competition Performance

HOTTER has not been fully tested on the TPTP problems yet. Its performance is expected to differ from OTTER's when hyperresolution plays a major role in OTTER's proof. Overall, HOTTER's performance should be comparable to that of OTTER, with the former being more efficient on near-propositional non-Horn problems.

References

HS99
Hodgson K., Slaney J.K. (1999), Semantic guidance with SCOTT, Technical Report TR-ARP-01-99, Automated Reasoning Project, Australian National University, Canberra, Australia.
MW97
McCune W.W., Wos L. (1997), Otter: The CADE-13 Competition Incarnations, Journal of Automated Reasoning 18(2), pp.211-220.
McC94
McCune W.W. (1994), Otter 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, USA.
McC98a
McCune W.W. (1998), Otter: An Automated Deduction System, http://www.mcs.anl.gov/home/mccune/ar/otter/.
McC98b
McCune W.W. (1998), MACE: Models and Counterexamples, http://www.mcs.anl.gov/home/mccune/ar/mace/.
Zha97
Zhang H. (1997), SATO: an Efficient Propositional Prover, http://www.cs.uiowa.edu/~hzhang/sato/.
Zha99
Zhang H. (1999), Automated Construction of Rewrite Rule Models, http://www.cs.uiowa.edu/~hzhang/model/.
Acknowledgments: Without Bill McCune's OTTER, there would be no HOTTER.


MUSCADET v2.1

D. Pastre
Université René Descartes (Paris 5), UFR de Mathématiques et Informatique
pastre@math-info.univ-paris5.fr

Architecture

MUSCADET is a theorem prover based upon Natural Deduction, in the sense of [Ble71] and [Pas78]. It is built as a knowledge-based system, that is, methods are expressed by rules interpreted by an inference engine. Some rules are universal mathematical knowledge and are given to the system. Other rules are specific to some concepts and are built by metarules from the definitions of these concepts. The notions of hypothesis and conclusion to be proved are privileged. The first conclusion is the first order statement of the conjecture.

Methods are those used by humans, and are executed according to a plan decided by metarules and heuristics. Methods include splitting theorems into two or more subtheorems (conjunctive conclusion or disjunctive hypotheses), treatment of universal, existential or disjunctive conclusions, treatment of existential or disjunctive hypotheses (one at a time, when decided by heuristic rules), and replacement of the conclusion by its definition. Hypotheses are never replaced by their definitions. Instead, rules are built by metarules from the definitions and used to deduce new hypotheses from old ones. Universal hypotheses are not added; instead, rules are built from them, as from the definitions. Functional symbols may be used, but an automatic creation of intermediate objects allows deep subformulae to be flattened and treated as if concepts were defined by predicate symbols. Positive properties are privileged, in hypotheses and in the conclusion, so negation receives a special treatment. The successive steps of a proof may be forward deduction (deduce new hypotheses from old ones), backward deduction (replace the conclusion by a new one) or refutation (only if the conclusion is a negation).

The system is also able to work with second order statements. It may also receive knowledge and know-how for a specific domain from a human user; see [Pas89] and [Pas93]. These two possibilities are not used while working with the TPTP Library.

Implementation

The first version of MUSCADET was a real rule-based system [Pas89]. Rules and metarules were written in a declarative language defined for this purpose and interpreted by an inference engine written in Pascal. The competition version is entirely written in SWI-Prolog. Rules are written as declarative Prolog clauses. Metarules are written as sets of Prolog clauses, more or less declarative. The Pascal inference engine is replaced by the Prolog interpreter and some procedural Prolog clauses.

Expected Competition Performance

MUSCADET will perform best on problems involving many concepts, in which all statements (conjecture, definitions, axioms) are expressed in a manner similar to the practice of humans, especially of mathematicians. It will perform poorly on problems using few concepts but large and deep formulae leading to many splittings.

References

Ble71
Bledsoe W.W. (1971), Splitting and Reduction Heuristics in Automatic Theorem Proving, Artificial Intelligence 2, pp.55-77.
Pas78
Pastre D. (1978), Automatic Theorem Proving in Set Theory, Artificial Intelligence 10, pp.1-27.
Pas89
Pastre D. (1989), MUSCADET : An Automatic Theorem Proving System using Knowledge and Metaknowledge in Mathematics, Artificial Intelligence 38, pp.257-318.
Pas93
Pastre D. (1993), Automated Theorem Proving in Mathematics, Annals of Mathematics and Artificial Intelligence 8, pp.425-447.


Otter-MACE 437

W. McCune
Mathematics and Computer Science, Argonne National Laboratory
mccune@mcs.anl.gov

Architecture

Otter [McC94,McC98a,MW97] searches for proofs, and MACE [McC98b] searches for small finite models. Both programs apply to statements in first-order (unsorted) logic with equality. The two programs are independent and work by completely different methods. (In practice, the programs are complementary.) Version 3.0.5 of Otter has been used. Otter is based on resolution and paramodulation applied to clauses; an Otter search uses the "given clause algorithm" and typically involves a large database of clauses; subsumption and demodulation play an important role. Version 1.3.2 of MACE has been used. Roughly speaking, MACE works as follows. For a given domain size, an equivalent propositional problem is constructed and a Davis-Putnam-Loveland decision procedure is applied. MACE typically cannot handle clauses with many variables or deeply nested terms. Otter-MACE 437 (= Otter 3.0.5 + MACE 1.3.2) is a simple combination of the two programs.

Implementation

Both programs are written in C. Otter uses shared data structures for clauses and terms, and it uses indexing for resolution, paramodulation, forward and backward subsumption, forward and backward demodulation, and unit conflict. MACE is much simpler than Otter; its Davis-Putnam-Loveland procedure uses a counting algorithm to do the unit resolutions, and forward subsumption is not applied. Otter-MACE is a shell script that runs Otter for a few seconds, then (if Otter has not found a refutation) MACE for a few seconds, then (if MACE has not found a model) Otter again for the remaining time.
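
The driver itself is a shell script; the following Python rendering is our own sketch of the Otter/MACE/Otter interleaving described above (the program invocations, output markers and the length of the initial time slices are placeholders, not the actual script):

    import subprocess

    def try_run(command, problem_file, seconds, success_marker):
        """Run a prover with a time limit; return its output on success."""
        try:
            result = subprocess.run(command + [problem_file], timeout=seconds,
                                    capture_output=True, text=True)
            if success_marker in result.stdout:
                return result.stdout
        except subprocess.TimeoutExpired:
            pass
        return None

    def otter_mace(problem_file, total_seconds, first_slice=20):
        # 1. Otter for a few seconds, looking for a refutation.
        out = try_run(["otter"], problem_file, first_slice, "PROOF")
        if out:
            return ("refutation found", out)
        # 2. MACE for a few seconds, looking for a small finite model.
        out = try_run(["mace"], problem_file, first_slice, "model")
        if out:
            return ("model found", out)
        # 3. Otter again for the remaining time.
        out = try_run(["otter"], problem_file,
                      total_seconds - 2 * first_slice, "PROOF")
        return ("refutation found", out) if out else ("gave up", None)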

Expected Competition Performance

These versions of Otter and MACE are essentially the same as the versions entered in the previous two CASC competitions. Both programs generally perform well at CASC, but no top prizes have been won since CASC-13. We do not expect any winning performances in CASC-16. We see our entry as a benchmark for measuring improvements in other systems.

References

MW97
McCune W.W., Wos L. (1997), Otter: The CADE-13 Competition Incarnations, Journal of Automated Reasoning 18(2), pp.211-220.
McC94
McCune W.W. (1994), Otter 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, USA.
McC98a
McCune W.W. (1998), Otter: An Automated Deduction System, http://www.mcs.anl.gov/home/mccune/ar/otter/.
McC98b
McCune W.W. (1998), MACE: Models and Counterexamples, http://www.mcs.anl.gov/home/mccune/ar/mace/.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and Rusty Lusk contributed to the development of Otter. MACE was motivated by previous work of John Slaney, Mark Stickel, and Hantao Zhang.


SCOTT IV

K. Hodgson and J. Slaney
Automated Reasoning Project, Australian National University
{kahlil, jks}@arp.anu.edu.au

Architecture

SCOTT (Semantically Constrained OTTER) [HS99] is an extension of OTTER [McC94] and uses the same inference rules and overall proof search structure. This extension incorporates FINDER [Sla94] (a finite model generator) as a subsystem of OTTER in order to determine (maximally) consistent subsets of the clause space. These maximally consistent subsets are used to guide the proof search by allowing us to increase the likelihood of certain necessary inferences being made.
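
The guiding principle can be caricatured in a few lines of Python (ours, not SCOTT's selection code; the model test and weight function are assumed to be supplied by FINDER and OTTER respectively): since clauses true in the model of a consistent subset cannot by themselves yield a refutation, clauses falsified by that model are preferred as given clauses.

    def pick_given_clause(candidates, true_in_model, weight):
        """true_in_model(clause) -> does the clause hold in the model of the
        maximally consistent subset?  weight(clause) -> usual symbol-count
        weight, used for selection within the preferred pool."""
        false_clauses = [c for c in candidates if not true_in_model(c)]
        pool = false_clauses if false_clauses else candidates
        return min(pool, key=weight)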

Unlike previous versions [Sla93,SLM94], the current version of SCOTT searches for maximally consistent subsets much more aggressively, and manages the cost of this much more intelligently. Most importantly, however, our naïve notions of semantic guidance have been superseded by those with genuine theoretical support. In addition, semantic resolution has been tentatively implemented and, provided it can be made sufficiently stable, will be incorporated in the competition version.

Implementation

SCOTT is written in C. It uses the existing program OTTER to perform the proof search and additional code to determine which clauses should be chosen for inference. This additional code calls the existing program FINDER in order to witness the consistency of particular sets of clauses.

Expected Competition Performance

In principle SCOTT should perform at least as well as OTTER across all parts of the MIX and UEQ divisions, if only because it has OTTER as a subroutine. Preliminary results show that, when SCOTT does find a proof, it requires at least an order of magnitude fewer inferences than OTTER. However, searching for models and testing clauses against them can be costly, and so SCOTT may perform poorly if the time-limit is too severe. Overall, we expect good performance in comparison with cognate systems that employ relatively naïve guidance.

References

HS99
Hodgson K., Slaney J.K. (1999), Semantic guidance with SCOTT, Technical Report TR-ARP-01-99, Automated Reasoning Project, Australian National University, Canberra, Australia.
McC94
McCune W.W. (1994), Otter 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, USA.
SLM94
Slaney J.K., Lusk E., McCune W.W. (1994), SCOTT: Semantically Constrained Otter, System Description, Bundy A., Proceedings of the 12th International Conference on Automated Deduction (Nancy, France), pp.764-768, Lecture Notes in Artificial Intelligence 814, Springer-Verlag.
Sla93
Slaney J.K. (1993), SCOTT: A Model-Guided Theorem Prover, Bajcsy R., Proceedings of the 13th International Conference on Artificial Intelligence (Chambery, France), pp.109-114, Morgan-Kaufman.
Sla94
Slaney J.K. (1994), FINDER: Finite Domain Enumerator, System Description, Bundy A., Proceedings of the 12th International Conference on Automated Deduction (Nancy, France), pp.798-801, Lecture Notes in Artificial Intelligence 814, Springer-Verlag.


SSCPA 1999.6

G. Sutcliffe
Department of Computer Science, James Cook University
geoff@cs.jcu.edu.au

Architecture

SSCPA [SS99a] is an uncooperative multiple calculus competition parallelism ATP system that multitasks on a single CPU. SSCPA uses the syntactic characteristics of the given problem to classify it into one of 18 predefined specialist problem classes (SPCs). SSCPA then uses performance data from available sequential ATP systems to recommend those systems that perform well for that SPC. The recommended systems are run in parallel, in one of several user-selectable modes.

The choice of what syntactic characteristics are used to form the SPCs is based on community input and analysis of system performance data. Performance data for the available ATP systems, for problems in the TPTP [SS98], is used to evaluate the systems within each SPC.

SSCPA has five modes of operation, which differ in the systems chosen, the CPU allocation to each system, the CPU priority of each system, and the start time of each system. In CASC-16 SSCPA will run the eager mode, which uses three or four recommended systems, CPU allocations halving from the time limit, equal CPU priority, and all systems starting together.
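
Under one reading of "CPU allocations halving from the time limit" (an assumption on our part; the exact scheme is internal to SSCPA), the allocations could be computed as in this small sketch:

    def eager_allocations(time_limit, systems):
        """Give the first recommended system half of the time limit and each
        subsequent system half of the previous allocation (illustrative)."""
        allocations = {}
        share = time_limit / 2
        for system in systems:
            allocations[system] = share
            share /= 2
        return allocations

    # Example: four recommended systems under a 300 second limit.
    print(eager_allocations(300, ["system1", "system2", "system3", "system4"]))
    # -> {'system1': 150.0, 'system2': 75.0, 'system3': 37.5, 'system4': 18.75}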

Implementation

SSCPA uses perl scripts that extract system recommendations from the performance data and control the parallel execution of the systems. The tptp2X utility is used to convert the problem to the formats expected by the ATP systems. A WWW interface to some SSCPA modes is publicly available at:
    http://www.cs.jcu.edu.au/cgi-bin/tptp/SystemOnTPTPFormMaker

Expected Competition Performance

SSCPA is expected to perform well in the more heterogeneous divisions, i.e., MIX and FOF. The benefits of SSCPA's architecture are less pronounced in divisions where single systems could dominate, e.g., UEQ, where Waldmeister may dominate as it did in CASC-15 [SS99b].

References

SS99a
Sutcliffe G., Seyfang D. (1999), Smart Selective Competition Parallelism ATP, Kumar A., Russell I., Proceedings of the 12th Florida Artificial Intelligence Research Symposium (Orlando, USA), pp.341-345, AAAI Press.
SS98
Sutcliffe G., Suttner C.B. (1998), The TPTP Problem Library: CNF Release v1.2.1, Journal of Automated Reasoning 21(2), pp.177-203.
SS99b
Sutcliffe G., Suttner C.B. (1999), The CADE-15 ATP System Competition, Journal of Automated Reasoning.


SPASS 0.95TPTP

C. Weidenbach
Max-Planck-Institut für Informatik
weidenb@mpi-sb.mpg.de

Architecture

SPASS [Wei97] is an automated theorem prover for full sorted first-order logic with equality. Its calculus is an extension of superposition by sorts [Wei98] and a splitting rule for case analysis. In contrast to many approaches to order-sorted clausal reasoning, the calculus enables sort predicates and equations to occur arbitrarily within clauses. Therefore, the sort theory is not separated from the problem clauses, but automatically and dynamically extracted [GMW97]. Nevertheless, it can be used to significantly cut the search space.

The splitting rule splits clauses, if possible, into variable disjoint parts that can be independently refuted. This rule is not necessary for completeness of the calculus but it makes SPASS an efficient decision procedure for several decidable subclasses of first-order logic [HS97].
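 
The test behind the splitting rule, whether a clause decomposes into variable-disjoint parts, can be sketched as follows in Python (our own toy code and toy literal syntax, not SPASS's implementation):

    import re

    def variables(literal):
        """Toy syntax: variables are the upper-case identifiers in the literal string."""
        return set(re.findall(r"\b[A-Z]\w*", literal))

    def variable_disjoint_parts(clause):
        """Group the literals of a clause into maximal parts that share no
        variables; if more than one part results, the clause can be split
        and each part refuted independently."""
        parts = []                            # list of [variable set, literal list]
        for literal in clause:
            vs = variables(literal)
            overlapping = [p for p in parts if p[0] & vs]
            merged_vars, merged_lits = set(vs), []
            for p in overlapping:
                merged_vars |= p[0]
                merged_lits += p[1]
                parts.remove(p)
            merged_lits.append(literal)
            parts.append([merged_vars, merged_lits])
        return [lits for _, lits in parts]

    print(variable_disjoint_parts(["p(X)", "q(X,Y)", "r(Z)"]))
    # -> [['p(X)', 'q(X,Y)'], ['r(Z)']] : splittable into two parts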

We have put a lot of effort into a clause normal form translation that produces small clause normal forms. The major techniques are renaming and improved Skolemization [NRW98].

Implementation

SPASS is written in ANSI-C and relies on a generic library that supports data structures and efficient functions for automated theorem proving.

The SPASS distribution containing the source code for SPASS is freely available from the SPASS homepage. The code should compile and run on any machine where the usual GNU tools are available. This has been tested on SUN Sparc workstations under Solaris and SunOS, on personal computers under Linux and Solaris and on Dec Alpha workstations under Digital Unix.

Expected Competition Performance

We hope to improve on the performance of SPASS at CASC-13 and CASC-14.

References

GMW97
Ganzinger H., Meyer C., Weidenbach C. (1997), Soft Typing for Ordered Resolution, McCune W.W., Proceedings of the 14th International Conference on Automated Deduction (Townsville, Australia), pp.321-335, Lecture Notes in Artificial Intelligence 1249, Springer-Verlag.
HS97
Hustadt U., Schmidt R.A. (1997), On Evaluating Decision Procedures for Modal Logics, Pollack M.E., Proceedings of the 15th International Joint Conference on Artificial Intelligence (Nagoya, Japan), pp.202-207.
NRW98
Nonnengart A., Rock G., Weidenbach C. (1998), On Generating Small Clause Normal Forms, Kirchner C., Kirchner H., Proceedings of the 15th International Conference on Automated Deduction (Lindau, Germany), Lecture Notes in Artificial Intelligence, Springer-Verlag.
Wei97
Weidenbach C. (1997), SPASS: Version 0.49, Journal of Automated Reasoning 18(2), pp.247-252.
Wei98
Weidenbach C. (1998), Sorted Unification and Tree Automata, Bibel W., Schmitt P.H., Automated Deduction, A Basis for Applications, Kluwer Academic Publishers.


SPASS 1.0.0TPTP

C. Weidenbach
Max-Planck-Institut für Informatik
weidenb@mpi-sb.mpg.de

Architecture

SPASS [WA+99] is an automated theorem prover for first-order logic with equality. It is a saturation-based prover employing superposition, sorts and splitting [Wei99]. The current version 1.0.0 offers a variety of further inference and reduction rules, including hyperresolution, unit-resulting resolution, various variants of paramodulation, and a terminator [AO83].

The prover is available for many platforms, in particular, we now also provide a Windows 95/98/NT version including a neat graphical user interface.

Implementation

SPASS is implemented in ANSI C and relies on our internal library supporting specific data structures and algorithms like, e.g., indexing or orderings (KBO, RPOS). The code is documented, flexible and was designed to be reusable. The code should compile on any platform supporting an ANSI C compiler.

Expected Competition Performance

The SPASS competition version 1.0.0TPTP is tuned towards TPTP problems. In particular, the standard automatic configuration module is replaced by a module adapted to the TPTP. We hope that the new version can improve over the version 0.95TPTP that entered the previous competition.

References

AO83
Antoniou G., Ohlbach H-J. (1983), Terminator, Bundy A., Proceedings of the 8th International Joint Conference on Artificial Intelligence (Karlsruhe, Germany), pp.916-919.
WA+99
Weidenbach C., Afshordel B., Brahm U., Cohrs C., Engel T., Keen E., Theobalt C., Topic D. (1999), System Description: SPASS Version 1.0.0, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), Lecture Notes in Artificial Intelligence, Springer-Verlag.
Wei99
Weidenbach C. (1999), SPASS: Combining Superposition, Sorts and Splitting, Robinson A., Voronkov A., Handbook of Automated Reasoning, Elsevier Science.


Vampire 0.0

A. Riazanov, A. Voronkov
Computer Science Department, The University of Manchester
{riazanoa,voronkov}@cs.man.ac.uk

Architecture

Vampire is a resolution-based theorem prover for first-order classical logic. The current version implements ordered binary resolution and hyperresolution, superposition, and some simplifications. For proof search Vampire uses a saturation-based procedure similar to that of Otter [McC94]. One interesting feature of Vampire is the so-called limited resource strategy. The idea is as follows. When a time limit is given, Vampire estimates which clauses (the currently eligible clauses) can be processed by the end of the time limit. For each newly generated clause it is checked whether it is currently eligible; if it is not, Vampire discards the clause. The estimation principle takes into consideration the current strategy and tries to calculate how the process of clause generation will develop by the time limit.
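
A much simplified sketch of such an eligibility test (ours, not Vampire's estimator, which depends on the current strategy; the rate-based formula below is an invented stand-in):

    import time

    class LimitedResourceFilter:
        """Discard newly generated clauses that, at the current processing
        rate, cannot be picked from the passive queue before the time limit."""

        def __init__(self, time_limit):
            self.time_limit = time_limit
            self.start = time.time()
            self.processed = 0                 # clauses activated so far

        def clause_processed(self):
            self.processed += 1

        def currently_eligible(self, queue_position):
            """queue_position: number of passive clauses ahead of this one."""
            elapsed = time.time() - self.start
            if elapsed <= 0 or self.processed == 0:
                return True                    # no data yet: keep everything
            rate = self.processed / elapsed    # clauses processed per second
            remaining = self.time_limit - elapsed
            return queue_position <= rate * remaining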

Implementation

To gain efficiency we implement the most costly set-at-a-time operations, such as resolution, superposition and subsumption, using a combination of indexing and compilation. We use a variant of code trees [Vor95] for forward subsumption and forward demodulation, path indexing [Sti89] for backward subsumption, and a variant of discrimination trees [McC92] for resolution and superposition. All the indexing techniques used provide so-called "perfect filtering" [Gra96,RSV99]. For storing persistent clauses we use a tree-like representation of terms. The sharing is aggressive. For other operations we use different versions of flatterms. Vampire is implemented in C++ and compiled with gcc.

Expected Competition Performance

We expect Vampire to perform well, especially on the problems without equality.

References

Gra96
Graf P. (1996), Term Indexing, Lecture Notes in Computer Science 1053, Springer-Verlag.
McC92
McCune W.W. (1992), Experiments with Discrimination-Tree Indexing and Path Indexing for Term Retrieval, Journal of Automated Reasoning 9(2), pp.147-167.
McC94
McCune W.W. (1994), Otter 3.0 Reference Manual and Guide, Technical Report ANL-94/6, Argonne National Laboratory, Argonne, USA.
RSV99
Ramakrishnan I.V., Sekar R., Voronkov A. (1999), Term Indexing, Robinson A., Voronkov A., Handbook of Automated Reasoning, Elsevier Science.
Sti89
Stickel M.E. (1989), The Path Indexing Method for Indexing Terms, Technical Note 473, SRI International, Menlo Park, USA.
Vor95
Voronkov A. (1995), The Anatomy of Vampire, Journal of Automated Reasoning 15(2), pp.237-265.


Waldmeister 798

T. Hillenbrand, A. Jaeger, B. Löchner, and A. Buch
FB Informatik, Universität Kaiserslautern
{hillen,jaeger,loechner}@informatik.uni-kl.de,arnim.buch@sdm.de

Architecture

Waldmeister is a theorem prover for unit equational logic. Its proof procedure is unfailing Knuth-Bendix completion [BDP89]. We started the development from a three-level model of saturation-based ATP systems [BH96,HB+97]. Since the efforts of the last years have led to a satisfactory inference machine, we now focus on the question of how to guide the proof search.

The main completion loop is parameterized by a selection heuristic and a reduction ordering. Both are derived from the algebraic structure described by the axioms. Since last year, the derivation process has been lifted such that control knowledge is expressed declaratively, which eases the integration of human experience. For full flexibility, the search parameters may be re-instantiated during the proof search.

Recently, we have also realized a database-driven mechanism which allows simple extensions of the underlying inference calculus, e.g., for reasoning on Horn clauses as described in [BDP89, pp.24ff].

Implementation

Reaching efficiency both in terms of time and space has been a major goal in the development of our inference machine. As to the first, fast rewriting is achieved by sophisticated indexing techniques. As to the second, the set of critical pairs is represented in a compressed data structure.

To incorporate goal orientation into the completion procedure, the hypotheses are blown up to sets of rewrite successors and sometimes predecessors. A step-by-step proof object is constructed at run-time.

Our ATP system is implemented in ANSI C and runs under SunOS, Solaris and Linux. The Waldmeister distribution contains an easy-to-use version along with documentation material and problem sets. If you are interested in tackling problems with Waldmeister, just send an e-mail to the authors.

Expected Competition Performance

We conjecture that Waldmeister will solve the majority of the competition problems in less time than one would need to study the corresponding problem specifications.

References

BDP89
Bachmair L., Dershowitz N., Plaisted D.A. (1989), Completion Without Failure, Ait-Kaci H., Nivat M., Resolution of Equations in Algebraic Structures, pp.1-30, Academic Press.
BH96
Buch A., Hillenbrand T. (1996), Waldmeister: Development of a High Performance Completion-Based Theorem Prover, SEKI-Report SR-96-01, AG Effiziente Algorithmen, Universität Kaiserslautern, Kaiserslautern, Germany.
HB+97
Hillenbrand T., Buch A., Vogt R., Löchner B. (1997), Waldmeister: High Performance Equational Deduction, Journal of Automated Reasoning 18(2), pp.265-270.


Waldmeister 799

T. Hillenbrand, A. Jaeger, B. Löchner, and A. Buch
FB Informatik, Universität Kaiserslautern
{hillen,jaeger,loechner}@informatik.uni-kl.de,arnim.buch@sdm.de

This year's version of our unit equality theorem prover is in most parts identical to Waldmeister 798, which won the CASC-15 UEQ competition with an enormous lead [SS99]. Since last year's description is included in this collection, we will only report on the differences here.

Architecture

Waldmeister is an implementation of unfailing Knuth-Bendix completion [BDP89]. Our approach to control the proof search is to choose selection strategy and reduction ordering according to the algebraic structure given in the problem specification [HJL99]. Since Waldmeister 798, the corresponding part of the system has been developed further into a robust component that is easy to maintain. In addition we have systematized the derivation of useful control knowledge, which also gave rise to progress on some problem domains.

Recently, Waldmeister has successfully been integrated into the Omega proof development environment [BC+97]. Moreover, we have established a convenient WWW interface that is reachable via:

    http://www-avenhaus.informatik.uni-kl.de/waldmeister
From this URL, further documents and the complete Waldmeister distribution, containing an easy-to-use version together with problems and a manual, are available.

Implementation

No major changes took place. Some portions of code have been slightly revised, e.g., a number of time-consuming inner loops have carefully been reworked.

Expected Competition Performance

We expect some improvements in comparison with last year's version, depending on problem domains and proof tasks.

References

BDP89
Bachmair L., Dershowitz N., Plaisted D.A. (1989), Completion Without Failure, Ait-Kaci H., Nivat M., Resolution of Equations in Algebraic Structures, pp.1-30, Academic Press.
BC+97
Benzmüller C., Cheikhrouhou L., Fehrer D., Fiedler A., Huang X., Kerber M., Kohlhase M., Konrad K., Meier A., Melis E., Schaarschmidt W., Siekmann J., Sorge V. (1997), OMEGA: Towards a Mathematical Assistant, McCune W.W., Proceedings of the 14th International Conference on Automated Deduction (Townsville, Australia), pp.146-160, Lecture Notes in Artificial Intelligence 1249, Springer-Verlag.
HJL99
Hillenbrand T., Jaeger A., Löchner B. (1999), Waldmeister - Improvements in Performance and Ease of Use, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), Lecture Notes in Artificial Intelligence, Springer-Verlag.
SS99
Sutcliffe G., Suttner C.B. (1999), The CADE-15 ATP System Competition, Journal of Automated Reasoning.