Entrants' System Descriptions

Darwin 1.3
FM-Darwin 1.3
DCTP 10.21p (CASC-20 EPR division winner)
E and EP 0.99
Equinox 1.0
Faust 1.0
Geo 2006i
iProver 0.1
MathServe 0.80
Muscadet 2.6
Octopus 2006
Otter 3.3
Paradox 1.3 (CASC-20 SAT division winner)
Paradox 2.0
Theo 2006
Vampire 8.0 (CASC-20 MIX and FOF divisions winner)
Vampire 8.1
Waldmeister 704 (CASC-20 UEQ division winner)
Waldmeister 806

Darwin 1.3

Peter Baumgartner¹, Alexander Fuchs², Cesare Tinelli²
¹National ICT Australia, Australia,
Peter.Baumgartner[@]nicta.com.au
²The University of Iowa, USA
{fuchs,tinelli}[@]cs.uiowa.edu

Architecture

Darwin [BFT04, BFT06a] is an automated theorem prover for first order clausal logic. It is an implementation of the Model Evolution Calculus [BT03]. The Model Evolution Calculus lifts the propositional DPLL procedure to first-order logic. One of the main motivations for this approach was the possibility of migrating to the first-order level some of those very effective search techniques developed by the SAT community for the DPLL procedure.

The current version of Darwin implements first-order versions of unit propagation inference rules analogously to a restricted form of unit resolution and subsumption by unit clauses. To retain completeness, it includes a first-order version of the (binary) propositional splitting inference rule.

Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved towards a model or until a refutation is found.

Implementation

The central data structure is the context. A context represents an interpretation as a set of first-order literals. The context is grown by using instances of literals from the input clauses. The implementation of Darwin is intended to support basic operations on contexts in an efficient way. This involves the handling of large sets of candidate literals for modifying the current context. The candidate literals are computed via simultaneous unification between given clauses and context literals. This process is sped up by storing partial unifiers for each given clause and merging them for different combinations of context literals, instead of redoing the whole unifier computations. For efficient filtering of unneeded candidates against context literals, discrimination tree or substitution tree indexing is employed. The splitting rule generates choice points in the derivation which are backtracked using a form of backjumping similar to the one used in DPLL-based SAT solvers.

Improvements to the previous version include additional preprocessing steps, less memory requirements, and lemma learning [BFT06b].

Darwin is implemented in OCaml and has been tested under various Linux distributions. It is available from:

    http://goedel.cs.uiowa.edu/Darwin/

Strategies

Darwin traverses the search space by iterative deepening over the term depth of candidate literals. Darwin employs a uniform search strategy for all problem classes.

Expected Competition Performance

We expect its performance to be very strong in the EPR divisions; we anticipate performance below average in the CNF and FOF divisions, and weak performance in the SAT division.

FM-Darwin 1.3

Peter Baumgartner¹, Alexander Fuchs², Cesare Tinelli²
¹National ICT Australia, Australia,
Peter.Baumgartner[@]nicta.com.au
²The University of Iowa, USA
{fuchs,tinelli}[@]cs.uiowa.edu

Architecture

FM-Darwin [BF+06] is a finite model finder for first order clausal logic with equality.

The system tries to find a finite domain model for sizes 1,2,.. by equisatisfiable preserving transformation into corresponding function-free clause sets, which are decided by the Darwin prover [BFT06a].

FM-Darwin operates similarly to the finite model finder Paradox [CS03]. Unlike Paradox, it is not based on reduction to propositional satisfiability problems, but on reduction to satisfiability problems of (not necessarily ground) function-free clause sets instead.

Implementation

FM-Darwin is implemented as an extension to Darwin, which can now be used in its default mode or as a finite model finder.

Similar to Paradox FM-Darwin uses clause splitting, term definitions, static symmetry reduction, sort inference, and lemmas. In constrast, clause splitting and term definitions are only applied in a restricted way, that is for variable disjunct clause and ground terms, as ground instantiation is not performed and thus the exponential increase in the size of the clause set does not happen.

FM-Darwin is available from:

    http://goedel.cs.uiowa.edu/Darwin/

Strategies

FM-Darwin implements strategies that are also used in other finite domain model finders.

Like Paradox, FM-Darwin is a complete method for function-free clause sets, and can also detect unsatisfiability if the totality axioms are not used in a refutation.

Expected Competition Performance

We expect its performance to be good in the SAT division and average in the EPR divison.

DCTP 10.21p

Gernot Stenz
Technische Universität München, Germany
stenzg[@]informatik.tu-muenchen.de

Architecture

DCTP 1.31 [Ste02a] is an automated theorem prover for first order clause logic. It is an implementation of the disconnection calculus described in [Bil96,LS01,Ste02b]. The disconnection calculus is a proof confluent and inherently cut-free tableau calculus with a weak connectedness condition. The inherently depth-first proof search is guided by a literal selection based on literal instantiatedness or literal complexity and a heavily parameterised link selection. The pruning mechanisms mostly rely on different forms of variant deletion and unit based strategies. Additionally the calculus has been augmented by full tableau pruning.

DCTP 10.21p is a strategy parallel version using the technology of E-SETHEO [SW99] to combine several different strategies based on DCTP 1.31.

Implementation

DCTP 1.31 has been implemented as a monolithic system in the Bigloo dialect of the Scheme language. The most important data structures are perfect discrimination trees, which are used in many variations. DCTP 10.21p has been derived of the Perl implementation of E-SETHEO and includes DCTP 1.31 as well as the E prover as its CNF converter. Both versions run under Solaris and Linux.

We are currently integrating a range of new techniques into DCTP which are mostly based on the results described in [LS04], as well as a certain form of unit propagation. We are hopeful that these improvements will be ready in time for CASC-J2.

Strategies

DCTP 1.31 is a single strategy prover. Individual strategies are started by DCTP 10.21p using the schedule based resource allocation scheme known from the E-SETHEO system. Of course, different schedules have been precomputed for the syntactic problem classes. The problem classes are more or less identical with the sub-classes of the competition organisers. Again, we have no idea whether or not this conflicts with the organisers' tuning restrictions.

Expected Competition Performance

DCTP 10.21p is the CASC-20 EPR division winner.

E and EP 0.99

Stephan Schulz
Institut für Informatik, Technische Universität, Germany
schulz[@]eprover.org

Architecture

E 0.99 [Sch02,Sch04a] is a purely equational theorem prover. The core proof procedure operates on formulas in clause normal form, using a calculus that combines superposition (with selection of negative literals) and rewriting. No special rules for non-equational literals have been implemented, i.e., resolution is simulated via paramodulation and equality resolution. The basic calculus is extended with rules for AC redundancy elemination, some contextual simplification, and pseudo-splitting. The latest version of E also supports simultaneous paramodulation, either for all inferences or for selected inferences.

E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e. a strict separation of active and passive facts. Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of preprogrammed literal selection strategies, many of which are only experimental. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

The prover uses a preprocessing step to convert formulas in full first order format to clause normal form. This step may introduce (first-order) definitions to avoid an exponential growth of the formula. Preprocessing also unfolds equational definitions and performs some simplifications on the clause level.

The automatic mode determines literal selection strategy, term ordering, and search heuristic based on simple problem characteristics of the preprocessed clausal problem.

EP 0.99 is just a combination of E 0.99 in verbose mode and a proof analysis tool extracting the used inference steps.

Implementation

E is implemented in ANSI C, using the GNU C compiler. The most outstanding feature is the global sharing of rewrite steps. Current versions of E add rewrite links from rewritten to new terms. In effect, E is caching rewrite operations as long as sufficient memory is available. Other important features are the use of perfect discrimination trees with age and size constraints for rewriting and unit-subsumption, feature vector indexing [Sch04b] for forward and backward subsumption and contextual literal cutting, and a new polynomial implementation of LPO [Loe04b].

The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:

    http://www.eprover.org

EP 0.99 is a simple Bourne shell script calling E and the postprocessor in a pipeline.

Strategies

E's automatic mode is optimized for performance on TPTP. The optimization of E 0.9 is based on about 90 test runs over the library (and previous experience) and consists of the selection of one of about 40 different strategies for each problem. All test runs have been performed on SUN Ultra 60/300 machines with a time limit of 300 seconds (or roughly equivalent configurations). All individual strategies are general purpose, the worst one solves about 49% of TPTP 3.0.1, the best one about 60%. We expect similar optimization for E 0.99.

E distinguishes problem classes based on a number of features, all of which have between 2 and 4 possible values. The most important ones are:

Is the most general non-negative clause unit, Horn, or Non-Horn?
Is the most general negative clauce unit or non-unit?
Are all negative clauses unit clauses?
Are all literals equality literals, are some literas equality literals, or is the problem non-equational?
Are there a few, some, or many clauses in the problem?
Is the maximum arity of any function symbol 0, 1, 2, or greater?
Is the sum of function symbol arities in the signature small, medium, or large?

Wherever there is a three-way split on a numerical feature value, the limits are selected automatically with the aim of splitting the set of problems into approximately equal sized parts based on this one feature.

For classes above a threshold size, we assign the absolute best heuristic to the class. For smaller, non-empty classes, we assign the globally best heuristic that solves the same number of problems on this class as the best heuristic on this class does. Empty classes are assigned the globally best heuristic. Typically, most selected heuristics are assigned to more than one class.

Expected Competition Performance

In the last years, E performed well in most proof categories. We believe that E will again be among the strongest provers in the CNF category. Performance on FOF problems should be competitive and better than in previous years. We hope that E will at least be a useful complement to dedicated systems in the other categories.

EP 0.99 will be hampered by the fact that it has to analyse the inference step listing, an operation that typically is about as expensive as the proof search itself. Nevertheless, it should be competitive among the CNF^* and FOF^* systems.

Equinox 1.0

Koen Claessen
Chalmers University of Technology, Sweden
koen[@]cs.chalmers.se

Give all ground clauses in the problem to a SAT solver.
Run the SAT-solver.
If a contradiction is found, we have a proof and we terminate.
If a model is found, we use the model to indicate which new ground instantiations should be added to the SAT-solver.
Goto 2.

Expected Competition Performance

Equinox is still in its infancy. There should however be problems that it can solve that few other provers can handle.

Faust 1.0

Yury Puzis
University of Miami, USA
yury.puzis[@]gmail.com

Architecture

Faust 1.0 is a fully concurrent automatic theorem prover for fist order logic. Faust implements binary resolution, full resolution, superposition, equality resolution, forward subsumption resolution, simplifications such as propositional tautology deletion, factoring of identical literals, equality tautology deletion, trivial equality resolution, as well as forward and backward subsumption. Faust has been designed as a multi-threaded application to allow concurrent execution of logically incompatible strategies. The idea that governs the design of the architecture is that collaborative specialist strategies can be more scalable then powerful multipurpose strategies executed in sequence. For this purpose a 3-layered architecture was designed:

Kernel that functions as a persistent storage of inferences, provides indexing, implementation of all the inference rules and simplifications, and engages strategies built on top of it in a non-intrusive, transparent interaction.
Individual strategies
Strategy Manager that examines the problem, splits it into subproblems and chooses the most appropriate strategy for each.

Implementation

Faust is implemented in C++ and supports gcc 4.0+. All strategies (except the base one) are implemented by inheriting from each other. The JJParser library is used in order to parse the initial problem file.

The website for the project is at this address:

    http://umsis.miami.edu/~ypuzis/FAUST-AutomatedTheoremProver.html

Strategies

Implemented strategies include model resolution, unit resolution and semantic resolution. The Strategy Manager attempts to show contradiction by:

Putting each negated conjecture clause into individual subproblem.
Augmenting each subset with axioms that pass the pure literal deletion test (with respect to clauses in the subproblem).
Choosing an appropriate strategy and ordering method for each subproblem (based on, for example, presence or absence of equality clauses).

Common to all strategies is an Otter loop, modified to correctly and efficiently handle formulae that are found as a consequence of kernel-level interaction with a another strategy. Ordering methods include age-weight ratio, KBO, and other heuristics. Kernel-level strategy interaction includes:

Performing subsumption against inferences obtained from all threads.
Inferences done once (including by a different thread) are never redone.

Expected Competition Performance

Faust has a relatively extensive ammunition of inference rules and is implemented efficiently. However, the advantages of concurrent, collaborative execution of strategies are far from being fully exploited in the current version. Therefore, the expected performance is modest.

Geo 2006i

Hans de Nivelle¹, Jia Meng²
¹Max-Planck Institut für Informatik, Germany
nivelle[@]mpi-inf.mpg.de
²National ICT Australia, Australia
jiameng[@]rsise.anu.edu.au

Architecture

Geo is based on geometric resolution. A geometric formula is a formula with form:

   Forall x1, ..., xn 
   A1 /\ ... /\ Ap /\ v1 != w1 /\ ... /\ vq != wq -> Z(x1,...,xn).

The Ai must be non-equality atoms. The arguments of the atoms Ai must be among the variables x1, ..., xn. There are no function symbols, and no constants in the atoms Ai. The inequalities vj != wj must have their arguments among x1, ..., xn.

Z(x1,...,xn) must have one of the following three forms:

The false constant FALSE.
A disjunction B1 | ... | Br. The atoms Bk must be non-equality atoms. The arguments of the atoms Bk are among the variables x1, ..., xn. There are no function symbols, and no constants in the atoms Bk.
An existential quantification EXISTS y B. y is a variable distinct from x1,...,xn. B is an atom which has all arguments among x1,...,xn,y.

As input, Geo accepts geometric formulas and first-order formulas. First-order formulas are transformed into geometric formulas. During this translation, function symbols have to be removed, and replaced by relations.

During search, Geo searches for a model of the geometric formulas by backtracking. At each choice point, after all subbranches have been refuted, it derives a new rule of Type 1, which refutes the current model attempt without backtracking. In this way, Geo learns during backtracking.

In case Geo finds a finite model, it simply prints the model. In case no model exists, it will eventually learn the rule -> FALSE, so that it is able to output a refutation proof.

The final aim of geometric resolution, and of Geo, is to obtain a prover that is both good at finding proofs, and at finding finite models.

Implementation

Geo is written in C++. We try to keep the code readable and reasonably efficient at the same time.

Strategies

Currently, Geo has only one strategy: It searches for a model by exhaustive search, and learns a new rule at each choice point, after it has refuted all subbranches. There are still many uncertainties in the way geometric resolution has to be implemented, and Geo is only a first attempt.

Expected Competition Performance

Geo was completed too short before the competition. We have not done big testing, so we cannot predict the performance.

iProver 0.1

Konstantin Korovin
The University of Manchester, England
korovin[@]cs.man.ac.uk

Architecture

iProver is based on the instantiation calculus [GK03], which is complete for first-order logic. iProver successively instantiates a given set of clauses, and uses a ground SAT solver for checking whether the current ground abstraction of clauses is satisfiable. If the ground solver finds unsatisfiability then the prover terminates, proving unsatisfiability of the initial set of clauses; otherwise the model for ground clauses is used to guide further instantiations. The saturation process is implemented as a modification of a given clause algorithm. We use non-perfect discrimination trees for the unification index. The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04]; removing duplicates. Equality is dealt with (internally) by adding the necessary axioms of equality.

Implementation

iProver is implemented in OCaml and for the ground reasoning uses C version of MiniSat implemented by Niklas Sörensson. The interface from MiniSat-C to OCAML is implemented by Ahmad Abu-Khazneh.

Strategies

Various literal selection functions are implemented preferring negative/positive, shallow/deep literals. Parameters of the instantiation loop include: frequency of calling the SAT solver and selection of the given clause based on its properties.

Expected Competition Performance

This is the first implementation, without many optimizations and more importantly without equality being built in. Therefore the expected performance is very modest.

MathServe 0.80

Jürgen Zimmer, Serge Autexier
Universität des Saarlandes, Germany
{jzimmer,serge}[@]ags.uni-sb.de

Architecture

MathServe is a framework for integrating reasoning systems as Web Services into a networked environment. The functionality of these Reasoning Web Services is captured in Semantic Web Service descriptions using the OWL-S [MarURL] upper ontology for semantic web services. The MathServe Broker is a middle-agent which knows all available reasoning services and answers queries for client applications. Given a problem description, the MathServe Broker can automatically retrieve services and to combine services in case one single service is not sufficient to tackle a problem. Our Broker performs a "semantic" best match by analysing incoming problems and choosing the best service available for that problem.

MathServe currently integrates the ATP systems Otter 3.3, EP 0.91, SPASS 2.2 and Vampire 8.0. All ATP services get a problem description in the new TPTP format (TPTP Library v3.1.1) and a time limit in seconds as an input. The problem is transformed into the provers' input format using the tptp2X utility. The result specifies the status of the given problem with one of the statuses defined in the SZS Status Ontology [SZS04].

Since MathServe does not form an ATP system on its own it only enters the Demonstration Division of the CASC.

Implementation

MathServe is implemented in Java and uses an Apache Tomcat web server and the AXIS package to offer reasoning systems as Web Services. All services are accessible programmatically using WSDL descriptions of their interface. ATP systems are integrated via Java wrappers that manage the translation of the input problems into the prover's input format and the translation of resulting resolution proofs into TPTP format.

Strategies

We collected data on the performance of the underlying ATP systems on all the Specialists Problem Classes (SPCs) [SS01] of the TPTP Library 3.1.1. The OWL-S descriptions of all ATP services have been annotated with this data. The MathServe Broker determines the SPC of an incoming TPTP problem and uses the performance data to choose a suitable prover for that problem.

Expected Competition Performance

At CASC-20, MathServe 0.62 could solve 392 out of 660 problems. Partly, this performance was due to the older versions of EP (0.82) and Vampire (7.0) used. Furthermore, MathServe could not handle big problems with descriptions bigger than 2MB. We improved the problem handling of MathServe and it should now be able to handle big problems. We also integrated newer versions of the ATP systems SPASS, E and Vampire as well as the systems DCTP, Waldmeister and Paradox.

Muscadet 2.6

Dominique Pastre
Université René Descartes (Paris 5), France
pastre[@]math-info.univ-paris5.fr

Architecture

The Muscadet theorem prover [Pas89] is a knowledge-based system. It is based on Natural Deduction, following the terminology of [Ble71] and [Pas78], and uses methods which resembles those used by humans. It is composed of an inference engine, which interprets and executes rules, and of one or several bases of facts, which are the internal representation of "theorems to be proved". Rules are either universal and put into the system, or built by the system itself by metarules from data (definitions and lemmas). Rules may add new hypotheses, modify the conclusion, create objects, split theorems into two or more subtheorems or build new rules which are local for a (sub-)theorem.

Implementation

Muscadet 2 [Pas01] is implemented in SWI-Prolog. Rules are written as declarative Prolog clauses. Metarules are written as sets of Prolog clauses, more or less declarative. The inference engine includes the Prolog interpreter and some procedural Prolog clauses. Muscadet 2 is available from:

  http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html

Strategies

There are specific strategies for existential, universal, conjunctive or disjunctive hypotheses and conclusions. Functional symbols may be used, but an automatic creation of intermediate objects allows deep subformulae to be flattened and treated as if the concepts were defined by predicate symbols. The successive steps of a proof may be forward deduction (deduce new hypotheses from old ones), backward deduction (replace the conclusion by a new one) or refutation (only if the conclusion is a negation).

The system is also able to work with second order statements. It may also receive knowledge and know-how for a specific domain from a human user; see [Pas89, Pas93]. These two possibilities are not used while working with the TPTP Library.

Expected Competition Performance

The best performances of Muscadet will be for problems manipulating many concepts in which all statements (conjectures, definitions, axioms) are expressed in a manner similar to the practice of humans, especially of mathematicians [Pas02,Pas06]. It will have poor performances for problems using few concepts but large and deep formulas leading to many splittings.

Octopus 2006

Monty Newborn
McGill University, Canada
newborn[@]cs.mcgill.ca

Architecture

Octopus 2006 is a parallel version of Theo and is designed to run on as many computers as are available. For this competition, we expect to run on between 100 and 200 computers. Octopus's single processor version, Theo [New01], is a resolution-refutation theorem prover for first order logic. It accepts theorems in either clausal form or FOL form. Theo's inference procedures include binary resolution, binary factoring, instantiation, demodulation, and hash table resolutions.

Implementation

Octopus is written in C and runs under both Linux and FreeBSD. It contains about 70000 lines of source code. It normally runs on a PC with at least 512Mb of memory.

Strategies

Each processor of Octopus is a copy of Theo. When the master receives a theorem to prove, it delivers it to all the slaves with instructions on which weakened version of the theorem to try. Each slave then tries a different weakened version. If a slave finds a proof of a weakened version, it goes on to attempt to prove the given theorem. Strategies used by Octopus are similar to those used by Theo. Theo's procedure is described in Theo's System Description. Once the master has sent the theorem to the slave, there is no communication among the computers except when one finds a proof. If a base clause can be split, however, the slaves split up the effort to find a proof. Each slave returns the results of its effort to the master. When each of the split subproblems has been solved, the master concludes it has a proof.

Expected Competition Performance

Octopus is only marginally better than it was last year.

Otter 3.3

William McCune
Argonne National Laboratory, USA
mccune[@]mcs.anl.gov

Architecture

Otter 3.3 [McC03a] is an ATP system for statements in first-order (unsorted) logic with equality. Otter is based on resolution and paramodulation applied to clauses. An Otter search uses the "given clause algorithm", and typically involves a large database of clauses; subsumption and demodulation play an important role.

Implementation

Otter is written in C. Otter uses shared data structures for clauses and terms, and it uses indexing for resolution, paramodulation, forward and backward subsumption, forward and backward demodulation, and unit conflict. Otter is available from:

    http://www-unix.mcs.anl.gov/AR/otter/

Strategies

Otter's original automatic mode, which reflects no tuning to the TPTP problems, will be used.

Expected Competition Performance

Otter has been entered into CASC as a stable benchmark against which progress can be judged (there have been only minor changes to Otter since 1996 [MW97], nothing that really affects its performace in CASC). This is not an ordinary entry, and we do not hope for Otter to do well in the competition.

Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and Rusty Lusk contributed to the development of Otter.

Paradox 1.3

Koen Claessen, Niklas Sörensson
Chalmers University of Technology and Gothenburg University, Sweden
{koen,nik}[@]cs.chalmers.se

Architecture

Paradox [CS03] is a finite-domain model generator. It is based on a MACE-style [McC94] flattening and instantiating of the first-order clauses into propositional clauses, and then the use of a SAT solver to solve the resulting problem.

Paradox incorporates the following features: Polynomial-time clause splitting heuristics, the use of incremental SAT, static symmetry reduction techniques, and the use of sort inference.

The main differences with Paradox 1.0 are: a better SAT-solver, better memory behaviour, and a faster clause instantiation algorithm.

Implementation

The main part of Paradox is implemented in Haskell using the GHC compiler. Paradox also has a built-in incremental SAT solver which is written in C++. The two parts are linked together on the object level using Haskell's Foreign Function Interface.

Strategies

There is only one strategy in Paradox:

Analyze the problem, finding an upper bound N on the domain size of models, where N is possibly infinite. A finite such upper bound can be found, for example, for EPR problems.
Flatten the problem, and split clauses and simplify as much as possible.
Instantiate the problem for domain sizes 1 up to N, applying the SAT solver incrementally for each size. Report "SATISFIABLE" when a model is found.
When no model of sizes smaller or equal to N is found, report "CONTRADICTION".

In this way, Paradox can be used both as a model finder and as an EPR solver.

Expected Competition Performance

Paradox 1.3 is the CASC-20 SAT division winner.

Paradox 2.0

Koen Claessen, Niklas Sörensson
Chalmers University of Technology and Gothenburg University, Sweden
{koen,nik}[@]cs.chalmers.se

Architecture

Paradox 2.0ualpha is the beginning of a complete rewrite of Paradox 1.3. Paradox 2.0-alpha does not have all the features yet that Paradox 1.3 has.

See the description of Paradox 1.3 for general information.

Expected Competition Performance

Paradox 2.0-alpha will probably perform slightly worse than Paradox 1.3.

Theo 2006

Monty Newborn
McGill University, Canada
newborn[@]cs.mcgill.ca

Architecture

Theo [New01] is a resolution-refutation theorem prover for first order logic. It accepts theorems in either clausal form or FOL form. Theo's inference procedures include binary resolution, binary factoring, instantiation, demodulation, and hash table resolutions.

Implementation

Theo is written in C and runs under both LINUX and FREEBSD. It contains about 50000 lines of source code. Originally it was called The Great Theorem Prover.

Strategies

Theo uses a large hash table (16 million entries) to store clauses. This permits complex proofs to be found, some as long as 500 inferences. It uses what might be called a brute-force iteratively deepening depth-first search looking for a contradiction while storing information about clauses - unit clauses in particular - in its hash table.

When Theo participated in the 2004 CASC Competition, it used a learning strategy [NW04] briefly described in this paragraph. When given a theorem, it first created a list of potential ways to "weaken" the theorem by weakening one of the clauses. It then randomly selected one of the weakenings, tried to prove the weakened version of the theorem, and then used the results from this effort to help prove the given theorem. A weakened version was created by modifying one clause by replacing a constant or function by a variable or by deleting a literal. Certain clauses from the proof of the weakened version were added to the base clauses when Theo next attempted to prove the given theorem. In addition, base clauses that participated in the proof of the weakened version were placed in the set-of-support. Theo then attempted to prove the given theorem with the revised set of base clauses.

The learning strategy was further developed for the 2005 competition as described here. When Theo is given a theorem, it first tries to prove a weakened version. It does this for 50% of the allotted time. If successful, it then attempts to prove the given theorem as described in the previous paragraph. If unsuccessful, however, Theo then weakens the given theorem further by weakening an additional clause. With two clauses now weakened, Theo then attempts to prove the given theorem for 50% of the remaining time (25% of the originally allotted time). If successful, Theo uses the results to attempt to prove the given theorem. If unsuccessful, Theo picks two other weakenings and tries again for 50% of the remaining time. Now certain weakened proofs are thrown out, as they generally seem to be useless. Weakened proofs that are shorter than three inferences are considered useless, as are those that are less than five and do not involve the negated conclusion. Attempts to improve the learning strategy are of major interest in the ongoing development of Theo.

For this 2006 competition, efforts have been made to avoid attempting certain weakenings in the first place. These weakenings tend to produce useless proofs. As a trivial example, weakening equal(X,X) to equal(X,Y) is highly likely to produce a trivial proof. In addition to work on the learning strategy, the procedure for transforming FOFs to CNF has been improved.

Expected Competition Performance

Theo is marginally better than it was last year.

Vampire 8.0

Andrei Voronkov
University of Manchester, England
voronkov[@]cs.man.ac.uk

Architecture

Vampire 8.0, [RV02,Vor05] is an automatic theorem prover for first-order classical logic. It consists of a shell and a kernel. The kernel implements the calculi of ordered binary resolution and superposition for handling equality. The splitting rule and negative equality splitting are simulated by the introduction of new predicate definitions and dynamic folding of such definitions. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion (optionally modulo commutativity), subsumption resolution, rewriting by ordered unit equalities, and a lightweight basicness. The CASC version uses the Knuth-Bendix ordering. The lexicographic path ordering has been implemented recently but will not be used for this CASC.

A number of efficient indexing techniques are used to implement all major operations on sets of terms and clauses. Run-time algorithm specialisation is used to accelerate some costly operations, e.g., checks of ordering constraints. Although the kernel of the system works only with clausal normal forms, the shell accepts a problem in the full first-order logic syntax, clausifies it and performs a number of useful transformations before passing the result to the kernel. When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Implementation

Vampire 8.0 is implemented in C++. The supported compilers are gcc gcc 3.3.x, and Microsoft Visual C++.

Strategies

The Vampire kernel provides a fairly large number of features for strategy selection. The most important ones are:

Choice of the main saturation procedure :
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
A variety of optional simplifications.
Parameterised reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals.
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.

The automatic mode of Vampire 8.0 is derived from extensive experimental data obtained on problems from TPTP v3.0.1. Input problems are classified taking into account simple syntactic properties, such as being Horn or non-Horn, presence of equality, etc. Additionally, we take into account the presence of some important kinds of axiom, such as set theory axioms, associativity and commutativity. Every class of problems is assigned a fixed schedule consisting of a number of kernel strategies called one by one with different time limits.

Main differences between Vampire 8.0 and 7.0

a naming technique for a short clause form transformation
a better Skolemisation algorithm
the new TPTP input syntax as well as the KIF syntax
possibility of working with multiple knowledge bases
a query answering mode
lexicographic path ordering

Expected Competition Performance

Vampire 8.0 is the CASC-20 MIX and FOF divisions winner.

Vampire 8.1

Andrei Voronkov
University of Manchester, England
voronkov[@]cs.man.ac.uk

No system description supplied. However, see the description of Vampire 8.0 for general information. Minor changes have been made, including a bugfix in the FOF to CNF conversion.

Waldmeister 704

Jean-Marie Gaillourdet¹, Thomas Hillenbrand², Bernd Löchner¹
¹Technische Universität Kaiserslautern, Germany,
²Max-Planck-Institut für Informatik Saarbrücken, Germany
waldmeister[@]informatik.uni-kl.de

Architecture

Waldmeister 704 is a system for unit equational deduction. Its theoretical basis is unfailing completion in the sense of [BDP89] with refinements towards ordered completion (cf. [AHL03]). The system saturates the input axiomatization, distinguishing active facts, which induce a rewrite relation, and passive facts, which are the one-step conclusions of the active ones up to redundancy. The saturation process is parameterized by a reduction ordering and a heuristic assessment of passive facts [HJL99]. For an in-depth description of the system, see [Hil03]. Waldmeister 704 improves over last year's version in several respects. Firstly, the detection of redundancies in the presence of associative-commutative operators has been strenghtened (cf. [Loe04a]). In a class of AC-equivalent equations, an element is redundant if each of its ground instances can be rewritten, with the ground convergent rewrite system for AC, into an instance of another element. Instead of elaborately checking this kind of reducability explicitly, it can be rephrased in terms of ordering constraints and efficiently be approximated with a polynomial test. Secondly, the last teething troubles of the implementation of the Waldmeister loop have been overcome. Thirdly, a number of strategies have slightly been revised.

Implementation

The prover is coded in ANSI-C. It runs on Solaris, Linux, and newly also on MacOS X. In addition, it is now available for Windows users via the Cygwin platform. The central data strucures are: perfect discrimination trees for the active facts; group-wise compressions for the passive ones; and sets of rewrite successors for the conjectures. Visit the Waldmeister web pages at:

    http://www.waldmeister.org

Strategies

The approach taken to control the proof search is to choose the search parameters according to the algebraic structure given in the problem specification [HJL99]. This is based on the observation that proof tasks sharing major parts of their axiomatization often behave similar. Hence, for a number of domains, the influence of different reduction orderings and heuristic assessments has been analyzed experimentally; and in most cases it has been possible to distinguish a strategy uniformly superior on the whole domain. In essence, every such strategy consists of an instantiation of the first parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and an instantiation of the second parameter to one of the weighting functions addweight, gtweight, or mixweight, which, if called on an equation

s = t, return

|

s| + |t|,

|max

_>(s,t)|, or

|max

_>(s,t)| · (|s| + |t| + 1) + |s| + |t|, respectively, where

|

s| denotes the number of symbols in

Expected Competition Performance

Waldmeister 704 is the CASC-20 UEQ division winner.

Waldmeister 806

Thomas Hillenbrand¹, Bernd Löchner²
¹Max-Planck-Institut für Informatik Saarbrücken, Germany
²Technische Universität Kaiserslautern, Germany,
waldmeister[@]informatik.uni-kl.de

No system description supplied. However, see the description of Waldmeister 704 for general information.

References

AHL03: Avenhaus J., Hillenbrand T., Löchner B. (2003), On Using Ground Joinable Equations in Equational Theorem Proving, Journal of Symbolic Computation 36(1-2), pp.217-233, Elsevier Science.
BDP89: Bachmair L., Dershowitz N., Plaisted D.A. (1989), Completion Without Failure, Ait-Kaci H., Nivat M., Resolution of Equations in Algebraic Structures, pp.1-30, Academic Press.
BT03: Baumgartner P., Tinelli C. (2003), The Model Evolution Calculus, Baader F., Proceedings of the 19th International Conference on Automated Deduction (Miami, USA), pp.350-364, Lecture Notes in Artificial Intelligence 2741, Springer-Verlag.
BFT04: Baumgartner P., Fuchs A., Tinelli C. (2004), Darwin - A Theorem Prover for the Model Evolution Calculus, Sutcliffe G., Schulz S., Tammet T., Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning (Cork, Ireland).
BFT06a: Baumgartner P., Fuchs A., Tinelli C. (2006), Implementing the Model Evolution Calculus, International Journal on Artificial Intelligence Tools 15(1), pp.21-52.
BFT06b: Baumgartner P., Fuchs A., Tinelli C. (2006), Lemma Learning in the Model Evolution Calculus, Hermann M., Voronkov A., Proceedings of the 13th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (Phnom Penh, Cambodia), Lecture Notes in Artificial Intelligence.
BF+06: Baumgartner P., Fuchs A., de Nivelle H., Tinelli C. (2006), Computing Finite Models by Reduction to Function-Free Clause Logic, Ahrendt W., Baumgartner P., de Nivelle H., Proceedings of the 3rd Workshop on Disproving - Non-Theorems, Non-Validity, Non-Provability, 3rd International Joint Conference on Automated Reasoning (Seattle, USA).
Bil96: Billon J-P. (1996), The Disconnection Method: A Confluent Integration of Unification in the Analytic Framework, Miglioli P., Moscato U., Mundici D., Ornaghi M., Proceedings of TABLEAUX'96: the 5th Workshop on Theorem Proving with Analytic Tableaux and Related Methods (Palermo, Italy), pp.110-126, Lecture Notes in Artificial Intelligence 1071, Springer-Verlag.
Ble71: Bledsoe W.W. (1971), Splitting and Reduction Heuristics in Automatic Theorem Proving, Artificial Intelligence 2, pp.55-77.
CS03: Claessen K., Sorensson N. (2003), New Techniques that Improve MACE-style Finite Model Finding, Baumgartner P., Fermueller C., Proceedings of the CADE-19 Workshop: Model Computation - Principles, Algorithms, Applications (Miami, USA).
GK03: Ganzinger H., Korovin K. (2003), New Directions in Instantiation-Based Theorem Proving, Kolaitis P., Proceedings of the 18th IEEE Symposium on Logic in Computer Science (Ottawa, Canada), pp.55-64.
GK04: Ganzinger H., Korovin K. (2004), Integrating Equational Reasoning into Instantiation-Based Theorem Proving, Marcinkowski J., Tarlecki A., Proceedings of the 18th International Workshop on Computer Science Logic, 13th Annual Conference of the EACSL (Karpacz, Poland), pp.71-84, Lecture Notes in Computer Science 3210.
HJL99: Hillenbrand T., Jaeger A., Löchner B. (1999), Waldmeister - Improvements in Performance and Ease of Use, Ganzinger H., Proceedings of the 16th International Conference on Automated Deduction (Trento, Italy), pp.232-236, Lecture Notes in Artificial Intelligence 1632, Springer-Verlag.
Hil03: Hillenbrand T. (2003), Citius altius fortius: Lessons Learned from the Theorem Prover Waldmeister, Dahn I., Vigneron L., Proceedings of the 4th International Workshop on First-Order Theorem Proving (Valencia, Spain), Electronic Notes in Theoretical Computer Science 86.1, Elsevier Science.
LS01: Letz R., Stenz G. (2001), Model Elimination and Connection Tableau Procedures, Robinson A., Voronkov A., Handbook of Automated Reasoning, pp.2015-2114, Elsevier Science.
LS04: Letz R., Stenz G. (2004), Generalised Handling of Variables in Disconnection Tableaux, Rusinowitch M., Basin D., Proceedings of the 2nd International Joint Conference on Automated Reasoning (Cork, Ireland), pp.289-306, Lecture Notes in Artificial Intelligence 3097.
Loe04a: Löchner B. (2004), A Redundancy Criterion Based on Ground Reducibility by Ordered Rewriting, Rusinowitch M., Basin D., Proceedings of the 2nd International Joint Conference on Automated Reasoning (Cork, Ireland), pp.45-59, Lecture Notes in Artificial Intelligence 3097.
Loe04b: Loechner B. (2004), What to Know When Implementing LPO, Sutcliffe G., Schulz S., Tammet T., Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning (Cork, Ireland).
MarURL: Martin D. et al. (URL), OWL-S 1.1 Release, http://www.daml.org/services/owl-s/1.1/.
MW97: McCune W.W., Wos L. (1997), Otter: The CADE-13 Competition Incarnations, Journal of Automated Reasoning 18(2), pp.211-220.
McC94: McCune W.W. (1994), A Davis-Putnam Program and its Application to Finite First-Order Model Search: Quasigroup Existence Problems, ANL/MCS-TM-194, Argonne National Laboratory, Argonne, USA.
McC03a: McCune W.W. (2003), Otter 3.3 Reference Manual, ANL/MSC-TM-263, Argonne National Laboratory, Argonne, USA.
New01: Newborn M. (2001), Automated Theorem Proving: Theory and Practice, Springer.
NW04: Newborn M., Wang Z. (2004), Octopus: Combining Learning and Parallel Search, Journal of Automated Reasoning 33(2), pp.171-218.
Pas78: Pastre D. (1978), Automatic Theorem Proving in Set Theory, Artificial Intelligence 10, pp.1-27.
Pas89: Pastre D. (1989), Muscadet : An Automatic Theorem Proving System using Knowledge and Metaknowledge in Mathematics, Artificial Intelligence 38, pp.257-318.
Pas93: Pastre D. (1993), Automated Theorem Proving in Mathematics, Annals of Mathematics and Artificial Intelligence 8, pp.425-447.
Pas01: Pastre D. (2001), Muscadet2.3 : A Knowledge-based Theorem Prover based on Natural Deduction, Gore R., Leitsch A., Nipkow T., Proceedings of the International Joint Conference on Automated Reasoning (Siena, Italy), pp.685-689, Lecture Notes in Artificial Intelligence 2083, Springer-Verlag.
Pas02: Pastre D. (2002), Strong and Weak Points of the Muscadet Theorem Prover, AI Communications 15(2-3), pp.147-160.
Pas06: Pastre D. (2006), Complementarity of Natural Deduction and Resolution Principle in Empirically Automated Theorem Proving, http://www.math-info.univ-paris5.fr/~pastre/compl-ND-RP.pdf.
RV02: Riazanov A., Voronkov A. (2002), The Design and Implementation of Vampire, AI Communications 15(2-3), pp.91-110.
Sch02: Schulz S. (2002), E: A Brainiac Theorem Prover, AI Communications 15(2-3), pp.111-126.
Sch04a: Schulz S. (2004), System Abstract: E 0.81, Rusinowitch M., Basin D., Proceedings of the 2nd International Joint Conference on Automated Reasoning (Cork, Ireland), pp.223-228, Lecture Notes in Artificial Intelligence 3097.
Sch04b: Schulz S. (2004), Simple and Efficient Clause Subsumption with Feature Vector Indexing, Sutcliffe G., Schulz S., Tammet T., Proceedings of the Workshop on Empirically Successful First Order Reasoning, 2nd International Joint Conference on Automated Reasoning (Cork, Ireland).
SW99: Stenz G., Wolf A. (1999), E-SETHEO: Design, Configuration and Use of a Parallel Automated Theorem Prover, Foo N., Proceedings of AI'99: The 12th Australian Joint Conference on Artificial Intelligence (Sydney, Australia), pp.231-243, Lecture Notes in Artificial Intelligence 1747, Springer-Verlag.
Ste02a: Stenz G. (2002), DCTP 1.2 - System Abstract, Fermüller C., Egly U., Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic Tableaux and Related Methods (Copenhagen, Denmark), pp.335-340, Lecture Notes in Artificial Intelligence 2381, Springer-Verlag.
Ste02b: Stenz G. (2002), The Disconnection Calculus, PhD thesis, Institut für Informatik, Technische Universität München, Munich, Germany.
SS01: Sutcliffe G., Suttner C.B. (2001), Evaluating General Purpose Automated Theorem Proving Systems, Artificial Intelligence 131(1-2), pp.39-54.
SZS04: Sutcliffe G., Zimmer J., Schulz S. (2004), TSTP Data-Exchange Formats for Automated Theorem Proving Tools, Zhang W., Sorge V., Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems, pp.201-215, Frontiers in Artificial Intelligence and Applications 112, IOS Press.
Vor05: Voronkov A. (2005), Vampire Reference Manual and User's Guide, http://www.vampire.fm.