Darwin 1.3
Peter Baumgartner1,
Alexander Fuchs2,
Cesare Tinelli2
1National ICT Australia, Australia,
2The University of Iowa, USA
Architecture
Darwin [BFT04, BFT06a]
is an automated theorem prover for first order clausal logic.
It is an implementation of the Model Evolution Calculus
[BT03].
The Model Evolution Calculus lifts the propositional DPLL procedure to
first-order logic. One of the main motivations for this approach
was the possibility of migrating to the first-order level some of
those very effective search techniques developed by the SAT community
for the DPLL procedure.
The current version of Darwin implements first-order versions of unit propagation inference rules analogously to a restricted form of unit resolution and subsumption by unit clauses. To retain completeness, it includes a first-order version of the (binary) propositional splitting inference rule.
Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved towards a model or until a refutation is found.
Improvements to the previous version include additional preprocessing steps, less memory requirements, and lemma learning [BFT06b].
Darwin is implemented in OCaml and has been tested under various Linux distributions. It is available from:
http://goedel.cs.uiowa.edu/Darwin/
Darwin 1.4.1
Peter Baumgartner1,
Alexander Fuchs2,
Cesare Tinelli2
1NICTA, Australia,
2The University of Iowa, USA
Architecture
Darwin [BFT04, BFT06a]
is an automated theorem prover for first order clausal logic.
It is an implementation of the Model Evolution Calculus
[BT03].
The Model Evolution Calculus lifts the propositional DPLL procedure to
first-order logic. One of the main motivations for this approach
was the possibility of migrating to the first-order level some of
those very effective search techniques developed by the SAT community
for the DPLL procedure.
The current version of Darwin implements first-order versions of unit propagation inference rules analogously to a restricted form of unit resolution and subsumption by unit clauses. To retain completeness, it includes a first-order version of the (binary) propositional splitting inference rule.
Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved towards a model or until a refutation is found.
Darwin is implemented in OCaml and has been tested under various Linux distributions. It is available from:
http://goedel.cs.uiowa.edu/Darwin/
FM-Darwin 1.4.1
Peter Baumgartner1,
Alexander Fuchs2,
Cesare Tinelli2
1NICTA, Australia,
2The University of Iowa, USA
Architecture
FM-Darwin [BF+06] is a finite model finder for
first order clausal logic with equality, in the spirit of Paradox
[CS03].
For each domain size the problem is transformed into an equisatisfiable function-free clause set, which is decided by the Darwin prover [BFT06a].
Like Paradox, the system uses clause splitting, term definitions, static symmetry reduction, sort inference, and lemmas. In contrast, clause splitting and term definitions are only applied in a restricted way, that is for variable disjunct clause and ground terms, as ground instantiation is not performed and thus the exponential increase in the size of the clause set does not happen.
FM-Darwin is available from:
http://goedel.cs.uiowa.edu/Darwin/
The first-order clausal problem is transformed into a decidable logic, here function-free clause logic. Then, the system tries to find a finite domain model for sizes 1 .. n. A given domain is encoded by adding the axioms of totality and functionality for the symbols which have been converted from function to predicate symbols, such that any finite model of the original problem can be translated into a model of the transformed problem, and vice versa.
Like Paradox, FM-Darwin is a complete method for function-free clause sets, and can also detect unsatisfiability if the totality axioms are not used in a refutation.
E and EP 0.999
Stephan Schulz
Institut für Informatik, Technische Universität, Germany
Architecture
E 0.999 [Sch02,Sch04a]
is a purely
equational theorem prover. The core proof procedure operates on
formulas in clause normal form, using a calculus that combines
superposition (with selection of negative literals) and rewriting. No
special rules for non-equational literals have been implemented, i.e.,
resolution is simulated via paramodulation and equality
resolution. The basic calculus is extended with rules for AC
redundancy elimination, some contextual simplification, and
pseudo-splitting. The latest versions of E also supports simultaneous
paramodulation, either for all inferences or for selected inferences.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e. a strict separation of active and passive facts. Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of preprogrammed literal selection strategies, many of which are only experimental. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
The prover uses a preprocessing step to convert formulas in full first order format to clause normal form. This step may introduce (first-order) definitions to avoid an exponential growth of the formula. Preprocessing also unfolds equational definitions and performs some simplifications on the clause level.
The automatic mode determines literal selection strategy, term ordering, and search heuristic based on simple problem characteristics of the preprocessed clausal problem.
EP 0.999 is just a combination of E 0.999 in verbose mode and a proof analysis tool extracting the used inference steps.
The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:
http://www.eprover.orgEP 0.999 is a simple Bourne shell script calling E and the postprocessor in a pipeline.
E distinguishes problem classes based on a number of features, all of which have between 2 and 4 possible values. The most important ones are:
For classes above a threshold size, we assign the absolute best heuristic to the class. For smaller, non-empty classes, we assign the globally best heuristic that solves the same number of problems on this class as the best heuristic on this class does. Empty classes are assigned the globally best heuristic. Typically, most selected heuristics are assigned to more than one class.
EP 0.999 will be hampered by the fact that it has to analyse the inference step listing, an operation that typically is about as expensive as the proof search itself. Nevertheless, it should be competitive among the FOF* and CNF* systems.
E-KRHyper 1.0
Björn Pelzer, Christoph Wernhard
Universität Koblenz-Landau, Germany
Architecture
E-KRHyper 1.0 [PW07] is a theorem proving and
model generation system for first-order logic with equality.
It is an implementation of the E-hyper tableau calculus
[BFP07], which integrates a superposition-based
handling of equality [BG98] into the hyper
tableau calculus [BFN96].
The system is an extension of the KRHyper theorem prover
[Wer03], which implements the original hyper
tableau calculus.
An E-hyper tableau is a tree whose nodes are labeled with clauses and which is built up by the application of the inference rules of the E-hyper tableau calculus. The calculus rules are designed such that most of the reasoning is performed using positive unit clauses. A branch can be extended with new clauses that have been derived from the clauses of that branch.
A positive disjunction can be used to split a branch, creating a new branch for each disjunct. No variables may be shared between branches, and if a case-split creates branches with shared variables, then these are immediately substituted by ground terms. The grounding substitution is arbitrary as long as the terms in its range are irreducible: the branch being split may not contain a positive equational unit which can simplify a substituting term, i.e. rewrite it with one that is smaller according to a reduction ordering. When multiple irreducible substitutions are possible, each of them must be applied in consecutive splittings in order to preserve completeness.
Redundancy rules allow the detection and removal of clauses that are redundant with respect to a branch.
The hyper extension inference from the original hyper tableau calculus is equivalent to a series of E-hyper tableau calculus inference applications. Therefore the implementation of the hyper extension in KRHyper by a variant of semi-naive evaluation [Ull89] is retained in E-KRHyper, where it serves as a shortcut inference for the resolution of non-equational literals.
http://www.uni-koblenz.de/~bpelzer/ekrhyperThe system accepts clause input in the TPTP-supported Protein-format. E-KRHyper operates on an E-hyper tableau that is represented by linked node records. A layered discrimination-tree based index provides access to the clauses in the tableau and supports backtracking.
Z(x1,...,xn) must have one of the following three forms:
Geo accepts formulas either in its own syntax or in TPTP-syntax.
Because CNF has no special status for Geo,
TPTP-formulas in CNF form are treated as ordinary first-order formulas.
During search, Geo searches for a model of the geometric formulas
by a combination of backtracking and learning.
Whenever it cannot extend an interpretation into a model, Geo learns
a new rule of Type 1, which ensures that Geo will not explore similar
models in the future.
In case Geo finds a finite model, it simply prints the model. In case no
model exists, it will eventually learn the rule -> FALSE,
so that it is able to output a proof.
The final aim of geometric resolution, and of Geo, is to obtain a prover
that is both good at finding proofs, and at finding finite models.
Differences with last year (2006)
Acknowledgments:
Thomas Raths contributed to the development of
leanCoP 2.0 by performing comprehensive
benchmark tests on several problem libraries.
Equinox 1.2
Koen Claessen
Chalmers University of Technology, Sweden
Architecture
Equinox is an experimental theorem prover for pure first-order logic
with equality. It finds ground proofs of the input theory, by solving
successive ground instantiations of the theory using an incremental
SAT-solver. Equality is dealt with using a Nelson-Oppen framework.
Implementation
The main part of Equinox is implemented in Haskell using the GHC
compiler. Equinox also has a built-in incremental SAT solver (MiniSat)
which is written in C++. The two parts are linked together on the
object level using Haskell's Foreign Function Interface.
Strategies
There is only one strategy in Equinox:
Expected Competition Performance
Equinox should perform reasonably well. There should be problems that
it can solve that few other provers can handle.
Fampire 1.3
Josef Urban
Charles University of Prague, Czech Republic
Architecture
Fampire 1.3. is the Vampire 8.1 prover using the FLOTTER (SPASS) clausifier.
Implementation
Vampire and (a slightly modified) FLOTTER are linked together using a simple
FlotterOnTPTP.pl
Perl script, Geoff Sutcliffe's tptp4X utility, and slightly modified dfg2tptp
utility.
Strategies
The strategy is to use the FLOTTER clausifier, and run Vampire on the CNF
problem.
Expected Competition Performance
The system outperforms Vampire 8.1 on Mizar problems, however it is weaker
than Vampire 8.1 on the TPTP library (for which is Vampire pre-optimized).
It could solve some problems that Vampire 8.1 will not solve.
Geo 2007f
Hans de Nivelle
University of Wroclaw, Poland
Architecture
Geo [dNM06,deN07] is
based on geometric resolution.
A geometric formula is a formula of form:
FORALL x1, ..., xn
[ A1 /\ ... /\ Ap /\ v1 != w1 /\ ... /\ vq != wq -> Z(x1,...,xn) ].
The Ai are atoms not containing (dis)equality.
The arguments of the atoms Ai and the inequalities vj != wj
are among the variables x1, ..., xn.
Function symbols and constant symbols are not allowed.
As input, Geo accepts geometric formulas and first-order formulas.
First-order formulas are transformed into geometric formulas.
During this translation, function symbols and constants are removed, and
replaced by relations.
Implementation
Geo is written in C++. We try to keep the code readable
and reasonably efficient at the same time. Geo has no sophisticated
datastructures. At this moment the priority lies with the theoretical
understanding of the calculus.
Strategies
Currently, Geo has only one strategy: It searches for a model by exhaustive
search, and learns a new rule at each backtracking point.
Lemmas do not expire, but they can be simplified.
Expected Competition Performance
Geo has become up to a hundred times faster on hard problems,
due to the added simplification rules and the improved selection rules.
So we expect that it will solve some more problems than last year.
leanCoP 2.0
Jens Otten
University of Potsdam, Germany
Architecture
leanCoP is an automated theorem prover for
classical first-order logic. It is a very compact
implementation of the connection calculus as
described in the original leanCoP paper [OB03],
which is similar to the connection tableau
calculus [LS01].
Implementation
leanCoP is implemented in Prolog. The source code
of the core prover is only a few lines long and fits on half a page.
In a preprocessing step
first-order formulas are translated into clausal form;
equality can be handled by adding the equality axioms.
leanCoP 2.0 keeps the input clauses
in Prolog's database thus integrating Prolog's indexing
mechanism into the lean theorem proving framework.
leanCoP is available at
http://www.leancop.de
Strategies
The reduction rule of the connection calculus is applied
before the extension rule is applied. Open branches are
selected in a depth-first way. Iterative deepening on the
proof depth is used to achieve completeness.
leanCoP 2.0 integrates inference rules for
regularity, factorization and an additional technique
to limit backtracking.
Expected Competition Performance
We expect the performance of leanCoP 2.0 to
be state-of-the-art according to the TPTP standards,
i.e. it is not subsumed by any other theorem prover on
the given problem set.
ileanCoP 1.2
Jens Otten
University of Potsdam, Germany
Architecture
ileanCoP is an automated theorem prover for
intuitionistic first-order logic. It is a very compact
implementation of the clausal connection calculus
for intuitionistic logic [Ott05],
which uses prefixes to capture the specific restrictions of
intuitionistic logic.
Implementation
ileanCoP is implemented in Prolog and based on
the classical prover leanCoP.
Only a few minor additions
are necessary to turn leanCoP into a theorem prover
for intuitionistic logic. The source code
of the core prover, which includes the prefix unification algorithm,
is only a few lines long and fits on one page.
In a preprocessing step formulas are translated into clausal form and
a prefix is added to each literal.
Equality can be handled by adding the equality axioms.
ileanCoP is available at
http://www.leancop.de/ileancop
Acknowledgments: Thomas Raths contributed to the development of ileanCoP by performing comprehensive benchmark tests on several problem libraries.
iProver 0.2
Konstantin Korovin
The University of Manchester, England
Architecture
iProver is based on the instantiation calculus [GK03],
which is complete for first-order logic.
iProver successively instantiates a given set of clauses, and uses a ground SAT solver for checking whether the current ground abstraction of clauses is satisfiable. If the ground solver finds unsatisfiability then the prover terminates, proving unsatisfiability of the initial set of clauses; otherwise the model for ground clauses is used to guide further instantiations. The saturation process is implemented as a modification of a given clause algorithm. We use non-perfect discrimination trees for the unification index. The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04]; resolution-based simplifications; propositional-based simplifications, and removing duplicates. Equality is dealt with (internally) by adding the necessary axioms of equality.
Metis 2.0
Joe Hurd
Oxford University, UK
Architecture
Metis 2.0 [Hur03] is a proof tactic used in the
HOL4 interactive theorem
prover. It works by converting a higher order logic goal to a set of
clauses in first order logic, with the property that a refutation of
the clause set can be translated to a higher order logic proof of the
original goal.
Experiments with various first order calculi [Hur03] have shown a given clause algorithm and ordered resolution to best suit this application, and that is what Metis 2.0 implements. Since equality often appears in higher order logic goals, Metis 2.0 also implements the ordered paramodulation calculus.
In addition to standard size and distance measures, Metis 2.0 uses finite models to weight clauses in the Waiting set. When integrated with higher order logic, a finite model is manually constructed to interpret standard functions and relations in such a way as to make many axioms true and negated goals false. Non-standard functions and relations are interpreted randomly. Since it is part of the CASC competition rules that standard functions and relations are obfuscated, Metis 2.0 will back-off to interpreting all functions and relations randomly (except equality).
Metis 2.0 is free software, released under the GPL. It can be downloaded from:
http://www.gilith.com/software/metis
Testing on previous UEQ problem sets has shown Metis to be weak on
equality problems, so I expect Metis 2.0 to perform relatively worse
on the equality divisions, particularly UEQ.
Muscadet 2.7
Dominique Pastre
Université René Descartes (Paris 5), France
Architecture
The Muscadet theorem prover [Pas01] is a
knowledge-based system.
It is based on Natural Deduction, following the terminology of
[Ble71] and [Pas78],
and uses methods which resembles those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
The system is also able to work with second order statements. It may also receive knowledge and know-how for a specific domain from a human user; see [Pas89, Pas93]. These two possibilities are not used while working with the TPTP Library.
Otter 3.3 [McC03a] is an ATP system for statements in first-order (unsorted) logic with equality. Otter is based on resolution and paramodulation applied to clauses. An Otter search uses the "given clause algorithm", and typically involves a large database of clauses; subsumption and demodulation play an important role.
Otter is written in C. Otter uses shared data structures for clauses and terms, and it uses indexing for resolution, paramodulation, forward and backward subsumption, forward and backward demodulation, and unit conflict. Otter is available from:
http://www-unix.mcs.anl.gov/AR/otter/
Otter's original automatic mode, which reflects no tuning to the TPTP problems, will be used.
Otter has been entered into CASC as a stable benchmark against which progress can be judged (there have been only minor changes to Otter since 1996 [MW97], nothing that really affects its performace in CASC). This is not an ordinary entry, and we do not hope for Otter to do well in the competition.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and Rusty Lusk contributed to the development of Otter.
Paradox 1.3
Koen Claessen, Niklas Sörensson
Chalmers University of Technology and
Gothenburg University, Sweden
Architecture
Paradox [CS03] is a
finite-domain model generator. It is based on a MACE-style
[McC94] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
Paradox incorporates the following features: Polynomial-time clause splitting heuristics, the use of incremental SAT, static symmetry reduction techniques, and the use of sort inference.
The main differences with Paradox 1.0 are: a better SAT-solver, better memory behaviour, and a faster clause instantiation algorithm.
Paradox 2.2
Koen Claessen, Niklas Sörensson
Chalmers University of Technology, Sweden
Architecture
Paradox 2.2 is a rewrite of Paradox 1.3.
Paradox 2.2 does not have all
the features yet that Paradox 1.3 has. Some experimental features,
such as type-based model finding, have been added.
See the description of Paradox 1.3 for
general information.
No system description supplied.
However, see the
description of Vampire 8.0 for general information.
Minor changes have been made, including a bugfix in the FOF to
CNF conversion.
None supplied.
No system description supplied.
However, see the
description of Waldmeister 704 for general information.
Vampire 8.1
Andrei Voronkov
University of Manchester, England
Expected Competition Performance
Vampire 8.1 is the CASC-J3 FOF and CNF division winner.
Vampire 9.0
Andrei Voronkov
University of Manchester, England
Waldmeister 806
Thomas Hillenbrand1, Bernd Löchner2
1Max-Planck-Institut für Informatik Saarbrücken, Germany
2Technische Universität Kaiserslautern, Germany,
Expected Competition Performance
Waldmeister 806 is the CASC-J3 UEQ division winner.