Darwin 1.2
Peter Baumgartner1,
Alexander Fuchs2,
Cesare Tinelli2
1Max-Planck-Institut für Informatik Saarbrücken, Germany,
baumgart@mpi-sb.mpg.de
2The University of Iowa, USA
{fuchs,tinelli}@cs.uiowa.edu
Architecture
Darwin [BFT04]
is an automated theorem prover for first order clausal logic.
It is the first implementation of the Model Evolution Calculus
[BT03]. The Model Evolution
Calculus lifts the propositional DPLL procedure to
first-order logic. One of the main motivations for this approach
was the possibility of migrating to the first-order level some of
those very effective search techniques developed by the SAT community
for the DPLL procedure.
The current version of Darwin implements first-order versions of the unit
propagation inference rules, analogous to a restricted form of unit
resolution and subsumption by unit clauses. To retain completeness,
it also includes a first-order version of the (binary) propositional
splitting inference rule.
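The propositional DPLL procedure that the Model Evolution calculus lifts can be sketched as follows. This is an illustrative toy, not Darwin's code; the clause representation and the literal chosen for splitting are assumptions.

```python
def dpll(clauses, assignment=()):
    """Propositional DPLL with unit propagation and binary splitting.
    Clauses are sets of integer literals; a negative integer is a
    negated atom. Returns a satisfying assignment or None."""
    # drop satisfied clauses, remove falsified literals from the rest
    clauses = [c for c in clauses if not any(l in assignment for l in c)]
    clauses = [frozenset(l for l in c if -l not in assignment) for c in clauses]
    if not clauses:
        return set(assignment)            # every clause satisfied: model found
    if frozenset() in clauses:
        return None                       # empty clause: conflict
    for c in clauses:
        if len(c) == 1:                   # unit propagation: literal is forced
            (lit,) = c
            return dpll(clauses, assignment + (lit,))
    lit = next(iter(clauses[0]))          # binary splitting on some literal
    return dpll(clauses, assignment + (lit,)) or dpll(clauses, assignment + (-lit,))
```

Darwin's first-order version replaces the literal/clause operations with ones based on unification, but the propagate-then-split skeleton is the same.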
Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved towards a model or until a refutation is found.
Darwin is implemented in OCaml and has been tested under various Linux distributions (compiled but untested on FreeBSD, MacOS X, Windows). It is available from:
http://goedel.cs.uiowa.edu/Darwin/
Darwin traverses the search space by iterative deepening over the term depth of candidate literals. Darwin employs a uniform search strategy for all problem classes.
Darwin is a first implementation of the Model Evolution calculus. We expect its performance to be strong in the EPR division, below average in the MIX division, and weak in the SAT division.
DCTP 1.31 [Ste02a] is an automated theorem prover
for first order clause logic.
It is an implementation of the disconnection calculus described in
[Bil96,LS01,
Ste02b].
The disconnection calculus is a proof confluent and inherently
cut-free tableau calculus with a weak connectedness condition.
The inherently depth-first proof search is guided by a literal selection
based on literal instantiatedness or literal complexity and a heavily
parameterised link selection.
The pruning mechanisms mostly rely on different forms of
variant deletion and unit based strategies.
Additionally, the calculus has been augmented by full tableau pruning.
DCTP 10.21p is a strategy-parallel version using the technology of
E-SETHEO [SW99] to combine several different
strategies based on DCTP 1.31.
DCTP 1.31 has been implemented as a monolithic system in the Bigloo
dialect of the Scheme language.
The most important data structures are perfect discrimination trees,
which are used in many variations.
DCTP 10.21p has been derived from the Perl implementation of E-SETHEO
and includes DCTP 1.31 as well as the E prover as its CNF converter. Both
versions run under Solaris and Linux.
We are currently integrating a range of new techniques into DCTP which are
mostly based on the results described in [LS04],
as well as a certain form of unit propagation.
We are hopeful that these improvements will be ready in time for CASC-J2.
DCTP 1.31 is a single strategy prover.
Individual strategies are started by DCTP 10.21p using the schedule based
resource allocation scheme known from the E-SETHEO system. Of course,
different schedules have been precomputed for the syntactic problem classes.
The problem classes are more or less identical to the sub-classes used by the
competition organisers. Again, we have no idea whether or not this
conflicts with the organisers' tuning restrictions.
DCTP 10.21p is the CASC-J2 EPR division winner.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e. a strict separation of active and passive facts.
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering. The prover supports a large number of preprogrammed literal
selection strategies, many of which are only experimental. Clause
evaluation heuristics can be constructed on the fly by combining
various parameterized primitive evaluation functions, or can be
selected from a set of predefined heuristics. Supported term orderings
are several parameterized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO).
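The DISCOUNT-loop variant of the given-clause algorithm can be sketched as follows. The selection, inference, and redundancy functions are abstract parameters here; E provides concrete heuristics for each, and the interface shown is an assumption, not E's API.

```python
def given_clause_loop(axioms, select, infer, redundant):
    """DISCOUNT-loop saturation: a strict separation of active
    (processed) and passive (unprocessed) clauses."""
    active, passive = [], list(axioms)
    while passive:
        given = select(passive)           # heuristic clause selection
        passive.remove(given)
        if not given:                     # empty clause: refutation found
            return "unsatisfiable"
        if redundant(given, active):      # e.g. subsumed by an active clause
            continue
        active.append(given)
        # generate all inferences between the given clause and the active set
        passive.extend(infer(given, active))
    return "saturated"                    # passive exhausted without a proof

# a minimal stand-in for `infer`: binary resolution restricted to the
# literals of the given clause (clauses are frozensets of integers)
def unit_resolve(given, active):
    return [frozenset((given | c) - {l, -l})
            for c in active for l in given if -l in c]
```

With `unit_resolve` as the inference engine, the loop refutes the contradictory pair {p}, {not p} in two iterations.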
The prover uses a preprocessing step to convert formulas in full first
order format to clause normal form. Preprocessing also unfolds
equational definitions and performs some simplifications on the clause
level.
The automatic mode selects the literal selection strategy, term
ordering, and search heuristic based on simple problem characteristics
of the preprocessed clausal problem.
EP 0.9 is just a combination of E 0.9 in verbose mode and a
proof analysis tool extracting the used inference steps.
DCTP 10.21p
Gernot Stenz
Technische Universität München, Germany
stenzg@informatik.tu-muenchen.de
Architecture
Implementation
Strategies
Expected Competition Performance
E and EP 0.9
Stephan Schulz
Technische Universität München, Germany, and ITC/irst, Italy
schulz@eprover.de
Architecture
E 0.9 [Sch02,Sch04a] is a purely
equational theorem prover. The core proof procedure operates on
formulas in clause normal form, using a calculus that combines
superposition (with selection of negative literals) and rewriting. No
special rules for non-equational literals have been implemented, i.e.,
resolution is simulated via paramodulation and equality
resolution. The basic calculus is extended with rules for AC
redundancy elimination, some contextual simplification, and
pseudo-splitting. The latest version of E also supports simultaneous
paramodulation, either for all inferences or for selected inferences.
Implementation
E is implemented in ANSI C, using the GNU C compiler. The most
outstanding feature is the global sharing of rewrite steps. Current
versions of E add rewrite links from rewritten to new
terms. In effect, E is caching rewrite operations as long as
sufficient memory is available. Other important features are the use
of perfect discrimination trees with age and size constraints
for rewriting and unit-subsumption, feature vector
indexing [Sch04b] for forward- and
backward subsumption and contextual literal cutting, and a new
polynomial implementation of LPO [Loe04].
The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:
http://www.eprover.org
EP 0.9 is a simple Bourne shell script calling E and the postprocessor in a pipeline.
E distinguishes problem classes based on a number of features, all of which have between 2 and 4 possible values. The most important ones are:
For classes above a threshold size, we assign the absolute best heuristic to the class. For smaller, non-empty classes, we assign the globally best heuristic that solves the same number of problems on this class as the best heuristic on this class does. Empty classes are assigned the globally best heuristic. Typically, most selected heuristics are assigned to more than one class.
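The assignment scheme just described can be sketched as follows; the data shapes and the `threshold` parameter are assumptions for illustration, not E's implementation.

```python
def assign_heuristic(cls_size, perf_on_class, global_rank, threshold=10):
    """Assign a heuristic to one problem class (sketch of the scheme above).
    perf_on_class: {heuristic: problems of this class solved};
    global_rank:   heuristics ordered by overall performance, best first;
    threshold:     class size above which the class-best heuristic wins."""
    if cls_size == 0:
        return global_rank[0]             # empty class: globally best heuristic
    best_count = max(perf_on_class.values())
    if cls_size >= threshold:
        # large class: the absolutely best heuristic on this class
        return max(perf_on_class, key=perf_on_class.get)
    # small class: globally best heuristic matching the class optimum
    return next(h for h in global_rank
                if perf_on_class.get(h, 0) == best_count)
```

The small-class branch is what lets one globally strong heuristic cover many classes, which is why most selected heuristics end up assigned to more than one class.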
EP 0.9 will be hampered by the fact that it has to analyse the inference step listing, an operation that typically is about as expensive as the proof search itself. Nevertheless, it should be competitive among the MIX and FOF proof class systems.
Gandalf
[Tam97,Tam98]
is a family of automated theorem provers, including classical, type theory,
intuitionistic and linear logic provers, plus a finite model builder.
The version c-2.6 contains the classical logic prover for clause
form input and the finite model builder.
One distinguishing feature of Gandalf is that it contains a large
number of different search strategies and is capable of automatically
selecting suitable strategies and experimenting with these strategies.
The finite model building component of Gandalf uses the Zchaff propositional
logic solver by L.Zhang [MM+01] as an
external program called by Gandalf.
Zchaff is not free, although it can be used freely for research purposes.
Gandalf is not optimised for Zchaff or linked together with it: Zchaff
can be freely replaced by other satisfiability checkers.
Gandalf is implemented in Scheme and compiled to C using the Hobbit
Scheme-to-C compiler.
Version scm5d6 of the Scheme interpreter scm by A.Jaffer is used as the
underlying Scheme system.
Zchaff is implemented in C++.
Gandalf has been tested on Linux, Solaris, and MS Windows under Cygwin.
Gandalf is available under GPL from:
One of the basic ideas used in Gandalf is time-slicing: Gandalf typically
runs a number of searches with different strategies one after another, until
either the proof is found or time runs out. Also, during each specific run
Gandalf typically modifies its strategy as the time limit for this run
starts coming closer. Selected clauses from unsuccessful runs are sometimes
used in later runs.
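The time-slicing scheme can be sketched as follows. The `attempt` interface and the per-strategy budget fractions are assumptions; Gandalf's actual scheduler also adapts the strategy as a slice's limit approaches.

```python
import time

def time_slice(strategies, total_budget, attempt):
    """Gandalf-style time-slicing (sketch): run several search strategies
    one after another until a proof is found or the budget runs out.
    strategies: list of (strategy, fraction-of-budget) pairs;
    attempt(strategy, seconds) returns a proof or None."""
    deadline = time.monotonic() + total_budget
    for strategy, fraction in strategies:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                          # budget exhausted
        slot = min(remaining, total_budget * fraction)
        proof = attempt(strategy, slot)    # one bounded search
        if proof is not None:
            return proof
    return None                            # all slices failed
```

Carrying selected clauses from a failed slice into the next one, as Gandalf sometimes does, would amount to threading extra state through `attempt`.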
In the normal mode Gandalf attempts to find only unsatisfiability.
It has to be called with a -sat flag to find satisfiability.
The following strategies are run:
Gandalf c-2.6-SAT is the CASC-J2 SAT division, assurance class, winner.
MathServ currently integrates the ATP systems Otter 3.3, EP 0.82,
SPASS 2.1 and Vampire 7.0.
All ATP services get a problem description in the new TPTP format (TPTP
Library v3.0.1) and a time limit in seconds as an input.
The problem is transformed into the provers' input format using the tptp2X
utility.
The result specifies the status of the given problem with one of the
statuses defined in the SZS Status Ontology [SZS04].
If the system delivered a refutation proof then the proof is translated into
TSTP format (optionally in XTSTP) using tools developed by Geoff Sutcliffe.
Since MathServ does not form an ATP system on its own it only enters the
Demonstration Division of the CASC.
The MUSCADET theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of
[Ble71] and [Pas78],
and uses methods which resemble those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and built into the system, or constructed by the
system itself, via metarules, from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
The system is also able to work with second order statements. It
may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and
[Pas93]. These two possibilities
are not used while working with the TPTP Library.
Mace2 [McC01] searches for finite models of
first-order (including equality) statements.
Mace2 iterates through domain sizes, starting with 2.
For a given domain size, a propositional satisfiability problem is constructed
from the ground instances of the statements, and a DPLL procedure is applied.
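The iteration just described can be sketched as follows. The clause encoding and the SAT-solver interface are placeholders for illustration, not Mace2's internals.

```python
from itertools import product

def find_finite_model(clauses, nvars, sat_solve, max_size=6):
    """MACE-style search (sketch): for each candidate domain size, ground
    the clauses over the domain and hand the resulting propositional
    problem to a DPLL-based SAT solver.
    clauses: lists of functions mapping a variable assignment to a
    ground literal; sat_solve returns a model or None."""
    for size in range(2, max_size + 1):
        domain = range(size)
        ground = [[lit(values) for lit in clause]
                  for clause in clauses
                  for values in product(domain, repeat=nvars)]
        model = sat_solve(ground)          # propositional decision procedure
        if model is not None:
            return size, model             # smallest domain with a model
    return None
```

The number of ground instances grows as size^nvars per clause, which is why Mace2's arity limits and isomorphism reduction matter in practice.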
Mace2 is an entirely different program from Mace4
[McC03b]: in Mace4, the ground problem for a
given domain size contains equality and is decided by rewriting.
Mace2 is coded in ANSI C. It uses the same code as Otter
[McC03a] for parsing input, and (for the most part)
accepts the same input files as Otter.
Mace2 is packaged and distributed with Otter, which is available from:
Mace2 has been evolving slowly for about ten years.
Two important strategies have been added recently.
In 2001, a method to reduce the number of isomorphic models was added; this
method is similar in spirit to the least number optimization used in
rewrite-based methods, but it applies only to the first five constants.
In 2003, a clause-parting method (based on the variable occurrences in
literals of flattened clauses) was added to improve performance on inputs with
large clauses.
Although Mace2 has several experimental features, it uses one fixed strategy
for CASC.
Mace2 is not expected to win any prizes, because it uses one fixed strategy,
and no tuning has been done with respect to the TPTP problem library.
Also, Mace2 does not accept function symbols with arity greater than 3 or
predicate symbols with arity greater than 4.
Overall performance, however, should be respectable.
Mace4 is an entirely different program from Mace2: in Mace2, the problem for
a given domain size is reduced to a purely propositional SAT problem that is
decided by DPLL.
Mace4 is not expected to win any prizes, because it uses one fixed strategy,
and no tuning has been done with respect to the TPTP problem library.
Overall performance, however, should be respectable.
An early version of Mace4 competed in CASC-2002 under the name ICGNS.
Equinox---1.0
Koen Claessen
Chalmers University of Technology, Sweden
koen@cs.chalmers.se
Architecture
Equinox is a new theorem prover for pure first-order logic with
equality. It finds ground proofs of the input theory, by solving
successive ground instantiations of the theory using an incremental
SAT-solver. Equality is dealt with using a Nelson-Oppen framework.
Implementation
The main part of Equinox is implemented in Haskell using the GHC compiler.
Equinox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's
Foreign Function Interface.
Strategies
There is only one strategy in Equinox:
Expected Competition Performance
Equinox is still in its infancy. There should however be problems that
it can solve that few other provers can handle.
Gandalf c-2.6-SAT
Tanel Tammet
Tallinn Technical University, Estonia
tammet@cc.ttu.ee
Architecture
Implementation
http://www.ttu.ee/it/gandalf
Strategies
Expected Competition Performance
MathServ 0.62
Jürgen Zimmer
Universität des Saarlandes, Germany
{jzimmer,serge}@ags.uni-sb.de
Architecture
MathServ is a framework for integrating reasoning systems as Web Services
into a networked environment.
The functionality of these Reasoning Web Services is captured in Semantic
Web Service descriptions using the OWL-S [MarURL]
upper ontology for semantic web services.
The MathServ Broker is a middle-agent which knows all available reasoning
services and answers queries for client applications.
Given a problem description, the MathServ Broker can automatically retrieve
services, and can combine services in case a single service is not
sufficient to tackle a problem.
Our Broker performs a "semantic" best match by analysing incoming problems
and choosing the best service available for that problem.
Implementation
MathServ is implemented in Java and uses an Apache Tomcat web server and
the AXIS package to offer reasoning systems as Web Services.
All services are accessible programmatically using WSDL descriptions of
their interface.
ATP systems are integrated via Java wrappers that manage the translation
of the input problems into the prover's input format and the translation
of resulting resolution proofs into TSTP format.
Strategies
We collected data on the performance of the underlying ATP systems on all
the Specialists Problem Classes (SPCs) [SS01] of
the TPTP Library.
The OWL-S descriptions of all ATP services have been annotated with this data.
The MathServ Broker determines the SPC of an incoming TPTP problem and uses
the performance data to choose a suitable prover for that problem.
Expected Competition Performance
The system is too new to make any predictions.
MUSCADET 2.5
Dominique Pastre
Université René Descartes - Paris, France
pastre@math-info.univ-paris5.fr
Architecture
Implementation
MUSCADET 2 [Pas02]
is implemented in SWI-Prolog.
Rules are written as declarative Prolog clauses.
Metarules are written as sets of Prolog clauses, more or less declarative.
The inference engine includes the Prolog interpreter and some procedural Prolog
clauses.
MUSCADET 2.5 is available from:
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
Strategies
There are specific strategies for existential, universal, conjunctive or
disjunctive hypotheses and conclusions.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one) or
refutation (only if the conclusion is a negation).
Expected Competition Performance
MUSCADET will perform best on problems that manipulate many
concepts and in which all statements (conjectures,
definitions, axioms) are expressed in a manner similar to the
practice of humans, especially of mathematicians. It will perform
poorly on problems that use few concepts but have large and deep
formulas leading to many splittings.
Mace2 2.2
William McCune
Argonne National Laboratory, USA
mccune@mcs.anl.gov
Architecture
Implementation
http://www.mcs.anl.gov/AR/otter
Strategies
Expected Competition Performance
Mace4 2004-D
William McCune
Argonne National Laboratory, USA
mccune@mcs.anl.gov
Architecture
Mace4 [McC03b] searches for finite models of
first-order
(unsorted, with equality) statements. Given input clauses, it generates
ground instances over a finite domain, then uses a decision
procedure based on rewriting to try to determine satisfiability.
If there is no model of that size, it increments the domain size
and tries again. Input clauses are not "flattened" as they are
in procedures that reduce the problem to propositional satisfiability
without equality, for example, Mace2 [McC01].
Implementation
Mace4 is coded in ANSI C and is available from:
http://www.mcs.anl.gov/AR/mace4
Strategies
The two main parts of the Mace4 method
are (1) selecting the next empty cell in the tables of functions
being constructed and deciding which values need
to be considered for that cell, and (2) propagating assignments.
Mace4 uses the basic least number heuristic (LNH) to reduce
isomorphism.
The LNH was introduced in Falcon [Zha96] and is
also used in SEM.
Effective use of the LNH requires careful cell selection.
Propagation is by ground rewriting and inference rules to
derive negated equalities.
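The idea behind the least number heuristic can be sketched as follows. This is a simplification for illustration: Mace4's actual cell selection and value restriction are more involved.

```python
def lnh_values(max_used, domain_size):
    """Least number heuristic (sketch): when filling an empty cell, try
    only the domain elements already used plus one fresh element, since
    all still-unused elements are interchangeable up to isomorphism.
    max_used: largest domain element assigned so far (-1 if none)."""
    return list(range(min(max_used + 2, domain_size)))
```

Restricting the candidate values this way prunes isomorphic branches of the search, which is why careful cell selection, keeping `max_used` small for as long as possible, pays off.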
Expected Competition Performance
Octopus JN05
Monty Newborn, Zongyan Wang
McGill University, Canada
newborn@cs.mcgill.ca
Architecture
Octopus [NW04] is a parallel version of
THEO, and is designed to run on as many computers
as are available.
For this competition, we expect to run on between 100 and 200 computers.
In the 2004 Cork competition, it ran on 120 PCs.
Octopus's single processor version, THEO [New01],
is a resolution-refutation theorem prover for first order logic.
It accepts theorems in either clausal form or FOL form. THEO's inference
procedures include binary resolution, binary factoring, instantiation,
demodulation, and hash table resolutions.
Implementation
Octopus is written in C and runs under both Linux and FreeBSD.
It contains about 70000 lines of source code.
It normally runs on a PC with at least 512Mb of memory.
Strategies
Each processor of Octopus is a copy of THEO.
When the master receives a theorem to prove, it delivers it to all the
slaves with instructions on which weakened version of the theorem to try.
Each slave then tries a different weakened version.
If a slave finds a proof of a weakened version, it goes on to attempt to
prove the given theorem.
Strategies similar to those in THEO are used if the weakened version is not
proved by some slave.
THEO's procedure is described in THEO's "System
Description".
Once the master has sent the theorem to the slave, there is no communication
among the computers except when one finds a proof.
Otter 3.3 [McC03a] is an ATP system for
statements in first-order (unsorted) logic with equality.
Otter is based on resolution and paramodulation applied to clauses.
An Otter search uses the "given clause algorithm", and typically involves
a large database of clauses; subsumption and demodulation play an important
role.
Otter is written in C.
Otter uses shared data structures for clauses and terms, and it uses
indexing for resolution, paramodulation, forward and backward subsumption,
forward and backward demodulation, and unit conflict.
Otter is available from:
Otter's original automatic mode, which reflects no tuning to the TPTP
problems, will be used.
Otter has been entered into CASC-J2 as a stable benchmark against which
progress can be judged (there have been only minor changes to Otter since
1996 [MW97], nothing that really affects
its performance in CASC).
This is not an ordinary entry, and we do not hope for Otter to do well
in the competition.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and
Rusty Lusk contributed to the development of Otter.
Paradox 1.0 [CS03] is a finite-domain model
generator.
It is based on a MACE-style [McC94]
flattening and instantiating of the FO clauses into propositional clauses, and
then the use of a SAT solver to solve the resulting problem.
Paradox incorporates the following novel features: New polynomial-time
clause splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
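MACE-style flattening, which pulls nested subterms out of a term via fresh variables before instantiating over a finite domain, can be sketched as follows. The term representation and fresh-variable naming are assumptions, not Paradox's code.

```python
import itertools

def flatten_term(term):
    """Flatten a term ('f', arg, ...) into a top-level variable plus
    definitions of the form head(vars) = fresh_var, the flat shape
    needed before grounding over a finite domain.
    Variables and constants are plain strings."""
    defs = []
    fresh = itertools.count()
    def walk(t):
        if isinstance(t, str):
            return t                       # variables stay as they are
        head, *args = t
        flat_args = [walk(a) for a in args]
        v = f"_v{next(fresh)}"
        defs.append((head, tuple(flat_args), v))   # head(flat_args) = v
        return v
    return walk(term), defs
```

For example, f(g(x)) becomes the definitions g(x) = v0 and f(v0) = v1; each flat definition then instantiates directly into propositional clauses.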
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Paradox uses the following non-standard Haskell extensions: local universal
type quantification and hash-consing.
There is only one strategy in Paradox:
In this way, Paradox can be used both as a model finder and as an EPR solver.
Paradox 1.0 is the CASC-J2 SAT division, model class, winner.
Paradox incorporates the following features: Polynomial-time
clause splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
The main differences with Paradox 1.0 are: a better SAT-solver,
better memory behaviour, and a faster clause instantiation algorithm.
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, a proof of the transformation to clauses
is not given.
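The age-weight ratio selection of the given clause can be sketched as follows; the queue representation is an assumption for illustration.

```python
def pick_given(passive, picks_so_far, ratio=5):
    """Age-weight given-clause selection (sketch): out of every ratio + 1
    picks, one takes the oldest passive clause and the rest take the
    lightest, balancing breadth-first fairness against the heuristic.
    passive: mutable list of (age, weight, clause) tuples."""
    if picks_so_far % (ratio + 1) == 0:
        choice = min(passive, key=lambda c: c[0])   # oldest clause
    else:
        choice = min(passive, key=lambda c: c[1])   # lightest clause
    passive.remove(choice)
    return choice[2]
```

The occasional age-based pick guarantees every passive clause is eventually processed, so a bad weight heuristic cannot starve the search of a needed clause.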
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Given a problem, Prover9 adjusts its inference rules and strategy
according to the category of the problem
(HNE, HEQ, NNE, NEQ, PEQ, FNE, FEQ, UEQ)
and according to several other syntactic properties of the input clauses.
Terms are ordered by LPO for demodulation and for the inference rules,
with a simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation [MINISCOPE]; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
As this description is being written, the specific rules for
deciding the strategy have not been finalized. They will be
available from the URL given above by the time CASC occurs.
Finishes in the middle of the pack are anticipated in all categories
in which Prover9 competes (MIX, FOF, and UEQ).
When THEO participated in the 2004 CASC Competition, it used a learning
strategy described in this paragraph.
When given a theorem, it first created a list of potential ways to "weaken"
the theorem by weakening one of the clauses.
It then randomly selected one of the weakenings, tried to prove the weakened
version of the theorem, and then used the results from this effort to help
prove the given theorem.
A weakened version was created by modifying one clause by replacing a constant
or function by a variable or by deleting a literal.
Certain clauses from the proof of the weakened version were added to the
base clauses when THEO next attempted to prove the given theorem.
In addition, base clauses that participated in the proof of the weakened
version were placed in the set-of-support.
THEO then attempted to prove the given theorem with the revised set of base
clauses.
Over the last year, this learning strategy has been further developed as
described here.
When THEO is given a theorem, it first tries to prove a weakened version.
It does this for 50% of the allotted time.
If successful, it then attempts to prove the given theorem as described in
the previous paragraph.
If unsuccessful, however, THEO then weakens the given theorem further by
weakening an additional clause.
With two clauses now weakened, THEO then attempts to prove the given theorem
for 50% of the remaining time (25% of the originally allotted time).
If successful, THEO uses the results to attempt to prove the given theorem.
If unsuccessful, THEO picks two other weakenings and tries again for 50% of
the remaining time.
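The halving scheme described above can be sketched as follows. The `weaken` and `prove` interfaces and the stopping threshold are placeholders, not THEO's code.

```python
def schedule_weakenings(weaken, prove, total_time, min_fraction=0.05):
    """THEO-style allocation (sketch): spend half of the remaining time
    on each successively more weakened version of the theorem; a proof
    of a weakened version then guides the attempt on the original.
    weaken(n) builds a version with n clauses weakened;
    prove(problem, seconds) returns a proof or None."""
    remaining = total_time
    n_weakened = 1
    while remaining > total_time * min_fraction:
        slot = remaining / 2               # 50% of what is left
        proof = prove(weaken(n_weakened), slot)
        if proof is not None:
            return proof                   # feeds the attempt on the original
        remaining -= slot
        n_weakened += 1
    return None
```

The geometric halving means the first weakening gets half the budget, the second a quarter, and so on, so ever-weaker versions are tried in ever-shorter slices.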
Now certain weakened proofs are thrown out, as they generally seem to be
useless.
Weakened proofs that are shorter than three inferences are considered useless,
as are those shorter than five inferences that do not involve the negated
conclusion.
Attempts to improve the learning strategy are of major interest in the ongoing
development of THEO.
Octopus, the parallel version of THEO, uses variations of the described
learning strategy.
Vampire [RV02] 7.0 is an automatic theorem
prover for first-order classical logic.
Its kernel implements the calculi of ordered binary resolution
and superposition for handling equality.
The splitting rule and negative equality splitting are simulated by the
introduction of new predicate definitions and dynamic folding
of such definitions.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion
(optionally modulo commutativity),
subsumption resolution, rewriting by ordered unit equalities,
basicness restrictions and irreducibility of substitution terms.
The reduction orderings used are the standard Knuth-Bendix ordering and a
special non-recursive version of the Knuth-Bendix ordering.
A number of efficient indexing techniques are used to implement all major
operations on sets of terms and clauses. Run-time
algorithm specialisation is used to accelerate some costly
operations, e.g., checks of ordering constraints.
Although the kernel of the system works only with clausal normal forms, the
preprocessor component accepts a problem in the full first-order logic syntax,
clausifies it and performs a number of useful transformations before passing
the result to the kernel. When a theorem is proven, the system
produces a verifiable proof, which validates both the clausification
phase and the refutation of the CNF.
The current release features a built-in proof checker for the
clausifying phase, which will be extended to check complete proofs.
Vampire 7.0 is implemented in C++.
The supported compilers are gcc 3.2.x, gcc 3.3.x, and Microsoft Visual C++.
This version has been successfully compiled for Linux, but
has not been fully tested on Solaris and Win32.
It is available (conditions apply) from:
The Vampire kernel provides a fairly large number of features for strategy
selection.
The most important ones are:
The automatic mode of Vampire 7.0 is derived from extensive
experimental data obtained on problems from TPTP v2.6.0.
Input problems are classified taking into account simple syntactic properties,
such as being Horn or non-Horn, presence of equality, etc.
Additionally, we take into account the presence of some important kinds of
axioms, such as set theory axioms, associativity and commutativity.
Every class of problems is assigned a fixed schedule consisting
of a number of kernel strategies called one by one with different time limits.
A number of efficient indexing techniques are used to implement all major
operations on sets of terms and clauses.
Run-time algorithm specialisation is used to accelerate some costly operations,
e.g., checks of ordering constraints.
Although the kernel of the system works only with clausal normal forms, the
shell accepts a problem in the full first-order logic syntax, clausifies it
and performs a number of useful transformations before passing the result to
the kernel.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
The automatic mode of Vampire 8.0 is derived from extensive experimental
data obtained on problems from TPTP v3.0.1.
Input problems are classified taking into account simple syntactic properties,
such as being Horn or non-Horn, presence of equality, etc.
Additionally, we take into account the presence of some important kinds of
axiom, such as set theory axioms, associativity and commutativity.
Every class of problems is assigned a fixed schedule consisting of a number
of kernel strategies called one by one with different time limits.
Main differences between Vampire 8.0 and 7.0
Waldmeister 704 is a system for unit equational deduction.
Its theoretical basis is unfailing completion in the sense of
[BDP89] with refinements towards ordered
completion (cf. [AHL03]).
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a heuristic
assessment of passive facts [HJL99].
For an in-depth description of the system, see
[Hil03].
Waldmeister 704 improves over last year's version in several respects.
Firstly, the detection of redundancies in the presence of
associative-commutative operators has been strengthened (cf.
[Loe04]).
In a class of AC-equivalent equations, an element is redundant if each of
its ground instances can be rewritten, with the ground convergent rewrite
system for AC, into an instance of another element.
Instead of elaborately checking this kind of reducibility explicitly, it
can be rephrased in terms of ordering constraints and efficiently
approximated with a polynomial test.
Secondly, the last teething troubles of the implementation of the Waldmeister
loop have been overcome.
Thirdly, a number of strategies have been slightly revised.
The prover is coded in ANSI-C. It runs on Solaris, Linux, and now
also on MacOS X. In addition, it is now available for Windows users
via the Cygwin platform. The central data structures are: perfect
discrimination trees for the active facts; group-wise compressions for
the passive ones; and sets of rewrite successors for the conjectures.
Visit the Waldmeister web pages at:
The approach taken to control the proof search is to choose the search
parameters according to the algebraic structure given in the problem
specification [HJL99].
This is based on the observation that proof tasks sharing major parts of
their axiomatization often behave similarly.
Hence, for a number of domains, the influence of different reduction orderings
and heuristic assessments has been analyzed experimentally; and in most cases
it has been possible to identify a strategy uniformly superior on the whole
domain.
In essence, every such strategy consists of an instantiation of the first
parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and
an instantiation of the second parameter to one of the weighting functions
addweight, gtweight, or mixweight, which, if
called on an equation u = v, return
|u| + |v|,
|max(u, v)|, or
2 * |max(u, v)| + |min(u, v)|,
respectively, where |t| denotes the number of symbols
in t, and max and min are taken with respect to the reduction ordering.
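As a rough illustration, the three weighting functions can be sketched as follows. This is not Waldmeister's code; in particular, gtweight and mixweight compare the two sides by the reduction ordering, which this sketch approximates by symbol count.

```python
def size(term):
    """Number of symbols |t| in a term; a term is a symbol (string)
    or a compound (symbol, args) tuple."""
    if isinstance(term, tuple):
        _, args = term
        return 1 + sum(size(a) for a in args)
    return 1

def addweight(u, v):
    return size(u) + size(v)

def gtweight(u, v):
    # weight of the greater side; approximated here by symbol count
    return max(size(u), size(v))

def mixweight(u, v):
    big, small = max(size(u), size(v)), min(size(u), size(v))
    return 2 * big + small
```

For the equation f(x, g(y)) = g(y), the left side has 4 symbols and the right side 2, so addweight gives 6, gtweight 4, and mixweight 10.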
Waldmeister 704 is the CASC-J2 UEQ division winner.
Wgandalf uses some ideas, but no code, from the earlier Gandalf system of
the same author: essentially, it is a completely new system.
Wgandalf is available under GPL, and at some point in the future it will
be available from the site:
Otter 3.3
William McCune
Argonne National Laboratory, USA
mccune@mcs.anl.gov
Architecture
Implementation
http://www-unix.mcs.anl.gov/AR/otter/
Strategies
Expected Competition Performance
Paradox 1.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology and
Gothenburg University, Sweden
{koen,nik}@cs.chalmers.se
Architecture
Implementation
Strategies
Expected Competition Performance
Paradox 1.3
Koen Claessen, Niklas Sörensson
Chalmers University of Technology and
Gothenburg University, Sweden
{koen,nik}@cs.chalmers.se
Architecture
Paradox [CS03] is a
finite-domain model generator. It is based on a MACE-style
[McC94] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
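The instantiation step can be sketched as follows: variables range over a finite domain, every ground atom is mapped to a propositional variable, and each clause is ground-instantiated over all assignments. This is an illustrative toy (Paradox's actual pipeline also flattens function symbols first, which is omitted here):

```python
from itertools import product

def instantiate(clauses, domain_size):
    """clauses: lists of literals (sign, predicate, variable-tuple).
    Returns propositional clauses (signed ints) and the atom table."""
    atom_ids = {}                        # ground atom -> propositional var
    def atom(pred, args):
        key = (pred, args)
        if key not in atom_ids:
            atom_ids[key] = len(atom_ids) + 1
        return atom_ids[key]

    prop_clauses = []
    for lits in clauses:
        vars_ = sorted({v for _, _, args in lits for v in args})
        # one propositional clause per assignment of domain elements
        for assignment in product(range(domain_size), repeat=len(vars_)):
            env = dict(zip(vars_, assignment))
            prop_clauses.append(
                [sign * atom(pred, tuple(env[v] for v in args))
                 for sign, pred, args in lits])
    return prop_clauses, atom_ids

# P(x) v ~Q(x, y) over a 2-element domain yields 4 propositional clauses
cnf, atoms = instantiate([[(1, "P", ("x",)), (-1, "Q", ("x", "y"))]], 2)
```

The resulting CNF can then be handed to any SAT solver; a satisfying assignment corresponds to a finite model of the given domain size.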
Implementation
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Strategies
There is only one strategy in Paradox:
In this way, Paradox can be used both as a model finder and as an EPR solver.
Expected Competition Performance
Paradox 1.3 should perform slightly better than Paradox 1.0 and 1.1.
Prover9 July-2005
William McCune
Argonne National Laboratory, U.S.A.
mccune@mcs.anl.gov
Architecture
Prover9, Version July-2005, is a resolution/paramodulation prover
for first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03a].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
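The control flow of the given-clause algorithm, in its Otter-loop variant where the not-yet-given (passive) clauses also participate in simplification, can be sketched schematically. The infer, simplify, redundant, and pick callbacks stand in for the prover's actual inference and redundancy machinery; this is not Prover9's code.

```python
def given_clause_loop(initial, infer, simplify, redundant, pick):
    """Schematic Otter-loop saturation. Clauses are lists of literals;
    [] denotes the empty clause (a refutation)."""
    active, passive = [], list(initial)
    while passive:
        given = pick(passive)                 # heuristic clause selection
        passive.remove(given)
        # Otter loop: passive clauses are available for simplification too
        given = simplify(given, active + passive)
        if given == []:                       # empty clause derived
            return "unsatisfiable"
        if redundant(given, active):
            continue
        new = infer(given, active)            # e.g. resolution/paramodulation
        active.append(given)
        passive.extend(simplify(c, active + passive) for c in new)
    return "saturated"
```

In the alternative DISCOUNT loop, the passive clauses would be excluded from the simplification context.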
Implementation
Prover9 is coded in C, and it uses the LADR libraries
[McCURL].
Some of the code is descended from EQP [McC97].
(LADR has some AC functions, but Prover9 does not use them.)
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, and FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing
[Sch04b]
is used for forward and backward nonunit subsumption.
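As a toy illustration of discrimination tree indexing, terms can be indexed by their preorder symbol string with variables collapsed to a wildcard. This is the simple, non-perfect variant, and the lookup below retrieves only stored variants (terms equal up to variable renaming); LADR's perfect discrimination trees are considerably more refined and also retrieve generalizations:

```python
def preorder(term):
    """Preorder symbol string; variables (names starting with '?')
    collapse to '*'. Terms are symbols or (symbol, args) tuples."""
    if isinstance(term, tuple):
        sym, args = term
        out = [sym]
        for a in args:
            out += preorder(a)
        return out
    return ['*'] if term.startswith('?') else [term]

class DTree:
    def __init__(self):
        self.children, self.entries = {}, []
    def insert(self, term, value):
        node = self
        for sym in preorder(term):
            node = node.children.setdefault(sym, DTree())
        node = node  # leaf reached after consuming the whole string
        node.entries.append(value)
    def lookup(self, term):
        node = self
        for sym in preorder(term):
            if sym not in node.children:
                return []
            node = node.children[sym]
        return node.entries
```

For example, after inserting f(?x, a) under the key "rule1", looking up f(?y, a) finds it, while f(a, a) does not match.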
At the time of CASC, Prover9 will be available at:
http://www.mcs.anl.gov/~mccune/prover9/
Strategies
Like Otter, Prover9 has many strategies available; the following
statements apply to CASC-2005.
Expected Competition Performance
Some of the strategy development for CASC was done by experimentation with
the CASC-2004 competition "selected" problems.
(Prover9 has not yet been run on other TPTP problems.)
Prover9 is unlikely to challenge the CASC leaders, because
(1) extensive testing and tuning over TPTP problems has not been done,
(2) theories (e.g., ring, combinatory logic, set theory) are not recognized,
(3) term orderings and symbol precedences are not fine-tuned, and
(4) multiple searches with differing strategies are not run.
THEO JN05
Monty Newborn
McGill University, Canada
newborn@cs.mcgill.ca
Architecture
THEO [New01] is a resolution-refutation theorem
prover for first order logic.
It accepts theorems in either clausal form or FOL form.
THEO's inference procedures include binary resolution, binary factoring,
instantiation, demodulation, and hash table resolutions.
Implementation
THEO is written in C and runs under both Linux and FreeBSD.
It contains about 50,000 lines of source code.
Originally it was called The Great Theorem Prover.
Strategies
THEO uses a large hash table (16 million entries) to store clauses.
This permits complex proofs to be found, some as long as 500 inferences.
It uses what might be called a brute-force iteratively deepening depth-first
search looking for a contradiction while storing information about
clauses - unit clauses in particular - in its hash table.
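The hash-table idea can be sketched as follows: stored unit clauses allow a newly derived unit to be checked in constant time against a stored complement, yielding an immediate contradiction. This is an illustrative toy, not THEO's implementation (the real table has 16 million entries and more careful collision handling; this sketch simply overwrites on collision):

```python
TABLE_SIZE = 1 << 16   # scaled-down stand-in for THEO's 16M-entry table

class UnitTable:
    def __init__(self):
        self.slots = [None] * TABLE_SIZE

    def _index(self, lit):
        return hash(lit) % TABLE_SIZE

    def insert(self, lit):
        # a unit literal: (predicate, ground args, sign)
        self.slots[self._index(lit)] = lit

    def contradicts(self, lit):
        # the complement of a literal flips its sign; finding it stored
        # closes the branch with a "hash table resolution"
        pred, args, sign = lit
        comp = (pred, args, -sign)
        return self.slots[self._index(comp)] == comp
```

The equality check on retrieval makes the test safe under hash collisions; a collision can only lose information, never report a spurious contradiction.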
Expected Competition Performance
While the learning strategy has been quite successful, THEO is only
marginally better than it was last year.
Vampire 7.0
Alexandre Riazanov, Andrei Voronkov
University of Manchester, England
{riazanoa,voronkov}@cs.man.ac.uk
Architecture
Implementation
http://www.cs.man.ac.uk/~riazanoa/Vampire
Strategies
Expected Competition Performance
Vampire 7.0 is the CASC-J2 winner of the MIX and FOF divisions, in both the
assurance and proof classes.
Vampire 8.0
Andrei Voronkov
University of Manchester, England
voronkov@cs.man.ac.uk
Architecture
Vampire 8.0, [RV02,Vor05]
is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
The splitting rule and negative equality splitting are simulated by the
introduction of new predicate definitions and dynamic folding of such
definitions.
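The simulation of splitting by naming can be sketched: a clause whose literals partition into variable-disjoint parts C and D is replaced by C ∨ p and D ∨ ¬p for a fresh nullary predicate p. The following is a minimal illustration with hypothetical names, not Vampire's API:

```python
import itertools

_fresh = itertools.count()

def split(clause, var_of):
    """clause: list of literals; var_of(lit): iterable of variable names.
    Returns the split clauses, or [clause] if no split applies."""
    components = []                          # list of (literals, var set)
    for lit in clause:
        vs, merged, rest = set(var_of(lit)), [lit], []
        for comp_lits, comp_vars in components:
            if vs & comp_vars:               # shares a variable: merge
                merged += comp_lits
                vs |= comp_vars
            else:
                rest.append((comp_lits, comp_vars))
        components = rest + [(merged, vs)]
    if len(components) <= 1:
        return [clause]
    pieces, last_extra = [], []
    for comp_lits, _ in components[:-1]:
        p = f"sp{next(_fresh)}"              # fresh nullary predicate
        pieces.append(comp_lits + [(1, p, ())])
        last_extra.append((-1, p, ()))
    pieces.append(components[-1][0] + last_extra)
    return pieces
```

For example, P(x) ∨ Q(y) splits into P(x) ∨ sp0 and Q(y) ∨ ¬sp0, while P(x) ∨ Q(x) stays intact because both literals share x.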
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion
(optionally modulo commutativity), subsumption resolution, rewriting by
ordered unit equalities, and a lightweight basicness.
The CASC version uses the Knuth-Bendix ordering.
The lexicographic path ordering has been implemented recently but will not be
used for this CASC.
Implementation
Vampire 8.0 is implemented in C++. The supported compilers are gcc 3.3.x
and Microsoft Visual C++.
Strategies
The Vampire kernel provides a fairly large number of features for
strategy selection.
The most important ones are:
Expected Competition Performance
We expect Vampire 8.0 to improve over Vampire 7.0 in the FOF, MIX, and UEQ
divisions.
It is still not ready to compete in the EPR division.
Waldmeister 704
Jean-Marie Gaillourdet1,
Thomas Hillenbrand2, Bernd Löchner1
1Technische Universität Kaiserslautern, Germany,
2Max-Planck-Institut für Informatik Saarbrücken, Germany
waldmeister@informatik.uni-kl.de
Architecture
Implementation
http://www.waldmeister.org
Strategies
Expected Competition Performance
Wgandalf 0.1
Tanel Tammet
Tallinn Technical University, Estonia
tammet@cc.ttu.ee
Architecture
Wgandalf 0.1 is an early, pre-alpha version of a full first-order prover
intended to be useful in rule-based databases, semantic web applications, and
similar data-centric applications, while being a powerful conventional prover
at the same time.
The current pre-alpha version achieves none of these goals: it is usable only
as a conventional prover, and a fairly weak one at that.
It is not able to output a proof and it does not contain any
components for model building.
However, it does contain several different search strategies and it uses
time slicing to call these strategies one after another.
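The time-slicing scheme can be sketched as a scheduler that runs each strategy for a share of the overall budget and stops at the first success. This is illustrative control flow only, not Wgandalf's actual scheduler:

```python
import time

def time_sliced_prove(strategies, problem, total_seconds):
    """Run each strategy for an equal slice of the budget; a strategy
    returns a result (proof) or None on failure/timeout."""
    deadline = time.monotonic() + total_seconds
    slice_len = total_seconds / len(strategies)
    for strategy in strategies:
        budget = min(slice_len, deadline - time.monotonic())
        if budget <= 0:
            break
        result = strategy(problem, budget)   # strategy honors its budget
        if result is not None:
            return result
    return None                              # all slices exhausted
```

A common refinement, not shown here, is to give slices unequal lengths, ordered so that strategies likely to succeed quickly run first.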
Implementation
Wgandalf is implemented in C, using various GNU tools such as automake,
autoconf, bison, and flex.
It incorporates the scm Scheme interpreter of A. Jaffer and the
author's Hobbit Scheme-to-C compiler as an extension language.
Most of Wgandalf, however, is written directly in C and is
capable of functioning without the scm system.
Wgandalf has been developed and tested on Linux.
Further details about the architecture, strategies, and implementation will
be made available at:
http://www.ttu.ee/it/wgandalf
Strategies
Wgandalf uses a number of conventional strategies on top of ordinary
binary resolution: literal orderings of various kinds, set of
support, clause simplification methods, ordinary subsumption,
demodulation, and paramodulation. The basic resolution loop is
a DISCOUNT loop, not the OTTER loop. Only the key data about
the derivation history is kept, from which the actual clauses
are re-created when they are picked as given clauses. Wgandalf does
contain a few strategies of a more exotic kind, which will be
described in the Wgandalf documentation.
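The idea of keeping only derivation records and re-creating clauses on demand can be sketched as follows; the record format and the inference callback are hypothetical, chosen only to illustrate the trade-off of memory for recomputation:

```python
def recreate(record, initial_clauses, apply_inference):
    """Re-create a clause from its derivation record.
    record: ('input', i) for an input clause, or
            (rule, parent_record_1, parent_record_2) for a derived one."""
    if record[0] == 'input':
        return initial_clauses[record[1]]
    rule, left, right = record
    return apply_inference(
        rule,
        recreate(left, initial_clauses, apply_inference),
        recreate(right, initial_clauses, apply_inference))
```

For example, with a toy binary-resolution callback, the record ('res', ('input', 0), ('input', 1)) over the clauses p ∨ q and ¬p re-creates the resolvent q. Passive clauses thus cost only a small record each, at the price of re-running the inferences when a clause is finally selected.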
Expected Competition Performance
Wgandalf 0.1 is expected to perform significantly worse than the current
state-of-the-art provers.
However, it is likely to perform better than the classic provers and the older
Gandalf system of the same author.