Darwin 1.3
Peter Baumgartner1,
Alexander Fuchs2,
Cesare Tinelli2
1National ICT Australia, Australia,
Peter.Baumgartner[@]nicta.com.au
2The University of Iowa, USA
{fuchs,tinelli}[@]cs.uiowa.edu
Architecture
Darwin [BFT04, BFT06a]
is an automated theorem prover for first order clausal logic.
It is an implementation of the Model Evolution Calculus
[BT03].
The Model Evolution Calculus lifts the propositional DPLL procedure to
first-order logic. One of the main motivations for this approach
was the possibility of migrating to the first-order level some of
those very effective search techniques developed by the SAT community
for the DPLL procedure.
The current version of Darwin implements first-order versions of unit propagation inference rules analogously to a restricted form of unit resolution and subsumption by unit clauses. To retain completeness, it includes a first-order version of the (binary) propositional splitting inference rule.
Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved towards a model or until a refutation is found.
Improvements to the previous version include additional preprocessing steps, less memory requirements, and lemma learning [BFT06b].
Darwin is implemented in OCaml and has been tested under various Linux distributions. It is available from:
http://goedel.cs.uiowa.edu/Darwin/
FM-Darwin 1.3
Peter Baumgartner1,
Alexander Fuchs2,
Cesare Tinelli2
1National ICT Australia, Australia,
Peter.Baumgartner[@]nicta.com.au
2The University of Iowa, USA
{fuchs,tinelli}[@]cs.uiowa.edu
Architecture
FM-Darwin [BF+06] is a finite model finder for
first order clausal logic with equality.
The system tries to find a finite domain model for sizes 1,2,.. by equisatisfiable preserving transformation into corresponding function-free clause sets, which are decided by the Darwin prover [BFT06a].
FM-Darwin operates similarly to the finite model finder Paradox [CS03]. Unlike Paradox, it is not based on reduction to propositional satisfiability problems, but on reduction to satisfiability problems of (not necessarily ground) function-free clause sets instead.
Similar to Paradox FM-Darwin uses clause splitting, term definitions, static symmetry reduction, sort inference, and lemmas. In constrast, clause splitting and term definitions are only applied in a restricted way, that is for variable disjunct clause and ground terms, as ground instantiation is not performed and thus the exponential increase in the size of the clause set does not happen.
FM-Darwin is available from:
http://goedel.cs.uiowa.edu/Darwin/
Like Paradox, FM-Darwin is a complete method for function-free clause sets, and can also detect unsatisfiability if the totality axioms are not used in a refutation.
DCTP 10.21p
Gernot Stenz
Technische Universität München, Germany
stenzg[@]informatik.tu-muenchen.de
Architecture
DCTP 1.31 [Ste02a] is an automated theorem prover
for first order clause logic.
It is an implementation of the disconnection calculus described in
[Bil96,LS01,Ste02b].
The disconnection calculus is a proof confluent and inherently
cut-free tableau calculus with a weak connectedness condition.
The inherently depth-first proof search is guided by a literal selection
based on literal instantiatedness or literal complexity and a heavily
parameterised link selection.
The pruning mechanisms mostly rely on different forms of
variant deletion and unit based strategies.
Additionally the calculus has been augmented by full tableau pruning.
DCTP 10.21p is a strategy parallel version using the technology of E-SETHEO [SW99] to combine several different strategies based on DCTP 1.31.
We are currently integrating a range of new techniques into DCTP which are mostly based on the results described in [LS04], as well as a certain form of unit propagation. We are hopeful that these improvements will be ready in time for CASC-J2.
E and EP 0.99
Stephan Schulz
Institut für Informatik, Technische Universität, Germany
schulz[@]eprover.org
Architecture
E 0.99 [Sch02,Sch04a]
is a purely equational theorem prover. The core proof procedure operates on
formulas in clause normal form, using a calculus that combines
superposition (with selection of negative literals) and rewriting. No
special rules for non-equational literals have been implemented, i.e.,
resolution is simulated via paramodulation and equality
resolution. The basic calculus is extended with rules for AC
redundancy elemination, some contextual simplification, and
pseudo-splitting. The latest version of E also supports simultaneous
paramodulation, either for all inferences or for selected inferences.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e. a strict separation of active and passive facts. Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of preprogrammed literal selection strategies, many of which are only experimental. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
The prover uses a preprocessing step to convert formulas in full first order format to clause normal form. This step may introduce (first-order) definitions to avoid an exponential growth of the formula. Preprocessing also unfolds equational definitions and performs some simplifications on the clause level.
The automatic mode determines literal selection strategy, term ordering, and search heuristic based on simple problem characteristics of the preprocessed clausal problem.
EP 0.99 is just a combination of E 0.99 in verbose mode and a proof analysis tool extracting the used inference steps.
The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:
http://www.eprover.orgEP 0.99 is a simple Bourne shell script calling E and the postprocessor in a pipeline.
E distinguishes problem classes based on a number of features, all of which have between 2 and 4 possible values. The most important ones are:
For classes above a threshold size, we assign the absolute best heuristic to the class. For smaller, non-empty classes, we assign the globally best heuristic that solves the same number of problems on this class as the best heuristic on this class does. Empty classes are assigned the globally best heuristic. Typically, most selected heuristics are assigned to more than one class.
EP 0.99 will be hampered by the fact that it has to analyse the inference step listing, an operation that typically is about as expensive as the proof search itself. Nevertheless, it should be competitive among the CNF* and FOF* systems.
The website for the project is at this address:
Z(x1,...,xn) must have one of the following three forms:
During search, Geo searches for a model of the geometric formulas
by backtracking. At each choice point, after all subbranches have
been refuted, it derives a new rule of Type 1, which refutes the
current model attempt without backtracking.
In this way, Geo learns during backtracking.
In case Geo finds a finite model, it simply prints the model. In case no
model exists, it will eventually learn the rule -> FALSE,
so that it is able to output a refutation proof.
The final aim of geometric resolution, and of Geo, is to obtain a prover
that is both good at finding proofs, and at finding finite models.
MathServe currently integrates the ATP systems Otter 3.3, EP 0.91,
SPASS 2.2 and Vampire 8.0. All ATP services get a problem description
in the new TPTP format (TPTP Library v3.1.1) and a time limit in
seconds as an input. The problem is transformed into the provers' input
format using the tptp2X utility. The result specifies the status of the
given problem with one of the statuses defined in the SZS Status
Ontology [SZS04].
Since MathServe does not form an ATP system on its own it only
enters the Demonstration Division of the CASC.
The system is also able to work with second order statements.
It may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89,
Pas93].
These two possibilities are not used while working with the TPTP Library.
Equinox 1.0
Koen Claessen
Chalmers University of Technology, Sweden
koen[@]cs.chalmers.se
Architecture
Equinox is an experimental new theorem prover for pure first-order logic with
equality. It finds ground proofs of the input theory, by solving successive
ground instantiations of the theory using an incremental SAT-solver. Equality
is dealt with using a Nelson-Oppen framework.
Implementation
The main part of Equinox is implemented in Haskell using the GHC compiler.
Equinox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's
Foreign Function Interface.
Strategies
There is only one strategy in Equinox:
Expected Competition Performance
Equinox is still in its infancy. There should however be problems that
it can solve that few other provers can handle.
Faust 1.0
Yury Puzis
University of Miami, USA
yury.puzis[@]gmail.com
Architecture
Faust 1.0 is a fully concurrent automatic theorem prover for fist
order logic. Faust implements binary resolution, full resolution,
superposition, equality resolution, forward subsumption resolution,
simplifications such as propositional tautology deletion, factoring
of identical literals, equality tautology deletion, trivial equality
resolution, as well as forward and backward subsumption. Faust has
been designed as a multi-threaded application to allow concurrent
execution of logically incompatible strategies. The idea that governs
the design of the architecture is that collaborative specialist
strategies can be more scalable then powerful multipurpose strategies
executed in sequence. For this purpose a 3-layered architecture was
designed:
Implementation
Faust is implemented in C++ and supports gcc 4.0+. All strategies (except
the base one) are implemented by inheriting from each other. The JJParser
library is used in order to parse the initial problem file.
http://umsis.miami.edu/~ypuzis/FAUST-AutomatedTheoremProver.html
Strategies
Implemented strategies include model resolution, unit resolution
and semantic resolution. The Strategy Manager attempts to show
contradiction by:
Common to all strategies is an Otter loop, modified to correctly
and efficiently handle formulae that are found as a consequence of
kernel-level interaction with a another strategy. Ordering methods
include age-weight ratio, KBO, and other heuristics. Kernel-level
strategy interaction includes:
Expected Competition Performance
Faust has a relatively extensive ammunition of inference rules and
is implemented efficiently. However, the advantages of concurrent,
collaborative execution
of strategies are far from being fully exploited in the current
version. Therefore, the expected performance is modest.
Geo 2006i
Hans de Nivelle1,
Jia Meng2
1Max-Planck Institut für Informatik, Germany
nivelle[@]mpi-inf.mpg.de
2National ICT Australia, Australia
jiameng[@]rsise.anu.edu.au
Architecture
Geo is based on geometric resolution.
A geometric formula is a formula with form:
Forall x1, ..., xn
A1 /\ ... /\ Ap /\ v1 != w1 /\ ... /\ vq != wq -> Z(x1,...,xn).
The Ai must be non-equality atoms.
The arguments of the atoms Ai must be among the variables
x1, ..., xn.
There are no function symbols, and no constants in the atoms Ai.
The inequalities vj != wj must have their arguments among
x1, ..., xn.
As input, Geo accepts geometric formulas and first-order formulas.
First-order formulas are transformed into geometric formulas.
During this translation, function symbols have to be removed, and
replaced by relations.
Implementation
Geo is written in C++. We try to keep the code readable
and reasonably efficient at the same time.
Strategies
Currently, Geo has only one strategy: It searches for a model by exhaustive
search, and learns a new rule at each choice point, after it has
refuted all subbranches.
There are still many uncertainties in the way geometric resolution
has to be implemented, and Geo is only a first attempt.
Expected Competition Performance
Geo was completed too short before the competition.
We have not done big testing, so we cannot predict the performance.
iProver 0.1
Konstantin Korovin
The University of Manchester, England
korovin[@]cs.man.ac.uk
Architecture
iProver is based on the instantiation calculus [GK03],
which is complete for first-order logic.
iProver successively instantiates a given set of clauses, and uses a ground
SAT solver for checking whether the current ground abstraction of clauses
is satisfiable.
If the ground solver finds unsatisfiability then the prover terminates,
proving unsatisfiability of the initial set of clauses; otherwise the model
for ground clauses is used to guide further instantiations.
The saturation process is implemented as a modification of a given
clause algorithm.
We use non-perfect discrimination trees for the unification index.
The following redundancy eliminations are implemented:
blocking non-proper instantiations; dismatching constraints
[GK04]; removing duplicates.
Equality is dealt with (internally) by adding the necessary axioms of
equality.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses
C version of MiniSat implemented by Niklas Sörensson.
The interface from MiniSat-C to OCAML is implemented by Ahmad Abu-Khazneh.
Strategies
Various literal selection functions are implemented preferring
negative/positive, shallow/deep literals.
Parameters of the instantiation loop include: frequency of calling the SAT
solver and selection of the given clause based on its properties.
Expected Competition Performance
This is the first implementation, without many optimizations
and more importantly without equality being built in.
Therefore the expected performance is very modest.
MathServe 0.80
Jürgen Zimmer, Serge Autexier
Universität des Saarlandes, Germany
{jzimmer,serge}[@]ags.uni-sb.de
Architecture
MathServe is a framework for integrating
reasoning systems as Web Services into a networked environment. The
functionality of these Reasoning Web Services is captured in Semantic
Web Service descriptions using the OWL-S [MarURL]
upper ontology for semantic web services. The MathServe Broker is a
middle-agent which knows all available reasoning services and answers
queries for client applications. Given a problem description, the
MathServe Broker can automatically retrieve services and to combine
services in case one single service is not sufficient to tackle a
problem. Our Broker performs a "semantic" best match by analysing
incoming problems and choosing the best service available for that
problem.
Implementation
MathServe is implemented in Java and uses an Apache Tomcat web server
and the AXIS package to offer reasoning systems as Web Services. All
services are accessible programmatically using WSDL descriptions of
their interface. ATP systems are integrated via Java wrappers that
manage the translation of the input problems into the prover's input
format and the translation of resulting resolution proofs into TPTP
format.
Strategies
We collected data on the performance of the underlying ATP systems on
all the Specialists Problem Classes (SPCs) [SS01]
of the TPTP Library 3.1.1. The OWL-S descriptions of all ATP services
have been annotated with this data. The MathServe Broker determines the
SPC of an incoming TPTP problem and uses the performance data to choose
a suitable prover for that problem.
Expected Competition Performance
At CASC-20, MathServe 0.62 could solve 392 out of 660 problems. Partly,
this performance was due to the older versions of EP (0.82) and Vampire
(7.0) used. Furthermore, MathServe could not handle big problems with
descriptions bigger than 2MB. We improved the problem handling of
MathServe and it should now be able to handle big problems. We also
integrated newer versions of the ATP systems SPASS, E and Vampire
as well as the systems DCTP, Waldmeister and Paradox.
Muscadet 2.6
Dominique Pastre
Université René Descartes (Paris 5), France
pastre[@]math-info.univ-paris5.fr
Architecture
The Muscadet theorem prover [Pas89] is a
knowledge-based system.
It is based on Natural Deduction, following the terminology of
[Ble71] and [Pas78],
and uses methods which resembles those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
Implementation
Muscadet 2 [Pas01] is implemented in SWI-Prolog.
Rules are written as declarative Prolog clauses.
Metarules are written as sets of Prolog clauses, more or less declarative.
The inference engine includes the Prolog interpreter and some procedural Prolog
clauses.
Muscadet 2 is available from:
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
Strategies
There are specific strategies for existential, universal, conjunctive or
disjunctive hypotheses and conclusions.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new
hypotheses from old ones), backward deduction (replace the conclusion by
a new one) or refutation (only if the conclusion is a negation).
Expected Competition Performance
The best performances of Muscadet will be for problems
manipulating many concepts in which all statements (conjectures,
definitions, axioms) are expressed in a manner similar to the
practice of humans, especially of mathematicians
[Pas02,Pas06].
It will have poor performances for problems using few concepts but large
and deep formulas leading to many splittings.
Octopus 2006
Monty Newborn
McGill University, Canada
newborn[@]cs.mcgill.ca
Architecture
Octopus 2006 is a parallel version of Theo and is designed to run on as many
computers as are available.
For this competition, we expect to run on between 100 and 200 computers.
Octopus's single processor version, Theo [New01], is
a resolution-refutation theorem prover for first order logic.
It accepts theorems in either clausal form or FOL form.
Theo's inference procedures include binary resolution, binary factoring,
instantiation, demodulation, and hash table resolutions.
Otter 3.3 [McC03a] is an ATP system for
statements in first-order (unsorted) logic with equality.
Otter is based on resolution and paramodulation applied to clauses.
An Otter search uses the "given clause algorithm", and typically involves
a large database of clauses; subsumption and demodulation play an important
role.
Otter is written in C.
Otter uses shared data structures for clauses and terms, and it uses
indexing for resolution, paramodulation, forward and backward subsumption,
forward and backward demodulation, and unit conflict.
Otter is available from:
Otter's original automatic mode, which reflects no tuning to the TPTP
problems, will be used.
Otter has been entered into CASC as a stable benchmark against which
progress can be judged (there have been only minor changes to Otter since
1996 [MW97], nothing that really affects
its performace in CASC).
This is not an ordinary entry, and we do not hope for Otter to do well
in the competition.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and
Rusty Lusk contributed to the development of Otter.
Paradox incorporates the following features: Polynomial-time
clause splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
The main differences with Paradox 1.0 are: a better SAT-solver,
better memory behaviour, and a faster clause instantiation algorithm.
See the description of Paradox 1.3
for general information.
When Theo participated in the 2004 CASC Competition, it used a learning
strategy [NW04] briefly described in this
paragraph.
When given a theorem, it first created a list of potential ways to "weaken"
the theorem by weakening one of the clauses.
It then randomly selected one of the weakenings, tried to prove the weakened
version of the theorem, and then used the results from this effort to help
prove the given theorem.
A weakened version was created by modifying one clause by replacing a
constant or function by a variable or by deleting a literal.
Certain clauses from the proof of the weakened version were added to the
base clauses when Theo next attempted to prove the given theorem.
In addition, base clauses that participated in the proof of the weakened
version were placed in the set-of-support.
Theo then attempted to prove the given theorem with the revised set of base
clauses.
The learning strategy was further developed for the 2005 competition as
described here.
When Theo is given a theorem, it first tries to prove a weakened version.
It does this for 50% of the allotted time.
If successful, it then attempts to prove the given theorem as described in
the previous paragraph.
If unsuccessful, however, Theo then weakens the given theorem further by
weakening an additional clause.
With two clauses now weakened, Theo then attempts to prove the given theorem
for 50% of the remaining time (25% of the originally allotted time).
If successful, Theo uses the results to attempt to prove the given theorem.
If unsuccessful, Theo picks two other weakenings and tries again for 50% of
the remaining time.
Now certain weakened proofs are thrown out, as they generally seem to be
useless.
Weakened proofs that are shorter than three inferences are considered
useless, as are those that are less than five and do not involve the negated
conclusion.
Attempts to improve the learning strategy are of major interest in the
ongoing development of Theo.
For this 2006 competition, efforts have been made to avoid attempting
certain weakenings in the first place.
These weakenings tend to produce useless proofs. As a trivial example,
weakening equal(X,X) to equal(X,Y) is highly likely to
produce a trivial proof.
In addition to work on the learning strategy, the procedure for transforming
FOFs to CNF has been improved.
A number of efficient indexing techniques are used to implement all major
operations on sets of terms and clauses.
Run-time algorithm specialisation is used to accelerate some costly operations,
e.g., checks of ordering constraints.
Although the kernel of the system works only with clausal normal forms, the
shell accepts a problem in the full first-order logic syntax, clausifies it
and performs a number of useful transformations before passing the result to
the kernel.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
The automatic mode of Vampire 8.0 is derived from extensive experimental
data obtained on problems from TPTP v3.0.1.
Input problems are classified taking into account simple syntactic properties,
such as being Horn or non-Horn, presence of equality, etc.
Additionally, we take into account the presence of some important kinds of
axiom, such as set theory axioms, associativity and commutativity.
Every class of problems is assigned a fixed schedule consisting of a number
of kernel strategies called one by one with different time limits.
Main differences between Vampire 8.0 and 7.0
No system description supplied.
However, see the description of Vampire 8.0 for
general information.
Minor changes have been made, including a bugfix in the FOF to
CNF conversion.
No system description supplied.
However, see the description of
Waldmeister 704 for general information.
Otter 3.3
William McCune
Argonne National Laboratory, USA
mccune[@]mcs.anl.gov
Architecture
Implementation
http://www-unix.mcs.anl.gov/AR/otter/
Strategies
Expected Competition Performance
Paradox 1.3
Koen Claessen, Niklas Sörensson
Chalmers University of Technology and
Gothenburg University, Sweden
{koen,nik}[@]cs.chalmers.se
Architecture
Paradox [CS03] is a
finite-domain model generator. It is based on a MACE-style
[McC94] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
Implementation
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Strategies
There is only one strategy in Paradox:
In this way, Paradox can be used both as a model finder and as an EPR solver.
Expected Competition Performance
Paradox 1.3 is the CASC-20 SAT division winner.
Paradox 2.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology and
Gothenburg University, Sweden
{koen,nik}[@]cs.chalmers.se
Architecture
Paradox 2.0ualpha is the beginning of a complete rewrite of Paradox 1.3.
Paradox 2.0-alpha does not have all the features yet that Paradox 1.3 has.
Expected Competition Performance
Paradox 2.0-alpha will probably perform slightly worse than Paradox 1.3.
Theo 2006
Monty Newborn
McGill University, Canada
newborn[@]cs.mcgill.ca
Architecture
Theo [New01] is a resolution-refutation theorem
prover for first order logic.
It accepts theorems in either clausal form or FOL form.
Theo's inference procedures include binary resolution, binary factoring,
instantiation, demodulation, and hash table resolutions.
Implementation
Theo is written in C and runs under both LINUX and FREEBSD.
It contains about 50000 lines of source code.
Originally it was called The Great Theorem Prover.
Strategies
Theo uses a large hash table (16 million entries) to store clauses.
This permits complex proofs to be found, some as long as 500 inferences.
It uses what might be called a brute-force iteratively deepening depth-first
search looking for a contradiction while storing information about
clauses - unit clauses in particular - in its hash table.
Expected Competition Performance
Theo is marginally better than it was last year.
Vampire 8.0
Andrei Voronkov
University of Manchester, England
voronkov[@]cs.man.ac.uk
Architecture
Vampire 8.0, [RV02,Vor05]
is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
The splitting rule and negative equality splitting are simulated by the
introduction of new predicate definitions and dynamic folding of such
definitions.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion
(optionally modulo commutativity), subsumption resolution, rewriting by
ordered unit equalities, and a lightweight basicness.
The CASC version uses the Knuth-Bendix ordering.
The lexicographic path ordering has been implemented recently but will not be
used for this CASC.
Implementation
Vampire 8.0 is implemented in C++. The supported compilers are gcc
gcc 3.3.x, and Microsoft Visual C++.
Strategies
The Vampire kernel provides a fairly large number of features for
strategy selection.
The most important ones are:
Expected Competition Performance
Vampire 8.0 is the CASC-20 MIX and FOF divisions winner.
Vampire 8.1
Andrei Voronkov
University of Manchester, England
voronkov[@]cs.man.ac.uk
Waldmeister 704
Jean-Marie Gaillourdet1,
Thomas Hillenbrand2, Bernd Löchner1
1Technische Universität Kaiserslautern, Germany,
2Max-Planck-Institut für Informatik Saarbrücken, Germany
waldmeister[@]informatik.uni-kl.de
Architecture
Waldmeister 704 is a system for unit equational deduction.
Its theoretical basis is unfailing completion in the sense of
[BDP89] with refinements towards ordered
completion (cf. [AHL03]).
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a heuristic
assessment of passive facts [HJL99].
For an in-depth description of the system, see
[Hil03].
Waldmeister 704 improves over last year's version in several respects.
Firstly, the detection of redundancies in the presence of
associative-commutative operators has been strenghtened (cf.
[Loe04a]).
In a class of AC-equivalent equations, an element is redundant if each of
its ground instances can be rewritten, with the ground convergent rewrite
system for AC, into an instance of another element.
Instead of elaborately checking this kind of reducability explicitly, it
can be rephrased in terms of ordering constraints and efficiently be
approximated with a polynomial test.
Secondly, the last teething troubles of the implementation of the Waldmeister
loop have been overcome.
Thirdly, a number of strategies have slightly been revised.
Implementation
The prover is coded in ANSI-C. It runs on Solaris, Linux, and newly
also on MacOS X. In addition, it is now available for Windows users
via the Cygwin platform. The central data strucures are: perfect
discrimination trees for the active facts; group-wise compressions for
the passive ones; and sets of rewrite successors for the conjectures.
Visit the Waldmeister web pages at:
http://www.waldmeister.org
Strategies
The approach taken to control the proof search is to choose the search
parameters according to the algebraic structure given in the problem
specification [HJL99].
This is based on the observation that proof tasks sharing major parts of
their axiomatization often behave similar.
Hence, for a number of domains, the influence of different reduction orderings
and heuristic assessments has been analyzed experimentally; and in most cases
it has been possible to distinguish a strategy uniformly superior on the whole
domain.
In essence, every such strategy consists of an instantiation of the first
parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and
an instantiation of the second parameter to one of the weighting functions
addweight, gtweight, or mixweight, which, if
called on an equation , return
,
, or
,
respectively, where denotes the number of symbols
in .
Expected Competition Performance
Waldmeister 704 is the CASC-20 UEQ division winner.
Waldmeister 806
Thomas Hillenbrand1, Bernd Löchner2
1Max-Planck-Institut für Informatik Saarbrücken, Germany
2Technische Universität Kaiserslautern, Germany,
waldmeister[@]informatik.uni-kl.de