Proof search in E is controlled by a literal selection strategy and an
evaluation heuristic. A variety of selection strategies (including,
e.g., no selection and selection of all negative literals)
is available. Evaluation heuristics can be constructed on the fly by
combining various parameterized primitive evaluation functions. A
simple automatic mode can select strategy, term ordering, and
heuristic based on simple problem characteristics.
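To illustrate the idea of building evaluation heuristics on the fly from parameterized primitives, the following sketch combines a symbol-counting weight with a literal count into one composite clause evaluation. All function names and parameter values here are invented for illustration and do not mirror E's actual implementation.

```python
def symbol_count_weight(fun_weight=2, var_weight=1):
    """Parameterized primitive: symbol-counting clause weight.
    Symbols starting with an uppercase letter count as variables."""
    def weight(clause):
        return sum(var_weight if sym[0].isupper() else fun_weight
                   for literal in clause for sym in literal)
    return weight

def literal_count_weight(scale=10):
    """Parameterized primitive: number of literals, scaled."""
    return lambda clause: scale * len(clause)

def combine(*weighted_primitives):
    """Combine primitives into one evaluation as a weighted sum;
    clauses with smaller values are selected first."""
    def evaluate(clause):
        return sum(c * prim(clause) for c, prim in weighted_primitives)
    return evaluate

# A clause is a list of literals; a literal is a list of symbol names.
clause = [["p", "f", "X"], ["q", "a"]]        # p(f(X)) | q(a)
heuristic = combine((1, symbol_count_weight()), (1, literal_count_weight()))
print(heuristic(clause))  # 29
```

New composites can thus be assembled at startup time without recompilation, which is what makes automatic strategy selection over many heuristics practical.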
New features since last year include a wider variety of literal
selection strategies, and an improved automatic mode that now also
switches between two different Knuth-Bendix orderings. A proof checker
is now available as well.
The program has been successfully installed under SunOS 4.3.x,
Solaris 2.x, HP-UX B 10.20, and various versions of
Linux. Sources are available freely from:
E's auto mode is optimized for performance on the complete TPTP
2.3.0. The optimization is based on a fairly large number of test runs
and consists of the selection of one of about 50 different strategies
for each problem.
E-SETHEO-FLOTTER additionally includes the program Flotter
[WGR96] as a
preprocessing module for transforming non-clausal formulae to
clausal form.
In contrast to the former P-SETHEO [Wol98], since version 99csp of
E-SETHEO the different strategies are run sequentially, one after the
other, according to an allocation of resources to the strategies
(so-called schedules), which have been computed from experimental data
using machine learning techniques, as described in [SW99].
Schedule selection depends on syntactic
characteristics of the input formula.
Version 2000csp of E-SETHEO employs a new strategy interweaving the lemma
mechanisms of E and SETHEO.
The program runs under Solaris 2.6. Sources are available freely.
Gandalf is a resolution prover which implements a number of different
basic strategies:
binary ordered resolution for several orderings,
versions of set-of-support resolution,
binary unit resolution, hyperresolution.
Enhancements are used for cutting off literals and combining different
strategies, in particular forward and backward reasoning, into one run.
Equality is handled by ordered paramodulation and demodulation.
Compared to earlier versions, the new version adds special strategies
for a limited kind of clause demodulation and for combining a
propositional prover with a first-order prover.
The basic idea of the automatic mode in Gandalf is time-slicing: Gandalf
selects a set of different search strategies,
allocates time to these and finally runs the strategies one after another.
The motivation is the following: since
it is very hard to determine a suitable strategy by some heuristic procedure,
it pays off to try a number of possibly suitable strategies.
The selection of strategies, and the percentage of time each receives,
is based in a somewhat ad hoc way on the characteristics
of the problem: the percentage of Horn and almost-Horn clauses,
the number of clauses, and the percentage of clauses whose
literals can be ordered using term and variable depths.
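The time-slicing idea can be sketched as follows: candidate strategies are chosen from simple syntactic characteristics of the clause set and then run one after another within their time slices. The strategy names, the Horn-ratio test, and the time fractions are invented for illustration, not Gandalf's actual selection rules.

```python
def choose_schedule(problem, time_limit):
    """Allocate time slices to strategies based on a simple syntactic
    characteristic of the clause set: the fraction of Horn clauses.
    Clauses are lists of signed integers (positive = positive literal)."""
    horn = sum(1 for c in problem if sum(1 for lit in c if lit > 0) <= 1)
    if horn == len(problem):       # purely Horn: favour hyperresolution
        slices = [("hyperresolution", 0.6), ("binary-unit", 0.4)]
    else:
        slices = [("ordered-binary", 0.5), ("set-of-support", 0.3),
                  ("hyperresolution", 0.2)]
    return [(name, frac * time_limit) for name, frac in slices]

def run_schedule(problem, schedule, run_strategy):
    """Run the strategies sequentially until one of them succeeds."""
    for name, seconds in schedule:
        result = run_strategy(name, problem, seconds)
        if result is not None:
            return name, result
    return None

problem = [[1, -2], [-1], [2, -3]]             # all clauses are Horn
print(choose_schedule(problem, 300))
# [('hyperresolution', 180.0), ('binary-unit', 120.0)]
```

The payoff of this design is robustness: no single heuristic guess has to be right, only the pool of candidates.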
Some relevant publications are:
[Tam97,
TS98,
Tam98].
Ordinary Gandalf is a resolution prover which implements a number of
different basic strategies: binary ordered
resolution for several orderings, versions of set-of-support resolution,
binary unit resolution, hyperresolution.
Enhancements are used for cutting off literals and combining different
strategies, in particular forward and backward
reasoning, into one run. Equality is handled by ordered paramodulation
and demodulation.
GandalfSat uses time-slicing similarly to ordinary Gandalf.
It tries hyperresolution, ordered resolution based on
results from [FLTZ], and two kinds of
finite model building algorithms (Falcon-like and MACE-like) in an
attempt to show satisfiability.
Some relevant publications are:
[FL+93,
Tam97,
TS98].
Methods are those used by humans, and are executed according to a
plan decided by metarules and heuristics. Methods include splitting
theorems into two or more subtheorems (conjunctive conclusion or
disjunctive hypotheses), treatment of universal, existential or
disjunctive conclusion, treatment of existential or disjunctive
hypotheses (one at a time when it is decided by heuristic rules),
replacement of the conclusion by its definition. Hypotheses are never
replaced by their definition. Instead, rules are built by metarules
from definitions and used to deduce new hypotheses from old ones.
Universal hypotheses are not added. Instead, rules are built from them,
just as from the definitions. Functional symbols may be used, but an automatic
creation of intermediate objects allows deep subformulae to be flattened
and treated as if the concepts were defined by predicate symbols.
Positive properties are privileged, in hypotheses and in conclusions,
so negation receives a special treatment. The successive steps of
a proof may be forward deduction (deduce new hypotheses from old
ones), backward deduction (replace the conclusion by a new one) or
refutation (only if the conclusion is a negation).
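The interplay of forward deduction (rules built from definitions deduce new hypotheses) and the conclusion test can be sketched as a toy saturation loop. The rule format, the fact strings, and the example definition rule are purely illustrative and much simpler than MUSCADET's rule language.

```python
def forward_step(hypotheses, rules):
    """Forward deduction: a rule is a (premises, conclusion) pair, and
    fires when all its premises are current hypotheses."""
    new = set()
    for premises, conclusion in rules:
        if premises <= hypotheses and conclusion not in hypotheses:
            new.add(conclusion)
    return hypotheses | new

def prove(hypotheses, conclusion, rules, max_steps=10):
    """Alternate forward deduction with the trivial backward test
    'is the conclusion now among the hypotheses?'."""
    hypotheses = set(hypotheses)
    for _ in range(max_steps):
        if conclusion in hypotheses:
            return True
        enlarged = forward_step(hypotheses, rules)
        if enlarged == hypotheses:
            return False                       # no rule applies: stuck
        hypotheses = enlarged
    return False

# A rule built from a definition: subset(A,B) & in(x,A) -> in(x,B).
rules = [({"subset(A,B)", "in(x,A)"}, "in(x,B)")]
print(prove({"subset(A,B)", "in(x,A)"}, "in(x,B)", rules))  # True
```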
The system is also able to work with second order statements. It
may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and
[Pas93]. These two possibilities
are not used while working with the TPTP Library.
The prover is available for many platforms, in particular, a Windows 95/98/NT
version with a neat graphical user interface is now also provided.
This year's version of our unit equality theorem prover is for the most
part identical to Waldmeister 798, which won the CASC-15 UEQ competition
with an enormous lead [SS99]. Since last year's description is included
in this collection, only the differences are reported here.
Recently, Waldmeister has successfully been integrated into the Omega proof
development environment [BC+97].
Moreover, a convenient WWW interface has been established.
It is reachable via:
Bliksem 1.10
H. de Nivelle
Max Planck Institut für Informatik,
Germany
nivelle@mpi-sb.mpg.de
Architecture
Bliksem is a resolution and paramodulation based theorem prover.
It implements many different orders, including many orders that
define decision procedures. The automatic selection of strategies
is poor. A special feature of Bliksem is that it is able to output
totally specified proofs. After processing, these proofs can be
checked by Coq.
Implementation
Bliksem is written in portable C. High priority has been given
to portability. Bliksem has been compiled successfully using
cc, gcc, and djpcc. The data structures have been carefully
chosen after benchmark tests on the basic operations.
Bliksem can be obtained from:
http://www.mpi-sb.mpg.de/~bliksem
Expected Competition Performance
The system does not differ from last year's entry.
So it is reasonable to expect the same competition performance
as last year.
References
E 0.6
S. Schulz
Institut für Informatik, Technische Universität München,
Germany
schulz@informatik.tu-muenchen.de
Architecture
E 0.6 [Sch99] is a purely equational
theorem prover. The calculus used by E combines superposition (with
selection of negative literals) and rewriting. For the special case
that only unit clauses are present, it degenerates into unfailing
completion. For the Horn case E can simulate various positive
unit strategies. No special rules for non-equational literals have
been implemented, i.e., resolution is simulated via paramodulation and
equality resolution.
Implementation
E is implemented in ANSI C, using the GNU C compiler. The most distinctive
feature of the implementation is the maximally shared term
representation. This includes parallel rewriting for all instances of
a particular subterm. A second important feature is the use of
perfect discrimination trees with age and size constraints
for rewriting and unit-subsumption.
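The core of a discrimination-tree index can be sketched as follows: terms are flattened to preorder symbol sequences and shared along tree paths. Only exact (variant) lookup is shown; a real perfect discrimination tree also retrieves generalizations for rewriting and carries the age and size constraints mentioned above. The term encoding and class layout are illustrative.

```python
class DiscTree:
    """A minimal discrimination-tree index over flattened terms."""
    def __init__(self):
        self.children = {}     # symbol -> DiscTree subtree
        self.entries = []      # data stored at accepting nodes

    def insert(self, flatterm, data):
        node = self
        for sym in flatterm:   # walk/extend the shared path
            node = node.children.setdefault(sym, DiscTree())
        node.entries.append(data)

    def lookup(self, flatterm):
        node = self
        for sym in flatterm:
            node = node.children.get(sym)
            if node is None:
                return []
        return node.entries

# f(g(a), X) in preorder, with variables normalized to *1, *2, ...
index = DiscTree()
index.insert(("f", "g", "a", "*1"), "rule-1")
index.insert(("f", "g", "b", "*1"), "rule-2")
print(index.lookup(("f", "g", "a", "*1")))  # ['rule-1']
```

The shared prefixes are what make retrieval cheap: the two rules above share the path f, g and diverge only at the third symbol.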
http://wwwjessen.informatik.tu-muenchen.de/~schulz/WORK/eprover.html
Expected Competition Performance
As last year, E is expected to perform well in all
categories it participates in, with a particularly strong showing for
Horn problems. Very little effort has been spent on the performance for
unit equality problems, however, and it is unlikely that E can come close to
Waldmeister's performance in this category.
References
E-SETHEO 2000csp
R. Letz, S. Schulz, G. Stenz
Institut für Informatik, Technische Universität München,
Germany
{letz,schulz,stenz}@informatik.tu-muenchen.de
Architecture
E-SETHEO is a compositional theorem prover for formulae in first order
clause logic, combining the systems E
[Sch99] and SETHEO
[MI+97].
It incorporates different calculi and proof procedures
like superposition, model elimination and semantic trees (the DPLL
procedure). Furthermore, the system contains transformation techniques
which may split the formula into independent subparts or which may
perform a ground instantiation.
Finally, advanced methods for controlling and optimizing the combination
of the subsystems are applied.
Implementation
Reflecting the diversity of the contained systems, the modules of
E-SETHEO are implemented in different programming languages like C,
Prolog, Scheme, and Shell tools.
Expected Competition Performance
We expect E-SETHEO to perform well in all categories it participates in.
References
Gandalf c-2.1
T. Tammet
Tallinn Technical University, Estonia
tammet@cc.ttu.ee
Architecture
Gandalf version "c" is the version of Gandalf for classical logic.
Special versions for intuitionistic logic and
type theory exist but are not covered here.
Implementation
Gandalf is implemented in Scheme, using the scm interpreter developed
by A. Jaffer and the Scheme-to-C compiler Hobbit developed by T. Tammet
for the scm system.
Expected Competition Performance
Gandalf is likely to do well in the non-Horn-with-equality subclass of
problems. It is likely to have mediocre
performance in the general mixed class, and poor performance in the pure
equality class.
References
GandalfSat 1.0
T. Tammet
Tallinn Technical University, Estonia
tammet@cc.ttu.ee
Architecture
GandalfSat is a special version of the Gandalf
theorem prover, optimised for satisfiability checking. It contains
essentially the same code as Gandalf, but uses different default search
strategies.
Implementation
Gandalf is implemented in Scheme, using the scm interpreter developed
by A. Jaffer and the Scheme-to-C compiler Hobbit developed by T. Tammet
for the scm system.
Expected Competition Performance
Since GandalfSat has only recently been developed, we cannot reliably
predict its performance. However, we expect
it to be among the better provers for satisfiability checking.
References
MACE 1.4
W. McCune
Mathematics and Computer Science, Argonne National Laboratory
mccune@mcs.anl.gov
Architecture
MACE [McC00] searches for
small finite models.
The program
applies to statements in first-order (unsorted) logic with equality.
Roughly speaking, MACE works as follows. For a
given domain size, an equivalent propositional problem is constructed
and a Davis-Putnam-Loveland decision procedure is applied. MACE
typically cannot handle clauses with many variables or deeply nested
terms.
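The Davis-Putnam-Loveland procedure applied to the propositional encoding can be sketched compactly as below. This is a generic DPLL with unit propagation and branching, not MACE's counting-based variant; clauses are lists of signed integers.

```python
def simplify(clauses, assignment):
    """Remove satisfied clauses and falsified literals; None on conflict."""
    assigned = set(assignment)
    out = []
    for clause in clauses:
        if any(lit in assigned for lit in clause):
            continue                            # clause already satisfied
        reduced = [lit for lit in clause if -lit not in assigned]
        if not reduced:
            return None                         # clause falsified
        out.append(reduced)
    return out

def dpll(clauses, assignment=()):
    """Return a satisfying set of literals, or None if unsatisfiable."""
    clauses = simplify(clauses, assignment)
    if clauses is None:
        return None
    if not clauses:
        return set(assignment)
    for clause in clauses:                      # unit propagation
        if len(clause) == 1:
            return dpll(clauses, assignment + (clause[0],))
    lit = clauses[0][0]                         # branch on a literal
    return dpll(clauses, assignment + (lit,)) or \
           dpll(clauses, assignment + (-lit,))

print(dpll([[1, 2], [-1, 3], [-3]]))  # a model, e.g. {2, -1, -3}
```

For model finding, each first-order atom instantiated over the finite domain becomes one propositional variable, so a found assignment directly encodes an interpretation.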
Implementation
MACE is written in C.
Its Davis-Putnam-Loveland procedure uses a counting algorithm to do the
unit resolutions, and forward subsumption is not applied.
Expected Competition Performance
MACE unexpectedly won the SAT division at CASC-16, in part because
the difficulty ratings were biased towards MACE. MACE is expected
to do well at CASC-17.
References
Acknowledgments: MACE was motivated by previous work of
John Slaney, Mark Stickel, and Hantao Zhang.
MUSCADET 2.2
D. Pastre
Université René Descartes (Paris 5),
UFR de Mathématiques et Informatique,
France
pastre@math-info.univ-paris5.fr
Architecture
MUSCADET is a theorem prover based upon Natural Deduction, in the
sense of [Ble71] and
[Pas78].
It is built as a knowledge-based system, that is, methods are
expressed by rules interpreted by an inference engine. Some rules are
universal mathematical knowledge and are given to the system. Other
rules are specific to some concepts and are built by metarules from
the definitions of these concepts. The notions of hypothesis and
conclusion to be proved are privileged. The first conclusion is the
first order statement of the conjecture.
Implementation
The first version of MUSCADET was a real rule-based system
[Pas89]. Rules and metarules were
written in a declarative language defined for this and interpreted by
an inference engine written in Pascal. The competition version is
entirely written in SWI-Prolog. Rules are written as declarative
Prolog clauses. Metarules are written as sets of Prolog clauses, more
or less declarative. The Pascal inference engine is replaced by the
Prolog interpreter and some procedural Prolog clauses.
Expected Competition Performance
MUSCADET will perform best on problems
that manipulate many concepts and in which all statements (conjectures,
definitions, axioms) are expressed in a manner similar to the
practice of humans, especially of mathematicians. It will perform
poorly on problems using few concepts but large and deep
formulae leading to many splittings.
References
Otter-MACE 437
W. McCune
Mathematics and Computer Science, Argonne National Laboratory
mccune@mcs.anl.gov
Architecture
Otter [McC94,McC98a,MW97] searches for
proofs, and MACE [McC98b] searches for
small finite models. Both programs
apply to statements in first-order (unsorted) logic with equality. The
two programs are independent and work by completely different
methods. (In practice, the programs are complementary.) Version 3.0.5
of Otter has been used. Otter is based on resolution and
paramodulation applied to clauses; an Otter search uses the "given
clause algorithm" and typically involves a large database of clauses;
subsumption and demodulation play an important role. Version 1.3.2 of
MACE has been used. Roughly speaking, MACE works as follows. For a
given domain size, an equivalent propositional problem is constructed
and a Davis-Putnam-Loveland decision procedure is applied. MACE
typically cannot handle clauses with many variables or deeply nested
terms. Otter-MACE 437 (= Otter 3.0.5 + MACE 1.3.2) is a simple
combination of the two programs.
Implementation
Both programs are written in C. Otter uses shared data structures for
clauses and terms, and it uses indexing for resolution,
paramodulation, forward and backward subsumption, forward and backward
demodulation, and unit conflict. MACE is much simpler than Otter; its
Davis-Putnam-Loveland procedure uses a counting algorithm to do the
unit resolutions, and forward subsumption is not applied.
Otter-MACE is a shell script that runs Otter for a few seconds,
then (if Otter has not found a refutation) MACE for a few seconds,
then (if MACE has not found a model) Otter again for the remaining time.
Expected Competition Performance
These versions of Otter and MACE are essentially the same as the
versions entered in the previous two CASC competitions. Both programs
generally perform well at CASC, but no top prizes have been won since
CASC-13. We do not expect any winning performances in CASC-16. We
see our entry as a benchmark for measuring improvements in other
systems.
References
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff,
and Rusty Lusk contributed to the development of Otter. MACE was motivated
by previous work of John Slaney, Mark Stickel, and Hantao Zhang.
Otter 3.1
W. McCune
Mathematics and Computer Science, Argonne National Laboratory
mccune@mcs.anl.gov
Architecture
Otter
[McC94,
McC98,
MW97] searches for proofs.
Otter applies to statements in first-order (unsorted) logic with equality.
Otter is based on resolution and
paramodulation applied to clauses; an Otter search uses the ``given
clause algorithm'' and typically involves a large database of clauses;
subsumption and demodulation play an important role.
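The structure of the given clause algorithm can be sketched as the loop below: clauses move from a passive set (set of support) to an active (usable) set, and each iteration selects a given clause, infers against the active set, and simplifies the results. The toy inference and simplification functions are stand-ins, not Otter's code.

```python
def given_clause_loop(sos, infer, simplify, max_iterations=1000):
    """sos: initial passive clauses; infer(given, usable) -> new clauses;
    simplify(clause, usable) -> clause, or None if redundant."""
    usable = []
    sos = list(sos)
    for _ in range(max_iterations):
        if not sos:
            return "satisfiable", usable       # saturated without proof
        sos.sort(key=len)                      # pick the lightest clause
        given = sos.pop(0)
        if given == []:                        # empty clause: proof found
            return "proof", usable
        usable.append(given)
        for new in infer(given, usable):
            new = simplify(new, usable)
            if new is not None:
                sos.append(new)
    return "timeout", usable

def infer(given, usable):
    """Toy binary resolution over clauses of signed integers."""
    out = []
    for other in usable:
        for lit in given:
            if -lit in other:
                out.append(sorted(set([l for l in given if l != lit] +
                                      [l for l in other if l != -lit])))
    return out

def simplify(clause, usable):
    """Toy forward subsumption: drop exact duplicates only."""
    return None if clause in usable else clause

print(given_clause_loop([[1], [-1]], infer, simplify)[0])  # proof
```

In a real search the database of clauses grows large, which is why the indexing for subsumption and demodulation described under Implementation is so important.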
Implementation
Otter is written in C. Otter uses shared data structures for
clauses and terms, and it uses indexing for resolution,
paramodulation, forward and backward subsumption, forward and backward
demodulation, and unit conflict.
Expected Competition Performance
Otter's original automatic mode was used for CASC-13 (the first CASC).
A secondary automatic mode, derived from some tuning with respect to
TPTP, was used for CASC-14, -15, and -16. The secondary automatic
mode is less complete for nonunit equality problems but it proves a few
more TPTP theorems. For CASC-17, we return to the original automatic
mode, which reflects no tuning to TPTP. Otter did very well in
CASC-13, but it has changed very little in automatic capability since
then. Powerful new programs have been introduced in the past four
years, and old competitors have been greatly improved, so no winning
performances are expected from Otter.
References
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff,
and Rusty Lusk contributed to the development of Otter.
SCOTT V
K. Hodgson and J. Slaney
Automated Reasoning Project, Australian National University, Australia
{kahlil,jks}@arp.anu.edu.au
Architecture
SCOTT (Semantically Constrained OTTER)
[HS99,
HS00] is an extension
of OTTER [McC94] and uses the same
inference rules and overall proof
search structure. This extension incorporates FINDER
[Sla94] (a
finite model generator) as a subsystem of OTTER in order to determine
consistent subsets of the clause space. These consistent subsets are
used to guide the proof search by increasing the likelihood that
certain necessary inferences are made
[HS00]. (This can be seen loosely to generalise the set-of-support
strategy.) Not all consistent
subsets are created equal, with different collections providing
different levels of guidance, and so a number of strategies for
determining which collections provide the best guidance have been
developed. These incorporate theoretical estimates of the guidance
provided and the corresponding cost incurred, enabling us to manage
the trade-off much more effectively than in previous versions of SCOTT
[HS99].
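The guidance idea can be illustrated as follows: a finite interpretation (here just a truth assignment to ground atoms, standing in for a FINDER-style model) witnesses the consistency of one clause subset, and clauses falsified by the model form the support set from which given clauses are preferred. The clause encoding and the example model are invented for illustration.

```python
def true_in_model(clause, model):
    """A clause (list of (sign, atom) pairs) is true in the model
    if at least one of its literals evaluates to true."""
    return any(model.get(atom, False) == sign for sign, atom in clause)

def partition_by_model(clauses, model):
    """Split clauses into a model-consistent subset and the support
    set of falsified clauses, which are preferred for inference."""
    consistent, support = [], []
    for clause in clauses:
        (consistent if true_in_model(clause, model) else support).append(clause)
    return consistent, support

model = {"p": True, "q": False}
clauses = [[(True, "p")],                 # p      : true in the model
           [(False, "p"), (True, "q")],   # -p | q : false
           [(True, "q"), (True, "r")]]    # q | r  : false (r unassigned)
consistent, support = partition_by_model(clauses, model)
print(len(consistent), len(support))  # 1 2
```

Since resolving two model-true clauses yields another model-true clause, restricting inferences to touch the support set keeps the search aimed at a contradiction.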
Implementation
SCOTT is written in C. It uses the program OTTER to perform the proof
search, the program FINDER to witness the consistency of particular
sets of clauses, and additional code to determine which clauses should
be chosen for inference. The system may be obtained from:
http://arp.anu.edu.au/software/scott/
Expected Competition Performance
Preliminary results show that SCOTT-V outperforms SCOTT-IV, so we
expect an improvement over the performance at CASC-16. However,
searching for models and testing clauses against them can be costly,
and so SCOTT may perform poorly if the time-limit is too severe.
References
SPASS 1.0.0TPTP
C. Weidenbach
Max-Planck-Institut für Informatik
weidenb@mpi-sb.mpg.de
Architecture
SPASS [WA+99] is an automated theorem prover
for first-order logic with equality.
It is a saturation based prover employing superposition, sorts and
splitting [Wei99]. The current
version 1.0.0 offers a variety of further inference and reduction rules
including hyperresolution, unit-resulting resolution, various variants
of paramodulation, and a terminator [AO83].
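The splitting rule can be illustrated in miniature: when a clause divides into variable-disjoint parts, the prover may case-split and treat the parts independently. The literal encoding below (predicate name plus a tuple of variable names) is a simplification for illustration only.

```python
def split_clause(clause):
    """clause: list of (predicate, tuple_of_variables) literals.
    Returns the variable-disjoint components the clause splits into."""
    components = []                  # each component: (variable_set, literals)
    for pred, variables in clause:
        lit, vars_ = (pred, variables), set(variables)
        overlapping = [c for c in components if c[0] & vars_]
        for c in overlapping:        # merge all components sharing a variable
            components.remove(c)
            vars_ |= c[0]
        merged_lits = [lit] + [l for c in overlapping for l in c[1]]
        components.append((vars_, merged_lits))
    return [lits for _, lits in components]

# p(X) | q(X,Y) | r(Z) splits into {p(X), q(X,Y)} and {r(Z)}.
clause = [("p", ("X",)), ("q", ("X", "Y")), ("r", ("Z",))]
print(len(split_clause(clause)))  # 2
```

Each component then becomes its own proof obligation, which can shrink the search space considerably on clauses with independent parts.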
Implementation
SPASS is implemented in ANSI C and relies on our internal
library supporting specific data structures and algorithms like, e.g., indexing
or orderings (KBO, RPOS). The code is documented, flexible and was
designed to be reusable. The code should compile on any platform
supporting an ANSI C compiler.
Expected Competition Performance
The SPASS competition version 1.0.0TPTP is tuned towards
TPTP problems. In particular, the standard automatic configuration module
is replaced by a module adapted to the TPTP. We hope that the new
version can improve over the version 0.95TPTP that entered the previous
competition.
References
Vampire 0.0
A. Riazanov, A. Voronkov
Computer Science Department, The University of Manchester
{riazanoa,voronkov}@cs.man.ac.uk
Architecture
Vampire is a resolution-based theorem prover for first-order classical
logic. The current version implements ordered binary resolution and
hyperresolution, superposition and some simplifications. For proof
search Vampire uses a saturation-based procedure similar
to that of Otter [McC94].
One interesting feature of Vampire is the so-called
limited resource strategy.
The idea is as follows. Given a time limit, Vampire estimates which
clauses (the currently eligible clauses) can be processed by
the end of the time limit. Each newly generated clause is checked for
current eligibility; if it is not eligible, Vampire discards the
clause. The estimation takes the current strategy into account and
tries to predict how clause generation will develop by the time limit.
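A schematic version of the limited resource strategy: from the observed processing rate, estimate how many queued clauses can still be handled before the deadline, and discard new clauses that fall outside that eligible prefix. The functions and numbers below are invented to show the shape of the idea, not Vampire's actual estimation.

```python
def currently_eligible(queue_size, elapsed, processed, time_limit):
    """Estimate how many queued clauses can still be processed,
    extrapolating the observed throughput to the remaining time."""
    if processed == 0:
        return queue_size                  # no data yet: keep everything
    rate = processed / elapsed             # clauses per second so far
    return int(rate * (time_limit - elapsed))

def admit(new_clause_rank, queue_size, elapsed, processed, time_limit):
    """Keep a new clause only if its rank in the passive queue lies
    within the currently eligible prefix; otherwise discard it."""
    return new_clause_rank < currently_eligible(queue_size, elapsed,
                                                processed, time_limit)

# 200 clauses processed in 20 s; 280 s remain -> roughly 2800 more.
print(admit(1000, 5000, 20.0, 200, 300.0))   # True
print(admit(3000, 5000, 20.0, 200, 300.0))   # False
```

Discarding the ineligible clauses early saves the memory and simplification effort they would otherwise consume without ever being selected.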
Implementation
To gain efficiency the most costly set-at-a-time
operations, such as resolution, superposition and subsumption,
are implemented using a combination of indexing and compilation.
We use a variant of code trees [Vor95]
for forward subsumption and forward demodulation,
path indexing [Sti89] for backward subsumption
and a variant of discrimination trees [McC92]
for resolution and superposition.
All used indexing techniques provide so-called "perfect filtering"
[Graf96,RSV99].
For storing persistent clauses a tree-like representation of
terms is used. The sharing is aggressive. For other operations, different
versions of flatterms are used.
Vampire is implemented in C++ and compiled by gcc.
Expected Competition Performance
We expect Vampire to perform well, especially on the problems without equality.
References
Vampire 1.0
A. Riazanov, A. Voronkov
Computer Science Department, The University of Manchester
{riazanoa,voronkov}@cs.man.ac.uk
Architecture
Vampire 1.0 is an automatic theorem prover for first-order classical logic.
It implements the calculi of ordered binary resolution,
hyperresolution and superposition for handling equality.
The splitting rule is simulated by introducing new predicate symbols.
A number of standard redundancy criteria and simplification techniques
are used for pruning the search space: subsumption,
tautology deletion, subsumption resolution and rewriting by ordered
unit equalities. The only term ordering used in Vampire at the moment
is a special version of the Knuth-Bendix ordering which allows
efficient approximation algorithms for solving ordering constraints.
The competition version will exploit a number of simple preprocessing
techniques, such as elimination of simple predicate and function
definitions. A number of efficient indexing techniques are used
to implement all major operations on sets of terms and clauses,
such as an improved version of code trees
[Vor95]
for forward subsumption and a combination of path indexing
[Sti89] and database joins for backward
subsumption.
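A minimal Knuth-Bendix ordering for ground terms illustrates the kind of term ordering referred to above: compare by weight first, then by a precedence on the top symbols, then lexicographically on the arguments. The weights, precedence, and restriction to ground terms are simplifications for illustration; Vampire's version additionally handles variables and ordering constraints.

```python
def weight(term, w=lambda f: 1):
    """KBO weight: sum of symbol weights over the whole term.
    A term is a (symbol, argument_list) pair."""
    f, args = term
    return w(f) + sum(weight(t, w) for t in args)

def kbo_greater(s, t, precedence):
    """True if s > t in this simplified ground KBO; earlier position
    in the precedence list means a bigger symbol."""
    ws, wt = weight(s), weight(t)
    if ws != wt:
        return ws > wt
    (fs, s_args), (ft, t_args) = s, t
    if fs != ft:
        return precedence.index(fs) < precedence.index(ft)
    for a, b in zip(s_args, t_args):     # lexicographic on arguments
        if a != b:
            return kbo_greater(a, b, precedence)
    return False

prec = ["f", "g", "a"]                   # precedence f > g > a
a, ga, fa = ("a", []), ("g", [("a", [])]), ("f", [("a", [])])
print(kbo_greater(("f", [ga]), fa, prec))  # True: weight 3 > 2
print(kbo_greater(fa, ga, prec))           # True: equal weight, f > g
```

Because the weight comparison settles most cases with simple arithmetic, KBO lends itself to the efficient approximation of ordering constraints mentioned above.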
Implementation
Vampire 1.0 is implemented in C++ and can be compiled by 2.91 or newer
versions of gcc. The binaries for Solaris and Linux are
available from the authors, for details see:
http://www.cs.man.ac.uk/~riazanoa/Vampire/
Expected Competition Performance
We cannot make reliable predictions of Vampire's performance at CASC-17.
References
Waldmeister 799
T. Hillenbrand, A. Jaeger, B. Löchner, and A. Buch
FB Informatik, Universität Kaiserslautern
{hillen,jaeger,loechner}@informatik.uni-kl.de,arnim.buch@sdm.de
Architecture
Waldmeister is an implementation of unfailing Knuth-Bendix
completion [BDP89].
Our approach to controlling the proof search is
to choose the selection strategy and reduction ordering according to the
algebraic structure given in the problem
specification [HJL99].
Since Waldmeister 798, the
corresponding part of the system has been developed further into a robust
component that is easy to maintain. In addition, the derivation of useful
control knowledge has been systematized, which also gave rise to
progress on some problem domains.
http://www-avenhaus.informatik.uni-kl.de/waldmeister
From
this URL further documents and the complete Waldmeister
distribution containing an easy-to-use version together with problems
and a manual are available.
Implementation
No major changes took place. Some portions of code have been slightly
revised, e.g., a number of time-consuming inner loops have carefully
been reworked.
Expected Competition Performance
We expect some improvements in comparison with last year's version,
depending on problem domains and proof tasks.
References
Waldmeister 600
T. Hillenbrand, B. Löchner, A. Jaeger, and A. Buch
FB Informatik, Universität Kaiserslautern
waldmeister@informatik.uni-kl.de
Architecture
Waldmeister 600 [HJL99] is an
implementation of unfailing
Knuth-Bendix completion [BDP89].
The system architecture is
built on a stringent distinction of active facts (selected
rewrite rules) and passive facts (critical pairs): Firstly,
only the active facts induce the normalization relation.
Secondly, the passive facts are normalized not permanently,
but on generation only. The saturation cycle is
parameterized by a reduction ordering and a heuristic
assessment of passive facts. The control parameters are
chosen according to the algebraic structure given in the
problem specification
[HJL99]. Recent developments include
the integration of the kernel strategy
[HL00] as well as
extensions towards ordered completion, but are still in
progress.
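The cycle structure described above can be sketched in miniature: passive facts are normalized only when selected, the selected fact is oriented by a simple ordering and joins the active rules, and the active rules alone define the rewrite relation. To stay short, the sketch rewrites ground strings rather than terms and omits critical-pair computation, so it is only a structural illustration of the saturation cycle, not of completion itself.

```python
def normalize(term, rules):
    """Rewrite a ground 'term' (a string) to normal form with the
    active rules; only active facts induce the rewrite relation."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in term:
                term = term.replace(lhs, rhs, 1)
                changed = True
    return term

def completion_cycle(passive, assess=len):
    """Repeatedly select the heuristically best passive fact,
    normalize it on selection, and orient it into an active rule."""
    active = []
    while passive:
        passive.sort(key=lambda eq: assess(eq[0] + eq[1]))
        l, r = passive.pop(0)                  # best passive fact
        l, r = normalize(l, active), normalize(r, active)
        if l == r:
            continue                           # joinable: redundant
        if len(l) < len(r):                    # orient bigger -> smaller
            l, r = r, l
        active.append((l, r))
    return active

print(completion_cycle([("aaa", "a"), ("aa", "a"), ("a", "a")]))
# [('aa', 'a')]
```

Note how two of the three input facts never become rules: one is trivial and one is joinable by the active rule, which is exactly the saving that normalizing passive facts on selection is meant to deliver.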
Implementation
The system is implemented in ANSI-C and runs under SunOS,
Solaris and Linux. Refined perfect discrimination trees are
employed to index the active facts, whereas the passive
facts are stored in a compressed representation. The
hypotheses are blown up to sets of rewrite successors and
sometimes predecessors. A step-by-step proof object is
constructed at run-time. Waldmeister can be found on the Web
at:
http://www-avenhaus.informatik.uni-kl.de/waldmeister
Expected Competition Performance
Powerful as the Waldmeister system already is, we
hope to improve over last year's version in some problem
domains.
References