AI-SETHEO
Joachim Draeger
Institut für Informatik, Technische Universität München
draeger@informatik.tu-muenchen.de
Architecture
AI-SETHEO is a hybrid system combining a connection tableaux theorem
prover (SETHEO [GL+94,
MI+97]) and a lemma generator
(DELTA [Sch94]).
This combines aspects of bottom-up
reasoning with the top-down reasoning in connection tableaux theorem
provers. For avoiding a performance breakdown due to the
overwhelming number of brute-force generated lemmata a lemma selection
procedure rejects both trivial and irrelevant lemmata.
ANDP
Dafa Li
Department of Applied Mathematics, Tsinghua University
lidafa@tsinghua.edu.cn
Architecture
ANDP [Li97] is a natural deduction system
for first-order logic without equality, adapted from Gentzen's system.
There are five connectives for ANDP, negation, conjunction, disjunction,
implication, and equivalence, whose precedence decreases in that order.
The rules of inference for the connectives are
MP (Modus Ponens),
MT (Modus Tollens),
LDS and RDS (Left and Right Disjunctive Syllogism),
CP (Conditional Proof),
ADD (Addition),
CNJUN (Conjunction),
SIMP (Simplication),
De Morgan's rule,
Double negation,
IMP (Implication),
and
DILEMMA (cases).
There are two quantifiers for ANDP,
the universal quantifier (Ax) and the existential quantifier (Ex).
The rules of inference for the quantifiers are
UG (Universal Generalization),
EG (Existential Generalization),
US (Universal Specialization),
and
ES (Existential Specialization).
There are two unification algorithms for treating quantifiers.
One is for introducing quantifiers, the other is for eliminating quantifiers
without Skolem functions.
Search strategies: First use the rules for connectives then for quantifiers. First reason from bottom to top, then from top to bottom.
Other strategies: Subsumption with quantifiers, first eliminate existential quantifiers then universal quantifiers, minimal scopes of quantifiers.
E 0.1
S. Schulz
Institut für Informatik, Technische Universität München
schulz@informatik.tu-muenchen.de
Architecture
E 0.1 is a purely equational theorem prover. It is not
primarily intended as an independent prover, but as an preprocessor
implementing the bottom-up part of the METOP
calculus [Mos96], with SETHEO
[MI+97] to
implement the top-down part in a future combined system. However, as
the calculus employs inference rules very similar to those of
superposition calculi [BG94], it
is also possible to use E as a prover on its own.
The implemented calculus combines ordered paramodulation with
rewriting. For the special case that only unit clauses are present,
this calculus degenerates into unfailing completion
[BDP89]
as implemented in DISCOUNT [DKS97]. As
we expect SETHEO to deal with the non-equational part of the theory,
no special rules for non-equational literals have been implemented,
i.e. resolution is simulated via paramodulation and equality
resolution.
A particular feature of E is the term representation. Terms are
maximally shared (within two classes of clauses) to minimize memory
consumption, and we employ graph rewriting techniques to optimize
simplification. As an additional feature, E has a very clean interface
for adding and developing new search control heuristics.
DISCOUNT 2.1
S. Schulz, F. Brandt
Institut für Informatik, Technische Universität München
{schulz,brandtf}@informatik.tu-muenchen.de
Architecture
DISCOUNT/TSM is a sequential version of the well-known distributed
equational theorem prover DISCOUNT [DKS97],
using a new learning
heuristic to control the selection of certain inference steps.
DISCOUNT is based on unfailing completion, extended to deal with
arbitrarily quantified goals. The basic algorithm repeatedly selects
an unprocessed equation, rewrites it to normal form w.r.t. the already
processed equations, and uses it to build critical pairs (new
unprocessed equations). The equation is then used for interreduction
and added to the set of processed equations. Interleaved with this
completion process, critical goals are generated (by
paramodulating facts into goals), and normalized. The process stops
when either a goal has been proven or all facts and goals have been
processed.
The selection of the next equation to be processed is controlled by a term space map (TSM) [Sch98], a generalized term evaluation tree [DS96]. A TSM is a recursive structure capturing characteristic distributions of term nodes in representative patterns of useful and useless equations collected from previous proofs. Suitable proofs for learning are selected using tree-based similarity criteria on TSMs and some simple features of the problem specification.
Bliksem 1.00
H. de Nivelle
CWI Amsterdam
nivelle@cwi.nl
Architecture
Bliksem is a resolution based theorem prover. It supports ordered
resolution, factoring, ordered paramodulation, unit simplification
and demodulation.
Bliksem supports many orders (KBO, RPO, LPO, simple symbol orders,
non-lifaftble orders) and resolution decision procedures.
At this moment it has decision procedures for the E+-class,
for the prefix class exists* forall forall exists*,
and for the guarded fragment of first order logic
[FL+93,
AvN96,
DeN94].
Additionally Bliksem also has the equality strategies of
[BG90].
There is no hyperresolution in Bliksem, but it can be simulated
using an appropriate order.
Bliksem supports various transformations from first order logic to clausal normal form. In particular it supports various structural and non-structural transformations [ABFL94].
t[100] = p | e[100] = 105 |
t[101] = X | e[101] = 102 |
t[102] = s | e[102] = 105 |
t[103] = s | e[103] = 105 |
t[104] = Y | e[104] = 105 |
We use discrimination trees for forward subsumption, forward demodulation, and forward unit simplification. In the discrimination trees the terms are stored in a normalized manner. In addition we use bit filters as a filter for subsumption. There are no optimized techniques for the other derivation rules.
The basic idea of the automatic mode in Gandalf - also used during
the CASC-14 competition - is time-slicing: Gandalf selects a set of
different search strategies, allocates time to these and finally runs
the strategies one after another.
The selection of strategies and the percentage of time they receive is
in a somewhat ad hoc way based on the characteristics of the problem:
percentage of Horn and almost-Horn clauses, the number of clauses,
percentage of clauses where literals can be ordered using term and
variable depths.
There exist also versions of Gandalf for intuitionistic logic
and type theory, see [Tam96] and
[TS96].
More information about Gandalf can be found in
[Tam98].
The source, binaries, manual and other materials are available at
Gandalf is likely to do well in the UEQ, SAT and FOF categories,
although it is unlikely it will win any of these.
M. Newborn
PHERBY is a parallel version of HERBY. PHERBY uses
a bigger set of heuristics than HERBY, as well as cooperative
computing in determining the quality of atoms to be used. This
is the first version of the program.
PHERBY is expected to perform better than the serial version HERBY.
However, it is expected to show much weaker performance compared to
resolution-refutation theorem provers. Out of the 72 theorems that
were used for testing ATP programs at the CADE-14 competition,
PHERBY solved 24 in less than 300 seconds.
PUHR tableaux are trees in which every node is labeled with
a ground atom (unit) or with a positive ground clause.
PUHR tableaux for a given clause set are built starting from the tree
consisting of a single unlabeled node, by repeatedly applying
the following two rules to the leaf of a branch.
The PUHR rule resolves the body literals of an input clause
simultaneously with the units on the branch and extends the branch
with a new leaf, labeled with the (hyper-)resolvent.
The splitting rule extends the branch with n new leaf nodes
B1,...,Bn, if the branch contains a positive
ground clause B1 v...v Bn, but none of
the units B1 or ... or Bn.
If the empty clause is selected, the branch is marked as closed.
If none of these rules is applicable, then the set of units of a branch
represents a Herbrand model of the input clause set.
If all branches are closed, then the input clause set is unsatisfiable.
Complement splitting, a variant of the above splitting rule, is
used to enhance efficiency and to guarantee minimality of the leftmost
generated model [BY96].
A newly appended unit is additionally labeled by the complements of the
positive labels of all its right siblings.
A branch is now marked as closed, too,
if the set of the labels of its units contains two complementary literals.
Incremental evaluation avoids repeated applications of the PUHR rule
with the same clauses and accelerates its applicability test:
If a unit is added to the branch in a splitting step, then exactly those
PUHR steps using this unit have to be applied as the next steps.
This leads to significantly more efficient implementations of the PUHR
tableau method [SG96].
The set of positive ground clauses for which
splitting has not yet been applied, is partitioned into nesting
and non-nesting clauses.
The latter clauses are ordered with increasing length and are
preferred to the former ones, which must be enqueued to ensure fairness.
The splitting rule splits clauses, if possible,
into variable disjoint parts that can be independently refuted.
This rule is not necessary for completeness of the calculus but it makes
SPASS an efficient decision procedure for several decidable subclasses
of first-order logic [HS97].
We put a lot of effort in a clause normal form translation that
produces small clause normal forms.
The major techniques are renaming and improved Skolemization
[NRW98].
The SPASS distribution containing the source code for SPASS is freely
available from the SPASS homepage. The code should compile and run on
any machine where the usual GNU tools are available. This has been
tested on SUN Sparc workstations under Solaris and SunOS, on personal
computers under Linux and Solaris and on Dec Alpha workstations under
Digital Unix.
Octopus is a parallel verison of TGTP, with one master and
as many slaves as available - 25 are expected for the CADE-15
competition. The master computer reads the given theorem
from disk, spawns slaves and then sends the theorem to the
slaves. Each slave carries out the same search process,
but each palces a different limit on the number of literals
in a clause and the number of constants, functions, and variables
in a literal. In addition, some processors use the set-of-support
strategy and others do not.
Octopus modifies only slightly the code of TGTP for the parallel
implementation. It is expected to run on a network of 20-25
pentium computers running at approximately 150Mhz. The
processors communicate with eachother using Parallel Virtual
Machine (pvm).
Octopus is slightly stronger than TGTP, solving 40 theorems
at CADE-14 and currently able to solve another three or four.
In the near future we plan to implement binary/hyperresolution with
clause splitting. Our main emphasis is on the efficient implementation
as opposite to logical optimizations.
We use very efficient data structures for indexing. At the moment we have
code trees for forward subsumption and a combination of discrimination
trees and substitution trees for back subsumption and aggressive term
structure sharing. We are working on code trees for binary resolution
and hyperresolution.
Gandalf c-1.1
T. Tammet
Department of Computer Science, University of Göteborg
tammet@cs.chalmers.se
Architecture
Gandalf implements a number of different basic strategies: binary
ordered resolution for several orderings, versions of set-of-support
resolution, binary unit resolution, hyperresolution. Enhancements are
used for cutting off literals and combining different strategies, in
particular forward and backward reasoning, into one run. Equality is
handled by ordered paramodulation and demodulation.
Implementation
Gandalf is implemented in Scheme. We are using the scm
interpreter developed by A. Jaffer and the Scheme-to-C compiler
Hobbit developed by T. Tammet for the scm system.
http://www.cs.chalmers.se/~tammet/gandalf/
Expected Competition Performance
Since Gandalf won the MIX class of CASC-14, we expect Gandalf to be one of
the top performers in the MIX category during CASC-15. With some luck it
may even win the MIX category again.
References
HERBY 1.0 and PHERBY 1.0
M. Almulla
Department of Mathematics and Computer Science, Kuwait University
almulla@sun470.sci.kuniv.edu.kw
School of Computer Science, McGill University
newborn@opus.cs.mcgill.ca
Architecture
HERBY [Alm95,
AN96] is a semantic-tree based theorem prover
that implements a number of different heuristics
[YAN96, YAN] for
choosing ground atoms from the Herbrand base of the given problem.
The version of HERBY that will participate at CADE-15 is expected
to use 15 atom selection heuristics. The selection of heuristics
and the percentage of time they receive is fixed and is independent
of the characteristics of the problem. An earlier version of
HERBY was tested on the Stickel test set.
Implementation
HERBY and PHERBY are implemented in C.
PHERBY runs on a PVM consisting of a network of approximately 20 120MHz
and 150MHz pentium computers running the LINUX operating system.
Expected Competition Performance
72 theorems were used for testing ATP programs at the CADE-14
competition. HERBY solved 20 of them in less than 300 seconds.
References
Otter 3.0.5 and MACE 1.3.2
W.W. McCune
Mathematics and Computer Science Division, Argonne National Laboratory
mccune@mcs.anl.gov
Architecture
Otter [McC94a,
McC96,
MW97] searches for proofs, and
MACE [McC94b,
McC94c] searches for small finite models.
Both programs apply to statements in first-order (unsorted) logic
with equality. The two programs are independent and work by
completely different methods. (In practice, the programs
are complementary.)
Version 3.0.5 of Otter has been used.
Otter is based on resolution and paramodulation applied to clauses;
an Otter search uses the "given clause algorithm" and
typically involves a large database of clauses;
subsumption and demodulation play an important role.
Version 1.3.2 of MACE has been used.
Roughly speaking, MACE works as follows.
For a given domain size, an equivalent propositional problem is
constructed and a Davis-Putnam-Loveland decision procedure is
applied. MACE typically cannot handle clauses with many variables
or deeply nested terms.
Implementation
Both programs are written in C. Otter uses shared data structures
for clauses and terms, and it uses indexing for resolution,
paramodulation, forward and backward subsumption,
forward and backward demodulation, and unit conflict.
MACE is much simpler than Otter; its Davis-Putnam-Loveland
procedure uses a counting algorithm to do the unit resolutions,
and forward subsumption is not applied.
Expected Competition Performance
These versions of Otter and MACE are nearly identical to
the versions entered in last year's CASC-14. Both
programs did well last year, but neither won any top prizes.
We do not expect any winning performances in CASC-15.
References
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff,
and Rusty Lusk contributed to the development of Otter. MACE was motivated
by previous work of John Slaney, Mark Stickel, and Hantao Zhang.
p-SETHEO
A. Wolf, R. Letz, G. Stenz
Institut für Informatik, Technische Universität München
{wolfa,letz,stenzg}@in.tum.de
Architecture
p-SETHEO [Wol98]
(single processor version) is an environment for strategy
parallel competition of automated theorem provers.
Actually, it is used for controlling strategies which consist of different
parameterizations of the automated theorem prover SETHEO
[MI+97].
p-SETHEO time-slices different strategies.
The first successful sub-prover stops the system.
Implementation
The implementation uses Perl, C and shell tools in approximately
1000 lines of code (without SETHEO).
Expected Competition Performance
p-SETHEO is still under construction, so we cannot give any estimations.
References
Functional Satchmo
T. Geisler and S. Panne
Insitut für Informatik, Universität München
{Tim.Geisler,Sven.Panne}@informatik.uni-muenchen.de
Architecture
Functional Satchmo [GPS97] implements
the positive unit hyper-resolution (PUHR) tableau method
[BY96,] a calculus for range-restricted
clauses, which is actually a formalisation of Satchmo
[MB88].
Implementation
Functional Satchmo is implemented in the purely functional language
Haskell, which performs lazy evaluation.
A PUHR tableau is represented as a (lazy) list of its open branches.
The sets of units in tableau branches are represented as
standard discrimination trees.
Expected Competition Performance
We expect average performance in the SAT division.
References
SCOTT v3.1.0
K. Hodgson and J.K. Slaney
Automated Reasoning Project, Australian National University
{kahlil,jks}@arp.anu.edu.au
Architecture
SCOTT (Semantically Constrained OTTER) [SH98]
is an extension of OTTER [McC94] and uses the
same inference rules and overall proof
search structure as OTTER. It adds to its parent prover a module which
searches dynamically for small finite models of subsets of the clauses in
order to extract semantic information which may help to guide the proof
search. For the purposes of CASC-15, the models are used only to adjust the
weights of clauses in order to make OTTER prefer clauses which are false in
the guiding model or models. The difference between the present version of
SCOTT and previous ones [SLM94,
Sla93] is mainly that it now
generates and uses many models rather than just one, resulting in richer
semantic control and in greater stability under small perturbations of problem
formulations and or settings. Although many more elaborate uses of the models
are easy to envisage, for the present the extra weight assigned to clauses is
a function simply of the number of models in which they are true.
Implementation
SCOTT is written in C. It uses the existing program OTTER as a
sub-program to perform the proof search and the existing constraint
satisfaction program FINDER [Sla94] to
generate the models. The
additional code implements the interface between the two, maintains the list
of known models, labels clauses with the numbers of models in which they hold
and controls calls to the model-generator.
Expected Competition Performance
In principle SCOTT should perform at least as well as OTTER across all
parts of the Mix division, if only because it has OTTER as a
subroutine. However, in practice it does not invariably do better than OTTER,
for three reasons. Firstly it has the extra overheads of searching for models
and testing clauses against them. Secondly, the settings of OTTER in
autonomous mode are fine-tuned for searching without semantic guidance and may
not be ideal when a semantic component is added. Thirdly, there are no magic
bullets: any strategy may lose sometimes, and in order to make the comparison
interesting we always make SCOTT try a model-guided search if possible.
Overall, we expect good performance in comparison with cognate systems that
employ relatively naïve guidance.
References
SPASS V1.0.0a
C. Weidenbach
Max-Planck-Institut für Informatik
weidenb@mpi-sb.mpg.de
Architecture
SPASS [Wei97] is an automated theorem
prover for full sorted first-order logic with equality.
Its calculus is an extension of superposition by sorts
[Wei98] and a splitting rule for case analysis.
In contrast to many approaches to order-sorted clausal reasoning,
the calculus enables sort predicates and equations
to occur arbitrarily within clauses. Therefore, the sort theory is not
separated from the problem clauses, but automatically
and dynamically extracted [GMW97].
Nevertheless, it can be used to significantly cut the search space.
Implementation
SPASS is written in ANSI-C and relies on a generic library that supports
data structures and efficient functions for automated theorem proving.
Expected Competition Performance
We hope to improve on the performance of SPASS at CASC-13 and CASC-14.
References
TGTP and Octopus
M. Newborn
School of Computer Science, McGill University
newborn@cs.mcgill.ca
Architecture
TGTP [New97] is a resolution-refutation
theorem prover. It carries
out a sequence of iteratively-deepening searches, looking
for a linear proof. While searching for a linear proof, TGTP
also stores information on generated unit clauses. This
information is stored in a large hash table with space for
one million entries, permitting proofs that are not linear to
be derived. Completeness is sacrificed only by limits on the
number of literals in a clause and on the number of constants,
functions, and variables in a literal. TGTP uses only two
inference rules: binary resolution and binary factoring.
Implementation
TGTP is implemented in C, consisting of twelve -.c files,
two -.h files, 150 functions, and 12286 lines of code.
Expected Competition Performance
At CADE-14, TGTP solved 34 of the 72 theorems. The current
version is slightly stronger, capable of solving a few more
of the 72 theorems, but remaining weak on theorems involving
the equality axioms.
References
Vampire
A. Voronkov
Computer Science Department, Uppsala University
voronkov@csd.uu.se
Architecture
Vampire [Vor95] is a resolution-based theorem
prover.
It is not fully implemented yet, so it is unclear what features we shall be
able to present at CASC.
Implementation
Vampire is implemented in C++ with no tuning to a particular computer or
system. Except for a couple of string library functions, ANSI C++ is
used.
Expected Competition Performance
We do not expect high performance since we shall not be able to
implement equality before the competition. Our aim is to see how it
works on nonequality problems and make a good version for the 1999
competition.
References
Waldmeister 798
T. Hillenbrand, A. Jaeger, B. Löchner, and A. Buch
FB Informatik, Universität Kaiserslautern
{hillen,jaeger,loechner}@informatik.uni-kl.de,arnim.buch@sdm.de
Architecture
Waldmeister is a theorem prover for unit equational logic. Its
proof procedure is unfailing Knuth-Bendix completion
[BDP89].
We started the development from a three-level model of saturation-based
ATP systems [BH96,
HB+96]. Since the efforts
of the last years have led to a satisfactory inference machine, we now
focus on the question how to guide the proof search.
The main completion loop is parameterized by a selection heuristic and
a reduction ordering. Both are derived from the algebraic structure
described by the axioms. Since last year, the derivation process has
been lifted such that control knowledge is expressed declaratively,
which eases the integration of human experience. For full flexibility,
the search parameters may be re-instantiated during the proof search.
Recently, we also realized a database-driven mechanism which allows
simple extensions of the underlying inference calculus, e.g. for
reasoning on Horn clauses as described in
[BDP89:pp.24ff].
Implementation
Reaching efficiency both in terms of time and space has been a major
goal in the development of our inference machine. As to the first,
fast rewriting is achieved by sophisticated indexing techniques. As to
the second, the set of critical pairs is represented in a compressed
data structure.
To incorporate goal orientation into the completion procedure, the
hypotheses are blown up to sets of rewrite successors and sometimes
predecessors. A step-by-step proof object is constructed at run-time.
Our ATP system is implemented in ANSI-C and runs under Sun-OS, Solaris
and Linux. The Waldmeister distribution contains an easy-to-use
version along with documentation material and problem sets. If you are
interested in tackling problems with Waldmeister, just send an
e-mail to the authors.
Expected Competition Performance
We conjecture that Waldmeister will solve the majority of the
competition problems in less time than one would need to study the
corresponding problem specifications.
References