CASC-15 Entrants' System Descriptions
AI-SETHEO
Joachim Draeger
Institut für Informatik, Technische Universität München
draeger@informatik.tu-muenchen.de
Architecture
AI-SETHEO is a hybrid system combining a connection tableaux theorem
prover (SETHEO [GL+94,
MI+97]) and a lemma generator
(DELTA [Sch94]).
This combines aspects of bottom-up reasoning with the top-down reasoning
of connection tableaux theorem provers. To avoid a performance breakdown
due to the overwhelming number of brute-force generated lemmata, a lemma
selection procedure rejects both trivial and irrelevant lemmata.
Implementation
AI-SETHEO is implemented as a shell script which invokes the prover
SETHEO, the lemma generator DELTA, and the lemma selection procedure. The
selection procedure is realized in GAWK.
Expected Competition Performance
AI-SETHEO is still under construction; thus the competition performance
can only be guessed. Early experiments show that it
should be equal to or better than the performance of SETHEO.
References
- GL+94
- Goller C., Letz R., Mayr K., Schumann J. (1994),
SETHEO V3.2: Recent Developments,
Bundy A., Proceedings of the 12th International Conference on
Automated Deduction (Nancy, France),
pp.778-782,
Lecture Notes in Artificial Intelligence 814, Springer-Verlag.
- MI+97
- Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J., Mayr K.
(1997),
SETHEO and E-SETHEO: The CADE-13 Systems,
Journal of Automated Reasoning 18(2), pp.237-246.
- Sch94
- Schumann J. (1994),
DELTA - A Bottom-up Preprocessor for Top-Down Theorem
Provers,
Bundy A., Proceedings of the 12th International Conference on
Automated Deduction (Nancy, France),
pp.774-777,
Lecture Notes in Artificial Intelligence 814, Springer-Verlag.
ANDP
Dafa Li
Department of Applied Mathematics, Tsinghua University
lidafa@tsinghua.edu.cn
Architecture
ANDP [Li97] is a natural deduction system
for first-order logic without equality, adapted from Gentzen's system.
There are five connectives in ANDP: negation, conjunction, disjunction,
implication, and equivalence, whose precedence decreases in that order.
The rules of inference for the connectives are
MP (Modus Ponens),
MT (Modus Tollens),
LDS and RDS (Left and Right Disjunctive Syllogism),
CP (Conditional Proof),
ADD (Addition),
CNJUN (Conjunction),
SIMP (Simplification),
De Morgan's rule,
Double negation,
IMP (Implication),
and
DILEMMA (cases).
There are two quantifiers in ANDP:
the universal quantifier (Ax) and the existential quantifier (Ex).
The rules of inference for the quantifiers are
UG (Universal Generalization),
EG (Existential Generalization),
US (Universal Specialization),
and
ES (Existential Specialization).
There are two unification algorithms for treating quantifiers.
One is for introducing quantifiers, the other is for eliminating quantifiers
without Skolem functions.
Search strategies: first the rules for connectives are used, then those for
quantifiers; reasoning proceeds first from bottom to top, then from top to bottom.
Other strategies: subsumption with quantifiers, eliminating existential
quantifiers before universal quantifiers, and keeping the scopes of quantifiers minimal.
Implementation
ANDP is written in Common LISP (Lucid Common Lisp, KCL) and runs on
a SUN workstation.
In input formulas &, or, and -> are used to
stand for conjunction, disjunction, and implication, respectively.
Therefore input formulas, e.g.,
(Ax)[Px & Qy -> Py], look like
those in textbooks.
The first-order logic formulas are stored internally in Polish style.
Expected Competition Performance
ANDP is entered in the FOF division.
ANDP cannot treat formulas with equality.
To treat equality it is necessary to introduce axioms for equality.
Therefore ANDP is expected to perform well on theorems without equality,
but may fail to prove theorems with equality.
References
- Li97
- Li D. (1997),
Unification Algorithms for Eliminating and Introducing Quantifiers
in Natural Deduction Automated Theorem Proving,
Journal of Automated Reasoning 18(1), pp.105-134.
E 0.1
S. Schulz
Institut für Informatik, Technische Universität München
schulz@informatik.tu-muenchen.de
Architecture
E 0.1 is a purely equational theorem prover. It is not
primarily intended as an independent prover, but as a preprocessor
implementing the bottom-up part of the METOP
calculus [Mos96], with SETHEO
[MI+97] to
implement the top-down part in a future combined system. However, as
the calculus employs inference rules very similar to those of
superposition calculi [BG94], it
is also possible to use E as a prover on its own.
The implemented calculus combines ordered paramodulation with
rewriting. For the special case that only unit clauses are present,
this calculus degenerates into unfailing completion
[BDP89]
as implemented in DISCOUNT [DKS97]. As
we expect SETHEO to deal with the non-equational part of the theory,
no special rules for non-equational literals have been implemented,
i.e. resolution is simulated via paramodulation and equality
resolution.
A particular feature of E is the term representation. Terms are
maximally shared (within two classes of clauses) to minimize memory
consumption, and we employ graph rewriting techniques to optimize
simplification. As an additional feature, E has a very clean interface
for adding and developing new search control heuristics.
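The effect of maximal sharing can be pictured by a small hash-consing fragment in C (an editorial sketch only, not E's actual data structures; all names and sizes below are hypothetical). A term cell is allocated only if no structurally identical cell already exists in the shared term bank, so simplifying a shared subterm benefits all of its occurrences at once.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS  3
#define BANK_SIZE 4096

typedef struct term {
    int          f_code;          /* function symbol (negative: variable) */
    int          arity;
    struct term *args[MAX_ARGS];  /* shared subterm pointers              */
    struct term *next;            /* hash bucket chain                    */
} Term;

static Term *bank[BANK_SIZE];     /* the shared term bank */

static unsigned hash_term(int f_code, int arity, Term **args)
{
    unsigned h = (unsigned)f_code;
    for (int i = 0; i < arity; i++)
        h = h * 31u + (unsigned)((uintptr_t)args[i] >> 4);
    return h % BANK_SIZE;
}

/* Return the unique shared cell representing f_code(args[0],...,args[arity-1]). */
Term *term_insert(int f_code, int arity, Term **args)
{
    unsigned h = hash_term(f_code, arity, args);
    for (Term *t = bank[h]; t; t = t->next)
        if (t->f_code == f_code && t->arity == arity &&
            memcmp(t->args, args, arity * sizeof(Term *)) == 0)
            return t;                      /* already present: share it */
    Term *t = calloc(1, sizeof(Term));
    t->f_code = f_code;
    t->arity  = arity;
    memcpy(t->args, args, arity * sizeof(Term *));
    t->next   = bank[h];
    bank[h]   = t;
    return t;
}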
Implementation
E is implemented in ANSI C, using the GNU C compiler. The program has
been successfully installed under SunOS 4.3.x, Solaris 2.x,
HP-UX B 10.20, and various versions of Linux. It uses its own library,
CLIB, for data types, I/O, and memory management.
Expected Competition Performance
E is in a very early phase of development, so it is hard to predict
actual performance. We expect it to perform adequately in the UEQ
category, while the performance in the MIX category is likely to be
mediocre for the current version.
References
- BDP89
- Bachmair L., Dershowitz N., Plaisted D.A. (1989),
Completion Without Failure,
Ait-Kaci H., Nivat M.,
Resolution of Equations in Algebraic Structures,
pp.1-30,
Academic Press.
- BG94
- Bachmair L., Ganzinger H. (1994),
Rewrite-Based Equational Theorem Proving with Selection and
Simplification,
Journal of Logic and Computation 4(3),
pp.217-247.
- DKS97
- Denzinger J., Kronenburg M., Schulz S. (1997),
DISCOUNT: A Distributed and Learning Equational Prover,
Journal of Automated Reasoning 18(2),
pp.189-198.
- MI+97
- Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J.,
Mayr K. (1997),
SETHEO and E-SETHEO: The CADE-13 Systems,
Journal of Automated Reasoning 18(2),
pp.237-246.
- Mos96
- Moser M. (1996),
Goal-Directed Reasoning in Clausal Logic with Equality,
PhD thesis,
Institut für Informatik, Technische Universität München,
Munich, Germany, and Dissertationen zur Informatik 2, CS Press.
DISCOUNT 2.1
S. Schulz, F. Brandt
Institut für Informatik, Technische Universität München
{schulz,brandtf}@informatik.tu-muenchen.de
Architecture
DISCOUNT/TSM is a sequential version of the well-known distributed
equational theorem prover DISCOUNT [DKS97],
using a new learning
heuristic to control the selection of certain inference steps.
DISCOUNT is based on unfailing completion, extended to deal with
arbitrarily quantified goals. The basic algorithm repeatedly selects
an unprocessed equation, rewrites it to normal form w.r.t. the already
processed equations, and uses it to build critical pairs (new
unprocessed equations). The equation is then used for interreduction
and added to the set of processed equations. Interleaved with this
completion process, critical goals are generated (by
paramodulating facts into goals), and normalized. The process stops
when either a goal has been proven or all facts and goals have been
processed.
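The basic loop may be summarized by the following schematic C fragment (an editorial sketch, not DISCOUNT's source code; every type and helper function below is assumed). An unprocessed equation is selected, normalized, used for critical-pair generation and interreduction, and moved to the processed set, with goal work interleaved.

typedef struct eqn    Eqn;
typedef struct eqnset EqnSet;

extern int   is_empty(EqnSet *set);
extern Eqn  *select_best(EqnSet *unprocessed);      /* heuristic choice (the TSM, see below) */
extern Eqn  *normalize(Eqn *e, EqnSet *processed);  /* rewrite to normal form                */
extern int   is_trivial(Eqn *e);                    /* e.g. s = s                            */
extern void  add_critical_pairs(Eqn *e, EqnSet *processed, EqnSet *unprocessed);
extern void  interreduce(EqnSet *processed, Eqn *e);
extern void  insert(EqnSet *set, Eqn *e);
extern void  process_goals(EqnSet *goals, Eqn *e, EqnSet *processed);  /* paramodulate and normalize */
extern int   goal_proved(EqnSet *goals);

int complete(EqnSet *unprocessed, EqnSet *processed, EqnSet *goals)
{
    while (!is_empty(unprocessed)) {
        Eqn *e = select_best(unprocessed);
        e = normalize(e, processed);          /* rewrite w.r.t. processed equations */
        if (is_trivial(e))
            continue;                         /* redundant equation                 */
        add_critical_pairs(e, processed, unprocessed);
        interreduce(processed, e);            /* back-simplification                */
        insert(processed, e);
        process_goals(goals, e, processed);   /* interleaved goal treatment         */
        if (goal_proved(goals))
            return 1;                         /* proof found                        */
    }
    return 0;                                 /* all facts and goals processed      */
}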
The selection of the next equation to be processed is controlled by a
term space map (TSM) [Sch98],
a generalized term evaluation tree
[DS96]. A TSM is a recursive structure
capturing characteristic distributions of term nodes in representative
patterns of useful and useless equations collected from previous
proofs. Suitable proofs for learning are selected using tree-based
similarity criteria on TSMs and some simple features of the problem
specification.
Implementation
Both DISCOUNT and the supplementary
programs for proof analysis and learning are implemented in ANSI C.
The programs have been successfully installed under SunOS 4.3.x,
Solaris 2.x and Linux.
DISCOUNT was originally designed in 1991, before the use of efficient
indexing methods became widespread. Consequently, the basic inference
engine is primitive by today's standards, and modern systems (e.g.
Waldmeister [HBF96]) achieve a much
higher throughput.
Expected Competition Performance
We expect this new DISCOUNT version to perform similarly to the
previous versions. That is, it should be a serious contender, but will
probably not be among the top three systems.
References
- DKS97
- Denzinger J., Kronenburg M., Schulz S. (1997),
DISCOUNT: A Distributed and Learning Equational Prover,
Journal of Automated Reasoning 18(2),
pp.189-198.
- DS96
- Denzinger J., Schulz S. (1996),
Learning Domain Knowledge to Improve Theorem Proving,
McRobbie M., Slaney J.K.,
Proceedings of the 13th International Conference on Automated
Deduction (New Brunswick, USA),
pp.62-76,
Lecture Notes in Artificial Intelligence 1104,
Springer-Verlag.
- HBF96
- Hillenbrand T., Buch A., Fettig R. (1996),
On Gaining Efficiency in Completion-Based Theorem Proving,
Ganzinger H.,
Proceedings of the 7th International Conference on Rewriting
Techniques and Applications (New Brunswick, USA),
pp.432-435,
Lecture Notes in Computer Science 1103,
Springer-Verlag.
- Sch98
- Schulz S. (1998),
Term Space Mapping for DISCOUNT,
Proceedings of the CADE-15 Workshop on Using AI Methods in
Deduction (Lindau, Germany).
Bliksem 1.00
H. de Nivelle
CWI Amsterdam
nivelle@cwi.nl
Architecture
Bliksem is a resolution based theorem prover. It supports ordered
resolution, factoring, ordered paramodulation, unit simplification
and demodulation.
Bliksem supports many orders (KBO, RPO, LPO, simple symbol orders,
non-liftable orders) and resolution decision procedures.
At this moment it has decision procedures for the E+-class,
for the prefix class exists* forall forall exists*,
and for the guarded fragment of first order logic
[FL+93,
AvN96,
DeN94].
Additionally, Bliksem has the equality strategies of
[BG90].
There is no hyperresolution in Bliksem, but it can be simulated
using an appropriate order.
Bliksem supports various transformations from first order logic to
clausal normal form. In particular it supports various structural and
non-structural transformations [BFM94].
Implementation
Bliksem is implemented in C. The data structure used for literals
is the prefix with ends representation.
A literal is stored in two arrays t and e.
The first contains the literal in prefix notation.
The second contains, for each subterm, a reference to the end of this subterm.
So suppose that the term p(X,s(s(Y))) is stored on position
100. This is done as follows:
t[100] = p    e[100] = 105
t[101] = X    e[101] = 102
t[102] = s    e[102] = 105
t[103] = s    e[103] = 105
t[104] = Y    e[104] = 105
Clauses are stored in another array as a sequence of indices
referring to the literals.
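A minimal C illustration of this representation follows (editorial sketch only; Bliksem's actual symbol encoding differs, and the symbol codes below are hypothetical). The point of the e array is that skipping over a subterm needs no recursion: its end can be read off directly.

#include <stdio.h>

#define P  1          /* hypothetical symbol codes                */
#define S  2
#define X -1          /* variables encoded as negative numbers    */
#define Y -2

int t[200];           /* literal in prefix notation               */
int e[200];           /* index just past the end of each subterm  */

int main(void)
{
    /* store p(X, s(s(Y))) at position 100 */
    t[100] = P; e[100] = 105;    /* whole literal ends at 105 */
    t[101] = X; e[101] = 102;    /* subterm X                 */
    t[102] = S; e[102] = 105;    /* subterm s(s(Y))           */
    t[103] = S; e[103] = 105;    /* subterm s(Y)              */
    t[104] = Y; e[104] = 105;    /* subterm Y                 */

    /* the second argument of p starts at 102; its extent is e[102]-1 */
    int second_arg = 102;
    printf("second argument occupies positions %d..%d\n",
           second_arg, e[second_arg] - 1);
    return 0;
}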
We use discrimination trees for forward subsumption, forward
demodulation, and forward unit simplification.
In the discrimination trees the terms are stored in a normalized manner.
In addition, bit filters are used as a prefilter for subsumption.
There are no optimized techniques for the other derivation rules.
Expected Competition Performance
As this is our first participation, it is difficult to predict the performance.
References
- AvN96
- Andreka H., van Benthem J., Nemeti I. (1996),
Modal Languages and Bounded Fragments of Predicate Logic,
Research Report ML-96-03,
Institute for Logic, Language and Computation, Amsterdam, The Netherlands.
- BFM94
- Baaz M., Fermüller C., Leitsch A. (1994),
A Non-Elementary Speed Up in Proof Length by Structural Clause
Form Transformation,
Proceedings of the 9th Annual IEEE Symposium on Logic in Computer
Science (Paris, France),
pp.213-219, IEEE Computer Society Press.
- BG90
- Bachmair L., Ganzinger H. (1990),
On Restrictions of Ordered Paramodulation with
Simplification,
Stickel M.E.,
Proceedings of the 10th International Conference on Automated
Deduction (Kaiserslautern, Germany),
pp.427-441,
Lecture Notes in Artificial Intelligence 449, Springer-Verlag.
- DeN94
- de Nivelle H. (1994),
Resolution Games and Non-Liftable Resolution Orderings,
Pacholski L., Tiuryn J.,
Computer Science Logic,
pp.279-293,
Springer-Verlag.
- FL+93
- Fermüller C., Leitsch A., Tammet T., Zamov N. (1993),
Resolution Methods for the Decision Problem,
Lecture Notes in Computer Science 679,
Springer-Verlag.
Gandalf c-1.1
T. Tammet
Department of Computer Science, University of Göteborg
tammet@cs.chalmers.se
Architecture
Gandalf implements a number of different basic strategies: binary
ordered resolution for several orderings, versions of set-of-support
resolution, binary unit resolution, hyperresolution. Enhancements are
used for cutting off literals and combining different strategies, in
particular forward and backward reasoning, into one run. Equality is
handled by ordered paramodulation and demodulation.
The basic idea of the automatic mode in Gandalf - also used during
the CASC-14 competition - is time-slicing: Gandalf selects a set of
different search strategies, allocates time to these and finally runs
the strategies one after another.
The selection of strategies and the percentage of time they receive is
based, in a somewhat ad hoc way, on the characteristics of the problem:
the percentage of Horn and almost-Horn clauses, the number of clauses,
and the percentage of clauses whose literals can be ordered using term and
variable depths.
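The time-slicing scheme can be illustrated by the following C sketch (Gandalf itself is written in Scheme, as noted below; the strategies and time shares here are stand-ins chosen for the example, not Gandalf's actual values).

#include <stdio.h>

typedef int (*Strategy)(double seconds);     /* returns 1 if a proof is found */

/* Dummy stand-ins for the real search strategies. */
static int ordered_resolution(double s) { (void)s; return 0; }
static int set_of_support(double s)     { (void)s; return 0; }
static int hyperresolution(double s)    { (void)s; return 1; }

int main(void)
{
    const double total   = 300.0;             /* overall time limit in seconds */
    Strategy     strat[] = { ordered_resolution, set_of_support,
                             hyperresolution };
    double       share[] = { 0.5, 0.3, 0.2 }; /* chosen from problem features  */

    for (int i = 0; i < 3; i++)               /* run the slices one by one     */
        if (strat[i](total * share[i])) {
            printf("proof found by strategy %d\n", i);
            return 0;
        }
    printf("no proof within the time limit\n");
    return 1;
}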
There exist also versions of Gandalf for intuitionistic logic
and type theory, see [Tam96] and
[TS96].
More information about Gandalf can be found in
[Tam98].
Implementation
Gandalf is implemented in Scheme. We are using the scm
interpreter developed by A. Jaffer and the Scheme-to-C compiler
Hobbit developed by T. Tammet for the scm system.
The source, binaries, manual and other materials are available at
http://www.cs.chalmers.se/~tammet/gandalf/
Expected Competition Performance
Since Gandalf won the MIX class of CASC-14, we expect Gandalf to be one of
the top performers in the MIX category during CASC-15. With some luck it
may even win the MIX category again.
Gandalf is likely to do well in the UEQ, SAT and FOF categories,
although it is unlikely it will win any of these.
References
- Tam98
- Tammet T. (1998),
Towards Efficient Subsumption,
Kirchner C., Kirchner H.,
Proceedings of the 15th International Conference on Automated
Deduction (Lindau, Germany),
Lecture Notes in Artificial Intelligence, Springer-Verlag.
- Tam96
- Tammet T. (1996),
A Resolution Theorem Prover for Intuitionistic Logic,
McRobbie M., Slaney J.K.,
Proceedings of the 13th International Conference on Automated
Deduction (New Brunswick, USA),
pp.2-16,
Lecture Notes in Artificial Intelligence 1104, Springer-Verlag.
- TS96
- Tammet T., Smith J. (1996),
Optimised Encodings of Fragments of Type Theory in First Order
Logic,
Berardi S., Coppo M.,
Types for Proofs and Programs: Proceedings of the International
Workshop TYPES'95 (Torino, Italy),
pp.265-287,
Lecture Notes in Artificial Intelligence 1158, Springer-Verlag.
HERBY 1.0 and PHERBY 1.0
M. Almulla
Department of Mathematics and Computer Science, Kuwait University
almulla@sun470.sci.kuniv.edu.kw
M. Newborn
School of Computer Science, McGill University
newborn@opus.cs.mcgill.ca
Architecture
HERBY [Alm95,
AN96] is a semantic-tree based theorem prover
that implements a number of different heuristics
[YAN96, YAN] for
choosing ground atoms from the Herbrand base of the given problem.
The version of HERBY that will participate at CADE-15 is expected
to use 15 atom selection heuristics. The selection of heuristics
and the percentage of time they receive is fixed and is independent
of the characteristics of the problem. An earlier version of
HERBY was tested on the Stickel test set.
PHERBY is a parallel version of HERBY. PHERBY uses
a bigger set of heuristics than HERBY, as well as cooperative
computing in determining the quality of atoms to be used. This
is the first version of the program.
Implementation
HERBY and PHERBY are implemented in C.
PHERBY runs on a PVM consisting of a network of approximately 20 Pentium
computers (120MHz and 150MHz) running the Linux operating system.
Expected Competition Performance
At the CADE-14 competition, 72 theorems were used for testing ATP
programs; HERBY solved 20 of them in less than 300 seconds.
PHERBY is expected to perform better than the serial version HERBY.
However, it is expected to show much weaker performance compared to
resolution-refutation theorem provers. Out of the 72 theorems that
were used for testing ATP programs at the CADE-14 competition,
PHERBY solved 24 in less than 300 seconds.
References
- Alm95
- Almulla M. (1995),
Analysis of the Use of Semantic Trees for Proofs of
Unsatisfiability,
PhD thesis, McGill University, Montreal, Canada.
- AN96
- Almulla M., Newborn M. (1996),
The Practicality of Generating Semantic Trees for Proofs of
Unsatisfiability,
International Journal of Computer Mathematics 62(1-2),
pp.45-61.
- YAN96
- Yu Q., Almulla M., Newborn M. (1996),
Heuristics for a Semantic Tree Theorem Prover,
Kautz H., Selman B.,
Proceedings of the 4th International Symposium on Artificial
Intelligence and Mathematics AI/MATH-96 (Fort Lauderdale, USA),
pp.162-165.
- YAN
- Yu Q., Almulla M., Newborn M. (Submitted),
Heuristics used by HERBY for Semantic Tree Theorem
Proving,
Annals of Mathematics and Artificial Intelligence.
Otter 3.0.5 and MACE 1.3.2
W.W. McCune
Mathematics and Computer Science Division, Argonne National Laboratory
mccune@mcs.anl.gov
Architecture
Otter [McC94b,
McC96,
MW97] searches for proofs, and
MACE [McC94a,
McC94c] searches for small finite models.
Both programs apply to statements in first-order (unsorted) logic
with equality. The two programs are independent and work by
completely different methods. (In practice, the programs
are complementary.)
Version 3.0.5 of Otter has been used.
Otter is based on resolution and paramodulation applied to clauses;
an Otter search uses the "given clause algorithm" and
typically involves a large database of clauses;
subsumption and demodulation play an important role.
Version 1.3.2 of MACE has been used.
Roughly speaking, MACE works as follows.
For a given domain size, an equivalent propositional problem is
constructed and a Davis-Putnam-Loveland decision procedure is
applied. MACE typically cannot handle clauses with many variables
or deeply nested terms.
Implementation
Both programs are written in C. Otter uses shared data structures
for clauses and terms, and it uses indexing for resolution,
paramodulation, forward and backward subsumption,
forward and backward demodulation, and unit conflict.
MACE is much simpler than Otter; its Davis-Putnam-Loveland
procedure uses a counting algorithm to do the unit resolutions,
and forward subsumption is not applied.
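The counting idea can be sketched as follows (an illustration of counter-based unit propagation in general, not MACE's code; the data layout and names are hypothetical). Each clause records how many of its literals have not yet been falsified; a counter dropping to one signals a new unit, and a counter dropping to zero signals a conflict.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int *lits;        /* literals: +v is the variable v, -v its negation */
    int  size;
    int  unassigned;  /* counter of literals not yet made false          */
    int  satisfied;
} Clause;

/* Assign variable var the value val (0 or 1) and propagate the counters. */
void assign(Clause *clauses, int nclauses, int var, int val)
{
    for (int c = 0; c < nclauses; c++) {
        if (clauses[c].satisfied)
            continue;
        for (int i = 0; i < clauses[c].size; i++) {
            int lit = clauses[c].lits[i];
            if (abs(lit) != var)
                continue;
            if ((lit > 0) == (val > 0)) {
                clauses[c].satisfied = 1;               /* clause is now true  */
            } else if (--clauses[c].unassigned == 1) {
                printf("clause %d became unit\n", c);   /* forced assignment   */
            } else if (clauses[c].unassigned == 0) {
                printf("clause %d is empty: conflict\n", c);
            }
        }
    }
}

int main(void)
{
    int lits[] = { 1, -2, 3 };                  /* clause  x1 v ~x2 v x3 */
    Clause cs[1] = { { lits, 3, 3, 0 } };
    assign(cs, 1, 1, 0);                        /* x1 := false           */
    assign(cs, 1, 3, 0);                        /* x3 := false -> unit   */
    return 0;
}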
Expected Competition Performance
These versions of Otter and MACE are nearly identical to
the versions entered in last year's CASC-14. Both
programs did well last year, but neither won any top prizes.
We do not expect any winning performances in CASC-15.
References
- McC94a
- McCune W.W. (1994),
MACE: Models and Counterexamples,
http://www.mcs.anl.gov/home/mccune/ar/mace/.
- McC94b
- McCune W.W. (1994),
Otter 3.0 Reference Manual and Guide,
Technical Report ANL-94/6,
Argonne National Laboratory, Argonne, USA.
- McC94c
- McCune W.W. (1994),
A Davis-Putnam Program and its Application to Finite First-Order
Model Search: Quasigroup Existence Problems,
Technical Report ANL/MCS-TM-194,
Argonne National Laboratory, Argonne, USA.
- McC96
- McCune W.W. (1996),
Otter,
http://www.mcs.anl.gov/home/mccune/ar/otter/.
- MW97
- McCune W.W., Wos L. (1997),
Otter: The CADE-13 Competition Incarnations,
Journal of Automated Reasoning 18(2),
pp.211-220.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff,
and Rusty Lusk contributed to the development of Otter. MACE was motivated
by previous work of John Slaney, Mark Stickel, and Hantao Zhang.
p-SETHEO
A. Wolf, R. Letz, G. Stenz
Institut für Informatik, Technische Universität München
{wolfa,letz,stenzg}@in.tum.de
Architecture
p-SETHEO [Wol98]
(single-processor version) is an environment for strategy-parallel
competition of automated theorem provers.
Currently, it is used to control strategies consisting of different
parameterizations of the automated theorem prover SETHEO
[MI+97].
p-SETHEO time-slices the different strategies;
the first successful sub-prover stops the system.
Implementation
The implementation uses Perl, C and shell tools in approximately
1000 lines of code (without SETHEO).
Expected Competition Performance
p-SETHEO is still under construction, so we cannot give any estimate of its performance.
References
- MI+97
- Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J., Mayr K.
(1997),
SETHEO and E-SETHEO: The CADE-13 Systems,
Journal of Automated Reasoning 18(2), pp.237-246.
- Wol98
- Wolf A. (1998),
p-SETHEO: Strategy Parallelism in Automated Theorem
Proving,
de Swart H., Proceedings of International Conference TABLEAUX'98:
Analytic Tableaux and Related Methods (Oisterwijk, The Netherlands),
Lecture Notes in Artificial Intelligence, Springer-Verlag.
Functional Satchmo
T. Geisler and S. Panne
Institut für Informatik, Universität München
{Tim.Geisler,Sven.Panne}@informatik.uni-muenchen.de
Architecture
Functional Satchmo [GPS97] implements
the positive unit hyper-resolution (PUHR) tableau method
[BY96], a calculus for range-restricted
clauses, which is actually a formalisation of Satchmo
[MB88].
PUHR tableaux are trees in which every node is labeled with
a ground atom (unit) or with a positive ground clause.
PUHR tableaux for a given clause set are built starting from the tree
consisting of a single unlabeled node, by repeatedly applying
the following two rules to the leaf of a branch.
The PUHR rule resolves the body literals of an input clause
simultaneously with the units on the branch and extends the branch
with a new leaf, labeled with the (hyper-)resolvent.
The splitting rule extends the branch with n new leaf nodes
B1,...,Bn, if the branch contains a positive
ground clause B1 v...v Bn, but none of
the units B1 or ... or Bn.
If the empty clause is selected, the branch is marked as closed.
If none of these rules is applicable, then the set of units of a branch
represents a Herbrand model of the input clause set.
If all branches are closed, then the input clause set is unsatisfiable.
Complement splitting, a variant of the above splitting rule, is
used to enhance efficiency and to guarantee minimality of the leftmost
generated model [BY96].
A newly appended unit is additionally labeled by the complements of the
positive labels of all its right siblings.
A branch is now marked as closed, too,
if the set of the labels of its units contains two complementary literals.
Incremental evaluation avoids repeated applications of the PUHR rule
with the same clauses and accelerates its applicability test:
If a unit is added to the branch in a splitting step, then exactly those
PUHR steps using this unit have to be applied as the next steps.
This leads to significantly more efficient implementations of the PUHR
tableau method [SG96].
The set of positive ground clauses for which
splitting has not yet been applied is partitioned into nesting
and non-nesting clauses.
The latter are ordered by increasing length and are
preferred to the former, which must be enqueued to ensure fairness.
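The splitting rule with complement splitting may be illustrated as follows (a C sketch for exposition only; Functional Satchmo itself represents branches quite differently, in Haskell). Splitting B1 v ... v Bn yields n children, where the i-th child receives Bi together with the complements of its right siblings.

#include <stdio.h>

#define MAXLIT 64

typedef struct {
    int lits[MAXLIT];   /* ground literals: +a is the atom a, -a its complement */
    int nlits;
} Branch;

/* Split the positive ground clause atoms[0..n-1] below branch b,
   producing n children with complement splitting. */
void split(const Branch *b, const int *atoms, int n, Branch *children)
{
    for (int i = 0; i < n; i++) {
        children[i] = *b;                                      /* copy parent units    */
        children[i].lits[children[i].nlits++] = atoms[i];      /* add Bi               */
        for (int j = i + 1; j < n; j++)                        /* complements of right */
            children[i].lits[children[i].nlits++] = -atoms[j]; /* siblings             */
    }
}

int main(void)
{
    Branch root = { {0}, 0 };
    int clause[] = { 1, 2, 3 };          /* B1 v B2 v B3, atoms named 1, 2, 3 */
    Branch kids[3];
    split(&root, clause, 3, kids);
    for (int i = 0; i < 3; i++) {
        printf("branch %d:", i);
        for (int k = 0; k < kids[i].nlits; k++)
            printf(" %d", kids[i].lits[k]);
        printf("\n");
    }
    return 0;
}

The leftmost child carries the complements of all its siblings, which is what guarantees minimality of the leftmost generated model.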
Implementation
Functional Satchmo is implemented in the purely functional language
Haskell, which performs lazy evaluation.
A PUHR tableau is represented as a (lazy) list of its open branches.
The sets of units in tableau branches are represented as
standard discrimination trees.
Expected Competition Performance
We expect average performance in the SAT division.
References
- BY96
- Bry F., Yahya A. (1996),
Minimal Model Generation with Positive Unit Hyper-resolution
Tableaux,
Proceedings of TABLEAUX'96: the 5th Workshop on Theorem Proving with
Analytic Tableaux and Related Methods (Palermo, Italy),
pp.143-159,
Lecture Notes in Artificial Intelligence 1071,
Springer-Verlag.
- GPS97
- Geisler T., Panne S., Schütz H. (1997),
Satchmo: The Compiling and Functional Variants,
Journal of Automated Reasoning 18(2),
pp.227-236.
- MB88
- Manthey R., Bry F. (1988),
SATCHMO: A Theorem Prover Implemented in Prolog,
Lusk E., Overbeek R.,
Proceedings of the 9th International Conference on Automated
Deduction (Argonne, USA),
pp.415-434,
Lecture Notes in Computer Science 310,
Springer-Verlag.
- SG96
- Schütz H., Geisler T. (1996),
Efficient Model Generation through Compilation,
McRobbie M., Slaney J.K.,
Proceedings of the 13th International Conference on Automated
Deduction (New Brunswick, USA),
pp.431-447,
Lecture Notes in Artificial Intelligence 1104,
Springer-Verlag.
SCOTT v3.1.0
K. Hodgson and J.K. Slaney
Automated Reasoning Project, Australian National University
{kahlil,jks}@arp.anu.edu.au
Architecture
SCOTT (Semantically Constrained OTTER) [SH98]
is an extension of OTTER [McC94] and uses the
same inference rules and overall proof
search structure as OTTER. It adds to its parent prover a module which
searches dynamically for small finite models of subsets of the clauses in
order to extract semantic information which may help to guide the proof
search. For the purposes of CASC-15, the models are used only to adjust the
weights of clauses in order to make OTTER prefer clauses which are false in
the guiding model or models. The difference between the present version of
SCOTT and previous ones [SLM94,
Sla93] is mainly that it now
generates and uses many models rather than just one, resulting in richer
semantic control and in greater stability under small perturbations of problem
formulations and/or settings. Although many more elaborate uses of the models
are easy to envisage, for the present the extra weight assigned to clauses is
a function simply of the number of models in which they are true.
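The weighting scheme just described amounts to something like the following C fragment (an editorial sketch; the clause and model types and their access functions are assumptions, and SCOTT's actual weight function may differ).

typedef struct clause Clause;
typedef struct model  Model;

extern int symbol_count(const Clause *c);                  /* base (syntactic) weight */
extern int true_in_model(const Clause *c, const Model *m); /* semantic test           */

/* Clauses true in many guiding models become heavier and are hence
   selected later; clauses false in the models are preferred. */
int semantic_weight(const Clause *c, const Model **models, int nmodels,
                    int penalty)
{
    int true_count = 0;
    for (int i = 0; i < nmodels; i++)
        if (true_in_model(c, models[i]))
            true_count++;
    return symbol_count(c) + penalty * true_count;
}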
Implementation
SCOTT is written in C. It uses the existing program OTTER as a
sub-program to perform the proof search and the existing constraint
satisfaction program FINDER [Sla94] to
generate the models. The
additional code implements the interface between the two, maintains the list
of known models, labels clauses with the numbers of models in which they hold
and controls calls to the model-generator.
Expected Competition Performance
In principle SCOTT should perform at least as well as OTTER across all
parts of the Mix division, if only because it has OTTER as a
subroutine. However, in practice it does not invariably do better than OTTER,
for three reasons. Firstly it has the extra overheads of searching for models
and testing clauses against them. Secondly, the settings of OTTER in
autonomous mode are fine-tuned for searching without semantic guidance and may
not be ideal when a semantic component is added. Thirdly, there are no magic
bullets: any strategy may lose sometimes, and in order to make the comparison
interesting we always make SCOTT try a model-guided search if possible.
Overall, we expect good performance in comparison with cognate systems that
employ relatively naïve guidance.
References
- McC94
- McCune W.W. (1994),
Otter 3.0 Reference Manual and Guide,
Technical Report ANL-94/6,
Argonne National Laboratory, Argonne, USA.
- SH98
- Slaney J.K., Hodgson K. (1998),
System Description: MSCOTT,
TR-ARP-06-98,
Automated Reasoning Project, Australian National University, Canberra,
Australia.
- SLM94
- Slaney J.K., Lusk E., McCune W.W. (1994),
SCOTT: Semantically Constrained Otter, System Description,
Bundy A.,
Proceedings of the 12th International Conference on Automated
Deduction (Nancy, France),
pp.764-768,
Lecture Notes in Artificial Intelligence 814,
Springer-Verlag.
- Sla93
- Slaney J.K. (1993),
SCOTT: A Model-Guided Theorem Prover,
Bajcsy R.,
Proceedings of the 13th International Joint Conference on
Artificial Intelligence (Chambery, France),
pp.109-114,
Morgan Kaufmann.
- Sla94
- Slaney J.K. (1994),
Finite Domain Enumerator, System Description,
Bundy A.,
Proceedings of the 12th International Conference on Automated
Deduction (Nancy, France),
pp.764-768,
Lecture Notes in Artificial Intelligence 814,
Springer-Verlag.
SPASS V1.0.0a
C. Weidenbach
Max-Planck-Institut für Informatik
weidenb@mpi-sb.mpg.de
Architecture
SPASS [Wei97] is an automated theorem
prover for full sorted first-order logic with equality.
Its calculus is an extension of superposition by sorts
[Wei98] and a splitting rule for case analysis.
In contrast to many approaches to order-sorted clausal reasoning,
the calculus enables sort predicates and equations
to occur arbitrarily within clauses. Therefore, the sort theory is not
separated from the problem clauses, but automatically
and dynamically extracted [GMW97].
Nevertheless, it can be used to significantly cut the search space.
The splitting rule splits clauses, if possible,
into variable-disjoint parts that can be refuted independently.
This rule is not necessary for completeness of the calculus but it makes
SPASS an efficient decision procedure for several decidable subclasses
of first-order logic [HS97].
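Whether a clause can be split into variable-disjoint parts is essentially a connected-components question over its literals, as in the following sketch (illustrative C only, not SPASS's implementation; the clause access functions are assumed).

#include <stdlib.h>

extern int clause_nlits(const void *clause);             /* number of literals            */
extern int lit_nvars(const void *clause, int lit);       /* variables in a literal        */
extern int lit_var(const void *clause, int lit, int k);  /* k-th variable, 0..nvars-1     */

static int find(int *parent, int x)
{
    while (parent[x] != x)
        x = parent[x] = parent[parent[x]];   /* path halving */
    return x;
}

/* Return the number of variable-disjoint parts of the clause;
   a result greater than one means the clause can be split. */
int split_parts(const void *clause, int nvars)
{
    int  n      = clause_nlits(clause);
    int *parent = malloc((n + nvars) * sizeof(int));    /* literals first, then variables */
    for (int i = 0; i < n + nvars; i++)
        parent[i] = i;
    for (int l = 0; l < n; l++)
        for (int k = 0; k < lit_nvars(clause, l); k++) {
            int v = n + lit_var(clause, l, k);
            parent[find(parent, v)] = find(parent, l);  /* join literal and its variable  */
        }
    int parts = 0;
    for (int l = 0; l < n; l++)
        if (find(parent, l) == l)                       /* one root literal per part      */
            parts++;
    free(parent);
    return parts;
}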
We have put a lot of effort into a clause normal form translation
that produces small clause normal forms.
The major techniques are renaming and improved Skolemization
[NRW98].
Implementation
SPASS is written in ANSI-C and relies on a generic library that supports
data structures and efficient functions for automated theorem proving.
The SPASS distribution containing the source code for SPASS is freely
available from the SPASS homepage. The code should compile and run on
any machine where the usual GNU tools are available. This has been
tested on SUN Sparc workstations under Solaris and SunOS, on personal
computers under Linux and Solaris and on Dec Alpha workstations under
Digital Unix.
Expected Competition Performance
We hope to improve on the performance of SPASS at CASC-13 and CASC-14.
References
- GMW97
- Ganzinger H., Meyer C., Weidenbach C. (1997),
Soft Typing for Ordered Resolution,
McCune W.W.,
Proceedings of the 14th International Conference on Automated
Deduction (Townsville, Australia),
pp.321-335,
Lecture Notes in Artificial Intelligence 1249, Springer-Verlag.
- HS97
- Hustadt U., Schmidt R.A. (1997),
On Evaluating Decision Procedures for Modal Logics,
Pollack M.E.,
Proceedings of the 15th International Joint Conference on
Artificial Intelligence (Nagoya, Japan),
pp.202-207.
- NRW98
- Nonnengart A., Rock G., Weidenbach C. (1998),
On Generating Small Clause Normal Forms,
Kirchner C., Kirchner H.,
Proceedings of the 15th International Conference on Automated
Deduction (Lindau, Germany),
Lecture Notes in Artificial Intelligence, Springer-Verlag.
- Wei97
- Weidenbach C. (1997),
SPASS: Version 0.49,
Journal of Automated Reasoning 18(2),
pp.247-252.
- Wei98
- Weidenbach C. (1998),
Sorted Unification and Tree Automata,
Bibel W., Schmitt P.H.,
Automated Deduction, A Basis for Applications,
Kluwer Academic Publishers.
TGTP and Octopus
M. Newborn
School of Computer Science, McGill University
newborn@cs.mcgill.ca
Architecture
TGTP [New97] is a resolution-refutation
theorem prover. It carries
out a sequence of iteratively-deepening searches, looking
for a linear proof. While searching for a linear proof, TGTP
also stores information on generated unit clauses. This
information is stored in a large hash table with space for
one million entries, permitting proofs that are not linear to
be derived. Completeness is sacrificed only by limits on the
number of literals in a clause and on the number of constants,
functions, and variables in a literal. TGTP uses only two
inference rules: binary resolution and binary factoring.
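The role of the unit-clause hash table can be pictured with the following sketch (an editorial C illustration, not TGTP's code; the string encoding of literals and the helper names are hypothetical). Every derived unit clause is entered into a large table, and a later unit can be checked against the table for a complementary partner, which is what allows proofs that are not strictly linear.

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 1000003            /* roughly one million entries */

typedef struct unit {
    char        *literal;             /* canonical string form of the unit */
    struct unit *next;
} Unit;

static Unit *table[TABLE_SIZE];

static unsigned hash(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* Store a derived unit clause; return 1 if it was already present. */
int store_unit(const char *literal)
{
    unsigned h = hash(literal);
    for (Unit *u = table[h]; u; u = u->next)
        if (strcmp(u->literal, literal) == 0)
            return 1;
    Unit *u = malloc(sizeof(Unit));
    u->literal = malloc(strlen(literal) + 1);
    strcpy(u->literal, literal);
    u->next  = table[h];
    table[h] = u;
    return 0;
}

/* Check whether the complement of a unit has been generated before
   (assumes literals shorter than 255 characters, '~' marks negation). */
int complement_present(const char *literal)
{
    char buf[256];
    if (literal[0] == '~') {               /* ~p(a) -> p(a) */
        strcpy(buf, literal + 1);
    } else {                               /* p(a)  -> ~p(a) */
        buf[0] = '~';
        strcpy(buf + 1, literal);
    }
    unsigned h = hash(buf);
    for (Unit *u = table[h]; u; u = u->next)
        if (strcmp(u->literal, buf) == 0)
            return 1;
    return 0;
}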
Octopus is a parallel version of TGTP, with one master and
as many slaves as available - 25 are expected for the CADE-15
competition. The master computer reads the given theorem
from disk, spawns slaves, and then sends the theorem to the
slaves. Each slave carries out the same search process,
but each places a different limit on the number of literals
in a clause and the number of constants, functions, and variables
in a literal. In addition, some processors use the set-of-support
strategy and others do not.
Implementation
TGTP is implemented in C, consisting of twelve .c files,
two .h files, 150 functions, and 12286 lines of code.
Octopus modifies only slightly the code of TGTP for the parallel
implementation. It is expected to run on a network of 20-25
Pentium computers running at approximately 150MHz. The
processors communicate with each other using Parallel Virtual
Machine (PVM).
Expected Competition Performance
At CADE-14, TGTP solved 34 of the 72 theorems. The current
version is slightly stronger, capable of solving a few more
of the 72 theorems, but remaining weak on theorems involving
the equality axioms.
Octopus is slightly stronger than TGTP, solving 40 theorems
at CADE-14 and currently able to solve another three or four.
References
- New97
- Newborn M. (1997),
The Great Theorem Prover, Version 3,
Newborn Software, Montreal, Canada.
Vampire
A. Voronkov
Computer Science Department, Uppsala University
voronkov@csd.uu.se
Architecture
Vampire [Vor95] is a resolution-based theorem
prover.
It is not fully implemented yet, so it is unclear what features we shall be
able to present at CASC.
In the near future we plan to implement binary resolution and hyperresolution
with clause splitting. Our main emphasis is on efficient implementation,
as opposed to logical optimizations.
Implementation
Vampire is implemented in C++ with no tuning to a particular computer or
system. Except for a couple of string library functions, ANSI C++ is
used.
We use very efficient data structures for indexing. At the moment we have
code trees for forward subsumption and a combination of discrimination
trees and substitution trees for back subsumption and aggressive term
structure sharing. We are working on code trees for binary resolution
and hyperresolution.
Expected Competition Performance
We do not expect high performance since we shall not be able to
implement equality before the competition. Our aim is to see how it
works on nonequality problems and make a good version for the 1999
competition.
References
- Vor95
- Voronkov A. (1995),
The Anatomy of Vampire,
Journal of Automated Reasoning 15(2),
pp.237-265.
Waldmeister 798
T. Hillenbrand, A. Jaeger, B. Löchner, and A. Buch
FB Informatik, Universität Kaiserslautern
{hillen,jaeger,loechner}@informatik.uni-kl.de,arnim.buch@sdm.de
Architecture
Waldmeister is a theorem prover for unit equational logic. Its
proof procedure is unfailing Knuth-Bendix completion
[BDP89].
We started the development from a three-level model of saturation-based
ATP systems [BH96,
HB+97]. Since the efforts
of the last years have led to a satisfactory inference machine, we now
focus on the question of how to guide the proof search.
The main completion loop is parameterized by a selection heuristic and
a reduction ordering. Both are derived from the algebraic structure
described by the axioms. Since last year, the derivation process has
been lifted such that control knowledge is expressed declaratively,
which eases the integration of human experience. For full flexibility,
the search parameters may be re-instantiated during the proof search.
Recently, we also realized a database-driven mechanism which allows
simple extensions of the underlying inference calculus, e.g. for
reasoning on Horn clauses as described in
[BDP89:pp.24ff].
Implementation
Reaching efficiency both in terms of time and space has been a major
goal in the development of our inference machine. As to the first,
fast rewriting is achieved by sophisticated indexing techniques. As to
the second, the set of critical pairs is represented in a compressed
data structure.
To incorporate goal orientation into the completion procedure, the
hypotheses are blown up to sets of rewrite successors and sometimes
predecessors. A step-by-step proof object is constructed at run-time.
Our ATP system is implemented in ANSI C and runs under SunOS, Solaris
and Linux. The Waldmeister distribution contains an easy-to-use
version along with documentation material and problem sets. If you are
interested in tackling problems with Waldmeister, just send an
e-mail to the authors.
Expected Competition Performance
We conjecture that Waldmeister will solve the majority of the
competition problems in less time than one would need to study the
corresponding problem specifications.
References
- BDP89
- Bachmair L., Dershowitz N., Plaisted D.A. (1989),
Completion Without Failure,
Ait-Kaci H., Nivat M.,
Resolution of Equations in Algebraic Structures,
pp.1-30,
Academic Press.
- BH96
- Buch A., Hillenbrand T. (1996),
Waldmeister: Development of a High Performance Completion-Based
Theorem Prover,
SEKI-Report SR-96-01,
AG Effiziente Algorithmen, Universitaet Kaiserslautern, Kaiserslautern,
Germany.
- HB+97
- Hillenbrand T., Buch A., Vogt R., Löchner B. (1997),
Waldmeister: High Performance Equational Deduction,
Journal of Automated Reasoning 18(2),
pp.265-270.