Entrants' System Descriptions
CARINE 0.72
Paul Haroun, Monty Newborn
McGill University
Canada
pharoun@cs.mcgill.ca
Architecture
CARINE is a first-order classical logic ATP system intended for experimental
purposes.
It is based on ideas from THEO [New01].
The inference rules implemented so far are binary resolution and binary
factoring.
Implementation
CARINE is implemented in ANSI-C and currently runs under SunOS and Microsoft
Windows.
It has not yet been tested on Linux, but it works under Cygwin, the Linux
emulation layer for Windows.
CARINE relies heavily on tables that are either 1) representations of
graphs (i.e. dependency/adjacency matrices) or 2) lookup tables resulting
from memoization and dynamic programming techniques.
The graphs are relations that are formed from the information gathered from
the base clauses.
These relations are mainly groupings of the base clauses or their literals
according to certain common characteristics.
The tables are allocated dynamically based on the input and remain the same
size throughout the search.
A finite automaton, also implemented as a table, is used to store unit
clauses.
The system will be available at:
http://www.cs.mcgill.ca/~pharoun/atp_carine_site
Strategies
CARINE is based on a semi-linear resolution search.
It applies the implemented inference rules in an iteratively deepening
depth-first search until a unit clause is obtained.
The unit clause is then evaluated; if it passes the tests, it is resolved
with all the stored unit clauses, and if none of these resolutions yields
the empty clause, it is added to the unit clause table.
The main implemented strategies are delayed clause construction, time slicing,
extended depth search and memoization.
Delayed clause construction is very useful when clauses contain many
literals and/or many terms.
Only an "interesting" clause is actually constructed.
This strategy is the heart of the system.
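As a rough illustration (names and the "interesting" test below are invented;
CARINE's actual C data structures are more elaborate), a resolvent can be kept
as a small descriptor of its derivation and materialized only on demand:

    # Hypothetical sketch of delayed clause construction; unification and
    # variable renaming are omitted for brevity.
    def resolvent_descriptor(clauses, i, j, pi, pj):
        # A cheap weight estimate is available from the parents alone,
        # so no literal list is allocated at this point.
        est = (len(clauses[i]) - 1) + (len(clauses[j]) - 1)
        return {"parents": (i, j), "positions": (pi, pj), "weight": est}

    def materialize(clauses, d):
        # Construct the clause only for descriptors that passed the filter.
        (i, j), (pi, pj) = d["parents"], d["positions"]
        return [l for k, l in enumerate(clauses[i]) if k != pi] + \
               [l for k, l in enumerate(clauses[j]) if k != pj]

    clauses = [["p(a)", "q(X)"], ["~q(b)", "r(b)"]]
    d = resolvent_descriptor(clauses, 0, 1, 1, 0)
    if d["weight"] <= 4:                # the "interesting" test
        print(materialize(clauses, d))  # ['p(a)', 'r(b)']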
Input parameters are used to control the search.
If turned on, time slicing computes the time slices at the beginning of the
search, based on an evaluation of the input clauses.
Each time slice uses different parameters to control the search.
Currently, only two time slices are applicable.
In the first slice an aggressive search is performed and in the second
time slice a more conservative search is performed.
Extended depth search is performed based on certain heuristics.
Certain paths are explored deeper than the iteration depth when the
heuristics apply.
Expected Competition Performance
CARINE is a system built mainly for experimental purposes.
It is still at an early stage of development.
References
- New01
- Newborn M. (2001),
Automated Theorem Proving: Theory and Practice,
Springer.
CiME 2.01
Evelyne Contejean, Benjamin Monate
Université Paris-Sud, France
contejea@lri.fr
monate@lix.polytechnique.fr
Architecture
CiME [CM+00] is intended to be a toolkit,
which currently offers the following features:
- An interactive toplevel allowing objects to be named and various
functions to be called.
- Solving Diophantine constraints over finite intervals
- Solving Presburger constraints
- Unification modulo disjoint classical equational theories, such
as the free theory, C, AC, ACU, Abelian Groups, Boolean Rings
- String Rewriting Systems, KB completion.
- Parameterized String Rewriting Systems confluence checking
- Term Rewriting Systems, possibly with commutative or
associative-commutative symbols, KB or ordered completion.
- Termination of TRSs using standard or dependency pairs criteria,
automatic generation of termination orderings based on polynomial
interpretations, including weak orderings for dependency pairs
criteria.
The ordered completion of term rewriting systems will be used during
the competition to attempt to solve unification problems, that is,
problems in the UEQ division [CM96].
Implementation
CiME2 is fully written in
Objective CAML, a functional language of the ML family developed in the
CRISTAL project at
INRIA Rocquencourt.
CiME2 is available at:
http://cime.lri.fr/
as binaries for SPARC workstations running Solaris (at least version 2.6) and
for Pentium PCs running Linux, and its sources are available by read-only
anonymous CVS.
Strategies
There are two distinct kinds of strategies to perform completion:
- The first one is: given an equation, how should it be oriented
when it becomes a rule?
The choice is made by an ordering, which usually has to be
provided by the user.
During the competition, this ordering is chosen among RPO, KBO, or
a polynomial ordering, according to an analysis that matches the
input axioms against well-known classical ones.
- The second one is: which inference rule should be applied to the
system, orienting an equation into a rule or computing critical
pairs?
Each of these choices is given a weight, and the lowest weighted choice
is made.
The weight depends on the size of the involved equations/rules and
on how "old" they are.
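As an illustration of this weighting scheme, here is a minimal sketch with
invented weights (CiME's actual bookkeeping differs):

    import heapq, itertools

    birth = itertools.count()

    def push(queue, action, size):
        b = next(birth)
        # Lower weight wins: smaller equations/rules are preferred, and the
        # birth index b makes older entries win against newer ones.
        heapq.heappush(queue, (size + b, b, action))

    queue = []
    push(queue, "orient equation 12 into a rule", 3)
    push(queue, "compute critical pairs of rules 2 and 7", 5)
    while queue:
        w, _, action = heapq.heappop(queue)
        print(f"next step (weight {w}): {action}")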
Expected Competition Performance
This will be the second participation of CiME2 in CASC, in the UEQ
division.
References
- CM+00
- Contejean E., Marché C., Monate B., Urbain X. (2000),
CiME version 2,
http://cime.lri.fr.
- CM96
- Contejean E., Marché C. (1996),
CiME: Completion Modulo E,
Ganzinger H.,
Proceedings of the 7th International Conference on Rewriting
Techniques and Applications
(New Brunswick, USA),
pp.416-419,
Lecture Notes in Computer Science 1103,
Springer-Verlag.
Acknowledgments: Claude Marché and Xavier Urbain contributed to the
development of CiME 2.
DCTP 1.3 and 10.2p
Gernot Stenz
Max-Planck-Institut für Informatik, Germany
stenz@mpi-sb.mpg.de
Architecture
DCTP 1.3 [Ste02a] is an automated theorem
prover for first order clause logic.
It is an implementation of the disconnection calculus described in
[Bil96,LS01,Ste02b].
The disconnection calculus is a proof confluent and inherently
cut-free tableau calculus with a weak connectedness condition.
The inherently depth-first proof search is guided by a literal selection
based on literal instantiatedness or literal complexity and a heavily
parameterised link selection.
The pruning mechanisms mostly rely on different forms of variant
deletion and unit based strategies.
Additionally, the calculus has been augmented with full tableau pruning.
The new DCTP 1.3 has been enhanced with respect to clause preprocessing,
selection functions and closure heuristics.
Most prominent among the improvements is the introduction of a unification
index for finding connections, which also replaces the connection graph
hitherto used.
DCTP 10.2p is a strategy parallel version using the technology of E-SETHEO
[SW99]
to combine several different strategies based on DCTP 1.3.
Implementation
DCTP 1.3 has been implemented as a monolithic system in the Bigloo dialect
of the Scheme language.
The most important data structures are perfect discrimination trees,
which are used in many variations.
DCTP 10.2p has been derived from the Perl implementation of E-SETHEO
and includes DCTP 1.3 as well as additional components written in Prolog and
Shell tools.
Both versions run under Solaris and Linux.
Strategies
DCTP 1.3 is a single strategy prover.
Individual strategies are started by DCTP 10.2p using the schedule based
resource allocation scheme known from the E-SETHEO system.
Of course, different schedules have been precomputed for the syntactic problem
classes.
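A minimal sketch of such schedule-driven execution follows (prover flags,
class names, and time allocations are invented; the real schedules are
precomputed per problem class):

    import subprocess

    SCHEDULES = {   # problem class -> list of (strategy options, seconds)
        "EPR": [("--strategy=A", 150), ("--strategy=B", 150)],
        "HNE": [("--strategy=C", 200), ("--strategy=A", 100)],
    }

    def run_schedule(prover, problem, cls):
        for opts, secs in SCHEDULES[cls]:
            try:
                r = subprocess.run([prover, opts, problem], timeout=secs)
                if r.returncode == 0:      # assume 0 signals "proof found"
                    return True
            except subprocess.TimeoutExpired:
                pass                       # give up on this strategy, try next
        return False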
The problem classes are more or less identical with the sub-classes of the
competition organisers.
We have no idea whether or not this conflicts with the organisers' tuning
restrictions.
Expected Competition Performance
We expect both DCTP 1.3 and DCTP 10.2p to perform reasonably well, in
particular in the EPR (in any case) and SAT (depending on the selection of
problems for the competition) categories.
References
- Bil96
- Billon J-P. (1996),
The Disconnection Method: A Confluent Integration of Unification
in the Analytic Framework,
Miglioli P., Moscato U., Mundici D., Ornaghi M.,
Proceedings of TABLEAUX'96: the 5th Workshop on Theorem Proving with
Analytic Tableaux and Related Methods
(Palermo, Italy),
pp.110-126,
Lecture Notes in Artificial Intelligence 1071,
Springer-Verlag.
- LS01
- Letz R., Stenz G. (2001),
Model Elimination and Connection Tableau Procedures,
Robinson A., Voronkov A.,
Handbook of Automated Reasoning,
pp.2015-2114,
Elsevier Science.
- SW99
- Stenz G., Wolf A. (1999),
E-SETHEO: Design, Configuration and Use of a Parallel Automated
Theorem Prover,
Foo N.,
Proceedings of AI'99: The 12th Australian Joint Conference on
Artificial Intelligence
(Sydney, Australia),
pp.231-243,
Lecture Notes in Artificial Intelligence 1747,
Springer-Verlag.
- Ste02a
- Stenz G. (2002),
DCTP 1.2 - System Abstract,
Fermüller C., Egly U.,
Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic
Tableaux and Related Methods
(Copenhagen, Denmark),
pp.335-340,
Lecture Notes in Artificial Intelligence 2381,
Springer-Verlag.
- Ste02b
- Stenz G. (2002),
The Disconnection Calculus, PhD thesis,
Institut für Informatik, Technische Universität München,
Munich, Germany.
E 0.8 and EP 0.8
Stephan Schulz
Technische Universität München, Germany,
and
RISC-Linz, Johannes Kepler Universität, Austria
schulz@informatik.tu-muenchen.de
Architecture
E 0.8
[Sch01,Sch02]
is a purely equational theorem prover.
The calculus used by E combines superposition (with selection of negative
literals) and rewriting.
No special rules for non-equational literals have been implemented,
i.e. resolution is simulated via paramodulation and equality
resolution.
E also implements AC redundancy elimination and AC simplification for
dynamically recognized associative and commutative equational theories,
as well as pseudo-splitting for clauses.
It now also unfolds equational definitions in a preprocessing stage.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e. a strict separation of active and passive facts.
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering.
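Schematically, the DISCOUNT loop can be pictured as follows (placeholder
operations, not E's actual C code):

    def discount_loop(axioms, evaluate, simplify, generate, is_empty):
        active, passive = [], list(axioms)
        while passive:
            passive.sort(key=evaluate)         # clause evaluation heuristic
            given = passive.pop(0)
            given = simplify(given, active)    # only active facts simplify
            if given is None:                  # shown redundant
                continue
            if is_empty(given):
                return "proof found"
            active.append(given)
            passive.extend(generate(given, active))  # superposition etc.
        return "saturated without proof"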
The prover supports a large number of preprogrammed literal selection
strategies, many of which are only experimental.
Clause evaluation heuristics can be constructed on the fly by combining
various parameterized primitive evaluation functions, or can be
selected from a set of predefined heuristics.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
An automatic mode can select literal selection strategy, term ordering
(different versions of KBO and LPO), and search heuristic based on simple
problem characteristics.
EP 0.8 is just a combination of E 0.8 in verbose mode and a proof
analysis tool extracting the used inference steps.
Implementation
E is implemented in ANSI C, using the GNU C compiler.
The most outstanding feature is the global sharing of rewrite steps.
Contrary to earlier versions of E, which destructively changed all shared
instances of a term, the latest version only adds a rewrite
link from the rewritten to the new term.
In effect, E is caching rewrite operations as long as sufficient memory
is available.
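The effect resembles a union-find structure over terms; the following is a toy
model, not E's shared term bank:

    class Term:
        def __init__(self, text):
            self.text = text
            self.link = None        # set once this term has been rewritten

    def normal_form(t):
        # Follow rewrite links, compressing the path as in union-find, so a
        # cached rewrite step is replayed at pointer-chasing cost.
        if t.link is None:
            return t
        t.link = normal_form(t.link)
        return t.link

    a, b, c = Term("f(g(x))"), Term("f(x)"), Term("x")
    a.link, b.link = b, c           # record two rewrite steps
    print(normal_form(a).text)      # x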
A second important feature is the use of perfect discrimination trees
with age and size constraints for rewriting and unit-subsumption.
The program has been successfully installed under SunOS 4.3.x,
Solaris 2.x, HP-UX B 10.20, MacOS-X, and various
versions of Linux.
Sources of the latest released version and a current snapshot are available
freely from:
http://www4.informatik.tu-muenchen.de/~schulz/WORK/eprover.html
EP 0.8 is a simple Bourne shell script calling E and the postprocessor in
a pipeline.
Strategies
E's automatic mode is optimized for performance on TPTP 2.5.1.
The optimization is based on a fairly large number of test runs and
consists of the selection of one of about 50 different strategies for
each problem.
All test runs have been performed on SUN Ultra 60/300 machines (or roughly
equivalent configurations) with a time limit of 120 seconds.
All individual strategies are general purpose: the worst one solves
about 45% of TPTP 2.5.1, the best one about 55%.
E distinguishes problem classes based on a number of features, all of
which have between two and four possible values. These are:
- Is the most general non-negative clause unit, Horn, or Non-Horn?
- Is the most general negative clause unit or non-unit?
- Are all negative clauses unit clauses?
- Are all literals equality literals, are some literals equality
literals, or is the problem non-equational?
- Are there only a few, some, or many positive non-ground
unit clauses among the axioms?
- Are all goals (negative clauses) ground?
- Are there a few, some, or many clauses in the problem?
- Are there a few, some, or many literals?
- Are there a few, some, or many (sub)terms?
- Are there a few, some, or many positive ground
unit clauses among the axioms?
- Is the maximum arity of any function symbol 0, 1, 2, or greater?
Wherever there is a selection of few, some, and many of a certain
entity, the limits are selected automatically with the aim of
splitting the set of clauses into three sets of approximately equal
size based on this one feature.
For each non-empty class, we assign the most general candidate
heuristic that solves the same number of problems on this class as the
best heuristic on this class does.
Empty classes are assigned the globally best heuristic.
Typically, most selected heuristics are assigned to more than one class.
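For illustration, the few/some/many discretization and the table lookup might
look like this (thresholds and heuristic names are invented):

    def discretize(samples):
        # Choose limits so that this one feature splits the sample into
        # three parts of roughly equal size.
        s = sorted(samples)
        lo, hi = s[len(s) // 3], s[2 * len(s) // 3]
        return lambda v: "few" if v < lo else ("many" if v > hi else "some")

    clause_counts = [5, 7, 12, 33, 40, 61, 150, 210, 900]
    level = discretize(clause_counts)

    CLASS_TABLE = {"few": "heuristic_A",   # precomputed assignment per class
                   "some": "heuristic_B",
                   "many": "heuristic_C"}
    for n in (7, 61, 900):
        print(n, "clauses ->", CLASS_TABLE[level(n)])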
Expected Competition Performance
Last year, E performed well in the MIX category of CASC and
came in third in the UEQ division.
We believe that E will again be among the strongest provers in the
MIX category, in particular due to its good performance for Horn problems.
In UEQ, E will probably be beaten only by Waldmeister and possibly by
E-SETHEO (which incorporates E).
EP 0.8 will be hampered by the fact that it has to analyse the
inference step listing, an operation that typically is about as
expensive as the proof search itself.
Nevertheless, it should be competitive among the MIX* systems.
References
- Sch01
- Schulz S. (2001),
System Abstract: E 0.61,
Gore R., Leitsch A., Nipkow T.,
Proceedings of the International Joint Conference on Automated
Reasoning
(Siena, Italy),
pp.370-375,
Lecture Notes in Artificial Intelligence 2083,
Springer-Verlag.
- Sch02
- Schulz S. (2002),
E: A Brainiac Theorem Prover,
AI Communications 15(2-3),
pp.111-126.
E-SETHEO csp02
Reinhold Letz, Stephan Schulz, Gernot Stenz
Technische Universität München,
Germany
{letz,schulz,stenz}@informatik.tu-muenchen.de
Architecture
E-SETHEO is a compositional theorem prover for formulae in first order
clause logic, combining the systems E [Sch01],
DCTP [Ste02] and SETHEO
[MI+97].
It incorporates different calculi and proof procedures like superposition,
model elimination and semantic trees (the DPLL procedure).
Furthermore, the system contains transformation techniques which may split
the formula into independent subparts or which may perform a ground
instantiation.
Finally, advanced methods for controlling and optimizing the combination
of the subsystems are applied.
The first-order variant of E-SETHEO no longer uses Flotter
[WGR96] as a preprocessing module for
transforming non-clausal formulae to clausal form.
Instead, a more primitive normal form transformation is employed.
Since version 99csp of E-SETHEO, the different strategies are run sequentially,
one after the other.
E-SETHEO csp02 incorporates the new version of the disconnection prover DCTP
with integrated equality handling as a new strategy as well as a new version
of the E prover.
The new Scheme version of SETHEO that is in use features local unit failure
caching [LS01] and lazy root paramodulation,
an optimisation of lazy paramodulation which is complete in the Horn case
[LS02].
Implementation
Reflecting the diversity of the component systems, the modules of
E-SETHEO are implemented in different programming languages, including C,
Prolog, Scheme, and Shell tools.
The program runs under Solaris and, with a little luck, under Linux,
too. Sources are available from the authors.
Strategies
Individual strategies are started by E-SETHEO depending on the allocation
of resources to the different strategies, so-called schedules,
which have been computed from experimental data using machine learning
techniques as described in [SW99].
Schedule selection depends on syntactic characteristics of the input formula
such as the Horn-ness of formulae, whether a problem contains equality
literals or whether the formula is in the Bernays-Schönfinkel class.
The problem classes are more or less identical with the sub-classes of the
competition.
We have no idea whether or not this conflicts with the organisers' tuning
restrictions.
Expected Competition Performance
E-SETHEO csp02 was the CASC-18 EPR division winner.
References
- LS01
- Letz R., Stenz G. (2001),
Model Elimination and Connection Tableau Procedures,
Robinson A., Voronkov A.,
Handbook of Automated Reasoning,
pp.2015-2114,
Elsevier Science.
- LS02
- Letz R., Stenz G. (2002),
Integration of Equality Reasoning into the Disconnection
Calculus,
Fermüller C., Egly U.,
Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic
Tableaux and Related Methods
(Copenhagen, Denmark),
pp.176-190,
Lecture Notes in Artificial Intelligence 2381,
Springer-Verlag.
- MI+97
- Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J.,
Mayr K. (1997),
SETHEO and E-SETHEO: The CADE-13 Systems,
Journal of Automated Reasoning 18(2),
pp.237-246.
- Sch01
- Schulz S. (2001),
System Abstract: E 0.61,
Gore R., Leitsch A., Nipkow T.,
Proceedings of the International Joint Conference on Automated
Reasoning
(Siena, Italy),
pp.370-375,
Lecture Notes in Artificial Intelligence 2083,
Springer-Verlag.
- SW99
- Stenz G., Wolf A. (1999),
E-SETHEO: Design, Configuration and Use of a Parallel Automated
Theorem Prover,
Foo N.,
Proceedings of AI'99: The 12th Australian Joint Conference on
Artificial Intelligence
(Sydney, Australia),
pp.231-243,
Lecture Notes in Artificial Intelligence 1747,
Springer-Verlag.
- Ste02
- Stenz G. (2002),
DCTP 1.2 - System Abstract,
Fermüller C., Egly U.,
Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic
Tableaux and Related Methods
(Copenhagen, Denmark),
pp.335-340,
Lecture Notes in Artificial Intelligence 2381,
Springer-Verlag.
- WGR96
- Weidenbach C., Gaede B., Rock G. (1996),
SPASS and FLOTTER,
McRobbie M., Slaney J.K.,
Proceedings of the 13th International Conference on Automated
Deduction
(New Brunswick, USA),
pp.141-145,
Lecture Notes in Artificial Intelligence 1104,
Springer-Verlag.
E-SETHEO csp03
Reinhold Letz, Stephan Schulz, Gernot Stenz
Technische Universität München, Germany,
Max-Planck-Institut für Informatik, Germany,
and
RISC-Linz, Johannes Kepler Universität, Austria
{letz,schulz,stenz}@informatik.tu-muenchen.de
Architecture
E-SETHEO is a compositional theorem prover for formulae in first order
clause logic, combining the systems E [Sch01],
DCTP [Ste02] and SETHEO
[MI+97].
It incorporates different calculi and proof procedures like superposition,
model elimination and semantic trees (the DPLL procedure).
Furthermore, the system contains transformation techniques which may split
the formula into independent subparts or which may perform a ground
instantiation.
Finally, advanced methods for controlling and optimizing the combination
of the subsystems are applied.
The first-order variant of E-SETHEO no longer uses Flotter
[WGR96] as a preprocessing module for
transforming non-clausal formulae to clausal form.
Instead, a more primitive normal form transformation is employed.
Since version 99csp of E-SETHEO, the different strategies are run sequentially,
one after the other.
E-SETHEO csp03 incorporates the new version of the disconnection prover DCTP
with new preprocessing and heuristics as a new strategy, as well as a new
version of the E prover.
The new Scheme version of SETHEO that is in use features local unit failure
caching [LS01] and lazy root
paramodulation, an optimisation of lazy paramodulation which is complete in
the Horn case [LS02].
Other than that (and a new resource distribution scheme), E-SETHEO csp03 is
identical to E-SETHEO csp02.
Implementation
Reflecting the diversity of the component systems, the modules of
E-SETHEO are implemented in different programming languages, including C,
Prolog, Scheme, and Shell tools.
The program runs under Solaris and Linux.
Sources are available from the authors.
Strategies
Individual strategies are started by E-SETHEO depending on the allocation
of resources to the different strategies, so-called schedules,
which have been computed from experimental data using machine learning
techniques as described in [SW99].
Schedule selection depends on syntactic characteristics of the input formula
such as the Horn-ness of formulae, whether a problem contains equality
literals, or whether the formula is in the Bernays-Schönfinkel class.
The problem classes are more or less identical with the sub-classes of the
competition.
We have no idea whether or not this conflicts with the organisers' tuning
restrictions.
Expected Competition Performance
We expect E-SETHEO to perform well in all categories it participates in.
References
- LS01
- Letz R., Stenz G. (2001),
Model Elimination and Connection Tableau Procedures,
Robinson A., Voronkov A.,
Handbook of Automated Reasoning,
pp.2015-2114,
Elsevier Science.
- LS02
- Letz R., Stenz G. (2002),
Integration of Equality Reasoning into the Disconnection
Calculus,
Fermüller C., Egly U.,
Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic
Tableaux and Related Methods
(Copenhagen, Denmark),
pp.176-190,
Lecture Notes in Artificial Intelligence 2381,
Springer-Verlag.
- MI+97
- Moser M., Ibens O., Letz R., Steinbach J., Goller C., Schumann J.,
Mayr K. (1997),
SETHEO and E-SETHEO: The CADE-13 Systems,
Journal of Automated Reasoning 18(2),
pp.237-246.
- Sch01
- Schulz S. (2001),
System Abstract: E 0.61,
Gore R., Leitsch A., Nipkow T.,
Proceedings of the International Joint Conference on Automated
Reasoning
(Siena, Italy),
pp.370-375,
Lecture Notes in Artificial Intelligence 2083,
Springer-Verlag.
- SW99
- Stenz G., Wolf A. (1999),
E-SETHEO: Design, Configuration and Use of a Parallel Automated
Theorem Prover,
Foo N.,
Proceedings of AI'99: The 12th Australian Joint Conference on
Artificial Intelligence
(Sydney, Australia),
pp.231-243,
Lecture Notes in Artificial Intelligence 1747,
Springer-Verlag.
- Ste02
- Stenz G. (2002),
DCTP 1.2 - System Abstract,
Fermüller C., Egly U.,
Proceedings of TABLEAUX 2002: Automated Reasoning with Analytic
Tableaux and Related Methods
(Copenhagen, Denmark),
pp.335-340,
Lecture Notes in Artificial Intelligence 2381,
Springer-Verlag.
- WGR96
- Weidenbach C., Gaede B., Rock G. (1996),
SPASS and FLOTTER,
McRobbie M., Slaney J.K.,
Proceedings of the 13th International Conference on Automated
Deduction
(New Brunswick, USA),
pp.141-145,
Lecture Notes in Artificial Intelligence 1104,
Springer-Verlag.
Gandalf c-2.5
Tanel Tammet
Tallinn Technical University, Estonia
Safelogic AB, Sweden
tammet@cc.ttu.ee
Architecture
Gandalf
[Tam97,Tam98]
is a family of automated theorem provers, including classical, type theory,
intuitionistic and linear logic provers, plus a finite model builder.
The version c-2.5 contains the classical logic prover for clause
form input and the finite model builder.
One distinguishing feature of Gandalf is that it contains a large
number of different search strategies and is capable of automatically
selecting suitable strategies and experimenting with these strategies.
Gandalf is available under GPL.
There exists a separate commercial version of Gandalf, called G,
developed and distributed by Safelogic AB
(www.safelogic.se), which
contains numerous additions, strategies, and optimisations, aimed specifically
at verification of large systems.
The finite model building component of Gandalf uses the Zchaff propositional
logic solver by L. Zhang [MM+01] as an external program called by Gandalf.
Zchaff is not free, although it can be used freely for research purposes.
Gandalf is not optimised for Zchaff or linked together with it: Zchaff
can be freely replaced by other satisfiability checkers.
Implementation
Gandalf is implemented in Scheme and compiled to C using the Hobbit
Scheme-to-C compiler. Version scm5d6 of the Scheme interpreter scm by A. Jaffer
is used as the underlying Scheme system.
Gandalf has been tested on Linux, Solaris and MS Windows under Cygwin.
The propositional satisfiability checker Zchaff used by Gandalf during
finite model building is implemented by L.Zhang in C++.
Gandalf should be publicly available at:
http://www.ttu.ee/it/gandalf
Strategies
One of the basic ideas used in Gandalf is time slicing: Gandalf typically
runs a number of searches with different strategies one after another, until
either a proof is found or time runs out. During each individual run,
Gandalf typically modifies its strategy as the run's time limit approaches.
Selected clauses from unsuccessful runs are sometimes reused in later runs.
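Schematically, the time-slicing driver might look like this (the search
interface and budget split below are invented for illustration):

    import time

    def run_slices(strategies, budget, search_step):
        # strategies: list of (name, fraction of the total budget);
        # search_step does a bounded amount of work and returns
        # "unsat"/"sat"/None plus clauses worth keeping.
        carried = []                        # clauses reused in later runs
        for name, share in strategies:
            deadline = time.monotonic() + budget * share
            midpoint = deadline - budget * share / 2
            while time.monotonic() < deadline:
                strict = time.monotonic() > midpoint  # tighten near the end
                outcome, keep = search_step(name, strict, carried)
                if outcome is not None:
                    return outcome
                carried.extend(keep)
        return "unknown"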
In the normal mode Gandalf attempts to find only unsatisfiability.
It has to be called with a -sat flag to find satisfiability.
Gandalf selects the strategy list according to the following criteria:
- Unsatisfiability checking.
Gandalf selects the basic strategies from the following list:
hyperresolution, binary sos resolution, unit resolution, ordered
resolution (term-depth based, literal size based and polarity plus
literal size and structure based).
Each strategy may be iterated over a limit on term depth.
For clause sets containing equality, some strategies are tried with both
the Knuth-Bendix ordering and recursive path ordering, as well as with
several different ordering principles of function symbols for these
orderings.
Typically Gandalf selects one or two strategies to iterate over the term
depth limit and one or two strategies to iterate over the selection of
equality orderings.
At the second half of each strategy run Gandalf imposes additional
restrictions, like introducing unit restriction and switching over
to strict best-first clause selection.
The strategy list selection criteria for a particular problem are based
on the following:
- Problem class from TPTP: UEQ, PEQ, HNE, HEQ, NEQ, NNE.
This strictly determines the list of basic strategies.
The following criteria determine the relative amount of time given to
each strategy.
- Problem size.
A problem is classified either as small, medium, or big, according
to the number of clauses in the problem.
For bigger problems, the set of support strategy gets relatively
more time than other strategies.
- Percentage of clauses which can be ordered by term depth:
small, medium, or all.
For bigger percentages, the term depth ordering gets relatively
more time than other strategies.
- Satisfiability checking.
The following strategies are run:
- Finite model building by incremental search through function symbol
interpretations.
- Ordered binary resolution (term depth): only for problems not
containing equality.
- Finite model building using MACE-style flattening and external
propositional prover.
- Satisfiability/unsatisfiability checking for essentially propositional
problems.
The following strategies are run:
- Unsatisfiability search by resolution.
- Satisfiability/unsatisfiability search by propositional saturation.
- Satisfiability search for small models by propositional saturation.
Expected Competition Performance
Gandalf c-2.5-SAT was the CASC-18 SAT division winner.
References
- MM+01
- Moskewicz M., Madigan C., Zhao Y., Zhang L., Malik S. (2001),
Chaff: Engineering an Efficient SAT Solver,
Blaauw D., Lavagno L.,
Proceedings of the 39th Design Automation Conference
(Las Vegas, USA),
pp.530-535.
- Tam97
- Tammet T. (1997),
Gandalf,
Journal of Automated Reasoning 18(2),
pp.199-204.
- Tam98
- Tammet T. (1998),
Towards Efficient Subsumption,
Kirchner C., Kirchner H.,
Proceedings of the 15th International Conference on Automated
Deduction
(Lindau, Germany),
pp.427-440,
Lecture Notes in Artificial Intelligence 1421,
Springer-Verlag.
Gandalf c-2.6
Tanel Tammet
Tallinn Technical University, Estonia
tammet@cc.ttu.ee
Architecture
Gandalf
[Tam97,Tam98]
is a family of automated theorem provers, including classical, type theory,
intuitionistic and linear logic provers, plus a finite model builder.
The version c-2.6 contains the classical logic prover for clause
form input and the finite model builder.
One distinguishing feature of Gandalf is that it contains a large
number of different search strategies and is capable of automatically
selecting suitable strategies and experimenting with these strategies.
The finite model building component of Gandalf uses the Zchaff propositional
logic solver by L. Zhang [MM+01] as an
external program called by Gandalf.
Zchaff is not free, although it can be used freely for research purposes.
Gandalf is not optimised for Zchaff or linked together with it: Zchaff
can be freely replaced by other satisfiability checkers.
Implementation
Gandalf is implemented in Scheme and compiled to C using the Hobbit
Scheme-to-C compiler.
Version scm5d6 of the Scheme interpreter scm by A. Jaffer is used as the
underlying Scheme system.
Zchaff is implemented in C++.
Gandalf has been tested on Linux, Solaris, and MS Windows under Cygwin.
Gandalf is available under GPL from:
http://www.ttu.ee/it/gandalf
Strategies
One of the basic ideas used in Gandalf is time slicing: Gandalf typically
runs a number of searches with different strategies one after another, until
either a proof is found or time runs out. During each individual run,
Gandalf typically modifies its strategy as the run's time limit approaches.
Selected clauses from unsuccessful runs are sometimes reused in later runs.
In the normal mode Gandalf attempts to find only unsatisfiability.
It has to be called with a -sat flag to find satisfiability.
Gandalf selects the strategy list according to the following criteria:
- Unsatisfiability checking.
Gandalf selects the basic strategies from the following list:
hyperresolution, binary sos resolution, unit resolution, ordered
resolution (term-depth based, literal size based and polarity plus
literal size and structure based).
Each strategy may be iterated over a limit on term depth.
For clause sets containing equality, some strategies are tried with both
the Knuth-Bendix ordering and recursive path ordering, as well as with
several different ordering principles of function symbols for these
orderings.
Typically Gandalf selects one or two strategies to iterate over the term
depth limit and one or two strategies to iterate over the selection of
equality orderings.
At the second half of each strategy run Gandalf imposes additional
restrictions, like introducing unit restriction and switching over
to strict best-first clause selection.
The strategy list selection criteria for a particular problem are based
on the following:
- Problem class from TPTP: UEQ, PEQ, HNE, HEQ, NEQ, NNE.
This strictly determines the list of basic strategies.
The following criteria determine the relative amount of time given to
each strategy.
- Problem size.
A problem is classified either as small, medium, or big, according
to the number of clauses in the problem.
For bigger problems, the set of support strategy gets relatively
more time than other strategies.
- Percentage of clauses which can be ordered by term depth:
small, medium, or all.
For bigger percentages, the term depth ordering gets relatively
more time than other strategies.
- Satisfiability checking.
The following strategies are run:
- Finite model building by incremental search through function symbol
interpretations.
- Ordered binary resolution (term depth): only for problems not
containing equality.
- Finite model building using MACE-style flattening and the external
propositional prover.
- Satisfiability/unsatisfiability checking for essentially propositional
problems.
The following strategies are run:
- Unsatisfiability search by resolution.
- Satisfiability/unsatisfiability search by propositional saturation.
- Satisfiability search for small models by propositional saturation.
Expected Competition Performance
We expect Gandalf to be among the best provers in most of the main categories
it competes in.
References
- MM+01
- Moskewicz M., Madigan C., Zhao Y., Zhang L., Malik S. (2001),
Chaff: Engineering an Efficient SAT Solver,
Blaauw D., Lavagno L.,
Proceedings of the 39th Design Automation Conference
(Las Vegas, USA),
pp.530-535.
- Tam97
- Tammet T. (1997),
Gandalf,
Journal of Automated Reasoning 18(2),
pp.199-204.
- Tam98
- Tammet T. (1998),
Towards Efficient Subsumption,
Kirchner C., Kirchner H.,
Proceedings of the 15th International Conference on Automated
Deduction
(Lindau, Germany),
pp.427-440,
Lecture Notes in Artificial Intelligence 1421,
Springer-Verlag.
MUSCADET 2.4
D. Pastre
Université René Descartes (Paris 5),
France
pastre@math-info.univ-paris5.fr
Architecture
The MUSCADET theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of
[Ble71] and
[Pas78],
and uses methods that resemble those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems,
or build new rules which are local for a (sub-)theorem.
Implementation
MUSCADET 2 [Pas02b]
is implemented in SWI-Prolog.
Rules are written as declarative Prolog clauses.
Metarules are written as sets of Prolog clauses, more or less declarative.
The inference engine includes the Prolog interpreter and some procedural Prolog
clauses.
MUSCADET 2.4 is available from:
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
Strategies
There are specific strategies for existential, universal, conjunctive or
disjunctive hypotheses, and conclusions.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one) or
refutation (only if the conclusion is a negation).
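As a toy illustration of one such rule (MUSCADET's rules are Prolog clauses;
this Python rendering and its names are ours):

    def split_conjunctive_hypothesis(facts):
        # Forward deduction: replace a hypothesis "A and B" by A and B.
        for h in list(facts["hyp"]):
            if isinstance(h, tuple) and h[0] == "and":
                facts["hyp"].remove(h)
                facts["hyp"] += [h[1], h[2]]
                return True
        return False

    facts = {"hyp": [("and", "p", "q")], "concl": "q"}
    while split_conjunctive_hypothesis(facts):
        pass
    print("proved" if facts["concl"] in facts["hyp"] else "stuck")  # proved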
The system is also able to work with second order statements. It
may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and
[Pas93]. These two possibilities
are not used while working with the TPTP Library.
Expected Competition Performance
MUSCADET will perform best on problems manipulating many concepts,
in which all statements (conjectures, definitions, axioms) are expressed
in a manner similar to the practice of humans, especially of
mathematicians. It will perform poorly on problems using few concepts
but large and deep formulas, which lead to many splittings.
References
- Ble71
- Bledsoe W.W. (1971),
Splitting and Reduction Heuristics in Automatic Theorem Proving,
Artificial Intelligence 2,
pp.55-77.
- Pas78
- Pastre D. (1978),
Automatic Theorem Proving in Set Theory,
Artificial Intelligence 10,
pp.1-27.
- Pas89
- Pastre D. (1989),
MUSCADET : An Automatic Theorem Proving System using Knowledge
and Metaknowledge in Mathematics,
Artificial Intelligence 38,
pp.257-318.
- Pas93
- Pastre D. (1993),
Automated Theorem Proving in Mathematics,
Annals of Mathematics and Artificial Intelligence 8,
pp.425-447.
- Pas01a
- Pastre D. (2001),
Muscadet2.3 : A Knowledge-based Theorem Prover based on Natural
Deduction,
Gore R., Leitsch A., Nipkow T.,
Proceedings of the International Joint Conference on Automated
Reasoning
(Siena, Italy),
pp.685-689,
Lecture Notes in Artificial Intelligence 2083,
Springer-Verlag.
- Pas01b
- Pastre D. (2001),
Implementation of Knowledge Bases for Natural Deduction,
Nieuwenhuis R., Voronkov A.,
Proceedings of the 8th International Conference on Logic for
Programming, Artificial Intelligence and Reasoning
(Havana, Cuba),
pp.29-68,
Lecture Notes in Artificial Intelligence 2250,
Springer-Verlag.
- Pas02a
- Pastre D. (2002),
Strong and Weak Points of the MUSCADET Theorem Prover,
AI Communications 15(2-3),
pp.147-160.
- Pas02b
- Pastre D. (2002),
MUSCADET version 2.4 : User's Manual,
http://www.math-info.univ-paris5.fr/~pastre/muscadet/manual-en.ps.
Octopus N
Monty Newborn, Zongyan Wang
McGill University, Canada
newborn@cs.mcgill.ca
Architecture
Octopus is a parallel ATP system.
It is an improved version of the single-processor ATP system Theo
[New01].
Inference rules used by Octopus include binary resolution, binary factoring,
instantiation, demodulation, and hash table resolutions.
Octopus performs 3000-10000 inferences/second on each processor.
Implementation
Octopus is implemented in C and currently runs under Linux.
It runs on a network of 20-40 PCs housed in a laboratory at McGill
University's School of Computer Science.
The processors communicate using PVM [GB+94].
Strategies
Octopus begins by determining a number of weakened versions of the given
theorem, and then assigns one such version to each computer. Each
computer then attempts to prove the weakened version of the theorem
assigned to it. If successful, the computer then uses the proof found for
the weakened theorem to help prove the given theorem. In essence, Octopus
combines learning and parallel theorem proving.
In the current version of Octopus, a weakened version of a theorem
consists of the same clauses as the given theorem except for one. In that
one clause, a constant is replaced by a variable that does not appear
elsewhere in the clause. If a proof exists for the given theorem, a proof
exists for the weakened version, and often, though far from always, the
proof of the weakened version is easier to find.
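For illustration, with a naive string-level clause representation (not
Octopus's actual one):

    def weaken(clauses, idx, const, fresh="Xw"):
        # Replace one constant in one clause by a variable assumed to occur
        # nowhere else in that clause.
        weakened = [lit.replace(const, fresh) for lit in clauses[idx]]
        return clauses[:idx] + [weakened] + clauses[idx + 1:]

    theorem = [["p(a,b)"], ["~p(a,Y)", "q(Y)"], ["~q(b)"]]
    # one weakened version per processor, e.g.:
    print(weaken(theorem, 0, "b"))
    # [['p(a,Xw)'], ['~p(a,Y)', 'q(Y)'], ['~q(b)']]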
When a proof is found for a weakened version, certain clauses in
the proof are added to the base clauses of the given theorem. Octopus
then tries to prove the given theorem with the augmented set of base
clauses.
Each processor in the system also uses different values for the
maximum number of literals and terms in inferences generated when looking
for a proof. Thus while two computers may try to solve the same weakened
version of the given theorem, they do it with different values for the
maximum number of literals and terms in derived inferences.
Expected Competition Performance
Octopus should do somewhat better than its predecessors that competed in
the competition in the late 1990s, though it is not likely to finish among
the top performers.
References
- New01
- Newborn M. (2001),
Automated Theorem Proving: Theory and Practice,
Springer.
- GB+94
- Geist A., Beguelin A., Dongarra J., Jiang W., Manchek R., and Sunderam V.
(1994),
PVM: Parallel Virtual Machine: A Users Guide and Tutorial for
Network Parallel Computing,
MIT Press.
Otter 3.2
William McCune
Argonne National Laboratory, USA
mccune@mcs.anl.gov
Architecture
Otter 3.2 [McC94] is an ATP system for
statements in first-order (unsorted) logic with equality.
Otter is based on resolution and paramodulation applied to clauses.
An Otter search uses the "given clause algorithm", and typically involves
a large database of clauses; subsumption and demodulation play an important
role.
Implementation
Otter is written in C.
Otter uses shared data structures for clauses and terms, and it uses
indexing for resolution, paramodulation, forward and backward subsumption,
forward and backward demodulation, and unit conflict.
Otter is available from:
http://www-unix.mcs.anl.gov/AR/otter/
Strategies
Otter's original automatic mode, which reflects no tuning to the TPTP
problems, will be used.
Expected Competition Performance
Otter has been entered into CASC-19 as a stable benchmark against which
progress can be judged (there have been only minor changes to Otter since
1996 [MW97], nothing that really affects
its performance in CASC).
This is not an ordinary entry, and we do not expect Otter to do well
in the competition.
References
- McC94
- McCune W.W. (1994),
Otter 3.0 Reference Manual and Guide,
ANL-94/6,
Argonne National Laboratory, Argonne, USA.
- MW97
- McCune W.W., Wos L. (1997),
Otter: The CADE-13 Competition Incarnations,
Journal of Automated Reasoning 18(2),
pp.211-220.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and
Rusty Lusk contributed to the development of Otter.
Paradox 1.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology and
Gothenburg University, Sweden
{koen,nik}@cs.chalmers.se
Architecture
Paradox 1.0 [CS03] is a finite-domain model
generator.
It is based on a MACE-style [McC94]
flattening and instantiating of the FO clauses into propositional clauses, and
then the use of a SAT solver to solve the resulting problem.
Paradox incorporates the following novel features: new polynomial-time
clause-splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
Implementation
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Paradox uses the following non-standard Haskell extensions: local universal
type quantification and hash-consing.
Strategies
There is only one strategy in Paradox:
- Analyze the problem, finding an upper bound N on the domain size of
models, where N is possibly infinite.
Such a finite upper bound can, for example, be found for EPR problems.
- Flatten the problem, and split clauses and simplify as much as possible.
- Instantiate the problem for domain sizes 1 up to N, applying the SAT
solver incrementally for each size.
Report "SATISFIABLE" when a model is found.
- When no model of sizes smaller or equal to N is found, report
"CONTRADICTION".
In this way, Paradox can be used both as a model finder and as an EPR solver.
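The strategy corresponds to a driver of roughly the following shape
(instantiation and SAT solving are placeholders; in the real system the
solver is called incrementally):

    def paradox_loop(flat_clauses, upper_bound, instantiate, sat_solve):
        n = 1
        while upper_bound is None or n <= upper_bound:
            prop = instantiate(flat_clauses, n)  # ground over domain {1..n}
            if sat_solve(prop):
                return "SATISFIABLE"             # a model of size n exists
            n += 1
        return "CONTRADICTION"  # finite bound N exhausted: no model at all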
Expected Competition Performance
Paradox will enter the CASC competition in two categories: SAT and EPR.
Paradox beats last year's winner (2002) in the SAT category, and has also
solved a number of "unknown" problems from TPTP within a short time limit.
So it should have some chance of winning this year's SAT competition.
Paradox is not optimized at all for the EPR category, but should perform
reasonably well.
References
- CS03
- Claessen K., Sorensson N. (2003),
New Techniques that Improve MACE-style Finite Model
Finding,
Baumgartner P., Fermueller C.,
Proceedings of the CADE-19 Workshop: Model Computation - Principles,
Algorithms, Applications
(Miami, USA).
- McC94
- McCune W.W. (1994),
A Davis-Putnam Program and its Application to Finite First-Order
Model Search: Quasigroup Existence Problems,
ANL/MCS-TM-194,
Argonne National Laboratory, Argonne, USA.
THEO J2003
Monty Newborn
McGill University,
Canada
newborn@cs.mcgill.ca
Architecture
THEO [New01] is a resolution-refutation
theorem prover for first order clause logic.
It uses binary resolution, binary factoring, instantiation, demodulation,
and hash table resolutions.
Implementation
THEO is written in C and runs under both Linux and FreeBSD. It contains
about 35,000 lines of source code.
Originally it was called The Great Theorem Prover.
Strategies
THEO uses a large hash table (16 megaentries) to store clauses. This
permits complex proofs to be found, some as long as 500 inferences.
It uses what might be called a brute-force iteratively deepening
depth-first search for a contradiction while storing information
about clauses - unit clauses in particular - in its hash table.
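The unit-clause lookup can be pictured as follows (a toy version; THEO's
fixed-size 16-mega-entry table and its hashing scheme are more elaborate):

    units = {}

    def store_unit(lit):
        units[lit] = True

    def hash_table_resolution(lit):
        # A derived literal clashes with a stored unit of opposite sign,
        # which amounts to resolving against it in one table lookup.
        complement = lit[1:] if lit.startswith("~") else "~" + lit
        return complement in units

    store_unit("p(a)")
    print(hash_table_resolution("~p(a)"))  # True: yields the empty clause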
Expected Competition Performance
THEO should perform slightly better than it has in the past.
References
- New01
- Newborn M. (2001),
Automated Theorem Proving: Theory and Practice,
Springer.
Vampire 5.0
Alexandre Riazanov, Andrei Voronkov
University of Manchester, England
{riazanoa,voronkov}@cs.man.ac.uk
Architecture
Vampire [RV01,
RV02] 5.0 is an automatic theorem prover
for first-order classical logic.
Its kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
The splitting rule is simulated by introducing new predicate symbols.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The only term ordering used in Vampire at the moment is a special
non-recursive version of the Knuth-Bendix ordering that allows
efficient approximation algorithms for solving ordering constraints.
By the system installation deadline we may implement the standard
Knuth-Bendix ordering.
A number of efficient indexing techniques are used to implement all
major operations on sets of terms and clauses.
Although the kernel of the system works only with clausal normal
forms, the preprocessor component accepts a problem in the full
first-order logic syntax, clausifies it and performs a number of
useful transformations before passing the result to the kernel.
Implementation
Vampire 5.0 is implemented in C++.
The main supported compiler version is gcc 2.95.3, although in the near
future we are going to move to gcc 3.x.
The system has been successfully compiled for Linux and Solaris.
It is available from:
http://www.cs.man.ac.uk/~riazanoa/Vampire/
Strategies
The Vampire kernel provides a fairly large number of features
for strategy selection.
The most important ones are:
- Choice of the main saturation procedure: (i) OTTER loop, with or without
the Limited Resource Strategy, (ii) DISCOUNT loop.
- A variety of optional simplifications.
- Parameterised simplification ordering.
- A number of built-in literal selection functions and different modes of
comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
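A common way to realize the age-weight ratio from the last item above is to
keep two priority queues and alternate between them according to the ratio
(a sketch; Vampire's internal realization may differ):

    import heapq, itertools

    class PassiveSet:
        # Picks alternate between the queues according to the ratio, e.g.
        # 1:5 favours light clauses but still consumes old ones. Stale
        # duplicate entries are ignored here for brevity.
        def __init__(self, by_age=1, by_weight=5):
            self.age_q, self.weight_q = [], []
            self.ratio, self.picks = (by_age, by_weight), 0
            self.birth = itertools.count()

        def add(self, clause, weight):
            b = next(self.birth)
            heapq.heappush(self.age_q, (b, clause))
            heapq.heappush(self.weight_q, (weight, b, clause))

        def pop(self):
            a, w = self.ratio
            use_age = self.picks % (a + w) < a
            self.picks += 1
            q = self.age_q if use_age else self.weight_q
            return heapq.heappop(q)[-1]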
The standalone executables for Vampire 5.0 and Vampire 5.0-CASC use very
simple time slicing to make sure that several kernel strategies are tried on
a given problem.
The automatic mode of Vampire 5.0 is primitive.
Seven problem classes are distinguished corresponding to the competition
divisions HNE, HEQ, NNE, NEQ, PEQ, UEQ and EPR.
Every class is assigned a fixed schedule consisting of a number of kernel
strategies called one by one with different time limits.
Expected Competition Performance
Vampire 5.0 is the CASC-18 MIX and FOF divisions winner.
References
- RV01
- Riazanov A., Voronkov A. (2001),
Vampire 1.1 (System Description),
Gore R., Leitsch A., Nipkow T.,
Proceedings of the International Joint Conference on Automated
Reasoning
(Siena, Italy),
pp.376-380,
Lecture Notes in Artificial Intelligence 2083,
Springer-Verlag.
- RV02
- Riazanov A., Voronkov A. (2002),
The Design and Implementation of Vampire,
AI Communications 15(2-3),
pp.91-110.
Vampire 6.0
Alexandre Riazanov, Andrei Voronkov
University of Manchester, England
{riazanoa,voronkov}@cs.man.ac.uk
Architecture
Vampire [RV02] 6.0 is an automatic theorem
prover for first-order classical logic.
Its kernel implements the calculi of ordered binary resolution, superposition
for handling equality and ordered chaining for transitive predicates.
The splitting rule is simulated by introducing new predicate symbols.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction orderings used are the standard Knuth-Bendix ordering and a
special non-recursive version of the Knuth-Bendix ordering that allows
efficient approximation algorithms for solving ordering constraints.
A number of efficient indexing techniques are used to implement all major
operations on sets of terms and clauses.
Although the kernel of the system works only with clausal normal forms, the
preprocessor component accepts a problem in the full first-order logic syntax,
clausifies it and performs a number of useful transformations before passing
the result to the kernel.
Implementation
Vampire 6.0 is implemented in C++.
The supported compilers are gcc 2.95.3, 3.x and Microsoft Visual C++.
The system has been successfully compiled for Linux, Solaris and Win32.
It is available (conditions apply) from:
http://www.cs.man.ac.uk/~riazanoa/Vampire/
Strategies
The Vampire kernel provides a fairly large number of features for strategy
selection.
The most important ones are:
- Choice of the main saturation procedure: (i) OTTER loop, with or
without the Limited Resource Strategy, (ii) DISCOUNT loop.
- A variety of optional simplifications.
- Parameterised reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are
preferred for inference selection.
The standalone executable for Vampire 6.0 uses very
simple time slicing to make sure that several kernel strategies are tried
on a given problem.
The automatic mode of Vampire 6.0 is primitive.
Seven problem classes are distinguished corresponding to the competition
divisions HNE, HEQ, NNE, NEQ, PEQ, UEQ and EPR.
Every class is assigned a fixed schedule consisting of a number of kernel
strategies called one by one with different time limits.
Expected Competition Performance
We have made many, mostly cosmetic, changes since version 5.0, but they are
unlikely to affect the performance drastically.
We hope that Vampire 6.0 will perform at least as well as Vampire 5.0.
References
- RV02
- Riazanov A., Voronkov A. (2002),
The Design and Implementation of Vampire,
AI Communications 15(2-3),
pp.91-110.
Waldmeister 702
Thomas Hillenbrand (1), Bernd Löchner (2)
(1) Max-Planck-Institut für Informatik Saarbrücken, Germany
(2) Universität Kaiserslautern, Germany
waldmeister@informatik.uni-kl.de
Architecture
Waldmeister 702 is an implementation of unfailing Knuth-Bendix
completion [BDP89] with extensions
towards ordered completion (see [AHL00])
and basicness [BG+92,
NR92].
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a
heuristic assessment of passive facts.
Only recently, we have designed a thorough refinement of the system
architecture concerning the representation of passive facts
[HL02].
The aim of that work - the next Waldmeister loop - is, besides gaining
more structural clarity, to cut down memory consumption, especially for
long-lasting proof attempts - a benefit hence less relevant in the CASC
setting.
Implementation
The system is implemented in ANSI-C and runs under Solaris and Linux.
The central data structures are: perfect discrimination trees for the
active facts; element-wise compressions for the passive ones; and sets
of rewrite successors for the conjectures. Waldmeister can be found on
the Web at:
http://www-avenhaus.informatik.uni-kl.de/waldmeister
Strategies
Our approach to control the proof search is to choose the search
parameters according to the algebraic structure given in the problem
specification [HJL99].
This is based on the observation that proof tasks sharing major parts of
their axiomatization often behave similarly.
Hence, for a number of domains, the influence of different reduction orderings
and heuristic assessments has been analyzed experimentally; and in most cases
it has been possible to distinguish a strategy uniformly superior on the whole
domain.
In essence, every such strategy consists of an instantiation of the first
parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and
an instantiation of the second parameter to one of the weighting functions
addweight, gtweight, or mixweight, which, if
called on an equation u = v, return |u| + |v|, |max(u, v)|, or
|max(u, v)| * (|u| + |v| + 1) + |u| + |v|, respectively, where |t| denotes
the number of symbols in t and max(u, v) the greater of u and v with
respect to the reduction ordering.
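Under this reading (the formulas above are our reconstruction, as the
original notation was lost in conversion), the weighting functions compute,
for example:

    def symbols(t):        # terms as nested tuples, e.g. ("f", ("x",), ("e",))
        return 1 + sum(symbols(s) for s in t[1:])

    def addweight(u, v):
        return symbols(u) + symbols(v)

    def gtweight(u, v):
        # stand-in: takes the syntactically larger side; the real function
        # uses the greater side w.r.t. the reduction ordering
        return max(symbols(u), symbols(v))

    def mixweight(u, v):
        su, sv = symbols(u), symbols(v)
        return max(su, sv) * (su + sv + 1) + su + sv

    u, v = ("f", ("x",), ("e",)), ("x",)       # the equation f(x,e) = x
    print(addweight(u, v), gtweight(u, v), mixweight(u, v))   # 4 3 19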
Expected Competition Performance
Waldmeister 702 is the CASC-18 UEQ division winner.
References
- AHL00
- Avenhaus J., Hillenbrand T., Löchner B. (2000),
On Using Ground Joinable Equations in Equational Theorem
Proving,
Baumgartner P., Zhang H.,
Proceedings of the 3rd International Workshop on First Order Theorem
Proving
(St Andrews, Scotland),
pp.33-43.
- BDP89
- Bachmair L., Dershowitz N., Plaisted D.A. (1989),
Completion Without Failure,
Ait-Kaci H., Nivat M.,
Resolution of Equations in Algebraic Structures,
pp.1-30,
Academic Press.
- BG+92
- Bachmair L., Ganzinger H., Lynch C., Snyder W. (1992),
Basic Paramodulation and Superposition,
Kapur D.,
Proceedings of the 11th International Conference on Automated
Deduction (Saratoga Springs, USA),
pp.462-476,
Lecture Notes in Artificial Intelligence 607,
Springer-Verlag.
- HJL99
- Hillenbrand T., Jaeger A., Löchner B. (1999),
Waldmeister - Improvements in Performance and Ease of Use,
Ganzinger H.,
Proceedings of the 16th International Conference on Automated
Deduction
(Trento, Italy),
pp.232-236,
Lecture Notes in Artificial Intelligence 1632,
Springer-Verlag.
- HL02
- Hillenbrand T., Löchner B. (2002),
The Next Waldmeister Loop,
Voronkov A.,
Proceedings of the 18th International Conference on Automated
Deduction
(Copenhagen, Denmark),
pp.486-500,
Lecture Notes in Artificial Intelligence 2392,
Springer-Verlag.
- NR92
- Nieuwenhuis R., Rivero J.M. (1992),
Basic Superposition is Complete,
Krieg-Brückner B.,
Proceedings of the 4th European Symposium on Programming
(Rennes, France),
pp.371-390,
Lecture Notes in Computer Science 582,
Springer-Verlag.
Waldmeister 703
Jean-Marie Gaillourdet (1), Thomas Hillenbrand (2), Bernd Löchner (1)
(1) Universität Kaiserslautern, Germany
(2) Max-Planck-Institut für Informatik Saarbrücken, Germany
waldmeister@informatik.uni-kl.de
Architecture
Waldmeister is a system for unit equational deduction.
Its theoretical basis is unfailing completion in the sense of
[BDP89] with refinements
towards ordered completion (cf. [AHL03]).
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a
heuristic assessment of passive facts
[HJL99].
For an in-depth description of the system, see
[Hil03].
Since last year's competition, the "new Waldmeister loop" has been
implemented, and is now operational
[GH+03].
This notion captures a novel organization of the saturation-based proof
procedure into a system architecture
[HL02], featuring a highly compact
representation of the search state which exploits its inherent structure.
The revealed structure also paves the way to an easily implemented
parallelization of the prover with modest communication requirements
and rewarding speed-ups.
With the new architecture it is now possible to solve problems that were
previously out of reach, for example deriving Boolean from the second Winker
Lemma (ROB007-1).
The proof is found with a standard strategy in just an overnight parallel run.
Over 70,000 active facts and 500,000,000 passive ones are represented in no
more than 200 MBytes.
This foils the common belief that provers succeed either within five minutes
or not at all.
Implementation
The prover is coded in ANSI-C.
It runs on Solaris, Linux, and newly also on MacOS X.
In addition, it is now available for Windows users via the Cygwin platform.
The central data structures are: perfect discrimination trees for the active
facts; group-wise compressions for the passive ones; and sets of rewrite
successors for the conjectures.
Visit the thoroughly rewritten Waldmeister Web pages at:
http://www.mpi-sb.mpg.de/~hillen/waldmeister/
Strategies
The approach taken to control the proof search is to choose the
search parameters according to the algebraic structure given in the
problem specification [HJL99].
This is based on the observation that proof tasks sharing major parts of
their axiomatization often behave similarly.
Hence, for a number of domains, the influence of different
reduction orderings and heuristic assessments has been analyzed
experimentally; and in most cases it has been possible to distinguish
a strategy uniformly superior on the whole domain.
In essence, every such strategy consists of an instantiation of the first
parameter to a Knuth-Bendix ordering or to a lexicographic path ordering,
and an instantiation of the second parameter to one of the weighting
functions addweight, gtweight, or mixweight, which, if called on an
equation u = v, return |u| + |v|, |max(u, v)|, or
|max(u, v)| * (|u| + |v| + 1) + |u| + |v|, respectively, where |t| denotes
the number of symbols in t and max(u, v) the greater of u and v with
respect to the reduction ordering.
Expected Competition Performance
The focus of recent developments has been on coping with large
search states; so we hope that this year's system version will be
competitive with last year's.
References
- AHL03
- Avenhaus J., Hillenbrand T., Löchner B. (2003),
On Using Ground Joinable Equations in Equational Theorem
Proving,
Journal of Symbolic Computation 36(1-2),
pp.217-233,
Elsevier Science.
- BDP89
- Bachmair L., Dershowitz N., Plaisted D.A. (1989),
Completion Without Failure,
Ait-Kaci H., Nivat M.,
Resolution of Equations in Algebraic Structures,
pp.1-30,
Academic Press.
- GH+03
- Gaillourdet J-M., Hillenbrand T., Löchner B., Spies H. (2003),
The New Waldmeister Loop at Work,
Baader F.,
Proceedings of the 19th International Conference on Automated
Deduction
(Miami, USA),
to appear,
Lecture Notes in Artificial Intelligence,
Springer-Verlag.
- HJL99
- Hillenbrand T., Jaeger A., Löchner B. (1999),
Waldmeister - Improvements in Performance and Ease of Use,
Ganzinger H.,
Proceedings of the 16th International Conference on Automated
Deduction
(Trento, Italy),
pp.232-236,
Lecture Notes in Artificial Intelligence 1632,
Springer-Verlag.
- HL02
- Hillenbrand T., Löchner B. (2002),
The Next Waldmeister Loop,
Voronkov A.,
Proceedings of the 18th International Conference on Automated
Deduction
(Copenhagen, Denmark),
pp.486-500,
Lecture Notes in Artificial Intelligence 2392,
Springer-Verlag.
- Hil03
- Hillenbrand T. (2003),
Citius altius fortius: Lessons Learned from the Theorem Prover
Waldmeister,
Dahn I., Vigneron L.,
Proceedings of the 4th International Workshop on First-Order Theorem
Proving
(Valencia, Spain),
Electronic Notes in Theoretical Computer Science 86.1,
Elsevier Science.