Bliksem 1.12a
Hans de Nivelle
Max Planck Institut für Informatik, Germany
nivelle@mpi-sb.mpg.de
Architecture
Bliksem is a theorem prover implementing resolution and paramodulation.
It implements many different strategies, including several orders that
define decision procedures.
It does not implement basicness or constraints.
The automatic selection of strategies is poor, which partially explains the
bad performance.
A special feature of Bliksem is that it is able to output total proofs
that can be easily verified [BHN00].
Additional programs that can check the proofs, or convert them into
Coq-format [COQ02], are available from the
home page.
Bliksem 1.12a is a reduced version of Bliksem 1.12. It was planned to replace the CNF-transformer by a Java program that is able to generate correctness proofs. In order to prepare for this, the CNF-transformer of Bliksem has been reduced to a minimum. Unfortunately, the new CNF-transformer will not be completed before the competition.
http://www.mpi-sb.mpg.de/~bliksem
CiME 2
Evelyne Contejean, Benjamin Monate
LRI, Universite Paris-Sud, France
{contejea,monate}@lri.fr
Architecture
CiME is intended to be a toolkit, which contains nowadays the following
features:
The ordered completion of term rewriting systems
[CM96] will be used during
the competion to attempt to solve unification problems, that is
problems in the UEQ division.
http://cime.lri.fr/as binaries for SPARC workstations running Solaris (at least version 2.6) and for pentium PCs running Linux, and its sources are available by read-only anynomous CVS.
The new DCTP 1.2 features several methods of builtin equality treatment [LS02], based on ordered paramodulation and on a variant of lazy root paramodulation that is also featured in the current version of SETHEO. Also, a formula preprocessor has been integrated making use of clause factoring, full subsumption, demodulation and a number of other techniques.
DCTP 10.1p is a strategy parallel version using the technology of E-SETHEO [SW99] to combine several different strategies based on DCTP 1.2.
E 0.7 and EP 0.7
Stephan Schulz
Technische Universität München, Germany, and
Safelogic A.B., Sweden
schulz@informatik.tu-muenchen.de
Architecture
E 0.7 [Sch01,Sch02]
is a purely equational theorem prover.
The calculus used by E combines superposition (with selection of negative
literals) and rewriting.
No special rules for non-equational literals have been implemented,
i.e. resolution is simulated via paramodulation and equality resolution.
E also implements AC redundancy elimination and AC simplification for
dynamically recognized associative and commutative equational theories,
as well as pseudo-splitting for clauses.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of preprogrammed literal selection strategies, many of which are only experimental. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Supported term orderings are several parameterized instances of Knuth-Bendix Ordering (KBO) and Lexicographic Path Ordering (LPO).
An automatic mode can select literal selection strategy, term ordering (different versions of KBO and LPO), and search heuristic based on simple problem characteristics.
EP 0.7 is just a combination of E 0.7 in verbose mode and a proof analysis tool extracting the used inference steps.
The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, and various versions of Linux. Sources of the latest released version and a current snapshot are available freely from:
http://wwwjessen.informatik.tu-muenchen.de/~schulz/WORK/eprover.htmlEP 0.7 is a simple Bourne shell script calling E and the postprocessor in a pipeline.
E distinguishes four primary problem classes: Problems where all non-negative clauses are unit clauses, where all non-negative clauses are Horn clauses (and at least one is not a unit), and non-Horn problems without and with equality. For each of these classes a separate set of candidate heuristics is created and run over all problems in this class. To generate the automatic mode for any of the four primary classes, the class is partitioned into subclasses defined by (some) of the following nine features:
For each non-empty class, we assign the most general candidate heuristic that solves the same number of problems on this class as the best heuristic on this class does. Typically, most selected heuristics are assigned to more than one class. As an example, the current intermediate version uses a total of eight heuristics to cover all 532 problems with unit axioms.
EP 0.7 will be hampered by the fact that it has to analyse the
inference step listing, an operation that typically is about as
expensive as the proof search itself.
Nevertheless, it should be competitive among the MIX* systems.
Since version 99csp of E-SETHEO, the different strategies
are run sequentially, one after the other, depending on the allocation
of resources to the different strategies, so-called schedules,
which have been computed from experimental data using machine learning
techniques as described in [SW99].
Schedule selection depends on syntactic characteristics of the input formula.
E-SETHEO csp01 incorporates the disconnection prover DCTP
[LS01] as a new strategy as well as a
new version of the E prover.
The program runs under Solaris 2.6. Sources are available freely.
Since version 99csp of E-SETHEO, the different strategies are run sequentially,
one after the other.
E-SETHEO csp02 incorporates the new version of the disconnection prover DCTP
with integrated equality handling as a new strategy as well as a new version
of the E prover.
The new Scheme version of SETHEO that is in use features local unit failure
caching [LS01] and lazy root paramodulation,
an optimisation of lazy paramodulation which is complete in the Horn case
[LS02].
The program runs under Solaris and, with a little luck, under Linux,
too. Sources are available from the authors.
References
E-SETHEO csp01
Reinhold Letz, Stephan Schulz, Gernot Stenz
Technische Universität München,
Germany
{letz,schulz,stenz}@informatik.tu-muenchen.de
Architecture
E-SETHEO is a compositional theorem prover for formulae in first order
clause logic, combining the systems E [Sch99]
and SETHEO [MI+97].
It incorporates different calculi and proof procedures
like superposition, model elimination and semantic trees (the DPLL
procedure).
Furthermore, the system contains transformation techniques
which may split the formula into independent subparts or which may
perform a ground instantiation.
Finally, advanced methods for controlling and optimizing the combination
of the subsystems are applied.
E-SETHEO additionally includes the program Flotter
[WGR96] as a
preprocessing module for transforming non-clausal formulae to clausal form.
Implementation
According to the diversity of the contained systems, the modules of
E-SETHEO are implemented in different programming languages like C,
Prolog, Scheme, and Shell tools.
References
E-SETHEO csp02
Reinhold Letz, Stephan Schulz, Gernot Stenz
Technische Universität München,
Germany
{letz,schulz,stenz}@informatik.tu-muenchen.de
Architecture
E-SETHEO is a compositional theorem prover for formulae in first order
clause logic, combining the systems E [Sch01],
DCTP [Ste02] and SETHEO
[MI+97].
It incorporates different calculi and proof procedures like superposition,
model elimination and semantic trees (the DPLL procedure).
Furthermore, the system contains transformation techniques which may split
the formula into independent subparts or which may perform a ground
instantiation.
Finally, advanced methods for controlling and optimizing the combination
of the subsystems are applied.
The first-order variant of E-SETHEO no longer uses Flotter
[WGR96] as a preprocessing module for
transforming non-clausal formulae to clausal form.
Instead, a more primitive normal form transformation is employed.
Implementation
According to the diversity of the contained systems, the modules of
E-SETHEO are implemented in different programming languages like C,
Prolog, Scheme, and Shell tools.
Strategies
Individual strategies are started be E-SETHEO depending on the allocation
of resources to the different strategies, so-called schedules,
which have been computed from experimental data using machine learning
techniques as described in [SW99].
Schedule selection depends on syntactic characteristics of the input formula
such as the Horn-ness of formulae, whether a problem contains equality
literals or whether the formula is in the Bernays-Schönfinkel class.
The problem classes are more or less identical with the sub-classes of the
competition.
We have no idea whether or not this conflicts with the organisers' tuning
restrictions.
Expected Competition Performance
We expect E-SETHEO to perform well in all categories it participates in.
References
EXLOG 2
Ivan Kossey
Uzhgorod State University,
Ukraine
Ivan.Kossey@t-online.de
Architecture
EXLOG, developed since 1999, is a theorem prover in first order logic.
The basic features are binary resolution with factors
[Kos82], forward and
backward subsumption, unit preference, a weak variant of set of support,
and exclusion of predicates with satisfiable propositional abstraction.
Features for equality are rewrite rules (created from unit equality clauses
with ordering of terms on length/alphabetical), and commutative unification.
A somewhat modified Knuth-Bendix algorithm is used for equations with
commutative unification for symmetric predicates and commutative functions.
Paramodulation and AC-unification are not implemented.
EXLOG accepts FOF, clausal TPTP format and his own input format.
Implementation
Assembler (TASM32) under Windows32. EXLOG runs on PC under Win95, Win98,
WinNT, WinME, Win2000, WinXP. Developed and tested under Win2000.
The subsumption algorithm [Kos84]
and its implementation seems to be fast.
Gandalf c-2.5
Tanel Tammet
Tallinn Technical University, Estonia
Safelogic AB, Sweden
tammet@cc.ttu.ee
Architecture
Gandalf
[Tam97,Tam98]
is a family of automated theorem provers, including classical, type theory,
intuitionistic and linear logic provers, plus finite a model builder.
The version c-2.5 contains the classical logic prover for clause
form input and the finite model builder.
One distinguishing feature of Gandalf is that it contains a large
number of different search strategies and is capable of automatically
selecting suitable strategies and experimenting with these strategies.
Gandalf is available under GPL. There exists a separate commercial version of Gandalf, called G, developed and distributed by Safelogic AB (www.safelogic.se), which contains numerous additions, strategies, and optimisations, aimed specifically at verification of large systems.
The finite model building component of Gandalf uses the Zchaff propositional logic solver by L.Zhang [MM+01] as an external program called by Gandalf. Zchaff is not free, although it can be used freely for research purposes. Gandalf is not optimised for Zchaff or linked together with it: Zchaff can be freely replaced by other satisfiability checkers.
Gandalf has been tested on Linux, Solaris and MS Windows under Cygwin.
The propositional satisifiability checker Zchaff used by Gandalf during finite model building is implemented by L.Zhang in C++.
Gandalf should be publicly available at:
http://www.ttu.ee/it/gandalf
In the normal mode Gandalf attempts to find only unsatisfiability. It has to be called with a -sat flag to find satisfiability. Gandalf selects the strategy list according to the following criteria:
Each strategy may be iterated over a limit on term depth. For clause sets containing equality, some strategies are tried with both the Knuth-Bendix ordering and recursive path ordering, as well as with several different ordering principles of function symbols for these orderings.
Typically Gandalf selects one or two strategies to iterate over the term depth limit and one or two strategies to iterate over the selection of equality orderings. At the second half of each strategy run Gandalf imposes additional restrictions, like introducing unit restriction and switching over to strict best-first clause selection.
The strategy list selection criteria for a particular problem is based on following:
In the SAT division, Gandalf version 2.5 should perform significantly better than older versions of Gandalf or any other satisfiability checker that participated at CASC during 2001. However, no prediction can be made about the relative performance at 2002.
In the EPR division, Gandalf now uses suitable strategies, differently from
Gandalf in 2001, and is expected to be among the better provers in this class.
Ordinary Gandalf is a resolution prover which implements a number of
different basic strategies: binary ordered resolution for several orderings,
versions of set-of-support resolution, binary unit resolution, hyperresolution.
Enhancements are used for cutting off literals and combining different
strategies, in particular forward and backward reasoning, into one run.
Equality is handled by ordered paramodulation and demodulation.
GandalfSat uses time-slicing similarly to ordinary Gandalf.
It attempts hyperresolution, ordered resolution based on
[FL+93] results, and two kinds of
finite model building algorithms (Falcon-like and MACE-like) for attempting
to show satisfiability.
Some relevant publications are:
[FL+93,
Tam97,
TS98].
For details about ZChaff, see [MM+01].
Despite the optimizations in eground, there are problems where
eground runs out of time or memory.
In these cases eground outputs an incomplete ground clause set.
If eground has output an incomplete clause set and ZChaff reports
that it is satisfiable, then no result is reported by GrAnDe.
If eground has generated an incomplete clause set and ZChaff
reports that it is unsatisfiable, or if eground has generated a
complete clause set, then ZChaff's result is reported by GrAnDe as the overall
result.
References
GandalfSat 1.0
Tanel Tammet
Tallinn Technical University, Estonia
tammet@cc.ttu.ee
Architecture
GandalfSat is a special version of the
Gandalf
theorem prover, optimised for satisifiability checking.
Essentially it contains the same code as Gandalf, but uses different
default search strategies to Gandalf.
Implementation
Gandalf is implemented in Scheme, using the scm interpreter developed
by A. Jaffer and the Scheme-to-C compiler Hobbit developed by T. Tammet
for the scm system.
References
GrAnDe 1.1
Geoff Sutcliffe1, Stephan Schulz2
1University of Miami, USA,
2Technische Universität München, Germany
geoff@cs.miami.edu, schulz@informatik.tu-muenchen.de
Architecture
GrAnDe (Ground And Decide)
[SS02] is a decision procedure for
CNF problems with a finite Herbrand universe.
GrAnDe uses the grounding procedure eground
[Sch02] to generate ground instances of
such a problem, and the propositional decision system ZChaff
[MM+01] to determine the satisfiability of
the resulting propositional problem.
The set of all ground instances of a set of clauses is often too large
to compute with reasonable resources.
Therefore eground tries to find a smaller set of ground instances
that is still equiconsistent with the original set of clauses.
Three techniques are combined: clause splitting, structural constraints on
variable instantiation, and propositional simplification.
Clause splitting separates a clause into variable disjoint parts, and
replaces the orginal clause by new clauses formed from the parts and
link literals.
Structural constraints are used to prohibit variable instantiations that
would lead to the generation of pure literals in the resulting ground clauses.
Propositional simplification removes clauses and literals from the generated
ground clause set, while maintaining satisfiability.
Implementation
eground is implemented in ANSI-C and based on the E
[Sch01] libraries.
It uses the basic infrastructure up to the shared term representation,
and adds a new, compact clause data type for the generated propositional
clauses.
Propositional unit-clauses, used for simplification, are represented by
a simple array entry, making unit subsumption and unit simplification into
O(1) operations (in the number of unit clauses).
Clause splitting is implemented naively, using a flood-fill algorithm to
find connected sub-clauses.
Finally, variable constraints are directly computed in conjunctive normal
form, and are used to directly construct possible instantiations.
For details about ZChaff, see [MM+01].
A Perl script called And is used to combine eground and ZChaff. The script invokes eground on the EPR problem, allowing it maximally 66% of the CPU time limit. The propositional clauses written by eground are captured into a temporary file, which is used as the input for ZChaff. ZChaff is allowed whatever CPU time has not been used by eground.
GrAnDe, including the separate components, is available from:
http://www.cs.miami.edu/~tptp/ATPSystems/GrAnDe/It should be widely portable between recent UNIX variants.
ICGNS 2002
William McCune
Argonne National Laboratory, USA
mccune@mcs.anl.gov
Architecture
ICGNS 2002 [McC02] searches for finite models
of first-order (unsorted, with equality) statements.
Given input clauses, it generates ground instances over a finite domain,
then uses a decision procedure try to determine satisfiability.
If there is no model of that size, it increments the domain size and
tries again.
The ICGNS method is SEM-like [ZZ95] rather than MACE-like [McC01]; that is, the ground problem is propositional/equality rather than flattened and purely propositional. The two main parts of the ICGNS method are (1) selecting the next empty cell in the tables of functions being constructed and deciding which values need to be considered for that cell, and (2) propagating assignments. ICGNS uses the basic least number heuristic (LNH) to reduce isomorphism. The LNH was introduced in Falcon [Zha96] and is also used in SEM. Effective use of the LNH requires careful cell selection. Propagation is by ground rewriting and inference rules to derive negated equalities.
http://www.mcs.anl.gov/home/mccune/icgns/
Otter 3.2
William McCune
Argonne National Laboratory, USA
mccune@mcs.anl.gov
Architecture
Otter 3.2 [McC94] is an ATP system for
statements in first-order (unsorted) logic with equality.
Otter is based on resolution and paramodulation applied to clauses.
An Otter search uses the "given clause algorithm", and typically involves
a large database of clauses; subsumption and demodulation play an important
role.
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and Rusty Lusk contributed to the development of Otter.
SCOTT 6.1
John Slaney
Australian National University, Australia
John.Slaney@anu.edu.au
Architecture
SCOTT [HS01,
HS02] consists of two main components:
a saturation-based theorem prover (Otter [McC02]
in fact) and a finite domain constraint solver
(FINDER [Sla94]) that generates models for
subsets of the clauses deduced.
The models are used to guide selection of given clauses, clauses which are
false in the guiding models being given preference.
Because model generation is computationally expensive, the critical issue
for performance is the tradeoff between search quality due to semantic
guidance and the time used in securing and refining it.
Successive versions of SCOTT have competed regularly in CASC. Version 6.1 is little changed from version 6.0 which competed in 2001, except for some amendments aimed at reducing the number of times a useless model is generated, deleted and re-generated, and slight tuning of the default values for parameters.
http://arp.anu.edu.au/software/scott/
Compared to Vampire 1.0 that participated in the previous competition, this version has many more literal selection functions, more flexible splitting without backtracking and improved memory management. The automatic mode of Vampire 2.0 is primitive, it recognises only very simple syntactic properties of the input problem, like the presence of equality or non-Horn clauses. In the preprocessing stage it exploits a number of primitive techniques, such as elimination of simple predicate and function definitions.
Vampire 2.0-CASC is a version based on Vampire 2.0, specialised to win the competition. To adjust the strategy it estimates the problem by checking some syntactic properties, such as presence of multiliteral, non-Horn and ground clauses, equations and nonequational literals. Additionally we consider such quantitative characteristics as the number of axioms, literals, small and large terms.
http://www.cs.man.ac.uk/~riazanoa/Vampire
Vampire 5.0 and Vampire 5.0-CASC
Alexandre Riazanov, Andrei Voronkov
University of Manchester, England
{riazanoa,voronkov}@cs.man.ac.uk
Architecture
Vampire [RV01,
RV02] 5.0 is an automatic theorem prover
for first-order classical logic.
Its kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
The splitting rule is simulated by introducing new predicate symbols.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The only term ordering used in Vampire at the moment is a special
non-recursive version of the Knuth-Bendix ordering which allows
efficient approximation algorithms for solving ordering constraints.
By the system installation deadline we may implement the standard
Knuth-Bendix ordering.
A number of efficient indexing techniques are used to implement all
major operations on sets of terms and clauses.
Although the kernel of the system works only with clausal normal
forms, the preprocessor component accepts a problem in the full
first-order logic syntax, clausifies them and performs a number of
useful transformations before passing the result to the kernel.
http://www.cs.man.ac.uk/~riazanoa/Vampire/
The automatic mode of Vampire 5.0 is primitive. Seven problem classes are distinguished corresponding to the competition divisions HNE, HEQ, NNE, NEQ, PEQ, UEQ and EPR. Every class is assigned a fixed schedule consisting of a number of kernel strategies called one by one with different time limits. The automatic mode of Vampire 5.0-CASC is better tuned towards TPTP 2.4.0. It uses the same problem characteristics as E 0.7 to divide the problems into a large number of classes and may potentially assign a separate strategy to every such class.
Since last year's version, stronger redundancy criteria have been integrated, including ground joinability tests with ordering constraints on variables [AHL00]. In several problem domains this technique is helpful especially for harder proof tasks. Some restructuring of the prover is on the way to also include an implementation of confluence trees with full ordering constraints [AL01]; but further work is necessary to have them speed up the proof search.
http://www-avenhaus.informatik.uni-kl.de/waldmeister
Waldmeister 702
Thomas Hillenbrand1, Bernd Löchner2
1Max-Planck-Institut für Informatik Saarbrücken, Germany,
2Universität Kaiserslautern, Germany
waldmeister@informatik.uni-kl.de
Architecture
Waldmeister 702 is an implementation of unfailing Knuth-Bendix
completion [BDP89] with extensions
towards ordered completion (see [AHL00])
and basicness [BG+92,
NR92].
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a
heuristic assessment of passive facts.
Only recently, we have designed a thorough refinement of the system architecture concerning the representation of passive facts [HL02]. The aim of that work - the next Waldmeister loop - is, besides gaining more structural clarity, to cut down memory consumption especially for long-lasting proof attempts, and hence less relevant in the CASC setting.
http://www-avenhaus.informatik.uni-kl.de/waldmeister