Darwin 1.4.4
Peter Baumgartner1,
Alexander Fuchs2,
Cesare Tinelli2
1NICTA, Australia,
2The University of Iowa, USA
Architecture
Darwin [BFT04,BFT06]
is an automated theorem prover for first order clausal logic.
It is an implementation of the Model Evolution calculus
[BT03].
The Model Evolution calculus lifts the propositional DPLL procedure to
first-order logic.
One of the main motivations for this approach was the possibility of
migrating to the first-order level some of those very effective search
techniques developed by the SAT community for the DPLL procedure.
The current version of Darwin implements first-order versions of unit propagation inference rules analogously to a restricted form of unit resolution and subsumption by unit clauses. To retain completeness, it includes a first-order version of the (binary) propositional splitting inference rule.
Proof search in Darwin starts with a default interpretation for a given clause set, which is evolved towards a model or until a refutation is found.
Darwin is implemented in OCaml and has been tested under various Linux distributions. It is available from
http://goedel.cs.uiowa.edu/Darwin/Changes to the previous version consist of minor bug fixes.
E 1.1pre and EP 1.1pre
Stephan Schulz
Technische Universität München, Germany,
Architecture
E 1.1pre [Sch02,Sch04a] is
a purely equational theorem prover.
The core proof procedure operates on formulas in clause normal form, using
a calculus that combines superposition (with selection of negative literals)
and rewriting.
No special rules for non-equational literals have been implemented, i.e.,
resolution is simulated via paramodulation and equality resolution.
The basic calculus is extended with rules for AC redundancy elimination, some
contextual simplification, and pseudo-splitting with definition caching.
The latest versions of E also supports simultaneous paramodulation, either
for all inferences or for selected inferences.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of preprogrammed literal selection strategies, many of which are only experimental. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Supported term orderings are several parameterized instances of Knuth-Bendix Ordering (KBO) and Lexicographic Path Ordering (LPO).
The prover uses a preprocessing step to convert formulas in full first order format to clause normal form. This step may introduce (first-order) definitions to avoid an exponential growth of the formula. Preprocessing also unfolds equational definitions and performs some simplifications on the clause level.
The automatic mode determines literal selection strategy, term ordering, and search heuristic based on simple problem characteristics of the preprocessed clausal problem.
EP 1.1pre is just a combination of E 1.1pre in verbose mode and a proof analysis tool extracting the used inference steps.
E distinguishes problem classes based on a number of features, all of which have between 2 and 4 possible values. The most important ones are:
For the LTB part of the competition, E will use a relevancy-based pruning approach and attempt to solve the problems with successively more complete specifications until it suceeds or runs out of time.
The program has been successfully installed under SunOS 4.3.x, Solaris 2.x, HP-UX B 10.20, MacOS-X, and various versions of Linux. Sources of the latest released version are available freely from:
http://www.eprover.orgEP 1.1pre is a simple Bourne shell script calling E and the postprocessor in a pipeline.
EP 1.1p will be hampered by the fact that it has to analyse the inference step listing, an operation that typically is about as expensive as the proof search itself. Nevertheless, it should be competitive among the FOF and LTB proof class systems.
E-Darwin 1.2
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-Darwin is an automated theorem prover for first order clausal logic with
equality. It is a modified version of the Darwin prover
[BFT04], and it implements the Model Evolution
with Equality calculus [BT05].
The general operation of the original Darwin has been extended by inferences
for handling equality, specifically a paramodulation and a reflexivity
inference.
Whereas the original Darwin derived units only, the new inferences in
E-Darwin allow the derivation of multi-literal clauses.
A more extensive detection and handling of redundancy has been added as well.
E-Darwin is implemented in the functional/imperative language OCaml. The system has been tested on Unix and is available under the GNU Public License from the E-Darwin website at:
http://www.uni-koblenz.de/~bpelzer/edarwin
E-KRHyper 1.1.3
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-KRHyper [PW07] is a theorem proving and model
generation system for first-order logic with equality.
It is an implementation of the E-hyper tableau calculus
[BFP07], which integrates a superposition-based
handling of equality [BG98] into the hyper tableau
calculus [BFN96].
The system is an extension of the KRHyper theorem prover
[Wer03], which implements the original hyper
tableau calculus.
An E-hyper tableau is a tree whose nodes are labeled with clauses and which is built up by the application of the inference rules of the E-hyper tableau calculus. The calculus rules are designed such that most of the reasoning is performed using positive unit clauses. A branch can be extended with new clauses that have been derived from the clauses of that branch.
A positive disjunction can be used to split a branch, creating a new branch for each disjunct. No variables may be shared between branches, and if a case-split creates branches with shared variables, then these are immediately substituted by ground terms. The grounding substitution is arbitrary as long as the terms in its range are irreducible: the branch being split may not contain a positive equational unit which can simplify a substituting term, i.e. rewrite it with one that is smaller according to a reduction ordering. When multiple irreducible substitutions are possible, each of them must be applied in consecutive splittings in order to preserve completeness.
Redundancy rules allow the detection and removal of clauses that are redundant with respect to a branch.
The hyper extension inference from the original hyper tableau calculus is equivalent to a series of E-hyper tableau calculus inference applications. Therefore the implementation of the hyper extension in KRHyper by a variant of semi-naive evaluation [Ull89] is retained in E-KRHyper, where it serves as a shortcut inference for the resolution of non-equational literals.
The system runs on Unix and MS-Windows platforms and is available under the GNU Public License from the E-KRHyper website at:
http://www.uni-koblenz.de/~bpelzer/ekrhyper
This is the first version of Equinox that is complete.
For the LTB division a small script partitions
the time available to the given problem set and runs
iProver with the allocated time limit, and
time is repartitioned after each solved problem.
iProver is available at:
For the LTB division, iProver-SInE runs iProver as the underlying inference
engine of SInE 0.4, i.e., axiom selection is done
by SInE, and proof attempts are done by iProver.
The first-order reasoning in iProver-Eq consists of three components, each
based on a given clause algorithm: selection of clauses for instantiation,
unit paramodulation on selected literals and resolution for simplifications.
For non-equational input, the system behaves almost identically to
the iProver system, employing the same calculus and simplifications.
When equational literals are selected in clauses, a separate calculus based
on ordered paramodulation and demodulation is used to find conflicts between
selected literals.
Literals participating in a conflict give rise to instantiations of the
clauses they are selected in and these instances are in turn passed to the
ground solver.
The ground solver is required to reason modulo equality and we have
integrated the CVC3 solver [BT07] for this task.
Although Isabelle is designed for interactive proof development,
it is a little known fact that it is possible to run Isabelle from
the command line, passing in a theory file with a formula to solve.
Isabelle theory files can include Standard ML code to be executed when the
file is processed.
The TPTP2X Isabelle format module outputs a THF problem in Isabelle/HOL
syntax, augmented with ML code that (1) runs the nine tactics in sequence,
each with a CPU time limit, until one succeeds or all fail, and
(2) reports the result and proof (if found) using the SZS standards.
A Perl script is used to insert the CPU time limit (equally divided
over the nine tactics) into TPTP2X's Isabelle format output, and then
run the command line isabelle-process on the resulting theory file.
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatypes. Assuming the kernel is correct, all values of type theorem
are correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
A notable exception is the metis proof method, which was taken from
the HOL4 theorem prover (also implemented in ML).
Isabelle/HOL is available for all major platforms under a BSD-style license
from
For the LTB division, leanCoP-SInE runs leanCoP as the underlying inference
engine of SInE 0.4, i.e., axiom selection is done
by SInE, and proof attempts are done by leanCoP.
The calculus currently employed by LEO-II is best characterized as a
strategic refinement of extensional higher-order RUE resolution
[Ben99].
Novel aspects in LEO-II version 1.0 over previous versions include
splitting of the conjecture in several subproblems (if appropriate),
improved clause normalization and improved strategies for the basic
fragment of simple type theory [BS09] (all but
one of the examples that were characterized as challenging for higher-order
provers in [BS09] are now provable in less than 0.15
seconds in LEO-II).
LEO-II offers also an interactive mode and, most importantly, it is
capable of generating proof objects in TSTP format.
Equinox 4.1
Koen Claessen
Chalmers University of Technology, Sweden
Architecture
Equinox is an experimental theorem prover for pure first-order logic
with equality. It finds ground proofs of the input theory, by solving
successive ground instantiations of the theory using an incremental
SAT-solver. Equinox is based on a "stack" of increasingly powerful
logics, starting at propositional logic all the way up the first-order
logic with equality. The implementation consists of a corresponding
stack of theorem provers.
Strategies
There is only one strategy in Equinox:
Implementation
The main part of Equinox is implemented in Haskell using the GHC
compiler. Equinox also has a built-in incremental SAT solver (MiniSat)
which is written in C++. The two parts are linked together on the
object level using Haskell's Foreign Function Interface.
Expected Competition Performance
Equinox should perform better than previous years. There should be
problems that it can solve that few other provers can handle.
Infinox 1.0
Ann Lillieström, Koen Claessen
Chalmers University of Technology, Sweden
Architecture
Infinox is a tool that (for some problems) can show the absence of
finite models. It works by showing the existence of functions or
relations with certain properties that imply infinity, such as:
All methods in Infinox are based on one of these three principles.
Strategies
Infinox runs a number of fixed methods in sequence, and terminates
if one of them succeeds.
Implementation
Infinox is implemented in Haskell using the GHC compiler, and it uses
Stephan Schulz' E-prover as a back-end.
Expected Competition Performance
Infinox participates in only the demonstration division.
iProver 0.5
Konstantin Korovin
The University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover which is based on an instantiation
calculus Inst-Gen [GK03,
Kor08a], complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat for reasoning with
ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution, see
[Kor08b] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
We use non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented:
blocking non-proper instantiations; dismatching constraints
[GK04,Kor08b];
global subsumption [Kor08b]; resolution-based
simplifications and propositional-based simplifications.
Equality is dealt with (internally) by adding the necessary axioms of equality.
iProver has a satisfiablity mode which includes a finite model finder,
based on an adaptation of ideas from DarwinFM and Paradox to the iProver
setting.
Strategies
iProver v0.5 has around 40 options to control the proof
search including options for literal selections,
passive clause selections, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At CASC iProver will execute a fixed schedule of selected options.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats.
In the case of FOF format, either Vampire or E prover is used for
clausification.
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 0.5 is the CASC-J4 EPR division winner.
iProver 0.7 and iProver-SInE 0.7
Konstantin Korovin
The University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor08a]
which is complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat
[ES04] for reasoning with ground abstractions
of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution, see
[Kor08b] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
We use non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints
[GK04,Kor08b];
global subsumption [Kor08b]; resolution-based
simplifications and propositional-based simplifications.
We implemented a compressed feature vector index
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality.
Strategies
iProver has around 40 options to control the proof
search including options for literal selections,
passive clause selections, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational and maximal term depth.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats, in the case of FOF format,
E prover is used for clausification.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
We expect strong performance in the EPR division
and reasonably good performance in FOF, CNF, FNT, SAT and LTB divisions.
iProver-Eq 0.5
Konstantin Korovin, Christoph Sticksel
The University of Manchester, United Kingdom
Architecture
iProver-Eq extends the iProver system
[Kor08a] with built-in equational reasoning, along
the lines of [GK04].
As in the iProver system, first-order reasoning is combined with ground
satisfiability checking where the latter is delegated to an off-the-shelf
ground solver.
The main differences with iProver are that the ground reasoning is done modulo
the theory of equality and the first-order reasoning is done by a variant
of ordered paramodulation calculus.
Strategies
The most influential options on the proof search in iProver-Eq control literal
selection and passive clause selection in the individual given clause
algorithms as well as the global distribution of time between the three
parts and the ground solver.
At CASC, iProver-Eq will execute a fixed schedule of selected options as
well as simple heuristics for the initial distribution between the equational
and non-equational reasoning.
Implementation
iProver-Eq is implemented in OCaml.
For the ground reasoning iProver-Eq uses CVC3 in the equational case and
MiniSat in the non-equational case.
iProver-Eq accepts FOF and CNF formats. In the case of FOF format,
the E prover is used for clausification.
Expected Competition Performance
iProver-Eq is still work in progress and has not yet been evaluated.
We hope iProver-Eq will perform reasonably well in FOF and CNF divisions,
in particular improving iProver's performance on equational problems.
Isabelle/HOL 2009
Jasmin Blanchette1,
Lawrence C. Paulson2,
Tobias Nipkow1,
Makarius Wenzel1,
Stefan Berghofer1
1Technische Universität München, Germany,
2University of Cambridge, United Kingdom
Architecture
Isabelle/HOL 2009 [NPW02] is the higher-order
logic incarnation of the generic proof assistant
Isabelle2009.
While designed for interactive proof development, Isabelle/HOL also provides
several automatic proof tactics, notably
an equational reasoner [Nip89],
a classical reasoner [PN94],
a tableau prover [Pau99],
and a first-order resolution-based prover [Hur03].
Strategies
The Isabelle/HOL wrapper submitted to the competition simply tries the following
tactics sequentially:
Implementation
Isabelle is a generic theorem prover written in Standard ML. Its meta-logic,
Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The
HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers. Other object
logics are available, notably FOL (first-order logic) and ZF
(Zermelo–Fraenkel set theory).
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Early results with Isabelle/HOL 2008 suggest that Isabelle will finish third
in the THF category, immediately after TPS.
However, in the meantime, two unsound proof modes have been disabled in TPS.
This could be enough to provide us with a second place.
leanCoP 2.1 and leanCoP-SInE 2.1
Jens Otten
University of Potsdam, Germany
Architecture
leanCoP [OB03,Ott08a]
is an automated theorem prover for classical first-order logic.
It is a very compact implementation of the connection (tableau) calculus
[Bib87,LS01].
Strategies
The reduction rule of the connection calculus is applied before the extension
rule.
Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is used to achieve completeness.
leanCoP 2.1 includes all the additional strategies and search techniques
of leanCoP 2.0, i.e., regularity, lemmata, restricted backtracking
[Ott08b], and a fixed strategy scheduling.
Implementation
leanCoP is implemented in Prolog.
The source code of the core prover is only a few lines long and fits on
half a page.
Prolog's built-in indexing mechanism is used to quickly find connections.
While the core prover and the definitional clausal form transformation
from leanCoP 2.0 have not changed, the following extensions and improvements
have been integrated into leanCoP 2.1:
The source code of leanCoP 2.1, which is available under the GNU general
public license, together with more information can be found on the
leanCoP web site at:
http://www.leancop.de
Expected Competition Performance
We expect the performance of leanCoP 2.1 to be similar to that of leanCoP 2.0.
On problems with many axioms, like those in the LTB division of CASC,
the combined leanCoP-SInE prover should perform significantly
better than leanCoP alone.
LEO-II 1.0
Christoph Benzmüller1,2, Frank Theiss1
1International University of Germany, Germany,
2Articulate Software, USA
Architecture
The higher-order theorem prover LEO-II is the successor of LEO
[BK98], which was developed as part of the
Saarbruecken OMEGA system [SBA06].
The development of the independent LEO-II prover started in 2006 at
Cambridge University, UK, in a joint project with Larry Paulson that
was funded by the EPSRC.
Previous versions of LEO-II have been described in the literature
[BP+07,BP+08].
Strategies
LEO-II is designed to cooperate with specialist systems for fragments
of higher-order logic. The idea is to combine the strengths of the
different systems: LEO-II predominantly addresses higher-order aspects
in its reasoning process, with the aim of quickly removing
higher-order clauses from the search space, and turning them into
first-order clauses that can be refuted with a first-order ATP
system. Currently, LEO-II is capable of cooperating with the
first-order ATP systems E, SPASS, and Vampire. By default LEO-II
cooperates with E.
Implementation
LEO-II is implemented in Objective Caml version 3.10 and its problem
representation language is the new TPTP THF language
[BRS08].
The improved performance of LEO-II in comparison to its predecessor LEO (implemented in LISP) is due to several novel features including the exploitation of term sharing and term indexing techniques [TB06], support for primitive equality reasoning (extensional higher-order RUE resolution), and improved heuristics at the calculus level.
Unfortunately the LEO-II system currently uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license and it is available online at:
http://leoprover.org
Matita 1.0
The HELM Team and contributors
University of Bologna, Italy
Architecture
Matita (0.5.7) is an interactive theorem prover including a fully automatic
module (prover) for problems falling in the first order unit equality case.
An overall description of the system and its architecture can be found
in [AC+07], but there is no paper devoted
to the automatic prover built in the system.
The ATP system registered to the CASC competition will be part of the upcoming 1.0 release of the Matita interactive theorem prover, and is thus not part of any officially released versions, but can be downloaded using the SVN repository at:
http://matita.cs.unibo.it
Metis 2.2
Joe Hurd
Galois, Inc., USA
Architecture
Metis 2.2 [Hur03] is a proof tactic used in the
HOL4 interactive theorem prover.
It works by converting a higher order logic goal to a set of clauses in
first order logic, with the property that a refutation of the clause set
can be translated to a higher order logic proof of the original goal.
Experiments with various first order calculi [Hur03] have shown a given clause algorithm and ordered resolution to best suit this application, and that is what Metis 2.2 implements. Since equality often appears in interactive theorem prover goals, Metis 2.2 also implements the ordered paramodulation calculus.
In addition to standard age and size measures, Metis 2.2 uses finite models to weight clauses in the Passive set. When integrated with higher order logic, a finite model is manually constructed to interpret standard functions and relations in such a way as to make many standard axioms true and negated goals false. Non-standard functions and relations are interpreted randomly, but with a bias towards making negated goals false. Since it is part of the CASC competition rules that standard functions and relations are obfuscated, Metis 2.2 will back off to the biased random interpretation of all functions and relations (except equality), using a finite model with 4 elements.
Metis 2.2 reads problems in TPTP format and outputs detailed proofs in TSTP format, where each refutation step is one of 6 simple inference rules. Metis 2.2 implements a complete calculus, so when the set of clauses is saturated it can soundly declare the input problem to be unprovable (and outputs the saturation set).
Metis 2.2 is free software, released under the GPL. It can be downloaded from
http://www.gilith.com/software/metis
OSHL-S 0.6
Hao Xu, David Plaisted
University of North Carolina at Chapel Hill, USA
Architecture
OSHL-S is a theorem prover based on the architecture and strategy
introduced in [PZ00] with a few improvements.
Type inference is
employed to reduce the number of instances that are generated before
a contradicting instance is found.
http://www.cs.unc.edu/~xuh/oshls
Otter 3.3
William McCune
Argonne National Laboratory, USA
Architecture
Otter 3.3 [McC03a] is an ATP system for
statements in first-order (unsorted) logic with equality.
Otter is based on resolution and paramodulation applied to clauses.
An Otter search uses the "given clause algorithm", and typically involves
a large database of clauses; subsumption and demodulation play an important
role.
http://www-unix.mcs.anl.gov/AR/otter/
Acknowledgments: Ross Overbeek, Larry Wos, Bob Veroff, and Rusty Lusk contributed to the development of Otter.
Paradox 3.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology, Sweden
Architecture
Paradox [CS03] is a
finite-domain model generator. It is based on a MACE-style
[McC03b] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
Paradox incorporates the following features: Polynomial-time clause splitting heuristics, the use of incremental SAT, static symmetry reduction techniques, and the use of sort inference.
SInE is available from:
When searching for a proof, TPS first searches for an expansion proof
[Mil87] or an extensional expansion proof
[Bro07] of the theorem. Part
of this process involves searching for acceptable matings
[And81].
Using higher-order unification, a pair of occurrences of subformulas
(which are usually literals) is mated appropriately on each vertical
path through an expanded form of the theorem to be proved. The
expansion proof thus obtained is then translated
[Pfe87] without further
search into a proof of the theorem in natural deduction style.
The translation process provides an effective soundness check, but
it is not actually used during the competition.
TPS is implemented in Common Lisp, and is available from:
No system description supplied.
Substitution tree indexing is used to implement all major operations on
sets of terms and literals.
Although the kernel of the system works only with clausal normal forms,
the shell accepts a problem in the full first-order logic syntax, clausifies
it and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm of last year's LTB divison winner SInE
can be enabled as part of the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
No system description supplied.
Waldmeister C09a is a minor upgrade of Waldmeister 806, the best feature
of which is TPTP format output.
SInE 0.4
Kryštof Hoder
The University of Manchester, United Kingdom
Architecture
SiNE 0.4 is an axiom selection system for first order theories.
It uses a syntactic approach based on symbols' presence in axioms and
the conjecture.
(When we say symbols, we mean function and predicate symbols taken together.)
A relation D (as in "Defines") is created between symbols and axioms, which
represents the fact that for a symbol there are some axioms that "give it
its meaning".
After the relation has been constructed, the axiom selection starts.
At the beginning only the conjecture is selected, in each iteration the
selection is extended by all axioms that are D-related to any symbol used
in already selected formulae.
The iteration continues until no more axioms are selected.
The selected axioms and the conjecture are then handed to an underlying
inference engine.
Strategies
In order to construct the D-relation, we compute for each symbol the
number of axioms in which it appears, which we call the generality
index of the symbol.
Then each axiom is put into the D-relation of the least general symbol
it contains.
(If there are more symbols with lowest generality index, axiom the is put in
the relation of all of them.)
This strategy selects about 2% of axioms on problems CSR(075-109)+1.
In order to get different sets of axioms for multiple proof attempts on
each problem, we have introduced a benevolence parameter.
Given benevolence b ≥ 1, formula F, which has the least
general symbol with generality index g0, is put into the
D-relations of all its symbols with generality index
g ≤ b × g0.
Another way to modify the axiom set is to add axioms that were used in
proofs of previously proved problems (which used the same axiom set).
Implementation
The axiom selection is implemented in Python.
Problems are accessed one by one as present in the batch file.
First each problem's time limit is calculated.
This calculation is "optimistic" in a way that it assumes that the success
rate of problems to come will be the same as of the already encountered ones.
(It gives a problem more than it's fair share of time, assuming that many
of the problems won't use their whole time limit.)
Each problem is attempted in five phases, each of them specifies different
selection parameters (i.e., benevolence and whether the axioms from previous
proofs are included).
For each phase a D-relation is constructed (or reused if it was constructed
for the same axiom set previously).
Then axioms are selected and passed to an underlying prover.
This version contains EP as an underlying prover, as it was used last year
by the competition-winning version.
http://www.cs.man.ac.uk/~hoderk/sine/sine.tgz
Expected Competition Performance
SInE 0.3 was the CASC-J4 LTB division winner.
SInE 0.4 is a minimally updated version for the new LTB rules.
TPS 3.20080227G1d
Peter B. Andrews1,
Chad E. Brown2
1Carnegie Mellon University, USA,
2Saarland University, Germany
Architecture
TPS is a higher-order theorem proving system that has been developed over
several decades under the supervision of Peter B. Andrews with substantial
work by Eve Longini Cohen, Dale A. Miller, Frank Pfenning, Sunil Issar, Carl
Klapper, Dan Nesmith, Hongwei Xi, Matthew Bishop, Chad E. Brown, and Mark
Kaminski. TPS can be used to prove theorems of Church's type theory
automatically, interactively, or semi-automatically
[AB+96,AB06].
Strategies
Strategies used by TPS in the search process include:
Implementation
TPS has been developed as a research tool for developing, investigating, and
refining a variety of methods of searching for expansion proofs, and
variations of these methods. Its behavior is controlled by hundreds of
flags. A set of flags, with values for them, is called a mode. 52 modes have
been found which collectively suffice for automatically proving virtually all
the theorems which TPS has proved automatically thus far. When searching for a
proof in automatic mode, TPS tries each of these modes in turn for a specified
amount of time.
http://gtps.math.cmu.edu/tps.html
Expected Competition Performance
TPS is quite versatile when searching for proofs of higher-order
theorems, but it is effectively slow when it must try many
modes. Therefore, the performance of TPS is likely to depend very
heavily on the way the competition is set up. If the emphasis is on
speed, with short time limits, TPS may not do very well. On the other
hand, if the emphasis is on the variety of theorems that can be
proved, with very liberal time limits, TPS may do very well.
Vampire 10.0
Andrei Voronkov
University of Manchester, United Kingdom
Expected Competition Performance
Vampire 10.0 is the CASC-J4 FOF and CNF division winner.
Vampire 11.0
Kryštof Hoder, Andrei Voronkov
The University of Manchester, United Kingdom
Architecture
Vampire 11.0 is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
The splitting rule in kernel adds propositional parts to clauses, which
are manipulated using binary decision diagrams (BDDs).
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Strategies
The Vampire 11.0 kernel provides a fairly large number of features for
strategy selection.
The most important ones are:
The Vampire 11.0 core is a completely new Vampire, and it shares virtually
no code with the previous versions.
The automatic mode of Vampire 11.0, however, uses both the Vampire 11.0
core and Vampire 10.0 to solve problems, based on input problem
classification that takes into account simple syntactic properties, such
as being Horn or non-Horn, presence of equality, etc.
Implementation
Vampire 11.0 is implemented in C++. The supported compilers are gcc 4.1.x,
gcc 4.3.x.
Expected Competition Performance
We expect Vampire 11.0 to perform comparably to the Vampire 10.0.
Waldmeister 806
Thomas Hillenbrand1,
Bernd Löchner2
1Max-Planck-Institut für Informatik Saarbrücken, Germany,
2Technische Universität Kaiserslautern, Germany
Expected Competition Performance
Waldmeister 806 is the CASC-J4 UEQ division winner.
Waldmeister C09a
Thomas Hillenbrand
Max-Planck-Institut für Informatik, Germany
Architecture
Waldmeister C09a [Hil03] is a system for unit
equational deduction.
Its theoretical basis is unfailing completion in the sense of
[BDP89] with refinements towards ordered
completion (cf. [AHL03]).
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a heuristic
assessment of passive facts [HJL99].
Implementation
The prover is coded in ANSI-C. It runs on Solaris, Linux, and newly
also on MacOS X. In addition, it is now available for Windows users
via the Cygwin platform. The central data strucures are: perfect
discrimination trees for the active facts; group-wise compressions for
the passive ones; and sets of rewrite successors for the conjectures.
Visit the Waldmeister web pages at:
http://www.waldmeister.org
Strategies
The approach taken to control the proof search is to choose the search
parameters according to the algebraic structure given in the problem
specification [HJL99].
This is based on the observation that proof tasks sharing major parts of
their axiomatization often behave similar.
Hence, for a number of domains, the influence of different reduction orderings
and heuristic assessments has been analyzed experimentally; and in most cases
it has been possible to distinguish a strategy uniformly superior on the whole
domain.
In essence, every such strategy consists of an instantiation of the first
parameter to a Knuth-Bendix ordering or to a lexicographic path ordering, and
an instantiation of the second parameter to one of the weighting functions
addweight, gtweight, or mixweight, which, if
called on an equation , return
,
, or
,
respectively, where denotes the number of symbols
in .
Expected Competition Performance
Waldmeister C09a will output much nicer proofs than Waldmeister 806.