Entrants' System Descriptions
agsyHOL 1.0
Fredrik Lindblad
University of Gothenburg, Sweden
Architecture
AgsyHOL 1.0 [agsyHOL] is based on intuitionistic sequent calculus extended
with rules for classical and equality reasoning.
There is an introduction and elimination rule for each logical construct,
plus rules for RAA, AC and extensionality.
There are special inference modes for elimination and equality reasoning.
A backward search is applied to find proof derivations in the calculus.
Search is controlled by locally assigning priorities to different kind of
constraints, and refining that part of the half-finished proof which is
blocking a constraint with the highest priority.
Strategies
The system has no set of strategies it chooses from by pre-analysing the
problem.
What it does do before starting the proof search is to try to minimise the
need for classical reasoning by transforming the problem using double
negation elimination and the de Morgan laws.
Implementation
The system is implemented in Haskell.
It consists of a proof checker for the calculus.
The proof checker is annotated with search control information controlling
priorities of constraints and costs of choices.
The system also consists of an implementation of lazy narrowing, a general
purpose search mechanism, which is applied to the proof checker in order
to achieve proof search.
The lazy narrowing search has been extended in order to deal with
customizable priorities and costs.
agsyHOL is available from:
https://github.com/frelindb/agsyHOL/
Expected Competition Performance
On SystemOnTPTP it solves 1722 of the THF problems (TPTP version 6.0.0).
Beagle 0.9
Peter Baumgartner, Josh Bax
NICTA and Australian National University, Australia
Architecture
Beagle is an automated theorem prover for sorted first-order logic with
equality over built-in theories.
The theories currently supported are integer arithmetic, linear rational
arithmetic and linear real arithmetic.
It accepts formulas in the FOF and TFF formats of the TPTP syntax, and
formulas in the SMT-LIB version 2 format.
Beagle first converts the input formulas into clause normal form.
Pure arithmetic (sub-)formulas are treated by eager application of quantifier
elimination.
The core reasoning component implements the Hierarchic Superposition Calculus
with Weak Abstraction (HSPWA) [BW13].
Extensions are a splitting rule for clauses that can be divided into variable
disjoint parts, and a chaining inference rule for reasoning with inequalities.
The HSPWA calculus generalizes the superposition calculus by integrating
theory reasoning in a black-box style.
For the theories mentioned above, Beagle combines quantifier elimination
procedures and other solvers to dispatch proof obligations over these theories.
The default solvers are an improved version of Cooper's algorithm for linear
integer arithmetic, and the Fourier-Motzkin algorithm for linear real/rational
arithmetic.
Non-linear integer arithmetic is treated by partial instantiation.
Strategies
Beagles uses the Discount loop for saturating a clause set under the calculus'
inference rules.
Simplification techniques include standard ones, such as subsumption deletion,
demodulation by ordered unit equations, and tautology deletion.
It also includes theory specific simplification rules for evaluating
ground (sub)terms, and for exploiting cancellation laws and properties of
neutral elements, among others.
In the competition an aggressive form of arithmetic simplification is used,
which seems to perform best in practice.
Beagle uses strategy scheduling by trying (at most) two flag settings
sequentially.
Implementation
Beagle is implemented in Scala.
It is a full implementation of the HSPWA calculus.
It uses a simple form of indexing, essentially top-symbol hashes, stored
with each term and computed in a lazy way.
Fairness is achieved through a combination of measuring clause weights
and their derivation-age.
It can be fine-tuned with a weight-age ratio parameter, as in other provers.
Beagle's web site is
http://users.cecs.anu.edu.au/~baumgart/systems/beagle/
Expected Competition Performance
Beagle should perform reasonably well.
cocATP 0.2.0
Cristóbal Camarero
University of Cantabria, Spain
Architecture
cocATP is a Coq-inspired [BC04] automated theorem prover made on the free
time of the author.
It implements (the non-inductive) part of Coq's logic (calculus of
constructions) and syntax.
The proof terms it creates are accepted as proofs by Coq by the addition of
a few definitions (substituting the respective inductive definitions of Coq).
As in Coq, equality and logical connectives other than implication (which is
a dependent product of the logic) are defined adding the proper axioms.
The reasoning is tried to be done the more general possible, avoiding to
give any special treatment to both equality and logical connectives.
At difference of most of the other provers, cocATP does not rely on SAT or
first-order solver.
All reasoning is done in the high-order logic that is the calculus of
constructions.
Strategies
The first of the rules is: for a node with conjecture (forall x:T, P x)
create a new node with hypothesis (x:T) and conecture (P x).
Other of the major rules consist in: when having a node with conjecture (C)
and some hypothesis (H:forall (x1:T1) (x2:T2) ... (xn:Tn), P x1 x2 ... xn),
to unificate P with C and prove the Ti necessary.
The unification is a subset of the possible high-order unifications.
And in this case is a one-sided unification.
There is a lot of other ad-hoc rules and conditions to apply them.
Some of the rules can create existential terms (x:=??:T) alike to Coq's
tactic evar.
With these rules resulting proofs are very similar to human-made proofs.
There is a partially implemented second strategy more similar to the ones
found in other theorem provers.
It consist a saturation algorithm with the high-order modus_ponens.
As example, given a hypothesis (P) and another (or the same) hypothesis
(forall x:T, Q x), unficate P with T generating a new hypothesis Q.
This generalizes the typical saturation rules, as resolution and superposition.
The strategy yet lacks a good conditions for subsumption and simplifcation.
Probably it will not be used in competition.
Implementation
cocATP is implemented in Python-2.7 and C.
It uses the Ply-3.4 library to build the parsers for both the Coq and TPTP
syntaxes.
Recently, the processing core has been reimplemented in C as a Python module,
and a Python variable controls if using a pure Python or the C version.
Using the C module can be an order of magnitude faster, but it is less
tested and introduces the possibility of SEGFAULTS.
Includes a type-verifier of Calculus of Constructions without inductive
constructions, which must be defined with axioms.
That is, a buggy partial clone of Coq.
There is support for most of the TPTP syntax, problems are translated to
a set of calculus of constructions terms.
cocATP has been specially prepared for THF, but all the other TPTP formulae
without numbers should be accepted.
However cocATP does NOT include a SAT solver, thus it will probably not solve
your trivial CNF problems.
More info at:
http://www.alumnos.unican.es/ccc66/cocATP.htm
Expected Competition Performance
With the reimplementation of the kernel in C it is expected a great
reduction in the execution time, but not a lot of newly solved problems.
Crossbow 0.1
Radek Micek
Charles University in Prague, Czech Republic
Architecture
Crossbow 0.1 is a MACE2-style finite model finder similar to Paradox.
It is an implementation of techniques from [CS03] plus
improved flattening and built-in support for commutativity.
Strategies
Same strategy is used for all problems:
- First order formulas are clausified by another prover.
- Additional lemmas are generated by another prover,
small lemmas are added to the clauses of the problem.
- Clauses of the problem are preprocessed (simplification, flattening,
splitting, term definitions, commutativity detection).
- Clauses are instantiated for increasing domain sizes until some model is
found. Instantition takes into account commutativity of functions
and symmetry of predicates.
Implementation
Crossbow is implemented in OCaml.
Clausification and lemma generation is done by E.
MiniSat is used for incremental SAT solving.
Source code is available from:
https://github.com/radekm/crossbow
Expected Competition Performance
It should perform slightly better than Paradox.
CVC4 1.4
Andrew Reynolds
EPFL, Switzerland
Architecture
CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that
includes built-in support for many theories including linear arithmetic,
arrays, bit vectors, datatypes and strings.
It incorporates various approaches for handling universally quantified
formulas.
In particular, CVC4 uses primarily uses heuristic approaches based on
E-matching for answering "unsatisfiable", and finite model finding approaches
for answering "satisfiable".
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT
solver to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable",
or return unknown.
The finite model finding has been developed to target problems containing
background theories, whose quantification is limited to finite and
uninterpreted sorts.
In finite model finding mode, CVC4 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in [RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
Quantifier instantiation strategies are then invoked to add instances of
quantified formulas that are in conflict with the candidate model, as
described in [RT+13].
If no instances are added, then the solver reports "satisfiable".
Strategies
For handling theorems, CVC4 primarily uses various configurations of
E-matching.
This year, CVC4 incorporates new methods for finding conflicting instances
of quantified formulas [RTd14], which have been shown to lead to improved
performance on unsatisfiable TPTP benchmarks.
CVC4 also incorporates a model-based heuristic for handling quantified
formulas containing only pure arithmetic, which will be used in the TFA
division.
For handling non-theorems, the finite model finding feature of CVC4 will
use a number of orthogonal quantifier instantiation strategies.
This year, it will incorporate several new features, including an optimized
implementation of model-based quantifier instantiation which improves upon
[RT+13], as well as techniques for sort inference.
Since CVC4 with finite model finding is also capable of answering
"unsatisfiable", it will be used as a strategy for theorems as well.
Implementation
CVC4 is implemented in C++.
The code is available from
https://github.com/CVC4
Expected Competition Performance
In the FOF division, CVC4 should perform better than last year, primarily
due to the incorporation of methods from [RTd14].
In the FNT division, CVC4 should also perform better than last year, due
to its incorporation of sort inference techniques and general improvements
to the implementation.
This is the first year CVC4 has entered the TFA division, where it should
be fairly competitive due to its efficient handling of ground arithmetic
constraints.
E 1.9
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E [Sch02,Sch13] is a purely equational theorem prover for full
first-order logic with equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an
instance of the superposition calculus with negative literal selection and
a number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm,
i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
Strategies
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies.
Clause evaluation heuristics can be constructed on the fly by combining
various parameterized primitive evaluation functions, or can be selected
from a set of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take
other clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
For CASC-J7, E implements a strategy-scheduling automatic mode.
The total CPU time available is broken into 8 (unequal) time slices.
For each time slice, the problem is classified into one of several classes,
based on a number of simple features (number of clauses, maximal symbol
arity, presence of equality, presence of non-unit and non-Horn clauses, ...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the the one that solves
the most problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
About 210 different strategies have been evaluated on all untyped
first-order problems from TPTP 6.0.0, and about 180 of these
strategies are used in the automatic mode.
Implementation
E is build around perfectly shared terms, i.e. each distinct term is
only represented once in a term bank.
The whole set of terms thus consists of a number of interconnected
directed acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing [Sch04].
Superposition and backward rewriting use fingerprint indexing [Sch12],
a new technique combining ideas from feature vector indexing and path
indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in [Loe06,Loe06].
The prover and additional information are available at
http://www.eprover.org
Expected Competition Performance
E 1.9 has slightly better strategies than previous versions, and
can produce proof objects quite efficiently.
The system is expected to perform well in most proof classes, but will
at best complement top systems in the disproof classes.
E.T. 0.1
Josef Urban1,
Cezary Kaliszyk2,
Stephan Schulz3,
Jiri Vyskocil4
1Radboud University Nijmegen, The Netherlands,
2University of Innsbruck, Austria,
3DHBW Stuttgart, Germany,
4Czech Technical University, Czech Republic
Architecture
E.T. 0.1 is a metasystem using E prover with specific strategies [Urb13,KU13]
and preprocessing tools [KU13,KU13,KU13] that are targeted mainly at
problems with many redundant axioms.
Its design is motivated by the recent experiments in the Large-Theory Batch
division [KUV14] and on the Flyspeck, Mizar and Isabelle datasets, however,
E.T. does no learning from related proofs.
Strategies
We characterize formulas by the symbols and terms that they contain,
normalized in various ways.
Then we run various algorithms that try to remove the redundant axioms and
use special strategies on such problems.
Implementation
The metasystem is implemented in ca. 1000 lines of Perl.
It uses a number of external programs, some of them based on E's code base,
some of them independently implemented in C++.
Expected Competition Performance
E.T. can solve some problems that E 1.8 cannot prove, and even some TPTP
problems with rating 1.
The CASC performance should not be much worse than that of E, possibly
better, depending on problem selection.
HOLyHammer 140616
Cezary Kaliszyk1,
Josef Urban2,
Stephan Schulz3
1University of Innsbruck, Austria,
2Radboud University Nijmegen, The Netherlands,
3DHBW Stuttgart, Germany
Architecture
HOLyHammer [KU14,KU13] is a metasystem for premise selection and proof
translation from polymorphic HOL to FOF ATP provers.
This is a version restricted to THF0 and to using E prover with specific
strategies.
Strategies
The translation uses polymorphic type tags, the apply functor, lambda lifting
and predicate abstraction.
The premise selection [KU14] uses k-NN, naive Bayes and a modified
implementation of the Meng-Paulson relevance filter.
We characterize formulas by the symbols and terms that they contain,
normalized in various ways.
Implementation
The HH metasystem consists of about 2000 lines of OCaml code on top of
about 5000 lines of HOL Light code.
The predictors are about 400 lines of C++ code.
The system produces higher-order logic proofs internally that match the
HOL Light kernel, however no TSTP proofs are produced [KU13].
Expected Competition Performance
We expect HOLyHammer to solve some large problems with many redundant axioms,
however we have not tested it on the whole set of THF0 TPTP problems.
iProver 0.9
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor09] which is complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat [ES04] for reasoning
with ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution; see
[Kor08] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver-SInE uses axiom selection based
on the SInE algorithm [HV11] as implemented in Vampire [HKV10],
i.e., axiom selection is done by Vampire and
proof attempts are done by iProver.
Major additions in the current version are:
- answer computation,
- several modes for model output using first-order definitions in
term algebra,
- Brand's transformation.
Strategies
iProver has around 40 options to control the proof
search including options for literal selection,
passive clause selection, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats,
where Vampire [HKV10] is used for clausification of FOF problems.
iProver is available from
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 0.9 is the CASC-24 EPR division winner.
iProver 1.0
Konstantin Korovin, Christoph Sticksel
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor13] which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it
uses MiniSat [ES04] and was recently extended with PicoSAT [Bie08] and
Lingeling [Bie12] (only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution;
see [Kor08] for the implementation details.
The proof search is implemented using a saturation process based on
the given clause algorithm. iProver uses non-perfect discrimination trees
for the unification indexes, priority queues for passive clauses, and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications. A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver uses axiom selection based on the SInE
algorithm [HV11] as implemented in Vampire [HKV11], i.e., axiom selection
is done by Vampire and proof attempts are done by iProver.
Some of iProver features are summarised below.
- proof extraction for both instantiation and resolution,
- model representation, using first-order definitions in term algebra,
- answer substitutions,
- semantic filtering,
- type inference, monotonic [CLS11] and non-cyclic types,
- Brand's transformation.
Type inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminated irrelevant clauses.
Proof extraction is challenging
due to simplifications such global subsumption which involve global reasoning
with the whole clause set and can be computationally expensive.
Strategies
iProver has around 60 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT solver, simplifications and options for combination of instantiation
with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
The strategy for satisfiable problems (FNT division) includes finite model
finding.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses
MiniSat [ES04].
iProver accepts FOF and CNF formats.
Vampire [HKV11,HK+12] is used for proof-producing clausification of FOF
problems as well as for axiom selection [HV11] in the LTB division.
iProver is available from
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 1.0 is the CASC-24 FNT division winner.
Isabelle 2013
Jasmin C. Blanchette1, Lawrence C. Paulson2,
Tobias Nipkow1, Makarius Wenzel3
1Technische Universität München, Germany,
2University of Cambridge, United Kingdom,
3Université Paris Sud, France
Architecture
Isabelle/HOL 2013 [NPW12] is the higher-order logic incarnation of the generic
proof assistant Isabelle2013.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer [PB10,BBP11].
Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due
to Nik Sultana.
It also includes TPTP versions of its popular tools, invokable on the command
line as isabelle tptp_tool max_secs
file.p. For example:
isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
- sledgehammer
- Invokes the following sequence of provers as oracles via Sledgehammer:
- spass - SPASS 3.8ds [BP+12];
- vampire - Vampire 1.8 (revision 1435) [RV02];
- e - E 1.4 [Sch04];
- z3_tptp - Z3 3.2 with TPTP syntax [dMB08].
- nitpick
- For problems involving only the type $o of Booleans, checks
whether a finite model exists using Nitpick [BN10].
- simp
- Performs equational reasoning using rewrite rules [Nip89].
- blast
- Searches for a proof using a fast untyped tableau prover and then
attempts to reconstruct the proof using Isabelle tactics
[Pau99].
- auto+spass
- Combines simplification and classical reasoning
[Pau94] under one roof; then invoke Sledgehammer
with SPASS on any subgoals that emerge.
- z3
- Invokes the SMT solver Z3 3.2 [dMB08].
- cvc3
- Invokes the SMT solver CVC3 2.2 [BT07].
- fast
- Searches for a proof using sequent-style reasoning, performing a
depth-first search [Pau94]. Unlike
blast, it construct proofs directly in Isabelle. That makes it
slower but enables it to work in the presence of the more unusual features
of HOL, such as type classes and function unknowns.
- best
- Similar to fast, except that it performs a best-first search.
- force
- Similar to auto, but more exhaustive.
- meson
- Implements Loveland's MESON procedure [Lov78].
Constructs proofs directly in Isabelle.
- fastforce
- Combines fast and force.
Implementation
Isabelle is a generic theorem prover written in Standard ML.
Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of
higher-order logic.
The HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers.
Other object logics are available, notably FOL (first-order logic) and ZF
(Zermelo-Fraenkel set theory).
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype.
Assuming the kernel is correct, all values of type theorem are
correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Third place, after Satallax and Satallax_MaLeS but before AgsyHOL and LEO-II.
leanCoP 2.2
Jens Otten
University of Potsdam, Germany
Architecture
leanCoP [OB03,Ott08] is an automated theorem prover for classical first-order
logic with equality.
It is a very compact implementation of the connection (tableau)
calculus [Bib87,LS01].
Strategies
The reduction rule of the connection calculus is applied before the
extension rule. Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is performed in order to achieve
completeness.
Additional inference rules and techniques include regularity, lemmata,
and restricted backtracking [Ott10].
leanCoP uses an optimized structure-preserving transformation
into clausal form [Ott10] and a fixed
strategy scheduling, which is controlled by a shell script.
Implementation
leanCoP is implemented in Prolog.
The source code of the core prover consists only of a few lines of code.
Prolog's built-in indexing mechanism is used to quickly find connections
when the extension rule is applied.
leanCoP can read formulae in leanCoP syntax and in
TPTP first-order syntax. Equality axioms and axioms to support
distinct objects are automatically added if required.
The leanCoP core prover returns a very compact connection proof,
which is then translated into a more comprehensive output format,
e.g., into a lean (TPTP-style) connection proof or into a
readable text proof.
The source code of leanCoP 2.2 is available under the GNU general
public license.
It can be downloaded from the leanCoP website at:
http://www.leancop.de
The website also contains information about ileanCoP [Ott08] and MleanCoP
[Ott12,Ott14], two versions of leanCoP for first-order intuitionistic logic
and first-order modal logic, respectively.
Expected Competition Performance
As the core prover has not changed, the performance of leanCoP 2.2
is expected to be similar to the performance of leanCoP 2.1.
LEO-II 1.6.2
Christoph Benzmüller
Freie Universität Berlin, Germany
Architecture
LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system
based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution [Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E [Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language [BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure [SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
http://www.leoprover.org
Expected Competition Performance
There are only minor recent improvements on LEO-II.
It is unclear whether they are sufficient for attacking LEO-II's main
competitors.
Muscadet 4.4
Dominique Pastre
University Paris Descartes, France
Architecture
The Muscadet theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of [Ble71] and
[Pas78], and uses methods which resembles those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
Strategies
There are specific strategies for existential, universal, conjonctive or
disjunctive hypotheses and conclusions, and equalities.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one),
refutation (only if the conclusion is a negation), search for objects
satisfying the conclusion or dynamic building of new rules.
The system is also able to work with second order statements.
It may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and [Pas93].
These two possibilities are not used while working with the TPTP Library.
Implementation
Muscadet [Pas01] is implemented in SWI-Prolog.
Rules are written as more or less declarative Prolog clauses.
Metarules are written as sets of Prolog clauses.
The inference engine includes the Prolog interpreter and some procedural
Prolog clauses.
A theorem may be split into several subtheorems, structured as a tree with
"and" and "or" nodes.
All the proof search steps are memorized as facts including all the elements
which will be necessary to extract later the useful steps (the name of the
executed action or applied rule, the new facts added or rule dynamically
built, the antecedents and a brief explanation).
Muscadet is available from
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
Expected Competition Performance
The best performances of Muscadet will be for problems manipulating many
concepts in which all statements (conjectures, definitions, axioms) are
expressed in a manner similar to the practice of humans, especially of
mathematicians [Pas02,Pas07].
It will have poor performances for problems using few concepts but large
and deep formulas leading to many splittings.
Its best results will be in set theory, especially for functions and relations.
Its originality is that proofs are given in natural style.
Changes since last year are only minor corrections.
Princess 140704
Philipp Rümmer, Peter Backeman
Uppsala University, Sweden
Architecture
Princess [Rue08,Rue12] is a theorem prover for first-order logic
modulo linear integer arithmetic.
The prover uses a combination of techniques from the areas of first-order
reasoning and SMT solving.
The main underlying calculus is a free-variable tableau calculus,
which is extended with constraints to enable backtracking-free proof
expansion, and positive unit hyper-resolution for lightweight
instantiation of quantified formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for
a number of further fragments.
In addition, some built-in procedures for nonlinear integer arithmetic are
available.
The internal calculus of Princess only supports uninterpreted predicates;
uninterpreted functions are encoded as predicates, together with the usual
axioms.
Through appropriate translation of quantified formulae with functions,
the e-matching technique common in SMT solvers can be simulated; triggers
in quantified formulae are chosen based on heuristics similar to those in
the Simplify prover.
Strategies
For CASC, Princess will run a fixed schedule of configurations for
each problem (portfolio method).
Configurations determine, among others, the mode of proof expansion
(depth-first, breadth-first), selection of triggers in quantified formulae,
clausification, and the handling of functions.
The portfolio was chosen based on training with a random sample of problems
from the TPTP library.
Implementation
Princess is entirely written in Scala and runs on any recent Java
virtual machine; besides the standard Scala and Java libraries, only
the Cup parser library is used.
Princess is available from:
http://www.philipp.ruemmer.org/princess.shtml
Expected Competition Performance
Princess is mainly designed for integer problems (TFI), and should
perform reasonably well here.
Reasoning about rationals or reals is not the main focus of the work, and
results will be accordingly.
Prover9 2009-11A
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3 [McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, proof of transformation to clauses
is not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Strategies
Like Otter, Prover9 has available many strategies; the following
statements apply to CASC-2012.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
In previous CASC competitions, Prover9 has used LPO to order terms for
demodulation and for the inference rules, with a simple rule for determining
symbol precedence.
For CASC 2012, we are going to use KBO instead.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP [McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing [Sch04] is used for forward and backward nonunit
subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Some of the strategy development for CASC was done by experimentation with
the CASC-2004 competition "selected" problems.
(Prover9 has not yet been run on other TPTP problems.)
Prover9 is unlikely to challenge the CASC leaders, because
(1) extensive testing and tuning over TPTP problems has not been done,
(2) theories (e.g., ring, combinatory logic, set theory) are not recognized,
(3) term orderings and symbol precedences are not fine-tuned, and
(4) multiple searches with differing strategies are not run.
Finishes in the middle of the pack are anticipated in all categories
in which Prover9 competes.
Satallax 2.7
Chad E. Brown
Saarland University, Germany
Architecture
Satallax 2.7 [Bro12] is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat [ES04] is responsible for much of the search for a
proof.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic [BS10] with a choice operator [BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses [Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set
of propositional clauses for unsatisfiability. If the clauses are
unsatisfiable, then the original branch is unsatisfiable.
Additionally, Satallax may optionally generate first-order formulas in
addition to the propositional clauses.
If this option is used, then Satallax peroidically calls the first-order
theorem prover E to test for first-order unsatisfiability.
If the set of first-order formulas is unsatisfiable, then the original branch
is unsatisfiable.
Strategies
There are about a hundred flags that control the order in which formulas
and instantiation terms are considered and propositional clauses are
generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 2.7 has strategy schedule consisting of 68 modes.
Each mode is tried for time limits ranging from 0.1 seconds to 54.9 seconds.
The strategy schedule was determined through experimentation using the
THF problems in version 5.4.0 of the TPTP library.
Implementation
Satallax 2.7 is implemented in OCaml.
A foreign function interface is used to in teract with MiniSat 2.2.0.
Satallax is available from
http://satallax.com
Expected Competition Performance
Satallax 2.7 is the CASC-24 THF division runner-up.
Satallax_MaLeS 1.3
Daniel Kuehlwein
Radboud University Nijmegen, The Netherlands
Architecture
Satallax_MaLeS 1.3 improves Satallax's automatic mode with machine learning
techniques to predict which search strategy is most likely to find a proof.
Strategies
Satallax_MaLeS 1.3 relies on Geoff Sutcliffe's MakeListStat to classify
problems.
It uses the same data as Satallax_MaLeS 1.2 as basis for the learning
algorithms.
Implementation
Satallax_MaLeS is based on Python, in particular the Sklearn library that
contains many machine learning algorithms.
After CASC Satallax_MaLeS will be available from
http://www.cs.ru.nl/~kuehlwein/
Expected Competition Performance
We expect Satallax_MaLeS 1.3 to perform slightly better than
Satallax_MaLeS 1.2.
SPASS+T 2.2.19
Uwe Waldmann
Max-Planck-Insitut für Informatik, Germany
Architecture
SPASS+T is an extension of the superposition-based theorem prover SPASS
that integrates algebraic knowledge into SPASS in three complementary ways:
by passing derived formulas to an external SMT procedure
(currently Yices or CVC3), by adding standard axioms,
and by built-in arithmetic simplification and inference rules.
A first version of the system has been described in [PW06];
later a much more sophisticated coupling of the SMT procedure has been
added [Zim07].
The latest version provides improved support for isint/1, israt/1, floor/1
and ceiling/1 and adds partial input abstraction and history-dependent weights
for numerical constants.
Strategies
Standard axioms and built-in arithmetic simplification and inference rules
are integrated into the standard main loop of SPASS.
Inferences between standard axioms are excluded, so the
user-supplied formulas are taken as set of support.
The external SMT procedure runs in parallel in a separate process,
leading occasionally to non-deterministic behaviour.
Implementation
SPASS+T is implemented in C.
SPASS+T is available from
http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS
Expected Competition Performance
SPASS+T 2.2.16 came second in the TFA division of last CASC;
we expect a similar performance in CASC 2013.
SPASS+T 2.2.20
Uwe Waldmann
Max-Planck-Insitut für Informatik, Germany
Architecture
SPASS+T is an extension of the superposition-based theorem prover SPASS
that integrates algebraic knowledge into SPASS in three complementary ways:
by passing derived formulas to an external SMT procedure
(currently Yices or CVC3), by adding standard axioms,
and by built-in arithmetic simplification and inference rules.
A first version of the system has been described in [PW06];
later a much more sophisticated coupling of the SMT procedure has been
added [Zim07].
The latest version provides improved support for isint/1, israt/1, floor/1
and ceiling/1 and adds partial input abstraction and history-dependent weights
for numerical constants.
Strategies
Standard axioms and built-in arithmetic simplification and inference rules
are integrated into the standard main loop of SPASS.
Inferences between standard axioms are excluded, so the
user-supplied formulas are taken as set of support.
The external SMT procedure runs in parallel in a separate process,
leading occasionally to non-deterministic behaviour.
Implementation
SPASS+T is implemented in C.
SPASS+T is available from
http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS
Expected Competition Performance
SPASS+T 2.2.19 came first in the TFA division of last CASC;
we expect a similar performance in CASC 2014.
Vampire 2.6
Krystof Hoder, Andrei Voronkov
University of Manchester, England
Architecture
Vampire 2.6 is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
It also implements the Inst-gen calculus.
The splitting rule in kernel adds propositional parts to clauses, which are
being manipulated using binary decision diagrams and a SAT solver.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Although the kernel of the system works only with clausal normal form, the
shell accepts a problem in the full first-order logic syntax, clausifies it
and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm Sine [HV11] can be enabled as part of
the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
The Vampire 2.6 kernel provides a fairly large number of options for strategy se
lection. The most important ones are:
- Choice of the main procedure:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Goal oriented mode based on tabulation
- Instantiation using the Inst-gen calculus
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are
preferred for inference selection.
- Set-of-support strategy.
Implementation
Vampire 2.6 is implemented in C++.
Expected Competition Performance
Vampire 2.6 is the CASC-24 FOF division winner.
Waldmeister 710
Thomas Hillenbrand
Max-Planck-Institut für Informatik, Germany
Architecture
Waldmeister 710 [Hil03] is a system for unit equational deduction.
Its theoretical basis is unfailing completion in the sense of
[BDP89] with refinements towards ordered completion (cf. [AHL03]).
The system saturates the input axiomatization, distinguishing active facts,
which induce a rewrite relation, and passive facts, which are the one-step
conclusions of the active ones up to redundancy.
The saturation process is parameterized by a reduction ordering and a heuristic
assessment of passive facts [HJL99].
This year's version is the result of polishing and fixing a few things in
last year's.
Strategies
The prover is coded in ANSI-C. It runs on Solaris, Linux,
MacOS X, and Windows/Cygwin. The central data strucures are: perfect
discrimination trees for the active facts; group-wise compressions for
the passive ones; and sets of rewrite successors for the conjectures.
Visit the Waldmeister web pages at:
http://www.waldmeister.org
Implementation
The approach taken to control the proof search is to choose the search
parameters – reduction ordering and heuristic assessment –
according to the algebraic structure given in the problem specification [HJL99].
This is based on the observation that proof tasks sharing major parts of
their axiomatization often behave similarly.
Expected Competition Performance
Waldmeister 710 is the CASC-23 UEQ division winner.
VanHElsing 1.0
Daniel Kuehlwein
Radboud University Nijmegen, The Netherlands
Architecture
VanHElsing improves E's automatic mode with machine learning techniques to
predict which search strategy is most likely to find a proof.
Strategies
VanHElsing relies on E's features to classify problems.
It uses the same data as E 1.8 as basis for the learning algorithms.
Implementation
VanHElsing is based on Python, in particular the Sklearn library that
contains many machine learning algorithms.
After CASC VanHElsing will be availble from
http://www.cs.ru.nl/~kuehlwein/
Expected Competition Performance
We expect VanHElsing to perform slightly better than E 1.8.
Zipperposition 0.4
Simon Cruanes
INRIA, France
Architecture
Zipperposition is a typed first-order superposition theorem prover written in
OCaml for prototyping.
It deals with hashconsed polymorphic terms, has a Hindley-Milner-like
type-inference algorithm and accepts TFF1 input.
Its core superposition calculus is strongly inspired from E (discount loop,
mostly same inferences including condensation and contextual literal cutting).
The new version has experimental support for linear integer arithmetic.
It features a new calculus (not published yet), inspired from the work
of Waldmann on superposition modulo rational arithmetic [Wal01].
The calculus is an extension of superposition that doesn't use a
black-box solver, but has special inferences to deal with arithmetic
equations, inequations and divisibility literals.
Additional redundancy criteria are used to make the search space more
tractable.
As we don't have completeness results yet, the solver will fail on
satisfiable problems, even if it finds a saturated set of clauses.
Strategies
Zipperposition uses several clause queues, but unlike E it has no
heuristic to choose them based on the input.
The strategy is therefore a simple alternance of old clauses, and
"small" clauses, in one run.
Implementation
Zipperposition is in pure OCaml, with a mix of imperative and functional
style, including iterators, zippers, etc.
It uses Non-perfect Discrimination Trees and Feature Vectors [Sch04] for
indexing.
Numbers are handled by Zarith, a frontend to GMP.
Expected Competition Performance
In FOF, we expect decent performance.
The results should be quite similar to last year as the superposition core
isn't the focus of our research.
In TFA, since we feature a new calculus, performance is hard to predict.
We expect the prover to perform well on problems SMT solvers will struggle
with, and conversely; best performance should be reached on unsatisfiable
problems with few inequations but possibly function symbols and quantifiers;
worst will be on combinatorial ground problems that require little symbolic
reasoning.
References