Entrants' System Descriptions
Beagle 0.9.22
Peter Baumgartner
NICTA and Australian National University, Australia
Architecture
Beagle is an automated theorem prover for sorted first-order logic with
equality over built-in theories.
The theories currently supported are integer arithmetic, linear rational
arithmetic and linear real arithmetic.
It accepts formulas in the FOF and TFF formats of the TPTP syntax, and
formulas in the SMT-LIB version 2 format.
Beagle first converts the input formulas into clause normal form.
Pure arithmetic (sub-)formulas are treated by eager application of quantifier
elimination.
The core reasoning component implements the Hierarchic Superposition Calculus
with Weak Abstraction (HSPWA)
[BW13].
Extensions are a splitting rule for clauses that can be divided into variable
disjoint parts, and a chaining inference rule for reasoning with inequalities.
The HSPWA calculus generalizes the superposition calculus by integrating
theory reasoning in a black-box style.
For the theories mentioned above, Beagle combines quantifier elimination
procedures and other solvers to dispatch proof obligations over these theories.
The default solvers are an improved version of Cooper's algorithm for linear
integer arithmetic, and the Fourier-Motzkin algorithm for linear real/rational
arithmetic.
Non-linear integer arithmetic is treated by partial instantiation.
Strategies
Beagles uses the Discount loop for saturating a clause set under the calculus'
inference rules.
Simplification techniques include standard ones, such as subsumption deletion,
demodulation by ordered unit equations, and tautology deletion.
It also includes theory specific simplification rules for evaluating
ground (sub)terms, and for exploiting cancellation laws and properties of
neutral elements, among others.
In the competition an aggressive form of arithmetic simplification is used,
which seems to perform best in practice.
Beagle uses strategy scheduling by trying (at most) two flag settings
sequentially.
Implementation
Beagle is implemented in Scala.
It is a full implementation of the HSPWA calculus.
It uses a simple form of indexing, essentially top-symbol hashes, stored
with each term and computed in a lazy way.
Fairness is achieved through a combination of measuring clause weights
and their derivation-age.
It can be fine-tuned with a weight-age ratio parameter, as in other provers.
Beagle's web site is
https://bitbucket.org/peba123/beagle
Expected Competition Performance
Beagle is implemented in a straightforward way and would benefit from
optimized data structures.
We do not expect it to come in among the first.
CVC4 1.4
Andrew Reynolds
EPFL, Switzerland
Architecture
CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that
includes built-in support for many theories including linear arithmetic,
arrays, bit vectors, datatypes and strings.
It incorporates various approaches for handling universally quantified
formulas.
In particular, CVC4 uses primarily uses heuristic approaches based on
E-matching for answering "unsatisfiable", and finite model finding approaches
for answering "satisfiable".
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT
solver to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable",
or return unknown.
The finite model finding has been developed to target problems containing
background theories, whose quantification is limited to finite and
uninterpreted sorts.
In finite model finding mode, CVC4 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in [RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
Quantifier instantiation strategies are then invoked to add instances of
quantified formulas that are in conflict with the candidate model, as
described in [RT+13].
If no instances are added, then the solver reports "satisfiable".
Strategies
For handling theorems, CVC4 primarily uses various configurations of
E-matching.
This year, CVC4 incorporates new methods for finding conflicting instances
of quantified formulas [RTd14], which have been shown to lead to improved
performance on unsatisfiable TPTP benchmarks.
CVC4 also incorporates a model-based heuristic for handling quantified
formulas containing only pure arithmetic, which will be used in the TFA
division.
For handling non-theorems, the finite model finding feature of CVC4 will
use a number of orthogonal quantifier instantiation strategies.
This year, it will incorporate several new features, including an optimized
implementation of model-based quantifier instantiation which improves upon
[RT+13], as well as techniques for sort inference.
Since CVC4 with finite model finding is also capable of answering
"unsatisfiable", it will be used as a strategy for theorems as well.
Implementation
CVC4 is implemented in C++.
The code is available from
https://github.com/CVC4
Expected Competition Performance
CVC4 1.4 is the CASC-J7 TFA division winner.
CVC4 1.5
Andrew Reynolds
EPFL, Switzerland
Architecture
CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that
includes built-in support for many theories, including linear arithmetic,
arrays, bit vectors, datatypes and strings.
It incorporates approaches for handling universally quantified formulas.
CVC4 primarily uses heuristic approaches based on E-matching for theorems,
and finite model finding approaches for non-theorems.
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach. First, quantified formulas are replaced by fresh Boolean
predicates and the ground theory solver(s) are used in conjunction with the
underlying SAT solver to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable",
or return unknown.
Finite model finding in CVC4 targets problems containing background
theories whose quantification is limited to finite and uninterpreted sorts.
In finite model finding mode, CVC4 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in [RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
It then adds instances of quantified
formulas that are in conflict with the candidate model, as described in
[RT+13].
If no instances are added, it reports "satisfiable".
Strategies
For handling theorems, CVC4 primarily uses configurations that combine
conflict-based quantifier instantiation
[RTd14]
and E-matching.
CVC4 uses a handful of orthogonal trigger selection strategies for E-matching.
For handling non-theorems, CVC4 primarily uses finite model finding
techniques. These techniques can also be used for bounded integer
quantification for non-theorems involving arithmetic
[Rey13].
Since CVC4 with finite model finding is also capable of establishing
unsatisfiability, it is used as a strategy for theorems as well.
For problems in pure arithmetic, CVC4 uses techniques for
counterexample-guided quantifier instantiation
[RD+15],
which select relevant quantifier instantiations based on models for
counterexamples to quantified formulas.
CVC4 relies on this method both for theorems in TFA and non-theorems in TFN.
Implementation
CVC4 is implemented in C++.
The code is available from
https://github.com/CVC4
Expected Competition Performance
CVC4 has undergone various performance improvements in the past year.
It is expected to perform better than last year in FOF, due to improved trigger
selection for E-matching, and higher reliance upon conflict-based
quantifier instantiation.
It is expected to perform better than last year in TFA, due to a revised
implementation of counterexample-guided quantifier instantiation, and
improvements to E-matching.
It is expected to be competitive in TFN.
E 1.9.1
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E
[Sch02,
Sch13]
is a purely equational theorem prover for full first-order logic with equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a number of
redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm,
i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
For the LTB divisions, a control program uses a SInE-like analysis to
extract reduced axiomatizations that are handed to several instances
of E.
E will not use the on-the-fly learning introduced this year.
Strategies
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering. The prover supports a large number of pre-programmed literal
selection strategies. Clause evaluation heuristics can be constructed
on the fly by combining various parameterized primitive evaluation
functions, or can be selected from a set of predefined
heuristics. Clause evaluation heuristics are based on symbol-counting,
but also take other clause properties into account. In particular, the
search can prefer clauses from the set of support, or containing many
symbols also present in the goal. Supported term orderings are several
parameterized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO).
For CASC-25, E implements a strategy-scheduling automatic mode. The
total CPU time available is broken into 8 (unequal) time slices. For
each time slice, the problem is classified into one of several
classes, based on a number of simple features (number of clauses,
maximal symbol arity, presence of equality, presence of non-unit and
non-Horn clauses,...). For each class, a schedule of strategies is
greedily constructed from experimental data as follows: The first
strategy assigned to a schedule is the the one that solves the most
problems from this class in the first time slice. Each subsequent
strategy is selected based on the number of solutions on problems not
already solved by a preceding strategy.
About 210 different strategies have been evaluated on all untyped
first-order problems from TPTP 6.0.0, and about 180 of these
strategies are used in the automatic mode. A few new strategies may be
added.
Implementation
E is build around perfectly shared terms, i.e. each distinct term is
only represented once in a term bank. The whole set of terms thus
consists of a number of interconnected directed acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing
[Sch04].
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path
indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in
[Loe06,
Loe06].
The prover and additional information are available at
http://www.eprover.org
Expected Competition Performance
E 1.9.1 has only seen minor changes and inconsequential bug fixes
compared to last years version. It can produce proof objects quite
efficiently. The system is expected to perform reasonably well in most
proof classes, but will at best complement top systems in the disproof
classes.
ePrincess 1.0
Peter Backeman
Uppsala University, Sweden
Architecture
Princess
[Rue08,Rue12]
is a theorem prover for first-order logic modulo linear integer arithmetic.
The prover uses a combination of techniques from the areas of first-order
reasoning and SMT solving.
The main underlying calculus is a free-variable tableau calculus,
which is extended with constraints to enable backtracking-free proof
expansion, and positive unit hyper-resolution for lightweight
instantiation of quantified formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for a
number of further fragments.
In addition, some built-in procedures for nonlinear integer arithmetic are
available.
The internal calculus of Princess only supports uninterpreted
predicates; uninterpreted functions are encoded as predicates,
together with the usual axioms. Through appropriate translation of
quantified formulae with functions, the e-matching technique common in
SMT solvers can be simulated; triggers in quantified formulae are
chosen based on heuristics similar to those in the Simplify prover.
ePrincess is an extension of Princess using a calculus presented in
[BR15a]
to handle first-order formulas with equality.
A restriced version of simultaneous rigid e-unification is utilised to handle
equalities with free variables.
The solving procedures presented in
[BR15b]
have been implemented and constitue the foundation of the unification step.
Strategies
ePrincess is using the same strategies as Princess, which means for
CASC, ePrincess will run a fixed schedule of configurations for each
problem (portfolio method). Configurations determine, among others,
the mode of proof expansion (depth-first, breadth-first), selection of
triggers in quantified formulae, clausification, and the handling of
functions. The portfolio was chosen based on training with a random
sample of problems from the TPTP library.
Implementation
ePrincess is entirely written in Scala and runs on any recent Java
virtual machine; besides the standard Scala and Java libraries, only
the Cup parser library is used.
ePrincess is available from
http://user.it.uu.se/~petba168/breu/
Expected Competition Performance
ePrincess is mainly designed for first order formula (FOF), and based
on performance measurements it should perform decently dealing with
problems containing equality.
ET 0.2
Josef Urban
Radboud University Nijmegen, The Netherlands
Architecture
E.T. 0.1 is a metasystem using E prover with specific strategies [Urb13,KU13]
and preprocessing tools [KU13,KU13,KU13] that are targeted mainly at
problems with many redundant axioms.
Its design is motivated by the recent experiments in the Large-Theory Batch
division [KUV14] and on the Flyspeck, Mizar and Isabelle datasets, however,
E.T. does no learning from related proofs.
Strategies
We characterize formulas by the symbols and terms that they contain,
normalized in various ways.
Then we run various algorithms that try to remove the redundant axioms and
use special strategies on such problems.
Implementation
The metasystem is implemented in ca. 1000 lines of Perl.
It uses a number of external programs, some of them based on E's code base,
some of them independently implemented in C++.
Expected Competition Performance
ET 0.2 is expected to be slightly better than ET 0.1, provided I do not
do too much windsurfing.
Geo-III 2015E
Hans de Nivelle
University of Wroclaw, Poland
Architecture
Geo III is a theorem prover for Partial Classical Logic
[deN11],
based on reduction to Kleene Logic
[deN14].
Currently, only Sections 4 and 5 of
[deN14]
are implemented.
Since Kleene logic generalizes 2-valued logic, it is still possible to take
part in CASC.
Apart from being 3-valued, the main differences with earlier
versions of Geo are (1) more sophisticated learning schemes,
(2) improved proof logging, and (3) replacement of
recursion by explicit use of a stack.
- The Geo family of provers uses exhaustive backtracking, combined with
learning after failure. Earlier versions learned only conflict formulas.
Geo III learns disjunctions of arbitrary width.
Experiments show that this often results in shorter proofs.
- If Geo will be ever embedded in proof assistants, these assistants
will require proofs. In order to be able to provide
these at the required level of detail, Geo III contains a
hierarchy of proof rules that is independent of the
rest of the system, and that can be modified independently.
- In order to be flexible in the main loop, recursion was replaced
by an explicit stack. Using an explicit stack, it is easier to
remove unused assumptions, or to rearrange the order of assumptions.
Also, restarts are easier to implement with a stack.
Strategies
Geo uses breadth-first, exhaustive model search, combined with learning.
In case of branching, the branches are explored in random order.
Specially for CASC, a restart strategy was added, which ensures
that proof search is always restarted after 4 minutes.
This was done because Geo III has no indexing.
After some time, proof search becomes so inefficient that it makes no sense
to continue, so that it is better to restart.
Implementation
Geo III is written in C++ 11. No features outside of the standard
were used. It has been tested with g++ (version 4.8.4) and with
clang.
Version 2015E contains almost no technical optimizations, because
the calculus (the geometric format, the learning schemes,
and the restart strategies) are not yet mature.
Expected Competition Performance
Since we didn't have time to test Geo extensively, we are
unable to make any predictions. We hope to learn from CASC about
the relative strengths of Geo.
iProver 0.9
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor09] which is complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat [ES04] for reasoning
with ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution; see
[Kor08] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver-SInE uses axiom selection based
on the SInE algorithm [HV11] as implemented in Vampire [HKV10],
i.e., axiom selection is done by Vampire and
proof attempts are done by iProver.
Major additions in the current version are:
- answer computation,
- several modes for model output using first-order definitions in
term algebra,
- Brand's transformation.
Strategies
iProver has around 40 options to control the proof
search including options for literal selection,
passive clause selection, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats,
where Vampire [HKV10] is used for clausification of FOF problems.
iProver is available from
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 0.9 is the CASC-J7 EPR division winner.
iProver 1.0
Konstantin Korovin, Christoph Sticksel
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen
[GK03,Kor13] ,
which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it
uses MiniSat
[ES04]
and was recently extended with PicoSAT
[Bie08]
and Lingeling
[Bie12]
(only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution; see
[Kor08]
for the implementation details.
The proof search is implemented using a saturation process based on
the given clause algorithm. iProver uses non-perfect discrimination trees
for the unification indexes, priority queues for passive clauses, and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints
[GK04,Kor08];
global subsumption
[Kor08];
resolution-based simplifications and propositional-based simplifications.
A compressed feature vector index is used for efficient forward/backward
subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver uses axiom selection based on the SInE
algorithm
[HV11]
as implemented in Vampire
[HKV11],
i.e., axiom selection is done by Vampire and proof attempts are done by
iProver.
Some of iProver features are summarised below.
- proof extraction for both instantiation and resolution,
- model representation, using first-order definitions in term algebra,
- answer substitutions,
- semantic filtering,
- type inference, monotonic
[CLS11]
and non-cyclic types,
- Brand's transformation.
Type inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminated irrelevant clauses.
Proof extraction is challenging
due to simplifications such global subsumption which involve global reasoning
with the whole clause set and can be computationally expensive.
Strategies
iProver has around 60 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT solver, simplifications and options for combination of instantiation
with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
The strategy for satisfiable problems (FNT division) includes finite model
finding.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses
MiniSat
[ES04].
iProver accepts FOF and CNF formats.
Vampire
[HKV11,HK+12]
is used for proof-producing clausification of FOF problems as well as for
axiom selection
[HV11]
in the LTB division.
iProver is available from
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 1.0 is the CASC-J7 FNT division winner.
iProver 2.0
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen
[GK03,Kor13],
which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it
uses MiniSat
[ES04]
and optionally PicoSAT
[Bie08]
and Lingeling
[Bie12]
(only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution; see
[Kor08,Kor13]
for the implementation details.
The proof search is implemented using a saturation process based on
the given clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints
[GK04,Kor08];
global subsumption
[Kor08];
resolution-based simplifications and propositional-based simplifications.
A compressed feature vector index is used for efficient forward/backward
subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of
equality.
Recent changes in iProver include improved preprocessing and incremental
finite model finding; support of the AIG format for hardware verification
and model-checking (implemented by Dmitry Tsarkov).
In the LTB division, iProver uses axiom selection based on the SInE
algorithm
[HV11]
as implemented in Vampire
[HKV11],
i.e., axiom selection is done by Vampire and proof attempts are done by
iProver.
Some of iProver features are summarised below.
- proof extraction for both instantiation and resolution
[KS12]
- model representation, using first-order definitions in term algebra
[KS12]
- answer substitutions,
- semantic filtering
- incremental finite model finding,
- sort inference, monotonic
[CLS11]
and non-cyclic
[Kor13]
sorts.
- Brand's transformation.
Sort inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminated irrelevant clauses.
Proof extraction is challenging due to simplifications such global subsumption
which involve global reasoning with the whole clause set and can be
computationally expensive.
Strategies
iProver has around 60 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT solver, simplifications and options for combination of instantiation
with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
For the LTB and FNT divisions several strategies are run in parallel.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat
[ES04].
iProver accepts FOF and CNF formats.
Vampire
[HKV11,HK+12]
and E prover
[Sch13]
are used for proof-producing clausification of FOF problems,
Vampire is also used for axiom selection
[HV11]
in the LTB division.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
Compared to the last year, iProver had several improvements in datastructures
and finite model finding.
We expect similar performance in the EPR division and improved performance
on satisfiable problems (FNT division).
iProverModulo 0.7-0.3
Guillaume Burel
ENSIIE/Cedric, France
Architecture
iProverModulo
[Bur11]
is an extension of iProver
[Kor08]
to integrate Polarized resolution modulo
[Dow10].
Polarized resolution modulo consists in presenting the theory in which the
problem has to be solved by means of polarized rewriting rules instead of
axioms.
It can also be seen as a combination of the set-of-support strategy and
selection of literals.
iProverModulo consists of two tools: First, autotheo is a theory
preprocessor that converts the axioms of the input into rewriting
rules that can be used by Polarized resolution modulo. Second, these
rewriting rules are handled by a
patched version of iProver 0.7 which integrates Polarized resolution modulo.
The integration of polarized resolution modulo in iProver only affects
its ordered resolution calculus, so that the instantiation calculus is
untouched.
iProverModulo 0.7-0.3 outputs a proof that is made of two parts:
First, autotheo print a derivation of the transformation of the axioms
into rewriting rules.
This derivation is in TSTP format and includes the CNF conversions obtained
from E.
Second, the modified version of iProver outputs a Dedukti proof from this
rewriting rules and the non-axiom formulas, following the ideas of
[Bur13].
Strategies
Autotheo is first run to transform the formulas of the
problem whose role is "axiom" into polarized rewriting rules.
Autotheo offers a set of strategies to that purpose.
For the competition, the Equiv and the ClausalAll strategies
will be used.
The former strategy orients formulas intuitively depending of their
shape. It may be incomplete, so that the prover may give up in certain cases.
However, it shows interesting results on some problems.
The second strategy should be complete, at least when equality is not
involved.
The rewriting system for the first strategy is tried for half the time
given for the problem, then the prover is restarted with the second strategy
if no proof has been found.
The patched version of iProver is run on the remaining formulas
modulo the rewriting rules produced by autotheo. No scheduling is
performed. To be compatible with Polarized resolution modulo, literals
are selected only when they are maximal w.r.t. a KBO ordering, and orphans
are not eliminated. To take advantage of Polarized resolution modulo,
the resolution calculus is triggered more often than the instantiation
calculus, on the contrary to the original iProver.
Normalization of clauses w.r.t. the term rewriting system produced by
autotheo is performed by transforming these rules into an
OCaml program, compiling this program, and dynamically linking it with
the prover.
Implementation
iProverModulo is available as a patch to iProver.
The most important additions are the plugin-based normalization engine and
the handling of polarized rewriting rules.
iProverModulo is available from
http://www.ensiie.fr/~guillaume.burel/blackandwhite_iProverModulo.html.en
Since iProverModulo needs to compile rewriting rules, an OCaml
compiler is also provided.
Autotheo is available independently from iProverModulo from
http://www.ensiie.fr/~guillaume.burel/blackandwhite_autotheo.html.en
Autotheo uses E to compute clausal normal form of formula. The version
of E it uses is very slightly modified to make it print the CNF
derivation even if no proof is found.
Both of autotheo and iProver are written in OCaml.
Expected Competition Performance
The core of iProverModulo was untouched since last time it entered the
competition in CASC-24. However, compilation of rewriting rules failed
at the time, so a slight improvement is to be expected this year.
Isabelle 2015
Jasmin Blanchette
Inria Nancy, France
Architecture
Isabelle/HOL 2015
[NPW02]
is the higher-order logic incarnation of the generic proof assistant
Isabelle2015.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner
[Nip89],
a classical reasoner
[PN94],
and a tableau prover
[Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer
[PB10,BBP11].
Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due
to Nik Sultana. It also includes TPTP versions of its popular tools, invokable
on the command line as isabelle tptp_tool max_secs
file.p. For example:
isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p
Isabelle is available in two versions.
The HOT version (which is not participating in CASC-25) includes LEO-II
[BP+08]
and Satallax
[Bro12]
as Sledgehammer backends, whereas the competition version leaves them out.
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
- sledgehammer
- Invokes the following sequence of provers as oracles via Sledgehammer:
- satallax - Satallax 2.7
[Bro12] (HOT version only);
- leo2 - LEO-II 1.6.2
[BPTF08] (HOT version only);
- spass - SPASS 3.8ds
[BP+12];
- vampire - Vampire 3.0 (revision 1435)
[RV02];
- e - E 1.8
[Sch04];
- nitpick
- For problems involving only the type $o of Booleans, checks
whether a finite model exists using Nitpick
[BN10].
- simp
- Performs equational reasoning using rewrite rules
[Nip89].
- blast
- Searches for a proof using a fast untyped tableau prover and then
attempts to reconstruct the proof using Isabelle tactics
[Pau99].
- auto+spass
- Combines simplification and classical reasoning
[PN94]
under one roof; then invoke Sledgehammer with SPASS on any subgoals that
emerge.
- z3
- Invokes the SMT solver Z3 4.4.0
[dMB08].
- cvc4
- Invokes the SMT solver CVC4 1.5pre
[BT07].
- fast
- Searches for a proof using sequent-style reasoning, performing a
depth-first search
[PN94].
Unlike blast, it construct proofs directly in Isabelle.
That makes it slower but enables it to work in the presence of the more
unusual features of HOL, such as type classes and function unknowns.
- best
- Similar to fast, except that it performs a best-first search.
- force
- Similar to auto, but more exhaustive.
- meson
- Implements Loveland's MESON procedure
[Lov78].
Constructs proofs directly in Isabelle.
- fastforce
- Combines fast and force.
Implementation
Isabelle is a generic theorem prover written in Standard ML. Its meta-logic,
Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The
HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers. Other object
logics are available, notably FOL (first-order logic) and ZF
(Zermelo–Fraenkel set theory).
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype. Assuming the kernel is correct, all values of type theorem
are correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Thanks to the addition of CVC4 and a new version of Vampire,
Isabelle might have become now strong enough to take on Satallax
and its various declensions. But we expect Isabelle to end in
second or third place, to be honest.
leanCoP 2.2
Jens Otten
University of Potsdam, Germany
Architecture
leanCoP [OB03,Ott08] is an automated theorem prover for classical first-order
logic with equality.
It is a very compact implementation of the connection (tableau)
calculus [Bib87,LS01].
Strategies
The reduction rule of the connection calculus is applied before the
extension rule. Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is performed in order to achieve
completeness.
Additional inference rules and techniques include regularity, lemmata,
and restricted backtracking [Ott10].
leanCoP uses an optimized structure-preserving transformation
into clausal form [Ott10] and a fixed
strategy scheduling, which is controlled by a shell script.
Implementation
leanCoP is implemented in Prolog.
The source code of the core prover consists only of a few lines of code.
Prolog's built-in indexing mechanism is used to quickly find connections
when the extension rule is applied.
leanCoP can read formulae in leanCoP syntax and in
TPTP first-order syntax. Equality axioms and axioms to support
distinct objects are automatically added if required.
The leanCoP core prover returns a very compact connection proof,
which is then translated into a more comprehensive output format,
e.g., into a lean (TPTP-style) connection proof or into a
readable text proof.
The source code of leanCoP 2.2 is available under the GNU general
public license.
It can be downloaded from the leanCoP website at:
http://www.leancop.de
The website also contains information about ileanCoP [Ott08] and MleanCoP
[Ott12,Ott14], two versions of leanCoP for first-order intuitionistic logic
and first-order modal logic, respectively.
Expected Competition Performance
As the core prover has not changed, the performance of leanCoP 2.2
is expected to be similar to the performance of leanCoP 2.1.
LEO-II 1.6.2
Christoph Benzmüller
Freie Universität Berlin, Germany
Architecture
LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system
based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution [Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E [Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language [BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure [SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
http://www.leoprover.org
Expected Competition Performance
There are only minor recent improvements on LEO-II.
It is unclear whether they are sufficient for attacking LEO-II's main
competitors.
MaLARea 0.5
Josef Urban
Radboud University Nijmegen, The Netherlands
Architecture
MaLARea 0.5
[Urb07,
US+08]
is a metasystem for ATP in large theories where symbol and formula names are
used consistently.
It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace),
as well as complementary AI techniques like machine learning (the SNoW system)
based on symbol-based similarity, model-based similarity, term-based
similarity, and obviously previous successful proofs.
The version for CASC will mainly use E prover with the BliStr
[Urb13]
large-theory strategies, possibly also Prover9, Mace and Paradox.
The premise selection methods will likely also use the distance-weighted
k-nearest neighbor
[KU13]
and E's implementation of SInE.
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use
the most relevant axioms for next ATP attempts.
This is iterated, using different timelimits and axiom limits.
Various features are used for learning, and the learning is complemented
by other criteria like model-based reasoning, symbol and term-based
similarity, etc.
Implementation
The metasystem is implemented in ca. 2500 lines of Perl. It uses
many external programs - the above mentioned ATPs and machine learner,
TPTP utilities, LADR utilities for work with models, and some standard
Unix tools.
MaLARea is available from
https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2
Expected Competition Performance
Thanks to machine learning, MaLARea is strongest on batches of many
related problems with many redundant axioms where some of the problems
are easy to solve and can be used for learning the axiom
relevance.
MaLARea is not very good when all problems are too difficult (nothing to
learn from), or the problems (are few and) have nothing in common.
Some of its techniques (selection by symbol and term-based similarity,
model-based reasoning) could however make it even there slightly stronger
than standard ATPs.
MaLARea has a very good performance on the MPTP Challenge, which is a
predecessor of the LTB division, and it is the winner of the 2008 MZR LTB
category.
MaLARea 0.4 came second in the 2012 MZR@Turing competition and solved
most problems in the Assurance class.
Muscadet 4.5
Dominique Pastre
University Paris Descartes, France
Architecture
The Muscadet theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of [Ble71] and
[Pas78], and uses methods which resembles those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
Strategies
There are specific strategies for existential, universal, conjunctive or
disjunctive hypotheses and conclusions, and equalities.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one),
refutation (only if the conclusion is a negation), search for objects
satisfying the conclusion or dynamic building of new rules.
The system is also able to work with second order statements.
It may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and [Pas93].
These two possibilities are not used while working with the TPTP Library.
Implementation
Muscadet [Pas01] is implemented in SWI-Prolog.
Rules are written as more or less declarative Prolog clauses.
Metarules are written as sets of Prolog clauses.
The inference engine includes the Prolog interpreter and some procedural
Prolog clauses.
A theorem may be split into several subtheorems, structured as a tree with
"and" and "or" nodes.
All the proof search steps are memorized as facts including all the elements
which will be necessary to extract later the useful steps (the name of the
executed action or applied rule, the new facts added or rule dynamically
built, the antecedents and a brief explanation).
Muscadet is available from
http://www.normalesup.org/~pastre/muscadet/muscadet.html
Expected Competition Performance
The best performances of Muscadet will be for problems manipulating many
concepts in which all statements (conjectures, definitions, axioms) are
expressed in a manner similar to the practice of humans, especially of
mathematicians [Pas02,Pas07].
It will have poor performances for problems using few concepts but large
and deep formulas leading to many splittings.
Its best results will be in set theory, especially for functions and relations.
It has not been not really improved from previous years.
Nevertheless it is still maintained and slightly improved.
Its originality is that proofs are given in natural style.
It should now be combined with other provers.
Nitpick 2015
Jasmin Blanchette
Inria Nancy, France
Architecture
Nitpick
[BN10]
is an open source counterexample generator for Isabelle/HOL
[NPW02].
It builds on Kodkod
[TJ07],
a highly optimized first-order relational model finder based on SAT.
The name Nitpick is appropriated from a now retired Alloy precursor.
In a case study, it was applied successfully to a formalization of the C++
memory model
[BW+11].
Strategies
Nitpick employs Kodkod to find a finite model of the negated conjecture. The
translation from HOL to Kodkod's first-order relational logic (FORL) is
parameterized by the cardinalities of the atomic types occurring in it. Nitpick
enumerates the possible cardinalities for each atomic type, exploiting
monotonicity to prune the search space
[BK11].
If a formula has a finite counterexample, the tool eventually finds it,
unless it runs out of resources.
SAT solvers are particularly sensitive to the encoding of problems, so special
care is needed when translating HOL formulas.
As a rule, HOL scalars are mapped to FORL singletons and functions are mapped to
FORL relations accompanied by a constraint. More specifically,
an n-ary first-order function (curried or not) can be coded as an
(n + 1)-ary relation accompanied by a constraint. However, if the return
type is the type of Booleans, the function is more efficiently coded as an
unconstrained n-ary relation.
Higher-order quantification and functions bring complications of their own. A
function from σ to τ cannot be directly passed as an argument in FORL;
Nitpick's workaround is to pass |σ| arguments of type τ that encode a
function table.
Implementation
Nitpick, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle
itself, which adheres to the LCF small-kernel discipline, Nitpick does not
certify its results and must be trusted.
Nitpick is available as part of Isabelle/HOL for all major platforms under a
BSD-style license from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
Thanks to Kodkod's amazing power, we expect that Nitpick will beat both
Satallax and Refute with its hands tied behind its back.
Princess 20150706
Philipp Rümmer
Uppsala University, Sweden
Architecture
Princess
[Rue08,Rue12]
is a theorem prover for first-order logic modulo linear integer arithmetic.
The prover uses a combination of techniques from the areas of first-order
reasoning and SMT solving.
The main underlying calculus is a free-variable tableau calculus,
which is extended with constraints to enable backtracking-free proof
expansion, and positive unit hyper-resolution for lightweight
instantiation of quantified formulae. Linear integer arithmetic is
handled using a set of built-in proof rules resembling the Omega test,
which altogether yields a calculus that is complete for full
Presburger arithmetic, for first-order logic, and for a number of
further fragments. In addition, some built-in procedures for nonlinear
integer arithmetic are available.
The internal calculus of Princess only supports uninterpreted
predicates; uninterpreted functions are encoded as predicates,
together with the usual axioms. Through appropriate translation of
quantified formulae with functions, the e-matching technique common in
SMT solvers can be simulated; triggers in quantified formulae are
chosen based on heuristics similar to those in the Simplify prover.
Strategies
For CASC, Princess will run a fixed schedule of configurations for
each problem (portfolio method). Configurations determine, among
others, the mode of proof expansion (depth-first, breadth-first),
selection of triggers in quantified formulae, clausification, and the
handling of functions. The portfolio was chosen based on training
with a random sample of problems from the TPTP library.
Implementation
Princess is entirely written in Scala and runs on any recent Java
virtual machine; besides the standard Scala and Java libraries, only
the Cup parser library is used.
Princess is available from
http://www.philipp.ruemmer.org/princess.shtml
Expected Competition Performance
Princess is mainly designed for integer problems (TFI), and should
perform reasonably well here.
Compared to last year, only minor updates were done, so that performance
should be similar as in 2014. This version of Princess
does not output proofs, however, and will be ranked accordingly.
Refute 2015
Jasmin Blanchette
Inria Nancy, France
Architecture
Refute
[Web08]
is an open source counterexample generator for Isabelle/HOL
[NPW02] based on a
SAT solver, and Nitpick's
[BN10]
precursor.
Strategies
Refute employs a SAT solver to find a finite model of the negated conjecture.
The translation from HOL to propositional logic is parameterized by the
cardinalities of the atomic types occurring in the conjecture. Refute enumerates
the possible cardinalities for each atomic type. If a formula has a finite
counterexample, the tool eventually finds it, unless it runs out of resources.
Implementation
Refute, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle
itself, which adheres to the LCF small-kernel discipline, Refute does not
certify its results and must be trusted.
Refute is available as part of Isabelle/HOL for all major platforms under a
BSD-style license from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
We expect Refute to beat Satallax but also to be beaten by Nitpick.
Satallax 2.8
Nik Sultana
Cambridge University, United Kingdom
Architecture
Satallax 2.8
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the search for a proof.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set
of propositional clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Additionally, Satallax may optionally generate first-order formulas in
addition to the propositional clauses.
If this option is used, then Satallax peroidically calls the first-order
theorem prover E to test for first-order unsatisfiability.
If the set of first-order formulas is unsatisfiable, then the original branch
is unsatisfiable.
Strategies
There are about a hundred flags that control the order in which formulas and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 2.7 has strategy schedule consisting of 68 modes.
Each mode is tried for time limits ranging from 0.1 seconds to 54.9 seconds.
The strategy schedule was determined through experimentation using the THF
problems in version 5.4.0 of the TPTP library.
Implementation
Satallax is implemented in OCaml.
A foreign function interface is used to interact with MiniSat 2.2.0.
Satallax is available from
http://mathgate.info/cebrown/satallax/
http://mathgate.info/cebrown/satallax/
Expected Competition Performance
Similar to last year, since the changes from v2.7 to v2.8 consist of
improvements in proof output.
Satallax-MaLeS 1.3
Daniel Kuehlwein
Radboud University Nijmegen, The Netherlands
Architecture
Satallax-MaLeS 1.3 improves Satallax's automatic mode with machine learning
techniques to predict which search strategy is most likely to find a proof.
Strategies
Satallax-MaLeS 1.3 relies on Geoff Sutcliffe's MakeListStat to classify
problems.
It uses the same data as Satallax_MaLeS 1.2 as basis for the learning
algorithms.
Implementation
Satallax-MaLeS is based on Python, in particular the Sklearn library that
contains many machine learning algorithms.
After CASC Satallax-MaLeS will be available from
http://www.cs.ru.nl/~kuehlwein/
Expected Competition Performance
Satallax-MaLes 1.3 is the CASC-J7 THF division winner.
SPASS+T 2.2.22
Uwe Waldmann
Max-Planck-Institut für Informatik, Germany
Architecture
SPASS+T is an extension of the superposition-based theorem prover SPASS
that integrates algebraic knowledge into SPASS in three complementary ways:
by passing derived formulas to an external SMT procedure
(currently Yices or CVC3), by adding standard axioms,
and by built-in arithmetic simplification and inference rules.
A first version of the system has been described in
[PW06];
later a much more sophisticated coupling of the SMT procedure has been
added [Zim07]. The latest version fixes some bugs related to
memory management and to name clashes with the SMT procedure
and provides improved support for non-arithmetic problems.
Strategies
Standard axioms and built-in arithmetic simplification and inference rules
are integrated into the standard main loop of SPASS.
Inferences between standard axioms are excluded, so the
user-supplied formulas are taken as set of support.
The external SMT procedure runs in parallel in a separate process,
leading occasionally to non-deterministic behaviour.
Implementation
SPASS+T is implemented in C. The system is available from
http://people.mpi-inf.mpg.de/~uwe/software/#TSPASS
Expected Competition Performance
We expect SPASS+T to be among the leading systems in the TFA division
at CASC-25.
Vampire 2.6
Krystof Hoder, Andrei Voronkov
University of Manchester, England
Architecture
Vampire 2.6 is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
It also implements the Inst-gen calculus.
The splitting rule in kernel adds propositional parts to clauses, which are
being manipulated using binary decision diagrams and a SAT solver.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Although the kernel of the system works only with clausal normal form, the
shell accepts a problem in the full first-order logic syntax, clausifies it
and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm Sine [HV11] can be enabled as part of
the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
The Vampire 2.6 kernel provides a fairly large number of options for strategy se
lection. The most important ones are:
- Choice of the main procedure:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Goal oriented mode based on tabulation
- Instantiation using the Inst-gen calculus
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are
preferred for inference selection.
- Set-of-support strategy.
Implementation
Vampire 2.6 is implemented in C++.
Expected Competition Performance
Vampire 2.6 is the CASC-J7 FOF division winner.
Vampire 4.0
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire 4.0 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder.
Splitting in resolution-based proof search is controlled by the AVATAR
architecture, which uses a SAT solver to make splitting decisions.
Both resolution and instantiation based proof search make use of global
subsumption.
A number of standard redundancy criteria and simplification techniques are used
for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
Sine axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.0 provides a very large number of options for strategy selection. The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Evaluation of interpreted functions.
- Extensionality resolution with detection of extensionality axioms
Implementation
Vampire 4.0 is implemented in C++.
Expected Competition Performance
Vampire 4.0 should be an improvement on Vampire 2.6, which is the CASC-J7
FOF division winner.
Improvements have also been made to the parts of the prover relevant to
satisfiability checking and theory reasoning.
VampireZ3 1.0
Giles Reger
University of Manchester, United Kingdom
Architecture
VampireZ3 version 1.0 is a combination of Vampire version 4.0 (see
other description) and Z3 version 4.3.1.
Vampire 4.0 uses the AVATAR architecture to make splitting decisions.
Briefly, the first-order search space is represented in the SAT solver
with propositional symbols consistently naming variable-disjoint components.
A SAT solver is then used to (iteratively) select a subset of components to
search.
In VampireZ3 the Z3 SMT solver is used in place of a SAT solver and
ground components are translated into Z3 terms.
This means Z3’s efficient methods for ground reasoning with equality and
theories are exposed by AVATAR, as the SMT solver only produces
theory-consistent models.
Strategies
All strategies of Vampire 4.0 are available. Z3 is only used when splitting
is selected.
Implementation
Vampire and Z3 are both implemented in C++.
Expected Competition Performance
VampireZ3 1.0 should perform better than Vampire 4.0 on theory problems.
The combination of Z3 for reasoning with ground theories and Vampire for
reasoning with quantifiers should be very powerful.
ZenonArith 0.1.0
Guillaume Bury
Inria, France
Architecture
Zenon Arith 0.1.0
[BD15]
is an extension of the tableaux-based Zenon
[BDD07], to linear arithmetic.
This extension uses the simplex algorithm as a decision procedure to solve
problems over rationals, and a branch-and-bound approach to solve problems
over integers.
This extension is also able to handle real linear arithmetic, which actually
coincides with the rational fragment of linear arithmetic due to the
syntactical restrictions of the TPTP input format for reals.
Strategies
This extension consists of a smooth integration of arithmetic deductive rules
to the basic tableau rules, so that there is a natural interleaving between
arithmetic and regular analytic rules.
This extension is able to deal with problems that are (exclusively)
universally quantified, or existentially quantified formulas.
Universally quantified formulas are handled by translating unsatisfiability
explanation from the simplex algorithm into inference rules.
Instantiation for existentially quantified formulas is achieved by trying to
solve sets of formulas that cover all branches of a derivation tree.
Implementation
Zenon Arith is implemented in OCaml, and uses the Zarith (which is a binding
for the gmp library) library to handle arbitrary precision integers and
rationals.
It uses an incremental implementation of the simplex algorithm in OCaml as
a decision procedure over linear arithmetic.
This version of Zenon comes with a built-in notion of types for terms, and
inference rules that discriminate over the types of expressions.
Zenon Arith can automatically output Coq
[CH88]
proof scripts, that can then be checked by the Coq proof assistant.
The main developper is Guillaume Bury.
Zenon Arith is available at:
https://www.rocq.inria.fr/deducteam/ZenonArith/
Expected Competition Performance
We do not expect to be among the best provers in terms of number of solved
problems, but we aim to solve a large part of the problems, with the lowest
average CPU time.