Beagle 0.9.47
Peter Baumgartner
Data61, Australia
Architecture
Beagle
[BBW15]
is an automated theorem prover for sorted first-order logic with equality over
built-in theories.
The theories currently supported are integer arithmetic, linear rational
arithmetic and linear real arithmetic.
It accepts formulas in the FOF and TFF formats of the TPTP syntax, and
formulas in the SMT-LIB version 2 format.
Beagle first converts the input formulas into clause normal form. Pure arithmetic (sub-)formulas are treated by eager application of quantifier elimination. The core reasoning component implements the Hierarchic Superposition Calculus with Weak Abstraction (HSPWA) [BW13]. Extensions are a splitting rule for clauses that can be divided into variable disjoint parts, and a partial instantiation rule for variables with finite domain, and two kinds of background-sorted variables trading off completeness vs. search space.
The HSPWA calculus generalizes the superposition calculus by integrating theory reasoning in a black-box style. For the theories mentioned above, Beagle combines quantifier elimination procedures and other solvers to dispatch proof obligations over these theories. The default solvers are an improved version of Cooper's algorithm for linear integer arithmetic, and the CVC4 SMT solver for linear real/rational arithmetic. Non-linear integer arithmetic is treated by partial instantiation and additional lemmas.
Beagle uses strategy scheduling by trying (at most) three flag settings sequentially.
https://bitbucket.org/peba123/beagle
CVC4 1.5
Andrew Reynolds
EPFL, Switzerland
Architecture
CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that
includes built-in support for many theories, including linear arithmetic,
arrays, bit vectors, datatypes and strings.
It incorporates approaches for handling universally quantified formulas.
CVC4 primarily uses heuristic approaches based on E-matching for theorems,
and finite model finding approaches for non-theorems.
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach. First, quantified formulas are replaced by fresh Boolean
predicates and the ground theory solver(s) are used in conjunction with the
underlying SAT solver to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable",
or return unknown.
Finite model finding in CVC4 targets problems containing background
theories whose quantification is limited to finite and uninterpreted sorts.
In finite model finding mode, CVC4 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in [RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
It then adds instances of quantified
formulas that are in conflict with the candidate model, as described in
[RT+13].
If no instances are added, it reports "satisfiable".
https://github.com/CVC4
CVC4 1.5
Andrew Reynolds
University of Iowa, USA
Architecture
CVC4
[BC+11]
is an SMT solver based on the DPLL(T) architecture
[NOT06]
that includes built-in support for many theories, including linear arithmetic,
arrays, bit vectors, datatypes, finite sets and strings.
It incorporates approaches for handling universally quantified formulas.
For problems involving free function and predicate symbols, CVC4 primarily
uses heuristic approaches based on E-matching for theorems, and finite model
finding approaches for non-theorems.
For problems in pure arithmetic, CVC4 uses techniques for counterexample-guided
quantifier instantiation
[RD+15].
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in CVC4 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".
https://github.com/CVC4
E 2.0
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.0
[Sch02,
Sch13]
is a purely equational theorem prover for many-sorted first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a
number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e.,
a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
For the LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will not use on-the-fly learning this year.
For CASC-J8, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, ...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.
About 210 different strategies have been evaluated on all untyped first-order problems from TPTP 6.0.0, and about 180 of these strategies are used in the automatic mode.
http://www.eprover.org
Geo-III 2016C
Hans de Nivelle
University of Wrocław, Poland
Architecture
Geo III is a theorem prover for Partial Classical Logic, based on reduction to
Kleene Logic
[deN14].
Currently, only Chapters 4 and 5 are implemented.
Since Kleene logic generalizes 2-valued logic, Geo III can take part in CASC.
Apart from being 3-valued, the main differences with earlier versions of Geo
are (1) more sophisticated learning schemes, (2) improved proof
logging, and (3) replacement of recursion by explicit use of a stack.
The Geo family of provers uses exhaustive backtracking, combined with
learning after failure.
Earlier versions learned only conflict formulas.
Geo III learns disjunctions of arbitrary width.
Experiments show that this often results in shorter proofs.
If Geo will be ever embedded in proof assistants, these assistants will require proofs. In order to be able to provide these at the required level of detail, Geo III contains a hierarchy of proof rules that is independent of the rest of the system, and that can be modified independently. In order to be flexible in the main loop, recursion was replaced by an explicit stack. Using an explicit stack, it is easier to remove unused assumptions, or to rearrange the order of assumptions. Also, restarts are easier to implement with a stack.
iProver 2.5
Kontantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen
[GK03,
Kor13]
which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it
uses MiniSat
[ES04]
and optionally PicoSAT
[Bie08]
(only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution; see
[Kor08,
Kor13]
for the implementation details.
The proof search is implemented using a saturation process based on
the given clause algorithm. iProver uses non-perfect discrimination trees
for the unification indexes, priority queues for passive clauses, and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints
[GK04,
Kor08];
global subsumption
[Kor08];
resolution-based simplifications and propositional-based simplifications.
A compressed feature vector index is used for efficient forward/backward
subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality.
Recent changes in iProver include improved preprocessing and incremental
finite model finding; support of the AIG format for hardware verification and
model-checking (implemented with Dmitry Tsarkov).
In the LTB division, iProver uses axiom selection based on the Sine algorithm [HV11] as implemented in Vampire [KV13], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.
Some of iProver features are summarised below.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Isabelle 2015
Jasmin Blanchette
Inria Nancy, France
Architecture
Isabelle/HOL 2015
[NPW02]
is the higher-order logic incarnation of the generic proof assistant
Isabelle2015.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner
[Nip89],
a classical reasoner
[PN94],
and a tableau prover
[Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer
[PB10,BBP11].
Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due
to Nik Sultana. It also includes TPTP versions of its popular tools, invokable
on the command line as isabelle tptp_tool max_secs
file.p. For example:
Isabelle is available in two versions. The HOT version (which is not participating in CASC-J8) includes LEO-II [BP+08] and Satallax [Bro12] as Sledgehammer backends, whereas the competition version leaves them out.isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p
The implementation of Isabelle relies on a small LCF-style kernel, meaning that inferences are implemented as operations on an abstract theorem datatype. Assuming the kernel is correct, all values of type theorem are correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the University of Cambridge and the Technische Universität München. Isabelle/HOL is available for all major platforms under a BSD-style license from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
leanCoP 2.2
Jens Otten
University of Potsdam, Germany
Architecture
leanCoP
[OB03,
Ott08]
is an automated theorem prover for classical first-order logic with equality.
It is a very compact implementation of the connection (tableau) calculus
[Bib87,
LS01].
leanCoP can read formulae in leanCoP syntax and in TPTP first-order syntax. Equality axioms and axioms to support distinct objects are automatically added if required. The leanCoP core prover returns a very compact connection proof, which is then translated into a more comprehensive output format, e.g., into a lean (TPTP-style) connection proof or into a readable text proof.
The source code of leanCoP 2.2 is available under the GNU general public license. It can be downloaded from the leanCoP website at:
http://www.leancop.deThe website also contains information about ileanCoP [Ott08] and MleanCoP [Ott12, Ott14], two versions of leanCoP for first-order intuitionistic logic and first-order modal logic, respectively.
LEO-II 1.7.0
Max Wisniewski
Freie Universität Berlin, Germany
Architecture
LEO-II
[BP+08],
the successor of LEO
[BK98],
is a higher-order ATP system based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution
[Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E
[Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is available from
http://www.leoprover.org
Leo-III 1.0
Max Wisniewski
Freie Universität Berlin, Germany
Architecture
Leo-III
[WSB14],
the successor of LEO-II
[BP+08],
is a higher-order ATP system based on ordered higher-order paramodulation
employing an agent-based blackboard architecture.
In its first version, Leo-III is using multiple, adapted sequential DISCOUNT
loops, each with different search strategies.
In addition, similar to LEO-II, each sequential loop will call non-blockingly
an external ATP every n iterations of the sequential loop.
In the current version, the called ATPs have to understand THF syntax and
return the result in standard SZS format.
In the competition mode only our own prover LEO-II will be used as a
cooperation prover.
If either one of the paramodulation loops or one of the external provers
finds a proof, the system stops and returns the result.
Ultimately, Leo-III is in its first version an enhancement of LEO-II. The main improvement in comparison to its predecessor is the better equational handling with the new calculus, and the multi-search of the agent architecture.
https://github.com/cbenzmueller/Leo-III
Leo+III 1.0
Max Wisniewski
Freie Universität Berlin, Germany
Architecture
Leo-III
[WSB14],
the successor of LEO-II
[BP+08],
is a higher-order ATP system based on ordered higher-order paramodulation
employing an agent-based blackboard architecture.
In its first version, Leo-III is using multiple, adapted sequential DISCOUNT
loops, each with different search strategies.
In addition, similar to LEO-II, each sequential loop will call non-blockingly
an external ATP every n iterations of the sequential loop.
In the current version, the called ATPs have to understand THF syntax and
return the result in standard SZS format.
In the competition mode only our own prover LEO-II will be used as a
cooperation prover.
If either one of the paramodulation loops or one of the external provers
finds a proof, the system stops and returns the result.
Leo+III is Leo-III 'plus' Satallax as a subprover.
Ultimately, Leo-III is in its first version an enhancement of LEO-II. The main improvement in comparison to its predecessor is the better equational handling with the new calculus, and the multi-search of the agent architecture.
https://github.com/cbenzmueller/Leo-III
Nitpick 2015
Jasmin Blanchette
Inria Nancy, France
Architecture
Nitpick
[BN10]
is an open source counterexample generator for Isabelle/HOL
[NPW02].
It builds on Kodkod
[TJ07],
a highly optimized first-order relational model finder based on SAT.
The name Nitpick is appropriated from a now retired Alloy precursor.
In a case study, it was applied successfully to a formalization of the C++
memory model
[BW+11].
SAT solvers are particularly sensitive to the encoding of problems, so special care is needed when translating HOL formulas. As a rule, HOL scalars are mapped to FORL singletons and functions are mapped to FORL relations accompanied by a constraint. More specifically, an n-ary first-order function (curried or not) can be coded as an (n + 1)-ary relation accompanied by a constraint. However, if the return type is the type of Booleans, the function is more efficiently coded as an unconstrained n-ary relation. Higher-order quantification and functions bring complications of their own. A function from σ to τ cannot be directly passed as an argument in FORL; Nitpick's workaround is to pass |σ| arguments of type τ that encode a function table.
Nitpick is available as part of Isabelle/HOL for all major platforms under a BSD-style license from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Princess 160606
Philipp Rümmer
Uppsala University, Sweden
Architecture
Princess
[Rue08,
Rue12]
is a theorem prover for first-order logic modulo linear integer arithmetic.
The prover uses a combination of techniques from the areas of first-order
reasoning and SMT solving.
The main underlying calculus is a free-variable tableau calculus,
which is extended with constraints to enable backtracking-free proof
expansion, and positive unit hyper-resolution for lightweight
instantiation of quantified formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for a
number of further fragments.
In addition, some built-in procedures for nonlinear integer arithmetic are
available.
The internal calculus of Princess only supports uninterpreted predicates; uninterpreted functions are encoded as predicates, together with the usual axioms. Through appropriate translation of quantified formulae with functions, the e-matching technique common in SMT solvers can be simulated; triggers in quantified formulae are chosen based on heuristics similar to those in the Simplify prover.
http://www.philipp.ruemmer.org/princess.shtml
Prover9 2009-11A
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.
Completeness is not guaranteed, so termination does not indicate satisfiability.
Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.
http://www.cs.unm.edu/~mccune/prover9/
Refute 2015
Jasmin Blanchette
Inria Nancy, France
Architecture
Refute
[Web08]
is an open source counterexample generator for Isabelle/HOL
[NPW02] based on a
SAT solver, and Nitpick's
[BN10]
precursor.
Refute employs a SAT solver to find a finite model of the negated conjecture. The translation from HOL to propositional logic is parameterized by the cardinalities of the atomic types occurring in the conjecture. Refute enumerates the possible cardinalities for each atomic type. If a formula has a finite counterexample, the tool eventually finds it, unless it runs out of resources.
Refute is available as part of Isabelle/HOL for all major platforms under a BSD-style license from
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Satallax 2.8
Nik Sultana
Cambridge University, United Kingdom
Architecture
Satallax 2.8
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the search for a proof.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set
of propositional clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Additionally, Satallax may optionally generate first-order formulas in
addition to the propositional clauses.
If this option is used, then Satallax peroidically calls the first-order
theorem prover E to test for first-order unsatisfiability.
If the set of first-order formulas is unsatisfiable, then the original branch
is unsatisfiable.
http://mathgate.info/cebrown/satallax/
Satallax 3.0
Michael Färber
Universität Innsbruck, Austria
Architecture
Satallax 3.0
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates first-order formulae in addition to the propositional clauses. If this option is used, then Satallax periodically calls the first-order theorem prover E to test for first-order unsatisfiability. If the set of first-order formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.
http://satallaxprover.com
A number of standard redundancy criteria and simplification techniques are used
for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
Sine axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Vampire 4.0
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire 4.0 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus and a MACE-style finite model builder.
Splitting in resolution-based proof search is controlled by the AVATAR
architecture, which uses a SAT solver to make splitting decisions.
Both resolution and instantiation based proof search make use of global
subsumption.
Strategies
Vampire 4.0 provides a very large number of options for strategy selection. The most important ones are:
Implementation
Vampire 4.0 is implemented in C++.
Expected Competition Performance
Vampire 4.0 is the CASC-25 FOF, FNT, EPR, and LTB division winner.
Vampire 4.1
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13]
4.1 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-gen calculus
[Kor13]
and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture
[Vor14]
which uses a SAT or SMT solver to make splitting decisions.
Both resolution and instantiation based proof search make use of global
subsumption
[Kor13].
Strategies
Vampire 4.1 provides a very large number of options for strategy selection.
The most important ones are:
Implementation
Vampire 4.1 is implemented in C++.
Expected Competition Performance
Vampire 4.1 should be an improvement on Vampire 4.0 and VampireZ3 1.0, which
won 5 divisions between them last year.
Note that this year there is not a seperate VampireZ3 entry as Vampire 4.1
includes Z3.