Entrants' System Descriptions


Beagle 0.9.22

Peter Baumgartner
NICTA and Australian National University, Australia

Architecture

Beagle is an automated theorem prover for sorted first-order logic with equality over built-in theories. The theories currently supported are integer arithmetic, linear rational arithmetic and linear real arithmetic. It accepts formulas in the FOF and TFF formats of the TPTP syntax, and formulas in the SMT-LIB version 2 format.

Beagle first converts the input formulas into clause normal form. Pure arithmetic (sub-)formulas are treated by eager application of quantifier elimination. The core reasoning component implements the Hierarchic Superposition Calculus with Weak Abstraction (HSPWA) [BW13]. Extensions are a splitting rule for clauses that can be divided into variable disjoint parts, and a chaining inference rule for reasoning with inequalities.

The HSPWA calculus generalizes the superposition calculus by integrating theory reasoning in a black-box style. For the theories mentioned above, Beagle combines quantifier elimination procedures and other solvers to dispatch proof obligations over these theories. The default solvers are an improved version of Cooper's algorithm for linear integer arithmetic, and the Fourier-Motzkin algorithm for linear real/rational arithmetic. Non-linear integer arithmetic is treated by partial instantiation.

Strategies

Beagles uses the Discount loop for saturating a clause set under the calculus' inference rules. Simplification techniques include standard ones, such as subsumption deletion, demodulation by ordered unit equations, and tautology deletion. It also includes theory specific simplification rules for evaluating ground (sub)terms, and for exploiting cancellation laws and properties of neutral elements, among others. In the competition an aggressive form of arithmetic simplification is used, which seems to perform best in practice.

Beagle uses strategy scheduling by trying (at most) two flag settings sequentially.

Implementation

Beagle is implemented in Scala. It is a full implementation of the HSPWA calculus. It uses a simple form of indexing, essentially top-symbol hashes, stored with each term and computed in a lazy way. Fairness is achieved through a combination of measuring clause weights and their derivation-age. It can be fine-tuned with a weight-age ratio parameter, as in other provers.

Beagle's web site is

    https://bitbucket.org/peba123/beagle

Expected Competition Performance

Beagle is implemented in a straightforward way and would benefit from optimized data structures. We do not expect it to come in among the first.


CVC4 1.4

Andrew Reynolds
EPFL, Switzerland

Architecture

CVC4 [
BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that includes built-in support for many theories including linear arithmetic, arrays, bit vectors, datatypes and strings. It incorporates various approaches for handling universally quantified formulas. In particular, CVC4 uses primarily uses heuristic approaches based on E-matching for answering "unsatisfiable", and finite model finding approaches for answering "satisfiable".

Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown.

The finite model finding has been developed to target problems containing background theories, whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. Quantifier instantiation strategies are then invoked to add instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, then the solver reports "satisfiable".

Strategies

For handling theorems, CVC4 primarily uses various configurations of E-matching. This year, CVC4 incorporates new methods for finding conflicting instances of quantified formulas [RTd14], which have been shown to lead to improved performance on unsatisfiable TPTP benchmarks. CVC4 also incorporates a model-based heuristic for handling quantified formulas containing only pure arithmetic, which will be used in the TFA division.

For handling non-theorems, the finite model finding feature of CVC4 will use a number of orthogonal quantifier instantiation strategies. This year, it will incorporate several new features, including an optimized implementation of model-based quantifier instantiation which improves upon [RT+13], as well as techniques for sort inference. Since CVC4 with finite model finding is also capable of answering "unsatisfiable", it will be used as a strategy for theorems as well.

Implementation

CVC4 is implemented in C++. The code is available from
    https://github.com/CVC4

Expected Competition Performance

CVC4 1.4 is the CASC-J7 TFA division winner.


CVC4 1.5

Andrew Reynolds
EPFL, Switzerland

Architecture

CVC4 [
BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that includes built-in support for many theories, including linear arithmetic, arrays, bit vectors, datatypes and strings. It incorporates approaches for handling universally quantified formulas. CVC4 primarily uses heuristic approaches based on E-matching for theorems, and finite model finding approaches for non-theorems. Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in CVC4 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".

Strategies

For handling theorems, CVC4 primarily uses configurations that combine conflict-based quantifier instantiation [RTd14] and E-matching. CVC4 uses a handful of orthogonal trigger selection strategies for E-matching. For handling non-theorems, CVC4 primarily uses finite model finding techniques. These techniques can also be used for bounded integer quantification for non-theorems involving arithmetic [Rey13]. Since CVC4 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well. For problems in pure arithmetic, CVC4 uses techniques for counterexample-guided quantifier instantiation [RD+15], which select relevant quantifier instantiations based on models for counterexamples to quantified formulas. CVC4 relies on this method both for theorems in TFA and non-theorems in TFN.

Implementation

CVC4 is implemented in C++. The code is available from
    https://github.com/CVC4

Expected Competition Performance

CVC4 has undergone various performance improvements in the past year. It is expected to perform better than last year in FOF, due to improved trigger selection for E-matching, and higher reliance upon conflict-based quantifier instantiation. It is expected to perform better than last year in TFA, due to a revised implementation of counterexample-guided quantifier instantiation, and improvements to E-matching. It is expected to be competitive in TFN.


E 1.9.1

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E [
Sch02, Sch13] is a purely equational theorem prover for full first-order logic with equality. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution. For the LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will not use the on-the-fly learning introduced this year.

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

For CASC-25, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into 8 (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses,...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.

About 210 different strategies have been evaluated on all untyped first-order problems from TPTP 6.0.0, and about 180 of these strategies are used in the automatic mode. A few new strategies may be added.

Implementation

E is build around perfectly shared terms, i.e. each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) is supported using feature vector indexing [Sch04]. Superposition and backward rewriting use fingerprint indexing [Sch12], a new technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loe06, Loe06]. The prover and additional information are available at
    http://www.eprover.org

Expected Competition Performance

E 1.9.1 has only seen minor changes and inconsequential bug fixes compared to last years version. It can produce proof objects quite efficiently. The system is expected to perform reasonably well in most proof classes, but will at best complement top systems in the disproof classes.


ePrincess 1.0

Peter Backeman
Uppsala University, Sweden

Architecture

Princess [
Rue08,Rue12] is a theorem prover for first-order logic modulo linear integer arithmetic. The prover uses a combination of techniques from the areas of first-order reasoning and SMT solving. The main underlying calculus is a free-variable tableau calculus, which is extended with constraints to enable backtracking-free proof expansion, and positive unit hyper-resolution for lightweight instantiation of quantified formulae. Linear integer arithmetic is handled using a set of built-in proof rules resembling the Omega test, which altogether yields a calculus that is complete for full Presburger arithmetic, for first-order logic, and for a number of further fragments. In addition, some built-in procedures for nonlinear integer arithmetic are available. The internal calculus of Princess only supports uninterpreted predicates; uninterpreted functions are encoded as predicates, together with the usual axioms. Through appropriate translation of quantified formulae with functions, the e-matching technique common in SMT solvers can be simulated; triggers in quantified formulae are chosen based on heuristics similar to those in the Simplify prover.

ePrincess is an extension of Princess using a calculus presented in [BR15a] to handle first-order formulas with equality. A restriced version of simultaneous rigid e-unification is utilised to handle equalities with free variables. The solving procedures presented in [BR15b] have been implemented and constitue the foundation of the unification step.

Strategies

ePrincess is using the same strategies as Princess, which means for CASC, ePrincess will run a fixed schedule of configurations for each problem (portfolio method). Configurations determine, among others, the mode of proof expansion (depth-first, breadth-first), selection of triggers in quantified formulae, clausification, and the handling of functions. The portfolio was chosen based on training with a random sample of problems from the TPTP library.

Implementation

ePrincess is entirely written in Scala and runs on any recent Java virtual machine; besides the standard Scala and Java libraries, only the Cup parser library is used. ePrincess is available from
    
    http://user.it.uu.se/~petba168/breu/

Expected Competition Performance

ePrincess is mainly designed for first order formula (FOF), and based on performance measurements it should perform decently dealing with problems containing equality.


ET 0.2

Josef Urban
Radboud University Nijmegen, The Netherlands

Architecture

E.T. 0.1 is a metasystem using E prover with specific strategies [
Urb13,KU13] and preprocessing tools [KU13,KU13,KU13] that are targeted mainly at problems with many redundant axioms. Its design is motivated by the recent experiments in the Large-Theory Batch division [KUV14] and on the Flyspeck, Mizar and Isabelle datasets, however, E.T. does no learning from related proofs.

Strategies

We characterize formulas by the symbols and terms that they contain, normalized in various ways. Then we run various algorithms that try to remove the redundant axioms and use special strategies on such problems.

Implementation

The metasystem is implemented in ca. 1000 lines of Perl. It uses a number of external programs, some of them based on E's code base, some of them independently implemented in C++.

Expected Competition Performance

ET 0.2 is expected to be slightly better than ET 0.1, provided I do not do too much windsurfing.


Geo-III 2015E

Hans de Nivelle
University of Wroclaw, Poland

Architecture

Geo III is a theorem prover for Partial Classical Logic [
deN11], based on reduction to Kleene Logic [deN14]. Currently, only Sections 4 and 5 of [deN14] are implemented. Since Kleene logic generalizes 2-valued logic, it is still possible to take part in CASC. Apart from being 3-valued, the main differences with earlier versions of Geo are (1) more sophisticated learning schemes, (2) improved proof logging, and (3) replacement of recursion by explicit use of a stack.
  1. The Geo family of provers uses exhaustive backtracking, combined with learning after failure. Earlier versions learned only conflict formulas. Geo III learns disjunctions of arbitrary width. Experiments show that this often results in shorter proofs.
  2. If Geo will be ever embedded in proof assistants, these assistants will require proofs. In order to be able to provide these at the required level of detail, Geo III contains a hierarchy of proof rules that is independent of the rest of the system, and that can be modified independently.
  3. In order to be flexible in the main loop, recursion was replaced by an explicit stack. Using an explicit stack, it is easier to remove unused assumptions, or to rearrange the order of assumptions. Also, restarts are easier to implement with a stack.

Strategies

Geo uses breadth-first, exhaustive model search, combined with learning. In case of branching, the branches are explored in random order. Specially for CASC, a restart strategy was added, which ensures that proof search is always restarted after 4 minutes. This was done because Geo III has no indexing. After some time, proof search becomes so inefficient that it makes no sense to continue, so that it is better to restart.

Implementation

Geo III is written in C++ 11. No features outside of the standard were used. It has been tested with g++ (version 4.8.4) and with clang. Version 2015E contains almost no technical optimizations, because the calculus (the geometric format, the learning schemes, and the restart strategies) are not yet mature.

Expected Competition Performance

Since we didn't have time to test Geo extensively, we are unable to make any predictions. We hope to learn from CASC about the relative strengths of Geo.


iProver 0.9

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [
GK03,Kor09] which is complete for first-order logic. One of the distinctive features of iProver is a modular combination of first-order reasoning with ground reasoning. In particular, iProver currently integrates MiniSat [ES04] for reasoning with ground abstractions of first-order clauses. In addition to instantiation, iProver implements ordered resolution calculus and a combination of instantiation and ordered resolution; see [Kor08] for the implementation details. The saturation process is implemented as a modification of a given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed vector index for subsumption and subsumption resolution (both forward and backward). The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04,Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. A compressed feature vector index is used for efficient forward/backward subsumption and subsumption resolution. Equality is dealt with (internally) by adding the necessary axioms of equality with an option of using Brand's transformation. In the LTB division, iProver-SInE uses axiom selection based on the SInE algorithm [HV11] as implemented in Vampire [HKV10], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Major additions in the current version are:

Strategies

iProver has around 40 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth.

Implementation

iProver is implemented in OCaml and for the ground reasoning uses MiniSat. iProver accepts FOF and CNF formats, where Vampire [HKV10] is used for clausification of FOF problems.

iProver is available from

    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

iProver 0.9 is the CASC-J7 EPR division winner.


iProver 1.0

Konstantin Korovin, Christoph Sticksel
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [
GK03,Kor13] , which is complete for first-order logic. iProver combines first-order reasoning with ground reasoning for which it uses MiniSat [ES04] and was recently extended with PicoSAT [Bie08] and Lingeling [Bie12] (only MiniSat will be used at this CASC). iProver also combines instantiation with ordered resolution; see [Kor08] for the implementation details. The proof search is implemented using a saturation process based on the given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed vector index for subsumption and subsumption resolution (both forward and backward). The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04,Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. A compressed feature vector index is used for efficient forward/backward subsumption and subsumption resolution. Equality is dealt with (internally) by adding the necessary axioms of equality with an option of using Brand's transformation. In the LTB division, iProver uses axiom selection based on the SInE algorithm [HV11] as implemented in Vampire [HKV11], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Some of iProver features are summarised below.

Type inference is targeted at improving finite model finding and symmetry breaking. Semantic filtering is used in preprocessing to eliminated irrelevant clauses. Proof extraction is challenging due to simplifications such global subsumption which involve global reasoning with the whole clause set and can be computationally expensive.

Strategies

iProver has around 60 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth. The strategy for satisfiable problems (FNT division) includes finite model finding.

Implementation

iProver is implemented in OCaml and for the ground reasoning uses MiniSat [ES04]. iProver accepts FOF and CNF formats. Vampire [HKV11,HK+12] is used for proof-producing clausification of FOF problems as well as for axiom selection [HV11] in the LTB division.

iProver is available from

    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

iProver 1.0 is the CASC-J7 FNT division winner.


iProver 2.0

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [
GK03,Kor13], which is complete for first-order logic. iProver combines first-order reasoning with ground reasoning for which it uses MiniSat [ES04] and optionally PicoSAT [Bie08] and Lingeling [Bie12] (only MiniSat will be used at this CASC). iProver also combines instantiation with ordered resolution; see [Kor08,Kor13] for the implementation details. The proof search is implemented using a saturation process based on the given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed vector index for The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04,Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. A compressed feature vector index is used for efficient forward/backward subsumption and subsumption resolution. Equality is dealt with (internally) by adding the necessary axioms of equality. Recent changes in iProver include improved preprocessing and incremental finite model finding; support of the AIG format for hardware verification and model-checking (implemented by Dmitry Tsarkov).

In the LTB division, iProver uses axiom selection based on the SInE algorithm [HV11] as implemented in Vampire [HKV11], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Some of iProver features are summarised below.

Sort inference is targeted at improving finite model finding and symmetry breaking. Semantic filtering is used in preprocessing to eliminated irrelevant clauses. Proof extraction is challenging due to simplifications such global subsumption which involve global reasoning with the whole clause set and can be computationally expensive.

Strategies

iProver has around 60 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth. For the LTB and FNT divisions several strategies are run in parallel.

Implementation

iProver is implemented in OCaml and for the ground reasoning uses MiniSat [ES04]. iProver accepts FOF and CNF formats. Vampire [HKV11,HK+12] and E prover [Sch13] are used for proof-producing clausification of FOF problems, Vampire is also used for axiom selection [HV11] in the LTB division. iProver is available at:
    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

Compared to the last year, iProver had several improvements in datastructures and finite model finding. We expect similar performance in the EPR division and improved performance on satisfiable problems (FNT division).


iProverModulo 0.7-0.3

Guillaume Burel
ENSIIE/Cedric, France

Architecture

iProverModulo [
Bur11] is an extension of iProver [Kor08] to integrate Polarized resolution modulo [Dow10]. Polarized resolution modulo consists in presenting the theory in which the problem has to be solved by means of polarized rewriting rules instead of axioms. It can also be seen as a combination of the set-of-support strategy and selection of literals.

iProverModulo consists of two tools: First, autotheo is a theory preprocessor that converts the axioms of the input into rewriting rules that can be used by Polarized resolution modulo. Second, these rewriting rules are handled by a patched version of iProver 0.7 which integrates Polarized resolution modulo. The integration of polarized resolution modulo in iProver only affects its ordered resolution calculus, so that the instantiation calculus is untouched.

iProverModulo 0.7-0.3 outputs a proof that is made of two parts: First, autotheo print a derivation of the transformation of the axioms into rewriting rules. This derivation is in TSTP format and includes the CNF conversions obtained from E. Second, the modified version of iProver outputs a Dedukti proof from this rewriting rules and the non-axiom formulas, following the ideas of [Bur13].

Strategies

Autotheo is first run to transform the formulas of the problem whose role is "axiom" into polarized rewriting rules. Autotheo offers a set of strategies to that purpose. For the competition, the Equiv and the ClausalAll strategies will be used. The former strategy orients formulas intuitively depending of their shape. It may be incomplete, so that the prover may give up in certain cases. However, it shows interesting results on some problems. The second strategy should be complete, at least when equality is not involved. The rewriting system for the first strategy is tried for half the time given for the problem, then the prover is restarted with the second strategy if no proof has been found.

The patched version of iProver is run on the remaining formulas modulo the rewriting rules produced by autotheo. No scheduling is performed. To be compatible with Polarized resolution modulo, literals are selected only when they are maximal w.r.t. a KBO ordering, and orphans are not eliminated. To take advantage of Polarized resolution modulo, the resolution calculus is triggered more often than the instantiation calculus, on the contrary to the original iProver.

Normalization of clauses w.r.t. the term rewriting system produced by autotheo is performed by transforming these rules into an OCaml program, compiling this program, and dynamically linking it with the prover.

Implementation

iProverModulo is available as a patch to iProver. The most important additions are the plugin-based normalization engine and the handling of polarized rewriting rules. iProverModulo is available from
    http://www.ensiie.fr/~guillaume.burel/blackandwhite_iProverModulo.html.en
Since iProverModulo needs to compile rewriting rules, an OCaml compiler is also provided.

Autotheo is available independently from iProverModulo from

    http://www.ensiie.fr/~guillaume.burel/blackandwhite_autotheo.html.en
Autotheo uses E to compute clausal normal form of formula. The version of E it uses is very slightly modified to make it print the CNF derivation even if no proof is found.

Both of autotheo and iProver are written in OCaml.

Expected Competition Performance

The core of iProverModulo was untouched since last time it entered the competition in CASC-24. However, compilation of rewriting rules failed at the time, so a slight improvement is to be expected this year.


Isabelle 2015

Jasmin Blanchette
Inria Nancy, France

Architecture

Isabelle/HOL 2015 [
NPW02] is the higher-order logic incarnation of the generic proof assistant Isabelle2015. Isabelle/HOL provides several automatic proof tactics, notably an equational reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99]. It also integrates external first- and higher-order provers via its subsystem Sledgehammer [PB10,BBP11]. Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due to Nik Sultana. It also includes TPTP versions of its popular tools, invokable on the command line as isabelle tptp_tool max_secs file.p. For example:
isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p
Isabelle is available in two versions. The HOT version (which is not participating in CASC-25) includes LEO-II [BP+08] and Satallax [Bro12] as Sledgehammer backends, whereas the competition version leaves them out.

Strategies

The Isabelle tactic submitted to the competition simply tries the following tactics sequentially:
sledgehammer
Invokes the following sequence of provers as oracles via Sledgehammer:
nitpick
For problems involving only the type $o of Booleans, checks whether a finite model exists using Nitpick [BN10].
simp
Performs equational reasoning using rewrite rules [Nip89].
blast
Searches for a proof using a fast untyped tableau prover and then attempts to reconstruct the proof using Isabelle tactics [Pau99].
auto+spass
Combines simplification and classical reasoning [PN94] under one roof; then invoke Sledgehammer with SPASS on any subgoals that emerge.
z3
Invokes the SMT solver Z3 4.4.0 [dMB08].
cvc4
Invokes the SMT solver CVC4 1.5pre [BT07].
fast
Searches for a proof using sequent-style reasoning, performing a depth-first search [PN94]. Unlike blast, it construct proofs directly in Isabelle. That makes it slower but enables it to work in the presence of the more unusual features of HOL, such as type classes and function unknowns.
best
Similar to fast, except that it performs a best-first search.
force
Similar to auto, but more exhaustive.
meson
Implements Loveland's MESON procedure [Lov78]. Constructs proofs directly in Isabelle.
fastforce
Combines fast and force.

Implementation

Isabelle is a generic theorem prover written in Standard ML. Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The HOL object logic extends pure with a more elaborate version of higher-order logic, complete with the familiar connectives and quantifiers. Other object logics are available, notably FOL (first-order logic) and ZF (Zermelo–Fraenkel set theory).

The implementation of Isabelle relies on a small LCF-style kernel, meaning that inferences are implemented as operations on an abstract theorem datatype. Assuming the kernel is correct, all values of type theorem are correct by construction.

Most of the code for Isabelle was written by the Isabelle teams at the University of Cambridge and the Technische Universität München. Isabelle/HOL is available for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

Thanks to the addition of CVC4 and a new version of Vampire, Isabelle might have become now strong enough to take on Satallax and its various declensions. But we expect Isabelle to end in second or third place, to be honest.


leanCoP 2.2

Jens Otten
University of Potsdam, Germany

Architecture

leanCoP [
OB03,Ott08] is an automated theorem prover for classical first-order logic with equality. It is a very compact implementation of the connection (tableau) calculus [Bib87,LS01].

Strategies

The reduction rule of the connection calculus is applied before the extension rule. Open branches are selected in a depth-first way. Iterative deepening on the proof depth is performed in order to achieve completeness. Additional inference rules and techniques include regularity, lemmata, and restricted backtracking [Ott10]. leanCoP uses an optimized structure-preserving transformation into clausal form [Ott10] and a fixed strategy scheduling, which is controlled by a shell script.

Implementation

leanCoP is implemented in Prolog. The source code of the core prover consists only of a few lines of code. Prolog's built-in indexing mechanism is used to quickly find connections when the extension rule is applied.

leanCoP can read formulae in leanCoP syntax and in TPTP first-order syntax. Equality axioms and axioms to support distinct objects are automatically added if required. The leanCoP core prover returns a very compact connection proof, which is then translated into a more comprehensive output format, e.g., into a lean (TPTP-style) connection proof or into a readable text proof.

The source code of leanCoP 2.2 is available under the GNU general public license. It can be downloaded from the leanCoP website at:

    http://www.leancop.de
The website also contains information about ileanCoP [Ott08] and MleanCoP [Ott12,Ott14], two versions of leanCoP for first-order intuitionistic logic and first-order modal logic, respectively.

Expected Competition Performance

As the core prover has not changed, the performance of leanCoP 2.2 is expected to be similar to the performance of leanCoP 2.1.


LEO-II 1.6.2

Christoph Benzmüller
Freie Universität Berlin, Germany

Architecture

LEO-II [
BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations of its (standard) resolution proof search loop (e.g., 10). If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.

Strategies

LEO-II employs an adapted "Otter loop". Moreover, LEO-II uses some basic strategy scheduling to try different search strategies or flag settings. These search strategies also include some different relevance filters.

Implementation

LEO-II is implemented in OCaml 4, and its problem representation language is the TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10]. LEO-II's parser supports the TPTP THF0 language and also the TPTP languages FOF and CNF.

Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

    http://www.leoprover.org

Expected Competition Performance

There are only minor recent improvements on LEO-II. It is unclear whether they are sufficient for attacking LEO-II's main competitors.

MaLARea 0.5

Josef Urban
Radboud University Nijmegen, The Netherlands

Architecture

MaLARea 0.5 [
Urb07, US+08] is a metasystem for ATP in large theories where symbol and formula names are used consistently. It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace), as well as complementary AI techniques like machine learning (the SNoW system) based on symbol-based similarity, model-based similarity, term-based similarity, and obviously previous successful proofs. The version for CASC will mainly use E prover with the BliStr [Urb13] large-theory strategies, possibly also Prover9, Mace and Paradox. The premise selection methods will likely also use the distance-weighted k-nearest neighbor [KU13] and E's implementation of SInE.

Strategies

The basic strategy is to run ATPs on problems, then use the machine learner to learn axiom relevance for conjectures from solutions, and use the most relevant axioms for next ATP attempts. This is iterated, using different timelimits and axiom limits. Various features are used for learning, and the learning is complemented by other criteria like model-based reasoning, symbol and term-based similarity, etc.

Implementation

The metasystem is implemented in ca. 2500 lines of Perl. It uses many external programs - the above mentioned ATPs and machine learner, TPTP utilities, LADR utilities for work with models, and some standard Unix tools.

MaLARea is available from

    https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2

Expected Competition Performance

Thanks to machine learning, MaLARea is strongest on batches of many related problems with many redundant axioms where some of the problems are easy to solve and can be used for learning the axiom relevance. MaLARea is not very good when all problems are too difficult (nothing to learn from), or the problems (are few and) have nothing in common. Some of its techniques (selection by symbol and term-based similarity, model-based reasoning) could however make it even there slightly stronger than standard ATPs. MaLARea has a very good performance on the MPTP Challenge, which is a predecessor of the LTB division, and it is the winner of the 2008 MZR LTB category. MaLARea 0.4 came second in the 2012 MZR@Turing competition and solved most problems in the Assurance class.


Muscadet 4.5

Dominique Pastre
University Paris Descartes, France

Architecture

The Muscadet theorem prover is a knowledge-based system. It is based on Natural Deduction, following the terminology of [
Ble71] and [Pas78], and uses methods which resembles those used by humans. It is composed of an inference engine, which interprets and executes rules, and of one or several bases of facts, which are the internal representation of "theorems to be proved". Rules are either universal and put into the system, or built by the system itself by metarules from data (definitions and lemmas). Rules may add new hypotheses, modify the conclusion, create objects, split theorems into two or more subtheorems or build new rules which are local for a (sub-)theorem.

Strategies

There are specific strategies for existential, universal, conjunctive or disjunctive hypotheses and conclusions, and equalities. Functional symbols may be used, but an automatic creation of intermediate objects allows deep subformulae to be flattened and treated as if the concepts were defined by predicate symbols. The successive steps of a proof may be forward deduction (deduce new hypotheses from old ones), backward deduction (replace the conclusion by a new one), refutation (only if the conclusion is a negation), search for objects satisfying the conclusion or dynamic building of new rules.

The system is also able to work with second order statements. It may also receive knowledge and know-how for a specific domain from a human user; see [Pas89] and [Pas93]. These two possibilities are not used while working with the TPTP Library.

Implementation

Muscadet [Pas01] is implemented in SWI-Prolog. Rules are written as more or less declarative Prolog clauses. Metarules are written as sets of Prolog clauses. The inference engine includes the Prolog interpreter and some procedural Prolog clauses. A theorem may be split into several subtheorems, structured as a tree with "and" and "or" nodes. All the proof search steps are memorized as facts including all the elements which will be necessary to extract later the useful steps (the name of the executed action or applied rule, the new facts added or rule dynamically built, the antecedents and a brief explanation).

Muscadet is available from

    http://www.normalesup.org/~pastre/muscadet/muscadet.html

Expected Competition Performance

The best performances of Muscadet will be for problems manipulating many concepts in which all statements (conjectures, definitions, axioms) are expressed in a manner similar to the practice of humans, especially of mathematicians [Pas02,Pas07]. It will have poor performances for problems using few concepts but large and deep formulas leading to many splittings. Its best results will be in set theory, especially for functions and relations. It has not been not really improved from previous years. Nevertheless it is still maintained and slightly improved. Its originality is that proofs are given in natural style. It should now be combined with other provers.


Nitpick 2015

Jasmin Blanchette
Inria Nancy, France

Architecture

Nitpick [
BN10] is an open source counterexample generator for Isabelle/HOL [NPW02]. It builds on Kodkod [TJ07], a highly optimized first-order relational model finder based on SAT. The name Nitpick is appropriated from a now retired Alloy precursor. In a case study, it was applied successfully to a formalization of the C++ memory model [BW+11].

Strategies

Nitpick employs Kodkod to find a finite model of the negated conjecture. The translation from HOL to Kodkod's first-order relational logic (FORL) is parameterized by the cardinalities of the atomic types occurring in it. Nitpick enumerates the possible cardinalities for each atomic type, exploiting monotonicity to prune the search space [BK11]. If a formula has a finite counterexample, the tool eventually finds it, unless it runs out of resources.

SAT solvers are particularly sensitive to the encoding of problems, so special care is needed when translating HOL formulas. As a rule, HOL scalars are mapped to FORL singletons and functions are mapped to FORL relations accompanied by a constraint. More specifically, an n-ary first-order function (curried or not) can be coded as an (n + 1)-ary relation accompanied by a constraint. However, if the return type is the type of Booleans, the function is more efficiently coded as an unconstrained n-ary relation. Higher-order quantification and functions bring complications of their own. A function from σ to τ cannot be directly passed as an argument in FORL; Nitpick's workaround is to pass |σ| arguments of type τ that encode a function table.

Implementation

Nitpick, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle itself, which adheres to the LCF small-kernel discipline, Nitpick does not certify its results and must be trusted.

Nitpick is available as part of Isabelle/HOL for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

Thanks to Kodkod's amazing power, we expect that Nitpick will beat both Satallax and Refute with its hands tied behind its back.


Princess 20150706

Philipp Rümmer
Uppsala University, Sweden

Architecture

Princess [
Rue08,Rue12] is a theorem prover for first-order logic modulo linear integer arithmetic. The prover uses a combination of techniques from the areas of first-order reasoning and SMT solving. The main underlying calculus is a free-variable tableau calculus, which is extended with constraints to enable backtracking-free proof expansion, and positive unit hyper-resolution for lightweight instantiation of quantified formulae. Linear integer arithmetic is handled using a set of built-in proof rules resembling the Omega test, which altogether yields a calculus that is complete for full Presburger arithmetic, for first-order logic, and for a number of further fragments. In addition, some built-in procedures for nonlinear integer arithmetic are available. The internal calculus of Princess only supports uninterpreted predicates; uninterpreted functions are encoded as predicates, together with the usual axioms. Through appropriate translation of quantified formulae with functions, the e-matching technique common in SMT solvers can be simulated; triggers in quantified formulae are chosen based on heuristics similar to those in the Simplify prover.

Strategies

For CASC, Princess will run a fixed schedule of configurations for each problem (portfolio method). Configurations determine, among others, the mode of proof expansion (depth-first, breadth-first), selection of triggers in quantified formulae, clausification, and the handling of functions. The portfolio was chosen based on training with a random sample of problems from the TPTP library.

Implementation

Princess is entirely written in Scala and runs on any recent Java virtual machine; besides the standard Scala and Java libraries, only the Cup parser library is used. Princess is available from
    http://www.philipp.ruemmer.org/princess.shtml

Expected Competition Performance

Princess is mainly designed for integer problems (TFI), and should perform reasonably well here. Compared to last year, only minor updates were done, so that performance should be similar as in 2014. This version of Princess does not output proofs, however, and will be ranked accordingly.


Refute 2015

Jasmin Blanchette
Inria Nancy, France

Architecture

Refute [
Web08] is an open source counterexample generator for Isabelle/HOL [NPW02] based on a SAT solver, and Nitpick's [BN10] precursor.

Strategies

Refute employs a SAT solver to find a finite model of the negated conjecture. The translation from HOL to propositional logic is parameterized by the cardinalities of the atomic types occurring in the conjecture. Refute enumerates the possible cardinalities for each atomic type. If a formula has a finite counterexample, the tool eventually finds it, unless it runs out of resources.

Implementation

Refute, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle itself, which adheres to the LCF small-kernel discipline, Refute does not certify its results and must be trusted.

Refute is available as part of Isabelle/HOL for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

We expect Refute to beat Satallax but also to be beaten by Nitpick.


Satallax 2.8

Nik Sultana
Cambridge University, United Kingdom

Architecture

Satallax 2.8 [
Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the search for a proof. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. A problem is given in the THF format. A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat as an engine to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Additionally, Satallax may optionally generate first-order formulas in addition to the propositional clauses. If this option is used, then Satallax peroidically calls the first-order theorem prover E to test for first-order unsatisfiability. If the set of first-order formulas is unsatisfiable, then the original branch is unsatisfiable.

Strategies

There are about a hundred flags that control the order in which formulas and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time the mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Satallax 2.7 has strategy schedule consisting of 68 modes. Each mode is tried for time limits ranging from 0.1 seconds to 54.9 seconds. The strategy schedule was determined through experimentation using the THF problems in version 5.4.0 of the TPTP library.

Implementation

Satallax is implemented in OCaml. A foreign function interface is used to interact with MiniSat 2.2.0. Satallax is available from
    http://mathgate.info/cebrown/satallax/
http://mathgate.info/cebrown/satallax/

Expected Competition Performance

Similar to last year, since the changes from v2.7 to v2.8 consist of improvements in proof output.


Satallax-MaLeS 1.3

Daniel Kuehlwein
Radboud University Nijmegen, The Netherlands

Architecture

Satallax-MaLeS 1.3 improves Satallax's automatic mode with machine learning techniques to predict which search strategy is most likely to find a proof.

Strategies

Satallax-MaLeS 1.3 relies on Geoff Sutcliffe's MakeListStat to classify problems. It uses the same data as Satallax_MaLeS 1.2 as basis for the learning algorithms.

Implementation

Satallax-MaLeS is based on Python, in particular the Sklearn library that contains many machine learning algorithms. After CASC Satallax-MaLeS will be available from
    
    http://www.cs.ru.nl/~kuehlwein/

Expected Competition Performance

Satallax-MaLes 1.3 is the CASC-J7 THF division winner.


SPASS+T 2.2.22

Uwe Waldmann
Max-Planck-Institut für Informatik, Germany

Architecture

SPASS+T is an extension of the superposition-based theorem prover SPASS that integrates algebraic knowledge into SPASS in three complementary ways: by passing derived formulas to an external SMT procedure (currently Yices or CVC3), by adding standard axioms, and by built-in arithmetic simplification and inference rules. A first version of the system has been described in [
PW06]; later a much more sophisticated coupling of the SMT procedure has been added [Zim07]. The latest version fixes some bugs related to memory management and to name clashes with the SMT procedure and provides improved support for non-arithmetic problems.

Strategies

Standard axioms and built-in arithmetic simplification and inference rules are integrated into the standard main loop of SPASS. Inferences between standard axioms are excluded, so the user-supplied formulas are taken as set of support. The external SMT procedure runs in parallel in a separate process, leading occasionally to non-deterministic behaviour.

Implementation

SPASS+T is implemented in C. The system is available from
    http://people.mpi-inf.mpg.de/~uwe/software/#TSPASS

Expected Competition Performance

We expect SPASS+T to be among the leading systems in the TFA division at CASC-25.


Vampire 2.6

Krystof Hoder, Andrei Voronkov
University of Manchester, England

Architecture

Vampire 2.6 is an automatic theorem prover for first-order classical logic. It consists of a shell and a kernel. The kernel implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus. The splitting rule in kernel adds propositional parts to clauses, which are being manipulated using binary decision diagrams and a SAT solver. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering.

Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Although the kernel of the system works only with clausal normal form, the shell accepts a problem in the full first-order logic syntax, clausifies it and performs a number of useful transformations before passing the result to the kernel. Also the axiom selection algorithm Sine [HV11] can be enabled as part of the preprocessing.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

The Vampire 2.6 kernel provides a fairly large number of options for strategy se lection. The most important ones are:

Implementation

Vampire 2.6 is implemented in C++.

Expected Competition Performance

Vampire 2.6 is the CASC-J7 FOF division winner.


Vampire 4.0

Giles Reger
University of Manchester, United Kingdom

Architecture

Vampire 4.0 is an automatic theorem prover for first-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder. Splitting in resolution-based proof search is controlled by the AVATAR architecture, which uses a SAT solver to make splitting decisions. Both resolution and instantiation based proof search make use of global subsumption.

A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the Sine axiom selection algorithm.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.0 provides a very large number of options for strategy selection. The most important ones are:

Implementation

Vampire 4.0 is implemented in C++.

Expected Competition Performance

Vampire 4.0 should be an improvement on Vampire 2.6, which is the CASC-J7 FOF division winner. Improvements have also been made to the parts of the prover relevant to satisfiability checking and theory reasoning.


VampireZ3 1.0

Giles Reger
University of Manchester, United Kingdom

Architecture

VampireZ3 version 1.0 is a combination of Vampire version 4.0 (see
other description) and Z3 version 4.3.1. Vampire 4.0 uses the AVATAR architecture to make splitting decisions. Briefly, the first-order search space is represented in the SAT solver with propositional symbols consistently naming variable-disjoint components. A SAT solver is then used to (iteratively) select a subset of components to search. In VampireZ3 the Z3 SMT solver is used in place of a SAT solver and ground components are translated into Z3 terms. This means Z3’s efficient methods for ground reasoning with equality and theories are exposed by AVATAR, as the SMT solver only produces theory-consistent models.

Strategies

All strategies of Vampire 4.0 are available. Z3 is only used when splitting is selected.

Implementation

Vampire and Z3 are both implemented in C++.

Expected Competition Performance

VampireZ3 1.0 should perform better than Vampire 4.0 on theory problems. The combination of Z3 for reasoning with ground theories and Vampire for reasoning with quantifiers should be very powerful.


ZenonArith 0.1.0

Guillaume Bury
Inria, France

Architecture

Zenon Arith 0.1.0 [
BD15] is an extension of the tableaux-based Zenon [BDD07], to linear arithmetic. This extension uses the simplex algorithm as a decision procedure to solve problems over rationals, and a branch-and-bound approach to solve problems over integers.

This extension is also able to handle real linear arithmetic, which actually coincides with the rational fragment of linear arithmetic due to the syntactical restrictions of the TPTP input format for reals.

Strategies

This extension consists of a smooth integration of arithmetic deductive rules to the basic tableau rules, so that there is a natural interleaving between arithmetic and regular analytic rules. This extension is able to deal with problems that are (exclusively) universally quantified, or existentially quantified formulas. Universally quantified formulas are handled by translating unsatisfiability explanation from the simplex algorithm into inference rules. Instantiation for existentially quantified formulas is achieved by trying to solve sets of formulas that cover all branches of a derivation tree.

Implementation

Zenon Arith is implemented in OCaml, and uses the Zarith (which is a binding for the gmp library) library to handle arbitrary precision integers and rationals. It uses an incremental implementation of the simplex algorithm in OCaml as a decision procedure over linear arithmetic. This version of Zenon comes with a built-in notion of types for terms, and inference rules that discriminate over the types of expressions. Zenon Arith can automatically output Coq [CH88] proof scripts, that can then be checked by the Coq proof assistant. The main developper is Guillaume Bury. Zenon Arith is available at:
    https://www.rocq.inria.fr/deducteam/ZenonArith/

Expected Competition Performance

We do not expect to be among the best provers in terms of number of solved problems, but we aim to solve a large part of the problems, with the lowest average CPU time.