Entrants' System Descriptions

Beagle 0.9.47
CVC4 1.5 (CASC-25 TFN division winner)
CVC4 1.5.1
E 2.0
Geo-III 2016C
iProver 2.5
Isabelle 2015
leanCoP 2.2
LEO-II 1.7.0
Leo-III 1.0
Leo+III 1.0
Nitpick 2015
Princess 160606
Prover9 2009-11A
Refute 2015
Satallax 2.8 (CASC-25 THF division winner)
Satallax 3.0
Vampire 4.0 (CASC-25 FOF, FNT, EPR, and LTB division winner)
Vampire 4.1
VampireZ3 1.0 (CASC-25 TFA division winner)

Beagle 0.9.47

Architecture

Beagle [BBW15] is an automated theorem prover for sorted first-order logic with equality over built-in theories. The theories currently supported are integer arithmetic, linear rational arithmetic and linear real arithmetic. It accepts formulas in the FOF and TFF formats of the TPTP syntax, and formulas in the SMT-LIB version 2 format.

Beagle first converts the input formulas into clause normal form. Pure arithmetic (sub-)formulas are treated by eager application of quantifier elimination. The core reasoning component implements the Hierarchic Superposition Calculus with Weak Abstraction (HSPWA) [BW13]. Extensions are a splitting rule for clauses that can be divided into variable-disjoint parts, a partial instantiation rule for variables with finite domain, and two kinds of background-sorted variables that trade off completeness against search space.

The HSPWA calculus generalizes the superposition calculus by integrating theory reasoning in a black-box style. For the theories mentioned above, Beagle combines quantifier elimination procedures and other solvers to dispatch proof obligations over these theories. The default solvers are an improved version of Cooper's algorithm for linear integer arithmetic, and the CVC4 SMT solver for linear real/rational arithmetic. Non-linear integer arithmetic is treated by partial instantiation and additional lemmas.

Strategies

Beagle uses the Discount loop for saturating a clause set under the calculus' inference rules. Simplification techniques include standard ones, such as subsumption deletion, demodulation by ordered unit equations, and tautology deletion. It also includes theory-specific simplification rules for evaluating ground (sub)terms, and for exploiting cancellation laws and properties of neutral elements, among others. In the competition an aggressive form of arithmetic simplification is used, which seems to perform best in practice.

Beagle uses strategy scheduling by trying (at most) three flag settings sequentially.

Implementation

Beagle is implemented in Scala. It is a full implementation of the HSPWA calculus. It uses a simple form of indexing, essentially top-symbol hashes, stored with each term and computed in a lazy way. Fairness is achieved through a combination of measuring clause weights and their derivation-age. It can be fine-tuned with a weight-age ratio parameter, as in other provers. Beagle's web site is

    https://bitbucket.org/peba123/beagle

Expected Competition Performance

Beagle is implemented in a straightforward way and would benefit from optimized data structures. We do not expect it to come in among the first.

CVC4 1.5

Andrew Reynolds
EPFL, Switzerland

Architecture

CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that includes built-in support for many theories, including linear arithmetic, arrays, bit vectors, datatypes and strings. It incorporates approaches for handling universally quantified formulas. CVC4 primarily uses heuristic approaches based on E-matching for theorems, and finite model finding approaches for non-theorems. Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in CVC4 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".

Strategies

For handling theorems, CVC4 primarily uses configurations that combine conflict-based quantifier instantiation [RTd14] and E-matching. CVC4 uses a handful of orthogonal trigger selection strategies for E-matching. For handling non-theorems, CVC4 primarily uses finite model finding techniques. These techniques can also be used for bounded integer quantification for non-theorems involving arithmetic [Rey13]. Since CVC4 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well. For problems in pure arithmetic, CVC4 uses techniques for counterexample-guided quantifier instantiation [RD+15], which select relevant quantifier instantiations based on models for counterexamples to quantified formulas. CVC4 relies on this method both for theorems in TFA and non-theorems in TFN.

Implementation

CVC4 is implemented in C++. The code is available from

    https://github.com/CVC4

Expected Competition Performance

CVC4 1.5 is the CASC-25 TFN division winner.

CVC4 1.5.1

Andrew Reynolds
University of Iowa, USA

Architecture

Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in CVC4 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".

Strategies

For handling theorems, CVC4 primarily uses configurations that combine conflict-based quantifier instantiation [RTd14] and E-matching. CVC4 uses a handful of orthogonal trigger selection strategies for E-matching. For handling non-theorems, CVC4 primarily uses finite model finding techniques. Since CVC4 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well. For problems in pure arithmetic, CVC4 uses variations of counterexample-guided quantifier instantiation, which select relevant quantifier instantiations based on models for counterexamples to quantified formulas. CVC4 relies on this method both for theorems in TFA and non-theorems in TFN.

Implementation

CVC4 is implemented in C++. The code is available from

    https://github.com/CVC4

Expected Competition Performance

CVC4 should perform moderately better than last year in FOF and TFA. The main improvements have been a new implementation of counterexample-guided quantifier instantiation [RD+15] for linear real and integer arithmetic, optimizations for ground theory combination and conflict-based quantifier instantiation, and the use of new strategies. It should perform roughly the same in FNT and TFN.

E 2.0

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E 2.0 [Sch02, Sch13] is a purely equational theorem prover for many-sorted first-order logic with equality. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution.

For the LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will not use on-the-fly learning this year.

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause selection heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause selection heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

For CASC-J8, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, ...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.
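The greedy construction described above can be sketched as follows. This is a minimal illustration in Python, not E's actual code: the function name `greedy_schedule`, the `solved_by` table, and the simplification that every time slice has the same length (so one solved-set per strategy suffices) are assumptions made for the example.

```python
def greedy_schedule(solved_by, num_slices):
    """Build a schedule of strategies for one problem class.

    solved_by:  dict mapping a (hypothetical) strategy name to the set of
                training problems it solves within one time slice.
    num_slices: maximum number of strategies in the schedule.
    """
    schedule = []
    covered = set()                 # problems solved by strategies chosen so far
    candidates = dict(solved_by)    # strategies not yet scheduled
    for _ in range(num_slices):
        if not candidates:
            break
        # Pick the strategy that adds the most not-yet-solved problems.
        # Sorting the names first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda s: len(candidates[s] - covered))
        if not candidates[best] - covered:
            break                   # no remaining strategy contributes anything
        schedule.append(best)
        covered |= candidates.pop(best)
    return schedule
```

With unequal time slices, as in the description, the table would need one solved-set per strategy and slice length; the selection step itself would stay the same.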

About 210 different strategies have been evaluated on all untyped first-order problems from TPTP 6.0.0, and about 180 of these strategies are used in the automatic mode.

Implementation

E is built around perfectly shared terms, i.e., each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) are supported using feature vector indexing [Sch04]. Superposition and backward rewriting use fingerprint indexing [Sch12], a new technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loe06, Loe06]. The prover and additional information are available at

    http://www.eprover.org
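The idea of perfectly shared terms mentioned above can be illustrated with a small sketch. It is written in Python for readability and is not taken from E; the class names `Term` and `TermBank` and the method `insert` are hypothetical. Because every subterm is already shared, a term is identified by its symbol and the identities of its arguments, so a lookup never has to compare whole term trees.

```python
class Term:
    """A term node; instances are only created through a TermBank."""
    __slots__ = ("symbol", "args")

    def __init__(self, symbol, args):
        self.symbol = symbol
        self.args = args            # tuple of shared Term objects


class TermBank:
    """Stores each distinct term exactly once (hash-consing)."""

    def __init__(self):
        self._table = {}

    def insert(self, symbol, *args):
        # Term objects hash and compare by identity, which coincides with
        # structural equality because all arguments are already shared.
        key = (symbol, args)
        term = self._table.get(key)
        if term is None:
            term = Term(symbol, args)
            self._table[key] = term
        return term

    def __len__(self):
        return len(self._table)
```

Building f(a, a) twice through the same bank yields the same object both times, and the bank then holds only two nodes, one for a and one for f(a, a).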

Expected Competition Performance

E 2.0 has slightly better strategies than previous versions, and has some minor improvements in the inference engine. The system is expected to perform well in most proof classes, but will at best complement top systems in the disproof classes.

Geo-III 2016C

Hans de Nivelle
University of Wrocław, Poland

Architecture

Geo III is a theorem prover for Partial Classical Logic, based on reduction to Kleene Logic [deN14]. Currently, only Chapters 4 and 5 are implemented. Since Kleene logic generalizes 2-valued logic, Geo III can take part in CASC. Apart from being 3-valued, the main differences with earlier versions of Geo are (1) more sophisticated learning schemes, (2) improved proof logging, and (3) replacement of recursion by explicit use of a stack. The Geo family of provers uses exhaustive backtracking, combined with learning after failure. Earlier versions learned only conflict formulas. Geo III learns disjunctions of arbitrary width. Experiments show that this often results in shorter proofs.

If Geo is ever embedded in proof assistants, these assistants will require proofs. In order to be able to provide these at the required level of detail, Geo III contains a hierarchy of proof rules that is independent of the rest of the system, and that can be modified independently. In order to be flexible in the main loop, recursion was replaced by an explicit stack. Using an explicit stack, it is easier to remove unused assumptions, or to rearrange the order of assumptions. Also, restarts are easier to implement with a stack.

Strategies

Geo uses breadth-first, exhaustive model search, combined with learning. In case of branching, the branches are explored in random order. Specifically for CASC, a restart strategy was added, which ensures that proof search is always restarted after 4 minutes. This was done because Geo III has no indexing: after some time, proof search becomes so inefficient that it is better to restart than to continue.

Implementation

Geo III is written in C++11. No features outside of the standard are used. It has been tested with g++ version 4.8.4 and with clang. The difference from last year's version is that version 2016C uses sophisticated matching algorithms [deN16] for establishing whether a geometric formula is false in an interpretation.

Expected Competition Performance

We expect that Geo 2016C will be better than Geo 2015E.

iProver 2.5

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [GK03, Kor13] which is complete for first-order logic. iProver combines first-order reasoning with ground reasoning for which it uses MiniSat [ES04] and optionally PicoSAT [Bie08] (only MiniSat will be used at this CASC). iProver also combines instantiation with ordered resolution; see [Kor08, Kor13] for the implementation details. The proof search is implemented using a saturation process based on the given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed feature vector index for subsumption and subsumption resolution (both forward and backward). The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04, Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. Equality is dealt with (internally) by adding the necessary axioms of equality. Recent changes in iProver include improved preprocessing, incremental finite model finding, and support of the AIG format for hardware verification and model-checking (implemented with Dmitry Tsarkov).

In the LTB division, iProver uses axiom selection based on the SInE algorithm [HV11] as implemented in Vampire [KV13], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Some of iProver's features are summarised below.

proof extraction for both instantiation and resolution [KS12],
model representation, using first-order definitions in term algebra [KS12],
answer substitutions,
semantic filtering,
incremental finite model finding,
sort inference, monotonic [CLS11] and non-cyclic [Kor13] sorts,
predicate elimination [KK16].

Sort inference is targeted at improving finite model finding and symmetry breaking. Semantic filtering is used in preprocessing to eliminate irrelevant clauses. Proof extraction is challenging due to simplifications such as global subsumption, which involve global reasoning with the whole clause set and can be computationally expensive.

Strategies

iProver has around 60 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth. For the LTB and FNT divisions several strategies are run in parallel.

Implementation

iProver is implemented in OCaml and uses MiniSat [ES04] for the ground reasoning. iProver accepts FOF and CNF formats. Vampire [KV13, HK+12] and E prover [Sch13] are used for proof-producing clausification of FOF problems. Vampire is also used for axiom selection [HV11] in the LTB division.

iProver is available at:

    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

Compared to last year, we restructured core data structures, aiming at flexibility for different extensions rather than performance. We also improved preprocessing, including predicate elimination. We expect a moderately improved overall performance.

Isabelle 2015

Jasmin Blanchette
Inria Nancy, France

Architecture

Isabelle/HOL 2015 [NPW02] is the higher-order logic incarnation of the generic proof assistant Isabelle2015. Isabelle/HOL provides several automatic proof tactics, notably an equational reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99]. It also integrates external first- and higher-order provers via its subsystem Sledgehammer [PB10,BBP11]. Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due to Nik Sultana. It also includes TPTP versions of its popular tools, invokable on the command line as isabelle tptp_tool max_secs file.p. For example:

    isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p

Isabelle is available in two versions. The HOT version (which is not participating in CASC-J8) includes LEO-II [BP+08] and Satallax [Bro12] as Sledgehammer backends, whereas the competition version leaves them out.

Strategies

The Isabelle tactic submitted to the competition simply tries the following tactics sequentially:

sledgehammer

Invokes the following sequence of provers as oracles via Sledgehammer:

satallax - Satallax 2.7 [Bro12] (HOT version only);
leo2 - LEO-II 1.6.2 [BPTF08] (HOT version only);
spass - SPASS 3.8ds [BP+12];
vampire - Vampire 3.0 (revision 1435) [RV02];
e - E 1.8 [Sch04];

nitpick

For problems involving only the type $o of Booleans, checks whether a finite model exists using Nitpick [BN10].

simp

Performs equational reasoning using rewrite rules [Nip89].

blast

Searches for a proof using a fast untyped tableau prover and then attempts to reconstruct the proof using Isabelle tactics [Pau99].

auto+spass

Combines simplification and classical reasoning [PN94] under one roof; then invokes Sledgehammer with SPASS on any subgoals that emerge.

z3

Invokes the SMT solver Z3 4.4.0 [dMB08].

cvc4

Invokes the SMT solver CVC4 1.5pre [BT07].

fast

Searches for a proof using sequent-style reasoning, performing a depth-first search [PN94]. Unlike blast, it constructs proofs directly in Isabelle. That makes it slower but enables it to work in the presence of the more unusual features of HOL, such as type classes and function unknowns.

best

Similar to fast, except that it performs a best-first search.

force

Similar to auto, but more exhaustive.

meson

Implements Loveland's MESON procedure [Lov78]. Constructs proofs directly in Isabelle.

fastforce

Combines fast and force.

Implementation

Isabelle is a generic theorem prover written in Standard ML. Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The HOL object logic extends Pure with a more elaborate version of higher-order logic, complete with the familiar connectives and quantifiers. Other object logics are available, notably FOL (first-order logic) and ZF (Zermelo–Fraenkel set theory).

The implementation of Isabelle relies on a small LCF-style kernel, meaning that inferences are implemented as operations on an abstract theorem datatype. Assuming the kernel is correct, all values of type theorem are correct by construction.

Most of the code for Isabelle was written by the Isabelle teams at the University of Cambridge and the Technische Universität München. Isabelle/HOL is available for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

Thanks to the addition of CVC4 and a new version of Vampire, Isabelle might now be strong enough to take on Satallax and its various declensions. But we expect Isabelle to end in second or third place, to be honest.

leanCoP 2.2

Jens Otten
University of Potsdam, Germany

Architecture

leanCoP [OB03, Ott08] is an automated theorem prover for classical first-order logic with equality. It is a very compact implementation of the connection (tableau) calculus [Bib87, LS01].

Strategies

The reduction rule of the connection calculus is applied before the extension rule. Open branches are selected in a depth-first way. Iterative deepening on the proof depth is performed in order to achieve completeness. Additional inference rules and techniques include regularity, lemmata, and restricted backtracking [Ott10]. leanCoP uses an optimized structure-preserving transformation into clausal form [Ott10] and a fixed strategy scheduling, which is controlled by a shell script.
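The interplay of reduction, extension, depth-first branch selection and iterative deepening can be illustrated with a small propositional sketch. It is written in Python and is not leanCoP's Prolog code. It makes several simplifying assumptions: literals are nonzero integers (`-n` is the complement of `n`), there are no variables, regularity, lemmata or restricted backtracking are omitted, every clause is tried as a start clause, and the depth bound is taken to be the length of the active path. The names `close_clause` and `prove` are hypothetical.

```python
def close_clause(clause, path, matrix, limit):
    """Try to close every branch opened by the literals of `clause`.

    path:  tuple of literals on the active branch.
    limit: maximum path length at which an extension step is still allowed.
    """
    for lit in clause:
        # Reduction rule first: the complement is already on the path.
        if -lit in path:
            continue
        # Extension rule: connect to a clause containing the complement,
        # provided the depth bound has not been reached.
        if len(path) >= limit:
            return False
        closed = False
        for other in matrix:
            if -lit in other:
                rest = [l for l in other if l != -lit]
                if close_clause(rest, path + (lit,), matrix, limit):
                    closed = True
                    break
        if not closed:
            return False
    return True


def prove(matrix, max_limit=10):
    """Return True if a closed connection tableau is found for the clause set.

    The depth bound is raised step by step (iterative deepening); without
    it, the depth-first search could loop on some inputs.
    """
    for limit in range(1, max_limit + 1):
        for start in matrix:
            if close_clause(start, (), matrix, limit):
                return True
    return False
```

A result of True means the clause set was refuted within the bound; False only means that no refutation was found up to `max_limit`.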

Implementation

leanCoP is implemented in Prolog. The source code of the core prover consists only of a few lines of code. Prolog's built-in indexing mechanism is used to quickly find connections when the extension rule is applied.

leanCoP can read formulae in leanCoP syntax and in TPTP first-order syntax. Equality axioms and axioms to support distinct objects are automatically added if required. The leanCoP core prover returns a very compact connection proof, which is then translated into a more comprehensive output format, e.g., into a lean (TPTP-style) connection proof or into a readable text proof.

The source code of leanCoP 2.2 is available under the GNU General Public License. It can be downloaded from the leanCoP website at:

    http://www.leancop.de

The website also contains information about ileanCoP [Ott08] and MleanCoP [Ott12, Ott14], two versions of leanCoP for first-order intuitionistic logic and first-order modal logic, respectively.

Expected Competition Performance

As the prover has not changed, the performance of leanCoP 2.2 is expected to be the same as last year.

LEO-II 1.7.0

Max Wisniewski
Freie Universität Berlin, Germany

Architecture

LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations of its (standard) resolution proof search loop (e.g., 10). If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.

Strategies

LEO-II employs an adapted "Otter loop". Moreover, LEO-II uses some basic strategy scheduling to try different search strategies or flag settings. These search strategies also include some different relevance filters.

Implementation

LEO-II is implemented in OCaml 4, and its problem representation language is the TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10]. LEO-II's parser supports the TPTP THF0 language and also the TPTP languages FOF and CNF.

Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

    http://www.leoprover.org

Expected Competition Performance

LEO-II 1.7.0 differs from last year's CASC version only with respect to some proof generation aspects and some other minor modifications. These changes are not expected to improve LEO-II's performance at CASC over the previous version.

Leo-III 1.0

Max Wisniewski
Freie Universität Berlin, Germany

Architecture

Leo-III [WSB14], the successor of LEO-II [BP+08], is a higher-order ATP system based on ordered higher-order paramodulation employing an agent-based blackboard architecture. In its first version, Leo-III uses multiple adapted sequential DISCOUNT loops, each with a different search strategy. In addition, similarly to LEO-II, each sequential loop makes a non-blocking call to an external ATP every n iterations. In the current version, the called ATPs have to understand THF syntax and return the result in standard SZS format. In the competition mode only our own prover LEO-II will be used as a cooperation prover. If either one of the paramodulation loops or one of the external provers finds a proof, the system stops and returns the result.

Strategies

Leo-III runs multiple search strategies in parallel. These strategies include some incomplete versions that outperform the complete versions on some problem inputs. The searches also differ in the employed relevance filters and preprocessing techniques, and hence in the considered formula set.

Ultimately, Leo-III is in its first version an enhancement of LEO-II. The main improvement in comparison to its predecessor is the better equational handling with the new calculus, and the multi-search of the agent architecture.

Implementation

Leo-III is implemented in Scala. Its natural problem representation is the TPTP THF language [BRS08], but it can process every language of the TPTP including TFF and FOF. Leo-III is available from:

    https://github.com/cbenzmueller/Leo-III

Expected Competition Performance

In its first version Leo-III is not yet tuned for performance, but is more of a straightforward implementation of the calculus itself. Due to its cooperation with LEO-II it will perform at least as well as LEO-II, but it will most probably not be able to compete with the main competitors.

Leo+III 1.0

Max Wisniewski
Freie Universität Berlin, Germany

Architecture

Strategies

Implementation

Leo-III is implemented in Scala. Its natural problem representation is the TPTP THF language [BRS08], but it can process every language of the TPTP including TFF and FOF. Leo-III is available from:

    https://github.com/cbenzmueller/Leo-III

Expected Competition Performance

This version is only for demonstrative purposes. For the use of Satallax a different set of preprocessing techniques is used, but this version should be at least as competitive as Leo-III in the competition.

Nitpick 2015

Jasmin Blanchette
Inria Nancy, France

Architecture

Nitpick [BN10] is an open source counterexample generator for Isabelle/HOL [NPW02]. It builds on Kodkod [TJ07], a highly optimized first-order relational model finder based on SAT. The name Nitpick is appropriated from a now retired Alloy precursor. In a case study, it was applied successfully to a formalization of the C++ memory model [BW+11].

Strategies

Nitpick employs Kodkod to find a finite model of the negated conjecture. The translation from HOL to Kodkod's first-order relational logic (FORL) is parameterized by the cardinalities of the atomic types occurring in it. Nitpick enumerates the possible cardinalities for each atomic type, exploiting monotonicity to prune the search space [BK11]. If a formula has a finite counterexample, the tool eventually finds it, unless it runs out of resources.

SAT solvers are particularly sensitive to the encoding of problems, so special care is needed when translating HOL formulas. As a rule, HOL scalars are mapped to FORL singletons and functions are mapped to FORL relations accompanied by a constraint. More specifically, an n-ary first-order function (curried or not) can be coded as an (n + 1)-ary relation accompanied by a constraint. However, if the return type is the type of Booleans, the function is more efficiently coded as an unconstrained n-ary relation. Higher-order quantification and functions bring complications of their own. A function from σ to τ cannot be directly passed as an argument in FORL; Nitpick's workaround is to pass |σ| arguments of type τ that encode a function table.
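The two encodings of first-order functions described above can be made concrete with a small sketch over explicit finite domains. This is an illustration in Python, not Nitpick's or Kodkod's code; the helper names `function_to_relation`, `is_functional` and `predicate_to_relation` are hypothetical.

```python
from itertools import product


def function_to_relation(f, domains):
    """Encode an n-ary function as an (n + 1)-ary relation (a set of tuples)."""
    return {args + (f(*args),) for args in product(*domains)}


def is_functional(relation, domains, codomain):
    """The accompanying constraint: every argument tuple from the domains
    is related to exactly one result, and that result lies in the codomain."""
    seen = {}
    for *args, result in relation:
        args = tuple(args)
        if result not in codomain or args in seen:
            return False            # result out of range, or two results
        seen[args] = result
    return set(seen) == set(product(*domains))


def predicate_to_relation(p, domains):
    """A Boolean-valued function becomes a plain n-ary relation;
    any set of argument tuples is admissible, so no constraint is needed."""
    return {args for args in product(*domains) if p(*args)}
```

For a binary function over a three-element domain, the first encoding yields nine triples that a solver must keep functional, whereas a binary predicate is just a subset of the nine pairs. This is the sense in which the unconstrained encoding is cheaper.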

Implementation

Nitpick, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle itself, which adheres to the LCF small-kernel discipline, Nitpick does not certify its results and must be trusted.

Nitpick is available as part of Isabelle/HOL for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

Thanks to Kodkod's amazing power, we expect that Nitpick will beat both Satallax and Refute with its hands tied behind its back.

Princess 160606

Philipp Rümmer
Uppsala University, Sweden

Architecture

Princess [Rue08, Rue12] is a theorem prover for first-order logic modulo linear integer arithmetic. The prover uses a combination of techniques from the areas of first-order reasoning and SMT solving. The main underlying calculus is a free-variable tableau calculus, which is extended with constraints to enable backtracking-free proof expansion, and positive unit hyper-resolution for lightweight instantiation of quantified formulae. Linear integer arithmetic is handled using a set of built-in proof rules resembling the Omega test, which altogether yields a calculus that is complete for full Presburger arithmetic, for first-order logic, and for a number of further fragments. In addition, some built-in procedures for nonlinear integer arithmetic are available.

The internal calculus of Princess only supports uninterpreted predicates; uninterpreted functions are encoded as predicates, together with the usual axioms. Through appropriate translation of quantified formulae with functions, the e-matching technique common in SMT solvers can be simulated; triggers in quantified formulae are chosen based on heuristics similar to those in the Simplify prover.

Strategies

For CASC, Princess will run a fixed schedule of configurations for each problem (portfolio method). Configurations determine, among others, the mode of proof expansion (depth-first, breadth-first), selection of triggers in quantified formulae, clausification, and the handling of functions. The portfolio was chosen based on training with a random sample of problems from the TPTP library.

Implementation

Princess is entirely written in Scala and runs on any recent Java virtual machine; besides the standard Scala and Java libraries, only the Cup parser library is used. Princess is available from:

    http://www.philipp.ruemmer.org/princess.shtml

Expected Competition Performance

Princess should perform roughly as in 2015. Compared to last year, initial support for outputting proofs was added, though not for all relevant configurations yet.

Prover9 2009-11A

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.
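The age-weight selection mentioned above can be illustrated with a small sketch. It is not Prover9's code: the class name `ClauseQueue`, its parameters, and the reading of a ratio a:w as "a picks of the oldest clause, then w picks of the lightest clause, repeated" are assumptions made for illustration only.

```python
import heapq
from itertools import count


class ClauseQueue:
    """Hypothetical passive-clause store with age-weight selection."""

    def __init__(self, age_picks=1, weight_picks=4):
        self.age_picks = age_picks        # picks by age per cycle
        self.weight_picks = weight_picks  # picks by weight per cycle
        self._by_age = []                 # heap of (age, clause)
        self._by_weight = []              # heap of (weight, age, clause)
        self._selected = set()            # ages of clauses already selected
        self._clock = count()             # age = insertion order
        self._step = 0

    def add(self, clause, weight):
        age = next(self._clock)
        heapq.heappush(self._by_age, (age, clause))
        heapq.heappush(self._by_weight, (weight, age, clause))

    def select(self):
        """Return the next given clause, or None if none is left."""
        cycle = self.age_picks + self.weight_picks
        use_age = (self._step % cycle) < self.age_picks
        self._step += 1
        heap = self._by_age if use_age else self._by_weight
        while heap:
            entry = heapq.heappop(heap)
            age, clause = entry[-2], entry[-1]
            # A clause lives in both heaps; skip it if the other heap
            # already handed it out.
            if age not in self._selected:
                self._selected.add(age)
                return clause
        return None
```

With `age_picks=1` and `weight_picks=2`, every third selection takes the oldest waiting clause, so a heavy clause cannot be postponed forever by a stream of lighter ones.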

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Prover9 has available many strategies; the following statements apply to CASC.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them.) Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, and FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from

    http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Prover9 is the CASC fixed point, against which progress can be judged. Each year it is expected to do worse than the previous year, relative to the other systems.

Refute 2015

Jasmin Blanchette
Inria Nancy, France

Architecture

Refute [Web08] is an open source counterexample generator for Isabelle/HOL [NPW02] based on a SAT solver, and Nitpick's [BN10] precursor.

Strategies

Refute employs a SAT solver to find a finite model of the negated conjecture. The translation from HOL to propositional logic is parameterized by the cardinalities of the atomic types occurring in the conjecture. Refute enumerates the possible cardinalities for each atomic type. If a formula has a finite counterexample, the tool eventually finds it, unless it runs out of resources.
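The enumeration of cardinalities can be sketched as follows. This is an illustration rather than Refute's implementation: the function names, the `max_card` bound, the choice to enumerate by increasing total size, and the `has_model` callback (standing in for the translation to propositional logic plus the SAT solver call) are all assumptions.

```python
from itertools import product


def cardinality_assignments(atomic_types, max_card):
    """Yield dicts mapping each atomic type to a cardinality >= 1,
    ordered by increasing total size, up to max_card per type."""
    n = len(atomic_types)
    for total in range(n, n * max_card + 1):
        for cards in product(range(1, max_card + 1), repeat=n):
            if sum(cards) == total:
                yield dict(zip(atomic_types, cards))


def find_countermodel(atomic_types, has_model, max_card=4):
    """Return the first assignment for which has_model reports a model.

    has_model is a hypothetical callback: it receives one cardinality
    assignment and returns True if a model of that size exists.
    """
    for assignment in cardinality_assignments(atomic_types, max_card):
        if has_model(assignment):
            return assignment
    return None  # no model found within the bound
```

Enumerating by total size means small scopes are tried first, so a small counterexample is found before larger scopes are explored.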

Implementation

Refute, like most of Isabelle/HOL, is written in Standard ML. Unlike Isabelle itself, which adheres to the LCF small-kernel discipline, Refute does not certify its results and must be trusted.

Refute is available as part of Isabelle/HOL for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

We expect Refute to beat Satallax but also to be beaten by Nitpick.

Satallax 2.8

Nik Sultana
Cambridge University, United Kingdom

Architecture

Satallax 2.8 [Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the search for a proof. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. A problem is given in the THF format. A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat as an engine to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates first-order formulas in addition to the propositional clauses. If this option is used, then Satallax periodically calls the first-order theorem prover E to test for first-order unsatisfiability. If the set of first-order formulas is unsatisfiable, then the original branch is unsatisfiable.

Strategies

There are about a hundred flags that control the order in which formulas and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time the mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Satallax 2.7 has a strategy schedule consisting of 68 modes. Each mode is tried for time limits ranging from 0.1 seconds to 54.9 seconds. The strategy schedule was determined through experimentation using the THF problems in version 5.4.0 of the TPTP library.

Implementation

Satallax is implemented in OCaml. A foreign function interface is used to interact with MiniSat 2.2.0. Satallax is available from

    http://mathgate.info/cebrown/satallax/

Expected Competition Performance

Satallax 2.8 is the CASC-25 THF division winner.

Satallax 3.0

Michael Färber
Universität Innsbruck, Austria

Architecture

Satallax 3.0 [Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the proof search. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. Problems are given in the THF format.

Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates first-order formulae in addition to the propositional clauses. If this option is used, then Satallax periodically calls the first-order theorem prover E to test for first-order unsatisfiability. If the set of first-order formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.

Strategies

There are about 140 flags that control the order in which formulae and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time the mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Satallax 3.0 has a strategy schedule consisting of 54 modes (15 of which make use of E). Each mode is tried for time limits ranging from less than a second to about 90 seconds. The strategy schedule was determined through experimentation using the THF problems in version 6.3.0 of the TPTP library.
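A strategy schedule of this kind can be sketched in a few lines. The sketch is not taken from Satallax: the function `run_schedule`, the `run_mode` callback and its return convention, and the overall time budget are hypothetical names and assumptions used only to show the sequential structure.

```python
def run_schedule(schedule, run_mode, overall_limit):
    """Try each mode in order until one succeeds or time runs out.

    schedule      -- list of (mode_name, allotted_seconds) pairs
    run_mode      -- hypothetical callback run_mode(mode_name, seconds)
                     returning (status_or_None, seconds_used)
    overall_limit -- total time budget in seconds
    Returns (mode_name, status) for the first successful mode, else None.
    """
    remaining = overall_limit
    for mode, allotted in schedule:
        if remaining <= 0:
            break
        # Never give a mode more time than is left overall.
        limit = min(allotted, remaining)
        status, used = run_mode(mode, limit)
        if status is not None:
            return mode, status
        remaining -= used
    return None
```

Because modes run one after another, placing short, cheap modes early lets easy problems finish quickly while leaving most of the budget for the longer modes.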

Implementation

Satallax is implemented in OCaml. A foreign function interface is used to interact with MiniSat 2.2.0. Satallax is available at:

    http://satallaxprover.com

Expected Competition Performance

Since 2015, systems are required to return TSTP proofs. Previous versions of Satallax could only construct such proofs if E was not used in the search. Satallax 3.0 can construct a proof when using E. Since some problems are (effectively) only solvable when using E, this should improve performance over last year. In addition, some support for guiding the search using interpretations has been implemented. This is also expected to improve performance.

Vampire 4.0

Giles Reger
University of Manchester, United Kingdom

Architecture

Vampire 4.0 is an automatic theorem prover for first-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-Gen calculus and a MACE-style finite model builder. Splitting in resolution-based proof search is controlled by the AVATAR architecture, which uses a SAT solver to make splitting decisions. Both resolution and instantiation based proof search make use of global subsumption.

A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the Sine axiom selection algorithm.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.0 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
Splitting via AVATAR.
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals.
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.
Ground equational reasoning via congruence closure.
Evaluation of interpreted functions.
Extensionality resolution with detection of extensionality axioms.

Implementation

Vampire 4.0 is implemented in C++.

Expected Competition Performance

Vampire 4.0 is the CASC-25 FOF, FNT, EPR, and LTB division winner.

Vampire 4.1

Giles Reger
University of Manchester, United Kingdom

Architecture

Vampire [KV13] 4.1 is an automatic theorem prover for first-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-Gen calculus [Kor13] and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture [Vor14], which uses a SAT or SMT solver to make splitting decisions. Both resolution and instantiation based proof search make use of global subsumption [Kor13].

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.1 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy [RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
Splitting via AVATAR.
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals.
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.
Ground equational reasoning via congruence closure.
Addition of theory axioms and evaluation of interpreted functions.
Use of Z3 [dMB08] with AVATAR to restrict search to ground-theory-consistent splitting branches.
Extensionality resolution [GK+14] with detection of extensionality axioms.

Implementation

Vampire 4.1 is implemented in C++.

Expected Competition Performance

Vampire 4.1 should be an improvement on Vampire 4.0 and VampireZ3 1.0, which won 5 divisions between them last year. Note that this year there is not a separate VampireZ3 entry as Vampire 4.1 includes Z3.

VampireZ3 1.0

Giles Reger
University of Manchester, United Kingdom