Entrants' System Descriptions

CVC4 1.5.2
E 2.1
ET 0.2
iProver 2.5 (CASC-J8 EPR winner)
iProver 2.6
iProverModulo 2.5-0.1
Isabelle 2016
lean-nanoCoP 1.0
LEO-II 1.7.0
Leo-III 1.1
MaLARea 0.6
Princess 170717
Prover9 2011-11A
Satallax 3.0 (CASC-J8 THF winner)
Satallax 3.2
Scavenger EP-0.1 and EP-0.2
Vampire 4.0 (CASC-J8 FOF and LTB winner)
Vampire 4.1 (CASC-J8 TFA and FNT winner)
Vampire 4.2
Zipperposition 1.1

CVC4 1.5.2

Architecture

CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that includes built-in support for many theories, including linear arithmetic, arrays, bit vectors, datatypes, finite sets and strings. It incorporates approaches for handling universally quantified formulas. For problems involving free function and predicate symbols, CVC4 primarily uses heuristic approaches based on E-matching for theorems, and finite model finding approaches for non-theorems. For problems in pure arithmetic, CVC4 uses techniques for counterexample-guided quantifier instantiation [RD+15]. Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in CVC4 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".

Strategies

For handling theorems, CVC4 primarily uses conflict-based quantifier instantiation [RTd14] and E-matching. CVC4 uses a handful of orthogonal trigger selection strategies for E-matching. For handling non-theorems, CVC4 primarily uses finite model finding techniques. Since CVC4 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well. For problems in pure arithmetic, CVC4 uses variations of counterexample-guided quantifier instantiation [RD+15], which select relevant quantifier instantiations based on models for counterexamples to quantified formulas. CVC4 relies on this method both for theorems in TFA and non-theorems in TFN.

Implementation

CVC4 is implemented in C++. The code is available from:

    https://github.com/CVC4

Expected Competition Performance

For TFA, CVC4 should perform better than last year due to its use of new heuristic techniques for non-linear real and integer arithmetic [RT+17]. For FOF, it should perform slightly better due to improvements in the implementation of E-matching and several optimizations related to conflict-based instantiation [BFR17]. It should perform roughly the same in the FNT division as last year.

E 2.1

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E 2.1 [Sch02,Sch13] is a purely equational theorem prover for many-sorted first-order logic with equality. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution.

For the SLH and LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will probably not use on-the-fly learning this year.

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause selection heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause selection heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

For CASC-26, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses,...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy. About 220 different strategies have been evaluated on all untyped first-order problems from TPTP 6.4.0. About 90 of these strategies are used in the automatic mode, and about 210 are used in at least one schedule.

Implementation

E is build around perfectly shared terms, i.e. each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) is supported using feature vector indexing [Sch04]. Superposition and backward rewriting use fingerprint indexing [Sch12], a new technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loe06,Loe06]. The prover and additional information are available at

    http://www.eprover.org

Expected Competition Performance

E 2.1 has slightly better strategies than previous versions, and has some minor improvements in clausification and Set-of-Support implementation. The system is expected to perform well in most proof classes, but will at best complement top systems in the disproof classes.

ET 0.2

Josef Urban
Czech Technical University in Prague, Czech Republic

Architecture

ET [KS+15] 0.2 is a metasystem using E prover with specific strategies [Urb13,KU13,JU17] and preprocessing tools [KU13,KU13,KU13] that are targeted mainly at problems with many redundant axioms. Its design is motivated by the recent experiments in the Large-Theory Batch division [KUV15] and on the Flyspeck, Mizar and Isabelle datasets, however, ET does no learning from related proofs.

Strategies

We characterize formulas by the symbols and terms that they contain, normalized in various ways. Then we run various algorithms that try to remove the redundant axioms and use special strategies on such problems.

Implementation

The metasystem is implemented in ca. 1000 lines of Perl. It uses a number of external programs, some of them based on E's code base, some of them independently implemented in C++.

Expected Competition Performance

ET can solve some problems that E 1.8 cannot prove, and even some TPTP problems with rating 1.00. The CASC performance should not be much worse than that of E, possibly better, depending on problem selection.

iProver 2.5

Kontantin Korovin
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [GK03, Kor13] which is complete for first-order logic. iProver combines first-order reasoning with ground reasoning for which it uses MiniSat [ES04] and optionally PicoSAT [Bie08] (only MiniSat will be used at this CASC). iProver also combines instantiation with ordered resolution; see [Kor08, Kor13] for the implementation details. The proof search is implemented using a saturation process based on the given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed vector index for subsumption and subsumption resolution (both forward and backward). The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04, Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. A compressed feature vector index is used for efficient forward/backward subsumption and subsumption resolution. Equality is dealt with (internally) by adding the necessary axioms of equality. Recent changes in iProver include improved preprocessing and incremental finite model finding; support of the AIG format for hardware verification and model-checking (implemented with Dmitry Tsarkov).

In the LTB division, iProver uses axiom selection based on the Sine algorithm [HV11] as implemented in Vampire [KV13], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Some of iProver features are summarised below.

proof extraction for both instantiation and resolution [KS12],
model representation, using first-order definitions in term algebra [KS12],
answer substitutions,
semantic filtering,
incremental finite model finding,
sort inference, monotonic [CLS11] and non-cyclic [Kor13] sorts,
predicate elimination [KK16].

Sort inference is targeted at improving finite model finding and symmetry breaking. Semantic filtering is used in preprocessing to eliminated irrelevant clauses. Proof extraction is challenging due to simplifications such global subsumption which involve global reasoning with the whole clause set and can be computationally expensive.

Strategies

iProver has around 60 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth. For the LTB and FNT divisions several strategies are run in parallel.

Implementation

Prover is implemented in OCaml and for the ground reasoning uses MiniSat [ES04]. iProver accepts FOF and CNF formats. Vampire [KV13, HK+12] and E prover [Sch13] are used for proof-producing clausification of FOF problems, Vampire is also used for axiom selection [HV11] in the LTB division.

iProver is available at:

    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

iProver 2.5 is the CASC-J8 THF division winner.

iProver 2.6

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [GK03,Kor13] which is complete for first-order logic. iProver combines first-order reasoning with ground reasoning for which it uses MiniSat [ES04] and optionally PicoSAT [Bie08] (only MiniSat will be used at this CASC). iProver also combines instantiation with ordered resolution; see [Kor08,Kor13] for the implementation details. The proof search is implemented using a saturation process based on the given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed vector index for subsumption and subsumption resolution (both forward and backward). The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04,Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. A compressed feature vector index is used for efficient forward/backward subsumption and subsumption resolution. Equality is dealt with (internally) by adding the necessary axioms of equality. Recent changes in iProver include improved preprocessing and incremental finite model finding; support for the TFF format restricted to clauses; the AIG format for hardware verification and QBF reasoning.

In the LTB and SLH divisions, iProver combines an abstraction-refinement framework [HK17] with axiom selection based on the SinE algorithm [HV11] as implemented in Vampire [KV13], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Some of iProver features are summarised below.

proof extraction for both instantiation and resolution [KS12],
model representation, using first-order definitions in term algebra [KS12],
answer substitutions,
semantic filtering,
incremental finite model finding,
sort inference, monotonic [CLS11] and non-cyclic [Kor13] sorts,
support for the TFF format restricted to clauses,
predicate elimination [KK16].

Strategies

iProver has around 60 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth. For the LTB, SLH and FNT divisions several strategies are run in parallel.

Implementation

iProver is implemented in OCaml and for the ground reasoning uses MiniSat [ES04]. iProver accepts FOF, TFF and CNF formats. Vampire [KV13,HK+12] and E prover [Sch13] are used for proof-producing clausification of FOF/TFF problems, Vampire is also used for axiom selection [HV11] in the LTB/SLH divisions. iProver is available at:

    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

Compared to the last year, we integrated an abstraction-refinement framework [HK17] which we expect to improve performance in the LTB and SLH divisions. There are a several general improvements that should positively affect overall performance.

iProverModulo 2.5-0.1

Guillaume Burel
ENSIIE, University Paris‑Saclay, France

Architecture

iProverModulo [Bur11] is an extension of iProver [Kor08] to integrate Polarized resolution modulo [Dow10]. Polarized resolution modulo consists in presenting the theory in which the problem has to be solved by means of polarized rewriting rules instead of axioms. It can also be seen as a combination of the set-of-support strategy and selection of literals.

iProverModulo consists of two tools: First, autotheo is a theory preprocessor that converts the axioms of the input into rewriting rules that can be used by Polarized resolution modulo. Second, these rewriting rules are handled by a patched version of iProver 2.5 that integrates Polarized resolution modulo. The integration of polarized resolution modulo in iProver only affects its ordered resolution calculus, so that the instantiation calculus is untouched.

iProverModulo 2.5+0.1 outputs a proof that is made of two parts: First, autotheo prints a derivation of the transformation of the axioms into rewriting rules. This derivation is in TSTP format and includes the CNF conversions obtained from E. Second, the modified version of iProver outputs a proof in TSTP format from this set of rewriting rules and the other input formulas.

Strategies

Autotheo is first run to transform the formulas of the problem whose role is "axiom" into polarized rewriting rules. Autotheo offers a set of strategies to that purpose. For the competition, the Equiv and the ClausalAll strategies will be used. The former strategy orients formulas intuitively depending of their shape. It may be incomplete, so that the prover may give up in certain cases. However, it shows interesting results on some problems. The second strategy should be complete, at least when equality is not involved. The rewriting system for the first strategy is tried for half the time given for the problem, then the prover is restarted with the second strategy if no proof has been found.

The patched version of iProver is run on the remaining formulas modulo the rewriting rules produced by autotheo. No scheduling is performed. To be compatible with Polarized resolution modulo, literals are selected only when they are maximal w.r.t. a KBO ordering, and orphans are not eliminated. To take advantage of Polarized resolution modulo, the resolution calculus is triggered more often than the instantiation calculus, on the contrary to the original iProver.

Normalization of clauses w.r.t. the term rewriting system produced by autotheo is performed by transforming these rules into an OCaml program, compiling this program, and dynamically linking it with the prover.

Implementation

iProverModulo is available as a patch to iProver. The most important additions are the plugin-based normalization engine and the handling of polarized rewriting rules. iProverModulo is available from

    http://www.ensiie.fr/~guillaume.burel/blackandwhite_iProverModulo.html.en

Since iProverModulo needs to compile rewriting rules, an OCaml compiler is also provided.

Autotheo is available independently from iProverModulo from

    http://www.ensiie.fr/~guillaume.burel/blackandwhite_autotheo.html.en

Autotheo uses E to compute clausal normal form of formula. The version of E it uses is very slightly modified to make it print the CNF derivation even if no proof is found.

Both of autotheo and iProver are written in OCaml.

For the SLD division, iProverModulo uses the CNF transformation tool provided with the Logtk library [Cru14].

Expected Competition Performance

Although iProverModulo is now based on version 2.5 of iProver, no great improvement of performance is expected compared to CASC-25, since only the resolution part of iProver, which is relatively stable, has been modified.

Isabelle 2016

Jasmin Blanchette
Vrije Universiteit Amsterdam, Netherlands

Architecture

Isabelle/HOL 2016 [NPW02] is the higher-order logic incarnation of the generic proof assistant Isabelle2016. Isabelle/HOL provides several automatic proof tactics, notably an equational reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99]. It also integrates external first- and higher-order provers via its subsystem Sledgehammer [PB10,BBP11]. Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due to Nik Sultana. It also includes TPTP versions of its popular tools, invokable on the command line as isabelle tptp_tool max_secs file.p. For example:

isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p

Isabelle is available in two versions. The HOT version (which is not participating in CASC-J8) includes LEO-II [BP+08] and Satallax [Bro12] as Sledgehammer backends, whereas the competition version leaves them out.

Strategies

The Isabelle tactic submitted to the competition simply tries the following tactics sequentially:

sledgehammer

Invokes the following sequence of provers as oracles via Sledgehammer:

satallax - Satallax 2.7 [Bro12] (HOT version only);
leo2 - LEO-II 1.6.2 [BPTF08] (HOT version only);
spass - SPASS 3.8ds [BP+12];
vampire - Vampire 3.0 (revision 1435) [RV02];
e - E 1.8 [Sch04];

nitpick

For problems involving only the type $o of Booleans, checks whether a finite model exists using Nitpick [BN10].

simp

Performs equational reasoning using rewrite rules [Nip89].

blast

Searches for a proof using a fast untyped tableau prover and then attempts to reconstruct the proof using Isabelle tactics [Pau99].

auto+spass

Combines simplification and classical reasoning [PN94] under one roof; then invoke Sledgehammer with SPASS on any subgoals that emerge.

z3

Invokes the SMT solver Z3 4.4.0 [dMB08].

cvc4

Invokes the SMT solver CVC4 1.5pre [BT07].

fast

Searches for a proof using sequent-style reasoning, performing a depth-first search [PN94]. Unlike blast, it construct proofs directly in Isabelle. That makes it slower but enables it to work in the presence of the more unusual features of HOL, such as type classes and function unknowns.

best

Similar to fast, except that it performs a best-first search.

force

Similar to auto, but more exhaustive.

meson

Implements Loveland's MESON procedure [Lov78]. Constructs proofs directly in Isabelle.

fastforce

Combines fast and force.

Implementation

Isabelle is a generic theorem prover written in Standard ML. Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The HOL object logic extends pure with a more elaborate version of higher-order logic, complete with the familiar connectives and quantifiers. Other object logics are available, notably FOL (first-order logic) and ZF (Zermelo–Fraenkel set theory).

The implementation of Isabelle relies on a small LCF-style kernel, meaning that inferences are implemented as operations on an abstract theorem datatype. Assuming the kernel is correct, all values of type theorem are correct by construction.

Most of the code for Isabelle was written by the Isabelle teams at the University of Cambridge and the Technische Universität München. Isabelle/HOL is available for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

I expect we will end up in second place (excluding proof output), behind Satallax, since we haven't upgraded the system since 2016. We will be back, hopefully in 2018!

lean-nanoCoP 1.0

Jens Otten
University of Oslo, Norway

Architecture

lean-nanoCoP is an automated theorem prover for classical first-order logic with equality. It combines the provers leanCoP [OB03,Ott08] and nanoCoP [Ott16], which are very compact implementations of the clausal connection calculus [Bib87] and the non-clausal connection calculus [Ott11], respectively.

Strategies

The reduction rule of the connection calculus is applied before the extension rule. Open branches are selected in a depth-first way. Iterative deepening on the proof depth is performed in order to achieve completeness. Additional inference rules and techniques include regularity, lemmata, and restricted backtracking [Ott10]. leanCoP uses an optimized structure-preserving transformation into clausal forms [Ott10]. The fixed strategy scheduling, which is controlled by a shell script, invokes the leanCoP and nanoCoP core provers.

Implementation

leanCoP and nanoCoP are implemented in Prolog. The source code of the core provers consists only of a few lines of code. Prolog's built-in indexing mechanism is used to quickly find connections when the extension rule is applied.

lean-nanoCoP can read formulae in leanCoP syntax and in TPTP first-order syntax. The leanCoP and nanoCoP core provers return very compact connection proofs; leanCoP translates its proof into a more readable output format.

The source codes of leanCoP and nanoCoP are available under the GNU general public license. They can be downloaded from the leanCoP and nanoCoP websites at

    http://www.leancop.de

and

    http://www.leancop.de/nanocop

The leanCoP website also contains information about ileanCoP [Ott08] and MleanCoP [Ott14], two versions of leanCoP for first-order intuitionistic logic and several first-order modal logics, respectively. Recently, versions of nanoCoP for these logics have been developed as well [Ott17].

Expected Competition Performance

For problems that are in a "strong" non-clausal form, the combination of leanCoP and nanoCoP are expected to perform better than the leanCoP prover by itself.

LEO-II 1.7.0

Alexander Steen
Freie Universität Berlin, Germany

Architecture

LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations of its (standard) resolution proof search loop (e.g., 10). If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.

Strategies

LEO-II employs an adapted "Otter loop". Moreover, LEO-II uses some basic strategy scheduling to try different search strategies or flag settings. These search strategies also include some different relevance filters.

Implementation

LEO-II is implemented in OCaml 4, and its problem representation language is the TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10]. LEO-II's parser supports the TPTP THF0 language and also the TPTP languages FOF and CNF.

Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

    http://www.leoprover.org

Expected Competition Performance

LEO-II ist not actively being developed anymore, hence there are no expected improvements to last year's CASC results.

Leo-III 1.1

Alexander Steen
Freie Universität Berlin, Germany

Architecture

Leo-III [SWB16], the successor of LEO-II [BP+08], is a higher-order ATP system based on higher-order paramodulation with inference restrictions using a higher-order term ordering. Since Leo-III employs a agent-based blackboard architecture, multiple independent proof search approaches can be run in parallel as so-called agents. In version 1.1, each agent runs a sequential proof search based on the given-clause algorithm as known from E, each with different search strategy.

Leo-III heavily relies on cooperation with external (first-order) ATPs that are called asynchronously during proof search. At the moment, first-order cooperation is limited to typed first-order systems, where CVC4 [BC+11] is used as default external system. Nevertheless, further external systems (also further higher-order systems or counter model generators) can be employed using command-line arguments. If either one of the saturation procedure loops or one of the external provers finds a proof, the system stops and returns the result.

Strategies

Leo-III runs multiple search strategies in parallel using its agent-based architecture. The search strategies differ in the employed relevance filter parameters, inference parameters, pre-processing techniques and hence the considered formula set. The available portfolio of strategies also contains incomplete approaches that might outperform default search strategies for some problem input classes.

Implementation

Leo-III exemplarily utilizes and instantiates the associated LeoPARD system platform [WSB15] for higher-order (HO) deduction systems implemented in Scala (currently using Scala 2.12). The prover makes use of LeoPARD’s sophisticated data structures and implements its own reasoning logic on top, e.g. as agents in LeoPARD’s provided blackboard architecture [BSW17].

A generic parser is provided that supports all TPTP syntax dialects. It is implemented using ANTLR4 and converts its produced concrete syntax tree to an internal TPTP AST data structure which is then transformed into polymorphically typed lambda terms. As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF, TFF, THF) including their polymorphic variants [BP13,KRS16].

The term data structure of Leo-III uses a spine term representation augmented with explicit substitutions and De Bruijn-indices. Furthermore, terms are perfectly shared during proof search, permitting constant-time equality checks between alpha-equivalent terms.

As pointed out before, Leo-III’s agents may at any point invoke external reasoning tools. To that end, Leo-III includes an encoding module that translates (polymorphic) higher-order clauses to polymorphic and monomorphic typed first-order clauses. While LEO-II relied on cooperation with untyped first-order provers, we hope to reduce clutter and therefore achieve better results using native type support in first-order provers.

Leo-III 1.1 will be available on GitHub after CASC-26:

    https://github.com/cbenzmueller/Leo-III

Expected Competition Performance

In contrast to its last version 1.0 (competed at CASC-J8), Leo-III 1.1 has been improved in several aspects. Due to the novel cooperation schemes with typed first-order provers, we strongly expect better results compared to its predecessor LEO-II.

MaLARea 0.6

Josef Urban
Czech Technical University in Prague, Czech Republic

Architecture

MaLARea 0.6 [Urb07,US+08,KUV15] is a metasystem for ATP in large theories where symbol and formula names are used consistently. It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace), as well as complementary AI techniques like machine learning (the SNoW system) based on symbol-based similarity, model-based similarity, term-based similarity, and obviously previous successful proofs. The version for CASC-26 will use the E prover with the BliStr(Tune) [Urb13,JU17] large-theory strategies, possibly also Prover9, Mace and Paradox. The premise selection methods will likely also use the distance-weighted k-nearest neighbor [KU13] and E's implementation of SInE.

Strategies

The basic strategy is to run ATPs on problems, then use the machine learner to learn axiom relevance for conjectures from solutions, and use the most relevant axioms for next ATP attempts. This is iterated, using different timelimits and axiom limits. Various features are used for learning, and the learning is complemented by other criteria like model-based reasoning, symbol and term-based similarity, etc.

Implementation

The metasystem is implemented in ca. 2500 lines of Perl. It uses many external programs - the above mentioned ATPs and machine learner, TPTP utilities, LADR utilities for work with models, and some standard Unix tools. MaLARea is available at:

    https://github.com/JUrban/MPTP2/tree/master/MaLARea

The metasystem's Perl code is released under GPL2.

Expected Competition Performance

Thanks to machine learning, MaLARea is strongest on batches of many related problems with many redundant axioms where some of the problems are easy to solve and can be used for learning the axiom relevance. MaLARea is not very good when all problems are too difficult (nothing to learn from), or the problems (are few and) have nothing in common. Some of its techniques (selection by symbol and term-based similarity, model-based reasoning) could however make it even there slightly stronger than standard ATPs. MaLARea has a very good performance on the MPTP Challenge, which is a predecessor of the LTB division, and on several previous LTB competitions.

Princess 170717

Philipp Rümmer
Uppsala University, Sweden

Architecture

Princess [Rue08,Rue12] is a theorem prover for first-order logic modulo linear integer arithmetic. The prover uses a combination of techniques from the areas of first-order reasoning and SMT solving. The main underlying calculus is a free-variable tableau calculus, which is extended with constraints to enable backtracking-free proof expansion, and positive unit hyper-resolution for lightweight instantiation of quantified formulae. Linear integer arithmetic is handled using a set of built-in proof rules resembling the Omega test, which altogether yields a calculus that is complete for full Presburger arithmetic, for first-order logic, and for a number of further fragments. In addition, some built-in procedures for nonlinear integer arithmetic are available.

The internal calculus of Princess only supports uninterpreted predicates; uninterpreted functions are encoded as predicates, together with the usual axioms. Through appropriate translation of quantified formulae with functions, the e-matching technique common in SMT solvers can be simulated; triggers in quantified formulae are chosen based on heuristics similar to those in the Simplify prover.

Strategies

For CASC, Princess will run a fixed schedule of configurations for each problem (portfolio method). Configurations determine, among others, the mode of proof expansion (depth-first, breadth-first), selection of triggers in quantified formulae, clausification, and the handling of functions. The portfolio was chosen based on training with a random sample of problems from the TPTP library.

Implementation

Princess is entirely written in Scala and runs on any recent Java virtual machine; besides the standard Scala and Java libraries, only the Cup parser library is used.

Princess is available from:

    http://www.philipp.ruemmer.org/princess.shtml

Expected Competition Performance

Princess should perform roughly as in the last years. Compared to last year, the support for outputting proofs was extended, and should now cover all relevant theory modules for CASC (but not yet all proof strategies).

Prover9 2009-11A

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Prover9 has available many strategies; the following statements apply to CASC.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them). Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from

    http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Prover9 is the CASC fixed point, against which progress can be judged. Each year it is expected do worse than the previous year, relative to the other systems.

Satallax 3.0

Michael Färber
Universität Innsbruck, Austria

Architecture

Satallax 3.0 [Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the proof search. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. Problems are given in the THF format.

Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates first-order formulae in addition to the propositional clauses. If this option is used, then Satallax periodically calls the first-order theorem prover E to test for first-order unsatisfiability. If the set of first-order formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.

Strategies

There are about 140 flags that control the order in which formulae and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time the mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Satallax 3.0 has a strategy schedule consisting of 54 modes (15 of which make use of E). Each mode is tried for time limits ranging from less than a second to about 90 seconds. The strategy schedule was determined through experimentation using the THF problems in version 6.3.0 of the TPTP library.

Implementation

Satallax is implemented in OCaml. A foreign function interface is used to interact with MiniSat 2.2.0 Satallax is available at:

    http://satallaxprover.com

Expected Competition Performance

Satallax 3.0 is the CASC-J8 THF division winner.

Satallax 3.2

Michael Färber
Universität Innsbruck, Austria

Architecture

Satallax 3.2 [Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the proof search. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. Problems are given in the THF format.

Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates first-order formulae in addition to the propositional clauses. If this option is used, then Satallax periodically calls the first-order theorem prover E [Sch13] to test for first-order unsatisfiability. If the set of first-order formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format. The proof reconstruction has been significantly changed since Satallax 3.0 in order to make proof reconstruction more efficient and thus less likely to fail within the time constraints.

Strategies

There are about 150 flags that control the order in which formulae and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time the mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Before deciding on the schedule to use, Satallax parses the problem and determines if it is big enough that a SInE-based premise selection algorithm [HV11] should be used. If SInE is not activated, then Satallax 3.2 uses a strategy schedule consisting of 37 modes. Each mode is tried for time limits ranging from less than a second to just over 1 minute. If SInE is activated, than Satallax is run with a SInE-specific schedule consisting of 20 possible SInE parameter values selecting different premises and some corresponding modes and time limits.

Implementation

Satallax is implemented in OCaml, making use of the external tools MiniSat (via a foreign function interface) and E. Satallax is available at:

    http://satallaxprover.com

Expected Competition Performance

The addition of a SInE-like procedure for premise selection means Satallax should be able to solve some large problems that were previously out of reach. In addition, the changes to the way TSTP proofs are generated should mean that proofs are more likely to be constructed and reported after a proof has been found. We hope that this will be reflected in an improved performance over Satallax 3.0 from last year.

Scavenger EP-0.1 and EP-0.2

Bruno Woltzenlogel Paleo
Australian National University, Australia
Daniyar Itegulov, Uladzislau Padtsiolkin
ITMO University, Russia

Architecture

Scavenger [ISW17] is a theorem prover based on the new Conflict Resolution calculus [SW17]. At the proof-theoretical level, Conflict Resolution (CR) is a generalization of the conflict-driven clause learning (CDCL) principle to first-order logic. CR derivations are isomorphic to implication graphs (as maintained by SAT-solvers): every unit-propagating resolution inference corresponds to a new propagated literal in the graph; every assumption/decision corresponds to a decision literal in the graph; and every conflict inference corresponds to a conflict in the graph. CR's clause learning inference learns a disjunction of negations of instances of the decision literals that are ancestors of the conflict, using the compositions of the unifiers on the paths from the decisions to the conflict. From a natural deduction point of view, CR's clause learning inference rule generalizes implication/negation introduction by taking unification into account and considering several assumptions at once [Wol16]. In this sense, it does to Gentzen's implication/negation introduction what Robinson's resolution did to implication elimination (a.k.a. modus ponens).

The architecture of Scavenger attempts to be similar to the architecture of SAT-solvers, but data structures typically used in sat-solvers (e.g., Two-Watched-Literals) cannot be easily and efficiently generalized to first-order logic. Because of that, Scavenger's architecture also has a "taste" of saturation. For example, whereas in a SAT-solver propagation causes a literal to be assigned (either true or false), in Scavenger, propagation often requires generation of an instance of a literal, and it is this generated instance that is assigned.

Strategies

Proof search in the Conflict Resolution calculus presents unique challenges. For example, in contrast to what happens in the propositional case, unit-propagation may not terminate. Scavenger is an experimental prover, and such challenges have been dealt with in various ways [ISW17]. Scavenger-EP-0.1 is one of the three versions evaluated in [ISW17]. It simply ignores the non-termination of unit-propagation (and hence is incomplete). (Scavenger-TD-0.1 and Scavenger-PD-0.1 maintain completeness by iteratively deepening the propagation and making decisions eagerly. However, on TPTP problems they did not perform as well as Scavenger-EP-0.1, and therefore will not participate in CASC this year.)

Scavenger-EP-0.2 extends Scavenger-EP-0.1 from CNF without equality to FOF with equality. However, equality reasoning is done in a naive way: (instances of) equality axioms are added to the problem when needed. Scavenger-EP-0.2 also implements a VSIDS heuristic for decision literal selection and optimizes unification and some data structures.

Implementation

Scavenger is implemented in Scala and runs on the Java Virtual Machine. Terms and formulas are simply typed lambda terms. Clauses are two-sided sequents (pairs of lists of positive and negative atomic formulas). Inference rules are classes with assertions that ensure their soundness. A hashmap is used to allow quicker detection of propagating clauses (in an attempt to generalize two-watched-literals to first-order logic). Scavenger is available at:

    https://gitlab.com/aossie/Scavenger/

Expected Competition Performance

Both competing versions of Scavenger are expected to perform better on effectively propositional problems (where the non-termination of unit-propagation is not an issue) than on problems that are not in this fragment. Scavenger-EP-0.1 has been evaluated in [ISW17], and similar performance is expected in CASC. Scavenger-EP-0.2 has not been thoroughly evaluated yet. It is hoped that it will perform better than Scavenger-EP-0.1.

Vampire 4.0

Giles Reger
University of Manchester, United Kingdom

Architecture

Vampire 4.0 is an automatic theorem prover for first-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder. Splitting in resolution-based proof search is controlled by the AVATAR architecture, which uses a SAT solver to make splitting decisions. Both resolution and instantiation based proof search make use of global subsumption.

A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the Sine axiom selection algorithm.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.0 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
Splitting via AVATAR
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals.
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.
Ground equational reasoning via congruence closure.
Evaluation of interpreted functions.
Extensionality resolution with detection of extensionality axioms

Implementation

Vampire 4.0 is implemented in C++.

Expected Competition Performance

Vampire 4.0 is the CASC-J8 FOF and LTB division winner.

Vampire 4.1

Giles Reger
University of Manchester, United Kingdom

Architecture

Vampire [KV13] 4.1 is an automatic theorem prover for first-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus [Kor13] and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture [Vor14] which uses a SAT or SMT solver to make splitting decisions. Both resolution and instantiation based proof search make use of global subsumption [Kor13].

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.1 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy [RV03].
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
Splitting via AVATAR
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals.
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.
Ground equational reasoning via congruence closure.
Addition of theory axioms and evaluation of interpreted functions.
Use of Z3 [dMB08] with AVATAR to restrict search to ground-theory-consistent splitting branches.
Extensionality resolution [GK+14] with detection of extensionality axioms.

Implementation

Vampire 4.1 is implemented in C++.

Expected Competition Performance

Vampire 4.0 is the CASC-J8 TFA and FNT division winner.

Vampire 4.2

Giles Reger
University of Manchester, United Kingdom

Architecture

Vampire [KV13] 4.2 is an automatic theorem prover for first-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14,RB+16]. Both resolution and instantiation based proof search make use of global subsumption.

Strategies

Vampire 4.2 provides a very large number of options for strategy selection. The most important ones are:

Choices of saturation algorithm:
- Limited Resource Strategy [LRS]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
Splitting via AVATAR [Vor14]
A variety of optional simplifications.
Parameterized reduction orderings.
A number of built-in literal selection functions and different modes of comparing literals [HR+16].
Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.
Set-of-support strategy.
Ground equational reasoning via congruence closure.
Addition of theory axioms and evaluation of interpreted functions.
Use of Z3 with AVATAR to restrict search to ground-theory-consistent splitting branches [RB+16].
Specialised theory instantiation and unification
Extensionality resolution with detection of extensionality axioms

Implementation

Vampire 4.2 is implemented in C++. It makes use of minisat and z3.

Expected Competition Performance

Vampire 4.2 should be an improvement on Vampire 4.1, which won 4 divisions last year. Most changes have happened in parts relevant to TFA, some small changes in parts relevant to model building (EPR and FNT), and some general improvements to preprocessing that will effect all tracks.

Zipperposition 1.1

Simon Cruanes
Inria Nancy, France

Architecture

Zipperposition is a superposition-based theorem prover for typed first-order logic with equality. It features a number of extensions that include polymorphic types; linear arithmetic on integers and rationals using a specialized set of first-order inference rules; datatypes with structural induction; user-defined rewriting on terms and formulas ("deduction modulo theories"); a lightweight variant of AVATAR for boolean case splitting; first-class booleans [KK+16]; and (very experimental) support for higher-order logic, extending first-order rules to higher-order terms using a customized variant of pattern unification. The core architecture of the prover is based on saturation with an extensible set of rules for inferences and simplifications. The initial calculus and main loop were imitations of an old version of E [Sch02], but there are many more rules nowadays. A summary of the calculus for integer arithmetic and induction can be found in [Cru15].

Strategies

The system does not feature advanced strategies: only one saturation loop with pick-given ratio and clause selection heuristics is used. No tuning specific to CASC was made.

Implementation

The prover is implemented in OCaml, and has been around for five years. Term indexing is done using discrimination trees (non perfect ones for unification, perfect ones for rewriting) as well as feature vectors for subsumption. Some inference rules such as contextual literal cutting make heavy use of subsumption. The code can be found at

    https://github.com/c-cube/zipperposition

and is entirely free software (BSD-licensed). Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Expected Competition Performance

The prover is expected to have decent performance on first-order theorems, hopefully beating prover9; decent performance in arithmetic (ignoring the lack of real arithmetic), but behind more sophisticated provers such as Vampire or CVC4; and poor performance on THF problems except those that only require first-class booleans.