Entrants' System Descriptions


agsyHOL 1.0

Fredrik Lindblad
University of Gothenburg, Sweden

Architecture

AgsyHOL 1.0 [
agsyHOL] is based on intuitionistic sequent calculus extended with rules for classical and equality reasoning. There is an introduction and elimination rule for each logical construct, plus rules for RAA, AC and extensionality. There are special inference modes for elimination and equality reasoning.

A backward search is applied to find proof derivations in the calculus. Search is controlled by locally assigning priorities to different kind of constraints, and refining that part of the half-finished proof which is blocking a constraint with the highest priority.

Strategies

The system has no set of strategies it chooses from by pre-analysing the problem. What it does do before starting the proof search is to try to minimise the need for classical reasoning by transforming the problem using double negation elimination and the de Morgan laws.

Implementation

The system is implemented in Haskell. It consists of a proof checker for the calculus. The proof checker is annotated with search control information controlling priorities of constraints and costs of choices. The system also consists of an implementation of lazy narrowing, a general purpose search mechanism, which is applied to the proof checker in order to achieve proof search. The lazy narrowing search has been extended in order to deal with customizable priorities and costs. agsyHOL is available from:
    https://github.com/frelindb/agsyHOL/

Expected Competition Performance

On SystemOnTPTP it solves 1722 of the THF problems (TPTP version 6.0.0).


Beagle 0.9

Peter Baumgartner, Josh Bax
NICTA and Australian National University, Australia

Architecture

Beagle is an automated theorem prover for sorted first-order logic with equality over built-in theories. The theories currently supported are integer arithmetic, linear rational arithmetic and linear real arithmetic. It accepts formulas in the FOF and TFF formats of the TPTP syntax, and formulas in the SMT-LIB version 2 format.

Beagle first converts the input formulas into clause normal form. Pure arithmetic (sub-)formulas are treated by eager application of quantifier elimination. The core reasoning component implements the Hierarchic Superposition Calculus with Weak Abstraction (HSPWA) [BW13]. Extensions are a splitting rule for clauses that can be divided into variable disjoint parts, and a chaining inference rule for reasoning with inequalities.

The HSPWA calculus generalizes the superposition calculus by integrating theory reasoning in a black-box style. For the theories mentioned above, Beagle combines quantifier elimination procedures and other solvers to dispatch proof obligations over these theories. The default solvers are an improved version of Cooper's algorithm for linear integer arithmetic, and the Fourier-Motzkin algorithm for linear real/rational arithmetic. Non-linear integer arithmetic is treated by partial instantiation.

Strategies

Beagles uses the Discount loop for saturating a clause set under the calculus' inference rules. Simplification techniques include standard ones, such as subsumption deletion, demodulation by ordered unit equations, and tautology deletion. It also includes theory specific simplification rules for evaluating ground (sub)terms, and for exploiting cancellation laws and properties of neutral elements, among others. In the competition an aggressive form of arithmetic simplification is used, which seems to perform best in practice.

Beagle uses strategy scheduling by trying (at most) two flag settings sequentially.

Implementation

Beagle is implemented in Scala. It is a full implementation of the HSPWA calculus. It uses a simple form of indexing, essentially top-symbol hashes, stored with each term and computed in a lazy way. Fairness is achieved through a combination of measuring clause weights and their derivation-age. It can be fine-tuned with a weight-age ratio parameter, as in other provers.

Beagle's web site is

    http://users.cecs.anu.edu.au/~baumgart/systems/beagle/

Expected Competition Performance

Beagle should perform reasonably well.


cocATP 0.2.0

Cristóbal Camarero
University of Cantabria, Spain

Architecture

cocATP is a Coq-inspired [
BC04] automated theorem prover made on the free time of the author. It implements (the non-inductive) part of Coq's logic (calculus of constructions) and syntax. The proof terms it creates are accepted as proofs by Coq by the addition of a few definitions (substituting the respective inductive definitions of Coq). As in Coq, equality and logical connectives other than implication (which is a dependent product of the logic) are defined adding the proper axioms. The reasoning is tried to be done the more general possible, avoiding to give any special treatment to both equality and logical connectives.

At difference of most of the other provers, cocATP does not rely on SAT or first-order solver. All reasoning is done in the high-order logic that is the calculus of constructions.

Strategies

The first of the rules is: for a node with conjecture (forall x:T, P x) create a new node with hypothesis (x:T) and conecture (P x). Other of the major rules consist in: when having a node with conjecture (C) and some hypothesis (H:forall (x1:T1) (x2:T2) ... (xn:Tn), P x1 x2 ... xn), to unificate P with C and prove the Ti necessary. The unification is a subset of the possible high-order unifications. And in this case is a one-sided unification. There is a lot of other ad-hoc rules and conditions to apply them. Some of the rules can create existential terms (x:=??:T) alike to Coq's tactic evar. With these rules resulting proofs are very similar to human-made proofs.

There is a partially implemented second strategy more similar to the ones found in other theorem provers. It consist a saturation algorithm with the high-order modus_ponens. As example, given a hypothesis (P) and another (or the same) hypothesis (forall x:T, Q x), unficate P with T generating a new hypothesis Q. This generalizes the typical saturation rules, as resolution and superposition. The strategy yet lacks a good conditions for subsumption and simplifcation. Probably it will not be used in competition.

Implementation

cocATP is implemented in Python-2.7 and C. It uses the Ply-3.4 library to build the parsers for both the Coq and TPTP syntaxes. Recently, the processing core has been reimplemented in C as a Python module, and a Python variable controls if using a pure Python or the C version. Using the C module can be an order of magnitude faster, but it is less tested and introduces the possibility of SEGFAULTS.

Includes a type-verifier of Calculus of Constructions without inductive constructions, which must be defined with axioms. That is, a buggy partial clone of Coq. There is support for most of the TPTP syntax, problems are translated to a set of calculus of constructions terms. cocATP has been specially prepared for THF, but all the other TPTP formulae without numbers should be accepted. However cocATP does NOT include a SAT solver, thus it will probably not solve your trivial CNF problems.

More info at:

    http://www.alumnos.unican.es/ccc66/cocATP.htm

Expected Competition Performance

With the reimplementation of the kernel in C it is expected a great reduction in the execution time, but not a lot of newly solved problems.


Crossbow 0.1

Radek Micek
Charles University in Prague, Czech Republic

Architecture

Crossbow 0.1 is a MACE2-style finite model finder similar to Paradox. It is an implementation of techniques from [
CS03] plus improved flattening and built-in support for commutativity.

Strategies

Same strategy is used for all problems:
  1. First order formulas are clausified by another prover.
  2. Additional lemmas are generated by another prover, small lemmas are added to the clauses of the problem.
  3. Clauses of the problem are preprocessed (simplification, flattening, splitting, term definitions, commutativity detection).
  4. Clauses are instantiated for increasing domain sizes until some model is found. Instantition takes into account commutativity of functions and symmetry of predicates.

Implementation

Crossbow is implemented in OCaml. Clausification and lemma generation is done by E. MiniSat is used for incremental SAT solving. Source code is available from:
    https://github.com/radekm/crossbow

Expected Competition Performance

It should perform slightly better than Paradox.


CVC4 1.4

Andrew Reynolds
EPFL, Switzerland

Architecture

CVC4 [
BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that includes built-in support for many theories including linear arithmetic, arrays, bit vectors, datatypes and strings. It incorporates various approaches for handling universally quantified formulas. In particular, CVC4 uses primarily uses heuristic approaches based on E-matching for answering "unsatisfiable", and finite model finding approaches for answering "satisfiable".

Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown.

The finite model finding has been developed to target problems containing background theories, whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. Quantifier instantiation strategies are then invoked to add instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, then the solver reports "satisfiable".

Strategies

For handling theorems, CVC4 primarily uses various configurations of E-matching. This year, CVC4 incorporates new methods for finding conflicting instances of quantified formulas [RTd14], which have been shown to lead to improved performance on unsatisfiable TPTP benchmarks. CVC4 also incorporates a model-based heuristic for handling quantified formulas containing only pure arithmetic, which will be used in the TFA division.

For handling non-theorems, the finite model finding feature of CVC4 will use a number of orthogonal quantifier instantiation strategies. This year, it will incorporate several new features, including an optimized implementation of model-based quantifier instantiation which improves upon [RT+13], as well as techniques for sort inference. Since CVC4 with finite model finding is also capable of answering "unsatisfiable", it will be used as a strategy for theorems as well.

Implementation

CVC4 is implemented in C++. The code is available from
    https://github.com/CVC4

Expected Competition Performance

In the FOF division, CVC4 should perform better than last year, primarily due to the incorporation of methods from [RTd14]. In the FNT division, CVC4 should also perform better than last year, due to its incorporation of sort inference techniques and general improvements to the implementation. This is the first year CVC4 has entered the TFA division, where it should be fairly competitive due to its efficient handling of ground arithmetic constraints.


E 1.9

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E [
Sch02,Sch13] is a purely equational theorem prover for full first-order logic with equality. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution.

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause evaluation heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause evaluation heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).

For CASC-J7, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into 8 (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, ...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.

About 210 different strategies have been evaluated on all untyped first-order problems from TPTP 6.0.0, and about 180 of these strategies are used in the automatic mode.

Implementation

E is build around perfectly shared terms, i.e. each distinct term is only represented once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) is supported using feature vector indexing [Sch04]. Superposition and backward rewriting use fingerprint indexing [Sch12], a new technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner in [Loe06,Loe06]. The prover and additional information are available at
    http://www.eprover.org

Expected Competition Performance

E 1.9 has slightly better strategies than previous versions, and can produce proof objects quite efficiently. The system is expected to perform well in most proof classes, but will at best complement top systems in the disproof classes.


E.T. 0.1

Josef Urban1, Cezary Kaliszyk2, Stephan Schulz3, Jiri Vyskocil4
1Radboud University Nijmegen, The Netherlands, 2University of Innsbruck, Austria, 3DHBW Stuttgart, Germany, 4Czech Technical University, Czech Republic

Architecture

E.T. 0.1 is a metasystem using E prover with specific strategies [
Urb13,KU13] and preprocessing tools [KU13,KU13,KU13] that are targeted mainly at problems with many redundant axioms. Its design is motivated by the recent experiments in the Large-Theory Batch division [KUV14] and on the Flyspeck, Mizar and Isabelle datasets, however, E.T. does no learning from related proofs.

Strategies

We characterize formulas by the symbols and terms that they contain, normalized in various ways. Then we run various algorithms that try to remove the redundant axioms and use special strategies on such problems.

Implementation

The metasystem is implemented in ca. 1000 lines of Perl. It uses a number of external programs, some of them based on E's code base, some of them independently implemented in C++.

Expected Competition Performance

E.T. can solve some problems that E 1.8 cannot prove, and even some TPTP problems with rating 1. The CASC performance should not be much worse than that of E, possibly better, depending on problem selection.


HOLyHammer 140616

Cezary Kaliszyk1, Josef Urban2, Stephan Schulz3
1University of Innsbruck, Austria, 2Radboud University Nijmegen, The Netherlands, 3DHBW Stuttgart, Germany

Architecture

HOLyHammer [
KU14,KU13] is a metasystem for premise selection and proof translation from polymorphic HOL to FOF ATP provers. This is a version restricted to THF0 and to using E prover with specific strategies.

Strategies

The translation uses polymorphic type tags, the apply functor, lambda lifting and predicate abstraction. The premise selection [KU14] uses k-NN, naive Bayes and a modified implementation of the Meng-Paulson relevance filter. We characterize formulas by the symbols and terms that they contain, normalized in various ways.

Implementation

The HH metasystem consists of about 2000 lines of OCaml code on top of about 5000 lines of HOL Light code. The predictors are about 400 lines of C++ code. The system produces higher-order logic proofs internally that match the HOL Light kernel, however no TSTP proofs are produced [KU13].

Expected Competition Performance

We expect HOLyHammer to solve some large problems with many redundant axioms, however we have not tested it on the whole set of THF0 TPTP problems.


iProver 0.9

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [
GK03,Kor09] which is complete for first-order logic. One of the distinctive features of iProver is a modular combination of first-order reasoning with ground reasoning. In particular, iProver currently integrates MiniSat [ES04] for reasoning with ground abstractions of first-order clauses. In addition to instantiation, iProver implements ordered resolution calculus and a combination of instantiation and ordered resolution; see [Kor08] for the implementation details. The saturation process is implemented as a modification of a given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed vector index for subsumption and subsumption resolution (both forward and backward). The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04,Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. A compressed feature vector index is used for efficient forward/backward subsumption and subsumption resolution. Equality is dealt with (internally) by adding the necessary axioms of equality with an option of using Brand's transformation. In the LTB division, iProver-SInE uses axiom selection based on the SInE algorithm [HV11] as implemented in Vampire [HKV10], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Major additions in the current version are:

Strategies

iProver has around 40 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth.

Implementation

iProver is implemented in OCaml and for the ground reasoning uses MiniSat. iProver accepts FOF and CNF formats, where Vampire [HKV10] is used for clausification of FOF problems.

iProver is available from

    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

iProver 0.9 is the CASC-24 EPR division winner.


iProver 1.0

Konstantin Korovin, Christoph Sticksel
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on an instantiation calculus Inst-Gen [
GK03,Kor13] which is complete for first-order logic. iProver combines first-order reasoning with ground reasoning for which it uses MiniSat [ES04] and was recently extended with PicoSAT [Bie08] and Lingeling [Bie12] (only MiniSat will be used at this CASC). iProver also combines instantiation with ordered resolution; see [Kor08] for the implementation details. The proof search is implemented using a saturation process based on the given clause algorithm. iProver uses non-perfect discrimination trees for the unification indexes, priority queues for passive clauses, and a compressed vector index for subsumption and subsumption resolution (both forward and backward). The following redundancy eliminations are implemented: blocking non-proper instantiations; dismatching constraints [GK04,Kor08]; global subsumption [Kor08]; resolution-based simplifications and propositional-based simplifications. A compressed feature vector index is used for efficient forward/backward subsumption and subsumption resolution. Equality is dealt with (internally) by adding the necessary axioms of equality with an option of using Brand's transformation. In the LTB division, iProver uses axiom selection based on the SInE algorithm [HV11] as implemented in Vampire [HKV11], i.e., axiom selection is done by Vampire and proof attempts are done by iProver.

Some of iProver features are summarised below.

Type inference is targeted at improving finite model finding and symmetry breaking. Semantic filtering is used in preprocessing to eliminated irrelevant clauses. Proof extraction is challenging due to simplifications such global subsumption which involve global reasoning with the whole clause set and can be computationally expensive.

Strategies

iProver has around 60 options to control the proof search including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications and options for combination of instantiation with resolution. At CASC iProver will execute a small number of fixed schedules of selected options depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth. The strategy for satisfiable problems (FNT division) includes finite model finding.

Implementation

iProver is implemented in OCaml and for the ground reasoning uses MiniSat [ES04]. iProver accepts FOF and CNF formats. Vampire [HKV11,HK+12] is used for proof-producing clausification of FOF problems as well as for axiom selection [HV11] in the LTB division.

iProver is available from

    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

iProver 1.0 is the CASC-24 FNT division winner.


Isabelle 2013

Jasmin C. Blanchette1, Lawrence C. Paulson2, Tobias Nipkow1, Makarius Wenzel3
1Technische Universität München, Germany, 2University of Cambridge, United Kingdom, 3Université Paris Sud, France

Architecture

Isabelle/HOL 2013 [
NPW12] is the higher-order logic incarnation of the generic proof assistant Isabelle2013. Isabelle/HOL provides several automatic proof tactics, notably an equational reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99]. It also integrates external first- and higher-order provers via its subsystem Sledgehammer [PB10,BBP11].

Isabelle includes a parser for the TPTP syntaxes CNF, FOF, TFF0, and THF0, due to Nik Sultana. It also includes TPTP versions of its popular tools, invokable on the command line as isabelle tptp_tool max_secs file.p. For example:

isabelle tptp_isabelle_hot 100 SEU/SEU824^3.p

Strategies

The Isabelle tactic submitted to the competition simply tries the following tactics sequentially:
sledgehammer
Invokes the following sequence of provers as oracles via Sledgehammer:
nitpick
For problems involving only the type $o of Booleans, checks whether a finite model exists using Nitpick [BN10].
simp
Performs equational reasoning using rewrite rules [Nip89].
blast
Searches for a proof using a fast untyped tableau prover and then attempts to reconstruct the proof using Isabelle tactics [Pau99].
auto+spass
Combines simplification and classical reasoning [Pau94] under one roof; then invoke Sledgehammer with SPASS on any subgoals that emerge.
z3
Invokes the SMT solver Z3 3.2 [dMB08].
cvc3
Invokes the SMT solver CVC3 2.2 [BT07].
fast
Searches for a proof using sequent-style reasoning, performing a depth-first search [Pau94]. Unlike blast, it construct proofs directly in Isabelle. That makes it slower but enables it to work in the presence of the more unusual features of HOL, such as type classes and function unknowns.
best
Similar to fast, except that it performs a best-first search.
force
Similar to auto, but more exhaustive.
meson
Implements Loveland's MESON procedure [Lov78]. Constructs proofs directly in Isabelle.
fastforce
Combines fast and force.

Implementation

Isabelle is a generic theorem prover written in Standard ML. Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of higher-order logic. The HOL object logic extends pure with a more elaborate version of higher-order logic, complete with the familiar connectives and quantifiers. Other object logics are available, notably FOL (first-order logic) and ZF (Zermelo-Fraenkel set theory).

The implementation of Isabelle relies on a small LCF-style kernel, meaning that inferences are implemented as operations on an abstract theorem datatype. Assuming the kernel is correct, all values of type theorem are correct by construction.

Most of the code for Isabelle was written by the Isabelle teams at the University of Cambridge and the Technische Universität München. Isabelle/HOL is available for all major platforms under a BSD-style license from

    http://www.cl.cam.ac.uk/research/hvg/Isabelle

Expected Competition Performance

Third place, after Satallax and Satallax_MaLeS but before AgsyHOL and LEO-II.


leanCoP 2.2

Jens Otten
University of Potsdam, Germany

Architecture

leanCoP [
OB03,Ott08] is an automated theorem prover for classical first-order logic with equality. It is a very compact implementation of the connection (tableau) calculus [Bib87,LS01].

Strategies

The reduction rule of the connection calculus is applied before the extension rule. Open branches are selected in a depth-first way. Iterative deepening on the proof depth is performed in order to achieve completeness. Additional inference rules and techniques include regularity, lemmata, and restricted backtracking [Ott10]. leanCoP uses an optimized structure-preserving transformation into clausal form [Ott10] and a fixed strategy scheduling, which is controlled by a shell script.

Implementation

leanCoP is implemented in Prolog. The source code of the core prover consists only of a few lines of code. Prolog's built-in indexing mechanism is used to quickly find connections when the extension rule is applied.

leanCoP can read formulae in leanCoP syntax and in TPTP first-order syntax. Equality axioms and axioms to support distinct objects are automatically added if required. The leanCoP core prover returns a very compact connection proof, which is then translated into a more comprehensive output format, e.g., into a lean (TPTP-style) connection proof or into a readable text proof.

The source code of leanCoP 2.2 is available under the GNU general public license. It can be downloaded from the leanCoP website at:

    http://www.leancop.de
The website also contains information about ileanCoP [Ott08] and MleanCoP [Ott12,Ott14], two versions of leanCoP for first-order intuitionistic logic and first-order modal logic, respectively.

Expected Competition Performance

As the core prover has not changed, the performance of leanCoP 2.2 is expected to be similar to the performance of leanCoP 2.1.


LEO-II 1.6.2

Christoph Benzmüller
Freie Universität Berlin, Germany

Architecture

LEO-II [
BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations of its (standard) resolution proof search loop (e.g., 10). If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.

Strategies

LEO-II employs an adapted "Otter loop". Moreover, LEO-II uses some basic strategy scheduling to try different search strategies or flag settings. These search strategies also include some different relevance filters.

Implementation

LEO-II is implemented in OCaml 4, and its problem representation language is the TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10]. LEO-II's parser supports the TPTP THF0 language and also the TPTP languages FOF and CNF.

Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

    http://www.leoprover.org

Expected Competition Performance

There are only minor recent improvements on LEO-II. It is unclear whether they are sufficient for attacking LEO-II's main competitors.


Muscadet 4.4

Dominique Pastre
University Paris Descartes, France

Architecture

The Muscadet theorem prover is a knowledge-based system. It is based on Natural Deduction, following the terminology of [
Ble71] and [Pas78], and uses methods which resembles those used by humans. It is composed of an inference engine, which interprets and executes rules, and of one or several bases of facts, which are the internal representation of "theorems to be proved". Rules are either universal and put into the system, or built by the system itself by metarules from data (definitions and lemmas). Rules may add new hypotheses, modify the conclusion, create objects, split theorems into two or more subtheorems or build new rules which are local for a (sub-)theorem.

Strategies

There are specific strategies for existential, universal, conjonctive or disjunctive hypotheses and conclusions, and equalities. Functional symbols may be used, but an automatic creation of intermediate objects allows deep subformulae to be flattened and treated as if the concepts were defined by predicate symbols. The successive steps of a proof may be forward deduction (deduce new hypotheses from old ones), backward deduction (replace the conclusion by a new one), refutation (only if the conclusion is a negation), search for objects satisfying the conclusion or dynamic building of new rules.

The system is also able to work with second order statements. It may also receive knowledge and know-how for a specific domain from a human user; see [Pas89] and [Pas93]. These two possibilities are not used while working with the TPTP Library.

Implementation

Muscadet [Pas01] is implemented in SWI-Prolog. Rules are written as more or less declarative Prolog clauses. Metarules are written as sets of Prolog clauses. The inference engine includes the Prolog interpreter and some procedural Prolog clauses. A theorem may be split into several subtheorems, structured as a tree with "and" and "or" nodes. All the proof search steps are memorized as facts including all the elements which will be necessary to extract later the useful steps (the name of the executed action or applied rule, the new facts added or rule dynamically built, the antecedents and a brief explanation).

Muscadet is available from

    http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html

Expected Competition Performance

The best performances of Muscadet will be for problems manipulating many concepts in which all statements (conjectures, definitions, axioms) are expressed in a manner similar to the practice of humans, especially of mathematicians [Pas02,Pas07]. It will have poor performances for problems using few concepts but large and deep formulas leading to many splittings. Its best results will be in set theory, especially for functions and relations. Its originality is that proofs are given in natural style. Changes since last year are only minor corrections.


Princess 140704

Philipp Rümmer, Peter Backeman
Uppsala University, Sweden

Architecture

Princess [
Rue08,Rue12] is a theorem prover for first-order logic modulo linear integer arithmetic. The prover uses a combination of techniques from the areas of first-order reasoning and SMT solving. The main underlying calculus is a free-variable tableau calculus, which is extended with constraints to enable backtracking-free proof expansion, and positive unit hyper-resolution for lightweight instantiation of quantified formulae. Linear integer arithmetic is handled using a set of built-in proof rules resembling the Omega test, which altogether yields a calculus that is complete for full Presburger arithmetic, for first-order logic, and for a number of further fragments. In addition, some built-in procedures for nonlinear integer arithmetic are available.

The internal calculus of Princess only supports uninterpreted predicates; uninterpreted functions are encoded as predicates, together with the usual axioms. Through appropriate translation of quantified formulae with functions, the e-matching technique common in SMT solvers can be simulated; triggers in quantified formulae are chosen based on heuristics similar to those in the Simplify prover.

Strategies

For CASC, Princess will run a fixed schedule of configurations for each problem (portfolio method). Configurations determine, among others, the mode of proof expansion (depth-first, breadth-first), selection of triggers in quantified formulae, clausification, and the handling of functions. The portfolio was chosen based on training with a random sample of problems from the TPTP library.

Implementation

Princess is entirely written in Scala and runs on any recent Java virtual machine; besides the standard Scala and Java libraries, only the Cup parser library is used. Princess is available from:
    http://www.philipp.ruemmer.org/princess.shtml

Expected Competition Performance

Princess is mainly designed for integer problems (TFI), and should perform reasonably well here. Reasoning about rationals or reals is not the main focus of the work, and results will be accordingly.


Prover9 2009-11A

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [
McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, proof of transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Like Otter, Prover9 has available many strategies; the following statements apply to CASC-2012.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

In previous CASC competitions, Prover9 has used LPO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence. For CASC 2012, we are going to use KBO instead.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them). Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from
    http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Some of the strategy development for CASC was done by experimentation with the CASC-2004 competition "selected" problems. (Prover9 has not yet been run on other TPTP problems.) Prover9 is unlikely to challenge the CASC leaders, because (1) extensive testing and tuning over TPTP problems has not been done, (2) theories (e.g., ring, combinatory logic, set theory) are not recognized, (3) term orderings and symbol precedences are not fine-tuned, and (4) multiple searches with differing strategies are not run.

Finishes in the middle of the pack are anticipated in all categories in which Prover9 competes.


Satallax 2.7

Chad E. Brown
Saarland University, Germany

Architecture

Satallax 2.7 [
Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the search for a proof. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. A problem is given in the THF format. A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch.

Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat as an engine to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable.

Additionally, Satallax may optionally generate first-order formulas in addition to the propositional clauses. If this option is used, then Satallax peroidically calls the first-order theorem prover E to test for first-order unsatisfiability. If the set of first-order formulas is unsatisfiable, then the original branch is unsatisfiable.

Strategies

There are about a hundred flags that control the order in which formulas and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time the mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Satallax 2.7 has strategy schedule consisting of 68 modes. Each mode is tried for time limits ranging from 0.1 seconds to 54.9 seconds. The strategy schedule was determined through experimentation using the THF problems in version 5.4.0 of the TPTP library.

Implementation

Satallax 2.7 is implemented in OCaml. A foreign function interface is used to in teract with MiniSat 2.2.0. Satallax is available from
    http://satallax.com

Expected Competition Performance

Satallax 2.7 is the CASC-24 THF division runner-up.


Satallax_MaLeS 1.3

Daniel Kuehlwein
Radboud University Nijmegen, The Netherlands

Architecture

Satallax_MaLeS 1.3 improves Satallax's automatic mode with machine learning techniques to predict which search strategy is most likely to find a proof.

Strategies

Satallax_MaLeS 1.3 relies on Geoff Sutcliffe's MakeListStat to classify problems. It uses the same data as Satallax_MaLeS 1.2 as basis for the learning algorithms.

Implementation

Satallax_MaLeS is based on Python, in particular the Sklearn library that contains many machine learning algorithms. After CASC Satallax_MaLeS will be available from
    
    http://www.cs.ru.nl/~kuehlwein/

Expected Competition Performance

We expect Satallax_MaLeS 1.3 to perform slightly better than Satallax_MaLeS 1.2.


SPASS+T 2.2.19

Uwe Waldmann
Max-Planck-Insitut für Informatik, Germany

Architecture

SPASS+T is an extension of the superposition-based theorem prover SPASS that integrates algebraic knowledge into SPASS in three complementary ways: by passing derived formulas to an external SMT procedure (currently Yices or CVC3), by adding standard axioms, and by built-in arithmetic simplification and inference rules. A first version of the system has been described in [
PW06]; later a much more sophisticated coupling of the SMT procedure has been added [Zim07]. The latest version provides improved support for isint/1, israt/1, floor/1 and ceiling/1 and adds partial input abstraction and history-dependent weights for numerical constants.

Strategies

Standard axioms and built-in arithmetic simplification and inference rules are integrated into the standard main loop of SPASS. Inferences between standard axioms are excluded, so the user-supplied formulas are taken as set of support. The external SMT procedure runs in parallel in a separate process, leading occasionally to non-deterministic behaviour.

Implementation

SPASS+T is implemented in C. SPASS+T is available from
    http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS

Expected Competition Performance

SPASS+T 2.2.16 came second in the TFA division of last CASC; we expect a similar performance in CASC 2013.


SPASS+T 2.2.20

Uwe Waldmann
Max-Planck-Insitut für Informatik, Germany

Architecture

SPASS+T is an extension of the superposition-based theorem prover SPASS that integrates algebraic knowledge into SPASS in three complementary ways: by passing derived formulas to an external SMT procedure (currently Yices or CVC3), by adding standard axioms, and by built-in arithmetic simplification and inference rules. A first version of the system has been described in [
PW06]; later a much more sophisticated coupling of the SMT procedure has been added [Zim07]. The latest version provides improved support for isint/1, israt/1, floor/1 and ceiling/1 and adds partial input abstraction and history-dependent weights for numerical constants.

Strategies

Standard axioms and built-in arithmetic simplification and inference rules are integrated into the standard main loop of SPASS. Inferences between standard axioms are excluded, so the user-supplied formulas are taken as set of support. The external SMT procedure runs in parallel in a separate process, leading occasionally to non-deterministic behaviour.

Implementation

SPASS+T is implemented in C. SPASS+T is available from
    http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS

Expected Competition Performance

SPASS+T 2.2.19 came first in the TFA division of last CASC; we expect a similar performance in CASC 2014.


Vampire 2.6

Krystof Hoder, Andrei Voronkov
University of Manchester, England

Architecture

Vampire 2.6 is an automatic theorem prover for first-order classical logic. It consists of a shell and a kernel. The kernel implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus. The splitting rule in kernel adds propositional parts to clauses, which are being manipulated using binary decision diagrams and a SAT solver. A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering.

Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Although the kernel of the system works only with clausal normal form, the shell accepts a problem in the full first-order logic syntax, clausifies it and performs a number of useful transformations before passing the result to the kernel. Also the axiom selection algorithm Sine [HV11] can be enabled as part of the preprocessing.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

The Vampire 2.6 kernel provides a fairly large number of options for strategy se lection. The most important ones are:

Implementation

Vampire 2.6 is implemented in C++.

Expected Competition Performance

Vampire 2.6 is the CASC-24 FOF division winner.


Waldmeister 710

Thomas Hillenbrand
Max-Planck-Institut für Informatik, Germany

Architecture

Waldmeister 710 [
Hil03] is a system for unit equational deduction. Its theoretical basis is unfailing completion in the sense of [BDP89] with refinements towards ordered completion (cf. [AHL03]). The system saturates the input axiomatization, distinguishing active facts, which induce a rewrite relation, and passive facts, which are the one-step conclusions of the active ones up to redundancy. The saturation process is parameterized by a reduction ordering and a heuristic assessment of passive facts [HJL99]. This year's version is the result of polishing and fixing a few things in last year's.

Strategies

The prover is coded in ANSI-C. It runs on Solaris, Linux, MacOS X, and Windows/Cygwin. The central data strucures are: perfect discrimination trees for the active facts; group-wise compressions for the passive ones; and sets of rewrite successors for the conjectures. Visit the Waldmeister web pages at:
    http://www.waldmeister.org

Implementation

The approach taken to control the proof search is to choose the search parameters – reduction ordering and heuristic assessment – according to the algebraic structure given in the problem specification [HJL99]. This is based on the observation that proof tasks sharing major parts of their axiomatization often behave similarly.

Expected Competition Performance

Waldmeister 710 is the CASC-23 UEQ division winner.


VanHElsing 1.0

Daniel Kuehlwein
Radboud University Nijmegen, The Netherlands

Architecture

VanHElsing improves E's automatic mode with machine learning techniques to predict which search strategy is most likely to find a proof.

Strategies

VanHElsing relies on E's features to classify problems. It uses the same data as E 1.8 as basis for the learning algorithms.

Implementation

VanHElsing is based on Python, in particular the Sklearn library that contains many machine learning algorithms. After CASC VanHElsing will be availble from
    
    http://www.cs.ru.nl/~kuehlwein/

Expected Competition Performance

We expect VanHElsing to perform slightly better than E 1.8.


Zipperposition 0.4

Simon Cruanes
INRIA, France

Architecture

Zipperposition is a typed first-order superposition theorem prover written in OCaml for prototyping. It deals with hashconsed polymorphic terms, has a Hindley-Milner-like type-inference algorithm and accepts TFF1 input. Its core superposition calculus is strongly inspired from E (discount loop, mostly same inferences including condensation and contextual literal cutting).

The new version has experimental support for linear integer arithmetic. It features a new calculus (not published yet), inspired from the work of Waldmann on superposition modulo rational arithmetic [Wal01]. The calculus is an extension of superposition that doesn't use a black-box solver, but has special inferences to deal with arithmetic equations, inequations and divisibility literals. Additional redundancy criteria are used to make the search space more tractable. As we don't have completeness results yet, the solver will fail on satisfiable problems, even if it finds a saturated set of clauses.

Strategies

Zipperposition uses several clause queues, but unlike E it has no heuristic to choose them based on the input. The strategy is therefore a simple alternance of old clauses, and "small" clauses, in one run.

Implementation

Zipperposition is in pure OCaml, with a mix of imperative and functional style, including iterators, zippers, etc. It uses Non-perfect Discrimination Trees and Feature Vectors [Sch04] for indexing. Numbers are handled by Zarith, a frontend to GMP.

Expected Competition Performance

In FOF, we expect decent performance. The results should be quite similar to last year as the superposition core isn't the focus of our research.

In TFA, since we feature a new calculus, performance is hard to predict. We expect the prover to perform well on problems SMT solvers will struggle with, and conversely; best performance should be reached on unsatisfiable problems with few inequations but possibly function symbols and quantifiers; worst will be on combinatorial ground problems that require little symbolic reasoning.


References