Entrants' System Descriptions


CSE 1.2

Feng Cao
Southwest Jiaotong University, China

Architecture

CSE 1.2 is the latest version of CSE, an automated theorem prover for first-order logic without equality. It is mainly based on a novel inference mechanism called Contradiction Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS) [XL+18], which can handle multiple (two or more) clauses dynamically in a synergized way in one deduction step, with binary resolution as a special case. CSE 1.2 also adopts conventional factoring, equality resolution (the ER rule), and variable renaming. Some pre-processing techniques, including pure literal deletion and simplification based on the distance to the goal clause, are applied, together with a number of standard redundancy criteria for pruning the search space, such as tautology deletion and forward and backward subsumption. CSE 1.2 has the following main improvements compared to CSE 1.1, the version that competed in CASC-J9 last year.
  1. A new S-CS inference algorithm is implemented that slows down the growth of the generated S-CS clauses, in terms of both the number of literals and the maximal function depth.
  2. A new S-CS inference algorithm is added that makes fuller use of each input clause by using it repeatedly.
  3. Different S-CS inference algorithms are applied in a mixed way when CSE 1.2 tries to solve a problem.
Internally, CSE 1.2 works only with clausal normal form. The E prover [Sch13] is used (with thanks) for clausifying full first-order logic problems during preprocessing.

Strategies

CSE 1.2 inherits most of the strategies of CSE 1.1. The main new strategies are:

Implementation

CSE 1.2 is implemented mainly in C++, with Java used for batch problem running. A shared data structure is used for storing constants and shared variables. In addition, a special data structure is designed to describe properties of clauses, literals, and terms, in support of the multiple-strategy mode. The E prover is used to clausify FOF problems, and TPTP4X is then applied to convert the resulting CNF into TPTP format.

Expected Competition Performance

CSE 1.2 has made some improvements compared to CSE 1.1, and we expect a better performance in the competition.

Acknowledgement

Development of CSE 1.0 has been partially supported by the National Natural Science Foundation of China (NSFC) (Grant No. 61673320) and the Fundamental Research Funds for the Central Universities in China (Grant Nos. 2682017ZT12 and 2682018CX59).


CSE_E 1.1

Feng Cao
Southwest Jiaotong University, China

Architecture

CSE_E 1.1 is an automated theorem prover for first-order logic that combines CSE 1.2 and E 2.3, where CSE is based on the Contradiction Separation Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS) [XL+18] and E is based on superposition. The combination works as follows: E and CSE are applied to the given problem sequentially. If either prover solves the problem, the proof process completes. If neither CSE nor E can solve the problem, some clauses inferred by CSE, especially unit clauses, are fed to E as lemmas, along with the original clauses, for further proof search.

This kind of combination is expected to take advantage of both CSE and E, and to produce better performance. Concretely, CSE is able to generate a good number of unit clauses, which are helpful for proof search and equality handling, while E has strong equality handling of its own.
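
A minimal sketch of such a sequential combination is shown below. The command lines, the cse binary, and the --dump-units flag are hypothetical placeholders, not the actual CSE_E driver (which is written in C++ and Java); only the control flow follows the description above.

    import subprocess

    def run(cmd, timeout):
        """Run an external prover; return its stdout, or None on timeout."""
        try:
            return subprocess.run(cmd, capture_output=True, text=True,
                                  timeout=timeout).stdout
        except subprocess.TimeoutExpired:
            return None

    def solve(problem):
        # Phase 1: E and CSE are tried on the problem sequentially.
        for cmd in (["eprover", "--auto", problem], ["cse", problem]):
            out = run(cmd, timeout=60)
            if out and "SZS status Theorem" in out:
                return out
        # Phase 2: clauses inferred by CSE (unit clauses especially) are
        # appended to the original clauses as lemmas and handed to E.
        lemmas = run(["cse", "--dump-units", problem], timeout=60)  # hypothetical flag
        with open(problem) as f, open(problem + ".ext", "w") as g:
            g.write(f.read())
            g.write(lemmas or "")    # lemmas assumed to be in TPTP syntax
        return run(["eprover", "--auto", problem + ".ext"], timeout=120)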

Strategies

The CSE part of CSE_E 1.1 uses the same strategies as the standalone CSE 1.2, e.g., clause/literal selection, strategy selection, and the CSC strategy. The newly added strategies are:

Implementation

CSE_E 1.1 is implemented mainly in C++, with Java used for batch problem running. The job dispatch between CSE and E is implemented in Java.

Expected Competition Performance

We expect CSE_E 1.1 to solve some hard problems and to achieve a satisfying performance.


CVC4 1.7

Andrew Reynolds
University of Iowa, USA

Architecture

CVC4 [BC+11] is an SMT solver based on the DPLL(T) architecture [NOT06] that includes built-in support for many theories, including linear arithmetic, arrays, bit vectors, datatypes, finite sets and strings. It incorporates approaches for handling universally quantified formulas. For problems involving free function and predicate symbols, CVC4 primarily uses heuristic approaches based on conflict-based instantiation and E-matching for theorems, and finite model finding approaches for non-theorems. For problems in pure arithmetic, CVC4 uses techniques for counterexample-guided quantifier instantiation [RKK17].

Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered approach. First, quantified formulas are replaced by fresh Boolean predicates and the ground theory solver(s) are used in conjunction with the underlying SAT solver to determine satisfiability. If the problem is unsatisfiable at the ground level, then the solver answers "unsatisfiable". Otherwise, the quantifier instantiation module is invoked, and will either add instances of quantified formulas to the problem, answer "satisfiable", or return unknown. Finite model finding in CVC4 targets problems containing background theories whose quantification is limited to finite and uninterpreted sorts. In finite model finding mode, CVC4 uses a ground theory of finite cardinality constraints that minimizes the number of ground equivalence classes, as described in [RT+13]. When the problem is satisfiable at the ground level, a candidate model is constructed that contains complete interpretations for all predicate and function symbols. It then adds instances of quantified formulas that are in conflict with the candidate model, as described in [RT+13]. If no instances are added, it reports "satisfiable".

CVC4 now has native support for problems in higher-order logic, as described in recent work [BR+19]. It uses a pragmatic approach for HOL, where lambdas are eliminated eagerly via lambda lifting. The approach extends the theory solver for quantifier-free uninterpreted functions (UF) and E-matching. For the former, the theory solver for UF in CVC4 now handles equalities between functions using an extensionality inference. Partial applications of functions are handled using a (lazy) applicative encoding where some function applications are equated to the applicative encoding. For the latter, several of the data structures for E-matching have been modified to incorporate matching in the presence of equalities between functions, function variables, and partial function applications.
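
The two-tiered approach can be illustrated with a small, self-contained toy. This is an illustration of the scheme, not CVC4's implementation: the ground solver is brute-force SAT, a quantified formula is modeled as a function from domain elements to ground clauses, and the instantiation step adds instances the candidate model falsifies.

    from itertools import product

    def sat(clauses, atoms):
        """Brute-force ground solver: a satisfying assignment or None."""
        for bits in product([False, True], repeat=len(atoms)):
            model = dict(zip(atoms, bits))
            if all(any(model[a] == pol for a, pol in c) for c in clauses):
                return model
        return None

    def solve(ground, quantified, domain):
        clauses = list(ground)
        while True:
            atoms = sorted({a for c in clauses for a, _ in c})
            model = sat(clauses, atoms)
            if model is None:
                return "unsatisfiable"   # unsat already at the ground level
            # Instantiation module: add instances in conflict with the
            # candidate model (unassigned atoms default to False here).
            new = [q(d) for q in quantified for d in domain
                   if not any(model.get(a, False) == pol for a, pol in q(d))]
            if not new:
                return "satisfiable"
            clauses += new

    # forall x. p(x) together with the ground clause ~p(c):
    print(solve(ground=[[("p(c)", False)]],
                quantified=[lambda d: [(f"p({d})", True)]],
                domain=["c", "d"]))      # prints "unsatisfiable"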

Strategies

For handling theorems, CVC4 primarily uses conflict-based quantifier instantiation [RTd14,BFR17], enumerative instantiation [RBF18], and E-matching. CVC4 uses a handful of orthogonal trigger selection strategies for E-matching. For handling non-theorems, CVC4 primarily uses finite model finding techniques. Since CVC4 with finite model finding is also capable of establishing unsatisfiability, it is used as a strategy for theorems as well. For problems in pure arithmetic, CVC4 uses variations of counterexample-guided quantifier instantiation [RD+15], which select relevant quantifier instantiations based on models for counterexamples to quantified formulas. At the quantifier-free level, CVC4 uses standard decision procedures for linear arithmetic and uninterpreted functions, as well as heuristic approaches for handling non-linear arithmetic [RT+17].

Implementation

CVC4 is implemented in C++. The code is available from:
    https://github.com/CVC4

Expected Competition Performance

The first-order theorem proving and finite model finding capabilities of CVC4 have not changed much in the past year. Its non-linear capabilities have been improved slightly, so it should improve slightly for TFA. This is CVC4's first year competing in the higher-order division (THF), where it should be fairly competitive but is not expected to win.


E 2.4

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

E 2.4pre [Sch02, Sch13, SCV19] is a purely equational theorem prover for many-sorted first-order logic with equality. It consists of an (optional) clausifier for pre-processing full first-order formulae into clausal form, and a saturation algorithm implementing an instance of the superposition calculus with negative literal selection and a number of redundancy elimination techniques. E is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. No special rules for non-equational literals have been implemented. Resolution is effectively simulated by paramodulation and equality resolution. As of E 2.1, PicoSAT [Bie08] can be used to periodically check the (on-the-fly grounded) proof state for propositional unsatisfiability.
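
The DISCOUNT-loop variant of the given-clause algorithm can be sketched as follows. This is a schematic skeleton, not E's actual C code; the calculus-specific operations are passed in as functions.

    def saturate(axioms, select, simplify, generate, is_empty):
        """Given-clause saturation with strict active/passive separation."""
        active, passive = [], list(axioms)
        while passive:
            given = select(passive)          # heuristic choice of a passive clause
            passive.remove(given)
            given = simplify(given, active)  # forward simplification w.r.t. active only
            if given is None:
                continue                     # clause shown redundant: discard
            if is_empty(given):
                return given                 # empty clause derived: proof found
            active.append(given)
            # all inferences between the given clause and the active set
            passive.extend(generate(given, active))
        return None                          # saturated without a proof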

For the LTB divisions, a control program uses a SInE-like analysis to extract reduced axiomatizations that are handed to several instances of E. E will not use on-the-fly learning this year.

Strategies

Proof search in E is primarily controlled by a literal selection strategy, a clause selection heuristic, and a simplification ordering. The prover supports a large number of pre-programmed literal selection strategies. Clause selection heuristics can be constructed on the fly by combining various parameterized primitive evaluation functions, or can be selected from a set of predefined heuristics. Clause evaluation heuristics are based on symbol-counting, but also take other clause properties into account. In particular, the search can prefer clauses from the set of support, or containing many symbols also present in the goal. Supported term orderings are several parameterized instances of Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO), which can be lifted in different ways to literal orderings.

For CASC-27, E implements a strategy-scheduling automatic mode. The total CPU time available is broken into several (unequal) time slices. For each time slice, the problem is classified into one of several classes, based on a number of simple features (number of clauses, maximal symbol arity, presence of equality, presence of non-unit and non-Horn clauses, ...). For each class, a schedule of strategies is greedily constructed from experimental data as follows: The first strategy assigned to a schedule is the one that solves the most problems from this class in the first time slice. Each subsequent strategy is selected based on the number of solutions on problems not already solved by a preceding strategy.
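
The greedy construction can be sketched in a few lines (illustrative only; results[s] maps each strategy to the set of class problems it solves within one time slice):

    def greedy_schedule(results, slots):
        schedule, covered = [], set()
        for _ in range(slots):
            # the strategy solving the most problems not yet covered
            best = max(results, key=lambda s: len(results[s] - covered))
            if not results[best] - covered:
                break                          # no strategy adds new solutions
            schedule.append(best)
            covered |= results[best]
        return schedule

    data = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5, 6}}   # made-up data
    print(greedy_schedule(data, slots=3))                     # ['s1', 's3']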

About 230 different strategies have been thoroughly evaluated on all untyped first-order problems from TPTP 7.2.0. In addition, we have explored some parts of the heuristic parameter space with a short time limit of 5 seconds. This allowed us to test about 650 strategies on all TPTP problems, and an extra 7000 strategies on all 1193 UEQ problems from TPTP 7.2.0. About 100 of these strategies are used in the automatic mode, and about 450 are used in at least one schedule.

Implementation

E is built around perfectly shared terms, i.e., each distinct term is represented only once in a term bank. The whole set of terms thus consists of a number of interconnected directed acyclic graphs. Term memory is managed by a simple mark-and-sweep garbage collector. Unconditional (forward) rewriting using unit clauses is implemented using perfect discrimination trees with size and age constraints. Whenever a possible simplification is detected, it is added as a rewrite link in the term bank. As a result, not only terms, but also rewrite steps are shared. Subsumption and contextual literal cutting (also known as subsumption resolution) are supported using feature vector indexing [Sch13]. Superposition and backward rewriting use fingerprint indexing [Sch12], a technique combining ideas from feature vector indexing and path indexing. Finally, LPO and KBO are implemented using the elegant and efficient algorithms developed by Bernd Löchner [Loe06]. The prover and additional information are available at
    https://www.eprover.org

Expected Competition Performance

The inference core of E 2.4 has been slightly simplified since last year's pre-release. We have also been able to evaluate more search strategies, in particular those incorporating PicoSAT. As a result, we expect performance to be somewhat better than in previous years. The system is expected to perform well in most proof classes, but will at best complement the top systems in the disproof classes.


Enigma 0.5

Jan Jakubuv
Czech Technical University in Prague, Czech Republic

Architecture

ENIGMA (Efficient learNing-based Inference Guiding MAchine) [JU17, JU18, JU19, GJU19, CJSU19] is an efficient implementation of learning-based guidance for given clause selection in saturation-based automated theorem provers. Clauses from many proof searches are classified as positive and negative based on their participation in the proofs. An efficient classification model is trained on this data, using fast feature-based characterization of the clauses. The learned model is then tightly linked with the core prover and used as a basis of a parameterized evaluation heuristic that provides fast ranking of all generated clauses. ENIGMA uses E as its underlying prover. The CASC-27 FOF competition version will most likely use a classifier based on gradient-boosted trees (XGBoost or LightGBM) and clause features that abstract from symbol and formula/clause names. We may also include neural classifiers based on clause structure. The system will be pre-trained on the latest public TPTP library and will also use a portfolio of strategies pre-trained for TPTP with BliStr(Tune) [Urb13, JU17]. The system may also include strategies for large TPTP problems used previously in the E.T. system.
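
As an illustration of the feature-based approach, the sketch below extracts a simple bag of parent/child symbol pairs from a clause's terms and mixes a learned score into a symbol-counting weight. The feature choice and the weighting scheme are simplifications for illustration, not ENIGMA's actual features or code; model stands for any trained classifier with a scikit-learn-style predict_proba.

    from collections import Counter

    def features(terms):
        """Bag of (parent symbol, child symbol) pairs over all terms."""
        feats = Counter()
        def walk(term, parent="*"):
            head = term[0] if isinstance(term, tuple) else term
            feats[(parent, head)] += 1
            if isinstance(term, tuple):
                for arg in term[1:]:
                    walk(arg, head)
        for t in terms:
            walk(t)
        return feats

    def clause_weight(clause_terms, symbol_count, model, vectorize, alpha=0.5):
        """Smaller weight = clause is selected earlier."""
        vec = vectorize(features(clause_terms))          # fixed-length encoding
        usefulness = model.predict_proba([vec])[0][1]    # learned P(useful)
        return symbol_count * (1.0 - alpha * usefulness) # discount good clauses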

Strategies

The core system is a modification of E adding learning-based evaluation of the generated clauses. The system is pre-trained on TPTP using abstracted characterization of clauses, and the trained classifier is used during the competition as an additional clause evaluation heuristic that is combined in various ways with the core E strategies.

Implementation

The system modifies E's implementation by adding various feature extraction methods and linking E with efficient learning-based classifiers (tree-based, linear and neural). ENIGMA is available at:
    https://github.com/ai4reason/enigmatic

Expected Competition Performance

ENIGMA should be able to improve on E's auto schedule in FOF.


Etableau 0.1

John Hester
University of Florida, USA

Architecture

Etableau builds a first-order tableau by applying elimination rules to the negated conjecture of a problem. Depending on the problem, this can result in many branches that can then be checked for contradictions independently and in parallel. This approach has the advantage of ensuring that E has unit clauses available to it on every proof attempt. If every branch yields a contradiction, then together these contradictions constitute a proof of the theorem. Otherwise, the proof attempt is a failure.
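
Schematically, the branch checking could look like this. This is illustrative: Etableau is built on E's C code base with a Python supervisor, and the file handling and E invocation details here are assumptions.

    from concurrent.futures import ProcessPoolExecutor
    import subprocess

    def branch_closes(branch_file):
        """Ask E whether the clauses on one tableau branch are contradictory."""
        try:
            out = subprocess.run(["eprover", "--auto", branch_file],
                                 capture_output=True, text=True, timeout=60)
        except subprocess.TimeoutExpired:
            return False
        return "SZS status Unsatisfiable" in out.stdout

    def prove(branch_files):
        # the theorem is proved iff every branch yields a contradiction
        with ProcessPoolExecutor() as pool:
            return all(pool.map(branch_closes, branch_files))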

Strategies

The strategy is currently fixed, with any applicable elimination rules being applied to the negated conjecture. The strategy used by E in checking the branches for contradictions is simply the E auto mode.

Implementation

The tableau building tool itself is based on E, using many of its methods and adding many others. A Python script supervises the interaction between the built tableau and E. Depending on the hardware available, multiple branches may be checked for contradictions at the same time without communicating with each other.

Expected Competition Performance

The current tableau building mechanism is primitive, yet still helpful on problems that can be broken into several branches. On problems that result in just a single branch, performance should be very similar to E 2.3. On the other hand, it is expected that some problems with multiple branches may be solved by Etableau and not by E.


GKC 0.4

Tanel Tammet
Tallinn University of Technology, Estonia

Architecture

GKC [Tam19] is a resolution prover optimized for search in large knowledge bases. It is built as a building block for developing specialized methods and strategies for commonsense reasoning, including nonmonotonic reasoning, probabilistic reasoning, and machine learning methods. We envision natural language question answering systems as the main potential application of these specialized methods. A number of standard inference rules have been implemented. GKC does not currently implement any propositional inferences or instance generation.

Strategies

For CASC, GKC uses multiple strategies run sequentially, with the time limit starting at one second for each strategy and increased fivefold once the whole batch has been run. Strategy selection takes into consideration basic properties of the problem, in particular its approximate size. There is no interplay between different strategies.

We select the given clause using several queues, in order to spread the selection relatively uniformly over these categories of derived clauses and their descendants: axioms, external axioms, assumptions, and goals. The queues are organized in two layers. As a first layer we use the common ratio-based algorithm, alternating between selecting n clauses from a weight-ordered queue and one clause from the FIFO queue in derivation order. As a second layer we use four separate queues based on the derivation history of a clause. Each queue in the second layer contains the two sub-queues of the first layer.
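
The two-layer scheme can be sketched as follows. The data layout is illustrative, not GKC's C implementation, and picks are O(n) here for brevity.

    class SubQueues:
        """First layer: n weight-based picks alternating with one FIFO pick."""
        def __init__(self, ratio=4):
            self.ratio, self.picks, self.order = ratio, 0, 0
            self.clauses = []              # (weight, insertion order, clause)
        def add(self, clause, weight):
            self.clauses.append((weight, self.order, clause))
            self.order += 1
        def pick(self):
            if not self.clauses:
                return None
            self.picks += 1
            fifo = self.picks % (self.ratio + 1) == 0
            best = min(self.clauses, key=lambda t: t[1] if fifo else t[0])
            self.clauses.remove(best)
            return best[2]

    class TwoLayerSelection:
        """Second layer: one sub-queue pair per derivation-history category."""
        CATEGORIES = ("axioms", "external", "assumptions", "goals")
        def __init__(self):
            self.layer = {c: SubQueues() for c in self.CATEGORIES}
            self.turn = 0
        def pick(self):
            # spread the selection uniformly over the four categories
            for _ in self.CATEGORIES:
                cat = self.CATEGORIES[self.turn % len(self.CATEGORIES)]
                self.turn += 1
                clause = self.layer[cat].pick()
                if clause is not None:
                    return clause
            return None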

Implementation

GKC is implemented in C. The data representation machinery is built upon the shared memory graph database WhiteDB, enabling GKC to solve multiple different queries in parallel processes without needing to repeatedly parse or load the large knowledge base from disk. An interesting aspect of GKC is the pervasive use of hash indexes, feature vectors and fingerprints, while no tree indexes are used.

GKC can be obtained from:

    https://github.com/tammet/gkc/

Expected Competition Performance

We expect GKC to be in the middle of the final ranking for FOF, and below average in UEQ and EPR. In particular, GKC only looks for proofs in the EPR division and does not try to show non-provability. On the other hand, we expect GKC to perform well on very large problems.


iProver 2.8

Konstantin Korovin
University of Manchester, United Kingdom

Description Missing

Expected Competition Performance

iProver 2.8 was the CASC-J9 EPR winner.


iProver 3.0

Konstantin Korovin
University of Manchester, United Kingdom

Architecture

iProver is an automated theorem prover based on the instantiation calculus Inst-Gen [Kor13, Kor08, GK03], which is complete for first-order logic. iProver approximates first-order clauses using propositional abstractions which are solved using MiniSAT [ES04] and refined using model-guided instantiations. iProver combines instantiation, resolution and superposition. New features in iProver 3.0 include:

Strategies

iProver has around 100 options to control the proof search, including options for literal selection, passive clause selection, frequency of calling the SAT solver, simplifications, and options for the combination of instantiation with resolution and superposition. By default iProver executes a small number of fixed schedules of selected options, depending on general syntactic properties such as Horn/non-Horn, equational/non-equational, and maximal term depth. In the LTB and FEW divisions iProver runs, in parallel, heuristics learnt using machine learning [HK19].

Implementation

iProver is implemented in OCaml and uses MiniSat [ES04] for ground reasoning. iProver accepts FOF, TFF and CNF formats. Vampire [KV13, HK+12] and the E prover [Sch13] are used for proof-producing clausification of FOF/TFF problems; Vampire is also used for SInE axiom selection [HV11] in the LTB division. iProver is available at:
    http://www.cs.man.ac.uk/~korovink/iprover/

Expected Competition Performance

Compared to last year, we have integrated superposition, which should improve iProver's performance on equational problems. In the LTB/FEW divisions we also expect improvements due to newly learnt strategies and new abstraction refinement strategies.


leanCoP 2.2

Jens Otten
University of Oslo, Norway

Architecture

leanCoP [OB03, Ott08] is an automated theorem prover for classical first-order logic with equality. It is a very compact implementation of the connection (tableau) calculus [Bib87, LS01].

Strategies

The reduction rule of the connection calculus is applied before the extension rule. Open branches are selected in a depth-first way. Iterative deepening on the proof depth is performed in order to achieve completeness. Additional inference rules and techniques include regularity, lemmata, and restricted backtracking [Ott10]. leanCoP uses an optimized structure-preserving transformation into clausal form [Ott10] and a fixed strategy schedule that is controlled by a shell script.
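
In outline, and in imperative pseudocode rather than leanCoP's Prolog, the control structure looks roughly like this (the connect function stands for the depth-bounded, depth-first connection search, which is not spelled out here):

    def prove(matrix, connect, max_depth=100):
        """Iterative deepening on the proof depth for completeness."""
        for depth in range(1, max_depth + 1):
            # connect() performs the depth-first connection search, trying
            # the reduction rule before the extension rule on each branch
            proof = connect(matrix, depth)
            if proof is not None:
                return proof             # closed tableau within this bound
        return None                      # give up at the depth cap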

Implementation

leanCoP is implemented in Prolog. The source code of the core prover consists only of a few lines of code. Prolog's built-in indexing mechanism is used to quickly find connections when the extension rule is applied.

leanCoP can read formulae in leanCoP syntax and in TPTP first-order syntax. Equality axioms and axioms to support distinct objects are automatically added if required. The leanCoP core prover returns a very compact connection proof, which is then translated into a more comprehensive output format, e.g., into a lean (TPTP-style) connection proof or into a readable text proof.

The source code of leanCoP 2.2 is available under the GNU general public license. It can be downloaded from the leanCoP website at:

    http://www.leancop.de
The website also contains information about ileanCoP [Ott08] and MleanCoP [Ott12, Ott14], two versions of leanCoP for first-order intuitionistic logic and first-order modal logic, respectively.

Expected Competition Performance

As the prover has not changed, the performance of leanCoP 2.2 is expected to be the same as last year.


LEO-II 1.7.0

Alexander Steen
University of Luxembourg, Luxembourg

Architecture

LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system based on extensional higher-order resolution. More precisely, LEO-II employs a refinement of extensional higher-order RUE resolution [Ben99]. LEO-II is designed to cooperate with specialist systems for fragments of higher-order logic. By default, LEO-II cooperates with the first-order ATP system E [Sch02]. LEO-II is often too weak to find a refutation amongst the steadily growing set of clauses on its own. However, some of the clauses in LEO-II's search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II launches a cooperating first-order ATP system every n iterations of its (standard) resolution proof search loop (e.g., every 10 iterations). If the first-order ATP system finds a refutation, it communicates its success to LEO-II in the standard SZS format. Communication between LEO-II and the cooperating first-order ATP system uses the TPTP language and standards.
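
The cooperation scheme amounts to a loop of roughly this shape. This is a schematic sketch, not LEO-II's OCaml code; fo_image stands for the transformation function mentioned above, and the other callables are placeholders.

    def resolution_loop(clauses, step, fo_image, call_fo_atp, n=10):
        """Hand the first-order image of the clause set to an external ATP
        every n iterations of the higher-order resolution search."""
        iteration = 0
        while True:
            clauses, done = step(clauses)      # one HO resolution iteration
            if done:
                return "Theorem"               # LEO-II found the refutation
            iteration += 1
            if iteration % n == 0:
                images = [fo_image(c) for c in clauses]
                fo_clauses = [i for i in images if i is not None]
                if call_fo_atp(fo_clauses) == "Unsatisfiable":
                    return "Theorem"           # backend found the refutation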

Strategies

LEO-II employs an adapted "Otter loop". Moreover, LEO-II uses some basic strategy scheduling to try different search strategies or flag settings. These search strategies also include some different relevance filters.

Implementation

LEO-II is implemented in OCaml 4, and its problem representation language is the TPTP THF language [BRS08]. In fact, the development of LEO-II has largely paralleled the development of the TPTP THF language and related infrastructure [SB10]. LEO-II's parser supports the TPTP THF0 language and also the TPTP languages FOF and CNF.

Unfortunately the LEO-II system still uses only a very simple sequential collaboration model with first-order ATPs instead of using the more advanced, concurrent and resource-adaptive OANTS architecture [BS+08] as exploited by its predecessor LEO.

The LEO-II system is distributed under a BSD style license, and it is available from

    http://www.leoprover.org

Expected Competition Performance

LEO-II is not actively being developed anymore, hence there are no expected improvements over last year's CASC results.


Leo-III 1.4

Alexander Steen
University of Luxembourg, Luxembourg

Architecture

Leo-III [SB18], the successor of LEO-II [BP+08], is a higher-order ATP system based on extensional higher-order paramodulation with inference restrictions using a higher-order term ordering. The calculus contains dedicated extensionality rules and is augmented with equational simplification routines that have their intellectual roots in first-order superposition-based theorem proving. The saturation algorithm is a variant of the given clause loop procedure inspired by the first-order ATP system E.

Leo-III cooperates with external first-order ATPs which are called asynchronously during proof search; a focus is on cooperation with systems that support typed first-order (TFF) input. For this year's CASC, CVC4 [BC+11] and E [Sch02, Sch13] are used as external systems. However, cooperation is in general not limited to first-order systems. Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or counter model generators) may be included using simple command-line arguments. If the saturation procedure loop (or one of the external provers) finds a proof, the system stops, generates the proof certificate and returns the result.

For the LTB division, Leo-III is augmented by an external Python3 driver that schedules Leo-III runs on the problems of the batches.

Strategies

Leo-III comes with several configuration parameters that influence its proof search by applying different heuristics and/or restricting inferences. These parameters can be chosen manually by the user on start-up. Leo-III currently offers neither portfolio scheduling (time slicing) nor automatic selection of configuration parameters suited to the reasoning problem at hand. Strategies and strategy scheduling will be addressed in upcoming versions.

Implementation

Leo-III utilizes and instantiates the associated LeoPARD system platform [WSB15] for higher-order (HO) deduction systems implemented in Scala (currently using Scala 2.12 and running on a JVM with Java 8). The prover makes use of LeoPARD's sophisticated data structures and implements its own reasoning logic on top. A generic parser is provided that supports all TPTP syntax dialects. It is implemented using ANTLR4 and converts its produced concrete syntax tree to an internal TPTP AST data structure which is then transformed into polymorphically typed lambda terms. As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF, TFF, THF) as well as its polymorphic variants [BP13, KSR16].

The term data structure of Leo-III uses a polymorphically typed spine term representation augmented with explicit substitutions and De Bruijn-indices. Furthermore, terms are perfectly shared during proof search, permitting constant-time equality checks between alpha-equivalent terms.
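
The effect of this representation can be demonstrated with a few lines of hash-consing (illustrative only; Leo-III is implemented in Scala): with De Bruijn indices, alpha-equivalent terms are syntactically identical, and perfect sharing then reduces their equality test to a pointer comparison.

    pool = {}

    def mk(*node):
        """Return the unique shared instance of a term node (hash-consing)."""
        return pool.setdefault(node, node)

    # \x. \y. x and \u. \v. u both become lam(lam(bound(1))),
    # with 0-based De Bruijn indices counting binders inside out:
    t1 = mk("lam", mk("lam", mk("bound", 1)))
    t2 = mk("lam", mk("lam", mk("bound", 1)))
    print(t1 is t2)   # True: alpha-equivalence decided in constant time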

Leo-III's saturation procedure may at any point invoke external reasoning tools. To that end, Leo-III includes an encoding module which translates (polymorphic) higher-order clauses to polymorphic or monomorphic typed first-order clauses, whichever is supported by the external system. While LEO-II relied on cooperation with untyped first-order provers, Leo-III exploits the native type support of first-order provers (TFF logic) to remove clutter during translation and, in turn, to make external cooperation more effective.

Leo-III is available on GitHub:

    https://github.com/leoprover/Leo-III

Expected Competition Performance

Version 1.4 only marginally improves on the previous release by fixing some bugs. However, this year we try to run a native build of Leo-III using the ahead-of-time GraalVM compiler. This should improve the running time behavior of Leo-III but will, most likely, not improve its general success rate. So, we expect a performance similar to last year's CASC.

As a novelty, this year is the first time that Leo-III competes in the LTB division of CASC. Thanks to Leo-III's support for polymorphic HOL reasoning, we expect a reasonable performance compared to the other systems. On the other hand, Leo-III's LTB mode does not do any learning and/or analysis of the learning samples.


MaedMax 1.3

Sarah Winkler
Universität Innsbruck, Austria

Architecture

MaedMax [WM18] is an equational theorem prover based on maximal ordered completion. This variant of ordered completion is inspired by maximal completion as developed by Klein and Hirokawa [KH11]. In both of these approaches the proof search is guided by a maxSMT solver. More precisely, the idea is to maintain a single pool of equations. One then orients as many equations as possible into a terminating rewrite system, by encoding some class of reduction orders in SMT and solving a maxSMT problem. If a terminating rewrite system obtained in this way joins the goal the procedure succeeds. Otherwise, if it gives rise to a ground complete presentation, then the goal can be decided by (basic) narrowing. If neither of these success criteria apply, extended critical pairs are added to the equation pool and the procedure is reiterated. Instead of orienting as many equations as possible, one may also use more complex goal functions, which are often more effective [SW15].
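
The core step, orienting as many equations as possible at once, can be illustrated with the real z3 Python bindings (pip install z3-solver). This toy measures each side by the sum of its symbol weights, a crude stand-in for the SMT encodings of reduction orders that MaedMax actually uses; the equations are made up for the example.

    from z3 import Ints, Optimize, Sum, sat

    # each side of an equation as a multiset of symbol occurrences
    equations = [({"f": 1, "a": 1}, {"b": 1}),   # f(a) = b
                 ({"a": 1}, {"b": 1}),           # a = b
                 ({"b": 1}, {"a": 1})]           # b = a  (clashes with a = b)
    wf, wa, wb = Ints("wf wa wb")
    weight = {"f": wf, "a": wa, "b": wb}

    def size(side):
        return Sum([weight[s] * n for s, n in side.items()])

    opt = Optimize()
    for v in weight.values():
        opt.add(v >= 1)                          # admissible symbol weights
    for lhs, rhs in equations:
        opt.add_soft(size(lhs) > size(rhs))      # soft: orient left-to-right
    if opt.check() == sat:
        print(opt.model())   # weights orienting 2 of the 3 equations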

Compared to traditional ordered completion [BDP89], maximal ordered completion has the advantage of being fully automatic with respect to the reduction order, a notoriously critical parameter: rather than requiring an explicit heuristic, the reduction order is determined by optimizing some goal function. Moreover, rather than processing one equation at a time, the inference scheme is inherently more parallel: multiple equations are oriented at once, and critical pair computation as well as rewriting is done for many equations in every iteration.

Different criteria for ground joinability are supported, some of which can contribute to the optimization in the maxSMT call.

MaedMax can output proofs in the TSTP format as well as in CPF, which is checkable by the verified proof checker CeTA [SWZ15].

Strategies

The strategy can be controlled by numerous flags, which determine the goal function which gets optimized by the SMT solver, classes of reduction orders considered, selection of new equations and goals, the ground joinability criteria used, the restart frequency, bounds for newly selected equations and goals, and treatment of AC operators, among others.

Implementation

MaedMax is implemented in OCaml, interfacing Yices 1 and Z3 as maxSMT solvers. Fingerprint indexing [Sch12] is used to retrieve matching and unifiable terms. Source code and binary are available at:
    http://informatik-maedmax.uibk.ac.at/

Expected Competition Performance

Being a new participant in a new (or at least long abolished) category, it's going to be a surprise ...


MaLARea 0.6

Josef Urban
Czech Technical University in Prague, Czech Republic

Architecture

MaLARea 0.6 [Urb07, US+08, KUV15] is a metasystem for ATP in large theories where symbol and formula names are used consistently. It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace), as well as complementary AI techniques like machine learning (the SNoW system) based on symbol-based similarity, model-based similarity, term-based similarity, and obviously previous successful proofs. The version for CASC-J9 will use the E prover with the BliStr(Tune) [Urb13, JU17] large-theory strategies, possibly also Prover9, Mace and Paradox. The premise selection methods will likely also use the distance-weighted k-nearest neighbor [KU13] and E's implementation of SInE.

Strategies

The basic strategy is to run ATPs on problems, then use the machine learner to learn axiom relevance for conjectures from the solutions, and use the most relevant axioms for the next ATP attempts. This is iterated, using different time limits and axiom limits. Various features are used for learning, and the learning is complemented by other criteria like model-based reasoning, symbol and term-based similarity, etc.
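
In outline, the loop looks as follows (a schematic sketch, with the learner and ATP calls passed in as placeholder functions):

    def malarea_loop(problems, axioms, runs, train, select_axioms, run_atp):
        """runs is a list of (time limit, axiom limit) pairs, typically growing."""
        solutions = {}                     # problem -> axioms used in its proof
        for time_limit, axiom_limit in runs:
            model = train(solutions)       # learn axiom relevance from proofs
            for p in problems:
                if p in solutions:
                    continue
                chosen = select_axioms(model, p, axioms, axiom_limit)
                proof = run_atp(p, chosen, time_limit)
                if proof is not None:
                    solutions[p] = proof   # feeds the next learning round
        return solutions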

Implementation

The metasystem is implemented in ca. 2500 lines of Perl. It uses many external programs: the above-mentioned ATPs and machine learner, TPTP utilities, LADR utilities for working with models, and some standard Unix tools. MaLARea is available at:
    https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2.

Expected Competition Performance

MaLARea 0.6 was the CASC-J9 LTB winner.


MaLARea 0.8

Josef Urban
Czech Technical University in Prague, Czech Republic

Architecture

MaLARea 0.8 [Urb07, US+08, KUV15] is a metasystem for ATP in large theories where symbol and formula names are used consistently. It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace), as well as complementary AI techniques like machine learning based on symbol-based similarity, model-based similarity, term-based similarity, and obviously previous successful proofs. The version for CASC-27 will use the E prover with the BliStr(Tune) [Urb13, JU17] large-theory strategies, possibly also Prover9, Mace and Paradox. The premise selection methods will likely also use the distance-weighted k-nearest neighbor [KU13], ATPBoost [PU18], and E's implementation of SInE.

Strategies

The basic strategy is to run ATPs on problems, then use the machine learner to learn axiom relevance for conjectures from solutions, and use the most relevant axioms for next ATP attempts. This is iterated, using different time limits and axiom limits. Various features are used for learning, and the learning is complemented by other criteria like model-based reasoning, symbol and term-based similarity, etc.

Implementation

The metasystem is implemented in ca. 2500 lines of Perl. It uses many external programs: the above-mentioned ATPs and machine learner, TPTP utilities, LADR utilities for working with models, and some standard Unix tools. MaLARea is available at:
    https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2.

Expected Competition Performance

Thanks to machine learning, MaLARea is strongest on batches of many related problems with many redundant axioms, where some of the problems are easy to solve and can be used for learning axiom relevance. MaLARea is not very good when all problems are too difficult (nothing to learn from), or when the problems (are few and) have nothing in common. Some of its techniques (selection by symbol and term-based similarity, model-based reasoning) could however make it slightly stronger than standard ATPs even there. MaLARea performed very well on the MPTP Challenge, a predecessor of the LTB division, and in several previous LTB competitions.


nanoCoP 1.1

Jens Otten
University of Oslo, Norway

Architecture

nanoCoP [Ott16, Ott17] is an automated theorem prover for classical first-order logic with equality. It is a very compact implementation of the non-clausal connection calculus [Ott11].

Strategies

An additional decomposition rule is added to the clausal connection calculus and the extension rule is generalized to non-clausal formulae. Open branches are selected in a depth-first way. Iterative deepening on the proof depth is performed in order to achieve completeness. Additional inference rules and techniques include regularity, lemmata, restricted backtracking, and a fixed strategy schedule that is controlled by a shell script [Ott18].

Implementation

nanoCoP is implemented in Prolog. The source code of the core prover consists only of a few lines of code. Prolog's built-in indexing mechanism is used to quickly find connections when the extension rule is applied.

nanoCoP can read formulae in leanCoP/nanoCoP syntax and in TPTP first-order syntax. Equality axioms are automatically added if required. The nanoCoP core prover returns a compact non-clausal connection proof.

The source code of nanoCoP 1.1 is available under the GNU general public license. It can be downloaded from the nanoCoP website at:

    http://www.leancop.de/nanocop
The provers nanoCoP-i and nanoCoP-M are versions of nanoCoP for first-order intuitionistic logic and first-order modal logic, respectively. They are based on an adapted non-clausal connection calculus for non-classical logics [Ott17].

Expected Competition Performance

nanoCoP is expected to have a better performance than leanCoP on formulae that have a nested (non-clausal) structure.


Prover9 1109a

Bob Veroff on behalf of William McCune
University of New Mexico, USA

Architecture

Prover9, Version 2009-11A, is a resolution/paramodulation prover for first-order logic with equality. Its overall architecture is very similar to that of Otter-3.3 [McC03]. It uses the "given clause algorithm", in which not-yet-given clauses are available for rewriting and for other inference operations (sometimes called the "Otter loop").

Prover9 has available positive ordered (and nonordered) resolution and paramodulation, negative ordered (and nonordered) resolution, factoring, positive and negative hyperresolution, UR-resolution, and demodulation (term rewriting). Terms can be ordered with LPO, RPO, or KBO. Selection of the "given clause" is by an age-weight ratio.

Proofs can be given at two levels of detail: (1) standard, in which each line of the proof is a stored clause with detailed justification, and (2) expanded, with a separate line for each operation. When FOF problems are input, the proof of the transformation to clauses is not given.

Completeness is not guaranteed, so termination does not indicate satisfiability.

Strategies

Prover9 has available many strategies; the following statements apply to CASC.

Given a problem, Prover9 adjusts its inference rules and strategy according to syntactic properties of the input clauses such as the presence of equality and non-Horn clauses. Prover9 also does some preprocessing, for example, to eliminate predicates.

For CASC Prover9 uses KBO to order terms for demodulation and for the inference rules, with a simple rule for determining symbol precedence.

For the FOF problems, a preprocessing step attempts to reduce the problem to independent subproblems by a miniscope transformation; if the problem reduction succeeds, each subproblem is clausified and given to the ordinary search procedure; if the problem reduction fails, the original problem is clausified and given to the search procedure.
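
As a sketch of this preprocessing step (placeholder functions, not LADR's API):

    def prove_fof(formula, miniscope_split, clausify, search):
        """Try to split a FOF problem into independent subproblems first."""
        subproblems = miniscope_split(formula)       # None if reduction fails
        if subproblems is None:
            return search(clausify(formula))         # fall back: whole problem
        # the conjecture holds iff every independent subproblem is proved
        return all(search(clausify(sub)) for sub in subproblems)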

Implementation

Prover9 is coded in C, and it uses the LADR libraries. Some of the code descended from EQP [McC97]. (LADR has some AC functions, but Prover9 does not use them). Term data structures are not shared (as they are in Otter). Term indexing is used extensively, with discrimination tree indexing for finding rewrite rules and subsuming units, FPA/Path indexing for finding subsumed units, rewritable terms, and resolvable literals. Feature vector indexing [Sch04] is used for forward and backward nonunit subsumption. Prover9 is available from
   http://www.cs.unm.edu/~mccune/prover9/

Expected Competition Performance

Prover9 is the CASC fixed point, against which progress can be judged. Each year it is expected to do worse than the previous year, relative to the other systems.


PyRes 1.0

Stephan Schulz
DHBW Stuttgart, Germany

Architecture

PyRes is a simple resolution-style theorem prover for first-order logic, implemented in very clear and well-commented Python. It has been written as a pedagogical tool to illustrate the architecture and basic algorithms of a saturation-style theorem prover. The prover consists of a parser for (most of) TPTP-3 format, a simple clausifier to convert full first-order format into clause normal form, and a saturation core trying to derive the empty clause from the resulting clause set. The saturation core is based on the DISCOUNT-loop variant of the given-clause algorithm, i.e., a strict separation of active and passive facts. It implements simple binary resolution and factoring [Rob65], optionally with selection of negative literals [BG+01]. Redundancy elimination is restricted to forward and backward subsumption and tautology deletion. There are no inference rules for equality - if equality is detected, the necessary axioms are added.

Strategies

The prover supports several negative literal selection strategies, as well as selection of the given clause from a set of differently weighted priority queues in the style of E [SCV19]. In the competition, it will always select the syntactically largest literal, and will use weight-age interleaved clause selection with a pick-given ratio of 5 to 1.

Implementation

The prover is implemented in Python 2.7, with maximal emphasis on clear and well-documented code. Terms are represented as nested lists (equivalent to LISP-style s-expressions). Literals, clauses, and formulas are implemented as classes using an object-oriented style. The system does not use any indexing or other advanced techniques. PyRes builds a proof object on the fly, and can print a TPTP-3 style proof or saturation derivation. The system source is available at
    https://github.com/eprover/PyRes
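
For illustration, with this representation the term f(g(X),a) becomes ["f", ["g", "X"], "a"], and a symbol-counting weight function, the basic building block of the weighted priority queues mentioned above, takes only a few lines (a sketch in the spirit of PyRes, not its actual code):

    def term_weight(t, fweight=2, vweight=1):
        """Symbol-counting weight over nested-list terms; variables are
        strings starting with an upper-case letter."""
        if isinstance(t, str):
            return vweight if t[0].isupper() else fweight
        return fweight + sum(term_weight(s, fweight, vweight) for s in t[1:])

    print(term_weight(["f", ["g", "X"], "a"]))   # 2 + (2 + 1) + 2 = 7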

Expected Competition Performance

Performance is expected to be mediocre for non-equational problems and abysmal for problems with equality. However, per CASC rules, PyRes will still be assumed superior to any non-participating prover.


Satallax 3.3

Michael Färber
Universität Innsbruck, Austria

Architecture

Satallax 3.3 [Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the proof search. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. Problems are given in the THF format.

Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates first-order formulae in addition to the propositional clauses. If this option is used, then Satallax periodically calls the first-order theorem prover E [Sch13] to test for first-order unsatisfiability. If the set of first-order formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.

Strategies

There are about 150 flags that control the order in which formulae and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time each mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Before deciding on the schedule to use, Satallax parses the problem and determines if it is big enough that a SInE-based premise selection algorithm [HV11] should be used. If SInE is not activated, then Satallax 3.3 uses a strategy schedule consisting of 48 modes (16 of which make use of Mizar style soft types). Each mode is tried for time limits ranging from less than a second to just over 1 minute. If SInE is activated, then Satallax is run with a SInE-specific schedule consisting of 20 possible SInE parameter values selecting different premises and some corresponding modes and time limits.

Implementation

Satallax is implemented in OCaml, making use of the external tools MiniSat (via a foreign function interface) and E. Satallax is available at:
    http://satallaxprover.com

Expected Competition Performance

Satallax 3.3 was the CASC-J9 THF winner.


Satallax 3.4

Michael Färber
ENS Paris-Saclay, France

Architecture

Satallax 3.4 [Bro12] is an automated theorem prover for higher-order logic. The particular form of higher-order logic supported by Satallax is Church's simple type theory with extensionality and choice operators. The SAT solver MiniSat [ES04] is responsible for much of the proof search. The theoretical basis of search is a complete ground tableau calculus for higher-order logic [BS10] with a choice operator [BB11]. Problems are given in the THF format.

Proof search: A branch is formed from the axioms of the problem and the negation of the conjecture (if any is given). From this point on, Satallax tries to determine unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulae and corresponding propositional clauses [Bro13]. These formulae and propositional clauses correspond to instances of the tableau rules. Satallax uses the SAT solver MiniSat to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, then the original branch is unsatisfiable. Optionally, Satallax generates lambda-free higher-order logic (lfHOL) formulae in addition to the propositional clauses [VB+19]. If this option is used, then Satallax periodically calls the theorem prover E [Sch13] to test for lfHOL unsatisfiability. If the set of lfHOL formulae is unsatisfiable, then the original branch is unsatisfiable. Upon request, Satallax attempts to reconstruct a proof which can be output in the TSTP format.

Strategies

There are about 150 flags that control the order in which formulae and instantiation terms are considered and propositional clauses are generated. Other flags activate some optional extensions to the basic proof procedure (such as whether or not to call the theorem prover E). A collection of flag settings is called a mode. Approximately 500 modes have been defined and tested so far. A strategy schedule is an ordered collection of modes with information about how much time each mode should be allotted. Satallax tries each of the modes for a certain amount of time sequentially. Before deciding on the schedule to use, Satallax parses the problem and determines if it is big enough that a SInE-based premise selection algorithm [HV11] should be used. If SInE is not activated, then Satallax uses a strategy schedule consisting of 37 modes. If SInE is activated, then Satallax is run with a SInE-specific schedule consisting of 58 modes with different SInE parameter values selecting different premises. Each mode is tried for time limits ranging from less than a second to just over 1 minute.

Implementation

Satallax is implemented in OCaml, making use of the external tools MiniSat (via a foreign function interface) and E. Satallax is available at:
    http://cl-informatik.uibk.ac.at/~mfaerber/satallax.html

Expected Competition Performance

Satallax 3.4 adds support for lambda-free higher-order logic in E. Among other things, this allows Satallax to efficiently translate formulas with partially applied terms to E. This feature was implemented with the help of Petar Vukmirovic during a research visit to Vrije Universiteit Amsterdam, which was sponsored by a travel prize contributed by Jasmin Blanchette for CASC-J9. This improves the performance of Satallax compared to version 3.3. However, it might be offset by the rising number of polymorphic problems in the TPTP, which Satallax does not support.


Twee 2.2

Nick Smallbone
Chalmers University of Technology, Sweden

Architecture

Twee 2.2 is an equational prover based on unfailing completion [BDP89]. It features ground joinability testing [MN90] and a connectedness test [BD88], which together eliminate many redundant inferences in the presence of unorientable equations.

Twee's implementation of ground joinability testing performs case splits on the order of variables, in the style of [MN90], and discharges individual cases by rewriting modulo a variable ordering. It is able to pick only useful case splits and to case split on a subset of the variables, which makes it efficient enough to be switched on unconditionally. Horn clauses are encoded as equations as described in [CS18]. The CASC version of Twee "handles" non-Horn clauses by discarding them.

Strategies

The main loop is a DISCOUNT loop. The active set contains rewrite rules and unorientable equations, which are used for rewriting, and the passive set contains unprocessed critical pairs. Twee often interreduces the active set, and occasionally simplifies the passive set with respect to the active set. Each critical pair is scored using a weighted sum of the weight of both of its terms. Terms are treated as DAGs when computing weights, i.e., duplicate subterms are only counted once per term. The weights of critical pairs that correspond to Horn clauses are adjusted by the heuristic described in Section 5 of [CS18].
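
The DAG-based weight can be illustrated as follows (Twee itself is written in Haskell; this toy models a term as a (symbol, arguments) pair):

    def dag_weight(term, seen=None):
        """Count each distinct subterm once per term, as in a DAG."""
        if seen is None:
            seen = set()
        if term in seen:
            return 0                       # duplicate subterm: already counted
        seen.add(term)
        symbol, args = term
        return 1 + sum(dag_weight(a, seen) for a in args)

    x = ("x", ())
    t = ("f", (("g", (x,)), ("g", (x,))))  # f(g(x), g(x))
    print(dag_weight(t))                   # 3 rather than the tree weight 5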

Implementation

Twee is written in Haskell. Terms are represented as array-based flatterms for efficient unification and matching. Rewriting uses an imperfect discrimination tree. The passive set is represented as a heap. It achieves high space efficiency (12 bytes per critical pair) by storing the parent rule numbers and overlap position instead of the full critical pair and by grouping all critical pairs of each rule into one heap entry.

Twee uses an LCF-style kernel: all rules in the active set come with a certified proof object which traces back to the input axioms. When a conjecture is proved, the proof object is transformed into a human-readable proof. Proof construction does not harm efficiency because the proof kernel is invoked only when a new rule is accepted. In particular, reasoning about the passive set does not invoke the kernel. The translation from Horn clauses to equations is not yet certified.

Twee can be downloaded from:

    http://nick8325.github.io/twee

Expected Competition Performance

Twee should be competitive in UEQ but will probably not beat Waldmeister. In FOF, it will do much worse since it is really an equational prover extended to handle Horn clauses. It should still do well on Horn problems with a heavy equational part, and with a lucky selection of problems, it may solve a few hard ones.


Vampire 4.3

Giles Reger
University of Manchester, United Kingdom

Architecture

Vampire [KV13] 4.3 is an automatic theorem prover for first-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. Both resolution and instantiation based proof search make use of global subsumption.

A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm. When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF.

Strategies

Vampire 4.3 provides a very large number of options for strategy selection. The most important ones are:

Implementation

Vampire 4.3 is implemented in C++. It makes use of MiniSat and Z3.

Expected Competition Performance

Vampire 4.3 was the CASC-J9 TFA, FOF, and FNT winner.


Vampire 4.4

Giles Reger
University of Manchester, United Kingdom

There are no major changes to the main part of Vampire since 4.3, beyond some new proof search heuristics and new default values for some options. The biggest addition is support for higher-order reasoning via translation to applicative form and combinators, the addition of axioms and extra inference rules, and a new form of combinatory unification.

Architecture

Vampire [KV13] 4.4 is an automatic theorem prover for first-order logic with extensions to theory-reasoning and higher-order logic. Vampire implements the calculi of ordered binary resolution and superposition for handling equality. It also implements the Inst-gen calculus and a MACE-style finite model builder [RSV16]. Splitting in resolution-based proof search is controlled by the AVATAR architecture which uses a SAT or SMT solver to make splitting decisions [Vor14, RB+16]. Both resolution and instantiation based proof search make use of global subsumption.

A number of standard redundancy criteria and simplification techniques are used for pruning the search space: subsumption, tautology deletion, subsumption resolution and rewriting by ordered unit equalities. The reduction ordering is the Knuth-Bendix Ordering. Substitution tree and code tree indexes are used to implement all major operations on sets of terms, literals and clauses. Internally, Vampire works only with clausal normal form. Problems in the full first-order logic syntax are clausified during preprocessing. Vampire implements many useful preprocessing transformations including the SinE axiom selection algorithm.

When a theorem is proved, the system produces a verifiable proof, which validates both the clausification phase and the refutation of the CNF. Vampire 4.4 provides a very large number of options for strategy selection. The most important ones are:

Implementation

Vampire 4.4 is implemented in C++. It makes use of MiniSat and Z3.

Expected Competition Performance

Vampire 4.4 should be comparable to 4.3.


Waldmeister 710

Thomas Hillenbrand
Max-Planck-Institut für Informatik, Germany

Architecture

Waldmeister 710 [Hil03] is a system for unit equational deduction. Its theoretical basis is unfailing completion in the sense of [BDP89] with refinements towards ordered completion (cf. [AHL03]). The system saturates the input axiomatization, distinguishing active facts, which induce a rewrite relation, and passive facts, which are the one-step conclusions of the active ones up to redundancy. The saturation process is parameterized by a reduction ordering and a heuristic assessment of passive facts [HJL99].

Strategies

The approach taken to control the proof search is to choose the search parameters, i.e., the reduction ordering and the heuristic assessment, according to the algebraic structure given in the problem specification [HJL99]. This is based on the observation that proof tasks sharing major parts of their axiomatization often behave similarly.

Implementation

The prover is coded in ANSI-C. It runs on Solaris, Linux, MacOS X, and Windows/Cygwin. The central data structures are: perfect discrimination trees for the active facts; group-wise compressions for the passive ones; and sets of rewrite successors for the conjectures. Visit the Waldmeister web pages at:
    http://www.waldmeister.org

Expected Competition Performance

Waldmeister 710 was the CASC-J7 UEQ winner.


Zipperposition 1.5

Petar Vukmirović
Vrije Universiteit Amsterdam, The Netherlands

Architecture

Zipperposition is a superposition-based theorem prover for typed first-order logic with equality and higher-order logic. It is a pragmatic implementation of a recently developed complete calculus for Boolean-free higher-order logic [BB+19]. It features a number of extensions that include polymorphic types; linear arithmetic on integers and rationals using a specialized set of first-order inference rules; datatypes with structural induction; user-defined rewriting on terms and formulas ("deduction modulo theories"); a lightweight variant of AVATAR for case splitting; and first-class booleans [KK+16]. The core architecture of the prover is based on saturation with an extensible set of rules for inferences and simplifications. The initial calculus and main loop were imitations of an old version of E [Sch02], but there are many more rules nowadays. A summary of the calculus for integer arithmetic and induction can be found in [Cru15].

Strategies

The system uses various strategies in a portfolio. The strategies have been chosen by manual inspection of different TPTP problems. Most of them are inspired by efficient strategies used in E. Portfolio mode differentiates higher-order problems from first-order ones. If the problem is first-order, all higher-order prover features are turned off. Other than that, the portfolio is static and does not depend on the syntactic properties of the problem.

Implementation

The prover is implemented in OCaml, and has been around for seven years. Term indexing is done using fingerprints for unification, perfect discrimination trees for rewriting, and feature vectors for subsumption. Some inference rules, such as contextual literal cutting, make heavy use of subsumption. For higher-order problems, some strategies use the E prover, running in lambda-free higher-order mode, as an end-game backend prover. The code can be found at
    https://github.com/c-cube/zipperposition
and is entirely free software (BSD-licensed). Zipperposition can also output graphic proofs using graphviz. Some tools to perform type inference and clausification for typed formulas are also provided, as well as a separate library for dealing with terms and formulas [Cru15].

Expected Competition Performance

The prover is expected to have average performance on FOF, similar to Prover9, and a decent performance on THF, but still slightly behind last year's CASC winners.