Entrants' System Descriptions
CSE 1.0
Feng Cao (Yang Xu, Jun Liu, Shuwei Chen, Xiaomei Zhong, Peng Xu, Qinghua Liu,
Huimin Fu, Xiaodong Guan, Zhenming Song)
Southwest Jiaotong University, China
Ulster University, United Kingdom (Jun Liu)
Architecture
CSE 1.0 is an automated theorem prover for first-order logic without equality,
based mainly on a novel inference mechanism called Contradiction Separation
Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18].
The S-CS inference rule is able to handle multiple (two or more) clauses
dynamically, in a synergized way, in one deduction step; binary resolution
is a special case.
In each step, multiple clauses are selected and each is separated into two
sub-clauses: one part of each clause is used to form a contradiction, while
the disjunction of the remaining literals forms the logical consequence of
the selected clauses, called the contradiction separation clause (CSC).
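A ground-level sketch of one such step follows; the clause representation and
the simple clash check are illustrative assumptions, not CSE's actual
implementation:

```python
def s_cs_step(clauses, picks):
    """One ground contradiction-separation step (illustrative sketch).

    clauses: list of clauses, each a frozenset of literals like "p" / "~p".
    picks:   picks[i] is the sub-clause of clauses[i] contributed to the
             contradiction; the leftover literals form the CSC.
    """
    def neg(lit):
        return lit[1:] if lit.startswith("~") else "~" + lit

    picked = set().union(*(set(p) for p in picks))
    # Sanity check that the picked parts can clash: every picked literal
    # has its complement among the picked literals (a necessary ground
    # condition, not a full unsatisfiability test).
    assert all(neg(l) in picked for l in picked), "picked parts do not clash"
    # The CSC is the disjunction of all leftover literals.
    return frozenset().union(*(set(c) - set(p) for c, p in zip(clauses, picks)))

# Binary resolution appears as the two-clause special case:
c1 = frozenset({"p", "q"})
c2 = frozenset({"~p", "r"})
resolvent = s_cs_step([c1, c2], [{"p"}, {"~p"}])   # CSC = q | r
```

With three or more clauses, the same step consumes several clauses at once,
which is where S-CS goes beyond binary resolution.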
CSE 1.0 adopts conventional factoring, equality resolution, and variable
renaming.
Pre-processing techniques, including pure literal deletion and simplification
based on the distance to the goal clause, are applied, along with a number of
standard redundancy criteria for pruning the search space: tautology deletion
and forward and backward subsumption.
Internally, CSE 1.0 works only with clausal normal form.
The E prover
[Sch13] is gratefully used for clausification of full first-order logic
problems during preprocessing.
Strategies
CSE 1.0 provides a number of options for strategy selection.
The most important ones are:
- Clause selection.
This strategy category mainly considers the number of times a clause has
taken part in inferences, clause weight (taking redundancy into account),
term weight, clause complexity, the number of literals, the number of
complementary predicates, whether the clause is a source clause or an
inferred clause, and so on.
- Literal selection.
This strategy category mainly considers the number of times a literal has
taken part in inferences, literal stability, literal complexity, the number
of predicates in the literal, etc.
- Weight strategy.
Clause weights are calculated mainly from the number of times a clause has
taken part in the deduction process and from clause redundancy.
The weights are updated dynamically during the deduction process.
- Contradiction separation clause (CSC) strategy.
This strategy category mainly considers the number of literals in the CSC,
the percentage of ground literals in the CSC, the number of function symbols
acting on the literals of a clause, and an effectiveness evaluation of
the CSC.
- Control strategy of the contradiction separation deduction process.
The number of literals in the CSC and the number of clauses involved in
each contradiction are changed dynamically during the deduction process.
Implementation
CSE 1.0 is implemented mainly in C++, with Java used to implement batch
problem running.
A shared data structure is used to store constants and shared variables.
The E prover is used for clausification of FOF problems, and TPTP2X is then
applied to convert the resulting CNF into TPTP format.
Expected Competition Performance
CSE is new to CASC, and we expect it to achieve acceptable performance.
Acknowledgement
Development of CSE 1.0 has been partially supported
by the National Natural Science Foundation of China (NSFC) (Grant No.61673320)
and the Fundamental Research Funds for the Central Universities in China (Grant
No.2682018ZT10).
CSE 1.1
Feng Cao (Yang Xu, Jun Liu, Shuwei Chen, Xingxing He, Xiaomei Zhong, Peng Xu,
Qinghua Liu, Huimin Fu, Jian Zhong, Guanfeng Wu, Xiaodong Guan,
Zhenming Song)
Southwest Jiaotong University, China
Ulster University, United Kingdom (Jun Liu)
Architecture
The basic inference mechanism of CSE 1.1 is similar to that of CSE 1.0, i.e.,
it is an automated theorem prover for first-order logic without equality,
based mainly on a novel inference mechanism called Contradiction Separation
Based Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18], which is able to handle multiple (two or more) clauses dynamically,
in a synergized way, in one deduction step; binary resolution is a special
case.
The difference between CSE 1.0 and CSE 1.1 is that CSE 1.1 has two S-CS
deduction mechanisms: one, called left-to-right, uses clauses that are not
yet in the contradiction under construction, while the other, called
right-to-left, uses clauses that are already in the contradiction under
construction.
In addition, CSE 1.1 supports repeated use of the same clause within one
deduction step.
These characteristics enable the S-CS deduction to produce more unit
clauses.
CSE 1.1 adopts conventional factoring, equality resolution, and variable
renaming.
Pre-processing techniques, including pure literal deletion and simplification
based on the distance to the goal clause, are applied, along with a number of
standard redundancy criteria for pruning the search space: tautology deletion
and forward and backward subsumption.
Internally, CSE 1.1 works only with clausal normal form. The E prover
[Sch13] is gratefully used for clausification of full first-order logic
problems during preprocessing.
Strategies
CSE 1.1 inherits most of the clause/literal selection strategies of CSE 1.0;
the crucial differences are the multiple-strategy mode and some heuristic
strategies.
The multiple-strategy mode allows CSE 1.1 to attack a problem by trying
different combinations of strategies.
Besides the strategies used in CSE 1.0, e.g., clause selection, literal
selection, and weight strategy, there are some new strategies:
- Deduction framework.
This provides two overall options for S-CS deduction: integrity deduction
mode, which takes all clauses into consideration during the deduction
process, and contradiction separation clause deduction mode, which
considers only a subset of the clauses.
- Repeated use of clauses.
This provides two options: repeated use of axioms and repeated use of
arbitrary clauses.
- Contradiction separation clause strategy.
Besides the CSC strategies in CSE 1.0, CSE 1.1 allows the use of
intermediate CSCs produced during the contradiction construction process.
Implementation
CSE 1.1 is implemented mainly in C++, with Java used to implement batch
problem running.
A shared data structure is used to store constants and shared variables.
In addition, special data structures are designed to describe properties of
clauses, literals, and terms, in support of the multiple-strategy mode.
The E prover is used for clausification of FOF problems, and TPTP2X is then
applied to convert the resulting CNF into TPTP format.
Expected Competition Performance
CSE is new to CASC, and we expect it to achieve acceptable performance.
Acknowledgement
Development of CSE 1.0 has been partially supported
by the National Natural Science Foundation of China (NSFC) (Grant No.61673320)
and the Fundamental Research Funds for the Central Universities in China (Grant
No.2682018ZT10).
CSE_E 1.0
Feng Cao (Yang Xu, Stephan Schulz, Jun Liu, Shuwei Chen, Xingxing He,
Xiaomei Zhong, Peng Xu, Qinghua Liu, Huimin Fu, Jian Zhong, Guanfeng Wu,
Xiaodong Guan, Zhenming Song)
Southwest Jiaotong University, China
DHBW Stuttgart, Germany (Stephan Schulz)
Ulster University, United Kingdom (Jun Liu)
Architecture
CSE_E 1.0 is an automated theorem prover for first-order logic that combines
CSE 1.1 and E 2.1, where CSE is based on the Contradiction Separation Based
Dynamic Multi-Clause Synergized Automated Deduction (S-CS)
[XL+18] and E is based on superposition.
The combination mechanism works as follows: E and CSE are applied to the
given problem sequentially.
If either prover solves the problem, the proof process terminates.
If neither CSE nor E can solve the problem, some clauses inferred by CSE,
especially unit clauses, are fed to E as lemmas, together with the original
clauses, for further proof search.
This kind of combination is expected to take advantage of both CSE and E,
and so achieve better performance.
Concretely, CSE is able to generate a good number of unit clauses, which are
helpful for proof search and equality handling, while E in turn has a strong
ability in equality handling.
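A minimal sketch of this sequential combination; the callable interfaces
(`prove_e`, `prove_cse`, `derive_units`) are hypothetical stand-ins for the
real E and CSE invocations:

```python
def cse_e_combine(prove_e, prove_cse, derive_units, clauses):
    """Sequential CSE_E combination (illustrative sketch).

    prove_e / prove_cse map a clause list to True when a proof is found;
    derive_units returns unit clauses inferred by CSE, used as lemmas.
    """
    if prove_e(clauses) or prove_cse(clauses):
        return "Theorem"
    # Neither prover succeeded on its own: hand E the original clauses
    # plus the unit clauses CSE inferred, for a further proof search.
    lemmas = derive_units(clauses)
    return "Theorem" if lemmas and prove_e(clauses + lemmas) else "GaveUp"

# Toy run: E succeeds only once the unit clause ["q"] is supplied as a lemma.
toy_e = lambda cs: ["q"] in cs
toy_cse = lambda cs: False
toy_units = lambda cs: [["q"]]
verdict = cse_e_combine(toy_e, toy_cse, toy_units, [["p", "q"], ["~p", "q"]])
```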
Strategies
The CSE part of CSE_E 1.0 uses the same strategies as standalone CSE 1.1,
e.g., clause/literal selection, strategy selection, and the CSC strategy.
The built-in strategies specific to CSE_E 1.0 are:
- Evaluation of contradiction separation clauses.
The evaluation is based on the number of clauses, the distance to the
goal clause, and other factors.
- Deletion of contradiction separation clauses.
This is realized based on weights.
Three weight calculation methods are provided, considering variables,
functions, terms, and the number of times a clause has been involved in
the deduction.
Implementation
CSE_E 1.0 is implemented mainly in C++, with Java used to implement batch
problem running.
The job dispatch between CSE and E is implemented in Java.
Expected Competition Performance
CSE_E is new to CASC, and is a first step towards combining CSE with other
theorem provers.
We expect it to perform satisfactorily.
Acknowledgement
Development of CSE 1.0 has been partially supported
by the National Natural Science Foundation of China (NSFC) (Grant No.61673320)
and the Fundamental Research Funds for the Central Universities in China (Grant
No.2682018ZT10).
Thanks are also due to Prof. Stephan Schulz for kindly permitting the use of
his E prover, which makes CSE_E possible.
CVC4 1.6pre
Andrew Reynolds
University of Iowa, USA
Architecture
CVC4
[BC+11] is an SMT solver based on the DPLL(T) architecture
[NOT06] that includes built-in support for many theories, including linear
arithmetic, arrays, bit vectors, datatypes, finite sets and strings.
It incorporates approaches for handling universally quantified formulas.
For problems involving free function and predicate symbols, CVC4 primarily
uses heuristic approaches based on E-matching for theorems, and finite model
finding approaches for non-theorems.
For problems in pure arithmetic, CVC4 uses techniques for counterexample-guided
quantifier instantiation
[RD+15].
Like other SMT solvers, CVC4 treats quantified formulas using a two-tiered
approach.
First, quantified formulas are replaced by fresh Boolean predicates and the
ground theory solver(s) are used in conjunction with the underlying SAT solver
to determine satisfiability.
If the problem is unsatisfiable at the ground level, then the solver answers
"unsatisfiable".
Otherwise, the quantifier instantiation module is invoked, and will either
add instances of quantified formulas to the problem, answer "satisfiable", or
return "unknown".
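The two tiers can be sketched as a loop; `ground_solver` and `instantiate`
below are hypothetical stand-ins for CVC4's ground theory solvers and its
quantifier instantiation module, not its actual API:

```python
def two_tier_solve(ground_solver, instantiate, max_rounds=100):
    """Schematic two-tiered handling of quantified formulas (sketch).

    ground_solver() returns "unsat" or a candidate ground model;
    instantiate(model) returns new ground instances (empty if none apply).
    """
    for _ in range(max_rounds):
        result = ground_solver()          # check the ground abstraction
        if result == "unsat":
            return "unsatisfiable"
        new_instances = instantiate(result)
        if not new_instances:
            return "satisfiable"          # no instance conflicts with the model
        # New ground instances strengthen the problem; check again.
    return "unknown"

# Toy run: one instance of (forall X. p(X)) refutes the candidate model.
state = {"clauses": [{"~p(b)"}], "done": False}

def toy_ground_solver():
    units = {l for c in state["clauses"] if len(c) == 1 for l in c}
    return "unsat" if {"p(b)", "~p(b)"} <= units else units

def toy_instantiate(model):
    if state["done"]:
        return []
    state["done"] = True
    state["clauses"].append({"p(b)"})     # instance p(b) of forall X. p(X)
    return [{"p(b)"}]

answer = two_tier_solve(toy_ground_solver, toy_instantiate)
```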
Finite model finding in CVC4 targets problems containing background theories
whose quantification is limited to finite and uninterpreted sorts.
In finite model finding mode, CVC4 uses a ground theory of finite cardinality
constraints that minimizes the number of ground equivalence classes, as
described in
[RT+13].
When the problem is satisfiable at the ground level, a candidate model is
constructed that contains complete interpretations for all predicate and
function symbols.
It then adds instances of quantified formulas that are in conflict with the
candidate model, as described in
[RT+13].
If no instances are added, it reports "satisfiable".
Strategies
For handling theorems, CVC4 primarily uses conflict-based quantifier
instantiation
[RTd14,BFR17] and E-matching.
CVC4 uses a handful of orthogonal trigger selection strategies for E-matching.
For handling non-theorems, CVC4 primarily uses finite model finding techniques.
Since CVC4 with finite model finding is also capable of establishing
unsatisfiability, it is used as a strategy for theorems as well.
For problems in pure arithmetic, CVC4 uses variations of counterexample-guided
quantifier instantiation
[RD+15], which select relevant quantifier instantiations based on models for
counterexamples to quantified formulas.
CVC4 relies on this method both for theorems in TFA and non-theorems in TFN.
At the quantifier-free level, CVC4 uses standard decision procedures for
linear arithmetic and uninterpreted functions, as well as heuristic approaches
for handling non-linear arithmetic
[RT+17].
Implementation
CVC4 is implemented in C++. The code is available from:
https://github.com/CVC4
Expected Competition Performance
The first-order theorem proving and finite model finding capabilities of
CVC4 have not changed much in the past year.
Its support for non-linear arithmetic has been improved.
It is expected that CVC4 will perform slightly better than last year.
E 2.2pre
Stephan Schulz
DHBW Stuttgart, Germany
Architecture
E 2.2pre
[Sch02,Sch13] is a purely equational theorem prover for many-sorted
first-order logic with equality.
It consists of an (optional) clausifier for pre-processing full first-order
formulae into clausal form, and a saturation algorithm implementing an instance
of the superposition calculus with negative literal selection and a
number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause algorithm,
i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality resolution.
However, as of E 2.1, PicoSAT
[Bie08]
can be used to periodically check the (on-the-fly grounded) proof state for
propositional unsatisfiability.
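A minimal ground version of such a given-clause loop; the clause
representation, `select` heuristic, and `resolvents` rule below are didactic
simplifications, not E's actual code:

```python
def neg(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(given, active):
    """All ground binary resolvents of `given` against the active set."""
    out = set()
    for other in active:
        for lit in given:
            if neg(lit) in other:
                out.add(frozenset((given - {lit}) | (other - {neg(lit)})))
    return out

def select(passive):
    """Crude clause-selection heuristic: smallest clause first."""
    return min(passive, key=lambda c: (len(c), sorted(c)))

def discount_loop(initial, max_iters=10000):
    """DISCOUNT-style given-clause loop: strict active/passive separation."""
    active, passive = set(), set(initial)
    for _ in range(max_iters):
        if not passive:
            return "saturated"            # no work left: clause set consistent
        given = select(passive)
        passive.discard(given)
        if not given:
            return "unsat"                # empty clause derived
        active.add(given)
        # Consequences of the given clause with the active set become passive.
        passive |= resolvents(given, active) - active
    return "timeout"
```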
For the LTB division, a control program uses a SInE-like analysis to extract
reduced axiomatizations that are handed to several instances of E.
E will probably not use on-the-fly learning this year.
Strategies
Proof search in E is primarily controlled by a literal selection strategy, a
clause selection heuristic, and a simplification ordering.
The prover supports a large number of pre-programmed literal selection
strategies.
Clause selection heuristics can be constructed on the fly by combining various
parameterized primitive evaluation functions, or can be selected from a set
of predefined heuristics.
Clause evaluation heuristics are based on symbol-counting, but also take other
clause properties into account.
In particular, the search can prefer clauses from the set of support, or
containing many symbols also present in the goal.
Supported term orderings are several parameterized instances of
Knuth-Bendix-Ordering (KBO) and Lexicographic Path Ordering (LPO).
For CASC-J9, E implements a strategy-scheduling automatic mode.
The total CPU time available is broken into several (unequal) time slices.
For each time slice, the problem is classified into one of several classes,
based on a number of simple features (number of clauses, maximal symbol arity,
presence of equality, presence of non-unit and non-Horn clauses,...).
For each class, a schedule of strategies is greedily constructed from
experimental data as follows:
The first strategy assigned to a schedule is the one that solves
the most problems from this class in the first time slice.
Each subsequent strategy is selected based on the number of solutions on
problems not already solved by a preceding strategy.
About 220 different strategies have been evaluated on all untyped
first-order problems from TPTP 6.4.0.
About 90 of these strategies are used in the automatic mode, and about 210
are used in at least one schedule.
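The greedy construction can be sketched as follows; the data and interface
are illustrative, not E's internals:

```python
def greedy_schedule(results, slots):
    """Greedily build a strategy schedule from evaluation data (sketch).

    results: strategy name -> set of problems it solves in one time slice.
    Each slot gets the strategy solving the most problems that no earlier
    strategy in the schedule has already solved.
    """
    schedule, covered = [], set()
    for _ in range(slots):
        best = max(results, key=lambda s: len(results[s] - covered))
        if not results[best] - covered:
            break                       # no strategy contributes anything new
        schedule.append(best)
        covered |= results[best]
    return schedule

# s1 solves the most problems; s3 then adds more new solutions than s2.
data = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {4, 5}}
```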
Implementation
E is built around perfectly shared terms, i.e., each distinct term is
represented only once in a term bank.
The whole set of terms thus consists of a number of interconnected directed
acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a
rewrite link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
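Perfect sharing can be illustrated with a tiny hash-consed term bank; this is
a sketch of the idea, not E's C data structures:

```python
class TermBank:
    """Perfectly shared terms via hash-consing (illustrative sketch).

    Each distinct term is stored once, so structural equality collapses
    to identity, and per-term annotations such as rewrite links can be
    shared by every occurrence of the term.
    """
    def __init__(self):
        self._table = {}

    def term(self, functor, *args):
        key = (functor, args)          # args are already shared terms
        if key not in self._table:
            self._table[key] = key     # a real prover stores a term cell here
        return self._table[key]

bank = TermBank()
# f(g(a), g(a)): both argument occurrences are one shared object.
a = bank.term("a")
t1 = bank.term("g", a)
t2 = bank.term("g", bank.term("a"))
shared = bank.term("f", t1, t2)
```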
Subsumption and contextual literal cutting (also known as subsumption
resolution) are supported using feature vector indexing
[Sch13].
Superposition and backward rewriting use fingerprint indexing
[Sch12],
a new technique combining ideas from feature vector indexing and path indexing.
Finally, LPO and KBO are implemented using the elegant and efficient algorithms
developed by Bernd Löchner in
[Loe06].
The prover and additional information are available at
http://www.eprover.org
Expected Competition Performance
E 2.2pre has only minor changes compared to last year's pre-release.
The major change is the integration of PicoSAT, however, very few PicoSAT
strategies have been evaluated.
As a result, we expect performance to be similar to last year's, with perhaps
the most improvement in the EPR division.
The system is expected to perform well in most proof classes, but will at
best complement top systems in the disproof classes.
Geo-III 2018C
Hans de Nivelle
Nazarbayev University, Kazakhstan
Architecture
Geo III is a theorem prover for Partial Classical Logic
[deN11],
based on reduction to Kleene Logic
[deN17].
Currently, only Chapters 4 and 5 are implemented.
Since Kleene logic generalizes 2-valued logic, Geo III can take part in CASC.
Apart from being 3-valued, the main differences from earlier
versions of Geo are the following:
- The Geo family of provers uses exhaustive backtracking, in combination
with learning after failure.
Earlier versions (before 2016) learned only conflict formulas.
Geo III learns disjunctions of arbitrary width.
Experiments show that this often results in shorter proofs.
- If Geo is ever embedded in proof assistants, these assistants
will require proofs.
In order to be able to provide these at the required level of detail,
Geo III contains a hierarchy of proof rules that is independent of the
rest of the system, and that can be modified independently.
- In order to make the main algorithm more flexible, recursive
backtracking has been replaced by an explicit stack.
Using a stack makes it possible to implement non-chronological
backtracking, to remove unused assumptions, and to rearrange the order
of assumptions.
Restarts are also easier to implement with a stack.
- Matching a geometric formula into a candidate model is a critical
operation in Geo.
Compared to previous versions, the matching algorithm has been improved
theoretically, reimplemented, and is no longer a bottleneck.
As for future plans, we want to add backward simplification to the main
algorithm.
This involves matching between geometric formulas, which was not possible
before because we had no usable matching algorithm.
We also want to reimplement proof logging, and to implement full PCL.
Strategies
Geo uses breadth-first, exhaustive model search, combined with learning.
In case of branching, branches are explored in pseudo-random order.
Specifically for CASC, a restart strategy was added, which ensures that proof
search is always restarted after 4 minutes.
This was done because Geo III has no indexing.
After some time, proof search becomes so inefficient that it makes no sense
to continue, and it is better to restart.
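The restart policy amounts to a simple wrapper around the search; `attempt`
below is a hypothetical stand-in for one Geo search run:

```python
import time

def search_with_restarts(attempt, budget, restart_after=240.0):
    """Restart proof search from scratch every `restart_after` seconds.

    attempt(deadline) runs one (pseudo-randomised) search until it finds
    a result or the deadline passes (returning None); without indexing
    the search degrades over time, so restarting from scratch pays off.
    """
    end = time.monotonic() + budget
    while time.monotonic() < end:
        deadline = min(end, time.monotonic() + restart_after)
        result = attempt(deadline)
        if result is not None:
            return result               # proof or countermodel found
    return None                         # total budget exhausted
```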
Implementation
Geo III is written in C++14.
No features outside of the standard are used.
It has been tested with g++ (version 4.8.4) and with clang.
The main difference with Geo 2016C is that version 2018C uses a new matching
algorithm, which on average performs 100 to 1000 times better than the
previous one.
Geo-III is available at:
https://cs-sst.github.io/faculty/nivelle/implementation/index
Expected Competition Performance
We are slowly closing the gaps in Geo.
We expect Geo 2018C to be better than 2016C, but the way to the top is long.
Acknowledgement
Development of Geo 2018C was supported by the Polish National Science Center
(NCN) through grant number DEC-2015/17/B/ST6/01898 (Applications for Logic
with Partial Functions).
Grackle 0.1
Jan Jakubuv
Czech Technical University in Prague, Czech Republic
Architecture
Grackle is a bird species found in large numbers throughout much of North
America.
Different subspecies of the grackle family have evolved different bill
lengths, with the effect that different subspecies feed on different food
sources and do not compete with each other.
This motivates the Grackle system.
Grackle 0.1 is a generalization of BliStrTune
[Urb13,JU17,Ju18], a system for the invention of complementary E prover
strategies, based on the ParamILS system
[HH+09].
BliStrTune was previously extended to invent Vampire strategies
[JSU17], but this is not used here.
Grackle is the next step in this direction of generalization: it is able to
develop complementary strategies for an arbitrary parametrized algorithm,
not only E or Vampire.
At CASC-J9, however, Grackle is used only to develop E strategies, and the
main difference from BliStrTune is the separate invention of SinE parameters
for E.
Strategies
The basic strategy is to divide the time limit given for the LTB category
into two halves.
In the first half, Grackle is launched to invent a set of well-performing
complementary E strategies on the provided training examples.
Provided solutions of the training examples are not used.
In the second half of the time limit, the invented strategies are evaluated
using E prover.
Hence the output solutions are in E prover output format (TPTP).
The evaluation again has two stages: first with a small CPU limit (e.g.,
10 seconds per strategy and problem), then the unsolved problems are
evaluated with a longer limit (e.g., 60 seconds).
Implementation
The metasystem is implemented using the ATPy Python library, available at:
https://github.com/ai4reason/atpy
The Grackle system is part of this library.
Grackle itself uses ParamILS to improve an existing strategy.
Grackle requires a set of reasonably performing E prover strategies to start
with.
These are extracted from E's auto mode and from previous Grackle/BliStrTune
runs on a subset of the TPTP library.
The code of ATPy and Grackle is released under GPL2.
Expected Competition Performance
Grackle will compete only in the LTB category.
First solutions are to be expected after the training phase, that is, after
about half of the overall time limit.
As the problems in the LTB category are expected to be large, Grackle would
normally require several days (maybe weeks) for successful training.
This is because separate premise selection is still missing in Grackle, and
only the SinE algorithm implemented natively in E is used.
Hence the expectations are humble, and Grackle can only surprise.
iProver 2.6
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on the instantiation calculus
Inst-Gen
[GK03,Kor13], which is complete for first-order logic.
iProver combines first-order reasoning with ground reasoning for which it uses
MiniSat
[ES04] and optionally PicoSAT
[Bie08] (only MiniSat will be used at this CASC).
iProver also combines instantiation with ordered resolution; see
[Kor08,Kor13] for the implementation details.
The proof search is implemented using a saturation process based on the given
clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints
[GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
A compressed feature vector index is used for efficient forward/backward
subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality.
Recent changes in iProver include improved preprocessing, incremental finite
model finding, support for the TFF format restricted to clauses, and support
for the AIG format for hardware verification and QBF reasoning.
In the LTB and SLH divisions, iProver combines an abstraction-refinement
framework
[HK17] with axiom selection based on the SinE algorithm
[HV11] as implemented in Vampire
[KV13], i.e., axiom selection is done by Vampire and proof attempts are done by
iProver.
Some of iProver's features are summarised below.
- proof extraction for both instantiation and resolution
[KS12],
- model representation, using first-order definitions in term algebra
[KS12],
- answer substitutions,
- semantic filtering,
- incremental finite model finding,
- sort inference, monotonic
[CLS11] and non-cyclic
[Kor13] sorts,
- support for the TFF format restricted to clauses,
- predicate elimination
[KK16].
Sort inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminate irrelevant clauses.
Proof extraction is challenging due to simplifications such as global
subsumption, which involve global reasoning with the whole clause set and
can be computationally expensive.
Strategies
iProver has around 60 options to control the proof search including options
for literal selection, passive clause selection, frequency of calling the
SAT solver, simplifications and options for combination of instantiation with
resolution.
At CASC, iProver will execute a small number of fixed schedules of selected
options, depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
For the LTB, SLH and FNT divisions several strategies are run in parallel.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat
[ES04].
iProver accepts FOF, TFF and CNF formats.
Vampire
[KV13,HK+12] and E prover
[Sch13] are used for proof-producing clausification of FOF/TFF problems;
Vampire is also used for axiom selection [HV11] in the LTB/SLH divisions.
iProver is available at:
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 2.6 is the CASC-26 EPR division winner.
iProver 2.8
Konstantin Korovin
University of Manchester, United Kingdom
Description Missing
leanCoP 2.2
Jens Otten
University of Oslo, Norway
Architecture
leanCoP
[OB03,
Ott08]
is an automated theorem prover for classical first-order logic with equality.
It is a very compact implementation of the connection (tableau) calculus
[Bib87,
LS01].
Strategies
The reduction rule of the connection calculus is applied before the
extension rule.
Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is performed in order to achieve
completeness.
Additional inference rules and techniques include regularity, lemmata,
and restricted backtracking
[Ott10].
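These rules can be illustrated by a tiny ground (propositional) connection
prover; the representation and driver below are a didactic sketch, not
leanCoP's Prolog code:

```python
def connection_prove(matrix, max_depth=10):
    """Tiny ground connection-calculus prover (illustrative sketch).

    matrix: list of clauses (lists of literals like "p" / "~p") whose
    conjunction we try to refute.  A literal on an open branch is closed
    by the reduction rule (a complementary literal on the active path),
    tried first, or by the extension rule (connecting into another
    clause), with iterative deepening on the path length.
    """
    def neg(l):
        return l[1:] if l.startswith("~") else "~" + l

    def solve(clause, path, depth):
        if not clause:
            return True
        lit, rest = clause[0], clause[1:]
        # Reduction rule: the complement of lit already lies on the path.
        if neg(lit) in path:
            return solve(rest, path, depth)
        # Extension rule: connect lit to a complementary literal in
        # another clause and prove that clause's remaining literals.
        if depth > 0:
            for c in matrix:
                if neg(lit) in c:
                    remainder = [l for l in c if l != neg(lit)]
                    if solve(remainder, path | {lit}, depth - 1) and \
                       solve(rest, path, depth):
                        return True
        return False

    return any(solve(list(start), frozenset(), d)
               for d in range(max_depth + 1) for start in matrix)
```

The sketch omits regularity, lemmata, and restricted backtracking, which the
real system uses on top of this basic search.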
leanCoP uses an optimized structure-preserving transformation
into clausal form
[Ott10]
and a fixed strategy schedule that is controlled by a shell script.
Implementation
leanCoP is implemented in Prolog.
The source code of the core prover consists only of a few lines of code.
Prolog's built-in indexing mechanism is used to quickly find connections
when the extension rule is applied.
leanCoP can read formulae in leanCoP syntax and in
TPTP first-order syntax. Equality axioms and axioms to support
distinct objects are automatically added if required.
The leanCoP core prover returns a very compact connection proof,
which is then translated into a more comprehensive output format,
e.g., into a lean (TPTP-style) connection proof or into a
readable text proof.
The source code of leanCoP 2.2 is available under the GNU general
public license.
It can be downloaded from the leanCoP website at:
http://www.leancop.de
The website also contains information about ileanCoP
[Ott08]
and MleanCoP
[Ott12,
Ott14],
two versions of leanCoP for first-order intuitionistic logic and first-order
modal logic, respectively.
Expected Competition Performance
As the prover has not changed, the performance of leanCoP 2.2
is expected to be the same as last year.
LEO-II 1.7.0
Alexander Steen
Freie Universität Berlin, Germany
Architecture
LEO-II
[BP+08],
the successor of LEO
[BK98],
is a higher-order ATP system based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution
[Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E
[Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II uses some basic strategy scheduling to try different
search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in OCaml 4, and its problem representation language
is the TPTP THF language
[BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure
[SB10].
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately, the LEO-II system still uses only a very simple sequential
collaboration model with first-order ATPs, instead of the more advanced,
concurrent and resource-adaptive OANTS architecture
[BS+08]
exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
http://www.leoprover.org
Expected Competition Performance
LEO-II is not actively being developed anymore, hence there are no
expected improvements over last year's CASC results.
Leo-III 1.3
Alexander Steen
Freie Universität Berlin, Germany
Architecture
Leo-III
[SB18], the successor of LEO-II
[BP+08], is a higher-order ATP system based on extensional higher-order
paramodulation with inference restrictions using a higher-order term ordering.
The calculus is augmented with dedicated extensionality rules and equational
simplification routines that have their intellectual roots in first-order
superposition-based theorem proving.
Although Leo-III was originally designed as an agent-based reasoning system,
its current version uses only a single, sequential saturation algorithm.
The saturation algorithm itself is a variant of the given clause loop
procedure.
Leo-III heavily relies on cooperation with external (first-order) ATPs that
are called asynchronously during proof search.
At the moment, first-order cooperation focuses on typed first-order (TFF)
systems, where CVC4
[BC+11] and E
[Sch02,Sch13] are used as default external systems.
Nevertheless, cooperation is not limited to first-order systems.
Further TPTP/TSTP-compliant external systems (such as higher-order ATPs or
counter model generators) may be included using simple command-line arguments.
If the saturation procedure loop (or one of the external provers) finds a
proof, the system stops, generates the proof certificate and returns the
result.
Strategies
Leo-III comes with pre-defined search strategies that can be chosen manually
by the user on start-up.
However, Leo-III currently supports only very naive automatic strategy
scheduling, which is disabled by default, as its effectiveness has not yet
been well examined.
Strategies will primarily be addressed in upcoming versions.
Implementation
Leo-III utilizes and instantiates the associated LeoPARD system
platform
[WSB15] for higher-order (HO) deduction systems, implemented in Scala
(currently using Scala 2.12 and running on a JVM with Java 8).
The prover makes use of LeoPARD's sophisticated data structures and implements
its own reasoning logic on top.
A generic parser is provided that supports all TPTP syntax dialects.
It is implemented using ANTLR4 and converts its produced concrete syntax tree
to an internal TPTP AST data structure which is then transformed into
polymorphically typed lambda terms.
As of version 1.1, Leo-III supports all common TPTP dialects (CNF, FOF,
TFF, THF) as well as their polymorphic variants
[BP13,KRS16].
The term data structure of Leo-III uses a polymorphically typed spine term
representation augmented with explicit substitutions and De Bruijn-indices.
Furthermore, terms are perfectly shared during proof search, permitting
constant-time equality checks between alpha-equivalent terms.
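Perfect sharing with constant-time equality checks can be illustrated by a
small hash-consing sketch. This is not Leo-III's actual Scala data
structure, just the underlying idea: structurally equal De Bruijn terms
are interned to the same object, so alpha-equivalence becomes a pointer
comparison.

```python
class Term:
    """Hash-consed term: structurally equal terms are the same object,
    so alpha-equivalence (via De Bruijn indices) is a pointer check."""
    _pool = {}

    def __new__(cls, tag, *args):
        key = (tag, args)
        if key not in cls._pool:
            obj = super().__new__(cls)
            obj.tag, obj.args = tag, args
            cls._pool[key] = obj
        return cls._pool[key]

def var(i):    return Term("var", i)    # De Bruijn index
def app(f, x): return Term("app", f, x)
def lam(body): return Term("lam", body)

# \x. x and \y. y are alpha-equivalent, hence the same De Bruijn term,
# hence the very same shared object.
t1 = lam(var(0))
t2 = lam(var(0))
print(t1 is t2)   # True: constant-time equality
```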
Leo-III's saturation procedure may at any point invoke external reasoning
tools.
To that end, Leo-III includes an encoding module which translates (polymorphic)
higher-order clauses to polymorphic and monomorphic typed first-order clauses,
whichever is supported by the external system.
While LEO-II relied on cooperation with untyped first-order provers, Leo-III
exploits the native type support of first-order provers (TFF logic), which
removes clutter during translation and, in turn, makes external cooperation
more effective.
Leo-III is available on GitHub:
https://github.com/leoprover/Leo-III
Expected Competition Performance
Leo-III 1.1 introduced novel cooperation schemes with typed first-order
provers that significantly increased the reasoning capabilities of Leo-III.
In version 1.3, some minor bug fixes and parameter tweaks were conducted which
should improve Leo-III, at least to some extent, compared to last year's
performance.
Some hard problems may be solved by Leo-III's function synthesis capabilities.
MaLARea 0.6
Josef Urban
Czech Technical University in Prague, Czech Republic
Architecture
MaLARea 0.6
[Urb07,US+08,KUV15]
is a metasystem for ATP in large theories where symbol and formula names are
used consistently.
It uses several deductive systems (now E, SPASS, Vampire, Paradox, Mace), as
well as complementary AI techniques like machine learning (the SNoW system)
based on symbol-based similarity, model-based similarity, term-based
similarity, and, of course, previous successful proofs.
The version for CASC-J9 will use the E prover with the BliStr(Tune)
[Urb13,JU17]
large-theory strategies, possibly also Prover9, Mace and Paradox.
The premise selection methods will likely also use the distance-weighted
k-nearest neighbor
[KU13]
and E's implementation of SInE.
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use
the most relevant axioms for next ATP attempts.
This is iterated, using different time limits and axiom limits.
Various features are used for learning, and the learning is complemented by
other criteria like model-based reasoning, symbol and term-based similarity,
etc.
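The distance-weighted k-NN idea behind the premise selection can be
sketched roughly as follows. The feature sets and axiom names below are
invented for illustration; the real system uses much richer symbol, term,
and model-based features.

```python
from collections import Counter

def knn_premises(conjecture_feats, solved, k=3, wanted=2):
    """Distance-weighted k-NN premise selection (a sketch of the idea
    in [KU13]): rank axioms by how often they occur in proofs of the k
    nearest solved conjectures, weighted by feature similarity."""
    def dist(a, b):
        # Jaccard distance on feature sets
        return 1 - len(a & b) / len(a | b)
    neighbours = sorted(solved,
                        key=lambda s: dist(conjecture_feats, s["feats"]))[:k]
    scores = Counter()
    for n in neighbours:
        w = 1 / (dist(conjecture_feats, n["feats"]) + 1e-6)
        for axiom in n["proof_axioms"]:
            scores[axiom] += w
    return [a for a, _ in scores.most_common(wanted)]

solved = [
    {"feats": {"plus", "zero"}, "proof_axioms": ["add_0", "add_comm"]},
    {"feats": {"mul", "one"},   "proof_axioms": ["mul_1"]},
]
# The first solved problem is nearer, so its axioms rank highest.
print(knn_premises({"plus", "zero", "succ"}, solved))
```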
Implementation
The metasystem is implemented in ca. 2500 lines of Perl.
It uses many external programs - the above mentioned ATPs and machine learner,
TPTP utilities, LADR utilities for work with models, and some standard
Unix tools.
MaLARea is available at:
https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2.
Expected Competition Performance
Thanks to machine learning, MaLARea is strongest on batches of many
related problems with many redundant axioms where some of the problems
are easy to solve and can be used for learning the axiom relevance.
MaLARea is not very good when all problems are too difficult (nothing to
learn from), or the problems (are few and) have nothing in common.
Some of its techniques (selection by symbol and term-based similarity,
model-based reasoning) could, however, still make it slightly stronger
than standard ATPs in such cases.
MaLARea has a very good performance on the MPTP Challenge, which is a
predecessor of the LTB division, and on several previous LTB competitions.
nanoCoP 1.1
Jens Otten
University of Oslo, Norway
Architecture
nanoCoP
[Ott16,Ott17]
is an automated theorem prover for classical first-order logic with equality.
It is a very compact implementation of the non-clausal connection calculus
[Ott11].
Strategies
An additional decomposition rule is added to the clausal connection
calculus and the extension rule is generalized to non-clausal formulae.
Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is performed in order to achieve
completeness.
Additional inference rules and techniques include regularity, lemmata,
restricted backtracking, and a fixed strategy schedule that is
controlled by a shell script
[Ott18].
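The iterative deepening on the proof depth can be sketched as follows;
`search` is a hypothetical stand-in for the depth-bounded connection
search, and the toy formula encoding is invented for illustration.

```python
def prove(formula, max_depth=10):
    """Iterative deepening: rerun the depth-bounded, depth-first search
    with a growing bound, which makes the overall search complete."""
    for depth in range(1, max_depth + 1):
        proof = search(formula, depth)
        if proof is not None:
            return depth, proof
    return None

def search(formula, depth):
    # Toy stand-in: "provable_at" marks the bound the proof needs.
    return "proof" if depth >= formula["provable_at"] else None

print(prove({"provable_at": 4}))
```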
Implementation
nanoCoP is implemented in Prolog.
The source code of the core prover consists only of a few lines of code.
Prolog's built-in indexing mechanism is used to quickly find connections
when the extension rule is applied.
nanoCoP can read formulae in leanCoP/nanoCoP syntax and in TPTP first-order
syntax. Equality axioms are automatically added if required.
The nanoCoP core prover returns a compact non-clausal connection proof.
The source code of nanoCoP 1.1 is available under the GNU General
Public License.
It can be downloaded from the nanoCoP website at:
http://www.leancop.de/nanocop
The provers nanoCoP-i and nanoCoP-M are versions of nanoCoP for first-order
intuitionistic logic and first-order modal logic, respectively.
They are based on an adapted non-clausal connection calculus for non-classical
logics
[Ott17].
Expected Competition Performance
nanoCoP is expected to have a better performance than leanCoP on formulae that
have a nested (non-clausal) structure.
Princess 170717
Philipp Rümmer
Uppsala University, Sweden
Architecture
Princess
[Rue08,Rue12] is a theorem prover for first-order logic
modulo linear integer arithmetic.
The prover uses a combination of techniques from the areas of first-order
reasoning and SMT solving.
The main underlying calculus is a free-variable tableau calculus,
which is extended with constraints to enable backtracking-free proof
expansion, and positive unit hyper-resolution for lightweight
instantiation of quantified formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for a
number of further fragments.
In addition, some built-in procedures for nonlinear integer arithmetic are
available.
The internal calculus of Princess only supports uninterpreted predicates;
uninterpreted functions are encoded as predicates, together with the usual
axioms.
Through appropriate translation of quantified formulae with functions, the
e-matching technique common in SMT solvers can be simulated; triggers in
quantified formulae are chosen based on heuristics similar to those in the
Simplify prover.
Strategies
For CASC, Princess will run a fixed schedule of configurations for each
problem (portfolio method).
Configurations determine, among others, the mode of proof expansion
(depth-first, breadth-first), selection of triggers in quantified formulae,
clausification, and the handling of functions.
The portfolio was chosen based on training with a random sample of problems
from the TPTP library.
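The fixed-schedule portfolio can be sketched as follows. The configuration
names echo the options mentioned above, but the time slices and the
`fake_solve` entry point are invented stand-ins, not Princess's actual API.

```python
def run_portfolio(problem, schedule, solve):
    """Fixed-schedule portfolio: try each configuration for its time
    slice; stop at the first one that proves the problem."""
    for config, slice_secs in schedule:
        result = solve(problem, config, timeout=slice_secs)
        if result == "Theorem":
            return config, result
    return None, "Unknown"

def fake_solve(problem, config, timeout):
    # Toy stand-in for the prover: succeeds iff this configuration
    # happens to suit the problem.
    return "Theorem" if config in problem["solved_by"] else "Unknown"

schedule = [("depth-first", 5), ("breadth-first", 10), ("no-clausify", 15)]
print(run_portfolio({"solved_by": {"breadth-first"}}, schedule, fake_solve))
```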
Implementation
Princess is entirely written in Scala and runs on any recent Java virtual
machine; besides the standard Scala and Java libraries, only the Cup parser
library is used.
Princess is available from:
http://www.philipp.ruemmer.org/princess.shtml
Expected Competition Performance
Princess should perform roughly as in the last years.
Compared to last year, the support for outputting proofs was extended, and
should now cover all relevant theory modules for CASC (but not yet all proof
strategies).
Prover9 1109a
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3
[McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
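The given-clause loop with age-weight selection can be sketched as
follows. The propositional resolution stand-in and the clause encoding
are toys for illustration only; Prover9's real loop works on first-order
clauses with the full inference rules listed above.

```python
def given_clause_loop(initial, infer, is_empty, age_weight=(1, 4),
                      max_steps=1000):
    """Skeleton of the given-clause ('Otter') loop: each step moves one
    clause from the passive set to the usable set and performs all
    inferences between it and the usable clauses. Out of every a+w
    picks, a are by age (oldest first) and w by smallest weight."""
    usable, passive = [], list(initial)
    a, w = age_weight
    for step in range(max_steps):
        if not passive:
            return None            # search ran dry (no completeness claim)
        if step % (a + w) < a:
            given = min(passive, key=lambda c: c["age"])     # oldest
        else:
            given = min(passive, key=lambda c: c["weight"])  # lightest
        passive.remove(given)
        usable.append(given)
        for new in infer(given, usable):
            if is_empty(new):
                return "proof found"
            passive.append(new)
    return None

def resolve(c1, c2):
    """Toy propositional resolution on integer literals."""
    out = []
    for lit in c1["lits"]:
        if -lit in c2["lits"]:
            lits = (c1["lits"] | c2["lits"]) - {lit, -lit}
            out.append({"lits": lits, "age": c1["age"] + 1,
                        "weight": len(lits)})
    return out

def infer(given, usable):
    new = []
    for c in usable:
        new += resolve(given, c)
    return new

clauses = [{"lits": frozenset({1}),  "age": 0, "weight": 1},
           {"lits": frozenset({-1}), "age": 1, "weight": 1}]
print(given_clause_loop(clauses, infer, lambda c: not c["lits"]))
```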
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, proof of transformation to clauses
is not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Strategies
Prover9 has available many strategies; the following
statements apply to CASC.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
For CASC Prover9 uses KBO to order terms for demodulation and for the
inference rules, with a simple rule for determining symbol precedence.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP
[McC97].
(LADR has some AC functions, but Prover9 does not use them.)
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing
[Sch04]
is used for forward and backward nonunit subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Prover9 is the CASC fixed point, against which progress can be judged.
Each year it is expected to do worse than the previous year, relative to
the other systems.
Satallax 3.2
Michael Färber
Universität Innsbruck, Austria
Architecture
Satallax 3.2
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
Proof search:
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat to test the current set of propositional
clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Optionally, Satallax generates first-order formulae in addition to the
propositional clauses.
If this option is used, then Satallax periodically calls the first-order
theorem prover E
[Sch13] to test for first-order unsatisfiability.
If the set of first-order formulae is unsatisfiable, then the original branch
is unsatisfiable.
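The loop of growing a propositional abstraction and re-testing it can be
sketched as follows. A brute-force satisfiability check stands in for
MiniSat, and the rule instances are hand-picked for illustration; they are
not Satallax's actual tableau-rule encoding.

```python
from itertools import product

def sat(clauses, nvars):
    """Brute-force stand-in for MiniSat: clauses are lists of signed
    1-based variable indices."""
    for assign in product([False, True], repeat=nvars):
        def holds(lit):
            val = assign[abs(lit) - 1]
            return val if lit > 0 else not val
        if all(any(holds(l) for l in c) for c in clauses):
            return True
    return False

def tableau_sat_loop(rule_instances, nvars):
    """Sketch of the core loop: progressively add propositional
    abstractions of tableau-rule instances and re-test satisfiability;
    an unsatisfiable abstraction refutes the original branch."""
    clauses = []
    for inst in rule_instances:
        clauses.append(inst)
        if not sat(clauses, nvars):
            return "Unsatisfiable"   # the original branch is refuted
    return "Unknown"   # abstraction still satisfiable: keep generating

# Variable 1 is p, variable 2 is q; instances encode p, p -> q, ~q.
print(tableau_sat_loop([[1], [-1, 2], [-2]], 2))
```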
Upon request, Satallax attempts to reconstruct a proof which can be output
in the TSTP format.
The proof reconstruction has been significantly changed since Satallax 3.0
in order to make proof reconstruction more efficient and thus less likely to
fail within the time constraints.
Strategies
There are about 150 flags that control the order in which formulae and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Before deciding on the schedule to use, Satallax parses the problem and
determines if it is big enough that a SInE-based premise selection algorithm
[HV11] should be used.
If SInE is not activated, then Satallax 3.2 uses a strategy schedule
consisting of 37 modes.
Each mode is tried for time limits ranging from less than a second to just
over 1 minute.
If SInE is activated, then Satallax is run with a SInE-specific schedule
consisting of 20 possible SInE parameter values selecting different premises
and some corresponding modes and time limits.
Implementation
Satallax is implemented in OCaml, making use of the external tools MiniSat
(via a foreign function interface) and E.
Satallax is available at:
http://satallaxprover.com
Expected Competition Performance
Satallax 3.2 is the CASC-26 THF division winner.
Satallax 3.3
Michael Färber
Universität Innsbruck, Austria
Architecture
Satallax 3.3
[Bro12]
is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat
[ES04]
is responsible for much of the proof search.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic
[BS10]
with a choice operator
[BB11].
Problems are given in the THF format.
Proof search:
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses
[Bro13].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat to test the current set of propositional
clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
Optionally, Satallax generates first-order formulae in addition to the
propositional clauses.
If this option is used, then Satallax periodically calls the first-order
theorem prover E
[Sch13] to test for first-order unsatisfiability.
If the set of first-order formulae is unsatisfiable, then the original branch
is unsatisfiable.
Upon request, Satallax attempts to reconstruct a proof which can be output
in the TSTP format.
Strategies
There are about 150 flags that control the order in which formulae and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure
(such as whether or not to call the theorem prover E).
A collection of flag settings is called a mode.
Approximately 500 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Before deciding on the schedule to use, Satallax parses the problem and
determines if it is big enough that a SInE-based premise selection algorithm
[HV11] should be used.
If SInE is not activated, then Satallax 3.3 uses a strategy schedule
consisting of 48 modes (16 of which make use of Mizar style soft types).
Each mode is tried for time limits ranging from less than a second to just
over 1 minute.
If SInE is activated, then Satallax is run with a SInE-specific schedule
consisting of 20 possible SInE parameter values selecting different premises
and some corresponding modes and time limits.
Implementation
Satallax is implemented in OCaml, making use of the external tools MiniSat
(via a foreign function interface) and E.
Satallax is available at:
http://satallaxprover.com
Expected Competition Performance
Satallax 3.3 adds support for Mizar style soft types.
As this technique has only minor impact on proof search performance, we
expect Satallax 3.3 to perform about as well as Satallax 3.2.
Twee 2.2
Nick Smallbone
Chalmers University of Technology, Sweden
Architecture
Twee 2.2 is an equational prover based on unfailing completion
[BDP89].
It features ground joinability testing
[MN90] and a connectedness test
[BD88], which together eliminate many redundant inferences in the presence of
unorientable equations.
Twee's implementation of ground joinability testing performs case splits on
the order of variables, in the style of
[MN90], and discharges individual cases by rewriting modulo a variable ordering.
It is able to pick only useful case splits and to case split on a subset of
the variables, which makes it efficient enough to be switched on
unconditionally.
Horn clauses are encoded as equations as described in
[CS18].
The CASC version of Twee "handles" non-Horn clauses by discarding them.
Strategies
Twee's strategy is simple and it does not tune its heuristics or strategy
based on the input problem.
The term ordering is always KBO; functions are ordered by number of
occurrences (commonly-occurring symbols are smaller) and always have weight 1.
The main loop is a DISCOUNT loop.
The active set contains rewrite rules and unorientable equations, which are
used for rewriting, and the passive set contains unprocessed critical pairs.
Twee often interreduces the active set, and occasionally simplifies the
passive set with respect to the active set.
Each critical pair is scored using a weighted sum of the weight of both of
its terms. Terms are treated as DAGs when computing weights, i.e., duplicate
subterms are only counted once per term.
The weights of critical pairs that correspond to Horn clauses are adjusted
by the heuristic described in
[CS18], section 5.
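The DAG-based weight used in critical-pair scoring can be illustrated as
follows; the nested-tuple term encoding is invented for the sketch, not
Twee's actual representation.

```python
def dag_weight(term, seen=None):
    """Weight of a term viewed as a DAG: each distinct subterm counts
    once. Terms are nested tuples like ('f', ('x',), ('x',))."""
    if seen is None:
        seen = set()
    if term in seen:
        return 0        # duplicate subterm: already counted
    seen.add(term)
    return 1 + sum(dag_weight(t, seen) for t in term[1:])

# f(g(x), g(x)): the duplicate subterm g(x) is counted only once,
# so the DAG weight is 3 rather than the tree weight 5.
t = ('f', ('g', ('x',)), ('g', ('x',)))
print(dag_weight(t))
```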
Implementation
Twee is written in Haskell.
Terms are represented as array-based flatterms for efficient unification and
matching.
Rewriting uses an imperfect discrimination tree.
The passive set is represented as a heap.
It achieves high space efficiency (12 bytes per critical pair) by storing
the parent rule numbers and overlap position instead of the full critical
pair and by grouping all critical pairs of each rule into one heap entry.
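The space saving can be made concrete with a packed-entry sketch: storing
a score, the two parent rule numbers, and the overlap position fits in 12
bytes. The field layout below is illustrative, not Twee's actual Haskell
encoding.

```python
import struct

# score (2 bytes), two parent rule numbers (4 bytes each), overlap
# position (2 bytes); "<" disables padding, giving exactly 12 bytes.
ENTRY = struct.Struct("<HIIH")

def pack_entry(score, rule1, rule2, pos):
    return ENTRY.pack(score, rule1, rule2, pos)

def unpack_entry(raw):
    return ENTRY.unpack(raw)

raw = pack_entry(17, 3, 8, 2)
print(ENTRY.size, unpack_entry(raw))
```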
Twee uses an LCF-style kernel: all rules in the active set come with a
certified proof object which traces back to the input axioms.
When a conjecture is proved, the proof object is transformed into a
human-readable proof.
Proof construction does not harm efficiency because the proof kernel is
invoked only when a new rule is accepted.
In particular, reasoning about the passive set does not invoke the kernel.
The translation from Horn clauses to equations is not yet certified.
Twee can be downloaded from:
http://nick8325.github.io/twee
Expected Competition Performance
Twee is an equational prover dressed up as a Horn clause prover, so its
performance on general first-order problems will be poor.
It will do well on Horn problems with a heavy equational component.
With a bit of luck, it may solve a few hard problems.
Vampire 4.0
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire 4.0 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus and a MACE-style finite model builder.
Splitting in resolution-based proof search is controlled by the AVATAR
architecture, which uses a SAT solver to make splitting decisions.
Both resolution and instantiation based proof search make use of global
subsumption.
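The basic step behind AVATAR, splitting a clause into variable-disjoint
components that a SAT solver can then assert independently, can be
sketched as follows; the literal encoding is invented for illustration.

```python
def components(clause):
    """Split a clause into variable-disjoint components: literals that
    share variables must stay together, ground literals form singleton
    components. Literals are (predicate, vars) pairs."""
    groups = []
    for lit in clause:
        vs = set(lit[1])
        merged = [g for g in groups if g["vars"] & vs] if vs else []
        if not merged:
            groups.append({"vars": vs, "lits": [lit]})
        else:
            base = merged[0]
            for g in merged[1:]:           # transitive sharing: merge groups
                base["vars"] |= g["vars"]
                base["lits"] += g["lits"]
                groups.remove(g)
            base["vars"] |= vs
            base["lits"].append(lit)
    return [g["lits"] for g in groups]

# p(X) | q(X) | r(Y) splits into {p(X), q(X)} and {r(Y)}; a SAT solver
# then decides which components to assert on the current branch.
clause = [("p", ("X",)), ("q", ("X",)), ("r", ("Y",))]
print(components(clause))
```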
A number of standard redundancy criteria and simplification techniques are used
for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.0 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Evaluation of interpreted functions.
- Extensionality resolution with detection of extensionality axioms.
Implementation
Vampire 4.0 is implemented in C++.
Expected Competition Performance
Vampire 4.0 is the CASC-26 LTB division winner.
Vampire 4.1
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13]
4.1 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus
[Kor13]
and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture
[Vor14]
which uses a SAT or SMT solver to make splitting decisions.
Both resolution and instantiation based proof search make use of global
subsumption
[Kor13].
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.1 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03].
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes
of comparing literals.
- Age-weight ratio that specifies how strongly lighter clauses are
preferred for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3
[dMB08]
with AVATAR to restrict search to ground-theory-consistent splitting
branches.
- Extensionality resolution
[GK+14]
with detection of extensionality axioms.
Implementation
Vampire 4.1 is implemented in C++.
Expected Competition Performance
Vampire 4.1 is the CASC-26 TFA and FNT division winner.
Vampire 4.2
Giles Reger
University of Manchester, United Kingdom
Architecture
Vampire
[KV13] 4.2 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,RB+16].
Both resolution and instantiation based proof search make use of global
subsumption.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.2 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of
comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent
splitting branches
[RB+16].
- Specialised theory instantiation and unification
- Extensionality resolution with detection of extensionality axioms
Implementation
Vampire 4.2 is implemented in C++. It makes use of MiniSat and Z3.
Expected Competition Performance
Vampire 4.2 is the CASC-26 FOF division winner.
Vampire 4.3
Giles Reger
University of Manchester, United Kingdom
This description is very similar to that of Vampire 4.2.
The main difference is the use of theory instantiation and unification with
abstraction
[RSV18]
for theory reasoning (this was experimental in 4.2). The set-of-support
strategy for theory reasoning has also been extended.
Little has changed in other areas of Vampire.
As always there have been some small improvements to heuristics, data
structures and schedules but nothing fundamentally new.
Architecture
Vampire
[KV13] 4.3 is an automatic theorem prover for first-order logic.
Vampire implements the calculi of ordered binary resolution and superposition
for handling equality.
It also implements the Inst-Gen calculus and a MACE-style finite model builder
[RSV16].
Splitting in resolution-based proof search is controlled by the AVATAR
architecture which uses a SAT or SMT solver to make splitting decisions
[Vor14,RB+16].
Both resolution and instantiation based proof search make use of global
subsumption.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion, subsumption
resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Internally, Vampire works only with clausal normal form.
Problems in the full first-order logic syntax are clausified during
preprocessing.
Vampire implements many useful preprocessing transformations including the
SinE axiom selection algorithm.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Strategies
Vampire 4.3 provides a very large number of options for strategy selection.
The most important ones are:
- Choices of saturation algorithm:
- Limited Resource Strategy
[RV03]
- DISCOUNT loop
- Otter loop
- Instantiation using the Inst-Gen calculus
- MACE-style finite model building with sort inference
- Splitting via AVATAR
[Vor14]
- A variety of optional simplifications.
- Parameterized reduction orderings.
- A number of built-in literal selection functions and different modes of
comparing literals
[HR+16].
- Age-weight ratio that specifies how strongly lighter clauses are preferred
for inference selection.
- Set-of-support strategy.
- Ground equational reasoning via congruence closure.
- Addition of theory axioms and evaluation of interpreted functions.
- Use of Z3 with AVATAR to restrict search to ground-theory-consistent
splitting branches
[RB+16].
- Specialised theory instantiation and unification
[RSV18]
- Extensionality resolution with detection of extensionality axioms
Implementation
Vampire 4.3 is implemented in C++. It makes use of MiniSat and Z3.
Expected Competition Performance
Vampire 4.3 should be better than 4.2 at theory reasoning and as good as 4.2 at everything else.