An E-hyper tableau is a tree whose nodes are labeled with clauses and which
is built up by the application of the inference rules of the E-hyper tableau
calculus.
The calculus rules are designed such that most of the reasoning is performed
using positive unit clauses.
Splitting is done without rigid variables.
Instead, variables which would be shared between branches are prevented by
ground substitutions, which are guessed from the Herbrand universe and
constrained by rewrite rules.
Redundancy rules allow the detection and removal of clauses that are
redundant with respect to a branch.
The hyper extension inference from the original hyper tableau calculus is
equivalent to a series of E-hyper tableau calculus inference applications.
Therefore the implementation of the hyper extension in KRHyper by a variant
of semi-naive evaluation [Ull89] is retained in E-KRHyper, where it serves
as a shortcut inference for the resolution of non-equational literals.
From the known performance of different E strategies over the TPTP library,
the system learns which strategy is most likely to solve a problem.
Given a new problem, E-MaLeS extracts the problems features (number of
terms, number of literals, etc) and applies the learned function to the
features.
The result is a (hopefully optimal) strategy.
As a new addition in 1.1, E-MaLeS learns not only the optimal strategy, but
also for how long it should run each strategy.
This allows better strategy scheduling.
EP 1.4pre is just a combination of E 1.4pre in verbose mode
and a proof analysis tool extracting the used inference steps. For the
LTB division, a control program uses a SInE-like analysis to extract
reduced axiomatizations that are handed to several instances of E.
The automatic mode is based on a static partition of the set of all
clausal problems based on a number of simple features (number of
clauses, maximal symbol arity, presence of equality, presence of
non-unit and non-Horn clauses,...). Each class of clauses is
automatically assigned a heuristic that performs well on problems from
this class in test runs. About 100 different strategies have been
evaluated on all untyped first-order problems from TPTP 4.1.0.
EP 1.6pre is just a combination of E 1.6pre in verbose mode
and a proof analysis tool extracting the used inference steps. For the
LTB and MZR divisions, a control program uses a SInE-like analysis to
extract reduced axiomatizations that are handed to several instances
of E.
The automatic mode is based on a static partition of the set of all
clausal problems based on a number of simple features (number of
clauses, maximal symbol arity, presence of equality, presence of
non-unit and non-Horn clauses,...). Each class of clauses is
automatically assigned a heuristic that performs well on problems from
this class in test runs. About 60 different strategies have been
evaluated on all bug-free untyped first-order problems from TPTP
5.3.0.
Fimo features symmetry breaking and incremental answer set solving provided by
the underlying iASP system iClingo.
Additionally, Fimo deskolemizes some of the Skolem functions by transforming
them to aggregates in the resulting logic program.
Major additions in the current version are:
iProver is available from
In the LTB division, iProver uses axiom selection based
on the SInE algorithm [HV11] as implemented in Vampire [HKV11],
i.e., axiom selection is done by Vampire and
proof attempts are done by iProver.
iProver features
are summaries below with recent additions marked with (*).
iProver is available from
iProver-Eq consists of three core components: i) ground reasoning
by an SMT solver, ii) first-order equational reasoning on literals
in a candidate model by a labelled unit superposition calculus
[KS10,KS10] and iii) instantiation of clauses with substitutions
obtained by ii).
Given a set of first-order clauses, iProver-Eq first abstracts it
to a set of ground clauses which are then passed to the ground
solver. If the ground abstraction is unsatisfiable, then the set of
first-order clauses is also unsatisfiable. Otherwise, literals are
selected from the first-order clauses based on the model of the
ground solver. The labelled unit superposition calculus checks
whether selected literals are conflicting. If they are conflicting,
then clauses are instantiated such that the ground solver has to
refine its model in order to resolve the conflict. Otherwise,
satisfiability of the initial first-order clause set is shown.
Clause selection and literal selection in the unit superposition
calculus are implemented in separate given clause
algorithms. Relevant substitutions are accumulated in labels during
unit superposition inferences and then used to instantiate
clauses. For redundancy elimination iProver-Eq uses demodulation,
dismatching constraints and global subsumption. In order to
efficiently propagate redundancy elimination from instantiation into
unit superposition, we implemented different representations of
labels based on sets, AND/OR-trees and OBDDs. Non-equational
resolution and equational superposition inferences provide further
simplifications.
For the LTB division, iProver-Eq-SInE runs iProver-Eq as the
underlying inference engine of SInE [HV11], i.e., axiom selection is
done by SInE, and proof attempts are done by iProver-Eq.
If no equational literals occur in the input, iProver-Eq falls back
to the inference rules of iProver, otherwise the latter are disabled
and only unit superposition is used. If all clauses are unit
equations, no instances need to be generated and the calculus
is run without the otherwise necessary bookkeeping.
iProver-Eq is available from
Previous versions of Isabelle relied on the TPTP2X tool to translate TPTP
files to Isabelle theory files.
Starting this year, Isabelle includes a parser for the TPTP syntaxes CNF,
FOF, TFF0, and THF0 as well as TPTP versions of its popular tools, invokable
on the command line as isabelle tptp_tool max_secs
file.p.
For example:
The implementation of Isabelle relies on a small LCF-style kernel, meaning that
inferences are implemented as operations on an abstract theorem
datatype.
Assuming the kernel is correct, all values of type theorem are
correct by construction.
Most of the code for Isabelle was written by the Isabelle teams at the
University of Cambridge and the Technische Universität München.
Isabelle/HOL is available for all major platforms under a BSD-style license
from
leanCoP can read formulae in leanCoP syntax as well as in
(raw) TPTP first-order syntax.
Equality axioms and axioms to support distinct objects are automatically
added if required.
The leanCoP core prover returns a very compact connection proof,
which can be translated into different proof formats,
e.g., into a lean (unofficial) TPTP syntax format for representing
connection proofs [OS10] or into a readable text proof.
The leanCoP core prover and all its additional components run on any
(standard) Prolog system; ECLiPSe, SICStus and SWI Prolog are currently
explicitly supported.
The shell script that implements the fixed strategy scheduling
runs on Linux/Unix and MacOS platforms as well as on most Windows platforms.
The source code of leanCoP 2.2 is available under the GNU general
public license.
Together with more information it can be found on the
leanCoP website at
LEO-II's parser supports the TPTP THF0 language and also the TPTP languages
FOF and CNF.
Unfortunately the LEO-II system still uses only a very simple
sequential collaboration model with first-order ATPs instead of using
the more advanced, concurrent and resource-adaptive OANTS architecture
[BS+08] as exploited by its predecessor LEO.
The LEO-II system is distributed under a BSD style license, and it is
available from
MaLARea is available from
The system is also able to work with second order statements. It
may also receive knowledge and know-how for a specific domain from a
human user; see [Pas89] and [Pas93].
These two possibilities are not used while working with the TPTP Library.
Muscadet is available from
The translation from HOL to Kodkod's first-order relational logic (FORL) is
parameterized by the cardinalities of the atomic types occurring in it.
Nitrox enumerates the possible cardinalities for the universe.
If a formula has a finite counterexample, the tool eventually finds it,
unless it runs out of resources.
Nitpick is optimized to work with higher-order logic (HOL) and its definitional
principles (e.g., (co)inductive predicates, (co)inductive datatypes,
(co)recursive functions).
When invoked on untyped first-order problem, few of its optimizations come
into play, and the problem handed to Kodkod is essentially a first-order
relational logic (FORL) rendering of the TPTP FOF problem.
There are two main exceptions:
Paradox incorporates the following features: Polynomial-time
clause splitting heuristics, the use of incremental SAT,
static symmetry reduction techniques, and the use of sort
inference.
The internal calculus of Princess only supports uninterpreted
predicates; uninterpreted functions are encoded as predicates,
together with the usual axioms.
Through appropriate translation of quantified formulae with functions,
the e-matching technique common in SMT solvers can be simulated;
triggers in quantified formulae are chosen based on heuristics similar
to those in the Simplify prover.
For CASC, Princess will run a schedule with a small number of
configurations for each problem (portfolio method). The schedule is
determined either statically, or dynamically using syntactic
attributes of problems (such as number and kind of quantifiers, etc),
based on training using a random sample of problems from the TPTP
library.
Prover9 has available positive ordered (and nonordered) resolution and
paramodulation, negative ordered (and nonordered) resolution, factoring,
positive and negative hyperresolution, UR-resolution, and demodulation
(term rewriting).
Terms can be ordered with LPO, RPO, or KBO.
Selection of the "given clause" is by an age-weight ratio.
Proofs can be given at two levels of detail:
(1) standard, in which each line of the proof is a stored clause
with detailed justification, and
(2) expanded, with a separate line for each operation.
When FOF problems are input, proof of transformation to clauses
is not given.
Completeness is not guaranteed, so termination does not indicate
satisfiability.
Given a problem, Prover9 adjusts its inference rules and strategy according
to syntactic properties of the input clauses such as the presence of
equality and non-Horn clauses.
Prover9 also does some preprocessing, for example, to eliminate predicates.
In previous CASC competitions, Prover9 has used LPO to order terms for
demodulation and for the inference rules, with a simple rule for determining
symbol precedence.
For CASC 2012, we are going to use KBO instead.
For the FOF problems, a preprocessing step attempts to
reduce the problem to independent subproblems by a miniscope
transformation; if the problem reduction succeeds, each
subproblem is clausified and given to the ordinary search
procedure; if the problem reduction fails, the original
problem is clausified and given to the search procedure.
Finishes in the middle of the pack are anticipated in all categories
in which Prover9 competes.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic [BS10] with a choice operator [BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses [Bro11].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set
of propositional clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
If there are no quantifiers at function types, the generation of higher-order
formulae and corresponding clauses may terminate [BS09,BS09].
In such a case, if MiniSat reports the final set of clauses as satisfiable,
then the original set of higher-order formulae is satisfiable (by a standard
model in which all types are interpreted as finite sets).
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 2.1 has eight strategy schedules which were determined through
experimentation using the THF problems in version 5.1.0 of the TPTP library.
One of these eight strategy schedules is chosen based on the amount of time
Satallax is given to solve the problem.
For example, if Satallax is given 180 seconds to solve the problem, then
a schedule with 38 modes is chosen.
Satallax progressively generates higher-order formulae and corresponding
propositional clauses [Bro11].
These formulae and propositional clauses correspond to instances of the
tableau rules.
Satallax uses the SAT solver MiniSat as an engine to test the current set of
propositional clauses for unsatisfiability.
If the clauses are unsatisfiable, then the original branch is unsatisfiable.
The current system implements a complete calculus based on binary resolution.
Superdeduction [BHK07] is a variant of deduction modulo [DHK03], which is an
extension of logical deduction systems consisting in canonically adding ad hoc
deduction rules translating axiomatic theories. This has several advantages:
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Although the kernel of the system works only with clausal normal form, the
shell accepts a problem in the full first-order logic syntax, clausifies
it and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm Sine [HV11] can be enabled as part of
the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Substitution tree and code tree indexes are used to implement all major
operations on sets of terms, literals and clauses.
Although the kernel of the system works only with clausal normal form, the
shell accepts a problem in the full first-order logic syntax, clausifies it
and performs a number of useful transformations before passing the result
to the kernel.
Also the axiom selection algorithm Sine [HV11] can be enabled as part of
the preprocessing.
When a theorem is proved, the system produces a verifiable proof, which
validates both the clausification phase and the refutation of the CNF.
Zenon outputs formal proofs that can be checked by Coq or Isabelle.
Zenon is available from
The FOF division now includes some CNF problems; this will probably
hinder Zenon's performance somewhat.
CVC4 0.0
Andrew Reynolds1, Cesare Tinelli1, Clark Barrett2
1University of Iowa, USA, 2New York University, USA
Architecture
CVC4 is a Satisfiability Modulo Theories (SMT) solver with support for many
theories including linear arithmetic, arrays, bit vectors, and datatypes.
Like most SMT solvers, it is based on the DPLL(T) framework [NOT06].
The latest development version of CVC4 supports reasoning for first-order
formulas, including a finite model finding feature which can answer
satisfiable in the presence of quantified (universal) formulas.
Using a theory solver for EUF with cardinality constraints, the system finds a
minimal candidate model with respect to uninterpreted sorts, that is, one
with a minimal number of ground equivalence classes.
When such a candidate model is found, the system will build interpretations
for all uninterpreted functions and predicates, using heuristics for defining
default values.
Consistency is then checked by exhaustively instantiating all quantified
formulas with ground instances that may falsify the candidate
interpretation.
The candidate interpretation is then refined according to these
instantiations.
This process is repeated until a fixed point is reached, at which point the
candidate interpretation specifies a model.
Strategies
The EUF with cardinality constraints theory solver searches for candidate
models for a sort S incrementally, i.e. it first postulates that the
cardinality of S = 1, and if it fails will postulate the cardinality of S =
2, etc.
Given cardinality constraint k for S, it maintains a disequality graph
induced by the ground constraints between terms of sort S.
The solver attempts to merge equivalence classes of type S using splitting on
demand [BN+06] until no further equivalence classes can be merged.
The solver will guide the search by adding conflict lemmas of the form
( C => card(S)>k ), where C is the explanation of a clique of size k+1 in the
disequality graph of S.
The strategies for model completion and quantifier instantiation are very
preliminary at this point.
Implementation
CVC4 is implemented in C++.
Much of the code uses context dependent data structures, whose values
automatically update when the DPLL(T) search backtracks.
The finite model finding module maintains discrimination-tree-like data
structures for storing candidate interpretations for functions and
predicates, as well as the set of instantiations produced.
Since the CVC4 parser does not support the TPTP syntax, the system relies
on a translation from TPTP to SMT2.
Expected Competition Performance
The finite model finding feature in CVC4 is still in development.
Much can be done to improve the strategy for producing instantiations,
including recognizing when certains axioms do not apply.
Currently, the bottleneck for CVC4+finite model finding is in processing
the large number of instantiations produced.
This problem is made worse by the fact that CVC4 currently has no way of
removing lemmas once they are added to the system.
Preliminary results on TPTP problems are favorable compared to other SMT
solvers, which however are not generally equipped to find models for
universal formulas.
E-Darwin 1.5
Björn Pelzer
University Koblenz-Landau, Germany
Architecture
E-Darwin 1.5 [BFT04,BPT11] is an automated theorem prover for first order
clausal logic with equality.
It is a modified version of the Darwin prover [BFT04], intended as a testbed
for variants of the Model Evolution calculus [BT03].
Among other things it implements several different approaches [BT05,BPT11]
to incorporating equality reasoning into Model Evolution.
Three principal data structures are used: the context (a set of rewrite
literals), the set of constrained clauses, and the set of derived candidates.
The prover always selects one candidate, which may be a new clause or a new
context literal, and exhaustively computes inferences with this candidate
and the context and clause set, moving the results to the candidate set.
Afterwards the candidate is inserted into one of the context or the clause
set, respectively, and the next candidate is selected.
The inferences are superposition-based.
Demodulation and various means of redundancy detection are used as well.
Strategies
The uniform search strategy is identical to the one employed in the original
Darwin, slightly adapted to account for derived clauses.
Implementation
E-Darwin is implemented in the functional/imperative language OCaml.
Darwin's method of storing partial unifiers
has been adapted to equations and subterm positions for the superposition
inferences in E-Darwin.
A combination of perfect and non-perfect discrimination tree indexes
is used to store the context and the clauses.
The system has been tested on Unix and is available under the GNU Public
License from
http://userpages.uni-koblenz.de/~bpelzer/edarwin
Expected Competition Performance
After some bugfixes the performance should be slightly better than last year.
While the original
Darwin performs strongly in EPR, E-Darwin is more of a generalist,
less effective in EPR, yet stronger in the other divisions.
Strategies
E-KRHyper uses a uniform search strategy for all problems.
The E-hyper tableau is generated depth-first, with E-KRHyper always working
on a single branch.
Refutational completeness and a fair search control are ensured
by an iterative deepening strategy with a limit on the maximum term
weight of generated clauses.
In the LTB division E-KRHyper sequentially tries three axiom selection
strategies:
an implementation of Krystof Hoder's SInE algorithm,
another incomplete selection based on the CNF representations of the axioms,
and finally the complete axiom set.
Implementation
E-KRHyper is implemented in the functional/imperative language OCaml.
The system accepts input in the TPTP-format and in the TPTP-supported
Protein-format.
The calculus implemented by E-KRHyper works on clauses, so first order
formula input is converted into CNF by an algorithm similar to the one
used by Otter [McC03], with some additional connector literals to prevent
explosive clause growth when dealing with DNF-like structures.
E-KRHyper operates on an E-hyper tableau which is represented by linked
node records.
Several layered discrimination-tree based indexes (both perfect and
non-perfect) provide access to the clauses in the tableau and support
backtracking.
The system runs on Unix and MS-Windows platforms and is available under
the GNU Public License from
http://www.uni-koblenz.de/~bpelzer/ekrhyper
Expected Competition Performance
The redundancy handling has been improved, resulting in a slight improvement
over last year.
Overall E-KRHyper will remain in the middle ground.
E-MaLeS 1.1
Daniel Kuehlwein1, Josef Urban1, Stephan Schulz2
1Radboud University Nijmegen, The Netherlands,
2Technische Universität München, Germany
Architecture
E-MaLeS (Machine Learning of Strategies) uses kernel-based machine learning
to improve the strategy selection of E prover.
Strategies
Each E strategy is run for the predicted time, starting with the strategy
with the lowest predicted time.
Implementation
E-MaLeS is implemented in python, using the numpy and scipy libraries.
It is available from
http://www.cs.ru.nl/~kuehlwein/
Expected Competition Performance
E-MaLeS should perform slightly better than E.
EP 1.4pre
Stephan Schulz
Technische Universität München, Germany
Architecture
E 1.4pre [Sch02,Sch04] is described in this section.
E is a purely equational theorem prover for full first-order logic with
equality.
It consists of an (optional) clausifier for pre-processing full
first-order formulae into clausal from, and a saturation algorithm
implementing an instance of the superposition calculus with negative
literal selection and a number of redundancy elimination techniques.
E is based on the DISCOUNT-loop variant of the given-clause
algorithm, i.e., a strict separation of active and passive facts.
No special rules for non-equational literals have been implemented.
Resolution is effectively simulated by paramodulation and equality
resolution.
Strategies
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering. The prover supports a large number of pre-programmed literal
selection strategies. Clause evaluation heuristics can be constructed
on the fly by combining various parametrized primitive evaluation
functions, or can be selected from a set of predefined
heuristics. Clause evaluation heuristics are based on symbol-counting,
but also take other clause properties into account. In particular, the
search can prefer clauses from the set of support, or containing many
symbols also present in the goal. Supported term orderings are several
parametrized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO).
Implementation
E is build around perfectly shared terms, i.e. each distinct term is
only represented once in a term bank. The whole set of terms thus
consists of a number of interconnected directed acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing [Sch04].
Superposition and backward rewriting use fingerprint indexing, a new
technique combining ideas from feature vector indexing and path
indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in [Loe06,Loe06].
The prover and additional information are available at
http://www.eprover.org
Expected Competition Performance
EP 1.4pre is the CASC-23 CNF division winner.
EP 1.6pre
Stephan Schulz
Technische Universität München, Germany
Architecture
E 1.6pre [Sch02] is a purely equational theorem prover for full
first-order logic with equality. It consists of an (optional)
clausifier for pre-processing full first-order formulae into clausal
form, and a saturation algorithm implementing an instance of the
superposition calculus with negative literal selection and a number of
redundancy elimination techniques. E is based on the DISCOUNT-loop
variant of the given-clause algorithm, i.e., a strict
separation of active and passive facts. No special rules for
non-equational literals have been implemented. Resolution is
effectively simulated by paramodulation and equality resolution.
Strategies
Proof search in E is primarily controlled by a literal selection
strategy, a clause evaluation heuristic, and a simplification
ordering. The prover supports a large number of pre-programmed literal
selection strategies. Clause evaluation heuristics can be constructed
on the fly by combining various parametrized primitive evaluation
functions, or can be selected from a set of predefined
heuristics. Clause evaluation heuristics are based on symbol-counting,
but also take other clause properties into account. In particular, the
search can prefer clauses from the set of support, or containing many
symbols also present in the goal. Supported term orderings are several
parametrized instances of Knuth-Bendix-Ordering (KBO) and
Lexicographic Path Ordering (LPO).
Implementation
E is build around perfectly shared terms, i.e. each distinct term is
only represented once in a term bank. The whole set of terms thus
consists of a number of interconnected directed acyclic graphs.
Term memory is managed by a simple mark-and-sweep garbage collector.
Unconditional (forward) rewriting using unit clauses is implemented
using perfect discrimination trees with size and age constraints.
Whenever a possible simplification is detected, it is added as a rewrite
link in the term bank.
As a result, not only terms, but also rewrite steps are shared.
Subsumption and contextual literal cutting (also known as subsumption
resolution) is supported using feature vector indexing [Sch04].
Superposition and backward rewriting use fingerprint indexing [Sch12],
a new technique combining ideas from feature vector indexing and path
indexing.
Finally, LPO and KBO are implemented using the elegant and efficient
algorithms developed by Bernd Löchner in [Loe06,Loe06].
The prover and additional information are available at
http://www.eprover.org
Expected Competition Performance
E 1.6pre has improved indexing and now integrates optional SinE-like
preprocessing for large FOF problems. The system is expected to
perform well in most proof classes, but will at best complement top
systems in the disproof classes.
FIMO 0.3
Orkunt Sabuncu
University of Potsdam, Germany
Architecture
Fimo is a system for computing finite models of first-order formulas by
incremental Answer Set Programming (iASP).
The input theory is transformed to an incremental logic program.
If any, answer sets of this program represent finite models of the input
theory.
iClingo is used for computing answer sets of iASP programs.
Strategies
Fimo is the successor of the system fmc2iasp [GSS11].
Unlike fmc2iasp Fimo does not rely on flattening for translating the input
theory to a satisfiability problem.
Implementation
Fimo is developed in Python. It is available from
http://potassco.sourceforge.net
Expected Competition Performance
We are expecting a similar performance to the version that competed in
CASC 2011.
iProver 0.9
Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor09] which is complete for first-order logic.
One of the distinctive features of iProver is a modular combination of
first-order reasoning with ground reasoning.
In particular, iProver currently integrates MiniSat [ES04] for reasoning
with ground abstractions of first-order clauses.
In addition to instantiation, iProver implements ordered resolution calculus
and a combination of instantiation and ordered resolution; see
[Kor08] for the implementation details.
The saturation process is implemented as a modification of a given
clause algorithm.
iProver uses non-perfect discrimination trees for the unification indexes,
priority queues for passive clauses, and a compressed vector index for
subsumption and subsumption resolution (both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications.
A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
In the LTB division, iProver-SInE uses axiom selection based
on the SInE algorithm [HV11] as implemented in Vampire [HKV10],
i.e., axiom selection is done by Vampire and
proof attempts are done by iProver.
Strategies
iProver has around 40 options to control the proof
search including options for literal selection,
passive clause selection, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat.
iProver accepts FOF and CNF formats,
where Vampire [HKV10] is used for clausification of FOF problems.
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
iProver 0.9 is the CASC-23 EPR division winner.
iProver 0.99
Konstantin Korovin, Christoph Sticksel
University of Manchester, United Kingdom
Architecture
iProver is an automated theorem prover based on an instantiation calculus
Inst-Gen [GK03,Kor09] which is complete for first-order logic.
iProver combines first-order reasoning with
ground reasoning for which it uses MiniSat [ES04].
iProver also combines instantiation with
ordered resolution; see [Kor08] for the implementation details.
The proof search is implemented using a saturation process based on
the given clause algorithm. iProver uses non-perfect discrimination trees
for the unification indexes, priority queues for passive clauses, and
a compressed vector index for subsumption and subsumption resolution
(both forward and backward).
The following redundancy eliminations are implemented: blocking non-proper
instantiations; dismatching constraints [GK04,Kor08]; global subsumption
[Kor08]; resolution-based simplifications and propositional-based
simplifications. A compressed feature vector index is used
for efficient forward/backward subsumption and subsumption resolution.
Equality is dealt with (internally) by adding the necessary axioms of equality
with an option of using Brand's transformation.
Type inference is targeted at improving finite model finding and symmetry
breaking.
Semantic filtering is used in preprocessing to eliminated irrelevant clauses.
Proof extraction is challenging
due to simplifications such global subsumption which involve global reasoning
with the whole clause set and can be computationally expensive.
Strategies
iProver has around 60 options to control the proof
search including options for literal selection,
passive clause selection, frequency of calling the SAT solver,
simplifications and options for combination of instantiation with resolution.
At CASC iProver will execute a small number of fixed schedules of selected
options depending on general syntactic properties such as Horn/non-Horn,
equational/non-equational, and maximal term depth.
The strategy for satisfiable problems (FNT division) includes finite model
finding.
Implementation
iProver is implemented in OCaml and for the ground reasoning uses MiniSat
[ES04].
iProver accepts FOF and CNF formats.
Vampire [HKV11] is used for proof-producing clausification of FOF
problems as well as for axiom selection [HV11] in the LTB division.
http://www.cs.man.ac.uk/~korovink/iprover/
Expected Competition Performance
We expect very good performance on satisfiable problems (FNT division)
due to recent additions.
In the EPR division we expect similar performance to the previous version but
additional computations needed for the proof extraction can degrade the
performance.
iProver-Eq 0.8
Christoph Sticksel, Konstantin Korovin
University of Manchester, United Kingdom
Architecture
iProver-Eq [KS10] extends the iProver system [Kor08] with built-in
equational reasoning, along the lines of [GK04]. As in the iProver
system, first-order reasoning is combined with ground satisfiability
checking where the latter is delegated to an off-the-shelf ground
solver.
Strategies
Proof search options in iProver-Eq control clause and literal
selection in the respective given clause algorithms. Equally
important is the global distribution of time between the inference
engines and the ground solver. At CASC, iProver-Eq will execute a
fixed schedule of selected options.
Implementation
iProver-Eq is implemented in OCaml and uses a prototype of the CVC4
SMT solver, which is the successor of CVC3 [BT07], for the ground
reasoning in the equational case and MiniSat [ES04] in the
non-equational case. iProver-Eq accepts FOF and CNF formats, where
Vampire [HKV11] is used for clausification and preprocessing
of FOF problems.
http://www.cs.man.ac.uk/~sticksec/iprover-eq
Expected Competition Performance
iProver-Eq has seen many optimisations from the version in the
previous CASCs, in particular regarding labelling and the
integration of CVC4. We expect reasonably good performance in all
divisions, including the EPR divisions where instantiation-based
methods are particularly strong.
Isabelle 2012
Jasmin C. Blanchette1, Lawrence C. Paulson2,
Tobias Nipkow1, Makarius Wenzel3
1Technische Universität München, Germany,
2University of Cambridge, United Kingdom,
3Université Paris Sud, France
Architecture
Isabelle/HOL 2012 [NPW12] is the higher-order logic incarnation of the generic
proof assistant
Isabelle2012.
Isabelle/HOL provides several automatic proof tactics, notably an equational
reasoner [Nip89], a classical reasoner [PN94], and a tableau prover [Pau99].
It also integrates external first- and higher-order provers via its subsystem
Sledgehammer [PB10,BBP11].
Two versions of Isabelle participate this year.
The demo version includes its competitors LEO-II [BP+08] and Satallax
[Bro12] as Sledgehammer backends, whereas the competition version
leaves them out.
isabelle tptp_isabelle_demo 100 SEU/SEU824^3.p
Strategies
The Isabelle tactic submitted to the competition simply tries the
following tactics sequentially:
Implementation
Isabelle is a generic theorem prover written in Standard ML.
Its meta-logic, Isabelle/Pure, provides an intuitionistic fragment of
higher-order logic.
The HOL object logic extends pure with a more elaborate version of higher-order
logic, complete with the familiar connectives and quantifiers.
Other object logics are available, notably FOL (first-order logic) and ZF
(Zermelo-Fraenkel set theory).
http://www.cl.cam.ac.uk/research/hvg/Isabelle
Expected Competition Performance
By integrating LEO-II and Satallax as two of many of its tactics, Isabelle
should have all the tools at its disposals to win the competition this year.
leanCoP---2.2
Jens Otten
University of Potsdam, Germany
Architecture
leanCoP [OB03,Ott08] is an automated theorem prover for classical first-order
logic with equality.
It is a very compact implementation of the connection (tableau) calculus
[Bib87,LS01].
Strategies
The reduction rule of the connection calculus is applied before the extension
rule.
Open branches are selected in a depth-first way.
Iterative deepening on the proof depth is used to achieve completeness.
Additional inference rules and strategies include regularity, lemmata, and
restricted backtracking [Ott10].
leanCoP uses an optimized structure-preserving transformation into clausal
form [Ott10] and a fixed strategy scheduling, which is invoked by a shell
script.
Implementation
leanCoP is implemented in Prolog.
The source code of the core prover is only a few lines long and fits on
half a page. Prolog's built-in indexing mechanism is used to quickly find
connections.
http://www.leancop.de
The website also contains information about the leanCoP versions
ileanCoP and MleanCoP, leading automated theorem provers for
first-order intuitionistic logic and several first-order
modal logics, respectively.
Expected Competition Performance
As the core prover has not changed, the performance of leanCoP 2.2
is expected to be similar to the performance of leanCoP 2.1.
leanCoP-ARDE---2.2
Mario Frank, Thomas Raths, Jens Otten
University of Potsdam, Germany
Architecture
ARDE (Axiom Relevance Decision Engine, current version 0.5) is a tool that
processes large sets of axioms and categorizes them into classes.
Given a conjecture it selects those axioms from the axiom set
that are somehow connected to the conjecture and thus possibly needed for the
proof of the conjecture.
For the proof process itself it consults leanCoP [OB03,Ott08], a compact
theorem prover for first-order classical logic.
Strategies
ARDE analyses all axioms and gathers information about included predicate
symbols, their arity and additional information like their polarities.
Based on these properties, ARDE tries to search for axioms that are
possibly needed.
These axioms together with the conjecture are then given to the leanCoP 2.2
theorem prover.
ARDE is multi-threaded and has the ability to trigger multiple proof
attemps concurrently with all threads having different axiom sets.
Implementation
ARDE is implemented in C++ using the C++11 standard.
It uses version 1.42 of the Boost library (e.g. Spirit), G++ 4.4.5
and GNU Prolog 1.3.1.
Other used tools include egrep and GNU make.
Expected Competition Performance
On large problems containing many axioms leanCoP-ARDE 2.2 is exptected to
show a better performance than the leanCoP 2.2 prover on its own.
LEO-II---1.4
Christoph Benzmüller
Free University Berlin, Germany
Architecture
LEO-II [BP+08], the successor of LEO [BK98], is a higher-order ATP system
based on extensional higher-order resolution.
More precisely, LEO-II employs a refinement of extensional higher-order
RUE resolution [Ben99].
LEO-II is designed to cooperate with specialist systems for fragments of
higher-order logic.
By default, LEO-II cooperates with the first-order ATP system E [Sch02].
LEO-II is often too weak to find a refutation amongst the steadily growing
set of clauses on its own.
However, some of the clauses in LEO-II's search space attain a special
status: they are first-order clauses modulo the application of an
appropriate transformation function.
Therefore, LEO-II launches a cooperating first-order ATP system every n
iterations of its (standard) resolution proof search loop (e.g., 10).
If the first-order ATP system finds a refutation, it communicates its success
to LEO-II in the standard SZS format.
Communication between LEO-II and the cooperating first-order ATP system
uses the TPTP language and standards.
Strategies
LEO-II employs an adapted "Otter loop".
Moreover, LEO-II now also uses some very basic strategy scheduling to try
different search strategies or flag settings.
These search strategies also include some different relevance filters.
Implementation
LEO-II is implemented in Objective Caml version 3.12, and its problem
representation language is the TPTP THF language [BRS08].
In fact, the development of LEO-II has largely paralleled the development
of the TPTP THF language and related infrastructure [SB10].
http://www.leoprover.org
Expected Competition Performance
LEO-II has not improved much since 2010.
The main modifications concern proof output in order to enable proof
reconstruction/verification of LEO-II proofs in other systems [SB12].
MaLARea 0.4
Josef Urban1, Daniel Kuehlwein1, Stephan Schulz2, Jiri Vyskocil3
1Radboud University Nijmegen, The Netherlands,
2Technische Universität München, Germany,
3Czech Technical University, Czech Republic
Architecture
MaLARea 0.4 [Urb07,USPV08] is a metasystem for ATP in large theories
where symbol and formula names are used consistently.
It uses several deductive systems (now E,SPASS,Vampire, Paradox, Mace), as
well as complementary AI techniques like machine learning (the SNoW system)
based on symbol-based similarity, model-based similarity, term-based
similarity, and obviously previous successful proofs.
The version for CASC 2012 will mainly use E prover, possibly also Prover9,
Mace and Paradox.
The premise selection methods will likely also use kernel-based learning
and E's implementation of SInE.
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use
the most relevant axioms for next ATP attempts.
This is iterated, using different timelimits and axiom limits.
Various features are used for learning, and the learning is complemented
by other criteria like model-based reasoning, symbol and term-based
similarity, etc.
Implementation
The metasystem is implemented in ca. 2500 lines of Perl.
It uses many external programs - the above mentioned ATPs and machine learner,
TPTP utilities, LADR utilities for work with models, and some standard
Unix tools.
https://github.com/JUrban/MPTP2/tree/master/MaLARea
The metasystem's Perl code is released under GPL2.
Expected Competition Performance
Thanks to machine learning, MaLARea is strongest on batches of many
related problems with many redundant axioms where some of the problems
are easy to solve and can be used for learning the axiom relevance.
MaLARea is not very good when all problems are too difficult (nothing to
learn from), or the problems (are few and) have nothing in common.
Some of its techniques (selection by symbol and term-based similarity,
model-based reasoning) could however make it even there slightly stronger
than standard ATPs.
MaLARea has a very good performance on the MPTP Challenge, which is a
predecessor of the LTB division, and it is the winner of the 2008 MZR
LTB category which had the same rules as the MPTP Challenge.
Since then the LTB rules changed and MaLARea did not compete.
Muscadet 4.2
Dominique Pastre
University Paris Descartes, France
Architecture
The Muscadet theorem prover is a knowledge-based system.
It is based on Natural Deduction, following the terminology of [Ble71] and
[Pas78], and uses methods which resembles those used by humans.
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts,
which are the internal representation of "theorems to be proved".
Rules are either universal and put into the system, or built by the system
itself by metarules from data (definitions and lemmas).
Rules may add new hypotheses, modify the conclusion, create objects,
split theorems into two or more subtheorems
or build new rules which are local for a (sub-)theorem.
Strategies
There are specific strategies for existential, universal, conjonctive or
disjunctive hypotheses and conclusions, and equalities.
Functional symbols may be used, but an automatic creation of intermediate
objects allows deep subformulae to be flattened and treated as if the
concepts were defined by predicate symbols.
The successive steps of a proof may be forward deduction (deduce new hypotheses
from old ones), backward deduction (replace the conclusion by a new one),
refutation (only if the conclusion is a negation), search for objects satisfying the conclusion
or dynamic building of new rules.
Implementation
Muscadet [Pas01] is implemented in SWI-Prolog.
Rules are written as more or less declarative Prolog clauses.
Metarules are written as sets of Prolog clauses.
The inference engine includes the Prolog interpreter and some procedural
Prolog clauses.
A theorem may be split into several subtheorems, structured as a tree
with "and" and "or" nodes.
All the proof search steps are memorized as facts including all the elements
which will be necessary to extract later the useful steps (the name of the
executed action or applied rule, the new facts added or rule dynamically
built, the antecedents and a brief explanation).
http://www.math-info.univ-paris5.fr/~pastre/muscadet/muscadet.html
Expected Competition Performance
The best performances of Muscadet will be for problems
manipulating many concepts in which all statements (conjectures,
definitions, axioms) are expressed in a manner similar to the
practice of humans, especially of mathematicians [Pas02,Pas07].
It will have poor performances for problems using few concepts but large
and deep formulas leading to many splittings.
Its best results will be in set theory, especially for functions and relations.
Its originality is that proofs are given in natural style.
This 4.2 version is a minor revision of 4.1 version.
It differs only in that a number of bugs have been fixed.
Nitrox 2012
Jasmin C. Blanchette1, Emina Torlak2
1Technische Universität München, Germany,
2University of California, USA
Architecture
Nitrox is the first-order version of Nitpick [BN10], an open source
counterexample generator for Isabelle/HOL [NPW12].
It builds on Kodkod [TJ07], a highly optimized first-order relational
model finder based on SAT.
The name Nitrox is a portmanteau of Nitpick
and Paradox (clever, eh?).
Strategies
Nitrox employs Kodkod to find a finite model of the negated conjecture.
It performs a few transformations on the input, such as pushing quantifiers
inside, but 99% of the solving logic is in Kodkod and the underlying SAT
solver.
Implementation
Nitrox, like most of Isabelle/HOL, is written in Standard ML.
Unlike Isabelle itself, which adheres to the LCF small-kernel discipline,
Nitrox does not certify its results and must be trusted.
Kodkod is written in Java.
MiniSat 1.14 is used as the SAT solver.
Expected Competition Performance
Since Nitpick was designed for HOL, it doesn't have any type inference à
la Paradox.
It also doesn't use the SAT solver incrementally, which penalizes it a bit
(but not as much as the missing type inference).
Kodkod itself is known to perform less well on FOF than Paradox, because it
is designed and optimized for a somewhat different logic, FORL.
On the other hand, Kodkod's symmetry breaking seems better calibrated than
Paradox's.
Hence, we expect Nitrox to end up in second place at best in the TNF category.
Paradox 3.0
Koen Claessen, Niklas Sörensson
Chalmers University of Technology, Sweden
Architecture
Paradox [CS03] is a finite-domain model generator.
It is based on a MACE-style [McC03] flattening and instantiating of the
first-order clauses into propositional clauses, and then the use of
a SAT solver to solve the resulting problem.
Strategies
There is only one strategy in Paradox:
In this way, Paradox can be used both as a model finder and as an EPR solver.
Implementation
The main part of Paradox is implemented in Haskell using the GHC compiler.
Paradox also has a built-in incremental SAT solver which is written in C++.
The two parts are linked together on the object level using Haskell's Foreign
Function Interface.
Expected Competition Performance
Paradox 3.0 is the CASC-23 FNT division winner.
Princess 2012-06-04
Philipp Rümmer, Aleksandar Zeljić
Uppsala University, Sweden
Architecture
Princess [Rue08,Rue12] is a theorem prover for first-order logic modulo
linear integer arithmetic.
The prover has been under development since 2007, and represents a
combination of techniques from the areas of first-order reasoning and
SMT solving.
The main underlying calculus is a free-variable tableau calculus, which
is extended with constraints to enable backtracking-free proof expansion,
and positive unit hyper-resolution for lightweight instantiation of quantified
formulae.
Linear integer arithmetic is handled using a set of built-in proof rules
resembling the Omega test, which altogether yields a calculus that is
complete for full Presburger arithmetic, for first-order logic, and for
a number of further fragments.
Strategies
Princess supports a number of different proof expansion strategies
(e.g., depth-first, breadth-first), which are chosen based on
syntactic properties of a problem (in particular, the kind of
quantifiers occurring in the problem). Further options exist to
control, for instance, the selection of triggers in quantified
formulae, clausification, and the handling of functions.
Implementation
Princess is entirely written in Scala and runs on any recent Java
virtual machine; besides the standard Scala and Java libraries, only
the Cup parser library is employed.
Princess is available from
http://www.philipp.ruemmer.org/princess.shtml
Expected Competition Performance
Since Princess enters CASC for the first time, performance predictions
are entirely speculative. Princess is mainly designed for the TFI
category, and should perform reasonably well here. Support for
rationals and reals is preliminary and incomplete, so that mediocre
results are expected for TFR. Finally, although Princess is in theory
complete for FOL, it is not designed for this logic and enters FOF
just for fun.
Prover9 2009-11A
Bob Veroff on behalf of William McCune
University of New Mexico, USA
Architecture
Prover9, Version 2009-11A, is a resolution/paramodulation prover for
first-order logic with equality.
Its overall architecture is very similar to that of Otter-3.3 [McC03].
It uses the "given clause algorithm", in which not-yet-given clauses are
available for rewriting and for other inference operations (sometimes called
the "Otter loop").
Strategies
Like Otter, Prover9 has available many strategies; the following
statements apply to CASC-2012.
Implementation
Prover9 is coded in C, and it uses the LADR libraries.
Some of the code descended from EQP [McC97].
(LADR has some AC functions, but Prover9 does not use them).
Term data structures are not shared (as they are in Otter).
Term indexing is used extensively, with discrimination tree indexing for
finding rewrite rules and subsuming units, FPA/Path indexing for finding
subsumed units, rewritable terms, and resolvable literals.
Feature vector indexing [Sch04] is used for forward and backward nonunit
subsumption.
Prover9 is available from
http://www.cs.unm.edu/~mccune/prover9/
Expected Competition Performance
Some of the strategy development for CASC was done by experimentation with
the CASC-2004 competition "selected" problems.
(Prover9 has not yet been run on other TPTP problems.)
Prover9 is unlikely to challenge the CASC leaders, because
(1) extensive testing and tuning over TPTP problems has not been done,
(2) theories (e.g., ring, combinatory logic, set theory) are not recognized,
(3) term orderings and symbol precedences are not fine-tuned, and
(4) multiple searches with differing strategies are not run.
PS-E---1.0
Daniel Kuehlwein1, Josef Urban1, Stephan Schulz2
1Radboud University Nijmegen, The Netherlands,
2Technische Universität München, Germany
Architecture
PS-E (Premise Selector - E) [KvL+12] is a learning based metasystem for large
theories where symbol and formula names are used consistently.
It uses the MOR [KvL+12] algorithm for learning and is based upon E.
PS-E learns how symbols and terms relate to used premises.
Given a new conjecture, PS-E extracts its symbols, terms and subterms and
predicts which premises are needed to solve it.
When PS-E finds new proofs, it updates its learning algorithm to produce
even more accurate predictions.
Strategies
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use the most
relevant axioms for next ATP attempts.
This is iterated, using different timelimits and axiom limits.
The learning parameters (regularization,the gaussian kernel parameters sigma,
thresholds) are optimized via cross validation on the 1000 example problems
and solutions.
Implementation
PS-E is implemented in python, using the numpy and scipy libraries.
It is available from
http://www.cs.ru.nl/~kuehlwein/
Expected Competition Performance
Based on the performance of the MOR algorithms in [KvL+12] PS-E is expected
to be very competitive.
Satallax 2.1
Chad E. Brown
Saarland University, Germany
Architecture
Satallax [Bro11] is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat [ES04] is responsible for much of the search for a
proof.
Strategies
There are a number of flags that control the order in which formulas and
instantiation terms are considered and propositional clauses are generated.
Other flags activate some optional extensions to the basic proof procedure.
A collection of flag settings is called a mode.
Approximately 250 modes have been tried so far.
Regardless of the mode, the search procedure is sound and complete for
higher-order logic with choice.
This implies that if search terminates with a particular mode, then we can
conclude that the original set of formulae is unsatisfiable or satisfiable.
Implementation
Satallax 2.1 is implemented in OCaml.
A foreign function interface is used to interact with MiniSat.
Satallax is available from
http://satallax.com
Expected Competition Performance
Satallax 2.1 is the CASC-23 THF division winner.
Satallax 2.4
Chad E. Brown
Saarland University, Germany
Architecture
Satallax 2.4 is an automated theorem prover for higher-order logic.
The particular form of higher-order logic supported by Satallax is Church's
simple type theory with extensionality and choice operators.
The SAT solver MiniSat [ES04] is responsible for much of the search for a
proof.
The theoretical basis of search is a complete ground tableau calculus for
higher-order logic [BS10] with a choice operator [BB11].
A problem is given in the THF format.
A branch is formed from the axioms of the problem and the negation of the
conjecture (if any is given).
From this point on, Satallax tries to determine unsatisfiability or
satisfiability of this branch.
Strategies
There are about a hundred flags [Bro12] that control the order in which
formulas and instantiation terms are considered and propositional clauses
are generated.
Other flags activate some optional extensions to the basic proof procedure.
A collection of flag settings is called a mode.
Approximately 300 modes have been defined and tested so far.
A strategy schedule is an ordered collection of modes with information about
how much time the mode should be allotted.
Satallax tries each of the modes for a certain amount of time sequentially.
Satallax 2.4 has strategy schedule consisting of 58 modes (3 of which may be
tried twice - the second time with a longer timeout).
Each mode is tried for time limits ranging from 0.1 seconds to 27.3 seconds.
The strategy schedule was determined through experimentation using the THF
problems in version 5.3.0 of the TPTP library.
Implementation
Satallax 2.4 is implemented in OCaml.
A foreign function interface is used to interact with MiniSat 2.2.0
(Satallax 2.1 used MiniSat 2.0.)
Satallax is available from
http://satallax.com
Expected Competition Performance
Satallax 2.1 won the THF division last year by proving 246 of 300 theorems.
Satallax 2.4 can prove 272 of these same 300 theorems indicating an
improvement of roughly 10 percent.
On the other hand, the Sledgehammer tactic in Isabelle has also improved in
the past year and has a good chance of winning the THF division.
SPASS+T 2.2.14
Uwe Waldmann1, Stephan Zimmer2
1Max-Planck-Institut für Informatik, Germany,
2AbsInt GmbH, Germany
Architecture
SPASS+T is an extension of the superposition-based theorem prover SPASS
that integrates algebraic knowledge into SPASS in three complementary ways:
by passing derived formulas to an external SMT procedure
(currently Yices or CVC3), by adding standard axioms,
and by built-in arithmetic simplification and inference rules.
A first version of the system has been described in [PW06].
In the current version, a much more sophisticated coupling of
the SMT procedure has been added [Zim07].
Strategies
Standard axioms and built-in arithmetic simplification and inference rules
are integrated into the standard main loop of SPASS.
Inferences between standard axioms are excluded, so the
user-supplied formulas are taken as set of support.
The external SMT procedure runs in parallel in a separate process,
leading occasionally to non-deterministic behaviour.
Implementation
SPASS+T is implemented in C. The system is available from
http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS
Expected Competition Performance
SPASS+T 2.2.14 is the CASC-23 TFA division winner.
SPASS+T 2.2.16
Uwe Waldmann1, Stephan Zimmer2
1Max-Planck-Institut für Informatik, Germany,
2AbsInt GmbH, Germany
Architecture
SPASS+T is an extension of the superposition-based theorem prover SPASS
that integrates algebraic knowledge into SPASS in three complementary ways:
by passing derived formulas to an external SMT procedure
(currently Yices or CVC3), by adding standard axioms,
and by built-in arithmetic simplification and inference rules.
A first version of the system has been described in [PW06];
later a much more sophisticated coupling of the SMT procedure has been
added [Zim07].
Strategies
Standard axioms and built-in arithmetic simplification and inference rules
are integrated into the standard main loop of SPASS.
Inferences between standard axioms are excluded, so the
user-supplied formulas are taken as set of support.
The external SMT procedure runs in parallel in a separate process,
leading occasionally to non-deterministic behaviour.
Implementation
SPASS+T is implemented in C. The system is available from
http://www.mpi-inf.mpg.de/~uwe/software/#TSPASS
Expected Competition Performance
SPASS+T 2.2.14 has been the winner of the TFA division of last CASC;
we expect a similar performance in CASC 2012.
STP---1.0
Adam Pease1, Stephan Schulz2
1Articulate Software, USA,
2Technische Universität München, Germany
Architecture
STP is a minimalistic resolution theorem prover designed for educational
purposes.
The goal is to present a simple and clean series of provers starting with
the most basic functional prover that can process TPTP problems, and
gradually extending the system to implement more and more state-of-the-art
techniques.
STP is open source and it is hoped that it will provide a basis for many
new derivative provers to participate in CASC.
Strategies
It includes subsumption and tautology elimination as simplification techniques, and a few simple strategies for axiom selection.
Implementation
It supports both CNF translation and proof object generation and output in
TSTP format.
STP is available in both Python and Java implementations.
Expected Competition Performance
Performance is expected to be poor compared to modern systems.
SuperZenon 0.0.1
David Delahaye1, Melanie Jacquel2
1CEDRIC/CNAM, France, 2Siemens IC-MOL, France
Architecture
SuperZenon is an experimental extension of the Zenon [BDD07] automated
theorem prover, using the principles of superdeduction, among which the
theory is used to enrich the deduction system with new deduction rules.
A version of SuperZenon has been instantiated for the set theory of the B
method [Abr96].
This allows us to provide another prover to
Atelier B, which can be used to verify
B proof rules in particular.
This version of SuperZenon has been successfully applied (with significant
speed-ups both in terms of proof time and proof size) to the verification
of B proof rules coming from the database maintained by
Siemens IC-MOL [JB+12].
Strategies
The strategy of SuperZenon relies on the automatic orientation of some
axioms of the theory.
The axioms that can be oriented are either equivalences or implications
where at least one member is atomic.
The axioms which cannot be oriented are left as axioms.
It should be noted that the preservation of completeness after this
transformation of a part of the theory cannot be ensured in general.
Implementation
The implementation of SuperZenon relies on the one of Zenon, and has been
realized thanks to the ability of Zenon to extend its core of deductive
rules to match specific requirements.
The compilation of superdeduction rules is performed using the proof search
method of Zenon with a subset of rules.
The implementation of Super Zenon is available from
http://cedric.cnam.fr/~delahaye/super-zenon/
Expected Competition Performance
SuperZenon is expected to improve the performance of Zenon in each division
in which Zenon competes (i.e., FOFT and FOF).
TPS 3.120601S1b
Chad E. Brown1, Peter Andrews2
1Saarland University, Germany,
2Carnegie Melon University, USA
Architecture
TPS is a higher-order theorem proving system that has been developed over
several decades under the supervision of Peter B. Andrews with substantial
work by Eve Longini Cohen, Dale A. Miller, Frank Pfenning, Sunil Issar,
Carl Klapper, Dan Nesmith, Hongwei Xi, Matthew Bishop, Chad E. Brown,
Mark Kaminski, Rémy Chrétien and Cris Perdue.
TPS can be used to prove theorems of Church's type theory automatically,
interactively, or semi-automatically [AB+96,AB06].
When searching for a proof, TPS first searches for an expansion proof
[Mil87] or an extensional expansion proof [Bro07] of the theorem.
Part of this process involves searching for acceptable matings [And81].
Using higher-order unification, a pair of occurrences of subformulae (which
are usually literals) is mated appropriately on each vertical path through
an expanded form of the theorem to be proved.
The expansion proof thus obtained is then translated [Pfe87] without further
search into a proof of the theorem in natural deduction style.
Strategies
Strategies used by TPS in the search process include:
Implementation
TPS has been developed as a research tool for developing, investigating, and
refining a variety of methods of searching for expansion proofs, and
variations of these methods.
Its behavior is controlled by hundreds of flags.
A set of flags, with values for them, is called a mode.
The strategy of the current version of TPS consists of 71 modes.
When searching for a proof in automatic mode, TPS tries each of these modes
in turn for a specified amount of time (at least 1 second and at most 41
seconds).
If TPS succeeds in finding an expansion proof, it translates the expansion
proof to a natural deduction proof.
This final step ensures that TPS will not incorrectly report that a formula
has been proven.
TPS is implemented in Common Lisp, and is available from
http://gtps.math.cmu.edu/tps.html
Expected Competition Performance
TPS was the CASC-22 THF division winner in 2009.
In the two CASC competitions since 2009 TPS has come in last behind Isabelle,
LEO-II and Satallax.
There have been no changes to the code of TPS since last year.
However, TPS has been run with many modes on the THF problems from the TPTP.
This information has been used to generate a new strategy schedule which
should make TPS more competitive.
Vampire-LTB 1.8
Krystof Hoder, Andrei Voronkov
The University of Manchester, United Kingdom
Architecture
Vampire 1.8, is an automatic theorem prover for first-order
classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
The splitting rule in kernel adds propositional parts to clauses, which are
manipulated using binary decision diagrams and a SAT solver.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Strategies
The Vampire 1.8 kernel provides a fairly large number of options for
strategy selection. The most important ones are:
Implementation
Vampire 1.8 is implemented in C++.
Expected Competition Performance
Vampire-LTB 1.8 is the CASC-23 LTB division winner.
Vampire 2.6
Krystof Hoder, Andrei Voronkov
University of Manchester, England
Architecture
Vampire 2.6 is an automatic theorem prover for first-order classical logic.
It consists of a shell and a kernel.
The kernel implements the calculi of ordered binary resolution and
superposition for handling equality.
It also implements the Inst-gen calculus.
The splitting rule in kernel adds propositional parts to clauses, which are
being manipulated using binary decision diagrams and a SAT solver.
A number of standard redundancy criteria and simplification techniques are
used for pruning the search space: subsumption, tautology deletion,
subsumption resolution and rewriting by ordered unit equalities.
The reduction ordering is the Knuth-Bendix Ordering.
Strategies
The Vampire 2.6 kernel provides a fairly large number of options for strategy se
lection. The most important ones are:
Implementation
Vampire 2.6 is implemented in C++.
Expected Competition Performance
We expect Vampire 2.6 to outperform the last year's Vampire 1.8, especially
in the satisfiability-checking divisions.
Zenon 0.7.1
Damien Doligez
INRIA, France
Architecture
Zenon 0.7.1 [BDD07] is a theorem prover based on a proof-confluent
version of analytic tableaux. It uses all the usual tableau rules
for first-order logic with a rule-based handling of equality.
Strategies
Zenon is a fully automatic black-box design with no user-selectable
strategies.
Implementation
Zenon is implemented in OCaml.
Its most interesting data structure is the representation of
first-order formulas and terms: they are hash-consed modulo alpha
conversion.
http://zenon-prover.org
Expected Competition Performance
The first-order reasoning part of Zenon has not changed much since
2010, and the handling of equality is still really bad, so Zenon is
again expected to rank in the lower half of the field.