The CADE-14 ATP System Competition
to be held at
Sheraton Breakwater Hotel, Townsville, Australia
16th July, 1997
Call for Participation
The CADE-14 ATP System Competition will evaluate the performance of sound,
fully automatic, first-order ATP systems, in terms of:
- the number of problems solved, and
- the average runtime for successful solutions;
in the context of:
- a bounded number of eligible problems, chosen from the
TPTP Problem Library, and
- a specified CPU time limit for each solution attempt.
The competition machines are SUN Ultra 140s, each with 64MB memory and
running Solaris 2.5.1.
The competition is being organized by
Christian Suttner and
Geoff Sutcliffe.
If you have any questions about the competition, please
email the organizers.
The competition will be overseen by a panel of knowledgeable researchers who
are not participating in the event.
It is planned to publish the competition results in a form that includes
contributions written by entrants.
Competition Divisions
The competition will be divided into divisions according to syntactic problem
characteristics:
- The MIX Division:
Mixed CNF Non-Propositional Theorems
Mixed means Horn and non-Horn problems, with or without
equality, but not unit equality problems (see the UEQ division below).
The MIX division is divided into four categories:
- Horn with No Equality
- Non-Horn with No Equality
- Horn with Equality
- Non-Horn with Equality
- The UEQ Division: Unit Equality CNF Theorems
Theorems containing only unit equality clauses.
- The SAT Division:
Mixed CNF Non-Propositional Non-theorems
Entry in the competition is subject to the following rules:
Problem Selection and Preparation
The problems to be solved will be selected from the
TPTP Problem Library, v2.0.0.
The TPTP difficulty rating scheme identifies problems as:
- Solvable by all state-of-the-art ATP systems (easy)
- Solvable by some state-of-the-art ATP systems (difficult)
- Solvable by no state-of-the-art ATP systems (unsolved)
- Theorem-hood unknown (open)
The competition will use "difficult" TPTP problems.
The problems will be selected to ensure an even distribution of problem
difficulty.
The TPTP distinguishes versions of problems as one of standard, non-standard,
incomplete, or special.
The competition will use standard TPTP problems.
The eligible problems for each competition division and category can be
extracted from the TPTP using the tptp1T script that is distributed with
the TPTP, as follows:
- MIX Division, Horn with No Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard Horn NoEq
- MIX Division, Non-Horn with No Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard NonHorn NoEq
- MIX Division, Horn with Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard Horn SomeEq
and
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard Horn NonUnitEq
- MIX Division, Non-Horn with Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard NonHorn SomeEq
- UEQ Division
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard UnitEq
- SAT Division
tptp1T CNF NonProp Satisfiable Rating 0.00 0.00 Standard
and
tptp1T CNF NonProp Satisfiable Rating 0.00 0.00 Special
Note that all the eligible SAT problems are "easy".
We have had to accept this to get some eligible problems.
Further, Special problems have been accepted, as there
are too few Standard problems.
Finally, there have been a few changes to problem version status since
the release of TPTP v2.0.0, so check the explicit eligible problems
list.
- FOF Demonstration Division
tptp1T FOF Standard NonProp Theorem
There is no Rating information available for
FOF problems.
The number of problems to be used will be chosen between a minimum that
ensures sufficient confidence in the competition results (the competition
organizers will ensure that there are sufficient resources available), and
a maximum determined from
the number of workstations available,
the time allocated to the competition,
the number of ATP systems, and
the minimal time limit:
Number of workstations * Time for competition
Maximum = ---------------------------------------------
Number of systems * Minimal time limit
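As a rough illustration of the bound above, here is a minimal Python sketch. The function name and the figures passed to it are hypothetical and chosen only for the example; they are not the actual competition parameters. The sketch assumes the result is rounded down, since a fractional problem cannot be run.

```python
def max_problems(workstations: int, competition_minutes: float,
                 systems: int, min_time_limit_minutes: float) -> int:
    """Upper bound on the number of problems, following the formula above."""
    total_machine_minutes = workstations * competition_minutes
    # Worst case: every system uses the full minimal time limit on a problem.
    minutes_per_problem = systems * min_time_limit_minutes
    # Assumption: round down to a whole number of problems.
    return int(total_machine_minutes // minutes_per_problem)


# Hypothetical figures, for illustration only.
print(max_problems(workstations=10, competition_minutes=360,
                   systems=8, min_time_limit_minutes=3))  # prints 150
```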
The problems to be used will be randomly selected on the day of the
competition, from the most recent TPTP release.
The tptp2X utility, distributed with the TPTP, will be used to:
- replace predicate and function symbols by short, meaningless symbols,
- randomly reorder the clauses and literals,
- randomly reverse the unit equalities in the UEQ division,
- remove equality axioms that are not needed by an ATP system,
- output the problems in the format required.
(If the tptp2X utility does not support the format required, then it is
the entrant's responsibility to provide a reformatting program.
The most acceptable form for this program is a tptp2X format module.
If the ATP system is publicly available, then the competition organizers
may be able to produce such a module.)
Time Limits and Timing
A CPU time limit will be imposed on individual solution attempts.
The time limit will be chosen between a minimum of 3 minutes (the competition
organizers will ensure that there are sufficient resources available), and
a maximum determined from
the number of workstations available,
the time allocated to the competition,
the number of ATP systems, and
the minimal number of problems to be used:
Number of workstations * Time for competition
Maximum = ----------------------------------------------
Number of systems * Minimal number of problems
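The range from which the time limit is chosen can be sketched in Python as follows. The function names and the example figures are hypothetical; only the 3-minute minimum and the shape of the formula come from the text above. Working in whole seconds is an assumption of this sketch.

```python
MIN_TIME_LIMIT_SECONDS = 3 * 60  # the 3-minute minimum stated above


def max_time_limit_seconds(workstations: int, competition_seconds: int,
                           systems: int, min_problems: int) -> int:
    """Upper bound on the per-problem CPU time limit, in whole seconds."""
    return (workstations * competition_seconds) // (systems * min_problems)


def time_limit_range(workstations: int, competition_seconds: int,
                     systems: int, min_problems: int) -> tuple:
    """Return (minimum, maximum) time limit in seconds."""
    upper = max_time_limit_seconds(workstations, competition_seconds,
                                   systems, min_problems)
    if upper < MIN_TIME_LIMIT_SECONDS:
        # The resources do not even allow the stated minimum.
        raise ValueError("resources too small for the 3-minute minimum")
    return MIN_TIME_LIMIT_SECONDS, upper


# Hypothetical figures: 10 workstations, 6 hours, 8 systems, 100 problems.
print(time_limit_range(10, 6 * 3600, 8, 100))  # prints (180, 270)
```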
The timing will be done in units of 1 second, and the minimal time to find
a solution is 1 second.
If an ATP system cannot solve a problem, the runtime will be set to the
time limit.
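The timing rules above amount to a small function. The sketch below is one reading of them, not the organizers' script: the name recorded_runtime is hypothetical, and rounding a measured CPU time up to the next whole second is an assumption, since the text only states the 1-second unit and the 1-second minimum.

```python
import math


def recorded_runtime(cpu_seconds: float, solved: bool, time_limit: int) -> int:
    """Runtime in whole seconds as it would enter the results."""
    if not solved:
        # An unsolved problem is charged the full time limit.
        return time_limit
    # Assumption: round up to whole seconds; never report less than 1 second.
    return max(1, math.ceil(cpu_seconds))


print(recorded_runtime(0.2, solved=True, time_limit=300))    # prints 1
print(recorded_runtime(12.3, solved=True, time_limit=300))   # prints 13
print(recorded_runtime(42.0, solved=False, time_limit=300))  # prints 300
```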
System Execution
It is the responsibility of each entrant to ensure that their ATP system is
operational on the competition hardware, by 4th July 1997.
The ATP systems will be tested for soundness before the competition.
Systems which fail this test, or are found to be unsound at any time during
or after the competition, will be disqualified.
The ATP systems will be executed by a shell script that invokes each system
by a single command line, with the TPTP file name, the time limit (if required
by the entrant), and entrant specified system switches (the same for all
problems) as command line arguments.
The command line may not use UNIX shell features, e.g., redirection and
piping cannot be used.
The time limit will be imposed by sending a SIGXCPU (signal 30) to the ATP
system.
The ATP systems must be interruptible by SIGXCPU.
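Entrants who want to check locally that their system copes with SIGXCPU could use something like the Python sketch below. It is not the organizers' script. It assumes a UNIX system and uses the kernel's CPU resource limit (RLIMIT_CPU), which delivers SIGXCPU when the soft limit is exceeded, as a stand-in for the competition's own mechanism. The function name run_system is hypothetical. The numeric value of SIGXCPU differs between platforms, so the symbolic name is the one to rely on.

```python
import resource
import subprocess
import sys
from typing import Sequence


def run_system(command: Sequence[str],
               cpu_limit_seconds: int) -> subprocess.CompletedProcess:
    """Run one solution attempt as a single command line under a CPU limit."""

    def set_cpu_limit() -> None:
        # Runs in the child process just before the ATP system is started.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        soft = cpu_limit_seconds
        if hard != resource.RLIM_INFINITY:
            soft = min(soft, hard)  # the soft limit may not exceed the hard one
        # The kernel sends SIGXCPU once the soft CPU limit is exceeded.
        resource.setrlimit(resource.RLIMIT_CPU, (soft, hard))

    # shell=False: no redirection or piping, matching the rule above.
    return subprocess.run(list(command), shell=False, preexec_fn=set_cpu_limit,
                          capture_output=True, text=True)


# Stand-in for an ATP system: a command that prints a line and exits.
result = run_system([sys.executable, "-c", "print('ok')"], cpu_limit_seconds=5)
print(result.returncode, result.stdout.strip())
```

A system that is killed by the signal rather than handling it shows up with a negative return code equal to -signal.SIGXCPU.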
When terminating, the ATP system must output a distinguished string (specified
by the entrant) to stdout, indicating the result:
- A proof exists, or
- No proof exists (clauses are satisfiable), or
- No solution found.
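A checking script could map the distinguished strings to the three outcomes roughly as in the following Python sketch. The marker strings shown are hypothetical placeholders; each entrant specifies their own. Treating output that contains no marker as "no solution found" is an assumption of this sketch.

```python
from enum import Enum
from typing import Dict


class Result(Enum):
    PROOF_EXISTS = "a proof exists"
    NO_PROOF_EXISTS = "no proof exists (clauses are satisfiable)"
    NO_SOLUTION = "no solution found"


# Hypothetical distinguished strings for one entrant.
EXAMPLE_MARKERS: Dict[str, Result] = {
    "PROOF-FOUND": Result.PROOF_EXISTS,
    "SATISFIABLE": Result.NO_PROOF_EXISTS,
    "GAVE-UP": Result.NO_SOLUTION,
}


def classify(stdout_text: str, markers: Dict[str, Result]) -> Result:
    """Return the outcome named by the last marker line in the output."""
    for line in reversed(stdout_text.splitlines()):
        for marker, result in markers.items():
            if marker in line:
                return result
    # Assumption: output without any marker counts as no solution found.
    return Result.NO_SOLUTION


print(classify("searching...\nPROOF-FOUND\n", EXAMPLE_MARKERS))
```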
The ATP systems are not required to output solutions (proofs or models).
However, systems that do output solutions will be highlighted in the
presentation of results.
For every problem solved, the solution process must be reproducible by running
the system again.
Performance Evaluation
The systems will be ranked within each competition division and category.
The ranking will be according to the number of problems solved.
If several systems solve the same number of problems, then those systems
will be ranked according to their average runtimes over solutions found.
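The ranking rule above can be sketched in Python as a sort on two keys. The system names and runtimes are invented for illustration, and the function name rank_systems is hypothetical. Placing a system with no solutions last among its ties is an assumption.

```python
from statistics import mean
from typing import Dict, List


def rank_systems(solution_times: Dict[str, List[int]]) -> List[str]:
    """Order systems by problems solved, then by average runtime over solutions.

    solution_times maps a system name to the runtimes (in seconds) of the
    problems it solved; unsolved problems are simply absent from the list.
    """
    def key(name: str):
        times = solution_times[name]
        # Assumption: a system with no solutions sorts last among its ties.
        average = mean(times) if times else float("inf")
        return (-len(times), average)

    return sorted(solution_times, key=key)


# Invented data: A and B both solve three problems, B is faster on average.
example = {"A": [10, 20, 30], "B": [5, 5, 5], "C": [1, 1]}
print(rank_systems(example))  # prints ['B', 'A', 'C']
```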
Note: If only one ATP system is registered for a particular division or
category, no winner can be announced for that division or category, but the
results for that system will still be presented.
FOF (First Order Form) Demonstration Division
The infrastructure for a FOF competition division will not be ready for
CASC-14.
However, systems that can deal with FOF syntax can demonstrate their abilities
in the FOF Demonstration division
(the tptp2X utility will contain a clausifier that may be prepended to a CNF
system to form a FOF system; the tptp2X runtime is included in the total
runtime).
Ideally the systems will run on locally provided standard UNIX workstations,
but use of any hardware supplied by the entrant or accessed via the
Internet is acceptable.
The FOF Demonstration division will use non-trivial FOF theorems randomly
selected from the TPTP.
A CPU time limit, equal to the one in the competition divisions, will be
imposed on each solution attempt.
The system execution will be controlled by a perl script, provided
by the competition organizers.
The results will be presented, but no winner assessment will be made.
Special Hardware Demonstration Divisions
CNF ATP systems that cannot run on the locally provided standard UNIX
workstations may enter the Special Hardware Demonstration divisions.
The hardware is supplied by the entrant or accessed via the Internet.
The rules for entry are the same as for the competition divisions, and
the same problems will be used.
A wall clock time limit, equal to the CPU time limit in the competition
divisions, will be imposed on each solution attempt.
The system execution will be controlled by a perl script, provided
by the competition organizers.
The results will be presented along with the competition divisions' results,
but no winner assessment will be made.
Conditions for Participation
Entering CASC-14 is subject to the following rules:
Disclaimer
Every effort is being made to organize the competition in a fair and
constructive manner.
No responsibility will be taken if, for one reason or another, your system
does not win.
Competition Dinner
A dinner for entrants and associates will be held on the evening of 14th July.
To attend this dinner it is necessary to register for the competition, using
the CADE-14 registration form.
Note: You have to be directly associated with an entered ATP system to attend
this dinner.
It's an exclusive event.
(Spouses are welcome to come along; please indicate this in your registration.)
ATP System Registration Form
Registration Deadline: 15th June 1997
Register as early as possible, so that the organizers can ensure that
sufficient resources are available. Do it now!