The CADE-14 ATP System Competition

to be held at

Sheraton Breakwater Hotel, Townsville, Australia
16th July, 1997

Call for Participation

The CADE-14 ATP System Competition will evaluate the performance of sound, fully automatic, 1st order ATP systems, in terms of:

the number of problems solved, and
the average runtime for successful solutions;

in the context of:

a bounded number of eligible problems, chosen from the TPTP Problem Library, and
a specified CPU time limit for each solution attempt,

The competition machines are SUN Ultra 140s, each with 64MB memory and running Solaris 2.5.1.

The competition is being organized by Christian Suttner and Geoff Sutcliffe. If you have any questions about the competition, please email the organizers. The competition will be overseen by a panel of knowledgeable researchers who are not participating in the event. It is planned to publish the competition results in a form that includes contributions written by entrants.

This document contains information about the:

Competition Divisions
Problem Selection and Preparation
Time Limits and Timing
System Execution
Performance Evaluation
FOF Demonstration Division
Special Hardware Demonstration Divisions
Conditions for Participation
Competition Dinner
System Registration Form

Competition Divisions

The competition will be divided into divisions according to syntactic problem characteristics:

The MIX Division: Mixed CNF Non-Propositional Theorems
Mixed means Horn and non-Horn problems, with or without equality, but not unit equality problems (see the UEQ division below).
The MIX division is divided into four categories:
- Horn with No Equality
- Non-Horn with No Equality
- Horn with Equality
- Non-Horn with Equality
The UEQ Division: Unit Equality CNF Theorems
Theorems containing only unit equality clauses.
The SAT Division: Mixed CNF Non-Propositional Non-theorems

Entry in the competition is subject to the following rules:

ATP systems may be entered only at the division level.

ATP systems may be entered into more than one division (a system that is not entered into a particular division is assumed to be useless for that type of problem).

All problems will be presented in CNF. ATP systems that accept FOF problems may enter the FOF Demonstration division.
All systems must run on a single locally provided standard UNIX workstation. ATP systems that cannot run on the locally provided standard UNIX workstations may enter the Special Hardware Demonstration divisions.

Problem Selection and Preparation

The problems to be solved will be selected from the TPTP Problem Library, v2.0.0.

The TPTP difficulty rating scheme identifies problems as:

Solvable by all state-of-the-art ATP systems (easy)
Solvable by some state-of-the-art ATP systems (difficult)
Solvable by no state-of-the-art ATP systems (unsolved)
Theorem-hood unknown (open)

The competition will use "difficult" TPTP problems. The problems will be selected to ensure an even distribution of problem difficulty.

The TPTP distinguishes versions of problems as one of standard, non-standard, incomplete, or special. The competition will use standard TPTP problems.

The eligible problems for each competition division and category can be extracted from the TPTP using the tptp1T script that is distributed with the TPTP, as follows:

MIX Division, Horn with No Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard Horn NoEq
MIX Division, Non-Horn with No Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard NonHorn NoEq
MIX Division, Horn with Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard Horn SomeEq
and
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard Horn NonUnitEq
MIX Division, Non-Horn with Equality
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard NonHorn SomeEq
UEQ Division
tptp1T CNF NonProp Unsatisfiable Rating 0.01 0.99 Standard UnitEq
SAT Division
tptp1T CNF NonProp Satisfiable Rating 0.00 0.00 Standard
and
tptp1T CNF NonProp Satisfiable Rating 0.00 0.00 Special
Note that all the problems are "easy". We have had to accept this to get some eligible problems. Further, Special problems have been accepted, as there are too few Standard problems. Finally, there have been a few changes to problem version status since the release of TPTP v2.0.0, so check the explicit eligible problems list.
FOF Divisions
tptp1T FOF Standard NonProp Theorem
There is no Rating information available for FOF problems.

The number of problems to be used will be chosen between a minimum that ensures sufficient confidence in the competition results (the competition organizers will ensure that there are sufficient resources available), and a maximum determined from the number of workstations available, the time allocated to the competition, the number of ATP systems, and the minimal time limit:

          Number of workstations * Time for competition
Maximum = ---------------------------------------------
             Number of systems * Minimal time limit

The problems to be used will be randomly selected on the day of the competition, from the most recent TPTP release. The tptp2X utility, distributed with the TPTP, will be used to:

replace predicate and function symbols by short, meaningless symbols,
randomly reorder the clauses and literals,
randomly reverse the unit equalities in the UEQ division,
remove equality axioms that are not needed by an ATP system,
output the problems in the format required. (If the tptp2X utility does not support the format required, then it is the entrant's responsibility to provide a reformating program. The most acceptable form for this program is a tptp2X format module. If the ATP system is publically available, then the competition organizers may be able to produce such a module.)

Time Limits and Timing

A CPU time limit will be imposed on individual solution attempts. The time limit will be chosen between a minimum of 3 minutes (the competition organizers will ensure that there are sufficient resources available), and a maximum determined from the number of workstations available, the time allocated to the competition, the number of ATP systems, and the minimal number of problems to be used:

          Number of workstations * Time for competition
Maximum = ----------------------------------------------
          Number of systems * Minimal number of problems

The timing will be done in units of 1 second, and the minimal time to find a solution is 1 second. If an ATP system cannot solve a problem, the runtime will be set to the time limit.

System Execution

It is the responsibility of each entrant to ensure that their ATP system is operational on the competition hardware, by 4th July 1997. The ATP systems will be tested for soundness before the competition. Systems which fail this test, or are found to be unsound at any time during or after the competition, will be disqualified.

The ATP systems will be executed by a shell script that invokes each system by a single command line, with the TPTP file name, the time limit (if required by the entrant), and entrant specified system switches (the same for all problems) as command line arguments. The command line may not use UNIX shell features, e.g., redirection and piping cannot be used.

The time limit will be imposed by sending a SIGXCPU (signal 30) to the ATP system. The ATP systems must be interruptable by SIGXCPU.

When terminating, the ATP system must output a distinguished string (specified by the entrant) to stdout, indicating the result:

A proof exists, or
No proof exists (clauses are satisfiable), or
No solution found.

The ATP systems are not required to output solutions (proofs or models). However, systems that do output solutions will be highlighted in the presentation of results.

For every problem solved, the solution process must be reproducible by running the system again.

Performance Evaluation

The systems will be ranked within each competition division and category. The ranking will be according to the number of problems solved. If several systems solve the same number of problems, then those systems will be ranked according to their average runtimes over solutions found.

Note: If only one ATP system is registered for a particular division or category, no winner can be announced for that division or category, but the results for that system will still be presented.

FOF (First Order Form) Demonstration Division

The infrastructure for a FOF competition division will not be ready for CASC-14. However, systems that can deal with FOF syntax can demonstrate their abilities in the FOF Demonstration division (the tptp2X utility will contain a clausifer that may be prepended to a CNF system to form a FOF system; the tptp2X runtime is included in the total runtime). Ideally the systems will run on locally provided standard UNIX workstations, but use of any hardware supplied by the entrant or accessed via the Internet is acceptable.

The FOF Demonstration division will use non-trivial FOF theorems randomly selected from the TPTP. A CPU time limit, equal to the one in the competition divisions, will be imposed on each solution attempt. The system execution will be controlled by a perl script, provided by the competition organizers.

The results will be presented, but no winner assessment will be made.

Special Hardware Demonstration Divisions

CNF ATP systems that cannot run on the locally provided standard UNIX workstations may enter the Special Hardware Demonstration divisions. The hardware is supplied by the entrant or accessed via the InterNet.

The rules for entry are the same as for the competition divisions, and the same problems will be used. A wall clock time limit, equal to the CPU time limit in the competition divisions, will be imposed on each solution attempt. The system execution will be controlled by a perl script, provided by the competition organizers.

The results will be presented along with the competition divisions' results, but no winner assessment will be made.

Conditions for Participation

Entering CASC-14 is subject to the following rules:

A person must be nominated to handle all issues (including execution difficulties) arising before and during the competition.

It is not necessary for entrants to physically attend the competition.

It is assumed that each entrant has read all the documents related to the competition (they will be made available to entrants) and will comply with the competition rules. Non-compliance with the rules could lead to disqualification.

Entering many similar versions of the same system is deprecated. Entrants may be required to limit the number of system versions that they enter.

The precomputation and storage of any information for individual TPTP problems for usage during the competition is contrary to the spirit of the competition. The panel may disqualify unfairly doped systems.

Winners are expected to provide public access to their system's source code. If you want to participate without releasing your source code, please email the organizers.
Catch-all rule: no cheating is allowed; the panel may disqualify entrants due to unfairness and may adjust competition rules in case of misuse.

Disclaimer

Every effort is being made to organize the competition in a fair and constructive manner. No responsibility will be taken if, for one reason or the other, your system does not win.

Competition Dinner

A dinner for entrants and associates will be held on the evening of 14th July. To attend this dinner it is necessary to register for the competition, using the CADE-14 registration form.

Note: You have to be directly associated with an entered ATP system to attend this dinner. It's an exclusive event. (Spouses are welcome to come along; please indicate this in your registration.)

ATP System Registration Form

Registration Deadline: 15th June 1997


ATP System Name
Competition Divisions	MIX	UEQ	SAT
FOF Demonstration Division
SH Demonstration Divisions	MIX	UEQ	SAT	FOF

Entrant (handling all competition participation issues)
Name
Institution
Email
Phone
FAX
System URL
Associate's names

There is a competition registration fee of AU$50, which includes the competition dinner and an elegant CADE-14 ATP System Competition T-shirt. The nominated entrant must register and pay the fee for the competition, using the CADE-14 registration form (you don't need to do that right now, but you should send in this system registration immediately). If you cannot come to Townsville, we'll post you a T-shirt and eat your dinner. Associates who wish to attend the competition dinner and get a T-shirt, must also register.

Your system registration will be confirmed by email.