- For many years CASC has focussed on breadth of coverage over TPTP problems.
- Regular bursts of enthusiasm for "hard problems", e.g., solving TPTP problems with rating 1.00.
- Evidence of incremental development (as noted in the early days of CASC); see the system version numbers from recent CASCs.
- A new event, with new stimulation, focussed on hard problems provided by the entrants ... the IJCAR ATP System Challenge (IASC, pronounced "I ask"!).

- The CADE ATP System Competition (CASC) will run in CADE years
- The IJCAR ATP System Challenge (IASC), will run in IJCAR years.
- IASC is modelled after soccer competitions, e.g., the UEFA Champions League, that have a league phase and a knockout phase. The IASC league phase has only one group containing all systems, and the knockout phase starts with 8 systems. Read all the details below.

- Each entrant submits 3+2 challenge problems to me
- I keep the challenge problems secret until the day of the challenge (live at IJCAR).
- Each problem must be accomanied by a proof in TPTP format, which I'll verify.
- It is OK to submit a problem from the TPTP.

- Use just one format of problems, either FOF (because there will be more systems), or TF0 without arithmetic (because it's a better language).
- Systems are submitted and tested for soundness, as in the competition.

- League Phase: Each system plays a game against each other system, using their first 3
challenge problems, i.e., 6 problems per game.
- The time limit will be T min wall clock (I think T=5 is OK for the resources available - it will depend on the number of entrants - see the example numbers below).
- A solution must be a proof in TPTP format, which I'll verify.
- At the end of the League Phase eight systems are chosen by the panel, based on the systems' performances. While the numbers of problems solved in the games will be a strong influence on the choice, the panel can use their discretion, e.g., in the case of ties.
- The eight systems are given a ranking, the best is rank-1, down to the worst rank-8.

- Knockout Phase:
- Round 1: Four games are played - rank-1 v rank-5, rank-2 v rank-6,
rank-3 v rank-7, rank-4 v rank-8.
In each game all 5 challenge problems of the two systems are used, i.e., 10 problems per
game.
- The time limit is 2*T mins.
- If any game results in a tie, a tie-break is run using problems randomly selected from the challenge problems from other systems in Round 1, until one system gets ahead.
- A solution must be a proof in TPTP format, which I'll verify.
- The winners are winner-1-5, winner-2-6, etc.

- Round 2: Two games are played - the winner-1-5 v winner-3-7, and winner-2-6 v
winner-4-8.
- Same rules as Round 1.
- The winners are winner-1-5-3-7, winner-2-6-4-8, etc.

- Round 3 - The Finals: One game is played - winner-1-5-3-7 v winner-2-6-4-8.
- Same rules as Round 1.
- The winner is the "Champion of the Challange"

- Round 1: Four games are played - rank-1 v rank-5, rank-2 v rank-6,
rank-3 v rank-7, rank-4 v rank-8.
In each game all 5 challenge problems of the two systems are used, i.e., 10 problems per
game.
- Human Phase:
- This is for scientific interest, and is not a competition.
- All the problems are released online.
- Developers are given 1 week to tweek their systems.
- A second League Phase is run, using all 5 problems from all systems. The same time limit applies as the original League Phase.
- The results will be reported in the Journal report of the Challenge. A detailed evaluation with system decriptions by the developers with be published separately.

- Maybe increase the time limit as the Knockout Rounds progress?
- I'm nervous about having humans in the loop, because they introduce variability that might be contraversial. Having the systems submitted in advance, tested for soundness, then used consistently in the competition is cleaner.

- The League Phase can be run in parallel on the 30 nodes of StarExec. If there are N systems, then there are N*6 system-problem pairs per game, but a system needs to run its own 3 problems only once. Maximally each pair will take T mins (and I expect a higher fraction unsolved than in the traditional CASC because systems will submit problems that are "hard" for other systems).
- Each system has to attempt 3*N problems. That will take 3*N(problems) * N(systems) * T(mins)/30(nodes) mins, e.g., if N=10 and T=5, 30 min.
- In order to make the Knockout Phase exciting, like a real sports event where a system can play only one game at a time, I'll use only two nodes per game (one per system). Each game has 10 problems, 2 systems. So a game will take 10*2(problems * 2(systems) * T(mins)/2(nodes) mins, e.g., 100 mins for T=5.
- The games in each Knockout Round are run in parallel, so for three rounds that'll take maximally 300mins. But each winning system in Rounds 2 and 3 has already run its own 5 problems, so it's less than that.