Topic > How hard should test instances be...

1 Introduction

During instance-specific macro learning experiments [1], we faced a problem: there was no significant difference between the perfect model and the other macro templates/sets. Initially I thought that learning in general was not useful, but then I realized that the problem was caused in part by the way I collected the data. The test instances were too easy to capture any significant differences in performance between models, so we need to make test instances more difficult to solve in general. It is also possible to fall into the opposite extreme, that is, to make problems so difficult that no model can solve most cases. In practice, before accepting a test instance, I think I should set a lower bound on the solving time of the empty macro set and an upper bound on the solving time of the perfect model, so as to get a clear view of the differences between the models/macro sets.

2 Details

We need to find sufficiently difficult instances to test macro performance. It is crucial that the instances I test are hard, because otherwise the differences in the models' performance may not show. This is easy to check: if an instance is solved very easily on the empty macro set, I should not include it in the results. So I add a lower bound on the solving time of the empty set:

    T(i, m0) > MinTime

where T(i, m) is the time to solve instance i with macro set m, and m0 is the empty set. But is this assumption correct? If we fix the bound in advance, before running the experiment, it is probably fine. My argument is this: keeping in mind that we want to measure the significance of the performance difference between models/macro sets, and given that the process-switching time of current operating systems is non-zero, we should make that assumption, because there will be small instances ... and it was difficult to avoid generating such instances for testing. It is not possible to fully control the output of a random problem generator, and the mprime problems came out either relatively easy or extremely difficult. So, the only way I found to make the comparison fairer was to apply the upper bound to the perfect model, as discussed above.

This method was very effective in demonstrating that the perfect model is superior to the other macros/models: in many cases where not all macro sets timed out, the perfect model (or an imaginary best-prediction model) solved the problem in a fraction of a second, while most of the other macros needed more than 50 minutes! This showed that removing these misleading entries gave us a clearer and more correct view of the power of instance-specific macros.
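
The filtering rule above can be summarized as: keep instance i only if T(i, m0) > MinTime and T(i, m*) < MaxTime, where m* is the perfect model. Below is a minimal sketch of how such a filter might look, assuming a solve_time(instance, macro_set) helper that runs the planner and returns the solving time in seconds; the names MIN_TIME, MAX_TIME and keep_instance, as well as the threshold values, are illustrative placeholders and not the code used in the original experiments.

    # Hypothetical sketch of the instance-filtering rule; solve_time and the
    # threshold values below are assumptions, not the original experimental code.
    MIN_TIME = 1.0      # lower bound on T(i, m0): drop instances that the empty
                        # macro set solves almost instantly (OS-noise level)
    MAX_TIME = 1800.0   # upper bound on T(i, m*): drop instances that even the
                        # perfect model cannot solve before the timeout

    def keep_instance(instance, solve_time, empty_set, perfect_model):
        """Keep an instance only if it is hard for the empty macro set
        but still solvable by the perfect model within the time limit."""
        t_empty = solve_time(instance, empty_set)        # T(i, m0)
        t_perfect = solve_time(instance, perfect_model)  # T(i, m*)
        return t_empty > MIN_TIME and t_perfect < MAX_TIME

    # Hypothetical usage over the output of a random problem generator:
    # test_set = [i for i in generated_instances
    #             if keep_instance(i, solve_time, empty_set, perfect_model)]

With such a filter, only instances that can separate the macro sets by a measurable margin end up in the reported results.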