Yeah, I meant to say (and edited) that the evaluator operates once per mesh point.
At least from the looks of per_pixel_eqn.c, calculations are done mesh point by mesh point, and eval_gen_expr() only returns a single double value per call (specifically, answer_matrix[i][j] = eval_gen_expr(eqn_ptr); ).
Instead of calling eval_gen_expr() for each i,j and operating on single values (y_per_pixel, etc.), you could call it once per equation, passing it x_matrix, etc., and have it return answer_matrix all in one go.
Thus in eval.c, for example when performing addition in eval_tree_expr(), instead of:
return (left_arg + right_arg);
(i.e. a single operation nestled inside of a bunch of branches), you could have
for(i = 0; i<gx; i++)
    for(j = 0; j<gy; j++)
        answer[i][j] = left_arg[i][j] + right_arg[i][j];
That way all of the if() and switch() statements within eval.c are run only once per equation, not once per equation per mesh point. In an ideal world this wouldn't make too much difference, but with the Pentium 4's wonderful 20-stage pipeline, where every mispredicted branch means flushing and refilling the pipeline, avoiding branches could make a huge difference.
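To make the idea concrete, here is a rough sketch of what a whole-mesh tree evaluator could look like. This is not the actual eval.c code: the node layout, the NODE_* names and eval_tree_mesh() are all made up for illustration, and the mesh is assumed to be stored as one flat array of gx*gy doubles rather than as answer[i][j].

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node types; the real eval.c has its own tree layout. */
typedef enum { NODE_CONST, NODE_VAR, NODE_ADD, NODE_MUL } node_type;

typedef struct node {
    node_type type;
    double value;              /* used by NODE_CONST */
    const double *var;         /* used by NODE_VAR: flat gx*gy array, e.g. x_matrix */
    struct node *left, *right; /* used by NODE_ADD / NODE_MUL */
} node;

/* Evaluate one expression tree over the whole mesh (n = gx*gy points).
 * The switch runs once per tree node, not once per node per mesh point. */
static void eval_tree_mesh(const node *t, double *out, size_t n)
{
    size_t k;

    assert(t != NULL && out != NULL);

    switch (t->type) {
    case NODE_CONST:
        for (k = 0; k < n; k++)
            out[k] = t->value;
        break;
    case NODE_VAR:
        for (k = 0; k < n; k++)
            out[k] = t->var[k];
        break;
    case NODE_ADD:
    case NODE_MUL: {
        /* Scratch buffer for the right operand; a real version would
         * probably reuse preallocated buffers instead of malloc(). */
        double *rhs = malloc(n * sizeof *rhs);
        if (rhs == NULL)
            abort();
        eval_tree_mesh(t->left, out, n);
        eval_tree_mesh(t->right, rhs, n);
        if (t->type == NODE_ADD) {
            for (k = 0; k < n; k++)
                out[k] += rhs[k];
        } else {
            for (k = 0; k < n; k++)
                out[k] *= rhs[k];
        }
        free(rhs);
        break;
    }
    }
}
```

The catch is the extra memory traffic: every intermediate result is now a full mesh-sized array instead of a single double, so whether this wins in practice would have to be measured.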
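In addition, the for loop above could be rewritten with SIMD instructions: SSE2 if the values stay doubles, or SSE or 3DNow! if the mesh is switched to single-precision floats (those two only handle packed floats).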
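As a rough sketch of the SIMD version of the addition loop, assuming the mesh is stored as flat arrays of doubles and using the SSE2 intrinsics from emmintrin.h (add_mesh() is a made-up name, not something in the existing source):

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics: packed double operations */
#endif

/* answer[k] = left[k] + right[k] for k in [0, n), where n = gx*gy.
 * Falls back to the plain scalar loop when SSE2 is not available. */
static void add_mesh(double *answer, const double *left,
                     const double *right, size_t n)
{
    size_t k = 0;

#ifdef __SSE2__
    /* Two doubles per iteration; unaligned loads and stores so the
     * arrays do not need 16-byte alignment. */
    for (; k + 2 <= n; k += 2) {
        __m128d l = _mm_loadu_pd(left + k);
        __m128d r = _mm_loadu_pd(right + k);
        _mm_storeu_pd(answer + k, _mm_add_pd(l, r));
    }
#endif

    /* Scalar tail (or the whole array without SSE2). */
    for (; k < n; k++)
        answer[k] = left[k] + right[k];
}
```

With single-precision floats the same loop would handle four values per instruction (_mm_add_ps) instead of two, which is another argument for checking whether the mesh really needs doubles.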