Increasing the weight of difficulty
Many players express frustration about the strong influence of execution scores and the comparatively small influence of difficulty scores on tournament results, which leads to a risk-averse style of play. Systematic analyses of tournament results have demonstrated that this is not just a perception by the competitors but a fact: the committee calculated the standard deviations of all three judging categories for 72 pools from Paganello and FPAW 2009 and 2010.
Summing up, the data shows that Difficulty has a lower weight than Execution. AI, however, is the dominant category (see table below; more detailed results available if needed). If AI is set to 100% in terms of weight, then Execution would be 74% and Difficulty 61% in this sample.
Since nobody wants to increase the scoring weight of Execution, the committee thinks the best approach is to find a way to increase the weight of Difficulty to the same level as AI. All committee members agreed that the best way to do this is to educate judges to use the whole difficulty rating scale, instead of mainly scores from 3 to 7, and thereby increase the variance of Difficulty scores. However, since judging education is a long process and requires a lot of effort to fully accomplish, committee members agreed to implement a multiplier as a transitional measure that solves the issue mathematically until judges are better educated and subsequent analyses indicate a wider variation of difficulty scores on routine score sheets. The multiplier increases the weight of Diff simply by multiplying the overall Difficulty scores of all teams by 1.5 when the final results of a pool are calculated. This would bring the weight of Difficulty to 92% in the analysed sample, while AI would stay at 100% and Execution at 74%.
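For illustration, a minimal sketch of how the 1.5 multiplier would be applied to each team's overall Difficulty score when a pool's final results are calculated (this is not official scoring code; team names and numbers are made up):

DIFF_MULTIPLIER = 1.5  # transitional factor proposed above

def pool_result(raw_scores):
    """raw_scores: dict mapping team -> (AI, Execution, Difficulty) totals."""
    return {
        team: ai + execution + difficulty * DIFF_MULTIPLIER
        for team, (ai, execution, difficulty) in raw_scores.items()
    }

example_pool = {
    "Team A": (24.5, 22.0, 18.5),
    "Team B": (23.0, 24.5, 17.0),
}
print(pool_result(example_pool))
# Team A: 24.5 + 22.0 + 18.5 * 1.5 = 74.25
# Team B: 23.0 + 24.5 + 17.0 * 1.5 = 73.0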
IF the problem lies in the lack of education for the Diff judges and we skew the scores to rectify it, then the judges who are educated will be way out of range. When I judge difficulty, I recognize that these players are the tens in the sport, and that is because they do “ten” moves. When was the last time you gave a ten?
I give at least one ten a round…
The multiplier is a good tool. The problem with using 1.5 as an average multiplier for Diff is that the right value doesn’t turn out to be the same every time, as shown in the table above. We used to, and should continue to, calculate the value of the multiplier based on the standard deviation of the components of that round of play. Scores are close, and using the actual SD and a precise multiplier matters. Accuracy should not be taken so lightly as to assign one fixed value to the multiplier. In the example, Diff was only brought to 92% of AI, which is not equal, and could make a difference if scores are close. In a pinch I understand that 1.5 is better than nothing, but why not be totally accurate and equalize AI and Diff?
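A rough sketch of what I mean (deriving the multiplier from the sample standard deviations is my reading of the old formula, and the numbers are made up):

import statistics

def diff_multiplier(ai_scores, diff_scores):
    """Multiplier that makes the spread of Diff match the spread of AI for this round."""
    return statistics.stdev(ai_scores) / statistics.stdev(diff_scores)

ai = [9.2, 7.1, 5.4, 8.0]    # AI totals for one pool (made up)
diff = [6.1, 5.2, 4.8, 5.9]  # Diff totals for the same pool (made up)
m = diff_multiplier(ai, diff)
adjusted_diff = [round(d * m, 2) for d in diff]
print(round(m, 2), adjusted_diff)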
I’m glad Mikey brought up that old multiplier. It was a great idea at the time, but without a computer on site it was very time consuming to execute, at least for those who happened to know the secret formula. Now we can not only have a master laptop at every event, but tablets and results monitors are possible as well. Back to the old-style multiplier: it still had the same vulnerability of being too execution-heavy that the system has today. With current technology, we can not only calculate a per-round, flexible, mean-based balancing multiplier to balance the AI and Diff categories, but we could also consider using the execution score as a secondary multiplier to get a complete technical merit score. For example, take a raw Diff score of 6 and a flexible balancing multiplier of, let’s just say, 1.33, and multiply that by an Ex score of 8.5 (brought down a decimal place to 0.85): one would earn a tech merit score of 6.78. Since we have proposed to ergonomically simplify AI, those AI judges could record execution as well. Diff can’t do it (judge Ex), because it would be a conflict and a distraction from not penalizing for errors in Diff, as we’re supposed to be doing now.
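Here is that example worked out as a small sketch (the function name is just for illustration, nothing official):

def tech_merit(raw_diff, balancing_multiplier, execution):
    # Execution (0-10) is moved down one decimal place and used as a factor.
    return raw_diff * balancing_multiplier * (execution / 10.0)

print(round(tech_merit(6, 1.33, 8.5), 2))  # 6 * 1.33 * 0.85 = 6.78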
But where I’m going with this is a pathway to doing away with execution as a standalone category and moving to a binary system, which I know, from many years of talking with other players around the world about judging, is worth considering. Then only 6 judges would be necessary for an average event, and 10 judges, with 5 per panel, could be used for large international events, where we throw out the high and low scores of each category. Many thanks to the committee for offering this forum for input.
First I’d like to answer Z’s question, “When was the last time you gave a ten?” I only gave a 10 once or twice, and that was for some insane combo from Tommy or Fabio Sanna. Why? Because nobody educated my judging skills.
The only time I attended a judging lesson was at one of the German Opens in the early 2000s. After that, everything I learned about judging came from other judges sitting next to me and, as you can imagine, many times I was given subjective and contradicting opinions about judging.
OK, it is actually not that bad. In the meantime I have read the competition manual and discussed the judging issue a lot. But my point is the following:
We need better judging education for the players!!! Why is there no obligatory judging lesson on the first day of every tournament?
Here is my opinion on the Diff multiplier:
I think balancing AI and Diff is an essential tool to tackle some of the major shortcomings of the current judging system.
These days we have tournaments all over the world, and each tournament has players with a different level of play. The range goes from small tournaments meant to spread the sport, where new jammers are “forced” to participate, to national championships, to major events like Paganello and FPAW. How can an inflexible judging system account for this diversity?
I like the idea of balancing AI and Diff in each pool. For instance, this can be done by scaling the results such that the maximum (of AI or Diff) over all teams is equal to 10. Hence, there would be two different multipliers, one for AI and one for Diff.
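A quick sketch of what I mean, with made-up numbers (the helper name is just for illustration):

def rescale_to_ten(scores):
    """scores: dict team -> category total; scale so the best team sits at exactly 10."""
    multiplier = 10.0 / max(scores.values())
    return {team: round(s * multiplier, 2) for team, s in scores.items()}, multiplier

ai = {"Team A": 8.2, "Team B": 6.9}    # AI totals in one pool (made up)
diff = {"Team A": 5.5, "Team B": 6.0}  # Diff totals in the same pool (made up)
print(rescale_to_ten(ai))    # ({'Team A': 10.0, 'Team B': 8.41}, ...)
print(rescale_to_ten(diff))  # ({'Team A': 9.17, 'Team B': 10.0}, ...)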
I agree with Jan that our judging education is miserable, and we should improve it. The suggestion of an obligatory judging lesson is great!
I would also require every competitor to read the manual at least once before he/she competes!
Educating Judges to use the full scale – OK!
Using a Multiplier to balance the categories – also OK.
But I think if the judges know their scores will be multiplied, they won’t change their habits or their scale.
When will the multiplier be used, and who will decide? Will it be a rule to always use the multiplier, or only when there is a gap between the categories?
1.5x doesn’t seem to be enough. The range of AI scores on any given judge’s sheet can be huge, while the range of Diff scores is almost always narrow, because (1) it’s an average rather than a granted score and (2) judges generally don’t give super low or super high scores. A fixed 1.5x doesn’t seem to solve that.
Thank you committee for the great analysis – very professional and helpful!
I don’t like the multiplier because I think it’s important to have immediate and published scores (for us and for the spectators), and I’m afraid that the multiplier will extend the time until the results are up.
But I agree that Diff has too little weight. Both AI and technique/Diff (see my comment 2) should have more or less the same weight. To have the multiplier and immediate scores, we need more technical support at the judging tables.
Maybe we could also try to solve the problem with education (Diff judges need to use the full range!!!). I agree 100% with Jan Schreck’s words: “…We need better judging education for the players!!! Why is there no obligatory judging lesson on the first day of every tournament?…”.
Could it be helpful if Diff judges just had to mark boxes every 15 seconds?
Box 1: very easy (meaning the average stuff in those 15 seconds was very easy)
Box 2: easy
Box 3: medium
Box 4: difficult
Box 5: very difficult
Boxes 1-2-3-4-5 would give 2-4-6-8-10 points. I think I would check box 5 more often than I give a 10 with the current system.
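As a rough illustration of how the boxes could turn into a Diff score (I’m assuming the checked boxes would simply be averaged; only the point values are specified above):

BOX_POINTS = {1: 2, 2: 4, 3: 6, 4: 8, 5: 10}  # very easy .. very difficult

def diff_from_boxes(checked_boxes):
    """checked_boxes: one box number (1-5) per 15-second interval."""
    points = [BOX_POINTS[b] for b in checked_boxes]
    return sum(points) / len(points)

# A hypothetical 3-minute routine has twelve 15-second intervals:
routine = [3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4]
print(diff_from_boxes(routine))  # 7.0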
This time too I agree with Reto: we must use the full scale. It’s not easy, and I think I have made a lot of mistakes, but I think splitting the judging into boxes is a brilliant idea. I don’t know if it is possible, but maybe that’s the right way. I appreciate your hard work, but I don’t think the multiplier is a good idea.
Thank you for working so hard to find a way to weight difficulty evenly with the other categories. Unless there is a compelling reason to use 1.5, couldn’t we simply use the multiplier that makes the categories evenly weighted at 100%, rather than fixing it at 1.5? (Judging is automated in spreadsheets now, so potential errors in multiplication should be less of an issue, if that is a concern.) Reto’s checkbox idea is interesting.