GJE Results Analysis

GRAND JURY EUROPEEN

THE STATISTICIAN'S POINT OF VIEW

Bernard Burtschy, professor of statistics, member of the Jury and one of the winners of the Euro Cave-European Grand Jury Trophy, offers a statistical analysis of the official results and draws his conclusions from it.

------------------------------------

Classifying wines, vintages, even wine-tasters, with a view to quantifying them naturally interests the statistician, a man preoccupied with figures. The process is all the more interesting because using a set of collective scores to draw up a classification of Châteaux calls for a technique that is not without its hazards.

Which scoring system?

France in particular, and Europe in general, use scales from 1 to 20 or from 1 to 10. The American university tradition, adopted in particular by Robert Parker, uses a 100-point scale. This scale has its own rules, which are not always understood. In theory scores range from 50 to 100, but in practice a score below 72 is rare (72 effectively acts as the zero of the scale), and the maximum of 100 is very rare too. The range actually used is therefore 25 to 30 points.

The jury was asked to score the wines on a 100-point scale so that the scores would be comparable with the American system. A detailed analysis of the tasters' scores showed that they scored in three different ways:

The first group followed the rules of the American system. Their scores lay between 70 and 100, i.e. a range of 30 points.

A second group used the scale they were probably most familiar with, from 0 to 10 or from 0 to 20, and then multiplied by 10 or by 5 respectively. Their scores ranged from 20 to 100, i.e. a range of 80 points.

The third group combined the two systems and scored between 50 and 100, i.e. a range of 50 points.
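As a rough illustration of these three habits, the Python sketch below rescales a 0-20 score to 100 points and measures the range each kind of taster actually uses. The individual scores are invented; only the scale limits come from the analysis above, and the helper names `to_hundred` and `spread` are hypothetical.

```python
# Sketch: the three scoring habits on a 100-point scale.
# The score lists are hypothetical; only the limits (70-100, 20-100, 50-100)
# come from the analysis of the jury's scores.

def to_hundred(score, scale_max):
    """Rescale a score from 0..scale_max to 0..100 (x5 for /20, x10 for /10)."""
    return score * 100 / scale_max


def spread(scores):
    """Range actually used by a taster: highest score minus lowest score."""
    return max(scores) - min(scores)


american = [70, 82, 91, 100]                              # group 1: 70 to 100
converted = [to_hundred(s, 20) for s in [4, 11, 16, 20]]  # group 2: /20, then x5
mixed = [50, 68, 85, 100]                                 # group 3: 50 to 100

for name, scores in [("american", american), ("converted", converted), ("mixed", mixed)]:
    print(name, spread(scores))
# american 30
# converted 80.0
# mixed 50
```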

The aggregation of individual scores

The classifications were compiled in the traditional manner, by adding up the individual scores. This gives a rough classification, expressed in points. This type of classification, which is easy to understand, assumes that all the tasters score in a perfectly homogeneous way. Even if they all adopted the same scale, this homogeneity would not exist, because two tasters rarely distribute their points in the same way.

Is this important? Yes, because a taster's influence on the final classification depends on how he scores. A taster who gives the same score to every wine has no influence on the classification, since he adds the same amount to every total. The more varied his scores, the greater his influence.

The consequences are immediate. A taster whose scores span 60 points has twice the influence of one whose scores span 30 points, and thus counts double.
The "rough classification" gives greater weight to a taster who uses a wide range.
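The effect can be checked with a small Python sketch. It assumes, hypothetically, three wines and two tasters with exactly opposite preferences, one using a 60-point spread and the other a 30-point spread, plus a "flat" taster who gives every wine the same score. The functions `rough_totals` and `ranking` are illustrative names only.

```python
# Sketch: how the spread of each taster's scores drives the "rough classification".
# All data are hypothetical.

def rough_totals(scores_by_taster):
    """Sum each wine's scores across all tasters."""
    totals = {}
    for scores in scores_by_taster.values():
        for wine, score in scores.items():
            totals[wine] = totals.get(wine, 0) + score
    return totals


def ranking(totals):
    """Wines ordered from highest to lowest total."""
    return sorted(totals, key=totals.get, reverse=True)


tasters = {
    "wide": {"A": 100, "B": 70, "C": 40},    # spread of 60 points
    "narrow": {"A": 70, "B": 85, "C": 100},  # spread of 30 points, opposite order
}
print(rough_totals(tasters))           # {'A': 170, 'B': 155, 'C': 140}
print(ranking(rough_totals(tasters)))  # ['A', 'B', 'C']: the wide taster's order wins

# A taster who gives every wine the same score shifts all totals equally.
with_flat = dict(tasters, flat={"A": 80, "B": 80, "C": 80})
print(ranking(rough_totals(with_flat)))  # ['A', 'B', 'C']: unchanged
```

Because the "wide" taster's differences are twice as large, his order prevails in the sum, while the flat taster merely adds the same amount to every total.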

A taster whose scores are widely dispersed will have maximum influence.

One can of course try to harmonize the tasters' evaluations by imposing an identical scoring system. Experience, often gained in product testing, shows that such harmonization is never perfect. With wine it is virtually impossible, because it is so hard to capture all the nuances of a wine in a single score. Just as one has to accept that Europe has several languages, one must also accept that there are several ways of scoring.

Standardized classifications

There are more or less sophisticated ways of reducing the excessive influence of any one taster on the overall classification. The simplest is to use not a wine's score but its rank in each taster's classification. Each taster designates a first wine and a last one, possibly with ties in between. The sum of the ranks gives a more accurate result than the "rough classification". In one respect, however, the ranking method is less satisfactory than the "rough classification": if a wine is far ahead of (or behind) the others, its rank does not reflect this.
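A minimal Python sketch of the ranking method, assuming that tied wines share the average of their positions (one common convention; the source does not say how ties were handled). The data and the names `ranks` and `rank_sums` are hypothetical. It shows both the benefit and the limitation described above.

```python
# Sketch: rank-sum classification (1 = best; a lower sum is a better wine).
# Ties share the average of their positions, which is an assumption.

def ranks(scores):
    """Rank one taster's wines; tied wines get the average of their positions."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the block of wines tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[ordered[k]] = avg
        i = j + 1
    return result


def rank_sums(scores_by_taster):
    """Sum each wine's ranks across all tasters."""
    totals = {}
    for scores in scores_by_taster.values():
        for wine, r in ranks(scores).items():
            totals[wine] = totals.get(wine, 0) + r
    return totals


tasters = {
    "wide": {"A": 100, "B": 70, "C": 40},
    "narrow": {"A": 70, "B": 85, "C": 100},
}
print(rank_sums(tasters))  # {'A': 4.0, 'B': 4.0, 'C': 4.0}: both tasters now weigh equally

# The limitation: a large lead and a one-point lead give the same ranks.
close = {"A": 61, "B": 60, "C": 59}
far = {"A": 100, "B": 60, "C": 59}
print(ranks(close) == ranks(far))  # True
```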

Statisticians prefer to standardize each taster's scores, bringing them all to the same mean and the same dispersion (standard deviation). Each taster then contributes exactly the same to the classification. From a rigorously scientific point of view, the only useful classification is a standardized one. Nothing then prevents one from weighting each taster, for example according to the number of wines he identified "blind".
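Standardization can be sketched in Python as z-scores (mean 0, standard deviation 1) per taster, followed by an optional weighted sum. This is a generic illustration, not necessarily the exact procedure used for the official results; the data, the weight of 2.0 and the function names are hypothetical. A taster whose scores never vary is given zeros, consistent with the point that he has no influence.

```python
# Sketch: standardize each taster's scores, then (optionally) weight and sum.
from statistics import mean, pstdev


def standardize(scores):
    """Z-scores for one taster; a taster with one repeated score gets all zeros."""
    values = list(scores.values())
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return {wine: 0.0 for wine in scores}
    return {wine: (s - m) / sd for wine, s in scores.items()}


def standardized_totals(scores_by_taster, weights=None):
    """Sum of z-scores per wine; weights maps taster -> weight (default 1.0)."""
    totals = {}
    for taster, scores in scores_by_taster.items():
        w = 1.0 if weights is None else weights.get(taster, 1.0)
        for wine, z in standardize(scores).items():
            totals[wine] = totals.get(wine, 0.0) + w * z
    return totals


tasters = {
    "wide": {"A": 100, "B": 70, "C": 40},
    "narrow": {"A": 70, "B": 85, "C": 100},
}
# Opposite opinions now cancel out: every total is (numerically) zero.
print(standardized_totals(tasters))
# Hypothetical weighting, e.g. for better blind identification: "wide" counts double.
print(standardized_totals(tasters, {"wide": 2.0}))
```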

Group tasting versus individual tasting

The virtues of individual tasting are well known, as are its faults. The individual taster has his own particular taste, and the public can use it as a reference. On the other hand, with nothing to compare him against, there is no way of detecting a possible lapse by the taster, for instance when he is faced with a spoilt bottle. Collective tasting, by diluting the influence of each taster within the group, makes the tasting both more reliable and less personal. Setting aside questions of principle or ideology for or against either method, modern statistical methods make it possible to remedy the faults of collective tasting.

A taster who judges a series of wines is, rightly, judged in turn by that tasting. It takes a long time to discover how a particular taster tastes. The only way to judge him in any formal way is to set his tasting against his judgments of other wines, or against those of other tasters tasting the same wines. Unfortunately, the context of the tasting also changes: the bottles are never the same, nor is the occasion, and so on. And such comparison is very personal (which sometimes allows a taster to save his reputation).

Collective tasting, if it is properly processed statistically, allows an immediate comparison of the tasters' differing profiles. Its potential for analysis is incomparable. Twenty tasters tasting the same bottles on the same day yield twenty parallel classifications and reveal where they agree and where they diverge.
The analysis of these similarities and divergences alone is worth more than all the classifications in the world.
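One common way to measure agreement between tasters who scored the same wines is the Spearman rank correlation, computed here in Python as the Pearson correlation of average ranks: +1 means the same order of preference, -1 the exact opposite. The panel below is hypothetical, and the choice of Spearman is an assumption; the source does not name a specific coefficient. A taster who gives every wine the same score would make the denominator zero and should be excluded first.

```python
# Sketch: pairwise Spearman correlation between tasters (hypothetical panel).
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Ranks (1 = highest score); tied scores share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Scores given to the same five wines, in the same order, by four tasters.
panel = {
    "T1": [92, 88, 85, 80, 75],
    "T2": [60, 55, 50, 40, 20],  # different scale, same order as T1
    "T3": [75, 80, 85, 88, 92],  # exact opposite of T1
    "T4": [90, 90, 70, 85, 60],  # includes a tie
}

for a, b in combinations(panel, 2):
    print(a, b, round(spearman(panel[a], panel[b]), 2))
```

Working on ranks makes T1 and T2 agree perfectly even though their scales differ, which is exactly the kind of similarity the raw scores would hide.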

Beyond this, these methods yield a reliable and nuanced classification. Two wines could get the same score for different reasons: one for its smoothness, the other for its austere elegance. Setting the tasters' judgments against each other, some favoring smoothness, others appreciating austere elegance, places a wine far more surely than an individual tasting can. An analysis of contrasts remains the surest method of analysis.
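An elementary Python illustration of the idea, under hypothetical data: wine X is the smooth one, wine Y the austere one, and both have the same average. Looking at each taster's contrast (score for X minus score for Y) reveals the two camps that the average hides. This is only a sketch of the principle, not the multivariate analysis a full study would likely use.

```python
# Sketch: same average, different reasons (hypothetical scores).
from statistics import mean

scores = {  # taster: (score for smooth wine X, score for austere wine Y)
    "T1": (92, 80),
    "T2": (90, 82),
    "T3": (80, 91),
    "T4": (82, 91),
}

x_scores = [x for x, _ in scores.values()]
y_scores = [y for _, y in scores.values()]
print(mean(x_scores), mean(y_scores))  # both 86: the averages cannot tell X and Y apart

# Per-taster contrast: positive means the taster prefers the smooth wine.
contrast = {taster: x - y for taster, (x, y) in scores.items()}
prefers_smooth = sorted(t for t, c in contrast.items() if c > 0)
prefers_austere = sorted(t for t, c in contrast.items() if c < 0)
print(prefers_smooth, prefers_austere)  # ['T1', 'T2'] ['T3', 'T4']
```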

This approach to tasting has important consequences for the way juries are composed. Many people think that a jury must be homogeneous, that each taster must be a "clone" of the ideal taster, the best in the world. This ignores the enormous variety of wines, their complexity and the diversity of tastes. On the contrary, wine-tasting juries should reflect the tastes of consumers. Professional tasters have often been accused of living on another planet, totally cut off from reality. Collective tasting, processed with modern statistical methods, reveals this diversity of choices. Let's not deprive ourselves of it...