Claims on overall performance, liking, preference, and efficacy can be challenged through the Advertising Standards Council of India (ASCI), a self-regulatory system for the industry. Stated claims are supported by data that are statistically evaluated to judge whether the experimental results are due to a real effect or to random variation. Therefore, before a claim can be substantiated, it must be translated into a statistical research hypothesis. If a claim cannot be translated into a research hypothesis, it cannot be substantiated.
Claims may be disputed for one or more of the following reasons:
A statistical hypothesis is a statement about the quantifiable aspects of products, which can be estimated from experimental results but are not otherwise directly observed. In statistical terminology, a research hypothesis (Claim) is called the alternative hypothesis. A complementary statement to the alternative hypothesis is called the null hypothesis. Statistical tests of significance are rules for judging whether the experimental results support the claim formulated as an alternative hypothesis.
Through experimental designs, data are collected and relevant sample statistics are computed, such as the mean and the standard deviation. Since these statistics are subject to sampling and experimental errors, the statistical tests may lead to an incorrect decision. Suppose the decision is made to reject the null hypothesis and accept the alternative hypothesis. This decision, if it turns out to be wrong, is said to result in a type I error. The probability of a type I error is denoted by α and is known as the significance level of the statistical test. A significance level of α = 0.05 indicates that, when the null hypothesis is true, the test is liable to wrongly reject it 5 times in 100. Significance levels of α = 0.05 and 0.01 are often used in scientific applications and are generally the accepted levels for claims substantiation. On the other hand, a type II error results if the decision is made not to reject the null hypothesis when it is in fact false. The probability of a type II error is denoted by β. In the planning of a claim substantiation study, both types of errors should be controlled.
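To make the trade-off between α and β concrete, the following is a minimal sketch, not a method taken from the source. It assumes a one-sided z-test on paired score differences with a known standard deviation; the function names and the inputs (a true difference `delta`, a standard deviation `sigma`) are hypothetical and chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def power_one_sided(delta, sigma, n, alpha=0.05):
    """Power (1 - beta) of a one-sided z-test on n paired differences.

    Assumes the differences are normal with known standard deviation sigma
    and that the true mean difference is delta (an illustrative assumption).
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - _STD_NORMAL.cdf(z_alpha - delta * sqrt(n) / sigma)


def sample_size_one_sided(delta, sigma, alpha=0.05, beta=0.20):
    """Smallest n that keeps type I error at alpha and type II error at beta."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1 - beta)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical planning inputs: detect a half-point difference, sigma = 1.
n_needed = sample_size_one_sided(delta=0.5, sigma=1.0)
print(n_needed, round(power_one_sided(0.5, 1.0, n_needed), 3))
```

When `delta` is zero the null hypothesis is true, so the computed "power" is simply the type I error rate α. Raising `n` increases power, that is, lowers β, without changing α.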
Statistical analysis is probabilistic. A statistically significant result may not be of practical significance to consumers. For example, the colour of a cosmetic product may have changed over time from its original colour. The change may be statistically significant, but not necessarily noticeable to consumers. Thus, instead of merely establishing a statistically significant change, one may need to determine the amount of change that consumers will perceive as significant. This amount of change must be determined by correlating trained panel results with consumer test results.
Claims may be classified by two properties: style and competitive focus. Style refers to the statement being made about the advertised brand, the most common being a “distinction” claim, in which a brand claims to be preferred, more efficacious, safer, etc. Another style is a “similarity” claim, which conveys that the advertised product is like the competitor’s product in one or more attributes. All products in this category must be tested against the advertised product.
A competitive focus claim is a statement made about the competition, targeted against one or more explicitly identified or implied brands. For example, the claim may be targeted against an implied brand, e.g., “preferred over the leading brand,” or more broadly against a brand set, e.g., “No leading oil is more absorbent.”
In both style and competitive focus, a claim statement can be monadic, making no comparison with other products, e.g., a statement of quality, an invitation to try the product, or an untargeted claim. An untargeted claim is considered puffery and requires no formal substantiation.
A useful model for the evaluation of a proposed claim must address the following aspects:
A model incorporating these aspects becomes increasingly important in disputed claims.
Rationale
Consumer products contain ingredients that affect how desirable the products are perceived to be. By linking ingredients in a product to experimental results, one can provide a rationale for the claim. Experimental support from allied sciences, such as in-vitro studies and model systems, can also provide additional rationale for the claim.
Objective
A claim becomes stronger if its usefulness can be both objectively and subjectively determined. An objective measure of product performance is desirable. It can be obtained by clinical studies in real-life settings on humans drawn from the targeted population for the product. Responses from such clinical studies can be measured by bioinstruments or obtained from trained or expert panels. Indeed, data from trained panels are recognized as objective measures. Descriptive analysis and the spectrum method, which use a trained panel, can provide objective measures of the sensory properties of personal care products.
A descriptive panel undergoes rigid training and validation/calibration as specified by each method.
Subjective
When properly carried out, subjective measures obtained from home-use tests, among others, may provide useful and acceptable data for claims substantiation.
Safety
Obviously, cosmetic products must be safe and without adverse side effects. The model must address the safety aspects. Safety-related data can be obtained from research guidance panel tests, central location consumer tests, and the various types of laboratory model systems, i.e., in vitro and in vivo tests.
As indicated, a model incorporating these aspects provides a way to deal with conflicts, permits more efficient use of data for the development of truthful claims, and promotes effective communications between parties in disputed situations.
A conceptual model for assessing perception data measuring interdependent attributes is postulated. This model defines the following:
The experimental designs suited for obtaining data for parity and superiority claims are discussed further on.
A superiority claim simply indicates that the product advertised is the best in the market. It is essential that direct product-to-product comparative testing be used for substantiating a superiority claim. An appropriate design for comparing two products at a time is the paired-comparison design, which is discussed further on. An example of a superiority claim is “Compared to the leading brands, Tropical Isles is unsurpassed as a skin moisturizer and conditioner.” As stated before, a claim must be translated into a statistical hypothesis. To do this, we must have a well-defined scale on which the products can be scored for comparison. Suppose product A is being compared with a leading brand for claim substantiation. If, on a scale for comparing such products, high scores correspond to superior products, we can formulate two statistical hypotheses such as the following:
H0: Average score of leading brand ≥ average score of product A
H1: Average score of leading brand < average score of product A
To be able to claim superiority for product A, the null hypothesis must be rejected at, say, the 95% confidence level (5% significance level) in favour of the alternative hypothesis, which states that product A is superior to the leading brand.
Parity claims are difficult to establish by means of hypothesis testing methodology because for parity claims the research hypothesis essentially states that the products are equivalent. Using a rating scale, the equivalency is translated as equality of two average scores, but equality of average scores can only be stated as the null hypothesis. A statistical test will either reject the null hypothesis when there is sufficient evidence in support of the alternative or will not reject it. If the null hypothesis is not rejected, it should not be understood that the products are equivalent. Intentionally or otherwise, one can design an experiment that collects insufficient data, and this lack of information leads to a decision not to reject H0. This decision only means that there is insufficient information to disown the parity claim. It does not mean that a parity claim is established with any degree of confidence.
In disputed parity claims, if a proper formulation of hypotheses and a sound design are not used, differences may arise that will be difficult to resolve among the parties involved. It is a waste of time to argue about the validity of a claim if the methodology and the design are not carefully employed. As stated above, one can design an experiment with an insufficient sample that masks significant differences between products, because the study then fails to reject the null hypothesis. “Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.”
Therefore, the formulation of hypotheses for a parity claim and their statistical testing must be done in such a way that the decision to reject the null hypothesis amounts to the parity of products.
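One common way to set up hypotheses so that rejecting the null supports parity is the two one-sided tests (TOST) procedure. The source does not name this procedure, so the sketch below is an illustration under stated assumptions: paired score differences, a parity margin `margin` that the parties would have to agree on beforehand (hypothetical here), and a large-sample normal approximation in place of the t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_paired(differences, margin, alpha=0.05):
    """Two one-sided tests for parity on paired score differences.

    H0: |mean difference| >= margin   (products are NOT equivalent)
    H1: |mean difference| <  margin   (products are equivalent)

    Uses a normal approximation, so it is only reasonable for large samples.
    Returns (mean difference, p-value, parity_supported).
    """
    n = len(differences)
    d_bar = mean(differences)
    se = stdev(differences) / sqrt(n)  # standard error of the mean difference
    z = NormalDist()
    # Test 1: H0 mean <= -margin against H1 mean > -margin
    p_lower = 1 - z.cdf((d_bar + margin) / se)
    # Test 2: H0 mean >= +margin against H1 mean < +margin
    p_upper = z.cdf((d_bar - margin) / se)
    p_value = max(p_lower, p_upper)  # both tests must reject
    return d_bar, p_value, p_value < alpha


# Hypothetical differences (product A minus competitor) from 100 panelists.
diffs = [0.1, -0.1] * 50
print(tost_paired(diffs, margin=0.5))
```

Under this formulation an undersized study cannot "prove" parity by default: with too few panelists the standard error is large, neither one-sided test rejects, and the parity claim remains unsupported.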
There are three important elements in the development of a strong product claim:
A product is developed to meet either the needs of the general population or those of a specific user group in the population. Depending on the stated claim, the general population or a specific group defines the target population. In particular, the target could be the purchaser and not necessarily the user of the product. For instance, the wife is the purchaser of baby powder. On the other hand, the husband is the purchaser of after-shave skin conditioners. In the first case, wives would be the target population, and in the latter case, husbands. If the claim is for the general population, then the participants in the test should be a random sample of the population. Similarly, for a claim aimed at a specific user group, a random sample of that group should be used in the study.
In gathering consumer data for claim substantiation, it is important that the product attributes related to the claim be included in the questionnaire. For example, if “soft” and “smooth” are sensory attributes claimed for the product, then these attributes must be included in the questionnaire in the form of intensity and/or hedonic (like/dislike) questions.
How many attribute questions the questionnaire should include is often a difficult decision to make in questionnaire development. If a product has undergone a series of descriptive sensory analyses, this should provide the appropriate number of attributes for inclusion. Briefly, descriptive analysis is a sensory methodology that provides quantitative descriptions of products based on the perceptions of a group of qualified subjects. It is a total sensory description, taking into account all sensations perceived (visual, auditory, olfactory, kinesthetic, and so on) when the product is evaluated. In practice, the desirable number of attributes has ranged from 10 to 15.
Another aspect of questionnaire development is the choice of the rating scale. A common choice is the 9-point hedonic scale (1 = dislike extremely, 5 = neither like nor dislike, 9 = like extremely), developed in 1947 at the Quartermaster Food and Container Institute for the U.S. Armed Forces. This is the most extensively studied of rating scales and, as a result, is the most reliable one for acceptance/preference measurement. Information on questionnaire development is widely available.
The paired comparison is the most powerful design to support almost all types of product claims. The statistical analysis of the paired-comparison design is simple and meets all the essential statistical assumptions; the test is simple to execute for both the experimenter and the panellist; and the evaluation of two products by a single panellist fits nicely into the classic paired-comparison situation (e.g., right/left sides of biological materials).
The general idea of the paired-comparison design is to form homogeneous pairs of like units so that comparisons between units of a pair measure differences due to treatments rather than units. This arrangement leads to dependency between observations on units of the same pair. This situation can be extended to sensory and consumer testing. The statistical assumption in the analysis is that the differences are independent and normally distributed; in most cases this assumption is satisfied in practice. Furthermore, the common problem of correlation of ratings among panelists becomes irrelevant, since one is now dealing with the differences d_i between the two ratings given by each panelist.
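The analysis of a paired-comparison study can be sketched as a one-sample t statistic computed on the within-panelist differences. The example below is a minimal illustration, not code from the source; the function name and the ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic for H1: mean(A - B) > 0.

    Each panelist rates both products; the analysis uses only the
    within-panelist differences d_i = a_i - b_i.
    Returns (mean difference, sd of differences, t statistic, degrees of freedom).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("each panelist must rate both products")
    d = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                 # sample standard deviation of differences
    t_stat = d_bar / (s_d / sqrt(n))
    return d_bar, s_d, t_stat, n - 1


# Hypothetical 9-point ratings from 10 panelists.
product_a = [7, 8, 6, 7, 9, 8, 7, 6, 8, 7]
leading_brand = [6, 7, 6, 5, 8, 7, 7, 5, 7, 6]
d_bar, s_d, t_stat, df = paired_t(product_a, leading_brand)
print(d_bar, round(s_d, 3), round(t_stat, 2), df)
```

The t statistic is compared with the tabulated one-sided critical value for the given degrees of freedom (1.833 for 9 degrees of freedom at α = 0.05). A larger value leads to rejecting the null hypothesis in favour of product A.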
For reasons of cost, time, and other business constraints, one must sometimes conduct a consumer test in which panelists evaluate more than two products at the same time. In this situation, the randomized complete block design (RCBD) is used for claim substantiation. The statistical model for describing an observation is
y_ij = µ + A_i + B_j + e_ij
where y_ij = the observed rating for the ith product given by the jth panelist; µ = the grand mean; A_i = the effect of the ith product; B_j = the effect of the jth panelist; and e_ij = random errors assumed to be independently and normally distributed, with mean zero and variance σ². In this model, the effect of panelist-to-panelist variation is removed from the random errors e_ij, making the test of significance more sensitive.
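The way the RCBD model separates panelist variation from random error can be illustrated with a small analysis-of-variance sketch. This is an assumption-laden illustration rather than the source's own analysis: the function name and the ratings are hypothetical, and only the F ratio for products is computed.

```python
def rcbd_anova(ratings):
    """ANOVA for a randomized complete block design.

    ratings[j][i] is the rating of product i by panelist j (panelists = blocks).
    Returns the sums of squares and the F ratio for the product effect.
    """
    b = len(ratings)        # number of panelists (blocks)
    t = len(ratings[0])     # number of products (treatments)
    grand = sum(sum(row) for row in ratings) / (b * t)
    product_means = [sum(row[i] for row in ratings) / b for i in range(t)]
    panelist_means = [sum(row) / t for row in ratings]

    ss_total = sum((y - grand) ** 2 for row in ratings for y in row)
    ss_product = b * sum((m - grand) ** 2 for m in product_means)
    ss_panelist = t * sum((m - grand) ** 2 for m in panelist_means)
    # Panelist-to-panelist variation is removed from the error term.
    ss_error = ss_total - ss_product - ss_panelist

    df_product, df_error = t - 1, (t - 1) * (b - 1)
    f_product = (ss_product / df_product) / (ss_error / df_error)
    return {
        "ss_product": ss_product,
        "ss_panelist": ss_panelist,
        "ss_error": ss_error,
        "df": (df_product, df_error),
        "f_product": f_product,
    }


# Hypothetical ratings: 4 panelists (rows) by 3 products (columns).
data = [[7, 5, 6], [8, 6, 6], [6, 5, 4], [7, 4, 6]]
result = rcbd_anova(data)
print(result["df"], round(result["f_product"], 2))
```

The F ratio is compared with the tabulated critical value for the two degrees of freedom shown. If the panelist sum of squares were left in the error term, the error mean square would be larger and the test less sensitive.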
In most consumer testing claim studies, the statistical analysis from the RCBD or the single-factor repeated-measures design is sufficient. Also, the SAS code in Table 5 can easily be expanded to include demographics, product usage information, and so on.
We have covered the importance of statistical experimental design to consumer tests for supporting claim substantiation. In particular, the formulation of statistical research hypotheses was discussed and its importance in parity claims reviewed. The use of a paired-comparison design is recommended for claims substantiation. Understanding the power of a statistical test and its relationship to sample size is also essential for producing a claim that can withstand rigorous scrutiny.
Gacula, M. C., Jr., & Singh, J. (1998). Consumer Testing Statistics and Claims Substantiation. In L. B. Aust, Handbook of Cosmetic Claims Substantiations (pp. 235-258). New York: Marcel Dekker, Inc.