A NEW VIEW ON THE QUALITY OF JACÍK’S TEST



INTRODUCTION
Assessment of physical fitness has undergone a long development, and the approach to assessment has changed along with the conception and definition of fitness. Given the close relationship between fitness and health (Heyward, 2010; Janssen & LeBlanc, 2010; Suchomel, 2004; Vanhees et al., 2005), it seems appropriate to divide the term "fitness" into two categories: health-related and performance-related. According to Vanhees et al. (2005), health-related fitness is a state characterized by an ability to perform daily activities with vigour, and by traits and capacities that are associated with a low risk of premature development of hypokinetic diseases (i.e., those associated with physical inactivity). Today's conception of fitness assessment has shifted from predominantly performance-related fitness towards health-related fitness, which has also reduced the number of test items within the established test batteries (Suchomel, 2003). According to Plowman et al. (2006), FITNESSGRAM, whose latest (ninth) version reflects its long development, is among the most elaborate fitness test batteries. To increase the quality (validity and reliability) of particular tests and of the whole battery, many tests have been removed or modified in the course of its development (Welk & Meredith, 2008). Since 2009, a research group from the Centre for Kinanthropology Research (Faculty of Physical Culture, Palacký University in Olomouc) has been working on a conception named "Self-assessment of physical fitness" and developing a corresponding web-based system of motor tests (software INDARES.com).
Although some attributes of this conception can currently be found in the latest version of the FITNESSGRAM system (strong involvement of the individual in carrying out the test; online data recording in special software accessible on the internet), this diagnostic approach to the assessment of physical fitness is completely new (Welk & Meredith, 2008). The basic assumption of the self-assessment conception is the positive effect (Marcus & Forsyth, 2009) of a strategy called "self-orientation of an individual" (interiorization) on changes in physical activity behaviour. Physical fitness self-assessment is somewhat limited when compared with "classic" health-related fitness assessment. The limits lie mainly in the material, spatial and knowledge-based demands of carrying out a specific test. Therefore, many current tests are unsuitable for this purpose (an appropriate alternative test is necessary) or need to be modified.
Aerobic fitness, as a part of the five-item model of health-related fitness (body composition, cardiovascular capacity, flexibility, muscular endurance, and muscle strength), is among the most important components because of its strong association with the risk of cardiovascular diseases. According to WHO (2010), physical inactivity, which is directly reflected in the level of aerobic fitness, is an independent risk factor for chronic diseases and causes about 1.9 million deaths worldwide each year.
Maximum oxygen consumption (VO2max) is commonly used as an indicator of aerobic fitness (Caspersen, Powell, & Christenson, 1985; Heyward, 2010; Mahar, Welk, Rowe, Crotts, & McIver, 2006; The President's Council on Physical Fitness and Sports, 2010; Vanhees et al., 2005). Because the most precise estimate of this parameter requires laboratory conditions, this indicator of aerobic fitness is hardly suitable for field use. To obtain at least a rough estimate of this parameter, many test systems use field performance tests (continuous tests to exhaustion, intermittent tests, and walking/running tests) or simple intensity tests (any of the group of step tests). Nevertheless, the accuracy of these estimates varies considerably; it depends on the type of test and on other factors such as age and gender (e.g., in a meta-analytical study, Baumgartner and Jackson (1982) reported that the criterion validity of long-distance running tests, with VO2max as the criterion, ranges from 0.22 to 0.91).
In 1983, V. Jacík published results on a newly developed field performance test named the "Overall motor test" (orig. "Celostní motorický test") (Jacík, 1983), at present called "Jacík's test" and here abbreviated "JT", including characteristics of the test's quality, such as validity and reliability, and norms for boys and girls aged 7 to 20 years. The design of the test rests on two principles: human ontogenetic development and a global understanding of human kinetics. The author considered the changes of elementary positions included in the test (mainly reaching the vertical body position) to be one of the culminations of a child's kinetic ontogeny, necessary for his/her further development. At the same time, they are usually counted among the activities that form the basis of human movement. In the author's view, performing the test does not require any special skills. As a result, the influence of cultural-social, racial, and geographical differences is reduced within an age group or gender. As the author claims, although the test result is conditioned by a complex of several skills (hybrid base), global endurance dominates. While developing the test, the author demonstrated concurrent validity with respect to two criteria. The first criterion was "The badge of fitness" BPPOV ("Buď připraven k práci a obraně vlasti", i.e., "Be prepared for work and defence of the homeland"), for which validity was expressed by correlation coefficients of r = 0.483 to 0.537; the second criterion was the "Test of basic physical fitness for university students", for which r = 0.642 (females) and r = 0.780 (males). The author considers the test to be a "satisfactorily valid" indicator of overall physical fitness.
Although JT is frequently used in practice (e.g., as an item of entrance exams to secondary schools or universities), no study has yet clarified the contribution of particular motor abilities to overall performance in JT. Regarding the choice of criterion (the badge of fitness BPPOV), we do not agree with Jacík's approach to determining the validity of the test as an indicator of the construct "overall physical fitness", and we consider it inappropriate. Because of the wide content of the badge of fitness BPPOV, it is very difficult to interpret JT results given such poor validity. Another weak point of Jacík's standardization of the test is the reported reliability results. They cannot be considered sufficient, because the methods used to calculate them are not sensitive to a number of measurement errors.
When looking for an appropriate field test for self-assessment of aerobic fitness, many obstacles appear. Despite some weaknesses of JT, its low material and spatial demands make it seem an adequate test for self-assessment of physical fitness in general. Moreover, because global endurance dominates performance in JT, as mentioned above, one can expect a high association between test performance and the individual's level of aerobic fitness. Considering all these circumstances, there is a high chance that the test is suitable for aerobic fitness self-assessment. The research problem can thus be specified as verifying JT as a suitable test for aerobic fitness assessment.
The aim of this study is to assess the reliability of Jacík's test and to specify the contribution of aerobic fitness to test performance (i.e., to assess the criterion validity of the test as an indicator of aerobic fitness).

Participants
Eighty-nine PE university students participated in the study. The selection of participants was partly limited because the study was linked to the teaching process of PE students (an interconnection of faculty research and teaching).
For this reason, it was necessary to divide the study into two parts with a "sex-divided" sample (validity study: 44 females; reliability study: 45 males). In the course of the study, five students did not complete all measurements (personal reasons, illness). Complete data were obtained from 43 females (22.2 ± 2.0 years, BMI = 23.2 ± 2.5 kg/m²) and 41 males (21.5 ± 1.2 years, BMI = 25.2 ± 1.6 kg/m²).
The Institutional Research Ethics Committee of the Faculty of Physical Culture of Palacký University approved the proposed study (no. 7/2011). The students were informed about the purpose of the study and all procedures. The need to achieve the maximal possible performance was emphasized. All participants took part in the study voluntarily, after being properly informed. They were asked to participate during their PE classes, and their decision whether to participate had no negative effect on their studies.

Measures and procedures
All measurements were carried out at the Faculty of Physical Culture, Palacký University in Olomouc (Czech Republic), from October 2010 to February 2011. Basic somatic characteristics (weight and height) were measured in order to determine BMI. All students took part in JT; in the validity part of the study, each participant also underwent a physiological load test on a treadmill to determine VO2max.

Maximal exercise test.
The investigations were carried out under relatively standard conditions (temperature 19-21 °C, humidity 40-50%) in the laboratory of the Faculty of Physical Culture. The participants were asked to come with an empty stomach and to abstain from coffee, alcohol and strenuous physical activity for 24 hours before entering the laboratory.
A graded exercise test to maximum was performed on a LODE Valiant treadmill (Netherlands). Heart rate and spiroergometric variables were monitored during the test. After five minutes of warm-up at a walking speed of 7 km/h (in the last minute the treadmill slope was increased to 5%), the treadmill speed was increased to 9 km/h for one minute. Subsequently, the speed was increased by 1 km/h every 30 seconds up to 12 km/h. Once this speed was reached, the treadmill slope was increased by 2% every 30 seconds up to the subject's volitional maximum. Oxygen consumption, among other spiroergometric parameters, was monitored with a breath-by-breath gas analyzer (ZAN Ergo USB 600, Germany). Simultaneously, a Polar (Finland) heart rate monitor was used. Of all the monitored parameters, we considered only one essential parameter: maximum oxygen consumption, VO2max [ml/kg/min].

Jacík's test (JT).
To assess the validity of JT, females performed the test once (one week after the maximal exercise test); to assess reliability, males performed the test twice (with a one-week pause after the first test). The first test performance was monitored by one observer, whereas the second measurement (males) was monitored by two observers (to assess objectivity). The test was always performed under the same conditions (on a pad) according to the exactly specified instructions (Jacík, 1983): to change repeatedly between three positions for two minutes (the back-lying position is the starting one, followed by heel-stand, abdomen-lying position, heel-stand, back-lying position, etc.); each position is completed by clapping the palms on the thighs, and the test result is the number of positions performed, counted as points. The individual is informed when the last ten seconds remain.

Statistical analysis
SPSS 17 (SPSS Inc., Chicago, IL) and Statistica 8 (StatSoft Inc., Tulsa, OK) were used to process the data. Basic descriptive characteristics (mean, standard deviation and 95% confidence intervals) were calculated for all indicators. After the normality assumption was verified, the Pearson product-moment correlation coefficient, complemented by the coefficient of determination (as effect size), was used to a) assess the criterion validity of JT, b) describe the relationship between JT and BMI, and c) verify the homoscedasticity of JT (the relationship between 1. the absolute value of the difference between repeated individual performances and 2. the average of repeated individual performances). The objectivity of JT (two observers; examined at the second attempt), as one of the sources of measurement error, and the level of relative reliability were expressed through the intra-class correlation (ICC, parallel one-way model), calculated according to the formula $\mathrm{ICC} = (MS_S - MS_E)/MS_S$, where $MS_S$ is the mean square of subjects and $MS_E$ is the mean square of error. The standard error of measurement (SEM) and Bland and Altman's 95% limits of agreement were computed to estimate the absolute reliability of JT. These limits were then depicted graphically in a Bland and Altman plot (according to Atkinson & Nevill, 1998). SEM was calculated according to Thomas, Nelson, and Silverman (2005) as $\mathrm{SEM} = SD \cdot \sqrt{1 - \mathrm{ICC}}$, where SD is the sample standard deviation and ICC is the calculated intraclass correlation coefficient (unbiased error). Microsoft Excel was used to construct the Bland and Altman plot.
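As a rough illustration of the two formulas above, the following Python sketch computes a one-way ICC and the SEM from a table of repeated scores (one row per participant, one column per trial or observer). The function names, the example scores and the choice of SD (taken over all scores pooled) are assumptions made for illustration only; the study's own calculations were done in SPSS and Statistica and may differ in detail.

```python
import math
from statistics import mean, stdev


def icc_oneway(scores):
    """One-way ICC in the form used in the text: (MS_S - MS_E) / MS_S.

    scores: list of rows, one row per subject, each row holding k repeated scores.
    """
    n = len(scores)
    k = len(scores[0])
    subject_means = [mean(row) for row in scores]
    # With the same k for every subject, the mean of subject means is the grand mean.
    grand_mean = mean(subject_means)
    # Mean square of subjects (between subjects), n - 1 degrees of freedom.
    ms_s = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Mean square of error (within subjects), n * (k - 1) degrees of freedom.
    ms_e = sum(
        (x - m) ** 2 for row, m in zip(scores, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_s - ms_e) / ms_s


def standard_error_of_measurement(scores, icc):
    """SEM = SD * sqrt(1 - ICC); SD is pooled over all scores (an assumption)."""
    all_scores = [x for row in scores for x in row]
    return stdev(all_scores) * math.sqrt(1.0 - icc)


# Hypothetical JT points for five participants in two trials (not study data).
example = [[70, 74], [82, 80], [65, 72], [90, 95], [77, 79]]
icc = icc_oneway(example)
sem = standard_error_of_measurement(example, icc)
print(f"ICC = {icc:.3f}, SEM = {sem:.2f} points")
```

The sketch assumes every participant has the same number of scores and that the participants do not all share the same mean (otherwise MS_S is zero and the ratio is undefined).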
A paired t-test for repeated measures (testing the null hypothesis of equal means in the two repeated JT performances; 5% level of statistical significance) and the effect size (ES; Cohen's d for repeated measures, calculated according to Cohen (1988)) were used to assess systematic bias in JT.

RESULTS
Results in TABLE 1 comprise the basic somatic and performance indicators of the research sample. They are divided into two groups in accordance with the two parts of the research: the validity study (females) and the reliability study (males). The somatic indicator is BMI; the motor indicators are aerobic fitness (VO2max) and performance in JT. According to WHO (2010), the observed BMI corresponds to normal weight in females and lies just above the lower limit of the overweight category in males. This interpretation should take into account the well-known limitation of BMI in individuals with above-average muscle development. Given the nature of the sample, this limitation must be respected when interpreting the observed BMI. According to Jacík's (1983) five-grade population norm (age category 19-20 years), the performance achieved by females can be assessed as "above average" and that of males as "average". According to the Czech norm of oxygen uptake at maximal load for females (Seliger & Bartůněk, 1978) (although it is outdated, no more recent norms exist in the Czech Republic), the sample of observed females can be interpreted as strongly above average.

Part A of TABLE 2 describes the relationship between performance in JT and the aerobic fitness criterion (VO2max), together with the relationship of both variables to the somatic indicator BMI. The coefficient of determination (r²) indicates that the proportion of variability shared by JT and VO2max is very low: only 13.3%. According to the calculated correlation, BMI is not a factor significantly determining performance in JT either. The Pearson correlation coefficient (and the corresponding coefficient of determination; TABLE 2, section B) between the absolute differences and the means of repeated individual performances in JT (one-week interval) indicates that the data are homoscedastic.

The paired t-test showed a statistically significant difference in JT points between the first and second measurement (an increase of 3.54 points, t = 3.484, p = 0.001); therefore, the null hypothesis was rejected. The intra-individual performance differences varied widely (from -10 to 31 points); the standard deviation of the differences is 6.5 points. Cohen's d equals 0.41, which corresponds to a medium effect of the factor causing the intra-individual performance differences in the test (Cohen, 1988). Here, the measured difference (3.54 points) represents the measurement bias.
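The reported paired t statistic can be approximately reproduced from the summary values given in the text (mean difference 3.54 points, standard deviation of differences 6.5 points, 41 males with complete data). The short Python sketch below does this; the function name is hypothetical, and the small gap to the reported t = 3.484 presumably comes from rounding of the published summary values.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic and degrees of freedom from summary statistics.

    t = mean difference / (SD of differences / sqrt(n)), df = n - 1.
    """
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Summary values as reported in the text (rounded), n = 41 males.
t_stat, df = paired_t_from_summary(mean_diff=3.54, sd_diff=6.5, n=41)
print(f"t({df}) = {t_stat:.2f}")
```

The sketch deliberately stops at the t statistic; it does not attempt to reproduce the reported Cohen's d, because the text does not state which standard deviation entered that calculation.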
Absolute reliability is expressed by the standard error of measurement (SEM). In male participants, SEM equals 3.96 points. The second indicator of absolute reliability, Bland and Altman's 95% limits of agreement of repeated measurements, is depicted graphically in Fig. 1. The bias is shown as a dotted line in the graph (3.54 points), together with the 95% limits of agreement shown as dashed lines (also a 95% prediction interval; see ±1.96 SD in the graph), whose lower bound equals -9.2 points and upper bound equals 16.3 points.
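The limits of agreement follow directly from the bias and the standard deviation of the retest differences (bias ± 1.96 SD). The Python sketch below shows the calculation both from raw paired scores and from the summary values reported in the text; the function names and the raw example scores are illustrative assumptions, not study data.

```python
from statistics import mean, stdev


def limits_from_summary(bias, sd_diff):
    """95% limits of agreement as bias -/+ 1.96 * SD of the differences."""
    half_width = 1.96 * sd_diff
    return bias - half_width, bias + half_width


def bland_altman_limits(trial_1, trial_2):
    """Bias and 95% limits of agreement from paired test/retest scores."""
    diffs = [second - first for first, second in zip(trial_1, trial_2)]
    bias = mean(diffs)
    lower, upper = limits_from_summary(bias, stdev(diffs))
    return bias, lower, upper


# Check against the summary values in the text: bias 3.54, SD of differences 6.5.
lower, upper = limits_from_summary(3.54, 6.5)
print(f"limits of agreement: {lower:.1f} to {upper:.1f} points")
```

With the rounded summary values this gives limits close to the reported -9.2 and 16.3 points, a span of about 25.5 points.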

DISCUSSION
The research sample included individuals with relatively consistent performance in JT (TABLE 1) (coefficient of variation 11.6% for females; 9.8% for males in the first measurement and 10.4% in the second), whose performance can be classified, based on the general population norm, as above average in females and average in males. These performance levels are expected in PE students because of the above-average amount and intensity of their daily physical activity (their study programme includes a large number of physical activities during the week). Because of this homogeneity, the calculated correlation coefficients can be higher than those reached by the general population. It can be deduced that the indicators of reliability or validity expressed by these coefficients are even lower among the general population.
Fig. 1. A Bland-Altman plot for repeated measurement of Jacík's test. The differences between the trials are plotted against each individual's mean for the two trials in Jacík's test. The bias line and random error lines (+1.96 SD and -1.96 SD) forming the 95% limits of agreement are also presented in the plot.

The degree of correlation between performance in JT and the selected criterion, VO2max, indicates that the validity of JT as an indicator of aerobic fitness is not satisfactory. Considering VO2max as an indicator of aerobic fitness, this motor performance prerequisite accounts for only 13.3% of the overall JT result in female PE students. For comparison, in step tests (depending on the type of test and the population sample) this proportion has been found to range from 16% (Canadian Home Fitness Test) to 56% (Queens College Step Test).
In fact, probably the best-known running tests of aerobic fitness show even higher values (20 m Shuttle Run Test: 58%; Cooper's 12-min Run: 81%). However, as other results suggest, the low validity of the test could be explained by the low reliability of JT.
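The percentages compared above are shares of common variance (r² × 100), so each can be converted back to the magnitude of the underlying correlation coefficient. The Python sketch below performs this conversion for the values quoted in the text; the function name is hypothetical, and the sign of each correlation cannot be recovered from the shared variance alone.

```python
import math


def r_from_shared_variance(percent):
    """Magnitude of the correlation coefficient implied by a shared variance in %."""
    return math.sqrt(percent / 100.0)


# Shared variance with VO2max as quoted in the text.
quoted = {
    "Jacík's test": 13.3,
    "Canadian Home Fitness Test": 16.0,
    "Queens College Step Test": 56.0,
    "20 m Shuttle Run Test": 58.0,
    "Cooper's 12-min Run": 81.0,
}
for test_name, percent in quoted.items():
    print(f"{test_name}: |r| = {r_from_shared_variance(percent):.2f}")
```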
Although the objectivity of the test seems to be highly satisfactory (ICC = 0.997) in male PE students, other items of the reliability analysis do not achieve such good results. The study showed that test performance between two repeated attempts one week apart increased on average by 3.54 points (systematic error) in male PE students. Only 14% of the retest results were lower than in the previous attempt, and the variability of the retest differences was high (from -10 to 31 points). The computed value of Cohen's d (effect size) indicates a medium effect of the factor causing these changes. Given the high demands on the accuracy of the test, we consider this systematic error to be practically significant. We can only speculate whether its source is the learning effect and experience with the test (most of the participants performed the test for the first time). The better participants learn the test, the better they can distribute their effort over the time limit of 2 minutes, a strategy they can use to increase or decrease the rhythm of repeated positions within the attempt. If the learning effect drives the intra-individual differences, the retest differences should decrease with the number of attempts. The retest ICC (TABLE 2) and the range of its 95% confidence interval point to a "questionable" relative reliability (Atkinson & Nevill, 1998). This indicates a very "unsteady" rank order of individuals within the group in repeated testing and, consequently, "unsteadiness" with respect to all normative results. We do not expect that the interval between the tests had a significant impact on increasing or decreasing the individual's performance capacity for this test, which would also be a possible source of measurement error. In the case of this test, other sources of measurement error might be tiredness, changes in health or changes in motivation. The measurements were carried out at the same time, one week apart, and therefore we expect any tiredness to be the same or roughly the same for all athletes (regular training) in both measurements.
In the monitored males, the standard error of measurement in JT was 3.96 points, which equals 17.8% of one category of the population norm compiled by Jacík. We consider this value to be practically significant. The computed Bland and Altman's 95% limits of agreement of repeated measurements (Fig. 1) reflect the fact that 95% of possible future test/retest differences could range from -9.2 to 16.3 points (i.e., a span of 25.5 points). The magnitude of this interval (together with the SEM value) denotes the low level of absolute reliability of JT.
When Jacík originally verified the qualities of the test, reliability was assessed on the basis of correlation analysis only, i.e., by indicators of relative reliability. The computed coefficients of stability (r = 0.854-0.875), long-term stability (r = 0.807) and objectivity (r = 0.981) led him to regard the reliability of the test as satisfactory (Jacík, 1983). However, these results cannot be considered sufficient, because the statistics used (test/retest correlation coefficient as stability and inter-rater correlation coefficient as objectivity) are not sensitive to many measurement errors (e.g., the standard error of measurement mentioned above) (Atkinson & Nevill, 1998; Hopkins, 2000; Thomas, Nelson, & Silverman, 2005). Sources of error such as the learning effect mentioned above cannot be detected by the coefficients computed by Jacík. Contrary to Jacík's results, based on our results in PE students we can confirm neither our assumption of sufficient test reliability nor that of test validity.
Limitations. Several limitations of this study need to be pointed out: 1. although all the participants were PE university students (same university), reliability and validity were verified separately in different groups of individuals (different gender); 2. the higher homogeneity of the research sample (lower data variability) could increase the Pearson correlation coefficients used in the study; 3. the test/retest differences in JT, as in most motor tests, depend on the participants' performance motivation in the individual trials, so performance motivation can be considered a source of measurement error (in this study, this limitation was partly mitigated by explaining to the participants the importance of achieving maximal performance in both trials).

CONCLUSIONS
The study showed considerably poor reliability (absolute and relative) of JT in PE university students. Therefore, the test is not an appropriate tool for assessing PE students' motor abilities, not even aerobic fitness. The results showed that VO2max explains only 13% of the performance in JT, which is highly insufficient for aerobic fitness assessment. Because the interpretation of performance in this test may be highly misleading, we do not consider the test appropriate for any assessment purpose unless the movement content of the test is changed or modified (to increase test reliability).

TABLE 2
Validity and reliability characteristics of Jacík's test: Pearson correlation coefficient (r) and intraclass correlation (ICC)
Note. Absolute value of the difference between repeated measurements of individual performance in Jacík's test at a one-week interval; Mean retest - mean of repeated measurements of individual performance in Jacík's test at a one-week interval; 95% CI - 95% confidence interval.

Because the data are homoscedastic (the error of measurement does not grow with increasing JT score), log-transformation of the obtained data is not necessary. The high ICC (TABLE 2, section B, "Jacík's test Observer 1" vs. "Jacík's test Observer 2") indicates a high level of objectivity, i.e., the measurement is independent of the observer. The ICC for repeated measurements (TABLE 2, section B, "Jacík's test Week 1" vs. "Jacík's test Week 2"), as an indicator of relative reliability, can be interpreted according to Atkinson and Nevill (1998) as "questionable reliability".