Test-retest reliability of survey items on ownership and use of physical activity trackers

Introduction
Today's young adolescents grow up not knowing what life is like without the internet (Underwood & Farrington-Flint, 2015). Through the internet, adolescents can regularly stay connected with others (Yonker, Zan, Scirica, Jethwani, & Kinane, 2015). Constant internet access is often blamed for maladaptive health behaviours (David, Kim, Brickman, Ran, & Curtis, 2015). These include decreased physical activity and increased sedentary behaviours, which may be a contributing factor to the rising rate of overweight and obesity among adolescents (Robinson et al., 2017). However, technological advancements also offer many potential methods for health promotion. One such method is personalised capturing devices with interfaces on mobile phones [...] and retrieve personalised data on almost anything, including aspects of health and lifestyle. As such, this is proving popular for the eHealth and mHealth sectors (Johnston, Hoffman, & Thornton, 2014).
Several commercial products are labelled as wearable devices, which have been defined as “smart electronic devices available in various forms that are used near or on the human body to sense and analyse physiological and psychological data, such as feelings, sleep, movements, heart rate and blood pressure” (Khakurel, Melkas, & Porras, 2018). Commercially developed wearable devices that provide physical activity measurements can include activity trackers, with components such as accelerometers, gyroscopes, heart rate monitors (HRM), global positioning systems, and step-counters. In addition, these components can be built into phones, where they can be accessed by smartphone applications (apps). Although an HRM is a physical device with hardware components specific to its purpose, and an app is installed on a person's phone and has a graphical interface that mediates between the user and the hardware, both function as physical activity trackers (PAT).
There is much potential to use tailored, biopsychosocial information obtained through PAT and other wearables to help promote physical activity (Fanning, Mullen, & McAuley, 2012; Lubans, Smith, Skinner, & Morgan, 2014). Ahmadvand, Gatchel, Brownstein, and Nissen (2018) believe that digital tools form a new paradigm for biopsychosocial-digital interventions. For example, interventions in which PAT were tested among children with leukaemia (Hooke, Gilchrist, Tanner, Hart, & Withycombe, 2016), or in which physical activity was promoted indirectly through games (Garde et al., 2015), had positive effects on the main outcomes. These various approaches can reach a variety of population groups to achieve individual goals and targets for tracking the daily dose of physical activity, increasing physical activity as an intervention, and assisting the training of athletes.
Researchers are interested in whether these devices can be used to promote physical activity (Hooke et al., 2016). Much of the prior research on PAT has concerned the accuracy of devices against gold standards in physical activity measurement, or against each other (Namba et al., 2012). However, for young adolescents, the fact that one device is less accurate than another is less important than the overall design, comfort and ease of use (Ridgers, McNarry, & Mackintosh, 2016). Given their broad commercial appeal and reach across populations, PAT may provide feedback to the participant that makes an intervention to change behaviours appealing.
A prominent motivation theory that can be used to link behaviours common in PAT is Control Theory (Carver & Scheier, 1982). Users require regular self-monitoring to sustain usage. This prompts an increase in the individual's sense of responsibility, and they take an active role in the goal setting process, from setting realistic goals, through creating plans and supporting self-control, to achieving goals.
Goal setting is important for sustained physical activity (Lyons, Lewis, Mayrsohn, & Rowland, 2014). Features of PAT can support fitness, friends and fun, and are specifically useful in maintaining physical activity levels throughout adolescence (Hardie Murphy, Rowe, & Woods, 2016). Moreover, adolescence is a time when health-related habits, including regular physical activity, can carry into adulthood (Telama et al., 2014). Therefore, it is important for researchers to be able to establish the mechanisms by which PAT are used to adopt and maintain healthy behaviours. To date, only a few intervention studies on young adolescents have used PAT (McCallum, Rooksby, & Gray, 2018), although there has been increased interest recently (Ridgers et al., 2017).
PAT is a fast-growing area in terms of commercial development as well as scientific research. The majority of product design features have been made with the adult market in mind, and currently PAT are neither user-friendly nor appealing to youth (Ridgers et al., 2016). Yet the potential of PAT to assist with long-term behaviour change in youth is vast. PAT provide regular and tailored feedback both objectively and subjectively, and this information, if communicated appropriately, has strong potential to increase exercise self-efficacy (Bandura, 2004). Self-efficacy is an important component of social cognitive theory, and a positive correlate of a child's physical activity behaviour (Van Der Horst, Paw, Twisk, & Van Mechelen, 2007). Therefore, it is important to understand whether children own these devices or apps, whether they use them, and what factors are correlated with ownership and use (Dos Santos, Bredehoft, Gonzalez, & Montgomery, 2016). A reliable instrument to measure ownership and usage is important, yet none currently exists. Only through reliable instruments is it possible to track changes over time, to establish comparable data across populations in large cross-national studies, and to provide baseline data to help inform interventions. There are potential implications for public health and behavioural scientists who use PAT items, particularly when they are administered as part of a large population survey, where examining associations across a variety of health behaviours can be appealing (Currie & Alemán-Díaz, 2015). Therefore, the aim of this study is to examine the test-retest reliability of the items on ownership and use of physical activity trackers among young adolescents.

Participants
The study followed the principles of the Helsinki Declaration (World Medical Association, 2013). The Ethical Committee of the Faculty of Physical Culture, Palacký University Olomouc approved the study (under reg. no. 7/2017), based on the principals' agreement and parents'/guardians' informed consent for the adolescents taking part in the survey. All adolescents were given the option to withdraw from the study at any time, and the data were collected through anonymised pen-and-paper questionnaires, which were inserted into blank envelopes after being completed.
Seven schools from four administrative districts of the Olomouc region in the Czech Republic were selected at random and invited to take part in the test-retest study during September to November 2017. In total, 1,017 pupils were registered in the classes enrolled in the survey. The first part of the data collection (Test) included 856 children with a response rate of 84.1%. We did not obtain written consent from 71 parents or guardians, 2 pupils withdrew from the study during the survey, and the remaining 88 pupils were missing at the time of survey due to illness (n = 62) or other unspecified reasons. The second part of data collection (Retest) took place exactly three weeks later with the identical questionnaire in the same classes and included 857 pupils with a response rate of 84.2%. The increase in the number of pupils was attributed to absences from the first data collection. The data file was then reduced to the matched cases only (n = 755).

Physical activity measurement
A self-reported physical activity measure was used. Respondents were asked to report the number of days in the past week on which they had taken part in moderate-to-vigorous intensity physical activity for at least 60 minutes. The standardised item has acceptable psychometrics for large surveys (Biddle, Gorely, Pearson, & Bull, 2011). In line with global physical activity recommendations for health (World Health Organization, 2010), respondents were divided into two groups: meeting the recommendations (7 days) and not meeting the recommendations (0-6 days). Such crude cut-offs have been criticised for losing detail in physical activity behaviours, especially as taking part in any activity can be seen to be beneficial (Janssen & LeBlanc, 2010). Therefore, the test-retest reliability of the entire scale is also reported in this paper, and the variable based on the aforementioned cut-offs is used as a descriptive variable.

Procedures
The study aimed at developing and validating a questionnaire comprehensively investigating leisure-time use (with a focus on physical activity) and its correlates in 11- to 15-year-olds. The questionnaire was administered in the 5th, 7th and 9th grades by trained research assistants, who replaced teachers during regular class time. A gap of three weeks was chosen to balance reducing memory recall against limiting the time for changes in behaviours. Next, the questionnaires from both waves of data collection were paired using a unique ID code that ensured the anonymity of the data (consisting of the first letters of the mother's and father's given names, and the day and month of birth).

Statistical analyses
IBM SPSS Statistics (Version 24 for Windows; IBM, Armonk, NY, USA) was used for all analyses, with the confidence level set at 95% (α = .05). Descriptive statistics were compared through chi-square tests of independence. To control for bias of non-respondents, a Student's t-test was conducted to test the difference in physical activity between respondents with and without missing data on PAT. Overall stability was determined by the proportion of participants showing no shift in response between the test and retest. After adjustment for recent acquisition, we used the single measure of intraclass correlation coefficients (ICC) to measure reliability. More specifically, the ICC was tested with the two-way random model with an absolute agreement type. We report 95% confidence intervals (CI) for the non-stratified model and for models stratified by gender or by age to describe the variation in the ICCs. If the 95% confidence intervals did not overlap, the values were considered to be significantly different. Acceptable reliability criteria were based on the Landis and Koch divisions of agreement (Landis & Koch, 1977). Cohen's Kappa statistics were used to estimate the stability of the variables after dichotomising them into owners and non-owners. Interpretation of Cohen's Kappa used the following cut-offs: 0 indicating no agreement, .01-.20 poor agreement, .21-.40 fair, .41-.60 moderate, .61-.80 good, and .81-1.00 almost perfect agreement (Cohen, 1988).

Background variables
All participants were asked to report some basic background variables, including gender, age, and family affluence. Age was calculated from the month and year of birth to the date of the first test. Age categories were formed for the age nearest to 11, 13, and 15 years old. Family affluence, a child-friendly proxy measure of socioeconomic status, was measured with the Family Affluence Scale III (FAS). Three groups were formed in line with the recommendations for reporting FAS (Inchley et al., 2016), whereby ridit-transformed scores were classified as low (0-.20), medium (.21-.80), and high (.81-1) FAS.

Physical activity trackers
The wording of the PAT items of the survey was based on marketing material from a leading company in commercial HRM devices. The items were then reviewed by an expert panel on adolescents' surveys before the final terminology was adopted.
Two items were used to assess the ownership and usage of PAT. The header question was, "Do you have any of the following physical activity measuring devices?" Item one was, "smartphone application", and item two was, "Heart rate monitor / sports watch". Response categories were "no", "yes, but I do not use it actively", and "yes, I use it actively". For the purpose of reliability analyses, the entire response scales were used, and the responses were also dichotomised into "no" (0) and "yes" (1).

Recent acquisition
One of the problems faced in test-retest studies is a change of circumstances between the two time points. As digital health is a fast-growing area, ownership may change because the individual may have bought or enabled such a device or downloaded a new app. Therefore, the respondents were asked, "Have you installed an app in the past three weeks?" and "Have you acquired a heart rate monitor or sports watch in the past three weeks?". Both questions had the same response items. Responses were recoded in two circumstances. In scenario one, if a person responded "no" in the test, "yes" in the retest, and "yes" in recent acquisition, they were labelled as "no" for the purposes of test-retesting. In scenario two, if the participant's response was "yes, but I do not use it actively" in the test, "yes, and I use it actively" in the retest, and "yes, and I use it actively" in the acquisition item, their response was recoded as "yes, but I do not use it actively".
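
The two recoding scenarios for recent acquisition, together with the dichotomisation of the PAT responses, can be sketched in code. The Python sketch below is illustrative only: the analyses in this study were run in SPSS, and the function names, the response-label constants, and the reading of "yes" in scenario one as either of the two "yes" categories are assumptions made for the example.

```python
# Response categories for the PAT items (labels as worded in the survey items).
NO = "no"
OWN_INACTIVE = "yes, but I do not use it actively"
OWN_ACTIVE = "yes, I use it actively"


def adjust_for_acquisition(test, retest, acquired):
    """Return the retest response after adjusting for recent acquisition.

    `acquired` is the answer to the recent-acquisition item, which shares
    the response categories of the PAT items.
    """
    # Scenario one: "no" at test, "yes" at retest, and a recent acquisition.
    # Assumption: either "yes" category counts as "yes" here.
    if test == NO and retest != NO and acquired != NO:
        return NO
    # Scenario two: inactive owner at test, active owner at retest, and an
    # actively used recent acquisition.
    if test == OWN_INACTIVE and retest == OWN_ACTIVE and acquired == OWN_ACTIVE:
        return OWN_INACTIVE
    # In all other cases the retest response is left unchanged.
    return retest


def dichotomise(response):
    """Collapse the three categories into non-owner (0) and owner (1)."""
    return 0 if response == NO else 1
```

Under these assumptions, a respondent who answered "no" at test and "yes, I use it actively" at retest, and who reported a recent acquisition, is treated as "no" at retest, so that a genuine change in ownership is not counted as unreliability of the item.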

Descriptive statistics
After final cleaning of the data (incomprehensive data n = 8, missing gender n = 2, missing age n = 4), the sample in this study included 741 adolescents (53% boys, 47% girls; 47% 11-year-olds, 33% 13-year-olds, and 20% 15-year-olds). Fewer adolescents responded to the questions on PAT in the test than in the retest (apps 492 vs 566, HRM 490 vs 516), with test responses representing 66% of the sample. Data for adolescents with both test and retest responses for apps (n = 461) and HRM (n = 475) are presented in Table 1. Table 1 provides an overview of the overall proportions of adolescents who completed both test and retest and who 1) do not own; 2) own, but do not use actively; and 3) own and use actively apps and HRM.
A Student's t-test between respondents with and without data on PAT was conducted for the physical activity variable. For both apps (p = .067) and HRM (p = .079), respondents with missing data were not significantly different from those with available data. The majority of missing data on apps were from 11-year-olds (51%), fewer from 13-year-olds (23%) and even fewer from 15-year-olds (11%). According to chi-square tests of independence, there were significant differences in missing responses across the ages (χ² = 101.88, p < .001). Therefore, data for PAT were more evenly distributed across the ages (35% 11-year-olds, 38% 13-year-olds, and 27% 15-year-olds) than in the original sample. There were no differences in response rates between boys and girls (χ² = 3.00, p = .392).
Note. HRM = heart rate monitor or sports watch; FAS = Family Affluence Scale III; MVPA = moderate-to-vigorous physical activity.

Reliability analyses
PAT was dichotomised into non-owners and owners (Table 2). Cohen's Kappa is reported for the overall sample and stratified by gender, age or FAS. There was a change in the interpretation among boys and 11-year-olds, where there was large agreement for both apps and HRM. In addition, large agreement was reported among 15-year-olds in app ownership. The ICC was calculated for the overall sample and stratified by gender, age, family affluence and physical activity. There was moderate agreement across the sample in apps and HRM. However, agreement was only fair for girls for both apps and HRM, and for 13-year-olds for apps. Confidence intervals did not overlap between boys and girls for HRM. All the stratification variables showed almost perfect and large agreement: gender (Kappa = .973), birth month (ICC = .998), birth year (ICC = .994) and FAS (ICC = .915). There was good and moderate agreement for moderate-to-vigorous physical activity (ICC = .797, Kappa = .455). Stability of PAT is represented in Figure 1 with the proportion of shifts in responses between test and retest. Almost two thirds (64%) of adolescents reported no shift on apps, over a quarter (28%) reported a shift of one category, and fewer than one in ten (8%) reported a shift of two categories, after adjusting for any recent acquisition of apps or HRM. Compared with apps, the proportion of adolescents with no response shift was higher for HRM (77%), as was the proportion with a shift of two categories (10%).
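
The agreement statistics reported above can be illustrated with a short sketch. The Python code below is a hedged illustration rather than the SPSS procedure used in this study: it computes Cohen's Kappa for paired categorical responses, the single-measure, two-way random, absolute-agreement ICC from the standard two-way ANOVA mean squares, and a label based on the cut-offs given in the Statistical analyses section. The function names are assumptions, confidence intervals are omitted, and undefined cases (for example, no variance in the responses) are not handled.

```python
from collections import Counter


def cohen_kappa(test, retest):
    """Cohen's Kappa for paired categorical responses (e.g., owner = 1, non-owner = 0)."""
    n = len(test)
    observed = sum(a == b for a, b in zip(test, retest)) / n
    test_counts, retest_counts = Counter(test), Counter(retest)
    # Chance agreement from the marginal proportions of each category.
    expected = sum(test_counts[c] * retest_counts[c] for c in test_counts) / n ** 2
    return (observed - expected) / (1 - expected)


def icc_absolute_single(rows):
    """Single-measure, two-way random, absolute-agreement ICC.

    `rows` holds one list per respondent with one score per occasion
    (two occasions in a test-retest design).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ms_rows = ss_rows / (n - 1)   # between-respondent mean square
    ms_cols = ss_cols / (k - 1)   # between-occasion mean square
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(value):
    """Label a coefficient with the cut-offs listed in Statistical analyses.

    Assumption: negative coefficients are also labelled "no agreement".
    """
    if value <= 0:
        return "no agreement"
    cutoffs = [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")]
    for upper, label in cutoffs:
        if value <= upper:
            return label
    return "almost perfect"
```

With these cut-offs, the values reported for moderate-to-vigorous physical activity (ICC = .797, Kappa = .455) are labelled "good" and "moderate", matching the interpretation given in the text.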

Discussion
According to the results of this study, there was good reliability in a three-week test-retest of the PAT items (apps and HRM) among boys, and lower reliability among girls. Reliability was consistent across the age groups. After dichotomisation of PAT into owners and non-owners, there were moderate correlations for boys and girls. Similarly, there was good agreement for 11-year-olds and moderate agreement for 13- and 15-year-olds.
The differences between boys and girls require consideration. There were no statistical differences between non-ownership and ownership of apps or HRM between boys and girls. According to control theories, ownership of apps or HRM is an indicator of more interest in physical activity and digital technology. Another perspective on interest in physical activity could come from examining physical activity levels (Hagger & Chatzisarantis, 2014), whereby previous studies on young adolescents have reported significantly large differences in physical activity levels between boys and girls (Kalman et al., 2015). In addition, boys have reported spending more time on computer-based activities than girls (Sigmundová et al., 2017). Therefore, we infer that boys have better knowledge of physical activity tracking devices and can therefore respond to the items more reliably than girls.
The proportion of ownership also did not differ significantly across ages for either apps or HRM. In a previous study carried out in Finland, ownership and use of PAT were reported to increase as young adolescents got older, from 11 to 15 years (Ng, Tynjälä, & Kokko, 2017). There could be cultural differences among adolescents between the Czech Republic and Finland. For example, reliance on mobile technology in Finland has risen since the popularity of Nokia in the 1990s, which coincided with the demise of landline phones (Statistics Finland, 2015). The longer history of smartphone usage among youth in Finland may be a reason for such differences. Therefore, even though young adolescents may start accumulating independent money as they get older, adolescents in the Czech Republic may tend to spend money on other items, such as fashion, food and taking part in leisure activities (Inchley et al., 2016). In both this study and Ng et al. (2017), low numbers of young adolescents reported owning HRM. This may be related to the specific functioning of HRM and its costs; however, these conclusions warrant further study. We also found that family affluence was not associated with the ownership or usage of apps or HRM, which may suggest that the commercial products used by the adolescents are priced appropriately. The correlations among adolescents in the high FAS group were good for HRM, and agreement of ownership was fair for individuals in the low FAS group. The acceptability of HRM usage among the low FAS group may be one method to improve the issues surrounding the lack of comfort of wearables among adolescents (Ridgers et al., 2016).
Approximately half of adolescents reported owning apps and a quarter reported owning HRM. This is much lower than the figures reported in Finland (Ng et al., 2017). However, it is not surprising because the majority of the apps and devices were designed for adults. For example, in a review of studies on the impact of physical activity apps and wearables, only 8/111 studies (7%) included children (McCallum et al., 2018). Young adolescents have reported a low usage of apps and HRM because they are not child-friendly (Goodyear, Kerner, & Quennerstedt, 2019). In addition, feasibility studies have reported design and comfort as important factors for product development (Ridgers et al., 2016). It is worth noting that the quality of applications designed for children and adolescents (assessment of engagement and quality of information) correlates with the number of techniques to change health behaviours identified in the app. Also, the number of app features correlates positively with an assessment of engagement (Schoeppe et al., 2017). Research into the factors impacting the use of digital-based interventions (apps, trackers) by youth is needed. A better understanding of what young adolescents feel is acceptable will inform future feasibility studies using digital tools to promote physical activity.
Although adolescents identified many benefits of using modern technologies for the purpose of improving health behaviours (e.g., motivational aspects, freedom from judgment, easy access), they also indicated limitations that result from technology use, for example the possibility of distractions and negative social comparisons (Radovic, McCarty, Katzman, & Richardson, 2018). Careful designs for either current use of existing products or new products for observation and interventions may need to consider such restrictions. More education on the user experience is needed, such as feedback and motives as suggested through Control Theory (Carver & Scheier, 1982). Young adolescents may need to feel they can make use of the positive aspects of technology while avoiding potential information addiction, or maladaptive behaviours such as bullying, isolation, and depression based on information from PAT (Piwek, Ellis, Andrews, & Joinson, 2016).
To the authors' knowledge, this is the first study to investigate the test-retest reliability of PAT items. However, the instrument is in its infancy, and there are study limitations to consider when interpreting the results. Specific details about the types of PAT, such as pedometers, smartwatches, or heart rate monitors, were not stated in the items. A test-retest of a product can be problematic because changes in behaviours may occur between the test and retest. We attempted to address this by including an item about recent acquisition and recoding responses accordingly. However, to strengthen the scientific use of these items, validity studies for both PAT and acquisition items are needed. The test and retest took place in schools in one specific region of the Czech Republic, and results may differ in other areas, regions or countries.