Table 1
A priori hypotheses and predictions driving experimental treatments or inclusion of additional explanatory and control variables.
| | Hypothesis | Prediction |
|---|---|---|
| 1) | Social interaction fosters engagement in YardMap. | Participants in the social version of YardMap will be more active in the project than participants in the non-social version and will log in more often. |
| 2) | Based on activity theory (Krasny and Roth 2010), mapping and identification of sustainable practices increases content knowledge. | Post-pre-test differences will be greater in the mapping (only) and social mapping treatments than in the waitlist control. |
| 3) | Activity in the project (a measure of effort) will increase learning. | The number of logins into the project will be positively associated with the post-pre test difference. |
| 4) | Based on theories of social learning, social interaction within YardMap will increase content knowledge more than will non-social mapping. | Post-pre-test differences will be greater for participants using the fully social mapping application than for participants using the mapping application stripped of social tools. |
| 5) | The amount of learning that can be detected is lower for higher pre-test scores. | Pre-test score will be negatively associated with post-pre differences in content knowledge. |

Figure 1
Example of participant-generated YardMap. Participants in the social version can view each other’s yard practices (site characteristics), habitat types, and birds seen as well as forum comments and the news feed of all comments.
Table 2
Different kinds of experiences available to participants in the social vs. non-social version of the YardMap Web Application.
| Social treatment | Non-social treatment |
|---|---|
| Participants can peek at others’ maps by browsing and clicking items in a shared map interface. | Participants can view points representing maps in a shared map interface but cannot peek at others’ maps. |
| Participants can comment directly on each other’s maps and items within those maps. | Participants cannot see maps or objects and cannot comment. |
| Participants can access the forum in YardMap (i.e., can post, like, comment, share, follow) and see the newsfeed in their own map page. | No forum access. |
| Participants can access learn articles and infographics. | Participants can access learn articles and infographics. |

Figure 2
The message that waitlist control participants saw on the YardMap Website after they took the pre-test. This message remained when they signed in until they completed the post-test about 8 weeks later.

Figure 3
Number of logins by individuals over 8-week study period. Logins are distributed as a Zipf curve.
Table 3
Pre-test scores. Scores on pre-test for the three categories of tests using all participants who completed the pre-test regardless of whether they completed the post-test.
| Measure | Bird-IDs | Tree-IDs | Ecological concepts |
|---|---|---|---|
| Number of questions | 8 | 6 | 5 |
| Mean ± SEM | 3.7 ± 0.08 | 4.75 ± 0.06 | 3.07 ± 0.05 |
| Median | 4 | 6 | 3 |
| Skew | 0.1 | –0.53 | –0.1 |
| Kurtosis (sharpness of peak) | –0.69 | –0.95 | –0.75 |
| Sample size | 591 | 586 | 580 |
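
The summary statistics in Table 3 can be reproduced from raw scores along the following lines. This is a minimal sketch, not the authors' analysis code: the function name `describe_scores` is hypothetical, and it assumes moment-based skewness and excess kurtosis (kurtosis minus 3, so a normal distribution scores 0), which the negative values in the table suggest but the caption does not confirm.

```python
from math import sqrt


def describe_scores(scores):
    """Return (mean, SEM, skewness, excess kurtosis) for a list of test scores.

    Assumption: skewness and kurtosis use population (biased) central
    moments; statistical packages may apply small-sample corrections.
    """
    n = len(scores)
    mean = sum(scores) / n
    # Central moments of order 2, 3, and 4
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    # SEM uses the sample standard deviation (n - 1 denominator)
    sem = sqrt(m2 * n / (n - 1)) / sqrt(n)
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, sem, skew, excess_kurtosis


# Hypothetical scores on an 8-question test, for illustration only
example = describe_scores([2, 3, 3, 4, 4, 4, 5, 6])
```

A negative excess kurtosis, as in all three columns of Table 3, indicates a distribution flatter than a normal curve.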

Figure 4
Distributions of pre-test scores (a–c) and post-pre differences in scores (d–f) for the three types of questions.
Table 4
Pre-test bias among ~560 participants completing the pre-test. Results of GLMs (for Bird-ID and ecological concepts) and non-parametric analyses for Tree-ID to determine whether pre-test scores were random with respect to treatment. Sample included all participants who completed the pre-test within 50 minutes, including those who did not take the post-test.
| Scores on pre-tests | Explanatory variable | Effect size | Test statistic | P-value |
|---|---|---|---|---|
| Bird-IDs (GLM, negative binomial, n = 591) | Control v. two treatments combined | 0.06 ± 0.04 | t = 1.29 | 0.20 |
| | 1 control and 2 separate experimental treatments | 0.04 ± 0.03 | t = 1.77 | 0.08 |
| Tree-IDs (Mann-Whitney U, Kruskal-Wallis, n = 555) | Control v. two treatments combined | 0.38 ± 0.17 | W = 37,886 | 0.01* |
| | 1 control and 2 separate experimental treatments | – | Chi-square = 6.67 | 0.036* |
| Ecological concept questions (GLM, Gaussian, n = 580) | Control v. two treatments combined | –0.02 ± 0.09 | t = –0.19 | 0.85 |
| | 1 control and 2 separate experimental treatments | 0.03 ± 0.05 | t = 0.48 | 0.63 |
Table 5
Post-test bias. Results of Generalized Linear Models to determine the effect of treatment and pre-test score on the tendency for participants to take (1) the post-test or not (0) (binomial response variable). N = 560 participants.
| Explanatory variable | Effect size | z | P-value |
|---|---|---|---|
| Control (0) v. Two treatments (1) | –0.72 ± 0.19 | –3.83 | ≤0.001* |
| Three separate categories: Control, non-social, social (0, 1, 2) | –0.38 ± 0.10 | –3.62 | ≤0.001* |
| Bird-IDs pre-test score (0–8) | 0.15 ± 0.05 | 3.15 | ≤0.002* |
| Tree-IDs pre-test score (0–6) | –0.002 ± 0.068 | –0.03 | 0.97 |
| Ecological concepts pre-test score (0–5) | 0.01 ± 0.08 | 0.15 | 0.88 |
Table 6
Results of General Linear Models to test predictions regarding learning as measured by post-pre differences. The response variable was post-pre difference in scores for Bird-ID, Tree-ID, and Ecological concepts (analyzed separately). The explanatory variables included experimental treatment (waitlist control was coded as zero; the non-social treatment was coded as 1; and the social treatment was coded as 2), number of logins as a measure of a participant’s activity in the project, and pre-test score (birds, trees, or ecological concepts). The r-square for these analyses ranged from 0.10–0.27.
| Explanatory variable | Bird-ID: estimated effect size ± SEM | Bird-ID: t | Bird-ID: p-value | Tree-ID: estimated effect size ± SEM | Tree-ID: t | Tree-ID: p-value | Ecological concepts: estimated effect size ± SEM | Ecological concepts: t | Ecological concepts: p-value |
|---|---|---|---|---|---|---|---|---|---|
| Treatment (0 vs. 1 and 2 combined) | 0.11 ± 0.14 | 0.77 | 0.44 | –0.03 ± 0.09 | –0.17 | 0.86 | 0.08 ± 0.07 | 0.49 | 0.49 |
| N logins | –0.01 ± 0.07 | –0.20 | 0.84 | 0.02 ± 0.03 | 0.82 | 0.41 | 0.08 ± 0.05 | 0.14 | 0.14 |
| Pre-test score | –0.41 ± 0.06 | –7.14 | <0.001*** | –0.55 ± 0.06 | –8.64 | <0.001*** | –0.57 ± 0.05 | –12.47 | <0.001*** |
| Treatment (1 vs. 2) | 0.17 ± 0.17 | 0.99 | 0.32 | 0.12 ± 0.18 | 0.67 | 0.50 | –0.14 ± 0.13 | –1.09 | 0.28 |
| N logins | –0.01 ± 0.07 | –0.10 | 0.92 | 0.02 ± 0.03 | 0.85 | 0.39 | 0.08 ± 0.05 | 1.44 | 0.15 |
| Pre-test score | –0.41 ± 0.09 | –4.55 | <0.001*** | –0.54 ± 0.09 | –5.97 | <0.001*** | –0.53 ± 0.06 | –8.79 | <0.001*** |
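
The model structure described in the Table 6 caption (post-pre difference regressed on treatment, number of logins, and pre-test score) can be sketched as an ordinary least squares fit. This is an illustration under stated assumptions, not the authors' analysis: the helper `fit_ols` and the six data rows are hypothetical, and the toy outcome is constructed so that only pre-test score matters.

```python
def fit_ols(predictors, response):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    predictors: list of rows, e.g. (treatment, n_logins, pre_score).
    An intercept column is added automatically.
    Returns [intercept, b_1, b_2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in predictors]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, response))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Hypothetical participants: (treatment 0/1/2, number of logins, pre-test score)
toy_predictors = [(0, 1, 2), (1, 3, 5), (2, 2, 7), (0, 4, 1), (1, 1, 6), (2, 5, 3)]
# Toy outcome built as 3 - 0.5 * pre_score, so treatment and logins have no effect
toy_diff = [3 - 0.5 * pre for _, _, pre in toy_predictors]

intercept, b_treatment, b_logins, b_pre = fit_ols(toy_predictors, toy_diff)
```

In this toy fit the pre-test coefficient is negative and the other two are approximately zero, which mirrors the qualitative pattern in Table 6. A real analysis would also report standard errors and p-values, which this sketch omits.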

Figure 5
Linear regression lines for the relationship between pre-test score and post-pre-test difference. R² values ranged from 0.11 to 0.75 and were lowest for birds and highest for ecological concepts.
Table 7
Recommendations for how to proceed with controlled studies of online learning.
| Potential problem | Recommendation |
|---|---|
| Increased learning in waitlist control | Design study with two waitlist controls and unseen questions. |
| Insufficient variation in pre-test scores | When using instruments that have not been validated, test the pre-test with 50 random participants to make sure there is enough variation in pre-test scores to detect an increase. |
| Differences among treatments in pre-test scores | Increase the sample size to allow segmentation of the pre-test data in a way that homogenizes pre-test scores among treatments. |
| Learning potential declines with pre-test score | Include pre-test score as an explanatory variable in analyses of learning. |
| Too few high-effort participants in sample | Increase sample size to provide a more robust sample of high-effort participants, allowing segmentation of the data to study effects of activity in the project on learning. |

Figure 6
Expected relationship between pre-test score and learning difference (post-test minus pre-test score) when learning occurs versus when it does not. If learning occurs, we predict that the learning difference is highest for participants with low pre-test scores and declines to zero for participants with perfect pre-test scores, who cannot demonstrate learning based on the questions asked. In contrast, if there is no learning, we expect no relationship between pre-test score and the post-pre learning difference (a line with zero slope). A steeper negative slope suggests that more learning has occurred.
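
One simple way to picture the two lines in Figure 6 is a ceiling model in which each participant closes a fixed fraction of the gap between their pre-test score and a perfect score. This is an assumption made for illustration only; the function `expected_gain` and the parameter `learn_fraction` are hypothetical and do not come from the study.

```python
def expected_gain(pre_score, n_questions, learn_fraction):
    """Expected post-pre difference under a simple ceiling model.

    Assumption: a participant closes `learn_fraction` (0 to 1) of the gap
    between pre_score and the maximum possible score, n_questions.
    The resulting line has slope -learn_fraction and reaches zero at a
    perfect pre-test score.
    """
    return learn_fraction * (n_questions - pre_score)


# Bird-ID test has 8 questions (Table 3); the fractions are illustrative
with_learning = [expected_gain(pre, 8, 0.5) for pre in range(9)]
no_learning = [expected_gain(pre, 8, 0.0) for pre in range(9)]
```

Under this sketch, `learn_fraction = 0` gives the flat no-learning line, and larger fractions give steeper negative slopes, consistent with the interpretation in the caption.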
