# PREreview of "Adverse weather amplifies social media activity"

DOI: 10.5281/zenodo.11943098
License: CC BY 4.0

I have read a revised draft of "Adverse weather amplifies social media activity" [1] with interest and provide my evaluation below. Because the editorial team has already judged the theoretical contribution and scope of interest sufficient, my focus is on whether the analytic methods justify the claims made.

## Introduction

Authors could consider rephrasing their main effects (in the title and elsewhere) in terms of "deviations" or similar, rather than "adverse weather". This is because effects on social media use are observed, for example, between the 15-20°C and 20-25°C temperature bins, and many would not find the latter "adverse". Viewed in this light, the effects of adverse weather are only an (extreme) part of the story.

Authors present a "for and against" list of social media effects on p. 3 (para 4). Since this is intended to be brief and without detail, I would only ask authors to consider rounding out the evidence presented by including prominent empirical papers such as [2], [3], and [4].

Authors' first research question concerns the volume of social media activity. If the scope of their data allows, authors could consider clarifying in (supplementary) analyses whether these effects are driven by individuals posting more frequently (more posts per user), by more individuals posting (more users), or (most likely) by a combination of both; one possible decomposition is sketched below.
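
For example, if post-level records with a user identifier, county, and date are available, something like the following would separate the two margins before re-estimating the weather specification on each. The object and column names here are my assumptions for illustration, not the authors' actual data structure.

```r
# Hypothetical sketch: decompose county-day activity into an extensive
# margin (number of unique posters) and an intensive margin (posts per
# active user). Names are placeholders, not the authors' data.
library(dplyr)

county_day <- posts %>%                   # `posts`: one row per post
  group_by(county, date) %>%
  summarise(
    n_posts        = n(),                 # total activity (current outcome)
    n_users        = n_distinct(user_id), # extensive margin
    posts_per_user = n_posts / n_users,   # intensive margin
    .groups = "drop"
  )

# The weather specification could then be re-estimated with `n_users` and
# `posts_per_user` as outcomes to see which margin drives the main effect.
```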

## Data

If possible, authors could improve their data descriptions (p. 4) by indicating the numbers/rates of observations that remain after each sequential exclusion. For example, "The Facebook data consist of ... who selected English as their language (XX%), chose the U.S. as their country of residence (XX%)", and so on.

Authors have explained adequately why the raw data cannot be shared, and have made some effort by sharing synthetic data. I suggest, however, that authors improve their descriptions of the provenance of these data. They write on p. 1 that "We used aggregated Facebook data published previously (Baylis et al. 2018; Coviello et al., 2014).", and on p. 4 that data "are derived" and "were collected". Some of this language might be due to efforts to anonymize the submission, but it does not help clarify who collected the data, how, and when. I think those questions are critical even though (and possibly especially because) the data cannot be shared, and suggest that authors describe them in more detail. For example, what code / API endpoint was used to gather the Twitter data? Who collected it, and where was it first and most fully described? How did authors obtain it from the authors of those previous papers (if that is indeed what happened) if the terms of service do not allow sharing it?

## Analyses

I have some questions that the authors could consider answering, potentially revising the text so that future readers are better informed.

  • Why maximum temperature (rather than, for example, daily mean temperature)?

  • "We also control for daily temperature range, percentage cloud cover, and relative humidity, represented here via h(μ)." (p.6)

    • It is unclear to me what "h(μ)" above indicates: which of "daily temperature range, percentage cloud cover, and relative humidity" does μ refer to, and should it not then carry a [t] subscript for the day of study?

    • Many of my comments regarding the equations reflect my background in psychology, where we are more familiar with multilevel-model equations than with fixed-effects specifications. Authors might consider this feature of their audience in (potential) revisions.

  • "We estimate our relationships of interest using indicator variables for each 5°C maximum temperature and temperature range bin, for each 1cm precipitation bin, and for each 20 percentage point bin of cloud cover and relative humidity (represented here by f(), g(), and h() respectively)." (p. 6)

    • It is also unclear to me whether/how f() is used here to refer to "indicator variables for each 5°C maximum temperature and temperature range bin" when f() in the equation is written as f(tmax[jmt]). Is this notation intended to mean that the effect of tmax varies between temperature ranges? Consider clarifying, as many readers might not be familiar with this notation. (I sketch how I currently read the specification after this list of questions.)

  • Could authors confirm whether model 2 should not read, e.g., d(tmax[jmt], precip[jmt]) rather than multiplying f() and g()?

  • "For the purposes of computability, we take a simple random sample from this larger sampling frame of individuals to create a panel of 10,000 individuals representative of the frequent users in our Twitter data." (p. 12)

    • Is this a simple random sample of user IDs? If so, frequency of use is not weighted in the selection, and the reference to "frequent users" here may not be accurate. Consider clarifying.

    • How does the model account for imbalance in county populations? Do populous counties carry more weight in the analyses, and is the aggregate estimate therefore for a mean-sized county?
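
To make the source of my confusion concrete, below is how I currently read the binned specification, written with explicit indicator variables. This is only my reading for illustration: the subscripts, controls, and fixed-effects structure are my assumptions rather than the authors' exact model, and if this reading is wrong, that itself suggests the notation would benefit from clarification.

```latex
y_{jmt} = \sum_{k} \beta_k \,\mathbf{1}\{\mathrm{tmax}_{jmt} \in \mathrm{bin}_k\}
        + \sum_{l} \gamma_l \,\mathbf{1}\{\mathrm{precip}_{jmt} \in \mathrm{bin}_l\}
        + h(\mathbf{x}_{jmt}) + \alpha_j + \tau_{mt} + \varepsilon_{jmt}
```

Here I take j, m, and t to index (roughly) county, month, and day; the first two sums collect the temperature and precipitation bin indicators (with one bin omitted as reference); h(x) collects the binned controls (temperature range, cloud cover, relative humidity); and α and τ denote the fixed effects. Writing the models out in (something like) this form, or stating explicitly what f(), g(), and h() expand to, would help readers from outside economics.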

In addition,

  • Could authors please supplement or replace in-text p-values with more informative confidence intervals (or similar) when reporting estimates?

  • Could authors comment on their models' (extremely high) R2 values, and help readers understand how these relate to the "projected model R2" values?

  • Authors' analyses do not account for spatial autocorrelation between cities/geographic features. The predictors and outcomes are likely more similar among nearby counties and, more importantly, weather changes in one county might affect neighboring counties' social media use via social groups that span county lines. It would therefore be important to account for spatial autocorrelation in the analyses, or to explain clearly why it is ignored and what the repercussions could be. A minimal example of one possible diagnostic follows below.
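
As a sketch of the kind of diagnostic I have in mind (object and column names are hypothetical, and I have not run this against the authors' data), one could test aggregated county-level residuals for spatial autocorrelation with Moran's I:

```r
# Sketch: test whether county-level model residuals are spatially
# autocorrelated. Assumes a data frame `county_res` with one row per
# county, mean residuals in `resid`, and centroid coordinates `lon`/`lat`
# (all placeholder names).
library(spdep)

coords <- cbind(county_res$lon, county_res$lat)
nb <- knn2nb(knearneigh(coords, k = 5, longlat = TRUE))  # 5 nearest counties
lw <- nb2listw(nb, style = "W")                          # row-standardised weights

moran.test(county_res$resid, lw)  # Moran's I for the residuals
```

If substantial residual autocorrelation is present, spatially robust (e.g., Conley-type) standard errors or an explicit spatial term would be worth considering, or at minimum discussing.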

## Code & reproducibility

Finally, I did not review the code or data the authors shared on OSF. A code review would exceed my time budget because of adverse features of the code base, such as the lack of consistent styling and linting, the lack of a reproducible environment (such as renv or Docker), and the use of hard-coded rather than relative paths. I will note, though, that authors state (p. 1):

>Computational reproducibility: The code files underlying this manuscript are computationally reproducible and analysis scripts alongside synthetic replication data are available via OSF: here.

According to my (perhaps strict) interpretation of reproducibility, this does not seem to be the case. The code does not run without significant changes by the user: for example, recreating the R environment without information on what it was, adjusting parallel-processing options to the local machine, and changing hard-coded paths to relative ones. If the authors have the will and time to improve on this, they might find the workflow chapter at <https://r4ds.hadley.nz/workflow-scripts> useful. The quoted sentence above should also end with a link to the OSF repository rather than the word "here".
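
Concretely, and only as a sketch since I do not know the structure of the authors' project, the two smallest changes that would help are recording the package environment with renv and replacing hard-coded paths with project-relative ones (the file name below is hypothetical):

```r
# One-time setup in the project root: renv records the R version and
# package versions in renv.lock, so others can recreate the environment
# with renv::restore().
install.packages("renv")
renv::init()      # initialise a project-local library
renv::snapshot()  # write renv.lock once the analysis runs

# In the analysis scripts, replace hard-coded absolute paths such as
#   read.csv("C:/Users/.../data/twitter_panel.csv")
# with project-relative paths, e.g. via the here package:
library(here)
dat <- read.csv(here("data", "twitter_panel.csv"))  # hypothetical file name
```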

## Minor comments

  • Please add (DOI) links for each citation. I spent quite some time manually tracking down references when I could simply have clicked on hyperlinks.

  • (p. 8) "the effects of both freezing temperatures and hot temperatures increase social media use." should read "both freezing and hot temperatures increase social media use." Consider proofreading throughout to ensure clarity regarding analytic methods and statements of results.

  • "right-hand-side measurement error" (p. 16) could instead read "measurement error in predictors" for clarity

Respectfully signed,

Matti Vuorre

[1]: https://arxiv.org/abs/2302.08456

[2]: https://doi.org/10.1073/pnas.1902058116

[3]: https://www.nature.com/articles/s44220-023-00063-7

[4]: https://doi.org/10.1177/21677026221078309

This review of "Adverse weather amplifies social media activity" [1] is contributed by Matti Vuorre under CC-BY to Psychological Science and the PREreview platform. My review of the manuscript's previous draft can be found at <https://doi.org/10.5281/zenodo.10887605>.

## Competing interests

The author declares that they have no competing interests.