Summary: This manuscript by Judd et al. adds a novel, important story to the literature on how transposable elements are co-opted for gene regulatory networks. Leveraging a set of existing mouse liver ChIP-seq datasets for circadian rhythm transcription factors, the authors demonstrate that a recent, abundant mouse transposable element, RSINE1, is enriched at circadian rhythm enhancer sites, as confirmed using additional datasets and well-established criteria for enhancers. They show that the consensus RSINE1 sequence contained degenerate motifs for transcription factor binding sites, which evolved into mature binding sites through point mutations after insertion near pre-existing circadian rhythm enhancers. To test their hypothesis that RSINE1 is involved in circadian rhythm enhancers, they use luciferase assays to compare the enhancer activity of the RSINE1 consensus with that of a consensus with optimized TFBSs and of two test enhancers, with and without RSINE1. Their results succinctly support their hypothesis, which highlights an interesting new story that complements the existing body of literature.
The study is generally well-planned and well-done, raising only two major concerns and a number of minor concerns.
Essential Revisions/Major Concerns:
- For the luciferase assay, considering the circadian dependence of the system, it is vital that the authors clarify whether all vectors were tested in parallel using the same batch of cells; if each vector was tested at a different time, for example, the observed results could be due purely to the confounding effect of circadian rhythm. In addition, the authors state that they transfected 500 ng each of the test vectors and the Renilla luciferase control vector. This control vector has an overwhelming signal, such that the manufacturer of the plasmid and Dual Luciferase Assay (Promega) normally suggests a 1:10 ratio of Renilla to test vector. The authors should confirm that these values are accurate and not a typo (50 ng of Renilla instead of 500 ng).
- The title does not accurately depict the scale of the impact of the findings. It suggests that TEs are involved in the de novo evolution of mouse circadian enhancers, when really it is the evolution of complexity that is at play. One suggestion for a more accurate title would be “Evolution of Mouse Circadian Enhancer Complexity via Transposable Element Co-Option”.
- Figure 5A: The colors for “Bound RSINE1” and “All RSINE1” are inverted between the violin plots and the cartoon. The violin plots could also be rotated to put them in line with the model drawing.
- Regarding the ChIP-seq datasets, because the reanalysis of prior data does not initially recapitulate the results of the original authors, it is important for the authors of this manuscript to clarify whether the intersection of the peaks from both analyses was used for calling circadian enhancers, or whether they used all of their newly called peaks; if the latter, the authors should justify this choice. In Figure S1A, Venn diagrams should be used to showcase the overlap between old and new peaks, especially since the peaks called in the reanalysis do not fully match those reported in the original publications.
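For illustration only, the overlap counts that would feed such a Venn diagram could be computed along the following lines. This is a minimal sketch, not the authors' pipeline: the function name `overlap_counts`, the rule that any overlap of at least 1 bp counts as shared, and the peak coordinates are all assumptions made for this example.

```python
from collections import defaultdict


def overlap_counts(old_peaks, new_peaks):
    """Count peaks in each set that overlap (>= 1 bp) a peak in the other set.

    Peaks are (chrom, start, end) tuples with half-open coordinates, as in BED.
    The 1 bp overlap rule is an assumption chosen for this sketch.
    """
    def index(peaks):
        # Group intervals by chromosome so only same-chromosome peaks are compared.
        by_chrom = defaultdict(list)
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))
        return by_chrom

    def n_overlapping(query, subject):
        subject_idx = index(subject)
        # Two half-open intervals overlap when each starts before the other ends.
        return sum(
            any(s < end and start < e for s, e in subject_idx[chrom])
            for chrom, start, end in query
        )

    old_shared = n_overlapping(old_peaks, new_peaks)
    new_shared = n_overlapping(new_peaks, old_peaks)
    return {
        "old_only": len(old_peaks) - old_shared,
        "old_shared": old_shared,
        "new_shared": new_shared,
        "new_only": len(new_peaks) - new_shared,
    }


# Made-up coordinates, for illustration only.
old = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
new = [("chr1", 150, 250), ("chr2", 300, 400)]
counts = overlap_counts(old, new)
print(counts)
```

The shared count is reported separately for each set because one peak in the reanalysis can overlap several originally reported peaks, and vice versa; stating which count is plotted would make the Venn diagram easier to interpret.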
Minor Comments:
- Code and newly generated data used for the analysis and results should be made publicly available in a repository such as GitHub, and the repository should be referenced in a “Data Analysis” section.
- Line 155, “To investigate the biochemical activity of RABS…” is improperly phrased. The paragraph describes a genomic analysis with no true biochemical assays being used; “to investigate the temporal activity of RABS” is a more accurate statement.
- When discussing specific datasets in the text (e.g., the “temporal genome-wide binding profiles” alluded to in lines 155-156 and 250), clarify whether the dataset is one of those cited and cite it in-line, or, if it was generated by compiling all the datasets in Table 1, specify in the methods how the meta-dataset was generated.
- Table 1 should also contain a column specifying the times sampled in each dataset where relevant. Stating that all studies sampled hourly, for example, or listing only the common timepoints that were compared between them, would suffice.
- Figure 2A: Some of the points that are colored as significant are partially hidden by black points; I would recommend replotting these points above the black points to ensure all are visible.
- Figure 4A: Boxes highlighting motifs are nearly invisible; using a horizontal bracket would significantly improve the visibility of the motif sequence.
- Figure 4B: Motif logos could be included for the E-Box and RORE to help highlight the significance of the findings.
- The authors mention that “transcriptional control has been extensively studied in the liver.” However, a cursory look in NCBI SRA reveals that the ChIP-seq data needed for this study is only available for the liver. As written, one would be tempted to think that the study is needlessly limited in its scope, especially given Fig. 4G. Thus, explicitly stating that, in mice, this study can only be done in the liver would pre-empt such thoughts from the reader. Relatedly, the discussion would benefit from an appeal to study circadian rhythm gene activity in other tissues, and from a consideration of what the findings in this paper imply for differences in circadian rhythm gene regulation between tissues.
Overall, this study provides an important new story to the field of transposable elements and the evolution of gene regulatory networks. Given clarifications and corrections on the concerns raised, I would strongly support the conclusions reached in this preprint.