In the work “Seeing in the Dark: Intelligent Fourier Light Field Imaging for Bioluminescence Microscopy”, the authors Luis Felipe Morales Curiel and Michael Krieg develop a Fourier Light Field Microscope and a deep-learning-based reconstruction method for bioluminescence microscopy. They claim to have achieved sub-second volumetric imaging with significantly improved spatial resolution, eliminating the speed-resolution trade-off and overcoming slow classical deconvolution. They demonstrate these capabilities by imaging roundworms (C. elegans) and mouse spheroids.
In a clear and interesting introduction, they briefly recapitulate existing approaches to the longstanding goal of three-dimensional, non-invasive biological imaging that reliably captures cells and tissues as close as possible to their native conditions. They then propose a design for a Fourier Light Field Microscope and present their optimized convolutional attention-based neural network for image reconstruction. In the Validation section, they exemplify three applications of their pipeline in biological systems.
Although the network’s interesting structure - exemplified in particular by the spatial weighting of the features - points towards increased resolution in substantially reduced time, the manuscript raises concerns regarding the reliability of the presented results. Several aspects suggest that the data may not have undergone sufficiently careful verification prior to the preprint, which undermines confidence in the reported outcomes. In addition, the presence of significant conceptual inaccuracies calls into question the validity of some of the interpretations drawn from the data. The appearance and structure of certain plots further suggest that the figures may not have been produced through reproducible and robust scripts, even for relatively straightforward implementations. Taken together, these issues highlight the need for a more rigorous review of the data-handling procedures and for clearer documentation of the analysis workflow to ensure transparency and reproducibility.
Finally, the manuscript is quite difficult to follow due to a non-linear narrative structure: key information is scattered between the main text, figure captions, and supplementary materials, which complicates the reading process and makes it challenging to reconstruct the logical progression of the study. For these reasons, the manuscript cannot be recommended for publication in its current form.
• In Figure 1b, a PSD is used to estimate the microscope resolution as the inverse of the cutoff frequency: “In agreement with the design parameters and the target size, the largest recovered frequencies correspond to the smallest size in the image (Fig. 1b)”. How could a cutoff frequency of about 0.9 μm⁻¹, whose inverse corresponds to 1.1 μm, lead to a resolution of 230 nm, as shown in the image and stated in the text? Furthermore, what has been done to the PSD noise, and how could it reach negative values at the end? The caption for items a and b also seems to mistake “PSF” for “PSD” when referring to one image or the other (which one should be the “inset”?).
• Figure 1: Where in the images can we see the mentioned artifacts arising from classical deconvolution, such as ringing, speckling and instability, given that these artifacts are used to support the claimed need for better reconstruction methods?
• For the robustness of the work, it would be helpful to explain why the wide-field image of the microscope can be used as the ground truth for the reconstruction of a Fourier Light Field image, since the two rely on different optical processes/paths.
• Figure 2h: What does ‘magnitude’ stand for here? How does it relate to the PSD? What should its unit be? It is difficult to identify where the curves approach zero, as the baseline is shifted from the x-axis. Suppl. Fig. S4 seems to be an extended version of Figure 2h. Why did ‘magnitude’ become ‘mean intensity’, and how do the two relate to each other?
• Suppl. Table 1 summarizes the different model configurations. However, as described only by those parameters, many models appear to be identical (6 and 7; 8, 9 and 10; 14, 15 and 16). What distinguishes these sets? Furthermore, LUCID is exactly one of these repeated models (model 14). In Figure 2e, couldn’t LUCID be just an outlier of this configuration? The mean values for structural similarity and RMSE over the whole set (14, 15 and 16), for example, are no better than for other models.
• When referring to the importance of denoising data before feeding them into the network, the authors report an improved methodology relative to their previous work: simultaneously suppressing noise while reconstructing the image. However, the data shown in Suppl. Fig. S2 mostly illustrate a denoising procedure applied to data, without actually comparing the two strategies. How much faster or computationally lighter could this new procedure be? As the importance of denoising for reconstruction has already been shown in the previous work, I do not see the point of keeping this figure as it is.
• Bringing the last two items together, it is also contradictory that, after the authors argue for the importance of denoising the data, the chosen model, LUCID (model 14), did not apply denoising methods according to Suppl. Table 1. Does that column refer to a different procedure?
• Figure 3b: The five most relevant views seem to be wrongly selected for both datasets. For the fluorescence data (top), the peak close to input channel 80 might be greater than the second red dot; it is not clear. For the bioluminescence data (bottom), however, the peak right after the last red dot is clearly larger than the previous one (possibly even the peak close to input channel 15 is larger). Perhaps the first five points were selected here instead of the five largest ones. The entire discussion is based on this selection. Besides that, how is the normalized attribution calculated? For completeness, we should also be able to see the original image without any views removed, together with its SNR.
• Figure 4d: “In addition, the reconstruction has higher spatial resolution and structural similarity metrics (SSIM) computed against reference volumes compared to the deconvolved images (Fig. 4d)”. The figure shows only similarity data, with no mention of resolution.
• Figure 4h: Should contain the experimental data and the fit to a power law. In addition, the text says the “dimensional analysis revealed a scaling exponent larger than 1”. The dimensional analysis of which relation? Apparently the exponent has been fitted experimentally; I do not see where the dimensional analysis comes from.
• Suppl. Fig. S5 supports one of the main statements of this work, namely that the network is much faster than the deconvolution. Nonetheless, it raises many questions and caveats. First, the axes are labeled ‘value’ and ‘variable’, with no indication of units. ‘Deconvolution.Time.s.’ and ‘PredictionTime.s.’ at the bottom suggest it might be seconds. On the other hand, the text mentions “FLFM restoration using LUCID [...] was nearly real-time and took only 1 min for the whole video stack [...], while classical deconvolution took 1.5 h to complete on the same workstation”, which would lead one to believe the time unit on the y-axis is minutes. Most importantly, is the scale linear or logarithmic? The intervals 0.1, 1.0, 10.0 and 100.0 suggest logarithmic, but what about the minor gridlines in between? On a logarithmic scale they should indicate values around 3 × 10^x, yet they mislead the reader into believing they are the intermediate linear values. Lastly, 1.5 h would be the upper limit the deconvolution took, and not the mean value indicated in green; the statement “with two orders of magnitude faster reconstruction times” would therefore also be overstated.
• The following paragraph, and its respective figure items, bears no relation to the topic “Excitation-free imaging of calcium dynamics” under which it appears: “Finally, we used our LUCID pipelines to reconstruct light-field images recorded from bioluminescent mESC derived spheroids. We compared the output between regular deconvolution and our network to the bioluminescent ground truth obtained from the regular z-stack by 3D wide-field imaging (Fig. 6 i-k). Again, we observed that LUCID performs better, as evaluated by the intensity distribution and the overall SSIM (Fig. 6l)”.
• The detailed explanation of the microscope’s components presented in Figure 2 would be more helpful when first presenting the microscope in Figure 1. It should also contain the information on the microlens array (quantity and spiral distribution), as it will be important later on.
• Scale bars missing in a few images, for instance: Figure 1a, 1g and 1h, Figure 2f.
• Many acronyms are either never explained or explained long after their first use, for instance: GFP, MLA, BWM, SSIM, WF, GT, EI, Grad-CAM.
• Figure 2d: “The comparison of intensity profiles [...] corresponds accurately to the presence or absence of neuronal signals”. What is that supposed to mean? How have neuronal signals been determined here? “The pixels considered for the metrics are marked in the mask projection”. What and where is this mask projection?
• Figure 2e: What exactly are the other models? At this point in the text we have no information on them. If this graph is the criterion for picking LUCID, it might fit better earlier, where the model is detailed, instead of between datasets. There is no indication of the symbols # and * mentioned in the caption.
• Figure 3b: How does the input channel/perspective relate to the angular contribution of the views, the central discussion point of that section? Only much later, from the Supplementary Material, do we indirectly figure out the spiral distribution of the lenses.
• Figure 4e: Should the colorbar represent the distance to the neighbors? If so, it cannot be intuitively verified, as the map is not to scale. What are the units?
• Figure 4f: The text reports a video stack of 60 minutes and 120 frames, but the y-label shows data up to 132 minutes. Is this correct, or should it be frames? And in Figure 4h, why have only 19 minutes of data been used for the evaluation?
• Figure 2g reproduces part of Suppl. Fig. S6. In the latter, however, the Z-projections of both models (BWM and neurons) look strangely similar. Are they supposed to look so much alike, or was there a mistake?
• Figure 6d: No legend for the colors (only in the caption).
• Figure 6g: What are the tick values on the y-axis? 0 to 1? If so, how could the red curve go negative? And how was this normalized intensity calculated?
• Figure 6i: The intensity plots are missing units. They should not be micrometers, as the red dashed line in the image, according to the scale bar, is not that long.
• Failed cross-references for almost all the supplementary videos.
• Specific items in a few figures, as well as Suppl. Fig. S3, are never mentioned or explained in the text.
• Suppl. Video 2, lasting only 2 seconds, does not provide additional information relevant to the discussion.
• In the Methods section, under the topic related to the microscope design, the camera pixel size would have been an important parameter to mention, as it relates to the resolution.
• Figures presenting different datasets could be labeled for better readability (for example, when showing fluorescence vs bioluminescence images, C. elegans vs spheroids models, Body Wall Muscle vs neurons).
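To make the arithmetic objection in the Figure 1b item concrete, a minimal check in Python of the standard relation resolution ≈ 1 / cutoff frequency (the 0.9 μm⁻¹ value is the one read off the figure):

```python
# Resolution implied by the PSD cutoff frequency reported in Fig. 1b,
# using the standard relation: resolution ~ 1 / f_cutoff.
f_cutoff = 0.9                     # cutoff frequency from Fig. 1b, in 1/um
implied_resolution = 1.0 / f_cutoff
print(f"implied resolution: {implied_resolution:.2f} um")  # ~1.11 um
# The text and figure claim 230 nm (0.23 um): almost a five-fold mismatch.
```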
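Regarding the Figure 4h item: a scaling exponent of this kind is typically obtained empirically by fitting a line in log-log space, not by dimensional analysis. A minimal sketch of that procedure, using purely synthetic data (y = 2·x^1.3 with multiplicative noise), not the authors' measurements:

```python
import numpy as np

# Illustrative only: recovering a power-law exponent by a linear fit in
# log-log space. The data are synthetic, generated with a known exponent.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 50.0, 120)
y = 2.0 * x**1.3 * rng.lognormal(0.0, 0.05, x.size)  # true exponent: 1.3

# log y = log a + b * log x  ->  the slope b is the scaling exponent.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(f"fitted exponent: {slope:.2f}")
```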
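The minor-gridline concern raised for Suppl. Fig. S5 can be checked directly: on a logarithmic axis, the visual midpoint of a decade does not correspond to the linear intermediate value, which is exactly why unlabeled minor gridlines mislead.

```python
import math

# On a log axis, the visual midpoint between the "0.1" and "1.0" gridlines
# marks 10**(-0.5) ~ 0.316, not the linear midpoint 0.55. Minor gridlines
# within a decade sit at 2x, 3x, ..., 9x the lower bound, log-spaced.
midpoint_value = 10 ** ((math.log10(0.1) + math.log10(1.0)) / 2)
print(f"value at the visual midpoint of the 0.1-1.0 decade: {midpoint_value:.3f}")
```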
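On the Figure 6g item: if "normalized intensity" means min-max scaling, negative values are impossible by construction; negative excursions would instead point to a baseline-subtracted definition such as ΔF/F₀. A minimal sketch under those two assumed definitions (the trace values are illustrative, not taken from the figure):

```python
import numpy as np

trace = np.array([5.0, 7.0, 4.0, 9.0, 6.0])  # illustrative intensities

# Min-max normalization: values are confined to [0, 1], never negative.
minmax = (trace - trace.min()) / (trace.max() - trace.min())
print(minmax.min(), minmax.max())  # 0.0 1.0

# Baseline-subtracted dF/F0 (hypothetical F0 from the first two samples):
# this definition CAN go negative whenever intensity drops below baseline.
baseline = trace[:2].mean()
dff = (trace - baseline) / baseline
print(dff.min() < 0)  # True
```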
The authors declare that they have no conflicts of interest.
The authors declare that they used generative AI to develop new ideas for their review.