Daily Digest — May 10, 2026
Variant: B (Detail-First)
Note: Today’s digest features 3 accessible open-access papers. Several unread papers remain paywalled and need PDFs (see end).
Paper 1: Competition between parallel sensorimotor learning systems
Albert et al., eLife 2021 | PDF
Abstract
Sensorimotor learning is supported by at least two parallel systems: a strategic process that benefits from explicit knowledge and an implicit process that adapts subconsciously. How do these systems interact? Does one system’s contributions suppress the other, or do they operate independently? Here, we illustrate that during reaching, implicit and explicit systems both learn from visual target errors. This shared error leads to competition such that an increase in the explicit system’s response siphons away resources that are needed for implicit adaptation, thus reducing its learning. As a result, steady-state implicit learning can vary across experimental conditions, due to changes in strategy. Furthermore, strategies can mask changes in implicit learning properties, such as its error sensitivity. These ideas, however, become more complex in conditions where subjects adapt using multiple visual landmarks, a situation which introduces learning from sensory prediction errors in addition to target errors. These two types of implicit errors can oppose each other, leading to another type of competition. Thus, during sensorimotor adaptation, implicit and explicit learning systems compete for a common resource: error.
Experiment
The authors investigate how implicit and explicit sensorimotor learning systems interact during visuomotor rotation adaptation. Participants make reaching movements while a cursor is rotated relative to their actual hand position. The key manipulation is separating implicit recalibration (unconscious adaptation) from explicit re-aiming (conscious strategy).
Key paradigms:
- Visuomotor rotation: Cursor rotated by 20°–90° during reaching to targets
- Stepwise vs. abrupt rotations: Gradual introduction vs. sudden large perturbation
- Preparation time manipulation: Limiting time before movement initiation to suppress explicit strategy
- Coaching instructions: Explicitly instructing participants about the rotation
Participants: Multiple experiments with n=20–70 per condition across several labs (Johns Hopkins, York University, Buenos Aires).
Results (Figure by Figure)
Figure 1 — Total implicit learning is shaped by competition with explicit strategy
Panel A shows the visuomotor rotation setup: participants move from start to target, but the cursor path is rotated. Hand path is composed of explicit (aiming) and implicit corrections. The authors contrast two hypotheses: independence (implicit learning unaffected by explicit strategy) vs. competition (implicit learning decreases when explicit strategy increases).
Panels B–D analyze data from Neville and Cressman (2018) where participants adapted to 20°, 40°, or 60° rotations. As rotation size increased, explicit re-aiming increased dramatically (panel C), while implicit learning actually decreased slightly (panel D). This is the opposite of what an independence model predicts — it’s exactly what competition predicts.
Panels E–G show a stepwise rotation experiment. Implicit learning exhibited saturation when the driving input remained constant, consistent with the competition model where implicit learning is driven by (rotation − explicit strategy).
Panels H–L show another dataset where implicit learning scaled with rotation size, and panels M–Q show non-monotonic implicit learning. The competition model captures all three phenotypes (saturation, scaling, non-monotonicity) depending on how explicit strategy changes with rotation size.
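The three phenotypes can be illustrated with a small sketch of the competition rule, in which steady-state implicit learning is proportional to the rotation minus the explicit strategy. The gain `p` and the three explicit-strategy curves below are hypothetical choices made for illustration, not values or fits from the paper.

```python
def implicit_steady_state(rotation, explicit, p=0.8):
    """Competition rule: implicit learning is driven by the error left
    over after explicit re-aiming. p is an assumed proportionality gain."""
    return p * (rotation - explicit)

rotations = [15.0, 30.0, 45.0, 60.0]  # degrees

# Hypothetical explicit-strategy curves (illustrative only).
explicit_curves = {
    # Strategy absorbs every degree beyond the first 15,
    # so the driving input stays constant.
    "saturation": lambda r: r - 15.0,
    # Strategy is a fixed fraction of the rotation,
    # so the driving input grows with rotation.
    "scaling": lambda r: 0.5 * r,
    # Strategy grows faster than linearly,
    # so the driving input rises and then falls.
    "non_monotonic": lambda r: r * r / 80.0,
}

phenotypes = {
    name: [implicit_steady_state(r, strategy(r)) for r in rotations]
    for name, strategy in explicit_curves.items()
}

for name, implicit in phenotypes.items():
    print(name, [round(x, 2) for x in implicit])
```

The same rule produces all three patterns; only the assumed dependence of strategy on rotation size differs.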
Figure 2 — Increases or decreases in explicit strategy oppositely impact implicit adaptation
Panel A shows the coaching experiment: participants with instructions about the rotation (purple) vs. without (black). Explicit adaptation was enhanced with coaching (panel B), and critically, implicit adaptation was reduced (panel C). This trade-off is what the competition model predicts.
Panels D–G examine stepwise (gradual) vs. abrupt rotations. Stepwise rotations suppress explicit re-aiming by ~10° (panel F), and this reduction in strategy is associated with increased implicit adaptation (panel G). Again, the competition model fits the data and the independence model does not.
Figure 3 — Strategy suppresses implicit learning across individual participants
Experiment 2 manipulated preparation time (PT). No PT Limit participants had time to develop explicit strategies; Limit PT participants had to initiate movements quickly, suppressing strategy.
Panels A–C show that Limit PT participants had much lower explicit re-aiming (panel B) but higher implicit learning (panel C). The competition model captured this trade-off at the individual level.
Panels I–O replicate this in a laptop-based control experiment (Experiment 3) with more precise implicit measures (via exclusion/alignment tests). Limit PT increased implicit learning by ~40% (panel O), confirming that strategy suppression liberates implicit learning.
Figure 4 — Correlations between implicit and explicit learning are consistent with competition, not SPE generalization
An alternative explanation for the implicit-explicit trade-off is “aim-centered generalization” — implicit learning appears smaller because it’s measured relative to the target, not the aim point. The authors show that even when correcting for this, the competition model still explains the data better than independence. The correlation structure across participants and conditions supports competition.
Important Methods & Highlighted Points
- Competition model: Implicit learning = f(rotation − explicit strategy), not f(rotation) alone
- Two error types: Target error (cursor-target deviation) and sensory prediction error (expected vs. actual cursor motion). These can oppose each other in multi-landmark conditions.
- Measuring implicit learning: Via “exclusion/alignment” tests where participants are instructed to aim straight at the target without re-aiming.
- Error sensitivity (bᵢ): The gain of implicit learning on error. Competition theory shows this can appear to change when actually it’s the explicit strategy that’s changing.
- Key finding: Strategies can mask changes in implicit learning properties. A condition that appears to have low implicit error sensitivity may actually just have high explicit strategy.
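The masking effect described in the last two bullets can be sketched with a single-state model of implicit adaptation, x ← a·x + b·error. Under competition the error is the target error (rotation − explicit − implicit); under independence the explicit strategy is ignored. The retention factor `a`, the error sensitivity `b`, and the fixed explicit strategies are assumed values for illustration; the paper's fitted parameters are not reproduced here.

```python
def simulate_implicit(rotation, explicit, a=0.98, b=0.2, n_trials=300,
                      competition=True):
    """Single-state implicit adaptation with a fixed explicit strategy.

    a (retention) and b (error sensitivity) are illustrative values.
    competition=True  -> error = rotation - explicit - implicit
    competition=False -> error = rotation - implicit (strategy ignored)
    """
    x = 0.0
    for _ in range(n_trials):
        error = rotation - x - (explicit if competition else 0.0)
        x = a * x + b * error
    return x

rotation = 30.0
low_strategy = simulate_implicit(rotation, explicit=5.0)
high_strategy = simulate_implicit(rotation, explicit=20.0)
independent = simulate_implicit(rotation, explicit=20.0, competition=False)

# Closed-form steady state of the competition update:
# x_ss = b / (1 - a + b) * (rotation - explicit)
gain = 0.2 / (1 - 0.98 + 0.2)
print(round(low_strategy, 2), round(gain * (rotation - 5.0), 2))
print(round(high_strategy, 2), round(gain * (rotation - 20.0), 2))
print(round(independent, 2))
```

With an identical `b`, the high-strategy condition ends with less implicit learning, which an analysis that ignores strategy could misread as reduced error sensitivity.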
Why It Matters
This paper resolves a fundamental debate in motor learning: are implicit and explicit systems independent or interacting? The answer is competition — they fight over the same error signal. When you consciously re-aim, you unconsciously recalibrate less.
This has broad implications:
- Motor rehabilitation: Patients using conscious compensation strategies may be suppressing automatic recalibration, slowing true recovery.
- Skill learning: Coaches who encourage explicit strategies may inadvertently reduce implicit learning gains.
- Experimental design: Any measure of “implicit learning” is confounded by explicit strategy unless both are measured separately.
For Raghavendra’s interests: This connects to the broader theme of multiple learning systems in the brain (striatum vs. cerebellum vs. cortex) and how they interact. The “competition for error” framework could be extended to other domains beyond sensorimotor learning.
Paper 2: Cells use molecular working memory to navigate in changing chemoattractant fields
Nandan, Das et al., eLife 2022 | PDF
Abstract
In order to migrate over large distances, cells within tissues and organisms rely on sensing local gradient cues which are irregular, conflicting, and changing over time and space. The mechanism how they generate persistent directional migration when signals are disrupted, while still remaining adaptive to signal’s localization changes remain unknown. Here, we find that single cells utilize a molecular mechanism akin to a working memory to satisfy these two opposing demands. We derive theoretically that this is characteristic for receptor networks maintained away from steady states. Time-resolved live-cell imaging of Epidermal growth factor receptor (EGFR) phosphorylation dynamics shows that cells transiently memorize position of encountered signals via slow-escaping remnant of the polarized signaling state, a dynamical ‘ghost’, driving memory-guided persistent directional migration. The metastability of this state further enables migrational adaptation when encountering new signals. We thus identify basic mechanism of real-time computations underlying cellular navigation in changing chemoattractant fields.
Experiment
The authors combine theoretical modeling (dynamical systems / bifurcation theory) with live-cell imaging to show that single cells implement a form of working memory.
Theoretical framework:
- Receptor signaling networks modeled as dynamical systems near bifurcation points
- Two key bifurcations: subcritical pitchfork (creates bistability / polarization) and saddle-node (creates metastability / memory)
- “Dynamical ghost”: a slow-escaping remnant of a polarized state that persists after the signal disappears
Experimental system:
- Cell line: MDA-MB-231 breast cancer cells (highly migratory)
- Reporter: EGFR-mCitrine phosphorylation dynamics
- Microfluidics: Custom devices generating changing EGF gradient fields
- Imaging: Time-resolved live-cell microscopy with spatial resolution
Key manipulations:
- EGF gradients applied, removed, and reapplied in different spatial configurations
- Single-cell tracking of both molecular signaling (EGFR phosphorylation) and cellular behavior (shape, migration direction)
Results (Figure by Figure)
Figure 1 — In silico manifestation of metastable polarized membrane signaling
Panel A illustrates the dynamical mechanism: a subcritical pitchfork bifurcation creates two stable polarized states (left/right) separated by an unstable saddle point. When the signal is present, the system sits in one polarized state. When the signal disappears, the polarized state becomes a “ghost” — not truly stable, but decaying very slowly.
Panel B shows the bifurcation diagram: as signal strength (S) decreases, the stable polarized branches merge with unstable branches via saddle-node bifurcations. Beyond the bifurcation point, no stable polarized states exist, but the system retains a “memory” because it lingers near the ghost of the former attractor.
Panels C–F show simulation results. The model predicts that after signal removal, cells should maintain their polarization direction for an extended period (memory), while remaining able to re-polarize when a new signal appears (adaptability).
Figure 2 — Molecular memory in polarized EGFR phosphorylation translates to memory in polarized cell shape
Panel A shows the microfluidic device: cells migrate in channels with controllable EGF gradients.
Panels B–D show live-cell imaging data. When EGF is applied, EGFR phosphorylation polarizes (panel B). When EGF is removed, the phosphorylation gradient slowly decays but persists for minutes (panel C) — this is the molecular memory.
Panels E–G connect molecular memory to cellular behavior. The polarized phosphorylation state drives polarized cell shape (panel E). State-space trajectories (panel G) show that cells get “trapped” in regions of state space corresponding to previous polarization directions, confirming the dynamical systems prediction of metastability.
Figure 3 — Cellular navigation in changing chemoattractant fields
This figure (inferred from text) likely shows the behavioral consequences: cells encountering a signal, migrating toward it, losing the signal, continuing in the same direction due to memory, then reorienting when a new signal appears.
The key result is that cells balance two competing demands:
- Persistence: Continue migrating in a direction even when the signal temporarily disappears
- Adaptability: Reorient when the signal changes location
The metastable “ghost” state provides exactly this balance: it’s stable enough to provide memory, but unstable enough to allow re-polarization.
Important Methods & Highlighted Points
- Bifurcation theory: The authors use mathematical analysis to show that memory emerges naturally from receptor networks operating near saddle-node bifurcations. This is not a property of any specific molecular implementation — it’s a generic feature of a broad class of dynamical systems.
- Metastability vs. bistability: Bistability (two stable states) provides memory but not adaptability. Metastability (slow decay from a ghost attractor) provides both.
- Single-cell analysis: All measurements are at single-cell resolution, showing that memory is a cell-autonomous property, not an emergent population phenomenon.
- EGFR as model system: While the authors study EGFR specifically, they emphasize the mechanism is general — any receptor network with similar dynamical properties should show the same behavior.
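The slow escape from a ghost state can be illustrated with the textbook saddle-node normal form dx/dt = μ + x², which is a generic stand-in and not the authors' EGFR network model. Here μ > 0 is an assumed measure of how far the system sits past the bifurcation: no fixed point remains, but the flow is slow near x = 0, and the passage time approaches π/√μ for small μ.

```python
import math

def ghost_dwell_time(mu, x_start=-1.0, x_end=1.0, dt=1e-3):
    """Euler-integrate dx/dt = mu + x**2 and return the time needed to
    travel from x_start to x_end. For mu > 0 no fixed point exists, but
    the flow slows near x = 0, the 'ghost' of the vanished attractor."""
    x, t = x_start, 0.0
    while x < x_end:
        x += dt * (mu + x * x)
        t += dt
    return t

def analytic_dwell_time(mu, x_start=-1.0, x_end=1.0):
    """Exact passage time, from integrating dx / (mu + x**2)."""
    root = math.sqrt(mu)
    return (math.atan(x_end / root) - math.atan(x_start / root)) / root

for mu in (1.0, 0.01, 0.0001):
    print(f"mu={mu:<7} simulated={ghost_dwell_time(mu):8.2f} "
          f"analytic={analytic_dwell_time(mu):8.2f}")
```

The closer the system sits to the bifurcation, the longer the state lingers, yet it always escapes eventually. That combination is what separates metastable memory from true bistability.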
Why It Matters
This paper makes a striking claim: single cells have working memory. Not metaphorically, but mechanistically — via the same dynamical systems principles that neuroscientists invoke for neural working memory (persistent activity, attractor dynamics).
Key implications:
- Cancer metastasis: Tumor cells navigate chemoattractant gradients to invade tissues. Understanding their navigational memory could reveal new therapeutic targets.
- Evolution of cognition: If single cells already implement working memory via biochemical networks, then the evolutionary origins of neural computation may be much deeper than previously thought.
- Synthetic biology: The bifurcation-based design principle could be used to engineer cellular memory circuits.
For Raghavendra’s interests: This connects directly to the “Cellular Learning” note and the Gershman eLife paper on single-cell learning. It provides a concrete mechanistic basis for how non-neural cells can exhibit memory-like behavior — not through synaptic plasticity, but through dynamical systems metastability.
Paper 3: Reinforcement Learning Using a Continuous Time Actor-Critic Framework with Spiking Neurons
Frémaux, Sprekeler & Gerstner, PLOS Computational Biology 2013 | PDF
Abstract
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only been partially elucidated. On one hand, experimental evidence shows that the neuromodulator dopamine carries information about rewards and affects synaptic plasticity. On the other hand, the theory of reinforcement learning provides a framework for reward-based learning. Recent models of reward-modulated spike-timing-dependent plasticity have made first steps towards bridging the gap between the two approaches, but faced two problems. First, reinforcement learning is typically formulated in a discrete framework, ill-adapted to the description of natural situations. Second, biologically plausible models of reward-modulated spike-timing-dependent plasticity require precise calculation of the reward prediction error, yet it remains to be shown how this can be computed by neurons. Here we propose a solution to these problems by extending the continuous temporal difference (TD) learning of Doya (2000) to the case of spiking neurons in an actor-critic network operating in continuous time, and with continuous state and action representations. In our model, the critic learns to predict expected future rewards in real time. Its activity, together with actual rewards, conditions the delivery of a neuromodulatory TD signal to itself and to the actor, which is responsible for action choice. In simulations, we show that such an architecture can solve a Morris water-maze-like navigation task, in a number of trials consistent with reported animal performance. We also use our model to solve the acrobot and the cartpole problems, two complex motor control tasks. Our model provides a plausible way of computing reward prediction error in the brain. Moreover, the analytically derived learning rule is consistent with experimental evidence for dopamine-modulated spike-timing-dependent plasticity.
Main Contribution
The paper bridges three gaps simultaneously:
- Discrete → continuous time: Standard RL operates in discrete time steps; biological systems operate continuously.
- Rate-based → spiking neurons: Most RL models use rate-based units; this model uses biologically plausible spiking neurons.
- Abstract TD error → neural implementation: Shows how reward prediction error can be computed by a neural circuit using eligibility traces and neuromodulation.
Key Results (Figure by Figure)
Figure 1 — Navigation task and actor-critic network
Panel A shows the simulated environment: an agent (rat analog) navigates a maze to find a reward area (green disk) while avoiding obstacles (red). The agent receives sensory input via place cells that fire at specific locations.
Panel B shows the network architecture. Bottom layer: place cells encode position. Critic neurons: predict expected future reward (value function). Actor neurons: drive action choice (movement direction). A neuromodulatory signal (analogous to dopamine) broadcasts the TD error to modulate plasticity in both critic and actor synapses.
The critic learns via TD learning: it predicts future rewards, and the prediction error drives synaptic updates. The actor learns via policy gradient: actions that lead to better-than-expected outcomes are strengthened.
Figure 2 — Critic learning in a linear track task
This figure illustrates the three-factor learning rule (TD-LTP) given in Equation 17:
- Factor 1: Presynaptic spike train Xᵢ(t)
- Factor 2: Postsynaptic activity
- Factor 3: Neuromodulatory TD signal δ(t)
The TD signal compares the actual reward r(t) with the reward the critic expects. That expectation is read out from the critic’s value estimate and its temporal derivative, which makes δ(t) a continuous-time analog of the discrete TD error.
Panel B shows that the critic successfully learns to predict rewards in a linear track after a small number of trials. The value function develops a peak at the reward location and decays with distance, consistent with animal behavior.
Figure 4 — Maze navigation learning task
Panel A shows a complex maze with a U-shaped obstacle forcing the agent to make a detour. This is analogous to a Morris water maze with barriers.
Panels B–D show learning curves. The agent initially explores randomly, but over trials learns to navigate efficiently to the goal. The number of trials required is comparable to reported animal performance in similar tasks.
The model also solves the acrobot (a two-link pendulum swing-up task) and cartpole (balance a pole on a moving cart), demonstrating that the framework generalizes beyond navigation to general motor control.
Key insight: The continuous-time formulation is not just a mathematical detail — it enables the model to handle naturally varying timescales of behavior and neural dynamics, which discrete-time models struggle with.
Important Methods & Highlighted Points
- Continuous TD learning: Based on Doya (2000), the TD error is computed as δ(t) = r(t) − V(t)/τ_r + dV/dt, where V(t) is the critic’s value estimate and τ_r is the reward discount time constant. This is the continuous-time counterpart of the discrete TD(0) error.
- Eligibility traces: The model uses a continuous eligibility trace κ(t) that decays exponentially. This allows synaptic updates to be contingent on recent pre-post spike pairings, solving the credit assignment problem in continuous time.
- Spike-timing-dependent plasticity (STDP): The learning rule naturally implements a form of dopamine-modulated STDP where the TD error gates the standard STDP window.
- Neuromodulator as TD error: The model treats dopamine (or a similar neuromodulator) as broadcasting δ(t) globally, consistent with the diffuse projection patterns of dopaminergic neurons.
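The pieces above can be combined in a short discretized sketch. It uses the continuous TD error in the form δ(t) = r(t) − V(t)/τ_r + dV/dt with a finite-difference derivative (setting τ_r = 1 drops the explicit time constant), and scalar rate-like inputs stand in for spike trains. All constants (`tau_r`, `tau_e`, `eta`, `dt`) and helper names are assumptions for illustration, so this is not the paper's TD-LTP rule.

```python
import math

def td_error(reward, v_now, v_prev, dt, tau_r=4.0):
    """Continuous-time TD error with a finite-difference derivative:
    delta = r - V / tau_r + dV/dt. tau_r is an assumed time constant."""
    dv_dt = (v_now - v_prev) / dt
    return reward - v_now / tau_r + dv_dt

def update_trace(trace, pre, post, dt, tau_e=0.5):
    """Exponentially decaying eligibility trace driven by pre/post coincidence."""
    return trace + dt * (-trace / tau_e + pre * post)

dt, tau_r, reward_time = 0.01, 4.0, 5.0

def true_value(t):
    """Value of a single unit reward at reward_time, discounted with tau_r."""
    return math.exp(-(reward_time - t) / tau_r)

# Before the reward, a correctly calibrated critic yields a TD error near zero.
delta_correct = td_error(0.0, true_value(2.0), true_value(2.0 - dt), dt, tau_r)

# A critic that predicts nothing sees a positive error when reward arrives.
delta_surprise = td_error(1.0, 0.0, 0.0, dt, tau_r)

# Three-factor update: the weight changes only while a nonzero trace
# coincides with a nonzero TD signal.
eta, weight, trace = 0.1, 0.0, 0.0
for step in range(200):
    t = step * dt
    pre = post = 1.0 if t < 0.5 else 0.0      # brief pre/post coincidence
    delta = 1.0 if 1.0 <= t < 1.1 else 0.0    # TD signal arrives later
    trace = update_trace(trace, pre, post, dt)
    weight += dt * eta * delta * trace

print(round(delta_correct, 4), delta_surprise, round(weight, 5))
```

The weight still grows although the TD signal arrives half a second after the pre/post pairing, because the decaying trace bridges the gap. This is the credit-assignment role the eligibility trace plays in the model.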
Takeaway
This paper is a landmark in computational neuroscience because it shows that reinforcement learning with spiking neurons is not just theoretically possible — it’s biologically plausible.
For Raghavendra’s interests:
- Dopamine and TD learning: The paper provides a concrete neural implementation of the dopamine = TD error hypothesis, going beyond abstract models.
- Eligibility traces: The continuous eligibility trace formalism connects to the Lehmann et al. eLife paper (featured in the May 8 digest) on behavioral eligibility traces in humans.
- Continuous vs. discrete: The extension to continuous time is crucial for biological realism. Brains don’t operate in discrete time steps.
- Gerstner lab: This is classic Gerstner — rigorous theory, biologically plausible implementation, and connection to experimental data. The learning rule derived analytically matches experimental observations of dopamine-modulated STDP.
Paywalled unread Papers Needing PDFs
The following papers are tagged unread but are behind paywalls. Please send PDFs when possible:
- Behavioral timescale synaptic plasticity: properties, elements and functions — Magee, Nature Neuroscience 2026
- Visual cortex papers from Twitter — Science + Neuron (digital twin visual cortex work)
- The deteriorating soma and the indispensable — Royal Society Proceedings B
Digest generated on May 10, 2026 | Variant B (Detail-First)