Decoding mouse navigation decisions by analyzing L2/3 neuron activity in the retrosplenial cortex, following the findings of Tseng et al. (2022) in their Neuron paper.
Welcome to our team's repository for McGill's PharmaHacks Neural Decoding challenge! If you are interested in viewing our code, you can click here to open the notebook.
Hyperlinks lead to LinkedIn profiles*
PharmaHacks is a hackathon organized by students of McGill University. PharmaHacks' mission is to "provide interested students with bioinformatics/cheminformatics training through extracurricular means to prepare them for future jobs in industry, academia, and government."[^1]
We want to thank the organizers for putting together such an amazing event. We look forward to future events!
[!NOTE] PharmaHacks 2024 had two challenges:
- Neural Decoding: Using calcium imaging data, analyze neural activity and predict outcomes from it.
- Genomics: Using scRNA-seq data, predict COVID-19 case severity in patients.
[^1]: PharmaHacks' LinkedIn
Neuroscientist Dr. Shih-Yi Tseng et al. published a paper in Neuron documenting experiments performed on 8 mice. The experiments captured the activity of over 200,000 neurons across 6 areas of the mice's posterior cortex, in layers 2/3 and 5, including V1, secondary visual areas, the retrosplenial cortex (RSC), and the posterior parietal cortex (PPC).
In each trial, the mice are shown one of two visual cues: black walls or white walls. Which turn is correct depends on which of two rules applies:
Rule A: the mice must turn left when the walls are black and right when the walls are white.
Rule B: the mice must turn right when the walls are black and left when the walls are white.
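The two rules can be written down as a small lookup table. This is only an illustrative sketch: the names `RULES`, `correct_turn`, and `is_correct` are hypothetical and do not come from the dataset or from our notebook.

```python
# Hypothetical encoding of the two task rules (names are illustrative only).
RULES = {
    "A": {"black": "left", "white": "right"},
    "B": {"black": "right", "white": "left"},
}


def correct_turn(rule: str, wall_color: str) -> str:
    """Return the correct turn direction for a rule and wall color."""
    return RULES[rule][wall_color]


def is_correct(rule: str, wall_color: str, turn: str) -> bool:
    """Check whether the turn a mouse made matches the active rule."""
    return correct_turn(rule, wall_color) == turn


# Example: under Rule B, black walls mean the mouse should turn right.
example = correct_turn("B", "black")
```

Note that the same wall color maps to opposite turns under the two rules, so the correct choice cannot be read from the visual cue alone.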
The maze the mice were trialed in is shown above.
It is a Y-shaped maze with two choices, left or right. After making its turn, each mouse is looped back to the beginning of the maze & trialed again (approx. 400 trials per day of experiments).
Thanks to their experiment, we are able to access the mice's neural data and analyze which neuron activations correspond to navigation decisions.
From the data provided by the researchers, we were tasked with creating a Machine Learning model that would be able to predict a mouse's position in the maze.
Our first mission was to understand the data. After thorough research & analysis of the Neuron paper & the data usage tutorial, we narrowed down our focus to these specific factors:
The data has 4 deconvoluted planes, each of which is desynchronized from the others & has many NaN (missing) values. Below is our process for resolving these issues:
Unsynchronized data:
NaN values: Two methods of resolution
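The two NaN-handling approaches referred to later in this README (dropping NaNs and using an Iterative Imputer) could look roughly like the sketch below. It runs on synthetic data, and the array names, shapes, and the assumption that the NaNs sit in the neural activity matrix are ours for illustration; it is not a copy of the notebook code.

```python
import numpy as np
# IterativeImputer is experimental in scikit-learn and must be enabled first.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 200 time points x 5 neurons,
# plus one position value per time point (assumed layout).
activity = rng.normal(size=(200, 5))
position = rng.uniform(size=200)

# Knock out roughly 10% of the entries to mimic missing values.
mask = rng.random(activity.shape) < 0.1
activity[mask] = np.nan

# Method 1: drop every time point that contains at least one NaN.
keep = ~np.isnan(activity).any(axis=1)
X_dropped, y_dropped = activity[keep], position[keep]

# Method 2: estimate each missing value from the other columns.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_imputed = imputer.fit_transform(activity)
y_imputed = position
```

Dropping keeps only fully observed time points and so shrinks the dataset, while imputing keeps every time point but replaces the gaps with estimates.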
When it came to choosing a model, we had to research different categories of models. From our prior analysis, we knew we wanted a classification- or regression-type model, which led us to use a RandomForestRegressor.
What is a RandomForestRegressor?
To explain this, we first have to look at what a DecisionTreeRegressor is. A DecisionTreeRegressor is a model that recursively splits the training data into partitions, choosing each split so that the target values within the resulting groups are as similar as possible. To make a prediction, it finds the final partition (leaf) that a new sample falls into & returns the average target value of the training samples in that leaf.
So what is a RandomForestRegressor? A RandomForestRegressor creates & trains multiple DecisionTreeRegressors, each on a random (bootstrap) sample of the data. It then averages the predictions of all of the trees, which makes the result less sensitive to the quirks of any single tree & usually more accurate.
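As a minimal sketch of this idea, the snippet below fits a single tree and a forest with scikit-learn on synthetic data, then checks that the forest's prediction is the mean of its individual trees' predictions. The data and hyperparameters are arbitrary assumptions, not the settings we used for the challenge.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression problem (illustrative only).
X = rng.normal(size=(300, 4))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

# A single shallow tree: at most 2**3 = 8 leaves, one constant value per leaf.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

# A forest of 25 trees, each trained on a bootstrap sample of the rows.
forest = RandomForestRegressor(n_estimators=25, random_state=0).fit(X, y)

# Average the per-tree predictions by hand ...
per_tree = np.stack([est.predict(X) for est in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# ... and compare with what the forest itself returns.
forest_prediction = forest.predict(X)
same = np.allclose(manual_average, forest_prediction)
```

Here `forest.estimators_` holds the fitted trees, so `same` confirms that no tree is selected or discarded: every tree contributes equally to the average.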
Our two models had MSEs of 0.046 & 0.047.
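For reference, the mean squared error (MSE) is the average of the squared differences between the predicted and actual values, so lower is better. Here is a small sketch with made-up numbers; the helper name and the values are purely illustrative and are not our results.

```python
import numpy as np


def mean_squared_error(y_true, y_pred) -> float:
    """Average of the squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Made-up positions, for illustration only.
actual = [0.0, 0.5, 1.0, 1.5]
predicted = [0.1, 0.4, 1.2, 1.5]

# Squared errors are 0.01, 0.01, 0.04 and 0.0, so the mean is 0.015.
mse = mean_squared_error(actual, predicted)
```

Because the errors are squared, the MSE is expressed in squared units of the target, and large misses are penalized more heavily than small ones.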
Below are the graphs for our model results.
Overview of forward movement prediction results:
Zoom-in of forward movement prediction results:
Here we have a general view of our models vs. the actual data for the forward movement of the mouse in the maze.
Blue plotting: Actual data.
Orange: Dropped NaN model predictions. The model tracks the actual data closely and mostly follows the same trajectory, though there are a few inaccuracies in its predictions (dips & spikes).
Green: Iterative Imputer model predictions. At the beginning of the Overview graph, there is a slight buffer before the plotted predictions begin. This is due to the model analyzing the data first; it begins to predict once it has a grasp on how it should be predicting.
Overview of lateral movement prediction results:
Zoom-in of lateral movement prediction results:
In these charts, we see the difference in predictions of our models vs. the real lateral movement data.
Blue plotting: Actual data.
Orange: Dropped NaN model predictions. The dropped NaN model follows the same trend as the actual data with some hiccups.
Green: Iterative Imputer model predictions. We can see that the imputed data model quite accurately follows the actual data.
From our analysis of the graphs, we can see that, while there are a few hiccups, our models are quite successful at predicting the mouse's movement from its neural activity. Due to the lack of time and computing resources, we were only able to train our models on one *.nwb (Neurodata Without Borders) file. Despite being trained on only one file, our models achieved MSEs of 0.046 & 0.047, which we see as very good results.
In our model that dropped all NaNs, the predictions closely follow the actual data, though in some instances the model overfits, which can be expected when we remove data in a case such as this.
In our imputed model, the predictions are more active, showing more movement than both the dropped NaN model & the actual data. Again, this makes sense, since imputing missing values can lead to more variance in the predictions.
Though we did not win, we ultimately all enjoyed our time at PharmaHacks 2024. Again, thank you to the PharmaHacks team for organizing such a fun and educational event.
[!IMPORTANT] Here is the link to the team on their website.
Here is the link to their LinkedIn.