diff --git a/07_final_assignment/paper/main.tex b/07_final_assignment/paper/main.tex
index 4ed23eb..58f3201 100644
--- a/07_final_assignment/paper/main.tex
+++ b/07_final_assignment/paper/main.tex
@@ -12,10 +12,10 @@
 \usepackage{todonotes}
 \title{Simulation of Grainger et al. (2012) with Rescorla Wagner equations}
 \shorttitle{Grainger et al. (2012) simulation with RW equations}
-\author{Robert Geirhos (3827808), Klara Grethen (3899962), \\David-Elias Künstle (3822829), Felicia Saar (3818590),\\Julia Maier (), Marlene Weller (), Anne-Kathrin Mahlke (3897867)}
+\author{Robert Geirhos (3827808), Klara Grethen (3899962), \\David-Elias Künstle (3822829), Felicia Saar (3818590),\\Julia Maier (3879869), Marlene Weller (3837283), Anne-Kathrin Mahlke (3897867)}
 \affiliation{Linguistics for Cognitive Science Course, University of Tübingen}
-\abstract{TODO TODO TODO our abstract goes here TODO TODO TODO}
+\abstract{We simulate the results of a word-learning experiment with baboons (Grainger et al., 2012). To that end we use the R package ndl, which is based on the Rescorla-Wagner learning model. The learning parameters by themselves are not able to make learning slow enough to be comparable to the monkeys, which is why we introduce a random parameter that makes the model take a random guess in 65\% of the trials. With this addition, we can successfully model the monkeys' performance.}
 \lstset{ %
 basicstyle=\footnotesize, % the size of the fonts that are used for the code
@@ -47,10 +47,12 @@
 We decided to try to model the results from Grainger et al. (2012) using Naive Discriminative Learning (NDL), which is a concept of modelling learning (and also an R-package) based on the Rescorla-Wagner model (Rescorla \& Wagner, 1972) and the equilibrium equations by Danks (2003).
 \section{Simulations}
+
 \subsection{Stimuli}
-100 trials in each block
+As stimuli we used the words given in the supplementary material of the original paper. The list contained 307 four-letter words and 7832 non-words, each also made up of four letters. In every trial, the word or non-word was presented split into overlapping trigrams (for example, the word atom becomes \#at, ato, tom, om\#), one trigram after the other, as proposed by Baayen et al. (2016).
+
 \subsection{Experimental Code}
-\todo{why we didn't use the given code, what we improved, how the result is structured - Goal: modular and comprehensive experiment. Problems with paper and given code. What's a block in our experiment.}
+%\todo{why we didn't use the given code, what we improved, how the result is structured - Goal: modular and comprehensive experiment. Problems with paper and given code. What's a block in our experiment.}
 Since preliminary experiments showed that our simulated monkeys performed with very high accuracies (>90\%), we decided to introduce a random parameter $ r $ in the experiment, defined as the fraction of times the monkey would make a random guess instead of an experience-based prediction.
 \subsection{Choice of Parameters}
@@ -60,16 +62,16 @@
 $$ p_{max} = 1 - \frac{r}{2} = 0.675$$
 In other words, the maximum possible performance is no longer 1.0 (for a very intelligent monkey) but rather restricted by $ r $. If a monkey's performance is slightly better than $ p_{max} $, this must be due to chance.
 \subsubsection{Alpha and Beta}
 Both $ \alpha $ and $ \beta $ were our independent variables which we manipulated over the course of the experiments. We gathered data for every possible combination of $ \alpha $ and $ \beta $ values within an equally spaced range from 0.0 to 0.3.
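+To make this parameter grid concrete, the following minimal R sketch (purely illustrative: the variable names are ours, and we assume 15 equally spaced values including both endpoints, which is not necessarily the exact grid used in our experiment code) enumerates the combinations; the reduction to 120 unique pairs is motivated in the next paragraph.
+\begin{lstlisting}[language=R]
+alphas <- seq(0.0, 0.3, length.out = 15)   # 15 equally spaced alpha values
+betas  <- seq(0.0, 0.3, length.out = 15)   # 15 equally spaced beta values
+grid   <- expand.grid(alpha = alphas, beta = betas)  # 15 * 15 = 225 combinations
+# alpha and beta only enter the model as their product, so keeping
+# one ordering per pair (including the diagonal) is sufficient:
+grid   <- subset(grid, alpha <= beta)
+nrow(grid)                                 # 120 combinations
+\end{lstlisting}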
 A total of 15 values for each $ \alpha $ and $ \beta $ were combined to $ 15*15 = 225 $ possible combinations. Since $ \alpha $ and $ \beta $ were internally multiplied to a single value, we expected the outcome to be symmetrical due to the commutativity of the multiplication operation and therefore calculated each combination of $ \alpha $ and $ \beta $ only once, which we used as a trick to improve the overall runtime. Therefore, $\sum_{i=1}^{15}i = 120$ combinations remained to be explored.
-
 \subsubsection{Lambda}
 The independent variable $\lambda$ represents the maximum activation in the Rescorla-Wagner model and therefore limits the learning. It makes it possible to take the salience of a stimulus into account: a more salient stimulus could not only have higher learning rates but also a higher maximum activation. In the original experiment the stimuli were same-colored words and non-words with four letters on an equally colored background. We assume that the single words and non-words are equally salient and therefore keep $\lambda$ constant at 1.
 \subsection{Running Parallelized Experiments}
-Running an experiment with a single combination of $ \alpha $ and $ \beta $ on a normal desktop computer took about 75 minutes. Therefore, the parameter space one could explore within a reasonable amount of time was quite restricted. We decided to write a parallelized version of the code to reduce the overall runtime. Using the R packages foreach, parallel and doParallel \todo{(TODO: Cite them properly)}, we restructured the experiment. Since conflicts can easily occur when more than one core is trying to access a shared data structure at the same time, we implemented a parallelized version that is able to run without even containing critical sections. Instead, each thread has its own data structure, a .txt file, and in the end the results are harvested and combined. This version of the experiment ran on a cluster with 15 cores, each performing a total amount of eight experiments. Altogether, 120 combinations of $ \alpha $ and $ \beta $ were explored overnight, which would have taken about 150 hours in a non-parallelized version.
+Running an experiment with a single combination of $ \alpha $ and $ \beta $ on a normal desktop computer took about 75 minutes. Therefore, the parameter space one could explore within a reasonable amount of time was quite restricted. We decided to write a parallelized version of the code to reduce the overall runtime. Using the R packages foreach, parallel and doParallel, %\todo{(TODO: Cite them properly)}
+we restructured the experiment. Since conflicts can easily occur when more than one core tries to access a shared data structure at the same time, we implemented a parallelized version that runs without any critical sections at all. Instead, each worker writes to its own data structure, a .txt file, and in the end the results are harvested and combined. This version of the experiment ran on a cluster with 15 cores, each performing a total of eight experiments. Altogether, 120 combinations of $ \alpha $ and $ \beta $ were explored overnight, which would have taken about 150 hours in a non-parallelized version.
 \section{Results}
-\todo{results}
+The number of words learned by the actual monkeys ranged between 87 and 308. With the chosen range for $\alpha$ and $\beta$, we obtained between 275 and 307 learned words; however, it is important to note that we presented only 307 words, so the model reached its maximum learning potential.
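+Accuracy, by contrast, is capped by the random parameter at $ p_{max} = 0.675 $ (see the Choice of Parameters section). A short R sanity check (an illustrative sketch only, not part of our experiment code, assuming the model answers correctly whenever it does not guess) reproduces this ceiling:
+\begin{lstlisting}[language=R]
+set.seed(1)                      # reproducibility of this sketch
+r        <- 0.65                 # fraction of random guesses
+n_trials <- 100000               # hypothetical number of trials
+guessing <- runif(n_trials) < r  # TRUE where the monkey guesses
+correct  <- ifelse(guessing,
+                   runif(n_trials) < 0.5,  # random guess: 50% correct
+                   TRUE)                   # otherwise: assume perfect learning
+mean(correct)                    # approximately 0.675 = 1 - r/2
+\end{lstlisting}
+This simulated ceiling frames the accuracies reported next.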
 The general accuracy for the real monkeys lay between 71.14\% and 79.81\%, while our accuracies ranged between 60\% and 68\%. Accuracies for word and non-word decisions are similar in both cases.
 \begin{figure*}[ht]
 \centering
@@ -97,11 +99,15 @@
 \section{Discussion}
-\todo{"your conclusions about what is most likely to underlie the different success rates of the baboons"}
-While working on this project, we encountered several problems of different natures:
-Firstly, our model learns a lot faster than the monkeys in the actual experiment, which we tried to account for by using the random parameter mentioned above, and also by keeping our learning rates $\alpha$ and $\beta$ very small. However, we were also restricted here, as we did not want to use floating-point numbers, which might have led to unforeseeable behaviour.
-Secondly, the after studying the original paper, we were left with some unanswered questions
+%\todo{"your conclusions about what is most likely to underlie the different success rates of the baboons"}
+The results show that, on its own, our model learns too well to match the actual monkeys. Only the random parameter we introduced made it possible to obtain results similar to those of the original experiment. When we tried to account for this discrepancy solely by lowering the learning rates, we encountered a restriction in the form of having to use very small floating-point numbers, which might have led to unforeseeable behaviour. Therefore, we chose to use the random parameter instead.\\
+Unfortunately, some information on the exact procedure of the original experiment was missing from the paper, so we had to guess some of the details. For example, it was not made clear what a block of trials looked like in the first few blocks, when there were not yet any known words to fill the corresponding 25\% of the block.\\
+We were also slightly unhappy with the definition of a learned word, namely a word recognised with 80\% accuracy. We would expect this definition to become problematic when a word was `almost' learned but did not quite reach the 80\%: in the next block containing that word, learning would be a lot quicker than for an actually new word. It might be better to monitor and save the knowledge level for each specific word and to measure the actual number of repetitions a word needed to become known.\\
+Concerning our code, there are a few measures that could be taken to improve it as well. As mentioned above, we parallelized the process because it would otherwise have taken far too long to run. It would be very interesting to look into ways of making the program run even faster, enabling more trials to be run and thereby yielding more data and more precise results.\\
+Shorter running times would also make it possible to re-run the program with more words, in order to see whether there are changes in the later learning process that we could not explore here due to the limited number of words. The mode of presentation could also be reassessed, as well as whether the number of letters changes the behaviour of the model.\\
+Lastly, different learning models could be used in the experiment, to see whether they fit the results of the actual monkeys better.
+\newpage
 \appendix
 \section{Complete Results}
 Here are the complete results of our experiments. The abbreviations used are: