diff --git a/07_final_assignment/paper/main.tex b/07_final_assignment/paper/main.tex
index da86ae7..4ed23eb 100644
--- a/07_final_assignment/paper/main.tex
+++ b/07_final_assignment/paper/main.tex
@@ -12,7 +12,7 @@
 \usepackage{todonotes}
 \title{Simulation of Grainger et al. (2012) with Rescorla Wagner equations}
 \shorttitle{Grainger et al. (2012) simulation with RW equations}
-\author{Robert Geirhos (3827808), Klara Grethen (3899962), \\David-Elias Künstle (3822829), Felicia Saar (3818590)}
+\author{Robert Geirhos (3827808), Klara Grethen (3899962), \\David-Elias Künstle (3822829), Felicia Saar (3818590),\\Julia Maier (), Marlene Weller (), Anne-Kathrin Mahlke (3897867)}
 \affiliation{Linguistics for Cognitive Science Course, University of Tübingen}
 \abstract{TODO TODO TODO our abstract goes here TODO TODO TODO}
@@ -39,9 +39,16 @@
 %\cite{}
 \section{Introduction}
-\todo{statement of the problem}
+Computational models have long been used to represent real-world events and phenomena. They help scientists formulate their hypotheses more precisely and test them by applying the model to different situations. Such models are also used to make predictions about future behaviour and events.\\
+One scientific field in which computational models are becoming increasingly important is linguistics, especially the study of language learning. This is where our modelling attempt can be placed as well.\\
+In 2012, Grainger et al. tested baboons on their ability to learn and recognize words, as opposed to non-words, presented to them in written form on a computer screen. Their goal was to show that orthographic information can be processed without knowledge of its semantic component, and thereby that it is possible to learn to `read' (to a certain extent) without prior language knowledge.\\
+To achieve that goal, they trained baboons (which had, of course, no knowledge of human language) to discriminate words from non-words presented on a computer screen. The baboons could independently start blocks of 100 trials, in which they were shown, in random order, 50 non-words, 25 already known words, and the same unknown word in the remaining 25 trials. They responded by pressing one of two buttons on the screen and were rewarded with food every time they answered correctly. A word was regarded as known once 80\% of the responses to it within one block of trials were correct; it was then added to the group of already known words and used as such in subsequent trials. For the monkeys, the difference between words and non-words was that words appeared repeatedly, while any single non-word was shown only a few times.\\
+The results show that response accuracy for both words and non-words rose above chance very quickly, with word accuracy slightly higher overall. It also became clear that the monkeys did not recognize words merely because they appeared more often; they were also able to find patterns and thereby classify new words as words quite quickly.\\
+We decided to model the results of Grainger et al. (2012) using Naive Discriminative Learning (NDL), a learning framework (and an R package of the same name) based on the Rescorla-Wagner model (Rescorla \& Wagner, 1972) and the equilibrium equations of Danks (2003).
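+For reference, the Rescorla-Wagner model updates the association strength between each cue and each outcome after every trial; in one standard formulation (sketched here with the parameter names used below), the change in the association strength $V_{c}$ of a present cue $c$ is
+\begin{equation}
+\Delta V_{c} = \alpha \beta \Big( \lambda - \sum_{c'} V_{c'} \Big),
+\end{equation}
+where $\alpha$ and $\beta$ are learning-rate parameters, $\lambda$ is the maximum activation, and the sum runs over the association strengths of all cues $c'$ present in the current trial.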
 \section{Simulations}
+\subsection{Stimuli}
+Each block consisted of 100 trials: 50 non-words, 25 already known words, and 25 presentations of the same unknown word, in random order.
 \subsection{Experimental Code}
 \todo{why we didn't use the given code, what we improved, how the result is structured - Goal: modular and comprehensive experiment. Problems with paper and given code. What's a block in our experiment.}
 Since preliminary simulation runs showed that the simulated monkey performed with very high accuracy (>90\%), we decided to introduce a random parameter $ r $ into the experiment, defined as the fraction of trials in which the monkey makes a random guess instead of an experience-based prediction (a minimal sketch of this decision rule is given in the appendix).
@@ -56,7 +63,7 @@
 \subsubsection{Lambda}
 The independent variable $\lambda$ represents the maximum activation in the Rescorla-Wagner model and therefore limits learning.
-It makes it possible to modulate saliency of a stimulus. A more salient stimulus could not only have higher learning rates but also a higher maximum activation. In the original experiment the stimulus were same colored words and nonwords with four letters on a equally colored background. We assume the single words and nonwords are equally salient and keep therefore $\lambda$ constant (1).
+It makes it possible to modulate the saliency of a stimulus. A more salient stimulus could not only have higher learning rates but also a higher maximum activation. In the original experiment, the stimuli were same-colored four-letter words and non-words on a uniformly colored background. We assume the individual words and non-words are equally salient and therefore keep $\lambda$ constant at 1.
 \subsection{Running Parallelized Experiments}
 Running an experiment with a single combination of $ \alpha $ and $ \beta $ on a normal desktop computer took about 75 minutes, so the parameter space that could be explored within a reasonable amount of time was quite restricted. We therefore wrote a parallelized version of the code to reduce the overall runtime. Using the R packages foreach, parallel and doParallel \todo{(TODO: Cite them properly)}, we restructured the experiment. Since conflicts easily occur when more than one core accesses a shared data structure at the same time, we implemented a parallelized version that runs without any critical sections: each thread writes to its own data structure, a .txt file, and at the end the results are harvested and combined (a sketch of this setup is given in the appendix). This version of the experiment ran on a cluster with 15 cores, each performing a total of eight experiments. Altogether, 120 combinations of $ \alpha $ and $ \beta $ were explored overnight, which would have taken about 150 hours in a non-parallelized version.
@@ -91,6 +98,9 @@
 \section{Discussion}
 \todo{"your conclusions about what is most likely to underlie the different success rates of the baboons"}
+While working on this project, we encountered several problems of different kinds.
+Firstly, our model learns much faster than the monkeys in the actual experiment, which we tried to account for by using the random parameter mentioned above and by keeping our learning rates $\alpha$ and $\beta$ very small. However, we were restricted in this respect, as we did not want to use floating-point numbers, which might have led to unforeseeable behaviour.
+Secondly, even after studying the original paper, we were left with some unanswered questions.
 \appendix
 \section{Complete Results}
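+\section{Code Sketches}
+The following sketches illustrate two implementation details discussed above. They are simplified illustrations rather than our actual experiment code; all function and variable names are chosen for these sketches only.
+\subsection{The random parameter}
+A minimal sketch of the decision rule with the random parameter $ r $. We assume two hypothetical activation values for the outcomes ``word'' and ``nonword''; with probability $ r $ the simulated monkey guesses at random, otherwise it picks the outcome with the higher activation:
+\begin{verbatim}
+# Hypothetical decision rule: act_word and act_nonword stand
+# for the current Rescorla-Wagner activations of the outcomes.
+choose_response <- function(act_word, act_nonword, r) {
+  if (runif(1) < r) {
+    # random guess: both buttons equally likely
+    sample(c("word", "nonword"), 1)
+  } else {
+    # experience-based prediction: the outcome with
+    # the higher activation wins
+    if (act_word >= act_nonword) "word" else "nonword"
+  }
+}
+\end{verbatim}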
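+\subsection{Parallelized runs}
+A minimal sketch of the parallelized setup with the foreach, parallel and doParallel packages. The function \texttt{run\_experiment} below is a stand-in for one complete simulation at a given $ \alpha $/$ \beta $ combination, and the parameter grid is purely illustrative:
+\begin{verbatim}
+library(foreach)
+library(parallel)
+library(doParallel)
+
+# Stand-in for one complete simulation at a given
+# (alpha, beta) pair.
+run_experiment <- function(alpha, beta) {
+  data.frame(alpha = alpha, beta = beta, accuracy = NA)
+}
+
+# Illustrative grid of 120 (alpha, beta) combinations.
+params <- expand.grid(alpha = seq(0.001, 0.012, length.out = 12),
+                      beta  = seq(0.001, 0.010, length.out = 10))
+
+cluster <- makeCluster(15)  # 15 cores, as on our cluster
+registerDoParallel(cluster)
+
+# Each worker writes its results to its own .txt file, so no
+# shared data structure (and no critical section) is needed.
+foreach(i = seq_len(nrow(params))) %dopar% {
+  result <- run_experiment(params$alpha[i], params$beta[i])
+  write.table(result, file = sprintf("result_%03d.txt", i))
+}
+
+stopCluster(cluster)
+# The per-worker files are then read back in and combined.
+\end{verbatim}
+Because every worker owns its output file, no locking is needed, and combining the files afterwards is a single sequential step.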