diff --git a/07_final_assignment/paper/main.tex b/07_final_assignment/paper/main.tex
index 32a1cf2..7658e42 100644
--- a/07_final_assignment/paper/main.tex
+++ b/07_final_assignment/paper/main.tex
@@ -41,12 +41,17 @@
 %\cite{}
 
 \section{Introduction}
-Computational models have been used for a very long time in an attempt to represent actual events and phenomena. These models help scientists formulate their hypotheses more precisely, and to test these by applying the model to different situations. The models are also used to make predictions about future behavior and events.\\
-One scientific field, where computational models are becoming increasingly important, is linguistics, especially the field of language learning. This is where our modelling attempt can be placed as well.\\
-In 2012, Grainger et al. tested Baboons on their ability to learn and recognize words as opposed to non-words, when presented to them in written form on a computer screen. Their goal was to show that it is possible to process orthographic information without knowledge of its semantic component and by that, that it ist possible to learn to 'read' (to a certain extend), without prior language knowledge.\\
-To achieve that goal, they trained Baboons (who had, of course, no knowledge of human language) to discriminate words from non-words that were presented on a computer screen. The Baboons were able to independently start blocks of 100 trials in which they would be presented with 50 non-words, 25 already known words and always the same unknown word in the other 25 trials in random order. They reacted by pressing one of two buttons on the screen and were rewarded with food every time they answered correctly. A word was regarded as known once 80\% of the responses to it in one block of trials were correct. It was then added to the group of already known words and used as such in subsequent trials. The difference between the words and the non-words was, for the monkeys, that words appeared repeatedly, while one single non-word was only shown very few times.\\
-The results show that correctness of the responses for both words and nonwords grew above chance very quickly, while word accuracy was slightly higher overall. It also became clear, that the monkeys did not only recognize words because of their appearing more often, but also were able to find patterns and by that recognize new words as words quite quickly.\\
-We decided to try to model the results from Grainger et al. (2012) using Naive Discriminative Learning (NDL), which is a concept of modelling learning (and also an R-package) based on the Rescorla-Wagner model (Rescorla \& Wagner, 1972) and the equilibrium equations by Danks (2003).
+Computational models have long been used to represent real events and phenomena. They help scientists formulate their hypotheses more precisely and test them by applying the model to different situations. Models are also used to make predictions about future behaviour and events.
+
+One scientific field in which computational models are becoming increasingly important is linguistics, especially the study of language learning. This is also where our modelling attempt is situated.
+
+In \citeyear{Grainger245}, \citeauthor{Grainger245} tested baboons on their ability to learn to distinguish words from non-words presented to them in written form on a computer screen. Their goal was to show that orthographic information can be processed without knowledge of its semantic component, and thus that it is possible to learn to 'read' (to a certain extent) without prior language knowledge.
+
+To this end, they trained baboons (which had, of course, no knowledge of human language) to discriminate words from non-words presented on a computer screen. The baboons could independently start blocks of 100 trials in which they were shown, in random order, 50 non-words, 25 already known words, and the same unknown word in the remaining 25 trials. They responded by pressing one of two buttons on the screen and were rewarded with food every time they answered correctly. A word was regarded as known once 80\% of the responses to it within one block were correct; it was then added to the set of already known words and used as such in subsequent trials. From the monkeys' perspective, the difference between words and non-words was that words appeared repeatedly, whereas any single non-word was shown only very few times.
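+
+To make this procedure concrete, the following minimal sketch shows how one such block could be put together in R; it is an illustration only, not the original experiment software, and the word lists are made-up placeholders.
+
+\begin{lstlisting}[language=R]
+# Sketch: compose one 100-trial block as described above
+# (illustration only; all word lists are made-up placeholders).
+compose_block <- function(known_words, new_word, nonword_pool) {
+  stimuli <- c(sample(nonword_pool, 50, replace = TRUE), # 50 non-words (the real pool was much larger)
+               sample(known_words, 25, replace = TRUE),  # 25 already known words
+               rep(new_word, 25))                        # the same unknown word in 25 trials
+  is_word <- rep(c(FALSE, TRUE), each = 50)              # correct response for each stimulus
+  idx <- sample(seq_along(stimuli))                      # present the trials in random order
+  data.frame(stimulus = stimuli[idx], is_word = is_word[idx],
+             stringsAsFactors = FALSE)
+}
+
+# Example call with placeholder stimuli:
+block <- compose_block(known_words  = c("done", "vast", "kite"),
+                       new_word     = "wasp",
+                       nonword_pool = c("dran", "telk", "virt", "stod"))
+\end{lstlisting}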
+
+The results show that response accuracy for both words and non-words rose above chance very quickly, with word accuracy slightly higher overall. It also became clear that the monkeys did not recognize words merely because they appeared more often, but were also able to pick up on patterns and thereby classify new words as words quite quickly.
+
+We decided to model the results of \textcite{Grainger245} using Naive Discriminative Learning (NDL), an approach to modelling learning (and also an R package) based on the Rescorla-Wagner model \parencite{rescorla1972theory} and the equilibrium equations by Danks (2003).
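+
+To make the underlying learning rule concrete, the following minimal sketch applies the Rescorla-Wagner update for a single outcome (e.g. 'word'), with single letters as cues; the learning rate and the cue set are placeholder choices for illustration, not the implementation used in the NDL package.
+
+\begin{lstlisting}[language=R]
+# Sketch of the Rescorla-Wagner update for one outcome.
+# alpha_beta (learning rate) and lambda are placeholder settings.
+rw_update <- function(weights, cues, outcome_present,
+                      alpha_beta = 0.01, lambda = 1) {
+  v_total <- sum(weights[cues])                  # summed support from the cues on this trial
+  target  <- if (outcome_present) lambda else 0  # what the cues should have predicted
+  weights[cues] <- weights[cues] + alpha_beta * (target - v_total)
+  weights
+}
+
+# Start with zero weights for a few letter cues and run two illustrative trials.
+w <- setNames(rep(0, 6), c("d", "o", "n", "e", "r", "a"))
+w <- rw_update(w, cues = c("d", "o", "n", "e"), outcome_present = TRUE)   # word "done"
+w <- rw_update(w, cues = c("d", "r", "a", "n"), outcome_present = FALSE)  # non-word "dran"
+\end{lstlisting}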
 
 \subsection{Naive Discrimination Learning}
 Since the first experiments in modern learning theory by Ivan Pavlov it's observable, that learning is not only making associations between co-occurring cues and outcomes but discriminating which cues predict the presence and the absence of a outcome \parencite{baayen2015abstraction}.
@@ -125,11 +130,16 @@
 \section{Discussion}
-The results show that our model is actually too good for the actual monkeys. Only the random parameter we introduced made it possible to obtain similar results as in the original experiment. When trying to account for the unequality only by lowering the learning rates, we encountered a restriction in form of the need to use floating-point numbers, which might have led to unforeseeable behaviour. Therefore, we chose to use the random parameter instead.\\
-Unfortunately, some information on the exact conduct of the original experiment was missing in the paper, so we had to guess some of the details. For expample, it was not made clear what a block of trials would have looked like in the first few blocks, when there were no already known words to be used in the corresponding 25\% of the block.\\
-We were also slightly unhappy with the definition of a word being learned, which was when the word had 80\% accuracy of recognition. We would expect this definition to become proplematic when a word was 'almost' learned, but not quite reaching the 80\%. In the next block with that word, the learning would be a lot quicker than for an actually new word. It might be a good idea to monitor and save the knowledge level concerning one specific word an measuring the actual number of reptitions a word needed to become known.\\
-Concerning our code, there are a few measurements that could be taken to improve it, too. As mentioned above, we parallelised the process because it would have otherwise taken far too long to calculate. It would be very interesting to look into ways to make the program run even faster, therefore enabling more trials to be run and therefore resulting in more data and exacter results.\\
-Shortening running times would also make it possible to re-run the program with more words to see if there are changes in the later learning process which we now could not explore due to lack of words. The mode of presentation could be reassessed, as well as whether the number of letters changes the behaviour of the model.\\
+The results show that our model is actually too good compared to the real monkeys. Only the random parameter we introduced made it possible to obtain results similar to those of the original experiment. When we tried to account for the discrepancy solely by lowering the learning rates, we ran into a restriction in the form of having to use floating-point numbers, which might have led to unforeseeable behaviour. We therefore chose to use the random parameter instead.
+
+Unfortunately, some information on the exact procedure of the original experiment was missing from the paper, so we had to guess some of the details. For example, it was not made clear what a block of trials looked like in the first few blocks, when there were not yet any already known words to fill the corresponding 25\% of the block.
+
+We were also slightly unhappy with the definition of a word counting as learned, namely once it was recognized with 80\% accuracy. We would expect this definition to become problematic when a word was 'almost' learned but did not quite reach the 80\%: in the next block containing that word, learning would proceed much faster than for a genuinely new word. It might be a good idea to monitor and record the knowledge level of each specific word and to measure the actual number of repetitions a word needed to become known.
+
+Concerning our code, there are also a few measures that could be taken to improve it. As mentioned above, we parallelised the process because the computation would otherwise have taken far too long. It would be very interesting to look into ways of making the program run even faster, which would allow more trials to be run and thus yield more data and more precise results.
+
+Shortening running times would also make it possible to re-run the program with more words, to see whether the later learning process changes in ways we could not explore here due to the lack of words. The mode of presentation could also be reassessed, as well as whether the number of letters changes the behaviour of the model.
+
 Lastly, of course, different models could be used in the experiment, to see if other models fit the results of the actual monkeys better.
 
 \newpage
@@ -155,8 +165,4 @@
 \lstinputlisting[language=R]{../baboonSimulation.R}
 
-\end{document}
-%%% Local Variable:
-%%% mode: latex
-%%% TeX-master: t
-%%% End:
+\end{document}
\ No newline at end of file