diff --git a/text/thesis/01Introduction.tex b/text/thesis/01Introduction.tex
index 3839c7d..ef4bf57 100644
--- a/text/thesis/01Introduction.tex
+++ b/text/thesis/01Introduction.tex
@@ -9,14 +9,14 @@
     In a slightly different context it might become possible to handle a machine (e.g. an industrial robot or mobile robots like quadrocopters) with \qq{thoughts} (i.e. brain activity) like an additional limb. One could learn to use the possibilities of the robot like the possibilities of one's own arm and hand to manipulate something.\\
     Similar to that application it could be possible to drive a car by brain activity. This would lower the reaction time needed to activate the brakes, for example, by direct interaction instead of using the nerves down to the leg to press the brake pedal.
-    Using non-invasive methods like EEG makes it harder to get a good signal and determine its origin. However it lowers the risk of injuries and infections which makes it the method of choice for wide spread application (cf. \cite{Collinger13}). Modern versions of these caps even use dry electrodes which allow for more comfort without loosing predictive strength (cf. \cite{Yeung15}). So everybody may put on and off an EEG-cap without high costs for production or placement.
+    Using non-invasive methods like EEG makes it harder to get a good signal and to determine its origin. However, it lowers the risk of injuries and infections, which makes it the method of choice for widespread application (cf. \cite{Collinger13}). Modern versions of EEG-caps even use dry electrodes, which allow for more comfort without losing predictive strength (cf. \cite{Yeung15}). So anybody can put on and take off an EEG-cap without high costs for production or placement.
     Predicting synergies instead of positions or movement is closer to the concept the nervous system uses. This should make synergies easier to predict, while we can still use them to move a robotic arm or a quadrocopter. Because there are different ways to calculate synergies from EMG, we compare them and try to reconstruct movement from them. To be able to compare the results, similar calculations were done with other data and paradigms, such as direct prediction from EEG.

 \section{Overview}
-    After this Introduction in Materials and Methods (Chapter \ref{chp:mat}) we show the scientific background based on the methods used in the work. These reach from PCA and Autoencoders over SVMs and regression to boxplots and topographical plots.\\
+    After this introduction, in Materials and Methods (Chapter \ref{chp:mat}) we present the scientific background of the methods used in this work. These range from PCA and autoencoders through SVMs and regression to boxplots and topographical plots.\\
     In Chapter \ref{chp:results} (Results) we show the numerical findings of our work, separated into parts on synergies, classification, regression and a topographical analysis of the brain activity.\\
     These results and their meaning will be discussed in Chapter \ref{chp:dis} (Discussion).\\
     Finally we take a look at the possible future and discuss what further research could be done based on or related to our work (Chapter \ref{chp:fut}).
diff --git a/text/thesis/02MaterialsAndMethods.tex b/text/thesis/02MaterialsAndMethods.tex
index 17fe984..b7e2895 100644
--- a/text/thesis/02MaterialsAndMethods.tex
+++ b/text/thesis/02MaterialsAndMethods.tex
@@ -17,7 +17,7 @@
     The frequencies typically used for movement prediction in EEG are about 8-24 Hz (\cite{Blokland15}, \cite{Ahmadian13}, \cite{Wang09}).
     EEG is often used for non-invasive BCIs because it is cheap and easier to use than e.g. fMRI. The electrodes have to be spread over the scalp. To allow for comparability there are standardized methods for this. These methods also bring a naming convention with them.
 \subsubsection{10-20 system}
-    In this standard adjacent electrodes are placed either 10\% or 20\% of the total front-back or left-right distance apart. This standardization also makes it possible to name each electrode or rather here place. This is done with capital letters for lobes (Frontal, \qq{Central}, Parietal, Occipital and Temporal) and numbers for the specific place on the lobe. Even numbers are on the right side of the head, odd on the left; larger numbers are closer to the ears, lower numbers closer to the other hemisphere. The exact number now refers to the exact distance from center: $$\left\lceil\frac{x}{2}\right\rceil\cdot \frac{d}{10}$$ where $x$ is the number and $d$ the diameter of the scalp. Electrodes in the centre are named with a lower case $z$ e.g. $Cz$.\\
+    In this standard, adjacent electrodes are placed either 10\% or 20\% of the total front-back or left-right distance apart. This standardization also makes it possible to name each electrode, or rather each position. This is done with capital letters for the lobes (Frontal, \qq{Central}, Parietal, Occipital and Temporal) and numbers for the specific position on the lobe. Even numbers are on the right side of the head, odd on the left; larger numbers are closer to the ears, lower numbers closer to the other hemisphere. The number encodes the distance from the center line: $$\left\lceil\frac{x}{2}\right\rceil\cdot \frac{d}{10},$$ where $x$ is the number and $d$ the diameter of the scalp. Electrodes on the center line are named with a lower case $z$, e.g. $Cz$.\\
     Electrodes between two lobes (10\% instead of 20\% distance) are named after both adjacent lobes (anterior first), e.g. $FCz$ (between the frontal and central lobe). Also see Figure~\ref{fig:10-20}.
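To make this naming rule concrete, here is a tiny Python helper (purely illustrative, not part of the thesis code; the 36 cm scalp diameter and the signed-offset convention are assumptions):

```python
# Hypothetical helper illustrating the 10-20 naming rule: electrode
# number x on a scalp of diameter d lies ceil(x/2) * d/10 from the
# center line; even numbers right, odd numbers left.
import math

def electrode_offset(x: int, d: float) -> float:
    """Signed distance from the center line (positive = right hemisphere)."""
    if x == 0:
        return 0.0  # 'z' electrodes (e.g. Cz) sit on the center line
    side = 1 if x % 2 == 0 else -1
    return side * math.ceil(x / 2) * d / 10

# C3 and C4 on an (assumed) 36 cm scalp: 7.2 cm left and 7.2 cm right
print(electrode_offset(3, 36.0), electrode_offset(4, 36.0))
```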
 \begin{figure}[!p]
@@ -34,14 +34,14 @@
         \item Alpha: 7.5-12.5 Hz (depending also on age)
         \item Beta: 13-20 Hz
     \end{itemize}
-    There are different definitions of the limits of the bands, as we only use them for rough estimation we stick on these. For more exact results an analysis of wave patterns would be necessary.
+    There are different definitions of the limits of the bands; as we only use them for rough estimation, we stick to these. For more exact results an analysis of wave patterns would be necessary.
 \subsection{Power estimation}
 \subsubsection{EEG}
     One way to use data from EEG is to analyze the occurring frequencies and their respective power.\\
     To gain these from the continuous signal there are different methods. The intuitive approach would be to use the Fourier transformation; however, the Fourier transform need not exist for a continuous signal. So we used power spectral density (PSD) estimation.
 \subsubsection{Power spectral density estimation}
-    The PSD is the power per frequency. Power here refers to the square of the amplitude.\\
-    If the Fourier transform is existing, PSD can be calculated from it e.g. as periodogram. If not it has to be estimated. One way to do so is parametrized with an Autoregressive model(AR). Here one assumes that the there is a correlation of the spectral density between $p$ consecutive samples and the following one. This leads to an equation with only $p$ parameters which can be estimated in different ways. We used Burg's method (\texttt{pburg} from \matlab{} library).\\
+    The PSD is the power per frequency, where power refers to the square of the amplitude.\\
+    If the Fourier transform exists, the PSD can be calculated from it, e.g. as a periodogram. If not, it has to be estimated. One way to do so is parametrically, with an autoregressive (AR) model. Here one assumes a correlation between $p$ consecutive samples and the following one. This leads to an equation with only $p$ parameters, which can be estimated in different ways. We used Burg's method (\texttt{pburg} from the \matlab{} library).\\
     In Figure~\ref{fig:psd} we see the difference between autoregressive (\texttt{pburg}) and periodogram-based (\texttt{pwelch}) PSD estimation.
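The thesis uses \matlab{}'s \texttt{pwelch} for the periodogram-based estimate; as a rough Python analogue, SciPy's \texttt{welch} (the 10 Hz test tone, sampling rate and segment length are arbitrary choices):

```python
# Minimal Welch PSD estimate (scipy.signal.welch as a stand-in for
# MATLAB's pwelch); test signal and parameters are arbitrary.
import numpy as np
from scipy.signal import welch

fs = 500.0                                   # sampling rate in Hz
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)

f, pxx = welch(x, fs=fs, nperseg=1024)       # averaged periodogram
print("peak near %.1f Hz" % f[np.argmax(pxx)])  # expect ~10 Hz
```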
 \begin{figure}
     \includegraphics[width=\textwidth]{psd.png}
@@ -53,25 +53,25 @@
 \label{mat:burg}
     Burg's method (\cite{Burg75}) is a special case of parametric PSD estimation. It interprets the Yule-Walker-Equations as a least squares problem and iteratively estimates solutions.\\
     According to \cite{Huang14} Burg's method fits well in cases where high resolution is needed.\\
-    Burg and Levinson-Durbin algorithms are examples for PSD estimation where an autoregressive model is used instead of Fast Fourier Transformation. The approach is described well by Spyers-Ashby et al. (\cite{Spyers98}). The idea is to lower the number of parameters determining the production of the signal. The number of parameters used is called \textit{model order} (250 in our example, lower in most cases). These parameters are estimated from the original data. For PSD estimation the modeled values are used which allows easier transformation since the data is generated by an known process.\\
-    Often the Rational transfer function modeling is used having the general form of $$x_n=-\sum\limits_{k=1}^p a_kx_{n-k}+ \sum\limits_{k=0}^q b_ku_{n-k},$$ where $x_n$ is the output, $u_n$ the input. $a,b$ are the system parameters which have to be estimated from original data. As we have unknown input in our application the output can only be estimated which simplifies the formula as follows $$\hat{x}_n=-\sum\limits_{k=1}^p a_k\hat{x}_{n-k}.$$
+    Burg and Levinson-Durbin algorithms are examples of PSD estimation which use an autoregressive model instead of the Fast Fourier Transformation. The approach is described well by \cite{Spyers98}. The idea is to lower the number of parameters determining the production of the signal. The number of parameters used is called the \textit{model order} (250 in our example, lower in most cases). These parameters are estimated from the original data. For PSD estimation the modeled values are used, which allows easier transformation since the data is generated by a known process.\\
+    Often rational transfer function modeling is used, having the general form $$x_n=-\sum\limits_{k=1}^p a_kx_{n-k}+ \sum\limits_{k=0}^q b_ku_{n-k},$$ where $x_n$ is the output and $u_n$ the input. $a,b$ are the system parameters, which have to be estimated from the original data.
+    As the input is unknown in our application, the output can only be estimated, which simplifies the formula as follows: $$\hat{x}_n=-\sum\limits_{k=1}^p a_k\hat{x}_{n-k}.$$
     Estimating the parameters is done by minimizing the forward prediction error $E$: $$E=\frac{1}{N}\sum\limits_{i=1}^N \left(x_i-\hat{x}_i\right)^2$$
-    The minimum has zero slope and can be found by setting the derivative to zero:$$\frac{\partial E}{\partial a_k},\text{ for } 1\le k\le p$$
+    The minimum has zero slope and can be found by setting the derivatives to zero: $$\frac{\partial E}{\partial a_k}=0,\text{ for } 1\le k\le p.$$
     This yields a set of equations called the \emph{Yule-Walker-Equations} (cf. \cite{Yule27}, \cite{Walker31}).\\
     The parameters ($a_k$) are then estimated based on the Yule-Walker-Equations, using forward and backward prediction.
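To make the recursion concrete, here is a compact NumPy sketch of Burg's method - a simplified stand-in for \texttt{pburg}, not the \matlab{} implementation; model order and test signal are arbitrary:

```python
import numpy as np

def burg_ar(x, order):
    """AR coefficients a_1..a_p via Burg's recursion (illustrative).

    Sign convention matches the formula above: x[n] ~ -sum_k a_k x[n-k].
    """
    f = np.asarray(x, dtype=float)   # forward prediction errors
    b = f.copy()                     # backward prediction errors
    a = np.zeros(0)
    for _ in range(order):
        fp, bp = f[1:], b[:-1]
        # reflection coefficient minimizing forward + backward error
        k = -2.0 * np.dot(fp, bp) / (np.dot(fp, fp) + np.dot(bp, bp))
        f, b = fp + k * bp, bp + k * fp
        a = np.concatenate((a + k * a[::-1], [k]))  # Levinson-style update
    return a

def ar_psd(a, n_freqs=512):
    """PSD (up to scale) of the fitted AR model: 1/|1+sum a_k e^{-iwk}|^2."""
    w = np.linspace(0, np.pi, n_freqs)              # rad/sample
    ks = np.arange(1, len(a) + 1)
    denom = 1 + (a * np.exp(-1j * np.outer(w, ks))).sum(axis=1)
    return w, 1.0 / np.abs(denom) ** 2

# arbitrary test signal: 10 Hz sine in noise, sampled at 500 Hz
fs = 500.0
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)
w, psd = ar_psd(burg_ar(x, order=16))
print("peak near %.1f Hz" % (w[np.argmax(psd)] * fs / (2 * np.pi)))
```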
 \subsection{Low Frequencies}
     In the 2000s, new techniques began to be used to record ultrafast and infraslow brainwaves (above 50 Hz and below 1 Hz). These were found to have some importance (cf. \cite{Vanhatalo04}).\\
-    Also in predicting movements there was found some significance in low frequency as was done by Liu et al. (\cite{Liu11}) and Antelis et al. (\cite{Antelis13}) for example. Antelis et al. found correlations between hand movement and low frequency signal of $(0.29,0.15,0.37)$ in the dimensions respectively.\\
-    Lew et al. (\cite{Lew14}) state low frequencies are mainly involved in spontaneous self-induced movement and can be found before the movement starts. By this they may be a great possibility to lower reaction time of neuroprostheses for example.
+    Low frequencies were also found to be of some significance for predicting movements, e.g. by \cite{Liu11} and \cite{Antelis13}. \citeauthor{Antelis13} found correlations of $(0.29,0.15,0.37)$ between hand movement and the low frequency signal in the respective dimensions.\\
+    \cite{Lew14} state that low frequencies are mainly involved in spontaneous, self-induced movement and can be found before the movement starts. Because of this they may be a great opportunity to lower the reaction time of neuroprostheses, for example.
 \subsection{Filtering}
-    Filtering of the recorded EEG signal is necessary for different reasons. First there are current relics from 50Hz current. These can be filtered out with bandstop filters.\\
+    Filtering of the recorded EEG signal is necessary for different reasons. First, there are artifacts from the 50 Hz mains current. These can be filtered out with bandstop filters.\\
     Secondly, we need to concentrate on the interesting frequencies (for classical EEG 1-50 Hz). This is done by applying lowpass and highpass filters respectively. This is necessary because the PSD at lower frequencies is a lot higher than at higher frequencies: the relation $$PSD(f)=\frac{c}{f^\gamma}$$ holds for constants $c$ and $\gamma$ (\cite{Demanuele07}).\\
     The Butterworth filter (\cite{Butterworth30}) was invented by Stephen Butterworth in 1930. Its advantage is uniform sensitivity to all wanted frequencies. In comparison to other filters the Butterworth filter is smoother because it is flat in the pass band and monotonic over all frequencies. This however comes at the cost of decreased steepness, meaning a higher portion of frequencies beyond the cutoff passes the filter.
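A sketch of both filtering steps with SciPy's Butterworth design (the sampling rate, filter order and exact band edges here are assumptions, not the recorded values):

```python
# Butterworth filtering sketch: 50 Hz mains bandstop plus a 1-50 Hz
# bandpass, as described above. Order, band margins and fs are assumed.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 500.0                                    # assumed sampling rate in Hz
rng = np.random.default_rng(0)
eeg = rng.standard_normal(10 * int(fs))       # stand-in for one EEG channel

# bandstop around the 50 Hz power line artifact
b_stop, a_stop = butter(4, [48, 52], btype='bandstop', fs=fs)
# bandpass for the classical EEG range 1-50 Hz
b_pass, a_pass = butter(4, [1, 50], btype='bandpass', fs=fs)

clean = filtfilt(b_pass, a_pass, filtfilt(b_stop, a_stop, eeg))
print(clean.shape)
```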
 \subsection{EMG}
     Muscles contract after a signal via an efferent nerve activates them. The contraction of muscles also releases measurable energy, which is used for Electromyography (EMG). There are intramuscular applications of EMG, but we only used surface EMG.\\
     From surface EMG the activity of muscles can be estimated, however not very precisely without repetition. Since the muscles used for arm movements are quite large, in our setting EMG allows relatively precise estimations of the underlying muscle activity.
-    EMG is mainly developed for diagnostic tasks. However it is also applicable in science to track muscle activity.
+    EMG was mainly developed for diagnostic tasks. However, it is also applicable in science to track muscle activity, as we do here.
 \subsection{Synergies}
 \label{back:synergies}
     Movements of the arm (and other parts of the body) are under-determined, meaning that for a given trajectory different muscle contractions are possible. One idea how this problem could be solved by our nervous system are synergies. Proposed by Bernstein in 1967 (\cite{Bernstein67}), they describe the goal of the movement (e.g. the trajectory) instead of controlling single muscles. This would mean however that predicting the activity of single muscles from EEG is harder than predicting a synergy, which in turn determines the contraction of the muscles.\\
@@ -91,7 +91,7 @@
     In Figure~\ref{fig:pca} we see the eigenvectors of the data. The longer vector is the principal component; the shorter one is orthogonal to it and explains the remaining variance. The second component here is also the component which explains the least variance, since most variance is orthogonal to it.
 \subsection{NMF}
 \label{mat:nmf}
-    In some applications Non-negative Matrix Factorization (NMF) is preferred over PCA (cf. \cite{Lee99}). This is because it does not learn eigenvectors but decomposes the input into parts which are all possibly used in the input. When seen as matrix factorization PCA yields matrices of arbitrary sign where one represents the eigenvectors the other the specific mixture of them. Because an entry may be negative cancellation is possible. This leads to unintuitive representation in the first matrix.\\
+    In some applications non-negative matrix factorization (NMF) is preferred over PCA (cf. \cite{Lee99}). This is because it does not learn eigenvectors but decomposes the input into parts which are all possibly used in the input. When seen as a matrix factorization, PCA yields matrices of arbitrary sign, where one represents the eigenvectors and the other the specific mixture of them. Because an entry may be negative, cancellation is possible. This leads to an unintuitive representation in the first matrix.\\
     NMF in contrast only allows positive entries. This leads to \qq{what is in, is in}, meaning no cancellation, which in turn yields more intuitive matrices. The first contains possible parts of the data, the second how strongly they are represented in the current input.\\
     The formula for NMF is
     $$Input\approx \mathbf{WH}$$
@@ -117,7 +117,7 @@
     ALS usually converges faster and with a better result than multiplicative update algorithms, which would be the alternative in \matlab{}.
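A minimal sketch of such a factorization with scikit-learn (its default solver is coordinate descent, not the ALS variant used in \matlab{}; the matrix sizes and random data are placeholders):

```python
# Non-negative factorization of a (time x channels) EMG-like matrix
# into W (activation strengths) and H (synergy-like parts).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
emg = rng.random((1000, 6))          # stand-in for 6-channel EMG features

model = NMF(n_components=3, init='nndsvda', max_iter=500)
W = model.fit_transform(emg)         # 1000 x 3: how strongly each part is used
H = model.components_                # 3 x 6: the non-negative parts
print(np.linalg.norm(emg - W @ H) / np.linalg.norm(emg))  # relative error
```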
 \subsection{Autoencoders}
 \label{mat:autoenc}
-    Autoencoders are a specific type of artificial neural networks (ANN). They work like typical ANNs by adjusting weights between the layers however they do not predict an unknown output but they predict their own input. What is interesting now is manipulating the size of the hidden layer where the size of the hidden layer has to be smaller than the input size. Now in the hidden layer the information of the input can be found in a condensed form (e.g. synergies instead of single muscle activity).\\
+    Autoencoders are a specific type of artificial neural network (ANN). They work like typical ANNs by adjusting weights between the layers; however, they do not predict an unknown output but reproduce their own input. What is interesting now is manipulating the size of the hidden layer, which has to be smaller than the input layer. The hidden layer then contains the information of the input in a condensed form (e.g. synergies instead of single muscle activity).\\
     Autoencoders have been successfully used by Spüler et al. to extract synergies from EMG (\cite{Spueler16}). Especially with a lower number of synergies, autoencoders should perform better than PCA or NMF, because linear models fail to discover the agonist-antagonist relations that are typical for muscle movements. These however can be detected by autoencoders, which allows for good estimations with half the synergies.\\
     An autoencoder's input layer has as many neurons as there are input dimensions (e.g. one for each EMG channel). The number of hidden layer neurons may be varied; we mostly used 3. The output layer is of the same size as the input layer. This autoencoder is shown in Figure~\ref{fig:autoenc}.
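A minimal PyTorch sketch of this 6-3-6 architecture (the thesis work used \matlab{}; the sigmoid activation, optimizer and training settings here are assumptions):

```python
# Autoencoder sketch: 6 inputs (EMG channels), a 3-neuron hidden layer
# (the synergies), 6 outputs reconstructing the input.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(6, 3), nn.Sigmoid())
decoder = nn.Linear(3, 6)
model = nn.Sequential(encoder, decoder)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
emg = torch.rand(1000, 6)            # stand-in for waveform-length data

for _ in range(200):                 # train the network to reproduce its input
    opt.zero_grad()
    loss = loss_fn(model(emg), emg)
    loss.backward()
    opt.step()

synergies = encoder(emg)             # hidden-layer activity = synergies
print(synergies.shape)               # torch.Size([1000, 3])
```

The nonlinear hidden layer is what lets such a network capture the agonist-antagonist structure that the linear PCA and NMF decompositions miss.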
 \begin{figure}
@@ -140,11 +140,11 @@
     where $y_i$ is the class (+1 or -1) of the corresponding data.
 \begin{figure}
     \centering
-    \includegraphics[width=0.8\textwidth]{pictures/svm.png}
+    \includegraphics[width=0.6\textwidth]{pictures/svm.png}
     \caption{Margins and hyperplane (Figure by Cyc and Peter Buch)}
 \label{fig:svm}
 \end{figure}
-    This prototype of a SVM is only able to separate two classes of linear separable data. For other data some improvements were necessary.
+    This prototype of an SVM is only able to separate two classes of linearly separable data. For other data some improvements are necessary.
 \subsubsection{Multiclass SVM}
     If there are more than two classes to separate, it can be done with SVMs in different ways. One approach is \emph{one-vs-one}, meaning each pair of classes is compared with the corresponding SVM and the SVM votes for one or the other class. This is done for all pairs and the class with the most votes is picked.\\
     Another approach is \emph{one-vs-all}, where every class is compared against the remaining ones. Here scores are used to determine which class matches best, i.e. in which class the data is farthest from the separating hyperplane.
@@ -188,7 +188,7 @@
 \subsection{Regression}
     Regression is the idea of finding $\beta$ so that $$y= X\beta+\epsilon,$$ where $X$ is the $n\times p$ input matrix and $y$ the $n\times 1$ output vector of a system. Having this $\beta$, the output can be predicted from given input.\\
     There are different ways to find this $\beta$. One common approach is the \emph{ordinary least squares} algorithm: $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right),$$ meaning the chosen $\hat\beta$ is the $b$ which produces the lowest error, since $Xb$ should be - apart from the noise $\epsilon$ - the same as $y$.\\
-    Choosing $\beta$ like this brings the risk of overfitting. The training data is matched well, new data however are classified poorly. This problem is addressed with RIDGE-Regression.
+    Choosing $\beta$ like this brings the risk of overfitting: the training data is matched well, new data points however are predicted poorly. This problem is addressed with RIDGE-Regression.
 \subsubsection{RIDGE-Regression}
 \label{mm:ridge}
     Instead of minimizing only the error, in RIDGE-Regression (also called \emph{Tikhonov regularization}) the size of the vector $\beta$ is also minimized: $$\hat{\beta}=\arg\min\limits_{b\in\mathds{R}^p} \left(y-Xb\right)^T\left(y-Xb\right)+\lambda b^Tb.$$
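Setting the gradient of this objective to zero gives the closed form $\hat\beta=(X^TX+\lambda I)^{-1}X^Ty$; a short NumPy check with arbitrary test data:

```python
# Ridge regression via the closed form (X^T X + lambda I)^-1 X^T y.
# Sizes, noise level and lambda are arbitrary test values.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 10, 1.0
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.linalg.norm(beta_hat - beta_true))  # small for this easy problem
```

The added $\lambda I$ term is what shrinks $\hat\beta$ and counters the overfitting described above.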
@@ -236,9 +236,9 @@
     A data point $y$ is classified as an outlier if $y > q_3+1.5\cdot(q_3-q_1)$ or $y < q_1-1.5\cdot(q_3-q_1)$, where $q_1,q_3$ are the first and third quartile (which also define the box).
 \section{Experimental design}
 \label{mm:design}
-    The data used for this work were mainly recorded by Farid Shiman, Nerea Irastorza-Landa, and Andrea Sarasola-Sanz for their work (\cite{Shiman15},\cite{Sarasola15}). We were allowed to use them for further analysis.\\
+    The data used for this work was mainly recorded by Farid Shiman, Nerea Irastorza-Landa, and Andrea Sarasola-Sanz for their work (\cite{Shiman15}, \cite{Sarasola15}). We were allowed to use it for further analysis.\\
     There were 9 right-handed subjects with an average age of 25 (variance 6.67, minimum 20, maximum 28). Three female and 6 male subjects were tested. All the tasks were performed with the dominant right hand.\\
-    To perform was a center-out reaching task to one of four targets (see \ref{fig:experimentalDesign}) while 32 channel EEG, 6 channel%
+    The subjects performed a center-out reaching task to one of four targets (see Figure \ref{fig:experimentalDesign}) while 32-channel EEG, 6-channel%
     \footnote{\texttt{'AbdPolLo', 'Biceps', 'Triceps', 'FrontDelt', 'MidDelt'} and \texttt{'BackDelt'} were recorded for every subject, others only in some. Only the 6 channels tracked in every session were used} %
     surface EMG and 7-DOF kinematics were tracked.
 \begin{figure}
@@ -266,7 +266,7 @@
 \label{alg:load_bcidat}
 \end{algorithm}
 \subsubsection{Signal}
-    The signal is loaded as matrix of 41 channels (see Table~\ref{tab:channelNames}). All the values are integers corresponding to the voltage and also can be loaded as floating point values representing microvolts. Since the representation should not make any difference when analyzing the spectrum we use the smaller representation.
+    The signal is loaded as a matrix of 41 channels (see Table~\ref{tab:channelNames}). All the values are integers corresponding to the voltage; they can also be loaded as floating point values representing microvolts. Since the representation should not make any difference when analyzing the spectrum, we use the smaller representation as integers.
 \begin{table}
     \centering
     \begin{tabular}{c|c|l}
@@ -356,7 +356,7 @@
     \end{itemize}
     For re-synchronization we pass the shift to every function that needs to align EMG and EEG data and operate on \emph{time-from-start} there.
 \subsection{Waveform length}
-    For the representation of EMG data we used waveform length as described by Phinyomark et al. in \cite{Phinyomark12} which gives a measurement for the change in the window. It is calculated as the sum over the absolute difference of consecutive measurements in the time window:
+    For the representation of EMG data we used waveform length as described by \cite{Phinyomark12}, which gives a measure of the change within the window. It is calculated as the sum over the absolute differences of consecutive measurements in the time window:
     $$\sum\limits_{i=1}^N \left| x_i-x_{i-1}\right|,$$
     where $N$ is the number of measurements in the time window and $x_i$ are the measurements themselves.\\
     It describes the length of the path a needle would draw when plotting the EMG on a continuously moving band.
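In code this formula is a one-liner; here is a windowed version (window size and shift are placeholders, not the values used on the recorded EMG):

```python
# Waveform length per window: the sum of absolute first differences.
import numpy as np

def waveform_length(x):
    return np.sum(np.abs(np.diff(x)))

def windowed_wl(x, win=200, shift=40):
    starts = range(0, len(x) - win + 1, shift)
    return np.array([waveform_length(x[s:s + win]) for s in starts])

emg = np.random.randn(5000)           # stand-in for one EMG channel
print(windowed_wl(emg).shape)
```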
@@ -368,18 +368,18 @@
     In addition we take the second before movement onset out of the data (classified as -1) and (optionally) count half a second before movement onset as belonging to the following stimulus (cf. \ref{mat:pause}).\\
     Finally we do some smoothing by taking the most frequent class from one second before to one second after the current time step as its class.\\
     As a last step we adjust the length of the stimulus-vector to the length of the EEG data.\\
-    According to this classification we take only data in the further analysis which are classified different than -1 meaning they are either clear rest or clear movement.
+    According to this classification we only take data points into the further analysis which are classified differently than -1, meaning they are either clear rest or clear movement.
 \subsection{Synergies}
 \label{mat:synergies}
     We generate synergies based on different options for dimensionality reduction (cf. \ref{back:synergies}).\\
-    EMG data (as wave length) is reduced to $n$ dimensions, where $n$ is the desired number of Synergies.\\
+    EMG data (as waveform length) is reduced to $n$ dimensions, where $n$ is the desired number of synergies.\\
     Using PCA this is done by taking the first $n$ components. Then the EMG data is transformed into the $n$-dimensional space spanned by the components.\\
     NMF is done with $n$ as the inner dimension. Then the EMG data is multiplied with the resulting matrix to transform it into $n$-dimensional data.\\
-    Eventually autoencoders are trained with a hidden layer of size $n$ and afterwards EMG data is encoded with the learned weights. This is equivalent to taking the hidden layer activity for the corresponding time step.\\
+    Finally, autoencoders are trained with a hidden layer of size $n$ and afterwards the EMG data is encoded with the learned weights. This is equivalent to taking the hidden layer activity for the corresponding time step.\\
     Since synergies are generated from EMG they have the same dimensionality in the first dimension\footnote{depending only on window size and shift for the EMG data and the recording duration} and $n$ in the second.
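A minimal sketch of the PCA variant (scikit-learn as a stand-in for the \matlab{} pipeline; $n=3$ and the random stand-in data are placeholders):

```python
# Project 6-channel waveform-length data onto its first n principal
# components to obtain n synergies per time step.
import numpy as np
from sklearn.decomposition import PCA

emg_wl = np.random.rand(1000, 6)     # time steps x EMG channels
n = 3                                 # desired number of synergies
synergies = PCA(n_components=n).fit_transform(emg_wl)
print(synergies.shape)               # (1000, 3): same length, n features
```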
 \subsection{Kinematics}
     We used kinematic data either as movement or as position. The position was directly recorded; the movement is the first derivative of the position in time.\\
-    The kinematic record was started after the EEG recording. In synchronization channel\footnote{cf. Table~\ref{tab:channelNames}} there is a peak when kinematic recording is started. This was used to align movement with EEG and EMG data. In addition we adjusted the kinematic data to the EMG window and shift to be able to use corresponding data for the same time step. This was done by summing all differences (for movement) or by calculating the mean position in the time window.\\
+    The recording of kinematics was started after that of the EEG. In the synchronization channel\footnote{cf. Table~\ref{tab:channelNames}} there is a peak when the kinematic recording is started. This was used to align movement with EEG and EMG data. In addition we adjusted the kinematic data to the EMG window and shift to be able to use corresponding data for the same time step. This was done by summing all differences (for movement) or by calculating the mean position in the time window.\\
     This data has the same length as the EMG and synergy data, but only three features per time step, since we used only the 3D positioning ($x,y$ and $\theta$) of the hand and no information about the fingers.
 \section{Data Analysis}
     Figure~\ref{fig:overview} shows the steps of our work. EEG, EMG and positions were recorded; synergies and velocities were calculated from them. To check the performance of our methods, the relations between them were predicted.
diff --git a/text/thesis/03Results.tex b/text/thesis/03Results.tex
index 9d456e7..f05f715 100644
--- a/text/thesis/03Results.tex
+++ b/text/thesis/03Results.tex
@@ -11,7 +11,7 @@
 \label{fig:noSyn}
 \end{figure}%TODO (last): check orientation of figure (bottom should be outer edge)
     When comparing the results of prediction via different numbers of synergies, 2 synergies perform significantly ($p<0.01$) worse than 3 and 4. Between 3 and 4 synergies there is no significant difference ($p\approx0.1$).\\
-    Even for Autoencoder only the performance of 2 synergies is significantly ($p<0.1$) worse.
+    For each method of synergy generation alone, the performance of 2 synergies is not significantly ($p>0.05$) worse; only the overall comparison, which pools more data, is significant.
 \section{Classification}
 \subsection{Comparison of methods of recording}
     The different methods of recording (EEG, EMG and low frequencies) also differ in their results. An ANOVA gives $p<0.001$ for all classifications done on 4 different movements and rest.
diff --git a/text/thesis/04Discussion.tex b/text/thesis/04Discussion.tex
index b553806..b21cd1e 100644
--- a/text/thesis/04Discussion.tex
+++ b/text/thesis/04Discussion.tex
@@ -23,7 +23,7 @@
 \label{dis:lf}
     Our findings concerning low frequencies are a lot less promising than e.g. in \cite{Lew14}.\\
     The reason for that might be that the movements were not self-induced but extrinsically motivated by a cue. \citeauthor{Lew14} however use low frequencies exactly for the purpose of detecting voluntary movement.\\
-    We show that the use of low frequencies (at least as we did it here) has no advantage over the use of EMG (see table \ref{tab:pCorr}). This might also be a hint that movement relics were have the biggest part in low frequencies while moving. This however makes it impossible to use them for continuous tasks.\\
+    We show that the use of low frequencies (at least as we did it here) has no advantage over the use of EMG (see Table \ref{tab:pCorr}). This might also be a hint that movement artifacts make up the biggest part of the low frequencies during movement. This however makes it impossible to use them for continuous tasks.\\
     Low frequencies are great for detecting voluntary movement early, but are not applicable in our configuration. What is interesting nevertheless is that low frequencies also occur at rest. Quite a few of the movements are classified as rest (see Figure \ref{fig:cmFull}). If a sample is correctly classified as movement, it is quite likely that it is also assigned the correct movement class - however with a preference for class 3 again. This matches the understanding of low frequencies as pre-movement activation mainly belonging to voluntary movement. The subjects probably plan all the possible movements while at rest, to execute one once the stimulus is shown.
@@ -50,15 +50,13 @@
 \label{dis:noSyn}
     As shown in Section~\ref{res:noSyn}, 2 and 4 synergies are good values for the autoencoder, since the slope of the mean prediction is steeper before these values than after; a further neuron doesn't improve the result as much as the previous one.\\
     For PCA and NMF this value is reached at 3, as Figure \ref{fig:noSyn} shows.
-    %TODO: 2, 4
-    % Autoencoder better when having fewer synergies(?)
 \subsection{Autoencoder, PCA or NMF}
     In many applications the synergies computed with different methods perform similarly, however some differences can be found.
 \subsubsection{Prediction from EEG}
     PCA data is predicted from EEG significantly worse than e.g. autoencoder data ($p<0.001$). Between NMF and autoencoder there is no significant difference.\\
     We conclude autoencoder and NMF are to be preferred when looking for good predictability from EEG.
 \subsubsection{Number of Synergies}
-    TODO
+    With our data we cannot show a better performance of the autoencoder with only 2 synergies. Similar to the other methods of synergy calculation, there is a significant decrease in predictive performance.
 \subsection{Prediction via Synergies}
     Of course the prediction via synergies is a bit worse than direct prediction, since the machine learning techniques could do the same dimensionality reduction and also much more.\\
     This decrease however is not large, which suggests that synergies are a valid intermediate step.\\
diff --git a/text/thesis/05Future.tex b/text/thesis/05Future.tex
index af15ff8..e78cdc1 100644
--- a/text/thesis/05Future.tex
+++ b/text/thesis/05Future.tex
@@ -16,7 +16,8 @@
     For a better use of low frequency features our work could be compared with data recorded while subjects move voluntarily. This might also influence the way synergies are predicted and could lead to a better prediction.\\
     Additionally this task matches the requirements for a BCI better, as movement in daily life is more often voluntary than decided by a single auditory cue.
 \section{Synergies}
-    TODO %TODO
 \subsection{Generation of Synergies}
     We showed the plausibility of synergies here, so the next step could be to improve their acquisition. Generating them from EMG may include unnecessary information. The generation of synergies as an intermediate step between EEG (or generally brain activity) and EMG (or generally muscle activity) may achieve even better results.\\
     A dimensionality reduction on EEG alone will probably not work since there is too much unrelated activity; EMG alone bears the problem of a lower fit to the movement, as we showed.
+\subsection{Autoencoders}
+    We did not find significantly better performance of autoencoders, even with only 2 synergies. Since this was not the focus of our work, it might however be possible. Additional research is needed to finally answer which method is best for generating synergies.
diff --git a/text/thesis/pictures/kernel.tikz b/text/thesis/pictures/kernel.tikz
index 30c973a..30a4d92 100644
--- a/text/thesis/pictures/kernel.tikz
+++ b/text/thesis/pictures/kernel.tikz
@@ -1,10 +1,10 @@
 %kernel.tikz
 \begin{tikzpicture}[scale=2]
-    \node[left] (O) at (0,0) {$O$};
+    \node[below left] (O) at (0,0) {$O$};
     %Draw the Circle around it all
-    \draw[semithick] (0,0) circle (1);
-    \draw[dotted,red] (0,0) circle (1.5);
-    \draw[dotted,blue] (0,0) circle (0.5);
+    \draw[thick] (0,0) circle (1);
+    \draw[dotted,red,very thick] (0,0) circle (1.5);
+    \draw[dotted,blue,very thick] (0,0) circle (0.5);
     %Draw grid
     \draw[->] (-1.6,0) -- (1.6,0);
     \draw[->] (0,-1.6) -- (0,1.6);
@@ -13,8 +13,8 @@
     \node[below] (O) at (3,0) {$O$};
     %Draw the Circle around it all
     \draw (4,-0.1) -- (4,0.1);
-    \draw[dotted,red] (4.5,-0.1) -- (4.5,0.1);
-    \draw[dotted,blue] (3.5,-0.1) -- (3.5,0.1);
+    \draw[dotted,red,very thick] (4.5,-0.1) -- (4.5,0.1);
+    \draw[dotted,blue,very thick] (3.5,-0.1) -- (3.5,0.1);
     %Draw grid
     \draw[->] (2.75,0) -- (5,0);
 \end{tikzpicture}
diff --git a/text/thesis/thesis.tex b/text/thesis/thesis.tex
index 8090cfe..fbe1987 100644
--- a/text/thesis/thesis.tex
+++ b/text/thesis/thesis.tex
@@ -123,11 +123,11 @@
 \section*{Acknowledgments} %TODO: read this again tomorrow
 \addcontentsline{toc}{section}{Acknowledgments}
-\noindent First of all I would like to thank my supervisor Martin Spüler who always had an open ear for problems and gave lots of useful input. %TODO
+\noindent First of all I would like to thank my supervisor Martin Spüler, who always had an open ear for problems and gave lots of useful input. %TODO: "open ear"?

 \noindent My thanks also go to Farid Shiman, Nerea Irastorza-Landa and Andrea Sarasola-Sanz, whose data I was allowed to use for my work.

-\noindent Also my reviewers Professores Rosenstiel and Benda shall be named here. I want to express my thanks for their willingness to review my thesis and my hope that this thesis may found a link between your chairs for further projects.
+\noindent My reviewers, Professors Rosenstiel and Benda, shall also be named here. I want to express my thanks for their willingness to review my thesis and my hope that it may forge a link between their chairs for further projects.

 \noindent This work would not have been possible without the Free Software Community providing \LaTeX, TikZ and Ubuntu, for example. My deep gratitude goes to all the developers spending their time on the improvement of commonly available software.